DFKI-NLP

Research Group at DFKI

German Research Center for Artificial Intelligence (DFKI)

DFKI-NLP is a Natural Language Processing group of researchers, software engineers and students at the Berlin office of the German Research Center for Artificial Intelligence (DFKI). We’re working on basic and applied research in areas covering, among others, information extraction, knowledge base population, dialogue, sentiment analysis, and summarization, across various domains such as health, media, and science. We are particularly interested in core research on learning in low-resource settings, reasoning over larger contexts, and continual learning. We strive for a deeper understanding of human language and thinking, with the goal of developing novel methods for processing and generating human language text, speech, and knowledge. An important part of our work is the creation of corpora, the evaluation of NLP datasets and tasks, and explainability research.

Key topics:

Applied / domain-specific NLP
Evaluation methodology research
Dataset construction, linguistic annotation, synthetic data generation
Learning in low-resource settings and over large contexts
Multilingual NLP
Explainability

Our group is part of DFKI’s Speech and Language Technology department, led by Prof. Sebastian Möller, and collaborates closely with, e.g., the Quality and Usability chair of Technische Universität Berlin, DFKI’s Multilinguality and Language Technology department, and the XPlaiNLP group of TU Berlin.

Latest News

September 30, 2025

Two papers by DFKI-NLP authors accepted to EMNLP 2025

Two papers from researchers in the TRAILS project have been accepted to the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP 2025), one as a Main paper and one as a Findings paper. EMNLP will take place November 4-9 in Suzhou, China. The first paper, titled “Multilingual Datasets for Custom Input Extraction and Explanation Requests Parsing in Conversational XAI Systems”, introduces two multilingual datasets for conversational XAI systems: one for intent recognition and one for slot filling / input extraction. The second paper investigates political bias in LLMs by replacing words in German claims with euphemisms or dysphemisms to create minimal sentence pairs.

June 11, 2025

One paper by DFKI-NLP authors accepted to ACL 2025 Findings

One paper by Qianli Wang and Nils Feldhus has been accepted to the Findings track of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025). In the paper, they first introduce ZeroCF, a faithful approach that leverages important words derived from feature attribution methods to generate counterfactual examples in a zero-shot setting. Second, they present a new framework, FitCF, which additionally verifies these counterfactuals via label-flip verification and then inserts them as demonstrations for few-shot prompting.
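To make the “verify, then use as demonstrations” step more concrete, here is a minimal sketch. It is not FitCF’s code: the `predict` callable, the toy keyword classifier, and the prompt format are all hypothetical stand-ins, and the sketch only checks that the predicted label changed, whereas the actual verification in the paper may differ (e.g. by checking against a specific target label).

```python
from typing import Callable, List, Tuple


def label_flip_filter(
    original_label: str,
    candidates: List[str],
    predict: Callable[[str], str],
) -> List[str]:
    """Keep candidate counterfactuals whose predicted label differs from the original."""
    return [c for c in candidates if predict(c) != original_label]


def build_few_shot_prompt(demos: List[Tuple[str, str]], query: str) -> str:
    """Format verified (original, counterfactual) pairs as demonstrations before the query."""
    blocks = [f"Original: {orig}\nCounterfactual: {cf}" for orig, cf in demos]
    blocks.append(f"Original: {query}\nCounterfactual:")
    return "\n\n".join(blocks)


# Hypothetical toy "classifier" standing in for a real model.
def toy_predict(text: str) -> str:
    return "positive" if "good" in text else "negative"


source = "The film was good."
# "The film was good enough." keeps the original label, so it is filtered out.
verified = label_flip_filter(
    "positive", ["The film was bad.", "The film was good enough."], toy_predict
)
prompt = build_few_shot_prompt([(source, cf) for cf in verified], "The plot was good.")
print(prompt)
```

Only candidates that actually flip the classifier’s decision end up as demonstrations, which is the intuition behind verifying counterfactuals before reusing them in few-shot prompts.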

May 4, 2025

One paper by DFKI-NLP authors accepted to RepL4NLP 2025 (at NAACL 2025)

One paper originating from research in the TRAILS project has been accepted to the 10th Workshop on Representation Learning for NLP (RepL4NLP 2025), co-located with NAACL 2025. In the paper, we reimagine classical probing to evaluate knowledge transfer from simple source to more complex target tasks. Instead of probing frozen representations from a complex source task on diverse simple target probing tasks (as usually done in probing), we explore the effectiveness of embeddings from multiple simple source tasks on a single target task. Our findings reveal that task embeddings vary significantly in utility for coreference resolution, with semantic similarity tasks (e.g., paraphrase detection) proving most beneficial. Additionally, representations from intermediate layers of fine-tuned models often outperform those from final layers.

February 3, 2025

One paper by DFKI-NLP authors accepted to COLING 2025

One paper originating from research in the TRAILS project has been accepted to the Main Track at the 31st International Conference on Computational Linguistics 2025 (COLING 2025). In the paper, we present CROSS-REFINE, a generator-critic framework that enhances natural language explanations by refining initial outputs using feedback from a second LLM, outperforming SELF-REFINE and working effectively even with less powerful models.

November 10, 2024

One paper by DFKI-NLP authors accepted to EMNLP 2024

One paper from researchers in the DFKI-NLP group has been accepted as a Findings paper at the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). EMNLP will take place November 12-16 in Miami, Florida. The paper presents CoXQL, a dataset for user intent recognition for conversational XAI systems, covering 31 intents, seven of which require filling multiple slots. The paper also presents an improved parsing approach for intent recognition and slot filling on this dataset, which is evaluated using different LLMs.


Recent Publications

Multilingual Datasets for Custom Input Extraction and Explanation Requests Parsing in Conversational XAI Systems

Conversational explainable artificial intelligence (ConvXAI) systems based on large language models (LLMs) have garnered considerable …

Qianli Wang, Tatiana Anikina, Nils Feldhus, Simon Ostermann, Fedor Splitt, Jiaao Li, Yoana Tsoneva, Sebastian Möller, Vera Schmitt

PolBiX: Detecting LLMs' Political Bias in Fact-Checking through X-phemisms

Large Language Models are increasingly used in applications requiring objective assessment, which could be compromised by political …

Charlott Jakob, David Harbecke, Patrick Parschan, Pia Wenzel Neves, Vera Schmitt


FitCF: A Framework for Automatic Feature Importance-guided Counterfactual Example Generation

Counterfactual examples are widely used in natural language processing (NLP) as valuable data to improve models, and in explainable …

Qianli Wang, Nils Feldhus, Simon Ostermann, Luis Felipe Villa-Arenas, Sebastian Möller, Vera Schmitt

Reverse Probing: Evaluating Knowledge Transfer via Finetuned Task Embeddings for Coreference Resolution

In this work, we reimagine classical probing to evaluate knowledge transfer from simple source to more complex target tasks. Instead of …

Tatiana Anikina, Arne Binder, David Harbecke, Stalin Varanasi, Leonhard Hennig, Simon Ostermann, Sebastian Möller, Josef Van Genabith


Projects

GenKI4Media - Generative AI assistants for the media, cultural and creative sectors

GenAI models have so far only been of limited use to SMEs, as they are not sufficiently adapted to the specialised domains of companies and therefore produce erroneous content more frequently than in general fields of knowledge. The aim of the GenKI4Media project is to tap into the innovative potential of generative AI with three new generative AI assistants for (1) ‘Generating multimodal media formats for culture, politics and education’, (2) ‘Standards and regulations in the media sector’ and (3) ‘Demonstrators for the creative/cultural sector’ in order to effectively support editorial work.

Leonhard Hennig

OUTLIER_DETECTION

Demo project for anonymization techniques.

TRAILS - Trustworthy and Inclusive Machines

Natural language processing (NLP) has demonstrated impressive performance on some human tasks. To achieve such performance, current neural models need to be pre-trained on huge amounts of raw text data. This dependence on uncurated data has at least four indirect and unintended consequences: 1) Uncurated data tends to be linguistically and culturally non-diverse due to the statistical dominance of major languages and dialects in online texts (English vs. North Frisian, US English vs. UK English, etc.). 2) Pre-trained neural models such as the ubiquitous pre-trained language models (PLMs) reproduce the features present in the data, including human biases. 3) Rare phenomena (or languages) in the ‘long tail’ are often not sufficiently taken into account in model evaluation, leading to an overestimation of model performance, especially in real-world application scenarios. 4) The focus on achieving state-of-the-art results through transfer learning with giant PLMs such as GPT-4 or mT5 often leads to the underestimation of alternative methods that are more accessible, efficient and sustainable. As these problems undermine inclusion and trust, TRAILS focuses on three main research directions: (i) inclusion of underrepresented languages and cultures through multilingual and culturally sensitive NLP, (ii) robustness and fairness with respect to long-tail phenomena and classes and ‘trustworthy content’, and (iii) robust and efficient NLP models that enable the training and deployment of models for (i) and (ii). We also partially address economic inequality by aiming for more efficient models (objective (iii)), which directly translates into a lower resource and cost footprint.

Leonhard Hennig


Data4Transparency

According to the World Bank and the UN, some US$1tn is paid in bribes every year. Corrupt financial transactions divert funds from legitimate public services, distort free markets (potentially thwarting economic development), and reduce trust in institutions. The Organized Crime and Corruption Reporting Project (OCCRP) is a global platform for investigative reporting that provides resources to journalists and media centres, enables cost-effective collaboration between editors, and offers tools that help independent media secure themselves against threats. Exposing previously unknown connections between entities makes it possible for citizens, policymakers, activists and law enforcement agencies to act. As the number of such leaks and publications grows, there is an increasing need for effective, scalable and reproducible methods to discover any anomalies and evidence of malfeasance they may contain.

Steffen Castle, Leonhard Hennig

Text2Tech

The goal of the Text2Tech project is to research and develop automated methods for information extraction from unstructured text sources, in order to provide companies with decision-relevant knowledge about technological developments quickly and efficiently. AI-based methods for information extraction (IE) already make it possible to automatically extract selected information, e.g. about people, companies and places, from text sources. In the Text2Tech project, such approaches will be developed further in order to extract machine-readable knowledge about technologies, technology categories, companies and their relationships with each other from German- and English-language, domain-specific text sources, using the automotive industry as an example. The most important research goals are the modeling and population of domain-specific knowledge graphs (knowledge base population), the development of methods for cross-lingual named entity recognition and entity linking, relation extraction, and the development of model compression methods so that models run efficiently even on small hardware.

David Harbecke, Leonhard Hennig

BIFOLD

BIFOLD conducts foundational research in big data management and machine learning, as well as at their intersection, to educate future talent and create high-impact knowledge exchange. The Berlin Institute for the Foundations of Learning and Data (BIFOLD) evolved in 2019 from the merger of two national Artificial Intelligence Competence Centers: the Berlin Big Data Center (BBDC) and the Berlin Center for Machine Learning (BZML). Embedded in the vibrant Berlin metropolitan area, BIFOLD provides an outstanding scientific environment and numerous collaboration opportunities for national and international researchers. BIFOLD offers a broad range of research topics as well as a platform for interdisciplinary research and knowledge exchange with the sciences and humanities, industry, startups and society. Within BIFOLD, DFKI SLT conducts research in Clinical AI, specifically addressing the task of pharmacovigilance. Pharmacovigilance is concerned with the assessment and prevention of adverse drug reactions (ADRs) to pharmaceutical products. As medication use is rising worldwide, the potential risk of unwanted side effects such as ADRs is constantly increasing. Patients exchange views in their own language as ‘experts in their own right’ on social media and in disease-specific forums. Our project addresses the detection and extraction of ADRs from medical forums and social media across different languages, using cross-lingual transfer learning in combination with external knowledge sources.

Philippe Thomas, Leonhard Hennig

Cora4NLP

Language is implicit: it omits information. Filling this information gap requires contextual inference, background and commonsense knowledge, and reasoning over situational context. Language also evolves, i.e., it specializes and changes over time. For example, many different languages and domains exist, new domains arise, and both evolve constantly. Thus, language understanding also requires continuous and efficient adaptation to new languages and domains, and transfer to, and between, both. Current language understanding technology, however, focuses on high-resource languages and domains, uses little to no context, and assumes static data, task, and target distributions. The research in Cora4NLP aims to address these challenges. It builds on the expertise and results of the predecessor project DEEPLEE and is carried out jointly by the language technology research departments in Berlin and Saarbrücken.

Leonhard Hennig, Christoph Alt

BBDC2

In order to optimally prepare industry, science and society in Germany and Europe for the global Big Data trend, highly coordinated activities in research, teaching, and technology transfer regarding the integration of data analysis methods and scalable data processing are required. To achieve this, the Berlin Big Data Center is pursuing the following seven objectives: 1) Pooling expertise in scalable data management, data analytics, and big data applications. 2) Conducting fundamental research to develop novel and automatically scalable technologies capable of performing ‘Deep Analysis’ of ‘Big Data’. 3) Developing an integrated, declarative, highly scalable open-source system that enables the specification, automatic optimization, parallelization and hardware adaptation, and fault-tolerant, efficient execution of advanced data analysis problems using varying methods (e.g., drawn from machine learning, linear algebra, statistics and probability theory, computational linguistics, or signal processing), leveraging our work on Apache Flink. 4) Transferring technology and know-how to support innovation in companies and startups. 5) Educating data scientists with respect to the five big data dimensions (i.e., applications, economic, legal, social, and technological) via leading educational programs. 6) Empowering people to leverage ‘Smart Data’, i.e., to discover newfound information based on their massive data sets. 7) Enabling the general public to conduct sound data-driven decision-making.

Leonhard Hennig

DEEPLEE

The research work in DEEPLEE, which is carried out in the Language Technology research departments in Saarbrücken and Berlin, builds on and further develops DFKI’s expertise in the areas of deep learning (DL) and language technology (LT). The project aims for profound improvements of DL approaches in LT by focusing on four central, open research topics: modularity in DNN architectures, use of external knowledge, DNNs with explanation functionality, and machine teaching strategies for DNNs.

Sven Schmeier, Christoph Alt, Robert Schwarzenberg

PLASS

The aim of the PLASS project is to develop a prototypical B2B platform for AI-based decision support in supply chain management (SCM). The focus is on the automatic recognition of decision-relevant information and the acquisition of structured knowledge from global and multilingual text sources. These sources provide a large database of SCM information, especially for the early detection of critical events and risks, but also of opportunities (e.g. through new technologies), at suppliers and in supply chains. PLASS enables SMEs and large companies to continuously monitor their suppliers and supply chains, and supports supply chain managers in risk assessment and decision-making.

Leonhard Hennig, Christoph Alt

SIM3S

In the SIM3S project, data from the BMVI data offerings mCloud and MDM will be linked, refined and jointly analysed with other open data, user-generated content, and data from individual modes of transport and other mobility-relevant companies in order to remove barriers, including discriminatory ones, in everyday mobility. To implement the project, state-of-the-art technologies and methods from the areas of big data, intelligent analysis of mass data, and artificial intelligence, in particular natural language processing (NLP), are used.

Philippe Thomas

Datasets

September 18, 2024

LLM-based FAQ Rewrites

We introduce a German-language dataset of Frequently Asked Question (FAQ) question-answer pairs: raw FAQ drafts, their revisions by professional editors, and LLM-generated revisions. The data was used to investigate the use of large language models (LLMs) to enhance the editorial process of rewriting customer help pages. The corpus comprises 56 question-answer pairs addressing potential customer inquiries across various topics. For each FAQ pair, a raw input is provided by specialized departments, and a rewritten gold output is crafted by a professional editor of Deutsche Telekom. The final dataset also includes LLM-generated FAQ pairs. Please see our paper accepted at INLG 2024, Tokyo, Japan. You can find the GitHub repo containing the dataset here: https://github.com/DFKI-NLP/faq-rewrites-llms.

May 24, 2023

The MultiTACRED dataset

MultiTACRED is a multilingual version of the large-scale TAC Relation Extraction Dataset (TACRED). It covers 12 typologically diverse languages from 9 language families and was created by machine-translating the instances of the original TACRED dataset and automatically projecting their entity annotations. For details of the original TACRED’s data collection and annotation process, see the Stanford paper. Translations are syntactically validated by checking the correctness of the XML tag markup. Any translations with an invalid tag structure, e.g. missing or invalid head or tail tag pairs, are discarded (on average, 2.3% of the instances). The languages covered are Arabic, Chinese, Finnish, French, German, Hindi, Hungarian, Japanese, Polish, Russian, Spanish, and Turkish. The intended use is supervised relation classification, and the intended audience is researchers. The dataset will be released via the LDC (link will follow). Please see our ACL paper for full details. You can find the GitHub repo containing the translation and experiment code here: https://github.com/DFKI-NLP/MultiTACRED.
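The tag-structure check described above can be illustrated with a small sketch. This is not the MultiTACRED code: it assumes, hypothetically, that the head entity is wrapped in `<H>...</H>` and the tail in `<T>...</T>` before translation (the real tag names may differ), and treats a translation as valid only if each pair occurs exactly once and the tags are properly nested.

```python
import re
from typing import List, Tuple

# Hypothetical tag names: <H>...</H> marks the head entity, <T>...</T> the tail.
TAG_PATTERN = re.compile(r"</?(H|T)>")


def has_valid_tag_structure(text: str) -> bool:
    """Return True if `text` has exactly one properly nested head and tail tag pair."""
    stack: List[str] = []  # currently open tags, innermost last
    opened = set()         # tag names already seen as opening tags
    closed = set()         # tag names whose pair has been closed
    for match in TAG_PATTERN.finditer(text):
        name = match.group(1)
        if match.group(0).startswith("</"):
            # A closing tag must match the most recently opened tag.
            if not stack or stack[-1] != name:
                return False
            stack.pop()
            closed.add(name)
        else:
            # Each entity may be opened only once.
            if name in opened:
                return False
            opened.add(name)
            stack.append(name)
    # No tag may be left open, and both pairs must be present.
    return not stack and closed == {"H", "T"}


def filter_translations(translations: List[str]) -> Tuple[List[str], float]:
    """Keep translations with valid markup and report the fraction discarded."""
    kept = [t for t in translations if has_valid_tag_structure(t)]
    discarded = 1 - len(kept) / len(translations) if translations else 0.0
    return kept, discarded


kept, rate = filter_translations([
    "<H>Bill Gates</H> founded <T>Microsoft</T> .",
    "<H>Bill Gates</H> gründete Microsoft .",  # tail tags lost in translation
])
print(kept, rate)  # keeps only the first sentence; rate == 0.5
```

A stack is used rather than simple tag counts so that crossing tags such as `<H>a <T>b</H> c</T>` are also rejected.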

July 1, 2020

Ex4CDS - Textual Explanations for Clinical Decision Support

Ex4CDS consists of explanations (or, more precisely, justifications) written by physicians in the context of clinical decision support. In the course of a larger study, physicians estimated the probability of different clinical outcomes in nephrology, namely rejection, graft loss and infections, within the next 90 days. Each estimation had to be justified in a short text; these texts are our explanations. The explanations were written in German and strongly resemble general clinical notes. You can find a description and the data here: https://github.com/DFKI-NLP/Ex4CDS

July 1, 2020

German Adverse Drug Reaction (ADR) detection in patient-generated content

In this work, we present the first corpus for German Adverse Drug Reaction (ADR) detection in patient-generated content. The data consists of 4,169 binary-annotated documents from a German patient forum, where users talk about health issues and get advice from medical doctors. As is common for social media data in this domain, the class labels of the corpus are highly imbalanced. This, together with a strong topic imbalance, makes it a very challenging dataset, especially since the same symptom can often have several causes and is not always related to taking a medication. We aim to encourage further multilingual efforts in the domain of ADR detection. More info: https://aclanthology.org/2022.lrec-1.388/

July 1, 2020

MobASA Corpus

This repository contains MobASA, a novel German-language corpus of tweets annotated with their relevance for public transportation and with sentiment towards aspects related to barrier-free travel. We identified and labeled topics important for passengers whose mobility is limited due to disability or age, or who are travelling with young children. The data can be used as a training or test corpus for aspect-oriented sentiment analysis. Moreover, the corpus can support building inclusive public transportation systems. You can find the corpus here: https://github.com/DFKI-NLP/sim3s-corpus, and the description of the corpus here: https://aclanthology.org/2022.csrnlp-1.5.pdf

July 1, 2020

MobIE Corpus

This repository contains the DFKI MobIE Corpus (formerly “DAYSTREAM Corpus”), a dataset of 3,232 German-language documents collected between May 2015 and April 2019 that have been annotated with fine-grained geo-entities, such as location-street, location-stop and location-route, as well as standard named entity types (organization, date, number, etc.). All location-related entities have been linked to either OpenStreetMap identifiers or database IDs of Deutsche Bahn / Rhein-Main-Verkehrsverbund. The corpus has also been annotated with a set of 7 traffic-related n-ary relations and events, such as Accidents, Traffic jams, and Canceled Routes. It consists of Twitter messages and traffic reports from e.g. radio stations, police and public transport providers. It allows for training and evaluating named entity recognition algorithms that aim for fine-grained typing of geo-entities, entity linking of these entities, and n-ary relation extraction systems. You can find the description of the corpus here: https://www.dfki.de/web/forschung/projekte-publikationen/publikationen-uebersicht/publikation/11741/

December 16, 2019

Product Corpus

The Product Corpus is a dataset of 174 English web pages and social media posts annotated for product and company named entities, and the relation CompanyProvidesProduct. The goal is to make extraction of non-standard, B2B products and relations from unstructured text easier and more reliable. The corpus is also annotated for coreference chains of companies and products.

December 16, 2019

SmartData Corpus

The SmartData Corpus is a dataset of 2,598 German-language documents which has been annotated with fine-grained geo-entities, such as streets, stops and routes, as well as standard named entity types. It has also been annotated with a set of 15 traffic- and industry-related n-ary relations and events, such as Accidents, Traffic jams, Acquisitions, and Strikes. The corpus consists of newswire texts, Twitter messages, and traffic reports from radio stations, police and railway companies. It allows for training and evaluating both named entity recognition algorithms that aim for fine-grained typing of geo-entities and n-ary relation extraction systems.

Contact

Salzufer 15/16
10587 Berlin
Enter Salzufer 15/16 and take the elevator to Reception on Floor 4
9:00 to 17:00 Monday to Friday

People

Researchers

Leonhard Hennig

Philippe Thomas

Roland Roller

Lisa Raithel

Eleftherios Avramidis

PhD Candidates

David Harbecke

Arne Binder

Steffen Castle

Aleksandra Gabryszak

Ajay Madhavan Ravichandran

Yuxuan Chen

Alumni

Nils Feldhus

Malte Ostendorff

Robert Schwarzenberg

Nils Rethmeier

He Wang

Christoph Alt

Karolina Zaczynska

Marc Hübner
