DFKI-NLP is a Natural Language Processing group of researchers, software engineers and students at the Berlin office of the German Research Center for Artificial Intelligence (DFKI) working on basic and applied research in areas covering, among others, information extraction, knowledge base population, dialogue, sentiment analysis, and summarization. We are particularly interested in core research on learning in low-resource settings, reasoning over larger contexts, and continual learning. We strive for a deeper understanding of human language and thinking, with the goal of developing novel methods for processing and generating human language text, speech, and knowledge. An important part of our work is the creation of corpora, the evaluation of NLP datasets and tasks, and the explainability of (neural) models.

Key topics:

  • Applied / domain-specific information extraction
  • Learning in low-resource settings and over large contexts
  • Construction and analysis of IE datasets, linguistic annotation
  • Multilingual information extraction
  • Evaluation methodology research
  • Explainability

Our group forms a part of DFKI’s Speech and Language Technology department led by Prof. Sebastian Möller, and closely collaborates with e.g. the Technische Universität Berlin, DFKI’s Language Technology and Multilinguality department and DFKI’s Intelligent Analytics for Massive Data group.

Latest News

Recent Publications

An Annotated Corpus of Textual Explanations for Clinical Decision Support

In recent years, machine learning for clinical decision support has gained more and more attention. In order to introduce such applications into clinical practice, a good performance might be essential, however, the aspect of trust should not be underestimated. For the treating physician using such a system and being (legally) responsible for the decision made, it is particularly important to understand the system’s recommendation. To provide insights into a model’s decision, various techniques from the field of explainability (XAI) have been proposed whose output is often enough not targeted to the domain experts that want to use the model. To close this gap, in this work, we explore how explanations could possibly look like in future. To this end, this work presents a dataset of textual explanations in context of decision support. Within a reader study, human physicians estimated the likelihood of possible negative patient outcomes in the near future and justified each decision with a few sentences. Using those sentences, we created a novel corpus, annotated with different semantic layers. Moreover, we provide an analysis of how those explanations are constructed, and how they change depending on physician, on the estimated risk and also in comparison to an automatic clinical decision support system with feature importance.


BIFOLD conducts foundational research in big data management and machine learning, as well as its intersection, to educate future talent, and create high-impact knowledge exchange. The Berlin Institute for the Foundations of Learning and Data (BIFOLD), has evolved in 2019 from the merger of two national Artificial Intelligence Competence Centers: the Berlin Big Data Center (BBDC) and the Berlin Center for Machine Learning (BZML). Embedded in the vibrant Berlin metropolitan area, BIFOLD provides an outstanding scientific environment and numerous collaboration opportunities for national and international researchers. BIFOLD offers a broad range of research topics as well as a platform for interdisciplinary research and knowledge exchange with the sciences and humanities, industry, startups and society. Within BIFOLD, DFKI SLT conducts research in Clinical AI, specifically addressing the task of Pharmacovigilance. Pharmacovigilance is concerned with the assessment and prevention of adverse drug reactions (ADR) in pharmaceutical products. As the level of medication is generally raising all over the world, the potential risk of unwanted side effects, such as ADRs, is constantly increasing. Patients exchange views in their own language as ‘experts in their own right,’ in social media and disease-specific forums. Our project addresses the detection and extraction of ADR from medical forums and social media across different languages using cross-lingual transfer learning in combination with external knowledge sources.
In order to optimally prepare industry, science and the society in Germany and Europe for the global Big Data trend, highly coordinated activities in research, teaching, and technology transfer regarding the integration of data analysis methods and scalable data processing are required. To achieve this, the Berlin Big Data Center is pursuing the following seven objectives: 1) Pooling expertise in scalable data management, data analytics, and big data application 2) Conducting fundamental research to develop novel and automatically scalable technologies capable of performing ‘Deep Analysis’ of ‘Big Data’. 3) Developing an integrated, declarative, highly scalable open-source system that enables the specification, automatic optimization, parallelization and hardware adaptation, and fault-tolerant, efficient execution of advanced data analysis problems, using varying methods (e.g., drawn from machine learning, linear algebra, statistics and probability theory, computational linguistics, or signal processing), leveraging our work on Apache Flink 4) Transfering technology and know-how to support innovation in companies and startups. 5) Educating data scientists with respect to the five big data dimensions (i.e., applications, economic, legal, social, and technological) via leading educational programs. 6) Empowering people to leverage ‘Smart Data’, i.e., to discover newfound information based on their massive data sets. 7)Enabling the general public to conduct sound data-driven decision-making.



  • Alt-Moabit 91c
    10559 Berlin
  • Enter Alt-Moabit 91c and take the elevator to Reception on Floor 4
  • 9:00 to 17:00 Monday to Friday