Projects

TRAILS - Trustworthy and Inclusive Machines
Natural language processing (NLP) has demonstrated impressive performance on a range of human language tasks. To achieve such performance, current neural models need to be pre-trained on huge amounts of raw text data. This dependence on uncurated data has at least four indirect and unintended consequences: 1) Uncurated data tends to be linguistically and culturally non-diverse due to the statistical dominance of major languages and dialects in online texts (English vs. North Frisian, US English vs. UK English, etc.). 2) Pre-trained neural models such as the ubiquitous pre-trained language models (PLMs) reproduce the features present in the data, including human biases. 3) Rare phenomena (or languages) in the ’long tail’ are often not sufficiently taken into account in model evaluation, leading to an overestimation of model performance, especially in real-world application scenarios. 4) The focus on achieving state-of-the-art results through transfer learning with giant PLMs such as GPT-4 or mT5 often sidelines alternative methods that are more accessible, efficient and sustainable. Because these problems undermine inclusion and trust, TRAILS focuses on three main research directions to address them: (i) inclusion of underrepresented languages and cultures through multilingual and culturally sensitive NLP, (ii) robustness and fairness with respect to long-tail phenomena and classes and ’trustworthy content’, and (iii) robust and efficient NLP models that enable training and deployment of models for (i) and (ii). We also partially address economic inequality by aiming for more efficient models (objective (iii)), which directly translates into a lower resource/cost footprint.
BIFOLD
BIFOLD conducts foundational research in big data management and machine learning, as well as their intersection, to educate future talent and to create high-impact knowledge exchange. The Berlin Institute for the Foundations of Learning and Data (BIFOLD) emerged in 2019 from the merger of two national Artificial Intelligence Competence Centers: the Berlin Big Data Center (BBDC) and the Berlin Center for Machine Learning (BZML). Embedded in the vibrant Berlin metropolitan area, BIFOLD provides an outstanding scientific environment and numerous collaboration opportunities for national and international researchers. BIFOLD offers a broad range of research topics as well as a platform for interdisciplinary research and knowledge exchange with the sciences and humanities, industry, startups and society. Within BIFOLD, DFKI SLT conducts research in Clinical AI, specifically addressing the task of pharmacovigilance. Pharmacovigilance is concerned with the assessment and prevention of adverse drug reactions (ADRs) caused by pharmaceutical products. As medication use is rising worldwide, the potential risk of unwanted side effects such as ADRs is constantly increasing. Patients exchange views in their own language, as ’experts in their own right’, on social media and in disease-specific forums. Our project addresses the detection and extraction of ADRs from medical forums and social media across different languages, using cross-lingual transfer learning in combination with external knowledge sources.
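One plausible way to picture the cross-lingual transfer setting described above is to cast ADR extraction as token classification with a multilingual encoder. The sketch below is illustrative only: the model name, label set, helper function and example sentence are assumptions for demonstration, not the project's actual pipeline or external knowledge integration.

```python
# Illustrative sketch: cross-lingual ADR span detection as BIO token
# classification with a multilingual encoder. Model name, labels, and the
# example post are hypothetical, not the project's actual configuration.
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

MODEL_NAME = "xlm-roberta-base"      # assumed multilingual backbone
LABELS = ["O", "B-ADR", "I-ADR"]     # BIO tags for adverse drug reaction spans

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS)
)

def tag_adr_spans(text: str) -> list[tuple[str, str]]:
    """Label each subword token of a patient post with a BIO ADR tag."""
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**enc).logits          # (1, seq_len, num_labels)
    pred_ids = logits.argmax(dim=-1)[0].tolist()
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    return [(tok, LABELS[i]) for tok, i in zip(tokens, pred_ids)]

# After fine-tuning on annotated English ADR data, the same model could be
# applied zero-shot to posts in other languages, e.g. German:
print(tag_adr_spans("Seit ich das Medikament nehme, habe ich starke Kopfschmerzen."))
```

In such a setup, external knowledge sources (e.g. drug lexicons or medical terminologies) would be layered on top of the multilingual baseline rather than replacing it.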
BBDC2
In order to optimally prepare industry, science, and society in Germany and Europe for the global Big Data trend, highly coordinated activities in research, teaching, and technology transfer regarding the integration of data analysis methods and scalable data processing are required. To achieve this, the Berlin Big Data Center is pursuing the following seven objectives: 1) Pooling expertise in scalable data management, data analytics, and big data applications. 2) Conducting fundamental research to develop novel and automatically scalable technologies capable of performing ‘Deep Analysis’ of ‘Big Data’. 3) Developing an integrated, declarative, highly scalable open-source system that enables the specification, automatic optimization, parallelization and hardware adaptation, and fault-tolerant, efficient execution of advanced data analysis problems, using varying methods (e.g., drawn from machine learning, linear algebra, statistics and probability theory, computational linguistics, or signal processing), leveraging our work on Apache Flink. 4) Transferring technology and know-how to support innovation in companies and startups. 5) Educating data scientists with respect to the five big data dimensions (i.e., applications, economic, legal, social, and technological) via leading educational programs. 6) Empowering people to leverage ‘Smart Data’, i.e., to discover new insights in their massive data sets. 7) Enabling the general public to conduct sound data-driven decision-making.
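The declarative style of analysis program named in objective 3 can be pictured with a minimal Apache Flink Table API sketch. The data, field names, and aggregation below are made up for illustration and do not come from the BBDC2 system itself; the point is only that the program states what to compute while the engine handles optimization, parallelization, and execution.

```python
# Minimal sketch with illustrative data: a declarative analysis using the
# Apache Flink (PyFlink) Table API. The runtime plans and parallelizes the
# job; the program only specifies the desired result.
from pyflink.table import EnvironmentSettings, TableEnvironment
from pyflink.table.expressions import col

t_env = TableEnvironment.create(EnvironmentSettings.in_batch_mode())

# Toy input: (language, token count) records standing in for a large dataset.
records = t_env.from_elements(
    [("en", 120), ("de", 80), ("en", 45), ("fr", 60)],
    ["lang", "tokens"],
)

# Declarative specification: group and aggregate; Flink optimizes the plan.
totals = (
    records.group_by(col("lang"))
           .select(col("lang"), col("tokens").sum.alias("total_tokens"))
)

totals.execute().print()
```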