1

Cross-lingual Approaches for the Detection of Adverse Drug Reactions in German from a Patient's Perspective

In this work, we present the first corpus for German Adverse Drug Reaction (ADR) detection in patient-generated content. The data consists of 4,169 binary annotated documents from a German patient forum, where users talk about health issues and get …

Generating Extended and Multilingual Summaries with Pre-trained Transformers

Almost all summarisation methods and datasets focus on a single language and short summaries. We introduce a new dataset called WikinewsSum for English, German, French, Spanish, Portuguese, Polish, and Italian summarisation tailored for extended …

MobASA: Corpus for Aspect-based Sentiment Analysis and Social Inclusion in the Mobility Domain

In this paper we show how aspect-based sentiment analysis might help public transport companies to improve their social responsibility for accessible travel. We present MobASA: a novel German-language corpus of tweets annotated with their relevance …

Subjective Text Complexity Assessment for German

For different reasons, text can be difficult to read and understand for many people, especially if the text’s language is too complex. In order to provide suitable text for the target audience, it is necessary to measure its complexity. In this paper …

Mediators: Conversational Agents Explaining NLP Model Behavior

The human-centric explainable artificial intelligence (HCXAI) community has raised the need for framing the explanation process as a conversation between human and machine. In this position paper, we establish desiderata for Mediators, text-based …

Claim Extraction and Law Matching for COVID-19-related Legislation

To cope with the COVID-19 pandemic, many jurisdictions have introduced new or altered existing legislation. Even though these new rules are often communicated to the public in news articles, it remains challenging for laypersons to learn about what …

Specialized Document Embeddings for Aspect-based Similarity of Research Papers

Document embeddings and similarity measures underpin content-based recommender systems, whereby a document is commonly represented as a single generic embedding. However, similarity computed on single vector representations provides only one …

HiStruct+: Improving Extractive Text Summarization with Hierarchical Structure Information

Transformer-based language models usually treat texts as linear sequences. However, most texts also have an inherent hierarchical structure, i.e., parts of a text can be identified using their position in this hierarchy. In addition, section titles …

Perceptual Quality Dimensions of Machine-Generated Text with a Focus on Machine Translation

The quality of machine-generated text is a complex construct consisting of various aspects and dimensions. We present a study that aims to uncover relevant perceptual quality dimensions for one type of machine-generated text, that is, Machine …

A Comparative Study of Pre-trained Encoders for Low-Resource Named Entity Recognition

Pre-trained language models (PLM) are effective components of few-shot named entity recognition (NER) approaches when augmented with continued pre-training on task-specific out-of-domain data or fine-tuning on in-domain data. However, their …