This repository contains corpus called MobASA: a novel German-language corpus of tweets annotated with their relevance for public transportation, and with sentiment towards aspects related to barrier-free travel. We identified and labeled topics important for passengers limited in their mobility due to disability, age, or when travelling with young children.
The data can be used for as a training or test corpus for aspect-oriented sentiment analysis. Moreover, the corpus can benefit building inclusive public transportation systems. You can find the corpus here: https://github.com/DFKI-NLP/sim3s-corpus, and the description of the corpus here: https://aclanthology.org/2022.csrnlp-1.5.pdf
This repository contains the DFKI MobIE Corpus (formerly "DAYSTREAM Corpus"), a dataset of 3,232 German-language documents collected between May 2015 - Apr 2019 that have been annotated with fine-grained geo-entities, such as location-street, location-stop and location-route, as well as standard named entity types (organization, date, number, etc). All location-related entities have been linked to either Open Street Map identifiers or database ids of Deutsche Bahn / Rhein-Main-Verkehrsverbund. The corpus has also been annotated with a set of 7 traffic-related n-ary relations and events, such as Accidents, Traffic jams, and Canceled Routes. It consists of Twitter messages, and traffic reports from e.g. radio stations, police and public transport providers. It allows for training and evaluating both named entity recognition algorithms that aim for fine-grained typing of geo-entities, entity linking of these entities, as well as n-ary relation extraction systems. You can find the description of the corpus here: https://www.dfki.de/web/forschung/projekte-publikationen/publikationen-uebersicht/publikation/11741/