Unsupervised SapBERT-based bi-encoders for medical concept annotation of clinical narratives with SNOMED CT

Abstract

Clinical narratives provide comprehensive patient information. Achieving interoperability requires mapping the relevant details to standardized medical vocabularies. Natural language processing typically divides this task into named entity recognition (NER) and medical concept normalization (MCN). State-of-the-art results require supervised setups with abundant training data. However, annotated data are scarce because clinical text is sensitive and manual annotation is time-consuming. This study addressed the need for unsupervised medical concept annotation (MCA) to overcome these limitations and support the creation of annotated datasets.

Publication
SAGE Digital Health
Roland Roller
Senior Researcher