
Anotació del focus de la negació i de la temporalitat en informes mèdics [Annotation of negation focus and temporality in medical reports]

Author: Tañá Velasco Laura
Publication date: 15/09/2021

Master's degree in Digital Humanities (Màster d'Humanitats Digitals), Faculty of Information and Audiovisual Media, Universitat de Barcelona. Academic year: 2020-2021. Supervisor: Taulé Delor, Mariona. In this work, "Anotació del focus de la negació i de la temporalitat en el domini mèdic" (Annotation of negation focus and temporality in the medical domain), we describe the characteristics of the medical sublanguage and focus on how negation focus is handled in medical-domain documents used to train machine-learning-based negation detection systems. In information extraction, the expression of negation remains a problematic aspect, even though handling it correctly is important for understanding texts. We aim to contribute to the study of negation focus and to create a new linguistic resource: the ClUB-21 corpus and its corresponding annotation guidelines. We also address temporality and the different types of temporal expressions, because of the ambiguity they introduce when identifying the focus of negation.

Dipòsit Digital de la Universitat de Barcelona

Contributions to information extraction for Spanish written biomedical text

Author: Pérez Miguel Naiara
Publication date: 28/03/2023

285 p. Healthcare practice and clinical research produce vast amounts of digitised, unstructured data in multiple languages that remain underexploited, despite their potential to improve healthcare experiences, support trainee education, or enable biomedical research. Automatically transforming this content into relevant, structured information requires advanced Natural Language Processing (NLP) mechanisms; in NLP, this task is known as Information Extraction. Our work falls within the growing field of clinical NLP for Spanish and tackles three distinct problems. First, we compare several supervised machine learning approaches to the problem of sensitive data detection and classification, studying the different approaches and their transferability across two corpora, one synthetic and one authentic. Second, we present and evaluate UMLSmapper, a knowledge-intensive system for biomedical term identification based on the UMLS Metathesaurus. This system recognises and codifies terms without relying on annotated data or on external Named Entity Recognition tools. Although technically naive, it performs on par with more sophisticated systems and does not deviate considerably from approaches that rely on oracle terms. Finally, we present and exploit NUBes, a new corpus of real health records manually annotated with negation and uncertainty information. This corpus is the basis for two sets of experiments, one on cue and scope detection and the other on assertion classification. Throughout the thesis, we apply and compare techniques of varying sophistication and novelty, which reflects the rapid advancement of the field.

Archivo Digital para la Docencia y la Investigación
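The second abstract refers to negation cue and scope detection. As a rough illustration of those two terms, the sketch below marks a negation cue (a word such as "sin") and its scope (the tokens the cue negates) using a simple fixed-window heuristic of the kind found in rule-based negation detectors. It is not the method of either thesis: the cue list, the scope-breaking tokens, the function name detect_negation, and the example sentence are all hypothetical, chosen only for illustration.

```python
# Toy, rule-based negation cue/scope marker (hypothetical; not the NUBes or ClUB-21 systems).
# Assumed cue words and scope-breaking tokens for Spanish clinical text.
NEGATION_CUES = {"no", "sin", "niega", "ausencia"}
SCOPE_BREAKERS = {".", ",", ";", "pero", "aunque"}


def detect_negation(tokens):
    """Return a list of (cue_index, scope_indices) pairs.

    The scope is assumed to run from the token right after the cue up to
    (but not including) the first scope-breaking token or the end of the list.
    """
    results = []
    for i, tok in enumerate(tokens):
        if tok.lower() in NEGATION_CUES:
            scope = []
            for j in range(i + 1, len(tokens)):
                if tokens[j].lower() in SCOPE_BREAKERS:
                    break  # punctuation or a contrastive conjunction closes the scope
                scope.append(j)
            results.append((i, scope))
    return results


# Hypothetical, pre-tokenised example sentence.
tokens = "Paciente sin fiebre ni tos , refiere dolor abdominal .".split()
for cue, scope in detect_negation(tokens):
    print(tokens[cue], "->", [tokens[k] for k in scope])
```

Running it prints `sin -> ['fiebre', 'ni', 'tos']`. A fixed window like this cannot recognise cases such as "ni" acting as a second cue, which is one reason annotated corpora are used to train cue and scope detectors instead of relying on hand-written rules.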