4 research outputs found

    Anotació del focus de la negació i de la temporalitat en informes mèdics (Annotation of the focus of negation and of temporality in medical reports)

    Master's in Digital Humanities, Facultat d'Informació i Mitjans Audiovisuals, Universitat de Barcelona. Academic year: 2020-2021. Supervisor: Taulé Delor, Mariona. In this work, Annotation of the focus of negation and of temporality in the medical domain, we present the characteristics of the medical sublanguage and concentrate on how the focus of negation is handled in medical-domain documents, with a view to training negation detection systems based on machine learning. In information extraction, the expression of negation remains problematic, even though handling it is important for understanding texts correctly. We aim to contribute to the study of the focus of negation and to create a new linguistic resource, the ClUB-21 corpus, together with its annotation guidelines. We also address temporality and the different types of temporal expressions, because of the ambiguity they create when identifying the focus of negation.

    Contributions to information extraction for Spanish written biomedical text

    285 p. Healthcare practice and clinical research produce vast amounts of digitised, unstructured data in multiple languages. These data are currently underexploited, despite their potential for improving healthcare experiences, supporting trainee education, or enabling biomedical research. Automatically transforming this content into relevant, structured information requires advanced Natural Language Processing (NLP) mechanisms. In NLP, this task is known as Information Extraction. Our work takes place within the growing field of clinical NLP for the Spanish language, where we tackle three distinct problems. First, we compare several supervised machine learning approaches to the problem of sensitive data detection and classification. Specifically, we study the different approaches and their transferability across two corpora, one synthetic and the other authentic. Second, we present and evaluate UMLSmapper, a knowledge-intensive system for biomedical term identification based on the UMLS Metathesaurus. This system recognises and codifies terms without relying on annotated data or external Named Entity Recognition tools. Although technically naive, it performs on par with more evolved systems, and does not exhibit a considerable deviation from other approaches that rely on oracle terms. Finally, we present and exploit NUBes, a new corpus of real health records manually annotated with negation and uncertainty information. This corpus is the basis for two sets of experiments, one on cue and scope detection, and the other on assertion classification. Throughout the thesis, we apply and compare techniques of varying levels of sophistication and novelty, which reflects the rapid advancement of the field.