3 research outputs found

SemClinBr -- a multi institutional and multi specialty semantically annotated corpus for Portuguese clinical NLP tasks

Author: Carvalho Deborah Ribeiro
Cintho Lilian Mie Mukai
da Silva Adalniza Moura Pucca
Gebeluca Caroline P.
Gumiel Yohan Bonescki
Hasan Sadid A.
Moro Claudia Maria Cabral
Oliveira Lucas Emanuel Silva e
Peters Ana Carolina
Publication date: 27/01/2020

The high volume of research focusing on extracting patient information from electronic health records (EHR) has led to an increase in the demand for annotated corpora, which are a very valuable resource for both the development and evaluation of natural language processing (NLP) algorithms. The absence of a multi-purpose clinical corpus outside the scope of the English language, especially in Brazilian Portuguese, is glaring and severely impacts scientific progress in the biomedical NLP field. In this study, we developed a semantically annotated corpus using clinical texts from multiple medical specialties, document types, and institutions. We present the following: (1) a survey listing common aspects and lessons learned from previous research, (2) a fine-grained annotation schema which could be replicated and guide other annotation initiatives, (3) a web-based annotation tool focusing on an annotation suggestion feature, and (4) both intrinsic and extrinsic evaluation of the annotations. The result of this work is SemClinBr, a corpus that has 1,000 clinical notes, labeled with 65,117 entities and 11,263 relations, and can support a variety of clinical NLP tasks and boost the EHR's secondary use for the Portuguese language.

arXiv.org e-Print Archive

TRIPOD - Text-based Risk Prioritisation of Dermatological Clinical Notes

Author: Catarina Magalhães Dias
Publication date: 07/07/2021

Repositório Aberto da Universidade do Porto

Supervised learning for the detection of negation and of its scope in French and Brazilian Portuguese biomedical corpora

Author: Cabral Moro Claudia
Carvalho Deborah
Claveau Vincent
Dalloux Clément
Grabar Natalia
Gumiel Yohan
Oliveira Lucas
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/06/2020

Automatic detection of negated content is often a prerequisite in information extraction systems in various domains. In the biomedical domain especially, this task is important because negation plays an important role. In this work, two main contributions are proposed. First, we work with languages which have been poorly addressed up to now: Brazilian Portuguese and French. Thus, we developed new corpora for these two languages which have been manually annotated for marking up the negation cues and their scope. Second, we propose automatic methods based on supervised machine learning approaches for the automatic detection of negation marks and of their scopes. The methods prove to be robust in both languages (Brazilian Portuguese and French) and in cross-domain (general and biomedical languages) contexts. The approach is also validated on English data from the state of the art: it yields very good results and outperforms other existing approaches. Besides, the application is accessible and usable online. We assume that, through these contributions (new annotated corpora, application accessible online, and cross-domain robustness), the reproducibility of the results and the robustness of the NLP applications will be augmented.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1