Named Entity Extraction from Russian-Language Documents with Varying Degrees of Structure
This work addresses named entity recognition for Russian-language texts using a CRF model. Two data sets were considered: well-structured refinancing documents and semi-structured court records. The model was tested with various sets of text features and CRF parameters (optimization algorithms). Averaged over all entities, the best F-measure was 0.99 for the structured documents and 0.86 for the semi-structured ones.
A Marker-based Neural Network System for Extracting Social Determinants of Health
Objective. The impact of social determinants of health (SDoH) on patients'
healthcare quality and disparities is well known. Many SDoH items are not
coded in structured forms in electronic health records. These items are often
captured in free-text clinical notes, but there are limited methods for
automatically extracting them. We explore a multi-stage pipeline involving
named entity recognition (NER), relation classification (RC), and text
classification methods to extract SDoH information from clinical notes
automatically.
Materials and Methods. The study uses the N2C2 Shared Task data, which was
collected from two sources of clinical notes: MIMIC-III and University of
Washington Harborview Medical Centers. It contains 4480 social history sections
with full annotation for twelve SDoHs. In order to handle the issue of
overlapping entities, we developed a novel marker-based NER model. We used it
in a multi-stage pipeline to extract SDoH information from clinical notes.
Results. Our marker-based system outperformed the state-of-the-art span-based
models at handling overlapping entities based on the overall Micro-F1 score
performance. It also achieved state-of-the-art performance compared to the
shared task methods.
Conclusion. The major finding of this study is that the multi-stage pipeline
effectively extracts SDoH information from clinical notes. This approach can
potentially improve the understanding and tracking of SDoHs in clinical
settings. However, error propagation may be an issue, and further research is
needed to improve the extraction of entities with complex semantic meanings and
low-resource entities using external knowledge.
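The Micro-F1 score mentioned in the Results can be illustrated with a minimal sketch. It assumes exact-match scoring of (start, end, label) spans, which may differ from the matching criteria actually used in the shared task; the function name `micro_f1` and the example labels are hypothetical and not taken from the paper.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over documents.

    gold, pred: lists (one item per document) of sets of
    (start, end, label) tuples. Exact span match is an assumption.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # spans predicted and present in gold
        fp += len(p - g)   # spans predicted but not in gold
        fn += len(g - p)   # gold spans that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations for two documents.
gold = [{(0, 2, "Drug"), (3, 5, "Alcohol")}, {(1, 4, "Employment")}]
pred = [{(0, 2, "Drug")}, {(1, 4, "Employment"), (6, 8, "Drug")}]
score = micro_f1(gold, pred)
```

Because counts are pooled across all entity types before precision and recall are computed, frequent entity types weigh more heavily in the score than rare ones.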
SinNer@Clef-Hipe2020 : Sinful adaptation of SotA models for Named Entity Recognition in French and German
In this article we present the approaches developed by the Sorbonne-INRIA for NER (SinNer) team for the CLEF-HIPE 2020 challenge on Named Entity Processing on old newspapers. The challenge proposed various tasks for three languages; among them, we focused on Named Entity Recognition in French and German texts. The best system we proposed ranked third for these two languages; it uses FastText embeddings and ELMo language models (FrELMo and German ELMo). We show that combining several word representations enhances the quality of the results for all NE types and that the segmentation into sentences has an important impact on the results.