Named Entity Extraction from Russian-Language Documents with Varying Degrees of Structure

Author: Maria D. Averina
Olga A. Levanova
Мария Дмитриевна Аверина
Ольга Александровна Леванова
Publication venue: Yaroslavl State University
Publication date: 11/12/2023

This work addresses named entity recognition in Russian-language texts using a CRF model. Two data sets were considered: refinancing documents with a well-defined structure, and semi-structured court records. The model was tested with various sets of text features and CRF parameters (optimization algorithms). Averaged over all entities, the best F-measure was 0.99 for structured documents and 0.86 for semi-structured ones.

Modeling and Analysis of Information Systems / Моделирование и анализ информационных систем (МАИС)
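
The abstract above mentions testing a CRF under "various sets of text features" without listing them. As a rough illustration only, the sketch below shows the kind of per-token feature dictionary that CRF toolkits commonly accept. Every feature name here (`lower`, `is_title`, `suffix3`, `prev_lower`, and so on) and both function names are assumptions made for this example, not the authors' feature set.

```python
def token_features(tokens, i):
    """Build a feature dict for tokens[i].

    The features are illustrative assumptions, not those used in the paper.
    """
    word = tokens[i]
    features = {
        "lower": word.lower(),          # case-normalised word form
        "is_title": word.istitle(),     # capitalised, e.g. a surname
        "is_upper": word.isupper(),     # all caps, e.g. an abbreviation
        "is_digit": word.isdigit(),     # purely numeric token
        "suffix3": word[-3:].lower(),   # last three characters
    }
    # Context features: neighbouring words, or sentence-boundary flags.
    if i > 0:
        features["prev_lower"] = tokens[i - 1].lower()
    else:
        features["BOS"] = True
    if i < len(tokens) - 1:
        features["next_lower"] = tokens[i + 1].lower()
    else:
        features["EOS"] = True
    return features


def sentence_features(tokens):
    """Feature dicts for every token of one sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]
```

A sequence of such dictionaries, paired with a sequence of entity labels, would be one training instance; swapping feature subsets in and out is one plausible way to run the kind of comparison the abstract describes.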

A Marker-based Neural Network System for Extracting Social Determinants of Health

Author: Rios Anthony
Zhao Xingmeng
Publication date: 24/12/2022

Objective. The impact of social determinants of health (SDoH) on patients' healthcare quality and on disparities is well known. Many SDoH items are not coded in structured forms in electronic health records. These items are often captured in free-text clinical notes, but there are limited methods for extracting them automatically. We explore a multi-stage pipeline involving named entity recognition (NER), relation classification (RC), and text classification methods to extract SDoH information from clinical notes automatically. Materials and Methods. The study uses the N2C2 Shared Task data, which was collected from two sources of clinical notes: MIMIC-III and University of Washington Harborview Medical Centers. It contains 4480 social history sections with full annotation for twelve SDoHs. To handle overlapping entities, we developed a novel marker-based NER model and used it in a multi-stage pipeline to extract SDoH information from clinical notes. Results. Our marker-based system outperformed state-of-the-art span-based models at handling overlapping entities, as measured by the overall Micro-F1 score. It also achieved state-of-the-art performance compared to the shared task methods. Conclusion. The major finding of this study is that the multi-stage pipeline effectively extracts SDoH information from clinical notes. This approach can potentially improve the understanding and tracking of SDoHs in clinical settings. However, error propagation may be an issue, and further research is needed to improve the extraction of entities with complex semantic meanings and of low-resource entities using external knowledge.

arXiv.org e-Print Archive
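
The abstract above reports results as an overall Micro-F1 score. For readers unfamiliar with the metric, here is a minimal sketch of micro-averaging: true positives, false positives, and false negatives are pooled across all entity types before precision and recall are computed. The function name `micro_f1`, the input layout, and the counts in the usage note are hypothetical and are not taken from the paper or from the shared task's scorer.

```python
def micro_f1(counts):
    """Micro-averaged F1.

    `counts` maps an entity type to a (tp, fp, fn) tuple. This layout is an
    assumption made for the example.
    """
    # Pool the counts over all entity types first (this is what "micro" means).
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With made-up counts such as `{"type_a": (8, 2, 2), "type_b": (2, 0, 2)}`, the pooled totals are 10 true positives, 2 false positives, and 4 false negatives, so frequent entity types dominate the score more than they would under macro-averaging.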

SinNer@Clef-Hipe2020 : Sinful adaptation of SotA models for Named Entity Recognition in French and German

Author: Dupont Yoann
Lejeune Gaël
Ortiz Suárez Pedro Javier
Tian Tian
Publication venue: HAL CCSD
Publication date: 23/09/2020

In this article we present the approaches developed by the Sorbonne-INRIA for NER (SinNer) team for the CLEF-HIPE 2020 challenge on Named Entity Processing on old newspapers. The challenge proposed various tasks for three languages; among them, we focused on Named Entity Recognition in French and German texts. The best system we proposed ranked third for these two languages; it uses FastText embeddings and ELMo language models (FrELMo and German ELMo). We show that combining several word representations enhances the quality of the results for all NE types and that sentence segmentation has an important impact on the results.

INRIA a CCSD electronic archive server

HAL-Rennes 1