223 research outputs found
Mapping of electronic health records in Spanish to the unified medical language system metathesaurus
This work presents a preliminary approach to annotating Spanish electronic health records with concepts from the Unified Medical Language System (UMLS) Metathesaurus. The prototype uses Apache Lucene® to index the Metathesaurus and generate mapping candidates from input text, and it relies on UKB to resolve ambiguities. The tool was evaluated by measuring its agreement with MetaMap on two English-Spanish parallel corpora: one consisting of titles and abstracts of papers in the clinical domain, and the other of excerpts from real electronic health records.
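The candidate-generation step described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: a plain in-memory inverted index stands in for Apache Lucene, the term table is a toy one with placeholder concept ids (not real UMLS CUIs), and the function names are hypothetical. Disambiguation with UKB is not covered.

```python
from collections import defaultdict

# Toy term table: (concept id, Spanish term). The ids are hypothetical
# placeholders, not real UMLS CUIs.
TERMS = [
    ("X001", "dolor de cabeza"),
    ("X001", "cefalea"),
    ("X002", "dolor abdominal"),
    ("X003", "fiebre"),
]

def tokenize(text):
    """Lowercase, split on whitespace, keep purely alphabetic tokens."""
    return [t for t in text.lower().split() if t.isalpha()]

def build_index(terms):
    """Inverted index: token -> set of row numbers in the term table."""
    index = defaultdict(set)
    for row, (_, term) in enumerate(terms):
        for tok in tokenize(term):
            index[tok].add(row)
    return index

def candidates(text, terms, index):
    """Return (score, concept id, term) tuples, best first.

    The score is the fraction of the term's tokens found in the text.
    """
    text_tokens = set(tokenize(text))
    rows = set()
    for tok in text_tokens:
        rows |= index.get(tok, set())
    scored = []
    for row in rows:
        cid, term = terms[row]
        term_tokens = set(tokenize(term))
        score = len(term_tokens & text_tokens) / len(term_tokens)
        scored.append((score, cid, term))
    return sorted(scored, key=lambda s: (-s[0], s[1], s[2]))

INDEX = build_index(TERMS)
result = candidates("Paciente con fiebre y dolor de cabeza", TERMS, INDEX)
for score, cid, term in result:
    print(f"{score:.2f}  {cid}  {term}")
```

A partially matching term such as "dolor abdominal" still appears as a lower-scored candidate, which is the kind of ambiguity a later disambiguation step would have to resolve.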
Automatic text processing of clinical narratives (Processamento automático de texto de narrativas clínicas)
The informatization of medical systems and the subsequent move by
medical professionals from paper records to Electronic Health Records (EHR)
allowed for safer and more efficient healthcare. Additionally,
EHR can also be used as a data source for observational studies
around the world. However, it is estimated that 70-80% of all clinical data
is in the form of unstructured free text, and the data that is structured
does not all follow the same standards, making it difficult to use in
such observational studies.
This dissertation aims to tackle those two problems by using natural language
processing to extract concepts from free text and then using
a common data model to harmonize the data. The developed system
employs an annotator, namely cTAKES, to extract the concepts from free
text. The extracted concepts are then normalized using text preprocessing,
word embeddings, MetaMap and UMLS Metathesaurus lookup. Finally, the
normalized concepts are converted to the OMOP Common Data Model and
stored in a database.
To test the developed system, the i2b2 2010 data set was used.
The different components of the system were tested and evaluated separately,
with the concept extraction component achieving a precision, recall
and F-score of 77.12%, 70.29% and 73.55%, respectively. The normalization
component was evaluated on track 3 of the n2c2 2019 challenge,
where it achieved 77.5% accuracy. Finally, in the OMOP
CDM conversion component, 7.92% of the concepts
were lost during the process. In conclusion, even though the developed system
still has room for improvement, it proves to be a viable method of
automatically processing clinical narratives.
Master's degree in Computer and Telematics Engineering (Mestrado em Engenharia de Computadores e Telemática)
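As a quick consistency check on the concept-extraction scores reported in this abstract, the sketch below recomputes the F-score from the stated precision and recall. It assumes the reported figure is the balanced F1 (the harmonic mean of precision and recall); the abstract does not say which F-measure was used, and the function name is only illustrative.

```python
def f_score(precision, recall):
    """Balanced F1: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the abstract, in percent.
f1 = f_score(77.12, 70.29)
print(round(f1, 2))  # 73.55
```

Under that assumption, the recomputed value agrees with the 73.55% given in the abstract.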
Clinical concept normalization on medical records using word embeddings and heuristics
Electronic health records contain valuable information on patients' clinical history in the form of free text. Manually analyzing millions of these documents is unfeasible and automatic natural language processing methods are essential for efficiently exploiting these data. Within this, normalization of clinical entities, where the aim is to link entity mentions to reference vocabularies, is of utmost importance to successfully extract knowledge from clinical narratives.
In this paper we present sieve-based models combined with heuristics and word embeddings, and report the results of our participation in the 2019 n2c2 (National NLP Clinical Challenges) shared task on clinical concept normalization.
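The general idea of a sieve-based normalizer can be sketched as follows. This is not the paper's system: the vocabulary, the placeholder concept ids, and the three sieves are all assumptions made for illustration, and token-overlap (Jaccard) similarity is used as a simple stand-in for the word-embedding similarity the abstract mentions. Sieves are tried from most to least precise, and the first one that returns a concept wins.

```python
import string

# Hypothetical reference vocabulary: term -> concept id (placeholder ids).
VOCAB = {
    "myocardial infarction": "X100",
    "heart attack": "X100",
    "type 2 diabetes mellitus": "X200",
}

def normalize(text):
    """Lowercase, drop punctuation, collapse whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())

def jaccard(a, b):
    """Token-set overlap between two strings."""
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b) if a | b else 0.0

def sieve_exact(mention):
    """Sieve 1: the mention is literally a vocabulary term."""
    return VOCAB.get(mention)

def sieve_normalized(mention):
    """Sieve 2: match after case and punctuation normalization."""
    norm = normalize(mention)
    for term, cid in VOCAB.items():
        if normalize(term) == norm:
            return cid
    return None

def sieve_similarity(mention, threshold=0.5):
    """Sieve 3: most similar term, if similar enough.

    Jaccard overlap stands in for embedding similarity here.
    """
    norm = normalize(mention)
    best = max(VOCAB, key=lambda term: jaccard(norm, normalize(term)))
    if jaccard(norm, normalize(best)) >= threshold:
        return VOCAB[best]
    return None

SIEVES = [sieve_exact, sieve_normalized, sieve_similarity]

def link(mention):
    """Return (concept id, name of the sieve that fired), or (None, None)."""
    for sieve in SIEVES:
        cid = sieve(mention)
        if cid is not None:
            return cid, sieve.__name__
    return None, None

for m in ["heart attack", "Myocardial Infarction.", "diabetes mellitus type 2", "fracture"]:
    print(m, "->", link(m))
```

Ordering the sieves from strict to permissive keeps high-precision matches from being overridden by looser ones, and a mention that passes no sieve is left unlinked rather than forced onto a concept.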
Knowledge-based Biomedical Data Science 2019
Knowledge-based biomedical data science (KBDS) involves the design and
implementation of computer systems that act as if they knew about biomedicine.
Such systems depend on formally represented knowledge in computer systems,
often in the form of knowledge graphs. Here we survey the progress in the last
year in systems that use formally represented knowledge to address data science
problems in both clinical and biological domains, as well as on approaches for
creating knowledge graphs. Major themes include the relationships between
knowledge graphs and machine learning, the use of natural language processing,
and the expansion of knowledge-based approaches to novel domains, such as
Chinese Traditional Medicine and biodiversity.
Comment: Manuscript 43 pages with 3 tables; Supplemental material 43 pages with 3 tables
- …