223 research outputs found

    Mapping of electronic health records in Spanish to the unified medical language system metathesaurus

    This work presents a preliminary approach to annotating Spanish electronic health records with concepts from the Unified Medical Language System Metathesaurus. The prototype uses Apache Lucene® to index the Metathesaurus and generate mapping candidates from input text, and relies on UKB to resolve ambiguities. The tool was evaluated by measuring its agreement with MetaMap on two English-Spanish parallel corpora: one consisting of titles and abstracts of papers in the clinical domain, the other of excerpts from real electronic health records.

    Processamento automático de texto de narrativas clínicas (Automatic text processing of clinical narratives)

    The computerization of medical systems and the subsequent move by medical professionals from paper records to Electronic Health Records (EHR) has made healthcare safer and more efficient. EHR can also serve as a data source for observational studies around the world. However, an estimated 70-80% of all clinical data is unstructured free text, and the data that is structured does not all follow the same standards, which makes it difficult to use in such observational studies. This dissertation addresses these two problems by using natural language processing to extract concepts from free text and then a common data model to harmonize the data. The developed system employs an annotator, namely cTAKES, to extract the concepts from free text. The extracted concepts are then normalized using text preprocessing, word embeddings, MetaMap and UMLS Metathesaurus lookup. Finally, the normalized concepts are converted to the OMOP Common Data Model and stored in a database. The system was tested on the i2b2 2010 data set. Its components were tested and evaluated separately, with the concept extraction component achieving a precision, recall and F-score of 77.12%, 70.29% and 73.55%, respectively. The normalization component was evaluated on track 3 of the N2C2 2019 challenge, where it achieved an accuracy of 77.5%. Finally, 7.92% of the concepts were lost during conversion to the OMOP CDM. In conclusion, although the developed system still has room for improvement, it proves to be a viable method for automatically processing clinical narratives.
    Master's degree in Computer and Telematics Engineering
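The precision, recall and F-score reported for the concept extraction component are consistent with one another if the F-score is the balanced F1 (the harmonic mean of precision and recall). The abstract does not say which F-measure was used, so that is an assumption here. A minimal sketch of the arithmetic, with a hypothetical helper name `f_score`:

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-score" is the balanced F1 measure.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract, in percent.
precision, recall = 77.12, 70.29
f1 = f_score(precision, recall)
print(f"F-score: {f1:.2f}%")  # prints "F-score: 73.55%"
```

Under that assumption, the result matches the 73.55% given in the abstract.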

    Standardizing adverse drug event reporting data


    Clinical concept normalization on medical records using word embeddings and heuristics

    Electronic health records contain valuable information on patients' clinical history in the form of free text. Manually analyzing millions of these documents is infeasible, and automatic natural language processing methods are essential for exploiting these data efficiently. In this context, normalization of clinical entities, where the aim is to link entity mentions to reference vocabularies, is of utmost importance for successfully extracting knowledge from clinical narratives. In this paper we present sieve-based models combined with heuristics and word embeddings, and report the results of our participation in the 2019 n2c2 (National NLP Clinical Challenges) shared task on clinical concept normalization.

    Knowledge-based Biomedical Data Science 2019

    Knowledge-based biomedical data science (KBDS) involves the design and implementation of computer systems that act as if they knew about biomedicine. Such systems depend on formally represented knowledge in computer systems, often in the form of knowledge graphs. Here we survey the progress in the last year in systems that use formally represented knowledge to address data science problems in both clinical and biological domains, as well as on approaches for creating knowledge graphs. Major themes include the relationships between knowledge graphs and machine learning, the use of natural language processing, and the expansion of knowledge-based approaches to novel domains, such as Chinese Traditional Medicine and biodiversity.
    Comment: Manuscript 43 pages with 3 tables; Supplemental material 43 pages with 3 tables
