440 research outputs found

    Using Distributed Representations to Disambiguate Biomedical and Clinical Concepts

    In this paper, we report a knowledge-based method for Word Sense Disambiguation in the domains of biomedical and clinical text. We combine word representations created on large corpora with a small number of definitions from the UMLS to create concept representations, which we then compare to representations of the context of ambiguous terms. Using no relational information, we obtain performance comparable to previous approaches on the MSH-WSD dataset, a well-known dataset in the biomedical domain. Additionally, our method is fast, easy to set up, and easy to extend to other domains. Supplementary materials, including source code, can be found at https://github.com/clips/yarn. Comment: 6 pages, 1 figure, presented at the 15th Workshop on Biomedical Natural Language Processing, Berlin 201
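    The comparison step described above can be sketched as follows, assuming that a concept is represented by the average of the word vectors in its definition and a context by the average of the vectors of its surrounding words, with cosine similarity choosing the sense. The averaging and cosine scoring are assumptions for illustration, not necessarily the paper's exact method, and the toy three-dimensional embeddings and concept IDs (`CONCEPT_COMMON_COLD`, `CONCEPT_LOW_TEMPERATURE`) are hypothetical placeholders rather than real UMLS CUIs.

```python
import math

def mean_vector(tokens, embeddings):
    """Average the vectors of the tokens that have an embedding; None if none do."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return None
    return [sum(component) / len(vecs) for component in zip(*vecs)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def disambiguate(context_tokens, concept_definitions, embeddings):
    """Return the concept whose definition vector is most similar to the context vector."""
    context_vec = mean_vector(context_tokens, embeddings)
    if context_vec is None:
        return None
    best_concept, best_score = None, float("-inf")
    for concept, definition_tokens in concept_definitions.items():
        concept_vec = mean_vector(definition_tokens, embeddings)
        if concept_vec is None:
            continue  # no usable words in this definition
        score = cosine(context_vec, concept_vec)
        if score > best_score:
            best_concept, best_score = concept, score
    return best_concept

# Hypothetical toy data: 3-dimensional "embeddings" and two made-up senses of "cold".
EMBEDDINGS = {
    "virus": [0.9, 0.1, 0.0],
    "infection": [0.8, 0.2, 0.1],
    "sneezing": [0.85, 0.05, 0.1],
    "low": [0.1, 0.8, 0.1],
    "temperature": [0.0, 0.9, 0.2],
    "weather": [0.05, 0.85, 0.3],
}
DEFINITIONS = {
    "CONCEPT_COMMON_COLD": ["virus", "infection"],
    "CONCEPT_LOW_TEMPERATURE": ["low", "temperature"],
}

print(disambiguate(["sneezing", "virus"], DEFINITIONS, EMBEDDINGS))  # prints CONCEPT_COMMON_COLD
```

    Because no relational information is used, the only knowledge needed per concept is a short definition, which is what makes this kind of approach cheap to set up for a new domain.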

    Doctor of Philosophy

    Dissertation
    Domain adaptation of natural language processing systems is challenging because it requires human expertise. While manual effort is effective in creating a high-quality knowledge base, it is expensive and time-consuming. Clinical text adds another layer of complexity to the task because privacy and confidentiality restrictions hinder the sharing of training corpora among different research groups. Semantic ambiguity is a major barrier to effective and accurate concept recognition by natural language processing systems. In my research, I propose an automated domain adaptation method that uses a sublanguage semantic schema for all-words word sense disambiguation of clinical narrative. According to the sublanguage theory developed by Zellig Harris, domain-specific language is characterized by a relatively small set of semantic classes that combine into a small number of sentence types. Previous research relied on manual analysis to create language models that could be used for more effective natural language processing. Building on previous semantic type disambiguation research, I propose a method of resolving semantic ambiguity that uses automatically acquired semantic type disambiguation rules, applied to clinical text that is ambiguously mapped to a standard set of concepts. This research aims to provide an automatic method to acquire a Sublanguage Semantic Schema (S3) and to apply this model to disambiguate terms that map to more than one concept with different semantic types. The research is conducted using an unmodified 2009 version of MetaMap, a concept recognition system provided by the National Library of Medicine, applied to a large set of clinical text. The project includes creating and comparing models based on the unambiguous concept mappings found in seventeen clinical note types. The effectiveness of the final application was validated through a manual review of a subset of processed clinical notes, using recall, precision, and F-score metrics.
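    A heavily simplified sketch of the idea of learning semantic type disambiguation rules from unambiguous mappings, assuming that a "rule" can be reduced to sentence-level co-occurrence counts between semantic types. The input format (one list of semantic type labels per sentence), the function names, and the training data are hypothetical; the dissertation's actual S3 acquisition procedure is likely more elaborate.

```python
from collections import Counter, defaultdict

def learn_type_cooccurrence(unambiguous_sentences):
    """Count how often each semantic type appears alongside each other type in a sentence.

    Each sentence is a list of semantic type labels taken from unambiguous mappings.
    """
    cooc = defaultdict(Counter)
    for types in unambiguous_sentences:
        for i, t in enumerate(types):
            for j, other in enumerate(types):
                if i != j:
                    cooc[t][other] += 1
    return cooc

def choose_type(candidate_types, context_types, cooc):
    """Pick the candidate type that co-occurred most with the context types (ties: first listed)."""
    return max(candidate_types, key=lambda t: sum(cooc[t][c] for c in context_types))

# Hypothetical training data: semantic types of unambiguously mapped terms per sentence.
TRAINING = [
    ["Disease or Syndrome", "Pharmacologic Substance"],
    ["Disease or Syndrome", "Pharmacologic Substance", "Sign or Symptom"],
    ["Laboratory Procedure", "Laboratory or Test Result"],
]
cooc = learn_type_cooccurrence(TRAINING)

# An ambiguous term maps to two concepts with different semantic types.
print(choose_type(["Laboratory Procedure", "Disease or Syndrome"],
                  ["Pharmacologic Substance"], cooc))  # prints Disease or Syndrome
```

    Learning only from unambiguous mappings is what keeps the procedure automatic: no manually disambiguated training data is needed, only the concept recognizer's output.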

    GRAPH-BASED WORD SENSE DISAMBIGUATION FOR CLINICAL ABBREVIATIONS USING APACHE SPARK

    Identifying the correct sense of an ambiguous word is one of the major challenges for language processing in all domains. Word Sense Disambiguation is the task of identifying the correct sense of an ambiguous word by referencing the context surrounding the word. Like other narrative documents, clinical documents suffer from ambiguity issues that affect automatic extraction of the correct sense from the document. In this project, we propose a graph-based solution for word sense disambiguation that focuses specifically on clinical text and is based on an algorithm originally implemented by Osmar R. Zaine et al. The algorithm uses the UMLS Metathesaurus as its knowledge source. As an enhancement to the existing implementation of the algorithm, this project uses Apache Spark, a big data technology for cluster-based distributed processing, to optimize performance.
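    The general scoring idea behind graph-based disambiguation can be sketched on a single machine as follows. This is not the Zaine et al. algorithm or the project's Spark implementation: it assumes a small hypothetical concept graph in place of UMLS Metathesaurus relations, and it scores each candidate sense by its closeness to the concepts found in the context (the sum of inverse hop distances within a hypothetical `max_depth` limit).

```python
from collections import deque

def build_graph(edges):
    """Build an undirected adjacency map from (concept, concept) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def hop_distances(graph, start, max_depth):
    """Breadth-first search: hop distance from start to every node within max_depth."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def score_sense(graph, sense, context_concepts, max_depth=2):
    """Sum of 1/distance over context concepts reachable from the sense (excluding itself)."""
    dist = hop_distances(graph, sense, max_depth)
    return sum(1.0 / dist[c] for c in context_concepts if dist.get(c, 0) > 0)

def disambiguate(graph, candidate_senses, context_concepts, max_depth=2):
    """Return the candidate sense with the highest graph score."""
    return max(candidate_senses,
               key=lambda s: score_sense(graph, s, context_concepts, max_depth))

# Hypothetical toy graph for the abbreviation "MS".
GRAPH = build_graph([
    ("multiple_sclerosis", "demyelination"),
    ("demyelination", "optic_neuritis"),
    ("multiple_sclerosis", "interferon_beta"),
    ("mitral_stenosis", "mitral_valve"),
    ("mitral_valve", "heart_murmur"),
    ("mitral_stenosis", "atrial_fibrillation"),
])
print(disambiguate(GRAPH, ["multiple_sclerosis", "mitral_stenosis"],
                   ["optic_neuritis", "interferon_beta"]))  # prints multiple_sclerosis
```

    Each abbreviation occurrence is scored independently, so in a cluster setting such as Spark, per-occurrence scoring is one plausible unit of parallel work; that distribution layer is omitted here.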

    High Throughput Neurological Phenotyping with MetaMap

    The phenotyping of neurological patients involves the conversion of signs and symptoms into machine-readable codes selected from an appropriate ontology. This phenotyping is manual and laborious. MetaMap is used for high-throughput mapping of the medical literature to concepts in the Unified Medical Language System (UMLS) Metathesaurus. MetaMap was evaluated as a tool for the high-throughput phenotyping of neurological patients. Based on 15 patient histories from electronic health records, 30 patient histories from neurology textbooks, and 20 clinical summaries from the Online Mendelian Inheritance in Man repository, MetaMap showed a recall of 61-89%, a precision of 84-93%, and an accuracy of 56-84% for the identification of phenotype concepts. The most common cause of false negatives (failure to recognize a phenotype concept) was MetaMap's inability to find concepts that were represented as a description or a definition of the concept. The most common cause of false positives (incorrect identification of a concept in the text) was a failure to recognize that a concept was negated. MetaMap shows potential for high-throughput phenotyping of neurological patients if the problems of false negatives and false positives can be solved.
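    How false positives and false negatives translate into precision and recall can be sketched as below, assuming a set-based comparison of predicted and gold-standard concepts for a single document. The phenotype labels are hypothetical, and accuracy is left out because the abstract does not say how it was computed.

```python
def evaluate(predicted, gold):
    """Precision, recall, and F-score of predicted concepts against a gold standard."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # concepts correctly identified
    fp = len(predicted - gold)  # e.g. a negated finding reported as present
    fn = len(gold - predicted)  # e.g. a concept described rather than named
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_score

# Hypothetical phenotype labels for one patient history.
gold = {"ataxia", "dysarthria", "nystagmus", "ptosis"}
predicted = {"ataxia", "dysarthria", "nystagmus", "tremor"}
print(evaluate(predicted, gold))  # prints (0.75, 0.75, 0.75)
```

    A missed concept lowers recall only, while a negated finding reported as present lowers precision only, which is why the two error types discussed above are reported separately.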

    Disambiguating Clinical Abbreviations using Pre-trained Word Embeddings

    Thanks to Palestine Technical University-Kadoorie and the Deep EMR project (TIN2017-87548-C2-1-R) for partially funding this work.

    Using nurses’ natural language entries to build a concept-oriented terminology for patients’ chief complaints in the emergency department

    Information about the chief complaint (CC), also known as the patient's reason for seeking emergency care, is critical for prioritizing patients for treatment and for determining patient flow through the emergency department (ED). Triage nurses document the CC at the start of the ED visit, and the data are increasingly available in electronic form. Despite the clinical and operational significance of the CC to the ED, there is no standard CC terminology. We propose the construction of concept-oriented nursing terminologies from the actual language used by experts. We use text analysis to extract CC concepts from triage nurses' natural language entries. Our methodology for building the nursing terminology uses natural language processing techniques and the Unified Medical Language System (UMLS).
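    A toy sketch of turning free-text triage entries into concept-oriented groups, assuming a hand-written abbreviation table and a small phrase-to-concept lexicon stand in for the paper's NLP pipeline and UMLS lookup. `ABBREVIATIONS`, `LEXICON`, and the `CC_*` concept labels are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation expansions and phrase-to-concept lexicon (not UMLS content).
ABBREVIATIONS = {"sob": "shortness of breath", "cp": "chest pain", "abd": "abdominal"}
LEXICON = {
    "shortness of breath": "CC_DYSPNEA",
    "difficulty breathing": "CC_DYSPNEA",
    "chest pain": "CC_CHEST_PAIN",
    "abdominal pain": "CC_ABDOMINAL_PAIN",
}

def normalize(entry):
    """Lowercase, keep alphabetic tokens only, and expand known abbreviations."""
    tokens = re.findall(r"[a-z]+", entry.lower())
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)

def map_entry(entry):
    """Return the sorted concepts whose lexicon phrase appears as whole words in the entry."""
    text = f" {normalize(entry)} "
    return sorted({concept for phrase, concept in LEXICON.items() if f" {phrase} " in text})

def build_terminology(entries):
    """Group raw nurse entries under the concepts they map to."""
    terminology = defaultdict(list)
    for entry in entries:
        for concept in map_entry(entry):
            terminology[concept].append(entry)
    return dict(terminology)

print(map_entry("CP and SOB"))  # prints ['CC_CHEST_PAIN', 'CC_DYSPNEA']
```

    Grouping the original entries under each concept keeps the nurses' actual wording attached to the terminology, which matches the goal of building the terminology from the language experts really use.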