44 research outputs found

    Mapping of electronic health records in Spanish to the unified medical language system metathesaurus

    This work presents a preliminary approach to annotating Spanish electronic health records with concepts from the Unified Medical Language System (UMLS) Metathesaurus. The prototype uses Apache Lucene to index the Metathesaurus and generate mapping candidates from input text, and it relies on UKB to resolve ambiguities. The tool was evaluated by measuring its agreement with MetaMap on two English-Spanish parallel corpora: one consisting of titles and abstracts of papers in the clinical domain, and the other of real electronic health record excerpts.

    Determining correspondences between high-frequency MedDRA concepts and SNOMED: a case study

    Background: The Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) is being advocated as the foundation for encoding clinical documentation. While the electronic medical record is likely to play a critical role in pharmacovigilance (the detection of adverse events due to medications), classification and reporting of adverse events is currently based on the Medical Dictionary for Regulatory Activities (MedDRA). Complete and high-quality MedDRA-to-SNOMED CT mappings can therefore facilitate pharmacovigilance. The existing mappings, as determined through the Unified Medical Language System (UMLS), are partial and record only one-to-one correspondences, even though SNOMED CT can be used compositionally. Efforts to map previously unmapped MedDRA concepts would be most productive if focused on concepts that occur frequently in actual adverse event data. We aimed to identify aspects of MedDRA that complicate mapping to SNOMED CT, to determine patterns in unmapped high-frequency MedDRA concepts, and to identify types of integration errors in the mapping of MedDRA to UMLS.
    Methods: Using one year's data from the US Food and Drug Administration's Adverse Event Reporting System, we identified MedDRA preferred terms that collectively accounted for 95% of both Adverse Events and Therapeutic Indications records. After eliminating those already mapped to SNOMED CT, we attempted to map the remaining 645 Adverse-Event and 141 Therapeutic-Indications preferred terms with software assistance.
    Results: All but 46 Adverse-Event and 7 Therapeutic-Indications preferred terms could be composed using SNOMED CT concepts; none of these required more than 3 SNOMED CT concepts to compose. We describe the common composition patterns in the paper. About 30% of both Adverse-Event and Therapeutic-Indications preferred terms corresponded to single SNOMED CT concepts: the correspondence was detectable by human inspection but had been missed during the integration process, which had created duplicated concepts in UMLS.
    Conclusions: Identification of composite mapping patterns, and of the types of errors that occur in the MedDRA content within UMLS, can focus larger-scale efforts on improving the quality of such mappings, which may assist in the creation of an adverse-events ontology.

    Enriching and designing metaschemas for the UMLS semantic network

    The disparate terminologies used by various biomedical applications and professionals make communication between them more difficult. The Unified Medical Language System (UMLS) of the National Library of Medicine (NLM) is an attempt to integrate different medical terminologies into a unified representation framework, in order to improve decision making and the quality of patient care as well as research in the health-care field. The Metathesaurus (META) and the Semantic Network (SN) are two main components of the UMLS, where the SN provides a high-level abstraction of the concepts in the META. This dissertation addresses three problems of the SN. First, the SN's two-tree structure is restrictive because it does not allow a semantic type to be a specialization of several other semantic types. This restriction leads to the omission of some subsumption knowledge in the SN. Second, the SN is too large and complex to comprehend easily, and it does not come with a pictorial representation for users. As a partial solution to this problem, several metaschemas were previously built as higher-level abstractions of the SN to help orient users. Third, there is no efficient method to evaluate each metaschema, and no technique to obtain a consolidated metaschema acceptable to a majority of the UMLS's users. The dissertation addresses these problems with the following approaches. (1) The SN was expanded into the Enriched Semantic Network (ESN), a multiple subsumption structure with a directed acyclic graph (DAG) IS-A hierarchy, allowing a semantic type to have multiple parents. New viable IS-A links were added as warranted, and two methodologies were presented to identify and add them. The ESN serves as an extended high-level abstraction of the META. (2) The ESN's semantic relationship distribution and concept configuration were studied. Rules were defined to derive the ESN's semantic relationship distribution from the current SN's semantic relationship distribution. A mapping function was defined to map the SN's concept configuration to the ESN's concept configuration, avoiding redundant classifications in the ESN's concept configuration. (3) Several new metaschemas for the SN and the ESN were built and evaluated based on several different partitioning techniques. Each of these metaschemas can serve as a higher-level abstraction of the SN (or the ESN)

    Structural auditing methodologies for controlled terminologies

    Several auditing methodologies for large controlled terminologies are developed. These are applied to the Unified Medical Language System XXXX and the National Cancer Institute Thesaurus (NCIT). Structural auditing methodologies are based on structural aspects such as IS-A hierarchy relationships, groups of concepts assigned to semantic types, and groups of relationships defined for concepts. Structurally uniform groups of concepts tend to be semantically uniform. Structural auditing methodologies focus on concepts with unlikely or rare configurations, since these concepts have a high likelihood of errors. One of the methodologies is based on comparing hierarchical relationships between the META and the SN, two major knowledge sources of the UMLS. In general, a correspondence between them is expected, since the SN hierarchical relationships should abstract the META hierarchical relationships; a mismatch may indicate an error. The UMLS SN has 135 categories called semantic types. However, in spite of its medium size, the SN has limited use for comprehension purposes: it has many (about 7,000) relationships and therefore cannot easily be represented in pictorial form. For this reason, a higher-level abstraction of the SN, called a metaschema, is constructed. Its nodes are meta-semantic types, each representing a connected group of semantic types of the SN. One of the auditing methodologies is based on a kind of metaschema called a cohesive metaschema. The focus is placed on concepts in intersections of meta-semantic types. As is shown, such concepts have a high likelihood of errors. Another auditing methodology is based on dividing the NCIT into areas according to the roles of its concepts. Each multi-rooted area is further divided into p-areas that are singly rooted. Each p-area contains a group of structurally and semantically uniform concepts. These groups, as well as two derived abstraction networks called taxonomies, help in focusing on concepts with potential errors. With genomic research being at the forefront of bioscience, this auditing methodology is applied to the Gene hierarchy as well as the Biological Process hierarchy of the NCIT, since processes are very important for gene information. The results support the hypothesis that the occurrence of errors is related to the size of p-areas: errors are more frequent in small p-areas

    Extracting Synonymous Gene and Protein Terms From Biological Literature

    Genes and proteins are often associated with multiple names. More names are added as new functional or structural information is discovered. Because authors can use any one of the known names for a gene or protein, information retrieval and extraction would benefit from identifying the gene and protein terms that are synonyms of the same substance

    A Model for a Data Dictionary Supporting Multiple Definitions, Views and Contexts

    In the field of clinical trials, precise definitions of terms are extremely important to ensure objective data collection and analysis. They also enable external experts to interpret and apply research results correctly. However, many clinical trials show deficits in this respect: definitions are often imprecise or used implicitly. In addition, terms are often defined inconsistently, although standardized definitions are desirable with a view to a broader exchange of results. Against this background, the idea of the Data Dictionary arose. Its initial aim is to collect the alternative definitions of terms and make them available to clinical trials. It is also intended to support the analysis of definitions with respect to their commonalities and differences, as well as their harmonization. Standardized definitions are not enforced, however, because differences between definitions may be justified in substance, e.g. by use in different specialties, by study-specific conditions, or by differing expert views. This thesis develops a model for the Data Dictionary. The model follows the concept-based approach known from terminology work and extends it with the ability to represent alternative definitions. In particular, it aims to make the differences between definitions as explicit as possible, in order to distinguish between alternative definitions that differ in content (e.g. contradictory expert opinions) and consistent variants of a single definition (e.g. different views, translations into different languages). Several model elements are also devoted to making contextual information explicit (e.g. validity within organizations, or the domain to which a concept belongs) in order to support the selection and reuse of definitions. This information allows different views on the contents of the Data Dictionary. Views are regarded as coherent subsets of the Data Dictionary that comprise only those contents specified as relevant in the selected context.

    DEVELOPING A CLINICAL LINGUISTIC FRAMEWORK FOR PROBLEM LIST GENERATION FROM CLINICAL TEXT

    Regulatory institutions such as the Institute of Medicine and the Joint Commission endorse problem lists as an effective method to facilitate transitions of care for patients. In practice, the problem list is a common model for documenting a care provider's medical reasoning with respect to a problem and its status during patient care. Although natural language processing (NLP) systems have been developed to support problem list generation, encoding many information layers (morphological, syntactic, semantic, discourse, and pragmatic) can prove computationally expensive. The contribution of each information layer to accurate problem list generation has not been formally assessed. We would expect a problem list generator that relies on natural language processing to improve its performance with the addition of rich semantic features. We hypothesize that problem list generation can be approached as a two-step classification problem: problem mention status classification (Aim One) and patient problem status classification (Aim Two). In Aim One, we will automatically classify the status of each problem mention using semantic features about problems described in the clinical narrative. In Aim Two, we will classify active patient problems from individual problem mentions and their statuses. We believe our proposal is significant in two ways. First, our experiments will develop and evaluate semantic features, some commonly modeled in clinical text and others not. The annotations we use will be made openly available to other NLP researchers to encourage future research on this task and on related problems, including foundational NLP algorithms (assertion classification and coreference resolution) and applied clinical applications (patient timeline and record visualization). Second, by generating and evaluating existing NLP systems, we are building an open-source problem list generator and demonstrating the performance of problem list generation using these features