Mapping the Gene Ontology Into the Unified Medical Language System
We have recently mapped the Gene Ontology (GO), developed by the Gene Ontology
Consortium, into the National Library of Medicine's Unified Medical Language
System (UMLS). GO has been developed for the purpose of annotating gene products
in genome databases, and the UMLS has been developed as a framework for
integrating large numbers of disparate terminologies, primarily for the purpose of
providing better access to biomedical information sources. The mapping of GO to
UMLS highlighted issues in both terminology systems. After some initial explorations
and discussions between the UMLS and GO teams, the GO was integrated with the
UMLS. Overall, a total of 23% of the GO terms either matched directly (3%) or
linked (20%) to existing UMLS concepts. All GO terms now have a corresponding,
official UMLS concept, and the entire vocabulary is available through the web-based
UMLS Knowledge Source Server. The mapping of the Gene Ontology, with its focus
on structures, processes and functions at the molecular level, to the existing broad
coverage UMLS should contribute to linking the language and practices of clinical
medicine to the language and practices of genomics.
Complementary and Integrative Health Lexicon (CIHLex) and Entity Recognition in the Literature
Objective: Our study aimed to construct an exhaustive Complementary and
Integrative Health (CIH) Lexicon (CIHLex) to better represent the often
underrepresented physical and psychological CIH approaches in standard
terminologies. We also intended to apply advanced Natural Language Processing
(NLP) models such as Bidirectional Encoder Representations from Transformers
(BERT) and GPT-3.5 Turbo for CIH named entity recognition, evaluating their
performance against established models like MetaMap and CLAMP. Materials and
Methods: We constructed the CIHLex by integrating various resources, compiling
and integrating data from biomedical literature and relevant knowledge bases.
The Lexicon encompasses 198 unique concepts with 1090 corresponding unique
terms. We matched these concepts to the Unified Medical Language System (UMLS).
Additionally, we developed and utilized BERT models and compared their
efficiency in CIH named entity recognition to that of other models such as
MetaMap, CLAMP, and GPT-3.5 Turbo. Results: Of the 198 unique concepts in
CIHLex, 62.1% could be matched to at least one term in the UMLS. Moreover,
75.7% of the mapped UMLS Concept Unique Identifiers (CUIs) were categorized as
"Therapeutic or Preventive Procedure." Among the models applied to CIH named
entity recognition, BLUEBERT delivered the highest macro average F1-score of
0.90, surpassing other models. Conclusion: Our CIHLex significantly augments
representation of CIH approaches in biomedical literature. Demonstrating the
utility of advanced NLP models, BERT notably excelled in CIH entity
recognition. These results highlight promising strategies for enhancing
standardization and recognition of CIH terminology in biomedical contexts.
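The macro average F1-score reported above weights every entity type equally, regardless of how many mentions each type has. A minimal sketch of that calculation follows; the entity type names and the counts are hypothetical and are not taken from the study.

```python
def f1(tp, fp, fn):
    """F1 for one entity type from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(counts):
    """Unweighted mean of the per-type F1 scores."""
    return sum(f1(*c) for c in counts.values()) / len(counts)


# Hypothetical (tp, fp, fn) counts per entity type, for illustration only.
counts = {
    "physical": (8, 2, 2),
    "psychological": (5, 5, 0),
}
score = macro_f1(counts)
```

Because each type contributes equally, a rare entity type with poor recognition lowers the macro average as much as a frequent one would.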
Towards more Challenging Problems for Ontology Matching Tools
We motivate the need for challenging problems in the evaluation of ontology matching tools. To address this need, we propose mapping sets between well-known biomedical ontologies that are based on the UMLS Metathesaurus. These mappings could be used as a basis for a new track in future OAEI campaigns (http://oaei.ontologymatching.org/).

Parsing MetaMap Files in Hadoop
The UMLS::Association CUICollector module identifies UMLS Concept Unique Identifier bigrams and their frequencies in a biomedical text corpus. CUICollector was re-implemented in Hadoop MapReduce to improve algorithm speed, flexibility, and scalability. Evaluation of the Hadoop implementation compared to the serial module produced equivalent results and achieved a 28x speedup on a single-node Hadoop system.
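The bigram counting described above fits the map/reduce pattern naturally: a map step emits each adjacent CUI pair with a count of 1, and a reduce step sums the counts per pair. The sketch below is plain Python that only illustrates that pattern. It does not use the Hadoop API and is not the CUICollector code; the input format (one ordered list of CUIs per utterance) and the function names are assumptions, and parsing of MetaMap output is omitted.

```python
from collections import Counter
from itertools import chain


def map_bigrams(cuis):
    """Map step: emit ((cui_a, cui_b), 1) for each adjacent pair in one utterance."""
    return [((a, b), 1) for a, b in zip(cuis, cuis[1:])]


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each bigram key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


# Assumed input: one list of CUIs per utterance, in text order (illustrative values).
utterances = [
    ["C0011849", "C0020538", "C0011849"],
    ["C0011849", "C0020538"],
]
bigram_counts = reduce_counts(
    chain.from_iterable(map_bigrams(u) for u in utterances)
)
```

Since the map step handles each utterance independently, the work can be split across input files, which is what makes the approach amenable to a distributed implementation.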
Using Distributed Representations to Disambiguate Biomedical and Clinical Concepts
In this paper, we report a knowledge-based method for Word Sense
Disambiguation in the domains of biomedical and clinical text. We combine word
representations created on large corpora with a small number of definitions
from the UMLS to create concept representations, which we then compare to
representations of the context of ambiguous terms. Using no relational
information, we obtain comparable performance to previous approaches on the
MSH-WSD dataset, which is a well-known dataset in the biomedical domain.
Additionally, our method is fast and easy to set up and extend to other
domains. Supplementary materials, including source code, can be found at
https://github.com/clips/yarn
Comment: 6 pages, 1 figure, presented at the 15th Workshop on Biomedical
Natural Language Processing, Berlin 201
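The idea in this abstract, building a concept representation from definition text and comparing it with a representation of the ambiguous term's context, can be sketched as follows. This is not the authors' implementation: averaging word vectors and scoring with cosine similarity are assumptions made here for illustration, and the toy vectors, sense labels, and definitions are hypothetical.

```python
import math


def average(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embed(text, word_vectors):
    """Average the vectors of known words in the text, or None if none are known."""
    vecs = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
    return average(vecs) if vecs else None


def disambiguate(context, definitions, word_vectors):
    """Return the sense whose definition vector is closest to the context vector."""
    ctx = embed(context, word_vectors)
    scores = {}
    for sense, definition in definitions.items():
        d = embed(definition, word_vectors)
        scores[sense] = cosine(ctx, d) if ctx and d else 0.0
    return max(scores, key=scores.get)


# Toy 2-dimensional word vectors and hypothetical sense definitions.
word_vectors = {
    "fever": [0.9, 0.1],
    "patient": [0.8, 0.2],
    "weather": [0.0, 1.0],
    "winter": [0.1, 0.9],
}
definitions = {
    "SENSE_ILLNESS": "fever patient",
    "SENSE_TEMPERATURE": "weather winter",
}
best = disambiguate("patient with fever", definitions, word_vectors)
```

No relations between concepts are consulted here; only the definition text and the context contribute to the score.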
Utilizing RxNorm to Support Practical Computing Applications: Capturing Medication History in Live Electronic Health Records
RxNorm was utilized as the basis for direct-capture of medication history
data in a live EHR system deployed in a large, multi-state outpatient
behavioral healthcare provider in the United States serving over 75,000
distinct patients each year across 130 clinical locations. This tool
incorporated auto-complete search functionality for medications and proper
dosage identification assistance. The overarching goal was to understand if and
how standardized terminologies like RxNorm can be used to support practical
computing applications in live EHR systems. We describe the stages of
implementation, approaches used to adapt RxNorm's data structure for the
intended EHR application, and the challenges faced. We evaluate the
implementation using a four-factor framework addressing flexibility, speed,
data integrity, and medication coverage. RxNorm proved to be functional for the
intended application, given appropriate adaptations to address high-speed
input/output (I/O) requirements of a live EHR and the flexibility required for
data entry in multiple potential clinical scenarios. Future research around
search optimization for medication entry, user profiling, and linking RxNorm to
drug classification schemes holds great potential for improving the user
experience and utility of medication data in EHRs.
Comment: Appendix (including SQL/DDL Code) available by author request.
Keywords: RxNorm; Electronic Health Record; Medication History;
Interoperability; Unified Medical Language System; Search Optimization
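One simple way to serve the auto-complete medication search mentioned in this abstract is a prefix lookup over a sorted, in-memory list of names. The sketch below assumes that approach purely for illustration; it is not the system described in the paper, it does not read RxNorm's actual data files, and the function names and the short name list are hypothetical.

```python
from bisect import bisect_left


def build_index(names):
    """Lower-case and sort the names once so lookups can use binary search."""
    return sorted(n.lower() for n in names)


def autocomplete(index, prefix, limit=5):
    """Return up to `limit` names from the sorted index that start with the prefix."""
    prefix = prefix.lower()
    start = bisect_left(index, prefix)
    matches = []
    for name in index[start:]:
        if not name.startswith(prefix) or len(matches) == limit:
            break
        matches.append(name)
    return matches


# Illustrative name list only; a real deployment would load names from RxNorm.
index = build_index(["Sertraline", "Fluoxetine", "Fluvoxamine", "Bupropion"])
suggestions = autocomplete(index, "flu")
```

Binary search finds the first candidate in logarithmic time, so each keystroke costs little even for a large vocabulary, which matters for the high-speed input/output requirement the abstract describes.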
PhenDisco: phenotype discovery system for the database of genotypes and phenotypes.
The database of genotypes and phenotypes (dbGaP) developed by the National Center for Biotechnology Information (NCBI) is a resource that contains information on various genome-wide association studies (GWAS) and is currently available via NCBI's dbGaP Entrez interface. The database is an important resource, providing GWAS data that can be used for new exploratory research or cross-study validation by authorized users. However, finding studies relevant to a particular phenotype of interest is challenging, as phenotype information is presented in a non-standardized way. To address this issue, we developed PhenDisco (phenotype discoverer), a new information retrieval system for dbGaP. PhenDisco consists of two main components: (1) text processing tools that standardize phenotype variables and study metadata, and (2) information retrieval tools that support queries from users and return ranked results. In a preliminary comparison involving 18 search scenarios, PhenDisco showed promising performance for both unranked and ranked search comparisons with dbGaP's search engine Entrez. The system can be accessed at http://pfindr.net
Terminologia Anatomica; Considered from the Perspective of Next-Generation Knowledge Sources
This report examines the semantic structure of Terminologia Anatomica, taking one randomly selected page as an example. The focus of analysis is the meaning imparted to an anatomical term by virtue of its location within the structured list. Terminologia's structure, expressed through hierarchies of headings, varied typographical styles, indentations and an alphanumeric code, implies specific relationships between the terms embedded in the list. Together, terms and relationships can potentially capture essential elements of anatomical knowledge. The analysis focuses on these knowledge elements and evaluates the consistency and logic in their representation. Most critical of these elements are class inclusion and part-whole relationships, which are implied, rather than explicitly modeled, by Terminologia. This limits the use of the term list to those who have some knowledge of anatomy and excludes computer programs from navigating through the terminology. Assuring consistency in the explicit representation of anatomical relationships would facilitate adoption of Terminologia as the anatomical standard by the various controlled medical terminology (CMT) projects. These projects are motivated by the need for computerizing the patient record, and their aim is to generate machine-understandable representations of biomedical concepts, including anatomy. Because of the lack of a consistent and explicit representation of anatomy, each of these CMTs has generated its own anatomy model. None of these models is compatible with any other, yet each is consistent with textbook descriptions of anatomy. The analysis of the semantic structure of Terminologia Anatomica leads to some suggestions for enhancing the term list in ways that would facilitate its adoption as the standard for anatomical knowledge representation in biomedical informatics.