Automatic Glossary of Clinical Terminology: a Large-Scale Dictionary of Biomedical Definitions Generated from Ontological Knowledge
Background: More than 400,000 biomedical concepts and some of their
relationships are contained in SnomedCT, a comprehensive biomedical ontology.
However, their concept names are not always readily interpretable by
non-experts, or patients looking at their own electronic health records (EHR).
Clear definitions or descriptions in understandable language are often not
available. Therefore, generating human-readable definitions for biomedical
concepts might help make the information they encode more accessible and
understandable to a wider public.
Objective: In this article, we introduce the Automatic Glossary of Clinical
Terminology (AGCT), a large-scale biomedical dictionary of clinical concepts
generated using high-quality information extracted from the biomedical
knowledge contained in SnomedCT.
Methods: We generate a novel definition for every SnomedCT concept, after
prompting the OpenAI Turbo model, a variant of GPT 3.5, using a high-quality
verbalization of the SnomedCT relationships of the to-be-defined concept. A
significant subset of the generated definitions was subsequently judged by NLP
researchers with biomedical expertise on 5-point scales along the following
three axes: factuality, insight, and fluency.
Results: AGCT contains 422,070 computer-generated definitions for SnomedCT
concepts, covering various domains such as diseases, procedures, drugs, and
anatomy. The average length of the definitions is 49 words. The definitions
were assigned average scores of over 4.5 out of 5 on all three axes, indicating
a majority of factual, insightful, and fluent definitions.
Conclusion: AGCT is a novel and valuable resource for biomedical tasks that
require human-readable definitions for SnomedCT concepts. It can also serve as
a base for developing robust biomedical retrieval models or other applications
that leverage natural language understanding of biomedical knowledge.
Comment: Accepted at the BioNLP 2023 workshop
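The Methods step above (verbalizing a concept's SnomedCT relationships, then prompting a language model) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the sentence template, the prompt wording, and the example relationships are all hypothetical and are not taken from the AGCT paper, and no model is actually called.

```python
# Hypothetical sketch: turn a concept's ontology relationships into a plain
# sentence, then wrap that sentence in a definition-generation prompt.
# Template and prompt wording are assumptions, not the authors' actual ones.

def verbalize(concept, relations):
    """Render (relation, target) pairs as one readable sentence."""
    if not relations:
        return f"{concept} has no recorded relationships."
    parts = [f"{rel.lower()} {target}" for rel, target in relations]
    return f"{concept}: " + "; ".join(parts) + "."


def build_prompt(concept, relations):
    """Build the text that would be sent to a language model (not sent here)."""
    facts = verbalize(concept, relations)
    return (
        f"Known facts: {facts}\n"
        f"Write a short, clear definition of '{concept}' for a non-expert."
    )


# Illustrative relationships only; not copied from a SnomedCT release.
example_relations = [
    ("Is a", "Infective pneumonia"),
    ("Causative agent", "Virus"),
    ("Finding site", "Lung structure"),
]
prompt = build_prompt("Viral pneumonia", example_relations)
```

The resulting `prompt` string would then be passed to whichever model is used; keeping the verbalization separate from the prompt template makes it easy to vary either one independently.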
BioLORD-2023: Semantic Textual Representations Fusing LLM and Clinical Knowledge Graph Insights
In this study, we investigate the potential of Large Language Models to
complement biomedical knowledge graphs in the training of semantic models for
the biomedical and clinical domains. Drawing on the wealth of the UMLS
knowledge graph and harnessing cutting-edge Large Language Models, we propose a
new state-of-the-art approach for obtaining high-fidelity representations of
biomedical concepts and sentences, consisting of three steps: an improved
contrastive learning phase, a novel self-distillation phase, and a weight
averaging phase. Through rigorous evaluations via the extensive BioLORD testing
suite and diverse downstream tasks, we demonstrate consistent and substantial
performance improvements over the previous state of the art (e.g. +2pts on
MedSTS, +2.5pts on MedNLI-S, +6.1pts on EHR-Rel-B). Besides our new
state-of-the-art biomedical model for English, we also distill and release a
multilingual model compatible with 50+ languages and finetuned on 7 European
languages. Many clinical pipelines can benefit from our latest models. Our new
multilingual model enables a range of languages to benefit from our
advancements in biomedical semantic representation learning, opening a new
avenue for bioinformatics researchers around the world. As a result, we hope to
see BioLORD-2023 becoming a precious tool for future biomedical applications.
Comment: Preprint of upcoming journal article
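The weight averaging phase mentioned in this abstract can be pictured as an element-wise mean over the parameters of several model checkpoints. The sketch below is a minimal illustration under that assumption; the abstract does not say which checkpoints are averaged or how, and the function name and the dictionary-of-lists parameter layout are hypothetical stand-ins for real model tensors.

```python
# Hypothetical sketch of weight averaging: each checkpoint is a dict mapping
# a parameter name to a flat list of floats; the result is their uniform
# element-wise mean. Real models would use tensors instead of lists.

def average_weights(checkpoints):
    """Return the element-wise mean of the parameters of all checkpoints."""
    if not checkpoints:
        raise ValueError("at least one checkpoint is required")
    averaged = {}
    for name in checkpoints[0]:
        # Group the i-th value of this parameter across all checkpoints.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(values) / len(checkpoints) for values in columns]
    return averaged


# Toy checkpoints with made-up values.
ckpt_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
ckpt_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
merged = average_weights([ckpt_a, ckpt_b])
```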
BioDEX: Large-Scale Biomedical Adverse Drug Event Extraction for Real-World Pharmacovigilance
Timely and accurate extraction of Adverse Drug Events (ADE) from biomedical
literature is paramount for public safety, but involves slow and costly manual
labor. We set out to improve drug safety monitoring (pharmacovigilance, PV)
through the use of Natural Language Processing (NLP). We introduce BioDEX, a
large-scale resource for Biomedical adverse Drug Event Extraction, rooted in
the historical output of drug safety reporting in the U.S. BioDEX consists of
65k abstracts and 19k full-text biomedical papers with 256k associated
document-level safety reports created by medical experts. The core features of
these reports include the reported weight, age, and biological sex of a
patient, a set of drugs taken by the patient, the drug dosages, the reactions
experienced, and whether the reaction was life threatening. In this work, we
consider the task of predicting the core information of the report given its
originating paper. We estimate human performance to be 72.0% F1, whereas our
best model achieves 62.3% F1, indicating significant headroom on this task. We
also begin to explore ways in which these models could help professional PV
reviewers. Our code and data are available: https://github.com/KarelDO/BioDEX.
Comment: 28 pages
BioLORD: Learning ontological representations from definitions for biomedical concepts and their textual descriptions
This work introduces BioLORD, a new pre-training strategy for producing meaningful representations for clinical sentences and biomedical concepts. State-of-the-art methodologies operate by maximizing the similarity in representation of names referring to the same concept, and preventing collapse through contrastive learning. However, because biomedical names are not always self-explanatory, this sometimes results in non-semantic representations. BioLORD overcomes this issue by grounding its concept representations using definitions, as well as short descriptions derived from a multi-relational knowledge graph consisting of biomedical ontologies. Thanks to this grounding, our model produces more semantic concept representations that match more closely the hierarchical structure of ontologies. BioLORD establishes a new state of the art for text similarity on both clinical sentences (MedSTS) and biomedical concepts (MayoSRS).
Detecting Idiomatic Multiword Expressions in Clinical Terminology using Definition-Based Representation Learning
This paper shines a light on the potential of definition-based semantic
models for detecting idiomatic and semi-idiomatic multiword expressions (MWEs)
in clinical terminology. Our study focuses on biomedical entities defined in
the UMLS ontology and aims to help prioritize the translation efforts of these
entities. In particular, we develop an effective tool for scoring the
idiomaticity of biomedical MWEs based on the degree of similarity between the
semantic representations of those MWEs and a weighted average of the
representation of their constituents. We achieve this using a biomedical
language model trained to produce similar representations for entity names and
their definitions, called BioLORD. The importance of this definition-based
approach is highlighted by comparing the BioLORD model to two other
state-of-the-art biomedical language models based on Transformer: SapBERT and
CODER. Our results show that the BioLORD model has a strong ability to identify
idiomatic MWEs, not replicated in other models. Our corpus-free idiomaticity
estimation helps ontology translators to focus on more challenging MWEs.
Comment: Best Paper Award @ MWE 202
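The scoring idea described in this abstract, comparing the representation of an MWE with a weighted average of the representations of its constituents, can be sketched as below. This is a hedged illustration only: the abstract does not give the exact formula, so the choice of one minus cosine similarity, the uniform default weights, and the toy two-dimensional vectors are all assumptions. In practice the vectors would come from an embedding model such as BioLORD.

```python
import math

# Hypothetical sketch: an expression whose own embedding is far from the
# weighted average of its constituents' embeddings gets a high score.


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def weighted_average(vectors, weights):
    """Weighted mean of several equal-length vectors."""
    total = sum(weights)
    dim = len(vectors[0])
    return [
        sum(w * vec[i] for vec, w in zip(vectors, weights)) / total
        for i in range(dim)
    ]


def idiomaticity_score(mwe_vec, constituent_vecs, weights=None):
    """Assumed score: 1 - cosine(MWE vector, weighted constituent average)."""
    if weights is None:
        weights = [1.0] * len(constituent_vecs)
    composed = weighted_average(constituent_vecs, weights)
    return 1.0 - cosine(mwe_vec, composed)


# Toy vectors: a compositional expression points the same way as its parts,
# an idiomatic one does not.
compositional = idiomaticity_score([1.0, 0.0], [[1.0, 0.0], [1.0, 0.0]])
idiomatic = idiomaticity_score([0.0, 1.0], [[1.0, 0.0], [1.0, 0.0]])
```

With this convention, expressions scoring near 0 look compositional and those scoring near 1 look idiomatic, which matches the stated goal of pointing translators to the harder cases.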
Acute increase in goiter size during a normal pregnancy: An exceptional case report