EliXR-TIME: A Temporal Knowledge Representation for Clinical Research Eligibility Criteria.
Effective clinical text processing requires accurate extraction and representation of temporal expressions. Multiple temporal information extraction models have been developed for clinical narratives, but a comparable method for extracting temporal expressions from eligibility criteria (e.g., for eligibility determination) is still needed. We identified the temporal knowledge representation requirements of eligibility criteria by reviewing 100 temporal criteria. We developed EliXR-TIME, a frame-based representation designed to support semantic annotation of temporal expressions in eligibility criteria, reusing applicable classes from well-known clinical temporal knowledge representations. We used EliXR-TIME to analyze a training set of 50 new temporal eligibility criteria. We evaluated EliXR-TIME on an additional random sample of 20 eligibility criteria with temporal expressions that have no overlap with the training data, yielding 92.7% (76/82) inter-coder agreement on sentence chunking and 72% (72/100) agreement on semantic annotation. We conclude that this knowledge representation can facilitate semantic annotation of temporal expressions in eligibility criteria.
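To make the frame-based idea concrete, the sketch below shows one way such a temporal frame could be encoded; the class, slot names, and toy annotator are illustrative assumptions, not the published EliXR-TIME schema.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class TemporalFrame:
    """Illustrative temporal frame; slot names are hypothetical,
    not the published EliXR-TIME classes."""
    event: str                          # clinical event, e.g. "myocardial infarction"
    relation: str                       # temporal relation, e.g. "WITHIN", "BEFORE"
    anchor: str = "enrollment"          # reference time point
    duration_value: Optional[float] = None
    duration_unit: Optional[str] = None

def annotate(criterion: str) -> List[TemporalFrame]:
    """Toy annotator: a real system would chunk the sentence and map
    chunks to frame slots; here one pattern is hard-coded for the demo."""
    if "within 6 months" in criterion.lower():
        return [TemporalFrame(event="myocardial infarction",
                              relation="WITHIN",
                              duration_value=6, duration_unit="month")]
    return []

print(annotate("No myocardial infarction within 6 months prior to enrollment"))
```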
Assessing the Utility of Large Language Models for Phenotype-Driven Gene Prioritization in Rare Genetic Disorder Diagnosis
Phenotype-driven gene prioritization is a critical process in the diagnosis of rare genetic disorders, identifying and ranking potential disease-causing genes based on observed physical traits or phenotypes. While traditional approaches rely on curated knowledge graphs of phenotype-gene relations, recent advances in large language models (LLMs) have opened the door to AI-based prediction through extensive training on diverse corpora with complex models. This study conducted a comprehensive evaluation of five large language models, including two from the Generative Pre-trained Transformer (GPT) series and three from the Llama2 series, assessing their performance on three key metrics: task completeness, gene prediction accuracy, and adherence to required output structures. Experiments explored combinations of models, prompts, input types, and task difficulty levels. Our findings reveal that even the best-performing LLM, GPT-4, achieved an accuracy of only 16.0%, which still lags behind traditional bioinformatics tools. Prediction accuracy increased with parameter/model size. A similar increasing trend was observed for the task completion rate, with more elaborate prompts more likely to improve task completeness in models smaller than GPT-4. However, such prompts were also more likely to decrease the structure compliance rate, whereas no prompt effect was observed for GPT-4. With free-text input, the LLMs still achieved better-than-random prediction accuracy, though slightly lower than with HPO (Human Phenotype Ontology) term-based input. Bias analysis showed that certain genes, such as MECP2, CDKL5, and SCN1A, are more likely to be top-ranked, potentially explaining the variance observed across datasets. This study provides valuable insights into the integration of LLMs within genomic analysis, contributing to the ongoing discussion on the use of advanced LLMs in clinical workflows.
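As a rough illustration of how the accuracy and task-completeness metrics could be computed, consider the sketch below; the scoring function and example gene lists are assumptions for demonstration, not the study's evaluation code.

```python
from typing import List, Optional

def score(predictions: List[Optional[List[str]]],
          truths: List[str], k: int = 10) -> dict:
    """Top-k accuracy over all cases plus the fraction of cases where the
    model returned any parseable ranking (task completeness)."""
    completed = sum(1 for p in predictions if p)
    hits = sum(1 for p, t in zip(predictions, truths) if p and t in p[:k])
    return {"task_completeness": completed / len(predictions),
            "top_k_accuracy": hits / len(predictions)}

# Toy example: None marks a response that could not be parsed into a ranking.
preds = [["MECP2", "CDKL5", "SCN1A"], None, ["BRCA1", "TP53"]]
print(score(preds, ["SCN1A", "FBN1", "TP53"], k=3))
```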
Graphic analysis of population structure on genome-wide rheumatoid arthritis data
Principal-component analysis (PCA) has been used for decades to summarize human genetic variation across geographic regions and to infer population migration history. Reducing spurious associations due to population structure is crucial for the success of disease association studies, and PCA has recently become a popular method for detecting population structure and correcting for population stratification in such studies. Inspired by manifold learning, we propose a novel method based on spectral graph theory. Regarding each study subject as a node, with suitably defined weights for its edges to close neighbors, one can form a weighted graph. We suggest using the spectrum of the associated graph Laplacian operator, namely the Laplacian eigenfunctions, to infer population structure instead of principal components (PCs). For the genome-wide association data from the North American Rheumatoid Arthritis Consortium (NARAC) provided by Genetic Analysis Workshop 16, Laplacian eigenfunctions revealed more meaningful structure in the underlying population than PCA. The proposed method is connected to PCA and naturally includes PCA as a special case. Our simple method is computationally fast and suitable for disease studies at the genome-wide scale.
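A minimal sketch of the spectral idea, assuming a Gaussian kernel on Euclidean genotype distance as the edge weighting (the paper's exact weighting scheme may differ):

```python
import numpy as np

# Simulate two subpopulations of 50 subjects x 200 markers each.
rng = np.random.default_rng(0)
G = np.vstack([rng.normal(0.0, 1, (50, 200)),
               rng.normal(0.5, 1, (50, 200))])

# Weighted graph: Gaussian kernel on pairwise squared distances
# (an illustrative choice of edge weight).
d2 = ((G[:, None, :] - G[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / d2.mean())
np.fill_diagonal(W, 0)

L = np.diag(W.sum(axis=1)) - W          # unnormalized graph Laplacian
eigvals, eigvecs = np.linalg.eigh(L)

# Eigenvectors of the smallest nonzero eigenvalues play the role of PCs:
# here the second eigenvector separates the two simulated populations.
structure = eigvecs[:, 1:3]
print(structure[:3], structure[-3:], sep="\n")
```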
A Span-based Model for Extracting Overlapping PICO Entities from RCT Publications
Objectives: Extraction of PICO (Populations, Interventions, Comparison, and Outcomes) entities is fundamental to evidence retrieval. We present PICOX, a novel method for extracting overlapping PICO entities.
Materials and Methods: PICOX first identifies entities by assessing whether a word marks the beginning or end of an entity. It then uses a multi-label classifier to assign one or more PICO labels to each span candidate. PICOX was evaluated against one of the best-performing baselines on the EBM-NLP benchmark and three additional datasets, namely PICO-Corpus and RCT publications on Alzheimer's Disease or COVID-19, using entity-level precision, recall, and F1 scores.
Results: On EBM-NLP, PICOX achieved superior precision, recall, and F1 scores across the board, with the micro F1 score improving from 45.05 to 50.87 (p << 0.01). On the PICO-Corpus, PICOX obtained higher recall and F1 scores than the baseline and improved the micro recall score from 56.66 to 67.33. On the COVID-19 dataset, PICOX also outperformed the baseline, improving the micro F1 score from 77.10 to 80.32. On the AD dataset, PICOX achieved comparable F1 scores with higher precision than the baseline.
Conclusion: PICOX excels at identifying overlapping entities and consistently surpasses a leading baseline across multiple datasets. Ablation studies show that its data augmentation strategy effectively reduces false positives and improves precision.
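The two-stage span idea can be pictured as follows; the boundary sets and the label function below stand in for the trained classifiers and are not the paper's actual model:

```python
from itertools import product
from typing import List, Set

tokens = ["adults", "with", "mild", "Alzheimer's", "disease"]
begins = {0, 2}   # token positions a boundary classifier flags as entity starts
ends = {4}        # token positions it flags as entity ends

def label_span(span: List[str]) -> Set[str]:
    """Stand-in multi-label classifier: a real model scores every PICO
    label per span and keeps those above a threshold."""
    labels = set()
    if "adults" in span:
        labels.add("Population")
    if "Alzheimer's" in span:
        labels.add("Population")   # overlapping spans may share or mix labels
    return labels

# Pair every begin with every end to enumerate overlapping span candidates.
for b, e in product(begins, ends):
    if b <= e:
        span = tokens[b:e + 1]
        labels = label_span(span)
        if labels:
            print(" ".join(span), "->", labels)
```

Pairing begins with ends lets the spans "adults with mild Alzheimer's disease" and "mild Alzheimer's disease" both surface, which is exactly the overlap a single BIO tagger cannot emit.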
Enhancing Phenotype Recognition in Clinical Notes Using Large Language Models: PhenoBCBERT and PhenoGPT
We hypothesize that large language models (LLMs) based on the transformer architecture can enable automated detection of clinical phenotype terms, including terms not documented in the Human Phenotype Ontology (HPO). In this study, we developed two types of models: PhenoBCBERT, a BERT-based model that uses Bio+Clinical BERT as its pre-trained model, and PhenoGPT, a GPT-based model that can be initialized from diverse GPT models, including open-source versions such as GPT-J, Falcon, and LLaMA, as well as closed-source versions such as GPT-3 and GPT-3.5. We compared our methods with PhenoTagger, a recently developed HPO recognition tool that combines rule-based and deep learning methods, and found that our methods can extract more phenotype concepts, including novel ones not characterized by the HPO. We also performed case studies on biomedical literature to illustrate how new phenotype information can be recognized and extracted. We compared the current BERT-based and GPT-based models for phenotype tagging across multiple aspects, including model architecture, memory usage, speed, accuracy, and privacy protection, and discussed adding a negation step and an HPO normalization layer to the transformer models for improved HPO term tagging. In conclusion, PhenoBCBERT and PhenoGPT enable the automated discovery of phenotype terms from clinical notes and biomedical literature, facilitating automated downstream tasks that derive new biological insights into human diseases.
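The GPT-based route can be pictured as a prompt-and-parse loop like the sketch below; the prompt wording and the `generate` stand-in are assumptions, not PhenoGPT's actual implementation:

```python
PROMPT = ("Extract all clinical phenotype terms from the note below. "
          "Return one term per line.\n\nNote: {note}\n\nPhenotypes:")

def generate(prompt: str) -> str:
    """Stand-in for whichever LLM backend is used (GPT-J, Falcon, LLaMA,
    or a hosted model); returns a canned answer for this demo."""
    return "seizures\nmicrocephaly\nglobal developmental delay"

def extract_phenotypes(note: str) -> list:
    raw = generate(PROMPT.format(note=note))
    return [line.strip() for line in raw.splitlines() if line.strip()]

print(extract_phenotypes("3-year-old with seizures, microcephaly, "
                         "and global developmental delay."))
```

A production pipeline would add the negation check and HPO normalization step discussed above before emitting terms.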
Assessing the readiness of precision medicine interoperability: An exploratory study of the National Institutes of Health Genetic Testing Registry
Background: Precision medicine involves three major innovations currently taking place in healthcare: electronic health records, genomics, and big data. A major challenge for healthcare providers, however, is understanding the readiness for practical application of initiatives like precision medicine. Objective: To better understand the current state and challenges of precision medicine interoperability, using a national genetic testing registry as a starting point, placed in the context of established interoperability formats. Methods: We performed an exploratory analysis of the National Institutes of Health Genetic Testing Registry. Relevant standards included the Health Level Seven International Version 3 Implementation Guide for Family History, the Human Genome Organization Gene Nomenclature Committee (HGNC) database, and Systematized Nomenclature of Medicine – Clinical Terms (SNOMED CT). We analyzed the distribution of genetic testing laboratories, genetic test characteristics, and standardized genome/clinical code mappings, stratified by laboratory setting. Results: There were a total of 25,472 genetic tests from 240 laboratories testing for approximately 3,632 distinct genes. Most tests focused on diagnosis, mutation confirmation, and/or risk assessment of germline mutations that could be passed to offspring. All genes were successfully mapped to HGNC identifiers, but fewer than half of the tests mapped to SNOMED CT codes, highlighting significant gaps when linking genetic tests to the standardized clinical codes that explain the medical motivations behind test ordering. Conclusion: While precision medicine could potentially transform healthcare, successful practical and clinical application will first require the comprehensive and responsible adoption of interoperable standards, terminologies, and formats across all aspects of the precision medicine pipeline.
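The mapping-coverage result can be reproduced in miniature as below; the three records are toy placeholders for registry entries (the HGNC identifiers are real, the SNOMED CT value is a dummy):

```python
# Toy registry records; the SNOMED CT value is a dummy placeholder.
tests = [
    {"gene": "BRCA1", "hgnc_id": "HGNC:1100",  "snomed_ct": "<indication-code>"},
    {"gene": "MECP2", "hgnc_id": "HGNC:6990",  "snomed_ct": None},
    {"gene": "SCN1A", "hgnc_id": "HGNC:10585", "snomed_ct": None},
]

def coverage(records, key):
    """Fraction of records with a non-empty mapping for `key`."""
    return sum(1 for r in records if r[key]) / len(records)

print(f"HGNC coverage:      {coverage(tests, 'hgnc_id'):.0%}")
print(f"SNOMED CT coverage: {coverage(tests, 'snomed_ct'):.0%}")
```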
Hidden in plain sight: bias towards sick patients when sampling patients with sufficient electronic health record data for research
Background: To demonstrate that subject selection based on sufficient laboratory results and medication orders in electronic health records can be biased towards sick patients. Methods: Using electronic health record data from 10,000 patients who received anesthetic services at a major metropolitan tertiary care academic medical center, an affiliated hospital for women and children, and an affiliated urban primary care hospital, the correlation between patient health status, as indicated by the American Society of Anesthesiologists Physical Status Classification (ASA Class), and the counts of days with laboratory results or medication orders was assessed with a negative binomial regression model. Results: Higher ASA Class was associated with more data points: compared to ASA Class 1 patients, ASA Class 4 patients had 5.05 times the number of days with laboratory results and 6.85 times the number of days with medication orders, controlling for age, sex, emergency status, admission type, primary diagnosis, and procedure. Conclusions: Imposing data sufficiency requirements for subject selection allows researchers to minimize missing data when reusing electronic health records for research, but introduces a bias towards the selection of sicker patients. We demonstrated the relationship between patient health and quantity of data, which may result in a systematic bias towards selecting sicker patients for research studies and limit the external validity of research conducted with electronic health record data. Additionally, we found that other variables (i.e., admission status, age, emergency classification, procedure, and diagnosis) independently affect data sufficiency.
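The analysis can be sketched with simulated data as follows; the effect sizes and dispersion below are invented for illustration, and a negative binomial GLM stands in for the paper's exact model specification:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_patients = 1000
asa = rng.integers(1, 5, n_patients)              # ASA Class 1-4

# Simulate overdispersed counts of days with lab results, with the mean
# rising by a factor of exp(0.55) per ASA step (invented effect size).
mu = np.exp(0.5 + 0.55 * (asa - 1))
y = rng.negative_binomial(n=5, p=5 / (5 + mu))

# Negative binomial GLM of count on ASA Class (no other covariates here;
# the study also adjusted for age, sex, emergency status, and more).
X = sm.add_constant((asa - 1).astype(float))
fit = sm.GLM(y, X, family=sm.families.NegativeBinomial()).fit()

print(fit.summary())
print("Rate ratio per ASA step:", np.exp(fit.params[1]))
```

Exponentiating the fitted coefficient gives the rate ratio per ASA step, which is how multipliers like the paper's 5.05 (ASA 4 vs. ASA 1 for laboratory days) are read off the model.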