Multimodal LLMs for health grounded in individual-specific data
Foundation large language models (LLMs) have shown an impressive ability to
solve tasks across a wide range of fields, including health. To effectively
solve personalized health tasks, LLMs need the ability to ingest a diversity of
data modalities that are relevant to an individual's health status. In this
paper, we take a step towards creating multimodal LLMs for health that are
grounded in individual-specific data by developing a framework (HeLM: Health
Large Language Model for Multimodal Understanding) that enables LLMs to use
high-dimensional clinical modalities to estimate underlying disease risk. HeLM
encodes complex data modalities by learning an encoder that maps them into the
LLM's token embedding space; simple modalities such as tabular data are
serialized directly into text. Using data from the UK Biobank, we show that
HeLM can effectively use demographic and clinical features in addition to
high-dimensional time-series data to estimate disease risk. For example, HeLM
achieves an AUROC of 0.75 for asthma prediction when combining tabular and
spirogram data modalities, compared with 0.49 when using tabular data alone.
Overall, we find that HeLM outperforms or performs at parity with classical
machine learning approaches across a selection of eight binary traits.
Furthermore, we investigate downstream uses of this model, such as its
generalizability to out-of-distribution traits and its ability to power
conversations around individual health and wellness.
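To make the encoding scheme concrete, here is a minimal sketch in the PyTorch style: a small convolutional encoder maps a raw spirogram signal to a handful of "soft tokens" in the LLM's embedding space, while tabular features are serialized to plain text. Every name, dimension, and layer choice below is an illustrative assumption, not the paper's actual implementation.

    import torch
    import torch.nn as nn

    class SpirogramEncoder(nn.Module):
        """Hypothetical encoder: 1-D signal -> n_tokens vectors in the LLM embedding space."""
        def __init__(self, n_tokens: int = 4, embed_dim: int = 2048):
            super().__init__()
            self.n_tokens, self.embed_dim = n_tokens, embed_dim
            self.net = nn.Sequential(
                nn.Conv1d(1, 32, kernel_size=7, stride=2), nn.ReLU(),
                nn.Conv1d(32, 64, kernel_size=7, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                nn.Linear(64, n_tokens * embed_dim),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, signal_len) -> (batch, n_tokens, embed_dim)
            return self.net(x.unsqueeze(1)).view(-1, self.n_tokens, self.embed_dim)

    def serialize_tabular(features: dict) -> str:
        # Simple modalities are rendered as text and tokenized normally.
        return ", ".join(f"{k}: {v}" for k, v in features.items())

    encoder = SpirogramEncoder()
    soft_tokens = encoder(torch.randn(2, 1000))  # (2, 4, 2048)
    prompt = serialize_tabular({"age": 64, "sex": "female", "BMI": 27.1})

In a setup like this, only the encoder is trained; the soft tokens are concatenated with the embedded text prompt before the frozen LLM, which is what lets a single model mix spirograms with serialized tabular features.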
Deep Learning for Distinguishing Normal versus Abnormal Chest Radiographs and Generalization to Unseen Diseases
Chest radiography (CXR) is the most widely used thoracic clinical imaging
modality and is crucial for guiding the management of cardiothoracic
conditions. The detection of specific CXR findings has been the main focus of
several artificial intelligence (AI) systems. However, the wide range of
possible CXR abnormalities makes it impractical to build specific systems to
detect every possible condition. In this work, we developed and evaluated an AI
system to classify CXRs as normal or abnormal. For development, we used a
de-identified dataset of 248,445 patients from a multi-city hospital network in
India. To assess generalizability, we evaluated our system using 6
international datasets from India, China, and the United States. Of these, 4
focused on diseases the AI was not trained to detect: 2 with tuberculosis and 2
with coronavirus disease 2019 (COVID-19). Our
results suggest that the AI system generalizes to new patient populations and
abnormalities. In a simulated workflow where the AI system prioritized abnormal
cases, the turnaround time for abnormal cases was reduced by 7-28%. These
results represent an important step towards evaluating whether AI can be safely
used to flag cases in a general setting where previously unseen abnormalities
exist.
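The simulated workflow is essentially a priority queue. The toy simulation below, under invented arrival, reading-time, and flag-accuracy parameters (none of them from the study), compares first-in-first-out reading with a queue where AI-flagged cases are read first, and reports the mean turnaround for truly abnormal cases.

    import random

    def simulate(prioritize: bool, n_cases: int = 1000, read_time: float = 5.0) -> float:
        rng = random.Random(0)
        # Each case: (arrival_time, is_abnormal, ai_flag); the flag is deliberately imperfect.
        pending = []
        for i in range(n_cases):
            abnormal = rng.random() < 0.3
            flag = abnormal if rng.random() < 0.9 else not abnormal
            pending.append((i * 2.0, abnormal, flag))

        clock, queue, turnarounds = 0.0, [], []
        while pending or queue:
            while pending and pending[0][0] <= clock:
                queue.append(pending.pop(0))
            if not queue:
                clock = pending[0][0]  # reader idles until the next case arrives
                continue
            if prioritize:
                queue.sort(key=lambda c: (not c[2], c[0]))  # AI-flagged first, then FIFO
            arrival, abnormal, _ = queue.pop(0)
            clock += read_time
            if abnormal:
                turnarounds.append(clock - arrival)
        return sum(turnarounds) / len(turnarounds)

    print(f"mean abnormal turnaround: FIFO {simulate(False):.0f} min, "
          f"prioritized {simulate(True):.0f} min")

Because reads are slower than arrivals in this toy setting, a backlog forms, and moving flagged cases to the front cuts abnormal-case turnaround at the cost of delaying cases flagged as normal.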
ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders
Our approach, which we call Embeddings for Language/Image-aligned X-Rays, or
ELIXR, leverages a language-aligned image encoder combined with, or grafted
onto, a fixed LLM (PaLM 2) to perform a broad range of tasks. We train this lightweight
adapter architecture using images paired with corresponding free-text radiology
reports from the MIMIC-CXR dataset. ELIXR achieved state-of-the-art performance
on zero-shot chest X-ray (CXR) classification, with a mean AUC of 0.850 across
13 findings; on data-efficient CXR classification, with mean AUCs of 0.893 and
0.898 across five findings (atelectasis, cardiomegaly, consolidation, pleural
effusion, and pulmonary edema) for 1% (~2,200 images) and 10% (~22,000 images)
of the training data; and on semantic search, with a normalized discounted
cumulative gain (NDCG) of 0.76 across nineteen queries, including perfect
retrieval on twelve of them.
Compared to existing data-efficient methods including supervised contrastive
learning (SupCon), ELIXR required two orders of magnitude less data to reach
similar performance. ELIXR also showed promise on CXR vision-language tasks,
demonstrating overall accuracies of 58.7% and 62.5% on visual question
answering and report quality assurance tasks, respectively. These results
suggest that ELIXR is a robust and versatile approach to CXR AI.
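The zero-shot result relies on images and text sharing one embedding space, so classification can be phrased as a similarity contest between prompts. The sketch below shows that generic, CLIP-style scoring recipe; the pseudo-embedding function, prompt templates, and finding names are stand-in assumptions, not ELIXR's actual interface.

    import zlib
    import numpy as np

    def pseudo_embed(text: str, dim: int = 32) -> np.ndarray:
        # Deterministic stand-in for an aligned text tower (assumption, not a real API).
        return np.random.default_rng(zlib.crc32(text.encode())).normal(size=dim)

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def zero_shot_scores(image_emb: np.ndarray, findings: list) -> dict:
        # Score each finding as sim(positive prompt) minus sim(negative prompt).
        scores = {}
        for finding in findings:
            pos = pseudo_embed(f"chest x-ray showing {finding}")
            neg = pseudo_embed(f"chest x-ray with no {finding}")
            scores[finding] = cosine(image_emb, pos) - cosine(image_emb, neg)
        return scores

    # In a real system, image_emb would come from the language-aligned image encoder.
    image_emb = np.random.default_rng(0).normal(size=32)
    print(zero_shot_scores(image_emb, ["cardiomegaly", "pleural effusion", "edema"]))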