37 research outputs found

    Self-Supervised Time-to-Event Modeling with Structured Medical Records

    Time-to-event (TTE) models are used in medicine and other fields for estimating the probability distribution of the time until a specific event occurs. TTE models provide many advantages over classification using fixed time horizons, including naturally handling censored observations, but require more parameters and are challenging to train in settings with limited labeled data. Existing approaches, e.g. proportional hazards or accelerated failure time, employ distributional assumptions to reduce parameters but are vulnerable to model misspecification. In this work, we address these challenges with MOTOR (Many Outcome Time Oriented Representations), a self-supervised model that leverages temporal structure found in collections of timestamped events in electronic health records (EHR) and health insurance claims. MOTOR uses a TTE pretraining objective that predicts the probability distribution of times when events occur, making it well-suited to transfer learning for medical prediction tasks. Having pretrained on EHR and claims data of up to 55M patient records (9B clinical events), we evaluate performance after finetuning for 19 tasks across two datasets. Task-specific models built using MOTOR improve time-dependent C statistics by 4.6% over the state of the art while greatly improving sample efficiency, achieving comparable performance to existing methods using only 5% of available task data.
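
    As an illustration of the kind of time-to-event objective the abstract describes, the sketch below shows a generic discrete-time survival negative log-likelihood over time bins. The function name, binning, and array layout are assumptions for exposition and are not MOTOR's actual pretraining objective.

        # Illustrative discrete-time TTE loss; not MOTOR's actual objective.
        import numpy as np

        def discrete_tte_nll(hazards, event_bin, observed, eps=1e-8):
            """hazards   : (n, B) per-bin event probabilities in [0, 1]
               event_bin : (n,) bin index of the event or censoring time
               observed  : (n,) 1 if the event was observed, 0 if censored"""
            n, _ = hazards.shape
            nll = 0.0
            for i in range(n):
                k = event_bin[i]
                # The patient survives every bin before the event/censoring bin.
                nll -= np.log(1.0 - hazards[i, :k] + eps).sum()
                if observed[i]:
                    nll -= np.log(hazards[i, k] + eps)        # event in bin k
                else:
                    nll -= np.log(1.0 - hazards[i, k] + eps)  # censored in bin k
            return nll / n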

    Clinical Utility Gains from Incorporating Comorbidity and Geographic Location Information into Risk Estimation Equations for Atherosclerotic Cardiovascular Disease

    Objective: There are several efforts to re-learn the 2013 ACC/AHA pooled cohort equations (PCE) for patients with specific comorbidities and geographic locations. With over 363 customized risk models in the literature, we aim to evaluate such revised models to determine if the performance improvements translate to gains in clinical utility. Methods: We re-train a baseline PCE using the ACC/AHA PCE variables and revise it to incorporate subject-level geographic location and comorbidity information. We apply fixed effects, random effects, and extreme gradient boosting models to handle the correlation and heterogeneity induced by locations. Models are trained using 2,464,522 claims records from Optum Clinformatics Data Mart and validated in the hold-out set (N=1,056,224). We evaluate models' performance overall and across subgroups defined by the presence or absence of chronic kidney disease (CKD) or rheumatoid arthritis (RA) and geographic locations. We evaluate models' expected net benefit using decision curve analysis and models' statistical properties using several discrimination and calibration metrics. Results: The baseline PCE is miscalibrated overall, in patients with CKD or RA, and in locations with small populations. Our revised models improved both the overall (GND P-value=0.41) and subgroup calibration but only enhanced net benefit in the underrepresented subgroups. The gains are larger in the subgroups with comorbidities and heterogeneous across geographic locations. Conclusions: Revising the PCE with comorbidity and location information significantly enhanced the models' calibration; however, such improvements do not necessarily translate to clinical gains. Thus, we recommend that future work quantify the consequences of using risk calculators to guide clinical decisions.
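
    The decision curve analysis mentioned above rests on the standard net benefit formula, NB = TP/N - FP/N * t/(1 - t) at threshold t; a minimal sketch with hypothetical array names is shown below. A decision curve evaluates this quantity over a grid of thresholds and compares the model against treat-all and treat-none strategies; the published models and thresholds are not reproduced here.

        # Net benefit at a single risk threshold t: NB = TP/N - FP/N * t / (1 - t).
        import numpy as np

        def net_benefit(y_true, risk, threshold):
            y_true = np.asarray(y_true)
            treat = np.asarray(risk) >= threshold      # who the model would treat
            n = len(y_true)
            tp = np.sum(treat & (y_true == 1))         # treated and had the event
            fp = np.sum(treat & (y_true == 0))         # treated unnecessarily
            return tp / n - fp / n * threshold / (1.0 - threshold)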

    The Shaky Foundations of Clinical Foundation Models: A Survey of Large Language Models and Foundation Models for EMRs

    The successes of foundation models such as ChatGPT and AlphaFold have spurred significant interest in building similar models for electronic medical records (EMRs) to improve patient care and hospital operations. However, recent hype has obscured critical gaps in our understanding of these models' capabilities. We review over 80 foundation models trained on non-imaging EMR data (i.e. clinical text and/or structured data) and create a taxonomy delineating their architectures, training data, and potential use cases. We find that most models are trained on small, narrowly-scoped clinical datasets (e.g. MIMIC-III) or broad, public biomedical corpora (e.g. PubMed) and are evaluated on tasks that do not provide meaningful insights on their usefulness to health systems. In light of these findings, we propose an improved evaluation framework for measuring the benefits of clinical foundation models that is more closely grounded in metrics that matter in healthcare.

    INSPECT: A Multimodal Dataset for Pulmonary Embolism Diagnosis and Prognosis

    Synthesizing information from multiple data sources plays a crucial role in the practice of modern medicine. Current applications of artificial intelligence in medicine often focus on single-modality data due to a lack of publicly available, multimodal medical datasets. To address this limitation, we introduce INSPECT, which contains de-identified longitudinal records from a large cohort of patients at risk for pulmonary embolism (PE), along with ground truth labels for multiple outcomes. INSPECT contains data from 19,402 patients, including CT images, radiology report impression sections, and structured electronic health record (EHR) data (i.e. demographics, diagnoses, procedures, vitals, and medications). Using INSPECT, we develop and release a benchmark for evaluating several baseline modeling approaches on a variety of important PE-related tasks. We evaluate image-only, EHR-only, and multimodal fusion models. Trained models and the de-identified dataset are made available for non-commercial use under a data use agreement. To the best of our knowledge, INSPECT is the largest multimodal dataset integrating 3D medical imaging and EHR for reproducible methods evaluation and research.
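
    As a rough illustration of the kind of multimodal fusion baseline described, the sketch below concatenates a CT image embedding with a structured-EHR embedding and classifies the result. The dimensions, names, and architecture are assumptions, not the released INSPECT baselines.

        # Illustrative late-fusion classifier; dimensions and names are hypothetical.
        import torch
        import torch.nn as nn

        class LateFusionClassifier(nn.Module):
            def __init__(self, img_dim=512, ehr_dim=256, hidden=128, n_outputs=1):
                super().__init__()
                self.head = nn.Sequential(
                    nn.Linear(img_dim + ehr_dim, hidden),
                    nn.ReLU(),
                    nn.Linear(hidden, n_outputs),
                )

            def forward(self, img_emb, ehr_emb):
                # img_emb: (batch, img_dim) from an image encoder
                # ehr_emb: (batch, ehr_dim) from a structured-EHR encoder
                return self.head(torch.cat([img_emb, ehr_emb], dim=-1))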

    Instability in clinical risk stratification models using deep learning

    While it has been well known in the ML community that deep learning models suffer from instability, the consequences for healthcare deployments are under-characterised. We study the stability of different model architectures trained on electronic health records, using a set of outpatient prediction tasks as a case study. We show that repeated training runs of the same deep learning model on the same training data can result in significantly different outcomes at a patient level even though global performance metrics remain stable. We propose two stability metrics for measuring the effect of randomness in model training, as well as mitigation strategies for improving model stability.
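
    One simple way to see what a patient-level stability measure might capture is sketched below: the fraction of patients whose thresholded prediction flips across repeated training runs of the same model. This is an assumed illustration, not the two metrics proposed in the paper.

        # Illustrative patient-level disagreement across repeated runs (hypothetical).
        import numpy as np

        def prediction_disagreement(preds, threshold=0.5):
            """preds: (n_runs, n_patients) predicted risks from retraining the
            same model on the same data with different random seeds."""
            labels = np.asarray(preds) >= threshold
            # A patient is unstable if any pair of runs disagrees on the label.
            return float(np.mean(labels.min(axis=0) != labels.max(axis=0)))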

    AMELIE speeds Mendelian diagnosis by matching patient phenotype and genotype to primary literature

    The diagnosis of Mendelian disorders requires labor-intensive literature research. Trained clinicians can spend hours looking for the right publication(s) supporting a single gene that best explains a patient’s disease. AMELIE (Automatic Mendelian Literature Evaluation) greatly accelerates this process. AMELIE parses all 29 million PubMed abstracts and downloads and further parses hundreds of thousands of full-text articles in search of information supporting the causality and associated phenotypes of most published genetic variants. AMELIE then prioritizes patient candidate variants by their likelihood of explaining any patient’s given set of phenotypes. Diagnosis of singleton patients (without relatives’ exomes) is the most time-consuming scenario, and AMELIE ranked the causative gene at the very top for 66% of 215 diagnosed singleton Mendelian patients from the Deciphering Developmental Disorders project. Evaluating only the top 11 AMELIE-scored genes of 127 (median) candidate genes per patient resulted in a rapid diagnosis in more than 90% of cases. AMELIE-based evaluation of all cases was 3 to 19 times more efficient than hand-curated database–based approaches. We replicated these results on a retrospective cohort of clinical cases from Stanford Children’s Health and the Manton Center for Orphan Disease Research. An analysis web portal with our most recent update, programmatic interface, and code is available at AMELIE.stanford.edu.
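
    To make the prioritization step concrete, the sketch below ranks a patient's candidate genes by a combined literature and phenotype score. The field names and the combination rule are hypothetical and do not reflect AMELIE's actual scoring model.

        # Hypothetical gene prioritization; scoring fields are illustrative only.
        def rank_candidate_genes(candidates):
            """candidates: list of dicts such as
            {"gene": "KMT2D", "literature_score": 0.92, "phenotype_overlap": 0.80}
            Returns genes ordered so the most plausible causative gene comes first."""
            return sorted(candidates,
                          key=lambda c: c["literature_score"] * c["phenotype_overlap"],
                          reverse=True)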

    Clinical Trial Reference Set
