8 research outputs found
Leveraging an Alignment Set in Tackling Instance-Dependent Label Noise
Noisy training labels can hurt model performance. Most approaches that aim to
address label noise assume label noise is independent from the input features.
In practice, however, label noise is often feature- or
\textit{instance-dependent}, and therefore biased (i.e., some instances are
more likely to be mislabeled than others). For example, in clinical care, female
patients are more likely to be under-diagnosed for cardiovascular disease
compared to male patients. Approaches that ignore this dependence can produce
models with poor discriminative performance, and in many healthcare settings,
can exacerbate issues around health disparities. In light of these limitations,
we propose a two-stage approach to learn in the presence of instance-dependent
label noise. Our approach utilizes \textit{anchor points}, a small subset of
data for which we know the observed and ground truth labels. On several tasks,
our approach leads to consistent improvements over the state-of-the-art in
discriminative performance (AUROC) while mitigating bias (area under the
equalized odds curve, AUEOC). For example, when predicting acute respiratory
failure onset on the MIMIC-III dataset, our approach achieves a harmonic mean
(AUROC and AUEOC) of 0.84 (SD [standard deviation] 0.01) while that of the next
best baseline is 0.81 (SD 0.01). Overall, our approach improves accuracy while
mitigating potential bias compared to existing approaches in the presence of
instance-dependent label noise.
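The abstract summarizes each model by the harmonic mean of AUROC and AUEOC. A minimal sketch of that combination follows; the function name and the example inputs are hypothetical and are not values from the paper.

```python
def harmonic_mean(auroc: float, aueoc: float) -> float:
    """Harmonic mean of two positive scores (here: AUROC and AUEOC)."""
    if auroc <= 0 or aueoc <= 0:
        raise ValueError("both scores must be positive")
    return 2 * auroc * aueoc / (auroc + aueoc)


# Hypothetical inputs for illustration only, not results from the paper.
example = harmonic_mean(0.80, 0.90)
print(round(example, 4))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a model cannot score well on this summary by trading away fairness (AUEOC) for discrimination (AUROC), or the reverse.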
P4‐554: An EHR‐Based Cohort Discovery Tool for Identifying Probable AD
Peer Reviewed
https://deepblue.lib.umich.edu/bitstream/2027.42/153029/1/alzjjalz201908101.pd
Cohort discovery and risk stratification for Alzheimer’s disease: an electronic health record‐based approach
Background: We sought to leverage data routinely collected in electronic health records (EHRs), with the goal of developing patient risk stratification tools for predicting risk of developing Alzheimer’s disease (AD).
Method: Using EHR data from the University of Michigan (UM) hospitals and consensus‐based diagnoses from the Michigan Alzheimer’s Disease Research Center, we developed and validated a cohort discovery tool for identifying patients with AD. Applied to all UM patients, these labels were used to train an EHR‐based machine learning model for predicting AD onset within 10 years.
Results: Applied to a test cohort of 1697 UM patients, the model achieved an area under the receiver operating characteristics curve of 0.70 (95% confidence interval = 0.63‐0.77). Important predictive factors included cardiovascular factors and laboratory blood testing.
Conclusion: Routinely collected EHR data can be used to predict AD onset with modest accuracy. Mining routinely collected data could shed light on early indicators of AD appearance and progression.
Peer Reviewed
https://deepblue.lib.umich.edu/bitstream/2027.42/155901/1/trc212035-sup-0001-SuppMat.pdf
https://deepblue.lib.umich.edu/bitstream/2027.42/155901/2/trc212035_am.pdf
https://deepblue.lib.umich.edu/bitstream/2027.42/155901/3/trc212035.pd
P4‐555: EHR‐Based Patient Risk Stratification Tool for Probable AD
Peer Reviewed
https://deepblue.lib.umich.edu/bitstream/2027.42/153132/1/alzjjalz201908102.pd
A Hierarchical Approach to Multi-Event Survival Analysis
In multi-event survival analysis, one aims to predict the probability of multiple different events occurring over some time horizon. One typically assumes that the timing of events is drawn from some distribution conditioned on an individual's covariates. However, during training, one does not have access to this distribution, and the natural variation in the observed event times makes the task of survival prediction challenging, on top of the potential interdependence among events. To address this issue, we introduce a novel approach for multi-event survival analysis that models the probability of event occurrence hierarchically at different time scales, using coarse predictions (e.g., monthly predictions) to iteratively guide predictions at progressively finer-grained time scales (e.g., daily predictions). We evaluate the proposed approach across several publicly available datasets in terms of both intra-event, inter-individual (global) and intra-individual, inter-event (local) consistency. We show that the proposed method consistently outperforms well-accepted and commonly used approaches to multi-event survival analysis. When estimating survival curves for Alzheimer's disease and mortality, our approach achieves a C-index of 0.91 (95% CI 0.88-0.93) and a local consistency score of 0.97 (95% CI 0.94-0.98) compared to a C-index of 0.75 (95% CI 0.70-0.80) and a local consistency score of 0.94 (95% CI 0.91-0.97) when modeling each event separately. Overall, our approach improves the accuracy of survival predictions by iteratively reducing the original task to a set of nested, simpler subtasks.
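One way to picture coarse predictions guiding finer ones is to treat the fine-grained probability as P(coarse bin) times P(fine bin | coarse bin). The sketch below illustrates only that nesting idea under this assumption; it is not the authors' model, and the function names, the softmax parameterization, and the toy numbers are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def refine(coarse_probs, fine_logits):
    """Split each coarse-bin probability across its fine bins.

    coarse_probs: P(event falls in coarse bin k), one value per coarse bin.
    fine_logits:  one list of logits per coarse bin; a softmax over each list
                  is used as an assumed P(fine bin | coarse bin k).
    """
    if len(coarse_probs) != len(fine_logits):
        raise ValueError("need one list of fine logits per coarse bin")
    fine = []
    for p_coarse, logits in zip(coarse_probs, fine_logits):
        fine.extend(p_coarse * q for q in softmax(logits))
    return fine


# Toy example: two "months", three "days" each, uniform within each month.
monthly = [0.7, 0.3]
daily_logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
daily = refine(monthly, daily_logits)
```

By construction, the fine-grained probabilities inside each coarse bin sum to that bin's coarse probability, so the finer prediction can never contradict the coarser one.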
Use of blood pressure measurements extracted from the electronic health record in predicting Alzheimer’s disease: A retrospective cohort study at two medical centers
Introduction: Studies investigating the relationship between blood pressure (BP) measurements from electronic health records (EHRs) and Alzheimer’s disease (AD) rely on summary statistics, like BP variability, and have only been validated at a single institution. We hypothesize that leveraging BP trajectories can accurately estimate AD risk across different populations.
Methods: In a retrospective cohort study, EHR data from Veterans Affairs (VA) patients were used to train and internally validate a machine learning model to predict AD onset within 5 years. External validation was conducted on patients from Michigan Medicine (MM).
Results: The VA and MM cohorts included 6860 and 1201 patients, respectively. Model performance using BP trajectories was modest but comparable (area under the receiver operating characteristic curve [AUROC] = 0.64 [95% confidence interval (CI) = 0.54–0.73] for VA vs. AUROC = 0.66 [95% CI = 0.55–0.76] for MM).
Conclusion: Approaches that directly leverage BP trajectories from EHR data could aid in AD risk stratification across institutions.
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/175219/1/alz12676.pdf
http://deepblue.lib.umich.edu/bitstream/2027.42/175219/2/alz12676_am.pd
Characterizing heterogeneity in the progression of Alzheimer’s disease using longitudinal clinical and neuroimaging biomarkers
Introduction: Models characterizing intermediate disease stages of Alzheimer’s disease (AD) are needed to inform clinical care and prognosis. Current models, however, use only a small subset of available biomarkers, capturing only coarse changes along the complete spectrum of disease progression. We propose the use of machine learning techniques and clinical, biochemical, and neuroimaging biomarkers to characterize progression to AD.
Methods: We used a large multimodal longitudinal data set of biomarkers and demographic and genotype information from 1624 participants from the Alzheimer’s Disease Neuroimaging Initiative. Using hidden Markov models, we characterized intermediate disease stages. We validated inferred disease trajectories by comparing time to first clinical AD diagnosis. We trained an L2‐regularized logistic regression model to predict disease trajectory and evaluated its discriminative performance on a test set.
Results: We identified 12 distinct disease states. Progression to AD occurred most often through one of two possible paths through these states. Paths differed in terms of rate of disease progression (by 5.44 years on average), amyloid and total‐tau (t‐tau) burden (by 10% and 69%, respectively), and hippocampal neurodegeneration (P < .001). On the test set, the predictive model achieved an area under the receiver operating characteristic curve of 0.85.
Discussion: Progression to AD, in terms of biomarker trajectories, can be predicted based on participant‐specific factors. Such disease staging tools could help in targeting high‐risk patients for therapeutic intervention trials. As longitudinal data with richer features are collected, such models will help increase our understanding of the factors that drive the different trajectories of AD.
Peer Reviewed
https://deepblue.lib.umich.edu/bitstream/2027.42/153030/1/dad2jdadm201806007.pd
Predicting 5‐year dementia conversion in veterans with mild cognitive impairment
INTRODUCTION: Identifying mild cognitive impairment (MCI) patients at risk for dementia could facilitate early interventions. Using electronic health records (EHRs), we developed a model to predict MCI to all‐cause dementia (ACD) conversion at 5 years.
METHODS: A Cox proportional hazards model was used to identify predictors of ACD conversion from EHR data in veterans with MCI. Model performance (area under the receiver operating characteristic curve [AUC] and Brier score) was evaluated on a held‐out data subset.
RESULTS: Of 59,782 MCI patients, 15,420 (25.8%) converted to ACD. The model had good discriminative performance (AUC 0.73 [95% confidence interval (CI) 0.72–0.74]) and calibration (Brier score 0.18 [95% CI 0.17–0.18]). Age, stroke, cerebrovascular disease, myocardial infarction, hypertension, and diabetes were risk factors, while body mass index, alcohol abuse, and sleep apnea were protective factors.
DISCUSSION: An EHR‐based prediction model had good performance in identifying 5‐year MCI to ACD conversion and has potential to assist triaging of at‐risk patients.
Highlights
Of 59,782 veterans with mild cognitive impairment (MCI), 15,420 (25.8%) converted to all‐cause dementia within 5 years.
Electronic health record prediction models demonstrated good performance (area under the receiver operating characteristic curve 0.73; Brier 0.18).
Age and vascular‐related morbidities were predictors of dementia conversion.
Synthetic data was comparable to real data in modeling MCI to dementia conversion.
Key Points
An electronic health record–based model using demographic and co‐morbidity data had good performance in identifying veterans who convert from mild cognitive impairment (MCI) to all‐cause dementia (ACD) within 5 years.
Increased age, stroke, cerebrovascular disease, myocardial infarction, hypertension, and diabetes were risk factors for 5‐year conversion from MCI to ACD.
High body mass index, alcohol abuse, and sleep apnea were protective factors for 5‐year conversion from MCI to ACD.
Models using synthetic data, analogs of real patient data that retain the distribution, density, and covariance between variables of real patient data but are not attributable to any specific patient, performed just as well as models using real patient data. This could have significant implications in facilitating widely distributed computing of health‐care data with minimal patient privacy concerns, which could accelerate scientific discoveries.
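The abstract reports calibration as a Brier score. A minimal sketch of the standard definition, the mean squared difference between predicted probability and the binary outcome, is shown below; the function name and the toy inputs are hypothetical and are not data from the study.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Toy inputs for illustration only (1 = converted to dementia, 0 = did not).
uninformative = brier_score([0.5, 0.5], [0, 1])
perfect = brier_score([1.0, 0.0], [1, 0])
```

Lower is better: a perfect forecaster scores 0, and always predicting 0.5 scores 0.25.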