Teaching deep learning causal effects improves predictive performance
Causal inference is a powerful statistical methodology for explanatory
analysis. Individualized treatment effect (ITE) estimation is a prominent
causal inference task that has become a fundamental research problem. ITE
estimation, when performed naively, tends to produce biased estimates. To
obtain unbiased estimates, counterfactual information is needed, which is not
directly observable from data. Based on mature domain knowledge, reliable
traditional methods to estimate ITE exist. In recent years, neural networks
have been widely used in clinical studies. Specifically, recurrent neural
networks (RNN) have been applied to temporal Electronic Health Records (EHR)
data analysis. However, RNNs are not guaranteed to automatically discover
causal knowledge, correctly estimate counterfactual information, and thus
correctly estimate the ITE. This lack of correct ITE estimates can hinder the
performance of the model. In this work we study whether RNNs can be guided to
correctly incorporate ITE-related knowledge and whether this improves
predictive performance. Specifically, we first describe a Causal-Temporal
Structure for temporal EHR data; then based on this structure, we estimate
sequential ITE along the timeline, using sequential Propensity Score Matching
(PSM); and finally, we propose a knowledge-guided neural network methodology to
incorporate estimated ITE. We demonstrate on real-world and synthetic data
(where the actual ITEs are known) that the proposed methodology can
significantly improve the prediction performance of RNNs.
Comment: 9 pages, 8 figures, in the process of SDM 202
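As a rough illustration of the matching idea named in the abstract above, the following Python sketch estimates per-unit treatment effects with propensity score matching at a single time step. It is not the authors' sequential procedure. The function names (`fit_propensity`, `psm_ite`, `make_synthetic`), the plain logistic-regression propensity model, the 1-nearest-neighbour matching with replacement, and the synthetic data with a constant true effect of 3 are all assumptions made for this example.

```python
import math
import random


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_propensity(X, t, lr=0.1, epochs=500):
    """Fit P(T=1 | x) with logistic regression trained by batch gradient descent.

    Assumption: a simple linear-logistic propensity model is adequate here.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, ti in zip(X, t):
            err = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - ti
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return lambda x: _sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def psm_ite(X, t, y):
    """Estimate one treatment effect per unit by matching on the propensity score.

    Each unit is paired with the unit from the opposite group whose propensity
    score is closest (matching with replacement). The matched outcome stands in
    for the unobserved counterfactual.
    """
    score = fit_propensity(X, t)
    ps = [score(x) for x in X]
    treated = [i for i, ti in enumerate(t) if ti == 1]
    control = [i for i, ti in enumerate(t) if ti == 0]
    ite = []
    for i in range(len(X)):
        pool = control if t[i] == 1 else treated
        j = min(pool, key=lambda k: abs(ps[k] - ps[i]))
        # Always report the effect as (treated outcome - control outcome).
        ite.append(y[i] - y[j] if t[i] == 1 else y[j] - y[i])
    return ite


def make_synthetic(n=200, seed=0):
    """Hypothetical data: x confounds both treatment and outcome; true effect is 3."""
    rng = random.Random(seed)
    X, t, y = [], [], []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        ti = 1 if rng.random() < _sigmoid(x) else 0  # treatment depends on x
        X.append([x])
        t.append(ti)
        y.append(2.0 * x + 3.0 * ti + rng.gauss(0.0, 0.1))
    return X, t, y
```

Averaging the list returned by `psm_ite(*make_synthetic())` gives an estimate of the average treatment effect. On this synthetic data it should land near the true value of 3, whereas a plain difference of group means would also absorb the effect of the confounder `x`.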
Discovery of Type 2 Diabetes Trajectories from Electronic Health Records
University of Minnesota Ph.D. dissertation. September 2020. Major: Health Informatics. Advisor: Gyorgy Simon. 1 computer file (PDF); xiii, 110 pages.
Type 2 diabetes (T2D) is one of the fastest growing public health concerns in the United States. There were 30.3 million patients (9.4% of the US population) suffering from diabetes in 2015. Diabetes, which is the seventh leading cause of death in the United States, is known to be a non-reversible (incurable) chronic disease, leading to severe complications, including chronic kidney disease, amputation, blindness, and various cardiac and vascular diseases. Early identification of patients at high risk is regarded as the most effective clinical tool to prevent or delay the development of diabetes, allowing patients to change their lifestyle or to receive medication earlier. In turn, these interventions can help decrease the risk of diabetes by 30-60%.
Many studies have aimed at the early identification of patients at high risk in clinical settings. These studies typically consider only the patient's current state at the time of the assessment and do not fully utilize all available information, such as the patient's medical history. Past history is important. It has been shown that laboratory results and vital signs can differ between diabetic and non-diabetic patients as early as 15-20 years before the onset of diabetes. We have also shown in our study that the order in which patients develop diabetes-related comorbidities is predictive of their diabetes risk, even after adjusting for the severity of the comorbidities.
In this thesis, we develop multiple novel methods to discover T2D trajectories from Electronic Health Records (EHR). We define a trajectory as the order in which diseases developed. We aim to discover typical and atypical trajectories, where typical trajectories represent the predominant patterns of progression and atypical trajectories are all the remaining ones.
Revealing trajectories allows us to divide patients into subpopulations, which can help uncover the underlying etiology of diabetes. More importantly, by assessing risk correctly and better understanding the heterogeneity of diabetes, we can provide better care. Because EHR data pose several challenges to identifying trajectories directly, we devise four specific studies to address them. First, we propose a new knowledge-driven representation for clinical data mining. Second, we demonstrate a method for estimating the onset time of slow-onset diseases from intermittently observable laboratory results in the specific context of T2D. Third, we present a method to infer trajectories, that is, the sequence of comorbidities potentially leading up to a particular disease of interest. Finally, we propose a novel method to discover multiple trajectories from EHR data. The patterns discovered in these four studies address a clinical issue, are clinically verifiable, and are amenable to deployment in practice to improve the quality of individual patient care and to promote public health in the United States.
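To make the abstract's definition of a trajectory (the order in which diseases developed) concrete, here is a minimal Python sketch. It is not one of the dissertation's methods. The helper names `trajectory` and `rank_trajectories`, the use of the first recorded diagnosis date as a stand-in for onset, and the patient records are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from datetime import date


def trajectory(diagnoses):
    """Order conditions by the first date each one was recorded.

    `diagnoses` is an iterable of (condition, date) pairs; repeated records of
    the same condition are collapsed to the earliest date. Ties on date are
    broken alphabetically so the result is deterministic.
    """
    first_seen = {}
    for condition, day in diagnoses:
        if condition not in first_seen or day < first_seen[condition]:
            first_seen[condition] = day
    return tuple(sorted(first_seen, key=lambda c: (first_seen[c], c)))


def rank_trajectories(patients):
    """Count how many patients share each trajectory, most frequent first."""
    counts = Counter(trajectory(records) for records in patients.values())
    return counts.most_common()


# Made-up records for three hypothetical patients.
example_patients = {
    "p1": [("obesity", date(2008, 3, 1)),
           ("hypertension", date(2011, 6, 15)),
           ("hypertension", date(2012, 1, 10))],
    "p2": [("hypertension", date(2013, 9, 2)),
           ("obesity", date(2010, 4, 20))],
    "p3": [("hypertension", date(2009, 5, 5)),
           ("obesity", date(2014, 2, 11))],
}
```

Calling `rank_trajectories(example_patients)` groups the patients by identical ordering. In the abstract's terms, the most frequent orderings would be candidates for typical trajectories and the remainder would be atypical, although the frequency cutoff that separates the two is not given in the text.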