Discovery of Type 2 Diabetes Trajectories from Electronic Health Records
University of Minnesota Ph.D. dissertation. September 2020. Major: Health Informatics. Advisor: Gyorgy Simon. 1 computer file (PDF); xiii, 110 pages. Type 2 diabetes (T2D) is one of the fastest-growing public health concerns in the United States. In 2015, 30.3 million people (9.4% of the US population) had diabetes. Diabetes, the seventh leading cause of death in the United States, is a non-reversible (incurable) chronic disease that leads to severe complications, including chronic kidney disease, amputation, blindness, and various cardiac and vascular diseases. Early identification of high-risk patients is regarded as the most effective clinical tool for preventing or delaying the development of diabetes, because it allows patients to change their lifestyle or to start medication earlier. In turn, these interventions can reduce the risk of diabetes by 30-60%. Many studies have aimed at the early identification of high-risk patients in clinical settings. These studies typically consider only the patient's current state at the time of assessment and do not fully use all available information, such as the patient's medical history. Past history matters: laboratory results and vital signs have been shown to differ between diabetic and non-diabetic patients as early as 15-20 years before the onset of diabetes. We have also shown that the order in which patients develop diabetes-related comorbidities is predictive of their diabetes risk, even after adjusting for the severity of the comorbidities. In this thesis, we develop multiple novel methods to discover T2D trajectories from Electronic Health Records (EHR). We define a trajectory as the order in which diseases develop. We aim to discover typical and atypical trajectories, where typical trajectories represent the predominant patterns of progression and atypical trajectories are the remaining ones.
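The trajectory definition above (the order in which diseases develop) and the typical/atypical split can be illustrated with a minimal sketch. This is not the thesis's method: the `(code, date)` event format, the frequency cut-off `min_share`, the function names, and the toy records are all hypothetical, chosen only to make the definitions concrete.

```python
from collections import Counter
from datetime import date

def trajectory(events):
    """Order distinct diagnosis codes by the date each first appears.

    `events` is a hypothetical list of (code, date) pairs for one patient.
    """
    first_seen = {}
    for code, when in events:
        if code not in first_seen or when < first_seen[code]:
            first_seen[code] = when
    # Ties on the same date are broken alphabetically for determinism.
    return tuple(sorted(first_seen, key=lambda c: (first_seen[c], c)))

def split_typical(trajectories, min_share=0.1):
    """Call a trajectory typical if at least `min_share` of patients follow it."""
    counts = Counter(trajectories)
    n = len(trajectories)
    typical = {t for t, c in counts.items() if c / n >= min_share}
    return typical, set(counts) - typical

# Toy records for illustration only, not real patient data.
patients = {
    "p1": [("HTN", date(2011, 1, 1)), ("HTN", date(2010, 1, 5)),
           ("HLD", date(2012, 3, 1)), ("T2D", date(2015, 6, 1))],
    "p2": [("HTN", date(2009, 2, 1)), ("HLD", date(2013, 1, 1)),
           ("T2D", date(2016, 1, 1))],
    "p3": [("HLD", date(2008, 1, 1)), ("OBESITY", date(2009, 1, 1)),
           ("T2D", date(2014, 1, 1))],
}
trajs = [trajectory(events) for events in patients.values()]
typical, atypical = split_typical(trajs, min_share=0.5)
```

A plain frequency threshold is the simplest possible notion of "predominant"; the thesis's own methods for discovering typical trajectories are considerably more involved.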
Revealing trajectories can help us divide patients into subpopulations, which may uncover the underlying etiology of diabetes. More importantly, by assessing risk correctly and by better understanding the heterogeneity of diabetes, we can provide better care. Because EHR data pose several challenges to identifying trajectories directly, we devise four studies to address them: first, we propose a new knowledge-driven representation for clinical data mining; second, we demonstrate a method for estimating the onset time of slow-onset diseases from intermittently observed laboratory results, in the specific context of T2D; third, we present a method to infer trajectories, that is, the sequence of comorbidities potentially leading up to a particular disease of interest; and finally, we propose a novel method to discover multiple trajectories from EHR data. The patterns discovered in these four studies address a clinical issue, are clinically verifiable, and are amenable to deployment in practice to improve the quality of individual patient care and promote public health in the United States.
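One simple way to reason about onset estimation from intermittently observed laboratory results is interval censoring: the true onset lies somewhere between the last normal result and the first abnormal one. The sketch below assumes HbA1c as the marker with the 6.5% diagnostic cut-off used by the American Diabetes Association, and uses midpoint imputation as a naive point estimate. It is an illustrative baseline under those assumptions, not the estimation method proposed in the thesis.

```python
from datetime import date

HBA1C_THRESHOLD = 6.5  # percent; ADA diagnostic cut-off (assumed marker)

def onset_interval(measurements, threshold=HBA1C_THRESHOLD):
    """Return (last_normal, first_abnormal) dates bracketing onset, or None.

    `measurements` is a hypothetical list of (date, value) pairs.
    """
    last_normal = None
    for when, value in sorted(measurements):
        if value >= threshold:
            return last_normal, when
        last_normal = when
    return None  # threshold never crossed: no onset observed

def midpoint_onset(measurements):
    """Naive point estimate: the midpoint of the censoring interval."""
    interval = onset_interval(measurements)
    if interval is None:
        return None
    lo, hi = interval
    if lo is None:
        return hi  # abnormal at first visit: onset is left-censored
    return lo + (hi - lo) / 2  # date + timedelta drops the sub-day part
```

For example, with results of 5.9% (2013-01-01), 6.1% (2014-01-01), and 6.8% (2015-01-01), the onset interval is 2014-01-01 to 2015-01-01 and the midpoint estimate is 2014-07-02. Wide gaps between visits make the midpoint very uncertain, which is part of why more careful estimation is needed.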
Building trajectories through clinical data to model disease progression
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Clinical trials are typically conducted over a population within a defined time period in order to illuminate certain characteristics of a health issue or disease process. These cross-sectional studies provide a snapshot of disease processes across a large number of people, but they do not allow us to model the temporal nature of disease, which is essential for detailed prognostic prediction. Longitudinal studies, on the other hand, explore how these processes develop over time in a number of people, but they can be expensive and time-consuming, and many cover only a relatively small window of the disease process. This thesis describes the application of intelligent data analysis techniques to extract information from time series generated by different diseases. Its aim is to identify intermediate stages in a disease process and sub-categories of the disease that exhibit subtly different symptoms. It explores a bootstrap technique that fits trajectories through the data, generating “pseudo time-series”. It addresses questions including how clinical variables interact as a disease progresses along the trajectories in the data, and how to automatically identify different disease states along these trajectories, as well as the transitions between them. The thesis documents how reliable time-series models can be created from large amounts of historical cross-sectional data, and how a novel relabeling/latent variable approach enables exploration of the temporal nature of disease progression. The proposed algorithms are tested extensively on simulated data and on three real clinical datasets. Finally, a study explores whether pseudo time-series models can be “calibrated” with real longitudinal data in order to improve them. Plausible directions for future research are discussed at the end of the thesis.
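The bootstrap idea of fitting trajectories through cross-sectional data can be sketched as follows: resample the patients, connect the resampled points with a minimum spanning tree (MST), and read one pseudo time-series off the tree path from a chosen healthy sample to a chosen diseased sample. The MST-path construction, Euclidean distance, dropping duplicate draws, and all function names are assumptions for illustration; the thesis's exact procedure may differ.

```python
import math
import random

def mst_adjacency(points):
    """Prim's algorithm on Euclidean distances; returns tree adjacency lists."""
    n = len(points)
    in_tree, best, parent = [False] * n, [math.inf] * n, [-1] * n
    best[0] = 0.0
    adj = {i: [] for i in range(n)}
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            adj[u].append(parent[u])
            adj[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return adj

def tree_path(adj, start, end):
    """The unique path between two nodes of a tree (iterative DFS)."""
    stack, seen = [(start, [start])], {start}
    while stack:
        node, path = stack.pop()
        if node == end:
            return path
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, path + [nxt]))
    return None

def pseudo_time_series(samples, start, end, rng):
    """One bootstrap replicate: resample, always keep start/end, order along the MST path."""
    others = [i for i in range(len(samples)) if i not in (start, end)]
    drawn = [start, end] + [rng.choice(others) for _ in others]
    unique = list(dict.fromkeys(drawn))  # drop duplicate draws; start=0, end=1
    path = tree_path(mst_adjacency([samples[i] for i in unique]), 0, 1)
    return [unique[k] for k in path]  # indices into the original samples
```

Repeating `pseudo_time_series` with different random seeds yields many pseudo time-series, which is what would then be used to train temporal models.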
Natural Language Processing of Clinical Notes on Chronic Diseases: Systematic Review
Novel approaches that complement and go beyond evidence-based medicine are required in the domain of chronic diseases, given the growing incidence of such conditions in the worldwide population. A promising avenue is the secondary use of electronic health records (EHRs), where patient data are analyzed to conduct clinical and translational research. Machine learning methods for processing EHRs are improving our understanding of patients' clinical trajectories and our ability to predict chronic disease risk, creating a unique opportunity to derive previously unknown clinical insights. However, a wealth of clinical history remains locked in free-form clinical narratives. Consequently, unlocking the full potential of EHR data depends on developing natural language processing (NLP) methods that automatically transform clinical text into structured clinical data that can guide clinical decisions and potentially delay or prevent disease onset.
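As a very small illustration of turning clinical text into structured data, the sketch below pulls HbA1c values out of a free-text note with a regular expression. The pattern and function name are hypothetical; rule-based matching like this ignores negation, dates, and context, and the methods covered by the review are far richer.

```python
import re

# Hypothetical pattern for mentions such as "HbA1c 7.2%" or "A1c: 6.9 %".
A1C_PATTERN = re.compile(
    r"\b(?:hba1c|hb\s*a1c|a1c)\b\s*(?:was|of|is|:|=)?\s*(\d{1,2}(?:\.\d)?)\s*%",
    re.IGNORECASE,
)

def extract_a1c(note):
    """Return the HbA1c values (in percent) mentioned in a free-text note."""
    return [float(m.group(1)) for m in A1C_PATTERN.finditer(note)]
```

For the note "Last HbA1c 7.2% in March; repeat A1c: 6.9 % today.", `extract_a1c` returns `[7.2, 6.9]`.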
Adiabatic dynamic causal modelling
This technical note introduces adiabatic dynamic causal modelling, a method for inferring slow changes in biophysical parameters that control fluctuations of fast neuronal states. The application domain we have in mind is inferring slow changes in variables (e.g., extracellular ion concentrations or synaptic efficacy) that underlie phase transitions in brain activity (e.g., paroxysmal seizure activity). The scheme is efficient and yet retains a biophysical interpretation, by virtue of being based on established neural mass models that are equipped with a slow dynamic on the parameters (such as synaptic rate constants or effective connectivity). In brief, we use an adiabatic approximation to summarise fast fluctuations in hidden neuronal states (and their expression in sensors) in terms of their second-order statistics; namely, their complex cross spectra. This allows one to specify and compare models of slowly changing parameters (using Bayesian model reduction) that generate a sequence of empirical cross spectra of electrophysiological recordings. Crucially, we use the slow fluctuations in the spectral power of neuronal activity as empirical priors on changes in synaptic parameters. This introduces a circular causality, in which synaptic parameters underwrite fast neuronal activity that, in turn, induces activity-dependent plasticity in synaptic parameters. In this foundational paper, we describe the underlying model, establish its face validity using simulations, and provide an illustrative application to a chemoconvulsant animal model of seizure activity.
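The data feature at the heart of the scheme, fast activity summarised by its cross spectra and computed epoch by epoch so that slow changes show up as a sequence of spectra, can be illustrated with a plain discrete Fourier transform. This sketch covers only that feature-extraction step, under assumed window and step sizes; it contains no neural mass model or Bayesian model reduction, and a real analysis would use tapering and averaging (for example Welch's method).

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def cross_spectrum(x, y):
    """Cross spectrum X(f) * conj(Y(f)) / N for the non-negative frequency bins."""
    n = len(x)
    X, Y = dft(x), dft(y)
    return [X[k] * Y[k].conjugate() / n for k in range(n // 2 + 1)]

def sliding_cross_spectra(x, y, window, step):
    """One cross spectrum per epoch, giving a sequence over slow time."""
    return [cross_spectrum(x[s:s + window], y[s:s + window])
            for s in range(0, len(x) - window + 1, step)]
```

With `x == y` each spectrum is a real, non-negative power spectrum; if the amplitude of an oscillation grows from epoch to epoch, the power at its frequency bin grows across the sequence, which is the kind of slow fluctuation in spectral power the note describes.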
Improving Diagnostics with Deep Forest Applied to Electronic Health Records
An electronic health record (EHR) is a vital, high-dimensional collection of medical concepts. Discovering implicit correlations in this data, and exploiting its research and informative aspects, can improve treatment and management processes. The challenge addressed here is that limitations of the data sources make it hard to find a stable model that relates medical concepts and exploits the connections between them. This paper presents Patient Forest, a novel end-to-end approach for learning patient representations from tree-structured data for readmission and mortality prediction tasks. By leveraging statistical features, the proposed model provides an accurate and reliable classifier for predicting readmission and mortality. Experiments on the MIMIC-III and eICU datasets show that Patient Forest outperforms existing machine learning models, especially when training data are limited. Additionally, a qualitative evaluation that visualises the learnt representations in 2D space using t-SNE further confirms the effectiveness of the proposed model in learning EHR representations.
Machine learning in the social and health sciences
The uptake of machine learning (ML) approaches in the social and health sciences has been rather slow, and research using ML for social and health research questions remains fragmented. This may be due to the separate development of research in the computational/data sciences versus the social and health sciences, as well as a lack of accessible overviews and adequate training in ML techniques for researchers outside data science. This paper provides a meta-mapping of research questions in the social and health sciences to appropriate ML approaches, incorporating the requirements for statistical analysis in these disciplines. We map the established classification into description, prediction, and causal inference onto common research goals, such as estimating the prevalence of adverse health or social outcomes, predicting the risk of an event, and identifying risk factors or causes of adverse outcomes. This meta-mapping aims to overcome disciplinary barriers and to start a fluid dialogue between researchers from the social and health sciences and methodologically trained researchers. Such a mapping may also help to fully exploit the benefits of ML while considering domain-specific aspects relevant to the social and health sciences, and will hopefully contribute to accelerating the uptake of ML applications to advance both basic and applied social and health sciences research.
Monotonic Gaussian Process for Spatio-Temporal Disease Progression Modeling in Brain Imaging Data
We introduce a probabilistic generative model for disentangling spatio-temporal disease trajectories from series of high-dimensional brain images. The model is based on spatio-temporal matrix factorization, where inference on the sources is constrained by anatomically plausible statistical priors. To model realistic trajectories, the temporal sources are defined as monotonic and time-reparametrized Gaussian Processes. To account for the non-stationarity of brain images, we model the spatial sources as sparse codes convolved at multiple scales. The method was tested on synthetic data, comparing favourably with standard blind source separation approaches. Applied to large-scale imaging data from a clinical study, it disentangles differential temporal progression patterns mapping brain regions key to neurodegeneration, while revealing a disease-specific time scale associated with the clinical diagnosis.
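The two constraints placed on the temporal sources, monotonicity and time reparametrization, can be shown in isolation. The sketch below is not a Gaussian process model and is not the paper's construction: it assumes one common way of obtaining a monotonic curve, integrating a softplus-transformed unconstrained sequence (standing in for a function sample), plus an affine time warp with a hypothetical per-subject `shift` and `rate`.

```python
import math

def softplus(z):
    """Numerically stable log(1 + exp(z)); always positive."""
    return math.log1p(math.exp(-abs(z))) + max(z, 0.0)

def monotonic_trajectory(raw, dt=1.0, start=0.0):
    """Map an unconstrained sequence to a strictly increasing one.

    Each value becomes a positive increment via softplus, and the
    increments are accumulated (a discrete integral).
    """
    values, level = [], start
    for r in raw:
        level += softplus(r) * dt
        values.append(level)
    return values

def reparametrize(t, shift, rate=1.0):
    """Hypothetical subject-specific time warp: disease time = rate * (t - shift)."""
    return rate * (t - shift)
```

Because every increment is positive, any input sequence yields an increasing trajectory; the time warp then lets subjects observed at different ages be placed on a shared disease time axis.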
Predicting brain age from functional connectivity in symptomatic and preclinical Alzheimer disease
Brain-predicted age quantifies apparent brain age relative to normative neuroimaging trajectories. Advanced brain-predicted age is well established in symptomatic Alzheimer disease (AD) but underexplored in preclinical AD. Prior brain-predicted age studies have typically used structural MRI, while resting-state functional connectivity (FC) remains underexplored. Our model predicted age from FC in 391 cognitively normal, amyloid-negative controls (ages 18-89). We applied the trained model to 145 amyloid-negative, 151 preclinical AD, and 156 symptomatic AD participants to test group differences. The model accurately predicted age in the training set. FC-predicted brain age gaps (FC-BAG) indicated significantly older brain age in symptomatic AD and significantly younger brain age in preclinical AD compared to controls. There was minimal correspondence between the networks predictive of age and those predictive of AD. Elevated FC-BAG may reflect network disruption during symptomatic AD. Reduced FC-BAG in preclinical AD was opposite to the expected direction and may reflect a biphasic response to preclinical AD pathology, or may be driven by inconsistency between age-related and AD-related networks. Overall, FC-predicted brain age may be a sensitive AD biomarker.
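The brain age gap is the difference between model-predicted age and chronological age, so a positive gap means the brain "looks" older than expected. A minimal sketch of computing per-participant gaps and group means follows; the `(group, predicted, chronological)` record format and function names are assumptions, and it omits any age-bias correction or statistical testing the study may have applied.

```python
from statistics import mean

def brain_age_gap(predicted, chronological):
    """BAG = predicted age - chronological age, per participant."""
    return [p - c for p, c in zip(predicted, chronological)]

def group_mean_gap(records):
    """Mean BAG per group from hypothetical (group, predicted, chronological) tuples."""
    by_group = {}
    for group, pred, chron in records:
        by_group.setdefault(group, []).append(pred - chron)
    return {g: mean(gaps) for g, gaps in by_group.items()}
```

Comparing these group means against the control group is the basic quantity behind statements such as "older in symptomatic AD, younger in preclinical AD".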
- …