Recommended from our members
Predicting Comorbidities Using Resampling and Dynamic Bayesian Networks with Latent Variables
Combined supervised and unsupervised learning to identify subclasses of disease for better prediction
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London. Disease subtyping, which aids in the development of personalised treatments, remains a challenge in data analysis because there are many different ways to group patients based on their data. However, if I can identify subclasses of disease, this will help to develop better models that are more specific to individuals and should therefore improve prediction and understanding of the underlying characteristics of the disease in question. In addition, patients might suffer from multiple disease complications. Models that are tailored to individuals could improve both the prediction of multiple complications and the understanding of underlying disease characteristics. However, AI models can become outdated over time due to either sudden changes in the underlying data, such as those caused by new measurement methods, or incremental changes, such as the ageing of the study population. This thesis proposes a new algorithm that integrates consensus clustering methods with classification in order to overcome issues with sample bias. The method was tested on a freely available dataset of real-world breast cancer cases and on data from a London hospital on systemic sclerosis, a rare and potentially fatal condition. The results show that nearest consensus clustering classification significantly improves accuracy and prediction when compared with similar competing methods. In addition, this thesis proposes a new algorithm that integrates latent class models with classification. The new algorithm uses latent class models to cluster patients into groups; this improves classification and aids understanding of the underlying differences between the discovered groups. The method was tested on data from patients with systemic sclerosis (SSc) and coronary heart disease.
Results show that the latent class multi-label classification (MLC) model improves accuracy when compared with similar competing methods. Finally, this thesis implemented an updated concept drift detection method (DDM) to monitor AI models over time and detect drifts when they occur. The method was tested on data from patients with SSc and patients with coronavirus disease (COVID).
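The abstract does not describe the "updated" DDM in detail. As a reference point, the following is a minimal sketch of the classic Drift Detection Method of Gama et al. (2004), which tracks a model's running error rate p and its binomial standard deviation s, remembers the point where p + s was lowest, and signals a warning or a drift when p + s rises 2 or 3 standard deviations above that minimum. The class name, default thresholds, 30-sample warm-up and the synthetic error stream are assumptions for illustration, not the thesis's variant.

```python
import math


class DDM:
    """Classic Drift Detection Method (Gama et al., 2004); not the thesis's updated variant."""

    def __init__(self, min_samples: int = 30, warning_level: float = 2.0,
                 drift_level: float = 3.0) -> None:
        self.min_samples = min_samples
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.reset()

    def reset(self) -> None:
        self.n = 0             # instances seen since the last reset
        self.p = 0.0           # running error rate
        self.p_min = math.inf  # error rate where p + s was lowest
        self.s_min = math.inf  # standard deviation at that point

    def update(self, error: int) -> str:
        """Feed 1 if the model misclassified the instance, else 0."""
        self.n += 1
        self.p += (error - self.p) / self.n            # incremental mean of the errors
        s = math.sqrt(self.p * (1 - self.p) / self.n)  # binomial standard deviation
        if self.n < self.min_samples:
            return "stable"
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        if self.p + s >= self.p_min + self.drift_level * self.s_min:
            self.reset()  # a new concept starts: forget the old statistics
            return "drift"
        if self.p + s >= self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


# Hypothetical error stream: 10% errors, then the model degrades to 50% errors.
stream = [1 if i % 10 == 0 else 0 for i in range(300)] + \
         [1 if i % 2 == 0 else 0 for i in range(300)]
detector = DDM()
states = [detector.update(e) for e in stream]
```

In a monitoring setting, a "drift" state would typically trigger retraining on recent data, while a "warning" state is often used to start buffering new examples.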
Identifying Latent Variables in Dynamic Bayesian Networks with Bootstrapping Applied to Type 2 Diabetes Complication Prediction
Predicting Type 2 Diabetes Complications and Personalising Patient Using Artificial Intelligence Methodology
The prediction of the onset of different complications of disease is, in general, challenging due to unmeasured risk factors, imbalanced data, time-varying data due to dynamics, and the various interventions made to the disease over time. Scholars commonly argue that many Artificial Intelligence techniques that successfully model disease are in the form of a “black box” whose internal workings and complexities are extremely difficult to understand, from both practitioners' and patients' perspectives. There is a need for appropriate Artificial Intelligence techniques to build predictive models that not only capture unmeasured effects to improve prediction, but are also transparent in how they model data, so that knowledge about disease processes can be extracted and clinicians can maintain trust in the model. The proposed strategy builds probabilistic graphical models for prediction that include informative hidden variables. These are added in a stepwise manner to improve predictive performance while keeping the model as simple as possible, which is regarded as crucial for interpreting the prediction results. This chapter explores this key issue with a specific focus on diabetes data. According to the literature on disease modelling, especially on major diseases such as diabetes, a patient's death is often caused by the complications the disease causes over time rather than by the disease itself. This is often patient-specific and depends on the cohort to which a patient belongs. Another main focus of this study is patient personalisation via precision medicine, by discovering meaningful subgroups of patients that are characterised as phenotypes. These phenotypes are explained further using Bayesian network analysis methods and temporal association rules. Overall, this chapter discusses the earlier research of the chapter's author.
It explores Intelligent Data Analysis (IDA) techniques for modelling the progression of disease while simultaneously stratifying patients, in as transparent a manner as possible. To this end, it reviews the current literature on some of the most common Artificial Intelligence (AI) methodologies, including probabilistic modelling, association rule mining, phenotype discovery and latent variable discovery, using diabetes as a case study.
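The simplest dynamic Bayesian network with a hidden variable is a hidden Markov model. The following is a minimal sketch of forward filtering in such a model. It shows how an unobserved risk state can be inferred step by step from observed complications. The two-state model, all probabilities and the function name are hypothetical. The chapter's stepwise search for which hidden variables to add is not shown.

```python
import numpy as np


def forward_filter(pi, A, B, observations):
    """Filtered posteriors P(z_t | x_1..x_t) for a discrete HMM (a DBN with one hidden node)."""
    pi, A, B = (np.asarray(m, dtype=float) for m in (pi, A, B))
    alpha = pi * B[:, observations[0]]
    alpha /= alpha.sum()
    filtered = [alpha]
    for obs in observations[1:]:
        alpha = (alpha @ A) * B[:, obs]  # predict with the transition model, weight by evidence
        alpha /= alpha.sum()             # normalise to a probability distribution
        filtered.append(alpha)
    return np.array(filtered)


# Hypothetical model: hidden risk state 0 = "low", 1 = "high";
# observation 0 = "no complication recorded", 1 = "complication recorded".
pi = [0.9, 0.1]
A = [[0.95, 0.05],
     [0.00, 1.00]]
B = [[0.9, 0.1],
     [0.3, 0.7]]
posteriors = forward_filter(pi, A, B, [0, 0, 1, 1, 1])
```

After three consecutive recorded complications, the filtered probability of the high-risk state rises well above its initial value. This is the kind of unmeasured effect that a hidden node is meant to capture.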
Incorporating Particle Filtering and System Dynamic Modelling in Infection Transmission of Measles and Pertussis
Childhood viral and bacterial infections remain an important public health problem, and research into their dynamics has broader scientific implications for understanding both dynamical systems and the associated methodologies at the population level. Measles and pertussis are two important childhood infectious diseases. Measles is highly transmissible and is one of the leading causes of death among children under 5 globally. Pertussis (whooping cough) is another common childhood infectious disease; it is most harmful to babies and young children and can be deadly.
While ongoing surveillance data and, more recently, dynamic models offer insight into measles (or pertussis) dynamics, both suffer notable shortcomings when applied to measles (or pertussis) outbreak prediction. In this thesis, I apply the Sequential Monte Carlo approach of particle filtering, incorporating reported measles and pertussis incidence for Saskatchewan during the pre-vaccination era, using adaptations of previously contributed measles and pertussis compartmental models. To secure further insight, I also perform particle filtering on age-structured adaptations of the models. For some models, I further consider two different methods of configuring the contact matrix.
The results indicate that, when used with a suitable dynamic model, particle filtering can offer high predictive capacity for measles and pertussis dynamics and outbreak occurrence in a low-vaccination context. Using the most competitive model, as evaluated by predictive accuracy, I performed prediction and outbreak classification analysis. The prediction results demonstrated that the most competitive models could predict measles and pertussis outbreak patterns and classify whether an outbreak will occur in the next month (area under the ROC curve: 0.89 for measles and 0.91 for pertussis).
I conclude that anticipating the outbreak dynamics of measles and pertussis in low-vaccination regions, by applying particle filtering with simple measles and pertussis transmission models and incorporating time series of reported case counts, is a valuable technique to assist public health authorities in estimating the risk and magnitude of measles and pertussis outbreaks. Such an approach offers a particularly strong value proposition for other pathogens with little-known dynamics and important latent drivers, and in the context of the growing number of high-velocity electronic data sources. Strong additional benefits are also likely to be realized by extending the application of this technique to highly vaccinated populations.
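The following is a minimal sketch of a bootstrap particle filter of the kind described above. Each particle carries a compartmental state. Particles are propagated through a stochastic transmission model and weighted by how well they explain the reported case count. They are then resampled. As a stand-in for the thesis's measles and pertussis models, it assumes a plain chain-binomial SIR model with Poisson under-reporting. It also assumes hypothetical parameters beta, gamma and rho and a synthetic case series. No age structure or contact matrix is included.

```python
import math

import numpy as np


def sir_step(S, I, beta, gamma, N, rng):
    """One chain-binomial SIR time step, vectorised over particles."""
    p_infect = 1.0 - np.exp(-beta * I / N)  # per-susceptible infection probability
    new_inf = rng.binomial(S, p_infect)
    new_rec = rng.binomial(I, gamma)
    return S - new_inf, I + new_inf - new_rec, new_inf


def particle_filter(cases, N, beta, gamma, rho, n_particles=1000, I0=10, seed=0):
    """Bootstrap particle filter; returns filtered mean incidence and a log-likelihood estimate."""
    rng = np.random.default_rng(seed)
    S = np.full(n_particles, N - I0, dtype=np.int64)
    I = np.full(n_particles, I0, dtype=np.int64)
    log_lik, mean_incidence = 0.0, []
    for y in cases:
        S, I, new_inf = sir_step(S, I, beta, gamma, N, rng)  # propagate each particle
        lam = rho * new_inf + 1e-9                            # expected reported cases
        log_w = y * np.log(lam) - lam - math.lgamma(y + 1)    # Poisson log-likelihood
        m = log_w.max()
        w = np.exp(log_w - m)
        log_lik += m + math.log(w.mean())                     # log p(y_t | y_1..y_{t-1})
        w /= w.sum()
        mean_incidence.append(float(np.sum(w * new_inf)))
        idx = rng.choice(n_particles, size=n_particles, p=w)  # multinomial resampling
        S, I = S[idx], I[idx]
    return np.array(mean_incidence), log_lik


# Synthetic case counts simulated from the same model (hypothetical parameters).
rng = np.random.default_rng(1)
S, I = np.array([99_990]), np.array([10])
cases = []
for _ in range(30):
    S, I, new_inf = sir_step(S, I, 0.6, 0.3, 100_000, rng)
    cases.append(int(rng.poisson(0.5 * new_inf[0])))

estimates, log_lik = particle_filter(cases, N=100_000, beta=0.6, gamma=0.3, rho=0.5)
```

The returned log-likelihood estimate can be used to compare candidate models, which mirrors how the thesis selects the most competitive model by predictive performance. The filtered states can also be projected forward to forecast the next period's incidence.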
Opening the black box: Personalizing type 2 diabetes patients based on their latent phenotype and temporal associated complication rules
© 2020 The Authors. It is widely considered that approximately 10% of the population suffers from type 2 diabetes. Unfortunately, the impact of this disease is underestimated. Patients' mortality often occurs due to complications caused by the disease and not the disease itself. Many techniques used in modeling diseases are in the form of a “black box” whose internal workings and complexities are extremely difficult to understand, from both practitioners' and patients' perspectives. In this work, we address this issue and present an informative model/pattern, known as a “latent phenotype,” with the aim of capturing the complexities of the associated complications over time. We further extend this idea by using a combination of temporal association rule mining and unsupervised learning to find explainable subgroups of patients with more personalized prediction. Our extensive findings show how uncovering the latent phenotype aids in distinguishing the disparities among subgroups of patients based on their complication patterns. We gain insight into how best to enhance prediction performance and reduce bias in the applied models by using uncertainty in the patients' data.
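One step of temporal association rule mining is scoring a rule of the form "complication A is later followed by complication B". The following is a minimal sketch of that step using support and confidence. It assumes a simple "strictly later" relation with no time window. The paper's exact rule definition may differ. The function name and patient records are hypothetical.

```python
def temporal_rule_stats(patients, antecedent, consequent):
    """Support and confidence of the rule 'antecedent, later followed by consequent'.

    patients: list of per-patient event lists of (time, event) pairs.
    """
    n_antecedent = n_rule = 0
    for events in patients:
        first_a = min((t for t, e in events if e == antecedent), default=None)
        if first_a is None:
            continue  # rule does not apply to this patient
        n_antecedent += 1
        if any(e == consequent and t > first_a for t, e in events):
            n_rule += 1
    support = n_rule / len(patients)
    confidence = n_rule / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical records: (time step, complication)
patients = [
    [(1, "hypertension"), (3, "retinopathy")],
    [(2, "hypertension"), (5, "neuropathy")],
    [(1, "retinopathy"), (4, "hypertension")],
    [(2, "neuropathy")],
]
support, confidence = temporal_rule_stats(patients, "hypertension", "retinopathy")
```

Here the third patient does not count towards the rule, because retinopathy precedes hypertension. That ordering sensitivity is what distinguishes temporal rules from ordinary co-occurrence rules.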
Discovery of Type 2 Diabetes Trajectories from Electronic Health Records
University of Minnesota Ph.D. dissertation. September 2020. Major: Health Informatics. Advisor: Gyorgy Simon. 1 computer file (PDF); xiii, 110 pages. Type 2 diabetes (T2D) is one of the fastest-growing public health concerns in the United States. In 2015, 30.3 million people (9.4% of the US population) had diabetes. Diabetes, the seventh leading cause of death in the United States, is a non-reversible (incurable) chronic disease that leads to severe complications, including chronic kidney disease, amputation, blindness, and various cardiac and vascular diseases. Early identification of high-risk patients is regarded as the most effective clinical tool to prevent or delay the development of diabetes, allowing patients to change their lifestyle or to receive medication earlier. In turn, these interventions can decrease the risk of diabetes by 30-60%. Many studies have aimed at the early identification of high-risk patients in clinical settings. These studies typically consider only the patient's current state at the time of assessment and do not fully use all available information, such as the patient's medical history. Past history is important: laboratory results and vital signs have been shown to differ between diabetic and non-diabetic patients as many as 15-20 years before the onset of diabetes. We have also shown in our study that the order in which patients develop diabetes-related comorbidities is predictive of their diabetes risk, even after adjusting for the severity of the comorbidities. In this thesis, we develop multiple novel methods to discover T2D trajectories from Electronic Health Records (EHR). We define a trajectory as the order in which diseases develop. We aim to discover typical and atypical trajectories, where typical trajectories represent predominant patterns of progression and atypical trajectories are the remaining ones.
Revealing trajectories allows us to divide patients into subpopulations that can uncover the underlying etiology of diabetes. More importantly, by assessing risk correctly and better understanding the heterogeneity of diabetes, we can provide better care. Because EHR data pose several challenges to identifying trajectories directly, we devise four specific studies to address them: first, we propose a new knowledge-driven representation for clinical data mining; second, we demonstrate a method for estimating the onset time of slow-onset diseases from intermittently observed laboratory results in the specific context of T2D; third, we present a method to infer trajectories, the sequences of comorbidities potentially leading up to a particular disease of interest; and finally, we propose a novel method to discover multiple trajectories from EHR data. The patterns discovered in these four studies address a clinical issue, are clinically verifiable, and are amenable to deployment in practice to improve the quality of individual patient care and promote public health in the United States.
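The trajectory definition above ("the order in which diseases develop") can be made concrete with a minimal sketch. Each patient's diagnoses are reduced to the order of first occurrence. Orders shared by at least a chosen fraction of patients are then labelled typical. The frequency threshold is an assumption for illustration; the thesis's discovery methods are more elaborate. The function names, diagnosis codes (HTN hypertension, HLD hyperlipidemia, CKD chronic kidney disease) and records are hypothetical.

```python
from collections import Counter


def trajectory(diagnoses):
    """Order in which diseases first appear; diagnoses is a list of (time, disease) pairs."""
    first_seen = {}
    for t, disease in sorted(diagnoses):
        first_seen.setdefault(disease, t)  # keep only the first occurrence
    return tuple(sorted(first_seen, key=first_seen.get))


def split_trajectories(patients, min_share=0.3):
    """Label trajectories shared by at least `min_share` of patients as typical."""
    counts = Counter(trajectory(p) for p in patients)
    n = len(patients)
    typical = {t: c for t, c in counts.items() if c / n >= min_share}
    atypical = {t: c for t, c in counts.items() if c / n < min_share}
    return typical, atypical


# Hypothetical records: (year, diagnosis code)
patients = [
    [(2010, "HTN"), (2012, "HLD"), (2015, "T2D")],
    [(2011, "HTN"), (2014, "HTN"), (2013, "HLD"), (2016, "T2D")],
    [(2009, "HLD"), (2012, "HTN"), (2018, "T2D")],
    [(2010, "HTN"), (2011, "HLD"), (2012, "T2D")],
    [(2013, "CKD"), (2014, "T2D")],
]
typical, atypical = split_trajectories(patients)
```

Using first occurrence means repeated diagnoses (the second patient's 2014 HTN record) do not change the trajectory. This is one simple way of handling the intermittent, repetitive nature of EHR data.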
Network analysis of 18 attention-deficit/hyperactivity disorder symptoms suggests the importance of “Distracted” and “Fidget” as central symptoms: Invariance across age, gender, and subtype presentations
The network theory of mental disorders conceptualizes psychiatric disorders as networks of symptoms that causally interact with each other. Our present study aimed to explore the symptom structure in children with attention-deficit/hyperactivity disorder (ADHD) using network analysis. A symptom network based on the 18 items of the ADHD Rating Scale-IV was evaluated in 4,033 children and adolescents with ADHD. The importance of nodes was evaluated quantitatively by examining centrality indices, including Strength, Betweenness and Closeness, as well as Predictability and Expected Influence (EI). In addition, we compared the network structure across subgroups defined by ADHD subtype, gender and age group to evaluate its invariance. A three-community structure was identified, comprising inattentive, hyperactive and impulsive clusters. For the centrality indices, the “Distracted” and “Fidget” nodes showed high closeness and betweenness and represented a bridge linking the inattentive and hyperactive/impulsive domains. “Details” and “Fidget” were the most commonly endorsed symptoms in the inattentive and hyperactive/impulsive domains, respectively. By contrast, the “Listen” item formed a peripheral node with weak links to all other items within the inattentive cluster, and the “Loss” item was the least central node by all measures of centrality and had a low predictability value. The network structure was relatively invariant across gender, age and ADHD subtypes/presentations. The 18 core ADHD symptom items appear to be neither equivalent nor interchangeable. “Distracted” and “Fidget” should be considered central, or core, symptoms for further evaluation and intervention. The network-informed differentiation of these symptoms has the potential to refine the phenotype and reduce heterogeneity.
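Two of the centrality indices named above have simple closed forms on a weighted symptom network. Strength is the sum of the absolute weights of a node's edges. One-step Expected Influence is the signed sum, so negative edges reduce it. The following is a minimal sketch with a hypothetical three-node weight matrix, for example partial correlations. These numbers are not the study's estimates. Betweenness, Closeness and Predictability are not computed here.

```python
import numpy as np


def strength_and_expected_influence(W):
    """Node strength (sum of |edge weights|) and one-step expected influence (signed sum)."""
    W = np.asarray(W, dtype=float)
    off_diag = W - np.diag(np.diag(W))  # ignore any self-loops
    strength = np.abs(off_diag).sum(axis=1)
    expected_influence = off_diag.sum(axis=1)
    return strength, expected_influence


# Hypothetical edge weights for three symptom nodes.
nodes = ["Distracted", "Fidget", "Listen"]
W = [[0.0, 0.4, 0.1],
     [0.4, 0.0, -0.2],
     [0.1, -0.2, 0.0]]
strength, ei = strength_and_expected_influence(W)
```

In this toy network, "Fidget" has the highest strength (0.6) but a lower expected influence (0.2) than "Distracted" (0.5), because one of its edges is negative. This is why studies often report both indices.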
Mixture of Coupled HMMs for Robust Modeling of Multivariate Healthcare Time Series
Analysis of multivariate healthcare time series data is inherently challenging: irregular sampling, noisy and missing values, and heterogeneous patient groups with different dynamics violating exchangeability. In addition, interpretability and quantification of uncertainty are critically important. Here, we propose a novel class of models, a mixture of coupled hidden Markov models (M-CHMM), and demonstrate how it elegantly overcomes these challenges. To make the model learning feasible, we derive two algorithms to sample the sequences of the latent variables in the CHMM: samplers based on (i) particle filtering and (ii) factorized approximation. Compared to existing inference methods, our algorithms are computationally tractable, improve mixing, and allow for likelihood estimation, which is necessary to learn the mixture model. Experiments on challenging real-world epidemiological and semi-synthetic data demonstrate the advantages of the M-CHMM: improved data fit, capacity to efficiently handle missing and noisy measurements, improved prediction accuracy, and ability to identify interpretable subsets in the data. Comment: 9 pages, 7 figures, Proceedings of Machine Learning Research, Machine Learning for Health (ML4H) 202
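The abstract states that likelihood estimation is necessary to learn the mixture. The following minimal sketch shows why. To assign each patient's sequence to a mixture component, one needs log p(x_n | component k) for every component. Those log-likelihoods are combined with the mixing weights via a numerically stable log-sum-exp. The function name and the log-likelihood values are hypothetical, and this is the generic mixture E-step rather than the paper's exact learning procedure. The values are on the scale where naive exponentiation underflows to zero.

```python
import numpy as np


def responsibilities(log_lik, log_weights):
    """E-step of a mixture model: P(component k | sequence n), computed in log space.

    log_lik:     (N, K) array of log p(x_n | component k), e.g. particle-filter estimates
    log_weights: (K,) array of log mixing proportions
    """
    joint = np.asarray(log_lik, dtype=float) + np.asarray(log_weights, dtype=float)
    m = joint.max(axis=1, keepdims=True)
    log_norm = m + np.log(np.exp(joint - m).sum(axis=1, keepdims=True))  # log-sum-exp
    return np.exp(joint - log_norm)


# Hypothetical per-patient log-likelihoods under two CHMM components.
log_lik = np.array([[-1000.0, -1005.0],
                    [-2100.0, -2000.0]])
r = responsibilities(log_lik, np.log([0.5, 0.5]))
new_weights = r.mean(axis=0)  # M-step update of the mixing proportions
```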
Predicting non-attendance in hospital outpatient appointments using Deep Learning Approach
Hospital outpatient non-attendance imposes a substantial financial burden on hospitals and stems from multiple diverse causes. This research aims to build an advanced model for predicting non-attendance that considers the whole spectrum of probable contributing factors, which could be collated from heterogeneous sources including electronic patient records and external non-hospital data. We propose a new non-attendance prediction model based on deep neural networks and machine learning models. The proposed approach uses sparse stacked denoising autoencoders (SDAEs) to learn the underlying manifold of the data, thereby compacting information and providing a better representation that can subsequently be used by other learning models as well. The proposed approach is evaluated on real hospital data and compared with several well-known and scalable machine learning models. The evaluation results show that the proposed approach with a softmax layer and logistic regression outperforms the other methods in practice.
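The following is a minimal sketch of a stacked denoising autoencoder built from plain NumPy, to illustrate the representation-learning step. Each layer sees a corrupted input (a random fraction of features set to zero) but is trained to reconstruct the clean input. Layers are then trained greedily, each on the codes of the previous one. It assumes sigmoid encoders, linear decoders and full-batch gradient descent. It omits the sparsity penalty of the paper's sparse SDAEs. The layer sizes, function names and random feature matrix are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_dae(X, n_hidden=4, noise=0.3, lr=0.1, epochs=300, seed=0):
    """Train one denoising autoencoder layer (sigmoid encoder, linear decoder)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1, b1 = rng.normal(0.0, 0.1, (d, n_hidden)), np.zeros(n_hidden)
    W2, b2 = rng.normal(0.0, 0.1, (n_hidden, d)), np.zeros(d)
    for _ in range(epochs):
        X_noisy = X * (rng.random(X.shape) > noise)  # masking noise: zero a `noise` fraction
        H = sigmoid(X_noisy @ W1 + b1)               # encode the corrupted input
        err = (H @ W2 + b2) - X                      # but reconstruct the clean input
        # Gradients of 0.5 * mean squared reconstruction error
        dW2, db2 = H.T @ err / n, err.mean(axis=0)
        dZ = (err @ W2.T) * H * (1.0 - H)
        dW1, db1 = X_noisy.T @ dZ / n, dZ.mean(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return W1, b1, W2, b2


def reconstruction_loss(X, params):
    W1, b1, W2, b2 = params
    return 0.5 * np.mean(np.sum((sigmoid(X @ W1 + b1) @ W2 + b2 - X) ** 2, axis=1))


def stack_daes(X, layer_sizes, **kwargs):
    """Greedy layer-wise stacking: each layer is trained on the previous layer's codes."""
    encoders, H = [], X
    for size in layer_sizes:
        W1, b1, _, _ = train_dae(H, n_hidden=size, **kwargs)
        encoders.append((W1, b1))
        H = sigmoid(H @ W1 + b1)
    return encoders, H


# Hypothetical feature matrix: 200 appointments x 8 features scaled to [0, 1].
X = np.random.default_rng(42).random((200, 8))
encoders, codes = stack_daes(X, [6, 3])  # `codes` would feed a softmax or logistic classifier
```

The final `codes` array is the compact representation that, in the approach described above, is passed to a softmax layer or to logistic regression for the attend/non-attend prediction.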