
    Machine Learning for Diabetes and Mortality Risk Prediction From Electronic Health Records

    Data science can provide invaluable tools to better exploit healthcare data to improve patient outcomes and increase cost-effectiveness. Today, electronic health record (EHR) systems provide a rich array of data that data science applications can use to transform the healthcare industry. Utilising EHR data to improve the early diagnosis of a variety of medical conditions and events is a rapidly developing area that, if successful, can help to improve healthcare services across the board. Specifically, as Type-2 Diabetes Mellitus (T2DM) represents one of the most serious threats to health across the globe, analysing the huge volumes of data provided by EHR systems to accurately predict the onset of T2DM early, and to predict medical events such as in-hospital mortality, are two of the most important challenges data science currently faces. The present thesis addresses these challenges by examining the research gaps in the existing literature, pinpointing un-investigated areas, and proposing novel machine learning models that address the difficulties inherent in EHR data. To achieve these aims, the thesis first introduces a unique and large EHR dataset collected from Saudi Arabia. We then investigate state-of-the-art machine learning predictive models that exploit this dataset for diabetes diagnosis and the early identification of patients with pre-diabetes, by predicting blood levels of one of the main indicators of diabetes and pre-diabetes: Glycated Haemoglobin (HbA1c). A novel collaborative denoising autoencoder (Col-DAE) framework is adopted to predict diabetic (high) HbA1c levels. We also employ several machine learning approaches (random forest, logistic regression, support vector machine, and multilayer perceptron) for the identification of patients with pre-diabetes (elevated HbA1c levels). The models employed demonstrate that a patient's risk of diabetes or pre-diabetes can be reliably predicted from EHR records. We then extend this work with a pioneering adoption of recent explainability methods to investigate the outputs of the predictive models employed. This work also investigates the effect of using longitudinal data, and more of the features available in EHR systems, on the performance and feature rankings of the employed machine learning models for predicting elevated HbA1c levels in non-diabetic patients; it demonstrates that longitudinal data and additional EHR features can improve model performance and can affect the relative order of importance of the features. Secondly, we develop a machine learning model for the early and accurate prediction of in-hospital mortality events utilising EHR data. This work investigates a novel application of the stacked denoising autoencoder (SDA) to predict in-hospital patient mortality risk, and demonstrates how our approach uniquely overcomes the issues associated with imbalanced datasets to which existing solutions are subject. The proposed model, using clinical patient data on a variety of health conditions and without intensive feature engineering, is demonstrated to achieve robust and promising results using EHR patient data recorded during the first 24 hours after admission.
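
    The denoising autoencoders at the core of both contributions share one idea: corrupt the input, then train the network to reconstruct the clean record. Below is a minimal PyTorch sketch of a plain denoising autoencoder; the layer sizes, noise level, and stand-in data are illustrative assumptions, not the thesis's actual Col-DAE or SDA architecture.

```python
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    """Corrupt the input with Gaussian noise, then reconstruct the clean record."""
    def __init__(self, n_features: int, n_hidden: int = 64, noise_std: float = 0.1):
        super().__init__()
        self.noise_std = noise_std
        self.encoder = nn.Sequential(nn.Linear(n_features, n_hidden), nn.ReLU())
        self.decoder = nn.Linear(n_hidden, n_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        noisy = x + self.noise_std * torch.randn_like(x)  # corrupt the input features
        return self.decoder(self.encoder(noisy))

# Toy training loop on random stand-in data (real inputs would be EHR features).
model = DenoisingAutoencoder(n_features=32)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(256, 32)
for _ in range(100):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), x)  # reconstruction loss vs. clean input
    loss.backward()
    optimizer.step()
```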

    ICU Patients’ Pattern Recognition and Correlation Identification of Vital Parameters Using Optimized Machine Learning Models

    Early detection of patient deterioration in the Intensive Care Unit (ICU) can play a crucial role in improving patient outcomes. Conventional severity scales currently used to predict patient deterioration are based on a number of factors, the majority of which require multiple investigations. Recent advancements in machine learning (ML) within the healthcare domain offer the potential to alleviate the burden of continuous patient monitoring. In this study, we propose an optimized ML model designed to leverage variations in vital signs observed during the final 24 hours of an ICU stay for outcome prediction, and we elucidate the relative contributions of distinct vital parameters to these outcomes. The dataset, compiled in real time, encompasses six pivotal vital parameters: systolic and diastolic blood pressure, pulse rate, respiratory rate, oxygen saturation (SpO2), and temperature. Of these, systolic blood pressure emerges as the most significant predictor of mortality. Using fivefold cross-validation, several ML classifiers are used to categorize the last 24 hours of time series data after ICU admission into three groups: recovery, death, and intubation. Notably, the optimized Gradient Boosting classifier exhibited the highest performance in detecting mortality, achieving an area under the receiver operating characteristic curve (AUC) of 0.95. Through the integration of electronic health records with this ML software, there is the promise of early notification of adverse outcomes, potentially several hours before the onset of hemodynamic instability.
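
    As a rough illustration of this evaluation setup, the sketch below runs a Gradient Boosting classifier under fivefold cross-validation on a stand-in feature matrix and scores it with a one-vs-rest AUC; the feature layout (summary statistics per vital sign) and all data are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Stand-in design matrix: one row per ICU stay, columns are summary
# statistics (e.g. mean/min/max) of the six vital-sign time series.
X = rng.normal(size=(300, 18))     # 6 vitals x 3 summary statistics (assumed)
y = rng.integers(0, 3, size=300)   # 0 = recovery, 1 = death, 2 = intubation

# Fivefold cross-validated one-vs-rest AUC for the three-class problem.
clf = GradientBoostingClassifier()
scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc_ovr")
print(scores.mean())
```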

    Growth differentiation factor-15 and prediction of cancer-associated thrombosis and mortality: a prospective cohort study

    Background: Patients with cancer are at increased risk of venous thromboembolism (VTE) and arterial thromboembolic/thrombotic events (ATEs). Growth differentiation factor-15 (GDF-15) improves cardiovascular risk assessment, but its predictive utility in patients with cancer remains undefined. Objectives: To investigate the association of GDF-15 with the risks of VTE, ATE, and mortality in patients with cancer and its predictive utility alongside established models. Methods: The Vienna Cancer and Thrombosis Study (CATS), a prospective observational cohort study of patients with newly diagnosed or recurrent cancer followed for 2 years, served as the study framework. Serum GDF-15 levels at study inclusion were measured, and associations with VTE, ATE, and death were determined using competing risk (VTE/ATE) or Cox regression (death) modeling. The added value of GDF-15 to established VTE risk prediction models was assessed using the Khorana score and the Vienna CATScore. Results: Among 1531 included patients with cancer (median age, 62 years; 53% men), the median GDF-15 level was 1004 ng/L (IQR, 654-1750). Increasing levels of GDF-15 were associated with increased risks of VTE, ATE, and all-cause death ([subdistribution] hazard ratio per doubling, 1.16 [95% CI, 1.03-1.32], 1.30 [95% CI, 1.11-1.53], and 1.57 [95% CI, 1.46-1.69], respectively). After adjustment for clinically relevant covariates, the association prevailed only for all-cause death (hazard ratio, 1.21; 95% CI, 1.10-1.33), and GDF-15 did not improve the performance of the Khorana score or the Vienna CATScore. Conclusion: GDF-15 is strongly associated with survival in patients with cancer, independent of established risk factors. While an association with ATE and VTE was identified in univariable analysis, GDF-15 was not independently associated with these outcomes and failed to improve established VTE prediction models.
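
    The "hazard ratio per doubling" reported above is conveniently obtained by entering the biomarker on a log2 scale, so a one-unit increase in the covariate corresponds to a doubling of GDF-15. A minimal sketch using the lifelines library on simulated stand-in data (all values and column names are assumptions, not CATS data):

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(1)
n = 500
gdf15 = rng.lognormal(mean=np.log(1000), sigma=0.7, size=n)  # ng/L, simulated
df = pd.DataFrame({
    "log2_gdf15": np.log2(gdf15),                 # log2 scale: +1 unit = doubling
    "time": rng.exponential(scale=24.0, size=n),  # follow-up time, months
    "death": rng.integers(0, 2, size=n),          # event indicator
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="death")
# exp(coef) for log2_gdf15 is the hazard ratio per doubling of GDF-15.
print(cph.hazard_ratios_["log2_gdf15"])
```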

    EDMON - Electronic Disease Surveillance and Monitoring Network: A Personalized Health Model-based Digital Infectious Disease Detection Mechanism using Self-Recorded Data from People with Type 1 Diabetes

    Throughout history, society has been tested by infectious disease outbreaks of varying magnitude, which often pose major public health challenges. To mitigate these challenges, research endeavors have focused on early detection mechanisms: identifying potential data sources, modes of data collection and transmission, and case and outbreak detection methods. Driven by the ubiquitous nature of smartphones and wearables, the current endeavor targets individualizing the surveillance effort through a personalized health model, where case detection is realized by exploiting self-collected physiological data from wearables and smartphones. This dissertation aims to demonstrate the concept of a personalized health model as a case detector for outbreak detection by utilizing self-recorded data from people with type 1 diabetes. The results show that infection onset triggers substantial deviations, i.e. prolonged hyperglycemia despite increased insulin injections and reduced carbohydrate consumption. Per the findings, key parameters such as blood glucose level, insulin dose, carbohydrate intake, and insulin-to-carbohydrate ratio are found to carry high discriminative power. A personalized health model based on one-class classifiers and unsupervised methods using the selected parameters achieved promising detection performance. Experimental results show the superior performance of the one-class classifiers, with models such as the one-class support vector machine, k-nearest neighbor, and k-means achieving the best performance. The results also revealed the effect of input parameters, data granularity, and sample size on model performance. The presented results have practical significance for understanding the effect of infection episodes amongst people with type 1 diabetes, and the potential of a personalized health model in outbreak detection settings. The added benefit of the personalized health model concept introduced in this dissertation lies in its usefulness beyond surveillance, i.e. in devising decision support tools and learning platforms that help patients manage infection-induced crises.
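
    A one-class detector of this kind trains only on infection-free days and flags deviations as anomalies. The sketch below uses scikit-learn's OneClassSVM on synthetic stand-in values for the four discriminative parameters named above; all numbers are illustrative assumptions, not study data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(2)
# Synthetic daily features: glucose (mmol/L), insulin dose (U), carbohydrate
# intake (g), insulin-to-carbohydrate ratio. Values are illustrative only.
healthy = rng.normal([6.5, 40.0, 200.0, 0.20], [0.8, 5.0, 30.0, 0.03], size=(200, 4))
infected = rng.normal([10.0, 55.0, 150.0, 0.37], [1.0, 6.0, 30.0, 0.05], size=(20, 4))

scaler = StandardScaler().fit(healthy)
# Train only on infection-free days; deviating days are flagged as anomalies.
detector = OneClassSVM(nu=0.05, gamma="scale").fit(scaler.transform(healthy))
print(detector.predict(scaler.transform(infected)))  # -1 = anomalous (suspect) day
```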

    From genomic variation to personalized medicine


    Endovascular Treatment for Ischemic Stroke:Identifying factors to improve outcome and alternative methods to evaluate treatment effect

    Although endovascular treatment (EVT) for ischemic stroke is highly effective overall, there are many targets for further improving patient outcomes. In this thesis, we identified factors influencing outcome after EVT and evaluated alternative study design approaches to accelerate research on new therapeutic strategies. Based on our findings, we recommend performing EVT under local anesthesia instead of conscious sedation when this is considered safe. In addition, large drops in periprocedural blood pressure should be avoided, as these are associated with worse outcomes. Blood pressure on hospital admission is not an argument to withhold or delay EVT for ischemic stroke, as it does not negate the effect of EVT. Furthermore, prevention of post-procedural adverse events has great potential to further improve outcomes in successfully reperfused patients. A reduced infarct volume after EVT explains one third of the treatment benefit in terms of neurological deficit. Development of new imaging techniques for accurate, reliable assessment of brain tissue viability is important for further improving both stroke research and clinical practice. Another future direction for EVT research is the use of cohorts of synthetic stroke patients for the development and validation of prediction tools, decision model analyses, and in-silico trials; we demonstrated statistical methods to generate realistic cohorts of such synthetic patients, as sketched below.
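
    The simplest statistical route to a synthetic cohort is to fit a joint distribution to observed baseline covariates and sample from it. The sketch below fits a multivariate normal, a deliberately simplified stand-in for the more realistic methods the thesis demonstrates; the covariates and all values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
# Stand-in "real" cohort: age (years), NIHSS score, systolic BP (mmHg).
real = rng.normal([70.0, 16.0, 150.0], [12.0, 5.0, 25.0], size=(1000, 3))

# Fit a joint Gaussian to the observed covariates and sample new patients.
mu = real.mean(axis=0)
cov = np.cov(real, rowvar=False)
synthetic = rng.multivariate_normal(mu, cov, size=5000)

print(synthetic.mean(axis=0))  # marginal means track the source cohort
```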

    Utilizing Temporal Information in The EHR for Developing a Novel Continuous Prediction Model

    Type 2 diabetes mellitus (T2DM) is a chronic condition prevalent nationwide that imposes both direct and indirect healthcare costs. Previous clinical research, however, has shown that T2DM is preventable. Many prediction models were based on the risk factors identified by clinical trials. Because nationwide screening is not cost-effective, one of the major tasks of T2DM prediction models is to estimate risk and identify patients for further testing by HbA1c or fasting plasma glucose to determine whether they have T2DM. Those models faced substantial data quality limitations, such as missing values. In this dissertation, I tested conventional models based on the most widely used risk factors for predicting the likelihood of developing T2DM. The average AUC was 0.5, which implies that the conventional models cannot be used to screen for T2DM risk. Based on this result, I implemented three types of temporal representation for building the T2DM prediction model: non-temporal, interval-temporal, and continuous-temporal. The continuous-temporal representation, which was based on deep learning methods, had the best performance, implying that deep learning can overcome the data quality issues and achieve better performance. This dissertation also contributes a continuous risk output model based on the seq2seq architecture, which generates a monotonically increasing function for a given patient to predict the future probability of developing T2DM. The model is workable but still has many limitations to overcome. Finally, this dissertation identifies some risk factors that are underestimated and warrant further research to revise the current T2DM screening guideline. These results are still preliminary, and collaboration with epidemiologists and experts in other fields is needed to verify the findings. In the future, the methods used to build the T2DM prediction model can also be applied to prediction models for other chronic conditions.
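
    A monotonically increasing risk output can be enforced architecturally: have a recurrent network emit nonnegative hazard increments per visit and accumulate them, so the implied risk 1 - exp(-cumulative hazard) never decreases. The PyTorch sketch below uses a GRU rather than the dissertation's seq2seq model, purely to illustrate the monotonicity construction; all sizes and inputs are assumptions.

```python
import torch
import torch.nn as nn

class MonotonicRiskModel(nn.Module):
    """A GRU reads the visit sequence and emits nonnegative hazard
    increments; the cumulative risk 1 - exp(-cumsum) never decreases."""
    def __init__(self, n_features: int, n_hidden: int = 32):
        super().__init__()
        self.rnn = nn.GRU(n_features, n_hidden, batch_first=True)
        self.head = nn.Linear(n_hidden, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, _ = self.rnn(x)                             # (batch, visits, hidden)
        hazard = nn.functional.softplus(self.head(h))  # nonnegative increments
        return 1.0 - torch.exp(-hazard.cumsum(dim=1))  # monotone risk in [0, 1)

# Toy input: 4 patients, 10 visits, 8 EHR features per visit (all assumed).
risk = MonotonicRiskModel(n_features=8)(torch.randn(4, 10, 8))
print(risk.shape)  # torch.Size([4, 10, 1])
```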