604 research outputs found

    Semi-supervised Optimal Transport with Self-paced Ensemble for Cross-hospital Sepsis Early Detection

    The utilization of computer technology to solve problems in medical scenarios has attracted considerable attention in recent years and still has great potential and room for exploration. In particular, machine learning has been widely used in the prediction, diagnosis and even treatment of Sepsis. However, state-of-the-art methods require large amounts of labeled medical data for supervised learning. In real-world applications, the lack of labeled data will cause enormous obstacles if one hospital wants to deploy a new Sepsis detection system. Different from the supervised learning setting, we need to use known information (e.g., from another hospital with rich labeled data) to help build a model with acceptable performance, i.e., transfer learning. In this paper, we propose a semi-supervised optimal transport with self-paced ensemble framework for Sepsis early detection, called SPSSOT, to transfer knowledge from another hospital that has rich labeled data. In SPSSOT, we first extract the same clinical indicators from the source domain (e.g., hospital with rich labeled data) and the target domain (e.g., hospital with little labeled data), then we combine the semi-supervised domain adaptation based on optimal transport theory with self-paced under-sampling to avoid a negative transfer possibly caused by covariate shift and class imbalance. On the whole, SPSSOT is an end-to-end transfer learning method for Sepsis early detection which can automatically select suitable samples from the two domains according to the number of iterations and align the feature spaces of the two domains. Extensive experiments on two open clinical datasets demonstrate that, compared with other methods, our proposed SPSSOT can significantly improve the AUC values with only 1% labeled data in the target domain in two transfer learning scenarios, MIMIC → Challenge and Challenge → MIMIC. Comment: 14 pages, 9 figures

    Generalizability of machine learning models in predicting patient deterioration

    Predicting patient deterioration in an Intensive Care Unit (ICU) effectively is a critical health care task serving patient health and resource allocation. At times, the task may be highly complex for a physician, yet high-stakes and time-critical decisions need to be made based on it. In this work, we investigate the ability of a set of machine learning models to algorithmically predict future occurrence of in-hospital death based on Electronic Health Record (EHR) data of ICU patients. For one, we assess the generalizability of the models. We do this by evaluating the models on hospitals whose data were not used when training the models. For another, we consider the case in which we have access to some EHR data for the patients treated at a hospital of interest. In this setting, we assess how EHR data from other hospitals can best be used to improve the prediction accuracy. This study is important for the deployment and integration of such predictive models in practice, e.g., for real-time algorithmic deterioration prediction for clinical decision support. In order to address these questions, we use the eICU collaborative research database, which is a database containing EHRs of patients treated at a heterogeneous collection of hospitals in the United States. In this work, we use the patient demographics, vital signs and Glasgow coma score as the predictors. We devise and describe three computational experiments to test the generalization in different ways. The models used are the random forest, gradient boosted trees and long short-term memory network. In our first experiment concerning the generalization, we show that, with the chosen limited set of predictors, the models generalize reasonably across hospitals and that only a small data mismatch is observed.
Moreover, with this setting, our second experiment shows that the model performance does not significantly improve when increasing the heterogeneity of the training set. Given these observations, our third experiment shows tha

    Conditional Tabular Generative Adversarial Net for Enhancing Ensemble Classifiers in Sepsis Diagnosis

    Antibiotic-resistant bacteria have proliferated at an alarming rate as a result of the extensive use of antibiotics and the paucity of new medication research. The possibility that an antibiotic-resistant bacterial infection would progress to sepsis is one of the major collateral problems affecting people with this condition. In England, 31,000 lives were lost to sepsis, with costs of about two billion pounds annually. This research aims to develop and evaluate several classification approaches to improve sepsis prediction and reduce the tendency of underdiagnosis in computer-aided predictive tools. This research employs medical data sets for patients diagnosed with sepsis; it analyses the efficacy of ensemble machine learning techniques compared to non-ensemble machine learning techniques and the significance of data balancing and Conditional Tabular Generative Adversarial Nets for data augmentation in producing reliable diagnoses. The average F score obtained by the non-ensemble models trained in this paper is 0.83 compared to the ensemble techniques' average of 0.94. Non-ensemble techniques, such as Decision Tree, achieved an F score of 0.90, an AUC of 0.90 and an accuracy of 90%. Histogram-based Gradient Boosting Classification Tree achieved an F score of 0.96, an AUC of 0.96 and an accuracy of 95%, surpassing the other models tested. Additionally, when compared to the current state-of-the-art sepsis prediction models, the models developed in this study demonstrated higher average performance in all metrics, indicating reduced bias and improved robustness through data balancing and Conditional Tabular Generative Adversarial Nets for data augmentation. The study revealed that data balancing and augmentation on the ensemble machine learning algorithms boost the efficacy of clinical predictive models and can help clinics decide which data types are most important when examining patients and diagnosing sepsis early through intelligent human-machine interface

    Predictive analytics framework for electronic health records with machine learning advancements : optimising hospital resources utilisation with predictive and epidemiological models

    The primary aim of this thesis was to investigate the feasibility and robustness of predictive machine-learning models in the context of improving hospital resources’ utilisation with data-driven approaches and predicting hospitalisation with hospital quality assessment metrics such as length of stay. The length of stay predictions include the validity of the proposed methodological predictive framework on each hospital’s electronic health records data source. In this thesis, we relied on electronic health records (EHRs) to drive a data-driven predictive inpatient length of stay (LOS) research framework that suits the most demanding hospital facilities for hospital resources’ utilisation context. The thesis focused on the viability of the methodological predictive length of stay approaches on dynamic and demanding healthcare facilities and hospital settings such as the intensive care units and the emergency departments. While the hospital length of stay predictions are (internal) healthcare inpatients outcomes assessment at the time of admission to discharge, the thesis also considered (external) factors outside hospital control, such as forecasting future hospitalisations from the spread of infectious communicable disease during pandemics. The internal and external splits are the thesis’ main contributions. Therefore, the thesis evaluated the public health measures during events of uncertainty (e.g. pandemics) and measured the effect of non-pharmaceutical intervention during outbreaks on future hospitalised cases. To the best of our knowledge, this approach is the first contribution in the literature to examine the epidemiological curves’ effect using simulation models to project the future hospitalisations on their strong potential to impact hospital beds’ availability and stress hospital workflow and workers.
The main research commonality between chapters is the usefulness of ensemble learning models in the context of LOS for hospital resources utilisation. The ensemble learning models are expected to achieve better predictive performance by combining several base models to produce an optimal predictive model. These predictive models explored the internal LOS for various chronic and acute conditions using data-driven approaches to determine the most accurate and powerful predicted outcomes. This eventually helps to achieve desired outcomes for hospital professionals who are working in hospital settings

    Prediction of acute kidney injury using the Electronic Medical Records of a pediatric cardiac intensive care unit

    Acute Kidney Injury (AKI) is a frequent complication in hospitalized patients significantly associated with mortality, length of stay, and healthcare cost. Management of AKI presents an important challenge, and clinicians may be helped by robust prediction models that support risk evaluation and foster prevention and recognition. The advances in clinical informatics and the increasing availability of electronic medical records (EMRs) have favored the development of predictive models of risk estimation in AKI. In this dissertation, we analyze the problem of predicting the AKI stage during the patient’s stay in the intensive care unit, retrospectively using the EMRs recently introduced in the Pediatric Cardiac Intensive Care Unit (PCICU) of "Ospedale Pediatrico Bambino Gesù". After the initial phase of data selection, extraction, and management of missing data, we develop a random forest (RF) classification model including a variable selection step with the aim of predicting the stage of AKI 48 hours in advance in both binary and multiclass cases. The performances obtained in terms of Area under the ROC Curve (AUC-ROC) for binary cases and accuracy for multiclass cases are always very good compared with other recent attempts in the literature. The list of the most important variables obtained in the various classifications highlights the importance of some of the expected variables (such as creatinine) reported in other studies in the literature but also the presence of variables that are specific to the pediatric patients under examination (such as PIM3). Moreover, we develop other classifications using Generalized Additive Models (GAMs) and Bayesian network (BN) models, which have the benefit of offering a more interpretable approach. Although these results are inferior to those of the RF, they are comparable with many outcomes reported in the literature.
The plot obtained with GAMs and the structure of the directed acyclic graph (DAG) obtained with BN are consistent with a possible medical explanation and would offer doctors further interpretation hints about the onset of AKI. Finally, we observe that all implemented models confirm the possibility of making an accurate prediction of the AKI stage using the PCICU EMRs. These models can potentially be included in a web interface and, in perspective, be integrated into the EMR of the PCICU. This tool would allow the doctors to predict prospectively the patient’s stage of AKI and evaluate how to intervene if necessary. In order to proceed with this, it will be necessary in the future to implement the export of a larger dataset, adding new data acquired in the meantime in the PCICU