183 research outputs found

    Mixed-Integer Projections for Automated Data Correction of EMRs Improve Predictions of Sepsis among Hospitalized Patients

    Machine learning (ML) models are increasingly pivotal in automating clinical decisions. Yet a glaring oversight in prior research has been the lack of proper processing of Electronic Medical Record (EMR) data in the clinical context for errors and outliers. Addressing this oversight, we introduce an innovative projections-based method that seamlessly integrates clinical expertise as domain constraints, generating important meta-data that can be used in ML workflows. In particular, by using high-dimensional mixed-integer programs that capture physiological and biological constraints on patient vitals and lab values, we can harness the power of mathematical "projections" to correct patient EMR data. Consequently, we measure the distance of corrected data from the constraints defining a healthy range of patient data, resulting in a unique predictive metric we term "trust-scores". These scores provide insight into the patient's health status and significantly boost the performance of ML classifiers in real-life clinical settings. We validate the impact of our framework in the context of early detection of sepsis using ML. We show an AUROC of 0.865 and a precision of 0.922, which surpass conventional ML models without such projections.
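
    The idea of scoring a record by its distance to a "healthy" constraint set can be illustrated with a much simpler stand-in than the mixed-integer programs the abstract describes. The sketch below assumes each variable has an independent healthy interval, so the projection reduces to clipping. The variable names, the interval values, and the unscaled Euclidean distance are all illustrative assumptions, not taken from the paper.

```python
from math import sqrt

# Hypothetical healthy intervals per variable (illustrative values only,
# not the constraints used in the paper).
HEALTHY_RANGES = {
    "heart_rate": (60.0, 100.0),
    "temperature_c": (36.1, 37.2),
    "resp_rate": (12.0, 20.0),
}


def project_onto_box(record, ranges):
    """Euclidean projection onto a box: clip each value to [low, high]."""
    return {
        name: min(max(value, ranges[name][0]), ranges[name][1])
        for name, value in record.items()
    }


def trust_score(record, ranges):
    """Distance between a record and its projection onto the healthy box.

    A score of 0.0 means every value already lies inside its interval.
    """
    projected = project_onto_box(record, ranges)
    return sqrt(sum((record[name] - projected[name]) ** 2 for name in record))


if __name__ == "__main__":
    patient = {"heart_rate": 130.0, "temperature_c": 36.8, "resp_rate": 24.0}
    print(project_onto_box(patient, HEALTHY_RANGES))
    print(trust_score(patient, HEALTHY_RANGES))
```

    A box is the easiest constraint set to project onto. Constraints that couple several variables, or that include integer decisions as in the abstract, would need an optimization solver rather than clipping.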

    Learning deep patient representations for the teleICU

    This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 89-93). This thesis presents a method of extracting deep robust representations of teleICU clinical data using Transformer networks, inspired by recent machine learning literature in language modeling. The utility of these representations is evaluated in various outcome prediction tasks, in which they were able to outperform linear and neural baselines. Also examined are the probability distributions of various patient characteristics across the learned patient representation space, where corresponding high-level spatial structure suggests potential for use as a similarity metric or in combination with other patient similarity metrics. Finally, the code for the models developed is publicly provided as a starting point for further research. By Ini Oguntola.

    Statistical methods for NHS incident reporting data

    The National Reporting and Learning System (NRLS) is the English and Welsh NHS’ national repository of incident reports from healthcare. It aims to capture details of incident reports, at national level, and facilitate clinical review and learning to improve patient safety. These incident reports range from minor ‘near-misses’ to critical incidents that may lead to severe harm or death. NRLS data are currently reported as crude counts and proportions, but their major use is clinical review of the free-text descriptions of incidents. There are few well-developed quantitative analysis approaches for NRLS, and this thesis investigates these methods. A literature review revealed a wealth of clinical detail, but also systematic constraints of NRLS’ structure, including non-mandatory reporting, missing data and misclassification. Summary statistics for reports from 2010/11 – 2016/17 supported this and suggest NRLS was not suitable for statistical modelling in isolation. Modelling methods were advanced by creating a hybrid dataset using other sources of hospital casemix data from Hospital Episode Statistics (HES). A theoretical model was established, based on ‘exposure’ variables (using casemix proxies), and ‘culture’ as a random-effect. The initial modelling approach examined Poisson regression, mixture and multilevel models. Overdispersion was significant, generated mainly by clustering and aggregation in the hybrid dataset, but models were chosen to reflect these structures. Further modelling approaches were examined, using Generalized Additive Models to smooth predictor variables, regression tree-based models including Random Forests, and Artificial Neural Networks. Models were also extended to examine a subset of death and severe harm incidents, exploring how sparse counts affect models. Text mining techniques were examined for analysis of incident descriptions and showed how term frequency might be used. 
Terms were used to generate latent topic models, which were in turn used to predict the harm level of incidents. Model outputs were used to create a ‘Standardised Incident Reporting Ratio’ (SIRR), which was cast in the mould of current regulatory frameworks using process control techniques such as funnel plots and cusum charts. A prototype online reporting tool was developed to allow NHS organisations to examine their SIRRs, provide supporting analyses, and link data points back to individual incident reports.
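
    The abstract does not give a formula for the SIRR. The sketch below assumes it is an observed-to-expected ratio, by analogy with other standardised ratios, and that funnel-plot limits come from a normal approximation to a Poisson count. Both are assumptions, and the function names are hypothetical.

```python
from math import sqrt


def sirr(observed, expected):
    """Assumed definition: observed incident reports over model-expected reports."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


def funnel_limits(expected, z=1.96):
    """Approximate control limits for the ratio around a target of 1.

    If the observed count is Poisson with mean `expected`, the ratio has
    variance 1 / expected, giving limits of 1 +/- z / sqrt(expected).
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    half_width = z / sqrt(expected)
    return max(0.0, 1.0 - half_width), 1.0 + half_width


def is_outlier(observed, expected, z=1.96):
    """True when the ratio falls outside the funnel at the given z value."""
    low, high = funnel_limits(expected, z)
    ratio = sirr(observed, expected)
    return ratio < low or ratio > high


if __name__ == "__main__":
    print(sirr(120, 100.0), funnel_limits(100.0), is_outlier(120, 100.0))
```

    The limits narrow as the expected count grows, which produces the funnel shape. The abstract reports significant overdispersion, so plain Poisson limits like these would likely be too tight for the real data.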

    A systematic review of the prediction of hospital length of stay:Towards a unified framework

    Hospital length of stay (LoS) of patients is a crucial factor for the effective planning and management of hospital resources. There is considerable interest in predicting the LoS of patients in order to improve patient care, control hospital costs and increase service efficiency. This paper presents an extensive review of the literature, examining the approaches employed for the prediction of LoS in terms of their merits and shortcomings. In order to address some of these problems, a unified framework is proposed to better generalise the approaches that are being used to predict length of stay. This includes the investigation of the types of routinely collected data used in the problem as well as recommendations to ensure robust and meaningful knowledge modelling. This unified common framework enables the direct comparison of results between length of stay prediction approaches and will ensure that such approaches can be used across several hospital environments. A literature search was conducted in PubMed, Google Scholar and Web of Science from 1970 until 2019 to identify LoS surveys which review the literature. In total, 32 surveys were identified; from these, 220 papers were manually identified as relevant to LoS prediction. After removing duplicates, and exploring the reference list of studies included for review, 93 studies remained. Despite the continuing efforts to predict and reduce the LoS of patients, current research in this domain remains ad hoc; as such, the model tuning and data preprocessing steps are too specific and result in a large proportion of the current prediction mechanisms being restricted to the hospital that they were employed in. Adopting a unified framework for the prediction of LoS could yield a more reliable estimate of the LoS, as a unified framework enables the direct comparison of length of stay methods.
Additional research is also required to explore novel methods such as fuzzy systems, which could build upon the success of current models, as well as further exploration of black-box approaches and model interpretability.

    Two-step approach for occupancy estimation in intensive care units based on Bayesian optimization techniques

    Due to the high occupational pressure suffered by intensive care units (ICUs), a correct estimation of the patients’ length of stay (LoS) in the ICU is of great interest to predict possible situations of collapse, to help healthcare personnel to select appropriate treatment options and to predict patients’ conditions. A large amount of data is collected by biomedical sensors during the continuous monitoring of patients in the ICU, so the use of artificial intelligence techniques in automatic LoS estimation would improve patients’ care and facilitate the work of healthcare personnel. In this work, a novel methodology to estimate the LoS using data of the first 24 h in the ICU is presented. To achieve this, XGBoost, one of the most popular and efficient state-of-the-art algorithms, is used as an estimator model, and its performance is optimized both from computational and precision viewpoints using Bayesian techniques. For this optimization, a novel two-step approach is presented. The methodology was carefully designed to execute codes on a high-performance computing system based on graphics processing units, which considerably reduces the execution time. The algorithm scalability is analyzed. With the proposed methodology, the best set of XGBoost hyperparameters is identified, estimating LoS with a MAE of 2.529 days, improving the results reported in the current state of the art and proving the validity and utility of the proposed approach. Agencia Gallega de Innovación | Ref. IN845D-2020/29. Agencia Gallega de Innovación | Ref. IN607B-2021/1.

    A CNN-LSTM for predicting mortality in the ICU

    Accurate mortality prediction is crucial to healthcare, as it provides an empirical risk estimate for prognostic decision making, patient stratification and hospital benchmarking. Current prediction methods in practice are severity of disease scoring systems that usually involve a fixed set of admission attributes and summarized physiological data. These systems are prone to bias and require substantial manual effort, which necessitates an updated approach that can account for most shortcomings. Clinical observation notes allow for recording highly subjective data on the patient that can possibly facilitate higher discrimination. Moreover, deep learning models can automatically extract and select features without human input. This thesis investigates the potential of a combination of a deep learning model and notes for predicting mortality with a higher accuracy. A custom architecture, called CNN-LSTM, is conceptualized for mapping multiple notes compiled in a hospital stay to a mortality outcome. It employs both convolutional and recurrent layers, with the former capturing semantic relationships in individual notes independently and the latter capturing temporal relationships between concurrent notes in a hospital stay. This approach is compared to three severity of disease scoring systems with a case study on the MIMIC-III dataset. Experiments are set up to assess the CNN-LSTM for predicting mortality using only the notes from the first 24, 12 and 48 hours of a patient stay. The model is trained using K-fold cross-validation with k=5, and the mortality probability calculated by the three severity scores on the held-out set is used as the baseline. It is found that the CNN-LSTM outperforms the baseline on all experiments, which serves as a proof-of-concept of how notes and deep learning can improve outcome prediction.
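
    The K-fold scheme with k=5 mentioned above can be sketched without any library. The helper below is a hypothetical illustration of how indices are split into training and held-out sets. It uses contiguous folds without shuffling or stratification, which is an assumption, since the abstract does not say how the folds were formed.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, held_out_indices) for k contiguous folds.

    Fold sizes differ by at most one when n_samples is not divisible by k.
    """
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        held_out = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, held_out
        start += size


if __name__ == "__main__":
    for train, held_out in k_fold_indices(12, k=5):
        print(len(train), held_out)
```

    Each sample is held out exactly once, so every prediction used for evaluation comes from a model that never saw that sample during training.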