8 research outputs found

    Predicting Decompensation Risk in Intensive Care Unit Patients Using Machine Learning

    Patients in intensive care units (ICU) face the threat of decompensation, a rapid decline in health associated with a high risk of death. This study focuses on creating and evaluating machine learning (ML) models to predict decompensation risk in ICU patients. It proposes a novel approach using patient vitals and clinical data within a specified timeframe to forecast decompensation risk sequences. The study implemented and assessed long short-term memory (LSTM) and hybrid convolutional neural network (CNN)-LSTM architectures, along with traditional ML algorithms as baselines. Additionally, it introduced a novel decompensation score based on the predicted risk, validated through principal component analysis (PCA) and k-means analysis for risk stratification. The results showed that, with PPV=0.80, NPV=0.96 and AUC-ROC=0.90, CNN-LSTM had the best performance when predicting decompensation risk sequences. The decompensation score’s effectiveness was also confirmed (PPV=0.83 and NPV=0.96). SHAP plots were generated for the overall model and two risk strata, illustrating variations in feature importance and their associations with the predicted risk. Notably, this study represents the first attempt to predict a sequence of decompensation risks rather than single events, a critical advancement given the challenge of early decompensation detection. Predicting a sequence facilitates early detection of increased decompensation risk and pace, potentially leading to saving more lives

    Using MLP partial responses to explain in-hospital mortality in ICU

    In this paper we propose to use partial responses derived from an initial multilayer perceptron (MLP) to build an explanatory risk prediction model of in-hospital mortality in intensive care units (ICU). Traditionally, MLPs deliver higher performance than linear models such as multivariate logistic regression (MLR). However, MLPs interlink input variables in such a complex way that it is not straightforward to explain how the outcome is influenced by inputs and/or input interactions. In this paper, we hypothesized that in some scenarios, such as when the data noise is significant or when the data is just marginally non-linear, we could find slightly more complex associations by obtaining MLP partial responses, that is, by letting one variable change at a time while keeping the rest constant. Overall, we found that, although the MLR and MLP in-hospital mortality model performances were equivalent, the MLP could explain non-linear associations that the MLR had otherwise considered non-significant. We considered that, although deeming higher-order interactions as disposable noise could be a strong assumption, building explanatory models based on the MLP partial responses could still be more informative than building them on MLR

    In-hospital mortality of sepsis differs depending on the origin of infection: an investigation of predisposing factors

    Sepsis is a heterogeneous syndrome characterised by a variety of clinical features. Analysis of large clinical datasets may serve to define groups of sepsis with different risks of adverse outcomes. Clinical experience supports the concept that prognosis, treatment, severity and time course of sepsis vary depending on the source of infection. We analysed a large publicly available database to test this hypothesis. In addition, we developed prognostic models for the three main types of sepsis: pulmonary, urinary and abdominal sepsis. We applied logistic regression to routinely available clinical data for mortality prediction in each of these groups. The data was extracted from the eICU collaborative research database, a multi-centre intensive care unit database with over 200,000 admissions. Sepsis cohorts were defined using admission diagnosis codes. We used univariate and multivariate analyses to establish factors relevant for outcome prediction in all three cohorts of sepsis (pulmonary, urinary and abdominal). For logistic regression, input variables were automatically selected using a sequential forward search algorithm over 10 dataset instances. Receiver operating characteristic curves were generated for each model and compared with established prognostication tools (APACHE IV and SOFA). 3958 sepsis admissions were included in the analysis. Sepsis in-hospital mortality differed depending on the cause of infection: abdominal 18.93%, pulmonary 19.27%, and renal 12.81%. Higher average heart rate was associated with increased mortality risk. Increased average Mean Arterial Pressure was associated with reduced mortality risk across all sepsis groups. Results from the LR models found significant factors that were relevant for specific sepsis groups. Our models outperformed APACHE IV and SOFA scores with AUC between 0.63 and 0.74. Predictive power decreased over time, with the best results achieved for data extracted from the first 24 h of admission.
    Mortality varied significantly between the three sepsis groups. We also demonstrate that factors of importance show considerable heterogeneity depending on the source of infection. The factors influencing in-hospital mortality vary depending on the source of sepsis, which may explain why most sepsis trials have failed to identify an effective treatment. The source of infection should be taken into account when assessing mortality risk. Planning of sepsis treatment trials may benefit from risk stratification based on the source of infection

    Urban Water Demand Prediction for a City that Suffers from Climate Change and Population Growth: Gauteng Province case study

    The proper management of municipal water systems is essential to sustain cities and support the water security of societies. Estimating urban water demand has always been a challenging task for managers of water utilities and policymakers. This paper applies a novel methodology that includes data pre-processing and an Artificial Neural Network (ANN) optimized with the Backtracking Search Algorithm (BSA-ANN) to estimate monthly water demand in relation to previous water consumption. Historical data of monthly water consumption in the Gauteng Province, South Africa, for the period 2007–2016, were selected for the creation and evaluation of the methodology. Data pre-processing techniques played a crucial role in enhancing the quality of the data before creating the prediction model. The BSA-ANN model yielded the best result with a root mean square error and a coefficient of efficiency of 0.0099 megalitres and 0.979, respectively. Also, it proved more efficient and reliable than the Crow Search Algorithm (CSA-ANN), based on the scale of error. Overall, this paper presents a new application for the hybrid model BSA-ANN that can be successfully used to predict water demand with high accuracy, in a city that heavily suffers from the impact of climate change and population growth

    Mapping the global free expression landscape using machine learning

    Freedom of expression is a core human right, yet the forces that seek to suppress it have intensified, increasing the need to develop tools that can measure the rates of freedom globally. In this study, we propose a novel freedom of expression index to gain a nuanced and data-led understanding of the level of censorship across the globe. For this, we used an unsupervised, probabilistic machine learning method to model the status of the free expression landscape. This index seeks to provide legislators and other policymakers, activists and governments, and non-governmental and intergovernmental organisations, with tools to better inform policy or action decisions. The global nature of the proposed index also means it can become a vital resource/tool for engagement with international and supranational bodies

    Towards interpretable machine learning for clinical decision support

    A major challenge in delivering reliable and trustworthy computational intelligence for practical applications in clinical medicine is interpretability. This aspect of machine learning is a major distinguishing factor compared with traditional statistical models for the stratification of patients, which typically use rules or a risk score identified by logistic regression. We show how functions of one and two variables can be extracted from pre-trained machine learning models using anchored Analysis of Variance (ANOVA) decompositions. This enables complex interaction terms to be filtered out by aggressive regularisation using the Least Absolute Shrinkage and Selection Operator (LASSO) resulting in a sparse model with comparable or even better performance than the original pre-trained black-box. Besides being theoretically well-founded, the decomposition of a black-box multivariate probabilistic binary classifier into a General Additive Model (GAM) comprising a linear combination of non-linear functions of one or two variables provides full interpretability. In effect this extends logistic regression into non-linear modelling without the need for manual intervention by way of variable transformations, using the pre-trained model as a seed. The application of the proposed methodology to existing machine learning models is demonstrated using the Multi-Layer Perceptron (MLP), Support Vector Machine (SVM), Random Forests (RF) and Gradient Boosting Machines (GBM), to model a data frame from a well-known benchmark dataset available from Physionet, the Medical Information Mart for Intensive Care (MIMIC-III). Both the classification performance and plausibility of clinical interpretation compare favourably with other state-of-the-art sparse models namely Sparse Additive Models (SAM) and the Explainable Boosting Machine (EBM)

    Associations of Hepatosteatosis with Cardiovascular Disease in HIV Positive and HIV Negative Patients: The Liverpool HIV-Heart Project

    Hepatosteatosis (HS) is the most common cause of liver disease in patients living with HIV (PLWHIV), affecting between 13 and 65% of individuals (1–3). HS describes hepatic ectopic fat accumulation and is present when it affects >5% of the liver by weight. HS encompasses a spectrum of clinical entities including non-alcoholic fatty liver disease (NAFLD). The prevalence of HS is under-reported. Histologically, progressive hepatic fat accumulation is associated with lipotoxicity and chronic inflammation, progressing in many cases to cirrhotic liver disease, and a threefold increase in mortality (4). The relationship of obesity, insulin resistance, type II diabetes and hepatosteatosis (HS) is well defined in non-HIV populations (5). The estimated prevalence of NAFLD in the United States is predicted to reach 33% of the adult population by 2030 (6). PLWHIV have unique risk factors for the development of HS compared to non-HIV populations. They have been shown to develop lean NAFLD, defined as NAFLD in BMI < 25 kg/m2, at increased rates compared to non-HIV populations (7). The complex interplay of viral related factors, antiretroviral (ARV) medications and chronic inflammation may cause PLWHIV to be more susceptible to the development of HS. Liver disease represents a huge source of morbidity and mortality in PLWHIV with up to 13% of deaths in the Data Collection on Adverse events of Anti-HIV Drugs (D:A:D) cohort attributable to liver disease (8). In both HIV-positive and HIV-negative populations dyslipidaemia, insulin resistance and overt type II diabetes are strongly associated with the presence of HS. HS has been shown to be associated with CVD in HIV-negative populations (9–11) although this is not universal (12–14). Given the increasing burden of HS in HIV-positive populations, we sought to examine if HS was independently associated with CVD in HIV-positive compared to HIV-negative populations

    Breast cancer patient characterisation and visualisation using deep learning and fisher information networks

    Breast cancer is the most commonly diagnosed female malignancy globally, with better survival rates if diagnosed early. Mammography is the gold standard in screening programmes for breast cancer, but despite technological advances, high error rates are still reported. Machine learning techniques, and in particular deep learning (DL), have been successfully used for breast cancer detection and classification. However, the added complexity that makes DL models so successful reduces their ability to explain which features are relevant to the model, or whether the model is biased. The main aim of this study is to propose a novel visualisation to help characterise breast cancer patients using Fisher Information Networks on features extracted from mammograms using a DL model. In the proposed visualisation, patients are mapped out according to their similarities, and the resulting map can be used to study new patients through a 'patient-like-me' approach. When applied to the CBIS-DDSM dataset, it was shown that it is a competitive methodology that can (i) facilitate the analysis and decision-making process in breast cancer diagnosis with the assistance of the FIN visualisations and 'patient-like-me' analysis, and (ii) help improve diagnostic accuracy and reduce overdiagnosis by identifying the most likely diagnosis based on clinical similarities with neighbouring patients