500 research outputs found

    Machine learning in transfusion medicine: A scoping review


    Machine learning and disease prediction in obstetrics

    Machine learning technologies and the translation of artificial intelligence tools to enhance the patient experience are changing obstetric and maternity care. An increasing number of predictive tools have been developed with data sourced from electronic health records, diagnostic imaging and digital devices. In this review, we explore the latest machine learning tools, the algorithms used to establish prediction models and the challenges in assessing fetal well-being and in predicting and diagnosing obstetric diseases such as gestational diabetes, pre-eclampsia, preterm birth and fetal growth restriction. We discuss the rapid growth of machine learning approaches and intelligent tools for automated diagnostic imaging of fetal anomalies and for assessing fetoplacental and cervical function using ultrasound and magnetic resonance imaging. In prenatal diagnosis, we discuss intelligent tools for magnetic resonance imaging sequencing of the fetus, placenta and cervix to reduce the risk of preterm birth. Finally, we discuss the use of machine learning to improve safety standards in intrapartum care and the early detection of complications. The growing demand for technologies to enhance diagnosis and treatment in obstetric and maternity care should drive improved frameworks for patient safety and enhanced clinical practice.

    A Machine Learning Approach to Predicting Early and Late Reintubation

    Accurate estimation of surgical risk is important for improving shared decision making and the informed consent process. Reintubation is a severe postoperative complication that can lead to various other detrimental outcomes. It can be divided into early reintubation (within 72 hours of surgery) and late reintubation (within 30 days of surgery). Using clinical data provided by the ACS NSQIP, scoring systems were developed for the prediction of combined, early, and late reintubation. The risk factors included in each scoring system were narrowed down from a set of 37 pre- and perioperative factors. The scoring systems demonstrated good performance in terms of both accuracy and discrimination, only marginally worse than prediction using the full set of risk variables. While more work is needed to identify clinically relevant differences between the early and late reintubation outcomes, the scoring systems provided here can be used by surgeons and patients to improve the overall quality of care.
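    How such a scoring system might be derived is sketched below, assuming an L1-penalized logistic regression for factor selection with coefficients rescaled to integer score weights; this is an illustrative reconstruction, not the paper's published method, and the data matrix is a placeholder.

```python
# Hedged sketch: deriving a point-based risk score from a sparse logistic model.
# X is a hypothetical (n_patients, 37) matrix of pre-/perioperative factors and
# y a binary reintubation flag; neither reflects the actual NSQIP variables.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

def fit_risk_score(X, y):
    """Fit an L1-penalized logistic model and convert the surviving
    coefficients into small integer score weights."""
    X_std = StandardScaler().fit_transform(X)
    model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    model.fit(X_std, y)
    coefs = model.coef_.ravel()
    kept = np.flatnonzero(coefs)  # factors surviving L1 selection
    # Rescale to small integers, as point-based scoring systems typically do.
    weights = np.round(coefs[kept] / np.abs(coefs[kept]).min()).astype(int)
    return kept, weights
```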

    Imbalance Learning and Its Application on Medical Datasets

    To extract more value from the increasingly large amounts of data available, data mining has been a hot topic attracting growing attention over the past two decades. One of the challenges in data mining is imbalance learning, which refers to learning from imbalanced datasets: datasets dominated by some classes (the majority) while other classes are under-represented (the minority). Imbalanced datasets degrade the learning ability of traditional methods, which are designed on the assumption that all classes are balanced and have equal misclassification costs, leading to poor performance on the minority classes. This phenomenon is usually called the class imbalance problem. However, it is usually the minority classes that are of greater interest and importance, such as sick cases in a medical dataset. Additionally, traditional methods are optimized to achieve maximum accuracy, which is not a suitable criterion for evaluating performance on imbalanced datasets. From the view of the data space, class imbalance can be classified as extrinsic or intrinsic. Extrinsic imbalance is caused by external factors, such as data transmission or data storage, while intrinsic imbalance means the dataset is inherently imbalanced due to its nature. As extrinsic imbalance can be fixed by collecting more samples, this thesis mainly focuses on two scenarios of intrinsic imbalance: machine learning for imbalanced structured datasets and deep learning for imbalanced image datasets.

    Solutions to the class imbalance problem, known as imbalance learning methods, can be grouped into data-level methods (re-sampling), algorithm-level methods (re-weighting) and hybrid methods. Data-level methods modify the class distribution of the training dataset to create balanced training sets; typical examples are over-sampling and under-sampling. Instead of modifying the data distribution, algorithm-level methods adjust the misclassification costs to alleviate the class imbalance problem; one typical example is cost-sensitive learning. Hybrid methods usually combine data-level and algorithm-level methods. However, existing imbalance learning methods encounter different kinds of problems. Over-sampling methods add minority samples to create balanced training sets, which may cause the trained model to overfit the minority class. Under-sampling methods create balanced training sets by discarding majority samples, which leads to information loss and poor performance of the trained model. Cost-sensitive methods usually need assistance from a domain expert to define the task-specific misclassification costs, so their generalization ability is poor. In particular, for deep learning methods under class imbalance, re-sampling methods may introduce a large computational cost and existing re-weighting methods can lead to poor performance. The objective of this dissertation is to understand feature differences under class imbalance and to improve classification performance on structured and image datasets. This thesis proposes two machine learning methods for imbalanced structured datasets and one deep learning method for imbalanced image datasets. The proposed methods are evaluated on several medical datasets, which are intrinsically imbalanced.

    Firstly, we study the feature differences between the majority class and the minority class of an imbalanced medical dataset collected from a Chinese hospital.
    After data cleaning and structuring, we obtain 3,292 kidney stone cases treated by percutaneous nephrolithotomy from 2012 to 2019. Of these, 651 (19.78%) cases had postoperative complications, which makes complication prediction an imbalanced classification task. We propose a sampling-based method, SMOTE-XGBoost, and apply it to build a postoperative complication prediction model. Experimental results show that the proposed method outperforms classic machine learning methods. Furthermore, traditional prediction models for percutaneous nephrolithotomy are designed to predict kidney stone status and overlook complication-related features, which degrades their performance on complication prediction tasks. To this end, we merge more features into the proposed sampling-based method and further improve its classification performance. Overall, SMOTE-XGBoost achieves an AUC of 0.7077, which is 41.54% higher than that of S.T.O.N.E. nephrolithometry, a traditional prediction model for percutaneous nephrolithotomy.

    After reviewing existing machine learning methods under class imbalance, we propose a novel ensemble learning approach called Multiple bAlance Subset Stacking (MASS). MASS first cuts the majority class into multiple subsets of the size of the minority set and combines each majority subset with the minority set to form one balanced subset; in this way, MASS overcomes the problem of information loss because it does not discard any majority sample. Each balanced subset is used to train one base classifier. The original dataset is then fed to all the trained base classifiers, whose outputs are used to generate the stacking dataset, and a stacking model is trained on the stacking dataset to obtain the optimal weights for the base classifiers. Because the stacking dataset keeps the same labels as the original dataset, this avoids the overfitting problem. Finally, we obtain an ensembled strong model from the trained base classifiers and the stacking model. Extensive experimental results on three medical datasets show that MASS outperforms baseline methods, and its robustness is demonstrated across different choices of base classifier. We also design a parallel version of MASS to reduce the training time cost; a speedup analysis shows that Parallel MASS greatly reduces training time on large datasets, by up to 101.8% compared with MASS in our experiments.

    When it comes to the class imbalance problem in image datasets, existing imbalance learning methods suffer from large training costs and poor performance. After introducing the problems of applying re-sampling methods to image classification tasks, we demonstrate the issues of re-weighting by class frequencies through experimental results on one medical image dataset. We then propose a novel re-weighting method, the Hardness Aware Dynamic (HAD) loss, to solve the class imbalance problem in image datasets. After each training epoch of a deep neural network, we compute the classification hardness of each class, and in the next epoch we assign higher class weights to the classes with large classification hardness values and vice versa. In this way, HAD tunes the weight of each sample in the loss function dynamically during the training process. The experimental results show that HAD significantly outperforms state-of-the-art methods.
    Moreover, HAD greatly improves the classification accuracy of the minority classes while making only a small compromise in majority class accuracy. In particular, the HAD loss improves average precision by 10.04% compared with the best baseline, the focal loss, on the HAM10000 dataset. Finally, I conclude this dissertation with our contributions to imbalance learning and provide an overview of potential directions for future research, including extensions of the three proposed methods, the development of task-specific algorithms, and addressing the challenges of within-class imbalance.
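    As a minimal sketch of the data-level approach described above, the following combines SMOTE over-sampling with an XGBoost classifier, assuming the imbalanced-learn and xgboost packages; the synthetic dataset merely stands in for the kidney stone cohort, and the hyperparameters are illustrative.

```python
# Hedged sketch of a SMOTE + XGBoost pipeline: over-sample the minority class
# on the training split only, then fit a gradient-boosted classifier.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Synthetic stand-in with roughly a 20% minority class.
X, y = make_classification(n_samples=3000, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Balance only the training split so the test distribution stays untouched.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)

clf = XGBClassifier(n_estimators=200, eval_metric="logloss")
clf.fit(X_bal, y_bal)
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```

    Keeping the test split untouched by SMOTE is the key design choice: resampling before the split would leak synthetic minority samples into evaluation and inflate the AUC.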

    Deep Risk Prediction and Embedding of Patient Data: Application to Acute Gastrointestinal Bleeding

    Acute gastrointestinal bleeding is a common and costly condition, accounting for over 2.2 million hospital days and 19.2 billion dollars of medical charges annually. Risk stratification is a critical part of the initial assessment of patients with acute gastrointestinal bleeding. Although all national and international guidelines recommend the use of risk-assessment scoring systems, they are not commonly used in practice, have sub-optimal performance, may be applied incorrectly, and are not easily updated. With the advent of widespread electronic health record adoption, longitudinal clinical data captured during the clinical encounter are now available. However, these data are often noisy, sparse, and heterogeneous. Unsupervised machine learning algorithms may be able to identify structure within electronic health record data while accounting for key issues with the data generation process: measurements missing-not-at-random and information captured in unstructured clinical note text. Deep learning tools can create electronic health record-based models that perform better than clinical risk scores for gastrointestinal bleeding and are well-suited to learning from new data. Furthermore, these models can be used to predict risk trajectories over time, leveraging the longitudinal nature of the electronic health record. The foundation of creating relevant tools is the definition of a relevant outcome measure; in acute gastrointestinal bleeding, a composite outcome of red blood cell transfusion, hemostatic intervention, and all-cause 30-day mortality is a relevant, actionable outcome that reflects the need for hospital-based intervention. However, epidemiological trends may affect the relevance and effectiveness of the outcome measure when applied across multiple settings and patient populations. Understanding the trends in practice, potential areas of disparities, and the value proposition for using risk stratification in patients presenting to the Emergency Department with acute gastrointestinal bleeding is important to understanding how best to implement a robust, generalizable risk stratification tool. Key findings include a decrease in the rate of red blood cell transfusion since 2014 and disparities in access to upper endoscopy for patients with upper gastrointestinal bleeding by race/ethnicity across urban and rural hospitals. Projected accumulated savings from consistent implementation of risk stratification tools for upper gastrointestinal bleeding total approximately $1 billion five years after implementation. Most current risk scores were designed for use based on the location of the bleeding source: the upper or lower gastrointestinal tract. However, the location of the bleeding source is not always clear at presentation. I develop and validate electronic health record-based deep learning and machine learning tools for patients presenting with symptoms of acute gastrointestinal bleeding (e.g., hematemesis, melena, hematochezia), which is more relevant and useful in clinical practice. I show that they outperform the leading clinical risk scores for upper and lower gastrointestinal bleeding, the Glasgow Blatchford Score and the Oakland score. While the best performing gradient boosted decision tree model has overall performance equivalent to the fully connected feedforward neural network model, at the very low risk threshold of 99% sensitivity the deep learning model identifies more very low risk patients.
    Using another deep learning model that can model longitudinal risk, the long short-term memory recurrent neural network, the need for red blood cell transfusion can be predicted at each 4-hour interval of the first 24 hours of intensive care unit stay for high-risk patients with acute gastrointestinal bleeding. Finally, for implementation it is important to find patients with symptoms of acute gastrointestinal bleeding in real time and to characterize them by risk using data available in the electronic health record. A decision rule-based electronic health record phenotype has performance equivalent, as measured by positive predictive value, to deep learning and natural language processing-based models, and after live implementation it appears to have increased the use of the Acute Gastrointestinal Bleeding Clinical Care pathway. Patients with acute gastrointestinal bleeding but with other groups of disease concepts can be differentiated by directly mapping unstructured clinical text to a common ontology and treating the vector of concepts as signals on a knowledge graph; these patients can be differentiated using unbalanced diffusion earth mover’s distances on the graph. For electronic health record data with data missing not at random, MURAL, an unsupervised random forest-based method, handles data with missing values and generates visualizations that characterize patients with gastrointestinal bleeding. This thesis forms a basis for understanding the potential of machine learning and deep learning tools to characterize risk for patients with acute gastrointestinal bleeding. In the future, these tools may be critical in implementing integrated risk assessment to keep low-risk patients out of the hospital and to guide resuscitation and timely endoscopic procedures for patients at higher risk of clinical decompensation.
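    A minimal sketch of the interval-level risk model, assuming PyTorch: an LSTM consumes one feature vector per 4-hour interval over the first 24 ICU hours (six steps) and emits a transfusion-risk probability at each step. The input and hidden sizes here are illustrative, not the study's configuration.

```python
# Hedged sketch: per-interval transfusion risk with an LSTM.
import torch
import torch.nn as nn

class IntervalRiskLSTM(nn.Module):
    def __init__(self, n_features: int = 32, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                     # x: (batch, 6, n_features)
        out, _ = self.lstm(x)                 # hidden state at every interval
        return torch.sigmoid(self.head(out))  # risk per 4-hour interval

model = IntervalRiskLSTM()
vitals = torch.randn(8, 6, 32)                # 8 patients, 6 intervals, 32 features
risk = model(vitals)                          # (8, 6, 1) probabilities
```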

    Statistical machines for trauma hospital outcomes research: Application to the PRospective, Observational, Multi-center Major trauma Transfusion (PROMMTT) study

    Improving the treatment of trauma, a leading cause of death worldwide, is of great clinical and public health interest. This analysis introduces flexible statistical methods for estimating center-level effects on individual outcomes in the context of highly variable patient populations, such as those of the PRospective, Observational, Multi-center Major trauma Transfusion (PROMMTT) study. Ten US level I trauma centers enrolled a total of 1,245 trauma patients who survived at least 30 minutes after admission and received at least one unit of red blood cells. Outcomes included death, multiple organ failure, substantial bleeding, and transfusion of blood products. The centers involved were classified as either large- or small-volume based on the number of massive transfusion patients enrolled during the study period. We focused on estimation of parameters inspired by causal inference, specifically the estimated impact on patient outcomes of the volume of the trauma hospital that treated them. We defined this association as the change in mean outcomes of interest that would be observed if, contrary to fact, subjects from large-volume sites had been treated at small-volume sites (the effect of treatment among the treated). We estimated this parameter using three different methods, some of which use data-adaptive machine learning tools to derive the outcome models, minimizing residual confounding by reducing model misspecification. Unadjusted and adjusted estimates sometimes differed dramatically, demonstrating the need to account for differences in patient characteristics when comparing centers. In addition, the estimators based on robust adjustment methods showed potential impacts of hospital volume. For instance, we estimated a survival benefit for patients treated at large-volume sites, which was not apparent in simpler, unadjusted comparisons. By removing arbitrary modeling decisions from the estimation process and concentrating on parameters that have more direct policy implications, these potentially automated approaches allow methodological standardization across similar comparative effectiveness studies.
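    One simple way to estimate the effect of treatment among the treated is g-computation: fit an outcome model among small-volume-site patients, predict counterfactual outcomes for the large-volume-site patients, and contrast those predictions with the observed outcomes. The sketch below illustrates this under stated assumptions; the study's actual estimators, which use data-adaptive machine learning, are more sophisticated.

```python
# Hedged sketch: effect of treatment among the treated (ETT) via g-computation.
# X: patient covariates, y: binary outcome (e.g., 30-day death),
# treated: 1 = treated at a large-volume site. All inputs are placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def ett_gcomp(X: np.ndarray, y: np.ndarray, treated: np.ndarray) -> float:
    outcome_model = GradientBoostingClassifier()
    outcome_model.fit(X[treated == 0], y[treated == 0])  # fit on small-volume sites
    y_cf = outcome_model.predict_proba(X[treated == 1])[:, 1]  # counterfactual risk
    # Observed mean among the treated minus their predicted counterfactual mean.
    return y[treated == 1].mean() - y_cf.mean()
```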

    Optimising cardiac services using routinely collected data and discrete event simulation

    Background: The current practice of managing hospital resources, including beds, is largely driven by measuring past or expected utilisation. This practice, however, does not reflect variability among patients; consequently, managers and clinicians cannot make fully informed decisions based upon these measures, which are inadequate for planning and managing complex systems. Aim: To analyse how variation related to patient conditions and adverse events affects resource utilisation and operational performance. Methods: Data pertaining to cardiac patients (cardiothoracic and cardiology, n=2241) were collected from two major hospitals in Oman. Factors influencing resource utilisation were assessed using logistic regression. Patients were also classified by their resource utilisation using decision trees to assist in predicting hospital stay. Finally, discrete event simulation modelling was used to evaluate how patient factors and postoperative complications affect operational performance, as sketched below. Results: 26.5% of the patients experienced prolonged Length of Stay (LOS) in intensive care units and 30% in the ward. Patients with prolonged postoperative LOS accounted for 60% of total patient days. The factors explaining the largest amount of variance in resource use following cardiac procedures included body mass index, type of surgery, Cardiopulmonary Bypass (CPB) use, non-elective surgery, number of complications, blood transfusion, chronic heart failure, and previous angioplasty. Allocating resources based on patients' expected LOS reduced surgery cancellations and waiting times while increasing overall throughput. Complications had a significant effect on perioperative operational performance, such as surgery cancellations; the effect was profound when complications occurred in the intensive care unit, where capacity was limited. Based on the simulation model, eliminating some complications would allow a larger patient population to be treated. Conclusion: Integrating influential factors into resource planning through simulation modelling is an effective way to estimate and manage hospital capacity.
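    A minimal discrete event simulation of this kind of pathway, assuming the SimPy package: patients flow through limited ICU and ward beds, and a complication prolongs the ICU stay. The bed capacities, arrival rate and length-of-stay distributions are illustrative placeholders, not the parameters fitted in the study.

```python
# Hedged sketch: cardiac patients compete for ICU and ward beds;
# a postoperative complication prolongs the ICU stay.
import random
import simpy

def patient(env, icu, ward, complication_rate=0.25):  # rate is illustrative
    with icu.request() as bed:
        yield bed
        los = random.expovariate(1 / 2.0)              # mean 2 ICU days (illustrative)
        if random.random() < complication_rate:
            los *= 2.5                                 # complication prolongs stay
        yield env.timeout(los)
    with ward.request() as bed:
        yield bed
        yield env.timeout(random.expovariate(1 / 5.0))  # mean 5 ward days

def arrivals(env, icu, ward):
    while True:
        yield env.timeout(random.expovariate(1 / 0.8))  # ~1 arrival per 0.8 days
        env.process(patient(env, icu, ward))

env = simpy.Environment()
icu = simpy.Resource(env, capacity=4)
ward = simpy.Resource(env, capacity=10)
env.process(arrivals(env, icu, ward))
env.run(until=365)  # simulate one year
```

    Varying the bed capacities or the complication rate in such a model is how the effect of complications on cancellations, waiting times and throughput can be quantified.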