
    Early Hospital Mortality Prediction Using Routine Vital Signs in ICU Patients

    In a clinical setting, there are countless scenarios in which a statistical prognosis for patients can be extremely beneficial to medical professionals, allowing them to allocate resources better and provide the best patient care. The purpose of this paper is to identify when in a patient’s stay a meaningful prediction of hospital mortality can be made to provide that prognosis. To accomplish this, eight clinical variables were extracted from the MIMIC-III database for ICU patients and supplied to an XGBoost model, an advanced Decision Tree Classifier that employs gradient boosting. Because the data were imbalanced, the positive cases were weighted more heavily, and the other parameter values were optimized using GridSearchCV. A static model demonstrated an average accuracy of 80.50% with an AUC-ROC of 0.800 and an AUC-PR of 0.429. However, a time-series analysis using statistics extracted from twelve hours of compounded, time-varying data generated a model with an accuracy of 83.28%, an AUC-ROC of 0.846, and an AUC-PR of 0.562. Additionally, the model demonstrated the importance of GCS and airway management in the prediction of mortality, indicating the need to focus more on these vitals in emergency situations. The time-series model was shown to be the most effective in predicting mortality, exemplifying the importance of providing time-series data that can detail the progress or decline of the patient. This implementation could be especially impactful in clinical settings by giving healthcare professionals the means to make quick and effective decisions.
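The time-series step described above, summarising a window of time-varying readings into fixed statistics, can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the abstract does not say which statistics were extracted, so the choice of mean, min, max, standard deviation, slope, and last value is an assumption, as are the function names and the sample heart-rate readings. The negatives-to-positives ratio shown is the heuristic commonly passed to XGBoost as `scale_pos_weight`; whether the paper used exactly this value is not stated.

```python
from statistics import mean, pstdev


def window_features(readings):
    """Summarise one vital sign over a window of (hour, value) readings.

    The set of statistics here is an assumption made for illustration.
    """
    hours = [h for h, _ in readings]
    values = [v for _, v in readings]
    h_bar, v_bar = mean(hours), mean(values)
    # Least-squares slope: average change in the value per hour.
    denom = sum((h - h_bar) ** 2 for h in hours)
    numer = sum((h - h_bar) * (v - v_bar) for h, v in readings)
    slope = numer / denom if denom else 0.0
    return {
        "mean": v_bar,
        "min": min(values),
        "max": max(values),
        "std": pstdev(values),
        "slope": slope,
        "last": values[-1],
    }


def positive_class_weight(labels):
    """Ratio of negative to positive labels, used to up-weight the rarer class."""
    positives = sum(labels)
    negatives = len(labels) - positives
    return negatives / positives


# Hypothetical heart-rate readings over a twelve-hour window: (hour, bpm).
heart_rate = [(0, 80), (4, 90), (8, 100), (12, 110)]
features = window_features(heart_rate)

# Hypothetical outcome labels: 1 = died in hospital, 0 = survived.
weight = positive_class_weight([0, 0, 0, 0, 1])
```

Each patient's window would yield one such dictionary per vital sign, and the concatenated values form a fixed-length row that a tree-based classifier can consume.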

    Bayesian Learning in the Counterfactual World

    Recent years have witnessed a surge of interest in the use of machine learning tools for causal inference. In contrast to the usual large data settings where the primary goal is prediction, many disciplines, such as health, economic and social sciences, are instead interested in causal questions. Learning individualized responses to an intervention is a crucial task in many applied fields (e.g., precision medicine, targeted advertising, precision agriculture) where the ultimate goal is to design optimal and highly personalized policies based on individual features. In this work, I thus tackle the problem of estimating causal effects of an intervention that are heterogeneous across a population of interest and depend on an individual set of characteristics (e.g., a patient's clinical record, a user's browsing history) in high-dimensional observational data settings. This is done by utilizing Bayesian Nonparametric or Probabilistic Machine Learning tools that are specifically adjusted for the causal setting and have desirable uncertainty quantification properties, with a focus on the issues of interpretability/explainability and the inclusion of domain experts' prior knowledge. I begin by introducing terminology and concepts from causality and causal reasoning in the first chapter. Then I include a literature review of some of the state-of-the-art regression-based methods for heterogeneous treatment effects estimation, with an attempt to build a unifying taxonomy and lay down the finite-sample empirical properties of these models.
The chapters forming the core of the dissertation instead present novel methods addressing existing issues in individualized causal effects estimation: Chapter 3 develops both a Bayesian tree ensemble method and a deep learning architecture to tackle interpretability, uncertainty coverage and targeted regularization; Chapter 4 introduces a novel multi-task Deep Kernel Learning method particularly suited for multi-outcome, multi-action scenarios. The last chapter concludes with a discussion.
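For orientation, the quantity usually targeted in heterogeneous treatment effect estimation is the conditional average treatment effect (CATE). The formulas below use standard potential-outcomes notation, not notation taken from the dissertation: $X$ (covariates), $Z$ (binary treatment indicator), and $Y(1)$, $Y(0)$ (potential outcomes) are assumed symbols.

```latex
\tau(x) = \mathbb{E}\left[\, Y(1) - Y(0) \mid X = x \,\right]
```

Under the usual unconfoundedness and overlap assumptions, this can be written in terms of observable quantities, which is what regression-based methods estimate:

```latex
\tau(x) = \mathbb{E}\left[\, Y \mid X = x,\ Z = 1 \,\right] - \mathbb{E}\left[\, Y \mid X = x,\ Z = 0 \,\right]
```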

    Pyramid: Enhancing Selectivity in Big Data Protection with Count Featurization

    Protecting vast quantities of data poses a daunting challenge for the growing number of organizations that collect, stockpile, and monetize it. The ability to distinguish data that is actually needed from data collected "just in case" would help these organizations to limit the latter's exposure to attack. A natural approach might be to monitor data use and retain only the working set of in-use data in accessible storage; unused data can be evicted to a highly protected store. However, many of today's big data applications rely on machine learning (ML) workloads that are periodically retrained by accessing, and thus exposing to attack, the entire data store. Training set minimization methods, such as count featurization, are often used to limit the data needed to train ML workloads in order to improve performance or scalability. We present Pyramid, a limited-exposure data management system that builds upon count featurization to enhance data protection. As such, Pyramid uniquely introduces both the idea and a proof of concept for leveraging training set minimization methods to instill rigor and selectivity into big data management. We integrated Pyramid into Spark Velox, a framework for ML-based targeting and personalization. We evaluate it on three applications and show that Pyramid approaches state-of-the-art models while training on less than 1% of the raw data.
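Count featurization, as the term is generally used, replaces a categorical feature value with counts of how often that value co-occurred with each label, so a model can train on compact count tables plus a small recent sample rather than the full history. The sketch below illustrates that general idea only and is not Pyramid's implementation: the function names, the `ad` feature, the sample rows, and the smoothing parameters `prior` and `strength` are all hypothetical.

```python
from collections import defaultdict


def build_count_table(rows, feature):
    """Count label outcomes for each value of one categorical feature.

    rows: iterable of (record_dict, label) pairs with label in {0, 1}.
    Returns a mapping value -> [negative_count, positive_count].
    """
    table = defaultdict(lambda: [0, 0])
    for record, label in rows:
        table[record[feature]][label] += 1
    return table


def featurize(record, feature, table, prior=0.5, strength=1.0):
    """Replace a categorical value with [times seen, smoothed positive rate].

    prior and strength are illustrative smoothing parameters: an unseen
    value falls back to the prior instead of dividing by zero.
    """
    neg, pos = table.get(record[feature], (0, 0))
    total = neg + pos
    rate = (pos + prior * strength) / (total + strength)
    return [total, rate]


# Hypothetical click logs: which ad was shown and whether it was clicked.
history = [
    ({"ad": "a1"}, 1),
    ({"ad": "a1"}, 0),
    ({"ad": "a1"}, 1),
    ({"ad": "a2"}, 0),
]
table = build_count_table(history, "ad")

seen_often = featurize({"ad": "a1"}, "ad", table)
seen_once = featurize({"ad": "a2"}, "ad", table)
never_seen = featurize({"ad": "a9"}, "ad", table)
```

Once the table is built, the raw `history` rows are no longer needed to produce features, which is the property that makes the technique relevant to limiting data exposure.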

    Predicting gene expression in the human malaria parasite Plasmodium falciparum using histone modification, nucleosome positioning, and 3D localization features.

    Empirical evidence suggests that the malaria parasite Plasmodium falciparum employs a broad range of mechanisms to regulate gene transcription throughout the organism's complex life cycle. To better understand this regulatory machinery, we assembled a rich collection of genomic and epigenomic data sets, including information about transcription factor (TF) binding motifs, patterns of covalent histone modifications, nucleosome occupancy, GC content, and global 3D genome architecture. We used these data to train machine learning models to discriminate between high-expression and low-expression genes, focusing on three distinct stages of the red blood cell phase of the Plasmodium life cycle. Our results highlight the importance of histone modifications and 3D chromatin architecture in Plasmodium transcriptional regulation and suggest that AP2 transcription factors may play a limited regulatory role, perhaps operating in conjunction with epigenetic factors.