
    Deepr: A Convolutional Net for Medical Records

    Feature engineering remains a major bottleneck when creating predictive systems from electronic medical records. At present, a key missing capability is the detection of predictive, regular clinical motifs in irregular episodic records. We present Deepr (short for Deep record), a new end-to-end deep learning system that learns to extract features from medical records and predict future risk automatically. Deepr transforms a record into a sequence of discrete elements separated by coded time gaps and hospital transfers. On top of the sequence, a convolutional neural net detects and combines predictive local clinical motifs to stratify risk. Deepr permits transparent inspection and visualization of its inner workings. We validate Deepr on hospital data, predicting unplanned readmission after discharge. Deepr achieves superior accuracy compared to traditional techniques, detects meaningful clinical motifs, and uncovers the underlying structure of the disease and intervention space.
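The motif-detection step described above can be sketched as a one-dimensional convolution over embedded code sequences followed by max-pooling over time. This is a minimal illustration of the general technique, not Deepr's actual configuration: the vocabulary, time-gap tokens, dimensions, and filter count below are all invented stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabulary: diagnosis codes plus coded time-gap/transfer tokens.
vocab = ["E11", "I50", "N18", "I10", "GAP_1-3m", "GAP_3-6m", "TRANSFER"]
code_to_idx = {c: i for i, c in enumerate(vocab)}

embed_dim, kernel_size, n_filters = 8, 3, 4
E = rng.normal(size=(len(vocab), embed_dim))              # code embedding table
W = rng.normal(size=(n_filters, kernel_size, embed_dim))  # motif-detecting filters

def conv_max_pool(record):
    """Embed a coded record, slide each filter over local motifs, max-pool over time."""
    x = E[[code_to_idx[c] for c in record]]               # (T, embed_dim)
    n_pos = len(record) - kernel_size + 1
    acts = np.array([[np.sum(W[f] * x[t:t + kernel_size])
                      for t in range(n_pos)] for f in range(n_filters)])
    acts = np.maximum(acts, 0.0)                          # ReLU
    return acts.max(axis=1)                               # one feature per motif detector

record = ["E11", "GAP_1-3m", "I50", "GAP_3-6m", "N18", "I10"]
features = conv_max_pool(record)                          # fed to a risk output layer
print(features.shape)
```

Max-pooling makes the extracted features invariant to where in the episodic sequence a motif occurs, which is what lets the net handle irregular records of varying length.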

    A New Scalable, Portable, and Memory-Efficient Predictive Analytics Framework for Predicting Time-to-Event Outcomes in Healthcare

    Time-to-event outcomes are prevalent in medical research. To handle these outcomes, as well as censored observations, statistical and survival regression methods are widely used under the assumption of linear association; however, clinicopathological features often exhibit nonlinear correlations. Machine learning (ML) algorithms have recently been adapted to handle nonlinear correlations effectively. One drawback of ML models is that they can fit idiosyncratic features of a training dataset; due to this overlearning, they perform well on training data but generalize poorly to test data. The features we choose indirectly influence the performance of ML prediction models, and with the expansion of big data in biomedical informatics, appropriate feature engineering and feature selection are vital to ML success. Ensemble learning algorithms further help decrease bias and variance by combining the predictions of multiple models. In this study, we constructed a new scalable, portable, and memory-efficient predictive analytics framework that fits four components (feature engineering, survival analysis, feature selection, and ensemble learning) together. Our framework first applies feature engineering techniques such as binarization, discretization, transformation, and normalization to the raw dataset. The normalized feature set is then fed to Cox survival regression, which identifies features highly correlated with the outcome. The resultant feature set is passed to the eXtreme Gradient Boosting (XGBoost) ensemble learning and Recursive Feature Elimination algorithms. XGBoost uses a gradient-boosted decision tree algorithm in which new models are created sequentially to predict the residuals of prior models and are then added together to make the final prediction. In our experiments, we analyzed a cohort of cardiac surgery patients drawn from a multi-hospital academic health system.
The model evaluated 72 perioperative variables that impact readmission within 30 days of discharge, derived 48 significant features, and demonstrated optimal predictive ability with feature sets ranging from 16 to 24. The areas under the receiver operating characteristic curve observed for the feature set of 16 were 0.8816 and 0.9307 at the 35th and 151st iterations, respectively. Our model showed improved performance compared to state-of-the-art models and could be useful for decision support in clinical settings.
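The feature-selection loop in the framework above can be illustrated with a minimal recursive feature elimination sketch. A least-squares linear scorer stands in for the boosted-tree importance ranking, and the synthetic data are invented, not the perioperative cohort; only the first three of ten features actually drive the outcome.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in: 200 patients, 10 features, 3 of which drive readmission.
X = rng.normal(size=(200, 10))
y = (X[:, 0] + 0.8 * X[:, 1] - 0.6 * X[:, 2] + 0.3 * rng.normal(size=200)) > 0

def fit_linear(X, y):
    """Least-squares linear scorer standing in for the boosted-tree model."""
    Xb = np.c_[X, np.ones(len(X))]                 # add intercept column
    w, *_ = np.linalg.lstsq(Xb, y.astype(float), rcond=None)
    return w[:-1]                                  # coefficients, intercept dropped

def rfe(X, y, n_keep):
    """Recursive feature elimination: refit, then drop the weakest feature each round."""
    keep = list(range(X.shape[1]))
    while len(keep) > n_keep:
        w = fit_linear(X[:, keep], y)
        keep.pop(int(np.argmin(np.abs(w))))        # remove lowest-importance feature
    return keep

selected = rfe(X, y, n_keep=3)
print(sorted(selected))
```

Refitting after each elimination is what distinguishes recursive elimination from a one-shot importance ranking: removing one feature can change the apparent importance of the rest.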

    Feature selection and personalized modeling on medical adverse outcome prediction

    This thesis concerns medical adverse outcome prediction and comprises three parts: feature selection, time-to-event prediction, and personalized modeling. For feature selection, we propose a three-stage method that ensembles filter, embedded, and wrapper selection techniques, combined so as to select a feature set that is both stable and predictive while reducing the computational burden. Datasets for two adverse outcome prediction problems, 30-day hip fracture readmission and diabetic retinopathy prognosis, are derived from electronic health records and used to demonstrate the effectiveness of the proposed method. With the selected features, we investigate the application of classical survival analysis models, namely accelerated failure time models, Cox proportional hazards regression models, and mixture cure models, to adverse outcome prediction. Unlike binary classifiers, survival analysis methods consider both the event status and the time-to-event information, providing more flexibility when we are interested in the occurrence of adverse outcomes in different time windows. Lastly, we introduce personalized modeling (PM) to predict adverse outcomes based on the patients most similar to each query patient. Unlike the commonly used global modeling approach, PM builds the prediction model on a smaller but more similar patient cohort, leading to more individual-based predictions and customized risk factor profiles. Both static and metric-learning distance measures are used to identify the similar patient cohort. We show that PM together with feature selection achieves better prediction performance by using only similar patients, compared with using data from all available patients in a one-size-fits-all model.
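The personalized modeling idea, estimating risk from only the most similar patients rather than from the whole population, can be sketched with a static Euclidean distance measure. The features, outcome rule, and cohort size k below are illustrative assumptions, not the thesis's actual data or settings.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy EHR-style matrix: 300 patients, 5 selected features, binary adverse outcome.
X = rng.normal(size=(300, 5))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

def personalized_predict(X, y, query, k=25):
    """Personalized modeling: score using only the k most similar patients."""
    d = np.linalg.norm(X - query, axis=1)   # static Euclidean distance measure
    cohort = np.argsort(d)[:k]              # most similar patient cohort
    return y[cohort].mean()                 # local risk estimate for the query patient

query = np.array([1.5, -1.0, 0.0, 0.0, 0.0])   # patient lying in the high-risk region
risk = personalized_predict(X, y, query)
print(risk)
```

A metric-learning variant would replace the fixed Euclidean distance with a learned one that weights features by their relevance to the outcome, and the cohort mean could be replaced by any local model fitted on the k neighbors.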

    Clinical Treatment Human Disease Networks and Comparative Effectiveness Research: Analyses of the Medicare Administrative Data

    As the nation’s largest healthcare payer, the Medicare program generates an immense volume of medical data. With an increasing emphasis on evidence-based care, how to effectively handle and draw inferences from heterogeneous and noisy healthcare data remains an important question. High-quality analysis could improve the quality, planning, and administration of health services, evaluate comparative therapies, and advance research on epidemiology and disease etiology. This is especially true for older adults, since this population’s health condition is generally complicated by multimorbidity and the healthcare system for older adults is riddled with administrative and regulatory complexities. Taking advantage of the scale and comprehensiveness of the Medicare data, this dissertation focuses on outcome research, human disease networks, and comparative effectiveness research for older adults. Healthcare outcome measures such as mortality, readmission, length of stay (LOS), and medical costs have been extensively studied. However, existing analyses generally focus on a single disease (or at most a few pre-selected, closely related diseases) or on all diseases combined. It is increasingly evident that human diseases are interconnected. Motivated by the emerging human disease network (HDN) analysis, we conduct network analysis of disease interconnections on healthcare outcome measures. First, we propose a clinical treatment HDN that analyzes inpatient LOS data. In the network graph, each node represents one disease, and two nodes are linked with an edge if their disease-specific LOS values are correlated (conditional on the LOS of all other diseases). To accommodate zero-inflated LOS data, we propose a network construction approach based on the multivariate Hurdle model. We analyze the Medicare inpatient data for the period January 2008 to December 2018.
Based on the constructed network, key network properties such as connectivity, modules/hubs, and temporal variation are analyzed. The results are found to be biomedically sensible, especially from a treatment perspective. A closer examination also reveals novel findings that are under- or uninvestigated in individual-disease studies. This work has been published in Statistics in Medicine. Second, considering that many healthcare outcomes are closely related to each other, we propose a high-dimensional clinical treatment HDN that can incorporate multiple outcomes. We construct a clinical treatment HDN on LOS and readmission, noting that the proposed method can be easily generalized to other outcomes of different data types. To deal with uniquely challenging data distributions (high dimensionality and zero inflation), a new network construction approach is developed based on the integrative analysis of generalized linear models. Data analysis is conducted using the Medicare inpatient data from January 2010 to December 2018. The network structure and properties are found to be similar to those of the LOS HDN (in Chapter 2) but provide additional insights into disease interconnections by considering both LOS and readmission. The proposed clinical treatment HDNs can promote a better understanding of human diseases and their interconnections, guide more efficient disease management and healthcare resource allocation, and foster complex network analysis. The manuscript of this work has been drafted and is ready for submission. Comparative effectiveness research aims to directly compare the outcomes of two or more healthcare strategies for a particular medical condition. Such analysis can provide information about the risks, benefits, and costs of different treatment options, thus guiding better clinical decisions. While a randomized controlled trial is the gold-standard approach, it has several limitations.
Efforts have therefore been made to utilize healthcare record data in comparative effectiveness research. To estimate and compare the causal effects of treatments and interventions, we use the Medicare data to emulate target clinical trials and develop a deep learning-based analysis approach. Under emulation, target clinical trials are explicitly “assembled” using the Medicare data; as such, statistical methods for clinical trials can be directly applied to estimate causal effects. With emulation analysis, we evaluate the effectiveness and safety outcomes of rivaroxaban versus dabigatran for Medicare patients with atrial fibrillation. The results show that dabigatran is superior in terms of time to any primary event (including ischemic stroke, other thromboembolic events, major bleeding, and death), major bleeding, and mortality. This work has been submitted to Clinical Epidemiology. Considering that many regression-based statistical methods (e.g., the Cox proportional hazards model for survival data) rely on restrictive data assumptions, we further develop an innovative deep learning-based analysis strategy. With the “emulation + deep learning” approach, we study the survival outcomes of endovascular repair versus open aortic repair for Medicare patients with abdominal aortic aneurysms. Endovascular repair is found to have survival advantages in both short- and long-term mortality. This work has been published in Entropy. Departing significantly from and advancing the existing literature, this dissertation extends the scope of outcome research, human disease networks, and comparative effectiveness research. The findings are shown to have scientific merit, and the methodological developments may have other applications and serve as prototypes for future analyses.
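The motivation for moving beyond restrictive regression assumptions can be illustrated with a nonparametric Kaplan–Meier estimator, which compares the survival curves of two emulated-trial arms without assuming proportional hazards. The follow-up times and event indicators below are invented toy values, not Medicare data, and the estimator assumes distinct event times for simplicity.

```python
import numpy as np

def kaplan_meier(times, events):
    """Nonparametric survival curve; assumes distinct event times (no ties)."""
    order = np.argsort(times)
    times, events = times[order], events[order]
    at_risk = len(times)
    t_pts, surv = [0.0], [1.0]
    for t, e in zip(times, events):
        if e:  # an observed event; censored subjects only shrink the risk set
            surv.append(surv[-1] * (1 - 1 / at_risk))
            t_pts.append(t)
        at_risk -= 1
    return np.array(t_pts), np.array(surv)

# Hypothetical emulated-trial arms: follow-up time (months), event indicator
# (1 = event, 0 = censored).
t_a = np.array([3.0, 6.0, 8.0, 12.0, 15.0, 20.0])
e_a = np.array([1, 1, 0, 1, 1, 0])
t_b = np.array([5.0, 9.0, 14.0, 18.0, 24.0, 30.0])
e_b = np.array([1, 0, 1, 0, 1, 0])

_, s_a = kaplan_meier(t_a, e_a)
_, s_b = kaplan_meier(t_b, e_b)
print(s_a[-1], s_b[-1])
```

Because each arm's curve is estimated independently from the observed risk sets, the comparison remains valid even when the hazard ratio between arms changes over follow-up, which is exactly the situation that breaks the Cox model's proportional-hazards assumption.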