4,308 research outputs found

    Predicting diabetes mellitus using SMOTE and ensemble machine learning approach: The Henry Ford ExercIse Testing (FIT) project

    Get PDF
    Machine learning is becoming a popular and important approach in the field of medical research. In this study, we investigate the relative performance of various machine learning methods such as Decision Tree, Naïve Bayes, Logistic Regression, Logistic Model Tree and Random Forests for predicting incident diabetes using medical records of cardiorespiratory fitness. In addition, we apply different techniques to uncover potential predictors of diabetes. This FIT project study used data of 32,555 patients who are free of any known coronary artery disease or heart failure who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009 and had a complete 5-year follow-up. At the completion of the fifth year, 5,099 of those patients have developed diabetes. The dataset contained 62 attributes classified into four categories: demographic characteristics, disease history, medication use history, and stress test vital signs. We developed an Ensembling-based predictive model using 13 attributes that were selected based on their clinical importance, Multiple Linear Regression, and Information Gain Ranking methods. The negative effect of the imbalance class of the constructed model was handled by Synthetic Minority Oversampling Technique (SMOTE). The overall performance of the predictive model classifier was improved by the Ensemble machine learning approach using the Vote method with three Decision Trees (Naïve Bayes Tree, Random Forest, and Logistic Model Tree) and achieved high accuracy of prediction (AUC = 0.92). The study shows the potential of ensembling and SMOTE approaches for predicting incident diabetes using cardiorespiratory fitness data

    Deepr: A Convolutional Net for Medical Records

    Full text link
    Feature engineering remains a major bottleneck when creating predictive systems from electronic medical records. At present, an important missing element is detecting predictive regular clinical motifs from irregular episodic records. We present Deepr (short for Deep record), a new end-to-end deep learning system that learns to extract features from medical records and predicts future risk automatically. Deepr transforms a record into a sequence of discrete elements separated by coded time gaps and hospital transfers. On top of the sequence is a convolutional neural net that detects and combines predictive local clinical motifs to stratify the risk. Deepr permits transparent inspection and visualization of its inner working. We validate Deepr on hospital data to predict unplanned readmission after discharge. Deepr achieves superior accuracy compared to traditional techniques, detects meaningful clinical motifs, and uncovers the underlying structure of the disease and intervention space

    Statistical analysis and data mining of Medicare patients with diabetes.

    Get PDF
    The purpose of this dissertation is to find ways to decrease Medicare costs and to study health outcomes of diabetes patients as well as to investigate the influence of Medicare, part D since its introduction in 2006 using the CMS CCW (Chronic Condition Data Warehouse) Data and the MEPS (Medical Expenditure Panel Survey) data. In this dissertation, we introduce pattern recognition analysis into the study of medical characteristics and demographic characteristics of the inpatients who have a higher readmission risk. We also broaden the cost-effectiveness analysis by including medical resources usage when investigating the effects of Medicare, part D. In addition, we apply several statistical linear models such as the generalized linear model and data mining techniques such as the neural network model to study the costs and outcomes of both inpatients and outpatients with diabetes in Medicare. Moreover, some descriptive statistics such as kernel density estimation and survival analysis are also employed. One important conclusion from these analyses is that only diseases and procedures, rather than age are key factors to inpatients\u27 mortality rate. Another important discovery is that at the influence of Medicare part 0, insulin is the most efficient oral anti-diabetes drug treatment and that the drug usage in 2006 is not as stable as that in 2005. We also find that the patients who are discharged to home or hospice are more likely to re-enter the hospital after discharge within 30 days. Two - way interaction effect analysis demonstrates that diabetes complications interact with each other, which makes healthcare costs and health outcomes different between a case with one complication and a case with two complications. Accordingly, we propose some useful suggestions. For instance, as for how to decrease Medicare payments for outpatients with diabetes, we suggest that the patients should often monitor their blood glucose level. We also recommend that inpatients with diabetes should pay more attention to their kidney disease, and use prevention to avoid such diseases to decrease the costs

    Cardiovascular/Stroke Risk Stratification in Diabetic Foot Infection Patients Using Deep Learning-Based Artificial Intelligence: An Investigative Study

    Get PDF
    A diabetic foot infection (DFI) is among the most serious, incurable, and costly to treat conditions. The presence of a DFI renders machine learning (ML) systems extremely nonlinear, posing difficulties in CVD/stroke risk stratification. In addition, there is a limited number of well-explained ML paradigms due to comorbidity, sample size limits, and weak scientific and clinical validation methodologies. Deep neural networks (DNN) are potent machines for learning that generalize nonlinear situations. The objective of this article is to propose a novel investigation of deep learning (DL) solutions for predicting CVD/stroke risk in DFI patients. The Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) search strategy was used for the selection of 207 studies. We hypothesize that a DFI is responsible for increased morbidity and mortality due to the worsening of atherosclerotic disease and affecting coronary artery disease (CAD). Since surrogate biomarkers for CAD, such as carotid artery disease, can be used for monitoring CVD, we can thus use a DL-based model, namely, Long Short-Term Memory (LSTM) and Recurrent Neural Networks (RNN) for CVD/stroke risk prediction in DFI patients, which combines covariates such as office and laboratory-based biomarkers, carotid ultrasound image phenotype (CUSIP) lesions, along with the DFI severity. We confirmed the viability of CVD/stroke risk stratification in the DFI patients. Strong designs were found in the research of the DL architectures for CVD/stroke risk stratification. Finally, we analyzed the AI bias and proposed strategies for the early diagnosis of CVD/stroke in DFI patients. Since DFI patients have an aggressive atherosclerotic disease, leading to prominent CVD/stroke risk, we, therefore, conclude that the DL paradigm is very effective for predicting the risk of CVD/stroke in DFI patients

    Non-communicable Diseases, Big Data and Artificial Intelligence

    Get PDF
    This reprint includes 15 articles in the field of non-communicable Diseases, big data, and artificial intelligence, overviewing the most recent advances in the field of AI and their application potential in 3P medicine

    Interpretable Machine Learning Model for Clinical Decision Making

    Get PDF
    Despite machine learning models being increasingly used in medical decision-making and meeting classification predictive accuracy standards, they remain untrusted black-boxes due to decision-makers\u27 lack of insight into their complex logic. Therefore, it is necessary to develop interpretable machine learning models that will engender trust in the knowledge they generate and contribute to clinical decision-makers intention to adopt them in the field. The goal of this dissertation was to systematically investigate the applicability of interpretable model-agnostic methods to explain predictions of black-box machine learning models for medical decision-making. As proof of concept, this study addressed the problem of predicting the risk of emergency readmissions within 30 days of being discharged for heart failure patients. Using a benchmark data set, supervised classification models of differing complexity were trained to perform the prediction task. More specifically, Logistic Regression (LR), Random Forests (RF), Decision Trees (DT), and Gradient Boosting Machines (GBM) models were constructed using the Healthcare Cost and Utilization Project (HCUP) Nationwide Readmissions Database (NRD). The precision, recall, area under the ROC curve for each model were used to measure predictive accuracy. Local Interpretable Model-Agnostic Explanations (LIME) was used to generate explanations from the underlying trained models. LIME explanations were empirically evaluated using explanation stability and local fit (R2). The results demonstrated that local explanations generated by LIME created better estimates for Decision Trees (DT) classifiers
    corecore