110,120 research outputs found

    Predicting diabetes mellitus using SMOTE and ensemble machine learning approach: The Henry Ford ExercIse Testing (FIT) project

    Machine learning is becoming a popular and important approach in the field of medical research. In this study, we investigate the relative performance of various machine learning methods such as Decision Tree, Naïve Bayes, Logistic Regression, Logistic Model Tree and Random Forests for predicting incident diabetes using medical records of cardiorespiratory fitness. In addition, we apply different techniques to uncover potential predictors of diabetes. This FIT project study used data of 32,555 patients who were free of any known coronary artery disease or heart failure, who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009, and who had a complete 5-year follow-up. At the completion of the fifth year, 5,099 of those patients had developed diabetes. The dataset contained 62 attributes classified into four categories: demographic characteristics, disease history, medication use history, and stress test vital signs. We developed an ensembling-based predictive model using 13 attributes that were selected based on their clinical importance, Multiple Linear Regression, and Information Gain Ranking methods. The negative effect of class imbalance on the constructed model was handled by the Synthetic Minority Oversampling Technique (SMOTE). The overall performance of the predictive model classifier was improved by the Ensemble machine learning approach using the Vote method with three Decision Trees (Naïve Bayes Tree, Random Forest, and Logistic Model Tree) and achieved high accuracy of prediction (AUC = 0.92). The study shows the potential of ensembling and SMOTE approaches for predicting incident diabetes using cardiorespiratory fitness data.
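    The oversampling step named in this abstract can be illustrated with a minimal sketch of the general SMOTE idea: each synthetic minority sample is an interpolation between a minority point and one of its nearest minority neighbours. This is not the authors' pipeline; the function name `smote`, its parameters, and the toy data are assumptions made for illustration only.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority samples.

    Hypothetical helper: `minority` is a list of equal-length numeric
    feature vectors belonging to the under-represented class.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class: the four corners of the unit square.
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_synthetic = smote(toy_minority, n_synthetic=10, k=2)
```

    Because every synthetic point lies on a segment between two real minority points, the new samples stay inside the region the minority class already occupies, which is the property that distinguishes SMOTE from simple duplication.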

    Predictive modeling of housing instability and homelessness in the Veterans Health Administration

    OBJECTIVE: To develop and test predictive models of housing instability and homelessness based on responses to a brief screening instrument administered throughout the Veterans Health Administration (VHA). DATA SOURCES/STUDY SETTING: Electronic medical record data from 5.8 million Veterans who responded to the VHA's Homelessness Screening Clinical Reminder (HSCR) between October 2012 and September 2015. STUDY DESIGN: We randomly selected 80% of Veterans in our sample to develop predictive models. We evaluated the performance of both logistic regression and random forests, a machine learning algorithm, using the remaining 20% of cases. DATA COLLECTION/EXTRACTION METHODS: Data were extracted from two sources: VHA's Corporate Data Warehouse and National Homeless Registry. PRINCIPAL FINDINGS: Performance for all models was acceptable or better. Random forests models were more sensitive in predicting housing instability and homelessness than logistic regression, but less specific in predicting housing instability. Rates of positive screens for both outcomes were highest among Veterans in the top strata of model-predicted risk. CONCLUSIONS: Predictive models based on medical record data can identify Veterans likely to report housing instability and homelessness, making the HSCR screening process more efficient and informing new engagement strategies. Our findings have implications for similar instruments in other health care systems. Funding: U.S. Department of Veterans Affairs (VA) Health Services Research and Development (HSR&D), Grant/Award Number: IIR 13-334. Accepted manuscript.
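    The study design above rests on two standard pieces: a random 80/20 development/test split, and a comparison of models by sensitivity and specificity. A minimal sketch of both follows; the helper names and the toy labels are assumptions for illustration and do not reproduce the study's data or code.

```python
import random


def split_80_20(records, seed=0):
    """Randomly assign 80% of records to development and 20% to testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def sensitivity_specificity(y_true, y_pred):
    """Compute (sensitivity, specificity) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Toy example: 3 true positives-class cases, 5 negatives-class cases.
dev, test = split_80_20(range(100))
sens, spec = sensitivity_specificity(
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 1, 1],
)
```

    A model that is "more sensitive but less specific", as reported for random forests on housing instability, would show a higher first value and a lower second value than its comparator on the same held-out cases.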

    ARTMAP-IC and Medical Diagnosis: Instance Counting and Inconsistent Cases

    For complex database prediction problems such as medical diagnosis, the ARTMAP-IC neural network adds distributed prediction and category instance counting to the basic fuzzy ARTMAP system. For the ARTMAP match tracking algorithm, which controls search following a predictive error, a new version facilitates prediction with sparse or inconsistent data. Compared to the original match tracking algorithm (MT+), the new algorithm (MT-) better approximates the real-time network differential equations and further compresses memory without loss of performance. Simulations examine predictive accuracy on four medical databases: Pima Indian diabetes, breast cancer, heart disease, and gall bladder removal. ARTMAP-IC results are equal to or better than those of logistic regression, K nearest neighbor (KNN), the ADAP perceptron, multisurface pattern separation, CLASSIT, instance-based (IBL), and C4. ARTMAP dynamics are fast, stable, and scalable. A voting strategy improves prediction by training the system several times on different orderings of an input set. Voting, instance counting, and distributed representations combine to form confidence estimates for competing predictions. Funding: National Science Foundation (IRI 94-01659); Office of Naval Research (N00014-95-J-0409, N00014-95-0657)
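    The voting strategy mentioned in this abstract combines predictions from several training runs on differently ordered inputs. The sketch below shows only the generic combination step, a per-case majority vote, and says nothing about ARTMAP internals; the function name and the toy predictions are hypothetical.

```python
from collections import Counter


def majority_vote(runs):
    """Combine per-run predictions into one prediction per case.

    `runs` is a list of prediction lists, one per training run, all of the
    same length. Ties go to the label seen first among the tied labels.
    """
    if not runs:
        raise ValueError("need at least one run")
    n_cases = len(runs[0])
    if any(len(r) != n_cases for r in runs):
        raise ValueError("all runs must predict the same number of cases")
    combined = []
    for case_predictions in zip(*runs):
        label, _count = Counter(case_predictions).most_common(1)[0]
        combined.append(label)
    return combined


# Three hypothetical runs, each predicting labels for the same three cases.
voted = majority_vote([[1, 0, 1], [1, 1, 0], [0, 1, 1]])
```

    The fraction of runs agreeing with the winning label is one simple way such votes can double as a confidence estimate.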

    Plate versus bulk trolley food service in a hospital: comparison of patients’ satisfaction

    Objective: The aim of this research was to compare plate with bulk trolley food service in hospitals in terms of patient satisfaction. Key factors distinguishing satisfaction with each system were also identified. Methods: A consumer opinion card (n = 180), concentrating on the quality indicators of core foods, was used to measure patient satisfaction and compare two systems of delivery, plate and trolley. Binary logistic regression analysis was used to build a model that would predict food service style on the basis of the food attributes measured. Further investigation used multinomial logistic regression to predict opinion for the assessment of each food attribute within food service style. Results: The bulk trolley method of food distribution gave all foods a more acceptable texture than the plate system of delivery. It also gave some foods a more acceptable temperature (potato, P = 0.007; poached fish, P = 0.001; and minced beef, P ≤ 0.0005) and other foods a more acceptable flavor (broccoli, P ≤ 0.0005; carrots, P ≤ 0.0005; and poached fish, P = 0.001), whereas under the plate system flavor is associated with bad opinion or dissatisfaction. A model was built indicating patient satisfaction with the two service systems. Conclusion: This research confirms that patient satisfaction is enhanced by choice at the point of consumption (trolley system); however, portion size was not the controlling dimension. Temperature and texture were the most important attributes that measure patient satisfaction with food, thus defining the focus for hospital food service managers. To date, a model predicting patient satisfaction with the quality of food as served has not been proposed, and as such this work adds to the body of knowledge in this field. This report brings new information about the service style of dishes for improving the quality of food and thus enhancing patient satisfaction.

    Proteomic-biostatistic integrated approach for finding the underlying molecular determinants of hypertension in human plasma

    Despite advancements in lowering blood pressure, the best approach to lower it remains controversial because of the lack of information on the molecular basis of hypertension. We, therefore, performed proteomic analysis of plasma from patients with hypertension to identify molecular determinants detectable in these subjects but not in controls and vice versa. Plasma samples from hypertensive subjects (cases; n=118) and controls (n=85) from the InGenious HyperCare cohort were used for this study and analyzed by mass spectrometry. Using biostatistical methods, plasma peptides specific for hypertension were identified, and a model was developed using least absolute shrinkage and selection operator logistic regression. The underlying peptides were identified and sequenced off-line using matrix-assisted laser desorption ionization orbitrap mass spectrometry. By comparison of the molecular composition of the plasma samples, 27 molecular determinants were identified as differentially expressed between cases and controls. Seventy percent of the molecular determinants selected were found to be less likely to occur in hypertensive patients. In cross-validation, the overall R² was 0.434, and the area under the curve was 0.891 with 95% confidence interval 0.8482 to 0.9349, P<0.0001. The mean values of the cross-validated proteomic score of normotensive and hypertensive patients were found to be -2.007±0.3568 and 3.383±0.2643, respectively, P<0.0001. The molecular determinants were successfully identified, and the proteomic model developed shows an excellent discriminatory ability between hypertensives and normotensives. The identified molecular determinants may be the starting point for further studies to clarify the molecular causes of hypertension.
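    The area under the curve reported here summarises how well a score separates cases from controls. One way to read it, sketched below, is as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as half. The function name and the toy scores are assumptions for illustration; they are not the study's proteomic scores.

```python
def roc_auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts, over all (case, control) pairs, how often the case scores
    higher; ties contribute one half. Runs in O(n * m) time.
    """
    if not case_scores or not control_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy scores: one case (0.4) falls below one control (0.5).
toy_auc = roc_auc([0.9, 0.4], [0.5, 0.1])
```

    An AUC of 0.5 corresponds to a score that carries no ranking information, and 1.0 to perfect separation of the two groups.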

    RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning

    Anonymized electronic medical records are an increasingly popular source of research data. However, these datasets often lack race and ethnicity information. This creates problems for researchers modeling human disease, as race and ethnicity are powerful confounders for many health exposures and treatment outcomes; race and ethnicity are closely linked to population-specific genetic variation. We showed that deep neural networks generate more accurate estimates for missing racial and ethnic information than competing methods (e.g., logistic regression, random forest). RIDDLE yielded significantly better classification performance across all metrics that were considered: accuracy, cross-entropy loss (error), and area under the curve for receiver operating characteristic plots (all $p < 10^{-6}$). We made specific efforts to interpret the trained neural network models to identify, quantify, and visualize medical features which are predictive of race and ethnicity. We used these characterizations of informative features to perform a systematic comparison of differential disease patterns by race and ethnicity. The fact that clinical histories are informative for imputing race and ethnicity could reflect (1) a skewed distribution of blue- and white-collar professions across racial and ethnic groups, (2) uneven accessibility and subjective importance of prophylactic health, (3) possible variation in lifestyle, such as dietary habits, and (4) differences in background genetic variation which predispose to diseases.