14 research outputs found
Prediction of Incident Diabetes in the Jackson Heart Study Using High-Dimensional Machine Learning
<div><p>Statistical models to predict incident diabetes are often based on limited variables. Here we pursued two main goals: 1) investigate the relative performance of a machine learning method such as Random Forests (RF) for detecting incident diabetes in a high-dimensional setting defined by a large set of observational data, and 2) uncover potential predictors of diabetes. The Jackson Heart Study collected data at baseline and in two follow-up visits from 5,301 African Americans. We excluded those with baseline diabetes and no follow-up, leaving 3,633 individuals for analyses. Over a mean 8-year follow-up, 584 participants developed diabetes. The full RF model evaluated 93 variables including demographic, anthropometric, blood biomarker, medical history, and echocardiogram data. We also used RF metrics of variable importance to rank variables according to their contribution to diabetes prediction. We implemented other models based on logistic regression and RF where features were preselected. The RF full model performance was similar (AUC = 0.82) to that of the more parsimonious models. The top-ranked variables according to RF included hemoglobin A1C, fasting plasma glucose, waist circumference, adiponectin, C-reactive protein, triglycerides, leptin, left ventricular mass, high-density lipoprotein cholesterol, and aldosterone. This work shows the potential of RF for incident diabetes prediction while dealing with high-dimensional data.</p></div>
Top 15 Variables Found in Random Forest Analyses, according to the Gini Index (N = 1000).
The dependence of classification accuracy on sample size is presented.
Prediction performance of the five models when using sample size 1000 (500 participants per group).
<p>The values in each cell correspond to mean and standard deviation across the 100 computations.</p>
Baseline Characteristics by Incident Diabetes Mellitus Status in Prediction of Incident Diabetes in the Jackson Heart Study Cohort using Random Forests.
Studies investigating prediction of diabetes using machine learning methods.
Additional file 2: of Whole genome sequence analysis of serum amino acid levels
Supplemental methods and Figures S1–S2. (PDF 215 kb)
Additional file 1: Tables S1–S13. of Whole genome sequence analysis of serum amino acid levels
(XLSX 195 kb)
Additional file 2: Table S3. of Genome-wide association study of coronary artery calcified atherosclerotic plaque in African Americans with type 2 diabetes
Replication of AA-DHS admixture mapping results in JHS. (XLSX 11 kb)
Additional file 3: Table S1. of Genome-wide association study of coronary artery calcified atherosclerotic plaque in African Americans with type 2 diabetes
List of SNPs with meta-analysis p-value less than 10⁻⁴. (XLSX 190 kb)