Search CORE

14 research outputs found

Application of Random Forests Methods to Diabetic Retinopathy Classification Analyses

Author: Craig M. Greven (579782)
Emily Y. Chew (102712)
Ramon Casanova (128250)
Ronald P. Danis (502458)
Santiago Saldana (579781)
Walter T. Ambrosius (579783)
Publication date: 01/01/2014

Background: Diabetic retinopathy (DR) is one of the leading causes of blindness in the United States and worldwide. DR is a silent disease that may go unnoticed until it is too late for effective treatment. Therefore, early detection could improve the chances of therapeutic interventions that would alleviate its effects. Methodology: Graded fundus photography and systemic data from 3443 ACCORD-Eye Study participants were used to estimate Random Forest (RF) and logistic regression classifiers. We studied the impact of sample size on classifier performance and the possibility of using RF-generated class conditional probabilities as metrics describing DR risk. RF measures of variable importance are used to detect factors that affect classification performance. Principal Findings: Both types of data were informative when discriminating participants with or without DR. RF-based models produced much higher classification accuracy than those based on logistic regression. Combining both types of data did not increase accuracy but did increase statistical discrimination of healthy participants who subsequently did or did not have DR events during four years of follow-up. RF variable importance criteria revealed that microaneurysm counts in both eyes seemed to play the most important role in discrimination among the graded fundus variables, while the number of medicines and diabetes duration were the most relevant among the systemic variables. Conclusions and Significance: We have introduced RF methods to DR classification analyses based on fundus photography data. In addition, we propose an approach to DR risk assessment based on metrics derived from graded fundus photography and systemic data. Our results suggest that RF methods could be a valuable tool to diagnose DR and evaluate its progression.

Directory of Open Access Journals

PubMed Central

FigShare

Most relevant variables according to RF permutation index criterion for each type of data.

Author: Craig M. Greven (579782)
Emily Y. Chew (102712)
Ramon Casanova (128250)
Ronald P. Danis (502458)
Santiago Saldana (579781)
Walter T. Ambrosius (579783)

The permutation index reflects the decrease in classification performance when the values of a given variable are randomly permuted. Abnormalities refer to the presence of different lesions detected by reviewers (e.g. drusen, age-related macular degeneration features, etc.; see Table S1 at http://www.plosone.org/article/info:doi/10.1371/journal.pone.0098587#pone.0098587.s001). ACCORD arm randomization refers to membership in one of the eight arms of the ACCORD trial.
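The permutation idea described in this caption can be illustrated with a minimal, model-agnostic sketch. It is not the authors' procedure: RF implementations typically compute the index on out-of-bag samples per tree, whereas the version below shuffles one column of a fixed data set and measures the mean drop in accuracy. The toy data, the stand-in threshold classifier, and all function names are hypothetical.

```python
import random

def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return hits / len(rows)

def permutation_importance(predict, rows, labels, column, n_repeats=20, seed=0):
    """Mean drop in accuracy after randomly shuffling one column."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[column] for row in rows]
        rng.shuffle(shuffled)
        # Rebuild each row with the shuffled value in the chosen column.
        permuted = [row[:column] + [v] + row[column + 1:]
                    for row, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / n_repeats

# Hypothetical toy data: column 0 determines the label, column 1 is noise.
data_rng = random.Random(1)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [int(row[0] > 0.5) for row in rows]

def predict(row):
    # Stand-in classifier (not a Random Forest): thresholds column 0 only.
    return int(row[0] > 0.5)

informative = permutation_importance(predict, rows, labels, column=0)
noise = permutation_importance(predict, rows, labels, column=1)
print(f"column 0 importance: {informative:.3f}")
print(f"column 1 importance: {noise:.3f}")
```

In this toy setting the noise column scores exactly zero because the stand-in classifier never reads it, while shuffling the informative column degrades accuracy substantially.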

FigShare

Estimates of RF classification accuracy obtained using the OOB mechanism and two-fold CV.

Author: Craig M. Greven (579782)
Emily Y. Chew (102712)
Ramon Casanova (128250)
Ronald P. Danis (502458)
Santiago Saldana (579781)
Walter T. Ambrosius (579783)

RF models were estimated using all the available variables.

FigShare

Performance across sample sizes of both RF (right panel) and LR is shown for three different scenarios: 1) only eye data; 2) all variables in the study; and 3) only systemic data.

Author: Craig M. Greven (579782)
Emily Y. Chew (102712)
Ramon Casanova (128250)
Ronald P. Danis (502458)
Santiago Saldana (579781)
Walter T. Ambrosius (579783)

The addition of systemic variables did not lead to significant increases in classification accuracy.

FigShare

The RF probabilities of having DR were estimated for two groups of participants who were not diagnosed with DR at baseline: a) those who had a DR event (≥3-step ETDRS progression, vitrectomy, or laser photocoagulation) during follow-up and b) those who did not.

Author: Craig M. Greven (579782)
Emily Y. Chew (102712)
Ramon Casanova (128250)
Ronald P. Danis (502458)
Santiago Saldana (579781)
Walter T. Ambrosius (579783)

*Wilcoxon rank sum test; std: standard deviation. Estimation was made using baseline data.
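As a rough illustration of the rank sum statistic named in the footnote, the sketch below pools two groups of values, assigns average ranks to ties, and returns the rank sum W for the first group together with the equivalent Mann-Whitney U. The probability values are invented for illustration and are not taken from the study; computing a p-value would additionally require the null distribution or a normal approximation, which is omitted here.

```python
def rank_sum_statistic(group_a, group_b):
    """Wilcoxon rank sum W for group_a and the equivalent Mann-Whitney U."""
    pooled = sorted([(v, 0) for v in group_a] + [(v, 1) for v in group_b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of the 1-based ranks they span.
        average_rank = (i + 1 + j) / 2
        for k in range(i, j):
            ranks[k] = average_rank
        i = j
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n_a = len(group_a)
    u = w - n_a * (n_a + 1) / 2
    return w, u

# Hypothetical predicted DR probabilities (not data from the study).
with_event = [0.62, 0.71, 0.55, 0.80]
without_event = [0.30, 0.45, 0.55, 0.20]

w, u = rank_sum_statistic(with_event, without_event)
print(w, u)  # 25.5 15.5
```

Here U can be at most 4 × 4 = 16, so a value of 15.5 means nearly every participant in the first hypothetical group has a higher probability than every participant in the second.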

FigShare

Diagnosis after four years of follow-up for subjects without DR at baseline, and eye events for each subgroup.

Author: Craig M. Greven (579782)
Emily Y. Chew (102712)
Ramon Casanova (128250)
Ronald P. Danis (502458)
Santiago Saldana (579781)
Walter T. Ambrosius (579783)

DR events represent changes of ≥3 steps on the ETDRS scale during follow-up.

FigShare

RF and LR performance using all available eye variables and a subset of eye variables selected by an expert as more clinically relevant.

Author: Craig M. Greven (579782)
Emily Y. Chew (102712)
Ramon Casanova (128250)
Ronald P. Danis (502458)
Santiago Saldana (579781)
Walter T. Ambrosius (579783)

While this selection led to some improvements for LR, it had very little impact on RF performance.

FigShare

Baseline stratification of subjects across DR severity groups and numbers of eye events per group are provided.

Author: Craig M. Greven (579782)
Emily Y. Chew (102712)
Ramon Casanova (128250)
Ronald P. Danis (502458)
Santiago Saldana (579781)
Walter T. Ambrosius (579783)

DR events represent changes of ≥3 steps on the ETDRS scale during follow-up.

FigShare

Prediction of Incident Diabetes in the Jackson Heart Study Using High-Dimensional Machine Learning

Author: Alain G. Bertoni (820227)
Angela R. Subauste (3211311)
Chad Blackshear (3211314)
Lynne Wagenknecht (610797)
Mary E. Lacy (3211308)
Ramon Casanova (128250)
Santiago Saldana (579781)
Sean L. Simpson (219789)
Publication date: 11/10/2016

Statistical models to predict incident diabetes are often based on limited variables. Here we pursued two main goals: 1) investigate the relative performance of a machine learning method such as Random Forests (RF) for detecting incident diabetes in a high-dimensional setting defined by a large set of observational data, and 2) uncover potential predictors of diabetes. The Jackson Heart Study collected data at baseline and in two follow-up visits from 5,301 African Americans. We excluded those with baseline diabetes and no follow-up, leaving 3,633 individuals for analyses. Over a mean 8-year follow-up, 584 participants developed diabetes. The full RF model evaluated 93 variables including demographic, anthropometric, blood biomarker, medical history, and echocardiogram data. We also used RF metrics of variable importance to rank variables according to their contribution to diabetes prediction. We implemented other models based on logistic regression and RF where features were preselected. The RF full model performance was similar (AUC = 0.82) to those more parsimonious models. The top-ranked variables according to RF included hemoglobin A1C, fasting plasma glucose, waist circumference, adiponectin, C-reactive protein, triglycerides, leptin, left ventricular mass, high-density lipoprotein cholesterol, and aldosterone. This work shows the potential of RF for incident diabetes prediction while dealing with high-dimensional data.

Directory of Open Access Journals

PubMed Central

FigShare

Top 15 Variables Found in Random Forest Analyses, according to the Gini Index (N = 1000).

Author: Alain G. Bertoni (820227)
Angela R. Subauste (3211311)
Chad Blackshear (3211314)
Lynne Wagenknecht (610797)
Mary E. Lacy (3211308)
Ramon Casanova (128250)
Santiago Saldana (579781)
Sean L. Simpson (219789)


FigShare