Type 2 Diabetes Biomarkers of Human Gut Microbiota Selected via Iterative Sure Independent Screening Method
<div><p>Type 2 diabetes, a complex metabolic disease influenced by genetic and environmental factors, has become a worldwide problem. Previously published results focused on genetic components through genome-wide association studies, which explain the disease only to some extent. Recently, two research groups published metagenome-wide association study (MGWAS) results that identified meta-biomarkers related to type 2 diabetes. However, one key problem in analyzing genomic data is how to deal with the ultra-high dimensionality of the features. From a statistical viewpoint, it is challenging to filter the true factors out of high-dimensional data. Various methods and techniques have been proposed for this issue, but they achieve only limited prediction performance and poor interpretability. A new statistical procedure with higher performance and clear interpretability is therefore appealing for analyzing high-dimensional data. To address this problem, we apply a statistical variable selection procedure called iterative sure independence screening to gene profiles obtained from metagenome sequencing. In the Chinese and European cohorts, 48 and 24 meta-markers, respectively, were selected as predictors, with AUC (area under the curve) values of 0.97 and 0.99, a better performance than other model selection methods. These results demonstrate the power and utility of data mining technologies for identifying diagnostic and predictive markers in large-scale, ultra-high-dimensional genomic datasets.</p></div>
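To make the screening idea in the abstract concrete, the following is a minimal sketch of a single sure independence screening step: rank every feature by the absolute value of its marginal Pearson correlation with the response and keep the top d. It is not the authors' ISIS-SCAD pipeline, which iterates screening with a penalized fit. The function names `pearson` and `sis_screen`, the toy data, and the choice d = 20 are assumptions made only for illustration.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no marginal signal
    return sxy / math.sqrt(sxx * syy)


def sis_screen(columns, y, d):
    """Indices of the d features with the largest |marginal correlation| with y."""
    scores = [abs(pearson(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return ranked[:d]


# Toy data (hypothetical): 50 samples, 200 features, only features 0 and 1 matter.
rng = random.Random(0)
n, p = 50, 200
columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [5 * columns[0][i] + 5 * columns[1][i] + rng.gauss(0, 1) for i in range(n)]

selected = sis_screen(columns, y, d=20)
print(sorted(selected))
```

Screening alone only reduces the dimension. A subsequent penalized model on the retained features, and further iterations on the residuals, would be needed to approximate the iterative procedure named in the abstract.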
Averaged AUC obtained from SVM classifier combined with three variable selection methods.
<p>SVM classifier performance estimated as a function of sample size in a 50 × 10-fold cross-validation setting. On the Chinese dataset, we show the accuracy of the 60-gene signature from ensemble feature selection and the 48-gene signature from ISIS-SCAD. On the European dataset, the accuracy of ensemble feature selection is computed on 60 genes and that of ISIS-SCAD on 24 genes.</p>
Results of simulated example II: accuracy of ISIS in including the true model {<i>X</i><sub>1</sub>,<i>X</i><sub>2</sub>,<i>X</i><sub>3</sub>,<i>X</i><sub>4</sub>}.
<p>Accuracy of ISIS for different settings of correlation <i>ρ</i> and dimensionality <i>p</i> under a joint-contribution scenario. 100 data sets of 50 observations each were simulated, and 20 variables were selected to compute the accuracy.</p>
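One plausible reading of "accuracy in including the true model" is the fraction of simulated data sets in which the selected variables contain every true variable. The sketch below computes that fraction under this assumption. The helper name `inclusion_rate` and the small example selections are hypothetical and are not taken from the paper.

```python
def inclusion_rate(selected_sets, true_model):
    """Fraction of replications whose selected variables contain the whole true model."""
    truth = set(true_model)
    hits = sum(1 for selected in selected_sets if truth <= set(selected))
    return hits / len(selected_sets)


# Hypothetical selections from three replications; the true model is {0, 1, 2, 3}.
replications = [
    [0, 1, 2, 3, 7],  # contains the true model
    [0, 1, 2, 9],     # misses variable 3
    [3, 2, 1, 0],     # contains the true model
]
rate = inclusion_rate(replications, true_model=[0, 1, 2, 3])
print(rate)
```

In the setting described in the caption, `selected_sets` would hold 100 selections of 20 variables each, one per simulated data set.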
Data.
<p>Chinese and European gut microbiota datasets of type 2 diabetes (T2D) used in our work. ‘sd’ means standard deviation; BMI means body mass index.</p>
Results of simulated example I: accuracy of ISIS in including the true model {<i>X</i><sub>1</sub>,<i>X</i><sub>2</sub>,<i>X</i><sub>3</sub>}.
<p>Accuracy of ISIS for different settings of correlation <i>ρ</i> and dimensionality <i>p</i> under a nonlinear relationship. For each model, 100 data sets of 50 observations each were simulated, and 20 variables were selected to compute the accuracy.</p>
AUC.
<p>SVM classifier trained as a function of signature size for mRMR, ensemble of lasso, and ensemble of elastic net, in a 10-fold cross-validation setting on the Chinese and European datasets, respectively.</p>
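For readers unfamiliar with the metric, AUC equals the probability that a randomly chosen positive sample receives a higher classifier score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it by direct pair counting and shows one simple way to split sample indices into 10 folds. The helper names `auc` and `kfold_indices` and the example scores are assumptions for illustration. No SVM is trained here, and the paper's exact cross-validation procedure is not specified in this listing.

```python
def auc(scores, labels):
    """AUC by pair counting: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def kfold_indices(n, k):
    """Split indices 0..n-1 into k folds by striding (unshuffled, not stratified)."""
    return [list(range(i, n, k)) for i in range(k)]


# Hypothetical classifier scores and true labels (1 = case, 0 = control).
scores = [0.9, 0.4, 0.35, 0.6, 0.5]
labels = [1, 1, 0, 1, 0]
value = auc(scores, labels)  # 5 of the 6 positive/negative pairs are ordered correctly
folds = kfold_indices(23, 10)
print(value, [len(f) for f in folds])
```

In a cross-validated estimate, each fold in turn would serve as the test set, the classifier would be fitted on the remaining folds, and the per-fold AUC values would be averaged.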
AUC obtained by ISIS-SCAD (Chinese).
<p>AUC for signature sizes in {10, 15, 18, 23, 26, 28, 34, 41, 43, 48, 50, 61, 63}, combined with four classification algorithms in 10-fold cross-validation. For each classification method, the best result is highlighted.</p>
AUC obtained from SVM classifier estimated on genes selected by ISIS-SCAD and ensemble feature selection.
<p>Signature sizes in {10, 15, 18, 23, 26, 28, 34, 41, 43, 48, 50, 61, 63} on the Chinese dataset and in {4, 11, 15, 22, 24, 26, 27, 28, 29, 32, 34, 35, 36} on the European dataset, in a 10-fold cross-validation setting.</p>
AUC obtained by ISIS-SCAD (European).
<p>AUC for signature sizes in {4, 11, 15, 22, 24, 26, 27, 28, 29, 32, 34, 35, 36}, combined with four classification algorithms in 10-fold cross-validation. For each classification method, the best result is highlighted.</p>