22 research outputs found

    Type 2 Diabetes Biomarkers of Human Gut Microbiota Selected via Iterative Sure Independent Screening Method

    Type 2 diabetes, a complex metabolic disease influenced by genetic and environmental factors, has become a worldwide problem. Previously published results focused on genetic components through genome-wide association studies, which explain the disease only to some extent. Recently, two research groups published metagenome-wide association study (MGWAS) results that identified meta-biomarkers related to type 2 diabetes. However, a key problem in analyzing genomic data is how to deal with the ultra-high dimensionality of the features. From a statistical viewpoint, it is challenging to filter the true factors out of high-dimensional data. Various methods and techniques have been proposed for this issue, but they achieve only limited prediction performance and poor interpretability. A new statistical procedure with higher performance and clear interpretability is therefore appealing for analyzing high-dimensional data. To address this problem, we apply a statistical variable selection procedure called iterative sure independence screening to gene profiles obtained from metagenome sequencing. In the Chinese and European cohorts, 48 and 24 meta-markers, respectively, were selected as predictors, with an AUC (area under the curve) of 0.97 and 0.99, a better performance than other model selection methods. These results demonstrate the power and utility of data mining technologies for identifying diagnostic and predictive markers in large-scale, ultra-high-dimensional genomic datasets.

    Results of simulated example I: accuracy of ISIS in including the true model {X₁, X₂, X₃}.

    Accuracy of ISIS under different settings of correlation ρ and dimensionality p with a nonlinear relationship. For each model, 100 data sets of 50 observations each were simulated, and 20 variables were selected for computing the accuracy.
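The accuracy described in this caption (how often the selected variables include the true model) could be estimated along the following lines. This is only a minimal sketch under stated assumptions: it uses plain marginal-correlation screening as a stand-in rather than ISIS itself, a linear response rather than the nonlinear one in the caption, and independent features. The function names, the default p = 200, and the signal strength are hypothetical and do not come from the source.

```python
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def screen(columns, y, d):
    """Indices of the d columns with the largest |correlation| with y."""
    ranked = sorted(range(len(columns)),
                    key=lambda j: -abs(pearson(columns[j], y)))
    return set(ranked[:d])


def inclusion_rate(n_datasets=100, n=50, p=200, d=20,
                   true_idx=(0, 1, 2), signal=5.0, seed=0):
    """Fraction of simulated data sets whose top-d set contains true_idx.

    Assumed setup (not from the source): independent standard normal
    features and a linear response with unit-variance noise.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_datasets):
        columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
        y = [sum(signal * columns[j][i] for j in true_idx) + rng.gauss(0, 1)
             for i in range(n)]
        if set(true_idx) <= screen(columns, y, d):
            hits += 1
    return hits / n_datasets
```

Calling `inclusion_rate()` with its defaults mirrors the 100 data sets, 50 observations, and 20 selected variables mentioned in the caption; varying `p` would give one row of such an accuracy table under these simplified assumptions.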

    Data.

    Chinese and European gut microbiota datasets of type 2 diabetes (T2D) used in our work. ‘sd’ means standard deviation; BMI means body mass index.

    AUC.

    AUC of an SVM classifier trained as a function of signature size, for mRMR, an ensemble of lasso, and an ensemble of elastic net, in a 10-fold cross-validation setting on the Chinese and European datasets.

    AUC obtained by ISIS-SCAD (Chinese).

    AUC for signature sizes in {10, 15, 18, 23, 26, 28, 34, 41, 43, 48, 50, 61, 63}, combined with four classification algorithms in a 10-fold cross-validation. For each classification method, the best result is highlighted.
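For readers unfamiliar with the metric reported in these tables, the AUC can be computed directly from class labels and classifier scores. The following is a minimal sketch, assuming binary labels coded as 0 and 1; the function name `auc` is hypothetical, and the source does not say how the authors computed their values.

```python
def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison.

    Equals the probability that a randomly chosen positive example
    scores higher than a randomly chosen negative one; ties count 0.5.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In a 10-fold cross-validation, one plausible use is to apply such a function to the held-out scores of each fold and average the ten values.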

    Averaged AUC obtained from SVM classifier combined with three variable selection methods.

    SVM classifier estimated as a function of sample size in a 50 × 10-fold cross-validation setting. We show the accuracy of the 60-gene signature from ensemble feature selection and the 48-gene signature from ISIS-SCAD on the Chinese dataset. For the European dataset, the accuracy of ensemble feature selection is computed on 60 genes and that of ISIS-SCAD on 24 genes.

    AUC obtained from SVM classifier estimated on genes selected by ISIS-SCAD and ensemble feature selection.

    Signature sizes in {10, 15, 18, 23, 26, 28, 34, 41, 43, 48, 50, 61, 63} on the Chinese dataset and in {4, 11, 15, 22, 24, 26, 27, 28, 29, 32, 34, 35, 36} on the European dataset, in a 10-fold cross-validation setting.

    AUC obtained by ISIS-SCAD (European).

    AUC for signature sizes in {4, 11, 15, 22, 24, 26, 27, 28, 29, 32, 34, 35, 36}, combined with four classification algorithms in a 10-fold cross-validation. For each classification method, the best result is highlighted.

    Results of simulated example II: accuracy of ISIS in including the true model {X₁, X₂, X₃, X₄}.

    Accuracy of ISIS under different settings of correlation ρ and dimensionality p in the joint-contribution scenario. 100 data sets of 50 observations each were simulated, and 20 variables were selected for computing the accuracy.

    Timed phylogeny for 6,335 SNPs in 260 genomes from the seventh pandemic.

    The vertical order is the same as in S1A Table (http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1005072#pgen.1005072.s006). Branches are colored according to inferred location as shown in the legend at the lower left, except for branches whose location was uncertain, which are shown in gray. Isolates from China are subdivided into isolates from Xinjiang (black dot), inland provinces (red dot) and coastal provinces (no dot). Selected clades of multiple, closely related isolates are indicated by gray boxes to the left of the clade designations (1.A, 1.B, etc.). Inset: Maximum likelihood tree of the same data, with significantly longer branches according to S3 Fig (http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1005072#pgen.1005072.s003) indicated in red.