37 research outputs found

    Mining Gene Expression Data of Multiple Sclerosis

    <div><p>Objectives</p><p>Microarrays produce large amounts of gene expression data with various biological implications. The challenge is to detect a panel of discriminative genes associated with disease. This study proposed a robust classification model for gene selection using gene expression data, and performed an analysis to identify disease-related genes using multiple sclerosis as an example.</p><p>Materials and methods</p><p>Gene expression profiles based on the transcriptome of peripheral blood mononuclear cells were analyzed for a total of 44 samples: 26 from multiple sclerosis patients and 18 from individuals with other neurological diseases (control). Three feature selection algorithms, Support Vector Machine based on Recursive Feature Elimination, Receiver Operating Characteristic Curve, and Boruta, were jointly applied to select candidate genes associated with multiple sclerosis. Multiple classification models categorized samples into two groups based on the identified genes. Model performance was evaluated using cross-validation methods, and an optimal classifier for gene selection was determined.</p><p>Results</p><p>An overlapping feature set of 8 genes that were differentially expressed between the two phenotype groups was identified. The genes were significantly associated with the pathways of apoptosis and cytokine-cytokine receptor interaction. TNFSF10 was significantly associated with multiple sclerosis. A Support Vector Machine model was established based on the featured genes and achieved an accuracy of ∼86%. This binary classification model also outperformed the other models in terms of Sensitivity, Specificity and F1 score.</p><p>Conclusions</p><p>The combined analytical framework integrating feature ranking algorithms and a Support Vector Machine model could be used for selecting genes for other diseases.</p></div>

    Receiver operating characteristic (ROC) curves for evaluating identified features.

    <p>AUC (area under the curve) and pAUC (partial area under the curve) indicators were computed to assess the performance for each feature.</p>
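As a rough illustration of the two indicators named in this caption, the sketch below computes the AUC and a pAUC for a single feature with the trapezoidal rule. It assumes that higher expression values point to the case group; the expression values, the labels, and the `max_fpr=0.2` cutoff are hypothetical, since the listing does not state the false-positive range used for the pAUC.

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR) ordered by increasing FPR.

    labels: 1 = case, 0 = control. Assumes higher scores indicate a case.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        j = i
        # Samples with tied scores cross the threshold together.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def partial_auc(points, max_fpr=1.0):
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Cut the last segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical log-expression values for one probe set (not from the study).
scores = [2.9, 2.5, 2.4, 1.8, 1.7, 1.2]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
auc = partial_auc(points)                 # full AUC
pauc = partial_auc(points, max_fpr=0.2)   # hypothetical pAUC cutoff
print(f"AUC={auc:.3f}  pAUC(FPR<=0.2)={pauc:.3f}")
```

Ranking every probe set by such a score gives one of the feature rankings; a feature that is lower in cases would need its sign flipped first.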

    Evaluation of multiple classification models including Support Vector Machine (SVM), Random Forest (RF), naïve Bayes (Bayes), Neural Network (NNT), K-Nearest Neighbor (KNN) and Logistic regression models via 10-fold cross-validation (10FCV).

    <p>Evaluation of multiple classification models including Support Vector Machine (SVM), Random Forest (RF), naïve Bayes (Bayes), Neural Network (NNT), K-Nearest Neighbor (KNN) and Logistic regression models via 10-fold cross-validation (10FCV).</p>
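The 10-fold cross-validation procedure can be sketched as follows. This is a minimal standard-library illustration, not the authors' pipeline: a nearest-centroid classifier stands in for the models listed in the caption, and the data are synthetic (26 cases and 18 controls with 8 features, mirroring only the sample and feature counts reported in the abstract). All function names are hypothetical.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def fit_centroids(X, y):
    """Stand-in classifier: store the per-class mean of each feature."""
    centroids = {}
    for cls in set(y):
        rows = [x for x, label in zip(X, y) if label == cls]
        centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean)."""
    def sq_dist(cls):
        return sum((a - b) ** 2 for a, b in zip(x, centroids[cls]))
    return min(centroids, key=sq_dist)


def cross_validate(X, y, k=10, seed=0):
    """Pool held-out predictions over k folds and summarise them."""
    tp = fp = tn = fn = 0
    for fold in stratified_folds(y, k, seed):
        held_out = set(fold)
        train_X = [x for i, x in enumerate(X) if i not in held_out]
        train_y = [l for i, l in enumerate(y) if i not in held_out]
        model = fit_centroids(train_X, train_y)
        for i in fold:
            pred = predict(model, X[i])
            if pred == 1 and y[i] == 1:
                tp += 1
            elif pred == 1 and y[i] == 0:
                fp += 1
            elif pred == 0 and y[i] == 0:
                tn += 1
            else:
                fn += 1
    f1_denom = 2 * tp + fp + fn
    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": 2 * tp / f1_denom if f1_denom else 0.0,
    }


# Synthetic data: 26 cases (label 1) and 18 controls (label 0), 8 features.
rng = random.Random(42)
X = [[rng.gauss(1.0, 1.0) for _ in range(8)] for _ in range(26)]
X += [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(18)]
y = [1] * 26 + [0] * 18

metrics = cross_validate(X, y)
print(metrics)
```

Replacing `fit_centroids` and `predict` with another model while keeping the same folds makes the resulting Sensitivity, Specificity and F1 values directly comparable across classifiers.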

    Scatter plot of expression values of eight features.

    <p>Each panel in the above plot corresponds to one probe set. The y-axis represents the logarithmic expression intensity of each probe set, and the x-axis represents the samples. The red and blue colors respectively represent the multiple sclerosis and control groups.</p>

    KEGG enrichment analysis using the GATHER database.

    <p>Genes (With Ann): the number of genes from the list with the annotation. Genes (No Ann): the number of genes from the list without the annotation. Genome (With Ann): the number of genes in the genome (excluding those in the list) with the annotation. Genome (No Ann): the number of genes in the genome (excluding those in the list) without the annotation. Genes: the symbols of the genes that have the annotation.</p>
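The four counts defined in this caption form a 2×2 contingency table. One common way to test such a table for over-representation is a one-sided Fisher's exact test, sketched below. This is an assumption for illustration and not necessarily the statistic that GATHER itself reports; the counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(list_with, list_without, genome_with, genome_without):
    """One-sided Fisher's exact test for over-representation.

    Returns P(X >= list_with) when the gene list is drawn at random,
    without replacement, from the whole genome (hypergeometric model).
    """
    n_list = list_with + list_without           # size of the gene list
    n_annotated = list_with + genome_with       # annotated genes overall
    n_total = n_list + genome_with + genome_without
    upper = min(n_list, n_annotated)
    favourable = sum(
        comb(n_annotated, k) * comb(n_total - n_annotated, n_list - k)
        for k in range(list_with, upper + 1)
    )
    return favourable / comb(n_total, n_list)


# Hypothetical counts: 3 of 8 listed genes carry the annotation,
# against 80 annotated genes among 19,992 others in the genome.
p = enrichment_p_value(list_with=3, list_without=5,
                       genome_with=80, genome_without=19912)
print(f"p = {p:.3g}")
```

A small p-value indicates that the annotation appears in the list more often than random sampling from the genome would explain.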

    Overlapping features based on the ranked feature sets generated by three algorithms.

    <p>Model 1: Support Vector Machine based on Recursive Feature Elimination (SVM-RFE) algorithm; Model 2: Receiver Operating Characteristic (ROC) Curve algorithm; Model 3: Boruta algorithm. In this procedure, an overlapping set, including 8 features, was identified and used for gene matching.</p>
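Taking the overlap of the three ranked feature sets amounts to a set intersection over the top-ranked entries of each list. The sketch below assumes a common `top_n` cutoff, which is hypothetical because the listing does not say how each ranking was truncated; the probe identifiers are placeholders.

```python
def overlapping_features(ranked_lists, top_n):
    """Return features in the top_n of every ranking, in the first list's order."""
    top_sets = [set(ranking[:top_n]) for ranking in ranked_lists]
    common = set.intersection(*top_sets)
    return [feature for feature in ranked_lists[0][:top_n] if feature in common]


# Placeholder probe identifiers, ranked best-first by each method.
svm_rfe = ["p3", "p1", "p7", "p2", "p9"]
roc = ["p1", "p3", "p2", "p8", "p7"]
boruta = ["p2", "p3", "p1", "p5", "p6"]

overlap = overlapping_features([svm_rfe, roc, boruta], top_n=4)
print(overlap)  # ['p3', 'p1', 'p2']
```

Keeping only features that all three methods agree on trades recall for robustness: a probe set favoured by a single algorithm is dropped.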

    Annotations of gene symbol and full gene name for each selected probe set and the differential expression analysis using moderated <i>t</i>-test.

    <p>Annotations of gene symbol and full gene name for each selected probe set and the differential expression analysis using moderated <i>t</i>-test.</p>

    Flow chart of data analysis.

    <p>The four major steps of this study: data preprocessing, feature selection, model building, and performance validation.</p>