7,309 research outputs found

    Multi-resolution independent component analysis for high-performance tumor classification and biomarker discovery

    Background: Although high-throughput microarray-based molecular diagnostic technologies show great promise in cancer diagnosis, they are still far from clinical application because of their low and unstable sensitivities and specificities in cancer molecular pattern recognition. High-dimensional, heterogeneous tumor profiles challenge current machine learning methodologies because they combine a small number of samples with a large, sometimes huge, number of variables (genes). This calls for effective feature selection in microarray data classification. Methods: We propose a novel feature selection method, multi-resolution independent component analysis (MICA), for large-scale gene expression data. This method overcomes the weak points of widely used transform-based feature selection methods such as principal component analysis (PCA), independent component analysis (ICA), and nonnegative matrix factorization (NMF) by avoiding their global feature-selection mechanism. In addition to demonstrating the effectiveness of MICA in meaningful biomarker discovery, we present MICA-based support vector machines (MICA-SVM) and linear discriminant analysis (MICA-LDA) to attain high-performance classification in low-dimensional spaces. Results: We demonstrate the superiority and stability of our algorithms through comprehensive experimental comparisons with nine state-of-the-art algorithms on six high-dimensional heterogeneous profiles under cross-validation. Our classification algorithms, especially MICA-SVM, not only achieve clinical or near-clinical sensitivities and specificities, but also show stronger performance stability than their peers. Software that implements the major algorithm and the data sets on which this paper focuses are freely available at https://sites.google.com/site/heyaumapbc2011/. Conclusions: This work suggests a new direction for moving microarray technologies into clinical routine: building a high-performance classifier that attains clinical-level sensitivities and specificities by treating an input profile as a ‘profile-biomarker’. The suppression of redundant global features and the extraction of effective local features through multi-resolution data analysis also have a positive impact on large-scale ‘omics’ data mining.

    Impact of the SPOP Mutant Subtype on the Interpretation of Clinical Parameters in Prostate Cancer.

    Purpose: Molecular characterization of prostate cancer, including The Cancer Genome Atlas, has revealed distinct subtypes with underlying genomic alterations. One of these core subtypes, SPOP (speckle-type POZ protein) mutant prostate cancer, has previously only been identifiable via DNA sequencing, which has made the impact on prognosis and routinely used risk stratification parameters unclear. Methods: We have developed a novel gene expression signature, classifier (Subclass Predictor Based on Transcriptional Data), and decision tree to predict the SPOP mutant subclass from RNA gene expression data and classify common prostate cancer molecular subtypes. We then validated and further interrogated the association of prostate cancer molecular subtypes with pathologic and clinical outcomes in retrospective and prospective cohorts of 8,158 patients. Results: The subclass predictor based on transcriptional data model showed high sensitivity and specificity in multiple cohorts across both RNA sequencing and microarray gene expression platforms. We predicted approximately 8% to 9% of cases to be SPOP mutant from both retrospective and prospective cohorts. We found that the SPOP mutant subclass was associated with lower frequency of positive margins, extraprostatic extension, and seminal vesicle invasion at prostatectomy; however, SPOP mutant cancers were associated with higher pretreatment serum prostate-specific antigen (PSA). The association between SPOP mutant status and higher PSA level was validated in three independent cohorts. Despite high pretreatment PSA, the SPOP mutant subtype was associated with a favorable prognosis with improved metastasis-free survival, particularly in patients with high-risk preoperative PSA levels. 
Conclusion: Using a novel gene expression model and a decision tree algorithm to define prostate cancer molecular subclasses, we found that the SPOP mutant subclass is associated with higher preoperative PSA, fewer adverse pathologic features, and favorable prognosis. These findings suggest a paradigm in which the interpretation of common risk stratification parameters, particularly PSA, may be influenced by the underlying molecular subtype of prostate cancer.

    Histogram-based models on non-thin section chest CT predict invasiveness of primary lung adenocarcinoma subsolid nodules.

    A total of 109 pathologically proven subsolid nodules (SSNs) were segmented by two readers on non-thin-section chest CT with lung nodule analysis software, followed by extraction of the CT attenuation histogram and geometric features. Functional data analysis of the histograms provided data-driven features (FPC1, FPC2, FPC3) used in further model building. Nodules were classified as pre-invasive (P1: atypical adenomatous hyperplasia and adenocarcinoma in situ), minimally invasive (P2), and invasive adenocarcinomas (P3). P1 and P2 were grouped together (T1) versus P3 (T2). Various combinations of features were compared in predictive models for binary nodule classification (T1/T2), using multiple logistic regression and non-linear classifiers. Area under the ROC curve (AUC) was used as the diagnostic performance criterion. Inter-reader variability was assessed using Cohen's kappa and the intraclass correlation coefficient (ICC). Three models predicting invasiveness of SSNs were selected based on AUC. The first model included the 87.5th percentile of CT lesion attenuation (Q.875), interquartile range (IQR), volume, and maximum/minimum diameter ratio (AUC: 0.89, 95% CI: [0.75, 1]). The second model included FPC1, volume, and diameter ratio (AUC: 0.91, 95% CI: [0.77, 1]). The third model included FPC1, FPC2, and volume (AUC: 0.89, 95% CI: [0.73, 1]). Inter-reader agreement was excellent (kappa: 0.95, ICC: 0.98). Parsimonious models using histogram and geometric features differentiated invasive from minimally invasive/pre-invasive SSNs with good predictive performance on non-thin-section CT.
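The abstract above reports two statistics, AUC for model performance and Cohen's kappa for inter-reader agreement. The following is a minimal sketch of how each can be computed, assuming binary labels coded as 1 (T2, invasive) and 0 (T1) and two readers assigning one category per nodule. The function names and the toy data are illustrative only and are not taken from the study.

```python
from collections import Counter


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic.

    labels: 1 for positive cases, 0 for negative cases.
    Each (positive, negative) pair counts 1 if the positive case has the
    higher score, 0.5 on a tie, and 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cohen_kappa(reader_a, reader_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(reader_a)
    if n == 0 or n != len(reader_b):
        raise ValueError("ratings must be non-empty and of equal length")
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    count_a, count_b = Counter(reader_a), Counter(reader_b)
    # Agreement expected by chance from each reader's marginal frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Toy data (hypothetical, not from the paper).
toy_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
toy_kappa = cohen_kappa(["T1", "T1", "T2", "T2"], ["T1", "T2", "T2", "T2"])
```

The pairwise form of AUC is quadratic in the number of cases, which is fine at the scale of about a hundred nodules; a rank-based version would be preferable for large data sets.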

    PLS dimension reduction for classification of microarray data

    PLS dimension reduction is known to give good prediction accuracy in the context of classification with high-dimensional microarray data. In this paper, PLS is compared with some of the best state-of-the-art classification methods. In addition, a simple procedure to choose the number of components is suggested. The connection between PLS dimension reduction and gene selection is examined, and a property of the first PLS component for binary classification is proven. PLS can also be used as a visualization tool for high-dimensional data in the classification framework. The whole study is based on nine real microarray cancer data sets.
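To make the link between PLS and gene selection concrete, here is a minimal sketch of the first PLS component under the standard single-response (PLS1) definition, where the weight vector is proportional to X^T y after centering. Each weight is then proportional to the sample covariance between one gene and the class label. This does not reproduce the paper's proof or its procedure for choosing the number of components; the function name and toy data are assumptions made for illustration.

```python
import math


def first_pls_component(X, y):
    """Return (weights, scores) of the first PLS1 component.

    X: list of samples, each a list of p feature values (e.g. genes).
    y: list of responses, e.g. 0/1 class labels.
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Unnormalised weights: X^T y on centered data, i.e. one value per
    # feature proportional to its covariance with the response.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0.0:
        raise ValueError("no feature covaries with the response")
    w = [v / norm for v in w]
    # Scores: projection of each centered sample onto the weight vector.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    return w, t


# Toy data (hypothetical): 4 samples, 2 features, binary labels.
weights, scores = first_pls_component(
    [[1.0, 0.0], [2.0, 0.0], [3.0, 1.0], [4.0, 1.0]],
    [0, 0, 1, 1],
)
```

Ranking features by the absolute value of their weight gives a simple univariate gene ranking, and the scores give a one-dimensional projection that can be plotted against class.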

    Validation of Soft Classification Models using Partial Class Memberships: An Extended Concept of Sensitivity & Co. applied to the Grading of Astrocytoma Tissues

    We use partial class memberships in soft classification to model uncertain labelling and mixtures of classes. Partial class memberships are not restricted to predictions, but may also occur in reference labels (ground truth, gold standard diagnosis) for training and validation data. Classifier performance is usually expressed as fractions of the confusion matrix, such as sensitivity, specificity, and negative and positive predictive values. We extend this concept to soft classification and discuss the bias and variance properties of the extended performance measures. Ambiguity in reference labels translates to differences between best-case, expected, and worst-case performance. We show a second set of measures comparing expected and ideal performance, which is closely related to regression performance, namely the root mean squared error (RMSE) and the mean absolute error (MAE). All calculations apply to classical crisp classification as well as to soft classification (partial class memberships and/or one-class classifiers). The proposed performance measures make it possible to test classifiers with actual borderline cases. In addition, hardening of, e.g., posterior probabilities into class labels is not necessary, which avoids the corresponding information loss and increase in variance. We implement the proposed performance measures in the R package "softclassval", which is available from CRAN and at http://softclassval.r-forge.r-project.org. Our reasoning, as well as the importance of partial memberships for chemometric classification, is illustrated by a real-world application: astrocytoma brain tumor tissue grading (80 patients, 37,000 spectra) for finding surgical excision borders. As borderline cases are the actual target of the analytical technique, samples which are diagnosed to be borderline cases must be included in the validation. Comment: The manuscript is accepted for publication in Chemometrics and Intelligent Laboratory Systems. 
Supplementary figures and tables are at the end of the pd
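As a rough illustration of how a confusion-matrix fraction can be extended to partial memberships, the sketch below computes a soft sensitivity for one class, together with MAE and RMSE between reference and predicted memberships. It assumes memberships in [0, 1] and uses the product as the "and" operator, which is only one of several possible choices. This is a simplified sketch, not the API or the exact definitions of the softclassval package; all function names are hypothetical.

```python
import math


def soft_sensitivity(reference, predicted):
    """Soft sensitivity for one class with partial memberships in [0, 1].

    Uses the product r * p as the soft 'and' (an assumption). With crisp
    0/1 values this reduces to TP / (TP + FN).
    """
    total_ref = sum(reference)
    if total_ref == 0:
        raise ValueError("no reference membership in this class")
    return sum(r * p for r, p in zip(reference, predicted)) / total_ref


def mae(reference, predicted):
    """Mean absolute error between reference and predicted memberships."""
    return sum(abs(r - p) for r, p in zip(reference, predicted)) / len(reference)


def rmse(reference, predicted):
    """Root mean squared error between reference and predicted memberships."""
    return math.sqrt(
        sum((r - p) ** 2 for r, p in zip(reference, predicted)) / len(reference)
    )


# Toy memberships (hypothetical); the second sample is a borderline case.
ref = [1.0, 0.5, 0.0, 1.0]
pred = [0.8, 0.5, 0.2, 1.0]
soft_sens = soft_sensitivity(ref, pred)
```

Because no hardening step is applied, the borderline sample with reference membership 0.5 contributes to the measure instead of being discarded or forced into one class.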