    Rank discriminants for predicting phenotypes from RNA expression

    Statistical methods for analyzing large-scale biomolecular data are commonplace in computational biology. A notable example is phenotype prediction from gene expression data, for instance, detecting human cancers, differentiating subtypes and predicting clinical outcomes. Still, clinical applications remain scarce. One reason is that the complexity of the decision rules that emerge from standard statistical learning impedes biological understanding, in particular, any mechanistic interpretation. Here we explore decision rules for binary classification utilizing only the ordering of expression among several genes; the basic building blocks are then two-gene expression comparisons. The simplest example, just one comparison, is the TSP classifier, which has appeared in a variety of cancer-related discovery studies. Decision rules based on multiple comparisons can better accommodate class heterogeneity, and thereby increase accuracy, and might provide a link with biological mechanism. We consider a general framework ("rank-in-context") for designing discriminant functions, including a data-driven selection of the number and identity of the genes in the support ("context"). We then specialize to two examples: voting among several pairs and comparing the median expression in two groups of genes. Comprehensive experiments assess accuracy relative to other, more complex, methods, and reinforce earlier observations that simple classifiers are competitive. Comment: Published at http://dx.doi.org/10.1214/14-AOAS738 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
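
    The two-gene comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy data, the exhaustive pair search, and the rule that ties in a vote go to class 0 are all assumptions made for illustration.

```python
from itertools import combinations

def pair_score(X, y, i, j):
    """Estimate P(x_i < x_j | class 0) - P(x_i < x_j | class 1) from samples."""
    def freq(label):
        rows = [x for x, c in zip(X, y) if c == label]
        return sum(x[i] < x[j] for x in rows) / len(rows)
    return freq(0) - freq(1)

def fit_tsp(X, y):
    """Pick the gene pair whose ordering differs most between the two classes.

    Exhaustive search over all pairs (quadratic in the number of genes),
    which is an assumption suited only to small toy inputs.
    """
    pairs = combinations(range(len(X[0])), 2)
    i, j = max(pairs, key=lambda p: abs(pair_score(X, y, *p)))
    # Orient the pair so that x[i] < x[j] is the ordering typical of class 0.
    if pair_score(X, y, i, j) < 0:
        i, j = j, i
    return i, j

def predict_tsp(pair, x):
    """Single comparison: class 0 if x[i] < x[j], otherwise class 1."""
    i, j = pair
    return 0 if x[i] < x[j] else 1

def predict_vote(pairs, x):
    """Majority vote among several oriented pairs (ties go to class 0)."""
    votes = sum(predict_tsp(p, x) for p in pairs)
    return 1 if 2 * votes > len(pairs) else 0

# Hypothetical expression matrix: rows are samples, columns are genes.
X = [[1, 5, 3], [2, 6, 1], [1, 4, 2], [5, 1, 3], [6, 2, 4], [7, 3, 1]]
y = [0, 0, 0, 1, 1, 1]
pair = fit_tsp(X, y)
```

    Because the rule depends only on the ordering of two expression values within a sample, it is unchanged by any monotone rescaling of that sample, which is part of what makes such rules easy to interpret.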

    Partial Least Squares: A Versatile Tool for the Analysis of High-Dimensional Genomic Data

    Partial Least Squares (PLS) is a highly efficient statistical regression technique that is well suited for the analysis of high-dimensional genomic data. In this paper we review the theory and applications of PLS from both methodological and biological points of view. Focusing on microarray expression data, we provide a systematic comparison of the PLS approaches currently employed, and discuss problems as different as tumor classification, identification of relevant genes, survival analysis and modeling of gene networks.

    TOP2A and EZH2 Provide Early Detection of an Aggressive Prostate Cancer Subgroup.

    Purpose: Current clinical parameters do not stratify indolent from aggressive prostate cancer. Aggressive prostate cancer, defined by the progression from localized disease to metastasis, is responsible for the majority of prostate cancer–associated mortality. Recent gene expression profiling has proven successful in predicting the outcome of prostate cancer patients; however, it has yet to provide targeted therapy approaches that could inhibit a patient's progression to metastatic disease. Experimental Design: We have interrogated a total of seven primary prostate cancer cohorts (n = 1,900), two metastatic castration-resistant prostate cancer datasets (n = 293), and one prospective cohort (n = 1,385) to assess the impact of TOP2A and EZH2 expression on prostate cancer cellular program and patient outcomes. We also performed IHC staining for TOP2A and EZH2 in a cohort of primary prostate cancer patients (n = 89) with known outcome. Finally, we explored the therapeutic potential of a combination therapy targeting both TOP2A and EZH2 using novel prostate cancer–derived murine cell lines. Results: We demonstrate by genome-wide analysis of independent primary and metastatic prostate cancer datasets that concurrent TOP2A and EZH2 mRNA and protein upregulation selected for a subgroup of primary and metastatic patients with more aggressive disease and notable overlap of genes involved in mitotic regulation. Importantly, TOP2A and EZH2 in prostate cancer cells act as key driving oncogenes, a fact highlighted by sensitivity to combination-targeted therapy. Conclusions: Overall, our data support further assessment of TOP2A and EZH2 as biomarkers for early identification of patients with increased metastatic potential that may benefit from adjuvant or neoadjuvant targeted therapy approaches. ©2017 AACR

    DNA expression microarrays may be the wrong tool to identify biological pathways

    DNA microarray expression signatures are expected to provide new insights into pathophysiological pathways. Numerous variant statistical methods have been described for each step of the signal analysis. We employed five similar statistical tests on the same data set at the level of gene selection. Inter-test agreement for the identification of biological pathways in BioCarta, KEGG and Reactome was calculated using Cohen’s κ score. The identification of specific biological pathways showed only moderate agreement (0.30 < κ < 0.79) between the analysis methods used. Pathways identified by microarrays must be treated cautiously as they vary according to the statistical method used.
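
    The agreement measure named in this abstract, Cohen's κ, compares the observed agreement between two raters with the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). A minimal sketch follows; the function name and the pathway calls are hypothetical and are not data from the study.

```python
def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical calls."""
    n = len(a)
    # Observed agreement: fraction of items on which the two methods agree.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    labels = set(a) | set(b)
    p_e = sum((a.count(label) / n) * (b.count(label) / n) for label in labels)
    # Undefined (division by zero) when both methods always emit the same single label.
    return (p_o - p_e) / (1 - p_e)

# Hypothetical calls: 1 = pathway flagged as significant, 0 = not flagged,
# for eight pathways under two different gene-selection tests.
test_a = [1, 1, 1, 0, 0, 0, 1, 0]
test_b = [1, 1, 0, 0, 0, 1, 1, 0]
kappa = cohen_kappa(test_a, test_b)
```

    In this toy case the two tests agree on six of eight pathways (p_o = 0.75) while chance alone would give p_e = 0.5, so κ = 0.5, inside the moderate range the abstract reports.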

    A critical evaluation of network and pathway based classifiers for outcome prediction in breast cancer

    Recently, several classifiers that combine primary tumor data, like gene expression data, and secondary data sources, such as protein-protein interaction networks, have been proposed for predicting outcome in breast cancer. In these approaches, new composite features are typically constructed by aggregating the expression levels of several genes. The secondary data sources are employed to guide this aggregation. Although many studies claim that these approaches improve classification performance over single gene classifiers, the gain in performance is difficult to assess. This stems mainly from the fact that different breast cancer data sets and validation procedures are employed to assess the performance. Here we address these issues by employing a large cohort of six breast cancer data sets as a benchmark set and by performing an unbiased evaluation of the classification accuracies of the different approaches. Contrary to previous claims, we find that composite feature classifiers do not outperform simple single gene classifiers. We investigate the effect of (1) the number of selected features; (2) the specific gene set from which features are selected; (3) the size of the training set and (4) the heterogeneity of the data set on the performance of composite feature and single gene classifiers. Strikingly, we find that randomization of secondary data sources, which destroys all biological information in these sources, does not result in a deterioration in performance of composite feature classifiers. Finally, we show that when a proper correction for gene set size is performed, the stability of single gene sets is similar to the stability of composite feature sets. Based on these results there is currently no reason to prefer prognostic classifiers based on composite features over single gene classifiers for predicting outcome in breast cancer.

    Application of Volcano Plots in Analyses of mRNA Differential Expressions with Microarrays

    A volcano plot displays unstandardized signal (e.g. log-fold-change) against noise-adjusted/standardized signal (e.g. t-statistic or -log10(p-value) from the t test). We review the basic and an interactive use of the volcano plot, and its crucial role in understanding the regularized t-statistic. The joint filtering gene selection criterion based on regularized statistics has a curved discriminant line in the volcano plot, as compared to the two perpendicular lines for the "double filtering" criterion. This review attempts to provide a unifying framework for discussions on alternative measures of differential expression, improved methods for estimating variance, and visual display of a microarray analysis result. We also discuss the possibility of applying volcano plots to other fields beyond microarray. Comment: 8 figures
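
    The "double filtering" criterion mentioned in this abstract can be sketched as two independent cutoffs, one on each axis of the volcano plot. The sketch below is illustrative only: the function names, the thresholds (|log2 fold change| ≥ 1, p ≤ 0.05), and the gene values are assumptions, not values from the review, and the p-values are taken as given rather than computed from a t test.

```python
import math

def volcano_point(log2_fc, p_value):
    """Map a gene to volcano-plot coordinates: (log2 fold change, -log10 p)."""
    return log2_fc, -math.log10(p_value)

def double_filter(points, min_abs_lfc=1.0, max_p=0.05):
    """Keep genes passing both cutoffs.

    The fold-change cutoff gives vertical lines at x = +/- min_abs_lfc and the
    p-value cutoff gives a horizontal line at y = -log10(max_p); a gene is
    selected only if it lies outside the former and above the latter.
    """
    min_y = -math.log10(max_p)
    return [abs(x) >= min_abs_lfc and y >= min_y for x, y in points]

# Hypothetical (log2 fold change, p-value) pairs for four genes.
genes = [(2.0, 0.001), (0.2, 0.0001), (-1.5, 0.2), (-3.0, 0.01)]
points = [volcano_point(lfc, p) for lfc, p in genes]
selected = double_filter(points)
```

    The second gene is highly significant but barely changed, and the third is strongly changed but not significant; both are rejected, which is the behaviour that distinguishes double filtering from a cutoff on a single statistic.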