    Significance Analysis for Pairwise Variable Selection in Classification

    The goal of this article is to select important variables that can distinguish one class of data from another. A marginal variable selection method ranks individual variables by their marginal effects on classification, and is a useful and efficient approach to variable selection. Our focus here is to consider the bivariate effect in addition to the marginal effect. In particular, we are interested in those pairs of variables that lead to accurate classification predictions when they are viewed jointly. To accomplish this, we propose a permutation test called the Significance test of Joint Effect (SigJEff). In the absence of joint effects in the data, SigJEff is similar or equivalent to many marginal methods. However, when joint effects exist, our method can significantly boost the performance of variable selection. Such joint effects can provide an additional, and sometimes dominating, advantage for classification. We illustrate and validate our approach using both a simulated example and a real glioblastoma multiforme data set, which provide promising results.
    Comment: 28 pages, 7 figures
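    The abstract above does not give the SigJEff statistic, so the following is only a generic sketch of a label-permutation test for one pair of variables, not the authors' method. The statistic (squared distance between class means scaled by the pooled covariance) and the names `pair_statistic` and `permutation_pvalue` are assumptions chosen for illustration; class labels are assumed to be coded 0 and 1.

```python
import random

def pair_statistic(x1, x2, labels):
    """Squared distance between the two class means in the (x1, x2) plane,
    scaled by the pooled within-class 2x2 covariance (illustrative choice)."""
    sums = {0: [0.0, 0.0, 0], 1: [0.0, 0.0, 0]}
    for a, b, lab in zip(x1, x2, labels):
        sums[lab][0] += a
        sums[lab][1] += b
        sums[lab][2] += 1
    means = {lab: (s[0] / s[2], s[1] / s[2]) for lab, s in sums.items()}

    # Pooled within-class covariance entries.
    s11 = s12 = s22 = 0.0
    for a, b, lab in zip(x1, x2, labels):
        da, db = a - means[lab][0], b - means[lab][1]
        s11 += da * da
        s12 += da * db
        s22 += db * db
    dof = len(labels) - 2
    s11, s12, s22 = s11 / dof, s12 / dof, s22 / dof

    # d^T S^{-1} d, with the 2x2 inverse written out explicitly.
    d1 = means[0][0] - means[1][0]
    d2 = means[0][1] - means[1][1]
    det = s11 * s22 - s12 * s12
    return (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det

def permutation_pvalue(x1, x2, labels, n_perm=999, seed=0):
    """Share of label shufflings whose statistic reaches the observed one."""
    rng = random.Random(seed)
    observed = pair_statistic(x1, x2, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any link between labels and data
        if pair_statistic(x1, x2, shuffled) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)
```

    A pair whose classes separate only along the direction in which the two variables disagree would score poorly on each marginal test but receive a small p-value here, which is the kind of joint effect the abstract describes.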

    Statistical methods for ranking differentially expressed genes

    In the analysis of microarray data, the identification of differential expression is paramount. Here I outline a method for finding an optimal test statistic with which to rank genes with respect to differential expression. Tests of the method show that it allows generation of top gene lists that give few false positives and few false negatives. Estimation of the false-negative as well as the false-positive rate lies at the heart of the method.

    A New Test Statistic Based on Shrunken Sample Variance for Identifying Differentially Expressed Genes in Small Microarray Experiments

    Choosing an appropriate statistic and precisely evaluating the false discovery rate (FDR) are both essential for devising an effective method for identifying differentially expressed genes in microarray data. The t-type score proposed by Pan et al. (2003) succeeded in suppressing false positives by controlling the underestimation of variance but left the overestimation uncontrolled. To control the overestimation, we devised a new test statistic (the variance stabilized t-type score) by placing shrunken sample variances of the James-Stein type in the denominator of the t-type score. Since the relative superiority of the mean and median FDRs was unclear in the widely adopted Significance Analysis of Microarrays (SAM), we conducted simulation studies to examine the performance of the variance stabilized t-type score and the characteristics of the two FDRs. The variance stabilized t-type score was generally better than or at least as good as the t-type score, irrespective of the sample size and the proportion of differentially expressed genes. In terms of accuracy, the median FDR was superior to the mean FDR when the proportion of differentially expressed genes was large. The variance stabilized t-type score with the median FDR was applied to actual colorectal cancer data and yielded a reasonable result.
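    To illustrate why shrinking variances matters in small experiments, here is a minimal sketch that pulls each gene's variance estimate toward the average over all genes before forming a t-like score. It is not the paper's estimator: the fixed `weight` stands in for the data-driven James-Stein weight, which the abstract does not specify, and the function name `shrunken_t_scores` is hypothetical.

```python
from statistics import mean, variance

def shrunken_t_scores(group_a, group_b, weight=0.5):
    """group_a, group_b: one list of replicate values per gene.

    weight=0 gives an ordinary unequal-variance t-like score; weight=1
    uses the same (average) variance for every gene. The fixed weight is
    an assumption made for this sketch.
    """
    diffs, raw_vars = [], []
    for a, b in zip(group_a, group_b):
        diffs.append(mean(a) - mean(b))
        # Estimated variance of the difference of the two group means.
        raw_vars.append(variance(a) / len(a) + variance(b) / len(b))

    target = mean(raw_vars)  # shrinkage target: average over genes
    scores = []
    for d, v in zip(diffs, raw_vars):
        shrunk = (1.0 - weight) * v + weight * target
        scores.append(d / shrunk ** 0.5)
    return scores
```

    With only a few replicates, a gene whose variance happens to be estimated as almost zero gets an inflated ordinary score; raising an underestimated variance, and lowering an overestimated one, toward the common target damps both kinds of error.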

    Application of Volcano Plots in Analyses of mRNA Differential Expressions with Microarrays

    A volcano plot displays an unstandardized signal (e.g. log-fold-change) against a noise-adjusted/standardized signal (e.g. the t-statistic or -log10(p-value) from the t test). We review the basic and an interactive use of the volcano plot, and its crucial role in understanding the regularized t-statistic. The joint filtering gene selection criterion based on regularized statistics has a curved discriminant line in the volcano plot, as compared to the two perpendicular lines for the "double filtering" criterion. This review attempts to provide a unifying framework for discussions on alternative measures of differential expression, improved methods for estimating variance, and visual display of a microarray analysis result. We also discuss the possibility of applying volcano plots to fields beyond microarrays.
    Comment: 8 figures
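    The contrast between the two selection criteria can be sketched in a few lines. The regularized score below uses the SAM-style form diff / (se + s0), which is an assumption here since the abstract does not state the formula; the function names and all threshold values are hypothetical.

```python
def double_filter(diff, se, min_diff, min_t):
    """Select a gene only if it passes both cutoffs: two perpendicular
    lines in the (log-fold-change, t-statistic) plane."""
    return abs(diff) >= min_diff and abs(diff) / se >= min_t

def regularized_filter(diff, se, s0, min_score):
    """Select a gene on one regularized score, diff / (se + s0).
    In the same plane this single cutoff traces a curved boundary."""
    return abs(diff) / (se + s0) >= min_score

# Three hypothetical genes: (log-fold-change, standard error).
genes = {"g1": (0.9, 0.05), "g2": (0.3, 0.01), "g3": (2.0, 1.0)}
for name, (diff, se) in genes.items():
    print(name,
          double_filter(diff, se, min_diff=1.0, min_t=3.0),
          regularized_filter(diff, se, s0=0.2, min_score=3.0))
```

    Under these made-up thresholds, a gene just short of the fold-change cutoff but with a very large t-statistic (g1) is rejected by double filtering yet accepted by the regularized rule, because the curved boundary trades a little effect size for a lot of significance.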

    Gene selection criterion for discriminant microarray data analysis based on extreme value distributions

    Globally increased ultraconserved noncoding RNA expression in pancreatic adenocarcinoma

    This is the final version of the article. Available from the publisher via the DOI in this record.
    Transcribed ultraconserved regions (T-UCRs) are a class of non-coding RNAs with 100% sequence conservation among human, rat and mouse genomes. T-UCRs are differentially expressed in several cancers; however, their expression in pancreatic adenocarcinoma (PDAC) has not been studied. We used a qPCR array to profile all 481 T-UCRs in pancreatic cancer specimens, pancreatic cancer cell lines, during experimental pancreatic desmoplasia and in the pancreases of P48Cre/wt; KrasLSL-G12D/wt mice. Fourteen, 57 and 29% of the detectable T-UCRs were differentially expressed in the cell lines, human tumors and transgenic mouse pancreases, respectively. The vast majority of the differentially expressed T-UCRs had increased expression in the cancer. T-UCRs were monitored using an in vitro model of the desmoplastic reaction. Twenty-five percent of the expressed T-UCRs were increased in the HPDE cells cultured on PANC-1 cellular matrix. UC.190, UC.233 and UC.270 were increased in all three human data sets. siRNA knockdown of each of these three T-UCRs reduced the proliferation of MIA PaCa-2 cells by up to 60%. The expression patterns of many T-UCRs in the human and mouse pancreases closely correlated with one another, suggesting that groups of T-UCRs are co-activated in PDAC. Successful knockout of the transcription factor EGR1 in PANC-1 cells caused a reduction in the expression of a subset of T-UCRs, suggesting that EGR1 may control T-UCR expression in PDAC. We report a global increase in expression of T-UCRs in both human and mouse PDAC. Commonalities in their expression pattern suggest a similar mechanism of transcriptional upregulation for T-UCRs in PDAC.
    Supported by grants R21/R33CA114304 and U01CA111294. G.A.C. is supported as a Fellow at The University of Texas MD Anderson Research Trust, as a University of Texas System Regents Research Scholar and by the CLL Global Research Foundation. Work in Dr. Calin’s laboratory is supported in part by a 2009 Seena Magowitz–Pancreatic Cancer Action Network AACR Pilot Grant, the Laura and John Arnold Foundation, the RGK Foundation and the Estate of C. G. Johnson, Jr. A.C.P.A.P. was supported by NIH fellowship 5F31CA142238.