58 research outputs found
Significance Analysis for Pairwise Variable Selection in Classification
The goal of this article is to select important variables that can
distinguish one class of data from another. A marginal variable selection
method ranks the marginal effects for classification of individual variables,
and is a useful and efficient approach for variable selection. Our focus here
is to consider the bivariate effect, in addition to the marginal effect. In
particular, we are interested in those pairs of variables that can lead to
accurate classification predictions when they are viewed jointly. To accomplish
this, we propose a permutation test called Significance test of Joint Effect
(SigJEff). In the absence of joint effect in the data, SigJEff is similar or
equivalent to many marginal methods. However, when joint effects exist, our
method can significantly boost the performance of variable selection. Such
joint effects can help to provide additional, and sometimes dominating,
advantage for classification. We illustrate and validate our approach using
both a simulated example and a real glioblastoma multiforme data set, which
provide promising results. Comment: 28 pages, 7 figures
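The abstract above does not state the SigJEff test statistic, so the following is only a minimal sketch of the general idea of a label-permutation test for the joint effect of one pair of variables. The statistic used here (squared Mahalanobis distance between the two class means) and the names pair_statistic and permutation_pvalue are assumptions made for illustration, not the authors' method.

```python
import numpy as np

def pair_statistic(x, y):
    """Squared Mahalanobis distance between class means (assumed statistic).

    x: (n_samples, 2) array holding one pair of variables.
    y: (n_samples,) array of 0/1 class labels.
    """
    a, b = x[y == 0], x[y == 1]
    diff = a.mean(axis=0) - b.mean(axis=0)
    # Pooled within-class covariance of the two variables.
    pooled = ((len(a) - 1) * np.cov(a, rowvar=False)
              + (len(b) - 1) * np.cov(b, rowvar=False)) / (len(x) - 2)
    return float(diff @ np.linalg.solve(pooled, diff))

def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Permutation p-value: shuffle labels and recompute the pair statistic."""
    rng = np.random.default_rng(seed)
    observed = pair_statistic(x, y)
    exceed = sum(
        pair_statistic(x, rng.permutation(y)) >= observed
        for _ in range(n_perm)
    )
    # The +1 terms count the observed labelling as one of the permutations.
    return (exceed + 1) / (n_perm + 1)
```

A pair of strongly correlated variables whose classes differ only along the minor axis of the correlation is a case where such a joint statistic can be large even though each marginal difference is small.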
Statistical methods for ranking differentially expressed genes
In the analysis of microarray data the identification of differential expression is paramount. Here I outline a method for finding an optimal test statistic with which to rank genes with respect to differential expression. Tests of the method show that it allows generation of top gene lists that give few false positives and few false negatives. Estimation of the false-negative as well as the false-positive rate lies at the heart of the method.
A New Test Statistic Based on Shrunken Sample Variance for Identifying Differentially Expressed Genes in Small Microarray Experiments
Choosing an appropriate statistic and precisely evaluating the false discovery rate (FDR) are both essential for devising an effective method for identifying differentially expressed genes in microarray data. The t-type score proposed by Pan et al. (2003) succeeded in suppressing false positives by controlling the underestimation of variance but left the overestimation uncontrolled. For controlling the overestimation, we devised a new test statistic (variance stabilized t-type score) by placing shrunken sample variances of the James-Stein type in the denominator of the t-type score. Since the relative superiority of the mean and median FDRs was unclear in the widely adopted Significance Analysis of Microarrays (SAM), we conducted simulation studies to examine the performance of the variance stabilized t-type score and the characteristics of the two FDRs. The variance stabilized t-type score was generally better than or at least as good as the t-type score, irrespective of the sample size and proportion of differentially expressed genes. In terms of accuracy, the median FDR was superior to the mean FDR when the proportion of differentially expressed genes was large. The variance stabilized t-type score with the median FDR was applied to actual colorectal cancer data and yielded a reasonable result.
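The abstract above does not give the formula for its shrunken variance, so the following is only a rough sketch of the general idea: each gene's pooled variance is pulled toward the average variance across genes before it enters the denominator of a two-sample t-type score. The linear shrinkage rule, the fixed weight lam, and the function name shrunken_t are assumptions made for illustration; a James-Stein-type estimator would normally derive the weight from the data.

```python
import numpy as np

def shrunken_t(group1, group2, lam=0.5):
    """Two-sample t-type score with variances shrunk toward their mean.

    group1, group2: (n_genes, n_samples) arrays, one row per gene.
    lam: hypothetical shrinkage weight in [0, 1]; 0 gives the ordinary
         pooled-variance t statistic, 1 gives a common variance for all genes.
    """
    n1, n2 = group1.shape[1], group2.shape[1]
    diff = group1.mean(axis=1) - group2.mean(axis=1)
    # Ordinary pooled sample variance, computed gene by gene.
    pooled = ((n1 - 1) * group1.var(axis=1, ddof=1)
              + (n2 - 1) * group2.var(axis=1, ddof=1)) / (n1 + n2 - 2)
    # Shrink each variance toward the across-gene average (assumed target).
    shrunk = lam * pooled.mean() + (1.0 - lam) * pooled
    return diff / np.sqrt(shrunk * (1.0 / n1 + 1.0 / n2))
```

With small samples, genes whose variance happens to be underestimated no longer receive inflated scores, and genes whose variance is overestimated are no longer penalized as heavily.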
Application of Volcano Plots in Analyses of mRNA Differential Expressions with Microarrays
A volcano plot displays unstandardized signal (e.g. log-fold-change) against
noise-adjusted/standardized signal (e.g. t-statistic or -log10(p-value) from
the t test). We review the basic and interactive uses of the volcano plot,
and its crucial role in understanding the regularized t-statistic. The joint
filtering gene selection criterion based on regularized statistics has a curved
discriminant line in the volcano plot, as compared to the two perpendicular
lines for the "double filtering" criterion. This review attempts to provide a
unifying framework for discussions on alternative measures of differential
expression, improved methods for estimating variance, and visual display of a
microarray analysis result. We also discuss the possibility of applying volcano
plots to fields beyond microarrays. Comment: 8 figures
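The contrast between the two selection criteria can be sketched numerically. The code below is an illustration under stated assumptions, not code from the review: it takes the input to be already on a log scale, uses the t-statistic as the standardized axis, and writes the regularized statistic in the familiar form d = lfc / (se + s0), where s0 is a hypothetical positive constant. All function names are made up for this example.

```python
import numpy as np

def volcano_coordinates(group1, group2):
    """Return (log fold change, standard error) per gene for log-scale data."""
    n1, n2 = group1.shape[1], group2.shape[1]
    lfc = group1.mean(axis=1) - group2.mean(axis=1)
    pooled = ((n1 - 1) * group1.var(axis=1, ddof=1)
              + (n2 - 1) * group2.var(axis=1, ddof=1)) / (n1 + n2 - 2)
    se = np.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    return lfc, se

def regularized_t(lfc, se, s0):
    """Regularized statistic: a constant s0 is added to the denominator."""
    return lfc / (se + s0)

def double_filter(lfc, se, min_lfc, min_t):
    """Two perpendicular cutoffs: one on |lfc|, one on |t|."""
    t = lfc / se
    return (np.abs(lfc) >= min_lfc) & (np.abs(t) >= min_t)

def joint_filter(lfc, se, s0, min_d):
    """Single cutoff on the regularized statistic."""
    return np.abs(regularized_t(lfc, se, s0)) >= min_d
```

Under these assumptions 1/d = 1/t + s0/lfc, so the boundary |d| = c traces the curve 1/|t| + s0/|lfc| = 1/c in the (lfc, t) plane, while the double filter cuts the same plane with one vertical and one horizontal line.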
Statistical Workflow for Feature Selection in Human Metabolomics Data.
High-throughput metabolomics investigations, when conducted in large human cohorts, represent a potentially powerful tool for elucidating the biochemical diversity underlying human health and disease. Large-scale metabolomics data sources, generated using either targeted or nontargeted platforms, are becoming more common. Appropriate statistical analysis of these complex high-dimensional data will be critical for extracting meaningful results from such large-scale human metabolomics studies. Therefore, we consider the statistical analytical approaches that have been employed in prior human metabolomics studies. Based on the lessons learned and collective experience to date in the field, we offer a step-by-step framework for pursuing statistical analyses of cohort-based human metabolomics data, with a focus on feature selection. We discuss the range of options and approaches that may be employed at each stage of data management, analysis, and interpretation and offer guidance on the analytical decisions that need to be considered over the course of implementing a data analysis workflow. Certain pervasive analytical challenges facing the field warrant ongoing focused research. Addressing these challenges, particularly those related to analyzing human metabolomics data, will allow for more standardization of, as well as advances in, how research in the field is practiced. In turn, such major analytical advances will lead to substantial improvements in the overall contributions of human metabolomics investigations.
Globally increased ultraconserved noncoding RNA expression in pancreatic adenocarcinoma
This is the final version of the article. Available from the publisher via the DOI in this record. Transcribed ultraconserved regions (T-UCRs) are a class of non-coding RNAs with 100% sequence conservation among human, rat and mouse genomes. T-UCRs are differentially expressed in several cancers; however, their expression in pancreatic adenocarcinoma (PDAC) has not been studied. We used a qPCR array to profile all 481 T-UCRs in pancreatic cancer specimens, pancreatic cancer cell lines, during experimental pancreatic desmoplasia and in the pancreases of P48Cre/wt; KrasLSL-G12D/wt mice. Fourteen, 57 and 29% of the detectable T-UCRs were differentially expressed in the cell lines, human tumors and transgenic mouse pancreases, respectively. The vast majority of the differentially expressed T-UCRs had increased expression in the cancer. T-UCRs were monitored using an in vitro model of the desmoplastic reaction. Twenty-five percent of the expressed T-UCRs were increased in the HPDE cells cultured on PANC-1 cellular matrix. UC.190, UC.233 and UC.270 were increased in all three human data sets. siRNA knockdown of each of these three T-UCRs reduced the proliferation of MIA PaCa-2 cells by up to 60%. The expression pattern among many T-UCRs in the human and mouse pancreases closely correlated with one another, suggesting that groups of T-UCRs are co-activated in PDAC. Successful knockout of the transcription factor EGR1 in PANC-1 cells caused a reduction in the expression of a subset of T-UCRs, suggesting that EGR1 may control T-UCR expression in PDAC. We report a global increase in expression of T-UCRs in both human and mouse PDAC. Commonalities in their expression pattern suggest a similar mechanism of transcriptional upregulation for T-UCRs in PDAC. Supported by grants R21/R33CA114304 and
U01CA111294. G.A.C. is supported as a Fellow at The
University of Texas MD Anderson Research Trust, as a
University of Texas System Regents Research Scholar
and by the CLL Global Research Foundation. Work in Dr.
Calin’s laboratory is supported in part by a 2009 Seena
Magowitz–Pancreatic Cancer Action Network AACR Pilot
Grant, the Laura and John Arnold Foundation, the RGK
Foundation and the Estate of C. G. Johnson, Jr. A.C.P.A.P.
was supported by NIH fellowship 5F31CA142238
- …