58 research outputs found

Significance Analysis for Pairwise Variable Selection in Classification

Author: Liu Yufeng
Marron J. S.
Qiao Xingye
Publication date: 01/01/2014

The goal of this article is to select important variables that can distinguish one class of data from another. A marginal variable selection method ranks individual variables by their marginal effects on classification, and is a useful and efficient approach to variable selection. Our focus here is to consider the bivariate effect in addition to the marginal effect. In particular, we are interested in those pairs of variables that can lead to accurate classification predictions when they are viewed jointly. To accomplish this, we propose a permutation test called Significance test of Joint Effect (SigJEff). In the absence of joint effects in the data, SigJEff is similar or equivalent to many marginal methods. However, when joint effects exist, our method can significantly boost the performance of variable selection. Such joint effects can provide an additional, and sometimes dominating, advantage for classification. We illustrate and validate our approach using both a simulated example and a real glioblastoma multiforme data set, with promising results. Comment: 28 pages, 7 figures
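As a rough illustration of the label-permutation idea behind a test on a pair of variables, the sketch below shuffles class labels and compares the observed statistic with its permutation distribution. This is not the SigJEff statistic, which the abstract does not specify. The statistic used here (squared distance between the two class centroids in the two-variable plane) is a placeholder, and the function names `centroid_distance_sq` and `permutation_p_value` are hypothetical.

```python
import random


def centroid_distance_sq(points, labels):
    """Squared Euclidean distance between the centroids of class 0 and class 1.

    Placeholder statistic chosen for illustration; it is NOT the SigJEff statistic.
    """
    dims = len(points[0])
    centroids = []
    for cls in (0, 1):
        members = [p for p, y in zip(points, labels) if y == cls]
        centroids.append([sum(p[d] for p in members) / len(members) for d in range(dims)])
    return sum((a - b) ** 2 for a, b in zip(centroids[0], centroids[1]))


def permutation_p_value(points, labels, n_perm=999, seed=0):
    """Estimate a p-value by shuffling the class labels n_perm times.

    Shuffling keeps the class sizes fixed, so both classes stay non-empty.
    """
    rng = random.Random(seed)
    observed = centroid_distance_sq(points, labels)
    shuffled = list(labels)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if centroid_distance_sq(points, shuffled) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (1 + at_least_as_extreme) / (1 + n_perm)
```

Calling `permutation_p_value(points, labels)` with `points` as a list of `(x1, x2)` tuples and `labels` as a list of 0/1 values returns the estimated p-value for that pair. Any other pairwise statistic could be swapped in for the placeholder.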

arXiv.org e-Print Archive

Crossref

Carolina Digital Repository

Statistical methods for ranking differentially expressed genes

Author: Broberg Per
Publication venue: BioMed Central
Publication date: 01/01/2003

In the analysis of microarray data, the identification of differential expression is paramount. Here I outline a method for finding an optimal test statistic with which to rank genes with respect to differential expression. Tests of the method show that it allows generation of top gene lists that give few false positives and few false negatives. Estimation of the false-negative as well as the false-positive rate lies at the heart of the method.

Springer

Springer - Publisher Connector

PubMed Central

A New Test Statistic Based on Shrunken Sample Variance for Identifying Differentially Expressed Genes in Small Microarray Experiments

Author: Hamada Chikuma
Hirakawa Akihiro
Sato Yasunori
Yoshimura Isao
Publication venue: Libertas Academica
Publication date: 01/01/2008

Choosing an appropriate statistic and precisely evaluating the false discovery rate (FDR) are both essential for devising an effective method for identifying differentially expressed genes in microarray data. The t-type score proposed by Pan et al. (2003) succeeded in suppressing false positives by controlling the underestimation of variance but left the overestimation uncontrolled. To control the overestimation, we devised a new test statistic (the variance stabilized t-type score) by placing shrunken sample variances of the James-Stein type in the denominator of the t-type score. Since the relative superiority of the mean and median FDRs was unclear in the widely adopted Significance Analysis of Microarrays (SAM), we conducted simulation studies to examine the performance of the variance stabilized t-type score and the characteristics of the two FDRs. The variance stabilized t-type score was generally better than or at least as good as the t-type score, irrespective of the sample size and the proportion of differentially expressed genes. In terms of accuracy, the median FDR was superior to the mean FDR when the proportion of differentially expressed genes was large. The variance stabilized t-type score with the median FDR was applied to actual colorectal cancer data and yielded a reasonable result.
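To make the idea of a shrunken variance in the denominator concrete, here is a minimal sketch that pulls each gene's pooled variance toward the average variance across genes before forming a t-like score. It is an assumption-laden illustration, not the paper's statistic: the James-Stein weight derived by the authors is not reproduced, and the fixed `weight` parameter and the function name `shrunken_t_scores` are hypothetical.

```python
from statistics import mean, variance


def shrunken_t_scores(group_a, group_b, weight=0.5):
    """Return one t-like score per gene using variances shrunk toward their mean.

    group_a, group_b: lists of per-gene lists of expression values
                      (at least two values per gene in each group).
    weight: fixed shrinkage weight in [0, 1]; an illustrative stand-in for a
            data-driven James-Stein weight. weight=0 gives the ordinary
            pooled-variance t-statistic.
    """
    pooled, summaries = [], []
    for a, b in zip(group_a, group_b):
        na, nb = len(a), len(b)
        s2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
        pooled.append(s2)
        summaries.append((mean(a) - mean(b), na, nb))

    target = mean(pooled)  # shrinkage target: average pooled variance over genes
    scores = []
    for (diff, na, nb), s2 in zip(summaries, pooled):
        s2_shrunk = weight * target + (1 - weight) * s2
        # Assumes s2_shrunk > 0, i.e. the data are not constant everywhere.
        scores.append(diff / (s2_shrunk * (1 / na + 1 / nb)) ** 0.5)
    return scores
```

Shrinking toward the across-gene average inflates very small variance estimates and deflates very large ones, so a gene with an accidentally tiny sample variance no longer receives an extreme score.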

CiteSeerX

Directory of Open Access Journals

PubMed Central

Application of Volcano Plots in Analyses of mRNA Differential Expressions with Microarrays

Author: Alvord W. G.
Auer P. L.
Chen Y.
Chen Z.
Cohen J.
Fechner G. T.
Guyon I.
Göhlmann H.
Lee J.
Li C.
Schwender H.
Smyth G. K.
Snedecor G. W.
Trevino V.
Vandesompele J.
Welsh B. L.
Li Wentian
Zhao C.
Publication venue: World Scientific Pub Co Pte Lt
Publication date: 28/08/2013

A volcano plot displays an unstandardized signal (e.g. log-fold-change) against a noise-adjusted/standardized signal (e.g. the t-statistic or -log10(p-value) from the t test). We review basic and interactive uses of the volcano plot, and its crucial role in understanding the regularized t-statistic. The joint filtering gene selection criterion based on regularized statistics has a curved discriminant line in the volcano plot, as compared to the two perpendicular lines for the "double filtering" criterion. This review attempts to provide a unifying framework for discussions on alternative measures of differential expression, improved methods for estimating variance, and visual display of a microarray analysis result. We also discuss the possibility of applying volcano plots to fields beyond microarray analysis. Comment: 8 figures
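The two axes and the "double filtering" rule described above can be sketched as follows. This is an illustrative assumption, not code from the review: inputs are taken to be log2-scale expression values, the Welch form of the t-statistic is used for the vertical axis, and the names `volcano_point` and `double_filter` and the default cutoffs are hypothetical.

```python
from statistics import mean, variance


def volcano_point(a, b):
    """Return (log fold change, Welch t-statistic) for one gene.

    a, b: log2-scale expression values for the two conditions
          (at least two values each, not both constant).
    """
    diff = mean(a) - mean(b)  # difference of log values = log fold change
    standard_error = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return diff, diff / standard_error


def double_filter(points, min_abs_lfc=1.0, min_abs_t=2.0):
    """Apply two perpendicular cutoffs: one on |log fold change|, one on |t|.

    Cutoff values are arbitrary defaults for illustration.
    """
    return [abs(lfc) >= min_abs_lfc and abs(t) >= min_abs_t for lfc, t in points]
```

Plotting the first coordinate on the horizontal axis and the second on the vertical axis gives the volcano plot; the genes kept by `double_filter` are those in the outer corners cut off by the two perpendicular lines.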

arXiv.org e-Print Archive

Crossref

Statistical Workflow for Feature Selection in Human Metabolomics Data.

Author: Antonelli Joseph
Cheng Susan
Claggett Brian L
Demler Olga V
Deng Katherine
Henglin Mir
Hushcha Pavel V
Jain Mohit
Kim Andy
Kim Nicole
Lagerborg Kim A
Mora Samia
Niiranen Teemu J
Ovsak Gavin
Pereira Alexandre C
Rao Kevin
Tyagi Octavia
Watrous Jeramie D
Publication venue: eScholarship, University of California
Publication date: 01/07/2019

High-throughput metabolomics investigations, when conducted in large human cohorts, represent a potentially powerful tool for elucidating the biochemical diversity underlying human health and disease. Large-scale metabolomics data sources, generated using either targeted or nontargeted platforms, are becoming more common. Appropriate statistical analysis of these complex high-dimensional data will be critical for extracting meaningful results from such large-scale human metabolomics studies. Therefore, we consider the statistical analytical approaches that have been employed in prior human metabolomics studies. Based on the lessons learned and collective experience to date in the field, we offer a step-by-step framework for pursuing statistical analyses of cohort-based human metabolomics data, with a focus on feature selection. We discuss the range of options and approaches that may be employed at each stage of data management, analysis, and interpretation, and offer guidance on the analytical decisions that need to be considered over the course of implementing a data analysis workflow. Certain pervasive analytical challenges facing the field warrant ongoing focused research. Addressing these challenges, particularly those related to analyzing human metabolomics data, will allow for more standardization of, as well as advances in, how research in the field is practiced. In turn, such major analytical advances will lead to substantial improvements in the overall contributions of human metabolomics investigations.

eScholarship - University of California

Gene selection criterion for discriminant microarray data analysis based on extreme value distributions

Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2003

Crossref

Globally increased ultraconserved noncoding RNA expression in pancreatic adenocarcinoma

Author: Allard D
Azevedo-Pouly AC
Badawi M
Brackett DJ
Calin GA
Elgamal OA
Gusev Y
Jiang J
Lee EJ
Lerner MR
Redis RS
Schmittgen TD
Sutaria DS
Publication venue: Impact Journals, LLC
Publication date: 05/07/2016

This is the final version of the article. Available from the publisher via the DOI in this record. Transcribed ultraconserved regions (T-UCRs) are a class of non-coding RNAs with 100% sequence conservation among the human, rat and mouse genomes. T-UCRs are differentially expressed in several cancers; however, their expression in pancreatic adenocarcinoma (PDAC) has not been studied. We used a qPCR array to profile all 481 T-UCRs in pancreatic cancer specimens, pancreatic cancer cell lines, during experimental pancreatic desmoplasia and in the pancreases of P48Cre/wt; KrasLSL-G12D/wt mice. Fourteen, 57 and 29% of the detectable T-UCRs were differentially expressed in the cell lines, human tumors and transgenic mouse pancreases, respectively. The vast majority of the differentially expressed T-UCRs had increased expression in the cancer. T-UCRs were monitored using an in vitro model of the desmoplastic reaction. Twenty-five percent of the expressed T-UCRs were increased in the HPDE cells cultured on PANC-1 cellular matrix. UC.190, UC.233 and UC.270 were increased in all three human data sets. siRNA knockdown of each of these three T-UCRs reduced the proliferation of MIA PaCa-2 cells by up to 60%. The expression patterns of many T-UCRs in the human and mouse pancreases closely correlated with one another, suggesting that groups of T-UCRs are co-activated in PDAC. Successful knockout of the transcription factor EGR1 in PANC-1 cells caused a reduction in the expression of a subset of T-UCRs, suggesting that EGR1 may control T-UCR expression in PDAC. We report a global increase in expression of T-UCRs in both human and mouse PDAC. Commonalities in their expression pattern suggest a similar mechanism of transcriptional upregulation for T-UCRs in PDAC.
Supported by grants R21/R33CA114304 and U01CA111294. G.A.C. is supported as a Fellow at The University of Texas MD Anderson Research Trust, as a University of Texas System Regents Research Scholar and by the CLL Global Research Foundation. Work in Dr. Calin’s laboratory is supported in part by a 2009 Seena Magowitz–Pancreatic Cancer Action Network AACR Pilot Grant, the Laura and John Arnold Foundation, the RGK Foundation and the Estate of C. G. Johnson, Jr. A.C.P.A.P. was supported by NIH fellowship 5F31CA142238.

Open Research Exeter