4,025 research outputs found
Application of Volcano Plots in Analyses of mRNA Differential Expressions with Microarrays
Volcano plot displays unstandardized signal (e.g. log-fold-change) against
noise-adjusted/standardized signal (e.g. t-statistic or -log10(p-value) from
the t test). We review the basic and an interactive use of the volcano plot,
and its crucial role in understanding the regularized t-statistic. The joint
filtering gene selection criterion based on regularized statistics has a curved
discriminant line in the volcano plot, as compared to the two perpendicular
lines for the "double filtering" criterion. This review attempts to provide an
unifying framework for discussions on alternative measures of differential
expression, improved methods for estimating variance, and visual display of a
microarray analysis result. We also discuss the possibility to apply volcano
plots to other fields beyond microarray.Comment: 8 figure
Bayesian Gene Set Analysis
Gene expression microarray technologies provide the simultaneous measurements
of a large number of genes. Typical analyses of such data focus on the
individual genes, but recent work has demonstrated that evaluating changes in
expression across predefined sets of genes often increases statistical power
and produces more robust results. We introduce a new methodology for
identifying gene sets that are differentially expressed under varying
experimental conditions. Our approach uses a hierarchical Bayesian framework
where a hyperparameter measures the significance of each gene set. Using
simulated data, we compare our proposed method to alternative approaches, such
as Gene Set Enrichment Analysis (GSEA) and Gene Set Analysis (GSA). Our
approach provides the best overall performance. We also discuss the application
of our method to experimental data based on p53 mutation status
GaGa: A parsimonious and flexible model for differential expression analysis
Hierarchical models are a powerful tool for high-throughput data with a small
to moderate number of replicates, as they allow sharing information across
units of information, for example, genes. We propose two such models and show
its increased sensitivity in microarray differential expression applications.
We build on the gamma--gamma hierarchical model introduced by Kendziorski et
al. [Statist. Med. 22 (2003) 3899--3914] and Newton et al. [Biostatistics 5
(2004) 155--176], by addressing important limitations that may have hampered
its performance and its more widespread use. The models parsimoniously describe
the expression of thousands of genes with a small number of hyper-parameters.
This makes them easy to interpret and analytically tractable. The first model
is a simple extension that improves the fit substantially with almost no
increase in complexity. We propose a second extension that uses a mixture of
gamma distributions to further improve the fit, at the expense of increased
computational burden. We derive several approximations that significantly
reduce the computational cost. We find that our models outperform the original
formulation of the model, as well as some other popular methods for
differential expression analysis. The improved performance is specially
noticeable for the small sample sizes commonly encountered in high-throughput
experiments. Our methods are implemented in the freely available Bioconductor
gaga package.Comment: Published in at http://dx.doi.org/10.1214/09-AOAS244 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Validation of differential gene expression algorithms: Application comparing fold-change estimation to hypothesis testing
<p>Abstract</p> <p>Background</p> <p>Sustained research on the problem of determining which genes are differentially expressed on the basis of microarray data has yielded a plethora of statistical algorithms, each justified by theory, simulation, or ad hoc validation and yet differing in practical results from equally justified algorithms. Recently, a concordance method that measures agreement among gene lists have been introduced to assess various aspects of differential gene expression detection. This method has the advantage of basing its assessment solely on the results of real data analyses, but as it requires examining gene lists of given sizes, it may be unstable.</p> <p>Results</p> <p>Two methodologies for assessing predictive error are described: a cross-validation method and a posterior predictive method. As a nonparametric method of estimating prediction error from observed expression levels, cross validation provides an empirical approach to assessing algorithms for detecting differential gene expression that is fully justified for large numbers of biological replicates. Because it leverages the knowledge that only a small portion of genes are differentially expressed, the posterior predictive method is expected to provide more reliable estimates of algorithm performance, allaying concerns about limited biological replication. In practice, the posterior predictive method can assess when its approximations are valid and when they are inaccurate. Under conditions in which its approximations are valid, it corroborates the results of cross validation. Both comparison methodologies are applicable to both single-channel and dual-channel microarrays. For the data sets considered, estimating prediction error by cross validation demonstrates that empirical Bayes methods based on hierarchical models tend to outperform algorithms based on selecting genes by their fold changes or by non-hierarchical model-selection criteria. (The latter two approaches have comparable performance.) The posterior predictive assessment corroborates these findings.</p> <p>Conclusions</p> <p>Algorithms for detecting differential gene expression may be compared by estimating each algorithm's error in predicting expression ratios, whether such ratios are defined across microarray channels or between two independent groups.</p> <p>According to two distinct estimators of prediction error, algorithms using hierarchical models outperform the other algorithms of the study. The fact that fold-change shrinkage performed as well as conventional model selection criteria calls for investigating algorithms that combine the strengths of significance testing and fold-change estimation.</p
A HIERARCHICAL BAYESIAN APPROACH FOR DETECTING DIFFERENTIAL GENE EXPRESSION IN UNREPLICATED RNA-SEQUENCING DATA
Next-generation sequencing technologies have emerged as a promising technology in a variety of fields, including genomics, epigenomics, and transcriptomics. These technologies play an important role in understanding cell organization and functionality. Unlike data from earlier technologies (e.g., microarrays), data from next-generation sequencing technologies are highly replicable with little technical variation. One application of next-generation sequencing technologies is RNA-Sequencing (RNA-Seq). It is used for detecting differential gene expression between different biological conditions. While statistical methods for detecting differential expression in RNA-Seq data exist, one serious limitation to these methods is the absence of biological replication. At present, the high cost of next-generation sequencing technologies imposes a serious restriction on the number of biological replicates. We present a simple parametric hierarchical Bayesian model for detecting differential expression in data from unreplicated RNA-Seq experiments. The model extends naturally to multiple treatment groups and any number of biological replicates. We illustrate the application of this model through simulation studies and compare our approach to existing methods for detecting differential expression such as, Fisher\u27s Exact Test
Estimating the proportion of differentially expressed genes in comparative DNA microarray experiments
DNA microarray experiments, a well-established experimental technique, aim at
understanding the function of genes in some biological processes. One of the
most common experiments in functional genomics research is to compare two
groups of microarray data to determine which genes are differentially
expressed. In this paper, we propose a methodology to estimate the proportion
of differentially expressed genes in such experiments. We study the performance
of our method in a simulation study where we compare it to other standard
methods. Finally we compare the methods in real data from two toxicology
experiments with mice.Comment: Published at http://dx.doi.org/10.1214/074921707000000076 in the IMS
Lecture Notes Monograph Series
(http://www.imstat.org/publications/lecnotes.htm) by the Institute of
Mathematical Statistics (http://www.imstat.org
- …