Estimating the proportion of differentially expressed genes in comparative DNA microarray experiments
DNA microarray experiments, a well-established technique, aim at
understanding the function of genes in biological processes. One of the
most common experiments in functional genomics research is to compare two
groups of microarray data to determine which genes are differentially
expressed. In this paper, we propose a methodology to estimate the proportion
of differentially expressed genes in such experiments. We study the performance
of our method in a simulation study where we compare it to other standard
methods. Finally, we compare the methods on real data from two toxicology
experiments with mice.

Comment: Published at http://dx.doi.org/10.1214/074921707000000076 in the IMS Lecture Notes Monograph Series (http://www.imstat.org/publications/lecnotes.htm) by the Institute of Mathematical Statistics (http://www.imstat.org)
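To make the idea concrete, here is a minimal sketch of one standard way to estimate such a proportion from gene-wise p-values; this is the well-known lambda-threshold estimator in the spirit of Storey (2002), shown for orientation rather than as the authors' own method:

```python
import numpy as np

def storey_pi0(pvalues, lam=0.5):
    """Estimate pi0, the proportion of non-differential genes.

    Null p-values are uniform on [0, 1], so p-values above the
    threshold `lam` are mostly null; #{p > lam} / (m * (1 - lam))
    therefore estimates pi0 (slightly upward-biased, hence safe).
    """
    p = np.asarray(pvalues, dtype=float)
    pi0 = np.mean(p > lam) / (1.0 - lam)
    return min(pi0, 1.0)

# The proportion of differentially expressed genes is then 1 - pi0.
```

Larger lambda reduces bias from non-null genes leaking above the threshold, at the cost of higher variance in the estimate.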
A statistical framework for the design of microarray experiments and effective detection of differential gene expression
Four reasons why you might wish to read this paper: 1. We have devised a new
statistical T test to determine differentially expressed genes (DEG) in the
context of microarray experiments. This statistical test adds a new member to
the traditional T-test family. 2. An exact formula for calculating the
detection power of this T test is presented, which can also be fairly easily
modified to cover the traditional T tests. 3. We have presented an accurate yet
computationally very simple method to estimate the fraction of non-DEGs in a
set of genes being tested. This method is superior to an existing one that is
computationally far more involved. 4. We approach the multiple testing problem from
a fresh angle, and discuss its relation to the classical Bonferroni procedure
and to the FDR (false discovery rate) approach. This is most useful in the
analysis of microarray data, where typically several thousands of genes are
being tested simultaneously.

Comment: 9 pages, 1 table; to appear in Bioinformatics
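For context, a minimal sketch of the classical pipeline this paper extends: gene-wise Welch t-tests followed by Benjamini-Hochberg FDR control. The paper's new T test and power formula are not reproduced here, and the (genes x samples) layout is an assumption:

```python
import numpy as np
from scipy import stats

def de_genes_bh(x, y, alpha=0.05):
    """Gene-wise Welch t-tests with Benjamini-Hochberg FDR control.

    x, y: (genes x samples) expression arrays for the two groups.
    Returns a boolean mask marking genes called differentially
    expressed at FDR level alpha.
    """
    _, p = stats.ttest_ind(x, y, axis=1, equal_var=False)
    m = len(p)
    order = np.argsort(p)
    # BH step-up: find the largest k with p_(k) <= (k / m) * alpha.
    thresh = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresh
    k = below.nonzero()[0].max() + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject
```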
Size, power and false discovery rates
Modern scientific technology has provided a new class of large-scale
simultaneous inference problems, with thousands of hypothesis tests to consider
at the same time. Microarrays epitomize this type of technology, but similar
situations arise in proteomics, spectroscopy, imaging, and social science
surveys. This paper uses false discovery rate methods to carry out both size
and power calculations on large-scale problems. A simple empirical Bayes
approach allows the false discovery rate (fdr) analysis to proceed with a
minimum of frequentist or Bayesian modeling assumptions. Closed-form accuracy
formulas are derived for estimated false discovery rates, and used to compare
different methodologies: local or tail-area fdr's, theoretical, permutation, or
empirical null hypothesis estimates. Two microarray data sets as well as
simulations are used to evaluate the methodology, the power diagnostics showing
why nonnull cases might easily fail to appear on a list of "significant"
discoveries.

Comment: Published at http://dx.doi.org/10.1214/009053606000001460 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
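The quantity at the core of the paper is the local false discovery rate fdr(z) = pi0 * f0(z) / f(z). A minimal empirical-Bayes sketch, assuming z-scores as input, the theoretical N(0, 1) null for f0, and a kernel density estimate for the mixture f (the paper also treats permutation and empirical nulls, which are not shown):

```python
import numpy as np
from scipy import stats

def local_fdr(z, pi0=1.0):
    """Two-groups local fdr: fdr(z) = pi0 * f0(z) / f(z).

    f0 is the theoretical N(0, 1) null density; the mixture density
    f is estimated by a Gaussian kernel density over all z-scores.
    Leaving pi0 = 1 gives a conservative upper bound.
    """
    z = np.asarray(z, dtype=float)
    f = stats.gaussian_kde(z)(z)   # estimated mixture density f(z)
    f0 = stats.norm.pdf(z)         # theoretical null density
    return np.minimum(pi0 * f0 / f, 1.0)
```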
Microarrays, Empirical Bayes and the Two-Groups Model
The classic frequentist theory of hypothesis testing developed by Neyman,
Pearson and Fisher has a claim to being the twentieth century's most
influential piece of applied mathematics. Something new is happening in the
twenty-first century: high-throughput devices, such as microarrays, routinely
require simultaneous hypothesis tests for thousands of individual cases, not at
all what the classical theory had in mind. In these situations empirical Bayes
information begins to force itself upon frequentists and Bayesians alike. The
two-groups model is a simple Bayesian construction that facilitates empirical
Bayes analysis. This article concerns the interplay of Bayesian and frequentist
ideas in the two-groups setting, with particular attention focused on Benjamini
and Hochberg's False Discovery Rate method. Topics include the choice and
meaning of the null hypothesis in large-scale testing situations, power
considerations, the limitations of permutation methods, significance testing
for groups of cases (such as pathways in microarray studies), correlation
effects, multiple confidence intervals and Bayesian competitors to the
two-groups model.

Comment: This paper is commented on in [arXiv:0808.0582], [arXiv:0808.0593], [arXiv:0808.0597], and [arXiv:0808.0599]; rejoinder in [arXiv:0808.0603]. Published at http://dx.doi.org/10.1214/07-STS236 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)
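The tail-area counterpart that ties the two-groups model to Benjamini and Hochberg's procedure is Fdr(z) = pi0 * F0(z) / F(z), interpretable as the posterior probability that a case scoring at or below z is null. A sketch under the same theoretical-null assumption, taking F as the empirical CDF of the scores:

```python
import numpy as np
from scipy import stats

def tail_area_fdr(z, pi0=1.0):
    """Left-tail Fdr(z) = pi0 * F0(z) / F(z) in the two-groups model.

    F0 is the theoretical N(0, 1) CDF and F the empirical CDF of
    the observed scores, so the ratio estimates the posterior
    probability that a case with score <= z is null.
    """
    z = np.asarray(z, dtype=float)
    F0 = stats.norm.cdf(z)
    F = np.searchsorted(np.sort(z), z, side="right") / z.size
    return np.minimum(pi0 * F0 / F, 1.0)
```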
Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models
A challenging problem in estimating high-dimensional graphical models is to
choose the regularization parameter in a data-dependent way. The standard
techniques include K-fold cross-validation (K-CV), the Akaike information
criterion (AIC), and the Bayesian information criterion (BIC). Though these methods
work well for low-dimensional problems, they are not suitable in high
dimensional settings. In this paper, we present StARS: a new stability-based
method for choosing the regularization parameter in high dimensional inference
for undirected graphs. The method has a clear interpretation: we use the least
amount of regularization that simultaneously makes a graph sparse and
replicable under random sampling. This interpretation requires essentially no
conditions. Under mild conditions, we show that StARS is partially sparsistent
in terms of graph estimation: i.e. with high probability, all the true edges
will be included in the selected model even when the graph size diverges with
the sample size. Empirically, the performance of StARS is compared with the
state-of-the-art model selection procedures, including K-CV, AIC, and BIC, on
both synthetic data and a real microarray dataset. StARS outperforms all these
competing procedures.
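A sketch of the stability loop the abstract describes, assuming scikit-learn's GraphicalLasso as the base estimator and the subsample size b = floor(10 * sqrt(n)) suggested in the paper; the penalty grid, subsample count, and beta = 0.05 cutoff are all tunable choices:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

def stars_select(X, alphas, n_sub=20, beta=0.05, seed=0):
    """Pick the graphical-lasso penalty by edge stability (StARS-style).

    For each penalty, fit the graph on random subsamples of size
    b = floor(10 * sqrt(n)), record how often each edge appears, and
    measure instability as the mean of 2 * theta * (1 - theta) over
    edges. Return the least regularization whose monotonized
    instability stays below beta.
    """
    n, p = X.shape
    b = min(n, int(10 * np.sqrt(n)))
    rng = np.random.default_rng(seed)
    instability = []
    for alpha in alphas:
        freq = np.zeros((p, p))
        for _ in range(n_sub):
            rows = rng.choice(n, size=b, replace=False)
            prec = GraphicalLasso(alpha=alpha).fit(X[rows]).precision_
            freq += np.abs(prec) > 1e-8      # edge selected this time?
        theta = freq / n_sub                 # edge selection frequency
        xi = 2.0 * theta * (1.0 - theta)     # per-edge instability
        instability.append(xi[np.triu_indices(p, k=1)].mean())
    order = np.argsort(alphas)[::-1]         # most regularized first
    d_bar = np.maximum.accumulate(np.array(instability)[order])
    ok = np.nonzero(d_bar <= beta)[0]
    return np.asarray(alphas)[order][ok[-1]] if ok.size else max(alphas)
```

The running maximum mirrors the paper's monotonization: once the graph becomes unstable as regularization decreases, smaller penalties cannot be selected even if their raw instability dips back down.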
Tellipsoid: Exploiting inter-gene correlation for improved detection of differential gene expression
Motivation: Algorithms for differential analysis of microarray data are vital
to modern biomedical research. Their accuracy strongly depends on effective
treatment of inter-gene correlation. Correlation is ordinarily accounted for in
terms of its effect on significance cut-offs. In this paper it is shown that
correlation can, in fact, be exploited to share information across tests,
which, in turn, can increase statistical power.
Results: Vastly and demonstrably improved differential analysis approaches
are the result of combining identifiability (the fact that in most microarray
data sets, a large proportion of genes can be identified a priori as
non-differential) with optimization criteria that incorporate correlation. As a
special case, we develop a method which builds upon the widely used two-sample
t-statistic based approach and uses the Mahalanobis distance as an optimality
criterion. Results on the prostate cancer data of Singh et al. (2002) suggest
that the proposed method outperforms all published approaches in terms of
statistical power.
Availability: The proposed algorithm is implemented in MATLAB and in R. The
software, called Tellipsoid, and relevant data sets are available at
http://www.egr.msu.edu/~desaikey

Comment: 19 pages; submitted to Bioinformatics
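To make the named optimality criterion concrete, a sketch of the Mahalanobis distance between the two group mean profiles; the Ledoit-Wolf shrinkage covariance is our choice to keep the estimate invertible when genes far outnumber samples, and this illustrates the criterion only, not the Tellipsoid algorithm itself:

```python
import numpy as np
from sklearn.covariance import LedoitWolf

def mahalanobis_between_groups(x, y):
    """Mahalanobis distance between the two group mean profiles.

    x, y: (samples x genes) expression arrays for the two groups.
    A Ledoit-Wolf shrinkage estimate of the pooled covariance keeps
    the matrix invertible when genes far outnumber samples (an
    assumption of this sketch, not necessarily the paper's choice).
    """
    diff = x.mean(axis=0) - y.mean(axis=0)
    centered = np.vstack([x - x.mean(axis=0), y - y.mean(axis=0)])
    cov = LedoitWolf().fit(centered).covariance_
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))
```

Unlike gene-by-gene t-statistics, this distance weights the mean difference by the inverse covariance, which is how inter-gene correlation enters the criterion.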