Covariate-assisted ranking and screening for large-scale two-sample inference
Two-sample multiple testing has a wide range of applications. The conventional practice first reduces the original observations to a vector of p-values and then chooses a cutoff to adjust for multiplicity. However, this data reduction step could cause significant loss of information and thus lead to suboptimal testing procedures. We introduce a new framework for two-sample multiple testing by incorporating a carefully constructed auxiliary variable in inference to improve the power. A data-driven multiple-testing procedure is developed by employing a covariate-assisted ranking and screening (CARS) approach that optimally combines the information from both the primary and the auxiliary variables. The proposed CARS procedure is shown to be asymptotically valid and optimal for false discovery rate control. The procedure is implemented in the R package CARS. Numerical results confirm the effectiveness of CARS in false discovery rate control and show that it achieves substantial power gain over existing methods. CARS is also illustrated through an application to the analysis of a satellite imaging data set for supernova detection.
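The abstract contrasts CARS with the conventional practice of reducing the data to p-values and then choosing a multiplicity-adjusted cutoff. As a hedged illustration of that baseline only (not of CARS, which additionally exploits an auxiliary covariate), the Benjamini-Hochberg step-up rule can be sketched as follows; the function name and the FDR level `q` are illustrative assumptions.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float], q: float = 0.1) -> List[bool]:
    """Return one rejection flag per hypothesis using the BH step-up rule.

    Sort the p-values, find the largest rank k (1-based) with
    p_(k) <= k * q / m, and reject the k hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    if m == 0:
        return []
    # Indices of the hypotheses, ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0  # largest rank satisfying the BH condition (0 means no rejections)
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(p, q=0.05))  # only the first two are rejected
```

Everything here depends on the p-values alone, which is exactly the data reduction the CARS abstract argues can discard useful information.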
Strong approximations of level exceedences related to multiple hypothesis testing
Particularly in genomics, but also in other fields, it has become commonplace
to carry out very large numbers of Student's t-tests based on relatively small
sample sizes. The literature on this topic is continually expanding, but the
main approaches used to control the family-wise error rate and false discovery
rate are still based on the assumption that the tests are independent. The
independence condition is known to be false at the level of the joint
distributions of the test statistics, but that does not necessarily mean, for
the small significance levels involved in highly multiple hypothesis testing,
that the assumption leads to major errors. In this paper, we give conditions
under which the assumption of independence is valid. Specifically, we derive a
strong approximation that closely links the level exceedences of a dependent
"studentized process" to those of a process of independent random variables.
Via this connection, it can be seen that in high-dimensional, low sample-size
cases, provided the sample size diverges faster than the logarithm of the
number of tests, the assumption of independent t-tests is often justified.
Comment: Published in Bernoulli (http://isi.cbs.nl/bernoulli/) by the
International Statistical Institute/Bernoulli Society
(http://isi.cbs.nl/BS/bshome.htm) at http://dx.doi.org/10.3150/09-BEJ220
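To make the setting concrete, the sketch below computes Student's t statistic for one small sample and, for m tests, the Šidák per-test level, which gives a family-wise error rate of exactly alpha only when the tests are independent. This is the kind of independence-based calibration whose validity the paper examines; the paper itself works with strong approximations, not with this closed form. The helper names are illustrative assumptions.

```python
import math
import statistics
from typing import List


def one_sample_t(sample: List[float]) -> float:
    """Student's t statistic mean / (s / sqrt(n)) for H0: population mean = 0."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation, n - 1 denominator
    return statistics.fmean(sample) / (s / math.sqrt(n))


def sidak_level(alpha: float, m: int) -> float:
    """Per-test level giving a family-wise error rate of exactly alpha
    when all m tests are independent (Sidak correction)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def fwer_if_independent(level: float, m: int) -> float:
    """Probability of at least one false rejection among m independent
    true null hypotheses, each tested at the given per-test level."""
    return 1.0 - (1.0 - level) ** m


if __name__ == "__main__":
    m, alpha = 10_000, 0.05
    print(one_sample_t([1.0, 2.0, 3.0, 4.0, 5.0]))  # 3 * sqrt(2), about 4.243
    print(sidak_level(alpha, m))                      # about 5.13e-06
    print(fwer_if_independent(alpha / m, m))          # Bonferroni: about 0.0488
```

With dependent test statistics, neither formula is exact in general, which is why conditions such as the sample size growing faster than the logarithm of the number of tests matter.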
Diverse correlation structures in gene expression data and their utility in improving statistical inference
It is well known that correlations in microarray data represent a serious
nuisance that degrades the performance of gene selection procedures. This paper
is intended to demonstrate that the correlation structure of microarray data
provides a rich source of useful information. We discuss distinct correlation
substructures revealed in microarray gene expression data by an appropriate
ordering of genes. These substructures include stochastic proportionality of
expression signals in a large percentage of all gene pairs, negative
correlations hidden in ordered gene triples, and a long sequence of weakly
dependent random variables associated with ordered pairs of genes. The reported
striking regularities are of general biological interest and they also have
far-reaching implications for theory and practice of statistical methods of
microarray data analysis. We illustrate the latter point with a method for
testing differential expression of nonoverlapping gene pairs. While designed
for testing a different null hypothesis, this method controls the type I error
rate an order of magnitude more accurately than conventional methods of
individual gene expression profiling. In addition, this method is robust to
technical noise. Quantitative inference of the correlation
structure has the potential to extend the analysis of microarray data far
beyond currently practiced methods.
Comment: Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics
(http://www.imstat.org) at http://dx.doi.org/10.1214/07-AOAS120
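The idea of testing a gene pair instead of a single gene can be illustrated with a deliberately simplified sketch: take a given gene ordering, form nonoverlapping consecutive pairs, summarize each pair by its per-sample log-ratio (a natural summary when two signals are roughly proportional), and compare two conditions with Welch's t statistic. The pairing rule, the log-ratio summary, the Welch statistic and all function names are assumptions chosen for illustration; this is not the procedure from the paper.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def nonoverlapping_pairs(order: Sequence[int]) -> List[Tuple[int, int]]:
    """Pair consecutive genes of a given ordering: (g0, g1), (g2, g3), ...
    The ordering itself is an input; an unpaired trailing gene is dropped."""
    return [(order[k], order[k + 1]) for k in range(0, len(order) - 1, 2)]


def pair_log_ratios(expr: List[List[float]], pair: Tuple[int, int]) -> List[float]:
    """Per-sample log(x_i / x_j) for one gene pair; expr[gene][sample] must be > 0."""
    i, j = pair
    return [math.log(a) - math.log(b) for a, b in zip(expr[i], expr[j])]


def welch_t(x: List[float], y: List[float]) -> float:
    """Welch's two-sample t statistic for a difference in means."""
    vx, vy = statistics.variance(x), statistics.variance(y)
    return (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(vx / len(x) + vy / len(y))


if __name__ == "__main__":
    # Hypothetical toy data: 4 genes, 3 samples per condition (values made up).
    group_a = [[5.0, 6.0, 5.5], [2.5, 3.1, 2.7], [8.0, 7.5, 8.2], [4.0, 4.4, 3.9]]
    group_b = [[9.0, 9.5, 8.8], [2.4, 2.9, 2.6], [8.1, 7.7, 8.0], [4.2, 4.1, 4.0]]
    for pair in nonoverlapping_pairs([0, 1, 2, 3]):
        t = welch_t(pair_log_ratios(group_a, pair), pair_log_ratios(group_b, pair))
        print(pair, round(t, 2))
```

Because each gene appears in only one pair, the pair-level statistics share no genes, which is one simple way to limit the gene-to-gene dependence the abstract describes.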
Comment: Microarrays, Empirical Bayes and the Two-Group Model
Comment on "Microarrays, Empirical Bayes and the Two-Group Model"
[arXiv:0808.0572].
Comment: Published in Statistical Science (http://www.imstat.org/sts/) by the
Institute of Mathematical Statistics (http://www.imstat.org) at
http://dx.doi.org/10.1214/07-STS236C
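The entry above discusses Efron's two-group model, in which each z-value comes from the mixture f(z) = π0 f0(z) + π1 f1(z) and the local false discovery rate is fdr(z) = π0 f0(z) / f(z). As a minimal sketch, assuming a N(0, 1) null and a N(mu1, 1) non-null density with known parameters (all chosen for illustration, not taken from the comment), the quantity can be computed as follows.

```python
from statistics import NormalDist


def local_fdr(z: float, pi0: float, mu1: float) -> float:
    """Local false discovery rate under a two-group model with
    f0 = N(0, 1) (null) and f1 = N(mu1, 1) (non-null), both assumed here:
        f(z) = pi0 * f0(z) + (1 - pi0) * f1(z),  fdr(z) = pi0 * f0(z) / f(z)."""
    f0 = NormalDist(0.0, 1.0).pdf(z)
    f1 = NormalDist(mu1, 1.0).pdf(z)
    mixture = pi0 * f0 + (1.0 - pi0) * f1
    return pi0 * f0 / mixture


if __name__ == "__main__":
    for z in (0.0, 2.0, 4.0):
        print(z, round(local_fdr(z, pi0=0.9, mu1=2.5), 3))
    # roughly 0.995, 0.580 and 0.009
```

In practice π0 and the densities are unknown and must be estimated from the data, which is where the empirical Bayes part of the model comes in.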