    Strong approximations of level exceedences related to multiple hypothesis testing

    Particularly in genomics, but also in other fields, it has become commonplace to undertake highly multiple Student's t-tests based on relatively small sample sizes. The literature on this topic is continually expanding, but the main approaches used to control the family-wise error rate and false discovery rate are still based on the assumption that the tests are independent. The independence condition is known to be false at the level of the joint distributions of the test statistics, but that does not necessarily mean, for the small significance levels involved in highly multiple hypothesis testing, that the assumption leads to major errors. In this paper, we give conditions under which the assumption of independence is valid. Specifically, we derive a strong approximation that closely links the level exceedences of a dependent "studentized process" to those of a process of independent random variables. Via this connection, it can be seen that in high-dimensional, low sample-size cases, provided the sample size diverges faster than the logarithm of the number of tests, the assumption of independent t-tests is often justified.
    Comment: Published at http://dx.doi.org/10.3150/09-BEJ220 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm

    Diverse correlation structures in gene expression data and their utility in improving statistical inference

    It is well known that correlations in microarray data represent a serious nuisance deteriorating the performance of gene selection procedures. This paper is intended to demonstrate that the correlation structure of microarray data provides a rich source of useful information. We discuss distinct correlation substructures revealed in microarray gene expression data by an appropriate ordering of genes. These substructures include stochastic proportionality of expression signals in a large percentage of all gene pairs, negative correlations hidden in ordered gene triples, and a long sequence of weakly dependent random variables associated with ordered pairs of genes. The reported striking regularities are of general biological interest and they also have far-reaching implications for theory and practice of statistical methods of microarray data analysis. We illustrate the latter point with a method for testing differential expression of nonoverlapping gene pairs. While designed for testing a different null hypothesis, this method provides an order of magnitude more accurate control of type I error rate compared to conventional methods of individual gene expression profiling. In addition, this method is robust to technical noise. Quantitative inference of the correlation structure has the potential to extend the analysis of microarray data far beyond currently practiced methods.
    Comment: Published at http://dx.doi.org/10.1214/07-AOAS120 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Comment: Microarrays, Empirical Bayes and the Two-Group Model

    Comment on "Microarrays, Empirical Bayes and the Two-Group Model" [arXiv:0808.0572]
    Comment: Published at http://dx.doi.org/10.1214/07-STS236C in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org