17,010 research outputs found

    The Optimal Discovery Procedure: A New Approach to Simultaneous Significance Testing

    Get PDF
    Significance testing is one of the main objectives of statistics. The Neyman-Pearson lemma provides a simple rule for optimally testing a single hypothesis when the null and alternative distributions are known. This result has played a major role in the development of significance testing strategies that are used in practice. Most of the work extending single testing strategies to multiple tests has focused on formulating and estimating new types of significance measures, such as the false discovery rate. These methods tend to be based on p-values that are calculated from each test individually, ignoring information from the other tests. As shrinkage estimation borrows strength across point estimates to improve their overall performance, I show here that borrowing strength across multiple significance tests can improve their performance as well. The optimal discovery procedure (ODP) is introduced, which shows how to maximize the number of expected true positives for each fixed number of expected false positives. The optimality achieved by this procedure is shown to be closely related to optimality in terms of the false discovery rate. The ODP motivates a new approach to testing multiple hypotheses, especially when the tests are related. As a simple example, a new simultaneous procedure for testing several Normal means is defined; this is surprisingly demonstrated to outperform the optimal single test procedure, showing that an optimal method for single tests may no longer be optimal in the multiple test setting. Connections to other concepts in statistics are discussed, including Stein\u27s paradox, shrinkage estimation, and Bayesian classification theory

    Unsupervised empirical Bayesian multiple testing with external covariates

    Full text link
    In an empirical Bayesian setting, we provide a new multiple testing method, useful when an additional covariate is available, that influences the probability of each null hypothesis being true. We measure the posterior significance of each test conditionally on the covariate and the data, leading to greater power. Using covariate-based prior information in an unsupervised fashion, we produce a list of significant hypotheses which differs in length and order from the list obtained by methods not taking covariate-information into account. Covariate-modulated posterior probabilities of each null hypothesis are estimated using a fast approximate algorithm. The new method is applied to expression quantitative trait loci (eQTL) data.Comment: Published in at http://dx.doi.org/10.1214/08-AOAS158 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Recent developments towards optimality in multiple hypothesis testing

    Full text link
    There are many different notions of optimality even in testing a single hypothesis. In the multiple testing area, the number of possibilities is very much greater. The paper first will describe multiplicity issues that arise in tests involving a single parameter, and will describe a new optimality result in that context. Although the example given is of minimal practical importance, it illustrates the crucial dependence of optimality on the precise specification of the testing problem. The paper then will discuss the types of expanded optimality criteria that are being considered when hypotheses involve multiple parameters, will note a few new optimality results, and will give selected theoretical references relevant to optimality considerations under these expanded criteria.Comment: Published at http://dx.doi.org/10.1214/074921706000000374 in the IMS Lecture Notes--Monograph Series (http://www.imstat.org/publications/lecnotes.htm) by the Institute of Mathematical Statistics (http://www.imstat.org

    A comparison of the Benjamini-Hochberg procedure with some Bayesian rules for multiple testing

    Full text link
    In the spirit of modeling inference for microarrays as multiple testing for sparse mixtures, we present a similar approach to a simplified version of quantitative trait loci (QTL) mapping. Unlike in case of microarrays, where the number of tests usually reaches tens of thousands, the number of tests performed in scans for QTL usually does not exceed several hundreds. However, in typical cases, the sparsity pp of significant alternatives for QTL mapping is in the same range as for microarrays. For methodological interest, as well as some related applications, we also consider non-sparse mixtures. Using simulations as well as theoretical observations we study false discovery rate (FDR), power and misclassification probability for the Benjamini-Hochberg (BH) procedure and its modifications, as well as for various parametric and nonparametric Bayes and Parametric Empirical Bayes procedures. Our results confirm the observation of Genovese and Wasserman (2002) that for small p the misclassification error of BH is close to optimal in the sense of attaining the Bayes oracle. This property is shared by some of the considered Bayes testing rules, which in general perform better than BH for large or moderate pp's.Comment: Published in at http://dx.doi.org/10.1214/193940307000000158 the IMS Collections (http://www.imstat.org/publications/imscollections.htm) by the Institute of Mathematical Statistics (http://www.imstat.org
    • …
    corecore