    Familywise Error Rate Control via Knockoffs

    We present a novel method for controlling the k-familywise error rate (k-FWER) in the linear regression setting using the knockoffs framework first introduced by Barber and Candès. Our procedure, which we also refer to as knockoffs, can be applied with any design matrix with at least as many observations as variables, and does not require knowing the noise variance. Unlike other multiple testing procedures, which act directly on p-values, knockoffs is specifically tailored to linear regression and implicitly accounts for the statistical relationships between hypothesis tests of different coefficients. We prove that knockoffs controls the k-FWER exactly in finite samples and show in simulations that it provides superior power to alternative procedures over a range of linear regression problems. We also discuss extensions to controlling other Type I error rates, such as the false exceedance rate, and use the method to identify candidate mutations conferring drug resistance in HIV.
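    A rough illustration of the sign-sequence idea behind knockoff-based k-FWER control: walk the knockoff statistics in decreasing magnitude, provisionally rejecting variables with positive statistics, and stop at the v-th negative sign. This is a hedged sketch, not the authors' exact procedure: the statistics W are assumed given, and the stopping count v stands in for the paper's negative-binomial calibration tied to k and the target level.

    ```python
    import numpy as np

    def kfwer_knockoffs_sketch(W, v):
        """Sign-sequence thresholding on knockoff statistics W.

        Under the null, the signs of the W[j] behave like i.i.d. fair coin
        flips, which is what makes a stopping count v calibratable so that
        P(at least k false rejections) <= alpha; here v is simply an input.
        """
        order = np.argsort(-np.abs(W))       # indices from largest |W| down
        rejected, negatives = [], 0
        for j in order:
            if W[j] < 0:
                negatives += 1
                if negatives >= v:           # stop at the v-th negative sign
                    break
            elif W[j] > 0:
                rejected.append(j)           # provisional rejection
        return rejected

    # Toy usage: six variables, stop once two negative signs have been seen.
    W = np.array([5.0, -4.2, 3.1, 2.5, -1.0, 0.7])
    print(kfwer_knockoffs_sketch(W, v=2))    # -> [0, 2, 3]
    ```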

    A robust procedure for comparing multiple means under heteroscedasticity in unbalanced designs

    Investigating differences between the means of more than two groups or experimental conditions is a routine research question in biology. To assess such differences statistically, multiple comparison procedures are applied. The most prominent procedures of this type, the Dunnett and Tukey-Kramer tests, control the probability of reporting at least one false positive result when the data are normally distributed and when the sample sizes and variances do not differ between groups. All three assumptions are unrealistic in biological research, and any violation leads to an increased number of reported false positive results. Based on a general statistical framework for simultaneous inference and robust covariance estimators, we propose a new multiple comparison procedure for assessing multiple means. In contrast to the Dunnett or Tukey-Kramer tests, no assumptions regarding the distribution, sample sizes, or variance homogeneity are necessary. The performance of the new procedure is assessed by means of its familywise error rate and power under different distributions. Its practical merits are demonstrated by a reanalysis of fatty acid phenotypes of the bacterium Bacillus simplex from the "Evolution Canyons" I and II in Israel. The simulation results show that even under severely varying variances, the procedure controls the number of false positive findings very well. Thus, the procedure presented here works well under biologically realistic scenarios of unbalanced group sizes, non-normality, and heteroscedasticity.
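    The paper's simultaneous inference is built on robust (sandwich) covariance estimation; the Python sketch below captures the same ingredients with two stated substitutions: statsmodels' HC3 heteroscedasticity-consistent covariance in place of the pooled-variance assumption behind Dunnett and Tukey-Kramer, and a Holm step-down in place of the joint max-t adjustment the authors use. The data and group sizes are invented for illustration.

    ```python
    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.multitest import multipletests

    # Toy data: three unbalanced groups with unequal variances.
    rng = np.random.default_rng(0)
    y = np.concatenate([rng.normal(0.0, 1.0, 10),
                        rng.normal(0.5, 3.0, 25),
                        rng.normal(1.0, 0.5, 40)])
    g = np.repeat([0, 1, 2], [10, 25, 40])
    X = np.column_stack([(g == level).astype(float) for level in range(3)])  # cell-means coding

    # Sandwich (HC3) covariance instead of a pooled variance estimate.
    fit = sm.OLS(y, X).fit(cov_type="HC3")

    # All pairwise mean differences (Tukey-type contrasts).
    contrasts = np.array([[1, -1, 0], [1, 0, -1], [0, 1, -1]])
    pvals = [float(fit.t_test(c).pvalue) for c in contrasts]

    # Holm step-down standing in for the joint max-t adjustment.
    reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="holm")
    print(p_adj, reject)
    ```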

    Recent developments towards optimality in multiple hypothesis testing

    There are many different notions of optimality even in testing a single hypothesis. In the multiple testing area, the number of possibilities is far greater. The paper will first describe multiplicity issues that arise in tests involving a single parameter and present a new optimality result in that context. Although the example given is of minimal practical importance, it illustrates the crucial dependence of optimality on the precise specification of the testing problem. The paper will then discuss the types of expanded optimality criteria being considered when hypotheses involve multiple parameters, note a few new optimality results, and give selected theoretical references relevant to optimality considerations under these expanded criteria.

    A Rejection Principle for Sequential Tests of Multiple Hypotheses Controlling Familywise Error Rates

    We present a unifying approach to multiple testing procedures for sequential (or streaming) data by giving sufficient conditions for a sequential multiple testing procedure to control the familywise error rate (FWER), extending to the sequential domain the work of Goeman and Solari (2010), who accomplished this for fixed-sample-size procedures. Together we call these conditions the "rejection principle for sequential tests," which we then apply to some existing sequential multiple testing procedures to give a simplified understanding of their FWER control. Next, the principle is applied to derive two new sequential multiple testing procedures with provable FWER control, one for testing hypotheses in order and another for closed testing. Examples of these new procedures are given by applying them to a chromosome aberration data set and to finding the maximum safe dose of a treatment.
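    For the "testing hypotheses in order" case, the fixed-sample analogue covered by rejection principles of this kind is the classical fixed-sequence procedure; the sketch below shows that fixed-sample version (not the sequential-data procedure the paper derives), assuming the p-values arrive in the pre-specified testing order.

    ```python
    def fixed_sequence_test(pvalues, alpha=0.05):
        """Test hypotheses in their pre-specified order, each at full level
        alpha, and stop at the first non-rejection; untested hypotheses are
        retained. This classical procedure controls the FWER at alpha."""
        rejected = []
        for i, p in enumerate(pvalues):
            if p <= alpha:
                rejected.append(i)
            else:
                break  # first non-rejection: stop testing
        return rejected

    print(fixed_sequence_test([0.001, 0.02, 0.30, 0.01]))  # -> [0, 1]; index 3 is never tested
    ```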

    Statistical significance in high-dimensional linear models

    We propose a method for constructing p-values for general hypotheses in a high-dimensional linear model. The hypotheses can be local, for testing a single regression parameter, or more global, involving several up to all parameters. Furthermore, when considering many hypotheses, we show how to adjust for multiple testing while taking the dependence among the p-values into account. Our technique is based on ridge estimation with an additional correction term that accounts for a substantial projection bias in high dimensions. We prove strong error control for our p-values and provide sufficient conditions for detection: for the former, we make no assumption on the size of the true underlying regression coefficients, while for the latter, our procedure might not be optimal in terms of power. We demonstrate the method on simulated examples and a real data application.
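    The adjustment-for-dependence step can be illustrated generically: given z-statistics and their joint covariance, single-step adjusted p-values compare each |z_j| with the simulated null distribution of the maximum. In the paper the statistics and covariance come out of the bias-corrected ridge fit; the sketch below treats both as given inputs, so it shows only the multiplicity step.

    ```python
    import numpy as np

    def max_z_adjusted_pvalues(z, Sigma, n_sim=100_000, seed=0):
        """Single-step adjusted p-values under dependence:
        p_adj[j] = P(max_k |Z_k| >= |z[j]|) for Z ~ N(0, Sigma),
        estimated by Monte Carlo."""
        rng = np.random.default_rng(seed)
        draws = rng.multivariate_normal(np.zeros(len(z)), Sigma, size=n_sim)
        max_abs = np.abs(draws).max(axis=1)          # null distribution of the maximum
        return np.array([(max_abs >= abs(zj)).mean() for zj in z])
    ```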

    Genome-Wide Significance Levels and Weighted Hypothesis Testing

    Genetic investigations often involve the testing of vast numbers of related hypotheses simultaneously. To control the overall error rate, a substantial penalty is required, making it difficult to detect signals of moderate strength. To improve power in this setting, a number of authors have considered using weighted p-values, with the motivation often based upon the scientific plausibility of the hypotheses. We review this literature, derive optimal weights, and show that the power is remarkably robust to misspecification of these weights. We consider two methods for choosing weights in practice. The first, external weighting, is based on prior information. The second, estimated weighting, uses the data to choose weights.
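    The simplest concrete instance of this idea is the weighted Bonferroni rule: with nonnegative weights averaging to one, reject H_j whenever p_j <= alpha * w_j / m, which preserves the usual FWER guarantee while shifting power toward hypotheses judged plausible a priori. A minimal sketch, with made-up weights:

    ```python
    import numpy as np

    def weighted_bonferroni(pvals, weights, alpha=0.05):
        """Reject H_j when p_j <= alpha * w_j / m. Nonnegative weights that
        average to one keep the sum of per-test levels equal to alpha, so
        the usual Bonferroni FWER guarantee is preserved."""
        pvals, weights = np.asarray(pvals), np.asarray(weights)
        m = len(pvals)
        assert np.all(weights >= 0) and np.isclose(weights.mean(), 1.0)
        return pvals <= alpha * weights / m

    # Up-weight the first two (a priori plausible) hypotheses, down-weight the rest.
    print(weighted_bonferroni([0.004, 0.03, 0.004, 0.20],
                              weights=[1.8, 1.8, 0.2, 0.2]))
    ```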

    Adaptive, Group Sequential Designs that Balance the Benefits and Risks of Wider Inclusion Criteria

    We propose a new class of adaptive randomized trial designs aimed at gaining the advantages of wider generalizability and faster recruitment while mitigating the risks of including a population for which there is greater a priori uncertainty. Our designs use adaptive enrichment, i.e., they have preplanned decision rules for modifying enrollment criteria based on data accrued at interim analyses. For example, enrollment can be restricted if the participants from predefined subpopulations are not benefiting from the new treatment. To the best of our knowledge, our designs are the first adaptive enrichment designs to have all of the following features: the multiple testing procedure fully leverages the correlation among statistics for different populations; the familywise Type I error rate is strongly controlled; and, for outcomes that are binary, normally distributed, or Poisson distributed, the decision rule and multiple testing procedure are functions of the data only through minimal sufficient statistics. The advantage of relying solely on minimal sufficient statistics is that procedures which do not can suffer losses in power. Our designs incorporate standard group sequential boundaries for each population of interest; this may help in communicating the designs, since many clinical investigators are familiar with such boundaries, which can be summarized succinctly in a single table or graph. We demonstrate these adaptive designs in the context of a Phase III trial of a new treatment for stroke, and provide user-friendly, free software implementing them.
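    The flavor of a preplanned enrollment-modification rule can be conveyed with a toy interim decision function; the thresholds below are hypothetical placeholders, not the calibrated group-sequential boundaries of the designs described above.

    ```python
    def interim_enrichment_decision(z_sub, z_total, futility=0.0, efficacy=2.8):
        """Toy interim rule in the spirit of adaptive enrichment:
        - stop early for efficacy if the overall z-statistic crosses a
          group-sequential efficacy boundary (placeholder value 2.8);
        - otherwise restrict enrollment to the complementary subpopulation
          if the predefined subpopulation shows no sign of benefit;
        - otherwise continue enrolling the full population."""
        if z_total >= efficacy:
            return "stop_efficacy"
        if z_sub <= futility:
            return "restrict_enrollment"
        return "continue"

    print(interim_enrichment_decision(z_sub=-0.4, z_total=1.9))  # -> 'restrict_enrollment'
    ```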