86 research outputs found

    The Interval Property in Multiple Testing of Pairwise Differences

    The usual step-down and step-up multiple testing procedures most often lack an important intuitive, practical, and theoretical property called the interval property. In short, the interval property requires that, for each individual hypothesis among the several to be tested, the acceptance sections of the relevant statistics are intervals. Lack of the interval property is a serious shortcoming. This shortcoming is demonstrated for testing various pairwise comparisons in multinomial models, multivariate normal models, and nonparametric models. Residual-based stepwise multiple testing procedures that do have the interval property are offered in all these cases. Comment: Published at http://dx.doi.org/10.1214/11-STS372 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).
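
    As an illustrative sketch in generic notation (the symbols below are not taken from the paper), the interval property asks that the acceptance section for each hypothesis, viewed as a function of its relevant statistic with the other statistics held fixed, be an interval:

        % Generic notation; the paper's own definitions may differ.
        \[
          A_i(\mathbf{t}_{-i}) \;=\; \{\, t_i : \text{the procedure accepts } H_i \text{ at } (t_i,\mathbf{t}_{-i}) \,\}
          \;=\; \big[\, a_i(\mathbf{t}_{-i}),\; b_i(\mathbf{t}_{-i}) \,\big].
        \]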

    Characterization of Bayes procedures for multiple endpoint problems and inadmissibility of the step-up procedure

    The problem of multiple endpoint testing for k endpoints is treated as a 2^k finite action problem. The loss function chosen is a vector loss function consisting of two components, which lead to a vector risk. One component of the vector risk is the false rejection rate (FRR), that is, the expected number of false rejections. The other component is the false acceptance rate (FAR), that is, the expected number of acceptances for which the corresponding null hypothesis is false. This loss function is more stringent than the positive linear combination loss function of Lehmann [Ann. Math. Statist. 28 (1957) 1-25] and Cohen and Sackrowitz [Ann. Statist. 33 (2005) 126-144] in the sense that the class of admissible rules is larger for the vector risk formulation than for the linear combination risk function; in other words, fewer procedures are inadmissible under the vector risk formulation. The statistical model assumed is that the vector of variables Z is multivariate normal with mean vector \mu and known intraclass covariance matrix \Sigma. The endpoint hypotheses are H_i: \mu_i = 0 vs. K_i: \mu_i > 0, i = 1, ..., k. A characterization of all symmetric Bayes procedures and their limits is obtained, and it leads to a complete class theorem. The complete class theorem is used to provide a useful necessary condition for admissibility of a procedure. The main result is that the step-up multiple endpoint procedure is inadmissible. Comment: Published at http://dx.doi.org/10.1214/009053604000000986 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
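
    In generic notation (not the paper's exact symbols), the two risk components can be written as expected counts of the two kinds of errors, with d_i = 1 denoting rejection of H_i:

        % Sketch in generic notation; d_i = 1 means "reject H_i", d_i = 0 means "accept H_i".
        \[
          \mathrm{FRR} \;=\; E\Big[\sum_{i=1}^{k} d_i\,\mathbf{1}\{\mu_i = 0\}\Big],
          \qquad
          \mathrm{FAR} \;=\; E\Big[\sum_{i=1}^{k} (1 - d_i)\,\mathbf{1}\{\mu_i > 0\}\Big].
        \]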

    Decision theory results for one-sided multiple comparison procedures

    A resurgence of interest in multiple hypothesis testing has occurred in the last decade. Motivated by studies in genomics, microarrays, DNA sequencing, drug screening, clinical trials, bioassays, education, and psychology, statisticians have devoted considerable research energy to the proper analysis of multiple endpoint data, and in response to new applications, new criteria, and new methodology, many ad hoc procedures have emerged. The classical requirement has been to use procedures that control the strong familywise error rate (FWE) at some predetermined level \alpha; that is, the probability of any false rejection of a true null hypothesis should be less than or equal to \alpha. Finding desirable and powerful multiple test procedures is difficult under this requirement. A more recent idea is to control the false discovery rate (FDR), that is, the expected proportion of rejected hypotheses that are, in fact, true. Many multiple test procedures do control the FDR. A much earlier approach to multiple testing was formulated by Lehmann [Ann. Math. Statist. 23 (1952) 541-552 and 28 (1957) 1-25]. Lehmann's approach is decision theoretic: he treats the multiple endpoints problem as a 2^k finite action problem when there are k endpoints. This approach is appealing because, unlike the FWE and FDR criteria, the finite action approach pays attention to false acceptances as well as false rejections. Comment: Published at http://dx.doi.org/10.1214/009053604000000968 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
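
    A minimal sketch of the two error criteria just mentioned, assuming independent one-sided tests and using hypothetical function names (illustrative code, not code from any of the cited papers):

        import numpy as np
        from scipy.stats import norm

        def bonferroni_reject(pvals, alpha=0.05):
            """Strong familywise error (FWE) control: reject H_i when p_i <= alpha / k."""
            pvals = np.asarray(pvals)
            return pvals <= alpha / len(pvals)

        def false_discovery_proportion(rejected, is_null):
            """Share of rejections that are true nulls; the FDR is its expectation."""
            r = rejected.sum()
            return 0.0 if r == 0 else (rejected & is_null).sum() / r

        # Toy data: 90 true nulls (mean 0) and 10 false nulls (mean 3), one-sided z-tests.
        rng = np.random.default_rng(0)
        is_null = np.arange(100) < 90
        z = np.where(is_null, rng.normal(0.0, 1.0, 100), rng.normal(3.0, 1.0, 100))
        pvals = norm.sf(z)                               # one-sided p-values
        rejected = bonferroni_reject(pvals)
        print(rejected.sum(), false_discovery_proportion(rejected, is_null))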

    An Alternative to Student's t-Test for Problems With Indifference Zones

    Consider a sample from a normal population with mean μ and variance unknown. Suppose it is desired to test H0: μ ≤ μ0 versus H1: μ ≥ μ1, with the region H_I: μ0 < μ < μ1 being a (nonempty) indifference zone. It is shown that the usual Student's t-test is inadmissible for this problem. An alternative test is proposed. The two-sided problem with an indifference region is also discussed. By contrast with the above result, the usual Student's t-test is admissible here. However, the two-sided version of the alternative test mentioned above does offer some practical advantages relative to the two-sided t-test. A 3-decision version of the two-sided problem is also discussed; here the t-test is inadmissible and is dominated by the appropriate version of the alternative test. The results concerning tests are also reformulated as results about confidence procedures.
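
    For concreteness, a minimal sketch of the baseline procedure discussed above: the usual one-sided Student's t-test of H0: μ ≤ μ0, applied to data whose indifference zone would be μ0 < μ < μ1 (the improved alternative test proposed in the paper is not reproduced here; the sample and cutoffs are hypothetical):

        import numpy as np
        from scipy import stats

        def one_sided_t_test(x, mu0, alpha=0.05):
            """Usual Student's t-test of H0: mu <= mu0 against larger means."""
            x = np.asarray(x, dtype=float)
            n = len(x)
            t = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))
            p = stats.t.sf(t, df=n - 1)          # P(T_{n-1} >= t)
            return t, p, p <= alpha              # reject H0?

        rng = np.random.default_rng(1)
        sample = rng.normal(loc=0.8, scale=1.0, size=25)   # hypothetical sample
        print(one_sided_t_test(sample, mu0=0.0))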

    A new multiple testing method in the dependent case

    The most popular multiple testing procedures are stepwise procedures based on p-values for individual test statistics. Included among these are the false discovery rate (FDR) controlling procedures of Benjamini and Hochberg [J. Roy. Statist. Soc. Ser. B 57 (1995) 289-300] and their offspring. Even for models that entail dependent data, p-values based on marginal distributions are used. Unlike such methods, the new method takes dependency into account at all stages. Furthermore, the p-value procedures often lack an intuitive convexity property, which is needed for admissibility. Still further, the new methodology is computationally feasible. If the number of tests is large and the proportion of true alternatives is less than, say, 25 percent, simulations demonstrate a clear preference for the new methodology. Applications are detailed for models such as testing treatments against a control (or any intraclass correlation model), testing for change points, and testing means when the correlation is successive. Comment: Published at http://dx.doi.org/10.1214/08-AOS616 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
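
    As a point of reference for the p-value-based benchmark mentioned above, a minimal sketch of the Benjamini-Hochberg step-up procedure applied to marginal p-values (the new dependency-aware method itself is not reproduced here):

        import numpy as np

        def benjamini_hochberg(pvals, q=0.05):
            """BH step-up: reject the hypotheses with the k smallest p-values,
            where k is the largest index with p_(k) <= q * k / m."""
            p = np.asarray(pvals)
            m = len(p)
            order = np.argsort(p)
            below = p[order] <= q * np.arange(1, m + 1) / m
            rejected = np.zeros(m, dtype=bool)
            if below.any():
                k = np.max(np.nonzero(below)[0]) + 1     # number of rejections
                rejected[order[:k]] = True
            return rejected

        print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.27, 0.60]))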

    The earth is flat (p < 0.05): significance thresholds and the crisis of unreplicable research

    The widespread use of ‘statistical significance’ as a license for making a claim of a scientific finding leads to considerable distortion of the scientific process (according to the American Statistical Association). We review why degrading p-values into ‘significant’ and ‘nonsignificant’ contributes to making studies irreproducible, or to making them seem irreproducible. A major problem is that we tend to take small p-values at face value but mistrust results with larger p-values. In either case, p-values tell little about the reliability of research, because they are hardly replicable even if an alternative hypothesis is true. Significance (p ≤ 0.05) is also hardly replicable: at a good statistical power of 80%, two studies will be ‘conflicting’, meaning that one is significant and the other is not, in one third of cases if there is a true effect. A replication can therefore not be interpreted as having failed only because it is nonsignificant. Many apparent replication failures may thus reflect faulty judgment based on significance thresholds rather than a crisis of unreplicable research. Reliable conclusions on replicability and practical importance of a finding can only be drawn using cumulative evidence from multiple independent studies. However, applying significance thresholds makes cumulative knowledge unreliable. One reason is that with anything but ideal statistical power, significant effect sizes will be biased upwards. Interpreting inflated significant results while ignoring nonsignificant results will thus lead to wrong conclusions. But current incentives to hunt for significance lead to selective reporting and to publication bias against nonsignificant findings. Data dredging, p-hacking, and publication bias should be addressed by removing fixed significance thresholds. Consistent with the recommendations of the late Ronald Fisher, p-values should be interpreted as graded measures of the strength of evidence against the null hypothesis. Larger p-values also offer some evidence against the null hypothesis, and they cannot be interpreted as supporting the null hypothesis or as grounds for falsely concluding that ‘there is no effect’. Information on possible true effect sizes that are compatible with the data must be obtained from the point estimate, e.g., a sample average, and from the interval estimate, such as a confidence interval. We review how confusion about the interpretation of larger p-values can be traced back to historical disputes among the founders of modern statistics. We further discuss potential arguments against removing significance thresholds, for example that decision rules should rather be more stringent, that sample sizes could decrease, or that p-values should better be abandoned completely. We conclude that whatever method of statistical inference we use, dichotomous threshold thinking must give way to non-automated informed judgment.
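
    The 'one third' figure above follows directly from 80% power for two independent studies of a true effect:

        % Each study detects the true effect with probability 0.80; the studies conflict
        % when exactly one of the two is significant.
        \[
          P(\text{conflicting results}) \;=\; 2 \times 0.80 \times (1 - 0.80) \;=\; 0.32 \;\approx\; \tfrac{1}{3}.
        \]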

    Effect of angiotensin-converting enzyme inhibitor and angiotensin receptor blocker initiation on organ support-free days in patients hospitalized with COVID-19

    IMPORTANCE Overactivation of the renin-angiotensin system (RAS) may contribute to poor clinical outcomes in patients with COVID-19. OBJECTIVE To determine whether angiotensin-converting enzyme (ACE) inhibitor or angiotensin receptor blocker (ARB) initiation improves outcomes in patients hospitalized for COVID-19. DESIGN, SETTING, AND PARTICIPANTS In an ongoing, adaptive platform randomized clinical trial, 721 critically ill and 58 non–critically ill hospitalized adults were randomized to receive an RAS inhibitor or control between March 16, 2021, and February 25, 2022, at 69 sites in 7 countries (final follow-up on June 1, 2022). INTERVENTIONS Patients were randomized to receive open-label initiation of an ACE inhibitor (n = 257), ARB (n = 248), ARB in combination with DMX-200 (a chemokine receptor-2 inhibitor; n = 10), or no RAS inhibitor (control; n = 264) for up to 10 days. MAIN OUTCOMES AND MEASURES The primary outcome was organ support–free days, a composite of hospital survival and days alive without cardiovascular or respiratory organ support through 21 days. The primary analysis was a bayesian cumulative logistic model. Odds ratios (ORs) greater than 1 represent improved outcomes. RESULTS On February 25, 2022, enrollment was discontinued due to safety concerns. Among 679 critically ill patients with available primary outcome data, the median age was 56 years and 239 participants (35.2%) were women. Median (IQR) organ support–free days among critically ill patients was 10 (–1 to 16) in the ACE inhibitor group (n = 231), 8 (–1 to 17) in the ARB group (n = 217), and 12 (0 to 17) in the control group (n = 231) (median adjusted odds ratios of 0.77 [95% bayesian credible interval, 0.58-1.06] for improvement for ACE inhibitor and 0.76 [95% credible interval, 0.56-1.05] for ARB compared with control). The posterior probabilities that ACE inhibitors and ARBs worsened organ support–free days compared with control were 94.9% and 95.4%, respectively. Hospital survival occurred in 166 of 231 critically ill participants (71.9%) in the ACE inhibitor group, 152 of 217 (70.0%) in the ARB group, and 182 of 231 (78.8%) in the control group (posterior probabilities that ACE inhibitor and ARB worsened hospital survival compared with control were 95.3% and 98.1%, respectively). CONCLUSIONS AND RELEVANCE In this trial, among critically ill adults with COVID-19, initiation of an ACE inhibitor or ARB did not improve, and likely worsened, clinical outcomes. TRIAL REGISTRATION ClinicalTrials.gov Identifier: NCT0273570
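
    As a rough plausibility check on the reported posterior probabilities (a back-of-the-envelope normal approximation on the log-odds scale, not the trial's actual bayesian cumulative logistic model):

        import numpy as np
        from scipy.stats import norm

        def approx_prob_harm(or_point, cri_low, cri_high):
            """Approximate P(OR < 1), assuming the posterior of log(OR) is roughly normal
            with the 95% credible interval spanning about +/- 1.96 posterior SDs.
            NOT the trial's bayesian cumulative logistic model."""
            mu = np.log(or_point)
            sd = (np.log(cri_high) - np.log(cri_low)) / (2 * 1.96)
            return norm.cdf(-mu / sd)

        # ACE-inhibitor arm: OR 0.77 (95% CrI 0.58-1.06); the trial reports ~94.9%.
        print(round(approx_prob_harm(0.77, 0.58, 1.06), 3))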

    The influence of the nonrecent past in prediction for stochastic processes

    Consider the stochastic processes X_1, X_2, ... and Λ_1, Λ_2, ..., where the X process can be thought of as observations on the Λ process. We investigate the asymptotic behavior of the conditional distributions of X_{t+v} given X_1, ..., X_t and of Λ_{t+v} given X_1, ..., X_t with regard to their dependence on the "early" part of the X process. These distributions arise in various time series and sequential decision theory problems. The results support the intuitively reasonable and often used (as a basic tenet of model building) assumption that only the more recent past is needed for near-optimal prediction. Keywords: stochastic process, prediction, martingale, Markov process, stationary process.
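
    In generic notation (not necessarily the paper's own symbols), the objects studied are the conditional laws

        % The question is how much these conditional laws still depend on the early
        % observations X_1, ..., X_m once t is large.
        \[
          \mathcal{L}\big(X_{t+v} \mid X_1, \ldots, X_t\big)
          \quad\text{and}\quad
          \mathcal{L}\big(\Lambda_{t+v} \mid X_1, \ldots, X_t\big).
        \]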