    The sceptical Bayes factor for the assessment of replication success

    There is an urgent need to develop new methodology for the design and analysis of replication studies. Recently, a reverse-Bayes method called the sceptical p-value has been proposed for this purpose; the inversion of Bayes' theorem allows us to mathematically formalise the notion of scepticism, which in turn can be used to assess the agreement between the findings of an original study and its replication. However, despite its Bayesian nature, the method relies on tail probabilities as primary inference tools. Here, we present an extension that uses Bayes factors as an alternative means of quantifying evidence. This leads to a new measure for evaluating replication success, the sceptical Bayes factor. Conceptually, the sceptical Bayes factor provides a bound for the maximum level of evidence at which an advocate of the original finding can convince a sceptic who does not trust it, in light of the replication data. While the sceptical p-value can only quantify the conflict between the sceptical prior and the observed replication data, the sceptical Bayes factor also takes into account how likely the data are under the posterior distribution of the effect conditional on the original study, allowing for stronger statements about replication success. Moreover, the proposed method elegantly combines traditional notions of replication success: it requires that both studies show evidence against the null, while at the same time penalising incompatibility of their effect estimates. Case studies from the Reproducibility Project: Cancer Biology and the Social Sciences Replication Project show the advantages of the method for the quantitative assessment of replicability.
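
    To fix ideas, the central quantity can be written explicitly in the normal approximation; the display below is a hedged reconstruction consistent with this abstract, not a formula quoted from the paper. Writing theta_o and theta_r for the original and replication effect estimates with standard errors sigma_o and sigma_r, taking the sceptical prior as N(0, tau^2) and the advocacy prior as the original study's posterior N(theta_o, sigma_o^2), the replication data contrast sceptic and advocate through

```latex
\mathrm{BF}_{\mathrm{S:A}}(\hat{\theta}_r)
  = \frac{\mathrm{N}\!\left(\hat{\theta}_r \mid 0,\ \tau^2 + \sigma_r^2\right)}
         {\mathrm{N}\!\left(\hat{\theta}_r \mid \hat{\theta}_o,\ \sigma_o^2 + \sigma_r^2\right)}
```

    where N(x | m, v) denotes a normal density with mean m and variance v. The sceptical Bayes factor then arises by calibrating tau^2 against the evidence in the original study and optimising over the scepticism level, as described above.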

    The sceptical Bayes factor for the assessment of replication success

    Replication studies are increasingly conducted, but there is no established statistical criterion for replication success. We propose a novel approach combining reverse-Bayes analysis with Bayesian hypothesis testing: a sceptical prior is determined for the effect size such that the original finding is no longer convincing in terms of a Bayes factor. This prior is then contrasted to an advocacy prior (the reference posterior of the effect size based on the original study), and replication success is declared if the replication data favour the advocacy prior over the sceptical prior at a higher level than the original data favoured the sceptical prior over the null hypothesis. The sceptical Bayes factor is the highest level at which replication success can be declared. A comparison to existing methods reveals that the sceptical Bayes factor combines several notions of replicability: it ensures that both studies show sufficient evidence against the null and penalises incompatibility of their effect estimates. Analyses of asymptotic properties and error rates, as well as case studies from the Social Sciences Replication Project, show the advantages of the method for the assessment of replicability.
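
    The two Bayes factors in this recipe are straightforward to compute in the normal-normal setting. The sketch below is a minimal illustration under that assumption: the function names and example numbers are mine, and the full method additionally solves for the sceptical prior variance and optimises over the scepticism level as the abstract describes.

```python
import numpy as np
from scipy.stats import norm

def bf_sceptic_vs_null(theta_o, se_o, tau):
    """Bayes factor of the sceptical prior N(0, tau^2) against H0: theta = 0,
    based on the original estimate theta_o with standard error se_o."""
    m_sceptic = norm.pdf(theta_o, loc=0, scale=np.sqrt(tau**2 + se_o**2))
    m_null = norm.pdf(theta_o, loc=0, scale=se_o)
    return m_sceptic / m_null

def bf_advocacy_vs_sceptic(theta_r, se_r, theta_o, se_o, tau):
    """Bayes factor of the advocacy prior N(theta_o, se_o^2) against the
    sceptical prior N(0, tau^2), based on the replication estimate theta_r."""
    m_advocacy = norm.pdf(theta_r, loc=theta_o, scale=np.sqrt(se_o**2 + se_r**2))
    m_sceptic = norm.pdf(theta_r, loc=0, scale=np.sqrt(tau**2 + se_r**2))
    return m_advocacy / m_sceptic

# Hypothetical estimates: original z about 2.5, replication slightly smaller
theta_o, se_o = 0.25, 0.10
theta_r, se_r = 0.20, 0.08
for tau in (0.05, 0.10, 0.20):
    print(f"tau = {tau:.2f}: "
          f"sceptic vs null = {bf_sceptic_vs_null(theta_o, se_o, tau):.2f}, "
          f"advocacy vs sceptic = "
          f"{bf_advocacy_vs_sceptic(theta_r, se_r, theta_o, se_o, tau):.2f}")
```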

    Multisite generalizations of replicability measures

    Multisite replication studies aim to repeat an original study in order to assess whether similar results can be obtained with new data across different study sites. While a variety of statistical methods have been proposed for the analysis of single-site replication studies, fewer methods are available for the multisite setting. Here we discuss several extensions of single-site methods that have not yet been generalized to the multisite setting, both frequentist (the two-trials rule) and Bayesian (the sceptical p-value, the replication Bayes factor, and the sceptical Bayes factor). A key challenge is to account for between-replication heterogeneity, and we present different approaches for doing so. These generalizations provide analysts with a suite of methods for assessing different aspects of replicability. We illustrate their properties using data from several multisite replication projects.
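
    As context for the heterogeneity issue, one standard frequentist way to quantify between-site variability is the DerSimonian-Laird method-of-moments estimate of the heterogeneity variance. The sketch below shows this textbook estimator (not necessarily the specific approach taken in the paper), with invented site-level numbers.

```python
import numpy as np

def dersimonian_laird_tau2(estimates, ses):
    """Method-of-moments (DerSimonian-Laird) estimate of the
    between-site heterogeneity variance tau^2."""
    estimates, ses = np.asarray(estimates), np.asarray(ses)
    w = 1 / ses**2                              # fixed-effect weights
    theta_fe = np.sum(w * estimates) / np.sum(w)
    q = np.sum(w * (estimates - theta_fe)**2)   # Cochran's Q statistic
    k = len(estimates)
    denom = np.sum(w) - np.sum(w**2) / np.sum(w)
    return max(0.0, (q - (k - 1)) / denom)      # truncate at zero

# Hypothetical effect estimates and standard errors from four sites
print(dersimonian_laird_tau2([0.30, 0.10, 0.22, -0.05],
                             [0.10, 0.12, 0.09, 0.15]))
```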

    Bayesian approaches to designing replication studies

    Replication studies are essential for assessing the credibility of claims from original studies. A critical aspect of designing replication studies is determining their sample size; too small a sample size may lead to an inconclusive study, whereas too large a sample size may waste resources that could be better allocated to other studies. Here we show how Bayesian approaches can be used to tackle this problem. The Bayesian framework allows researchers to combine the original data and external knowledge in a design prior distribution for the underlying parameters. Based on a design prior, predictions about the replication data can be made, and the replication sample size can be chosen to ensure a sufficiently high probability of replication success. Replication success may be defined through Bayesian or non-Bayesian criteria, and different criteria may also be combined to meet the demands of distinct stakeholders and to allow conclusive inferences based on multiple analysis approaches. We investigate sample size determination in the normal-normal hierarchical model, where analytical results are available and where traditional sample size determination arises as a special case in which the uncertainty about parameter values is not accounted for. An application to data from a multisite replication project of social-behavioral experiments illustrates how Bayesian approaches help to design informative and cost-effective replication studies. Our methods can be used through the R package BayesRepDesign.
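
    The design logic generalises the usual power calculation: draw the effect from a design prior, predict the replication estimate, and find the smallest sample size with acceptable assurance. A minimal Monte Carlo sketch of this idea follows, assuming a normal design prior (the posterior from the original study), a one-sided significance criterion for success, and invented numbers; the paper's analytical results and the BayesRepDesign package cover this and more general criteria.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)

def prob_replication_success(n_r, theta_o=0.3, se_o=0.12, unit_sd=1.0,
                             alpha=0.025, n_sim=200_000):
    """Monte Carlo probability of replication success: the effect is drawn
    from the design prior N(theta_o, se_o^2), the replication estimate is
    predicted given n_r observations, and success means a one-sided
    p-value below alpha."""
    theta = rng.normal(theta_o, se_o, n_sim)   # draws from the design prior
    se_r = unit_sd / np.sqrt(n_r)              # replication standard error
    theta_hat_r = rng.normal(theta, se_r)      # predicted replication estimates
    return np.mean(theta_hat_r / se_r > norm.ppf(1 - alpha))

# Find a sample size with at least 80% probability of success
for n_r in (20, 50, 100, 200):
    print(n_r, round(prob_replication_success(n_r), 3))
```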

    Comment on “Bayesian additional evidence for decision making under small sample uncertainty”

    We examine the concept of Bayesian Additional Evidence (BAE) recently proposed by Sondhi et al. We derive simple closed-form expressions for BAE and compare its properties with those of other methods for assessing findings in the light of new evidence. We find that while BAE is easy to apply, it lacks both a compelling rationale and the clarity of use needed for reliable decision-making. Keywords: Advocacy prior; Analysis of credibility; Bayesian additional evidence; Reverse-Bayes.

    Pitfalls and potentials in simulation studies: Questionable research practices in comparative simulation studies allow for spurious claims of superiority of any method

    Comparative simulation studies are workhorse tools for benchmarking statistical methods. As with other empirical studies, the success of simulation studies hinges on the quality of their design, execution, and reporting. If not conducted carefully and transparently, their conclusions may be misleading. In this paper, we discuss various questionable research practices that may impact the validity of simulation studies, some of which cannot be detected or prevented by the current publication process in statistics journals. To illustrate our point, we invent a novel prediction method with no expected performance gain and benchmark it in a preregistered comparative simulation study. We show how easy it is to make the method appear superior to well-established competitor methods if questionable research practices are employed. Finally, we provide concrete suggestions for researchers, reviewers, and other academic stakeholders for improving the methodological quality of comparative simulation studies, such as preregistering simulation protocols, incentivizing neutral simulation studies, and sharing code and data.
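
    One such questionable practice is selective reporting of favourable simulation scenarios. The toy sketch below (my own illustration, not the study's preregistered design) pits two methods that are identical by construction against each other: cherry-picking the scenarios where the "novel" method happens to win makes pure noise look like uniform superiority.

```python
import numpy as np

rng = np.random.default_rng(2024)

def run_scenario(n_reps=20, n=30):
    """Average squared error of two 'methods' that are the same estimator
    applied to independent data, so any difference is pure noise."""
    err_a = np.empty(n_reps)
    err_b = np.empty(n_reps)
    for r in range(n_reps):
        err_a[r] = rng.normal(0, 1, n).mean() ** 2  # 'established' method
        err_b[r] = rng.normal(0, 1, n).mean() ** 2  # 'novel' method (identical)
    return err_a.mean(), err_b.mean()

results = [run_scenario() for _ in range(40)]
wins = sum(b < a for a, b in results)
print(f"'Novel' method wins in {wins}/40 scenarios;")
print("reporting only those scenarios fakes a uniform advantage.")
```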

    The assessment of replication success based on relative effect size

    Replication studies are increasingly conducted in order to confirm original findings. However, there is no established standard for assessing replication success, and, in practice, many different approaches are used. The purpose of this paper is to refine and extend a recently proposed reverse-Bayes approach for the analysis of replication studies. We show how this method is directly related to the relative effect size, the ratio of the replication effect estimate to the original one. This perspective leads to a new proposal for recalibrating the assessment of replication success, the golden level. The recalibration ensures that, for borderline significant original studies, replication success can only be achieved if the replication effect estimate is larger than the original one. Conditional power for replication success can then take any desired value if the original study is significant and the replication sample size is large enough. Compared to the standard approach of requiring statistical significance of both the original and the replication study, replication success at the golden level offers uniform gains in project power and controls the type-I error rate if the replication sample size is not smaller than the original one. An application to data from four large replication projects shows that the new approach leads to more appropriate inferences, as it penalizes shrinkage of the replication estimate relative to the original one, while ensuring that both effect estimates are sufficiently convincing on their own.
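
    For intuition, the reverse-Bayes assessment can be reduced to the two z-values and the squared standard-error ratio. The sketch below computes a sceptical z-value and the relative effect size under the normal approximation; it follows my own derivation of the standard (nominal) calibration, while the golden-level recalibration described above is detailed in the paper.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def sceptical_z(z_o, z_r, c):
    """Largest level z_a at which the replication still conflicts with the
    sceptical prior that would just render the original non-credible.
    c = se_o^2 / se_r^2 is the squared standard-error ratio."""
    def f(za2):  # success condition: z_r^2 >= za2 * (1 + c / (z_o^2/za2 - 1))
        return z_r**2 - za2 * (1 + c / (z_o**2 / za2 - 1))
    return np.sqrt(brentq(f, 1e-8, z_o**2 - 1e-8))

z_o, z_r, c = 2.5, 2.2, 1.0          # hypothetical study results
z_s = sceptical_z(z_o, z_r, c)
p_s = 2 * norm.sf(z_s)               # two-sided sceptical p-value
d = (z_r / z_o) / np.sqrt(c)         # relative effect size theta_r / theta_o
print(round(z_s, 3), round(p_s, 4), round(d, 3))
```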

    Evidential Calibration of Confidence Intervals

    We present a novel and easy-to-use method for calibrating error-rate based confidence intervals to evidence-based support intervals. Support intervals are obtained from inverting Bayes factors based on a parameter estimate and its standard error. A k support interval can be interpreted as "the observed data are at least k times more likely under the included parameter values than under a specified alternative". Support intervals depend on the specification of prior distributions for the parameter under the alternative, and we present several types that allow different forms of external knowledge to be encoded. We also show how prior specification can to some extent be avoided by considering a class of prior distributions and then computing so-called minimum support intervals which, for a given class of priors, have a one-to-one mapping with confidence intervals. We also illustrate how the sample size of a future study can be determined based on the concept of support. Finally, we show how the bound for the type I error rate of Bayes factors leads to a bound for the coverage of support intervals. An application to data from a clinical trial illustrates how support intervals can lead to inferences that are both intuitive and informative.
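
    The one-to-one mapping is easiest to see for the largest prior class. For the class of all alternatives, the classical minimum Bayes factor bound exp(-z^2/2) gives a closed-form minimum support interval; the sketch below shows this special case (the paper treats several other prior classes), with invented numbers.

```python
import numpy as np
from scipy.stats import norm

def min_support_interval(est, se, k):
    """Minimum k support interval over the class of all alternatives:
    values theta_0 whose minimum Bayes factor exp(-z^2/2), with
    z = (est - theta_0)/se, is at least 1/k."""
    half_width = se * np.sqrt(2 * np.log(k))
    return est - half_width, est + half_width

def matching_ci_level(k):
    """Confidence level of the CI that coincides with the k minimum
    support interval, illustrating the one-to-one mapping."""
    return 2 * norm.cdf(np.sqrt(2 * np.log(k))) - 1

print(min_support_interval(est=0.4, se=0.1, k=10))  # (0.185, 0.615)
print(round(matching_ci_level(10), 3))              # ~0.968
```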

    Reverse-Bayes methods for evidence assessment and research synthesis

    It is now widely accepted that the standard inferential toolkit used by the scientific research community, null-hypothesis significance testing (NHST), is not fit for purpose. Yet despite the threat posed to the scientific enterprise, there is no agreement concerning alternative approaches for evidence assessment. This lack of consensus reflects long-standing issues concerning Bayesian methods, the principal alternative to NHST. We report on recent work that builds on an approach to inference put forward over 70 years ago to address the well-known "Problem of Priors" in Bayesian analysis, by reversing the conventional prior-likelihood-posterior ("forward") use of Bayes's Theorem. Such Reverse-Bayes analysis allows priors to be deduced from the likelihood by requiring that the posterior achieve a specified level of credibility. We summarise the technical underpinning of this approach, and show how it opens up new approaches to common inferential challenges, such as assessing the credibility of scientific findings, setting them in appropriate context, estimating the probability of successful replications, and extracting more insight from NHST while reducing the risk of misinterpretation. We argue that Reverse-Bayes methods have a key role to play in making Bayesian methods more accessible and attractive for evidence assessment and research synthesis. As a running example we consider a recently published meta-analysis of several randomized controlled clinical trials investigating the association between corticosteroids and mortality in hospitalized patients with COVID-19.
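
    A concrete instance of the credibility assessment mentioned here is Matthews' Analysis of Credibility: given a significant 95% confidence interval, one can solve in closed form for the sceptical prior that would overturn the finding. The sketch below implements that formula under a normal approximation (a standard Reverse-Bayes result, though the function name and numbers are mine), with an invented interval.

```python
import numpy as np

def sceptical_limit(lower, upper):
    """Analysis of Credibility: for a significant 95% CI (lower, upper)
    with 0 < lower < upper, return the limit S of the zero-centred 95%
    sceptical prior interval (-S, S) that would just render the finding
    no longer credible."""
    if not 0 < lower < upper:
        raise ValueError("requires 0 < lower < upper (CI excluding zero)")
    return (upper - lower) ** 2 / (4 * np.sqrt(lower * upper))

# Hypothetical 95% CI for a log odds ratio
s = sceptical_limit(0.1, 0.9)
print(round(s, 3))  # sceptical prior interval is roughly (-0.533, 0.533)
```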