
    Worked-out examples of the adequacy of Bayesian optional stopping

    The practice of sequentially testing a null hypothesis as data are collected, stopping as soon as the null hypothesis is rejected, is known as optional stopping. It is well known that optional stopping is problematic in the context of p value-based null hypothesis significance testing: the false-positive rate quickly exceeds the nominal significance level of a single test. However, the state of affairs under null hypothesis Bayesian testing, where p values are replaced by Bayes factors, has, perhaps surprisingly, generated much less consensus. Rouder (2014) used simulations to defend the use of optional stopping under null hypothesis Bayesian testing. The idea behind these simulations is closely related to sampling from prior predictive distributions. Deng et al. (2016) and Hendriksen et al. (2020) have provided mathematical arguments showing that optional stopping under null hypothesis Bayesian testing is valid under some conditions. These papers are, however, exceedingly technical for most researchers in the applied social sciences. In this paper, we provide mathematical derivations concerning Rouder's approximate simulation results for the two Bayesian hypothesis tests that he considered. The key idea is to consider the probability distribution of the Bayes factor, regarded as a random variable across repeated sampling. This paper therefore offers an intuitive perspective on the literature, and we believe it is a valuable contribution toward understanding the practice of optional stopping in the context of Bayesian hypothesis testing.
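
    The p-value half of this contrast is easy to reproduce numerically. The sketch below is not Rouder's simulation; it only illustrates the false-positive inflation the abstract refers to by testing after every new observation under a true null and stopping at the first p < .05. The starting sample size, the cap of 200 observations, and the number of replications are arbitrary choices made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def optional_stopping_rejects(n_min=10, n_max=200, alpha=0.05):
    """Sample from N(0, 1), so the null hypothesis is true; run a one-sample
    t-test after every observation from n_min onward and stop at the first
    p < alpha."""
    data = list(rng.normal(0.0, 1.0, size=n_min))
    while len(data) <= n_max:
        if stats.ttest_1samp(data, popmean=0.0).pvalue < alpha:
            return True            # "significant" despite a true null
        data.append(rng.normal(0.0, 1.0))
    return False

n_reps = 2000
hits = sum(optional_stopping_rejects() for _ in range(n_reps))
print(f"false-positive rate under optional stopping: {hits / n_reps:.3f} "
      f"(nominal alpha = 0.05)")
```

    With these settings the reported rate lands well above the nominal 5%, which is exactly the behaviour that motivates asking whether Bayes factors fare better under the same stopping rule.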

    Significance testing in quantile regression

    We consider the problem of testing the significance of predictors in multivariate nonparametric quantile regression. A stochastic process is proposed, which is based on a comparison of the responses with a nonparametric quantile regression estimate under the null hypothesis. It is demonstrated that under the null hypothesis this process converges weakly to a centered Gaussian process, and the asymptotic properties of the test under fixed and local alternatives are also discussed. In particular we show that, in contrast to the nonparametric approach based on estimation of $L^2$-distances, the new test is able to detect local alternatives which converge to the null hypothesis at any rate $a_n \to 0$ such that $a_n \sqrt{n} \to \infty$ (here $n$ denotes the sample size). We also present a small simulation study illustrating the finite sample properties of a bootstrap version of the corresponding Kolmogorov-Smirnov test.
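
    As a rough illustration of the general idea (comparing the responses with a quantile regression fit obtained under the null hypothesis and turning the resulting indicator process into a Kolmogorov-Smirnov type statistic), the toy sketch below tests whether a second predictor matters in a median regression. The binned local median, the permutation calibration, and the simulated data are assumptions made here for brevity; the paper develops the process rigorously and uses a bootstrap rather than a permutation scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def binned_median_fit(x, y, n_bins=10):
    """Crude nonparametric median regression: the median of y within
    quantile bins of x (a stand-in for a proper smoother)."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    bin_medians = np.array([np.median(y[idx == b]) for b in range(n_bins)])
    return bin_medians[idx]

def ks_statistic(x2, signs):
    """sup over observed t of |n^{-1/2} * sum_i signs_i * 1{x2_i <= t}|."""
    order = np.argsort(x2)
    partial = np.cumsum(signs[order]) / np.sqrt(len(x2))
    return np.max(np.abs(partial))

# Toy data: y depends on x1 only, so the null "x2 is insignificant" is true.
n = 400
x1, x2 = rng.uniform(size=n), rng.uniform(size=n)
y = np.sin(2 * np.pi * x1) + rng.standard_normal(n)

q0 = binned_median_fit(x1, y)              # median fit under the null (x1 only)
signs = (y <= q0).astype(float) - 0.5      # centered residual indicators
t_obs = ks_statistic(x2, signs)

# Permutation calibration, valid in this toy setup because x2 is drawn
# independently of (x1, y); the paper uses a bootstrap instead.
perm = [ks_statistic(rng.permutation(x2), signs) for _ in range(999)]
p_value = (1 + sum(t >= t_obs for t in perm)) / 1000
print(f"KS-type statistic = {t_obs:.3f}, permutation p-value = {p_value:.3f}")
```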

    A Proposed Hybrid Effect Size Plus p-Value Criterion: Empirical Evidence Supporting its Use

    DOI: 10.1080/00031305.2018.1564697. When the editors of Basic and Applied Social Psychology effectively banned the use of null hypothesis significance testing (NHST) from articles published in their journal, it set off a firestorm of discussions both supporting the decision and defending the utility of NHST in scientific research. At the heart of NHST is the p-value, which is the probability of obtaining an effect equal to or more extreme than the one observed in the sample data, given the null hypothesis and other model assumptions. Although this is conceptually different from the probability of the null hypothesis being true given the sample, p-values can nonetheless provide evidential information for making an inference about a parameter. Applying a 10,000-case simulation described in this article, the authors found that p-values' inferential signals to either reject or not reject a null hypothesis about the mean (α = 0.05) were consistent with the parameter's true location in the sampled-from population for almost 70% of the cases. Success increases if a hybrid decision criterion, minimum effect size plus p-value (MESP), is used: rejecting the null also requires the difference of the observed statistic from the exact null to be meaningfully large, or practically significant, in the researcher's judgment and experience. The simulation compares the performance of several methods, from p-value and/or effect size-based to confidence-interval based, under various conditions of true location of the mean, test power, and comparative sizes of the meaningful distance and the population variability. For any inference procedure that outputs a binary indicator, such as flagging whether a p-value is significant, the output of a single experiment is not sufficient evidence for a definitive conclusion. Yet, if a tool like MESP generates a relatively reliable signal and is used knowledgeably as part of a research process, it can provide useful information.
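
    The decision rule itself is simple to state in code. The sketch below contrasts a plain p-value rule with a hybrid MESP-style rule that additionally requires the observed mean difference to exceed a minimum meaningful effect; the minimum effect of 0.3, the sample size, the normal populations, and the notion of a "correct" signal used here are illustrative assumptions, not the article's exact simulation design.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def decision_rates(true_mean, null_mean=0.0, sigma=1.0, n=30,
                   alpha=0.05, min_effect=0.3, n_cases=10_000):
    """Fraction of cases in which the plain p-value rule and the hybrid
    MESP-style rule (reject only if p < alpha AND |xbar - null_mean| >=
    min_effect) agree with the 'true' answer defined below."""
    # Illustrative notion of the correct signal: reject exactly when the
    # true mean is at least min_effect away from the null value.
    should_reject = abs(true_mean - null_mean) >= min_effect
    p_correct = mesp_correct = 0
    for _ in range(n_cases):
        sample = rng.normal(true_mean, sigma, size=n)
        p = stats.ttest_1samp(sample, popmean=null_mean).pvalue
        reject_p = p < alpha
        reject_mesp = reject_p and abs(sample.mean() - null_mean) >= min_effect
        p_correct += (reject_p == should_reject)
        mesp_correct += (reject_mesp == should_reject)
    return p_correct / n_cases, mesp_correct / n_cases

for mu in (0.0, 0.1, 0.5):
    p_rate, mesp_rate = decision_rates(true_mean=mu)
    print(f"true mean {mu:.1f}: p-value rule correct {p_rate:.2%}, "
          f"MESP rule correct {mesp_rate:.2%}")
```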

    An adaptive significance threshold criterion for massive multiple hypotheses testing

    This research deals with massive multiple hypothesis testing. First, regarding multiple tests as an estimation problem under a proper population model, an error measurement called the Erroneous Rejection Ratio (ERR) is introduced and related to the False Discovery Rate (FDR). ERR is an error measurement similar in spirit to FDR, and it greatly simplifies the analytical study of the error properties of multiple test procedures. Next, an improved estimator of the proportion of true null hypotheses and a data-adaptive significance threshold criterion are developed. Some asymptotic error properties of the significance threshold criterion are established in terms of ERR under distributional assumptions widely satisfied in recent applications. A simulation study provides clear evidence that the proposed estimator of the proportion of true null hypotheses outperforms the existing estimators of this important parameter in massive multiple tests. Both the analytical and simulation studies indicate that the proposed significance threshold criterion can provide a reasonable balance between the amounts of false positive and false negative errors, thereby complementing and extending the various FDR control procedures. S-plus/R code is available from the author upon request. Comment: Published at http://dx.doi.org/10.1214/074921706000000392 in the IMS Lecture Notes--Monograph Series (http://www.imstat.org/publications/lecnotes.htm) by the Institute of Mathematical Statistics (http://www.imstat.org).
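
    The abstract does not spell out its estimator or threshold rule, so the sketch below uses a generic Storey-type estimator of the proportion of true nulls and an adaptive Benjamini-Hochberg style cutoff purely to show the kinds of quantities involved; the tuning constant lambda, the effect size, the number of tests, and the reported proportion of erroneous rejections are all illustrative choices, and none of this is the paper's method.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Simulate m one-sided z-tests: a fraction pi0 of the nulls are true and
# the rest have a shifted mean (every number here is illustrative).
m, pi0, shift = 5000, 0.8, 2.5
is_null = rng.uniform(size=m) < pi0
z = rng.standard_normal(m) + np.where(is_null, 0.0, shift)
pvals = stats.norm.sf(z)                     # one-sided p-values

# Storey-type estimate of the proportion of true nulls (lambda = 0.5);
# a generic estimator, not the one proposed in the paper.
lam = 0.5
pi0_hat = min(1.0, np.mean(pvals > lam) / (1.0 - lam))

# Adaptive BH-style cutoff using pi0_hat at level q = 0.05.
q = 0.05
ranked = np.sort(pvals)
passed = ranked <= q * np.arange(1, m + 1) / (m * pi0_hat)
k = int(np.max(np.nonzero(passed)[0])) + 1 if passed.any() else 0
rejected = pvals <= (ranked[k - 1] if k else -1.0)

false_rejections = int(np.sum(rejected & is_null))
print(f"true pi0 = {pi0:.2f}, estimated pi0 = {pi0_hat:.3f}")
print(f"rejections: {int(rejected.sum())}, "
      f"false rejections of true nulls: {false_rejections}")
print(f"realised proportion of erroneous rejections: "
      f"{false_rejections / max(int(rejected.sum()), 1):.3f}")
```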

    The asymptotic relative efficiency and the ratio of sample sizes when testing two different null hypotheses

    Composite endpoints, consisting of the union of two or more outcomes, are often used as the primary endpoint in time-to-event randomized clinical trials. Previously, Gómez and Lagakos provided a method to guide the decision between using a composite endpoint or one of its components when testing the effect of a treatment in a randomized clinical trial. Consider the problem of testing the null hypothesis of no treatment effect by means of either a single component or the composite endpoint. In this paper we prove that the usual interpretation of the asymptotic relative efficiency, as the reciprocal of the ratio of the sample sizes required by two test procedures for the same null and alternative hypotheses to attain the same power at the same significance level, can be extended to the test procedures considered here, which address two different null and alternative hypotheses. A simulation to study the relationship between asymptotic relative efficiency and finite sample sizes is carried out.
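
    Written out, the classical interpretation being extended is the following (the notation here is generic, not the paper's):

```latex
% Classical (Pitman) interpretation of the asymptotic relative efficiency of
% test T_1 with respect to test T_2: if n_1 and n_2 are the sample sizes the
% two tests need to attain the same power at the same significance level
% against the same alternative, then
\[
  \operatorname{ARE}(T_1, T_2) \;=\; \lim \frac{n_2}{n_1},
\]
% where the limit is taken as the alternative approaches the null hypothesis.
% An ARE of 2 thus means T_1 asymptotically needs half as many observations
% as T_2. The paper shows that this reading carries over when T_1 and T_2
% test two different null/alternative pairs (single component vs. composite
% endpoint).
```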

    Assessing Significance in Finite Mixture Models

    A new method is proposed to quantify significance in finite mixture models. The basis for this new methodology is an approach that calculates the p-value for testing a simpler model against a more complicated one in a way that circumvents the failure of the regularity conditions for likelihood ratio tests. The developed testing procedure allows for pairwise comparison of any two mixture models, with failure to reject the null hypothesis implying an insignificant likelihood improvement under the more complex model. This leads to a comprehensive tool called a quantitation map, which displays significance and quantitatively summarizes all model comparisons. This map can be used, among other applications, to decide on the best among a set of candidate mixture models. The performance of the procedure is illustrated on some classification datasets and in a comprehensive simulation study. The methodology is also applied to a study of the voting preferences of senators in the 109th US Congress. Although the development of our testing strategy is based on large-sample theory, we note that it has impressive performance even in cases with moderate sample sizes.
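
    The regularity problem the abstract alludes to is commonly sidestepped in practice with a parametric-bootstrap likelihood ratio test, which is what the generic sketch below implements; it is not the paper's procedure, and the component counts, toy data, and number of bootstrap replicates are arbitrary assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)

def fit_ll(x, k):
    """Fit a k-component Gaussian mixture and return it with its total
    log-likelihood (score() gives the mean log-likelihood per sample)."""
    gm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(x)
    return gm, gm.score(x) * len(x)

# Toy data: a genuine 2-component mixture.
x = np.concatenate([rng.normal(0, 1, 150), rng.normal(3, 1, 150)]).reshape(-1, 1)

null_k, alt_k = 2, 3
gm0, ll0 = fit_ll(x, null_k)
_, ll1 = fit_ll(x, alt_k)
lr_obs = 2 * (ll1 - ll0)                   # observed likelihood-ratio statistic

# Calibrate by simulating from the fitted null model and refitting both
# models, since the usual chi-square reference does not apply here.
B = 99
lr_boot = []
for _ in range(B):
    xb, _ = gm0.sample(len(x))
    _, b0 = fit_ll(xb, null_k)
    _, b1 = fit_ll(xb, alt_k)
    lr_boot.append(2 * (b1 - b0))

p_value = (1 + sum(lr >= lr_obs for lr in lr_boot)) / (B + 1)
print(f"LR statistic = {lr_obs:.2f}, bootstrap p-value = {p_value:.3f}")
```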