9 research outputs found

    Crowdsourcing hypothesis tests: Making transparent how design choices shape research results

    To what extent are research results influenced by subjective decisions that scientists make as they design studies? Fifteen research teams independently designed studies to answer five original research questions related to moral judgments, negotiations, and implicit cognition. Participants from two separate large samples (total N > 15,000) were then randomly assigned to complete one version of each study. Effect sizes varied dramatically across different sets of materials designed to test the same hypothesis: materials from different teams rendered statistically significant effects in opposite directions for four out of five hypotheses, with the narrowest range in estimates being d = -0.37 to +0.26. Meta-analysis and a Bayesian perspective on the results revealed overall support for two hypotheses and a lack of support for three hypotheses. Overall, practically none of the variability in effect sizes was attributable to the skill of the research team in designing materials, while considerable variability was attributable to the hypothesis being tested. In a forecasting survey, predictions of other scientists were significantly correlated with study results, both across and within hypotheses. Crowdsourced testing of research hypotheses helps reveal the true consistency of empirical support for a scientific claim.
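    To illustrate the kind of aggregation described above, a minimal random-effects meta-analysis (DerSimonian–Laird) over hypothetical team-level effect sizes might look like the sketch below; the numbers and the choice of estimator are illustrative assumptions, not the study's data or analysis code.

```python
import numpy as np

# Hypothetical per-team effect sizes (Cohen's d) and sampling variances
# for one hypothesis; these values are made up for illustration.
d = np.array([-0.37, -0.10, 0.02, 0.08, 0.15, 0.26])
v = np.array([0.012, 0.010, 0.011, 0.013, 0.010, 0.012])

# Fixed-effect weights and Cochran's Q (heterogeneity across teams)
w = 1.0 / v
d_fixed = np.sum(w * d) / np.sum(w)
Q = np.sum(w * (d - d_fixed) ** 2)
k = len(d)

# DerSimonian-Laird estimate of the between-team variance tau^2
c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (Q - (k - 1)) / c)

# Random-effects pooled estimate and its standard error
w_re = 1.0 / (v + tau2)
d_pooled = np.sum(w_re * d) / np.sum(w_re)
se_pooled = np.sqrt(1.0 / np.sum(w_re))

print(f"tau^2 = {tau2:.4f}, pooled d = {d_pooled:.3f} +/- {1.96 * se_pooled:.3f}")
```

    In this sketch, the between-team variance tau^2 plays the role of the design-related variability the abstract describes, while the pooled estimate corresponds to the meta-analytic support for the hypothesis.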

    Relation between sample size and Bayes factors.

    Among the 43 null results from the 2015 volume of NEJM, large samples are more likely to yield compelling evidence in favor of the null hypothesis than small samples (r = 0.72).

    Valid statements based on p-values and Bayes factors.

    The p-value and the Bayes factor allow fundamentally different statements concerning the null hypothesis. The p-value can be used to make a discrete decision: reject or retain the null hypothesis. The Bayes factor grades the evidence that the data provide for and against the null hypothesis.
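    As a rough illustration of that contrast, the sketch below computes both quantities for the same simulated sample, using a one-sample t-test for the p-value and the BIC approximation to the Bayes factor (Wagenmakers, 2007); the simulated data and the BIC shortcut are assumptions made here for illustration, not the procedure used in the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=0.05, scale=1.0, size=100)  # simulated data with a tiny true effect
n = len(x)

# p-value: supports a binary decision (reject H0 if p < .05, otherwise retain)
t_stat, p = stats.ttest_1samp(x, popmean=0.0)

# BIC approximation to the Bayes factor: BF01 ~= exp((BIC_1 - BIC_0) / 2),
# comparing a model with the mean fixed at 0 against a model with a free mean.
rss0 = np.sum(x ** 2)                # H0: mean fixed at 0
rss1 = np.sum((x - x.mean()) ** 2)   # H1: mean estimated from the data
bic0 = n * np.log(rss0 / n) + 1 * np.log(n)  # one free parameter (variance)
bic1 = n * np.log(rss1 / n) + 2 * np.log(n)  # two free parameters (mean, variance)
bf01 = np.exp((bic1 - bic0) / 2)

print(f"t = {t_stat:.2f}, p = {p:.3f} -> {'reject' if p < .05 else 'retain'} H0")
print(f"BF01 = {bf01:.2f} (graded evidence for H0 relative to H1)")
```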

    Simulation results for an in limbo analysis of a single subject.

    In a standard fMRI analysis, some region, region A, can be identified as significantly more activated during a specific condition. For another region, region B, this effect is not significant. However, region B also does not differ significantly from region A. Hence, it is incorrect to conclude that region A is selectively activated; instead, it is appropriate to conclude that region A is activated, and region B is in limbo. See also section Simulation Studies.

    An example of the in limbo approach.

    Shown are hypothetical contrast sizes and confidence intervals for 6 regions (these could be individual voxels). For regions 1 and 6 there is no significant effect: zero falls well within the confidence intervals of the contrast size in these regions. For regions 3 and 5, there is a clear effect: the “task rest” contrast differs significantly from zero. For regions 2 and 4 the situation is more complicated: the confidence interval of the “task rest” contrast still contains 0, so one is unable to reject the null hypothesis. However, the size of the contrast is not significantly different from that of the contrast in region 5. Regions 2 and 4 are “in limbo”: they differ neither from baseline, nor from the least significantly activated region. Legend: orange areas are significantly activated, green areas are in limbo, gray areas are significantly less activated than significant regions.
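    A minimal sketch of how such a classification could be coded is given below, assuming per-region contrast estimates and standard errors are already available; the numbers, the normal-approximation confidence intervals, and the pairwise z-test against the weakest activated region are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np
from scipy import stats

# Hypothetical contrast estimates and standard errors for 6 regions
est = np.array([0.10, 0.50, 1.60, 0.55, 1.20, -0.10])
se = np.array([0.30, 0.30, 0.30, 0.30, 0.30, 0.30])
alpha = 0.05
z = stats.norm.ppf(1 - alpha / 2)

# Regions whose confidence interval excludes zero count as activated.
# (This sketch assumes at least one region is significantly activated.)
active = est - z * se > 0
ref = est[active].min()                          # weakest activated region
ref_se = se[active][np.argmin(est[active])]

labels = []
for b, s, a in zip(est, se, active):
    if a:
        labels.append("activated")
    else:
        # Not significantly different from zero; if it also does not differ
        # significantly from the weakest activated region, it is "in limbo".
        diff_z = (ref - b) / np.sqrt(s ** 2 + ref_se ** 2)
        labels.append("in limbo" if diff_z < z else "significantly less activated")

print(list(zip(est, labels)))
```

    With these made-up numbers the sketch reproduces the pattern in the caption: regions 3 and 5 come out as activated, regions 2 and 4 as in limbo, and regions 1 and 6 as significantly less activated than the activated regions.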

    https://osf.io/f3rxt/ from Exploring open science practices in behavioural public policy research

    In their book ‘Nudge: Improving Decisions About Health, Wealth and Happiness’, Thaler & Sunstein (2009, Penguin) argue that choice architectures are promising public policy interventions. This research programme motivated the creation of ‘nudge units’, government agencies which aim to apply insights from behavioural science to improve public policy. We closely examine a meta-analysis of the evidence gathered by two of the largest and most influential nudge units (DellaVigna & Linos, 2022, Econometrica 90, 81–116) and use statistical techniques to detect reporting biases. Our analysis shows evidence suggestive of selective reporting. We additionally evaluate the public pre-analysis plans from one of the two nudge units (Office of Evaluation Sciences). We identify several instances of excellent practice; however, we also find that the analysis plans and reporting often lack sufficient detail to evaluate (unintentional) reporting biases. We highlight several improvements that would enhance the effectiveness of the pre-analysis plans and reports as a means to combat reporting biases. Our findings and suggestions can further improve the evidence base for policy decisions.
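    The abstract does not spell out which statistical techniques were used, but one standard check for reporting bias is Egger's regression test for funnel-plot asymmetry; the sketch below applies it to hypothetical effect sizes and standard errors purely for illustration.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical per-study effect sizes and standard errors (illustrative only).
effects = np.array([0.02, 0.15, 0.30, 0.08, 0.25, 0.40, 0.05, 0.35])
ses = np.array([0.05, 0.12, 0.20, 0.06, 0.15, 0.25, 0.04, 0.22])

# Egger's regression test: regress standardized effects (effect / SE) on
# precision (1 / SE). A non-zero intercept indicates funnel asymmetry, which
# is consistent with (but does not by itself prove) selective reporting.
z = effects / ses
precision = 1.0 / ses
model = sm.OLS(z, sm.add_constant(precision)).fit()
intercept, intercept_p = model.params[0], model.pvalues[0]
print(f"Egger intercept = {intercept:.2f}, p = {intercept_p:.3f}")
```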

    WagenmakersOpenPracticesDisclosure – Supplemental material for Quantifying Support for the Null Hypothesis in Psychology: An Empirical Investigation

    Supplemental material, WagenmakersOpenPracticesDisclosure, for Quantifying Support for the Null Hypothesis in Psychology: An Empirical Investigation by Balazs Aczel, Bence Palfi, Aba Szollosi, Marton Kovacs, Barnabas Szaszi, Peter Szecsi, Mark Zrubka, Quentin F. Gronau, Don van den Bergh and Eric-Jan Wagenmakers in Advances in Methods and Practices in Psychological Science
