9 research outputs found

    Many Labs 5: Testing pre-data collection peer review as an intervention to increase replicability

    Replication studies in psychological science sometimes fail to reproduce prior findings. If these studies use methods that are unfaithful to the original study or ineffective in eliciting the phenomenon of interest, then a failure to replicate may be a failure of the protocol rather than a challenge to the original finding. Formal pre-data-collection peer review by experts may address shortcomings and increase replicability rates. We selected 10 replication studies from the Reproducibility Project: Psychology (RP:P; Open Science Collaboration, 2015) for which the original authors had expressed concerns about the replication designs before data collection; only one of these studies had yielded a statistically significant effect (p < .05). Commenters suggested that lack of adherence to expert review and low-powered tests were the reasons that most of these RP:P studies failed to replicate the original effects. We revised the replication protocols and received formal peer review prior to conducting new replication studies. We administered the RP:P and revised protocols in multiple laboratories (median number of laboratories per original study = 6.5, range = 3–9; median total sample = 1,279.5, range = 276–3,512) for high-powered tests of each original finding with both protocols. Overall, following the preregistered analysis plan, we found that the revised protocols produced effect sizes similar to those of the RP:P protocols (Δr = .002 or .014, depending on analytic approach). The median effect size for the revised protocols (r = .05) was similar to that of the RP:P protocols (r = .04) and the original RP:P replications (r = .11), and smaller than that of the original studies (r = .37). Analysis of the cumulative evidence across the original studies and the corresponding three replication attempts provided very precise estimates of the 10 tested effects and indicated that their effect sizes (median r = .07, range = .00–.15) were 78% smaller, on average, than the original effect sizes (median r = .37, range = .19–.50).
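
    As a rough illustration of the comparisons reported above, the following Python sketch recomputes the protocol difference and the shrinkage relative to the original studies from the summary medians given in the abstract. It is arithmetic on the reported values only, not a re-analysis of the Many Labs 5 data, and the variable names are ours.

        # Summary medians as reported in the abstract (not a re-analysis of the data).
        median_r = {
            "original studies": 0.37,
            "RP:P replications": 0.11,
            "RP:P protocol (ML5)": 0.04,
            "revised protocol (ML5)": 0.05,
        }
        cumulative_r = 0.07  # median of the cumulative estimates across all attempts

        # Difference between the two ML5 protocols; the abstract reports
        # delta-r = .002 or .014 depending on the analytic approach, so the
        # simple difference of medians is only a coarse check.
        delta_r = median_r["revised protocol (ML5)"] - median_r["RP:P protocol (ML5)"]

        # Shrinkage of the cumulative median relative to the original median;
        # the abstract's 78% figure is an average across the 10 effects.
        shrinkage = 1 - cumulative_r / median_r["original studies"]

        print(f"median delta r between protocols: {delta_r:.2f}")
        print(f"shrinkage vs. original studies:   {shrinkage:.0%}")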

    Comparisons of Daily Behavior Across 21 Countries

    While a large body of research has investigated cultural differences in behavior, the typical study assesses a single behavioral outcome, in a single context, compared across two countries. The current study compared a broad array of behaviors across 21 countries (N = 5,522). Participants described their behavior at 7:00 p.m. the previous evening using the 68 items of the Riverside Behavioral Q-sort (RBQ). Correlations between average patterns of behavior in each country ranged from r = .69 to r = .97 and, in general, described a positive and relaxed activity. The most similar patterns were United States/Canada, and the least similar were Japan/United Arab Emirates (UAE). Similarities in behavior within countries were largest in Spain and smallest in the UAE. Further analyses correlated average RBQ item placements in each country with, among others, country-level value dimensions, personality traits, self-esteem levels, economic output, and population. Extroversion, openness, neuroticism, conscientiousness, self-esteem, happiness, and tolerant attitudes yielded more significant correlations than expected by chance.
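
    The between-country similarities described above come down to correlating two 68-item average behavior profiles. The short Python sketch below shows that computation on placeholder data; the profiles are randomly generated stand-ins, not the actual RBQ averages, and NumPy is assumed.

        import numpy as np

        rng = np.random.default_rng(0)
        n_items = 68  # Riverside Behavioral Q-sort items

        # Hypothetical mean RBQ placements for two countries (placeholder data).
        country_a = rng.normal(size=n_items)
        country_b = country_a + rng.normal(scale=0.3, size=n_items)  # a similar country

        # Profile similarity is the Pearson correlation across the 68 item means.
        similarity = np.corrcoef(country_a, country_b)[0, 1]
        print(f"between-country profile similarity: r = {similarity:.2f}")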

    Many Labs 3: Evaluating participant pool quality across the academic semester via replication

    The university participant pool is a key resource for behavioral research, and data quality is believed to vary over the course of the academic semester. This crowdsourced project examined time-of-semester variation in 10 known effects, 10 individual differences, and 3 data quality indicators in 20 participant pools (N = 2,696) and with an online sample (N = 737). Weak time-of-semester effects were observed on data quality indicators, participant sex, and a few individual differences (conscientiousness, mood, and stress). However, there was little evidence for time of semester qualifying experimental or correlational effects. The generality of this evidence is unknown because only a subset of the tested effects demonstrated evidence for the original result in the whole sample. Mean characteristics of pool samples change slightly during the semester, but these data suggest that those changes are mostly irrelevant for detecting effects.
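
    The key question above is whether time of semester qualifies (moderates) an effect. A minimal way to ask that for a single experimental effect is an interaction model, sketched below with simulated data; this is an illustration of the general approach, not the preregistered Many Labs 3 analyses, and pandas/statsmodels are assumed.

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(1)
        n = 500
        df = pd.DataFrame({
            "condition": rng.integers(0, 2, size=n),  # 0 = control, 1 = treatment
            "week": rng.integers(1, 16, size=n),      # week of the academic semester
        })
        # Simulated outcome: a treatment effect that does not change over the semester.
        df["outcome"] = 0.3 * df["condition"] + rng.normal(size=n)

        # The condition:week term tests whether the effect varies by week.
        model = smf.ols("outcome ~ condition * week", data=df).fit()
        print(model.params[["condition", "condition:week"]])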

    Data from: Estimating the reproducibility of psychological science

    This record contains the underlying research data for the publication "Estimating the reproducibility of psychological science"; the full text is available from https://ink.library.smu.edu.sg/lkcsb_research/5257. Reproducibility is a defining feature of science, but the extent to which it characterizes current research is unknown. We conducted replications of 100 experimental and correlational studies published in three psychology journals using high-powered designs and original materials when available. Replication effects were half the magnitude of original effects, representing a substantial decline. Ninety-seven percent of original studies had statistically significant results. Thirty-six percent of replications had statistically significant results; 47% of original effect sizes were in the 95% confidence interval of the replication effect size; 39% of effects were subjectively rated to have replicated the original result; and if no bias in original results is assumed, combining original and replication results left 68% with statistically significant effects. Correlational tests suggest that replication success was better predicted by the strength of original evidence than by characteristics of the original and replication teams.
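
    One of the replication criteria mentioned above is whether the original effect size falls inside the 95% confidence interval of the replication effect size. For correlations, that check is commonly done via the Fisher z transformation, sketched below with hypothetical numbers; this is a simplified illustration, not the project's actual analysis code.

        import math

        def fisher_ci(r, n, z_crit=1.96):
            """Approximate 95% CI for a correlation via the Fisher z transformation."""
            z = math.atanh(r)
            se = 1 / math.sqrt(n - 3)
            return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

        # Hypothetical original and replication results (not taken from the dataset).
        r_original, r_replication, n_replication = 0.30, 0.12, 120
        lo, hi = fisher_ci(r_replication, n_replication)
        print(f"replication 95% CI: [{lo:.2f}, {hi:.2f}]; "
              f"original r = {r_original} captured: {lo <= r_original <= hi}")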