7 research outputs found

    Consensus-based guidance for conducting and reporting multi-analyst studies

    Any large dataset can be analyzed in a number of ways, and it is possible that the use of different analysis strategies will lead to different results and conclusions. One way to assess whether the results obtained depend on the analysis strategy chosen is to employ multiple analysts and leave each of them free to follow their own approach. Here, we present consensus-based guidance for conducting and reporting such multi-analyst studies, and we discuss how broader adoption of the multi-analyst approach has the potential to strengthen the robustness of results and conclusions obtained from analyses of datasets in basic and applied research.

    Many Labs 2: Investigating Variation in Replicability Across Samples and Settings

    We conducted preregistered replications of 28 classic and contemporary published findings, with protocols that were peer reviewed in advance, to examine variation in effect magnitudes across samples and settings. Each protocol was administered to approximately half of 125 samples that comprised 15,305 participants from 36 countries and territories. Using the conventional criterion of statistical significance (p < .05), we found that 15 (54%) of the replications provided evidence of a statistically significant effect in the same direction as the original finding. With a strict significance criterion (p < .0001), 14 (50%) of the replications still provided such evidence, a reflection of the extremely high-powered design. Seven (25%) of the replications yielded effect sizes larger than the original ones, and 21 (75%) yielded effect sizes smaller than the original ones. The median comparable Cohen’s ds were 0.60 for the original findings and 0.15 for the replications. The effect sizes were small (< 0.20) in 16 of the replications (57%), and 9 effects (32%) were in the direction opposite that of the original effect. Across settings, the Q statistic indicated significant heterogeneity in 11 (39%) of the replication effects, and most of those were among the findings with the largest overall effect sizes; only 1 effect that was near zero in the aggregate showed significant heterogeneity according to this measure. Only 1 effect had a tau value greater than .20, an indication of moderate heterogeneity. Eight others had tau values near or slightly above .10, an indication of slight heterogeneity. Moderation tests indicated that very little heterogeneity was attributable to the order in which the tasks were performed or whether the tasks were administered in lab versus online. Exploratory comparisons revealed little heterogeneity between Western, educated, industrialized, rich, and democratic (WEIRD) cultures and less WEIRD cultures (i.e., cultures with relatively high and low WEIRDness scores, respectively). Cumulatively, variability in the observed effect sizes was attributable more to the effect being studied than to the sample or setting in which it was studied.
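
    The effect-size and heterogeneity statistics reported above can be made concrete with a short sketch. The Python example below is not the project's analysis code; it simply illustrates, with placeholder numbers rather than Many Labs 2 data, how Cohen's d, Cochran's Q, and a DerSimonian-Laird tau are computed.

        # Illustrative only: placeholder numbers, not Many Labs 2 data.
        import numpy as np

        def cohens_d(x, y):
            # Standardized mean difference using a pooled standard deviation
            nx, ny = len(x), len(y)
            pooled_var = ((nx - 1) * np.var(x, ddof=1) +
                          (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
            return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

        def heterogeneity(effects, variances):
            # Cochran's Q and DerSimonian-Laird tau across k site-level estimates
            w = 1.0 / np.asarray(variances)
            theta = np.asarray(effects)
            pooled = np.sum(w * theta) / np.sum(w)
            q = np.sum(w * (theta - pooled) ** 2)
            c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
            tau2 = max(0.0, (q - (len(theta) - 1)) / c)
            return q, np.sqrt(tau2)

        rng = np.random.default_rng(0)
        d = cohens_d(rng.normal(0.3, 1, 200), rng.normal(0.0, 1, 200))
        q, tau = heterogeneity([0.12, 0.18, 0.05, 0.22, 0.15],
                               [0.010, 0.012, 0.009, 0.011, 0.010])
        print(f"d = {d:.2f}, Q = {q:.2f}, tau = {tau:.2f}")  # tau > .20 would suggest moderate heterogeneity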

    Associations between maternal psychological distress and salivary cortisol during pregnancy: A mixed-models approach

    BACKGROUND: Maternal psychological distress during pregnancy is related to adverse child behavioral and emotional outcomes later in life, such as ADHD and anxiety/depression. The underlying mechanisms for this, however, are still largely unknown. The hypothalamic-pituitary-adrenal (HPA) axis, with its most important effector hormone cortisol, has been proposed as a mechanism, but results have been inconsistent. The current study investigated the association between maternal psychological distress (i.e., anxiety and depressive symptoms) and maternal cortisol levels during pregnancy using a mixed-models approach. METHOD: During three pregnancy trimesters, mothers (N = 170) collected four salivary samples on two consecutive days. Mothers reported symptoms of anxiety and depression three times during pregnancy (at 13.3 ± 1.1, 20.2 ± 1.5, and 33.8 ± 1.5 weeks of pregnancy, respectively) using the anxiety subscale of the Symptom Checklist (SCL-90), the Spielberger State and Trait Anxiety Inventory (STAI), and the Edinburgh Postnatal Depression Scale (EPDS). Specific fears and worries during pregnancy were measured with the short version of the Pregnancy Related Anxiety Questionnaire (PRAQ-R). RESULTS: We found a significant effect of the SCL-90 anxiety subscale on cortisol levels at awakening (p = .008), indicating that mothers with higher anxiety showed lower cortisol at awakening. Maternal psychological variables explained 10.5% of the variance at the person level in awakening cortisol, but none in the overall diurnal cortisol model. CONCLUSION: More research is necessary to unravel the underlying mechanisms of the association between maternal psychological distress and cortisol, and the search for mechanisms other than the HPA axis should be continued and extended.
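
    As a rough illustration of the mixed-models approach described here (not the study's code), the sketch below fits a model with repeated cortisol samples nested within mothers, a random intercept per mother, and anxiety as a person-level predictor. The variable names and synthetic data are assumptions made for the example.

        # Illustrative sketch with synthetic data; not the study's dataset or model.
        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(0)
        n_mothers, n_samples = 50, 8                                   # e.g. 4 samples/day on 2 days
        mother = np.repeat(np.arange(n_mothers), n_samples)
        anxiety = np.repeat(rng.normal(0, 1, n_mothers), n_samples)    # person-level score
        hours = np.tile(np.linspace(0, 12, n_samples), n_mothers)      # time since waking
        intercepts = np.repeat(rng.normal(0, 0.3, n_mothers), n_samples)
        cortisol = (10 - 0.5 * hours - 0.4 * anxiety + intercepts
                    + rng.normal(0, 0.5, n_mothers * n_samples))

        df = pd.DataFrame({"mother_id": mother, "anxiety": anxiety,
                           "hours_since_waking": hours, "cortisol": cortisol})

        # Fixed effects for time of day and anxiety; random intercept per mother
        model = smf.mixedlm("cortisol ~ hours_since_waking + anxiety",
                            data=df, groups="mother_id")
        print(model.fit().summary())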

    Transcripts of 28 interviews with researchers who fabricated data for an experiment

    For an experiment we recently conducted, we asked researchers to fabricate data for a Stroop experiment. The purpose of this experiment was to test whether we could use statistics to discern the fabricated data from genuine data we collected from Many Labs 3. We also interviewed the researchers about how they fabricated the data, in order to learn how researchers actually fabricate data. We share these transcripts here for maximum reuse under a CC0 license.

    Preregistration in practice: A comparison of preregistered and non-preregistered studies in psychology

    Preregistration has gained traction as one of the most promising solutions to improve the replicability of scientific effects. In this project, we compared 193 psychology studies that earned a Preregistration Challenge prize or preregistration badge to 193 related studies that were not preregistered. In contrast to our theoretical expectations and prior research, we did not find that preregistered studies had a lower proportion of positive results (Hypothesis 1), smaller effect sizes (Hypothesis 2), or fewer statistical errors (Hypothesis 3) than non-preregistered studies. Supporting our Hypotheses 4 and 5, we found that preregistered studies more often contained power analyses and typically had larger sample sizes than non-preregistered studies. Finally, concerns about the publishability and impact of preregistered studies seem unwarranted, as preregistered studies did not take longer to publish and scored better on several impact measures. Overall, our data indicate that preregistration has beneficial effects in the realm of statistical power and impact, but we did not find robust evidence that preregistration prevents p-hacking and HARKing (Hypothesizing After the Results are Known).

    Same data, different conclusions: Radical dispersion in empirical results when independent analysts operationalize and test the same hypothesis

    In this crowdsourced initiative, independent analysts used the same dataset to test two hypotheses regarding the effects of scientists’ gender and professional status on verbosity during group meetings. Not only the analytic approach but also the operationalizations of key variables were left unconstrained and up to individual analysts. For instance, analysts could choose to operationalize status as job title, institutional ranking, citation counts, or some combination. To maximize transparency regarding the process by which analytic choices are made, the analysts used a platform we developed called DataExplained to justify both preferred and rejected analytic paths in real time. Analyses lacking sufficient detail or reproducible code, or containing statistical errors, were excluded, resulting in 29 analyses in the final sample. Researchers reported radically different analyses and dispersed empirical outcomes, in a number of cases obtaining significant effects in opposite directions for the same research question. A Boba multiverse analysis demonstrates that decisions about how to operationalize variables explain variability in outcomes above and beyond statistical choices (e.g., covariates). Subjective researcher decisions play a critical role in driving the reported empirical results, underscoring the need for open data, systematic robustness checks, and transparency regarding both analytic paths taken and not taken. Implications for organizations and leaders, whose decision making relies in part on scientific findings, consulting reports, and internal analyses by data scientists, are discussed.
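
    The multiverse idea can be illustrated with a small sketch; this is not the project's DataExplained platform or its Boba analysis. It crosses a few hypothetical operationalizations of status with a few covariate sets, fits each specification, and records how the estimated gender effect on verbosity shifts. All variable names and the synthetic data are assumptions.

        # Toy multiverse over operationalization and covariate choices (synthetic data).
        import itertools
        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(1)
        n = 300
        df = pd.DataFrame({
            "words_spoken": rng.poisson(200, n),          # verbosity outcome
            "gender": rng.integers(0, 2, n),
            "job_title_rank": rng.integers(1, 5, n),
            "citation_count": rng.lognormal(3, 1, n),
            "institution_rank": rng.integers(1, 100, n),
            "meeting_length": rng.normal(60, 10, n),
            "group_size": rng.integers(3, 12, n),
        })

        status_choices = ["job_title_rank", "np.log(citation_count)", "institution_rank"]
        covariate_choices = ["", " + meeting_length", " + meeting_length + group_size"]

        rows = []
        for status, covs in itertools.product(status_choices, covariate_choices):
            fit = smf.ols(f"words_spoken ~ {status} + gender{covs}", data=df).fit()
            rows.append({"status": status, "covariates": covs or "none",
                         "gender_coef": fit.params["gender"],
                         "gender_p": fit.pvalues["gender"]})

        print(pd.DataFrame(rows))   # one row per path through the toy multiverse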

    Justify your alpha

    In response to recommendations to redefine statistical significance to P ≤ 0.005, we propose that researchers should transparently report and justify all choices they make when designing a study, including the alpha level.
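
    One way to act on this recommendation is to show explicitly how the chosen alpha trades off against sample size and power. The sketch below is a hedged illustration, not taken from the paper, using statsmodels' power calculations for an assumed effect size of d = 0.3.

        # Required per-group n for 80% power at two candidate alpha levels (assumed d = 0.3).
        from statsmodels.stats.power import TTestIndPower

        power_analysis = TTestIndPower()
        for alpha in (0.05, 0.005):
            n_per_group = power_analysis.solve_power(effect_size=0.3, alpha=alpha,
                                                     power=0.80, alternative="two-sided")
            print(f"alpha = {alpha}: about {n_per_group:.0f} participants per group")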
