
    The validity of the tool “statcheck” in discovering statistical reporting inconsistencies

    The R package “statcheck” (Epskamp & Nuijten, 2016) is a tool to extract statistical results from articles and check whether the reported p-value matches the accompanying test statistic and degrees of freedom. A previous study showed high interrater reliabilities (between .76 and .89) between statcheck and manual coding of inconsistencies (Nuijten, Hartgerink, Van Assen, Epskamp, & Wicherts, 2016). Here we present an additional, detailed study of the validity of statcheck. In Study 1, we calculated its sensitivity and specificity. We found that statcheck’s sensitivity (true positive rate) and specificity (true negative rate) were high: between 85.3% and 100%, and between 96.0% and 100%, respectively, depending on the assumptions and settings. The overall accuracy of statcheck ranged from 96.2% to 99.9%. In Study 2, we investigated statcheck’s ability to deal with statistical corrections for multiple testing or violations of assumptions in articles. We found that the prevalence of corrections for multiple testing or violations of assumptions in psychology was higher than we initially estimated in Nuijten et al. (2016). Although we found numerous reporting inconsistencies in results corrected for violations of the sphericity assumption, we demonstrate that inconsistencies associated with statistical corrections are not what is causing the high estimates of the prevalence of statistical reporting inconsistencies in psychology.
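    The consistency check described in this abstract can be illustrated with a minimal sketch. It is not statcheck's actual code (statcheck is an R package); it is a Python illustration restricted to a two-sided z-test, which needs no degrees of freedom and only the standard library. The function names, the default numbers of decimals, and the rounding rule are assumptions made for this example. Tests such as t, F, or chi-square would additionally need the degrees of freedom and the matching distribution function.

```python
import math


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value of a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def is_consistent(z: float, reported_p: float,
                  stat_decimals: int = 2, p_decimals: int = 3) -> bool:
    """Check whether a reported p-value is compatible with a reported z.

    Assumption for this sketch: the reported z was rounded to
    `stat_decimals` places, so the true statistic lies within half a
    unit of the last decimal. The reported p counts as consistent if it
    falls inside the range of recomputed p-values, each rounded to
    `p_decimals` places.
    """
    half_unit = 0.5 * 10 ** (-stat_decimals)
    # A smaller |z| gives a larger p, and vice versa.
    p_upper = two_sided_p_from_z(max(abs(z) - half_unit, 0.0))
    p_lower = two_sided_p_from_z(abs(z) + half_unit)
    eps = 1e-12  # guards against floating-point noise in the comparison
    return (round(p_lower, p_decimals) - eps
            <= reported_p
            <= round(p_upper, p_decimals) + eps)


if __name__ == "__main__":
    print(is_consistent(1.96, 0.05))  # reported "z = 1.96, p = .05"
    print(is_consistent(1.96, 0.01))  # reported "z = 1.96, p = .01"
```

    Allowing for rounding of the reported statistic keeps the check from flagging results that differ only because of rounding; how strictly to treat such cases is a design choice.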

    The dire disregard of measurement invariance testing in psychological science

    In psychological science, self-report scales are widely used to compare means in targeted latent constructs across time points, groups, or experimental conditions. For these scale mean comparisons (SMC) to be meaningful and unbiased, the scales should be measurement invariant across the compared time points or (experimental) groups. Measurement invariance (MI) testing checks whether the latent constructs are measured equivalently across groups or time points. Since MI is essential for meaningful comparisons, we conducted a systematic review to check whether MI is taken seriously in psychological research. Specifically, we sampled 426 psychology articles with openly available data that involved a total of 918 SMCs to (1) investigate common practices in conducting and reporting of MI testing, (2) check whether reported MI test results can be reproduced, and (3) conduct MI tests for the SMCs that enabled sufficiently powerful MI testing with the shared data. Our results indicate that (1) 4% of the 918 scales underwent MI testing across groups or time and that these tests were generally poorly reported, (2) none of the reported MI tests could be successfully reproduced, and (3) of 161 newly performed MI tests, a mere 46 (29%) reached sufficient MI (scalar invariance), and MI often failed completely (89; 55%). Thus, MI tests were rarely done and poorly reported in psychological studies, and the frequent violations of MI indicate that reported group differences cannot be solely attributed to group differences in the latent constructs. We offer recommendations on reporting MI tests and improving computational reproducibility practices.

    The meta-plot: A graphical tool for interpreting the results of a meta-analysis

    The meta-plot is a descriptive visual tool for meta-analysis that provides information on the primary studies in the meta-analysis and the results of the meta-analysis. More precisely, the meta-plot portrays (i) the precision and statistical power of the primary studies in the meta-analysis, (ii) the estimate and confidence interval of a random-effects meta-analysis, (iii) the results of a cumulative random-effects meta-analysis yielding a robustness check of the meta-analytic effect size with respect to primary studies’ precision, and (iv) evidence of publication bias. After explaining the underlying logic and theory, the meta-plot is applied to two cherry-picked meta-analyses that appear to be biased and to ten meta-analyses randomly selected from the psychological literature. We recommend using the meta-plot in addition to any meta-analysis of common effect size measures, rather than variants of the funnel plot.

    A many-analysts approach to the relation between religiosity and well-being
