
    This (method) is (not) fine

    Summary: In their response to my criticism of their recent article in Journal of Biosocial Science (te Nijenhuis et al., 2017), te Nijenhuis and van den Hoek (2018) raise four points, none of which concerns my main point that the method of correlated vectors (MCV) applied to item-level data is a flawed method. Here, I discuss te Nijenhuis and van den Hoek's four points. First, I argue that my previous application of MCV to item-level data showed that the method can yield nonsensical results. Second, I note that meta-analytic corrections for sampling error, imperfect measures, restriction of range and unreliability of the vectors are futile and cannot help fix the method. Third, I note that even with perfect data, the method can yield negative correlations. Fourth, I highlight the irrelevance of te Nijenhuis and van den Hoek's (2018) point that my comment had not been published in a peer-reviewed journal by referring to my 2009 and 2017 articles on MCV in peer-reviewed journals.
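
    For readers unfamiliar with it, the method of correlated vectors at the item level reduces to a single correlation between two vectors of item statistics. The sketch below uses simulated, purely hypothetical numbers to show the computation only; the dispute summarized above concerns what such a correlation can legitimately show for item-level data, not how it is computed.

```python
# A minimal sketch of the method of correlated vectors (MCV) at the item level:
# correlate a vector of item "g-loadings" with a vector of group differences on
# the same items. All numbers are simulated for illustration only.
import numpy as np

rng = np.random.default_rng(1)
n_items = 40

# Hypothetical item statistics.
g_loadings = rng.uniform(0.2, 0.8, n_items)          # loading of each item on g
group_differences = 0.5 * g_loadings + rng.normal(0, 0.15, n_items)

# MCV reduces the question to one correlation between the two vectors.
r = np.corrcoef(g_loadings, group_differences)[0, 1]
print(f"vector correlation r = {r:.2f}")
```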

    The (mis)reporting of statistical results in psychology journals

    To study the prevalence, nature (direction), and causes of reporting errors in psychology, we checked the consistency of reported test statistics, degrees of freedom, and p values in a random sample of high- and low-impact psychology journals. In a second study, we established the generality of reporting errors in a random sample of recent psychological articles. Our results, on the basis of 281 articles, indicate that around 18% of statistical results in the psychological literature are incorrectly reported. Inconsistencies were more common in low-impact journals than in high-impact journals. Moreover, around 15% of the articles contained at least one statistical conclusion that proved, upon recalculation, to be incorrect; that is, recalculation rendered the previously significant result insignificant, or vice versa. These errors were often in line with researchers’ expectations. We classified the most common errors and contacted authors to shed light on the origins of the errors.
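
    The consistency checks described above amount to recomputing a p value from the reported test statistic and degrees of freedom and comparing it with the reported p value. The sketch below is an illustration of that idea, not the authors' actual procedure; the helper function and its rounding tolerance are assumptions for the example.

```python
# Illustration of a reporting-consistency check: recompute the two-sided p value
# from a reported t statistic and degrees of freedom, and flag reports whose
# rounded p value cannot match it. Hypothetical helper, not the authors' code.
from scipy import stats

def check_t_report(t: float, df: int, reported_p: float, decimals: int = 2) -> bool:
    """True if the reported two-sided p value is consistent with t and df."""
    recomputed = 2 * stats.t.sf(abs(t), df)
    # Allow for rounding of the reported p value to `decimals` places.
    return abs(recomputed - reported_p) <= 0.5 * 10 ** (-decimals)

print(check_t_report(t=2.20, df=28, reported_p=0.04))  # consistent report
print(check_t_report(t=1.50, df=28, reported_p=0.04))  # inconsistent report
```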

    Correcting for outcome reporting bias in a meta-analysis: A meta-regression approach

    Outcome reporting bias (ORB) refers to the biasing effect caused by researchers selectively reporting outcomes within a study based on their statistical significance. ORB inflates effect size estimates in meta-analysis when, due to ORB, only the outcome with the largest effect size is reported. We propose a new method (CORB) to correct for ORB that includes an estimate of the variability of the outcomes’ effect size as a moderator in a meta-regression model. An estimate of the variability of the outcomes’ effect size can be computed by assuming a correlation among the outcomes. Results of a Monte Carlo simulation study showed that the effect size in meta-analyses may be severely overestimated without correcting for ORB. Estimates of CORB are close to the true effect size when the overestimation caused by ORB is largest. Applying the method to a meta-analysis on the effect of playing violent video games on aggression showed that the effect size estimate decreased when correcting for ORB. We recommend routinely applying methods to correct for ORB in any meta-analysis. We provide annotated R code and functions to help researchers apply the CORB method.
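
    As a rough illustration of the idea (not the authors' CORB implementation, which accompanies the paper as annotated R code), the sketch below simulates studies whose reported effects grow with the variability of their outcomes' effect sizes, then fits an inverse-variance-weighted meta-regression with that variability as the moderator; the intercept serves as the ORB-adjusted estimate. All variable names and simulated values are assumptions for the example.

```python
# Sketch of the general CORB idea on simulated data (assumed variable names,
# not the authors' implementation): meta-regress effect sizes on an estimate of
# the variability of each study's outcome effect sizes.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
k = 30                                    # number of studies (simulated)
v = rng.uniform(0.01, 0.05, k)            # sampling variances of reported effects
outcome_sd = rng.uniform(0.05, 0.25, k)   # estimated SD of each study's outcomes
true_effect = 0.2
# Under ORB, studies with more variable outcomes report larger effects.
d = true_effect + 1.0 * outcome_sd + rng.normal(0, np.sqrt(v))

X = sm.add_constant(outcome_sd)           # intercept + moderator
fit = sm.WLS(d, X, weights=1 / v).fit()   # inverse-variance weighted meta-regression
print(fit.params)                         # intercept ~ ORB-adjusted effect size
```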

    The influence of gender stereotype threat on mathematics test scores of Dutch high school students: A registered report

    The effects of gender stereotype threat on mathematical test performance in the classroom have been extensively studied in several cultural contexts. Theory predicts that stereotype threat lowers girls’ performance on mathematics tests, while leaving boys’ math performance unaffected. We conducted a large-scale stereotype threat experiment in Dutch high schools (N = 2064) to study the generalizability of the effect. In this registered report, we set out to replicate the overall effect among female high school students and to study four core theoretical moderators, namely domain identification, gender identification, math anxiety, and test difficulty. Among the girls, we found neither an overall effect of stereotype threat on math performance, nor any moderated stereotype threat effects. Most variance in math performance was explained by gender, domain identification, and math identification. We discuss several theoretical and statistical explanations for these findings. Our results are limited to the studied population (i.e., Dutch high school students, aged 13–14) and the studied domain (mathematics).

    A systematic review comparing two popular methods to assess a Type D personality effect

    Introduction: Type D personality, operationalized as high scores on negative affectivity (NA) and social inhibition (SI), has been associated with various medical and psychosocial outcomes. The recent failure to replicate several earlier findings could result from the various methods used to assess the Type D effect. Despite recommendations to analyze the continuous NA and SI scores, a popular approach groups people as having Type D personality or not. This method does not adequately detect a Type D effect, as it is also sensitive to main effects of NA or SI only, suggesting the literature contains false positive Type D effects. Here, we systematically assess the extent of this problem. Method: We conducted a systematic review including 44 published studies assessing a Type D effect with both a continuous and a dichotomous operationalization. Results: The dichotomous method showed poor agreement with the continuous Type D effect. Of the 89 significant dichotomous method effects, 37 (41.6%) were Type D effects according to the continuous method. The remaining 52 (58.4%) are therefore likely not Type D effects based on the continuous method, as 42 (47.2%) were main effects of NA or SI only. Conclusion: Half of the published Type D effects according to the dichotomous method may be false positives, with only NA or SI driving the outcome.
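
    A small simulation (not data from the review) can illustrate the problem: when only NA drives the outcome, the dichotomous Type D group comparison can still yield a significant result, whereas the recommended continuous analysis with NA, SI and their interaction contains no interaction effect by construction. The cutoffs and effect sizes below are assumptions for the example.

```python
# Simulated illustration (not data from the review): only NA affects the
# outcome, yet the dichotomous Type D comparison is easily "significant",
# while the NA x SI interaction is not part of the data-generating model.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000
na = rng.normal(size=n)                   # negative affectivity (z-scores)
si = rng.normal(size=n)                   # social inhibition (z-scores)
outcome = 0.3 * na + rng.normal(size=n)   # only NA matters, no NA x SI synergy

df = pd.DataFrame({
    "na": na,
    "si": si,
    "outcome": outcome,
    "type_d": ((na > 0) & (si > 0)).astype(int),  # dichotomous Type D grouping
})

print(smf.ols("outcome ~ type_d", df).fit().pvalues["type_d"])  # group difference, driven by NA alone
print(smf.ols("outcome ~ na * si", df).fit().pvalues["na:si"])  # interaction; none was simulated
```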

    Letting the daylight in: Reviewing the reviewers and other ways to maximize transparency in science

    With the emergence of online publishing, opportunities to maximize transparency of scientific research have grown considerably. However, these possibilities are still only marginally used. We argue for the implementation of (1) peer-reviewed peer review, (2) transparent editorial hierarchies, and (3) online data publication. First, peer-reviewed peer review entails a community-wide review system in which reviews are published online and rated by peers. This ensures accountability of reviewers, thereby increasing the academic quality of reviews. Second, reviewers who write many highly regarded reviews may move to higher editorial positions. Third, online publication of data ensures the possibility of independent verification of inferential claims in published papers. This counters statistical errors and overly positive reporting of statistical results. We illustrate the benefits of these strategies by discussing an example in which the classical publication system has gone awry, namely controversial IQ research. We argue that this case would likely have been avoided under more transparent publication practices. We argue that the proposed system leads to better reviews, meritocratic editorial hierarchies, and a higher degree of replicability of statistical analyses.

    Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results

    Background: The widespread reluctance to share published research data is often hypothesized to be due to the authors' fear that reanalysis may expose errors in their work or may produce conclusions that contradict their own. However, these hypotheses have not previously been studied systematically. Methods and Findings: We related the reluctance to share research data for reanalysis to 1148 statistically significant results reported in 49 papers published in two major psychology journals. We found the reluctance to share data to be associated with weaker evidence (against the null hypothesis of no effect) and a higher prevalence of apparent errors in the reporting of statistical results. The unwillingness to share data was particularly clear when reporting errors had a bearing on statistical significance. Conclusions: Our findings on the basis of psychological papers suggest that statistical results are particularly hard to verify when reanalysis is more likely to lead to contrasting conclusions. This highlights the importance of establishing mandatory data archiving policies.

    Heterogeneity in direct replications in psychology and its association with effect size

    We examined the evidence for heterogeneity (of effect sizes) when only minor changes to sample population and settings were made between studies, and explored the association between heterogeneity and average effect size in a sample of 68 meta-analyses from 13 preregistered multilab direct replication projects in social and cognitive psychology. Among the many examined effects, examples include the Stroop effect, the "verbal overshadowing" effect, and various priming effects such as "anchoring" effects. We found limited heterogeneity; 48/68 (71%) meta-analyses had nonsignificant heterogeneity, and most (49/68; 72%) were most likely to have zero to small heterogeneity. Power to detect small heterogeneity (as defined by Higgins, Thompson, Deeks, & Altman, 2003) was low for all projects (mean 43%), but good to excellent for medium and large heterogeneity. Our findings thus show little evidence of widespread heterogeneity in direct replication studies in social and cognitive psychology, suggesting that minor changes in sample population and settings are unlikely to affect research outcomes in these fields of psychology. We also found strong correlations between observed average effect sizes (standardized mean differences and log odds ratios) and heterogeneity in our sample. Our results suggest that heterogeneity and moderation of effects are unlikely for an average true effect size of zero, but increasingly likely for larger average true effect sizes.
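
    The heterogeneity benchmarks referred to above (Higgins, Thompson, Deeks, & Altman, 2003) are defined on the I² scale, with roughly 25%, 50% and 75% marking small, medium and large heterogeneity. A minimal sketch of the standard computation of Cochran's Q and I² is given below; the effect sizes and variances are made up for illustration and are not the projects' data.

```python
# Standard heterogeneity statistics for a set of replication effect sizes:
# Cochran's Q and I^2 (the scale on which the Higgins et al. benchmarks are
# defined). Effect sizes and variances below are made-up examples.
import numpy as np

def q_and_i2(effects, variances):
    effects = np.asarray(effects)
    w = 1.0 / np.asarray(variances)              # inverse-variance weights
    pooled = np.sum(w * effects) / np.sum(w)     # fixed-effect pooled estimate
    q = float(np.sum(w * (effects - pooled) ** 2))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

d = [0.12, 0.05, 0.20, 0.09, 0.15, 0.02, 0.11]   # standardized mean differences
v = [0.02, 0.03, 0.02, 0.04, 0.03, 0.02, 0.03]   # their sampling variances
print(q_and_i2(d, v))
```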

    Latent logistic interaction modeling: A simulation and empirical illustration of Type D personality

    This study focuses on three popular methods to model interactions between two constructs containing measurement error in predicting an observed binary outcome: logistic regression using (1) observed scores, (2) factor scores, and (3) Structural Equation Modeling (SEM). It is still unclear how they compare with respect to bias and precision in the estimated interaction when item scores underlying the interaction constructs are skewed and ordinal. In this article, we investigated this issue using both a Monte Carlo simulation and an empirical illustration of the effect of Type D personality on cardiac events. Our results indicated that logistic regression using SEM performed best in terms of bias and confidence interval coverage, especially at sample sizes of 500 or larger. Although for most methods bias increased when item scores were skewed and ordinal, SEM produced relatively unbiased interaction effect estimates when items were modeled as ordered categorical.
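
    Of the three approaches compared above, the observed-score method is the simplest: a logistic regression of the binary outcome on the NA and SI sum scores and their product. The sketch below illustrates only that baseline method on simulated data; the factor-score and SEM-based alternatives require a latent-variable model and are not shown.

```python
# Observed-score logistic regression with an NA x SI product term, on simulated
# data; the factor-score and SEM approaches compared in the paper are not shown.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 800
na = rng.normal(size=n)                   # observed NA sum score (standardized)
si = rng.normal(size=n)                   # observed SI sum score (standardized)
logit = -1.0 + 0.4 * na + 0.3 * si + 0.25 * na * si
event = rng.binomial(1, 1 / (1 + np.exp(-logit)))   # binary outcome, e.g. cardiac event

df = pd.DataFrame({"na": na, "si": si, "event": event})
fit = smf.logit("event ~ na * si", df).fit(disp=0)
print(fit.params["na:si"])                # estimated interaction on the logit scale
```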

    Detection of data fabrication using statistical tools

    Scientific misconduct potentially invalidates findings in many scientific fields. Improved detection of unethical practices like data fabrication is considered to deter such practices. In two studies, we investigated the diagnostic performance of various statistical methods to detect fabricated quantitative data from psychological research. In Study 1, we tested the validity of statistical methods to detect fabricated data at the study level using summary statistics. Using (arguably) genuine data from the Many Labs 1 project on the anchoring effect (k=36) and fabricated data for the same effect by our participants (k=39), we tested the validity of our newly proposed 'reversed Fisher method', variance analyses, and extreme effect sizes, as well as a combination of these three indicators using the original Fisher method. Results indicate that the variance analyses perform fairly well when the homogeneity of population variances is accounted for, and that extreme effect sizes perform similarly well in distinguishing genuine from fabricated data. The performance of the 'reversed Fisher method' was poor and depended on the types of tests included. In Study 2, we tested the validity of statistical methods to detect fabricated data using raw data. Using (arguably) genuine data from the Many Labs 3 project on the classic Stroop task (k=21) and fabricated data for the same effect by our participants (k=28), we investigated the performance of digit analyses, variance analyses, multivariate associations, and extreme effect sizes, as well as a combination of these four methods using the original Fisher method. Results indicate that variance analyses, extreme effect sizes, and multivariate associations perform fairly well to excellently in detecting fabricated data using raw data, while digit analyses perform at chance levels. The two studies provide mixed results on how the use of random number generators affects the detection of data fabrication. Ultimately, we consider the variance analyses, effect sizes, and multivariate associations valuable tools to detect potential data anomalies in empirical (summary or raw) data. However, we argue against widespread (possibly automatic) application of these tools, because some fabricated data may be irregular in one aspect but not in another. Considering how violations of the assumptions of fabrication detection methods may yield high false positive or false negative probabilities, we recommend comparing potentially fabricated data to genuine data on the same topic.
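
    One ingredient named above, the original Fisher method, combines the p values of several independent indicators into one chi-square test. The sketch below shows only that combination step on hypothetical p values; the underlying indicators (variance analyses, digit analyses, extreme effect sizes, multivariate associations) are not reproduced here.

```python
# Fisher's method for combining independent p values into one chi-square test,
# as used above to combine separate fabrication indicators. Input p values here
# are hypothetical.
import numpy as np
from scipy import stats

def fisher_combine(p_values):
    """Combine independent p values; a small result means jointly surprising data."""
    p = np.asarray(p_values, dtype=float)
    chi2 = -2.0 * np.sum(np.log(p))
    return stats.chi2.sf(chi2, df=2 * len(p))

print(fisher_combine([0.04, 0.20, 0.01]))  # p values from three independent indicators
```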