Detection of data fabrication using statistical tools

Abstract

Scientific misconduct potentially invalidates findings in many scientific fields. Improved detection of unethical practices such as data fabrication is thought to deter them. In two studies, we investigated the diagnostic performance of various statistical methods to detect fabricated quantitative data from psychological research. In Study 1, we tested the validity of statistical methods to detect fabricated data at the study level using summary statistics. Using (arguably) genuine data from the Many Labs 1 project on the anchoring effect (k=36) and data for the same effect fabricated by our participants (k=39), we tested the validity of our newly proposed 'reversed Fisher method', variance analyses, and extreme effect sizes, as well as a combination of these three indicators using the original Fisher method. Results indicate that the variance analyses perform fairly well when the homogeneity of population variances is accounted for, and that extreme effect sizes perform similarly well in distinguishing genuine from fabricated data. The performance of the 'reversed Fisher method' was poor and depended on the types of tests included. In Study 2, we tested the validity of statistical methods to detect fabricated data using raw data. Using (arguably) genuine data from the Many Labs 3 project on the classic Stroop task (k=21) and data for the same effect fabricated by our participants (k=28), we investigated the performance of digit analyses, variance analyses, multivariate associations, and extreme effect sizes, as well as a combination of these four methods using the original Fisher method. Results indicate that variance analyses, extreme effect sizes, and multivariate associations perform fairly well to excellently in detecting fabricated data using raw data, whereas digit analyses perform at chance levels. The two studies provide mixed results on how the use of random number generators affects the detection of data fabrication. Ultimately, we consider variance analyses, effect sizes, and multivariate associations valuable tools to detect potential data anomalies in empirical (summary or raw) data. However, we argue against widespread (possibly automated) application of these tools, because some fabricated data may be irregular in one aspect but not in another. Considering that violations of the assumptions of fabrication detection methods may yield high false positive or false negative probabilities, we recommend comparing potentially fabricated data with genuine data on the same topic.
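
The abstract refers twice to combining individual indicators with the original Fisher method. For readers unfamiliar with it, the sketch below shows how Fisher's method aggregates independent p-values into a single chi-squared statistic with 2k degrees of freedom. This is a generic illustration only, not the authors' implementation; the example p-values are hypothetical and the use of SciPy is an assumption made for the example.

    # Generic sketch of Fisher's method for combining p-values (not the
    # authors' code); the p-values below are made-up example inputs.
    import numpy as np
    from scipy import stats

    p_values = np.array([0.04, 0.20, 0.01, 0.33])  # hypothetical per-test p-values

    # Fisher's method: chi2 = -2 * sum(ln p_i), compared against a
    # chi-squared distribution with 2k degrees of freedom.
    chi2 = -2 * np.sum(np.log(p_values))
    combined_p = stats.chi2.sf(chi2, df=2 * len(p_values))
    print(f"chi2 = {chi2:.2f}, combined p = {combined_p:.4f}")

    # SciPy provides the same combination directly:
    stat, p = stats.combine_pvalues(p_values, method="fisher")
    print(f"combine_pvalues: chi2 = {stat:.2f}, p = {p:.4f}")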
