3,264 research outputs found
Detecting p-hacking
We theoretically analyze the problem of testing for p-hacking based on
distributions of p-values across multiple studies. We provide general results
for when such distributions have testable restrictions (are non-increasing)
under the null of no p-hacking. We find novel additional testable
restrictions for p-values based on t-tests. Specifically, the shape of the
power functions results in both complete monotonicity as well as bounds on the
distribution of p-values. These testable restrictions result in more powerful
tests for the null hypothesis of no p-hacking. A reanalysis of two prominent
datasets shows the usefulness of our new tests.
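The key restriction in the abstract above can be illustrated with a short simulation (a hedged sketch, not the paper's actual test): under the null of no p-hacking, p-values from z-tests pooled across studies have a non-increasing density, whatever the distribution of true effects, so a local bump in the p-curve is evidence against the null. The exponential effect distribution and the bin width below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Under no p-hacking, each study reports its first and only p-value.
n_studies = 100_000
true_effects = rng.exponential(scale=1.0, size=n_studies)  # assumed effect sizes
z = rng.normal(loc=true_effects, scale=1.0)                # one z-statistic per study
pvals = 2 * (1 - norm.cdf(np.abs(z)))                      # two-sided p-values

counts, _ = np.histogram(pvals, bins=20, range=(0.0, 1.0))
print(counts)
# The bin counts decay from p = 0 toward p = 1; a bump just below 0.05
# would violate this monotonicity and point to p-hacking.
```

A test for p-hacking along these lines checks whether the observed p-curve can be reconciled with some non-increasing density.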
"Learn to p-hack like the pros!"
The replication crisis has hit several scientific fields. The most systematic investigation has been done in psychology, which revealed replication rates below 40% (Open Science Collaboration, 2015). However, the same problem has been well documented in other disciplines, for example preclinical cancer research or economics. It has been argued that one reason for the high prevalence of false-positive findings is the application of "creative" data analysis techniques that allow researchers to present nearly any noise as significant. Researchers who use such techniques, also called "p-hacking" or "questionable research practices", have higher chances of getting things published. What is the consequence? The answer is clear. Everybody should be equipped with these powerful tools of research enhancement. This talk covers the most commonly applied p-hacking tools and shows which work best to enhance your research output: "If you torture the data long enough, it will confess!". But be careful: recently developed tools allow the detection of p-hacking. The talk also covers some ideas on how to overcome the replication crisis.
Modelling publication bias and p-hacking
Publication bias and p-hacking are two well-known phenomena that strongly
affect the scientific literature and cause severe problems in meta-analyses.
Due to these phenomena, the assumptions of meta-analyses are seriously violated
and the results of the studies cannot be trusted. While publication bias is
almost perfectly captured by the weighting function selection model, p-hacking
is much harder to model and no definitive solution has been found yet. In this
paper we propose to model both publication bias and p-hacking with selection
models. We derive some properties for these models, and we compare them
formally and through simulations. Finally, two real data examples are used to
show how the models work in practice.
Comment: 21 pages, 6 figures
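The weighting-function idea mentioned above can be sketched in a toy simulation (an illustrative assumption, not the paper's specification): each study is published with probability w(p), here a step function that favors significant results. Selection then distorts the published p-value distribution that a meta-analysis would see.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

# Toy step-weight selection model: publish with probability 1 if p < 0.05,
# else with probability gamma < 1 (both values are illustrative).
gamma = 0.2
z = rng.normal(loc=0.5, scale=1.0, size=50_000)   # assumed true effect of 0.5
p = 2 * (1 - norm.cdf(np.abs(z)))                 # two-sided p-values
published = rng.uniform(size=p.size) < np.where(p < 0.05, 1.0, gamma)

frac_sig_all = np.mean(p < 0.05)
frac_sig_pub = np.mean(p[published] < 0.05)
print(f"significant among all studies:       {frac_sig_all:.2f}")
print(f"significant among published studies: {frac_sig_pub:.2f}")  # inflated by selection
```

The published sample over-represents significant findings, which is exactly the violation of meta-analytic assumptions the abstract describes.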
Integrable clusters
The goal of this note is to study quantum clusters in which cluster variables
(not coefficients) commute with each other. It turns out that this property is
preserved by mutations. Remarkably, this is equivalent to the celebrated sign
coherence conjecture recently proved by M. Gross, P. Hacking, S. Keel and M.
Kontsevich.
Comment: 3 pages
Incentive-Compatible Critical Values
Statistical hypothesis tests are a cornerstone of scientific research. The
tests are informative when their size is properly controlled, so the frequency
of rejecting true null hypotheses (type I error) stays below a prespecified
nominal level. Publication bias exaggerates test sizes, however. Since
scientists can typically only publish results that reject the null hypothesis,
they have the incentive to continue conducting studies until attaining
rejection. Such p-hacking takes many forms: from collecting additional data
to examining multiple regression specifications, all in the search of
statistical significance. The process inflates test sizes above their nominal
levels because the critical values used to determine rejection assume that test
statistics are constructed from a single study---abstracting from p-hacking.
This paper addresses the problem by constructing critical values that are
compatible with scientists' behavior given their incentives. We assume that
researchers conduct studies until finding a test statistic that exceeds the
critical value, or until the benefit from conducting an extra study falls below
the cost. We then solve for the incentive-compatible critical value (ICCV).
When the ICCV is used to determine rejection, readers can be confident that
size is controlled at the desired significance level, and that the researcher's
response to the incentives delineated by the critical value is accounted for.
Since they allow researchers to search for significance among multiple studies,
ICCVs are larger than classical critical values. Yet, for a broad range of
researcher behaviors and beliefs, ICCVs lie in a fairly narrow range.
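A toy version of the mechanism in this abstract can be simulated (under strong simplifying assumptions: independent studies and a fixed maximum number K of attempts as a stand-in for the cost cutoff; the paper's ICCV accounts for richer behavior). Under the null, re-running studies until |z| exceeds 1.96 inflates size well above 5%, and raising the critical value restores it.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
alpha, K = 0.05, 3  # nominal size; assumed max number of studies before giving up

def rejection_rate(critical_value, n_sim=200_000):
    """Fraction of projects that reject a true null when the researcher
    reruns the study up to K times until |z| > critical_value."""
    z = rng.normal(size=(n_sim, K))  # the null is true in every study
    return np.mean(np.any(np.abs(z) > critical_value, axis=1))

classical = norm.ppf(1 - alpha / 2)          # 1.96: assumes a single study
# Size-correct value when up to K independent attempts are allowed:
per_study_alpha = 1 - (1 - alpha) ** (1 / K)
adjusted = norm.ppf(1 - per_study_alpha / 2)  # larger than 1.96, as the abstract notes

print(f"classical c = {classical:.2f}, size = {rejection_rate(classical):.3f}")
print(f"adjusted  c = {adjusted:.2f}, size = {rejection_rate(adjusted):.3f}")
# The classical size is close to 1 - 0.95**3, about 0.143; the adjusted size is near 0.05.
```

The adjusted value here is a simple multiple-testing correction, not the paper's ICCV, but it shows why incentive-aware critical values must exceed classical ones.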
HARKing and P-Hacking: A Call for More Transparent Reporting of Studies in the Information Systems Field
While researchers are expected to look for significant results to confirm their hypotheses, some engage in intentional or unintentional HARKing (Hypothesizing After Results are Known) and p-hacking (repeated tinkering with data and retesting). If these practices are widespread, one possible result is field-wide exaggerated (inflated) results reported in Information Systems (IS) publications. In this paper, we summarize the literature on HARKing and p-hacking across different disciplines. We offer an illustrative example of how an IS study could involve HARKing and p-hacking at various stages of the project to generate a more “publishable” result. We also report on a survey targeted at IS researchers to explore their experiences and awareness of this issue. Finally, we provide recommendations and suggestions based on the review of practices in other fields and advocate for more transparency in reporting research projects, so that study results can be interpreted properly, and reproducibility and replicability can be increased.
P-hacking in Clinical Trials: A Meta-Analytical Approach
Clinical trials play a decisive role in the drug approval process. By completing a p-curve analysis of a newly compiled data set that consists of thousands of clinical trials, we substantiate that the occurrence of p-hacking in clinical trials is not merely hypothetical. Medical and pharmaceutical research consists of both primary and secondary study endpoints. The primary finding covers the main effect, which directly influences the approval process, while the secondary outcome delivers further additional information. For primary p-curves, we observed an abnormal increase in the p-value frequency at common significance thresholds, while the secondary p-curves exhibited no such anomaly.
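The threshold anomaly described in this abstract can be reproduced in a toy p-curve (a hedged illustration with invented numbers, not the paper's data): mixing honest test results with a fraction of "hacked" studies whose p-values pile up just below 0.05 produces a bin that breaks the otherwise decreasing curve.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

# Honest two-sided z-tests (assumed effect size 1.0) plus a hacked fraction
# pushed just under the 0.05 threshold (assumed hacking mechanism).
honest = 2 * (1 - norm.cdf(np.abs(rng.normal(1.0, 1.0, size=9_000))))
hacked = rng.uniform(0.040, 0.050, size=1_000)
pvals = np.concatenate([honest, hacked])

bins = np.arange(0.0, 0.101, 0.01)   # inspect p in [0, 0.10] in 0.01-wide bins
counts, _ = np.histogram(pvals, bins=bins)
print(counts)
# An honest p-curve is non-increasing; here the [0.04, 0.05) bin exceeds the
# [0.03, 0.04) bin -- the spike at the significance threshold.
```

Comparing the count just below the threshold with its left neighbor is the intuition behind the frequency anomaly the abstract reports for primary endpoints.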