[Comment] Redefine statistical significance
The lack of reproducibility of scientific studies has caused growing concern over the credibility of claims of new discoveries based on “statistically significant” findings. There has been much progress toward documenting and addressing several causes of this lack of reproducibility (e.g., multiple testing, P-hacking, publication bias, and under-powered studies). However, we believe that a leading cause of non-reproducibility has not yet been adequately addressed: Statistical standards of evidence for claiming discoveries in many fields of science are simply too low. Associating “statistically significant” findings with P < 0.05 results in a high rate of false positives even in the absence of other experimental, procedural and reporting problems.
For fields where the threshold for defining statistical significance is P < 0.05, we propose a change to P < 0.005. This simple step would immediately improve the reproducibility of scientific research in many fields. Results that would currently be called “significant” but do not meet the new threshold should instead be called “suggestive.” While statisticians have known the relative weakness of using P ≈ 0.05 as a threshold for discovery and the proposal to lower it to 0.005 is not new (1, 2), a critical mass of researchers now endorse this change.
We restrict our recommendation to claims of discovery of new effects. We do not address the appropriate threshold for confirmatory or contradictory replications of existing claims. We also do not advocate changes to discovery thresholds in fields that have already adopted more stringent standards (e.g., genomics and high-energy physics research; see Potential Objections below).
We also restrict our recommendation to studies that conduct null hypothesis significance tests. We have diverse views about how best to improve reproducibility, and many of us believe that other ways of summarizing the data, such as Bayes factors or other posterior summaries based on clearly articulated model assumptions, are preferable to P-values. However, changing the P-value threshold is simple and might quickly achieve broad acceptance.
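The claim that a P < 0.05 threshold yields a high rate of false positives can be illustrated with a short calculation. The sketch below is not taken from the abstract: the function name `false_positive_rate`, the statistical power of 0.8, and the prior odds of 1:10 that a tested effect is real are all illustrative assumptions. It also holds power fixed across thresholds, which is a simplification, since at a fixed sample size a stricter threshold lowers power.

```python
def false_positive_rate(alpha, power, prior_odds):
    """Share of 'significant' results that are false positives.

    alpha      -- significance threshold (probability of rejecting a true null)
    power      -- probability of detecting a real effect (assumed value)
    prior_odds -- assumed odds that a tested effect is real, e.g. 1/10
    """
    p_real = prior_odds / (1.0 + prior_odds)   # prior probability of a real effect
    p_null = 1.0 - p_real                      # prior probability of no effect
    false_pos = alpha * p_null                 # significant results with no real effect
    true_pos = power * p_real                  # significant results with a real effect
    return false_pos / (false_pos + true_pos)


# Compare the current and the proposed threshold under the same assumptions.
for alpha in (0.05, 0.005):
    rate = false_positive_rate(alpha, power=0.8, prior_odds=1 / 10)
    print(f"alpha = {alpha}: false positive rate = {rate:.3f}")
```

Under these assumed inputs, roughly 38% of results significant at 0.05 would be false positives, against roughly 6% at 0.005. Different prior odds or power change the numbers but not the direction of the comparison.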
On the interpretation of removable interactions: A survey of the field 33 years after Loftus
In a classic 1978 Memory & Cognition article, Geoff Loftus explained why noncrossover interactions are removable. These removable interactions are tied to the scale of measurement for the dependent variable and therefore do not allow unambiguous conclusions about latent psychological processes. In the present article, we present concrete examples of how this insight helps prevent experimental psychologists from drawing incorrect conclusions about the effects of forgetting and aging. In addition, we extend the Loftus classification scheme for interactions to include those on the cusp between removable and nonremovable. Finally, we use various methods (i.e., a study of citation histories, a questionnaire for psychology students and faculty members, an analysis of statistical textbooks, and a review of articles published in the 2008 issue of Psychology and Aging) to show that experimental psychologists have remained generally unaware of the concept of removable interactions. We conclude that there is more to interactions in a 2 × 2 design than meets the eye.
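The idea that a noncrossover interaction depends on the measurement scale can be sketched with a small numerical example. The cell means below are invented for illustration and do not come from the article, and the helper `interaction_contrast` is a hypothetical name. The example shows a 2 × 2 table whose interaction is present on the raw scale but vanishes after a monotone (logarithmic) transformation.

```python
import math


def interaction_contrast(cells, transform=lambda x: x):
    """Difference of differences for a 2 x 2 table of cell means.

    cells     -- [[a1b1, a1b2], [a2b1, a2b2]]
    transform -- monotone rescaling applied to every cell mean
    """
    (a1b1, a1b2), (a2b1, a2b2) = [[transform(v) for v in row] for row in cells]
    return (a2b2 - a2b1) - (a1b2 - a1b1)


# Invented cell means with a multiplicative structure: the lines do not cross.
cells = [[10.0, 20.0],
         [20.0, 40.0]]

raw = interaction_contrast(cells)               # interaction on the raw scale
logged = interaction_contrast(cells, math.log)  # same data on a log scale

print(f"raw scale: {raw:.3f}")
print(f"log scale: {logged:.3f}")
```

On the raw scale the contrast is 10, while on the log scale it is zero up to floating-point error, so whether an interaction is reported here depends entirely on which scale is taken to reflect the latent process.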
Testing for DIF in a model with single peaked item characteristic curves: The parella model
marginal maximum likelihood, EM-algorithm, nonmonotone trace lines, single-peaked preference functions, latent trait theory, unfolding, DIF, differential item functioning
Relationship between social anxiety and perceived trustworthiness
Four different patterns of biased ratings of facial expressions of emotions have been found in socially anxious participants: higher negative ratings of (1) negative, (2) neutral, and (3) positive facial expressions than nonanxious controls. As a fourth pattern, some studies have found no group differences in ratings of facial expressions of emotion. However, these studies usually employed valence and arousal ratings that arguably may be less able to reflect processing of social information. We examined the relationship between social anxiety and face ratings for perceived trustworthiness given that trustworthiness is an inherently socially relevant construct. Improving on earlier analytical strategies, we evaluated the four previously found result patterns using a Bayesian approach. Ninety-eight undergraduates rated 198 face stimuli on perceived trustworthiness. Subsequently, participants completed social anxiety questionnaires to assess the severity of social fears. Bayesian modeling indicated that the hypothesis that social anxiety did not influence judgments of trustworthiness had at least three times more empirical support in our sample than any hypothesis assuming a negative interpretation bias in social anxiety. We concluded that the deviant interpretation of facial trustworthiness is not a relevant aspect in social anxiety. © 2013 Taylor & Francis