41 research outputs found
ROC curve analyses of eyewitness identification decisions: An analysis of the recent debate
How should the accuracy of eyewitness identification decisions be measured, so that best practices for identification can be determined? This fundamental question is under intense debate. One side advocates for continued use of a traditional measure of identification accuracy, known as the diagnosticity ratio, whereas the other side argues that receiver operating characteristic (ROC) curves should be used instead because diagnosticity is confounded with response bias. Diagnosticity proponents have offered several criticisms of ROCs, which we show are either false or irrelevant to the assessment of eyewitness accuracy. We also show that, like diagnosticity, Bayesian measures of identification accuracy confound response bias with witnesses' ability to discriminate guilty from innocent suspects. ROCs are an essential tool for distinguishing memory-based processes from decisional aspects of a response; simulations of different possible identification tasks and response strategies show that they offer important constraints on theory development.
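The confound at the heart of this debate can be illustrated with a small equal-variance signal-detection sketch (the model and the parameter values here are illustrative assumptions, not taken from the studies discussed): shifting the response criterion alone changes the diagnosticity ratio even though discriminability (d′) stays fixed.

```python
from statistics import NormalDist

Phi = NormalDist().cdf          # standard normal CDF
Phi_inv = NormalDist().inv_cdf  # its inverse

def rates(d_prime, criterion):
    """Hit and false-alarm rates under an equal-variance SDT model."""
    hit = 1 - Phi(criterion - d_prime)  # guilty-suspect ID rate
    fa = 1 - Phi(criterion)             # innocent-suspect ID rate
    return hit, fa

d_prime = 1.5  # fixed discriminability (hypothetical value)
for c in (0.5, 1.0, 1.5):               # increasingly conservative criteria
    hit, fa = rates(d_prime, c)
    diagnosticity = hit / fa             # changes with the criterion alone
    recovered_d = Phi_inv(hit) - Phi_inv(fa)  # constant across criteria
    print(f"c={c}: diagnosticity={diagnosticity:.2f}, d'={recovered_d:.2f}")
```

Sweeping the criterion while holding d′ fixed traces out a single ROC; all three points above lie on the same curve, which is why ROC analysis can separate memory accuracy from response bias while the diagnosticity ratio cannot.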
Estimating the proportion of guilty suspects and posterior probability of guilt in lineups using signal-detection models
Background
The majority of eyewitness lineup studies are laboratory-based. How well the conclusions of these studies, including the relationship between confidence and accuracy, generalize to real-world police lineups is an open question. Signal detection theory (SDT) has emerged as a powerful framework for analyzing lineups that allows comparison of witnesses' memory accuracy under different types of identification procedures. Because the guilt or innocence of a real-world suspect is generally not known, however, it is further unknown precisely how the identification of a suspect should change our belief in their guilt. The probability of guilt after the suspect has been identified, the posterior probability of guilt (PPG), can only be meaningfully estimated if we know the proportion of lineups that include a guilty suspect, P(guilty). Recent work used SDT to estimate P(guilty) on a single empirical data set that shared an important property with real-world data: no information about the guilt or innocence of the suspects was provided. Here we test the ability of the SDT model to recover P(guilty) on a wide range of pre-existing empirical data comprising more than 10,000 identification decisions. We then use simulations of the SDT model to determine the conditions under which the model succeeds and, where applicable, why it fails.
Results
For both empirical and simulated studies, the model was able to accurately estimate P(guilty) when the lineups were fair (the guilty and innocent suspects did not stand out) and identifications of both suspects and fillers occurred with a range of confidence levels. Simulations showed that the model can accurately recover P(guilty) given data that match the model assumptions. The model failed to accurately estimate P(guilty) under conditions that violated its assumptions; for example, when the effective size of the lineup was reduced, either because the fillers were selected to be poor matches to the suspect or because the innocent suspect was more familiar than the guilty suspect. The model also underestimated P(guilty) when a weapon was shown.
Conclusions
Depending on lineup quality, estimation of P(guilty) and, relatedly, PPG from the SDT model can range from poor to excellent. These results highlight the need to carefully consider how the similarity relations between fillers and suspects influence identifications.
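The link between P(guilty) and PPG described in this abstract is Bayes' rule applied to a suspect identification. A minimal sketch, with identification rates and lineup composition that are hypothetical rather than drawn from the study:

```python
def ppg(p_guilty, guilty_id_rate, innocent_id_rate):
    """Posterior probability of guilt given that the suspect was identified.

    p_guilty         -- base rate: proportion of lineups containing a guilty suspect
    guilty_id_rate   -- P(suspect ID | suspect guilty)
    innocent_id_rate -- P(suspect ID | suspect innocent)
    """
    numerator = p_guilty * guilty_id_rate
    denominator = numerator + (1 - p_guilty) * innocent_id_rate
    return numerator / denominator

# Hypothetical fair six-person lineup: an innocent suspect is no more likely
# to be chosen than any filler, so mistaken IDs spread across all six members.
mistaken_pick_rate = 0.30
innocent_suspect_rate = mistaken_pick_rate / 6  # = 0.05
print(ppg(0.5, 0.70, innocent_suspect_rate))    # ≈ 0.93
```

The sketch also makes the abstract's dependence on lineup fairness concrete: if the innocent suspect stands out and attracts more than a fair share of mistaken picks, `innocent_id_rate` rises and the posterior probability of guilt falls.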
Assessing Theoretical Conclusions With Blinded Inference to Investigate a Potential Inference Crisis
Scientific advances across a range of disciplines hinge on the ability to make inferences about unobservable theoretical entities on the basis of empirical data patterns. Accurate inferences rely on both discovering valid, replicable data patterns and accurately interpreting those patterns in terms of their implications for theoretical constructs. The replication crisis in science has led to widespread efforts to improve the reliability of research findings, but comparatively little attention has been devoted to the validity of inferences based on those findings. Using an example from cognitive psychology, we demonstrate a blinded-inference paradigm for assessing the quality of theoretical inferences from data. Our results reveal substantial variability in experts' judgments on the very same data, hinting at a possible inference crisis.
Are There Two Kinds of Reasoning?
Two experiments addressed the issue of how deductive reasoning and inductive reasoning are related. According to the criterion-shift account, these two kinds of reasoning assess arguments along a common scale of strength; however, there is a stricter criterion for saying an argument is deductively correct than for saying it is merely inductively strong. The method, adapted from Rips (2001), was to give two groups of participants the same set of written arguments but with either deduction or induction instructions. Signal detection and receiver operating characteristic analyses showed that the difference between conditions could not be explained in terms of a criterion shift. Instead, the deduction condition showed greater sensitivity to argument strength than did the induction condition. Implications for two-process and one-process accounts of reasoning, and relations to memory research, are discussed.
Sources of Bias in the Goodman-Kruskal Gamma Coefficient Measure of Association: Implications for Studies of Metacognitive Processes
In many cognitive, metacognitive, and perceptual tasks, measurement of performance or prediction accuracy may be influenced by response bias. Signal detection theory provides a means of assessing discrimination accuracy independent of such bias, but its application crucially depends on distributional assumptions. The Goodman-Kruskal gamma coefficient, G, has been proposed as an alternative means of measuring accuracy that is free of distributional assumptions. This measure is widely used with tasks that assess metamemory or metacognition performance. We demonstrate that the empirically determined value of G systematically deviates from its actual value under realistic conditions. We introduce a distribution-specific variant of G, called Gc, to show why this bias arises. Our findings imply that caution is needed when using G as a measure of accuracy, and alternative measures are recommended. "Our belief is that each scientific area that has use for measures of association should, after appropriate argument and trial, settle down on those measures most useful for its needs." – Goodman and Kruskal (1954, p. 763)
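For reference, G itself is straightforward to compute from paired ratings. A sketch (the confidence-accuracy data here are invented for illustration) that also shows one way ties interact with the measure:

```python
from itertools import combinations

def goodman_kruskal_gamma(xs, ys):
    """Goodman-Kruskal G: (concordant - discordant) / (concordant + discordant).

    Pairs tied on either variable are excluded entirely, which is part of
    what makes G appear distribution-free.
    """
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical confidence ratings (1-4) paired with accuracy (0 = wrong, 1 = right):
confidence = [1, 2, 2, 3, 4, 4]
accuracy = [0, 0, 1, 1, 1, 1]
print(goodman_kruskal_gamma(confidence, accuracy))  # 1.0
```

Note that G comes out at its ceiling of 1.0 here even though confidence level 2 occurred with both correct and incorrect responses: that pair is tied on confidence and is simply dropped, illustrating how the treatment of ties can make the empirical value of G diverge from the underlying accuracy of the judgments.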
Memory & Cognition
Correspondence concerning this article should be sent to either C. M. Rotello, Department of Psychology, Box 37710, University of Massachusetts, Amherst, MA 01003-7710 (e-mail: [email protected]) or E. Heit, Department of Psychology, University of Warwick, Coventry CV4 7AL, England (e-mail: [email protected]).