Correcting Judgment Correctives in National Security Intelligence
Intelligence analysts, like other professionals, form norms that define standards of tradecraft excellence. These norms, however, have evolved in an idiosyncratic manner that reflects the influence of prominent insiders who had keen psychological insights but little appreciation for how to translate those insights into testable hypotheses. The net result is that the prevailing tradecraft norms of best practice are only loosely grounded in the science of judgment and decision-making. The “common sense” of prestigious opinion leaders inside the intelligence community has pre-empted systematic validity testing of the training techniques and judgment aids endorsed by those opinion leaders. Drawing on the scientific literature, we advance hypotheses about how current best practices could well be reducing rather than increasing the quality of analytic products. One set of hypotheses pertains to the failure of tradecraft training to recognize the most basic threat to accuracy: measurement error in the interpretation of the same data and in the communication of interpretations. Another set of hypotheses focuses on the insensitivity of tradecraft training to the risk that issuing broad-brush, one-directional warnings against bias (e.g., over-confidence) will be less likely to encourage self-critical, deliberative cognition than simple response-threshold shifting that yields the mirror-image bias (e.g., under-confidence). Given the magnitude of the consequences of better and worse intelligence analysis flowing to policy-makers, we see a compelling case for greater funding of efforts to test what actually works.
Crowd-sourced Text Analysis: Reproducible and Agile Production of Political Data
Empirical social science often relies on data that are not observed in the field, but are transformed into quantitative variables by expert researchers who analyze and interpret qualitative raw sources. While generally considered the most valid way to produce data, this expert-driven process is inherently difficult to replicate or to assess on grounds of reliability. Using crowd-sourcing to distribute text for reading and interpretation by massive numbers of nonexperts, we generate results comparable to those using experts to read and interpret the same texts, but do so far more quickly and flexibly. Crucially, the data we collect can be reproduced and extended transparently, making crowd-sourced datasets intrinsically reproducible. This focuses researchers’ attention on the fundamental scientific objective of specifying reliable and replicable methods for collecting the data needed, rather than on the content of any particular dataset. We also show that our approach works straightforwardly with different types of political text, written in different languages. While findings reported here concern text analysis, they have far-reaching implications for expert-generated data in the social sciences.
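The statistical logic behind replacing one expert with many nonexperts can be sketched in a few lines. The simulation below is purely illustrative (the scale, noise levels, and coder counts are assumptions, not the paper's design): many high-noise crowd codings of a sentence, once averaged, track a "true" policy position about as well as a single low-noise expert coding.

```python
import random

random.seed(42)

# Hypothetical setup: each sentence has a "true" position on a left-right
# scale; an expert codes it with low noise, crowd workers with high noise.
true_positions = [random.uniform(-2, 2) for _ in range(50)]

def code_sentence(true_pos, noise_sd):
    """One noisy reading of a sentence's position."""
    return random.gauss(true_pos, noise_sd)

expert_scores = [code_sentence(t, 0.3) for t in true_positions]  # 1 expert each
crowd_scores = [
    sum(code_sentence(t, 1.0) for _ in range(20)) / 20           # 20 workers each
    for t in true_positions
]

def mean_abs_error(scores):
    return sum(abs(s - t) for s, t in zip(scores, true_positions)) / len(scores)

print(f"expert MAE: {mean_abs_error(expert_scores):.2f}")
print(f"crowd  MAE: {mean_abs_error(crowd_scores):.2f}")
```

With 20 workers at noise 1.0, the standard error of the crowd mean (about 0.22) is already comparable to a single expert at noise 0.3, which is the intuition the abstract relies on.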
Leveraging Diversity to Improve the Wisdom of the Crowd
This dissertation addresses how contextualized expertise and task design can improve wisdom of the crowd estimates. The first two chapters apply the wisdom of the crowd to two related tasks that require spatial knowledge. The third chapter applies the wisdom of the crowd to a subset ranking task. In Chapter 1, I investigate how framing effects impact the wisdom of the crowd. Participants selected tiles that either represented US states or African countries in two frames, present and absent. I constructed three wisdom of the crowd estimates: an unweighted average, a confidence-weighted average, and a wisdom of the crowd within estimate that combines an individual's responses across frames. I found that combining the estimates from the two frames resulted in an improved wisdom of the crowd estimate. In Chapter 2, I build on the wisdom of the crowd application for a task that again requires spatial knowledge. Participants supplied a point estimate and a radius centered at that point estimate for where various US cities were located. Unweighted and radius-weighted wisdom of the crowd estimates were more accurate than most individuals, but the cognitive model-based wisdom of the crowd estimates tended to be even more accurate. I describe how using cognitive modeling that contextualizes expertise led to improved wisdom of the crowd estimates. In Chapter 3, I present a new extension of the Thurstone model to partial ranking data. Ranking tasks have usually had participants rank all items, but I present two different types of partial ranking tasks where either an experimenter or a participant selects the items to be ranked. I demonstrate how the Thurstone model can be used to generate wisdom of the crowd estimates, and speculate how other partial ranking tasks can be developed to better elicit diverse estimates from the crowd. In all, these chapters detail specific applications of the wisdom of the crowd effect that better contextualize expertise, elicit multiple meaningful estimates from the same individual, and improve diversity. These methods are used in conjunction with cognitive modeling to produce improved wisdom of the crowd estimates.
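Two of the pooling rules named in the abstract, the unweighted average and the confidence-weighted average, can be written down directly. The numbers below are hypothetical, chosen only to show how the two rules diverge when confidence varies across respondents:

```python
# Hypothetical individual answers to a single estimation question,
# with self-reported confidence on a 0-1 scale.
estimates   = [120.0, 95.0, 150.0, 110.0, 200.0]
confidences = [0.9,   0.6,  0.8,   0.7,   0.2]

# Unweighted average: every respondent counts equally.
unweighted = sum(estimates) / len(estimates)

# Confidence-weighted average: each answer is weighted by its
# respondent's confidence, then normalized by total confidence.
weighted = sum(e * c for e, c in zip(estimates, confidences)) / sum(confidences)

print(f"unweighted: {unweighted:.1f}")            # 135.0
print(f"confidence-weighted: {weighted:.1f}")     # 125.6
```

The low-confidence outlier (200.0 at confidence 0.2) pulls the unweighted mean up but is largely discounted by the weighted rule; the "wisdom of the crowd within" estimate would additionally average each individual's answers across the two frames before pooling.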
Applying Cognitive Measures In Counterfactual Prediction
Counterfactual reasoning can be used in task-switching scenarios, such as design and planning tasks, to learn from past behavior, predict future performance, and customize interventions leading to enhanced performance. Previous research has focused on external factors and personality traits; there is a lack of research exploring how the decision-making process relates to both task-switching and counterfactual predictions. The purpose of this dissertation is to describe and explain individual differences in task-switching strategy and cognitive processes using machine learning techniques and linear ballistic accumulator (LBA) models, respectively, and apply those results in counterfactual models to predict behavior. Applying machine learning techniques to real-world task-switching data identifies a pattern of individual strategies that predicts out-of-sample clustering better than random assignment and identifies the most important factors contributing to the strategies. Comparing parameter estimates from several different LBA models, on both simulated and real data, indicates that a model based on information foraging theory that assumes all tasks are evaluated simultaneously and holistically best explains task-switching behavior. The resulting parameter values provide evidence that people have a switch-avoidance tendency, as reported in previous research, but also show how this tendency varies by participant. Including parameters that describe individual strategies and cognitive mechanisms in counterfactual prediction models provides little benefit over a baseline intercept-only model to predict a holdout dataset about real-world task switching behavior and performance, which may be due to the complexity and noise in the data. The methods developed in this research provide new opportunities to model and understand cognitive processes for decision-making strategies based on information foraging theory, which has not been considered previously. 
The results from this research can be applied to future task-switching scenarios as well as other decision-making tasks, both in laboratory settings and in the real world, and have implications for understanding how these decisions are made.
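The linear ballistic accumulator mentioned above has a simple generative form: each response option rises linearly from a random start point toward a threshold, and the first to arrive determines choice and response time. The sketch below uses made-up parameter values and a toy two-option "stay vs. switch" race; it is not the dissertation's fitted model.

```python
import random

random.seed(1)

def lba_trial(drift_means, A=0.5, b=1.0, drift_sd=0.3, t0=0.2):
    """Simulate one LBA trial; return (chosen option index, response time)."""
    finish_times = []
    for v in drift_means:
        start = random.uniform(0, A)            # uniform start point in [0, A]
        drift = random.gauss(v, drift_sd)       # trial-to-trial drift variability
        if drift <= 0:
            finish_times.append(float("inf"))   # this racer never reaches threshold
        else:
            finish_times.append((b - start) / drift)  # linear rise to threshold b
    winner = min(range(len(finish_times)), key=finish_times.__getitem__)
    return winner, t0 + finish_times[winner]

# Give option 0 ("stay") a higher mean drift than option 1 ("switch"),
# a toy analogue of the switch-avoidance tendency described above.
choices = [lba_trial([1.2, 0.8])[0] for _ in range(2000)]
print(f"P(stay) = {choices.count(0) / len(choices):.2f}")
```

Fitting such a model means recovering the drift, start-point, and threshold parameters from observed choices and response times; here we only run the model forward to show where a switch-avoidance asymmetry in the parameters shows up in behavior.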
Assessing Credibility In Subjective Probability Judgment
Subjective probability judgments (SPJs) are an essential component of decision making under uncertainty. Yet, research shows that SPJs are vulnerable to a variety of errors and biases. From a practical perspective, this exposes decision makers to risk: if SPJs are (reasonably) valid, then expectations and choices will be rational; if they are not, then expectations may be erroneous and choices suboptimal. However, existing methods for evaluating SPJs depend on information that is typically not available to decision makers (e.g., ground truth; correspondence criteria). To address this issue, I develop a method for evaluating SPJs based on a construct I call credibility. At the conceptual level, credibility describes the relationship between an individual’s SPJs and the most defensible beliefs that one could hold, given all available information. Thus, coefficients describing credibility (i.e., “credibility estimates”) ought to reflect an individual’s tendencies towards error and bias in judgment. To determine whether empirical models of credibility can capture this information, this dissertation examines the reliability, validity, and utility of credibility estimates derived from a model that I call the linear credibility framework. In Chapter 1, I introduce the linear credibility framework and demonstrate its potential for validity and utility in a proof-of-concept simulation. In Chapter 2, I apply the linear credibility framework to SPJs from three empirical sources and examine the reliability and validity of credibility estimates as predictors of judgmental accuracy (among other measures of “good” judgment). In Chapter 3, I use credibility estimates from the same three sources to recalibrate and improve SPJs (i.e., increase accuracy) out-of-sample. 
In Chapter 4, I discuss the robustness of empirical models of credibility and present two studies in which I use exploratory research methods to (a) tailor the linear credibility framework to the data at hand; and (b) boost performance. Across nine studies, I conclude that the linear credibility framework is a robust (albeit imperfect) model of credibility that can provide reliable, valid, and useful estimates of credibility. Because the linear credibility framework is an intentionally weak model, I argue that these results represent a lower bound for the performance of empirical models of credibility more generally.
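The recalibration idea in Chapter 3 can be illustrated in a much simplified form: estimate a judge's systematic tendency on one sample of subjective probability judgments, then apply the correction out-of-sample and score it with the Brier score. This is a generic sketch with simulated data and a plain least-squares fit, not the linear credibility framework itself.

```python
import random

random.seed(7)

def simulate(n):
    """Simulated (SPJ, outcome) pairs from an overconfident judge."""
    pairs = []
    for _ in range(n):
        p_true = random.random()
        # Overconfidence: judgments are extremized relative to p_true.
        spj = min(max(0.5 + 1.6 * (p_true - 0.5) + random.gauss(0, 0.1), 0.0), 1.0)
        outcome = 1 if random.random() < p_true else 0
        pairs.append((spj, outcome))
    return pairs

def fit_line(pairs):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    slope = (sum((x - mx) * (y - my) for x, y in pairs)
             / sum((x - mx) ** 2 for x, _ in pairs))
    return my - slope * mx, slope

def brier(pairs, transform=lambda p: p):
    """Mean squared error of (possibly transformed) probabilities."""
    return sum((transform(p) - y) ** 2 for p, y in pairs) / len(pairs)

train, holdout = simulate(2000), simulate(2000)
a, b = fit_line(train)                                   # learn the tendency
recal = lambda p: min(max(a + b * p, 0.0), 1.0)          # shrink toward calibration
print(f"raw Brier:          {brier(holdout):.3f}")
print(f"recalibrated Brier: {brier(holdout, recal):.3f}")
```

Because the simulated judge extremizes (slope 1.6 > 1), the fitted correction shrinks judgments back toward the base rate, and the recalibrated Brier score on the holdout sample improves over the raw one.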
Extracting more wisdom from the crowd
Thesis: Ph.D., Massachusetts Institute of Technology, Department of Brain and Cognitive Sciences, 2018. Cataloged from PDF version of thesis. Includes bibliographical references (pages 129-140).

In many situations, from economists predicting unemployment rates to chemists estimating fuel safety, individuals have differing opinions or predictions. We consider the wisdom-of-the-crowd problem of aggregating the judgments of multiple individuals on a single question, when no outside information about their competence is available. Many standard methods select the most popular answer, after correcting for variations in confidence. Using a formal model, we prove that any such method can fail even if based on perfect Bayesian estimates of individual confidence, or, more generally, on Bayesian posterior probabilities. Our model suggests a new method for aggregating opinions: select the answer that is more popular than people predict. We derive theoretical conditions under which this new method is guaranteed to work, and generalize it to questions with more than two possible answers. We conduct empirical tests in which respondents are asked for both their own answer to some question and their prediction about the distribution of answers given by other people, and show that our new method outperforms majority and confidence-weighted voting in a range of domains including geography and trivia questions, laypeople and professionals judging art prices, and dermatologists evaluating skin lesions. We develop and evaluate a probabilistic generative model for crowd wisdom, including applying it across questions to determine individual respondent expertise and comparing it to various Bayesian hierarchical models.
We extend our new crowd wisdom method to operate on domains where the answer space is unknown in advance, by having respondents predict the most common answers given by others, and discuss performance on a cognitive reflection test as a case study of this extension.

by John Patrick McCoy, Ph.D.
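The aggregation rule described in the abstract, select the answer that is more popular than people predict, reduces to comparing each answer's actual vote share with the share respondents predicted it would get. A toy sketch with hypothetical numbers, using the classic example of a question on which the confident minority is right:

```python
# Question: "Is Philadelphia the capital of Pennsylvania?" (it is not;
# Harrisburg is). Shares below are hypothetical.
votes     = {"yes": 0.65, "no": 0.35}   # actual answer shares
predicted = {"yes": 0.75, "no": 0.25}   # mean predicted answer shares

def surprisingly_popular(votes, predicted):
    """Return the answer whose actual share most exceeds its predicted share."""
    return max(votes, key=lambda a: votes[a] - predicted[a])

print(surprisingly_popular(votes, predicted))  # "no": 0.35 actual vs 0.25 predicted
```

Majority voting picks "yes", but even "yes" voters expect most people to say "yes", so "yes" underperforms its prediction (0.65 < 0.75) while "no" overperforms (0.35 > 0.25). The surprisingly popular rule therefore selects "no", the correct answer.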