11 research outputs found

    Correcting Judgment Correctives in National Security Intelligence

    Intelligence analysts, like other professionals, form norms that define standards of tradecraft excellence. These norms, however, have evolved in an idiosyncratic manner that reflects the influence of prominent insiders who had keen psychological insights but little appreciation for how to translate those insights into testable hypotheses. The net result is that the prevailing tradecraft norms of best practice are only loosely grounded in the science of judgment and decision-making. The “common sense” of prestigious opinion leaders inside the intelligence community has pre-empted systematic validity testing of the training techniques and judgment aids endorsed by those opinion leaders. Drawing on the scientific literature, we advance hypotheses about how current best practices could well be reducing rather than increasing the quality of analytic products. One set of hypotheses pertains to the failure of tradecraft training to recognize the most basic threat to accuracy: measurement error in the interpretation of the same data and in the communication of interpretations. Another set of hypotheses focuses on the insensitivity of tradecraft training to the risk that issuing broad-brush, one-directional warnings against bias (e.g., over-confidence) will be less likely to encourage self-critical, deliberative cognition than simple response-threshold shifting that yields the mirror-image bias (e.g., under-confidence). Given the magnitude of the consequences of better and worse intelligence analysis for policy-makers, we see a compelling case for greater funding of efforts to test what actually works.
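
    To make the response-threshold concern concrete, here is a minimal sketch in Python, not drawn from the article itself: a blanket warning against over-confidence that merely shrinks every reported probability toward 0.5 flips the bias into under-confidence while leaving discrimination untouched. All parameter values and variable names are illustrative assumptions.

```python
# Hypothetical simulation (not from the article) of "response-threshold shifting":
# shrinking every probability toward 0.5 trades over-confidence for its mirror
# image without improving discrimination at all.
import numpy as np

rng = np.random.default_rng(0)
n = 4_000
true_p = rng.beta(2, 2, size=n)          # probability the evidence actually supports
outcome = rng.random(n) < true_p         # events realized with that probability

def report(p, k):
    """Extremize (k > 1) or shrink (k < 1) probabilities; k = 1 reports them as-is."""
    return p**k / (p**k + (1 - p)**k)

def brier(p):                            # accuracy of probabilistic forecasts (lower is better)
    return np.mean((p - outcome) ** 2)

def auc(p):                              # discrimination: P(event scored above non-event)
    pos, neg = p[outcome], p[~outcome]
    return np.mean(pos[:, None] > neg[None, :])

for label, k in [("calibrated", 1.0), ("over-confident", 2.5), ("after blanket warning", 0.4)]:
    p = report(true_p, k)
    print(f"{label:22s} Brier={brier(p):.3f}  AUC={auc(p):.3f}  extremity={np.mean(np.abs(p - 0.5)):.2f}")
# AUC is identical in all three rows: the warning only shifted the response
# threshold, so one directional bias is swapped for the other.
```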

    Crowd-sourced Text Analysis: Reproducible and Agile Production of Political Data

    Empirical social science often relies on data that are not observed in the field, but are transformed into quantitative variables by expert researchers who analyze and interpret qualitative raw sources. While generally considered the most valid way to produce data, this expert-driven process is inherently difficult to replicate or to assess on grounds of reliability. Using crowd-sourcing to distribute text for reading and interpretation by massive numbers of nonexperts, we generate results comparable to those using experts to read and interpret the same texts, but do so far more quickly and flexibly. Crucially, the data we collect can be reproduced and extended transparently, making crowd-sourced datasets intrinsically reproducible. This focuses researchers’ attention on the fundamental scientific objective of specifying reliable and replicable methods for collecting the data needed, rather than on the content of any particular dataset. We also show that our approach works straightforwardly with different types of political text, written in different languages. While findings reported here concern text analysis, they have far-reaching implications for expert-generated data in the social sciences.
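
    As a rough illustration of the aggregation logic (a hypothetical sketch, not the authors' pipeline), averaging many noisy nonexpert codings of the same sentences can track a low-noise expert benchmark closely, and the whole procedure is a reproducible function of the raw codes:

```python
# Hypothetical sketch: simulated nonexpert codings of sentences, aggregated by
# averaging, compared against a simulated expert benchmark. All numbers are
# illustrative assumptions, not data from the paper.
import numpy as np

rng = np.random.default_rng(1)
n_sentences, n_coders = 200, 15

true_position = rng.normal(0, 1, n_sentences)                      # latent position of each sentence
expert_score = true_position + rng.normal(0, 0.3, n_sentences)     # experts: low noise
crowd_codes = true_position[:, None] + rng.normal(0, 1.2, (n_sentences, n_coders))  # crowd: high noise

crowd_mean = crowd_codes.mean(axis=1)                               # simple crowd aggregation

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

print("single crowd coder vs experts :", round(corr(crowd_codes[:, 0], expert_score), 2))
print("crowd mean (15 coders) vs experts:", round(corr(crowd_mean, expert_score), 2))
# Averaging over coders cancels idiosyncratic noise, so the crowd mean converges
# on the expert benchmark; rerunning the script reproduces the dataset exactly.
```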

    How the "wisdom of the inner crowd" can boost accuracy of confidence judgments

    Applying Cognitive Measures In Counterfactual Prediction

    Counterfactual reasoning can be used in task-switching scenarios, such as design and planning tasks, to learn from past behavior, predict future performance, and customize interventions leading to enhanced performance. Previous research has focused on external factors and personality traits; there is a lack of research exploring how the decision-making process relates to both task-switching and counterfactual predictions. The purpose of this dissertation is to describe and explain individual differences in task-switching strategy and cognitive processes using machine learning techniques and linear ballistic accumulator (LBA) models, respectively, and apply those results in counterfactual models to predict behavior. Applying machine learning techniques to real-world task-switching data identifies a pattern of individual strategies that predicts out-of-sample clustering better than random assignment and identifies the most important factors contributing to the strategies. Comparing parameter estimates from several different LBA models, on both simulated and real data, indicates that a model based on information foraging theory that assumes all tasks are evaluated simultaneously and holistically best explains task-switching behavior. The resulting parameter values provide evidence that people have a switch-avoidance tendency, as reported in previous research, but also show how this tendency varies by participant. Including parameters that describe individual strategies and cognitive mechanisms in counterfactual prediction models provides little benefit over a baseline intercept-only model to predict a holdout dataset about real-world task-switching behavior and performance, which may be due to the complexity and noise in the data. The methods developed in this research provide new opportunities to model and understand cognitive processes for decision-making strategies based on information foraging theory, which has not been considered previously. The results from this research can be applied to future task-switching scenarios as well as other decision-making tasks, both in laboratory settings and in the real world, and have implications for understanding how these decisions are made.
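
    The following is a minimal, assumption-laden sketch of the linear ballistic accumulator (LBA) machinery the dissertation builds on; the parameter names and values are illustrative, not estimates from the work. Giving the "stay on current task" option a higher drift rate is one simple way to encode the switch-avoidance tendency described above.

```python
# Minimal LBA simulation sketch: each response option races linearly toward a
# threshold; the fastest accumulator determines the choice and response time.
# All parameter values below are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(2)

def lba_trial(drifts, A=0.5, b=1.0, s=0.3, t0=0.2):
    """Simulate one LBA trial; drifts holds the mean drift rate for each option."""
    starts = rng.uniform(0, A, size=len(drifts))   # random start points in [0, A]
    rates = rng.normal(drifts, s)                  # trial-to-trial drift noise
    rates = np.where(rates > 0, rates, 1e-3)       # keep rates positive
    times = (b - starts) / rates                   # time for each accumulator to hit b
    winner = int(np.argmin(times))
    return winner, t0 + times[winner]

# Two options: index 0 = "stay on current task", index 1 = "switch task".
drifts = np.array([1.0, 0.7])
trials = [lba_trial(drifts) for _ in range(5_000)]
choices = np.array([c for c, _ in trials])
rts = np.array([t for _, t in trials])

print("P(stay)  =", round(np.mean(choices == 0), 3))
print("mean RT  =", round(rts.mean(), 3), "s")
```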

    Assessing Credibility In Subjective Probability Judgment

    Subjective probability judgments (SPJs) are an essential component of decision making under uncertainty. Yet, research shows that SPJs are vulnerable to a variety of errors and biases. From a practical perspective, this exposes decision makers to risk: if SPJs are (reasonably) valid, then expectations and choices will be rational; if they are not, then expectations may be erroneous and choices suboptimal. However, existing methods for evaluating SPJs depend on information that is typically not available to decision makers (e.g., ground truth; correspondence criteria). To address this issue, I develop a method for evaluating SPJs based on a construct I call credibility. At the conceptual level, credibility describes the relationship between an individual’s SPJs and the most defensible beliefs that one could hold, given all available information. Thus, coefficients describing credibility (i.e., “credibility estimates”) ought to reflect an individual’s tendencies towards error and bias in judgment. To determine whether empirical models of credibility can capture this information, this dissertation examines the reliability, validity, and utility of credibility estimates derived from a model that I call the linear credibility framework. In Chapter 1, I introduce the linear credibility framework and demonstrate its potential for validity and utility in a proof-of-concept simulation. In Chapter 2, I apply the linear credibility framework to SPJs from three empirical sources and examine the reliability and validity of credibility estimates as predictors of judgmental accuracy (among other measures of “good” judgment). In Chapter 3, I use credibility estimates from the same three sources to recalibrate and improve SPJs (i.e., increase accuracy) out-of-sample. In Chapter 4, I discuss the robustness of empirical models of credibility and present two studies in which I use exploratory research methods to (a) tailor the linear credibility framework to the data at hand; and (b) boost performance. Across nine studies, I conclude that the linear credibility framework is a robust (albeit imperfect) model of credibility that can provide reliable, valid, and useful estimates of credibility. Because the linear credibility framework is an intentionally weak model, I argue that these results represent a lower bound for the performance of empirical models of credibility, more generally.
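
    One plausible, heavily hedged reading of a linear credibility model is sketched below: regress a judge's SPJs on a benchmark standing in for the most defensible beliefs, treat the fitted intercept and slope as credibility estimates, and invert the fit to recalibrate held-out judgments. The dissertation's actual model may differ; every name and number here is an assumption.

```python
# Illustrative sketch of a linear credibility-style analysis. The benchmark,
# the judge's bias, and the recalibration step are all assumptions made for
# the purpose of the example, not the dissertation's specification.
import numpy as np

rng = np.random.default_rng(3)
n_train, n_test = 300, 300

benchmark = rng.beta(2, 2, n_train + n_test)      # stand-in for "most defensible" probabilities
# A judge whose probabilities are compressed toward the middle and shifted upward.
judged = np.clip(0.25 + 0.6 * benchmark + rng.normal(0, 0.05, benchmark.size), 0.01, 0.99)

# Fit judged ≈ intercept + slope * benchmark on the training portion (ordinary least squares).
slope, intercept = np.polyfit(benchmark[:n_train], judged[:n_train], 1)
print(f"credibility estimates: intercept={intercept:.2f}, slope={slope:.2f}")  # ideal judge: 0 and 1

# Invert the fitted line to recalibrate held-out judgments.
recal = np.clip((judged[n_train:] - intercept) / slope, 0.01, 0.99)

mse = lambda p: np.mean((p - benchmark[n_train:]) ** 2)
print("distance to benchmark, raw judgments :", round(mse(judged[n_train:]), 4))
print("distance to benchmark, recalibrated  :", round(mse(recal), 4))
```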

    Extracting more wisdom from the crowd

    Thesis: Ph.D., Massachusetts Institute of Technology, Department of Brain and Cognitive Sciences, 2018. By John Patrick McCoy. Cataloged from the PDF version of the thesis; includes bibliographical references (pages 129-140).
    In many situations, from economists predicting unemployment rates to chemists estimating fuel safety, individuals have differing opinions or predictions. We consider the wisdom-of-the-crowd problem of aggregating the judgments of multiple individuals on a single question, when no outside information about their competence is available. Many standard methods select the most popular answer, after correcting for variations in confidence. Using a formal model, we prove that any such method can fail even if based on perfect Bayesian estimates of individual confidence, or, more generally, on Bayesian posterior probabilities. Our model suggests a new method for aggregating opinions: select the answer that is more popular than people predict. We derive theoretical conditions under which this new method is guaranteed to work, and generalize it to questions with more than two possible answers. We conduct empirical tests in which respondents are asked for both their own answer to some question and their prediction about the distribution of answers given by other people, and show that our new method outperforms majority and confidence-weighted voting in a range of domains, including geography and trivia questions, laypeople and professionals judging art prices, and dermatologists evaluating skin lesions. We develop and evaluate a probabilistic generative model for crowd wisdom, including applying it across questions to determine individual respondent expertise and comparing it to various Bayesian hierarchical models. We extend our new crowd wisdom method to operate on domains where the answer space is unknown in advance, by having respondents predict the most common answers given by others, and discuss performance on a cognitive reflection test as a case study of this extension.
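
    Below is a minimal sketch of the "more popular than people predict" rule described above, applied to a binary question with made-up responses; the thesis's full probabilistic generative model is richer than this.

```python
# Surprisingly-popular-style selection: pick the answer whose actual share of
# votes most exceeds the share that respondents themselves predicted. The
# responses below are invented for illustration.
from collections import Counter

# Each respondent gives (own_answer, predicted share of "yes" among others).
# Example question: "Is Philadelphia the capital of Pennsylvania?" (correct answer: no)
responses = [("yes", 0.80), ("yes", 0.75), ("yes", 0.70),
             ("no", 0.60), ("no", 0.65)]   # most vote yes, and even "no" voters expect many yeses

def surprisingly_popular(responses):
    n = len(responses)
    actual_yes = Counter(ans for ans, _ in responses)["yes"] / n
    predicted_yes = sum(pred for _, pred in responses) / n
    # Surprise = actual share minus predicted share, for each answer.
    surprise = {"yes": actual_yes - predicted_yes,
                "no": (1 - actual_yes) - (1 - predicted_yes)}
    return max(surprise, key=surprise.get), surprise

answer, surprise = surprisingly_popular(responses)
print("majority answer      :", Counter(a for a, _ in responses).most_common(1)[0][0])
print("surprisingly popular :", answer, surprise)
# The majority says "yes", but "no" is more popular than predicted, so the
# rule selects the (correct) minority answer.
```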