Deception in Spoken Dialogue: Classification and Individual Differences
Automatic deception detection is an important problem with far-reaching implications in many areas, including law enforcement, military and intelligence agencies, social services, and politics. Despite extensive efforts to develop automated deception detection technologies, there have been few objective successes. This is likely due to the many challenges involved, including the lack of large, cleanly recorded corpora; the difficulty of acquiring ground truth labels; and major differences in incentives for lying in the laboratory vs. lying in real life. Another well-recognized issue is that there are individual and cultural differences in deception production and detection, although little has been done to identify them. Human performance at deception detection is at the level of chance, making it an uncommon problem where machines can potentially outperform humans.
This thesis addresses these challenges in research on deceptive speech. We created the Columbia X-Cultural Deception (CXD) Corpus, a large-scale collection of deceptive and non-deceptive dialogues between native speakers of Standard American English and Mandarin Chinese. This corpus enabled a comprehensive study of deceptive speech on a large scale.
In the first part of the thesis, we introduce the CXD corpus and present an empirical analysis of acoustic-prosodic and linguistic cues to deception. We also describe machine learning classification experiments to automatically identify deceptive speech using those features. Our best classifier achieves classification accuracy of almost 70%, well above human performance.
The second part of this thesis addresses individual differences in deceptive speech. We present a comprehensive analysis of individual differences in verbal cues to deception, and several methods for leveraging these speaker differences to improve automatic deception classification. We identify many differences in cues to deception across gender, native language, and personality. Our comparison of approaches for leveraging these differences shows that speaker-dependent features that capture a speaker's deviation from their natural speaking style can improve deception classification performance. We also develop neural network models that accurately model speaker-specific patterns of deceptive speech.
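One way to read "speaker-dependent features that capture a speaker's deviation from their natural speaking style" is per-speaker normalization. The sketch below is a minimal illustration under that assumption, not the thesis's actual feature pipeline: the function name `speaker_normalize`, the input layout, and the pitch values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def speaker_normalize(rows):
    """Z-score each feature value against its own speaker's mean and spread.

    rows: list of (speaker_id, feature_value) pairs (hypothetical layout).
    Returns one normalized value per input row, in the same order.
    """
    by_speaker = defaultdict(list)
    for speaker, value in rows:
        by_speaker[speaker].append(value)

    # Baseline statistics per speaker: (mean, population standard deviation).
    stats = {s: (mean(v), pstdev(v)) for s, v in by_speaker.items()}

    normalized = []
    for speaker, value in rows:
        mu, sigma = stats[speaker]
        # A speaker with no variation gets 0.0 rather than a division by zero.
        normalized.append((value - mu) / sigma if sigma > 0 else 0.0)
    return normalized


# Hypothetical pitch values (Hz) for two speakers with different baselines.
rows = [("A", 100.0), ("A", 120.0), ("B", 200.0), ("B", 240.0)]
z_scores = speaker_normalize(rows)
```

After normalization, both speakers' values sit on the same scale, so a classifier sees how far an utterance departs from that speaker's usual style rather than the raw value.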
The contributions of this work add substantially to our scientific understanding of deceptive speech, and have practical implications for human practitioners and automatic deception detection
Uncovering the Hidden Cognitive Processes and Underlying Dynamics of Deception
This dissertation examines the processing demands associated with motor responding and verbal statements during deceptive (or deceptive-like) behavior. In the first set of studies presented in Chapter 2, participants' motor movements in a false response paradigm revealed signatures of competition with the truth. In a second set of studies presented in Chapter 3, deceptive participants used language that reflected cognitive and social demands inherent to various types of deception. In evaluating both motor and verbal cues, this dissertation provides a comprehensive, multi-modal approach to better understanding the cognitive processes underlying deception. In the motor responding studies, participants' arm movements were analyzed as they navigated a motor tracking device (computer mouse, Nintendo Wiimote) to visually co-present response options, where the true option acts as a competitor to a false target. In an initial study, competition during deceptive responding was shown to be much greater than during truthful responding. In two follow-up studies, the introduction of various task-based cognitive demands was shown to systematically modulate response performance. Specifically, these studies suggest that an intention to respond falsely early in question presentation will amplify competition effects, and that false responding to information in autobiographical memory is much more difficult than responding to information in general semantic memory. In the studies analyzing verbal statements, the focus turns to large-scale linguistic analyses using automated natural language processing tools. In the first study, changes in language use were identified between deceptive and truthful narratives using six psychologically relevant categories. A major finding was that the language of deception is adapted to facilitate ease of cognitive processing.
In a second study, the indicative phrasing and semantic content of deceptive texts were extracted using a contrastive corpus analysis, whereby indicative features are defined by frequent use in one corpus while being infrequent in a comparative corpus. Two contexts of deception were evaluated. In the first context of computer-mediated conversations, deceivers used a range of unique thematic elements, as in avoiding personal involvement in their narrative accounts. In the second context of attitudes towards abortion, unique thematic elements once again emerged; for example, participants tended to position their arguments in terms of formal law
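The contrastive corpus analysis described above (features frequent in one corpus and infrequent in a comparison corpus) can be sketched as a smoothed log-ratio of relative word frequencies. This is an illustrative assumption, not the dissertation's actual scoring method: the function `contrastive_terms`, the add-one smoothing, and the toy sentences are hypothetical.

```python
import math
from collections import Counter


def contrastive_terms(target_docs, reference_docs, top_n=3):
    """Rank words by how much more frequent they are in target than reference.

    Uses add-one smoothing so words absent from the reference corpus do not
    cause a division by zero. Tokenization is a plain whitespace split.
    """
    target = Counter(w for d in target_docs for w in d.lower().split())
    reference = Counter(w for d in reference_docs for w in d.lower().split())

    vocab_size = len(set(target) | set(reference))
    target_total = sum(target.values()) + vocab_size
    reference_total = sum(reference.values()) + vocab_size

    scores = {}
    for word in target:
        p_target = (target[word] + 1) / target_total
        p_reference = (reference[word] + 1) / reference_total
        scores[word] = math.log2(p_target / p_reference)

    # Highest log-ratio first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


# Hypothetical toy corpora, invented for illustration only.
deceptive = ["they said the law requires it", "the law is the law"]
truthful = ["i think i felt sad", "i felt it was wrong"]
indicative = contrastive_terms(deceptive, truthful, top_n=2)
```

On real corpora a minimum-frequency cutoff would normally be added so that rare words do not dominate the ranking.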
A Corpus Driven Computational Intelligence Framework for Deception Detection in Financial Text
Financial fraud rampages onwards seemingly uncontained. The annual cost of fraud in the UK is estimated to be as high as £193bn a year [1]. From a hitherto less explored data science perspective, this thesis demonstrates how the use of linguistic features to drive data mining algorithms can aid in unravelling fraud. To this end, the spotlight is turned on Financial Statement Fraud (FSF), known to be the costliest type of fraud [2]. A new corpus of 6.3 million words is composed of 102 annual reports/10-K (narrative sections) from firms formally indicted for FSF juxtaposed with 306 non-fraud firms of similar size and industrial grouping. Unlike similar studies, this thesis takes a wide-angled view and extracts a range of features of different categories from the corpus. These linguistic correlates of deception are uncovered using a variety of techniques and tools. Corpus linguistics methodology is applied to extract keywords and to examine linguistic structure. N-grams are extracted to draw out collocations. Readability measurement in financial text is advanced through the extraction of new indices that probe the text at a deeper level. Cognitive and perceptual processes are also picked out. Tone, intention and liquidity are gauged using customised word lists. Linguistic ratios are derived from grammatical constructs and word categories. An attempt is also made to determine ‘what’ was said as opposed to ‘how’. Further, a new module is developed to condense synonyms into concepts. Lastly, frequency counts from keywords unearthed from a previous content analysis study on financial narrative are also used. These features are then used to drive machine learning based classification and clustering algorithms to determine if they aid in discriminating a fraud from a non-fraud firm. The results derived from the battery of models built typically exceed classification accuracy of 70%. The above process is amalgamated into a framework.
The process outlined, driven by empirical data, demonstrates in a practical way how linguistic analysis could aid in fraud detection and also constitutes a unique contribution made to deception detection studies
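Two of the feature types named in this abstract, n-grams for collocations and word-list based measures such as tone, can be sketched in a few lines. This is a minimal illustration only: the helper names `ngrams` and `word_list_ratio`, the sample sentence, and the negative-tone word list are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def word_list_ratio(tokens, word_list):
    """Share of tokens that appear in a customised word list."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in word_list)
    return hits / len(tokens)


# Hypothetical sentence and word list, invented for illustration only.
tokens = "net income rose while net income guidance fell".split()
negative_tone = {"fell", "loss", "decline"}

bigram_counts = Counter(ngrams(tokens, 2))
top_bigram = bigram_counts.most_common(1)
tone_ratio = word_list_ratio(tokens, negative_tone)
```

Counts and ratios of this kind would then form the columns of a feature matrix passed to a classifier or clustering algorithm.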
Proceedings of the Eighth Italian Conference on Computational Linguistics CliC-it 2021
The eighth edition of the Italian Conference on Computational Linguistics (CLiC-it 2021) was held at Università degli Studi di Milano-Bicocca from 26th to 28th January 2022. After the edition of 2020, which was held in fully virtual mode due to the health emergency related to Covid-19, CLiC-it 2021 represented the first moment for the Italian research community of Computational Linguistics to meet in person after more than one year of full/partial lockdown
Deception detection in dialogues
In the social media era, it is commonplace to engage in written conversations. People sometimes even form connections across large distances, in writing. However, human communication is in large part non-verbal. This means it is now easier for people to hide their harmful intentions. At the same time, people can now get in touch with more people than ever before. This puts vulnerable groups at higher risk for malevolent interactions, such as bullying, trolling, or predatory behavior. Furthermore, such growing behaviors have most recently led to waves of fake news and a growing industry of deceit creators and deceit detectors. There is now an urgent need for both theory that explains deception and applications that automatically detect deception.
In this thesis I address this need with a novel application that learns from examples and detects deception reliably in natural-language dialogues. I formally define the problem of deception detection and identify several domains where it is useful. I introduce and evaluate new psycholinguistic features of deception in written dialogues for two datasets. My results shed light on the connection between language, deception, and perception. They also underline the challenges and difficulty of assessing perceptions from written text.
To automatically learn to detect deception I first introduce an expressive logical model and then present a probabilistic model that simplifies the first and is learnable from labeled examples. I introduce a belief-over-belief formalization, based on Kripke semantics and situation calculus. I use an observation model to describe how utterances are produced from the nested beliefs and intentions. This allows me to easily make inferences about these beliefs and intentions given utterances, without needing to explicitly represent perlocutions. The agents’ belief states are filtered with the observed utterances, resulting in an updated Kripke structure.
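The filtering step mentioned above can be illustrated with a deliberately simplified, deterministic sketch: possible worlds that are inconsistent with an observed utterance are removed, and the accessibility relation is restricted to the worlds that remain. The thesis describes a probabilistic observation model, so this is only an assumption-laden toy; the data layout, the atom names, and the functions `believes` and `filter_model` are hypothetical.

```python
def believes(model, agent, world, atom):
    """An agent believes an atom at a world if it holds in every world
    the agent considers possible from there."""
    successors = [v for (u, v) in model["access"][agent] if u == world]
    return all(atom in model["worlds"][v] for v in successors)


def filter_model(model, observed_atom):
    """Keep only worlds consistent with the observation and restrict each
    agent's accessibility relation to the surviving worlds."""
    kept = {w: facts for w, facts in model["worlds"].items()
            if observed_atom in facts}
    access = {
        agent: {(u, v) for (u, v) in pairs if u in kept and v in kept}
        for agent, pairs in model["access"].items()
    }
    return {"worlds": kept, "access": access}


# Hypothetical worlds: each records the facts and what the speaker says.
worlds = {
    "w_lie": {"guilty", "says_innocent"},    # guilty speaker who lies
    "w_honest": {"says_innocent"},           # innocent speaker
    "w_confess": {"guilty", "says_guilty"},  # guilty speaker who confesses
}
# The listener initially cannot tell any of the worlds apart.
model = {
    "worlds": worlds,
    "access": {"listener": {(u, v) for u in worlds for v in worlds}},
}

updated = filter_model(model, "says_innocent")
```

In this toy example the listener ends up certain about what was said but still uncertain about guilt, since both a lying and an honest world survive the update.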
I then translate my formalization to a practical system that can learn from a small dataset and is able to perform well using very little structural background knowledge in the form of a relational dynamic Bayesian network structure