511 research outputs found

Deception Detection Using Machine Learning

Author: Benton Ryan
Ceballos Delgado Alberto Alejandro
Glisson William
Grispos George
Mcdonald Jeffrey
Shashidhar Narasimha
Publication venue: AIS Electronic Library (AISeL)
Publication date: 01/01/2021

Today’s digital society creates an environment potentially conducive to the exchange of deceptive information. The dissemination of misleading information can have severe consequences for society. This research investigates the possibility of using shared characteristics among reviews, news articles, and emails to detect deception in text-based communication using machine learning techniques. The experiment discussed in this paper examines the use of Bag of Words and Part of Speech tag features to detect deception in the aforementioned types of communication using Neural Networks, Support Vector Machine, Naïve Bayesian, Random Forest, Logistic Regression, and Decision Tree. The contribution of this paper is two-fold. First, it provides initial insight into the identification of text communication cues useful in detecting deception across different types of text-based communication. Second, it provides a foundation for future research involving the application of machine learning algorithms to detect deception in different types of text communication.
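The Bag of Words approach named in this abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization and a multinomial Naive Bayes classifier with add-one smoothing, written with the standard library only. The class name, the toy texts, and the labels are invented for illustration and are not the paper's data, features, or implementation.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split on whitespace, and count each token."""
    return Counter(text.lower().split())


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts (illustrative only)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.class_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(bag_of_words(text))
        # Vocabulary is the union of all tokens seen in training.
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for c in self.class_counts:
            # Log prior plus add-one smoothed log likelihood of each token.
            score = math.log(self.class_counts[c] / total_docs)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for word, n in bag_of_words(text).items():
                score += n * math.log((self.word_counts[c][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Hypothetical toy data, made up for this sketch.
TOY_TEXTS = [
    "best hotel ever amazing amazing",
    "truly amazing best stay",
    "room was clean",
    "staff was helpful and room was quiet",
]
TOY_LABELS = ["deceptive", "deceptive", "truthful", "truthful"]

model = NaiveBayes().fit(TOY_TEXTS, TOY_LABELS)
```

Calling `model.predict("amazing best hotel")` scores the text against each class and returns the more likely label. A real experiment would add Part of Speech tag features and compare several classifiers, as the abstract describes.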

Scholarly Works @ SHSU (Sam Houston State University)

ScholarSpace at University of Hawai'i at Manoa

AIS Electronic Library (AISeL)

Personality Recognition For Deception Detection

Author: An Guozhen
Publication venue: CUNY Academic Works
Publication date: 01/09/2018

Personality captures stable individual characteristics, typically measurable in quantitative terms, that explain and predict observable behavioral differences. Personality has been shown to be very useful for many life outcomes, and there has been great interest in predicting personality automatically. Many previous approaches have predicted personality successfully. However, most previous research on personality detection has used personality scores assigned by annotators based solely on the text or audio clip, and has found that predicting self-reported personality is a much more difficult task than predicting observer-reported personality. In our study, we demonstrate how to accurately detect self-reported personality from speech using various techniques, including feature engineering and machine learning algorithms. Individual speaker differences such as personality play an important role in deception detection, adding considerably to its difficulty. We therefore hypothesize that personality scores may provide useful information to a deception classifier, helping to account for interpersonal differences in verbal and deceptive behavior. In the final step of this study, we focus on the personality differences between deceivers as well as their common characteristics. We helped collect within- and cross-cultural data to train new automatic procedures to identify deceptive behavior in American and Mandarin speakers. We examined whether personality recognition can help to predict individual differences in deceivers’ behavior. We therefore embedded a personality recognition classifier into the deception classifier using a deep neural network to improve the performance of deception detection.

City University of New York

Intergroup Variability in Personality Recognition

Author: Sengupta Arundhati
Publication venue: CUNY Academic Works
Publication date: 01/05/2018

Automatic identification of personality in conversational speech has many applications in natural language processing, such as leader identification in a meeting, adaptive dialogue systems, and dating websites. However, the widespread acceptance of automatic personality recognition through lexical and vocal characteristics is limited by the variability of the error rate of a general-purpose model among speakers from different demographic groups. While other work reports accuracy, we explored the error rates of the automatic personality recognition task using classification models for different genders and native language groups (L1). We also present a statistical experiment showing the influence of gender and L1 on the relation between acoustic-prosodic features and NEO-FFI self-reported personality traits. Our results show that the impact of demographic differences on error rate varies considerably when predicting “Big Five” personality traits from speakers’ utterances. This impact can also be observed through differences in the statistical relationship of voice characteristics with each personality inventory. These findings can be used to calibrate existing personality recognition models or to develop new models that are robust to intergroup variability.

City University of New York

Deception in Spoken Dialogue: Classification and Individual Differences

Author: Levitan Sarah Ita
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/2019

Automatic deception detection is an important problem with far-reaching implications in many areas, including law enforcement, military and intelligence agencies, social services, and politics. Despite extensive efforts to develop automated deception detection technologies, there have been few objective successes. This is likely due to the many challenges involved, including the lack of large, cleanly recorded corpora; the difficulty of acquiring ground truth labels; and major differences in incentives for lying in the laboratory vs. lying in real life. Another well-recognized issue is that there are individual and cultural differences in deception production and detection, although little has been done to identify them. Human performance at deception detection is at the level of chance, making it an uncommon problem where machines can potentially outperform humans. This thesis addresses these challenges associated with research on deceptive speech. We created the Columbia X-Cultural Deception (CXD) Corpus, a large-scale collection of deceptive and non-deceptive dialogues between native speakers of Standard American English and Mandarin Chinese. This corpus enabled a comprehensive study of deceptive speech on a large scale. In the first part of the thesis, we introduce the CXD corpus and present an empirical analysis of acoustic-prosodic and linguistic cues to deception. We also describe machine learning classification experiments to automatically identify deceptive speech using those features. Our best classifier achieves classification accuracy of almost 70%, well above human performance. The second part of this thesis addresses individual differences in deceptive speech. We present a comprehensive analysis of individual differences in verbal cues to deception, and several methods for leveraging these speaker differences to improve automatic deception classification. We identify many differences in cues to deception across gender, native language, and personality. Our comparison of approaches for leveraging these differences shows that speaker-dependent features that capture a speaker's deviation from their natural speaking style can improve deception classification performance. We also develop neural network models that accurately model speaker-specific patterns of deceptive speech. The contributions of this work add substantially to our scientific understanding of deceptive speech, and have practical implications for human practitioners and automatic deception detection.
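One common way to express a speaker's deviation from their own baseline is to z-score each feature value against that speaker's mean and standard deviation. The sketch below assumes this formulation; the thesis may define its speaker-dependent features differently. The function name and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def speaker_normalize(samples):
    """Z-score each value against the statistics of its own speaker.

    samples: list of (speaker_id, feature_value) pairs.
    Returns one normalized value per input pair, in the same order.
    """
    by_speaker = defaultdict(list)
    for speaker, value in samples:
        by_speaker[speaker].append(value)

    # Per-speaker mean and population standard deviation.
    stats = {s: (mean(v), pstdev(v)) for s, v in by_speaker.items()}

    normalized = []
    for speaker, value in samples:
        mu, sigma = stats[speaker]
        # A speaker with constant values has no deviation to report.
        normalized.append(0.0 if sigma == 0 else (value - mu) / sigma)
    return normalized


# Hypothetical pitch-like values for two speakers with different baselines.
SAMPLES = [("a", 100.0), ("a", 120.0), ("b", 200.0), ("b", 240.0)]
Z_SCORES = speaker_normalize(SAMPLES)
```

After normalization, the two speakers' values fall on a shared scale even though their raw baselines differ, which is the property a speaker-dependent feature is meant to provide.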

Columbia University Academic Commons

A Language-Based Approach to Fake News Detection Through Interpretable Features and BRNN

Author: Kerz E.
Qiao Y.
Wiechmann D.
Publication date: 01/01/2020

International Migration, Integration and Social Cohesion online publications

Can lies be faked? Comparing low-stakes and high-stakes deception video datasets from a Machine Learning perspective

Author: Camara Mateus Karvat
Maul Tomas Henrique
Paetzold Gustavo
Postal Adriana
Publication date: 18/08/2023

Despite the great impact of lies in human societies and a meager 54% human accuracy for Deception Detection (DD), Machine Learning systems that perform automated DD are still not viable for proper application in real-life settings due to data scarcity. Few publicly available DD datasets exist, and the creation of new datasets is hindered by the conceptual distinction between low-stakes and high-stakes lies. Theoretically, the two kinds of lies are so distinct that a dataset of one kind could not be used for applications of the other kind. Even though it is easier to acquire data on low-stakes deception since it can be simulated (faked) in controlled settings, these lies do not hold the same significance or depth as genuine high-stakes lies, which are much harder to obtain and hold the practical interest of automated DD systems. To investigate whether this distinction holds true from a practical perspective, we design several experiments comparing a high-stakes DD dataset and a low-stakes DD dataset, evaluating their results on a Deep Learning classifier working exclusively from video data. In our experiments, a network trained on low-stakes lies had better accuracy classifying high-stakes deception than low-stakes, although using low-stakes lies as an augmentation strategy for the high-stakes dataset decreased its accuracy. Comment: 11 pages, 3 figures

arXiv.org e-Print Archive

Multimodal Depression Detection: An Investigation of Features and Fusion Techniques for Automated Systems

Author: Morales Michelle Renee
Publication venue: CUNY Academic Works
Publication date: 01/05/2018

Depression is a serious illness that affects a large portion of the world’s population. Given the large effect it has on society, it is evident that depression is a serious health issue. This thesis evaluates, at length, how technology may aid in assessing depression. We present an in-depth investigation of features and fusion techniques for depression detection systems. We also present OpenMM: a novel tool for multimodal feature extraction. Lastly, we present novel techniques for multimodal fusion. The contributions of this work add considerably to our knowledge of depression detection systems and have the potential to improve future systems by incorporating that knowledge into their design.

City University of New York

Exploiting Group Structures to Infer Social Interactions From Videos

Author: Bolonkin Maksim
Publication venue: Dartmouth Digital Commons
Publication date: 01/09/2021

In this thesis, we consider the task of inferring the social interactions between humans by analyzing multi-modal data. Specifically, we attempt to solve some of the problems in interaction analysis, such as long-term deception detection, political deception detection, and impression prediction. In this work, we emphasize the importance of using knowledge about the group structure of the analyzed interactions. Previous works on the matter mostly neglected this aspect and analyzed a single subject at a time. Using the new Resistance dataset, collected by our collaborators, we approach the problem of long-term deception detection by designing a class of histogram-based features and a novel class of meta-features we call LiarRank. We develop a LiarOrNot model to identify spies in Resistance videos. We achieve AUCs of over 0.70, outperforming our baselines by 3% and human judges by 12%. For the problem of political deception, we first collect a dataset of videos and transcripts of 76 politicians from 18 countries making truthful and deceptive statements. We call it the Global Political Deception Dataset. We then show how to analyze the statements in a broader context by building a Video-Article-Topic graph. From this graph, we create a novel class of features called Deception Score that captures how controversial each topic is and how it affects the truthfulness of each statement. We show that our approach achieves 0.775 AUC, outperforming competing baselines. Finally, we use the Resistance data to solve the problem of dyadic impression prediction. Our proposed Dyadic Impression Prediction System (DIPS) contains four major innovations: a novel class of features called emotion ranks, sign imbalance features derived from signed graph theory, a novel method to align the facial expressions of subjects, and the concept of a multilayered stochastic network we call Temporal Delayed Network. Our DIPS architecture beats eight baselines from the literature, yielding statistically significant improvements of 19.9-30.8% in AUC.

Dartmouth Digital Commons (Dartmouth College)