Deception in Spoken Dialogue: Classification and Individual Differences
Automatic deception detection is an important problem with far-reaching implications in many areas, including law enforcement, military and intelligence agencies, social services, and politics. Despite extensive efforts to develop automated deception detection technologies, there have been few objective successes. This is likely due to the many challenges involved, including the lack of large, cleanly recorded corpora; the difficulty of acquiring ground truth labels; and major differences in incentives for lying in the laboratory vs. lying in real life. Another well-recognized issue is that there are individual and cultural differences in deception production and detection, although little has been done to identify them. Human performance at deception detection is at the level of chance, making it an uncommon problem where machines can potentially outperform humans.
This thesis addresses the challenges associated with research on deceptive speech. We created the Columbia X-Cultural Deception (CXD) Corpus, a large-scale collection of deceptive and non-deceptive dialogues between native speakers of Standard American English and Mandarin Chinese, which enabled a comprehensive study of deceptive speech at scale.
In the first part of the thesis, we introduce the CXD corpus and present an empirical analysis of acoustic-prosodic and linguistic cues to deception. We also describe machine learning classification experiments to automatically identify deceptive speech using those features. Our best classifier achieves classification accuracy of almost 70%, well above human performance.
The second part of this thesis addresses individual differences in deceptive speech. We present a comprehensive analysis of individual differences in verbal cues to deception, and several methods for leveraging these speaker differences to improve automatic deception classification. We identify many differences in cues to deception across gender, native language, and personality. Our comparison of approaches for leveraging these differences shows that speaker-dependent features that capture a speaker's deviation from their natural speaking style can improve deception classification performance. We also develop neural network models that accurately model speaker-specific patterns of deceptive speech.
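As an illustrative sketch of the speaker-dependent features described above (not the thesis's exact formulation), one common way to capture a speaker's deviation from their natural speaking style is to z-score each utterance-level feature against statistics computed from that same speaker's own speech. The function and parameter names here are assumptions for illustration:

```python
import statistics

def deviation_features(feature_values, speaker_baseline):
    """Z-score utterance-level feature values (e.g. pitch, intensity)
    against the same speaker's baseline values, so each feature
    measures deviation from that speaker's natural speaking style.

    `speaker_baseline` is a list of the same feature measured over
    the speaker's other speech. Names and the choice of z-score
    normalization are illustrative, not the thesis's exact method.
    """
    mean = statistics.mean(speaker_baseline)
    stdev = statistics.stdev(speaker_baseline)
    return [(v - mean) / stdev for v in feature_values]
```

Because the normalization is per-speaker, the same raw feature value can yield different deviation scores for different speakers, which is the point: classifiers see how unusual an utterance is *for that speaker*.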
The contributions of this work add substantially to our scientific understanding of deceptive speech, and have practical implications for human practitioners and automatic deception detection.
Asian hate speech detection on Twitter during COVID-19
Coronavirus disease 2019 (COVID-19) emerged in Wuhan, China, in late 2019 and, after spreading widely in Asian countries, rapidly reached the rest of the world. The disease led governments worldwide to declare a public health crisis, with severe measures taken to slow its spread. The pandemic affected the lives of millions of people. Many citizens who lost loved ones and jobs experienced a wide range of emotions, such as disbelief, shock, concern about health, fear about food supplies, anxiety, and panic. These phenomena led to a rise in racism and hate against Asians in Western countries, especially in the United States. An analysis of preliminary police data by the Center for the Study of Hate & Extremism at California State University shows that anti-Asian hate crime in 16 of America's largest cities increased by 149% in 2020. In this study, we first establish a baseline of Americans' anti-Asian hate speech on Twitter. We then present an approach to balance the biased dataset and consequently improve the performance of tweet classification. We downloaded 10 million tweets through the Twitter API v2; this study uses a small portion of that collection, and we will use the entire dataset in a future study. Three thousand tweets from our collected corpus were annotated by four annotators, three Asian and one Asian-American. Using this data, we built predictive models of hate speech using various machine learning and deep learning methods. Our machine learning methods include Random Forest, K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), Logistic Regression, Decision Tree, and Naive Bayes. Our deep learning models include basic Long Short-Term Memory (LSTM), Bidirectional LSTM, Bidirectional LSTM with dropout, a convolutional model, and Bidirectional Encoder Representations from Transformers (BERT).
We also refined our dataset by filtering out tweets that were ambiguous to the annotators, based on low Fleiss' kappa agreement between annotators. Our final results show that Logistic Regression achieved the best performance among the statistical machine learning models, with an F1 score of 0.72, while BERT achieved the best performance among the deep learning models, with an F1 score of 0.85.
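The agreement-based filtering described above can be sketched using the per-item agreement term that underlies Fleiss' kappa: the fraction of annotator pairs that assign the same label to a tweet. The threshold and all names below are illustrative assumptions, not values from the article:

```python
from itertools import combinations

def per_item_agreement(ratings):
    """Fraction of annotator pairs agreeing on one item.

    `ratings` is the list of labels the annotators assigned to a
    single tweet, e.g. ["hate", "hate", "none", "hate"]. This is
    the per-item agreement term P_i used in computing Fleiss' kappa.
    """
    pairs = list(combinations(ratings, 2))
    return sum(a == b for a, b in pairs) / len(pairs)

def filter_ambiguous(items, threshold=0.5):
    """Keep only items whose annotators mostly agree.

    `items` is a list of (tweet, ratings) pairs. The 0.5 threshold
    is an illustrative assumption, not the article's choice.
    """
    return [(tweet, ratings) for tweet, ratings in items
            if per_item_agreement(ratings) >= threshold]
```

Dropping low-agreement items trades corpus size for label quality, which is one plausible reason the reported F1 scores improve after filtering.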
Entrainment, dominance and alliance in Supreme Court hearings
A major goal of the Cognitive Infocommunications approach is to develop applications in which human and artificial cognitive systems work together more effectively. A critical step in this process is improving our understanding of human-human interaction so that it may be modeled more closely. Our work addresses this task by examining the role of entrainment - the propensity of conversational partners to behave like one another - in (1) the production of conversational fillers (CFs) and acoustic intensity; (2) patterns of turn-taking; and (3) linguistic style markers, and how all of these relate to power relations, conflict, and voting behavior in a corpus of speech produced by justices and lawyers during oral arguments of the U.S. Supreme Court in the 2001 term. We examine several different measures of entrainment in justice-lawyer pairs to see whether they are related to justices' favorable or unfavorable votes for the lawyer's side. While two measures (a naive measure of similarity in CF rates and global similarity in CF phonetic realizations over the entire session) show no relationship, a third, which measures local entrainment in CFs in lawyer-justice pairs, does identify a significant positive relationship between entrainment and justices' votes. With respect to local entrainment in intensity, we find that lawyers entrain more to justices than justices to lawyers, although female lawyers do not entrain more than male lawyers. When we examine the relationship between entrainment in intensity and judicial voting, we find that, when justices voted for the petitioners, there is significant evidence of entrainment by both petitioners and respondents to justices. With respect to turn-taking behavior, we find that certain patterns of overlaps in turn exchanges between justices and lawyers are correlated with voting behavior for four of the justices in our corpus.
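A common way to operationalize local entrainment in a feature such as intensity is the mean negative absolute difference between partners' feature values at adjacent turns: values closer to zero mean the speakers track each other more closely turn by turn. The sketch below follows that general formulation and is not necessarily the paper's exact metric:

```python
def local_entrainment(turns_a, turns_b):
    """Local entrainment in one acoustic feature (e.g. mean
    intensity in dB) between two speakers.

    turns_a[i] is speaker A's feature value on turn i, and
    turns_b[i] is speaker B's value on the turn that follows it.
    Returns the mean negative absolute difference at adjacent
    turns; higher (closer to zero) means closer turn-by-turn
    matching. Names and formulation are illustrative assumptions.
    """
    assert len(turns_a) == len(turns_b) and turns_a
    diffs = [abs(a - b) for a, b in zip(turns_a, turns_b)]
    return -sum(diffs) / len(diffs)
```

Because the measure is computed over adjacent turns rather than session-wide averages, it can detect the kind of moment-to-moment accommodation that global similarity measures miss, which matches the contrast the abstract draws between the global and local CF measures.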
Finally, we find that there are lexical cues to divisiveness within the Court itself that can distinguish cases with close verdicts from cases with unanimous verdicts. We link these results to the possibility of building cognitive infocommunication interfaces that exploit features of human-human entrainment to increase the effectiveness of human-machine interactions.
Beňuš, Stefan (Slovak Academy of Sciences, Slovakia); Gravano, Agustin (Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina; Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Departamento de Computación, Argentina); Levitan, Rivka (Columbia University, United States); Levitan, Sarah Ita (Columbia University, United States); Willson, Laura (Columbia University, United States); Hirschberg, Julia (Columbia University, United States)