8 research outputs found

    Psychological Metrics for Dialog System Evaluation

    Full text link
    We present metrics for evaluating dialog systems through a psychologically grounded "human" lens in which conversational agents express a diversity of both states (e.g., emotion) and traits (e.g., personality), just as people do. We present five interpretable metrics from established psychology that are fundamental to human communication and relationships: emotional entropy, linguistic style matching, emotion matching, agreeableness, and empathy. These metrics can be applied (1) across dialogs and (2) on turns within dialogs. The psychological metrics are compared against seven state-of-the-art traditional metrics (e.g., BARTScore and BLEURT) on seven standard dialog system data sets. We also introduce a novel data set, the Three Bot Dialog Evaluation Corpus, which consists of annotated conversations from ChatGPT, GPT-3, and BlenderBot. We demonstrate that our proposed metrics offer novel information; they are uncorrelated with traditional metrics, can be used to meaningfully compare dialog systems, and lead to increased accuracy (beyond existing traditional metrics) in predicting crowd-sourced dialog judgements. The interpretability and unique signal of our psychological metrics make them a valuable tool for evaluating and improving dialog systems.
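    A minimal sketch of one of these metrics, emotional entropy, under the assumption that it is the Shannon entropy of the emotion labels an agent expresses across its turns; the label set and the per-turn emotion classifier are placeholders, not the authors' implementation.

```python
import math
from collections import Counter

def emotional_entropy(turn_emotions):
    """Shannon entropy (in bits) of the emotion labels expressed across turns.

    turn_emotions: one emotion label per dialog turn, e.g. from any
    off-the-shelf emotion classifier (a stand-in for the paper's setup).
    """
    counts = Counter(turn_emotions)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Toy illustration: an agent expressing varied emotions scores higher than
# one that stays uniformly neutral.
print(emotional_entropy(["joy", "neutral", "sadness", "joy", "anger"]))  # ~1.92
print(emotional_entropy(["neutral"] * 5))                                # 0.0
```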

    Perseverative Thinking is Associated with Features of Spoken Language

    No full text
    Perseverative thinking (PT) is a process that consists of difficulty disengaging from negative thinking; two common forms are worry and rumination. Existing measures of PT require individuals to report on their own thought processes, a method that may be subject to bias or errors. An unobtrusive, behavioral measure of PT would circumvent these biases, improving our ability to detect PT. One promising behavioral method is computational linguistic analysis, which has recently been used to investigate personality and mental health constructs (e.g., Guntuku et al., 2017; Park et al., 2015). Evidence from the co-rumination and expressed worry literatures (e.g., Parkinson & Simons, 2012; Spendelow et al., 2017), combined with the fact that PT is verbal-linguistic in nature (Ehring & Watkins, 2008), suggests that PT may be particularly well-suited for detection in natural language. In this project, we will examine linguistic correlates of PT and build and test a language-based model of PT.
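    A minimal sketch of a language-based PT model of the kind proposed, assuming TF-IDF word features and ridge regression onto a self-report PT score; the texts, scores, and model choices below are invented illustrations, not the project's actual design.

```python
# Sketch only: regress an (invented) self-report PT rating on word features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

texts = [
    "I keep going over the same worry again and again",
    "I had a calm and relaxing weekend outside",
    "Why did I say that? I can't stop replaying it",
    "Looking forward to trying a new recipe tonight",
]
pt_scores = [4.5, 1.0, 4.0, 1.5]  # toy self-report PT ratings, not real data

model = make_pipeline(TfidfVectorizer(), Ridge(alpha=1.0))
model.fit(texts, pt_scores)
print(model.predict(["I cannot stop thinking about the same problem"]))
```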

    Multilingual Language Models are not Multicultural: A Case Study in Emotion

    Full text link
    Emotions are experienced and expressed differently across the world. In order to use Large Language Models (LMs) for multilingual tasks that require emotional sensitivity, LMs must reflect this cultural variation in emotion. In this study, we investigate whether widely used multilingual LMs in 2023 reflect differences in emotional expressions across cultures and languages. We find that embeddings obtained from LMs (e.g., XLM-RoBERTa) are Anglocentric, and generative LMs (e.g., ChatGPT) reflect Western norms, even when responding to prompts in other languages. Our results show that multilingual LMs do not successfully learn the culturally appropriate nuances of emotion, and we highlight possible research directions towards correcting this. Comment: Accepted to WASSA at ACL 2023.
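    A rough probe in the spirit of this analysis (not the authors' exact method): embed emotion statements with XLM-RoBERTa and compare cross-language similarity to an English anchor; geometry that stays systematically closer to English phrasings would be consistent with the Anglocentric embeddings described. The model checkpoint, mean pooling, and example sentences are assumptions.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
lm = AutoModel.from_pretrained("xlm-roberta-base")

def embed(sentence):
    # Mean-pool the last hidden states into a single sentence vector.
    inputs = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = lm(**inputs).last_hidden_state  # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)

anchor = embed("I feel so much pride today.")                      # English anchor
same_lang = embed("I am bursting with pride.")                      # English paraphrase
other_lang = embed("Aaj mujhe bahut garv mahsoos ho raha hai.")     # Hindi paraphrase

cos = torch.nn.functional.cosine_similarity
print(cos(anchor, same_lang, dim=0), cos(anchor, other_lang, dim=0))
```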

    Moral Foundations Twitter Corpus: A collection of 35k tweets annotated for moral sentiment

    No full text
    Research has shown that accounting for moral sentiment in natural language can yield insight into a variety of on- and off-line phenomena, such as message diffusion, protest dynamics, and social distancing. However, measuring moral sentiment in natural language is challenging and the difficulty of this task is exacerbated by the limited availability of annotated data. To address this issue, we introduce the Moral Foundations Twitter Corpus, a collection of 35,108 tweets that have been curated from seven distinct domains of discourse and hand-annotated by at least three trained annotators for 10 categories of moral sentiment. To facilitate investigations of annotator response dynamics, we also provide psychological and demographic meta-data for each annotator. Finally, we report moral sentiment classification baselines for this corpus using a range of popular methodologies.
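    A baseline sketch of the kind reported, assuming majority-aggregated labels are available as a CSV with one column per moral sentiment category; the file and column names are hypothetical, and one-vs-rest logistic regression over TF-IDF features is just one of the popular methodologies mentioned.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

# The 10 moral sentiment categories (virtue/vice pairs of the five foundations).
CATEGORIES = ["care", "harm", "fairness", "cheating", "loyalty",
              "betrayal", "authority", "subversion", "purity", "degradation"]

df = pd.read_csv("mftc_majority_labels.csv")  # hypothetical pre-aggregated file
X_train, X_test, y_train, y_test = train_test_split(
    df["tweet_text"], df[CATEGORIES].values, test_size=0.2, random_state=0)

clf = make_pipeline(TfidfVectorizer(min_df=3),
                    OneVsRestClassifier(LogisticRegression(max_iter=1000)))
clf.fit(X_train, y_train)
print(f1_score(y_test, clf.predict(X_test), average="macro"))
```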

    Introducing the Gab Hate Corpus: Defining and applying hate-based rhetoric to social media posts at scale

    No full text
    We present the Gab Hate Corpus (GHC), consisting of 27,665 posts from the social network service gab.com, each annotated for the presence of “hate-based rhetoric” by a minimum of three annotators. Posts were labeled according to a coding typology derived from a synthesis of hate speech definitions across legal precedent, previous hate speech coding typologies, and definitions from psychology and sociology, comprising hierarchical labels indicating dehumanizing and violent speech as well as indicators of targeted groups and rhetorical framing. We provide inter-annotator agreement statistics and perform a classification analysis in order to validate the corpus and establish performance baselines. The GHC complements existing hate speech datasets in its theoretical grounding and by providing a large, representative sample of richly annotated social media posts.
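    A minimal sketch of the kind of inter-annotator agreement statistics mentioned, computed here as mean pairwise Cohen's kappa on the top-level hate label; the one-column-per-annotator layout and the toy matrix are invented for illustration, not the GHC's actual agreement procedure.

```python
from itertools import combinations

import numpy as np
from sklearn.metrics import cohen_kappa_score

# Rows = posts, columns = three annotators, values = 0/1 hate label (toy data).
labels = np.array([
    [0, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
])

kappas = [cohen_kappa_score(labels[:, i], labels[:, j])
          for i, j in combinations(range(labels.shape[1]), 2)]
print("mean pairwise kappa:", np.mean(kappas))
```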

    The Gab Hate Corpus

    No full text
    The growing prominence of online hate speech is a threat to a safe and just society. This endangering phenomenon requires collaboration across the sciences in order to generate evidence-based knowledge of, and policies for, the dissemination of hatred in online spaces. To foster such collaborations, here we present the Gab Hate Corpus (GHC), consisting of 27,665 posts from the social network service gab.ai, each annotated by a minimum of three trained annotators. Annotators were trained to label posts according to a coding typology derived from a synthesis of hate speech definitions across legal, computational, psychological, and sociological research. We detail the development of the corpus, describe the resulting distributions of hate-based rhetoric, target group, and rhetorical framing labels, and establish baseline classification performance for each using standard natural language processing methods. The GHC, which is the largest theoretically-justified, annotated corpus of hate speech to date, provides opportunities for training and evaluating hate speech classifiers and for scientific inquiries into the linguistic and network components of hate speech.
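    A sketch of a standard baseline of the sort described: majority-vote the three annotations per post into a single binary hate label, then train a linear TF-IDF classifier; the file layout and column names are assumptions about the GHC release format, not its documented schema.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical layout: one row per (post, annotator) with a 0/1 "hate" column.
anno = pd.read_csv("ghc_annotations.csv")
posts = (anno.groupby(["post_id", "text"])["hate"]
             .mean().ge(0.5).astype(int)        # majority vote across annotators
             .reset_index(name="label"))

X_train, X_test, y_train, y_test = train_test_split(
    posts["text"], posts["label"], test_size=0.2,
    random_state=0, stratify=posts["label"])

clf = make_pipeline(TfidfVectorizer(min_df=5), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```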