
    Towards Better Understanding of Spoken Conversations: Assessment of Emotion and Sentiment

    Emotions play a vital role in our daily life, as they help us convey information to other parties that is difficult to express verbally. While humans can easily perceive emotions, they are notoriously difficult for machines to define and recognize. However, automatically detecting the emotion of a spoken conversation can be useful for a diverse range of applications such as human-machine interaction and conversation analysis. In this thesis, we present several machine learning approaches to recognizing emotion from isolated utterances and long recordings. Isolated utterances are usually shorter than 10 seconds in duration and are assumed to contain only one major emotion. One of the main obstacles to achieving high emotion recognition accuracy is the lack of large annotated datasets. We propose to mitigate this problem with transfer learning and data augmentation techniques. We show that x-vector representations extracted from speaker recognition models (x-vector models) contain emotion-predictive information, and that adapting those models provides significant improvements in emotion recognition performance. To further improve performance, we propose a novel perceptually motivated data augmentation method, Copy-Paste, for isolated utterances. This method is based on the assumption that the presence of any emotion other than neutral dictates a speaker's overall perceived emotion in a recording. As isolated utterances are assumed to contain only one emotion, the proposed models make predictions at the utterance level. However, these models cannot be directly applied to conversations, which can contain multiple emotions, unless the locations of emotion boundaries are known. In this work, we propose to recognize emotions in conversations by performing frame-level classification, where predictions are made at regular intervals. We compare models trained on isolated utterances and on conversations.
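The Copy-Paste idea described above can be illustrated with a minimal sketch. This is not the thesis's implementation; it only shows the stated assumption that a non-neutral segment dominates the perceived emotion of the whole recording, so concatenating an emotional utterance with a neutral one yields a new training example that keeps the emotional label. Waveforms are represented here as plain sample lists for simplicity.

```python
def copy_paste(emotional_wav, neutral_wav, emotional_label):
    """Concatenate an emotional utterance with a neutral one and keep
    the emotional label, following the assumption that any non-neutral
    emotion dominates the perceived emotion of the whole recording."""
    augmented_wav = emotional_wav + neutral_wav
    return augmented_wav, emotional_label

# Illustrative usage: an "angry" clip pasted before a neutral clip
# still counts as an "angry" training example.
wav, label = copy_paste([0.1, -0.2, 0.3], [0.0, 0.0], "angry")
```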
We propose DiverseCatAugment, a data augmentation method based on the attention operation, to improve transformer models. To further improve performance, we incorporate the turn-taking structure of conversations into our models. Annotating utterances with emotions is not a simple task, and it depends on the number of emotions used for annotation. However, annotation schemes can be changed to reduce annotation effort depending on the application. We consider one such application: predicting customer satisfaction (CSAT) in a call center conversation, where the goal is to predict the overall sentiment of the customer. We conduct a comprehensive search for adequate acoustic and lexical representations at different granular levels of conversations. We show that the methods that use transfer learning (x-vectors and CSAT Tracker) perform best. Our error analysis shows that calls where customers accomplished their goal but were still dissatisfied are the most difficult to predict correctly, and that the customer's speech is more emotional than the agent's speech.
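The frame-level classification strategy mentioned above can be sketched as a sliding window over conversation frames. This is an illustrative assumption about the setup, not the thesis's code: a classifier (here a placeholder callable) is applied every `hop` frames so that a long recording can carry several emotions without the emotion boundaries being known in advance.

```python
def frame_level_emotions(frames, classify, win=100, hop=50):
    """Slide a fixed-size window over conversation frames and emit one
    prediction per hop, yielding emotion labels at regular intervals
    instead of a single utterance-level label."""
    predictions = []
    for start in range(0, max(1, len(frames) - win + 1), hop):
        window = frames[start:start + win]
        predictions.append(classify(window))
    return predictions

# Illustrative usage with a toy classifier over 250 frames:
# windows start at frames 0, 50, 100, and 150.
toy_classify = lambda window: "non-neutral" if sum(window) > 0 else "neutral"
labels = frame_level_emotions(list(range(250)), toy_classify)
```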

    MT-GenEval: A Counterfactual and Contextual Dataset for Evaluating Gender Accuracy in Machine Translation

    As generic machine translation (MT) quality has improved, the need for targeted benchmarks that explore fine-grained aspects of quality has increased. In particular, gender accuracy in translation has implications for output fluency, translation accuracy, and ethics. In this paper, we introduce MT-GenEval, a benchmark for evaluating gender accuracy in translation from English into eight widely spoken languages. MT-GenEval complements existing benchmarks by providing realistic, gender-balanced, counterfactual data in eight language pairs where the gender of individuals is unambiguous in the input segment, including multi-sentence segments requiring inter-sentential gender agreement. Our data and code are publicly available under a CC BY SA 3.0 license. Comment: Accepted at EMNLP 2022. Data and code: https://github.com/amazon-research/machine-translation-gender-eva
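A counterfactual gender-accuracy check in the spirit of the benchmark above can be sketched as follows. This is a hypothetical simplification, not MT-GenEval's actual scoring code: a translation counts as correct if it contains the gendered terms of the annotated gender and none of the terms from the counterfactual variant. The term lists and example strings are illustrative assumptions.

```python
def gender_accurate(hypothesis, required_terms, counterfactual_terms):
    """Return True if the hypothesis uses the gendered terms of the
    annotated reference and avoids all counterfactual-gender terms."""
    words = set(hypothesis.lower().split())
    return (all(term in words for term in required_terms)
            and not any(term in words for term in counterfactual_terms))

# Illustrative usage with made-up French gendered terms:
ok = gender_accurate("elle est actrice", ["actrice"], ["acteur"])
bad = gender_accurate("elle est acteur", ["actrice"], ["acteur"])
```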