Shapes of Emotions: Multimodal Emotion Recognition in Conversations via Emotion Shifts
Emotion Recognition in Conversations (ERC) is an important and active
research area. Recent work has shown the benefits of using multiple modalities
(e.g., text, audio, and video) for the ERC task. In a conversation,
participants tend to maintain a particular emotional state unless some stimulus
evokes a change. There is a continuous ebb and flow of emotions in a
conversation. Inspired by this observation, we propose a multimodal ERC model
and augment it with an emotion-shift component that improves performance. The
proposed emotion-shift component is modular and can be added to any existing
multimodal ERC model (with a few modifications). We experiment with different
variants of the model, and results show that the inclusion of the emotion-shift
signal helps the model outperform existing ERC models on the MOSEI and
IEMOCAP datasets.
Comment: 13 pages, Accepted at Workshop on Performance and Interpretability
Evaluations of Multimodal, Multipurpose, Massive-Scale Models, COLING 202
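The abstract does not describe the component's internals; the following is a minimal PyTorch sketch under our own assumption (not the paper's) that the emotion-shift component scores whether the emotion changes between consecutive utterances and uses that score to gate the emotional context carried forward. All module and variable names are hypothetical.

```python
import torch
import torch.nn as nn

class EmotionShiftGate(nn.Module):
    """Illustrative shift detector: scores whether the emotion changes between
    utterance t-1 and utterance t, then gates the carried emotional context."""

    def __init__(self, dim: int):
        super().__init__()
        self.shift_scorer = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1)
        )

    def forward(self, prev_utt, curr_utt, prev_context):
        # Probability that an emotion shift occurs at this turn.
        p_shift = torch.sigmoid(
            self.shift_scorer(torch.cat([prev_utt, curr_utt], dim=-1))
        )
        # Keep more of the previous emotional context when no shift is predicted.
        gated_context = (1.0 - p_shift) * prev_context + p_shift * curr_utt
        return gated_context, p_shift

# Usage with fused multimodal (text + audio + video) utterance vectors.
dim = 256
gate = EmotionShiftGate(dim)
prev_utt, curr_utt, prev_ctx = (torch.randn(1, dim) for _ in range(3))
context, p_shift = gate(prev_utt, curr_utt, prev_ctx)
```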
Emotion Recognition in Conversation using Probabilistic Soft Logic
Creating agents that can both appropriately respond to conversations and
understand complex human linguistic tendencies and social cues has been a
long-standing challenge in the NLP community. A recent pillar of research
revolves around emotion recognition in conversation (ERC), a sub-field of
emotion recognition that focuses on conversations or dialogues containing two or more
utterances. In this work, we explore an approach to ERC that exploits the use
of neural embeddings along with complex structures in dialogues. We implement
our approach in a framework called Probabilistic Soft Logic (PSL), a
declarative templating language that uses first-order-like logical rules that,
when combined with data, define a particular class of graphical model.
Additionally, PSL provides functionality for the incorporation of results from
neural models into PSL models. This allows our model to take advantage of
advanced neural methods, such as sentence embeddings, and logical reasoning
over the structure of a dialogue. We compare our method with state-of-the-art
purely neural ERC systems, and see almost a 20% improvement. With these
results, we provide an extensive qualitative and quantitative analysis over the
DailyDialog conversation dataset.
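PSL is its own framework with its own rule language; purely as an illustration of how a neural prior and a dialogue-structure rule can be combined through soft logic, the sketch below applies a Łukasiewicz-style rule in plain Python. The label set, rule, and weights are hypothetical and are not taken from the paper.

```python
import numpy as np

EMOTIONS = ["neutral", "happy", "sad", "angry"]  # hypothetical label set

def lukasiewicz_and(a: float, b: float) -> float:
    """Lukasiewicz soft conjunction, as used in PSL-style hinge-loss models."""
    return max(0.0, a + b - 1.0)

def rescore(neural_probs, prev_probs, same_speaker, w_prior=1.0, w_inertia=0.5):
    """Combine a neural emotion prior with one soft dialogue rule:
    SameSpeaker(t-1, t) & Emotion(t-1, e) -> Emotion(t, e)."""
    score = w_prior * neural_probs.copy()
    for e, p_prev in enumerate(prev_probs):
        # Truth value of the rule body under Lukasiewicz logic.
        score[e] += w_inertia * lukasiewicz_and(same_speaker, p_prev)
    return score / score.sum()

# Usage: class probabilities from a sentence-embedding classifier for the
# current utterance and for the previous utterance by the same speaker.
curr = np.array([0.40, 0.30, 0.20, 0.10])
prev = np.array([0.10, 0.70, 0.10, 0.10])
print(dict(zip(EMOTIONS, rescore(curr, prev, same_speaker=1.0))))
```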
Deep Emotion Recognition in Textual Conversations: A Survey
While Emotion Recognition in Conversations (ERC) has seen a tremendous
advancement in the last few years, new applications and implementation
scenarios present novel challenges and opportunities. These range from
leveraging the conversational context and modelling speaker and emotion
dynamics, to interpreting common-sense expressions, informal language and
sarcasm, addressing the challenges of real-time ERC, recognizing emotion
causes, handling different taxonomies across datasets, supporting multilingual
ERC, and providing interpretability. This survey
starts by introducing ERC, elaborating on the challenges and opportunities
pertaining to this task. It proceeds with a description of the emotion
taxonomies and a variety of ERC benchmark datasets employing such taxonomies.
This is followed by descriptions of the most prominent works in ERC with
explanations of the Deep Learning architectures employed. Then, it provides
recommended ERC practices towards better frameworks, elaborating on methods
for handling subjectivity in annotation and modelling, and methods for dealing with
the typically unbalanced ERC datasets. Finally, it presents systematic review
tables comparing several works regarding the methods used and their
performance. The survey highlights the advantage of leveraging techniques to
address unbalanced data, the exploration of mixed emotions and the benefits of
incorporating annotation subjectivity in the learning phase.
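On the point about typically unbalanced ERC datasets, one common mitigation, which the survey may discuss among others, is inverse-frequency class weighting of the loss. A minimal PyTorch sketch with hypothetical label counts:

```python
import torch
import torch.nn as nn

# Hypothetical per-class utterance counts; ERC corpora are typically dominated
# by the neutral class.
label_counts = torch.tensor([4500.0, 1600.0, 700.0, 300.0, 1100.0, 250.0, 1200.0])

# Inverse-frequency weights, normalised so that the average weight is 1.
weights = label_counts.sum() / (len(label_counts) * label_counts)

criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, len(label_counts))           # model outputs for 8 utterances
targets = torch.randint(0, len(label_counts), (8,))  # gold emotion labels
loss = criterion(logits, targets)
```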
Beyond linguistic cues: fine-grained conversational emotion recognition via belief-desire modelling
Emotion recognition in conversation (ERC) is essential for dialogue systems to identify the emotions expressed by speakers. Although previous studies have made significant progress, accurately recognizing and interpreting similar fine-grained emotions while properly accounting for individual variability remains a challenge. One particularly under-explored area is the role of individual beliefs and desires in modelling emotion. Inspired by the Belief-Desire Theory of Emotion, we propose a novel method for conversational emotion recognition that incorporates both beliefs and desires to accurately identify emotions. We extract emotion-eliciting events from utterances and construct graphs that represent beliefs and desires in conversations. By applying message passing between nodes, our graph effectively models the utterance context, the speaker's global state, and the interaction between emotional beliefs, desires, and utterances. We evaluate our model's performance by conducting extensive experiments on four popular ERC datasets and comparing it with multiple state-of-the-art models. The experimental results demonstrate the superiority of our proposed model and validate the effectiveness of each module in the model.
© 2024 ELRA Language Resource Association: CC BY-NC 4.0
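The paper's exact graph construction is not reproduced here; the sketch below only illustrates the general pattern of message passing over a small graph whose nodes could stand for utterances, beliefs, and desires. The mean-aggregation layer and all names are assumptions, not the authors' design.

```python
import torch
import torch.nn as nn

class SimpleMessagePassing(nn.Module):
    """One round of mean-aggregation message passing over a small graph whose
    nodes might represent utterances, beliefs, and desires (illustrative only)."""

    def __init__(self, dim: int):
        super().__init__()
        self.update = nn.GRUCell(dim, dim)

    def forward(self, node_feats, adj):
        # adj[i, j] = 1 if node j sends a message to node i.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        messages = (adj @ node_feats) / deg        # mean over incoming neighbours
        return self.update(messages, node_feats)   # GRU-style node update

# Usage: e.g. 3 utterance nodes, 2 belief nodes, 2 desire nodes, 64-d features.
dim, num_nodes = 64, 7
feats = torch.randn(num_nodes, dim)
adj = (torch.rand(num_nodes, num_nodes) > 0.5).float()
updated = SimpleMessagePassing(dim)(feats, adj)
```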
A Transformer-Based Model With Self-Distillation for Multimodal Emotion Recognition in Conversations
Emotion recognition in conversations (ERC), the task of recognizing the
emotion of each utterance in a conversation, is crucial for building empathetic
machines. Existing studies focus mainly on capturing context- and
speaker-sensitive dependencies on the textual modality but ignore the
significance of multimodal information. In contrast to emotion recognition in
textual conversations, capturing intra- and inter-modal interactions between
utterances, learning weights between different modalities, and enhancing modal
representations play important roles in multimodal ERC. In this paper, we
propose a transformer-based model with self-distillation (SDT) for the task.
The transformer-based model captures intra- and inter-modal interactions by
utilizing intra- and inter-modal transformers, and learns weights between
modalities dynamically by designing a hierarchical gated fusion strategy.
Furthermore, to learn more expressive modal representations, we treat soft
labels of the proposed model as extra training supervision. Specifically, we
introduce self-distillation to transfer knowledge of hard and soft labels from
the proposed model to each modality. Experiments on IEMOCAP and MELD datasets
demonstrate that SDT outperforms previous state-of-the-art baselines.
Comment: 13 pages, 10 figures. Accepted by IEEE Transactions on Multimedia
(TMM)
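As a rough illustration of the self-distillation idea described in the abstract (not the authors' implementation), each modality branch can be trained against both the hard labels and the fused model's softened predictions; the temperature and loss weights below are assumptions.

```python
import torch
import torch.nn.functional as F

def self_distillation_loss(modality_logits, fused_logits, targets,
                           temperature: float = 2.0, alpha: float = 0.5):
    """Hard-label cross-entropy plus KL divergence to the fused model's
    temperature-softened predictions (soft labels)."""
    hard = F.cross_entropy(modality_logits, targets)
    soft = F.kl_div(
        F.log_softmax(modality_logits / temperature, dim=-1),
        F.softmax(fused_logits.detach() / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return alpha * hard + (1.0 - alpha) * soft

# Usage: logits from a single modality branch and from the fused model,
# for a batch of 8 utterances and 6 emotion classes.
text_logits = torch.randn(8, 6, requires_grad=True)
fused_logits = torch.randn(8, 6)
targets = torch.randint(0, 6, (8,))
loss = self_distillation_loss(text_logits, fused_logits, targets)
```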
IITK at SemEval-2024 Task 10: Who is the speaker? Improving Emotion Recognition and Flip Reasoning in Conversations via Speaker Embeddings
This paper presents our approach for the SemEval-2024 Task 10: Emotion
Discovery and Reasoning its Flip in Conversations. For the Emotion Recognition
in Conversations (ERC) task, we utilize a masked-memory network along with
speaker participation. We propose a transformer-based speaker-centric model for
the Emotion Flip Reasoning (EFR) task. We also introduce Probable Trigger Zone,
a region of the conversation that is more likely to contain the utterances
causing the emotion to flip. For sub-task 3, the proposed approach achieves a
5.9-point improvement in F1 score over the task baseline. The ablation study results
highlight the significance of various design choices in the proposed method.
Comment: Accepted at SemEval 2024, NAACL 2024; 10 pages
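As a minimal sketch of the speaker-embedding idea (the injection point, dimensions, and encoder are assumptions, not the paper's exact design), a learned per-speaker embedding can be added to each utterance representation before a conversation-level transformer encoder:

```python
import torch
import torch.nn as nn

class SpeakerAwareEncoder(nn.Module):
    """Adds a learned speaker embedding to each utterance vector, then encodes
    the conversation with a standard transformer encoder (illustrative)."""

    def __init__(self, dim: int, num_speakers: int, num_layers: int = 2):
        super().__init__()
        self.speaker_emb = nn.Embedding(num_speakers, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, utterances, speaker_ids):
        # utterances: [batch, turns, dim]; speaker_ids: [batch, turns]
        return self.encoder(utterances + self.speaker_emb(speaker_ids))

# Usage: a two-speaker conversation with 5 turns and 128-d utterance vectors.
model = SpeakerAwareEncoder(dim=128, num_speakers=2)
utts = torch.randn(1, 5, 128)
spk = torch.tensor([[0, 1, 0, 1, 0]])
out = model(utts, spk)
```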
