Real-Time Emotion Recognition via Attention Gated Hierarchical Memory Network
Real-time emotion recognition (RTER) in conversations is important for developing emotionally intelligent chatting machines. Because future context is unavailable in RTER, it becomes critical to build the memory bank carefully to capture historical context and to summarize the memories appropriately to retrieve relevant information. We propose an Attention Gated Hierarchical
Memory Network (AGHMN) to address the problems of prior work: (1) Commonly used
convolutional neural networks (CNNs) for utterance feature extraction are less
compatible with the memory modules; (2) Unidirectional gated recurrent units
(GRUs) only allow each historical utterance to have context before it,
preventing information propagation in the opposite direction; (3) The Soft Attention used for summarization loses the positional and ordering information of memories, regardless of how the memory bank is built. Specifically, we propose
a Hierarchical Memory Network (HMN) with a bidirectional GRU (BiGRU) as the
utterance reader and a BiGRU fusion layer for the interaction between
historical utterances. For memory summarization, we propose an Attention GRU (AGRU) in which the attention weights are used to update the internal state of the GRU. We further extend the AGRU to a bidirectional variant (BiAGRU) to balance the contextual information from recent memories with that from distant memories.
We conduct experiments on two emotion conversation datasets with extensive
analysis, demonstrating the efficacy of our AGHMN models.
Comment: AAAI 2020, 8 pages, 5 figures
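As a rough illustration of the AGRU update described above, the sketch below (PyTorch, with illustrative names) lets a precomputed attention weight play the role of the GRU's update gate; gating the candidate state outside the cell is an approximation of the paper's formulation, not the authors' code.

```python
import torch
import torch.nn as nn

class AGRUCell(nn.Module):
    """Attention-gated GRU cell sketch: the attention weight a_t stands in
    for the usual update gate, so strongly attended memories overwrite more
    of the internal state."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.cell = nn.GRUCell(input_size, hidden_size)

    def forward(self, x_t: torch.Tensor, h_prev: torch.Tensor, a_t: torch.Tensor):
        # x_t: (batch, input_size) memory representation at step t
        # h_prev: (batch, hidden_size) previous internal state
        # a_t: (batch, 1) attention weight assigned to this memory
        h_tilde = self.cell(x_t, h_prev)              # candidate state
        return a_t * h_tilde + (1.0 - a_t) * h_prev   # attention-gated update
```

Iterating this cell over the memory bank yields a summary state that preserves order, unlike plain soft attention; running it in both directions gives the BiAGRU variant.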
Deep Emotion Recognition in Textual Conversations: A Survey
While Emotion Recognition in Conversations (ERC) has seen tremendous advancement in the last few years, new applications and implementation scenarios present novel challenges and opportunities. These range from leveraging the conversational context and modelling speaker and emotion dynamics, to interpreting common-sense expressions, informal language, and sarcasm; addressing the challenges of real-time ERC; recognizing emotion causes; handling different taxonomies across datasets; multilingual ERC; and interpretability. This survey
starts by introducing ERC, elaborating on the challenges and opportunities
pertaining to this task. It proceeds with a description of the emotion
taxonomies and a variety of ERC benchmark datasets employing such taxonomies.
This is followed by descriptions of the most prominent works in ERC with
explanations of the Deep Learning architectures employed. Then, it provides
advisable ERC practices towards better frameworks, elaborating on methods to deal with subjectivity in annotations and modelling, and on methods to handle the typically unbalanced ERC datasets. Finally, it presents systematic review
tables comparing several works regarding the methods used and their
performance. The survey highlights the advantages of leveraging techniques to address unbalanced data, of exploring mixed emotions, and of incorporating annotation subjectivity in the learning phase.
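One common way to act on the survey's point about unbalanced ERC datasets is to weight the training loss inversely to class frequency. A minimal PyTorch sketch follows; the per-class utterance counts are made up for a hypothetical seven-emotion label set.

```python
import torch
import torch.nn as nn

# Hypothetical per-class utterance counts for a skewed ERC label set
# (neutral typically dominates); the numbers are illustrative only.
counts = torch.tensor([4710.0, 683.0, 271.0, 1109.0, 1205.0, 268.0, 1743.0])
weights = counts.sum() / (len(counts) * counts)   # inverse-frequency weights

criterion = nn.CrossEntropyLoss(weight=weights)   # rare emotions count more

logits = torch.randn(8, 7)                        # batch of 8 utterances
labels = torch.randint(0, 7, (8,))
loss = criterion(logits, labels)
```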
A Transformer-Based Model With Self-Distillation for Multimodal Emotion Recognition in Conversations
Emotion recognition in conversations (ERC), the task of recognizing the
emotion of each utterance in a conversation, is crucial for building empathetic
machines. Existing studies focus mainly on capturing context- and
speaker-sensitive dependencies on the textual modality but ignore the
significance of multimodal information. Unlike emotion recognition in textual conversations, multimodal ERC additionally requires capturing intra- and inter-modal interactions between utterances, learning weights between different modalities, and enhancing modal representations. In this paper, we
propose a transformer-based model with self-distillation (SDT) for the task.
The transformer-based model captures intra- and inter-modal interactions by
utilizing intra- and inter-modal transformers, and learns weights between
modalities dynamically by designing a hierarchical gated fusion strategy.
Furthermore, to learn more expressive modal representations, we treat soft
labels of the proposed model as extra training supervision. Specifically, we
introduce self-distillation to transfer knowledge of hard and soft labels from
the proposed model to each modality. Experiments on IEMOCAP and MELD datasets
demonstrate that SDT outperforms previous state-of-the-art baselines.
Comment: 13 pages, 10 figures. Accepted by IEEE Transactions on Multimedia (TMM)
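A hedged sketch of the self-distillation objective the abstract outlines: each unimodal branch learns from both the ground-truth hard labels and the fused model's soft labels. The temperature `tau` and mixing weight `alpha` are assumed hyperparameters, not values from the paper.

```python
import torch
import torch.nn.functional as F

def self_distillation_loss(fused_logits, modal_logits, labels, tau=2.0, alpha=0.5):
    """Hard labels supervise every head; the fused head's softened
    predictions additionally supervise each unimodal head (soft labels)."""
    soft_targets = F.softmax(fused_logits.detach() / tau, dim=-1)
    loss = F.cross_entropy(fused_logits, labels)              # fused head, hard labels
    for logits in modal_logits:                               # e.g. text/audio/visual heads
        hard = F.cross_entropy(logits, labels)                # hard-label term
        soft = F.kl_div(F.log_softmax(logits / tau, dim=-1),  # soft-label term
                        soft_targets, reduction="batchmean") * tau * tau
        loss = loss + alpha * hard + (1.0 - alpha) * soft
    return loss

# illustrative usage with random tensors
fused = torch.randn(4, 7)
modal = [torch.randn(4, 7) for _ in range(3)]   # e.g. text, audio, visual
y = torch.randint(0, 7, (4,))
print(self_distillation_loss(fused, modal, y))
```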
Multimodal Content Analysis for Effective Advertisements on YouTube
The rapid advances in e-commerce and Web 2.0 technologies have greatly
increased the impact of commercial advertisements on the general public. As a
key enabling technology, a multitude of recommender systems exist that analyze user features and browsing patterns to recommend appealing advertisements to users. In this work, we seek to identify the attributes that characterize an effective advertisement and to recommend a useful set of features to aid the design and production of commercial
advertisements. We analyze the temporal patterns from multimedia content of
advertisement videos including auditory, visual and textual components, and
study their individual roles and synergies in the success of an advertisement.
The objective of this work is thus to measure the effectiveness of an advertisement, and to recommend a useful set of features that helps advertisement designers make it more successful and appealing to users. Our proposed
framework employs the signal-processing technique of cross-modality feature learning, where data streams from different components are used to train separate neural network models and are then fused to learn a shared
representation. Subsequently, a neural network model trained on this joint
feature embedding representation is utilized as a classifier to predict
advertisement effectiveness. We validate our approach using subjective ratings
from a dedicated user study, the sentiment strength of online viewer comments,
and a viewer-opinion metric, the ratio of Likes to Views received by each advertisement from an online platform.
Comment: 11 pages, 5 figures, ICDM 201
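The cross-modality feature learning pipeline described above can be sketched as separate per-stream encoders whose outputs are fused into a shared representation feeding an effectiveness classifier. Dimensions, layer choices, and class names below are placeholders, not the authors' architecture.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Sketch of the described pipeline: separate encoders per stream
    (audio, visual, text) are concatenated into a shared representation,
    then a classifier predicts advertisement effectiveness."""

    def __init__(self, a_dim=128, v_dim=512, t_dim=300, joint=256, n_classes=2):
        super().__init__()
        self.audio = nn.Sequential(nn.Linear(a_dim, joint), nn.ReLU())
        self.visual = nn.Sequential(nn.Linear(v_dim, joint), nn.ReLU())
        self.text = nn.Sequential(nn.Linear(t_dim, joint), nn.ReLU())
        self.classifier = nn.Sequential(
            nn.Linear(3 * joint, joint), nn.ReLU(),
            nn.Linear(joint, n_classes))

    def forward(self, a, v, t):
        # fuse per-modality embeddings into a joint representation
        z = torch.cat([self.audio(a), self.visual(v), self.text(t)], dim=-1)
        return self.classifier(z)   # logits for effective / not effective

# illustrative forward pass with random per-modality features
model = CrossModalFusion()
logits = model(torch.randn(2, 128), torch.randn(2, 512), torch.randn(2, 300))
```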