Attention-based Multi-modal Sentiment Analysis and Emotion Detection in Conversation using RNN
With the availability of enormous quantities of multimodal data and its widespread applications, automatic sentiment analysis and emotion classification in conversation have become an interesting research topic in the research community. The interlocutor state, the contextual state between neighboring utterances, and multimodal fusion play important roles in multimodal sentiment analysis and emotion detection in conversation. In this article, a recurrent neural network (RNN) based method is developed to capture the interlocutor state and the contextual state between utterances. A pair-wise attention mechanism is used to understand the relationship between the modalities and their importance before fusion. First, the modalities are fused two at a time; finally, all the modalities are fused to form the trimodal representation feature vector. The experiments are conducted on three standard datasets: IEMOCAP, CMU-MOSEI, and CMU-MOSI. The proposed model is evaluated using two metrics, accuracy and F1-score, and the results demonstrate that it performs better than the standard baselines.
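As a rough illustration of the pair-wise attention idea described above, here is a minimal sketch of fusing two modality feature vectors with learned per-modality weights. The layer sizes, class name, and scoring function are assumptions for illustration, not the authors' implementation:

```python
import torch
import torch.nn as nn

class PairwiseAttentionFusion(nn.Module):
    """Sketch of pair-wise attention over two modality feature vectors.

    Scores the two modalities against each other, softmax-normalizes the
    scores, and returns a weighted bimodal representation.
    """
    def __init__(self, dim_a: int, dim_b: int, dim_out: int):
        super().__init__()
        self.proj_a = nn.Linear(dim_a, dim_out)   # project modality A
        self.proj_b = nn.Linear(dim_b, dim_out)   # project modality B
        self.score = nn.Linear(2 * dim_out, 2)    # one importance score per modality

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        a = torch.tanh(self.proj_a(feat_a))        # (batch, dim_out)
        b = torch.tanh(self.proj_b(feat_b))        # (batch, dim_out)
        weights = torch.softmax(self.score(torch.cat([a, b], dim=-1)), dim=-1)
        # Weighted sum of the two projections -> one bimodal vector.
        return weights[:, 0:1] * a + weights[:, 1:2] * b

# Hypothetical usage: fuse text (300-d) and audio (74-d) features.
fusion = PairwiseAttentionFusion(dim_a=300, dim_b=74, dim_out=128)
bimodal = fusion(torch.randn(8, 300), torch.randn(8, 74))  # (8, 128)
```

In this reading, the three bimodal vectors (text-audio, text-video, audio-video) would then be combined to form the trimodal representation the abstract mentions.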
Multimodal Sentiment Analysis: Addressing Key Issues and Setting up the Baselines
We compile baselines, along with dataset splits, for multimodal sentiment analysis. In this paper, we explore three different deep-learning-based architectures for multimodal sentiment classification, each improving upon the previous. Further, we evaluate these architectures on multiple datasets with a fixed train/test partition. We also discuss some major issues frequently ignored in multimodal sentiment analysis research, e.g., the role of speaker-exclusive models, the importance of different modalities, and generalizability. This framework illustrates the different facets of analysis to be considered while performing multimodal sentiment analysis and, hence, serves as a new benchmark for future research in this emerging field.
Comment: IEEE Intelligent Systems. arXiv admin note: substantial text overlap with arXiv:1707.0953
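One issue the abstract highlights, speaker-exclusive evaluation, amounts to ensuring that no speaker appears in both the train and test partitions. A minimal sketch of such a split follows; the field names and data layout are assumptions for illustration only:

```python
def speaker_exclusive_split(utterances, test_speakers):
    """Partition utterances so no speaker appears in both sets.

    `utterances` is a list of dicts with at least a 'speaker' key;
    `test_speakers` is the set of speaker IDs held out for testing.
    """
    train, test = [], []
    for utt in utterances:
        (test if utt["speaker"] in test_speakers else train).append(utt)
    return train, test

# Example: hold out one speaker entirely for the test partition.
data = [{"speaker": "s1", "text": "..."},
        {"speaker": "s2", "text": "..."},
        {"speaker": "s3", "text": "..."}]
train, test = speaker_exclusive_split(data, test_speakers={"s3"})
```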
Multi-attention Recurrent Network for Human Communication Comprehension
Human face-to-face communication is a complex multimodal signal. We use words (language modality), gestures (vision modality), and changes in tone (acoustic modality) to convey our intentions. Humans easily process and understand face-to-face communication; however, comprehending this form of communication remains a significant challenge for Artificial Intelligence (AI). AI must understand each modality and the interactions between them that shape human communication. In this paper, we present a novel neural architecture for understanding human communication called the Multi-attention Recurrent Network (MARN). The main strength of our model comes from discovering interactions between modalities through time using a neural component called the Multi-attention Block (MAB) and storing them in the hybrid memory of a recurrent component called the Long-short Term Hybrid Memory (LSTHM). We perform extensive comparisons on six publicly available datasets for multimodal sentiment analysis, speaker trait recognition, and emotion recognition. MARN shows state-of-the-art performance on all the datasets.
Comment: AAAI 2018 Oral Presentation
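A rough sketch of the multi-attention idea, where several attention distributions over the concatenated per-modality hidden states each highlight a different cross-modal interaction at a time step. The dimensions, head count, and update rule here are assumptions for illustration, not the published MARN equations:

```python
import torch
import torch.nn as nn

class MultiAttentionBlock(nn.Module):
    """Sketch of a Multi-attention Block: K softmax attentions over the
    concatenated modality hidden states, one interaction per attention."""
    def __init__(self, hidden_dim: int, num_attentions: int):
        super().__init__()
        self.attn = nn.Linear(hidden_dim, num_attentions * hidden_dim)
        self.num_attentions = num_attentions
        self.hidden_dim = hidden_dim

    def forward(self, h_cat: torch.Tensor) -> torch.Tensor:
        # h_cat: (batch, hidden_dim), concatenation of per-modality states.
        scores = self.attn(h_cat).view(-1, self.num_attentions, self.hidden_dim)
        weights = torch.softmax(scores, dim=-1)   # one distribution per attention
        attended = weights * h_cat.unsqueeze(1)   # (batch, K, hidden_dim)
        return attended.reshape(-1, self.num_attentions * self.hidden_dim)

# Hypothetical usage at one time step of the recurrence.
mab = MultiAttentionBlock(hidden_dim=128, num_attentions=4)
cross_modal_code = mab(torch.randn(8, 128))  # (8, 512)
```

In the paper's design, this attended code is then written into the hybrid memory of each modality's LSTHM cell at every time step; the compression and memory-update steps are omitted in this sketch.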