AB-GRU: An attention-based bidirectional GRU model for multimodal sentiment fusion and analysis
Multimodal sentiment analysis is an important area of artificial intelligence. It integrates multiple modalities such as text, audio, video and image into a compact multimodal representation and extracts sentiment information from that representation. In this paper, we improve two modules, feature extraction and feature fusion, to enhance multimodal sentiment analysis, and we propose an attention-based two-layer bidirectional GRU (AB-GRU, gated recurrent unit) multimodal sentiment analysis method. For the feature extraction module, we use a two-layer bidirectional GRU network and connect two layers of attention mechanisms to enhance the extraction of important information. The feature fusion module uses low-rank multimodal fusion, which reduces the dimensionality of the multimodal data and improves computational speed and accuracy. The experimental results demonstrate that the AB-GRU model achieves 80.9% accuracy on the CMU-MOSI dataset, which exceeds models of the same type by at least 2.5%. The AB-GRU model also shows strong generalization capability and solid robustness.
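The abstract above does not spell out how low-rank fusion works, so the following is only a minimal sketch of the general idea, not the AB-GRU authors' implementation. It assumes the common formulation in which each modality vector is padded with a constant 1, projected by one factor matrix per rank index, multiplied element-wise across modalities, and summed over the rank. The names `low_rank_fusion`, `full_tensor_fusion` and `make_factors` are hypothetical, and the dimensions are arbitrary. The second function builds the full weight tensor explicitly to show that the low-rank form gives the same result without ever materializing that tensor.

```python
import random
from itertools import product


def append_one(z):
    """Pad a modality vector with a constant 1 so unimodal terms survive the product."""
    return list(z) + [1.0]


def make_factors(dims, rank, d_out, seed=0):
    """Hypothetical helper: random factors[m][i] of shape (dims[m] + 1) x d_out."""
    rng = random.Random(seed)
    return [[[[rng.uniform(-1.0, 1.0) for _ in range(d_out)]
              for _ in range(d + 1)]
             for _ in range(rank)]
            for d in dims]


def low_rank_fusion(inputs, factors):
    """Sum over rank indices of the element-wise product of per-modality projections."""
    rank = len(factors[0])
    d_out = len(factors[0][0][0])
    fused = [0.0] * d_out
    for i in range(rank):
        term = [1.0] * d_out
        for z, f in zip(inputs, factors):
            z1 = append_one(z)
            # Project the padded vector with this modality's rank-i factor.
            proj = [sum(z1[k] * f[i][k][o] for k in range(len(z1)))
                    for o in range(d_out)]
            term = [t * p for t, p in zip(term, proj)]
        fused = [a + t for a, t in zip(fused, term)]
    return fused


def full_tensor_fusion(inputs, factors):
    """Reference: contract the full weight tensor with the outer product of the inputs."""
    rank = len(factors[0])
    d_out = len(factors[0][0][0])
    padded = [append_one(z) for z in inputs]
    fused = [0.0] * d_out
    for idx in product(*(range(len(z)) for z in padded)):
        z_prod = 1.0
        for m, k in enumerate(idx):
            z_prod *= padded[m][k]
        for o in range(d_out):
            # Weight tensor entry rebuilt from the rank-1 factors.
            w = 0.0
            for i in range(rank):
                w_i = 1.0
                for m, k in enumerate(idx):
                    w_i *= factors[m][i][k][o]
                w += w_i
            fused[o] += w * z_prod
    return fused


if __name__ == "__main__":
    # Toy text, audio and video feature vectors (sizes chosen arbitrarily).
    feats = [[0.2, -0.1, 0.4], [0.5, 0.3], [0.1, 0.9]]
    factors = make_factors([3, 2, 2], rank=4, d_out=5)
    print(low_rank_fusion(feats, factors))
```

The low-rank path costs roughly rank x d_out x (sum of input sizes) multiplications, while the full tensor path grows with the product of the input sizes, which is where the dimensionality reduction comes from.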
Targeted aspect based multimodal sentiment analysis: an attention capsule extraction and multi-head fusion network
Multimodal sentiment analysis has demonstrated its significance in a
variety of domains. For the purpose of sentiment analysis, different aspects of
distinguishing modalities, which correspond to one target, are processed and
analyzed. In this work, we propose the targeted aspect-based multimodal
sentiment analysis (TABMSA) for the first time. Furthermore, an attention
capsule extraction and multi-head fusion network (EF-Net) on the task of TABMSA
is devised. The multi-head attention (MHA) based network and the ResNet-152 are
employed to deal with texts and images, respectively. The integration of MHA
and capsule network aims to capture the interaction among the multimodal
inputs. In addition to the targeted aspect, the information from the context
and the image is also incorporated to capture the sentiment conveyed. We
evaluate the proposed model on two manually annotated datasets. The experimental
results demonstrate the effectiveness of our proposed model for this new task.
A Multi-modal Approach to Fine-grained Opinion Mining on Video Reviews
Despite the recent advances in opinion mining for written reviews, few works
have tackled the problem on other sources of reviews. In light of this issue,
we propose a multi-modal approach for mining fine-grained opinions from video
reviews that is able to determine the aspects of the item under review that are
being discussed and the sentiment orientation towards them. Our approach works
at the sentence level without the need for time annotations and uses features
derived from the audio, video and language transcriptions of its contents. We
evaluate our approach on two datasets and show that leveraging the video and
audio modalities consistently provides increased performance over text-only
baselines, providing evidence these extra modalities are key in better
understanding video reviews.
Comment: Second Grand Challenge and Workshop on Multimodal Language ACL 202
UR-FUNNY: A Multimodal Language Dataset for Understanding Humor
Humor is a unique and creative communicative behavior displayed during social
interactions. It is produced in a multimodal manner, through the usage of words
(text), gestures (vision) and prosodic cues (acoustic). Understanding humor
from these three modalities falls within the boundaries of multimodal language, a
recent research trend in natural language processing that models natural
language as it happens in face-to-face communication. Although humor detection
is an established research area in NLP, in a multimodal context it is an
understudied area. This paper presents a diverse multimodal dataset, called
UR-FUNNY, to open the door to understanding multimodal language used in
expressing humor. The dataset and accompanying studies present a framework for
multimodal humor detection for the natural language processing community.
UR-FUNNY is publicly available for research.