Targeted aspect-based multimodal sentiment analysis: an attention capsule extraction and multi-head fusion network
Multimodal sentiment analysis has gained significance in a variety of domains. For sentiment analysis, features from distinct modalities that correspond to one target are processed and analyzed. In this work, we propose targeted aspect-based multimodal sentiment analysis (TABMSA) for the first time. Furthermore, we devise an attention capsule extraction and multi-head fusion network (EF-Net) for the TABMSA task. A multi-head attention (MHA) based network and ResNet-152 are employed to process texts and images, respectively. The integration of MHA and the capsule network aims to capture the interaction among the multimodal inputs. In addition to the targeted aspect, information from the context and the image is also incorporated for sentiment prediction. We evaluate the proposed model on two manually annotated datasets. The experimental results demonstrate the effectiveness of our proposed model for this new task.
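The cross-modal interaction this abstract describes (text features attending over image-region features via multi-head attention) can be sketched as follows. This is a minimal NumPy illustration, not the authors' EF-Net implementation: the feature dimensions, head count, and the omission of learned projection matrices and the capsule component are all simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(query, key, value, num_heads):
    """Scaled dot-product attention split across num_heads heads.

    query: (Lq, d); key/value: (Lk, d); d must be divisible by num_heads.
    Real models apply learned linear projections to Q, K, V; they are
    omitted here for brevity.
    """
    Lq, d = query.shape
    Lk = key.shape[0]
    dh = d // num_heads
    # Reshape into (heads, length, head_dim) so each head attends independently.
    q = query.reshape(Lq, num_heads, dh).transpose(1, 0, 2)
    k = key.reshape(Lk, num_heads, dh).transpose(1, 0, 2)
    v = value.reshape(Lk, num_heads, dh).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)   # (heads, Lq, Lk)
    weights = softmax(scores, axis=-1)
    out = weights @ v                                  # (heads, Lq, dh)
    # Concatenate the heads back into a single feature dimension.
    return out.transpose(1, 0, 2).reshape(Lq, d)

# Cross-modal fusion: text tokens (queries) attend over image-region
# features (keys/values), e.g. region features from a CNN such as ResNet.
rng = np.random.default_rng(0)
text_feats = rng.normal(size=(6, 64))    # hypothetical: 6 tokens, 64-dim
image_feats = rng.normal(size=(4, 64))   # hypothetical: 4 image regions
fused = multi_head_attention(text_feats, image_feats, image_feats, num_heads=8)
print(fused.shape)  # (6, 64): one fused vector per text token
```

Each text token's output is a mixture of image-region features weighted by relevance, which is one common way such interaction modules align the two modalities.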
Multimodal Sentiment Analysis Based on Deep Learning: Recent Progress
Multimodal sentiment analysis is an important research topic in the field of NLP, aiming to analyze speakers' sentiment tendencies through features extracted from the textual, visual, and acoustic modalities. Its main methods are based on machine learning and deep learning. Machine learning-based methods rely heavily on labeled data, whereas deep learning-based methods can overcome this shortcoming and capture the in-depth semantic information and modal characteristics of the data, as well as the interactive information between multimodal data. In this paper, we survey the deep learning-based methods, including the fusion of text and image and the fusion of text, image, audio, and video. Specifically, we discuss the main problems of these methods and future directions. Finally, we review work on multimodal sentiment analysis in conversation.
UR-FUNNY: A Multimodal Language Dataset for Understanding Humor
Humor is a unique and creative communicative behavior displayed during social interactions. It is produced in a multimodal manner, through the use of words (text), gestures (vision), and prosodic cues (acoustic). Understanding humor from these three modalities falls within the boundaries of multimodal language, a recent research trend in natural language processing that models natural language as it happens in face-to-face communication. Although humor detection is an established research area in NLP, it remains understudied in a multimodal context. This paper presents a diverse multimodal dataset, called UR-FUNNY, to open the door to understanding the multimodal language used in expressing humor. The dataset and accompanying studies present a framework for multimodal humor detection for the natural language processing community. UR-FUNNY is publicly available for research.
Entity-sensitive attention and fusion network for entity-level multimodal sentiment classification
National Research Foundation (NRF) Singapore
Syntax-aware hybrid prompt model for few-shot multi-modal sentiment analysis
Multimodal Sentiment Analysis (MSA) is a popular topic in natural language processing, at both the sentence and aspect level. However, almost all existing approaches require large labeled datasets, which incur substantial costs in time and resources. It is therefore practical to explore few-shot sentiment analysis across modalities. Previous works generally operate on the textual modality using prompt-based methods of mainly two types: hand-crafted prompts and learnable prompts. The existing approach to the few-shot multi-modal sentiment analysis task has utilized both methods, separately. We further design a hybrid pattern that combines one or more fixed hand-crafted prompts with learnable prompts, and utilizes attention mechanisms to optimize the prompt encoder. Experiments on both sentence-level and aspect-level datasets show that our model significantly outperforms existing approaches.
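The hybrid pattern described above (fixed hand-crafted prompt embeddings concatenated with learnable prompt vectors, re-weighted by an attention-based prompt encoder) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: the embedding dimension, prompt lengths, and the single self-attention layer standing in for the prompt encoder are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
embed_dim = 32  # hypothetical embedding size

# Hand-crafted prompt: fixed embeddings for a template such as
# "The sentiment of [aspect] is [MASK]" (7 tokens here, frozen in training).
handcrafted = rng.normal(size=(7, embed_dim))

# Learnable prompt: free vectors that would be updated by backpropagation
# in a real model; here they are just randomly initialized.
learnable = rng.normal(size=(4, embed_dim))

# Hybrid prompt: concatenate the fixed and learnable parts, then let a
# simple self-attention layer (standing in for the prompt encoder)
# re-weight the prompt tokens against each other.
prompt = np.concatenate([handcrafted, learnable], axis=0)  # (11, embed_dim)

scores = prompt @ prompt.T / np.sqrt(embed_dim)            # (11, 11)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)             # rows sum to 1
encoded_prompt = weights @ prompt                          # (11, embed_dim)

print(encoded_prompt.shape)  # (11, 32): fed to the backbone with the input
```

The point of the hybrid design is that the hand-crafted part injects task knowledge while the learnable part adapts to the few available examples; the attention step lets the two interact before the prompt reaches the language model.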