A Deep Multi-Level Attentive network for Multimodal Sentiment Analysis
Multimodal sentiment analysis has attracted increasing attention and has broad
application prospects. Existing methods focus on a single modality and fail to
capture the information that social media content conveys across multiple
modalities. Moreover, in multimodal learning, most works simply combine the two
modalities without exploring the complex correlations between them, which
results in unsatisfactory performance on multimodal sentiment classification.
Motivated by these observations, we propose a Deep Multi-Level Attentive
network, which exploits the correlation between the image and text modalities
to improve multimodal learning. Specifically, we generate a bi-attentive visual
map along the spatial and channel dimensions to magnify the representation
power of the CNN. We then model the correlation between image regions and word
semantics by applying semantic attention to extract the textual features
related to the bi-attentive visual features. Finally, self-attention is
employed to automatically select the sentiment-rich multimodal features for
classification. We conduct extensive evaluations on four real-world datasets,
namely MVSA-Single, MVSA-Multiple, Flickr, and Getty Images, which verify the
superiority of our method.
Comment: 11 pages, 7 figures