Search CORE

1,131 research outputs found

A Deep Multi-Level Attentive network for Multimodal Sentiment Analysis

Author: Vishwakarma Dinesh Kumar
Yadav Ashima
Publication date: 15/12/2020

Multimodal sentiment analysis has attracted increasing attention and has broad application prospects. Existing methods focus on a single modality and therefore fail to capture social media content that spans multiple modalities. Moreover, in multimodal learning, most works have simply combined the two modalities without exploring the complicated correlations between them, which has resulted in unsatisfactory performance for multimodal sentiment classification. Motivated by this, we propose a Deep Multi-Level Attentive network, which exploits the correlation between image and text modalities to improve multimodal learning. Specifically, we generate a bi-attentive visual map along the spatial and channel dimensions to enhance the CNN's representation power. We then model the correlation between image regions and word semantics by applying semantic attention to extract the textual features related to the bi-attentive visual features. Finally, self-attention is employed to automatically select the sentiment-rich multimodal features for classification. We conduct extensive evaluations on four real-world datasets, namely MVSA-Single, MVSA-Multiple, Flickr, and Getty Images, which verify the superiority of our method. Comment: 11 pages, 7 figures.
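The idea of attending along both the channel and the spatial dimensions of a CNN feature map can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the attention weights are computed, so the parameter-free "average pooling followed by a sigmoid" gate used here, and the function names `channel_attention`, `spatial_attention`, and `bi_attentive_map`, are assumptions made purely for illustration. A trained model would typically learn these weights.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap):
    """Scale each channel by a gate derived from its global average.

    fmap is a list of C channels, each an H x W list of lists.
    Assumption: a fixed sigmoid gate stands in for learned weights.
    """
    gated = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        gate = sigmoid(sum(values) / len(values))
        gated.append([[v * gate for v in row] for row in channel])
    return gated


def spatial_attention(fmap):
    """Scale each spatial location by a gate derived from its cross-channel mean."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    mask = [
        [sigmoid(sum(fmap[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]
    return [
        [[fmap[k][i][j] * mask[i][j] for j in range(w)] for i in range(h)]
        for k in range(c)
    ]


def bi_attentive_map(fmap):
    """Apply channel attention, then spatial attention (order is an assumption)."""
    return spatial_attention(channel_attention(fmap))
```

Because both gates lie strictly between 0 and 1, the output keeps the shape of the input while shrinking every activation, with weakly activated channels and locations shrunk more.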

arXiv.org e-Print Archive

Attention-based Multi-modal Sentiment Analysis and Emotion Detection in Conversation using RNN

Author: Huddar Mahesh G.
Rajpurohit Vijay S.
Sannakki Sanjeev S.
Publication venue: Universidad Internacional de La Rioja
Publication date: 28/04/2022

With the availability of an enormous quantity of multimodal data and its widespread applications, automatic sentiment analysis and emotion classification in conversation has become an active topic in the research community. The interlocutor state, the context state between neighboring utterances, and multimodal fusion all play an important role in multimodal sentiment analysis and emotion detection in conversation. In this article, a recurrent neural network (RNN) based method is developed to capture the interlocutor state and the contextual state between utterances. A pair-wise attention mechanism is used to understand the relationship between the modalities and their importance before fusion. First, the modalities are fused two at a time; finally, all the modalities are fused to form the trimodal representation feature vector. The experiments are conducted on three standard datasets: IEMOCAP, CMU-MOSEI, and CMU-MOSI. The proposed model is evaluated using two metrics, accuracy and F1-score, and the results demonstrate that it performs better than the standard baselines.
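The "fuse pairs first, then combine into a trimodal vector" scheme can be sketched as follows. The abstract does not give the attention formula, so the scoring rule here (a softmax over each modality's dot product with the pair's mean vector) and the use of concatenation for the final step are assumptions, as are the function names `fuse_pair` and `fuse_trimodal`. All three modality vectors are assumed to share one dimension.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def fuse_pair(x, y):
    """Weight two modality vectors by attention scores and sum them.

    Assumption: each modality is scored against the mean of the pair.
    """
    mean = [(a + b) / 2 for a, b in zip(x, y)]
    weight_x, weight_y = softmax([dot(x, mean), dot(y, mean)])
    return [weight_x * a + weight_y * b for a, b in zip(x, y)]


def fuse_trimodal(text, audio, video):
    """Concatenate the three bimodal fusions into one trimodal vector."""
    bimodal = [
        fuse_pair(text, audio),
        fuse_pair(text, video),
        fuse_pair(audio, video),
    ]
    return [value for vector in bimodal for value in vector]
```

For input vectors of dimension d, `fuse_trimodal` returns a vector of dimension 3d that a downstream classifier could consume.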

Re-UNIR

Targeted aspect based multimodal sentiment analysis: an attention capsule extraction and multi-head fusion network

Author: Cai Shaohua
Gu Donghong
Song Zhengxin
Wang Hua
Wang Jiaqian
Xiao Luwei
Yang Chi
Zhao Haoliang
Publication date: 13/03/2021

Multimodal sentiment analysis has demonstrated its significance in a variety of domains. For the purpose of sentiment analysis, different aspects of distinct modalities that correspond to one target are processed and analyzed. In this work, we propose the task of targeted aspect-based multimodal sentiment analysis (TABMSA) for the first time. Furthermore, we devise an attention capsule extraction and multi-head fusion network (EF-Net) for the TABMSA task. A multi-head attention (MHA) based network and ResNet-152 are employed to deal with texts and images, respectively. The integration of MHA and a capsule network aims to capture the interaction among the multimodal inputs. In addition to the targeted aspect, information from the context and the image is also incorporated to determine the sentiment conveyed. We evaluate the proposed model on two manually annotated datasets. The experimental results demonstrate the effectiveness of our proposed model for this new task.
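Multi-head attention itself is a standard mechanism, and a stripped-down version shows how a target-aspect query could attend over context vectors. This sketch is not EF-Net: it omits the learned projection matrices (identity projections are assumed), the capsule network, and the image branch, and the names `attend` and `multi_head_attend` are illustrative only.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def multi_head_attend(query, keys, values, num_heads):
    """Split every vector into num_heads slices, attend per slice, concatenate.

    Assumption: no learned projections; each head sees a raw slice.
    """
    dim = len(query)
    if dim % num_heads != 0:
        raise ValueError("vector dimension must be divisible by num_heads")
    size = dim // num_heads
    output = []
    for head in range(num_heads):
        lo, hi = head * size, (head + 1) * size
        output.extend(
            attend(query[lo:hi], [k[lo:hi] for k in keys], [v[lo:hi] for v in values])
        )
    return output
```

Here `query` would play the role of the targeted aspect and `keys`/`values` the context word vectors; each head produces its own weighting of the context before the slices are concatenated.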

arXiv.org e-Print Archive

Victoria University Eprints Repository

Multimodal Sentiment Analysis Based on Deep Learning: Recent Progress

Author: Fan Yifan
Lin Pingping
Liu Jie
Luo Xudong
Publication venue: AIS Electronic Library (AISeL)
Publication date: 03/12/2021

Multimodal sentiment analysis is an important research topic in the field of NLP, aiming to analyze speakers' sentiment tendencies through features extracted from textual, visual, and acoustic modalities. Its main methods are based on machine learning and deep learning. Machine learning-based methods rely heavily on labeled data, whereas deep learning-based methods can overcome this shortcoming and capture the in-depth semantic information and modal characteristics of the data, as well as the interactive information between multimodal data. In this paper, we survey the deep learning-based methods, including fusion of text and image and fusion of text, image, audio, and video. Specifically, we discuss the main problems of these methods and the future directions. Finally, we review the work on multimodal sentiment analysis in conversation.

AIS Electronic Library (AISeL)

Fuzzy Layered Convolution Neutral Network for Feature Level Fusion Based On Multimodal Sentiment Classification

Author: Ayodele Onasoga Olukayode
Harun Nor Hazlyna
Yusoff Nooraini
Publication venue: Penerbit UTHM
Publication date: 12/01/2023

Multimodal sentiment analysis (MSA) is one of the core research topics of natural language processing (NLP). MSA has become a challenge for scholars and is equally complicated for a machine to comprehend. MSA involves learning opinions, emotions, and attitudes expressed in an audio-visual format. In other words, it is necessary to use such diverse modalities to obtain opinions and identify emotions. This can be achieved via modality data fusion, such as feature fusion. For fusing such diverse modalities while obtaining high performance, a typical machine learning approach is deep learning (DL), particularly the convolutional neural network (CNN), which has the capacity to handle tasks of great intricacy and difficulty. In this paper, we present a CNN architecture with a layer integrated via fuzzy methodologies for MSA, an approach yet to be explored for improving the accuracy of CNNs on diverse inputs. Experiments conducted on a benchmark multimodal dataset, MOSI, obtain 37.5% and 81% accuracy on seven-class and binary classification, respectively, an improvement over the typical CNN, which achieved 28.9% and 78%, respectively.
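The size of the reported improvement can be worked out directly from the four accuracy figures quoted in the abstract. The helper name `accuracy_gains` is hypothetical; only the numbers come from the source.

```python
def accuracy_gains(proposed, baseline):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = proposed - baseline
    relative = 100.0 * absolute / baseline
    return round(absolute, 1), round(relative, 1)


# Accuracies reported for MOSI: proposed fuzzy-layered CNN vs. typical CNN.
seven_class = accuracy_gains(37.5, 28.9)
binary = accuracy_gains(81.0, 78.0)

print("seven-class:", seven_class)
print("binary:", binary)
```

The seven-class gain is 8.6 percentage points (about 29.8% relative to the baseline), while the binary gain is 3.0 percentage points (about 3.8% relative), so the fuzzy layer appears to help most on the harder fine-grained task.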

Journals of Universiti Tun Hussein Onn Malaysia (UTHM)

A Quantum-like Multimodal Network Framework for Modeling Interaction Dynamics in Multiparty Conversational Sentiment Analysis

Author: Li Xiang
Rong Lu
Song Dawei
Wang Bo
Wang Panpan
Yu Guangliang
Zhang Peng
Zhang Yazhou
Publication venue: Elsevier BV
Publication date: 01/10/2020

Sentiment analysis in conversations is an emerging yet challenging artificial intelligence (AI) task. It aims to discover the affective states and emotional changes of speakers involved in a conversation on the basis of their opinions, which are carried by different modalities of information (e.g., a video associated with a transcript). There exists a wealth of intra- and inter-utterance interaction information that affects the emotions of speakers in a complex and dynamic way. How to model these complicated interactions accurately and comprehensively is the key problem of the field. To address this problem, we propose a novel and comprehensive framework for multimodal sentiment analysis in conversations, called a quantum-like multimodal network (QMN), which leverages the mathematical formalism of quantum theory (QT) and a long short-term memory (LSTM) network. Specifically, the QMN framework consists of a multimodal decision fusion approach inspired by quantum interference theory to capture the interactions within each utterance (i.e., the correlations between different modalities) and a strong-weak influence model inspired by quantum measurement theory to model the interactions between adjacent utterances (i.e., how one speaker influences another). Extensive experiments are conducted on two widely used conversational sentiment datasets: the MELD and IEMOCAP datasets. The experimental results show that our approach significantly outperforms a wide range of baselines and state-of-the-art models.
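To give a feel for what "decision fusion inspired by quantum interference" can mean, the sketch below applies the generic two-path interference formula to the class probabilities of two unimodal classifiers: a classical weighted sum plus a cross term controlled by a phase angle. This is an assumption about the general shape of such a rule, not the QMN formulation, which the abstract does not spell out; `alpha`, `theta`, `interference_score`, and `fuse_decisions` are illustrative names.

```python
import math


def interference_score(p1, p2, alpha, theta):
    """Combine two probabilities with an interference cross term.

    alpha (between 0 and 1) weights the first modality; beta is chosen so that
    alpha**2 + beta**2 == 1. theta is the phase angle: cos(theta) > 0
    reinforces agreement, cos(theta) < 0 suppresses it.
    """
    beta = math.sqrt(max(0.0, 1.0 - alpha ** 2))
    classical = alpha ** 2 * p1 + beta ** 2 * p2
    cross = 2.0 * alpha * beta * math.sqrt(p1 * p2) * math.cos(theta)
    return classical + cross


def fuse_decisions(probs_a, probs_b, alpha, theta):
    """Fuse two per-class probability lists and renormalize to sum to 1."""
    scores = [
        interference_score(a, b, alpha, theta) for a, b in zip(probs_a, probs_b)
    ]
    total = sum(scores)
    if total <= 0.0:
        # Fully destructive interference: fall back to a uniform distribution.
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]
```

With `theta = math.pi / 2` the cross term vanishes and the rule reduces to an ordinary weighted average of the two classifiers, which makes the interference term's contribution easy to isolate when experimenting.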

Open Research Online (The Open University)