Target-oriented Sentiment Classification with Sequential Cross-modal Semantic Graph
Multi-modal aspect-based sentiment classification (MABSC) is the task of
classifying the sentiment of a target entity mentioned in a sentence and an
image. However, previous methods failed to account for the fine-grained
semantic association between the image and the text, which resulted in limited
identification of fine-grained image aspects and opinions. To address these
limitations, in this paper we propose a new approach called SeqCSG, which
enhances the encoder-decoder sentiment classification framework using
sequential cross-modal semantic graphs. SeqCSG utilizes image captions and
scene graphs to extract both global and local fine-grained image information
and considers them as elements of the cross-modal semantic graph along with
tokens from tweets. The sequential cross-modal semantic graph is represented as
a sequence with a multi-modal adjacency matrix indicating relationships between
elements. Experimental results show that the approach outperforms existing
methods and achieves state-of-the-art performance on two standard datasets.
Further analysis has demonstrated that the model can implicitly learn the
correlation between fine-grained information of the image and the text with the
given target. Our code is available at https://github.com/zjukg/SeqCSG.
Comment: ICANN 2023
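To make the sequence-plus-adjacency idea concrete, here is a minimal PyTorch sketch in which the flattened graph elements (tweet tokens, caption tokens, scene-graph elements) attend to each other only where the multimodal adjacency matrix permits. All module names, shapes, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SeqCSGEncoder(nn.Module):
    """Sketch: encode a flat element sequence under an adjacency-based attention mask."""
    def __init__(self, vocab_size=30522, d_model=256, n_heads=8, n_layers=2):
        super().__init__()
        self.n_heads = n_heads
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, element_ids, adjacency):
        # element_ids: (B, L) one flat sequence of tweet tokens, caption
        #              tokens, and scene-graph elements (hypothetical layout)
        # adjacency:   (B, L, L) multimodal adjacency matrix, 1 = related
        x = self.embed(element_ids)
        mask = (adjacency == 0)                              # True = attention blocked
        mask = mask.repeat_interleave(self.n_heads, dim=0)   # (B*heads, L, L)
        return self.encoder(x, mask=mask)

# toy usage
ids = torch.randint(0, 30522, (2, 10))
adj = torch.ones(2, 10, 10, dtype=torch.long)
print(SeqCSGEncoder()(ids, adj).shape)  # torch.Size([2, 10, 256])
```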
Commonsense knowledge enhanced memory network for stance classification
Stance classification aims to identify the attitude expressed in a text toward a given target as favorable, negative, or unrelated. Existing models for stance classification leverage only textual representations and ignore commonsense knowledge. To better incorporate commonsense knowledge into stance classification, we propose a novel model, the commonsense knowledge enhanced memory network, which jointly represents the textual and commonsense knowledge representations of the given target and text. The textual memory module in our model treats the textual representations as memory vectors and uses an attention mechanism to highlight the important parts. The commonsense knowledge memory module jointly leverages the entity and relation embeddings learned by the TransE model to take full advantage of the constraints of the knowledge graph. Experimental results on the SemEval dataset show that combining the commonsense knowledge memory with the textual memory improves stance classification.
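As an illustration of the two memory modules described above, the sketch below attends over textual memory vectors and over pre-trained TransE entity/relation vectors with the target as the query, then classifies the concatenated summaries. The shapes, names, and scoring function are assumptions for exposition, not the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryModule(nn.Module):
    """Attention over a set of memory vectors, conditioned on a query."""
    def __init__(self, d):
        super().__init__()
        self.score = nn.Linear(2 * d, 1)

    def forward(self, query, memory):
        # query: (B, d); memory: (B, M, d)
        q = query.unsqueeze(1).expand(-1, memory.size(1), -1)
        attn = F.softmax(self.score(torch.cat([q, memory], -1)).squeeze(-1), -1)
        return torch.bmm(attn.unsqueeze(1), memory).squeeze(1)  # (B, d)

class CommonsenseMemoryStance(nn.Module):
    def __init__(self, d=100, n_classes=3):
        super().__init__()
        self.text_mem = MemoryModule(d)  # memory = token representations
        self.kg_mem = MemoryModule(d)    # memory = TransE entity+relation vectors
        self.cls = nn.Linear(2 * d, n_classes)

    def forward(self, target_vec, text_tokens, kg_vectors):
        t = self.text_mem(target_vec, text_tokens)
        k = self.kg_mem(target_vec, kg_vectors)
        return self.cls(torch.cat([t, k], -1))

# toy usage: batch of 2, 12 text tokens, 5 retrieved KG facts, d=100
model = CommonsenseMemoryStance()
logits = model(torch.randn(2, 100), torch.randn(2, 12, 100), torch.randn(2, 5, 100))
print(logits.shape)  # torch.Size([2, 3])
```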
AoM: Detecting Aspect-oriented Information for Multimodal Aspect-Based Sentiment Analysis
Multimodal aspect-based sentiment analysis (MABSA) aims to extract aspects
from text-image pairs and recognize their sentiments. Existing methods make
great efforts to align the whole image to corresponding aspects. However,
different regions of the image may relate to different aspects in the same
sentence, and coarsely establishing image-aspect alignment will introduce noise
to aspect-based sentiment analysis (i.e., visual noise). Besides, the sentiment
of a specific aspect can also be disturbed by descriptions of other aspects
(i.e., textual noise). Considering both kinds of noise, this paper
proposes an Aspect-oriented Method (AoM) to detect aspect-relevant semantic and
sentiment information. Specifically, an aspect-aware attention module is
designed to simultaneously select textual tokens and image blocks that are
semantically related to the aspects. To accurately aggregate sentiment
information, we explicitly introduce sentiment embedding into AoM, and use a
graph convolutional network to model the vision-text and text-text interaction.
Extensive experiments demonstrate the superiority of AoM to existing methods.
The source code is publicly released at https://github.com/SilyRab/AoM.
Comment: Findings of ACL 2023
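The sketch below illustrates the two described components: an aspect-aware attention that weights text tokens and image blocks by relevance to the aspect, followed by a single graph-convolution layer over the vision-text graph. It is an illustration under assumed shapes and names, not the released AoM code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AspectAwareAttention(nn.Module):
    """Scores text tokens and image blocks by relevance to the aspect."""
    def __init__(self, d):
        super().__init__()
        self.q = nn.Linear(d, d)
        self.k = nn.Linear(d, d)

    def forward(self, aspect, units):
        # aspect: (B, d); units: (B, N, d) = text tokens + image blocks
        scores = torch.bmm(self.k(units), self.q(aspect).unsqueeze(-1)).squeeze(-1)
        weights = F.softmax(scores / units.size(-1) ** 0.5, dim=-1)  # (B, N)
        return units * weights.unsqueeze(-1)  # down-weight irrelevant units

class GCNLayer(nn.Module):
    """One graph convolution over the vision-text / text-text graph."""
    def __init__(self, d):
        super().__init__()
        self.w = nn.Linear(d, d)

    def forward(self, x, adj):
        # x: (B, N, d); adj: (B, N, N) row-normalized adjacency
        return F.relu(torch.bmm(adj, self.w(x)))

# toy usage: 2 samples, 16 text tokens + 9 image blocks, d=128
units = torch.randn(2, 25, 128)
aspect = torch.randn(2, 128)
adj = torch.softmax(torch.randn(2, 25, 25), -1)  # stand-in normalized graph
h = GCNLayer(128)(AspectAwareAttention(128)(aspect, units), adj)
print(h.shape)  # torch.Size([2, 25, 128])
```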
Syntax-aware Hybrid prompt model for Few-shot multi-modal sentiment analysis
Multimodal Sentiment Analysis (MSA) has become a popular topic in natural
language processing, at both the sentence and the aspect level. However, most
existing approaches require large labeled datasets, which are costly in time
and resources. It is therefore practical to explore few-shot sentiment
analysis across modalities. Previous work generally operates on the textual
modality using prompt-based methods of two main types: hand-crafted prompts
and learnable prompts. The existing approach to the few-shot multimodal
sentiment analysis task has utilized both types of prompts, but separately.
We design a hybrid pattern that combines one or more fixed hand-crafted
prompts with learnable prompts, and we utilize attention mechanisms to
optimize the prompt encoder. Experiments on both sentence-level and
aspect-level datasets show that our model significantly outperforms existing
methods.
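A minimal sketch of the hybrid prompt idea follows: embeddings of a fixed hand-crafted template are concatenated with learnable soft-prompt vectors, and a self-attention prompt encoder refines the combined prompt. The template, sizes, and module names are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class HybridPromptEncoder(nn.Module):
    """Sketch: fixed hand-crafted prompt + learnable prompt, refined by attention."""
    def __init__(self, vocab_size=30522, d_model=256, n_learnable=4, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)  # hand-crafted template side
        self.soft_prompt = nn.Parameter(torch.randn(n_learnable, d_model) * 0.02)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, template_ids):
        # template_ids: (B, T) ids of a hand-crafted prompt, e.g. a
        # hypothetical template "The sentiment of <aspect> is [MASK]"
        hard = self.embed(template_ids)                          # (B, T, d)
        soft = self.soft_prompt.unsqueeze(0).expand(hard.size(0), -1, -1)
        prompt = torch.cat([soft, hard], dim=1)                  # hybrid prompt
        refined, _ = self.attn(prompt, prompt, prompt)           # prompt encoder
        return refined

# toy usage
ids = torch.randint(0, 30522, (2, 8))
print(HybridPromptEncoder()(ids).shape)  # torch.Size([2, 12, 256])
```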