UniSA: Unified Generative Framework for Sentiment Analysis
Sentiment analysis is a crucial task that aims to understand people's
emotional states and predict emotional categories based on multimodal
information. It consists of several subtasks, such as emotion recognition in
conversation (ERC), aspect-based sentiment analysis (ABSA), and multimodal
sentiment analysis (MSA). However, unifying all subtasks in sentiment analysis
presents numerous challenges, including modality alignment, unified
input/output forms, and dataset bias. To address these challenges, we propose a
Task-Specific Prompt method to jointly model subtasks and introduce a
multimodal generative framework called UniSA. Additionally, we organize the
benchmark datasets of main subtasks into a new Sentiment Analysis Evaluation
benchmark, SAEval. We design novel pre-training tasks and training methods to
enable the model to learn generic sentiment knowledge among subtasks to improve
the model's multimodal sentiment perception ability. Our experimental results
show that UniSA performs comparably to the state-of-the-art on all subtasks and
generalizes well to various subtasks in sentiment analysis. Comment: Accepted to ACM MM 202
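The abstract does not spell out UniSA's prompt format. As a minimal, text-only sketch of the general idea of a task-specific prompt that gives every subtask one shared input form, the snippet below renders ERC, ABSA, and MSA examples into task-prefixed strings that a single generative model could consume. The template wording, the `Example` fields, and the `to_prompt` helper are hypothetical, and the audio and visual inputs that UniSA also handles are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical task-prefix templates; UniSA's real prompt wording is not
# given in the abstract. Each subtask becomes a text-to-text problem.
TEMPLATES = {
    "ERC": "[ERC] context: {context} utterance: {text} emotion:",
    "ABSA": "[ABSA] sentence: {text} aspect: {aspect} polarity:",
    "MSA": "[MSA] transcript: {text} sentiment:",
}

@dataclass
class Example:
    task: str                                          # "ERC", "ABSA" or "MSA"
    text: str                                          # utterance or sentence
    context: List[str] = field(default_factory=list)   # dialogue history (ERC)
    aspect: Optional[str] = None                       # aspect term (ABSA)

def to_prompt(ex: Example) -> str:
    """Render one example into a single task-prefixed prompt string.

    str.format ignores keyword arguments a template does not use, so every
    task can be filled from the same call.
    """
    return TEMPLATES[ex.task].format(
        text=ex.text,
        context=" | ".join(ex.context),
        aspect=ex.aspect or "",
    )

print(to_prompt(Example("ABSA", "The battery life is great", aspect="battery life")))
# [ABSA] sentence: The battery life is great aspect: battery life polarity:
```

Because every subtask shares one input/output form, examples from different datasets can be mixed in a single training batch.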
UniMSE: Towards Unified Multimodal Sentiment Analysis and Emotion Recognition
Multimodal sentiment analysis (MSA) and emotion recognition in conversation
(ERC) are key research topics for computers to understand human behaviors. From
a psychological perspective, emotions are the expression of affect or feelings
during a short period, while sentiments are formed and held for a longer
period. However, most existing works study sentiment and emotion separately and
do not fully exploit the complementary knowledge behind the two. In this paper,
we propose a multimodal sentiment knowledge-sharing framework (UniMSE) that
unifies MSA and ERC tasks from features, labels, and models. We perform
modality fusion at the syntactic and semantic levels and introduce contrastive
learning between modalities and samples to better capture the difference and
consistency between sentiments and emotions. Experiments on four public
benchmark datasets, MOSI, MOSEI, MELD, and IEMOCAP, demonstrate the
effectiveness of the proposed method, which achieves consistent improvements
over state-of-the-art methods. Comment: Accepted to EMNLP 2022 main conference
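The contrastive learning between modalities described above can be illustrated with a generic InfoNCE loss, in which each sample's text representation should be closer to its own audio (or visual) representation than to those of the other samples in the batch. This is a sketch assuming cosine similarity and a temperature of 0.1; UniMSE's exact loss, its encoders, and its sample-level contrast are not described in this abstract, and `info_nce` is a hypothetical helper.

```python
import math
from typing import List, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def info_nce(anchors: List[List[float]], positives: List[List[float]],
             temperature: float = 0.1) -> float:
    """Mean cross-modal InfoNCE loss: anchors[i] (e.g. text) should be closer to
    positives[i] (same sample, other modality) than to any positives[j], j != i."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max before exp for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / n

text = [[1.0, 0.0], [0.0, 1.0]]
audio_aligned = [[0.9, 0.1], [0.1, 0.9]]
audio_swapped = [[0.1, 0.9], [0.9, 0.1]]
print(info_nce(text, audio_aligned) < info_nce(text, audio_swapped))  # True
```

Minimizing this loss pulls matching text/audio pairs together and pushes mismatched pairs apart, which is the kind of cross-modal consistency the abstract refers to.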
Research on Emotional Conversation Analysis Based on Deep Learning
The ability of a chatbot to express specific emotions during a conversation is a key part of artificial intelligence, with an intuitive and quantifiable impact on the chatbot's usability and on user satisfaction. Enabling machines to recognize emotions in conversation is challenging, mainly because human dialogue conveys emotion through long-term experience, abundant background knowledge, context, and intricate patterns among affective states. Many studies on neural emotional conversation models have appeared recently. However, enabling a chatbot to control which emotion it responds with, according to its own persona, remains underexplored. Users are no longer satisfied with dialogue systems that only solve specific tasks; they increasingly want emotional communication. If a chatbot can perceive the user's emotions and process them accurately, it can greatly enrich the dialogue and evoke the user's empathy.
In emotional dialogue, the ultimate goal is for the machine to understand human emotions and give matching responses. Accordingly, this thesis explores two tasks in depth: emotion recognition in conversation and emotional dialogue generation. Although considerable progress has been made in research on emotion in dialogue over the past few years, the complex nature of human emotion still poses difficulties and challenges. The key contributions of this thesis are summarized as follows:
(1) Researchers have recently paid increasing attention to enhancing natural language models with knowledge graphs, since knowledge graphs encode a large amount of systematic knowledge. Many studies have shown that introducing external commonsense knowledge is very helpful for enriching feature representations. We address emotion recognition in conversation using external knowledge to enhance semantics, extracting knowledge from the external knowledge graph ATOMIC. We propose KES, a new framework that incorporates different elements of external knowledge and conversational semantic role labeling, and builds on them to learn the interactions between interlocutors in a conversation. A conversation is a coherent, ordered sequence of utterances, and capturing long-range context is a weakness of traditional recurrent models. We therefore adopt the Transformer, a structure composed of self-attention and feed-forward neural networks, instead of the traditional RNN, to capture long-range context information. We design a self-attention layer specialized for enhancing semantic text features with external commonsense knowledge. Two separate LSTM networks then track the internal state of each individual speaker and the external state of the context. The proposed model was evaluated on three datasets for emotion recognition in conversation, and the experimental results show that it outperforms state-of-the-art approaches on most of the tested datasets.
(2) We propose a Seq2Seq-based emotional dialogue model that improves on three aspects (model input, encoder structure, and decoder structure) so that it can generate diverse, context-aware responses with rich emotion. For the model input, emotional information and positional information are added to the word vectors. For the encoder, the model encodes the current input together with its sentence sentiment into a semantic vector, and separately encodes the context together with sentence sentiment into a context vector, which adds contextual information while keeping the current input independent. On the decoder side, attention computes weights for the two vectors separately before decoding, fully integrating local and global emotional semantic information. We use seven objective evaluation metrics to assess the model's generation quality, context similarity, response diversity, and emotional response. Experimental results show that the model can generate diverse responses with rich sentiment and contextual associations.
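The decoder step in (2), where attention weights the local semantic vector and the global context vector separately before decoding, might look like the minimal sketch below. Dot-product scoring against the decoder state and a softmax over the two vectors are assumptions rather than the thesis's stated formulation, and `fuse` is a hypothetical name.

```python
import math
from typing import List, Tuple

Vector = List[float]

def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))

def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(decoder_state: Vector, semantic_vec: Vector,
         context_vec: Vector) -> Tuple[List[float], Vector]:
    """Score the local semantic vector and the global context vector against the
    current decoder state, softmax the two scores, and return (weights, mixture)."""
    weights = softmax([dot(decoder_state, semantic_vec),
                       dot(decoder_state, context_vec)])
    fused = [weights[0] * a + weights[1] * b
             for a, b in zip(semantic_vec, context_vec)]
    return weights, fused

weights, fused = fuse([1.0, 0.0], semantic_vec=[1.0, 0.0], context_vec=[0.0, 1.0])
print([round(w, 3) for w in weights])  # [0.731, 0.269]
```

Here the decoder state resembles the current-input vector, so the local information receives most of the weight; a state closer to the context vector would shift the mixture toward global context at that decoding step.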
InstructERC: Reforming Emotion Recognition in Conversation with a Retrieval Multi-task LLMs Framework
The development of emotion recognition in dialogue (ERC) has been
consistently hindered by the complexity of pipeline designs, leading to ERC
models that often overfit to specific datasets and dialogue patterns. In this
study, we propose a novel approach, InstructERC, that reformulates the ERC task
from a discriminative framework to a generative framework based on Large
Language Models (LLMs). InstructERC has
two significant contributions: Firstly, InstructERC introduces a simple yet
effective retrieval template module, which helps the model explicitly integrate
multi-granularity dialogue supervision information by concatenating the
historical dialog content, label statement, and emotional domain demonstrations
with high semantic similarity. Furthermore, we introduce two additional emotion
alignment tasks, namely speaker identification and emotion prediction, to
implicitly model the dialogue role relationships and future emotional
tendencies in conversations. Our LLM-based plug-and-play framework
significantly outperforms all previous models and achieves comprehensive SOTA
on three commonly used ERC datasets. Extensive parameter-efficient and
data-scaling experiments provide empirical guidance for applying
InstructERC in practical scenarios. Our code will be released after blind
review.
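A rough sketch of the retrieval template idea: concatenate the dialogue history, a label statement, and the training demonstration most similar to the query utterance into one prompt. The field wording and the `build_prompt` helper are hypothetical, and the bag-of-words cosine used here is only a stand-in for whatever semantic-similarity retriever InstructERC actually uses.

```python
import math
from collections import Counter
from typing import List, Tuple

def bow_cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (stand-in for a learned semantic retriever)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * cb[word] for word, count in ca.items())
    norm = (math.sqrt(sum(c * c for c in ca.values()))
            * math.sqrt(sum(c * c for c in cb.values())))
    return dot / norm if norm else 0.0

def build_prompt(history: List[str], query: str, labels: List[str],
                 demos: List[Tuple[str, str]]) -> str:
    """Concatenate dialogue history, a label statement, and the most similar
    (utterance, emotion) demonstration into one instruction prompt."""
    demo_utt, demo_label = max(demos, key=lambda d: bow_cosine(query, d[0]))
    return "\n".join([
        "History: " + " ".join(history),
        "Label statement: choose one of [" + ", ".join(labels) + "].",
        f'Demonstration: "{demo_utt}" -> {demo_label}',
        f'Query: what is the emotion of "{query}"?',
    ])

history = ["A: I failed the exam.", "B: Oh no, I'm sorry."]
demos = [("I lost my keys again", "sadness"), ("We won the match", "joy")]
print(build_prompt(history, "We won the final match!", ["joy", "sadness", "neutral"], demos))
```

The retrieved demonstration ("We won the match" -> joy) shares the most words with the query, so it is the one placed in the prompt; the LLM then generates the emotion label as text.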
MultiTalk: A Highly-Branching Dialog Testbed for Diverse Conversations
We study conversational dialog in which there are many possible responses to
a given history. We present the MultiTalk Dataset, a corpus of over 320,000
sentences of written conversational dialog that balances a high branching
factor (10) with several conversation turns (6) through selective branch
continuation. We make multiple contributions to study dialog generation in the
highly branching setting. In order to evaluate a diverse set of generations, we
propose a simple scoring algorithm, based on bipartite graph matching, to
optimally incorporate a set of diverse references. We study multiple language
generation tasks at different levels of predictive conversation depth, using
textual attributes induced automatically from pretrained classifiers. Our
culminating task is a challenging theory of mind problem, a controllable
generation task which requires reasoning about the expected reaction of the
listener. Comment: 7 pages, AAAI-2
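The bipartite-matching idea can be sketched as follows: pair each generated response with at most one reference so that total similarity is maximized, then normalize. The word-overlap F1 similarity, the normalization by the larger set size, and the function names are assumptions for illustration; MultiTalk's exact scoring function is not given in the abstract.

```python
from itertools import permutations
from typing import Callable, List

def token_f1(a: str, b: str) -> float:
    """Hypothetical similarity: F1 overlap of the two sentences' word sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    common = len(ta & tb)
    if common == 0:
        return 0.0
    precision, recall = common / len(ta), common / len(tb)
    return 2 * precision * recall / (precision + recall)

def matching_score(generations: List[str], references: List[str],
                   sim: Callable[[str, str], float] = token_f1) -> float:
    """Maximum-weight one-to-one matching between generations and references,
    normalized by the larger set size. Brute force: only for small sets."""
    if not generations or not references:
        return 0.0
    small, large = sorted((generations, references), key=len)
    # Try every injective assignment of the smaller set into the larger one.
    # token_f1 is symmetric, so which side is "small" does not change the score.
    best = max(
        sum(sim(s, large[j]) for s, j in zip(small, perm))
        for perm in permutations(range(len(large)), len(small))
    )
    return best / max(len(generations), len(references))

gens = ["i love it", "no way"]
refs = ["no way man", "i love it"]
print(round(matching_score(gens, refs), 3))  # 0.9
```

Pairing by position would score this example 0, whereas the optimal matching pairs each generation with its closest reference. Brute force is exponential in the set size; for larger sets, the Hungarian algorithm (for example, `scipy.optimize.linear_sum_assignment` with `maximize=True`) finds the same optimum in polynomial time.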