5,587 research outputs found

    UniSA: Unified Generative Framework for Sentiment Analysis

    Full text link
    Sentiment analysis is a crucial task that aims to understand people's emotional states and predict emotional categories based on multimodal information. It consists of several subtasks, such as emotion recognition in conversation (ERC), aspect-based sentiment analysis (ABSA), and multimodal sentiment analysis (MSA). However, unifying all subtasks in sentiment analysis presents numerous challenges, including modality alignment, unified input/output forms, and dataset bias. To address these challenges, we propose a Task-Specific Prompt method to jointly model subtasks and introduce a multimodal generative framework called UniSA. Additionally, we organize the benchmark datasets of main subtasks into a new Sentiment Analysis Evaluation benchmark, SAEval. We design novel pre-training tasks and training methods to enable the model to learn generic sentiment knowledge among subtasks to improve the model's multimodal sentiment perception ability. Our experimental results show that UniSA performs comparably to the state-of-the-art on all subtasks and generalizes well to various subtasks in sentiment analysis.Comment: Accepted to ACM MM 202

    UniMSE: Towards Unified Multimodal Sentiment Analysis and Emotion Recognition

    Full text link
    Multimodal sentiment analysis (MSA) and emotion recognition in conversation (ERC) are key research topics for computers to understand human behaviors. From a psychological perspective, emotions are the expression of affect or feelings during a short period, while sentiments are formed and held for a longer period. However, most existing works study sentiment and emotion separately and do not fully exploit the complementary knowledge behind the two. In this paper, we propose a multimodal sentiment knowledge-sharing framework (UniMSE) that unifies MSA and ERC tasks from features, labels, and models. We perform modality fusion at the syntactic and semantic levels and introduce contrastive learning between modalities and samples to better capture the difference and consistency between sentiments and emotions. Experiments on four public benchmark datasets, MOSI, MOSEI, MELD, and IEMOCAP, demonstrate the effectiveness of the proposed method and achieve consistent improvements compared with state-of-the-art methods.Comment: Accepted to EMNLP 2022 main conferenc

    深層学習に基づく感情会話分析に関する研究

    Get PDF
    Owning the capability to express specific emotions by a chatbot during a conversation is one of the key parts of artificial intelligence, which has an intuitive and quantifiable impact on the improvement of chatbot’s usability and user satisfaction. Enabling machines to emotion recognition in conversation is challenging, mainly because the information in human dialogue innately conveys emotions by long-term experience, abundant knowledge, context, and the intricate patterns between the affective states. Recently, many studies on neural emotional conversational models have been conducted. However, enabling the chatbot to control what kind of emotion to respond to upon its own characters in conversation is still underexplored. At this stage, people are no longer satisfied with using a dialogue system to solve specific tasks, and are more eager to achieve spiritual communication. In the chat process, if the robot can perceive the user's emotions and can accurately process them, it can greatly enrich the content of the dialogue and make the user empathize. In the process of emotional dialogue, our ultimate goal is to make the machine understand human emotions and give matching responses. Based on these two points, this thesis explores and in-depth emotion recognition in conversation task and emotional dialogue generation task. In the past few years, although considerable progress has been made in emotional research in dialogue, there are still some difficulties and challenges due to the complex nature of human emotions. The key contributions in this thesis are summarized as below: (1) Researchers have paid more attention to enhancing natural language models with knowledge graphs these days, since knowledge graph has gained a lot of systematic knowledge. A large number of studies had shown that the introduction of external commonsense knowledge is very helpful to improve the characteristic information. We address the task of emotion recognition in conversations using external knowledge to enhance semantics. In this work, we employ an external knowledge graph ATOMIC to extract the knowledge sources. We proposed KES model, a new framework that incorporates different elements of external knowledge and conversational semantic role labeling, where build upon them to learn interactions between interlocutors participating in a conversation. The conversation is a sequence of coherent and orderly discourses. For neural networks, the capture of long-range context information is a weakness. We adopt Transformer a structure composed of self-attention and feed forward neural network, instead of the traditional RNN model, aiming at capturing remote context information. We design a self-attention layer specialized for enhanced semantic text features with external commonsense knowledge. Then, two different networks composed of LSTM are responsible for tracking individual internal state and context external state. In addition, the proposed model has experimented on three datasets in emotion detection in conversation. The experimental results show that our model outperforms the state-of-the-art approaches on most of the tested datasets. (2) We proposed an emotional dialogue model based on Seq2Seq, which is improved from three aspects: model input, encoder structure, and decoder structure, so that the model can generate responses with rich emotions, diversity, and context. In terms of model input, emotional information and location information are added based on word vectors. In terms of the encoder, the proposed model first encodes the current input and sentence sentiment to generate a semantic vector, and additionally encodes the context and sentence sentiment to generate a context vector, adding contextual information while ensuring the independence of the current input. On the decoder side, attention is used to calculate the weights of the two semantic vectors separately and then decode, to fully integrate the local emotional semantic information and the global emotional semantic information. We used seven objective evaluation indicators to evaluate the model's generation results, context similarity, response diversity, and emotional response. Experimental results show that the model can generate diverse responses with rich sentiment, contextual associations

    InstructERC: Reforming Emotion Recognition in Conversation with a Retrieval Multi-task LLMs Framework

    Full text link
    The development of emotion recognition in dialogue (ERC) has been consistently hindered by the complexity of pipeline designs, leading to ERC models that often overfit to specific datasets and dialogue patterns. In this study, we propose a novel approach, namely InstructERC, to reformulates the ERC task from a discriminative framework to a generative framework based on Large Language Models (LLMs) . InstructERC has two significant contributions: Firstly, InstructERC introduces a simple yet effective retrieval template module, which helps the model explicitly integrate multi-granularity dialogue supervision information by concatenating the historical dialog content, label statement, and emotional domain demonstrations with high semantic similarity. Furthermore, we introduce two additional emotion alignment tasks, namely speaker identification and emotion prediction tasks, to implicitly model the dialogue role relationships and future emotional tendencies in conversations. Our LLM-based plug-and-play plugin framework significantly outperforms all previous models and achieves comprehensive SOTA on three commonly used ERC datasets. Extensive analysis of parameter-efficient and data-scaling experiments provide empirical guidance for applying InstructERC in practical scenarios. Our code will be released after blind review

    MultiTalk: A Highly-Branching Dialog Testbed for Diverse Conversations

    Full text link
    We study conversational dialog in which there are many possible responses to a given history. We present the MultiTalk Dataset, a corpus of over 320,000 sentences of written conversational dialog that balances a high branching factor (10) with several conversation turns (6) through selective branch continuation. We make multiple contributions to study dialog generation in the highly branching setting. In order to evaluate a diverse set of generations, we propose a simple scoring algorithm, based on bipartite graph matching, to optimally incorporate a set of diverse references. We study multiple language generation tasks at different levels of predictive conversation depth, using textual attributes induced automatically from pretrained classifiers. Our culminating task is a challenging theory of mind problem, a controllable generation task which requires reasoning about the expected reaction of the listener.Comment: 7 pages, AAAI-2
    corecore