Emotion Recognition in Conversation using Probabilistic Soft Logic
Creating agents that can both appropriately respond to conversations and
understand complex human linguistic tendencies and social cues has been a
long-standing challenge in the NLP community. A recent pillar of research
revolves around emotion recognition in conversation (ERC), a sub-field of
emotion recognition that focuses on conversations or dialogues containing two
or more utterances. In this work, we explore an approach to ERC that exploits
neural embeddings along with the complex structure of dialogues. We implement
our approach in a framework called Probabilistic Soft Logic (PSL), a
declarative templating language that uses first-order-like logical rules
which, when combined with data, define a particular class of graphical model.
Additionally, PSL provides functionality for incorporating the results of
neural models into PSL models. This allows our model to take advantage of
advanced neural methods, such as sentence embeddings, while reasoning
logically over the structure of a dialogue. We compare our method with
state-of-the-art purely neural ERC systems and see almost a 20% improvement.
With these results, we provide an extensive qualitative and quantitative
analysis over the DailyDialog conversation dataset.
GA2MIF: Graph and Attention Based Two-Stage Multi-Source Information Fusion for Conversational Emotion Detection
Multimodal Emotion Recognition in Conversation (ERC) plays an influential
role in the field of human-computer interaction and conversational robotics,
since it can motivate machines to provide empathetic services. Multimodal data
modeling is an up-and-coming research area of recent years, inspired by the
human capability to integrate multiple senses. Several graph-based approaches
claim to capture interactive information between modalities, but the
heterogeneity of multimodal data prevents these methods from reaching optimal
solutions. In this work, we introduce a multimodal fusion approach named Graph
and Attention based Two-stage Multi-source Information Fusion (GA2MIF) for
emotion detection in conversation. Our proposed method circumvents the problem
of taking a heterogeneous graph as input to the model while eliminating
complex redundant connections during graph construction. GA2MIF focuses on
contextual modeling and cross-modal modeling by leveraging Multi-head
Directed Graph ATtention networks (MDGATs) and Multi-head Pairwise Cross-modal
ATtention networks (MPCATs), respectively. Extensive experiments on two public
datasets (i.e., IEMOCAP and MELD) demonstrate that the proposed GA2MIF can
validly capture intra-modal long-range contextual information and inter-modal
complementary information, and that it outperforms the prevalent
State-Of-The-Art (SOTA) models by a remarkable margin.
Comment: 14 pages
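The core mechanism behind an MDGAT-style layer, attention restricted to a directed context graph, can be illustrated with a minimal single-head sketch. The features, edges, and scoring function below are toy assumptions, not the paper's architecture (which uses learned projections and multiple heads).

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [x / s for x in e]

def directed_graph_attention(feats, edges):
    """One attention head over a directed graph: node i attends only to
    in-neighbors j with (j, i) in edges, plus itself. Scores are scaled
    dot products; a toy stand-in for a multi-head directed GAT layer."""
    n, d = len(feats), len(feats[0])
    out = []
    for i in range(n):
        nbrs = [j for j in range(n) if (j, i) in edges] + [i]
        scores = softmax([sum(feats[i][k] * feats[j][k] for k in range(d))
                          / math.sqrt(d) for j in nbrs])
        out.append([sum(a * feats[j][k] for a, j in zip(scores, nbrs))
                    for k in range(d)])
    return out

# Toy dialogue: 3 utterances with 2-d features; directed edges point from
# past utterances to later ones (a hypothetical context window).
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = {(0, 1), (0, 2), (1, 2)}
updated = directed_graph_attention(feats, edges)
print([[round(x, 3) for x in row] for row in updated])
```

Because the first utterance has no in-neighbors, its representation is unchanged, while later utterances aggregate context from everything that precedes them; directionality is what lets the graph encode conversational order.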
Dynamic Causal Disentanglement Model for Dialogue Emotion Detection
Emotion detection is a critical technology extensively employed in diverse
fields. While the incorporation of commonsense knowledge has proven beneficial
for existing emotion detection methods, dialogue-based emotion detection
encounters numerous difficulties and challenges due to human agency and the
variability of dialogue content. In dialogues, human emotions tend to
accumulate in bursts, yet they are often expressed implicitly. This means that
many genuine emotions remain concealed within a plethora of unrelated words
and dialogues. In this paper, we propose a Dynamic Causal Disentanglement
Model founded on the separation of hidden variables. This model effectively
decomposes the content of dialogues and investigates the temporal accumulation
of emotions, thereby enabling more precise emotion recognition. First, we
introduce a novel Causal Directed Acyclic Graph (DAG) to establish the
correlation between hidden emotional information and other observed elements.
Subsequently, our approach utilizes pre-extracted personal attributes and
utterance topics as guiding factors for the distribution of hidden variables,
aiming to separate out the irrelevant ones. Specifically, we propose a dynamic
temporal disentanglement model to infer the propagation of utterances and
hidden variables, enabling the accumulation of emotion-related information
throughout the conversation. To guide this disentanglement process, we
leverage ChatGPT-4.0 and LSTM networks to extract utterance topics and
personal attributes as observed information. Finally, we test our approach on
two popular datasets in dialogue emotion detection, and the experimental
results verify the model's superiority.
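The two ideas the abstract combines, gating out emotion-irrelevant content and accumulating emotion over time, can be caricatured in a few lines. Everything below (the decay factor, the gate, the scores) is a hypothetical illustration, far simpler than the paper's causal latent-variable model.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def accumulate_emotion(utterance_scores, relevance):
    """Toy stand-in for dynamic disentanglement: each utterance carries
    an emotion signal, but only the fraction judged relevant (e.g. gated
    by topic and speaker attributes) is propagated into the running
    emotional state; the rest is treated as an irrelevant hidden factor."""
    state = 0.0
    trace = []
    for s, r in zip(utterance_scores, relevance):
        gate = sigmoid(r)                # relevance gate in (0, 1)
        state = 0.7 * state + gate * s   # decayed accumulation of emotion
        trace.append(round(state, 3))
    return trace

# Hypothetical dialogue: the genuine emotional burst only starts at the
# third utterance, where the topic becomes relevant.
scores = [0.2, 0.1, 0.9, 0.8]
relevance = [-2.0, -2.0, 2.0, 2.0]
print(accumulate_emotion(scores, relevance))
```

With the gate nearly closed, early off-topic utterances barely move the state; once relevant utterances arrive, the state jumps and keeps accumulating, mirroring the "emotions accumulate in bursts" observation.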
Semi-supervised Multi-modal Emotion Recognition with Cross-Modal Distribution Matching
Automatic emotion recognition is an active research topic with a wide range of
applications. Due to the high manual annotation cost and inevitable label
ambiguity, the development of emotion recognition datasets is limited in both
scale and quality. Therefore, one of the key challenges is how to build
effective models with limited data resources. Previous works have explored
different approaches to tackle this challenge, including data enhancement,
transfer learning, and semi-supervised learning. However, these existing
approaches suffer from weaknesses such as training instability, large
performance loss during transfer, or only marginal improvement.
In this work, we propose a novel semi-supervised multi-modal emotion
recognition model based on cross-modality distribution matching, which
leverages abundant unlabeled data to enhance model training under the
assumption that the inner emotional status is consistent at the utterance
level across modalities.
We conduct extensive experiments to evaluate the proposed model on two
benchmark datasets, IEMOCAP and MELD. The experimental results show that the
proposed semi-supervised learning model can effectively utilize unlabeled data
and combine multiple modalities to boost emotion recognition performance,
outperforming other state-of-the-art approaches under the same conditions.
The proposed model also achieves competitive capacity compared with existing
approaches that take advantage of additional auxiliary information such as
speaker and interaction context.
Comment: 10 pages, 5 figures, to be published on ACM Multimedia 202
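The abstract's key assumption, that modalities of the same utterance share an emotional state, suggests penalizing distribution mismatch between modality embeddings on unlabeled data. The sketch below uses crude first-moment matching as a stand-in; the embeddings and the loss form are illustrative assumptions, not the paper's actual objective.

```python
def moment_matching_loss(xs, ys):
    """Toy cross-modal distribution matching: squared difference between
    the per-dimension means of two modalities' utterance embeddings (a
    crude stand-in for richer distribution-matching losses such as MMD)."""
    d = len(xs[0])
    mx = [sum(v[k] for v in xs) / len(xs) for k in range(d)]
    my = [sum(v[k] for v in ys) / len(ys) for k in range(d)]
    return sum((a - b) ** 2 for a, b in zip(mx, my))

# Hypothetical unlabeled batch: text vs. audio embeddings of the same
# utterances. Under the consistency assumption they should agree in
# distribution, so this loss can be minimized without any labels.
text_emb = [[0.9, 0.1], [0.8, 0.2]]
audio_emb = [[0.7, 0.3], [0.6, 0.4]]
loss = moment_matching_loss(text_emb, audio_emb)
print(round(loss, 3))
```

In a semi-supervised setup this unlabeled-data term would be added to the supervised classification loss, pulling the modality encoders toward a shared embedding distribution.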
A Study on Deep-Learning-Based Emotional Conversation Analysis
Having the capability to express specific emotions during a conversation is one of the key parts of artificial intelligence for a chatbot, with an intuitive and quantifiable impact on the chatbot's usability and user satisfaction. Enabling machines to recognize emotions in conversation is challenging, mainly because the information in human dialogue conveys emotions through long-term experience, abundant knowledge, context, and the intricate patterns between affective states. Recently, many studies on neural emotional conversational models have been conducted. However, enabling the chatbot to control what kind of emotion to respond with, based on its own character in the conversation, is still underexplored. At this stage, people are no longer satisfied with using a dialogue system to solve specific tasks and are more eager to achieve spiritual communication. In the chat process, if the robot can perceive the user's emotions and process them accurately, it can greatly enrich the content of the dialogue and make the user empathize.
In the process of emotional dialogue, our ultimate goal is to make the machine understand human emotions and give matching responses. Based on these two points, this thesis explores in depth the emotion recognition in conversation task and the emotional dialogue generation task. In the past few years, although considerable progress has been made in emotional research in dialogue, there are still difficulties and challenges due to the complex nature of human emotions. The key contributions of this thesis are summarized as follows:
(1) Researchers have recently paid more attention to enhancing natural language models with knowledge graphs, since knowledge graphs contain a wealth of systematic knowledge. A large number of studies have shown that introducing external commonsense knowledge is very helpful for enriching feature information. We address the task of emotion recognition in conversations using external knowledge to enhance semantics. In this work, we employ the external knowledge graph ATOMIC as the knowledge source. We propose the KES model, a new framework that incorporates different elements of external knowledge and conversational semantic role labeling, and builds upon them to learn the interactions between interlocutors participating in a conversation. A conversation is a sequence of coherent and orderly discourses, and capturing long-range context information is a weakness of traditional recurrent neural networks. We therefore adopt the Transformer, a structure composed of self-attention and feed-forward neural networks, instead of the traditional RNN model, aiming to capture remote context information. We design a self-attention layer specialized for text features semantically enhanced with external commonsense knowledge. Then, two different networks composed of LSTMs are responsible for tracking the individual internal state and the contextual external state. In addition, the proposed model is evaluated on three datasets for emotion detection in conversation. The experimental results show that our model outperforms the state-of-the-art approaches on most of the tested datasets.
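The knowledge-enhancement step described above, fusing a commonsense vector into each utterance embedding before self-attention over the whole sequence, can be sketched minimally. The additive fusion, the embeddings, and the single unprojected head below are simplifying assumptions, not the KES architecture itself.

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def knowledge_enhanced_self_attention(utt, know):
    """Toy version of the idea: add a commonsense knowledge vector
    (e.g. retrieved from ATOMIC) to each utterance embedding, then let
    every utterance attend over the whole sequence, so long-range
    context is reachable in a single step, unlike an RNN."""
    feats = [[u + k for u, k in zip(uv, kv)] for uv, kv in zip(utt, know)]
    n, d = len(feats), len(feats[0])
    out = []
    for i in range(n):
        scores = softmax([sum(feats[i][t] * feats[j][t] for t in range(d))
                          / math.sqrt(d) for j in range(n)])
        out.append([sum(s * feats[j][t] for s, j in zip(scores, range(n)))
                    for t in range(d)])
    return out

# Hypothetical 3-utterance conversation with 2-d embeddings and
# per-utterance knowledge vectors.
utt = [[0.5, 0.0], [0.0, 0.5], [0.5, 0.5]]
know = [[0.1, 0.0], [0.0, 0.1], [0.0, 0.0]]
enhanced = knowledge_enhanced_self_attention(utt, know)
print(len(enhanced), len(enhanced[0]))
```

A real Transformer layer adds learned query/key/value projections, multiple heads, and residual connections; the sketch keeps only the attention pattern that makes remote context directly reachable.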
(2) We propose an emotional dialogue model based on Seq2Seq, improved in three aspects: model input, encoder structure, and decoder structure, so that the model can generate responses with rich emotion, diversity, and context. In terms of model input, emotional information and positional information are added on top of the word vectors. On the encoder side, the proposed model first encodes the current input together with the sentence sentiment to generate a semantic vector, and additionally encodes the context together with the sentence sentiment to generate a context vector, adding contextual information while preserving the independence of the current input. On the decoder side, attention is used to compute weights for the two semantic vectors separately before decoding, in order to fully integrate the local and global emotional semantic information. We used seven objective evaluation metrics to assess the model's generation results in terms of context similarity, response diversity, and emotional response. Experimental results show that the model can generate diverse responses with rich sentiment and contextual associations.
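The decoder-side fusion described above, separately weighting a local semantic vector and a global context vector against the decoder state, can be sketched as a two-way attention mix. The vectors and the dot-product scoring are hypothetical placeholders for the model's learned representations.

```python
import math

def combine_semantic_vectors(dec_state, local_vec, global_vec):
    """Toy sketch of the decoder-side fusion: score the local (current
    input + emotion) vector and the global (context + emotion) vector
    against the decoder state, then mix them with softmax weights
    before predicting the next token."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    scores = [dot(dec_state, local_vec), dot(dec_state, global_vec)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    w_local, w_global = exps[0] / z, exps[1] / z
    return [w_local * l + w_global * g for l, g in zip(local_vec, global_vec)]

# Hypothetical vectors: the encoder emitted a local semantic vector and a
# context vector, each already augmented with an emotion embedding.
dec_state = [1.0, 0.0]
local_vec = [0.8, 0.2]    # current input + sentence sentiment
global_vec = [0.2, 0.8]   # dialogue context + sentence sentiment
mixed = combine_semantic_vectors(dec_state, local_vec, global_vec)
print([round(x, 3) for x in mixed])
```

Because the decoder state here aligns more with the local vector, the mix leans local while still retaining global context, which is exactly the balance the two-vector design aims for.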