Research on Emotional Conversation Analysis Based on Deep Learning
The ability of a chatbot to express specific emotions during a conversation is a key aspect of artificial intelligence, with a direct and measurable impact on the chatbot's usability and user satisfaction. Enabling machines to recognize emotions in conversation is challenging, mainly because humans convey emotion in dialogue through long-term experience, rich knowledge, context, and intricate relationships between affective states. Many studies on neural emotional conversational models have been conducted recently; however, enabling a chatbot to control which emotion it responds with, according to its own persona, remains underexplored. Users are no longer satisfied with dialogue systems that only solve specific tasks; they increasingly expect genuine emotional communication. If, during a chat, the system can perceive and accurately handle the user's emotions, it can greatly enrich the dialogue and foster empathy with the user.
In emotional dialogue, the ultimate goal is to make the machine understand human emotions and produce matching responses. Accordingly, this thesis investigates two tasks in depth: emotion recognition in conversation and emotional dialogue generation. Although considerable progress has been made on emotion in dialogue in recent years, difficulties and challenges remain because of the complex nature of human emotions. The key contributions of this thesis are summarized below:
(1) Researchers have recently paid increasing attention to enhancing natural language models with knowledge graphs, since knowledge graphs encode a large amount of structured knowledge, and many studies have shown that introducing external commonsense knowledge enriches feature representations. We address emotion recognition in conversation by using external knowledge to enhance semantics. In this work, we use the external knowledge graph ATOMIC as the knowledge source and propose KES, a new framework that incorporates different elements of external knowledge and conversational semantic role labeling and builds on them to model interactions between the interlocutors in a conversation. A conversation is a sequence of coherent, ordered utterances, and capturing long-range context is a weakness of many neural networks; we therefore adopt the Transformer, a structure composed of self-attention and feed-forward layers, instead of a traditional RNN, to capture remote context information. We design a self-attention layer specialized for text features enriched with external commonsense knowledge, and two LSTM-based networks then track each speaker's internal state and the external context state, respectively (a rough sketch of this pipeline is given after this list). We evaluate the proposed model on three emotion-recognition-in-conversation datasets, and the experimental results show that it outperforms state-of-the-art approaches on most of them.
(2) We propose an emotional dialogue model based on Seq2Seq, improved in three respects: model input, encoder structure, and decoder structure, so that the model can generate responses that are emotionally rich, diverse, and contextually grounded. For the model input, emotion information and position information are added to the word vectors. For the encoder, the model first encodes the current input together with its sentence-level sentiment to produce a semantic vector, and separately encodes the context together with its sentiment to produce a context vector, adding contextual information while preserving the independence of the current input. On the decoder side, attention weights are computed over the two vectors separately before decoding, so that local and global emotional-semantic information are fully integrated (a rough sketch of this decoder step is also given after this list). We evaluated the generated responses with seven objective metrics covering context similarity, response diversity, and emotional response. Experimental results show that the model generates diverse, emotionally rich, and contextually relevant responses.
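As a rough illustration of contribution (1), the sketch below fuses external-knowledge features with utterance features, applies self-attention to capture long-range conversational context, and tracks speaker and context states with two LSTMs. The module names, dimensions, and fusion scheme are assumptions made for illustration only, not the actual KES implementation.

```python
# Illustrative sketch of a knowledge-enhanced emotion recognizer; all names,
# sizes, and the fusion scheme are assumptions, not the thesis code.
import torch
import torch.nn as nn

class KnowledgeEnhancedERC(nn.Module):
    def __init__(self, utter_dim=300, know_dim=300, hidden=150, n_emotions=7):
        super().__init__()
        # Fuse utterance features with external commonsense-knowledge features
        self.fuse = nn.Linear(utter_dim + know_dim, utter_dim)
        # Self-attention over the whole conversation for long-range context
        self.attn = nn.TransformerEncoderLayer(d_model=utter_dim, nhead=6,
                                               batch_first=True)
        # Two LSTMs: one tracks each speaker's internal state, one the context state
        self.speaker_lstm = nn.LSTM(utter_dim, hidden, batch_first=True)
        self.context_lstm = nn.LSTM(utter_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, n_emotions)

    def forward(self, utter_feats, know_feats):
        # utter_feats, know_feats: (batch, n_utterances, dim)
        fused = torch.relu(self.fuse(torch.cat([utter_feats, know_feats], dim=-1)))
        ctx = self.attn(fused)                  # conversation-level context
        spk_out, _ = self.speaker_lstm(fused)   # per-speaker internal state
        ctx_out, _ = self.context_lstm(ctx)     # external context state
        return self.classifier(torch.cat([spk_out, ctx_out], dim=-1))
```

A similarly hedged sketch of the dual-attention decoder step from contribution (2): attention is computed separately over the current-input (local) encoding and the context (global) encoding, and both summaries drive the decoder update. Again, the names and sizes are illustrative assumptions.

```python
# Illustrative dual-attention decoder step; shapes and modules are assumptions.
import torch
import torch.nn as nn

class DualAttentionDecoderStep(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.local_attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.global_attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.rnn = nn.GRUCell(2 * hidden, hidden)

    def forward(self, dec_state, local_enc, global_enc):
        # dec_state: (batch, hidden); local_enc/global_enc: (batch, len, hidden)
        q = dec_state.unsqueeze(1)
        local_ctx, _ = self.local_attn(q, local_enc, local_enc)      # current input + sentiment
        global_ctx, _ = self.global_attn(q, global_enc, global_enc)  # context + sentiment
        combined = torch.cat([local_ctx, global_ctx], dim=-1).squeeze(1)
        return self.rnn(combined, dec_state)    # next decoder state
```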
CREDIT: Coarse-to-Fine Sequence Generation for Dialogue State Tracking
In dialogue systems, a dialogue state tracker aims to accurately find a
compact representation of the current dialogue status, based on the entire
dialogue history. While previous approaches often define dialogue states as a
combination of separate triples (domain-slot-value), in this paper, we
employ a structured state representation and cast dialogue state tracking as a
sequence generation problem. Based on this new formulation, we propose a
CoaRsE-to-fine DIalogue state Tracking (CREDIT)
approach. Taking advantage of the structured state representation, which is a
marked language sequence, we can further fine-tune the pre-trained model (by
supervised learning) by optimizing natural language metrics with the policy
gradient method. Like all generative state tracking methods, CREDIT does not
rely on a pre-defined dialogue ontology enumerating all possible slot values.
Experiments demonstrate that our tracker achieves encouraging joint goal accuracy
for the five domains in the MultiWOZ 2.0 and MultiWOZ 2.1 datasets.
Comment: 10 pages, 3 figures
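To make the reformulation concrete, the sketch below serializes a structured dialogue state into a marked-up target sequence that a seq2seq model can learn to generate from the dialogue history. The tag format and example values are illustrative assumptions, not the actual CREDIT state representation.

```python
# Illustrative serialization of a dialogue state for generative state tracking.
# The <domain>/<slot>/<value> markers and example values are assumptions.

def serialize_state(state: dict) -> str:
    """Flatten {domain: {slot: value}} into a marked language sequence."""
    parts = []
    for domain, slots in state.items():
        parts.append(f"<domain> {domain}")
        for slot, value in slots.items():
            parts.append(f"<slot> {slot} <value> {value}")
    return " ".join(parts)

state = {"hotel": {"area": "centre", "stars": "4"},
         "taxi": {"destination": "cambridge station"}}
target = serialize_state(state)
print(target)
# "<domain> hotel <slot> area <value> centre <slot> stars <value> 4 <domain> taxi ..."
# A generative tracker is trained to emit `target` from the full dialogue
# history, so no ontology enumerating all possible slot values is required.
```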
Reinforcement Learning for Generative AI: A Survey
Deep Generative AI has been a long-standing essential topic in the machine
learning community, which can impact a number of application areas like text
generation and computer vision. The major paradigm to train a generative model
is maximum likelihood estimation, which pushes the learner to capture and
approximate the target data distribution by decreasing the divergence between
the model distribution and the target distribution. This formulation
successfully establishes the objective of generative tasks, but it cannot
satisfy all the requirements a user might have of a generative model.
Reinforcement learning, which injects new training signals by defining new
objectives, has demonstrated its power and flexibility to incorporate human
inductive bias from multiple angles, such as adversarial learning,
hand-designed rules, and learned reward models, to build performant models.
Reinforcement learning has thereby become a trending research field and has
stretched the limits of generative AI in both model design and application,
and the advances of recent years merit a comprehensive review. Whereas recent
surveys cover individual application areas, this one offers a high-level
review spanning a range of them. We provide a rigorous taxonomy of the area
and broad coverage of models and applications, including the fast-developing
field of large language models. We conclude by outlining directions that may
address the limitations of current models and expand the frontiers of
generative AI.
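The recipe the survey covers can be illustrated with a minimal policy-gradient fine-tuning step: a generator is updated to increase the expected score of a reward signal, here a learned reward model. The `generator.sample` and `reward_model` interfaces below are hypothetical placeholders, not any specific library's API.

```python
# Minimal REINFORCE-style fine-tuning sketch for a generative model against a
# learned reward model. The generator/reward-model interfaces are assumptions.
import torch

def reinforce_step(generator, reward_model, prompts, optimizer):
    # Sample a response per prompt and keep per-token log-probabilities.
    responses, log_probs = generator.sample(prompts)      # log_probs: (batch, seq_len)
    with torch.no_grad():
        rewards = reward_model(prompts, responses)        # (batch,) scalar rewards
    baseline = rewards.mean()                             # simple variance-reduction baseline
    advantage = rewards - baseline
    # REINFORCE: maximize expected reward, i.e. minimize -A * sum(log pi)
    loss = -(advantage.unsqueeze(1) * log_probs).sum(dim=1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item(), rewards.mean().item()
```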
A Conditional Generative Chatbot using Transformer Model
A Chatbot serves as a communication tool between a human user and a machine
to provide an appropriate answer based on the human input. In more recent
approaches, a combination of Natural Language Processing and sequential models
is used to build a generative Chatbot. The main challenge of these models is
their sequential nature, which leads to less accurate results. To tackle this
challenge, in this paper, a novel end-to-end architecture is proposed using
conditional Wasserstein Generative Adversarial Networks and a transformer model
for answer generation in Chatbots. While the generator of the proposed model
consists of a full transformer model to generate an answer, the discriminator
includes only the encoder part of a transformer model followed by a classifier.
To the best of our knowledge, this is the first time that a generative Chatbot
is proposed using the embedded transformer in both generator and discriminator
models. Leveraging the parallel computation of the transformer, the proposed
model outperforms state-of-the-art alternatives on the Cornell Movie-Dialog
corpus and the Chit-Chat datasets under different evaluation metrics.
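The adversarial setup described above can be sketched as follows: a critic built from a transformer encoder plus a linear head scores (question, answer) pairs, and generator and critic are trained with Wasserstein-style losses. Module names, sizes, and token handling are illustrative assumptions, not the paper's architecture.

```python
# Illustrative conditional-WGAN critic and loss shapes; all details are assumptions.
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Transformer-encoder discriminator scoring (question, answer) pairs."""
    def __init__(self, vocab_size=32000, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.score = nn.Linear(d_model, 1)

    def forward(self, question_ids, answer_ids):
        x = self.embed(torch.cat([question_ids, answer_ids], dim=1))
        h = self.encoder(x).mean(dim=1)          # pool over the joint sequence
        return self.score(h).squeeze(-1)         # unbounded Wasserstein score

def wgan_losses(critic, question_ids, real_answer_ids, fake_answer_ids):
    # Note: with discretely sampled fake tokens, gradients do not flow back to
    # the generator this way; practical systems pass soft token distributions
    # or use policy gradients. Only the loss shapes are shown here.
    real = critic(question_ids, real_answer_ids)
    fake = critic(question_ids, fake_answer_ids)
    d_loss = fake.mean() - real.mean()           # critic separates real from fake
    g_loss = -fake.mean()                        # generator raises fake scores
    return d_loss, g_loss
```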
Enriching Conversation Context in Retrieval-based Chatbots
Work on retrieval-based chatbots, like most sequence pair matching tasks, can
be divided into Cross-encoders that perform word matching over the pair, and
Bi-encoders that encode the pair separately. The former has better performance;
however, since candidate responses cannot be encoded offline, it is also much
slower. Lately, multi-layer transformer architectures pre-trained as language
models have been used to great effect on a variety of natural language
processing and information retrieval tasks. Recent work has shown that these
language models can be used in text-matching scenarios to create Bi-encoders
that perform almost as well as Cross-encoders while having a much faster
inference speed. In this paper, we expand upon this work by developing a
sequence matching architecture that utilizes the entire training set as a
makeshift knowledge-base during inference. We perform detailed experiments
demonstrating that this architecture can be used to further improve Bi-encoder
performance while still maintaining a relatively high inference speed.
Comment: 8 pages, 1 figure, 3 tables
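The Cross-encoder/Bi-encoder distinction can be sketched as below: a Bi-encoder encodes candidates once offline and scores them with a dot product at query time, whereas a Cross-encoder must encode every (context, candidate) pair jointly. The placeholder `encode` function and scoring are toy stand-ins for a real pretrained encoder, not this paper's model.

```python
# Toy contrast of bi-encoder vs cross-encoder scoring; the encoder is a placeholder.
import numpy as np

def encode(text: str) -> np.ndarray:
    # Placeholder encoder: replace with a real language-model encoder.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(128)
    return v / np.linalg.norm(v)

# Bi-encoder: candidate responses are encoded once, offline, and cached.
candidates = ["Sure, see you at noon.", "I prefer tea, thanks.", "The build failed again."]
candidate_vecs = np.stack([encode(c) for c in candidates])

def bi_encoder_rank(context: str) -> int:
    q = encode(context)                        # only the context is encoded at query time
    return int(np.argmax(candidate_vecs @ q))  # fast dot-product scoring

def cross_encoder_rank(context: str) -> int:
    # Cross-encoder: each (context, candidate) pair is encoded jointly at query
    # time; more accurate in practice, but nothing can be precomputed.
    scores = [encode(context + " [SEP] " + c).sum() for c in candidates]
    return int(np.argmax(scores))

print(bi_encoder_rank("When should we meet tomorrow?"))
```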
A Survey of Knowledge-Enhanced Text Generation
The goal of text generation is to make machines express in human language. It
is one of the most important yet challenging tasks in natural language
processing (NLP). Since 2014, various neural encoder-decoder models pioneered
by Seq2Seq have been proposed to achieve the goal by learning to map input text
to output text. However, the input text alone often provides limited knowledge
to generate the desired output, so the performance of text generation is still
far from satisfactory in many real-world scenarios. To address this issue,
researchers have considered incorporating various forms of knowledge beyond the
input text into the generation models. This research direction is known as
knowledge-enhanced text generation. In this survey, we present a comprehensive
review of the research on knowledge-enhanced text generation over the past five
years. The main content includes two parts: (i) general methods and
architectures for integrating knowledge into text generation; (ii) specific
techniques and applications according to different forms of knowledge data.
This survey is intended for a broad audience of researchers and practitioners
in academia and industry.
Comment: 42 pages, 12 tables, 8 figures; under review at ACM CSUR (revised manuscript)
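One common pattern in this line of work is to retrieve relevant facts and concatenate them to the source text before encoding, so the generator conditions on knowledge beyond the input. The toy retriever and the `[KNOWLEDGE]`/`[SOURCE]` markers below are illustrative assumptions, not a method from the survey.

```python
# Illustrative retrieve-and-augment input construction for knowledge-enhanced generation.
from typing import List

def retrieve_knowledge(source: str, kb: List[str], top_k: int = 2) -> List[str]:
    # Toy lexical-overlap retriever standing in for a real retriever.
    src_tokens = set(source.lower().split())
    scored = sorted(kb, key=lambda fact: -len(src_tokens & set(fact.lower().split())))
    return scored[:top_k]

def build_model_input(source: str, kb: List[str]) -> str:
    facts = retrieve_knowledge(source, kb)
    # The generator sees retrieved facts and the source as one augmented sequence.
    return "[KNOWLEDGE] " + " | ".join(facts) + " [SOURCE] " + source

kb = ["Paris is the capital of France.",
      "The Seine flows through Paris.",
      "Mount Fuji is in Japan."]
print(build_model_input("Write a sentence about visiting Paris.", kb))
```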
Dialogue History Matters! Personalized Response Selection in Multi-turn Retrieval-based Chatbots
Existing multi-turn context-response matching methods mainly concentrate on
obtaining multi-level and multi-dimension representations and better
interactions between context utterances and response. However, in real-world
conversation scenarios, whether a response candidate is suitable depends not
only on the given dialogue context but also on other background information,
e.g., the user's wording habits and user-specific dialogue history. To fill the gap between these
up-to-date methods and the real-world applications, we incorporate
user-specific dialogue history into the response selection and propose a
personalized hybrid matching network (PHMN). Our contributions are two-fold: 1)
our model extracts personalized wording behaviors from user-specific dialogue
history as extra matching information; 2) we perform hybrid representation
learning on context-response utterances and explicitly incorporate a customized
attention mechanism to extract vital information from context-response
interactions so as to improve the accuracy of matching. We evaluate our model
on two large datasets with user identification, i.e., personalized Ubuntu
dialogue Corpus (P-Ubuntu) and personalized Weibo dataset (P-Weibo).
Experimental results confirm that our method significantly outperforms several
strong models by combining personalized attention, wording behaviors, and
hybrid representation learning.
Comment: Accepted by ACM Transactions on Information Systems, 25 pages, 2 figures, 9 tables
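In the spirit of the model above, the sketch below combines a context-response matching score with a score derived from attention over the user's own dialogue-history utterances, capturing personalized wording behavior. The shapes and fusion scheme are illustrative assumptions, not the PHMN architecture.

```python
# Illustrative personalized matcher: context-response matching plus
# user-history attention; all modules and dimensions are assumptions.
import torch
import torch.nn as nn

class PersonalizedMatcher(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.ctx_score = nn.Bilinear(dim, dim, 1)      # context-response matching
        self.hist_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.hist_score = nn.Bilinear(dim, dim, 1)     # personalized wording match
        self.fuse = nn.Linear(2, 1)

    def forward(self, context_vec, response_vec, history_vecs):
        # context_vec, response_vec: (batch, dim); history_vecs: (batch, n_hist, dim)
        s_ctx = self.ctx_score(context_vec, response_vec)
        q = response_vec.unsqueeze(1)
        hist_summary, _ = self.hist_attn(q, history_vecs, history_vecs)
        s_hist = self.hist_score(hist_summary.squeeze(1), response_vec)
        return self.fuse(torch.cat([s_ctx, s_hist], dim=-1)).squeeze(-1)
```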
A Survey of Natural Language Generation
This paper offers a comprehensive review of the research on Natural Language
Generation (NLG) over the past two decades, especially in relation to
data-to-text generation and text-to-text generation deep learning methods, as
well as new applications of NLG technology. This survey aims to (a) give the
latest synthesis of deep learning research on the NLG core tasks, as well as
the architectures adopted in the field; (b) detail meticulously and
comprehensively various NLG tasks and datasets, and draw attention to the
challenges in NLG evaluation, focusing on different evaluation methods and
their relationships; (c) highlight future areas of emphasis and relatively recent
research issues that arise from the increasing synergy between NLG and other
artificial intelligence areas, such as computer vision, text, and computational
creativity.
Comment: Accepted by ACM Computing Surveys (CSUR) 202
Teacher-Student Architecture for Knowledge Distillation: A Survey
Although deep neural networks (DNNs) have shown a strong capacity to solve
large-scale problems in many areas, such DNNs are hard to deploy in real-world
systems because of their enormous number of parameters. To tackle this issue,
Teacher-Student architectures were proposed, where simple student networks with
a few parameters can achieve comparable performance to deep teacher networks
with many parameters. Recently, Teacher-Student architectures have been widely
and effectively adopted for various knowledge distillation (KD)
objectives, including knowledge compression, knowledge expansion, knowledge
adaptation, and knowledge enhancement. With the help of Teacher-Student
architectures, current studies are able to achieve multiple distillation
objectives through lightweight and generalized student networks. Different from
existing KD surveys that primarily focus on knowledge compression, this survey
first explores Teacher-Student architectures across multiple distillation
objectives. This survey presents an introduction to various knowledge
representations and their corresponding optimization objectives. Additionally,
we provide a systematic overview of Teacher-Student architectures with
representative learning algorithms and effective distillation schemes. This
survey also summarizes recent applications of Teacher-Student architectures
across multiple purposes, including classification, recognition, generation,
ranking, and regression. Lastly, potential research directions in KD are
investigated, focusing on architecture design, knowledge quality, and
theoretical studies of regression-based learning. Through this
comprehensive survey, industry practitioners and the academic community can
gain valuable insights and guidelines for effectively designing, learning, and
applying Teacher-Student architectures for various distillation objectives.
Comment: 20 pages. arXiv admin note: substantial text overlap with arXiv:2210.1733
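The classic knowledge-compression objective in Teacher-Student learning can be written as a weighted sum of a soft-label term (matching the teacher's temperature-softened distribution) and a hard-label term (fitting the ground truth). The temperature and weighting below are illustrative choices, not values from the survey.

```python
# Minimal soft-label distillation loss; temperature and weighting are illustrative.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: KL divergence between softened teacher and student outputs.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                      # scale keeps gradient magnitudes comparable
    # Hard targets: usual cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student_logits = torch.randn(8, 10)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student_logits, teacher_logits, labels))
```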