Personalizing Dialogue Agents via Meta-Learning
Existing personalized dialogue models use human designed persona descriptions
to improve dialogue consistency. Collecting such descriptions from existing
dialogues is expensive and requires hand-crafted feature designs. In this
paper, we propose to extend Model-Agnostic Meta-Learning (MAML) (Finn et al.,
2017) to personalized dialogue learning without using any persona descriptions.
Our model learns to quickly adapt to new personas by leveraging only a few
dialogue samples collected from the same user, which is fundamentally different
from conditioning the response on the persona descriptions. Empirical results
on the Persona-chat dataset (Zhang et al., 2018) indicate that our solution
outperforms non-meta-learning baselines on automatic evaluation metrics and
in terms of human-evaluated fluency and consistency.
Comment: Accepted in ACL 2019. Zhaojiang Lin* and Andrea Madotto* contributed
equally to this work.
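The adaptation idea in this abstract can be illustrated with a minimal MAML sketch. It is not the authors' implementation: a one-parameter linear regressor stands in for the dialogue model, each toy task plays the role of one persona, and the names `adapt`, `maml_step`, `alpha`, and `beta` are hypothetical. Because the model is a scalar, the second-order term of the meta-gradient can be written out by hand.

```python
import random

def loss(w, data):
    """Mean squared error of the model y = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def grad(w, data):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * x * (w * x - y) for x, y in data) / len(data)

def hess(data):
    """Second derivative of the mean squared error (independent of w)."""
    return sum(2 * x * x for x, _ in data) / len(data)

def adapt(w, support, alpha):
    """Inner loop: one gradient step on a task's few support samples."""
    return w - alpha * grad(w, support)

def maml_step(w, tasks, alpha, beta):
    """Outer loop: update the shared initialization w.

    Each task is a (support, query) pair. The meta-gradient is the query
    gradient at the adapted weight, multiplied by d(adapted w)/dw, which
    for this scalar model equals 1 - alpha * hess(support).
    """
    meta_grad = 0.0
    for support, query in tasks:
        w_task = adapt(w, support, alpha)
        meta_grad += grad(w_task, query) * (1 - alpha * hess(support))
    return w - beta * meta_grad / len(tasks)

def sample_task(rng, n=5):
    """Toy task (assumption): a random slope stands in for a persona."""
    slope = rng.uniform(1.0, 3.0)
    points = [(x, slope * x) for x in (rng.uniform(-1, 1) for _ in range(2 * n))]
    return points[:n], points[n:]

if __name__ == "__main__":
    rng = random.Random(0)
    w = 0.0
    for _ in range(200):
        w = maml_step(w, [sample_task(rng) for _ in range(8)], alpha=0.1, beta=0.05)
    support, query = sample_task(rng)
    print("query loss before adaptation:", loss(w, query))
    print("query loss after adaptation: ", loss(adapt(w, support, 0.1), query))
```

In a real dialogue setting the scalar `w` would be the full set of model parameters and the hand-written Hessian factor would be replaced by automatic differentiation through the inner step.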
FedPC: Federated Learning for Language Generation with Personal and Context Preference Embeddings
Federated learning is a training paradigm that learns from multiple
distributed users without aggregating data on a centralized server. Such a
paradigm promises the ability to deploy machine-learning at-scale to a diverse
population of end-users without first collecting a large, labeled dataset for
all possible tasks. As federated learning typically averages learning updates
across a decentralized population, there is a growing need for personalization
of federated learning systems (i.e., conversational agents must be able to
personalize to a specific user's preferences). In this work, we propose a new
direction for personalization research within federated learning, leveraging
both personal embeddings and shared context embeddings. We also present an
approach to predict these "preference" embeddings, enabling personalization
without backpropagation. Compared to state-of-the-art personalization
baselines, our approach achieves a 50% improvement in test-time perplexity
using 0.001% of the memory required by baseline approaches, while also achieving
greater sample- and compute-efficiency.
Comment: Andrew Silva and Pradyumna Tambwekar contributed equally towards this
work.
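The averaging step that this abstract describes as typical of federated learning can be sketched as follows. This shows only the generic weighted average of client updates (the baseline behavior that motivates personalization), not FedPC's preference embeddings; the function name `federated_average` and the flat-list representation of an update are assumptions made for illustration.

```python
from typing import List, Sequence

def federated_average(
    client_updates: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Average client updates, weighting each by its local sample count."""
    if not client_updates or len(client_updates) != len(client_sizes):
        raise ValueError("need one sample count per client update")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_updates[0])
    averaged = [0.0] * dim
    for update, size in zip(client_updates, client_sizes):
        if len(update) != dim:
            raise ValueError("all updates must have the same length")
        weight = size / total
        for i, value in enumerate(update):
            averaged[i] += weight * value
    return averaged

if __name__ == "__main__":
    # Two hypothetical clients; the second holds three times as much data.
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

Every client receives the same averaged result, which is why a single shared model cannot reflect any one user's preferences without an additional personalization mechanism.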
Personalizing Task-oriented Dialog Systems via Zero-shot Generalizable Reward Function
Task-oriented dialog systems enable users to accomplish tasks using natural
language. State-of-the-art systems respond to users in the same way regardless
of their personalities, although personalizing dialogues can lead to higher
levels of adoption and better user experiences. Building personalized dialog
systems is an important yet challenging endeavor, and only a handful of works
have taken on the challenge. Most existing works rely on supervised learning
approaches and require laborious and expensive labeled training data for each
user profile. Additionally, collecting and labeling data for each user profile
is virtually impossible. In this work, we propose a novel framework, P-ToD, to
personalize task-oriented dialog systems capable of adapting to a wide range of
user profiles in an unsupervised fashion using a zero-shot generalizable reward
function. P-ToD uses a pre-trained GPT-2 as a backbone model and works in three
phases. Phase one performs task-specific training. Phase two kicks off
unsupervised personalization by leveraging the proximal policy optimization
algorithm that performs policy gradients guided by the zero-shot generalizable
reward function. Our novel reward function can quantify the quality of the
generated responses even for unseen profiles. The optional final phase
fine-tunes the personalized model using a few labeled training examples. We
conduct extensive experimental analysis using the personalized bAbI dialogue
benchmark for five tasks and up to 180 diverse user profiles. The experimental
results demonstrate that P-ToD, even when it had access to zero labeled
examples, outperforms state-of-the-art supervised personalization models and
achieves competitive performance on BLEU and ROUGE metrics when compared to a
strong fully-supervised GPT-2 baseline.
Comment: 11 pages, 4 tables, 31st ACM International Conference on Information
and Knowledge Management (CIKM'22).
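The policy-gradient step in phase two relies on proximal policy optimization. A minimal sketch of the standard PPO clipped surrogate objective is given below. The abstract does not specify the reward function, so the advantages here are plain numbers supplied by the caller; `ppo_clipped_objective` and `clip_eps` are hypothetical names, and this is not P-ToD's code.

```python
import math
from typing import Sequence

def ppo_clipped_objective(
    new_logps: Sequence[float],
    old_logps: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Mean clipped surrogate objective (to be maximized).

    new_logps / old_logps are log-probabilities of the sampled actions
    under the current and the data-collecting policy, respectively.
    """
    if not advantages or not (len(new_logps) == len(old_logps) == len(advantages)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        ratio = math.exp(new_lp - old_lp)  # probability ratio
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the minimum removes any incentive to move the ratio
        # outside [1 - clip_eps, 1 + clip_eps].
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

if __name__ == "__main__":
    # A response whose probability rose sharply, with a positive advantage:
    # the gain is capped at (1 + clip_eps) * advantage.
    print(ppo_clipped_objective([0.5], [0.0], [1.0]))
```

In a dialogue setting the advantage would be derived from the reward assigned to a generated response, so clipping limits how far a single batch of rewarded responses can move the language model away from its previous behavior.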
Improving Search through A3C Reinforcement Learning based Conversational Agent
We develop a reinforcement-learning-based search assistant that assists
users through a set of actions and a sequence of interactions to help them
realize their intent. Our approach caters to subjective search, where the user
is seeking digital assets such as images, which is fundamentally different from
tasks that have objective and limited search modalities. Labeled
conversational data is generally not available in such search tasks, and
training the agent through human interactions can be time-consuming. We propose
a stochastic virtual user that impersonates a real user and can be used to
sample user behavior efficiently, which accelerates the bootstrapping of the
agent. We develop a context-preserving architecture based on the A3C algorithm
that enables the agent to provide contextual assistance to the
user. We compare the A3C agent with Q-learning and evaluate its performance on
the average rewards and state values it obtains with the virtual user in
validation episodes. Our experiments show that the agent learns to achieve
higher rewards and better states.
Comment: 17 pages, 7 figures
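A3C-style agents are usually trained on n-step returns and advantages computed from a short rollout. The sketch below shows that generic computation only; the reward values, the state-value estimates, and the function name are assumptions, and nothing here reproduces the paper's virtual user or architecture.

```python
from typing import List, Sequence, Tuple

def n_step_returns_and_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    bootstrap_value: float,
    gamma: float = 0.99,
) -> Tuple[List[float], List[float]]:
    """Compute discounted returns and advantages for one rollout.

    rewards[t] and values[t] belong to step t; bootstrap_value is the
    critic's estimate for the state after the last step (0.0 if the
    episode ended there).
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    # Walk backwards so each return reuses the one that follows it.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    advantages = [ret - val for ret, val in zip(returns, values)]
    return returns, advantages

if __name__ == "__main__":
    rets, advs = n_step_returns_and_advantages([1.0, 1.0], [0.5, 0.5], 0.0, gamma=0.5)
    print(rets, advs)
```

The advantages weight the policy-gradient term for the actor, while the returns serve as regression targets for the critic's state values.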
Bilateral Personalized Dialogue Generation with Dynamic Persona-Aware Fusion
Generating personalized responses is one of the major challenges in natural
human-robot interaction. Current research in this field mainly focuses on
generating responses consistent with the robot's pre-assigned persona, while
ignoring the user's persona. Such responses may be inappropriate or even
offensive, which may lead to a bad user experience. Therefore, we propose a
bilateral personalized dialogue generation (BPDG) method with dynamic
persona-aware fusion via multi-task transfer learning to generate responses
consistent with both personas. The proposed method aims to accomplish three
learning tasks: 1) an encoder is trained with dialogue utterances augmented with
the corresponding personalized attributes and relative position (language model
task), 2) a dynamic persona-aware fusion module predicts the persona presence
to adaptively fuse the contextual and bilateral persona encodings (persona
prediction task) and 3) a decoder generates natural, fluent and personalized
responses (dialogue generation task). To make the generated responses more
personalized and bilateral persona-consistent, the Conditional Mutual
Information Maximum (CMIM) criterion is adopted to select the final response
from the generated candidates. The experimental results show that the proposed
method outperforms several state-of-the-art methods in terms of both automatic
and manual evaluations.
Comment: 14 pages, 6 figures