Dialog Action-Aware Transformer for Dialog Policy Learning
Recent works usually address dialog policy learning (DPL) by training a
reinforcement learning (RL) agent to determine the best dialog action. However,
existing works on deep RL require a large volume of agent-user interactions to
achieve acceptable performance. In this paper, we propose to make full use of
the plain text knowledge from the pre-trained language model to accelerate the
RL agent's learning speed. Specifically, we design a dialog action-aware
transformer encoder (DaTrans), which integrates a new fine-tuning procedure
named the masked last action task, which encourages DaTrans to be dialog-aware
and to distill action-specific features. Then, DaTrans is further optimized in an RL
setting with ongoing interactions and evolves through exploration in the dialog
action space toward maximizing long-term accumulated rewards. The effectiveness
and efficiency of the proposed model are demonstrated with both simulator
evaluation and human evaluation.
Comment: To appear in SIGdial 202
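The masked last action objective described above can be illustrated with a minimal sketch: the final dialog action in a turn sequence is replaced by a mask token, and the encoder is trained to recover it. The names below (MASK_TOKEN, mask_last_action, the example action strings) are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch of a "masked last action" fine-tuning example:
# hide the final dialog action and use it as the prediction target.
MASK_TOKEN = "[MASK]"

def mask_last_action(turn_sequence):
    """Replace the last dialog action with a mask; return (masked, target)."""
    masked = list(turn_sequence)
    target = masked[-1]
    masked[-1] = MASK_TOKEN
    return masked, target

seq = ["greet", "request(area)", "inform(price)", "offer(restaurant)"]
masked, target = mask_last_action(seq)
print(masked)  # ['greet', 'request(area)', 'inform(price)', '[MASK]']
print(target)  # offer(restaurant)
```

In a full pipeline, each (masked, target) pair would supply one training example for a cross-entropy loss over the dialog action vocabulary; the sketch only shows how the supervision signal is constructed.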
Why Guided Dialog Policy Learning performs well? Understanding the role of adversarial learning and its alternative
Dialog policies, which determine a system's action based on the current state
at each dialog turn, are crucial to the success of the dialog. In recent years,
reinforcement learning (RL) has emerged as a promising option for dialog policy
learning (DPL). In RL-based DPL, dialog policies are updated according to
rewards. The manual construction of fine-grained rewards, such as
state-action-based ones, to effectively guide the dialog policy is challenging
in multi-domain task-oriented dialog scenarios with numerous state-action pair
combinations. One way to estimate rewards from collected data is to train the
reward estimator and dialog policy simultaneously using adversarial learning
(AL). Although this method has demonstrated superior performance
experimentally, it is fraught with the inherent problems of AL, such as mode
collapse. This paper first identifies the role of AL in DPL through detailed
analyses of the objective functions of dialog policy and reward estimator.
Next, based on these analyses, we propose a method that eliminates AL from
reward estimation and DPL while retaining its advantages. We evaluate our
method using MultiWOZ, a multi-domain task-oriented dialog corpus.
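The adversarial setup the abstract refers to can be sketched as follows: a discriminator D(s, a) is trained to score expert state-action pairs near 1 and policy-generated pairs near 0, and the dialog policy receives log D(s, a) as its RL reward. This is a generic illustration of adversarial reward estimation, not the paper's implementation or its proposed alternative.

```python
import math

# Minimal sketch: turn a discriminator probability into a policy reward.
def adversarial_reward(discriminator_score, eps=1e-8):
    """Reward r(s, a) = log D(s, a) for a score in (0, 1)."""
    return math.log(discriminator_score + eps)

# An expert-like action (score near 1) earns a reward near 0;
# an off-distribution action (score near 0) is strongly penalized.
print(adversarial_reward(0.9) > adversarial_reward(0.1))  # True
```

Because the discriminator is trained jointly against the evolving policy, this loop inherits GAN-style failure modes such as mode collapse, which is the motivation the abstract gives for removing AL from reward estimation.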
Reinforced Natural Language Interfaces via Entropy Decomposition
In this paper, we study the technical problem of developing conversational
agents that can quickly adapt to unseen tasks, learn task-specific
communication tactics, and help listeners finish complex, temporally extended
tasks. We find that the uncertainty of language learning can be decomposed to
an entropy term and a mutual information term, corresponding to the structural
and functional aspect of language, respectively. Combined with reinforcement
learning, our method automatically requests human samples for training when
adapting to new tasks and learns communication protocols that are succinct and
helpful for task completion. Human and simulation test results on a referential
game and a 3D navigation game demonstrate the effectiveness of the proposed method.
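The decomposition the abstract relies on is the standard information-theoretic identity H(L) = H(L|T) + I(L;T): total uncertainty splits into a conditional-entropy term and a mutual-information term. The check below verifies this identity on an invented joint distribution over (language, task) outcomes; it is a numerical illustration, not the paper's method.

```python
import math

# Made-up joint distribution p(l, t) over language and task variables.
joint = {
    ("short", "near"): 0.3, ("short", "far"): 0.1,
    ("long",  "near"): 0.2, ("long",  "far"): 0.4,
}

def H(p):
    """Shannon entropy in bits of a distribution given as a dict."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

# Marginals p(l) and p(t).
p_l, p_t = {}, {}
for (l, t), v in joint.items():
    p_l[l] = p_l.get(l, 0.0) + v
    p_t[t] = p_t.get(t, 0.0) + v

H_l = H(p_l)
H_l_given_t = H(joint) - H(p_t)          # H(L|T) = H(L,T) - H(T)
I_lt = H_l + H(p_t) - H(joint)           # I(L;T) = H(L) + H(T) - H(L,T)

# The entropy term plus the mutual-information term recovers H(L).
assert abs(H_l - (H_l_given_t + I_lt)) < 1e-9
```

In the abstract's terms, the entropy component corresponds to the structural aspect of language and the mutual-information component to its functional (task-relevant) aspect.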
JoTR: A Joint Transformer and Reinforcement Learning Framework for Dialog Policy Learning
Dialogue policy learning (DPL) is a crucial component of dialogue modelling.
Its primary role is to determine the appropriate abstract response, commonly
referred to as the "dialogue action". Traditional DPL methodologies have
treated this as a sequential decision problem, using pre-defined action
candidates extracted from a corpus. However, these incomplete candidates can
significantly limit the diversity of responses and pose challenges when dealing
with edge cases, which are scenarios that occur only at extreme operating
parameters. To address these limitations, we introduce a novel framework, JoTR.
This framework is unique as it leverages a text-to-text Transformer-based model
to generate flexible dialogue actions. Unlike traditional methods, JoTR
formulates a word-level policy that allows for a more dynamic and adaptable
dialogue action generation, without the need for any action templates. This
setting enhances the diversity of responses and improves the system's ability
to handle edge cases effectively. In addition, JoTR employs reinforcement
learning with a reward-shaping mechanism to efficiently finetune the word-level
dialogue policy, which allows the model to learn from its interactions,
improving its performance over time. Our extensive evaluation shows that JoTR
achieves state-of-the-art performance on two benchmark dialogue modelling
tasks, as assessed by both user simulators and human evaluators.
Comment: Our code, models and other related resources are publicly available
at https://github.com/KwanWaiChung/JoT
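The reward-shaping mechanism mentioned in the JoTR abstract can be illustrated with the standard potential-based form r'(s, s') = r(s, s') + gamma * phi(s') - phi(s), which densifies a sparse dialog reward without changing the optimal policy. Whether JoTR uses exactly this form is an assumption here; the function and values below are invented for illustration.

```python
# Sketch of potential-based reward shaping: a potential phi over dialog
# states supplements a sparse task reward with a dense progress signal.
def shaped_reward(r, phi_s, phi_next, gamma=0.99):
    """Return r + gamma * phi(s') - phi(s)."""
    return r + gamma * phi_next - phi_s

# A non-terminal turn with zero task reward still gets credit for
# moving from a low-potential state (0.0) to a higher-potential one (1.0).
print(shaped_reward(0.0, 0.0, 1.0))  # 0.99
```

For a word-level policy, where each generated token is one RL step, this kind of dense intermediate signal is what makes fine-tuning tractable despite the long action sequences.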
A Survey of the Evolution of Language Model-Based Dialogue Systems
Dialogue systems, including task-oriented dialogue systems (TOD) and
open-domain dialogue systems (ODD), have undergone significant transformations,
with language models (LMs) playing a central role. This survey delves into the
historical trajectory of dialogue systems, elucidating their intricate
relationship with advancements in language models by categorizing this
evolution into four distinct stages, each marked by pivotal LM breakthroughs:
1) the early stage, characterized by statistical LMs, resulting in rule-based or
machine-learning-driven dialogue systems; 2) the independent development of TOD and
ODD based on neural language models (NLMs; e.g., LSTM and GRU), since NLMs lack
intrinsic knowledge in their parameters; 3) the fusion of different types of
dialogue systems with the advent of pre-trained language models (PLMs),
starting from the fusion of the four sub-tasks within TOD, and then of
TOD with ODD; and 4) the current LLM-based dialogue systems, wherein LLMs can
conduct TOD and ODD seamlessly. Our survey thus provides a
chronological perspective aligned with LM breakthroughs, offering a
comprehensive review of state-of-the-art research outcomes. Moreover, we
focus on emerging topics and discuss open challenges, providing valuable
insights into future directions for LLM-based dialogue systems. Through this
exploration, we pave the way for a deeper comprehension of the evolution,
guiding future developments in LM-based dialogue systems.