Rethinking Supervised Learning and Reinforcement Learning in Task-Oriented Dialogue Systems
Dialogue policy learning for task-oriented dialogue systems has recently enjoyed great
progress, mostly through reinforcement learning methods. However, these approaches
have become very sophisticated, and it is time to re-evaluate them. Are we really
making progress by developing dialogue agents based only on reinforcement learning?
We demonstrate how (1) traditional supervised learning and (2) a simulator-free
adversarial learning method can be used to achieve performance comparable to
state-of-the-art RL-based methods. First, we introduce a simple dialogue action
decoder to predict the appropriate actions. Then, the traditional multi-label
classification solution for dialogue policy learning is extended with dense layers
to improve dialogue agent performance. Finally, we employ the Gumbel-Softmax
estimator to alternately train the dialogue agent and the dialogue reward model
without using reinforcement learning. Based on our extensive experimentation, we
conclude that the proposed methods achieve more stable and higher performance with
less effort, avoiding both the domain knowledge required to design a user simulator
and the intractable parameter tuning of reinforcement learning. Our main goal is not
to beat reinforcement learning with supervised learning, but to demonstrate the value
of rethinking the roles of reinforcement learning and supervised learning in
optimizing task-oriented dialogue systems.
Comment: 10 pages
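The abstract describes a simulator-free adversarial scheme in which the Gumbel-Softmax estimator makes sampled dialogue actions differentiable, so the policy and the reward model can be trained alternately without a reinforcement learning estimator. Below is a minimal PyTorch sketch of that idea; the network shapes, losses, and names are assumptions for illustration, not the authors' implementation.

# Hedged sketch: alternating adversarial training of a dialogue policy and a
# reward model via the Gumbel-Softmax estimator. All sizes and module names
# are illustrative assumptions, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, NUM_ACTIONS = 128, 32  # assumed dimensions

policy = nn.Sequential(            # dialogue action decoder: state -> action logits
    nn.Linear(STATE_DIM, 256), nn.ReLU(), nn.Linear(256, NUM_ACTIONS))
reward_model = nn.Sequential(      # discriminator: (state, one-hot action) -> realism score
    nn.Linear(STATE_DIM + NUM_ACTIONS, 256), nn.ReLU(), nn.Linear(256, 1))

opt_pi = torch.optim.Adam(policy.parameters(), lr=1e-4)
opt_rw = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

def train_step(states, expert_actions, tau=1.0):
    """One alternating update; states: (B, STATE_DIM), expert_actions: (B,) int labels."""
    # 1) update the reward model: expert actions are "real", policy samples are "fake"
    with torch.no_grad():
        fake = F.gumbel_softmax(policy(states), tau=tau, hard=True)
    real = F.one_hot(expert_actions, NUM_ACTIONS).float()
    d_real = reward_model(torch.cat([states, real], dim=-1))
    d_fake = reward_model(torch.cat([states, fake], dim=-1))
    loss_rw = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
               + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_rw.zero_grad(); loss_rw.backward(); opt_rw.step()

    # 2) update the policy: gradients flow through the differentiable
    #    Gumbel-Softmax sample, so no policy-gradient (RL) estimator is needed
    soft_action = F.gumbel_softmax(policy(states), tau=tau, hard=False)
    d_gen = reward_model(torch.cat([states, soft_action], dim=-1))
    loss_pi = F.binary_cross_entropy_with_logits(d_gen, torch.ones_like(d_gen))
    opt_pi.zero_grad(); loss_pi.backward(); opt_pi.step()
    return loss_rw.item(), loss_pi.item()

The design point is that the straight-through Gumbel-Softmax sample replaces the non-differentiable action choice, so the reward model's signal reaches the policy by ordinary backpropagation rather than through a sampled return.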
Reinforcement Learning for Generative AI: A Survey
Deep generative AI has long been an essential topic in the machine learning
community, with impact on application areas such as text generation and computer
vision. The major paradigm for training a generative model is maximum likelihood
estimation, which pushes the learner to capture and approximate the target data
distribution by decreasing the divergence between the model distribution and the
target distribution. This formulation successfully establishes the objective of
generative tasks, but it cannot satisfy all the requirements that a user might
expect from a generative model. Reinforcement learning, serving as a competitive
option for injecting new training signals through additional objectives, has
demonstrated its power and flexibility in incorporating human inductive biases from
multiple angles, such as adversarial learning, hand-designed rules, and learned
reward models. As a result, reinforcement learning has become a trending research
field and has stretched the limits of generative AI in both model design and
application. It is therefore reasonable to summarize recent advances in a
comprehensive review. Although recent surveys cover individual application areas,
this survey aims to provide a high-level review that spans a range of application
areas. We provide a rigorous taxonomy in this area and broad coverage of various
models and applications. Notably, we also survey the fast-developing area of large
language models. We conclude this survey by outlining potential directions that
might address the limitations of current models and expand the frontiers of
generative AI.
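The abstract's description of maximum likelihood training as divergence minimization can be made precise: maximizing the expected log-likelihood under the data distribution is equivalent to minimizing the forward KL divergence from the data distribution to the model distribution. A standard derivation, in notation assumed here rather than taken from the survey:

\[
\max_{\theta}\,\mathbb{E}_{x\sim p_{\mathrm{data}}}\!\left[\log p_{\theta}(x)\right]
\;=\;\max_{\theta}\,\bigl(-\,\mathrm{KL}\!\left(p_{\mathrm{data}}\,\|\,p_{\theta}\right)-H\!\left(p_{\mathrm{data}}\right)\bigr)
\;\Longleftrightarrow\;
\min_{\theta}\,\mathrm{KL}\!\left(p_{\mathrm{data}}\,\|\,p_{\theta}\right),
\]

since the data entropy \(H(p_{\mathrm{data}})\) does not depend on \(\theta\). RL-style objectives inject training signals that this divergence term alone does not capture.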
Reinforcement Learning for Generative AI: State of the Art, Opportunities and Open Research Challenges
Generative Artificial Intelligence (AI) is one of the most exciting
developments in Computer Science of the last decade. At the same time,
Reinforcement Learning (RL) has emerged as a very successful paradigm for a
variety of machine learning tasks. In this survey, we discuss the state of the
art, opportunities and open research questions in applying RL to generative AI.
In particular, we discuss three types of applications: RL as an alternative means of
generation without a specified objective; RL as a way of generating outputs while
concurrently maximizing an objective function; and RL as a way of embedding into the
generative process desired characteristics that cannot easily be captured by an
objective function. We
conclude the survey with an in-depth discussion of the opportunities and
challenges in this fascinating emerging area.
Comment: Published in JAIR at https://www.jair.org/index.php/jair/article/view/1527