Goal-oriented Dialogue Policy Learning from Failures
Reinforcement learning methods have been used for learning dialogue policies.
However, learning an effective dialogue policy frequently requires
prohibitively many conversations. This is partly because rewards in dialogues
are sparse and successful dialogues are rare in the early learning phase.
Hindsight experience replay (HER) enables learning from failures, but the
vanilla HER is inapplicable to dialogue learning due to the implicit goals. In
this work, we develop two complex HER methods providing different trade-offs
between complexity and performance, and, for the first time, enable HER-based
dialogue policy learning. Experiments using a realistic user simulator show
that our HER methods outperform existing experience replay methods (as
applied to deep Q-networks) in learning rate.
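The core HER idea the abstract builds on can be sketched in a few lines: failed transitions are relabeled with goals that were actually achieved later in the episode, so sparse-reward failures still produce positive learning signal. This is an illustrative sketch of vanilla HER with the "future" strategy, not the paper's dialogue-specific variants (which must handle implicit goals); all names and the transition format are assumptions.

```python
import random

def her_relabel(episode, reward_fn, k=4):
    """Relabel an episode with hindsight goals.

    episode: list of transition dicts with keys
        state, action, next_state, achieved_goal, goal.
    reward_fn(achieved_goal, goal): sparse reward, e.g. 0 on success, -1 otherwise.
    k: number of hindsight goals sampled per transition ("future" strategy).
    """
    relabeled = []
    for t, tr in enumerate(episode):
        # Keep the original transition with its (typically failing) reward.
        relabeled.append({**tr, "reward": reward_fn(tr["achieved_goal"], tr["goal"])})
        # Sample k goals achieved later in the episode and pretend each
        # was the intended goal, recomputing the reward accordingly.
        future = episode[t:]
        for _ in range(k):
            new_goal = random.choice(future)["achieved_goal"]
            relabeled.append({**tr, "goal": new_goal,
                              "reward": reward_fn(tr["achieved_goal"], new_goal)})
    return relabeled
```

The relabeled transitions can then be stored in an ordinary replay buffer and consumed by an off-policy learner such as a deep Q-network.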
A Policy Adaptation Method for Implicit Multitask Reinforcement Learning Problems
In dynamic motion generation tasks, including contact and collisions, small
changes in policy parameters can lead to extremely different returns. For
example, in soccer, a similar heading motion can send the ball in completely
different directions when the hitting position, the applied force, or the
friction of the ball varies slightly. However, it is
difficult to imagine that completely different skills are needed for heading a
ball in different directions. In this study, we propose a multitask
reinforcement learning algorithm for adapting a policy to implicit changes in
goals or environments in a single motion category with different reward
functions or physical parameters of the environment. We evaluated the proposed
method on the ball heading task using a monopod robot model. The results showed
that the proposed method can adapt to implicit changes in the goal positions or
the coefficients of restitution of the ball, whereas the standard domain
randomization approach cannot cope with different task settings.
Comment: 12 pages, 9 figures
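One common way to realize this kind of adaptation to implicit task changes is to condition a single shared policy on a latent task variable and, when the goal or physics changes, search over the latent rather than retraining the policy. The sketch below illustrates that general pattern with a linear policy as a stand-in for a neural network; the class, its interface, and the adaptation-by-return-comparison step are hypothetical illustrations, not the paper's exact algorithm.

```python
import numpy as np

class LatentConditionedPolicy:
    """Single policy conditioned on a latent task variable z.

    Adapting to an implicit change in the goal or environment amounts to
    picking a new z, leaving the shared parameters untouched.
    """

    def __init__(self, weights, candidate_zs):
        self.weights = weights            # shared policy parameters
        self.candidate_zs = candidate_zs  # latent task embeddings to search over

    def act(self, obs, z):
        # Linear map on [obs, z]; tanh keeps actions bounded in [-1, 1].
        x = np.concatenate([obs, z])
        return np.tanh(self.weights @ x)

    def adapt(self, rollout_fn):
        # Evaluate each candidate latent on the (changed) task and keep
        # the one with the highest return.
        returns = [rollout_fn(z) for z in self.candidate_zs]
        return self.candidate_zs[int(np.argmax(returns))]
```

A standard domain-randomized policy, by contrast, has no such task variable to re-estimate, which is one intuition for why it struggles when the task setting itself shifts.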
Latent Plans for Task-Agnostic Offline Reinforcement Learning
Everyday long-horizon tasks comprising a sequence of multiple implicit
subtasks still pose a major challenge in offline robot control. While a
number of prior methods aimed to address this setting with variants of
imitation and offline reinforcement learning, the learned behavior is typically
narrow and often struggles to reach configurable long-horizon goals. As both
paradigms have complementary strengths and weaknesses, we propose a novel
hierarchical approach that combines the strengths of both methods to learn
task-agnostic long-horizon policies from high-dimensional camera observations.
Concretely, we combine a low-level policy that learns latent skills via
imitation learning and a high-level policy learned from offline reinforcement
learning for skill-chaining the latent behavior priors. Experiments in various
simulated and real robot control tasks show that our formulation enables
producing previously unseen combinations of skills to reach temporally extended
goals by "stitching" together latent skills through goal chaining with an
order-of-magnitude improvement in performance over state-of-the-art baselines.
We even learn one multi-task visuomotor policy for 25 distinct manipulation
tasks in the real world which outperforms both imitation learning and offline
reinforcement learning techniques.
Comment: CoRL 2022. Project website: http://tacorl.cs.uni-freiburg.de
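The hierarchical decomposition described above can be sketched as a high-level policy that emits a latent skill from the current observation and goal, and a frozen low-level policy that decodes (observation, latent) pairs into actions for a fixed horizon. The linear maps, dimensions, and toy dynamics below are illustrative assumptions standing in for the learned networks, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "networks": obs and goal are 4-D, the latent skill is 4-D,
# and actions are 2-D. Real systems would use learned neural policies.
W_high = rng.normal(size=(4, 8))  # [obs, goal] -> latent skill
W_low = rng.normal(size=(2, 8))   # [obs, latent] -> action

def high_level(obs, goal):
    # High-level policy: pick a latent skill toward the goal.
    return np.tanh(W_high @ np.concatenate([obs, goal]))

def low_level(obs, z):
    # Low-level policy: decode the latent skill into a bounded action.
    return np.tanh(W_low @ np.concatenate([obs, z]))

def rollout(obs, goal, horizon=3):
    """Execute one latent skill for a fixed horizon under toy dynamics."""
    actions = []
    z = high_level(obs, goal)            # skill chosen once per segment
    for _ in range(horizon):
        a = low_level(obs, z)
        # Toy dynamics: the action nudges the first action-dim components.
        obs = obs + 0.1 * np.pad(a, (0, len(obs) - len(a)))
        actions.append(a)
    return actions
```

Chaining segments — calling `high_level` again after each horizon — is what lets such a scheme "stitch" latent skills together toward temporally extended goals.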