Successor features for transfer in reinforcement learning
Transfer in reinforcement learning refers to the notion that generalization should occur not only within a task but also across tasks. Our focus is on transfer where the reward functions vary across tasks while the environment's dynamics remain the same. The method we propose rests on two key ideas: "successor features," a value function representation that decouples the dynamics of the environment from the rewards, and "generalized policy improvement," a generalization of dynamic programming's policy improvement step that considers a set of policies rather than a single one. Put together, the two ideas lead to an approach that integrates seamlessly within the reinforcement learning framework and allows transfer to take place between tasks without any restriction. The proposed method also provides performance guarantees for the transferred policy even before any learning has taken place. We derive two theorems that set our approach on firm theoretical ground and present experiments showing that it successfully promotes transfer in practice.
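The core decomposition behind successor features, Q^π(s,a) = ψ^π(s,a)·w, combined with generalized policy improvement (acting greedily with respect to the maximum over a set of stored policies' values), can be sketched in a few lines. The tiny tabular setting, feature values, and reward weights below are illustrative assumptions, not the paper's experiments.

```python
import numpy as np

def gpi_action(psi_list, w, state):
    """Pick the action maximizing max_i Q^{pi_i}(s, a) = psi_i(s, a) . w.

    psi_list: list of successor-feature tables, shape (n_states, n_actions, d)
    w: reward weights of the new task, shape (d,)
    """
    # Q-values of every stored policy in this state: shape (n_policies, n_actions)
    q = np.array([psi[state] @ w for psi in psi_list])
    # GPI: act greedily w.r.t. the pointwise maximum over policies
    return int(np.argmax(q.max(axis=0)))

# SFs of two previously learned policies over 1 state, 2 actions, 2 features
psi_a = np.array([[[1.0, 0.0], [0.0, 0.2]]])   # policy good at feature 0
psi_b = np.array([[[0.1, 0.0], [0.0, 1.0]]])   # policy good at feature 1

w_new = np.array([0.0, 1.0])  # the new task rewards only feature 1
print(gpi_action([psi_a, psi_b], w_new, state=0))  # 1
```

Because the dynamics live in ψ and the reward in w, a new task only requires a new w; the stored ψ tables transfer unchanged, which is what makes the pre-learning performance guarantees possible.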
Hypernetworks for Zero-shot Transfer in Reinforcement Learning
In this paper, hypernetworks are trained to generate behaviors across a range of unseen task conditions, via a novel TD-based training objective and data from a set of near-optimal RL solutions for training tasks. This work relates to meta RL, contextual RL, and transfer learning, with a particular focus on zero-shot performance at test time, enabled by knowledge of the task parameters (also known as context). Our technical approach is based upon viewing each RL algorithm as a mapping from the MDP specifics to the near-optimal value function and policy, and seeking to approximate it with a hypernetwork that can generate near-optimal value functions and policies, given the parameters of the MDP. We show that, under certain conditions, this mapping can be considered as a supervised learning problem. We empirically evaluate the effectiveness of our method for zero-shot transfer to new reward and transition dynamics on a series of continuous control tasks from DeepMind Control Suite. Our method demonstrates significant improvements over baselines from multitask and meta RL approaches.
Comment: AAAI 202
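The hypernetwork idea — a network whose output is the parameters of another network, conditioned on the task context — can be sketched as follows. All dimensions, names, and the linear-policy form are hypothetical choices for illustration, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: task context, observation, action, hidden layer
ctx_dim, obs_dim, act_dim, hidden = 3, 4, 2, 16

# Hypernetwork parameters: a one-hidden-layer MLP whose *output* is the
# flattened weight matrix and bias of a linear policy for that task.
W1 = rng.normal(scale=0.1, size=(hidden, ctx_dim))
b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.1, size=((obs_dim + 1) * act_dim, hidden))
b2 = np.zeros((obs_dim + 1) * act_dim)

def hyper_policy(context, obs):
    """Generate task-specific policy weights from the context, then act."""
    h = np.tanh(W1 @ context + b1)
    theta = W2 @ h + b2                       # flattened policy parameters
    W_pi = theta[: obs_dim * act_dim].reshape(act_dim, obs_dim)
    b_pi = theta[obs_dim * act_dim:]
    return W_pi @ obs + b_pi                  # action (or action logits)

action = hyper_policy(rng.normal(size=ctx_dim), rng.normal(size=obs_dim))
print(action.shape)  # (2,)
```

Zero-shot transfer then amounts to feeding an unseen task's parameters as `context`: no gradient steps are taken on the new task, since the hypernetwork itself approximates the "RL algorithm as a mapping" described above.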
Hierarchical Kickstarting for Skill Transfer in Reinforcement Learning
Practising and honing skills forms a fundamental component of how humans learn, yet artificial agents are rarely specifically trained to perform them. Instead, they are usually trained end-to-end, with the hope that useful skills will be implicitly learned in order to maximise the discounted return of some extrinsic reward function. In this paper, we investigate how skills can be incorporated into the training of reinforcement learning (RL) agents in complex environments with large state-action spaces and sparse rewards. To this end, we created SkillHack, a benchmark of tasks and associated skills based on the game of NetHack. We evaluate a number of baselines on this benchmark, as well as our own novel skill-based method Hierarchical Kickstarting (HKS), which is shown to outperform all other evaluated methods. Our experiments show that learning with prior knowledge of useful skills can significantly improve the performance of agents on complex problems. We ultimately argue that utilising predefined skills provides a useful inductive bias for RL problems, especially those with large state-action spaces and sparse rewards.
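Kickstarting-style training typically adds an auxiliary distillation term that nudges the student's policy toward a teacher (skill) policy, on top of the usual RL objective. A minimal sketch of such a term — with illustrative names, and a fixed weight standing in for whatever schedule HKS actually uses — might look like:

```python
import numpy as np

def kickstart_loss(student_logits, teacher_logits, lam):
    """lam * cross-entropy(teacher || student), averaged over the batch.

    Low when the student's action distribution matches the teacher's,
    so adding it to the RL loss pulls the student toward the skill.
    """
    def softmax(x):
        z = x - x.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    p_teacher = softmax(teacher_logits)
    log_p_student = np.log(softmax(student_logits))
    return -lam * np.mean((p_teacher * log_p_student).sum(axis=-1))

s = np.array([[2.0, 0.0], [0.0, 2.0]])  # student logits for 2 states
t = np.array([[2.0, 0.0], [0.0, 2.0]])  # teacher logits (same preferences)
print(kickstart_loss(s, t, lam=1.0))    # reduces to the teacher's entropy
```

Annealing `lam` toward zero over training lets the agent eventually deviate from the skill prior once the extrinsic reward provides enough signal, which is the usual motivation for kickstarting over plain distillation.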
Sequential Transfer in Reinforcement Learning with a Generative Model
We are interested in how to design reinforcement learning agents that provably reduce the sample complexity for learning new tasks by transferring knowledge from previously solved ones. The availability of solutions to related problems poses a fundamental trade-off: whether to seek policies that are expected to achieve high (yet sub-optimal) performance in the new task immediately, or whether to seek information to quickly identify an optimal solution, potentially at the cost of poor initial behavior. In this work, we focus on the second objective when the agent has access to a generative model of state-action pairs. First, given a set of solved tasks containing an approximation of the target one, we design an algorithm that quickly identifies an accurate solution by seeking the state-action pairs that are most informative for this purpose. We derive PAC bounds on its sample complexity which clearly demonstrate the benefits of using this kind of prior knowledge. Then, we show how to learn these approximate tasks sequentially by reducing our transfer setting to a hidden Markov model and employing spectral methods to recover its parameters. Finally, we empirically verify our theoretical findings in simple simulated domains.
Comment: ICML 202
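The information-seeking idea can be illustrated crudely: with a generative model, query the state-action pairs on which the candidate (previously solved) tasks disagree most, since those samples discriminate between candidates fastest. The disagreement measure and names below are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np

def most_informative_query(candidate_rewards):
    """candidate_rewards: shape (n_tasks, n_states, n_actions).

    Return the (state, action) pair with maximal spread across the
    candidate tasks -- a sample there tells us most about which
    candidate matches the target task.
    """
    spread = candidate_rewards.max(axis=0) - candidate_rewards.min(axis=0)
    return np.unravel_index(np.argmax(spread), spread.shape)

R = np.zeros((2, 3, 2))
R[0, 1, 0] = 1.0   # the two candidate tasks differ only at (s=1, a=0)
R[1, 1, 0] = -1.0
print(tuple(int(i) for i in most_informative_query(R)))  # (1, 0)
```

Querying where candidates agree would waste generative-model calls, which is exactly the intuition the PAC bounds formalize: sample complexity scales with how hard the candidate tasks are to tell apart.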
Self-Attentional Credit Assignment for Transfer in Reinforcement Learning
The ability to transfer knowledge to novel environments and tasks is a sensible desideratum for general learning agents. Despite its apparent promise, transfer in RL is still an open and underexplored research area. In this paper, we take a brand-new perspective on transfer: we suggest that the ability to assign credit unveils structural invariants in the tasks that can be transferred to make RL more sample-efficient. Our main contribution is SECRET, a novel approach to transfer learning for RL that uses a backward-view credit assignment mechanism based on a self-attentive architecture. Two aspects are key to its generality: it learns to assign credit as a separate offline supervised process, and it exclusively modifies the reward function. Consequently, it can be supplemented by transfer methods that do not modify the reward function, and it can be plugged on top of any RL algorithm.
Comment: 21 pages, 10 figures, 3 tables (accepted as an oral presentation at the Learning Transferable Skills workshop, NeurIPS 2019)
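Because SECRET only modifies the reward function, its output can be pictured as a reward redistribution step: spread an episode's sparse reward backward over past timesteps in proportion to learned attention weights, yielding a denser shaped reward for any downstream RL algorithm. The attention values below are hypothetical stand-ins for the output of a self-attentive model, not learned weights.

```python
import numpy as np

def redistribute_reward(rewards, attention):
    """Spread the episode's total reward backward using normalized attention.

    rewards, attention: arrays of shape (T,). The shaped reward keeps the
    same episode total, so the optimal policy is unchanged in spirit while
    credit arrives closer to the actions that earned it.
    """
    attention = attention / attention.sum()
    return rewards.sum() * attention

episode_rewards = np.array([0.0, 0.0, 0.0, 1.0])   # sparse terminal reward
attn = np.array([0.1, 0.6, 0.2, 0.1])              # credit peaks at t = 1
shaped = redistribute_reward(episode_rewards, attn)
print(shaped.sum())  # episode total is preserved
```

Since the shaping is computed offline from the attention model and leaves the RL algorithm untouched, it composes freely with other transfer methods, as the abstract notes.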