Automatic Curriculum Learning For Deep RL: A Short Survey
Automatic Curriculum Learning (ACL) has become a cornerstone of recent
successes in Deep Reinforcement Learning (DRL). These methods shape the learning
trajectories of agents by challenging them with tasks adapted to their
capacities. In recent years, they have been used to improve sample efficiency
and asymptotic performance, to organize exploration, to encourage
generalization or to solve sparse reward problems, among others. The ambition
of this work is twofold: 1) to present a compact and accessible introduction to
the Automatic Curriculum Learning literature and 2) to draw a bigger picture of
the current state of the art in ACL to encourage the cross-breeding of existing
concepts and the emergence of new ideas.
Comment: Accepted at IJCAI 2020
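To make the task-adaptation idea concrete, here is a minimal sketch of one
common ACL recipe covered by such surveys: a teacher that samples tasks in
proportion to the student's absolute learning progress. Everything here (the
`LearningProgressTeacher` class and the `student.train_episode` interface it
assumes) is illustrative, not the API of any particular surveyed method.

```python
import random
from collections import defaultdict

class LearningProgressTeacher:
    """Samples tasks in proportion to absolute learning progress: tasks
    where the student's return is changing fastest are assumed to be at
    the right difficulty for the current student."""

    def __init__(self, tasks, epsilon=0.1):
        self.tasks = list(tasks)           # hashable task identifiers
        self.epsilon = epsilon             # random exploration over tasks
        self.returns = defaultdict(list)   # per-task return history

    def progress(self, task):
        hist = self.returns[task]
        if len(hist) < 4:
            return 0.0
        mid = len(hist) // 2
        recent = sum(hist[mid:]) / (len(hist) - mid)
        older = sum(hist[:mid]) / mid
        return abs(recent - older)         # fast change = high progress

    def sample_task(self):
        if random.random() < self.epsilon:
            return random.choice(self.tasks)
        weights = [self.progress(t) + 1e-6 for t in self.tasks]
        return random.choices(self.tasks, weights=weights)[0]

    def update(self, task, episode_return):
        self.returns[task].append(episode_return)
```

The teacher plugs into an ordinary training loop: call `sample_task()`, train
the student for one episode on that task, then report the return back via
`update(task, episode_return)`.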
TeachMyAgent: a Benchmark for Automatic Curriculum Learning in Deep RL
Training autonomous agents able to generalize to multiple tasks is a key
target of Deep Reinforcement Learning (DRL) research. In parallel to improving
DRL algorithms themselves, Automatic Curriculum Learning (ACL) studies how
teacher algorithms can train DRL agents more efficiently by adapting task
selection to their evolving abilities. While multiple standard benchmarks exist
to compare DRL agents, no such benchmark currently exists for ACL algorithms.
Thus, comparing existing approaches is difficult, as too many experimental
parameters differ from paper to paper. In this work, we identify several key
challenges faced by ACL algorithms. Based on these, we present TeachMyAgent
(TA), a benchmark of current ACL algorithms leveraging procedural task
generation. It includes 1) challenge-specific unit-tests using variants of a
procedural Box2D bipedal walker environment, and 2) a new procedural Parkour
environment combining most ACL challenges, making it ideal for global
performance assessment. We then use TeachMyAgent to conduct a comparative study
of representative existing approaches, showcasing the competitiveness of some
ACL algorithms that do not use expert knowledge. We also show that the Parkour
environment remains an open problem. We open-source our environments, all
studied ACL algorithms (collected from open-source code or re-implemented), and
DRL students in a Python package available at
https://github.com/flowersteam/TeachMyAgent
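The benchmark's teacher-student protocol can be pictured with the sketch
below; the `teacher`, `student`, and `make_env` interfaces are hypothetical
stand-ins, not TeachMyAgent's actual API (see the repository above for the
real entry points).

```python
import numpy as np

def run_teacher_student(teacher, student, make_env, test_params,
                        n_episodes=10_000):
    """Generic ACL loop in the TeachMyAgent spirit: the teacher proposes
    a task-parameter vector (e.g. terrain roughness, gap width), the DRL
    student trains on the resulting procedural environment, and the
    teacher adapts from the episodic return it observes."""
    for _ in range(n_episodes):
        params = teacher.propose()              # task selection
        env = make_env(params)                  # procedural task generation
        episode_return = student.train_episode(env)
        teacher.update(params, episode_return)  # track evolving ability
    # Score the student on a fixed, teacher-independent test distribution.
    return np.mean([student.evaluate(make_env(p)) for p in test_params])
```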
Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey
Reinforcement learning (RL) is a popular paradigm for addressing sequential
decision tasks in which the agent has only limited environmental feedback.
Despite many advances over the past three decades, learning in many domains
still requires a large amount of interaction with the environment, which can
be prohibitively expensive in realistic scenarios. To address this problem,
transfer learning has been applied to reinforcement learning such that
experience gained in one task can be leveraged when starting to learn the
next, harder task. More recently, several lines of research have explored how
tasks, or data samples themselves, can be sequenced into a curriculum for the
purpose of learning a problem that may otherwise be too difficult to learn
from scratch. In this article, we present a framework for curriculum learning
(CL) in reinforcement learning, and use it to survey and classify existing CL
methods in terms of their assumptions, capabilities, and goals. Finally, we
use our framework to find open problems and suggest directions for future RL
curriculum learning research.
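As a concrete instance of sequencing tasks into a curriculum, a minimal
staged scheme might look like the sketch below; the `student` interface,
the mastery threshold, and the normalized returns are assumptions for
illustration, not part of the surveyed framework.

```python
def run_staged_curriculum(student, tasks, threshold=0.8, max_episodes=10_000):
    """Train on each task (ordered easy -> hard) until a moving-average
    return clears a mastery threshold, then advance. Assumes returns
    normalized to [0, 1] and a `student.train_episode(task)` method."""
    for task in tasks:
        window = []
        for _ in range(max_episodes):
            window.append(student.train_episode(task))
            window = window[-20:]              # last 20 episodes
            if len(window) == 20 and sum(window) / 20 >= threshold:
                break                          # mastered: next, harder task
```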
Transferable Curricula through Difficulty Conditioned Generators
Advancements in reinforcement learning (RL) have demonstrated superhuman
performance in complex tasks such as Starcraft, Go, Chess etc. However,
knowledge transfer from Artificial "Experts" to humans remains a significant
challenge. A promising avenue for such transfer would be the use of curricula.
Recent methods in curriculum generation focus on training RL agents
efficiently, yet such methods rely on surrogate measures to track student
progress and are not suited for training robots in the real world (or,
more ambitiously, humans). In this paper, we introduce a method named Parameterized
Environment Response Model (PERM) that shows promising results in training RL
agents in parameterized environments. Inspired by Item Response Theory, PERM
seeks to model difficulty of environments and ability of RL agents directly.
Given that RL agents and humans are trained more efficiently under the "zone of
proximal development", our method generates a curriculum by matching the
difficulty of an environment to the current ability of the student. In
addition, PERM can be trained offline and does not employ non-stationary
measures of student ability, making it suitable for transfer between students.
We demonstrate PERM's ability to represent the environment parameter space, and
training RL agents with PERM produces strong performance in
deterministic environments. Lastly, we show that our method is transferable
between students, without any sacrifice in training quality.
Comment: IJCAI'23
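The matching principle behind this approach can be sketched with the
simplest IRT model, the one-parameter (Rasch) model; this shows only the
selection rule, not the paper's learned model of the environment parameter
space, and the target success rate is an assumption.

```python
import math

def p_success(ability, difficulty):
    """Rasch (1PL) item response model: probability that a student of
    the given ability succeeds on an item of the given difficulty."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def pick_environment(ability, difficulties, target=0.5):
    """Zone-of-proximal-development selection: choose the environment
    whose predicted success probability is closest to the target, i.e.
    neither trivial nor impossible for the current student."""
    return min(difficulties,
               key=lambda d: abs(p_success(ability, d) - target))
```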
Learning with AMIGo: Adversarially Motivated Intrinsic Goals
A key challenge for reinforcement learning (RL) consists of learning in
environments with sparse extrinsic rewards. In contrast to current RL methods,
humans are able to learn new skills with little or no reward by using various
forms of intrinsic motivation. We propose AMIGo, a novel agent incorporating
-- as a form of meta-learning -- a goal-generating teacher that proposes
Adversarially Motivated Intrinsic Goals to train a goal-conditioned "student"
policy in the absence of (or alongside) environment reward. Specifically,
through a simple but effective "constructively adversarial" objective, the
teacher learns to propose increasingly challenging -- yet achievable -- goals
that allow the student to learn general skills for acting in a new environment,
independent of the task to be solved. We show that our method generates a
natural curriculum of self-proposed goals which ultimately allows the agent to
solve challenging procedurally-generated tasks where other forms of intrinsic
motivation and state-of-the-art RL methods fail.
Comment: 18 pages, 6 figures, published at The Ninth International Conference
on Learning Representations (2021)
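The "constructively adversarial" objective can be sketched as a threshold on
how long the student takes to reach the proposed goal; the reward magnitudes
below are illustrative hyperparameters, not the paper's values.

```python
def teacher_reward(reached, steps_taken, t_star, alpha=1.0, beta=1.0):
    """AMIGo-style teacher reward sketch: the teacher is rewarded when
    the student reaches the proposed goal, but only after at least
    t_star steps. Goals reached too quickly (too easy) and goals never
    reached (too hard) both penalize the teacher, so it learns to
    propose challenging-yet-achievable goals."""
    if reached and steps_taken >= t_star:
        return alpha    # hard enough, yet achievable
    return -beta        # too easy, or unreachable
```

In the paper, the difficulty threshold is raised over the course of training,
which is what drives the natural curriculum of increasingly challenging goals.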