TeachMyAgent: a Benchmark for Automatic Curriculum Learning in Deep RL
Training autonomous agents able to generalize to multiple tasks is a key
target of Deep Reinforcement Learning (DRL) research. In parallel to improving
DRL algorithms themselves, Automatic Curriculum Learning (ACL) studies how
teacher algorithms can train DRL agents more efficiently by adapting task
selection to their evolving abilities. While multiple standard benchmarks exist
to compare DRL agents, there is currently no such thing for ACL algorithms.
Thus, comparing existing approaches is difficult, as too many experimental
parameters differ from paper to paper. In this work, we identify several key
challenges faced by ACL algorithms. Based on these, we present TeachMyAgent
(TA), a benchmark of current ACL algorithms leveraging procedural task
generation. It includes 1) challenge-specific unit-tests using variants of a
procedural Box2D bipedal walker environment, and 2) a new procedural Parkour
environment combining most ACL challenges, making it ideal for global
performance assessment. We then use TeachMyAgent to conduct a comparative study
of representative existing approaches, showcasing the competitiveness of some
ACL algorithms that do not use expert knowledge. We also show that the Parkour
environment remains an open problem. We open-source our environments, all
studied ACL algorithms (collected from open-source code or re-implemented), and
DRL students in a Python package available at
https://github.com/flowersteam/TeachMyAgent
Learning with AMIGo: Adversarially Motivated Intrinsic Goals
A key challenge for reinforcement learning (RL) consists of learning in
environments with sparse extrinsic rewards. In contrast to current RL methods,
humans are able to learn new skills with little or no reward by using various
forms of intrinsic motivation. We propose AMIGo, a novel agent incorporating --
as a form of meta-learning -- a goal-generating teacher that proposes
Adversarially Motivated Intrinsic Goals to train a goal-conditioned "student"
policy in the absence of (or alongside) environment reward. Specifically,
through a simple but effective "constructively adversarial" objective, the
teacher learns to propose increasingly challenging -- yet achievable -- goals
that allow the student to learn general skills for acting in a new environment,
independent of the task to be solved. We show that our method generates a
natural curriculum of self-proposed goals which ultimately allows the agent to
solve challenging procedurally-generated tasks where other forms of intrinsic
motivation and state-of-the-art RL methods fail.
Comment: 18 pages, 6 figures, published at The Ninth International Conference
on Learning Representations (2021)
Transferable Curricula through Difficulty Conditioned Generators
Advancements in reinforcement learning (RL) have demonstrated superhuman
performance in complex tasks such as StarCraft, Go, and chess. However,
knowledge transfer from artificial "experts" to humans remains a significant
challenge. A promising avenue for such transfer would be the use of curricula.
Recent methods in curriculum generation focus on training RL agents
efficiently, yet such methods rely on surrogate measures to track student
progress, and are not suited for training robots in the real world (or, more
ambitiously, humans). In this paper, we introduce a method named Parameterized
Environment Response Model (PERM) that shows promising results in training RL
agents in parameterized environments. Inspired by Item Response Theory, PERM
seeks to model difficulty of environments and ability of RL agents directly.
Given that RL agents and humans are trained more efficiently under the "zone of
proximal development", our method generates a curriculum by matching the
difficulty of an environment to the current ability of the student. In
addition, PERM can be trained offline and does not employ non-stationary
measures of student ability, making it suitable for transfer between students.
We demonstrate PERM's ability to represent the environment parameter space, and
show that training RL agents with PERM produces strong performance in
deterministic environments. Lastly, we show that our method is transferable
between students, without any sacrifice in training quality.
Comment: IJCAI'2
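As a rough illustration of matching environment difficulty to student ability, the sketch below uses the classic one-parameter logistic (Rasch) model from Item Response Theory. It is not the PERM model itself. The function names, the candidate difficulties, and the "closest difficulty" selection rule are all assumptions made for this example.

```python
import math


def success_probability(ability: float, difficulty: float) -> float:
    """Rasch (1PL) IRT model: P(success) = sigmoid(ability - difficulty)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def pick_next_difficulty(ability: float, candidates: list[float]) -> float:
    """Hypothetical selection rule: take the candidate closest to the ability estimate."""
    return min(candidates, key=lambda d: abs(d - ability))


# Illustrative values only; they do not come from the paper.
candidates = [-2.0, -1.0, 0.0, 1.0, 2.0]
ability_estimate = 0.8
chosen = pick_next_difficulty(ability_estimate, candidates)
p_success = success_probability(ability_estimate, chosen)
```

Under this model, a task whose difficulty equals the student's ability is solved with probability 0.5, which is one simple way to read "matching difficulty to ability".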
Replay-Guided Adversarial Environment Design
Deep reinforcement learning (RL) agents may successfully generalize to new
settings if trained on an appropriately diverse set of environment and task
configurations. Unsupervised Environment Design (UED) is a promising
self-supervised RL paradigm, wherein the free parameters of an underspecified
environment are automatically adapted during training to the agent's
capabilities, leading to the emergence of diverse training environments. Here,
we cast Prioritized Level Replay (PLR), an empirically successful but
theoretically unmotivated method that selectively samples randomly-generated
training levels, as UED. We argue that by curating completely random levels,
PLR, too, can generate novel and complex levels for effective training. This
insight reveals a natural class of UED methods we call Dual Curriculum Design
(DCD). Crucially, DCD includes both PLR and a popular UED algorithm, PAIRED, as
special cases and inherits similar theoretical guarantees. This connection
allows us to develop novel theory for PLR, providing a version with a
robustness guarantee at Nash equilibria. Furthermore, our theory suggests a
highly counterintuitive improvement to PLR: by stopping the agent from updating
its policy on uncurated levels (training on less data), we can improve the
convergence to Nash equilibria. Indeed, our experiments confirm that our new
method, PLR⊥, obtains better results on a suite of out-of-distribution,
zero-shot transfer tasks, in addition to demonstrating that PLR⊥ improves the
performance of PAIRED, from which it inherited its theoretical framework.
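The idea of not updating the policy on uncurated levels can be pictured with the toy loop below. It is a minimal sketch under stated assumptions, not the authors' implementation. The buffer layout, the greedy "replay the highest-scoring level" rule, the capacity, and the stand-in `score`, `evaluate`, and `train` functions are all hypothetical.

```python
import random


def robust_plr_step(buffer, rng, replay_prob, sample_level, evaluate, train, capacity=4):
    """Run one iteration; return True only if the policy was updated."""
    if buffer and rng.random() < replay_prob:
        # Replay branch: train on a curated level and refresh its score.
        level = max(buffer, key=buffer.get)  # greedy choice, a simplification
        buffer[level] = train(level)
        return True
    # Exploration branch: score a new random level without a policy update.
    level = sample_level(rng)
    buffer[level] = evaluate(level)
    if len(buffer) > capacity:
        del buffer[min(buffer, key=buffer.get)]  # evict the lowest-scoring level
    return False


# Stand-ins for a real agent and environment (assumptions for illustration).
evaluated, trained = [], []


def score(level):
    return (level % 10) / 10.0


def evaluate(level):
    evaluated.append(level)
    return score(level)


def train(level):
    trained.append(level)
    return 0.9 * score(level)


rng = random.Random(0)
buffer = {}
for _ in range(50):
    robust_plr_step(buffer, rng, 0.5, lambda r: r.randrange(1000), evaluate, train)
```

In this sketch every level that reaches `train` has first passed through `evaluate` and survived in the buffer, so the policy only ever learns from curated levels.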