95 research outputs found
A Hitchhiker's Guide to Statistical Comparisons of Reinforcement Learning Algorithms
Consistently checking the statistical significance of experimental results is
the first mandatory step towards reproducible science. This paper presents a
hitchhiker's guide to rigorous comparisons of reinforcement learning
algorithms. After introducing the concepts of statistical testing, we review
the relevant statistical tests and compare them empirically in terms of false
positive rate and statistical power as a function of the sample size (number of
seeds) and effect size. We further investigate the robustness of these tests to
violations of the most common hypotheses (normal distributions, same
distributions, equal variances). Besides simulations, we compare empirical
distributions obtained by running Soft Actor-Critic and Twin-Delayed Deep
Deterministic Policy Gradient on Half-Cheetah. We conclude by providing
guidelines and code to perform rigorous comparisons of RL algorithm
performances. Comment: 8 pages + supplementary material
CLIC: Curriculum Learning and Imitation for object Control in non-rewarding environments
In this paper we study a new reinforcement learning setting where the
environment is non-rewarding, contains several possibly related objects of
various controllability, and where an apt agent Bob acts independently, with
non-observable intentions. We argue that this setting defines a realistic
scenario and we present a generic discrete-state discrete-action model of such
environments. To learn in this environment, we propose an unsupervised
reinforcement learning agent called CLIC for Curriculum Learning and Imitation
for Control. CLIC learns to control individual objects in its environment, and
imitates Bob's interactions with these objects. It selects objects to focus on
when training and imitating by maximizing its learning progress. We show that
CLIC is an effective baseline in our new setting. It can effectively observe
Bob to gain control of objects faster, even if Bob is not explicitly teaching.
It can also follow Bob when he acts as a mentor and provides ordered
demonstrations. Finally, when Bob controls objects that the agent cannot, or in
presence of a hierarchy between objects in the environment, we show that CLIC
ignores non-reproducible and already mastered interactions with objects,
resulting in a greater benefit from imitation.
Automatic Curriculum Learning For Deep RL: A Short Survey
Automatic Curriculum Learning (ACL) has become a cornerstone of recent
successes in Deep Reinforcement Learning (DRL). These methods shape the learning
trajectories of agents by challenging them with tasks adapted to their
capacities. In recent years, they have been used to improve sample efficiency
and asymptotic performance, to organize exploration, to encourage
generalization or to solve sparse reward problems, among others. The ambition
of this work is dual: 1) to present a compact and accessible introduction to
the Automatic Curriculum Learning literature and 2) to draw a bigger picture of
the current state of the art in ACL to encourage the cross-breeding of existing
concepts and the emergence of new ideas. Comment: Accepted at IJCAI202
CURIOUS: Intrinsically Motivated Modular Multi-Goal Reinforcement Learning
In open-ended environments, autonomous learning agents must set their own
goals and build their own curriculum through an intrinsically motivated
exploration. They may consider a large diversity of goals, aiming to discover
what is controllable in their environments, and what is not. Because some goals
might prove easy and some impossible, agents must actively select which goal to
practice at any moment, to maximize their overall mastery on the set of
learnable goals. This paper proposes CURIOUS, an algorithm that leverages 1) a
modular Universal Value Function Approximator with hindsight learning to
achieve a diversity of goals of different kinds within a unique policy and 2)
an automated curriculum learning mechanism that biases the attention of the
agent towards goals maximizing the absolute learning progress. Agents focus
sequentially on goals of increasing complexity, and focus back on goals that
are being forgotten. Experiments conducted in a new modular-goal robotic
environment show the resulting developmental self-organization of a learning
curriculum, and demonstrate properties of robustness to distracting goals,
forgetting and changes in body properties. Comment: Accepted at ICML 201
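The curriculum mechanism described in the CURIOUS abstract, which biases goal selection towards high absolute learning progress, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper's code: the function names, the window-based progress estimate, and the epsilon mixing with a uniform distribution are all hypothetical choices.

```python
def absolute_learning_progress(outcomes, window=4):
    """Absolute difference between the mean of the latest `window` outcomes
    and the mean of the `window` outcomes before them (hypothetical estimator)."""
    if len(outcomes) < 2 * window:
        return 0.0  # not enough history to estimate progress
    recent = outcomes[-window:]
    older = outcomes[-2 * window:-window]
    return abs(sum(recent) / window - sum(older) / window)


def goal_probabilities(histories, epsilon=0.2, window=4):
    """Sampling probability per goal: a uniform share `epsilon` plus a share
    proportional to absolute learning progress (assumed mixing rule)."""
    alp = [absolute_learning_progress(h, window) for h in histories]
    total = sum(alp)
    n = len(histories)
    if total == 0:
        return [1.0 / n] * n  # no progress signal anywhere: sample uniformly
    return [epsilon / n + (1 - epsilon) * a / total for a in alp]


# Success (1) / failure (0) histories for four hypothetical goals.
histories = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # being learned: progress is high
    [1, 1, 1, 1, 1, 1, 1, 1],  # already mastered: no progress
    [0, 0, 0, 0, 0, 0, 0, 0],  # impossible: no progress
    [1, 1, 1, 1, 0, 0, 0, 0],  # being forgotten: absolute progress is high
]
probs = goal_probabilities(histories)
```

Using the absolute value is what lets a goal that is being forgotten attract attention again, since a drop in competence counts as much as a gain.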
How Many Random Seeds? Statistical Power Analysis in Deep Reinforcement Learning Experiments
Consistently checking the statistical significance of experimental results is
one of the mandatory methodological steps to address the so-called
"reproducibility crisis" in deep reinforcement learning. In this tutorial
paper, we explain how the number of random seeds relates to the probabilities
of statistical errors. For both the t-test and the bootstrap confidence
interval test, we recall theoretical guidelines to determine the number of
random seeds one should use to provide a statistically significant comparison
of the performance of two algorithms. Finally, we discuss the influence of
deviations from the assumptions usually made by statistical tests. We show that
they can lead to inaccurate evaluations of statistical errors and provide
guidelines to counter these negative effects. We make our code available to
perform the tests.
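The two tests named in this abstract, the t-test and the bootstrap confidence interval test, can be sketched with the Python standard library alone. This is an illustrative sketch and not the authors' released code: the function names, the Welch (unequal variance) form of the t statistic, the percentile bootstrap, and the per-seed scores are all assumptions chosen for the example.

```python
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic and its approximate degrees of freedom
    (Welch-Satterthwaite) for two independent samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    t = (statistics.mean(a) - statistics.mean(b)) / se2 ** 0.5
    dof = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, dof


def bootstrap_ci_diff(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for mean(a) - mean(b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        resampled_a = rng.choices(a, k=len(a))  # resample with replacement
        resampled_b = rng.choices(b, k=len(b))
        diffs.append(statistics.mean(resampled_a) - statistics.mean(resampled_b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical final returns of two algorithms, one value per random seed.
algo_1 = [105.0, 98.0, 110.0, 102.0, 107.0]
algo_2 = [60.0, 65.0, 58.0, 62.0, 70.0]

t_stat, dof = welch_t(algo_1, algo_2)
ci_low, ci_high = bootstrap_ci_diff(algo_1, algo_2)
# The difference is deemed significant at level alpha if the interval excludes 0.
significant = ci_low > 0 or ci_high < 0
```

With only five seeds per algorithm the bootstrap interval rests on very few distinct values, which is one reason the number of seeds matters for the reliability of such a test.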
ACES: Generating Diverse Programming Puzzles with Autotelic Language Models and Semantic Descriptors
Finding and selecting new and interesting problems to solve is at the heart
of curiosity, science and innovation. We here study automated problem
generation in the context of the open-ended space of Python programming
puzzles. Existing generative models often aim at modeling a reference
distribution without any explicit diversity optimization. Other methods
explicitly optimizing for diversity do so either in limited hand-coded
representation spaces or in uninterpretable learned embedding spaces that may
not align with human perceptions of interesting variations. With ACES
(Autotelic Code Exploration via Semantic descriptors), we introduce a new
autotelic generation method that leverages semantic descriptors produced by a
large language model (LLM) to directly optimize for interesting diversity, as
well as few-shot-based generation. Each puzzle is labeled along 10 dimensions,
each capturing a programming skill required to solve it. ACES generates and
pursues novel and feasible goals to explore that abstract semantic space,
slowly discovering a diversity of solvable programming puzzles in any given
run. Across a set of experiments, we show that ACES discovers a richer
diversity of puzzles than existing diversity-maximizing algorithms as measured
across a range of diversity metrics. We further study whether and in which
conditions this diversity can translate into the successful training of puzzle
solving models.
Teacher algorithms for curriculum learning of Deep RL in continuously parameterized environments
We consider the problem of how a teacher algorithm can enable an unknown Deep Reinforcement Learning (DRL) student to become good at a skill over a wide range of diverse environments. To do so, we study how a teacher algorithm can learn to generate a learning curriculum, whereby it sequentially samples parameters controlling a stochastic procedural generation of environments. Because it does not initially know the capacities of its student, a key challenge for the teacher is to discover which environments are easy, difficult or unlearnable, and in what order to propose them to maximize the efficiency of learning over the learnable ones. To achieve this, this problem is transformed into a surrogate continuous bandit problem where the teacher samples environments in order to maximize absolute learning progress of its student. We present a new algorithm modeling absolute learning progress with Gaussian mixture models (ALP-GMM). We also adapt existing algorithms and provide a complete study in the context of DRL. Using parameterized variants of the BipedalWalker environment, we study their efficiency to personalize a learning curriculum for different learners (embodiments), their robustness to the ratio of learnable/unlearnable environments, and their scalability to non-linear and high-dimensional parameter spaces. Videos and code are available at https://github.com/flowersteam/teachDeepRL
Autotelic Agents with Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey
Building autonomous machines that can explore open-ended environments,
discover possible interactions and build repertoires of skills is a general
objective of artificial intelligence. Developmental approaches argue that this
can only be achieved by autotelic agents: intrinsically motivated learning
agents that can learn to represent, generate, select and solve their own
problems. In recent years, the convergence of developmental approaches with
deep reinforcement learning (RL) methods has been leading to the emergence of a
new field: developmental reinforcement learning. Developmental RL is
concerned with the use of deep RL algorithms to tackle a developmental problem
-- the intrinsically motivated acquisition of open-ended repertoires of
skills. The self-generation of goals requires the learning
of compact goal encodings as well as their associated goal-achievement
functions. This raises new challenges compared to standard RL algorithms
originally designed to tackle pre-defined sets of goals using external reward
signals. The present paper introduces developmental RL and proposes a
computational framework based on goal-conditioned RL to tackle the
intrinsically motivated skills acquisition problem. It proceeds to present a
typology of the various goal representations used in the literature, before
reviewing existing methods to learn to represent and prioritize goals in
autonomous systems. We finally close the paper by discussing some open
challenges in the quest for intrinsically motivated skills acquisition.