A Hitchhiker's Guide to Statistical Comparisons of Reinforcement Learning Algorithms
Consistently checking the statistical significance of experimental results is
the first mandatory step towards reproducible science. This paper presents a
hitchhiker's guide to rigorous comparisons of reinforcement learning
algorithms. After introducing the concepts of statistical testing, we review
the relevant statistical tests and compare them empirically in terms of false
positive rate and statistical power as a function of the sample size (number of
seeds) and effect size. We further investigate the robustness of these tests to
violations of the most common hypotheses (normal distributions, same
distributions, equal variances). Besides simulations, we compare empirical
distributions obtained by running Soft Actor-Critic and Twin-Delayed Deep
Deterministic Policy Gradient on Half-Cheetah. We conclude by providing
guidelines and code to perform rigorous comparisons of RL algorithm
performances.
Comment: 8 pages + supplementary material
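As an illustration of the kind of comparison this abstract describes, here is a minimal sketch of Welch's t-test statistic for two sets of per-seed scores. The abstract does not say which tests the paper recommends, so the choice of Welch's test is an assumption made here; the function name `welch_t` and the sample values are hypothetical.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Unbiased sample variances; the two groups may have unequal variances.
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    se_a, se_b = var_a / n_a, var_b / n_b
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    dof = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, dof


# Hypothetical final returns, one value per random seed (made-up numbers).
algo_1 = [10.0, 12.0, 11.0, 13.0, 9.0]
algo_2 = [8.0, 9.0, 7.0, 10.0, 6.0]
t_stat, dof = welch_t(algo_1, algo_2)
print(t_stat, dof)
```

Turning the statistic into a p-value requires the Student t distribution with `dof` degrees of freedom, which the standard library does not provide; a statistics package would be needed for that step.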
CLIC: Curriculum Learning and Imitation for object Control in non-rewarding environments
In this paper we study a new reinforcement learning setting where the
environment is non-rewarding, contains several possibly related objects of
various controllability, and where an apt agent Bob acts independently, with
non-observable intentions. We argue that this setting defines a realistic
scenario and we present a generic discrete-state discrete-action model of such
environments. To learn in this environment, we propose an unsupervised
reinforcement learning agent called CLIC for Curriculum Learning and Imitation
for Control. CLIC learns to control individual objects in its environment, and
imitates Bob's interactions with these objects. It selects objects to focus on
when training and imitating by maximizing its learning progress. We show that
CLIC is an effective baseline in our new setting. It can effectively observe
Bob to gain control of objects faster, even if Bob is not explicitly teaching.
It can also follow Bob when he acts as a mentor and provides ordered
demonstrations. Finally, when Bob controls objects that the agent cannot, or in
the presence of a hierarchy between objects in the environment, we show that CLIC
ignores non-reproducible and already mastered interactions with objects,
resulting in a greater benefit from imitation.
CURIOUS: Intrinsically Motivated Modular Multi-Goal Reinforcement Learning
In open-ended environments, autonomous learning agents must set their own
goals and build their own curriculum through an intrinsically motivated
exploration. They may consider a large diversity of goals, aiming to discover
what is controllable in their environments, and what is not. Because some goals
might prove easy and some impossible, agents must actively select which goal to
practice at any moment, to maximize their overall mastery on the set of
learnable goals. This paper proposes CURIOUS, an algorithm that leverages 1) a
modular Universal Value Function Approximator with hindsight learning to
achieve a diversity of goals of different kinds within a unique policy and 2)
an automated curriculum learning mechanism that biases the attention of the
agent towards goals maximizing the absolute learning progress. Agents focus
sequentially on goals of increasing complexity, and focus back on goals that
are being forgotten. Experiments conducted in a new modular-goal robotic
environment show the resulting developmental self-organization of a learning
curriculum, and demonstrate properties of robustness to distracting goals,
forgetting and changes in body properties.
Comment: Accepted at ICML 201
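The curriculum mechanism described in this abstract, biasing attention towards goals with high absolute learning progress, can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the windowed estimate of learning progress, the `epsilon` mixing with a uniform distribution, and all names are assumptions introduced here.

```python
def absolute_learning_progress(history, window=3):
    """|recent mean - older mean| over the last 2*window outcomes (0 if too short)."""
    if len(history) < 2 * window:
        return 0.0
    recent = history[-window:]
    older = history[-2 * window:-window]
    return abs(sum(recent) / window - sum(older) / window)


def selection_probabilities(histories, epsilon=0.2, window=3):
    """Mix a uniform distribution with one proportional to absolute learning progress."""
    lps = [absolute_learning_progress(h, window) for h in histories]
    total = sum(lps)
    n = len(histories)
    if total == 0.0:
        # No measurable progress anywhere: fall back to uniform sampling.
        return [1.0 / n] * n
    return [epsilon / n + (1 - epsilon) * lp / total for lp in lps]


# Hypothetical success histories (1 = goal reached) for three goal modules.
histories = [
    [0, 0, 0, 1, 1, 1],  # competence is rising
    [1, 1, 1, 1, 1, 1],  # already mastered
    [0, 0, 0, 0, 0, 0],  # never succeeds
]
probs = selection_probabilities(histories)
print(probs)
```

Taking the absolute value means a module whose competence is dropping also attracts attention, which matches the abstract's remark that agents focus back on goals that are being forgotten.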
How Many Random Seeds? Statistical Power Analysis in Deep Reinforcement Learning Experiments
Consistently checking the statistical significance of experimental results is
one of the mandatory methodological steps to address the so-called
"reproducibility crisis" in deep reinforcement learning. In this tutorial
paper, we explain how the number of random seeds relates to the probabilities
of statistical errors. For both the t-test and the bootstrap confidence
interval test, we recall theoretical guidelines to determine the number of
random seeds one should use to provide a statistically significant comparison
of the performance of two algorithms. Finally, we discuss the influence of
deviations from the assumptions usually made by statistical tests. We show that
they can lead to inaccurate evaluations of statistical errors and provide
guidelines to counter these negative effects. We make our code available to
perform the tests.
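The bootstrap confidence interval test mentioned in this abstract can be illustrated with a percentile bootstrap on the difference of mean performance. This is a minimal sketch under assumptions: the percentile method, the number of resamples, the function name, and the sample values are all chosen here for illustration and are not taken from the paper.

```python
import random
import statistics


def bootstrap_ci_mean_diff(sample_a, sample_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for mean(sample_a) - mean(sample_b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        resample_a = rng.choices(sample_a, k=len(sample_a))
        resample_b = rng.choices(sample_b, k=len(sample_b))
        diffs.append(statistics.mean(resample_a) - statistics.mean(resample_b))
    diffs.sort()
    low = diffs[int((alpha / 2) * n_resamples)]
    high = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high


# Hypothetical final returns, one value per random seed (made-up numbers).
algo_1 = [20.0, 21.0, 22.0, 23.0, 24.0]
algo_2 = [10.0, 11.0, 12.0, 13.0, 14.0]
low, high = bootstrap_ci_mean_diff(algo_1, algo_2)
print(low, high)
```

Under this scheme the difference is declared significant at level `alpha` when the interval excludes zero. With only a handful of seeds the resampled distribution is coarse, which is one reason the number of seeds matters for the reliability of the test.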
Autotelic Agents with Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey
Building autonomous machines that can explore open-ended environments,
discover possible interactions and build repertoires of skills is a general
objective of artificial intelligence. Developmental approaches argue that this
can only be achieved by autotelic agents: intrinsically motivated learning
agents that can learn to represent, generate, select and solve their own
problems. In recent years, the convergence of developmental approaches with
deep reinforcement learning (RL) methods has been leading to the emergence of a
new field: developmental reinforcement learning. Developmental RL is
concerned with the use of deep RL algorithms to tackle a developmental problem:
the intrinsically motivated acquisition of open-ended repertoires of skills.
The self-generation of goals requires the learning
of compact goal encodings as well as their associated goal-achievement
functions. This raises new challenges compared to standard RL algorithms
originally designed to tackle pre-defined sets of goals using external reward
signals. The present paper introduces developmental RL and proposes a
computational framework based on goal-conditioned RL to tackle the
intrinsically motivated skills acquisition problem. It proceeds to present a
typology of the various goal representations used in the literature, before
reviewing existing methods to learn to represent and prioritize goals in
autonomous systems. We finally close the paper by discussing some open
challenges in the quest for intrinsically motivated skills acquisition.
Modeling of Hydro-Pneumatic Energy Storage Using Pump Turbines
Modelling of a hydro-pneumatic energy storage system is presented in this paper. Hydro-pneumatic storage aims to combine the good efficiency of hydraulic energy conversion and the space flexibility of pneumatic storage. The project aims to model a prototype which uses a rotodynamic multi-stage pump-turbine to displace a virtual liquid piston to compress air. To facilitate mass and heat transfers between both phases, there is no separation between the water and the air. A dynamic model of the storage system is developed using block diagram methodology. It takes into account characteristic curves of the pump-turbine and thermodynamic equations. Modelling results show that vapour diffusion contributes to reducing the final compression temperature. This implies an increase in storage efficiency. A test rig construction will begin at the end of autumn 2011. It will be electrically connected to the "Distributed Energies" platform of "Arts et Métiers Paristech" in Lille.
Grounding Language to Autonomously-Acquired Skills via Goal Generation
We are interested in the autonomous acquisition of repertoires of skills.
Language-conditioned reinforcement learning (LC-RL) approaches are great tools
in this quest, as they allow abstract goals to be expressed as sets of constraints
on the states. However, most LC-RL agents are not autonomous and cannot learn
without external instructions and feedback. Besides, their direct language
condition cannot account for the goal-directed behavior of pre-verbal infants
and strongly limits the expression of behavioral diversity for a given language
input. To resolve these issues, we propose a new conceptual approach to
language-conditioned RL: the Language-Goal-Behavior architecture (LGB). LGB
decouples skill learning and language grounding via an intermediate semantic
representation of the world. To showcase the properties of LGB, we present a
specific implementation called DECSTR. DECSTR is an intrinsically motivated
learning agent endowed with an innate semantic representation describing
spatial relations between physical objects. In a first stage (G -> B), it
freely explores its environment and targets self-generated semantic
configurations. In a second stage (L -> G), it trains a language-conditioned
goal generator to generate semantic goals that match the constraints expressed
in language-based inputs. We showcase the additional properties of LGB w.r.t.
both an end-to-end LC-RL approach and a similar approach leveraging
non-semantic, continuous intermediate representations. Intermediate semantic
representations help satisfy language commands in a diversity of ways, enable
strategy switching after a failure and facilitate language grounding.
Comment: Published at ICLR 202
Theory of molecular excitation and relaxation near a plasmonic device
The new optical concepts currently developed in the research field of plasmonics can have significant practical applications for integrated optical device miniaturization as well as for molecular sensing applications. Particularly, these new devices can offer interesting opportunities for optical addressing of quantum systems. In this article, we develop a realistic model able to explore the various functionalities of a plasmon device connected to a single fluorescing molecule. We show that this theoretical method provides a useful framework to understand how quantum and plasmonic entities interact in a small area. Thus, the fluorescence signal evolution from excitation control to relaxation control depending on the incident light power is clearly observed.