Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning
Intrinsically motivated spontaneous exploration is a key enabler of
autonomous lifelong learning in human children. It enables the discovery and
acquisition of large repertoires of skills through self-generation,
self-selection, self-ordering and self-experimentation of learning goals. We
present an algorithmic approach called Intrinsically Motivated Goal Exploration
Processes (IMGEP) to enable similar properties of autonomous or self-supervised
learning in machines. The IMGEP algorithmic architecture relies on several
principles: 1) self-generation of goals, generalized as fitness functions; 2)
selection of goals based on intrinsic rewards; 3) exploration with incremental
goal-parameterized policy search and exploitation of the gathered data with a
batch learning algorithm; 4) systematic reuse of information acquired when
targeting a goal for improving towards other goals. We present a particularly
efficient form of IMGEP, called Modular Population-Based IMGEP, that uses a
population-based policy and an object-centered modularity in goals and
mutations. We provide several implementations of this architecture and
demonstrate their ability to automatically generate a learning curriculum
within several experimental setups including a real humanoid robot that can
explore multiple spaces of goals with several hundred continuous dimensions.
While no particular target goal is provided to the system, this curriculum
allows the discovery of skills that act as stepping stones for learning more
complex skills, e.g. nested tool use. We show that learning diverse spaces of
goals with intrinsic motivations is more efficient for learning complex skills
than trying to learn these complex skills directly.
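To make the architecture concrete, here is a minimal Python sketch of a population-based goal-exploration loop following these principles; the `rollout` and `sample_goal` callables are hypothetical stand-ins for an environment and a goal-sampling scheme, not the authors' implementation, and intrinsic-reward-based goal selection (principle 2) is left out for brevity.

```python
import random

def dist(a, b):
    """Euclidean distance between two outcome vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def imgep(rollout, sample_goal, n_bootstrap=10, n_iterations=1000,
          mutation_scale=0.1, dim=5):
    """Minimal population-based goal-exploration loop (sketch).

    rollout(theta) -> outcome reached when executing policy parameters theta.
    sample_goal()  -> a self-generated target outcome (principle 1).
    """
    # Population of (policy parameters, reached outcome) pairs.
    population = []

    # Bootstrap with random policies.
    for _ in range(n_bootstrap):
        theta = [random.gauss(0, 1) for _ in range(dim)]
        population.append((theta, rollout(theta)))

    for _ in range(n_iterations):
        goal = sample_goal()
        # Systematic reuse (principle 4): start from the past policy whose
        # outcome is closest to the current goal.
        theta, _ = min(population, key=lambda p: dist(p[1], goal))
        # Incremental goal-parameterized search (principle 3): mutate it.
        candidate = [x + random.gauss(0, mutation_scale) for x in theta]
        population.append((candidate, rollout(candidate)))
    return population

# Toy usage: the "environment" just returns the parameters as the outcome.
pop = imgep(rollout=lambda th: list(th),
            sample_goal=lambda: [random.uniform(-2, 2) for _ in range(5)])
```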
CURIOUS: Intrinsically Motivated Modular Multi-Goal Reinforcement Learning
In open-ended environments, autonomous learning agents must set their own
goals and build their own curriculum through intrinsically motivated
exploration. They may consider a large diversity of goals, aiming to discover
what is controllable in their environments, and what is not. Because some goals
might prove easy and some impossible, agents must actively select which goal to
practice at any moment, to maximize their overall mastery of the set of
learnable goals. This paper proposes CURIOUS, an algorithm that leverages 1) a
modular Universal Value Function Approximator with hindsight learning to
achieve a diversity of goals of different kinds within a unique policy and 2)
an automated curriculum learning mechanism that biases the attention of the
agent towards goals maximizing the absolute learning progress. Agents focus
sequentially on goals of increasing complexity, and focus back on goals that
are being forgotten. Experiments conducted in a new modular-goal robotic
environment show the resulting developmental self-organization of a learning
curriculum, and demonstrate properties of robustness to distracting goals,
forgetting and changes in body properties.
Comment: Accepted at ICML 2019
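As a rough illustration of the curriculum mechanism, the sketch below samples goal modules in proportion to absolute learning progress, estimated as the difference in success rate between two consecutive windows; the class name, window scheme, and epsilon mixing are illustrative assumptions, not the paper's exact implementation.

```python
import random

class LPGoalSelector:
    """Sketch of goal-module selection by absolute learning progress (LP).

    Competence per module is tracked as a recent success history; |LP| is
    estimated as the change in success rate between two consecutive
    windows. Modules are sampled proportionally to |LP|, mixed with a
    little uniform exploration.
    """
    def __init__(self, n_modules, window=20, eps=0.2):
        self.histories = [[] for _ in range(n_modules)]
        self.window = window
        self.eps = eps

    def update(self, module, success):
        h = self.histories[module]
        h.append(float(success))
        del h[:-2 * self.window]  # keep at most two windows of history

    def _abs_lp(self, h):
        if len(h) < 2 * self.window:
            return 0.0  # not enough data yet
        recent = sum(h[-self.window:]) / self.window
        older = sum(h[-2 * self.window:-self.window]) / self.window
        return abs(recent - older)

    def sample(self):
        lps = [self._abs_lp(h) for h in self.histories]
        total = sum(lps)
        if random.random() < self.eps or total == 0.0:
            return random.randrange(len(self.histories))
        r = random.uniform(0.0, total)
        for module, lp in enumerate(lps):
            r -= lp
            if r <= 0.0:
                return module
        return len(lps) - 1
```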
Progressive growing of self-organized hierarchical representations for exploration
Designing agents that can autonomously discover and learn a diversity of
structures and skills in unknown, changing environments is key for lifelong
machine learning. A central challenge is how to incrementally learn
representations in order to progressively build a map of the discovered
structures and reuse it to explore further. To address this challenge, we
identify and target several key functionalities. First, we aim to build lasting
representations and avoid catastrophic forgetting throughout the exploration
process. Secondly, we aim to learn a diversity of representations, allowing the
discovery of a "diversity of diversity" of structures (and associated skills) in
complex high-dimensional environments. Thirdly, we target representations that
can structure the agent discoveries in a coarse-to-fine manner. Finally, we
target the reuse of such representations to drive exploration toward an
"interesting" type of diversity, for instance leveraging human guidance.
Current approaches in state representation learning generally rely on
monolithic architectures which do not enable all these functionalities.
Therefore, we present a novel technique to progressively construct a Hierarchy
of Observation Latent Models for Exploration Stratification, called HOLMES.
This technique couples the use of a dynamic modular model architecture for
representation learning with intrinsically-motivated goal exploration processes
(IMGEPs). The paper shows results in the domain of automated discovery of
diverse self-organized patterns, using the experimental framework from
Reinke et al. (2019) as a testbed.
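The general idea of a progressively grown hierarchy of representation modules can be sketched as a tree whose nodes are frozen and split into specialized children; everything below (class names, the identity encoders, the scalar router) is a hypothetical toy, not the HOLMES code.

```python
class Node:
    """One representation module in a progressively grown hierarchy
    (illustrative toy, not the HOLMES implementation)."""
    def __init__(self, encoder):
        self.encoder = encoder  # e.g. a small VAE trained on routed data
        self.children = []      # sub-modules specializing on data subsets
        self.router = None      # decides which child receives an observation

    def route(self, obs):
        """Walk down the tree to the leaf responsible for obs."""
        node = self
        while node.children:
            node = node.children[node.router(obs)]
        return node

    def split(self, router, child_encoders):
        """Freeze this node and grow specialized children: the parent
        encoder is kept (guarding against catastrophic forgetting) and
        new capacity is added only where needed."""
        self.router = router
        self.children = [Node(e) for e in child_encoders]

# Toy usage with stand-in scalar encoders.
root = Node(encoder=lambda x: x)
root.split(router=lambda x: 0 if x < 0 else 1,
           child_encoders=[lambda x: -x, lambda x: x])
leaf = root.route(-3.0)    # routed to the first child
print(leaf.encoder(-3.0))  # 3.0
```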
Automatic Curriculum Learning For Deep RL: A Short Survey
Automatic Curriculum Learning (ACL) has become a cornerstone of recent
successes in Deep Reinforcement Learning (DRL). These methods shape the learning
trajectories of agents by challenging them with tasks adapted to their
capacities. In recent years, they have been used to improve sample efficiency
and asymptotic performance, to organize exploration, to encourage
generalization or to solve sparse reward problems, among others. The ambition
of this work is dual: 1) to present a compact and accessible introduction to
the Automatic Curriculum Learning literature and 2) to draw a bigger picture of
the current state of the art in ACL to encourage the cross-breeding of existing
concepts and the emergence of new ideas.
Comment: Accepted at IJCAI 2020
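A common way to frame ACL is as a teacher/student interface: the teacher proposes tasks adapted to the student's capacity and adjusts its distribution from the student's returns. The sketch below shows only this interface with a uniform baseline teacher; the class and method names are illustrative assumptions, not from any particular surveyed method.

```python
import random

class UniformTeacher:
    """Minimal teacher interface for ACL (illustrative names).

    An ACL method implements `propose` (sample a task suited to the
    student's current capacity) and `update` (observe the student's
    return on that task). This baseline simply samples uniformly.
    """
    def __init__(self, task_space):
        self.task_space = task_space

    def propose(self):
        return random.choice(self.task_space)

    def update(self, task, episodic_return):
        pass  # adaptive teachers would update their task distribution here

# Skeleton of the outer loop: the teacher shapes the curriculum.
teacher = UniformTeacher(task_space=["easy", "medium", "hard"])
for _ in range(3):
    task = teacher.propose()
    episodic_return = 0.0  # placeholder for the student's return on `task`
    teacher.update(task, episodic_return)
```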
Autotelic Agents with Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey
Building autonomous machines that can explore open-ended environments,
discover possible interactions and build repertoires of skills is a general
objective of artificial intelligence. Developmental approaches argue that this
can only be achieved by autotelic agents: intrinsically motivated learning
agents that can learn to represent, generate, select and solve their own
problems. In recent years, the convergence of developmental approaches with
deep reinforcement learning (RL) methods has been leading to the emergence of a
new field: developmental reinforcement learning. Developmental RL is concerned
with the use of deep RL algorithms to tackle a developmental problem -- the
intrinsically motivated skills acquisition problem. The self-generation of
goals requires the learning
of compact goal encodings as well as their associated goal-achievement
functions. This raises new challenges compared to standard RL algorithms
originally designed to tackle pre-defined sets of goals using external reward
signals. The present paper introduces developmental RL and proposes a
computational framework based on goal-conditioned RL to tackle the
intrinsically motivated skills acquisition problem. It proceeds to present a
typology of the various goal representations used in the literature, before
reviewing existing methods to learn to represent and prioritize goals in
autonomous systems. We finally close the paper by discussing some open
challenges in the quest for intrinsically motivated skills acquisition.
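The core object of this framework, a self-generated goal paired with its own goal-achievement function, can be sketched as follows; the names and the toy scalar goal are hypothetical, chosen only to illustrate that the reward signal is computed internally rather than supplied externally.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Goal:
    """A self-generated goal: a compact encoding paired with its own
    goal-achievement function (names are illustrative)."""
    encoding: Any
    achieved: Callable[[Any], bool]  # maps a state to success

def intrinsic_reward(goal: Goal, state: Any) -> float:
    """Reward computed internally, with no external reward signal."""
    return 1.0 if goal.achieved(state) else 0.0

# Toy usage: the goal "reach a state with value >= 5" over scalar states.
g = Goal(encoding=5.0, achieved=lambda s: s >= 5.0)
print(intrinsic_reward(g, 6.2))  # 1.0
```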
Grounding Language to Autonomously-Acquired Skills via Goal Generation
We are interested in the autonomous acquisition of repertoires of skills.
Language-conditioned reinforcement learning (LC-RL) approaches are great tools
in this quest, as they allow abstract goals to be expressed as sets of
constraints on the states. However, most LC-RL agents are not autonomous and
cannot learn without external instructions and feedback. Besides, their direct
language conditioning cannot account for the goal-directed behavior of
pre-verbal infants
and strongly limits the expression of behavioral diversity for a given language
input. To resolve these issues, we propose a new conceptual approach to
language-conditioned RL: the Language-Goal-Behavior architecture (LGB). LGB
decouples skill learning and language grounding via an intermediate semantic
representation of the world. To showcase the properties of LGB, we present a
specific implementation called DECSTR. DECSTR is an intrinsically motivated
learning agent endowed with an innate semantic representation describing
spatial relations between physical objects. In a first stage (G -> B), it
freely explores its environment and targets self-generated semantic
configurations. In a second stage (L -> G), it trains a language-conditioned
goal generator to generate semantic goals that match the constraints expressed
in language-based inputs. We showcase the additional properties of LGB w.r.t.
both an end-to-end LC-RL approach and a similar approach leveraging
non-semantic, continuous intermediate representations. Intermediate semantic
representations help satisfy language commands in a diversity of ways, enable
strategy switching after a failure and facilitate language grounding.
Comment: Published at ICLR 2021
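The decoupling can be sketched as two separately learned mappings composed at evaluation time; the semantic predicates and the instruction below are hypothetical stand-ins, not DECSTR's actual predicates or training procedure.

```python
import random

def goal_generator(instruction):
    """L -> G: map a language input to the set of compatible semantic
    goal configurations, then sample one. This one-to-many mapping is
    what permits behavioral diversity for a single command."""
    compatible = {
        "put A close to B": [("close(A,B)",),
                             ("close(A,B)", "above(A,B)")],
    }
    return random.choice(compatible[instruction])

def policy(semantic_goal, state):
    """G -> B: goal-conditioned policy acting toward the semantic
    configuration (placeholder action)."""
    return "noop"

action = policy(goal_generator("put A close to B"), state=None)
```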
Mega-Reward: Achieving Human-Level Play without Extrinsic Rewards
Intrinsic rewards were introduced to simulate how human intelligence works;
they are usually evaluated by intrinsically-motivated play, i.e., playing games
without extrinsic rewards but evaluated with extrinsic rewards. However, none
of the existing intrinsic reward approaches can achieve human-level performance
under this very challenging setting of intrinsically-motivated play. In this
work, we propose a novel megalomania-driven intrinsic reward (called
mega-reward), which, to our knowledge, is the first approach that achieves
human-level performance in intrinsically-motivated play. Intuitively,
mega-reward comes from the observation that infants' intelligence develops when
they try to gain more control over entities in an environment; therefore,
mega-reward aims to maximize the control capabilities of agents over given
entities in a given environment. To formalize mega-reward, a relational
transition model is proposed to bridge the gaps between direct and latent
control. Experimental studies show that mega-reward (i) can greatly outperform
all state-of-the-art intrinsic reward approaches, (ii) generally achieves the
same level of performance as Ex-PPO and professional human-level scores, and
(iii) also achieves superior performance when it is combined with extrinsic
rewards.
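A crude toy version of a control-based intrinsic reward, rewarding only entity changes attributable to the agent's own actions, might look like the following; this is a drastic simplification and does not implement the paper's relational transition model or latent control.

```python
def control_reward(entity_changes, attributable):
    """Toy control-based intrinsic reward: sum the magnitudes of entity
    changes that are attributable to the agent's own action."""
    return sum(delta for delta, ours in zip(entity_changes, attributable)
               if ours)

# Two entities changed; only the first change was caused by the agent.
print(control_reward(entity_changes=[1.0, 0.5],
                     attributable=[True, False]))  # -> 1.0
```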