17 research outputs found
CLIC: Curriculum Learning and Imitation for object Control in non-rewarding environments
In this paper we study a new reinforcement learning setting where the
environment is non-rewarding, contains several possibly related objects of
various controllability, and where an apt agent Bob acts independently, with
non-observable intentions. We argue that this setting defines a realistic
scenario and we present a generic discrete-state discrete-action model of such
environments. To learn in this environment, we propose an unsupervised
reinforcement learning agent called CLIC for Curriculum Learning and Imitation
for Control. CLIC learns to control individual objects in its environment, and
imitates Bob's interactions with these objects. It selects objects to focus on
when training and imitating by maximizing its learning progress. We show that
CLIC is an effective baseline in our new setting. It can effectively observe
Bob to gain control of objects faster, even if Bob is not explicitly teaching.
It can also follow Bob when he acts as a mentor and provides ordered
demonstrations. Finally, when Bob controls objects that the agent cannot, or in
presence of a hierarchy between objects in the environment, we show that CLIC
ignores non-reproducible and already mastered interactions with objects,
resulting in a greater benefit from imitation
CURIOUS: Intrinsically Motivated Modular Multi-Goal Reinforcement Learning
In open-ended environments, autonomous learning agents must set their own
goals and build their own curriculum through an intrinsically motivated
exploration. They may consider a large diversity of goals, aiming to discover
what is controllable in their environments, and what is not. Because some goals
might prove easy and some impossible, agents must actively select which goal to
practice at any moment, to maximize their overall mastery on the set of
learnable goals. This paper proposes CURIOUS, an algorithm that leverages 1) a
modular Universal Value Function Approximator with hindsight learning to
achieve a diversity of goals of different kinds within a unique policy and 2)
an automated curriculum learning mechanism that biases the attention of the
agent towards goals maximizing the absolute learning progress. Agents focus
sequentially on goals of increasing complexity, and focus back on goals that
are being forgotten. Experiments conducted in a new modular-goal robotic
environment show the resulting developmental self-organization of a learning
curriculum, and demonstrate properties of robustness to distracting goals,
forgetting and changes in body properties.Comment: Accepted at ICML 201
Goal-Conditioned Reinforcement Learning with Imagined Subgoals
Goal-conditioned reinforcement learning endows an agent with a large variety
of skills, but it often struggles to solve tasks that require more temporally
extended reasoning. In this work, we propose to incorporate imagined subgoals
into policy learning to facilitate learning of complex tasks. Imagined subgoals
are predicted by a separate high-level policy, which is trained simultaneously
with the policy and its critic. This high-level policy predicts intermediate
states halfway to the goal using the value function as a reachability metric.
We don't require the policy to reach these subgoals explicitly. Instead, we use
them to define a prior policy, and incorporate this prior into a KL-constrained
policy iteration scheme to speed up and regularize learning. Imagined subgoals
are used during policy learning, but not during test time, where we only apply
the learned policy. We evaluate our approach on complex robotic navigation and
manipulation tasks and show that it outperforms existing methods by a large
margin.Comment: ICML 2021. See the project webpage at
https://www.di.ens.fr/willow/research/ris