Latent Space Policies for Hierarchical Reinforcement Learning
We address the problem of learning hierarchical deep neural network policies
for reinforcement learning. In contrast to methods that explicitly restrict or
cripple lower layers of a hierarchy to force them to use higher-level
modulating signals, each layer in our framework is trained to directly solve
the task, but acquires a range of diverse strategies via a maximum entropy
reinforcement learning objective. Each layer is also augmented with latent
random variables, which are sampled from a prior distribution during the
training of that layer. The maximum entropy objective causes these latent
variables to be incorporated into the layer's policy, and the higher level
layer can directly control the behavior of the lower layer through this latent
space. Furthermore, by constraining the mapping from latent variables to
actions to be invertible, higher layers retain full expressivity: neither the
higher layers nor the lower layers are constrained in their behavior. Our
experimental evaluation demonstrates that we can improve on the performance of
single-layer policies on standard benchmark tasks simply by adding additional
layers, and that our method can solve more complex sparse-reward tasks by
learning higher-level policies on top of high-entropy skills optimized for
simple low-level objectives.
Comment: ICML 2018. Videos: https://sites.google.com/view/latent-space-deep-rl
Code: https://github.com/haarnoja/sa
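Sketch: a minimal PyTorch illustration of the key mechanism, in which each level's
policy is an invertible (here, state-conditioned per-dimension affine) map from a
latent to its output, so the higher level acts by choosing the lower level's
latent. All module names and sizes are hypothetical assumptions; this is not the
authors' released code (linked above).

    import torch
    import torch.nn as nn

    class AffineLayerPolicy(nn.Module):
        """One hierarchy level: an invertible (state-conditioned, per-dimension
        affine) map from a latent z to this level's output."""
        def __init__(self, state_dim, act_dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh(),
                                     nn.Linear(hidden, 2 * act_dim))

        def forward(self, s, z):
            shift, log_scale = self.net(s).chunk(2, dim=-1)
            return shift + log_scale.exp() * z  # invertible in z given s

    # Two-level stack: the higher level's "action" is the lower level's latent.
    state_dim, act_dim = 8, 2
    low = AffineLayerPolicy(state_dim, act_dim)
    high = AffineLayerPolicy(state_dim, act_dim)
    s = torch.randn(1, state_dim)
    z_top = torch.randn(1, act_dim)       # sampled from the top-level prior
    action = low(s, high(s, z_top))       # compose the levels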
Disentangled Skill Embeddings for Reinforcement Learning
We propose a novel framework for multi-task reinforcement learning (MTRL).
Using a variational inference formulation, we learn policies that generalize
across both changing dynamics and goals. The resulting policies are
parametrized by shared parameters that allow for transfer between different
dynamics and goal conditions, and by task-specific latent-space embeddings that
allow for specialization to particular tasks. We show how the latent spaces
enable generalization to unseen dynamics and goal conditions. Additionally,
policies equipped with such embeddings serve as a space of skills (or options)
for hierarchical reinforcement learning. Since we can change task dynamics and
goals independently, we name our framework Disentangled Skill Embeddings (DSE).
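Sketch: a minimal PyTorch reading of DSE: a shared policy body conditioned on two
independently sampled task embeddings, one per dynamics condition and one per
goal, each with a learned Gaussian (variational) posterior. Module names and
sizes are illustrative assumptions, not the authors' code.

    import torch
    import torch.nn as nn

    class DSEPolicy(nn.Module):
        """Shared body conditioned on two disentangled task embeddings:
        one for the dynamics condition, one for the goal."""
        def __init__(self, state_dim, act_dim, emb_dim=4, n_dyn=3, n_goal=3):
            super().__init__()
            self.dyn_mu = nn.Embedding(n_dyn, emb_dim)
            self.dyn_logstd = nn.Embedding(n_dyn, emb_dim)
            self.goal_mu = nn.Embedding(n_goal, emb_dim)
            self.goal_logstd = nn.Embedding(n_goal, emb_dim)
            self.body = nn.Sequential(
                nn.Linear(state_dim + 2 * emb_dim, 64), nn.Tanh(),
                nn.Linear(64, act_dim))

        def forward(self, s, dyn_id, goal_id):
            # Reparameterized samples keep the variational objective differentiable.
            mu_d, sd_d = self.dyn_mu(dyn_id), self.dyn_logstd(dyn_id).exp()
            mu_g, sd_g = self.goal_mu(goal_id), self.goal_logstd(goal_id).exp()
            z_d = mu_d + sd_d * torch.randn_like(mu_d)
            z_g = mu_g + sd_g * torch.randn_like(mu_g)
            return self.body(torch.cat([s, z_d, z_g], dim=-1))

    policy = DSEPolicy(state_dim=10, act_dim=2)
    a = policy(torch.randn(1, 10), torch.tensor([0]), torch.tensor([2]))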
Hierarchical Policies for Cluttered-Scene Grasping with Latent Plans
6D grasping in cluttered scenes is a longstanding robotic manipulation
problem. Open-loop manipulation pipelines can fail due to modularity and error
sensitivity, while most end-to-end grasping policies with raw perception inputs
have not yet scaled to complex scenes with obstacles. In this work, we propose
a new method to close the gap through sampling and selecting plans in the
latent space. Our hierarchical framework learns collision-free target-driven
grasping based on partial point cloud observations. Our method learns an
embedding space to represent expert grasping plans and a variational
autoencoder to sample diverse latent plans at inference time. Furthermore, we
train a latent plan critic for plan selection and an option classifier for
switching to an instance grasping policy through hierarchical reinforcement
learning. We evaluate and analyze our method and compare against several
baselines in simulation, and demonstrate that the latent planning can
generalize to the real-world cluttered-scene grasping task. Our videos and code
can be found at https://sites.google.com/view/latent-grasping
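Sketch: the inference-time plan selection described above, with nn.Linear
stand-ins for the learned VAE decoder and latent-plan critic: sample diverse
latent plans from the prior, score them, and keep the best. Every name and
dimension here is a hypothetical placeholder.

    import torch
    import torch.nn as nn

    plan_dim, latent_dim, obs_dim = 16, 8, 32
    decoder = nn.Linear(latent_dim + obs_dim, plan_dim)  # stand-in VAE decoder
    critic = nn.Linear(plan_dim + obs_dim, 1)            # stand-in plan critic

    def select_plan(obs, n_samples=64):
        """Sample diverse latent plans from the prior; keep the best-scored one."""
        z = torch.randn(n_samples, latent_dim)
        obs_rep = obs.expand(n_samples, -1)
        plans = decoder(torch.cat([z, obs_rep], dim=-1))
        scores = critic(torch.cat([plans, obs_rep], dim=-1)).squeeze(-1)
        return plans[scores.argmax()]

    best_plan = select_plan(torch.randn(1, obs_dim))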
Hierarchical Reinforcement Learning for Quadruped Locomotion
Legged locomotion is a challenging task for learning algorithms, especially
when the task requires a diverse set of primitive behaviors. To solve these
problems, we introduce a hierarchical framework to automatically decompose
complex locomotion tasks. A high-level policy issues commands in a latent space
and also selects for how long the low-level policy will execute the latent
command. Concurrently, the low-level policy uses the latent command and only
the robot's on-board sensors to control the robot's actuators. Our approach
allows the high-level policy to run at a lower frequency than the low-level
one. We test our framework on a path-following task for a dynamic quadruped
robot and we show that steering behaviors automatically emerge in the latent
command space as low-level skills are needed for this task. We then show
efficient adaptation of the trained policy to a different task by transfer of
the trained low-level policy. Finally, we validate the policies on a real
quadruped robot. To the best of our knowledge, this is the first application of
end-to-end hierarchical learning to a real robotic locomotion task.
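Sketch: the timing structure of the hierarchy, in which the high-level policy
emits a latent command plus a duration, and the low-level policy consumes that
command together with on-board sensing at every control tick. The functions
below are toy stand-ins, not learned policies.

    import numpy as np

    rng = np.random.default_rng(0)

    def high_level(rng):
        latent = rng.normal(size=4)      # latent command
        duration = rng.integers(5, 20)   # how long the low level holds it
        return latent, duration

    def low_level(proprio, latent):
        # Toy mapping from on-board sensing plus the latent command to motors.
        return np.tanh(proprio[:12] + latent.sum())

    proprio = np.zeros(24)
    for _ in range(3):                   # high level runs at a lower frequency
        latent, duration = high_level(rng)
        for _ in range(duration):        # low level runs at every control tick
            action = low_level(proprio, latent)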
Hierarchical Policy Learning is Sensitive to Goal Space Design
Hierarchy in reinforcement learning agents allows for control at multiple
time scales yielding improved sample efficiency, the ability to deal with long
time horizons and transferability of sub-policies to tasks outside the training
distribution. It is often implemented as a master policy providing goals to a
sub-policy. Ideally, we would like the goal-spaces to be learned, however,
properties of optimal goal spaces still remain unknown and consequently there
is no method yet to learn optimal goal spaces. Motivated by this, we
systematically analyze how various modifications to the ground-truth goal-space
affect learning in hierarchical models with the aim of identifying important
properties of optimal goal spaces. Our results show that, while rotating the
ground-truth goal space and adding noise had no effect, adding unnecessary
factors significantly impaired learning in hierarchical models.
Comment: Accepted for presentation at the Task-Agnostic Reinforcement Learning
(TARL) workshop at ICLR'1
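Sketch: the three goal-space manipulations the study describes (rotation,
additive noise, appended irrelevant factors), in NumPy, assuming a toy 2-D
ground-truth goal space; variable names are illustrative only.

    import numpy as np

    rng = np.random.default_rng(0)
    g = rng.normal(size=(100, 2))                  # ground-truth goals

    theta = np.pi / 4
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta), np.cos(theta)]])
    g_rotated = g @ R.T                            # rotation: reported no effect
    g_noisy = g + 0.1 * rng.normal(size=g.shape)   # noise: reported no effect
    g_padded = np.hstack([g, rng.normal(size=(100, 3))])  # extra factors: impaired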
Bayesian methods for knowledge transfer and policy search in reinforcement learning
Graduation date: 2013
How can an agent generalize its knowledge to new circumstances? To learn
effectively, an agent acting in a sequential decision problem must make
intelligent action-selection choices based on its available knowledge. This
dissertation focuses on Bayesian methods of representing learned knowledge and
develops novel algorithms that exploit the represented knowledge when selecting
actions.

Our first contribution introduces the multi-task Reinforcement Learning
setting, in which an agent solves a sequence of tasks. An agent equipped with
knowledge of the relationship between tasks can transfer knowledge between
them. We propose the transfer of two distinct types of knowledge: knowledge of
domain models and knowledge of policies. To represent the transferable
knowledge, we propose hierarchical Bayesian priors on domain models and
policies, respectively. To transfer domain model knowledge, we introduce a new
algorithm for model-based Bayesian Reinforcement Learning in the multi-task
setting which exploits the learned hierarchical Bayesian model to improve
exploration in related tasks. To transfer policy knowledge, we introduce a new
policy search algorithm that accepts a policy prior as input and uses the prior
to bias policy search. A specific implementation of this algorithm is developed
that accepts a hierarchical policy prior. The algorithm learns the hierarchical
structure and reuses components of the structure in related tasks.

Our second contribution addresses the basic problem of generalizing knowledge
gained from previously executed policies. Bayesian Optimization is a method of
exploiting a prior model of an objective function to quickly identify the point
maximizing the modeled objective. Successful use of Bayesian Optimization in
Reinforcement Learning requires a model relating policies and their
performance. Given such a model, Bayesian Optimization can be applied to search
for an optimal policy. Early work using Bayesian Optimization in the
Reinforcement Learning setting ignored the sequential nature of the underlying
decision problem. The work presented in this thesis explicitly addresses this
problem. We construct new Bayesian models that take advantage of sequence
information to better generalize knowledge across policies. We empirically
evaluate the value of this approach in a variety of Reinforcement Learning
benchmark problems. Experiments show that our method significantly reduces the
amount of exploration required to identify the optimal policy.

Our final contribution is a new framework for learning parametric policies from
queries presented to an expert. In many domains it is difficult to provide
expert demonstrations of desired policies. However, it may still be a simple
matter for an expert to identify good and bad performance. To take advantage of
this limited expert knowledge, our agent presents experts with pairs of
demonstrations and asks which of the demonstrations best represents a latent
target behavior. The goal is to use a small number of queries to elicit the
latent behavior from the expert. We formulate a Bayesian model of the querying
process, an inference procedure that estimates the posterior distribution over
the latent policy space, and an active procedure for selecting new queries for
presentation to the expert. We show, in multiple domains, that the algorithm
successfully learns the target policy and that the active learning strategy
generally improves the speed of learning.
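Sketch: a generic Gaussian-process Bayesian optimization loop over policy
parameters, the baseline the dissertation builds on; the thesis's sequence-aware
models would replace the plain RBF kernel used here. evaluate_policy is a
stand-in for a rollout return, and all names and constants are illustrative
assumptions.

    import numpy as np

    def rbf(A, B, ls=0.5):
        d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d / (2 * ls ** 2))

    def gp_posterior(X, y, Xs, noise=1e-4):
        K_inv = np.linalg.inv(rbf(X, X) + noise * np.eye(len(X)))
        Ks = rbf(X, Xs)
        mu = Ks.T @ K_inv @ y
        var = rbf(Xs, Xs).diagonal() - np.einsum('ij,jk,ki->i', Ks.T, K_inv, Ks)
        return mu, np.maximum(var, 1e-12)

    def evaluate_policy(theta):          # stand-in for a policy rollout return
        return -np.sum((theta - 0.3) ** 2)

    # Fit a GP to observed returns, pick the upper-confidence-bound maximizer,
    # evaluate it, and repeat.
    X = np.random.uniform(-1, 1, size=(3, 2))
    y = np.array([evaluate_policy(t) for t in X])
    for _ in range(10):
        cand = np.random.uniform(-1, 1, size=(256, 2))
        mu, var = gp_posterior(X, y, cand)
        pick = cand[np.argmax(mu + 2.0 * np.sqrt(var))]
        X, y = np.vstack([X, pick]), np.append(y, evaluate_policy(pick))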
Learning Actionable Representations with Goal-Conditioned Policies
Representation learning is a central challenge across a range of machine
learning areas. In reinforcement learning, effective and functional
representations have the potential to tremendously accelerate learning progress
and solve more challenging problems. Most prior work on representation learning
has focused on generative approaches, learning representations that capture all
underlying factors of variation in the observation space in a more disentangled
or well-ordered manner. In this paper, we instead aim to learn functionally
salient representations: representations that are not necessarily complete in
terms of capturing all factors of variation in the observation space, but
rather aim to capture those factors of variation that are important for
decision making -- that are "actionable." These representations are aware of
the dynamics of the environment, and capture only the elements of the
observation that are necessary for decision making rather than all factors of
variation, without explicit reconstruction of the observation. We show how
these representations can be useful to improve exploration for sparse reward
problems, to enable long horizon hierarchical reinforcement learning, and as a
state representation for learning policies for downstream tasks. We evaluate
our method on a number of simulated environments, and compare it to prior
methods for representation learning, exploration, and hierarchical
reinforcement learning.
Comment: To be presented at ICLR 201
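Sketch: one way to read the "actionable" objective: learn an encoder whose
distances mirror how differently a pretrained, frozen goal-conditioned policy
acts under two goals. This is an illustrative PyTorch interpretation, not the
paper's exact loss; all names and dimensions are hypothetical.

    import torch
    import torch.nn as nn

    state_dim, act_dim, rep_dim = 10, 4, 3
    encoder = nn.Linear(state_dim, rep_dim)        # representation to learn
    policy = nn.Linear(2 * state_dim, act_dim)     # frozen stand-in for pi(a|s,g)

    def actionable_loss(s, g1, g2):
        """Match representation distance between two goals to how differently
        the goal-conditioned policy acts under them."""
        with torch.no_grad():
            a1 = policy(torch.cat([s, g1], dim=-1))
            a2 = policy(torch.cat([s, g2], dim=-1))
        action_gap = (a1 - a2).norm(dim=-1)
        rep_gap = (encoder(g1) - encoder(g2)).norm(dim=-1)
        return ((rep_gap - action_gap) ** 2).mean()

    s, g1, g2 = (torch.randn(32, state_dim) for _ in range(3))
    actionable_loss(s, g1, g2).backward()          # trains only the encoder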
MCP: Learning Composable Hierarchical Control with Multiplicative Compositional Policies
Humans are able to perform a myriad of sophisticated tasks by drawing upon
skills acquired through prior experience. For autonomous agents to have this
capability, they must be able to extract reusable skills from past experience
that can be recombined in new ways for subsequent tasks. Furthermore, when
controlling complex high-dimensional morphologies, such as humanoid bodies,
tasks often require coordination of multiple skills simultaneously. Learning
discrete primitives for every combination of skills quickly becomes
prohibitive. Composable primitives that can be recombined to create a large
variety of behaviors can be more suitable for modeling this combinatorial
explosion. In this work, we propose multiplicative compositional policies
(MCP), a method for learning reusable motor skills that can be composed to
produce a range of complex behaviors. Our method factorizes an agent's skills
into a collection of primitives, where multiple primitives can be activated
simultaneously via multiplicative composition. This flexibility allows the
primitives to be transferred and recombined to elicit new behaviors as
necessary for novel tasks. We demonstrate that MCP is able to extract
composable skills for highly complex simulated characters from pre-training
tasks, such as motion imitation, and then reuse these skills to solve
challenging continuous control tasks, such as dribbling a soccer ball to a
goal, and picking up an object and transporting it to a target location.
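Sketch: the multiplicative composition at the heart of MCP. A weighted product
of Gaussian primitives, pi(a|s,g) proportional to prod_i N(mu_i, sigma_i)^{w_i(s,g)},
is again Gaussian with precision-weighted mean, so composition reduces to a few
tensor operations. The shapes and values below are illustrative.

    import torch

    def mcp_compose(mus, sigmas, w):
        """Weighted product of Gaussian primitives; per action dimension the
        result is Gaussian with precision sum_i w_i / sigma_i^2 and a
        precision-weighted mean. mus, sigmas: (n_primitives, act_dim);
        w: (n_primitives,)."""
        prec = w[:, None] / sigmas ** 2
        var = 1.0 / prec.sum(dim=0)
        mu = var * (prec * mus).sum(dim=0)
        return mu, var.sqrt()

    mus = torch.tensor([[1.0, 0.0], [0.0, 1.0]])   # two primitives, 2-D actions
    sigmas = torch.ones(2, 2)
    w = torch.tensor([0.8, 0.2])                   # gating weights w_i(s, g)
    mu, sigma = mcp_compose(mus, sigmas, w)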
Scaling simulation-to-real transfer by learning composable robot skills
We present a novel solution to the problem of simulation-to-real transfer,
which builds on recent advances in robot skill decomposition. Rather than
focusing on minimizing the simulation-reality gap, we learn a set of diverse
policies that are parameterized in a way that makes them easily reusable. This
diversity and parameterization of low-level skills allows us to find a
transferable policy that is able to use combinations and variations of
different skills to solve more complex, high-level tasks. In particular, we
first use simulation to jointly learn a policy for a set of low-level skills,
and a "skill embedding" parameterization which can be used to compose them.
Later, we learn high-level policies which actuate the low-level policies via
this skill embedding parameterization. The high-level policies encode how and
when to reuse the low-level skills together to achieve specific high-level
tasks. Importantly, our method learns to control a real robot in joint-space to
achieve these high-level tasks with little or no on-robot time, despite the
fact that the low-level policies may not be perfectly transferable from
simulation to real, and that the low-level skills were not trained on any
examples of high-level tasks. We illustrate the principles of our method using
informative simulation experiments. We then verify its usefulness for real
robotics problems by learning, transferring, and composing free-space and
contact motion skills on a Sawyer robot using only joint-space control. We
experiment with several techniques for composing pre-learned skills, and find
that our method allows us to use both learning-based approaches and efficient
search-based planning to achieve high-level tasks using only pre-learned
skills.
Comment: Presented at ISER 2018. See
https://www.youtube.com/watch?v=Syr2RQTHqTs for supplemental video.
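Sketch: the two-level structure described above, assuming hypothetical modules:
a low-level policy conditioned on a skill embedding z learned in simulation,
and a high-level policy that acts purely through the embedding space.

    import torch
    import torch.nn as nn

    obs_dim, act_dim, emb_dim, n_skills = 12, 7, 4, 5
    skill_table = nn.Embedding(n_skills, emb_dim)       # per-skill embeddings
    low_level = nn.Sequential(nn.Linear(obs_dim + emb_dim, 64), nn.Tanh(),
                              nn.Linear(64, act_dim))   # joint-space skill policy
    high_level = nn.Linear(obs_dim, emb_dim)            # acts via the embedding

    obs = torch.randn(1, obs_dim)
    z_single = skill_table(torch.tensor([2]))           # one pre-learned skill
    z = high_level(obs)                                 # composed/searched embedding
    joint_action = low_level(torch.cat([obs, z], dim=-1))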
FeUdal Networks for Hierarchical Reinforcement Learning
We introduce FeUdal Networks (FuNs): a novel architecture for hierarchical
reinforcement learning. Our approach is inspired by the feudal reinforcement
learning proposal of Dayan and Hinton, and gains power and efficacy by
decoupling end-to-end learning across multiple levels -- allowing it to utilise
different resolutions of time. Our framework employs a Manager module and a
Worker module. The Manager operates at a lower temporal resolution and sets
abstract goals which are conveyed to and enacted by the Worker. The Worker
generates primitive actions at every tick of the environment. The decoupled
structure of FuN conveys several benefits -- in addition to facilitating very
long timescale credit assignment it also encourages the emergence of
sub-policies associated with different goals set by the Manager. These
properties allow FuN to dramatically outperform a strong baseline agent on
tasks that involve long-term credit assignment or memorisation. We demonstrate
the performance of our proposed system on a range of tasks from the ATARI suite
and also from a 3D DeepMind Lab environment.
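Sketch: a simplified, single-horizon version of the Worker's intrinsic reward
in FuN, scoring how well the change in the latent state over c steps aligns
(by cosine similarity) with the Manager's goal direction; the paper averages
over multiple offsets, and the tensors here are placeholders.

    import torch
    import torch.nn.functional as F

    def worker_intrinsic_reward(states, goals, c=10):
        """Cosine similarity between the latent-state change over horizon c
        and the Manager's goal direction. states, goals: (T, d)."""
        T = states.shape[0] - c
        delta = states[c:] - states[:T]
        return F.cosine_similarity(delta, goals[:T], dim=-1)

    states, goals = torch.randn(50, 16), torch.randn(50, 16)
    r_intrinsic = worker_intrinsic_reward(states, goals)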