Latent Space Policies for Hierarchical Reinforcement Learning
We address the problem of learning hierarchical deep neural network policies
for reinforcement learning. In contrast to methods that explicitly restrict or
cripple lower layers of a hierarchy to force them to use higher-level
modulating signals, each layer in our framework is trained to directly solve
the task, but acquires a range of diverse strategies via a maximum entropy
reinforcement learning objective. Each layer is also augmented with latent
random variables, which are sampled from a prior distribution during the
training of that layer. The maximum entropy objective causes these latent
variables to be incorporated into the layer's policy, and the higher level
layer can directly control the behavior of the lower layer through this latent
space. Furthermore, by constraining the mapping from latent variables to
actions to be invertible, higher layers retain full expressivity: neither the
higher layers nor the lower layers are constrained in their behavior. Our
experimental evaluation demonstrates that we can improve on the performance of
single-layer policies on standard benchmark tasks simply by adding additional
layers, and that our method can solve more complex sparse-reward tasks by
learning higher-level policies on top of high-entropy skills optimized for
simple low-level objectives.
Comment: ICML 2018. Videos: https://sites.google.com/view/latent-space-deep-rl
Code: https://github.com/haarnoja/sa
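Sketch: a minimal PyTorch illustration of the key mechanism, in which each level's
policy is an invertible (here, state-conditioned per-dimension affine) map from a
latent to its output, so the higher level acts by choosing the lower level's
latent. All module names and sizes are hypothetical assumptions; this is not the
authors' released code (linked above).

    import torch
    import torch.nn as nn

    class AffineLayerPolicy(nn.Module):
        """One hierarchy level: an invertible (state-conditioned, per-dimension
        affine) map from a latent z to this level's output."""
        def __init__(self, state_dim, act_dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh(),
                                     nn.Linear(hidden, 2 * act_dim))

        def forward(self, s, z):
            shift, log_scale = self.net(s).chunk(2, dim=-1)
            return shift + log_scale.exp() * z  # invertible in z given s

    # Two-level stack: the higher level's "action" is the lower level's latent.
    state_dim, act_dim = 8, 2
    low = AffineLayerPolicy(state_dim, act_dim)
    high = AffineLayerPolicy(state_dim, act_dim)
    s = torch.randn(1, state_dim)
    z_top = torch.randn(1, act_dim)       # sampled from the top-level prior
    action = low(s, high(s, z_top))       # compose the levels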
Disentangled Skill Embeddings for Reinforcement Learning
We propose a novel framework for multi-task reinforcement learning (MTRL).
Using a variational inference formulation, we learn policies that generalize
across both changing dynamics and goals. The resulting policies are
parametrized by shared parameters that allow for transfer between different
dynamics and goal conditions, and by task-specific latent-space embeddings that
allow for specialization to particular tasks. We show how the latent spaces
enable generalization to unseen dynamics and goal conditions. Additionally,
policies equipped with such embeddings serve as a space of skills (or options)
for hierarchical reinforcement learning. Since we can change task dynamics and
goals independently, we name our framework Disentangled Skill Embeddings (DSE).
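Sketch: a minimal PyTorch reading of DSE: a shared policy body conditioned on two
independently sampled task embeddings, one per dynamics condition and one per
goal, each with a learned Gaussian (variational) posterior. Module names and
sizes are illustrative assumptions, not the authors' code.

    import torch
    import torch.nn as nn

    class DSEPolicy(nn.Module):
        """Shared body conditioned on two disentangled task embeddings:
        one for the dynamics condition, one for the goal."""
        def __init__(self, state_dim, act_dim, emb_dim=4, n_dyn=3, n_goal=3):
            super().__init__()
            self.dyn_mu = nn.Embedding(n_dyn, emb_dim)
            self.dyn_logstd = nn.Embedding(n_dyn, emb_dim)
            self.goal_mu = nn.Embedding(n_goal, emb_dim)
            self.goal_logstd = nn.Embedding(n_goal, emb_dim)
            self.body = nn.Sequential(
                nn.Linear(state_dim + 2 * emb_dim, 64), nn.Tanh(),
                nn.Linear(64, act_dim))

        def forward(self, s, dyn_id, goal_id):
            # Reparameterized samples keep the variational objective differentiable.
            mu_d, sd_d = self.dyn_mu(dyn_id), self.dyn_logstd(dyn_id).exp()
            mu_g, sd_g = self.goal_mu(goal_id), self.goal_logstd(goal_id).exp()
            z_d = mu_d + sd_d * torch.randn_like(mu_d)
            z_g = mu_g + sd_g * torch.randn_like(mu_g)
            return self.body(torch.cat([s, z_d, z_g], dim=-1))

    policy = DSEPolicy(state_dim=10, act_dim=2)
    a = policy(torch.randn(1, 10), torch.tensor([0]), torch.tensor([2]))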
Hierarchical Policies for Cluttered-Scene Grasping with Latent Plans
6D grasping in cluttered scenes is a longstanding robotic manipulation
problem. Open-loop manipulation pipelines can fail due to modularity and error
sensitivity, while most end-to-end grasping policies with raw perception inputs
have not yet scaled to complex scenes with obstacles. In this work, we propose
a new method to close the gap through sampling and selecting plans in the
latent space. Our hierarchical framework learns collision-free target-driven
grasping based on partial point cloud observations. Our method learns an
embedding space to represent expert grasping plans and a variational
autoencoder to sample diverse latent plans at inference time. Furthermore, we
train a latent plan critic for plan selection and an option classifier for
switching to an instance grasping policy through hierarchical reinforcement
learning. We evaluate and analyze our method and compare against several
baselines in simulation, and demonstrate that the latent planning can
generalize to the real-world cluttered-scene grasping task. Our videos and code
can be found at https://sites.google.com/view/latent-grasping
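Sketch: the inference-time plan selection described above, with nn.Linear
stand-ins for the learned VAE decoder and latent-plan critic: sample diverse
latent plans from the prior, score them, and keep the best. Every name and
dimension here is a hypothetical placeholder.

    import torch
    import torch.nn as nn

    plan_dim, latent_dim, obs_dim = 16, 8, 32
    decoder = nn.Linear(latent_dim + obs_dim, plan_dim)  # stand-in VAE decoder
    critic = nn.Linear(plan_dim + obs_dim, 1)            # stand-in plan critic

    def select_plan(obs, n_samples=64):
        """Sample diverse latent plans from the prior; keep the best-scored one."""
        z = torch.randn(n_samples, latent_dim)
        obs_rep = obs.expand(n_samples, -1)
        plans = decoder(torch.cat([z, obs_rep], dim=-1))
        scores = critic(torch.cat([plans, obs_rep], dim=-1)).squeeze(-1)
        return plans[scores.argmax()]

    best_plan = select_plan(torch.randn(1, obs_dim))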
Hierarchical Reinforcement Learning for Quadruped Locomotion
Legged locomotion is a challenging task for learning algorithms, especially
when the task requires a diverse set of primitive behaviors. To solve these
problems, we introduce a hierarchical framework to automatically decompose
complex locomotion tasks. A high-level policy issues commands in a latent space
and also selects for how long the low-level policy will execute the latent
command. Concurrently, the low-level policy uses the latent command and only
the robot's on-board sensors to control the robot's actuators. Our approach
allows the high-level policy to run at a lower frequency than the low-level
one. We test our framework on a path-following task for a dynamic quadruped
robot and we show that steering behaviors automatically emerge in the latent
command space as low-level skills are needed for this task. We then show
efficient adaptation of the trained policy to a different task by transfer of
the trained low-level policy. Finally, we validate the policies on a real
quadruped robot. To the best of our knowledge, this is the first application of
end-to-end hierarchical learning to a real robotic locomotion task.
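Sketch: the timing structure of the hierarchy, in which the high-level policy
emits a latent command plus a duration, and the low-level policy consumes that
command together with on-board sensing at every control tick. The functions
below are toy stand-ins, not learned policies.

    import numpy as np

    rng = np.random.default_rng(0)

    def high_level(rng):
        latent = rng.normal(size=4)      # latent command
        duration = rng.integers(5, 20)   # how long the low level holds it
        return latent, duration

    def low_level(proprio, latent):
        # Toy mapping from on-board sensing plus the latent command to motors.
        return np.tanh(proprio[:12] + latent.sum())

    proprio = np.zeros(24)
    for _ in range(3):                   # high level runs at a lower frequency
        latent, duration = high_level(rng)
        for _ in range(duration):        # low level runs at every control tick
            action = low_level(proprio, latent)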
Hierarchical Policy Learning is Sensitive to Goal Space Design
Hierarchy in reinforcement learning agents allows for control at multiple
time scales yielding improved sample efficiency, the ability to deal with long
time horizons and transferability of sub-policies to tasks outside the training
distribution. It is often implemented as a master policy providing goals to a
sub-policy. Ideally, we would like the goal-spaces to be learned, however,
properties of optimal goal spaces still remain unknown and consequently there
is no method yet to learn optimal goal spaces. Motivated by this, we
systematically analyze how various modifications to the ground-truth goal-space
affect learning in hierarchical models with the aim of identifying important
properties of optimal goal spaces. Our results show that, while rotating the
ground-truth goal space and adding noise had no effect, adding unnecessary
factors significantly impaired learning in hierarchical models.
Comment: Accepted for presentation at the Task-Agnostic Reinforcement Learning
(TARL) workshop at ICLR'1
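Sketch: the three goal-space manipulations the study describes (rotation,
additive noise, appended irrelevant factors), in NumPy, assuming a toy 2-D
ground-truth goal space; variable names are illustrative only.

    import numpy as np

    rng = np.random.default_rng(0)
    g = rng.normal(size=(100, 2))                  # ground-truth goals

    theta = np.pi / 4
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta), np.cos(theta)]])
    g_rotated = g @ R.T                            # rotation: reported no effect
    g_noisy = g + 0.1 * rng.normal(size=g.shape)   # noise: reported no effect
    g_padded = np.hstack([g, rng.normal(size=(100, 3))])  # extra factors: impaired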
Bayesian methods for knowledge transfer and policy search in reinforcement learning
Graduation date: 2013
How can an agent generalize its knowledge to new circumstances? To learn
effectively, an agent acting in a sequential decision problem must make
intelligent action-selection choices based on its available knowledge. This
dissertation focuses on Bayesian methods of representing learned knowledge and
develops novel algorithms that exploit the represented knowledge when selecting
actions.

Our first contribution introduces the multi-task Reinforcement Learning
setting, in which an agent solves a sequence of tasks. An agent equipped with
knowledge of the relationship between tasks can transfer knowledge between
them. We propose the transfer of two distinct types of knowledge: knowledge of
domain models and knowledge of policies. To represent the transferable
knowledge, we propose hierarchical Bayesian priors on domain models and
policies, respectively. To transfer domain model knowledge, we introduce a new
algorithm for model-based Bayesian Reinforcement Learning in the multi-task
setting which exploits the learned hierarchical Bayesian model to improve
exploration in related tasks. To transfer policy knowledge, we introduce a new
policy search algorithm that accepts a policy prior as input and uses the prior
to bias policy search. A specific implementation of this algorithm is developed
that accepts a hierarchical policy prior. The algorithm learns the hierarchical
structure and reuses components of the structure in related tasks.

Our second contribution addresses the basic problem of generalizing knowledge
gained from previously executed policies. Bayesian Optimization is a method of
exploiting a prior model of an objective function to quickly identify the point
maximizing the modeled objective. Successful use of Bayesian Optimization in
Reinforcement Learning requires a model relating policies and their
performance. Given such a model, Bayesian Optimization can be applied to search
for an optimal policy. Early work using Bayesian Optimization in the
Reinforcement Learning setting ignored the sequential nature of the underlying
decision problem. The work presented in this thesis explicitly addresses this
problem. We construct new Bayesian models that take advantage of sequence
information to better generalize knowledge across policies. We empirically
evaluate the value of this approach in a variety of Reinforcement Learning
benchmark problems. Experiments show that our method significantly reduces the
amount of exploration required to identify the optimal policy.

Our final contribution is a new framework for learning parametric policies from
queries presented to an expert. In many domains it is difficult to provide
expert demonstrations of desired policies. However, it may still be a simple
matter for an expert to identify good and bad performance. To take advantage of
this limited expert knowledge, our agent presents experts with pairs of
demonstrations and asks which of the demonstrations best represents a latent
target behavior. The goal is to use a small number of queries to elicit the
latent behavior from the expert. We formulate a Bayesian model of the querying
process, an inference procedure that estimates the posterior distribution over
the latent policy space, and an active procedure for selecting new queries for
presentation to the expert. We show, in multiple domains, that the algorithm
successfully learns the target policy and that the active learning strategy
generally improves the speed of learning.
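Sketch: a generic Gaussian-process Bayesian optimization loop over policy
parameters, the baseline the dissertation builds on; the thesis's sequence-aware
models would replace the plain RBF kernel used here. evaluate_policy is a
stand-in for a rollout return, and all names and constants are illustrative
assumptions.

    import numpy as np

    def rbf(A, B, ls=0.5):
        d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d / (2 * ls ** 2))

    def gp_posterior(X, y, Xs, noise=1e-4):
        K_inv = np.linalg.inv(rbf(X, X) + noise * np.eye(len(X)))
        Ks = rbf(X, Xs)
        mu = Ks.T @ K_inv @ y
        var = rbf(Xs, Xs).diagonal() - np.einsum('ij,jk,ki->i', Ks.T, K_inv, Ks)
        return mu, np.maximum(var, 1e-12)

    def evaluate_policy(theta):          # stand-in for a policy rollout return
        return -np.sum((theta - 0.3) ** 2)

    # Fit a GP to observed returns, pick the upper-confidence-bound maximizer,
    # evaluate it, and repeat.
    X = np.random.uniform(-1, 1, size=(3, 2))
    y = np.array([evaluate_policy(t) for t in X])
    for _ in range(10):
        cand = np.random.uniform(-1, 1, size=(256, 2))
        mu, var = gp_posterior(X, y, cand)
        pick = cand[np.argmax(mu + 2.0 * np.sqrt(var))]
        X, y = np.vstack([X, pick]), np.append(y, evaluate_policy(pick))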
Learning Actionable Representations with Goal-Conditioned Policies
Representation learning is a central challenge across a range of machine
learning areas. In reinforcement learning, effective and functional
representations have the potential to tremendously accelerate learning progress
and solve more challenging problems. Most prior work on representation learning
has focused on generative approaches, learning representations that capture all
underlying factors of variation in the observation space in a more disentangled
or well-ordered manner. In this paper, we instead aim to learn functionally
salient representations: representations that are not necessarily complete in
terms of capturing all factors of variation in the observation space, but
rather aim to capture those factors of variation that are important for
decision making -- that are "actionable." These representations are aware of
the dynamics of the environment, and capture only the elements of the
observation that are necessary for decision making rather than all factors of
variation, without explicit reconstruction of the observation. We show how
these representations can be useful to improve exploration for sparse reward
problems, to enable long horizon hierarchical reinforcement learning, and as a
state representation for learning policies for downstream tasks. We evaluate
our method on a number of simulated environments, and compare it to prior
methods for representation learning, exploration, and hierarchical
reinforcement learning.
Comment: To be presented at ICLR 201
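Sketch: one way to read the "actionable" objective: learn an encoder whose
distances mirror how differently a pretrained, frozen goal-conditioned policy
acts under two goals. This is an illustrative PyTorch interpretation, not the
paper's exact loss; all names and dimensions are hypothetical.

    import torch
    import torch.nn as nn

    state_dim, act_dim, rep_dim = 10, 4, 3
    encoder = nn.Linear(state_dim, rep_dim)        # representation to learn
    policy = nn.Linear(2 * state_dim, act_dim)     # frozen stand-in for pi(a|s,g)

    def actionable_loss(s, g1, g2):
        """Match representation distance between two goals to how differently
        the goal-conditioned policy acts under them."""
        with torch.no_grad():
            a1 = policy(torch.cat([s, g1], dim=-1))
            a2 = policy(torch.cat([s, g2], dim=-1))
        action_gap = (a1 - a2).norm(dim=-1)
        rep_gap = (encoder(g1) - encoder(g2)).norm(dim=-1)
        return ((rep_gap - action_gap) ** 2).mean()

    s, g1, g2 = (torch.randn(32, state_dim) for _ in range(3))
    actionable_loss(s, g1, g2).backward()          # trains only the encoder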
MCP: Learning Composable Hierarchical Control with Multiplicative Compositional Policies
Humans are able to perform a myriad of sophisticated tasks by drawing upon
skills acquired through prior experience. For autonomous agents to have this
capability, they must be able to extract reusable skills from past experience
that can be recombined in new ways for subsequent tasks. Furthermore, when
controlling complex high-dimensional morphologies, such as humanoid bodies,
tasks often require coordination of multiple skills simultaneously. Learning
discrete primitives for every combination of skills quickly becomes
prohibitive. Composable primitives that can be recombined to create a large
variety of behaviors can be more suitable for modeling this combinatorial
explosion. In this work, we propose multiplicative compositional policies
(MCP), a method for learning reusable motor skills that can be composed to
produce a range of complex behaviors. Our method factorizes an agent's skills
into a collection of primitives, where multiple primitives can be activated
simultaneously via multiplicative composition. This flexibility allows the
primitives to be transferred and recombined to elicit new behaviors as
necessary for novel tasks. We demonstrate that MCP is able to extract
composable skills for highly complex simulated characters from pre-training
tasks, such as motion imitation, and then reuse these skills to solve
challenging continuous control tasks, such as dribbling a soccer ball to a
goal, and picking up an object and transporting it to a target location.
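Sketch: the multiplicative composition at the heart of MCP. A weighted product
of Gaussian primitives, pi(a|s,g) proportional to prod_i N(mu_i, sigma_i)^{w_i(s,g)},
is again Gaussian with precision-weighted mean, so composition reduces to a few
tensor operations. The shapes and values below are illustrative.

    import torch

    def mcp_compose(mus, sigmas, w):
        """Weighted product of Gaussian primitives; per action dimension the
        result is Gaussian with precision sum_i w_i / sigma_i^2 and a
        precision-weighted mean. mus, sigmas: (n_primitives, act_dim);
        w: (n_primitives,)."""
        prec = w[:, None] / sigmas ** 2
        var = 1.0 / prec.sum(dim=0)
        mu = var * (prec * mus).sum(dim=0)
        return mu, var.sqrt()

    mus = torch.tensor([[1.0, 0.0], [0.0, 1.0]])   # two primitives, 2-D actions
    sigmas = torch.ones(2, 2)
    w = torch.tensor([0.8, 0.2])                   # gating weights w_i(s, g)
    mu, sigma = mcp_compose(mus, sigmas, w)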
Scaling simulation-to-real transfer by learning composable robot skills
We present a novel solution to the problem of simulation-to-real transfer,
which builds on recent advances in robot skill decomposition. Rather than
focusing on minimizing the simulation-reality gap, we learn a set of diverse
policies that are parameterized in a way that makes them easily reusable. This
diversity and parameterization of low-level skills allows us to find a
transferable policy that is able to use combinations and variations of
different skills to solve more complex, high-level tasks. In particular, we
first use simulation to jointly learn a policy for a set of low-level skills,
and a "skill embedding" parameterization which can be used to compose them.
Later, we learn high-level policies which actuate the low-level policies via
this skill embedding parameterization. The high-level policies encode how and
when to reuse the low-level skills together to achieve specific high-level
tasks. Importantly, our method learns to control a real robot in joint-space to
achieve these high-level tasks with little or no on-robot time, despite the
fact that the low-level policies may not be perfectly transferable from
simulation to real, and that the low-level skills were not trained on any
examples of high-level tasks. We illustrate the principles of our method using
informative simulation experiments. We then verify its usefulness for real
robotics problems by learning, transferring, and composing free-space and
contact motion skills on a Sawyer robot using only joint-space control. We
experiment with several techniques for composing pre-learned skills, and find
that our method allows us to use both learning-based approaches and efficient
search-based planning to achieve high-level tasks using only pre-learned
skills.
Comment: Presented at ISER 2018. See
https://www.youtube.com/watch?v=Syr2RQTHqTs for supplemental video.
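Sketch: the two-level structure described above, assuming hypothetical modules:
a low-level policy conditioned on a skill embedding z learned in simulation,
and a high-level policy that acts purely through the embedding space.

    import torch
    import torch.nn as nn

    obs_dim, act_dim, emb_dim, n_skills = 12, 7, 4, 5
    skill_table = nn.Embedding(n_skills, emb_dim)       # per-skill embeddings
    low_level = nn.Sequential(nn.Linear(obs_dim + emb_dim, 64), nn.Tanh(),
                              nn.Linear(64, act_dim))   # joint-space skill policy
    high_level = nn.Linear(obs_dim, emb_dim)            # acts via the embedding

    obs = torch.randn(1, obs_dim)
    z_single = skill_table(torch.tensor([2]))           # one pre-learned skill
    z = high_level(obs)                                 # composed/searched embedding
    joint_action = low_level(torch.cat([obs, z], dim=-1))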
FeUdal Networks for Hierarchical Reinforcement Learning
We introduce FeUdal Networks (FuNs): a novel architecture for hierarchical
reinforcement learning. Our approach is inspired by the feudal reinforcement
learning proposal of Dayan and Hinton, and gains power and efficacy by
decoupling end-to-end learning across multiple levels -- allowing it to utilise
different resolutions of time. Our framework employs a Manager module and a
Worker module. The Manager operates at a lower temporal resolution and sets
abstract goals which are conveyed to and enacted by the Worker. The Worker
generates primitive actions at every tick of the environment. The decoupled
structure of FuN conveys several benefits -- in addition to facilitating very
long timescale credit assignment it also encourages the emergence of
sub-policies associated with different goals set by the Manager. These
properties allow FuN to dramatically outperform a strong baseline agent on
tasks that involve long-term credit assignment or memorisation. We demonstrate
the performance of our proposed system on a range of tasks from the ATARI suite
and also from a 3D DeepMind Lab environment.
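Sketch: a simplified, single-horizon version of the Worker's intrinsic reward
in FuN, scoring how well the change in the latent state over c steps aligns
(by cosine similarity) with the Manager's goal direction; the paper averages
over multiple offsets, and the tensors here are placeholders.

    import torch
    import torch.nn.functional as F

    def worker_intrinsic_reward(states, goals, c=10):
        """Cosine similarity between the latent-state change over horizon c
        and the Manager's goal direction. states, goals: (T, d)."""
        T = states.shape[0] - c
        delta = states[c:] - states[:T]
        return F.cosine_similarity(delta, goals[:T], dim=-1)

    states, goals = torch.randn(50, 16), torch.randn(50, 16)
    r_intrinsic = worker_intrinsic_reward(states, goals)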