
216 research outputs found

Emergence of Locomotion Behaviours in Rich Environments

Author: Erez Tom
Eslami S. M. Ali
Heess Nicolas
Lemmon Jay
Merel Josh
Riedmiller Martin
Silver David
Sriram Srinivasan
Tassa Yuval
TB Dhruva
Wang Ziyu
Wayne Greg
Publication date: 10/07/2017

The reinforcement learning paradigm allows, in principle, for complex behaviours to be learned directly from simple reward signals. In practice, however, it is common to carefully hand-design the reward function to encourage a particular solution, or to derive it from demonstration data. In this paper we explore how a rich environment can help to promote the learning of complex behaviour. Specifically, we train agents in diverse environmental contexts, and find that this encourages the emergence of robust behaviours that perform well across a suite of tasks. We demonstrate this principle for locomotion -- behaviours that are known for their sensitivity to the choice of reward. We train several simulated bodies on a diverse set of challenging terrains and obstacles, using a simple reward function based on forward progress. Using a novel scalable variant of policy gradient reinforcement learning, our agents learn to run, jump, crouch and turn as required by the environment without explicit reward-based guidance. A visual depiction of highlights of the learned behaviour can be viewed at https://youtu.be/hx_bgoTF7bs

arXiv.org e-Print Archive

Codes, Functions, and Causes: A Critique of Brette's Conceptual Analysis of Coding

Author: Barack David
Jaegle Andrew
Publication date: 18/04/2019

In a recent article, Brette argues that coding as a concept is inappropriate for explanations of neurocognitive phenomena. Here, we argue that Brette's conceptual analysis mischaracterizes the structure of causal claims in coding and other forms of analysis-by-decomposition. We argue that analyses of this form are permissible, conceptually coherent, and offer essential tools for building and developing models of neurocognitive systems like the brain.

Comment: Invited commentary on Romain Brette: "Is coding a relevant metaphor for the brain?" (forthcoming in Behavioral and Brain Sciences). 4 pages, including bibliography

arXiv.org e-Print Archive

TensorFlow Agents: Efficient Batched Reinforcement Learning in TensorFlow

Author: Davidson James
Hafner Danijar
Vanhoucke Vincent
Publication date: 31/10/2018

We introduce TensorFlow Agents, an efficient infrastructure paradigm for building parallel reinforcement learning algorithms in TensorFlow. We simulate multiple environments in parallel, and group them to perform the neural network computation on a batch rather than individual observations. This allows the TensorFlow execution engine to parallelize computation, without the need for manual synchronization. Environments are stepped in separate Python processes to progress them in parallel without interference of the global interpreter lock. As part of this project, we introduce BatchPPO, an efficient implementation of the proximal policy optimization algorithm. By open sourcing TensorFlow Agents, we hope to provide a flexible starting point for future projects that accelerates future research in the field.

Comment: White paper, 7 pages
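The batching idea described in this abstract can be illustrated with a minimal sketch. It is not the TensorFlow Agents API: `ToyEnv`, `batch_policy`, and `step_batch` are hypothetical names, the environments are stepped sequentially rather than in separate processes, and a plain Python function stands in for the neural network. The point shown is only that the policy is called once per step on the stacked observations of all environments.

```python
from typing import List, Sequence


class ToyEnv:
    """Hypothetical 1-D environment: the state moves by the chosen action."""

    def __init__(self, start: float) -> None:
        self.state = start

    def reset(self) -> float:
        return self.state

    def step(self, action: float) -> float:
        self.state += action
        return self.state


def batch_policy(observations: Sequence[float]) -> List[float]:
    """Stand-in for one network call on the whole batch of observations.

    Assumption for illustration: each action pushes its state halfway to zero.
    """
    return [-0.5 * obs for obs in observations]


def step_batch(envs: Sequence[ToyEnv], actions: Sequence[float]) -> List[float]:
    """Step every environment once and return the stacked observations.

    A real implementation would step the environments in parallel processes;
    here they are stepped one after another to keep the sketch self-contained.
    """
    return [env.step(action) for env, action in zip(envs, actions)]


envs = [ToyEnv(start) for start in (4.0, -2.0, 1.0)]
observations = [env.reset() for env in envs]
for _ in range(3):
    actions = batch_policy(observations)      # one policy call for all environments
    observations = step_batch(envs, actions)  # one step for all environments
```

After the loop, `observations` holds one entry per environment, so the policy cost per step does not grow with a separate call per environment.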

arXiv.org e-Print Archive

Learning walk and trot from the same objective using different types of exploration

Author: Liu Zinan
Peters Jan
Ploeger Kai
Rueckert Elmar
Stark Svenja
Publication date: 28/04/2019

In quadruped gait learning, policy search methods that scale to high-dimensional continuous action spaces are commonly used. In most approaches, it is necessary to introduce prior knowledge on the gaits to limit the highly non-convex search space of the policies. In this work, we propose a new approach that encodes the symmetry properties of the desired gaits in the initial covariance of the Gaussian search distribution, allowing for strategic exploration. Using episode-based likelihood ratio policy gradient and relative entropy policy search, we learned the gaits walk and trot on a simulated quadruped. Comparing these gaits to random gaits learned with an initial diagonal covariance matrix, we show that the performance can be significantly enhanced.

arXiv.org e-Print Archive

Importance Weighted Evolution Strategies

Author: Campos Víctor
Giro-i-Nieto Xavier
Torres Jordi
Publication date: 12/11/2018

Evolution Strategies (ES) emerged as a scalable alternative to popular Reinforcement Learning (RL) techniques, providing an almost perfect speedup when distributed across hundreds of CPU cores thanks to a reduced communication overhead. Despite providing large improvements in wall-clock time, ES is data inefficient when compared to competing RL methods. One of the main causes of such inefficiency is the collection of large batches of experience, which are discarded after each policy update. In this work, we study how to perform more than one update per batch of experience by means of Importance Sampling while preserving the scalability of the original method. The proposed method, Importance Weighted Evolution Strategies (IW-ES), shows promising results and is a first step towards designing efficient ES algorithms.

Comment: NIPS Deep Reinforcement Learning Workshop 201
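The reuse of a batch through importance sampling, as described in this abstract, can be sketched as follows. This is not the paper's exact algorithm; it is an illustration under stated assumptions: an isotropic Gaussian search distribution with fixed standard deviation `sigma`, samples drawn around `old_mean`, and a gradient estimated for a different `new_mean`. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def gaussian_log_density(x: Sequence[float], mean: Sequence[float], sigma: float) -> float:
    """Log density of an isotropic Gaussian, omitting the normalizing constant.

    The constant is the same for both distributions and cancels in the ratio.
    """
    return -sum((xi - mi) ** 2 for xi, mi in zip(x, mean)) / (2.0 * sigma ** 2)


def importance_weights(samples: Sequence[Sequence[float]], old_mean: Sequence[float],
                       new_mean: Sequence[float], sigma: float) -> List[float]:
    """Density ratio p_new(x) / p_old(x) for each stored sample."""
    return [
        math.exp(gaussian_log_density(x, new_mean, sigma)
                 - gaussian_log_density(x, old_mean, sigma))
        for x in samples
    ]


def iw_es_gradient(samples: Sequence[Sequence[float]], returns: Sequence[float],
                   old_mean: Sequence[float], new_mean: Sequence[float],
                   sigma: float) -> List[float]:
    """Estimate the gradient of the expected return with respect to new_mean.

    Uses samples drawn around old_mean, reweighted by the density ratio, so
    the same batch can serve more than one update.
    """
    weights = importance_weights(samples, old_mean, new_mean, sigma)
    grad = [0.0] * len(new_mean)
    for x, ret, w in zip(samples, returns, weights):
        for d in range(len(new_mean)):
            grad[d] += w * ret * (x[d] - new_mean[d]) / sigma ** 2
    return [g / len(samples) for g in grad]
```

When `new_mean` equals `old_mean`, every weight is 1 and the estimate reduces to the ordinary score-function ES gradient; the weights drift away from 1 as the updated mean moves away from the distribution that generated the batch.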

arXiv.org e-Print Archive

Cascade Attribute Learning Network

Author: Chang Haonan
Tomizuka Masayoshi
Xu Zhuo
Publication date: 24/11/2017

We propose the cascade attribute learning network (CALNet), which can learn attributes in a control task separately and assemble them together. Our contribution is twofold: first, we propose attribute learning in reinforcement learning (RL). Attributes used to be modeled using constraint functions or terms in the objective function, making them hard to transfer. Attribute learning, on the other hand, models these task properties as modules in the policy network. We also propose using novel cascading compensative networks in the CALNet to learn and assemble attributes. Using the CALNet, one can zero-shot an unseen task by separately learning all its attributes, and assembling the attribute modules. We have validated the capacity of our model on a wide variety of control problems with attributes in time, position, velocity and acceleration phases.

arXiv.org e-Print Archive

Comparing Deep Reinforcement Learning and Evolutionary Methods in Continuous Control

Author: Zaiane Osmar R.
Zhang Shangtong
Publication date: 07/03/2018

Reinforcement Learning and the Evolutionary Strategy are two major approaches in addressing complicated control problems. Both are strong contenders and have their own devotee communities. Both groups have been very active in developing new advances in their own domain and devising, in recent years, leading-edge techniques to address complex continuous control tasks. Here, in the context of Deep Reinforcement Learning, we formulate a parallelized version of the Proximal Policy Optimization method and a Deep Deterministic Policy Gradient method. Moreover, we conduct a thorough comparison between the state-of-the-art techniques in both camps for continuous control: evolutionary methods and Deep Reinforcement Learning methods. The results show there is no consistent winner.

Comment: NIPS 2017 Deep Reinforcement Learning Symposium

arXiv.org e-Print Archive

Training in Task Space to Speed Up and Guide Reinforcement Learning

Author: Bellegarda Guillaume
Byl Katie
Publication date: 06/03/2019

Recent breakthroughs in the reinforcement learning (RL) community have made significant advances towards learning and deploying policies on real world robotic systems. However, even with the current state-of-the-art algorithms and computational resources, these algorithms are still plagued with high sample complexity, and thus long training times, especially for high degree of freedom (DOF) systems. There are also concerns arising from lack of perceived stability or robustness guarantees from emerging policies. This paper aims at mitigating these drawbacks by: (1) modeling a complex, high DOF system with a representative simple one, (2) making explicit use of forward and inverse kinematics without forcing the RL algorithm to "learn" them on its own, and (3) learning locomotion policies in Cartesian space instead of joint space. In this paper these methods are applied to JPL's Robosimian, but can be readily used on any system with a base and end effector(s). These locomotion policies can be produced in just a few minutes, trained on a single laptop. We compare the robustness of the resulting learned policies to those of other control methods. An accompanying video for this paper can be found at https://youtu.be/xDxxSw5ahnc
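The second point in this abstract, using forward and inverse kinematics explicitly rather than having the policy learn them, can be illustrated on a far simpler system than Robosimian. The sketch below assumes a hypothetical planar limb with two links of lengths `l1` and `l2`; it is not the paper's kinematic model. A policy acting in Cartesian space would output a target foot position `(x, y)`, and `inverse_kinematics` would convert it to joint angles.

```python
import math
from typing import Tuple


def forward_kinematics(theta1: float, theta2: float,
                       l1: float = 1.0, l2: float = 1.0) -> Tuple[float, float]:
    """End-effector position of a planar two-link limb from its joint angles."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y


def inverse_kinematics(x: float, y: float,
                       l1: float = 1.0, l2: float = 1.0) -> Tuple[float, float]:
    """Joint angles that reach (x, y), using the solution with theta2 in [0, pi]."""
    cos_theta2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if abs(cos_theta2) > 1.0:
        raise ValueError("target is outside the reachable workspace")
    theta2 = math.acos(cos_theta2)
    theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2),
                                           l1 + l2 * math.cos(theta2))
    return theta1, theta2


# A Cartesian-space action (hypothetical target) mapped to joint angles and back.
target = (1.2, 0.5)
joint_angles = inverse_kinematics(*target)
reached = forward_kinematics(*joint_angles)
```

With this split, the learning algorithm only has to choose where the end effector should go, while the known geometry handles the mapping to joint space.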

arXiv.org e-Print Archive

Feedback Control For Cassie With Deep Reinforcement Learning

Author: Berseth Glen
Clary Patrick
Hurst Jonathan
van de Panne Michiel
Xie Zhaoming
Publication date: 27/07/2018

Bipedal locomotion skills are challenging to develop. Control strategies often use local linearization of the dynamics in conjunction with reduced-order abstractions to yield tractable solutions. In these model-based control strategies, the controller is often not fully aware of many details, including torque limits, joint limits, and other non-linearities that are necessarily excluded from the control computations for simplicity. Deep reinforcement learning (DRL) offers a promising model-free approach for controlling bipedal locomotion which can more fully exploit the dynamics. However, current results in the machine learning literature are often based on ad-hoc simulation models that are not based on corresponding hardware. Thus it remains unclear how well DRL will succeed on realizable bipedal robots. In this paper, we demonstrate the effectiveness of DRL using a realistic model of Cassie, a bipedal robot. By formulating a feedback control problem as finding the optimal policy for a Markov Decision Process, we are able to learn robust walking controllers that imitate a reference motion with DRL. Controllers for different walking speeds are learned by imitating simple time-scaled versions of the original reference motion. Controller robustness is demonstrated through several challenging tests, including sensory delay, walking blindly on irregular terrain and unexpected pushes at the pelvis. We also show we can interpolate between individual policies and that robustness can be improved with an interpolated policy.

Comment: 6 pages, 4 figures, accepted for IROS201

arXiv.org e-Print Archive

Using Deep Reinforcement Learning to Learn High-Level Policies on the ATRIAS Biped

Author: Atkeson Christopher G.
Geyer Hartmut
Li Tianyu
Rai Akshara
Publication date: 27/09/2018

Learning controllers for bipedal robots is a challenging problem, often requiring expert knowledge and extensive tuning of parameters that vary in different situations. Recently, deep reinforcement learning has shown promise at automatically learning controllers for complex systems in simulation. This has been followed by a push towards learning controllers that can be transferred between simulation and hardware, primarily with the use of domain randomization. However, domain randomization can make the problem of finding stable controllers even more challenging, especially for underactuated bipedal robots. In this work, we explore whether policies learned in simulation can be transferred to hardware with the use of high-fidelity simulators and structured controllers. We learn a neural network policy which is a part of a more structured controller. While the neural network is learned in simulation, the rest of the controller stays fixed, and can be tuned by the expert as needed. We show that using this approach can greatly speed up the rate of learning in simulation, as well as enable transfer of policies between simulation and hardware. We present our results on an ATRIAS robot and explore the effect of action spaces and cost functions on the rate of transfer between simulation and hardware. Our results show that structured policies can indeed be learned in simulation and implemented on hardware successfully. This has several advantages, as the structure preserves the intuitive nature of the policy, and the neural network improves the performance of the hand-designed policy. In this way, we propose a way of using neural networks to improve expert designed controllers, while maintaining ease of understanding.

Comment: Submitted to 2019 IEEE International Conference on Robotics and Automation

arXiv.org e-Print Archive