Emergence of Locomotion Behaviours in Rich Environments
The reinforcement learning paradigm allows, in principle, for complex
behaviours to be learned directly from simple reward signals. In practice,
however, it is common to carefully hand-design the reward function to encourage
a particular solution, or to derive it from demonstration data. In this paper
we explore how a rich environment can help to promote the learning of complex
behavior. Specifically, we train agents in diverse environmental contexts, and
find that this encourages the emergence of robust behaviours that perform well
across a suite of tasks. We demonstrate this principle for locomotion --
behaviours that are known for their sensitivity to the choice of reward. We
train several simulated bodies on a diverse set of challenging terrains and
obstacles, using a simple reward function based on forward progress. Using a
novel scalable variant of policy gradient reinforcement learning, our agents
learn to run, jump, crouch and turn as required by the environment without
explicit reward-based guidance. A visual depiction of highlights of the learned
behavior can be viewed following https://youtu.be/hx_bgoTF7bs
Codes, Functions, and Causes: A Critique of Brette's Conceptual Analysis of Coding
In a recent article, Brette argues that coding as a concept is inappropriate
for explanations of neurocognitive phenomena. Here, we argue that Brette's
conceptual analysis mischaracterizes the structure of causal claims in coding
and other forms of analysis-by-decomposition. We argue that analyses of this
form are permissible, conceptually coherent, and offer essential tools for
building and developing models of neurocognitive systems like the brain.
Comment: Invited commentary on Romain Brette: "Is coding a relevant metaphor
for the brain?" (forthcoming in Behavioral and Brain Sciences). 4 pages,
including bibliography
TensorFlow Agents: Efficient Batched Reinforcement Learning in TensorFlow
We introduce TensorFlow Agents, an efficient infrastructure paradigm for
building parallel reinforcement learning algorithms in TensorFlow. We simulate
multiple environments in parallel, and group them to perform the neural network
computation on a batch rather than individual observations. This allows the
TensorFlow execution engine to parallelize computation, without the need for
manual synchronization. Environments are stepped in separate Python processes
to progress them in parallel without interference of the global interpreter
lock. As part of this project, we introduce BatchPPO, an efficient
implementation of the proximal policy optimization algorithm. By open sourcing
TensorFlow Agents, we hope to provide a flexible starting point for future
projects that accelerates future research in the field.
Comment: White paper, 7 pages
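The batching idea in this abstract can be illustrated with a minimal sketch. It uses plain NumPy rather than TensorFlow, steps the environments in a simple loop instead of separate processes, and every name in it (`ToyEnv`, `BatchEnv`, `batched_policy`) is hypothetical and not part of the TensorFlow Agents API. It only shows how grouping environments lets the policy run once on a stacked batch of observations.

```python
import numpy as np


class ToyEnv:
    """Hypothetical environment: the action nudges a 2-D state."""

    def __init__(self, seed):
        self._rng = np.random.default_rng(seed)
        self._state = np.zeros(2)

    def reset(self):
        self._state = self._rng.normal(size=2)
        return self._state.copy()

    def step(self, action):
        self._state = self._state + 0.1 * action
        reward = -float(np.sum(self._state ** 2))
        return self._state.copy(), reward


class BatchEnv:
    """Groups several environments so they look like one batched environment."""

    def __init__(self, envs):
        self._envs = envs

    def reset(self):
        return np.stack([env.reset() for env in self._envs])

    def step(self, actions):
        # Sequential here for simplicity; the abstract describes stepping
        # each environment in its own Python process instead.
        results = [env.step(a) for env, a in zip(self._envs, actions)]
        observs, rewards = zip(*results)
        return np.stack(observs), np.array(rewards)


def batched_policy(observs, weights):
    """Linear policy evaluated on the whole batch with one matrix product."""
    return observs @ weights


batch = BatchEnv([ToyEnv(seed) for seed in range(4)])
observs = batch.reset()
weights = -np.eye(2)  # assumed policy: push the state towards the origin
actions = batched_policy(observs, weights)
next_observs, rewards = batch.step(actions)
```

With four environments, `observs`, `actions`, and `next_observs` all have shape `(4, 2)`, so the policy cost is one batched call per step rather than four separate ones.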
Learning walk and trot from the same objective using different types of exploration
In quadruped gait learning, policy search methods that scale to high-dimensional
continuous action spaces are commonly used. In most approaches, it is necessary
to introduce prior knowledge on the gaits to limit the highly non-convex search
space of the policies. In this work, we propose a new approach to encode the
symmetry properties of the desired gaits, on the initial covariance of the
Gaussian search distribution, allowing for strategic exploration. Using
episode-based likelihood ratio policy gradient and relative entropy policy
search, we learned walk and trot gaits on a simulated quadruped. Comparing
these gaits to random gaits learned from an initial diagonal covariance matrix,
we show that the performance can be significantly enhanced.
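One hypothetical way to put a trot-like symmetry into the initial covariance of a Gaussian search distribution is sketched below. The abstract does not give the construction, so the leg ordering, the sign pattern, and the parameters `rho` and `sigma` are all assumptions made for illustration: diagonal legs are positively correlated, the remaining pairs are negatively correlated, and the result is compared with a plain diagonal covariance.

```python
import numpy as np

# Assumed leg order and trot sign pattern: diagonal legs share a sign.
LEGS = ["left_front", "right_front", "left_hind", "right_hind"]
TROT_SIGNS = np.array([1.0, -1.0, -1.0, 1.0])


def trot_covariance(rho=0.8, sigma=0.3):
    """Covariance that correlates diagonal legs and anti-correlates the rest."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must lie in [0, 1)")
    # (1 - rho) * I + rho * s s^T has unit diagonal and is positive definite
    # for rho in [0, 1), so it is a valid correlation matrix.
    correlation = (1.0 - rho) * np.eye(4) + rho * np.outer(TROT_SIGNS, TROT_SIGNS)
    return sigma ** 2 * correlation


def diagonal_covariance(sigma=0.3):
    """Uninformed baseline: independent exploration for every leg."""
    return sigma ** 2 * np.eye(4)


rng = np.random.default_rng(0)
structured = trot_covariance()
samples = rng.multivariate_normal(np.zeros(4), structured, size=5)
```

Samples drawn from `structured` tend to move diagonal legs together, so exploration starts closer to trot-like parameter settings than it would with `diagonal_covariance()`.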
Importance Weighted Evolution Strategies
Evolution Strategies (ES) emerged as a scalable alternative to popular
Reinforcement Learning (RL) techniques, providing an almost perfect speedup
when distributed across hundreds of CPU cores thanks to a reduced communication
overhead. Despite providing large improvements in wall-clock time, ES is data
inefficient when compared to competing RL methods. One of the main causes of
such inefficiency is the collection of large batches of experience, which are
discarded after each policy update. In this work, we study how to perform more
than one update per batch of experience by means of Importance Sampling while
preserving the scalability of the original method. The proposed method,
Importance Weighted Evolution Strategies (IW-ES), shows promising results and
is a first step towards designing efficient ES algorithms.
Comment: NIPS Deep Reinforcement Learning Workshop 201
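The reuse of one batch for several updates can be sketched as follows. This is not the authors' implementation: it assumes an isotropic Gaussian search distribution with fixed `sigma`, a toy quadratic objective, and a mean-return baseline, and the function names are hypothetical. After the first update, the same samples are reweighted by the density ratio between the new and old search distributions.

```python
import numpy as np


def es_gradient(samples, returns, mean, sigma, weights=None):
    """Score-function estimate of the gradient of expected return w.r.t. mean."""
    if weights is None:
        weights = np.ones(len(samples))
    advantages = returns - np.mean(returns)  # simple baseline (an assumption)
    scores = (samples - mean) / sigma ** 2   # grad of log N(x; mean, sigma^2 I)
    return np.mean((weights * advantages)[:, None] * scores, axis=0)


def importance_weights(samples, old_mean, new_mean, sigma):
    """Density ratio N(x; new_mean) / N(x; old_mean) for a shared sigma."""
    log_new = -np.sum((samples - new_mean) ** 2, axis=1) / (2 * sigma ** 2)
    log_old = -np.sum((samples - old_mean) ** 2, axis=1) / (2 * sigma ** 2)
    return np.exp(log_new - log_old)


def objective(x, target):
    """Toy return: negative squared distance to a target point."""
    return -np.sum((x - target) ** 2, axis=-1)


rng = np.random.default_rng(0)
target = np.ones(3)
sigma, learning_rate = 0.5, 0.05
mean0 = np.zeros(3)

# One batch of experience, evaluated once.
samples = mean0 + sigma * rng.normal(size=(200, 3))
returns = objective(samples, target)

# First update: ordinary ES step.
grad1 = es_gradient(samples, returns, mean0, sigma)
mean1 = mean0 + learning_rate * grad1

# Second update: reuse the same batch, corrected by importance weights.
weights = importance_weights(samples, mean0, mean1, sigma)
grad2 = es_gradient(samples, returns, mean1, sigma, weights)
mean2 = mean1 + learning_rate * grad2
```

The weights stay close to one while the mean moves only slightly. As it drifts further from `mean0` they become more uneven and the estimate noisier, which is why a sketch like this would limit the number of reuses per batch.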
Cascade Attribute Learning Network
We propose the cascade attribute learning network (CALNet), which can learn
attributes in a control task separately and assemble them together. Our
contribution is twofold: first we propose attribute learning in reinforcement
learning (RL). Attributes used to be modeled using constraint functions or
terms in the objective function, making it hard to transfer. Attribute
learning, on the other hand, models these task properties as modules in the
policy network. We also propose using novel cascading compensative networks in
the CALNet to learn and assemble attributes. Using the CALNet, one can
zero-shot an unseen task by separately learning all its attributes, and assembling
the attribute modules. We have validated the capacity of our model on a wide
variety of control problems with attributes in time, position, velocity and
acceleration phases.
Comparing Deep Reinforcement Learning and Evolutionary Methods in Continuous Control
Reinforcement Learning and the Evolutionary Strategy are two major approaches
in addressing complicated control problems. Both are strong contenders and have
their own devotee communities. Both groups have been very active in developing
new advances in their own domain and devising, in recent years, leading-edge
techniques to address complex continuous control tasks. Here, in the context of
Deep Reinforcement Learning, we formulate a parallelized version of the
Proximal Policy Optimization method and a Deep Deterministic Policy Gradient
method. Moreover, we conduct a thorough comparison between the state-of-the-art
techniques in both camps for continuous control: evolutionary methods and Deep
Reinforcement Learning methods. The results show there is no consistent winner.
Comment: NIPS 2017 Deep Reinforcement Learning Symposium
Training in Task Space to Speed Up and Guide Reinforcement Learning
Recent breakthroughs in the reinforcement learning (RL) community have made
significant advances towards learning and deploying policies on real world
robotic systems. However, even with the current state-of-the-art algorithms and
computational resources, these algorithms are still plagued with high sample
complexity, and thus long training times, especially for high degree of freedom
(DOF) systems. There are also concerns arising from lack of perceived stability
or robustness guarantees from emerging policies. This paper aims at mitigating
these drawbacks by: (1) modeling a complex, high DOF system with a
representative simple one, (2) making explicit use of forward and inverse
kinematics without forcing the RL algorithm to "learn" them on its own, and (3)
learning locomotion policies in Cartesian space instead of joint space. In this
paper these methods are applied to JPL's Robosimian, but can be readily used on
any system with a base and end effector(s). These locomotion policies can be
produced in just a few minutes, trained on a single laptop. We compare the
robustness of the resulting learned policies to those of other control methods.
An accompanying video for this paper can be found at
https://youtu.be/xDxxSw5ahnc
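Points (2) and (3) of this abstract can be illustrated with a small sketch in which a policy would output a Cartesian foot target and analytic kinematics convert it to joint angles. The two-link planar limb, the unit link lengths, and the function names are assumptions for illustration only and do not describe Robosimian's actual kinematics.

```python
import math


def forward_kinematics(q1, q2, l1=1.0, l2=1.0):
    """End-effector position of a planar two-link limb from its joint angles."""
    x = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    y = l1 * math.sin(q1) + l2 * math.sin(q1 + q2)
    return x, y


def inverse_kinematics(x, y, l1=1.0, l2=1.0):
    """Joint angles that reach (x, y); returns one of the two solutions."""
    cos_q2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if abs(cos_q2) > 1.0:
        raise ValueError("target is outside the reachable workspace")
    q2 = math.acos(cos_q2)
    q1 = math.atan2(y, x) - math.atan2(l2 * math.sin(q2), l1 + l2 * math.cos(q2))
    return q1, q2


# A task-space action: the policy picks where the foot should go,
# and the known kinematics supply the joint command.
foot_target = (1.2, 0.5)
joint_command = inverse_kinematics(*foot_target)
reached = forward_kinematics(*joint_command)
```

Because the kinematics are supplied analytically, the learner in such a setup only searches over foot positions and never has to rediscover the mapping to joint angles.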
Feedback Control For Cassie With Deep Reinforcement Learning
Bipedal locomotion skills are challenging to develop. Control strategies
often use local linearization of the dynamics in conjunction with reduced-order
abstractions to yield tractable solutions. In these model-based control
strategies, the controller is often not fully aware of many details, including
torque limits, joint limits, and other non-linearities that are necessarily
excluded from the control computations for simplicity. Deep reinforcement
learning (DRL) offers a promising model-free approach for controlling bipedal
locomotion which can more fully exploit the dynamics. However, current results
in the machine learning literature are often based on ad-hoc simulation models
that are not based on corresponding hardware. Thus it remains unclear how well
DRL will succeed on realizable bipedal robots. In this paper, we demonstrate
the effectiveness of DRL using a realistic model of Cassie, a bipedal robot. By
formulating a feedback control problem as finding the optimal policy for a
Markov Decision Process, we are able to learn robust walking controllers that
imitate a reference motion with DRL. Controllers for different walking speeds
are learned by imitating simple time-scaled versions of the original reference
motion. Controller robustness is demonstrated through several challenging
tests, including sensory delay, walking blindly on irregular terrain and
unexpected pushes at the pelvis. We also show we can interpolate between
individual policies and that robustness can be improved with an interpolated
policy.
Comment: 6 pages, 4 figures, accepted for IROS201
Using Deep Reinforcement Learning to Learn High-Level Policies on the ATRIAS Biped
Learning controllers for bipedal robots is a challenging problem, often
requiring expert knowledge and extensive tuning of parameters that vary in
different situations. Recently, deep reinforcement learning has shown promise
at automatically learning controllers for complex systems in simulation. This
has been followed by a push towards learning controllers that can be
transferred between simulation and hardware, primarily with the use of domain
randomization. However, domain randomization can make the problem of finding
stable controllers even more challenging, especially for underactuated bipedal
robots. In this work, we explore whether policies learned in simulation can be
transferred to hardware with the use of high-fidelity simulators and structured
controllers. We learn a neural network policy which is a part of a more
structured controller. While the neural network is learned in simulation, the
rest of the controller stays fixed, and can be tuned by the expert as needed.
We show that using this approach can greatly speed up the rate of learning in
simulation, as well as enable transfer of policies between simulation and
hardware. We present our results on an ATRIAS robot and explore the effect of
action spaces and cost functions on the rate of transfer between simulation and
hardware. Our results show that structured policies can indeed be learned in
simulation and implemented on hardware successfully. This has several
advantages, as the structure preserves the intuitive nature of the policy, and
the neural network improves the performance of the hand-designed policy. In
this way, we propose a way of using neural networks to improve expert designed
controllers, while maintaining ease of understanding.
Comment: Submitted to 2019 IEEE International Conference on Robotics and
Automation