Terrain RL Simulator
We provide challenging simulation environments that range in difficulty.
The difficulty of solving a task is linked not only to the number of dimensions
in the action space but also to the size and shape of the distribution of
configurations the agent experiences. Therefore, we are releasing a number of
simulation environments that include randomly generated terrain. The library
also provides simple mechanisms to create new environments with different agent
morphologies and the option to modify the distribution of generated terrain. We
believe using these and other more complex simulations will help push the field
closer to creating human-level intelligence.
Comment: 10 pages
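As a rough illustration of the configurability described above, here is a minimal, self-contained sketch of parameterized random-terrain generation; the segment kinds, probabilities, and parameter ranges are illustrative assumptions, not the library's actual interface.

```python
import numpy as np

# Illustrative sketch only: the segment kinds, probabilities, and parameter
# ranges are assumptions, not the library's actual configuration options.
def sample_terrain(rng, n_segments=50,
                   p_gap=0.2, gap_width=(0.5, 2.0),
                   p_step=0.3, step_height=(0.1, 0.4)):
    """Sample a terrain profile as a list of (segment_kind, parameter) pairs.

    Widening the parameter ranges or raising p_gap/p_step reshapes the
    distribution of configurations the agent experiences, which is the
    difficulty knob the abstract describes.
    """
    segments = []
    for _ in range(n_segments):
        u = rng.random()
        if u < p_gap:
            segments.append(("gap", rng.uniform(*gap_width)))      # gap to clear
        elif u < p_gap + p_step:
            segments.append(("step", rng.uniform(*step_height)))   # step up/down
        else:
            segments.append(("flat", 1.0))                         # flat stretch
    return segments

terrain = sample_terrain(np.random.default_rng(seed=0))
```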
Biased Estimates of Advantages over Path Ensembles
The estimation of advantage is crucial for a number of reinforcement learning
algorithms, as it directly influences the choices of future paths. In this
work, we propose a family of estimates based on the order statistics over the
path ensemble, which allows one to flexibly drive the learning process toward
or against risk. On top of this formulation, we systematically study the
impacts of different methods for estimating advantages. Our findings reveal
that biased estimates, when chosen appropriately, can result in significant
benefits. In particular, for environments with sparse rewards, optimistic
estimates lead to more efficient exploration of the policy space, while for
environments where individual actions can have critical impacts, conservative
estimates are preferable. On various benchmarks, including MuJoCo continuous
control, Terrain locomotion, Atari games, and sparse-reward environments, the
proposed biased estimation schemes consistently demonstrate improvement over
mainstream methods, not only accelerating the learning process but also
obtaining substantial performance gains.
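One concrete reading of "order statistics over the path ensemble" is sketched below, assuming a fixed number of Monte Carlo return samples per state; the paper's family of estimators may be parameterized differently.

```python
import numpy as np

def order_statistic_advantage(ensemble_returns, values, k):
    """Advantage from the k-th order statistic of a path ensemble (sketch).

    ensemble_returns: (n_paths, n_states) returns, one row per sampled path
    values:           (n_states,) baseline value estimates
    k:                which order statistic to use; k = n_paths - 1 is
                      optimistic (risk-seeking), k = 0 conservative
                      (risk-averse), intermediate k interpolates.
    """
    sorted_returns = np.sort(ensemble_returns, axis=0)  # per-state order stats
    return sorted_returns[k] - values

# Toy example: optimistic vs. conservative estimates over 8 sampled paths.
rng = np.random.default_rng(0)
returns = rng.normal(size=(8, 4))
baseline = returns.mean(axis=0)
optimistic = order_statistic_advantage(returns, baseline, k=7)
conservative = order_statistic_advantage(returns, baseline, k=0)
```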
ToyBox: Better Atari Environments for Testing Reinforcement Learning Agents
It is a widely accepted principle that software without tests has bugs.
Testing reinforcement learning agents is especially difficult because of the
stochastic nature of both agents and environments, the complexity of
state-of-the-art models, and the sequential nature of their predictions.
Recently, the Arcade Learning Environment (ALE) has become one of the most
widely used benchmark suites for deep learning research, and state-of-the-art
Reinforcement Learning (RL) agents have been shown to routinely equal or exceed
human performance on many ALE tasks. Since ALE is based on emulation of
original Atari games, the environment does not provide semantically meaningful
representations of internal game state. This means that ALE has limited utility
as an environment for supporting testing or model introspection. We propose
ToyBox, a collection of reimplementations of these games that solves this
critical problem and enables robust testing of RL agents.
Comment: NeurIPS Systems for ML Workshop
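The kind of test this enables can be made concrete with a small sketch; `BreakoutSim` below is a hypothetical stand-in for a reimplemented game with settable internal state, not ToyBox's actual API.

```python
# Hypothetical sketch: `BreakoutSim` stands in for a reimplemented game
# whose internal state is semantically meaningful and settable; the real
# ToyBox API differs.
class BreakoutSim:
    def __init__(self, lives=3, n_bricks=30):
        self.lives = lives
        self.bricks = [True] * n_bricks

    def set_state(self, *, lives=None, bricks=None):
        # Jump directly to a chosen game state, which raw emulation of
        # the original ROM does not expose.
        if lives is not None:
            self.lives = lives
        if bricks is not None:
            self.bricks = bricks

    def remaining_bricks(self):
        return sum(self.bricks)

def test_start_from_rare_state():
    # Put the game one brick from completion so an agent can be
    # evaluated from exactly this configuration.
    sim = BreakoutSim()
    sim.set_state(bricks=[False] * 29 + [True])
    assert sim.remaining_bricks() == 1
    assert sim.lives == 3

test_start_from_rare_state()
```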
Inter-Level Cooperation in Hierarchical Reinforcement Learning
Hierarchies of temporally decoupled policies present a promising approach for
enabling structured exploration in complex long-term planning problems. To
fully realize this approach, an end-to-end training paradigm is needed. However,
training these multi-level policies has had limited success due to challenges
arising from interactions between the goal-assigning and goal-achieving levels
within a hierarchy. In this article, we frame the policy optimization
process as a multi-agent problem. This allows us to draw on connections between
communication and cooperation in multi-agent RL, and demonstrate the benefits
of increased cooperation between sub-policies on the training performance of
the overall policy. We introduce a simple yet effective technique for inducing
inter-level cooperation by modifying the objective function and subsequent
gradients of higher-level policies. Experimental results on a wide variety of
simulated robotics and traffic control tasks demonstrate that inducing
cooperation results in stronger-performing policies and increased sample
efficiency on a set of difficult, long-horizon tasks. We also find that
goal-conditioned policies trained using our method display better transfer to
new tasks, highlighting the benefits of our method in learning task-agnostic
lower-level behaviors. Videos and code are available at:
https://sites.google.com/berkeley.edu/cooperative-hrl
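One plausible form of the modified higher-level objective is sketched below: a goal-reaching penalty whose gradient flows through the assigned goal back into the manager. The distance-based penalty, the trade-off weight, and the tensor shapes are assumptions, not necessarily the paper's exact formulation.

```python
import torch

def cooperative_manager_loss(manager_value, goal, achieved_state,
                             coop_weight=0.01):
    """Higher-level loss with an added inter-level cooperation term (sketch).

    manager_value:  per-sample estimate of the manager's own return
    goal:           goals emitted differentiably by the manager
    achieved_state: states the lower-level policy actually reached
    coop_weight:    assumed trade-off hyperparameter
    """
    task_loss = -manager_value.mean()  # maximize the manager's own return
    # Penalize goals the worker failed to reach; the gradient flows
    # through `goal` into the manager's parameters, inducing cooperation.
    goal_loss = torch.norm(goal - achieved_state, dim=-1).mean()
    return task_loss + coop_weight * goal_loss

# Toy usage with random tensors standing in for rollout data.
goal = torch.randn(16, 4, requires_grad=True)
achieved = torch.randn(16, 4)
value = torch.randn(16)
cooperative_manager_loss(value, goal, achieved).backward()
```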