DoorGym: A Scalable Door Opening Environment And Baseline Agent
In order to practically implement the door opening task, a policy ought to be
robust to a wide distribution of door types and environment settings.
Reinforcement Learning (RL) with Domain Randomization (DR) is a promising
technique to enforce policy generalization; however, there are only a few
accessible training environments that are inherently designed to train agents
in domain randomized environments. We introduce DoorGym, an open-source door
opening simulation framework designed to utilize domain randomization to train
a stable policy. We intend for our environment to lie at the intersection of
domain transfer, practical tasks, and realism. We also provide baseline
Proximal Policy Optimization and Soft Actor-Critic implementations, which
achieve success rates between 0% and 95% for opening various types of doors
in this environment. Moreover, the real-world transfer experiment shows the
trained policy is able to work in the real world. Environment kit available
here: https://github.com/PSVL/DoorGym/
Comment: Full version (Real world transfer experiments result
Deep Reinforcement Learning for Swarm Systems
Recently, deep reinforcement learning (RL) methods have been applied
successfully to multi-agent scenarios. Typically, these methods rely on a
concatenation of agent states to represent the information content required for
decentralized decision making. However, concatenation scales poorly to swarm
systems with a large number of homogeneous agents as it does not exploit the
fundamental properties inherent to these systems: (i) the agents in the swarm
are interchangeable and (ii) the exact number of agents in the swarm is
irrelevant. Therefore, we propose a new state representation for deep
multi-agent RL based on mean embeddings of distributions. We treat the agents
as samples of a distribution and use the empirical mean embedding as input for
a decentralized policy. We define different feature spaces of the mean
embedding using histograms, radial basis functions and a neural network learned
end-to-end. We evaluate the representation on two well-known problems from the
swarm literature (rendezvous and pursuit evasion), in a globally and locally
observable setup. For the local setup we furthermore introduce simple
communication protocols. Of all approaches, the mean embedding representation
using neural network features enables the richest information exchange between
neighboring agents facilitating the development of more complex collective
strategies.
Comment: 31 pages, 12 figures, version 3 (published in JMLR Volume 20
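The mean-embedding idea in the abstract above can be illustrated with a minimal sketch. It assumes scalar agent observations and a hand-picked set of radial basis function centers; the names `rbf_features`, `mean_embedding`, `centers`, and `width` are hypothetical and are not taken from the paper. The sketch only shows why averaging feature vectors gives an input that ignores agent ordering and has a fixed size regardless of swarm size.

```python
import math


def rbf_features(x, centers, width=1.0):
    """Map one scalar observation to RBF activations (assumed feature map)."""
    return [math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centers]


def mean_embedding(observations, centers, width=1.0):
    """Average the feature vectors of all observed agents.

    The result has len(centers) entries no matter how many agents are seen,
    and it does not depend on the order in which agents are listed.
    """
    if not observations:
        # No visible neighbours: fall back to an all-zero embedding (assumption).
        return [0.0] * len(centers)
    feats = [rbf_features(x, centers, width) for x in observations]
    n = len(feats)
    return [sum(column) / n for column in zip(*feats)]


if __name__ == "__main__":
    centers = [0.0, 0.5, 1.0]
    print(mean_embedding([0.1, 0.5, 0.9], centers))
    print(mean_embedding([0.9, 0.1, 0.5], centers))  # same values up to rounding
```

Swapping the RBF map for a histogram or a learned network changes only `rbf_features`; the averaging step stays the same.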
Learning with Training Wheels: Speeding up Training with a Simple Controller for Deep Reinforcement Learning
Deep Reinforcement Learning (DRL) has been applied successfully to many
robotic applications. However, the large number of trials needed for training
is a key issue. Most existing techniques developed to improve training
efficiency (e.g. imitation) target general tasks rather than being tailored
for robot applications, which have their own specific context to exploit. We
propose a novel framework, Assisted Reinforcement Learning, where a classical
controller (e.g. a PID controller) is used as an alternative, switchable policy
to speed up training of DRL for local planning and navigation problems. The
core idea is that the simple control law allows the robot to rapidly learn
sensible primitives, like driving in a straight line, instead of random
exploration. As the actor network becomes more advanced, it can then take over
to perform more complex actions, like obstacle avoidance. Eventually, the
simple controller can be discarded entirely. We show that not only does this
technique train faster, it also is less sensitive to the structure of the DRL
network and consistently outperforms a standard Deep Deterministic Policy
Gradient network. We demonstrate the results in both simulation and real-world
experiments.
Comment: Published in ICRA2018. The code is now available at
https://github.com/xie9187/AsDDP
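The switchable-policy idea in the abstract above can be sketched as follows. The abstract does not state how the switch between the simple controller and the learned actor is made, so the probabilistic switch and the linear decay schedule here are assumptions for illustration only, as are the names `p_controller`, `assisted_action`, and `assist_schedule`. A proportional controller stands in for the PID controller mentioned in the abstract.

```python
import random


def p_controller(heading_error, gain=1.0, limit=1.0):
    """Proportional steering command, clipped to [-limit, limit]."""
    return max(-limit, min(limit, gain * heading_error))


def assist_schedule(step, decay_steps=1000):
    """Assumed schedule: assist probability decays linearly from 1 to 0."""
    return max(0.0, 1.0 - step / decay_steps)


def assisted_action(heading_error, learned_policy, assist_prob, rng=random):
    """Return (action, source), using the simple controller with
    probability assist_prob and the learned policy otherwise."""
    if rng.random() < assist_prob:
        return p_controller(heading_error), "controller"
    return learned_policy(heading_error), "learned"


if __name__ == "__main__":
    # Stand-in for an actor network: a hypothetical policy that always outputs zero.
    def untrained(err):
        return 0.0

    for step in (0, 500, 1000):
        prob = assist_schedule(step)
        print(step, prob, assisted_action(0.3, untrained, prob))
```

Once the assist probability reaches zero, every action comes from the learned policy, which mirrors the abstract's point that the simple controller can eventually be discarded.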
Neural Task Programming: Learning to Generalize Across Hierarchical Tasks
In this work, we propose a novel robot learning framework called Neural Task
Programming (NTP), which bridges the ideas of few-shot learning from
demonstration and neural program induction. NTP takes as input a task
specification (e.g., video demonstration of a task) and recursively decomposes
it into finer sub-task specifications. These specifications are fed to a
hierarchical neural program, where bottom-level programs are callable
subroutines that interact with the environment. We validate our method in three
robot manipulation tasks. NTP achieves strong generalization across sequential
tasks that exhibit hierarchical and compositional structures. The experimental
results show that NTP learns to generalize well towards unseen tasks with
increasing lengths, variable topologies, and changing objectives.
Comment: ICRA 201