DoShiCo Challenge: Domain Shift in Control Prediction
Training deep neural network policies end-to-end for real-world applications has so far required either large demonstration datasets collected in the real world or large sets of realistic and closely related 3D CAD models. These real or virtual data must, moreover, have characteristics very similar to the conditions expected at test time. These stringent requirements, and the time-consuming data collection they entail, are currently the main impediment keeping deep reinforcement learning from being
deployed in real-world applications. Therefore, in this work we advocate an
alternative approach, where instead of avoiding any domain shift by carefully
selecting the training data, the goal is to learn a policy that can cope with
it. To this end, we propose the DoShiCo challenge: train a model in very basic, far-from-realistic synthetic environments such that it can be applied in more realistic environments and can also take control decisions on real-world data. In particular, we focus on the task of collision avoidance for
drones. We created a set of simulated environments that can be used as a benchmark and implemented a baseline method, exploiting depth prediction as an
auxiliary task to help overcome the domain shift. Even though the policy is
trained in very basic environments, it can learn to fly without collisions in a
very different, realistic simulated environment. Of course, several benchmarks for reinforcement learning already exist, but they never include a large domain shift. Conversely, several benchmarks in computer vision focus on domain shift, but they take the form of static datasets instead of simulated environments. In this work we claim that it is crucial to take on the two challenges together in one benchmark.

Comment: Published at SIMPAR 2018. Please visit the paper webpage for more information, a movie, and code for reproducing results: https://kkelchte.github.io/doshic
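The baseline described above, depth prediction as an auxiliary task alongside control prediction, can be sketched as follows. This is a minimal illustration under assumed choices (layer sizes, a single steering output, a 32x32 depth map, MSE losses); it is not the authors' actual architecture.

import torch.nn as nn
import torch.nn.functional as F

class PolicyWithDepthAux(nn.Module):
    # Hypothetical architecture: shared conv encoder, a control head and an
    # auxiliary depth head. Layer sizes are illustrative, not the paper's.
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.control_head = nn.Linear(64, 1)       # e.g. a steering/yaw command
        self.depth_head = nn.Linear(64, 32 * 32)   # coarse depth map, flattened

    def forward(self, rgb):
        z = self.encoder(rgb)
        return self.control_head(z), self.depth_head(z)

def joint_loss(model, rgb, ctrl_target, depth_target, aux_weight=0.5):
    # The auxiliary depth loss regularizes the shared encoder, which is the
    # mechanism the abstract credits for coping with the domain shift.
    ctrl, depth = model(rgb)
    return F.mse_loss(ctrl, ctrl_target) + aux_weight * F.mse_loss(depth, depth_target.flatten(1))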
One-Shot Learning of Manipulation Skills with Online Dynamics Adaptation and Neural Network Priors
One of the key challenges in applying reinforcement learning to complex
robotic control tasks is the need to gather large amounts of experience in
order to find an effective policy for the task at hand. Model-based
reinforcement learning can achieve good sample efficiency, but requires the
ability to learn a model of the dynamics that is good enough to learn an
effective policy. In this work, we develop a model-based reinforcement learning
algorithm that combines prior knowledge from previous tasks with online
adaptation of the dynamics model. These two ingredients enable highly
sample-efficient learning even in regimes where estimating the true dynamics is
very difficult, since the online model adaptation allows the method to locally
compensate for unmodeled variation in the dynamics. We encode the prior
experience into a neural network dynamics model, adapt it online by
progressively refitting a local linear model of the dynamics, and use model
predictive control to plan under these dynamics. Our experimental results show
that this approach can be used to solve a variety of complex robotic
manipulation tasks in just a single attempt, using prior data from other manipulation behaviors.
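The mechanism summarized here, a dynamics prior adapted online by refitting a local linear model and planning with model predictive control, can be sketched roughly as below. The least-squares refit over a window of recent transitions and the random-shooting planner are illustrative assumptions standing in for the paper's exact procedure.

import numpy as np

def fit_local_linear(states, actions, next_states, reg=1e-3):
    # Least-squares fit of x' ~ A x + B u + c on recent transitions,
    # a simple stand-in for the locally refit dynamics model.
    X = np.hstack([states, actions, np.ones((len(states), 1))])
    W = np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ next_states)
    ds, da = states.shape[1], actions.shape[1]
    return W[:ds].T, W[ds:ds + da].T, W[-1]          # A, B, c

def mpc_action(x, A, B, c, cost_fn, horizon=10, n_samples=256, a_dim=2, rng=None):
    # Random-shooting MPC: sample action sequences, roll them out under the
    # fitted linear dynamics, and return the first action of the cheapest plan.
    if rng is None:
        rng = np.random.default_rng()
    plans = rng.uniform(-1.0, 1.0, size=(n_samples, horizon, a_dim))
    costs = np.zeros(n_samples)
    for i, plan in enumerate(plans):
        xi = np.array(x, dtype=float)
        for u in plan:
            xi = A @ xi + B @ u + c
            costs[i] += cost_fn(xi, u)
    return plans[np.argmin(costs), 0]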
Gradient-free Policy Architecture Search and Adaptation
We develop a method for policy architecture search and adaptation via
gradient-free optimization which can learn to perform autonomous driving tasks.
By learning from both demonstration and environmental reward we develop a model
that can learn with relatively few early catastrophic failures. We first learn
an architecture of appropriate complexity to perceive aspects of world state
relevant to the expert demonstration, and then mitigate the effect of
domain-shift during deployment by adapting a policy demonstrated in a source
domain to rewards obtained in a target environment. We show that our approach
allows safer learning than baseline methods, offering a reduced cumulative
crash metric over the agent's lifetime as it learns to drive in a realistic
simulated environment.

Comment: Accepted in Conference on Robot Learning, 201
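The abstract only states that the search is gradient-free; a simple evolution-strategies loop is one way such a search could look. The function names, population size and step size below are assumptions made for illustration, not the paper's algorithm.

import numpy as np

def gradient_free_search(theta0, evaluate, iters=100, pop=32, sigma=0.1, lr=0.02):
    # Basic evolution-strategies loop: perturb the parameter vector, score each
    # candidate, and move the parameters toward the better-scoring perturbations.
    theta = np.array(theta0, dtype=float)
    rng = np.random.default_rng(0)
    for _ in range(iters):
        eps = rng.standard_normal((pop, theta.size))
        scores = np.array([evaluate(theta + sigma * e) for e in eps])
        scores = (scores - scores.mean()) / (scores.std() + 1e-8)  # normalize fitness
        theta += lr / (pop * sigma) * eps.T @ scores
    return theta

Here evaluate(theta) would build the candidate policy (or architecture), score it against demonstrations and environment reward, and return a scalar.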
Sim-to-Real Transfer of Robotic Control with Dynamics Randomization
Simulations are attractive environments for training agents as they provide
an abundant source of data and alleviate certain safety concerns during the
training process. But the behaviours developed by agents in simulation are
often specific to the characteristics of the simulator. Due to modeling error,
strategies that are successful in simulation may not transfer to their real
world counterparts. In this paper, we demonstrate a simple method to bridge
this "reality gap". By randomizing the dynamics of the simulator during
training, we are able to develop policies that are capable of adapting to very
different dynamics, including ones that differ significantly from the dynamics
on which the policies were trained. This adaptivity enables the policies to
generalize to the dynamics of the real world without any training on the
physical system. Our approach is demonstrated on an object pushing task using a
robotic arm. Despite being trained exclusively in simulation, our policies are
able to maintain a similar level of performance when deployed on a real robot,
reliably moving an object to a desired location from random initial
configurations. We explore the impact of various design decisions and show that the resulting policies are robust to significant calibration error.
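Dynamics randomization as described here amounts to resampling physical parameters of the simulator at the start of every training episode. A minimal sketch, assuming illustrative parameter names and ranges (the hook into the physics engine is hypothetical):

import numpy as np

# Illustrative ranges; the randomized "dynamics" include quantities such as
# mass, friction, damping and latency, but these exact values are assumptions.
PARAM_RANGES = {
    "object_mass":    (0.1, 1.0),   # kg
    "table_friction": (0.2, 1.2),
    "joint_damping":  (0.5, 2.0),   # multiplicative scale
    "action_latency": (0.0, 0.04),  # seconds
}

def sample_dynamics(rng):
    # Draw a fresh set of dynamics parameters for the next training episode.
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}

rng = np.random.default_rng(0)
for episode in range(1000):
    params = sample_dynamics(rng)
    # apply_to_simulator(params)   # hypothetical hook into the physics engine
    # ...collect the episode and update the policy as usual...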