Data-efficient Domain Randomization with Bayesian Optimization
When learning policies for robot control, the required real-world data is
typically prohibitively expensive to acquire, so learning in simulation is a
popular strategy. Unfortunately, such policies are often not transferable to the
real world due to a mismatch between the simulation and reality, called the
'reality gap'. Domain randomization methods tackle this problem by randomizing
the physics simulator (source domain) during training according to a
distribution over domain parameters in order to obtain more robust policies
that are able to overcome the reality gap. Most domain randomization approaches
sample the domain parameters from a fixed distribution. This solution is
suboptimal in the context of sim-to-real transferability, since it yields
policies that have been trained without explicitly optimizing for the reward on
the real system (target domain). Additionally, a fixed distribution assumes
there is prior knowledge about the uncertainty over the domain parameters. In
this paper, we propose Bayesian Domain Randomization (BayRn), a black-box
sim-to-real algorithm that solves tasks efficiently by adapting the domain
parameter distribution during learning given sparse data from the real-world
target domain. BayRn uses Bayesian optimization to search the space of source
domain distribution parameters such that it yields a policy which maximizes
the real-world objective, allowing for adaptive distributions during policy
optimization. We experimentally validate the proposed approach in sim-to-sim as
well as in sim-to-real experiments, comparing against three baseline methods on
two robotic tasks. Our results show that BayRn is able to perform sim-to-real
transfer, while significantly reducing the required prior knowledge.
Comment: Accepted at RA-L / ICR
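The outer loop BayRn describes can be sketched with a toy one-dimensional Gaussian-process surrogate. Everything below is illustrative: `real_return` stands in for the expensive train-in-simulation-then-evaluate-on-hardware step, and the kernel length-scale and UCB coefficient are arbitrary choices, not the paper's actual settings.

```python
import numpy as np

def rbf(a, b, ls=0.3):
    # Squared-exponential kernel between two sets of 1-D points.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def real_return(phi):
    # Stand-in for the expensive real-world evaluation of a policy
    # trained under domain-parameter-distribution setting `phi`.
    return -(phi - 0.7) ** 2

def bayrn_sketch(n_init=3, n_iter=10, seed=0):
    # Bayesian optimization over the source-domain distribution parameter:
    # fit a GP to the observed real-world returns, then pick the next
    # candidate by an upper-confidence-bound acquisition over a grid.
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 101)          # candidate distribution params
    X = rng.uniform(0.0, 1.0, n_init)          # initial real-world trials
    y = real_return(X)
    for _ in range(n_iter):
        K = rbf(X, X) + 1e-6 * np.eye(len(X))  # GP posterior on observations
        Ks = rbf(grid, X)
        mu = Ks @ np.linalg.solve(K, y)
        var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
        ucb = mu + 2.0 * np.sqrt(np.maximum(var, 0.0))  # acquisition
        x_next = grid[int(np.argmax(ucb))]
        X = np.append(X, x_next)
        y = np.append(y, real_return(x_next))
    return X[int(np.argmax(y))]                # best setting observed so far

best_phi = bayrn_sketch()
```

The key property being exploited is that each real-world rollout is expensive, so the surrogate trades a little compute for far fewer target-domain evaluations.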
How to pick the domain randomization parameters for sim-to-real transfer of reinforcement learning policies?
Recently, reinforcement learning (RL) algorithms have demonstrated remarkable
success in learning complicated behaviors from minimally processed input.
However, most of this success is limited to simulation. While there are
promising successes in applying RL algorithms directly on real systems, their
performance on more complex systems remains bottlenecked by the relative data
inefficiency of RL algorithms. Domain randomization is a promising direction of
research that has demonstrated impressive results using RL algorithms to
control real robots. At a high level, domain randomization works by training a
policy on a distribution of environmental conditions in simulation. If the
environments are diverse enough, then the policy trained on this distribution
will plausibly generalize to the real world. A human-specified design choice in
domain randomization is the form and parameters of the distribution of
simulated environments. It is unclear how best to pick the form and
parameters of this distribution, and prior work uses hand-tuned distributions.
This extended abstract demonstrates that the choice of the distribution plays a
major role in the performance of the trained policies in the real world and
that the parameters of this distribution can be optimized to maximize the
performance of the trained policies in the real world.
Comment: 2-page extended abstrac
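The design choice this abstract highlights, the form and parameters of the environment distribution, can be made concrete with a minimal sampler. The parameter names and ranges below are purely illustrative, not taken from any paper:

```python
import random

# The human-specified design choice under discussion: the form (here,
# uniform) and the parameters (the ranges below) of the distribution of
# simulated environments. All names and ranges are hypothetical examples.
RANDOMIZATION_RANGES = {
    "mass":     (0.8, 1.2),   # kg
    "friction": (0.5, 1.5),
    "latency":  (0.0, 0.02),  # s
}

def sample_environment(rng):
    # Draw one randomized environment instance, e.g. per training episode.
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in RANDOMIZATION_RANGES.items()}

rng = random.Random(0)
env = sample_environment(rng)
```

Hand-tuning means choosing those ranges by trial and error; the abstract's point is that they can instead be treated as decision variables and optimized against real-world performance.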
Policy Transfer with Strategy Optimization
Computer simulation provides an automatic and safe way for training robotic
control policies to achieve complex tasks such as locomotion. However, a policy
trained in simulation usually does not transfer directly to the real hardware
due to the differences between the two environments. Transfer learning using
domain randomization is a promising approach, but it usually assumes that the
target environment is close to the distribution of the training environments,
thus relying heavily on accurate system identification. In this paper, we
present a different approach that leverages domain randomization for
transferring control policies to unknown environments. The key idea is that,
instead of learning a single policy in the simulation, we simultaneously learn
a family of policies that exhibit different behaviors. When tested in the
target environment, we directly search for the best policy in the family based
on the task performance, without the need to identify the dynamic parameters.
We evaluate our method on five simulated robotic control problems with
different discrepancies in the training and testing environment and demonstrate
that our method can overcome larger modeling errors compared to training a
robust policy or an adaptive policy.
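The search-at-test-time idea can be sketched as follows. The constant-action "policy", the latent grid, and the toy reward are all stand-ins for the learned family and the real task; no dynamic parameters are identified, only returns are compared.

```python
import numpy as np

def make_policy(z):
    # Trivial "policy" indexed by a latent strategy variable z: here just
    # a constant action. Stands in for a learned policy conditioned on z.
    return lambda obs: z

def rollout_return(policy, target_offset, horizon=10):
    # Toy target environment: reward is highest when the action matches an
    # unknown offset (a stand-in for unmodeled target-domain dynamics).
    total = 0.0
    for _ in range(horizon):
        a = policy(0.0)
        total += -(a - target_offset) ** 2
    return total

def select_best_policy(latents, target_offset):
    # Direct search over the family trained in simulation: evaluate each
    # member in the target environment and keep the best performer.
    returns = [rollout_return(make_policy(z), target_offset) for z in latents]
    return latents[int(np.argmax(returns))]

latent_grid = np.linspace(-1.0, 1.0, 21)  # family of behaviors from simulation
best_z = select_best_policy(latent_grid, target_offset=0.3)
```

In practice the search over the latent space would use far fewer rollouts than system identification would need measurements, which is the method's selling point.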
Active Domain Randomization
Domain randomization is a popular technique for improving domain transfer,
often used in a zero-shot setting when the target domain is unknown or cannot
easily be used for training. In this work, we empirically examine the effects
of domain randomization on agent generalization. Our experiments show that
domain randomization may lead to suboptimal, high-variance policies, which we
attribute to the uniform sampling of environment parameters. We propose Active
Domain Randomization, a novel algorithm that learns a parameter sampling
strategy. Our method looks for the most informative environment variations
within the given randomization ranges by leveraging the discrepancies of policy
rollouts in randomized and reference environment instances. We find that
training more frequently on these instances leads to better overall agent
generalization. Our experiments across various physics-based simulated and
real-robot tasks show that this enhancement leads to more robust, consistent
policies.
Comment: Code available at
https://github.com/montrealrobotics/active-domainran
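A minimal sketch of the discrepancy-guided sampling idea, assuming a scalar environment parameter and using a softmax over rollout discrepancies. The actual algorithm learns the sampling strategy rather than computing it in closed form, so treat this as a caricature of the mechanism:

```python
import numpy as np

def rollout_score(env_param):
    # Stand-in for a fixed policy's rollout return in an environment
    # instance with the given parameter (e.g., friction). Hypothetical.
    return 1.0 / (1.0 + env_param)

REFERENCE_PARAM = 0.0  # nominal (reference) environment

def sampling_weights(params, temperature=1.0):
    # Score each candidate environment by how much the policy's rollout
    # differs from its rollout in the reference environment, then sample
    # high-discrepancy (most informative) instances more often.
    ref = rollout_score(REFERENCE_PARAM)
    disc = np.array([abs(rollout_score(p) - ref) for p in params])
    w = np.exp(disc / temperature)
    return w / w.sum()

params = np.linspace(0.0, 2.0, 5)   # candidate values in the given ranges
weights = sampling_weights(params)
```

Uniform sampling would give every candidate weight 0.2 here; the discrepancy weighting instead concentrates training on the environment variations where the policy's behavior diverges most from nominal.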
Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience
We consider the problem of transferring policies to the real world by
training on a distribution of simulated scenarios. Rather than manually tuning
the randomization of simulations, we adapt the simulation parameter
distribution using a few real world roll-outs interleaved with policy training.
In doing so, we are able to change the distribution of simulations to improve
the policy transfer by matching the policy behavior in simulation and the real
world. We show that policies trained with our method are able to reliably
transfer to different robots in two real world tasks: swing-peg-in-hole and
opening a cabinet drawer. The video of our experiments can be found at
https://sites.google.com/view/simop
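The adapt-the-distribution loop can be sketched with a cross-entropy-style update, assuming a single scalar simulation parameter and squared trajectory error as the sim-vs-real discrepancy. This is a simplification for illustration, not the paper's exact update rule:

```python
import numpy as np

def simulate(param, horizon=5):
    # Toy simulator: the trajectory depends on one physics parameter.
    return np.array([param * t for t in range(horizon)])

REAL_TRAJ = simulate(1.3)  # stand-in for a few real-world roll-outs

def adapt_distribution(mu=0.0, sigma=1.0, iters=20, pop=50, elite=10, seed=0):
    # Cross-entropy-style update of the simulation-parameter distribution:
    # sample candidate parameters, rank them by how closely their simulated
    # trajectories match the real ones, refit the Gaussian to the elites.
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        cand = rng.normal(mu, sigma, pop)
        cost = [np.sum((simulate(c) - REAL_TRAJ) ** 2) for c in cand]
        elites = cand[np.argsort(cost)[:elite]]
        mu, sigma = elites.mean(), elites.std() + 1e-6
    return mu, sigma

mu_star, sigma_star = adapt_distribution()
```

Interleaving such updates with policy training is what "closes the loop": the randomization narrows onto parameter settings whose simulated behavior matches the real roll-outs.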
Using Deep Reinforcement Learning to Learn High-Level Policies on the ATRIAS Biped
Learning controllers for bipedal robots is a challenging problem, often
requiring expert knowledge and extensive tuning of parameters that vary in
different situations. Recently, deep reinforcement learning has shown promise
at automatically learning controllers for complex systems in simulation. This
has been followed by a push towards learning controllers that can be
transferred between simulation and hardware, primarily with the use of domain
randomization. However, domain randomization can make the problem of finding
stable controllers even more challenging, especially for underactuated bipedal
robots. In this work, we explore whether policies learned in simulation can be
transferred to hardware with the use of high-fidelity simulators and structured
controllers. We learn a neural network policy which is a part of a more
structured controller. While the neural network is learned in simulation, the
rest of the controller stays fixed, and can be tuned by the expert as needed.
We show that using this approach can greatly speed up the rate of learning in
simulation, as well as enable transfer of policies between simulation and
hardware. We present our results on an ATRIAS robot and explore the effect of
action spaces and cost functions on the rate of transfer between simulation and
hardware. Our results show that structured policies can indeed be learned in
simulation and implemented on hardware successfully. This has several
advantages, as the structure preserves the intuitive nature of the policy, and
the neural network improves the performance of the hand-designed policy. In
this way, we propose a way of using neural networks to improve expert designed
controllers, while maintaining ease of understanding.Comment: Submitted to 2019 IEEE International Conference on Robotics and
Automatio
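The structured-controller idea, a fixed interpretable expert law plus a learned neural correction, can be sketched as below. The PD law, network shape, and weights are illustrative assumptions, not the ATRIAS controller:

```python
import numpy as np

def expert_controller(state, gains=(2.0, 0.5)):
    # Hand-designed part of the controller (here, a PD law). Its structure
    # stays fixed and interpretable; the gains can be retuned by an expert.
    kp, kd = gains
    pos, vel = state
    return -kp * pos - kd * vel

class TinyPolicyNet:
    # Minimal stand-in for the learned component: a linear map whose output
    # is added to the expert command. The weights would be trained in
    # simulation; here they are just fixed random values for illustration.
    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(0.0, 0.1, size=2)

    def __call__(self, state):
        return float(self.w @ np.asarray(state))

def structured_controller(state, net):
    # Full action: interpretable expert law plus learned correction. Only
    # the `net` term is learned; the rest can be tuned by hand as needed.
    return expert_controller(state) + net(state)

net = TinyPolicyNet()
action = structured_controller((0.1, -0.2), net)
```

Because the learned term is additive, zeroing it recovers the hand-designed behavior, which is what preserves the intuitive nature of the policy.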
A Data-Efficient Framework for Training and Sim-to-Real Transfer of Navigation Policies
Learning effective visuomotor policies for robots purely from data is
challenging, but also appealing since a learning-based system should not
require manual tuning or calibration. In the case of a robot operating in a
real environment the training process can be costly, time-consuming, and even
dangerous since failures are common at the start of training. For this reason,
it is desirable to be able to leverage simulation and
off-policy data to the extent possible to train the robot. In this
work, we introduce a robust framework that plans in simulation and transfers
well to the real environment. Our model incorporates a gradient-descent based
planning module, which, given the initial image and goal image, encodes the
images to a lower dimensional latent state and plans a trajectory to reach the
goal. The model, consisting of the encoder and planner modules, is trained
through a meta-learning strategy in simulation first. We subsequently perform
adversarial domain transfer on the encoder by using a bank of unlabelled but
random images from the simulation and real environments to enable the encoder
to map images from the real and simulated environments to a similarly
distributed latent representation. By fine-tuning the entire model (encoder +
planner) with far fewer real-world expert demonstrations, we show successful
planning performance in different navigation tasks.
Comment: Under review in ICRA 201
From Video Game to Real Robot: The Transfer between Action Spaces
Deep reinforcement learning has proven to be successful for learning tasks in
simulated environments, but applying the same techniques to robots in the
real world is more challenging, as it requires hours of training. To address
this, transfer learning can be used to train the policy first in a simulated
environment and then transfer it to the physical agent. As the simulation never
matches reality perfectly, the physics, visuals and action spaces by necessity
differ between these environments to some degree. In this work, we study how
general video games can be directly used instead of fine-tuned simulations for
the sim-to-real transfer. In particular, we study how the agent can learn the
new action space autonomously, when the game actions do not match the robot
actions. Our results show that a different action space can be learned by
re-training only part of the neural network, and we obtain above 90% mean
success rate in simulation and robot experiments.
Comment: Two first authors contributed equally. Accepted by ICASSP 202
Policy Transfer via Kinematic Domain Randomization and Adaptation
Transferring reinforcement learning policies trained in physics simulation to
the real hardware remains a challenge, known as the "sim-to-real" gap. Domain
randomization is a simple yet effective technique to address dynamics
discrepancies across source and target domains, but its success generally
depends on heuristics and trial-and-error. In this work we investigate the
impact of randomized parameter selection on policy transferability across
different types of domain discrepancies. Contrary to common practice in which
kinematic parameters are carefully measured while dynamic parameters are
randomized, we found that virtually randomizing kinematic parameters (e.g.,
link lengths) during training in simulation generally outperforms dynamic
randomization. Based on this finding, we introduce a new domain adaptation
algorithm that utilizes simulated kinematic parameter variation. Our
algorithm, Multi-Policy Bayesian Optimization, trains an ensemble of universal
policies conditioned on virtual kinematic parameters and efficiently adapts to
the target environment using a limited number of target domain rollouts. We
showcase our findings on a simulated quadruped robot in five different target
environments covering different aspects of domain discrepancies.
Comment: Submitted to the 2021 IEEE International Conference on Robotics and
Automation (ICRA
A User's Guide to Calibrating Robotics Simulators
Simulators are a critical component of modern robotics research. Strategies
for both perception and decision making can be studied in simulation first
before being deployed to real-world systems, saving time and costs. Despite
significant progress on the development of sim-to-real algorithms, the analysis
of different methods is still conducted in an ad-hoc manner, without a
consistent set of tests and metrics for comparison. This paper fills this gap
and proposes a set of benchmarks and a framework for the study of various
algorithms aimed at transferring models and policies learnt in simulation to the
real world. We conduct experiments on a wide range of well known simulated
environments to characterize and offer insights into the performance of
different algorithms. Our analysis can be useful for practitioners working in
this area and can help make informed choices about the behavior and main
properties of sim-to-real algorithms. We open-source the benchmark, training
data, and trained models, which can be found at
https://github.com/NVlabs/sim-parameter-estimation.
Comment: Accepted at Conference on Robot Learning 202