Data-efficient Domain Randomization with Bayesian Optimization

Author: Eilers Christian
Gienger Michael
Muratore Fabio
Peters Jan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 05/01/2021

When learning policies for robot control, the required real-world data is typically prohibitively expensive to acquire, so learning in simulation is a popular strategy. Unfortunately, such policies are often not transferable to the real world due to a mismatch between the simulation and reality, called the 'reality gap'. Domain randomization methods tackle this problem by randomizing the physics simulator (source domain) during training according to a distribution over domain parameters in order to obtain more robust policies that are able to overcome the reality gap. Most domain randomization approaches sample the domain parameters from a fixed distribution. This solution is suboptimal in the context of sim-to-real transferability, since it yields policies that have been trained without explicitly optimizing for the reward on the real system (target domain). Additionally, a fixed distribution assumes there is prior knowledge about the uncertainty over the domain parameters. In this paper, we propose Bayesian Domain Randomization (BayRn), a black-box sim-to-real algorithm that solves tasks efficiently by adapting the domain parameter distribution during learning given sparse data from the real-world target domain. BayRn uses Bayesian optimization to search the space of source domain distribution parameters for those that lead to a policy which maximizes the real-world objective, allowing for adaptive distributions during policy optimization. We experimentally validate the proposed approach in sim-to-sim as well as in sim-to-real experiments, comparing against three baseline methods on two robotic tasks. Our results show that BayRn is able to perform sim-to-real transfer, while significantly reducing the required prior knowledge. Comment: Accepted at RA-L / ICR

arXiv.org e-Print Archive
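
As a rough illustration of the outer loop described in the BayRn abstract above, the following is a minimal sketch and not the authors' implementation. It assumes a single scalar distribution parameter `phi`, a small hand-written Gaussian process with an RBF kernel, and an upper-confidence-bound acquisition rule. The function `train_in_sim_and_evaluate_on_target`, the toy optimum at 1.3, the search range, and the initial design are all hypothetical stand-ins for simulator training and a real-robot evaluation.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.3):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_query, jitter=1e-4):
    """Posterior mean and std of a zero-mean, unit-variance GP at x_query."""
    k_obs = rbf_kernel(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    k_cross = rbf_kernel(x_obs, x_query)
    solved = np.linalg.solve(k_obs, k_cross)  # shape (n_obs, n_query)
    mean = solved.T @ y_obs
    var = 1.0 - np.sum(k_cross * solved, axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))


def train_in_sim_and_evaluate_on_target(phi):
    """Hypothetical stand-in: train a policy in a simulator randomized with
    distribution parameter phi, then return its return on the target system.
    Toy assumption: the target return peaks at phi = 1.3."""
    return -(phi - 1.3) ** 2


def bayrn_style_search(n_iter=10, beta=2.0):
    """Bayesian optimization over phi using only target-domain returns."""
    candidates = np.linspace(0.0, 2.0, 201)   # assumed search range
    phis = np.array([0.2, 1.0, 1.8])          # assumed fixed initial design
    returns = np.array([train_in_sim_and_evaluate_on_target(p) for p in phis])
    for _ in range(n_iter):
        # Center the returns so the zero-mean GP prior is reasonable.
        centered = returns - returns.mean()
        mean, std = gp_posterior(phis, centered, candidates)
        # Upper confidence bound: favor high predicted return or high uncertainty.
        phi_next = candidates[np.argmax(mean + beta * std)]
        phis = np.append(phis, phi_next)
        returns = np.append(returns, train_in_sim_and_evaluate_on_target(phi_next))
    best = int(np.argmax(returns))
    return phis[best], returns[best], phis


best_phi, best_return, history = bayrn_style_search()
```

Each loop iteration corresponds to one expensive target-domain evaluation, which is why the surrogate model is used to decide where to evaluate next.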

How to pick the domain randomization parameters for sim-to-real transfer of reinforcement learning policies?

Author: Christensen Henrik I.
Gao Sicun
Su Hao
Vikram Sharad
Vuong Quan
Publication date: 27/03/2019

Recently, reinforcement learning (RL) algorithms have demonstrated remarkable success in learning complicated behaviors from minimally processed input. However, most of this success is limited to simulation. While there are promising successes in applying RL algorithms directly on real systems, their performance on more complex systems remains bottlenecked by the relative data inefficiency of RL algorithms. Domain randomization is a promising direction of research that has demonstrated impressive results using RL algorithms to control real robots. At a high level, domain randomization works by training a policy on a distribution of environmental conditions in simulation. If the environments are diverse enough, then the policy trained on this distribution will plausibly generalize to the real world. A human-specified design choice in domain randomization is the form and parameters of the distribution of simulated environments. It is unclear how best to pick the form and parameters of this distribution, and prior work uses hand-tuned distributions. This extended abstract demonstrates that the choice of the distribution plays a major role in the performance of the trained policies in the real world and that the parameters of this distribution can be optimized to maximize the performance of the trained policies in the real world. Comment: 2-page extended abstrac

arXiv.org e-Print Archive
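
To make the phrase "training a policy on a distribution of environmental conditions" from the abstract above concrete, here is a minimal sketch of the sampling step only. The parameter names (`mass`, `friction`), their ranges, and the helper names are hypothetical and chosen for illustration; a real setup would pass each sampled dictionary to a simulator before running a training episode.

```python
import random
from dataclasses import dataclass


@dataclass
class UniformRange:
    """Hand-specified uniform distribution over one simulator parameter."""
    low: float
    high: float

    def sample(self, rng):
        return rng.uniform(self.low, self.high)


def sample_domain(ranges, rng):
    """Draw one randomized environment configuration."""
    return {name: r.sample(rng) for name, r in ranges.items()}


# Hypothetical randomization ranges; these are the design choice that the
# abstract argues should be optimized rather than hand-tuned.
ranges = {
    "mass": UniformRange(0.8, 1.2),
    "friction": UniformRange(0.5, 1.5),
}

rng = random.Random(0)
# One sampled domain per training episode.
domains = [sample_domain(ranges, rng) for _ in range(5)]
```

Changing the `low` and `high` values changes the training distribution, which is the quantity the abstract proposes to optimize.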

Policy Transfer with Strategy Optimization

Author: Liu C. Karen
Turk Greg
Yu Wenhao
Publication date: 04/12/2018

Computer simulation provides an automatic and safe way for training robotic control policies to achieve complex tasks such as locomotion. However, a policy trained in simulation usually does not transfer directly to the real hardware due to the differences between the two environments. Transfer learning using domain randomization is a promising approach, but it usually assumes that the target environment is close to the distribution of the training environments, thus relying heavily on accurate system identification. In this paper, we present a different approach that leverages domain randomization for transferring control policies to unknown environments. The key idea is that, instead of learning a single policy in the simulation, we simultaneously learn a family of policies that exhibit different behaviors. When tested in the target environment, we directly search for the best policy in the family based on the task performance, without the need to identify the dynamic parameters. We evaluate our method on five simulated robotic control problems with different discrepancies in the training and testing environment and demonstrate that our method can overcome larger modeling errors compared to training a robust policy or an adaptive policy.

arXiv.org e-Print Archive
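
The idea in the abstract above, searching a family of policies directly by task performance in the target environment, can be sketched on a toy problem. This is an illustrative assumption-laden example, not the paper's method: the "family" is a set of scalar feedback gains, the target is a one-dimensional linear system whose input gain is hidden from the search, and plain grid search stands in for whatever optimizer the authors use.

```python
def rollout_return(feedback_gain, input_gain, horizon=20):
    """Return (negative quadratic cost) of the policy u = -feedback_gain * x
    on the toy system x_next = x + input_gain * u, starting from x = 1."""
    x, total = 1.0, 0.0
    for _ in range(horizon):
        total -= x * x
        x = x + input_gain * (-feedback_gain * x)
    return total


def select_strategy(candidate_gains, evaluate):
    """Pick the family member with the highest measured task return.
    No dynamics parameter of the target is identified along the way."""
    return max(candidate_gains, key=evaluate)


# Hypothetical target dynamics; the search only sees returns, never this value.
TARGET_INPUT_GAIN = 0.5

# Policy family indexed by a feedback gain from 0.0 to 4.0 in steps of 0.1.
candidates = [i / 10 for i in range(41)]
best_gain = select_strategy(
    candidates, lambda g: rollout_return(g, TARGET_INPUT_GAIN)
)
```

In this toy setting the selected gain drives the state to zero in one step, even though the search never estimates the input gain itself.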

Active Domain Randomization

Author: Diaz Manfred
Golemo Florian
Mehta Bhairav
Pal Christopher J.
Paull Liam
Publication date: 10/07/2019

Domain randomization is a popular technique for improving domain transfer, often used in a zero-shot setting when the target domain is unknown or cannot easily be used for training. In this work, we empirically examine the effects of domain randomization on agent generalization. Our experiments show that domain randomization may lead to suboptimal, high-variance policies, which we attribute to the uniform sampling of environment parameters. We propose Active Domain Randomization, a novel algorithm that learns a parameter sampling strategy. Our method looks for the most informative environment variations within the given randomization ranges by leveraging the discrepancies of policy rollouts in randomized and reference environment instances. We find that training more frequently on these instances leads to better overall agent generalization. Our experiments across various physics-based simulated and real-robot tasks show that this enhancement leads to more robust, consistent policies. Comment: Code available at https://github.com/montrealrobotics/active-domainran

arXiv.org e-Print Archive

Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience

Author: Chebotar Yevgen
Fox Dieter
Handa Ankur
Issac Jan
Macklin Miles
Makoviychuk Viktor
Ratliff Nathan
Publication date: 05/03/2019

We consider the problem of transferring policies to the real world by training on a distribution of simulated scenarios. Rather than manually tuning the randomization of simulations, we adapt the simulation parameter distribution using a few real world roll-outs interleaved with policy training. In doing so, we are able to change the distribution of simulations to improve the policy transfer by matching the policy behavior in simulation and the real world. We show that policies trained with our method are able to reliably transfer to different robots in two real world tasks: swing-peg-in-hole and opening a cabinet drawer. The video of our experiments can be found at https://sites.google.com/view/simop

arXiv.org e-Print Archive

Using Deep Reinforcement Learning to Learn High-Level Policies on the ATRIAS Biped

Author: Atkeson Christopher G.
Geyer Hartmut
Li Tianyu
Rai Akshara
Publication date: 27/09/2018

Learning controllers for bipedal robots is a challenging problem, often requiring expert knowledge and extensive tuning of parameters that vary in different situations. Recently, deep reinforcement learning has shown promise at automatically learning controllers for complex systems in simulation. This has been followed by a push towards learning controllers that can be transferred between simulation and hardware, primarily with the use of domain randomization. However, domain randomization can make the problem of finding stable controllers even more challenging, especially for underactuated bipedal robots. In this work, we explore whether policies learned in simulation can be transferred to hardware with the use of high-fidelity simulators and structured controllers. We learn a neural network policy which is a part of a more structured controller. While the neural network is learned in simulation, the rest of the controller stays fixed, and can be tuned by the expert as needed. We show that using this approach can greatly speed up the rate of learning in simulation, as well as enable transfer of policies between simulation and hardware. We present our results on an ATRIAS robot and explore the effect of action spaces and cost functions on the rate of transfer between simulation and hardware. Our results show that structured policies can indeed be learned in simulation and implemented on hardware successfully. This has several advantages, as the structure preserves the intuitive nature of the policy, and the neural network improves the performance of the hand-designed policy. In this way, we propose a way of using neural networks to improve expert designed controllers, while maintaining ease of understanding. Comment: Submitted to 2019 IEEE International Conference on Robotics and Automatio

arXiv.org e-Print Archive

A Data-Efficient Framework for Training and Sim-to-Real Transfer of Navigation Policies

Author: Bengio Yoshua
Bharadhwaj Homanga
Paull Liam
Wang Zihan
Publication date: 11/10/2018

Learning effective visuomotor policies for robots purely from data is challenging, but also appealing since a learning-based system should not require manual tuning or calibration. In the case of a robot operating in a real environment the training process can be costly, time-consuming, and even dangerous since failures are common at the start of training. For this reason, it is desirable to be able to leverage simulation and off-policy data to the extent possible to train the robot. In this work, we introduce a robust framework that plans in simulation and transfers well to the real environment. Our model incorporates a gradient-descent based planning module, which, given the initial image and goal image, encodes the images to a lower dimensional latent state and plans a trajectory to reach the goal. The model, consisting of the encoder and planner modules, is trained through a meta-learning strategy in simulation first. We subsequently perform adversarial domain transfer on the encoder by using a bank of unlabelled but random images from the simulation and real environments to enable the encoder to map images from the real and simulated environments to a similarly distributed latent representation. By fine-tuning the entire model (encoder + planner) with far fewer real-world expert demonstrations, we show successful planning performances in different navigation tasks. Comment: Under review in ICRA 201

arXiv.org e-Print Archive

From Video Game to Real Robot: The Transfer between Action Spaces

Author: Hautamäki Ville
Kanervisto Anssi
Karttunen Janne
Kyrki Ville
Publication date: 23/03/2020

Deep reinforcement learning has proven to be successful for learning tasks in simulated environments, but applying the same techniques to robots in the real world is more challenging, as they require hours of training. To address this, transfer learning can be used to train the policy first in a simulated environment and then transfer it to a physical agent. As the simulation never matches reality perfectly, the physics, visuals and action spaces by necessity differ between these environments to some degree. In this work, we study how general video games can be directly used instead of fine-tuned simulations for the sim-to-real transfer. In particular, we study how the agent can learn the new action space autonomously when the game actions do not match the robot actions. Our results show that the different action space can be learned by re-training only part of the neural network, and we obtain above 90% mean success rate in simulation and robot experiments. Comment: Two first authors contributed equally. Accepted by ICASSP 202

arXiv.org e-Print Archive

Policy Transfer via Kinematic Domain Randomization and Adaptation

Author: Exarchos Ioannis
Jiang Yifeng
Liu C. Karen
Yu Wenhao
Publication date: 01/04/2021

Transferring reinforcement learning policies trained in physics simulation to the real hardware remains a challenge, known as the "sim-to-real" gap. Domain randomization is a simple yet effective technique to address dynamics discrepancies across source and target domains, but its success generally depends on heuristics and trial-and-error. In this work, we investigate the impact of randomized parameter selection on policy transferability across different types of domain discrepancies. Contrary to common practice in which kinematic parameters are carefully measured while dynamic parameters are randomized, we found that virtually randomizing kinematic parameters (e.g., link lengths) during training in simulation generally outperforms dynamic randomization. Based on this finding, we introduce a new domain adaptation algorithm that utilizes simulated kinematic parameter variation. Our algorithm, Multi-Policy Bayesian Optimization, trains an ensemble of universal policies conditioned on virtual kinematic parameters and efficiently adapts to the target environment using a limited number of target domain rollouts. We showcase our findings on a simulated quadruped robot in five different target environments covering different aspects of domain discrepancies. Comment: Submitted to the 2021 IEEE International Conference on Robotics and Automation (ICRA

arXiv.org e-Print Archive

A User's Guide to Calibrating Robotics Simulators

Author: Fox Dieter
Handa Ankur
Mehta Bhairav
Ramos Fabio
Publication date: 17/11/2020

Simulators are a critical component of modern robotics research. Strategies for both perception and decision making can be studied in simulation first before being deployed to real-world systems, saving time and cost. Despite significant progress on the development of sim-to-real algorithms, the analysis of different methods is still conducted in an ad-hoc manner, without a consistent set of tests and metrics for comparison. This paper fills this gap and proposes a set of benchmarks and a framework for the study of various algorithms that aim to transfer models and policies learnt in simulation to the real world. We conduct experiments on a wide range of well-known simulated environments to characterize and offer insights into the performance of different algorithms. Our analysis can be useful for practitioners working in this area and can help make informed choices about the behavior and main properties of sim-to-real algorithms. We open-source the benchmark, training data, and trained models, which can be found at https://github.com/NVlabs/sim-parameter-estimation. Comment: Accepted at Conference on Robot Learning 202

arXiv.org e-Print Archive