5 research outputs found

Meta Reinforcement Learning for Sim-to-real Domain Adaptation

Authors: Arndt, Karol; Ghadirzadeh, Ali; Hazara, Murtaza; Kyrki, Ville
Publication date: 16/09/2019

Modern reinforcement learning methods suffer from low sample efficiency and unsafe exploration, making it infeasible to train robotic policies entirely on real hardware. In this work, we propose to address the problem of sim-to-real domain transfer by using meta learning to train a policy that can adapt to a variety of dynamic conditions, and using a task-specific trajectory generation model to provide an action space that facilitates quick exploration. We evaluate the method by performing domain adaptation in simulation and analyzing the structure of the latent space during adaptation. We then deploy this policy on a KUKA LBR 4+ robot and evaluate its performance on a task of hitting a hockey puck to a target. Our method shows more consistent and stable domain adaptation than the baseline, resulting in better overall performance.

Comment: Submitted to ICRA 202

arXiv.org e-Print Archive

Crossref

Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models

Authors: Arndt, Karol; Björkman, Mårten; Finn, Chelsea; Ghadirzadeh, Ali; Kragic, Danica; Kyrki, Ville; Poklukar, Petra
Publication date: 18/04/2022

We present a data-efficient framework for solving sequential decision-making problems which exploits the combination of reinforcement learning (RL) and latent variable generative models. The framework, called GenRL, trains deep policies by introducing an action latent variable such that the feed-forward policy search can be divided into two parts: (i) training a sub-policy that outputs a distribution over the action latent variable given a state of the system, and (ii) unsupervised training of a generative model that outputs a sequence of motor actions conditioned on the latent action variable. GenRL enables safe exploration and alleviates the data-inefficiency problem as it exploits prior knowledge about valid sequences of motor actions. Moreover, we provide a set of measures for evaluating generative models such that we are able to predict the performance of the RL policy training prior to the actual training on a physical robot. We experimentally determine the characteristics of generative models that have the most influence on the performance of the final policy training on two robotics tasks: shooting a hockey puck and throwing a basketball. Furthermore, we empirically demonstrate that GenRL is the only method that can safely and efficiently solve the robotics tasks compared to two state-of-the-art RL methods.

Comment: arXiv admin note: substantial text overlap with arXiv:2007.1313
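The two-part split described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear sub-policy, the Gaussian latent distribution, and the toy `decode_actions` function are all hypothetical stand-ins chosen only to show how a state maps to a latent distribution and how a sampled latent maps to a bounded sequence of motor actions.

```python
import math
import random


def sub_policy(state, weights, log_std):
    """Hypothetical linear sub-policy: maps a state to the mean and
    standard deviation of a Gaussian over the action latent variable."""
    mean = [sum(w * s for w, s in zip(row, state)) for row in weights]
    std = [math.exp(ls) for ls in log_std]
    return mean, std


def sample_latent(mean, std, rng):
    """Draw one latent action from the sub-policy's distribution."""
    return [rng.gauss(m, s) for m, s in zip(mean, std)]


def decode_actions(latent, horizon):
    """Toy stand-in for a trained generative model: turns a latent vector
    into a sequence of `horizon` motor commands. The tanh keeps every
    command in [-1, 1], mimicking a decoder that only emits valid actions."""
    return [
        [math.tanh(z * math.sin(math.pi * (t + 1) / horizon)) for z in latent]
        for t in range(horizon)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    mean, std = sub_policy([1.0, 2.0], [[0.5, 0.0], [0.0, -0.5]], [0.0, 0.0])
    latent = sample_latent(mean, std, rng)
    for step, action in enumerate(decode_actions(latent, horizon=5)):
        print(step, action)
```

In this sketch, exploration happens only in the latent space, so any sampled latent still decodes to a bounded action sequence; that is one plausible reading of how exploiting prior knowledge about valid motor sequences could support safe exploration.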

arXiv.org e-Print Archive

Aaltodoc Publication Archive

Transferring Generalizable Motor Primitives From Simulation to Real World

Authors: Hazara, Murtaza; Kyrki, Ville
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/04/2019

Reinforcement learning provides robots with an autonomous learning framework where a skill can be learned by exploration. Exploration in the real world is, however, inherently unsafe and time consuming, and causes wear and tear. To address these issues, learning policies in simulation and then transferring them to physical systems has been proposed. In this letter, we propose a novel sample-efficient transfer approach, which is agnostic to the dynamics of a simulated system and combines it with incremental learning. Instead of transferring a single control policy, we transfer a generalizable contextual policy generated in simulation using one or a few samples from the real world to a target global model, which can generate policies across parameterized real-world situations. We studied the generalization capability of the incremental transfer framework using the MuJoCo physics engine and a KUKA LBR 4+ robot. Experiments with ball-in-a-cup and basketball tasks demonstrated that the target model improved the generalization capability beyond the direct use of the source model, indicating the effectiveness of the proposed framework. Experiments also indicated that the transfer capability depends on the generalization capability of the corresponding source model, the similarity between the source and target environments, and the number of samples used for transferring.

Peer reviewe

Aaltodoc Publication Archive

Domain Adaptation in Unmanned Aerial Vehicles Landing using Reinforcement Learning

Author: Franca Albuquerque, Pedro Lucas
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 01/12/2019

Landing an unmanned aerial vehicle (UAV) on a moving platform is a challenging task that often requires exact models of the UAV dynamics, platform characteristics, and environmental conditions. In this thesis, we present and investigate three different machine learning approaches with varying levels of domain knowledge: dynamics randomization, universal policy with system identification, and reinforcement learning with no parameter variation. We first train the policies in simulation, then perform experiments both in simulation, varying the system dynamics through wind and friction coefficient, and on a real robot system with wind variation. We initially expected that providing more information on environmental characteristics with system identification would improve the outcomes; however, we found that transferring a policy learned in simulation with domain randomization to the real robot system achieves the best result both on the real robot and in simulation, although in simulation the universal policy with system identification is faster in some cases. In this thesis, we compare the results of multiple deep reinforcement learning approaches trained in simulation and transferred to robot experiments in the presence of external disturbances. We were able to create a policy to control a UAV that was trained completely in simulation and transferred to a real system in the presence of external disturbances. In doing so, we evaluate the performance of dynamics randomization and universal policy with system identification.

Adviser: Carrick Detweile
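As a rough illustration of the dynamics randomization idea mentioned in this abstract, the sketch below draws a fresh wind speed and friction coefficient for each training episode. The parameter ranges, units, and all names (`DynamicsParams`, `sample_dynamics`, `randomized_episodes`) are assumptions made for illustration and are not taken from the thesis.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DynamicsParams:
    wind_speed: float  # assumed unit: m/s (hypothetical)
    friction: float    # dimensionless friction coefficient (hypothetical)


def sample_dynamics(rng, wind_range=(0.0, 5.0), friction_range=(0.2, 1.0)):
    """Sample one set of simulator dynamics uniformly from assumed ranges."""
    return DynamicsParams(
        wind_speed=rng.uniform(*wind_range),
        friction=rng.uniform(*friction_range),
    )


def randomized_episodes(n_episodes, seed=0):
    """Return the dynamics each training episode would be simulated under.
    A seeded generator keeps the sequence reproducible."""
    rng = random.Random(seed)
    return [sample_dynamics(rng) for _ in range(n_episodes)]


if __name__ == "__main__":
    for i, params in enumerate(randomized_episodes(3)):
        # A real training loop would reset the simulator with `params` here.
        print(i, params)
```

Because the policy never sees the same dynamics twice, it has to act well across the whole sampled range, which is the usual motivation for expecting such a policy to tolerate unmodeled disturbances on the real system.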