Cyclic Policy Distillation: Sample-Efficient Sim-to-Real Reinforcement Learning with Domain Randomization
Deep reinforcement learning with domain randomization learns a control policy
in various simulations with randomized physical and sensor model parameters to
become transferable to the real world in a zero-shot setting. However, when the
range of randomized parameters is extensive, learning an effective policy often
requires a huge number of samples because policy updates become unstable. To
alleviate this problem, we propose a sample-efficient method named
cyclic policy distillation (CPD). CPD divides the range of randomized
parameters into several small sub-domains and assigns a local policy to each
one. Local policies are then learned while cyclically transitioning between
sub-domains, and CPD accelerates this learning through knowledge transfer based on
expected performance improvements. Finally, all of the learned local policies
are distilled into a global policy for sim-to-real transfers. CPD's
effectiveness and sample efficiency are demonstrated through simulations with
four tasks (Pendulum from OpenAI Gym and Pusher, Swimmer, and HalfCheetah from
MuJoCo) and a real-robot ball-dispersal task. We published the code and videos
from our experiments at
https://github.com/yuki-kadokawa/cyclic-policy-distillation
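As a rough illustration of the training loop described above, the following is a minimal sketch in Python (not the published implementation; see the repository above for that). It replaces the deep RL updates with random search on a scalar gain for a toy one-dimensional system and replaces distillation with a simple average of the local gains; all names (rollout_cost, local_update, the toy dynamics) are illustrative assumptions. The point is the structure: sub-domains of the randomized parameter, one local policy each, cyclic visits with neighbor-to-neighbor transfer, and a final distillation step.

```python
# Hypothetical toy system in place of the RL benchmarks:
# x' = x + dt * (a + m * x), where m is the randomized physical parameter.
import numpy as np

rng = np.random.default_rng(0)

def rollout_cost(gain, m, steps=50, dt=0.05):
    """Run a linear policy a = -gain * x on the toy system and return its cost."""
    x, cost = 1.0, 0.0
    for _ in range(steps):
        a = -gain * x
        x = x + dt * (a + m * x)
        cost += x ** 2 + 0.01 * a ** 2
    return cost

def local_update(gain, sub_domain, iters=20):
    """Stand-in for one RL update pass: random search on the gain
    within a single sub-domain of the randomized parameter m."""
    for _ in range(iters):
        m = rng.uniform(*sub_domain)
        candidate = gain + rng.normal(scale=0.1)
        if rollout_cost(candidate, m) < rollout_cost(gain, m):
            gain = candidate
    return gain

# 1) Divide the randomized-parameter range into small sub-domains.
full_range, n_sub = (0.0, 2.0), 4
edges = np.linspace(*full_range, n_sub + 1)
sub_domains = list(zip(edges[:-1], edges[1:]))

# 2) Assign one local policy (here: a scalar gain) per sub-domain.
local_gains = [0.0] * n_sub

# 3) Cyclically visit sub-domains; transfer knowledge from the neighbor
#    (here: warm-start from it when it performs better on this sub-domain,
#    a crude stand-in for transfer based on expected performance improvement).
for cycle in range(3):
    for i, sd in enumerate(sub_domains):
        neighbor = local_gains[(i - 1) % n_sub]
        m_mid = sum(sd) / 2
        if rollout_cost(neighbor, m_mid) < rollout_cost(local_gains[i], m_mid):
            local_gains[i] = neighbor
        local_gains[i] = local_update(local_gains[i], sd)

# 4) Distill all local policies into one global policy. For these linear
#    policies the sketch simply averages the gains.
global_gain = float(np.mean(local_gains))
print("local gains:", np.round(local_gains, 3), "global gain:", round(global_gain, 3))
```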
Physically Consistent Preferential Bayesian Optimization for Food Arrangement
This paper considers the problem of estimating a preferred food arrangement
for users from interactive pairwise comparisons using Computer Graphics
(CG)-based dish images. To meet foodservice-industry requirements, we need to
utilize domain rules for the geometry of the arrangement (e.g., the food layout
of some Japanese dishes is reminiscent of mountains). However, those rules are
qualitative and ambiguous, so the estimated result might be physically
inconsistent (e.g., food items physically interfere with one another, making
the arrangement infeasible). To cope with this problem, we propose Physically
Consistent Preferential Bayesian Optimization (PCPBO) as a method that obtains
physically feasible and preferred arrangements that satisfy domain rules. PCPBO
employs a bi-level optimization that combines a physical simulation-based
optimization and a Preference-based Bayesian Optimization (PbBO). Our
experimental results demonstrated the effectiveness of PCPBO on both simulated
users and actual human users.
Comment: 8 pages, 10 figures, accepted by IEEE Robotics and Automation Letters (RA-L) 202
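The bi-level structure can be sketched as follows, under deliberately simplified assumptions: a linear Bradley-Terry utility fitted from pairwise comparisons stands in for the preference model, and a geometric overlap-resolution step stands in for the physical simulation-based optimization. The names hidden_ideal, make_feasible, and fit_utility are illustrative, not from the paper.

```python
# Hypothetical toy: arrange 3 food items in 2D. The "physics" layer just
# pushes overlapping items apart; the real method runs a physical simulation.
import numpy as np

rng = np.random.default_rng(1)
N_ITEMS, MIN_DIST = 3, 0.2
hidden_ideal = np.array([0.0, 0.4, -0.3, 0.0, 0.3, 0.0])  # unknown to the optimizer

def make_feasible(x, iters=50):
    """Inner level: project an arrangement onto physically consistent ones
    by separating item pairs that are closer than MIN_DIST."""
    p = x.reshape(N_ITEMS, 2).copy()
    for _ in range(iters):
        for i in range(N_ITEMS):
            for j in range(i + 1, N_ITEMS):
                d = p[i] - p[j]
                dist = np.linalg.norm(d) + 1e-9
                if dist < MIN_DIST:
                    shift = (MIN_DIST - dist) / 2 * d / dist
                    p[i] += shift
                    p[j] -= shift
    return p.ravel()

def user_prefers(a, b):
    """Simulated user: prefers arrangements closer to a hidden ideal layout."""
    return np.linalg.norm(a - hidden_ideal) < np.linalg.norm(b - hidden_ideal)

def fit_utility(duels, dim, steps=200, lr=0.5):
    """Fit a linear Bradley-Terry utility u(x) = w.x from pairwise comparisons."""
    w = np.zeros(dim)
    for _ in range(steps):
        for winner, loser in duels:
            p = 1.0 / (1.0 + np.exp(-(w @ (winner - loser))))
            w += lr * (1.0 - p) * (winner - loser)
    return w

duels, best = [], make_feasible(rng.uniform(-0.5, 0.5, 6))
for it in range(20):
    w = fit_utility(duels, dim=6) if duels else np.zeros(6)
    # Outer level: propose the candidate with the best (noisy) estimated utility,
    # then make it physically consistent before showing it to the user.
    cands = rng.uniform(-0.5, 0.5, size=(64, 6))
    scores = cands @ w + 0.1 * rng.standard_normal(64)   # crude exploration
    cand = make_feasible(cands[np.argmax(scores)])
    if user_prefers(cand, best):
        duels.append((cand, best))
        best = cand
    else:
        duels.append((best, cand))

print("estimated preferred arrangement:", np.round(best, 2))
```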
Learning Real-Robot Cloth Manipulation via Deep Reinforcement Learning that Smoothly Updates the Policy
Doctoral dissertation No. 1737, Doctor of Engineering, Nara Institute of Science and Technology
Randomized-to-Canonical Model Predictive Control for Real-world Visual Robotic Manipulation
Many works have recently explored sim-to-real transferable visual model
predictive control (MPC). However, such works are limited to one-shot transfer,
where real-world data must be collected once to perform the sim-to-real
transfer, which still demands significant human effort to transfer models
learned in simulation to new domains in the real world. To alleviate this
problem, we first propose a novel model-learning framework called Kalman
Randomized-to-Canonical Model (KRC-model). This framework is capable of
extracting task-relevant intrinsic features and their dynamics from randomized
images. We then propose Kalman Randomized-to-Canonical Model Predictive Control
(KRC-MPC) as a zero-shot sim-to-real transferable visual MPC using KRC-model.
The effectiveness of our method is evaluated through a valve rotation task by a
robot hand in both simulation and the real world, and a block mating task in
simulation. The experimental results show that KRC-MPC can be applied to
various real domains and tasks in a zero-shot manner.
Comment: 8 pages, accepted by IEEE Robotics and Automation Letters
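A minimal sketch of the estimate-then-plan loop is given below, under strong simplifying assumptions: the randomized-to-canonical encoder is a stub that strips a synthetic appearance nuisance, and the latent dynamics are fixed and known rather than learned as in the KRC-model. It only illustrates the combination of a Kalman filter in a canonical latent space with random-shooting MPC on top; all names (encode, kalman_step, mpc_action) are illustrative.

```python
# Hypothetical toy in place of the learned KRC-model: a 2-D linear latent
# state, a stub "encoder" standing in for the network that maps randomized
# images to canonical features, and random-shooting MPC over the latent model.
import numpy as np

rng = np.random.default_rng(2)
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # latent dynamics z' = A z + B a
B = np.array([[0.0], [0.1]])
C = np.eye(2)                             # canonical features y = C z + noise
Q, R = 0.01 * np.eye(2), 0.05 * np.eye(2)
target = np.array([1.0, 0.0])

def encode(randomized_obs):
    """Stand-in for the randomized-to-canonical encoder: strips the
    domain-randomization nuisance appended to the observation."""
    return randomized_obs[:2]

def kalman_step(z, P, a, y):
    """One predict/update cycle of the Kalman filter in latent space."""
    z_pred = A @ z + B @ a
    P_pred = A @ P @ A.T + Q
    K = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + R)
    z_new = z_pred + K @ (y - C @ z_pred)
    P_new = (np.eye(2) - K @ C) @ P_pred
    return z_new, P_new

def mpc_action(z, horizon=10, n_samples=256):
    """Random-shooting MPC: sample action sequences, roll them out through
    the latent model, and return the first action of the cheapest sequence."""
    seqs = rng.uniform(-1.0, 1.0, size=(n_samples, horizon, 1))
    costs = np.zeros(n_samples)
    for k in range(n_samples):
        zk = z.copy()
        for t in range(horizon):
            zk = A @ zk + B @ seqs[k, t]
            costs[k] += np.sum((zk - target) ** 2) + 0.01 * seqs[k, t, 0] ** 2
    return seqs[np.argmin(costs), 0]

# Closed loop: the "real" system produces randomized observations.
z_true = np.zeros(2)
z_est, P = np.zeros(2), np.eye(2)
for step in range(30):
    a = mpc_action(z_est)
    z_true = A @ z_true + B @ a + 0.01 * rng.standard_normal(2)
    nuisance = rng.standard_normal(3)                      # randomized appearance
    randomized_obs = np.concatenate([z_true + 0.05 * rng.standard_normal(2), nuisance])
    y = encode(randomized_obs)
    z_est, P = kalman_step(z_est, P, a, y)

print("final latent estimate:", np.round(z_est, 3), "target:", target)
```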