Cyclic Policy Distillation: Sample-Efficient Sim-to-Real Reinforcement Learning with Domain Randomization
Deep reinforcement learning with domain randomization learns a control policy
in various simulations with randomized physical and sensor model parameters to
become transferable to the real world in a zero-shot setting. However, when the
range of randomized parameters is extensive, learning an effective policy often
requires a huge number of samples because policy updates become unstable. To
alleviate this problem, we propose a sample-efficient method named
cyclic policy distillation (CPD). CPD divides the range of randomized
parameters into several small sub-domains and assigns a local policy to each
one. Local policies are then learned while cyclically transitioning between
sub-domains, and CPD accelerates this learning through knowledge transfer based on
expected performance improvements. Finally, all of the learned local policies
are distilled into a global policy for sim-to-real transfers. CPD's
effectiveness and sample efficiency are demonstrated through simulations with
four tasks (Pendulum from OpenAI Gym and Pusher, Swimmer, and HalfCheetah from
MuJoCo) and a real-robot ball-dispersal task. We published the code and videos
from our experiments at
https://github.com/yuki-kadokawa/cyclic-policy-distillation
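As a rough illustration of the training loop described above, the following is a minimal sketch in Python (not the published implementation; see the repository above for that). It replaces the deep RL updates with random search on a scalar gain for a toy one-dimensional system and replaces distillation with a simple average of the local gains; all names (rollout_cost, local_update, the toy dynamics) are illustrative assumptions. The point is the structure: sub-domains of the randomized parameter, one local policy each, cyclic visits with neighbor-to-neighbor transfer, and a final distillation step.

```python
# Hypothetical toy system in place of the RL benchmarks:
# x' = x + dt * (a + m * x), where m is the randomized physical parameter.
import numpy as np

rng = np.random.default_rng(0)

def rollout_cost(gain, m, steps=50, dt=0.05):
    """Run a linear policy a = -gain * x on the toy system and return its cost."""
    x, cost = 1.0, 0.0
    for _ in range(steps):
        a = -gain * x
        x = x + dt * (a + m * x)
        cost += x ** 2 + 0.01 * a ** 2
    return cost

def local_update(gain, sub_domain, iters=20):
    """Stand-in for one RL update pass: random search on the gain
    within a single sub-domain of the randomized parameter m."""
    for _ in range(iters):
        m = rng.uniform(*sub_domain)
        candidate = gain + rng.normal(scale=0.1)
        if rollout_cost(candidate, m) < rollout_cost(gain, m):
            gain = candidate
    return gain

# 1) Divide the randomized-parameter range into small sub-domains.
full_range, n_sub = (0.0, 2.0), 4
edges = np.linspace(*full_range, n_sub + 1)
sub_domains = list(zip(edges[:-1], edges[1:]))

# 2) Assign one local policy (here: a scalar gain) per sub-domain.
local_gains = [0.0] * n_sub

# 3) Cyclically visit sub-domains; transfer knowledge from the neighbor
#    (here: warm-start from it when it performs better on this sub-domain,
#    a crude stand-in for transfer based on expected performance improvement).
for cycle in range(3):
    for i, sd in enumerate(sub_domains):
        neighbor = local_gains[(i - 1) % n_sub]
        m_mid = sum(sd) / 2
        if rollout_cost(neighbor, m_mid) < rollout_cost(local_gains[i], m_mid):
            local_gains[i] = neighbor
        local_gains[i] = local_update(local_gains[i], sd)

# 4) Distill all local policies into one global policy. For these linear
#    policies the sketch simply averages the gains.
global_gain = float(np.mean(local_gains))
print("local gains:", np.round(local_gains, 3), "global gain:", round(global_gain, 3))
```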
Physically Consistent Preferential Bayesian Optimization for Food Arrangement
This paper considers the problem of estimating a preferred food arrangement
for users from interactive pairwise comparisons using Computer Graphics
(CG)-based dish images. To meet foodservice-industry requirements, we need to
utilize domain rules for the geometry of the arrangement (e.g., the food layout
of some Japanese dishes is reminiscent of mountains). However, those rules are
qualitative and ambiguous, so the estimated result might be physically
inconsistent (e.g., food items physically interfere with one another, making
the arrangement infeasible). To cope with this problem, we propose Physically
Consistent Preferential Bayesian Optimization (PCPBO) as a method that obtains
physically feasible and preferred arrangements that satisfy domain rules. PCPBO
employs a bi-level optimization that combines a physical simulation-based
optimization and a Preference-based Bayesian Optimization (PbBO). Our
experimental results demonstrated the effectiveness of PCPBO on both simulated
users and actual human users.
Comment: 8 pages, 10 figures, accepted by IEEE Robotics and Automation Letters (RA-L) 202
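The bi-level structure can be sketched as follows, under deliberately simplified assumptions: a linear Bradley-Terry utility fitted from pairwise comparisons stands in for the preference model, and a geometric overlap-resolution step stands in for the physical simulation-based optimization. The names hidden_ideal, make_feasible, and fit_utility are illustrative, not from the paper.

```python
# Hypothetical toy: arrange 3 food items in 2D. The "physics" layer just
# pushes overlapping items apart; the real method runs a physical simulation.
import numpy as np

rng = np.random.default_rng(1)
N_ITEMS, MIN_DIST = 3, 0.2
hidden_ideal = np.array([0.0, 0.4, -0.3, 0.0, 0.3, 0.0])  # unknown to the optimizer

def make_feasible(x, iters=50):
    """Inner level: project an arrangement onto physically consistent ones
    by separating item pairs that are closer than MIN_DIST."""
    p = x.reshape(N_ITEMS, 2).copy()
    for _ in range(iters):
        for i in range(N_ITEMS):
            for j in range(i + 1, N_ITEMS):
                d = p[i] - p[j]
                dist = np.linalg.norm(d) + 1e-9
                if dist < MIN_DIST:
                    shift = (MIN_DIST - dist) / 2 * d / dist
                    p[i] += shift
                    p[j] -= shift
    return p.ravel()

def user_prefers(a, b):
    """Simulated user: prefers arrangements closer to a hidden ideal layout."""
    return np.linalg.norm(a - hidden_ideal) < np.linalg.norm(b - hidden_ideal)

def fit_utility(duels, dim, steps=200, lr=0.5):
    """Fit a linear Bradley-Terry utility u(x) = w.x from pairwise comparisons."""
    w = np.zeros(dim)
    for _ in range(steps):
        for winner, loser in duels:
            p = 1.0 / (1.0 + np.exp(-(w @ (winner - loser))))
            w += lr * (1.0 - p) * (winner - loser)
    return w

duels, best = [], make_feasible(rng.uniform(-0.5, 0.5, 6))
for it in range(20):
    w = fit_utility(duels, dim=6) if duels else np.zeros(6)
    # Outer level: propose the candidate with the best (noisy) estimated utility,
    # then make it physically consistent before showing it to the user.
    cands = rng.uniform(-0.5, 0.5, size=(64, 6))
    scores = cands @ w + 0.1 * rng.standard_normal(64)   # crude exploration
    cand = make_feasible(cands[np.argmax(scores)])
    if user_prefers(cand, best):
        duels.append((cand, best))
        best = cand
    else:
        duels.append((best, cand))

print("estimated preferred arrangement:", np.round(best, 2))
```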
Learning Real-Robot Cloth Manipulation via Deep Reinforcement Learning that Smoothly Updates the Policy
Doctoral dissertation No. 1737, Doctor of Engineering, Nara Institute of Science and Technology
Randomized-to-Canonical Model Predictive Control for Real-world Visual Robotic Manipulation
Many works have recently explored sim-to-real transferable visual model
predictive control (MPC). However, such works are limited to one-shot transfer,
where real-world data must be collected once to perform the sim-to-real
transfer, which still demands significant human effort to transfer models
learned in simulation to new domains in the real world. To alleviate this
problem, we first propose a novel model-learning framework called Kalman
Randomized-to-Canonical Model (KRC-model). This framework is capable of
extracting task-relevant intrinsic features and their dynamics from randomized
images. We then propose Kalman Randomized-to-Canonical Model Predictive Control
(KRC-MPC) as a zero-shot sim-to-real transferable visual MPC using KRC-model.
The effectiveness of our method is evaluated through a valve rotation task by a
robot hand in both simulation and the real world, and a block mating task in
simulation. The experimental results show that KRC-MPC can be applied to
various real domains and tasks in a zero-shot manner.
Comment: 8 pages, accepted by IEEE Robotics and Automation Letters
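A minimal sketch of the estimate-then-plan loop is given below, under strong simplifying assumptions: the randomized-to-canonical encoder is a stub that strips a synthetic appearance nuisance, and the latent dynamics are fixed and known rather than learned as in the KRC-model. It only illustrates the combination of a Kalman filter in a canonical latent space with random-shooting MPC on top; all names (encode, kalman_step, mpc_action) are illustrative.

```python
# Hypothetical toy in place of the learned KRC-model: a 2-D linear latent
# state, a stub "encoder" standing in for the network that maps randomized
# images to canonical features, and random-shooting MPC over the latent model.
import numpy as np

rng = np.random.default_rng(2)
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # latent dynamics z' = A z + B a
B = np.array([[0.0], [0.1]])
C = np.eye(2)                             # canonical features y = C z + noise
Q, R = 0.01 * np.eye(2), 0.05 * np.eye(2)
target = np.array([1.0, 0.0])

def encode(randomized_obs):
    """Stand-in for the randomized-to-canonical encoder: strips the
    domain-randomization nuisance appended to the observation."""
    return randomized_obs[:2]

def kalman_step(z, P, a, y):
    """One predict/update cycle of the Kalman filter in latent space."""
    z_pred = A @ z + B @ a
    P_pred = A @ P @ A.T + Q
    K = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + R)
    z_new = z_pred + K @ (y - C @ z_pred)
    P_new = (np.eye(2) - K @ C) @ P_pred
    return z_new, P_new

def mpc_action(z, horizon=10, n_samples=256):
    """Random-shooting MPC: sample action sequences, roll them out through
    the latent model, and return the first action of the cheapest sequence."""
    seqs = rng.uniform(-1.0, 1.0, size=(n_samples, horizon, 1))
    costs = np.zeros(n_samples)
    for k in range(n_samples):
        zk = z.copy()
        for t in range(horizon):
            zk = A @ zk + B @ seqs[k, t]
            costs[k] += np.sum((zk - target) ** 2) + 0.01 * seqs[k, t, 0] ** 2
    return seqs[np.argmin(costs), 0]

# Closed loop: the "real" system produces randomized observations.
z_true = np.zeros(2)
z_est, P = np.zeros(2), np.eye(2)
for step in range(30):
    a = mpc_action(z_est)
    z_true = A @ z_true + B @ a + 0.01 * rng.standard_normal(2)
    nuisance = rng.standard_normal(3)                      # randomized appearance
    randomized_obs = np.concatenate([z_true + 0.05 * rng.standard_normal(2), nuisance])
    y = encode(randomized_obs)
    z_est, P = kalman_step(z_est, P, a, y)

print("final latent estimate:", np.round(z_est, 3), "target:", target)
```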