Search CORE

41 research outputs found

Reinforcement Learning of Action and Query Policies with LTL Instructions under Uncertain Event Detector

Author: Hatanaka Wataru
Matsubara Takamitsu
Yamashina Ryota
Publication venue
Publication date: 06/09/2023
Field of study

Reinforcement learning (RL) with linear temporal logic (LTL) objectives can allow robots to carry out symbolic event plans in unknown environments. Most existing methods assume that the event detector can accurately map environmental states to symbolic events; however, uncertainty is inevitable for real-world event detectors. Such uncertainty in an event detector generates multiple branching possibilities on LTL instructions, confusing action decisions. Moreover, the queries to the uncertain event detector, necessary for the task's progress, may increase the uncertainty further. To cope with those issues, we propose an RL framework, Learning Action and Query over Belief LTL (LAQBL), to learn an agent that can consider the diversity of LTL instructions due to uncertain event detection while avoiding task failure due to the unnecessary event-detection query. Our framework simultaneously learns 1) an embedding of belief LTL, which is multiple branching possibilities on LTL instructions using a graph neural network, 2) an action policy, and 3) a query policy which decides whether or not to query for the event detector. Simulations in a 2D grid world and image-input robotic inspection environments show that our method successfully learns actions to follow LTL instructions even with uncertain event detectors.Comment: 8 pages, Accepted by Robotics and Automation Letters (RA-L

arXiv.org e-Print Archive

Bayesian Disturbance Injection: Robust Imitation Learning of Flexible Policies

Author: Matsubara Takamitsu
Michael Brendan
Oh Hanbit
Sasaki Hikaru
Publication venue
Publication date: 27/03/2021
Field of study

Scenarios requiring humans to choose from multiple seemingly optimal actions are commonplace, however standard imitation learning often fails to capture this behavior. Instead, an over-reliance on replicating expert actions induces inflexible and unstable policies, leading to poor generalizability in an application. To address the problem, this paper presents the first imitation learning framework that incorporates Bayesian variational inference for learning flexible non-parametric multi-action policies, while simultaneously robustifying the policies against sources of error, by introducing and optimizing disturbances to create a richer demonstration dataset. This combinatorial approach forces the policy to adapt to challenging situations, enabling stable multi-action policies to be learned efficiently. The effectiveness of our proposed method is evaluated through simulations and real-robot experiments for a table-sweep task using the UR3 6-DOF robotic arm. Results show that, through improved flexibility and robustness, the learning performance and control safety are better than comparison methods.Comment: 7 pages, Accepted by the 2021 International Conference on Robotics and Automation (ICRA 2021

arXiv.org e-Print Archive

Deep Segmented DMP Networks for Learning Discontinuous Motions

Author: Anarossi Edgar
Komeno Naoto
Matsubara Takamitsu
Tahara Hirotaka
Publication venue
Publication date: 01/09/2023
Field of study

Discontinuous motion which is a motion composed of multiple continuous motions with sudden change in direction or velocity in between, can be seen in state-aware robotic tasks. Such robotic tasks are often coordinated with sensor information such as image. In recent years, Dynamic Movement Primitives (DMP) which is a method for generating motor behaviors suitable for robotics has garnered several deep learning based improvements to allow associations between sensor information and DMP parameters. While the implementation of deep learning framework does improve upon DMP's inability to directly associate to an input, we found that it has difficulty learning DMP parameters for complex motion which requires large number of basis functions to reconstruct. In this paper we propose a novel deep learning network architecture called Deep Segmented DMP Network (DSDNet) which generates variable-length segmented motion by utilizing the combination of multiple DMP parameters predicting network architecture, double-stage decoder network, and number of segments predictor. The proposed method is evaluated on both artificial data (object cutting & pick-and-place) and real data (object cutting) where our proposed method could achieve high generalization capability, task-achievement, and data-efficiency compared to previous method on generating discontinuous long-horizon motions.Comment: 7 pages, Accepted by the 2023 International Conference on Automation Science and Engineering (CASE 2023

arXiv.org e-Print Archive

Cyclic Policy Distillation: Sample-Efficient Sim-to-Real Reinforcement Learning with Domain Randomization

Author: Kadokawa Yuki
Matsubara Takamitsu
Tsurumine Yoshihisa
Zhu Lingwei
Publication venue
Publication date: 10/04/2023
Field of study

Deep reinforcement learning with domain randomization learns a control policy in various simulations with randomized physical and sensor model parameters to become transferable to the real world in a zero-shot setting. However, a huge number of samples are often required to learn an effective policy when the range of randomized parameters is extensive due to the instability of policy updates. To alleviate this problem, we propose a sample-efficient method named cyclic policy distillation (CPD). CPD divides the range of randomized parameters into several small sub-domains and assigns a local policy to each one. Then local policies are learned while cyclically transitioning to sub-domains. CPD accelerates learning through knowledge transfer based on expected performance improvements. Finally, all of the learned local policies are distilled into a global policy for sim-to-real transfers. CPD's effectiveness and sample efficiency are demonstrated through simulations with four tasks (Pendulum from OpenAIGym and Pusher, Swimmer, and HalfCheetah from Mujoco), and a real-robot, ball-dispersal task. We published code and videos from our experiments at https://github.com/yuki-kadokawa/cyclic-policy-distillation

arXiv.org e-Print Archive

Bayesian Disturbance Injection: Robust Imitation Learning of Flexible Policies for Robot Manipulation

Author: Matsubara Takamitsu
Michael Brendan
Oh Hanbit
Sasaki Hikaru
Publication venue
Publication date: 07/11/2022
Field of study

Humans demonstrate a variety of interesting behavioral characteristics when performing tasks, such as selecting between seemingly equivalent optimal actions, performing recovery actions when deviating from the optimal trajectory, or moderating actions in response to sensed risks. However, imitation learning, which attempts to teach robots to perform these same tasks from observations of human demonstrations, often fails to capture such behavior. Specifically, commonly used learning algorithms embody inherent contradictions between the learning assumptions (e.g., single optimal action) and actual human behavior (e.g., multiple optimal actions), thereby limiting robot generalizability, applicability, and demonstration feasibility. To address this, this paper proposes designing imitation learning algorithms with a focus on utilizing human behavioral characteristics, thereby embodying principles for capturing and exploiting actual demonstrator behavioral characteristics. This paper presents the first imitation learning framework, Bayesian Disturbance Injection (BDI), that typifies human behavioral characteristics by incorporating model flexibility, robustification, and risk sensitivity. Bayesian inference is used to learn flexible non-parametric multi-action policies, while simultaneously robustifying policies by injecting risk-sensitive disturbances to induce human recovery action and ensuring demonstration feasibility. Our method is evaluated through risk-sensitive simulations and real-robot experiments (e.g., table-sweep task, shaft-reach task and shaft-insertion task) using the UR5e 6-DOF robotic arm, to demonstrate the improved characterisation of behavior. Results show significant improvement in task performance, through improved flexibility, robustness as well as demonstration feasibility.Comment: 69 pages, 9 figures, accepted by Elsevier Neural Networks - Journa

arXiv.org e-Print Archive

Physically Consistent Preferential Bayesian Optimization for Food Arrangement

Author: Kawamura Sadao
Kwon Yuhwan
Matsubara Takamitsu
Shimmura Takeshi
Tsurumine Yoshihisa
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 21/09/2022
Field of study

This paper considers the problem of estimating a preferred food arrangement for users from interactive pairwise comparisons using Computer Graphics (CG)-based dish images. As a foodservice industry requirement, we need to utilize domain rules for the geometry of the arrangement (e.g., the food layout of some Japanese dishes is reminiscent of mountains). However, those rules are qualitative and ambiguous; the estimated result might be physically inconsistent (e.g., each food physically interferes, and the arrangement becomes infeasible). To cope with this problem, we propose Physically Consistent Preferential Bayesian Optimization (PCPBO) as a method that obtains physically feasible and preferred arrangements that satisfy domain rules. PCPBO employs a bi-level optimization that combines a physical simulation-based optimization and a Preference-based Bayesian Optimization (PbBO). Our experimental results demonstrated the effectiveness of PCPBO on simulated and actual human users.Comment: 8 pages, 10 figures, accepted by IEEE Robotics and Automation Letters (RA-L) 202

arXiv.org e-Print Archive