Probabilistic movement modeling for intention inference in human-robot interaction.
Intention inference can be an essential step toward efficient human-robot interaction. For this purpose, we propose the Intention-Driven Dynamics Model (IDDM) to probabilistically model the generative process of movements that are directed by the intention. The IDDM allows the intention to be inferred from observed movements using Bayes' theorem. The IDDM simultaneously finds a latent state representation of noisy and high-dimensional observations, and models the intention-driven dynamics in the latent states. As most robotics applications are subject to real-time constraints, we develop an efficient online algorithm that allows for real-time intention inference. Two human-robot interaction scenarios, i.e., target prediction for robot table tennis and action recognition for interactive humanoid robots, are used to evaluate the performance of our inference algorithm. In both intention inference tasks, the proposed algorithm achieves substantial improvements over support vector machines and Gaussian processes.
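As a minimal sketch of the Bayes-theorem update that such intention inference builds on (the discrete candidate targets, the Gaussian observation model, and all numbers below are illustrative assumptions, not the IDDM itself):

```python
import numpy as np

# Hypothetical setup: three candidate targets (intentions) on the table,
# each inducing a Gaussian likelihood over an observed movement feature.
intentions = ["left", "center", "right"]
prior = np.array([1/3, 1/3, 1/3])           # p(g)
means = np.array([-0.5, 0.0, 0.5])          # feature mean under each intention
sigma = 0.3                                 # shared observation noise

def posterior(observation, prior):
    """One Bayes update: p(g | z) proportional to p(z | g) p(g)."""
    likelihood = np.exp(-0.5 * ((observation - means) / sigma) ** 2)
    post = likelihood * prior
    return post / post.sum()

# Recursive filtering over a stream of observations: the posterior after
# each frame becomes the prior for the next, as in online intention inference.
belief = prior
for z in [0.4, 0.55, 0.6]:
    belief = posterior(z, belief)

print(intentions[belief.argmax()])  # → right
```

The belief sharpens with each observation, which is what makes early, online inference possible before the movement is complete.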
Hierarchical policy design for sample-efficient learning of robot table tennis through self-play
Training robots with physical bodies requires developing new methods and action representations that allow the learning agents to explore the space of policies efficiently. This work studies sample-efficient learning of complex policies in the context of robot table tennis. It incorporates learning into a hierarchical control framework using a model-free strategy layer (which requires complex reasoning about opponents that is difficult to do in a model-based way), model-based prediction of external objects (which are difficult to control directly with analytic control methods, but are governed by learnable and relatively simple laws of physics), and analytic controllers for the robot itself. Human demonstrations are used to train dynamics models, which together with the analytic controller allow any physically capable robot to play table tennis without training episodes. Using only about 7,000 demonstrated trajectories, a striking policy can hit ball targets with about 20 cm error. Self-play is used to train cooperative and adversarial strategies on top of the model-based striking skills trained from human demonstrations. After only about 24,000 strikes in self-play, the agent learns to best exploit the human dynamics models for longer cooperative games. Further experiments demonstrate that more flexible variants of the policy can discover new strikes not demonstrated by humans and achieve higher performance at the expense of lower sample-efficiency. Experiments are carried out in a virtual reality environment using sensory observations that are obtainable in the real world. The high sample-efficiency demonstrated in the evaluations shows that the proposed method is suitable for learning directly on physical robots without transferring models or policies from simulation.
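The middle, model-based layer of this hierarchy can be sketched as follows; the ballistic stand-in dynamics, the hitting-plane coordinate, and all constants are assumptions standing in for the learned ball-prediction models the work actually trains:

```python
import numpy as np

# A minimal sketch of the hierarchy described above, with made-up dynamics:
# a ball-prediction model (here a simple ballistic stand-in for a learned
# model) feeds an analytic striking layer; a strategy layer would sit on
# top, choosing targets.
G = 9.81
DT = 0.01
HIT_PLANE_X = 1.0  # x-coordinate of the robot's hitting plane (assumed)

def predict_ball(pos, vel):
    """Roll the (stand-in) dynamics model forward until the ball reaches
    the hitting plane; returns the predicted interception point and time."""
    pos, vel = np.array(pos, float), np.array(vel, float)
    t = 0.0
    while pos[0] < HIT_PLANE_X:
        vel[2] -= G * DT          # gravity acts on the z-axis
        pos += vel * DT
        t += DT
    return pos, t

hit_point, time_to_hit = predict_ball(pos=[0.0, 0.2, 0.3], vel=[2.0, 0.0, 1.0])
# An analytic controller would now solve for a joint trajectory that places
# the racket at hit_point at time_to_hit.
```

Splitting the problem this way means only the strategy layer needs model-free learning, which is one source of the sample-efficiency claimed above.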
Intention Inference and Decision Making with Hierarchical Gaussian Process Dynamics Models
Anticipation is crucial for fluent human-robot interaction, which allows a robot to independently coordinate its actions with human beings in joint activities. An anticipatory robot relies on a predictive model of its human partners, and selects its own action according to the model's predictions. Intention inference and decision making are key elements towards such anticipatory robots. In this thesis, we present a machine-learning approach to intention inference and decision making, based on Hierarchical Gaussian Process Dynamics Models (H-GPDMs).
We first introduce the H-GPDM, a class of generic latent-variable dynamics models. The H-GPDM represents the generative process of complex human movements that are directed by exogenous driving factors. Incorporating the exogenous variables in the dynamics model, the H-GPDM achieves improved interpretation, analysis, and prediction of human movements. While exact inference of the exogenous variables and the latent states is intractable, we introduce an approximate method using variational Bayesian inference, and demonstrate the merits of the H-GPDM in three different applications of human movement analysis. The H-GPDM lays a foundation for the following studies on intention inference and decision making.
Intention inference is an essential step towards anticipatory robots. For this purpose, we consider a special case of the H-GPDM, the Intention-Driven Dynamics Model (IDDM), which treats the human partner's intention as the exogenous driving factor. The IDDM enables intention inference from observed movements using Bayes' theorem, where the latent state variables are marginalized out.
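A toy illustration of marginalizing out the latent state in such a Bayes update (the one-dimensional Gaussian latent dynamics and observation model are invented for illustration; the IDDM uses Gaussian-process dynamics and approximate inference rather than naive Monte Carlo):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical marginalization: the intention g shifts the latent dynamics;
# the latent state x is unobserved and is integrated out by Monte Carlo
# sampling, leaving p(obs | g) for the Bayes update over intentions.
def marginal_likelihood(obs, g, n_samples=5000):
    # p(x | g): latent state from intention-conditioned dynamics (assumed Gaussian)
    x = rng.normal(loc=g, scale=0.5, size=n_samples)
    # p(obs | x): observation model (assumed Gaussian around the latent state)
    lik = np.exp(-0.5 * ((obs - x) / 0.2) ** 2)
    return lik.mean()            # Monte Carlo estimate of the integral over x

intents = [-1.0, 0.0, 1.0]       # candidate intentions
obs = 0.9
post = np.array([marginal_likelihood(obs, g) for g in intents])
post /= post.sum()               # Bayes' theorem with a uniform prior
```

Because the latent state never has to be pinned down, the posterior over intentions remains well calibrated even when the latent trajectory is uncertain.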
As most robotics applications are subject to real-time constraints, we introduce an efficient online algorithm that allows for real-time intention inference. We show that the IDDM achieved state-of-the-art performance in intention inference using two human-robot interaction scenarios, i.e., target prediction for robot table tennis and action recognition for interactive robots.
Decision making based on a time series of predictions allows a robot to be proactive in its action selection, which involves a trade-off between the accuracy and confidence of the prediction and the time left for executing the selected action. To address the problem of action selection and of optimal timing for initiating the movement, we formulate anticipatory action selection as a Partially Observable Markov Decision Process (POMDP), where the H-GPDM is adopted to update the belief state and to estimate the transition model. We present two approaches to policy learning and decision making, and show their effectiveness in human-robot table tennis.
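The timing trade-off can be sketched as a simple belief-threshold rule (the confidence threshold, deadline, and beliefs below are illustrative; the thesis learns such policies within the POMDP rather than hand-coding them):

```python
import numpy as np

# A minimal sketch of anticipatory action selection: wait for more
# observations (the belief sharpens) versus initiate early (time is needed
# to execute the movement). All numbers are illustrative assumptions.
def select_action(belief_trajectory, confidence=0.8, deadline=10):
    """Return (time_step, action): initiate the action for the most likely
    intention once the belief is confident enough, or at the deadline."""
    for t, belief in enumerate(belief_trajectory):
        if belief.max() >= confidence or t == deadline:
            return t, int(belief.argmax())
    return len(belief_trajectory) - 1, int(belief_trajectory[-1].argmax())

beliefs = [np.array([0.50, 0.50]),
           np.array([0.65, 0.35]),
           np.array([0.85, 0.15])]   # belief sharpening over time
t, action = select_action(beliefs)
# → initiates at t=2 with action 0 (e.g. a forehand movement)
```

A learned policy replaces the fixed threshold with one that weighs the expected gain in confidence against the shrinking execution window.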
In addition, we consider decision making solely based on the preference of the human partners, where observations are not sufficient for reliable intention inference. We formulate it as a repeated game and present a learning approach to safe strategies that exploit the humans' preferences. The learned strategy enables action selection when reliable intention inference is not available due to insufficient observation, e.g., for a robot to return served balls from a human table tennis player.
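One simple way to picture a safe strategy that still exploits an estimated preference is to mix a best response with a maximin strategy (the payoff matrix, the uniform maximin, and the mixing weight are all invented for illustration; they are not the learning approach used in the thesis):

```python
import numpy as np

# Illustrative repeated-game sketch: the robot tracks the empirical
# frequency of the human's choices (their "preference") and best-responds,
# but mixes in the maximin (safe) strategy so its expected payoff never
# drops far below the security level.
payoff = np.array([[ 1.0, -1.0],    # robot action 0 vs human actions 0/1
                   [-1.0,  1.0]])   # robot action 1 vs human actions 0/1

def strategy(human_counts, safety=0.3):
    freq = human_counts / human_counts.sum()      # estimated preference
    exploit = np.zeros(2)
    exploit[np.argmax(payoff @ freq)] = 1.0       # best response to estimate
    maximin = np.array([0.5, 0.5])                # safe strategy for this game
    return (1 - safety) * exploit + safety * maximin

mix = strategy(np.array([8.0, 2.0]))  # human mostly plays action 0
# → the robot leans toward action 0 but keeps hedging
```

The safety weight bounds the worst-case loss if the preference estimate is wrong, which is the sense in which the learned strategy is "safe".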
In this thesis, we use human-robot table tennis as a running example, where a key bottleneck is the limited amount of time for executing a hitting movement. Movement initiation usually requires an early decision on the type of action, such as a forehand or backhand hitting movement, at least 80 ms before the opponent has hit the ball. The robot therefore needs to anticipate the opponent's intended target and act proactively. Using the proposed methods, the robot can predict the intended target of the opponent and initiate an appropriate hitting movement according to the prediction. Experimental results show that the proposed intention inference and decision making methods can substantially enhance the capability of the robot table tennis player, using both a physically realistic simulation and a real Barrett WAM robot arm with seven degrees of freedom.
Development of a Character Controller Driven by Human Motion Using Deep Reinforcement Learning
Thesis (Master's) -- Seoul National University Graduate School: Department of Computer Science and Engineering, College of Engineering, August 2022.
A human motion-based interface fuses the operator's intuition with the motor capabilities of robots, enabling adaptable robot operation in hazardous environments. However, designing a motion interface for non-humanoid robots, such as quadrupeds or hexapods, is challenging: the morphological and dynamical differences between the human controller and the robot make the control strategy ambiguous. We propose a novel control framework that allows human operators to execute various motor skills on a quadrupedal robot through their own motion. Our system first retargets the captured human motion into a corresponding robot motion that preserves the operator's intended semantics; supervised learning and post-processing techniques make this retargeting unambiguous and suitable for training a control policy. To enable the robot to track a given retargeted motion, we then obtain a control policy through reinforcement learning that imitates the reference motion with designed curricula. We additionally enhance the system's performance by introducing a set of experts. Finally, we randomize the domain parameters to transfer the physically simulated motor skills to real-world tasks. We demonstrate that, using our system, a human operator can perform various motor tasks, including standing, tilting, manipulating, sitting, walking, and steering, on both physically simulated and real quadruped robots. We also analyze the contribution of each system component through an ablation study.
1 Introduction
2 Related Work
2.1 Legged Robot Control
2.2 Motion Imitation
2.3 Motion-based Control
3 Overview
4 Motion Retargeting Module
4.1 Motion Retargeting Network
4.2 Post-processing for Consistency
4.3 A Set of Experts for Multi-task Support
5 Motion Imitation Module
5.1 Background: Reinforcement Learning
5.2 Formulation of Motion Imitation
5.3 Curriculum Learning over Tasks and Difficulties
5.4 Hierarchical Control with States
5.5 Domain Randomization
6 Results and Analysis
6.1 Experimental Setup
6.2 Motion Performance
6.3 Analysis
6.4 Comparison to Other Methods
7 Conclusion And Future Work
Bibliography
Abstract (In Korean)
Acknowledgements (In Korean)
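The motion-imitation module trains a tracking policy with reinforcement learning; a common shape for such a tracking reward is a weighted sum of exponentiated tracking errors, as in DeepMimic-style methods. The weights, scales, and choice of error terms below are assumptions, not the thesis's exact formulation:

```python
import numpy as np

# Sketch of a motion-tracking reward: the policy is rewarded for matching
# the retargeted reference motion. Weights and error scales are invented.
def imitation_reward(robot_q, ref_q, robot_base, ref_base,
                     w_pose=0.7, w_base=0.3):
    pose_err = np.sum((robot_q - ref_q) ** 2)          # joint tracking error
    base_err = np.sum((robot_base - ref_base) ** 2)    # body position error
    return w_pose * np.exp(-2.0 * pose_err) + w_base * np.exp(-10.0 * base_err)

r = imitation_reward(robot_q=np.zeros(12), ref_q=np.zeros(12),
                     robot_base=np.zeros(3), ref_base=np.zeros(3))
# perfect tracking → r close to 1.0; the reward decays smoothly with error
```

The exponentiated form keeps the reward dense and bounded, which helps the curricula described in Chapter 5 make progress from rough tracking to precise imitation.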
Imitation learning based on entropy-regularized forward and inverse reinforcement learning
This paper proposes Entropy-Regularized Imitation Learning (ERIL), a combination of forward and inverse reinforcement learning under the framework of the entropy-regularized Markov decision process. ERIL minimizes the reverse Kullback-Leibler (KL) divergence between the two probability distributions induced by a learner and an expert. Inverse reinforcement learning (RL) in ERIL evaluates the log-ratio between the two distributions using the density-ratio trick, which is widely used in generative adversarial networks. More specifically, the log-ratio is estimated by building two binary discriminators. The first discriminator is a state-only function that tries to distinguish states generated by the forward RL step from the expert's states. The second discriminator is a function of the current state, the action, and the next state, and it distinguishes the generated experiences from those provided by the expert. Since the second discriminator shares the hyperparameters of the forward RL step, those hyperparameters can be used to control the discriminator's capacity. The forward RL step then minimizes the reverse KL divergence estimated by the inverse RL step. We show that minimizing the reverse KL divergence is equivalent to finding an optimal policy under entropy regularization. Consequently, a new policy is derived from an algorithm that resembles Dynamic Policy Programming and Soft Actor-Critic. Our experimental results on MuJoCo-simulated environments show that ERIL is more sample-efficient than previous methods. We further apply the method to human behaviors in a pole-balancing task and show that the estimated reward functions reveal how each subject achieves the goal.
Comment: 33 pages, 10 figures
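The density-ratio trick described above can be illustrated in miniature: a discriminator trained to classify expert samples (label 1) against learner samples (label 0) has a logit that estimates the log density ratio, and the reverse KL is the expected negative logit under the learner's samples. The closed-form "optimal" discriminator between two known Gaussians below is a stand-in for ERIL's learned discriminators:

```python
import numpy as np

# Reverse KL via the density-ratio trick: for a discriminator D(x) that
# separates expert (1) from learner (0), logit(D) = log p_E(x)/p_L(x),
# so KL(learner || expert) = E_learner[-logit(D)].
def reverse_kl_estimate(discriminator, learner_samples):
    d = discriminator(learner_samples)
    log_ratio = np.log(d) - np.log(1.0 - d)     # log p_E / p_L per sample
    return -log_ratio.mean()                    # E_L[log p_L / p_E]

# Stand-in for an optimal discriminator between two known Gaussians,
# p_E = N(0, 1) and p_L = N(1, 1): D = p_E / (p_E + p_L).
def optimal_d(x):
    pe = np.exp(-0.5 * x ** 2)
    pl = np.exp(-0.5 * (x - 1.0) ** 2)
    return pe / (pe + pl)

rng = np.random.default_rng(0)
x = rng.normal(1.0, 1.0, size=100_000)          # samples from the learner
kl = reverse_kl_estimate(optimal_d, x)
# analytic KL(N(1,1) || N(0,1)) = 0.5, which the estimate approaches
```

In ERIL the forward RL step then lowers exactly this estimate, driving the learner's distribution toward the expert's.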
- โฆ