1,237 research outputs found

    Probabilistic movement modeling for intention inference in human-robot interaction.

    Intention inference can be an essential step toward efficient human-robot interaction. For this purpose, we propose the Intention-Driven Dynamics Model (IDDM) to probabilistically model the generative process of movements that are directed by the intention. The IDDM allows the intention to be inferred from observed movements using Bayes' theorem. The IDDM simultaneously finds a latent state representation of noisy and high-dimensional observations and models the intention-driven dynamics in the latent states. As most robotics applications are subject to real-time constraints, we develop an efficient online algorithm that allows for real-time intention inference. Two human-robot interaction scenarios, i.e., target prediction for robot table tennis and action recognition for interactive humanoid robots, are used to evaluate the performance of our inference algorithm. In both intention inference tasks, the proposed algorithm achieves substantial improvements over support vector machines and Gaussian processes.
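    To make the inference step concrete, here is a minimal sketch of online intention inference via Bayes' theorem. Gaussian per-intention likelihoods stand in for the IDDM's marginalized latent-state dynamics model; the candidate intentions, distributions, and observations below are illustrative assumptions, not the paper's model.

```python
import numpy as np

def update_posterior(log_belief, observation, means, covs):
    """One Bayesian update: p(g | y_1:t) is proportional to p(y_t | g) * p(g | y_1:t-1)."""
    log_post = log_belief.copy()
    for g, (mu, cov) in enumerate(zip(means, covs)):
        diff = observation - mu
        # Gaussian log-likelihood of the observation under intention g
        # (a stand-in for the IDDM's marginal likelihood).
        _, logdet = np.linalg.slogdet(2 * np.pi * cov)
        log_post[g] += -0.5 * (diff @ np.linalg.solve(cov, diff) + logdet)
    # Normalize in log space for numerical stability.
    return log_post - np.logaddexp.reduce(log_post)

# Two hypothetical intentions, e.g., two target regions on a table.
means = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
covs = [0.1 * np.eye(2), 0.1 * np.eye(2)]
log_belief = np.log(np.array([0.5, 0.5]))  # uniform prior over intentions

# Noisy observations arriving online; the posterior sharpens with each one.
for y in np.array([[0.9, 1.1], [1.0, 0.95], [1.1, 1.05]]):
    log_belief = update_posterior(log_belief, y, means, covs)
print(np.exp(log_belief))  # posterior mass concentrates on intention 1
```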

    Intention Inference and Decision Making with Hierarchical Gaussian Process Dynamics Models

    Anticipation is crucial for fluent human-robot interaction: it allows a robot to independently coordinate its actions with human beings in joint activities. An anticipatory robot relies on a predictive model of its human partners and selects its own actions according to the model's predictions. Intention inference and decision making are key elements of such anticipatory robots. In this thesis, we present a machine-learning approach to intention inference and decision making based on Hierarchical Gaussian Process Dynamics Models (H-GPDMs).

    We first introduce the H-GPDM, a class of generic latent-variable dynamics models. The H-GPDM represents the generative process of complex human movements that are directed by exogenous driving factors. By incorporating the exogenous variables in the dynamics model, the H-GPDM achieves improved interpretation, analysis, and prediction of human movements. While exact inference of the exogenous variables and the latent states is intractable, we introduce an approximate method using variational Bayesian inference and demonstrate the merits of the H-GPDM in three different applications of human movement analysis. The H-GPDM lays the foundation for the subsequent studies on intention inference and decision making.

    Intention inference is an essential step towards anticipatory robots. For this purpose, we consider a special case of the H-GPDM, the Intention-Driven Dynamics Model (IDDM), which treats the human partner's intention as the exogenous driving factor. The IDDM supports intention inference from observed movements using Bayes' theorem, where the latent state variables are marginalized out. As most robotics applications are subject to real-time constraints, we introduce an efficient online algorithm that allows for real-time intention inference. We show that the IDDM achieves state-of-the-art performance in two human-robot interaction scenarios, i.e., target prediction for robot table tennis and action recognition for interactive robots.

    Decision making based on a time series of predictions allows a robot to be proactive in its action selection, which involves a trade-off between the accuracy and confidence of the prediction and the time available for executing a selected action. To address the problem of action selection and of the optimal timing for initiating the movement, we formulate anticipatory action selection as a Partially Observable Markov Decision Process (POMDP), where the H-GPDM is adopted to update the belief state and to estimate the transition model. We present two approaches to policy learning and decision making, and show their effectiveness using human-robot table tennis. In addition, we consider decision making based solely on the preferences of the human partners, for cases in which observations are not sufficient for reliable intention inference. We formulate this as a repeated game and present a learning approach to safe strategies that exploit the humans' preferences. The learned strategy enables action selection when reliable intention inference is not available due to insufficient observations, e.g., for a robot returning served balls from a human table tennis player.

    Throughout this thesis, we use human-robot table tennis as a running example, where a key bottleneck is the limited amount of time for executing a hitting movement. Movement initiation usually requires an early decision on the type of action, such as a forehand or backhand hitting movement, at least 80 ms before the opponent has hit the ball. The robot therefore needs to anticipate the opponent's intended target and act proactively. Using the proposed methods, the robot can predict the intended target of the opponent and initiate an appropriate hitting movement according to the prediction. Experimental results show that the proposed intention inference and decision making methods can substantially enhance the capability of the robot table tennis player, both in a physically realistic simulation and on a real Barrett WAM robot arm with seven degrees of freedom.
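    The timing trade-off described above can be illustrated with a toy decision rule: commit to an action once the belief over intentions is confident enough, or once waiting any longer would leave too little time to execute the movement. This is a heuristic sketch under assumed numbers, not the thesis's learned POMDP policy.

```python
import numpy as np

def act_or_wait(belief, t_remaining, t_execute, threshold=0.85):
    """Commit to the most likely intention once the belief is confident
    enough, or at the last step that still leaves time to execute."""
    if belief.max() >= threshold or t_remaining <= t_execute:
        return int(np.argmax(belief))  # index of the selected action
    return None                        # keep observing

# A belief over two targets that sharpens over time (hypothetical values).
beliefs = [np.array([0.55, 0.45]), np.array([0.70, 0.30]), np.array([0.90, 0.10])]
t_execute = 2  # steps needed to complete the hitting movement

for t, belief in enumerate(beliefs):
    choice = act_or_wait(belief, t_remaining=len(beliefs) - t, t_execute=t_execute)
    if choice is not None:
        print(f"initiate action {choice} at step {t}")  # forced at the deadline
        break
```

    Here the rule commits at step 1, before the belief reaches the confidence threshold, because waiting longer would leave insufficient time to execute the movement; this is exactly the accuracy-versus-time trade-off the thesis formalizes as a POMDP.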

    Development of a Controller for Morphologically Different Characters from Human Motion Using Deep Reinforcement Learning

    Master's thesis -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, August 2022. Advisor: 서진욱. A human motion-based interface fuses operator intuition with the motor capabilities of robots, enabling adaptable robot operation in dangerous environments. However, designing a motion interface for non-humanoid robots, such as quadrupeds or hexapods, is challenging because the morphology and dynamics of the human operator and the robot differ, making the control strategy ambiguous. We propose a novel control framework that allows human operators to execute various motor skills on a quadrupedal robot through their own motion. Our system first retargets the captured human motion into the corresponding robot motion carrying the operator's intended semantics; supervised learning and post-processing techniques make this retargeting ambiguity-free and suitable for control-policy training. To enable the robot to track a given retargeted motion, we then obtain a control policy via reinforcement learning that imitates the given reference motion with designed curricula. We additionally enhance the system's performance by introducing a set of experts. Finally, we randomize the domain parameters to transfer the physically simulated motor skills to real-world tasks. We demonstrate that a human operator can perform various motor tasks using our system, including standing, tilting, manipulating, sitting, walking, and steering, on both physically simulated and real quadruped robots.
We also analyze the performance of each system component through an ablation study.
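    The imitation stage that tracks the retargeted reference motion can be sketched with a DeepMimic-style tracking reward: exponentiated errors between the robot's state and the reference. The specific error terms, weights, and scales below are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def imitation_reward(q, q_ref, v, v_ref, ee, ee_ref,
                     w=(0.6, 0.2, 0.2), scales=(2.0, 0.1, 10.0)):
    """Weighted sum of exponentiated tracking errors for joint positions,
    joint velocities, and end-effector (foot) positions."""
    r_pose = np.exp(-scales[0] * np.sum((q - q_ref) ** 2))
    r_vel = np.exp(-scales[1] * np.sum((v - v_ref) ** 2))
    r_ee = np.exp(-scales[2] * np.sum((ee - ee_ref) ** 2))
    return w[0] * r_pose + w[1] * r_vel + w[2] * r_ee

# Example: 12 actuated joints of a quadruped and 4 foot positions in 3-D.
rng = np.random.default_rng(0)
q_ref, v_ref = rng.normal(size=12), rng.normal(size=12)
ee_ref = rng.normal(size=(4, 3))
q, v, ee = q_ref + 0.05, v_ref + 0.1, ee_ref + 0.01  # near-perfect tracking
print(imitation_reward(q, q_ref, v, v_ref, ee, ee_ref))  # close to 1.0
```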

    Annotated Bibliography: Anticipation


    Imitation learning based on entropy-regularized forward and inverse reinforcement learning

    This paper proposes Entropy-Regularized Imitation Learning (ERIL), which combines forward and inverse reinforcement learning under the framework of the entropy-regularized Markov decision process. ERIL minimizes the reverse Kullback-Leibler (KL) divergence between two probability distributions induced by a learner and an expert. Inverse reinforcement learning (RL) in ERIL evaluates the log-ratio between the two distributions using the density-ratio trick, which is widely used in generative adversarial networks. More specifically, the log-ratio is estimated by building two binary discriminators. The first discriminator is a state-only function that tries to distinguish states generated by the forward RL step from the expert's states. The second discriminator is a function of the current state, the action, and the transitioned state, and it distinguishes the generated experiences from those provided by the expert. Since the second discriminator shares the hyperparameters of the forward RL step, it can be used to control the discriminator's ability. The forward RL step minimizes the reverse KL divergence estimated by the inverse RL step. We show that minimizing the reverse KL divergence is equivalent to finding an optimal policy under entropy regularization. Consequently, the new policy is derived from an algorithm that resembles Dynamic Policy Programming and Soft Actor-Critic. Our experimental results on MuJoCo-simulated environments show that ERIL is more sample-efficient than previous methods. We further apply the method to human behaviors in a pole-balancing task and show that the estimated reward functions reveal how each subject achieves the goal.
    Comment: 33 pages, 10 figures
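    The density-ratio trick that ERIL's inverse RL step relies on can be demonstrated in isolation: a binary discriminator trained to separate expert samples (label 1) from learner samples (label 0) recovers the log-ratio log p_expert(x) - log p_learner(x) through its logit. The 1-D Gaussian data and the tiny logistic-regression discriminator below are illustrative assumptions, not ERIL's actual networks.

```python
import numpy as np

rng = np.random.default_rng(0)
expert = rng.normal(loc=1.0, scale=1.0, size=2000)   # samples from p_expert
learner = rng.normal(loc=0.0, scale=1.0, size=2000)  # samples from p_learner

def feats(x):
    # Features [x, x^2, 1] make the logistic model exact for two Gaussians.
    return np.stack([x, x * x, np.ones_like(x)], axis=1)

X = feats(np.concatenate([expert, learner]))
y = np.concatenate([np.ones(2000), np.zeros(2000)])

w = np.zeros(3)
for _ in range(5000):  # plain full-batch gradient ascent on the log-likelihood
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w += 0.05 * X.T @ (y - p) / len(y)

# The discriminator's logit estimates log p_expert(x) - log p_learner(x);
# for these two Gaussians the true log-ratio is x - 0.5.
for x in (0.0, 1.0, 2.0):
    print(x, float(feats(np.array([x]))[0] @ w), x - 0.5)
```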

    Use of Robotic Arm as a Tool in Manipulation Under Uncertainty for Dual Arm Systems
