Probabilistic movement modeling for intention inference in human-robot interaction.
Intention inference can be an essential step toward efficient human-robot interaction. For this purpose, we propose the Intention-Driven Dynamics Model (IDDM) to probabilistically model the generative process of movements that are directed by the intention. The IDDM allows the intention to be inferred from observed movements using Bayes' theorem. The IDDM simultaneously finds a latent state representation of noisy and high-dimensional observations, and models the intention-driven dynamics in the latent states. As most robotics applications are subject to real-time constraints, we develop an efficient online algorithm that allows for real-time intention inference. Two human-robot interaction scenarios, i.e., target prediction for robot table tennis and action recognition for interactive humanoid robots, are used to evaluate the performance of our inference algorithm. In both intention inference tasks, the proposed algorithm achieves substantial improvements over support vector machines and Gaussian processes.
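As a minimal sketch of the Bayes-theorem update that such intention inference builds on (the discrete candidate targets, the Gaussian observation model, and all numbers below are illustrative assumptions, not the IDDM itself):

```python
import numpy as np

# Hypothetical setup: three candidate targets (intentions) on the table,
# each inducing a Gaussian likelihood over an observed movement feature.
intentions = ["left", "center", "right"]
prior = np.array([1/3, 1/3, 1/3])           # p(g)
means = np.array([-0.5, 0.0, 0.5])          # feature mean under each intention
sigma = 0.3                                 # shared observation noise

def posterior(observation, prior):
    """One Bayes update: p(g | z) proportional to p(z | g) p(g)."""
    likelihood = np.exp(-0.5 * ((observation - means) / sigma) ** 2)
    post = likelihood * prior
    return post / post.sum()

# Recursive filtering over a stream of observations: the posterior after
# each frame becomes the prior for the next, as in online intention inference.
belief = prior
for z in [0.4, 0.55, 0.6]:
    belief = posterior(z, belief)

print(intentions[belief.argmax()])  # → right
```

The belief sharpens with each observation, which is what makes early, online inference possible before the movement is complete.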
Hierarchical policy design for sample-efficient learning of robot table tennis through self-play
Training robots with physical bodies requires developing new methods and action representations that allow the learning agents to explore the space of policies efficiently. This work studies sample-efficient learning of complex policies in the context of robot table tennis. It incorporates learning into a hierarchical control framework using a model-free strategy layer (which requires complex reasoning about opponents that is difficult to do in a model-based way), model-based prediction of external objects (which are difficult to control directly with analytic control methods, but are governed by learnable and relatively simple laws of physics), and analytic controllers for the robot itself. Human demonstrations are used to train dynamics models, which together with the analytic controller allow any physically capable robot to play table tennis without training episodes. Using only about 7,000 demonstrated trajectories, a striking policy can hit ball targets with about 20 cm error. Self-play is used to train cooperative and adversarial strategies on top of the model-based striking skills trained from human demonstrations. After only about 24,000 strikes in self-play, the agent learns to best exploit the human dynamics models for longer cooperative games. Further experiments demonstrate that more flexible variants of the policy can discover new strikes not demonstrated by humans and achieve higher performance at the expense of lower sample-efficiency. Experiments are carried out in a virtual reality environment using sensory observations that are obtainable in the real world. The high sample-efficiency demonstrated in the evaluations shows that the proposed method is suitable for learning directly on physical robots without transferring models or policies from simulation.
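The middle, model-based layer of this hierarchy can be sketched as follows; the ballistic stand-in dynamics, the hitting-plane coordinate, and all constants are assumptions standing in for the learned ball-prediction models the work actually trains:

```python
import numpy as np

# A minimal sketch of the hierarchy described above, with made-up dynamics:
# a ball-prediction model (here a simple ballistic stand-in for a learned
# model) feeds an analytic striking layer; a strategy layer would sit on
# top, choosing targets.
G = 9.81
DT = 0.01
HIT_PLANE_X = 1.0  # x-coordinate of the robot's hitting plane (assumed)

def predict_ball(pos, vel):
    """Roll the (stand-in) dynamics model forward until the ball reaches
    the hitting plane; returns the predicted interception point and time."""
    pos, vel = np.array(pos, float), np.array(vel, float)
    t = 0.0
    while pos[0] < HIT_PLANE_X:
        vel[2] -= G * DT          # gravity acts on the z-axis
        pos += vel * DT
        t += DT
    return pos, t

hit_point, time_to_hit = predict_ball(pos=[0.0, 0.2, 0.3], vel=[2.0, 0.0, 1.0])
# An analytic controller would now solve for a joint trajectory that places
# the racket at hit_point at time_to_hit.
```

Splitting the problem this way means only the strategy layer needs model-free learning, which is one source of the sample-efficiency claimed above.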
Intention Inference and Decision Making with Hierarchical Gaussian Process Dynamics Models
Anticipation is crucial for fluent human-robot interaction, which allows a robot to independently coordinate its actions with human beings in joint activities. An anticipatory robot relies on a predictive model of its human partners, and selects its own action according to the model's predictions. Intention inference and decision making are key elements towards such anticipatory robots. In this thesis, we present a machine-learning approach to intention inference and decision making, based on Hierarchical Gaussian Process Dynamics Models (H-GPDMs).
We first introduce the H-GPDM, a class of generic latent-variable dynamics models. The H-GPDM represents the generative process of complex human movements that are directed by exogenous driving factors. Incorporating the exogenous variables in the dynamics model, the H-GPDM achieves improved interpretation, analysis, and prediction of human movements. While exact inference of the exogenous variables and the latent states is intractable, we introduce an approximate method using variational Bayesian inference, and demonstrate the merits of the H-GPDM in three different applications of human movement analysis. The H-GPDM lays a foundation for the following studies on intention inference and decision making.
Intention inference is an essential step towards anticipatory robots. For this purpose, we consider a special case of the H-GPDM, the Intention-Driven Dynamics Model (IDDM), which treats the human partner's intention as the exogenous driving factor. The IDDM enables intention inference from observed movements using Bayes' theorem, where the latent state variables are marginalized out.
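A toy illustration of marginalizing out the latent state in such a Bayes update (the one-dimensional Gaussian latent dynamics and observation model are invented for illustration; the IDDM uses Gaussian-process dynamics and approximate inference rather than naive Monte Carlo):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical marginalization: the intention g shifts the latent dynamics;
# the latent state x is unobserved and is integrated out by Monte Carlo
# sampling, leaving p(obs | g) for the Bayes update over intentions.
def marginal_likelihood(obs, g, n_samples=5000):
    # p(x | g): latent state from intention-conditioned dynamics (assumed Gaussian)
    x = rng.normal(loc=g, scale=0.5, size=n_samples)
    # p(obs | x): observation model (assumed Gaussian around the latent state)
    lik = np.exp(-0.5 * ((obs - x) / 0.2) ** 2)
    return lik.mean()            # Monte Carlo estimate of the integral over x

intents = [-1.0, 0.0, 1.0]       # candidate intentions
obs = 0.9
post = np.array([marginal_likelihood(obs, g) for g in intents])
post /= post.sum()               # Bayes' theorem with a uniform prior
```

Because the latent state never has to be pinned down, the posterior over intentions remains well calibrated even when the latent trajectory is uncertain.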
As most robotics applications are subject to real-time constraints, we introduce an efficient online algorithm that allows for real-time intention inference. We show that the IDDM achieved state-of-the-art performance in intention inference using two human-robot interaction scenarios, i.e., target prediction for robot table tennis and action recognition for interactive robots.
Decision making based on a time series of predictions allows a robot to be proactive in its action selection, which involves a trade-off between the accuracy and confidence of the prediction and the time left for executing the selected action. To address the problem of action selection and of optimal timing for initiating the movement, we formulate anticipatory action selection as a Partially Observable Markov Decision Process (POMDP), where the H-GPDM is adopted to update the belief state and to estimate the transition model. We present two approaches to policy learning and decision making, and show their effectiveness in human-robot table tennis.
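The timing trade-off can be sketched as a simple belief-threshold rule (the confidence threshold, deadline, and beliefs below are illustrative; the thesis learns such policies within the POMDP rather than hand-coding them):

```python
import numpy as np

# A minimal sketch of anticipatory action selection: wait for more
# observations (the belief sharpens) versus initiate early (time is needed
# to execute the movement). All numbers are illustrative assumptions.
def select_action(belief_trajectory, confidence=0.8, deadline=10):
    """Return (time_step, action): initiate the action for the most likely
    intention once the belief is confident enough, or at the deadline."""
    for t, belief in enumerate(belief_trajectory):
        if belief.max() >= confidence or t == deadline:
            return t, int(belief.argmax())
    return len(belief_trajectory) - 1, int(belief_trajectory[-1].argmax())

beliefs = [np.array([0.50, 0.50]),
           np.array([0.65, 0.35]),
           np.array([0.85, 0.15])]   # belief sharpening over time
t, action = select_action(beliefs)
# → initiates at t=2 with action 0 (e.g. a forehand movement)
```

A learned policy replaces the fixed threshold with one that weighs the expected gain in confidence against the shrinking execution window.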
In addition, we consider decision making solely based on the preference of the human partners, where observations are not sufficient for reliable intention inference. We formulate it as a repeated game and present a learning approach to safe strategies that exploit the humans' preferences. The learned strategy enables action selection when reliable intention inference is not available due to insufficient observation, e.g., for a robot to return served balls from a human table tennis player.
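One simple way to picture a safe strategy that still exploits an estimated preference is to mix a best response with a maximin strategy (the payoff matrix, the uniform maximin, and the mixing weight are all invented for illustration; they are not the learning approach used in the thesis):

```python
import numpy as np

# Illustrative repeated-game sketch: the robot tracks the empirical
# frequency of the human's choices (their "preference") and best-responds,
# but mixes in the maximin (safe) strategy so its expected payoff never
# drops far below the security level.
payoff = np.array([[ 1.0, -1.0],    # robot action 0 vs human actions 0/1
                   [-1.0,  1.0]])   # robot action 1 vs human actions 0/1

def strategy(human_counts, safety=0.3):
    freq = human_counts / human_counts.sum()      # estimated preference
    exploit = np.zeros(2)
    exploit[np.argmax(payoff @ freq)] = 1.0       # best response to estimate
    maximin = np.array([0.5, 0.5])                # safe strategy for this game
    return (1 - safety) * exploit + safety * maximin

mix = strategy(np.array([8.0, 2.0]))  # human mostly plays action 0
# → the robot leans toward action 0 but keeps hedging
```

The safety weight bounds the worst-case loss if the preference estimate is wrong, which is the sense in which the learned strategy is "safe".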
In this thesis, we use human-robot table tennis as a running example, where a key bottleneck is the limited amount of time for executing a hitting movement. Movement initiation usually requires an early decision on the type of action, such as a forehand or backhand hitting movement, at least 80 ms before the opponent has hit the ball. The robot therefore needs to anticipate the opponent's intended target and act proactively. Using the proposed methods, the robot can predict the intended target of the opponent and initiate an appropriate hitting movement according to the prediction. Experimental results show that the proposed intention inference and decision making methods can substantially enhance the capability of the robot table tennis player, using both a physically realistic simulation and a real Barrett WAM robot arm with seven degrees of freedom.
Development of a Character Controller Driven by Human Motion Using Deep Reinforcement Learning
Thesis (Master's) -- Seoul National University Graduate School: Department of Computer Science and Engineering, College of Engineering, August 2022.
A human motion-based interface fuses the operator's intuition with the motor capabilities of robots, enabling adaptable robot operation in hazardous environments. However, designing a motion interface for non-humanoid robots, such as quadrupeds or hexapods, is challenging: the morphological and dynamical differences between the human controller and the robot make the control strategy ambiguous. We propose a novel control framework that allows human operators to execute various motor skills on a quadrupedal robot through their own motion. Our system first retargets the captured human motion into a corresponding robot motion that preserves the operator's intended semantics; supervised learning and post-processing techniques make this retargeting unambiguous and suitable for training a control policy. To enable the robot to track a given retargeted motion, we then obtain a control policy through reinforcement learning that imitates the reference motion with designed curricula. We additionally enhance the system's performance by introducing a set of experts. Finally, we randomize the domain parameters to transfer the physically simulated motor skills to real-world tasks. We demonstrate that, using our system, a human operator can perform various motor tasks, including standing, tilting, manipulating, sitting, walking, and steering, on both physically simulated and real quadruped robots. We also analyze the contribution of each system component through an ablation study.
1 Introduction
2 Related Work
2.1 Legged Robot Control
2.2 Motion Imitation
2.3 Motion-based Control
3 Overview
4 Motion Retargeting Module
4.1 Motion Retargeting Network
4.2 Post-processing for Consistency
4.3 A Set of Experts for Multi-task Support
5 Motion Imitation Module
5.1 Background: Reinforcement Learning
5.2 Formulation of Motion Imitation
5.3 Curriculum Learning over Tasks and Difficulties
5.4 Hierarchical Control with States
5.5 Domain Randomization
6 Results and Analysis
6.1 Experimental Setup
6.2 Motion Performance
6.3 Analysis
6.4 Comparison to Other Methods
7 Conclusion And Future Work
Bibliography
Abstract (In Korean)
Acknowledgements (In Korean)
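The motion-imitation module trains a tracking policy with reinforcement learning; a common shape for such a tracking reward is a weighted sum of exponentiated tracking errors, as in DeepMimic-style methods. The weights, scales, and choice of error terms below are assumptions, not the thesis's exact formulation:

```python
import numpy as np

# Sketch of a motion-tracking reward: the policy is rewarded for matching
# the retargeted reference motion. Weights and error scales are invented.
def imitation_reward(robot_q, ref_q, robot_base, ref_base,
                     w_pose=0.7, w_base=0.3):
    pose_err = np.sum((robot_q - ref_q) ** 2)          # joint tracking error
    base_err = np.sum((robot_base - ref_base) ** 2)    # body position error
    return w_pose * np.exp(-2.0 * pose_err) + w_base * np.exp(-10.0 * base_err)

r = imitation_reward(robot_q=np.zeros(12), ref_q=np.zeros(12),
                     robot_base=np.zeros(3), ref_base=np.zeros(3))
# perfect tracking → r close to 1.0; the reward decays smoothly with error
```

The exponentiated form keeps the reward dense and bounded, which helps the curricula described in Chapter 5 make progress from rough tracking to precise imitation.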
Imitation learning based on entropy-regularized forward and inverse reinforcement learning
This paper proposes Entropy-Regularized Imitation Learning (ERIL), a combination of forward and inverse reinforcement learning under the framework of the entropy-regularized Markov decision process. ERIL minimizes the reverse Kullback-Leibler (KL) divergence between the two probability distributions induced by a learner and an expert. Inverse reinforcement learning (RL) in ERIL evaluates the log-ratio between the two distributions using the density-ratio trick, which is widely used in generative adversarial networks. More specifically, the log-ratio is estimated by building two binary discriminators. The first discriminator is a state-only function that tries to distinguish states generated by the forward RL step from the expert's states. The second discriminator is a function of the current state, the action, and the next state, and it distinguishes the generated experiences from those provided by the expert. Since the second discriminator shares the hyperparameters of the forward RL step, those hyperparameters can be used to control the discriminator's capacity. The forward RL step then minimizes the reverse KL divergence estimated by the inverse RL step. We show that minimizing the reverse KL divergence is equivalent to finding an optimal policy under entropy regularization. Consequently, a new policy is derived from an algorithm that resembles Dynamic Policy Programming and Soft Actor-Critic. Our experimental results on MuJoCo-simulated environments show that ERIL is more sample-efficient than previous methods. We further apply the method to human behaviors in a pole-balancing task and show that the estimated reward functions reveal how each subject achieves the goal.
Comment: 33 pages, 10 figures
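The density-ratio trick described above can be illustrated in miniature: a discriminator trained to classify expert samples (label 1) against learner samples (label 0) has a logit that estimates the log density ratio, and the reverse KL is the expected negative logit under the learner's samples. The closed-form "optimal" discriminator between two known Gaussians below is a stand-in for ERIL's learned discriminators:

```python
import numpy as np

# Reverse KL via the density-ratio trick: for a discriminator D(x) that
# separates expert (1) from learner (0), logit(D) = log p_E(x)/p_L(x),
# so KL(learner || expert) = E_learner[-logit(D)].
def reverse_kl_estimate(discriminator, learner_samples):
    d = discriminator(learner_samples)
    log_ratio = np.log(d) - np.log(1.0 - d)     # log p_E / p_L per sample
    return -log_ratio.mean()                    # E_L[log p_L / p_E]

# Stand-in for an optimal discriminator between two known Gaussians,
# p_E = N(0, 1) and p_L = N(1, 1): D = p_E / (p_E + p_L).
def optimal_d(x):
    pe = np.exp(-0.5 * x ** 2)
    pl = np.exp(-0.5 * (x - 1.0) ** 2)
    return pe / (pe + pl)

rng = np.random.default_rng(0)
x = rng.normal(1.0, 1.0, size=100_000)          # samples from the learner
kl = reverse_kl_estimate(optimal_d, x)
# analytic KL(N(1,1) || N(0,1)) = 0.5, which the estimate approaches
```

In ERIL the forward RL step then lowers exactly this estimate, driving the learner's distribution toward the expert's.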
- โฆ