
    Risk-sensitive Inverse Reinforcement Learning via Semi- and Non-Parametric Methods

    The literature on Inverse Reinforcement Learning (IRL) typically assumes that humans take actions in order to minimize the expected value of a cost function, i.e., that humans are risk neutral. Yet, in practice, humans are often far from being risk neutral. To fill this gap, the objective of this paper is to devise a framework for risk-sensitive IRL that explicitly accounts for a human's risk sensitivity. To this end, we propose a flexible class of models based on coherent risk measures, which allow us to capture an entire spectrum of risk preferences, from risk-neutral to worst-case. We propose efficient non-parametric algorithms based on linear programming and semi-parametric algorithms based on maximum likelihood for inferring a human's underlying risk measure and cost function for a rich class of static and dynamic decision-making settings. The resulting approach is demonstrated on a simulated driving game with ten human participants. Our method is able to infer and mimic a wide range of qualitatively different driving styles, from highly risk-averse to risk-neutral, in a data-efficient manner. Moreover, comparisons of the Risk-Sensitive (RS) IRL approach with a risk-neutral model show that the RS-IRL framework more accurately captures observed participant behavior both qualitatively and quantitatively, especially in scenarios where catastrophic outcomes such as collisions can occur.
    Comment: Submitted to International Journal of Robotics Research; Revision 1: (i) Clarified minor technical points; (ii) Revised proof for Theorem 3 to hold under weaker assumptions; (iii) Added additional figures and expanded discussions to improve readability
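    As a concrete illustration of the risk-preference spectrum that coherent risk measures span, the sketch below evaluates the Conditional Value-at-Risk (CVaR), a standard coherent risk measure, on a sample of simulated episode costs. This is not the paper's inference algorithm (which fits the risk measure via linear programming or maximum likelihood); it only shows how a single parameter interpolates between the risk-neutral expectation and the worst case. The cost distribution is invented for the example.

        import numpy as np

        def cvar(costs, alpha):
            """Empirical Conditional Value-at-Risk of a cost sample.

            Averages the worst (1 - alpha) fraction of outcomes:
            alpha = 0 recovers the risk-neutral expectation, and
            alpha -> 1 approaches the worst-case (maximum) cost.
            """
            costs = np.sort(np.asarray(costs))
            k = max(1, int(np.ceil((1.0 - alpha) * len(costs))))
            return costs[-k:].mean()

        # Hypothetical costs from simulated driving episodes.
        rng = np.random.default_rng(0)
        costs = rng.exponential(scale=1.0, size=10_000)

        print(cvar(costs, alpha=0.0))   # ~= E[cost]: risk-neutral
        print(cvar(costs, alpha=0.95))  # mean of the worst 5%: risk-averse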

    Learning Feedback Terms for Reactive Planning and Control

    With the advancement of robotics, machine learning, and machine perception, increasingly more robots will enter human environments to assist with daily tasks. However, dynamically changing human environments require reactive motion plans. Reactivity can be accomplished through replanning, e.g., model-predictive control, or through a reactive feedback policy that modifies ongoing behavior in response to sensory events. In this paper, we investigate how to use machine learning to add reactivity to a previously learned nominal skilled behavior. We approach this by learning a reactive modification term for movement plans represented by nonlinear differential equations. In particular, we use dynamic movement primitives (DMPs) to represent a skill and a neural network to learn a reactive policy from human demonstrations. We use the well-explored domain of obstacle avoidance for robot manipulation as a test bed. Our approach demonstrates how a neural network can be combined with physical insights to ensure robust behavior across different obstacle settings and movement durations. Evaluations on an anthropomorphic robotic system demonstrate the effectiveness of our work.
    Comment: 8 pages, accepted for publication at the ICRA 2017 conference
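    A minimal sketch of the mechanism described above, assuming a 1-D transformation system: a DMP-style attractor toward the goal plus an additive coupling term that reacts to the current state. In the paper the coupling term is a neural network trained on human demonstrations; here a hand-coded repeller stands in, and the function names and gains are hypothetical.

        import numpy as np

        def integrate_dmp(y0, goal, coupling_fn, T=1.0, dt=0.01,
                          alpha_y=25.0, beta_y=6.25):
            """Integrate a 1-D DMP transformation system whose
            acceleration is modified by a reactive coupling term."""
            y, yd = float(y0), 0.0
            traj = [y]
            for _ in range(int(T / dt)):
                # Goal attractor plus learned (here: hand-coded) feedback term.
                ydd = alpha_y * (beta_y * (goal - y) - yd) + coupling_fn(y, yd)
                yd += ydd * dt
                y += yd * dt
                traj.append(y)
            return np.array(traj)

        # Stand-in for the learned policy: deflect the motion near a
        # virtual obstacle at y = 0.5 without changing the goal.
        repel = lambda y, yd: 600.0 * (y - 0.5) * np.exp(-100.0 * (y - 0.5) ** 2)
        path = integrate_dmp(y0=0.0, goal=1.0, coupling_fn=repel)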

    PILCO: A Model-Based and Data-Efficient Approach to Policy Search

    In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way. By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning, PILCO can cope with very little data and facilitates learning from scratch in only a few trials. Policy evaluation is performed in closed form using state-of-the-art approximate inference. Furthermore, policy gradients are computed analytically for policy improvement. We report unprecedented learning efficiency on challenging and high-dimensional control tasks.
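    The abstract describes an alternation between learning a probabilistic dynamics model and improving the policy against it. The sketch below, assuming a toy 1-D plant and a scalar linear policy, keeps that loop but makes two simplifications worth flagging: Monte Carlo rollouts of the GP replace PILCO's closed-form moment matching, and random search replaces its analytic policy gradients. All names and the plant are invented for illustration.

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor

        def env_step(x, u):
            """Toy 1-D plant, unknown to the learner."""
            return x + 0.05 * (u - np.sin(x))

        def model_cost(gp, theta, horizon=25, n_samples=10, goal=np.pi):
            """Cost of the policy u = theta * (goal - x), estimated by
            sampling rollouts from the learned GP dynamics model."""
            total = 0.0
            for _ in range(n_samples):
                x = 0.0
                for _ in range(horizon):
                    u = theta * (goal - x)
                    mu, sd = gp.predict(np.array([[x, u]]), return_std=True)
                    x = float(mu[0] + sd[0] * np.random.randn())  # sample next state
                    total += (x - goal) ** 2
            return total / n_samples

        X, Y, x, theta = [], [], 0.0, 0.5
        for trial in range(5):
            for _ in range(25):                  # one rollout on the real plant
                u = theta * (np.pi - x)
                x_next = env_step(x, u)
                X.append([x, u]); Y.append(x_next); x = x_next
            gp = GaussianProcessRegressor(normalize_y=True).fit(np.array(X), np.array(Y))
            candidates = theta + 0.2 * np.random.randn(8)
            theta = min(candidates, key=lambda t: model_cost(gp, t))  # improve policy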

    Gaussian-Process-based Robot Learning from Demonstration

    Endowed with higher levels of autonomy, robots are required to perform increasingly complex manipulation tasks. Learning from demonstration is emerging as a promising paradigm for transferring skills to robots. It allows task constraints to be learned implicitly from observing the motion executed by a human teacher, which can enable adaptive behavior. We present a novel Gaussian-Process-based learning from demonstration approach. This probabilistic representation allows generalization over multiple demonstrations and encodes variability along the different phases of the task. In this paper, we address how Gaussian Processes can be used to effectively learn a policy from trajectories in task space. We also present a method to efficiently adapt the policy to fulfill new requirements and to modulate the robot's behavior as a function of task variability. This approach is illustrated through a real-world application using the TIAGo robot.
    Comment: 8 pages, 10 figures
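    A minimal sketch of the probabilistic representation described above, assuming scalar task-space trajectories indexed by a phase variable: a GP is fit to several demonstrations, and its predictive variance exposes the per-phase variability that can then modulate the robot's behavior. The demonstrations, kernel settings, and the stiffness rule at the end are illustrative assumptions, not the paper's exact method.

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF, WhiteKernel

        rng = np.random.default_rng(1)
        s = np.linspace(0.0, 1.0, 50)            # task phase in [0, 1]

        # Five noisy demonstrations whose variability grows late in the task.
        demos = [np.sin(np.pi * s) + (0.01 + 0.08 * s) * rng.standard_normal(50)
                 for _ in range(5)]

        S = np.tile(s, len(demos))[:, None]
        Y = np.concatenate(demos)

        gp = GaussianProcessRegressor(RBF(0.1) + WhiteKernel(1e-3)).fit(S, Y)
        mean, std = gp.predict(s[:, None], return_std=True)

        # Track stiffly where demonstrations agree, loosely where they vary
        # (a hypothetical modulation rule keyed to the GP variance).
        stiffness = 100.0 / (1.0 + 200.0 * std ** 2)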