
    Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning

    In the field of reinforcement learning there has been recent progress towards safety and high-confidence bounds on policy performance. However, to our knowledge, no practical methods exist for determining high-confidence policy performance bounds in the inverse reinforcement learning setting, where the true reward function is unknown and only samples of expert behavior are given. We propose a sampling method based on Bayesian inverse reinforcement learning that uses demonstrations to determine practical high-confidence upper bounds on the α-worst-case difference in expected return between any evaluation policy and the optimal policy under the expert's unknown reward function. We evaluate our proposed bound on both a standard grid navigation task and a simulated driving task, and it achieves tighter and more accurate bounds than a feature-count-based baseline. We also give examples of how our proposed bound can be used to perform risk-aware policy selection and risk-aware policy improvement. Because our proposed bound requires several orders of magnitude fewer demonstrations than existing high-confidence bounds, it is the first practical method that allows agents that learn from demonstration to express confidence in the quality of their learned policy.
    Comment: In proceedings AAAI-1
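    To make the "high-confidence upper bound on the α-worst-case difference" concrete, here is a minimal sketch. It assumes you already have, for each reward function R_j sampled from a Bayesian IRL posterior, the expected value difference V*_{R_j} - V^{π_eval}_{R_j}, and it treats those samples as i.i.d. (MCMC draws are correlated, so this is an approximation). The bound uses a standard nonparametric order-statistic argument for quantiles; it is an illustration, not necessarily the paper's exact procedure, and the function names are hypothetical.

```python
import math
from typing import Sequence


def log_binom_pmf(i: int, n: int, p: float) -> float:
    """Log of the Binomial(n, p) mass at i, computed in log space to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))


def alpha_var_upper_bound(evd_samples: Sequence[float], alpha: float, delta: float) -> float:
    """Upper bound on the alpha-quantile (alpha-VaR) of the expected value difference.

    Hypothetical helper: evd_samples[j] is assumed to be V*_{R_j} - V^{pi_eval}_{R_j}
    for a reward R_j drawn from a Bayesian IRL posterior. The returned order
    statistic X_(k) satisfies P(X_(k) >= q_alpha) >= 1 - delta for i.i.d. samples.
    """
    if not (0.0 < alpha < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("alpha and delta must lie in (0, 1)")
    xs = sorted(evd_samples)
    n = len(xs)
    # The number B of samples below q_alpha is (at most) Binomial(n, alpha), and
    # X_(k) < q_alpha only if B >= k, so P(X_(k) >= q_alpha) >= P(B <= k - 1).
    cdf = 0.0  # running value of P(B <= k - 1)
    for k in range(1, n + 1):
        cdf += math.exp(log_binom_pmf(k - 1, n, alpha))
        if cdf >= 1.0 - delta:
            return xs[k - 1]  # k-th smallest sample (1-indexed order statistic)
    raise ValueError("too few samples for the requested alpha and delta")
```

    With 100 samples, alpha = 0.95 and delta = 0.05, the bound is the 99th smallest sample rather than the empirical 95th, which is the price paid for the 1 - delta confidence guarantee; with too few samples no order statistic qualifies and the sketch raises an error instead of returning an unjustified bound.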

    Bayesian Nonparametric Inverse Reinforcement Learning

    Inverse reinforcement learning (IRL) is the task of learning the reward function of a Markov Decision Process (MDP) given the transition function and a set of observed demonstrations in the form of state-action pairs. Current IRL algorithms attempt to find a single reward function that explains the entire observation set. In practice, this leads to a computationally costly search over a large (typically infinite) space of complex reward functions. This paper proposes that if the observations can be partitioned into smaller groups, a class of much simpler reward functions can be used to explain each group. The proposed method uses a Bayesian nonparametric mixture model to automatically partition the data and find a simple reward function for each partition. These simple rewards have an intuitive interpretation as subgoals, which can be used to predict actions or to analyze which states are important to the demonstrator. Experimental results on simple examples show performance comparable to other IRL algorithms in nominal situations. Moreover, the proposed method handles cyclic tasks (where the agent begins and ends in the same state) that would break existing algorithms without modification. Finally, the new algorithm has a fundamentally different structure from previous methods, making it more computationally efficient in a real-world learning scenario where the state space is large but the demonstration set is small.
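    The partitioning step can be pictured as a Chinese restaurant process (CRP) prior over groups combined with a likelihood of each observed action under a group's subgoal. The sketch below computes the conditional assignment probabilities for a single (state, action) pair. Everything concrete here is an assumption for illustration: a hypothetical 10-state corridor, Q-values taken as the negative distance to the subgoal after the move, a Boltzmann action likelihood with inverse temperature beta, and a new group's subgoal marginalized uniformly over states. It is not the paper's implementation.

```python
import math
from typing import Dict, Tuple

# Hypothetical toy MDP: a 1-D corridor of states 0..N_STATES-1, actions -1 (left) / +1 (right).
N_STATES = 10
ACTIONS = (-1, 1)


def step(s: int, a: int) -> int:
    """Deterministic transition, clipped at the corridor ends."""
    return min(max(s + a, 0), N_STATES - 1)


def action_likelihood(s: int, a: int, goal: int, beta: float = 2.0) -> float:
    """Boltzmann p(a | s, goal), assuming Q(s, a) = -(distance to goal after the move)."""
    q = {b: -abs(step(s, b) - goal) for b in ACTIONS}
    z = sum(math.exp(beta * v) for v in q.values())
    return math.exp(beta * q[a]) / z


def assignment_probs(obs: Tuple[int, int], clusters: Dict[int, int],
                     eta: float = 1.0) -> Dict[str, float]:
    """CRP prior times action likelihood for assigning one observation.

    clusters maps each existing subgoal state to the number of *other*
    observations currently assigned to it; eta is the CRP concentration.
    """
    s, a = obs
    n_others = sum(clusters.values())
    weights: Dict[str, float] = {}
    for goal, count in clusters.items():
        # Existing group: prior mass proportional to its size.
        weights[f"subgoal {goal}"] = count / (n_others + eta) * action_likelihood(s, a, goal)
    # New group: prior mass eta; its subgoal is averaged uniformly over all states (assumption).
    new_lik = sum(action_likelihood(s, a, g) for g in range(N_STATES)) / N_STATES
    weights["new"] = eta / (n_others + eta) * new_lik
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


probs = assignment_probs((2, +1), {9: 5, 0: 5})
```

    A rightward step from state 2 is far more likely under subgoal 9 than under subgoal 0, so it is mostly assigned to the first group. In a full Gibbs-style sampler, one would draw each observation's group from this conditional in turn and then resample each group's subgoal given its members.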

    Interactive learning gives the tempo to an intrinsically motivated robot learner

    This paper studies an interactive learning system that couples internally guided learning and social interaction for robot learning of motor skills. We present Socially Guided Intrinsic Motivation with Interactive learning at the Meta level (SGIM-IM), an algorithm for learning forward and inverse models in high-dimensional, continuous and non-preset environments. The robot actively self-determines, at the meta level, which strategy to use (active autonomous learning or social learning) and, at the task level, which goal task to pursue during autonomous exploration. We illustrate through two experimental setups that SGIM-IM efficiently combines the advantages of social learning and intrinsic motivation to produce a wide range of effects in the environment and to develop precise control policies in large spaces, while minimising its reliance on the teacher and offering a flexible interaction framework with humans.
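    The meta-level choice between autonomous and social learning can be illustrated with a generic progress-based selector. This sketch is an assumption, not the SGIM-IM algorithm: it scores each strategy by the recent change in competence (the gap between the newer and older halves of a sliding window), picks strategies in proportion to that score, and explores uniformly with probability epsilon. The class name, strategy labels, window size and epsilon are all hypothetical.

```python
import random
from collections import deque
from typing import Deque, Dict, List, Optional


class ProgressStrategySelector:
    """Hypothetical selector: favors strategies whose competence is changing fastest."""

    def __init__(self, strategies: List[str], window: int = 10,
                 epsilon: float = 0.1, seed: Optional[int] = None):
        # Sliding window of recent competence values per strategy.
        self.history: Dict[str, Deque[float]] = {s: deque(maxlen=window) for s in strategies}
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def record(self, strategy: str, competence: float) -> None:
        """Store the competence reached on an episode that used this strategy."""
        self.history[strategy].append(competence)

    def interest(self, strategy: str) -> float:
        """Absolute difference between the means of the newer and older halves of the window."""
        h = list(self.history[strategy])
        if len(h) < 2:
            return 0.0
        mid = len(h) // 2
        older, newer = h[:mid], h[mid:]
        return abs(sum(newer) / len(newer) - sum(older) / len(older))

    def choose(self) -> str:
        """Uniform choice with probability epsilon (or when no progress is seen), else interest-weighted."""
        names = list(self.history)
        interests = [self.interest(s) for s in names]
        if self.rng.random() < self.epsilon or sum(interests) == 0.0:
            return self.rng.choice(names)
        return self.rng.choices(names, weights=interests, k=1)[0]
```

    Using the absolute change rather than raw competence means a strategy stops being favored once the learner plateaus with it, which is one simple way to trade off asking the teacher against exploring alone.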