
    Bayesian multitask inverse reinforcement learning

    We generalise the problem of inverse reinforcement learning to multiple tasks, learned from multiple demonstrations. Each demonstration may come from a different expert trying to solve a different task, or from different experts trying to solve the same task. Our main contribution is to formalise the problem as statistical preference elicitation via a number of structured priors, whose form captures our biases about the relatedness of different tasks or expert policies. In doing so, we introduce a prior on policy optimality, which is more natural to specify. We show that our framework allows us not only to learn efficiently from multiple experts but also to effectively differentiate between the goals of each. Possible applications include analysing the intrinsic motivations of subjects in behavioural experiments and learning from multiple teachers.
    Comment: Corrected version. 13 pages, 8 figures
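
    The prior on policy optimality can be pictured with a Boltzmann (softmax) likelihood: the closer an expert is to optimal, the more its action choices concentrate on high-value actions. Below is a minimal numpy sketch of posterior inference over a finite set of candidate rewards under such a likelihood; the names (q_per_reward, beta), the finite candidate set, and the Boltzmann form are illustrative assumptions, not the paper's construction.

        import numpy as np

        def softmax_policy(q, beta):
            # Boltzmann policy: P(a | s) proportional to exp(beta * Q(s, a)).
            z = beta * q
            z = z - z.max(axis=1, keepdims=True)   # numerical stability
            p = np.exp(z)
            return p / p.sum(axis=1, keepdims=True)

        def demo_log_likelihood(demo, q, beta):
            # Log-probability of one demonstration, a list of (state, action)
            # pairs, under a Boltzmann expert with action values q.
            pi = softmax_policy(q, beta)
            return sum(np.log(pi[s, a]) for s, a in demo)

        def reward_posterior(demos, q_per_reward, log_prior, beta=5.0):
            # Posterior over a finite set of candidate rewards, each
            # summarised by its optimal Q-table q_per_reward[k].
            log_post = np.array(log_prior, dtype=float)
            for k, q in enumerate(q_per_reward):
                log_post[k] += sum(demo_log_likelihood(d, q, beta) for d in demos)
            log_post -= log_post.max()             # normalise in log space
            post = np.exp(log_post)
            return post / post.sum()

    Because the demonstrations enter the likelihood independently, demonstrations from different experts (or for different tasks) can be scored against different candidate rewards, which is what lets the framework separate the goals of each expert.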

    Multi-Modal Imitation Learning from Unstructured Demonstrations using Generative Adversarial Nets

    Imitation learning has traditionally been applied to learn a single task from demonstrations thereof. The requirement of structured and isolated demonstrations limits the scalability of imitation learning approaches, as they are difficult to apply to real-world scenarios, where robots have to be able to execute a multitude of tasks. In this paper, we propose a multi-modal imitation learning framework that is able to segment and imitate skills from unlabelled and unstructured demonstrations by learning skill segmentation and imitation jointly. Extensive simulation results indicate that our method can efficiently separate the demonstrations into individual skills and learn to imitate them using a single multi-modal policy. The video of our experiments is available at http://sites.google.com/view/nips17intentiongan
    Comment: Paper accepted to NIPS 2017
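
    A rough sketch of the kind of objective involved: a GAIL-style discriminator is paired with a latent "intention" code, and the policy is rewarded both for fooling the discriminator and for keeping its latent skill recoverable from its behaviour. This PyTorch sketch is a generic reconstruction under stated assumptions (the network sizes, the lam weight, and the exact form of the information term are illustrative), not the paper's implementation.

        import torch
        import torch.nn as nn

        class Discriminator(nn.Module):
            # D(s, a): probability that a state-action pair comes from the
            # demonstrations rather than the imitation policy.
            def __init__(self, obs_dim, act_dim, hidden=64):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Linear(obs_dim + act_dim, hidden), nn.Tanh(),
                    nn.Linear(hidden, 1), nn.Sigmoid())

            def forward(self, s, a):
                return self.net(torch.cat([s, a], dim=-1)).squeeze(-1)

        class IntentionPosterior(nn.Module):
            # q(z | s, a): recovers which latent skill produced a transition.
            def __init__(self, obs_dim, act_dim, n_skills, hidden=64):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Linear(obs_dim + act_dim, hidden), nn.Tanh(),
                    nn.Linear(hidden, n_skills))

            def forward(self, s, a):
                return torch.log_softmax(self.net(torch.cat([s, a], dim=-1)), dim=-1)

        def policy_reward(disc, post, s, a, z, lam=0.1, eps=1e-8):
            # Surrogate reward for the skill-conditioned policy pi(a | s, z):
            # fool the discriminator (GAIL term) while keeping the latent
            # skill z recoverable from the behaviour (information term).
            gail_term = -torch.log(1.0 - disc(s, a) + eps)
            info_term = post(s, a).gather(1, z.unsqueeze(1)).squeeze(1)
            return gail_term + lam * info_term

    The information term is what makes a single multi-modal policy segment the skills: if two demonstrated skills collapsed into the same behaviour, the posterior network could no longer identify z, and the policy's reward would drop.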

    Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning

    In the field of reinforcement learning there has been recent progress towards safety and high-confidence bounds on policy performance. However, to our knowledge, no practical methods exist for determining high-confidence policy performance bounds in the inverse reinforcement learning setting, where the true reward function is unknown and only samples of expert behavior are given. We propose a sampling method based on Bayesian inverse reinforcement learning that uses demonstrations to determine practical high-confidence upper bounds on the α-worst-case difference in expected return between any evaluation policy and the optimal policy under the expert's unknown reward function. We evaluate our proposed bound on both a standard grid navigation task and a simulated driving task and achieve tighter and more accurate bounds than a feature count-based baseline. We also give examples of how our proposed bound can be utilized to perform risk-aware policy selection and risk-aware policy improvement. Because our proposed bound requires several orders of magnitude fewer demonstrations than existing high-confidence bounds, it is the first practical method that allows agents that learn from demonstration to express confidence in the quality of their learned policy.
    Comment: In proceedings of AAAI-18
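
    The final step of such a bound reduces to an order statistic over posterior samples: draw rewards from the Bayesian IRL posterior (e.g., by MCMC), compute the expected value difference (EVD) between the optimal and evaluation policies under each sampled reward, and return the sample whose rank covers the α-quantile with probability 1 − δ. A minimal sketch of that last step, assuming the EVD samples are already computed; the paper's exact construction may differ.

        import numpy as np
        from scipy.stats import binom

        def alpha_worst_case_bound(evd_samples, alpha=0.95, delta=0.05):
            # evd_samples[i] = V*_{R_i} - V^{pi_eval}_{R_i} for a reward R_i
            # drawn from the Bayesian IRL posterior. Returns an order
            # statistic that upper-bounds the alpha-quantile of the EVD
            # with probability at least 1 - delta (binomial tail argument).
            evd = np.sort(np.asarray(evd_samples, dtype=float))
            n = len(evd)
            # smallest rank k whose order statistic covers the
            # alpha-quantile with confidence 1 - delta
            k = int(binom.ppf(1.0 - delta, n, alpha)) + 1
            k = min(k, n)   # with few samples, fall back to the largest one
            return evd[k - 1]

    The returned value sits slightly above the empirical α-quantile of the sampled EVDs; the gap shrinks as the number of posterior samples grows, which is why the method stays practical with relatively few demonstrations.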

    Exploring Apprenticeship Learning for Player Modelling in Interactive Narratives

    In this paper we present an early Apprenticeship Learning approach to mimicking the behaviour of different players in a short adaptation of the interactive fiction Anchorhead. Our motivation is the need to understand and simulate player behaviour in order to create systems that aid the design and personalisation of Interactive Narratives (INs). INs are only partially observable for the players, and their goals are dynamic as a result. We used Receding Horizon IRL (RHIRL) to learn players' goals in the form of reward functions, and to derive policies that imitate their behaviour. Our preliminary results suggest that RHIRL is able to learn action sequences to complete a game, and provides insights towards generating behaviour more similar to that of specific players.
    Comment: Extended Abstracts of the 2019 Annual Symposium on Computer-Human Interaction in Play (CHI PLAY 2019)
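
    The receding-horizon part works like model-predictive control: given a reward recovered by IRL, the imitation policy repeatedly plans a few steps ahead from the current state, executes the first action, and replans. A tabular numpy sketch of that loop, assuming known dynamics T[s, a, s'] and an already-learned state reward (both assumptions for illustration; the paper's setting is partially observable).

        import numpy as np

        def finite_horizon_q(reward, T, horizon):
            # Finite-horizon value iteration for tabular dynamics T[s, a, s']
            # and a reward on successor states; returns Q[s, a].
            v = np.zeros(T.shape[0])
            for _ in range(horizon):
                q = T @ (reward + v)   # Q(s,a) = sum_s' T[s,a,s'] (r(s') + V(s'))
                v = q.max(axis=1)
            return q

        def receding_horizon_rollout(reward, T, start, horizon=5, steps=50, seed=0):
            # Imitate a player by replanning over a short window: plan
            # `horizon` steps ahead, take the first greedy action,
            # observe the next state, and repeat.
            rng = np.random.default_rng(seed)
            s, trajectory = start, []
            for _ in range(steps):
                a = int(finite_horizon_q(reward, T, horizon)[s].argmax())
                trajectory.append((s, a))
                s = int(rng.choice(T.shape[2], p=T[s, a]))
            return trajectory

    Planning over a short window rather than to the end of the game is what makes the approach tractable for INs, whose state spaces grow quickly and whose players pursue goals that shift mid-story.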

    Softstar: Heuristic-guided probabilistic inference

    Recent machine learning methods for sequential behavior prediction estimate the motives of behavior rather than the behavior itself. This higher-level abstraction improves generalization across prediction settings, but computing predictions often becomes intractable in large decision spaces. We propose the Softstar algorithm, a softened heuristic-guided search technique for the maximum entropy inverse optimal control model of sequential behavior. This approach supports probabilistic search with bounded approximation error at a significantly reduced computational cost compared to sampling-based methods. We present the algorithm, analyze its approximation guarantees, and compare its performance with simulation-based inference on two distinct, complex decision tasks.
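
    The core idea can be pictured as A* with the hard minimum replaced by a soft one: partial paths that meet at the same node are merged with log-sum-exp, so the search accumulates the maximum entropy path distribution's softened cost rather than a single shortest path. A toy sketch under simplifying assumptions: each node is expanded once, which drops some late-arriving probability mass, and it is exactly this kind of approximation error that the real algorithm bounds. The succ/h interface is illustrative, not the paper's API.

        import heapq
        import numpy as np

        def softstar(succ, start, goal, h):
            # Best-first computation of the softened path cost
            #     softmin_p cost(p) = -log(sum over paths p of exp(-cost(p)))
            # from start to goal. Nodes are prioritised by g_soft + h(n),
            # as in A*. succ(n) yields (successor, edge_cost) pairs; h is
            # an admissible heuristic on the cost-to-go.
            g_soft = {start: 0.0}
            closed = set()
            frontier = [(h(start), start)]
            while frontier:
                _, n = heapq.heappop(frontier)
                if n in closed:
                    continue            # stale queue entry
                if n == goal:
                    return g_soft[n]
                closed.add(n)
                for m, cost in succ(n):
                    if m in closed:
                        continue        # approximation: late mass is dropped
                    new = g_soft[n] + cost
                    g_soft[m] = -np.logaddexp(-g_soft[m], -new) if m in g_soft else new
                    heapq.heappush(frontier, (g_soft[m] + h(m), m))
            return np.inf

    The heuristic plays the same role as in A*: it steers expansion toward the goal so that only a small fraction of the decision space has to be touched, which is where the speed-up over sampling-based inference comes from.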