Bayesian multitask inverse reinforcement learning
We generalise the problem of inverse reinforcement learning to multiple
tasks, from multiple demonstrations. Each demonstration may represent one expert trying
to solve a different task, or different experts trying to solve the same
task. Our main contribution is to formalise the problem as statistical
preference elicitation, via a number of structured priors, whose form captures
our biases about the relatedness of different tasks or expert policies. In
doing so, we introduce a prior on policy optimality, which is more natural to
specify. We show that our framework allows us not only to learn efficiently
from multiple experts but also to effectively differentiate between the goals
of each. Possible applications include analysing the intrinsic motivations of
subjects in behavioural experiments and learning from multiple teachers.
Comment: Corrected version. 13 pages, 8 figures
Multi-Modal Imitation Learning from Unstructured Demonstrations using Generative Adversarial Nets
Imitation learning has traditionally been applied to learn a single task from
demonstrations thereof. The requirement of structured and isolated
demonstrations limits the scalability of imitation learning approaches as they
are difficult to apply to real-world scenarios, where robots have to be able to
execute a multitude of tasks. In this paper, we propose a multi-modal imitation
learning framework that is able to segment and imitate skills from unlabelled
and unstructured demonstrations by learning skill segmentation and imitation
learning jointly. The extensive simulation results indicate that our method can
efficiently separate the demonstrations into individual skills and learn to
imitate them using a single multi-modal policy. The video of our experiments is
available at http://sites.google.com/view/nips17intentiongan
Comment: Paper accepted to NIPS 201
Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning
In the field of reinforcement learning there has been recent progress towards
safety and high-confidence bounds on policy performance. However, to our
knowledge, no practical methods exist for determining high-confidence policy
performance bounds in the inverse reinforcement learning setting, where the
true reward function is unknown and only samples of expert behavior are given.
We propose a sampling method based on Bayesian inverse reinforcement learning
that uses demonstrations to determine practical high-confidence upper bounds on
the α-worst-case difference in expected return between any evaluation
policy and the optimal policy under the expert's unknown reward function. We
evaluate our proposed bound on both a standard grid navigation task and a
simulated driving task and achieve tighter and more accurate bounds than a
feature count-based baseline. We also give examples of how our proposed bound
can be utilized to perform risk-aware policy selection and risk-aware policy
improvement. Because our proposed bound requires several orders of magnitude
fewer demonstrations than existing high-confidence bounds, it is the first
practical method that allows agents that learn from demonstration to express
confidence in the quality of their learned policy.
Comment: In proceedings AAAI-1
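The bound described in this abstract can be illustrated with a minimal sketch. It assumes that reward functions have already been sampled from a Bayesian IRL posterior (the samples below are hard-coded and purely hypothetical), and it takes the plain empirical α-quantile of the per-sample policy loss; any additional confidence correction on the quantile that the paper may apply is not reproduced here. The toy chain MDP and all function names are illustrative assumptions, not the authors' code.

```python
import math

def policy_value(P, r, policy, gamma, start, iters=500):
    """Iterative policy evaluation; returns the value of `start`."""
    v = [0.0] * len(r)
    for _ in range(iters):
        v = [r[s] + gamma * sum(p * v[t] for t, p in P[s][policy[s]].items())
             for s in range(len(r))]
    return v[start]

def optimal_value(P, r, gamma, start, iters=500):
    """Value iteration; returns the optimal value of `start`."""
    v = [0.0] * len(r)
    for _ in range(iters):
        v = [r[s] + gamma * max(sum(p * v[t] for t, p in P[s][a].items())
                                for a in range(len(P[s])))
             for s in range(len(r))]
    return v[start]

def quantile_loss_bound(P, reward_samples, policy, gamma, start, alpha):
    """Empirical alpha-quantile of (optimal return - policy return),
    computed over the sampled reward functions."""
    losses = sorted(
        optimal_value(P, r, gamma, start) - policy_value(P, r, policy, gamma, start)
        for r in reward_samples
    )
    k = math.ceil(alpha * len(losses)) - 1
    return losses[min(max(k, 0), len(losses) - 1)]

# Hypothetical 3-state chain: action 0 moves left, action 1 moves right.
# P[state][action] maps next_state -> probability.
P = [
    [{0: 1.0}, {1: 1.0}],
    [{0: 1.0}, {2: 1.0}],
    [{1: 1.0}, {2: 1.0}],
]
# Stand-ins for posterior samples of the state reward vector.
samples = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
always_right = [1, 1, 1]  # evaluation policy

bound_50 = quantile_loss_bound(P, samples, always_right, 0.9, 0, alpha=0.5)
bound_95 = quantile_loss_bound(P, samples, always_right, 0.9, 0, alpha=0.95)
print(bound_50, bound_95)
```

A larger α asks for a more conservative quantile, so the reported loss can only stay the same or grow as α increases.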
Exploring Apprenticeship Learning for Player Modelling in Interactive Narratives
In this paper we present an early Apprenticeship Learning approach to mimic
the behaviour of different players in a short adaptation of the interactive
fiction Anchorhead. Our motivation is the need to understand and simulate
player behaviour to create systems to aid the design and personalisation of
Interactive Narratives (INs). INs are partially observable for the players and
their goals are dynamic as a result. We used Receding Horizon IRL (RHIRL) to
learn players' goals in the form of reward functions, and derive policies to
imitate their behaviour. Our preliminary results suggest that RHIRL is able to
learn action sequences to complete a game, and provides insights towards
generating behaviour more similar to specific players.
Comment: Extended Abstracts of the 2019 Annual Symposium on Computer-Human
Interaction in Play (CHI Play
Softstar: Heuristic-guided probabilistic inference
Recent machine learning methods for sequential behavior prediction estimate the motives of behavior rather than the behavior itself. This higher-level abstraction improves generalization in different prediction settings, but computing predictions often becomes intractable in large decision spaces. We propose the Softstar algorithm, a softened heuristic-guided search technique for the maximum entropy inverse optimal control model of sequential behavior. This approach supports probabilistic search with bounded approximation error at a significantly reduced computational cost when compared to sampling-based methods. We present the algorithm, analyze approximation guarantees, and compare performance with simulation-based inference on two distinct complex decision tasks.
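To make the "softened" quantity concrete, the sketch below computes an exact soft-minimum cost-to-go on a tiny acyclic graph. It assumes the usual maximum entropy formulation, in which a path's probability is proportional to exp(-cost). This is exhaustive dynamic programming, not the Softstar search itself; it only shows the kind of value a heuristic-guided method would approximate when the decision space is too large to enumerate. The graph, costs, and function names are hypothetical.

```python
import math
from functools import lru_cache

# Hypothetical toy graph: EDGES[state] = list of (next_state, step_cost).
EDGES = {
    "start": [("a", 1.0), ("b", 2.0)],
    "a": [("goal", 2.0)],
    "b": [("goal", 0.5)],
    "goal": [],
}

def softmin(values):
    """Numerically stable -log(sum(exp(-v) for v in values))."""
    m = min(values)
    return m - math.log(sum(math.exp(-(v - m)) for v in values))

@lru_cache(maxsize=None)
def soft_cost_to_go(state):
    """Soft-minimum cost from `state` to the goal over all paths."""
    if state == "goal":
        return 0.0
    return softmin([cost + soft_cost_to_go(nxt) for nxt, cost in EDGES[state]])

def action_probabilities(state):
    """Maximum entropy probability of each outgoing edge from `state`."""
    v = soft_cost_to_go(state)
    return {nxt: math.exp(v - (cost + soft_cost_to_go(nxt)))
            for nxt, cost in EDGES[state]}

probs = action_probabilities("start")
print(soft_cost_to_go("start"), probs)
```

The soft cost-to-go is never larger than the cheapest path cost, and the cheaper branch (through `b` here) receives the higher probability without the costlier branch being ruled out.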