Teaching Inverse Reinforcement Learners via Features and Demonstrations
Learning near-optimal behaviour from an expert's demonstrations typically
relies on the assumption that the learner knows the features that the true
reward function depends on. In this paper, we study the problem of learning
from demonstrations in the setting where this is not the case, i.e., where
there is a mismatch between the worldviews of the learner and the expert. We
introduce a natural quantity, the teaching risk, which measures the potential
suboptimality of policies that look optimal to the learner in this setting. We
show that bounds on the teaching risk guarantee that the learner is able to
find a near-optimal policy using standard algorithms based on inverse
reinforcement learning. Based on these findings, we suggest a teaching scheme
in which the expert can decrease the teaching risk by updating the learner's
worldview, and thus ultimately enable her to find a near-optimal policy.
Comment: NeurIPS'2018 (extended version)
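The abstract introduces the teaching risk only informally. As a concrete illustration (not necessarily the paper's exact definition), when the true reward is linear in features with weights w_star and the learner's worldview is a linear feature map A, a natural reading of the quantity is the largest alignment between w_star and a direction the learner cannot observe, i.e. the norm of the projection of w_star onto the null space of A. A minimal NumPy sketch under these assumptions; the names teaching_risk, A, and w_star are illustrative:

import numpy as np

def teaching_risk(A, w_star, tol=1e-10):
    # Norm of the component of the true reward weights w_star that lies in the
    # null space of the learner's linear feature map A, i.e. the largest
    # alignment between w_star and a direction invisible to the learner.
    _, s, vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    row_space = vt[:rank]                        # directions visible to the learner
    w_visible = row_space.T @ (row_space @ w_star)
    return np.linalg.norm(w_star - w_visible)    # invisible (null-space) component

# Example: the learner observes only 2 of 3 reward-relevant features.
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
w_star = np.array([0.5, 0.3, 0.8])
print(teaching_risk(A, w_star))                  # ~0.8: the hidden feature carries most of the reward

If this quantity is small, every reward direction the learner cannot see contributes little to the true reward, which is the intuition behind bounding the suboptimality of policies that look optimal to the learner.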
Interactive Teaching Algorithms for Inverse Reinforcement Learning
We study the problem of inverse reinforcement learning (IRL) with the added
twist that the learner is assisted by a helpful teacher. More formally, we
tackle the following algorithmic question: How could a teacher provide an
informative sequence of demonstrations to an IRL learner to speed up the
learning process? We present an interactive teaching framework where a teacher
adaptively chooses the next demonstration based on the learner's current policy. In
particular, we design teaching algorithms for two concrete settings: an
omniscient setting where a teacher has full knowledge about the learner's
dynamics and a blackbox setting where the teacher has minimal knowledge. Then,
we study a sequential variant of the popular MCE-IRL learner and prove
convergence guarantees of our teaching algorithm in the omniscient setting.
Extensive experiments with a car driving simulator environment show that the
learning progress can be sped up drastically compared to an uninformative
teacher.
Comment: IJCAI'19 paper (extended version)
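To make the omniscient setting concrete, the following is a minimal sketch, not the paper's algorithm: a greedy teacher for a gradient-based, MCE-style learner that is assumed to update its reward weights by one step per demonstration, w <- w + eta * (phi(demo) - mu), where phi(demo) are the demonstration's feature counts and mu the feature expectations of the learner's current policy. The teacher knows the true weights and the learner's state, simulates each candidate update, and picks the demonstration that lands closest to the target. The update rule, step size eta, and all names are assumptions made for illustration.

import numpy as np

def pick_next_demo(candidate_features, w_true, w_learner, mu_learner, eta=0.1):
    # Omniscient greedy teacher: simulate the learner's assumed gradient update
    # for every candidate demonstration and return the index of the one whose
    # updated weights are closest to the true reward weights.
    best_idx, best_dist = None, np.inf
    for i, phi in enumerate(candidate_features):
        w_next = w_learner + eta * (np.asarray(phi) - mu_learner)  # simulated learner update
        dist = np.linalg.norm(w_true - w_next)                     # remaining gap to the true weights
        if dist < best_dist:
            best_idx, best_dist = i, dist
    return best_idx

# Toy example with three candidate demonstrations over two features.
demos = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.7, 0.7])]
print(pick_next_demo(demos, w_true=np.array([1.0, 1.0]),
                     w_learner=np.zeros(2), mu_learner=np.zeros(2)))  # picks the balanced demo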
Interactively Teaching an Inverse Reinforcement Learner with Limited Feedback
We study the problem of teaching via demonstrations in sequential
decision-making tasks. In particular, we focus on the situation when the
teacher has no access to the learner's model and policy, and the feedback from
the learner is limited to trajectories that start from states selected by the
teacher. The need to select starting states and to infer the learner's
policy creates an opportunity for the teacher to use methods from inverse
reinforcement learning and active learning. In this work, we formalize the
teaching process with limited feedback and propose an algorithm that solves
this teaching problem. The algorithm uses a modified version of the active
value-at-risk method to select the starting states, a modified maximum causal
entropy algorithm to infer the policy, and the difficulty score ratio method to
choose the teaching demonstrations. We test the algorithm in a synthetic car
driving environment and conclude that the proposed algorithm is an effective
solution when the learner's feedback is limited.
Comment: 7 pages, 3 figures
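The abstract names three components without showing how they interact; the skeleton below is only a sketch of how they could fit together in a teaching loop. The three callables stand in for active value-at-risk state selection, modified maximum causal entropy policy inference, and difficulty-score-ratio demonstration ranking, and the environment interface (rollout_learner, show_demonstrations) is hypothetical.

def teach_with_limited_feedback(env, expert_policy, n_rounds,
                                select_start_states, infer_learner_policy,
                                rank_demonstrations):
    # Schematic teaching loop: probe the learner from teacher-chosen start
    # states, infer its policy from the observed trajectories, then show the
    # demonstrations it appears to need most.
    observed = []
    learner_policy = None
    for _ in range(n_rounds):
        start_states = select_start_states(env, observed)      # limited feedback: only these rollouts are seen
        observed += [env.rollout_learner(s) for s in start_states]
        learner_policy = infer_learner_policy(env, observed)   # estimate the learner's current policy
        demos = rank_demonstrations(env, expert_policy, learner_policy)
        env.show_demonstrations(demos[:1])                     # teach with the top-ranked demonstration
    return learner_policy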
Machine Teaching for Inverse Reinforcement Learning: Algorithms and Applications
Inverse reinforcement learning (IRL) infers a reward function from
demonstrations, allowing for policy improvement and generalization. However,
despite much recent interest in IRL, little work has been done to understand
the minimum set of demonstrations needed to teach a specific sequential
decision-making task. We formalize the problem of finding maximally informative
demonstrations for IRL as a machine teaching problem where the goal is to find
the minimum number of demonstrations needed to specify the reward equivalence
class of the demonstrator. We extend previous work on algorithmic teaching for
sequential decision-making tasks by showing a reduction to the set cover
problem, which enables an efficient approximation algorithm for determining the
set of maximally informative demonstrations. We apply our proposed machine
teaching algorithm to two novel applications: providing a lower bound on the
number of queries needed to learn a policy using active IRL and developing a
novel IRL algorithm that can learn more efficiently from informative
demonstrations than a standard IRL approach.
Comment: In Proceedings of the AAAI Conference on Artificial Intelligence, 201
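The set cover reduction suggests a simple greedy selection rule: treat each candidate demonstration as the set of reward constraints it conveys and repeatedly pick the demonstration covering the most constraints not yet covered, which gives the standard logarithmic approximation guarantee for set cover. A minimal sketch along these lines; the constraint identifiers and function name are illustrative, not the paper's implementation:

def greedy_demo_selection(constraints_per_demo, all_constraints):
    # Greedy set cover over reward constraints: each entry of
    # constraints_per_demo is the set of constraints (hashable identifiers)
    # conveyed by one candidate demonstration.
    uncovered = set(all_constraints)
    chosen = []
    while uncovered:
        # Pick the demonstration covering the most still-uncovered constraints.
        best = max(range(len(constraints_per_demo)),
                   key=lambda i: len(uncovered & set(constraints_per_demo[i])))
        gained = uncovered & set(constraints_per_demo[best])
        if not gained:          # remaining constraints cannot be covered
            break
        chosen.append(best)
        uncovered -= gained
    return chosen

# Tiny illustration with made-up constraint labels.
demos = [{"c1", "c2"}, {"c2", "c3", "c4"}, {"c4"}]
print(greedy_demo_selection(demos, {"c1", "c2", "c3", "c4"}))  # [1, 0]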