Inverse Reinforcement Learning from a Gradient-based Learner
Inverse Reinforcement Learning addresses the problem of inferring an expert's
reward function from demonstrations. However, in many applications, we not only
have access to the expert's near-optimal behavior, but we also observe part of
her learning process. In this paper, we propose a new algorithm for this
setting, in which the goal is to recover the reward function being optimized by
an agent, given a sequence of policies produced during learning. Our approach
is based on the assumption that the observed agent is updating her policy
parameters along the gradient direction. We then extend the method to the more
realistic scenario in which we only have access to a dataset of learning
trajectories. For both settings, we provide theoretical insights into our
algorithms' performance. Finally, we evaluate the approach in a simulated
GridWorld environment and on MuJoCo environments, comparing it with a
state-of-the-art baseline.
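
To make the gradient-based assumption concrete, here is a minimal sketch, not the paper's reference implementation, of how reward weights could be recovered from a sequence of policy parameters when the reward is linear in known features and the learner performs plain gradient ascent with a known step size. The names `recover_reward_weights`, `grad_feature_expectations`, and `alpha` are illustrative assumptions, and a full method would also have to cope with an unknown step size and noisy updates, which this sketch omits.

```python
import numpy as np

def recover_reward_weights(theta_history, grad_feature_expectations, alpha):
    """Estimate linear reward weights omega from observed policy updates.

    Assumes r(s, a) = omega^T phi(s, a), so the policy gradient factorizes as
    grad J(theta) = G(theta) @ omega, where G(theta) is the Jacobian of the
    policy's feature expectations w.r.t. theta. Each observed update then
    approximately satisfies
        theta_{t+1} - theta_t = alpha * G(theta_t) @ omega,
    and omega is recovered by stacking all updates into one least-squares fit.

    theta_history: list of parameter vectors [theta_0, ..., theta_T]
    grad_feature_expectations: callable theta -> (d_theta, d_reward) Jacobian
    alpha: the learner's step size (assumed known here)
    """
    A_rows, b_rows = [], []
    for theta_t, theta_next in zip(theta_history[:-1], theta_history[1:]):
        G = grad_feature_expectations(theta_t)  # (d_theta, d_reward)
        A_rows.append(alpha * G)                # predicted update direction
        b_rows.append(theta_next - theta_t)     # observed update
    A = np.vstack(A_rows)
    b = np.concatenate(b_rows)
    # Solve min_omega ||A @ omega - b||^2 over all observed transitions.
    omega, *_ = np.linalg.lstsq(A, b, rcond=None)
    return omega
```

Pooling every observed transition into a single least-squares problem makes the estimate more robust than fitting any single update, since noise in individual parameter steps averages out across the learning sequence.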