Online inverse reinforcement learning with unknown disturbances
This paper addresses the problem of online inverse reinforcement learning for
nonlinear systems with modeling uncertainties in the presence of unknown
disturbances. The developed approach observes an agent's state and input
trajectories and identifies the unknown reward function online. Sub-optimality
introduced into the observed trajectories by the unknown external disturbance
is compensated for using a novel model-based inverse reinforcement learning
approach. An observer estimates the external disturbances, and the resulting
estimates are used to learn a dynamic model of the demonstrator. The learned
demonstrator model, together with the observed suboptimal trajectories, is
then used to perform inverse reinforcement learning. Theoretical guarantees
are established using Lyapunov theory, and a simulation example demonstrates
the effectiveness of the proposed technique.
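The disturbance-compensation idea can be sketched as follows. This is a minimal illustrative example assuming known linear dynamics and a constant disturbance, not the paper's nonlinear formulation or its exact update laws; the matrices A and B, the demonstrator gain K, and the observer gain are placeholders.

```python
import numpy as np

# Minimal sketch of the disturbance-compensation idea (illustrative only):
# a discrete-time observer estimates an unknown additive disturbance d from
# one-step prediction errors; the compensated model can then be used for IRL.
# Assumed dynamics: x_{k+1} = A x_k + B u_k + d, with A, B, d, K placeholders.

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
d_true = np.array([0.05, -0.02])           # unknown constant disturbance
K = np.array([[0.5, 0.8]])                 # demonstrator's feedback gain (unknown to us)

x = np.array([1.0, 0.0])
d_hat = np.zeros(2)
gain = 0.6                                 # observer gain in (0, 1)

for k in range(200):
    u = -K @ x                             # observed demonstrator input
    x_next = A @ x + B @ u + d_true        # observed next state
    pred = A @ x + B @ u + d_hat           # prediction with the current estimate
    d_hat = d_hat + gain * (x_next - pred) # correct the estimate toward the residual
    x = x_next

print("estimated disturbance:", d_hat)     # approaches d_true
```

Once the disturbance estimate converges, the compensated dynamics model can be combined with the observed suboptimal trajectories in the IRL stage, as the abstract describes.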
Online Observer-Based Inverse Reinforcement Learning
In this paper, a novel approach to the output-feedback inverse reinforcement
learning (IRL) problem is developed by casting IRL, for linear systems with
quadratic cost functions, as a state estimation problem. Two
observer-based techniques for IRL are developed, including a novel observer
method that re-uses previous state estimates via history stacks. Theoretical
guarantees for convergence and robustness are established under appropriate
excitation conditions. Simulations demonstrate the performance of the developed
observers and filters under noisy and noise-free measurements.
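The history-stack idea can be illustrated with a small sketch. It is not the paper's observer or filter, only a toy estimate of the demonstrator's linear feedback gain from stacked, noisy (state, input) data; the gain K_true, the noise level, and the step size are assumptions.

```python
import numpy as np

# A minimal sketch of the history-stack idea (illustrative only; the paper's
# observers estimate the cost parameters themselves): recorded (x, u) pairs
# are stored in a stack and reused at every step, so the estimate of the
# demonstrator's linear feedback u = -K x keeps improving even when the
# current data are not exciting.

rng = np.random.default_rng(0)
K_true = np.array([[0.4, 0.9]])

# History stack of observed state/input pairs (with measurement noise).
X = rng.normal(size=(50, 2))                            # stacked states
U = X @ (-K_true.T) + 0.01 * rng.normal(size=(50, 1))   # stacked inputs

K_hat = np.zeros((1, 2))
lr = 0.1
for _ in range(500):
    # Gradient step on the stacked least-squares residual ||U + X K_hat^T||^2.
    resid = U + X @ K_hat.T                              # (50, 1)
    K_hat = K_hat - lr * (resid.T @ X) / len(X)

print("estimated gain:", K_hat)                          # approaches K_true
```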
Compatible Reward Inverse Reinforcement Learning
Inverse Reinforcement Learning (IRL) is an effective approach to recover a reward function that explains the behavior of an expert by observing a set of demonstrations. This paper presents a novel model-free IRL approach that, unlike most existing IRL algorithms, does not require specifying a function space in which to search for the expert's reward function. Leveraging the fact that the policy gradient must be zero for any optimal policy, the algorithm generates a set of basis functions that span the subspace of reward functions making the policy gradient vanish. Within this subspace, using a second-order criterion, we search for the reward function that most penalizes deviations from the expert's policy. After introducing our approach for finite domains, we extend it to continuous ones. The proposed approach is empirically compared to other IRL methods both in the (finite) Taxi domain and in the (continuous) Linear Quadratic Gaussian (LQG) and Car on the Hill environments.
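The core linear-algebra step behind the basis construction can be sketched as follows, assuming a reward that is linear in features and a stand-in Jacobian of the policy gradient with respect to the reward weights; in the actual method this Jacobian would be estimated from the expert's demonstrations.

```python
import numpy as np

# Minimal sketch (assumptions: reward is linear in features, r = Phi @ w, and
# jac[i, j] = d/dw_j of the i-th policy-gradient component at the expert's
# policy; here jac is a random stand-in).  Reward weights for which the
# expert's policy gradient vanishes form the null space of jac; a basis of
# that subspace plays the role of the "compatible" reward basis that the
# second-order criterion then searches.

rng = np.random.default_rng(1)
n_policy_params, n_reward_feats = 4, 10
jac = rng.normal(size=(n_policy_params, n_reward_feats))  # stand-in Jacobian

# Null-space basis via SVD: right singular vectors with (near-)zero singular value.
_, s, vt = np.linalg.svd(jac)
tol = max(jac.shape) * np.finfo(float).eps * s.max()
null_basis = vt[np.sum(s > tol):]           # (n_reward_feats - rank) x n_reward_feats

# Any w in this span makes the (linearised) policy gradient zero:
w = null_basis.T @ rng.normal(size=null_basis.shape[0])
print("||jac @ w|| =", np.linalg.norm(jac @ w))  # close to 0
```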
Kernel Density Bayesian Inverse Reinforcement Learning
Inverse reinforcement learning (IRL) is a powerful framework to infer an
agent's reward function by observing its behavior, but IRL algorithms that
learn point estimates of the reward function can be misleading because there
may be several functions that describe an agent's behavior equally well. A
Bayesian approach to IRL models a distribution over candidate reward functions,
alleviating the shortcomings of learning a point estimate. However, several
Bayesian IRL algorithms use a Q-value function in place of the likelihood
function. The resulting posterior is computationally intensive to calculate,
has few theoretical guarantees, and the Q-value function is often a poor
approximation for the likelihood. We introduce kernel density Bayesian IRL
(KD-BIRL), which uses conditional kernel density estimation to directly
approximate the likelihood, providing an efficient framework that, with a
modified reward function parameterization, is applicable to environments with
complex and infinite state spaces. We demonstrate KD-BIRL's benefits through a
series of experiments in Gridworld environments and a simulated sepsis
treatment task.
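A one-dimensional sketch of the conditional kernel density likelihood is given below. The data, the scalar reward parameter theta, and the use of SciPy's gaussian_kde are illustrative assumptions, not the paper's Gridworld or sepsis setups.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Minimal 1-D sketch of the conditional-KDE likelihood: given auxiliary
# rollouts (theta_i, a_i) where action a_i was generated under reward
# parameter theta_i, approximate p(a | theta) = p(a, theta) / p(theta) with
# two kernel density estimates, then score candidate thetas against the
# expert's actions.

rng = np.random.default_rng(2)
theta_train = rng.uniform(-1, 1, size=2000)
a_train = theta_train + 0.2 * rng.normal(size=2000)    # behaviour depends on theta

joint = gaussian_kde(np.vstack([a_train, theta_train]))
marginal = gaussian_kde(theta_train)

def log_likelihood(theta, expert_actions):
    """Sum of log p(a | theta) over the expert demonstration."""
    pts = np.vstack([expert_actions, np.full_like(expert_actions, theta)])
    cond = joint(pts) / marginal(np.full_like(expert_actions, theta))
    return np.sum(np.log(cond + 1e-12))

expert_actions = 0.5 + 0.2 * rng.normal(size=20)        # expert acts as if theta is near 0.5
grid = np.linspace(-1, 1, 41)
scores = np.array([log_likelihood(t, expert_actions) for t in grid])
print("maximum-likelihood theta on the grid:", grid[np.argmax(scores)])  # near 0.5
```

Under a flat prior, these grid scores are (up to a constant) the log posterior over the reward parameter, which is the quantity a Bayesian IRL method would sample or integrate.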