Inverse Reinforcement Learning through Policy Gradient Minimization
Inverse Reinforcement Learning (IRL) deals with the problem of recovering the reward function optimized by an expert, given a set of demonstrations of the expert's policy. Most IRL algorithms need to repeatedly compute the optimal policy for different reward functions. This paper proposes a new IRL approach that recovers the reward function without solving any "direct" RL problem. The idea is to find the reward function that minimizes the gradient of a parameterized representation of the expert's policy. In particular, when the reward function can be represented as a linear combination of some basis functions, we show that this optimization problem can be solved efficiently. We present an empirical evaluation of the proposed approach on a multidimensional version of the Linear-Quadratic Regulator (LQR), both in the case where the parameters of the expert's policy are known and in the (more realistic) case where they must be inferred from the expert's demonstrations. Finally, the algorithm is compared against the state of the art on the mountain car domain, where the expert's policy is unknown.
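A minimal sketch of the idea, under the assumption the abstract names: if the reward is linear in known basis features, the policy gradient is linear in the reward weights, so the unit-norm weights that (approximately) zero it can be read off from the smallest right singular vector of an estimated gradient matrix. The trajectory format and the helpers grad_log_pi and phi below are hypothetical placeholders, not the paper's code.

    import numpy as np

    def estimate_gradient_matrix(trajectories, grad_log_pi, phi, gamma=0.99):
        # REINFORCE-style estimate of G, whose column j is the gradient of the
        # expected return w.r.t. the policy parameters when the reward equals
        # the j-th basis function phi_j(s, a).
        # trajectories: list of [(s_0, a_0), (s_1, a_1), ...]  (hypothetical format)
        s0, a0 = trajectories[0][0]
        G = np.zeros((grad_log_pi(s0, a0).size, phi(s0, a0).size))
        for traj in trajectories:
            score = np.zeros(G.shape[0])   # sum_t grad log pi(a_t | s_t)
            feats = np.zeros(G.shape[1])   # sum_t gamma^t phi(s_t, a_t)
            for t, (s, a) in enumerate(traj):
                score += grad_log_pi(s, a)
                feats += gamma ** t * phi(s, a)
            G += np.outer(score, feats)
        return G / len(trajectories)

    def recover_reward_weights(G):
        # Unit-norm w minimizing ||G w||_2: the right singular vector of G
        # associated with the smallest singular value.
        return np.linalg.svd(G)[2][-1]

In the setting where the expert's policy parameters are unknown, grad_log_pi would first be evaluated at parameters fit to the demonstrations, matching the abstract's "more realistic" case.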
Semidefinite Relaxations for Stochastic Optimal Control Policies
Recent results in the study of the Hamilton-Jacobi-Bellman (HJB) equation have led to the discovery of a formulation of the value function as a linear Partial Differential Equation (PDE) for stochastic nonlinear systems with a mild constraint on their disturbances. This has yielded promising directions for research in the planning and control of nonlinear systems. This work proposes a new method for obtaining approximate solutions to these linear stochastic optimal control (SOC) problems. A candidate polynomial with variable coefficients is proposed as the solution to the SOC problem. A Sum of Squares (SOS) relaxation is then applied to the partial differential constraints, leading to a hierarchy of semidefinite relaxations with an improving sub-optimality gap. The resulting approximate solutions are shown to be guaranteed over- and under-approximations of the optimal value function.
Comment: Preprint. Accepted to the American Control Conference (ACC) 2014 in Portland, Oregon. 7 pages, color
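For context, a common form of the linearly-solvable SOC setting the abstract refers to, in our notation rather than necessarily the paper's: for dynamics with control-affine drift and a compatibility ("mild") constraint tying the noise to the control authority,

    \[
      dx = \bigl(f(x) + G(x)\,u\bigr)\,dt + B(x)\,d\omega,
      \qquad
      \lambda\, G(x) R^{-1} G(x)^{\top} = B(x) B(x)^{\top},
    \]

with running cost $\ell(x) + \tfrac{1}{2} u^{\top} R u$, the desirability transformation $\Psi(x) = \exp(-V(x)/\lambda)$ turns the HJB equation into the linear PDE

    \[
      0 = -\frac{\ell(x)}{\lambda}\,\Psi(x)
          + f(x)^{\top} \nabla \Psi(x)
          + \frac{1}{2}\operatorname{tr}\!\bigl(B(x) B(x)^{\top} \nabla^{2} \Psi(x)\bigr).
    \]

Restricting $\Psi$ to a polynomial candidate $\Psi_d$ of degree $d$ and requiring the PDE residual to be a sum of squares (one sign for an over-approximation, the other for an under-approximation) yields a semidefinite program, and increasing $d$ tightens the sub-optimality gap, which is the hierarchy the abstract describes.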
Inverse Reinforcement Learning with Explicit Policy Estimates
Various methods for solving the inverse reinforcement learning (IRL) problem have been developed independently in machine learning and economics. In particular, the method of Maximum Causal Entropy IRL is based on the perspective of entropy maximization, while related advances in the field of economics instead assume the existence of unobserved action shocks to explain expert behavior (Nested Fixed Point Algorithm, Conditional Choice Probability method, Nested Pseudo-Likelihood Algorithm). In this work, we make previously unknown connections between these related methods from both fields. We achieve this by showing that they all belong to a class of optimization problems characterized by a common form of the objective, the associated policy, and the objective gradient. We demonstrate key computational and algorithmic differences that arise between the methods due to an approximation of the optimal soft value function, and describe how this leads to more efficient algorithms. Using insights which emerge from our study of this class of optimization problems, we identify various problem scenarios and investigate each method's suitability for these problems.
Comment: To be published in: Proceedings of the 35th AAAI Conference on Artificial Intelligence, February 2021
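As a concrete anchor for the "optimal soft value function" that these methods approximate in different ways, here is a minimal tabular sketch of soft value iteration and the Maximum Causal Entropy policy it induces; the transition tensor P and reward table r are illustrative inputs, not tied to the paper.

    import numpy as np
    from scipy.special import logsumexp

    def mce_soft_value_iteration(P, r, gamma=0.9, iters=500, tol=1e-8):
        # P[a, s, s']: transition probabilities; r[s, a]: reward table.
        # Soft Bellman backup: Q(s, a) = r(s, a) + gamma * E[V(s')],
        #                      V(s)    = log sum_a exp(Q(s, a)).
        S, A = r.shape
        V = np.zeros(S)
        for _ in range(iters):
            Q = r + gamma * np.einsum('ast,t->sa', P, V)
            V_new = logsumexp(Q, axis=1)
            if np.max(np.abs(V_new - V)) < tol:
                V = V_new
                break
            V = V_new
        pi = np.exp(Q - V[:, None])   # MCE policy; each row sums to 1
        return V, pi

Consistent with the abstract, the methods differ in how this fixed point is handled: solving it exactly inside every outer step (as in nested fixed-point schemes) is expensive, while replacing it with a cheaper approximation is what yields the more efficient algorithms the authors describe.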