335 research outputs found

    Linear Hamilton Jacobi Bellman Equations in High Dimensions

    Get PDF
    The Hamilton Jacobi Bellman (HJB) equation provides the globally optimal solution to large classes of control problems. Unfortunately, this generality comes at a price: computing such solutions is typically intractable for systems of more than moderate state-space size, due to the curse of dimensionality. This work combines recent results on the structure of the HJB, and its reduction to a linear Partial Differential Equation (PDE), with methods based on low-rank tensor representations, known as separated representations, to address the curse of dimensionality. The result is an algorithm for solving optimal control problems that scales linearly with the number of states in the system and is applicable to nonlinear systems with stochastic forcing in finite-horizon, average-cost, and first-exit settings. The method is demonstrated on inverted pendulum, VTOL aircraft, and quadcopter models, with state dimensions two, six, and twelve, respectively. Comment: 8 pages. Accepted to CDC 201
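
    The linearization referred to here is, in essence, the exponential (desirability) transformation for linearly solvable stochastic optimal control. As a hedged sketch of the idea (the paper states its exact assumptions and covers the finite-horizon and average-cost variants), consider dynamics and cost of the form

    \[
    dx = \bigl(f(x) + G(x)\,u\bigr)\,dt + B(x)\,d\omega,
    \qquad
    \ell(x,u) = q(x) + \tfrac{1}{2}\,u^{\top} R\, u .
    \]

    Substituting \(V(x) = -\lambda \log \Psi(x)\) into the HJB equation and assuming \(\lambda\, G R^{-1} G^{\top} = B B^{\top} \equiv \Sigma\) cancels the term quadratic in \(\nabla\Psi\), leaving a linear PDE in the desirability \(\Psi\) (shown here for the first-exit setting):

    \[
    \frac{q(x)}{\lambda}\,\Psi(x) = f(x)^{\top} \nabla \Psi(x) + \tfrac{1}{2}\,\operatorname{tr}\!\bigl(\Sigma\,\nabla^{2}\Psi(x)\bigr),
    \qquad
    u^{*}(x) = \lambda\, R^{-1} G(x)^{\top}\, \frac{\nabla\Psi(x)}{\Psi(x)} .
    \]

    The separated (low-rank tensor) representation is then applied to \(\Psi\), which is what lets the linear solve scale with dimension.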

    Inverse Data-Driven Optimal Control for Nonlinear Stochastic Non-stationary Systems

    Full text link
    We consider the problem of estimating the possibly non-convex cost of an agent by observing its interactions with a nonlinear, non-stationary, and stochastic environment. For this inverse problem, we give a result that allows the cost to be estimated by solving a convex optimization problem. To obtain this result we also tackle a forward problem, which leads us to formulate a finite-horizon optimal control problem for which we show convexity and find the optimal solution. Our approach leverages certain probabilistic descriptions that can be obtained from data and/or from first principles. The effectiveness of our results, which are turned into an algorithm, is illustrated via simulations on the problem of estimating the cost of an agent that is stabilizing the unstable equilibrium of a pendulum. Comment: Submitted to the 62nd IEEE Conference on Decision and Control
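
    As a purely illustrative, hypothetical sketch of the general recipe (not this paper's formulation, which works with probabilistic descriptions of the closed loop): parameterize a stage cost linearly in known features and pick the weights by a convex program so that the observed actions are no worse than alternatives. The feature map, data, and margin formulation below are assumptions made for illustration only.

    # Hypothetical sketch: estimate cost weights w so that the observed actions are
    # (approximately) optimal for c(x, u) = w . phi(x, u).  Features, data, and the
    # margin-based convex program are illustrative assumptions, not the paper's method.
    import numpy as np
    import cvxpy as cp

    def phi(x, u):
        # illustrative quadratic features of the state and the input
        return np.array([x**2, u**2, x * u])

    rng = np.random.default_rng(0)
    xs = rng.normal(size=50)                 # observed states
    us = -0.5 * xs                           # observed (expert) actions
    candidates = np.linspace(-2.0, 2.0, 21)  # alternative actions to compare against

    w = cp.Variable(3, nonneg=True)
    slack = cp.Variable(len(xs), nonneg=True)
    constraints = [cp.sum(w) == 1]           # fix the scale of the cost
    for i, (x, u) in enumerate(zip(xs, us)):
        for v in candidates:
            # the observed action should not cost more than any alternative (up to slack)
            constraints.append(w @ phi(x, u) <= w @ phi(x, v) + slack[i])

    problem = cp.Problem(cp.Minimize(cp.sum(slack)), constraints)
    problem.solve()
    print("estimated cost weights:", w.value)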

    Inverse Reinforcement Learning through Policy Gradient Minimization

    Get PDF
    Inverse Reinforcement Learning (IRL) deals with the problem of recovering the reward function optimized by an expert, given a set of demonstrations of the expert's policy. Most IRL algorithms need to repeatedly compute the optimal policy for different reward functions. This paper proposes a new IRL approach that recovers the reward function without solving any "direct" RL problem. The idea is to find the reward function that minimizes the gradient of a parameterized representation of the expert's policy. In particular, when the reward function can be represented as a linear combination of some basis functions, we show that this optimization problem can be solved efficiently. We present an empirical evaluation of the proposed approach on a multidimensional version of the Linear-Quadratic Regulator (LQR), both in the case where the parameters of the expert's policy are known and in the (more realistic) case where the parameters of the expert's policy need to be inferred from the expert's demonstrations. Finally, the algorithm is compared against the state of the art on the mountain car domain, where the expert's policy is unknown.
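
    A hedged sketch of the core computation when the reward is linear in basis functions: the policy gradient of the expert's parameterized policy is then linear in the reward weights, so making the expert (approximately) stationary amounts to minimizing a quadratic form over the weights. The per-feature gradient estimates are assumed to be available (e.g. estimated from the demonstrations); the names below are illustrative, not the paper's code.

    # Hypothetical sketch: G[:, i] is an estimate of the policy gradient of the
    # expert's parameterized policy under the i-th reward basis function alone.
    # The reward weights are chosen to minimize ||G w||^2 subject to ||w|| = 1,
    # i.e. the eigenvector of G^T G with the smallest eigenvalue (some variants
    # constrain w to the simplex instead; the normalization is a choice).
    import numpy as np

    def recover_reward_weights(G):
        """G has shape (n_policy_params, n_reward_features)."""
        eigvals, eigvecs = np.linalg.eigh(G.T @ G)
        w = eigvecs[:, 0]                      # direction of least gradient norm
        return w / np.linalg.norm(w)

    # toy usage with random per-feature gradient estimates
    rng = np.random.default_rng(0)
    G = rng.normal(size=(10, 3))
    print(recover_reward_weights(G))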

    A cascaded supervised learning approach to inverse reinforcement learning

    Get PDF
    This paper considers the Inverse Reinforcement Learning (IRL) problem, that is, inferring a reward function for which a demonstrated expert policy is optimal. We propose to break the IRL problem down into two generic supervised learning steps: this is the Cascaded Supervised IRL (CSI) approach. A classification step that defines a score function is followed by a regression step providing a reward function. A theoretical analysis shows that the demonstrated expert policy is near-optimal for the computed reward function. Not needing to repeatedly solve a Markov Decision Process (MDP) and the ability to leverage existing techniques for classification and regression are two important advantages of the CSI approach. Furthermore, it is empirically shown to compare favourably to state-of-the-art approaches when using only transitions sampled according to the expert policy, up to the use of some heuristics. This is exemplified on two classical benchmarks (the mountain car problem and a highway driving simulator).
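
    A hedged sketch of the two cascaded steps on a batch of expert transitions (states, actions, next_states); the specific classifier, regressor, and heuristics used in the paper may differ, and the score-to-reward construction below is one natural reading of the abstract, with illustrative names.

    # Hypothetical sketch of Cascaded Supervised IRL: a classification step yields a
    # score function q(s, a), and a regression step turns Bellman-like residuals of
    # q into a reward model.  Assumes a discrete action set with more than two
    # actions (so decision_function returns one score per action).
    import numpy as np
    from sklearn.linear_model import LogisticRegression, Ridge

    def cascaded_supervised_irl(states, actions, next_states, n_actions, gamma=0.95):
        # Step 1 (classification): predict the expert's action; the per-action
        # decision scores play the role of q(s, a).
        clf = LogisticRegression(max_iter=1000).fit(states, actions)
        q = lambda S: clf.decision_function(S)               # shape (n, n_actions)

        # Step 2 (regression): r(s, a) ~ q(s, a) - gamma * max_a' q(s', a'),
        # evaluated on the observed transitions and then regressed to generalize.
        q_sa = q(states)[np.arange(len(states)), actions]
        targets = q_sa - gamma * q(next_states).max(axis=1)
        sa_features = np.hstack([states, np.eye(n_actions)[actions]])
        reward_model = Ridge().fit(sa_features, targets)
        return reward_model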

    Inverse stochastic optimal controls

    Full text link
    We study an inverse problem of the stochastic optimal control of general diffusions with a performance index having a quadratic penalty term on the control process. Under mild conditions on the drift, the volatility, and the state cost functions, and under the assumption that the optimal control belongs to the interior of the control set, we show that our inverse problem is well-posed using a stochastic maximum principle. Then, with this well-posedness, we reduce the inverse problem to a root-finding problem for the expectation of a random variable involving the value function, which has a unique solution. Based on this result, we propose a numerical method for our inverse problem by replacing the expectation above with the arithmetic mean of observed optimal control processes and the corresponding state processes. Recent progress in the numerical analysis of Hamilton-Jacobi-Bellman equations makes the proposed method implementable in multi-dimensional cases. In particular, with the help of the kernel-based collocation method for Hamilton-Jacobi-Bellman equations, our method for the inverse problem still works well even when an explicit form of the value function is unavailable. Several numerical experiments show that the numerical method recovers the unknown weight parameter with high accuracy.
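
    A hedged sketch of the final numerical step as described above: the expectation defining the root-finding condition is replaced by an arithmetic mean over observed optimal trajectories, and the unknown weight is obtained with a scalar root finder. The integrand h and the data layout are placeholders, not the paper's notation.

    # Hypothetical sketch: h(theta, X, U) evaluates, for one observed optimal
    # trajectory (X, U), the random variable (involving the value function) whose
    # expectation must vanish at the true weight theta.
    import numpy as np
    from scipy.optimize import brentq

    def estimate_weight(h, observed_states, observed_controls, bracket=(1e-3, 10.0)):
        def sample_mean_condition(theta):
            values = [h(theta, X, U) for X, U in zip(observed_states, observed_controls)]
            return float(np.mean(values))    # sample mean in place of the expectation
        # the estimated weight is the root of the averaged condition on the bracket
        return brentq(sample_mean_condition, *bracket)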