
    Approximate Dynamic Programming via Sum of Squares Programming

    We describe an approximate dynamic programming method for stochastic control problems on infinite state and input spaces. The optimal value function is approximated by a linear combination of basis functions with coefficients as decision variables. By relaxing the Bellman equation to an inequality, one obtains a linear program in the basis coefficients with an infinite set of constraints. We show that a recently introduced method, which obtains convex quadratic value function approximations, can be extended to higher-order polynomial approximations via sum of squares programming techniques. An approximate value function can then be computed offline by solving a semidefinite program, without having to sample the infinite constraint set. The policy is evaluated online by solving a polynomial optimization problem, which also turns out to be convex in some cases. We experimentally validate the method on an autonomous helicopter testbed using a 10-dimensional helicopter model. Comment: 7 pages, 5 figures. Submitted to the 2013 European Control Conference, Zurich, Switzerland.
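
    As an illustrative sketch (the symbols below are assumed notation, not taken from the paper): writing the value function approximation as a linear combination of polynomial basis functions, the relaxed Bellman equation gives a linear program whose infinite constraint family can be replaced by a sum-of-squares condition when the costs and dynamics are polynomial.

        V_\alpha(x) = \sum_{i=1}^{k} \alpha_i \phi_i(x), \qquad
        \max_{\alpha}\ \int_{\mathcal{X}} V_\alpha(x)\, c(\mathrm{d}x)
        \quad \text{s.t.} \quad
        \ell(x,u) + \gamma\, \mathbb{E}\big[ V_\alpha(f(x,u,\xi)) \big] - V_\alpha(x) \;\ge\; 0 \quad \forall (x,u)

        \text{SOS strengthening:} \qquad
        \ell(x,u) + \gamma\, \mathbb{E}\big[ V_\alpha(f(x,u,\xi)) \big] - V_\alpha(x) \;\in\; \Sigma[x,u]

    Here Sigma[x,u] denotes the cone of sum-of-squares polynomials; membership is a linear matrix inequality in the coefficients alpha, which is how the infinite constraint set is handled offline without sampling.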

    Performance guarantees for model-based Approximate Dynamic Programming in continuous spaces

    We study both the value function and Q-function formulations of the Linear Programming approach to Approximate Dynamic Programming. The approach is model-based and optimizes over a restricted function space to approximate the value function or Q-function. Working in the discrete-time, continuous-space setting, we provide guarantees for the fitting error and the online performance of the policy. In particular, the online performance guarantee is obtained by analyzing an iterated version of the greedy policy, and the fitting error guarantee by analyzing an iterated version of the Bellman inequality. These guarantees complement the existing bounds that appear in the literature. The Q-function formulation offers benefits, for example in decentralized controller design; however, it can lead to computationally demanding optimization problems. To alleviate this drawback, we provide a condition that simplifies the formulation, resulting in improved computational times. Comment: 18 pages, 5 figures, journal paper.
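
    A hedged sketch of the Q-function variant discussed above, in assumed notation: the Q-function analogue of the Bellman inequality and the associated greedy policy take the form

        Q_\theta(x,u) \;\le\; \ell(x,u) + \gamma\, \mathbb{E}\Big[ \min_{u'} Q_\theta\big(f(x,u,\xi),\, u'\big) \Big]
        \quad \forall (x,u),
        \qquad
        \pi_\theta(x) \in \arg\min_{u} Q_\theta(x,u)

    Any Q_theta feasible for this inequality is a pointwise lower bound on the optimal Q-function, and the iterated version replaces the single application of the Bellman operator with K applications, enlarging the feasible set and hence allowing tighter fits; evaluating the greedy policy from Q_theta requires only a minimization over the input, not a simulation of the dynamics.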

    Accelerated Point-wise Maximum Approach to Approximate Dynamic Programming

    We describe an approximate dynamic programming approach to compute lower bounds on the optimal value function for a discrete-time, continuous-space, infinite-horizon setting. The approach iteratively constructs a family of lower-bounding approximate value functions by using the so-called Bellman inequality. The novelty of our approach is that, at each iteration, we aim to compute an approximate value function that maximizes the point-wise maximum taken with the family of approximate value functions computed thus far. This leads to a non-convex objective, and we propose a gradient ascent algorithm to find stationary points by solving a sequence of convex optimization problems. We provide convergence guarantees for our algorithm and an interpretation of how the gradient computation relates to the state relevance weighting parameter appearing in related approximate dynamic programming approaches. We demonstrate through numerical examples that, compared to existing approaches, the proposed algorithm computes tighter sub-optimality bounds in less computation time. Comment: 14 pages, 3 figures.
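
    A minimal numerical sketch of the point-wise maximum lower bound that such iterations build up (the quadratic form of the individual bounds and the Monte-Carlo weighting are illustrative assumptions, not the paper's construction):

        import numpy as np

        # Family of lower-bounding value functions, here taken quadratic for illustration:
        # V_j(x) = x' P_j x + p_j' x + r_j, each assumed to satisfy the Bellman inequality.
        def pointwise_max(x, family):
            return max(x @ P @ x + p @ x + r for (P, p, r) in family)

        # The quantity the iterations aim to increase: the point-wise maximum integrated
        # against a state-relevance distribution, estimated here by sampling.
        def weighted_objective(family, samples):
            return np.mean([pointwise_max(x, family) for x in samples])

    Maximizing this objective over a new candidate value function is non-convex because of the inner maximum, which is why a gradient-ascent scheme built from a sequence of convex problems is used.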

    Least Squares Policy Iteration with Instrumental Variables vs. Direct Policy Search: Comparison Against Optimal Benchmarks Using Energy Storage

    This paper studies approximate policy iteration (API) methods that use least-squares Bellman error minimization for policy evaluation. We address several of its enhancements, namely Bellman error minimization using instrumental variables, least-squares projected Bellman error minimization, and projected Bellman error minimization using instrumental variables. We prove that, for a general discrete-time stochastic control problem, Bellman error minimization using instrumental variables is equivalent to both variants of projected Bellman error minimization. An alternative to these API methods is direct policy search based on the knowledge gradient. The practical performance of these three approximate dynamic programming methods is then investigated in the context of an application in energy storage, integrated with an intermittent wind energy supply to fully serve a stochastic, time-varying electricity demand. We create a library of test problems using real-world data and apply value iteration to find their optimal policies. These benchmarks are then used to compare the developed policies. Our analysis indicates that API with instrumental-variables Bellman error minimization markedly outperforms API with least-squares Bellman error minimization. However, these approaches underperform our direct policy search implementation. Comment: 37 pages, 9 figures.
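
    The contrast between plain least-squares Bellman error minimization and the instrumental-variables variant can be sketched as follows for a linear architecture with feature matrix Phi (standard textbook forms with the current-state features used as instruments; this is an assumption of the sketch, not code from the paper):

        import numpy as np

        def bellman_error_ls(Phi, Phi_next, r, gamma):
            # Ordinary least squares on the empirical Bellman residual; biased when the
            # next-state features are noisy.
            A = Phi - gamma * Phi_next
            return np.linalg.solve(A.T @ A, A.T @ r)

        def bellman_error_iv(Phi, Phi_next, r, gamma):
            # Instrumental-variables estimate using the current features as instruments;
            # this coincides with the LSTD / projected-Bellman-error fixed point.
            A = Phi - gamma * Phi_next
            return np.linalg.solve(Phi.T @ A, Phi.T @ r)

    The equivalence result described above relates this instrumental-variables estimator to the projected Bellman error variants; the comparison with direct policy search is then an empirical matter on the energy storage benchmarks.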

    Generalized Dual Dynamic Programming for Infinite Horizon Problems in Continuous State and Action Spaces

    We describe a nonlinear generalization of dual dynamic programming theory and its application to value function estimation for deterministic control problems over continuous state and action spaces, in a discrete-time, infinite-horizon setting. We prove, using a Benders-type argument that leverages the monotonicity of the Bellman operator, that the result of a one-stage policy evaluation can be used to produce nonlinear lower bounds on the optimal value function that are valid over the entire state space. These bounds contain terms reflecting the functional form of the system's costs, dynamics, and constraints. We provide an iterative algorithm that produces successively better approximations of the optimal value function, and prove, under certain assumptions, that it achieves an arbitrarily low desired Bellman optimality tolerance at pre-selected points in the state space in a finite number of iterations. We also describe means of certifying the quality of the generated value function. We demonstrate the efficacy of the approach on systems whose dimensions are too large for conventional dynamic programming approaches to be practical. Comment: 12 pages, 2 figures.
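
    The monotonicity argument behind such lower bounds can be summarized as follows (assumed notation; T denotes the Bellman operator of the minimization problem and V* its fixed point):

        V_k \le V^* \;\Rightarrow\; T V_k \le T V^* = V^*,
        \qquad\text{so}\qquad
        V_{k+1}(x) := \max\{\, V_k(x),\ q_{k+1}(x) \,\} \le V^*(x) \ \ \forall x
        \quad \text{whenever } q_{k+1} \le T V_k

    Each iteration therefore only has to produce a function q_{k+1} lying below the one-stage Bellman update of the current approximation; in this generalized setting the q_{k+1} are nonlinear cuts inheriting the structure of the costs, dynamics, and constraints, rather than the affine cuts of classical dual dynamic programming.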

    Risk Sensitive, Nonlinear Optimal Control: Iterative Linear Exponential-Quadratic Optimal Control with Gaussian Noise

    In this contribution, we derive ILEG, an iterative algorithm for finding risk-sensitive solutions to nonlinear, stochastic optimal control problems. The algorithm is based on a linear-quadratic approximation of an exponential, risk-sensitive nonlinear control problem. ILEG makes it possible to find risk-sensitive policies and thus generalizes previous algorithms for nonlinear optimal control based on iterative linear-quadratic methods. Depending on the value of the parameter controlling the risk sensitivity, two different strategies for coping with risk emerge. For positive values of the parameter, the control policy uses high feedback gains, whereas for negative values it uses a robust feedforward control strategy (a robust plan) with low gains. These results are illustrated with a simple example. This note should be considered a preliminary report.
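
    The role of the risk-sensitivity parameter can be seen from the standard exponential-of-cost objective and its small-sigma expansion (generic risk-sensitive control notation, assumed here rather than quoted from the note):

        J_\sigma(\pi) \;=\; \frac{1}{\sigma}\,\log \mathbb{E}\big[\exp\big(\sigma\, J(\pi)\big)\big]
        \;\approx\; \mathbb{E}\big[J(\pi)\big] \;+\; \frac{\sigma}{2}\,\mathrm{Var}\big[J(\pi)\big] \;+\; \cdots

    A positive sigma thus penalizes cost variance, which is consistent with the high-feedback-gain behaviour reported above, while a negative sigma weights the variance with the opposite sign, matching the low-gain, feedforward-dominated plans.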

    An efficient DP algorithm on a tree-structure for finite horizon optimal control problems

    The classical Dynamic Programming (DP) approach to optimal control problems is based on the characterization of the value function as the unique viscosity solution of a Hamilton-Jacobi-Bellman (HJB) equation. The DP scheme for the numerical approximation of viscosity solutions of Bellman equations is typically based on a time discretization that is projected on a fixed state-space grid. The time discretization can be done by a one-step scheme for the dynamics, and the projection on the grid typically uses a local interpolation. Clearly, the use of a grid is a limitation for applications to high-dimensional problems due to the curse of dimensionality. Here, we present a new approach for finite horizon optimal control problems where the value function is computed using a DP algorithm on a tree structure constructed by the time-discrete dynamics (the tree structure algorithm, TSA). In this way there is no need to build a fixed space triangulation or to project on it. The tree guarantees a perfect match with the discrete dynamics and eliminates the cost of space interpolation, allowing for the solution of very high-dimensional problems. Numerical tests show the effectiveness of the proposed method.
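
    A minimal sketch of the tree-based recursion under illustrative assumptions (a two-dimensional linear system, a three-element control set, and a short horizon, none of which come from the paper):

        import numpy as np

        # Illustrative finite-horizon problem: 2-D linear dynamics, 3 admissible controls.
        dt, N = 0.1, 6
        A = np.array([[0.0, 1.0], [-1.0, 0.0]])
        B = np.array([0.0, 1.0])
        controls = [-1.0, 0.0, 1.0]

        step = lambda x, u: x + dt * (A @ x + B * u)          # one-step (Euler) scheme
        stage_cost = lambda x, u: dt * (x @ x + 0.1 * u * u)  # running cost

        # Forward pass: grow the tree level by level from the initial state.
        levels = [[np.array([1.0, 0.0])]]
        for _ in range(N):
            levels.append([step(x, u) for x in levels[-1] for u in controls])

        # Backward pass on the tree: no state-space grid or interpolation is needed,
        # because every child node is exactly a point produced by the discrete dynamics.
        values = [x @ x for x in levels[-1]]                  # terminal cost
        for n in range(N - 1, -1, -1):
            new_values = []
            for i, x in enumerate(levels[n]):
                kids = range(i * len(controls), (i + 1) * len(controls))
                new_values.append(min(stage_cost(x, u) + values[j]
                                      for u, j in zip(controls, kids)))
            values = new_values

        print("approximate cost-to-go at the root:", values[0])

    Without further measures the tree grows exponentially in the horizon, so this sketch is only meant to show the forward construction and the interpolation-free backward pass.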

    Feedback control of parametrized PDEs via model order reduction and dynamic programming principle

    In this paper we investigate infinite horizon optimal control problems for parametrized partial differential equations. We are interested in feedback control via dynamic programming equations, an approach that is well known to suffer from the curse of dimensionality. Thus, we apply parametric model order reduction techniques to construct low-dimensional subspaces with suitable information about the control problem, on which the dynamic programming equations can be approximated. To guarantee a low number of basis functions, we combine recent basis generation methods and parameter partitioning techniques. Furthermore, we present a novel technique for constructing nonuniform grids in the reduced domain, based on statistical information. Finally, we discuss numerical examples to illustrate the effectiveness of the proposed methods for PDEs in two space dimensions.
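
    A sketch of the model-order-reduction step under simple assumptions (a POD-style basis built from random placeholder snapshots and a generic linear semi-discretization; the paper's actual basis generation and parameter partitioning are more involved):

        import numpy as np

        rng = np.random.default_rng(0)

        # Snapshot matrix of the full-order state (placeholder data: 1000 spatial dofs,
        # 40 snapshots that would come from simulations of the parametrized PDE).
        S = rng.standard_normal((1000, 40))

        # POD basis: leading left singular vectors capturing 99.9% of the snapshot energy.
        U, s, _ = np.linalg.svd(S, full_matrices=False)
        ell = int(np.searchsorted(np.cumsum(s**2) / np.sum(s**2), 0.999)) + 1
        V = U[:, :ell]

        # Galerkin projection of a (placeholder) semi-discretized system x' = A x + B u,
        # giving a low-dimensional model on which dynamic programming becomes tractable.
        A = -np.eye(1000)
        Bmat = rng.standard_normal((1000, 1))
        A_r, B_r = V.T @ A @ V, V.T @ Bmat

    The dynamic programming equations are then approximated on a grid in the ell-dimensional reduced coordinates rather than on the full spatial discretization.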

    Semidefinite Relaxations for Stochastic Optimal Control Policies

    Recent results in the study of the Hamilton-Jacobi-Bellman (HJB) equation have led to the discovery of a formulation of the value function as a linear Partial Differential Equation (PDE) for stochastic nonlinear systems with a mild constraint on their disturbances. This has yielded promising directions for research in the planning and control of nonlinear systems. This work proposes a new method for obtaining approximate solutions to these linear stochastic optimal control (SOC) problems. A candidate polynomial with variable coefficients is proposed as the solution to the SOC problem. A Sum of Squares (SOS) relaxation is then applied to the partial differential constraints, leading to a hierarchy of semidefinite relaxations with an improving sub-optimality gap. The resulting approximate solutions are shown to be guaranteed over- and under-approximations of the optimal value function. Comment: Preprint. Accepted to the American Control Conference (ACC) 2014 in Portland, Oregon. 7 pages, color.
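
    The linearity referred to above comes from the standard logarithmic (desirability) transformation, sketched here in generic notation under the usual assumption that noise and control act through the same channels with compatible weights (an assumption of this sketch, only alluded to in the abstract):

        V(x) = -\lambda \log \Psi(x)
        \quad\Longrightarrow\quad
        \tfrac{1}{\lambda}\, q(x)\, \Psi(x) \;=\; f(x)^{\top} \nabla \Psi(x) \;+\; \tfrac{1}{2}\,\mathrm{tr}\!\big( \Sigma(x)\, \nabla^2 \Psi(x) \big)

    Because the transformed equation is linear in Psi, polynomial candidates with unknown coefficients can be substituted and the resulting partial differential constraints relaxed to sum-of-squares conditions, which is what produces the hierarchy of semidefinite programs and the certified over- and under-approximations of the value function.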

    Tropical Kraus maps for optimal control of switched systems

    Kraus maps (completely positive trace-preserving maps) arise classically in quantum information, as they describe the evolution of noncommutative probability measures. We introduce tropical analogues of Kraus maps, obtained by replacing the addition of positive semidefinite matrices by a multivalued supremum with respect to the Löwner order. We show that nonlinear eigenvectors of tropical Kraus maps determine piecewise quadratic approximations of the value functions of switched optimal control problems. This leads to a new approximation method, which we illustrate by two applications: 1) approximating the joint spectral radius, and 2) computing approximate solutions of Hamilton-Jacobi PDEs arising from a class of switched linear quadratic problems studied previously by McEneaney. We report numerical experiments indicating a major improvement in scalability compared with earlier numerical schemes, owing to the "LMI-free" nature of our method. Comment: 15 pages.
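
    For orientation, the classical object and its tropical replacement can be written as follows (generic Kraus-map notation; the precise multivalued supremum used in the paper is only indicated schematically):

        K(\rho) \;=\; \sum_i A_i\, \rho\, A_i^{*}
        \qquad\leadsto\qquad
        K_{\mathrm{trop}}(\rho) \;\in\; \sup\nolimits_{\preceq}\ \big\{\, A_i\, \rho\, A_i^{*} \,\big\}

    Here the supremum is taken with respect to the Löwner order on positive semidefinite matrices; nonlinear eigenvectors of such maps are positive semidefinite matrices whose quadratic forms enter the piecewise quadratic value-function approximations mentioned in the abstract.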