Approximate Dynamic Programming via Sum of Squares Programming
We describe an approximate dynamic programming method for stochastic control
problems on infinite state and input spaces. The optimal value function is
approximated by a linear combination of basis functions with coefficients as
decision variables. By relaxing the Bellman equation to an inequality, one
obtains a linear program in the basis coefficients with an infinite set of
constraints. We show that a recently introduced method, which obtains convex
quadratic value function approximations, can be extended to higher order
polynomial approximations via sum of squares programming techniques. An
approximate value function can then be computed offline by solving a
semidefinite program, without having to sample the infinite constraint. The
policy is evaluated online by solving a polynomial optimization problem, which
also turns out to be convex in some cases. We experimentally validate the
method on an autonomous helicopter testbed using a 10-dimensional helicopter
model.
Comment: 7 pages, 5 figures. Submitted to the 2013 European Control Conference, Zurich, Switzerland.
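The core construction here, relaxing the Bellman equation to an inequality so that the value-function coefficients solve a linear program, is easiest to see on a finite MDP, where the constraint set is finite and no sum-of-squares machinery is needed. The sketch below uses made-up random problem data and a tabular value function; the paper's contribution is handling the infinite constraint set of continuous spaces exactly via SOS.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n_s, n_a, gamma = 6, 3, 0.9
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))  # P[s, a] = next-state distribution
cost = rng.uniform(0.0, 1.0, size=(n_s, n_a))     # stage costs

# Bellman inequality: V(s) <= cost(s, a) + gamma * E[V(s')] for all (s, a).
# Here V is tabular; the paper instead restricts V to a span of polynomial
# basis functions and enforces the constraint exactly via sum of squares.
A_ub, b_ub = [], []
for s in range(n_s):
    for a in range(n_a):
        row = -gamma * P[s, a]
        row[s] += 1.0                  # coefficient of V(s)
        A_ub.append(row)
        b_ub.append(cost[s, a])

# Any feasible V is a pointwise under-estimator of V*, so maximizing a
# weighted sum of its entries gives the tightest bound for those weights.
res = linprog(-np.ones(n_s), A_ub=np.array(A_ub), b_ub=b_ub,
              bounds=[(None, None)] * n_s)
print("lower bound on V*:", np.round(res.x, 3))
```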
Performance guarantees for model-based Approximate Dynamic Programming in continuous spaces
We study both the value function and Q-function formulations of the Linear
Programming approach to Approximate Dynamic Programming. The approach is
model-based and optimizes over a restricted function space to approximate the
value function or Q-function. Working in the discrete time, continuous space
setting, we provide guarantees for the fitting error and online performance of
the policy. In particular, the online performance guarantee is obtained by
analyzing an iterated version of the greedy policy, and the fitting error
guarantee by analyzing an iterated version of the Bellman inequality. These
guarantees complement the existing bounds that appear in the literature. The
Q-function formulation offers benefits, for example, in decentralized
controller design; however, it can lead to computationally demanding
optimization problems. To alleviate this drawback, we provide a condition that
simplifies the formulation, resulting in improved computation times.
Comment: 18 pages, 5 figures, journal paper.
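A minimal sketch of the Q-function variant of the Bellman-inequality LP, again on a finite-MDP stand-in with made-up data: an auxiliary variable V(s), constrained to lie below Q(s, a) for every a, linearizes the minimization over actions, and any feasible Q then under-estimates Q*. This only illustrates the structure; it is not the paper's continuous-space formulation or its simplifying condition.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n_s, n_a, gamma = 5, 2, 0.9
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))
cost = rng.uniform(0.0, 1.0, size=(n_s, n_a))

nQ = n_s * n_a                        # decision variables: [Q (row-major), V]
idxQ = lambda s, a: s * n_a + a
A_ub, b_ub = [], []
for s in range(n_s):
    for a in range(n_a):
        # Q(s,a) - gamma * sum_s' P[s,a,s'] V(s') <= cost(s,a)
        row = np.zeros(nQ + n_s)
        row[idxQ(s, a)] = 1.0
        row[nQ:] = -gamma * P[s, a]
        A_ub.append(row); b_ub.append(cost[s, a])
        # V(s) <= Q(s,a), so that V(s) <= min_a Q(s,a)
        row = np.zeros(nQ + n_s)
        row[nQ + s] = 1.0
        row[idxQ(s, a)] = -1.0
        A_ub.append(row); b_ub.append(0.0)

# Feasibility gives Q <= cost + gamma * P * (min_a Q), hence Q <= Q*
# by monotonicity of the Bellman operator; maximize the sum of Q.
c = np.concatenate([-np.ones(nQ), np.zeros(n_s)])
res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub,
              bounds=[(None, None)] * (nQ + n_s))
print("lower bound on Q*:\n", np.round(res.x[:nQ].reshape(n_s, n_a), 3))
```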
Accelerated Point-wise Maximum Approach to Approximate Dynamic Programming
We describe an approximate dynamic programming approach to compute lower
bounds on the optimal value function for a discrete time, continuous space,
infinite horizon setting. The approach iteratively constructs a family of lower
bounding approximate value functions by using the so-called Bellman inequality.
The novelty of our approach is that, at each iteration, we aim to compute an
approximate value function that maximizes the point-wise maximum taken with the
family of approximate value functions computed thus far. This leads to a
non-convex objective, and we propose a gradient ascent algorithm to find
stationary points by solving a sequence of convex optimization problems. We
provide convergence guarantees for our algorithm and an interpretation for how
the gradient computation relates to the state relevance weighting parameter
appearing in related approximate dynamic programming approaches. We demonstrate
through numerical examples that, when compared to existing approaches, the
algorithm we propose computes tighter sub-optimality bounds with less
computation time.
Comment: 14 pages, 3 figures.
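The family-of-lower-bounds idea can be imitated naively on a finite MDP with a restricted basis, as in the sketch below: each new approximate value function is constrained below the Bellman operator applied to the current point-wise maximum, so by monotonicity the maximum can only tighten. The paper's actual contribution, gradient ascent on the non-convex point-wise maximum objective (equivalently, on the state-relevance weights), is replaced here by randomly resampled weights; all problem data are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
n_s, n_a, gamma = 6, 3, 0.9
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))
cost = rng.uniform(0.0, 1.0, size=(n_s, n_a))

s_grid = np.linspace(0.0, 1.0, n_s)
Phi = np.column_stack([np.ones(n_s), s_grid, s_grid**2])  # restricted basis

def bellman(W):
    """(TW)(s) = min_a [ cost(s, a) + gamma * E[W(s')] ]."""
    return np.min(cost + gamma * P @ W, axis=1)

family = [np.zeros(n_s)]              # valid start: 0 <= V* since costs >= 0
for _ in range(10):
    # If max(family) <= V*, then T(max(family)) <= V*, so any V below it
    # is again a valid lower bound (the iterated Bellman inequality).
    TW = bellman(np.max(family, axis=0))
    c = rng.dirichlet(np.ones(n_s))   # random state-relevance weights
    res = linprog(-(c @ Phi), A_ub=Phi, b_ub=TW,
                  bounds=[(None, None)] * Phi.shape[1])
    family.append(Phi @ res.x)

print("point-wise max of the family:", np.round(np.max(family, axis=0), 3))
```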
Least Squares Policy Iteration with Instrumental Variables vs. Direct Policy Search: Comparison Against Optimal Benchmarks Using Energy Storage
This paper studies approximate policy iteration (API) methods which use
least-squares Bellman error minimization for policy evaluation. We address
several of its enhancements, namely, Bellman error minimization using
instrumental variables, least-squares projected Bellman error minimization, and
projected Bellman error minimization using instrumental variables. We prove
that for a general discrete-time stochastic control problem, Bellman error
minimization using instrumental variables is equivalent to both variants of
projected Bellman error minimization. An alternative to these API methods is
direct policy search based on knowledge gradient. The practical performance of
these three approximate dynamic programming methods is then investigated in
the context of an application in energy storage, integrated with an
intermittent wind energy supply to fully serve a stochastic time-varying
electricity demand. We create a library of test problems using real-world data
and apply value iteration to find their optimal policies. These benchmarks are
then used to compare the developed policies. Our analysis indicates that API
with instrumental-variable Bellman error minimization markedly outperforms
API with least-squares Bellman error minimization. However, these approaches
underperform our direct policy search implementation.
Comment: 37 pages, 9 figures.
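For context, the instrumental-variable idea appears in its most familiar guise as the LSTD estimator for policy evaluation, where the current-state features serve as instruments for the noisy next-state term; the paper's equivalence results concern exactly this family of estimators. The sketch below runs it on a made-up scalar linear-Gaussian system rather than the paper's energy-storage benchmarks.

```python
import numpy as np

rng = np.random.default_rng(3)
gamma, T = 0.95, 20000
phi = lambda s: np.array([1.0, s, s * s])   # value-function features

# Simulate a fixed policy on a toy scalar system with reward -s^2.
s = 0.0
A_mat, b_vec = np.zeros((3, 3)), np.zeros(3)
for _ in range(T):
    s_next = 0.8 * s + 0.1 * rng.standard_normal()
    r = -s * s
    # Instrumental-variable normal equations, with phi(s) as instrument:
    #   A w = b,  A = E[phi(s) (phi(s) - gamma phi(s'))^T],  b = E[phi(s) r]
    A_mat += np.outer(phi(s), phi(s) - gamma * phi(s_next))
    b_vec += phi(s) * r
    s = s_next

w = np.linalg.solve(A_mat, b_vec)
print("fitted value-function weights:", np.round(w, 3))
```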
Generalized Dual Dynamic Programming for Infinite Horizon Problems in Continuous State and Action Spaces
We describe a nonlinear generalization of dual dynamic programming theory and
its application to value function estimation for deterministic control problems
over continuous state and action spaces, in a discrete-time infinite horizon
setting. We prove, using a Benders-type argument leveraging the monotonicity of
the Bellman operator, that the result of a one-stage policy evaluation can be
used to produce nonlinear lower bounds on the optimal value function that are
valid over the entire state space. These bounds contain terms reflecting the
functional form of the system's costs, dynamics, and constraints. We provide an
iterative algorithm that produces successively better approximations of the
optimal value function, and prove under certain assumptions that it achieves an
arbitrarily low desired Bellman optimality tolerance at pre-selected points in
the state space, in a finite number of iterations. We also describe means of
certifying the quality of the value function generated. We demonstrate the
efficacy of the approach on systems whose dimensions are too large for
conventional dynamic programming approaches to be practical.
Comment: 12 pages, 2 figures.
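The cut-generation mechanism is easiest to see in its classical affine form, sketched below on a made-up scalar linear-quadratic problem: a one-stage evaluation at a trial point yields a value and a subgradient, hence an affine lower bound that is valid over the whole state space. The paper's generalization produces nonlinear cuts reflecting the system's costs, dynamics, and constraints; none of that structure is reproduced here.

```python
import numpy as np
from scipy.optimize import minimize_scalar

a, b, q, r, gamma = 1.1, 1.0, 1.0, 0.5, 0.95   # illustrative scalar problem
rng = np.random.default_rng(4)
cuts = [(0.0, 0.0)]        # V_lb(x) = max_k alpha_k + beta_k * x; 0 <= V* is valid

def V_lb(x):
    return max(al + be * x for al, be in cuts)

for _ in range(30):
    x_hat = rng.uniform(-2.0, 2.0)             # trial point
    stage = lambda u: q * x_hat**2 + r * u**2 + gamma * V_lb(a * x_hat + b * u)
    u_star = minimize_scalar(stage, bounds=(-10.0, 10.0), method="bounded").x
    # Envelope (Danskin) argument: with u fixed at u_star, the slope of the
    # one-stage value in x uses the cut active at the next state.
    x_next = a * x_hat + b * u_star
    be_act = max(cuts, key=lambda c: c[0] + c[1] * x_next)[1]
    g = 2.0 * q * x_hat + gamma * a * be_act
    val = stage(u_star)
    cuts.append((val - g * x_hat, g))          # cut: val + g*(x - x_hat) <= V*(x)

print("lower bound at x = 1:", round(V_lb(1.0), 4))
```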
Risk Sensitive, Nonlinear Optimal Control: Iterative Linear Exponential-Quadratic Optimal Control with Gaussian Noise
In this contribution, we derive ILEG, an iterative algorithm for finding
risk-sensitive solutions to nonlinear, stochastic optimal control problems.
The algorithm is based on a linear-quadratic approximation of an exponential
risk-sensitive nonlinear control problem. ILEG makes it possible to find
risk-sensitive policies and thus generalizes previous iterative
linear-quadratic algorithms for nonlinear optimal control. Depending on the
setting of the parameter controlling the risk sensitivity, two different
strategies for coping with risk emerge. For positive parameter values, the
control policy uses high feedback gains, whereas for negative values it uses a
robust feedforward control strategy (a robust plan) with low gains. These
results are illustrated with a simple example. This note should be considered
a preliminary report.
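ILEG itself iterates local approximations along trajectories, but its backbone is the classical linear exponential-quadratic (risk-sensitive) Riccati recursion of Jacobson and Whittle, sketched below on arbitrary illustrative matrices: compared with standard LQR, the cost-to-go matrix is "twisted" by the noise covariance scaled by the risk parameter, which is what produces high-gain (risk-averse) versus low-gain (risk-seeking) behavior.

```python
import numpy as np

# Illustrative discrete-time double-integrator-like system.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q, R = np.eye(2), np.array([[0.1]])
W = 0.05 * np.eye(2)     # process noise covariance
theta = 0.2              # risk parameter: > 0 risk-averse, < 0 risk-seeking

S = Q.copy()
for _ in range(200):     # backward Riccati sweep
    # Risk-sensitive twist: S_tilde = (S^{-1} - theta*W)^{-1}; this requires
    # I - theta*W*S to remain invertible (theta too large: no solution).
    S_tilde = S @ np.linalg.inv(np.eye(2) - theta * W @ S)
    K = np.linalg.solve(R + B.T @ S_tilde @ B, B.T @ S_tilde @ A)
    S = Q + A.T @ S_tilde @ (A - B @ K)

print("risk-sensitive gain K:", np.round(K, 3))
```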
An efficient DP algorithm on a tree-structure for finite horizon optimal control problems
The classical Dynamic Programming (DP) approach to optimal control problems
is based on the characterization of the value function as the unique viscosity
solution of a Hamilton-Jacobi-Bellman (HJB) equation. The DP scheme for the
numerical approximation of viscosity solutions of Bellman equations is
typically based on a time discretization which is projected on a fixed
state-space grid. The time discretization can be done by a one-step scheme for
the dynamics and the projection on the grid typically uses a local
interpolation. Clearly, the use of a grid is a limitation for possible
applications to high-dimensional problems due to the curse of dimensionality.
Here, we present a new approach for finite horizon optimal control problems
where the value function is computed using a DP algorithm on a tree structure
(the tree structure algorithm, TSA) constructed from the discrete-time
dynamics. In this way there is no need to build a fixed space triangulation
and to project on it. The tree guarantees a perfect match with the discrete
dynamics and drops the cost of space interpolation, allowing for the solution
of very high-dimensional problems. Numerical tests show the effectiveness of
the proposed method.
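A minimal sketch of the tree idea, with made-up dynamics and costs: discretize the control set, let each node spawn one child per control value via the discrete dynamics, and run exact backward DP over the tree, so no state-space grid or interpolation ever appears. Without the paper's pruning of nearly coincident nodes the tree grows like |U|^N, so this brute-force version is only viable for short horizons.

```python
import numpy as np

f = lambda x, u, dt: x + dt * np.array([x[1], u])   # Euler step, 2-D state
L = lambda x, u: x @ x + 0.1 * u * u                # running cost
controls, dt, N = [-1.0, 0.0, 1.0], 0.1, 8

# Forward sweep: build the tree level by level from the initial state.
levels = [[np.array([1.0, 0.0])]]
for _ in range(N):
    levels.append([f(x, u, dt) for x in levels[-1] for u in controls])

# Backward sweep: V(leaf) = terminal cost, then one exact minimization per
# node over its children; the values live on trajectories of the dynamics.
V = [x @ x for x in levels[-1]]
for k in range(N - 1, -1, -1):
    V = [min(dt * L(x, u) + V[i * len(controls) + j]
             for j, u in enumerate(controls))
         for i, x in enumerate(levels[k])]

print("value at the root:", round(V[0], 4))
```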
Feedback control of parametrized PDEs via model order reduction and dynamic programming principle
In this paper we investigate infinite horizon optimal control problems for
parametrized partial differential equations. We are interested in feedback
control via dynamic programming equations, an approach well known to suffer
from the curse of dimensionality. Thus, we apply parametric model order reduction
techniques to construct low-dimensional subspaces with suitable information on
the control problem, where the dynamic programming equations can be
approximated. To guarantee a low number of basis functions, we combine recent
basis generation methods and parameter partitioning techniques. Furthermore, we
present a novel technique to construct nonuniform grids in the reduced domain,
which is based on statistical information. Finally, we discuss numerical
examples to illustrate the effectiveness of the proposed methods for PDEs in
two space dimensions.
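The model-order-reduction step can be illustrated with plain proper orthogonal decomposition (POD), the generic ancestor of the parametric basis-generation methods the paper combines: collect snapshots, take a truncated SVD, and project the dynamics onto the leading modes, so the dynamic programming equation is then solved in the low-dimensional reduced coordinates. The operator below is a random placeholder, not a discretized PDE from the paper.

```python
import numpy as np

rng = np.random.default_rng(5)
n, n_snap, tol = 400, 60, 1e-4
A_full = -np.eye(n) + 0.01 * rng.standard_normal((n, n))  # full-order operator

# Snapshot matrix: one trajectory of the full model under Euler stepping.
Y = np.empty((n, n_snap))
x = rng.standard_normal(n)
for j in range(n_snap):
    x = x + 0.01 * (A_full @ x)
    Y[:, j] = x

# POD: truncated SVD of the snapshots, rank chosen by the energy criterion.
U, s, _ = np.linalg.svd(Y, full_matrices=False)
energy = np.cumsum(s**2) / np.sum(s**2)
r = int(np.searchsorted(energy, 1.0 - tol)) + 1
V_basis = U[:, :r]

# Galerkin projection: reduced dynamics act on the coordinates V_basis.T @ x.
A_red = V_basis.T @ A_full @ V_basis
print(f"reduced model dimension: {r} (from {n})")
```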
Semidefinite Relaxations for Stochastic Optimal Control Policies
Recent results in the study of the Hamilton-Jacobi-Bellman (HJB) equation
have led to the discovery of a formulation of the value function as a linear
Partial Differential Equation (PDE) for stochastic nonlinear systems with a
mild constraint on their disturbances. This has yielded promising directions
for research in the planning and control of nonlinear systems. This work
proposes a new method for obtaining approximate solutions to these linear
stochastic optimal control (SOC) problems. A candidate polynomial with variable
coefficients is proposed as the solution to the SOC problem. A Sum of Squares
(SOS) relaxation is then applied to the partial differential constraints, leading
to a hierarchy of semidefinite relaxations with improving sub-optimality gap.
The resulting approximate solutions are shown to be guaranteed over- and
under-approximations for the optimal value function.
Comment: Preprint. Accepted to the American Control Conference (ACC) 2014 in Portland, Oregon. 7 pages, color.
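The "value function as a linear PDE" structure has a well-known discrete analogue, a linearly solvable first-exit problem in the sense of Todorov, which shows the mechanism in a few lines: the exponentiated negative value function, the desirability z = exp(-V), satisfies a linear fixed-point equation and can be found by simple iteration. The sketch below is only that analogue with made-up data; it is not the paper's SOS relaxation of the continuous linear HJB.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 8                                        # states; state n-1 is the goal
q = rng.uniform(0.5, 1.0, size=n); q[-1] = 0.0   # state costs, zero at goal
P = rng.dirichlet(np.ones(n), size=n)            # passive dynamics
P[-1] = 0.0; P[-1, -1] = 1.0                     # goal is absorbing

# Linear fixed point: z(s) = exp(-q(s)) * sum_s' P[s, s'] z(s').
z = np.ones(n)
for _ in range(500):
    z = np.exp(-q) * (P @ z)
    z[-1] = 1.0                              # pin the boundary condition
V = -np.log(z)
print("value function:", np.round(V, 3))
```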
Tropical Kraus maps for optimal control of switched systems
Kraus maps (completely positive trace preserving maps) arise classically in
quantum information, as they describe the evolution of noncommutative
probability measures. We introduce tropical analogues of Kraus maps, obtained
by replacing the addition of positive semidefinite matrices by a multivalued
supremum with respect to the Löwner order. We show that non-linear
eigenvectors of tropical Kraus maps determine piecewise quadratic
approximations of the value functions of switched optimal control problems.
This leads to a new approximation method, which we illustrate by two
applications: 1) approximating the joint spectral radius, 2) computing
approximate solutions of Hamilton-Jacobi PDEs arising from a class of switched
linear quadratic problems studied previously by McEneaney. We report numerical
experiments indicating a major improvement in scalability compared with
earlier numerical schemes, owing to the "LMI-free" nature of our method.
Comment: 15 pages.
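To make "piecewise quadratic value functions" concrete, below is a brute-force baseline for an autonomous switched linear system in which the switching sequence is the control: finite-horizon value iteration propagates a set of quadratic forms, V_k(x) = min over the set of x'Px, and any matrix that dominates another in the Löwner order can be discarded. All data are illustrative, and the paper's tropical Kraus machinery is designed precisely to avoid this exhaustive enumeration.

```python
import numpy as np

A = [np.array([[0.9, 0.2], [0.0, 0.8]]),   # mode dynamics (illustrative)
     np.array([[0.8, 0.0], [0.3, 0.9]])]
Q = [np.eye(2), np.diag([2.0, 0.5])]       # per-mode stage costs

def loewner_prune(mats, tol=1e-9):
    """Keep only matrices needed for min_P x'Px: if P1 >= P2 in the
    Loewner order, then x'P1x >= x'P2x everywhere and P1 is redundant."""
    dominates = lambda P1, P2: np.all(np.linalg.eigvalsh(P1 - P2) >= -tol)
    keep = []
    for P1 in mats:
        if not any(dominates(P1, P2) for P2 in keep):
            keep = [P2 for P2 in keep if not dominates(P2, P1)]
            keep.append(P1)
    return keep

# Value iteration over sets of quadratics: V_k(x) = min_P x'Px.
quads = [np.zeros((2, 2))]
for _ in range(6):
    quads = loewner_prune([Qi + Ai.T @ P @ Ai
                           for Ai, Qi in zip(A, Q) for P in quads])

x = np.array([1.0, -1.0])
print(f"{len(quads)} quadratics kept; V(x) =",
      round(min(x @ P @ x for P in quads), 4))
```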