Approximate Policy Iteration Schemes: A Comparison
We consider the infinite-horizon discounted optimal control problem
formalized by Markov Decision Processes. We focus on several approximate
variations of the Policy Iteration algorithm: Approximate Policy Iteration (API),
Conservative Policy Iteration (CPI), a natural adaptation of the Policy Search
by Dynamic Programming algorithm to the infinite-horizon case (PSDP),
and the recently proposed Non-Stationary Policy Iteration (NSPI(m)). For all
algorithms, we describe performance bounds and compare them, paying
particular attention to the concentrability constants involved, the number of
iterations, and the memory required. Our analysis highlights the following
points: 1) The performance guarantee of CPI can be arbitrarily better than that
of API/API(λ), but this comes at the cost of a relative increase in the number of
iterations that is exponential in 1/ε. 2) PSDP
enjoys the best of both worlds: its performance guarantee is similar to that of
CPI, but within a number of iterations similar to that of API. 3) Contrary to
API, which requires constant memory, CPI and PSDP need memory
proportional to their number of iterations, which may be problematic when
the discount factor γ is close to 1 or the approximation error ε
is close to 0; we show that the NSPI(m) algorithm allows one to make
an overall trade-off between memory and performance. Simulations with these
schemes confirm our analysis.
Comment: ICML (2014)
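To make the memory contrast concrete, here is a minimal Python/NumPy sketch of the API and CPI update rules on a tabular MDP, under loudly stated assumptions: policy evaluation is exact (so there is no approximation error ε at all), CPI uses a fixed mixing step `alpha` rather than the data-dependent step size of the original CPI, and the function names `evaluate`, `greedy`, `api`, `cpi`, the array layouts `P[a, s, s']` and `R[s, a]`, and the toy MDP are all hypothetical. API keeps only the latest greedy policy, whereas the CPI sketch keeps every greedy policy together with its mixture weight, so its storage grows by one policy per iteration.

```python
import numpy as np


def evaluate(P, R, gamma, pi):
    """Exact value of a stochastic policy pi[s, a]; no approximation error here."""
    P_pi = np.einsum("sa,ast->st", pi, P)   # state-to-state transitions under pi
    r_pi = np.sum(pi * R, axis=1)           # expected one-step reward under pi
    return np.linalg.solve(np.eye(len(r_pi)) - gamma * P_pi, r_pi)


def greedy(P, R, gamma, v):
    """Deterministic greedy policy (one-hot rows) with respect to the values v."""
    q = R + gamma * np.einsum("ast,t->sa", P, v)   # q[s, a]
    pi = np.zeros_like(R, dtype=float)
    pi[np.arange(R.shape[0]), q.argmax(axis=1)] = 1.0
    return pi


def api(P, R, gamma, iters):
    """Policy-iteration loop: only the latest policy is stored (constant memory)."""
    pi = np.full(R.shape, 1.0 / R.shape[1])    # uniform initial policy
    for _ in range(iters):
        pi = greedy(P, R, gamma, evaluate(P, R, gamma, pi))
    return pi


def cpi(P, R, gamma, iters, alpha=0.1):
    """Conservative updates pi <- (1 - alpha) * pi + alpha * greedy(pi).

    The fixed alpha is a simplifying assumption. Every greedy policy is kept
    with its weight, so memory grows by one policy per iteration.
    """
    components = [(1.0, np.full(R.shape, 1.0 / R.shape[1]))]
    for _ in range(iters):
        mix = sum(w * p for w, p in components)   # current mixture policy
        g = greedy(P, R, gamma, evaluate(P, R, gamma, mix))
        components = [(w * (1.0 - alpha), p) for w, p in components] + [(alpha, g)]
    return components


# Hypothetical 2-state, 2-action MDP: action 0 stays, action 1 switches state;
# staying in state 1 pays reward 1, everything else pays 0.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 0.0], [1.0, 0.0]])
print(api(P, R, 0.9, 5))         # switch in state 0, stay in state 1
print(len(cpi(P, R, 0.9, 20)))   # 21 stored policies
```

In the tabular case the CPI mixture could be collapsed into a single stochastic matrix; the list of weighted components is kept on purpose, to mirror the setting where each policy is a separately stored approximator.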
Hermite matrix in Lagrange basis for scaling static output feedback polynomial matrix inequalities
Using Hermite's formulation of polynomial stability conditions, static output
feedback (SOF) controller design can be formulated as a polynomial matrix
inequality (PMI), a (generally nonconvex) nonlinear semidefinite programming
problem that can be solved (locally) with PENNON, an implementation of a
penalty method. Typically, Hermite SOF PMI problems are badly scaled and
experiments reveal that this has a negative impact on the overall performance
of the solver. In this note we recall the algebraic interpretation of Hermite's
quadratic form as a particular Bezoutian and we use results on polynomial
interpolation to express the Hermite PMI in a Lagrange polynomial basis, as an
alternative to the conventional power basis. Numerical experiments on benchmark
problem instances show that the approach substantially improves problem
scaling, the number of iterations, and the convergence behavior of
PENNON.
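As an illustration of how the change of basis works, here is a minimal NumPy sketch assuming one common form of the Hermite criterion: a real polynomial p of degree n is Hurwitz stable if and only if the n×n matrix H defined by (p(s)p(t) − p(−s)p(−t))/(s + t) = u(s)ᵀ H u(t), with u(s) = (1, s, …, s^(n−1)), is positive definite. Writing u(s) = Vᵀ ℓ(s), where ℓ collects the Lagrange polynomials at distinct nodes x_1, …, x_n and V[i, k] = x_i^k, gives H_L = V H Vᵀ, whose (i, j) entry is simply the quotient above evaluated at (x_i, x_j). Since V is nonsingular, H_L is positive definite exactly when H is. The node choice (positive, so that x_i + x_j never vanishes), the Hermite normalization, and the names `hermite_lagrange`, `hermite_power`, `is_hurwitz` are hypothetical and not taken from the note, which may use different nodes and conventions.

```python
import numpy as np
from numpy.polynomial import polynomial as npoly


def hermite_lagrange(p, nodes):
    """Hermite matrix of p in the Lagrange basis at `nodes` (hypothetical helper).

    p: real coefficients in increasing powers (p[k] multiplies s**k).
    nodes: deg(p) distinct positive reals, so x_i + x_j never vanishes.
    Entry (i, j) is q(x_i, x_j) with q(s, t) = (p(s)p(t) - p(-s)p(-t)) / (s + t).
    """
    x = np.asarray(nodes, dtype=float)
    if len(x) != len(p) - 1:
        raise ValueError("need exactly deg(p) nodes")
    p_pos = npoly.polyval(x, p)    # p(x_i)
    p_neg = npoly.polyval(-x, p)   # p(-x_i)
    return (np.outer(p_pos, p_pos) - np.outer(p_neg, p_neg)) / np.add.outer(x, x)


def hermite_power(p, nodes):
    """Recover the power-basis Hermite matrix H from H_L = V H V^T."""
    x = np.asarray(nodes, dtype=float)
    V = np.vander(x, N=len(x), increasing=True)   # V[i, k] = x_i**k
    V_inv = np.linalg.inv(V)
    return V_inv @ hermite_lagrange(p, x) @ V_inv.T


def is_hurwitz(p, nodes):
    # Congruence by a nonsingular V preserves definiteness, so H_L suffices.
    return bool(np.all(np.linalg.eigvalsh(hermite_lagrange(p, nodes)) > 0))


# s^2 + 3s + 2 = (s + 1)(s + 2) is stable; its power-basis H is diag(12, 6).
print(hermite_power([2.0, 3.0, 1.0], [1.0, 2.0]))
print(is_hurwitz([2.0, 3.0, 1.0], [1.0, 2.0]))    # True
print(is_hurwitz([2.0, -1.0, 1.0], [1.0, 2.0]))   # False
```

For a degree-2 polynomial s² + bs + c this construction gives H = diag(2bc, 2b), so positive definiteness reduces to b > 0 and c > 0, the familiar second-order stability condition.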