    Approximate Policy Iteration Schemes: A Comparison

    We consider the infinite-horizon discounted optimal control problem formalized by Markov Decision Processes. We focus on several approximate variations of the Policy Iteration algorithm: Approximate Policy Iteration (API), Conservative Policy Iteration (CPI), a natural adaptation of the Policy Search by Dynamic Programming algorithm to the infinite-horizon case (PSDP$_\infty$), and the recently proposed Non-Stationary Policy Iteration (NSPI(m)). For all algorithms, we describe performance bounds and compare them, paying particular attention to the concentrability constants involved, the number of iterations, and the memory required. Our analysis highlights the following points: 1) The performance guarantee of CPI can be arbitrarily better than that of API/API($\alpha$), but this comes at the cost of a relative increase in the number of iterations that is exponential in $\frac{1}{\epsilon}$. 2) PSDP$_\infty$ enjoys the best of both worlds: its performance guarantee is similar to that of CPI, but within a number of iterations similar to that of API. 3) Contrary to API, which requires constant memory, the memory needed by CPI and PSDP$_\infty$ is proportional to their number of iterations, which may be problematic when the discount factor $\gamma$ is close to 1 or the approximation error $\epsilon$ is close to 0; we show that the NSPI(m) algorithm allows an overall trade-off between memory and performance. Simulations with these schemes confirm our analysis. Comment: ICML (2014)
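
The difference between a plain policy-iteration step and a conservative one can be sketched on a tabular MDP. The snippet below is only an illustration under strong assumptions, not the algorithms analyzed in the paper: policy evaluation is exact (so there is no approximation error $\epsilon$), the step size `alpha` is fixed rather than chosen adaptively as in CPI, and the toy MDP, array layouts, and function names are all hypothetical. With `alpha=1.0` each iteration jumps to the greedy policy; with `alpha<1.0` it mixes the current policy with the greedy one, in the spirit of API($\alpha$).

```python
import numpy as np

def evaluate(P, R, policy, gamma):
    """Exact value of a stochastic policy: solve (I - gamma * P_pi) v = r_pi."""
    n_states = R.shape[0]
    P_pi = np.einsum("sa,ast->st", policy, P)  # P is indexed [action, state, next_state]
    r_pi = np.einsum("sa,sa->s", policy, R)    # R is indexed [state, action]
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

def greedy(P, R, v, gamma):
    """Deterministic policy (one-hot rows) that is greedy with respect to v."""
    q = R + gamma * np.einsum("ast,t->sa", P, v)
    policy = np.zeros_like(R)
    policy[np.arange(R.shape[0]), q.argmax(axis=1)] = 1.0
    return policy

def policy_iteration(P, R, gamma, alpha=1.0, n_iter=50):
    """alpha = 1: plain greedy step; alpha < 1: conservative mixture step."""
    n_states, n_actions = R.shape
    policy = np.full((n_states, n_actions), 1.0 / n_actions)
    for _ in range(n_iter):
        v = evaluate(P, R, policy, gamma)
        policy = (1.0 - alpha) * policy + alpha * greedy(P, R, v, gamma)
    return policy, evaluate(P, R, policy, gamma)

# Hypothetical 2-state MDP: action 0 stays, action 1 switches state;
# the only reward is 1 for staying in state 1.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 0.0], [1.0, 0.0]])

pi_api, v_api = policy_iteration(P, R, gamma=0.9, alpha=1.0)
pi_cons, v_cons = policy_iteration(P, R, gamma=0.9, alpha=0.5)
```

In this toy setting both variants reach the same values, but the conservative variant only approaches the greedy policy geometrically. This gives some intuition for why conservative schemes may need more iterations.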

    Hermite matrix in Lagrange basis for scaling static output feedback polynomial matrix inequalities

    Using Hermite's formulation of polynomial stability conditions, static output feedback (SOF) controller design can be formulated as a polynomial matrix inequality (PMI), a (generally nonconvex) nonlinear semidefinite programming problem that can be solved (locally) with PENNON, an implementation of a penalty method. Typically, Hermite SOF PMI problems are badly scaled, and experiments reveal that this has a negative impact on the overall performance of the solver. In this note we recall the algebraic interpretation of Hermite's quadratic form as a particular Bezoutian, and we use results on polynomial interpolation to express the Hermite PMI in a Lagrange polynomial basis, as an alternative to the conventional power basis. Numerical experiments on benchmark problem instances show the substantial improvement brought by the approach in terms of problem scaling, number of iterations, and convergence behavior of PENNON.
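
A small numerical sketch shows how the choice of polynomial basis can affect the scaling of a quadratic form. It does not build the Hermite matrix or the PMI from the note. As a stand-in, it uses the Gram matrix of the monomials on [0, 1] (the Hilbert matrix) and re-expresses it in a Lagrange basis by a congruence transformation with the Vandermonde matrix. The choice of Gauss-Legendre nodes and all variable names are assumptions made for this illustration.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

n = 4

# Gram matrix of 1, x, ..., x^(n-1) on [0, 1]: H[i, j] = 1 / (i + j + 1).
i = np.arange(n)
H = 1.0 / (i[:, None] + i[None, :] + 1.0)

# Gauss-Legendre nodes and weights, mapped from [-1, 1] to [0, 1].
t, w = leggauss(n)
x, w = (t + 1.0) / 2.0, w / 2.0

# V maps power-basis coefficients c to nodal values y = V c, which are the
# coordinates of the same polynomial in the Lagrange basis at the nodes x.
V = np.vander(x, n, increasing=True)
V_inv = np.linalg.inv(V)

# Same quadratic form in Lagrange coordinates: c^T H c = y^T G y.
G = V_inv.T @ H @ V_inv

cond_power = np.linalg.cond(H)
cond_lagrange = np.linalg.cond(G)
print(f"power basis: {cond_power:.3e}, Lagrange basis: {cond_lagrange:.3e}")
```

With these particular nodes, `G` is the diagonal matrix of quadrature weights up to rounding, because Gauss quadrature integrates the products of Lagrange polynomials exactly. It is therefore far better conditioned than `H`. How much a change of basis helps for the Hermite PMI itself depends on the problem and on the interpolation nodes.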