Generalized Policy Iteration for Optimal Control in Continuous Time
This paper proposes the Deep Generalized Policy Iteration (DGPI) algorithm to
find the infinite horizon optimal control policy for general nonlinear
continuous-time systems with known dynamics. Unlike existing adaptive dynamic
programming algorithms for continuous-time systems, DGPI requires neither an
admissible initial policy nor an input-affine system
for convergence. Our algorithm employs the actor-critic architecture to
approximate both the policy and the value function in order to iteratively
solve the Hamilton-Jacobi-Bellman equation. Both the policy and value
functions are approximated by deep neural networks. Given an arbitrary initial
policy, the proposed DGPI algorithm eventually converges to an admissible
policy, and subsequently to an optimal policy, for an arbitrary nonlinear system. We also
relax the update termination conditions of both the policy evaluation and
improvement processes, which leads to faster convergence than
conventional policy iteration (PI) methods with the same function
approximator architecture. We further prove the convergence and optimality of the
algorithm with a thorough Lyapunov analysis, and demonstrate its generality and
efficacy using two detailed numerical examples.
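
The relaxed evaluation and improvement steps described above can be illustrated on a toy problem. The following is a minimal sketch and not the paper's DGPI implementation. It assumes a scalar linear system dx/dt = a*x + b*u with running cost q*x^2 + r*u^2, a quadratic critic V(x) = p*x^2 and a linear actor u = -k*x in place of deep networks. The function names, step sizes and system constants are hypothetical. The system is chosen open-loop stable so that the zero initial policy is admissible, which means the sketch does not exercise the inadmissible-initial-policy case that DGPI is claimed to handle. Each iteration takes a single gradient step on the squared Hamiltonian residual (policy evaluation) and a single gradient step on the Hamiltonian (policy improvement), rather than running either process to convergence.

```python
import math


def gpi_scalar_lq(a=-1.0, b=1.0, q=1.0, r=1.0,
                  alpha=0.1, beta=0.1, iters=500):
    """Generalized policy iteration on a scalar LQ problem (illustrative only).

    Critic: V(x) = p * x**2.  Actor: u = -k * x.
    With these forms the Hamiltonian is
        H(x) = x**2 * (q + r*k**2 + 2*p*(a - b*k)),
    so the common factor x**2 is dropped and only the coefficient is used.
    """
    p, k = 0.0, 0.0  # assumed initial critic and actor parameters
    for _ in range(iters):
        closed_loop = a - b * k
        # Hamiltonian coefficient; it is zero when p evaluates policy k exactly.
        h = q + r * k**2 + 2.0 * p * closed_loop
        # Relaxed policy evaluation: one gradient step on 0.5 * h**2 w.r.t. p.
        p -= alpha * h * 2.0 * closed_loop
        # Relaxed policy improvement: one gradient step on h w.r.t. k.
        k -= beta * (2.0 * r * k - 2.0 * b * p)
    return p, k


def riccati_solution(a, b, q, r):
    """Closed-form stabilizing solution of q + 2*a*p - (b*p)**2 / r = 0."""
    p = r * (a + math.sqrt(a**2 + b**2 * q / r)) / b**2
    return p, b * p / r


if __name__ == "__main__":
    p, k = gpi_scalar_lq()
    p_star, k_star = riccati_solution(-1.0, 1.0, 1.0, 1.0)
    print(f"learned p={p:.6f}, k={k:.6f}")
    print(f"optimal p={p_star:.6f}, k={k_star:.6f}")
```

In this toy setting the Hamilton-Jacobi-Bellman equation reduces to a scalar Riccati equation, so the learned parameters can be checked against the closed-form solution.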