Accelerated Policy Gradient: On the Nesterov Momentum for Reinforcement Learning

Abstract

Policy gradient methods have recently been shown to enjoy global convergence at a $\Theta(1/t)$ rate in the non-regularized tabular softmax setting. Accordingly, one important research question is whether this convergence rate can be further improved using only first-order updates. In this paper, we answer the above question from the perspective of momentum by adapting the celebrated Nesterov's accelerated gradient (NAG) method to reinforcement learning (RL), termed \textit{Accelerated Policy Gradient} (APG). To demonstrate the potential of APG in achieving faster global convergence, we formally show that, with access to the true gradient, APG with softmax policy parametrization converges to an optimal policy at a $\tilde{O}(1/t^2)$ rate. To the best of our knowledge, this is the first characterization of the global convergence rate of NAG in the context of RL. Notably, our analysis relies on one interesting finding: regardless of the initialization, APG reaches a locally nearly-concave regime within finitely many iterations, where it can benefit significantly from the momentum. Through numerical validation, we confirm that APG exhibits the $\tilde{O}(1/t^2)$ rate and show that APG can significantly improve the convergence behavior over standard policy gradient.

Comment: 51 pages, 8 figures
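To make the idea concrete, the following is a minimal sketch (not the paper's exact APG algorithm) of a Nesterov-style momentum update applied to tabular softmax policy gradient with the true gradient, illustrated on a hypothetical single-state bandit. The reward vector, step size, and momentum schedule are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

# Hypothetical single-state bandit: one reward per action (assumed for illustration).
rewards = np.array([1.0, 0.8, 0.2])

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def objective(theta):
    # Expected reward J(theta) = sum_a pi_theta(a) * r(a).
    return softmax(theta) @ rewards

def true_gradient(theta):
    # Exact gradient for tabular softmax: dJ/dtheta_a = pi(a) * (r(a) - J(theta)).
    pi = softmax(theta)
    return pi * (rewards - pi @ rewards)

eta = 0.5                      # step size (illustrative choice)
theta = np.zeros(3)            # uniform initial policy
theta_prev = theta.copy()

for t in range(1, 201):
    # Nesterov lookahead point, then a gradient-ascent step evaluated there.
    phi = theta + (t - 1) / (t + 2) * (theta - theta_prev)
    theta_prev = theta
    theta = phi + eta * true_gradient(phi)

print("final policy:", softmax(theta))                    # mass concentrates on the best action
print("suboptimality:", rewards.max() - objective(theta)) # shrinks much faster than vanilla PG
```

Comparing the suboptimality curve of this momentum scheme against plain gradient ascent on the same objective is a simple way to visualize the accelerated decay the paper analyzes; the full MDP setting and the paper's formal APG update are more involved than this sketch.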
