Policy gradient methods have recently been shown to enjoy global convergence
at a $\Theta(1/t)$ rate in the non-regularized tabular softmax setting.
Accordingly, one important research question is whether this convergence rate
can be further improved using only first-order updates. In this paper, we
answer this question from the perspective of momentum by adapting Nesterov's
celebrated accelerated gradient (NAG) method to reinforcement
learning (RL), termed \textit{Accelerated Policy Gradient} (APG). To
demonstrate the potential of APG in achieving faster global convergence, we
formally show that with the true gradient, APG with softmax policy
parametrization converges to an optimal policy at an $\tilde{O}(1/t^2)$ rate. To
the best of our knowledge, this is the first characterization of the global
convergence rate of NAG in the context of RL. Notably, our analysis relies on
one interesting finding: regardless of the initialization, APG ends up entering a
locally nearly concave regime within finitely many iterations, where it can benefit
significantly from the momentum. Through numerical validation,
we confirm that APG exhibits an $\tilde{O}(1/t^2)$ rate and show that APG
could significantly improve the convergence behavior over the standard policy
gradient.
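
For concreteness, the following is a minimal sketch of how a NAG-style update can be applied to a softmax policy with the true gradient, here in the simplest one-state (bandit) case. The reward vector, step size, and momentum schedule below are illustrative assumptions, not the paper's exact APG configuration.

\begin{verbatim}
import numpy as np

# Minimal sketch of a NAG-style accelerated policy gradient update
# on a one-state (bandit) softmax policy with the exact gradient.
# The reward vector r, step size eta, and momentum weight t/(t+3)
# are illustrative assumptions, not the paper's exact configuration.

def softmax(theta):
    z = np.exp(theta - theta.max())   # shift for numerical stability
    return z / z.sum()

def exact_grad(theta, r):
    # For J(theta) = sum_a pi_a(theta) * r_a under softmax,
    # dJ/dtheta_a = pi_a * (r_a - pi . r).
    pi = softmax(theta)
    return pi * (r - pi @ r)

r = np.array([1.0, 0.8, 0.5])   # assumed reward vector
eta = 0.1                       # assumed step size
theta = np.zeros_like(r)        # policy parameters
omega = theta.copy()            # lookahead (momentum) point

for t in range(1, 2001):
    theta_next = omega + eta * exact_grad(omega, r)          # ascent at lookahead
    omega = theta_next + t / (t + 3) * (theta_next - theta)  # Nesterov extrapolation
    theta = theta_next

pi = softmax(theta)
print(pi, pi @ r)  # probability mass should concentrate on the best action
\end{verbatim}

The key structural difference from standard policy gradient is that the gradient is evaluated at the extrapolated point rather than the current iterate, which is what the momentum-based acceleration relies on.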