On the Global Convergence Rates of Softmax Policy Gradient Methods
Authors
Jincheng Mei
Dale Schuurmans
Csaba Szepesvari
Chenjun Xiao
Publication date
4 September 2020
Abstract
We make three contributions toward better understanding policy gradient methods in the tabular setting. First, we show that with the true gradient, policy gradient with a softmax parametrization converges at an $O(1/t)$ rate, with constants depending on the problem and initialization. This result significantly expands the recent asymptotic convergence results. The analysis relies on two findings: that the softmax policy gradient satisfies a Łojasiewicz inequality, and that the minimum probability of an optimal action during optimization can be bounded in terms of its initial value. Second, we analyze entropy regularized policy gradient and show that it enjoys a significantly faster linear convergence rate $O(e^{-t})$ toward the softmax optimal policy. This result resolves an open question in the recent literature. Finally, combining the above two results and additional new $\Omega(1/t)$ lower bound results, we explain how entropy regularization improves policy optimization, even with the true gradient, from the perspective of convergence rate. The separation of rates is further explained using the notion of non-uniform Łojasiewicz degree. These results provide a theoretical understanding of the impact of entropy and corroborate existing empirical studies.
Comment: 64 pages, 5 figures. Published in ICML 202
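The two rates described in the abstract can be observed numerically on a toy problem. The following is a minimal sketch, not the authors' code. It assumes a one-state problem (a three-armed bandit) with a hypothetical reward vector, a step size of 0.4, and a temperature `tau` of 0.5, none of which come from the abstract. It runs exact-gradient ascent on softmax logits, with and without an entropy bonus. In the regularized case the gap is measured against the maximum of the regularized objective, attained by the policy proportional to `exp(r / tau)`, not against the best unregularized reward.

```python
import numpy as np


def softmax(theta):
    """Numerically stable softmax over a 1-D array of logits."""
    z = theta - theta.max()
    e = np.exp(z)
    return e / e.sum()


def run_policy_gradient(r, tau=0.0, eta=0.4, steps=2000):
    """Exact-gradient softmax policy gradient on a one-state problem.

    r     : reward vector (illustrative assumption)
    tau   : entropy temperature; 0 means no regularization
    eta   : step size (illustrative assumption)
    Returns the sub-optimality gap of the (regularized) objective,
    recorded before each update.
    """
    theta = np.zeros_like(r)  # uniform initial policy
    if tau > 0:
        # Maximum of pi.r + tau * H(pi) over the simplex.
        best = tau * np.log(np.sum(np.exp(r / tau)))
    else:
        best = r.max()
    gaps = []
    for _ in range(steps):
        pi = softmax(theta)
        # Entropy-adjusted reward; reduces to r when tau == 0.
        g = r - tau * np.log(pi) if tau > 0 else r
        value = pi @ g  # equals pi.r + tau * H(pi)
        gaps.append(best - value)
        # Gradient of the objective with respect to the logits.
        theta = theta + eta * pi * (g - value)
    return np.array(gaps)


r = np.array([1.0, 0.9, 0.1])
plain = run_policy_gradient(r)
regularized = run_policy_gradient(r, tau=0.5)
for t in (10, 100, 1000):
    print(t, plain[t], regularized[t])
```

Plotting `plain` on log-log axes and `regularized` on semi-log axes is one way to check whether the curves look polynomial and geometric respectively. A single toy run only illustrates the claimed rates and does not establish them.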
Available Versions
arXiv.org e-Print Archive
oai:arXiv.org:2005.06392
Last time updated on 14/05/2020