A Lyapunov Theory for Finite-Sample Guarantees of Asynchronous Q-Learning and TD-Learning Variants
This paper develops a unified framework to study finite-sample convergence
guarantees of a large class of value-based asynchronous Reinforcement Learning
(RL) algorithms. We do this by first reformulating the RL algorithms as
Markovian Stochastic Approximation (SA) algorithms to solve fixed-point
equations. We then develop a Lyapunov analysis and derive mean-square error
bounds on the convergence of the Markovian SA. Based on this central result, we
establish finite-sample mean-square convergence bounds for asynchronous RL
algorithms such as $Q$-learning, $n$-step TD, TD$(\lambda)$, and off-policy TD
algorithms including V-trace. As a by-product, by analyzing the performance
bounds of the TD$(\lambda)$ (and $n$-step TD) algorithm for general $\lambda$
(and $n$), we demonstrate a bias-variance trade-off, i.e., efficiency of
bootstrapping in RL. This was first posed as an open problem in [37].
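To make the stochastic-approximation viewpoint concrete, below is a minimal sketch (not the paper's code) of two of the analyzed algorithms written as asynchronous SA iterates: only the visited coordinate is updated at each step, and the noise is Markovian because samples come from a single trajectory. The random MDP, step sizes, exploration rate, and trace parameter are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small random MDP (illustrative): nS states, nA actions,
# transition kernel P, deterministic rewards R.
nS, nA, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # P[s, a] is a distribution over s'
R = rng.uniform(0, 1, size=(nS, nA))

def q_learning(T=50_000, alpha=0.1, eps=0.2):
    """Asynchronous Q-learning: an SA iterate tracking the fixed point of
    the Bellman optimality operator; only the visited (s, a) is updated."""
    Q = np.zeros((nS, nA))
    s = 0
    for _ in range(T):
        a = rng.integers(nA) if rng.random() < eps else int(Q[s].argmax())
        s2 = rng.choice(nS, p=P[s, a])
        # Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
        Q[s, a] += alpha * (R[s, a] + gamma * Q[s2].max() - Q[s, a])
        s = s2
    return Q

def td_lambda(policy, T=50_000, alpha=0.05, lam=0.8):
    """TD(lambda) with accumulating eligibility traces: lam interpolates
    between TD(0) (heavy bootstrapping) and Monte Carlo (lam -> 1)."""
    V, z = np.zeros(nS), np.zeros(nS)
    s = 0
    for _ in range(T):
        a = rng.choice(nA, p=policy[s])
        s2 = rng.choice(nS, p=P[s, a])
        delta = R[s, a] + gamma * V[s2] - V[s]  # TD error
        z = gamma * lam * z
        z[s] += 1.0                             # accumulating trace
        V += alpha * delta * z
        s = s2
    return V

uniform = np.full((nS, nA), 1.0 / nA)
print(q_learning()[0], td_lambda(uniform)[:3])
```

Varying lam (or the number of bootstrap steps $n$ in $n$-step TD) is the knob behind the bias-variance trade-off mentioned above: small lam bootstraps aggressively (lower variance, more bias), while lam near 1 approaches Monte Carlo returns.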