Exponential Concentration of Stochastic Approximation with Non-vanishing Gradient
We analyze the behavior of stochastic approximation algorithms where
iterates, in expectation, make progress towards an objective at each step. When
progress is proportional to the step size of the algorithm, we prove
exponential concentration bounds. These tail-bounds contrast asymptotic
normality results which are more frequently associated with stochastic
approximation. The methods that we develop rely on a geometric ergodicity
proof. This extends a result on Markov chains due to Hajek (1982) to the area
of stochastic approximation algorithms. For Projected Stochastic Gradient
Descent with a non-vanishing gradient, our results can be used to prove
linear convergence rates.
Comment: 20 pages, 6 figures
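As a concrete illustration of the setting, the following minimal simulation (a hypothetical example, not from the paper) runs projected SGD on f(x) = |x - 2| over the interval [-1, 1]. The gradient is -1 everywhere on the feasible set, so it never vanishes, expected progress per step is proportional to the step size, and the final iterates cluster tightly at the boundary optimum x* = 1.

```python
import numpy as np

rng = np.random.default_rng(0)

def projected_sgd(x0, alpha, n_steps, noise_std=0.5):
    """Projected SGD for f(x) = |x - 2| over [-1, 1].

    On the feasible set the gradient is the constant -1 (non-vanishing),
    so the expected progress toward the boundary optimum x* = 1 is
    proportional to the step size alpha.
    """
    x = x0
    for _ in range(n_steps):
        grad = -1.0 + noise_std * rng.standard_normal()  # noisy gradient
        x = x - alpha * grad                             # gradient step
        x = min(max(x, -1.0), 1.0)                       # project onto [-1, 1]
    return x

# Final iterates from independent runs concentrate near x* = 1; the paper's
# results describe exponential tail bounds for this kind of concentration.
finals = np.array([projected_sgd(-1.0, alpha=0.05, n_steps=2000)
                   for _ in range(200)])
```

Plotting a histogram of `finals` shows the mass piled against the boundary at 1, rather than a Gaussian bell around an interior point.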
Stability of Q-Learning Through Design and Optimism
Q-learning has become an important part of the reinforcement learning toolkit
since its introduction in the dissertation of Chris Watkins in the 1980s. The
purpose of this paper is in part a tutorial on stochastic approximation and
Q-learning, providing details regarding the INFORMS APS inaugural Applied
Probability Trust Plenary Lecture, presented in Nancy, France, in June 2023.
The paper also presents new approaches to ensure stability and potentially
accelerated convergence for these algorithms, and stochastic approximation in
other settings. Two contributions are entirely new:
1. Stability of Q-learning with linear function approximation has been an
open topic for research for over three decades. It is shown that with
appropriate optimistic training in the form of a modified Gibbs policy, there
exists a solution to the projected Bellman equation, and the algorithm is
stable (in terms of bounded parameter estimates). Convergence remains one of
many open topics for research.
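To make the objects in point 1 concrete, here is a hedged sketch of Q-learning with linear function approximation driven by a softmax (Gibbs) exploration policy, on a tiny made-up MDP. The MDP, features, and plain softmax policy are all illustrative assumptions; the paper's "modified Gibbs policy" and its stability argument differ in the details.

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny 2-state, 2-action MDP (hypothetical example, not from the paper).
n_states, n_actions = 2, 2
P = np.array([[[0.9, 0.1], [0.1, 0.9]],   # P[s, a, s'] transition probabilities
              [[0.5, 0.5], [0.8, 0.2]]])
R = np.array([[1.0, 0.0],                 # R[s, a] rewards in [0, 1]
              [0.0, 0.5]])

def phi(s, a):
    """One-hot features: the tabular case, viewed as linear function approximation."""
    f = np.zeros(n_states * n_actions)
    f[s * n_actions + a] = 1.0
    return f

def gibbs_policy(theta, s, beta):
    """Softmax ('Gibbs') policy over estimated Q-values; larger beta is greedier."""
    q = np.array([phi(s, a) @ theta for a in range(n_actions)])
    p = np.exp(beta * (q - q.max()))
    return p / p.sum()

theta = np.zeros(n_states * n_actions)    # linear Q-function parameters
gamma, alpha, beta = 0.9, 0.1, 2.0
s = 0
for _ in range(20000):
    a = rng.choice(n_actions, p=gibbs_policy(theta, s, beta))
    s_next = rng.choice(n_states, p=P[s, a])
    q_next = max(phi(s_next, b) @ theta for b in range(n_actions))
    td = R[s, a] + gamma * q_next - phi(s, a) @ theta   # temporal-difference error
    theta += alpha * td * phi(s, a)                      # Q-learning update
    s = s_next
```

With bounded rewards the tabular iterates stay bounded here; the open question the paper addresses is what happens when `phi` is a genuine (non-tabular) linear approximation.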
2. The new Zap Zero algorithm is designed to approximate the Newton-Raphson
flow without matrix inversion. It is stable and convergent under mild
assumptions on the mean flow vector field for the algorithm, and compatible
statistical assumption on an underlying Markov chain. The algorithm is a
general approach to stochastic approximation which in particular applies to
Q-learning with "oblivious" training even with non-linear function
approximation.
Comment: Companion paper to the INFORMS APS inaugural Applied Probability
Trust Plenary Lecture, presented in Nancy, France, in June 2023. Slides
available online, DOI 10.13140/RG.2.2.24897.3312
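The inversion-free idea behind point 2 can be sketched on a toy root-finding problem. This is a simplified two-timescale caricature, not the paper's Zap Zero recursion: to follow the Newton-Raphson flow d/dt theta = -A(theta)^{-1} f(theta) without inverting the Jacobian A, a fast auxiliary recursion tracks the solution g of A g = -f, and the slow recursion moves theta along g. The linear vector field and step sizes below are illustrative assumptions.

```python
import numpy as np

# Toy linear vector field f(theta) = A (theta - theta*), root at theta* = (1, -1).
A = np.array([[3.0, 1.0],
              [0.0, 2.0]])
theta_star = np.array([1.0, -1.0])

def f(theta):
    return A @ (theta - theta_star)

theta = np.zeros(2)   # slow iterate
g = np.zeros(2)       # fast iterate, tracks -A^{-1} f(theta)
alpha, beta = 0.05, 0.5
for _ in range(2000):
    # Fast timescale: fixed-point iteration for A g = -f(theta),
    # avoiding any explicit matrix inversion.
    g = g - beta * (f(theta) + A @ g)
    # Slow timescale: follow the (approximate) Newton-Raphson flow.
    theta = theta + alpha * g
```

Because the fast recursion only ever multiplies by A, the same template extends to the stochastic setting where A and f are replaced by noisy estimates, which is the regime the Zap Zero algorithm targets.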