Provably Convergent Two-Timescale Off-Policy Actor-Critic with Function Approximation
We present the first provably convergent two-timescale off-policy
actor-critic algorithm (COF-PAC) with function approximation. Key to COF-PAC is
the introduction of a new critic, the emphasis critic, which is trained via
Gradient Emphasis Learning (GEM), a novel combination of the key ideas of
Gradient Temporal Difference Learning and Emphatic Temporal Difference
Learning. With the help of the emphasis critic and the canonical value function
critic, we show convergence for COF-PAC, where the critics are linear and the
actor can be nonlinear.
Comment: ICML 2020
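The abstract only names the pieces of the algorithm, so a concrete sketch may help. Below is a minimal, self-contained Python loop in the spirit of a two-timescale off-policy actor-critic: a linear value critic and a linear emphasis-style critic are updated on the fast timescale, and an emphasis-weighted policy-gradient step runs on the slow timescale. The toy MDP, the simplified update rules, and every variable name here are illustrative assumptions, not the paper's exact COF-PAC or GEM updates.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, n_features = 5, 2, 4
phi = rng.normal(size=(n_states, n_features))                      # state features
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))   # P[s, a] is a next-state distribution
R = rng.normal(size=(n_states, n_actions))                         # expected rewards
gamma = 0.9

w_v = np.zeros(n_features)                # value-critic weights (fast timescale)
w_m = np.zeros(n_features)                # emphasis-critic weights (fast timescale)
theta = np.zeros((n_states, n_actions))   # tabular softmax actor (slow timescale)
mu = np.full(n_actions, 1.0 / n_actions)  # fixed uniform behavior policy

def pi(s, theta):
    """Softmax target policy over actions in state s."""
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

s = 0
for t in range(10_000):
    alpha = 0.05 / (1 + t) ** 0.6   # faster step size for the critics
    beta = 0.01 / (1 + t) ** 0.9    # slower step size for the actor
    a = rng.choice(n_actions, p=mu)
    rho = pi(s, theta)[a] / mu[a]   # importance-sampling ratio
    s_next = rng.choice(n_states, p=P[s, a])
    r = R[s, a]

    # Fast timescale: off-policy linear TD(0) value critic (a simplified
    # stand-in for the paper's gradient-TD-style critic).
    delta = r + gamma * phi[s_next] @ w_v - phi[s] @ w_v
    w_v += alpha * rho * delta * phi[s]

    # Fast timescale: crude linear estimate of the emphasis / follow-on
    # weighting (a stand-in for Gradient Emphasis Learning, GEM).
    m_target = 1.0 + gamma * rho * phi[s] @ w_m
    w_m += alpha * (m_target - phi[s_next] @ w_m) * phi[s_next]

    # Slow timescale: emphasis-weighted off-policy policy-gradient step.
    m_hat = max(float(phi[s] @ w_m), 0.0)
    grad_log = -pi(s, theta)
    grad_log[a] += 1.0                       # grad of log softmax at action a
    theta[s] += beta * m_hat * rho * delta * grad_log

    s = s_next

print("value-critic weights:", w_v)
```

The two-timescale structure shows up in the step sizes: the critic rate alpha decays more slowly than the actor rate beta, so from the actor's viewpoint the critics have effectively converged by the time it moves, which is the standard route to convergence arguments for such schemes.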
Two-Timescale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal Difference Learning
We present the first asymptotic convergence analysis of two-timescale
stochastic approximation driven by 'controlled' Markov noise. In
particular, both the faster and slower recursions have non-additive controlled
Markov noise components in addition to martingale difference noise. We analyze
the asymptotic behavior of our framework by relating it to limiting
differential inclusions on both timescales, defined in terms of the
ergodic occupation measures associated with the controlled Markov processes.
Finally, we present a solution to the off-policy convergence problem for
temporal difference learning with linear function approximation, using our
results.
Comment: 23 pages (relaxed some important assumptions from the previous version), accepted in Mathematics of Operations Research in Feb 2017
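For a concrete instance of the off-policy result mentioned at the end, here is a hedged sketch of a gradient-TD-style evaluation scheme with linear function approximation (a TDC-like update), which is itself a two-timescale stochastic approximation. The toy MDP, the policies, and the step-size schedules are assumptions for illustration; the paper's analysis covers the far more general setting with controlled Markov noise in both recursions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions, n_features = 6, 2, 3
phi = rng.normal(size=(n_states, n_features))                      # state features
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))   # transition kernel
R = rng.normal(size=(n_states, n_actions))                         # expected rewards
gamma = 0.95

target = rng.dirichlet(np.ones(n_actions), size=n_states)   # target policy to evaluate
behavior = np.full((n_states, n_actions), 1.0 / n_actions)  # uniform behavior policy

w = np.zeros(n_features)   # slow iterate: value-function weights
u = np.zeros(n_features)   # fast iterate: auxiliary weights

s = 0
for t in range(50_000):
    alpha = 0.5 / (1 + t)           # slower step size for w
    beta = 1.0 / (1 + t) ** 0.6     # faster step size for u
    a = rng.choice(n_actions, p=behavior[s])
    rho = target[s, a] / behavior[s, a]   # importance-sampling ratio
    s_next = rng.choice(n_states, p=P[s, a])
    delta = R[s, a] + gamma * phi[s_next] @ w - phi[s] @ w   # TD error

    # Fast timescale: u tracks a least-squares estimate of the expected
    # TD error given the current features.
    u += beta * (rho * delta - phi[s] @ u) * phi[s]

    # Slow timescale: TDC-style corrected gradient step on the value weights.
    w += alpha * rho * (delta * phi[s] - gamma * (phi[s] @ u) * phi[s_next])

    s = s_next

print("learned value weights:", w)
```

Here u is the fast iterate and w the slow one; the correction term involving u is what restores convergence under off-policy sampling, where plain TD(0) with linear function approximation can diverge.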