Two Timescale Stochastic Approximation with Controlled Markov noise and Off-policy temporal difference learning

Authors: Bhatnagar Shalabh; Karmakar Prasenjit
Publication date: 25/02/2017

We present for the first time an asymptotic convergence analysis of two time-scale stochastic approximation driven by 'controlled' Markov noise. In particular, both the faster and slower recursions have non-additive controlled Markov noise components in addition to martingale difference noise. We analyze the asymptotic behavior of our framework by relating it to limiting differential inclusions in both time-scales that are defined in terms of the ergodic occupation measures associated with the controlled Markov processes. Finally, we present a solution to the off-policy convergence problem for temporal difference learning with linear function approximation, using our results.

Comment: 23 pages (relaxed some important assumptions from the previous version), accepted in Mathematics of Operations Research in Feb, 201

arXiv.org e-Print Archive

Open Access Repository of IISc Research Publications

Bellman Error Based Feature Generation using Random Projections on Sparse Spaces

Authors: Farahmand Amir-massoud; Fard Mahdi Milani; Grinberg Yuri; Pineau Joelle; Precup Doina
Publication date: 21/09/2012

We address the problem of automatic generation of features for value function approximation. Bellman Error Basis Functions (BEBFs) have been shown to improve the error of policy evaluation with function approximation, with a convergence rate similar to that of value iteration. We propose a simple, fast and robust algorithm based on random projections to generate BEBFs for sparse feature spaces. We provide a finite sample analysis of the proposed method, and prove that projections logarithmic in the dimension of the original space are enough to guarantee contraction in the error. Empirical results demonstrate the strength of this method.

arXiv.org e-Print Archive

CiteSeerX