Two Timescale Stochastic Approximation with Controlled Markov noise and Off-policy temporal difference learning
We present for the first time an asymptotic convergence analysis of two
time-scale stochastic approximation driven by 'controlled' Markov noise. In
particular, both the faster and slower recursions have non-additive controlled
Markov noise components in addition to martingale difference noise. We analyze
the asymptotic behavior of our framework by relating it to limiting
differential inclusions in both time-scales that are defined in terms of the
ergodic occupation measures associated with the controlled Markov processes.
Finally, we present a solution to the off-policy convergence problem for
temporal difference learning with linear function approximation, using our
results.
Comment: 23 pages (relaxed some important assumptions from the previous
version), accepted in Mathematics of Operations Research in Feb, 201
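As a rough illustration of what a two time-scale recursion looks like, the sketch below couples a fast iterate and a slow iterate with different step-size schedules. It is a toy under stated assumptions, not the framework analyzed in the paper: the drift functions, the step-size exponents, the function name `two_timescale`, and the two-state Markov chain `z` are all hypothetical, and the Markov noise here is uncontrolled and additive, whereas the abstract treats controlled, non-additive noise.

```python
import random


def two_timescale(n_steps=20000, seed=0):
    """Toy two time-scale stochastic approximation (illustrative assumptions only).

    Fast iterate y tracks the slow iterate x; slow iterate x is driven toward 0.
    """
    rng = random.Random(seed)
    x, y = 1.0, 0.0   # slow and fast iterates
    z = 1             # two-state Markov chain on {-1, +1}, stationary mean 0
    for n in range(n_steps):
        a = 1.0 / (n + 1)          # slow step size
        b = 1.0 / (n + 1) ** 0.6   # fast step size, so a / b -> 0
        if rng.random() < 0.3:     # the chain flips state with probability 0.3
            z = -z
        m_fast = rng.gauss(0.0, 0.1)  # martingale difference noise (fast)
        m_slow = rng.gauss(0.0, 0.1)  # martingale difference noise (slow)
        # Fast recursion: for a quasi-static x, y is pulled toward x.
        y_new = y + b * ((x - y) + m_fast)
        # Slow recursion: drift -y plus additive Markov noise 0.5 * z.
        x_new = x + a * (-y + 0.5 * z + m_slow)
        x, y = x_new, y_new
    return x, y


x_final, y_final = two_timescale()
print(x_final, y_final)
```

Because the fast step size dominates the slow one, y sees x as nearly constant and settles near it, while x sees y as already equilibrated; in this toy both iterates end up close to zero.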
Bellman Error Based Feature Generation using Random Projections on Sparse Spaces
We address the problem of automatic generation of features for value function
approximation. Bellman Error Basis Functions (BEBFs) have been shown to improve
the error of policy evaluation with function approximation, with a convergence
rate similar to that of value iteration. We propose a simple, fast and robust
algorithm based on random projections to generate BEBFs for sparse feature
spaces. We provide a finite sample analysis of the proposed method, and prove
that projections logarithmic in the dimension of the original space are enough
to guarantee contraction in the error. Empirical results demonstrate the
strength of this method.
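The projection step alone can be sketched as follows. This is a minimal illustration of a Gaussian random projection applied to a sparse feature vector, assuming a dictionary representation and a hypothetical helper `project_sparse`; it does not reproduce the paper's algorithm, which additionally fits the Bellman error in the projected space to build each new feature.

```python
import math
import random


def project_sparse(sparse_vec, d, seed=0):
    """Project a sparse vector {index: value} onto d dimensions.

    Uses a Gaussian random matrix with N(0, 1/d) entries. Only the columns
    that match nonzero indices are generated, each from its own seeded
    generator, so the full matrix is never stored.
    """
    out = [0.0] * d
    scale = 1.0 / math.sqrt(d)
    for j, v in sparse_vec.items():
        col_rng = random.Random(seed * 1_000_003 + j)  # column j, regenerated on demand
        for i in range(d):
            out[i] += v * scale * col_rng.gauss(0.0, 1.0)
    return out


def sq_norm(values):
    """Squared Euclidean norm of an iterable of numbers."""
    return sum(v * v for v in values)


rng = random.Random(1)
D = 100_000                                        # original (sparse) dimension
phi = {j: 1.0 for j in rng.sample(range(D), 20)}   # 20 active binary features
z = project_sparse(phi, d=200)
ratio = sq_norm(z) / sq_norm(phi.values())
print(len(z), ratio)
```

The squared norm of the projected vector stays close to that of the original even though 200 is far smaller than 100,000, which is the kind of distance preservation that makes a low-dimensional projection usable for regression on sparse features.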