Q-learning with Nearest Neighbors
We consider model-free reinforcement learning for infinite-horizon discounted
Markov Decision Processes (MDPs) with a continuous state space and unknown
transition kernel, when only a single sample path under an arbitrary policy of
the system is available. We consider the Nearest Neighbor Q-Learning (NNQL)
algorithm to learn the optimal Q function using a nearest neighbor regression
method. As the main contribution, we provide a tight finite-sample analysis of
the convergence rate. In particular, for MDPs with a $d$-dimensional state
space and discount factor $\gamma \in (0,1)$, given an arbitrary sample
path with "covering time" $L$, we establish that the algorithm is guaranteed
to output an $\epsilon$-accurate estimate of the optimal Q-function using
$\tilde{O}(L/(\epsilon^3(1-\gamma)^7))$ samples. For instance, for a
well-behaved MDP, the covering time of the sample path under the purely random
policy scales as $\tilde{O}(1/\epsilon^d)$, so the sample
complexity scales as $\tilde{O}(1/\epsilon^{d+3})$. Indeed, we
establish a lower bound showing that a dependence of $\tilde{\Omega}(1/\epsilon^{d+2})$ is necessary.
Comment: Accepted to NIPS 2018
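As a rough illustration of the idea only (not the authors' exact NNQL procedure, which uses carefully scheduled step sizes and a covering of the state space), the sketch below runs one-nearest-neighbor Q-learning over a fixed set of anchor states on a toy continuous chain; the anchor grid, learning rate, and environment are all illustrative assumptions.

```python
import numpy as np

class NearestNeighborQ:
    """Q-learning over a fixed set of anchor states: each continuous state
    is mapped to its nearest anchor (1-nearest-neighbor regression)."""

    def __init__(self, anchors, n_actions, gamma=0.9, lr=0.5):
        self.anchors = np.asarray(anchors, dtype=float)   # (m, d) anchors
        self.Q = np.zeros((len(self.anchors), n_actions))
        self.gamma, self.lr = gamma, lr

    def _nearest(self, s):
        # index of the anchor closest to the continuous state s
        return int(np.argmin(np.linalg.norm(self.anchors - s, axis=1)))

    def value(self, s):
        return self.Q[self._nearest(s)].max()

    def update(self, s, a, r, s_next):
        # one asynchronous Q-learning step, applied at the nearest anchor of s
        i = self._nearest(s)
        target = r + self.gamma * self.value(s_next)
        self.Q[i, a] += self.lr * (target - self.Q[i, a])

# Toy usage: a random walk on [0, 1] observed under a purely random policy.
rng = np.random.default_rng(0)
agent = NearestNeighborQ(np.linspace(0, 1, 21).reshape(-1, 1), n_actions=2)
s = np.array([0.5])
for _ in range(5000):
    a = int(rng.integers(2))                         # arbitrary behavior policy
    step = 0.05 if a == 1 else -0.05
    s_next = np.clip(s + step + rng.normal(0, 0.02), 0.0, 1.0)
    r = float(s_next[0] > 0.9)                       # reward near the right end
    agent.update(s, a, r, s_next)
    s = s_next
print("estimated value near the goal:", agent.value(np.array([0.95])))
```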
Non-parametric Approximate Dynamic Programming via the Kernel Method
This paper presents a novel and practical non-parametric approximate dynamic programming (ADP) algorithm that enjoys graceful, dimension-independent approximation and sample complexity guarantees. In particular, we establish both theoretically and computationally that our proposal can serve as a viable replacement for state-of-the-art parametric ADP algorithms, freeing the designer from carefully specifying an approximation architecture. We accomplish this by ‘kernelizing’ a recent mathematical program for ADP (the ‘smoothed’ approximate LP) proposed by Desai et al. (2011). Our theoretical guarantees establish that the quality of the approximation produced by our procedure improves gracefully with sampling effort. Via a computational study on a controlled queueing network, we show that our non-parametric procedure outperforms state-of-the-art parametric ADP approaches and established heuristics.
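For intuition only, here is a minimal sketch of the kernelization idea under stated assumptions: the value function is a kernel expansion over sampled states, the Bellman inequalities are imposed at sampled transitions, and violations are absorbed by penalized slacks. The paper's smoothed ALP budgets slack under a state-relevance distribution; a slack penalty and a Gaussian kernel are substituted here to keep the example to one LP solve, and the bandwidth, penalty kappa, weight box C, and toy chain are all assumptions rather than the paper's formulation.

```python
# Toy "kernelized smoothed ALP" sketch: solve for kernel weights w so that
# V(x) = sum_i w_i k(x_i, x) approximately satisfies V(x) >= r + gamma*V(x')
# at sampled transitions, with penalized slacks smoothing the constraints.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
gamma = 0.9

def rbf(A, B, h=0.5):
    # Gaussian kernel matrix between the rows of A and the rows of B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * h * h))

# Sampled transitions (x_j, r_j, x'_j) from a 1-d toy chain under some policy.
n = 40
X = rng.uniform(0.0, 1.0, size=(n, 1))                     # sampled states
Xn = np.clip(X + rng.normal(0.0, 0.1, size=(n, 1)), 0, 1)  # next states
R = -np.abs(X - 0.5).ravel()                               # reward: stay near 0.5

# Decision variables are (w, s) with slacks s >= 0.
K, Kn = rbf(X, X), rbf(Xn, X)
kappa, C = 10.0, 50.0                  # slack penalty and box bound on weights

# minimize sum_j V(x_j) + kappa * sum_j s_j
c = np.concatenate([K.sum(axis=0), kappa * np.ones(n)])
# subject to V(x_j) >= r_j + gamma * V(x'_j) - s_j, rewritten as A_ub x <= b_ub
A_ub = np.hstack([-(K - gamma * Kn), -np.eye(n)])
b_ub = -R
bounds = [(-C, C)] * n + [(0, None)] * n

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
w = res.x[:n]
print("approximate values at the first sampled states:", np.round(K @ w, 3)[:5])
```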