Empirical Dynamic Programming
We propose empirical dynamic programming algorithms for Markov decision
processes (MDPs). In these algorithms, the exact expectation in the Bellman
operator in classical value iteration is replaced by an empirical estimate to
get 'empirical value iteration' (EVI). Policy evaluation and policy improvement
in classical policy iteration are also replaced by simulation to get 'empirical
policy iteration' (EPI). Thus, these empirical dynamic programming algorithms
involve iteration of a random operator, the empirical Bellman operator. We
introduce notions of probabilistic fixed points for such random monotone
operators. We develop a stochastic dominance framework for convergence analysis
of such operators. We use this framework to give sample complexity bounds for
both EVI and EPI. We then provide several variations and extensions:
asynchronous empirical dynamic programming, a minimax empirical dynamic
program, and an application to the dynamic newsvendor problem. Preliminary
experimental results suggest a faster rate of convergence than stochastic
approximation algorithms.

Comment: 34 pages, 1 figure