
An adaptive sampling algorithm for solving Markov decision processes

By Hyeong Soo Chang, Michael C. Fu, Jiaqiao Hu and Steven I. Marcus


© 2005 INFORMS. doi 10.1287/opre.1040.0145

Based on recent results for multiarmed bandit problems, we propose an adaptive sampling algorithm that approximates the optimal value of a finite-horizon Markov decision process (MDP) with finite state and action spaces. The algorithm adaptively chooses which action to sample as the sampling process proceeds and generates an asymptotically unbiased estimator, whose bias is bounded by a quantity that converges to zero at rate $(\ln N)/N$, where $N$ is the total number of samples that are used per state sampled in each stage. The worst-case running-time complexity of the algorithm is $O((|A|N)^H)$, independent of the size of the state space, where $|A|$ is the size of the action space and $H$ is the horizon length. The algorithm can be used to create an approximate receding horizon control to solve infinite-horizon MDPs. To illustrate the algorithm, computational results are reported on simple examples from inventory control.

Subject classifications: dynamic programming/optimal control: Markov finite state
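The abstract describes the sampling loop only at a high level. Below is a minimal Python sketch under stated assumptions, not the authors' code. It uses the UCB1 index as the bandit rule (an assumption about the exact form of the index), a hypothetical generative simulator `simulate(state, action, rng)` returning `(reward, next_state)`, one-stage rewards in `[0, r_max]` (used only to scale the exploration bonus), and a value estimate that weights each action's Q-estimate by its share of the N samples. The names `ams_estimate`, `receding_horizon_action` and `r_max` are hypothetical. Each call spends N samples, and each sample recurses one stage deeper, giving N^H simulator calls. Each index evaluation scans |A| actions, which is consistent with the $O((|A|N)^H)$ worst-case bound and with the cost not depending on the size of the state space.

```python
import math
import random
from typing import Callable, Dict, Hashable, Sequence, Tuple

# Hypothetical simulator interface: (state, action, rng) -> (reward, next_state).
Simulator = Callable[[Hashable, Hashable, random.Random], Tuple[float, Hashable]]


def ams_estimate(state, stage: int, horizon: int, actions: Sequence[Hashable],
                 simulate: Simulator, n_samples: int, r_max: float,
                 rng: random.Random) -> Tuple[float, Dict[Hashable, float]]:
    """Estimate the optimal value of `state` at `stage` by adaptive sampling.

    Returns (value_estimate, q_estimates). Assumes one-stage rewards lie in
    [0, r_max]; r_max only scales the exploration bonus.
    """
    if stage == horizon:  # terminal stage: no reward left to collect
        return 0.0, {}
    if n_samples < len(actions):
        raise ValueError("n_samples must be at least the number of actions")
    counts = {a: 0 for a in actions}
    totals = {a: 0.0 for a in actions}

    def sample(a):
        # One simulated transition, then a recursive estimate at stage + 1.
        reward, nxt = simulate(state, a, rng)
        v_next, _ = ams_estimate(nxt, stage + 1, horizon, actions, simulate,
                                 n_samples, r_max, rng)
        counts[a] += 1
        totals[a] += reward + v_next

    for a in actions:  # initialisation: sample every action once
        sample(a)
    bonus_scale = r_max * (horizon - stage)  # bound on the remaining value
    for n in range(len(actions), n_samples):  # n = samples drawn so far
        # UCB1-style index: sample mean plus a bonus that shrinks as an
        # action is sampled more often.
        best = max(actions, key=lambda a: totals[a] / counts[a]
                   + bonus_scale * math.sqrt(2.0 * math.log(n) / counts[a]))
        sample(best)

    q = {a: totals[a] / counts[a] for a in actions}
    # Weight each Q-estimate by the fraction of samples its action received.
    value = sum(counts[a] * q[a] for a in actions) / n_samples
    return value, q


def receding_horizon_action(state, actions, simulate, horizon, n_samples,
                            r_max, rng):
    """Re-plan from the current state and return the greedy action."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    _, q = ams_estimate(state, 0, horizon, actions, simulate, n_samples,
                        r_max, rng)
    return max(q, key=q.get)


if __name__ == "__main__":
    # Hypothetical toy inventory problem (not the paper's test instances):
    # state = stock on hand (0..2), action = units ordered, demand ~ U{0,1,2},
    # unmet demand is lost.
    def inventory(stock, order, rng):
        level = min(stock + order, 2)
        demand = rng.randint(0, 2)
        unmet = max(demand - level, 0)
        left = max(level - demand, 0)
        cost = order + left + 4 * unmet  # ordering + holding + shortage
        return 8.0 - cost, left  # shifted so rewards lie in [0, 8]

    rng = random.Random(1)
    print(receding_horizon_action(0, [0, 1, 2], inventory, 3, 6, 8.0, rng))
```

Calling `receding_horizon_action` from the current state at every decision epoch and applying the returned action is one way to realize the receding horizon control mentioned in the abstract. Picking the action with the largest estimated Q-value is an assumption; other selection rules are possible.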

Year: 2005
DOI identifier: 10.1287/opre.1040.0145
OAI identifier: oai:CiteSeerX.psu:
Provided by: CiteSeerX