Bayesian model-based reinforcement learning is a formally elegant approach to
learning optimal behaviour under model uncertainty, trading off exploration and
exploitation in an ideal way. Unfortunately, finding the resulting
Bayes-optimal policies is notoriously taxing, since the search space becomes
enormous. In this paper we introduce a tractable, sample-based method for
approximate Bayes-optimal planning which exploits Monte-Carlo tree search. Our
approach outperforms prior Bayesian model-based RL algorithms by a significant
margin on several well-known benchmark problems because it avoids expensive
applications of Bayes' rule within the search tree by lazily sampling models
from the current beliefs. We illustrate the advantages of our approach by
showing it working in an infinite state space domain which is qualitatively out
of reach of almost all previous work in Bayesian exploration.

Comment: 14 pages, 7 figures, includes supplementary material. Advances in
Neural Information Processing Systems (NIPS) 201
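
The idea of sampling models from the current beliefs, rather than applying Bayes' rule inside the search, can be illustrated roughly as follows. This is a minimal sketch and not the paper's algorithm: it assumes a small discrete MDP with independent Dirichlet posteriors over transitions, and it swaps Monte-Carlo tree search for flat Monte-Carlo rollouts under a uniformly random policy. All function names, parameters, and the toy problem are hypothetical.

```python
import random


def sample_transition_model(counts, rng):
    """Draw one transition model from independent Dirichlet posteriors.

    counts[s][a] is a list of Dirichlet pseudo-counts over next states
    (an assumed belief representation, chosen for illustration).
    """
    model = {}
    for s, per_action in counts.items():
        model[s] = {}
        for a, alpha in per_action.items():
            # A Dirichlet draw is a vector of normalised Gamma draws.
            draws = [rng.gammavariate(k, 1.0) for k in alpha]
            total = sum(draws)
            model[s][a] = [d / total for d in draws]
    return model


def rollout(model, reward, state, first_action, actions, horizon, gamma, rng):
    """Simulate one trajectory in a fixed sampled model.

    The belief is never updated inside the simulation; the sampled
    model stands in for it for the whole trajectory.
    """
    ret, discount, action = 0.0, 1.0, first_action
    for _ in range(horizon):
        probs = model[state][action]
        next_state = rng.choices(range(len(probs)), weights=probs)[0]
        ret += discount * reward(state, action, next_state)
        discount *= gamma
        state = next_state
        action = rng.choice(actions)  # random rollout policy (assumption)
    return ret


def root_sampling_values(counts, reward, root_state, actions,
                         n_sims=2000, horizon=10, gamma=0.95, seed=0):
    """Estimate root action values, sampling one model per simulation."""
    rng = random.Random(seed)
    totals = {a: 0.0 for a in actions}
    visits = {a: 0 for a in actions}
    for i in range(n_sims):
        model = sample_transition_model(counts, rng)  # fresh posterior sample
        a = actions[i % len(actions)]  # try root actions in turn
        totals[a] += rollout(model, reward, root_state, a,
                             actions, horizon, gamma, rng)
        visits[a] += 1
    return {a: totals[a] / visits[a] for a in actions}


# Toy two-state problem: reward 1 for landing in state 1.
toy_counts = {
    0: {0: [1.0, 9.0], 1: [9.0, 1.0]},
    1: {0: [1.0, 9.0], 1: [1.0, 9.0]},
}


def toy_reward(state, action, next_state):
    return 1.0 if next_state == 1 else 0.0


values = root_sampling_values(toy_counts, toy_reward, root_state=0, actions=[0, 1])
```

In this toy setting the belief favours action 0 leading to the rewarding state, so the estimate for action 0 should come out higher than for action 1. A tree-search version would replace the round-robin choice at the root and the random rollout policy with a bandit-style selection rule at each node.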