Selecting Computations: Theory and Applications
Sequential decision problems are often approximately solvable by simulating
possible future action sequences. {\em Metalevel} decision procedures have been
developed for selecting {\em which} action sequences to simulate, based on
estimating the expected improvement in decision quality that would result from
any particular simulation; an example is the recent work on using bandit
algorithms to control Monte Carlo tree search in the game of Go. In this paper
we develop a theoretical basis for metalevel decisions in the statistical
framework of Bayesian {\em selection problems}, arguing (as others have done)
that this is more appropriate than the bandit framework. We derive a number of
basic results applicable to Monte Carlo selection problems, including the first
finite sampling bounds for optimal policies in certain cases; we also provide a
simple counterexample to the intuitive conjecture that an optimal policy will
necessarily reach a decision in all cases. We then derive heuristic
approximations in both Bayesian and distribution-free settings and demonstrate
their superiority to bandit-based heuristics in one-shot decision problems and
in Go.
Comment: 10 pages, UAI 201