1,552 research outputs found
Better Optimism By Bayes: Adaptive Planning with Rich Models
The computational costs of inference and planning have confined Bayesian
model-based reinforcement learning to one of two dismal fates: powerful
Bayes-adaptive planning but only for simplistic models, or powerful, Bayesian
non-parametric models but using simple, myopic planning strategies such as
Thompson sampling. We ask whether it is feasible and truly beneficial to
combine rich probabilistic models with a closer approximation to fully Bayesian
planning. First, we use a collection of counterexamples to show formal problems
with the over-optimism inherent in Thompson sampling. Then we leverage
state-of-the-art techniques in efficient Bayes-adaptive planning and
non-parametric Bayesian methods to perform qualitatively better than both
existing conventional algorithms and Thompson sampling on two contextual
bandit-like problems.Comment: 11 pages, 11 figure
- …