Optimal Reinforcement Learning for Gaussian Systems
The exploration-exploitation trade-off is among the central challenges of
reinforcement learning. The optimal Bayesian solution is intractable in
general. This paper studies to what extent analytic statements about optimal
learning are possible if all beliefs are Gaussian processes. A first-order
approximation of learning both loss and dynamics, for nonlinear,
time-varying systems in continuous time and space, subject to a relatively weak
restriction on the dynamics, is described by an infinite-dimensional partial
differential equation. An approximate finite-dimensional projection gives an
impression of how this result may be helpful.

Comment: final pre-conference version of this NIPS 2011 paper. Once again,
please note some nontrivial changes to the exposition and interpretation of the
results, in particular in Equation (9) and Eqs. 11-14. The algorithm and
results have remained the same, but their theoretical interpretation has
changed.
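The abstract's premise is that beliefs about the unknown dynamics (and loss) are Gaussian processes. A minimal sketch of what such a belief looks like, assuming a one-dimensional state, a zero-mean GP prior with a squared-exponential kernel, and a hypothetical drift f(x) = -sin(x); the function names `se_kernel` and `gp_posterior` and all hyperparameters are illustrative assumptions. This is standard GP regression, not the paper's first-order PDE approximation. It shows why the exploration-exploitation trade-off arises: posterior variance stays low near observed transitions and reverts to the prior far from them.

```python
import numpy as np


def se_kernel(a, b, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential kernel (an assumed choice) between 1-D arrays a and b."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / lengthscale) ** 2)


def gp_posterior(x_train, y_train, x_test, noise_var=1e-2):
    """Posterior mean and marginal variance of a zero-mean GP at x_test."""
    K = se_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = se_kernel(x_train, x_test)            # cross-covariance, shape (n_train, n_test)
    k_ss = np.diag(se_kernel(x_test, x_test))   # prior variance at the test inputs
    L = np.linalg.cholesky(K)                   # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # alpha = K^{-1} y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)                 # v = L^{-1} K_s
    var = k_ss - np.sum(v ** 2, axis=0)         # diag(k_ss - K_s^T K^{-1} K_s)
    return mean, var


# Hypothetical observed transitions of dx/dt = f(x), with f(x) = -sin(x) assumed.
x_obs = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y_obs = -np.sin(x_obs)

# Query one state near the data and one far from it.
x_query = np.array([0.5, 5.0])
mean, var = gp_posterior(x_obs, y_obs, x_query)
print("posterior mean:", mean)
print("posterior variance:", var)
```

The variance at the distant query stays close to the prior signal variance of 1, while the nearby query is well constrained. A learner that plans with such a belief must weigh visiting poorly known states to reduce this variance against exploiting what it already knows.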