Monte Carlo Bayesian Reinforcement Learning
Bayesian reinforcement learning (BRL) encodes prior knowledge of the world in
a model and represents uncertainty in model parameters by maintaining a
probability distribution over them. This paper presents Monte Carlo BRL
(MC-BRL), a simple and general approach to BRL. MC-BRL samples a priori a
finite set of hypotheses for the model parameter values and forms a discrete
partially observable Markov decision process (POMDP) whose state space is a
cross product of the state space for the reinforcement learning task and the
sampled model parameter space. The POMDP does not require conjugate
distributions for belief representation, as earlier works do, and can be solved
relatively easily with point-based approximation algorithms. MC-BRL naturally
handles both fully and partially observable worlds. Theoretical and
experimental results show that the discrete POMDP approximates the underlying
BRL task well with guaranteed performance.
Comment: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)
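As a rough illustration of why a finite set of sampled hypotheses removes the need for conjugate distributions, the sketch below keeps the belief as a plain weight vector over sampled parameter values and updates it with Bayes' rule. It is not the paper's implementation: the Bernoulli observation model, the uniform prior, and the names `sample_hypotheses` and `update_belief` are assumptions made for this example, and the POMDP planning step is left out entirely.

```python
import random


def sample_hypotheses(k, rng):
    """Draw k candidate values for an unknown success probability.

    Assumption: a uniform prior on [0, 1); any prior that can be sampled works.
    """
    return [rng.random() for _ in range(k)]


def update_belief(belief, hypotheses, success):
    """Bayes update of a discrete belief after one Bernoulli observation."""
    # Likelihood of the observation under each sampled hypothesis.
    likelihoods = [h if success else 1.0 - h for h in hypotheses]
    unnormalised = [b * l for b, l in zip(belief, likelihoods)]
    total = sum(unnormalised)
    if total == 0.0:
        # Observation has zero probability under every hypothesis; keep the belief.
        return list(belief)
    return [u / total for u in unnormalised]


rng = random.Random(0)
hypotheses = sample_hypotheses(5, rng)
belief = [1.0 / len(hypotheses)] * len(hypotheses)  # uniform over the samples
for observation in [True, True, False, True]:
    belief = update_belief(belief, hypotheses, observation)
```

Because the belief is only a list of weights, the same update applies to any model whose likelihood can be evaluated, which is the property the abstract points to.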
Better Optimism By Bayes: Adaptive Planning with Rich Models
The computational costs of inference and planning have confined Bayesian
model-based reinforcement learning to one of two dismal fates: powerful
Bayes-adaptive planning but only for simplistic models, or powerful, Bayesian
non-parametric models but using simple, myopic planning strategies such as
Thompson sampling. We ask whether it is feasible and truly beneficial to
combine rich probabilistic models with a closer approximation to fully Bayesian
planning. First, we use a collection of counterexamples to show formal problems
with the over-optimism inherent in Thompson sampling. Then we leverage
state-of-the-art techniques in efficient Bayes-adaptive planning and
non-parametric Bayesian methods to perform qualitatively better than both
existing conventional algorithms and Thompson sampling on two contextual
bandit-like problems.
Comment: 11 pages, 11 figures
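For readers unfamiliar with the Thompson sampling baseline discussed in this abstract, a minimal sketch on a Bernoulli bandit follows. The Beta(1, 1) priors, the two-armed setup, and the function names `thompson_step` and `run` are assumptions for illustration only; the paper's models and counterexamples are richer than this.

```python
import random


def thompson_step(successes, failures, rng):
    """Pick an arm by sampling one value per arm from its Beta posterior."""
    # Beta(1 + successes, 1 + failures) is the posterior under a uniform prior.
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    # Act greedily with respect to the sampled values.
    return max(range(len(samples)), key=samples.__getitem__)


def run(true_probs, steps, seed=0):
    """Run Thompson sampling and return per-arm success and failure counts."""
    rng = random.Random(seed)
    successes = [0] * len(true_probs)
    failures = [0] * len(true_probs)
    for _ in range(steps):
        arm = thompson_step(successes, failures, rng)
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


successes, failures = run([0.2, 0.8], steps=200)
```

Each step plans only against a single sampled model and ignores how future observations will change the posterior, which is the sense in which the abstract calls the strategy myopic.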
Cover Tree Bayesian Reinforcement Learning
This paper proposes an online tree-based Bayesian approach for reinforcement
learning. For inference, we employ a generalised context tree model. This
defines a distribution on multivariate Gaussian piecewise-linear models, which
can be updated in closed form. The tree structure itself is constructed using
the cover tree method, which remains efficient in high dimensional spaces. We
combine the model with Thompson sampling and approximate dynamic programming to
obtain effective exploration policies in unknown environments. The flexibility
and computational simplicity of the model render it suitable for many
reinforcement learning problems in continuous state spaces. We demonstrate this
in an experimental comparison with least squares policy iteration.
Near-Optimal BRL using Optimistic Local Transitions
Model-based Bayesian Reinforcement Learning (BRL) allows a sound
formalization of the problem of acting optimally while facing an unknown
environment, i.e., avoiding the exploration-exploitation dilemma. However,
algorithms explicitly addressing BRL suffer from such a combinatorial explosion
that a large body of work relies on heuristic algorithms. This paper introduces
BOLT, a simple and (almost) deterministic heuristic algorithm for BRL which is
optimistic about the transition function. We analyze BOLT's sample complexity,
and show that under certain parameters, the algorithm is near-optimal in the
Bayesian sense with high probability. Then, experimental results highlight the
key differences of this method compared to previous work.
Comment: ICML201
Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search
Bayesian model-based reinforcement learning is a formally elegant approach to
learning optimal behaviour under model uncertainty, trading off exploration and
exploitation in an ideal way. Unfortunately, finding the resulting
Bayes-optimal policies is notoriously taxing, since the search space becomes
enormous. In this paper we introduce a tractable, sample-based method for
approximate Bayes-optimal planning which exploits Monte-Carlo tree search. Our
approach outperformed prior Bayesian model-based RL algorithms by a significant
margin on several well-known benchmark problems -- because it avoids expensive
applications of Bayes rule within the search tree by lazily sampling models
from the current beliefs. We illustrate the advantages of our approach by
showing it working in an infinite state space domain which is qualitatively out
of reach of almost all previous work in Bayesian exploration.
Comment: 14 pages, 7 figures, includes supplementary material. Advances in
Neural Information Processing Systems (NIPS) 201
Bounded Optimal Exploration in MDP
Within the framework of probably approximately correct Markov decision
processes (PAC-MDP), much theoretical work has focused on methods to attain
near optimality after a relatively long period of learning and exploration.
However, practical concerns require the attainment of satisfactory behavior
within a short period of time. In this paper, we relax the PAC-MDP conditions
to reconcile theoretically driven exploration methods and practical needs. We
propose simple algorithms for discrete and continuous state spaces, and
illustrate the benefits of our proposed relaxation via theoretical analyses and
numerical examples. Our algorithms also maintain anytime error bounds and
average loss bounds. Our approach accommodates both Bayesian and non-Bayesian
methods.
Comment: In Proceedings of the 30th AAAI Conference on Artificial Intelligence
(AAAI), 201