37 research outputs found

Optimal Reinforcement Learning for Gaussian Systems

Author: Hennig Philipp
Publication date: 14/10/2011

The exploration-exploitation trade-off is among the central challenges of reinforcement learning. The optimal Bayesian solution is intractable in general. This paper studies to what extent analytic statements about optimal learning are possible if all beliefs are Gaussian processes. A first-order approximation of learning of both loss and dynamics, for nonlinear, time-varying systems in continuous time and space, subject to a relatively weak restriction on the dynamics, is described by an infinite-dimensional partial differential equation. An approximate finite-dimensional projection gives an impression of how this result may be helpful.

Comment: final pre-conference version of this NIPS 2011 paper. Once again, please note some nontrivial changes to exposition and interpretation of the results, in particular in Equation (9) and Eqs. 11-14. The algorithm and results have remained the same, but their theoretical interpretation has changed.

arXiv.org e-Print Archive

MPG.PuRe

Monte Carlo Bayesian Reinforcement Learning

Authors: Hsu David; Lee Wee Sun; Wang Yi; Won Kok Sung
Publication date: 01/01/2012

Bayesian reinforcement learning (BRL) encodes prior knowledge of the world in a model and represents uncertainty in model parameters by maintaining a probability distribution over them. This paper presents Monte Carlo BRL (MC-BRL), a simple and general approach to BRL. MC-BRL samples a priori a finite set of hypotheses for the model parameter values and forms a discrete partially observable Markov decision process (POMDP) whose state space is a cross product of the state space for the reinforcement learning task and the sampled model parameter space. The POMDP does not require conjugate distributions for belief representation, as earlier works do, and can be solved relatively easily with point-based approximation algorithms. MC-BRL naturally handles both fully and partially observable worlds. Theoretical and experimental results show that the discrete POMDP approximates the underlying BRL task well with guaranteed performance.

Comment: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)
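The idea of sampling a finite hypothesis set a priori and tracking a discrete belief over it can be illustrated with a small sketch. This is not the MC-BRL algorithm from the paper: it assumes a single unknown Bernoulli parameter with a uniform prior, and it omits the POMDP construction and the point-based solver entirely. The function names `sample_hypotheses` and `update_belief` are hypothetical.

```python
import random


def sample_hypotheses(k, rng):
    """Draw k candidate parameter values a priori (assumed uniform prior on [0, 1])."""
    return [rng.random() for _ in range(k)]


def update_belief(belief, hypotheses, outcome):
    """Bayes update of a discrete belief after one Bernoulli outcome (1 or 0)."""
    likelihoods = [h if outcome == 1 else 1.0 - h for h in hypotheses]
    unnormalized = [b * l for b, l in zip(belief, likelihoods)]
    total = sum(unnormalized)
    if total == 0.0:
        # Outcome impossible under every hypothesis; keep the old belief.
        return list(belief)
    return [u / total for u in unnormalized]


rng = random.Random(0)
hypotheses = sample_hypotheses(5, rng)

# Start from a uniform belief over the sampled hypotheses.
belief = [1.0 / len(hypotheses)] * len(hypotheses)

# Illustrative observations (made up for this sketch).
for outcome in [1, 1, 0, 1, 1, 1]:
    belief = update_belief(belief, hypotheses, outcome)

# Index of the hypothesis with the highest posterior weight.
best = max(range(len(hypotheses)), key=lambda i: belief[i])
```

Because the belief is a finite weight vector, no conjugate prior is needed: any likelihood that can be evaluated per hypothesis fits the same update.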

arXiv.org e-Print Archive

CiteSeerX

ScholarBank@NUS

BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems

Authors: Ahmed Faisal; Deng Li; Gao Jianfeng; Li Lihong; Li Xiujun; Lipton Zachary C.
Publication date: 19/11/2017

We present a new algorithm that significantly improves the efficiency of exploration for deep Q-learning agents in dialogue systems. Our agents explore via Thompson sampling, drawing Monte Carlo samples from a Bayes-by-Backprop neural network. Our algorithm learns much faster than common exploration strategies such as $\epsilon$-greedy, Boltzmann, bootstrapping, and intrinsic-reward-based ones. Additionally, we show that spiking the replay buffer with experiences from just a few successful episodes can make Q-learning feasible when it might otherwise fail.

Comment: 13 pages, 9 figures
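Thompson sampling itself is easy to show on a much simpler problem than the one in the paper. The sketch below swaps the Bayes-by-Backprop Q-network for a Beta-Bernoulli bandit, so it illustrates only the exploration rule (sample from the posterior, act greedily on the sample), not BBQ-Networks. The names `thompson_step` and `run_bandit` and the reward probabilities are assumptions made for illustration.

```python
import random


def thompson_step(successes, failures, rng):
    """Sample a plausible mean per arm from its Beta posterior and pick the best sample."""
    # Beta(s + 1, f + 1) is the posterior under an assumed uniform Beta(1, 1) prior.
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda i: samples[i])


def run_bandit(true_means, steps, seed=0):
    """Run Thompson sampling on a Bernoulli bandit; return per-arm success/failure counts."""
    rng = random.Random(seed)
    n = len(true_means)
    successes, failures = [0] * n, [0] * n
    for _ in range(steps):
        arm = thompson_step(successes, failures, rng)
        # Simulated environment: reward 1 with the arm's true probability.
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


successes, failures = run_bandit([0.2, 0.8], steps=500)
pulls = [s + f for s, f in zip(successes, failures)]
```

Arms with uncertain posteriors occasionally produce high samples and get tried, while arms that are confidently poor are pulled less and less; no separate exploration rate such as $\epsilon$ has to be tuned.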

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search

Authors: Dayan Peter; Guez Arthur; Silver David
Publication date: 01/01/2012

Bayesian model-based reinforcement learning is a formally elegant approach to learning optimal behaviour under model uncertainty, trading off exploration and exploitation in an ideal way. Unfortunately, finding the resulting Bayes-optimal policies is notoriously taxing, since the search space becomes enormous. In this paper we introduce a tractable, sample-based method for approximate Bayes-optimal planning which exploits Monte-Carlo tree search. Our approach outperformed prior Bayesian model-based RL algorithms by a significant margin on several well-known benchmark problems -- because it avoids expensive applications of Bayes rule within the search tree by lazily sampling models from the current beliefs. We illustrate the advantages of our approach by showing it working in an infinite state space domain which is qualitatively out of reach of almost all previous work in Bayesian exploration.

Comment: 14 pages, 7 figures, includes supplementary material. Advances in Neural Information Processing Systems (NIPS) 201

arXiv.org e-Print Archive

CiteSeerX

UCL Discovery