20,208 research outputs found

Bayesian reinforcement learning with exploration

Author: E. Even-Dar
I. Szita
K. Dyagilev
L. Orseau
M. Hutter
M. Kearns
M.G. Azar
P. Auer
P. Sunehag
S. Mannor
T. Lattimore
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

We consider a general reinforcement learning problem and show that carefully combining the Bayesian optimal policy and an exploring policy leads to minimax sample-complexity bounds in a very general class of (history-based) environments. We also prove lower bounds and show that the new algorithm displays adaptive behaviour when the environment is easier than worst-case.

Crossref

The Australian National University

A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning

Author: Brochu Eric
Cora Vlad M.
de Freitas Nando
Publication date: 01/01/2009

We present a tutorial on Bayesian optimization, a method of finding the maximum of expensive cost functions. Bayesian optimization employs the Bayesian technique of setting a prior over the objective function and combining it with evidence to get a posterior function. This permits a utility-based selection of the next observation to make on the objective function, which must take into account both exploration (sampling from areas of high uncertainty) and exploitation (sampling areas likely to offer improvement over the current best observation). We also present two detailed extensions of Bayesian optimization, with experiments (active user modelling with preferences, and hierarchical reinforcement learning), and a discussion of the pros and cons of Bayesian optimization based on our experiences.
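
The utility-based selection described in this abstract can be illustrated with expected improvement, one commonly used utility function. The sketch below is an assumption-laden illustration, not the tutorial's code: it assumes the posterior at each candidate point is Gaussian with a known mean and standard deviation, and the names `expected_improvement` and `pick_next` are hypothetical.

```python
import math


def normal_pdf(z):
    """Density of the standard normal distribution at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Cumulative distribution of the standard normal at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, best):
    """Expected improvement over `best` for a Gaussian posterior N(mu, sigma^2).

    Assumes maximisation. With no posterior uncertainty, the improvement
    is simply the positive part of (mu - best).
    """
    if sigma <= 0.0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    # First term rewards a high mean (exploitation),
    # second term rewards high uncertainty (exploration).
    return (mu - best) * normal_cdf(z) + sigma * normal_pdf(z)


def pick_next(candidates, best):
    """Return the label of the candidate with the largest expected improvement.

    `candidates` is a list of (label, posterior_mean, posterior_std) tuples.
    """
    return max(candidates, key=lambda c: expected_improvement(c[1], c[2], best))[0]


# Hypothetical posterior summaries: "a" is well explored, "b" is uncertain.
candidates = [("a", 1.0, 0.01), ("b", 0.9, 1.0)]
print(pick_next(candidates, best=1.0))
```

In this toy setup the uncertain point is preferred even though its posterior mean is slightly lower, which is the exploration side of the trade-off the abstract mentions.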

arXiv.org e-Print Archive

CiteSeerX

Oxford University Research Archive

Cover Tree Bayesian Reinforcement Learning

Author: Blekas Konstantinos
Dimitrakakis Christos
Tziortziotis Nikolaos
Publication date: 08/12/2013

This paper proposes an online tree-based Bayesian approach for reinforcement learning. For inference, we employ a generalised context tree model. This defines a distribution on multivariate Gaussian piecewise-linear models, which can be updated in closed form. The tree structure itself is constructed using the cover tree method, which remains efficient in high dimensional spaces. We combine the model with Thompson sampling and approximate dynamic programming to obtain effective exploration policies in unknown environments. The flexibility and computational simplicity of the model render it suitable for many reinforcement learning problems in continuous state spaces. We demonstrate this in an experimental comparison with least squares policy iteration.
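
Thompson sampling, which this abstract combines with a cover tree model, is easiest to see in a much simpler setting. The sketch below swaps the paper's context tree model for independent Beta posteriors on a Bernoulli bandit; that substitution, the uniform Beta(1, 1) prior, and the names `thompson_step` and `run_bandit` are all assumptions made for illustration.

```python
import random


def thompson_step(successes, failures, rng):
    """Sample one payoff probability per arm from its Beta posterior
    and return the index of the arm whose sample is largest."""
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_probs, steps, seed=0):
    """Run Thompson sampling on a Bernoulli bandit with the given
    (hypothetical) true payoff probabilities; return per-arm counts."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    for _ in range(steps):
        arm = thompson_step(successes, failures, rng)
        # Simulate the environment's response to pulling the chosen arm.
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


successes, failures = run_bandit([0.2, 0.8], steps=500)
pulls = [s + f for s, f in zip(successes, failures)]
print(pulls)
```

Acting greedily with respect to a posterior sample, rather than the posterior mean, is what drives exploration here: arms with little data have wide posteriors and are occasionally sampled high.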

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Chalmers Research

Chalmers Publication Library

Risk, Unexpected Uncertainty, and Estimation Uncertainty: Bayesian Learning in Unstable Settings

Author: A Quinn
A Wagner
AC Courville
AJ Yu
AN Hampton
BA Strange
CD Fiorillo
D Draper
D Ellsberg
E Payzan-LeNestour
Elise Payzan-LeNestour
FH Knight
G Aston-Jones
G Vanni-Mercier
GI Christopoulos
J Dow
JD Cohen
JM Keynes
JM Pearce
JO Berger
K Craik
K Doya
K Preuschoff
K Sangjoon
LP Hansen
M Allais
M Basili
M d'Acremont
M Hsu
MFS Rushworth
MP Paulus
ND Daw
P Bossaerts
P Dayan
P Diaconis
Peter Bossaerts
PN Tobler
RE Kass
RH Thaler
S Huettel
S Ishii
S Kakade
SA Huettel
TEJ Behrens
Tim Behrens
U Rutishauser
W Epstein
W Yoshida
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2011

Recently, evidence has emerged that humans approach learning using Bayesian updating rather than (model-free) reinforcement algorithms in a six-arm restless bandit problem. Here, we investigate what this implies for human appreciation of uncertainty. In our task, a Bayesian learner distinguishes three equally salient levels of uncertainty. First, the Bayesian perceives irreducible uncertainty or risk: even knowing the payoff probabilities of a given arm, the outcome remains uncertain. Second, there is (parameter) estimation uncertainty or ambiguity: payoff probabilities are unknown and need to be estimated. Third, the outcome probabilities of the arms change: the sudden jumps are referred to as unexpected uncertainty. We document how the three levels of uncertainty evolved during the course of our experiment and how they affected the learning rate. We then zoom in on estimation uncertainty, which has been suggested to be a driving force in exploration, in spite of evidence of widespread aversion to ambiguity. Our data corroborate the latter. We discuss neural evidence that foreshadowed the ability of humans to distinguish between the three levels of uncertainty. Finally, we investigate the boundaries of human capacity to implement Bayesian learning. We repeat the experiment with different instructions, reflecting varying levels of structural uncertainty. Under this fourth notion of uncertainty, choices were no better explained by Bayesian updating than by (model-free) reinforcement learning. Exit questionnaires revealed that participants remained unaware of the presence of unexpected uncertainty and failed to acquire the right model with which to implement Bayesian updating.
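
The distinction between risk and estimation uncertainty drawn in this abstract can be made concrete for a single arm. The sketch below is a simplification and not the paper's model: it assumes binary outcomes, a Beta(a, b) posterior over the payoff probability, and no jumps (so unexpected uncertainty is not modelled). The function names are hypothetical. It uses the law of total variance to split the predictive variance of the next outcome into the two components.

```python
def beta_update(a, b, outcome):
    """Bayesian update of a Beta(a, b) posterior after one binary outcome."""
    return (a + 1, b) if outcome else (a, b + 1)


def uncertainty_components(a, b):
    """Split the predictive variance of the next outcome into
    risk (expected outcome variance given the payoff probability) and
    estimation uncertainty (posterior variance of the payoff probability)."""
    total = a + b
    mean = a / total
    estimation = (a * b) / (total * total * (total + 1))  # Var[p]
    predictive = mean * (1.0 - mean)                      # Var[outcome]
    risk = predictive - estimation                        # E[p(1 - p)]
    return {"mean": mean, "risk": risk, "estimation": estimation}


# Start from a uniform prior and observe 8 payoffs and 2 non-payoffs.
a, b = 1, 1
before = uncertainty_components(a, b)
for outcome in [1] * 8 + [0] * 2:
    a, b = beta_update(a, b, outcome)
after = uncertainty_components(a, b)
print(before)
print(after)
```

As observations accumulate, the estimation component shrinks towards zero while the risk component converges to the outcome variance at the true payoff probability, which is why only the former can be reduced by sampling an arm.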

Infoscience - École polytechnique fédérale de Lausanne

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Caltech Authors

University of Melbourne Institutional Repository

Sequential Decision Making with Untrustworthy Service Providers

Author: Chalkiadakis G.
Jennings N. R.
Rogers A.
Teacy W. T. L.
Publication date: 01/01/2008

In this paper, we deal with the sequential decision making problem of agents operating in computational economies, where there is uncertainty regarding the trustworthiness of service providers populating the environment. Specifically, we propose a generic Bayesian trust model, and formulate the optimal Bayesian solution to the exploration-exploitation problem facing the agents when repeatedly interacting with others in such environments. We then present a computationally tractable Bayesian reinforcement learning algorithm to approximate that solution by taking into account the expected value of perfect information of an agent's actions. Our algorithm is shown to dramatically outperform all previous finalists of the international Agent Reputation and Trust (ART) competition, including the winners from both years in which the competition has been run.

CiteSeerX

Southampton (e-Prints Soton)

Spiral - Imperial College Digital Repository

Near-Optimal BRL using Optimistic Local Transitions

Author: Araya Mauricio
Buffet Olivier
Thomas Vincent
Publication date: 18/06/2012

Model-based Bayesian Reinforcement Learning (BRL) allows a sound formalization of the problem of acting optimally while facing an unknown environment, i.e., avoiding the exploration-exploitation dilemma. However, algorithms explicitly addressing BRL suffer from such a combinatorial explosion that a large body of work relies on heuristic algorithms. This paper introduces BOLT, a simple and (almost) deterministic heuristic algorithm for BRL which is optimistic about the transition function. We analyze BOLT's sample complexity, and show that under certain parameters, the algorithm is near-optimal in the Bayesian sense with high probability. Then, experimental results highlight the key differences of this method compared to previous work. Comment: ICML201

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL-Rennes 1