Search CORE

9,182 research outputs found

Risk, Unexpected Uncertainty, and Estimation Uncertainty: Bayesian Learning in Unstable Settings

Author: A Quinn
A Wagner
AC Courville
AJ Yu
AN Hampton
BA Strange
CD Fiorillo
D Draper
D Ellsberg
E Payzan-LeNestour
Elise Payzan-LeNestour
FH Knight
G Aston-Jones
G Vanni-Mercier
GI Christopoulos
J Dow
JD Cohen
JM Keynes
JM Pearce
JO Berger
K Craik
K Doya
K Preuschoff
K Sangjoon
LP Hansen
M Allais
M Basili
M d'Acremont
M Hsu
MFS Rushworth
MP Paulus
ND Daw
P Bossaerts
P Dayan
P Diaconis
Peter Bossaerts
PN Tobler
RE Kass
RH Thaler
S Huettel
S Ishii
S Kakade
SA Huettel
TEJ Behrens
Tim Behrens
U Rutishauser
W Epstein
W Yoshida
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2011

Recently, evidence has emerged that humans approach learning using Bayesian updating rather than (model-free) reinforcement algorithms in a six-arm restless bandit problem. Here, we investigate what this implies for human appreciation of uncertainty. In our task, a Bayesian learner distinguishes three equally salient levels of uncertainty. First, the Bayesian perceives irreducible uncertainty, or risk: even when the payoff probabilities of a given arm are known, the outcome remains uncertain. Second, there is (parameter) estimation uncertainty, or ambiguity: payoff probabilities are unknown and need to be estimated. Third, the outcome probabilities of the arms change; these sudden jumps are referred to as unexpected uncertainty. We document how the three levels of uncertainty evolved during the course of our experiment and how they affected the learning rate. We then zoom in on estimation uncertainty, which has been suggested to be a driving force in exploration, in spite of evidence of widespread aversion to ambiguity. Our data corroborate the latter, i.e., aversion to ambiguity. We discuss neural evidence that foreshadowed the ability of humans to distinguish between the three levels of uncertainty. Finally, we investigate the boundaries of human capacity to implement Bayesian learning. We repeat the experiment with different instructions, reflecting varying levels of structural uncertainty. Under this fourth notion of uncertainty, choices were no better explained by Bayesian updating than by (model-free) reinforcement learning. Exit questionnaires revealed that participants remained unaware of the presence of unexpected uncertainty and failed to acquire the right model with which to implement Bayesian updating.
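The three levels of uncertainty can be made concrete with a small simulation. This is a minimal sketch, not the authors' model: it assumes a single arm with win/lose (Bernoulli) payoffs, a flat prior over the payoff probability p discretized on a grid, and a hypothetical hazard rate at which p jumps to a fresh draw from that prior. The names `bayes_step`, `run`, and `delta_rule` and all parameter values are illustrative. Risk is the expected outcome variance E[p(1 - p)], estimation uncertainty is the posterior variance of p, and unexpected uncertainty is summarized as the posterior probability that a jump just occurred. A fixed-rate delta rule stands in for model-free reinforcement learning.

```python
import numpy as np

# Assumed discretization: candidate payoff probabilities p on [0, 1].
GRID = np.linspace(0.0, 1.0, 201)
UNIFORM = np.full(GRID.size, 1.0 / GRID.size)  # flat prior over p


def bayes_step(belief, y, hazard):
    """One trial for a single win/lose arm whose payoff probability may jump,
    with probability `hazard`, to a fresh draw from the flat prior."""
    # Predictive prior: either p stayed put or it jumped.
    prior = (1.0 - hazard) * belief + hazard * UNIFORM
    lik = GRID if y == 1 else 1.0 - GRID  # P(y | p)
    joint = prior * lik
    posterior = joint / joint.sum()

    prior_mean = float(prior @ GRID)
    post_mean = float(posterior @ GRID)
    # Marginal likelihood of y under "jumped" versus "stayed".
    m_jump = hazard * float(UNIFORM @ lik)
    m_stay = (1.0 - hazard) * float(belief @ lik)
    summary = {
        "mean": post_mean,
        # Risk: outcome variance left even if p were known, E[p(1 - p)].
        "risk": float(posterior @ (GRID * (1.0 - GRID))),
        # Estimation uncertainty: posterior variance of p itself.
        "estimation": float(posterior @ (GRID - post_mean) ** 2),
        # Unexpected uncertainty: posterior probability of a jump on this trial.
        "p_jump": m_jump / (m_jump + m_stay),
        # Effective learning rate: share of the prediction error absorbed.
        "learning_rate": (post_mean - prior_mean) / (y - prior_mean),
    }
    return posterior, summary


def run(outcomes, hazard=0.05):
    """Feed a sequence of 0/1 outcomes through the learner; return per-trial summaries."""
    belief = UNIFORM.copy()
    history = []
    for y in outcomes:
        belief, summary = bayes_step(belief, y, hazard)
        history.append(summary)
    return history


def delta_rule(estimate, y, alpha=0.2):
    """Model-free comparison: fixed learning rate, no uncertainty tracking."""
    return estimate + alpha * (y - estimate)


if __name__ == "__main__":
    # Hypothetical outcome sequence: the arm pays off, then stops paying off.
    history = run([1] * 30 + [0] * 10)
    for t in (0, 29, 30, 31, 39):
        print(t, {k: round(v, 3) for k, v in history[t].items()})
```

In this sketch the two static components add up to the predictive variance, m(1 - m) = risk + estimation for posterior mean m, and the effective learning rate works out to the prior's estimation uncertainty divided by its predictive variance. When outcomes suggest a jump, the belief spreads out, estimation uncertainty rises, and the Bayesian learner's rate goes up, whereas the delta rule keeps the same rate throughout.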

Infoscience - École polytechnique fédérale de Lausanne

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Caltech Authors

University of Melbourne Institutional Repository

Searching for rewards in graph-structured spaces

Author: Gershman S.
Schulz E.
Wu C.
Publication venue: Cognitive Computational Neuroscience
Publication date: 01/01/2019

How do people generalize and explore structured spaces? We study human behavior on a multi-armed bandit task in which rewards are influenced by the connectivity structure of a graph. A detailed predictive model comparison shows that a Gaussian Process regression model using a diffusion kernel best describes participant choices and also predicts judgments about expected reward and confidence. This model unifies psychological models of function learning with the Successor Representation used in reinforcement learning, thereby building a bridge between different models of generalization.
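Gaussian Process regression over the nodes of a graph with a diffusion kernel can be sketched in a few lines. This is a minimal illustration, not the authors' fitted model: it assumes the standard diffusion kernel K = exp(-alpha * L) on the graph Laplacian L = D - A, a zero prior mean (rewards centered beforehand), and a hypothetical observation-noise variance. The function names and the values of `alpha` and `noise` are illustrative. The posterior mean plays the role of expected reward at each node and the posterior variance the role of (inverse) confidence.

```python
import numpy as np


def diffusion_kernel(adjacency, alpha=1.0):
    """Diffusion kernel K = expm(-alpha * L), with L = D - A the graph Laplacian."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    # L is symmetric for an undirected graph, so eigh gives an exact expm.
    evals, evecs = np.linalg.eigh(laplacian)
    return evecs @ np.diag(np.exp(-alpha * evals)) @ evecs.T


def gp_posterior(kernel, observed, rewards, noise=0.01):
    """Zero-mean GP posterior mean and variance at every node."""
    observed = np.asarray(observed)
    k_oo = kernel[np.ix_(observed, observed)] + noise * np.eye(observed.size)
    k_ao = kernel[:, observed]  # covariance between all nodes and observed nodes
    mean = k_ao @ np.linalg.solve(k_oo, np.asarray(rewards, dtype=float))
    # var_i = K_ii - k_i^T K_oo^{-1} k_i for every node i.
    var = np.diag(kernel) - np.sum(k_ao * np.linalg.solve(k_oo, k_ao.T).T, axis=1)
    return mean, var


if __name__ == "__main__":
    # Hypothetical example graph: five nodes on a path 0 - 1 - 2 - 3 - 4.
    adjacency = np.zeros((5, 5))
    for i in range(4):
        adjacency[i, i + 1] = adjacency[i + 1, i] = 1.0
    mean, var = gp_posterior(diffusion_kernel(adjacency), observed=[0], rewards=[1.0])
    print(np.round(mean, 3), np.round(var, 3))
```

On this five-node path, with only node 0 sampled, the predicted reward falls off with graph distance from node 0 and the variance is lowest at the sampled node: the kernel lets a single observation generalize along the graph's connectivity.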

Crossref

MPG.PuRe

VIME: Variational Information Maximizing Exploration

Author: Abbeel Pieter
Chen Xi
De Turck Filip
Duan Yan
Houthooft Rein
Schulman John
Publication date: 01/01/2016

Scalable and effective exploration remains a key challenge in reinforcement learning (RL). While there are methods with optimality guarantees in the setting of discrete state and action spaces, these methods cannot be applied in high-dimensional deep RL scenarios. As such, most contemporary RL relies on simple heuristics such as epsilon-greedy exploration or adding Gaussian noise to the controls. This paper introduces Variational Information Maximizing Exploration (VIME), an exploration strategy based on maximization of information gain about the agent's belief about the environment dynamics. We propose a practical implementation, using variational inference in Bayesian neural networks, which efficiently handles continuous state and action spaces. VIME modifies the MDP reward function, and can be applied with several different underlying RL algorithms. We demonstrate that VIME achieves significantly better performance compared to heuristic exploration methods across a variety of continuous control tasks and algorithms, including tasks with very sparse rewards.
Comment: Published in Advances in Neural Information Processing Systems 29 (NIPS), pages 1109-111
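The reward modification that VIME applies, adding a scaled information-gain bonus to the extrinsic reward, can be illustrated with a much simpler dynamics model. VIME itself uses a Bayesian neural network trained with variational inference; this sketch swaps in conjugate Bayesian linear regression (a hypothetical scalar next-state model with known noise variance) so that the posterior update and the information gain, the KL divergence from the updated posterior to the previous one, are exact. The class and function names and the values of `eta`, `prior_var`, and `noise_var` are illustrative assumptions.

```python
import numpy as np


def gaussian_kl(mu0, cov0, mu1, cov1):
    """KL( N(mu0, cov0) || N(mu1, cov1) ) for multivariate Gaussians."""
    d = mu0.size
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(cov1_inv @ cov0) + diff @ cov1_inv @ diff - d + logdet1 - logdet0)


class BayesianLinearDynamics:
    """Hypothetical stand-in for VIME's BNN: y = w^T x + Gaussian noise."""

    def __init__(self, dim, prior_var=1.0, noise_var=0.1):
        self.mu = np.zeros(dim)
        self.cov = prior_var * np.eye(dim)
        self.noise_var = noise_var

    def update(self, x, y):
        """Condition on one transition; return the information gain KL[new || old]."""
        prec = np.linalg.inv(self.cov)
        new_cov = np.linalg.inv(prec + np.outer(x, x) / self.noise_var)
        new_mu = new_cov @ (prec @ self.mu + x * y / self.noise_var)
        gain = gaussian_kl(new_mu, new_cov, self.mu, self.cov)
        self.mu, self.cov = new_mu, new_cov
        return gain


def vime_reward(extrinsic, info_gain, eta=0.1):
    """Augmented reward: extrinsic reward plus eta times the information gain."""
    return extrinsic + eta * info_gain


if __name__ == "__main__":
    model = BayesianLinearDynamics(dim=2)
    seen = np.array([1.0, 0.0])
    for _ in range(3):
        print("repeat bonus:", vime_reward(0.0, model.update(seen, 0.0)))
    print("novel bonus:", vime_reward(0.0, model.update(np.array([0.0, 1.0]), 0.0)))
```

Repeating a transition the model already predicts yields a shrinking bonus, while an input along a direction the model has not yet seen yields a large one, which is how the bonus steers the agent toward informative transitions. For simplicity, the sketch updates the posterior after every transition.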

arXiv.org e-Print Archive

Ghent University Academic Bibliography