Evidence for surprise minimization over value maximization in choice behavior
Classical economic models are predicated on the idea that the ultimate aim of choice is to maximize utility or reward. In contrast, an alternative perspective highlights the fact that adaptive behavior requires agents to model their environment and minimize surprise about the states they frequent. We propose that choice behavior can be more accurately accounted for by surprise minimization than by reward or utility maximization alone. Minimizing surprise makes a prediction at variance with expected utility models; namely, that in addition to attaining valuable states, agents attempt to maximize the entropy over outcomes and thus 'keep their options open'. We tested this prediction using a simple binary choice paradigm and show that human decision-making is better explained by surprise minimization than by utility maximization. Furthermore, we replicated this entropy-seeking behavior in a control task with no explicit utilities. These findings highlight a limitation of purely economic motivations in explaining choice behavior and instead emphasize the importance of belief-based motivations.
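A minimal sketch of the contrast this abstract draws, under illustrative assumptions (the functions, options, and weight below are not from the paper): an agent that scores options by expected utility alone versus one that adds an outcome-entropy bonus and so 'keeps its options open'.

```python
import math

def expected_utility(probs, utilities):
    """Expected utility of an option given outcome probabilities and utilities."""
    return sum(p * u for p, u in zip(probs, utilities))

def outcome_entropy(probs):
    """Shannon entropy (nats) of the outcome distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def choose(options, entropy_weight=0.0):
    """Pick the option maximizing expected utility + weighted outcome entropy.

    entropy_weight = 0 recovers pure utility maximization; > 0 gives the
    entropy-seeking preference the abstract attributes to surprise minimization.
    """
    def score(opt):
        probs, utils = opt
        return expected_utility(probs, utils) + entropy_weight * outcome_entropy(probs)
    return max(range(len(options)), key=lambda i: score(options[i]))

# Two options with equal expected utility (1.0) but different outcome spread:
safe = ([1.0], [1.0])             # one certain outcome
risky = ([0.5, 0.5], [0.0, 2.0])  # two equiprobable outcomes

print(choose([safe, risky]))                      # → 0 (tie; first option wins)
print(choose([safe, risky], entropy_weight=0.5))  # → 1 (entropy bonus favors spread)
```

The point of the toy is that the two objectives only disagree when expected utilities are matched, which is exactly the kind of condition a binary choice paradigm can probe.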
Phasic dopamine as a prediction error of intrinsic and extrinsic reinforcement driving both action acquisition and reward maximization: A simulated robotic study
An important issue in recent neuroscientific research is to understand the functional role of the phasic release of dopamine in the striatum, and in particular its relation to reinforcement learning. The literature is split between two alternative hypotheses: one considers phasic dopamine as a reward prediction error similar to the computational TD-error, whose function is to guide an animal to maximize future rewards; the other holds that phasic dopamine is a sensory prediction error signal that lets the animal discover and acquire novel actions. In this paper we propose an original hypothesis that integrates these two contrasting positions: on our view, phasic dopamine represents a TD-like reinforcement prediction error learning signal determined by both unexpected changes in the environment (temporary, intrinsic reinforcements) and biological rewards (permanent, extrinsic reinforcements). Accordingly, dopamine plays the functional role of driving both the discovery and acquisition of novel actions and the maximization of future rewards. To validate our hypothesis we perform a series of experiments with a simulated robotic system that has to learn different skills in order to get rewards. We compare different versions of the system in which we vary the composition of the learning signal. The results show that only the system reinforced by both extrinsic and intrinsic reinforcements is able to reach high performance in sufficiently complex conditions.
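A minimal sketch, under illustrative assumptions (tabular values and a hand-rolled count-based novelty bonus; none of this is the paper's simulated robotic setup), of the learning signal the abstract describes: a TD error driven by both a permanent extrinsic reward and a temporary intrinsic reward that decays as an event becomes familiar.

```python
from collections import defaultdict

class CombinedTD:
    """TD(0) value learner whose reinforcement sums extrinsic reward and a
    decaying intrinsic (novelty) bonus, in the spirit of the integrated
    hypothesis sketched above."""

    def __init__(self, alpha=0.1, gamma=0.95, novelty_scale=1.0):
        self.V = defaultdict(float)     # state-value estimates
        self.visits = defaultdict(int)  # visit counts for the novelty bonus
        self.alpha, self.gamma = alpha, gamma
        self.novelty_scale = novelty_scale

    def intrinsic(self, state):
        """Temporary reinforcement: large for unexpected (rarely seen) states,
        vanishing as the state becomes familiar."""
        self.visits[state] += 1
        return self.novelty_scale / self.visits[state]

    def update(self, s, extrinsic_reward, s_next):
        """TD-like prediction error on the combined reinforcement."""
        r = extrinsic_reward + self.intrinsic(s_next)
        delta = r + self.gamma * self.V[s_next] - self.V[s]  # dopamine-like signal
        self.V[s] += self.alpha * delta
        return delta

agent = CombinedTD()
d1 = agent.update("s0", 0.0, "s1")  # novel s1: positive error despite no reward
d2 = agent.update("s0", 0.0, "s1")  # s1 now familiar: intrinsic term shrinks
print(d1 > d2)  # → True
```

The decaying bonus is what makes the intrinsic reinforcement "temporary": the same transition produces a large dopamine-like error when novel and a small one once learned, while extrinsic reward keeps driving the error indefinitely.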
Saccade learning with concurrent cortical and subcortical basal ganglia loops
The basal ganglia are a central structure involved in multiple cortical and
subcortical loops. Some of these loops are believed to be responsible for
saccade target selection. We study here how the very specific structural
relationships of these saccadic loops affect the ability to learn
spatial and feature-based tasks.
We propose a model of saccade generation with reinforcement learning
capabilities based on our previous basal ganglia and superior colliculus
models. It is structured around the interactions of two parallel cortico-basal
loops and one tecto-basal loop. The two cortical loops separately deal with
spatial and non-spatial information to select targets in a concurrent way. The
subcortical loop is used to make the final target selection leading to the
production of the saccade. These different loops may work in concert or disturb
each other regarding reward maximization. Interactions between these loops and
their learning capabilities are tested on different saccade tasks.
The results show that the model correctly learns basic target
selection based on different criteria (spatial or not). Moreover, the model
reproduces and explains training-dependent express saccades toward targets
based on a spatial criterion.
Finally, the model predicts that in the absence of prefrontal control, the
spatial loop should dominate.
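A toy sketch of the architecture described above, under stated assumptions (the two-loop structure, learning rule, and all parameters are illustrative, not the paper's basal ganglia / superior colliculus model): two learners scoring targets on separate criteria, with a final selection stage combining their activities.

```python
import random

random.seed(0)
N_TARGETS = 4
ALPHA = 0.2

# Two "cortical loops": one values a spatial criterion, one a feature criterion.
spatial_pref = [0.0] * N_TARGETS
feature_pref = [0.0] * N_TARGETS

def select(prefs_a, prefs_b):
    """'Subcortical' final selection: targets compete on summed loop activity."""
    combined = [a + b for a, b in zip(prefs_a, prefs_b)]
    return max(range(N_TARGETS), key=combined.__getitem__)

def train(rewarded_target, steps=200):
    """Reinforce both loops on a task where one target yields reward."""
    for _ in range(steps):
        t = random.randrange(N_TARGETS)           # exploratory saccade
        r = 1.0 if t == rewarded_target else 0.0
        spatial_pref[t] += ALPHA * (r - spatial_pref[t])
        feature_pref[t] += ALPHA * (r - feature_pref[t])

train(rewarded_target=2)
print(select(spatial_pref, feature_pref))  # → 2
```

When the two loops are trained on conflicting criteria instead of the same one, the summed-activity selection stage is where they can "disturb each other", which is the interaction the abstract tests.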
Active Sensing as Bayes-Optimal Sequential Decision Making
Sensory inference under conditions of uncertainty is a major problem in both
machine learning and computational neuroscience. An important but poorly
understood aspect of sensory processing is the role of active sensing. Here, we
present a Bayes-optimal inference and control framework for active sensing,
C-DAC (Context-Dependent Active Controller). Unlike previously proposed
algorithms that optimize abstract statistical objectives such as information
maximization (Infomax) [Butko & Movellan, 2010] or one-step look-ahead accuracy
[Najemnik & Geisler, 2005], our active sensing model directly minimizes a
combination of behavioral costs, such as temporal delay, response error, and
effort. We simulate these algorithms on a simple visual search task to
illustrate scenarios in which context-sensitivity is particularly beneficial
and optimization with respect to generic statistical objectives particularly
inadequate. Motivated by the geometric properties of the C-DAC policy, we
present both parametric and non-parametric approximations, which retain
context-sensitivity while significantly reducing computational complexity.
These approximations enable us to investigate the more complex problem
involving peripheral vision, and we find that the difference between C-DAC
and statistical policies becomes even more evident in this scenario.
Comment: Scheduled to appear in UAI 201
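A heavily simplified sketch of the behavioral-cost objective the abstract contrasts with Infomax: a two-location search that accumulates noisy evidence and stops when the expected error cost of answering now falls below the per-step time cost of looking longer. All parameters are illustrative assumptions; the actual C-DAC policy is computed by dynamic programming over beliefs, not by this myopic stopping rule.

```python
import random

random.seed(1)

P_MATCH = 0.7   # assumed probability a cue reports the true target location
C_TIME = 0.05   # assumed behavioral cost per extra observation (temporal delay)
C_ERROR = 1.0   # assumed cost of a wrong response

def posterior_update(belief, cue):
    """Bayesian update of P(target at location 0) given a noisy cue (0 or 1)."""
    p0 = belief * (P_MATCH if cue == 0 else 1 - P_MATCH)
    p1 = (1 - belief) * (P_MATCH if cue == 1 else 1 - P_MATCH)
    return p0 / (p0 + p1)

def search(true_loc, max_steps=50):
    """Observe until stopping is cheaper than continuing, then respond."""
    belief = 0.5  # prior P(target at location 0)
    for t in range(1, max_steps + 1):
        cue = true_loc if random.random() < P_MATCH else 1 - true_loc
        belief = posterior_update(belief, cue)
        p_wrong = min(belief, 1 - belief)   # P(error) if we answer now
        if C_ERROR * p_wrong < C_TIME:      # cheaper to stop than to keep looking
            return (0 if belief > 0.5 else 1), t
    return (0 if belief > 0.5 else 1), max_steps

choice, steps = search(true_loc=0)
print(choice, steps)
```

Raising C_TIME makes the agent answer sooner and err more often, while an Infomax-style rule would keep sampling the most informative location regardless of how delay and error actually trade off; that cost-sensitivity is the context-dependence the abstract emphasizes.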