Dopamine Bonuses
Substantial data support a temporal difference (TD) model of dopamine (DA) neuron activity in which the cells provide a global error signal for reinforcement learning. However, in certain circumstances, DA activity seems anomalous under the TD model, responding to non-rewarding stimuli. We address these anomalies by suggesting that DA cells multiplex information about reward bonuses, including Sutton's exploration bonuses and Ng et al's non-distorting shaping bonuses. We interpret this additional role for DA in terms of the unconditional attentional and psychomotor effects of dopamine, having the computational role of guiding exploration
Dopamine: Generalization and Bonuses
In the temporal difference model of primate dopamine neurons, their phasic activity reports a prediction error for future reward. This model is supported by a wealth of experimental data. However, in certain circumstances, the activity of the dopamine cells seems anomalous under the model, as they respond in particular ways to stimuli that are not obviously related to predictions of reward. In this paper, we address two important sets of anomalies, those having to do with generalization and novelty. Generalization responses are treated as the natural consequence of partial information; novelty responses are treated by the suggestion that dopamine cells multiplex information about reward bonuses, including exploration bonuses and shaping bonuses. We interpret this additional role for dopamine in terms of the mechanistic attentional and psychomotor effects of dopamine, having the computational role of guiding exploration
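The bonus idea in these two abstracts can be condensed into a few lines. Below is a minimal sketch, not the authors' model: a tabular TD(0) learner whose learning signal multiplexes extrinsic reward with a novelty bonus that habituates as a state is revisited. The 1/visits bonus schedule and all names are illustrative assumptions.

```python
# Minimal sketch of a TD error that multiplexes reward and a novelty bonus.
# The 1/visits bonus schedule, names, and constants are illustrative
# assumptions, not taken from the papers above.

def make_td_learner(n_states, alpha=0.1, gamma=0.9, bonus_scale=1.0):
    V = [0.0] * n_states        # state values
    visits = [0] * n_states     # visit counts driving the novelty bonus

    def step(s, s_next, extrinsic_reward):
        visits[s_next] += 1
        # Exploration bonus: large for novel states, habituating with familiarity.
        bonus = bonus_scale / visits[s_next]
        # TD error computed on the combined (extrinsic + bonus) signal.
        delta = extrinsic_reward + bonus + gamma * V[s_next] - V[s]
        V[s] += alpha * delta
        return delta

    return V, step
```

On this sketch, a first encounter with a state yields a large error even without reward, and the response shrinks with repetition, mirroring the novelty responses the model is meant to capture.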
Contextual novelty changes reward representations in the striatum
Reward representation in ventral striatum is boosted by perceptual novelty, although the mechanism of this effect remains elusive. Animal studies indicate a functional loop (Lisman and Grace, 2005) that includes hippocampus, ventral striatum, and midbrain as being important in regulating salience attribution within the context of novel stimuli. According to this model, reward responses in ventral striatum or midbrain should be enhanced in the context of novelty even if reward and novelty constitute unrelated, independent events. Using fMRI, we show that trials with reward-predictive cues and subsequent outcomes elicit higher responses in the striatum if preceded by an unrelated novel picture, indicating that reward representation is enhanced in the context of novelty. Notably, this effect was observed solely when reward occurrence, and hence reward-related salience, was low. These findings support a view that contextual novelty enhances neural responses underlying reward representation in the striatum and concur with the effects of novelty processing as predicted by the model of Lisman and Grace (2005)
Phasic dopamine as a prediction error of intrinsic and extrinsic reinforcement driving both action acquisition and reward maximization: A simulated robotic study
An important issue in recent neuroscientific research is to understand the functional role of the phasic release of dopamine in the striatum, and in particular its relation to reinforcement learning. The literature is split between two alternative hypotheses: one considers phasic dopamine as a reward prediction error similar to the computational TD-error, whose function is to guide an animal to maximize future rewards; the other holds that phasic dopamine is a sensory prediction error signal that lets the animal discover and acquire novel actions. In this paper we propose an original hypothesis that integrates these two contrasting positions: according to our view phasic dopamine represents a TD-like reinforcement prediction error learning signal determined by both unexpected changes in the environment (temporary, intrinsic reinforcements) and biological rewards (permanent, extrinsic reinforcements). Accordingly, dopamine plays the functional role of driving both the discovery and acquisition of novel actions and the maximization of future rewards. To validate our hypothesis we perform a series of experiments with a simulated robotic system that has to learn different skills in order to get rewards. We compare different versions of the system in which we vary the composition of the learning signal. The results show that only the system reinforced by both extrinsic and intrinsic reinforcements is able to reach high performance in sufficiently complex conditions
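The integrated signal proposed here can be sketched as follows (my illustration, not the authors' robotic simulation): the learning signal sums a permanent extrinsic reward with a temporary intrinsic reinforcement equal to the prediction error of a simple forward model, so the intrinsic term extinguishes by itself as the action's consequences become predictable.

```python
# Sketch of the integrated hypothesis (illustrative, not the authors' code):
# the learning signal is extrinsic reward plus a temporary intrinsic term
# given by the error of a forward model. As the model learns, repeated
# outcomes stop being surprising and the intrinsic term vanishes.

class IntrinsicExtrinsicSignal:
    def __init__(self, lr=0.5):
        self.pred = {}   # forward model: action -> predicted sensory outcome
        self.lr = lr

    def learning_signal(self, action, sensory_outcome, extrinsic_reward=0.0):
        predicted = self.pred.get(action, 0.0)
        intrinsic = abs(sensory_outcome - predicted)   # surprise, temporary
        # Update the forward model so repeated outcomes stop being surprising.
        self.pred[action] = predicted + self.lr * (sensory_outcome - predicted)
        return extrinsic_reward + intrinsic
```

A novel action outcome thus reinforces its own acquisition at first, while a persistent biological reward keeps driving learning after the surprise has worn off.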
The value of novelty in schizophrenia
Influential models of schizophrenia suggest that patients experience incoming stimuli as excessively novel and motivating, with important consequences for hallucinatory experience and delusional belief. However, whether schizophrenia patients exhibit excessive novelty value and whether this interferes with adaptive behaviour has not yet been formally tested. Here, we employed a three-armed bandit task to investigate this hypothesis. Schizophrenia patients and healthy controls were first familiarised with a group of images and then asked to repeatedly choose between familiar and unfamiliar images associated with different monetary reward probabilities. By fitting a reinforcement-learning model we were able to estimate the values attributed to familiar and unfamiliar images when first presented in the context of the decision-making task. In line with our hypothesis, we found increased preference for newly introduced images (irrespective of whether these were familiar or unfamiliar) in patients compared to healthy controls, and found that this preference correlated with severity of hallucinatory experience. In addition, we found a correlation between value assigned to novel images and task performance, suggesting that excessive novelty value may interfere with optimal learning in patients, putatively through the disruption of the mechanisms regulating exploration versus exploitation. Our results suggest excessive novelty value in patients, whereby even previously seen stimuli acquire higher value as the result of their exposure in a novel context – a form of ‘hyper novelty’ which may explain why patients are often attracted by familiar stimuli experienced as new
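A hedged sketch of the kind of model fitted in such a study: a softmax bandit learner whose free parameter `novelty_value` sets the starting value of any newly introduced option, so a larger value biases choice toward new arms. The names, parameter values, and arm schedule below are illustrative assumptions, not the authors' fitting code.

```python
# Softmax bandit with a novelty-value parameter (illustrative sketch only).
import math
import random

def simulate_bandit(novelty_value, arms, n_trials=300,
                    alpha=0.2, beta=3.0, seed=0):
    """arms: list of (trial_introduced, reward_probability)."""
    rng = random.Random(seed)
    Q = {}
    choices = []
    for t in range(n_trials):
        available = [a for a, (t0, _) in enumerate(arms) if t0 <= t]
        for a in available:
            Q.setdefault(a, novelty_value)   # new options start at novelty_value
        # Softmax choice over the currently available arms.
        weights = [math.exp(beta * Q[a]) for a in available]
        r = rng.random() * sum(weights)
        arm = available[-1]
        for a, w in zip(available, weights):
            if r < w:
                arm = a
                break
            r -= w
        reward = 1.0 if rng.random() < arms[arm][1] else 0.0
        Q[arm] += alpha * (reward - Q[arm])   # Rescorla-Wagner value update
        choices.append(arm)
    return choices
```

Fitting `novelty_value` per participant, as the abstract describes, would then let a group comparison test whether patients assign systematically higher starting values to newly introduced images.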
What causes aberrant salience in schizophrenia? A role for impaired short-term habituation and the GRIA1 (GluA1) AMPA receptor subunit.
The GRIA1 locus, encoding the GluA1 (also known as GluRA or GluR1) AMPA glutamate receptor subunit, shows genome-wide association to schizophrenia. As well as extending the evidence that glutamatergic abnormalities have a key role in the disorder, this finding draws attention to the behavioural phenotype of Gria1 knockout mice. These mice show deficits in short-term habituation. Importantly, under some conditions the attention being paid to a recently presented neutral stimulus can actually increase rather than decrease (sensitization). We propose that this mouse phenotype represents a cause of aberrant salience and, in turn, that aberrant salience (and the resulting positive symptoms) in schizophrenia may arise, at least in part, from a glutamatergic genetic predisposition and a deficit in short-term habituation. This proposal links an established risk gene with a psychological process central to psychosis and is supported by findings of comparable deficits in short-term habituation in mice lacking the NMDAR receptor subunit Grin2a (which also shows association to schizophrenia). As aberrant salience is primarily a dopaminergic phenomenon, the model supports the view that the dopaminergic abnormalities can be downstream of a glutamatergic aetiology. Finally, we suggest that, as illustrated here, the real value of genetically modified mice is not as ‘models of schizophrenia’ but as experimental tools that can link genomic discoveries with psychological processes and help elucidate the underlying neural mechanisms
Reinforcement learning or active inference?
This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain
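The flavour of the argument can be shown with a toy reduction (entirely my own, far simpler than the paper's mountain-car agent): the goal is encoded as a prior expectation over observations, and action descends a free-energy proxy, here just the squared prediction error, with no reward, value, or utility term anywhere.

```python
# Toy reduction of the active-inference idea (illustrative only; the paper's
# mountain-car agent is far richer). The agent's "goal" is a prior expectation
# over observations; action minimizes a free-energy proxy, not a reward.

def active_inference_step(x, expected_obs, action_gain=0.2):
    error = expected_obs - x            # sensory prediction error
    free_energy = 0.5 * error ** 2      # free-energy proxy
    x_new = x + action_gain * error     # act on the world to reduce the error
    return x_new, free_energy

x, prior = 0.0, 1.0                     # start far from the expected observation
trace = []
for _ in range(30):
    x, fe = active_inference_step(x, prior)
    trace.append(fe)
```

The agent ends up at the state a reward-maximizing controller would also choose, but the quantity being optimised is the mismatch between expected and actual sensation.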
Midbrain Dopamine Neurons Signal Belief in Choice Accuracy during a Perceptual Decision
Central to the organization of behavior is the ability to predict the values of outcomes to guide choices. The accuracy of such predictions is honed by a teaching signal that indicates how incorrect a prediction was (“reward prediction error,” RPE). In several reinforcement learning contexts, such as Pavlovian conditioning and decisions guided by reward history, this RPE signal is provided by midbrain dopamine neurons. In many situations, however, the stimuli predictive of outcomes are perceptually ambiguous. Perceptual uncertainty is known to influence choices, but it has been unclear whether or how dopamine neurons factor it into their teaching signal. To cope with uncertainty, we extended a reinforcement learning model with a belief state about the perceptually ambiguous stimulus; this model generates an estimate of the probability of choice correctness, termed decision confidence. We show that dopamine responses in monkeys performing a perceptually ambiguous decision task comply with the model’s predictions. Consequently, dopamine responses did not simply reflect a stimulus’ average expected reward value but were predictive of the trial-to-trial fluctuations in perceptual accuracy. These confidence-dependent dopamine responses emerged prior to monkeys’ choice initiation, raising the possibility that dopamine impacts impending decisions, in addition to encoding a post-decision teaching signal. Finally, by manipulating reward size, we found that dopamine neurons reflect both the upcoming reward size and the confidence in achieving it. Together, our results show that dopamine responses convey teaching signals that are also appropriate for perceptual decisions
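The belief-state extension described here can be sketched as follows (structure and the logistic mapping are my assumptions, not the authors' model): confidence is the posterior probability that the choice is correct given noisy evidence, and the predicted value, hence the reward prediction error, scales with both confidence and reward size.

```python
# Illustrative belief-state sketch (not the authors' model): decision
# confidence is derived from noisy evidence, and the outcome-time RPE is
# outcome minus a value prediction that scales with confidence and reward size.
import math

def confidence(evidence, noise_sd=1.0):
    # Posterior p(correct | evidence) for evidence drawn from Gaussians at +/-1
    # with equal priors; stronger evidence pushes confidence toward 1.
    return 1.0 / (1.0 + math.exp(-2.0 * abs(evidence) / noise_sd ** 2))

def reward_prediction_error(evidence, reward_size, outcome):
    predicted_value = confidence(evidence) * reward_size
    return outcome - predicted_value    # outcome-time RPE
```

Under this sketch, a rewarded trial with ambiguous evidence produces a larger positive RPE than a rewarded trial with clear evidence, which is the confidence-dependent pattern the abstract reports.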