    Dopamine Bonuses

    Substantial data support a temporal difference (TD) model of dopamine (DA) neuron activity in which the cells provide a global error signal for reinforcement learning. However, in certain circumstances, DA activity seems anomalous under the TD model, responding to non-rewarding stimuli. We address these anomalies by suggesting that DA cells multiplex information about reward bonuses, including Sutton's exploration bonuses and Ng et al.'s non-distorting shaping bonuses. We interpret this additional role for DA in terms of the unconditional attentional and psychomotor effects of dopamine, having the computational role of guiding exploration.
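
    A minimal sketch of the idea, assuming a tabular TD(0) learner and a count-based novelty bonus in the spirit of Sutton's exploration bonuses (the bonus form, the constant beta, and the update rule are illustrative assumptions, not the paper's implementation):

        import numpy as np

        n_states = 10
        alpha, gamma, beta = 0.1, 0.95, 0.5   # learning rate, discount, bonus scale (assumed)
        V = np.zeros(n_states)                # state-value estimates
        visits = np.zeros(n_states)           # visit counts driving the novelty bonus

        def td_update(s, r, s_next):
            """One TD(0) step; delta plays the role of the phasic DA signal,
            multiplexing the reward prediction error with an exploration bonus."""
            visits[s_next] += 1
            bonus = beta / np.sqrt(visits[s_next])   # large for novel states, fades with familiarity
            delta = (r + bonus) + gamma * V[s_next] - V[s]
            V[s] += alpha * delta
            return delta

    Because the bonus decays as states become familiar, the "anomalous" responses to non-rewarding stimuli are transient, consistent with the habituation of DA responses to repeated novel stimuli.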

    Dopamine: Generalization and Bonuses

    In the temporal difference model of primate dopamine neurons, their phasic activity reports a prediction error for future reward. This model is supported by a wealth of experimental data. However, in certain circumstances, the activity of the dopamine cells seems anomalous under the model, as they respond in particular ways to stimuli that are not obviously related to predictions of reward. In this paper, we address two important sets of anomalies, those having to do with generalization and novelty. Generalization responses are treated as the natural consequence of partial information; novelty responses are treated by the suggestion that dopamine cells multiplex information about reward bonuses, including exploration bonuses and shaping bonuses. We interpret this additional role for dopamine in terms of the mechanistic attentional and psychomotor effects of dopamine, having the computational role of guiding exploration.
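
    To make the "non-distorting" property of shaping bonuses concrete, here is a small sketch of a potential-based shaping term in the sense of Ng et al.; the potential function phi below is a hypothetical placeholder:

        gamma = 0.95

        def phi(s):
            return float(s)   # hypothetical potential over states, illustration only

        def shaped_td_error(V, s, r, s_next, done=False):
            # Shaping term F(s, s') = gamma * phi(s') - phi(s). It telescopes
            # along any trajectory, so it alters TD errors transiently without
            # changing the optimal policy.
            shaping = (0.0 if done else gamma * phi(s_next)) - phi(s)
            target = r + shaping + (0.0 if done else gamma * V[s_next])
            return target - V[s]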

    Contextual novelty changes reward representations in the striatum

    Reward representation in ventral striatum is boosted by perceptual novelty, although the mechanism of this effect remains elusive. Animal studies indicate that a functional loop (Lisman and Grace, 2005) including hippocampus, ventral striatum, and midbrain is important in regulating salience attribution in the context of novel stimuli. According to this model, reward responses in ventral striatum or midbrain should be enhanced in the context of novelty even if reward and novelty constitute unrelated, independent events. Using fMRI, we show that trials with reward-predictive cues and subsequent outcomes elicit higher responses in the striatum if preceded by an unrelated novel picture, indicating that reward representation is enhanced in the context of novelty. Notably, this effect was observed solely when reward occurrence, and hence reward-related salience, was low. These findings support a view that contextual novelty enhances neural responses underlying reward representation in the striatum and concur with the effects of novelty processing as predicted by the model of Lisman and Grace (2005).

    Phasic dopamine as a prediction error of intrinsic and extrinsic reinforcement driving both action acquisition and reward maximization: A simulated robotic study

    An important issue in recent neuroscientific research is understanding the functional role of the phasic release of dopamine in the striatum, and in particular its relation to reinforcement learning. The literature is split between two alternative hypotheses: one considers phasic dopamine as a reward prediction error similar to the computational TD-error, whose function is to guide an animal to maximize future rewards; the other holds that phasic dopamine is a sensory prediction error signal that lets the animal discover and acquire novel actions. In this paper we propose an original hypothesis that integrates these two contrasting positions: in our view, phasic dopamine represents a TD-like reinforcement prediction error learning signal determined by both unexpected changes in the environment (temporary, intrinsic reinforcements) and biological rewards (permanent, extrinsic reinforcements). Accordingly, dopamine plays the functional role of driving both the discovery and acquisition of novel actions and the maximization of future rewards. To validate our hypothesis we perform a series of experiments with a simulated robotic system that has to learn different skills in order to obtain rewards. We compare different versions of the system in which we vary the composition of the learning signal. The results show that only the system reinforced by both extrinsic and intrinsic reinforcements is able to reach high performance in sufficiently complex conditions.
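
    A minimal sketch of such a composite learning signal, under assumed functional forms (the event-prediction rule and all constants are illustrative, not the authors' simulated robotic setup):

        import numpy as np

        alpha, gamma, eta = 0.1, 0.95, 0.2   # value learning rate, discount, event-prediction rate
        V = np.zeros(50)                     # state values
        P = np.zeros(50)                     # learned predictions of a salient sensory event

        def composite_td_step(s, extrinsic_r, event_occurred, s_next):
            # Intrinsic reinforcement: positive while the sensory event is still
            # surprising; temporary by construction, vanishing once predicted.
            intrinsic_r = max(float(event_occurred) - P[s], 0.0)
            P[s] += eta * (float(event_occurred) - P[s])
            # Extrinsic reinforcement is permanent; both drive one TD-like error.
            delta = (extrinsic_r + intrinsic_r) + gamma * V[s_next] - V[s]
            V[s] += alpha * delta
            return delta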

    What causes aberrant salience in schizophrenia? A role for impaired short-term habituation and the GRIA1 (GluA1) AMPA receptor subunit.

    The GRIA1 locus, encoding the GluA1 (also known as GluRA or GluR1) AMPA glutamate receptor subunit, shows genome-wide association to schizophrenia. As well as extending the evidence that glutamatergic abnormalities have a key role in the disorder, this finding draws attention to the behavioural phenotype of Gria1 knockout mice. These mice show deficits in short-term habituation. Importantly, under some conditions the attention paid to a recently presented neutral stimulus can actually increase rather than decrease (sensitization). We propose that this mouse phenotype represents a cause of aberrant salience and, in turn, that aberrant salience (and the resulting positive symptoms) in schizophrenia may arise, at least in part, from a glutamatergic genetic predisposition and a deficit in short-term habituation. This proposal links an established risk gene with a psychological process central to psychosis and is supported by findings of comparable deficits in short-term habituation in mice lacking the NMDA receptor subunit gene Grin2a (which also shows association to schizophrenia). As aberrant salience is primarily a dopaminergic phenomenon, the model supports the view that the dopaminergic abnormalities can be downstream of a glutamatergic aetiology. Finally, we suggest that, as illustrated here, the real value of genetically modified mice is not as ‘models of schizophrenia’ but as experimental tools that can link genomic discoveries with psychological processes and help elucidate the underlying neural mechanisms.
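
    A toy illustration of the contrast between intact short-term habituation and a sensitizing deficit (the update rule and rates are hypothetical, chosen only to show the qualitative pattern described above):

        def salience_trace(n_presentations, rate):
            # rate < 0: habituation (attention to a repeated neutral stimulus fades);
            # rate > 0: sensitization (attention aberrantly grows, as in Gria1 knockouts).
            s, trace = 1.0, []
            for _ in range(n_presentations):
                trace.append(s)
                s = max(s * (1.0 + rate), 0.0)
            return trace

        wild_type = salience_trace(10, rate=-0.3)   # salience decays across presentations
        gria1_ko = salience_trace(10, rate=+0.2)    # salience grows: aberrant salience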

    Reinforcement learning or active inference?

    This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.
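
    For context, the mountain-car benchmark in its standard discrete-time form (as in Sutton and Barto); the paper itself treats a continuous variant under the free-energy principle, so this sketch only fixes what problem is being solved, not the authors' method:

        import math

        def mountain_car_step(x, v, a):
            """a in {-1, 0, +1}: thrust left, coast, thrust right."""
            v = min(max(v + 0.001 * a - 0.0025 * math.cos(3 * x), -0.07), 0.07)
            x = min(max(x + v, -1.2), 0.5)
            if x <= -1.2:
                v = 0.0              # inelastic collision with the left wall
            return x, v              # goal: drive the underpowered car to x >= 0.5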