216 research outputs found

    Dopamine, uncertainty and TD learning

    Substantial evidence suggests that the phasic activity of dopaminergic neurons in the primate midbrain represents a temporal difference (TD) error in predictions of future reward, with increases above and decreases below baseline consequent on positive and negative prediction errors, respectively. However, dopamine cells have very low baseline activity, which implies that the representation of these two sorts of error is asymmetric. We explore the implications of this seemingly innocuous asymmetry for the interpretation of dopaminergic firing patterns in experiments with probabilistic rewards, which bring about persistent prediction errors. In particular, we show that when the non-stationary prediction errors are averaged across trials, a ramp in the activity of the dopamine neurons should be apparent, whose magnitude depends on the learning rate. Exactly this phenomenon was observed in a recent experiment, though it was interpreted there in antipodal terms, as a within-trial encoding of uncertainty.
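The averaging effect the abstract describes can be illustrated with a short simulation (a deliberately minimal sketch, not the paper's actual model; the delay length, learning rate, and negative-error compression factor below are all assumed): a TD learner receives reward with probability 0.5 at a fixed delay after a cue, negative prediction errors are compressed by a factor d < 1 to mimic the low dopamine baseline, and the trial-averaged represented error then ramps toward the time of reward.

```python
import numpy as np

rng = np.random.default_rng(0)
T, p, alpha, d = 10, 0.5, 0.1, 1 / 6   # delay steps, reward prob., learning rate, negative-error scale (assumed)
trials = 20000
V = np.zeros(T + 1)                    # one value estimate per timestep after the cue
deltas = np.zeros((trials, T + 1))

for tr in range(trials):
    rewarded = rng.random() < p        # probabilistic reward -> persistent prediction errors
    for t in range(T + 1):
        r = 1.0 if (t == T and rewarded) else 0.0
        v_next = V[t + 1] if t < T else 0.0
        delta = r + v_next - V[t]      # TD error (gamma = 1)
        V[t] += alpha * delta          # learning never stops, so errors stay non-stationary
        deltas[tr, t] = delta

# Low baseline firing: below-baseline dips can only represent errors compressed by d < 1.
represented = np.where(deltas >= 0, deltas, d * deltas)
psth = represented[trials // 2:].mean(axis=0)   # trial average after learning settles
# psth[1:T] rises toward the time of reward: an apparent ramp produced by averaging alone.
```

Because error fluctuations are largest near the reward, the asymmetric representation turns zero-mean errors into a positive average that grows across the delay, which is the ramp at issue.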

    Tamping Ramping: Algorithmic, Implementational, and Computational Explanations of Phasic Dopamine Signals in the Accumbens.

    Substantial evidence suggests that the phasic activity of dopamine neurons represents reinforcement learning's temporal difference prediction error. However, recent reports of ramp-like increases in dopamine concentration in the striatum when animals are about to act, or are about to reach rewards, appear to pose a challenge to established thinking: the implied activity is persistently predictable by preceding stimuli, and so cannot arise as this sort of prediction error. Here, we explore three possible accounts of such ramping signals: (a) the resolution of uncertainty about the timing of action; (b) the direct influence of dopamine over mechanisms associated with making choices; and (c) a new model of discounted vigour. Collectively, these suggest that dopamine ramps may be explained, with only minor disturbance, by standard theoretical ideas, though urgent questions remain regarding their proximal cause. We suggest experimental approaches to disentangling which of the proposed mechanisms are responsible for dopamine ramps.

    Midbrain dopamine neurons signal phasic and ramping reward prediction error during goal-directed navigation

    Goal-directed navigation requires learning to accurately estimate location and to select optimal actions in each location. Midbrain dopamine neurons are involved in reward value learning and have been linked to reward location learning. They are therefore ideally placed to provide teaching signals for goal-directed navigation. By imaging dopamine neural activity as mice learned to actively navigate a closed-loop virtual reality corridor to obtain reward, we observe phasic and pre-reward ramping dopamine activity, both modulated by learning stage and task engagement. A Q-learning model incorporating position inference recapitulates our results, displaying prediction errors that resemble the phasic and ramping dopamine neural activity. The model predicts that ramping is followed by improved task performance, which we confirm in our experimental data, indicating that the dopamine ramp may have a teaching effect. Our results suggest that midbrain dopamine neurons encode phasic and ramping reward prediction error signals to improve goal-directed navigation.
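The Q-learning component can be sketched in a stripped-down form (a hypothetical tabular agent on a linear track; the paper's position-inference machinery is omitted, and all sizes and parameters below are illustrative). The TD error delta that drives the update is the quantity such models compare to dopamine activity.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8                                  # corridor length (assumed); reward on reaching position N
Q = np.zeros((N + 1, 2))               # actions: 0 = stay, 1 = step forward
alpha, gamma = 0.2, 0.9

for episode in range(500):
    s = 0
    for _ in range(60):
        a = int(rng.integers(2))       # random behaviour policy; Q-learning is off-policy
        s_next = min(s + a, N)
        r = 1.0 if s_next == N else 0.0
        target = r if s_next == N else r + gamma * Q[s_next].max()
        delta = target - Q[s, a]       # reward prediction error
        Q[s, a] += alpha * delta
        s = s_next
        if s == N:                     # reward collected; end episode
            break

policy = Q[:N].argmax(axis=1)          # greedy policy: step forward at every position
```

With discounting, the learned values themselves rise toward the goal, so an agent reading out value along the track produces a pre-reward ramp of exactly the kind the abstract discusses.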

    Valuation and Decision-Making in Cortical-Striatal Circuits.

    Adaptive decision-making relies on a distributed network of neural substrates that learn associations between behaviors and outcomes, to ultimately guide future behavior. These substrates are organized in a system of cortical-striatal loops that offer unique contributions to goal-directed behavior and receive prominent inputs from the midbrain dopamine system. However, the consequences of dopamine fluctuations at these targets remain largely unresolved, despite aggressive interrogation. Some experiments have highlighted dopamine's role in learning via reward prediction errors, while others have noted the importance of dopamine in motivated behavior. Here, I explored the precise role of dopamine in shaping decision-making in cortex and striatum. First, I measured dopamine in ventral striatum during a trial-and-error task and show that it uniformly encodes a moment-by-moment estimate of value across multiple timescales. My optogenetic manipulations demonstrate that changes in this value signal can be used to immediately enhance vigor, consistent with a motivational signal, and to alter subsequent choice behavior, consistent with a learning signal. Next, I measured dopamine in multiple cortical-striatal loops to examine the uniformity of the value signal. I report that dopamine is non-uniform across circuits but consistent within them, implying that dopamine may offer unique contributions to the information processed in each loop. Finally, I performed single-unit recordings in the dorsal striatum, a major recipient of dopamine, to examine whether its distinct subcompartments (the patch and matrix) carry distinct value signals used in the selection of actions. I report preliminary data and summarize improvements in my electrode localization technique.
    Ph.D., Psychology, University of Michigan, Horace H. Rackham School of Graduate Studies.
    http://deepblue.lib.umich.edu/bitstream/2027.42/133227/1/jpettibo_1.pd

    Dopamine Contributions to Motivational Vigor and Reinforcement Driven Learning.

    Brain mechanisms for reinforcement learning and adaptive decision-making are widely accepted to critically involve the basal ganglia (BG) and the neurotransmitter dopamine (DA). DA is a key modulator of synaptic plasticity within the striatum, critically regulating neurophysiological adaptations for normal reinforcement-driven learning and maladaptive changes during disease conditions (e.g. drug addiction, Parkinson's disease). Activity in midbrain DA cells is reported to encode errors in reward prediction, providing a learning signal to guide future behaviors. Yet dopamine is also a key modulator of motivation, invigorating current behavior. Prevailing theories of DA emphasize its role in either affecting current performance or modulating reward-related learning. This thesis presents data aimed at resolving gaps in the literature concerning how DA makes simultaneous contributions to dissociable learning and motivational processes. Specifically, I argue that striatal DA fluctuations signal a single decision variable: a value function (an ongoing estimate of discounted future rewards) that is used for motivational decision-making ('Is it worth it?'), and that abrupt deflections in this value function serve as temporal-difference reward prediction errors used for reinforcement learning ('Repeat action?'). These DA prediction errors may be causally involved in strengthening some, but not all, valuation mechanisms. Furthermore, DA activity along the midbrain-forebrain axis indicates a dissociation between DA cell bodies and their striatal terminals. I propose that this is an adaptive computational strategy, whereby DA targets tailor release to their own computational requirements, potentially converting an RPE-like spike signal into a motivational (value) message.
    Ph.D., Neuroscience, University of Michigan, Horace H. Rackham School of Graduate Studies.
    http://deepblue.lib.umich.edu/bitstream/2027.42/135768/1/hamidaa_1.pd
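The two quantities this abstract links, a value function and its abrupt deflections, have standard definitions in reinforcement learning (written here in generic textbook notation, not necessarily the thesis's own):

```latex
% Value: expected discounted future reward from the state at time t
V(s_t) = \mathbb{E}\!\left[\sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k}\right]

% TD reward prediction error: the "abrupt deflection" in the value estimate
\delta_t = r_t + \gamma\, V(s_{t+1}) - V(s_t)
```

On this reading, a slowly varying dopamine signal reports V itself (the motivational 'Is it worth it?' variable), while its fast transients report delta_t, the learning signal.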

    Dopaminergic signalling during goal-directed behaviour in a structured environment

    During flexible behaviour, dopamine is thought to carry reward prediction errors (RPEs), which update values and hence modify future behaviour. However, in real-world situations where the statistical relationships in the environment can be learned, continuously adapting values is not always the most efficient way of adapting to change. Moreover, the environment is not always fully observable, and observations may provide only partial information about the current state of the world. In such partially observable structured environments, common in real-world situations, it is not well understood what kind of information dopamine conveys or what causal role it plays in shaping adaptive behaviour. To probe dopamine's involvement in goal-directed behavioural flexibility in such environments, we measured and manipulated dopamine while mice solved a partially observable, structured sequential decision task. In chapter 3, we show that mice solve such a task using state inference. In chapter 4, we recorded calcium activity from dopaminergic cell bodies in the ventral tegmental area and from dopamine axonal projections in the ventral and dorsomedial striatum, as well as dopamine concentrations in the same striatal regions. Dopamine multiplexed a wide range of information: at different timescales, dopamine signalling was consistent with carrying choice-specific RPEs, choice-independent reward history, and lateralised movement signals. RPE computations were shaped by task structure and the inferred state of the task. Nonetheless, in chapter 5, we show that although dopamine responded strongly to rewards, optogenetically activating or inhibiting dopamine at the time of trial outcome had no effect on subsequent choice. In a different task context, however, the same stimulation had a substantial effect on animals' choices. We therefore conclude that when inference guides choice, rewards have a dopamine-independent influence on policy through the information they carry about the world's state.

    Dynamics of dopamine signaling and network activity in the striatum during learning and motivated pursuit of goals

    Thesis (Ph.D. in Neuroscience), Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, February 2013. Includes bibliographical references (p. 118-126).
    Learning to direct behaviors towards goals is a central function of all vertebrate nervous systems. Initial learning often involves an exploratory phase, in which actions are flexible and highly variable. With repeated successful experience, behaviors may be guided by cues in the environment that reliably predict the desired outcome, and eventually behaviors can be executed as crystallized action sequences, or "habits", which are relatively inflexible. Parallel circuits through the basal ganglia and their inputs from midbrain dopamine neurons are believed to make critical contributions to these phases of learning and behavioral execution. To explore the neural mechanisms underlying goal-directed learning and behavior, I employed electrophysiological and electrochemical techniques to measure neural activity and dopamine release in networks of the striatum, the principal input nucleus of the basal ganglia, as rats learned to pursue rewards in mazes. The electrophysiological recordings revealed training-dependent dynamics in striatal local field potentials and coordinated neural firing that may differentially support both network rigidity and flexibility during pursuit of goals. Electrochemical measurements of real-time dopamine signaling during maze running revealed prolonged signaling changes that may contribute to motivating or guiding behavior. Pathological over- or under-expression of these network states may contribute to symptoms experienced in a range of basal ganglia disorders, from Parkinson's disease to drug addiction.
    By Mark W. Howe. Ph.D. in Neuroscience.

    The value of what’s to come: Neural mechanisms coupling prediction error and the utility of anticipation

    Having something to look forward to is a keystone of well-being. Anticipation of a future reward, such as an upcoming vacation, can often be more gratifying than the experience itself. Theories suggest that the utility of anticipation underpins various behaviors, ranging from beneficial information-seeking to harmful addiction. However, how neural systems compute anticipatory utility remains unclear. We analyzed the brain activity of human participants as they performed a task involving choosing whether to receive information predictive of future pleasant outcomes. Using a computational model, we show that three brain regions orchestrate anticipatory utility: ventromedial prefrontal cortex tracks the value of anticipatory utility, dopaminergic midbrain correlates with information that enhances anticipation, and sustained hippocampal activity mediates a functional coupling between these regions. Our findings suggest a previously unidentified neural underpinning for anticipation's influence over decision-making and unify a range of phenomena associated with risk and time-delay preference.

    The Value of Beliefs


    A Neurocomputational Model of the Functional Role of Dopamine in Stimulus-Response Task Learning and Performance

    Thesis (Ph.D.), Indiana University, Psychology, 2009.
    The neuromodulatory neurotransmitter dopamine (DA) plays a complex but central role in the learning and performance of stimulus-response (S-R) behaviors. Studies have implicated DA in reward-driven learning and in setting the overall level of vigor or frequency of response. Here, a neurocomputational model is developed which captures DA's influence on a set of brain regions believed to be involved in the learning and execution of S-R tasks, including frontal cortex, basal ganglia, and cingulate cortex. An 'actor' component of the model is trained, using 'babble' (random behavior selection) and 'critic' (rewarding and punishing) components of the model, to perform acceptance/rejection responses upon presentation of color stimuli in the context of recently presented auditory tones. The model behaves like an autonomous organism learning (and relearning) through trial and error. The focus of the study, the impact of hypo- and hyper-normal DA activity on this model, is investigated by independently manipulating three dopaminergic pathways (two striatal and one prefrontal cortical) during the learning and performance of the color response task. Hypo-DA conditions, analogous to Parkinsonism, slow and reduce the frequency of learned responses and, at extremes, degrade the learning (either initial or reversal) of the task. Hyper-DA conditions, analogous to psychostimulant effects, produce more rapid response times but can also lead to perseveration of incorrectly learned responses on the task. Which effects appear often depends on which DA-ergic pathway is manipulated, however, which has implications for the interpretation of pharmacological experimental data.
    The proposed model embodies an integrative theory of dopamine function which suggests that the base rate of DA cell activity encodes the overall 'activity-oriented motivation' of the organism, with hunger and/or expectation of reward driving both response vigor and the tendency to generate an explorative 'babble' response. This more tonic feature of DA functionality coexists naturally with the more extensively studied phasic reward-learning features. The model may provide better insight into the role of DA system dysfunction in the cognitive and motivational symptoms of disorders such as Parkinsonism, psychostimulant abuse, ADHD, OCD, and schizophrenia, accounting for deficits in both learning and performance of tasks.
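The integrative idea, phasic DA as a reinforcement signal and tonic DA as a vigor gain, can be sketched as a minimal actor-critic on a two-response task (all variable names and parameter values here are illustrative assumptions, not the thesis's model):

```python
import numpy as np

rng = np.random.default_rng(2)
prefs = np.zeros(2)                    # actor's preferences for the two responses
V = 0.0                                # critic's reward prediction
alpha_critic, alpha_actor = 0.1, 0.1
tonic_da = 1.0                         # hypothetical tonic gain: higher values sharpen responding

for trial in range(2000):
    # softmax 'babble': random exploration, sharpened by the tonic DA gain
    logits = tonic_da * prefs
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    a = int(rng.choice(2, p=probs))
    r = 1.0 if a == 0 else 0.0         # response 0 is the rewarded one
    delta = r - V                      # phasic 'critic' prediction error
    V += alpha_critic * delta          # critic learns the expected reward
    prefs[a] += alpha_actor * delta    # actor reinforced or punished by the same error
```

Raising or lowering `tonic_da` changes response determinism without touching the learning rule, which is one way to separate performance effects from learning effects as the abstract proposes.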