24 research outputs found

    Midbrain dopamine neurons signal phasic and ramping reward prediction error during goal-directed navigation

    Goal-directed navigation requires learning to accurately estimate location and select optimal actions in each location. Midbrain dopamine neurons are involved in reward value learning and have been linked to reward location learning. They are therefore ideally placed to provide teaching signals for goal-directed navigation. By imaging dopamine neural activity as mice learned to actively navigate a closed-loop virtual reality corridor to obtain reward, we observe phasic and pre-reward ramping dopamine activity, both modulated by learning stage and task engagement. A Q-learning model incorporating position inference recapitulates our results, displaying prediction errors resembling phasic and ramping dopamine neural activity. The model predicts that ramping is followed by improved task performance, which we confirm in our experimental data, indicating that the dopamine ramp may have a teaching effect. Our results suggest that midbrain dopamine neurons encode phasic and ramping reward prediction error signals to improve goal-directed navigation.
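The Q-learning model with position inference is not spelled out in the abstract; a generic temporal-difference sketch of value learning on a linear track, with assumed parameters, illustrates why learned values ramp toward the rewarded location:

```python
def td_corridor_values(n_states=20, alpha=0.1, gamma=0.95, n_episodes=200):
    """TD(0) value learning on a linear track with reward at the goal.

    All parameter values here are illustrative assumptions, not taken
    from the paper.
    """
    V = [0.0] * (n_states + 1)          # value per position; V[-1] is terminal
    for _ in range(n_episodes):
        for s in range(n_states):       # one pass down the corridor
            r = 1.0 if s == n_states - 1 else 0.0   # reward only at the goal
            delta = r + gamma * V[s + 1] - V[s]     # reward prediction error
            V[s] += alpha * delta
    return V
```

With discounting (gamma < 1) the learned values ramp up toward the reward location; in the paper's model, position inference is what additionally turns this value ramp into a ramping prediction error.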

    Dopamine neurons learn relative chosen value from probabilistic rewards.

    Economic theories posit reward probability as one of the factors defining reward value. Individuals learn the value of cues that predict probabilistic rewards from experienced reward frequencies. Building on the notion that responses of dopamine neurons increase with reward probability and expected value, we asked how dopamine neurons in monkeys acquire this value signal that may represent an economic decision variable. We found in a Pavlovian learning task that reward probability-dependent value signals arose from experienced reward frequencies. We then assessed neuronal response acquisition during choices among probabilistic rewards. Here, dopamine responses became sensitive to the value of both chosen and unchosen options. Both experiments also showed novelty responses of dopamine neurons that decreased as learning advanced. These results show that dopamine neurons acquire predictive value signals from the frequency of experienced rewards. This flexible and fast signal reflects a specific decision variable and could update neuronal decision mechanisms.
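The learning process described here (cue values acquired from experienced reward frequencies via prediction errors) can be sketched with a generic delta rule; the update equation and parameters below are standard assumptions, not the paper's fitted model:

```python
import random

def learn_cue_value(p_reward, alpha=0.02, n_trials=2000, seed=0):
    """Delta-rule value learning from probabilistic (Bernoulli) rewards.

    Illustrative sketch: learning rate and trial count are made-up values.
    """
    rng = random.Random(seed)
    V = 0.0                                           # learned cue value
    for _ in range(n_trials):
        r = 1.0 if rng.random() < p_reward else 0.0   # probabilistic reward
        V += alpha * (r - V)                          # prediction-error update
    return V
```

After enough trials the learned value settles near the experienced reward frequency, so cues predicting reward with higher probability acquire higher values.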

    Dopamine encoding of novelty facilitates efficient uncertainty-driven exploration

    When facing an unfamiliar environment, animals need to explore to gain new knowledge about which actions provide reward, but also put the newly acquired knowledge to use as quickly as possible. Optimal reinforcement learning strategies should therefore assess the uncertainties of these action-reward associations and utilise them to inform decision making. We propose a novel model whereby direct and indirect striatal pathways act together to estimate both the mean and variance of reward distributions, and mesolimbic dopaminergic neurons provide transient novelty signals, facilitating effective uncertainty-driven exploration. We utilised electrophysiological recording data to verify our model of the basal ganglia, and we fitted exploration strategies derived from the neural model to data from behavioural experiments. We also compared the performance of directed exploration strategies inspired by our basal ganglia model with other exploration algorithms, including classic variants of the upper confidence bound (UCB) strategy, in simulation. The exploration strategies inspired by the basal ganglia model achieved superior overall performance in simulation, and fitting the model to behavioural data gave qualitatively similar results to fitting more idealised normative models with less implementation-level detail. Overall, our results suggest that transient dopamine levels in the basal ganglia that encode novelty could contribute to an uncertainty representation which efficiently drives exploration in reinforcement learning.
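As a point of comparison, the classic UCB1 strategy the abstract benchmarks against can be sketched in a few lines (the arm reward probabilities and exploration constant here are made-up values):

```python
import math
import random

def ucb1_bandit(true_means, n_steps=5000, c=2.0, seed=1):
    """UCB1 action selection on a Bernoulli multi-armed bandit.

    Illustrative sketch of the baseline algorithm, not the paper's
    basal-ganglia-derived strategy.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k                 # times each arm was chosen
    means = [0.0] * k                # running mean reward per arm
    for t in range(1, n_steps + 1):
        # choose the arm with the highest upper confidence bound;
        # an untried arm has an infinite bound, so every arm is tried once
        ucb = [means[a] + math.sqrt(c * math.log(t) / counts[a])
               if counts[a] > 0 else float('inf') for a in range(k)]
        a = ucb.index(max(ucb))
        r = 1.0 if rng.random() < true_means[a] else 0.0   # Bernoulli reward
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]             # incremental mean
    return counts, means
```

The confidence bonus shrinks as an arm is sampled, so exploration is directed toward arms whose reward estimates are still uncertain, which is the same principle the abstract attributes to novelty-driven dopamine signals.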

    Components and characteristics of the dopamine reward utility signal.

    Rewards are defined by their behavioral functions in learning (positive reinforcement), approach behavior, economic choices, and emotions. Dopamine neurons respond to rewards with two components, similar to higher order sensory and cognitive neurons. The initial, rapid, unselective dopamine detection component reports all salient environmental events irrespective of their reward association. It is highly sensitive to factors related to reward and thus detects a maximal number of potential rewards. It also senses aversive stimuli but reports their physical impact rather than their aversiveness. The second response component processes reward value accurately and starts early enough to prevent confusion with unrewarded stimuli and objects. It codes reward value as a numeric, quantitative utility prediction error, consistent with formal concepts of economic decision theory. Thus, the dopamine reward signal is fast, highly sensitive and appropriate for driving and updating economic decisions. Grant sponsor: the Wellcome Trust; Grant sponsor: the European Research Council (ERC); Grant sponsor: the National Institutes of Health Conte Center at Caltech. This is the accepted version. The final version is available via http://onlinelibrary.wiley.com/doi/10.1002/cne.23880/abstract

    Economic choices reveal probability distortion in macaque monkeys.

    Economic choices are largely determined by two principal elements, reward value (utility) and probability. Although nonlinear utility functions have been acknowledged for centuries, nonlinear probability weighting (probability distortion) was only recently recognized as a ubiquitous aspect of real-world choice behavior. Even when outcome probabilities are known and acknowledged, human decision makers often overweight low probability outcomes and underweight high probability outcomes. Whereas recent studies measured utility functions and their corresponding neural correlates in monkeys, it is not known whether monkeys distort probability in a manner similar to humans. Therefore, we investigated economic choices in macaque monkeys for evidence of probability distortion. We trained two monkeys to predict reward from probabilistic gambles with constant outcome values (0.5 ml or nothing). The probability of winning was conveyed using explicit visual cues (sector stimuli). Choices between the gambles revealed that the monkeys used the explicit probability information to make meaningful decisions. Using these cues, we measured probability distortion from choices between the gambles and safe rewards. Parametric modeling of the choices revealed classic probability weighting functions with inverted-S shape. Therefore, the animals overweighted low probability rewards and underweighted high probability rewards. Empirical investigation of the behavior verified that the choices were best explained by a combination of nonlinear value and nonlinear probability distortion. Together, these results suggest that probability distortion may reflect evolutionarily preserved neuronal processing. This work was supported by the Wellcome Trust, European Research Council (ERC) and Caltech Conte Center. This is the final version of the article. It was first published by the Society for Neuroscience at http://www.jneurosci.org/content/35/7/3146.ful
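Inverted-S probability weighting of the kind recovered by the parametric modeling is commonly written with the one-parameter Tversky-Kahneman form; a sketch with an assumed curvature parameter (the paper's fitted values are not given in the abstract):

```python
def weight(p, gamma=0.6):
    """Tversky-Kahneman probability weighting w(p) = p^g / (p^g + (1-p)^g)^(1/g).

    gamma < 1 produces the inverted-S shape; gamma=0.6 is a commonly
    cited human estimate, used here purely for illustration.
    """
    num = p ** gamma
    return num / (num + (1.0 - p) ** gamma) ** (1.0 / gamma)
```

With gamma below 1 the function lies above the diagonal for small p and below it for large p, i.e. low probabilities are overweighted and high probabilities underweighted, while w(0) = 0 and w(1) = 1 are preserved.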

    Dopamine Neuron-Specific Optogenetic Stimulation in Rhesus Macaques.

    Optogenetic studies in mice have revealed new relationships between well-defined neurons and brain functions. However, there are currently no means to achieve the same cell-type specificity in monkeys, which possess an expanded behavioral repertoire and closer anatomical homology to humans. Here, we present a resource for cell-type-specific channelrhodopsin expression in Rhesus monkeys and apply this technique to modulate dopamine activity and monkey choice behavior. These data show that two viral vectors label dopamine neurons with greater than 95% specificity. Infected neurons were activated by light pulses, indicating functional expression. The addition of optical stimulation to reward outcomes promoted the learning of reward-predicting stimuli at the neuronal and behavioral level. Together, these results demonstrate the feasibility of effective and selective stimulation of dopamine neurons in non-human primates and a resource that could be applied to other cell types in the monkey brain. This work was supported by the Wellcome Trust (Principal Research Fellowship and Programme Grant 095495), European Research Council (ERC Advanced Grant 293549), and NIH Caltech Conte Center (P50MH094258).

    High-yield methods for accurate two-alternative visual psychophysics in head-fixed mice

    Research in neuroscience increasingly relies on the mouse, a mammalian species that affords unparalleled genetic tractability and brain atlases. Here, we introduce high-yield methods for probing mouse visual decisions. Mice are head-fixed, facilitating repeatable visual stimulation, eye tracking, and brain access. They turn a steering wheel to make two alternative choices, forced or unforced. Learning is rapid thanks to intuitive coupling of stimuli to wheel position. The mouse decisions deliver high-quality psychometric curves for detection and discrimination and conform to the predictions of a simple probabilistic observer model. The task is readily paired with two-photon imaging of cortical activity. Optogenetic inactivation reveals that the task requires mice to use their visual cortex. Mice are motivated to perform the task by fluid reward or optogenetic stimulation of dopamine neurons. This stimulation elicits a larger number of trials and faster learning. These methods provide a platform to accurately probe mouse vision and its neural basis.
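The probabilistic observer model is not specified in the abstract; a minimal lapse-limited logistic observer for a two-alternative task, with assumed sensitivity and lapse parameters, shows the kind of psychometric curve such a model produces:

```python
import math
import random

def observer_choice_prob(contrast, sensitivity=8.0, lapse=0.05):
    """Probability of a rightward choice for a signed stimulus contrast.

    A generic lapse-limited logistic observer; both parameter values are
    illustrative assumptions, not the paper's fitted model.
    """
    p = 1.0 / (1.0 + math.exp(-sensitivity * contrast))
    return lapse / 2.0 + (1.0 - lapse) * p

def simulate_psychometric(contrasts, n_trials=500, seed=2):
    """Fraction of simulated rightward choices at each contrast level."""
    rng = random.Random(seed)
    return [sum(rng.random() < observer_choice_prob(c)
                for _ in range(n_trials)) / n_trials
            for c in contrasts]
```

Simulated choice fractions rise sigmoidally from near zero at strong leftward contrasts to near one at strong rightward contrasts, passing through chance at zero contrast, which is the shape of the psychometric curves the task yields.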

    Afterimage size is modulated by size-contrast illusions.

    Traditionally, the perceived size of negative afterimages has been examined in relation to E. Emmert's law (1881), a size-distance equation that states that changes in perceived size of an afterimage are a function of the distance of the surface on which it is projected. Here, we present evidence that the size of an afterimage is also modulated by its surrounding context. We employed a new version of the Ebbinghaus-Titchener illusion with flickering surrounding stimuli and a static inner target that generated a vivid afterimage of the latter but not the former. Observers were asked to give an initial manual estimate of the size of the inner target during the adaptation phase followed by another manual estimate of the size of the afterimage during the test phase. Manual estimates were affected by the size-contrast illusion both when the surrounding contextual elements were present during afterimage induction and when the surrounding elements were absent during the viewing of the afterimage (Experiment 1). Such a modulation in perceived size, however, did not occur when observers viewed only the flickering surrounding context for a prolonged period of time and then estimated the size of a static target presented on the monitor afterward, demonstrating that flickering stimuli by themselves did not produce any aftereffect on perceived size (Experiment 2). Furthermore, in a final experiment, we showed that the modulation observed in the test phase of Experiment 1 was not due to memory of the manual estimates that had been performed during the adaptation phase (Experiment 3). These findings provide clear evidence for the role of high-level cognitive processes on the perceived size of an afterimage beyond the retinal level. Thus, although retinal stimulation is required to induce an afterimage, post-retinal factors influence its perceived size.

    GCaMP8 transgenic mice learn to make visual decisions

    Transgenic mice engineered to express calcium indicators such as GCaMP have revolutionized exploration of neuronal circuit function. The latest development, GCaMP8 transgenic mice, exhibits enhanced temporal kinetics and sensitivity of neural signals, opening new avenues for studying neuronal dynamics within behaviorally relevant time frames. However, in initial attempts, it has been challenging to train these mice in visual decision making tasks. Here we show that GCaMP8 transgenic mice, specifically TetO-jGCaMP8s x CaMK2a-tTA mice, learn to perform head-fixed visual decision tasks with a rate and accuracy comparable to wildtype mice. These proof-of-principle results enhance the utility of these transgenic animals in neuroscientific studies of learning and decision making.