
    Transient Calcium and Dopamine Increase PKA Activity and DARPP-32 Phosphorylation

    Reinforcement learning theory posits that strengthening of synaptic connections in medium spiny neurons of the striatum occurs when glutamatergic input (from cortex) and dopaminergic input (from substantia nigra) are received simultaneously. Subsequent to learning, medium spiny neurons with strengthened synapses are more likely to fire in response to cortical input alone. This synaptic plasticity is produced by phosphorylation of AMPA receptors, caused by phosphorylation of various signalling molecules. A key signalling molecule is the phosphoprotein DARPP-32, which is highly expressed in striatal medium spiny neurons. DARPP-32 is regulated by several neurotransmitters through a complex network of intracellular signalling pathways involving cAMP (increased through dopamine stimulation) and calcium (increased through glutamate stimulation). Since DARPP-32 controls several kinases and phosphatases involved in striatal synaptic plasticity, understanding the interactions between cAMP and calcium, in particular the effect of transient stimuli on DARPP-32 phosphorylation, has major implications for understanding reinforcement learning. We developed a computer model of the biochemical reaction pathways involved in the phosphorylation of DARPP-32 on Thr34 and Thr75. Ordinary differential equations describing the biochemical reactions were implemented in a single-compartment model using the software XPPAUT. Reaction rate constants were obtained from the biochemical literature. The first set of simulations, using sustained elevations of dopamine and calcium, produced phosphorylation levels of DARPP-32 similar to those measured experimentally, thereby validating the model. The second set of simulations, using the validated model, showed that transient dopamine elevations increased the phosphorylation of Thr34 as expected, but transient calcium elevations also increased the phosphorylation of Thr34, contrary to what is commonly believed. When transient calcium and dopamine stimuli were paired, PKA activation and Thr34 phosphorylation increased compared with dopamine alone. This result, which is robust to variation in model parameters, supports reinforcement learning theories in which activity-dependent long-term synaptic plasticity requires paired glutamate and dopamine inputs.
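    The kinetics can be caricatured with a drastically reduced, single-variable sketch: a transient dopamine pulse raises PKA activity, which phosphorylates Thr34 against a constant dephosphorylation rate. The rate constants and pulse timing below are made up for illustration; this is not the paper's validated XPPAUT model.

```python
# Toy single-compartment sketch: a transient dopamine pulse raises PKA
# activity, which phosphorylates DARPP-32 at Thr34; a constant phosphatase
# activity dephosphorylates it. All constants are illustrative, not fitted.

K_PHOS = 2.0     # PKA-catalysed Thr34 phosphorylation rate (arbitrary units)
K_DEPHOS = 0.5   # dephosphorylation rate (arbitrary units)

def dopamine(t):
    # Hypothetical transient pulse between t = 1 and t = 2
    return 1.0 if 1.0 <= t <= 2.0 else 0.1

def simulate(t_end=10.0, dt=0.001):
    d32p, t, peak = 0.0, 0.0, 0.0    # d32p: fraction phosphorylated at Thr34
    while t < t_end:
        pka = dopamine(t)            # PKA activity taken proportional to dopamine/cAMP
        d32p += dt * (K_PHOS * pka * (1.0 - d32p) - K_DEPHOS * d32p)
        peak = max(peak, d32p)
        t += dt
    return peak, d32p

peak, final = simulate()
print(f"peak Thr34-P fraction {peak:.2f}, post-stimulus {final:.2f}")
```

    Even this toy version reproduces the qualitative shape of the paper's first result: phosphorylation rises sharply during the pulse and relaxes back towards a low baseline afterwards.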

    Speed/Accuracy Trade-Off between the Habitual and the Goal-Directed Processes

    Instrumental responses are hypothesized to be of two kinds, habitual and goal-directed, mediated by the sensorimotor and the associative cortico-basal ganglia circuits, respectively. The existence of these two heterogeneous associative learning mechanisms can be hypothesized to arise from the comparative advantages they have at different stages of learning. In this paper, we assume that the goal-directed system is behaviourally flexible but slow in choice selection. The habitual system, in contrast, is fast in responding but inflexible in adapting its behavioural strategy to new conditions. Based on these assumptions and using the computational theory of reinforcement learning, we propose a normative model for arbitration between the two processes that strikes an approximately optimal balance between search time and accuracy in decision making. Behaviourally, the model can explain the experimental evidence for behavioural sensitivity to outcome at the early stages of learning but insensitivity at the later stages. It also explains why, when two choices with equal incentive values are available concurrently, behaviour remains outcome-sensitive even after extensive training. Moreover, the model can explain choice reaction-time variations during the course of learning, as well as the experimental observation that reaction time increases with the number of available choices. Neurobiologically, by assuming that the phasic and tonic activities of midbrain dopamine neurons carry the reward prediction error and the average reward signal used by the model, respectively, the model predicts that whereas phasic dopamine indirectly affects behaviour by reinforcing stimulus-response associations, tonic dopamine can directly affect behaviour by modulating the competition between the habitual and the goal-directed systems and thus affect reaction time.
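    The proposed arbitration can be caricatured in a few lines: deliberate while cached values barely separate the options, and hand control to the fast habitual system once they do, with the time cost of deliberating priced by the average reward rate (the quantity the paper links to tonic dopamine). The task probabilities, learning rates and cost parameter below are all hypothetical.

```python
import random

random.seed(0)
P_REWARD = {"left": 0.8, "right": 0.2}   # hypothetical two-choice task
ALPHA = 0.2                              # habitual (Q-learning) learning rate

def arbitrate(q, avg_reward, time_cost=0.3):
    # Plan (slow, accurate) only while the cached values barely distinguish
    # the options; otherwise respond habitually (fast). Deliberation cost is
    # priced by the average reward rate.
    confident = abs(q["left"] - q["right"]) > time_cost * avg_reward
    return "habitual" if confident else "goal-directed"

q = {"left": 0.0, "right": 0.0}
avg_reward = 0.0
controllers = []
for _ in range(200):
    mode = arbitrate(q, avg_reward)
    controllers.append(mode)
    if mode == "goal-directed":
        choice = max(P_REWARD, key=P_REWARD.get)  # assumed to know the task model
    else:
        choice = max(q, key=q.get)                # cached stimulus-response value
    r = 1.0 if random.random() < P_REWARD[choice] else 0.0
    q[choice] += ALPHA * (r - q[choice])          # phasic-dopamine-like RPE update
    avg_reward += 0.05 * (r - avg_reward)         # slow average-reward estimate

print(controllers[0], "->", controllers[-1])
```

    Early trials are goal-directed (outcome-sensitive) and later trials become habitual, matching the behavioural pattern the model is built to explain.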

    Local and global reward learning in the lateral frontal cortex show differential development during human adolescence

    Reward-guided choice is fundamental for adaptive behaviour and depends on several component processes supported by prefrontal cortex. Here, across three studies, we show that two such component processes, linking reward to specific choices and estimating the global reward state, develop during human adolescence and are linked to the lateral portions of the prefrontal cortex. These processes reflect the assignment of rewards contingently, to local choices, or noncontingently, to the choices that make up the global reward history. Using matched experimental tasks and analysis platforms, we show that the influence of both mechanisms increases during adolescence (study 1) and that lesions to lateral frontal cortex (which included and/or disconnected both orbitofrontal and insula cortex) in human adult patients (study 2) and macaque monkeys (study 3) impair both local and global reward learning. Developmental effects were distinguishable from the influence of a decision bias on choice behaviour, known to depend on medial prefrontal cortex. Differences in local and global assignments of reward to choices across adolescence, in the context of delayed grey-matter maturation of the lateral orbitofrontal and anterior insula cortex, may underlie changes in adaptive behaviour.
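    The two credit-assignment mechanisms can be sketched in a toy learner (all parameters and mixing weights are hypothetical, not the studies' fitted models): a local value credits reward contingently to the choice that produced it, while a global trace spreads reward over recently made choices regardless of contingency.

```python
ALPHA_LOCAL = 0.3    # contingent credit to the causal choice
ALPHA_GLOBAL = 0.1   # noncontingent credit spread over recent choices
W_LOCAL, W_GLOBAL = 1.0, 0.5   # hypothetical mixing weights

local_q = {"A": 0.0, "B": 0.0}
global_trace = {"A": 0.0, "B": 0.0}
history = []

def learn(choice, reward):
    # Local mechanism: reward is assigned to the choice that earned it.
    local_q[choice] += ALPHA_LOCAL * (reward - local_q[choice])
    # Global mechanism: reward is assigned to whatever was chosen recently,
    # contingent or not (with no history yet, credit the current choice).
    for past in history[-3:] or [choice]:
        global_trace[past] += ALPHA_GLOBAL * (reward - global_trace[past])
    history.append(choice)

def preference(choice):
    return W_LOCAL * local_q[choice] + W_GLOBAL * global_trace[choice]

for c, r in [("A", 1), ("A", 1), ("B", 0), ("A", 1)]:
    learn(c, r)
print(f"pref A={preference('A'):.2f}, pref B={preference('B'):.2f}")
```

    Note that option B picks up some noncontingent credit from the final reward simply because it was chosen recently, which is the signature of the global mechanism.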

    Forget-me-some: General versus special purpose models in a hierarchical probabilistic task

    Humans build models of their environments and act according to what they have learnt. In simple experimental environments, such model-based behaviour is often well accounted for as if subjects were ideal Bayesian observers. However, more complex probabilistic tasks require more sophisticated forms of inference that are sufficiently computationally and statistically taxing as to demand approximation. Here, we study the properties of two approximation schemes in the context of a serial reaction-time task in which stimuli were generated from a hierarchical Markov chain. One, pre-existing, scheme was a generically powerful variational method for hierarchical inference which has recently become popular as an account of psychological and neural data across a wide swathe of probabilistic tasks. A second, novel, scheme was more specifically tailored to the task at hand. We show that the latter model fit significantly better than the former. This suggests that our subjects were sensitive to many of the particular constraints of a complex behavioural task. Further, the tailored model provided a different perspective on the effects of cholinergic manipulations in the task. Neither model fit behaviour on the more complex contingencies particularly well. These results illustrate the benefits and challenges that come with general- and special-purpose modelling approaches and raise important questions about how they can advance our current understanding of learning mechanisms in the brain.
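    The task's generative structure can be sketched as a two-level chain: a slow hidden context selects which transition matrix drives the observable stimulus sequence. The transition probabilities below are hypothetical, not the actual experimental contingencies.

```python
import random

random.seed(1)

CONTEXT_STAY = 0.95   # the hidden context switches only occasionally
TRANSITIONS = {       # per-context transition matrices over stimuli X and Y
    0: {"X": {"X": 0.8, "Y": 0.2}, "Y": {"X": 0.2, "Y": 0.8}},
    1: {"X": {"X": 0.2, "Y": 0.8}, "Y": {"X": 0.8, "Y": 0.2}},
}

def generate(n):
    context, stim, seq = 0, "X", []
    for _ in range(n):
        if random.random() > CONTEXT_STAY:        # occasional context switch
            context = 1 - context
        p_x = TRANSITIONS[context][stim]["X"]     # next stimulus depends on
        stim = "X" if random.random() < p_x else "Y"   # context and last stimulus
        seq.append(stim)
    return seq

seq = generate(500)
print("fraction X:", sum(s == "X" for s in seq) / len(seq))
```

    An ideal observer facing such sequences must infer both the fast stimulus dynamics and the slow context switches, which is the computational burden that motivates the approximation schemes compared in the paper.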

    Dynamics of serotonergic neurons revealed by fiber photometry

    This work was developed in the context of the MIT Portugal Program, area of Bioengineering Systems, in collaboration with the Champalimaud Research Programme, Champalimaud Center for the Unknown, Lisbon, Portugal. The project, entitled Dynamics of serotonergic neurons revealed by fiber photometry, was carried out at Instituto Gulbenkian de Ciência, Oeiras, Portugal and at the Champalimaud Research Programme, Champalimaud Center for the Unknown, Lisbon, Portugal.

    Serotonin is an important neuromodulator implicated in the regulation of many physiological and cognitive processes. It is one of the most studied neuromodulators and one of the main targets of psychoactive drugs, since its dysregulation can contribute to altered perception and pathological conditions such as depression and obsessive-compulsive disorder. However, it is still one of the most mysterious and least understood neuromodulatory systems of the brain. In order to study the activity of serotonergic neurons in behaving mice, we used genetically encoded calcium indicators and developed a fiber photometry system to monitor neural activity from genetically defined populations of neurons. This approach was developed to study serotonin neurons, but it can be used with any genetically defined neuronal population. To validate our approach, we first confirmed that increased neural activity, induced by electrical microstimulation, indeed produced increases in fluorescence detected by the system. We then used it to monitor activity in the dorsal striatum of freely behaving mice. We show that the two projection pathways of the basal ganglia are both active during spontaneous contraversive turns. Additionally, we show that this balanced activity in the two pathways is needed for such contraversive movements. Finally, we used the fiber photometry system to study the role of serotonin in learning and behavioral control and to compare it to that of dopamine, another important neuromodulator.
When transient calcium and dopamine stimuli were paired, PKA activation and Thr34 phosphorylation increased compared with dopamine alone. Dopamine and serotonin are thought to act jointly to orchestrate learning and behavioral control. While dopamine is thought to invigorate behavior and drive learning by signaling reward prediction errors, i.e. better-than-expected outcomes, serotonin has been implicated in behavioral inhibition and aversive processing, and more specifically in preventing perseverative responses in changing environments. However, whether or how serotonin neurons signal such changes is not clear. To investigate these issues, we used a reversal learning task in which mice first learned to associate different odor cues with specific outcomes, after which we unexpectedly reversed these associations. We show that dorsal raphe serotonin neurons, like midbrain dopamine neurons, are specifically recruited following the prediction errors that occur after reversal. Yet, unlike dopamine neurons, serotonin neurons are similarly activated by surprising events that are both better and worse than expected. Dopamine and serotonin responses both track learned cue-reward associations, but serotonin neurons are slower to adapt to the changes that occur at reversal. The different dynamics of these neurons following reversal create an imbalance that favors dopamine activity when invigoration is needed to obtain rewards and serotonin activity when behavior should be inhibited. Our data support a model in which serotonin acts by rapidly reporting erroneous associations, expectations or priors in order to suppress behaviors driven by such errors and to enhance plasticity to facilitate error correction. Contrary to prevailing views, this supports a concept of serotonin based on primary functions in prediction, control and learning rather than affect and mood.
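    The contrast between the two signals can be sketched with a signed, dopamine-like reward prediction error versus an unsigned, serotonin-like surprise signal in a cue-reversal setting. This is a toy illustration, not the recording analysis; the cue values and learning rate are hypothetical.

```python
ALPHA = 0.3
value = {"odor_A": 0.9, "odor_B": 0.1}   # pre-reversal learned associations

def trial(cue, reward):
    rpe = reward - value[cue]            # signed error: dopamine-like
    surprise = abs(rpe)                  # unsigned error: serotonin-like
    value[cue] += ALPHA * rpe            # standard delta-rule update
    return rpe, surprise

# After reversal, odor_A now predicts no reward and odor_B predicts reward:
rpe_a, surprise_a = trial("odor_A", 0.0)   # worse than expected
rpe_b, surprise_b = trial("odor_B", 1.0)   # better than expected
print(rpe_a, surprise_a, rpe_b, surprise_b)
```

    The signed signal flips direction between the two outcomes, whereas the unsigned signal responds equally to both, which is the qualitative distinction reported between dopamine and serotonin responses at reversal.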

    Hebbian learning and cognitive control: modeling and empirical studies


    Modelling variations in human learning in probabilistic decision-making tasks

    This thesis focused on evaluating the capacity of models of human learning to encapsulate the action choices of a range of individuals performing probabilistic decision-making tasks. To do so, an extensible evaluation framework, Tinker Taylor py (TTpy), was developed in Python, allowing models to be compared like-for-like across a range of tasks. TTpy allows models, tasks and fitting methods to be added or replaced without affecting the other parts of the simulation and fitting process. Models were drawn from the reinforcement learning literature, along with a few similarly structured Bayesian learning models. The fitting assumed that the same model was used throughout a task to make all the choices. Using TTpy, significant uncertainty was found in parameter recovery for short, simple tasks across a range of models. This was traced back to significant overlap in the action sequences plausibly produced by different combinations of parameters. Replacing softmax with epsilon-greedy as the way of calculating the action-choice probabilities was found to improve parameter recovery in simulated data. Datasets from three existing unpublished probabilistic decision-making tasks were examined. These datasets were chosen because they contained information on extraversion for all their participants, their tasks were well established, and the tasks had a gains-only promotion focus. Only one of the three tasks yielded models for which most of the per-participant fits had strong evidence of being better than uniform random action choices. In light of the difficulties in parameter recovery for individual participants, the unusual step was taken of averaging the recovered parameters across a subset of the best-performing and most consistently recovered models within the same family. A significant correlation was found between the averaged learning-rate parameter and the participant extraversion measure when the softmax parameter variance was taken into account.
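    The two action-selection rules compared in the fitting can be sketched as follows (the value estimates and parameter values are hypothetical):

```python
import math

def softmax_probs(q, beta):
    # beta is the inverse temperature: larger beta gives more deterministic choice
    exps = {a: math.exp(beta * v) for a, v in q.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}

def epsilon_greedy_probs(q, eps):
    # choose the highest-valued action with probability 1 - eps,
    # otherwise choose uniformly at random
    best = max(q, key=q.get)
    n = len(q)
    return {a: (1.0 - eps) + eps / n if a == best else eps / n for a in q}

q = {"A": 0.6, "B": 0.4}
print(softmax_probs(q, beta=5.0))
print(epsilon_greedy_probs(q, eps=0.1))
```

    The difference matters for parameter recovery because softmax choice probabilities vary smoothly with both the value difference and the temperature, so many parameter combinations yield similar action sequences, whereas epsilon-greedy pins the greedy probability to a single parameter.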