81 research outputs found

    Time representation in reinforcement learning models of the basal ganglia

    Reinforcement learning (RL) models have been influential in understanding many aspects of basal ganglia function, from reward prediction to action selection. Time plays an important role in these models, but there is still no theoretical consensus about what kind of time representation is used by the basal ganglia. We review several theoretical accounts and their supporting evidence. We then discuss the relationship between RL models and the timing mechanisms that have been attributed to the basal ganglia. We hypothesize that a single computational system may underlie both RL and interval timing—the perception of duration in the range of seconds to hours. This hypothesis, which extends earlier models by incorporating a time-sensitive action selection mechanism, may have important implications for understanding disorders like Parkinson's disease in which both decision making and timing are impaired

    Biological cumulative learning through intrinsic motivations: a simulated robotic study on development of visually-guided reaching

    This work aims to model the ability of biological organisms to achieve cumulative learning, i.e. to learn increasingly complex skills on the basis of simpler ones. In particular, we studied how a simulated kinematic robotic system composed of an arm and an eye can learn the ability to reach for an object on the basis of the ability to systematically look at the object, which, in our set-up, represented a prerequisite for the reaching task. We designed the system by following several biological constraints and investigated which kind of sub-task reinforcements might facilitate the development of the final skill. We found that the performance in the reaching task was optimized when the reinforcement signal included not only the extrinsic reinforcement provided by touching the object but also an intrinsic reinforcement given by the error in the prediction of fovea activation. We discuss how these results might explain biological data regarding the neural basis of action discovery and reinforcement learning, in particular with respect to the neuromodulator dopamine

    Interactions between the Midbrain Superior Colliculus and the Basal Ganglia

    An important component of the architecture of cortico-basal ganglia connections is the parallel, re-entrant looped projections that originate from and return to specific regions of the cerebral cortex. However, such loops are unlikely to have been the first evolutionary example of a closed-loop architecture involving the basal ganglia. A phylogenetically older series of subcortical loops can be shown to link the basal ganglia with many brainstem sensorimotor structures. While the characteristics of individual components of potential subcortical re-entrant loops have been documented, the full extent to which they represent functionally segregated parallel projecting channels remains to be determined. However, for one midbrain structure, the superior colliculus (SC), anatomical evidence for closed-loop connectivity with the basal ganglia is robust, and can serve as an example against which the loop hypothesis can be evaluated for other subcortical structures. Examination of ascending projections from the SC to the thalamus suggests there may be multiple functionally segregated systems. The SC also provides afferent signals to the other principal input nuclei of the basal ganglia, the dopaminergic neurones in substantia nigra and the subthalamic nucleus. Recent electrophysiological investigations show that the afferent signals originating in the SC carry important information concerning the onset of biologically significant events to each of the basal ganglia input nuclei. Such signals are widely regarded as crucial for the proposed functions of selection and reinforcement learning with which the basal ganglia have so often been associated

    Opposing patterns of abnormal D1 and D2 receptor dependent cortico-striatal plasticity explain increased risk taking in patients with DYT1 dystonia

    Patients with DYT1 dystonia caused by the mutated TOR1A gene exhibit risk neutral behaviour compared to controls who are risk averse in the same reinforcement learning task. It is unclear whether this behaviour can be linked to changes in cortico-striatal plasticity demonstrated in animal models which share the same TOR1A mutation. We hypothesised that we could reproduce the experimental risk taking behaviour using a model of the basal ganglia under conditions where cortico-striatal plasticity was abnormal. As dopamine exerts opposing effects on cortico-striatal plasticity via different receptors expressed on medium spiny neurons (MSN) of the direct (D1R dominant, dMSNs) and indirect (D2R dominant, iMSNs) pathways, we tested whether abnormalities in cortico-striatal plasticity in one or both of these pathways could explain the patients' behaviour. Our model could generate simulated behaviour indistinguishable from patients when cortico-striatal plasticity was abnormal in both dMSNs and iMSNs in opposite directions. The risk neutral behaviour of the patients was replicated when increased cortico-striatal long term potentiation in dMSNs was combined with increased long term depression in iMSNs. This result is consistent with previous observations in rodent models of increased cortico-striatal plasticity in dMSNs, but contrasts with the pattern reported in vitro of dopamine D2 receptor dependent increases in cortico-striatal LTP and loss of LTD at iMSNs. These results suggest that additional factors in patients who manifest motor symptoms may lead to divergent effects on D2 receptor dependent cortico-striatal plasticity that are not apparent in rodent models of this disease

    A phosphatase cascade by which rewarding stimuli control nucleosomal response

    Dopamine orchestrates motor behaviour and reward-driven learning. Perturbations of dopamine signalling have been implicated in several neurological and psychiatric disorders, and in drug addiction. The actions of dopamine are mediated in part by the regulation of gene expression in the striatum, through mechanisms that are not fully understood. Here we show that drugs of abuse, as well as food reinforcement learning, promote the nuclear accumulation of 32-kDa dopamine-regulated and cyclic-AMP-regulated phosphoprotein (DARPP-32). This accumulation is mediated through a signalling cascade involving dopamine D1 receptors, cAMP-dependent activation of protein phosphatase-2A, dephosphorylation of DARPP-32 at Ser 97 and inhibition of its nuclear export. The nuclear accumulation of DARPP-32, a potent inhibitor of protein phosphatase-1, increases the phosphorylation of histone H3, an important component of nucleosomal response. Mutation of Ser 97 profoundly alters behavioural effects of drugs of abuse and decreases motivation for food, underlining the functional importance of this signalling cascade

    Interactions between Procedural Learning and Cocaine Exposure Alter Spontaneous and Cortically Evoked Spike Activity in the Dorsal Striatum

    We have previously shown that cocaine enhances gene regulation in the sensorimotor striatum associated with procedural learning in a running-wheel paradigm. Here we assessed whether cocaine produces enduring modifications of learning-related changes in striatal neuron activity, using single-unit recordings in anesthetized rats 1 day after the wheel training. Spontaneous and cortically evoked spike activity was compared between groups treated with cocaine or vehicle immediately prior to the running-wheel training or placement in a locked wheel (control conditions). We found that wheel training in vehicle-treated rats increased the average firing rate of spontaneously active neurons without changing the relative proportion of active to quiescent cells. In contrast, in rats trained under the influence of cocaine, the proportion of spontaneously firing to quiescent cells was significantly greater than in vehicle-treated, trained rats. However, this effect was associated with a lower average firing rate in these spontaneously active cells, suggesting that training under the influence of cocaine recruited additional low-firing cells. Measures of cortically evoked activity revealed a second interaction between cocaine treatment and wheel training, namely, a cocaine-induced decrease in spike onset latency in control rats (locked wheel). This facilitatory effect of cocaine was abolished when rats trained in the running wheel during cocaine action. These findings highlight important interactions between cocaine and procedural learning, which act to modify population firing activity and the responsiveness of striatal neurons to excitatory inputs. Moreover, these effects were found 24 h after the training and last drug exposure indicating that cocaine exposure during the learning phase triggers long-lasting changes in synaptic plasticity in the dorsal striatum. Such changes may contribute to the transition from recreational to habitual or compulsive drug taking behavior

    Eligibility Traces and Plasticity on Behavioral Time Scales: Experimental Support of neoHebbian Three-Factor Learning Rules

    Most elementary behaviors such as moving the arm to grasp an object or walking into the next room to explore a museum evolve on the time scale of seconds; in contrast, neuronal action potentials occur on the time scale of a few milliseconds. Learning rules of the brain must therefore bridge the gap between these two different time scales. Modern theories of synaptic plasticity have postulated that the co-activation of pre- and postsynaptic neurons sets a flag at the synapse, called an eligibility trace, that leads to a weight change only if an additional factor is present while the flag is set. This third factor, signaling reward, punishment, surprise, or novelty, could be implemented by the phasic activity of neuromodulators or specific neuronal inputs signaling special events. While the theoretical framework has been developed over the last decades, experimental evidence in support of eligibility traces on the time scale of seconds has been collected only during the last few years. Here we review, in the context of three-factor rules of synaptic plasticity, four key experiments that support the role of synaptic eligibility traces in combination with a third factor as a biological implementation of neoHebbian three-factor learning rules
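
The mechanism described in this abstract (co-activation sets a flag at the synapse, and the weight changes only if a third factor arrives while the flag is set) can be illustrated with a minimal sketch. This is not the model from the reviewed experiments: the exponential decay of the trace, the function name `simulate_three_factor`, and the parameters `dt`, `tau_e`, and `eta` are all assumptions chosen for illustration.

```python
import math


def simulate_three_factor(coactivity, third_factor, dt=0.001, tau_e=1.0, eta=0.1):
    """Return the total weight change of one synapse over a simulated run.

    coactivity   -- per-step Hebbian term (pre/post co-activation), e.g. 0 or 1
    third_factor -- per-step modulatory signal (reward, surprise, novelty)
    dt           -- time step in seconds (assumed value)
    tau_e        -- decay time constant of the eligibility trace in seconds
                    (assumed; the abstract only says "time scale of seconds")
    eta          -- learning rate (assumed value)
    """
    decay = math.exp(-dt / tau_e)  # per-step decay of the trace
    trace = 0.0
    weight_change = 0.0
    for hebb, mod in zip(coactivity, third_factor):
        # Co-activation sets (or refreshes) the flag; otherwise it fades.
        trace = trace * decay + hebb
        # The weight moves only when the third factor meets a non-zero trace.
        weight_change += eta * mod * trace
    return weight_change


if __name__ == "__main__":
    steps = 1000                      # 1 s of simulated time at dt = 1 ms
    hebb = [0.0] * steps
    hebb[0] = 1.0                     # one co-activation event at t = 0
    reward = [0.0] * steps
    reward[500] = 1.0                 # third factor arrives 0.5 s later
    print(simulate_three_factor(hebb, reward))
```

In this sketch a co-activation event with no later third factor, or a third factor with no preceding co-activation, leaves the weight unchanged, which is the gating behavior the abstract attributes to eligibility traces.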