37 research outputs found

    Impacts of inter-trial interval duration on a computational model of sign-tracking vs. goal-tracking behaviour

    In the context of Pavlovian conditioning, two types of behaviour may emerge within the population (Flagel et al. Nature, 469(7328): 53-57, 2011). Animals may choose to engage either with the conditioned stimulus (CS), a behaviour known as sign-tracking (ST) which is sensitive to dopamine inhibition for its acquisition, or with the food cup in which the reward or unconditioned stimulus (US) will eventually be delivered, a behaviour known as goal-tracking (GT) which is dependent on dopamine for its expression only. Previous work by Lesaint et al. (PLoS Comput Biol, 10(2), 2014) offered a computational explanation for these phenomena and led to the prediction that varying the duration of the inter-trial interval (ITI) would change the relative ST-GT proportion in the population as well as phasic dopamine responses. A recent study verified this prediction, but also found a rich variance of ST and GT behaviours within the trial which goes beyond the original computational model. In this paper, we provide a computational perspective on these novel results.

    Bayesian mapping of the striatal microcircuit reveals robust asymmetries in the probabilities and distances of connections

    The striatum’s complex microcircuit is made by connections within and between its D1- and D2-receptor expressing projection neurons and at least five species of interneuron. Precise knowledge of this circuit is likely essential to understanding the striatum’s functional roles and its dysfunction in a wide range of movement and cognitive disorders. We introduce here a Bayesian approach to mapping neuron connectivity using intracellular recording data, which lets us simultaneously evaluate the probability of connection between neuron types, the strength of evidence for it, and its dependence on distance. Using it to synthesise a complete map of the mouse striatum, we find strong evidence for two asymmetries: a selective asymmetry of projection neuron connections, with D2 neurons connecting twice as densely to other projection neurons as D1 neurons do, but neither subtype preferentially connecting to another; and a length-scale asymmetry, with interneuron connection probabilities remaining non-negligible at more than twice the distance of projection neuron connections. We further show our Bayesian approach can evaluate evidence for wiring changes, using data from the developing striatum and a mouse model of Huntington’s disease. By quantifying the uncertainty in our knowledge of the microcircuit, our approach reveals a wide range of potential striatal wiring diagrams consistent with current data.
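As a rough illustration of how paired-recording counts can yield both a connection probability and the strength of evidence for an asymmetry, the sketch below uses a conjugate Beta-binomial update. The Beta prior, the function names, and the counts are assumptions made for this example; the abstract does not specify the authors' actual model, and the distance dependence it mentions is not covered here.

```python
import math
import random


def beta_posterior(n_connected, n_tested, prior_a=1.0, prior_b=1.0):
    """Beta posterior (a, b) for a connection probability.

    Assumes a Beta(prior_a, prior_b) prior and binomial sampling of pairs;
    both are illustrative choices, not taken from the paper.
    """
    if not 0 <= n_connected <= n_tested:
        raise ValueError("need 0 <= n_connected <= n_tested")
    return prior_a + n_connected, prior_b + (n_tested - n_connected)


def beta_mean(a, b):
    """Posterior mean of a Beta(a, b) distribution."""
    return a / (a + b)


def beta_sd(a, b):
    """Posterior standard deviation of a Beta(a, b) distribution."""
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


def prob_greater(post_x, post_y, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(p_x > p_y) for two independent Beta posteriors."""
    rng = random.Random(seed)
    hits = sum(
        rng.betavariate(*post_x) > rng.betavariate(*post_y)
        for _ in range(n_samples)
    )
    return hits / n_samples


if __name__ == "__main__":
    # Hypothetical counts: connected pairs out of tested pairs.
    d2_post = beta_posterior(n_connected=12, n_tested=60)
    d1_post = beta_posterior(n_connected=6, n_tested=60)
    print("D2 mean, sd:", beta_mean(*d2_post), beta_sd(*d2_post))
    print("D1 mean, sd:", beta_mean(*d1_post), beta_sd(*d1_post))
    print("P(D2 > D1):", prob_greater(d2_post, d1_post))
```

The posterior width makes explicit how much uncertainty remains after a given number of tested pairs, which is the kind of quantity the abstract refers to when it mentions a range of wiring diagrams consistent with the data.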

    Modeling [F-18]MPPF positron emission tomography kinetics for the determination of 5-hydroxytryptamine(1A) receptor concentration with multiinjection

    The selectivity of [F-18]MPPF (fluorine-18-labeled 4-(2'-methoxyphenyl)-1-[2'-(N-2"-pyridinyl)-p-fluorobenzamido]ethylpiperazine) for serotonergic 5-hydroxytryptamine(1A) (5-HT1A) receptors has been established in animals and humans. The authors quantified the parameters of ligand-receptor exchanges using a double-injection protocol. After injection of a tracer and a coinjection dose of [F-18]MPPF, dynamic positron emission tomography (PET) data were acquired during a 160-minute session in five healthy males. These PET and magnetic resonance imaging data were coregistered for anatomical identification. A three-compartment model was used to determine six parameters: F_v (vascular fraction), K_1 and k_2 (plasma/free compartment exchange rates), k_off and k_on/V_r (dissociation and association rates), and B_max (receptor concentration), and to deduce K_d (apparent equilibrium dissociation rate). The model was fitted to regional PET kinetics with an arterial input function corrected for metabolites. Analytical distribution volume and binding potential were compared with indices generated by Logan-Patlak graphical analysis. The 5-HT1A specificity of MPPF was confirmed. A B_max of 2.9 pmol/mL and a K_d of 2.8 nmol/L were found in hippocampal regions, K_d and distribution volume in the free compartment were regionally stable, and the Logan binding potential was linearly correlated to B_max. This study confirms the value of MPPF in the investigation of normal and pathologic systems involving the limbic network and 5-HT1A receptors. Standard values can be used for the simulation of simplified protocols.
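To see how the hippocampal receptor concentration and dissociation parameter reported above relate to each other, the short worked calculation below may help. It assumes the common definition of binding potential as the ratio of receptor concentration to equilibrium dissociation constant, and the usual multi-injection relation between the apparent dissociation constant and the fitted rate constants. The abstract does not state that the authors computed their analytical binding potential this way, so this is an illustration only.

```latex
% Assumed relation between the apparent dissociation constant and the fitted rates
K_d V_r = \frac{k_{\mathrm{off}}}{k_{\mathrm{on}} / V_r}

% Unit conversion: 1 pmol/mL = 1 nmol/L
B_{\max} = 2.9\ \mathrm{pmol/mL} = 2.9\ \mathrm{nmol/L}

% Binding potential under the assumed definition BP = B_max / K_d
BP = \frac{B_{\max}}{K_d} = \frac{2.9\ \mathrm{nmol/L}}{2.8\ \mathrm{nmol/L}} \approx 1.04
```

Because B_max and K_d end up in the same units, the ratio is dimensionless. Under this definition a regionally stable K_d would also make the binding potential proportional to B_max.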

    Modélisation computationnelle de la variabilité et de la régulation de l'apprentissage par renforcement chez le Rat

    In this work, I will discuss two main topics concerning learnt behaviour in rats: firstly meta-learning, i.e. the regulation of learning and decision-making parameters; secondly, inter-individual variability in the strategies used in a simple Pavlovian conditioning experiment. In both cases, I will adopt a computational standpoint using reinforcement learning algorithms to model experimental data while also attending to related dopamine functions in the rat brain. If environmental access to food and reproductive opportunities evolves at a relatively stable pace, the learning abilities of an organism should keep track of this evolution and enable appropriate behaviour in response to these changes, but, should the environment change unexpectedly, an additional process of meta-learning might be required to cope with this change. In particular, controlling the learning rate or speed with which state, stimulus or action values are updated in response to discrete environmental feedback, and balancing exploitation of what seems to be the best option with exploration of potentially better ones, could constitute two powerful meta-learning strategies when facing a volatile environment. I will start my investigation of meta-learning by analysing the results of a three-armed bandit task with pharmacological inhibition of dopamine, a neurotransmitter that Humphries et al. (2012) suggested regulates the exploration-exploitation trade-off. After this, I will assess how well different models with meta-learning mechanisms regulating either the learning rate or exploration-exploitation trade-off can explain long-term changes in behaviour during the control sessions of the same three-armed bandit task. Finally, in a Pavlovian conditioning task in which the appearance of a lever predicts food delivery, it is well known that two kinds of behaviour can appear in a rat population. 
On the one hand, so-called sign-trackers become strongly attracted to the lever which they will approach and nibble, while goal-trackers will prefer to immediately go to the site of reward delivery. In parallel, there are differences in the associated dopamine signals, sign-trackers presenting a classical reward prediction error pattern, i.e. a burst of phasic activity which shifts from the time of reward delivery in the early stages of the task to the time the lever appears in later stages, contrary to goal-trackers whose dopamine signals are mostly stable throughout the task. A model aimed at explaining these behavioural and neurological results was previously proposed by Lesaint et al. (2014), and I will apply this model to new experimental findings based on a task with different inter-trial interval durations. This will result in adjustments to the previous model and propositions for going forward.
In this thesis, I will address two main topics concerning the variability of learning behaviour in the rat: first, a temporal variability specific to each individual, which is appropriately called meta-learning, i.e. the capacity to self-regulate the behavioural parameters that determine decision-making; second, an inter-individual variability in the strategy employed in a Pavlovian conditioning experiment. For each of these two topics, I will adopt a computational point of view grounded in reinforcement learning techniques in order to model experimental data, while drawing parallels with the dopaminergic functions thought to be associated with these processes. 
While the learning abilities of a living being may suffice to ensure its survival in a stable environment, an environment in which food resources and reproductive opportunities vary unpredictably requires an additional meta-learning capacity that allows learning and decision-making parameters to be adjusted to these unexpected changes. Regulating the speed with which the subjective estimate of the value of an action, stimulus or event is updated, and regulating the balance between exploitation of what seems to be the best option and exploration of alternative options that might prove better, constitute two levers particularly well suited to meeting the challenges posed by an unstable environment. I will begin my investigation by analysing a learning task on a three-armed bandit with pharmacological inhibition of dopamine, and will show that the results agree with the hypothesis put forward by Humphries et al. (2012) that this neurotransmitter regulates the exploration rate. I will then evaluate the ability of several original meta-learning models, regulating either the exploration rate or the learning rate, to account for the long-term behavioural changes observed across experimental sessions. Finally, in a classical or Pavlovian conditioning task in which the conditioned stimulus is the appearance of a lever predicting a food reward, it has been established that two types of response can emerge within a cohort of rats. 
A first category of individuals, called sign-trackers, preferentially approach the conditioned stimulus and interact with it, for example by nibbling it, whereas the second category, made up of goal-trackers, goes directly to the location where the reward will be delivered. In parallel, these two categories of individuals differ in the dopaminergic signals produced: in sign-trackers, dopaminergic activity follows the classical profile of a prediction error, i.e. a strong activation at the time of reward receipt early in the task, which gradually fades and transfers to the time the lever appears, whereas in goal-trackers the dopaminergic signals are stable throughout the task. A model explaining these behavioural and neurological differences has already been proposed by Lesaint et al. (2014), and in this thesis I will evaluate its ability to account for new experimental data obtained in a task with inter-trial intervals of different durations, and will propose adjustments to the model accordingly.

    Dopamine blockade impairs the exploration-exploitation trade-off in rats

    In a volatile environment where rewards are uncertain, successful performance requires a delicate balance between exploitation of the best option and exploration of alternative choices. It has theoretically been proposed that dopamine contributes to the control of this exploration-exploitation trade-off, specifically that the higher the level of tonic dopamine, the more exploitation is favored. We demonstrate here that there is a formal relationship between the rescaling of dopamine positive reward prediction errors and the exploration-exploitation trade-off in simple non-stationary multi-armed bandit tasks. We further show in rats performing such a task that systemically antagonizing dopamine receptors greatly increases the number of random choices without affecting learning capacities. Simulations and comparison of a set of different computational models (an extended Q-learning model, a directed exploration model, and a meta-learning model) fitted on each individual confirm that, independently of the model, decreasing dopaminergic activity does not affect learning rate but is equivalent to an increase in random exploration rate. This study shows that dopamine could adapt the exploration-exploitation trade-off in decision-making when facing changing environmental contingencies. All organisms need to make choices for their survival while being confronted with uncertainty in their environment. Animals and humans tend to exploit actions likely to provide desirable outcomes, but they must also take into account the possibility that environmental contingencies and the outcome of their actions may vary with time. Behavioral flexibility is thus needed in volatile environments in order to detect and learn new contingencies [1]. This requires a delicate balance between exploitation of known resources and exploration of alternative options that may have become advantageous. 
How this exploration/exploitation dilemma may be resolved and regulated is still a subject of active research in the fields of neuroscience and machine learning [2-5]. Dopamine holds a fundamental place in contemporary theories of learning and decision-making. The temporal evolution of phasic dopamine signals across learning has been extensively replicated, and is most of the time considered as evidence of a role in learning [6-8], but see alternative views in Coddington et al. [9]. Dopamine reward prediction error (RPE) signals have been identified in a variety of instrumental and Pavlovian conditioning tasks [10-13]. They affect plasticity and action value learning in cortico-basal networks [14-16] and have been directly related to behavioral adaptation in a number of decision-making tasks in humans, non-human primates [17] and rodents [18-21]. Accordingly, it is commonly assumed that manipulations of dopamine activity affect the rate of learning, but this could represent a misconception. Besides learning, the role of dopamine in the control of behavioral performance is still unclear. Dopamine is known to modulate incentive choice (the tendency to differentially weigh costs and benefits) [22,23], and risk-taking behavior [24], as well as other motivational aspects such as effort and response vigour [25]. Because dopamine is one of the key factors that may encode success or uncertainty, it might modulate decisions by biasing them toward options that present the largest uncertainty [26,27]. This would correspond to a "directed" exploration strategy [5,28,29]. Alternatively, success and failure could affect tonic dopamine levels and control random exploration of all options, as recently proposed by Humphries et al. [30]. This form of undirected exploration, which is often difficul
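The distinction drawn above between learning rate and random exploration can be sketched with a basic Q-learning agent that uses softmax action selection. This is a minimal sketch under stated assumptions, not the paper's extended Q-learning, directed exploration, or meta-learning models: the function names, the parameter values, and the stationary three-armed bandit are all hypothetical choices for illustration. Here `alpha` is the learning rate, and `beta` is the inverse temperature, so a lower `beta` means more random choices.

```python
import math
import random


def softmax(q_values, beta):
    """Choice probabilities for action values under inverse temperature beta."""
    top = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp(beta * (q - top)) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def q_update(q_values, action, reward, alpha):
    """Apply one delta-rule update in place and return the prediction error."""
    delta = reward - q_values[action]
    q_values[action] += alpha * delta
    return delta


def run_bandit(beta, alpha=0.1, reward_probs=(0.8, 0.2, 0.2),
               n_trials=2000, seed=0):
    """Return the fraction of trials on which the best arm was chosen.

    The bandit is stationary here for simplicity, which is an assumption
    of this sketch; the task described in the abstract is non-stationary.
    """
    rng = random.Random(seed)
    q_values = [0.0] * len(reward_probs)
    best = max(range(len(reward_probs)), key=lambda i: reward_probs[i])
    best_count = 0
    for _ in range(n_trials):
        probs = softmax(q_values, beta)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        q_update(q_values, action, reward, alpha)
        best_count += action == best
    return best_count / n_trials


if __name__ == "__main__":
    # Same learning rate throughout; only the inverse temperature varies.
    for beta in (0.0, 2.0, 10.0):
        print("beta =", beta, "fraction best arm =", run_bandit(beta))
```

Keeping `alpha` fixed while lowering `beta` leaves the value updates unchanged but makes choices more random. That is the kind of dissociation between learning and exploration the abstract describes, shown here only in a toy setting.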

    Ex vivo confocal microscopy imaging to identify tumor tissue on freshly removed brain sample

    Confocal microscopy is a technique that produces "optical sections" of a tissue and has a growing range of applications. We asked whether an ex vivo confocal microscope designed for dermatological use could be applied routinely to the most frequent brain tumors. The aim of this work was to identify tumor tissue and its histopathological hallmarks, and to assess grading criteria used in neuropathological practice without tissue loss on freshly removed brain tissue. Seven infiltrating gliomas, nine meningiomas and three metastases of carcinomas were included. We compared imaging results obtained with the confocal microscope to frozen sections, smears and tissue sections of formalin-fixed tissue. Our results show that ex vivo confocal microscopy imaging can be applied to brain tumors in order to quickly identify tumor tissue without tissue loss. It can differentiate tumors and can assess most grading criteria. Confocal microscopy could represent a new tool to identify tumor tissue in freshly removed samples and could help in selecting areas for biobanking of tumor tissue.