2 research outputs found

    Parallel Representation of Value-Based and Finite State-Based Strategies in the Ventral and Dorsal Striatum.

    No full text
    Previous theoretical studies of animal and human behavioral learning have focused on the dichotomy between the value-based strategy, which uses action value functions to predict rewards, and the model-based strategy, which uses internal models to predict environmental states. However, animals and humans often adopt simple procedural behaviors, such as the "win-stay, lose-switch" strategy, without explicit prediction of rewards or states. Here we consider another strategy, the finite state-based strategy, in which a subject selects an action depending on its discrete internal state and updates the state depending on the action chosen and the reward outcome. By analyzing the choice behavior of rats in a free-choice task, we found that the finite state-based strategy fitted their behavioral choices more accurately than the value-based and model-based strategies did. When the fitted models were run autonomously on the same task, only the finite state-based strategy could reproduce the key feature of the choice sequences. Analyses of neural activity recorded from the dorsolateral striatum (DLS), the dorsomedial striatum (DMS), and the ventral striatum (VS) identified significant fractions of neurons in all three subareas whose activities were correlated with individual states of the finite state-based strategy. The signal of internal states at the time of choice was found in DMS, and that for clusters of states in VS. In addition, action values and state values of the value-based strategy were encoded in DMS and VS, respectively. These results suggest that both the value-based strategy and the finite state-based strategy are implemented in the striatum.
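
    As a concrete illustration of the contrast, the following is a minimal Python sketch (not the paper's fitted model) of a value-based Q-learning agent and a finite state-based win-stay, lose-switch agent on a two-armed free-choice task; the reward probabilities, learning rate, and exploration rate are assumed values chosen only for illustration.

    import random

    REWARD_PROB = [0.7, 0.3]   # assumed per-arm reward probabilities, for illustration only

    def bandit(action):
        """Return reward 1 with the chosen arm's probability, else 0."""
        return 1 if random.random() < REWARD_PROB[action] else 0

    class QLearner:
        """Value-based strategy: keep an action value per arm, choose
        epsilon-greedily, and update the chosen value toward the reward."""
        def __init__(self, alpha=0.1, epsilon=0.1):
            self.q = [0.0, 0.0]
            self.alpha, self.epsilon = alpha, epsilon
        def choose(self):
            if random.random() < self.epsilon:
                return random.randrange(2)
            return max(range(2), key=lambda a: self.q[a])
        def update(self, action, reward):
            self.q[action] += self.alpha * (reward - self.q[action])

    class WinStayLoseSwitch:
        """Finite state-based strategy: the discrete internal state is simply
        the arm to repeat; reward keeps the state, non-reward flips it."""
        def __init__(self):
            self.state = random.randrange(2)
        def choose(self):
            return self.state
        def update(self, action, reward):
            if reward == 0:
                self.state = 1 - action   # lose-switch (win-stay keeps the state)

    def run(agent, trials=1000):
        total = 0
        for _ in range(trials):
            a = agent.choose()
            r = bandit(a)
            agent.update(a, r)
            total += r
        return total / trials

    if __name__ == "__main__":
        print("value-based (Q-learning) mean reward:", run(QLearner()))
        print("finite state-based (WSLS) mean reward:", run(WinStayLoseSwitch()))

    Both agents observe only their own action and the reward outcome; the difference is that the Q-learner maintains continuous value estimates, while the win-stay, lose-switch agent carries nothing but a discrete internal state updated by a fixed rule.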

    A computational model of cortical-striatal mediation of speed-accuracy tradeoff and habit formation emerging from anatomical gradients in dopamine physiology and reinforcement learning

    Full text link
    Decision making – committing to a single action from a plethora of viable alternatives – is a necessity for all motile creatures, each moving a single body to many possible destinations. Some decisions are better than others. For example, to a rat deciding between one path that will bring it to a piece of cheese and another that will bring it to the jaws of a cat, there is a clear reason to prefer one choice over the other. Two criteria for adjusting decision making toward optimal outcomes are to decide as accurately as possible – choosing the course of action most likely to result in the preferred outcome – and to decide as fast as possible. Because these criteria often conflict, decision making has an inherent “speed-accuracy tradeoff”. Presented here is a computational neural model of decision making that incorporates neurobiological design principles to optimize this tradeoff via reward-guided transfers of control between two sensory processing systems with different speed/accuracy characteristics. The model incorporates anatomical and physiological evidence that dopamine, the key neurotransmitter in reinforcement learning, has varying effects in different sub-regions of the basal ganglia, a subcortical structure that interfaces with the neocortex to control behavior. Based on the observed differences between these sub-regions, the model proposes that gradual adaptation of synaptic links by reinforcement learning signals leads to rapid changes in the speed and accuracy of decision making by assigning control of behavior to alternative cortical representations. Chapter one draws conceptual links from experimental data to the design of the proposed model. Chapter two applies the model to speed-accuracy tradeoffs and habit formation by simulating forced-choice paradigms; several robust behavioral phenomena are replicated. By isolating the reinforcement learning factors that control the speed and depth of habit formation, the model can help explain why all substances that strongly and synergistically affect such factors share a high potential for habit formation or habit abatement. To illustrate such potential applications of the model, chapter three investigates the effects of varying model parameters in accord with the known neurochemical effects of some major habit-forming substances, such as cocaine and ethanol.
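
    The core mechanism described above, reward-guided transfer of control between a slow-but-accurate system and a fast-but-less-accurate one, can be illustrated with a minimal Python sketch. This is not the dissertation's model: the "planner" and "habit" labels, accuracies, speed costs, and learning rates are assumptions chosen only to show how reinforcement of net payoff can shift control toward the faster system as its accuracy improves with practice.

    import random

    CORRECT = 0                                    # the rewarded option (assumed)
    TIME_COST = {"planner": 0.3, "habit": 0.05}    # assumed per-decision speed cost
    ALPHA = 0.05                                   # assumed learning rate

    def act(accuracy):
        """Pick the correct option with the given probability, else the other."""
        return CORRECT if random.random() < accuracy else 1 - CORRECT

    def simulate(trials=2000):
        accuracy = {"planner": 0.95, "habit": 0.5}   # habit system starts at chance
        value = {"planner": 0.0, "habit": 0.0}       # learned net payoff of ceding control
        control = []
        for _ in range(trials):
            # Arbitration: hand control to the system with the higher learned net
            # payoff, with a little exploration so both keep being sampled.
            if random.random() < 0.1:
                name = random.choice(["planner", "habit"])
            else:
                name = max(value, key=value.get)
            reward = 1 if act(accuracy[name]) == CORRECT else 0
            # Reinforcement signal: the controlling system's net-payoff estimate
            # moves toward reward minus its speed cost ...
            value[name] += ALPHA * ((reward - TIME_COST[name]) - value[name])
            # ... while practice gradually trains the habit system itself, so its
            # accuracy creeps upward and control eventually transfers to it.
            accuracy["habit"] += ALPHA * 0.1 * (0.95 - accuracy["habit"])
            control.append(name)
        early = control[:200].count("habit") / 200
        late = control[-200:].count("habit") / 200
        return early, late

    if __name__ == "__main__":
        early, late = simulate()
        print(f"fraction of trials under habit control - early: {early:.2f}, late: {late:.2f}")

    In this toy version, slow incremental updates to the habit system's accuracy eventually flip the arbitration, producing an abrupt handover of control; the fast system dominates late in training even though each individual learning step is small.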