Speed/Accuracy Trade-Off between the Habitual and the Goal-Directed Processes

Author: A Dickinson
A Mas-Colell
A Rangel
A Shah
A Yuille
AD Redish
AG Barto
Amir Dezfouli
AT Welford
B Balleine
B Shiv
BW Balleine
C Vickrey
CD Adams
D Belin
D Hu
D Joel
DE Broadbent
DM Jackson
E Alluisi
E Tolman
G Gigerenzer
GD Carr
GH Mowbray
H Simon
H Tassinari
HH Yin
IM Spigel
JD Salamone
JD Sokolowski
JE Aberman
JI Gold
JL Evenden
JN Tsitsiklis
JR Taylor
K Muenzinger
M Correa
M Geist
M Haruno
M Jueptner
M Lyons
M Pessiglione
Mehdi Keramati
MF Brown
ML Evans
ND Daw
NL Munn
Payam Piray
PC Holland
PR Montague
R Dearden
R Howard
R Hyman
RE Suri
RK Mahurin
RL Buckner
RM Colwill
RS Sutton
S Killcross
S Mingote
S Zilberstein
SA Ellias
SJ Julier
SM McClure
SN Haber
T Ljungberg
Tim Behrens
TW Robbins
W Schultz
WE Hick
Y Kosaki
Y Niv
Publication venue: Public Library of Science
Publication date: 01/01/2011

Instrumental responses are hypothesized to be of two kinds: habitual and goal-directed, mediated by the sensorimotor and the associative cortico-basal ganglia circuits, respectively. The existence of these two heterogeneous associative learning mechanisms can be hypothesized to arise from the comparative advantages that they have at different stages of learning. In this paper, we assume that the goal-directed system is behaviourally flexible, but slow in choice selection. The habitual system, in contrast, is fast in responding, but inflexible in adapting its behavioural strategy to new conditions. Based on these assumptions and using the computational theory of reinforcement learning, we propose a normative model for arbitration between the two processes that strikes an approximately optimal balance between search time and accuracy in decision making. Behaviourally, the model can explain experimental evidence on behavioural sensitivity to outcome at the early stages of learning, but insensitivity at the later stages. It also explains why, when two choices with equal incentive values are available concurrently, behaviour remains outcome-sensitive even after extensive training. Moreover, the model can explain choice reaction time variations during the course of learning, as well as the experimental observation that reaction time increases as the number of choices increases. Neurobiologically, by assuming that phasic and tonic activities of midbrain dopamine neurons carry the reward prediction error and the average reward signals used by the model, respectively, the model predicts that whereas phasic dopamine indirectly affects behaviour through reinforcing stimulus-response associations, tonic dopamine can directly affect behaviour by manipulating the competition between the habitual and the goal-directed systems, and thus affect reaction time.
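
The arbitration idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the decision rule (deliberate only when the estimated benefit of deliberating exceeds the average reward rate multiplied by the deliberation time), the function names, and all numeric values are assumptions chosen for illustration.

```python
# Illustrative sketch only: the rule, names, and numbers below are assumptions,
# not taken from the abstract or the paper's code.

def choose_system(benefit_of_deliberation, average_reward_rate, deliberation_time):
    """Pick the controller for one decision.

    benefit_of_deliberation: assumed estimate of how much reward is gained
        by planning instead of responding habitually.
    average_reward_rate: reward per unit time (the signal the abstract
        associates with tonic dopamine).
    deliberation_time: assumed time the goal-directed system needs.
    """
    # Time spent deliberating is time not spent collecting reward.
    time_cost = average_reward_rate * deliberation_time
    if benefit_of_deliberation > time_cost:
        return "goal-directed"
    return "habitual"


def reaction_time(system, base_time, deliberation_time):
    """Assumed reaction time: deliberation adds to a fixed base latency."""
    if system == "goal-directed":
        return base_time + deliberation_time
    return base_time


if __name__ == "__main__":
    # Early in learning: habitual values are uncertain, so deliberation pays off.
    early = choose_system(0.5, average_reward_rate=0.2, deliberation_time=1.0)
    # Late in learning: little left to gain, so the fast habitual response wins.
    late = choose_system(0.05, average_reward_rate=0.2, deliberation_time=1.0)
    # Higher average reward rate raises the cost of time and favours habits.
    rich = choose_system(0.5, average_reward_rate=0.8, deliberation_time=1.0)
    for label, system in [("early", early), ("late", late), ("rich", rich)]:
        print(label, system, reaction_time(system, 0.3, 1.0))
```

Under these assumptions, the sketch reproduces the qualitative pattern the abstract describes: outcome-sensitive (goal-directed) responding early in training, habitual responding later, and a shift toward faster habitual responses when the average reward rate is high.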

CiteSeerX

Public Library of Science (PLOS)

City Research Online

Crossref

Directory of Open Access Journals

PubMed Central

UCL Discovery