    Learning Shapes Spontaneous Activity Itinerating over Memorized States

    Learning is a process that shapes neural dynamical systems so that an appropriate output pattern is generated for a given input. Such a memory is often considered to be embedded in one of the attractors of a neural dynamical system, reached from the initial neural state specified by an input. Neural activity in the absence of inputs, and the changes in that activity when an input is applied, received little attention in the past. Recent experimental studies, however, have reported the existence of structured spontaneous neural activity that changes when an input is provided. Against this background, we propose that memory recall occurs when the spontaneous neural activity switches to an appropriate output activity upon the application of an input, a phenomenon known as bifurcation in dynamical systems theory. We introduce a reinforcement-learning-based layered neural network model with two synaptic time scales; in this network, input-output relations are memorized successively when the difference between the time scales is appropriate. After learning is complete, the neural dynamics are shaped so that they change appropriately with each input. As the number of memorized patterns increases, the spontaneous neural activity generated after learning itinerates over the previously learned output patterns. This theoretical finding agrees remarkably well with recent experimental reports in which spontaneous neural activity in the visual cortex, in the absence of stimuli, itinerates over the patterns evoked by previously applied signals. Our results suggest that itinerant spontaneous activity is a natural outcome of the successive learning of several patterns, and that it facilitates bifurcation of the network when an input is provided.
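
    To make the two-time-scale mechanism concrete, the following is a minimal Python sketch of a reward-gated Hebbian update applied simultaneously at a fast and a slow synaptic time scale in a small rate network. The network size, learning rates, and reward definition are illustrative assumptions, not the paper's actual model.

        import numpy as np

        rng = np.random.default_rng(0)
        N = 50                                        # number of units (assumed)
        W_fast = 0.1 * rng.standard_normal((N, N))    # fast-changing weights
        W_slow = 0.1 * rng.standard_normal((N, N))    # slow-changing weights
        eta_fast, eta_slow = 0.05, 0.001              # the two learning time scales

        def step(x, inp):
            """One update of the rate dynamics with the combined weights."""
            return np.tanh((W_fast + W_slow) @ x + inp)

        def learn(x_prev, x, reward):
            """Reward-gated Hebbian update applied at both time scales."""
            global W_fast, W_slow
            hebb = np.outer(x, x_prev)
            W_fast += eta_fast * reward * hebb
            W_slow += eta_slow * reward * hebb

        # Toy usage: reward measures how well the activity matches a target
        # pattern; with inp = 0 this probes the spontaneous dynamics.
        target = np.sign(rng.standard_normal(N))
        x = rng.standard_normal(N)
        for _ in range(200):
            x_new = step(x, inp=np.zeros(N))
            overlap = float(np.mean(np.sign(x_new) == target))
            learn(x, x_new, 2.0 * overlap - 1.0)      # map [0,1] overlap to [-1,1]
            x = x_new

    The point of the sketch is only the separation of eta_fast and eta_slow: when the slow component consolidates what the fast component explores, several input-output relations can be stored in succession.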

    Spike-Based Reinforcement Learning in Continuous State and Action Space: When Policy Gradient Methods Fail

    Changes in the synaptic connections between neurons are thought to be the physiological basis of learning. These changes can be gated by neuromodulators that encode the presence of reward. We study a family of reward-modulated synaptic learning rules for spiking neurons on a learning task in continuous space inspired by the Morris water maze. The synaptic update rule modifies the release probability of synaptic transmission and depends on the timing of presynaptic spike arrival, postsynaptic action potentials, and the membrane potential of the postsynaptic neuron. The family of learning rules includes an optimal rule derived from policy gradient methods as well as reward-modulated Hebbian learning. The synaptic update rule is implemented in a population of spiking neurons using a network architecture that combines feedforward input with lateral connections. Actions are represented by a population of hypothetical action cells with strong Mexican-hat connectivity and are read out at theta frequency. We show that in this architecture a standard policy gradient rule fails to solve the Morris water maze task, whereas a variant with a Hebbian bias can learn the task within 20 trials, consistent with experiments. This result does not depend on implementation details such as the size of the neuronal populations. Our theoretical approach shows how learning new behaviors can be linked to reward-modulated plasticity at the level of single synapses, and it makes predictions about the voltage and spike-timing dependence of synaptic plasticity and the influence of neuromodulators such as dopamine. It is an important step towards connecting formal theories of reinforcement learning with neuronal and synaptic properties.
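
    The contrast between the two rules can be sketched for a single stochastic neuron. In the hypothetical Python snippet below, the parameter b interpolates between the pure policy-gradient rule (b = 0) and a Hebbian-biased variant (b > 0); all names and constants are assumptions for illustration, not the paper's implementation.

        import numpy as np

        rng = np.random.default_rng(1)
        w = 0.1 * rng.standard_normal(10)     # synaptic weights (assumed size)

        def spike_prob(x, w):
            """Sigmoidal firing probability of a stochastic neuron."""
            return 1.0 / (1.0 + np.exp(-(w @ x)))

        def update(w, x, spike, reward, eta=0.01, b=0.0):
            """Reward-modulated update; b mixes in the Hebbian bias."""
            p = spike_prob(x, w)
            pg_term = (spike - p) * x     # likelihood-ratio (policy-gradient) term
            hebb_term = spike * x         # Hebbian term: pre-post coincidence
            return w + eta * reward * ((1 - b) * pg_term + b * hebb_term)

        # Toy trial: sample a spike, receive a reward, update the weights.
        x = rng.standard_normal(10)
        spike = float(rng.random() < spike_prob(x, w))
        w = update(w, x, spike, reward=1.0, b=0.5)

    The abstract's reported result corresponds, in this caricature, to b = 0 failing on the water maze task while b > 0 succeeds.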

    Towards a General Theory of Neural Computation Based on Prediction by Single Neurons

    Although there has been tremendous progress in understanding the mechanics of the nervous system, there is still no general theory of its computational function. Here I present a theory that relates the established biophysical properties of single generic neurons to principles of Bayesian probability theory, reinforcement learning, and efficient coding. I suggest that this theory addresses the general computational problem facing the nervous system. Each neuron is proposed to mirror the function of the whole system in learning to predict aspects of the world related to future reward. According to the model, a typical neuron receives current information about the state of the world from a subset of its excitatory synaptic inputs, and prior information from its other inputs. Prior information would be contributed by synaptic inputs representing distinct regions of space, and by different types of non-synaptic, voltage-regulated channels representing distinct periods of the past. The neuron's membrane voltage is proposed to signal the difference between current and prior information (“prediction error” or “surprise”). A neuron would apply a Hebbian plasticity rule to select those excitatory inputs that are most closely correlated with reward but least predictable, since unpredictable inputs provide the neuron with the most “new” information about future reward. To minimize the error in its predictions and to respond only when excitation is “new and surprising,” the neuron selects amongst its prior information sources through an anti-Hebbian rule. The unique inputs of a mature neuron would therefore result from learning about the spatial and temporal patterns in its local environment, and by extension, the external world. Thus the theory describes how the structure of the mature nervous system could reflect the structure of the external world, and how the complexity and intelligence of the system might develop from a population of undifferentiated neurons, each implementing similar learning algorithms.
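
    A speculative sketch of the proposed single-neuron computation, assuming linear summation and a scalar reward signal: the membrane signal is the difference between current and prior information, current inputs are selected by a reward-gated Hebbian rule, and prior inputs are adjusted in anti-Hebbian fashion so that they come to cancel the error. All variable names and constants below are illustrative, not drawn from the paper.

        import numpy as np

        rng = np.random.default_rng(2)
        w_cur = 0.1 * rng.random(20)   # weights on "current information" inputs
        w_pri = 0.1 * rng.random(20)   # weights on "prior information" inputs

        def surprise(x_cur, x_pri):
            """Membrane signal as current minus prior information."""
            return w_cur @ x_cur - w_pri @ x_pri

        def plasticity(x_cur, x_pri, err, reward, eta=0.01):
            """Hebbian selection of rewarded, unpredicted current inputs;
            anti-Hebbian adjustment of priors so they cancel the error."""
            global w_cur, w_pri
            w_cur += eta * reward * err * x_cur
            w_pri += eta * err * x_pri
            w_cur = np.clip(w_cur, 0.0, None)   # excitatory weights stay non-negative
            w_pri = np.clip(w_pri, 0.0, None)

        # Toy usage: one input pattern, its prediction error, and a reward.
        x_cur, x_pri = rng.random(20), rng.random(20)
        err = surprise(x_cur, x_pri)
        plasticity(x_cur, x_pri, err, reward=1.0)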

    An Imperfect Dopaminergic Error Signal Can Drive Temporal-Difference Learning

    An open problem in computational neuroscience is how to link synaptic plasticity to system-level learning. A promising framework in this context is temporal-difference (TD) learning. Experimental evidence supporting the hypothesis that the mammalian brain performs temporal-difference learning includes the resemblance of the phasic activity of midbrain dopaminergic neurons to the TD error and the discovery that cortico-striatal synaptic plasticity is modulated by dopamine. However, because the phasic dopaminergic signal does not reproduce all the properties of the theoretical TD error, it is unclear whether it is capable of driving behavioral adaptation in complex tasks. Here, we present a spiking temporal-difference learning model based on the actor-critic architecture. The model dynamically generates a dopaminergic signal with realistic firing rates and exploits this signal to modulate the plasticity of synapses as a third factor. The predictions of the proposed plasticity dynamics are in good agreement with experimental results with respect to dopamine and pre- and postsynaptic activity. An analytical mapping from the parameters of the proposed plasticity dynamics to those of the classical discrete-time TD algorithm reveals that the biological constraints on the dopaminergic signal entail a modified TD algorithm with self-adapting learning parameters and an adapting offset. We show that the neuronal network learns a task with sparse positive rewards as fast as the corresponding classical discrete-time TD algorithm. However, the performance of the neuronal network is impaired relative to the traditional algorithm on a task with both positive and negative rewards, and it breaks down entirely on a task with purely negative rewards. Our model demonstrates that the asymmetry of a realistic dopaminergic signal enables TD learning when learning is driven by positive rewards, but not when it is driven by negative rewards.
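
    The core asymmetry can be illustrated in a few lines of Python: because dopaminergic firing cannot drop far below its low baseline rate, a dopamine-like TD error is clipped from below. The state space, constants, and exact clipping form here are illustrative assumptions, not the paper's spiking model.

        import numpy as np

        gamma, alpha, baseline = 0.9, 0.1, 0.2   # discount, learning rate, DA baseline
        V = np.zeros(5)                          # critic values for a 5-state toy task

        def dopaminergic_td_error(r, s, s_next):
            """TD error carried by a dopamine-like signal: negative errors
            are clipped at -baseline because firing cannot go below zero."""
            delta = r + gamma * V[s_next] - V[s]
            return max(delta, -baseline)

        def critic_update(r, s, s_next):
            V[s] += alpha * dopaminergic_td_error(r, s, s_next)

        # With positive rewards the clipped signal matches the true TD error,
        # so learning proceeds; with purely negative rewards most of the error
        # is lost to the clipping and the value estimates stop improving.
        critic_update(r=1.0, s=0, s_next=1)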

    A Kinetic Model of Dopamine- and Calcium-Dependent Striatal Synaptic Plasticity

    Corticostriatal synaptic plasticity of medium spiny neurons is regulated by glutamate input from the cortex and dopamine input from the substantia nigra. While cortical stimulation alone results in long-term depression (LTD), its combination with dopamine switches LTD to long-term potentiation (LTP), which is known as dopamine-dependent plasticity. LTP is also induced by cortical stimulation in magnesium-free solution, which leads to massive calcium influx through NMDA-type receptors and is regarded as calcium-dependent plasticity. The signaling cascades in corticostriatal spines are currently under investigation. However, because multiple excitatory and inhibitory pathways form loops, the mechanisms regulating the two types of plasticity remain poorly understood. A signaling pathway model of spines that express D1-type dopamine receptors was constructed to analyze the dynamic mechanisms of dopamine- and calcium-dependent plasticity. The model incorporated all major signaling molecules, including the dopamine- and cyclic AMP-regulated phosphoprotein with a molecular weight of 32 kDa (DARPP-32), as well as AMPA receptor trafficking in the post-synaptic membrane. Simulations with dopamine and calcium inputs reproduced both dopamine- and calcium-dependent plasticity. Further in silico experiments revealed that the positive feedback loop formed by protein kinase A (PKA), protein phosphatase 2A (PP2A), and the phosphorylation site at threonine 75 of DARPP-32 (Thr75) served as the major switch for inducing LTD and LTP. Calcium input modulated this loop through the PP2B (phosphatase 2B)-CK1 (casein kinase 1)-Cdk5 (cyclin-dependent kinase 5)-Thr75 pathway and through PP2A, whereas combined calcium and dopamine input activated the loop via PKA activation by cyclic AMP (cAMP). The positive feedback loop displayed robust bistable responses under changes in the reaction parameters, and increased basal dopamine levels disrupted dopamine-dependent plasticity. The present model elucidates the mechanisms of bidirectional regulation of corticostriatal synapses and will allow further exploration into the causes of, and therapies for, dysfunctions such as drug addiction.
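
    The switch behaviour can be caricatured by a one-variable reduction of such a positive feedback loop: basal production plus sigmoidal self-activation minus degradation yields two stable states, and a transient dopamine-plus-calcium stimulus flips the loop from the low (LTD-like) to the high (LTP-like) state. This single-variable form and all rate constants are illustrative assumptions, far simpler than the paper's full kinetic model.

        import numpy as np

        def dxdt(x, stim, basal=0.05, v=1.0, K=0.5, d=1.0):
            """Basal production + sigmoidal positive feedback - degradation."""
            return basal + stim + v * x**2 / (K**2 + x**2) - d * x

        def run(x0, stim_fn, dt=0.01, T=50.0):
            """Simple Euler integration of the loop variable."""
            x, xs = x0, []
            for t in np.arange(0.0, T, dt):
                x += dt * dxdt(x, stim_fn(t))
                xs.append(x)
            return np.array(xs)

        # A transient stimulus pulse switches the loop from the low state
        # (~0.07) to the high state (~0.7), where it remains after the
        # pulse ends: a bistable switch between LTD- and LTP-like regimes.
        trace = run(x0=0.07, stim_fn=lambda t: 0.4 if 10.0 < t < 15.0 else 0.0)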

    A new framework for cortico-striatal plasticity: behavioural theory meets in vitro data at the reinforcement-action interface

    Operant learning requires that reinforcement signals interact with action representations at a suitable neural interface. Much evidence suggests that this occurs when phasic dopamine, acting as a reinforcement prediction error, gates plasticity at cortico-striatal synapses, thereby changing the future likelihood of selecting the action(s) coded by striatal neurons. But this hypothesis faces serious challenges. First, cortico-striatal plasticity is inexplicably complex, depending on spike timing, dopamine level, and dopamine receptor type. Second, there is a credit assignment problem: action selection signals occur long before the consequent dopamine reinforcement signal. Third, the two types of striatal output neuron have apparently opposite effects on action selection. Whether these factors rule out the interface hypothesis, and how they interact to produce reinforcement learning, is unknown. We present a computational framework that addresses these challenges. We first predict the expected activity changes over an operant task for both types of action-coding striatal neuron, and show that they co-operate to promote action selection in learning and compete to promote action suppression in extinction. Separately, we derive a complete model of dopamine- and spike-timing-dependent cortico-striatal plasticity from in vitro data. We then show that this model produces the predicted activity changes necessary for learning and extinction in an operant task, a remarkable convergence of a bottom-up, data-driven plasticity model with the top-down behavioural requirements of learning theory. Moreover, we show that the complex dependencies of cortico-striatal plasticity are not only sufficient but necessary for learning and extinction. Validating the model, we show that it can account for behavioural data describing extinction, renewal, and reacquisition, and can replicate in vitro experimental data on cortico-striatal plasticity. By bridging the levels between the single synapse and behaviour, our model shows how the striatum acts as the action-reinforcement interface.
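
    The credit-assignment challenge is commonly handled with a three-factor rule of the following kind: spike-timing coincidences write into an eligibility trace rather than into the weight itself, and a later phasic dopamine signal converts the decaying trace into an actual weight change. The Python sketch below is a generic illustration of that scheme with assumed constants, not the specific plasticity model derived in the paper.

        import numpy as np

        tau_e, dt = 1.0, 0.001   # eligibility time constant (s), simulation step (s)
        w, e = 0.5, 0.0          # synaptic weight and its eligibility trace

        def on_spike_pair(delta_t, A_plus=0.010, A_minus=0.012, tau=0.020):
            """STDP kernel applied to the trace: pre-before-post
            (delta_t > 0) marks the synapse for potentiation,
            post-before-pre for depression."""
            global e
            if delta_t > 0:
                e += A_plus * np.exp(-delta_t / tau)
            else:
                e -= A_minus * np.exp(delta_t / tau)

        def step(dopamine):
            """Phasic dopamine (the third factor) converts the decaying
            trace into a weight change."""
            global w, e
            w += dopamine * e * dt
            e -= (e / tau_e) * dt

        # A spike pairing now...
        on_spike_pair(delta_t=0.01)
        # ...is credited only when dopamine arrives hundreds of steps later.
        for _ in range(500):
            step(dopamine=0.0)
        step(dopamine=1.0)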

    Safety out of control: dopamine and defence

    Reinforcement, Dopamine and Rodent Models in Drug Development for ADHD

    Consensus Paper: Towards a Systems-Level View of Cerebellar Function: the Interplay Between Cerebellum, Basal Ganglia, and Cortex
