
5 research outputs found

Spike-Based Reinforcement Learning in Continuous State and Action Space: When Policy Gradient Methods Fail

Author: A Arleo
A Barto
A Klopf
A Morrison
B Devan
B Poucet
C Clopath
C Hull
C von der Malsburg
C Watkins
D Baras
D Di Castro
D Foster
D Sheynikhovich
DO Hebb
E Bienenstock
E Izhikevich
E Nordlie
E Oja
E Thorndike
E Toleman
E Vasilaki
Eleni Vasilaki
F Wörgötter
H Eichenbaum
H Markram
H Seung
I Fiete
J Baxter
J Wickens
JC Zhang
JNJ Reynolds
JP Pfister
K Doya
Karl J. Friston
KG Reymann
LF Abbott
M Packard
M Tsodyks
MA Farries
MCW van Rossum
N White
Nicolas Frémaux
P Dayan
P Redgrave
P Roberts
PJ Sjöström
R Jolivet
R Kempter
R Legenstein
R Morris
R Rao
R Rescorla
R Suri
R Sutton
R Urbanczik
R Williams
RB Stein
RC Malenka
Robert Urbanczik
RS Sutton
RV Florian
S Sajikumar
S Sajikumar
T Kohonen
T Stroesslin
TVP Bliss
U Frey
V Pawlak
W Gerstner
W Potjans
W Schultz
W Senn
Walter Senn
Wulfram Gerstner
X Xie
XJ Wang
Y Loewenstein
Publication venue: Public Library of Science
Publication date: 01/01/2009

Changes of synaptic connections between neurons are thought to be the physiological basis of learning. These changes can be gated by neuromodulators that encode the presence of reward. We study a family of reward-modulated synaptic learning rules for spiking neurons on a learning task in continuous space inspired by the Morris water maze. The synaptic update rule modifies the release probability of synaptic transmission and depends on the timing of presynaptic spike arrival, postsynaptic action potentials, as well as the membrane potential of the postsynaptic neuron. The family of learning rules includes an optimal rule derived from policy gradient methods as well as reward-modulated Hebbian learning. The synaptic update rule is implemented in a population of spiking neurons using a network architecture that combines feedforward input with lateral connections. Actions are represented by a population of hypothetical action cells with strong Mexican-hat connectivity and are read out at theta frequency. We show that in this architecture, a standard policy gradient rule fails to solve the Morris water maze task, whereas a variant with a Hebbian bias can learn the task within 20 trials, consistent with experiments. This result does not depend on implementation details such as the size of the neuronal populations. Our theoretical approach shows how learning new behaviors can be linked to reward-modulated plasticity at the level of single synapses and makes predictions about the voltage and spike-timing dependence of synaptic plasticity and the influence of neuromodulators such as dopamine. It is an important step towards connecting formal theories of reinforcement learning with neuronal and synaptic properties.

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Bern Open Repository and Information System (BORIS)

White Rose Research Online

Democratic population decisions result in robust policy-gradient learning: A parametric study with GPU simulations

Author: A Arleo
A Barto
A Davison
AP Georgopoulos
C Watkins
D Baras
D Bruederle
D Di Castro
D Foster
D Goodman
D Sheynikhovich
D Talmi
DL Ly
E Izhikevich
E Vasilaki
Eleni Vasilaki
F Bernhard
F Piekniewski
H Markram
HS Seung
I Buck
I Fiete
J Baxter
J Bill
J Eppler
J Friedrich
J Gummaraju
J Nageswaran
J O'Keefe
J Sanders
J van Meel
JP Pfister
K Hamaguchi
Lars Buesing
LF Abbott
M Bhuiyan
M Hines
M Martínez-Zarzuela
M Spiridon
M Van Rossum
MA Farries
Michele Giugliano
N Bell
O Jensen
P Dayan
P Richmond
Paul L. Gribble
Paul Richmond
R Ananthanarayanan
R Coultrip
R Cyrille
R Kempter
R Legenstein
R Morris
R Suri
R Sutton
R Williams
RB Stein
RS Sutton
RV Florian
S Amari
S Furber
S Renaud
S Sengupta
SI Amari
T Stroesslin
U Beierholm
W Gerstner
W Potjans
X Wang
X Xie
Publication date: 01/01/2011

High-performance computing on the Graphics Processing Unit (GPU) is an emerging field driven by the promise of high computational power at a low cost. However, GPU programming is a non-trivial task, and architectural limitations raise the question of whether investing effort in this direction is worthwhile. In this work, we use GPU programming to simulate a two-layer network of Integrate-and-Fire neurons with varying degrees of recurrent connectivity and investigate its ability to learn a simplified navigation task using a policy-gradient learning rule stemming from Reinforcement Learning. The purpose of this paper is twofold. First, we want to support the use of GPUs in the field of Computational Neuroscience. Second, using GPU computing power, we investigate the conditions under which this architecture and learning rule perform best. Our work indicates that networks featuring strong Mexican-Hat-shaped recurrent connections in the top layer, where decision making is governed by the formation of a stable activity bump in the neural population (a "non-democratic" mechanism), achieve mediocre learning results at best. In the absence of recurrent connections, where all neurons "vote" independently ("democratic") for a decision via population vector readout, the task is generally learned better and more robustly. Our study would have been extremely difficult on a desktop computer without the use of GPU programming. We present the routines developed for this purpose and show that they provide a speed improvement of 5x up to 42x over optimised Python code. The higher speed is achieved when we exploit the parallelism of the GPU in the search of learning parameters. This suggests that efficient GPU programming can significantly reduce the time needed for simulating networks of spiking neurons, particularly when multiple parameter configurations are investigated. © 2011 Richmond et al.

Crossref

Directory of Open Access Journals

PubMed Central

Sissa Digital Library

Institutional Repository Universiteit Antwerpen

White Rose Research Online

Analyzing Interactions between Cue-Guided and Place-Based Navigation with a Computational Model of Action Selection: Influence of Sensory Cues and Training

Author: A. Guazzelli
B. Girard
B. Ujfalussy
C.F. Doeller
E. Rich
F. Botreau
G. Martel
J. O’Keefe
J. Pearce
K. Leising
L. Dolle
L.E. Martinet
N. White
P. Carrillo-Mora
Q. Chang
R. Chavarriaga
T. Hartley
T. Stroesslin
Y. Burnod
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2010

The hypothesis that multiple memory systems are involved in learning different navigation strategies has gained strong support from biological experiments. However, it remains difficult for experimentalists to understand how these systems interact. We propose a new computational model of selection between parallel systems involving cue-guided and place-based navigation strategies that allows analyses of selection switches between both control systems, while providing information that is not directly accessible in experiments with animals. Contrary to existing models of navigation, its module of selection is adaptive and uses a criterion which allows the comparison of strategies having different learning processes. Moreover, the spatial representation used by the place-based strategy is based on a recent hippocampus model. We illustrate the ability of this navigation model to analyze animal behavior in experiments in which the availability of sensory cues, together with the amount of training, influence the competitive or cooperative nature of their interactions.

Infoscience - École polytechnique fédérale de Lausanne

Crossref

IP Differentiated Services Over a WDM Passive Optical Star

Author: D Clark
D Levine
E Dinan
J Heinanen
K Nichols
M Gagnaire
N Golmie
P Humblet
R Chipalkatti
R Ramaswami
S Blake
S Floyd
T Stroesslin
V Jacobson
Publication venue: Springer Science and Business Media LLC

Crossref

Reinforcement Learning Using a Continuous Time Actor-Critic Framework with Spiking Neurons

Author: A Arleo
A Barto
A Georgopoulos
C Clopath
C Watkins
D Foster
D Joel
D Sheynikhovich
E Izhikevich
E Vasilaki
G Bi
GH Seol
H Markram
Henning Sprekeler
J Baxter
J Hollerman
J O'Keefe
JC Zhang
JI Gold
JN Tsitsiklis
JNJ Reynolds
JP Pfister
JP Sutton
JR Wickens
JY Cohen
K Doya
K Miyazaki
K Nakamura
Lyle J. Graham
MAA van der Meer
ME Harmon
N Frémaux
Nicolas Frémaux
P Dayan
R Jolivet
R Legenstein
R Williams
RS Sutton
RV Florian
S Song
T Robbins
T Stroesslin
U Frey
V Pawlak
W Gerstner
W Potjans
W Schultz
Wulfram Gerstner
X Xie
Y Loewenstein
Publication venue: Public Library of Science (PLoS)

Crossref