
    Spatio-Temporal Credit Assignment in Neuronal Population Learning

    In learning from trial and error, animals need to relate behavioral decisions to environmental reinforcement, even though it may be difficult to assign credit to a particular decision when outcomes are uncertain or subject to delays. When considering the biophysical basis of learning, the credit-assignment problem is compounded because the behavioral decisions themselves result from the spatio-temporal aggregation of many synaptic releases. We present a model of plasticity induction for reinforcement learning in a population of leaky integrate-and-fire neurons which is based on a cascade of synaptic memory traces. Each synaptic cascade correlates presynaptic input first with postsynaptic events, next with the behavioral decisions, and finally with external reinforcement. For operant conditioning, learning succeeds even when reinforcement is delivered with a delay so large that temporal contiguity between decision and pertinent reward is lost due to intervening decisions which are themselves subject to delayed reinforcement. This shows that the model provides a viable mechanism for temporal credit assignment. Further, learning speeds up with increasing population size, so the plasticity cascade simultaneously addresses the spatial problem of assigning credit to synapses in different population neurons. Simulations on other tasks, such as sequential decision making, serve to contrast the performance of the proposed scheme with that of temporal-difference-based learning. We argue that, due to their comparative robustness, synaptic plasticity cascades are attractive basic models of reinforcement learning in the brain.
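
    A rate-based caricature of the cascade idea is sketched below, assuming an invented binary-decision task with delayed reward; the three trace stages, the constants, and the readout are illustrative stand-ins, not the paper's spiking model.

```python
import numpy as np

# Rate-based caricature of a three-stage synaptic trace cascade with delayed
# reinforcement. Task, constants, and readout are illustrative assumptions.
rng = np.random.default_rng(0)
n_syn, n_trials = 50, 2000
tau1, tau2, lr, delay = 0.8, 0.9, 0.02, 3   # trace decays, learning rate, reward delay

w = np.zeros(n_syn)            # synaptic weights
e1 = np.zeros(n_syn)           # stage 1: pre x post coincidence trace
e2 = np.zeros(n_syn)           # stage 2: stage-1 trace gated by the decision
pending = []                   # (trials until reinforcement, decision made then)

for t in range(n_trials):
    pre = (rng.random(n_syn) < 0.2).astype(float)   # presynaptic activity
    post = pre @ w + rng.normal(0.0, 0.3) > 0.0     # crude postsynaptic readout
    decision = int(post)                            # behavioural decision

    e1 = tau1 * e1 + pre * (1.0 if post else -0.2)  # correlate pre with post
    e2 = tau2 * e2 + (e1 if decision else 0.0)      # correlate e1 with the decision

    pending.append([delay, decision])
    due = [d for d in pending if d[0] == 0]         # reinforcements arriving now
    pending = [[k - 1, d] for k, d in pending if k > 0]
    for _, past_decision in due:
        r = 1.0 if past_decision == 1 else -1.0     # delayed reward for an earlier decision
        w = np.clip(w + lr * r * e2, -1.0, 1.0)     # stage 3: correlate e2 with reinforcement
```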

    Neural Coordination of Distinct Motor Learning Strategies: Latent Neurofunctional Mechanisms Elucidated via Computational Modeling

    In this dissertation, a neurofunctional theory of learning is presented as an extension of functional analysis. This new theory clarifies, via applied quantitative analysis, the distinction between functionally intrinsic (essential) mechanistic structures and irrelevant structural details. The thesis is supported by a review of the relevant literature to provide historical context and sufficient scientific background. Further, the scope of the thesis is elucidated by two questions posed from a neurofunctional perspective: (1) How can specialized neuromorphology contribute to the functional dynamics of neural learning processes? (2) Can large-scale neurofunctional pathways emerge via inter-network communication between disparate neural circuits? These questions motivate the specific aims of this dissertation. Each aim is addressed by posing a relevant hypothesis, which is then tested via a neurocomputational experiment. In each experiment, computational techniques are leveraged to elucidate specific mechanisms that underlie neurofunctional learning processes. For instance, the role of specialized neuromorphology is investigated via the development of a computational model that replicates the neurophysiological mechanisms underlying cholinergic interneurons’ regulation of dopamine in the striatum during reinforcement learning. Another research direction focuses on the emergence of large-scale neurofunctional pathways that connect the cerebellum and basal ganglia; this study also involves the construction of a neurocomputational model. The results of each study illustrate the capability of neurocomputational models to replicate the functional learning dynamics of human subjects during a variety of motor adaptation tasks. Finally, the significance of neurofunctional theory, and some of its potential applications, are discussed.

    An Imperfect Dopaminergic Error Signal Can Drive Temporal-Difference Learning

    An open problem in the field of computational neuroscience is how to link synaptic plasticity to system-level learning. A promising framework in this context is temporal-difference (TD) learning. Experimental evidence supporting the hypothesis that the mammalian brain performs temporal-difference learning includes the resemblance of the phasic activity of the midbrain dopaminergic neurons to the TD error and the discovery that cortico-striatal synaptic plasticity is modulated by dopamine. However, as the phasic dopaminergic signal does not reproduce all the properties of the theoretical TD error, it is unclear whether it is capable of driving behavior adaptation in complex tasks. Here, we present a spiking temporal-difference learning model based on the actor-critic architecture. The model dynamically generates a dopaminergic signal with realistic firing rates and exploits this signal to modulate the plasticity of synapses as a third factor. The predictions of our proposed plasticity dynamics are in good agreement with experimental results with respect to dopamine and pre- and postsynaptic activity. An analytical mapping from the parameters of our proposed plasticity dynamics to those of the classical discrete-time TD algorithm reveals that the biological constraints of the dopaminergic signal entail a modified TD algorithm with self-adapting learning parameters and an adapting offset. We show that the neuronal network is able to learn a task with sparse positive rewards as fast as the corresponding classical discrete-time TD algorithm. However, the performance of the neuronal network is impaired relative to the traditional algorithm on a task with both positive and negative rewards, and it breaks down entirely on a task with purely negative rewards. Our model demonstrates that the asymmetry of a realistic dopaminergic signal enables TD learning when learning is driven by positive rewards but not when it is driven by negative rewards.
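
    The following discrete-time actor-critic sketch illustrates the asymmetry argument on a toy chain task of my own devising: the TD error is clipped at a small negative baseline before it modulates the critic and actor updates, so learning from positive rewards survives while corrections from negative errors are limited. All constants and the task are assumptions, not the paper's spiking implementation.

```python
import numpy as np

# Discrete-time actor-critic on a toy chain task, with the TD error clipped at a
# small negative "baseline" before it acts as the third factor. All names and
# constants are illustrative; this is not the paper's spiking network.
rng = np.random.default_rng(1)
n_states, n_actions = 10, 2
V = np.zeros(n_states)                 # critic: state values
H = np.zeros((n_states, n_actions))    # actor: action preferences
gamma, alpha_v, alpha_h, baseline = 0.9, 0.1, 0.1, 0.1

def step(s, a):
    """Move right (a=1) or left (a=0) on a chain; reward 1 at the right end."""
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == n_states - 1 else 0.0), s2 == n_states - 1

for episode in range(500):
    s, done = 0, False
    while not done:
        p = np.exp(H[s]) / np.exp(H[s]).sum()        # softmax policy
        a = rng.choice(n_actions, p=p)
        s2, r, done = step(s, a)
        delta = r + (0.0 if done else gamma * V[s2]) - V[s]
        dopamine = max(delta, -baseline)             # asymmetric: negative errors compressed
        V[s] += alpha_v * dopamine
        H[s, a] += alpha_h * dopamine
        s = s2
```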

    Temporal-Difference Reinforcement Learning with Distributed Representations

    Temporal-difference (TD) algorithms have been proposed as models of reinforcement learning (RL). We examine two issues of distributed representation in these TD algorithms: distributed representations of belief and distributed discounting factors. Distributed representation of belief allows the believed state of the world to be distributed across sets of equivalent states. Distributed exponential discounting factors produce hyperbolic discounting in the behavior of the agent itself. We examine these issues in the context of a TD RL model in which state-belief is distributed over a set of exponentially discounting “micro-agents” (µAgents), each of which has a separate discounting factor (γ). Each µAgent maintains an independent hypothesis about the state of the world and a separate value estimate of taking actions within that hypothesized state. The overall agent thus instantiates a flexible representation of an evolving world-state. As with other TD models, the value-error (δ) signal within the model matches dopamine signals recorded from animals in standard conditioning reward paradigms. The distributed representation of belief provides an explanation for the decrease in dopamine at the conditioned stimulus seen in overtrained animals, for the differences between trace and delay conditioning, and for transient bursts of dopamine seen at movement initiation. Because each µAgent also includes its own exponential discounting factor, the overall agent shows hyperbolic discounting, consistent with behavioral experiments.
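
    A few lines suffice to illustrate the discounting claim: averaging exponential discount curves over a population of µAgents with heterogeneous γ values produces an approximately hyperbolic curve at the level of the whole agent. The particular γ distribution and the hyperbolic rate k below are illustrative choices, not the paper's parameters.

```python
import numpy as np

# Averaging exponential discount curves over µAgents with heterogeneous gamma
# yields an approximately hyperbolic discount curve; the gamma range and the
# hyperbolic rate k are illustrative choices.
delays = np.arange(50)
gammas = np.linspace(0.5, 0.99, 200)                # one gamma per µAgent
agent_curve = (gammas[:, None] ** delays[None, :]).mean(axis=0)
hyperbolic = 1.0 / (1.0 + 0.25 * delays)            # 1 / (1 + k d) for comparison

print(np.round(agent_curve[:10], 3))
print(np.round(hyperbolic[:10], 3))
```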

    Correlates of reward-predictive value in learning-related hippocampal neural activity

    Temporal-difference (TD) learning is a popular algorithm in machine learning. Two learning signals derived from this algorithm, the predictive value and the prediction error, have been shown to explain changes in neural activity and behavior during learning across species. Here, the predictive value signal is used to explain the time course of learning-related changes in the activity of hippocampal neurons in monkeys performing an associative learning task. The TD algorithm serves as the centerpiece of a joint probability model for the learning-related neural activity and the behavioral responses recorded during the task. The neural component of the model consists of spiking neurons that compete and learn the reward-predictive value of task-relevant input signals. The predictive value signaled by these neurons influences the behavioral response generated by a stochastic decision stage, which constitutes the behavioral component of the model. It is shown that the time course of the changes in neural activity and behavioral performance generated by the model exhibits key features of the experimental data. The results suggest that information about correct associations may be expressed in the hippocampus before it is detected in the behavior of a subject. In this way, the hippocampus may be among the earliest brain areas to express learning and drive the behavioral changes associated with learning. Correlates of reward-predictive value may be expressed in the hippocampus through rate remapping within spatial memory representations, they may represent reward-related aspects of a declarative or explicit relational memory representation of task contingencies, or they may correspond to reward-related components of episodic memory representations. These potential functions are discussed in connection with hippocampal cell assembly sequences and their reverse reactivation during the awake state. The results provide further support for the proposal that neural processes underlying learning may implement a temporal-difference-like algorithm.
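
    A minimal sketch of the coupling between a learned predictive value and a stochastic decision stage is given below, assuming a delta-rule value update (the terminal-reward special case of TD) and a logistic response stage; the constants and the link function are illustrative, not the paper's joint probability model.

```python
import numpy as np

# Learned predictive value feeding a stochastic decision stage: a delta-rule
# value update (the terminal-reward case of TD) and a logistic response
# probability. Constants and the logistic link are illustrative assumptions.
rng = np.random.default_rng(2)
n_stimuli, n_trials, alpha = 4, 400, 0.1
value = np.zeros(n_stimuli)                        # reward-predictive value per stimulus

def p_correct(v):
    return 1.0 / (1.0 + np.exp(-4.0 * (v - 0.5)))  # stochastic decision stage

for t in range(n_trials):
    stim = rng.integers(n_stimuli)
    correct = rng.random() < p_correct(value[stim])  # behavioural response
    r = 1.0 if correct else 0.0
    value[stim] += alpha * (r - value[stim])         # prediction-error update
```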

    Saccade learning with concurrent cortical and subcortical basal ganglia loops

    The basal ganglia form a central structure involved in multiple cortical and subcortical loops. Some of these loops are believed to be responsible for saccade target selection. We study here how the very specific structural relationships of these saccadic loops can affect the ability to learn spatial and feature-based tasks. We propose a model of saccade generation with reinforcement learning capabilities based on our previous basal ganglia and superior colliculus models. It is structured around the interactions of two parallel cortico-basal loops and one tecto-basal loop. The two cortical loops separately deal with spatial and non-spatial information to select targets in a concurrent way. The subcortical loop is used to make the final target selection leading to the production of the saccade. These different loops may work in concert or disturb each other with regard to reward maximization. Interactions between these loops and their learning capabilities are tested on different saccade tasks. The results show the ability of this model to correctly learn basic target selection based on different criteria (spatial or not). Moreover, the model reproduces and explains training-dependent express saccades toward targets based on a spatial criterion. Finally, the model predicts that, in the absence of prefrontal control, the spatial loop should dominate.
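
    The sketch below caricatures the concurrent-loop architecture: two "cortical" loops score candidate targets on different criteria (position versus feature), a final selection stage draws the saccade target from their summed drive, and both loops are trained by the same reward. The task, the softmax selection, and the simplified policy-gradient update are assumptions, not the original basal ganglia and superior colliculus model.

```python
import numpy as np

# Two concurrent selection loops (spatial and feature-based) feeding one final
# selection stage, both trained by the same reward. Task, softmax selection,
# and the simplified policy-gradient update are illustrative assumptions.
rng = np.random.default_rng(3)
n_targets, n_trials, lr = 4, 2000, 0.1
w_space = np.zeros(n_targets)          # "spatial loop": preference per position
w_feat = np.zeros(2)                   # "feature loop": preference per colour

for t in range(n_trials):
    colours = rng.integers(0, 2, size=n_targets)   # each target is assigned a colour
    drive = w_space + w_feat[colours]              # loops converge on the selection stage
    p = np.exp(drive) / np.exp(drive).sum()
    choice = rng.choice(n_targets, p=p)
    r = 1.0 if colours[choice] == 1 else 0.0       # task rewards a colour, not a position
    w_space[choice] += lr * r * (1.0 - p[choice])  # both loops see the same reward
    w_feat[colours[choice]] += lr * r * (1.0 - p[choice])
```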

    Phasic Dopamine Changes and Hebbian Mechanisms during Probabilistic Reversal Learning in Striatal Circuits: A Computational Study

    Cognitive flexibility is essential for modifying our behavior in a non-stationary environment and is often explored with reversal learning tasks. The basal ganglia (BG) dopaminergic system, under top-down control of the prefrontal cortex, is known to be involved in flexible action selection through reinforcement learning. However, how adaptive dopamine changes regulate this process, and which learning mechanisms train the striatal synapses, remain open questions. The current study uses a neurocomputational model of the BG, based on dopamine-dependent direct (Go) and indirect (NoGo) pathways, to investigate reinforcement learning in a probabilistic environment through a task that associates different stimuli with different actions. Here, we investigated the efficacy of several versions of the Hebb rule, based on the covariance between pre- and postsynaptic neurons, as well as the control of phasic dopamine changes required to achieve proper reversal learning. Furthermore, an original mechanism for modulating the phasic dopamine changes is proposed, assuming that the expected reward probability is coded by the activity of the winner Go neuron before a reward/punishment takes place. Simulations show that this original formulation of automatic phasic dopamine control achieves good, flexible reversal learning even in difficult conditions. The current outcomes may contribute to understanding the mechanisms for active control of dopamine changes during flexible behavior. In perspective, the model may be applied to neuropsychiatric or neurological disorders, such as Parkinson's disease or schizophrenia, in which reinforcement learning is impaired.
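
    One way to make the proposed combination concrete is a dopamine-gated covariance rule on Go weights, with the expected reward read from the winning Go activity as the abstract describes; the stimulus-action task, winner-take-all selection, and every constant below are illustrative assumptions rather than the authors' full BG model.

```python
import numpy as np

# Dopamine-gated covariance Hebb rule on Go-pathway weights, with expected
# reward read out from the winning Go activity. The stimulus-action task,
# winner-take-all selection, and all constants are illustrative assumptions.
rng = np.random.default_rng(4)
n_stim, n_act, n_trials, lr = 2, 2, 600, 0.2
W_go = np.full((n_act, n_stim), 0.5)                # Go-pathway weights
reward_prob = np.array([[0.8, 0.2],                 # P(reward | action, stimulus)
                        [0.2, 0.8]])

for t in range(n_trials):
    s = rng.integers(n_stim)
    x = np.eye(n_stim)[s]                           # one-hot stimulus input
    go = W_go @ x + rng.normal(0.0, 0.05, n_act)    # Go activities
    a = int(np.argmax(go))                          # winner-take-all action selection
    expected = float(np.clip(go[a], 0.0, 1.0))      # winner Go activity codes expected reward
    r = float(rng.random() < reward_prob[a, s])
    dopamine = r - expected                         # phasic dopamine change
    post_dev = (np.arange(n_act) == a) - 1.0 / n_act  # postsynaptic deviation from mean
    pre_dev = x - x.mean()                            # presynaptic deviation from mean
    W_go = np.clip(W_go + lr * dopamine * np.outer(post_dev, pre_dev), 0.0, 1.0)
```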

    The influence of dopamine on prediction, action and learning

    In this thesis I explore functions of the neuromodulator dopamine in the context of autonomous learning and behaviour. I first investigate dopaminergic influence within a simulated agent-based model, demonstrating how modulation of synaptic plasticity can enable reward-mediated learning that is both adaptive and self-limiting. I describe how this mechanism is driven by the dynamics of agent-environment interaction and consequently suggest roles for both complex spontaneous neuronal activity and specific neuroanatomy in the expression of early, exploratory behaviour. I then show how the observed response of dopamine neurons in the mammalian basal ganglia may also be modelled by similar processes involving dopaminergic neuromodulation and cortical spike-pattern representation within an architecture of counteracting excitatory and inhibitory neural pathways, reflecting gross mammalian neuroanatomy. Significantly, I demonstrate how combined modulation of synaptic plasticity and neuronal excitability enables specific (timely) spike patterns to be recognised and selectively responded to by efferent neural populations, thereby providing a novel spike-timing-based implementation of the hypothetical ‘serial-compound’ representation suggested by temporal difference learning. I subsequently discuss more recent work, focused on modelling the complex spike patterns observed in cortex. Here, I describe neural features likely to contribute to the expression of such activity and then present novel simulation software allowing for interactive exploration of these factors in a more comprehensive neural model that implements both dynamical synapses and dopaminergic neuromodulation. I conclude by describing how the work presented ultimately suggests an integrated theory of autonomous learning, in which direct coupling of agent and environment supports a predictive coding mechanism, bootstrapped in early development by a more fundamental process of trial-and-error learning.
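
    For readers unfamiliar with the ‘serial-compound’ representation mentioned above, the sketch below runs textbook TD(0) over a tapped-delay-line stimulus (one feature per time step after CS onset); with training, the prediction error migrates from reward delivery back toward CS onset. Timing and learning constants are illustrative, and this is the abstract algorithm rather than the thesis's spike-timing implementation.

```python
import numpy as np

# TD(0) with a 'complete serial compound' stimulus representation: one delay-line
# feature per time step after CS onset. Timing and learning constants are
# illustrative; with training, the prediction error shifts from reward to CS.
n_steps, cs_time, reward_time = 20, 2, 15
gamma, alpha, n_trials = 0.98, 0.1, 300
w = np.zeros(n_steps)                          # one weight per delay-line element

def features(t):
    x = np.zeros(n_steps)
    if t >= cs_time:
        x[t - cs_time] = 1.0                   # element active t - cs_time steps after CS onset
    return x

for trial in range(n_trials):
    for t in range(n_steps - 1):
        x, x_next = features(t), features(t + 1)
        r = 1.0 if t + 1 == reward_time else 0.0
        delta = r + gamma * w @ x_next - w @ x  # TD prediction error (dopamine-like)
        w += alpha * delta * x
```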