18 research outputs found

Learning mechanisms for DA-modulated spiking networks in the basal ganglia

Author: K Gurney
MA Farries
MD Humphries
MJ Frank
P Cisek
R Bogacz
Simon M Vogt
SP Singh
T Masquelier
Ulrich G Hofmann
W Potjans
W Schultz
Publication venue: BioMed Central
Publication date: 01/01/2010

Crossref

Springer - Publisher Connector

PubMed Central

Closing the loop between neural network simulators and the OpenAI Gym

Author: Jordan Jakob
Morrison Abigail
Weidel Philipp
Publication date: 17/09/2017

Since the enormous breakthroughs in machine learning over the last decade, functional neural network models are of growing interest for many researchers in the field of computational neuroscience. One major branch of research is concerned with biologically plausible implementations of reinforcement learning, with a variety of different models developed over the recent years. However, most studies in this area are conducted with custom simulation scripts and manually implemented tasks. This makes it hard for other researchers to reproduce and build upon previous work and nearly impossible to compare the performance of different learning architectures. In this work, we present a novel approach to solve this problem, connecting benchmark tools from the field of machine learning and state-of-the-art neural network simulators from computational neuroscience. This toolchain enables researchers in both fields to make use of well-tested high-performance simulation software supporting biologically plausible neuron, synapse and network models and allows them to evaluate and compare their approach on the basis of standardized environments of varying complexity. We demonstrate the functionality of the toolchain by implementing a neuronal actor-critic architecture for reinforcement learning in the NEST simulator and successfully training it on two different environments from the OpenAI Gym

arXiv.org e-Print Archive

Juelich Shared Electronic Resources

Population-coding and Dynamic-neurons improved Spiking Actor Network for Reinforcement Learning

Author: Cheng Xiang
Jia Shuncheng
Xu Bo
Zhang Duzhen
Zhang Tielin
Publication date: 14/06/2021

With the Deep Neural Networks (DNNs) as a powerful function approximator, Deep Reinforcement Learning (DRL) has been excellently demonstrated on robotic control tasks. Compared to DNNs with vanilla artificial neurons, the biologically plausible Spiking Neural Network (SNN) contains a diverse population of spiking neurons, making it naturally powerful on state representation with spatial and temporal information. Based on a hybrid learning framework, where a spike actor-network infers actions from states and a deep critic network evaluates the actor, we propose a Population-coding and Dynamic-neurons improved Spiking Actor Network (PDSAN) for efficient state representation from two different scales: input coding and neuronal coding. For input coding, we apply population coding with dynamically receptive fields to directly encode each input state component. For neuronal coding, we propose different types of dynamic-neurons (containing 1st-order and 2nd-order neuronal dynamics) to describe much more complex neuronal dynamics. Finally, the PDSAN is trained in conjunction with deep critic networks using the Twin Delayed Deep Deterministic policy gradient algorithm (TD3-PDSAN). Extensive experimental results show that our TD3-PDSAN model achieves better performance than state-of-the-art models on four OpenAI gym benchmark tasks. It is an important attempt to improve RL with SNN towards the effective computation satisfying biological plausibility.

Comment: 27 pages, 11 figures, accepted by Journal of Neural Network

arXiv.org e-Print Archive

A Compositionality Machine Realized by a Hierarchic Architecture of Synfire Chains

Author: Abigail Morrison
Markus Diesmann
Sven Schrader
Publication venue: Frontiers Research Foundation
Publication date: 01/01/2011

The composition of complex behavior is thought to rely on the concurrent and sequential activation of simpler action components, or primitives. Systems of synfire chains have previously been proposed to account for either the simultaneous or the sequential aspects of compositionality; however, the compatibility of the two aspects has so far not been addressed. Moreover, the simultaneous activation of primitives has up until now only been investigated in the context of reactive computations, i.e., the perception of stimuli. In this study we demonstrate how a hierarchical organization of synfire chains is capable of generating both aspects of compositionality for proactive computations such as the generation of complex and ongoing action. To this end, we develop a network model consisting of two layers of synfire chains. Using simple drawing strokes as a visualization of abstract primitives, we map the feed-forward activity of the upper level synfire chains to motion in two-dimensional space. Our model is capable of producing drawing strokes that are combinations of primitive strokes by binding together the corresponding chains. Moreover, when the lower layer of the network is constructed in a closed-loop fashion, drawing strokes are generated sequentially. The generated pattern can be random or deterministic, depending on the connection pattern between the lower level chains. We propose quantitative measures for simultaneity and sequentiality, revealing a wide parameter range in which both aspects are fulfilled. Finally, we investigate the spiking activity of our model to propose candidate signatures of synfire chain computation in measurements of neural activity during action execution

Crossref

Directory of Open Access Journals

PubMed Central

Frontiers - Publisher Connector

Juelich Shared Electronic Resources

Reinforcement Learning on Slow Features of High-Dimensional Input Streams

Author: Legenstein Robert
Wilbert Niko
Wiskott Laurenz
Publication venue: Public Library of Science
Publication date: 01/01/2010

Humans and animals are able to learn complex behaviors based on a massive stream of sensory information from different modalities. Early animal studies have identified learning mechanisms that are based on reward and punishment such that animals tend to avoid actions that lead to punishment whereas rewarded actions are reinforced. However, most algorithms for reward-based learning are only applicable if the dimensionality of the state-space is sufficiently small or its structure is sufficiently simple. Therefore, the question arises how the problem of learning on high-dimensional data is solved in the brain. In this article, we propose a biologically plausible generic two-stage learning system that can directly be applied to raw high-dimensional input streams. The system is composed of a hierarchical slow feature analysis (SFA) network for preprocessing and a simple neural network on top that is trained based on rewards. We demonstrate by computer simulations that this generic architecture is able to learn quite demanding reinforcement learning tasks on high-dimensional visual input streams in a time that is comparable to the time needed when an explicit highly informative low-dimensional state-space representation is given instead of the high-dimensional visual input. The learning speed of the proposed architecture in a task similar to the Morris water maze task is comparable to that found in experimental studies with rats. This study thus supports the hypothesis that slowness learning is one important unsupervised learning principle utilized in the brain to form efficient state representations for behavioral learning

CiteSeerX

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

Rare neural correlations implement robotic conditioning with delayed rewards and disturbances

Author: Andre Lemme
Andrea Soltoggio
Felix Reinhart
Jochen Steil
Publication date: 01/01/2013

Neural conditioning associates cues and actions with following rewards. The environments in which robots operate, however, are pervaded by a variety of disturbing stimuli and uncertain timing. In particular, variable reward delays make it difficult to reconstruct which previous actions are responsible for following rewards. Such an uncertainty is handled by biological neural networks, but represents a challenge for computational models, suggesting the lack of a satisfactory theory for robotic neural conditioning. The present study demonstrates the use of rare neural correlations in making correct associations between rewards and previous cues or actions. Rare correlations are functional in selecting sparse synapses to be eligible for later weight updates if a reward occurs. The repetition of this process singles out the associating and reward-triggering pathways, and thereby copes with distal rewards. The neural network displays macro-level classical and operant conditioning, which is demonstrated in an interactive real-life human-robot interaction. The proposed mechanism models realistic conditioning in humans and animals and implements similar behaviors in neuro-robotic platforms

Loughborough University Institutional Repository

Synchronicity: The Role of Midbrain Dopamine in Whole-Brain Coordination

Author: Beeler Jeff A.
Dreyer Jakob Kisbye
Publication venue: Society for Neuroscience
Publication date: 01/01/2019

Midbrain dopamine seems to play an outsized role in motivated behavior and learning. Widely associated with mediating reward-related behavior, decision making, and learning, dopamine continues to generate controversies in the field. While many studies and theories focus on what dopamine cells encode, the question of how the midbrain derives the information it encodes is poorly understood and comparatively less addressed. Recent anatomical studies suggest greater diversity and complexity of afferent inputs than previously appreciated, requiring rethinking of prior models. Here, we elaborate a hypothesis that construes midbrain dopamine as implementing a Bayesian selector in which individual dopamine cells sample afferent activity across distributed brain substrates, comprising evidence to be evaluated on the extent to which stimuli in the on-going sensorimotor stream organizes distributed, parallel processing, reflecting implicit value. To effectively generate a temporally resolved phasic signal, a population of dopamine cells must exhibit synchronous activity. We argue that synchronous activity across a population of dopamine cells signals consensus across distributed afferent substrates, invigorating responding to recognized opportunities and facilitating further learning. In framing our hypothesis, we shift from the question of how value is computed to the broader question of how the brain achieves coordination across distributed, parallel processing. We posit the midbrain is part of an “axis of agency” in which the prefrontal cortex (PFC), basal ganglia (BGS), and midbrain form an axis mediating control, coordination, and consensus, respectively

City University of New York

Copenhagen University Research Information System

Dopamine, affordance and active inference.

Author: A Rosell
A Yuille
AA Moustafa
AD Redish
AJ Lees
AM Gotham
AM Owen
AV Kravitz
B Berger
B van Swinderen
BF Skinner
C Bergson
C Bick
C Mathys
C Missale
CD Fiorillo
CR Gerfen
D Badre
D Baldauf
D Joel
D Mumford
DA Allport
DA Lewis
DA Peterson
DC Knill
E Bezard
E Bird
E Gherri
E Koechlin
ES Bromberg-Martin
ET Rolls
FG Ashby
G Chevalier
G Deco
G Winterer
H Deubel
H Feldman
H Haken
Harriet Brown
HC Margolese
J Diedrichsen
J Zhang
JF Leckman
JF Smiley
JJ Gibson
JL Plotkin
JM Fuster
Joseph M. Galea
JR Crittenden
JR Müller
K Doya
K Friston
K Gurney
KA Dalrymple
Karl J. Friston
KC Berridge
KJ Campbell
KJ Friston
Klaas Enno Stephan
KM Shannon
LG Ungerleider
LM Harrison
LS Krimer
LS Zweifel
M Guitart-Masip
M Matsumoto
M Rabinovich
M Takada
M Toussaint
MA Nitsche
MD Humphries
ME Goldberg
MF Rushworth
MI Garrido
MJ Frank
MS Lidow
N Parush
ND Daw
O Monchi
Olaf Sporns
P Anselme
P Cisek
P Dayan
P Redgrave
PR Montague
PS Goldman-Rakic
R Bellman
R Cools
Raymond J. Dolan
RB Rutledge
RG Brown
Rick Adams
RL Gregory
Rosalyn Moran
RP Rao
RS Sutton
S Kakade
S Kakei
S Kapur
S Kojima
SA Davidoff
SHGM Ahmed
SHLM Bestmann
SJ Kiebel
SM Hersch
SM McClure
SM Wanjerkhede
ST Grafton
Sven Bestmann
Tamara Shiner
TE Hazy
Thomas FitzGerald
TJ Vickery
TS Braver
TV Maia
TV Wiecki
UM D'Souza
V Afraimovich
VL Ginzburg
W Potjans
W Schultz
W Shen
W Wu
WD Yao
Y Kubota
Y Kwak
Publication date: 01/01/2011

The role of dopamine in behaviour and decision-making is often cast in terms of reinforcement learning and optimal decision theory. Here, we present an alternative view that frames the physiology of dopamine in terms of Bayes-optimal behaviour. In this account, dopamine controls the precision or salience of (external or internal) cues that engender action. In other words, dopamine balances bottom-up sensory information and top-down prior beliefs when making hierarchical inferences (predictions) about cues that have affordance. In this paper, we focus on the consequences of changing tonic levels of dopamine firing using simulations of cued sequential movements. Crucially, the predictions driving movements are based upon a hierarchical generative model that infers the context in which movements are made. This means that we can confuse agents by changing the context (order) in which cues are presented. These simulations provide a (Bayes-optimal) model of contextual uncertainty and set switching that can be quantified in terms of behavioural and electrophysiological responses. Furthermore, one can simulate dopaminergic lesions (by changing the precision of prediction errors) to produce pathological behaviours that are reminiscent of those seen in neurological disorders such as Parkinson's disease. We use these simulations to demonstrate how a single functional role for dopamine at the synaptic level can manifest in different ways at the behavioural level

CiteSeerX

Public Library of Science (PLOS)

Repository for Publications and Research Data

Crossref

University of Birmingham Research Portal

Directory of Open Access Journals

UCL Discovery

PubMed Central

ZORA

University of East Anglia digital repository

MPG.PuRe

Explore Bristol Research

Spike-Based Reinforcement Learning in Continuous State and Action Space: When Policy Gradient Methods Fail

Author: A Arleo
A Barto
A Klopf
A Morrison
B Devan
B Poucet
C Clopath
C Hull
C von der Malsburg
C Watkins
D Baras
D Di Castro
D Foster
D Sheynikhovich
DO Hebb
E Bienenstock
E Izhikevich
E Nordlie
E Oja
E Thorndike
E Toleman
E Vasilaki
Eleni Vasilaki
F Wörgötter
H Eichenbaum
H Markram
H Seung
I Fiete
J Baxter
J Wickens
JC Zhang
JNJ Reynolds
JP Pfister
K Doya
Karl J. Friston
KG Reymann
LF Abbott
M Packard
M Tsodyks
MA Farries
MCW van Rossum
N White
Nicolas Frémaux
P Dayan
P Redgrave
P Roberts
PJ Sjöström
R Jolivet
R Kempter
R Legenstein
R Morris
R Rao
R Rescorla
R Suri
R Sutton
R Urbanczik
R Williams
RB Stein
RC Malenka
Robert Urbanczik
RS Sutton
RV Florian
S Sajikumar
T Kohonen
T Stroesslin
TVP Bliss
U Frey
V Pawlak
W Gerstner
W Potjans
W Schultz
W Senn
Walter Senn
Wulfram Gerstner
X Xie
XJ Wang
Y Loewenstein
Publication venue: Public Library of Science
Publication date: 01/01/2009

Changes of synaptic connections between neurons are thought to be the physiological basis of learning. These changes can be gated by neuromodulators that encode the presence of reward. We study a family of reward-modulated synaptic learning rules for spiking neurons on a learning task in continuous space inspired by the Morris water maze. The synaptic update rule modifies the release probability of synaptic transmission and depends on the timing of presynaptic spike arrival, postsynaptic action potentials, as well as the membrane potential of the postsynaptic neuron. The family of learning rules includes an optimal rule derived from policy gradient methods as well as reward-modulated Hebbian learning. The synaptic update rule is implemented in a population of spiking neurons using a network architecture that combines feedforward input with lateral connections. Actions are represented by a population of hypothetical action cells with strong Mexican-hat connectivity and are read out at theta frequency. We show that in this architecture, a standard policy gradient rule fails to solve the Morris water maze task, whereas a variant with a Hebbian bias can learn the task within 20 trials, consistent with experiments. This result does not depend on implementation details such as the size of the neuronal populations. Our theoretical approach shows how learning new behaviors can be linked to reward-modulated plasticity at the level of single synapses and makes predictions about the voltage and spike-timing dependence of synaptic plasticity and the influence of neuromodulators such as dopamine. It is an important step towards connecting formal theories of reinforcement learning with neuronal and synaptic properties.

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Bern Open Repository and Information System (BORIS)

White Rose Research Online