
    [Re] How Attention Can Create Synaptic Tags for the Learning of Working Memories in Sequential Tasks

    The reference paper introduces a new reinforcement learning model called Attention-Gated MEmory Tagging (AuGMEnT). The results presented suggest new approaches in understanding the acquisition of tasks requiring working memory and attentional feedback, as well as biologically plausible learning mechanisms. The model also improves on previous reinforcement learning schemes by allowing tasks to be expressed more naturally as a sequence of inputs and outputs. A Python implementation of the model is available on the author's GitHub page, which helped to verify the correctness of the computations. The script written for this replication also uses Python along with NumPy.
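    The replication code itself is not reproduced here, but the central mechanism AuGMEnT adds to standard reinforcement learning (synaptic tags set by the selected action's attentional feedback and consolidated by a reward-prediction error) can be sketched in a few lines of NumPy. The snippet below is a minimal illustration under assumed dimensions and a greedy action-selection rule; it omits AuGMEnT's memory (integrating) units and is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: 4 sensory inputs, 3 association units, 2 Q-value outputs.
n_in, n_hid, n_out = 4, 3, 2
V = rng.normal(scale=0.1, size=(n_in, n_hid))    # input -> association weights
W = rng.normal(scale=0.1, size=(n_hid, n_out))   # association -> Q-value weights

beta, lam, gamma = 0.1, 0.2, 0.9    # learning rate, tag decay, discount factor
tag_V = np.zeros_like(V)
tag_W = np.zeros_like(W)
q_prev = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two illustrative time steps: no reward, then reward.
for x, reward in [(rng.random(n_in), 0.0), (rng.random(n_in), 1.0)]:
    h = sigmoid(x @ V)            # association-layer activity
    q = h @ W                     # action values
    a = int(np.argmax(q))         # greedy action selection (exploration omitted)

    # SARSA-style reward-prediction error.
    delta = reward + gamma * q[a] - q_prev

    # All weights change in proportion to their synaptic tags, gated by delta.
    W += beta * delta * tag_W
    V += beta * delta * tag_V

    # Tags decay, and new tags are set only on synapses onto the selected action
    # and on association synapses reached by its attentional feedback.
    tag_W *= (1.0 - lam)
    tag_W[:, a] += h
    feedback = W[:, a]            # feedback weights from the chosen action
    tag_V = (1.0 - lam) * tag_V + np.outer(x, feedback * h * (1.0 - h))

    q_prev = q[a]
```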

    A Biologically Plausible Learning Rule for Deep Learning in the Brain

    Researchers have proposed that deep learning, which is providing important progress in a wide range of high complexity tasks, might inspire new insights into learning in the brain. However, the methods used for deep learning by artificial neural networks are biologically unrealistic and would need to be replaced by biologically realistic counterparts. Previous biologically plausible reinforcement learning rules, like AGREL and AuGMEnT, showed promising results but focused on shallow networks with three layers. Will these learning rules also generalize to networks with more layers and can they handle tasks of higher complexity? Here, we show that these learning rules generalize to deep networks if we include an attention network that propagates information about the selected action to lower network levels. We demonstrate the learning scheme on classical and hard image-classification benchmarks, namely MNIST, CIFAR10 and CIFAR100, cast as direct reward tasks, for fully connected, convolutional, and locally connected architectures. We show that our learning rule, Q-AGREL, performs comparably to supervised learning via error-backpropagation, with this type of trial-and-error reinforcement learning requiring only 1.5-2.5 times more epochs, even when classifying 100 different classes as in CIFAR100. Our results provide new insights into how deep learning may be implemented in the brain.
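    One concrete piece of the setup that a short example can clarify is how a classification benchmark is "cast as a direct reward task": the network's predicted class is treated as an action, and the only feedback is a reward of 1 for the correct class and 0 otherwise. The sketch below illustrates that reward scheme under assumed names and an epsilon-greedy choice; it is not the authors' training code.

```python
import numpy as np

def select_and_reward(q_values, label, epsilon, rng):
    """Pick a class as an 'action' (epsilon-greedy) and return (action, reward).

    Reward is 1.0 only when the chosen class equals the label, so the
    classification problem becomes a trial-and-error reward task.
    """
    if rng.random() < epsilon:
        action = int(rng.integers(len(q_values)))   # exploratory guess
    else:
        action = int(np.argmax(q_values))           # greedy choice
    return action, float(action == label)

# Example: a 10-class problem (as in MNIST or CIFAR10), with random Q-values.
rng = np.random.default_rng(0)
q = rng.normal(size=10)
action, reward = select_and_reward(q, label=3, epsilon=0.05, rng=rng)
```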

    Continuous-time spike-based reinforcement learning for working memory tasks

    As the brain purportedly employs on-policy reinforcement learning compatible with SARSA learning, and most interesting cognitive tasks require some form of memory while taking place in continuous time, recent work has developed plausible reinforcement learning schemes that are compatible with these requirements. What is lacking is a formulation of both computation and learning in terms of spiking neurons. Such a formulation creates a closer mapping to biology and also expresses such learning in terms of asynchronous and sparse neural computation. We present a spiking neural network with memory that learns cognitive tasks in continuous time. Learning is implemented in a biologically plausible manner using the AuGMEnT framework, and we show that separate spiking forward and feedback networks suffice for learning the tasks just as fast as the analog CT-AuGMEnT counterpart, while computing efficiently using very few spikes: 1–20 Hz on average.
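    The paper's spiking formulation is not reproduced here, but the kind of sparse, asynchronous computation it relies on can be illustrated with a generic leaky integrate-and-fire neuron. All constants in the sketch below are illustrative assumptions, chosen only so that the neuron fires in the 1–20 Hz range mentioned above; it is not the model used in the paper.

```python
import numpy as np

# A generic leaky integrate-and-fire (LIF) neuron, discretized with forward Euler.
dt = 1e-3                 # 1 ms time step
tau = 20e-3               # membrane time constant (20 ms)
v_thresh, v_reset = 1.0, 0.0
drive = 1.05              # constant, slightly suprathreshold input

v = 0.0
spikes = []
for step in range(2000):                  # simulate 2 seconds
    v += (dt / tau) * (drive - v)         # leaky integration toward the input
    if v >= v_thresh:                     # threshold crossing: emit a spike
        spikes.append(step * dt)
        v = v_reset                       # reset after the spike

rate = len(spikes) / 2.0                  # spikes per second over the 2 s run
print(f"{rate:.1f} Hz")
```

    With these constants the neuron emits roughly 16 spikes per second, i.e. one spike per several dozen integration steps, which is what makes event-driven spiking computation cheap.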

    Continuous-time on-policy neural reinforcement learning of working memory tasks

    As living organisms, one of our primary characteristics is the ability to rapidly process and react to unknown and unexpected events. To this end, we are able to recognize an event or a sequence of events and learn to respond properly. Despite advances in machine learning, current cognitive robotic systems are not able to respond rapidly and efficiently in the real world: the challenge is to learn to recognize both what is important and when to act. Reinforcement Learning (RL) is typically used to solve complex tasks: to learn the how. To respond quickly (to learn the when), the environment has to be sampled often enough. For “enough”, a programmer has to decide on a step size as the representation of time, choosing between a fine-grained representation of time (many state transitions, which are difficult to learn with RL) and a coarse temporal resolution (easier to learn with RL but lacking precise timing). Here, we derive a continuous-time version of on-policy SARSA learning in a working-memory neural network model, AuGMEnT. While a neural working-memory network resolves the what problem, our when solution is built on the notion that in the real world, instantaneous actions of duration dt are actually impossible. We demonstrate how we can decouple action duration from the internal time steps of the neural RL model using an action selection system. The resulting CT-AuGMEnT successfully learns to react to the events of a continuous-time task, without any pre-imposed specifications about the duration of the events or the delays between them.
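    A minimal way to see the "when" machinery is to write the on-policy update with an explicit step length dt and a separate action-hold duration. The sketch below is a tabular stand-in under assumed constants (the paper uses an AuGMEnT-style neural approximator, not a table); it only illustrates how the discount scales with dt and how action duration can be decoupled from the integration step.

```python
import numpy as np

dt = 0.01                        # internal integration step (s)
tau = 1.0                        # discounting time constant (s)
gamma_dt = np.exp(-dt / tau)     # per-step discount for a step of length dt
alpha = 0.1                      # learning rate

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))

def sarsa_update(s, a, r, s_next, a_next):
    """One on-policy TD update over a step of duration dt (r is a reward rate)."""
    delta = r * dt + gamma_dt * Q[s_next, a_next] - Q[s, a]
    Q[s, a] += alpha * delta
    return delta

# The action-selection system holds the chosen action for many internal steps,
# so 'when to act' is not tied to the integration step dt.
hold_steps = int(0.2 / dt)       # e.g. keep each action for 200 ms
delta = sarsa_update(s=0, a=1, r=0.0, s_next=1, a_next=1)
```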

    Eligibility Traces and Plasticity on Behavioral Time Scales: Experimental Support of neoHebbian Three-Factor Learning Rules

    Most elementary behaviors such as moving the arm to grasp an object or walking into the next room to explore a museum evolve on the time scale of seconds; in contrast, neuronal action potentials occur on the time scale of a few milliseconds. Learning rules of the brain must therefore bridge the gap between these two different time scales. Modern theories of synaptic plasticity have postulated that the co-activation of pre- and postsynaptic neurons sets a flag at the synapse, called an eligibility trace, that leads to a weight change only if an additional factor is present while the flag is set. This third factor, signaling reward, punishment, surprise, or novelty, could be implemented by the phasic activity of neuromodulators or specific neuronal inputs signaling special events. While the theoretical framework has been developed over the last decades, experimental evidence in support of eligibility traces on the time scale of seconds has been collected only during the last few years. Here we review, in the context of three-factor rules of synaptic plasticity, four key experiments that support the role of synaptic eligibility traces in combination with a third factor as a biological implementation of neoHebbian three-factor learning rules.
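    The three-factor rule discussed in this review has a compact canonical form: coincident pre- and postsynaptic activity sets a slowly decaying eligibility trace, and the weight actually changes only if a third factor (reward, punishment, surprise, or novelty) arrives while the trace is still nonzero. The sketch below is a generic rate-based instance with illustrative constants, not any specific experiment's model.

```python
# Generic neoHebbian three-factor rule:
#   tau_e de/dt = -e + pre * post          (eligibility trace, seconds-long)
#   dw/dt       = eta * third_factor * e   (weight changes only with the 3rd factor)
dt = 0.01          # 10 ms step
tau_e = 1.0        # eligibility trace decays over ~1 s
eta = 0.5          # learning rate

e, w = 0.0, 0.0
for step in range(300):                     # simulate 3 s
    t = step * dt
    pre = 1.0 if 0.0 <= t < 0.1 else 0.0    # brief pre/post coincidence at t = 0
    post = pre
    third = 1.0 if 1.0 <= t < 1.1 else 0.0  # delayed reward signal at t = 1 s

    e += (dt / tau_e) * (-e + pre * post)   # flag set by the coincidence, then decays
    w += dt * eta * third * e               # consolidated only when the reward arrives
```

    Because the trace decays with a time constant of about a second, a reward arriving one second after the pre/post coincidence still produces a weight change, bridging the gap between millisecond spiking and second-scale behavior.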

    Closing the loop between neural network simulators and the OpenAI Gym

    Following the enormous breakthroughs in machine learning over the last decade, functional neural network models are of growing interest for many researchers in the field of computational neuroscience. One major branch of research is concerned with biologically plausible implementations of reinforcement learning, with a variety of different models developed in recent years. However, most studies in this area are conducted with custom simulation scripts and manually implemented tasks. This makes it hard for other researchers to reproduce and build upon previous work and nearly impossible to compare the performance of different learning architectures. In this work, we present a novel approach to solve this problem, connecting benchmark tools from the field of machine learning and state-of-the-art neural network simulators from computational neuroscience. This toolchain enables researchers in both fields to make use of well-tested high-performance simulation software supporting biologically plausible neuron, synapse and network models and allows them to evaluate and compare their approach on the basis of standardized environments of varying complexity. We demonstrate the functionality of the toolchain by implementing a neuronal actor-critic architecture for reinforcement learning in the NEST simulator and successfully training it on two different environments from the OpenAI Gym.
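    The NEST side of the toolchain cannot be captured in a few lines, but the OpenAI Gym side of the loop is standard and shows where a simulator-backed agent plugs in. The sketch below uses the classic Gym API (reset() returning the observation and step() returning a 4-tuple) and a hypothetical NeuralAgent placeholder standing in for the NEST-simulated actor-critic.

```python
import gym

class NeuralAgent:
    """Hypothetical stand-in for the simulator-backed actor-critic.

    In the toolchain described above, action selection and learning would be
    delegated to a NEST-simulated network instead of this placeholder.
    """
    def act(self, observation):
        return 0  # placeholder policy: always the first action

    def learn(self, observation, action, reward, done):
        pass      # placeholder for the actor-critic update

env = gym.make("CartPole-v1")
agent = NeuralAgent()

for episode in range(5):
    obs = env.reset()                                 # classic Gym API
    done = False
    while not done:
        action = agent.act(obs)
        obs, reward, done, info = env.step(action)    # classic 4-tuple step()
        agent.learn(obs, action, reward, done)

env.close()
```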

    A biologically plausible learning rule for deep learning in the brain

    Researchers have proposed that deep learning, which is providing important progress in a wide range of high complexity tasks, might inspire new insights into learning in the brain. However, the methods used for deep learning by artificial neural networks are biologically unrealistic and would need to be replaced by biologically realistic counterparts. Previous biologically plausible reinforcement learning rules, like AGREL and AuGMEnT, showed promising results but focused on shallow networks with three layers. Will these learning rules also generalize to networks with more layers and can they handle tasks of higher complexity? Here, we demonstrate that these learning schemes indeed generalize to deep networks if we include an attention network that propagates information about the selected action to lower network levels. The resulting learning rule, called Q-AGREL, is equivalent to a particular form of error-backpropagation that trains one output unit at any one time. To demonstrate the utility of the learning scheme for larger problems, we trained networks with two hidden layers on the MNIST dataset, a standard and interesting machine learning task. Our results demonstrate that the capability of Q-AGREL is comparable to that of error-backpropagation, although learning is 1.5-2 times slower because the network has to learn by trial and error and updates the action value of only one output unit at a time. Our results provide new insights into how deep learning can be implemented in the brain.
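    The stated equivalence to error-backpropagation that trains one output unit at a time can be made concrete: on each trial the error vector at the output layer is zero everywhere except at the selected action's unit, where it equals the reward-prediction error. The NumPy sketch below illustrates this for the output-layer weights under assumed dimensions; it is an interpretation of the stated equivalence, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_classes = 8, 10
h = rng.random(n_hidden)                  # activity of the last hidden layer
W = rng.normal(scale=0.1, size=(n_hidden, n_classes))
q = h @ W                                 # action values, one per class

label = 3
action = int(np.argmax(q))                # class selected as the network's action
reward = float(action == label)           # direct reward task: 1 if correct
delta = reward - q[action]                # reward-prediction error

# Backprop-style error vector restricted to the selected output unit.
err = np.zeros(n_classes)
err[action] = delta

# Gradient for the output weights: identical to backprop with this error vector,
# i.e. only the column of the chosen action is updated on this trial.
grad_W = np.outer(h, err)
lr = 0.1
W += lr * grad_W
```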