23,709 research outputs found
Learning with Delayed Synaptic Plasticity
The plasticity property of biological neural networks allows them to perform
learning and optimize their behavior by changing their configuration. Inspired
by biology, plasticity can be modeled in artificial neural networks by using
Hebbian learning rules, i.e. rules that update synapses based on the neuron
activations and reinforcement signals. However, the distal reward problem
arises when reinforcement signals are not available immediately after each
network output, making it difficult to associate a reward with the neuron
activations that contributed to it. In this work, we extend Hebbian plasticity
rules to allow learning in distal reward cases. We propose the use of neuron
activation traces (NATs) to provide additional data storage in each synapse to
keep track of the activation of the neurons. Delayed reinforcement signals are
provided after each episode, reflecting the network's performance during the
previous episode. We employ genetic algorithms to evolve delayed synaptic
plasticity (DSP) rules and perform synaptic updates based on NATs and delayed
reinforcement signals. We compare DSP with an analogous hill climbing (HC)
algorithm that does not incorporate the domain knowledge introduced by the
NATs, and show that the synaptic updates performed by the DSP rules yield more
effective training than the HC algorithm. Comment: GECCO201
Short-term plasticity as cause-effect hypothesis testing in distal reward learning
Asynchrony, overlaps and delays in sensory-motor signals introduce ambiguity
as to which stimuli, actions, and rewards are causally related. Only the
repetition of reward episodes helps distinguish true cause-effect relationships
from coincidental occurrences. In the model proposed here, a novel plasticity
rule employs short and long-term changes to evaluate hypotheses on cause-effect
relationships. Transient weights represent hypotheses that are consolidated in
long-term memory only when they consistently predict or cause future rewards.
The main objective of the model is to preserve existing network topologies when
learning with ambiguous information flows. Learning is also improved by biasing
the exploration of the stimulus-response space towards actions that in the past
occurred before rewards. The model indicates under which conditions beliefs can
be consolidated in long-term memory, it suggests a solution to the
plasticity-stability dilemma, and proposes an interpretation of the role of
short-term plasticity. Comment: Biological Cybernetics, September 201
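The hypothesis-testing idea can be caricatured in a few lines (a toy sketch with invented constants, not the model's actual equations): a transient weight grows when a synapse's activity repeatedly precedes reward and decays otherwise, and only a consistently confirmed hypothesis leaks into the long-term weight:

```python
# Toy sketch of hypothesis testing via short- and long-term plasticity.
# All constants are invented for illustration.

def hypothesis_step(w_long, w_short, active_before_reward, rewarded,
                    decay=0.9, boost=0.1, consolidate=0.05, threshold=0.5):
    w_short *= decay                      # transient hypotheses fade
    if active_before_reward and rewarded:
        w_short += boost                  # evidence this synapse predicts reward
    if w_short > threshold:               # consistent predictor:
        w_long += consolidate * w_short   # consolidate into long-term memory
    return w_long, w_short
```

Repeated reward episodes push the transient weight past the threshold and consolidate it; coincidental activations decay before reaching it, leaving the long-term weights, and hence the existing network topology, untouched.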
Spatio-Temporal Credit Assignment in Neuronal Population Learning
In learning from trial and error, animals need to relate behavioral decisions to environmental reinforcement even though it may be difficult to assign credit to a particular decision when outcomes are uncertain or subject to delays. When considering the biophysical basis of learning, the credit-assignment problem is compounded because the behavioral decisions themselves result from the spatio-temporal aggregation of many synaptic releases. We present a model of plasticity induction for reinforcement learning in a population of leaky integrate-and-fire neurons which is based on a cascade of synaptic memory traces. Each synaptic cascade correlates presynaptic input first with postsynaptic events, next with the behavioral decisions and finally with external reinforcement. For operant conditioning, learning succeeds even when reinforcement is delivered with a delay so large that temporal contiguity between decision and pertinent reward is lost due to intervening decisions which are themselves subject to delayed reinforcement. This shows that the model provides a viable mechanism for temporal credit assignment. Further, learning speeds up with increasing population size, so the plasticity cascade simultaneously addresses the spatial problem of assigning credit to synapses in different population neurons. Simulations on other tasks, such as sequential decision making, serve to contrast the performance of the proposed scheme with that of temporal-difference-based learning. We argue that, due to their comparative robustness, synaptic plasticity cascades are attractive basic models of reinforcement learning in the brain.
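The cascade of memory traces can be sketched schematically (leaky-integrator stages with invented time constants; the paper's spiking population model is considerably more detailed): each stage low-pass filters the previous one and is gated by the next, slower signal, so credit flows from pre/post coincidence to decision to reinforcement:

```python
# Schematic three-stage synaptic trace cascade (illustrative constants only).
# Stage 1 tracks pre/post coincidences, stage 2 gates them by the behavioral
# decision, stage 3 gates the result by the (possibly delayed) reinforcement.

def cascade_step(traces, pre, post, decision, reward,
                 taus=(0.05, 0.5, 2.0), dt=0.001, lr=0.01):
    e_fast, e_mid, e_slow = traces
    e_fast += -dt * e_fast / taus[0] + pre * post         # pre/post coincidence
    e_mid  += -dt * e_mid  / taus[1] + e_fast * decision  # decision gating
    e_slow += -dt * e_slow / taus[2] + e_mid * reward     # reinforcement gating
    dw = lr * e_slow                                      # weight change
    return (e_fast, e_mid, e_slow), dw
```

Because each stage outlives the one before it, a reward arriving several decisions later can still find a nonzero slow trace to convert into a weight change.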
HDAC2 expression in parvalbumin interneurons regulates synaptic plasticity in the mouse visual cortex
An experience-dependent postnatal increase in GABAergic inhibition in the visual cortex is important for the closure of a critical period of enhanced synaptic plasticity. Although maturation of the subclass of parvalbumin (Pv)-expressing GABAergic interneurons is known to contribute to critical period closure, the role of epigenetics in cortical inhibition and synaptic plasticity has not been explored. The transcription regulator, histone deacetylase 2 (HDAC2), has been shown to modulate synaptic plasticity and learning processes in hippocampal excitatory neurons. We found that genetic deletion of HDAC2 specifically from Pv interneurons reduces inhibitory input in the visual cortex of adult mice and coincides with enhanced long-term depression that is more typical of young mice. These findings show that HDAC2 loss in Pv interneurons leads to a delayed closure of the critical period in the visual cortex and support the hypothesis that HDAC2 is a key negative regulator of synaptic plasticity in the adult brain. National Institute of Neurological Diseases and Stroke (U.S.) (Grant NS078839); National Institute on Aging (Grant NS078839)
Eligibility Traces and Plasticity on Behavioral Time Scales: Experimental Support of neoHebbian Three-Factor Learning Rules
Most elementary behaviors such as moving the arm to grasp an object or
walking into the next room to explore a museum evolve on the time scale of
seconds; in contrast, neuronal action potentials occur on the time scale of a
few milliseconds. Learning rules of the brain must therefore bridge the gap
between these two different time scales.
Modern theories of synaptic plasticity have postulated that the co-activation
of pre- and postsynaptic neurons sets a flag at the synapse, called an
eligibility trace, that leads to a weight change only if an additional factor
is present while the flag is set. This third factor, signaling reward,
punishment, surprise, or novelty, could be implemented by the phasic activity
of neuromodulators or specific neuronal inputs signaling special events. While
the theoretical framework has been developed over the last decades,
experimental evidence in support of eligibility traces on the time scale of
seconds has been collected only during the last few years.
Here we review, in the context of three-factor rules of synaptic plasticity,
four key experiments that support the role of synaptic eligibility traces in
combination with a third factor as a biological implementation of neoHebbian
three-factor learning rules.
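In pseudocode form, a three-factor rule of this kind might read as follows (a minimal sketch with assumed leaky-trace dynamics and illustrative constants, not any specific experiment's model):

```python
# Sketch of a neoHebbian three-factor update: pre/post co-activation sets a
# decaying eligibility trace (the "flag"); the weight changes only if the
# third factor (e.g. phasic neuromodulation signaling reward, punishment,
# surprise, or novelty) arrives while the trace is still set.

def three_factor_step(w, e, pre, post, third_factor,
                      lr=0.05, tau_e=2.0, dt=0.001):
    e += dt * (-e / tau_e) + pre * post   # eligibility trace, ~seconds scale
    w += lr * third_factor * e            # weight change gated by third factor
    return w, e
```

The time constant of the trace is what bridges the gap between millisecond spiking and behavior on the scale of seconds: co-activation alone changes nothing, and a late third factor still finds a decayed but nonzero flag.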
Forward Table-Based Presynaptic Event-Triggered Spike-Timing-Dependent Plasticity
Spike-timing-dependent plasticity (STDP) incurs both causal and acausal
synaptic weight updates, for negative and positive time differences between
pre-synaptic and post-synaptic spike events. For realizing such updates in
neuromorphic hardware, current implementations either require forward and
reverse lookup access to the synaptic connectivity table, or rely on
memory-intensive architectures such as crossbar arrays. We present a novel
method for realizing both causal and acausal weight updates using only forward
lookup access of the synaptic connectivity table, permitting memory-efficient
implementation. A simplified implementation in FPGA, using a single timer
variable for each neuron, closely approximates exact STDP cumulative weight
updates for neuron refractory periods greater than 10 ms, and reduces to exact
STDP for refractory periods greater than the STDP time window. Compared to
conventional crossbar implementation, the forward table-based implementation
leads to substantial memory savings for sparsely connected networks supporting
scalable neuromorphic systems with fully reconfigurable synaptic connectivity
and plasticity. Comment: Submitted to BioCAS 201
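For reference, the pairwise STDP window such hardware schemes approximate is commonly taken to be exponential. A sketch (amplitudes, time constant, and the sign convention dt = t_post - t_pre are assumptions here, not the paper's FPGA parameters):

```python
import math

# Pairwise STDP with an assumed exponential window. With dt = t_post - t_pre,
# dt > 0 is the causal case (pre before post) and dt < 0 the acausal case
# (post before pre). Constants are illustrative.

def stdp_dw(dt, a_plus=0.01, a_minus=0.012, tau=0.020):
    if dt > 0:
        return a_plus * math.exp(-dt / tau)    # causal: potentiate
    if dt < 0:
        return -a_minus * math.exp(dt / tau)   # acausal: depress
    return 0.0
```

A crossbar stores this update for every pre/post pair; the paper's contribution is realizing both branches with only forward lookups into the connectivity table.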
Reinforcement learning in populations of spiking neurons
Population coding is widely regarded as a key mechanism for achieving reliable behavioral responses in the face of neuronal variability. But in standard reinforcement learning a flip-side becomes apparent. Learning slows down with increasing population size since the global reinforcement becomes less and less related to the performance of any single neuron. We show that, in contrast, learning speeds up with increasing population size if feedback about the population response modulates synaptic plasticity in addition to global reinforcement. The two feedback signals (reinforcement and population-response signal) can be encoded by ambient neurotransmitter concentrations which vary slowly, yielding a fully online plasticity rule where the learning of a stimulus is interleaved with the processing of the subsequent one. The assumption of a single additional feedback mechanism therefore reconciles biological plausibility with efficient learning.
A Neural Network Approach for Analyzing the Illusion of Movement in Static Images
The purpose of this work is to analyze the illusion of movement that appears when seeing certain static images. This analysis is accomplished by using a biologically plausible neural network that learned (in an unsupervised manner) to identify the movement direction of shifting training patterns. Some of the biological features that characterize this neural network are: intrinsic plasticity to adapt firing probability, metaplasticity to regulate synaptic weights, and firing adaptation of simulated pyramidal networks. After analyzing the results, we hypothesize that the illusion is due to cinematographic perception mechanisms in the brain, by which each visual frame is renewed approximately every 100 ms. Blurring of a moving object in visual frames might be interpreted by the brain as movement, just as if a static blurred object were presented.