A Biologically Plausible Learning Rule for Deep Learning in the Brain
Researchers have proposed that deep learning, which is providing important progress in a wide range of high-complexity tasks, might inspire new insights into learning in the brain. However, the methods used for deep learning by artificial neural networks are biologically unrealistic and would need to be replaced by biologically realistic counterparts. Previous biologically plausible reinforcement learning rules, like AGREL and AuGMEnT, showed promising results but focused on shallow networks with three layers. Will these learning rules also generalize to networks with more layers, and can they handle tasks of higher complexity? We demonstrate the learning scheme on classical and hard image-classification benchmarks, namely MNIST, CIFAR10 and CIFAR100, cast as direct reward tasks, for fully connected, convolutional and locally connected architectures. We show that our learning rule, Q-AGREL, performs comparably to supervised learning via error-backpropagation, with this type of trial-and-error reinforcement learning requiring only 1.5-2.5 times more epochs, even when classifying 100 different classes as in CIFAR100. Our results provide new insights into how deep learning may be implemented in the brain.
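As a rough illustration of what casting image classification "as direct reward tasks" means, the sketch below frames a single classification step as a one-shot reinforcement-learning trial: the network picks a class as its action and receives a reward of 1 only when that choice matches the label. This is a minimal sketch; the epsilon-greedy selection, reward values and function names are assumptions for illustration, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def classify_as_rl_trial(q_values, label, epsilon=0.05):
    """One trial of image classification cast as a direct reward task.

    q_values : action values for each class (network output for one image)
    label    : index of the correct class
    Returns the selected action and the scalar reward (1 if correct, else 0).
    """
    n_classes = len(q_values)
    # Epsilon-greedy action selection: mostly pick the highest-valued class,
    # occasionally explore a random one (assumed exploration scheme).
    if rng.random() < epsilon:
        action = int(rng.integers(n_classes))
    else:
        action = int(np.argmax(q_values))
    reward = 1.0 if action == label else 0.0
    return action, reward

# Example: ten classes (as in MNIST/CIFAR10), made-up output values.
q = rng.normal(size=10)
a, r = classify_as_rl_trial(q, label=3)
print(a, r)
```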
The effects of pair-wise and higher order correlations on the firing rate of a post-synaptic neuron
Coincident firing of neurons projecting to a common target cell is likely to raise the probability of firing of this post-synaptic cell. Therefore synchronized firing constitutes a significant event for post-synaptic neurons and is likely to play a role in neuronal information processing. Physiological data on synchronized firing in cortical networks are primarily based on paired recordings and cross-correlation analysis. However, pair-wise correlations among all inputs onto a post-synaptic neuron do not uniquely determine the distribution of simultaneous post-synaptic events. We develop a framework to calculate the amount of synchronous firing that, based on maximum entropy, should exist in a homogeneous neural network in which the neurons have known pair-wise correlations and higher order structure is absent. According to the distribution of maximal entropy, synchronous events in which a large proportion of the neurons participates should exist, even in the case of weak pair-wise correlations. Network simulations also exhibit these highly synchronous events in the case of weak pair-wise correlations. If such a group of neurons provides input to a common post-synaptic target, these network bursts may enhance the impact of this input, especially in the case of a high post-synaptic threshold. Unfortunately, the proportion of neurons participating in synchronous bursts can be approximated by our method only under restricted conditions. When these conditions are not fulfilled, the spike trains have less than maximal entropy, which is indicative of the presence of higher order structure. In this situation, the degree of synchronicity cannot be derived from the pair-wise correlations.
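To illustrate the phenomenon described here, the following sketch simulates a homogeneous population with weak pairwise correlations using a dichotomized-Gaussian construction and compares the frequency of highly synchronous bins against an independent population. This is an illustrative simulation under assumed parameters; the paper's result is derived analytically from the maximum-entropy distribution, which this code does not compute.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

def correlated_spike_counts(n_neurons=100, n_bins=20000, rate=0.1, rho=0.05):
    """Simulate a homogeneous population with weak pairwise correlations.

    Each neuron spikes in a bin when a latent Gaussian variable with pairwise
    correlation `rho` exceeds a threshold chosen so that the marginal spike
    probability per bin equals `rate` (dichotomized-Gaussian assumption).
    Returns the number of neurons firing in each bin.
    """
    threshold = norm.ppf(1.0 - rate)
    common = rng.standard_normal(n_bins)                 # shared input
    private = rng.standard_normal((n_neurons, n_bins))   # independent input
    latent = np.sqrt(rho) * common + np.sqrt(1.0 - rho) * private
    spikes = latent > threshold
    return spikes.sum(axis=0)

counts = correlated_spike_counts()
indep = rng.binomial(100, 0.1, size=20000)  # independent (rho = 0) reference
print("P(>=25 coincident spikes), weakly correlated:", np.mean(counts >= 25))
print("P(>=25 coincident spikes), independent:      ", np.mean(indep >= 25))
```

Even with a latent correlation of only 0.05, bins in which a quarter of the population fires together occur orders of magnitude more often than in the independent reference, which is the kind of network burst the abstract refers to.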
A biologically plausible learning rule for deep learning in the brain
Researchers have proposed that deep learning, which is providing important progress in a wide range of high-complexity tasks, might inspire new insights into learning in the brain. However, the methods used for deep learning by artificial neural networks are biologically unrealistic and would need to be replaced by biologically realistic counterparts. Previous biologically plausible reinforcement learning rules, like AGREL and AuGMEnT, showed promising results but focused on shallow networks with three layers. Will these learning rules also generalize to networks with more layers, and can they handle tasks of higher complexity? Here, we demonstrate that these learning schemes indeed generalize to deep networks, if we include an attention network that propagates information about the selected action to lower network levels. The resulting learning rule, called Q-AGREL, is equivalent to a particular form of error-backpropagation that trains one output unit at any one time. To demonstrate the utility of the learning scheme for larger problems, we trained networks with two hidden layers on the MNIST dataset, a standard and interesting Machine Learning task. Our results demonstrate that the capability of Q-AGREL is comparable to that of error backpropagation, although learning is 1.5-2 times slower because the network has to learn by trial-and-error and updates the action value of only one output unit at a time. Our results provide new insights into how deep learning can be implemented in the brain.
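The sketch below illustrates the key property stated in this abstract: only the value of the selected output unit is updated, and its error is propagated to the lower layer through feedback weights that act as an attentional gate, which makes the update equivalent to error-backpropagation restricted to a single output unit. This is a minimal one-hidden-layer sketch with assumed dimensions, learning rate and epsilon-greedy exploration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

def relu(x):
    return np.maximum(x, 0.0)

def q_agrel_style_step(x, label, W1, W2, lr=0.01, epsilon=0.05):
    """One trial of a Q-AGREL-style update (illustrative, assumed form)."""
    h = relu(W1 @ x)                        # hidden layer activity
    q = W2 @ h                              # action values, one per class

    # Trial-and-error action selection (epsilon-greedy, assumed).
    if rng.random() < epsilon:
        a = int(rng.integers(len(q)))
    else:
        a = int(np.argmax(q))

    reward = 1.0 if a == label else 0.0
    delta = reward - q[a]                   # reward prediction error for the
                                            # single selected action

    feedback = W2[a].copy()                 # feedback ("attention") weights of
                                            # the selected output unit only
    # Output layer: update only the row belonging to the selected action.
    W2[a] += lr * delta * h

    # Hidden layer: the feedback signal gates which synapses are modified
    # (the ReLU derivative acts as an additional gate).
    gate = (W1 @ x > 0).astype(float)
    W1 += lr * delta * np.outer(feedback * gate, x)
    return reward

# Example usage with random data (illustration only).
W1 = rng.normal(scale=0.1, size=(50, 784))
W2 = rng.normal(scale=0.1, size=(10, 50))
x = rng.random(784)
q_agrel_style_step(x, label=3, W1=W1, W2=W2)
```

Because the update uses the gradient of the squared prediction error of one output unit only, it coincides with ordinary backpropagation applied to that single unit, which is the equivalence the abstract states.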
Continuous-time on-policy neural reinforcement learning of working memory tasks
As living organisms, one of our primary characteristics is the ability to rapidly process and react to unknown and unexpected events. To this end, we are able to recognize an event or a sequence of events and learn to respond properly. Despite advances in machine learning, current cognitive robotic systems are not able to rapidly and efficiently respond in the real world: the challenge is to learn to recognize both what is important, and also when to act. Reinforcement Learning (RL) is typically used to solve complex tasks: to learn the how. To respond quickly - to learn when - the environment has to be sampled often enough. To define “enough”, a programmer has to decide on a step size as the representation of time, choosing between a fine-grained representation of time (many state transitions; difficult to learn with RL) and a coarse temporal resolution (easier to learn with RL but lacking precise timing). Here, we derive a continuous-time version of on-policy SARSA-learning in a working-memory neural network model, AuGMEnT. While the neural working-memory network resolves the what problem, our when solution is built on the notion that, in the real world, instantaneous actions of duration dt are actually impossible. We demonstrate how we can decouple action duration from the internal time-steps in the neural RL model using an action selection system. The resultant CT-AuGMEnT successfully learns to react to the events of a continuous-time task, without any pre-imposed specifications about the duration of the events or the delays between them.
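To make the step-size issue concrete, the sketch below shows on-policy SARSA with variable action durations, where the discount applied to the next state-action value depends on the real elapsed time dt rather than on a fixed step count. This is a generic semi-Markov formulation under assumed parameter names and reward-rate integration, not the CT-AuGMEnT equations themselves.

```python
import math

def sarsa_update(Q, s, a, reward_rate, dt, s_next, a_next,
                 alpha=0.1, gamma_per_sec=0.9):
    """Update Q[(s, a)] after an action that lasted dt seconds.

    reward_rate   : reward received per second during the action
    gamma_per_sec : discount factor per second of real time
    """
    gamma_dt = gamma_per_sec ** dt          # discount over the elapsed time
    # Discounted reward accumulated over the interval of length dt,
    # assuming a constant reward rate while the action is executed.
    if gamma_per_sec == 1.0:
        r = reward_rate * dt
    else:
        r = reward_rate * (1.0 - gamma_dt) / (-math.log(gamma_per_sec))
    td_error = r + gamma_dt * Q[(s_next, a_next)] - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return td_error

# Example: a short "hold" action followed by a "respond" action; the same
# update rule handles any action duration consistently.
Q = {("wait", "hold"): 0.0, ("cue", "respond"): 0.5}
sarsa_update(Q, "wait", "hold", reward_rate=0.0, dt=0.25,
             s_next="cue", a_next="respond")
```

Because the discount and the accumulated reward are functions of the actual duration dt, the agent's values no longer depend on how finely the programmer chops time into internal steps, which is the decoupling the abstract describes.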
- …