Kernelized information bottleneck leads to biologically plausible 3-factor Hebbian learning in deep networks
The state-of-the-art machine learning approach to training deep neural networks, backpropagation, is implausible for real neural networks: neurons need to know their outgoing weights; training alternates between a bottom-up forward pass (computation) and a top-down backward pass (learning); and the algorithm often needs precise labels of many data points. Biologically plausible approximations to backpropagation, such as feedback alignment, solve the weight transport problem, but not the other two. Thus, fully biologically plausible learning rules have so far remained elusive. Here we present a family of learning rules that does not suffer from any of these problems. It is motivated by the information bottleneck principle (extended with kernel methods), in which networks learn to compress the input as much as possible without sacrificing prediction of the output. The resulting rules have a 3-factor Hebbian structure: they require pre- and post-synaptic firing rates and an error signal - the third factor - consisting of a global teaching signal and a layer-specific term, both available without a top-down pass. They do not require precise labels; instead, they rely on the similarity between pairs of desired outputs. Moreover, to obtain good performance on hard problems and retain biological plausibility, our rules need divisive normalization - a known feature of biological networks. Finally, simulations show that our rules perform nearly as well as backpropagation on image classification tasks.
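As a rough illustration of the 3-factor Hebbian structure described in this abstract, the sketch below applies a generic update in which each weight change is the product of a pre-synaptic rate, a post-synaptic rate, and a scalar third factor. It is not the kernelized information bottleneck rule from the paper: the function name `three_factor_update`, the scalar form of the third factor, and the learning rate are all assumptions made for illustration.

```python
def three_factor_update(weights, pre, post, third_factor, lr=0.01):
    """Return new weights after one generic three-factor Hebbian step.

    weights[i][j] connects pre-synaptic unit j to post-synaptic unit i.
    Assumed update: dW[i][j] = lr * third_factor * post[i] * pre[j].
    The scalar third_factor stands in for the error signal; how the paper
    actually computes it is not reproduced here.
    """
    return [
        [w_ij + lr * third_factor * post_i * pre_j for w_ij, pre_j in zip(row, pre)]
        for row, post_i in zip(weights, post)
    ]


# Hypothetical firing rates for two inputs and two outputs.
weights = [[0.0, 0.0], [0.0, 0.0]]
pre = [1.0, 0.5]
post = [2.0, 0.0]

# A negative third factor weakens synapses between co-active units.
updated = three_factor_update(weights, pre, post, third_factor=-1.0, lr=0.1)
```

Only quantities local to the synapse (`pre`, `post`) and one broadcast value (`third_factor`) enter the update, which is the property the abstract highlights as avoiding a top-down pass.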
Rapid Bayesian learning in the mammalian olfactory system
Many experimental studies suggest that animals can rapidly learn to identify odors and predict the rewards associated with them. However, the underlying plasticity mechanism remains elusive. In particular, it is not clear how olfactory circuits achieve rapid, data-efficient learning with local synaptic plasticity. Here, we formulate olfactory learning as a Bayesian optimization process, then map the learning rules into a computational model of the mammalian olfactory circuit. The model is capable of odor identification from a small number of observations, while reproducing cellular plasticity commonly observed during development. We extend the framework to reward-based learning, and show that the circuit is able to rapidly learn odor-reward association with a plausible neural architecture. These results deepen our theoretical understanding of unsupervised learning in the mammalian brain.
Post-decisional accounts of biases in confidence
Most models of decision-making suggest that confidence, the 'feeling of knowing' that accompanies our choices, is constructed as the decision unfolds. However, more recent studies have noted that processes occurring after we commit to a particular choice also affect this subjective belief. This leads to the following question: when are we better judges of ourselves? If, after a decision, evidence continues to accumulate in an unbiased manner, then our confidence judgements should improve. Conversely, if post-decisional information processing is biased, our sense of confidence could be distorted, and so our confidence judgements should degrade with time. We briefly discuss recently proposed models of post-decisional evidence accumulation, and explore whether, and how, biases in confidence could arise.
Noisy Synaptic Conductance: Bug or a Feature?
More often than not, action potentials fail to trigger neurotransmitter release. And even when neurotransmitter is released, the resulting change in synaptic conductance is highly variable. Given the energetic cost of generating and propagating action potentials, and the importance of information transmission across synapses, this seems both wasteful and inefficient. However, synaptic noise arising from variable transmission can improve, in certain restricted conditions, information transmission. Under broader conditions, it can improve information transmission per release, a quantity that is relevant given the energetic constraints on computing in the brain. Here we discuss the role, both positive and negative, that synaptic noise plays in information transmission and computation in the brain.
Looking back on the first year of Neural Systems & Circuits
Molecular and Cellular Biology
Towards Biologically Plausible Convolutional Networks
Convolutional networks are ubiquitous in deep learning. They are particularly useful for images, as they reduce the number of parameters, reduce training time, and increase accuracy. However, as a model of the brain they are seriously problematic, since they require weight sharing - something real neurons simply cannot do. Consequently, while neurons in the brain can be locally connected (one of the features of convolutional networks), they cannot be convolutional. Locally connected but non-convolutional networks, however, significantly underperform convolutional ones. This is troublesome for studies that use convolutional networks to explain activity in the visual system. Here we study plausible alternatives to weight sharing that aim at the same regularization principle, which is to make each neuron within a pool react similarly to identical inputs. The most natural way to do that is by showing the network multiple translations of the same image, akin to saccades in animal vision. However, this approach requires many translations, and doesn't remove the performance gap. We propose instead to add lateral connectivity to a locally connected network, and allow learning via Hebbian plasticity. This requires the network to pause occasionally for a sleep-like phase of "weight sharing". This method enables locally connected networks to achieve nearly convolutional performance on ImageNet and improves their fit to the ventral stream data, thus supporting convolutional networks as a model of the visual stream.
Powerpropagation: A sparsity inducing weight reparameterisation
The training of sparse neural networks is becoming an increasingly important tool for reducing the computational footprint of models at training and evaluation, as well as enabling the effective scaling up of models. Whereas much work over the years has been dedicated to specialised pruning techniques, little attention has been paid to the inherent effect of gradient-based training on model sparsity. In this work, we introduce Powerpropagation, a new weight-parameterisation for neural networks that leads to inherently sparse models. Exploiting the behaviour of gradient descent, our method gives rise to weight updates exhibiting a “rich get richer” dynamic, leaving low-magnitude parameters largely unaffected by learning. Models trained in this manner exhibit similar performance, but have a distribution with markedly higher density at zero, allowing more parameters to be pruned safely. Powerpropagation is general, intuitive, cheap and straightforward to implement and can readily be combined with various other techniques. To highlight its versatility, we explore it in two very different settings: Firstly, following a recent line of work, we investigate its effect on sparse training for resource-constrained settings. Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark. Secondly, we advocate the use of sparsity in overcoming catastrophic forgetting, where compressed representations allow accommodating a large number of tasks at fixed model capacity. In all cases our reparameterisation considerably increases the efficacy of the off-the-shelf methods.
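The “rich get richer” dynamic can be seen in a small numerical sketch. The abstract does not state the reparameterisation itself; the code below assumes the form w = θ·|θ|^(α−1) with α ≥ 1, so that α = 1 recovers ordinary training. The function names, the value of α, the learning rate, and the toy gradient are all hypothetical choices for illustration.

```python
def effective_weight(theta, alpha):
    """Assumed reparameterisation: w = theta * |theta|**(alpha - 1)."""
    return theta * abs(theta) ** (alpha - 1)


def grad_wrt_theta(theta, grad_w, alpha):
    """Chain rule: dL/dtheta = dL/dw * alpha * |theta|**(alpha - 1)."""
    return grad_w * alpha * abs(theta) ** (alpha - 1)


def sgd_step(theta, grad_w, alpha, lr):
    """One plain gradient-descent step on the underlying parameter theta."""
    return theta - lr * grad_wrt_theta(theta, grad_w, alpha)


# Two parameters receive the same gradient with respect to their weights.
small, large = 0.1, 1.0
grad_w, lr = 1.0, 0.1

# alpha = 1: both parameters move by the same amount.
plain_small = sgd_step(small, grad_w, alpha=1.0, lr=lr)
plain_large = sgd_step(large, grad_w, alpha=1.0, lr=lr)

# alpha = 2: the step scales with |theta|, so the small parameter barely moves.
power_small = sgd_step(small, grad_w, alpha=2.0, lr=lr)
power_large = sgd_step(large, grad_w, alpha=2.0, lr=lr)
```

Under this assumed form, the update to each parameter is scaled by its own magnitude, so parameters near zero stay near zero, which is consistent with the abstract's description of a weight distribution concentrated at zero.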
Shared Information -- New Insights and Problems in Decomposing Information in Complex Systems
How can the information that a set of random variables contains about another random variable be decomposed? To what extent do different subgroups provide the same, i.e. shared or redundant, information, carry unique information or interact for the emergence of synergistic information? Recently Williams and Beer proposed such a decomposition based on natural properties for shared information. While these properties fix the structure of the decomposition, they do not uniquely specify the values of the different terms. Therefore, we investigate additional properties such as strong symmetry and left monotonicity. We find that strong symmetry is incompatible with the properties proposed by Williams and Beer. Although left monotonicity is a very natural property for an information measure it is not fulfilled by any of the proposed measures. We also study a geometric framework for information decompositions and ask whether it is possible to represent shared information by a family of posterior distributions. Finally, we draw connections to the notions of shared knowledge and common knowledge in game theory. While many people believe that independent variables cannot share information, we show that in game theory independent agents can have shared knowledge, but not common knowledge. We conclude that intuition and heuristic arguments do not suffice when arguing about information. Comment: 20 pages
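To make the notion of synergistic information concrete, the sketch below computes mutual information for the standard XOR example, which is not taken from the abstract: two independent fair bits X1 and X2 and a target Y = X1 XOR X2. It does not implement the Williams and Beer decomposition or any of the measures the paper studies; the helper `mutual_information` and the choice of distribution are assumptions for illustration only.

```python
from collections import defaultdict
from itertools import product
from math import log2


def mutual_information(joint, a_idx, b_idx):
    """Return I(A;B) in bits.

    joint maps outcome tuples to probabilities; a_idx and b_idx select
    which coordinates of each outcome form the variables A and B.
    """
    p_a, p_b, p_ab = defaultdict(float), defaultdict(float), defaultdict(float)
    for outcome, p in joint.items():
        a = tuple(outcome[i] for i in a_idx)
        b = tuple(outcome[i] for i in b_idx)
        p_a[a] += p
        p_b[b] += p
        p_ab[(a, b)] += p
    return sum(
        p * log2(p / (p_a[a] * p_b[b])) for (a, b), p in p_ab.items() if p > 0
    )


# Outcomes are (x1, x2, y) with independent fair bits and y = x1 XOR x2.
joint = {(x1, x2, x1 ^ x2): 0.25 for x1, x2 in product((0, 1), repeat=2)}

i_x1_y = mutual_information(joint, (0,), (2,))     # X1 alone
i_x2_y = mutual_information(joint, (1,), (2,))     # X2 alone
i_x12_y = mutual_information(joint, (0, 1), (2,))  # X1 and X2 jointly
```

Each input alone carries no information about Y, while the pair determines Y completely (one bit). Any decomposition into shared, unique, and synergistic terms has to account for cases like this one, where all of the information appears only jointly.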
How well do mean field theories of spiking quadratic-integrate-and-fire networks work in realistic parameter regimes?
We use mean field techniques to compute the distribution of excitatory and inhibitory firing rates in large networks of randomly connected spiking quadratic-integrate-and-fire neurons. These techniques are based on the assumption that activity is asynchronous and Poisson. For most parameter settings these assumptions are strongly violated; nevertheless, so long as the networks are not too synchronous, we find good agreement between mean field prediction and network simulations. Thus, much of the intuition developed for randomly connected networks in the asynchronous regime applies to mildly synchronous networks.