674 research outputs found

    Kernelized information bottleneck leads to biologically plausible 3-factor Hebbian learning in deep networks

    The state-of-the-art machine learning approach to training deep neural networks, backpropagation, is implausible for real neural networks: neurons need to know their outgoing weights; training alternates between a bottom-up forward pass (computation) and a top-down backward pass (learning); and the algorithm often needs precise labels of many data points. Biologically plausible approximations to backpropagation, such as feedback alignment, solve the weight transport problem, but not the other two. Thus, fully biologically plausible learning rules have so far remained elusive. Here we present a family of learning rules that does not suffer from any of these problems. It is motivated by the information bottleneck principle (extended with kernel methods), in which networks learn to compress the input as much as possible without sacrificing prediction of the output. The resulting rules have a 3-factor Hebbian structure: they require pre- and post-synaptic firing rates and an error signal - the third factor - consisting of a global teaching signal and a layer-specific term, both available without a top-down pass. They do not require precise labels; instead, they rely on the similarity between pairs of desired outputs. Moreover, to obtain good performance on hard problems and retain biological plausibility, our rules need divisive normalization - a known feature of biological networks. Finally, simulations show that our rules perform nearly as well as backpropagation on image classification tasks.

    Rapid Bayesian learning in the mammalian olfactory system

    Many experimental studies suggest that animals can rapidly learn to identify odors and predict the rewards associated with them. However, the underlying plasticity mechanism remains elusive. In particular, it is not clear how olfactory circuits achieve rapid, data-efficient learning with local synaptic plasticity. Here, we formulate olfactory learning as a Bayesian optimization process, then map the learning rules into a computational model of the mammalian olfactory circuit. The model is capable of odor identification from a small number of observations, while reproducing cellular plasticity commonly observed during development. We extend the framework to reward-based learning, and show that the circuit is able to rapidly learn odor-reward association with a plausible neural architecture. These results deepen our theoretical understanding of unsupervised learning in the mammalian brain.

    Post-decisional accounts of biases in confidence

    Most models of decision-making suggest that confidence, the 'feeling of knowing' that accompanies our choices, is constructed as the decision unfolds. However, more recent studies have noted that processes occurring after we commit to a particular choice also affect this subjective belief. This leads to the following question: when are we better judges of ourselves? If, after a decision, evidence continues to accumulate in an unbiased manner, then our confidence judgements should improve. Conversely, if post-decisional information processing is biased, our sense of confidence could be distorted, and so our confidence judgements should degrade with time. We briefly discuss recently proposed models of post-decisional evidence accumulation, and explore whether, and how, biases in confidence could arise.

    Noisy Synaptic Conductance: Bug or a Feature?

    More often than not, action potentials fail to trigger neurotransmitter release. And even when neurotransmitter is released, the resulting change in synaptic conductance is highly variable. Given the energetic cost of generating and propagating action potentials, and the importance of information transmission across synapses, this seems both wasteful and inefficient. However, synaptic noise arising from variable transmission can improve, in certain restricted conditions, information transmission. Under broader conditions, it can improve information transmission per release, a quantity that is relevant given the energetic constraints on computing in the brain. Here we discuss the role, both positive and negative, that synaptic noise plays in information transmission and computation in the brain.

    Towards Biologically Plausible Convolutional Networks

    Convolutional networks are ubiquitous in deep learning. They are particularly useful for images, as they reduce the number of parameters, reduce training time, and increase accuracy. However, as a model of the brain they are seriously problematic, since they require weight sharing - something real neurons simply cannot do. Consequently, while neurons in the brain can be locally connected (one of the features of convolutional networks), they cannot be convolutional. Locally connected but non-convolutional networks, however, significantly underperform convolutional ones. This is troublesome for studies that use convolutional networks to explain activity in the visual system. Here we study plausible alternatives to weight sharing that aim at the same regularization principle, which is to make each neuron within a pool react similarly to identical inputs. The most natural way to do that is by showing the network multiple translations of the same image, akin to saccades in animal vision. However, this approach requires many translations, and doesn't remove the performance gap. We propose instead to add lateral connectivity to a locally connected network, and allow learning via Hebbian plasticity. This requires the network to pause occasionally for a sleep-like phase of "weight sharing". This method enables locally connected networks to achieve nearly convolutional performance on ImageNet and improves their fit to the ventral stream data, thus supporting convolutional networks as a model of the visual stream.

    Powerpropagation: A sparsity inducing weight reparameterisation

    The training of sparse neural networks is becoming an increasingly important tool for reducing the computational footprint of models at training and evaluation, as well as enabling the effective scaling up of models. Whereas much work over the years has been dedicated to specialised pruning techniques, little attention has been paid to the inherent effect of gradient-based training on model sparsity. In this work, we introduce Powerpropagation, a new weight-parameterisation for neural networks that leads to inherently sparse models. Exploiting the behaviour of gradient descent, our method gives rise to weight updates exhibiting a “rich get richer” dynamic, leaving low-magnitude parameters largely unaffected by learning. Models trained in this manner exhibit similar performance, but have a distribution with markedly higher density at zero, allowing more parameters to be pruned safely. Powerpropagation is general, intuitive, cheap and straightforward to implement and can readily be combined with various other techniques. To highlight its versatility, we explore it in two very different settings: Firstly, following a recent line of work, we investigate its effect on sparse training for resource-constrained settings. Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark. Secondly, we advocate the use of sparsity in overcoming catastrophic forgetting, where compressed representations allow accommodating a large number of tasks at fixed model capacity. In all cases our reparameterisation considerably increases the efficacy of the off-the-shelf methods.
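The "rich get richer" dynamic described in this abstract can be illustrated with a small sketch. The abstract does not state the reparameterisation itself, so the power-law form used here, `w = theta * |theta|**(alpha - 1)`, is an assumption made for illustration, and the names `effective_weights`, `grad_wrt_theta`, `theta` and `alpha` are hypothetical.

```python
import numpy as np

def effective_weights(theta, alpha=2.0):
    """Assumed reparameterisation: w = theta * |theta|**(alpha - 1) (sign-preserving power)."""
    return theta * np.abs(theta) ** (alpha - 1.0)

def grad_wrt_theta(theta, grad_w, alpha=2.0):
    """Chain rule for the assumed form: dL/dtheta = dL/dw * alpha * |theta|**(alpha - 1)."""
    return grad_w * alpha * np.abs(theta) ** (alpha - 1.0)

# Three raw parameters of increasing magnitude that all receive the same
# upstream gradient dL/dw = 1.
theta = np.array([0.01, 0.1, 1.0])
grad_w = np.ones_like(theta)

# One plain gradient-descent step size (learning rate 0.1) per parameter.
# The step scales with |theta|, so small parameters barely move.
step = 0.1 * grad_wrt_theta(theta, grad_w)
print(step)
```

Under this assumed form, parameters that are already small receive proportionally small updates and stay near zero, which is consistent with the higher density at zero that the abstract reports.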

    Shared Information -- New Insights and Problems in Decomposing Information in Complex Systems

    How can the information that a set $X_1,\dots,X_n$ of random variables contains about another random variable $S$ be decomposed? To what extent do different subgroups provide the same, i.e. shared or redundant, information, carry unique information or interact for the emergence of synergistic information? Recently Williams and Beer proposed such a decomposition based on natural properties for shared information. While these properties fix the structure of the decomposition, they do not uniquely specify the values of the different terms. Therefore, we investigate additional properties such as strong symmetry and left monotonicity. We find that strong symmetry is incompatible with the properties proposed by Williams and Beer. Although left monotonicity is a very natural property for an information measure, it is not fulfilled by any of the proposed measures. We also study a geometric framework for information decompositions and ask whether it is possible to represent shared information by a family of posterior distributions. Finally, we draw connections to the notions of shared knowledge and common knowledge in game theory. While many people believe that independent variables cannot share information, we show that in game theory independent agents can have shared knowledge, but not common knowledge. We conclude that intuition and heuristic arguments do not suffice when arguing about information. Comment: 20 pages
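For the simplest case of two source variables, the structure of such a decomposition can be written out explicitly. The abstract gives no notation, so the symbols $SI$ (shared), $UI$ (unique) and $CI$ (synergistic) are assumed here for illustration.

```latex
\begin{align}
I(S; X_1, X_2) &= SI(S; X_1, X_2) + UI(S; X_1 \setminus X_2) + UI(S; X_2 \setminus X_1) + CI(S; X_1, X_2), \\
I(S; X_1)      &= SI(S; X_1, X_2) + UI(S; X_1 \setminus X_2), \\
I(S; X_2)      &= SI(S; X_1, X_2) + UI(S; X_2 \setminus X_1).
\end{align}
```

These are three equations in four unknown terms, which is one way to see why fixing the structure of the decomposition does not by itself fix the values of its terms: one of them, such as the shared information, must be defined separately.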

    How well do mean field theories of spiking quadratic-integrate-and-fire networks work in realistic parameter regimes?

    We use mean field techniques to compute the distribution of excitatory and inhibitory firing rates in large networks of randomly connected spiking quadratic integrate-and-fire neurons. These techniques are based on the assumptions that activity is asynchronous and Poisson. For most parameter settings these assumptions are strongly violated; nevertheless, so long as the networks are not too synchronous, we find good agreement between mean field prediction and network simulations. Thus, much of the intuition developed for randomly connected networks in the asynchronous regime applies to mildly synchronous networks.