56 research outputs found
A Spiking Neural Network with Local Learning Rules Derived From Nonnegative Similarity Matching
The design and analysis of spiking neural network algorithms will be
accelerated by the advent of new theoretical approaches. In an attempt at such
an approach, we provide a principled derivation of a spiking algorithm for
unsupervised learning, starting from the nonnegative similarity matching cost
function. The resulting network consists of integrate-and-fire units and
exhibits local learning rules, making it biologically plausible and also
suitable for neuromorphic hardware. We show in simulations that the algorithm
can perform sparse feature extraction and manifold learning, two tasks which
can be formulated as nonnegative similarity matching problems.
Comment: ICASSP 2019
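
As a concrete illustration of the cost function the derivation starts from, here is a minimal offline sketch (not the paper's spiking network): the nonnegative similarity matching objective in its commonly cited form, minimized directly by projected gradient descent. The toy data, shapes, and step size are our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Offline sketch of the nonnegative similarity matching (NSM) cost, in its
# commonly cited form (notation ours):
#     min_{Y >= 0}  || X^T X - Y^T Y ||_F^2 ,
# with inputs X (n x T) and nonnegative outputs Y (k x T).  The paper derives a
# spiking, locally learning network from this objective; here we only minimize
# it directly with projected gradient descent to show what the cost rewards.
n, k, T, eta = 8, 3, 200, 1e-4
X = np.maximum(0.0, rng.standard_normal((n, T)))   # toy nonnegative inputs
Y = np.abs(0.1 * rng.standard_normal((k, T)))      # nonnegative initialization

G = X.T @ X                                        # input similarity matrix
for _ in range(2000):
    grad = 4.0 * Y @ (Y.T @ Y - G)                 # gradient of ||G - Y^T Y||_F^2
    Y = np.maximum(0.0, Y - eta * grad)            # projected gradient step
print("relative residual:", np.linalg.norm(G - Y.T @ Y) / np.linalg.norm(G))
```
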
A Normative Theory of Adaptive Dimensionality Reduction in Neural Networks
To make sense of the world our brains must analyze high-dimensional datasets
streamed by our sensory organs. Because such analysis begins with
dimensionality reduction, modelling early sensory processing requires
biologically plausible online dimensionality reduction algorithms. Recently, we
derived such an algorithm, termed similarity matching, from a Multidimensional
Scaling (MDS) objective function. However, in the existing algorithm, the
number of output dimensions is set a priori by the number of output neurons and
cannot be changed. Because the number of informative dimensions in sensory
inputs is variable, there is a need for adaptive dimensionality reduction. Here,
we derive biologically plausible dimensionality reduction algorithms which
adapt the number of output dimensions to the eigenspectrum of the input
covariance matrix. We formulate three objective functions which, in the offline
setting, are optimized by the projections of the input dataset onto its
principal subspace scaled by the eigenvalues of the output covariance matrix.
In turn, the output eigenvalues are computed as i) soft-thresholded, ii)
hard-thresholded, iii) equalized thresholded eigenvalues of the input
covariance matrix. In the online setting, we derive the three corresponding
adaptive algorithms and map them onto the dynamics of neuronal activity in
networks with biologically plausible local learning rules. Remarkably, in the
last two networks, neurons are divided into two classes which we identify with
principal neurons and interneurons in biological circuits.
Comment: Advances in Neural Information Processing Systems (NIPS), 2015
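
Below is a minimal offline sketch of the soft-thresholding case as we read it from the description above: project the data onto its principal subspace and rescale each component so that its output variance equals the soft-thresholded input eigenvalue; components thresholded to zero are dropped, so the output dimensionality adapts to the spectrum. The scaling convention and threshold value are our assumptions, and the online networks with principal neurons and interneurons are not reproduced here.

```python
import numpy as np

def adaptive_reduce_offline(X, alpha):
    """Offline sketch: project onto the principal subspace and rescale each
    component so its output variance equals max(lambda_i - alpha, 0).
    Components thresholded to zero are dropped, so the output dimensionality
    adapts to the eigenspectrum of the input covariance."""
    n, T = X.shape                                 # features x samples
    C = X @ X.T / T                                # input covariance
    lam, U = np.linalg.eigh(C)
    lam, U = lam[::-1], U[:, ::-1]                 # descending eigenvalues
    lam_out = np.maximum(lam - alpha, 0.0)         # soft-thresholded output eigenvalues
    keep = lam_out > 0
    scale = np.sqrt(lam_out[keep] / lam[keep])     # per-component rescaling
    return scale[:, None] * (U[:, keep].T @ X)

# toy check: only directions with variance above alpha survive
rng = np.random.default_rng(1)
X = rng.standard_normal((5, 2000)) * np.array([3.0, 2.0, 1.0, 0.3, 0.1])[:, None]
Y = adaptive_reduce_offline(X, alpha=0.5)
print(Y.shape)   # output dimensionality adapts to the spectrum (3 of 5 here)
```
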
Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks
We derive analytical expressions for the generalization performance of kernel
regression as a function of the number of training samples using theoretical
methods from Gaussian processes and statistical physics. Our expressions apply
to wide neural networks due to an equivalence between training them and kernel
regression with the Neural Tangent Kernel (NTK). By computing the decomposition
of the total generalization error due to different spectral components of the
kernel, we identify a new spectral principle: as the size of the training set
grows, kernel machines and neural networks fit successively higher spectral
modes of the target function. When data are sampled from a uniform distribution
on a high-dimensional hypersphere, dot product kernels, including NTK, exhibit
learning stages where different frequency modes of the target function are
learned. We verify our theory with simulations on synthetic data and the MNIST
dataset.
Comment: ICML 2020. Update: updated the section on asymptotics of the
generalization error for power-law spectra, finding agreement with Spigler,
Geiger, Wyart 2019 (arXiv:1905.10843); added a section on discrete measures and
an MNIST experiment; the eigenvalue problem can be approximated by kernel PCA.
Typo fixed on 2/25/202
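
The spectral principle can be illustrated with a toy experiment that is not from the paper: near-interpolating kernel regression on the circle with a rotation-invariant kernel, a target containing one low- and one high-frequency mode, and a growing training set. The kernel, target, and error metric are our choices; the paper's analytical expressions and hypersphere setting are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(2)

def kernel(x1, x2, kappa=4.0):
    # a rotation-invariant kernel on the circle (analogue of a dot-product kernel)
    return np.exp(kappa * np.cos(x1[:, None] - x2[None, :]))

def fit_predict(x_tr, y_tr, x_te, ridge=1e-8):
    K = kernel(x_tr, x_tr) + ridge * np.eye(len(x_tr))
    alpha = np.linalg.solve(K, y_tr)               # (near-)interpolating kernel regression
    return kernel(x_te, x_tr) @ alpha

target = lambda x: np.cos(x) + np.cos(4 * x)       # low- and high-frequency modes
x_grid = np.linspace(0, 2 * np.pi, 400, endpoint=False)

for p in (5, 20, 80):                              # growing training-set sizes
    x_tr = rng.uniform(0, 2 * np.pi, p)
    f_hat = fit_predict(x_tr, target(x_tr), x_grid)
    resid = f_hat - target(x_grid)
    # residual amplitude in each Fourier mode of the target
    amp = lambda k: 2 * abs(np.mean(resid * np.exp(1j * k * x_grid)))
    print(f"p={p:3d}  residual in mode 1: {amp(1):.3f}   mode 4: {amp(4):.3f}")
```

With more samples, the residual in the low-frequency mode shrinks before the high-frequency one, which is the "successively higher spectral modes" behavior described above.
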
Minimax Dynamics of Optimally Balanced Spiking Networks of Excitatory and Inhibitory Neurons
Excitation-inhibition (E-I) balance is ubiquitously observed in the cortex.
Recent studies suggest an intriguing link between balance on fast timescales,
or tight balance, and efficient information coding with spikes. We further this
connection by taking a principled approach to optimal balanced networks of
excitatory (E) and inhibitory (I) neurons. By deriving E-I spiking neural
networks from greedy spike-based optimizations of constrained minimax
objectives, we show that tight balance arises from correcting for deviations
from the minimax optima. We predict specific neuron firing rates in the network
by solving the minimax problem, going beyond statistical theories of balanced
networks. Finally, we design minimax objectives for reconstruction of an input
signal, associative memory, and storage of manifold attractors, and derive from
them E-I networks that perform the computation. Overall, we present a novel
normative modeling approach for spiking E-I networks, going beyond the
widely-used energy minimizing networks that violate Dale's law. Our networks
can be used to model cortical circuits and computations.
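
For readers unfamiliar with greedy spike-based optimization, here is a minimal single-population sketch of the "spike only when it reduces the objective" rule for reconstructing an input signal. It is not the paper's constrained minimax derivation and does not separate excitatory and inhibitory populations; the decoding weights, time constants, and input signal are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)

# Greedy spike-based reconstruction: a neuron fires only when its spike would
# reduce the squared readout error ||x - D r||^2, where r holds filtered spike
# trains and D are decoding weights.  This illustrates the greedy-spiking
# principle the abstract builds on, not the paper's E-I minimax objectives.
n_in, n_neur, T, dt, tau = 2, 40, 2000, 1e-3, 0.02
D = 0.1 * rng.standard_normal((n_in, n_neur))             # decoding weights
x = np.stack([np.sin(np.linspace(0, 6 * np.pi, T)),
              np.cos(np.linspace(0, 4 * np.pi, T))])       # input signal (2 x T)
r = np.zeros(n_neur)                                        # filtered spike trains
x_hat = np.zeros((n_in, T))
for t in range(T):
    err = x[:, t] - D @ r                                   # current readout error
    gain = D.T @ err - 0.5 * np.sum(D**2, axis=0)           # error decrease per spike
    i = np.argmax(gain)
    if gain[i] > 0:                                         # greedy spike
        r[i] += 1.0
    r -= dt / tau * r                                       # leak of filtered spikes
    x_hat[:, t] = D @ r
print("reconstruction RMSE:", np.sqrt(np.mean((x - x_hat) ** 2)))
```
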
Holography, Fractals and the Weyl Anomaly
We study the large source asymptotics of the generating functional in quantum
field theory using the holographic renormalization group, and draw comparisons
with the asymptotics of the Hopf characteristic function in fractal geometry.
Based on the asymptotic behavior, we find a correspondence relating the Weyl
anomaly and the fractal dimension of the Euclidean path integral measure. We
are led to propose an equivalence between the logarithmic ultraviolet
divergence of the Shannon entropy of this measure and the integrated Weyl
anomaly, reminiscent of a known relation between logarithmic divergences of
entanglement entropy and a central charge. It follows that the information
dimension associated with the Euclidean path integral measure satisfies a
c-theorem.
Comment: 24 pages, 2 figures, factor of two error corrected, minor edit
Biologically Plausible Online Principal Component Analysis Without Recurrent Neural Dynamics
Artificial neural networks that learn to perform Principal Component Analysis
(PCA) and related tasks using strictly local learning rules have been
previously derived based on the principle of similarity matching: similar pairs
of inputs should map to similar pairs of outputs. However, the operation of
these networks (and of similar networks) requires a fixed-point iteration to
determine the output corresponding to a given input, which means that dynamics
must operate on a faster time scale than the variation of the input. Further,
during these fast dynamics such networks typically "disable" learning, updating
synaptic weights only once the fixed-point iteration has been resolved. Here,
we derive a network for PCA-based dimensionality reduction that avoids this
fast fixed-point iteration. The key novelty of our approach is a modification
of the similarity matching objective to encourage near-diagonality of a
synaptic weight matrix. We then approximately invert this matrix using a Taylor
series approximation, replacing the previous fast iterations. In the offline
setting, our algorithm corresponds to a dynamical system, the stability of
which we rigorously analyze. In the online setting (i.e., with stochastic
gradients), we map our algorithm to a familiar neural network architecture and
give numerical results showing that our method converges at a competitive rate.
The computational complexity per iteration of our online algorithm is linear in
the total degrees of freedom, which is in some sense optimal.
Comment: 8 pages, 2 figures
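
The trick of avoiding the fixed-point iteration can be illustrated in a few lines: when the lateral weight matrix is kept near-diagonal (here, near the identity), its inverse is well approximated by a short truncated Taylor/Neumann series, so the output is computed in a single feedforward pass. This sketch uses a random near-identity matrix; the modified similarity matching objective that keeps the matrix near-diagonal is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(4)

# If the lateral weight matrix M is close to the identity, then
#     M^{-1} = (I - (I - M))^{-1}  ~  I + (I - M) + (I - M)^2 + ...
# so the output y = M^{-1} W x can be computed feedforward instead of by
# running recurrent dynamics to a fixed point.
k = 4
M = np.eye(k) + 0.05 * rng.standard_normal((k, k))   # near-diagonal lateral weights
W = rng.standard_normal((k, 10))                      # feedforward weights
x = rng.standard_normal(10)

E = np.eye(k) - M
M_inv_approx = np.eye(k) + E + E @ E                  # second-order truncation
y_exact = np.linalg.solve(M, W @ x)                   # what the fixed point would give
y_approx = M_inv_approx @ (W @ x)
print("relative error:", np.linalg.norm(y_exact - y_approx) / np.linalg.norm(y_exact))
```
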
A Hebbian/Anti-Hebbian Network for Online Sparse Dictionary Learning Derived from Symmetric Matrix Factorization
Olshausen and Field (OF) proposed that neural computations in the primary
visual cortex (V1) can be partially modeled by sparse dictionary learning. By
minimizing the regularized representation error they derived an online
algorithm, which learns Gabor-filter receptive fields from a natural image
ensemble in agreement with physiological experiments. Whereas the OF algorithm
can be mapped onto the dynamics and synaptic plasticity in a single-layer
neural network, the derived learning rule is nonlocal - the synaptic weight
update depends on the activity of neurons other than just pre- and postsynaptic
ones - and hence biologically implausible. Here, to overcome this problem, we
derive sparse dictionary learning from a novel cost-function - a regularized
error of the symmetric factorization of the input's similarity matrix. Our
algorithm maps onto a neural network of the same architecture as OF but using
only biologically plausible local learning rules. When trained on natural
images our network learns Gabor-filter receptive fields and reproduces the
correlation among synaptic weights hard-wired in the OF network. Therefore,
online symmetric matrix factorization may serve as an algorithmic theory of
neural computation.
Comment: 2014 Asilomar Conference on Signals, Systems and Computers. v2: fixed
a typo in equation 2
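
A minimal sketch of the single-layer architecture and of what "local" means operationally: activities are computed by iterating rectified dynamics driven by feedforward weights W and lateral weights M, and each synapse is then updated using only its own pre- and postsynaptic activities. The simplified decay terms, learning rate, threshold, and random input stream are our assumptions; they are not the paper's exact rules and will not produce Gabor filters on this toy data.

```python
import numpy as np

rng = np.random.default_rng(5)

n_in, n_out, eta, lam = 16, 10, 0.02, 0.1
W = 0.1 * rng.standard_normal((n_out, n_in))   # feedforward (Hebbian) weights
M = np.zeros((n_out, n_out))                   # lateral (anti-Hebbian) weights

def encode(x, n_iter=100):
    # rectified fixed-point dynamics producing a sparse nonnegative code
    y = np.zeros(n_out)
    for _ in range(n_iter):
        y = np.maximum(0.0, W @ x - M @ y - lam)
    return y

for _ in range(1000):                          # stream of toy nonnegative inputs
    x = np.maximum(0.0, rng.standard_normal(n_in))
    y = encode(x)
    W += eta * (np.outer(y, x) - W)            # Hebbian: pre- and postsynaptic only
    M += eta * (np.outer(y, y) - M)            # anti-Hebbian: postsynaptic pair only
    np.fill_diagonal(M, 0.0)                   # no self-inhibition
print("fraction of active neurons on last input:", np.mean(y > 0))
```
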
Blind nonnegative source separation using biological neural networks
Blind source separation, i.e., the extraction of independent sources from a
mixture, is an important problem for both artificial and natural signal
processing. Here, we address a special case of this problem when sources (but
not the mixing matrix) are known to be nonnegative, for example, due to the
physical nature of the sources. We search for the solution to this problem that
can be implemented using biologically plausible neural networks. Specifically,
we consider the online setting where the dataset is streamed to a neural
network. The novelty of our approach is that we formulate blind nonnegative
source separation as a similarity matching problem and derive neural networks
from the similarity matching objective. Importantly, synaptic weights in our
networks are updated according to biologically plausible local learning rules.
Comment: Accepted for publication in Neural Computation
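
To make the problem setup concrete, the following sketch generates sparse nonnegative sources, mixes them with an unknown random matrix, and shows that the observed mixtures are neither nonnegative nor decorrelated; recovering the sources from such mixtures is the task the network above solves. The dimensions and source statistics are arbitrary choices, and the similarity-matching network itself is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(6)

# Generative model: nonnegative sources S, unknown mixing matrix A, observed X = A S.
d_src, d_obs, T = 3, 5, 10000
S = np.maximum(0.0, rng.standard_normal((d_src, T)) - 0.5)   # nonnegative, sparse sources
A = rng.standard_normal((d_obs, d_src))                       # unknown mixing matrix
X = A @ S                                                      # observed mixtures

print("fraction of negative entries in X:", np.mean(X < 0))
print("source correlation matrix (close to the identity):")
print(np.round(np.corrcoef(S), 2))
print("mixture correlation matrix (not diagonal):")
print(np.round(np.corrcoef(X), 2))
```
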
A Hebbian/Anti-Hebbian Neural Network for Linear Subspace Learning: A Derivation from Multidimensional Scaling of Streaming Data
Neural network models of early sensory processing typically reduce the
dimensionality of streaming input data. Such networks learn the principal
subspace, in the sense of principal component analysis (PCA), by adjusting
synaptic weights according to activity-dependent learning rules. When derived
from a principled cost function these rules are nonlocal and hence biologically
implausible. At the same time, biologically plausible local rules have been
postulated rather than derived from a principled cost function. Here, to bridge
this gap, we derive a biologically plausible network for subspace learning on
streaming data by minimizing a principled cost function. In a departure from
previous work, where cost was quantified by the representation, or
reconstruction, error, we adopt a multidimensional scaling (MDS) cost function
for streaming data. The resulting algorithm relies only on biologically
plausible Hebbian and anti-Hebbian local learning rules. In a stochastic
setting, synaptic weights converge to a stationary state which projects the
input data onto the principal subspace. If the data are generated by a
nonstationary distribution, the network can track the principal subspace. Thus,
our result takes a step towards an algorithmic theory of neural computation.
Comment: Accepted for publication in Neural Computation
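
The offline claim that the converged network projects data onto the principal subspace can be checked numerically: among k-dimensional outputs, the similarity matching (streaming MDS) cost is minimized, up to a rotation, by projecting the data onto its top-k principal subspace. The check and notation below are ours, not the paper's streaming derivation.

```python
import numpy as np

rng = np.random.default_rng(7)

# Compare the similarity matching cost ||X^T X - Y^T Y||_F^2 at the PCA
# projection against random k-dimensional projections of the same data.
n, k, T = 6, 2, 500
X = np.diag([3.0, 2.0, 1.0, 0.5, 0.3, 0.1]) @ rng.standard_normal((n, T))
lam, U = np.linalg.eigh(X @ X.T)                 # ascending eigenvalues
Y_pca = U[:, -k:].T @ X                           # projection onto top-k principal subspace

cost = lambda Y: np.linalg.norm(X.T @ X - Y.T @ Y) ** 2
print("PCA projection cost :", cost(Y_pca))
for _ in range(3):                                # random projections do worse
    R = np.linalg.qr(rng.standard_normal((n, k)))[0]
    print("random projection   :", cost(R.T @ X))
```
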
Associative Memory in Iterated Overparameterized Sigmoid Autoencoders
Recent work showed that overparameterized autoencoders can be trained to
implement associative memory via iterative maps, when the trained input-output
Jacobian of the network has all of its eigenvalue norms strictly below one.
Here, we theoretically analyze this phenomenon for sigmoid networks by
leveraging recent developments in deep learning theory, especially the
correspondence between training neural networks in the infinite-width limit and
performing kernel regression with the Neural Tangent Kernel (NTK). We find that
overparameterized sigmoid autoencoders can have attractors in the NTK limit for
both training with a single example and multiple examples under certain
conditions. In particular, for multiple training examples, we find that the
norm of the largest Jacobian eigenvalue drops below one with increasing input
norm, leading to associative memory.
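
The experiment described above can be sketched end to end in NumPy (this is not the paper's NTK-limit analysis): train an overparameterized one-hidden-layer sigmoid autoencoder on a few examples, inspect the eigenvalue norms of its input-output Jacobian at the training points, and iterate the trained map from a perturbed input to see whether the stored example behaves as an attractor. Width, learning rate, data, and step counts are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(8)

d, width, m = 4, 512, 2
X = rng.uniform(0.0, 1.0, (d, m))                  # training examples (columns)
W1 = rng.standard_normal((width, d)) / np.sqrt(d)
b1 = np.zeros(width)
W2 = rng.standard_normal((d, width)) / np.sqrt(width)
b2 = np.zeros(d)
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

def forward(Xb):
    H = sig(W1 @ Xb + b1[:, None])
    return W2 @ H + b2[:, None], H

lr = 0.01
for step in range(2000):                           # full-batch gradient descent on MSE
    F, H = forward(X)
    E = F - X
    dW2, db2 = E @ H.T / m, E.mean(axis=1)
    dZ = (W2.T @ E) * H * (1.0 - H)
    dW1, db1 = dZ @ X.T / m, dZ.mean(axis=1)
    W1 -= lr * dW1; b1 -= lr * db1; W2 -= lr * dW2; b2 -= lr * db2

F, H = forward(X)
print("training MSE:", np.mean((F - X) ** 2))

for j in range(m):                                 # Jacobian spectrum at each example
    J = W2 @ (np.diag(H[:, j] * (1.0 - H[:, j])) @ W1)
    print(f"example {j}: max |eigenvalue| of Jacobian =",
          np.max(np.abs(np.linalg.eigvals(J))))

x = X[:, [0]] + 0.05 * rng.standard_normal((d, 1)) # iterate the map from a perturbation
for _ in range(100):
    x, _ = forward(x)
print("distance to stored example after iteration:", np.linalg.norm(x - X[:, [0]]))
```
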
- …