Replica method for eigenvalues of real Wishart product matrices
We show how the replica method can be used to compute the asymptotic
eigenvalue spectrum of a real Wishart product matrix. For unstructured factors,
this provides a compact, elementary derivation of a polynomial condition on the
Stieltjes transform first proved by Müller [IEEE Trans. Inf. Theory 48,
2086-2091 (2002)]. We then show how this computation can be extended to
ensembles where the factors have correlated rows. Finally, we derive polynomial
conditions on the average values of the minimum and maximum eigenvalues, which
match the results obtained by Akemann, Ipsen, and Kieburg [Phys. Rev. E 88,
052118 (2013)] for the complex Wishart product ensemble.
Comment: 35 pages, 4 figures
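A minimal numerical sketch of the object studied here (our illustration, not code from the paper): sample a real Wishart product matrix, compute its eigenvalues, and evaluate the empirical Stieltjes transform. The matrix size n, the number of factors L, and the choice of square factors are all assumed values.

```python
# Sketch: empirical spectrum of a real Wishart product matrix.
import numpy as np

rng = np.random.default_rng(0)

n, L = 500, 2          # matrix size and number of factors (assumed values)
# Product of L independent Gaussian factors; square n x n factors for simplicity.
X = np.eye(n)
for _ in range(L):
    X = X @ (rng.standard_normal((n, n)) / np.sqrt(n))

W = X @ X.T            # real Wishart product matrix
eigs = np.linalg.eigvalsh(W)

# Empirical Stieltjes transform g(z) = (1/n) tr (W - z I)^{-1},
# evaluated just off the real axis; a polynomial condition on g(z)
# can be checked against such samples.
z = 1.0 + 1e-3j
g = np.mean(1.0 / (eigs - z))
print(g)
```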
Asymptotics of representation learning in finite Bayesian neural networks
Recent works have suggested that finite Bayesian neural networks may
sometimes outperform their infinite cousins because finite networks can
flexibly adapt their internal representations. However, our theoretical
understanding of how the learned hidden layer representations of finite
networks differ from the fixed representations of infinite networks remains
incomplete. Perturbative finite-width corrections to the network prior and
posterior have been studied, but the asymptotics of learned features have not
been fully characterized. Here, we argue that the leading finite-width
corrections to the average feature kernels for any Bayesian network with linear
readout and Gaussian likelihood have a largely universal form. We illustrate
this explicitly for three tractable network architectures: deep linear
fully-connected and convolutional networks, and networks with a single
nonlinear hidden layer. Our results begin to elucidate how task-relevant
learning signals shape the hidden layer representations of wide Bayesian neural
networks.
Comment: 13+28 pages, 4 figures; v3: extensive revision with improved
exposition and new section on CNNs, accepted to NeurIPS 2021; v4: minor
updates to supplementary material
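As a toy illustration of the width dependence discussed above (our sketch, not the paper's code), the following estimates the average feature kernel of a one-hidden-layer network at several widths; the tanh nonlinearity, dimensions, and number of weight draws are all assumptions.

```python
# Sketch: width dependence of the average feature kernel of a
# one-hidden-layer network with random Gaussian weights.
import numpy as np

rng = np.random.default_rng(1)

d, p = 10, 20                      # input dimension, number of inputs (assumed)
X = rng.standard_normal((p, d))

def feature_kernel(width):
    """Kernel K = phi(XW) phi(XW)^T / width for one draw of the weights."""
    W = rng.standard_normal((d, width)) / np.sqrt(d)
    phi = np.tanh(X @ W)           # nonlinearity is an arbitrary choice here
    return phi @ phi.T / width

for width in (50, 500, 5000):
    # Average over draws to approximate the mean kernel at this width.
    K = np.mean([feature_kernel(width) for _ in range(100)], axis=0)
    print(width, K[0, 0])
```

At infinite width the average kernel approaches a fixed limit, so the drift of the printed values with width is a crude view of the finite-width corrections the paper characterizes.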
Long Sequence Hopfield Memory
Sequence memory is an essential attribute of natural and artificial
intelligence that enables agents to encode, store, and retrieve complex
sequences of stimuli and actions. Computational models of sequence memory have
been proposed where recurrent Hopfield-like neural networks are trained with
temporally asymmetric Hebbian rules. However, these networks suffer from
limited sequence capacity (maximal length of the stored sequence) due to
interference between the memories. Inspired by recent work on Dense Associative
Memories, we expand the sequence capacity of these models by introducing a
nonlinear interaction term, enhancing separation between the patterns. We
derive novel scaling laws for sequence capacity with respect to network size
that significantly improve on those of models based on traditional Hopfield
networks, and we verify these theoretical results with numerical simulations.
Moreover, we introduce a generalized pseudoinverse rule
to recall sequences of highly correlated patterns. Finally, we extend this
model to store sequences with variable timing between state transitions and
describe a biologically plausible implementation, with connections to motor
neuroscience.
Comment: NeurIPS 2023 Camera-Ready, 41 pages
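A hedged sketch of the general mechanism (not the paper's exact model): a temporally asymmetric rule in which the overlap with pattern mu drives the network toward pattern mu+1, with overlaps passed through a polynomial separation function in the style of Dense Associative Memories. The network size, sequence length, and cubic nonlinearity below are assumed values.

```python
# Sketch: sequence recall with a temporally asymmetric rule and a
# polynomial separation function between stored patterns.
import numpy as np

rng = np.random.default_rng(2)

N, P, power = 200, 30, 3                   # neurons, sequence length, degree (assumed)
xi = rng.choice([-1.0, 1.0], size=(P, N))  # random binary patterns forming the sequence

def step(x):
    """Next state is driven by pattern mu+1, weighted by a nonlinear
    function of the overlap between the current state and pattern mu."""
    overlaps = xi[:-1] @ x / N             # overlaps with patterns 1..P-1
    field = xi[1:].T @ (overlaps ** power) # odd power preserves the sign
    return np.sign(field)

x = xi[0].copy()                           # initialize at the first pattern
for t in range(P - 1):
    x = step(x)
    print(t + 1, float(xi[t + 1] @ x / N)) # overlap with the expected next pattern
```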
Learning curves for deep structured Gaussian feature models
In recent years, significant attention in deep learning theory has been
devoted to analyzing the generalization performance of models with multiple
layers of Gaussian random features. However, few works have considered the
effect of feature anisotropy; most assume that features are generated using
independent and identically distributed Gaussian weights. Here, we derive
learning curves for models with many layers of structured Gaussian features. We
show that allowing correlations between the rows of the first layer of features
can aid generalization, while structure in later layers is generally
detrimental. Our results shed light on how weight structure affects
generalization in a simple class of solvable models.
Comment: 28 pages, 3 figures
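An illustrative numerical sketch under stated assumptions (not the paper's setup): an empirical learning curve for ridge regression on deep Gaussian random features, where "structure" is modeled as correlated rows in the first feature layer via a toy covariance; all dimensions, the linear teacher, and the covariance form are our choices.

```python
# Sketch: ridge regression learning curve with structured first-layer features.
import numpy as np

rng = np.random.default_rng(3)

d, widths, lam = 50, [100, 100], 1e-3   # input dim, layer widths, ridge (assumed)

def draw_weights(corr):
    """Two layers of Gaussian features; corr > 0 correlates rows of layer 1."""
    C = (1 - corr) * np.eye(d) + corr * np.ones((d, d)) / d  # toy row covariance
    Ws = [rng.multivariate_normal(np.zeros(d), C, size=widths[0]) / np.sqrt(d)]
    n_in = widths[0]
    for n_out in widths[1:]:
        Ws.append(rng.standard_normal((n_out, n_in)) / np.sqrt(n_in))
        n_in = n_out
    return Ws

def features(X, Ws):
    H = X
    for W in Ws:
        H = H @ W.T
    return H

w_star = rng.standard_normal(d) / np.sqrt(d)    # linear teacher (assumed target)
for corr in (0.0, 0.5):
    for p in (20, 80, 320):                     # training set sizes
        Xtr, Xte = rng.standard_normal((p, d)), rng.standard_normal((1000, d))
        ytr, yte = Xtr @ w_star, Xte @ w_star
        Ws = draw_weights(corr)                 # same weights for train and test
        Ftr, Fte = features(Xtr, Ws), features(Xte, Ws)
        a = np.linalg.solve(Ftr.T @ Ftr + lam * np.eye(Ftr.shape[1]), Ftr.T @ ytr)
        print(corr, p, float(np.mean((Fte @ a - yte) ** 2)))
```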
Tripod gait example data.
MATLAB version 7.3 .mat file, created using MATLAB 9.4
Dataset of responses of freely-walking flies to translating random dot visual stimuli.
MATLAB version 7.3 .mat file, created using MATLAB 9.4
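Both dataset files are MATLAB v7.3 .mat files, which are HDF5 containers, so they can be inspected without MATLAB, for example with h5py in Python. The file name below is a placeholder and the variable names inside the files are not documented here.

```python
# Sketch: inspect a MATLAB v7.3 .mat file (HDF5 format) without MATLAB.
import h5py

with h5py.File("tripod_gait_example.mat", "r") as f:  # hypothetical file name
    f.visit(print)                  # list the variables stored in the file
    # data = f["some_variable"][:]  # read one variable as a NumPy array
```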