A Spectral Theory of Neural Prediction and Alignment
The representations of neural networks are often compared to those of
biological systems by performing regression between the neural network
responses and those measured from biological systems. Many different
state-of-the-art deep neural networks yield similar neural predictions, but it
remains unclear how to differentiate among models that perform equally well at
predicting neural responses. To gain insight into this, we use a recent
theoretical framework that relates the generalization error from regression to
the spectral bias of the model activations and the alignment of the neural
responses onto the learnable subspace of the model. We extend this theory to
the case of regression between model activations and neural responses, and
define geometrical properties describing the error embedding geometry. We test
a large number of deep neural networks that predict visual cortical activity
and show that there are multiple types of geometries that result in low neural
prediction error as measured via regression. This work demonstrates that
carefully decomposing representational metrics can clarify how models capture
neural activity, and points the way towards improved models of neural activity.
Comment: First two authors contributed equally. To appear at NeurIPS 202
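A minimal sketch of the kind of spectral decomposition such a framework involves, using synthetic data (illustrative only, not the paper's actual pipeline): eigendecompose the stimulus-by-stimulus kernel of the model activations, then measure how the neural response projects onto each eigenmode — the "alignment" along the spectrum.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stim, n_feat = 200, 100
Phi = rng.standard_normal((n_stim, n_feat))  # synthetic model activations
y = rng.standard_normal(n_stim)              # synthetic neural response

# Eigendecompose the stimulus-by-stimulus kernel of the model activations.
K = Phi @ Phi.T / n_feat
eigvals, eigvecs = np.linalg.eigh(K)
order = np.argsort(eigvals)[::-1]            # sort eigenmodes by variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Alignment: fraction of neural-response variance carried by each eigenmode.
w = eigvecs.T @ y
alignment = w**2 / np.sum(w**2)

# Cumulative alignment: how quickly the leading modes capture the response.
cum_alignment = np.cumsum(alignment)
```

Two models with the same regression error can differ in how this alignment is distributed across the spectrum, which is the kind of geometric distinction the abstract describes.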
Asymptotics of representation learning in finite Bayesian neural networks
Recent works have suggested that finite Bayesian neural networks may
sometimes outperform their infinite cousins because finite networks can
flexibly adapt their internal representations. However, our theoretical
understanding of how the learned hidden layer representations of finite
networks differ from the fixed representations of infinite networks remains
incomplete. Perturbative finite-width corrections to the network prior and
posterior have been studied, but the asymptotics of learned features have not
been fully characterized. Here, we argue that the leading finite-width
corrections to the average feature kernels for any Bayesian network with linear
readout and Gaussian likelihood have a largely universal form. We illustrate
this explicitly for three tractable network architectures: deep linear
fully-connected and convolutional networks, and networks with a single
nonlinear hidden layer. Our results begin to elucidate how task-relevant
learning signals shape the hidden layer representations of wide Bayesian neural
networks.
Comment: 13+28 pages, 4 figures; v3: extensive revision with improved
exposition and new section on CNNs, accepted to NeurIPS 2021; v4: minor
updates to supplemen
Strong localization in a suspended monolayer graphene by intervalley scattering
A gate-induced insulating behavior at zero magnetic field is observed in a
high-mobility suspended monolayer graphene near the charge neutrality point.
The graphene device, initially cleaned by a current annealing technique,
underwent a thermo-pressure cycle to allow short-range impurities to adsorb
directly onto the ultra-clean graphene surface. The adsorption process
produced a strong temperature and electric-field dependence of the conductance
of the graphene device. The conductance around the neutrality point is
observed to be strongly reduced on cooling from 30 K to 20 mK. A direct
transition from insulator to quantum Hall conductor, accompanied by
broken-symmetry-induced plateaux, confirms the presence of intervalley
scatterers.
Comment: 4 pages, 4 figure
Bandwidth Enables Generalization in Quantum Kernel Models
Quantum computers are known to provide speedups over classical
state-of-the-art machine learning methods in some specialized settings. For
example, quantum kernel methods have been shown to provide an exponential
speedup on a learning version of the discrete logarithm problem. Understanding
the generalization of quantum models is essential to realizing similar speedups
on problems of practical interest. Recent results demonstrate that
generalization is hindered by the exponential size of the quantum feature
space. Although these results suggest that quantum models cannot generalize
when the number of qubits is large, in this paper we show that these results
rely on overly restrictive assumptions. We consider a wider class of models by
varying a hyperparameter that we call quantum kernel bandwidth. We analyze the
large-qubit limit and provide explicit formulas for the generalization of a
quantum model that can be solved in closed form. Specifically, we show that
changing the value of the bandwidth can take a model from provably not being
able to generalize to any target function to good generalization for
well-aligned targets. Our analysis shows how the bandwidth controls the
spectrum of the kernel integral operator and thereby the inductive bias of the
model. We demonstrate empirically that our theory correctly predicts how
varying the bandwidth affects generalization of quantum models on challenging
datasets, including those far outside our theoretical assumptions. We discuss
the implications of our results for quantum advantage in machine learning.
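The bandwidth mechanism can be illustrated with a classical simulation of a simple product-state feature map (an illustrative encoding, not the paper's specific circuits): each feature x_j is encoded as a single-qubit rotation R_x(c·x_j), so the fidelity kernel factorizes, and the scaling hyperparameter c plays the role of the bandwidth.

```python
import numpy as np

rng = np.random.default_rng(2)

def fidelity_kernel(X, c):
    # Product single-qubit encoding: x_j -> R_x(c * x_j)|0>, which gives
    # k(x, x') = prod_j cos^2(c * (x_j - x_j') / 2).
    diff = X[:, None, :] - X[None, :, :]
    return np.prod(np.cos(c * diff / 2.0) ** 2, axis=-1)

n_qubits, n_samples = 20, 50
X = rng.uniform(-np.pi, np.pi, size=(n_samples, n_qubits))

mean_offdiag = {}
for c in (1.0, 0.1):
    K = fidelity_kernel(X, c)
    mean_offdiag[c] = K[~np.eye(n_samples, dtype=bool)].mean()
# c = 1.0: off-diagonal entries shrink exponentially in the qubit count
#          (near-identity kernel, flat spectrum, no generalization);
# c = 0.1: off-diagonal entries stay O(1), a non-trivial inductive bias.
```

Shrinking the bandwidth concentrates the kernel integral operator's spectrum on fewer large eigenvalues, which is the spectral control over inductive bias the abstract describes.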