
    A Spectral Theory of Neural Prediction and Alignment

    The representations of neural networks are often compared to those of biological systems by regressing the network responses onto responses measured from biological systems. Many state-of-the-art deep neural networks yield similar neural predictions, but it remains unclear how to differentiate among models that perform equally well at predicting neural responses. To gain insight into this, we use a recent theoretical framework that relates the generalization error of the regression to the spectral bias of the model activations and the alignment of the neural responses onto the learnable subspace of the model. We extend this theory to regression between model activations and neural responses and define geometrical properties describing the error embedding geometry. We test a large number of deep neural networks that predict visual cortical activity and show that multiple types of geometries can result in low neural prediction error as measured via regression. This work demonstrates that carefully decomposing representational metrics can reveal how models capture neural activity and points the way towards improved models of neural activity. Comment: First two authors contributed equally. To appear at NeurIPS 2023.
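    As a rough illustration of the decomposition this abstract describes, the sketch below (Python on synthetic data; the array shapes, ridge parameter, and the printed summary statistic are illustrative assumptions, not the authors' pipeline) regresses model activations onto neural responses and separates the fit into the eigenspectrum of the model activations and the alignment of the neural responses with those eigendirections.

        # Minimal sketch (not the authors' exact pipeline): decompose model-to-brain
        # ridge regression into the eigenspectrum of the model activations and the
        # alignment of neural responses with those eigendirections.
        import numpy as np

        rng = np.random.default_rng(0)
        n_stim, n_feat, n_neurons = 200, 500, 50

        X = rng.standard_normal((n_stim, n_feat))      # model activations (stimuli x units)
        Y = rng.standard_normal((n_stim, n_neurons))   # recorded neural responses

        # Center both sets of responses across stimuli.
        X = X - X.mean(axis=0)
        Y = Y - Y.mean(axis=0)

        # Eigendecomposition of the stimulus-by-stimulus Gram matrix of the model:
        # eigenvalues give the model's spectral bias, eigenvectors the learnable subspace.
        K = X @ X.T / n_feat
        eigvals, eigvecs = np.linalg.eigh(K)
        order = np.argsort(eigvals)[::-1]
        eigvals, eigvecs = eigvals[order], eigvecs[:, order]

        # Alignment: how much neural response variance falls along each eigendirection.
        proj = eigvecs.T @ Y
        alignment = (proj ** 2).sum(axis=1)
        alignment /= alignment.sum()

        # Ridge regression from model activations to neural responses, viewed in the
        # eigenbasis: small eigendirections are shrunk most, so prediction quality
        # depends jointly on the spectrum and the alignment.
        lam = 1.0                                      # ridge regularization (illustrative)
        shrinkage = eigvals / (eigvals + lam)
        summary = float((shrinkage * alignment).sum())
        print(f"spectrum-weighted alignment (toy fit-quality summary): {summary:.3f}")

    In this picture, a low regression error can come either from a slowly decaying spectrum or from neural responses concentrated on the leading eigendirections, which is one way different geometries can yield similar prediction scores.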

    Asymptotics of representation learning in finite Bayesian neural networks

    Recent works have suggested that finite Bayesian neural networks may sometimes outperform their infinite cousins because finite networks can flexibly adapt their internal representations. However, our theoretical understanding of how the learned hidden-layer representations of finite networks differ from the fixed representations of infinite networks remains incomplete. Perturbative finite-width corrections to the network prior and posterior have been studied, but the asymptotics of learned features have not been fully characterized. Here, we argue that the leading finite-width corrections to the average feature kernels for any Bayesian network with linear readout and Gaussian likelihood have a largely universal form. We illustrate this explicitly for three tractable network architectures: deep linear fully-connected and convolutional networks, and networks with a single nonlinear hidden layer. Our results begin to elucidate how task-relevant learning signals shape the hidden-layer representations of wide Bayesian neural networks. Comment: 13+28 pages, 4 figures; v3: extensive revision with improved exposition and new section on CNNs, accepted to NeurIPS 2021; v4: minor updates to supplement.
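    A toy numerical check of the width dependence discussed above (Python; the ReLU arc-cosine kernel, widths, and number of prior draws are illustrative choices, and this only probes the prior, not the paper's posterior feature-kernel corrections): sample one-hidden-layer networks from a Gaussian prior and watch the empirical feature kernel concentrate on its infinite-width limit as the width grows.

        # Toy illustration (not the paper's perturbative calculation): the empirical
        # feature kernel of a one-hidden-layer ReLU network, sampled from a Gaussian
        # prior, concentrates on the analytic infinite-width (NNGP) kernel as width grows.
        import numpy as np

        rng = np.random.default_rng(1)
        n_inputs, d = 8, 3
        X = rng.standard_normal((n_inputs, d)) / np.sqrt(d)

        def relu_nngp_kernel(X):
            """Analytic infinite-width kernel of a ReLU hidden layer (arc-cosine kernel)."""
            G = X @ X.T
            norms = np.sqrt(np.diag(G))
            cos = np.clip(G / np.outer(norms, norms), -1.0, 1.0)
            theta = np.arccos(cos)
            return (np.outer(norms, norms) / (2 * np.pi)) * (np.sin(theta) + (np.pi - theta) * cos)

        K_inf = relu_nngp_kernel(X)

        for width in (64, 256, 1024, 4096):
            devs = []
            for _ in range(200):                       # average over prior draws
                W = rng.standard_normal((d, width))    # standard Gaussian prior weights
                H = np.maximum(X @ W, 0.0)             # hidden-layer features
                K_emp = H @ H.T / width                # empirical feature kernel
                devs.append(np.linalg.norm(K_emp - K_inf))
            print(f"width {width:5d}: mean ||K_emp - K_inf|| = {np.mean(devs):.4f}")

    For these prior draws the deviation simply shrinks with width; the paper's perturbative analysis concerns the leading task-dependent corrections that learning induces in the posterior kernels, which is where the universal form described above appears.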

    Strong localization in a suspended monolayer graphene by intervalley scattering

    A gate-induced insulating behavior at zero magnetic field is observed in a high-mobility suspended monolayer graphene near the charge neutrality point. The graphene device, initially cleaned by a current-annealing technique, then underwent a thermo-pressure cycle to allow short-range impurities to be adsorbed directly onto the ultra-clean graphene surface. The adsorption process induced a strongly temperature- and electric-field-dependent behavior in the conductance of the device. The conductance around the neutrality point is observed to be reduced from around e²/h at 30 K to ∼0.01 e²/h at 20 mK. A direct transition from insulator to quantum Hall conductor within ≈0.4 T, accompanied by broken-symmetry-induced ν = 0, ±1 plateaux, confirms the presence of intervalley scatterers. Comment: 4 pages, 4 figures.
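    For scale, a quick unit conversion of the reported conductance values (Python; the constants are exact SI values, and reading the conductance as a two-terminal resistance is a simplification, not a statement from the paper):

        # Quick unit check (not from the paper): the conductance quantum e^2/h in SI units,
        # and the resistance implied by the reported ~0.01 e^2/h conductance near 20 mK.
        e = 1.602176634e-19   # elementary charge, C (exact, SI 2019)
        h = 6.62607015e-34    # Planck constant, J*s (exact, SI 2019)

        G0 = e**2 / h                     # conductance quantum, ~3.87e-5 S
        print(f"e^2/h = {G0 * 1e6:.1f} uS  ->  {1 / G0 / 1e3:.1f} kOhm")

        G_low = 0.01 * G0                 # reported low-temperature conductance
        print(f"0.01 e^2/h = {G_low * 1e6:.2f} uS  ->  {1 / G_low / 1e6:.2f} MOhm")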

    Bandwidth Enables Generalization in Quantum Kernel Models

    Quantum computers are known to provide speedups over classical state-of-the-art machine learning methods in some specialized settings. For example, quantum kernel methods have been shown to provide an exponential speedup on a learning version of the discrete logarithm problem. Understanding the generalization of quantum models is essential to realizing similar speedups on problems of practical interest. Recent results demonstrate that generalization is hindered by the exponential size of the quantum feature space. Although these results suggest that quantum models cannot generalize when the number of qubits is large, in this paper we show that these results rely on overly restrictive assumptions. We consider a wider class of models by varying a hyperparameter that we call quantum kernel bandwidth. We analyze the large-qubit limit and provide explicit formulas for the generalization of a quantum model that can be solved in closed form. Specifically, we show that changing the value of the bandwidth can take a model from provably not being able to generalize to any target function to good generalization for well-aligned targets. Our analysis shows how the bandwidth controls the spectrum of the kernel integral operator and thereby the inductive bias of the model. We demonstrate empirically that our theory correctly predicts how varying the bandwidth affects generalization of quantum models on challenging datasets, including those far outside our theoretical assumptions. We discuss the implications of our results for quantum advantage in machine learning.
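    The sketch below illustrates the bandwidth mechanism with a simple product feature map, one RY rotation per qubit, rather than the circuits studied in the paper; for that map the fidelity kernel has the closed form k(x, x') = prod_i cos²(c·(x_i − x'_i)/2), where c is the bandwidth. The feature map, data, and bandwidth values are illustrative assumptions.

        # Minimal sketch with an assumed product feature map (one RY rotation per qubit,
        # angle c * x_i): the fidelity kernel factorizes into cos^2(c * (x_i - x'_i) / 2).
        # At c = 1 and many qubits the kernel matrix is close to the identity (flat
        # spectrum, no generalization); shrinking c concentrates spectral mass in a few
        # leading eigenvalues, biasing the model toward well-aligned targets.
        import numpy as np

        rng = np.random.default_rng(2)
        n_qubits, n_samples = 20, 100
        X = rng.uniform(-np.pi, np.pi, size=(n_samples, n_qubits))  # one feature per qubit

        def bandwidth_kernel(X, c):
            diff = X[:, None, :] - X[None, :, :]          # pairwise feature differences
            return np.prod(np.cos(c * diff / 2.0) ** 2, axis=-1)

        for c in (1.0, 0.3, 0.1):
            K = bandwidth_kernel(X, c)
            offdiag = K[~np.eye(n_samples, dtype=bool)].mean()
            eigvals = np.linalg.eigvalsh(K)[::-1]         # descending eigenvalues
            top_frac = eigvals[0] / eigvals.sum()
            print(f"c = {c:.1f}: mean off-diagonal = {offdiag:.3f}, "
                  f"top eigenvalue fraction = {top_frac:.3f}")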