Generic predictions of output probability based on complexities of inputs and outputs
For a broad class of input-output maps, arguments based on the coding theorem
from algorithmic information theory (AIT) predict that simple (low Kolmogorov
complexity) outputs are exponentially more likely to occur upon uniform random
sampling of inputs than complex outputs are. Here, we derive probability bounds
that are based on the complexities of the inputs as well as the outputs, rather
than just on the complexities of the outputs. The more that outputs deviate
from the coding theorem bound, the lower the complexity of their inputs. Our
new bounds are tested for an RNA sequence to structure map, a finite state
transducer and a perceptron. These results open avenues for AIT to be more
widely used in physics.
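
As a self-contained illustration of the simplicity bias that the coding-theorem bound describes (using a toy cellular-automaton map and a crude Lempel-Ziv proxy for Kolmogorov complexity, neither of which is taken from this paper), one can sample inputs uniformly at random, estimate each output's probability, and check that the high-probability outputs are the low-complexity ones:

import random
import collections

def lz_phrases(s):
    # Crude Lempel-Ziv (1976)-style phrase count: a rough, computable
    # stand-in for the Kolmogorov complexity of a binary string.
    i, phrases, n = 0, 0, len(s)
    while i < n:
        l = 1
        while i + l <= n and s[i:i + l] in s[:i]:
            l += 1
        phrases += 1
        i += l
    return phrases

def ca_map(rule_bits, init_bits, steps=16):
    # Toy input-output map: the first 8 input bits select an elementary
    # cellular-automaton rule, the remaining bits give the initial row;
    # the output is the row after `steps` updates on a periodic lattice.
    rule = sum(b << k for k, b in enumerate(rule_bits))
    row = list(init_bits)
    n = len(row)
    for _ in range(steps):
        row = [(rule >> (4 * row[(j - 1) % n] + 2 * row[j] + row[(j + 1) % n])) & 1
               for j in range(n)]
    return ''.join(map(str, row))

random.seed(0)
counts = collections.Counter()
samples = 20_000
for _ in range(samples):
    bits = [random.randint(0, 1) for _ in range(24)]
    counts[ca_map(bits[:8], bits[8:])] += 1

# In the spirit of P(x) <~ 2^(-a K(x) + b): outputs that occur often should
# have low complexity, while complex outputs should be exponentially rare.
for out, c in counts.most_common(8):
    print(f"P = {c / samples:.4f}   LZ phrases = {lz_phrases(out):2d}   {out}")

The bound derived in the paper additionally conditions on the complexity of the inputs, which this sketch makes no attempt to reproduce.
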
Do deep neural networks have an inbuilt Occam's razor?
The remarkable performance of overparameterized deep neural networks (DNNs)
must arise from an interplay between network architecture, training algorithms,
and structure in the data. To disentangle these three components, we apply a
Bayesian picture, based on the functions expressed by a DNN, to supervised
learning. The prior over functions is determined by the network, and is varied
by exploiting a transition between ordered and chaotic regimes. For Boolean
function classification, we approximate the likelihood using the error spectrum
of functions on data. When combined with the prior, this accurately predicts
the posterior, measured for DNNs trained with stochastic gradient descent. This
analysis reveals that structured data, combined with an intrinsic Occam's
razor-like inductive bias towards (Kolmogorov) simple functions that is strong
enough to counteract the exponential growth of the number of functions with
complexity, is a key to the success of DNNs.
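
In symbols (a standard Bayes decomposition, written out here for orientation rather than quoted from the paper): with D the training data and f a candidate function,

P(f|D) = P(D|f) P(f) / Z,   Z = sum over f' of P(D|f') P(f'),

where P(f) is the prior over functions induced by randomly sampled network parameters and P(D|f) is approximated from the error spectrum of f on D. A prior that decays quickly enough with (Kolmogorov) complexity can dominate this sum even though the number of functions grows exponentially with complexity.
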
Is SGD a Bayesian sampler? Well, almost
Overparameterised deep neural networks (DNNs) are highly expressive and so
can, in principle, generate almost any function that fits a training dataset
with zero error. The vast majority of these functions will perform poorly on
unseen data, and yet in practice DNNs often generalise remarkably well. This
success suggests that a trained DNN must have a strong inductive bias towards
functions with low generalisation error. Here we empirically investigate this
inductive bias by calculating, for a range of architectures and datasets, the
probability P_SGD(f|S) that an overparameterised DNN, trained with
stochastic gradient descent (SGD) or one of its variants, converges on a
function f consistent with a training set S. We also use Gaussian processes
to estimate the Bayesian posterior probability P_B(f|S) that the DNN
expresses f upon random sampling of its parameters, conditioned on S.
Our main findings are that P_SGD(f|S) correlates remarkably well with P_B(f|S),
and that P_B(f|S) is strongly biased towards low-error and low complexity
functions. These results imply that strong inductive bias in the
parameter-function map (which determines P_B(f|S)), rather than a special
property of SGD, is the primary explanation for why DNNs generalise so well in
the overparameterised regime.
While our results suggest that the Bayesian posterior P_B(f|S) is the
first order determinant of P_SGD(f|S), there remain second order
differences that are sensitive to hyperparameter tuning. A function probability
picture, based on P_SGD(f|S) and/or P_B(f|S), can shed new light
on the way that variations in architecture or hyperparameter settings such as
batch size, learning rate, and optimiser choice, affect DNN performance.
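
As a concrete, much simplified illustration of what P_B(f|S) means (a brute-force rejection-sampling estimate on a tiny Boolean task with a small numpy network; the paper's own estimates use Gaussian processes, and the task, architecture, and sample sizes below are illustrative assumptions):

import collections
import numpy as np

rng = np.random.default_rng(0)

# Tiny Boolean task: n = 4 inputs, 16 possible points; condition on the first 8,
# identify a function f by its predictions on the 8 held-out points.
n = 4
X = np.array([[int(b) for b in format(i, f"0{n}b")] for i in range(2 ** n)], float)
y = np.where(X.sum(axis=1) >= 2, 1.0, -1.0)      # a simple target labelling
train, heldout = np.arange(8), np.arange(8, 16)

width = 32

def sample_params():
    # Random parameters of a one-hidden-layer tanh network (no training).
    return (rng.normal(0, 1, (n, width)) / np.sqrt(n),
            rng.normal(0, 1, width),
            rng.normal(0, 1, width) / np.sqrt(width))

def predict(params):
    W1, b1, w2 = params
    return np.sign(np.tanh(X @ W1 + b1) @ w2)

posterior = collections.Counter()
kept = 0
for _ in range(50_000):
    pred = predict(sample_params())
    if np.all(pred[train] == y[train]):          # keep only samples that fit S exactly
        posterior[tuple(pred[heldout].astype(int))] += 1
        kept += 1

print(f"accepted {kept} of 50000 parameter samples")
for f, c in posterior.most_common(5):
    print(f"P_B(f|S) ~ {c / max(kept, 1):.3f}   f on held-out inputs: {f}")

Estimating P_SGD(f|S) works the same way in principle, except that each sample is a full SGD training run rather than a single random draw of parameters.
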
Double-descent curves in neural networks: a new perspective using Gaussian processes
Double-descent curves in neural networks describe the phenomenon that the
generalisation error initially descends with increasing parameters, then grows
after reaching an optimal number of parameters which is less than the number of
data points, but then descends again in the overparameterised regime. Here we
use a neural network Gaussian process (NNGP) which maps exactly to a fully
connected network (FCN) in the infinite width limit, combined with techniques
from random matrix theory, to calculate this generalisation behaviour, with a
particular focus on the overparameterised regime. An advantage of our NNGP
approach is that the analytical calculations are easier to interpret. We argue
that the generalisation performance of neural networks improves in the
overparameterised regime precisely because that is where they converge to their
equivalent Gaussian process.
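
The double-descent shape itself is easy to reproduce with a far cruder finite-width model than the NNGP calculation described here: minimum-norm least-squares regression on random ReLU features, where the test error peaks when the number of features equals the number of training points and falls again in the overparameterised regime. The data model, feature map, and sizes below are illustrative choices, not the paper's setup:

import numpy as np

rng = np.random.default_rng(1)
d, n_train, n_test, noise = 10, 100, 2000, 0.1

def target(X):
    return np.sin(X @ np.ones(d) / np.sqrt(d))

X_tr = rng.normal(size=(n_train, d))
y_tr = target(X_tr) + noise * rng.normal(size=n_train)
X_te = rng.normal(size=(n_test, d))
y_te = target(X_te)

def test_mse(width):
    # Minimum-norm least squares on `width` random ReLU features; the
    # pseudo-inverse interpolates the training data once width >= n_train.
    W = rng.normal(size=(d, width)) / np.sqrt(d)
    F_tr, F_te = np.maximum(X_tr @ W, 0.0), np.maximum(X_te @ W, 0.0)
    coef = np.linalg.pinv(F_tr) @ y_tr
    return np.mean((F_te @ coef - y_te) ** 2)

# Test error descends, peaks near width = n_train, then descends again.
for width in [10, 50, 90, 100, 110, 150, 300, 1000, 5000]:
    print(f"width = {width:5d}   test MSE = {test_mse(width):.3f}")
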
Double-descent curves in neural networks: a new perspective using Gaussian processes
Double-descent curves in neural networks describe the phenomenon that the generalisation error initially descends with increasing parameters, then grows after reaching an optimal number of parameters which is less than the number of data points, but then descends again in the overparameterised regime. In this paper, we use techniques from random matrix theory to characterize the spectral distribution of the empirical feature covariance matrix as a width-dependent perturbation of the spectrum of the neural network Gaussian process (NNGP) kernel, thus establishing a novel connection between the NNGP literature and the random matrix theory literature in the context of neural networks. Our analytical expressions allow us to explore the generalisation behavior of the corresponding kernel and GP regression. Furthermore, they offer a new interpretation of double-descent in terms of the discrepancy between the width-dependent empirical kernel and the width-independent NNGP kernel.
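
A minimal numerical version of the comparison described here (with a single ReLU hidden layer, unit weight variance, and the first-order arc-cosine kernel of Cho and Saul as the infinite-width NNGP limit; all of these are illustrative assumptions) is to watch the empirical kernel (1/width) phi(X) phi(X)^T converge, eigenvalue by eigenvalue, to the width-independent NNGP kernel:

import numpy as np

rng = np.random.default_rng(2)
n, d = 50, 5
X = rng.normal(size=(n, d)) / np.sqrt(d)

def nngp_relu_kernel(X):
    # Infinite-width NNGP kernel of one ReLU hidden layer with unit weight variance:
    # E[relu(w.x) relu(w.x')] = |x||x'| (sin t + (pi - t) cos t) / (2 pi),
    # where cos t = x.x' / (|x||x'|).
    G = X @ X.T
    norms = np.sqrt(np.diag(G))
    cos = np.clip(G / np.outer(norms, norms), -1.0, 1.0)
    t = np.arccos(cos)
    return np.outer(norms, norms) * (np.sin(t) + (np.pi - t) * np.cos(t)) / (2 * np.pi)

def empirical_kernel(X, width):
    # Width-dependent kernel (1/width) phi(X) phi(X)^T for random ReLU features.
    W = rng.normal(size=(X.shape[1], width))
    F = np.maximum(X @ W, 0.0)
    return F @ F.T / width

K_inf = nngp_relu_kernel(X)
print("NNGP top eigenvalues:", np.round(np.linalg.eigvalsh(K_inf)[::-1][:3], 3))
for width in [10, 100, 1000, 10000]:
    K_w = empirical_kernel(X, width)
    top = np.round(np.linalg.eigvalsh(K_w)[::-1][:3], 3)
    print(f"width = {width:6d}   top eigenvalues {top}   "
          f"||K_w - K_NNGP|| = {np.linalg.norm(K_w - K_inf):.3f}")
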
Electronic transport and vibrational modes in the smallest molecular bridge: H2 in Pt nanocontacts
We present a state-of-the-art first-principles analysis of electronic
transport in a Pt nanocontact in the presence of H2, a system recently
reported by Smit et al. in Nature 419, 906 (2002). Our results indicate that at
the last stages of the breaking of the Pt nanocontact two basic forms of bridge
involving H can appear. Our claim is, in contrast to Smit et al.'s, that the
main conductance histogram peak at G approx 2e^2/h is not due to molecular H2,
but to a complex Pt2H2 where the H2 molecule dissociates. A first-principles
vibrational analysis that compares favorably with the experimental one also
supports our claim.
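
For orientation (a textbook Landauer relation rather than a detail of this paper's calculation): the conductance of a nanocontact is G = (2e^2/h) sum_n T_n, a sum over transmission eigenchannels, so a single spin-degenerate, fully open channel (T_n close to 1) already yields the quoted value G ~ 2e^2/h, the conductance quantum of roughly 77.5 microsiemens (about 1/12.9 kOhm). The disagreement described above is therefore about which atomic arrangement, molecular H2 or the dissociated Pt2H2 complex, supplies that single open channel.
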
Neural networks are a priori biased towards Boolean functions with low entropy
Understanding the inductive bias of neural networks is critical to explaining
their ability to generalise. Here, for one of the simplest neural networks -- a
single-layer perceptron with n input neurons, one output neuron, and no
threshold bias term -- we prove that upon random initialisation of weights, the
a priori probability P(t) that it represents a Boolean function that classifies
t points in {0,1}^n as 1 has a remarkably simple form: P(t) = 2^{-n} for 0\leq
t < 2^n.
Since a perceptron can express far fewer Boolean functions with small or
large values of t (low entropy) than with intermediate values of t (high
entropy), there is, on average, a strong intrinsic a priori bias towards
individual functions with low entropy. Furthermore, within a class of functions
with fixed t, we often observe a further intrinsic bias towards functions of
lower complexity. Finally, we prove that, regardless of the distribution of
inputs, the bias towards low entropy becomes monotonically stronger upon adding
ReLU layers, and empirically show that increasing the variance of the bias term
has a similar effect.
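
The P(t) = 2^{-n} statement is straightforward to check numerically for small n (a minimal sketch assuming Gaussian weight initialisation and the convention that an input is classified as 1 when its weighted sum is strictly positive; any continuous sign-symmetric weight distribution should give the same answer):

import numpy as np

rng = np.random.default_rng(3)
n = 5
# All 2^n Boolean inputs as rows of a (2^n, n) matrix.
X = np.array([[int(b) for b in format(i, f"0{n}b")] for i in range(2 ** n)], float)

samples = 200_000
W = rng.normal(size=(samples, n))      # random weights, no threshold bias term
t = ((W @ X.T) > 0).sum(axis=1)        # t = number of inputs classified as 1

hist = np.bincount(t, minlength=2 ** n) / samples
print("predicted P(t) = 2^-n =", 1 / 2 ** n)
print("empirical P(t), t = 0..7:", np.round(hist[:8], 4))
print("max |P(t) - 2^-n| over 0 <= t < 2^n:", np.abs(hist[: 2 ** n] - 1 / 2 ** n).max())
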
Fullerene-based molecular nanobridges: A first-principles study
Building upon traditional quantum chemistry calculations, we have implemented
an {\em ab-initio} method to study the electrical transport in nanocontacts. We
illustrate our technique by calculating the conductance of C_{60} molecules
connected in various ways to Al electrodes characterized at the atomic level.
Central to a correct estimate of the electrical current is a precise knowledge
of the local charge transfer between molecule and metal which, in turn,
guarantees the correct positioning of the Fermi level with respect to the
molecular orbitals. Contrary to our expectations, ballistic transport seems to
occur in this system.