Generic predictions of output probability based on complexities of inputs and outputs
For a broad class of input-output maps, arguments based on the coding theorem
from algorithmic information theory (AIT) predict that simple (low Kolmogorov
complexity) outputs are exponentially more likely to occur upon uniform random
sampling of inputs than complex outputs are. Here, we derive probability bounds
that are based on the complexities of the inputs as well as the outputs, rather
than just on the complexities of the outputs. The more that outputs deviate
from the coding theorem bound, the lower the complexity of their inputs. Our
new bounds are tested for an RNA sequence to structure map, a finite state
transducer and a perceptron. These results open avenues for AIT to be more
widely used in physics.
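
As a self-contained illustration of the simplicity bias that the coding-theorem bound describes (using a toy cellular-automaton map and a crude Lempel-Ziv proxy for Kolmogorov complexity, neither of which is taken from this paper), one can sample inputs uniformly at random, estimate each output's probability, and check that the high-probability outputs are the low-complexity ones:

import random
import collections

def lz_phrases(s):
    # Crude Lempel-Ziv (1976)-style phrase count: a rough, computable
    # stand-in for the Kolmogorov complexity of a binary string.
    i, phrases, n = 0, 0, len(s)
    while i < n:
        l = 1
        while i + l <= n and s[i:i + l] in s[:i]:
            l += 1
        phrases += 1
        i += l
    return phrases

def ca_map(rule_bits, init_bits, steps=16):
    # Toy input-output map: the first 8 input bits select an elementary
    # cellular-automaton rule, the remaining bits give the initial row;
    # the output is the row after `steps` updates on a periodic lattice.
    rule = sum(b << k for k, b in enumerate(rule_bits))
    row = list(init_bits)
    n = len(row)
    for _ in range(steps):
        row = [(rule >> (4 * row[(j - 1) % n] + 2 * row[j] + row[(j + 1) % n])) & 1
               for j in range(n)]
    return ''.join(map(str, row))

random.seed(0)
counts = collections.Counter()
samples = 20_000
for _ in range(samples):
    bits = [random.randint(0, 1) for _ in range(24)]
    counts[ca_map(bits[:8], bits[8:])] += 1

# In the spirit of P(x) <~ 2^(-a K(x) + b): outputs that occur often should
# have low complexity, while complex outputs should be exponentially rare.
for out, c in counts.most_common(8):
    print(f"P = {c / samples:.4f}   LZ phrases = {lz_phrases(out):2d}   {out}")

The bound derived in the paper additionally conditions on the complexity of the inputs, which this sketch makes no attempt to reproduce.
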
Do deep neural networks have an inbuilt Occam's razor?
The remarkable performance of overparameterized deep neural networks (DNNs)
must arise from an interplay between network architecture, training algorithms,
and structure in the data. To disentangle these three components, we apply a
Bayesian picture, based on the functions expressed by a DNN, to supervised
learning. The prior over functions is determined by the network, and is varied
by exploiting a transition between ordered and chaotic regimes. For Boolean
function classification, we approximate the likelihood using the error spectrum
of functions on data. When combined with the prior, this accurately predicts
the posterior, measured for DNNs trained with stochastic gradient descent. This
analysis reveals that structured data, combined with an intrinsic Occam's
razor-like inductive bias towards (Kolmogorov) simple functions that is strong
enough to counteract the exponential growth of the number of functions with
complexity, is a key to the success of DNNs.
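
In symbols (a standard Bayes decomposition, written out here for orientation rather than quoted from the paper): with D the training data and f a candidate function,

P(f|D) = P(D|f) P(f) / Z,   Z = sum over f' of P(D|f') P(f'),

where P(f) is the prior over functions induced by randomly sampled network parameters and P(D|f) is approximated from the error spectrum of f on D. A prior that decays quickly enough with (Kolmogorov) complexity can dominate this sum even though the number of functions grows exponentially with complexity.
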
Is SGD a Bayesian sampler? Well, almost
Overparameterised deep neural networks (DNNs) are highly expressive and so
can, in principle, generate almost any function that fits a training dataset
with zero error. The vast majority of these functions will perform poorly on
unseen data, and yet in practice DNNs often generalise remarkably well. This
success suggests that a trained DNN must have a strong inductive bias towards
functions with low generalisation error. Here we empirically investigate this
inductive bias by calculating, for a range of architectures and datasets, the
probability P_SGD(f|S) that an overparameterised DNN, trained with
stochastic gradient descent (SGD) or one of its variants, converges on a
function f consistent with a training set S. We also use Gaussian processes
to estimate the Bayesian posterior probability P_B(f|S) that the DNN
expresses f upon random sampling of its parameters, conditioned on S.
Our main findings are that P_SGD(f|S) correlates remarkably well with P_B(f|S),
and that P_B(f|S) is strongly biased towards low-error and low complexity
functions. These results imply that strong inductive bias in the
parameter-function map (which determines P_B(f|S)), rather than a special
property of SGD, is the primary explanation for why DNNs generalise so well in
the overparameterised regime.
While our results suggest that the Bayesian posterior P_B(f|S) is the
first order determinant of P_SGD(f|S), there remain second order
differences that are sensitive to hyperparameter tuning. A function probability
picture, based on P_SGD(f|S) and/or P_B(f|S), can shed new light
on the way that variations in architecture or hyperparameter settings such as
batch size, learning rate, and optimiser choice, affect DNN performance.
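
As a concrete, much simplified illustration of what P_B(f|S) means (a brute-force rejection-sampling estimate on a tiny Boolean task with a small numpy network; the paper's own estimates use Gaussian processes, and the task, architecture, and sample sizes below are illustrative assumptions):

import collections
import numpy as np

rng = np.random.default_rng(0)

# Tiny Boolean task: n = 4 inputs, 16 possible points; condition on the first 8,
# identify a function f by its predictions on the 8 held-out points.
n = 4
X = np.array([[int(b) for b in format(i, f"0{n}b")] for i in range(2 ** n)], float)
y = np.where(X.sum(axis=1) >= 2, 1.0, -1.0)      # a simple target labelling
train, heldout = np.arange(8), np.arange(8, 16)

width = 32

def sample_params():
    # Random parameters of a one-hidden-layer tanh network (no training).
    return (rng.normal(0, 1, (n, width)) / np.sqrt(n),
            rng.normal(0, 1, width),
            rng.normal(0, 1, width) / np.sqrt(width))

def predict(params):
    W1, b1, w2 = params
    return np.sign(np.tanh(X @ W1 + b1) @ w2)

posterior = collections.Counter()
kept = 0
for _ in range(50_000):
    pred = predict(sample_params())
    if np.all(pred[train] == y[train]):          # keep only samples that fit S exactly
        posterior[tuple(pred[heldout].astype(int))] += 1
        kept += 1

print(f"accepted {kept} of 50000 parameter samples")
for f, c in posterior.most_common(5):
    print(f"P_B(f|S) ~ {c / max(kept, 1):.3f}   f on held-out inputs: {f}")

Estimating P_SGD(f|S) works the same way in principle, except that each sample is a full SGD training run rather than a single random draw of parameters.
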
Double-descent curves in neural networks: a new perspective using Gaussian processes
Double-descent curves in neural networks describe the phenomenon that the
generalisation error initially descends with increasing parameters, then grows
after reaching an optimal number of parameters which is less than the number of
data points, but then descends again in the overparameterised regime. Here we
use a neural network Gaussian process (NNGP) which maps exactly to a fully
connected network (FCN) in the infinite width limit, combined with techniques
from random matrix theory, to calculate this generalisation behaviour, with a
particular focus on the overparameterised regime. An advantage of our NNGP
approach is that the analytical calculations are easier to interpret. We argue
that the generalisation performance of neural networks improves in the
overparameterised regime precisely because that is where they converge to their
equivalent Gaussian process.
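
The double-descent shape itself is easy to reproduce with a far cruder finite-width model than the NNGP calculation described here: minimum-norm least-squares regression on random ReLU features, where the test error peaks when the number of features equals the number of training points and falls again in the overparameterised regime. The data model, feature map, and sizes below are illustrative choices, not the paper's setup:

import numpy as np

rng = np.random.default_rng(1)
d, n_train, n_test, noise = 10, 100, 2000, 0.1

def target(X):
    return np.sin(X @ np.ones(d) / np.sqrt(d))

X_tr = rng.normal(size=(n_train, d))
y_tr = target(X_tr) + noise * rng.normal(size=n_train)
X_te = rng.normal(size=(n_test, d))
y_te = target(X_te)

def test_mse(width):
    # Minimum-norm least squares on `width` random ReLU features; the
    # pseudo-inverse interpolates the training data once width >= n_train.
    W = rng.normal(size=(d, width)) / np.sqrt(d)
    F_tr, F_te = np.maximum(X_tr @ W, 0.0), np.maximum(X_te @ W, 0.0)
    coef = np.linalg.pinv(F_tr) @ y_tr
    return np.mean((F_te @ coef - y_te) ** 2)

# Test error descends, peaks near width = n_train, then descends again.
for width in [10, 50, 90, 100, 110, 150, 300, 1000, 5000]:
    print(f"width = {width:5d}   test MSE = {test_mse(width):.3f}")
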
Double-descent curves in neural networks: a new perspective using Gaussian processes
Double-descent curves in neural networks describe the phenomenon that the generalisation error initially descends with increasing parameters, then grows after reaching an optimal number of parameters which is less than the number of data points, but then descends again in the overparameterised regime. In this paper, we use techniques from random matrix theory to characterize the spectral distribution of the empirical feature covariance matrix as a width-dependent perturbation of the spectrum of the neural network Gaussian process (NNGP) kernel, thus establishing a novel connection between the NNGP literature and the random matrix theory literature in the context of neural networks. Our analytical expressions allow us to explore the generalisation behavior of the corresponding kernel and GP regression. Furthermore, they offer a new interpretation of double-descent in terms of the discrepancy between the width-dependent empirical kernel and the width-independent NNGP kernel.
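
A minimal numerical version of the comparison described here (with a single ReLU hidden layer, unit weight variance, and the first-order arc-cosine kernel of Cho and Saul as the infinite-width NNGP limit; all of these are illustrative assumptions) is to watch the empirical kernel (1/width) phi(X) phi(X)^T converge, eigenvalue by eigenvalue, to the width-independent NNGP kernel:

import numpy as np

rng = np.random.default_rng(2)
n, d = 50, 5
X = rng.normal(size=(n, d)) / np.sqrt(d)

def nngp_relu_kernel(X):
    # Infinite-width NNGP kernel of one ReLU hidden layer with unit weight variance:
    # E[relu(w.x) relu(w.x')] = |x||x'| (sin t + (pi - t) cos t) / (2 pi),
    # where cos t = x.x' / (|x||x'|).
    G = X @ X.T
    norms = np.sqrt(np.diag(G))
    cos = np.clip(G / np.outer(norms, norms), -1.0, 1.0)
    t = np.arccos(cos)
    return np.outer(norms, norms) * (np.sin(t) + (np.pi - t) * np.cos(t)) / (2 * np.pi)

def empirical_kernel(X, width):
    # Width-dependent kernel (1/width) phi(X) phi(X)^T for random ReLU features.
    W = rng.normal(size=(X.shape[1], width))
    F = np.maximum(X @ W, 0.0)
    return F @ F.T / width

K_inf = nngp_relu_kernel(X)
print("NNGP top eigenvalues:", np.round(np.linalg.eigvalsh(K_inf)[::-1][:3], 3))
for width in [10, 100, 1000, 10000]:
    K_w = empirical_kernel(X, width)
    top = np.round(np.linalg.eigvalsh(K_w)[::-1][:3], 3)
    print(f"width = {width:6d}   top eigenvalues {top}   "
          f"||K_w - K_NNGP|| = {np.linalg.norm(K_w - K_inf):.3f}")
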
Electronic transport and vibrational modes in the smallest molecular bridge: H2 in Pt nanocontacts
We present a state-of-the-art first-principles analysis of electronic
transport in a Pt nanocontact in the presence of H2, a system recently
reported by Smit et al. in Nature 419, 906 (2002). Our results indicate that at
the last stages of the breaking of the Pt nanocontact two basic forms of bridge
involving H can appear. Our claim is, in contrast to Smit et al.'s, that the
main conductance histogram peak at G approx 2e^2/h is not due to molecular H2,
but to a complex Pt2H2 where the H2 molecule dissociates. A first-principles
vibrational analysis that compares favorably with the experimental one also
supports our claim.
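
For orientation (a textbook Landauer relation rather than a detail of this paper's calculation): the conductance of a nanocontact is G = (2e^2/h) sum_n T_n, a sum over transmission eigenchannels, so a single spin-degenerate, fully open channel (T_n close to 1) already yields the quoted value G ~ 2e^2/h, the conductance quantum of roughly 77.5 microsiemens (about 1/12.9 kOhm). The disagreement described above is therefore about which atomic arrangement, molecular H2 or the dissociated Pt2H2 complex, supplies that single open channel.
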
Neural networks are a priori biased towards Boolean functions with low entropy
Understanding the inductive bias of neural networks is critical to explaining
their ability to generalise. Here, for one of the simplest neural networks -- a
single-layer perceptron with n input neurons, one output neuron, and no
threshold bias term -- we prove that upon random initialisation of weights, the
a priori probability P(t) that it represents a Boolean function that classifies
t points in {0,1}^n as 1 has a remarkably simple form: P(t) = 2^{-n} for 0\leq
t < 2^n.
Since a perceptron can express far fewer Boolean functions with small or
large values of t (low entropy) than with intermediate values of t (high
entropy), there is, on average, a strong intrinsic a priori bias towards
individual functions with low entropy. Furthermore, within a class of functions
with fixed t, we often observe a further intrinsic bias towards functions of
lower complexity. Finally, we prove that, regardless of the distribution of
inputs, the bias towards low entropy becomes monotonically stronger upon adding
ReLU layers, and empirically show that increasing the variance of the bias term
has a similar effect.
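
The P(t) = 2^{-n} statement is straightforward to check numerically for small n (a minimal sketch assuming Gaussian weight initialisation and the convention that an input is classified as 1 when its weighted sum is strictly positive; any continuous sign-symmetric weight distribution should give the same answer):

import numpy as np

rng = np.random.default_rng(3)
n = 5
# All 2^n Boolean inputs as rows of a (2^n, n) matrix.
X = np.array([[int(b) for b in format(i, f"0{n}b")] for i in range(2 ** n)], float)

samples = 200_000
W = rng.normal(size=(samples, n))      # random weights, no threshold bias term
t = ((W @ X.T) > 0).sum(axis=1)        # t = number of inputs classified as 1

hist = np.bincount(t, minlength=2 ** n) / samples
print("predicted P(t) = 2^-n =", 1 / 2 ** n)
print("empirical P(t), t = 0..7:", np.round(hist[:8], 4))
print("max |P(t) - 2^-n| over 0 <= t < 2^n:", np.abs(hist[: 2 ** n] - 1 / 2 ** n).max())
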
Fullerene-based molecular nanobridges: A first-principles study
Building upon traditional quantum chemistry calculations, we have implemented
an {\em ab-initio} method to study the electrical transport in nanocontacts. We
illustrate our technique by calculating the conductance of C_{60} molecules
connected in various ways to Al electrodes characterized at the atomic level.
Central to a correct estimate of the electrical current is a precise knowledge
of the local charge transfer between molecule and metal which, in turn,
guarantees the correct positioning of the Fermi level with respect to the
molecular orbitals. Contrary to our expectations, ballistic transport seems to
occur in this system.