Double-descent curves in neural networks: a new perspective using Gaussian processes
Double-descent curves in neural networks describe the phenomenon whereby the
generalisation error first descends as the number of parameters increases, then grows
after an optimal number of parameters, smaller than the number of
data points, is reached, and then descends again in the overparameterised regime. Here we
use a neural network Gaussian process (NNGP) which maps exactly to a fully
connected network (FCN) in the infinite width limit, combined with techniques
from random matrix theory, to calculate this generalisation behaviour, with a
particular focus on the overparameterised regime. An advantage of our NNGP
approach is that the analytical calculations are easier to interpret. We argue
that the generalisation performance of neural networks improves in the
overparameterised regime precisely because that is where they converge to their
equivalent Gaussian process.
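The NNGP correspondence used here can be made concrete with the standard layer-wise kernel recursion for a fully connected ReLU network (the arc-cosine kernel). The sketch below is illustrative, not the authors' code; the weight/bias variance parameters `sw2` and `sb2` and the input scaling are our assumptions.

```python
import math

def nngp_relu_kernel(x1, x2, depth, sw2=2.0, sb2=0.0):
    """NNGP kernel K(x1, x2) of a depth-`depth` fully connected ReLU
    network in the infinite-width limit (a sketch).  Assumes i.i.d.
    Gaussian weights with variance sw2/width and biases with variance sb2."""
    d = len(x1)
    # Input-layer kernel entries.
    k11 = sb2 + sw2 * sum(a * a for a in x1) / d
    k22 = sb2 + sw2 * sum(a * a for a in x2) / d
    k12 = sb2 + sw2 * sum(a * b for a, b in zip(x1, x2)) / d
    for _ in range(depth):
        # Arc-cosine kernel recursion for the ReLU nonlinearity.
        c = max(-1.0, min(1.0, k12 / math.sqrt(k11 * k22)))
        theta = math.acos(c)
        k12 = sb2 + sw2 * math.sqrt(k11 * k22) * (
            math.sin(theta) + (math.pi - theta) * c) / (2 * math.pi)
        k11 = sb2 + sw2 * k11 / 2.0
        k22 = sb2 + sw2 * k22 / 2.0
    return k12
```

With sw2 = 2 and sb2 = 0 (the critical initialisation for ReLU), the diagonal K(x, x) is preserved through depth, so the recursion stays well conditioned.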
Do deep neural networks have an inbuilt Occam's razor?
The remarkable performance of overparameterized deep neural networks (DNNs)
must arise from an interplay between network architecture, training algorithms,
and structure in the data. To disentangle these three components, we apply a
Bayesian picture, based on the functions expressed by a DNN, to supervised
learning. The prior over functions is determined by the network, and is varied
by exploiting a transition between ordered and chaotic regimes. For Boolean
function classification, we approximate the likelihood using the error spectrum
of functions on data. When combined with the prior, this accurately predicts
the posterior, measured for DNNs trained with stochastic gradient descent. This
analysis reveals that structured data, combined with an intrinsic Occam's
razor-like inductive bias towards (Kolmogorov) simple functions that is strong
enough to counteract the exponential growth of the number of functions with
complexity, is a key to the success of DNNs.
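The Bayesian picture described above, in which a prior over functions combined with a likelihood on the data predicts the posterior, can be illustrated with a toy calculation. Everything here is a hypothetical stand-in: the paper's prior comes from the DNN itself, whereas we use a crude simplicity-biased prior, and the likelihood temperature is arbitrary.

```python
import itertools
import math

# Toy Bayesian posterior over all 2^4 = 16 Boolean functions of 2 inputs.
inputs = list(itertools.product([0, 1], repeat=2))
train = {(0, 0): 0, (1, 1): 1}          # a tiny training set S (our example)

def complexity(f):
    # Crude complexity proxy (an assumption, not Kolmogorov complexity):
    # number of output flips when inputs are listed in lexicographic order.
    return sum(f[i] != f[i + 1] for i in range(len(f) - 1))

posterior = {}
for f in itertools.product([0, 1], repeat=len(inputs)):
    prior = 2.0 ** (-complexity(f))        # bias towards simple functions
    errors = sum(f[inputs.index(x)] != y for x, y in train.items())
    likelihood = math.exp(-10.0 * errors)  # sharply penalise training error
    posterior[f] = prior * likelihood
z = sum(posterior.values())
posterior = {f: p / z for f, p in posterior.items()}
```

Functions consistent with the training set dominate the posterior, and among those the simplicity-biased prior favours the low-complexity ones, which is the mechanism the abstract describes.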
Is SGD a Bayesian sampler? Well, almost
Overparameterised deep neural networks (DNNs) are highly expressive and so
can, in principle, generate almost any function that fits a training dataset
with zero error. The vast majority of these functions will perform poorly on
unseen data, and yet in practice DNNs often generalise remarkably well. This
success suggests that a trained DNN must have a strong inductive bias towards
functions with low generalisation error. Here we empirically investigate this
inductive bias by calculating, for a range of architectures and datasets, the
probability P_SGD(f|S) that an overparameterised DNN, trained with
stochastic gradient descent (SGD) or one of its variants, converges on a
function f consistent with a training set S. We also use Gaussian processes
to estimate the Bayesian posterior probability P_B(f|S) that the DNN
expresses f upon random sampling of its parameters, conditioned on S.
Our main findings are that P_SGD(f|S) correlates remarkably well with P_B(f|S)
and that P_B(f|S) is strongly biased towards low-error and
low-complexity functions. These results imply that a strong inductive bias in the
parameter-function map (which determines P_B(f|S)), rather than a special
property of SGD, is the primary explanation for why DNNs generalise so well in
the overparameterised regime.
While our results suggest that the Bayesian posterior P_B(f|S) is the
first-order determinant of P_SGD(f|S), there remain second-order
differences that are sensitive to hyperparameter tuning. A function-probability
picture, based on P_SGD(f|S) and/or P_B(f|S), can shed new light
on how variations in architecture or in hyperparameter settings such as
batch size, learning rate, and optimiser choice affect DNN performance.
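Estimating a Bayesian posterior by random sampling of parameters can be sketched with rejection sampling on a toy perceptron. This is our own illustration, not the paper's method (which uses Gaussian processes and full-size DNNs): draw random parameters, and keep the induced Boolean function only if it is consistent with the training set.

```python
import random
from collections import Counter

def sample_posterior(train, n=3, samples=20000, seed=0):
    """Estimate the Bayesian posterior over Boolean functions for a toy
    perceptron with Gaussian-initialised weights and bias, by rejection
    sampling against the training set `train` (a sketch)."""
    rng = random.Random(seed)
    inputs = [tuple((i >> j) & 1 for j in range(n)) for i in range(2 ** n)]
    counts = Counter()
    for _ in range(samples):
        w = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = rng.gauss(0.0, 1.0)
        # Identify the sampled function f with its truth table over all inputs.
        f = tuple(int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)
                  for x in inputs)
        if all(f[inputs.index(x)] == y for x, y in train):
            counts[f] += 1
    total = sum(counts.values())
    return {f: c / total for f, c in counts.items()}
```

Functions consistent with the data that occupy a larger volume of parameter space receive higher posterior mass, which is exactly the inductive bias of the parameter-function map discussed above.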
Electronic transport and vibrational modes in the smallest molecular bridge: H2 in Pt nanocontacts
We present a state-of-the-art first-principles analysis of electronic
transport in a Pt nanocontact in the presence of H2, as recently reported by
Smit et al. in Nature 419, 906 (2002). Our results indicate that at
the last stages of the breaking of the Pt nanocontact two basic forms of bridge
involving H can appear. In contrast to Smit et al., we claim that the
main conductance histogram peak at G ≈ 2e^2/h is due not to molecular H2,
but to a Pt2H2 complex in which the H2 molecule dissociates. A first-principles
vibrational analysis that compares favorably with the experimental one also
supports our claim.
Comment: 5 pages, 3 figures
Neural networks are a priori biased towards Boolean functions with low entropy
Understanding the inductive bias of neural networks is critical to explaining
their ability to generalise. Here, for one of the simplest neural networks -- a
single-layer perceptron with n input neurons, one output neuron, and no
threshold bias term -- we prove that upon random initialisation of weights, the
a priori probability P(t) that it represents a Boolean function that classifies
t points in {0,1}^n as 1 has a remarkably simple form: P(t) = 2^{-n} for
0 ≤ t < 2^n.
Since a perceptron can express far fewer Boolean functions with small or
large values of t (low entropy) than with intermediate values of t (high
entropy), there is, on average, a strong intrinsic a priori bias towards
individual functions with low entropy. Furthermore, within a class of functions
with fixed t, we often observe a further intrinsic bias towards functions of
lower complexity. Finally, we prove that, regardless of the distribution of
inputs, the bias towards low entropy becomes monotonically stronger upon adding
ReLU layers, and empirically show that increasing the variance of the bias term
has a similar effect.
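The claim that P(t) = 2^{-n} can be checked empirically for small n by sampling random weights for a bias-free perceptron and counting how many inputs it classifies as 1. Gaussian initialisation and the sample sizes below are our assumptions for the sketch.

```python
import random
from collections import Counter

def entropy_bias_check(n=3, samples=40000, seed=1):
    """Empirically estimate P(t): the probability that a randomly
    initialised bias-free perceptron on {0,1}^n classifies exactly
    t inputs as 1.  Theory predicts P(t) = 2^{-n} for 0 <= t < 2^n."""
    rng = random.Random(seed)
    inputs = [tuple((i >> j) & 1 for j in range(n)) for i in range(2 ** n)]
    counts = Counter()
    for _ in range(samples):
        w = [rng.gauss(0.0, 1.0) for _ in range(n)]
        t = sum(int(sum(wi * xi for wi, xi in zip(w, x)) > 0)
                for x in inputs)
        counts[t] += 1
    return {t: c / samples for t, c in counts.items()}
```

Although the distribution over t is uniform, far more distinct functions share an intermediate t than an extreme one, so equal mass per t implies a strong per-function bias towards the low-entropy functions.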
Fullerene-based molecular nanobridges: A first-principles study
Building upon traditional quantum chemistry calculations, we have implemented
an {\em ab-initio} method to study the electrical transport in nanocontacts. We
illustrate our technique by calculating the conductance of C60 molecules
connected in various ways to Al electrodes characterized at the atomic level.
Central to a correct estimate of the electrical current is a precise knowledge
of the local charge transfer between molecule and metal which, in turn,
guarantees the correct positioning of the Fermi level with respect to the
molecular orbitals. Contrary to our expectations, ballistic transport seems to
occur in this system.
Comment: 4 pages in two-column format
Deep Learning versus Classical Regression for Brain Tumor Patient Survival Prediction
Deep learning for regression tasks on medical imaging data has shown
promising results. However, compared to other approaches, its performance is
strongly tied to the dataset size. In this study, we evaluate
3D-convolutional neural networks (CNNs) and classical regression methods with
hand-crafted features for survival time regression of patients with high grade
brain tumors. The tested CNNs for regression showed promising but unstable
results. The best performing deep learning approach reached an accuracy of
51.5% on held-out samples of the training set. All tested deep learning
experiments were outperformed by a Support Vector Classifier (SVC) using 30
radiomic features. The investigated features included intensity, shape,
location and deep features. The submitted method to the BraTS 2018 survival
prediction challenge is an ensemble of SVCs, which reached a cross-validated
accuracy of 72.2% on the BraTS 2018 training set, 57.1% on the validation set,
and 42.9% on the testing set. The results suggest that more training data is
necessary for a stable performance of a CNN model for direct regression from
magnetic resonance images, and that non-imaging clinical patient information is
crucial along with imaging information.
Comment: Contribution to The International Multimodal Brain Tumor Segmentation
(BraTS) Challenge 2018, survival prediction task
First-principles phase-coherent transport in metallic nanotubes with realistic contacts
We present first-principles calculations of phase coherent electron transport
in a carbon nanotube (CNT) with realistic contacts. We focus on the zero-bias
response of open metallic CNTs considering two archetypal contact geometries
(end and side) and three commonly used metals as electrodes (Al, Au, and Ti).
Our ab-initio electrical transport calculations make, for the first time,
quantitative predictions on the contact transparency and the transport
properties of finite metallic CNTs. Al and Au turn out to make poor contacts,
while Ti is the best option of the three. Additional information on the CNT
band mixing at the contacts is also obtained.
Comment: 5 pages (two-column format)