Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods
Training neural networks is a challenging non-convex optimization problem,
and backpropagation or gradient descent can get stuck in spurious local optima.
We propose a novel algorithm based on tensor decomposition for guaranteed
training of two-layer neural networks. We provide risk bounds for our proposed
method, with a polynomial sample complexity in the relevant parameters, such as
input dimension and number of neurons. While learning arbitrary target
functions is NP-hard, we provide transparent conditions on the function and the
input for learnability. Our training method is based on tensor decomposition,
which provably converges to the global optimum, under a set of mild
non-degeneracy conditions. It consists of simple embarrassingly parallel linear
and multi-linear operations, and is competitive with standard stochastic
gradient descent (SGD), in terms of computational complexity. Thus, we propose
a computationally efficient method with guaranteed risk bounds for training
neural networks with one hidden layer. Comment: The tensor decomposition analysis is expanded, and the analysis of
ridge regression is added for recovering the parameters of the last layer of the
neural network.
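To give a concrete feel for the core decomposition step, the following is a minimal Python/NumPy sketch of tensor power iteration with deflation on a synthetic symmetric rank-3 tensor. In the method described above, the decomposed tensor would instead be an estimated cross-moment between the labels and score functions of the input; the dimensions, orthonormal components, and weights below are illustrative assumptions, not the paper's setup.

import numpy as np

# Minimal sketch (not the full training pipeline): recover the rank-one
# components of a symmetric 3rd-order tensor T = sum_j w_j * a_j (x) a_j (x) a_j
# by tensor power iteration with deflation.

rng = np.random.default_rng(0)
d, k = 8, 3                                         # input dimension, number of neurons
A = np.linalg.qr(rng.standard_normal((d, k)))[0]    # orthonormal components
w = np.array([3.0, 2.0, 1.0])                       # positive weights
T = np.einsum('j,ij,kj,lj->ikl', w, A, A, A)        # synthetic symmetric tensor

def power_iteration(T, n_starts=10, n_iters=100):
    best_lam, best_v = -np.inf, None
    for _ in range(n_starts):
        v = rng.standard_normal(T.shape[0])
        v /= np.linalg.norm(v)
        for _ in range(n_iters):
            v = np.einsum('ikl,k,l->i', T, v, v)    # apply T(I, v, v)
            v /= np.linalg.norm(v)
        lam = np.einsum('ikl,i,k,l->', T, v, v, v)
        if lam > best_lam:
            best_lam, best_v = lam, v
    return best_lam, best_v

recovered = []
for _ in range(k):
    lam, v = power_iteration(T)
    recovered.append((lam, v))
    T = T - lam * np.einsum('i,k,l->ikl', v, v, v)  # deflate the found component

for lam, v in recovered:
    # each recovered v should align (up to sign) with a column of A
    print(round(lam, 3), round(float(np.max(np.abs(A.T @ v))), 3))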
Learning Two-layer Neural Networks with Symmetric Inputs
We give a new algorithm for learning a two-layer neural network under a
general class of input distributions. Assuming there is a ground-truth
two-layer network $y = A\,\sigma(Wx) + \xi$, where $A, W$ are weight
matrices, $\xi$ represents noise, and the number of neurons in the hidden layer
is no larger than the input or output, our algorithm is guaranteed to recover
the parameters of the ground-truth network. The only requirement on the
input is that it is symmetric, which still allows highly complicated and
structured input.
Our algorithm is based on the method-of-moments framework and extends several
results in tensor decompositions. We use spectral algorithms to avoid the
complicated non-convex optimization in learning neural networks. Experiments
show that our algorithm can robustly learn the ground-truth neural network with
a small number of samples for many symmetric input distributions.
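The sketch below conveys the method-of-moments flavour in a deliberately simplified setting: it assumes Gaussian inputs and uses Stein's identity, so the subspace spanned by the hidden-layer weights can be read off an SVD of an empirical second-order cross-moment. The paper's actual algorithm handles general symmetric inputs and recovers the individual parameters rather than just their span; all names and sizes here are illustrative.

import numpy as np

rng = np.random.default_rng(1)
d, k, n = 10, 3, 200_000
W = rng.standard_normal((k, d))                  # hidden-layer weights (rows span a k-dim subspace)
a = rng.uniform(0.5, 1.5, size=k)                # output weights

X = rng.standard_normal((n, d))                  # symmetric input (here: Gaussian, so Stein's identity applies)
y = np.maximum(X @ W.T, 0) @ a + 0.01 * rng.standard_normal(n)   # y = a^T ReLU(Wx) + noise

# empirical second-order cross-moment  M2 ~ E[y (x x^T - I)]
M2 = (X * y[:, None]).T @ X / n - y.mean() * np.eye(d)

U, S, _ = np.linalg.svd(M2)
est_span = U[:, :k]                              # estimated span of the rows of W

# cosines of the principal angles between true and estimated subspaces (close to 1)
Q, _ = np.linalg.qr(W.T)
print(np.round(np.linalg.svd(Q.T @ est_span, compute_uv=False), 3))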
Parametrised polyconvex hyperelasticity with physics-augmented neural networks
In the present work, neural networks are applied to formulate parametrised
hyperelastic constitutive models. The models fulfill all common mechanical
conditions of hyperelasticity by construction. In particular, partially
input-convex neural network (pICNN) architectures are applied based on
feed-forward neural networks. Receiving two different sets of input arguments,
pICNNs are convex in one of them, while for the other, they represent arbitrary
relationships which are not necessarily convex. In this way, the model can
fulfill convexity conditions stemming from mechanical considerations without
being too restrictive on the functional relationship in additional parameters,
which may not necessarily be convex. Two different models are introduced, where
one can represent arbitrary functional relationships in the additional
parameters, while the other is monotonic in the additional parameters. As a
first proof of concept, the model is calibrated to data generated with two
differently parametrised analytical potentials, whereby three different pICNN
architectures are investigated. In all cases, the proposed model shows
excellent performance.
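As a rough sketch of what "partially input-convex" means in code, the toy forward pass below is convex in the input x for every value of the additional parameters p: the path acting on intermediate quantities uses only non-negative weights and convex, non-decreasing activations, while p enters through an unconstrained path. Layer sizes, the gating, and the softplus activation are illustrative choices, not the paper's architecture.

import numpy as np

def softplus(z):                          # convex and non-decreasing
    return np.logaddexp(0.0, z)

rng = np.random.default_rng(2)
dx, dp, h = 4, 2, 16                      # dims of convex input, extra parameters, hidden width

params = {
    # context path on p: arbitrary, unconstrained
    "Wu": rng.standard_normal((h, dp)), "bu": np.zeros(h),
    # convex path on x: weights acting on convex intermediates must be non-negative
    "Wx0": rng.standard_normal((h, dx)), "b0": np.zeros(h),
    "Wz1": np.abs(rng.standard_normal((1, h))),            # non-negative
    "Wx1": rng.standard_normal((1, dx)), "b1": np.zeros(1),
    # p modulates the convex path only through a non-negative gate
    "Wup": rng.standard_normal((h, h)),
}

def picnn(x, p, q=params):
    u = np.tanh(q["Wu"] @ p + q["bu"])             # arbitrary (non-convex) in p
    z = softplus(q["Wx0"] @ x + q["b0"])           # convex in x
    gate = softplus(q["Wup"] @ u)                  # non-negative, depends only on p
    z = z * gate                                   # non-negative scaling preserves convexity in x
    out = q["Wz1"] @ z + q["Wx1"] @ x + q["b1"]    # non-negative weights on z plus an affine term in x
    return out[0]

x = rng.standard_normal(dx)
p = rng.standard_normal(dp)
print(picnn(x, p))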
Principled Weight Initialisation for Input-Convex Neural Networks
Input-Convex Neural Networks (ICNNs) are networks that guarantee convexity in
their input-output mapping. These networks have been successfully applied for
energy-based modelling, optimal transport problems and learning invariances.
The convexity of ICNNs is achieved by using non-decreasing convex activation
functions and non-negative weights. Because of these peculiarities, previous
initialisation strategies, which implicitly assume centred weights, are not
effective for ICNNs. By studying signal propagation through layers with
non-negative weights, we are able to derive a principled weight initialisation
for ICNNs. Concretely, we generalise signal propagation theory by removing the
assumption that weights are sampled from a centred distribution. In a set of
experiments, we demonstrate that our principled initialisation effectively
accelerates learning in ICNNs and leads to better generalisation. Moreover, we
find that, in contrast to common belief, ICNNs can be trained without
skip-connections when initialised correctly. Finally, we apply ICNNs to a
real-world drug discovery task and show that they allow for more effective
molecular latent space exploration. Comment: Presented at NeurIPS 202
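The effect that motivates the derivation can be seen in a few lines of NumPy: with non-negative (folded-Gaussian) weights, the usual fan-in scaling that assumes centred weights lets the activation scale grow geometrically with depth, whereas rescaling so that the weight-mean contribution stays of order one keeps it stable. The distributions and scalings below are illustrative and do not reproduce the specific initialisation derived in the paper.

import numpy as np

rng = np.random.default_rng(3)
width, depth, batch = 256, 12, 512

def activation_means(scale):
    x = np.abs(rng.standard_normal((batch, width)))              # non-negative input
    means = []
    for _ in range(depth):
        W = np.abs(rng.standard_normal((width, width))) * scale  # non-negative weights
        x = np.maximum(x @ W, 0)                                 # convex, non-decreasing activation
        means.append(float(x.mean()))
    return means

naive = activation_means(np.sqrt(2.0 / width))               # usual scaling, assumes centred weights
adjusted = activation_means(np.sqrt(np.pi / 2.0) / width)    # keeps the weight-mean term at O(1)

print("naive   :", [f"{m:.1e}" for m in naive])      # activation scale grows geometrically with depth
print("adjusted:", [f"{m:.1e}" for m in adjusted])   # stays near the input scale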
Stochastic Training of Neural Networks via Successive Convex Approximations
This paper proposes a new family of algorithms for training neural networks
(NNs). These are based on recent developments in the field of non-convex
optimization, going under the general name of successive convex approximation
(SCA) techniques. The basic idea is to iteratively replace the original
(non-convex, high-dimensional) learning problem with a sequence of (strongly
convex) approximations, which are both accurate and simple to optimize.
Differently from similar ideas (e.g., quasi-Newton algorithms), the
approximations can be constructed using only first-order information of the
neural network function, in a stochastic fashion, while exploiting the overall
structure of the learning problem for a faster convergence. We discuss several
use cases, based on different choices for the loss function (e.g., squared loss
and cross-entropy loss), and for the regularization of the NN's weights. We
experiment on several medium-sized benchmark problems, and on a large-scale
dataset involving simulated physical data. The results show how the algorithm
outperforms state-of-the-art techniques, providing faster convergence to a
better minimum. Additionally, we show how the algorithm can be easily
parallelized over multiple computational units without hindering its
performance. In particular, each computational unit can optimize a tailored
surrogate function defined on a randomly assigned subset of the input
variables, whose dimension can be selected depending entirely on the available
computational power. Comment: Preprint submitted to IEEE Transactions on Neural Networks and
Learning Systems.
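A generic SCA iteration, stripped to its essentials, looks as follows. This is a schematic sketch on a toy non-convex regression, not the paper's surrogates or experiments: the non-convex loss is replaced at each step by a strongly convex surrogate built from its gradient plus a proximal quadratic term, the convex L1 regulariser is kept intact so the surrogate minimiser reduces to soft-thresholding, and a diminishing step size mixes that minimiser with the current iterate.

import numpy as np

rng = np.random.default_rng(4)

# toy non-convex problem: fit a single tanh unit with squared loss and an L1
# penalty (purely illustrative)
n, d = 512, 20
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.0, 0.5]
y = np.tanh(X @ w_true) + 0.05 * rng.standard_normal(n)

def loss_and_grad(w):
    t = np.tanh(X @ w)
    r = t - y
    loss = 0.5 * np.mean(r ** 2)
    grad = X.T @ (r * (1 - t ** 2)) / n
    return loss, grad

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

w = np.zeros(d)
tau, lam = 1.0, 1e-3                                 # surrogate curvature, L1 weight
for t in range(200):
    loss, g = loss_and_grad(w)
    w_hat = soft_threshold(w - g / tau, lam / tau)   # minimiser of the strongly convex surrogate
    gamma = 1.0 / (t + 2)                            # diminishing step size
    w = w + gamma * (w_hat - w)

print("final loss:", round(loss_and_grad(w)[0], 4))
print("learned weights (first 5 coords):", np.round(w[:5], 3))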