
    Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods

    Training neural networks is a challenging non-convex optimization problem, and backpropagation or gradient descent can get stuck in spurious local optima. We propose a novel algorithm based on tensor decomposition for guaranteed training of two-layer neural networks. We provide risk bounds for our proposed method, with a polynomial sample complexity in the relevant parameters, such as input dimension and number of neurons. While learning arbitrary target functions is NP-hard, we provide transparent conditions on the function and the input for learnability. Our training method is based on tensor decomposition, which provably converges to the global optimum under a set of mild non-degeneracy conditions. It consists of simple, embarrassingly parallel linear and multi-linear operations, and is competitive with standard stochastic gradient descent (SGD) in terms of computational complexity. Thus, we propose a computationally efficient method with guaranteed risk bounds for training neural networks with one hidden layer. Comment: The tensor decomposition analysis is expanded, and the analysis of ridge regression is added for recovering the parameters of the last layer of the neural network.
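
To make the tensor-based approach concrete, here is a minimal numerical sketch, not the authors' exact estimator: it forms a third-order cross-moment tensor under an assumed Gaussian input (the paper handles more general inputs via score functions), extracts hidden-unit directions with a deflation-based tensor power method, and fits the output layer by ridge regression. All function and parameter names are illustrative.

```python
# Illustrative sketch (not the paper's exact estimator): recover hidden-layer
# directions of a two-layer network from a third-order cross-moment tensor via
# the tensor power method, then fit the output layer with ridge regression.
import numpy as np

def score3_gaussian(x):
    """Third-order score S3(x) for x ~ N(0, I): x(x)x minus symmetrized identity-times-x terms."""
    d = x.shape[0]
    T = np.einsum('i,j,k->ijk', x, x, x)
    I = np.eye(d)
    T -= np.einsum('ij,k->ijk', I, x)
    T -= np.einsum('ik,j->ijk', I, x)
    T -= np.einsum('jk,i->ijk', I, x)
    return T

def moment_tensor(X, y):
    """Empirical cross moment E[y * S3(x)] over samples (rows of X)."""
    d = X.shape[1]
    T = np.zeros((d, d, d))
    for xi, yi in zip(X, y):
        T += yi * score3_gaussian(xi)
    return T / len(y)

def tensor_power_method(T, k, iters=100, seed=0):
    """Deflation-based power iteration extracting k (approximately) rank-1 components."""
    rng = np.random.default_rng(seed)
    d = T.shape[0]
    T = T.copy()
    W = []
    for _ in range(k):
        v = rng.standard_normal(d)
        v /= np.linalg.norm(v)
        for _ in range(iters):
            v = np.einsum('ijk,j,k->i', T, v, v)
            v /= np.linalg.norm(v)
        lam = np.einsum('ijk,i,j,k->', T, v, v, v)
        T -= lam * np.einsum('i,j,k->ijk', v, v, v)   # deflate the recovered component
        W.append(v)
    return np.stack(W)   # rows are estimated hidden-unit directions (up to sign and permutation)

def fit_last_layer(X, y, W, lam=1e-3):
    """Ridge regression for the output weights given recovered first-layer directions."""
    H = np.tanh(X @ W.T)   # assumed activation; the paper treats general smooth activations
    return np.linalg.solve(H.T @ H + lam * np.eye(H.shape[1]), H.T @ y)
```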

    Learning Two-layer Neural Networks with Symmetric Inputs

    We give a new algorithm for learning a two-layer neural network under a general class of input distributions. Assuming there is a ground-truth two-layer network y = Aσ(Wx) + ξ, where A, W are weight matrices, ξ represents noise, and the number of neurons in the hidden layer is no larger than the input or output dimension, our algorithm is guaranteed to recover the parameters A, W of the ground-truth network. The only requirement on the input x is that its distribution is symmetric, which still allows highly complicated and structured inputs. Our algorithm is based on the method-of-moments framework and extends several results in tensor decompositions. We use spectral algorithms to avoid the complicated non-convex optimization in learning neural networks. Experiments show that our algorithm can robustly learn the ground-truth neural network with a small number of samples for many symmetric input distributions.
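
A hedged illustration of the spectral idea behind such method-of-moments algorithms, not the paper's actual moment construction: when two moment matrices share the form M1 = A D1 Aᵀ and M2 = A D2 Aᵀ with diagonal D1, D2 and invertible A, the columns of A can be recovered by a single eigendecomposition instead of non-convex training. All names below are illustrative.

```python
# Sketch of simultaneous diagonalization, a standard building block of spectral
# method-of-moments algorithms (not the paper's exact construction).
import numpy as np

def recover_components(M1, M2):
    """Recover A (up to column scaling, sign, permutation) from M1 = A D1 A^T, M2 = A D2 A^T."""
    # M1 @ inv(M2) = A (D1 D2^{-1}) A^{-1}, so its eigenvectors are the columns of A.
    vals, vecs = np.linalg.eig(M1 @ np.linalg.inv(M2))
    return vecs.real

# Tiny synthetic check with a known ground-truth A:
rng = np.random.default_rng(0)
k = 4
A = rng.standard_normal((k, k))
D1, D2 = np.diag(rng.uniform(1, 2, k)), np.diag(rng.uniform(1, 2, k))
A_hat = recover_components(A @ D1 @ A.T, A @ D2 @ A.T)
# Columns of A_hat match columns of A up to sign, scale, and permutation.
```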

    Parametrised polyconvex hyperelasticity with physics-augmented neural networks

    In the present work, neural networks are applied to formulate parametrised hyperelastic constitutive models. The models fulfill all common mechanical conditions of hyperelasticity by construction. In particular, partially input-convex neural network (pICNN) architectures are applied based on feed-forward neural networks. Receiving two different sets of input arguments, pICNNs are convex in one of them, while for the other they represent arbitrary relationships which are not necessarily convex. In this way, the model can fulfill convexity conditions stemming from mechanical considerations without being too restrictive on the functional relationship in the additional parameters, which may not necessarily be convex. Two different models are introduced, where one can represent arbitrary functional relationships in the additional parameters, while the other is monotonic in the additional parameters. As a first proof of concept, the model is calibrated to data generated with two differently parametrised analytical potentials, whereby three different pICNN architectures are investigated. In all cases, the proposed model shows excellent performance.
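
For intuition, a minimal partially input-convex network in the spirit described above could look as follows; the layer sizes, conditioning scheme, and names are illustrative choices, not the authors' exact architecture. The z-path weights are made non-negative via a softplus reparametrisation and the z-path activations are convex and non-decreasing, so the output is convex in the input x while depending arbitrarily on the parameter input p.

```python
# Minimal sketch of a partially input-convex network (illustrative architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PICNN(nn.Module):
    def __init__(self, dim_x, dim_p, hidden=32, depth=3):
        super().__init__()
        widths = [hidden] * depth + [1]
        self.Wx = nn.ModuleList([nn.Linear(dim_x, w) for w in widths])   # affine path in x
        self.Wp = nn.ModuleList([nn.Linear(dim_p, w) for w in widths])   # unconstrained path in p
        # z-path weights (layer i-1 -> layer i); made non-negative in forward()
        self.Wz = nn.ParameterList([
            nn.Parameter(0.1 * torch.randn(widths[i + 1], widths[i]))
            for i in range(len(widths) - 1)
        ])

    def forward(self, x, p):
        # First layer: convex in x because softplus is convex and non-decreasing.
        z = F.softplus(self.Wx[0](x) + self.Wp[0](p))
        for i in range(1, len(self.Wx)):
            a = self.Wx[i](x) + self.Wp[i](p) + F.linear(z, F.softplus(self.Wz[i - 1]))
            z = F.softplus(a) if i < len(self.Wx) - 1 else a   # no activation on the output
        return z   # shape (batch, 1); convex in x for any fixed p

# Usage sketch:
# model = PICNN(dim_x=3, dim_p=2)
# y = model(torch.randn(8, 3), torch.randn(8, 2))   # shape (8, 1)
```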

    Principled Weight Initialisation for Input-Convex Neural Networks

    Input-Convex Neural Networks (ICNNs) are networks that guarantee convexity in their input-output mapping. These networks have been successfully applied for energy-based modelling, optimal transport problems, and learning invariances. The convexity of ICNNs is achieved by using non-decreasing convex activation functions and non-negative weights. Because of these peculiarities, previous initialisation strategies, which implicitly assume centred weights, are not effective for ICNNs. By studying signal propagation through layers with non-negative weights, we are able to derive a principled weight initialisation for ICNNs. Concretely, we generalise signal propagation theory by removing the assumption that weights are sampled from a centred distribution. In a set of experiments, we demonstrate that our principled initialisation effectively accelerates learning in ICNNs and leads to better generalisation. Moreover, we find that, in contrast to common belief, ICNNs can be trained without skip-connections when initialised correctly. Finally, we apply ICNNs to a real-world drug discovery task and show that they allow for more effective molecular latent space exploration. Comment: Presented at NeurIPS 2023.
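
The paper derives a closed-form initialisation for the non-negative weights; that formula is not reproduced here. The sketch below is only a data-dependent stand-in in the same spirit (an LSUV-style calibration): draw non-negative weights, then rescale and re-centre each layer so that pre-activation statistics stay stable on a calibration batch. Function names and constants are illustrative.

```python
# Hedged stand-in for a principled non-negative-weight initialisation:
# data-dependent rescaling of an ICNN z-path (not the paper's closed-form rule).
import torch
import torch.nn as nn

@torch.no_grad()
def init_nonneg_layers(layers, x, target_std=1.0):
    """layers: nn.Linear modules whose weights must stay non-negative (ICNN z-path)."""
    h = x
    for lin in layers:
        lin.weight.copy_(torch.rand_like(lin.weight))    # non-negative draw, uniform on [0, 1)
        lin.bias.zero_()
        pre = lin(h)
        lin.weight.mul_(target_std / pre.std())          # rescale to the target pre-activation std
        lin.bias.copy_(-lin(h).mean(dim=0))              # re-centre via the (unconstrained) bias
        h = torch.relu(lin(h))                           # convex, non-decreasing activation
    return layers

# Usage on a toy z-path and a calibration batch:
layers = [nn.Linear(16, 32), nn.Linear(32, 32), nn.Linear(32, 1)]
init_nonneg_layers(layers, torch.randn(256, 16))
```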

    Stochastic Training of Neural Networks via Successive Convex Approximations

    This paper proposes a new family of algorithms for training neural networks (NNs). These are based on recent developments in the field of non-convex optimization, going under the general name of successive convex approximation (SCA) techniques. The basic idea is to iteratively replace the original (non-convex, high-dimensional) learning problem with a sequence of (strongly convex) approximations, which are both accurate and simple to optimize. In contrast to similar ideas (e.g., quasi-Newton algorithms), the approximations can be constructed using only first-order information about the neural network function, in a stochastic fashion, while exploiting the overall structure of the learning problem for faster convergence. We discuss several use cases, based on different choices for the loss function (e.g., squared loss and cross-entropy loss) and for the regularization of the NN's weights. We experiment on several medium-sized benchmark problems and on a large-scale dataset involving simulated physical data. The results show that the algorithm outperforms state-of-the-art techniques, providing faster convergence to a better minimum. Additionally, we show how the algorithm can be easily parallelized over multiple computational units without hindering its performance. In particular, each computational unit can optimize a tailored surrogate function defined on a randomly assigned subset of the input variables, whose dimension can be selected depending entirely on the available computational power. Comment: Preprint submitted to IEEE Transactions on Neural Networks and Learning Systems.
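
A minimal sketch of one stochastic SCA update, not the paper's exact surrogate (which can exploit more of the network's structure): build a strongly convex surrogate from a minibatch gradient plus an l1 regulariser, minimise it in closed form by soft-thresholding, and move toward the minimiser with a diminishing step size. The names grad_fn, w0, and the step-size schedule are illustrative placeholders.

```python
# Sketch of stochastic successive convex approximation (SCA) with an l1-regularised
# quadratic surrogate; illustrative, not the paper's exact algorithm.
import numpy as np

def soft_threshold(v, t):
    """Elementwise minimiser of 0.5*(u - v)^2 + t*|u|."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sca_train(grad_fn, w0, n_steps=1000, tau=1.0, lam=1e-3):
    """grad_fn(w, k) -> stochastic (minibatch) gradient of the non-convex loss at w."""
    w = w0.copy()
    for k in range(n_steps):
        g = grad_fn(w, k)
        # Strongly convex surrogate around w:  g^T (u - w) + tau/2 ||u - w||^2 + lam ||u||_1.
        # Its minimiser is a soft-thresholded gradient step:
        w_hat = soft_threshold(w - g / tau, lam / tau)
        gamma = 1.0 / (k + 2) ** 0.6          # diminishing, non-summable step size
        w = w + gamma * (w_hat - w)           # move toward the surrogate minimiser
    return w
```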