
    Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods

    Training neural networks is a challenging non-convex optimization problem, and backpropagation or gradient descent can get stuck in spurious local optima. We propose a novel algorithm based on tensor decomposition for guaranteed training of two-layer neural networks. We provide risk bounds for our proposed method, with a polynomial sample complexity in the relevant parameters, such as input dimension and number of neurons. While learning arbitrary target functions is NP-hard, we provide transparent conditions on the function and the input for learnability. Our training method is based on tensor decomposition, which provably converges to the global optimum under a set of mild non-degeneracy conditions. It consists of simple, embarrassingly parallel linear and multi-linear operations, and is competitive with standard stochastic gradient descent (SGD) in terms of computational complexity. Thus, we propose a computationally efficient method with guaranteed risk bounds for training neural networks with one hidden layer.
    Comment: The tensor decomposition analysis is expanded, and the analysis of ridge regression is added for recovering the parameters of the last layer of the neural network.
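    As an informal illustration of the kind of pipeline this abstract describes (no code accompanies it, and this is not the authors' implementation), the sketch below decomposes a synthetic symmetric third-order tensor whose rank-one components encode orthonormal hidden-layer weights, using tensor power iteration with deflation, and then fits the output layer by ridge regression, echoing the comment about recovering the last layer. All function names and constants are hypothetical choices for the sketch.

        import numpy as np

        def power_iteration(T, n_iter=100, n_restarts=10, rng=None):
            # Robust tensor power method: recover one component of a symmetric
            # third-order tensor T by repeated application of v <- T(I, v, v).
            rng = np.random.default_rng() if rng is None else rng
            d = T.shape[0]
            best_v, best_lam = None, -np.inf
            for _ in range(n_restarts):
                v = rng.standard_normal(d)
                v /= np.linalg.norm(v)
                for _ in range(n_iter):
                    v = np.einsum('ijk,j,k->i', T, v, v)
                    v /= np.linalg.norm(v)
                lam = np.einsum('ijk,i,j,k->', T, v, v, v)   # eigenvalue T(v, v, v)
                if lam > best_lam:
                    best_v, best_lam = v, lam
            return best_v, best_lam

        def decompose(T, k):
            # Greedy deflation: peel off k rank-one components lambda_j * w_j (x) w_j (x) w_j.
            T = T.copy()
            W = []
            for _ in range(k):
                v, lam = power_iteration(T)
                W.append(v)
                T = T - lam * np.einsum('i,j,k->ijk', v, v, v)
            return np.array(W)

        # Toy usage: a rank-k symmetric tensor built from orthonormal hidden weights,
        # then ridge regression on the estimated hidden features for the last layer.
        d, k, n = 20, 4, 5000
        rng = np.random.default_rng(0)
        W_true = np.linalg.qr(rng.standard_normal((d, k)))[0].T       # k orthonormal rows
        T = sum(np.einsum('i,j,k->ijk', w, w, w) for w in W_true)
        W_hat = decompose(T, k)                                       # rows of W_true, up to order

        X = rng.standard_normal((n, d))
        a_true = rng.standard_normal(k)
        y = np.maximum(X @ W_true.T, 0.0) @ a_true                    # two-layer ReLU network
        H = np.maximum(X @ W_hat.T, 0.0)                              # features from estimated weights
        a_hat = np.linalg.solve(H.T @ H + 1e-3 * np.eye(k), H.T @ y)  # ridge regression, last layer

    The linear-algebra operations here (tensor contractions, deflation, a linear solve) are the sort of embarrassingly parallel multi-linear steps the abstract refers to; in practice the tensor would be estimated from data rather than built from the true weights.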

    Recovery Guarantees for One-hidden-layer Neural Networks

    In this paper, we consider regression problems with one-hidden-layer neural networks (1NNs). We distill some properties of activation functions that lead to local strong convexity in the neighborhood of the ground-truth parameters for the 1NN squared-loss objective. Most popular nonlinear activation functions satisfy the distilled properties, including rectified linear units (ReLUs), leaky ReLUs, squared ReLUs and sigmoids. For activation functions that are also smooth, we show local linear convergence guarantees of gradient descent under a resampling rule. For homogeneous activations, we show tensor methods are able to initialize the parameters to fall into the local strong convexity region. As a result, tensor initialization followed by gradient descent is guaranteed to recover the ground truth with sample complexity $d \cdot \log(1/\epsilon) \cdot \mathrm{poly}(k,\lambda)$ and computational complexity $n \cdot d \cdot \mathrm{poly}(k,\lambda)$ for smooth homogeneous activations with high probability, where $d$ is the dimension of the input, $k$ ($k \leq d$) is the number of hidden nodes, $\lambda$ is a conditioning property of the ground-truth parameter matrix between the input layer and the hidden layer, $\epsilon$ is the targeted precision and $n$ is the number of samples. To the best of our knowledge, this is the first work that provides recovery guarantees for 1NNs with both sample complexity and computational complexity linear in the input dimension and logarithmic in the precision.
    Comment: ICML 201
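    To make the two-phase recipe concrete (initialization into the locally strongly convex region, then gradient descent under a resampling rule), here is a minimal, hedged sketch; it is not the authors' code. The smooth homogeneous activation is taken to be the squared ReLU, the tensor initializer is replaced by a small perturbation of the ground truth purely as a stand-in for "initialize close enough", and the names gd_with_resampling and sample_batch are hypothetical.

        import numpy as np

        def sq_relu(z):                      # smooth, homogeneous activation: max(z, 0)^2
            return np.maximum(z, 0.0) ** 2

        def sq_relu_grad(z):                 # derivative: 2 * max(z, 0)
            return 2.0 * np.maximum(z, 0.0)

        def gd_with_resampling(W0, sample_batch, lr=1e-3, n_iter=300):
            # Gradient descent on the squared loss of f(x) = sum_j sigma(w_j . x),
            # drawing a fresh batch at every step (the resampling rule in the abstract).
            W = W0.copy()
            for _ in range(n_iter):
                X, y = sample_batch()                          # fresh samples each iteration
                Z = X @ W.T                                    # (n, k) pre-activations
                resid = sq_relu(Z).sum(axis=1) - y             # (n,) prediction error
                grad = (resid[:, None, None]
                        * sq_relu_grad(Z)[:, :, None]
                        * X[:, None, :]).mean(axis=0)          # (k, d) gradient
                W -= lr * grad
            return W

        # Toy usage: ground-truth 1NN, an initializer standing in for the tensor method,
        # and local refinement by resampled gradient descent.
        d, k = 10, 3
        rng = np.random.default_rng(1)
        W_true = rng.standard_normal((k, d))

        def sample_batch(n=512):
            X = rng.standard_normal((n, d))
            return X, sq_relu(X @ W_true.T).sum(axis=1)

        W0 = W_true + 0.1 * rng.standard_normal((k, d))        # stand-in for tensor initialization
        W_hat = gd_with_resampling(W0, sample_batch)
        print(np.linalg.norm(W_hat - W_true))                  # should shrink toward 0

    Per-iteration work is a batch of $n$-by-$d$ matrix products, which is consistent with the $n \cdot d \cdot \mathrm{poly}(k,\lambda)$ computational complexity quoted in the abstract.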