Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods
Training neural networks is a challenging non-convex optimization problem,
and backpropagation or gradient descent can get stuck in spurious local optima.
We propose a novel algorithm based on tensor decomposition for guaranteed
training of two-layer neural networks. We provide risk bounds for our proposed
method, with a polynomial sample complexity in the relevant parameters, such as
input dimension and number of neurons. While learning arbitrary target
functions is NP-hard, we provide transparent conditions on the function and the
input for learnability. Our training method is based on tensor decomposition,
which provably converges to the global optimum, under a set of mild
non-degeneracy conditions. It consists of simple embarrassingly parallel linear
and multi-linear operations, and is competitive with standard stochastic
gradient descent (SGD), in terms of computational complexity. Thus, we propose
a computationally efficient method with guaranteed risk bounds for training
neural networks with one hidden layer.

Comment: The tensor decomposition analysis is expanded, and an analysis of ridge regression is added for recovering the parameters of the last layer of the neural network.
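The moment-based pipeline this abstract describes can be illustrated concretely: for Gaussian inputs, Stein's identity turns the cross-moment E[y · H3(x)] (with H3 the third Hermite tensor) into a rank-k symmetric tensor whose components are the hidden-layer weight vectors, which tensor power iteration with deflation then extracts. The sketch below is illustrative only, not the paper's implementation; the sigmoid network, unit second-layer weights, orthonormal hidden weights, and all dimensions are assumptions made for the example.

import numpy as np

rng = np.random.default_rng(0)
d, k, n = 10, 3, 100_000

# Assumed toy model: y = sum_i sigmoid(w_i . x), x ~ N(0, I), orthonormal w_i.
W = np.linalg.qr(rng.standard_normal((d, k)))[0]   # d x k, orthonormal columns
X = rng.standard_normal((n, d))
y = (1.0 / (1.0 + np.exp(-(X @ W)))).sum(axis=1)

# Stein's identity for Gaussian x: E[y * H3(x)] = sum_i E[sigma'''(w_i . x)] w_i⊗w_i⊗w_i,
# where H3(x) = x⊗x⊗x minus the symmetrized E[y x]⊗I correction. Centering y
# leaves the expectation unchanged (E[H3] = 0) but lowers the estimator's variance.
yc = y - y.mean()
T = np.einsum('a,ai,aj,ak->ijk', yc, X, X, X, optimize=True) / n
m = (yc[:, None] * X).mean(axis=0)                 # E[y x]
I = np.eye(d)
T -= (np.einsum('i,jk->ijk', m, I)
      + np.einsum('j,ik->ijk', m, I)
      + np.einsum('k,ij->ijk', m, I))

def power_iteration(T, iters=200):
    """Extract one robust eigenpair of a symmetric 3-tensor by repeated contraction."""
    v = rng.standard_normal(T.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = np.einsum('ijk,j,k->i', T, v, v)
        v /= np.linalg.norm(v)
    lam = np.einsum('ijk,i,j,k->', T, v, v, v)
    return lam, v

# Deflation: peel off one rank-one component at a time.
est = []
for _ in range(k):
    lam, v = power_iteration(T)
    est.append(v)
    T = T - lam * np.einsum('i,j,k->ijk', v, v, v)

# Rows recover the w_i up to sign and permutation, so the absolute inner
# products with the true weights should be close to a permutation matrix.
print(np.round(np.abs(np.array(est) @ W), 2))

Each step is a linear or multi-linear operation over the data (moment estimation, contraction), which is what makes the method embarrassingly parallel in the sense the abstract claims.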
Recovery Guarantees for One-hidden-layer Neural Networks
In this paper, we consider regression problems with one-hidden-layer neural
networks (1NNs). We distill some properties of activation functions that lead
to local strong convexity in the neighborhood of the ground-truth parameters for the 1NN squared-loss objective. Most popular nonlinear activation functions satisfy the distilled properties, including rectified linear units (ReLUs), leaky ReLUs, squared ReLUs and sigmoids. For activation functions that are also smooth, we show local linear convergence guarantees of gradient descent under a resampling rule. For homogeneous activations, we show tensor methods are able to initialize the parameters to fall into the local strong convexity region. As a result, tensor initialization followed by gradient descent is guaranteed to recover the ground truth with sample complexity $d \cdot \log(1/\epsilon) \cdot \mathrm{poly}(k, \lambda)$ and computational complexity $n \cdot d \cdot \mathrm{poly}(k, \lambda)$ for smooth homogeneous activations with high probability, where $d$ is the dimension of the input, $k$ ($k \le d$) is the number of hidden nodes, $\lambda$ is a conditioning property of the ground-truth parameter matrix between the input layer and the hidden layer, $\epsilon$ is the targeted precision and $n$ is the number of samples. To the best of our knowledge, this is the first work that provides recovery guarantees for 1NNs with both sample complexity and computational complexity linear in the input dimension and logarithmic in the precision.

Comment: ICML 2017
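To make the second stage of the two-stage guarantee concrete, here is a minimal sketch of gradient descent on the 1NN squared loss from a warm start. Everything here is assumed for illustration: a sigmoid activation (smooth, so the local linear convergence result is the relevant one; we skip the homogeneity needed for tensor initialization by perturbing the ground truth to stand in for it), unit second-layer weights, no resampling, and an untuned step size.

import numpy as np

rng = np.random.default_rng(1)
d, k, n = 8, 3, 5000
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Assumed toy model: y = sum_i sigmoid(w_i . x).
W_true = rng.standard_normal((d, k))
X = rng.standard_normal((n, d))
y = sigmoid(X @ W_true).sum(axis=1)

# Warm start inside the presumed local strong convexity region; a small
# perturbation of the truth stands in for tensor initialization here.
W = W_true + 0.1 * rng.standard_normal((d, k))

lr = 1.0  # hypothetical step size, not taken from the paper
for _ in range(1000):
    H = sigmoid(X @ W)                                 # n x k hidden activations
    resid = H.sum(axis=1) - y                          # prediction residuals
    grad = X.T @ (resid[:, None] * H * (1 - H)) / n    # gradient of the squared loss
    W -= lr * grad

# Near the truth the error should shrink geometrically, ending well
# below the initial 0.1-scale perturbation.
print(np.max(np.abs(W - W_true)))

The point of the warm start is exactly the paper's division of labor: strong convexity only holds locally, so gradient descent alone carries no global guarantee, and the tensor stage exists to land the iterate inside that local region.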