Nonparametric Learning of Two-Layer ReLU Residual Units
We describe an algorithm that learns two-layer residual units with rectified
linear unit (ReLU) activation: suppose the input x is from a
distribution with support space R^d and the ground-truth generative
model is such a residual unit, given by y = B[(Ax)^+ + x], where the
ground-truth network parameters A ∈ R^(d×d) form a nonnegative full-rank
matrix and B ∈ R^(m×d) is full-rank with m ≥ d,
and (c^+)_i = max{0, c_i} for c ∈ R^d. We design layer-wise objectives as functionals whose analytic
minimizers express the exact ground-truth network in terms of its parameters
and nonlinearities. Following this objective landscape, learning residual units
from finite samples can be formulated using convex optimization of a
nonparametric function: for each layer, we first formulate the corresponding
empirical risk minimization (ERM) as a positive semi-definite quadratic program
(QP), then we show the solution space of the QP can be equivalently determined
by a set of linear inequalities, which can then be efficiently solved by linear
programming (LP). We further prove the strong statistical consistency of our
algorithm, and demonstrate its robustness and sample efficiency through
experiments.
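The QP-to-LP reduction can be illustrated on a toy quadratic ERM (a minimal sketch under simplifying assumptions, not the paper's layer-wise objective): the minimizers of the positive semi-definite QP min_w ||Xw − y||^2 are exactly the solutions of the linear system X^T X w = X^T y, so a linear program with a zero objective can recover a minimizer as a feasibility problem.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true  # realizable toy data: the ERM optimum attains zero loss

# The quadratic ERM min_w ||Xw - y||^2 is a PSD QP; its minimizer set is
# exactly the solution set of the linear system X^T X w = X^T y. Linear
# equalities are a special case of linear (in)equalities, so an LP solver
# with a zero objective recovers a minimizer as a feasibility problem.
res = linprog(
    c=np.zeros(3),                 # zero objective: pure feasibility
    A_eq=X.T @ X, b_eq=X.T @ y,    # normal equations as LP constraints
    bounds=[(None, None)] * 3,     # w is unconstrained (default is w >= 0)
)
w_hat = res.x
```

Since X^T X is full-rank here, the feasible set is a single point and the LP returns the unique minimizer w_true.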
Learning Two-layer Neural Networks with Symmetric Inputs
We give a new algorithm for learning a two-layer neural network under a
general class of input distributions. Assuming there is a ground-truth
two-layer network y = A σ(Wx) + ξ, where A and W are weight
matrices, ξ represents noise, and the number of neurons in the hidden layer
is no larger than the input or output dimension, our algorithm is guaranteed to recover
the parameters of the ground-truth network. The only requirement on the
input distribution is that it is symmetric (x and −x are identically
distributed), which still allows highly complicated and structured input.
Our algorithm is based on the method-of-moments framework and extends several
results in tensor decompositions. We use spectral algorithms to avoid the
complicated non-convex optimization in learning neural networks. Experiments
show that our algorithm can robustly learn the ground-truth neural network with
a small number of samples for many symmetric input distributions.
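A toy instance of the moment-plus-spectral idea (an illustrative sketch under a Gaussian input assumption, not the paper's actual estimator): for y = Σ_i a_i ReLU(w_i · x) with symmetric input x, Stein's identity makes the moment matrix E[y (xxᵀ − I)] a weighted sum of w_i w_iᵀ, so an eigendecomposition recovers the hidden directions with no non-convex optimization.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 400_000
W = np.eye(d)[:2]                  # two orthonormal hidden directions (rows)
a = np.array([1.0, 2.0])           # distinct output weights break degeneracy

X = rng.standard_normal((n, d))    # Gaussian input is symmetric: x ~ -x
y = np.maximum(X @ W.T, 0.0) @ a   # y = sum_i a_i * relu(w_i . x)

# Empirical moment matrix M ~= E[y (x x^T - I)]. By Stein's identity,
# M = sum_i a_i * phi(0) * w_i w_i^T with phi(0) = 1/sqrt(2*pi), so the
# top eigenvectors of M align with the hidden directions (up to sign).
M = (X * y[:, None]).T @ X / n - y.mean() * np.eye(d)
_, vecs = np.linalg.eigh(M)        # eigenvalues in ascending order
v_top, v_second = vecs[:, -1], vecs[:, -2]
```

The largest eigenvalue corresponds to the neuron with a_i = 2, the second to a_i = 1; the full algorithm in the paper handles general symmetric inputs and non-orthogonal weights via more elaborate moments.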