Nonparametric Learning of Two-Layer ReLU Residual Units
We describe an algorithm that learns two-layer residual units with rectified
linear unit (ReLU) activation: suppose the input x is from a
distribution with support space R^d and the ground-truth generative
model is such a residual unit, given by y = B[(Ax)^+ + x], where the
ground-truth network parameters A ∈ R^(d×d) form a nonnegative full-rank
matrix and B ∈ R^(m×d) is full-rank with m ≥ d,
and (c^+)_i = max{0, c_i} for c ∈ R^d. We design layer-wise objectives as functionals whose analytic
minimizers express the exact ground-truth network in terms of its parameters
and nonlinearities. Following this objective landscape, learning residual units
from finite samples can be formulated using convex optimization of a
nonparametric function: for each layer, we first formulate the corresponding
empirical risk minimization (ERM) as a positive semi-definite quadratic program
(QP), then we show the solution space of the QP can be equivalently determined
by a set of linear inequalities, which can then be efficiently solved by linear
programming (LP). We further prove the strong statistical consistency of our
algorithm, and demonstrate its robustness and sample efficiency through
experiments.
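The QP-to-LP reduction can be illustrated on a toy quadratic ERM (a minimal sketch under simplifying assumptions, not the paper's layer-wise objective): the minimizers of the positive semi-definite QP min_w ||Xw − y||^2 are exactly the solutions of the linear system X^T X w = X^T y, so a linear program with a zero objective can recover a minimizer as a feasibility problem.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true  # realizable toy data: the ERM optimum attains zero loss

# The quadratic ERM min_w ||Xw - y||^2 is a PSD QP; its minimizer set is
# exactly the solution set of the linear system X^T X w = X^T y. Linear
# equalities are a special case of linear (in)equalities, so an LP solver
# with a zero objective recovers a minimizer as a feasibility problem.
res = linprog(
    c=np.zeros(3),                 # zero objective: pure feasibility
    A_eq=X.T @ X, b_eq=X.T @ y,    # normal equations as LP constraints
    bounds=[(None, None)] * 3,     # w is unconstrained (default is w >= 0)
)
w_hat = res.x
```

Since X^T X is full-rank here, the feasible set is a single point and the LP returns the unique minimizer w_true.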
Learning Two-layer Neural Networks with Symmetric Inputs
We give a new algorithm for learning a two-layer neural network under a
general class of input distributions. Assuming there is a ground-truth
two-layer network y = A σ(Wx) + ξ, where A and W are weight
matrices, ξ represents noise, and the number of neurons in the hidden layer
is no larger than the input or output dimension, our algorithm is guaranteed to recover
the parameters of the ground-truth network. The only requirement on the
input distribution is that it is symmetric (x and −x are identically
distributed), which still allows highly complicated and structured input.
Our algorithm is based on the method-of-moments framework and extends several
results in tensor decompositions. We use spectral algorithms to avoid the
complicated non-convex optimization in learning neural networks. Experiments
show that our algorithm can robustly learn the ground-truth neural network with
a small number of samples for many symmetric input distributions.
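A toy instance of the moment-plus-spectral idea (an illustrative sketch under a Gaussian input assumption, not the paper's actual estimator): for y = Σ_i a_i ReLU(w_i · x) with symmetric input x, Stein's identity makes the moment matrix E[y (xxᵀ − I)] a weighted sum of w_i w_iᵀ, so an eigendecomposition recovers the hidden directions with no non-convex optimization.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 400_000
W = np.eye(d)[:2]                  # two orthonormal hidden directions (rows)
a = np.array([1.0, 2.0])           # distinct output weights break degeneracy

X = rng.standard_normal((n, d))    # Gaussian input is symmetric: x ~ -x
y = np.maximum(X @ W.T, 0.0) @ a   # y = sum_i a_i * relu(w_i . x)

# Empirical moment matrix M ~= E[y (x x^T - I)]. By Stein's identity,
# M = sum_i a_i * phi(0) * w_i w_i^T with phi(0) = 1/sqrt(2*pi), so the
# top eigenvectors of M align with the hidden directions (up to sign).
M = (X * y[:, None]).T @ X / n - y.mean() * np.eye(d)
_, vecs = np.linalg.eigh(M)        # eigenvalues in ascending order
v_top, v_second = vecs[:, -1], vecs[:, -2]
```

The largest eigenvalue corresponds to the neuron with a_i = 2, the second to a_i = 1; the full algorithm in the paper handles general symmetric inputs and non-orthogonal weights via more elaborate moments.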