Convergence Theory of Learning Over-parameterized ResNet: A Full Characterization

Authors: Chen Wei, Liu Tie-Yan, Yi Mingyang, Yu Da, Zhang Huishuai
Publication date: 12/07/2019

ResNet structure has achieved great empirical success since its debut. Recent work established the convergence of learning over-parameterized ResNet with a scaling factor $\tau=1/L$ on the residual branch, where $L$ is the network depth. However, it is not clear how learning ResNet behaves for other values of $\tau$. In this paper, we fully characterize the convergence theory of gradient descent for learning over-parameterized ResNet with different values of $\tau$. Specifically, hiding logarithmic factors and constant coefficients, we show that for $\tau\le 1/\sqrt{L}$ gradient descent is guaranteed to converge to the global minima, and in particular when $\tau\le 1/L$ the convergence is independent of the network depth. Conversely, we show that for $\tau>L^{-\frac{1}{2}+c}$, the forward output grows at least at rate $L^c$ in expectation, and learning then fails because of gradient explosion for large $L$. This means the bound $\tau\le 1/\sqrt{L}$ is sharp for learning ResNet with arbitrary depth. To the best of our knowledge, this is the first work that studies learning ResNet over the full range of $\tau$.

Comment: 31 pages
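The dependence of the forward output on the scaling factor can be illustrated with a small simulation. The sketch below is not the paper's model or proof. It assumes a toy linear residual block $h_l = h_{l-1} + \tau W_l h_{l-1}$ with independent Gaussian weights of variance $1/m$, where the width $m$, the function names, and the seed are all hypothetical choices made for illustration. It compares the squared output norm for $\tau = 1/L$, $\tau = 1/\sqrt{L}$, and $\tau = L^{-1/4}$.

```python
import math
import random


def squared_norm_growth(depth, tau, width=32, seed=0):
    """Return ||h_L||^2 / ||h_0||^2 for a toy linear residual network.

    Assumed toy model (not taken from the paper):
        h_l = h_{l-1} + tau * W_l h_{l-1},  W_l entries ~ N(0, 1/width).
    """
    rng = random.Random(seed)
    h = [rng.gauss(0.0, 1.0) for _ in range(width)]
    initial = sum(x * x for x in h)
    std = 1.0 / math.sqrt(width)
    for _ in range(depth):
        # Draw a fresh random weight matrix row by row and apply it to h.
        branch = [
            sum(rng.gauss(0.0, std) * x for x in h)
            for _ in range(width)
        ]
        # Residual update with the scaling factor tau on the branch.
        h = [x + tau * b for x, b in zip(h, branch)]
    return sum(x * x for x in h) / initial


def compare_scalings(depth=256):
    """Squared-norm growth for three choices of tau at a fixed depth."""
    return {
        "1/L": squared_norm_growth(depth, 1.0 / depth),
        "1/sqrt(L)": squared_norm_growth(depth, depth ** -0.5),
        "L^(-1/4)": squared_norm_growth(depth, depth ** -0.25),
    }


if __name__ == "__main__":
    for name, ratio in compare_scalings().items():
        print(f"tau = {name:10s} squared-norm growth = {ratio:.3g}")
```

In this toy model the expected squared norm is multiplied by $1+\tau^2$ at each layer, so after $L$ layers it equals $(1+\tau^2)^L$ times the input's squared norm. That factor stays near 1 for $\tau = 1/L$, stays bounded by $e$ for $\tau = 1/\sqrt{L}$, and blows up with depth for $\tau = L^{-1/4}$, which is consistent in spirit with the threshold stated in the abstract.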

arXiv.org e-Print Archive