Implicit Acceleration and Feature Learning in Infinitely Wide Neural Networks with Bottlenecks
We analyze the learning dynamics of infinitely wide neural networks with a
finite-sized bottleneck. Unlike the neural tangent kernel limit, a bottleneck
in an otherwise infinite-width network allows data-dependent feature learning
in its bottleneck representation. We empirically show that a single bottleneck
in infinite networks dramatically accelerates training compared to purely
infinite networks, with improved overall performance. We discuss the
acceleration phenomenon by drawing similarities to infinitely wide deep linear
models, where the acceleration effect of a bottleneck can be understood
theoretically.
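To make the deep-linear analogy concrete, here is a minimal sketch (not the
authors' code) comparing gradient descent on a directly parametrized linear
regression against the same problem parametrized as a product of two matrices,
the toy setting in which the abstract says the acceleration effect of a
bottleneck can be studied theoretically. The inner width k, learning rate, and
initialization scale are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 10, 10            # k: inner (factorized) width, assumed
X = rng.normal(size=(n, d))
w_true = rng.normal(size=(d, 1))
y = X @ w_true

def mse(w):
    return 0.5 * np.mean((X @ w - y) ** 2)

lr, steps, scale = 0.05, 300, 0.1

# (a) shallow: optimize the weight vector w directly
w = scale * rng.normal(size=(d, 1))
# (b) deep linear: optimize W1 (k x d) and W2 (1 x k); w_eff = (W2 @ W1).T
W1 = scale * rng.normal(size=(k, d))
W2 = scale * rng.normal(size=(1, k))

for t in range(steps):
    # shallow gradient step on the squared loss
    g = X.T @ (X @ w - y) / n
    w -= lr * g
    # deep gradient step via the chain rule on A = W2 @ W1 (shape 1 x d)
    w_eff = (W2 @ W1).T
    gA = (X.T @ (X @ w_eff - y) / n).T   # dL/dA, shape (1, d)
    dW1 = W2.T @ gA                      # dL/dW1, shape (k, d)
    dW2 = gA @ W1.T                      # dL/dW2, shape (1, k)
    W1 -= lr * dW1
    W2 -= lr * dW2
    if t % 50 == 0:
        print(f"step {t:4d}  shallow {mse(w):.4f}  "
              f"deep {mse((W2 @ W1).T):.4f}")
```

Printing both loss curves lets one compare the training dynamics of the two
parametrizations directly; how strongly the factorized model accelerates
depends on the initialization scale and loss, which is the kind of question
the deep linear theory referenced in the abstract addresses.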
On the asymptotics of wide networks with polynomial activations
We consider an existing conjecture addressing the asymptotic behavior of
neural networks in the large width limit. The results that follow from this
conjecture include tight bounds on the behavior of wide networks during
stochastic gradient descent, and a derivation of their finite-width dynamics.
We prove the conjecture for deep networks with polynomial activation functions,
greatly extending the validity of these results. Finally, we point out a
difference in the asymptotic behavior of networks with analytic (and
non-linear) activation functions and those with piecewise-linear activations
such as ReLU.
Comment: 8+12 pages, 6 figures, 2 tables
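As a rough illustration of the kind of large-width scaling such a conjecture
concerns (this is not the paper's proof or its setup), the sketch below
Monte-Carlo estimates the fluctuation of the empirical neural tangent kernel
of a one-hidden-layer network with a quadratic (polynomial) activation across
random initializations; being an average of n independent terms, it should
shrink roughly as n^(-1/2) as the width n grows. All symbols and parameter
values here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def ntk_entry(x1, x2, n, phi, dphi, trials=200):
    """Estimate mean/std of the empirical NTK Theta(x1, x2) over random
    initializations for f(x) = (1/sqrt(n)) * v . phi(W x), width n."""
    vals = []
    for _ in range(trials):
        W = rng.normal(size=(n, x1.shape[0]))   # hidden weights
        v = rng.normal(size=n)                  # output weights
        a1, a2 = W @ x1, W @ x2
        # Theta = sum over params of df(x1)/dtheta * df(x2)/dtheta
        theta = (phi(a1) @ phi(a2)) / n \
              + (x1 @ x2) * ((v**2) * dphi(a1) * dphi(a2)).sum() / n
        vals.append(theta)
    vals = np.array(vals)
    return vals.mean(), vals.std()

phi = lambda z: z**2      # quadratic: a polynomial (analytic) activation
dphi = lambda z: 2 * z

x1 = rng.normal(size=5); x1 /= np.linalg.norm(x1)
x2 = rng.normal(size=5); x2 /= np.linalg.norm(x2)

for n in [64, 256, 1024, 4096]:
    m, s = ntk_entry(x1, x2, n, phi, dphi)
    print(f"width {n:5d}: NTK mean {m:8.3f}, std {s:.3f} "
          f"(expect std ~ n^-1/2)")
```

Swapping phi for a piecewise-linear activation such as ReLU (with dphi its
almost-everywhere derivative) gives one way to probe empirically the
analytic-versus-piecewise-linear distinction the abstract points out.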