31 research outputs found
Implicit Bias of Large Depth Networks: a Notion of Rank for Nonlinear Functions
We show that the representation cost of fully connected neural networks with
homogeneous nonlinearities - which describes the implicit bias in function
space of networks with $L_2$-regularization or with losses such as the
cross-entropy - converges as the depth of the network goes to infinity to a
notion of rank over nonlinear functions. We then inquire under which conditions
the global minima of the loss recover the 'true' rank of the data: we show that
for too large depths the global minimum will be approximately rank 1
(underestimating the rank); we then argue that there is a range of depths which
grows with the number of datapoints where the true rank is recovered. Finally,
we discuss the effect of the rank of a classifier on the topology of the
resulting class boundaries and show that autoencoders with optimal nonlinear
rank are naturally denoising.
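For orientation, the display below restates the central object schematically: the representation cost of a function realizable by a depth-$L$ network, and its depth-normalized infinite-depth limit. The notation $R(f;\Omega,\sigma,L)$, the parameter vector $\theta$, and $\mathrm{Rank}(f;\Omega)$ are illustrative shorthand consistent with the abstract, not a verbatim statement from the paper.

```latex
% Schematic statement (illustrative notation, not quoted from the paper):
% representation cost of a function f realizable on a domain Omega by a
% depth-L network with homogeneous nonlinearity sigma, and its
% infinite-depth limit as a notion of rank over nonlinear functions.
\[
  R(f;\Omega,\sigma,L) \;=\; \min_{\theta \,:\, f_\theta = f \text{ on } \Omega} \|\theta\|_2^2,
  \qquad
  \lim_{L \to \infty} \frac{R(f;\Omega,\sigma,L)}{L} \;=\; \mathrm{Rank}(f;\Omega).
\]
% Minimizing an L2-regularized loss at large depth therefore implicitly
% minimizes this nonlinear notion of rank of the learned function.
```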
Bottleneck Structure in Learned Features: Low-Dimension vs Regularity Tradeoff
Previous work has shown that DNNs with large depth $L$ and
$L_2$-regularization are biased towards learning low-dimensional
representations of the inputs, which can be interpreted as minimizing a notion
of rank $R^{(0)}(f)$ of the learned function $f$, conjectured to be the
Bottleneck rank. We compute finite depth corrections to this result, revealing
a measure of regularity $R^{(1)}$ which bounds the pseudo-determinant of the
Jacobian $\left|Jf(x)\right|_{+}$ and is subadditive under composition and
addition. This formalizes a balance between learning low-dimensional
representations and minimizing complexity/irregularity in the feature maps,
allowing the network to learn the 'right' inner dimension. We also show how
large learning rates control the regularity of the learned function.
Finally, we use these theoretical tools to prove the conjectured bottleneck
structure in the learned features as $L \to \infty$: for large depths, almost all
hidden representations concentrate around $R^{(0)}(f)$-dimensional
representations. These limiting low-dimensional representations can be described
using the second correction $R^{(2)}$.
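As a concrete way to see what the bottleneck structure would look like in practice, the sketch below inspects a network's hidden representations and its Jacobian. It is a minimal PyTorch diagnostic using an untrained stand-in network and arbitrary sizes and thresholds, not code from the paper: fast decay of the singular values of a hidden representation indicates a low inner dimension, and the log pseudo-determinant of the Jacobian is the quantity the regularity measure $R^{(1)}$ is said to bound.

```python
# Minimal diagnostic sketch (illustrative, not the paper's code): inspect
# (a) the singular-value spectrum of each hidden representation and
# (b) the log pseudo-determinant of the Jacobian of the network.
import torch

torch.manual_seed(0)

depth, width, d_in, d_out = 8, 64, 10, 3
layers = []
for l in range(depth):
    layers += [torch.nn.Linear(d_in if l == 0 else width,
                               d_out if l == depth - 1 else width),
               torch.nn.Identity() if l == depth - 1 else torch.nn.ReLU()]
net = torch.nn.Sequential(*layers)  # stands in for a trained network

x = torch.randn(256, d_in)          # stands in for test inputs

# (a) singular values of each (centered) hidden representation
h = x
for i, layer in enumerate(net):
    h = layer(h)
    if isinstance(layer, torch.nn.ReLU):
        s = torch.linalg.svdvals(h - h.mean(0))
        print(f"layer {i}: top-5 singular values {s[:5].tolist()}")

# (b) log pseudo-determinant of the Jacobian at one input
J = torch.autograd.functional.jacobian(net, x[0])   # shape (d_out, d_in)
s = torch.linalg.svdvals(J)
log_pdet = torch.log(s[s > 1e-8]).sum()
print("log pseudo-determinant of J f(x):", log_pdet.item())
```

On a network trained with weight decay at large depth, one would expect only about $R^{(0)}(f)$ of the singular values in (a) to remain large.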
Implicit bias of SGD in $L_2$-regularized linear DNNs: One-way jumps from high to low rank
The $L_2$-regularized loss of Deep Linear Networks (DLNs) with more than
one hidden layer has multiple local minima, corresponding to matrices with
different ranks. In tasks such as matrix completion, the goal is to converge to
the local minimum with the smallest rank that still fits the training data.
While rank-underestimating minima can be avoided since they do not fit the
data, GD might get stuck at rank-overestimating minima. We show that with SGD,
there is always a probability to jump from a higher rank minimum to a lower
rank one, but the probability of jumping back is zero. More precisely, we
define a sequence of sets $B_1 \subset B_2 \subset \cdots \subset B_R$ so
that $B_r$ contains all minima of rank $r$ or less (and not more) that are
absorbing for small enough ridge parameters $\lambda$ and learning rates
$\eta$: SGD has probability 0 of leaving $B_r$, and from any starting point there
is a non-zero probability for SGD to enter $B_r$.
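The following toy sketch illustrates the setting: SGD on an $L_2$-regularized deep linear network fit to partially observed matrix entries, while tracking the effective rank of the end-to-end matrix $W_L \cdots W_1$. The data, the ridge $\lambda$, the learning rate $\eta$, and the rank threshold are arbitrary choices for illustration and are not taken from the paper; the helper `end_to_end` is our own.

```python
# Illustrative sketch (our own toy setup, not the paper's experiments):
# SGD on an L2-regularized deep linear network for matrix completion,
# monitoring how many singular values of the end-to-end matrix stay large.
# Under small lambda and eta, the effective rank is expected to only drop.
import numpy as np

rng = np.random.default_rng(0)
n, true_rank, depth = 10, 2, 3
target = rng.normal(size=(n, true_rank)) @ rng.normal(size=(true_rank, n))
mask = rng.random((n, n)) < 0.5                     # observed entries
idx = np.argwhere(mask)

Ws = [np.eye(n) + 0.1 * rng.normal(size=(n, n)) for _ in range(depth)]
lam, eta, batch = 1e-3, 5e-2, 20

def end_to_end(Ws):
    # product of the weight matrices, applied in order Ws[0], Ws[1], ...
    P = np.eye(n)
    for W in Ws:
        P = W @ P
    return P

for step in range(5000):
    sel = idx[rng.choice(len(idx), size=batch)]     # mini-batch of entries
    P = end_to_end(Ws)
    R = np.zeros_like(P)
    R[sel[:, 0], sel[:, 1]] = (P - target)[sel[:, 0], sel[:, 1]] / batch
    # gradient of the fit term through the product, plus the ridge term
    for i in range(depth):
        left = end_to_end(Ws[i + 1:])               # layers after W_i
        right = end_to_end(Ws[:i])                  # layers before W_i
        Ws[i] -= eta * (left.T @ R @ right.T + lam * Ws[i])
    if step % 1000 == 0:
        svals = np.linalg.svd(end_to_end(Ws), compute_uv=False)
        print(step, "effective rank:", int((svals > 1e-2).sum()))
```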
Disentangling feature and lazy training in deep neural networks
Two distinct limits for deep learning have been derived as the network width
$h \to \infty$, depending on how the weights of the last layer scale
with $h$. In the Neural Tangent Kernel (NTK) limit, the dynamics becomes linear
in the weights and is described by a frozen kernel $\Theta$. By contrast, in
the Mean-Field limit, the dynamics can be expressed in terms of the
distribution of the parameters associated with a neuron, that follows a partial
differential equation. In this work we consider deep networks where the weights
in the last layer scale as $\alpha h^{-1/2}$ at initialization. By varying $\alpha$
and $h$, we probe the crossover between the two limits. We observe the
previously identified regimes of lazy training and feature training. In the
lazy-training regime, the dynamics is almost linear and the NTK barely changes
after initialization. The feature-training regime includes the mean-field
formulation as a limiting case and is characterized by a kernel that evolves in
time, and learns some features. We perform numerical experiments on MNIST,
Fashion-MNIST, EMNIST and CIFAR10 and consider various architectures. We find
that (i) The two regimes are separated by an $\alpha^*$ that scales as
$h^{-1/2}$. (ii) Network architecture and data structure play an important role
in determining which regime is better: in our tests, fully-connected networks
perform generally better in the lazy-training regime, unlike convolutional
networks. (iii) In both regimes, the fluctuations $\delta F$ induced on the
learned function by initial conditions decay as $\delta F \sim 1/\sqrt{h}$,
leading to a performance that increases with $h$. The same improvement can also
be obtained at an intermediate width by ensemble-averaging several networks.
(iv) In the feature-training regime we identify a time scale
$t_1 \sim \sqrt{h}\,\alpha$, such that for $t \ll t_1$ the dynamics is linear.
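A minimal way to probe the lazy-to-feature crossover numerically is to scale the output of a wide two-layer network by $\alpha h^{-1/2}$ and measure how much the empirical tangent kernel moves during training: a nearly frozen kernel indicates the lazy regime, a moving kernel indicates feature training. The sketch below is a hedged toy setup in PyTorch; the data, width $h$, value of $\alpha$, training budget, and the helper `ntk_gram` are our own choices, not the paper's experimental protocol.

```python
# Illustrative sketch (our own toy setup, not the paper's experiments):
# output scaled by alpha / sqrt(h); compare the empirical NTK Gram matrix
# on a small probe set before and after training.
import torch

torch.manual_seed(0)
h, d, alpha = 512, 5, 0.1           # width, input dim, last-layer scale
W = torch.randn(h, d, requires_grad=True)
a = torch.randn(h, requires_grad=True)

def f(x):
    # network output, scaled as (alpha / sqrt(h)) * a^T relu(W x)
    return (alpha / h ** 0.5) * torch.relu(x @ W.T) @ a

def ntk_gram(xs):
    # empirical tangent-kernel Gram matrix from per-example gradients
    grads = []
    for x in xs:
        g = torch.autograd.grad(f(x.unsqueeze(0)).squeeze(), (W, a))
        grads.append(torch.cat([t.flatten() for t in g]))
    G = torch.stack(grads)
    return G @ G.T

xs = torch.randn(8, d)
ys = torch.sign(xs[:, 0])            # toy labels
K0 = ntk_gram(xs).detach()

opt = torch.optim.SGD([W, a], lr=0.5)
for _ in range(500):
    opt.zero_grad()
    loss = ((f(xs) - ys) ** 2).mean()
    loss.backward()
    opt.step()

K1 = ntk_gram(xs).detach()
print("relative NTK change:", (torch.norm(K1 - K0) / torch.norm(K0)).item())
```

Rerunning the same script while sweeping $\alpha$ (and the width $h$) is the kind of comparison that would expose the crossover scale $\alpha^* \sim h^{-1/2}$ described above.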