Previous work has shown that DNNs with large depth $L$ and
$L_2$-regularization are biased towards learning low-dimensional
representations of the inputs, which can be interpreted as minimizing a notion
of rank $R^{(0)}(f)$ of the learned function $f$, conjectured to be the
Bottleneck rank. We compute finite depth corrections to this result, revealing
a measure $R^{(1)}$ of regularity which bounds the pseudo-determinant of the
Jacobian $\left|Jf(x)\right|_{+}$ and is subadditive under composition and
addition. This formalizes a balance between learning low-dimensional
representations and minimizing complexity/irregularity in the feature maps,
allowing the network to learn the `right' inner dimension. We also show how
large learning rates control the regularity of the learned function.
Finally, we use these theoretical tools to prove the conjectured bottleneck
structure in the learned features as $L \to \infty$: for large depths, almost all
hidden representations concentrate around $R^{(0)}(f)$-dimensional
representations. These limiting low-dimensional representations can be described
using the second correction $R^{(2)}$.
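For illustration, the subadditivity of $R^{(1)}$ stated above can be written schematically as (a sketch only; the precise statements and constants are those of the main text):
\[
  R^{(1)}(f \circ g) \;\le\; R^{(1)}(f) + R^{(1)}(g),
  \qquad
  R^{(1)}(f + g) \;\le\; R^{(1)}(f) + R^{(1)}(g).
\]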