Batch Normalization Provably Avoids Rank Collapse for Randomly Initialised Deep Networks
Randomly initialized neural networks are known to become harder to train with
increasing depth, unless architectural enhancements like residual connections
and batch normalization are used. We here investigate this phenomenon by
revisiting the connection between random initialization in deep networks and
spectral instabilities in products of random matrices. Given the rich
literature on random matrices, it is not surprising to find that the rank of
the intermediate representations in unnormalized networks collapses quickly
with depth. In this work we highlight the fact that batch normalization is an
effective strategy to avoid rank collapse for both linear and ReLU networks.
Leveraging tools from Markov chain theory, we derive a meaningful lower rank
bound in deep linear networks. Empirically, we also demonstrate that this rank
robustness generalizes to ReLU nets. Finally, we conduct an extensive set of
experiments on real-world data sets, which confirm that rank stability is
indeed a crucial condition for training modern-day deep neural architectures.
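
The rank-collapse phenomenon described in the abstract is easy to reproduce numerically. The sketch below is not the authors' code; the width, depth, batch size, 1/sqrt(width) initialisation scaling, and the 1e-6 rank tolerance are all illustrative assumptions. It propagates a random batch through a deep random linear network and records the numerical rank of each intermediate representation, with and without a parameter-free batch-norm step after every layer: the unnormalised network's rank decays with depth, while the normalised one stays close to full rank, mirroring the linear-network result stated above.

```python
import numpy as np

rng = np.random.default_rng(0)
width, depth, batch = 64, 100, 128  # illustrative sizes, not from the paper

def hidden_ranks(use_batchnorm):
    """Push a random batch through `depth` random linear layers and record
    the numerical rank of every intermediate representation."""
    X = rng.standard_normal((batch, width))
    ranks = []
    for _ in range(depth):
        # Random Gaussian layer with standard 1/sqrt(width) scaling.
        W = rng.standard_normal((width, width)) / np.sqrt(width)
        X = X @ W
        if use_batchnorm:
            # Batch norm without learnable scale/shift: standardise each
            # feature across the batch dimension.
            X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)
        # Count singular values within a factor of 1e6 of the largest one.
        tol = 1e-6 * np.linalg.norm(X, 2)
        ranks.append(int(np.linalg.matrix_rank(X, tol=tol)))
    return ranks

print("vanilla linear net :", hidden_ranks(False)[::25])  # rank shrinks with depth
print("with batch norm    :", hidden_ranks(True)[::25])   # rank stays near `width`
```

Replacing the linear map with `np.maximum(X @ W, 0)` gives the analogous ReLU experiment; the abstract reports that the rank robustness under batch normalization is observed empirically in that setting as well.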