In the pursuit of explaining implicit regularization in deep learning,
prominent focus was given to matrix and tensor factorizations, which correspond
to simplified neural networks. It was shown that these models exhibit an
implicit tendency towards low matrix and tensor ranks, respectively. Drawing
closer to practical deep learning, the current paper theoretically analyzes the
implicit regularization in hierarchical tensor factorization, a model
equivalent to certain deep convolutional neural networks. Through a dynamical
systems lens, we overcome challenges associated with hierarchy, and establish
implicit regularization towards low hierarchical tensor rank. This translates
to an implicit regularization towards locality for the associated convolutional
networks. Inspired by our theory, we design explicit regularization
discouraging locality, and demonstrate its ability to improve the performance
of modern convolutional networks on non-local tasks, in defiance of
conventional wisdom by which architectural changes are needed. Our work
highlights the potential of enhancing neural networks via theoretical analysis
of their implicit regularization.Comment: Accepted to ICML 202