Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks
In the pursuit of explaining implicit regularization in deep learning,
prominent focus was given to matrix and tensor factorizations, which correspond
to simplified neural networks. It was shown that these models exhibit an
implicit tendency towards low matrix and tensor ranks, respectively. Drawing
closer to practical deep learning, the current paper theoretically analyzes the
implicit regularization in hierarchical tensor factorization, a model
equivalent to certain deep convolutional neural networks. Through a dynamical
systems lens, we overcome challenges associated with hierarchy, and establish
implicit regularization towards low hierarchical tensor rank. This translates
to an implicit regularization towards locality for the associated convolutional
networks. Inspired by our theory, we design explicit regularization
discouraging locality, and demonstrate its ability to improve the performance
of modern convolutional networks on non-local tasks, in defiance of
conventional wisdom by which architectural changes are needed. Our work
highlights the potential of enhancing neural networks via theoretical analysis
of their implicit regularization.
Comment: Accepted to ICML 202
Compression-aware Training of Deep Networks
In recent years, great progress has been made in a variety of application
domains thanks to the development of increasingly deep neural networks.
Unfortunately, the huge number of units of these networks makes them expensive
both computationally and memory-wise. To overcome this, exploiting the fact
that deep networks are over-parametrized, several compression strategies have
been proposed. These methods, however, typically start from a network that has
been trained in a standard manner, without considering such a future
compression. In this paper, we propose to explicitly account for compression in
the training process. To this end, we introduce a regularizer that encourages
the parameter matrix of each layer to have low rank during training. We show
that accounting for compression during training allows us to learn much more
compact, yet at least as effective, models than state-of-the-art compression
techniques.
Comment: Accepted at NIPS 201
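
The abstract above does not say which low-rank regularizer the authors use, so the following is only a minimal sketch of the general idea. It assumes the nuclear norm (the sum of singular values), a common convex surrogate for matrix rank, as the penalty. The function names, the `strength` value, and the post-training truncation step are all hypothetical illustrations, not the paper's method.

```python
import numpy as np


def nuclear_norm(weight):
    """Sum of singular values: a convex surrogate for matrix rank (assumed penalty)."""
    return float(np.linalg.svd(weight, compute_uv=False).sum())


def regularized_loss(task_loss, layer_weights, strength=1e-3):
    """Task loss plus a low-rank penalty summed over every layer's parameter matrix.

    `strength` is an illustrative hyperparameter, not a value from the paper.
    """
    penalty = sum(nuclear_norm(w) for w in layer_weights)
    return task_loss + strength * penalty


def truncate_rank(weight, rank):
    """Compress a trained layer by keeping only its `rank` largest singular values."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    # Scale the kept left singular vectors by their singular values, then recombine.
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layers = [rng.normal(size=(8, 4)), rng.normal(size=(4, 2))]
    print("loss with penalty:", regularized_loss(0.5, layers))
    print("compressed shape:", truncate_rank(layers[0], rank=2).shape)
```

A layer whose penalized weights end up close to low rank loses little when truncated, which is the intuition behind accounting for compression during training rather than only afterwards.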