Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks
In the pursuit of explaining implicit regularization in deep learning,
prominent focus was given to matrix and tensor factorizations, which correspond
to simplified neural networks. It was shown that these models exhibit an
implicit tendency towards low matrix and tensor ranks, respectively. Drawing
closer to practical deep learning, the current paper theoretically analyzes the
implicit regularization in hierarchical tensor factorization, a model
equivalent to certain deep convolutional neural networks. Through a dynamical
systems lens, we overcome challenges associated with hierarchy, and establish
implicit regularization towards low hierarchical tensor rank. This translates
to an implicit regularization towards locality for the associated convolutional
networks. Inspired by our theory, we design explicit regularization
discouraging locality, and demonstrate its ability to improve the performance
of modern convolutional networks on non-local tasks, in defiance of
conventional wisdom by which architectural changes are needed. Our work
highlights the potential of enhancing neural networks via theoretical analysis
of their implicit regularization.Comment: Accepted to ICML 202
Implicit Regularization in Deep Matrix Factorization
Efforts to understand the generalization mystery in deep learning have led to
the belief that gradient-based optimization induces a form of implicit
regularization, a bias towards models of low "complexity." We study the
implicit regularization of gradient descent over deep linear neural networks
for matrix completion and sensing, a model referred to as deep matrix
factorization. Our first finding, supported by theory and experiments, is that
adding depth to a matrix factorization enhances an implicit tendency towards
low-rank solutions, oftentimes leading to more accurate recovery. Secondly, we
present theoretical and empirical arguments questioning a nascent view by which
implicit regularization in matrix factorization can be captured using simple
mathematical norms. Our results point to the possibility that the language of
standard regularizers may not be rich enough to fully encompass the implicit
regularization brought forth by gradient-based optimization.
Comment: Published at the Conference on Neural Information Processing Systems (NeurIPS) 2019
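The following is a minimal sketch of the deep matrix factorization setting described above: matrix completion with a depth-3 linear factorization trained by gradient descent from small initialization, while tracking the singular values of the product matrix. The problem sizes, depth, observation count, initialization scale, and learning rate are assumptions for illustration; this is not a reproduction of the paper's experiments.

```python
import torch

torch.manual_seed(0)
n, depth, n_obs = 20, 3, 120                     # assumed problem sizes
target = torch.randn(n, 2) @ torch.randn(2, n)   # rank-2 ground truth
rows = torch.randint(0, n, (n_obs,))
cols = torch.randint(0, n, (n_obs,))
y = target[rows, cols]

# Deep matrix factorization: product of `depth` square factors, small init
factors = [torch.randn(n, n) * 0.05 for _ in range(depth)]
for W in factors:
    W.requires_grad_(True)

def product(ws):
    out = ws[0]
    for W in ws[1:]:
        out = W @ out
    return out

opt = torch.optim.SGD(factors, lr=0.3)
for step in range(10000):
    opt.zero_grad()
    M = product(factors)
    loss = ((M[rows, cols] - y) ** 2).mean()
    loss.backward()
    opt.step()
    if step % 2000 == 0:
        sv = torch.linalg.svdvals(product(factors)).detach()
        print(f'step {step:5d}  loss {loss.item():.4f}  '
              f'top-5 sv {sv[:5].numpy().round(2)}')
```

The abstract's first finding corresponds, in this toy setup, to the observation that increasing `depth` tends to concentrate the recovered matrix's spectrum on fewer singular values than a shallow (depth-2) factorization trained the same way.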
Why neural networks find simple solutions: the many regularizers of geometric complexity
In many contexts, simpler models are preferable to more complex models and
the control of this model complexity is the goal for many methods in machine
learning such as regularization, hyperparameter tuning and architecture design.
In deep learning, it has been difficult to understand the underlying mechanisms
of complexity control, since many traditional measures are not naturally
suitable for deep neural networks. Here we develop the notion of geometric
complexity, which is a measure of the variability of the model function,
computed using a discrete Dirichlet energy. Using a combination of theoretical
arguments and empirical results, we show that many common training heuristics
such as parameter norm regularization, spectral norm regularization, flatness
regularization, implicit gradient regularization, noise regularization and the
choice of parameter initialization all act to control geometric complexity,
providing a unifying framework in which to characterize the behavior of deep
learning models.
Comment: Accepted as a NeurIPS 2022 paper
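As a concrete illustration of the quantity named in this abstract, here is a minimal sketch of geometric complexity computed as a discrete Dirichlet energy over a data batch: the mean squared Frobenius norm of the model's input-output Jacobian. The network architecture, batch, and normalization are assumptions for illustration, not the paper's code.

```python
import torch
from torch.func import jacrev, vmap

torch.manual_seed(0)
model = torch.nn.Sequential(
    torch.nn.Linear(10, 32), torch.nn.ReLU(), torch.nn.Linear(32, 3)
)

def geometric_complexity(f, x):
    # Per-example Jacobian of f w.r.t. its input, shape (batch, out_dim, in_dim)
    jac = vmap(jacrev(f))(x)
    # Discrete Dirichlet energy: mean squared Frobenius norm over the batch
    return (jac ** 2).sum(dim=(-2, -1)).mean()

x = torch.randn(64, 10)  # assumed data batch
print(geometric_complexity(model, x).item())
```

Monitoring this scalar during training under different regularizers (weight decay, spectral norm penalties, gradient noise, initialization scale) is one way to probe the unifying claim of the abstract, under the assumptions stated above.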