Implicit Regularization in Deep Matrix Factorization
Efforts to understand the generalization mystery in deep learning have led to
the belief that gradient-based optimization induces a form of implicit
regularization, a bias towards models of low "complexity." We study the
implicit regularization of gradient descent over deep linear neural networks
for matrix completion and sensing, a model referred to as deep matrix
factorization. Our first finding, supported by theory and experiments, is that
adding depth to a matrix factorization enhances an implicit tendency towards
low-rank solutions, oftentimes leading to more accurate recovery. Secondly, we
present theoretical and empirical arguments questioning a nascent view by which
implicit regularization in matrix factorization can be captured using simple
mathematical norms. Our results point to the possibility that the language of
standard regularizers may not be rich enough to fully encompass the implicit
regularization brought forth by gradient-based optimization.
Comment: Published at the Conference on Neural Information Processing Systems (NeurIPS) 2019
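To make the setting concrete, the following is a minimal sketch of deep matrix factorization for matrix completion: a depth-N product of weight matrices is fit to the observed entries by gradient descent, and the spectrum of the end-to-end matrix is inspected afterwards. The sizes, depth, init scale, learning rate, and step count are illustrative assumptions, not the authors' exact experimental setup.

```python
# Minimal sketch (PyTorch): gradient descent over a deep matrix factorization
# W = W_N ... W_1, fit only to observed entries (matrix completion).
# Sizes, depth, init scale, learning rate, and step count are illustrative
# assumptions and may need tuning.
import torch

torch.manual_seed(0)
d, true_rank, depth = 50, 3, 3
target = torch.randn(d, true_rank) @ torch.randn(true_rank, d)   # low-rank ground truth
mask = (torch.rand(d, d) < 0.3).float()                           # ~30% of entries observed

# Small initialization is what strengthens the implicit bias towards low rank.
factors = [torch.nn.Parameter(1e-2 * torch.randn(d, d)) for _ in range(depth)]
opt = torch.optim.SGD(factors, lr=0.5)

def end_to_end(fs):
    W = fs[0]
    for f in fs[1:]:
        W = f @ W
    return W

for step in range(5000):
    W = end_to_end(factors)
    loss = ((mask * (W - target)) ** 2).sum() / mask.sum()         # fit observed entries only
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    W = end_to_end(factors)
    s = torch.linalg.svdvals(W)
    test_mse = (((1 - mask) * (W - target)) ** 2).sum() / (1 - mask).sum()
    print("top singular values:", s[:6])                           # ideally ~3 dominant ones
    print(f"unobserved-entry MSE: {test_mse:.4f}")
```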
Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks
In the pursuit of explaining implicit regularization in deep learning,
prominent focus was given to matrix and tensor factorizations, which correspond
to simplified neural networks. It was shown that these models exhibit an
implicit tendency towards low matrix and tensor ranks, respectively. Drawing
closer to practical deep learning, the current paper theoretically analyzes the
implicit regularization in hierarchical tensor factorization, a model
equivalent to certain deep convolutional neural networks. Through a dynamical
systems lens, we overcome challenges associated with hierarchy, and establish
implicit regularization towards low hierarchical tensor rank. This translates
to an implicit regularization towards locality for the associated convolutional
networks. Inspired by our theory, we design explicit regularization
discouraging locality, and demonstrate its ability to improve the performance
of modern convolutional networks on non-local tasks, in defiance of
conventional wisdom by which architectural changes are needed. Our work
highlights the potential of enhancing neural networks via theoretical analysis
of their implicit regularization.
Comment: Accepted to ICML 2022
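As a point of reference for the shallow case this abstract starts from (non-hierarchical tensor factorization and its bias towards low tensor rank), here is a hedged sketch of CP tensor completion trained by gradient descent; the hierarchical model analyzed in the paper adds further levels of factorization and is not reproduced here. All sizes and hyperparameters are illustrative assumptions.

```python
# Minimal sketch (PyTorch): gradient descent over a (non-hierarchical) CP tensor
# factorization for tensor completion -- the shallow setting the abstract contrasts
# with the paper's hierarchical model. Sizes and hyperparameters are illustrative.
import torch

torch.manual_seed(0)
d, true_rank, R = 15, 2, 10                                   # mode size, ground-truth CP rank, parameterized rank
A0, B0, C0 = (torch.randn(d, true_rank) for _ in range(3))
target = torch.einsum('ir,jr,kr->ijk', A0, B0, C0)            # low-CP-rank ground truth
mask = (torch.rand(d, d, d) < 0.3).float()                    # ~30% of entries observed

# Over-parameterized CP factors with small init (small init strengthens the low-rank bias).
A, B, C = (torch.nn.Parameter(1e-2 * torch.randn(d, R)) for _ in range(3))
opt = torch.optim.SGD([A, B, C], lr=0.5)

for step in range(10000):
    W = torch.einsum('ir,jr,kr->ijk', A, B, C)
    loss = ((mask * (W - target)) ** 2).sum() / mask.sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    comp_norms = A.norm(dim=0) * B.norm(dim=0) * C.norm(dim=0)  # per-component magnitudes
    print("CP component norms:", comp_norms.sort(descending=True).values)
    # A few dominant components (the rest near zero) indicates an effectively low tensor rank.
```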
Combining Explicit and Implicit Regularization for Efficient Learning in Deep Networks
Works on implicit regularization have studied gradient trajectories during
the optimization process to explain why deep networks favor certain kinds of
solutions over others. In deep linear networks, it has been shown that gradient
descent implicitly regularizes toward low-rank solutions on matrix
completion/factorization tasks. Adding depth not only improves performance on
these tasks but also acts as an accelerative pre-conditioning that further
enhances this bias towards low-rankedness. Inspired by this, we propose an
explicit penalty that mirrors this implicit bias and only takes effect with
certain adaptive gradient optimizers (e.g., Adam). This combination can enable a
degenerate single-layer network to achieve low-rank approximations with
generalization error comparable to deep linear networks, making depth no longer
necessary for learning. The single-layer network also performs competitively with
or outperforms various approaches for matrix completion over a range of parameter
and data regimes despite its simplicity. Together with an optimizer's inductive
bias, our findings suggest that explicit regularization can play a role in
designing different, desirable forms of regularization and that a more nuanced
understanding of this interplay may be necessary.
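As a hedged illustration of pairing an explicit low-rank penalty with an adaptive optimizer, here is a sketch in which a single matrix (no depth) is trained with Adam for matrix completion; the nuclear-norm penalty, its weight, and all other hyperparameters are stand-in assumptions and not necessarily the penalty proposed in the paper.

```python
# Minimal sketch (PyTorch): a single matrix (no depth) trained with Adam plus an
# explicit low-rank penalty on matrix completion. The nuclear-norm penalty, its
# weight, and the hyperparameters are stand-in assumptions, not the paper's penalty.
import torch

torch.manual_seed(0)
d, true_rank = 50, 3
target = torch.randn(d, true_rank) @ torch.randn(true_rank, d)   # low-rank ground truth
mask = (torch.rand(d, d) < 0.3).float()                           # ~30% of entries observed

W = torch.nn.Parameter(1e-2 * torch.randn(d, d))                  # "degenerate" single-layer model
opt = torch.optim.Adam([W], lr=1e-2)
lam = 1e-3                                                         # penalty weight (assumed)

for step in range(5000):
    fit = ((mask * (W - target)) ** 2).sum() / mask.sum()
    penalty = torch.linalg.svdvals(W).sum()                        # nuclear norm as an explicit low-rank bias
    loss = fit + lam * penalty
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    test_mse = (((1 - mask) * (W - target)) ** 2).sum() / (1 - mask).sum()
    print("top singular values:", torch.linalg.svdvals(W)[:6])
    print(f"unobserved-entry MSE: {test_mse:.4f}")
```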
A regularized deep matrix factorized model of matrix completion for image restoration
Matrix completion has become an important approach to image
restoration. Most previous works on matrix completion focus on the low-rank
property by imposing explicit constraints on the recovered matrix, such as a
nuclear-norm constraint or a limit on the dimension of the matrix
factorization. Recently, theoretical work has suggested that deep linear
neural networks have an implicit bias towards low rank in matrix completion.
However, low rank is not adequate to reflect the intrinsic characteristics of a
natural image. Thus, algorithms with only the constraint of low rank are
insufficient to perform image restoration well. In this work, we propose a
Regularized Deep Matrix Factorized (RDMF) model for image restoration, which
utilizes the implicit low-rank bias of deep neural networks and the
explicit bias of total variation. We demonstrate the effectiveness of our RDMF
model with extensive experiments, in which our method surpasses state-of-the-art
models in common examples, especially for restoration from very few
observations. Our work sheds light on a more general framework for solving
other inverse problems by combining the implicit bias of deep learning with
explicit regularization.
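In the spirit of the RDMF model described above, the following sketch combines a deep matrix factorization (implicit low-rank bias) with an explicit total variation penalty on the reconstruction, fit to a subset of observed pixels; the loss form, TV weight, depth, and optimizer settings are illustrative assumptions rather than the authors' exact formulation.

```python
# Minimal sketch (PyTorch) in the spirit of RDMF: a deep matrix factorization fit
# to observed pixels, plus an explicit (anisotropic) total variation penalty on the
# reconstruction. Depth, TV weight, and optimizer settings are illustrative assumptions.
import torch

torch.manual_seed(0)
n = 64
image = torch.rand(n, n)                          # stand-in for a grayscale image in [0, 1]
mask = (torch.rand(n, n) < 0.2).float()           # only ~20% of pixels observed

depth = 3
factors = [torch.nn.Parameter(1e-2 * torch.randn(n, n)) for _ in range(depth)]
opt = torch.optim.SGD(factors, lr=0.5)
tv_weight = 1e-4                                   # explicit total variation strength (assumed)

def reconstruct(fs):
    X = fs[0]
    for f in fs[1:]:
        X = f @ X
    return X

def total_variation(X):
    # Anisotropic TV: sum of absolute differences between neighboring pixels.
    return (X[1:, :] - X[:-1, :]).abs().sum() + (X[:, 1:] - X[:, :-1]).abs().sum()

for step in range(5000):
    X = reconstruct(factors)
    fit = ((mask * (X - image)) ** 2).sum() / mask.sum()
    loss = fit + tv_weight * total_variation(X)
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    X = reconstruct(factors)
    test_mse = (((1 - mask) * (X - image)) ** 2).sum() / (1 - mask).sum()
    print(f"unobserved-pixel MSE: {test_mse:.4f}")
```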
The Inductive Bias of Flatness Regularization for Deep Matrix Factorization
Recent works on over-parameterized neural networks have shown that the
stochasticity in optimizers has the implicit regularization effect of
minimizing the sharpness of the loss function (in particular, the trace of its
Hessian) over the family of zero-loss solutions. More explicit forms of flatness
regularization also empirically improve the generalization performance.
However, it remains unclear why and when flatness regularization leads to
better generalization. This work takes the first step toward understanding the
inductive bias of the minimum trace of the Hessian solutions in an important
setting: learning deep linear networks from linear measurements, also known as
\emph{deep matrix factorization}. We show that for any depth greater than one,
with the standard Restricted Isometry Property (RIP) on the measurements,
minimizing the trace of the Hessian is approximately equivalent to minimizing the
Schatten 1-norm of the corresponding end-to-end matrix parameters (i.e., the
product of all layer matrices), which in turn leads to better generalization.
We empirically verify our theoretical findings on synthetic datasets.
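In symbols, the central claim of this abstract can be restated roughly as follows; the notation (the measurement operator A, the observations y, and the end-to-end matrix W_E) and the informal "approximately" are ours, and the precise RIP conditions and approximation guarantees are given in the paper.

```latex
% Deep linear network with end-to-end matrix W_E = W_L ... W_1, trained to fit
% linear measurements \mathcal{A}(W_E) = y satisfying RIP (notation is ours).
% Informal restatement: the end-to-end matrix of the flattest interpolating
% solution approximately minimizes the Schatten 1-norm over all interpolants.
W_E^{\mathrm{flat}}
  \;\in\; \Big\{ W_L \cdots W_1 \;:\; (W_1,\dots,W_L) \in
      \arg\min_{\mathcal{L}(W_1,\dots,W_L)=0} \operatorname{tr}\!\big(\nabla^2 \mathcal{L}\big) \Big\},
\qquad
W_E^{\mathrm{flat}} \;\approx\; \arg\min_{\mathcal{A}(W)=y} \|W\|_{S_1},
\qquad
\|W\|_{S_1} = \sum_i \sigma_i(W).
```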