Convergence of Gradient Descent for Low-Rank Matrix Approximation
This paper provides a proof of global convergence of gradient search for low-rank matrix approximation. Such approximations have recently been of interest for large-scale problems, as well as for dictionary learning for sparse signal representations and matrix completion. The proof is based on the interpretation of the problem as an optimization on the Grassmann manifold and the Fubini-Study distance on this space.
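For orientation, a standard way to pose this problem on the Grassmann manifold (my notation, not necessarily the paper's): for $A \in \mathbb{R}^{m \times n}$ and target rank $r$,

$$\min_{\mathcal{U} \in \mathrm{Gr}(m, r)} \; \| A - P_{\mathcal{U}} A \|_F^2,$$

where $P_{\mathcal{U}}$ is the orthogonal projection onto the subspace $\mathcal{U}$; convergence of gradient search is then naturally measured by a distance on $\mathrm{Gr}(m, r)$, such as the Fubini-Study distance mentioned above.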
Fast global convergence of gradient descent for low-rank matrix approximation
This paper investigates gradient descent for solving low-rank matrix
approximation problems. We begin by establishing the local linear convergence
of gradient descent for symmetric matrix approximation. Building on this
result, we prove the rapid global convergence of gradient descent, particularly
when initialized with small random values. Remarkably, we show that even with
moderate random initialization, which includes small random initialization as a
special case, gradient descent achieves fast global convergence in scenarios
where the top eigenvalues are identical. Furthermore, we extend our analysis to
address asymmetric matrix approximation problems and investigate the
effectiveness of a retraction-free eigenspace computation method. Numerical
experiments strongly support our theory. In particular, the retraction-free
algorithm outperforms the corresponding Riemannian gradient descent method,
resulting in a significant 29% reduction in runtime.
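A minimal sketch of the symmetric setting studied above (the objective and step size are my own illustrative choices, not the paper's): gradient descent on the factorized objective f(U) = ||A - U U^T||_F^2 / 4, started from a small random initialization.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 50, 3

G = rng.standard_normal((n, r))
A = G @ G.T                              # symmetric PSD target of rank r

U = 1e-3 * rng.standard_normal((n, r))   # small random initialization
eta = 0.5 / np.linalg.norm(A, 2)         # conservative step size

for _ in range(2000):
    U -= eta * (U @ U.T - A) @ U         # gradient of f(U) = ||A - U U^T||_F^2 / 4

print(np.linalg.norm(A - U @ U.T) / np.linalg.norm(A))  # relative error, small after the run
```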
Critical Points and Convergence Analysis of Generative Deep Linear Networks Trained with Bures-Wasserstein Loss
We consider a deep matrix factorization model of covariance matrices trained
with the Bures-Wasserstein distance. While recent works have made important
advances in the study of the optimization problem for overparametrized low-rank
matrix approximation, much emphasis has been placed on discriminative settings
and the square loss. In contrast, our model considers another interesting type
of loss and connects with the generative setting. We characterize the critical
points and minimizers of the Bures-Wasserstein distance over the space of
rank-bounded matrices. For low-rank matrices the Hessian of this loss can
theoretically blow up, which creates challenges to analyze convergence of
optimization methods. We establish convergence results for gradient flow using a
smooth perturbative version of the loss and convergence results for finite step
size gradient descent under certain assumptions on the initial weights.
Comment: 35 pages, 1 figure, accepted at ICML 202
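For reference, a minimal sketch of the Bures-Wasserstein distance between PSD covariance matrices (the general closed-form expression; this is not the paper's code, and the helper names are mine):

```python
import numpy as np

def psd_sqrt(S):
    """Symmetric square root of a PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def bures_wasserstein_sq(S1, S2):
    """Squared Bures-Wasserstein distance:
    BW^2(S1, S2) = tr(S1) + tr(S2) - 2 tr((S1^{1/2} S2 S1^{1/2})^{1/2})."""
    root = psd_sqrt(S1)
    return np.trace(S1) + np.trace(S2) - 2.0 * np.trace(psd_sqrt(root @ S2 @ root))

# Example: a full-rank vs. a rank-deficient covariance -- the low-rank
# regime where, as noted above, the Hessian of this loss can blow up.
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 2))
print(bures_wasserstein_sq(np.eye(4), B @ B.T))
```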
Provably Accelerating Ill-Conditioned Low-rank Estimation via Scaled Gradient Descent, Even with Overparameterization
Many problems encountered in science and engineering can be formulated as
estimating a low-rank object (e.g., matrices and tensors) from incomplete, and
possibly corrupted, linear measurements. Through the lens of matrix and tensor
factorization, one of the most popular approaches is to employ simple iterative
algorithms such as gradient descent (GD) to recover the low-rank factors
directly, which allow for small memory and computation footprints. However, the
convergence rate of GD depends linearly, and sometimes even quadratically, on
the condition number of the low-rank object, and therefore, GD slows down
painstakingly when the problem is ill-conditioned. This chapter introduces a
new algorithmic approach, dubbed scaled gradient descent (ScaledGD), that
provably converges linearly at a constant rate independent of the condition
number of the low-rank object, while maintaining the low per-iteration cost of
gradient descent for a variety of tasks including sensing, robust principal
component analysis and completion. In addition, ScaledGD continues to admit
fast global convergence to the minimax-optimal solution, again almost
independent of the condition number, from a small random initialization when
the rank is over-specified in the presence of Gaussian noise. In total,
ScaledGD highlights the power of appropriate preconditioning in accelerating
nonconvex statistical estimation, where the iteration-varying preconditioners
promote desirable invariance properties of the trajectory with respect to the
symmetry in low-rank factorization without hurting generalization.
Comment: Book chapter for "Explorations in the Mathematics of Data Science - The Inaugural Volume of the Center for Approximation and Mathematical Data Analytics"
Black Box Lie Group Preconditioners for SGD
A matrix free and a low rank approximation preconditioner are proposed to
accelerate the convergence of stochastic gradient descent (SGD) by exploiting
curvature information sampled from Hessian-vector products or finite
differences of parameters and gradients similar to the BFGS algorithm. Both
preconditioners are fitted with an online updating manner minimizing a
criterion that is free of line search and robust to stochastic gradient noise,
and further constrained to be on certain connected Lie groups to preserve their
corresponding symmetry or invariance, e.g., orientation of coordinates by the
connected general linear group with positive determinants. The Lie group's
equivariance property facilitates preconditioner fitting, and its invariance
property saves any need of damping, which is common in second-order optimizers,
but difficult to tune. The learning rate for parameter updating and step size
for preconditioner fitting are naturally normalized, and their default values
work well in most situations.
Comment: HOOML 202
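A toy sketch of the fitting idea, specialized (my simplification, not the paper's matrix-free or low-rank construction) to the simplest connected Lie group of positive diagonal matrices: fit P to pairs of parameter and gradient differences (dtheta, dgrad) by minimizing the criterion dgrad^T P dgrad + dtheta^T P^{-1} dtheta, whose diagonal minimizer is p_i = |dtheta_i| / |dgrad_i|. Deterministic gradients on a quadratic stand in for stochastic ones here.

```python
import numpy as np

def fit_diag_preconditioner(dtheta, dgrad, eps=1e-12):
    # Minimizer of dgrad^T P dgrad + dtheta^T P^{-1} dtheta over diagonal P > 0.
    return np.abs(dtheta) / (np.abs(dgrad) + eps)

rng = np.random.default_rng(0)
dim = 10
H = np.diag(np.linspace(1.0, 100.0, dim))   # ill-conditioned quadratic 0.5 x^T H x
grad = lambda x: H @ x

x = rng.standard_normal(dim)
g = grad(x)
x_new = x - 1e-4 * g                        # small probe step to get a first pair
p = fit_diag_preconditioner(x_new - x, grad(x_new) - g)
x = x_new

lr = 0.5
for _ in range(60):
    g = grad(x)
    x_next = x - lr * p * g                 # preconditioned descent step
    p = fit_diag_preconditioner(x_next - x, grad(x_next) - g)
    x = x_next

print(np.linalg.norm(x))                    # near 0 despite condition number 100
```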
Convergence results for projected line-search methods on varieties of low-rank matrices via Łojasiewicz inequality
The aim of this paper is to derive convergence results for projected line-search methods on the real-algebraic variety $\mathcal{M}_{\le k}$ of real $m \times n$ matrices of rank at most $k$. Such methods extend Riemannian optimization methods, which are successfully used on the smooth manifold $\mathcal{M}_k$ of rank-$k$ matrices, to its closure by taking steps along gradient-related directions in the tangent cone, and afterwards projecting back to $\mathcal{M}_{\le k}$. Considering such a method circumvents the difficulties which arise from the nonclosedness and the unbounded curvature of $\mathcal{M}_k$. The pointwise convergence is obtained for real-analytic functions on the basis of a Łojasiewicz inequality for the projection of the antigradient to the tangent cone. If the derived limit point lies on the smooth part of $\mathcal{M}_{\le k}$, i.e. in $\mathcal{M}_k$, this boils down to more or less known results, but with the benefit that asymptotic convergence rate estimates (for specific step-sizes) can be obtained without an a priori curvature bound, simply from the fact that the limit lies on a smooth manifold. At the same time, one can give a convincing justification for assuming critical points to lie in $\mathcal{M}_k$: if $X$ is a critical point of $f$ on $\mathcal{M}_{\le k}$, then either $X$ has rank $k$, or $\nabla f(X) = 0$.
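A minimal sketch of the projected iteration described above (with an illustrative smooth objective and a fixed step size of my own choosing; the paper analyzes line-search rules): step along the antigradient, then project back to the variety of matrices of rank at most $k$ via truncated SVD.

```python
import numpy as np

def project_rank_k(Y, k):
    """Project onto the variety of matrices of rank at most k (truncated SVD)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

rng = np.random.default_rng(0)
m, n, k = 30, 20, 2
A = rng.standard_normal((m, n))

X = np.zeros((m, n))      # rank 0: a point on the singular part of the variety
alpha = 0.5               # fixed step size for this sketch
for _ in range(50):
    grad = X - A          # gradient of f(X) = 0.5 * ||X - A||_F^2
    X = project_rank_k(X - alpha * grad, k)

# The iterates approach the best rank-k approximation of A.
print(np.linalg.norm(X - project_rank_k(A, k)))
```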