Provably Accelerating Ill-Conditioned Low-rank Estimation via Scaled Gradient Descent, Even with Overparameterization
Many problems encountered in science and engineering can be formulated as
estimating a low-rank object (e.g., matrices and tensors) from incomplete, and
possibly corrupted, linear measurements. Through the lens of matrix and tensor
factorization, one of the most popular approaches is to employ simple iterative
algorithms such as gradient descent (GD) to recover the low-rank factors
directly, which allow for small memory and computation footprints. However, the
convergence rate of GD depends linearly, and sometimes even quadratically, on
the condition number of the low-rank object; as a result, GD becomes painfully
slow when the problem is ill-conditioned. This chapter introduces a
new algorithmic approach, dubbed scaled gradient descent (ScaledGD), that
provably converges linearly at a constant rate independent of the condition
number of the low-rank object, while maintaining the low per-iteration cost of
gradient descent for a variety of tasks including sensing, robust principal
component analysis and completion. In addition, ScaledGD continues to admit
fast global convergence to the minimax-optimal solution, again almost
independent of the condition number, from a small random initialization when
the rank is over-specified in the presence of Gaussian noise. In total,
ScaledGD highlights the power of appropriate preconditioning in accelerating
nonconvex statistical estimation, where the iteration-varying preconditioners
promote desirable invariance properties of the trajectory with respect to the
symmetry in low-rank factorization without hurting generalization.
Comment: Book chapter for "Explorations in the Mathematics of Data Science - The Inaugural Volume of the Center for Approximation and Mathematical Data Analytics". arXiv admin note: text overlap with arXiv:2104.1452