Provably Accelerating Ill-Conditioned Low-rank Estimation via Scaled Gradient Descent, Even with Overparameterization
Many problems encountered in science and engineering can be formulated as
estimating a low-rank object (e.g., matrices and tensors) from incomplete, and
possibly corrupted, linear measurements. Through the lens of matrix and tensor
factorization, one of the most popular approaches is to employ simple iterative
algorithms such as gradient descent (GD) to recover the low-rank factors
directly, allowing for small memory and computation footprints. However, the
convergence rate of GD depends linearly, and sometimes even quadratically, on
the condition number of the low-rank object, and therefore, GD slows down
painfully when the problem is ill-conditioned. This chapter introduces a
new algorithmic approach, dubbed scaled gradient descent (ScaledGD), that
provably converges linearly at a constant rate independent of the condition
number of the low-rank object, while maintaining the low per-iteration cost of
gradient descent for a variety of tasks including sensing, robust principal
component analysis and completion. In addition, ScaledGD continues to admit
fast global convergence to the minimax-optimal solution, again almost
independent of the condition number, from a small random initialization when
the rank is over-specified in the presence of Gaussian noise. In total,
ScaledGD highlights the power of appropriate preconditioning in accelerating
nonconvex statistical estimation, where the iteration-varying preconditioners
promote desirable invariance properties of the trajectory with respect to the
symmetry in low-rank factorization without hurting generalization.
Comment: Book chapter for "Explorations in the Mathematics of Data Science - The Inaugural Volume of the Center for Approximation and Mathematical Data Analytics". arXiv admin note: text overlap with arXiv:2104.1452
Block-Randomized Stochastic Methods for Tensor Ring Decomposition
Tensor ring (TR) decomposition is a simple but effective tensor network for
analyzing and interpreting latent patterns of tensors. In this work, we propose
a doubly randomized optimization framework for computing TR decomposition. It
can be regarded as a sensible mix of randomized block coordinate descent and
stochastic gradient descent, and hence functions in a double-random manner and
can achieve lightweight updates and a small memory footprint. Further, to
improve the convergence, especially for ill-conditioned problems, we propose a
scaled version of the framework that can be viewed as an adaptive
preconditioned or diagonally-scaled variant. Four different probability
distributions for selecting the mini-batch and the adaptive strategy for
determining the step size are also provided. Finally, we present the
theoretical properties and numerical performance of our proposals.
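As a rough illustration of the doubly random structure (the paper's exact formulation, sampling distributions, and scaled variant differ in detail), the sketch below draws one core as the random block coordinate and a uniform mini-batch of entries for the stochastic gradient; the function name and constant step size are our assumptions.

```python
import numpy as np

def block_stochastic_tr(T, ranks, eta=0.05, iters=5000, batch=32, seed=0):
    """Illustrative doubly randomized TR solver: each step draws one core
    (the random block coordinate) and a uniform mini-batch of entries
    (the stochastic gradient), then updates only that core."""
    rng = np.random.default_rng(seed)
    dims, d = T.shape, T.ndim
    cores = [0.5 * rng.standard_normal((ranks[k], dims[k], ranks[(k + 1) % d]))
             for k in range(d)]
    for _ in range(iters):
        k = int(rng.integers(d))                  # random block (core)
        grad = np.zeros_like(cores[k])
        for _ in range(batch):                    # random mini-batch of entries
            idx = tuple(int(rng.integers(n)) for n in dims)
            # cyclic product of all slices except core k's
            P = np.eye(ranks[(k + 1) % d])
            for j in range(1, d):
                m = (k + j) % d
                P = P @ cores[m][:, idx[m], :]
            # entry model: T[idx] ~ trace(G_k[:, i_k, :] @ P)
            resid = np.trace(cores[k][:, idx[k], :] @ P) - T[idx]
            grad[:, idx[k], :] += resid * P.T     # d(entry)/d(slice) = P^T
        cores[k] -= eta * grad / batch
    return cores
```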
Fast and Minimax Optimal Estimation of Low-Rank Matrices via Non-Convex Gradient Descent
We study the problem of estimating a low-rank matrix from noisy measurements,
with the specific goal of achieving minimax optimal error. In practice, the
problem is commonly solved using non-convex gradient descent, due to its
ability to scale to large-scale real-world datasets. In theory, non-convex
gradient descent is capable of achieving minimax error. In practice, however,
it often converges so slowly that it cannot deliver even estimates of modest
accuracy within a reasonable time. On the other hand, methods that improve the
convergence of non-convex gradient descent, through rescaling or
preconditioning, also greatly amplify the measurement noise, resulting in
estimates that are orders of magnitude less accurate than what is
theoretically achievable with minimax optimal error. In this paper, we propose
a slight modification to the usual non-convex gradient descent method that
remedies the issue of slow convergence, while provably preserving its minimax
optimality. Our proposed algorithm has essentially the same per-iteration cost
as non-convex gradient descent, but is guaranteed to converge to minimax error
at a linear rate that is immune to ill-conditioning. Using our proposed
algorithm, we reconstruct a 60 megapixel dataset for a medical imaging
application, and observe significantly decreased reconstruction error compared
to previous approaches.
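The abstract does not spell out the modification, so the sketch below is a hypothetical reading, not the paper's algorithm: a small ridge term added to the Gram matrix damps the preconditioner, which is one plausible way to keep fast, conditioning-free convergence from amplifying the noise.

```python
import numpy as np

def damped_precond_gd(M, r, lam=1e-2, eta=0.5, iters=300, seed=0):
    """Hypothetical sketch (not the paper's stated algorithm): gradient
    descent on f(X) = ||X @ X.T - M||_F^2 / 2 with a damped preconditioner
    (X^T X + lam * I)^{-1}.  The ridge term lam keeps the preconditioner
    bounded when X^T X is nearly singular, so the rescaled step accelerates
    ill-conditioned problems without blowing up the measurement noise."""
    rng = np.random.default_rng(seed)
    n = M.shape[0]
    X = 0.1 * rng.standard_normal((n, r))
    for _ in range(iters):
        g = (X @ X.T - M) @ X                       # Euclidean gradient
        G = X.T @ X + lam * np.eye(r)               # damped r x r Gram
        X = X - eta * np.linalg.solve(G, g.T).T     # preconditioned step
    return X
```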
Learning Linear Dynamical Systems via Spectral Filtering
We present an efficient and practical algorithm for the online prediction of
discrete-time linear dynamical systems with a symmetric transition matrix. We
circumvent the non-convex optimization problem using improper learning:
carefully overparameterize the class of LDSs by a polylogarithmic factor, in
exchange for convexity of the loss functions. From this arises a
polynomial-time algorithm with a near-optimal regret guarantee, with an
analogous sample complexity bound for agnostic learning. Our algorithm is based
on a novel filtering technique, which may be of independent interest: we
convolve the time series with the eigenvectors of a certain Hankel matrix.
Comment: Published as a conference paper at NIPS 201
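Below is a small NumPy sketch of the two concrete steps the abstract names: eigendecomposing the fixed Hankel matrix and convolving the series with its top eigenvectors. The helper names and the sigma**0.25 feature scaling reflect our reading of the wave-filtering construction and are illustrative only.

```python
import numpy as np

def spectral_filters(T, k):
    """Top-k eigenpairs of the fixed T x T Hankel matrix with entries
    Z[i, j] = 2 / ((i + j)**3 - (i + j)), i, j = 1..T."""
    s = np.arange(1, T + 1)
    S = s[:, None] + s[None, :]
    vals, vecs = np.linalg.eigh(2.0 / (S**3 - S))   # ascending order
    return vals[-k:], vecs[:, -k:]                  # keep the k largest

def wave_features(x, k):
    """Convolve the scalar input series x with each filter, producing the
    features on which the prediction of y_t is a convex linear function."""
    T = len(x)
    sigma, phi = spectral_filters(T, k)
    feats = np.zeros((T, k))
    for t in range(T):
        window = x[t::-1]                           # x_t, x_{t-1}, ..., x_0
        feats[t] = sigma**0.25 * (phi[: t + 1].T @ window)
    return feats
```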