15 research outputs found
Sparse Power Factorization: Balancing peakiness and sample complexity
In many applications, one is faced with an inverse problem in which the
observed signal depends bilinearly on two unknown input vectors. Often at least
one of the input vectors is assumed to be sparse, i.e., to have only a few
non-zero entries. Sparse Power Factorization (SPF), proposed by Lee, Wu, and
Bresler, aims to tackle this problem. They have established recovery guarantees
for a somewhat restrictive class of signals under the assumption that the
measurements are random. We generalize these recovery guarantees to a
significantly enlarged and more realistic signal class at the expense of a
moderately increased number of measurements.
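For intuition, the following Python sketch (our own illustration, not the authors' algorithm) shows an SPF-style alternating scheme for measurements y_i = <A_i, u v^T> with an s-sparse factor u. The names spf_sketch and hard_threshold, and the initialization choices, are hypothetical.

```python
import numpy as np

def hard_threshold(x, s):
    """Keep the s largest-magnitude entries of x and zero out the rest."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-s:]
    out[idx] = x[idx]
    return out

def spf_sketch(A_list, y, n1, n2, s, iters=50, seed=0):
    """Alternating least squares with hard thresholding on the sparse factor u,
    for bilinear measurements y_i = <A_i, u v^T> = u^T A_i v (illustrative only)."""
    rng = np.random.default_rng(seed)
    u = hard_threshold(rng.standard_normal(n1), s)
    u /= np.linalg.norm(u)
    v = rng.standard_normal(n2)
    for _ in range(iters):
        # Fix u and solve the least-squares problem in v: y_i ~ (A_i^T u)^T v.
        Bv = np.stack([A.T @ u for A in A_list])   # shape (m, n2)
        v, *_ = np.linalg.lstsq(Bv, y, rcond=None)
        # Fix v, solve for u, then enforce s-sparsity by hard thresholding.
        Bu = np.stack([A @ v for A in A_list])     # shape (m, n1)
        u, *_ = np.linalg.lstsq(Bu, y, rcond=None)
        u = hard_threshold(u, s)
        nu = np.linalg.norm(u)
        if nu > 0:
            u /= nu
    return u, v
```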
Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution
Recent years have seen a flurry of activities in designing provably efficient
nonconvex procedures for solving statistical estimation problems. Due to the
highly nonconvex nature of the empirical loss, state-of-the-art procedures
often require proper regularization (e.g. trimming, regularized cost,
projection) in order to guarantee fast convergence. For vanilla procedures such
as gradient descent, however, prior theory either recommends highly
conservative learning rates to avoid overshooting, or completely lacks
performance guarantees.
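To make the contrast concrete, here is a minimal sketch (ours, and deliberately simplified) of the kind of explicit regularization referred to above: a trimmed gradient for phase retrieval that drops samples with abnormally large |a_i^T z| before averaging. The threshold rule and the name trimmed_gradient are illustrative assumptions, not any specific paper's procedure.

```python
import numpy as np

def trimmed_gradient(A, y, z, clip=3.0):
    """Gradient of the quartic phase-retrieval loss, with samples whose
    |a_i^T z| is abnormally large simply dropped (trimmed) before averaging."""
    Az = A @ z
    # Keep only samples whose size is within `clip` times the RMS level;
    # trimmed-out samples contribute nothing to the gradient.
    keep = np.abs(Az) <= clip * np.sqrt(np.mean(Az ** 2))
    residual = (Az ** 2 - y) * Az * keep
    return A.T @ residual / len(y)
```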
This paper uncovers a striking phenomenon in nonconvex optimization: even in
the absence of explicit regularization, gradient descent enforces proper
regularization implicitly under various statistical models. In fact, gradient
descent follows a trajectory that stays within a basin of favorable geometry,
consisting of points incoherent with the sampling mechanism. This "implicit
regularization" feature allows gradient descent to proceed in a far more
aggressive fashion without overshooting, which in turn results in substantial
computational savings. Focusing on three fundamental statistical estimation
problems, namely phase retrieval, low-rank matrix completion, and blind
deconvolution, we establish that gradient descent achieves near-optimal
statistical and computational guarantees without explicit regularization. In
particular, by marrying statistical modeling with generic optimization theory,
we develop a general recipe for analyzing the trajectories of iterative
algorithms via a leave-one-out perturbation argument. As a byproduct, for noisy
matrix completion, we demonstrate that gradient descent achieves near-optimal
error control -- measured entrywise and by the spectral norm -- which might
be of independent interest.
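As a rough illustration of the vanilla procedure analyzed here, the sketch below (our own code, assuming real-valued Gaussian measurements) runs plain gradient descent on the quartic phase-retrieval loss from a spectral initialization, with no trimming, truncation, or projection. The function name and step-size rule are our assumptions.

```python
import numpy as np

def phase_retrieval_vanilla_gd(A, y, iters=500, mu=0.1):
    """Spectral initialization followed by plain gradient descent on
    f(z) = (1/4m) * sum_i ((a_i^T z)^2 - y_i)^2, with no extra regularization."""
    m, n = A.shape
    # Spectral initialization: top eigenvector of (1/m) * sum_i y_i a_i a_i^T.
    Y = (A.T * y) @ A / m
    _, vecs = np.linalg.eigh(Y)
    z = vecs[:, -1] * np.sqrt(np.mean(y))
    step = mu / np.mean(y)   # step size scaled by the signal energy
    for _ in range(iters):
        Az = A @ z
        grad = A.T @ ((Az ** 2 - y) * Az) / m
        z = z - step * grad
    return z

# Synthetic check: Gaussian design, noiseless quadratic measurements.
rng = np.random.default_rng(1)
n, m = 50, 500
x = rng.standard_normal(n)
A = rng.standard_normal((m, n))
y = (A @ x) ** 2
z = phase_retrieval_vanilla_gd(A, y)
# x is identifiable only up to a global sign, so compare against both signs.
relative_error = min(np.linalg.norm(z - x), np.linalg.norm(z + x)) / np.linalg.norm(x)
```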
Uncertainty quantification for nonconvex tensor completion: Confidence intervals, heteroscedasticity and optimality
We study the distribution and uncertainty of nonconvex optimization for noisy
tensor completion -- the problem of estimating a low-rank tensor given
incomplete and corrupted observations of its entries. Focusing on a two-stage
estimation algorithm proposed by Cai et al. (2019), we characterize the
distribution of this nonconvex estimator down to fine scales. This
distributional theory in turn allows one to construct valid and short
confidence intervals for both the unseen tensor entries and the unknown tensor
factors. The proposed inferential procedure enjoys several important features:
(1) it is fully adaptive to noise heteroscedasticity, and (2) it is data-driven
and automatically adapts to unknown noise distributions. Furthermore, our
findings unveil the statistical optimality of nonconvex tensor completion: it
attains unimprovable accuracy -- including both the rates and the
pre-constants -- when estimating both the unknown tensor and the underlying
tensor factors.
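The generic form of the resulting confidence interval is easy to state: once an entry estimate is approximately Gaussian around the truth with an estimable standard deviation, a (1 - alpha) interval is estimate ± z_{1-alpha/2} · sigma_hat. The sketch below assumes such a plug-in standard deviation is already available; it does not reproduce the paper's actual variance formula, and entry_confidence_interval is a hypothetical name.

```python
from scipy.stats import norm

def entry_confidence_interval(estimate, sigma_hat, alpha=0.05):
    """Two-sided (1 - alpha) confidence interval for a single tensor entry,
    assuming the estimate is approximately N(truth, sigma_hat^2)."""
    z = norm.ppf(1 - alpha / 2)
    return estimate - z * sigma_hat, estimate + z * sigma_hat

# Example: an entry estimated as 1.30 with plug-in standard deviation 0.08.
lo, hi = entry_confidence_interval(1.30, 0.08)   # roughly (1.14, 1.46)
```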