
    Online Stochastic Gradient Descent with Arbitrary Initialization Solves Non-smooth, Non-convex Phase Retrieval

    In recent literature, a general two-step procedure has been formulated for solving the problem of phase retrieval. First, a spectral technique is used to obtain a constant-error initial estimate, following which the estimate is refined to arbitrary precision by first-order optimization of a non-convex loss function. Numerical experiments, however, seem to suggest that simply running the iterative schemes from a random initialization may also lead to convergence, albeit at the cost of slightly higher sample complexity. In this paper, we prove that, in fact, constant step size online stochastic gradient descent (SGD) converges from arbitrary initializations for the non-smooth, non-convex amplitude squared loss objective. In this setting, online SGD is also equivalent to the randomized Kaczmarz algorithm from numerical analysis. Our analysis can easily be generalized to other single index models. It also makes use of new ideas from stochastic process theory, including the notion of a summary state space, which we believe will be of use for the broader field of non-convex optimization.
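
    To make the update concrete, here is a minimal sketch (not the authors' code) of constant step-size online SGD on the amplitude loss 0.5 * (|<a_i, z>| - y_i)^2 from an arbitrary starting point; with unit step size and row normalisation each update is a randomized Kaczmarz step with a sign correction, as the abstract notes. The synthetic Gaussian measurement setup and all names below are illustrative assumptions.

```python
import numpy as np

def online_sgd_phase_retrieval(A, y, eta=1.0, n_passes=20, rng=None):
    """Constant step-size online SGD on the non-smooth amplitude loss
    0.5 * (|<a_i, z>| - y_i)^2, started from a random point.
    Illustrative sketch only; with eta = 1 each update is a randomized
    Kaczmarz step with a sign correction."""
    rng = np.random.default_rng(rng)
    m, n = A.shape
    z = rng.standard_normal(n)               # arbitrary (random) initialization
    for _ in range(n_passes * m):
        i = rng.integers(m)                  # sample one measurement uniformly
        a = A[i]
        inner = a @ z
        # subgradient of 0.5 * (|<a, z>| - y_i)^2 with respect to z
        g = (np.abs(inner) - y[i]) * np.sign(inner) * a
        z -= eta * g / (a @ a)               # Kaczmarz-style row normalisation
    return z

# Tiny demo: recover x up to a global sign from amplitude measurements.
rng = np.random.default_rng(0)
n, m = 50, 400
x = rng.standard_normal(n)
A = rng.standard_normal((m, n))
y = np.abs(A @ x)
x_hat = online_sgd_phase_retrieval(A, y, rng=1)
print(min(np.linalg.norm(x_hat - x), np.linalg.norm(x_hat + x)) / np.linalg.norm(x))
```

    Roughly speaking, eta = 1 recovers the plain Kaczmarz normalisation, while smaller constant step sizes trade convergence speed for a smaller error neighbourhood when the measurements are noisy.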

    Online stochastic gradient descent on non-convex losses from high-dimensional inference

    Stochastic gradient descent (SGD) is a popular algorithm for optimization problems arising in high-dimensional inference tasks. Here one produces an estimator of an unknown parameter from independent samples of data by iteratively optimizing a loss function. This loss function is random and often non-convex. We study the performance of the simplest version of SGD, namely online SGD, from a random start in the setting where the parameter space is high-dimensional. We develop nearly sharp thresholds for the number of samples needed for consistent estimation as one varies the dimension. Our thresholds depend only on an intrinsic property of the population loss which we call the information exponent. In particular, our results do not assume uniform control on the loss itself, such as convexity or uniform derivative bounds. The thresholds we obtain are polynomial in the dimension and the precise exponent depends explicitly on the information exponent. As a consequence of our results, we find that except for the simplest tasks, almost all of the data is used simply in the initial search phase to obtain non-trivial correlation with the ground truth. Upon attaining non-trivial correlation, the descent is rapid and exhibits law-of-large-numbers-type behaviour. We illustrate our approach by applying it to a wide set of inference tasks such as phase retrieval, parameter estimation for generalized linear models, spiked matrix models, and spiked tensor models, as well as supervised learning for single-layer networks with general activation functions. Comment: Substantially revised presentation. Figures added.
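
    The setting in the abstract can be mimicked in a few lines: online SGD on a single-index (generalized linear) model, one fresh Gaussian sample per step, started from a random point on the sphere, with the overlap with the ground truth recorded along the way. This is only an illustrative sketch under assumed synthetic data; the link function, step size, and sample budget are placeholders, and the paper's information-exponent thresholds are not computed here.

```python
import numpy as np

def online_sgd_single_index(theta_star, phi, dphi, n_samples, eta, rng=None):
    """Online SGD from a random start for y = phi(<a, theta_star>),
    using one fresh sample per step and the loss 0.5 * (phi(<a, theta>) - y)^2.
    Returns the final iterate and the trajectory of the overlap <theta_t, theta_star>."""
    rng = np.random.default_rng(rng)
    d = theta_star.size
    theta = rng.standard_normal(d)
    theta /= np.linalg.norm(theta)           # random start on the sphere
    overlap = np.empty(n_samples)
    for t in range(n_samples):
        a = rng.standard_normal(d)           # fresh sample each step: online SGD
        y = phi(a @ theta_star)
        u = a @ theta
        theta -= eta * (phi(u) - y) * dphi(u) * a   # per-sample gradient step
        theta /= np.linalg.norm(theta)       # keep the iterate on the sphere
        overlap[t] = theta @ theta_star
    return theta, overlap

# Example with a quadratic link (the phase-retrieval-type case).
d = 200
rng = np.random.default_rng(0)
theta_star = rng.standard_normal(d)
theta_star /= np.linalg.norm(theta_star)
_, overlap = online_sgd_single_index(theta_star, phi=lambda u: u ** 2,
                                     dphi=lambda u: 2 * u,
                                     n_samples=40 * d, eta=0.05 / d, rng=1)
print(f"final |overlap|: {abs(overlap[-1]):.3f}")
```

    Plotting |overlap| against the step index typically shows a long, nearly flat search phase before a sharp rise, which is the qualitative picture described in the abstract.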

    Nonconvex Low-Rank Tensor Completion from Noisy Data

    We study a noisy tensor completion problem of broad practical interest, namely, the reconstruction of a low-rank tensor from highly incomplete and randomly corrupted observations of its entries. While a variety of prior work has been dedicated to this problem, prior algorithms either are computationally too expensive for large-scale applications, or come with sub-optimal statistical guarantees. Focusing on "incoherent" and well-conditioned tensors of a constant CP rank, we propose a two-stage nonconvex algorithm -- (vanilla) gradient descent following a rough initialization -- that achieves the best of both worlds. Specifically, the proposed nonconvex algorithm faithfully completes the tensor and retrieves all individual tensor factors within nearly linear time, while at the same time enjoying near-optimal statistical guarantees (i.e. minimal sample complexity and optimal estimation accuracy). The estimation errors are evenly spread out across all entries, thus achieving optimal ℓ∞ statistical accuracy. We also discuss how to extend our approach to accommodate asymmetric tensors. The insight conveyed through our analysis of nonconvex optimization might have implications for other tensor estimation problems. Comment: Accepted to Operations Research.
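
    As an illustration of the two-stage recipe (a rough spectral initialization, then vanilla gradient descent on the nonconvex least-squares loss over the observed entries), here is a minimal rank-1, symmetric, noiseless sketch. It is not the paper's algorithm for general CP rank; the sampling model, scaling choices, and step size below are assumptions made for the toy example.

```python
import numpy as np

def rank1_tensor_completion(T_obs, mask, p, n_iters=300):
    """Two-stage sketch for a symmetric rank-1 tensor x ⊗ x ⊗ x observed
    on a random subset of entries (Bernoulli-p mask, no noise)."""
    d = T_obs.shape[0]
    # Stage 1: spectral initialization from the (1/p)-rescaled mode-1 unfolding.
    U, S, _ = np.linalg.svd(T_obs.reshape(d, d * d) / p, full_matrices=False)
    u = U[:, 0] * np.cbrt(S[0])              # crude scale estimate for the factor
    if np.einsum('ijk,i,j,k->', T_obs, u, u, u) < 0:
        u = -u                               # align the sign with the observations
    eta = 0.1 / (np.linalg.norm(u) ** 4 + 1e-12)
    # Stage 2: gradient descent on f(u) = (1/2p) * sum_obs (u_i u_j u_k - T_ijk)^2.
    for _ in range(n_iters):
        R = mask * (np.einsum('i,j,k->ijk', u, u, u) - T_obs)
        grad = (np.einsum('ijk,j,k->i', R, u, u)
                + np.einsum('ijk,i,k->j', R, u, u)
                + np.einsum('ijk,i,j->k', R, u, u)) / p
        u -= eta * grad
    return u

# Tiny demo.
rng = np.random.default_rng(0)
d, p = 20, 0.3
x = rng.standard_normal(d)
T = np.einsum('i,j,k->ijk', x, x, x)
mask = (rng.random((d, d, d)) < p).astype(float)
u_hat = rank1_tensor_completion(T * mask, mask, p)
print(np.linalg.norm(u_hat - x) / np.linalg.norm(x))
```

    Extending this toy sketch to CP rank r would replace the single factor u with a factor matrix and the crude unfolding-based initialization with the more careful spectral estimate the paper describes.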