Online Stochastic Gradient Descent with Arbitrary Initialization Solves Non-smooth, Non-convex Phase Retrieval
In recent literature, a general two-step procedure has been formulated for
solving the problem of phase retrieval. First, a spectral technique is used to
obtain a constant-error initial estimate, which is then refined to arbitrary
precision by first-order optimization of a non-convex loss
function.
function. Numerical experiments, however, seem to suggest that simply running
the iterative schemes from a random initialization may also lead to
convergence, albeit at the cost of slightly higher sample complexity. In this
paper, we prove that, in fact, constant step size online stochastic gradient
descent (SGD) converges from arbitrary initializations for the non-smooth,
non-convex amplitude squared loss objective. In this setting, online SGD is
also equivalent to the randomized Kaczmarz algorithm from numerical analysis.
Our analysis can easily be generalized to other single index models. It also
makes use of new ideas from stochastic process theory, including the notion of
a summary state space, which we believe will be of use for the broader field of
non-convex optimization.
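As a concrete illustration of the equivalence noted above, here is a minimal
NumPy sketch of online SGD on the squared amplitude loss with step size
1/||a_i||^2, which coincides with the randomized Kaczmarz update: each step
projects the iterate onto the hyperplane consistent with the current sign
estimate of one randomly drawn measurement. The Gaussian measurement model,
problem sizes, and function names are illustrative assumptions, not the
paper's exact setup.

```python
import numpy as np

def kaczmarz_phase_retrieval(A, y, iters=50000, seed=0):
    """Online SGD on the loss (|<a_i, x>| - y_i)^2 with step 1/||a_i||^2,
    i.e. randomized Kaczmarz for phase retrieval (illustrative sketch).

    A : (m, n) array of measurement vectors a_i as rows
    y : (m,)   amplitude measurements |<a_i, x_star>|
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = rng.standard_normal(n)              # arbitrary initialization
    for _ in range(iters):
        i = rng.integers(m)                 # one random sample per step
        a = A[i]
        z = a @ x
        # SGD step on (|<a,x>| - y_i)^2: x <- x + (sign(z) y_i - z) a / ||a||^2
        x += (np.sign(z) * y[i] - z) / (a @ a) * a
    return x

# toy usage: recover x_star (up to global sign) from amplitude measurements
rng = np.random.default_rng(1)
n, m = 50, 500
x_star = rng.standard_normal(n)
A = rng.standard_normal((m, n))
y = np.abs(A @ x_star)
x_hat = kaczmarz_phase_retrieval(A, y)
err = min(np.linalg.norm(x_hat - x_star), np.linalg.norm(x_hat + x_star))
print(f"relative error: {err / np.linalg.norm(x_star):.2e}")
```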
Online stochastic gradient descent on non-convex losses from high-dimensional inference
Stochastic gradient descent (SGD) is a popular algorithm for optimization
problems arising in high-dimensional inference tasks. Here one produces an
estimator of an unknown parameter from independent samples of data by
iteratively optimizing a loss function. This loss function is random and often
non-convex. We study the performance of the simplest version of SGD, namely
online SGD, from a random start in the setting where the parameter space is
high-dimensional.
We develop nearly sharp thresholds for the number of samples needed for
consistent estimation as one varies the dimension. Our thresholds depend only
on an intrinsic property of the population loss which we call the information
exponent. In particular, our results do not assume uniform control on the loss
itself, such as convexity or uniform derivative bounds. The thresholds we
obtain are polynomial in the dimension and the precise exponent depends
explicitly on the information exponent. As a consequence of our results, we
find that except for the simplest tasks, almost all of the data is used simply
in the initial search phase to obtain non-trivial correlation with the ground
truth. Upon attaining non-trivial correlation, the descent is rapid and
exhibits law-of-large-numbers-type behaviour.
We illustrate our approach by applying it to a wide set of inference tasks
such as phase retrieval, parameter estimation for generalized linear models,
spiked matrix models, and spiked tensor models, as well as to supervised
learning for single-layer networks with general activation functions.
Comment: Substantially revised presentation. Figures added.
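The search-then-descend picture described above is easy to reproduce in a toy
experiment. The sketch below runs online SGD on the sphere for a single-index
model with a quadratic link (information exponent 2, a phase-retrieval-like
task from the list above), tracking the overlap with the ground truth; the
dimension, step size, and spherical retraction are illustrative choices, not
the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 200                                # ambient dimension
eta = 0.01 / d                         # small constant step (illustrative)
steps = 30000

# single-index model y = phi(<v, a>); quadratic link => information exponent 2
v = rng.standard_normal(d)
v /= np.linalg.norm(v)
phi = lambda t: t ** 2

x = rng.standard_normal(d)
x /= np.linalg.norm(x)                 # random start: overlap of order d**-0.5

for t in range(steps):
    a = rng.standard_normal(d)         # one fresh sample per step (online SGD)
    z = x @ a
    grad = -4.0 * (phi(v @ a) - phi(z)) * z * a   # grad of (y - phi(<x,a>))^2
    x -= eta * grad
    x /= np.linalg.norm(x)             # retract to the unit sphere
    if t % 3000 == 0:
        print(f"step {t:6d}  overlap |<x, v>| = {abs(x @ v):.3f}")
```

The printed overlap should hover near d**-0.5 through an initial search phase
and then climb rapidly toward 1, mirroring the two phases the abstract
describes.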
Nonconvex Low-Rank Tensor Completion from Noisy Data
We study a noisy tensor completion problem of broad practical interest,
namely, the reconstruction of a low-rank tensor from highly incomplete and
randomly corrupted observations of its entries. While a variety of prior work
has been dedicated to this problem, prior algorithms either are computationally
too expensive for large-scale applications, or come with sub-optimal
statistical guarantees. Focusing on "incoherent" and well-conditioned tensors
of a constant CP rank, we propose a two-stage nonconvex algorithm -- (vanilla)
gradient descent following a rough initialization -- that achieves the best of
both worlds. Specifically, the proposed nonconvex algorithm faithfully
completes the tensor and retrieves all individual tensor factors within nearly
linear time, while at the same time enjoying near-optimal statistical
guarantees (i.e. minimal sample complexity and optimal estimation accuracy).
The estimation errors are evenly spread out across all entries, thus achieving
optimal statistical accuracy. We also discuss how to extend our approach to
accommodate asymmetric tensors. The insight conveyed through our analysis of
nonconvex optimization might have implications for other tensor estimation
problems.
Comment: Accepted to Operations Research.
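The two-stage recipe (rough initialization followed by vanilla gradient
descent) is easiest to see in the rank-1 symmetric case. Below is a minimal
NumPy sketch of the gradient-descent stage for noisy completion of a rank-1
symmetric tensor, with a crude random start standing in for the paper's
spectral initialization; problem size, sampling rate, step size, and noise
level are illustrative assumptions, and the paper's algorithm handles general
constant CP rank.

```python
import numpy as np

rng = np.random.default_rng(0)
d, p, sigma = 30, 0.3, 1e-4           # size, sampling rate, noise (illustrative)
eta, iters = 0.02, 3000

# rank-1 symmetric ground truth T = u* (x) u* (x) u*, normalized for simplicity
u_star = rng.standard_normal(d)
u_star /= np.linalg.norm(u_star)
T = np.einsum('i,j,k->ijk', u_star, u_star, u_star)

mask = rng.random((d, d, d)) < p       # each entry observed with probability p
Y = mask * (T + sigma * rng.standard_normal((d, d, d)))

u = 0.1 * rng.standard_normal(d)       # crude start in place of the spectral stage

for _ in range(iters):
    R = mask * (np.einsum('i,j,k->ijk', u, u, u) - Y) / p   # rescaled residual
    # gradient of the sum over observed (i,j,k) of (u_i u_j u_k - Y_ijk)^2 / p:
    # three symmetric terms, one per tensor mode
    grad = 2.0 * (np.einsum('ijk,j,k->i', R, u, u)
                  + np.einsum('ijk,i,k->j', R, u, u)
                  + np.einsum('ijk,i,j->k', R, u, u))
    u -= eta * grad

print("recovery error:", np.linalg.norm(u - u_star))  # no sign ambiguity at odd order
```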