Subexponential-Time Algorithms for Sparse PCA
We study the computational cost of recovering a unit-norm sparse principal
component $x \in \mathbb{R}^n$ planted in a random matrix, in either the Wigner
or Wishart spiked model (observing either $W + \lambda xx^\top$ with $W$ drawn
from the Gaussian orthogonal ensemble, or $N$ independent samples from
$\mathcal{N}(0, I_n + \beta xx^\top)$, respectively). Prior work has shown that
when the signal-to-noise ratio ($\lambda$ or $\beta\sqrt{N/n}$, respectively)
is a small constant and the fraction of nonzero entries in the planted vector
is $\rho$, it is possible to recover $x$ in polynomial time if
$\rho \lesssim 1/\sqrt{n}$. While it is possible to recover $x$ in exponential
time under the weaker condition $\rho \ll 1$, it is believed that
polynomial-time recovery is impossible unless $\rho \lesssim 1/\sqrt{n}$. We
investigate the precise amount of time required for recovery in the "possible
but hard" regime $1/\sqrt{n} \lesssim \rho \ll 1$ by exploring the power of
subexponential-time algorithms, i.e., algorithms running in time
$\exp(n^\delta)$ for some constant $\delta \in (0,1)$. For any
$1/\sqrt{n} \lesssim \rho \ll 1$, we give a recovery algorithm with runtime
roughly $\exp(\rho^2 n)$, demonstrating a smooth tradeoff between sparsity and
runtime. Our family
of algorithms interpolates smoothly between two existing algorithms: the
polynomial-time diagonal thresholding algorithm and the $\exp(\tilde{O}(\rho n))$-time
exhaustive search algorithm. Furthermore, by analyzing the low-degree
likelihood ratio, we give rigorous evidence suggesting that the tradeoff
achieved by our algorithms is optimal.
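As a concrete illustration of the polynomial-time endpoint of this tradeoff, here is a minimal NumPy sketch of diagonal thresholding in the spiked Wishart model. This is not the paper's implementation; all numeric parameters (n, N, beta, rho) and the final eigenvector refinement step are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (not from the paper): dimension n, samples N,
# SNR beta, and sparsity fraction rho.
n, N, beta, rho = 500, 2000, 2.0, 0.02
k = int(rho * n)

# Plant a unit-norm k-sparse principal component x.
x = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x[support] = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)

# Draw N samples from N(0, I + beta * x x^T) via y = z + sqrt(beta) * g * x.
Z = rng.standard_normal((N, n))
g = rng.standard_normal((N, 1))
Y = Z + np.sqrt(beta) * g * x

# Diagonal thresholding: coordinates of x inflate the corresponding
# diagonal entries of the sample covariance, so keep the k largest.
sample_cov = Y.T @ Y / N
est_support = np.argsort(sample_cov.diagonal())[-k:]

# Refine: estimate x by the top eigenvector of the submatrix on that support.
sub = sample_cov[np.ix_(est_support, est_support)]
w, v = np.linalg.eigh(sub)
x_hat = np.zeros(n)
x_hat[est_support] = v[:, -1]

print("overlap |<x_hat, x>| =", abs(x_hat @ x))
```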
Online stochastic gradient descent on non-convex losses from high-dimensional inference
Stochastic gradient descent (SGD) is a popular algorithm for optimization
problems arising in high-dimensional inference tasks. Here one produces an
estimator of an unknown parameter from independent samples of data by
iteratively optimizing a loss function. This loss function is random and often
non-convex. We study the performance of the simplest version of SGD, namely
online SGD, from a random start in the setting where the parameter space is
high-dimensional.
We develop nearly sharp thresholds for the number of samples needed for
consistent estimation as one varies the dimension. Our thresholds depend only
on an intrinsic property of the population loss which we call the information
exponent. In particular, our results do not assume uniform control on the loss
itself, such as convexity or uniform derivative bounds. The thresholds we
obtain are polynomial in the dimension and the precise exponent depends
explicitly on the information exponent. As a consequence of our results, we
find that except for the simplest tasks, almost all of the data is used simply
in the initial search phase to obtain non-trivial correlation with the ground
truth. Upon attaining non-trivial correlation, the descent is rapid and
exhibits law of large numbers type behaviour.
We illustrate our approach by applying it to a wide set of inference tasks
such as phase retrieval, parameter estimation for generalized linear models,
spiked matrix models, and spiked tensor models, as well as for supervised
learning for single-layer networks with general activation functions.
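Phase retrieval, one of the listed tasks, makes online SGD easy to sketch: one fresh sample per iteration, a random start on the sphere, and a long search phase before the overlap with the ground truth takes off. The following is a minimal illustration, not the paper's code; the dimensions, step size, and the renormalization back to the sphere are my own untuned choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes (my choices): dimension d, number of online samples M.
d, M = 200, 200_000
step = 0.01 / d   # small step size, not tuned to the paper's theory

# Ground-truth parameter on the unit sphere.
v = rng.standard_normal(d)
v /= np.linalg.norm(v)

# Random start on the sphere: initial overlap with v is only ~ 1/sqrt(d).
u = rng.standard_normal(d)
u /= np.linalg.norm(u)

for _ in range(M):
    a = rng.standard_normal(d)      # fresh sample each step (online SGD)
    y = (a @ v) ** 2                # noiseless phase-retrieval observation
    r = a @ u
    grad = -4.0 * (y - r**2) * r * a   # gradient of the loss (y - (a.u)^2)^2
    u -= step * grad
    u /= np.linalg.norm(u)          # retract back to the sphere

print("final squared overlap <u, v>^2 =", (u @ v) ** 2)
```

Reporting the squared overlap sidesteps the sign ambiguity of phase retrieval (u and -u are indistinguishable).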
Computational Barriers to Estimation from Low-Degree Polynomials
One fundamental goal of high-dimensional statistics is to detect or recover
structure from noisy data. In many cases, the data can be faithfully modeled by
a planted structure (such as a low-rank matrix) perturbed by random noise. But
even for these simple models, the computational complexity of estimation is
sometimes poorly understood. A growing body of work studies low-degree
polynomials as a proxy for computational complexity: it has been demonstrated
in various settings that low-degree polynomials of the data can match the
statistical performance of the best known polynomial-time algorithms for
detection. While prior work has studied the power of low-degree polynomials for
the task of detecting the presence of hidden structures, it has failed to
address the estimation problem in settings where detection is qualitatively
easier than estimation.
In this work, we extend the method of low-degree polynomials to address
problems of estimation and recovery. For a large class of "signal plus noise"
problems, we give a user-friendly lower bound for the best possible mean
squared error achievable by any degree-D polynomial. To our knowledge, this is
the first instance in which the low-degree polynomial method can establish
low-degree hardness of recovery problems where the associated detection problem
is easy. As applications, we give a tight characterization of the low-degree
minimum mean squared error for the planted submatrix and planted dense subgraph
problems, resolving (in the low-degree framework) open problems about the
computational complexity of recovery in both cases.
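The degree-D MMSE being characterized can be approximated numerically in toy instances by regressing the target onto all monomials of degree at most D in the data. Here is a minimal Monte Carlo sketch for a very small planted-submatrix-style model; the model sizes, SNR, and the choice to estimate the single coordinate u_0 are all illustrative, not the paper's setup.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)

# Toy instance (all sizes illustrative): observe Y = lam * u u^T + W with
# u_i ~ Bernoulli(rho) and symmetric Gaussian noise W; estimate u_0 from Y.
n, lam, rho, D, trials = 8, 2.0, 0.3, 2, 20000

def draw():
    u = (rng.random(n) < rho).astype(float)
    W = rng.standard_normal((n, n))
    W = (W + W.T) / np.sqrt(2)            # symmetric, entries ~ N(0, 1)
    Y = lam * np.outer(u, u) + W
    return Y[np.triu_indices(n)], u[0]

samples = [draw() for _ in range(trials)]
Ys = np.array([s[0] for s in samples])
ts = np.array([s[1] for s in samples])

# Design matrix of all monomials of degree <= D in the observed entries.
m = Ys.shape[1]
cols = [np.ones(trials)]
for deg in range(1, D + 1):
    for idx in itertools.combinations_with_replacement(range(m), deg):
        col = np.ones(trials)
        for i in idx:
            col = col * Ys[:, i]
        cols.append(col)
Phi = np.column_stack(cols)

# Least squares over polynomial coefficients; the in-sample MSE is a Monte
# Carlo proxy for the degree-D MMSE (slightly optimistic, since it overfits;
# a held-out evaluation set would be more careful).
coef, *_ = np.linalg.lstsq(Phi, ts, rcond=None)
mse = np.mean((Phi @ coef - ts) ** 2)
print(f"degree-{D} polynomial MSE for u_0: {mse:.3f} "
      f"(trivial MSE = rho(1-rho) = {rho * (1 - rho):.3f})")
```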
The Overlap Gap Property in Principal Submatrix Recovery
We study support recovery for a $k \times k$ principal submatrix with
elevated mean $\lambda/N$, hidden in an $N \times N$ symmetric mean zero
Gaussian matrix. Here $\lambda > 0$ is a universal constant, and we assume
$k = N\rho$ for some constant $\rho \in (0,1)$. We establish that there exists
a constant $C > 0$ such that the MLE recovers a constant proportion of the
hidden submatrix if $\lambda \geq C\sqrt{\frac{1}{\rho}\log\frac{1}{\rho}}$,
while such recovery is information theoretically impossible if
$\lambda = o\big(\sqrt{\frac{1}{\rho}\log\frac{1}{\rho}}\big)$. The MLE is
computationally intractable in general, and in fact, for $\rho > 0$
sufficiently small, this
problem is conjectured to exhibit a \emph{statistical-computational gap}. To
provide rigorous evidence for this, we study the likelihood landscape for this
problem, and establish that, in part of this conjectured hard regime, the
problem exhibits a variant of the \emph{Overlap-Gap-Property (OGP)}. As a
direct consequence, we establish that a family of local MCMC-based algorithms
fails to achieve optimal recovery. Finally, we establish that for
$\lambda \geq C/\rho$, a simple spectral method recovers a constant proportion of the hidden
submatrix.
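The spectral method in the last sentence is simple enough to sketch. A minimal illustration with the noise normalized so its spectral norm is O(1); the values of N, rho, and lambda below are my own choices, picked so that lambda sits above 1/rho.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative parameters (mine, following the abstract's scaling): hidden
# k x k submatrix with k = rho * N and elevated mean lam / N; noise entries
# N(0, 1/N), so the noise spectral norm is about 2.
N, rho, lam = 1000, 0.05, 60.0    # lam well above 1/rho = 20
k = int(rho * N)

support = rng.choice(N, size=k, replace=False)
u = np.zeros(N)
u[support] = 1.0

W = rng.standard_normal((N, N))
A = (W + W.T) / np.sqrt(2 * N)          # symmetric, entries ~ N(0, 1/N)
A += (lam / N) * np.outer(u, u)         # elevated mean on the hidden block

# Spectral method: take the top eigenvector and keep its k largest
# coordinates (in absolute value) as the estimated support.
vals, vecs = np.linalg.eigh(A)
top = vecs[:, -1]
est = np.argsort(np.abs(top))[-k:]

overlap = len(set(est) & set(support)) / k
print(f"fraction of hidden support recovered: {overlap:.2f}")
```

The planted block contributes a rank-one spike of size roughly lam * rho, so once lam exceeds a constant multiple of 1/rho the spike separates from the bulk and the top eigenvector correlates with the hidden support.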