11 research outputs found

    Subexponential-Time Algorithms for Sparse PCA

    We study the computational cost of recovering a unit-norm sparse principal component $x \in \mathbb{R}^n$ planted in a random matrix, in either the Wigner or Wishart spiked model (observing either $W + \lambda xx^\top$ with $W$ drawn from the Gaussian orthogonal ensemble, or $N$ independent samples from $\mathcal{N}(0, I_n + \beta xx^\top)$, respectively). Prior work has shown that when the signal-to-noise ratio ($\lambda$ or $\beta\sqrt{N/n}$, respectively) is a small constant and the fraction of nonzero entries in the planted vector is $\|x\|_0 / n = \rho$, it is possible to recover $x$ in polynomial time if $\rho \lesssim 1/\sqrt{n}$. While it is possible to recover $x$ in exponential time under the weaker condition $\rho \ll 1$, it is believed that polynomial-time recovery is impossible unless $\rho \lesssim 1/\sqrt{n}$. We investigate the precise amount of time required for recovery in the "possible but hard" regime $1/\sqrt{n} \ll \rho \ll 1$ by exploring the power of subexponential-time algorithms, i.e., algorithms running in time $\exp(n^\delta)$ for some constant $\delta \in (0,1)$. For any $1/\sqrt{n} \ll \rho \ll 1$, we give a recovery algorithm with runtime roughly $\exp(\rho^2 n)$, demonstrating a smooth tradeoff between sparsity and runtime. Our family of algorithms interpolates smoothly between two existing algorithms: the polynomial-time diagonal thresholding algorithm and the $\exp(\rho n)$-time exhaustive search algorithm. Furthermore, by analyzing the low-degree likelihood ratio, we give rigorous evidence suggesting that the tradeoff achieved by our algorithms is optimal. Comment: 44 pages.
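
    The polynomial-time endpoint of this tradeoff is diagonal thresholding. The sketch below is a minimal illustration of that baseline in the Wishart (spiked covariance) model, not the paper's subexponential-time family; the function name and parameter values are illustrative assumptions.

```python
import numpy as np

def diagonal_thresholding_pca(Y, k):
    """Diagonal thresholding baseline for sparse PCA (a sketch, not the paper's algorithm).

    Y : (N, n) array of samples, assumed drawn from N(0, I_n + beta * x x^T).
    k : assumed number of nonzero entries of the planted vector x.
    Returns a unit-norm estimate of x supported on the k highest-variance coordinates.
    """
    N, n = Y.shape
    S = Y.T @ Y / N                            # sample covariance matrix
    support = np.argsort(np.diag(S))[-k:]      # k coordinates with the largest diagonal entries
    evals, evecs = np.linalg.eigh(S[np.ix_(support, support)])
    x_hat = np.zeros(n)
    x_hat[support] = evecs[:, -1]              # top eigenvector of the restricted covariance
    return x_hat

# Toy usage (illustrative parameters): plant a k-sparse unit vector and report the overlap.
rng = np.random.default_rng(0)
n, N, k, beta = 500, 3000, 20, 3.0
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = 1.0 / np.sqrt(k)
g = rng.standard_normal(N)                     # one spike coefficient per sample
Y = rng.standard_normal((N, n)) + np.sqrt(beta) * np.outer(g, x)  # samples from N(0, I_n + beta x x^T)
print(abs(diagonal_thresholding_pca(Y, k) @ x))
```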

    Online stochastic gradient descent on non-convex losses from high-dimensional inference

    Stochastic gradient descent (SGD) is a popular algorithm for optimization problems arising in high-dimensional inference tasks. Here one produces an estimator of an unknown parameter from independent samples of data by iteratively optimizing a loss function. This loss function is random and often non-convex. We study the performance of the simplest version of SGD, namely online SGD, from a random start in the setting where the parameter space is high-dimensional. We develop nearly sharp thresholds for the number of samples needed for consistent estimation as one varies the dimension. Our thresholds depend only on an intrinsic property of the population loss, which we call the information exponent. In particular, our results do not assume uniform control on the loss itself, such as convexity or uniform derivative bounds. The thresholds we obtain are polynomial in the dimension and the precise exponent depends explicitly on the information exponent. As a consequence of our results, we find that except for the simplest tasks, almost all of the data is used simply in the initial search phase to obtain non-trivial correlation with the ground truth. Upon attaining non-trivial correlation, the descent is rapid and exhibits law-of-large-numbers-type behaviour. We illustrate our approach by applying it to a wide set of inference tasks such as phase retrieval, parameter estimation for generalized linear models, spiked matrix models, and spiked tensor models, as well as supervised learning for single-layer networks with general activation functions. Comment: Substantially revised presentation; figures added.
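
    To make the online setting concrete, the sketch below runs online SGD (one fresh sample per iteration, from a random start on the sphere) for one of the listed tasks, phase retrieval. The model, loss, step size, and normalization are illustrative assumptions rather than the paper's exact setup.

```python
import numpy as np

def online_sgd_phase_retrieval(n, num_samples, step, rng):
    """Online SGD sketch for phase retrieval: each sample is used exactly once.

    Assumed model (for illustration): y = (a . theta_star)^2 with a ~ N(0, I_n),
    per-sample loss l(theta) = 0.5 * (y - (a . theta)^2)^2, iterates kept on the sphere.
    """
    theta_star = rng.standard_normal(n)
    theta_star /= np.linalg.norm(theta_star)
    theta = rng.standard_normal(n)
    theta /= np.linalg.norm(theta)                 # random start: overlap ~ 1/sqrt(n)
    for _ in range(num_samples):
        a = rng.standard_normal(n)
        y = (a @ theta_star) ** 2                  # fresh sample
        pred = a @ theta
        grad = -2.0 * (y - pred ** 2) * pred * a   # gradient of the per-sample loss
        theta -= step * grad
        theta /= np.linalg.norm(theta)             # project back onto the unit sphere
    return abs(theta @ theta_star)                 # overlap with the ground truth (up to sign)

n = 200
rng = np.random.default_rng(1)
print(online_sgd_phase_retrieval(n=n, num_samples=50_000, step=0.05 / n, rng=rng))
```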

    Computational Barriers to Estimation from Low-Degree Polynomials

    One fundamental goal of high-dimensional statistics is to detect or recover structure from noisy data. In many cases, the data can be faithfully modeled by a planted structure (such as a low-rank matrix) perturbed by random noise. But even for these simple models, the computational complexity of estimation is sometimes poorly understood. A growing body of work studies low-degree polynomials as a proxy for computational complexity: it has been demonstrated in various settings that low-degree polynomials of the data can match the statistical performance of the best known polynomial-time algorithms for detection. While prior work has studied the power of low-degree polynomials for the task of detecting the presence of hidden structures, it has failed to address the estimation problem in settings where detection is qualitatively easier than estimation. In this work, we extend the method of low-degree polynomials to address problems of estimation and recovery. For a large class of "signal plus noise" problems, we give a user-friendly lower bound for the best possible mean squared error achievable by any degree-$D$ polynomial. To our knowledge, this is the first instance in which the low-degree polynomial method can establish low-degree hardness of recovery problems where the associated detection problem is easy. As applications, we give a tight characterization of the low-degree minimum mean squared error for the planted submatrix and planted dense subgraph problems, resolving (in the low-degree framework) open problems about the computational complexity of recovery in both cases. Comment: 38 pages.
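
    The object being bounded, the best mean squared error achievable by a degree-$D$ polynomial of the data, can be illustrated numerically in a scalar toy model: the optimal degree-$D$ estimator is an L2 projection onto the span of low-degree monomials, which Monte Carlo least squares approximates. The model and parameters below are assumptions for illustration only, not the planted submatrix or dense subgraph settings of the paper.

```python
import numpy as np

def low_degree_mmse_toy(D, rho=0.1, sigma=1.0, num_mc=200_000, seed=0):
    """Monte Carlo estimate of the best degree-D polynomial MSE in a scalar toy model.

    Assumed toy model (illustration only): x = 1 with probability rho, else 0;
    observation y = x + sigma * z with z ~ N(0, 1). The best degree-D polynomial
    estimator of x given y is the L2 projection of x onto span{1, y, ..., y^D},
    approximated here by least squares over Monte Carlo samples.
    """
    rng = np.random.default_rng(seed)
    x = (rng.random(num_mc) < rho).astype(float)
    y = x + sigma * rng.standard_normal(num_mc)
    feats = np.vander(y, D + 1, increasing=True)        # columns: y^0, y^1, ..., y^D
    coef, *_ = np.linalg.lstsq(feats, x, rcond=None)    # optimal polynomial coefficients
    return np.mean((feats @ coef - x) ** 2)

for D in (1, 2, 4, 8):
    print(D, low_degree_mmse_toy(D))                    # MSE is non-increasing in the degree D
```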

    The Overlap Gap Property in Principal Submatrix Recovery

    We study support recovery for a $k \times k$ principal submatrix with elevated mean $\lambda/N$, hidden in an $N \times N$ symmetric mean-zero Gaussian matrix. Here $\lambda > 0$ is a universal constant, and we assume $k = N\rho$ for some constant $\rho \in (0,1)$. We establish that there exists a constant $C > 0$ such that the MLE recovers a constant proportion of the hidden submatrix if $\lambda \geq C \sqrt{\frac{1}{\rho} \log \frac{1}{\rho}}$, while such recovery is information-theoretically impossible if $\lambda = o(\sqrt{\frac{1}{\rho} \log \frac{1}{\rho}})$. The MLE is computationally intractable in general, and in fact, for $\rho > 0$ sufficiently small, this problem is conjectured to exhibit a statistical-computational gap. To provide rigorous evidence for this, we study the likelihood landscape for this problem, and establish that for some $\varepsilon > 0$ and $\sqrt{\frac{1}{\rho} \log \frac{1}{\rho}} \ll \lambda \ll \frac{1}{\rho^{1/2 + \varepsilon}}$, the problem exhibits a variant of the Overlap Gap Property (OGP). As a direct consequence, we establish that a family of local MCMC-based algorithms does not achieve optimal recovery. Finally, we establish that for $\lambda > 1/\rho$, a simple spectral method recovers a constant proportion of the hidden submatrix. Comment: 42 pages, 1 figure.
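
    For intuition on the last claim, the sketch below runs a simple spectral method (top eigenvector of the observed matrix, followed by top-$k$ thresholding) on a toy instance. The normalization (GOE-like noise with entries of variance $1/N$) is an assumption chosen so that $\lambda > 1/\rho$ is exactly the regime in which the planted block produces an outlier eigenvalue above the noise spectral edge; it is not taken from the paper.

```python
import numpy as np

def spectral_submatrix_recovery(A, k):
    """Simple spectral sketch: top eigenvector of A, then keep its k largest coordinates."""
    evals, evecs = np.linalg.eigh(A)
    v = evecs[:, -1]                          # eigenvector of the largest eigenvalue
    v = v if v.sum() >= 0 else -v             # fix the sign (planted direction is nonnegative)
    return np.argsort(v)[-k:]                 # estimated support of the hidden submatrix

# Toy instance (assumed normalization): GOE-like noise with entry variance 1/N, spectral edge ~ 2,
# so the rank-one signal strength lambda * rho crosses the BBP threshold (1) exactly when lambda > 1/rho.
rng = np.random.default_rng(2)
N, rho, lam = 1000, 0.05, 40.0                # lam > 1/rho = 20
k = int(N * rho)
S = rng.choice(N, size=k, replace=False)      # hidden index set
G = rng.standard_normal((N, N)) / np.sqrt(N)
W = (G + G.T) / np.sqrt(2)                    # symmetric mean-zero Gaussian noise
A = W.copy()
A[np.ix_(S, S)] += lam / N                    # elevated mean on the planted k x k block
found = spectral_submatrix_recovery(A, k)
print(len(np.intersect1d(found, S)) / k)      # fraction of the hidden submatrix recovered
```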