11 research outputs found

    Subexponential-Time Algorithms for Sparse PCA

    We study the computational cost of recovering a unit-norm sparse principal component $x \in \mathbb{R}^n$ planted in a random matrix, in either the Wigner or Wishart spiked model (observing either $W + \lambda xx^\top$ with $W$ drawn from the Gaussian orthogonal ensemble, or $N$ independent samples from $\mathcal{N}(0, I_n + \beta xx^\top)$, respectively). Prior work has shown that when the signal-to-noise ratio ($\lambda$ or $\beta\sqrt{N/n}$, respectively) is a small constant and the fraction of nonzero entries in the planted vector is $\|x\|_0 / n = \rho$, it is possible to recover $x$ in polynomial time if $\rho \lesssim 1/\sqrt{n}$. While it is possible to recover $x$ in exponential time under the weaker condition $\rho \ll 1$, it is believed that polynomial-time recovery is impossible unless $\rho \lesssim 1/\sqrt{n}$. We investigate the precise amount of time required for recovery in the "possible but hard" regime $1/\sqrt{n} \ll \rho \ll 1$ by exploring the power of subexponential-time algorithms, i.e., algorithms running in time $\exp(n^\delta)$ for some constant $\delta \in (0,1)$. For any $1/\sqrt{n} \ll \rho \ll 1$, we give a recovery algorithm with runtime roughly $\exp(\rho^2 n)$, demonstrating a smooth tradeoff between sparsity and runtime. Our family of algorithms interpolates smoothly between two existing algorithms: the polynomial-time diagonal thresholding algorithm and the $\exp(\rho n)$-time exhaustive search algorithm. Furthermore, by analyzing the low-degree likelihood ratio, we give rigorous evidence suggesting that the tradeoff achieved by our algorithms is optimal. Comment: 44 pages.
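
    The polynomial-time endpoint of this tradeoff is diagonal thresholding. The sketch below is a minimal illustration of that baseline in the Wishart (spiked covariance) model, not the paper's subexponential-time family; the function name and parameter values are illustrative assumptions.

```python
import numpy as np

def diagonal_thresholding_pca(Y, k):
    """Diagonal thresholding baseline for sparse PCA (a sketch, not the paper's algorithm).

    Y : (N, n) array of samples, assumed drawn from N(0, I_n + beta * x x^T).
    k : assumed number of nonzero entries of the planted vector x.
    Returns a unit-norm estimate of x supported on the k highest-variance coordinates.
    """
    N, n = Y.shape
    S = Y.T @ Y / N                            # sample covariance matrix
    support = np.argsort(np.diag(S))[-k:]      # k coordinates with the largest diagonal entries
    evals, evecs = np.linalg.eigh(S[np.ix_(support, support)])
    x_hat = np.zeros(n)
    x_hat[support] = evecs[:, -1]              # top eigenvector of the restricted covariance
    return x_hat

# Toy usage (illustrative parameters): plant a k-sparse unit vector and report the overlap.
rng = np.random.default_rng(0)
n, N, k, beta = 500, 3000, 20, 3.0
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = 1.0 / np.sqrt(k)
g = rng.standard_normal(N)                     # one spike coefficient per sample
Y = rng.standard_normal((N, n)) + np.sqrt(beta) * np.outer(g, x)  # samples from N(0, I_n + beta x x^T)
print(abs(diagonal_thresholding_pca(Y, k) @ x))
```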

    Online stochastic gradient descent on non-convex losses from high-dimensional inference

    Stochastic gradient descent (SGD) is a popular algorithm for optimization problems arising in high-dimensional inference tasks. Here one produces an estimator of an unknown parameter from independent samples of data by iteratively optimizing a loss function. This loss function is random and often non-convex. We study the performance of the simplest version of SGD, namely online SGD, from a random start in the setting where the parameter space is high-dimensional. We develop nearly sharp thresholds for the number of samples needed for consistent estimation as one varies the dimension. Our thresholds depend only on an intrinsic property of the population loss, which we call the information exponent. In particular, our results do not assume uniform control on the loss itself, such as convexity or uniform derivative bounds. The thresholds we obtain are polynomial in the dimension and the precise exponent depends explicitly on the information exponent. As a consequence of our results, we find that except for the simplest tasks, almost all of the data is used simply in the initial search phase to obtain non-trivial correlation with the ground truth. Upon attaining non-trivial correlation, the descent is rapid and exhibits law-of-large-numbers-type behaviour. We illustrate our approach by applying it to a wide set of inference tasks such as phase retrieval, parameter estimation for generalized linear models, spiked matrix models, and spiked tensor models, as well as supervised learning for single-layer networks with general activation functions. Comment: Substantially revised presentation; figures added.
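
    To make the online setting concrete, the sketch below runs online SGD (one fresh sample per iteration, from a random start on the sphere) for one of the listed tasks, phase retrieval. The model, loss, step size, and normalization are illustrative assumptions rather than the paper's exact setup.

```python
import numpy as np

def online_sgd_phase_retrieval(n, num_samples, step, rng):
    """Online SGD sketch for phase retrieval: each sample is used exactly once.

    Assumed model (for illustration): y = (a . theta_star)^2 with a ~ N(0, I_n),
    per-sample loss l(theta) = 0.5 * (y - (a . theta)^2)^2, iterates kept on the sphere.
    """
    theta_star = rng.standard_normal(n)
    theta_star /= np.linalg.norm(theta_star)
    theta = rng.standard_normal(n)
    theta /= np.linalg.norm(theta)                 # random start: overlap ~ 1/sqrt(n)
    for _ in range(num_samples):
        a = rng.standard_normal(n)
        y = (a @ theta_star) ** 2                  # fresh sample
        pred = a @ theta
        grad = -2.0 * (y - pred ** 2) * pred * a   # gradient of the per-sample loss
        theta -= step * grad
        theta /= np.linalg.norm(theta)             # project back onto the unit sphere
    return abs(theta @ theta_star)                 # overlap with the ground truth (up to sign)

n = 200
rng = np.random.default_rng(1)
print(online_sgd_phase_retrieval(n=n, num_samples=50_000, step=0.05 / n, rng=rng))
```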

    Computational Barriers to Estimation from Low-Degree Polynomials

    One fundamental goal of high-dimensional statistics is to detect or recover structure from noisy data. In many cases, the data can be faithfully modeled by a planted structure (such as a low-rank matrix) perturbed by random noise. But even for these simple models, the computational complexity of estimation is sometimes poorly understood. A growing body of work studies low-degree polynomials as a proxy for computational complexity: it has been demonstrated in various settings that low-degree polynomials of the data can match the statistical performance of the best known polynomial-time algorithms for detection. While prior work has studied the power of low-degree polynomials for the task of detecting the presence of hidden structures, it has failed to address the estimation problem in settings where detection is qualitatively easier than estimation. In this work, we extend the method of low-degree polynomials to address problems of estimation and recovery. For a large class of "signal plus noise" problems, we give a user-friendly lower bound for the best possible mean squared error achievable by any degree-$D$ polynomial. To our knowledge, this is the first instance in which the low-degree polynomial method can establish low-degree hardness of recovery problems where the associated detection problem is easy. As applications, we give a tight characterization of the low-degree minimum mean squared error for the planted submatrix and planted dense subgraph problems, resolving (in the low-degree framework) open problems about the computational complexity of recovery in both cases. Comment: 38 pages.
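
    The object being bounded, the best mean squared error achievable by a degree-$D$ polynomial of the data, can be illustrated numerically in a scalar toy model: the optimal degree-$D$ estimator is an L2 projection onto the span of low-degree monomials, which Monte Carlo least squares approximates. The model and parameters below are assumptions for illustration only, not the planted submatrix or dense subgraph settings of the paper.

```python
import numpy as np

def low_degree_mmse_toy(D, rho=0.1, sigma=1.0, num_mc=200_000, seed=0):
    """Monte Carlo estimate of the best degree-D polynomial MSE in a scalar toy model.

    Assumed toy model (illustration only): x = 1 with probability rho, else 0;
    observation y = x + sigma * z with z ~ N(0, 1). The best degree-D polynomial
    estimator of x given y is the L2 projection of x onto span{1, y, ..., y^D},
    approximated here by least squares over Monte Carlo samples.
    """
    rng = np.random.default_rng(seed)
    x = (rng.random(num_mc) < rho).astype(float)
    y = x + sigma * rng.standard_normal(num_mc)
    feats = np.vander(y, D + 1, increasing=True)        # columns: y^0, y^1, ..., y^D
    coef, *_ = np.linalg.lstsq(feats, x, rcond=None)    # optimal polynomial coefficients
    return np.mean((feats @ coef - x) ** 2)

for D in (1, 2, 4, 8):
    print(D, low_degree_mmse_toy(D))                    # MSE is non-increasing in the degree D
```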

    The Overlap Gap Property in Principal Submatrix Recovery

    We study support recovery for a $k \times k$ principal submatrix with elevated mean $\lambda/N$, hidden in an $N \times N$ symmetric mean-zero Gaussian matrix. Here $\lambda > 0$ is a universal constant, and we assume $k = N\rho$ for some constant $\rho \in (0,1)$. We establish that there exists a constant $C > 0$ such that the MLE recovers a constant proportion of the hidden submatrix if $\lambda \geq C \sqrt{\frac{1}{\rho} \log \frac{1}{\rho}}$, while such recovery is information-theoretically impossible if $\lambda = o(\sqrt{\frac{1}{\rho} \log \frac{1}{\rho}})$. The MLE is computationally intractable in general, and in fact, for $\rho > 0$ sufficiently small, this problem is conjectured to exhibit a statistical-computational gap. To provide rigorous evidence for this, we study the likelihood landscape for this problem, and establish that for some $\varepsilon > 0$ and $\sqrt{\frac{1}{\rho} \log \frac{1}{\rho}} \ll \lambda \ll \frac{1}{\rho^{1/2 + \varepsilon}}$, the problem exhibits a variant of the Overlap Gap Property (OGP). As a direct consequence, we establish that a family of local MCMC-based algorithms does not achieve optimal recovery. Finally, we establish that for $\lambda > 1/\rho$, a simple spectral method recovers a constant proportion of the hidden submatrix. Comment: 42 pages, 1 figure.
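
    For intuition on the last claim, the sketch below runs a simple spectral method (top eigenvector of the observed matrix, followed by top-$k$ thresholding) on a toy instance. The normalization (GOE-like noise with entries of variance $1/N$) is an assumption chosen so that $\lambda > 1/\rho$ is exactly the regime in which the planted block produces an outlier eigenvalue above the noise spectral edge; it is not taken from the paper.

```python
import numpy as np

def spectral_submatrix_recovery(A, k):
    """Simple spectral sketch: top eigenvector of A, then keep its k largest coordinates."""
    evals, evecs = np.linalg.eigh(A)
    v = evecs[:, -1]                          # eigenvector of the largest eigenvalue
    v = v if v.sum() >= 0 else -v             # fix the sign (planted direction is nonnegative)
    return np.argsort(v)[-k:]                 # estimated support of the hidden submatrix

# Toy instance (assumed normalization): GOE-like noise with entry variance 1/N, spectral edge ~ 2,
# so the rank-one signal strength lambda * rho crosses the BBP threshold (1) exactly when lambda > 1/rho.
rng = np.random.default_rng(2)
N, rho, lam = 1000, 0.05, 40.0                # lam > 1/rho = 20
k = int(N * rho)
S = rng.choice(N, size=k, replace=False)      # hidden index set
G = rng.standard_normal((N, N)) / np.sqrt(N)
W = (G + G.T) / np.sqrt(2)                    # symmetric mean-zero Gaussian noise
A = W.copy()
A[np.ix_(S, S)] += lam / N                    # elevated mean on the planted k x k block
found = spectral_submatrix_recovery(A, k)
print(len(np.intersect1d(found, S)) / k)      # fraction of the hidden submatrix recovered
```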