Sparse CCA: Adaptive Estimation and Computational Barriers
Canonical correlation analysis is a classical technique for exploring the
relationship between two sets of variables. It has important applications in
analyzing high-dimensional datasets originating from genomics, imaging and other
fields. This paper considers adaptive minimax and computationally tractable
estimation of leading sparse canonical coefficient vectors in high dimensions.
First, we establish separate minimax estimation rates for canonical coefficient
vectors of each set of random variables under no structural assumption on
marginal covariance matrices. Second, we propose a computationally feasible
estimator to attain the optimal rates adaptively under an additional sample
size condition. Finally, we show that a sample size condition of this kind is
needed for any randomized polynomial-time estimator to be consistent, assuming
hardness of certain instances of the Planted Clique detection problem. The
result is faithful to the Gaussian models used in the paper. As a byproduct, we
obtain the first computational lower bounds for sparse PCA under the Gaussian
single spiked covariance model.
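To make the Gaussian single spiked covariance model concrete, the following sketch draws $n$ observations from $N(0, I_p + \theta v v^\top)$ with a $k$-sparse unit vector $v$ and estimates $v$ with a top-$k$ variant of diagonal thresholding, a simple classical sparse PCA heuristic. It is not the adaptive estimator of the paper, and it succeeds here only because the spike is strong; the helper names, the placement of the support and all sizes are hypothetical choices.

```python
import numpy as np

def sample_spiked(n, p, k, theta, rng):
    """Draw n rows from N(0, I_p + theta * v v^T) with a k-sparse unit spike v."""
    v = np.zeros(p)
    v[:k] = 1.0 / np.sqrt(k)          # hypothetical choice: support = first k coordinates
    z = rng.standard_normal((n, p))   # isotropic N(0, I_p) noise
    g = rng.standard_normal(n)        # per-sample N(0, 1) loading on the spike
    x = z + np.sqrt(theta) * np.outer(g, v)   # Cov(x_i) = I_p + theta * v v^T
    return x, v

def diagonal_thresholding(x, k):
    """Keep the k coordinates with the largest sample variance, then return the
    leading eigenvector of the sample covariance restricted to them."""
    n, p = x.shape
    variances = (x ** 2).mean(axis=0)             # the model has mean zero
    support = np.sort(np.argsort(variances)[-k:])  # indices of the k largest variances
    sub = x[:, support]
    cov = sub.T @ sub / n
    _, eigvecs = np.linalg.eigh(cov)              # eigenvalues in ascending order
    v_hat = np.zeros(p)
    v_hat[support] = eigvecs[:, -1]               # leading eigenvector, embedded in R^p
    return v_hat, support

rng = np.random.default_rng(0)
x, v = sample_spiked(n=500, p=1000, k=10, theta=10.0, rng=rng)
v_hat, support = diagonal_thresholding(x, k=10)
print("estimated support:", support)
print("|<v_hat, v>| =", round(abs(v_hat @ v), 3))
```

With $\theta / k = 1$, each support coordinate has variance 2 against 1 elsewhere, which is why a simple variance ranking finds the support at this sample size; the interest of the paper lies in regimes where such easy separation is not available.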
Computational Hardness of Certifying Bounds on Constrained PCA Problems
Given a random $n \times n$ symmetric matrix $W$ drawn from the Gaussian orthogonal ensemble (GOE), we consider the problem of certifying an upper bound on the maximum value of the quadratic form $x^\top W x$ over all vectors $x$ in a constraint set $S \subset \mathbb{R}^n$. For a certain class of normalized constraint sets $S$ we show that, conditional on certain complexity-theoretic assumptions, there is no polynomial-time algorithm certifying a better upper bound than the largest eigenvalue of $W$. A notable special case included in our results is the hypercube $S = \{\pm 1/\sqrt{n}\}^n$, which corresponds to the problem of certifying bounds on the Hamiltonian of the Sherrington-Kirkpatrick spin glass model from statistical physics.
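Every $x \in \{\pm 1/\sqrt{n}\}^n$ has $\|x\|_2 = 1$, so the Rayleigh quotient gives $x^\top W x \le \lambda_{\max}(W)$: the largest eigenvalue is always a valid, efficiently computable certificate. The sketch below checks this against brute force on a tiny instance. The helper names are hypothetical, and the GOE is normalized (off-diagonal variance $1/n$, diagonal variance $2/n$) so that its spectrum fills roughly $[-2, 2]$.

```python
import itertools
import numpy as np

def sample_goe(n, rng):
    """GOE matrix with off-diagonal variance 1/n and diagonal variance 2/n."""
    a = rng.standard_normal((n, n))
    return (a + a.T) / np.sqrt(2 * n)

def spectral_certificate(w):
    """Largest eigenvalue: an upper bound on x^T W x over all unit vectors x,
    hence over the hypercube {+-1/sqrt(n)}^n."""
    return np.linalg.eigvalsh(w)[-1]

def hypercube_max(w):
    """Exact max of x^T W x over x in {+-1/sqrt(n)}^n by enumeration (tiny n only)."""
    n = w.shape[0]
    best = -np.inf
    for signs in itertools.product((-1.0, 1.0), repeat=n):
        x = np.array(signs) / np.sqrt(n)
        best = max(best, x @ w @ x)
    return best

rng = np.random.default_rng(0)
w = sample_goe(12, rng)
bound = spectral_certificate(w)
exact = hypercube_max(w)
print(f"hypercube max = {exact:.3f}, eigenvalue certificate = {bound:.3f}")
```

For large $n$, $\lambda_{\max}(W)$ approaches 2, whereas the true hypercube maximum is known from the Parisi formula to approach about 1.526; the result above says that, under its assumptions, no polynomial-time certificate closes this gap.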
Our proof proceeds in two steps. First, we give a reduction from the detection problem in the negatively-spiked Wishart model to the above certification problem. We then give evidence that this Wishart detection problem is computationally hard below the classical spectral threshold, by showing that no low-degree polynomial can (in expectation) distinguish the spiked and unspiked models. This method for identifying computational thresholds was proposed in a sequence of recent works on the sum-of-squares hierarchy, and is believed to be correct for a large class of problems. Our proof can be seen as constructing a distribution over symmetric matrices that appears computationally indistinguishable from the GOE, yet is supported on matrices whose maximum quadratic form over $x \in S$ is much larger than that of a GOE matrix.
ISSN: 1868-896
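The classical spectral threshold for the negatively-spiked Wishart model can be seen in a small simulation. This sketch (not taken from the paper) draws $N$ samples in $\mathbb{R}^n$ with covariance $I_n + \beta u u^\top$, $-1 < \beta \le 0$, for a uniformly random unit vector $u$, and flags a spike when the smallest eigenvalue of the sample covariance falls below the Marchenko-Pastur lower edge $(1 - \sqrt{\gamma})^2$, $\gamma = n/N$. In the large-$n$ limit this test succeeds when $\beta < -\sqrt{\gamma}$; the hardness evidence in the abstract concerns weaker spikes. The function names, the margin of 0.05, the spike prior and the sizes are hypothetical choices.

```python
import numpy as np

def sample_wishart(n, N, beta, rng):
    """Sample covariance of N draws from N(0, I_n + beta * u u^T); beta = 0 is the null."""
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)                 # uniformly random unit spike direction
    z = rng.standard_normal((N, n))
    # y = (I + c u u^T) z with c = sqrt(1 + beta) - 1, so Cov(y) = I + beta * u u^T.
    c = np.sqrt(1.0 + beta) - 1.0
    y = z + c * np.outer(z @ u, u)
    return y.T @ y / N

def smallest_eigenvalue_test(cov, gamma, margin=0.05):
    """Return True if the smallest eigenvalue lies clearly below the
    Marchenko-Pastur lower edge (1 - sqrt(gamma))^2."""
    edge = (1.0 - np.sqrt(gamma)) ** 2
    return bool(np.linalg.eigvalsh(cov)[0] < edge - margin)

rng = np.random.default_rng(1)
n, N = 200, 800
gamma = n / N                              # sqrt(gamma) = 0.5 is the spectral threshold
results = {}
for beta in (0.0, -0.2, -0.9):
    cov = sample_wishart(n, N, beta, rng)
    results[beta] = smallest_eigenvalue_test(cov, gamma)
    print(f"beta = {beta:+.1f}: lambda_min = {np.linalg.eigvalsh(cov)[0]:.3f}, "
          f"spike flagged = {results[beta]}")
```

Here $\beta = -0.9$ lies beyond the threshold $-\sqrt{\gamma} = -0.5$, so its smallest eigenvalue separates from the bulk, while the null and the weak spike $\beta = -0.2$ stay near the edge and look alike to this test.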