
    Computational Barriers to Estimation from Low-Degree Polynomials

    One fundamental goal of high-dimensional statistics is to detect or recover structure from noisy data. In many cases, the data can be faithfully modeled by a planted structure (such as a low-rank matrix) perturbed by random noise. But even for these simple models, the computational complexity of estimation is sometimes poorly understood. A growing body of work studies low-degree polynomials as a proxy for computational complexity: it has been demonstrated in various settings that low-degree polynomials of the data can match the statistical performance of the best known polynomial-time algorithms for detection. While prior work has studied the power of low-degree polynomials for the task of detecting the presence of hidden structures, it has failed to address the estimation problem in settings where detection is qualitatively easier than estimation. In this work, we extend the method of low-degree polynomials to address problems of estimation and recovery. For a large class of "signal plus noise" problems, we give a user-friendly lower bound for the best possible mean squared error achievable by any degree-D polynomial. To our knowledge, this is the first instance in which the low-degree polynomial method can establish low-degree hardness of recovery problems where the associated detection problem is easy. As applications, we give a tight characterization of the low-degree minimum mean squared error for the planted submatrix and planted dense subgraph problems, resolving (in the low-degree framework) open problems about the computational complexity of recovery in both cases. Comment: 38 pages
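The "best possible mean squared error achievable by any degree-D polynomial" can be written down explicitly. The following is a sketch of one standard formalization, assuming a scalar quantity $x$ to be estimated from data $Y$; the symbols $x$, $Y$, $f$, and $\mathrm{Corr}_{\le D}$ are introduced here for illustration and do not appear in the abstract.

```latex
% Degree-D minimum mean squared error: best polynomial estimator of degree at most D
\mathrm{MMSE}_{\le D}
  \;=\; \inf_{f \in \mathbb{R}[Y]_{\le D}} \mathbb{E}\big[(f(Y) - x)^2\big]

% Equivalent form via the degree-D maximum correlation
\mathrm{Corr}_{\le D}
  \;=\; \sup_{\substack{f \in \mathbb{R}[Y]_{\le D} \\ \mathbb{E}[f(Y)^2] \neq 0}}
        \frac{\mathbb{E}[f(Y)\, x]}{\sqrt{\mathbb{E}[f(Y)^2]}},
\qquad
\mathrm{MMSE}_{\le D} \;=\; \mathbb{E}[x^2] - \mathrm{Corr}_{\le D}^2
```

Under this reading, an upper bound on the correlation is the same thing as a lower bound on the degree-D mean squared error, which is the kind of bound the abstract describes.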

    Optimal detection of sparse principal components in high dimension

    We perform a finite sample analysis of the detection levels for sparse principal components of a high-dimensional covariance matrix. Our minimax optimal test is based on a sparse eigenvalue statistic. Alas, computing this test is known to be NP-complete in general, and we describe a computationally efficient alternative test using convex relaxations. Our relaxation is also proved to detect sparse principal components at near-optimal detection levels, and it performs well on simulated datasets. Moreover, using polynomial-time reductions from theoretical computer science, we bring significant evidence that our results cannot be improved, thus revealing an inherent trade-off between statistical and computational performance. Comment: Published at http://dx.doi.org/10.1214/13-AOS1127 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
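To make the "sparse eigenvalue statistic" concrete, here is a minimal sketch under the assumption that the statistic is the largest eigenvalue over all k-by-k principal submatrices of a covariance matrix. The function names are hypothetical, power iteration is swapped in as a dependency-free eigenvalue routine, and the exhaustive search over subsets illustrates why the exact statistic is expensive. It is not the authors' test or their convex relaxation.

```python
import math
import random
from itertools import combinations


def largest_eigenvalue(matrix, iterations=300, seed=0):
    """Approximate the top eigenvalue of a symmetric PSD matrix by power iteration."""
    rng = random.Random(seed)
    n = len(matrix)
    # Random Gaussian start: almost surely not orthogonal to the top eigenvector.
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    estimate = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        # Rayleigh quotient of the current unit vector.
        estimate = sum(
            v[i] * sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)
        )
    return estimate


def sparse_eigenvalue(cov, k):
    """Brute-force k-sparse eigenvalue: maximum over all size-k coordinate subsets."""
    n = len(cov)
    best = 0.0
    for subset in combinations(range(n), k):
        sub = [[cov[i][j] for j in subset] for i in subset]
        best = max(best, largest_eigenvalue(sub))
    return best
```

The loop visits all n-choose-k subsets, so this is only usable for tiny inputs, which is consistent with the abstract's motivation for a computationally efficient alternative.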

    Random Subgraph Detection Using Queries

    The planted densest subgraph detection problem refers to the task of testing whether a given (random) graph contains a subgraph that is unusually dense. Specifically, we observe an undirected and unweighted graph on $n$ nodes. Under the null hypothesis, the graph is a realization of an Erdős-Rényi graph with edge probability (or density) $q$. Under the alternative, there is a subgraph on $k$ vertices with edge probability $p>q$. The statistical as well as the computational barriers of this problem are well understood for a wide range of the edge parameters $p$ and $q$. In this paper, we consider a natural variant of the above problem, where one can only observe a small part of the graph using adaptive edge queries. For this model, we determine the number of queries necessary and sufficient for detecting the presence of the planted subgraph. Specifically, we show that any (possibly randomized) algorithm must make $\mathsf{Q} = \Omega(\frac{n^2}{k^2\chi^4(p||q)}\log^2 n)$ adaptive queries (in expectation) to the adjacency matrix of the graph to detect the planted subgraph with probability more than $1/2$, where $\chi^2(p||q)$ is the chi-square distance. On the other hand, we devise a quasi-polynomial-time algorithm that detects the planted subgraph with high probability by making $\mathsf{Q} = O(\frac{n^2}{k^2\chi^4(p||q)}\log^2 n)$ non-adaptive queries. We then propose a polynomial-time algorithm which is able to detect the planted subgraph using $\mathsf{Q} = O(\frac{n^3}{k^3\chi^2(p||q)}\log^3 n)$ queries. We conjecture that in the leftover regime, where $\frac{n^2}{k^2}\ll\mathsf{Q}\ll \frac{n^3}{k^3}$, no polynomial-time algorithms exist. Our results resolve two questions posed in \cite{racz2020finding}, where the special case of adaptive detection and recovery of a planted clique was considered. Comment: 29 pages
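The two query scalings in this abstract can be compared numerically. The sketch below assumes that the chi-square distance is the chi-square divergence between Bernoulli(p) and Bernoulli(q), which works out to (p - q)^2 / (q (1 - q)), and that chi^4 denotes the square of that quantity. Constants hidden by the O and Omega notation are dropped, the natural logarithm is an arbitrary choice, and the function names are hypothetical.

```python
import math


def chi_square_bernoulli(p, q):
    """Chi-square divergence between Bernoulli(p) and Bernoulli(q) (assumed definition)."""
    return (p - q) ** 2 / (q * (1.0 - q))


def information_query_scale(n, k, p, q):
    """Scale n^2 / (k^2 chi^4) * log^2 n, with all constants omitted."""
    chi2 = chi_square_bernoulli(p, q)
    return (n ** 2) / (k ** 2 * chi2 ** 2) * math.log(n) ** 2


def polynomial_time_query_scale(n, k, p, q):
    """Scale n^3 / (k^3 chi^2) * log^3 n, with all constants omitted."""
    chi2 = chi_square_bernoulli(p, q)
    return (n ** 3) / (k ** 3 * chi2) * math.log(n) ** 3
```

For instance, with the illustrative values n = 1024, k = 32, p = 0.75, and q = 0.5, the second scale is larger than the first, which matches the gap between the two regimes that the conjecture in the abstract concerns.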

    Is the Space Complexity of Planted Clique Recovery the Same as That of Detection?

    We study the planted clique problem in which a clique of size $k$ is planted in an Erdős-Rényi graph $G(n, 1/2)$, and one is interested in either detecting or recovering this planted clique. This problem is interesting because it is widely believed to show a statistical-computational gap at clique size $k=\sqrt{n}$, and has emerged as the prototypical problem with such a gap from which average-case hardness of other statistical problems can be deduced. It also displays a tight computational connection between the detection and recovery variants, unlike other problems of a similar nature. This wide investigation into the computational complexity of the planted clique problem has, however, mostly focused on its time complexity. In this work, we ask: do the statistical-computational phenomena that make the planted clique an interesting problem also hold when we use "space efficiency" as our notion of computational efficiency? It is relatively easy to show that a positive answer to this question depends on the existence of an $O(\log n)$ space algorithm that can recover planted cliques of size $k = \Omega(\sqrt{n})$. Our main result comes very close to designing such an algorithm. We show that for $k=\Omega(\sqrt{n})$, the recovery problem can be solved in $O((\log^* n - \log^*(k/\sqrt{n})) \log n)$ bits of space. (1) If $k = \omega(\sqrt{n}\log^{(l)} n)$ for any constant integer $l > 0$, the space usage is $O(\log n)$ bits. (2) If $k = \Theta(\sqrt{n})$, the space usage is $O(\log^* n \log n)$ bits. Our result suggests that there does exist an $O(\log n)$ space algorithm to recover cliques of size $k = \Omega(\sqrt{n})$, since we come very close to achieving such parameters. This provides evidence that the statistical-computational phenomena that (conjecturally) hold for planted clique time complexity also (conjecturally) hold for space complexity.
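The space bound above depends on the iterated logarithm log*, which grows extremely slowly. The following sketch evaluates it, assuming base-2 logarithms and the usual convention that log* counts how many times the logarithm must be applied before the value drops to 1 or below. The helper `space_bound_scale` is a hypothetical name and omits the constant hidden by the O notation.

```python
import math


def log_star(x):
    """Iterated logarithm (base 2): number of log applications until the value is <= 1."""
    count = 0
    while x > 1:
        x = math.log2(x)
        count += 1
    return count


def space_bound_scale(n, k):
    """Scale (log* n - log*(k / sqrt(n))) * log n, with the constant omitted."""
    return (log_star(n) - log_star(k / math.sqrt(n))) * math.log2(n)
```

With k on the order of the square root of n, the subtracted term is a constant, so the expression reduces to the log* n times log n behavior stated in case (2).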

    Partitioning networks into cliques: a randomized heuristic approach

    In the context of community detection in social networks, the term community can be grounded in the strict requirement that everybody within the community should know each other. We consider the corresponding community detection problem. We search for a partitioning of a network into the minimum number of non-overlapping cliques, such that the cliques cover all vertices. This problem is called the clique covering problem (CCP) and is one of the classical NP-hard problems. For CCP, we propose a randomized heuristic approach. To construct a high-quality solution to CCP, we present an iterated greedy (IG) algorithm. IG can also be combined with a heuristic used to determine how far the algorithm is from the optimum in the worst case. Randomized local search (RLS) for maximum independent set was proposed to find such a bound. The experimental results of IG and the bounds obtained by RLS indicate that IG is a very suitable technique for solving CCP in real-world graphs. In addition, we summarize our basic rigorous results, which were developed for analysis of IG and understanding of its behavior on several relevant graph classes.
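As an illustration of what a greedy construction for the clique covering problem can look like, here is a minimal sketch. It is a plain first-fit greedy with random restarts, not the iterated greedy (IG) algorithm of the paper, whose details the abstract does not give. The graph representation (a dict mapping each vertex to its set of neighbours) and the function names are assumptions made for this example.

```python
import random


def greedy_clique_partition(adjacency, order):
    """Place each vertex into the first clique it is fully adjacent to, else open a new one."""
    cliques = []
    for v in order:
        for clique in cliques:
            if all(u in adjacency[v] for u in clique):
                clique.append(v)
                break
        else:
            # No existing clique can absorb v, so v starts a new clique.
            cliques.append([v])
    return cliques


def best_of_random_orders(adjacency, restarts=20, seed=0):
    """Run the greedy pass on several random vertex orders and keep the smallest partition."""
    rng = random.Random(seed)
    vertices = list(adjacency)
    best = None
    for _ in range(restarts):
        rng.shuffle(vertices)
        candidate = greedy_clique_partition(adjacency, list(vertices))
        if best is None or len(candidate) < len(best):
            best = candidate
    return best
```

The result of a single pass depends on the vertex order, which is why the sketch retries with several random orders. Each output is a valid partition into cliques, but nothing here guarantees that it is a minimum one.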