Computational Barriers to Estimation from Low-Degree Polynomials
One fundamental goal of high-dimensional statistics is to detect or recover
structure from noisy data. In many cases, the data can be faithfully modeled by
a planted structure (such as a low-rank matrix) perturbed by random noise. But
even for these simple models, the computational complexity of estimation is
sometimes poorly understood. A growing body of work studies low-degree
polynomials as a proxy for computational complexity: it has been demonstrated
in various settings that low-degree polynomials of the data can match the
statistical performance of the best known polynomial-time algorithms for
detection. While prior work has studied the power of low-degree polynomials for
the task of detecting the presence of hidden structures, it has failed to
address the estimation problem in settings where detection is qualitatively
easier than estimation.
In this work, we extend the method of low-degree polynomials to address
problems of estimation and recovery. For a large class of "signal plus noise"
problems, we give a user-friendly lower bound for the best possible mean
squared error achievable by any degree-D polynomial. To our knowledge, this is
the first instance in which the low-degree polynomial method can establish
low-degree hardness of recovery problems where the associated detection problem
is easy. As applications, we give a tight characterization of the low-degree
minimum mean squared error for the planted submatrix and planted dense subgraph
problems, resolving (in the low-degree framework) open problems about the
computational complexity of recovery in both cases.
Comment: 38 pages
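As a toy illustration (not from the paper) of what a degree-D estimator is, the sketch below compares, on a small planted submatrix instance, the best affine (degree-1) function of the row sums against the best constant (degree-0) estimator; the sizes n, k and the signal strength lam are arbitrary choices for the demo, not the paper's thresholds.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, lam = 200, 30, 1.0  # illustrative sizes, not the paper's thresholds

# Planted submatrix model: Y = lam * theta theta^T + symmetric Gaussian noise,
# where theta is the 0/1 indicator of a hidden set S of k coordinates.
S = rng.choice(n, size=k, replace=False)
theta = np.zeros(n)
theta[S] = 1.0
noise = rng.normal(size=(n, n))
Y = (noise + noise.T) / np.sqrt(2) + lam * np.outer(theta, theta)

# Best degree-1 estimator of theta_i built from the row sums: an affine
# function with coefficients chosen to minimize squared error, a stand-in
# for the degree-D MMSE quantity the abstract lower-bounds, with D = 1.
rows = Y.sum(axis=1)
A = np.column_stack([np.ones(n), rows])
coef, *_ = np.linalg.lstsq(A, theta, rcond=None)
theta_hat = A @ coef

mse_deg1 = np.mean((theta_hat - theta) ** 2)
mse_deg0 = np.mean((theta.mean() - theta) ** 2)  # best constant estimator
print(mse_deg1 < mse_deg0)  # degree 1 beats degree 0 here
```

Allowing higher-degree features of Y can only lower this error further; the paper's contribution is a lower bound on how low it can go for any degree-D polynomial.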
Optimal detection of sparse principal components in high dimension
We perform a finite sample analysis of the detection levels for sparse
principal components of a high-dimensional covariance matrix. Our minimax
optimal test is based on a sparse eigenvalue statistic. Alas, computing this
test is known to be NP-complete in general, and we describe a computationally
efficient alternative test using convex relaxations. Our relaxation is also
proved to detect sparse principal components at near optimal detection levels,
and it performs well on simulated datasets. Moreover, using polynomial time
reductions from theoretical computer science, we bring significant evidence
that our results cannot be improved, thus revealing an inherent trade off
between statistical and computational performance.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/13-AOS1127
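To make the sparse eigenvalue statistic concrete, here is a brute-force sketch (all sizes and the planted spike are invented for the demo): it scans every size-k support, which is exponential in k and is precisely the cost the abstract's convex relaxation avoids.

```python
import numpy as np
from itertools import combinations

def sparse_eigenvalue(Sigma, k):
    """Largest eigenvalue over all k x k principal submatrices of Sigma.

    Brute force over the C(p, k) supports, so exponential in k -- the
    hardness the abstract sidesteps with a convex relaxation.
    """
    p = Sigma.shape[0]
    best = -np.inf
    for support in combinations(range(p), k):
        sub = Sigma[np.ix_(support, support)]
        best = max(best, np.linalg.eigvalsh(sub)[-1])
    return best

# Toy data: isotropic noise plus a rank-one spike on a k-sparse direction.
rng = np.random.default_rng(1)
p, n, k = 12, 200, 3
v = np.zeros(p)
v[:k] = 1 / np.sqrt(k)
X = rng.normal(size=(n, p)) + 2.0 * rng.normal(size=(n, 1)) * v
Sigma = X.T @ X / n
print(sparse_eigenvalue(Sigma, k))  # well above the pure-noise level of ~1
```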
Random Subgraph Detection Using Queries
The planted densest subgraph detection problem refers to the task of testing
whether in a given (random) graph there is a subgraph that is unusually dense.
Specifically, we observe an undirected and unweighted graph on n nodes. Under
the null hypothesis, the graph is a realization of an Erd\H{o}s-R\'{e}nyi graph
with edge probability (or, density) q. Under the alternative, there is a
subgraph on k vertices with edge probability p > q. The statistical as well
as the computational barriers of this problem are well-understood for a wide
range of the edge parameters p and q. In this paper, we consider a natural
variant of the above problem, where one can only observe a small part of the
graph using adaptive edge queries.
For this model, we determine the number of queries necessary and sufficient
for detecting the presence of the planted subgraph. Specifically, we prove a
lower bound, governed by the chi-square distance between the null and
alternative edge distributions, on the expected number of adaptive queries to
the adjacency matrix that any (possibly randomized) algorithm must make in
order to detect the planted subgraph with non-trivial probability. On the
other hand, we devise a quasi-polynomial-time algorithm that detects the
planted subgraph with high probability using only non-adaptive queries. We
then propose a polynomial-time algorithm which is able to detect the planted
subgraph using a larger number of queries. We conjecture that in the leftover
regime no polynomial-time algorithm exists. Our results resolve two questions
posed in \cite{racz2020finding}, where the special case of adaptive detection
and recovery of a planted clique was considered.
Comment: 29 pages
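A minimal simulation of the query model, written with the standard symbols n, q, k, p for the model above; the tester below is only the naive non-adaptive baseline of thresholding the empirical density of uniformly sampled pairs, not the paper's algorithm, and all parameter values are invented for the demo.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_graph(n, q, k=0, p=0.0):
    """Adjacency matrix of G(n, q), optionally with a planted subgraph on
    k random vertices whose internal edge probability is p (> q)."""
    A = (rng.random((n, n)) < q).astype(int)
    if k:
        S = rng.choice(n, size=k, replace=False)
        A[np.ix_(S, S)] = (rng.random((k, k)) < p).astype(int)
    A = np.triu(A, 1)
    return A + A.T

def edge_count_test(A, num_queries, q, slack):
    """Non-adaptive tester: query uniformly random vertex pairs and flag
    the graph as 'planted' if the empirical density exceeds q + slack."""
    n = A.shape[0]
    i = rng.integers(0, n, size=num_queries)
    j = rng.integers(0, n, size=num_queries)
    return A[i, j].mean() > q + slack

null_graph = sample_graph(400, 0.1)
alt_graph = sample_graph(400, 0.1, k=120, p=0.6)
print(edge_count_test(null_graph, 20000, 0.1, 0.02),
      edge_count_test(alt_graph, 20000, 0.1, 0.02))
```

With these parameters the planted part lifts the overall density from 0.1 to roughly 0.145, so 20000 uniform queries separate the two hypotheses comfortably; the interesting regimes in the paper are those where this naive count fails.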
Is the Space Complexity of Planted Clique Recovery the Same as That of Detection?
We study the planted clique problem in which a clique of size k is planted in
an Erd\H{o}s-R\'enyi graph G(n, 1/2), and one is interested in either detecting
or recovering this planted clique. This problem is interesting because it is
widely believed to show a statistical-computational gap at clique size
k=sqrt{n}, and has emerged as the prototypical problem with such a gap from
which average-case hardness of other statistical problems can be deduced. It
also displays a tight computational connection between the detection and
recovery variants, unlike other problems of a similar nature. This wide
investigation into the computational complexity of the planted clique problem
has, however, mostly focused on its time complexity. In this work, we ask:
Do the statistical-computational phenomena that make the planted clique an
interesting problem also hold when we use `space efficiency' as our notion of
computational efficiency?
It is relatively easy to show that a positive answer to this question depends
on the existence of a O(log n) space algorithm that can recover planted cliques
of size k = Omega(sqrt{n}). Our main result comes very close to designing such
an algorithm. We show that for k = Omega(sqrt{n}), the recovery problem can be
solved in O((log* n - log* (k/sqrt{n})) log n) bits of space:
1. If k = omega(sqrt{n} log^{(l)} n) for any constant integer l > 0, the space
usage is O(log n) bits.
2. If k = Theta(sqrt{n}), the space usage is O(log* n log n) bits.
Our result suggests that there does exist an O(log n) space algorithm to
recover cliques of size k = Omega(sqrt{n}), since we come very close to
achieving such parameters. This provides evidence that the
statistical-computational phenomena that (conjecturally) hold for planted
clique time complexity also (conjecturally) hold for space complexity.
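For contrast with the space-efficient algorithms above, here is a sketch of the classical time-efficient recovery baseline, Kucera's degree-counting observation; the parameters are illustrative, and the n degree counters already cost O(n log n) bits, far above the O(log n)-space regime the abstract studies.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 2000, 250  # k above sqrt(n log n) ~ 124, so degrees separate

# G(n, 1/2) with a clique planted on a random vertex set S of size k.
A = np.triu((rng.random((n, n)) < 0.5).astype(int), 1)
A = A + A.T
S = rng.choice(n, size=k, replace=False)
A[np.ix_(S, S)] = 1
A[np.diag_indices(n)] = 0

# Degree-based recovery: a clique vertex has expected degree
# (k - 1) + (n - k)/2 versus about n/2 for the rest, a gap of (k - 1)/2,
# so for k >> sqrt(n log n) the k largest degrees identify the clique.
deg = A.sum(axis=1)
recovered = set(np.argsort(deg)[-k:].tolist())
planted = set(S.tolist())
overlap = len(recovered & planted) / k
print(overlap)  # fraction of the clique recovered
```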
Partitioning networks into cliques: a randomized heuristic approach
In the context of community detection in social networks, the term community can be interpreted in the strict sense that everybody within the community should know each other. We consider the corresponding community detection problem: we search for a partitioning of a network into the minimum number of non-overlapping cliques, such that the cliques cover all vertices. This problem is called the clique covering problem (CCP) and is one of the classical NP-hard problems. For CCP, we propose a randomized heuristic approach. To construct a high-quality solution to CCP, we present an iterated greedy (IG) algorithm. IG can also be combined with a heuristic that determines how far the algorithm is from the optimum in the worst case; randomized local search (RLS) for the maximum independent set problem was proposed to find such a bound. The experimental results of IG and the bounds obtained by RLS indicate that IG is a very suitable technique for solving CCP on real-world graphs. In addition, we summarize our basic rigorous results, which were developed for the analysis of IG and for understanding its behavior on several relevant graph classes.
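A minimal sketch of the iterated greedy idea on a toy graph, assuming a simple destroy-and-rebuild loop; the paper's exact IG components and the RLS bound are not reproduced here.

```python
import random

def greedy_cover(adj, order):
    """First-fit clique cover: insert vertices in the given order, placing
    each into the first clique it is fully adjacent to, else a new one."""
    cover = []
    for v in order:
        for clique in cover:
            if clique <= adj[v]:
                clique.add(v)
                break
        else:
            cover.append({v})
    return cover

def iterated_greedy(adj, iters=200, destroy=2, seed=0):
    """Destroy a few cliques, re-insert their vertices in random order,
    keep the new cover if it is no larger, and repeat."""
    rng = random.Random(seed)
    best = greedy_cover(adj, list(adj))
    for _ in range(iters):
        cand = [set(c) for c in best]
        rng.shuffle(cand)
        new, removed = cand[destroy:], cand[:destroy]
        freed = [v for c in removed for v in c]
        rng.shuffle(freed)
        for v in freed:
            for clique in new:
                if clique <= adj[v]:
                    clique.add(v)
                    break
            else:
                new.append({v})
        if len(new) <= len(cand):
            best = new
    return best

# Two disjoint triangles plus a pendant vertex attached to one of them.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (5, 6)]
adj = {v: set() for v in range(7)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)
cover = iterated_greedy(adj)
print(len(cover))  # 3: the two triangles and the pendant vertex
```

Accepting equal-size covers lets the search drift across plateaus, which is the usual reason iterated greedy escapes the local optima that a single greedy pass gets stuck in.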