Finding planted cliques using Markov chain Monte Carlo
The planted clique problem is a paradigmatic model of
statistical-to-computational gaps: the planted clique is
information-theoretically detectable if its size $k \ge 2\log_2 n$, but
polynomial-time algorithms only exist for the recovery task when $k = \Omega(\sqrt{n})$. By now, there are many simple and fast algorithms that
succeed as soon as $k = \Omega(\sqrt{n})$. Glaringly, however, no MCMC approach
to the problem had been shown to work, including the Metropolis process on
cliques studied by Jerrum since 1992. In fact, Chen, Mossel, and Zadik recently
showed that any Metropolis process whose state space is the set of cliques
fails to find any sub-linear sized planted clique in polynomial time if
initialized naturally from the empty set. Here, we redeem MCMC performance for
the planted clique problem by relaxing the state space to all vertex subsets
and adding a corresponding energy penalty for missing edges. With that, we
prove that energy-minimizing Markov chains (gradient descent and a
low-temperature relaxation of it) succeed at recovering planted cliques of size
$k = \Omega(\sqrt{n})$ if initialized from the full graph. Importantly,
initialized from the empty set, the relaxation still does not help the gradient
descent find sub-linear planted cliques. We also demonstrate robustness of
these Markov chain approaches under a natural contamination model.
Comment: 24 pages, 2 figures
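The idea of relaxing the state space to all vertex subsets, penalizing missing edges, and descending from the full graph can be illustrated with a minimal sketch. The energy below (a penalty `gamma` per missing edge minus the subset size), the single-vertex flip moves, and the names `build_adjacency`, `energy`, and `greedy_descent` are assumptions chosen for illustration; the abstract does not specify the paper's exact energy or dynamics, and this toy example says nothing about the stated guarantees.

```python
from itertools import combinations


def build_adjacency(n, edges):
    """Return an adjacency dict {vertex: set of neighbours} for an undirected graph."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def energy(adj, subset, gamma=2.0):
    """Assumed energy: gamma * (number of non-adjacent pairs in subset) - |subset|."""
    missing = sum(1 for u, v in combinations(sorted(subset), 2) if v not in adj[u])
    return gamma * missing - len(subset)


def greedy_descent(adj, start, gamma=2.0):
    """Repeatedly apply the single-vertex flip that lowers the energy the most."""
    current = set(start)
    while True:
        best_move, best_energy = None, energy(adj, current, gamma)
        for v in sorted(adj):
            candidate = current ^ {v}  # add v if absent, remove it if present
            e = energy(adj, candidate, gamma)
            if e < best_energy:
                best_move, best_energy = v, e
        if best_move is None:  # local minimum: no flip strictly lowers the energy
            return current
        current ^= {best_move}


# Toy graph: a clique on {0, 1, 2, 3} with a path 3-4-5-6 attached.
clique_edges = list(combinations(range(4), 2))
adj = build_adjacency(7, clique_edges + [(3, 4), (4, 5), (5, 6)])
result = greedy_descent(adj, start=set(adj))  # initialized from the full graph
print(sorted(result))
```

On this toy graph the descent removes the sparsely connected vertices one at a time and stops at a subset with no missing edges. The energy strictly decreases at every step, so the loop always terminates.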
Fine-grained Search Space Classification for Hard Enumeration Variants of Subset Problems
We propose a simple, powerful, and flexible machine learning framework for
(i) reducing the search space of computationally difficult enumeration variants
of subset problems and (ii) augmenting existing state-of-the-art solvers with
informative cues arising from the input distribution. We instantiate our
framework for the problem of listing all maximum cliques in a graph, a central
problem in network analysis, data mining, and computational biology. We
demonstrate the practicality of our approach on real-world networks with
millions of vertices and edges by not only retaining all optimal solutions, but
also aggressively pruning the input instance size, resulting in several-fold
speedups of state-of-the-art algorithms. Finally, we explore the limits of
scalability and robustness of our proposed framework, suggesting that
supervised learning is viable for tackling NP-hard problems in practice.
Comment: AAAI 201
Optimal detection of sparse principal components in high dimension
We perform a finite sample analysis of the detection levels for sparse
principal components of a high-dimensional covariance matrix. Our minimax
optimal test is based on a sparse eigenvalue statistic. Alas, computing this
test is known to be NP-complete in general, and we describe a computationally
efficient alternative test using convex relaxations. Our relaxation is also
proved to detect sparse principal components at near optimal detection levels,
and it performs well on simulated datasets. Moreover, using polynomial time
reductions from theoretical computer science, we bring significant evidence
that our results cannot be improved, thus revealing an inherent trade-off
between statistical and computational performance.
Comment: Published at http://dx.doi.org/10.1214/13-AOS1127 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
On the Hardness of Signaling
There has been a recent surge of interest in the role of information in
strategic interactions. Much of this work seeks to understand how the realized
equilibrium of a game is influenced by uncertainty in the environment and the
information available to players in the game. Lurking beneath this literature
is a fundamental, yet largely unexplored, algorithmic question: how should a
"market maker" who is privy to additional information, and equipped with a
specified objective, inform the players in the game? This is an informational
analogue of the mechanism design question, and views the information structure
of a game as a mathematical object to be designed, rather than an exogenous
variable.
We initiate a complexity-theoretic examination of the design of optimal
information structures in general Bayesian games, a task often referred to as
signaling. We focus on one of the simplest instantiations of the signaling
question: Bayesian zero-sum games, and a principal who must choose an
information structure maximizing the equilibrium payoff of one of the players.
In this setting, we show that optimal signaling is computationally intractable,
and in some cases hard to approximate, assuming that it is hard to recover a
planted clique from an Erdős-Rényi random graph. This is despite the fact that
equilibria in these games are computable in polynomial time, and therefore
suggests that the hardness of optimal signaling is a distinct phenomenon from
the hardness of equilibrium computation. Necessitated by the non-local nature
of information structures, en route to our results we prove an "amplification
lemma" for the planted clique problem which may be of independent interest
- …