299 research outputs found

    Finding planted cliques using Markov chain Monte Carlo

    The planted clique problem is a paradigmatic model of statistical-to-computational gaps: the planted clique is information-theoretically detectable once its size satisfies $k \ge 2\log_2 n$, but polynomial-time algorithms are only known for the recovery task when $k = \Omega(\sqrt{n})$. By now, there are many simple and fast algorithms that succeed as soon as $k = \Omega(\sqrt{n})$. Glaringly, however, no MCMC approach to the problem had been shown to work, including the Metropolis process on cliques studied by Jerrum since 1992. In fact, Chen, Mossel, and Zadik recently showed that any Metropolis process whose state space is the set of cliques fails to find any sub-linear sized planted clique in polynomial time if initialized naturally from the empty set. Here, we redeem MCMC performance for the planted clique problem by relaxing the state space to all vertex subsets and adding a corresponding energy penalty for missing edges. With that, we prove that energy-minimizing Markov chains (gradient descent and a low-temperature relaxation of it) succeed at recovering planted cliques of size $k = \Omega(\sqrt{n})$ if initialized from the full graph. Importantly, initialized from the empty set, the relaxation still does not help the gradient descent find sub-linear planted cliques. We also demonstrate robustness of these Markov chain approaches under a natural contamination model. Comment: 24 pages, 2 figures
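The relaxed-state-space idea in this abstract can be illustrated with a small simulation: plant a clique of size k in G(n, 1/2), define an energy over arbitrary vertex subsets that charges each missing internal edge and rewards size, and run gradient descent from the full vertex set. The Python sketch below uses assumed choices (the Hamiltonian lam * #missing_edges(S) - |S| and the penalty lam are illustrative stand-ins, not necessarily the paper's exact construction).

```python
import itertools
import random

import networkx as nx  # assumed available; used only for graph bookkeeping


def planted_clique_graph(n, k, seed=0):
    """G(n, 1/2) with a clique planted on k randomly chosen vertices."""
    rng = random.Random(seed)
    g = nx.gnp_random_graph(n, 0.5, seed=seed)
    planted = set(rng.sample(range(n), k))
    for u, v in itertools.combinations(planted, 2):
        g.add_edge(u, v)
    return g, planted


def greedy_descent_from_full_graph(g, lam=2.0):
    """Gradient descent on vertex subsets for the (assumed) energy
    H(S) = lam * #missing_edges(S) - |S|: start from the full vertex set and
    repeatedly drop the vertex with the most non-neighbours inside the
    current set, as long as that single removal lowers the energy."""
    current = set(g.nodes())
    while True:
        # Number of non-neighbours each vertex has inside the current set.
        missing = {v: len(current) - 1 - sum(1 for u in g[v] if u in current)
                   for v in current}
        v = max(missing, key=missing.get)
        # Removing v changes the energy by -lam * missing[v] + 1.
        if lam * missing[v] <= 1:
            break
        current.remove(v)
    return current


if __name__ == "__main__":
    g, planted = planted_clique_graph(n=400, k=60, seed=1)
    recovered = greedy_descent_from_full_graph(g, lam=2.0)
    print(len(recovered), len(recovered & planted))
```

On small instances with k a few multiples of sqrt(n), this descent typically halts on a clique overlapping the planted one; whether a given lam reproduces the paper's guarantees is not claimed here.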

    Fine-grained Search Space Classification for Hard Enumeration Variants of Subset Problems

    We propose a simple, powerful, and flexible machine learning framework for (i) reducing the search space of computationally difficult enumeration variants of subset problems and (ii) augmenting existing state-of-the-art solvers with informative cues arising from the input distribution. We instantiate our framework for the problem of listing all maximum cliques in a graph, a central problem in network analysis, data mining, and computational biology. We demonstrate the practicality of our approach on real-world networks with millions of vertices and edges by not only retaining all optimal solutions, but also aggressively pruning the input instance size, resulting in several-fold speedups of state-of-the-art algorithms. Finally, we explore the limits of scalability and robustness of our proposed framework, suggesting that supervised learning is viable for tackling NP-hard problems in practice. Comment: AAAI 201
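A hedged sketch of the pipeline shape described above: learn a per-vertex classifier that predicts membership in some maximum clique from cheap structural features, prune low-probability vertices, and hand the reduced graph to an exact enumerator. The features, the logistic-regression model, and the pruning threshold below are assumptions for illustration rather than the paper's actual framework, and an aggressive threshold can discard optimal solutions.

```python
import networkx as nx
import numpy as np
from sklearn.linear_model import LogisticRegression  # assumed available


def vertex_features(g):
    """Cheap per-vertex features (degree, core number, clustering);
    stand-ins for whatever features the actual framework uses."""
    core = nx.core_number(g)
    clust = nx.clustering(g)
    return np.array([[g.degree(v), core[v], clust[v]] for v in g.nodes()])


def max_clique_labels(g):
    """Label a vertex 1 if it lies in at least one maximum clique.
    Exact, so only feasible on small training graphs."""
    cliques = list(nx.find_cliques(g))
    omega = max(len(c) for c in cliques)
    in_max = {v for c in cliques if len(c) == omega for v in c}
    return np.array([1 if v in in_max else 0 for v in g.nodes()])


def train_pruner(train_graphs):
    """Fit a simple classifier on labelled small graphs."""
    X = np.vstack([vertex_features(g) for g in train_graphs])
    y = np.concatenate([max_clique_labels(g) for g in train_graphs])
    return LogisticRegression(max_iter=1000).fit(X, y)


def prune_and_enumerate(g, model, threshold=0.05):
    """Drop vertices the model deems unlikely to belong to a maximum clique,
    then enumerate maximal cliques of the (much smaller) remainder."""
    probs = model.predict_proba(vertex_features(g))[:, 1]
    keep = [v for v, p in zip(g.nodes(), probs) if p >= threshold]
    return list(nx.find_cliques(g.subgraph(keep)))
```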

    Optimal detection of sparse principal components in high dimension

    We perform a finite sample analysis of the detection levels for sparse principal components of a high-dimensional covariance matrix. Our minimax optimal test is based on a sparse eigenvalue statistic. Alas, computing this test is known to be NP-complete in general, and we describe a computationally efficient alternative test using convex relaxations. Our relaxation is also proved to detect sparse principal components at near optimal detection levels, and it performs well on simulated datasets. Moreover, using polynomial time reductions from theoretical computer science, we bring significant evidence that our results cannot be improved, thus revealing an inherent trade-off between statistical and computational performance. Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/13-AOS1127 by the Institute of Mathematical Statistics (http://www.imstat.org)
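The sparse eigenvalue statistic mentioned above can be written down directly: maximize the top eigenvalue of the empirical covariance restricted to k coordinates. The brute-force sketch below makes the computational obstruction concrete (it enumerates all supports, so it is only usable in very low dimension); the threshold is a free placeholder rather than the calibrated level from the paper, and the convex (SDP) relaxation is not reproduced here.

```python
import itertools

import numpy as np


def k_sparse_largest_eigenvalue(sigma_hat, k):
    """Brute-force k-sparse largest eigenvalue: maximize the top eigenvalue
    of sigma_hat restricted to a k-subset of coordinates. Exponential in the
    dimension, which is exactly what the convex relaxation is meant to avoid."""
    d = sigma_hat.shape[0]
    best = -np.inf
    for support in itertools.combinations(range(d), k):
        idx = np.array(support)
        sub = sigma_hat[np.ix_(idx, idx)]
        best = max(best, np.linalg.eigvalsh(sub)[-1])  # eigvalsh is ascending
    return best


def sparse_pca_detection_test(samples, k, threshold):
    """Reject the isotropic null when the k-sparse eigenvalue statistic of
    the empirical covariance exceeds `threshold` (a placeholder here; the
    paper derives the finite-sample detection level)."""
    n = samples.shape[0]
    sigma_hat = samples.T @ samples / n
    return k_sparse_largest_eigenvalue(sigma_hat, k) > threshold
```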

    On the Hardness of Signaling

    There has been a recent surge of interest in the role of information in strategic interactions. Much of this work seeks to understand how the realized equilibrium of a game is influenced by uncertainty in the environment and the information available to players in the game. Lurking beneath this literature is a fundamental, yet largely unexplored, algorithmic question: how should a "market maker" who is privy to additional information, and equipped with a specified objective, inform the players in the game? This is an informational analogue of the mechanism design question, and views the information structure of a game as a mathematical object to be designed, rather than an exogenous variable. We initiate a complexity-theoretic examination of the design of optimal information structures in general Bayesian games, a task often referred to as signaling. We focus on one of the simplest instantiations of the signaling question: Bayesian zero-sum games, and a principal who must choose an information structure maximizing the equilibrium payoff of one of the players. In this setting, we show that optimal signaling is computationally intractable, and in some cases hard to approximate, assuming that it is hard to recover a planted clique from an Erdos-Renyi random graph. This is despite the fact that equilibria in these games are computable in polynomial time, and it therefore suggests that the hardness of optimal signaling is a distinct phenomenon from the hardness of equilibrium computation. Necessitated by the non-local nature of information structures, en route to our results we prove an "amplification lemma" for the planted clique problem which may be of independent interest.
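To make the setting concrete: once a signaling scheme is fixed, each signal induces a posterior over states, the players face the posterior-averaged zero-sum game, and its value is a linear program. That is the "equilibria are computable in polynomial time" part of the abstract; choosing the scheme itself is the hard problem the paper studies and is not solved here. The sketch below assumes scipy is available, and the array shapes are illustrative.

```python
import numpy as np
from scipy.optimize import linprog  # assumed available


def zero_sum_value(payoff):
    """Value of a matrix game for the row (maximizing) player via the
    standard LP formulation."""
    m, n = payoff.shape
    # Variables: the row player's mixed strategy x (length m) and the value v.
    c = np.zeros(m + 1)
    c[-1] = -1.0                                    # maximize v == minimize -v
    a_ub = np.hstack([-payoff.T, np.ones((n, 1))])  # v <= x^T A e_j for all j
    b_ub = np.zeros(n)
    a_eq = np.ones((1, m + 1))
    a_eq[0, -1] = 0.0                               # sum_i x_i = 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=a_ub, b_ub=b_ub, A_eq=a_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1]


def signaling_payoff(prior, payoffs, scheme):
    """Row player's expected equilibrium payoff under a fixed signaling scheme.

    prior:   shape (S,)      distribution over states of nature
    payoffs: shape (S, m, n) row-player payoff matrix for each state
    scheme:  shape (S, T)    probability that state s is mapped to signal t
    """
    total = 0.0
    for t in range(scheme.shape[1]):
        p_signal = float(prior @ scheme[:, t])       # marginal prob of signal t
        if p_signal == 0.0:
            continue
        posterior = prior * scheme[:, t] / p_signal  # Bayes update given t
        expected_game = np.tensordot(posterior, payoffs, axes=1)
        total += p_signal * zero_sum_value(expected_game)
    return total
```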