8 research outputs found

    Detecting High Log-Densities -- an O(n^1/4) Approximation for Densest k-Subgraph

    Full text link
    In the Densest k-Subgraph problem, given a graph G and a parameter k, one needs to find a subgraph of G induced on k vertices that contains the largest number of edges. There is a significant gap between the best known upper and lower bounds for this problem. It is NP-hard, and does not have a PTAS unless NP has subexponential time algorithms. On the other hand, the current best known algorithm of Feige, Kortsarz and Peleg, gives an approximation ratio of n^(1/3-epsilon) for some specific epsilon > 0 (estimated at around 1/60). We present an algorithm that for every epsilon > 0 approximates the Densest k-Subgraph problem within a ratio of n^(1/4+epsilon) in time n^O(1/epsilon). In particular, our algorithm achieves an approximation ratio of O(n^1/4) in time n^O(log n). Our algorithm is inspired by studying an average-case version of the problem where the goal is to distinguish random graphs from graphs with planted dense subgraphs. The approximation ratio we achieve for the general case matches the distinguishing ratio we obtain for this planted problem. At a high level, our algorithms involve cleverly counting appropriately defined trees of constant size in G, and using these counts to identify the vertices of the dense subgraph. Our algorithm is based on the following principle. We say that a graph G(V,E) has log-density alpha if its average degree is Theta(|V|^alpha). The algorithmic core of our result is a family of algorithms that output k-subgraphs of nontrivial density whenever the log-density of the densest k-subgraph is larger than the log-density of the host graph.Comment: 23 page

    Label optimal regret bounds for online local learning

    Get PDF
    We resolve an open question from (Christiano, 2014b) posed in COLT'14 regarding the optimal dependency of the regret achievable for online local learning on the size of the label set. In this framework the algorithm is shown a pair of items at each step, chosen from a set of nn items. The learner then predicts a label for each item, from a label set of size LL and receives a real valued payoff. This is a natural framework which captures many interesting scenarios such as collaborative filtering, online gambling, and online max cut among others. (Christiano, 2014a) designed an efficient online learning algorithm for this problem achieving a regret of O(nL3T)O(\sqrt{nL^3T}), where TT is the number of rounds. Information theoretically, one can achieve a regret of O(nlogLT)O(\sqrt{n \log L T}). One of the main open questions left in this framework concerns closing the above gap. In this work, we provide a complete answer to the question above via two main results. We show, via a tighter analysis, that the semi-definite programming based algorithm of (Christiano, 2014a), in fact achieves a regret of O(nLT)O(\sqrt{nLT}). Second, we show a matching computational lower bound. Namely, we show that a polynomial time algorithm for online local learning with lower regret would imply a polynomial time algorithm for the planted clique problem which is widely believed to be hard. We prove a similar hardness result under a related conjecture concerning planted dense subgraphs that we put forth. Unlike planted clique, the planted dense subgraph problem does not have any known quasi-polynomial time algorithms. Computational lower bounds for online learning are relatively rare, and we hope that the ideas developed in this work will lead to lower bounds for other online learning scenarios as well.Comment: 13 pages; Changes from previous version: small changes to proofs of Theorems 1 & 2, a small rewrite of introduction as well (this version is the same as camera-ready copy in COLT '15

    A Novel Approach to Finding Near-Cliques: The Triangle-Densest Subgraph Problem

    Full text link
    Many graph mining applications rely on detecting subgraphs which are near-cliques. There exists a dichotomy between the results in the existing work related to this problem: on the one hand the densest subgraph problem (DSP) which maximizes the average degree over all subgraphs is solvable in polynomial time but for many networks fails to find subgraphs which are near-cliques. On the other hand, formulations that are geared towards finding near-cliques are NP-hard and frequently inapproximable due to connections with the Maximum Clique problem. In this work, we propose a formulation which combines the best of both worlds: it is solvable in polynomial time and finds near-cliques when the DSP fails. Surprisingly, our formulation is a simple variation of the DSP. Specifically, we define the triangle densest subgraph problem (TDSP): given G(V,E)G(V,E), find a subset of vertices SS^* such that τ(S)=maxSVt(S)S\tau(S^*)=\max_{S \subseteq V} \frac{t(S)}{|S|}, where t(S)t(S) is the number of triangles induced by the set SS. We provide various exact and approximation algorithms which the solve the TDSP efficiently. Furthermore, we show how our algorithms adapt to the more general problem of maximizing the kk-clique average density. Finally, we provide empirical evidence that the TDSP should be used whenever the output of the DSP fails to output a near-clique.Comment: 42 page

    Machinery for Proving Sum-of-Squares Lower Bounds on Certification Problems

    Full text link
    In this paper, we construct general machinery for proving Sum-of-Squares lower bounds on certification problems by generalizing the techniques used by Barak et al. [FOCS 2016] to prove Sum-of-Squares lower bounds for planted clique. Using this machinery, we prove degree nϵn^{\epsilon} Sum-of-Squares lower bounds for tensor PCA, the Wishart model of sparse PCA, and a variant of planted clique which we call planted slightly denser subgraph.Comment: 134 page
    corecore