18 research outputs found

    New Abilities and Limitations of Spectral Graph Bisection

    Get PDF
    Spectral based heuristics belong to well-known commonly used methods which determines provably minimal graph bisection or outputs "fail" when the optimality cannot be certified. In this paper we focus on Boppana\u27s algorithm which belongs to one of the most prominent methods of this type. It is well known that the algorithm works well in the random planted bisection model - the standard class of graphs for analysis minimum bisection and relevant problems. In 2001 Feige and Kilian posed the question if Boppana\u27s algorithm works well in the semirandom model by Blum and Spencer. In our paper we answer this question affirmatively. We show also that the algorithm achieves similar performance on graph classes which extend the semirandom model. Since the behavior of Boppana\u27s algorithm on the semirandom graphs remained unknown, Feige and Kilian proposed a new semidefinite programming (SDP) based approach and proved that it works on this model. The relationship between the performance of the SDP based algorithm and Boppana\u27s approach was left as an open problem. In this paper we solve the problem in a complete way by proving that the bisection algorithm of Feige and Kilian provides exactly the same results as Boppana\u27s algorithm. As a consequence we get that Boppana\u27s algorithm achieves the optimal threshold for exact cluster recovery in the stochastic block model. On the other hand we prove some limitations of Boppana\u27s approach: we show that if the density difference on the parameters of the planted bisection model is too small then the algorithm fails with high probability in the model

    Consistency Thresholds for the Planted Bisection Model

    Full text link
    The planted bisection model is a random graph model in which the nodes are divided into two equal-sized communities and then edges are added randomly in a way that depends on the community membership. We establish necessary and sufficient conditions for the asymptotic recoverability of the planted bisection in this model. When the bisection is asymptotically recoverable, we give an efficient algorithm that successfully recovers it. We also show that the planted bisection is recoverable asymptotically if and only if with high probability every node belongs to the same community as the majority of its neighbors. Our algorithm for finding the planted bisection runs in time almost linear in the number of edges. It has three stages: spectral clustering to compute an initial guess, a "replica" stage to get almost every vertex correct, and then some simple local moves to finish the job. An independent work by Abbe, Bandeira, and Hall establishes similar (slightly weaker) results but only in the case of logarithmic average degree.Comment: latest version contains an erratum, addressing an error pointed out by Jan van Waai

    The minimum bisection in the planted bisection model

    Get PDF
    In the planted bisection model a random graph G(n,p+,pβˆ’)G(n,p_+,p_- ) with nn vertices is created by partitioning the vertices randomly into two classes of equal size (up to Β±1\pm1). Any two vertices that belong to the same class are linked by an edge with probability p+p_+ and any two that belong to different classes with probability pβˆ’<p+p_- <p_+ independently. The planted bisection model has been used extensively to benchmark graph partitioning algorithms. If pΒ±=2dΒ±/np_{\pm} =2d_{\pm} /n for numbers 0≀dβˆ’<d+0\leq d_- <d_+ that remain fixed as nβ†’βˆžn\to\infty, then w.h.p. the ``planted'' bisection (the one used to construct the graph) will not be a minimum bisection. In this paper we derive an asymptotic formula for the minimum bisection width under the assumption that d+βˆ’dβˆ’>cd+ln⁑d+d_+ -d_- >c\sqrt{d_+ \ln d_+ } for a certain constant c>0c>0

    Clustering Partially Observed Graphs via Convex Optimization

    Get PDF
    This paper considers the problem of clustering a partially observed unweighted graph---i.e., one where for some node pairs we know there is an edge between them, for some others we know there is no edge, and for the remaining we do not know whether or not there is an edge. We want to organize the nodes into disjoint clusters so that there is relatively dense (observed) connectivity within clusters, and sparse across clusters. We take a novel yet natural approach to this problem, by focusing on finding the clustering that minimizes the number of "disagreements"---i.e., the sum of the number of (observed) missing edges within clusters, and (observed) present edges across clusters. Our algorithm uses convex optimization; its basis is a reduction of disagreement minimization to the problem of recovering an (unknown) low-rank matrix and an (unknown) sparse matrix from their partially observed sum. We evaluate the performance of our algorithm on the classical Planted Partition/Stochastic Block Model. Our main theorem provides sufficient conditions for the success of our algorithm as a function of the minimum cluster size, edge density and observation probability; in particular, the results characterize the tradeoff between the observation probability and the edge density gap. When there are a constant number of clusters of equal size, our results are optimal up to logarithmic factors.Comment: This is the final version published in Journal of Machine Learning Research (JMLR). Partial results appeared in International Conference on Machine Learning (ICML) 201

    Asymptotic Mutual Information for the Two-Groups Stochastic Block Model

    Full text link
    We develop an information-theoretic view of the stochastic block model, a popular statistical model for the large-scale structure of complex networks. A graph GG from such a model is generated by first assigning vertex labels at random from a finite alphabet, and then connecting vertices with edge probabilities depending on the labels of the endpoints. In the case of the symmetric two-group model, we establish an explicit `single-letter' characterization of the per-vertex mutual information between the vertex labels and the graph. The explicit expression of the mutual information is intimately related to estimation-theoretic quantities, and --in particular-- reveals a phase transition at the critical point for community detection. Below the critical point the per-vertex mutual information is asymptotically the same as if edges were independent. Correspondingly, no algorithm can estimate the partition better than random guessing. Conversely, above the threshold, the per-vertex mutual information is strictly smaller than the independent-edges upper bound. In this regime there exists a procedure that estimates the vertex labels better than random guessing.Comment: 41 pages, 3 pdf figure

    Linear Programming and Community Detection

    Full text link
    The problem of community detection with two equal-sized communities is closely related to the minimum graph bisection problem over certain random graph models. In the stochastic block model distribution over networks with community structure, a well-known semidefinite programming (SDP) relaxation of the minimum bisection problem recovers the underlying communities whenever possible. Motivated by their superior scalability, we study the theoretical performance of linear programming (LP) relaxations of the minimum bisection problem for the same random models. We show that unlike the SDP relaxation that undergoes a phase transition in the logarithmic average-degree regime, the LP relaxation exhibits a transition from recovery to non-recovery in the linear average-degree regime. We show that in the logarithmic average-degree regime, the LP relaxation fails in recovering the planted bisection with high probability.Comment: 35 pages, 3 figure
    corecore