1,908 research outputs found

    Consistency of Spectral Hypergraph Partitioning under Planted Partition Model

    Full text link
    Hypergraph partitioning lies at the heart of a number of problems in machine learning and network sciences. Many algorithms for hypergraph partitioning have been proposed that extend standard approaches for graph partitioning to the case of hypergraphs. However, theoretical aspects of such methods have seldom received attention in the literature as compared to the extensive studies on the guarantees of graph partitioning. For instance, consistency results of spectral graph partitioning under the stochastic block model are well known. In this paper, we present a planted partition model for sparse random non-uniform hypergraphs that generalizes the stochastic block model. We derive an error bound for a spectral hypergraph partitioning algorithm under this model using matrix concentration inequalities. To the best of our knowledge, this is the first consistency result related to partitioning non-uniform hypergraphs.Comment: 35 pages, 2 figures, 1 tabl

    Subsampled Power Iteration: a Unified Algorithm for Block Models and Planted CSP's

    Get PDF
    We present an algorithm for recovering planted solutions in two well-known models, the stochastic block model and planted constraint satisfaction problems, via a common generalization in terms of random bipartite graphs. Our algorithm matches up to a constant factor the best-known bounds for the number of edges (or constraints) needed for perfect recovery and its running time is linear in the number of edges used. The time complexity is significantly better than both spectral and SDP-based approaches. The main contribution of the algorithm is in the case of unequal sizes in the bipartition (corresponding to odd uniformity in the CSP). Here our algorithm succeeds at a significantly lower density than the spectral approaches, surpassing a barrier based on the spectral norm of a random matrix. Other significant features of the algorithm and analysis include (i) the critical use of power iteration with subsampling, which might be of independent interest; its analysis requires keeping track of multiple norms of an evolving solution (ii) it can be implemented statistically, i.e., with very limited access to the input distribution (iii) the algorithm is extremely simple to implement and runs in linear time, and thus is practical even for very large instances

    Spectral Thresholds in the Bipartite Stochastic Block Model

    Get PDF
    We consider a bipartite stochastic block model on vertex sets V1V_1 and V2V_2, with planted partitions in each, and ask at what densities efficient algorithms can recover the partition of the smaller vertex set. When ∣V2βˆ£β‰«βˆ£V1∣|V_2| \gg |V_1|, multiple thresholds emerge. We first locate a sharp threshold for detection of the partition, in the sense of the results of \cite{mossel2012stochastic,mossel2013proof} and \cite{massoulie2014community} for the stochastic block model. We then show that at a higher edge density, the singular vectors of the rectangular biadjacency matrix exhibit a localization / delocalization phase transition, giving recovery above the threshold and no recovery below. Nevertheless, we propose a simple spectral algorithm, Diagonal Deletion SVD, which recovers the partition at a nearly optimal edge density. The bipartite stochastic block model studied here was used by \cite{feldman2014algorithm} to give a unified algorithm for recovering planted partitions and assignments in random hypergraphs and random kk-SAT formulae respectively. Our results give the best known bounds for the clause density at which solutions can be found efficiently in these models as well as showing a barrier to further improvement via this reduction to the bipartite block model.Comment: updated version, will appear in COLT 201

    A cost function for similarity-based hierarchical clustering

    Full text link
    The development of algorithms for hierarchical clustering has been hampered by a shortage of precise objective functions. To help address this situation, we introduce a simple cost function on hierarchies over a set of points, given pairwise similarities between those points. We show that this criterion behaves sensibly in canonical instances and that it admits a top-down construction procedure with a provably good approximation ratio

    Window-based Streaming Graph Partitioning Algorithm

    Full text link
    In the recent years, the scale of graph datasets has increased to such a degree that a single machine is not capable of efficiently processing large graphs. Thereby, efficient graph partitioning is necessary for those large graph applications. Traditional graph partitioning generally loads the whole graph data into the memory before performing partitioning; this is not only a time consuming task but it also creates memory bottlenecks. These issues of memory limitation and enormous time complexity can be resolved using stream-based graph partitioning. A streaming graph partitioning algorithm reads vertices once and assigns that vertex to a partition accordingly. This is also called an one-pass algorithm. This paper proposes an efficient window-based streaming graph partitioning algorithm called WStream. The WStream algorithm is an edge-cut partitioning algorithm, which distributes a vertex among the partitions. Our results suggest that the WStream algorithm is able to partition large graph data efficiently while keeping the load balanced across different partitions, and communication to a minimum. Evaluation results with real workloads also prove the effectiveness of our proposed algorithm, and it achieves a significant reduction in load imbalance and edge-cut with different ranges of dataset

    The minimum bisection in the planted bisection model

    Get PDF
    In the planted bisection model a random graph G(n,p+,pβˆ’)G(n,p_+,p_- ) with nn vertices is created by partitioning the vertices randomly into two classes of equal size (up to Β±1\pm1). Any two vertices that belong to the same class are linked by an edge with probability p+p_+ and any two that belong to different classes with probability pβˆ’<p+p_- <p_+ independently. The planted bisection model has been used extensively to benchmark graph partitioning algorithms. If pΒ±=2dΒ±/np_{\pm} =2d_{\pm} /n for numbers 0≀dβˆ’<d+0\leq d_- <d_+ that remain fixed as nβ†’βˆžn\to\infty, then w.h.p. the ``planted'' bisection (the one used to construct the graph) will not be a minimum bisection. In this paper we derive an asymptotic formula for the minimum bisection width under the assumption that d+βˆ’dβˆ’>cd+ln⁑d+d_+ -d_- >c\sqrt{d_+ \ln d_+ } for a certain constant c>0c>0
    • …
    corecore