Consistency of Spectral Hypergraph Partitioning under Planted Partition Model
Hypergraph partitioning lies at the heart of a number of problems in machine
learning and network sciences. Many algorithms for hypergraph partitioning have
been proposed that extend standard approaches for graph partitioning to the
case of hypergraphs. However, theoretical aspects of such methods have seldom
received attention in the literature as compared to the extensive studies on
the guarantees of graph partitioning. For instance, consistency results of
spectral graph partitioning under the stochastic block model are well known. In
this paper, we present a planted partition model for sparse random non-uniform
hypergraphs that generalizes the stochastic block model. We derive an error
bound for a spectral hypergraph partitioning algorithm under this model using
matrix concentration inequalities. To the best of our knowledge, this is the
first consistency result related to partitioning non-uniform hypergraphs.
Comment: 35 pages, 2 figures, 1 table
Subsampled Power Iteration: a Unified Algorithm for Block Models and Planted CSP's
We present an algorithm for recovering planted solutions in two well-known
models, the stochastic block model and planted constraint satisfaction
problems, via a common generalization in terms of random bipartite graphs. Our
algorithm matches up to a constant factor the best-known bounds for the number
of edges (or constraints) needed for perfect recovery and its running time is
linear in the number of edges used. The time complexity is significantly better
than both spectral and SDP-based approaches.
The main contribution of the algorithm is in the case of unequal sizes in the
bipartition (corresponding to odd uniformity in the CSP). Here our algorithm
succeeds at a significantly lower density than the spectral approaches,
surpassing a barrier based on the spectral norm of a random matrix.
Other significant features of the algorithm and analysis include: (i) the
critical use of power iteration with subsampling, which might be of independent
interest and whose analysis requires keeping track of multiple norms of an
evolving solution; (ii) it can be implemented statistically, i.e., with very
limited access to the input distribution; and (iii) the algorithm is extremely
simple to implement and runs in linear time, and thus is practical even for
very large instances.
Spectral Thresholds in the Bipartite Stochastic Block Model
We consider a bipartite stochastic block model on vertex sets $V_1$ and
$V_2$, with planted partitions in each, and ask at what densities efficient
algorithms can recover the partition of the smaller vertex set.
When $|V_2| \gg |V_1|$, multiple thresholds emerge. We first locate a sharp
threshold for detection of the partition, in the sense of the results of
\cite{mossel2012stochastic,mossel2013proof} and \cite{massoulie2014community}
for the stochastic block model. We then show that at a higher edge density, the
singular vectors of the rectangular biadjacency matrix exhibit a localization /
delocalization phase transition, giving recovery above the threshold and no
recovery below. Nevertheless, we propose a simple spectral algorithm, Diagonal
Deletion SVD, which recovers the partition at a nearly optimal edge density.
The bipartite stochastic block model studied here was used by
\cite{feldman2014algorithm} to give a unified algorithm for recovering planted
partitions and assignments in random hypergraphs and random $k$-SAT formulae
respectively. Our results give the best known bounds for the clause density at
which solutions can be found efficiently in these models as well as showing a
barrier to further improvement via this reduction to the bipartite block model.
Comment: updated version, will appear in COLT 201
A cost function for similarity-based hierarchical clustering
The development of algorithms for hierarchical clustering has been hampered
by a shortage of precise objective functions. To help address this situation,
we introduce a simple cost function on hierarchies over a set of points, given
pairwise similarities between those points. We show that this criterion behaves
sensibly in canonical instances and that it admits a top-down construction
procedure with a provably good approximation ratio.
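
The abstract above does not state the cost function itself. The following is a minimal sketch, assuming the commonly cited form in which every pair of points contributes its similarity multiplied by the number of leaves under the pair's lowest common ancestor in the hierarchy. The nested-tuple tree encoding and the names `leaves`, `hierarchy_cost`, and `sim` are illustrative choices, not taken from the paper.

```python
from itertools import product


def leaves(tree):
    """Return the leaf labels under a nested-tuple binary tree."""
    if not isinstance(tree, tuple):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def hierarchy_cost(tree, sim):
    """Sum, over all pairs, of sim[i][j] times the leaf count of their
    lowest common ancestor (assumed form of the cost; see lead-in)."""
    if not isinstance(tree, tuple):
        return 0
    left, right = tree
    left_leaves, right_leaves = leaves(left), leaves(right)
    size = len(left_leaves) + len(right_leaves)
    # Pairs separated at this node have this node as their common ancestor.
    split = sum(sim[i][j] for i, j in product(left_leaves, right_leaves))
    return size * split + hierarchy_cost(left, sim) + hierarchy_cost(right, sim)


# Toy instance: points {0, 1} and {2, 3} are similar within each pair.
sim = [
    [0, 10, 1, 1],
    [10, 0, 1, 1],
    [1, 1, 0, 10],
    [1, 1, 10, 0],
]
good = ((0, 1), (2, 3))  # similar points merged low in the tree
bad = ((0, 2), (1, 3))   # similar points separated at the root
print(hierarchy_cost(good, sim), hierarchy_cost(bad, sim))
```

Under this assumed form, a hierarchy that cuts highly similar pairs near the root pays for them at a large subtree size, so the first tree in the toy instance scores lower than the second.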
Window-based Streaming Graph Partitioning Algorithm
In recent years, the scale of graph datasets has increased to such a
degree that a single machine is not capable of efficiently processing large
graphs. Therefore, efficient graph partitioning is necessary for those large
graph applications. Traditional graph partitioning generally loads the whole
graph data into the memory before performing partitioning; this is not only a
time consuming task but it also creates memory bottlenecks. These issues of
memory limitation and enormous time complexity can be resolved using
stream-based graph partitioning. A streaming graph partitioning algorithm reads
each vertex once and assigns it to a partition accordingly. This is also
called a one-pass algorithm. This paper proposes an efficient window-based
streaming graph partitioning algorithm called WStream. The WStream algorithm is
an edge-cut partitioning algorithm, which distributes vertices among the
partitions. Our results suggest that the WStream algorithm is able to partition
large graph data efficiently while keeping the load balanced across different
partitions, and communication to a minimum. Evaluation results with real
workloads also demonstrate the effectiveness of our proposed algorithm, which
achieves a significant reduction in load imbalance and edge-cut across different
ranges of datasets.
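
The one-pass idea described above can be illustrated with a generic greedy heuristic. This is not WStream (the abstract gives no details of its window mechanism); it is a sketch of a capacity-weighted greedy rule in which each arriving vertex goes to the partition already holding most of its neighbors, discounted by how full that partition is. The names `stream_partition`, `edge_cut`, `num_parts`, and `capacity` are hypothetical.

```python
def stream_partition(vertex_stream, num_parts, capacity):
    """Assign each (vertex, neighbors) item to a partition in a single pass."""
    assignment = {}
    loads = [0] * num_parts
    for vertex, neighbors in vertex_stream:
        best_part, best_key = None, None
        for part in range(num_parts):
            if loads[part] >= capacity:
                continue  # partition is full
            placed = sum(1 for u in neighbors if assignment.get(u) == part)
            # Favor partitions holding many neighbors, penalize full ones.
            score = placed * (1.0 - loads[part] / capacity)
            key = (score, -loads[part])  # break ties toward lighter load
            if best_key is None or key > best_key:
                best_part, best_key = part, key
        if best_part is None:
            raise ValueError("total capacity exceeded")
        assignment[vertex] = best_part
        loads[best_part] += 1
    return assignment


def edge_cut(edges, assignment):
    """Count edges whose endpoints land in different partitions."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


# Two triangles joined by the single edge (2, 3).
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
parts = stream_partition(adjacency.items(), num_parts=2, capacity=3)
print(parts, edge_cut(edges, parts))
```

The algorithm only ever consults vertices it has already placed, which is why no full in-memory copy of the graph is needed; the price is that the result depends on the arrival order of the stream.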
The minimum bisection in the planted bisection model
In the planted bisection model a random graph with
vertices is created by partitioning the vertices randomly into two classes of
equal size (up to ). Any two vertices that belong to the same class are
linked by an edge with probability and any two that belong to different
classes with probability independently. The planted bisection model
has been used extensively to benchmark graph partitioning algorithms. If
for numbers that remain fixed as
, then w.h.p. the ``planted'' bisection (the one used to construct
the graph) will not be a minimum bisection. In this paper we derive an
asymptotic formula for the minimum bisection width under the assumption that
for a certain constant
…
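
The construction of the planted bisection model described above can be sketched as follows. Because the symbols for the two edge probabilities did not survive in this listing, the parameter names `p_in` (same class) and `p_out` (different classes) are placeholders, as are the function names `planted_bisection` and `cut_width`.

```python
import random


def planted_bisection(n, p_in, p_out, seed=None):
    """Sample a graph on n vertices with a random planted bisection.

    Returns (side, edges): side maps each vertex to class 0 or 1, with
    class sizes differing by at most one; edges is a list of pairs.
    """
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    half = n // 2
    side = {v: (0 if idx < half else 1) for idx, v in enumerate(order)}
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            # Independent coin flip per pair, biased by class membership.
            p = p_in if side[u] == side[v] else p_out
            if rng.random() < p:
                edges.append((u, v))
    return side, edges


def cut_width(edges, side):
    """Number of edges crossing the given two-class partition."""
    return sum(1 for u, v in edges if side[u] != side[v])


side, edges = planted_bisection(200, p_in=0.05, p_out=0.01, seed=0)
print(len(edges), cut_width(edges, side))
```

The value printed for `cut_width` is the width of the planted bisection only. As the abstract notes, in the sparse regime this need not be a minimum bisection, and computing the true minimum is a separate optimization problem that this sketch does not attempt.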