8,527 research outputs found
Clustering Partially Observed Graphs via Convex Optimization
This paper considers the problem of clustering a partially observed
unweighted graph---i.e., one where for some node pairs we know there is an edge
between them, for some others we know there is no edge, and for the remaining
we do not know whether or not there is an edge. We want to organize the nodes
into disjoint clusters so that there is relatively dense (observed)
connectivity within clusters, and sparse across clusters.
We take a novel yet natural approach to this problem, by focusing on finding
the clustering that minimizes the number of "disagreements"---i.e., the sum of
the number of (observed) missing edges within clusters, and (observed) present
edges across clusters. Our algorithm uses convex optimization; its basis is a
reduction of disagreement minimization to the problem of recovering an
(unknown) low-rank matrix and an (unknown) sparse matrix from their partially
observed sum. We evaluate the performance of our algorithm on the classical
Planted Partition/Stochastic Block Model. Our main theorem provides sufficient
conditions for the success of our algorithm as a function of the minimum
cluster size, edge density and observation probability; in particular, the
results characterize the tradeoff between the observation probability and the
edge density gap. When there are a constant number of clusters of equal size,
our results are optimal up to logarithmic factors.Comment: This is the final version published in Journal of Machine Learning
Research (JMLR). Partial results appeared in International Conference on
Machine Learning (ICML) 201
Scalable and Robust Community Detection with Randomized Sketching
This paper explores and analyzes the unsupervised clustering of large
partially observed graphs. We propose a scalable and provable randomized
framework for clustering graphs generated from the stochastic block model. The
clustering is first applied to a sub-matrix of the graph's adjacency matrix
associated with a reduced graph sketch constructed using random sampling. Then,
the clusters of the full graph are inferred based on the clusters extracted
from the sketch using a correlation-based retrieval step. Uniform random node
sampling is shown to improve the computational complexity over clustering of
the full graph when the cluster sizes are balanced. A new random degree-based
node sampling algorithm is presented which significantly improves upon the
performance of the clustering algorithm even when clusters are unbalanced. This
algorithm improves the phase transitions for matrix-decomposition-based
clustering with regard to computational complexity and minimum cluster size,
which are shown to be nearly dimension-free in the low inter-cluster
connectivity regime. A third sampling technique is shown to improve balance by
randomly sampling nodes based on spatial distribution. We provide analysis and
numerical results using a convex clustering algorithm based on matrix
completion
- …