14 research outputs found
Graph Clustering With Missing Data: Convex Algorithms and Analysis
We consider the problem of finding clusters in an unweighted graph when the graph is partially observed. We analyze two programs based on the convex optimization
approach to low-rank matrix recovery via nuclear norm minimization: one that works for dense graphs, and one that works for both sparse and dense graphs but
requires some a priori knowledge of the total cluster size. For the commonly used Stochastic Block Model, we obtain explicit bounds on the
parameters of the problem (size and sparsity of clusters, the amount of observed data) and on the regularization parameter that characterize the success and failure of the
programs. We corroborate our theoretical findings through extensive simulations. We also run our algorithm on a real data set obtained from crowdsourcing an
image classification task on Amazon Mechanical Turk, and observe significant performance improvement over traditional methods such as k-means.
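To make the setting concrete, the following is a minimal sketch of a partially observed Stochastic Block Model graph together with the ideal cluster matrix that a low-rank recovery program would aim for. It is not the paper's code: the function names and the parameters `p` (within-cluster edge probability), `q` (between-cluster edge probability), and `r` (probability that a node pair is observed) are assumptions chosen for illustration. The ideal matrix is block diagonal with one all-ones block per cluster, so its rank equals the number of clusters, which is what makes a nuclear norm approach natural.

```python
import random


def sample_partial_sbm(sizes, p, q, r, seed=0):
    """Sample a partially observed stochastic block model graph.

    sizes: list of cluster sizes (hypothetical parameter name)
    p: edge probability for pairs inside the same cluster
    q: edge probability for pairs in different clusters
    r: probability that a given node pair is observed at all
    Returns (labels, adj), where adj[i][j] is 1, 0, or None (unobserved).
    """
    rng = random.Random(seed)
    labels = [c for c, size in enumerate(sizes) for _ in range(size)]
    n = len(labels)
    adj = [[None] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1  # convention assumed here: a node is linked to itself
        for j in range(i + 1, n):
            if rng.random() < r:  # this pair is observed
                prob = p if labels[i] == labels[j] else q
                edge = 1 if rng.random() < prob else 0
                adj[i][j] = adj[j][i] = edge  # keep the matrix symmetric
    return labels, adj


def ideal_cluster_matrix(labels):
    """Block-diagonal 0/1 matrix: entry is 1 iff both nodes share a cluster."""
    n = len(labels)
    return [[1 if labels[i] == labels[j] else 0 for j in range(n)]
            for i in range(n)]


labels, adj = sample_partial_sbm([4, 3], p=0.9, q=0.1, r=0.7)
target = ideal_cluster_matrix(labels)
```

With `p=1`, `q=0`, and `r=1` the sampled matrix coincides with the ideal cluster matrix; lowering `r` hides entries, and moving `p` and `q` toward each other adds the noise that the recovery program has to undo.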
Exact Clustering of Weighted Graphs via Semidefinite Programming
As a model problem for clustering, we consider the densest k-disjoint-clique
problem of partitioning a weighted complete graph into k disjoint subgraphs
such that the sum of the densities of these subgraphs is maximized. We
establish that such subgraphs can be recovered from the solution of a
particular semidefinite relaxation with high probability if the input graph is
sampled from a distribution of clusterable graphs. Specifically, the
semidefinite relaxation is exact if the graph consists of k large disjoint
subgraphs, corresponding to clusters, with weight concentrated within these
subgraphs, plus a moderate number of outliers. Further, we establish that if
noise only weakly obscures these clusters, i.e., the between-cluster edges are
assigned very small weights, then we can recover significantly smaller
clusters. For example, we show that in approximately sparse graphs, where the
between-cluster weights tend to zero as the size n of the graph tends to
infinity, we can recover clusters of size polylogarithmic in n. Empirical
evidence from numerical simulations is also provided to support these
theoretical phase transitions to perfect recovery of the cluster structure.
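As a small illustration of the objective described above, the sketch below scores a candidate partition by the sum of its cluster densities. The abstract does not define density, so this assumes the density of a subgraph is its total internal edge weight divided by its number of nodes; the function name and the toy weight matrix are hypothetical. It only evaluates the objective for a given partition and does not implement the semidefinite relaxation.

```python
def partition_density(weights, clusters):
    """Sum of cluster densities for a partition of a weighted complete graph.

    weights: symmetric matrix (list of lists) of edge weights
    clusters: list of disjoint lists of node indices
    Density is assumed to be (total internal weight) / (number of nodes).
    """
    total = 0.0
    for nodes in clusters:
        # Count each unordered pair inside the cluster exactly once.
        internal = sum(weights[i][j] for i in nodes for j in nodes if i < j)
        total += internal / len(nodes)
    return total


# Toy graph: heavy edges inside {0, 1} and {2, 3}, light edges between them.
W = [
    [0.0, 1.0, 0.1, 0.1],
    [1.0, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 1.0],
    [0.1, 0.1, 1.0, 0.0],
]
good = partition_density(W, [[0, 1], [2, 3]])
bad = partition_density(W, [[0, 2], [1, 3]])
```

On this toy graph the partition that follows the heavy edges scores higher than the one that cuts across them, which is the behavior the densest k-disjoint-clique objective is meant to reward.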
Community detection in sparse networks via Grothendieck's inequality
We present a simple and flexible method to prove consistency of semidefinite
optimization problems on random graphs. The method is based on Grothendieck's
inequality. Unlike the previous uses of this inequality that lead to constant
relative accuracy, we achieve any given relative accuracy by leveraging
randomness. We illustrate the method with the problem of community detection in
sparse networks, those with bounded average degrees. We demonstrate that even
in this regime, various simple and natural semidefinite programs can be used to
recover the community structure up to an arbitrarily small fraction of
misclassified vertices. The method is general; it can be applied to a variety
of stochastic models of networks and semidefinite programs.
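The recovery guarantee above is stated in terms of the fraction of misclassified vertices. A minimal sketch of how that fraction could be measured is given below, assuming communities are encoded as integer labels starting at 0 and that a labeling is only meaningful up to a renaming of the communities. The function name is hypothetical, and the brute-force search over label permutations is only practical for a small number of communities.

```python
from itertools import permutations


def misclassified_fraction(true_labels, predicted_labels):
    """Fraction of vertices labeled incorrectly, minimized over all
    renamings of the predicted community labels."""
    k = max(max(true_labels), max(predicted_labels)) + 1
    n = len(true_labels)
    best = n
    for perm in permutations(range(k)):
        # perm[p] is the true community that predicted label p is mapped to.
        errors = sum(1 for t, p in zip(true_labels, predicted_labels)
                     if perm[p] != t)
        best = min(best, errors)
    return best / n


truth = [0, 0, 0, 1, 1, 1]
guess = [1, 1, 1, 0, 0, 1]  # same structure with labels swapped, one mistake
fraction = misclassified_fraction(truth, guess)
```

Here the predicted labels are swapped relative to the truth, so the comparison has to be made after relabeling; once that is done, only one of the six vertices is misclassified.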