Exact Clustering of Weighted Graphs via Semidefinite Programming
As a model problem for clustering, we consider the densest k-disjoint-clique
problem of partitioning a weighted complete graph into k disjoint subgraphs
such that the sum of the densities of these subgraphs is maximized. We
establish that such subgraphs can be recovered from the solution of a
particular semidefinite relaxation with high probability if the input graph is
sampled from a distribution of clusterable graphs. Specifically, the
semidefinite relaxation is exact if the graph consists of k large disjoint
subgraphs, corresponding to clusters, with weight concentrated within these
subgraphs, plus a moderate number of outliers. Further, we establish that if
noise only weakly obscures these clusters, i.e., the between-cluster edges are
assigned very small weights, then we can recover significantly smaller
clusters. For example, we show that in approximately sparse graphs, where the
between-cluster weights tend to zero as the size n of the graph tends to
infinity, we can recover clusters of size polylogarithmic in n. Empirical
evidence from numerical simulations is also provided to support these
theoretical phase transitions to perfect recovery of the cluster structure.
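
To make the objective of the densest k-disjoint-clique problem concrete, the following is a minimal sketch that scores a candidate partition of a small weighted graph. It assumes that the density of a subgraph is its total internal edge weight divided by its number of nodes; the abstract does not spell out this definition, so treat it as an assumption. The function names and the toy weights (1.0 within clusters, 0.1 between clusters) are hypothetical and chosen only for illustration. The sketch evaluates the objective; it does not solve the semidefinite relaxation.

```python
from itertools import combinations


def subgraph_density(weights, nodes):
    """Total edge weight inside `nodes` divided by the number of nodes.

    Assumed definition of density; the abstract does not state it explicitly.
    """
    nodes = list(nodes)
    if not nodes:
        return 0.0
    total = sum(weights[i][j] for i, j in combinations(nodes, 2))
    return total / len(nodes)


def partition_objective(weights, clusters):
    """Sum of the densities of the disjoint subgraphs in `clusters`."""
    return sum(subgraph_density(weights, c) for c in clusters)


# Toy complete graph on 6 nodes with two planted clusters:
# weight 1.0 inside a cluster, 0.1 between clusters (illustrative values).
n = 6
planted = [[0, 1, 2], [3, 4, 5]]
label = {v: k for k, cluster in enumerate(planted) for v in cluster}
weights = [
    [0.0 if i == j else (1.0 if label[i] == label[j] else 0.1) for j in range(n)]
    for i in range(n)
]

planted_score = partition_objective(weights, planted)
mixed_score = partition_objective(weights, [[0, 1, 3], [2, 4, 5]])
print(planted_score, mixed_score)
```

On this toy instance the planted partition scores higher than the partition that mixes nodes across clusters, which is the behaviour the objective is meant to reward.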
Sharp performance bounds for graph clustering via convex optimization
The problem of finding clusters in a graph arises in several applications such as social networks, data mining and computer networks. A typical convex optimization approach is to identify a sparse plus low-rank decomposition of the adjacency matrix of the graph, with the (dense) low-rank component representing the clusters. In this paper, we sharply characterize the conditions for successfully identifying clusters using this approach. In particular, we introduce the “effective density” of a cluster, which measures its significance, and we find explicit upper and lower bounds on the minimum effective density that demarcates regions of success or failure of this technique. Our conditions are in terms of (a) the size of the clusters, (b) the denseness of the graph, and (c) the regularization parameter of the convex program. We also present extensive simulations that corroborate our theoretical findings.
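
The sparse plus low-rank model can be illustrated on a toy graph. The sketch below is only an illustration of the model, not the convex program studied in the paper: it assumes the cluster labels are already known, builds the ideal block matrix L (ones within clusters, including the diagonal, which is a convention assumed here), and reads off the residual S = A - L. All names and the example graph are hypothetical.

```python
def block_matrix(clusters, n):
    """Ideal cluster matrix: entry (i, j) is 1 when i and j share a cluster."""
    L = [[0] * n for _ in range(n)]
    for cluster in clusters:
        for i in cluster:
            for j in cluster:
                L[i][j] = 1
    return L


def residual(A, L):
    """Entrywise difference S = A - L."""
    return [[a - l for a, l in zip(row_a, row_l)] for row_a, row_l in zip(A, L)]


n = 6
clusters = [[0, 1, 2], [3, 4, 5]]
L = block_matrix(clusters, n)

# Observed adjacency: start from the ideal matrix, delete one within-cluster
# edge (0, 1) and add one between-cluster edge (2, 3).
A = [row[:] for row in L]
A[0][1] = A[1][0] = 0
A[2][3] = A[3][2] = 1

S = residual(A, L)
num_nonzero = sum(1 for row in S for x in row if x != 0)

# For a block matrix built from disjoint clusters, the rank equals the number
# of distinct nonzero rows, so no linear algebra library is needed here.
rank_L = len({tuple(row) for row in L if any(row)})
print(rank_L, num_nonzero)
```

Here L has rank equal to the number of clusters, and S is nonzero only at the deleted and added edges. The convex approach aims to recover such a pair from A alone, without access to the labels.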