Convex optimization for the planted k-disjoint-clique problem
We consider the k-disjoint-clique problem. The input is an undirected graph G
in which the nodes represent data items, and edges indicate a similarity
between the corresponding items. The problem is to find within the graph k
disjoint cliques that cover the maximum number of nodes of G. This problem may
be understood as a general way to pose the classical `clustering' problem. In
clustering, one is given data items and a distance function, and one wishes to
partition the data into disjoint clusters of data items, such that the items in
each cluster are close to each other. Our formulation additionally allows
`noise' nodes to be present in the input data that are not part of any of the
cliques. The k-disjoint-clique problem is NP-hard, but we show that a convex
relaxation can solve it in polynomial time for input instances constructed in a
certain way. The input instances for which our algorithm finds the optimal
solution consist of k disjoint large cliques (called `planted cliques') that
are then obscured by noise edges and noise nodes inserted either at random or
by an adversary.
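
To make the planted instance described above concrete, the following is a minimal sketch of how such an input graph might be generated under the random-noise model only. The function name `planted_clique_instance` and its parameters (`clique_sizes`, `n_noise_nodes`, `noise_prob`) are illustrative assumptions and do not come from the paper. The adversarial case and the convex relaxation itself are not modeled here.

```python
import numpy as np

def planted_clique_instance(clique_sizes, n_noise_nodes, noise_prob, seed=0):
    """Build a 0/1 adjacency matrix with planted disjoint cliques plus random noise.

    Hypothetical helper: names and the noise model are assumptions made
    for illustration, not the construction used in the paper.
    """
    rng = np.random.default_rng(seed)
    n = sum(clique_sizes) + n_noise_nodes

    # Random noise edges: sample the strict upper triangle, then symmetrize.
    upper = np.triu(rng.random((n, n)) < noise_prob, k=1)
    adj = upper | upper.T

    # Label -1 marks noise nodes that belong to no planted clique.
    labels = np.full(n, -1)
    start = 0
    for c, size in enumerate(clique_sizes):
        block = slice(start, start + size)
        adj[block, block] = True      # make the block a complete subgraph
        labels[block] = c
        start += size

    np.fill_diagonal(adj, False)      # no self-loops
    return adj.astype(int), labels

adj, labels = planted_clique_instance([5, 4], n_noise_nodes=3, noise_prob=0.1)
```

Here `adj` is symmetric with a zero diagonal, and `labels` records which planted clique each node belongs to, so a recovery method can be checked against the ground truth.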
Guaranteed clustering and biclustering via semidefinite programming
Identifying clusters of similar objects in data plays a significant role in a
wide range of applications. As a model problem for clustering, we consider the
densest k-disjoint-clique problem, whose goal is to identify the collection of
k disjoint cliques of a given weighted complete graph maximizing the sum of the
densities of the complete subgraphs induced by these cliques. In this paper, we
establish conditions ensuring exact recovery of the densest k cliques of a
given graph from the optimal solution of a particular semidefinite program. In
particular, the semidefinite relaxation is exact for input graphs corresponding
to data consisting of k large, distinct clusters and a smaller number of
outliers. This approach also yields a semidefinite relaxation for the
biclustering problem with similar recovery guarantees. Given a set of objects
and a set of features exhibited by these objects, biclustering seeks to
simultaneously group the objects and features according to their expression
levels. This problem may be posed as partitioning the nodes of a weighted
bipartite complete graph such that the sum of the densities of the resulting
bipartite complete subgraphs is maximized. As in our analysis of the densest
k-disjoint-clique problem, we show that the correct partition of the objects
and features can be recovered from the optimal solution of a semidefinite
program in the case that the given data consists of several disjoint sets of
objects exhibiting similar features. Empirical evidence from numerical
experiments supporting these theoretical guarantees is also provided.
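
As a rough illustration of the objective in the densest k-disjoint-clique problem, the sketch below evaluates the sum of densities for a candidate collection of disjoint cliques. The abstract does not define density, so this sketch assumes it is the total edge weight inside a clique divided by the number of its nodes. The function name `density_objective` is hypothetical, the weight matrix is assumed symmetric, and each clique is assumed non-empty. It only scores a given candidate and does not solve the semidefinite program.

```python
import numpy as np

def density_objective(weights, cliques):
    """Sum of densities of the complete subgraphs induced by disjoint cliques.

    Assumption (not stated in the abstract): density of a clique is its
    total internal edge weight divided by its number of nodes.
    """
    total = 0.0
    for nodes in cliques:
        idx = np.asarray(nodes)
        sub = weights[np.ix_(idx, idx)]         # weights among the clique's nodes
        edge_weight = np.triu(sub, k=1).sum()   # count each undirected edge once
        total += edge_weight / len(idx)
    return total

# Complete graph on 4 nodes with unit edge weights.
W = np.ones((4, 4)) - np.eye(4)
score = density_objective(W, [[0, 1], [2, 3]])
```

Under this assumed definition, comparing the score of different candidate partitions of the same weighted graph shows which one the objective prefers.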