48 research outputs found
Exact Clustering of Weighted Graphs via Semidefinite Programming
As a model problem for clustering, we consider the densest k-disjoint-clique
problem of partitioning a weighted complete graph into k disjoint subgraphs
such that the sum of the densities of these subgraphs is maximized. We
establish that such subgraphs can be recovered from the solution of a
particular semidefinite relaxation with high probability if the input graph is
sampled from a distribution of clusterable graphs. Specifically, the
semidefinite relaxation is exact if the graph consists of k large disjoint
subgraphs, corresponding to clusters, with weight concentrated within these
subgraphs, plus a moderate number of outliers. Further, we establish that if
noise is weakly obscuring these clusters, i.e., the between-cluster edges are
assigned very small weights, then we can recover significantly smaller
clusters. For example, we show that in approximately sparse graphs, where the
between-cluster weights tend to zero as the size n of the graph tends to
infinity, we can recover clusters of size polylogarithmic in n. Empirical
evidence from numerical simulations is also provided to support these
theoretical phase transitions to perfect recovery of the cluster structure.
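To make the objective concrete, here is a minimal sketch that evaluates the sum of subgraph densities for a candidate partition. It assumes that the density of a cluster is its total within-cluster edge weight divided by its number of nodes, which is one common convention and may not match the paper's exact normalization. The function name `density_sum` and the toy weights are illustrative only, and the sketch does not solve the semidefinite relaxation itself.

```python
from itertools import combinations


def density_sum(weights, clusters):
    """Sum over clusters of (total within-cluster edge weight) / (cluster size).

    weights:  symmetric matrix (list of lists); weights[i][j] is the weight of edge {i, j}.
    clusters: list of disjoint, non-empty lists of node indices.
    """
    total = 0.0
    for cluster in clusters:
        # Each unordered pair inside the cluster is counted once.
        inside = sum(weights[i][j] for i, j in combinations(cluster, 2))
        total += inside / len(cluster)
    return total


# Toy graph (assumed data): nodes 0-2 and 3-4 form two groups with heavy
# internal weights (1.0) and light between-group weights (0.1).
W = [
    [0.0 if i == j else (1.0 if (i < 3) == (j < 3) else 0.1) for j in range(5)]
    for i in range(5)
]

planted = density_sum(W, [[0, 1, 2], [3, 4]])
mixed = density_sum(W, [[0, 1, 3], [2, 4]])
print(planted, mixed)
```

On this toy input the planted partition scores higher than the mixed one, which is the behaviour the objective is meant to reward.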
Guaranteed clustering and biclustering via semidefinite programming
Identifying clusters of similar objects in data plays a significant role in a
wide range of applications. As a model problem for clustering, we consider the
densest k-disjoint-clique problem, whose goal is to identify the collection of
k disjoint cliques of a given weighted complete graph maximizing the sum of the
densities of the complete subgraphs induced by these cliques. In this paper, we
establish conditions ensuring exact recovery of the densest k cliques of a
given graph from the optimal solution of a particular semidefinite program. In
particular, the semidefinite relaxation is exact for input graphs corresponding
to data consisting of k large, distinct clusters and a smaller number of
outliers. This approach also yields a semidefinite relaxation for the
biclustering problem with similar recovery guarantees. Given a set of objects
and a set of features exhibited by these objects, biclustering seeks to
simultaneously group the objects and features according to their expression
levels. This problem may be posed as partitioning the nodes of a weighted
bipartite complete graph such that the sum of the densities of the resulting
bipartite complete subgraphs is maximized. As in our analysis of the densest
k-disjoint-clique problem, we show that the correct partition of the objects
and features can be recovered from the optimal solution of a semidefinite
program in the case that the given data consists of several disjoint sets of
objects exhibiting similar features. Empirical evidence from numerical
experiments supporting these theoretical guarantees is also provided.
Convex optimization for the planted k-disjoint-clique problem
We consider the k-disjoint-clique problem. The input is an undirected graph G
in which the nodes represent data items, and edges indicate a similarity
between the corresponding items. The problem is to find within the graph k
disjoint cliques that cover the maximum number of nodes of G. This problem may
be understood as a general way to pose the classical `clustering' problem. In
clustering, one is given data items and a distance function, and one wishes to
partition the data into disjoint clusters of data items, such that the items in
each cluster are close to each other. Our formulation additionally allows
`noise' nodes to be present in the input data that are not part of any of the
cliques. The k-disjoint-clique problem is NP-hard, but we show that a convex
relaxation can solve it in polynomial time for input instances constructed in a
certain way. The input instances for which our algorithm finds the optimal
solution consist of k disjoint large cliques (called `planted cliques') that
are then obscured by noise edges and noise nodes inserted either at random or
by an adversary.
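As an illustration of what a feasible solution looks like, the sketch below checks that a proposed collection of node sets consists of pairwise disjoint cliques and reports how many nodes it covers. It only verifies a candidate and is not the convex relaxation described in the abstract. The function name `covered_nodes` and the example graph are assumptions made for this example.

```python
from itertools import combinations


def covered_nodes(edges, cliques):
    """Return the number of covered nodes if `cliques` are pairwise disjoint
    cliques of the graph given by `edges`; return None otherwise."""
    adjacent = {frozenset(e) for e in edges}
    seen = set()
    for clique in cliques:
        members = set(clique)
        if seen & members:
            return None  # two candidate cliques share a node
        for u, v in combinations(sorted(members), 2):
            if frozenset((u, v)) not in adjacent:
                return None  # a pair inside the candidate is not adjacent
        seen |= members
    return len(seen)


# Assumed toy graph: a triangle {0, 1, 2}, an edge {3, 4}, a noise edge {2, 3},
# and a noise node 5 attached to node 0.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (2, 3), (5, 0)]
print(covered_nodes(edges, [[0, 1, 2], [3, 4]]))
```

Node 5 plays the role of a noise node here: it belongs to no clique, so a valid solution may leave it uncovered.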
Matched Filters for Noisy Induced Subgraph Detection
The problem of finding the vertex correspondence between two noisy graphs
with different numbers of vertices where the smaller graph is still large has
many applications in social networks, neuroscience, and computer vision. We
propose a solution to this problem via a graph matching matched filter:
centering and padding the smaller adjacency matrix and applying graph matching
methods to align it to the larger network. The centering and padding schemes
can be incorporated into any algorithm that matches using adjacency matrices.
Under a statistical model for correlated pairs of graphs, which yields a noisy
copy of the small graph within the larger graph, the resulting optimization
problem can be guaranteed to recover the true vertex correspondence between the
networks.
However, there are currently no efficient algorithms for solving this
problem. To illustrate the possibilities and challenges of such problems, we
use an algorithm that can exploit a partially known correspondence and show via
varied simulations and applications to Drosophila and human connectomes
that this approach can achieve good performance. Comment: 41 pages, 7 figures
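The centering and padding step can be sketched as follows. This is one plausible reading of the abstract, not necessarily the exact scheme used in the paper: off-diagonal 0/1 adjacency entries are mapped to -1/+1, and the smaller matrix is then embedded in a zero matrix of the larger size. The helper names `center`, `pad`, and `matched_filter_inputs` are hypothetical, and the graph matching step itself is left to whatever adjacency-based matcher is used downstream.

```python
def center(adj):
    """Map 0/1 adjacency entries to -1/+1 off the diagonal; the diagonal stays 0."""
    n = len(adj)
    return [[0 if i == j else 2 * adj[i][j] - 1 for j in range(n)] for i in range(n)]


def pad(mat, size):
    """Embed `mat` in the top-left corner of a size-by-size zero matrix."""
    n = len(mat)
    return [
        [mat[i][j] if i < n and j < n else 0 for j in range(size)]
        for i in range(size)
    ]


def matched_filter_inputs(small, large):
    """Return equally sized matrices ready to hand to a graph matching routine."""
    return pad(center(small), len(large)), center(large)


# Assumed toy data: a single edge to be located inside a 3-node path.
small = [[0, 1], [1, 0]]
large = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
a, b = matched_filter_inputs(small, large)
print(a)
print(b)
```

With centering, a non-edge in the larger graph contributes -1 instead of 0, so a matcher is penalized for aligning an edge of the small graph with a non-edge of the large one. The padded rows remain 0 and do not affect the score.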
Matched filters for noisy induced subgraph detection
First author draft. We consider the problem of finding the vertex correspondence between two graphs with different numbers of vertices where the smaller graph is still potentially large. We propose a solution to this problem via a graph matching matched filter: padding the smaller graph in different ways and then using graph matching methods to align it to the larger network. Under a statistical model for correlated pairs of graphs, which yields a noisy copy of the small graph within the larger graph, the resulting optimization problem can be guaranteed to recover the true vertex correspondence between the networks, though there are currently no efficient algorithms for solving this problem. We consider an approach that exploits a partially known correspondence and show via varied simulations and applications to the Drosophila connectome that in practice this approach can achieve good performance. https://arxiv.org/abs/1803.02423
Harnessing the mathematics of matrix decomposition to solve planted and maximum clique problem
We consider the problem of identifying a maximum clique in a given graph. We
have proposed a mathematical model for this problem. The model resembles the
matrix decomposition of the adjacency matrix of a given graph. The objective
function of the mathematical model includes a weighted ℓ1-norm of the
sparse matrix of the decomposition, which has an advantage over the known
ℓ1-norm in reducing the error. The use of dynamically changing the
weights for the ℓ1-norm has been motivated. We have used proximal
operators within the iterates of the ADMM (alternating direction method of
multipliers) algorithm to solve the optimization problem. Convergence of the
proposed ADMM algorithm has been provided. The theoretical guarantee of the
maximum clique in the form of the low-rank matrix has also been established
using the golfing scheme to construct approximate dual certificates. We have
constructed conditions that guarantee the recovery and uniqueness of the
solution, as well as a tight bound on the dual matrix that validates optimality
conditions. Numerical results for planted cliques are presented showing clear
advantages of our model when compared with two recent mathematical models.
Results are also presented for randomly generated graphs with minimal errors.
These errors are found using a formula we have proposed based on the size of
the clique. Moreover, we have applied our algorithm to real-world graphs for
which cliques have been recovered successfully. The validity of these clique
sizes comes from the decomposition of the input graph into a rank-one matrix
(corresponding to the clique) and a sparse matrix.
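The proximal step for the sparse term can be illustrated as below, assuming the weighted norm in question is an entrywise weighted ℓ1-norm, i.e. the sum of w_ij * |S_ij|. Under that assumption the proximal operator is entrywise soft-thresholding with a per-entry threshold. The function name `prox_weighted_l1` and the sample values are illustrative; the full ADMM iteration and the paper's weight-update rule are not reproduced here.

```python
def prox_weighted_l1(matrix, weights, step):
    """Proximal operator of step * sum_ij weights[i][j] * |S[i][j]|.

    Each entry is shrunk toward zero by step * weights[i][j]; entries whose
    magnitude is below that threshold become exactly zero.
    """
    def shrink(x, t):
        if x > t:
            return x - t
        if x < -t:
            return x + t
        return 0.0

    return [
        [shrink(x, step * w) for x, w in zip(row, weight_row)]
        for row, weight_row in zip(matrix, weights)
    ]


# Assumed toy input: larger weights shrink the corresponding entries harder.
S = [[3.0, -0.5], [-2.0, 1.0]]
Wt = [[1.0, 1.0], [0.5, 2.0]]
print(prox_weighted_l1(S, Wt, 1.0))
```

Changing the weights between iterations changes how strongly each entry is pushed toward zero, which is the mechanism a dynamically reweighted scheme relies on.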
Convex relaxation for the planted clique, biclique, and clustering problems
A clique of a graph G is a set of pairwise adjacent nodes of G. Similarly, a biclique (U, V) of a bipartite graph G is a pair of disjoint, independent vertex sets such that each node in U is adjacent to every node in V in G. We consider the problems of identifying the maximum clique of a graph, known as the maximum clique problem, and identifying the biclique (U, V) of a bipartite graph that maximizes the product |U| · |V|, known as the maximum edge biclique problem. We show that finding a clique or biclique of a given size in a graph is equivalent to finding a rank-one matrix satisfying a particular set of linear constraints. These problems can be formulated as rank minimization problems and relaxed to convex programming by replacing rank with its convex envelope, the nuclear norm. Both problems are NP-hard, yet we show that our relaxation is exact in the case that the input graph contains a large clique or biclique plus additional nodes and edges. For each problem, we provide two analyses of when our relaxation is exact. In the first,
the diversionary edges are added deterministically by an adversary. In the second, each potential edge is added to the graph independently at random with fixed probability p. In the random case, our bounds match the earlier bounds of Alon, Krivelevich, and Sudakov, as well as Feige and Krauthgamer for the maximum clique problem.
We extend these results and techniques to the k-disjoint-clique problem. The maximum node k-disjoint-clique problem is to find a set of k disjoint cliques of a given input graph containing the maximum number of nodes. Given input graph G and nonnegative edge
weights w, the maximum mean weight k-disjoint-clique problem seeks to identify the set of k disjoint cliques of G that maximizes the sum of the average weights of the edges, with respect to w, of the complete subgraphs of G induced by the cliques. These problems may be considered as a way to pose the clustering problem. In clustering, one wants to partition a given data set so that the data items in each partition or cluster are similar and the items in different clusters are dissimilar. For the graph G such that the set of nodes represents a given data set and any two nodes are adjacent if and only if the corresponding items are similar, clustering the data into k disjoint clusters is equivalent to partitioning G into k disjoint cliques. Similarly, given a complete graph with nodes corresponding to a given data set and edge weights indicating similarity between each pair of items, the data may be clustered by solving the maximum mean weight k-disjoint-clique problem.
We show that both instances of the k-disjoint-clique problem can be formulated as rank constrained optimization problems and relaxed to semidefinite programs using the nuclear norm relaxation of rank. We also show that when the input instance corresponds to a collection of k disjoint planted cliques plus additional edges and nodes, this semidefinite relaxation is exact for both problems. We provide theoretical bounds that guarantee exactness of our relaxation and provide empirical examples of successful applications of our algorithm to synthetic data sets, as well as data sets from clustering applications.
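The link between cliques and rank-one matrices can be seen in a small sketch. If v is the 0/1 indicator vector of a node set, then X = v v^T has rank one, and the set is a clique exactly when X is zero at every off-diagonal position that is not an edge. The code below assumes this particular form of the linear constraints, which the abstract does not spell out. The names `clique_matrix` and `respects_non_edges` are hypothetical.

```python
def clique_matrix(n, nodes):
    """Rank-one matrix v v^T, where v is the 0/1 indicator vector of `nodes`."""
    v = [1 if i in nodes else 0 for i in range(n)]
    return [[v[i] * v[j] for j in range(n)] for i in range(n)]


def respects_non_edges(X, edges):
    """True if X[i][j] == 0 for every pair i != j that is not an edge."""
    adjacent = {frozenset(e) for e in edges}
    n = len(X)
    return all(
        X[i][j] == 0
        for i in range(n)
        for j in range(n)
        if i != j and frozenset((i, j)) not in adjacent
    )


# Assumed toy graph: a triangle {0, 1, 2} with a pendant edge {2, 3}.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
X_good = clique_matrix(4, {0, 1, 2})
X_bad = clique_matrix(4, {1, 2, 3})
print(respects_non_edges(X_good, edges), respects_non_edges(X_bad, edges))
```

For a clique on m nodes the entries of X sum to m squared, so a constraint on that sum fixes the clique size. A relaxation of the kind described would then replace the rank requirement with a nuclear norm objective.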
Planted Models for the Densest k-Subgraph Problem
Given an undirected graph G, the Densest k-subgraph problem (DkS) asks to compute a set S ⊆ V of cardinality |S| ≤ k such that the weight of edges inside S is maximized. This is a fundamental NP-hard problem whose approximability, in spite of many decades of research, is yet to be settled. The current best known approximation algorithm due to Bhaskara et al. (2010) computes an O(n^{1/4 + ε}) approximation in time n^{O(1/ε)}, for any ε > 0.
We ask: what are some "easier" instances of this problem? We propose some natural semi-random models of instances with a planted dense subgraph, and study approximation algorithms for computing the densest subgraph in them. These models are inspired by the semi-random models of instances studied for various other graph problems such as the independent set problem, graph partitioning problems, etc. For a large range of parameters of these models, we get significantly better approximation factors for the Densest k-subgraph problem. Moreover, our algorithm recovers a large part of the planted solution.
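For reference, the problem statement can be made concrete with an exhaustive search over small inputs. This is a brute-force baseline written for illustration, not the approximation algorithm studied in the paper. It assumes nonnegative edge weights and k no larger than the number of nodes, so that searching sets of size exactly k loses nothing relative to size at most k. The function name and the toy graph are illustrative.

```python
from itertools import combinations


def densest_k_subgraph(n, weighted_edges, k):
    """Exhaustively search all node sets of size exactly k.

    Runs in time exponential in k, so it is only usable on toy inputs.
    weighted_edges: iterable of (u, v, weight) with nonnegative weights.
    """
    best_set, best_weight = None, -1.0
    for subset in combinations(range(n), k):
        chosen = set(subset)
        inside = sum(w for u, v, w in weighted_edges if u in chosen and v in chosen)
        if inside > best_weight:
            best_set, best_weight = chosen, inside
    return best_set, best_weight


# Assumed toy instance: a unit-weight triangle {0, 1, 2} plus a short tail.
toy_edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0), (2, 3, 0.5), (3, 4, 2.0)]
print(densest_k_subgraph(5, toy_edges, 3))
```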