Unifying Sparsest Cut, Cluster Deletion, and Modularity Clustering Objectives with Correlation Clustering
Graph clustering, or community detection, is the task of identifying groups
of closely related objects in a large network. In this paper we introduce a new
community-detection framework called LambdaCC that is based on a specially
weighted version of correlation clustering. A key component in our methodology
is a clustering resolution parameter, \lambda, which implicitly controls the
size and structure of clusters formed by our framework. We show that, by
increasing this parameter, our objective effectively interpolates between two
different strategies in graph clustering: finding a sparse cut and forming
dense subgraphs. Our methodology unifies and generalizes a number of other
important clustering quality functions including modularity, sparsest cut, and
cluster deletion, and places them all within the context of an optimization
problem that has been well studied from the perspective of approximation
algorithms. Our approach is particularly relevant in the regime of finding
dense clusters, as it leads to a 2-approximation for the cluster deletion
problem. We use our approach to cluster several graphs, including large
collaboration networks and social networks.
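The interpolation described in this abstract can be illustrated on a toy graph. The abstract does not state the objective itself, so the sketch below assumes the unit-weight form of LambdaCC: a cut edge costs 1 - \lambda and a non-adjacent pair placed in the same cluster costs \lambda. The function names and the brute-force search are illustrative only and are not the paper's algorithms.

```python
from itertools import combinations

def lambda_cc_cost(nodes, edges, clusters, lam):
    """Disagreement cost under the ASSUMED unit-weight LambdaCC objective."""
    label = {v: i for i, c in enumerate(clusters) for v in c}
    edge_set = {frozenset(e) for e in edges}
    cost = 0.0
    for u, v in combinations(nodes, 2):
        same = label[u] == label[v]
        is_edge = frozenset((u, v)) in edge_set
        if is_edge and not same:
            cost += 1 - lam   # edge cut by the clustering
        elif not is_edge and same:
            cost += lam       # non-adjacent pair kept together
    return cost

def set_partitions(items):
    """Yield every partition of a list into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # Put `first` into each existing block in turn, then into its own block.
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

def best_clustering(nodes, edges, lam):
    """Exhaustive search; only feasible for a handful of nodes."""
    return min(set_partitions(list(nodes)),
               key=lambda p: lambda_cc_cost(nodes, edges, p, lam))

# Two triangles joined by a single bridge edge (hypothetical example graph).
nodes = range(6)
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
for lam in (0.05, 0.5):
    print(lam, sorted(sorted(c) for c in best_clustering(nodes, edges, lam)))
```

On this graph the single-cluster solution costs 8\lambda and the two-triangle solution costs 1 - \lambda, so under the assumed objective the optimum switches from one cluster to two dense clusters once \lambda exceeds 1/9.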
Bilu-Linial Stable Instances of Max Cut and Minimum Multiway Cut
We investigate the notion of stability proposed by Bilu and Linial. We obtain
an exact polynomial-time algorithm for \gamma-stable Max Cut instances with
\gamma >= c \sqrt{\log n} \log\log n for some absolute constant c > 0. Our
algorithm is robust: it never returns an incorrect answer; if the instance is
\gamma-stable, it finds the maximum cut; otherwise, it either finds the
maximum cut or certifies that the instance is not \gamma-stable. We prove
that there is no robust polynomial-time algorithm for \gamma-stable instances
of Max Cut when \gamma < \alpha_{SC}(n/2), where \alpha_{SC} is the best
approximation factor for Sparsest Cut with non-uniform demands.
Our algorithm is based on semidefinite programming. We show that the standard
SDP relaxation for Max Cut (with \ell_2^2 triangle inequalities) is integral
if \gamma >= D_{\ell_2^2 \to \ell_1}(n), where D_{\ell_2^2 \to \ell_1}(n)
is the least distortion with which every n-point metric space of negative
type embeds into \ell_1. On the negative side, we show that the SDP
relaxation is not integral when \gamma < D_{\ell_2^2 \to \ell_1}(n/2).
Moreover, there is no tractable convex relaxation for \gamma-stable instances
of Max Cut when \gamma < \alpha_{SC}(n/2). That suggests that solving
\gamma-stable instances with \gamma = o(\sqrt{\log n}) might be difficult or
impossible.
Our results significantly improve previously known results. The best
previously known algorithm for \gamma-stable instances of Max Cut required
that \gamma >= c \sqrt{n} (for some c > 0) [Bilu, Daniely, Linial, and
Saks]. No hardness results were known for the problem. Additionally, we present
an algorithm for 4-stable instances of Minimum Multiway Cut. We also study a
relaxed notion of weak stability.
Comment: 24 pages
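To make the stability notion concrete, here is a brute-force check for tiny Max Cut instances. The abstract does not define stability, so the sketch assumes the usual Bilu-Linial definition (the maximum cut stays the unique optimum when each edge weight is multiplied by a factor in [1, \gamma]) and its standard equivalent test: for every other cut, the weight lost relative to the optimum must exceed \gamma times the weight gained. The function names are hypothetical, the graph is assumed connected, and this exhaustive search is unrelated to the SDP-based algorithm of the paper.

```python
from itertools import combinations

def cut_edges(edges, side):
    """Edges crossing the cut; `side` is the vertex set on one side."""
    return {(u, v) for (u, v) in edges if (u in side) != (v in side)}

def all_cuts(n):
    """Every cut of vertices 0..n-1, with vertex 0 fixed on one side
    so that each cut is generated exactly once."""
    others = list(range(1, n))
    for r in range(len(others) + 1):
        for rest in combinations(others, r):
            yield {0, *rest}

def is_gamma_stable(n, weights, gamma):
    """Brute-force stability test (ASSUMED characterization, connected graph).

    `weights` maps each edge (u, v) to a positive weight.
    """
    edges = list(weights)
    cuts = [cut_edges(edges, side) for side in all_cuts(n)]

    def value(cut):
        return sum(weights[e] for e in cut)

    best = max(cuts, key=value)
    for other in cuts:
        if other == best:
            continue
        lost = value(best - other)    # optimal edges the other cut misses
        gained = value(other - best)  # edges only the other cut contains
        if lost <= gamma * gained:
            return False
    return True

# Hypothetical triangle: two heavy edges and one light edge.
weights = {(0, 1): 3, (1, 2): 3, (0, 2): 1}
print(is_gamma_stable(3, weights, 2.0))
print(is_gamma_stable(3, weights, 3.0))
```

In this triangle the maximum cut separates vertex 1 from vertices 0 and 2; each competing nontrivial cut loses weight 3 and gains weight 1, so under the assumed test the instance is \gamma-stable exactly when \gamma < 3.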
Certified Algorithms: Worst-Case Analysis and Beyond
In this paper, we introduce the notion of a certified algorithm. Certified algorithms provide worst-case and beyond-worst-case performance guarantees. First, a \gamma-certified algorithm is also a \gamma-approximation algorithm: it finds a \gamma-approximation no matter what the input is. Second, it exactly solves \gamma-perturbation-resilient instances (\gamma-perturbation-resilient instances model real-life instances). Additionally, certified algorithms have a number of other desirable properties: they solve both maximization and minimization versions of a problem (e.g. Max Cut and Min Uncut), solve weakly perturbation-resilient instances, and solve optimization problems with hard constraints.
In the paper, we define certified algorithms, describe their properties, present a framework for designing certified algorithms, and provide examples of certified algorithms for Max Cut/Min Uncut, Minimum Multiway Cut, k-medians, and k-means. We also present some negative results.
Improved Cheeger's Inequality: Analysis of Spectral Partitioning Algorithms through Higher Order Spectral Gap
Let \phi(G) be the minimum conductance of an undirected graph G, and let
0=\lambda_1 <= \lambda_2 <=... <= \lambda_n <= 2 be the eigenvalues of the
normalized Laplacian matrix of G. We prove that for any graph G and any k >= 2,
\phi(G) = O(k) \lambda_2 / \sqrt{\lambda_k}, and this performance guarantee
is achieved by the spectral partitioning algorithm. This improves Cheeger's
inequality, and the bound is optimal up to a constant factor for any k. Our
result shows that the spectral partitioning algorithm is a constant factor
approximation algorithm for finding a sparse cut if \lambda_k is a constant
for some constant k. This provides some theoretical justification to its
empirical performance in image segmentation and clustering problems. We extend
the analysis to other graph partitioning problems, including multi-way
partition, balanced separator, and maximum cut.
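For comparison, the classical Cheeger inequality and the improved bound from this abstract can be set side by side. The classical statement is the standard one for the normalized Laplacian; the constant hidden in O(k) is not given in the abstract and is left implicit here.

```latex
\begin{align*}
  \frac{\lambda_2}{2} \;\le\; \phi(G) &\;\le\; \sqrt{2\lambda_2}
    && \text{(classical Cheeger inequality)} \\
  \phi(G) &\;\le\; O(k)\,\frac{\lambda_2}{\sqrt{\lambda_k}}
    && \text{(improved bound, any } k \ge 2\text{)} \\
  \frac{\lambda_2}{2} \;\le\; \phi(G) &\;\le\; O(k)\,\lambda_2
    && \text{(when } \lambda_k \text{ is bounded below by a constant)}
\end{align*}
```

Setting k = 2 in the second line recovers the classical upper bound up to a constant factor, since \lambda_2 / \sqrt{\lambda_2} = \sqrt{\lambda_2}. The third line is what makes spectral partitioning a constant-factor approximation for constant k: the upper and lower bounds on \phi(G) then differ only by a factor of O(k).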
Simultaneous Clustering of Multiple Gene Expression and Physical Interaction Datasets
Many genome-wide datasets are routinely generated to study different aspects of biological systems, but integrating them to obtain a coherent view of the underlying biology remains a challenge. We propose simultaneous clustering of multiple networks as a framework to integrate large-scale datasets on the interactions among and activities of cellular components. Specifically, we develop an algorithm, JointCluster, that finds sets of genes that cluster well in multiple networks of interest, such as coexpression networks summarizing correlations among the expression profiles of genes and physical networks describing protein-protein and protein-DNA interactions among genes or gene-products. Our algorithm provides an efficient solution to a well-defined problem of jointly clustering networks, using techniques that permit certain theoretical guarantees on the quality of the detected clustering relative to the optimal clustering. These guarantees, coupled with an effective scaling heuristic and the flexibility to handle multiple heterogeneous networks, make our method JointCluster an advance over earlier approaches. Simulation results showed JointCluster to be more robust than alternate methods in recovering clusters implanted in networks with high false positive rates. In systematic evaluation of JointCluster and some earlier approaches for combined analysis of the yeast physical network and two gene expression datasets under glucose and ethanol growth conditions, JointCluster discovers clusters that are more consistently enriched for various reference classes capturing different aspects of yeast biology or yield better coverage of the analysed genes. These robust clusters, which are supported across multiple genomic datasets and diverse reference classes, agree with known biology of yeast under these growth conditions, elucidate the genetic control of coordinated transcription, and enable functional predictions for a number of uncharacterized genes.