410 research outputs found

    Unifying Sparsest Cut, Cluster Deletion, and Modularity Clustering Objectives with Correlation Clustering

    Graph clustering, or community detection, is the task of identifying groups of closely related objects in a large network. In this paper we introduce a new community-detection framework called LambdaCC that is based on a specially weighted version of correlation clustering. A key component in our methodology is a clustering resolution parameter, $\lambda$, which implicitly controls the size and structure of clusters formed by our framework. We show that, by increasing this parameter, our objective effectively interpolates between two different strategies in graph clustering: finding a sparse cut and forming dense subgraphs. Our methodology unifies and generalizes a number of other important clustering quality functions including modularity, sparsest cut, and cluster deletion, and places them all within the context of an optimization problem that has been well studied from the perspective of approximation algorithms. Our approach is particularly relevant in the regime of finding dense clusters, as it leads to a 2-approximation for the cluster deletion problem. We use our approach to cluster several graphs, including large collaboration networks and social networks.
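To make the role of the resolution parameter concrete, the following is a minimal sketch of one common way to write such a correlation-clustering objective on an unweighted graph. It assumes (this is not stated in the abstract) that an edge split across clusters costs `1 - lam` and a non-adjacent pair placed in the same cluster costs `lam`. The function name `lambdacc_cost` and the toy graph are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def lambdacc_cost(n, edges, labels, lam):
    """Disagreement cost of a clustering under an assumed LambdaCC-style weighting.

    Assumption (not from the abstract): a cut edge costs (1 - lam) and a
    non-adjacent pair inside one cluster costs lam.
    """
    edge_set = {frozenset(e) for e in edges}
    cost = 0.0
    for u, v in combinations(range(n), 2):
        together = labels[u] == labels[v]
        if frozenset((u, v)) in edge_set:
            if not together:
                cost += 1.0 - lam   # edge separated by the clustering
        elif together:
            cost += lam             # non-edge kept inside a cluster
    return cost

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the bridge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
one_cluster = [0, 0, 0, 0, 0, 0]
two_triangles = [0, 0, 0, 1, 1, 1]

for lam in (0.05, 0.5):
    print(lam,
          lambdacc_cost(6, edges, one_cluster, lam),
          lambdacc_cost(6, edges, two_triangles, lam))
```

On this toy graph the single large cluster is cheaper at `lam = 0.05`, while the two dense triangles are cheaper at `lam = 0.5`, which mirrors the interpolation between sparse cuts and dense subgraphs that the abstract describes.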

    Bilu-Linial Stable Instances of Max Cut and Minimum Multiway Cut

    We investigate the notion of stability proposed by Bilu and Linial. We obtain an exact polynomial-time algorithm for $\gamma$-stable Max Cut instances with $\gamma \geq c\sqrt{\log n}\log\log n$ for some absolute constant $c > 0$. Our algorithm is robust: it never returns an incorrect answer; if the instance is $\gamma$-stable, it finds the maximum cut, otherwise, it either finds the maximum cut or certifies that the instance is not $\gamma$-stable. We prove that there is no robust polynomial-time algorithm for $\gamma$-stable instances of Max Cut when $\gamma < \alpha_{SC}(n/2)$, where $\alpha_{SC}$ is the best approximation factor for Sparsest Cut with non-uniform demands. Our algorithm is based on semidefinite programming. We show that the standard SDP relaxation for Max Cut (with $\ell_2^2$ triangle inequalities) is integral if $\gamma \geq D_{\ell_2^2\to \ell_1}(n)$, where $D_{\ell_2^2\to \ell_1}(n)$ is the least distortion with which every $n$-point metric space of negative type embeds into $\ell_1$. On the negative side, we show that the SDP relaxation is not integral when $\gamma < D_{\ell_2^2\to \ell_1}(n/2)$. Moreover, there is no tractable convex relaxation for $\gamma$-stable instances of Max Cut when $\gamma < \alpha_{SC}(n/2)$. That suggests that solving $\gamma$-stable instances with $\gamma = o(\sqrt{\log n})$ might be difficult or impossible. Our results significantly improve previously known results. The best previously known algorithm for $\gamma$-stable instances of Max Cut required that $\gamma \geq c\sqrt{n}$ (for some $c > 0$) [Bilu, Daniely, Linial, and Saks]. No hardness results were known for the problem. Additionally, we present an algorithm for 4-stable instances of Minimum Multiway Cut. We also study a relaxed notion of weak stability. Comment: 24 pages.
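As a rough illustration of what stability means for Max Cut, the sketch below brute-forces a tiny weighted instance. It relies on an assumption that is not spelled out in the abstract: that an instance is $\gamma$-stable exactly when, for every cut $T$ other than the optimal cut $S^*$, the weight of edges cut by $S^*$ but not by $T$ exceeds $\gamma$ times the weight of edges cut by $T$ but not by $S^*$. The helper names and the example weights are hypothetical. The code assumes a connected graph with positive weights, and it is exponential in the number of vertices, so it is only meant for toy inputs.

```python
from itertools import product

def cut_edges(weights, side):
    """Edges whose endpoints lie on different sides of the partition."""
    return {e for e in weights if side[e[0]] != side[e[1]]}

def stability_threshold(n, weights):
    """Return (optimal cut edges, threshold); under the assumed
    characterization the instance is gamma-stable iff gamma < threshold."""
    def value(es):
        return sum(weights[e] for e in es)

    # Fix vertex 0 on side 0 so each cut is listed once; skip the empty cut.
    cuts = []
    for rest in product((0, 1), repeat=n - 1):
        side = (0,) + rest
        if any(side):
            cuts.append(cut_edges(weights, side))

    best = max(cuts, key=value)
    threshold = float("inf")
    for other in cuts:
        if other is best:
            continue
        gained = value(other - best)  # weight cut only by the competing cut
        lost = value(best - other)    # weight cut only by the optimal cut
        if gained > 0:
            threshold = min(threshold, lost / gained)
    return best, threshold

# Hypothetical weighted triangle.
weights = {(0, 1): 5.0, (0, 2): 4.0, (1, 2): 1.0}
best, threshold = stability_threshold(3, weights)
print(best, threshold)
```

For this triangle the optimal cut isolates vertex 0, and the instance is $\gamma$-stable only for $\gamma$ below the returned threshold.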

    Certified Algorithms: Worst-Case Analysis and Beyond

    In this paper, we introduce the notion of a certified algorithm. Certified algorithms provide worst-case and beyond-worst-case performance guarantees. First, a $\gamma$-certified algorithm is also a $\gamma$-approximation algorithm: it finds a $\gamma$-approximation no matter what the input is. Second, it exactly solves $\gamma$-perturbation-resilient instances ($\gamma$-perturbation-resilient instances model real-life instances). Additionally, certified algorithms have a number of other desirable properties: they solve both maximization and minimization versions of a problem (e.g. Max Cut and Min Uncut), solve weakly perturbation-resilient instances, and solve optimization problems with hard constraints. In the paper, we define certified algorithms, describe their properties, present a framework for designing certified algorithms, and provide examples of certified algorithms for Max Cut/Min Uncut, Minimum Multiway Cut, k-medians and k-means. We also present some negative results.

    Improved Cheeger's Inequality: Analysis of Spectral Partitioning Algorithms through Higher Order Spectral Gap

    Let $\phi(G)$ be the minimum conductance of an undirected graph $G$, and let $0 = \lambda_1 \leq \lambda_2 \leq \dots \leq \lambda_n \leq 2$ be the eigenvalues of the normalized Laplacian matrix of $G$. We prove that for any graph $G$ and any $k \geq 2$, $\phi(G) = O(k) \lambda_2 / \sqrt{\lambda_k}$, and this performance guarantee is achieved by the spectral partitioning algorithm. This improves Cheeger's inequality, and the bound is optimal up to a constant factor for any $k$. Our result shows that the spectral partitioning algorithm is a constant factor approximation algorithm for finding a sparse cut if $\lambda_k$ is a constant for some constant $k$. This provides some theoretical justification to its empirical performance in image segmentation and clustering problems. We extend the analysis to other graph partitioning problems, including multi-way partition, balanced separator, and maximum cut.
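The spectral partitioning algorithm mentioned in this abstract is usually described as a sweep over the second eigenvector of the normalized Laplacian. The following is a minimal pure-Python sketch of that idea, not the paper's own code. It assumes a connected, unweighted graph without self-loops, given as adjacency lists, and it approximates the eigenvector with plain power iteration from an arbitrary start vector. The function name `sweep_cut` and the example graph are hypothetical.

```python
import math

def sweep_cut(adj, iters=300):
    """Approximate the second eigenvector of the normalized Laplacian by
    power iteration, then return (best sweep set, its conductance, lambda_2)."""
    n = len(adj)
    deg = [len(nbrs) for nbrs in adj]
    vol = sum(deg)
    # M = I + D^{-1/2} A D^{-1/2} has eigenvalues 2 - lambda_i >= 0;
    # its top eigenvector is proportional to sqrt(deg).
    top = [math.sqrt(d / vol) for d in deg]

    def project_and_normalize(x):
        dot = sum(xi * ti for xi, ti in zip(x, top))
        x = [xi - dot * ti for xi, ti in zip(x, top)]
        norm = math.sqrt(sum(xi * xi for xi in x))
        return [xi / norm for xi in x]

    def apply_m(x):
        return [x[u] + sum(x[v] / math.sqrt(deg[u] * deg[v]) for v in adj[u])
                for u in range(n)]

    # Arbitrary start vector; assumed not orthogonal to the target eigenvector.
    x = project_and_normalize([float(i) for i in range(n)])
    for _ in range(iters):
        x = project_and_normalize(apply_m(x))
    lam2 = 2.0 - sum(a * b for a, b in zip(x, apply_m(x)))

    # Sweep: order vertices by x_u / sqrt(deg_u) and test every prefix set.
    order = sorted(range(n), key=lambda u: x[u] / math.sqrt(deg[u]))
    in_s = [False] * n
    cut, vol_s = 0, 0
    best_phi, best_k = float("inf"), 0
    for k, u in enumerate(order[:-1], start=1):
        in_s[u] = True
        vol_s += deg[u]
        for v in adj[u]:
            cut += -1 if in_s[v] else 1  # edge becomes internal or newly cut
        phi = cut / min(vol_s, vol - vol_s)
        if phi < best_phi:
            best_phi, best_k = phi, k
    return set(order[:best_k]), best_phi, lam2

# Two triangles {0,1,2} and {3,4,5} joined by the bridge 2-3.
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
side, phi, lam2 = sweep_cut(adj)
print(side, phi, lam2)
```

On this small graph the sweep separates the two triangles, and the printed values can be compared against the classical Cheeger bounds $\lambda_2 / 2 \leq \phi(G) \leq \sqrt{2\lambda_2}$ that the abstract's result refines.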

    Simultaneous Clustering of Multiple Gene Expression and Physical Interaction Datasets

    Many genome-wide datasets are routinely generated to study different aspects of biological systems, but integrating them to obtain a coherent view of the underlying biology remains a challenge. We propose simultaneous clustering of multiple networks as a framework to integrate large-scale datasets on the interactions among and activities of cellular components. Specifically, we develop an algorithm, JointCluster, that finds sets of genes that cluster well in multiple networks of interest, such as coexpression networks summarizing correlations among the expression profiles of genes and physical networks describing protein-protein and protein-DNA interactions among genes or gene products. Our algorithm provides an efficient solution to a well-defined problem of jointly clustering networks, using techniques that permit certain theoretical guarantees on the quality of the detected clustering relative to the optimal clustering. These guarantees, coupled with an effective scaling heuristic and the flexibility to handle multiple heterogeneous networks, make our method JointCluster an advance over earlier approaches. Simulation results showed JointCluster to be more robust than alternative methods in recovering clusters implanted in networks with high false positive rates. In a systematic evaluation of JointCluster and some earlier approaches for combined analysis of the yeast physical network and two gene expression datasets under glucose and ethanol growth conditions, JointCluster discovers clusters that are more consistently enriched for various reference classes capturing different aspects of yeast biology or that yield better coverage of the analysed genes. These robust clusters, which are supported across multiple genomic datasets and diverse reference classes, agree with known biology of yeast under these growth conditions, elucidate the genetic control of coordinated transcription, and enable functional predictions for a number of uncharacterized genes.