Unifying Sparsest Cut, Cluster Deletion, and Modularity Clustering Objectives with Correlation Clustering

Author: Gleich David
Veldt Nate
Wirth Anthony
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2018

Graph clustering, or community detection, is the task of identifying groups of closely related objects in a large network. In this paper we introduce a new community-detection framework called LambdaCC that is based on a specially weighted version of correlation clustering. A key component in our methodology is a clustering resolution parameter, $\lambda$, which implicitly controls the size and structure of clusters formed by our framework. We show that, by increasing this parameter, our objective effectively interpolates between two different strategies in graph clustering: finding a sparse cut and forming dense subgraphs. Our methodology unifies and generalizes a number of other important clustering quality functions including modularity, sparsest cut, and cluster deletion, and places them all within the context of an optimization problem that has been well studied from the perspective of approximation algorithms. Our approach is particularly relevant in the regime of finding dense clusters, as it leads to a 2-approximation for the cluster deletion problem. We use our approach to cluster several graphs, including large collaboration networks and social networks.
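As a rough illustration of how a resolution parameter can interpolate between sparse cuts and dense clusters, the sketch below scores a clustering under a LambdaCC-style disagreement objective. It assumes the unweighted variant, in which cutting an edge costs 1 - λ and placing a non-adjacent pair in the same cluster costs λ; the function name `lambda_cc_cost` and the toy graph are hypothetical and not taken from the paper.

```python
from itertools import combinations


def lambda_cc_cost(nodes, edges, clusters, lam):
    """Disagreement cost of `clusters` under an assumed LambdaCC-style objective.

    Assumption (not stated in the abstract): every edge is a positive pair of
    weight 1 - lam, every non-edge is a negative pair of weight lam.
    """
    edge_set = {frozenset(e) for e in edges}
    label = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    cost = 0.0
    for u, v in combinations(nodes, 2):
        same_cluster = label[u] == label[v]
        is_edge = frozenset((u, v)) in edge_set
        if is_edge and not same_cluster:
            cost += 1.0 - lam  # an edge was cut
        elif not is_edge and same_cluster:
            cost += lam  # a non-adjacent pair was grouped together
    return cost


# Toy graph: two triangles joined by the single edge (2, 3).
nodes = list(range(6))
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
one_cluster = [set(nodes)]
two_triangles = [{0, 1, 2}, {3, 4, 5}]

for lam in (0.05, 0.5):
    print(lam,
          lambda_cc_cost(nodes, edges, one_cluster, lam),
          lambda_cc_cost(nodes, edges, two_triangles, lam))
```

On this toy graph the single cluster costs 8λ and the two-triangle split costs 1 - λ, so the split becomes cheaper once λ exceeds 1/9, which matches the qualitative behaviour the abstract describes.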

arXiv.org e-Print Archive

University of Melbourne Institutional Repository

Bilu-Linial Stable Instances of Max Cut and Minimum Multiway Cut

Author: Makarychev Konstantin
Makarychev Yury
Vijayaraghavan Aravindan
Publication date: 01/01/2013

We investigate the notion of stability proposed by Bilu and Linial. We obtain an exact polynomial-time algorithm for $\gamma$-stable Max Cut instances with $\gamma \geq c\sqrt{\log n}\log\log n$ for some absolute constant $c > 0$. Our algorithm is robust: it never returns an incorrect answer; if the instance is $\gamma$-stable, it finds the maximum cut, otherwise, it either finds the maximum cut or certifies that the instance is not $\gamma$-stable. We prove that there is no robust polynomial-time algorithm for $\gamma$-stable instances of Max Cut when $\gamma < \alpha_{SC}(n/2)$, where $\alpha_{SC}$ is the best approximation factor for Sparsest Cut with non-uniform demands. Our algorithm is based on semidefinite programming. We show that the standard SDP relaxation for Max Cut (with $\ell_2^2$ triangle inequalities) is integral if $\gamma \geq D_{\ell_2^2\to \ell_1}(n)$, where $D_{\ell_2^2\to \ell_1}(n)$ is the least distortion with which every $n$ point metric space of negative type embeds into $\ell_1$. On the negative side, we show that the SDP relaxation is not integral when $\gamma < D_{\ell_2^2\to \ell_1}(n/2)$. Moreover, there is no tractable convex relaxation for $\gamma$-stable instances of Max Cut when $\gamma < \alpha_{SC}(n/2)$. That suggests that solving $\gamma$-stable instances with $\gamma = o(\sqrt{\log n})$ might be difficult or impossible. Our results significantly improve previously known results. The best previously known algorithm for $\gamma$-stable instances of Max Cut required that $\gamma \geq c\sqrt{n}$ (for some $c > 0$) [Bilu, Daniely, Linial, and Saks]. No hardness results were known for the problem. Additionally, we present an algorithm for 4-stable instances of Minimum Multiway Cut. We also study a relaxed notion of weak stability. Comment: 24 pages
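To make the stability notion concrete, here is a minimal brute-force sketch that tests whether a tiny weighted Max Cut instance is γ-stable. It assumes the usual Bilu-Linial characterization: the optimal cut must beat every other cut even after the edges that only the other cut uses are scaled up by γ. The function name `is_gamma_stable` is hypothetical, the graph is assumed connected with positive weights, and the enumeration is exponential, so this is unrelated to the SDP-based algorithm in the abstract.

```python
from itertools import product


def is_gamma_stable(n, weights, gamma):
    """Brute-force gamma-stability test for Max Cut on vertices 0..n-1.

    weights: dict mapping an edge (u, v) to a positive weight.
    Assumed criterion: for every cut C' other than the optimum C,
    w(C \\ C') > gamma * w(C' \\ C).
    """
    cuts = set()
    # Fix vertex 0 on side 0 so each bipartition is enumerated once.
    for bits in product((0, 1), repeat=n - 1):
        side = (0,) + bits
        cuts.add(frozenset(e for e in weights if side[e[0]] != side[e[1]]))

    def value(cut):
        return sum(weights[e] for e in cut)

    best = max(cuts, key=value)
    for other in cuts:
        if other == best:
            continue
        lost = value(best - other)    # weight only the optimum cuts
        gained = value(other - best)  # weight only the competitor cuts
        if lost <= gamma * gained:
            return False
    return True


# Triangle with weights 3, 3, 1: the optimum isolates vertex 1 (value 6).
triangle = {(0, 1): 3.0, (1, 2): 3.0, (0, 2): 1.0}
print(is_gamma_stable(3, triangle, 2.0))
print(is_gamma_stable(3, triangle, 3.0))
```

In this triangle the closest competitor loses an edge of weight 3 and gains an edge of weight 1, so under the assumed criterion the instance is stable exactly when γ < 3.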

arXiv.org e-Print Archive

CiteSeerX

Crossref

Certified Algorithms: Worst-Case Analysis and Beyond

Author: Makarychev Konstantin
Makarychev Yury
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 11th Innovations in Theoretical Computer Science Conference (ITCS 2020)
Publication date: 01/01/2020

In this paper, we introduce the notion of a certified algorithm. Certified algorithms provide worst-case and beyond-worst-case performance guarantees. First, a $\gamma$-certified algorithm is also a $\gamma$-approximation algorithm: it finds a $\gamma$-approximation no matter what the input is. Second, it exactly solves $\gamma$-perturbation-resilient instances ($\gamma$-perturbation-resilient instances model real-life instances). Additionally, certified algorithms have a number of other desirable properties: they solve both maximization and minimization versions of a problem (e.g. Max Cut and Min Uncut), solve weakly perturbation-resilient instances, and solve optimization problems with hard constraints. In the paper, we define certified algorithms, describe their properties, present a framework for designing certified algorithms, provide examples of certified algorithms for Max Cut/Min Uncut, Minimum Multiway Cut, k-medians and k-means. We also present some negative results.

Dagstuhl Research Online Publication Server

Improved Cheeger's Inequality: Analysis of Spectral Partitioning Algorithms through Higher Order Spectral Gap

Author: Gharan Shayan Oveis
Kwok Tsz Chiu
Lau Lap Chi
Lee Yin Tat
Trevisan Luca
Publication date: 01/01/2013

Let $\phi(G)$ be the minimum conductance of an undirected graph $G$, and let $0 = \lambda_1 \leq \lambda_2 \leq \dots \leq \lambda_n \leq 2$ be the eigenvalues of the normalized Laplacian matrix of $G$. We prove that for any graph $G$ and any $k \geq 2$, $\phi(G) = O(k)\,\lambda_2 / \sqrt{\lambda_k}$, and this performance guarantee is achieved by the spectral partitioning algorithm. This improves Cheeger's inequality, and the bound is optimal up to a constant factor for any $k$. Our result shows that the spectral partitioning algorithm is a constant factor approximation algorithm for finding a sparse cut if $\lambda_k$ is a constant for some constant $k$. This provides some theoretical justification to its empirical performance in image segmentation and clustering problems. We extend the analysis to other graph partitioning problems, including multi-way partition, balanced separator, and maximum cut.
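The spectral partitioning algorithm referred to here is commonly implemented as a sweep cut over the second eigenvector of the normalized Laplacian. The sketch below is one such textbook version, assuming a dense symmetric adjacency matrix with no isolated vertices; the function name `spectral_sweep_cut` is hypothetical and the code is not taken from the paper.

```python
import numpy as np


def spectral_sweep_cut(adjacency):
    """Return (conductance, vertex set, eigenvalues) of the best sweep cut.

    Assumes a symmetric, non-negative adjacency matrix with no isolated
    vertices, so every degree is positive.
    """
    A = np.asarray(adjacency, dtype=float)
    n = len(A)
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Normalized Laplacian: I - D^{-1/2} A D^{-1/2}.
    laplacian = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(laplacian)  # ascending eigenvalues
    # Order vertices by the degree-rescaled second eigenvector.
    order = np.argsort(d_inv_sqrt * eigvecs[:, 1])

    total_vol = deg.sum()
    in_set = np.zeros(n, dtype=bool)
    vol = 0.0
    best_phi, best_set = np.inf, None
    for v in order[:-1]:  # skip the full vertex set
        in_set[v] = True
        vol += deg[v]
        cut = A[in_set][:, ~in_set].sum()
        phi = cut / min(vol, total_vol - vol)
        if phi < best_phi:
            best_phi = phi
            best_set = set(np.flatnonzero(in_set).tolist())
    return best_phi, best_set, eigvals


# Two triangles joined by the edge (2, 3).
A = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
phi, side, eigvals = spectral_sweep_cut(A)
print(phi, side, eigvals[1])
```

On the toy graph the sweep separates the two triangles, cutting one edge against a volume of 7 on each side. Comparing the returned conductance with `eigvals[1]` and later eigenvalues gives a feel for the quantities that appear in the bound.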

arXiv.org e-Print Archive

CiteSeerX

Crossref

Archivio istituzionale della Ricerca - Bocconi

Simultaneous Clustering of Multiple Gene Expression and Physical Interaction Datasets

Author: Manikandan Narayanan
Adrian Vetta
Eric E. Schadt
Jun Zhu
Publication venue: Public Library of Science
Publication date: 01/04/2010

Many genome-wide datasets are routinely generated to study different aspects of biological systems, but integrating them to obtain a coherent view of the underlying biology remains a challenge. We propose simultaneous clustering of multiple networks as a framework to integrate large-scale datasets on the interactions among and activities of cellular components. Specifically, we develop an algorithm JointCluster that finds sets of genes that cluster well in multiple networks of interest, such as coexpression networks summarizing correlations among the expression profiles of genes and physical networks describing protein-protein and protein-DNA interactions among genes or gene-products. Our algorithm provides an efficient solution to a well-defined problem of jointly clustering networks, using techniques that permit certain theoretical guarantees on the quality of the detected clustering relative to the optimal clustering. These guarantees coupled with an effective scaling heuristic and the flexibility to handle multiple heterogeneous networks make our method JointCluster an advance over earlier approaches. Simulation results showed JointCluster to be more robust than alternate methods in recovering clusters implanted in networks with high false positive rates. In systematic evaluation of JointCluster and some earlier approaches for combined analysis of the yeast physical network and two gene expression datasets under glucose and ethanol growth conditions, JointCluster discovers clusters that are more consistently enriched for various reference classes capturing different aspects of yeast biology or yield better coverage of the analysed genes. These robust clusters, which are supported across multiple genomic datasets and diverse reference classes, agree with known biology of yeast under these growth conditions, elucidate the genetic control of coordinated transcription, and enable functional predictions for a number of uncharacterized genes

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central