
    Bad Communities with High Modularity

    In this paper we discuss some problematic aspects of Newman's modularity function Q_N. Given a graph G, the modularity of G can be written as Q_N = Q_f - Q_0, where Q_f is the intracluster edge fraction of G and Q_0 is the expected intracluster edge fraction of the null model, i.e., a randomly connected graph with the same expected degree distribution as G. It follows that the maximization of Q_N must accommodate two factors pulling in opposite directions: Q_f favors a small number of clusters and Q_0 favors many balanced clusters (i.e., clusters with approximately equal total degrees). In certain cases the Q_0 term can cause overestimation of the true cluster number; this is the opposite of the well-known underestimation effect caused by the "resolution limit" of modularity. We illustrate the overestimation effect by constructing families of graphs with a "natural" community structure which, however, does not maximize modularity. In fact, we prove that we can always find a graph G with a "natural clustering" V of G and another, balanced clustering U of G such that (i) the pair (G, U) has higher modularity than (G, V) and (ii) V and U are arbitrarily different. (Comment: Significantly improved version of the paper, with the help of L. Pitsoulis.)
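
    As a concrete reading of the decomposition above, the short sketch below computes Q_f and Q_0 separately for an arbitrary two-way split of a small example graph and checks that their difference matches the modularity reported by networkx; the graph and the clustering are illustrative choices, not the constructions from the paper.

```python
# Minimal sketch of Q_N = Q_f - Q_0 for a partition of an example graph.
import networkx as nx

def modularity_terms(G, clusters):
    """Return (Q_f, Q_0) for a partition given as a list of node sets."""
    m = G.number_of_edges()
    # Q_f: observed fraction of edges that fall inside clusters
    q_f = sum(G.subgraph(c).number_of_edges() for c in clusters) / m
    # Q_0: expected intracluster edge fraction under the configuration-model null
    q_0 = sum((sum(d for _, d in G.degree(c)) / (2 * m)) ** 2 for c in clusters)
    return q_f, q_0

G = nx.karate_club_graph()
clusters = [set(range(0, 17)), set(range(17, 34))]      # arbitrary two-way split
q_f, q_0 = modularity_terms(G, clusters)
print(q_f - q_0)                                        # Q_N = Q_f - Q_0
print(nx.algorithms.community.modularity(G, clusters))  # same value from networkx
```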

    Multi-level algorithms for modularity clustering

    Modularity is one of the most widely used quality measures for graph clusterings. Maximizing modularity is NP-hard, and the runtime of exact algorithms is prohibitive for large graphs. A simple and effective class of heuristics coarsens the graph by iteratively merging clusters (starting from singletons), and optionally refines the resulting clustering by iteratively moving individual vertices between clusters. Several heuristics of this type have been proposed in the literature, but little is known about their relative performance. This paper experimentally compares existing and new coarsening- and refinement-based heuristics with respect to their effectiveness (achieved modularity) and efficiency (runtime). Concerning coarsening, it turns out that the most widely used criterion for merging clusters (modularity increase) is outperformed by other simple criteria, and that a recent algorithm by Schuetz and Caflisch is no improvement over simple greedy coarsening for these criteria. Concerning refinement, a new multi-level algorithm is shown to produce significantly better clusterings than conventional single-level algorithms. A comparison with published benchmark results and algorithm implementations shows that combinations of coarsening and multi-level refinement are competitive with the best algorithms in the literature. (Comment: 12 pages, 10 figures; see http://www.informatik.tu-cottbus.de/~rrotta/ for downloading the graph clustering software.)
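
    To make the two ingredients concrete, the sketch below runs a greedy coarsening phase (networkx's CNM-style merging by modularity increase) followed by a single refinement pass that moves individual vertices whenever modularity improves. It is a simplified, single-level illustration of the coarsening/refinement split studied in the paper, not the multi-level algorithm itself; the example graph is arbitrary.

```python
# Simplified, single-level illustration: greedy coarsening followed by one
# vertex-moving refinement pass (not the paper's multi-level algorithm).
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

def refine_once(G, clusters):
    clusters = [set(c) for c in clusters]
    best = modularity(G, clusters)
    for v in G:
        src = next(i for i, c in enumerate(clusters) if v in c)
        for dst in range(len(clusters)):
            if dst == src or len(clusters[src]) == 1:
                continue
            clusters[src].remove(v)
            clusters[dst].add(v)
            q = modularity(G, clusters)
            if q > best:
                best, src = q, dst            # keep the improving move
            else:
                clusters[dst].remove(v)       # undo the move
                clusters[src].add(v)
    return clusters, best

G = nx.karate_club_graph()
coarse = greedy_modularity_communities(G)     # coarsening phase (cluster merging)
refined, q = refine_once(G, coarse)           # refinement phase (vertex moving)
print(modularity(G, coarse), q)
```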

    Distributed Graph Clustering using Modularity and Map Equation

    We study large-scale, distributed graph clustering. Given an undirected graph, our objective is to partition the nodes into disjoint sets called clusters. A cluster should contain many internal edges while being sparsely connected to other clusters. In the context of a social network, a cluster could be a group of friends. Modularity and map equation are established formalizations of this internally-dense-externally-sparse principle. We present two versions of a simple distributed algorithm to optimize both measures. They are based on Thrill, a distributed big data processing framework that implements an extended MapReduce model. The algorithms for the two measures, DSLM-Mod and DSLM-Map, differ only slightly, and adapting them to similar quality measures is straightforward. We conduct an extensive experimental study on real-world graphs and on synthetic benchmark graphs with up to 68 billion edges. Our algorithms are fast while detecting clusterings similar to those detected by other sequential, parallel and distributed clustering algorithms. Compared to the distributed GossipMap algorithm, DSLM-Map needs less memory, is up to an order of magnitude faster, and achieves better quality. (Comment: 14 pages, 3 figures; v3: camera ready for Euro-Par 2018, more details, more results; v2: extended experiments to include comparison with competing algorithms, shortened for submission to Euro-Par 2018.)
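
    The sketch below imitates one synchronous local-moving round of the kind described above, written with plain Python dictionaries instead of Thrill primitives: every node gathers the edge weight to each neighbouring cluster and independently picks the cluster with the best Louvain-style modularity gain. The gain formula and the fixed number of rounds are illustrative assumptions, not the paper's exact implementation.

```python
# Toy, non-distributed imitation of synchronous local moving (see note above).
import networkx as nx
from collections import defaultdict

def synchronous_round(G, cluster_of):
    m2 = 2 * G.number_of_edges()
    vol = defaultdict(float)                       # total degree per cluster
    for v in G:
        vol[cluster_of[v]] += G.degree(v)
    new_assignment = {}
    for v in G:                                    # conceptually: in parallel
        links = defaultdict(float)                 # edge weight to each neighbouring cluster
        for u in G[v]:
            links[cluster_of[u]] += 1.0
        def gain(c):
            other = vol[c] - (G.degree(v) if cluster_of[v] == c else 0)
            return links[c] - other * G.degree(v) / m2
        best = max(links, key=gain, default=cluster_of[v])
        new_assignment[v] = best if gain(best) > gain(cluster_of[v]) else cluster_of[v]
    return new_assignment

G = nx.karate_club_graph()
assignment = {v: v for v in G}                     # start from singleton clusters
for _ in range(5):                                 # a few synchronous rounds
    assignment = synchronous_round(G, assignment)
print(len(set(assignment.values())), "clusters")
```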

    Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale

    Notions of community quality underlie network clustering. While studies surrounding network clustering are increasingly common, the precise relationship between different cluster quality metrics remains poorly understood. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely used network clustering algorithms -- Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets with sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the best-performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it absolutely superior. Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.
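
    The sketch below computes both families of metrics for one clustering of a small graph: the stand-alone metrics modularity, coverage, and worst-case per-cluster conductance from the graph alone, and the information recovery metrics adjusted Rand score and normalized mutual information against a ground-truth labelling. The metric definitions follow common networkx and scikit-learn conventions, which may differ in detail from the exact variants used in the study.

```python
# Stand-alone quality metrics vs. information recovery metrics on one example.
import networkx as nx
from networkx.algorithms.community import louvain_communities, modularity
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

G = nx.karate_club_graph()
clusters = louvain_communities(G, seed=0)

# stand-alone quality metrics computed from the graph alone
q = modularity(G, clusters)
coverage = sum(G.subgraph(c).number_of_edges() for c in clusters) / G.number_of_edges()
worst_conductance = max(nx.conductance(G, c) for c in clusters)

# information recovery metrics computed against a ground-truth labelling
truth = [G.nodes[v]["club"] for v in G]                            # the two karate factions
pred = [next(i for i, c in enumerate(clusters) if v in c) for v in G]

print("modularity", q, "coverage", coverage, "worst conductance", worst_conductance)
print("ARI", adjusted_rand_score(truth, pred), "NMI", normalized_mutual_info_score(truth, pred))
```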

    Axioms for graph clustering quality functions

    We investigate properties that intuitively ought to be satisfied by graph clustering quality functions, that is, functions that assign a score to a clustering of a graph. Graph clustering, also known as network community detection, is often performed by optimizing such a function. Two axioms tailored for graph clustering quality functions are introduced, and the four axioms introduced in previous work on distance-based clustering are reformulated and generalized for the graph setting. We show that modularity, a standard quality function for graph clustering, does not satisfy all of these six properties. This motivates the derivation of a new family of quality functions, adaptive scale modularity, which does satisfy the proposed axioms. Adaptive scale modularity has two parameters, which give greater flexibility in the kinds of clusterings that can be found. Standard graph clustering quality functions, such as normalized cut and unnormalized cut, are obtained as special cases of adaptive scale modularity. In general, the results of our investigation indicate that the considered axiomatic framework covers existing "good" quality functions for graph clustering, and can be used to derive an interesting new family of quality functions. (Comment: 23 pages. Full text and sources available at: http://www.cs.ru.nl/~T.vanLaarhoven/graph-clustering-axioms-2014)
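
    As a small illustration of what such axioms mean operationally, the sketch below spot-checks permutation invariance (relabelling the nodes must not change the score) for plain modularity on one example. Permutation invariance is used here only as a representative intuitive property; the sketch is not the paper's formal axiom system.

```python
# Spot-checking permutation invariance for plain modularity (illustration only).
import random
import networkx as nx
from networkx.algorithms.community import modularity

def permuted_score(G, clusters, seed=0):
    """Modularity after randomly relabelling the nodes (and the clustering)."""
    rng = random.Random(seed)
    nodes = list(G)
    relabel = dict(zip(nodes, rng.sample(nodes, len(nodes))))
    H = nx.relabel_nodes(G, relabel)
    permuted = [{relabel[v] for v in c} for c in clusters]
    return modularity(H, permuted)

G = nx.karate_club_graph()
clusters = [set(range(0, 17)), set(range(17, 34))]           # arbitrary two-way split
print(modularity(G, clusters), permuted_score(G, clusters))  # equal up to float error
```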

    Unifying Sparsest Cut, Cluster Deletion, and Modularity Clustering Objectives with Correlation Clustering

    Graph clustering, or community detection, is the task of identifying groups of closely related objects in a large network. In this paper we introduce a new community-detection framework called LambdaCC that is based on a specially weighted version of correlation clustering. A key component in our methodology is a clustering resolution parameter, λ, which implicitly controls the size and structure of clusters formed by our framework. We show that, by increasing this parameter, our objective effectively interpolates between two different strategies in graph clustering: finding a sparse cut and forming dense subgraphs. Our methodology unifies and generalizes a number of other important clustering quality functions, including modularity, sparsest cut, and cluster deletion, and places them all within the context of an optimization problem that has been well studied from the perspective of approximation algorithms. Our approach is particularly relevant in the regime of finding dense clusters, as it leads to a 2-approximation for the cluster deletion problem. We use our approach to cluster several graphs, including large collaboration networks and social networks.
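
    The sketch below evaluates a LambdaCC-style correlation clustering objective as read from the abstract: each edge cut between clusters costs 1 - λ and each non-edge kept inside a cluster costs λ, so small λ rewards sparse cuts and large λ rewards dense clusters. The exact weighting (standard vs. degree-weighted) should be checked against the paper; the graph and clustering are illustrative.

```python
# Evaluating a LambdaCC-style objective (weighting is an assumption to check
# against the paper): cut edges cost 1 - lam, internal non-edges cost lam.
from itertools import combinations
import networkx as nx

def lambda_cc_cost(G, cluster_of, lam):
    cost = 0.0
    for u, v in combinations(G, 2):
        together = cluster_of[u] == cluster_of[v]
        if G.has_edge(u, v) and not together:
            cost += 1.0 - lam            # an edge cut between clusters
        elif not G.has_edge(u, v) and together:
            cost += lam                  # a missing edge kept inside a cluster
    return cost

G = nx.karate_club_graph()
two_way = {v: 0 if v < 17 else 1 for v in G}     # arbitrary two-way split
for lam in (0.05, 0.5, 0.95):
    print(lam, lambda_cc_cost(G, two_way, lam))
```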

    Optimizing an Organized Modularity Measure for Topographic Graph Clustering: a Deterministic Annealing Approach

    This paper proposes an organized generalization of Newman and Girvan's modularity measure for graph clustering. Optimized via a deterministic annealing scheme, this measure produces topologically ordered graph clusterings that lead to faithful and readable graph representations based on clustering-induced graphs. Topographic graph clustering provides an alternative to more classical solutions in which a standard graph clustering method is applied to build a simpler graph that is then represented with a graph layout algorithm. A comparative study on four real-world graphs ranging from 34 to 1,133 vertices shows the benefits of the proposed approach compared with classical solutions and with self-organizing maps for graphs.
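
    The sketch below illustrates only the deterministic-annealing mechanics: soft cluster memberships are repeatedly updated from a mean-field, modularity-like affinity while the temperature is lowered, so assignments stay fuzzy at high temperature and harden as the system cools. The topographic (self-organizing-map-like) neighbourhood coupling that makes the measure "organized" is deliberately omitted, so this is a sketch of the optimization scheme, not the paper's algorithm; the cluster count and temperature schedule are arbitrary.

```python
# Deterministic-annealing mechanics only (no topographic coupling).
import numpy as np
import networkx as nx

G = nx.karate_club_graph()
A = nx.to_numpy_array(G)
k = A.sum(axis=1)
m2 = A.sum()                                       # 2m
B = A - np.outer(k, k) / m2                        # modularity matrix
K = 4                                              # number of clusters (arbitrary)

rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(K), size=len(A))         # soft memberships, rows sum to 1

for T in np.geomspace(5.0, 0.05, 80):              # slowly decreasing temperature
    logits = (B @ P) / T                           # mean-field affinity per cluster
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    Q = np.exp(logits)
    Q /= Q.sum(axis=1, keepdims=True)
    P = 0.5 * P + 0.5 * Q                          # damped update to avoid oscillation

labels = P.argmax(axis=1)
clusters = [set(int(v) for v in np.flatnonzero(labels == c))
            for c in range(K) if (labels == c).any()]
print(np.bincount(labels, minlength=K))
print(nx.algorithms.community.modularity(G, clusters))
```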

    Node-Centric Detection of Overlapping Communities in Social Networks

    We present NECTAR, a community detection algorithm that generalizes the Louvain method's local search heuristic to overlapping community structures. NECTAR chooses dynamically which objective function to optimize based on the network on which it is invoked. Our experimental evaluation on both synthetic benchmark graphs and real-world networks, based on ground-truth communities, shows that NECTAR provides excellent results compared with state-of-the-art community detection algorithms.
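
    The sketch below shows one plausible way to generalize a Louvain-style local step to overlapping membership: a node scores every neighbouring community with a modularity-like gain and keeps all communities whose score is close to the best one, so it may end up in several. The scoring function, the closeness factor beta, and the seed communities are illustrative choices, not NECTAR's exact definitions.

```python
# Hedged sketch of a Louvain-style local step generalized to overlap
# (illustrative choices, not NECTAR's exact definitions).
import networkx as nx

def overlapping_step(G, communities, beta=0.8):
    m2 = 2 * G.number_of_edges()
    vol = {i: sum(G.degree(v) for v in c) for i, c in enumerate(communities)}
    membership = {}
    for v in G:
        scores = {}
        for i, c in enumerate(communities):
            links = sum(1 for u in G[v] if u in c)         # edges from v into community c
            if links:
                scores[i] = links - vol[i] * G.degree(v) / m2
        if not scores:
            continue                                       # no neighbouring community
        best = max(scores.values())
        if best > 0:
            membership[v] = {i for i, s in scores.items() if s >= beta * best}
        else:
            membership[v] = {max(scores, key=scores.get)}
    return membership

G = nx.karate_club_graph()
seed_communities = [set(range(0, 17)), set(range(12, 34))]  # overlapping seeds
membership = overlapping_step(G, seed_communities)
print(sum(len(cs) > 1 for cs in membership.values()), "nodes kept in more than one community")
```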