Search CORE

68,506 research outputs found

Hearing the clusters in a graph: A distributed algorithm

Author: Banaszuk Andrzej
Sahai Tuhin
Speranzon Alberto
Publication venue
Publication date: 17/04/2011
Field of study

We propose a novel distributed algorithm to cluster graphs. The algorithm recovers the solution obtained from spectral clustering without the need for expensive eigenvalue/vector computations. We prove that, by propagating waves through the graph, a local fast Fourier transform yields the local component of every eigenvector of the Laplacian matrix, thus providing clustering information. For large graphs, the proposed algorithm is orders of magnitude faster than random walk based approaches. We prove the equivalence of the proposed algorithm to spectral clustering and derive convergence rates. We demonstrate the benefit of using this decentralized clustering algorithm for community detection in social graphs, accelerating distributed estimation in sensor networks and efficient computation of distributed multi-agent search strategies

arXiv.org e-Print Archive

CiteSeerX

Distributed Community Detection with the WCC Metric

Author: Dominguez-Sal David
Prat-Pèrez Arnau
Saltz Matthew
Publication venue
Publication date: 03/11/2014
Field of study

Community detection has become an extremely active area of research in recent years, with researchers proposing various new metrics and algorithms to address the problem. Recently, the Weighted Community Clustering (WCC) metric was proposed as a novel way to judge the quality of a community partitioning based on the distribution of triangles in the graph, and was demonstrated to yield superior results over other commonly used metrics like modularity. The same authors later presented a parallel algorithm for optimizing WCC on large graphs. In this paper, we propose a new distributed, vertex-centric algorithm for community detection using the WCC metric. Results are presented that demonstrate the algorithm's performance and scalability on up to 32 worker machines and real graphs of up to 1.8 billion vertices. The algorithm scales best with the largest graphs, and to our knowledge, it is the first distributed algorithm for optimizing the WCC metric.Comment: 6 pages, 6 figure

arXiv.org e-Print Archive

Crossref

A Novel Scalable Clustering Method for Distributed Networks

Author: Aridhi Sabeur
Inoubli Wissm
Maddouri Mondher
Mephu Nguifo Engelbert
Mezni Haithem
Publication venue: HAL CCSD
Publication date: 10/06/2020
Field of study

Graph clustering is one of the key techniques to understand structures that are present in networks. In addition to clusters, bridges and outliers detection is also a critical task as it plays an important role in the analysis of networks. Recently, several graph clustering methods are developed and used in multiple application domains such as biological network analysis, recommendation systems and community detection. Most of these algorithms are based on the structural clustering algorithm. Yet, this kind of algorithm is based on the structural similarity, this later requires to parse all graph ' edges in order to compute the structural similarity. However, the height needs of similarity computing make this algorithm more adequate for small graphs, without significant support to deal with large-scale networks. In this paper, we propose a novel distributed graph clustering algorithm based on structural graph clustering. The experimental results show the efficiency in terms of running time of the proposed algorithm in large networks compared to existing structural graph clustering methods

INRIA a CCSD electronic archive server

HAL Clermont Université

Distributed Graph Clustering using Modularity and Map Equation

Author: A Lancichinetti
BH Good
C Staudt
DA Bader
G Karypis
J Zeng
L Hubert
M Rosvall
MEJ Newman
S Bae
S Fortunato
S Fortunato
S Fortunato
T Kawamoto
U Brandes
Vincent D Blondel
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 07/06/2018
Field of study

We study large-scale, distributed graph clustering. Given an undirected graph, our objective is to partition the nodes into disjoint sets called clusters. A cluster should contain many internal edges while being sparsely connected to other clusters. In the context of a social network, a cluster could be a group of friends. Modularity and map equation are established formalizations of this internally-dense-externally-sparse principle. We present two versions of a simple distributed algorithm to optimize both measures. They are based on Thrill, a distributed big data processing framework that implements an extended MapReduce model. The algorithms for the two measures, DSLM-Mod and DSLM-Map, differ only slightly. Adapting them for similar quality measures is straight-forward. We conduct an extensive experimental study on real-world graphs and on synthetic benchmark graphs with up to 68 billion edges. Our algorithms are fast while detecting clusterings similar to those detected by other sequential, parallel and distributed clustering algorithms. Compared to the distributed GossipMap algorithm, DSLM-Map needs less memory, is up to an order of magnitude faster and achieves better quality.Comment: 14 pages, 3 figures; v3: Camera ready for Euro-Par 2018, more details, more results; v2: extended experiments to include comparison with competing algorithms, shortened for submission to Euro-Par 201

arXiv.org e-Print Archive

Crossref

On Efficiently Detecting Overlapping Communities over Distributed Dynamic Graphs

Author: Chen Lei
Jian Xun
Lian Xiang
Publication venue
Publication date: 18/01/2018
Field of study

Modern networks are of huge sizes as well as high dynamics, which challenges the efficiency of community detection algorithms. In this paper, we study the problem of overlapping community detection on distributed and dynamic graphs. Given a distributed, undirected and unweighted graph, the goal is to detect overlapping communities incrementally as the graph is dynamically changing. We propose an efficient algorithm, called \textit{randomized Speaker-Listener Label Propagation Algorithm} (rSLPA), based on the \textit{Speaker-Listener Label Propagation Algorithm} (SLPA) by relaxing the probability distribution of label propagation. Besides detecting high-quality communities, rSLPA can incrementally update the detected communities after a batch of edge insertion and deletion operations. To the best of our knowledge, rSLPA is the first algorithm that can incrementally capture the same communities as those obtained by applying the detection algorithm from the scratch on the updated graph. Extensive experiments are conducted on both synthetic and real-world datasets, and the results show that our algorithm can achieve high accuracy and efficiency at the same time.Comment: A short version of this paper will be published as ICDE'2018 poste

arXiv.org e-Print Archive

Crossref

A novel clustering methodology based on modularity optimisation for detecting authorship affinities in Shakespearean era plays

Author: Berretta R
Craig H
Moscato P
Naeni LM
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2016
Field of study

© 2016 Naeni et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. In this study we propose a novel, unsupervised clustering methodology for analyzing large datasets. This new, efficient methodology converts the general clustering problem into the community detection problem in graph by using the Jensen-Shannon distance, a dissimilarity measure originating in Information Theory. Moreover, we use graph theoretic concepts for the generation and analysis of proximity graphs. Our methodology is based on a newly proposed memetic algorithm (iMA-Net) for discovering clusters of data elements by maximizing the modularity function in proximity graphs of literary works. To test the effectiveness of this general methodology, we apply it to a text corpus dataset, which contains frequencies of approximately 55,114 unique words across all 168 written in the Shakespearean era (16th and 17th centuries), to analyze and detect clusters of similar plays. Experimental results and comparison with state-of-the-art clustering methods demonstrate the remarkable performance of our new method for identifying high quality clusters which reflect the commonalities in the literary style of the plays

University of Newcastle's Digital Repository

OPUS - University of Technology Sydney

Directory of Open Access Journals

PubMed Central

FigShare