Hearing the clusters in a graph: A distributed algorithm
We propose a novel distributed algorithm to cluster graphs. The algorithm
recovers the solution obtained from spectral clustering without the need for
expensive eigenvalue/vector computations. We prove that, by propagating waves
through the graph, a local fast Fourier transform yields the local component of
every eigenvector of the Laplacian matrix, thus providing clustering
information. For large graphs, the proposed algorithm is orders of magnitude
faster than random walk based approaches. We prove the equivalence of the
proposed algorithm to spectral clustering and derive convergence rates. We
demonstrate the benefit of using this decentralized clustering algorithm for
community detection in social graphs, for accelerating distributed estimation in
sensor networks, and for efficiently computing distributed multi-agent search
strategies.
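The wave idea can be illustrated with a small simulation. This is a minimal sketch under our own assumptions, not the authors' implementation. It runs the discrete wave update u(t+1) = 2u(t) - u(t-1) - c2·L·u(t) on the graph Laplacian L, starts from an impulse at one node, and lets every node take an FFT of its own recorded signal. The lowest nonzero frequency peak corresponds to the second-smallest Laplacian eigenvalue, and the sign of each node's coefficient at that peak gives the sign of its entry in the Fiedler vector, i.e. the usual spectral bipartition. The helper names (`two_cliques`, `wave_cluster`), the step size `c2` (which must satisfy c2·λ_max < 4 for stability), the number of steps and the peak threshold `frac` are hypothetical choices for this toy graph.

```python
import numpy as np


def two_cliques():
    """Hypothetical example graph: two 4-cliques joined by the single edge (3, 4)."""
    adj = np.zeros((8, 8))
    for block in (range(0, 4), range(4, 8)):
        for i in block:
            for j in block:
                if i != j:
                    adj[i, j] = 1.0
    adj[3, 4] = adj[4, 3] = 1.0
    return adj


def wave_cluster(adj, c2=0.5, steps=1024, frac=0.15, source=0):
    """Two-way split from a discrete wave on the graph (assumes c2 * lambda_max < 4)."""
    lap = np.diag(adj.sum(axis=1)) - adj
    u = np.zeros((steps, adj.shape[0]))
    u[0, source] = 1.0                      # impulse at one node
    u[1] = u[0] - 0.5 * c2 * (lap @ u[0])   # zero initial velocity: each mode is a pure cosine
    for t in range(1, steps - 1):
        # (lap @ u[t])[i] only needs node i's neighbors, so each step is one local exchange
        u[t + 1] = 2 * u[t] - u[t - 1] - c2 * (lap @ u[t])
    bins, signs = [], []
    for i in range(adj.shape[0]):
        signal = u[:, i] - u[:, i].mean()   # node i's own history, DC removed
        spec = np.fft.rfft(signal)
        mag = np.abs(spec)
        # lowest-frequency bin with a sizeable peak ~ smallest nonzero eigenvalue
        k = int(np.argmax(mag[1:] >= frac * mag.max())) + 1
        while k + 1 < len(mag) and mag[k + 1] > mag[k]:
            k += 1                          # climb to the top of that peak
        bins.append(k)
        # sign of node i's Fiedler-vector entry, up to one global sign
        signs.append(int(np.sign(spec[k].real)))
    return bins, signs


bins, signs = wave_cluster(two_cliques())
print(signs)
```

Reading the sign from the real part of the FFT coefficient assumes the peak frequency falls close to an FFT bin, as it does in this example; in general the phase offset would need to be compensated first. The sketch also extracts only the Fiedler component, whereas the abstract describes recovering the local component of every eigenvector.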
Distributed Community Detection with the WCC Metric
Community detection has become an extremely active area of research in recent
years, with researchers proposing various new metrics and algorithms to address
the problem. Recently, the Weighted Community Clustering (WCC) metric was
proposed as a novel way to judge the quality of a community partitioning based
on the distribution of triangles in the graph, and was demonstrated to yield
superior results over other commonly used metrics like modularity. The same
authors later presented a parallel algorithm for optimizing WCC on large
graphs. In this paper, we propose a new distributed, vertex-centric algorithm
for community detection using the WCC metric. Results are presented that
demonstrate the algorithm's performance and scalability on up to 32 worker
machines and real graphs of up to 1.8 billion vertices. The algorithm scales
best with the largest graphs, and to our knowledge, it is the first distributed
algorithm for optimizing the WCC metric. Comment: 6 pages, 6 figures
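Because the abstract relies on WCC without defining it, a small evaluator may help. The sketch below assumes the per-vertex definition from the original WCC proposal as we understand it: WCC(x, C) = t(x,C)/t(x,V) · vt(x,V) / (|C \ {x}| + vt(x,V \ C)), where t(x,S) counts the triangles x closes with two vertices of S and vt(x,S) counts the vertices of S that share at least one triangle with x. The score of a partition is taken as the average of WCC(x, C(x)) over all vertices. It is a single-machine reference, not the paper's vertex-centric algorithm, and the function names are hypothetical. Each per-vertex term only needs x's neighborhood, its neighbors' adjacency, the community labels and |C|, which suggests why a vertex-centric formulation is natural.

```python
from itertools import combinations


def triangles(adj, x, within):
    """t(x, S): number of triangles x closes with two vertices of S."""
    nbrs = [u for u in adj[x] if u in within]
    return sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])


def vt(adj, x, within):
    """vt(x, S): vertices of S that share at least one triangle with x."""
    return sum(1 for y in adj[x] if y in within and adj[x] & adj[y])


def wcc_vertex(adj, x, community):
    everyone = set(adj)
    t_all = triangles(adj, x, everyone)
    if t_all == 0:
        return 0.0  # x closes no triangle at all
    t_in = triangles(adj, x, community)
    denom = len(community) - 1 + vt(adj, x, everyone - community)
    return (t_in / t_all) * vt(adj, x, everyone) / denom


def wcc(adj, partition):
    """Partition score: average of WCC(x, C(x)) over all vertices."""
    total = sum(wcc_vertex(adj, x, set(c)) for c in partition for x in c)
    return total / len(adj)


two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
                 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(wcc(two_triangles, [{0, 1, 2}, {3, 4, 5}]))  # 1.0
print(wcc(two_triangles, [set(two_triangles)]))    # about 0.4
```

Splitting two triangles along their bridge scores higher than lumping everything together, which is the triangle-based intuition behind the metric.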
A Novel Scalable Clustering Method for Distributed Networks
Graph clustering is one of the key techniques for understanding the structures present in networks. Beyond clusters, detecting bridges and outliers is also a critical task, as they play an important role in network analysis. Several graph clustering methods have recently been developed and used in application domains such as biological network analysis, recommendation systems and community detection. Most of these algorithms are based on the structural clustering algorithm, which relies on structural similarity; computing that similarity requires parsing all of the graph's edges. The high cost of similarity computation therefore makes this algorithm better suited to small graphs, with little support for large-scale networks. In this paper, we propose a novel distributed graph clustering algorithm based on structural graph clustering. The experimental results show that the proposed algorithm is efficient in running time on large networks compared with existing structural graph clustering methods.
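The structural similarity that this family of algorithms computes for every edge is commonly defined, as in SCAN, as the cosine similarity of closed neighborhoods, σ(u, v) = |Γ(u) ∩ Γ(v)| / sqrt(|Γ(u)|·|Γ(v)|). The sketch below assumes that definition, marks a vertex as a core when at least `mu` vertices of its closed neighborhood have σ ≥ `eps`, and joins cores linked by such edges. It is a simplified single-machine illustration (bridges and outliers are not classified), not the distributed algorithm proposed here; `eps`, `mu` and the function names are hypothetical. The per-edge similarity pass is the part that touches every edge.

```python
import math


def structural_similarity(adj, u, v):
    """sigma(u, v) over closed neighborhoods (each vertex counted in its own neighborhood)."""
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return len(nu & nv) / math.sqrt(len(nu) * len(nv))


def structural_clusters(adj, eps=0.7, mu=3):
    # one similarity per edge: the all-edges pass that is costly on large graphs
    sim = {(u, v): structural_similarity(adj, u, v)
           for u in adj for v in adj[u] if u < v}
    strong = {v: {v} for v in adj}  # eps-neighborhood, including the vertex itself
    for (u, v), s in sim.items():
        if s >= eps:
            strong[u].add(v)
            strong[v].add(u)
    cores = {v for v in adj if len(strong[v]) >= mu}
    parent = {v: v for v in cores}  # union-find over cores

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u in cores:
        for v in strong[u] & cores:  # join cores linked by an eps-similar edge
            parent[find(u)] = find(v)
    groups = {}
    for v in cores:
        groups.setdefault(find(v), set()).add(v)
    return sim, sorted(groups.values(), key=min)


two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
                 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
sim, clusters = structural_clusters(two_triangles)
print(sim[(2, 3)], clusters)
```

On two triangles joined by the edge (2, 3), that edge gets σ = 0.5 and falls below `eps`, so the two triangles come out as separate clusters.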
Distributed Graph Clustering using Modularity and Map Equation
We study large-scale, distributed graph clustering. Given an undirected
graph, our objective is to partition the nodes into disjoint sets called
clusters. A cluster should contain many internal edges while being sparsely
connected to other clusters. In the context of a social network, a cluster
could be a group of friends. Modularity and map equation are established
formalizations of this internally-dense-externally-sparse principle. We present
two versions of a simple distributed algorithm to optimize both measures. They
are based on Thrill, a distributed big data processing framework that
implements an extended MapReduce model. The algorithms for the two measures,
DSLM-Mod and DSLM-Map, differ only slightly. Adapting them for similar quality
measures is straightforward. We conduct an extensive experimental study on
real-world graphs and on synthetic benchmark graphs with up to 68 billion
edges. Our algorithms are fast while detecting clusterings similar to those
detected by other sequential, parallel and distributed clustering algorithms.
Compared to the distributed GossipMap algorithm, DSLM-Map needs less memory, is
up to an order of magnitude faster and achieves better quality. Comment: 14 pages, 3 figures; v3: Camera ready for Euro-Par 2018, more
details, more results; v2: extended experiments to include comparison with
competing algorithms, shortened for submission to Euro-Par 201
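For reference, modularity for an undirected graph with m edges can be written Q = Σ_c [e_c/m − (vol_c/2m)²], where e_c is the number of edges inside cluster c and vol_c is the sum of its vertex degrees. Local-moving heuristics repeatedly evaluate the gain of moving one vertex to another cluster. The sequential sketch below computes both quantities; it is not DSLM-Mod, whose distributed implementation on Thrill is not reproduced here, and the map equation is not covered. Function names are hypothetical, and the graph is assumed simple (no self-loops, no parallel edges).

```python
from collections import defaultdict


def modularity(edges, membership):
    """Q = sum_c [e_c / m - (vol_c / 2m)^2] for a simple undirected edge list."""
    m = len(edges)
    internal, volume = defaultdict(int), defaultdict(int)
    for u, v in edges:
        volume[membership[u]] += 1
        volume[membership[v]] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    return sum(internal[c] / m - (volume[c] / (2 * m)) ** 2 for c in volume)


def move_gain(edges, membership, node, target):
    """Change in Q when `node` leaves its cluster for a different cluster `target`."""
    m = len(edges)
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    source = membership[node]
    k_src = k_dst = 0  # node's links into its old cluster (without itself) / into target
    for u, v in edges:
        if node in (u, v):
            other = v if u == node else u
            if membership[other] == source:
                k_src += 1
            elif membership[other] == target:
                k_dst += 1
    vol_src = sum(d for x, d in degree.items() if membership[x] == source) - degree[node]
    vol_dst = sum(d for x, d in degree.items() if membership[x] == target)
    return (k_dst - k_src) / m - degree[node] * (vol_dst - vol_src) / (2 * m * m)


edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
split = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(modularity(edges, split), move_gain(edges, split, 2, 1))
```

For two triangles joined by one edge, splitting them gives Q = 5/14, and moving vertex 2 across the bridge changes Q by −23/98, which matches recomputing Q from scratch; a local-moving pass would therefore reject that move.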
On Efficiently Detecting Overlapping Communities over Distributed Dynamic Graphs
Modern networks are both very large and highly dynamic, which challenges
the efficiency of community detection algorithms. In this paper, we study the
problem of overlapping community detection on distributed and dynamic graphs.
Given a distributed, undirected and unweighted graph, the goal is to detect
overlapping communities incrementally as the graph is dynamically changing. We
propose an efficient algorithm, called randomized Speaker-Listener
Label Propagation Algorithm (rSLPA), based on the Speaker-Listener
Label Propagation Algorithm (SLPA) by relaxing the probability distribution of
label propagation. Besides detecting high-quality communities, rSLPA can
incrementally update the detected communities after a batch of edge insertion
and deletion operations. To the best of our knowledge, rSLPA is the first
algorithm that can incrementally capture the same communities as those obtained
by applying the detection algorithm from scratch to the updated graph.
Extensive experiments are conducted on both synthetic and real-world datasets,
and the results show that our algorithm can achieve high accuracy and
efficiency at the same time. Comment: A short version of this paper will be published as an ICDE'2018 poster
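To make the speaker-listener mechanism concrete, here is a minimal sketch of baseline SLPA as it is commonly described. Every vertex keeps a memory of labels; in each round, each neighbor (speaker) of a listener sends one label drawn with probability proportional to its frequency in the speaker's memory, and the listener stores the most frequent label it heard. At the end, each vertex keeps every label whose relative frequency is at least `r`, which allows overlapping memberships. It does not implement rSLPA's relaxed distribution or its incremental updates; `iterations`, `r`, `seed` and the function name are hypothetical.

```python
import random
from collections import Counter


def slpa(adj, iterations=20, r=0.2, seed=0):
    """Baseline SLPA sketch: returns each vertex's set of community labels (may overlap)."""
    rng = random.Random(seed)
    memory = {v: [v] for v in adj}  # every vertex starts with its own id as its label
    order = list(adj)
    for _ in range(iterations):
        rng.shuffle(order)
        for listener in order:
            if not adj[listener]:
                continue
            # each speaker sends one label, chosen with probability proportional to its frequency
            heard = Counter(rng.choice(memory[s]) for s in adj[listener])
            top = max(heard.values())
            winners = sorted(label for label, c in heard.items() if c == top)
            memory[listener].append(rng.choice(winners))  # listener keeps the most popular label
    # post-processing: keep labels whose relative frequency in memory is at least r
    return {v: {label for label, c in Counter(mem).items() if c / len(mem) >= r}
            for v, mem in memory.items()}


two_components = {0: {1, 2}, 1: {0, 2}, 2: {0, 1},
                  3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
print(slpa(two_components, seed=1))
```

Labels only travel along edges, so on two disconnected triangles each vertex ends up with labels drawn from its own triangle only.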
A novel clustering methodology based on modularity optimisation for detecting authorship affinities in Shakespearean era plays
© 2016 Naeni et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. In this study we propose a novel, unsupervised clustering methodology for analyzing large datasets. This new, efficient methodology converts the general clustering problem into a community detection problem on graphs by using the Jensen-Shannon distance, a dissimilarity measure from information theory. Moreover, we use graph-theoretic concepts to generate and analyze proximity graphs. Our methodology is based on a newly proposed memetic algorithm (iMA-Net) that discovers clusters of data elements by maximizing the modularity function in proximity graphs of literary works. To test the effectiveness of this general methodology, we apply it to a text corpus dataset containing the frequencies of approximately 55,114 unique words across all 168 plays written in the Shakespearean era (16th and 17th centuries), to detect clusters of similar plays. Experimental results and comparisons with state-of-the-art clustering methods show the strong performance of our new method in identifying high-quality clusters that reflect commonalities in the literary style of the plays.
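The first step of the methodology, turning word-frequency profiles into a proximity graph, can be sketched as follows. The Jensen-Shannon distance is the square root of the Jensen-Shannon divergence, JSD(P, Q) = ½ KL(P‖M) + ½ KL(Q‖M) with M = (P + Q)/2; with base-2 logarithms it lies in [0, 1]. The abstract does not say how the proximity graph is built, so the k-nearest-neighbor construction below is our own assumption, as are the function names, `k` and the toy profiles. The resulting graph would then be handed to a modularity maximizer; iMA-Net itself is not reproduced.

```python
import numpy as np


def js_distance(p, q):
    """Jensen-Shannon distance (base 2) between two nonnegative count vectors."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # 0 * log(0 / b) is taken as 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return float(np.sqrt(max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m))))


def knn_graph(profiles, k=1):
    """Hypothetical proximity graph: link each item to its k nearest items by JS distance."""
    n = len(profiles)
    dist = np.array([[js_distance(a, b) for b in profiles] for a in profiles])
    edges = set()
    for i in range(n):
        nearest = [int(j) for j in np.argsort(dist[i]) if j != i][:k]
        edges.update((min(i, j), max(i, j)) for j in nearest)
    return edges


# toy word-count profiles (rows = documents, columns = words); not real data
profiles = [[10, 1, 0], [9, 2, 0], [0, 1, 10], [0, 2, 9]]
print(knn_graph(profiles, k=1))
```

Documents with similar word distributions end up linked, so the two pairs of toy profiles form two separate components.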