Distributed Graph Clustering using Modularity and Map Equation
We study large-scale, distributed graph clustering. Given an undirected
graph, our objective is to partition the nodes into disjoint sets called
clusters. A cluster should contain many internal edges while being sparsely
connected to other clusters. In the context of a social network, a cluster
could be a group of friends. Modularity and map equation are established
formalizations of this internally-dense-externally-sparse principle. We present
two versions of a simple distributed algorithm to optimize both measures. They
are based on Thrill, a distributed big data processing framework that
implements an extended MapReduce model. The algorithms for the two measures,
DSLM-Mod and DSLM-Map, differ only slightly. Adapting them for similar quality
measures is straightforward. We conduct an extensive experimental study on
real-world graphs and on synthetic benchmark graphs with up to 68 billion
edges. Our algorithms are fast while detecting clusterings similar to those
detected by other sequential, parallel and distributed clustering algorithms.
Compared to the distributed GossipMap algorithm, DSLM-Map needs less memory, is
up to an order of magnitude faster and achieves better quality.
Comment: 14 pages, 3 figures; v3: Camera ready for Euro-Par 2018, more
details, more results; v2: extended experiments to include comparison with
competing algorithms, shortened for submission to Euro-Par 201
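To make the internally-dense-externally-sparse principle concrete, here is a minimal sequential sketch of how modularity can be evaluated for a given clustering. It is not the distributed DSLM-Mod algorithm and does not use Thrill; it assumes an undirected, unweighted graph given as an edge list, and the names `modularity`, `edges` and `cluster_of` are hypothetical.

```python
from collections import defaultdict


def modularity(edges, cluster_of):
    """Modularity of a clustering of an undirected, unweighted graph.

    edges      -- list of (u, v) pairs, each undirected edge listed once
    cluster_of -- dict mapping every node to its cluster id
    (Both names are illustrative assumptions, not taken from the paper.)
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in the cluster
    degree_sum = defaultdict(int)  # sum of node degrees per cluster

    for u, v in edges:
        degree_sum[cluster_of[u]] += 1
        degree_sum[cluster_of[v]] += 1
        if cluster_of[u] == cluster_of[v]:
            internal[cluster_of[u]] += 1

    # Fraction of internal edges minus the fraction expected at random.
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_clusters = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_cluster = {node: "all" for node in range(6)}

q_split = modularity(edges, two_clusters)
q_merged = modularity(edges, one_cluster)
```

Splitting the graph into its two triangles scores higher than placing every node in one cluster, which always yields a modularity of zero.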
Detecting Communities under Differential Privacy
Complex networks usually expose community structure with groups of nodes
sharing many links with the other nodes in the same group and relatively few
with the nodes of the rest. This feature captures valuable information about
the organization and even the evolution of the network. Over the last decade, a
great number of algorithms for community detection have been proposed to deal
with the increasingly complex networks. However, the problem of doing this in a
private manner is rarely considered. In this paper, we solve this problem under
differential privacy, a prominent privacy concept for releasing private data.
We analyze the major challenges behind the problem and propose several schemes
to tackle them from two perspectives: input perturbation and algorithm
perturbation. We choose the Louvain method as the back-end community detection
algorithm for input perturbation schemes and propose LouvainDP, which runs the
Louvain algorithm on a noisy super-graph. For algorithm perturbation, we design
ModDivisive, which uses the exponential mechanism with modularity as the score.
We have thoroughly evaluated our techniques on real graphs of different sizes
and verified that they outperform the state of the art.
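The exponential mechanism mentioned above can be illustrated with a generic sketch. This is the textbook mechanism, not the ModDivisive procedure itself, whose details are not given in the abstract. It assumes each candidate has a numeric score (for example, a modularity value) and that the caller supplies the score's sensitivity; the function names and parameters are hypothetical.

```python
import math
import random


def exponential_mechanism_probs(scores, epsilon, sensitivity):
    """Selection probabilities proportional to exp(eps * score / (2 * sens))."""
    scale = epsilon / (2.0 * sensitivity)
    top = max(scores)
    # Subtracting the maximum score avoids overflow and leaves the
    # normalized probabilities unchanged.
    weights = [math.exp(scale * (s - top)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def exponential_mechanism(candidates, scores, epsilon, sensitivity, rng=random):
    """Draw one candidate at random using the probabilities above."""
    probs = exponential_mechanism_probs(scores, epsilon, sensitivity)
    return rng.choices(candidates, weights=probs, k=1)[0]


# Illustrative scores for three hypothetical candidate partitions.
scores = [0.10, 0.30, 0.35]
probs = exponential_mechanism_probs(scores, epsilon=1.0, sensitivity=0.01)
uniform = exponential_mechanism_probs(scores, epsilon=0.0, sensitivity=0.01)
choice = exponential_mechanism(["p0", "p1", "p2"], scores, 1.0, 0.01,
                               rng=random.Random(0))
```

A larger `epsilon` concentrates the probability on high-scoring candidates, while `epsilon = 0` ignores the scores entirely and selects uniformly.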
Stability of graph communities across time scales
The complexity of biological, social and engineering networks makes it
desirable to find natural partitions into communities that can act as
simplified descriptions and provide insight into the structure and function of
the overall system. Although community detection methods abound, there is a
lack of consensus on how to quantify and rank the quality of partitions. We
show here that the quality of a partition can be measured in terms of its
stability, defined in terms of the clustered autocovariance of a Markov process
taking place on the graph. Because the stability has an intrinsic dependence on
time scales of the graph, it allows us to compare and rank partitions at each
time and also to establish the time spans over which partitions are optimal.
Hence the Markov time acts effectively as an intrinsic resolution parameter
that establishes a hierarchy of increasingly coarser clusterings. Within our
framework we can then provide a unifying view of several standard partitioning
measures: modularity and normalized cut size can be interpreted as one-step
time measures, whereas Fiedler's spectral clustering emerges at long times. We
apply our method to characterize the relevance and persistence of partitions
over time for constructive and real networks, including hierarchical graphs and
social networks. We also obtain reduced descriptions for atomic level protein
structures over different time scales.
Comment: submitted; updated bibliography from v
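The stability measure can be sketched for small graphs as follows. The sketch assumes a discrete-time random walk with transition matrix M = D^{-1}A and stationary distribution proportional to node degree, and sums the autocovariance pi_i (M^t)_ij - pi_i pi_j over node pairs in the same cluster. It uses plain Python lists, assumes no isolated nodes, and the names `stability`, `adj` and `clusters` are hypothetical.

```python
def stability(adj, clusters, t):
    """Clustered autocovariance of a t-step random walk on the graph.

    adj      -- symmetric adjacency matrix as a list of lists
    clusters -- clusters[i] is the cluster id of node i
    t        -- number of random-walk steps (the Markov time)
    Assumes every node has at least one edge.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    two_m = sum(deg)
    pi = [d / two_m for d in deg]  # stationary distribution
    M = [[adj[i][j] / deg[i] for j in range(n)] for i in range(n)]

    # P = M^t by repeated multiplication, starting from the identity.
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        P = [[sum(P[i][k] * M[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]

    return sum(
        pi[i] * P[i][j] - pi[i] * pi[j]
        for i in range(n)
        for j in range(n)
        if clusters[i] == clusters[j]
    )


# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1

split = [0, 0, 0, 1, 1, 1]
r1 = stability(adj, split, 1)
r3 = stability(adj, split, 3)
r_trivial = stability(adj, [0] * 6, 3)
```

Under these assumptions the one-step value `r1` coincides with the modularity of the same partition, consistent with the abstract's reading of modularity as a one-step time measure.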