187 research outputs found
Distributed Graph Clustering using Modularity and Map Equation
We study large-scale, distributed graph clustering. Given an undirected
graph, our objective is to partition the nodes into disjoint sets called
clusters. A cluster should contain many internal edges while being sparsely
connected to other clusters. In the context of a social network, a cluster
could be a group of friends. Modularity and map equation are established
formalizations of this internally-dense-externally-sparse principle. We present
two versions of a simple distributed algorithm to optimize both measures. They
are based on Thrill, a distributed big data processing framework that
implements an extended MapReduce model. The algorithms for the two measures,
DSLM-Mod and DSLM-Map, differ only slightly. Adapting them for similar quality
measures is straight-forward. We conduct an extensive experimental study on
real-world graphs and on synthetic benchmark graphs with up to 68 billion
edges. Our algorithms are fast while detecting clusterings similar to those
detected by other sequential, parallel and distributed clustering algorithms.
Compared to the distributed GossipMap algorithm, DSLM-Map needs less memory, is
up to an order of magnitude faster and achieves better quality.Comment: 14 pages, 3 figures; v3: Camera ready for Euro-Par 2018, more
details, more results; v2: extended experiments to include comparison with
competing algorithms, shortened for submission to Euro-Par 201
Generating realistic scaled complex networks
Research on generative models is a central project in the emerging field of
network science, and it studies how statistical patterns found in real networks
could be generated by formal rules. Output from these generative models is then
the basis for designing and evaluating computational methods on networks, and
for verification and simulation studies. During the last two decades, a variety
of models has been proposed with an ultimate goal of achieving comprehensive
realism for the generated networks. In this study, we (a) introduce a new
generator, termed ReCoN; (b) explore how ReCoN and some existing models can be
fitted to an original network to produce a structurally similar replica, (c)
use ReCoN to produce networks much larger than the original exemplar, and
finally (d) discuss open problems and promising research directions. In a
comparative experimental study, we find that ReCoN is often superior to many
other state-of-the-art network generation methods. We argue that ReCoN is a
scalable and effective tool for modeling a given network while preserving
important properties at both micro- and macroscopic scales, and for scaling the
exemplar data by orders of magnitude in size.Comment: 26 pages, 13 figures, extended version, a preliminary version of the
paper was presented at the 5th International Workshop on Complex Networks and
their Application
Parallel and I/O-efficient randomisation of massive networks using global curveball trades
Graph randomisation is a crucial task in the analysis and synthesis of networks. It is typically implemented as an edge switching process (ESMC) repeatedly swapping the nodes of random edge pairs while maintaining the degrees involved [23]. Curveball is a novel approach that instead considers the whole neighbourhoods of randomly drawn node pairs. Its Markov chain converges to a uniform distribution, and experiments suggest that it requires less steps than the established ESMC [6]. Since trades however are more expensive, we study Curveball’s practical runtime by introducing the first efficient Curveball algorithms: the I/O-efficient EM-CB for simple undirected graphs and its internal memory pendant IM-CB. Further, we investigate global trades [6] processing every node in a single super step, and show that undirected global trades converge to a uniform distribution and perform superior in practice. We then discuss EM-GCB and EMPGCB for global trades and give experimental evidence that EM-PGCB achieves the quality of the state-of-the-art ESMC algorithm EM-ES [15] nearly one order of magnitude faster
Algorithms and Software for the Analysis of Large Complex Networks
The work presented intersects three main areas, namely graph algorithmics, network science and applied software engineering. Each computational method discussed relates to one of the main tasks of data analysis: to extract structural features from network data, such as methods for community detection; or to transform network data, such as methods to sparsify a network and reduce its size while keeping essential properties; or to realistically model networks through generative models
Fast community structure local uncovering by independent vertex-centred process
This paper addresses the task of community detection and proposes a local
approach based on a distributed list building, where each vertex broadcasts
basic information that only depends on its degree and that of its neighbours. A
decentralised external process then unveils the community structure. The
relevance of the proposed method is experimentally shown on both artificial and
real data.Comment: 2015 IEEE/ACM International Conference on Advances in Social Networks
Analysis and Mining, Aug 2015, Paris, France. Proceedings of the 2015
IEEE/ACM International Conference on Advances in Social Networks Analysis and
Minin
- …