4 research outputs found

Time and Memory Efficient Parallel Algorithm for Structural Graph Summaries and two Extensions to Incremental Summarization and $k$-Bisimulation for Long $k$-Chaining

Authors: Blume Till, Rau Jannik, Richerby David, Scherp Ansgar
Publication date: 04/11/2022

We developed a flexible parallel algorithm for graph summarization based on vertex-centric programming and parameterized message passing. The base algorithm supports infinitely many structural graph summary models defined in a formal language. An extension of the parallel base algorithm allows incremental graph summarization. In this paper, we prove that the incremental algorithm is correct and show that updates are performed in time $\mathcal{O}(\Delta \cdot d^k)$, where $\Delta$ is the number of additions, deletions, and modifications to the input graph, $d$ the maximum degree, and $k$ is the maximum distance in the subgraphs considered. Although the iterative algorithm supports values of $k>1$, it requires nested data structures for the message passing that are memory-inefficient. Thus, we extended the base summarization algorithm by a hash-based messaging mechanism to support a scalable iterative computation of graph summarizations based on $k$-bisimulation for arbitrary $k$. We empirically evaluate the performance of our algorithms using benchmark and real-world datasets. The incremental algorithm almost always outperforms the batch computation. We observe in our experiments that the incremental algorithm is faster even in cases when $50\%$ of the graph database changes from one version to the next. The incremental computation requires a three-layered hash index, which has a low memory overhead of only $8\%$ ($\pm 1\%$). Finally, the incremental summarization algorithm outperforms the batch algorithm even with fewer cores. The iterative parallel $k$-bisimulation algorithm computes summaries on graphs with over $10$M edges within seconds. We show that the algorithm processes graphs of $100+\,$M edges within a few minutes while having a moderate memory consumption of $<150$ GB. For the largest BSBM1B dataset with 1 billion edges, it computes $k=10$ bisimulation in under an hour.
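As a rough illustration of what a $k$-bisimulation summary computes, the following is a minimal sequential sketch of signature-based partition refinement. It is not the parallel, hash-based messaging algorithm from the paper. The function name `k_bisimulation`, the edge-triple input format, and the choice to start from a single block are assumptions made for this example only.

```python
from collections import defaultdict


def k_bisimulation(edges, k):
    """Partition vertices into forward k-bisimulation blocks.

    edges: iterable of (source, label, target) triples (assumed format).
    Returns a dict mapping each vertex to an integer block id.
    Block ids are arbitrary; only equality between ids is meaningful.
    """
    out = defaultdict(set)
    vertices = set()
    for source, label, target in edges:
        out[source].add((label, target))
        vertices.update((source, target))

    # k = 0: every vertex is in the same block (assumption: no vertex labels).
    block = {v: 0 for v in vertices}
    for _ in range(k):
        # A vertex's signature is its old block plus the set of
        # (edge label, neighbour's old block) pairs over its out-edges.
        signature = {
            v: (block[v], frozenset((lbl, block[t]) for lbl, t in out[v]))
            for v in vertices
        }
        ids = {}
        new_block = {v: ids.setdefault(signature[v], len(ids)) for v in vertices}
        # Refinement only ever splits blocks, so an unchanged block count
        # means the partition is stable and further rounds change nothing.
        if len(ids) == len(set(block.values())):
            break
        block = new_block
    return block
```

For example, with edges `a -p-> x`, `b -p-> y`, `x -q-> z`, and `c -p-> z`, the vertices `a`, `b`, and `c` share a block for $k=1$, while for $k=2$ vertex `a` is separated because its successor `x` has an outgoing edge and the successors of `b` and `c` do not.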

arXiv.org e-Print Archive

Designing algorithms for big graph datasets : a study of computing bisimulation and joins

Author: Luo Y.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2015

Repository TU/e

Pure OAI Repository

Regularities and dynamics in bisimulation reductions of big graphs

Authors: De Bra P.M.E., Fletcher G.H.L., Hidders A.J.H., Luo Y., Wu Y.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2013

Bisimulation is a basic graph reduction operation, which plays a key role in a wide range of graph analytical applications. While there are many algorithms dedicated to computing bisimulation results, to our knowledge, little work has been done to analyze the results themselves. Since data properties such as skew can greatly influence the performance of data-intensive tasks, the lack of such insight leads to inefficient algorithm and system design. In this paper we take a close look into various aspects of bisimulation results on big graphs, from both real-world scenarios and synthetic graph generators, with graph size varying from 1 million to 1 billion edges. We make the following observations: (1) A certain degree of regularity exists in real-world graphs' bisimulation results. Specifically, power-law distributions appear in many of the results' properties. (2) Synthetic graphs fail to fulfill one or more of the regularities that are revealed in the real-world graphs. (3) By examining a growing social network graph (Flickr-Grow), we see that the corresponding bisimulation partition relation graph grows as well, but the growth is stable with respect to the original graph.

Pure OAI Repository