Graph Summarization
The continuous and rapid growth of highly interconnected datasets, which are
both voluminous and complex, calls for the development of adequate processing
and analytical techniques. One method for condensing and simplifying such
datasets is graph summarization. It denotes a series of application-specific
algorithms designed to transform graphs into more compact representations while
preserving structural patterns, query answers, or specific property
distributions. As this problem is common to several areas studying graph
topologies, different approaches, such as clustering, compression, sampling, or
influence detection, have been proposed, primarily based on statistical and
optimization methods. Our chapter pinpoints the main graph summarization
methods, with particular attention to the most recent approaches and novel
research trends on this topic that previous surveys have not yet covered.
Comment: To appear in the Encyclopedia of Big Data Technologies
Incremental Lossless Graph Summarization
Given a fully dynamic graph, represented as a stream of edge insertions and
deletions, how can we obtain and incrementally update a lossless summary of its
current snapshot? As large-scale graphs are prevalent, concisely representing
them is essential for efficient storage and analysis. Lossless graph
summarization is an effective graph-compression technique with many desirable
properties. It aims to compactly represent the input graph as (a) a summary
graph consisting of supernodes (i.e., sets of nodes) and superedges (i.e.,
edges between supernodes), which provide a rough description, and (b) edge
corrections which fix errors induced by the rough description. While a number
of batch algorithms, suited for static graphs, have been developed for rapid
and compact graph summarization, they are highly inefficient in terms of time
and space for dynamic graphs, which are common in practice. In this work, we
propose MoSSo, the first incremental algorithm for lossless summarization of
fully dynamic graphs. In response to each change in the input graph, MoSSo
updates the output representation by repeatedly moving nodes among supernodes.
MoSSo chooses which nodes to move, and their destinations, carefully but
rapidly, based on several novel ideas. Through extensive experiments on 10 real graphs,
we show MoSSo is (a) Fast and 'any time': processing each change in
near-constant time (less than 0.1 millisecond), up to 7 orders of magnitude
faster than running state-of-the-art batch methods, (b) Scalable: summarizing
graphs with hundreds of millions of edges, requiring sub-linear memory during
the process, and (c) Effective: achieving compression ratios comparable even to
those of state-of-the-art batch methods.
Comment: to appear at the 26th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining (KDD '20)
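The representation described in the abstract above (a summary graph of supernodes and superedges, plus edge corrections) can be illustrated with a small decoding sketch. It is not MoSSo itself and does not come from the paper: the function name `reconstruct_edges`, the split of corrections into `additions` and `deletions`, and the toy graph are assumptions made for illustration only.

```python
from itertools import combinations


def reconstruct_edges(supernodes, superedges, additions, deletions):
    """Rebuild an undirected edge set from a lossless summary.

    supernodes: dict mapping a supernode id to the set of nodes it contains
    superedges: iterable of (supernode id, supernode id) pairs
    additions:  edges missing from the rough description (hypothetical name)
    deletions:  edges wrongly implied by the rough description (hypothetical name)
    """
    edges = set()
    for a, b in superedges:
        if a == b:
            # A superedge from a supernode to itself implies every pair inside it.
            pairs = combinations(sorted(supernodes[a]), 2)
        else:
            # A superedge between two supernodes implies every cross pair.
            pairs = ((u, v) for u in supernodes[a] for v in supernodes[b])
        for u, v in pairs:
            edges.add(frozenset((u, v)))
    # Edge corrections repair the errors of the rough description.
    edges -= {frozenset(e) for e in deletions}
    edges |= {frozenset(e) for e in additions}
    return edges


# Toy summary: two supernodes, one superedge, two corrections.
supernodes = {"A": {1, 2, 3}, "B": {4, 5}}
superedges = [("A", "B")]
deletions = [(3, 5)]   # implied by A-B but absent from the input graph
additions = [(1, 2)]   # present in the input graph but not implied by any superedge

original = reconstruct_edges(supernodes, superedges, additions, deletions)
```

In this toy case the summary stores one superedge and two corrections in place of the six original edges, and decoding returns exactly those six edges, which is what makes the summary lossless.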
Play like a Vertex: A Stackelberg Game Approach for Streaming Graph Partitioning
In the realm of distributed systems tasked with managing and processing
large-scale graph-structured data, optimizing graph partitioning stands as a
pivotal challenge. The primary goal is to minimize communication overhead and
runtime cost. However, alongside the computational complexity associated with
optimal graph partitioning, a critical factor to consider is memory overhead.
Real-world graphs often reach colossal sizes, making it impractical and
economically unviable to load the entire graph into memory for partitioning.
This is also a fundamental premise in distributed graph processing, where
accommodating a graph with non-distributed systems is unattainable. Existing
streaming partitioning algorithms are skew-oblivious and yield satisfactory
partitioning results only for specific graph types. In this paper, we propose
a novel streaming partitioning algorithm, the
Skewness-aware Vertex-cut Partitioner S5P, designed to leverage the skewness
characteristics of real graphs for achieving high-quality partitioning. S5P
offers high partitioning quality by segregating the graph's edge set into two
subsets, head and tail sets. Following processing by a skewness-aware
clustering algorithm, these two subsets subsequently undergo a Stackelberg
graph game. Our extensive evaluations conducted on substantial real-world and
synthetic graphs demonstrate that, in all instances, the partitioning quality
of S5P surpasses that of existing streaming partitioning algorithms, operating
within the same load balance constraints. For example, S5P can bring up to a
51% improvement in partitioning quality compared to the top partitioner among
the baselines. Lastly, we showcase that the implementation of S5P results in up
to an 81% reduction in communication cost and a 130% increase in runtime
efficiency for distributed graph processing tasks on PowerGraph.
Comment: This paper has been accepted by SIGMOD 202
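The idea of splitting the edge set into head and tail subsets and then judging a vertex-cut partitioning can be sketched as follows. The abstract does not say how S5P defines the two subsets or which quality metric it reports, so everything here is an assumption: the degree-threshold rule in `split_head_tail`, the `degree_threshold` parameter, and the use of the replication factor (average number of partitions each vertex appears in, a common measure for vertex-cut partitioners) are illustrative choices, not S5P's actual method.

```python
from collections import Counter, defaultdict


def split_head_tail(edges, degree_threshold):
    """Split edges by endpoint degree (assumed rule, not taken from the paper).

    An edge goes to the head set when both endpoints have degree at or above
    the threshold; every other edge goes to the tail set.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    head, tail = [], []
    for u, v in edges:
        if degree[u] >= degree_threshold and degree[v] >= degree_threshold:
            head.append((u, v))
        else:
            tail.append((u, v))
    return head, tail


def replication_factor(assignment):
    """Average number of partitions in which each vertex is replicated.

    assignment: dict mapping an edge (u, v) to its partition id.
    """
    partitions_of = defaultdict(set)
    for (u, v), part in assignment.items():
        partitions_of[u].add(part)
        partitions_of[v].add(part)
    return sum(len(p) for p in partitions_of.values()) / len(partitions_of)


# Toy graph: vertex 0 is the high-degree hub.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (4, 5)]
head, tail = split_head_tail(edges, degree_threshold=2)

# Naive placement for illustration: head edges on partition 0, tail edges on 1.
assignment = {e: 0 for e in head}
assignment.update({e: 1 for e in tail})
rf = replication_factor(assignment)
```

With this placement only vertex 0 is cut across both partitions, so the replication factor is 7/6. A lower value means fewer vertex replicas and therefore less synchronization between partitions.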