Towards a property graph generator for benchmarking
The use of synthetic graph generators is a common practice among
graph-oriented benchmark designers, as it allows obtaining graphs with the
required scale and characteristics. However, finding a graph generator that
accurately fits the needs of a given benchmark is very difficult, thus
practitioners end up creating ad-hoc ones. Such a task is usually
time-consuming, and often leads to reinventing the wheel. In this paper, we
introduce the conceptual design of DataSynth, a framework for property graphs
generation with customizable schemas and characteristics. The goal of DataSynth
is to assist benchmark designers in generating graphs efficiently and at scale,
saving them from implementing their own generators. Additionally, DataSynth
introduces novel features barely explored so far, such as modeling the
correlation between properties and the structure of the graph. This is achieved
by a novel property-to-node matching algorithm, for which we present promising
preliminary results.
Structural Patterns and Generative Models of Real-world Hypergraphs
Graphs have been utilized as a powerful tool to model pairwise relationships
between people or objects. Such structure is a special type of a broader
concept referred to as hypergraph, in which each hyperedge may consist of an
arbitrary number of nodes, rather than just two. A large number of real-world
datasets are of this form - for example, lists of recipients of emails sent from
an organization, users participating in a discussion thread, or subject labels
tagged in an online question. However, due to complex representations and lack
of adequate tools, little attention has been paid to exploring the underlying
patterns in these interactions.
In this work, we empirically study a number of real-world hypergraph datasets
across various domains. In order to enable thorough investigations, we
introduce the multi-level decomposition method, which represents each
hypergraph by a set of pairwise graphs. Each pairwise graph, which we refer to
as a k-level decomposed graph, captures the interactions between pairs of
subsets of k nodes. We empirically find that at each decomposition level, the
investigated hypergraphs obey five structural properties. These properties
serve as criteria for evaluating how realistic a hypergraph is, and establish a
foundation for the hypergraph generation problem. We also propose a hypergraph
generator that is remarkably simple but capable of fulfilling these evaluation
metrics, which are hardly achieved by other baseline generator models.
Comment: to be published in the 26th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining (KDD '20).
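The multi-level decomposition described in this abstract can be illustrated with a minimal sketch. It assumes one plausible reading of the definition: the nodes of the k-level decomposed graph are k-node subsets, and two distinct subsets are linked whenever both occur inside the same hyperedge. The function name `k_level_decomposed_graph` and the toy hyperedges are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def k_level_decomposed_graph(hyperedges, k):
    """Return the edge set of a k-level decomposed graph.

    Assumption (not confirmed by the abstract): two distinct k-node
    subsets are linked when both are contained in the same hyperedge.
    """
    edges = set()
    for hyperedge in hyperedges:
        # Enumerate every k-node subset of this hyperedge.
        subsets = [frozenset(c) for c in combinations(sorted(set(hyperedge)), k)]
        # Link every pair of distinct subsets that co-occur in it.
        for a, b in combinations(subsets, 2):
            edges.add(frozenset((a, b)))
    return edges


# Toy hypergraph, for illustration only.
hyperedges = [{"a", "b", "c"}, {"c", "d"}]
level1 = k_level_decomposed_graph(hyperedges, 1)
level2 = k_level_decomposed_graph(hyperedges, 2)
```

Under this reading, the 1-level graph is the ordinary pairwise projection of the hypergraph, while higher levels only receive edges from hyperedges with more than k nodes.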
Partitioner Selection with EASE to Optimize Distributed Graph Processing
For distributed graph processing on massive graphs, a graph is partitioned
into multiple equally-sized parts which are distributed among machines in a
compute cluster. In the last decade, many partitioning algorithms have been
developed which differ from each other with respect to the partitioning
quality, the run-time of the partitioning and the type of graph for which they
work best. The plethora of graph partitioning algorithms makes it a challenging
task to select a partitioner for a given scenario. Different studies exist that
provide qualitative insights into the characteristics of graph partitioning
algorithms that support a selection. However, in order to enable automatic
selection, a quantitative prediction of the partitioning quality, the
partitioning run-time and the run-time of subsequent graph processing jobs is
needed. In this paper, we propose a machine learning-based approach to provide
such a quantitative prediction for different types of edge partitioning
algorithms and graph processing workloads. We show that training based on
generated graphs achieves high accuracy, which can be further improved when
using real-world data. Based on the predictions, the automatic selection
reduces the end-to-end run-time on average by 11.1% compared to a random
selection, by 17.4% compared to selecting the partitioner that yields the
lowest cut size, and by 29.1% compared to the worst strategy, respectively.
Furthermore, in 35.7% of the cases, the best strategy was selected.
Comment: To appear at IEEE International Conference on Data Engineering (ICDE
2023).
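The selection step that follows from such predictions can be sketched as below. This is only an illustration of choosing the lowest predicted end-to-end run-time, assuming that the end-to-end figure is the sum of the predicted partitioning and processing run-times. The partitioner names, the dictionary layout, and the numbers are hypothetical placeholders, not values or interfaces from the paper.

```python
def select_partitioner(predictions):
    """Pick the partitioner with the lowest predicted end-to-end run-time.

    `predictions` maps a partitioner name to its predicted partitioning
    and processing run-times in seconds (hypothetical layout).
    """
    def end_to_end(name):
        p = predictions[name]
        return p["partitioning_s"] + p["processing_s"]

    return min(predictions, key=end_to_end)


# Made-up predictions, for illustration only.
predictions = {
    "partitioner_a": {"partitioning_s": 10.0, "processing_s": 90.0},
    "partitioner_b": {"partitioning_s": 40.0, "processing_s": 50.0},
    "partitioner_c": {"partitioning_s": 5.0, "processing_s": 120.0},
}
best = select_partitioner(predictions)
```

In this toy input the fastest partitioner to run is not the one selected, because its cost is outweighed by the slower processing job that follows.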
RTGEN : A Relative Temporal Graph GENerator
Graph management systems are emerging as an efficient solution to store and query graph-oriented data. To assess the performance of such systems and compare them, practitioners often design benchmarks that use large-scale graphs. However, such graphs either do not fit the scale requirements or are not publicly available. This has motivated a number of graph generators that produce synthetic graphs whose characteristics mimic those of real-world graphs (degree distribution, community structure, diameter, etc.). Applications, however, need to deal with temporal graphs whose topology is in constant change. Although generating static graphs has been extensively studied in the literature, generating temporal graphs has received much less attention. In this work, we propose RTGEN, a relative temporal graph generator that allows the generation of temporal graphs by controlling the evolution of the degree distribution. In particular, we propose to generate new graphs with a desired degree distribution out of existing ones while minimizing the effort needed to transform the source graph into the target. Our proposed relative graph generation method relies on optimal transport methods. We extend our method to also deal with the community structure of the generated graphs, which is prevalent in a number of applications. Our generation model extends the concepts proposed in the Chung-Lu model with temporal and community-aware support. We validate our generation procedure through experiments that show the generated graphs reliably match the ground-truth parameters.
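For context, the static Chung-Lu model that this abstract builds on can be sketched as follows. This is a minimal, quadratic-time illustration of the classic model only: each pair of nodes i and j is linked independently with probability min(1, w_i * w_j / sum(w)), where w holds the desired expected degrees. It omits self-loops for simplicity and does not implement RTGEN's temporal, optimal-transport, or community-aware extensions. The function name `chung_lu` and the example weights are hypothetical.

```python
import random


def chung_lu(weights, seed=None):
    """Sample an undirected graph from a simplified Chung-Lu model.

    weights[i] is the desired expected degree of node i. Self-loops are
    skipped, which is a simplification of the classic formulation.
    """
    rng = random.Random(seed)
    total = sum(weights)
    if total == 0:
        return []  # No expected degree anywhere, so no edges.
    edges = []
    n = len(weights)
    for i in range(n):
        for j in range(i + 1, n):
            # Edge probability is proportional to the product of weights.
            p = min(1.0, weights[i] * weights[j] / total)
            if rng.random() < p:
                edges.append((i, j))
    return edges


# Hypothetical expected degrees, for illustration only.
edges = chung_lu([3, 2, 2, 1], seed=0)
```

Because edges are drawn independently, node degrees match the weights only in expectation, so any single sample will generally deviate from the target sequence.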