The LDBC Social Network Benchmark Interactive workload v2: A transactional graph query benchmark with deep delete operations

Author: Boncz P.A. (Peter)
Püroja D.A. (David)
Szárnyas G. (Gábor)
Waudby J. (Jack)
Publication date: 22/08/2023

The LDBC Social Network Benchmark’s Interactive workload captures an OLTP scenario operating on a correlated social network graph. It consists of complex graph queries executed concurrently with a stream of update operations. Since its initial release in 2015, the Interactive workload has become the de facto industry standard for benchmarking transactional graph data management systems. As graph systems have matured and the community’s understanding of graph processing features has evolved, we initiated the renewal of this benchmark. This paper describes the draft Interactive v2 workload with several new features: delete operations, a cheapest path-finding query, support for larger data sets, and a novel temporal parameter curation algorithm that ensures stable runtimes for path queries.

CWI's Institutional Repository
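
The Interactive v2 abstract lists a cheapest path-finding query among the new features. The abstract does not state the query's weighting scheme, so the following is only a minimal Python sketch of cheapest-path search with Dijkstra's algorithm over a hypothetical weighted person graph. The `cheapest_path` function and the edge weights are illustrative assumptions, not the benchmark's definition.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Hypothetical weighted graph: person id -> [(neighbour id, edge weight)].
Graph = Dict[int, List[Tuple[int, float]]]


def cheapest_path(graph: Graph, source: int, target: int) -> Optional[Tuple[float, List[int]]]:
    """Return (total_weight, path) of the cheapest source->target path, or None.

    Plain Dijkstra; assumes all edge weights are non-negative.
    """
    dist: Dict[int, float] = {source: 0.0}
    prev: Dict[int, int] = {}
    heap: List[Tuple[float, int]] = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            # First pop of the target carries its final (minimal) distance;
            # walk the predecessor links back to the source.
            path = [node]
            while node != source:
                node = prev[node]
                path.append(node)
            return d, path[::-1]
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry superseded by a cheaper one
        for neighbour, weight in graph.get(node, []):
            nd = d + weight
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                prev[neighbour] = node
                heapq.heappush(heap, (nd, neighbour))
    return None  # target unreachable
```

For example, with `graph = {1: [(2, 1.0), (3, 4.0)], 2: [(1, 1.0), (3, 1.5)], 3: [(1, 4.0), (2, 1.5)]}`, `cheapest_path(graph, 1, 3)` returns `(2.5, [1, 2, 3])`: the two-hop route is cheaper than the direct edge of weight 4.0.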

DuckPGQ: Bringing SQL/PGQ to DuckDB

Author: Boncz P.A. (Peter)
Szárnyas G. (Gábor)
Wolde D.L.J. (Daniël) ten
Publication date: 22/08/2023

In this research project, we investigate an alternative to the standard cloud-centralized data architecture. Specifically, we aim to leave part of application data under the control of the individual data owners in conceptually decentralized personal data stores. Our primary goal is to increase data minimization, i.e., enabling more sensitive personal data to remain under the control of its owners while providing a straightforward and efficient framework for architects to design data architectures that allow applications to run and their data to be analyzed. To serve this purpose, the centralized part of the schema contains aggregating views over this decentralized data. We propose to design a declarative language that extends SQL, for architects to specify at the schema level different kinds of tables (decentralized, centralized, and replicated) as well as centralized materialized views, and in addition the sensitivity of decentralized columns and their minimum granularity levels when these end up in centralized views. When users modify their personal data stores, the changes need to be reflected in the centralized views while ensuring privacy; this calls for the integration of cryptography techniques into distributed materialized view maintenance. We finally aim to implement this system, where the personal data stores could live either in mobile devices or in encrypted cloud storage, in order to evaluate its performance properties experimentally.

We demonstrate the most important new feature of SQL:2023, namely SQL/PGQ, which eases querying graphs using SQL by introducing new syntax for pattern matching and (shortest) path-finding. We show how support for SQL/PGQ can be integrated into an RDBMS, specifically in the DuckDB system, using an extension module called DuckPGQ. As such, we also demonstrate the use of the DuckDB extensibility mechanism, which allows us to add new functions, data types, operators, optimizer rules, storage systems, and even parsers to DuckDB.
We also describe the new data structures and algorithms that the DuckPGQ module is based on, and how they are injected into SQL plans. While the demonstrated DuckPGQ extension module is lean and efficient, we sketch a roadmap to (i) improve its performance through new algorithms (factorized query processing and worst-case-optimal joins, WCOJ) and better parallelism, and (ii) extend its functionality to scenarios beyond SQL, e.g., building and analyzing Graph Neural Networks.

CWI's Institutional Repository
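
The DuckPGQ demonstration abstract mentions new data structures without naming them. A compressed sparse row (CSR) adjacency layout is one common compact choice for traversing an edge table during path-finding; the sketch below assumes dense vertex ids `0..n-1`, uses the hypothetical helpers `build_csr` and `neighbours`, and is not taken from DuckPGQ's code.

```python
from typing import List, Tuple


def build_csr(num_vertices: int, edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Build a CSR adjacency structure from (src, dst) pairs with ids 0..n-1.

    offsets[v]..offsets[v + 1] delimits v's out-neighbours inside `targets`.
    """
    # Count the out-degree of every vertex (shifted by one slot).
    offsets = [0] * (num_vertices + 1)
    for src, _ in edges:
        offsets[src + 1] += 1
    # Prefix sum turns degrees into start positions.
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]
    # Scatter each edge's target into its source's slot range.
    targets = [0] * len(edges)
    cursor = offsets[:-1]  # slicing copies, so `offsets` stays intact
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def neighbours(offsets: List[int], targets: List[int], v: int) -> List[int]:
    """Return the out-neighbours of vertex v as a contiguous slice."""
    return targets[offsets[v]:offsets[v + 1]]
```

For edges `[(0, 1), (0, 2), (1, 2), (2, 0)]` over three vertices, `build_csr` yields `offsets = [0, 2, 3, 4]` and `targets = [1, 2, 2, 0]`, so vertex 0's neighbours are `[1, 2]`. Two flat arrays replace per-vertex lists, which keeps the structure compact and cache-friendly.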

DuckPGQ: Efficient property graph queries in an analytical RDBMS

Author: Boncz P.A. (Peter)
Singh T. (Tavneet)
Szárnyas G. (Gábor)
Wolde D.L.J. (Daniël) ten
Publication date: 08/01/2023

In the past decade, property graph databases have emerged as a growing niche in data management. Many native graph systems and query languages have been created, but their functionality and performance still leave much room for improvement. The upcoming SQL:2023 standard will introduce the Property Graph Queries (SQL/PGQ) sub-language, giving relational systems the opportunity to standardize graph queries and provide mature graph query functionality. We argue that (i) competent graph data systems must build on all the technology that makes up a state-of-the-art relational system, (ii) the graph use case additionally requires a many-source/destination path-finding algorithm and a compact graph representation, and (iii) it calls for research into practical worst-case-optimal joins and factorized query processing techniques. We outline our design of DuckPGQ, which follows this recipe by adding efficient SQL/PGQ support to the popular open-source “embeddable analytics” relational database system DuckDB, also originally developed at CWI. Our design aims at minimizing technical debt using an approach that relies on efficient vectorized UDFs. We benchmark DuckPGQ, showing encouraging performance and scalability on large graph data sets, but also reinforcing the need for future research under (iii).

CWI's Institutional Repository
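
Point (ii) of this abstract calls for a many-source/destination path-finding algorithm. One well-known way to serve many sources at once is a multi-source BFS that keeps one bit per source in a per-vertex bitmask, so a single traversal advances all searches together. The Python sketch below illustrates that idea under stated assumptions (Python `int` values as bitsets, unweighted hop distances, a hypothetical `multi_source_bfs` function); it is not DuckPGQ's actual implementation.

```python
from typing import Dict, List


def multi_source_bfs(adj: Dict[int, List[int]], sources: List[int]) -> Dict[int, Dict[int, int]]:
    """Hop distances from every source, computed in one shared traversal.

    Bit i of a vertex's mask stands for sources[i]; Python ints act as bitsets.
    Returns {source: {vertex: distance}} for all reachable vertices.
    """
    seen: Dict[int, int] = {}      # vertex -> sources that have reached it
    frontier: Dict[int, int] = {}  # vertex -> sources reaching it at this level
    dist: Dict[int, Dict[int, int]] = {s: {} for s in sources}
    for i, s in enumerate(sources):
        frontier[s] = frontier.get(s, 0) | (1 << i)
        seen[s] = seen.get(s, 0) | (1 << i)
    level = 0
    while frontier:
        # Record the distance for every (source, vertex) pair found at this level.
        for v, mask in frontier.items():
            for i, s in enumerate(sources):
                if mask >> i & 1:
                    dist[s][v] = level
        # Expand all sources' frontiers together with one bitwise OR per edge.
        nxt: Dict[int, int] = {}
        for v, mask in frontier.items():
            for w in adj.get(v, []):
                nxt[w] = nxt.get(w, 0) | mask
        # Keep only the bits of sources that have not visited w yet.
        frontier = {}
        for w, mask in nxt.items():
            new = mask & ~seen.get(w, 0)
            if new:
                seen[w] = seen.get(w, 0) | new
                frontier[w] = new
        level += 1
    return dist
```

For the chain `{0: [1], 1: [2], 2: [3], 3: []}` with sources `[0, 2]`, the result is `{0: {0: 0, 1: 1, 2: 2, 3: 3}, 2: {2: 0, 3: 1}}`. Sharing the edge scans across sources is what makes this style of traversal attractive when a query asks for paths between many pairs.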

LSQB: A large-scale subgraph query benchmark

Author: Kuiper L.N. (Laurens)
Lissandrini M. (Matteo)
Mhedhbi A. (Amine)
Szárnyas G. (Gábor)
Waudby J. (Jack)
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2021

We introduce LSQB, a new large-scale subgraph query benchmark. LSQB tests the performance of database management systems on an important class of subgraph queries overlooked by existing benchmarks. Matching a labelled structural graph pattern, referred to as subgraph matching, is the focus of LSQB. In relational terms, the benchmark tests DBMSs' join performance as a choke point, since subgraph matching is equivalent to multi-way joins between base Vertex and base Edge tables on ID attributes. The benchmark focuses on read-heavy workloads by relying on global queries, which have been ignored by prior benchmarks. Global queries, also referred to as unseeded queries, are queries that are constrained only by labels on the query vertices and edges. LSQB contains a total of nine queries and leverages the LDBC social network data generator for scalability. The benchmark has gained both academic and industrial interest and is used internally by five or more vendors.

CWI's Institutional Repository

Catalogo dei prodotti della ricerca

VBN

PolyPublie
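
To make the join equivalence concrete, the sketch below expresses a directed triangle pattern as a multi-way self-join, using Python's built-in `sqlite3` module. The `Vertex`/`Edge` schema, the `knows` and `Person` labels, and the data are hypothetical; LSQB's own queries run on the LDBC social network schema. Because no vertex is bound to a parameter, this is a global (unseeded) query in the sense used above.

```python
import sqlite3

# Hypothetical minimal schema: one labelled Vertex table, one directed Edge table.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Vertex (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE Edge (src INTEGER, dst INTEGER, label TEXT);
    INSERT INTO Vertex VALUES (1, 'Person'), (2, 'Person'), (3, 'Person'), (4, 'Person');
    INSERT INTO Edge VALUES
        (1, 2, 'knows'), (2, 3, 'knows'), (3, 1, 'knows'), (3, 4, 'knows');
    """
)

# Pattern (a:Person)-[:knows]->(b:Person)-[:knows]->(c:Person)-[:knows]->(a).
# Each pattern edge becomes one Edge alias; each shared pattern vertex becomes
# an equi-join predicate on ID columns; labels become simple filters.
TRIANGLE_COUNT = """
    SELECT count(*)
    FROM Edge e1
    JOIN Edge e2 ON e2.src = e1.dst
    JOIN Edge e3 ON e3.src = e2.dst AND e3.dst = e1.src
    JOIN Vertex a ON a.id = e1.src
    JOIN Vertex b ON b.id = e2.src
    JOIN Vertex c ON c.id = e3.src
    WHERE e1.label = 'knows' AND e2.label = 'knows' AND e3.label = 'knows'
      AND a.label = 'Person' AND b.label = 'Person' AND c.label = 'Person'
"""

(count,) = conn.execute(TRIANGLE_COUNT).fetchone()
```

Each directed triangle is matched once per rotation of the pattern, so the single triangle 1 -> 2 -> 3 -> 1 yields a count of 3, while edge (3, 4) takes part in no match. The three self-joins on `Edge` are exactly the kind of multi-way join that LSQB uses as its choke point.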

LAGraph: Linear algebra, network analysis libraries, and the study of graph algorithms

Author: Bader D.A. (David)
Davis T.A. (Timothy)
Kitchen J. (James)
Mattson T.G. (Timothy)
McMillan S. (Scott)
Szárnyas G. (Gábor)
Welch E. (Erik)
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 24/06/2021

Graph algorithms can be expressed in terms of linear algebra. GraphBLAS is a library of low-level building blocks for such algorithms that targets algorithm developers. LAGraph builds on top of the GraphBLAS to target users of graph algorithms with high-level algorithms common in network analysis. In this paper, we describe the first release of the LAGraph library, the design decisions behind the library, and its performance on the GAP benchmark suite. LAGraph, however, is much more than a library. It is also a project to document and analyze the full range of algorithms enabled by the GraphBLAS. To that end, we have developed a compact and intuitive notation for describing these algorithms. In this paper, we present that notation with examples from the GAP benchmark suite.

CWI's Institutional Repository
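
The idea that graph algorithms can be expressed in linear algebra can be illustrated with breadth-first search: each BFS step is a vector-matrix product over the Boolean (OR, AND) semiring, masked so that already visited vertices are excluded. The pure-Python sketch below uses a dense 0/1 matrix for readability; it does not use the GraphBLAS or LAGraph APIs, and `bfs_levels` is a hypothetical name.

```python
from typing import List


def bfs_levels(adj: List[List[int]], source: int) -> List[int]:
    """BFS levels via repeated Boolean vector-matrix products (OR.AND semiring).

    adj is a dense 0/1 adjacency matrix (adj[i][j] = 1 for an edge i -> j).
    Returns the BFS level of every vertex, or -1 if it is unreachable.
    """
    n = len(adj)
    level = [-1] * n
    frontier = [False] * n
    frontier[source] = True
    depth = 0
    while any(frontier):
        # Assign the current depth to every vertex in the frontier.
        for v in range(n):
            if frontier[v]:
                level[v] = depth
        # next[j] = OR_i (frontier[i] AND adj[i][j]), masked by "not yet visited".
        nxt = [False] * n
        for j in range(n):
            if level[j] != -1:
                continue  # complemented mask: skip visited vertices
            nxt[j] = any(frontier[i] and adj[i][j] for i in range(n))
        frontier = nxt
        depth += 1
    return level
```

For a four-vertex matrix with edges 0 -> 1 and 1 -> 2, `bfs_levels(adj, 0)` returns `[0, 1, 2, -1]`. A GraphBLAS-based implementation would store the matrix sparsely and apply the mask inside the multiplication rather than in a Python loop.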

LAGraph: Linear algebra, network analysis libraries, and the study of graph algorithms

Author: Bader D.A. (David)
Davis T.A. (Timothy)
Kitchen J. (James)
Mattson T.G. (Timothy)
McMillan S. (Scott)
Szárnyas G. (Gábor)
Welch E. (Erik)
Publication date: 04/04/2021

arXiv.org e-Print Archive

CWI's Institutional Repository

The LDBC social network benchmark: Business intelligence workload

Author: Birler A. (Altan)
Boncz P.A. (Peter)
Steer B.A. (Benjamin)
Szakállas D. (David)
Szárnyas G. (Gábor)
Waudby J. (Jack)
Wu M. (Mingxi)
Zhang Y. (Yuchen)
Publication venue: 'VLDB Endowment'
Publication date: 20/01/2023

The Social Network Benchmark’s Business Intelligence workload (SNB BI) is a comprehensive graph OLAP benchmark targeting analytical data systems capable of supporting graph workloads. This paper marks the finalization of almost a decade of research in academia and industry via the Linked Data Benchmark Council (LDBC). SNB BI advances the state of the art in synthetic and scalable analytical database benchmarks in many aspects. Its base is a sophisticated data generator, implemented on a scalable distributed infrastructure, that produces a social graph with small-world phenomena, whose value properties follow skewed and correlated distributions and where values correlate with structure. This is a temporal graph where all nodes and edges follow lifespan-based rules with temporal skew, enabling realistic and consistent temporal inserts and (recursive) deletes. The query workload exploiting this skew and correlation is based on LDBC’s “choke point”-driven design methodology and will entice technical and scientific improvements in future (graph) database systems. SNB BI includes the first adoption of “parameter curation” in an analytical benchmark, a technique that ensures stable runtimes of query variants across different parameter values. Two performance metrics characterize peak single-query performance (power) and sustained concurrent query throughput. To demonstrate the portability of the benchmark, we present experimental results on a relational and a graph DBMS. Note that these do not constitute an official LDBC Benchmark Result; only audited results can use this trademarked term.

CWI's Institutional Repository
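
The abstract describes parameter curation only at a high level. As a heavily simplified illustration, assuming each candidate parameter value has a precomputed runtime proxy (for example, the size of an intermediate result it would produce), one can pick k values whose proxies fall in the tightest range, so that the resulting query variants do similar amounts of work. The `curate_parameters` function and its input are hypothetical; LDBC's actual curation procedure is given in the benchmark documents and may differ.

```python
from typing import Dict, List


def curate_parameters(stats: Dict[str, int], k: int) -> List[str]:
    """Pick k parameter values whose runtime proxy lies in the tightest range.

    stats maps a candidate parameter value to a hypothetical runtime proxy,
    e.g. the intermediate result size a query variant would produce.
    """
    if k <= 0 or k > len(stats):
        raise ValueError("k must be between 1 and the number of candidates")
    ranked = sorted(stats.items(), key=lambda item: item[1])
    # Slide a window of k consecutive candidates over the sorted list and keep
    # the window with the smallest spread (max - min) of the proxy statistic.
    best_start = min(
        range(len(ranked) - k + 1),
        key=lambda i: ranked[i + k - 1][1] - ranked[i][1],
    )
    return [param for param, _ in ranked[best_start:best_start + k]]
```

With `stats = {"A": 10, "B": 500, "C": 12, "D": 11, "E": 90}` and `k = 3`, the function returns `["A", "D", "C"]`, whose proxies span only 10 to 12; the outliers `E` and `B` would make runtimes of the same query vary widely.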

The Linked Data Benchmark Council (LDBC): Driving competition and collaboration in the graph data management space

Graph data management is instrumental for several use cases such as recommendation, root cause analysis, financial fraud detection, and enterprise knowledge representation. Efficiently supporting these use cases yields a number of unique requirements, including the need for a concise query language and graph-aware query optimization techniques. The goal of the Linked Data Benchmark Council (LDBC) is to design a set of standard benchmarks that capture representative categories of graph data management problems, making the performance of systems comparable and facilitating competition among vendors. LDBC also conducts research on graph schemas and graph query languages. This paper introduces the LDBC organization and its work over the last decade.

CWI's Institutional Repository

ldbc/ldbc_snb_datagen_spark

Author: Szárnyas G. (Gábor)
Publication date: 28/08/2023

The LDBC SNB Data Generator (Datagen) produces the datasets for the LDBC Social Network Benchmark's workloads. The generator is designed to produce directed labelled graphs that mimic the characteristics of real-world graphs. A detailed description of the schema produced by Datagen, as well as the format of the output files, can be found in the latest version of the official LDBC SNB specification document.

CWI's Institutional Repository

LDBC SNB Documentation

Author: Szárnyas G. (Gábor)
Publication date: 01/01/2019

CWI's Institutional Repository