TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark

Author: Boncz P.A. (Peter)
Erling O. (Orri)
Neumann T. (Thomas)
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

The TPC-D benchmark was developed almost 20 years ago, and even though its current existence as TPC-H could be considered superseded by TPC-DS, one can still learn from it. We focus on the technical level, summarizing the challenges posed by the TPC-H workload as we now understand them, which w

VU Research Portal

CWI's Institutional Repository

S3G2: a Scalable Structure-correlated Social Graph Generator

Author: Boncz P.A. (Peter)
Erling O. (Orri)
Pham M.-D. (Minh-Duc)
Publication venue: CWI
Publication date: 01/06/2012

Benchmarking graph-oriented database workloads and graph-oriented database systems is becoming increasingly relevant in analytical Big Data tasks, such as social network analysis. In graph data, structure is not mainly found inside the nodes, but especially in the way nodes happen to be connected, i.e. structural correlations. Because such structural correlations determine join fan-outs experienced by graph analysis algorithms and graph query executors, they are an essential, yet typically neglected, ingredient of synthetic graph generators. To address this, we present S3G2: a Scalable Structure-correlated Social Graph Generator. This graph generator creates a synthetic social graph, containing non-uniform value distributions and structural correlations, and is intended as a testbed for scalable graph analysis algorithms and graph database systems. We generalize the problem to decompose correlated graph generation into multiple passes that each focus on one so-called "correlation dimension", each of which can be mapped to a MapReduce task. We show that S3G2 can generate social graphs that (i) share well-known graph connectivity characteristics typically found in real social graphs, (ii) contain certain plausible structural correlations that influence the performance of graph analysis algorithms and queries, and (iii) can be quickly generated at huge sizes on common cluster hardware

CWI's Institutional Repository

SMART-KG: Hybrid Shipping for SPARQL Querying on the Web

Author: Acosta Maribel
Aluç Güneş
Aranda Carlos Buil
Bonatti Piero Andrea
Buil-Aranda Carlos
Erling Orri
Hartig Olaf
Hasnain Ali
Heling Lars
Hernández-Illera A.
Martínez-Prieto M.A.
Meimaris M.
Polleres Axel
Saleem Muhammad
Publication venue: ACM Digital Library
Publication date: 01/01/2020

While Linked Data (LD) provides standards for publishing (RDF) and (SPARQL) querying Knowledge Graphs (KGs) on the Web, serving, accessing and processing such open, decentralized KGs is often practically impossible, as query timeouts on publicly available SPARQL endpoints show. Alternative solutions such as Triple Pattern Fragments (TPF) attempt to tackle the problem of availability by pushing query processing workload to the client side, but suffer from unnecessary transfer of irrelevant data on complex queries with large intermediate results. In this paper we present smart-KG, a novel approach to share the load between servers and clients, while significantly reducing data transfer volume, by combining TPF with shipping compressed KG partitions. Our evaluations show that smart-KG outperforms state-of-the-art client-side solutions and increases server-side availability towards more cost-effective and balanced hosting of open and decentralized KGs

Crossref

KITopen

RDFSync: efficient remote synchronization of RDF models

Author: Christian Morbidoni
Giovanni Tummarello
Orri Erling
Reto Bachmann-Gmür
Publication date: 01/01/2007

In this paper we describe RDFSync, a methodology for efficient synchronization and merging of RDF models. RDFSync is based on decomposing a model into Minimum Self-Contained graphs (MSGs). After illustrating the theory and deriving properties of MSGs, we show how an RDF model can be represented by a list of hashes of such information fragments. The synchronization procedure described here is based on the evaluation and remote comparison of these ordered lists. Experimental results show that the algorithm provides very significant savings on network traffic compared to the file-oriented synchronization of serialized RDF graphs. Finally, we provide the design and report the implementation of a protocol for executing the RDFSync algorithm over HTTP

CiteSeerX

IRIS Università Politecnica delle Marche

The meaningful use of big data: Four perspectives - four challenges

Author: Bizer Christian
Boncz Peter
Brodie Michael L.
Erling Orri
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2011

Twenty-five Semantic Web and Database researchers met at the 2011 STI Semantic Summit in Riga, Latvia, on July 6-8, 2011, to discuss the opportunities and challenges posed by Big Data for the Semantic Web, Semantic Technologies, and Database communities. The unanimous conclusion was that the greatest shared challenge was not only engineering Big Data, but also doing so meaningfully. The following are four expressions of that challenge from different perspectives

VU Research Portal

CWI's Institutional Repository

MAnnheim DOCument Server

Deriving an Emergent Relational Schema from RDF Data

Author: Boncz Peter
Erling Orri
Linnea P.
Pham Minh-Duc
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2015

We motivate and describe techniques for detecting an "emergent" relational schema from RDF data. We show that on a wide variety of datasets, the found structure explains well over 90% of the RDF triples. Further, we describe technical solutions to the semantic challenge of giving these emergent tables, columns and relationships between tables short names that humans find logical. Our techniques can be exploited in many ways, e.g., to improve the efficiency of SPARQL systems, or to use existing SQL-based applications on top of any RDF dataset using an RDBMS

CWI's Institutional Repository