
    On Distributed SPARQL Query Processing Using Triangles of RDF Triples

    Knowledge Graphs provide valuable functionality, such as data integration and reasoning, to a growing number of applications across all kinds of companies. These applications depend in part on the efficiency of a Knowledge Graph management system, which is often based on the RDF data model and queried with SPARQL. In this context, query performance is paramount and relies on an optimizer that usually makes intensive use of a large set of indexes. Generally, these indexes correspond to different orderings of the subject, predicate, and object of a triple pattern. In this work, we present a novel approach that considers indexes formed by a frequently encountered basic graph pattern: the triangle of triples. We propose dedicated data structures to store these triangles, provide distributed algorithms to discover and materialize them, including inferred triangles, and detail query optimization techniques, including a data partitioning approach for skewed data. We provide an implementation that runs on top of Apache Spark and experiment on two real-world RDF data sets. This evaluation highlights the performance boost (up to 40x on query processing) that can be obtained by using our approach on queries containing triangles of triples.
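    To make the core idea concrete, here is a minimal sketch of how triangles of triples can be discovered with Spark DataFrame self-joins. This is not the paper's implementation: the object name, sample triples, and column names are invented for the example, and real code would load triples from storage rather than an in-memory Seq. Per the abstract, the paper materializes discovered triangles into dedicated index structures rather than recomputing them per query, but the underlying join pattern is the same.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical standalone example; all names are illustrative.
object TriangleDiscovery {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdf-triangle-discovery")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Toy input: one (subject, predicate, object) triple per row.
    val triples = Seq(
      ("a", "p1", "b"), ("b", "p2", "c"), ("c", "p3", "a"),
      ("a", "p4", "d")  // dangling edge, not part of any triangle
    ).toDF("s", "p", "o")

    // Rename columns so three self-joins chain subject -> object edges
    // into a closed 3-cycle: (x)-[pa]->(y), (y)-[pb]->(z), (z)-[pc]->(x).
    val e1 = triples.toDF("x", "pa", "y")
    val e2 = triples.toDF("y", "pb", "z")
    val e3 = triples.toDF("z", "pc", "x")

    val triangles = e1
      .join(e2, "y")            // share the middle node y
      .join(e3, Seq("z", "x"))  // close the cycle back to x
      .select("x", "pa", "y", "pb", "z", "pc")

    triangles.show()  // one row per discovered triangle of triples
    spark.stop()
  }
}
```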

    SPARQL Graph Pattern Processing with Apache Spark

    A common way to achieve scalability when processing SPARQL queries over large RDF data sets is to rely on map-reduce frameworks such as Hadoop or Spark. Processing complex SPARQL queries that generate large join plans over distributed data partitions is a major challenge in these shared-nothing architectures. In this article we are particularly interested in two representative distributed join algorithms, the partitioned join and the broadcast join, which are deployed in map-reduce frameworks to evaluate complex distributed graph pattern join plans. We compare five SPARQL graph pattern evaluation implementations on top of Apache Spark to illustrate the importance of carefully choosing the physical data storage layer and of being able to use both join algorithms to take advantage of existing predefined data partitionings. Our experiments with different SPARQL benchmarks over real-world and synthetic workloads show that hybrid join plans offer more flexibility and can often achieve better performance than join plans that use a single kind of join implementation.
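    Both join algorithms discussed in the article are exposed through Spark's DataFrame API, so their trade-off is easy to reproduce. The sketch below is a generic illustration, not one of the five compared implementations; the DataFrames stand in for hypothetical intermediate results of two triple patterns sharing the join variable ?s, and all names are made up for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

// Hypothetical standalone example contrasting the two join strategies.
object JoinStrategies {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sparql-join-strategies")
      .master("local[*]")
      // Disable automatic broadcasting so the first join actually
      // illustrates the partitioned (shuffle) strategy.
      .config("spark.sql.autoBroadcastJoinThreshold", "-1")
      .getOrCreate()
    import spark.implicits._

    // Stand-ins for the bindings of two triple patterns,
    // e.g. (?s :knows ?o) and (?s :name ?n), keyed on ?s.
    val large = Seq(("a", "b"), ("b", "c"), ("c", "a")).toDF("s", "o")
    val small = Seq(("b", "Alice")).toDF("s", "n")

    // Partitioned join: both inputs are shuffled on the join key,
    // the appropriate choice when both sides are large.
    val partitioned = large.join(small, "s")
    partitioned.explain()  // planned as a SortMergeJoin here

    // Broadcast join: the small side is replicated to every executor,
    // so the large side is never shuffled.
    val broadcasted = large.join(broadcast(small), "s")
    broadcasted.explain()  // planned as a BroadcastHashJoin

    spark.stop()
  }
}
```

    Broadcasting avoids shuffling the large side but only pays off when the broadcast side fits in executor memory, which is one reason hybrid plans that mix both strategies can outperform plans committed to a single join implementation, as the article's evaluation emphasizes.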
