
6 research outputs found

[Demo] Low-latency Spark queries on updatable data

Author: Boncz P.A. (Peter)
Dave A. (Ankur)
Ghit B. (Bogdan)
Uta A. (Alexandru)
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 25/06/2019

As data science is increasingly deployed in operational applications, it becomes important for data science frameworks to be able to perform computations in interactive, sub-second time. Indexing and caching are two key techniques that can make interactive query processing on large datasets possible. In this demo, we show the design, implementation and performance of a new indexing abstraction in Apache Spark, called the Indexed DataFrame. This is a cached DataFrame that incorporates an index to support fast lookup and join operations, and supports updates with multi-version concurrency. We demonstrate the Indexed DataFrame on a social network dataset using microbenchmarks and real-world graph processing queries, on datasets that are continuously growing.
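The combination of an index with versioned updates described in this abstract can be illustrated with a minimal in-memory sketch. This is not the authors' implementation and does not use any Spark API; the class `IndexedTable` and its methods `append` and `lookup` are hypothetical names chosen for illustration. It assumes an append-only row store in which a version is simply the row count after an append, so a reader holding an older version never sees later rows.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple

Row = Tuple[Any, ...]


class IndexedTable:
    """Toy table with a hash index on one column and append-only versions."""

    def __init__(self, key_col: int) -> None:
        self.key_col = key_col
        self.rows: List[Row] = []  # append-only row store
        # key value -> positions of matching rows in self.rows
        self.index: Dict[Any, List[int]] = defaultdict(list)

    def append(self, new_rows: List[Row]) -> int:
        """Append rows and return the new version (row count after the append)."""
        for row in new_rows:
            self.index[row[self.key_col]].append(len(self.rows))
            self.rows.append(row)
        return len(self.rows)

    def lookup(self, key: Any, version: Optional[int] = None) -> List[Row]:
        """Return rows with the given key that were visible at `version`."""
        limit = len(self.rows) if version is None else version
        # Positions at or beyond `limit` belong to later versions and are hidden.
        return [self.rows[i] for i in self.index.get(key, []) if i < limit]
```

A lookup touches only the rows stored under the requested key rather than scanning the whole table, and passing an earlier version number returns the snapshot that existed before later appends.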

VU Research Portal

Crossref

CWI's Institutional Repository

Welcome to Sigmod 2019 - The 2019 ACM SIGMOD International Conference on the Management of Data!

Author: Ailamaki A. (Anastasia)
Boncz P.A. (Peter)
Manegold S. (Stefan)
Publication date: 30/06/2019

CWI's Institutional Repository

Proceedings of the 2019 International Conference on Management of Data

Publication date: 30/06/2019

CWI's Institutional Repository

Multi-objective query optimization in Spark SQL

Author: D’orazio Laurent
Georgoulakis Michail
Kantere Verena
Publication venue: HAL CCSD
Publication date: 22/08/2022

International audience

INRIA a CCSD electronic archive server

On Distributed SPARQL Query Processing Using Triangles of RDF Triples

Author: Hubert Naacke
Olivier Curé
Publication venue: RonPub
Publication date: 01/01/2020

Knowledge Graphs provide valuable functionalities, such as data integration and reasoning, to an increasing number of applications in all kinds of companies. These applications partly depend on the efficiency of a Knowledge Graph management system, which is often based on the RDF data model and queried with SPARQL. In this context, query performance is paramount and relies on an optimizer that usually makes intensive use of a large set of indexes. Generally, these indexes correspond to different re-orderings of the subject, predicate and object of a triple pattern. In this work, we present a novel approach that considers indexes formed by a frequently encountered basic graph pattern: the triangle of triples. We propose dedicated data structures to store these triangles, provide distributed algorithms to discover and materialize them, including inferred triangles, and detail query optimization techniques, including a data partitioning approach for biased data. We provide an implementation that runs on top of Apache Spark and experiment on two real-world RDF data sets. This evaluation emphasizes the performance boost (up to 40x on query processing) that one can obtain by using our approach when facing triangles of triples.
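To make the idea of a "triangle of triples" concrete, the following is a minimal single-machine sketch that discovers triangles in a small list of RDF-style triples. It is an illustration under stated assumptions, not the paper's distributed algorithm or data structure: it assumes a triangle is a directed cycle of three triples over three distinct nodes, and the function name `find_triangles` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
Triangle = Tuple[Triple, Triple, Triple]


def find_triangles(triples: List[Triple]) -> Set[Triangle]:
    """Return directed 3-cycles a -> b -> c -> a, each in a canonical rotation."""
    # Group triples by subject so outgoing edges can be followed quickly.
    out_edges: Dict[str, List[Triple]] = defaultdict(list)
    for t in triples:
        out_edges[t[0]].append(t)

    triangles: Set[Triangle] = set()
    for t1 in triples:
        a, _, b = t1
        for t2 in out_edges.get(b, []):
            c = t2[2]
            for t3 in out_edges.get(c, []):
                # Close the cycle back to a and require three distinct nodes.
                if t3[2] == a and len({a, b, c}) == 3:
                    # Each cycle is found from three starting triples; keep
                    # the smallest rotation so it is stored only once.
                    triangles.add(min((t1, t2, t3), (t2, t3, t1), (t3, t1, t2)))
    return triangles
```

Materializing the returned set once would let a query that matches this pattern read the stored triangles directly instead of joining three triple patterns at query time, which is the intuition behind indexing on the pattern.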

RonPub -- Research Online Publishing

Improving query performance on dynamic graphs

Author: Barquero Gala
Troya Javier
Vallecillo-Moreno Antonio Jesus
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 20/11/2020

Querying large models efficiently often imposes high demands on system resources such as memory, processing time, disk access or network latency. The situation becomes more complicated when data are highly interconnected, e.g. in the form of graph structures, and when data sources are heterogeneous, partly coming from dynamic systems and partly stored in databases. These situations are now common in many existing social networking applications and geo-location systems, which require specialized and efficient query algorithms in order to make informed decisions on time. In this paper, we propose an algorithm to improve the memory consumption and time performance of this type of query by reducing the number of elements to be processed, focusing only on the information that is relevant to the query but without compromising the accuracy of its results. To this end, the reduced subset of data is selected depending on the type of query and its constituent filters. Three case studies are used to evaluate the performance of our proposal, obtaining significant speedups in all cases. This work is partially supported by the European Commission (FEDER) and the Spanish Government under projects APOLO (US-1264651), HORATIO (RTI2018-101204-B-C21), EKIPMENT-PLUS (P18-FR-2895) and COSCA (PGC2018-094905B-I00).

Repositorio Institucional Universidad de Málaga