Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources

Author: Begoli Edmon
Hyde Julian
Lemire Daniel
Mior Michael J.
Rodríguez Jesús Camacho
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/02/2018

Apache Calcite is a foundational software framework that provides query processing, optimization, and query language support to many popular open-source data processing systems such as Apache Hive, Apache Storm, Apache Flink, Druid, and MapD. Calcite's architecture consists of a modular and extensible query optimizer with hundreds of built-in optimization rules, a query processor capable of processing a variety of query languages, an adapter architecture designed for extensibility, and support for heterogeneous data models and stores (relational, semi-structured, streaming, and geospatial). This flexible, embeddable, and extensible architecture is what makes Calcite an attractive choice for adoption in big-data frameworks. It is an active project that continues to introduce support for new types of data sources, query languages, and approaches to query processing and optimization. Comment: SIGMOD'1

arXiv.org e-Print Archive

R-libre
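The abstract describes Calcite's optimizer as a modular, extensible collection of optimization rules. As a rough illustration of what a rule-based plan rewrite looks like, here is a toy Python sketch; it is not Calcite's Java API, and the plan nodes `Scan`, `Filter`, `Project`, the rules `merge_filters` and `push_filter_past_project`, and the `optimize` driver are all hypothetical. The driver rewrites children first and then fires rules until none applies.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple, Union


@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Filter:
    predicates: Tuple[str, ...]  # conjunction of predicates over input column names
    child: "Node"


@dataclass(frozen=True)
class Project:
    columns: Tuple[str, ...]  # plain column selection: no renaming, no expressions
    child: "Node"


Node = Union[Scan, Filter, Project]
Rule = Callable[[Node], Optional[Node]]  # returns a rewritten node, or None if no match


def merge_filters(node: Node) -> Optional[Node]:
    """Filter(p1, Filter(p2, x)) -> Filter(p1 AND p2, x)."""
    if isinstance(node, Filter) and isinstance(node.child, Filter):
        return Filter(node.predicates + node.child.predicates, node.child.child)
    return None


def push_filter_past_project(node: Node) -> Optional[Node]:
    """Filter(p, Project(cols, x)) -> Project(cols, Filter(p, x)).
    Safe here only because this toy Project never renames or computes columns."""
    if isinstance(node, Filter) and isinstance(node.child, Project):
        proj = node.child
        return Project(proj.columns, Filter(node.predicates, proj.child))
    return None


def optimize(node: Node, rules: List[Rule]) -> Node:
    """Optimize children first, then fire the first matching rule and re-optimize."""
    if isinstance(node, Filter):
        node = Filter(node.predicates, optimize(node.child, rules))
    elif isinstance(node, Project):
        node = Project(node.columns, optimize(node.child, rules))
    for rule in rules:
        rewritten = rule(node)
        if rewritten is not None:
            return optimize(rewritten, rules)
    return node  # fixpoint: no rule applies


plan = Filter(("a > 1",), Project(("a", "b"), Filter(("b < 5",), Scan("t"))))
best = optimize(plan, [merge_filters, push_filter_past_project])
print(best)
# Project(columns=('a', 'b'), child=Filter(predicates=('a > 1', 'b < 5'), child=Scan(table='t')))
```

Each rule is a small, independent pattern-to-pattern rewrite, so new rules can be added without touching the driver; that separation is the kind of extensibility the abstract attributes to Calcite's optimizer, although Calcite's own planners are far more elaborate.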

Make the most out of your SIMD investments: Counter control flow divergence in compiled query pipelines

Author: Boncz P.A. (Peter)
Kemper A. (Alfons)
Kipf A. (Andreas)
Lang H. (Harald)
Neumann T. (Thomas)
Passing L.K. (Linnea)
Publication venue: Association for Computing Machinery (ACM)
Publication date: 11/06/2018

Increasing single instruction multiple data (SIMD) capabilities in modern hardware allow for compiling efficient data-parallel query pipelines. As a result, GPU-like challenges arise: control flow divergence causes underutilization of vector-processing units. In this paper, we present efficient algorithms for the AVX-512 architecture to address this issue. These algorithms allow for fine-grained assignment of new tuples to idle SIMD lanes. Furthermore, we present strategies for their integration with compiled query pipelines without introducing inefficient memory materializations. We evaluate our approach with a high-performance geospatial join query, which shows performance improvements of up to 35%.

Crossref

CWI's Institutional Repository

Scipedia
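The problem the abstract targets is that lanes whose tuples finish early sit idle until the slowest lane in the vector is done. The following scalar Python simulation is not AVX-512 code and not the paper's algorithms; it assumes 8 lanes and a known per-tuple iteration count, and compares lane utilization when tuples are processed in fixed groups against refilling each idle lane with the next pending tuple. The names `LANES`, `utilization_without_refill`, and `utilization_with_refill` are hypothetical.

```python
from collections import deque

LANES = 8  # assumption: e.g. a 512-bit register holding eight 64-bit values


def utilization_without_refill(work):
    """Fixed groups of LANES tuples; a group ends only when its slowest lane ends."""
    busy = total = 0
    for start in range(0, len(work), LANES):
        group = work[start:start + LANES]
        busy += sum(group)               # lane-steps doing useful work
        total += max(group) * LANES      # lane-steps spent, idle lanes included
    return busy / total


def utilization_with_refill(work):
    """Before every vector step, assign the next pending tuple to each idle lane."""
    pending = deque(work)  # each entry: iterations the tuple needs (assumed >= 1)
    lanes = [0] * LANES    # remaining iterations per lane; 0 means idle
    busy = total = 0
    while pending or any(lanes):
        for i in range(LANES):
            if lanes[i] == 0 and pending:
                lanes[i] = pending.popleft()
        busy += sum(1 for r in lanes if r > 0)
        total += LANES
        lanes = [r - 1 if r > 0 else 0 for r in lanes]  # one vector step
    return busy / total


work = [8, 1, 1, 1, 1, 1, 1, 1] * 4   # one long tuple per group of eight
print(utilization_without_refill(work))  # 60/256 = 0.234375
print(utilization_with_refill(work))     # 60/88, about 0.682
```

With this skewed workload both strategies perform the same 60 units of work, but refilling finishes in 11 vector steps (88 lane-steps) instead of 32 (256 lane-steps). A real implementation would perform the refill with vector permute or expand instructions rather than a Python loop.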

Distribution Policies for Datalog

Author: Albarghouthi Aws
Ketsman Bas
Koutris Paraschos
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 21st International Conference on Database Theory (ICDT 2018)
Publication date: 01/01/2018

Modern data management systems extensively use parallelism to speed up query processing over massive volumes of data. This trend has inspired a rich line of research on how to formally reason about the parallel complexity of join computation. In this paper, we go beyond joins and study the parallel evaluation of recursive queries. We introduce a novel framework to reason about multi-round evaluation of Datalog programs, which combines implicit predicate restriction with distribution policies to allow expressing a combination of data-parallel and query-parallel evaluation strategies. Using our framework, we reason about key properties of distributed Datalog evaluation, including parallel-correctness of the evaluation strategy, disjointness of the computation effort, and bounds on the number of communication rounds.

Dagstuhl Research Online Publication Server
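To make the idea of a distribution policy concrete, here is a small simulation of multi-round, data-parallel evaluation of the linear transitive-closure program `path(x,z) :- edge(x,z).` and `path(x,z) :- path(x,y), edge(y,z).` The assumed policy hash-partitions every fact on the join variable `y`, so matching `path` and `edge` facts always meet on the same node, which is what makes this particular strategy parallel-correct. This is an illustrative sketch, not the formal framework from the paper; `NODES`, `policy`, and `distributed_transitive_closure` are hypothetical names.

```python
from collections import defaultdict

NODES = 3  # hypothetical number of worker nodes


def policy(vertex: int) -> int:
    """Hypothetical distribution policy: hash-partition a fact on its join key."""
    return hash(vertex) % NODES


def distributed_transitive_closure(edges):
    # edge(y, z) facts are partitioned once, on their first argument y
    local_edges = [defaultdict(set) for _ in range(NODES)]
    for y, z in edges:
        local_edges[policy(y)][y].add(z)

    path = set(edges)   # path(x, z) :- edge(x, z).
    delta = set(edges)  # facts that are new since the previous round
    rounds = 0
    while delta:
        rounds += 1
        # communication step: ship each new path(x, y) fact to the node owning y
        inbox = [set() for _ in range(NODES)]
        for x, y in delta:
            inbox[policy(y)].add((x, y))
        # local step: each node joins its path facts with its own edge facts
        derived = set()
        for node in range(NODES):
            for x, y in inbox[node]:
                for z in local_edges[node].get(y, ()):
                    derived.add((x, z))
        delta = derived - path  # keep only genuinely new facts (semi-naive)
        path |= delta
    return path, rounds


closure, rounds = distributed_transitive_closure({(1, 2), (2, 3), (3, 4)})
print(sorted(closure))  # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print(rounds)           # 3
```

On the chain 1→2→3→4 the sketch derives all six `path` facts; the third round derives nothing new, which is how termination is detected. Partitioning on a different variable (for example on `x` for both atoms) would break parallel-correctness here, because a `path` fact and the `edge` fact it must join with could land on different nodes.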

Efficient Compute Node-Local Replication Mechanisms for NVRAM-Centric Data Structures

Author: Habich Dirk
Kissinger Thomas
Lehner Wolfgang
Zarubin Mikhail
Publication venue: Association for Computing Machinery (ACM)
Publication date: 11/07/2022

Non-volatile random-access memory (NVRAM) is about to hit the market and will require significant changes to the architecture of in-memory database systems. Since such hybrid DRAM-NVRAM database systems will keep the primary data persistent only in NVRAM, efficient replication mechanisms are needed to prevent data loss and to guarantee high availability in case of NVDIMM failures. In this paper, we argue for a software-based replication approach and present compute node-local mechanisms that provide the building blocks for efficient NVRAM replication with low latency and throughput penalties. In our evaluation, we measured up to 10x less overhead for our optimized replication mechanisms compared to the basic replication mechanism of the Intel Persistent Memory Development Kit (PMDK).

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa
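The basic idea of node-local software replication can be illustrated with a deliberately simplified model: each write is applied to a primary region and to a replica region on a different NVDIMM of the same compute node before it is acknowledged, and reads fail over to the replica if the primary's NVDIMM is lost. The classes `NvdimmRegion` and `MirroredRegion` are hypothetical, and plain `bytearray`s stand in for persistent memory, so the sketch omits the cache-line flushes and fences a real NVRAM implementation needs and does not reproduce the paper's optimized mechanisms.

```python
class NvdimmRegion:
    """Hypothetical stand-in for a persistent-memory region on one NVDIMM."""

    def __init__(self, size: int):
        self.data = bytearray(size)  # volatile placeholder for NVRAM
        self.failed = False

    def _check(self, offset: int, length: int) -> None:
        if self.failed:
            raise OSError("NVDIMM failed")
        if offset < 0 or offset + length > len(self.data):
            raise ValueError("access out of bounds")

    def write(self, offset: int, payload: bytes) -> None:
        self._check(offset, len(payload))
        self.data[offset:offset + len(payload)] = payload

    def read(self, offset: int, length: int) -> bytes:
        self._check(offset, length)
        return bytes(self.data[offset:offset + length])


class MirroredRegion:
    """Hypothetical node-local replication across two NVDIMMs of one node."""

    def __init__(self, size: int):
        self.primary = NvdimmRegion(size)
        self.replica = NvdimmRegion(size)

    def write(self, offset: int, payload: bytes) -> None:
        # the write is acknowledged only after both copies are updated
        self.primary.write(offset, payload)
        self.replica.write(offset, payload)

    def read(self, offset: int, length: int) -> bytes:
        try:
            return self.primary.read(offset, length)
        except OSError:
            # primary NVDIMM lost: serve the data from the replica
            return self.replica.read(offset, length)


region = MirroredRegion(64)
region.write(0, b"tuple-42")
region.primary.failed = True  # simulate an NVDIMM failure
print(region.read(0, 8))      # b'tuple-42'
```

Every write in this naive scheme pays for a second full copy on the critical path; that per-write cost is the kind of overhead the paper's optimized mechanisms aim to reduce relative to PMDK's basic replication.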