60 research outputs found

    Exploring query execution strategies for JIT vectorization and SIMD

    Get PDF
    This paper partially explores the design space for efficient query processors on future hardware that is rich in SIMD capabilities. It departs from two well-known approaches: (1) interpreted block-at-a-time execution (a.k.a. "vectorization") and (2) "data-centric" JIT compilation, as in the HyPer system. We argue that in between these two design points in terms of granularity of execution and uni

    Make the most out of your SIMD investments: counter control flow divergence in compiled query pipelines

    Get PDF
    Increasing single instruction multiple data (SIMD) capabilities in modern hardware allows for the compilation of data-parallel query pipelines. This means GPU-alike challenges arise: control flow divergence causes the underutilization of vector-processing units. In this paper, we present efficient algorithms for the AVX-512 architecture to address this issue. These algorithms allow for the fine-grained assignment of new tuples to idle SIMD lanes. Furthermore, we present strategies for their integration with compiled query pipelines so that tuples are never evicted from registers. We evaluate our approach with three query types: (i) a table scan query based on TPC-H Query 1, that performs up to 34% faster when addressing underutilization, (ii) a hashjoin query, where we observe up to 25% higher performance, and (iii) an approximate geospatial join query, which shows performance improvements of up to 30%

    Make the most out of your SIMD investments: Counter control flow divergence in compiled query pipelines

    Get PDF
    Increasing single instruction multiple data (SIMD) capabilities in modern hardware allows for compiling efficient data-parallel query pipelines. This means GPU-alike challenges arise: control flow divergence causes underutilization of vector-processing units. In this paper, we present efficient algorithms for the AVX-512 architecture to address this issue. These algorithms allow for fine-grained assignment of new tuples to idle SIMD lanes. Furthermore, we present strategies for their integration with compiled query pipelines without introducing inefficient memory materializations. We evaluate our approach with a high-performance geospatial join query, which shows performance improvements of up to 35%

    Charting the design space of query execution using VOILA

    Get PDF
    Database architecture, while having been studied for four decades now, has delivered only a few designs with well-understood properties. These few are followed by most actual systems. Acquiring more knowledge about the design space is a very time-consuming processes that requires manually crafting prototypes with a low chance of generating material insight.We propose a framework that aims to accelerat

    Micro Adaptivity in Vectorwise

    Get PDF
    Performance of query processing functions in a DBMS can be affected by many factors, including the hardware platform, data distributions, predicate parameters, compilation method, algorithmic variations and the interactions between these. Given that there are often different function implementations possible, there is a latent performance diversity which represents both a threat to performance robustness if ignored (as is usual now) and an opportunity to increase the performance if one would be able to use the best performing implementation in each situation. Micro Adaptivity, proposed here, is a framework that keeps many alternative function implementations ("flavors") in a system. It uses a learning algorithm to choose the most promising flavor potentially at each function call, guided by the actual costs observed so far. We argue that Micro Adaptivity both increases performance robustness, and saves development time spent in finding and tuning heuristics and cost model thresholds in query optimization. In this paper, we (i) characterize a number of factors that cause performance diversity between primitive flavors, (ii) describe an e-greedy learning algorithm that casts the flavor selection into a multi-armed bandit problem, and (iii) describe the software framework for Micro Adaptivity that we implemented in the Vectorwise system. We provide micro-benchmarks, and an overall evaluation on TPC-H, showing consistent improvements

    Adaptive geospatial joins for modern hardware

    Get PDF
    Geospatial joins are a core building block of connected mobility applications. An especially challenging problem are joins between streaming points and static polygons. Since points are not known beforehand, they cannot be indexed. Nevertheless, points need to be mapped to polygons with low latencies to enable real-time feedback. We present an adaptive geospatial join that uses true hit filtering to avoid expensive geometric computations in most cases. Our technique uses a quadtree-based hierarchical grid to approximate polygons and stores these approximations in a specialized radix tree. We emphasize on an approximate version of our algorithm that guarantees a user-defined precision. The exact version of our algorithm can adapt to the expected point distribution by refining the index. We optimized our implementation for modern hardware architectures with wide SIMD vector processing units, including Intel’s brand new Knights Landing. Overall, our approach can perform up to two orders of magnitude faster than existing techniques

    Adaptive Geospatial Joins for Modern Hardware

    Get PDF
    Geospatial joins are a core building block of connected mobility applications. An especially challenging problem are joins between streaming points and static polygons. Since points are not known beforehand, they cannot be indexed. Nevertheless, points need to be mapped to polygons with low latencies to enable real-time feedback. We present an adaptive geospatial join that uses true hit filtering to avoid expensive geometric computations in most cases. Our technique uses a quadtree-based hierarchical grid to approximate polygons and stores these approximations in a specialized radix tree. We emphasize on an approximate version of our algorithm that guarantees a user-defined precision. The exact version of our algorithm can adapt to the expected point distribution by refining the index. We optimized our implementation for modern hardware architectures with wide SIMD vector processing units, including Intel's brand new Knights Landing. Overall, our approach can perform up to two orders of magnitude faster than existing techniques

    Pakkausmenetelmät hajautetussa aikasarjatietokannassa

    Get PDF
    Rise of microservices and distributed applications in containerized deployments are putting increasing amount of burden to the monitoring systems. They push the storage requirements to provide suitable performance for large queries. In this paper we present the changes we made to our distributed time series database, Hawkular-Metrics, and how it stores data more effectively in the Cassandra. We show that using our methods provides significant space savings ranging from 50 to 90% reduction in storage usage, while reducing the query speeds by over 90\% compared to the nominal approach when using Cassandra. We also provide our unique algorithm modified from Gorilla compression algorithm that we use in our solution, which provides almost three times the throughput in compression with equal compression ratio.Hajautettujen järjestelmien yleistyminen on aiheuttanut valvontajärjestelmissä tiedon määrän kasvua, sillä aikasarjojen määrä on kasvanut ja niihin talletetaan useammin tietoa. Tämä on aiheuttanut kasvavaa kuormitusta levyjärjestelmille, joilla on ongelmia palvella kasvavia kyselyitä Tässä paperissa esittelemme muutoksia hajautettuun aikasarjatietokantaamme, Hawkular-Metricsiin, käyttäen hyödyksi tehokkaampaa tiedon pakkausta ja järjestelyä kun tietoa talletetaan Cassandraan. Nopeutimme kyselyjä lähes kymmenkertaisesti ja samalla pienensimme levytilavaatimuksia aineistosta riippuen 50-95%. Esittelemme myös muutoksemme Gorilla pakkausalgoritmiin, jota hyödynnämme tulosten saavuttamiseksi. Muutoksemme nopeuttavat pakkaamista melkein kolminkertaiseksi alkuperäiseen algoritmiin nähden ilman pakkaustehon laskua
    corecore