23 research outputs found
Make the most out of your SIMD investments: Counter control flow divergence in compiled query pipelines
Increasing single instruction multiple data (SIMD) capabilities in modern hardware allows for compiling efficient data-parallel query pipelines. This means GPU-alike challenges arise: control flow divergence causes underutilization of vector-processing units. In this paper, we present efficient algorithms for the AVX-512 architecture to address this issue. These algorithms allow for fine-grained assignment of new tuples to idle SIMD lanes. Furthermore, we present strategies for their integration with compiled query pipelines without introducing inefficient memory materializations. We evaluate our approach with a high-performance geospatial join query, which shows performance improvements of up to 35%
Multidimensional Range Queries on Modern Hardware
Range queries over multidimensional data are an important part of database
workloads in many applications. Their execution may be accelerated by using
multidimensional index structures (MDIS), such as kd-trees or R-trees. As for
most index structures, the usefulness of this approach depends on the
selectivity of the queries, and common wisdom told that a simple scan beats
MDIS for queries accessing more than 15%-20% of a dataset. However, this wisdom
is largely based on evaluations that are almost two decades old, performed on
data being held on disks, applying IO-optimized data structures, and using
single-core systems. The question is whether this rule of thumb still holds
when multidimensional range queries (MDRQ) are performed on modern
architectures with large main memories holding all data, multi-core CPUs and
data-parallel instruction sets. In this paper, we study the question whether
and how much modern hardware influences the performance ratio between index
structures and scans for MDRQ. To this end, we conservatively adapted three
popular MDIS, namely the R*-tree, the kd-tree, and the VA-file, to exploit
features of modern servers and compared their performance to different flavors
of parallel scans using multiple (synthetic and real-world) analytical
workloads over multiple (synthetic and real-world) datasets of varying size,
dimensionality, and skew. We find that all approaches benefit considerably from
using main memory and parallelization, yet to varying degrees. Our evaluation
indicates that, on current machines, scanning should be favored over parallel
versions of classical MDIS even for very selective queries
Vivid cuckoo hash : fast cuckoo table building in SIMD
Orientador: Eduardo Cunha de AlmeidaCoorientador: Marco Antonio Zanata AlvezDissertação (mestrado) - Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defesa : Curitiba, 09/07/2019Inclui referências: p. 39-40Área de concentração: Ciência da ComputaçãoResumo: Tabelas Hash possuem um lugar de destaque em Bancos de Dados modernos, encontrando aplicações na execução de junções, agrupamentos, indexação, remoção de duplicidades e acelerando consultas pontuais. Essa dissertação tem como foco estudar o efeito do paralelismo em Tabelas Cuckoo. Cuckoo Hashing (Pagh and Rodler (2004)) é uma técnica que lida com colisões garantindo que o dado seja recuperado em, no máximo, dois acessos à memória no pior caso. No entanto, a construção de tabelas Cuckoo com os métodos sequenciais atualmente utilizados é ineficiente ao lidar com o expurgo de chaves que colidem na estrutura de dados. Nós propomos um método vetorizado verticalmente e com técnica de dependência de dados para construir tabelas Cuckoo - ViViD Cuckoo Hash. Nosso método explora paralelismo de dados com instruções SIMD AVX512 e transforma dependências de controle em dependências de dados para reduzir o tempo de resposta médio para o processo de construção em cerca de 90% comparado ao método de construção sequencial. Palavras-chave: Cuckoo Hash. SIMD. Hash Join.Abstract: Hash Tables play a lead role in modern databases systems, finding applications in the execution of joins, grouping, indexing, removal of duplicates, and accelerating point queries. In this dissertation, we focus on Cuckoo Hash(Pagh and Rodler (2004)), a technique to deal with collisions guaranteeing that data is retrieved with at most two memory access in the worst case. However, building the Cuckoo Table with the current scalar methods is inefficient to treat the eviction of the colliding keys within the data structure. We propose a vertically vectorized data-dependent method to build Cuckoo Tables - ViViD Cuckoo Hash. Our method explores data parallelism with AVX-512 SIMD instructions and transforms control dependencies into data dependencies to make the build process faster with an overall reduction in response time by 90% compared to the scalar Cuckoo Hash. Keywords: Cuckoo Hash. SIMD. Hash Join
Exploring query execution strategies for JIT vectorization and SIMD
This paper partially explores the design space for efficient query processors on future hardware that is rich in SIMD capabilities. It departs from two well-known approaches: (1) interpreted block-at-a-time execution (a.k.a. "vectorization")
and (2) "data-centric" JIT compilation, as in the HyPer system. We argue that in between these two design points in terms of granularity of execution and uni
Exploiting Data Skew for Improved Query Performance
Analytic queries enable sophisticated large-scale data analysis within many
commercial, scientific and medical domains today. Data skew is a ubiquitous
feature of these real-world domains. In a retail database, some products are
typically much more popular than others. In a text database, word frequencies
follow a Zipf distribution with a small number of very common words, and a long
tail of infrequent words. In a geographic database, some regions have much
higher populations (and data measurements) than others. Current systems do not
make the most of caches for exploiting skew. In particular, a whole cache line
may remain cache resident even though only a small part of the cache line
corresponds to a popular data item. In this paper, we propose a novel index
structure for repositioning data items to concentrate popular items into the
same cache lines. The net result is better spatial locality, and better
utilization of limited cache resources. We develop a theoretical model for
analyzing the cache behavior, and implement database operators that are
efficient in the presence of skew. Our experiments on real and synthetic data
show that exploiting skew can significantly improve in-memory query
performance. In some cases, our techniques can speed up queries by over an
order of magnitude
Multiple Pattern Matching for Network Security Applications: Acceleration through Vectorization
Pattern matching is a key building block of Intrusion Detection Systems and firewalls, which are deployed nowadays on commodity systems from laptops to massive web servers in the cloud. In fact, pattern matching is one of their most computationally intensive parts and a bottleneck to their performance. In Network Intrusion Detection, for example, pattern matching algorithms handle thousands of patterns and contribute to more than 70% of the total running time of the system.In this paper, we introduce efficient algorithmic designs for multiple pattern matching which (a) ensure cache locality and (b) utilize modern SIMD instructions. We first identify properties of pattern matching that make it fit for vectorization and show how to use them in the algorithmic design. Second, we build on an earlier, cache-aware algorithmic design and we show how cache-locality combined with SIMD gather instructions, introduced in 2013 to Intel\u27s family of processors, can be applied to pattern matching. We evaluate our algorithmic design with open data sets of real-world network traffic:Our results on two different platforms, Haswell and Xeon-Phi, show a speedup of 1.8x and 3.6x, respectively, over Direct Filter Classification (DFC), a recently proposed algorithm by Choi et al. for pattern matching exploiting cache locality, and a speedup of more than 2.3x over Aho-Corasick, a widely used algorithm in today\u27s Intrusion Detection Systems