443 research outputs found

    Dynamic Space Efficient Hashing

    Get PDF
    We consider space efficient hash tables that can grow and shrink dynamically and are always highly space efficient, i.e., their space consumption is always close to the lower bound even while growing and when taking into account storage that is only needed temporarily. None of the traditionally used hash tables have this property. We show how known approaches like linear probing and bucket cuckoo hashing can be adapted to this scenario by subdividing them into many subtables or using virtual memory overcommitting. However, these rather straightforward solutions suffer from slow amortized insertion times due to frequent reallocation in small increments. Our main result is DySECT (Dynamic Space Efficient Cuckoo Table) which avoids these problems. DySECT consists of many subtables which grow by doubling their size. The resulting inhomogeneity in subtable sizes is equalized by the flexibility available in bucket cuckoo hashing where each element can go to several buckets each of which containing several cells. Experiments indicate that DySECT works well with load factors up to 98%. With up to 2.7 times better performance than the next best solution

    Scalable Hash Tables

    Get PDF
    The term scalability with regards to this dissertation has two meanings: It means taking the best possible advantage of the provided resources (both computational and memory resources) and it also means scaling data structures in the literal sense, i.e., growing the capacity, by “rescaling” the table. Scaling well to computational resources implies constructing the fastest best per- forming algorithms and data structures. On today’s many-core machines the best performance is immediately associated with parallelism. Since CPU frequencies have stopped growing about 10-15 years ago, parallelism is the only way to take ad- vantage of growing computational resources. But for data structures in general and hash tables in particular performance is not only linked to faster computations. The most execution time is actually spent waiting for memory. Thus optimizing data structures to reduce the amount of memory accesses or to take better advantage of the memory hierarchy especially through predictable access patterns and prefetch- ing is just as important. In terms of scaling the size of hash tables we have identified three domains where scaling hash-based data structures have been lacking previously, i.e., space effi- cient growing, concurrent hash tables, and Approximate Membership Query data structures (AMQ-filter). Throughout this dissertation, we describe the problems in these areas and develop efficient solutions. We highlight three different libraries that we have developed over the course of this dissertation, each containing mul- tiple implementations that have shown throughout our testing to be among the best implementations in their respective domains. In this composition they offer a comprehensive toolbox that can be used to solve many kinds of hashing related problems or to develop individual solutions for further ones. DySECT is a library for space efficient hash tables specifically growing space effi- cient hash tables that scale with their input size. It contains the namesake DySECT data structure in addition to a number of different probing and cuckoo based im- plementations. Growt is a library for highly efficient concurrent hash tables. It contains a very fast base table and a number of extensions to adapt this table to match any purpose. All extension can be combined to create a variety of different interfaces. In our extensive experimental evaluation, each adaptation has shown to be among the best hash tables for their specific purpose. Lpqfilter is a library for concurrent approximate membership query (AMQ) data structures. It contains some original data structures, like the linear probing quotient filter, as well as some novel approaches to dynamically sized quotient filters

    Sliding Block Hashing (Slick) -- Basic Algorithmic Ideas

    Full text link
    We present {\bf Sli}ding Blo{\bf ck} Hashing (Slick), a simple hash table data structure that combines high performance with very good space efficiency. This preliminary report outlines avenues for analysis and implementation that we intend to pursue

    A Survey of Hashing Techniques for High Performance Computing

    Get PDF
    Hashing is a well-known and widely used technique for providing O(1) access to large files on secondary storage and tables in memory. Hashing techniques were introduced in the early 60s. The term hash function historically is used to denote a function that compresses a string of arbitrary input to a string of fixed length. Hashing finds applications in other fields such as fuzzy matching, error checking, authentication, cryptography, and networking. Hashing techniques have found application to provide faster access in routing tables, with the increase in the size of the routing tables. More recently, hashing has found applications in transactional memory in hardware. Motivated by these newly emerged applications of hashing, in this paper we present a survey of hashing techniques starting from traditional hashing methods with greater emphasis on the recent developments. We provide a brief explanation on hardware hashing and a brief introduction to transactional memory

    Vivid cuckoo hash : fast cuckoo table building in SIMD

    Get PDF
    Orientador: Eduardo Cunha de AlmeidaCoorientador: Marco Antonio Zanata AlvezDissertação (mestrado) - Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defesa : Curitiba, 09/07/2019Inclui referências: p. 39-40Área de concentração: Ciência da ComputaçãoResumo: Tabelas Hash possuem um lugar de destaque em Bancos de Dados modernos, encontrando aplicações na execução de junções, agrupamentos, indexação, remoção de duplicidades e acelerando consultas pontuais. Essa dissertação tem como foco estudar o efeito do paralelismo em Tabelas Cuckoo. Cuckoo Hashing (Pagh and Rodler (2004)) é uma técnica que lida com colisões garantindo que o dado seja recuperado em, no máximo, dois acessos à memória no pior caso. No entanto, a construção de tabelas Cuckoo com os métodos sequenciais atualmente utilizados é ineficiente ao lidar com o expurgo de chaves que colidem na estrutura de dados. Nós propomos um método vetorizado verticalmente e com técnica de dependência de dados para construir tabelas Cuckoo - ViViD Cuckoo Hash. Nosso método explora paralelismo de dados com instruções SIMD AVX512 e transforma dependências de controle em dependências de dados para reduzir o tempo de resposta médio para o processo de construção em cerca de 90% comparado ao método de construção sequencial. Palavras-chave: Cuckoo Hash. SIMD. Hash Join.Abstract: Hash Tables play a lead role in modern databases systems, finding applications in the execution of joins, grouping, indexing, removal of duplicates, and accelerating point queries. In this dissertation, we focus on Cuckoo Hash(Pagh and Rodler (2004)), a technique to deal with collisions guaranteeing that data is retrieved with at most two memory access in the worst case. However, building the Cuckoo Table with the current scalar methods is inefficient to treat the eviction of the colliding keys within the data structure. We propose a vertically vectorized data-dependent method to build Cuckoo Tables - ViViD Cuckoo Hash. Our method explores data parallelism with AVX-512 SIMD instructions and transforms control dependencies into data dependencies to make the build process faster with an overall reduction in response time by 90% compared to the scalar Cuckoo Hash. Keywords: Cuckoo Hash. SIMD. Hash Join

    Coherent Parallel Hashing

    Get PDF
    International audienceRecent spatial hashing schemes hash millions of keys in parallel, compacting sparse spatial data in small hash tables while still allowing for fast access from the GPU. Unfortunately, available schemes suffer from two drawbacks: Multiple runs of the construction process are often required before success, and the random nature of the hash functions decreases access performance. We introduce a new parallel hashing scheme which reaches high load factor with a very low failure rate. In addition our scheme has the unique advantage to exploit coherence in the data and the access patterns for faster performance. Compared to existing approaches, it exhibits much greater locality of memory accesses and consistent execution paths within groups of threads. This is especially well suited to Computer Graphics applications, where spatial coherence is common. In absence of coherence our scheme performs similarly to previous methods, but does not suffer from construction failures. Our scheme is based on the Robin Hood scheme modified to quickly abort queries of keys that are not in the table, and to preserve coherence. We demonstrate our scheme on a variety of data sets. We analyze construction and access performance, as well as cache and threads behavior.Nous décrivons une nouvelle technique de construction de table de hachage en parallèle qui permet d'insérer plusieurs centaines de millions de clés par secondes dans des tables pleines à 99%. Cette technique rend possible un accès très rapide aux données quand l'ordre des requêtes est cohérent, ce qui la rend particulièrement adaptée aux applications de l'informatique graphique

    Concurrent Expandable AMQs on the Basis of Quotient Filters

    Get PDF
    A quotient filter is a cache efficient Approximate Membership Query (AMQ) data structure. Depending on the fill degree of the filter most insertions and queries only need to access one or two consecutive cache lines. This makes quotient filters very fast compared to the more commonly used Bloom filters that incur multiple independent memory accesses depending on the false positive rate. However, concurrent Bloom filters are easy to implement and can be implemented lock-free while concurrent quotient filters are not as simple. Usually concurrent quotient filters work by using an external array of locks - each protecting a region of the table. Accessing this array incurs one additional memory access per operation. We propose a new locking scheme that has no memory overhead. Using this new locking scheme we achieve 1.6× times higher insertion performance and over 2.1× higher query performance than with the common external locking scheme. Another advantage of quotient filters over Bloom filters is that a quotient filter can change its capacity when it is becoming full. We implement this growing technique for our concurrent quotient filters and adapt it in a way that allows unbounded growing while keeping a bounded false positive rate. We call the resulting data structure a fully expandable quotient filter. Its design is similar to scalable Bloom filters, but we exploit some concepts inherent to quotient filters to improve the space efficiency and the query speed. Additionally, we propose several quotient filter variants that are aimed to reduce the number of status bits (2-status-bit variant) or to simplify concurrent implementations (linear probing quotient filter). The linear probing quotient filter even leads to a lock-free concurrent filter implementation. This is especially interesting, since we show that any lock-free implementation of other common quotient filter variants would incur significant overheads in the form of additional data fields or multiple passes over the accessed data. The code produced as part of this submission can be found at https://www.github.com/Toobiased/lpqfilter
    corecore