136 research outputs found

    Boosting Multi-Core Reachability Performance with Shared Hash Tables

    Get PDF
    This paper focuses on data structures for multi-core reachability, which is a key component in model checking algorithms and other verification methods. A cornerstone of an efficient solution is the storage of visited states. In related work, static partitioning of the state space was combined with thread-local storage and resulted in reasonable speedups, but left open whether improvements are possible. In this paper, we present a scaling solution for shared state storage which is based on a lockless hash table implementation. The solution is specifically designed for the cache architecture of modern CPUs. Because model checking algorithms impose loose requirements on the hash table operations, their design can be streamlined substantially compared to related work on lockless hash tables. Still, an implementation of the hash table presented here has dozens of sensitive performance parameters (bucket size, cache line size, data layout, probing sequence, etc.). We analyzed their impact and compared the resulting speedups with related tools. Our implementation outperforms two state-of-the-art multi-core model checkers (SPIN and DiVinE) by a substantial margin, while placing fewer constraints on the load balancing and search algorithms.Comment: preliminary repor

    On deletions in open addressing hashing

    Get PDF
    Deletions in open addressing tables have often been seen as problematic. The usual solution is to use a special mark ’deleted’ so that probe sequences continue past deleted slots, as if there was an element still sitting there. Such a solution, notwithstanding is wide applicability, may involve serious performance degradation. In the first part of this paper we review a practical implementation of the often overlooked deletion algorithm for linear probing hash tables, analyze its properties and performance, and provide several strong arguments in favor of the Robin Hood variant. In particular, we show how a small variation can yield substantial improvements for unsuccesful search. In the second part we propose an algorithm for true deletion in open addressing hashing with secondary clustering, like quadratic hashing. As far as we know, this is the first time that such an algorithm appears in the literature. Although it involves some extra memory for bookkeeping, the algorithm is comparatively easy and efficient, and might be of practical value, besides its theoretical interest.Postprint (published version

    On deletions in open addressing hashing

    Get PDF
    Deletions in open addressing tables have often been seen as problematic. The usual solution is to use a special mark’deleted’ so that probe sequences continue past deleted slots, as if there was an element still sitting there. Such a solution, notwithstanding is wide applicability, may involve performance degradation. In the first part of this paper we review a practical implementation of the often overlooked deletion algorithm for linear probing hash tables, analyze its properties and performance, and provide several strong arguments in favor of the Robin Hood variant. In particular, we show how a small variation can yield substantial improvements for unsuccessful search. In the second part we propose an algorithm for true deletion in open addressing hashing with secondary clustering, like quadratic hashing. As far as we know, this is the first time that such an algorithm appears in the literature. Moreover, for tables built using the Robin Hood variant the deletion algorithm strongly preserves randomness (the resulting table is identical to the table that would result if the item were not inserted at all). Although it involves some extra memory for bookkeeping, the algorithm is comparatively easy and efficient, and it might be of some practical value, besides its theoretical interest.Peer ReviewedPostprint (author's final draft

    Scalable Hash Tables

    Get PDF
    The term scalability with regards to this dissertation has two meanings: It means taking the best possible advantage of the provided resources (both computational and memory resources) and it also means scaling data structures in the literal sense, i.e., growing the capacity, by “rescaling” the table. Scaling well to computational resources implies constructing the fastest best per- forming algorithms and data structures. On today’s many-core machines the best performance is immediately associated with parallelism. Since CPU frequencies have stopped growing about 10-15 years ago, parallelism is the only way to take ad- vantage of growing computational resources. But for data structures in general and hash tables in particular performance is not only linked to faster computations. The most execution time is actually spent waiting for memory. Thus optimizing data structures to reduce the amount of memory accesses or to take better advantage of the memory hierarchy especially through predictable access patterns and prefetch- ing is just as important. In terms of scaling the size of hash tables we have identified three domains where scaling hash-based data structures have been lacking previously, i.e., space effi- cient growing, concurrent hash tables, and Approximate Membership Query data structures (AMQ-filter). Throughout this dissertation, we describe the problems in these areas and develop efficient solutions. We highlight three different libraries that we have developed over the course of this dissertation, each containing mul- tiple implementations that have shown throughout our testing to be among the best implementations in their respective domains. In this composition they offer a comprehensive toolbox that can be used to solve many kinds of hashing related problems or to develop individual solutions for further ones. DySECT is a library for space efficient hash tables specifically growing space effi- cient hash tables that scale with their input size. It contains the namesake DySECT data structure in addition to a number of different probing and cuckoo based im- plementations. Growt is a library for highly efficient concurrent hash tables. It contains a very fast base table and a number of extensions to adapt this table to match any purpose. All extension can be combined to create a variety of different interfaces. In our extensive experimental evaluation, each adaptation has shown to be among the best hash tables for their specific purpose. Lpqfilter is a library for concurrent approximate membership query (AMQ) data structures. It contains some original data structures, like the linear probing quotient filter, as well as some novel approaches to dynamically sized quotient filters

    Accelerators for Data Processing

    Get PDF
    The explosive growth in digital data and its growing role in real-time analytics motivate the design of high-performance database management systems (DBMSs). Meanwhile, slowdown in supply voltage scaling has stymied improvements in core performance and ushered an era of power-limited chips. These developments motivate the design of software and hardware DBMS accelerators that (1) maximize utility by accelerating the dominant operations, and (2) provide flexibility in the choice of DBMS, data layout, and data types. In this thesis, we identify pointer-intensive data structure operations as a key performance and efficiency bottleneck in data analytics workloads. We observe that data analytics tasks include a large number of independent data structure lookups, each of which is characterized by dependent long-latency memory accesses due to pointer chasing. Unfortunately, exploiting such inter-lookup parallelism to overlap memory accesses from different lookups is not possible within the limited instruction window of modern out-of-order cores. Similarly, software prefetching techniques attempt to exploit inter-lookup parallelism by statically staging independent lookups, and hence break down in the face of irregularity across lookup stages. Based on these observations, we provide a dynamic software acceleration scheme for exploiting inter-lookup parallelism to hide the memory access latency despite the irregularities across lookups. Furthermore, we propose a programmable hardware accelerator to maximize the efficiency of the data structure lookups. As a result, through flexible hardware and software techniques we eliminate a key efficiency and performance bottleneck in data analytics operations

    The computer science graduate record exam tutorial courseware, 1994

    Get PDF
    The design and development of Computer Science Graduate Record Examination Tutorial Software will be discussed. The courseware reviews Computer Design, File Structures, Data Structures, and Discrete Math to thoroughly prepare students for the exam. A demonstration of the software is included on diskette

    Handling Massive N-Gram Datasets Efficiently

    Get PDF
    This paper deals with the two fundamental problems concerning the handling of large n-gram language models: indexing, that is compressing the n-gram strings and associated satellite data without compromising their retrieval speed; and estimation, that is computing the probability distribution of the strings from a large textual source. Regarding the problem of indexing, we describe compressed, exact and lossless data structures that achieve, at the same time, high space reductions and no time degradation with respect to state-of-the-art solutions and related software packages. In particular, we present a compressed trie data structure in which each word following a context of fixed length k, i.e., its preceding k words, is encoded as an integer whose value is proportional to the number of words that follow such context. Since the number of words following a given context is typically very small in natural languages, we lower the space of representation to compression levels that were never achieved before. Despite the significant savings in space, our technique introduces a negligible penalty at query time. Regarding the problem of estimation, we present a novel algorithm for estimating modified Kneser-Ney language models, that have emerged as the de-facto choice for language modeling in both academia and industry, thanks to their relatively low perplexity performance. Estimating such models from large textual sources poses the challenge of devising algorithms that make a parsimonious use of the disk. The state-of-the-art algorithm uses three sorting steps in external memory: we show an improved construction that requires only one sorting step thanks to exploiting the properties of the extracted n-gram strings. With an extensive experimental analysis performed on billions of n-grams, we show an average improvement of 4.5X on the total running time of the state-of-the-art approach.Comment: Published in ACM Transactions on Information Systems (TOIS), February 2019, Article No: 2

    Power and Memory Efficient Hashing Schemes for Some Network Applications

    Get PDF
    Hash tables (HTs) are used to implement various lookup schemes and they need to be efficient in terms of speed, space utilization, and power consumptions. For IP lookup, the hashing schemes are attractive due to their deterministic O(1) lookup performance and low power consumptions, in contrast to the TCAM and Trie based approaches. As the size of IP lookup table grows exponentially, scalable lookup performance is highly desirable. For next generation high-speed routers, this is a vital requirement when IP lookup remains in the critical data path and demands a predictable throughput. However, recently proposed hash schemes, like a Bloomier filter HT and a Fast HT (FHT) suffer from a number of flaws, including setup failures, update overheads, duplicate keys, and pointer overheads. In this dissertation, four novel hashing schemes and their architectures are proposed to address the above concerns by using pipelined Bloom filters and a Fingerprint filter which are designed for a memory-efficient approximate match. For IP lookups, two new hash schemes such as a Hierarchically Indexed Hash Table (HIHT) and Fingerprint-based Hash Table (FPHT) are introduced to achieve a a perfect match is assured without pointer overhead. Further, two hash mechanisms are also proposed to provide memory and power efficient lookup for packet processing applications. Among four proposed schemes, the HIHT and the FPHT schemes are evaluated for their performance and compared with TCAM and Trie based IP lookup schemes. Various sizes of IP lookup tables are considered to demonstrate scalability in terms of speed, memory use, and power consumptions. While an FPHT uses less memory than an HIHT, an FPHT-based IP lookup scheme reduces power consumption by a factor of 51 and requires 1.8 times memory compared to TCAM-based and trie-based IP lookup schemes, respectively. In dissertation, a multi-tiered packet classifier has been proposed that saves at most 3.2 times power compared to the existing parallel packet classifier. Intrinsic hashing schemes lack of high throughput, unlike partitioned Ternary Content Addressable Memory (TCAM)-based scheme that are capable of parallel lookups despite large power consumption. A hybrid CAM (HCAM) architecture has been introduced. Simulation results indicate HCAM to achieve the same throughput as contemporary schemes while it uses 2.8 times less memory and 3.6 times less power compared to the contemporary schemes
    • …
    corecore