136 research outputs found
Boosting Multi-Core Reachability Performance with Shared Hash Tables
This paper focuses on data structures for multi-core reachability, which is a
key component in model checking algorithms and other verification methods. A
cornerstone of an efficient solution is the storage of visited states. In
related work, static partitioning of the state space was combined with
thread-local storage and resulted in reasonable speedups, but left open whether
improvements are possible. In this paper, we present a scaling solution for
shared state storage which is based on a lockless hash table implementation.
The solution is specifically designed for the cache architecture of modern
CPUs. Because model checking algorithms impose loose requirements on the hash
table operations, their design can be streamlined substantially compared to
related work on lockless hash tables. Still, an implementation of the hash
table presented here has dozens of sensitive performance parameters (bucket
size, cache line size, data layout, probing sequence, etc.). We analyzed their
impact and compared the resulting speedups with related tools. Our
implementation outperforms two state-of-the-art multi-core model checkers (SPIN
and DiVinE) by a substantial margin, while placing fewer constraints on the
load balancing and search algorithms.Comment: preliminary repor
On deletions in open addressing hashing
Deletions in open addressing tables have often been seen as problematic. The usual solution is to use a special mark ’deleted’ so that probe sequences continue past deleted slots, as if there was an element still sitting there. Such a solution, notwithstanding is wide applicability, may involve serious performance degradation. In the first part of this paper we review a practical implementation of the often overlooked deletion algorithm for linear probing hash tables, analyze its properties and performance, and provide several strong arguments in favor of the Robin Hood variant. In particular, we show how a small variation can yield substantial improvements for unsuccesful search. In the second part we propose an algorithm for true deletion in open addressing hashing with secondary clustering, like quadratic hashing. As far as we know, this is the first time that such an algorithm appears in the literature. Although it involves some extra memory for bookkeeping, the algorithm is comparatively easy and efficient, and might be of practical value, besides its theoretical interest.Postprint (published version
On deletions in open addressing hashing
Deletions in open addressing tables have often been seen as problematic. The usual solution is to use a special mark’deleted’ so that probe sequences continue past deleted slots, as if there was an element still sitting there. Such a solution, notwithstanding is wide applicability, may involve performance degradation. In the first part of this paper we review a practical implementation of the often overlooked deletion algorithm for linear probing hash tables, analyze its properties and performance, and provide several strong arguments in favor of the Robin Hood variant. In particular, we show how a small variation can yield substantial improvements for unsuccessful search. In the second part we propose an algorithm for true deletion in open addressing hashing with secondary clustering, like quadratic hashing. As far as we know, this is the first time that such an algorithm appears in the literature. Moreover, for tables built using the Robin Hood variant the deletion algorithm strongly preserves randomness (the resulting table is identical to the table that would result if the item were not inserted at all). Although it involves some extra memory for bookkeeping, the algorithm is comparatively easy and efficient, and it might be of some practical value, besides its theoretical interest.Peer ReviewedPostprint (author's final draft
Scalable Hash Tables
The term scalability with regards to this dissertation has two meanings: It means
taking the best possible advantage of the provided resources (both computational
and memory resources) and it also means scaling data structures in the literal sense,
i.e., growing the capacity, by “rescaling” the table.
Scaling well to computational resources implies constructing the fastest best per-
forming algorithms and data structures. On today’s many-core machines the best
performance is immediately associated with parallelism. Since CPU frequencies
have stopped growing about 10-15 years ago, parallelism is the only way to take ad-
vantage of growing computational resources. But for data structures in general and
hash tables in particular performance is not only linked to faster computations. The
most execution time is actually spent waiting for memory. Thus optimizing data
structures to reduce the amount of memory accesses or to take better advantage of
the memory hierarchy especially through predictable access patterns and prefetch-
ing is just as important.
In terms of scaling the size of hash tables we have identified three domains where
scaling hash-based data structures have been lacking previously, i.e., space effi-
cient growing, concurrent hash tables, and Approximate Membership Query data
structures (AMQ-filter). Throughout this dissertation, we describe the problems
in these areas and develop efficient solutions. We highlight three different libraries
that we have developed over the course of this dissertation, each containing mul-
tiple implementations that have shown throughout our testing to be among the
best implementations in their respective domains. In this composition they offer
a comprehensive toolbox that can be used to solve many kinds of hashing related
problems or to develop individual solutions for further ones.
DySECT is a library for space efficient hash tables specifically growing space effi-
cient hash tables that scale with their input size. It contains the namesake DySECT
data structure in addition to a number of different probing and cuckoo based im-
plementations. Growt is a library for highly efficient concurrent hash tables. It
contains a very fast base table and a number of extensions to adapt this table to
match any purpose. All extension can be combined to create a variety of different
interfaces. In our extensive experimental evaluation, each adaptation has shown
to be among the best hash tables for their specific purpose. Lpqfilter is a library
for concurrent approximate membership query (AMQ) data structures. It contains
some original data structures, like the linear probing quotient filter, as well as some
novel approaches to dynamically sized quotient filters
Accelerators for Data Processing
The explosive growth in digital data and its growing role in real-time analytics motivate the design of high-performance database management systems (DBMSs). Meanwhile, slowdown in supply voltage scaling has stymied improvements in core performance and ushered an era of power-limited chips. These developments motivate the design of software and hardware DBMS accelerators that (1) maximize utility by accelerating the dominant operations, and (2) provide flexibility in the choice of DBMS, data layout, and data types. In this thesis, we identify pointer-intensive data structure operations as a key performance and efficiency bottleneck in data analytics workloads. We observe that data analytics tasks include a large number of independent data structure lookups, each of which is characterized by dependent long-latency memory accesses due to pointer chasing. Unfortunately, exploiting such inter-lookup parallelism to overlap memory accesses from different lookups is not possible within the limited instruction window of modern out-of-order cores. Similarly, software prefetching techniques attempt to exploit inter-lookup parallelism by statically staging independent lookups, and hence break down in the face of irregularity across lookup stages. Based on these observations, we provide a dynamic software acceleration scheme for exploiting inter-lookup parallelism to hide the memory access latency despite the irregularities across lookups. Furthermore, we propose a programmable hardware accelerator to maximize the efficiency of the data structure lookups. As a result, through flexible hardware and software techniques we eliminate a key efficiency and performance bottleneck in data analytics operations
The computer science graduate record exam tutorial courseware, 1994
The design and development of Computer Science Graduate Record Examination Tutorial Software will be discussed. The courseware reviews Computer Design, File Structures, Data Structures, and Discrete Math to thoroughly prepare students for the exam. A demonstration of the software is included on diskette
Handling Massive N-Gram Datasets Efficiently
This paper deals with the two fundamental problems concerning the handling of
large n-gram language models: indexing, that is compressing the n-gram strings
and associated satellite data without compromising their retrieval speed; and
estimation, that is computing the probability distribution of the strings from
a large textual source. Regarding the problem of indexing, we describe
compressed, exact and lossless data structures that achieve, at the same time,
high space reductions and no time degradation with respect to state-of-the-art
solutions and related software packages. In particular, we present a compressed
trie data structure in which each word following a context of fixed length k,
i.e., its preceding k words, is encoded as an integer whose value is
proportional to the number of words that follow such context. Since the number
of words following a given context is typically very small in natural
languages, we lower the space of representation to compression levels that were
never achieved before. Despite the significant savings in space, our technique
introduces a negligible penalty at query time. Regarding the problem of
estimation, we present a novel algorithm for estimating modified Kneser-Ney
language models, that have emerged as the de-facto choice for language modeling
in both academia and industry, thanks to their relatively low perplexity
performance. Estimating such models from large textual sources poses the
challenge of devising algorithms that make a parsimonious use of the disk. The
state-of-the-art algorithm uses three sorting steps in external memory: we show
an improved construction that requires only one sorting step thanks to
exploiting the properties of the extracted n-gram strings. With an extensive
experimental analysis performed on billions of n-grams, we show an average
improvement of 4.5X on the total running time of the state-of-the-art approach.Comment: Published in ACM Transactions on Information Systems (TOIS), February
2019, Article No: 2
Power and Memory Efficient Hashing Schemes for Some Network Applications
Hash tables (HTs) are used to implement various lookup schemes and they need
to be efficient in terms of speed, space utilization, and power consumptions. For IP
lookup, the hashing schemes are attractive due to their deterministic O(1) lookup
performance and low power consumptions, in contrast to the TCAM and Trie based
approaches. As the size of IP lookup table grows exponentially, scalable lookup
performance is highly desirable. For next generation high-speed routers, this is a
vital requirement when IP lookup remains in the critical data path and demands a
predictable throughput. However, recently proposed hash schemes, like a Bloomier
filter HT and a Fast HT (FHT) suffer from a number of flaws, including setup failures,
update overheads, duplicate keys, and pointer overheads. In this dissertation, four
novel hashing schemes and their architectures are proposed to address the above
concerns by using pipelined Bloom filters and a Fingerprint filter which are designed
for a memory-efficient approximate match. For IP lookups, two new hash schemes
such as a Hierarchically Indexed Hash Table (HIHT) and Fingerprint-based Hash
Table (FPHT) are introduced to achieve a a perfect match is assured without pointer
overhead. Further, two hash mechanisms are also proposed to provide memory and
power efficient lookup for packet processing applications.
Among four proposed schemes, the HIHT and the FPHT schemes are evaluated for their performance and compared with TCAM and Trie based IP lookup schemes.
Various sizes of IP lookup tables are considered to demonstrate scalability in terms
of speed, memory use, and power consumptions. While an FPHT uses less memory
than an HIHT, an FPHT-based IP lookup scheme reduces power consumption by a
factor of 51 and requires 1.8 times memory compared to TCAM-based and trie-based
IP lookup schemes, respectively. In dissertation, a multi-tiered packet classifier has
been proposed that saves at most 3.2 times power compared to the existing parallel
packet classifier.
Intrinsic hashing schemes lack of high throughput, unlike partitioned Ternary
Content Addressable Memory (TCAM)-based scheme that are capable of parallel
lookups despite large power consumption. A hybrid CAM (HCAM) architecture has
been introduced. Simulation results indicate HCAM to achieve the same throughput
as contemporary schemes while it uses 2.8 times less memory and 3.6 times less power
compared to the contemporary schemes
- …