Search CORE

366 research outputs found

Practical Evaluation of Lempel-Ziv-78 and Lempel-Ziv-Welch Tries

Author: A Poyias
D Arroyuelo
D Lemire
D Lemire
D Lemire
G Marsaglia
GH Gonnet
H Bannai
H Luan
J Fischer
J Fischer
J Jansson
J Kärkkäinen
J Ziv
J Ziv
JA Feldman
JG Cleary
K Chung
L Carter
P Tchebychev
RM Karp
RM Robinson
TA Welch
Y Nakashima
Publication venue
Publication date: 09/06/2017
Field of study

We present the first thorough practical study of the Lempel-Ziv-78 and the Lempel-Ziv-Welch computation based on trie data structures. With a careful selection of trie representations we can beat well-tuned popular trie data structures like Judy, m-Bonsai or Cedar

arXiv.org e-Print Archive

Crossref

c-trie++: A Dynamic Trie Tailored for Fast Prefix Searches

Author: Bannai Hideo
Inenaga Shunsuke
Kanda Shunsuke
Köppl Dominik
Nakashima Yuto
Takeda Masayuki
Tsuruta Kazuya
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 07/10/2020
Field of study

Given a dynamic set

K

k

strings of total length

n

whose characters are drawn from an alphabet of size

\sigma

, a keyword dictionary is a data structure built on

K

that provides locate, prefix search, and update operations on

K

. Under the assumption that

\alpha = w / \lg \sigma

characters fit into a single machine word

w

, we propose a keyword dictionary that represents

K

n \lg \sigma + \Theta(k \lg n)

bits of space, supporting all operations in

O(m / \alpha + \lg \alpha)

expected time on an input string of length

m

in the word RAM model. This data structure is underlined with an exhaustive practical evaluation, highlighting the practical usefulness of the proposed data structure, especially for prefix searches - one of the most elementary keyword dictionary operations

arXiv.org e-Print Archive

Crossref

Don't Thrash: How to Cache Your Hash on Flash

Author: Bender Michael A.
Farach-Colton Martin
Johnson Rob
Kraner Russell
Kuszmaul Bradley C.
Medjedovic Dzejla
Montes Pablo
Shetty Pradeep
Spillane Richard P.
Zadok Erez
Publication venue
Publication date: 01/01/2012
Field of study

This paper presents new alternatives to the well-known Bloom filter data structure. The Bloom filter, a compact data structure supporting set insertion and membership queries, has found wide application in databases, storage systems, and networks. Because the Bloom filter performs frequent random reads and writes, it is used almost exclusively in RAM, limiting the size of the sets it can represent. This paper first describes the quotient filter, which supports the basic operations of the Bloom filter, achieving roughly comparable performance in terms of space and time, but with better data locality. Operations on the quotient filter require only a small number of contiguous accesses. The quotient filter has other advantages over the Bloom filter: it supports deletions, it can be dynamically resized, and two quotient filters can be efficiently merged. The paper then gives two data structures, the buffered quotient filter and the cascade filter, which exploit the quotient filter advantages and thus serve as SSD-optimized alternatives to the Bloom filter. The cascade filter has better asymptotic I/O performance than the buffered quotient filter, but the buffered quotient filter outperforms the cascade filter on small to medium data sets. Both data structures significantly outperform recently-proposed SSD-optimized Bloom filter variants, such as the elevator Bloom filter, buffered Bloom filter, and forest-structured Bloom filter. In experiments, the cascade filter and buffered quotient filter performed insertions 8.6-11 times faster than the fastest Bloom filter variant and performed lookups 0.94-2.56 times faster.Comment: VLDB201

arXiv.org e-Print Archive

CiteSeerX

Fast and Simple Compact Hashing via Bucketing

Author: Koppl Dominik
Puglisi Simon J.
Raman Rajeev
Publication venue
Publication date: 01/09/2022
Field of study

Compact hash tables store a set S of n key-value pairs, where the keys are from the universe U = {0, ..., u - 1}, and the values are v-bit integers, in close to B(u, n) + nv bits of space, where B(u, n) = log2 ((u)(n)) is the information-theoretic lower bound for representing the set of keys in S, and support operations insert, delete and lookup on S. Compact hash tables have received significant attention in recent years, and approaches dating back to Cleary [IEEE T. Comput, 1984], as well as more recent ones have been implemented and used in a number of applications. However, the wins on space usage of these approaches are outweighed by their slowness relative to conventional hash tables. In this paper, we demonstrate that compact hash tables based upon a simple idea of bucketing practically outperform existing compact hash table implementations in terms of memory usage and construction time, and existing fast hash table implementations in terms of memory usage (and sometimes also in terms of construction time), while having competitive query times. A related notion is that of a compact hash ID map, which stores a set (S) over cap of n keys from U, and implicitly associates each key in (S) over cap with a unique value (its ID), chosen by the data structure itself, which is an integer of magnitude O(n), and supports inserts and lookups on S, while using space close to B(u, n) bits. One of our approaches is suitable for use as a compact hash ID map.Peer reviewe

Helsingin yliopiston digitaalinen arkisto

Sliding Block Hashing (Slick) -- Basic Algorithmic Ideas

Author: Lehmann Hans-Peter
Sanders Peter
Walzer Stefan
Publication venue
Publication date: 18/04/2023
Field of study

We present {\bf Sli}ding Blo{\bf ck} Hashing (Slick), a simple hash table data structure that combines high performance with very good space efficiency. This preliminary report outlines avenues for analysis and implementation that we intend to pursue

arXiv.org e-Print Archive

Multi-core and/or Symbolic Model Checking

Author: Dijk Tom van
Laarman Alfons
Pol Jaco van de
Publication venue: European Association of Software Science and Technology
Publication date: 01/01/2012
Field of study

We review our progress in high-performance model checking. Our multi-core model checker is based on a scalable hash-table design and parallel random-walk traversal. Our symbolic model checker is based on Multiway Decision Diagrams and the saturation strategy. The LTSmin tool is based on the PINS architecture, decoupling model checking algorithms from the input specification language. Consequently, users can stay in their own specification language and postpone the choice between parallel or symbolic model checking. We support widely different specification languages including those of SPIN (Promela), mCRL2 and UPPAAL (timed automata). So far, multi-core and symbolic algorithms had very little in common, forcing the user in the end to make a wise trade-off between memory or speed. Recently, however, we designed a novel multi-core BDD package called Sylvan. This forms an excellent basis for scalable parallel symbolic model checking

CiteSeerX

University of Twente Research Information

Electronic Communications of the EASST (European Association of Software Science and Technology)

Parallel Recursive State Compression for Free

Author: A. Laarman
A.W. Laarman
G.J. Holzmann
J. Geldenhuys
J. Geldenhuys
J.G. Cleary
K. Larsen
P. Sanders
R. Pelánek
R.E. Bryant
S. Blom
S. Evangelista
S.C.C. Blom
Stefan Blom
T.M. Cover
Publication venue
Publication date: 01/01/2011
Field of study

This paper focuses on reducing memory usage in enumerative model checking, while maintaining the multi-core scalability obtained in earlier work. We present a tree-based multi-core compression method, which works by leveraging sharing among sub-vectors of state vectors. An algorithmic analysis of both worst-case and optimal compression ratios shows the potential to compress even large states to a small constant on average (8 bytes). Our experiments demonstrate that this holds up in practice: the median compression ratio of 279 measured experiments is within 17% of the optimum for tree compression, and five times better than the median compression ratio of SPIN's COLLAPSE compression. Our algorithms are implemented in the LTSmin tool, and our experiments show that for model checking, multi-core tree compression pays its own way: it comes virtually without overhead compared to the fastest hash table-based methods.Comment: 19 page

arXiv.org e-Print Archive

CiteSeerX

Crossref

University of Twente Research Information

Concurrent Expandable AMQs on the Basis of Quotient Filters

Author: Maier Tobias
Sanders Peter
Williger Robert
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH
Publication date: 19/11/2019
Field of study

A quotient filter is a cache efficient Approximate Membership Query (AMQ) data structure. Depending on the fill degree of the filter most insertions and queries only need to access one or two consecutive cache lines. This makes quotient filters very fast compared to the more commonly used Bloom filters that incur multiple independent memory accesses depending on the false positive rate. However, concurrent Bloom filters are easy to implement and can be implemented lock-free while concurrent quotient filters are not as simple. Usually concurrent quotient filters work by using an external array of locks - each protecting a region of the table. Accessing this array incurs one additional memory access per operation. We propose a new locking scheme that has no memory overhead. Using this new locking scheme we achieve 1.6× times higher insertion performance and over 2.1× higher query performance than with the common external locking scheme. Another advantage of quotient filters over Bloom filters is that a quotient filter can change its capacity when it is becoming full. We implement this growing technique for our concurrent quotient filters and adapt it in a way that allows unbounded growing while keeping a bounded false positive rate. We call the resulting data structure a fully expandable quotient filter. Its design is similar to scalable Bloom filters, but we exploit some concepts inherent to quotient filters to improve the space efficiency and the query speed. Additionally, we propose several quotient filter variants that are aimed to reduce the number of status bits (2-status-bit variant) or to simplify concurrent implementations (linear probing quotient filter). The linear probing quotient filter even leads to a lock-free concurrent filter implementation. This is especially interesting, since we show that any lock-free implementation of other common quotient filter variants would incur significant overheads in the form of additional data fields or multiple passes over the accessed data. The code produced as part of this submission can be found at https://www.github.com/Toobiased/lpqfilter

arXiv.org e-Print Archive

KITopen

Dagstuhl Research Online Publication Server