
8 research outputs found

Practical Evaluation of Lempel-Ziv-78 and Lempel-Ziv-Welch Tries

Author: A Poyias, D Arroyuelo, D Lemire, G Marsaglia, GH Gonnet, H Bannai, H Luan, J Fischer, J Jansson, J Kärkkäinen, J Ziv, JA Feldman, JG Cleary, K Chung, L Carter, P Tchebychev, RM Karp, RM Robinson, TA Welch, Y Nakashima
Publication date: 09/06/2017

We present the first thorough practical study of Lempel-Ziv-78 and Lempel-Ziv-Welch computation based on trie data structures. With a careful selection of trie representations, we can beat well-tuned popular trie data structures such as Judy, m-Bonsai, or Cedar.
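To illustrate what "LZ78 computation based on a trie" means, here is a minimal Python sketch of LZ78 factorization. The trie is a plain list of per-node dictionaries, an assumption made for readability only. It is not one of the trie representations (Judy, m-Bonsai, Cedar, or the authors' own) evaluated in the paper, and the names `lz78_factorize` and `lz78_decode` are hypothetical.

```python
def lz78_factorize(text: str) -> list[tuple[int, str]]:
    """LZ78-factorize `text` into (phrase id, next char) pairs.

    Phrase id 0 is the empty phrase (the trie root); the k-th factor
    (1-based) becomes trie node k, so later factors can refer to it.
    """
    children: list[dict[str, int]] = [{}]  # children[v]: char -> child node id
    factors: list[tuple[int, str]] = []
    node = 0
    for ch in text:
        nxt = children[node].get(ch)
        if nxt is not None:
            node = nxt                       # phrase still known: walk down the trie
        else:
            children.append({})              # new leaf for phrase(node) + ch
            children[node][ch] = len(children) - 1
            factors.append((node, ch))
            node = 0                         # next factor starts at the root
    if node != 0:
        factors.append((node, ""))           # text ended inside a known phrase
    return factors


def lz78_decode(factors: list[tuple[int, str]]) -> str:
    """Invert lz78_factorize by rebuilding every phrase from its parent."""
    phrases = [""]                           # phrase id 0 = empty phrase
    for ref, ch in factors:
        phrases.append(phrases[ref] + ch)
    return "".join(phrases[1:])


print(lz78_factorize("abababa"))  # [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a')]
```

Each factor is the longest previously seen phrase (an existing trie node) extended by one character, so every factor adds exactly one trie node. The trie representation therefore dominates both the running time and the memory of the whole computation. LZW differs mainly in that its dictionary starts out holding every single character and it emits only phrase ids, without an explicit trailing character.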

arXiv.org e-Print Archive

Crossref

Fast and Simple Compact Hashing via Bucketing

Author: Dominik Koppl, Simon J. Puglisi, Rajeev Raman
Publication date: 01/09/2022

Compact hash tables store a set S of n key-value pairs, where the keys are from the universe U = {0, ..., u - 1} and the values are v-bit integers, in close to B(u, n) + nv bits of space, where $B(u, n) = \log_2 \binom{u}{n}$ is the information-theoretic lower bound for representing the set of keys in S. They support the operations insert, delete, and lookup on S. Compact hash tables have received significant attention in recent years, and approaches dating back to Cleary [IEEE T. Comput, 1984], as well as more recent ones, have been implemented and used in a number of applications. However, the space savings of these approaches are outweighed by their slowness relative to conventional hash tables. In this paper, we demonstrate that compact hash tables based on a simple idea of bucketing outperform, in practice, existing compact hash table implementations in terms of memory usage and construction time, and existing fast hash table implementations in terms of memory usage (and sometimes also construction time), while having competitive query times. A related notion is that of a compact hash ID map. It stores a set $\hat{S}$ of n keys from U and implicitly associates each key in $\hat{S}$ with a unique value (its ID), chosen by the data structure itself, which is an integer of magnitude O(n). It supports inserts and lookups on $\hat{S}$ while using space close to B(u, n) bits. One of our approaches is suitable for use as a compact hash ID map.
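To make the bound B(u, n) and the general idea of bucketing concrete, the following Python sketch computes B(u, n) via log-gamma. It also implements a toy compact set that splits each scrambled key into a bucket index and a stored remainder, so each bucket keeps only w - bucket_bits bits per key. This is a hypothetical illustration of quotienting by bucket and rests on several assumptions: keys lie in [0, 2**w), they are scrambled by an invertible multiplicative hash, buckets are sorted Python lists, and values are omitted. It is not the data structure from the paper, and Python lists do not actually achieve the bit-level savings. The class, function, and parameter names are illustrative only.

```python
import bisect
import math


def info_lower_bound_bits(u: int, n: int) -> float:
    """B(u, n) = log2(C(u, n)), computed with log-gamma to avoid huge integers."""
    ln_binom = math.lgamma(u + 1) - math.lgamma(n + 1) - math.lgamma(u - n + 1)
    return ln_binom / math.log(2)


class BucketedCompactSet:
    """Toy compact set of w-bit keys (0 <= key < 2**w): bucketing plus quotienting.

    Hypothetical sketch: each key is scrambled by an invertible multiplicative
    hash; the top `bucket_bits` bits pick a bucket and only the remaining
    w - bucket_bits low bits are stored in that bucket (kept sorted).
    """

    def __init__(self, w: int, bucket_bits: int, mult: int = 0x9E3779B97F4A7C15):
        if mult % 2 == 0:
            raise ValueError("multiplier must be odd to be invertible mod 2**w")
        self.r = w - bucket_bits                 # bits stored per key
        self.mask = (1 << w) - 1
        self.mult = mult & self.mask             # still odd after masking
        self.inv = pow(self.mult, -1, 1 << w)    # modular inverse (Python 3.8+)
        self.buckets = [[] for _ in range(1 << bucket_bits)]

    def _split(self, key: int) -> tuple[int, int]:
        h = (key * self.mult) & self.mask        # bijection on [0, 2**w)
        return h >> self.r, h & ((1 << self.r) - 1)

    def insert(self, key: int) -> bool:
        """Insert key; return False if it was already present."""
        b, rem = self._split(key)
        slot = self.buckets[b]
        i = bisect.bisect_left(slot, rem)
        if i < len(slot) and slot[i] == rem:
            return False
        slot.insert(i, rem)
        return True

    def contains(self, key: int) -> bool:
        b, rem = self._split(key)
        slot = self.buckets[b]
        i = bisect.bisect_left(slot, rem)
        return i < len(slot) and slot[i] == rem

    def keys(self):
        """Recover the original keys from (bucket index, remainder) pairs."""
        for b, slot in enumerate(self.buckets):
            for rem in slot:
                yield (((b << self.r) | rem) * self.inv) & self.mask
```

Because the hash is a bijection, the bucket index is implied by the bucket's position and need not be stored. This implicit storage is what lets a real bit-packed bucket approach B(u, n) bits.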

Helsingin yliopiston digitaalinen arkisto

A framework of dynamic data structures for string processing

Author: Prezza N.
Publication date: 01/01/2017

In this paper, we present DYNAMIC, an open-source C++ library implementing dynamic compressed data structures for string manipulation. Our framework includes useful tools such as searchable partial sums, succinct/gap-encoded bitvectors, and entropy/run-length compressed strings and FM indexes. We prove close-to-optimal theoretical bounds for the resources used by our structures and show that our theoretical predictions are tightly matched by experiments. Finally, we turn our attention to applications. We compare the performance of five recently published compression algorithms implemented using DYNAMIC with that of state-of-the-art tools performing the same tasks. Our experiments show that algorithms making use of dynamic compressed data structures can be up to three orders of magnitude more space-efficient (albeit slower) than classical ones performing the same tasks.
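Searchable partial sums are one of the building blocks listed above. A standard way to support them dynamically is a Fenwick (binary indexed) tree, sketched below in Python. This is a swapped-in textbook structure, not DYNAMIC's compressed implementation, and the class and method names are hypothetical.

```python
class SearchablePartialSums:
    """Fenwick (binary indexed) tree over n non-negative integers A[0..n-1]."""

    def __init__(self, n: int):
        self.n = n
        self.tree = [0] * (n + 1)  # 1-based internal array

    def add(self, i: int, delta: int) -> None:
        """A[i] += delta (0-based i); keep entries non-negative for search()."""
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i            # move to the next node covering position i

    def prefix_sum(self, i: int) -> int:
        """Return A[0] + ... + A[i-1]."""
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & -i            # drop the lowest set bit
        return s

    def search(self, x: int) -> int:
        """Smallest 0-based j with A[0] + ... + A[j] >= x, or n if no such j."""
        pos = 0
        step = 1 << self.n.bit_length()
        while step:
            nxt = pos + step
            if nxt <= self.n and self.tree[nxt] < x:
                pos = nxt          # prefix up to nxt is still below x: skip it
                x -= self.tree[nxt]
            step >>= 1
        return pos                 # pos = number of leading entries with sum < x


ps = SearchablePartialSums(4)
for i, v in enumerate([3, 0, 2, 5]):
    ps.add(i, v)
print(ps.prefix_sum(3), ps.search(4))  # 5 2
```

Over a 0/1 array, `prefix_sum(i)` acts as rank (the number of ones before position i) and `search(k)` returns the position of the k-th one (select). This is how bitvector operations reduce to searchable partial sums. Each operation costs O(log n) time.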

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Archivio della ricerca- LUISS Libera Università Internazionale degli Studi Sociali Guido Carli di Roma

Online Research Database In Technology

Space- and Time-Efficient String Dictionaries (空間効率と時間効率の良い文字列辞書)

Author: Kanda Shunsuke
Publication date: 02/07/2018

Tokushima University Institutional Repository

Compact Dynamic Rewritable (CDRW) Arrays

Publication venue: Society for Industrial and Applied Mathematics (SIAM)

Crossref

Engineering compact dynamic data structures and in-memory data mining

Author: Andreas Poyias
Publication date: 20/04/2018

Compact and succinct data structures use space that approaches the information-theoretic lower bound for representing the data. In practice, their memory footprint is orders of magnitude smaller than that of conventional data structures, while they remain competitive in speed. A major drawback of many of these data structures is that they do not support dynamic operations efficiently: rebuilding a static data structure every time an update occurs can be prohibitively expensive. In this thesis, we propose several novel compact dynamic data structures, including m-Bonsai, a compact tree representation, and compact dynamic rewritable (CDRW) arrays, a compact representation of variable-length bit-strings. These data structures answer queries efficiently and perform updates quickly while maintaining a small memory footprint. Beyond designing these data structures, we analyze them theoretically, implement them, and test them to demonstrate their good practical performance. Many data mining algorithms require data structures that can query and dynamically update data in memory. One such algorithm is FP-growth, one of the fastest algorithms for Frequent Itemset Mining, which is among the most fundamental problems in data mining. FP-growth reads the entire data set into memory, updates in-memory data structures, and performs a series of queries on the data. We propose a compact implementation of the FP-growth algorithm, PFP-growth. In our experimental evaluation, our implementation is one order of magnitude more space-efficient than the classic implementation of FP-growth and 2 to 3 times more space-efficient than a more recent, carefully engineered implementation, while remaining competitive in speed.
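For context on what FP-growth keeps in memory, the following Python sketch builds the FP-tree, the prefix tree with per-node counts that FP-growth constructs before mining. It is a plain pointer-based version, not the compact PFP-growth representation. It omits the header table of per-item node links and the recursive mining phase. The names `FPNode` and `build_fp_tree` are hypothetical.

```python
from collections import Counter


class FPNode:
    __slots__ = ("item", "count", "children")

    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}   # item -> FPNode


def build_fp_tree(transactions, min_support):
    """Build an FP-tree (construction step of FP-growth only)."""
    # Pass 1: support of each item (an item counts once per transaction).
    freq = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in freq.items() if c >= min_support}
    root = FPNode(None)
    # Pass 2: insert each transaction's frequent items in descending support
    # order (ties broken by item), so common prefixes share tree paths.
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = FPNode(item)
            child.count += 1
            node = child
    return root, frequent
```

Every node stores an item, a count, and child pointers, and the whole tree must stay in memory during mining. This overhead is what a compact dynamic tree representation aims to reduce.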

Leicester Research Archive