Search CORE

3,437 research outputs found

Proving Tree Algorithms for Succinct Data Structures

Author: Affeldt Reynald
Garrigue Jacques
Qi Xuanrui
Tanaka Kazunari
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 10th International Conference on Interactive Theorem Proving (ITP 2019)
Publication date: 01/01/2019
Field of study

Succinct data structures give space-efficient representations of large amounts of data without sacrificing performance. They rely on cleverly designed data representations and algorithms. We present here the formalization in Coq/SSReflect of two different tree-based succinct representations and their accompanying algorithms. One is the Level-Order Unary Degree Sequence, which encodes the structure of a tree in breadth-first order as a sequence of bits, where access operations can be defined in terms of Rank and Select, which work in constant time for static bit sequences. The other represents dynamic bit sequences as binary balanced trees, where Rank and Select present a low logarithmic overhead compared to their static versions, and with efficient insertion and deletion. The two can be stacked to provide a dynamic representation of dictionaries for instance. While both representations are well-known, we believe this to be their first formalization and a needed step towards provably-safe implementations of big data

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

The Wavelet Trie: Maintaining an Indexed Sequence of Strings in Compressed Space

Author: Grossi Roberto
Ottaviano Giuseppe
Publication venue
Publication date: 01/01/2012
Field of study

An indexed sequence of strings is a data structure for storing a string sequence that supports random access, searching, range counting and analytics operations, both for exact matches and prefix search. String sequences lie at the core of column-oriented databases, log processing, and other storage and query tasks. In these applications each string can appear several times and the order of the strings in the sequence is relevant. The prefix structure of the strings is relevant as well: common prefixes are sought in strings to extract interesting features from the sequence. Moreover, space-efficiency is highly desirable as it translates directly into higher performance, since more data can fit in fast memory. We introduce and study the problem of compressed indexed sequence of strings, representing indexed sequences of strings in nearly-optimal compressed space, both in the static and dynamic settings, while preserving provably good performance for the supported operations. We present a new data structure for this problem, the Wavelet Trie, which combines the classical Patricia Trie with the Wavelet Tree, a succinct data structure for storing a compressed sequence. The resulting Wavelet Trie smoothly adapts to a sequence of strings that changes over time. It improves on the state-of-the-art compressed data structures by supporting a dynamic alphabet (i.e. the set of distinct strings) and prefix queries, both crucial requirements in the aforementioned applications, and on traditional indexes by reducing space occupancy to close to the entropy of the sequence

arXiv.org e-Print Archive

Crossref

Archivio della Ricerca - Università di Pisa

A Faster Implementation of Online Run-Length Burrows-Wheeler Transform

Author: A Bowe
D Belazzougui
G Navarro
G Navarro
J Sirén
J Ziv
JI Munro
MA Bender
V Mäkinen
W Hon
Publication venue
Publication date: 14/10/2017
Field of study

Run-length encoding Burrows-Wheeler Transformed strings, resulting in Run-Length BWT (RLBWT), is a powerful tool for processing highly repetitive strings. We propose a new algorithm for online RLBWT working in run-compressed space, which runs in

O(n\lg r)

time and

O(r\lg n)

bits of space, where

n

is the length of input string

S

received so far and

r

is the number of runs in the BWT of the reversed

S

. We improve the state-of-the-art algorithm for online RLBWT in terms of empirical construction time. Adopting the dynamic list for maintaining a total order, we can replace rank queries in a dynamic wavelet tree on a run-length compressed string by the direct comparison of labels in a dynamic list. The empirical result for various benchmarks show the efficiency of our algorithm, especially for highly repetitive strings.Comment: In Proc. IWOCA201

arXiv.org e-Print Archive

Crossref

Compressed Representations of Permutations, and Applications

Author: Barbay Jérémy
Navarro Gonzalo
Publication venue
Publication date: 01/01/2008
Field of study

We explore various techniques to compress a permutation

\pi

over n integers, taking advantage of ordered subsequences in

\pi

, while supporting its application

\pi

(i) and the application of its inverse

\pi^{-1}(i)

in small time. Our compression schemes yield several interesting byproducts, in many cases matching, improving or extending the best existing results on applications such as the encoding of a permutation in order to support iterated applications

\pi^k(i)

of it, of integer functions, and of inverted lists and suffix arrays

arXiv.org e-Print Archive

CiteSeerX

Dagstuhl Research Online Publication Server