Search CORE

7 research outputs found

Compact Binary Relation Representations with Rich Functionality

Author: Barbay Jérémy
Claude Francisco
Navarro Gonzalo
Publication venue
Publication date: 17/01/2012
Field of study

Binary relations are an important abstraction arising in many data representation problems. The data structures proposed so far to represent them support just a few basic operations required to fit one particular application. We identify many of those operations arising in applications and generalize them into a wide set of desirable queries for a binary relation representation. We also identify reductions among those operations. We then introduce several novel binary relation representations, some simple and some quite sophisticated, that not only are space-efficient but also efficiently support a large subset of the desired queries.Comment: 32 page

arXiv.org e-Print Archive

CiteSeerX

Efficient Fully-Compressed Sequence Representations

Author: Barbay Jeremy
Claude Francisco
Gagie Travis
Navarro Gonzalo
Nekrich Yakov
Publication venue
Publication date: 01/01/2012
Field of study

We present a data structure that stores a sequence

s[1..n]

over alphabet

[1..\sigma]

in n\Ho(s) + o(n)(\Ho(s){+}1) bits, where \Ho(s) is the zero-order entropy of

s

. This structure supports the queries \access, \rank\ and \select, which are fundamental building blocks for many other compressed data structures, in worst-case time \Oh{\lg\lg\sigma} and average time \Oh{\lg \Ho(s)}. The worst-case complexity matches the best previous results, yet these had been achieved with data structures using n\Ho(s)+o(n\lg\sigma) bits. On highly compressible sequences the

o(n\lg\sigma)

bits of the redundancy may be significant compared to the the n\Ho(s) bits that encode the data. Our representation, instead, compresses the redundancy as well. Moreover, our average-case complexity is unprecedented. Our technique is based on partitioning the alphabet into characters of similar frequency. The subsequence corresponding to each group can then be encoded using fast uncompressed representations without harming the overall compression ratios, even in the redundancy. The result also improves upon the best current compressed representations of several other data structures. For example, we achieve

(i)

compressed redundancy, retaining the best time complexities, for the smallest existing full-text self-indexes;

(ii)

compressed permutations

\pi

with times for

\pi()

and \pii() improved to loglogarithmic; and

(iii)

the first compressed representation of dynamic collections of disjoint sets. We also point out various applications to inverted indexes, suffix arrays, binary relations, and data compressors. ..

arXiv.org e-Print Archive

Advanced rank/select data structures: succinctness, bounds and applications.

Author: ORLANDI ALESSIO
Publication venue: 'Pisa University Press'
Publication date: 10/12/2012
Field of study

The thesis explores new theoretical results and applications of rank and select data structures. Given a string, select(c, i) gives the position of the ith occurrence of character c in the string, while rank(c, p) counts the number of instances of character c on the left of position p. Succinct rank/select data structures are space-efficient versions of standard ones, designed to keep data compressed and at the same time answer to queries rapidly. They are at the basis of more involved compressed and succinct data structures which in turn are motivated by the nowadays need to analyze and operate on massive data sets quickly, where space efficiency is crucial. The thesis builds up on the state of the art left by years of study and produces results on multiple fronts. Analyzing binary succinct data structures and their link with predecessor data structures, we integrate data structures for the latter problem in the former. The result is a data structure which outperforms the one of Patrascu 08 in a range of cases which were not studied before, namely when the lower bound for predecessor do not apply and constant-time rank is not feasible. Further, we propose the first lower bound for succinct data structures on generic strings, achieving a linear trade-off between time for rank/select execution and additional space (w.r.t. to the plain data) needed by the data structure. The proposal addresses systematic data structures, namely those that only access the underlying string through ADT calls and do not encode it directly. Also, we propose a matching upper bound that proves the tightness of our lower bound. Finally, we apply rank/select data structures to the substring counting problem, where we seek to preprocess a text and generate a summary data structure which is stored in lieu of the text and answers to substring counting queries with additive error. The results include a theory-proven optimal data structure with generic additive error and a data structure that errs only on infrequent patterns with significative practical space gains

Electronic Thesis and Dissertation Archive - Università di Pisa

Algorithms and Data Structures for Strings, Points and Integers:or, Points about Strings and Strings about Points

Author: Vind Søren Juhl
Publication venue: Technical University of Denmark
Publication date: 01/01/2015
Field of study

Online Research Database In Technology