129 research outputs found

Succinct Indexable Dictionaries with Applications to Encoding $k$ -ary Trees, Prefix Sums and Multisets

Author: Fich F. E.
Grossi R.
Hagerup T.
Jansson J.
Munro J. I.
Paul W. J.
Rajeev Raman
Raman R.
Raman V.
Srinivasa Rao Satti
Venkatesh Raman
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 04/05/2007

We consider the indexable dictionary problem, which consists of storing a set $S \subseteq \{0,\ldots,m-1\}$ for some integer $m$, while supporting the operations of $\mathrm{Rank}(x)$, which returns the number of elements in $S$ that are less than $x$ if $x \in S$, and $-1$ otherwise; and $\mathrm{Select}(i)$, which returns the $i$-th smallest element in $S$. We give a data structure that supports both operations in $O(1)$ time on the RAM model and requires $\mathcal{B}(n,m) + o(n) + O(\lg \lg m)$ bits to store a set of size $n$, where $\mathcal{B}(n,m) = \lceil \lg \binom{m}{n} \rceil$ is the minimum number of bits required to store any $n$-element subset from a universe of size $m$. Previous dictionaries taking this space only supported (yes/no) membership queries in $O(1)$ time. In the cell probe model we can remove the $O(\lg \lg m)$ additive term in the space bound, answering a question raised by Fich and Miltersen, and Pagh. We present extensions and applications of our indexable dictionary data structure, including: an information-theoretically optimal representation of a $k$-ary cardinal tree that supports standard operations in constant time; a representation of a multiset of size $n$ from $\{0,\ldots,m-1\}$ in $\mathcal{B}(n,m+n) + o(n)$ bits that supports (appropriate generalizations of) $\mathrm{Rank}$ and $\mathrm{Select}$ operations in constant time; and a representation of a sequence of $n$ non-negative integers summing up to $m$ in $\mathcal{B}(n,m+n) + o(n)$ bits that supports prefix sum queries in constant time.

Comment: Final version of SODA 2002 paper; supersedes Leicester Tech report 2002/1
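
As a rough illustration of the Rank and Select semantics described in this abstract, the following Python sketch uses a plain sorted list. It is not the paper's succinct structure and makes no attempt at the stated space bound; the class name `NaiveIndexableDictionary`, the helper `information_theoretic_bits`, and the choice of 1-based indexing for Select are assumptions made for this example.

```python
from bisect import bisect_left
from math import comb


class NaiveIndexableDictionary:
    """Reference semantics only: a sorted list, not a succinct encoding."""

    def __init__(self, elements, m):
        self.m = m
        self.items = sorted(set(elements))
        if self.items and not (0 <= self.items[0] and self.items[-1] < m):
            raise ValueError("elements must lie in {0, ..., m-1}")

    def rank(self, x):
        """Number of elements smaller than x if x is in S, and -1 otherwise."""
        i = bisect_left(self.items, x)
        if i < len(self.items) and self.items[i] == x:
            return i
        return -1

    def select(self, i):
        """The i-th smallest element of S (1-based here, by assumption)."""
        if not 1 <= i <= len(self.items):
            raise IndexError("i must be in 1..n")
        return self.items[i - 1]


def information_theoretic_bits(n, m):
    """B(n, m) = ceil(lg C(m, n)), computed exactly with integers."""
    # For an integer c >= 1, (c - 1).bit_length() equals ceil(log2(c)).
    return (comb(m, n) - 1).bit_length()
```

For example, with S = {3, 7, 8, 20} and m = 32, `rank(7)` is 1, `rank(5)` is -1, `select(3)` is 8, and `information_theoretic_bits(4, 32)` is 16.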

arXiv.org e-Print Archive

Crossref

More Haste, Less Waste: Lowering the Redundancy in Fully Indexable Dictionaries

Author: Grossi Roberto
Orlandi Alessio
Raman Rajeev
Rao S. Srinivasa
Publication date: 01/01/2009

We consider the problem of representing, in a compressed format, a bit-vector $S$ of $m$ bits with $n$ 1s, supporting the following operations, where $b \in \{0, 1\}$: $\mathrm{rank}_b(S,i)$ returns the number of occurrences of bit $b$ in the prefix $S[1..i]$; $\mathrm{select}_b(S,i)$ returns the position of the $i$th occurrence of bit $b$ in $S$. Such a data structure is called a fully indexable dictionary (FID) [Raman et al., 2007], and is at least as powerful as predecessor data structures. Our focus is on space-efficient FIDs on the RAM model with word size $\Theta(\lg m)$ and constant time for all operations, so that the time cost is independent of the input size. Given the bitstring $S$ to be encoded, having length $m$ and containing $n$ ones, the minimal amount of information that needs to be stored is $B(n,m) = \lceil \log \binom{m}{n} \rceil$. The state of the art in building a FID for $S$ is given in [Patrascu, 2008] using $B(n,m) + O(m / ((\log m / t)^t)) + O(m^{3/4})$ bits, to support the operations in $O(t)$ time. Here, we propose a parametric data structure exhibiting a time/space trade-off such that, for any real constants $0 < \delta \le 1/2$, $0 < \varepsilon \le 1$, and integer $s > 0$, it uses $B(n,m) + O(n^{1+\delta} + n (\frac{m}{n^s})^{\varepsilon})$ bits and performs all the operations in time $O(s\delta^{-1} + \varepsilon^{-1})$. The improvement is twofold: our redundancy can be lowered parametrically and, fixing $s = O(1)$, we get a constant-time FID whose space is $B(n,m) + O(m^{\varepsilon} / \mathrm{poly}(n))$ bits, for sufficiently large $m$. This is a significant improvement compared to the previous bounds for the general case.
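
To make the rank and select operations on a bit-vector concrete, here is a minimal Python sketch that stores the bits uncompressed and keeps one cumulative count of 1s per block. It illustrates the query semantics only and is not the parametric structure proposed in the abstract; the class name `SampledBitVector`, the `block` parameter, and the binary-search select are assumptions made for this example, and queries here are not constant time.

```python
class SampledBitVector:
    """Uncompressed bits plus per-block counts of 1s (illustration only)."""

    def __init__(self, bits, block=8):
        self.bits = list(bits)
        self.block = block
        # ones_before[j] = number of 1s among the first j * block bits.
        self.ones_before = [0]
        for start in range(0, len(self.bits), block):
            chunk = self.bits[start:start + block]
            self.ones_before.append(self.ones_before[-1] + sum(chunk))

    def rank(self, b, i):
        """Occurrences of bit b in the prefix S[1..i], for 0 <= i <= m."""
        if not 0 <= i <= len(self.bits):
            raise IndexError("i must be in 0..m")
        j = i // self.block
        # Sampled count up to the block boundary, then scan inside the block.
        ones = self.ones_before[j] + sum(self.bits[j * self.block:i])
        return ones if b == 1 else i - ones

    def select(self, b, k):
        """1-based position of the k-th occurrence of bit b in S."""
        if not 1 <= k <= self.rank(b, len(self.bits)):
            raise ValueError("fewer than k occurrences of b")
        lo, hi = 1, len(self.bits)
        # Smallest position p with rank(b, p) >= k is the k-th occurrence.
        while lo < hi:
            mid = (lo + hi) // 2
            if self.rank(b, mid) >= k:
                hi = mid
            else:
                lo = mid + 1
        return lo
```

With the bits 1 0 1 1 0 0 1 0 1 1 and `block=4`, `rank(1, 4)` is 3, `rank(0, 4)` is 1, `select(1, 4)` is 7, and `select(0, 2)` is 5.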

arXiv.org e-Print Archive

HAL Descartes

Dagstuhl Research Online Publication Server

Hal-Diderot

Improved ESP-index: a practical self-index for highly repetitive texts

Author: F. Claude
G. Navarro
J. Barbay
J.I. Munro
K. Goto
O. Delpratt
S. Maruyama
T. Gagie
T. Yamamoto
Publication date: 01/01/2014

While several self-indexes for highly repetitive texts exist, developing a practical self-index applicable to real-world repetitive texts remains a challenge. ESP-index is a grammar-based self-index built on the notion of edit-sensitive parsing (ESP), an efficient parsing algorithm that guarantees upper bounds on the parsing discrepancies between different appearances of the same subtext in a text. Although ESP-index performs efficient top-down searches of query texts, it relies on binary searches to find appearances of variables for a query text, which slows down queries. We present an improved ESP-index (ESP-index-I) that leverages the idea behind succinct data structures for large alphabets. While ESP-index-I retains the efficiency of ESP-index for top-down searches, it avoids the binary searches by using fast rank/select operations. We experimentally test ESP-index-I on its ability to search query texts and extract subtexts from real-world repetitive texts on a large scale, and we show that ESP-index-I performs better than other possible approaches.

Comment: This is the full version of a proceeding accepted to the 11th International Symposium on Experimental Algorithms (SEA2014)

arXiv.org e-Print Archive

Crossref

Space-Efficient Graph Coarsening with Applications to Succinct Planar Encodings

Author: Kammer Frank
Meintrup Johannes
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 33rd International Symposium on Algorithms and Computation (ISAAC 2022)
Publication date: 01/01/2022

We present a novel space-efficient graph coarsening technique for n-vertex planar graphs G, called cloud partition, which partitions the vertices V(G) into disjoint sets C of size O(log n) such that each C induces a connected subgraph of G. Using this partition $\mathcal{P}$ we construct a so-called structure-maintaining minor F of G via specific contractions within the disjoint sets such that F has O(n/log n) vertices. The combination of (F, $\mathcal{P}$) is referred to as a cloud decomposition. For planar graphs we show that a cloud decomposition can be constructed in O(n) time and using O(n) bits. Given a cloud decomposition (F, $\mathcal{P}$) constructed for a planar graph G we are able to find a balanced separator of G in O(n/log n) time. Contrary to related publications, we do not make use of an embedding of the planar input graph. We generalize our cloud decomposition from planar graphs to H-minor-free graphs for any fixed graph H. This allows us to construct the succinct encoding scheme for H-minor-free graphs due to Blelloch and Farzan (CPM 2010) in O(n) time and O(n) bits, improving both runtime and space by a factor of $\Theta(\log n)$. As an additional application of our cloud decomposition we show that, for H-minor-free graphs, a tree decomposition of width $O(n^{1/2 + \varepsilon})$ for any $\varepsilon > 0$ can be constructed in O(n) bits and a time linear in the size of the tree decomposition. A similar result by Izumi and Otachi (ICALP 2020) constructs a tree decomposition of width $O(k \sqrt{n} \log n)$ for graphs of treewidth $k \le \sqrt{n}$ in sublinear space and polynomial time.

Dagstuhl Research Online Publication Server

The Wavelet Trie: Maintaining an Indexed Sequence of Strings in Compressed Space

Author: Grossi Roberto
Ottaviano Giuseppe
Publication date: 01/01/2012

An indexed sequence of strings is a data structure for storing a string sequence that supports random access, searching, range counting and analytics operations, both for exact matches and prefix search. String sequences lie at the core of column-oriented databases, log processing, and other storage and query tasks. In these applications each string can appear several times and the order of the strings in the sequence is relevant. The prefix structure of the strings is relevant as well: common prefixes are sought in strings to extract interesting features from the sequence. Moreover, space-efficiency is highly desirable as it translates directly into higher performance, since more data can fit in fast memory. We introduce and study the problem of compressed indexed sequence of strings, representing indexed sequences of strings in nearly-optimal compressed space, both in the static and dynamic settings, while preserving provably good performance for the supported operations. We present a new data structure for this problem, the Wavelet Trie, which combines the classical Patricia Trie with the Wavelet Tree, a succinct data structure for storing a compressed sequence. The resulting Wavelet Trie smoothly adapts to a sequence of strings that changes over time. It improves on the state-of-the-art compressed data structures by supporting a dynamic alphabet (i.e. the set of distinct strings) and prefix queries, both crucial requirements in the aforementioned applications, and on traditional indexes by reducing space occupancy to close to the entropy of the sequence

arXiv.org e-Print Archive

Crossref

Archivio della Ricerca - Università di Pisa

R3D3: A doubly opportunistic data structure for compressing and indexing massive data

Author: Nagy Máté
Rétvári Gábor
Tapolcai János
Publication venue: 'Infocommunications Journal'
Publication date: 01/01/2019

Opportunistic data structures are used extensively in big data practice to break down the massive storage space requirements of processing large volumes of information. A data structure is called (singly) opportunistic if it takes advantage of the redundancy in the input in order to store it in information-theoretically minimum space. Yet, efficient data processing requires a separate index alongside the data, whose size often substantially exceeds that of the compressed information. In this paper, we introduce doubly opportunistic data structures to attain the best possible compression not only on the input data but also on the index. We present R3D3, which encodes a bitvector of length $n$ and Shannon entropy $H_0$ to $nH_0$ bits and the accompanying index to $nH_0(1/2 + O(\log C/C))$ bits, thus attaining provably minimum space (up to small error terms) on both the data and the index, and supports a rich set of queries at arbitrary positions in the compressed bitvector in $O(C)$ time when $C = o(\log n)$. Our R3D3 prototype attains several times space reduction beyond known compression techniques on a wide range of synthetic and real data sets, while it supports operations on the compressed data at comparable speed.
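
For readers unfamiliar with the nH0 quantity mentioned in this abstract, the short Python sketch below computes the zeroth-order (Shannon) entropy of a bitvector and the resulting nH0 bit count. It only evaluates the formula and says nothing about how R3D3 itself encodes data or its index; the function names are assumptions made for this example.

```python
from math import log2


def zero_order_entropy(bits):
    """H0 in bits per symbol for a sequence of 0/1 values."""
    n = len(bits)
    if n == 0:
        return 0.0
    p = sum(bits) / n  # fraction of 1s
    if p == 0.0 or p == 1.0:
        return 0.0  # a constant bitvector carries no information
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def entropy_bound_bits(bits):
    """n * H0: the zeroth-order entropy bound on the encoded size, in bits."""
    return len(bits) * zero_order_entropy(bits)
```

A balanced bitvector of length 8 gives exactly 8 bits, an all-zero bitvector gives 0 bits, and a bitvector of length 8 with two 1s gives roughly 6.49 bits.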

Repository of the Academy's Library