17 research outputs found

Algorithms and data structures for grammar-compressed strings

Author: Cording Patrick Hagge
Publication venue: Technical University of Denmark
Publication date: 01/01/2015

From LZ77 to the run-length encoded Burrows-Wheeler transform, and back

Authors: Policriti Alberto, Prezza Nicola
Publication date: 01/01/2017

The Lempel-Ziv factorization (LZ77) and the Run-Length encoded Burrows-Wheeler Transform (RLBWT) are two important tools in text compression and indexing, as their sizes z and r are closely related to the amount of text self-repetitiveness. In this paper we consider the problem of converting the two representations into each other within a working space proportional to the input and the output. Let n be the text length. We show that RLBWT can be converted to LZ77 in O(n log r) time and O(r) words of working space. Conversely, we provide an algorithm to convert LZ77 to RLBWT in O(n(log r + log z)) time and O(r + z) words of working space. Note that r and z can be constant if the text is highly repetitive, and our algorithms can operate with (up to) exponentially less space than naive solutions based on full decompression.
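
To make the two measures z and r concrete, here is a minimal Python sketch that computes both for a short text by brute force. It is not the conversion algorithm from the abstract: it works on the fully decompressed text, which is exactly what that paper avoids. The function names, the `$` sentinel, and the choice of the LZ77 variant without trailing characters are assumptions made for illustration.

```python
def lz77_factorize(text):
    """Naive greedy LZ77 parse (assumed variant: no trailing character).
    Each phrase is the longest prefix of the remaining suffix that also
    starts at an earlier position (overlaps allowed), or one new character."""
    phrases = []
    i, n = 0, len(text)
    while i < n:
        best_len = 0
        for j in range(i):  # every earlier starting position is a candidate
            k = 0
            while i + k < n and text[j + k] == text[i + k]:
                k += 1
            best_len = max(best_len, k)
        length = max(best_len, 1)  # a fresh character forms a phrase of length 1
        phrases.append(text[i:i + length])
        i += length
    return phrases


def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform via sorted rotations (quadratic, for demos only).
    Assumes the sentinel does not occur in the text and sorts before all of it."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def run_length_encode(s):
    """Collapse maximal runs of equal characters into (char, length) pairs."""
    runs = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1][1] += 1
        else:
            runs.append([ch, 1])
    return [(c, l) for c, l in runs]


text = "abababab"
z = len(lz77_factorize(text))          # number of LZ77 phrases
r = len(run_length_encode(bwt(text)))  # number of BWT runs
print(z, r)
```

On a highly repetitive input such as this one, both counts stay small while the text length grows, which is the regime the abstract targets.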

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università degli Studi di Udine

Dagstuhl Research Online Publication Server

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Archivio della ricerca- LUISS Libera Università Internazionale degli Studi Sociali Guido Carli di Roma

Online Research Database In Technology

Algorithms and Data Structures for Strings, Points and Integers: or, Points about Strings and Strings about Points

Author: Vind Søren Juhl
Publication venue: Technical University of Denmark
Publication date: 01/01/2015

Online Research Database In Technology

Space-efficient conversions from SLPs

Authors: Gagie Travis, Goga Adrián, Jeż Artur, Navarro Gonzalo
Publication date: 10/10/2023

We give algorithms that, given a straight-line program (SLP) with $g$ rules that generates (only) a text $T[1..n]$, build within $O(g)$ space the Lempel-Ziv (LZ) parse of $T$ (of $z$ phrases) in time $O(n\log^2 n)$ or in time $O(gz\log^2(n/z))$. We also show how to build a locally consistent grammar (LCG) of optimal size $g_{lc} = O(\delta\log\frac{n}{\delta})$ from the SLP within $O(g+g_{lc})$ space and in $O(n\log g)$ time, where $\delta$ is the substring complexity measure of $T$. Finally, we show how to build the LZ parse of $T$ from such an LCG within $O(g_{lc})$ space and in time $O(z\log^2 n \log^2(n/z))$. All our results hold with high probability.
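
As a small illustration of the input object in this abstract, the following Python sketch represents an SLP as a dictionary and computes the length of every rule's expansion without expanding anything. The dictionary layout and the names `expansion_lengths` and `expand` are hypothetical; none of the paper's space-efficient constructions is reproduced here.

```python
# Hypothetical SLP layout: each rule maps to a single terminal character
# or to a pair of rule names (binary rules only).
def expansion_lengths(rules):
    """Length of each rule's expansion, computed bottom-up with memoization.
    Uses O(g) extra space for g rules and never materializes the text."""
    lengths = {}

    def length(sym):
        if sym not in lengths:
            rhs = rules[sym]
            if isinstance(rhs, str):
                lengths[sym] = 1
            else:
                lengths[sym] = length(rhs[0]) + length(rhs[1])
        return lengths[sym]

    for sym in rules:
        length(sym)
    return lengths


def expand(rules, sym):
    """Full decompression of one rule; takes time and space linear in the
    expansion length, so it is only sensible for small examples."""
    rhs = rules[sym]
    if isinstance(rhs, str):
        return rhs
    return expand(rules, rhs[0]) + expand(rules, rhs[1])


# Six rules generating a text of length 8.
rules = {
    "A": "a",
    "B": "b",
    "X1": ("A", "B"),
    "X2": ("X1", "A"),
    "X3": ("X2", "X1"),
    "X4": ("X3", "X2"),
}
lengths = expansion_lengths(rules)
print(lengths["X4"], expand(rules, "X4"))
```

The gap between the number of rules and the expansion length can be exponential, which is why working within $O(g)$ space rather than decompressing matters.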

arXiv.org e-Print Archive

Grammar Boosting: A New Technique for Proving Lower Bounds for Computation over Compressed Data

Authors: De Rajat, Kempa Dominik
Publication date: 17/07/2023

Grammar compression is a general compression framework in which a string $T$ of length $N$ is represented as a context-free grammar of size $n$ whose language contains only $T$. In this paper, we focus on studying the limitations of algorithms and data structures operating on strings in grammar-compressed form. Previous work focused on proving lower bounds for grammars constructed using algorithms that achieve the approximation ratio $\rho=\mathcal{O}(\text{polylog }N)$. Unfortunately, for the majority of grammar compressors, $\rho$ is either unknown or satisfies $\rho=\omega(\text{polylog }N)$. In their seminal paper, Charikar et al. [IEEE Trans. Inf. Theory 2005] studied seven popular grammar compression algorithms: RePair, Greedy, LongestMatch, Sequential, Bisection, LZ78, and $\alpha$-Balanced. Only one of them ($\alpha$-Balanced) is known to achieve $\rho=\mathcal{O}(\text{polylog }N)$. We develop the first technique for proving lower bounds for data structures and algorithms on grammars that is fully general and does not depend on the approximation ratio $\rho$ of the used grammar compressor. Using this technique, we first prove that $\Omega(\log N/\log \log N)$ time is required for random access on RePair, Greedy, LongestMatch, Sequential, and Bisection, while $\Omega(\log\log N)$ time is required for random access to LZ78. All these lower bounds hold within space $\mathcal{O}(n\text{ polylog }N)$ and match the existing upper bounds. We also generalize this technique to prove several conditional lower bounds for compressed computation. For example, we prove that unless the Combinatorial $k$-Clique Conjecture fails, there is no combinatorial algorithm for CFG parsing on Bisection (for which it holds $\rho=\tilde{\Theta}(N^{1/2})$) that runs in $\mathcal{O}(n^c\cdot N^{3-\epsilon})$ time for all constants $c>0$ and $\epsilon>0$. Previously, this was known only for $c<2\epsilon$.
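
To show what random access on a grammar-compressed string looks like, here is a simplified Python sketch of a Bisection-style grammar together with a lookup that walks down from the start symbol. The splitting rule (cut at the largest power of two strictly smaller than the length, and share one nonterminal among identical substrings) follows the usual description of Bisection, but the code is an assumption-laden illustration: the names are hypothetical, the input must be non-empty, and the lookup takes time proportional to the grammar height rather than matching the bounds discussed in the abstract.

```python
def bisection_grammar(text):
    """Build a Bisection-style grammar for a non-empty string.
    Returns (rules, lengths, start): rules maps a nonterminal id to a
    terminal character or a (left, right) pair; lengths holds expansion sizes."""
    rules, lengths, ids = {}, {}, {}

    def build(s):
        if s in ids:              # identical substrings reuse one nonterminal
            return ids[s]
        if len(s) == 1:
            rhs = s
        else:
            # largest power of two strictly smaller than len(s)
            split = 1 << ((len(s) - 1).bit_length() - 1)
            rhs = (build(s[:split]), build(s[split:]))
        nt = len(rules)           # allocate the id after the children exist
        rules[nt] = rhs
        lengths[nt] = len(s)
        ids[s] = nt
        return nt

    start = build(text)
    return rules, lengths, start


def access(rules, lengths, start, i):
    """Return character i (0-based) of the expansion without decompressing."""
    sym = start
    while not isinstance(rules[sym], str):
        left, right = rules[sym]
        if i < lengths[left]:
            sym = left
        else:
            i -= lengths[left]    # skip over the left child's expansion
            sym = right
    return rules[sym]


text = "abababababababab"
rules, lengths, start = bisection_grammar(text)
print(len(rules), access(rules, lengths, start, 5))
```

For this 16-character input the grammar has only six rules, because every halving step produces two identical halves.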

arXiv.org e-Print Archive

Compression by Contracting Straight-Line Programs

Author: Ganardi Moses
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 29th Annual European Symposium on Algorithms (ESA 2021)
Publication date: 01/01/2021

In grammar-based compression a string is represented by a context-free grammar, also called a straight-line program (SLP), that generates only that string. We refine a recent balancing result stating that one can transform an SLP of size $g$ in linear time into an equivalent SLP of size $O(g)$ so that the height of the unique derivation tree is $O(\log N)$ where $N$ is the length of the represented string (FOCS 2019). We introduce a new class of balanced SLPs, called contracting SLPs, where for every rule $A \to \beta_1 \dots \beta_k$ the string length of every variable $\beta_i$ on the right-hand side is smaller by a constant factor than the string length of $A$. In particular, the derivation tree of a contracting SLP has the property that every subtree has logarithmic height in its leaf size. We show that a given SLP of size $g$ can be transformed in linear time into an equivalent contracting SLP of size $O(g)$ with rules of constant length. We present an application to the navigation problem in compressed unranked trees, represented by forest straight-line programs (FSLPs). We extend a linear space data structure by Reh and Sieber (2020) by the operation of moving to the $i$-th child in time $O(\log d)$ where $d$ is the degree of the current node. Contracting SLPs are also applied to the finger search problem over SLP-compressed strings where one wants to access positions near to a pre-specified finger position, ideally in