
134 research outputs found

Sublinear Algorithms for Approximating String Compressibility

Author: A. de Luca
Adam Smith
C.K. Chui
D. Benedetto
D. Sculley
Dana Ron
E. Frank
E. Keogh
E. Lehman
E.J. Keogh
F. Levé
F.M.J. Willems
G. Cormode
H. Cai
I. Gheorghiciuc
I.H. Witten
J. Cleary
J. Shallit
J. Ziv
L. Ilie
L. Paninski
L. Pierce II
M. Brautbar
M. Charikar
M. Li
N. Ahmed
N. Alon
O. Keller
O.V. Kukushkina
R. Cilibrasi
Ronitt Rubinfeld
S. Janson
S. Raskhodnikova
Sofya Raskhodnikova
T. Batu
T. Cover
Z. Bar-Yossef
Z. Kása
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/03/2011

We raise the question of approximating the compressibility of a string with respect to a fixed compression scheme, in sublinear time. We study this question in detail for two popular lossless compression schemes: run-length encoding (RLE) and a variant of Lempel-Ziv (LZ77), and present sublinear algorithms for approximating compressibility with respect to both schemes. We also give several lower bounds that show that our algorithms for both schemes cannot be improved significantly. Our investigation of LZ77 yields results whose interest goes beyond the initial questions we set out to study. In particular, we prove combinatorial structural lemmas that relate the compressibility of a string with respect to LZ77 to the number of distinct short substrings contained in it (its ℓth subword complexity, for small ℓ). In addition, we show that approximating the compressibility with respect to LZ77 is related to approximating the support size of a distribution.

Funding: National Science Foundation (U.S.) (Award CCF-1065125); National Science Foundation (U.S.) (Award CCF-0728645); Marie Curie International Reintegration Grant PIRG03-GA-2008-231077; Israel Science Foundation (Grant 1147/09); Israel Science Foundation (Grant 1675/09).
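The quantities named in this abstract can be illustrated with a small Python sketch. It is not the paper's algorithm: the function names are hypothetical, the RLE cost is simplified to a plain run count, and the sampling estimator only shows the general idea of reading a few positions instead of the whole string.

```python
import random


def rle_runs(s):
    """Exact number of runs (maximal blocks of equal symbols) in s."""
    if not s:
        return 0
    return 1 + sum(1 for i in range(1, len(s)) if s[i] != s[i - 1])


def estimate_rle_runs(s, samples, rng=None):
    """Estimate the run count from `samples` random positions.

    A position starts a run if it is index 0 or differs from its
    predecessor; the fraction of sampled run starts is scaled by len(s).
    Assumption: a simplified stand-in, not the estimator from the paper.
    """
    rng = rng or random.Random(0)
    n = len(s)
    if n == 0:
        return 0.0
    hits = 0
    for _ in range(samples):
        i = rng.randrange(n)
        if i == 0 or s[i] != s[i - 1]:
            hits += 1
    return n * hits / samples


def subword_complexity(s, ell):
    """Number of distinct substrings of length ell occurring in s."""
    return len({s[i:i + ell] for i in range(len(s) - ell + 1)})
```

The estimator reads only `samples` positions (plus one neighbour each), so its cost does not depend on the string length; the price is an additive error that shrinks as the sample count grows.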

CiteSeerX

Fast and Space-Efficient Construction of AVL Grammars from the LZ77 Parsing

Author: Kempa Dominik
Langmead Ben
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 29th Annual European Symposium on Algorithms (ESA 2021)
Publication date: 01/01/2021

Grammar compression is, next to Lempel-Ziv (LZ77) and run-length Burrows-Wheeler transform (RLBWT), one of the most flexible approaches to representing and processing highly compressible strings. The main idea is to represent a text as a context-free grammar whose language is precisely the input string. This is called a straight-line grammar (SLG). An AVL grammar, proposed by Rytter [Theor. Comput. Sci., 2003], is a type of SLG that additionally satisfies the AVL property: the heights of parse trees for children of every nonterminal differ by at most one. In contrast to other SLG constructions, AVL grammars can be constructed from the LZ77 parsing in compressed time: O(z log n), where z is the size of the LZ77 parsing and n is the length of the input text. Despite these advantages, AVL grammars are thought to be too large to be practical. We present a new technique for rapidly constructing a small AVL grammar from an LZ77 or LZ77-like parse. Our algorithm produces grammars that are always at least five times smaller than those produced by the original algorithm, and usually not more than double the size of grammars produced by the practical Re-Pair compressor [Larsson and Moffat, Proc. IEEE, 2000]. Our algorithm also achieves low peak RAM usage. By combining this algorithm with recent advances in approximating the LZ77 parsing, we show that our method has the potential to construct a run-length BWT in about one third of the time and peak RAM required by other approaches. Overall, we show that AVL grammars are surprisingly practical, opening the door to much faster construction of key compressed data structures.
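To make the definitions of a straight-line grammar and the AVL property concrete, here is a minimal Python sketch. The dictionary encoding of rules and the names `expand`, `height`, and `is_avl` are assumptions made for illustration; the sketch does not implement the construction from an LZ77 parsing described in the abstract.

```python
def expand(grammar, symbol):
    """Return the string derived from `symbol`.

    Assumed encoding: a rule is either a one-character string (terminal
    rule) or a pair (left, right) of symbol names (binary rule).
    """
    rule = grammar[symbol]
    if isinstance(rule, str):
        return rule
    left, right = rule
    return expand(grammar, left) + expand(grammar, right)


def height(grammar, symbol, memo):
    """Height of the parse tree rooted at `symbol` (terminal rules have 0)."""
    if symbol not in memo:
        rule = grammar[symbol]
        if isinstance(rule, str):
            memo[symbol] = 0
        else:
            left, right = rule
            memo[symbol] = 1 + max(height(grammar, left, memo),
                                   height(grammar, right, memo))
    return memo[symbol]


def is_avl(grammar):
    """Check that, for every binary rule, child heights differ by at most one."""
    memo = {}
    for rule in grammar.values():
        if isinstance(rule, str):
            continue
        left, right = rule
        if abs(height(grammar, left, memo) - height(grammar, right, memo)) > 1:
            return False
    return True


# Example grammar whose start symbol Z derives "ababab".
example = {
    "A": "a",
    "B": "b",
    "X": ("A", "B"),
    "Y": ("X", "X"),
    "Z": ("Y", "X"),
}
```

In the example, `Z` has children of heights 2 and 1, so the grammar is balanced; adding a rule such as `("Y", "A")` would pair heights 2 and 0 and break the property.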

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Image Characterization and Classification by Physical Complexity

Author: Delahaye Jean-Paul
Gaucherel Cedric
Zenil Hector
Publication date: 03/07/2011

We present a method for estimating the complexity of an image based on Bennett's concept of logical depth. Bennett identified logical depth as the appropriate measure of organized complexity, and hence as being better suited to the evaluation of the complexity of objects in the physical world. Its use results in a different, and in some sense a finer characterization than is obtained through the application of the concept of Kolmogorov complexity alone. We use this measure to classify images by their information content. The method provides a means for classifying and evaluating the complexity of objects by way of their visual representations. To the authors' knowledge, the method and application inspired by the concept of logical depth presented herein are being proposed and implemented for the first time.

Comment: 30 pages, 21 figures

arXiv.org e-Print Archive

HAL - Lille 3

INRIA a CCSD electronic archive server

Oxford University Research Archive

Hal-Diderot

Decompressing Lempel-Ziv Compressed Text

Author: Bille Philip
Ettienne Mikko Berggren
Gagie Travis
Gørtz Inge Li
Prezza Nicola
Publication date: 04/11/2019

We consider the problem of decompressing the Lempel-Ziv 77 representation of a string $S$ of length $n$ using a working space as close as possible to the size $z$ of the input. The folklore solution for the problem runs in $O(n)$ time but requires random access to the whole decompressed text. Another folklore solution is to convert LZ77 into a grammar of size $O(z\log(n/z))$ and then stream $S$ in linear time. In this paper, we show that $O(n)$ time and $O(z)$ working space can be achieved for constant-size alphabets. On general alphabets of size $\sigma$, we describe (i) a trade-off achieving $O(n\log^\delta \sigma)$ time and $O(z\log^{1-\delta}\sigma)$ space for any $0\leq \delta\leq 1$, and (ii) a solution achieving $O(n)$ time and $O(z\log\log (n/z))$ space. The latter solution, in particular, dominates both folklore algorithms for the problem. Our solutions can, more generally, extract any specified subsequence of $S$ with little overheads on top of the linear running time and working space. As an immediate corollary, we show that our techniques yield improved results for pattern matching problems on LZ77-compressed text.
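The first folklore solution mentioned in this abstract, linear-time decoding with random access to the text decoded so far, can be sketched in Python as follows. The phrase encoding (a literal character, or a `(source, length)` pair pointing into earlier output) and the name `lz77_decode` are assumptions for illustration, since LZ77 variants differ in how phrases are stored; the space-efficient algorithms of the paper are not shown.

```python
def lz77_decode(phrases):
    """Decode an LZ77-style parse while keeping the whole output in memory.

    Assumed phrase encoding:
      - a one-character string is a literal;
      - a pair (source, length) copies `length` symbols starting at
        position `source` of the output produced so far.
    Copying one symbol at a time lets a phrase overlap itself.
    """
    out = []
    for phrase in phrases:
        if isinstance(phrase, str):
            out.extend(phrase)
        else:
            source, length = phrase
            for k in range(length):
                out.append(out[source + k])
    return "".join(out)
```

The running time is proportional to the output length $n$, but the list `out` holds the entire decompressed text, which is exactly the random-access requirement the abstract sets out to avoid.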

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari