8 research outputs found
Simple Worst-Case Optimal Adaptive Prefix-Free Coding
Gagie and Nekrich (2009) gave an algorithm for adaptive prefix-free coding
that, given a string s over the alphabet {1, ..., σ} with σ suitably
bounded in terms of |s|, encodes s in at most (H + 1)|s| + o(|s|) bits,
where H is the empirical entropy of s, such that encoding and decoding take
O(1) worst-case time per character. They also proved their bound on the
encoding length is optimal, even when the empirical entropy is high. Their
algorithm is impractical, however, because it uses complicated data
structures. In this paper we give an algorithm with the same bounds, under a
somewhat different restriction on σ, that uses no data structures more
complicated than a lookup table. Moreover, when Gagie and Nekrich's algorithm
is used for optimal adaptive alphabetic coding its decoding is no longer
constant-time, but ours still takes O(1) worst-case time per character.
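To make the setting concrete, here is a minimal Python sketch of adaptive Shannon coding, my own illustration rather than the algorithm from either paper: before each character the encoder derives Shannon codeword lengths from the current counts, builds a canonical prefix-free code, emits the codeword, and then updates the counts. It rebuilds the code from scratch at every step and makes no attempt at the O(1) worst-case time or the lookup-table machinery that is these papers' actual contribution.

```python
# Minimal sketch of adaptive Shannon coding (illustrative only; NOT the
# paper's O(1)-time, lookup-table algorithm). Shannon lengths
# ceil(log2(total/count)) always satisfy Kraft's inequality, so a
# prefix-free canonical code with those lengths exists.
import math
from collections import Counter

def canonical_code(lengths):
    """Assign canonical codewords to a {symbol: length} map, shortest first."""
    code, next_code, prev_len = {}, 0, 0
    for sym, ln in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        next_code <<= ln - prev_len   # standard canonical-code assignment
        code[sym] = format(next_code, f"0{ln}b")
        next_code += 1
        prev_len = ln
    return code

def adaptive_shannon_encode(s, alphabet):
    counts = Counter({a: 1 for a in alphabet})   # pseudo-count 1 per symbol
    out = []
    for ch in s:
        total = sum(counts.values())
        lengths = {a: math.ceil(math.log2(total / counts[a])) for a in alphabet}
        out.append(canonical_code(lengths)[ch])  # emit current codeword for ch
        counts[ch] += 1                          # adapt: update after encoding
    return "".join(out)

print(adaptive_shannon_encode("abracadabra", "abcdr"))
```

Since the encoder updates its counts only after emitting each codeword, a decoder that applies the same update rule always rebuilds the same code, which is what makes the scheme self-delimiting and one-pass.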
New Algorithms and Lower Bounds for Sequential-Access Data Compression
This thesis concerns sequential-access data compression, i.e., compression by
algorithms that read the input one or more times from beginning to end. In one chapter we
consider adaptive prefix coding, for which we must read the input character by
character, outputting each character's self-delimiting codeword before reading
the next one. We show how to encode and decode each character in constant
worst-case time while producing an encoding whose length is worst-case optimal.
In another chapter we consider one-pass compression with memory bounded in
terms of the alphabet size and context length, and prove a nearly tight
tradeoff between the amount of memory we can use and the quality of the
compression we can achieve. In a third chapter we consider compression in the
read/write streams model, which allows a number of passes and an amount of memory both
polylogarithmic in the size of the input. We first show how to achieve
universal compression using only one pass over one stream. We then show that
one stream is not sufficient for achieving good grammar-based compression.
Finally, we show that two streams are necessary and sufficient for achieving
entropy-only bounds.
Comment: draft of PhD thesis
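The memory-bounded one-pass setting of the second chapter can be illustrated with a small sketch, again my own toy rather than the thesis's construction: an order-k adaptive model keeps one frequency table per context of length k, so its memory is bounded by a function of the alphabet size σ and the context length k (at most σ^k tables of σ counters). The sketch reports the idealized code length, the sum of -log2 of the predicted probabilities, rather than actual coded bits.

```python
# Hypothetical illustration of one-pass compression with memory bounded in
# terms of alphabet size and context length: an order-k adaptive model using
# at most sigma^k tables of sigma counters. Returns the idealized code
# length sum(-log2 p) instead of actual coded output.
import math
from collections import defaultdict

def one_pass_code_length(s, alphabet, k):
    tables = defaultdict(lambda: dict.fromkeys(alphabet, 1))  # per-context counts
    bits, ctx = 0.0, ""
    for ch in s:
        table = tables[ctx]                    # model conditioned on last k chars
        total = sum(table.values())
        bits += -math.log2(table[ch] / total)  # ideal cost of coding ch
        table[ch] += 1                         # one-pass adaptive update
        ctx = (ctx + ch)[-k:]                  # memory bounded by context length
    return bits

print(round(one_pass_code_length("mississippi", "imps", 1), 2))
```

The thesis's tradeoff concerns what any such bounded-memory one-pass algorithm can achieve; the toy only shows where the dependence on σ and k enters.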
Efficient Fully-Compressed Sequence Representations
We present a data structure that stores a sequence s[1..n] over alphabet
[1..σ] in nH_0(s) + o(n)(H_0(s) + 1) bits, where H_0(s) is the zero-order
entropy of s. This structure supports the queries access, rank and select,
which are fundamental building blocks for many other compressed data
structures, in worst-case time O(lg lg σ) and average time O(lg H_0(s)). The
worst-case complexity matches the best previous results, yet these had been
achieved with data structures using nH_0(s) + o(n lg σ) bits. On highly
compressible sequences the o(n lg σ) bits of the redundancy may be
significant compared to the nH_0(s) bits that encode the data. Our
representation, instead, compresses the redundancy as well. Moreover, our
average-case complexity is unprecedented. Our technique is based on
partitioning the alphabet into characters of similar frequency. The
subsequence corresponding to each group can then be encoded using fast
uncompressed representations without harming the overall compression ratios,
even in the redundancy. The result also improves upon the best current
compressed representations of several other data structures. For example, we
achieve compressed redundancy, retaining the best time complexities, for the
smallest existing full-text self-indexes; compressed permutations π with the
times for π() and π^{-1}() improved to loglogarithmic; and the first
compressed representation of dynamic collections of disjoint sets. We also
point out various applications to inverted indexes, suffix arrays, binary
relations, and data compressors.
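The partitioning idea admits a tiny, deliberately slow Python sketch, an illustrative toy and not the paper's structure: symbols are grouped into classes of similar frequency, the sequence is split into a class sequence plus one subsequence per class, and access and rank on s reduce to the same operations on those parts. The paper's contribution is representing each part so the reduction runs in O(lg lg σ) worst-case time within compressed space; here plain linear scans stand in for the succinct components.

```python
# Toy illustration of alphabet partitioning (linear scans stand in for the
# paper's succinct rank/select components). Symbols are grouped into classes
# of similar frequency; a query on s reduces to a query on the class
# sequence followed by a query inside one class's subsequence.
import math
from collections import Counter

class PartitionedSequence:
    def __init__(self, s):
        freq = Counter(s)
        order = sorted(freq, key=freq.get, reverse=True)
        # Class of a symbol = floor(log2(1 + rank by frequency)), so each
        # class holds symbols of similar frequency.
        self.cls = {a: int(math.log2(i + 1)) for i, a in enumerate(order)}
        self.class_seq = [self.cls[c] for c in s]        # compressible part
        self.subseqs = [[] for _ in range(max(self.class_seq) + 1)]
        for c in s:                                      # plain, "fast" parts
            self.subseqs[self.cls[c]].append(c)

    def access(self, i):                      # s[i]
        k = self.class_seq[i]
        j = self.class_seq[:i].count(k)       # rank of class k before i
        return self.subseqs[k][j]

    def rank(self, a, i):                     # occurrences of a in s[:i]
        k = self.cls[a]
        j = self.class_seq[:i].count(k)
        return self.subseqs[k][:j].count(a)

ps = PartitionedSequence("abracadabra")
assert ps.access(3) == "a" and ps.rank("a", 4) == 2
```

Because frequent symbols land in small classes, the class sequence carries most of the entropy while each subsequence has a small effective alphabet, which is why the subsequences can afford fast uncompressed representations without hurting the overall space bound.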
Worst-Case Optimal Adaptive Prefix Coding
A common complaint about adaptive prefix coding is that it is much slower than static prefix coding. Karpinski and Nekrich recently took an important step towards resolving this: they gave an adaptive Shannon coding algorithm that encodes each character in O(1) amortized time and decodes it in O(log H) amortized time, where H is the empirical entropy of the input string s. For comparison, Gagie's adaptive Shannon coder and both Knuth's and Vitter's adaptive Huffman coders all use Θ(H) amortized time for each character. In this paper we give an adaptive Shannon coder that both encodes and decodes each character in O(1) worst-case time. As with both previous adaptive Shannon coders, we store s in at most (H + 1)|s| + o(|s|) bits. We also show that this encoding length is worst-case optimal up to the lower-order term.
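The symmetry between encoder and decoder is what makes adaptive decoding possible at all: the decoder applies the same count update after each decoded character, so it rebuilds exactly the code the encoder used. Here is a matching toy decoder, reusing canonical_code and adaptive_shannon_encode from the sketch after the first abstract above, and again without the constant-worst-case-time machinery that is this paper's actual contribution.

```python
# Toy decoder matching the adaptive Shannon encoder sketched earlier
# (reuses canonical_code and adaptive_shannon_encode from that sketch).
# Encoder and decoder keep identical counts, so both rebuild the same
# canonical code before each character.
import math
from collections import Counter

def adaptive_shannon_decode(bits, n, alphabet):
    counts = Counter({a: 1 for a in alphabet})
    out, pos = [], 0
    for _ in range(n):
        total = sum(counts.values())
        lengths = {a: math.ceil(math.log2(total / counts[a])) for a in alphabet}
        decode = {v: k for k, v in canonical_code(lengths).items()}
        word = ""
        while word not in decode:     # prefix-freeness: exactly one match
            word += bits[pos]
            pos += 1
        out.append(decode[word])
        counts[decode[word]] += 1     # same update rule as the encoder
    return "".join(out)

coded = adaptive_shannon_encode("abracadabra", "abcdr")
assert adaptive_shannon_decode(coded, 11, "abcdr") == "abracadabra"
```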
LIPIcs, Volume 244, ESA 2022, Complete Volume