5,050 research outputs found

Simple Worst-Case Optimal Adaptive Prefix-Free Coding

Author: Gagie Travis
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 30th Annual European Symposium on Algorithms (ESA 2022)
Publication date: 09/11/2021

Gagie and Nekrich (2009) gave an algorithm for adaptive prefix-free coding that, given a string $S[1..n]$ over the alphabet $\{1, \ldots, \sigma\}$ with $\sigma = o(n / \log^{5/2} n)$, encodes $S$ in at most $n(H + 1) + o(n)$ bits, where $H$ is the empirical entropy of $S$, such that encoding and decoding $S$ take $O(n)$ time. They also proved their bound on the encoding length is optimal, even when the empirical entropy is high. Their algorithm is impractical, however, because it uses complicated data structures. In this paper we give an algorithm with the same bounds, except that we require $\sigma = o(n^{1/2} / \log n)$, that uses no data structures more complicated than a lookup table. Moreover, when Gagie and Nekrich's algorithm is used for optimal adaptive alphabetic coding it takes $O(n \log \log n)$ time for decoding, but ours still takes $O(n)$ time.
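
To make the quantities in the bound concrete, the following minimal sketch computes the empirical entropy $H$ of a string and the leading term $n(H + 1)$ of the encoding-length bound. It is not the algorithm from the paper; the function names `empirical_entropy` and `leading_term_bits` are hypothetical, $H$ is assumed to mean the zeroth-order empirical entropy in bits per character, and the $o(n)$ term is left out.

```python
from collections import Counter
from math import log2


def empirical_entropy(s):
    """Zeroth-order empirical entropy of s, in bits per character.

    Assumption: H = sum over distinct characters c of (n_c / n) * log2(n / n_c),
    where n_c is the number of occurrences of c in s.
    """
    n = len(s)
    if n == 0:
        return 0.0
    counts = Counter(s)
    return sum((c / n) * log2(n / c) for c in counts.values())


def leading_term_bits(s):
    """Leading term n(H + 1) of the bound; the o(n) term is omitted."""
    return len(s) * (empirical_entropy(s) + 1)
```

For example, a string with two equally frequent characters has $H = 1$, so the leading term is $2n$ bits.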

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Simple Worst-Case Optimal Adaptive Prefix-Free Coding

Author: Gagie Travis
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 30th Annual European Symposium on Algorithms (ESA 2022)
Publication date: 01/01/2022

Dagstuhl Research Online Publication Server

New Algorithms and Lower Bounds for Sequential-Access Data Compression

Author: Gagie Travis
Publication date: 01/01/2009

This thesis concerns sequential-access data compression, i.e., by algorithms that read the input one or more times from beginning to end. In one chapter we consider adaptive prefix coding, for which we must read the input character by character, outputting each character's self-delimiting codeword before reading the next one. We show how to encode and decode each character in constant worst-case time while producing an encoding whose length is worst-case optimal. In another chapter we consider one-pass compression with memory bounded in terms of the alphabet size and context length, and prove a nearly tight tradeoff between the amount of memory we can use and the quality of the compression we can achieve. In a third chapter we consider compression in the read/write streams model, which allows us passes and memory both polylogarithmic in the size of the input. We first show how to achieve universal compression using only one pass over one stream. We then show that one stream is not sufficient for achieving good grammar-based compression. Finally, we show that two streams are necessary and sufficient for achieving entropy-only bounds. Comment: draft of PhD thesis

arXiv.org e-Print Archive

Publications at Bielefeld University

First-Come-First-Served for Online Slot Allocation and Huffman Coding

Author: Khare Monik, Mathieu Claire, Young Neal E.
Publication date: 07/10/2013

Can one choose a good Huffman code on the fly, without knowing the underlying distribution? Online Slot Allocation (OSA) models this and similar problems: There are n slots, each with a known cost. There are n items. Requests for items are drawn i.i.d. from a fixed but hidden probability distribution p. After each request, if the item, i, was not previously requested, then the algorithm (knowing the slot costs and the requests so far, but not p) must place the item in some vacant slot j(i). The goal is to minimize the sum, over the items, of the probability of the item times the cost of its assigned slot. The optimal offline algorithm is trivial: put the most probable item in the cheapest slot, the second most probable item in the second cheapest slot, etc. The optimal online algorithm is First Come First Served (FCFS): put the first requested item in the cheapest slot, the second (distinct) requested item in the second cheapest slot, etc. The optimal competitive ratios for any online algorithm are 1+H(n-1) ~ ln n for general costs and 2 for concave costs. For logarithmic costs, the ratio is, asymptotically, 1: FCFS gives cost opt + O(log opt). For Huffman coding, FCFS yields an online algorithm (one that allocates codewords on demand, without knowing the underlying probability distribution) that guarantees asymptotically optimal cost: at most opt + 2 log(1+opt) + 2. Comment: ACM-SIAM Symposium on Discrete Algorithms (SODA) 201
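
The FCFS rule described in this abstract can be sketched in a few lines. This is an illustrative reading of the abstract, not code from the paper: the names `fcfs_allocate` and `expected_cost` are hypothetical, slots are identified by their index in `slot_costs`, and the number of distinct requested items is assumed not to exceed the number of slots.

```python
def fcfs_allocate(requests, slot_costs):
    """Assign each newly seen item to the cheapest slot that is still vacant.

    requests:   iterable of hashable items, in arrival order
    slot_costs: list of slot costs; a slot is named by its index
    Returns a dict mapping item -> slot index.
    """
    # Slot indices ordered from cheapest to most expensive.
    cheapest_first = sorted(range(len(slot_costs)), key=lambda j: slot_costs[j])
    assignment = {}
    for item in requests:
        if item not in assignment:
            # The k-th distinct item takes the k-th cheapest slot.
            assignment[item] = cheapest_first[len(assignment)]
    return assignment


def expected_cost(assignment, probs, slot_costs):
    """Sum over items of (probability of the item) * (cost of its slot)."""
    return sum(probs[item] * slot_costs[slot] for item, slot in assignment.items())
```

With requests `["b", "a", "b", "c"]` and slot costs `[3, 1, 2]`, item `b` takes the cost-1 slot, `a` the cost-2 slot, and `c` the cost-3 slot, whatever the hidden probabilities are.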

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

On Probability Estimation by Exponential Smoothing

Author: Mattern Christopher
Publication date: 09/01/2015

Probability estimation is essential for every statistical data compression algorithm. In practice, probability estimation should be adaptive: recent observations should receive a higher weight than older observations. We present a probability estimation method based on exponential smoothing that satisfies this requirement and runs in constant time per letter. Our main contribution is a theoretical analysis in the case of a binary alphabet for various smoothing rate sequences: we show that the redundancy w.r.t. a piecewise stationary model with $s$ segments is $O(s\sqrt{n})$ for any bit sequence of length $n$, an improvement over the redundancy $O(s\sqrt{n \log n})$ of previous approaches with similar time complexity.
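
As a rough illustration of exponential smoothing for a binary alphabet, the sketch below keeps a single estimate of the probability that the next bit is 1, updates it in constant time per bit, and accumulates the ideal code length. It is an assumption-laden sketch rather than the paper's method: the function name `smoothed_code_length` is hypothetical, the smoothing rate is fixed (the abstract analyses various rate sequences), and the clamping constant is chosen here only to keep the logarithm finite.

```python
from math import log2


def smoothed_code_length(bits, rate=0.05, p_init=0.5, eps=1e-12):
    """Ideal code length, in bits, of a 0/1 sequence under a smoothed estimator.

    p is the current estimate of P(next bit = 1). After each bit b it is
    updated as p <- (1 - rate) * p + rate * b, so recent bits weigh more.
    Assumptions: fixed rate in (0, 1); p is clamped to [eps, 1 - eps].
    """
    p = p_init
    total = 0.0
    for b in bits:
        # Cost of coding b with the estimate held before seeing b.
        total += -log2(p if b == 1 else 1.0 - p)
        # Constant-time exponential smoothing update.
        p = (1.0 - rate) * p + rate * b
        p = min(max(p, eps), 1.0 - eps)
    return total
```

Comparing the returned length with the code length under the best piecewise stationary model for the same sequence gives the redundancy that the abstract bounds.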

arXiv.org e-Print Archive

Crossref

Real-time and distributed applications for dictionary-based data compression

Author: DE AGOSTINO Sergio
Publication venue: Petre Dini
Publication date: 01/01/2015

The greedy approach to dictionary-based static text compression can be executed by a finite state machine. When it is applied in parallel to different blocks of data independently, there is no lack of robustness even on standard large scale distributed systems with input files of arbitrary size. Beyond standard large scale, a negative effect on the compression effectiveness is caused by the very small size of the data blocks. A robust approach for extreme distributed systems is presented in this paper, where this problem is fixed by overlapping adjacent blocks and preprocessing the neighborhoods of the boundaries. Moreover, we introduce the notion of pseudo-prefix dictionary, which allows optimal compression by means of a real-time semi-greedy procedure and a slight improvement on the compression ratio obtained by the distributed implementations.
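
The greedy approach mentioned at the start of this abstract can be read as longest-match parsing against a fixed dictionary. The sketch below illustrates that reading only; it does not implement the finite state machine, the block overlapping, or the semi-greedy procedure from the paper. The name `greedy_parse` is hypothetical, and the dictionary is assumed to be a plain set of strings.

```python
def greedy_parse(text, dictionary):
    """Split text into dictionary phrases, always taking the longest match.

    dictionary: non-empty set of strings (a static dictionary).
    Raises ValueError if no phrase matches at some position.
    """
    max_len = max(len(phrase) for phrase in dictionary)
    phrases = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                phrases.append(candidate)
                i += length
                break
        else:
            raise ValueError(f"no dictionary phrase matches at position {i}")
    return phrases
```

Greedy parsing is not always optimal: with the dictionary `{"a", "b", "ab", "bbb"}` and the text `"abbb"`, it produces three phrases (`ab`, `b`, `b`) where two (`a`, `bbb`) would suffice.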

Archivio della ricerca- Università di Roma La Sapienza