Parallel data compression
Data compression schemes remove data redundancy in communicated and stored data and increase the effective capacities of communication and storage devices. Parallel algorithms and implementations for textual data compression are surveyed. Related concepts from parallel computation and information theory are briefly discussed. Static and dynamic methods for codeword construction and transmission on various models of parallel computation are described. Included are parallel methods which boost system speed by coding data concurrently, and approaches which employ multiple compression techniques to improve compression ratios. Theoretical and empirical comparisons are reported and areas for future research are suggested.
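To make the "coding data concurrently" idea concrete, here is a minimal Python sketch of block-parallel compression, assuming a simple scheme in which the input is cut into fixed-size blocks that are deflated independently and in parallel. The block size, worker count, use of zlib, and list-of-blocks output format are illustrative choices, not details from the survey.

    import zlib
    from concurrent.futures import ThreadPoolExecutor

    BLOCK_SIZE = 1 << 20  # 1 MiB blocks; an arbitrary choice

    def compress_parallel(data: bytes, workers: int = 4) -> list[bytes]:
        # Split the input into fixed-size blocks and deflate them concurrently.
        # CPython's zlib releases the GIL while compressing, so threads overlap.
        blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(zlib.compress, blocks))

    def decompress_parallel(blocks: list[bytes], workers: int = 4) -> bytes:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return b"".join(pool.map(zlib.decompress, blocks))

    payload = b"abracadabra" * 500_000
    assert decompress_parallel(compress_parallel(payload)) == payload

This trades compression ratio for speed: each block is coded with no shared context, so the ratio typically suffers slightly.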
First-Come-First-Served for Online Slot Allocation and Huffman Coding
Can one choose a good Huffman code on the fly, without knowing the underlying
distribution? Online Slot Allocation (OSA) models this and similar problems:
There are n slots, each with a known cost. There are n items. Requests for
items are drawn i.i.d. from a fixed but hidden probability distribution p.
After each request, if the item, i, was not previously requested, then the
algorithm (knowing the slot costs and the requests so far, but not p) must
place the item in some vacant slot j(i). The goal is to minimize the sum, over
the items, of the probability of the item times the cost of its assigned slot.
The optimal offline algorithm is trivial: put the most probable item in the
cheapest slot, the second most probable item in the second cheapest slot, etc.
The optimal online algorithm is First Come First Served (FCFS): put the first
requested item in the cheapest slot, the second (distinct) requested item in
the second cheapest slot, etc. The optimal competitive ratios for any online
algorithm are 1 + H(n-1) ~ ln n for general costs, where H(n-1) denotes the
(n-1)st harmonic number, and 2 for concave costs. For logarithmic costs, the
ratio is, asymptotically, 1: FCFS gives cost opt + O(log opt).
For Huffman coding, FCFS yields an online algorithm (one that allocates
codewords on demand, without knowing the underlying probability distribution)
that guarantees asymptotically optimal cost: at most opt + 2 log(1+opt) + 2.
Comment: ACM-SIAM Symposium on Discrete Algorithms (SODA) 2014
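As a concrete illustration of the FCFS rule and the objective above, here is a small Python simulation of OSA (the names and toy distribution are invented for illustration): it draws i.i.d. requests from a hidden distribution, assigns slots first-come-first-served, and compares the resulting expected cost to the trivial offline optimum.

    import random

    def fcfs_assign(slot_costs, requests):
        # The i-th distinct item requested gets the i-th cheapest vacant slot.
        costs = sorted(slot_costs)
        assignment = {}
        for item in requests:
            if item not in assignment:
                assignment[item] = costs[len(assignment)]
        return assignment

    def expected_cost(assignment, p):
        # Objective: sum over items of p(item) times the cost of its slot.
        return sum(p[item] * c for item, c in assignment.items())

    def offline_opt(slot_costs, p):
        # Most probable item in the cheapest slot, second in the second, etc.
        probs = sorted(p.values(), reverse=True)
        return sum(q * c for q, c in zip(probs, sorted(slot_costs)))

    items = ["a", "b", "c", "d"]
    p = dict(zip(items, (0.5, 0.3, 0.15, 0.05)))  # hidden from the algorithm
    slot_costs = [1, 2, 4, 8]                     # known to the algorithm
    requests = random.choices(items, weights=list(p.values()), k=1000)
    print(expected_cost(fcfs_assign(slot_costs, requests), p),
          offline_opt(slot_costs, p))

With logarithmic slot costs (codeword lengths), the same rule becomes the online Huffman-style coder the abstract describes.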
Decision Trees, Protocols, and the Fourier Entropy-Influence Conjecture
Given $f\colon \{-1,1\}^n \to \{-1,1\}$, define the \emph{spectral
distribution} of $f$ to be the distribution on subsets of $[n]$ in which the
set $S$ is sampled with probability $\hat{f}(S)^2$. Then the Fourier
Entropy-Influence (FEI) conjecture of Friedgut and Kalai (1996) states that
there is some absolute constant $C$ such that
$H[\hat{f}^2] \le C \cdot \mathrm{Inf}[f]$. Here, $H[\hat{f}^2]$ denotes the
Shannon entropy of $f$'s spectral distribution, and $\mathrm{Inf}[f]$ is the
total influence of $f$. This conjecture is one of the major open problems in
the analysis of Boolean functions, and settling it would have several
interesting consequences.
Previous results on the FEI conjecture have been obtained largely through
direct calculation. In this paper we study a natural interpretation of the
conjecture, which states that there exists a communication protocol which,
given a subset $S$ of $[n]$ distributed as $\hat{f}^2$, can communicate the
value of $S$ using at most $C \cdot \mathrm{Inf}[f]$ bits in expectation.
Using this interpretation, we are able to show the following results:
1. First, if $f$ is computable by a read-$k$ decision tree, then
$H[\hat{f}^2] \le O(k) \cdot \mathrm{Inf}[f]$.
2. Next, if $f$ has $\mathrm{Inf}[f] \ge 1$ and is computable by a decision
tree with expected depth $d$, then $H[\hat{f}^2] \le O(d) \cdot \mathrm{Inf}[f]$.
3. Finally, we give a new proof of the main theorem of O'Donnell and Tan
(ICALP 2013), i.e. that their FEI conjecture composes.
In addition, we show that natural improvements to our decision tree results
would be sufficient to prove the FEI conjecture in its entirety. We believe
that our methods give more illuminating proofs than previous results about the
FEI conjecture.
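To make these objects concrete, here is a brute-force Python sketch (not from the paper) that computes the spectral distribution, its Shannon entropy $H[\hat{f}^2]$, and the total influence $\mathrm{Inf}[f]$ of a small function; majority on three bits is an arbitrary example.

    from itertools import product
    from math import log2

    def fourier(f, n):
        # hat_f(S) = E_x[f(x) * prod_{i in S} x_i], with S encoded as a bitmask.
        points = list(product((-1, 1), repeat=n))
        coeffs = {}
        for S in range(1 << n):
            total = 0
            for x in points:
                chi = 1
                for i in range(n):
                    if S >> i & 1:
                        chi *= x[i]
                total += f(x) * chi
            coeffs[S] = total / (1 << n)
        return coeffs

    def entropy_and_influence(coeffs):
        # By Parseval, the weights hat_f(S)^2 of a +/-1-valued f sum to 1, so
        # they form a distribution; H is its entropy, Inf weights each S by |S|.
        H = inf = 0.0
        for S, c in coeffs.items():
            w = c * c
            if w > 0:
                H -= w * log2(w)
                inf += w * bin(S).count("1")
        return H, inf

    maj3 = lambda x: 1 if sum(x) > 0 else -1
    H, I = entropy_and_influence(fourier(maj3, 3))
    print(H, I)  # 2.0 and 1.5

For maj3 the spectral weight is 1/4 on each singleton and 1/4 on {1,2,3}, so $H = 2$ and $\mathrm{Inf} = 3/2$, consistent with the conjectured bound for any constant $C \ge 4/3$.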
Shannon Information and Kolmogorov Complexity
We compare the elementary theories of Shannon information and Kolmogorov
complexity, the extent to which they have a common purpose, and where they are
fundamentally different. We discuss and relate the basic notions of both
theories: Shannon entropy versus Kolmogorov complexity, the relation of both to
universal coding, Shannon mutual information versus Kolmogorov (`algorithmic')
mutual information, probabilistic sufficient statistic versus algorithmic
sufficient statistic (related to lossy compression in the Shannon theory versus
meaningful information in the Kolmogorov theory), and rate distortion theory
versus Kolmogorov's structure function. Part of the material has appeared in
print before, scattered through various publications, but this is the first
comprehensive systematic comparison. The last mentioned relations are new.
Comment: Survey, LaTeX, 54 pages, 3 figures. Submitted to IEEE Transactions on Information Theory
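A toy sketch of the contrast the survey draws, using compressed length as a computable upper-bound stand-in for Kolmogorov complexity (which is uncomputable); the example strings are invented. A string with maximal zeroth-order Shannon entropy over byte frequencies can still be algorithmically trivial:

    import os
    import zlib
    from collections import Counter
    from math import log2

    def empirical_entropy(s: bytes) -> float:
        # Zeroth-order Shannon entropy of the byte frequencies, bits per byte.
        n = len(s)
        return -sum(c / n * log2(c / n) for c in Counter(s).values())

    def compressed_rate(s: bytes) -> float:
        # Bits per byte after zlib: a crude computable upper bound in the
        # spirit of Kolmogorov complexity.
        return 8 * len(zlib.compress(s, 9)) / len(s)

    structured = bytes(range(256)) * 64      # uniform byte counts, trivial program
    randomish = os.urandom(len(structured))  # uniform byte counts, no structure

    for name, s in (("structured", structured), ("randomish", randomish)):
        print(name, round(empirical_entropy(s), 3), round(compressed_rate(s), 3))

Both strings score 8 bits per byte under the entropy measure, but only the random one resists compression, the kind of distinction the algorithmic theory captures and the purely probabilistic one does not.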
New Algorithms and Lower Bounds for Sequential-Access Data Compression
This thesis concerns sequential-access data compression, i.e., by algorithms
that read the input one or more times from beginning to end. In one chapter we
consider adaptive prefix coding, for which we must read the input character by
character, outputting each character's self-delimiting codeword before reading
the next one. We show how to encode and decode each character in constant
worst-case time while producing an encoding whose length is worst-case optimal.
In another chapter we consider one-pass compression with memory bounded in
terms of the alphabet size and context length, and prove a nearly tight
tradeoff between the amount of memory we can use and the quality of the
compression we can achieve. In a third chapter we consider compression in the
read/write streams model, which allows us a number of passes and an amount of memory both
polylogarithmic in the size of the input. We first show how to achieve
universal compression using only one pass over one stream. We then show that
one stream is not sufficient for achieving good grammar-based compression.
Finally, we show that two streams are necessary and sufficient for achieving
entropy-only bounds.
Comment: draft of PhD thesis
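The thesis's adaptive prefix coder runs in constant worst-case time per character with worst-case-optimal encoding length; reproducing that result is beyond a sketch, but the following toy Python coder (an Elias gamma code of each byte's move-to-front rank, a different and much weaker scheme) shows the interface the abstract describes: each character's self-delimiting codeword is emitted before the next character is read, and the decoder mirrors the encoder's state.

    def elias_gamma(n: int) -> str:
        # Self-delimiting code for n >= 1: len(bin(n))-1 zeros, then bin(n).
        b = bin(n)[2:]
        return "0" * (len(b) - 1) + b

    def gamma_decode(bits: str, pos: int) -> tuple[int, int]:
        zeros = 0
        while bits[pos] == "0":
            zeros += 1
            pos += 1
        return int(bits[pos:pos + zeros + 1], 2), pos + zeros + 1

    def encode(data: bytes) -> str:
        table = list(range(256))  # both sides start from the same table
        out = []
        for byte in data:
            r = table.index(byte)
            out.append(elias_gamma(r + 1))  # codeword out before next byte in
            table.insert(0, table.pop(r))   # move-to-front keeps hot ranks low
        return "".join(out)

    def decode(bits: str) -> bytes:
        table = list(range(256))
        out, pos = bytearray(), 0
        while pos < len(bits):
            n, pos = gamma_decode(bits, pos)
            out.append(table[n - 1])
            table.insert(0, table.pop(n - 1))
        return bytes(out)

    msg = b"abracadabra"
    assert decode(encode(msg)) == msg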
Bidirectional Text Compression in External Memory
Bidirectional compression algorithms work by replacing repeated substrings with references that, unlike in the famous LZ77 scheme, can point in either direction. We present such an algorithm that is particularly suited to an external-memory implementation. We evaluate it experimentally on large data sets of up to 128 GiB (using only 16 GiB of RAM) and show that it is significantly faster than all known LZ77 compressors while producing a roughly similar number of factors. We also introduce an external-memory decompressor for texts compressed with any uni- or bidirectional compression scheme.
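A toy in-memory decoder illustrating the factor format the abstract describes (the paper's decompressor works in external memory; this representation with ("lit", bytes) and ("ref", src, length) tuples is invented for illustration). Because a reference's absolute source position may lie before or after the factor itself, forward references can require several passes before their source bytes materialize:

    def decode_bidirectional(factors):
        # Lay out where each factor's output goes in the text.
        spans, pos = [], 0
        for f in factors:
            spans.append(pos)
            pos += len(f[1]) if f[0] == "lit" else f[2]
        text = [None] * pos
        # Repeatedly copy whatever is already known until nothing changes.
        changed = True
        while changed and any(b is None for b in text):
            changed = False
            for f, start in zip(factors, spans):
                if f[0] == "lit":
                    for i, b in enumerate(f[1]):
                        if text[start + i] is None:
                            text[start + i] = b
                            changed = True
                else:
                    _, src, length = f
                    for i in range(length):
                        if text[start + i] is None and text[src + i] is not None:
                            text[start + i] = text[src + i]
                            changed = True
        return bytes(text)

    # "banana": a literal "ban" plus a factor referring back to position 1.
    assert decode_bidirectional([("lit", b"ban"), ("ref", 1, 3)]) == b"banana"

A real external-memory decoder cannot afford random access into the text, which is what makes the problem the abstract addresses nontrivial.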