6,717 research outputs found

    Results on the Redundancy of Universal Compression for Finite-Length Sequences

    Full text link
    In this paper, we investigate the redundancy of universal coding schemes on smooth parametric sources in the finite-length regime. We derive an upper bound on the probability of the event that a sequence of length nn, chosen using Jeffreys' prior from the family of parametric sources with dd unknown parameters, is compressed with a redundancy smaller than (1−ϔ)d2log⁥n(1-\epsilon)\frac{d}{2}\log n for any Ï”>0\epsilon>0. Our results also confirm that for large enough nn and dd, the average minimax redundancy provides a good estimate for the redundancy of most sources. Our result may be used to evaluate the performance of universal source coding schemes on finite-length sequences. Additionally, we precisely characterize the minimax redundancy for two--stage codes. We demonstrate that the two--stage assumption incurs a negligible redundancy especially when the number of source parameters is large. Finally, we show that the redundancy is significant in the compression of small sequences.Comment: accepted in the 2011 IEEE International Symposium on Information Theory (ISIT 2011

    Universal lossless source coding with the Burrows Wheeler transform

    Get PDF
    The Burrows Wheeler transform (1994) is a reversible sequence transformation used in a variety of practical lossless source-coding algorithms. In each, the BWT is followed by a lossless source code that attempts to exploit the natural ordering of the BWT coefficients. BWT-based compression schemes are widely touted as low-complexity algorithms giving lossless coding rates better than those of the Ziv-Lempel codes (commonly known as LZ'77 and LZ'78) and almost as good as those achieved by prediction by partial matching (PPM) algorithms. To date, the coding performance claims have been made primarily on the basis of experimental results. This work gives a theoretical evaluation of BWT-based coding. The main results of this theoretical evaluation include: (1) statistical characterizations of the BWT output on both finite strings and sequences of length n → ∞, (2) a variety of very simple new techniques for BWT-based lossless source coding, and (3) proofs of the universality and bounds on the rates of convergence of both new and existing BWT-based codes for finite-memory and stationary ergodic sources. The end result is a theoretical justification and validation of the experimentally derived conclusions: BWT-based lossless source codes achieve universal lossless coding performance that converges to the optimal coding performance more quickly than the rate of convergence observed in Ziv-Lempel style codes and, for some BWT-based codes, within a constant factor of the optimal rate of convergence for finite-memory source

    A Universal Parallel Two-Pass MDL Context Tree Compression Algorithm

    Full text link
    Computing problems that handle large amounts of data necessitate the use of lossless data compression for efficient storage and transmission. We present a novel lossless universal data compression algorithm that uses parallel computational units to increase the throughput. The length-NN input sequence is partitioned into BB blocks. Processing each block independently of the other blocks can accelerate the computation by a factor of BB, but degrades the compression quality. Instead, our approach is to first estimate the minimum description length (MDL) context tree source underlying the entire input, and then encode each of the BB blocks in parallel based on the MDL source. With this two-pass approach, the compression loss incurred by using more parallel units is insignificant. Our algorithm is work-efficient, i.e., its computational complexity is O(N/B)O(N/B). Its redundancy is approximately Blog⁥(N/B)B\log(N/B) bits above Rissanen's lower bound on universal compression performance, with respect to any context tree source whose maximal depth is at most log⁥(N/B)\log(N/B). We improve the compression by using different quantizers for states of the context tree based on the number of symbols corresponding to those states. Numerical results from a prototype implementation suggest that our algorithm offers a better trade-off between compression and throughput than competing universal data compression algorithms.Comment: Accepted to Journal of Selected Topics in Signal Processing special issue on Signal Processing for Big Data (expected publication date June 2015). 10 pages double column, 6 figures, and 2 tables. arXiv admin note: substantial text overlap with arXiv:1405.6322. Version: Mar 2015: Corrected a typ

    A Parallel Two-Pass MDL Context Tree Algorithm for Universal Source Coding

    Full text link
    We present a novel lossless universal source coding algorithm that uses parallel computational units to increase the throughput. The length-NN input sequence is partitioned into BB blocks. Processing each block independently of the other blocks can accelerate the computation by a factor of BB, but degrades the compression quality. Instead, our approach is to first estimate the minimum description length (MDL) source underlying the entire input, and then encode each of the BB blocks in parallel based on the MDL source. With this two-pass approach, the compression loss incurred by using more parallel units is insignificant. Our algorithm is work-efficient, i.e., its computational complexity is O(N/B)O(N/B). Its redundancy is approximately Blog⁥(N/B)B\log(N/B) bits above Rissanen's lower bound on universal coding performance, with respect to any tree source whose maximal depth is at most log⁥(N/B)\log(N/B)

    On the Network-Wide Gain of Memory-Assisted Source Coding

    Full text link
    Several studies have identified a significant amount of redundancy in the network traffic. For example, it is demonstrated that there is a great amount of redundancy within the content of a server over time. This redundancy can be leveraged to reduce the network flow by the deployment of memory units in the network. The question that arises is whether or not the deployment of memory can result in a fundamental improvement in the performance of the network. In this paper, we answer this question affirmatively by first establishing the fundamental gains of memory-assisted source compression and then applying the technique to a network. Specifically, we investigate the gain of memory-assisted compression in random network graphs consisted of a single source and several randomly selected memory units. We find a threshold value for the number of memories deployed in a random graph and show that if the number of memories exceeds the threshold we observe network-wide reduction in the traffic.Comment: To appear in 2011 IEEE Information Theory Workshop (ITW 2011
    • 

    corecore