
    Analysis of Arithmetic Coding for Data Compression

    Arithmetic coding, in conjunction with a suitable probabilistic model, can provide nearly optimal data compression. In this article we analyze the effect that the model and the particular implementation of arithmetic coding have on the code length obtained. Periodic scaling is often used in arithmetic coding implementations to reduce time and storage requirements; it also introduces a recency effect which can further affect compression. Our main contribution is introducing the concept of weighted entropy and using it to characterize in an elegant way the effect that periodic scaling has on the code length. We explain why and by how much scaling increases the code length for files with a homogeneous distribution of symbols, and we characterize the reduction in code length due to scaling for files exhibiting locality of reference. We also give a rigorous proof that the coding effects of rounding scaled weights, using integer arithmetic, and encoding end-of-file are negligible.
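    To make the periodic-scaling idea concrete, here is a minimal Python sketch of an adaptive order-0 frequency model that halves its counts whenever the total exceeds a bound. This is an illustrative toy, not the implementation analyzed in the paper; the alphabet size, scaling threshold, and sample input are arbitrary choices.

```python
import math

MAX_TOTAL = 512  # rescale whenever the total count exceeds this bound

class ScaledFrequencyModel:
    """Adaptive order-0 frequency model with periodic scaling."""

    def __init__(self, alphabet_size):
        # start every symbol at count 1 so no probability is ever zero
        self.counts = [1] * alphabet_size
        self.total = alphabet_size

    def probability(self, symbol):
        return self.counts[symbol] / self.total

    def update(self, symbol):
        self.counts[symbol] += 1
        self.total += 1
        if self.total > MAX_TOTAL:
            self._rescale()

    def _rescale(self):
        # halve every count (keeping each at least 1); after a rescale,
        # new observations regain weight faster than old ones, which is
        # the recency effect discussed in the abstract
        self.counts = [max(1, c // 2) for c in self.counts]
        self.total = sum(self.counts)

# ideal code length (in bits) an arithmetic coder driven by this model
# would assign, ignoring rounding and end-of-file overhead
model = ScaledFrequencyModel(256)
bits = 0.0
for symbol in b"abracadabra" * 100:
    bits += -math.log2(model.probability(symbol))
    model.update(symbol)
print(f"{bits:.1f} bits for {11 * 100} symbols")
```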

    On Probability Estimation by Exponential Smoothing

    Probability estimation is essential for every statistical data compression algorithm. In practice probability estimation should be adaptive: recent observations should receive a higher weight than older observations. We present a probability estimation method based on exponential smoothing that satisfies this requirement and runs in constant time per letter. Our main contribution is a theoretical analysis in the case of a binary alphabet for various smoothing rate sequences: we show that the redundancy w.r.t. a piecewise stationary model with $s$ segments is $O(s\sqrt{n})$ for any bit sequence of length $n$, an improvement over the redundancy $O(s\sqrt{n\log n})$ of previous approaches with similar time complexity.
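    The basic estimator can be sketched in a few lines of Python. The fixed smoothing rate alpha below is one illustrative choice and not necessarily one of the smoothing rate sequences analyzed in the paper; the two-segment input is a toy piecewise stationary source.

```python
import math

def smoothed_probabilities(bits, alpha=0.05, p_init=0.5):
    """Yield the estimated probability of a 1 just before each bit is seen."""
    p = p_init
    for b in bits:
        yield p
        p = (1.0 - alpha) * p + alpha * b  # exponential smoothing update

# toy piecewise stationary source with two segments
sequence = [0] * 200 + [1] * 200
code_length = 0.0
for p, b in zip(smoothed_probabilities(sequence), sequence):
    q = p if b == 1 else 1.0 - p
    code_length += -math.log2(q)  # ideal code length under the estimate
print(f"{code_length:.1f} bits for {len(sequence)} input bits")
```

    Because the update touches only the single running estimate p, the work per letter is constant, matching the time bound claimed in the abstract.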

    Data Compression For Multimedia Computing

    This is a library-based study on data compression for multimedia computing. Multimedia information needs a large storage capacity because it contains vast amounts of data. This would put multimedia information out of reach of most computer users, as their PCs would not be able to store the enormous amount of data such programs accumulate. However, it is not necessary to keep these data in their original form, as there are techniques that can compress multimedia data to a more manageable level. Therefore, the main objective of this study is to provide information on the availability of compression techniques that give PC users the opportunity to use such programs. The review of related literature reveals that there are two basic compression techniques available: lossless and lossy. Under the lossless technique, Huffman Coding, Arithmetic Coding and Lempel-Ziv-Welch Coding are discussed. On the other hand, the Predictive, Frequency-Oriented and Importance-Oriented techniques are discussed under the lossy technique. Besides these two main techniques, hybrid techniques such as JPEG, MPEG and Px64 are also discussed. In order to tie the discussion of compression to storage media, a description of popular storage media such as magnetic disk storage and optical disc storage is also included. Although the data are from secondary sources, the writer uses a formula derived from Howard and Vitter (1992) to measure compression efficiency. Based on the data collection and analysis, it is found that different types of data (text, audio, video, etc.) should be compressed using different techniques in order to obtain the ideal compression ratio and quality. Although the writer believes that the secondary data obtained are sufficient to show the best compression techniques for the different types of multimedia data, he also believes that real experiments using real data, software applications and hardware would give better and more precise results.
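    As a simple illustration of measuring compression efficiency, the snippet below reports the compression ratio and space savings of a standard lossless codec. These are the usual definitions, not necessarily the exact formula the study derives from Howard and Vitter (1992), and zlib stands in for whatever compressor is being evaluated.

```python
import zlib

def compression_stats(data: bytes):
    compressed = zlib.compress(data, level=9)
    ratio = len(data) / len(compressed)          # e.g. 3.0 means 3:1
    savings = 1.0 - len(compressed) / len(data)  # fraction of space saved
    return ratio, savings

text = b"multimedia data often contains redundancy " * 200
ratio, savings = compression_stats(text)
print(f"compression ratio {ratio:.2f}:1, space savings {savings:.1%}")
```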

    On optimally partitioning a text to improve its compression

    In this paper we investigate the problem of partitioning an input string T in such a way that compressing its parts individually via a base compressor C yields a compressed output that is shorter than applying C over the entire T at once. This problem was introduced in the context of table compression, and then further elaborated and extended to strings and trees. Unfortunately, the literature offers poor solutions: namely, we know either a cubic-time algorithm for computing the optimal partition based on dynamic programming, or a few heuristics that do not guarantee any bounds on the efficacy of their computed partition, or algorithms that are efficient but work only in some specific scenarios (such as the Burrows-Wheeler Transform) and achieve compression performance that might be worse than the optimal partitioning by an $\Omega(\sqrt{\log n})$ factor. Therefore, computing the optimal solution efficiently is still open. In this paper we provide the first algorithm which is guaranteed to compute in $O(n \log_{1+\epsilon} n)$ time a partition of T whose compressed output is guaranteed to be no more than $(1+\epsilon)$-worse than the optimal one, where $\epsilon$ may be any positive constant.
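    For orientation, the slow dynamic program the paper improves upon (not the paper's own $(1+\epsilon)$-approximation) can be sketched as follows, with zlib standing in for the base compressor C; the recurrence is best[j] = min over i < j of best[i] + |C(T[i:j])|.

```python
import zlib

def optimal_partition_cost(T: bytes):
    """Exhaustive DP: best[j] = min over i < j of best[i] + |C(T[i:j])|."""
    n = len(T)
    best = [0.0] + [float("inf")] * n  # best[j]: min compressed size of T[:j]
    cut = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + len(zlib.compress(T[i:j]))
            if cost < best[j]:
                best[j], cut[j] = cost, i
    # recover the chosen partition boundaries
    bounds, j = [], n
    while j > 0:
        bounds.append((cut[j], j))
        j = cut[j]
    return best[n], bounds[::-1]

# a string whose two halves compress much better separately than together
cost, parts = optimal_partition_cost(b"aaaaaaaaaa" * 10 + b"qwertyuiop" * 10)
print(cost, parts)
```

    With $O(n^2)$ candidate cuts and a compression call per candidate, this baseline is far too slow for large inputs, which is exactly the gap the paper's near-linear-time algorithm addresses.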