Analysis of Arithmetic Coding for Data Compression
Arithmetic coding, in conjunction with a suitable probabilistic model, can provide nearly optimal data compression. In this article we analyze the effect that the model and the particular implementation of arithmetic coding have on the code length obtained. Periodic scaling is often used in arithmetic coding implementations to reduce time and storage requirements; it also introduces a recency effect which can further affect compression. Our main contribution is introducing the concept of weighted entropy and using it to characterize in an elegant way the effect that periodic scaling has on the code length. We explain why and by how much scaling increases the code length for files with a homogeneous distribution of symbols, and we characterize the reduction in code length due to scaling for files exhibiting locality of reference. We also give a rigorous proof that the coding effects of rounding scaled weights, using integer arithmetic, and encoding end-of-file are negligible.
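The periodic scaling described above can be sketched with a simple adaptive frequency model (an illustrative sketch, not the paper's implementation; the class name and threshold are assumptions): symbol counts are halved whenever their total reaches a bound, which both keeps the integer weights small and gives recent symbols more influence, producing the recency effect.

```python
# Sketch of periodic count scaling in an adaptive frequency model
# (illustrative; names and the threshold are assumptions).

MAX_TOTAL = 256  # rescale when the total count reaches this bound

class ScalingModel:
    def __init__(self, alphabet):
        # Start every symbol at count 1 so no probability is ever zero.
        self.counts = {s: 1 for s in alphabet}

    def update(self, symbol):
        self.counts[symbol] += 1
        if sum(self.counts.values()) >= MAX_TOTAL:
            self._rescale()

    def _rescale(self):
        # Halve all counts (rounding up keeps them positive). Older
        # observations lose weight, which is the "recency effect".
        for s in self.counts:
            self.counts[s] = (self.counts[s] + 1) // 2

    def prob(self, symbol):
        return self.counts[symbol] / sum(self.counts.values())

model = ScalingModel("ab")
for _ in range(300):
    model.update("a")
# After many occurrences of 'a', P(a) is close to 1 but never reaches it.
print(round(model.prob("a"), 3))
```

Because counts are repeatedly halved, a symbol seen long ago contributes exponentially less than one seen recently, which is exactly the locality-of-reference behavior the abstract analyzes.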
On Probability Estimation by Exponential Smoothing
Probability estimation is essential for every statistical data compression algorithm. In practice, probability estimation should be adaptive: recent observations should receive a higher weight than older ones. We present a probability estimation method based on exponential smoothing that satisfies this requirement and runs in constant time per letter. Our main contribution is a theoretical analysis, in the case of a binary alphabet, for various smoothing rate sequences: we show that the redundancy w.r.t. a piecewise stationary model with segments is, for any bit sequence of length , an improvement over the redundancy of previous approaches with similar time complexity.
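The estimator itself can be sketched in a few lines for the binary case (an illustrative sketch with a single fixed smoothing rate assumed for simplicity; the paper analyzes whole sequences of smoothing rates):

```python
# Binary probability estimation by exponential smoothing (sketch).
# The fixed rate ALPHA is an assumption; the paper studies various
# smoothing-rate sequences rather than one constant.

ALPHA = 0.05  # smoothing rate: weight given to the newest bit

def smooth(bits, p0=0.5, alpha=ALPHA):
    """Return the running estimate of P(bit = 1), updated in O(1) per bit."""
    p = p0
    for b in bits:
        # Geometric decay of older observations: the bit seen k steps
        # ago contributes with weight alpha * (1 - alpha)**k.
        p = (1 - alpha) * p + alpha * b
    return p

# The estimate adapts to a change in the source statistics: after a long
# run of zeros followed by a long run of ones, it tracks the recent segment.
print(smooth([0] * 200 + [1] * 200))
```

The single multiply-and-add per symbol is what gives the constant time per letter claimed in the abstract, and the geometric decay is what makes the estimator track piecewise stationary sources.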
Data Compression For Multimedia Computing
This is a library-based study on data compression for multimedia computing. Multimedia information needs a large storage capacity because it contains vast amounts of data. This would put multimedia information out of reach of most computer users, as their PCs would not be able to store the enormous amount of data such programs accumulate. However, it is not necessary to keep these data in their original form, since there are techniques that can compress multimedia data to a more manageable level. The main objective of this study is therefore to provide information on the available compression techniques that would give PC users the opportunity to use such programs. The review of related literature reveals that there are two basic compression techniques: lossless and lossy. Under the lossless technique, Huffman coding, arithmetic coding and Lempel-Ziv-Welch coding are discussed; predictive, frequency-oriented and importance-oriented techniques are discussed under the lossy technique. Besides these two main approaches, hybrid techniques such as JPEG, MPEG and Px64 are also discussed. To connect the discussion of compression with storage media, a description of popular storage media such as magnetic disk storage and optical disc storage is also included. Although the data are from secondary sources, the writer uses a formula derived from Howard and Vitter (1992) to measure compression efficiency. Based on the data collection and analysis, it is found that different types of data (text, audio, video, etc.) should be compressed using different techniques in order to obtain the ideal compression ratio and quality. Although the writer believes that the secondary data obtained are sufficient to show the best compression techniques for the different types of multimedia data, he also believes that real experiments using real data, software applications and hardware would give better and more precise results.
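The study's central finding, that different data types call for different techniques, is easy to demonstrate with a standard lossless compressor (an illustration using zlib as a stand-in for the surveyed techniques; this is not the Howard-and-Vitter formula the study applies):

```python
# Illustration: the same lossless compressor achieves very different
# compression ratios on different kinds of data. zlib stands in for the
# surveyed techniques; this is not the study's own measurement.
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Original size divided by compressed size (higher is better)."""
    return len(data) / len(zlib.compress(data))

text = b"the quick brown fox jumps over the lazy dog " * 200
noise = os.urandom(len(text))  # stands in for already-compressed media

print(f"repetitive text: {compression_ratio(text):.1f}x")
print(f"random bytes:    {compression_ratio(noise):.2f}x")
```

Highly redundant text compresses many times over, while random bytes (which resemble already-compressed audio or video) barely compress at all, which is why lossy techniques exist for media in the first place.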
On optimally partitioning a text to improve its compression
In this paper we investigate the problem of partitioning an input string T in such a way that compressing its parts individually via a base compressor C yields a compressed output that is shorter than applying C over the entire T at once. This problem was introduced in the context of table compression, and then further elaborated and extended to strings and trees. Unfortunately, the literature offers only partial solutions: we know either a cubic-time algorithm for computing the optimal partition based on dynamic programming, a few heuristics that do not guarantee any bounds on the efficacy of their computed partition, or algorithms that are efficient but work only in specific scenarios (such as the Burrows-Wheeler Transform) and achieve compression performance that might be worse than the optimal partitioning by a factor. Computing the optimal solution efficiently is therefore still open. In this paper we provide the first algorithm which is guaranteed to compute in O(n \log_{1+\epsilon} n) time a partition of T whose compressed output is guaranteed to be no more than (1+\epsilon)-worse than the optimal one, where \epsilon may be any positive constant.
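The cubic-time dynamic program mentioned above can be sketched directly (an illustrative sketch using zlib as the base compressor C, which is an assumption; the paper's contribution is precisely an algorithm far more efficient than this one):

```python
# Sketch of the cubic-time dynamic program for optimal partitioning:
# dp[i] = minimum total compressed size of the prefix T[:i], choosing the
# best last segment T[j:i]. zlib stands in for the base compressor C.
import zlib

def optimal_partition(T: bytes):
    n = len(T)
    cost = lambda j, i: len(zlib.compress(T[j:i]))
    dp = [0] + [float("inf")] * n   # dp[i]: best cost for prefix T[:i]
    cut = [0] * (n + 1)             # cut[i]: start of the last segment
    for i in range(1, n + 1):
        for j in range(i):          # O(n^2) pairs, each compression O(n)
            c = dp[j] + cost(j, i)
            if c < dp[i]:
                dp[i], cut[i] = c, j
    # Recover the segment boundaries by walking the cut links backwards.
    parts, i = [], n
    while i > 0:
        parts.append((cut[i], i))
        i = cut[i]
    return dp[n], parts[::-1]

T = b"aaaaaaaaaaaaaaaa" + b"abcdefghijklmnop" * 4
best, parts = optimal_partition(T)
print(best, parts)
```

Since the single segment T[0:n] is one of the candidates, the optimum is never worse than compressing T whole; the cubic cost comes from the O(n^2) candidate segments, each compressed in time linear in its length, which is exactly why a near-linear-time approximation is valuable.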