Optimal Parsing for Dictionary Text Compression
Dictionary-based compression algorithms include a parsing strategy that transforms the input text into a sequence of dictionary phrases. Given a text, such a decomposition is usually not unique and, for compression purposes, it makes sense to find one of the possible parsings that minimizes the final compression ratio. This is the parsing problem. An optimal parsing is a parsing strategy, or a parsing algorithm, that solves the parsing problem while taking into account all the constraints of a compression algorithm or of a class of homogeneous compression algorithms. Such constraints are, for instance, the dictionary itself, i.e. the dynamic set of available phrases, and how much a phrase weighs on the compressed text, i.e. the number of bits composing the codeword that represents the phrase, also called the encoding cost of a dictionary pointer.
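The parsing problem above can be made concrete with a small sketch. Assuming, as a hypothetical simplification, a static dictionary with an explicit per-phrase bit cost (the thesis handles dynamic dictionaries), an optimal parsing is a shortest path over text positions:

```python
def optimal_parse(text, dictionary):
    """Minimum-cost parse of `text` into dictionary phrases.

    `dictionary` maps each phrase to its encoding cost in bits.
    Dynamic programming over text positions: a shortest path on the
    DAG whose edges are phrase occurrences weighted by codeword cost.
    """
    n = len(text)
    INF = float("inf")
    best = [0] + [INF] * n   # best[i] = min bits to encode text[:i]
    back = [None] * (n + 1)  # phrase ending at position i on a best path
    for i in range(n):
        if best[i] == INF:
            continue
        for phrase, cost in dictionary.items():
            if text.startswith(phrase, i):
                j = i + len(phrase)
                if best[i] + cost < best[j]:
                    best[j] = best[i] + cost
                    back[j] = phrase
    phrases, i = [], n  # walk the back pointers to recover the parse
    while i > 0:
        phrases.append(back[i])
        i -= len(back[i])
    return best[n], phrases[::-1]
```

For example, with the (made-up) dictionary `{"a": 4, "aa": 9, "ab": 5, "b": 4}`, the text `"aab"` is optimally parsed as `"a" + "ab"` for 9 bits, whereas a greedy longest-match parse (`"aa" + "b"`) costs 13 bits. This is exactly the kind of gap that an optimal parsing closes.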
In more than thirty years of history of dictionary-based text compression, while plenty of algorithms, variants and extensions have appeared, and while the dictionary approach to text compression has become one of the most appreciated and widely used in almost all storage and communication processes, only a few optimal parsing algorithms have been presented. Many compression algorithms still lack an optimal parsing or, at least, a proof of optimality. This happens because there is no general model of the parsing problem that covers all dictionary-based algorithms, and because the existing optimal parsing algorithms work under overly restrictive hypotheses.
This work focuses on the parsing problem and presents both a general model for dictionary-based text compression, called Dictionary-Symbolwise Text Compression theory, and a general parsing algorithm that is proved to be optimal under some realistic hypotheses. This algorithm is called
Dictionary-Symbolwise Flexible Parsing, and it covers almost all known cases of dictionary-based text compression algorithms, together with the large class of their variants in which the text is decomposed into a sequence of symbols and dictionary phrases.
In this work we further consider the case of a free mixture of a dictionary compressor and a symbolwise compressor. Our Dictionary-Symbolwise Flexible Parsing covers this case as well. Indeed, we have an optimal parsing algorithm for dictionary-symbolwise compression when the dictionary is prefix-closed and the cost of encoding a dictionary pointer is variable. The symbolwise compressor can be any classical one that works in linear time, as many common variable-length encoders do. Our algorithm works under the assumption that a special graph, described in the following, is well defined. Even when this condition is not satisfied, the same method can be used to obtain almost-optimal parses. In particular, when the dictionary is LZ78-like, we show how to implement our algorithm in linear time. When the dictionary is LZ77-like, our algorithm can be implemented in O(n log n) time. Both versions have O(n) space complexity.
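A minimal dynamic-programming sketch of the dictionary-symbolwise idea follows. It is a hypothetical simplification (static dictionary, fixed per-symbol cost, one-bit flags), not the linear-time construction described above: every text position may be covered either by a dictionary phrase, paying its pointer cost, or by a single literal symbol, paying a symbolwise cost.

```python
def ds_parse(text, dictionary, symbol_cost, flag_cost=1):
    """Minimum cost (in bits) of a dictionary-symbolwise parse.

    Each step emits either a dictionary phrase (its pointer cost) or a
    single literal symbol (`symbol_cost`), preceded by a `flag_cost`-bit
    flag telling the decoder which of the two it is.
    """
    n = len(text)
    INF = float("inf")
    best = [0] + [INF] * n  # best[i] = min bits to encode text[:i]
    for i in range(n):
        if best[i] == INF:
            continue
        # Symbolwise edge: encode one literal symbol.
        best[i + 1] = min(best[i + 1], best[i] + flag_cost + symbol_cost)
        # Dictionary edges: every phrase occurring at position i.
        for phrase, cost in dictionary.items():
            if text.startswith(phrase, i):
                j = i + len(phrase)
                best[j] = min(best[j], best[i] + flag_cost + cost)
    return best[n]
```

The free mixture is visible in the recurrence: at every position the cheaper of the two edge types wins, so the parser falls back to literals exactly where dictionary pointers do not pay off.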
Even though the main aim of this work is theoretical, some experimental results are presented to highlight the practical effects of parsing optimality on compression performance, and to show how to improve the compression ratio by building Dictionary-Symbolwise extensions of known algorithms. Finally, some more detailed experiments are reported in a dedicated appendix.
Speeding up Lossless Image Compression: Experimental Results on a Parallel Machine
Arithmetic encoders enable the best compressors both for bi-level images (JBIG) and for grey scale and color images (CALIC), but they are often ruled out because they are too complex. The compression gap between simpler techniques and state-of-the-art compressors can be significant. Storer extended dictionary text compression to bi-level images to avoid arithmetic encoders (BLOCK MATCHING), achieving 70 percent of the compression of JBIG1 on the CCITT bi-level image test set. We were able to partition an image into up to a hundred areas and to apply the BLOCK MATCHING heuristic independently to each area with no loss of compression effectiveness.
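The area-partitioning idea can be sketched as follows. This is a hypothetical illustration that uses zlib as the per-block compressor in place of BLOCK MATCHING: because each block is compressed independently, the blocks can be distributed over processors with no communication cost.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_blocks(data: bytes, block_size: int, workers: int = 4):
    """Split `data` into independent blocks and compress each one.

    No block needs information from any other, so the map step can be
    assigned to separate processors at no communication cost (threads
    are used here only to keep the sketch self-contained).
    """
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, blocks))

def decompress_blocks(compressed):
    """Decompress and reassemble the independently compressed blocks."""
    return b"".join(zlib.decompress(b) for b in compressed)
```

The trade-off mirrored from the abstract: independence of blocks buys scalability for free, at the price of losing cross-block redundancy, which is why the experimental finding that up to a hundred areas cost no compression effectiveness is notable.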
Scalability and communication in parallel low-complexity lossless compression
Approximation schemes for optimal compression with static and sliding dictionaries which can run on a simple array of processors with distributed memory and no interconnections are presented. These approximation algorithms can be implemented on both small and large scale parallel systems. The sliding dictionary method requires large files on large scale systems. As far as lossless image compression is concerned, arithmetic encoders enable the best lossless compressors but they are often ruled out because they are too complex. Storer extended dictionary text compression to bi-level images to avoid arithmetic encoders (BLOCK MATCHING). We were able to partition an image into up to a hundred areas and to apply the BLOCK MATCHING heuristic independently to each area with no loss of compression effectiveness. Therefore, the approach is suitable for a small scale parallel system at no communication cost. On the other hand, bi-level image compression seems to require communication on large scale systems. With regard to grey scale and color images, parallelizable lossless image compression (PALIC) is a highly parallelizable and scalable lossless compressor since it is applied independently to blocks of 8 × 8 pixels. We experimented with the BLOCK MATCHING and PALIC heuristics on up to 32 processors of a machine with 256 Intel Xeon 3.06 GHz processors (http://avogadro.cilea.it), using a test set of large topographic bi-level images and color images in RGB format. We obtained the expected speed-up of the compression and decompression times, achieving parallel running times about 25 times faster than the sequential ones. Finally, scalable algorithms computing static and sliding dictionary optimal text compression on an exclusive read, exclusive write shared memory parallel machine are presented.
On the same model, it is shown that compression by block matching of bi-level images can be implemented on a full binary tree architecture, under some realistic assumptions, with no scalability issues. © 2010 Birkhäuser / Springer Basel AG