Modification of the J-Bit Encoding Algorithm to Improve the Compression Ratio
J-bit encoding is a lossless data compression algorithm that manipulates every bit of data in a file to minimize its size by splitting the data into two outputs, which are later recombined into a single output. This paper proposes a modification of the J-bit encoding algorithm that eliminates the zero and one symbols from the first output, so that the first output contains the original data other than zeros and ones (at byte granularity) and the second output contains two-bit values describing the positions of zero bytes, one bytes, and bytes other than zero and one. The performance of the two algorithms is compared using four combination schemes: (i) Burrows-Wheeler transform, Move to Front, J-bit encoding, and arithmetic coding; (ii) Burrows-Wheeler transform, Move to Front, the modified algorithm, and arithmetic coding; (iii) Burrows-Wheeler transform, Move One From Front, J-bit encoding, and arithmetic coding; and (iv) Burrows-Wheeler transform, Move One From Front, the modified algorithm, and arithmetic coding. On the Calgary Corpus and Canterbury Corpus data sets, the test results show that the best average compression ratio is obtained with the second scheme, while on four image files the best average compression ratio is obtained with the fourth scheme
On the Complexity of BWT-Runs Minimization via Alphabet Reordering
The Burrows-Wheeler Transform (BWT) has been an essential tool in text
compression and indexing. First introduced in 1994, it went on to provide the
backbone for the first encoding of the classic suffix tree data structure in
space close to the entropy-based lower bound. Recently, there has been the
development of compact suffix trees in space proportional to r, the number
of runs in the BWT, as well as the appearance of r in the time complexity of
new algorithms. Unlike other popular measures of compression, the parameter r
is sensitive to the lexicographic ordering given to the text's alphabet.
Despite several past attempts to exploit this, a provably efficient algorithm
for finding, or approximating, an alphabet ordering which minimizes r has
been open for years.
We present the first set of results on the computational complexity of
minimizing BWT-runs via alphabet reordering. We prove that the decision version
of this problem is NP-complete and cannot be solved in 2^{o(σ + √n)} time unless the Exponential Time Hypothesis fails, where σ is the
size of the alphabet and n is the length of the text. We also show that the
optimization problem is APX-hard. In doing so, we relate two previously
disparate topics: the optimal traveling salesperson path and the number of runs
in the BWT of a text, providing a surprising connection between problems on
graphs and text compression. Also, by relating recent results in the field of
dictionary compression, we illustrate that an arbitrary alphabet ordering
provides an O(log² n)-approximation.
We provide an optimal linear-time algorithm for the problem of finding a run
minimizing ordering on a subset of symbols (occurring only once) under ordering
constraints, and prove that a generalization of this problem to a class of
graphs with BWT-like properties, called Wheeler graphs, is NP-complete
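The sensitivity of the number of BWT runs to the alphabet ordering is easy to observe with a naive sketch (illustrative only; a quadratic rotation sort, not a practical BWT construction):

```python
def bwt(text: str, order: str) -> str:
    """Naive BWT of text with a '$' sentinel (assumed smallest symbol),
    sorting rotations under the given alphabet order."""
    s = text + "$"
    rank = {"$": -1}
    rank.update({c: i for i, c in enumerate(order)})
    rotations = sorted((s[i:] + s[:i] for i in range(len(s))),
                       key=lambda rot: [rank[c] for c in rot])
    return "".join(rot[-1] for rot in rotations)

def runs(s: str) -> int:
    """Number of maximal runs of equal symbols."""
    return sum(1 for i, c in enumerate(s) if i == 0 or s[i - 1] != c)
```

For example, bwt("banana", "abn") is "annb$aa" with 5 runs, while the reordering "nba" yields "aaa$nnb" with only 4 runs, so the run count depends on the ordering chosen.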
On Undetected Redundancy in the Burrows-Wheeler Transform
The Burrows-Wheeler Transform (BWT) is an invertible permutation of a text that is known to be highly compressible and is also useful for sequence analysis, which makes the BWT highly attractive for lossless data compression. In this paper, we present a new technique to reduce the size of a BWT using its combinatorial properties, while keeping it invertible. The technique can be applied to any BWT-based compressor, and, as experiments show, is able to reduce the encoding size by 8-16% on average and up to 33-57% in the best cases (depending on the BWT-compressor used), making BWT-based compressors competitive or even superior to today's best lossless compressors
Implementation of Statistical Compression Method
This thesis presents statistical methods for data compression. It covers the design of a compression process and its implementation as a program library written in the C++ programming language, describes and analyzes the individual methods, and presents and evaluates the results of tests performed on the various compression methods.
Burrows‐Wheeler post‐transformation with effective clustering and interpolative coding
Lossless compression methods based on the Burrows‐Wheeler transform
(BWT) are regarded as an excellent compromise between speed and
compression efficiency: they provide compression rates close to the PPM
algorithms, with the speed of dictionary‐based methods. Instead of the
laborious statistics‐gathering process used in PPM, the BWT reversibly
sorts the input symbols, using as the sort key as many following
characters as necessary to make the sort unique. Characters occurring in
similar contexts are sorted close together, resulting in a clustered
symbol sequence. Run‐length encoding and Move‐to‐Front (MTF) recoding,
combined with a statistical Huffman or arithmetic coder, is then
typically used to exploit the clustering. A drawback of the MTF recoding
is that knowledge of the character that produced the MTF number is
lost. In this paper, we present a new, competitive Burrows‐Wheeler
posttransform stage that takes advantage of interpolative coding—a fast
binary encoding method for integer sequences, being able to exploit
clusters without requiring explicit statistics. We introduce a fast and
simple way to retain knowledge of the run characters during the MTF
recoding and use this to improve the clustering of MTF numbers and
run‐lengths by applying reversible, stable sorting, with the run
characters as sort keys, achieving significant improvement in the
compression rate, as shown here by experiments on common text corpora.
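The MTF recoding discussed above can be sketched in a few lines of Python (a generic illustration of plain MTF, not the paper's refined variant, which additionally retains the run characters as stable-sort keys):

```python
def mtf_encode(data: str, alphabet: str):
    """Move-to-Front recoding: emit each symbol's current table index,
    then move that symbol to the front. Clustered input yields many
    small numbers, which downstream entropy coders exploit."""
    table = list(alphabet)
    out = []
    for c in data:
        i = table.index(c)
        out.append(i)
        table.insert(0, table.pop(i))
    return out

def mtf_decode(codes, alphabet: str) -> str:
    """Inverse recoding: look up each index, then move that symbol to front."""
    table = list(alphabet)
    out = []
    for i in codes:
        c = table[i]
        out.append(c)
        table.insert(0, table.pop(i))
    return "".join(out)
```

Note how the decoder recovers the character only by replaying the table updates; the index stream alone carries no direct knowledge of the character that produced each number, which is exactly the drawback the paper addresses.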
Lossless Image Compression
This thesis deals with lossless image compression. It presents several colour models suitable for lossless compression, together with the formulas used to convert between them and the RGB model. It further discusses predictors and how they work, describes the function of arithmetic and PPM coders, and gives a brief description of Huffman coding.
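One well-known example of a colour transform that is exactly invertible in integer arithmetic, and hence usable for lossless compression, is the reversible component transform (RCT) of lossless JPEG 2000 (our illustration; the thesis itself may cover different models):

```python
def rgb_to_rct(r: int, g: int, b: int):
    """Forward reversible component transform (as in lossless JPEG 2000):
    integer-only operations, exactly invertible."""
    y = (r + 2 * g + b) >> 2   # floor division by 4
    cb = b - g
    cr = r - g
    return y, cb, cr

def rct_to_rgb(y: int, cb: int, cr: int):
    """Inverse transform: recovers R, G, B exactly."""
    g = y - ((cb + cr) >> 2)
    r = cr + g
    b = cb + g
    return r, g, b
```

The floor divisions cancel in the inverse, so the round trip is lossless despite the truncation in the forward transform.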
Implementation of Statistical Compression Methods
The aim of this thesis is to describe statistical methods for data compression. The introduction covers the theoretical minimum of data compression. The core of the work is a description of the individual methods and an implementation of the Burrows-Wheeler compression algorithm in the C programming language. It contains the test results of each method and their evaluation.
Implementation of Statistical Compression Methods
This thesis describes the Burrows-Wheeler compression algorithm. It focuses in detail on the individual parts of the Burrows-Wheeler algorithm, above all on the global structure transformation and the entropy coders. Among the global structure transformation methods described are move-to-front, inversion frequencies, interval coding, and others. The entropy coders described include Huffman, arithmetic, and Rice-Golomb coding. In conclusion, the described global structure transformation methods and entropy coders are tested, and the best combination is compared with the most widely used compression algorithms.
Algorithms and Lower Bounds for Ordering Problems on Strings
This dissertation presents novel algorithms and conditional lower bounds for a collection of string and text-compression-related problems. These results are unified under the theme of ordering constraint satisfaction. Utilizing the connections to ordering constraint satisfaction, we provide hardness results and algorithms for the following: recognizing a type of labeled graph amenable to text-indexing known as Wheeler graphs, minimizing the number of maximal unary substrings occurring in the Burrows-Wheeler Transformation of a text, minimizing the number of factors occurring in the Lyndon factorization of a text, and finding an optimal reference string for relative Lempel-Ziv encoding
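As one concrete instance of the string problems listed above, the Lyndon factorization of a text can be computed in linear time with Duval's algorithm; a short sketch (a standard textbook algorithm, not code from the dissertation):

```python
def lyndon_factorization(s: str):
    """Duval's algorithm: factor s into a non-increasing sequence of
    Lyndon words in O(n) time; minimizing the number of such factors
    over alphabet orderings is the problem studied above."""
    factors = []
    i, n = 0, len(s)
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            # extend the current (possibly periodic) Lyndon prefix
            k = i if s[k] < s[j] else k + 1
            j += 1
        while i <= k:
            factors.append(s[i:i + j - k])  # emit one period as a factor
            i += j - k
    return factors
```

For example, "banana" factors into "b", "an", "an", "a", a non-increasing sequence of Lyndon words whose concatenation is the original string.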