
    Modification of the J-Bit Encoding Algorithm to Improve the Compression Ratio

    J-bit encoding is a lossless data compression algorithm that manipulates each bit of the data in a file to minimize its size by splitting the data into two outputs, which are then recombined into a single output. This paper proposes a modification of J-bit encoding that eliminates the zero and one symbols from the first output, so that the first output contains the original data other than zeros and ones (at byte granularity) and the second output contains two-bit values describing the positions of zero bytes, one bytes, and bytes other than zero and one. The two algorithms are compared using four combination schemes: (i) Burrows-Wheeler transform, Move to Front, J-bit encoding, and arithmetic coding; (ii) Burrows-Wheeler transform, Move to Front, the modified algorithm, and arithmetic coding; (iii) Burrows-Wheeler transform, Move One From Front, J-bit encoding, and arithmetic coding; (iv) Burrows-Wheeler transform, Move One From Front, the modified algorithm, and arithmetic coding. On the Calgary Corpus and Canterbury Corpus data sets, the best average compression ratio was obtained with the second scheme, whereas on four image files the best average compression ratio was obtained with the fourth scheme.
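
    To illustrate the split step described above, the following Python sketch separates a byte stream both as J-bit encoding is commonly described (nonzero bytes plus one flag bit per byte) and as the proposed modification describes it (bytes other than 0 and 1 plus a 2-bit code per byte). It is a minimal sketch, not the paper's implementation: the 2-bit code assignment (`ZERO`, `ONE`, `OTHER`), the helper names, and the MSB-first bit packing are assumptions, and the paper's way of recombining the two outputs into one is not reproduced. The printed sizes are measured before arithmetic coding, so they only hint at why the modification can help on MTF-style output, where zeros and ones dominate.

```python
from typing import List, Tuple

# Hypothetical 2-bit codes: the abstract does not say which code marks which byte class.
ZERO, ONE, OTHER = 0b00, 0b01, 0b10


def jbe_split(data: bytes) -> Tuple[bytes, List[int]]:
    """J-bit encoding split: nonzero bytes plus one flag bit per input byte."""
    first = bytes(b for b in data if b != 0)
    flags = [1 if b != 0 else 0 for b in data]
    return first, flags


def modified_jbe_split(data: bytes) -> Tuple[bytes, List[int]]:
    """Modified split: bytes other than 0 and 1, plus a 2-bit class code per input byte."""
    first = bytes(b for b in data if b > 1)
    codes = [ZERO if b == 0 else ONE if b == 1 else OTHER for b in data]
    return first, codes


def modified_jbe_merge(first: bytes, codes: List[int]) -> bytes:
    """Inverse of modified_jbe_split: rebuild the original bytes from both outputs."""
    out = bytearray()
    rest = iter(first)
    for c in codes:
        if c == ZERO:
            out.append(0)
        elif c == ONE:
            out.append(1)
        else:
            out.append(next(rest))
    return bytes(out)


def pack_bits(values: List[int], width: int) -> bytes:
    """Pack fixed-width codes MSB-first into bytes, zero-padding the final byte."""
    acc, nbits, out = 0, 0, bytearray()
    for v in values:
        acc = (acc << width) | v
        nbits += width
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
            acc &= (1 << nbits) - 1  # keep only the bits not yet emitted
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


# A small MTF-like input dominated by zeros and ones.
data = bytes([0, 0, 5, 1, 0, 7, 1, 1])
first, flags = jbe_split(data)
print(len(first) + len(pack_bits(flags, 1)))   # 5 + 1 = 6 bytes
first2, codes = modified_jbe_split(data)
print(len(first2) + len(pack_bits(codes, 2)))  # 2 + 2 = 4 bytes
assert modified_jbe_merge(first2, codes) == data
```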

    On the Complexity of BWT-Runs Minimization via Alphabet Reordering

    The Burrows-Wheeler Transform (BWT) has been an essential tool in text compression and indexing. First introduced in 1994, it went on to provide the backbone for the first encoding of the classic suffix tree data structure in space close to the entropy-based lower bound. Recently, compact suffix trees have been developed in space proportional to $r$, the number of runs in the BWT, and $r$ has also appeared in the time complexity of new algorithms. Unlike other popular measures of compression, the parameter $r$ is sensitive to the lexicographic ordering given to the text's alphabet. Despite several past attempts to exploit this, a provably efficient algorithm for finding, or approximating, an alphabet ordering that minimizes $r$ has remained open for years. We present the first set of results on the computational complexity of minimizing BWT-runs via alphabet reordering. We prove that the decision version of this problem is NP-complete and cannot be solved in time $2^{o(\sigma + \sqrt{n})}$ unless the Exponential Time Hypothesis fails, where $\sigma$ is the size of the alphabet and $n$ is the length of the text. We also show that the optimization problem is APX-hard. In doing so, we relate two previously disparate topics, the optimal traveling salesperson path and the number of runs in the BWT of a text, providing a surprising connection between problems on graphs and text compression. Also, by relating recent results in the field of dictionary compression, we illustrate that an arbitrary alphabet ordering provides an $O(\log^2 n)$-approximation. We provide an optimal linear-time algorithm for the problem of finding a run-minimizing ordering on a subset of symbols (occurring only once) under ordering constraints, and prove that a generalization of this problem to a class of graphs with BWT-like properties, called Wheeler graphs, is NP-complete.
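
    The dependence of $r$ on the alphabet ordering can be seen on a tiny example. The Python sketch below builds a naive BWT under a given symbol order, counts its runs, and brute-forces the best ordering over all $\sigma!$ permutations, which is only feasible for very small alphabets. The sentinel `$` (assumed absent from the text and ranked lowest) and all function names are illustrative assumptions; the quadratic rotation sort is chosen for clarity, not efficiency.

```python
from itertools import permutations
from typing import Dict, Sequence, Tuple


def bwt(text: str, order: Sequence[str]) -> str:
    """Naive BWT of text + '$' with rotations sorted under a custom alphabet order.

    Assumes '$' does not occur in text and ranks it below every symbol.
    O(n^2 log n) time: for illustration only.
    """
    rank: Dict[str, int] = {"$": -1, **{c: i for i, c in enumerate(order)}}
    s = text + "$"
    rotations = sorted(range(len(s)), key=lambda i: [rank[c] for c in s[i:] + s[:i]])
    return "".join(s[i - 1] for i in rotations)


def runs(s: str) -> int:
    """r: the number of maximal runs of equal characters in s."""
    return sum(1 for i in range(len(s)) if i == 0 or s[i] != s[i - 1])


def best_order(text: str) -> Tuple[str, ...]:
    """Exhaustive search over all sigma! orderings; exponential in the alphabet size."""
    return min(permutations(sorted(set(text))), key=lambda p: runs(bwt(text, p)))


print(bwt("banana", "abn"), runs(bwt("banana", "abn")))  # annb$aa 5
print(bwt("banana", "nab"), runs(bwt("banana", "nab")))  # aaannb$ 4
```

On `banana`, the usual order a < b < n gives `annb$aa` with five runs, while n < a < b gives `aaannb$` with four.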

    On Undetected Redundancy in the Burrows-Wheeler Transform

    The Burrows-Wheeler Transform (BWT) is an invertible permutation of a text that is known to be highly compressible and also useful for sequence analysis, which makes the BWT highly attractive for lossless data compression. In this paper, we present a new technique that reduces the size of a BWT by exploiting its combinatorial properties while keeping it invertible. The technique can be applied to any BWT-based compressor and, as experiments show, reduces the encoding size by 8-16% on average and by up to 33-57% in the best cases (depending on the BWT compressor used), making BWT-based compressors competitive with, or even superior to, today's best lossless compressors.

    Implementation of Statistical Compression Method

    This thesis discusses statistical methods of data compression. It deals with the design of a compression process and its implementation as a program library in C++. It describes and analyzes the individual methods and presents the results of tests performed with various compression methods, together with their evaluation.

    Burrows‐Wheeler post‐transformation with effective clustering and interpolative coding

    Lossless compression methods based on the Burrows-Wheeler transform (BWT) are regarded as an excellent compromise between speed and compression efficiency: they provide compression rates close to those of the PPM algorithms, with the speed of dictionary-based methods. Instead of the laborious statistics-gathering process used in PPM, the BWT reversibly sorts the input symbols, using as the sort key as many following characters as necessary to make the sort unique. Characters occurring in similar contexts are sorted close together, resulting in a clustered symbol sequence. Run-length encoding and Move-to-Front (MTF) recoding, combined with a statistical Huffman or arithmetic coder, are then typically used to exploit the clustering. A drawback of MTF recoding is that knowledge of the character that produced each MTF number is lost. In this paper, we present a new, competitive Burrows-Wheeler post-transform stage that takes advantage of interpolative coding, a fast binary encoding method for integer sequences that can exploit clusters without requiring explicit statistics. We introduce a fast and simple way to retain knowledge of the run characters during MTF recoding and use it to improve the clustering of MTF numbers and run-lengths by applying a reversible, stable sort with the run characters as sort keys, which significantly improves the compression rate, as shown by experiments on common text corpora.
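
    The abstract does not spell out the coder, so the following Python sketch shows the classic binary interpolative code for a strictly increasing integer sequence, as a hedged illustration of how such a coder exploits clusters without statistics: the middle value is written with just enough bits for its feasible range, and both halves are coded recursively with narrowed bounds. Plain fixed-width binary is used instead of minimal or centred binary codes (a simplification), and the function names are hypothetical. How the paper maps MTF numbers and run-lengths onto such sequences is not reproduced here.

```python
from typing import List, Tuple


def _write(bits: List[int], value: int, width: int) -> None:
    """Append value as a width-bit big-endian binary number."""
    for k in range(width - 1, -1, -1):
        bits.append((value >> k) & 1)


def interp_encode(xs: List[int], lo: int, hi: int, bits: List[int]) -> None:
    """Binary interpolative coding of a strictly increasing list xs within [lo, hi]."""
    if not xs:
        return
    m = len(xs) // 2
    # The middle value must leave room for m smaller and len(xs)-1-m larger values.
    low, high = lo + m, hi - (len(xs) - 1 - m)
    _write(bits, xs[m] - low, (high - low).bit_length())
    interp_encode(xs[:m], lo, xs[m] - 1, bits)
    interp_encode(xs[m + 1:], xs[m] + 1, hi, bits)


def interp_decode(n: int, lo: int, hi: int, bits: List[int],
                  pos: int = 0) -> Tuple[List[int], int]:
    """Inverse of interp_encode: read n values in [lo, hi]; return (values, next position)."""
    if n == 0:
        return [], pos
    m = n // 2
    low, high = lo + m, hi - (n - 1 - m)
    width = (high - low).bit_length()
    offset = 0
    for k in range(width):
        offset = (offset << 1) | bits[pos + k]
    x = low + offset
    left, pos = interp_decode(m, lo, x - 1, bits, pos + width)
    right, pos = interp_decode(n - 1 - m, x + 1, hi, bits, pos)
    return left + [x] + right, pos


bits: List[int] = []
interp_encode([3, 4, 5, 6, 7], 0, 15, bits)
print(len(bits))  # 12
print(interp_decode(5, 0, 15, bits)[0])  # [3, 4, 5, 6, 7]
```

A fully dense range such as 0..7 inside [0, 7] costs zero bits, because every value is forced by its bounds; this is the sense in which clusters are exploited without gathering statistics.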

    Lossless Image Compression

    This thesis deals with lossless image compression. It presents several colour models suitable for lossless image compression, together with the formulas for converting between them and the RGB model. It then discusses predictors and how they work, describing several of them. Finally, it describes how arithmetic and PPM coders work and gives a brief description of Huffman coding.
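
    The abstract does not name the colour models or predictors covered, so the Python sketch below uses two well-known examples as stand-ins: the reversible colour transform of JPEG 2000, an integer-exact RGB conversion, and the median edge detector (MED) predictor of JPEG-LS. Treating out-of-image neighbours as 0 is a simplifying assumption, and the helper names are hypothetical. The residuals are what an arithmetic, PPM, or Huffman coder would then compress.

```python
from typing import List, Tuple

Image = List[List[int]]


def rct_forward(r: int, g: int, b: int) -> Tuple[int, int, int]:
    """Reversible colour transform (as in JPEG 2000): integer-exact RGB -> (Y, Cb, Cr)."""
    return (r + 2 * g + b) // 4, b - g, r - g


def rct_inverse(y: int, cb: int, cr: int) -> Tuple[int, int, int]:
    """Exact inverse of rct_forward; // floors, also for negative sums."""
    g = y - (cb + cr) // 4
    return cr + g, g, cb + g


def med_predict(a: int, b: int, c: int) -> int:
    """Median edge detector: a = left, b = above, c = above-left neighbour."""
    if c >= max(a, b):
        return min(a, b)
    if c <= min(a, b):
        return max(a, b)
    return a + b - c


def _neighbours(img: Image, y: int, x: int) -> Tuple[int, int, int]:
    # Out-of-image neighbours are taken as 0, a simplification of JPEG-LS edge rules.
    a = img[y][x - 1] if x > 0 else 0
    b = img[y - 1][x] if y > 0 else 0
    c = img[y - 1][x - 1] if x > 0 and y > 0 else 0
    return a, b, c


def residuals(img: Image) -> Image:
    """Encoder: prediction errors that a subsequent entropy coder would compress."""
    return [[img[y][x] - med_predict(*_neighbours(img, y, x)) for x in range(len(img[0]))]
            for y in range(len(img))]


def reconstruct(res: Image) -> Image:
    """Decoder: rebuild pixels in raster order from residuals and decoded neighbours."""
    h, w = len(res), len(res[0])
    img = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            img[y][x] = res[y][x] + med_predict(*_neighbours(img, y, x))
    return img


tile = [[10, 12, 13], [11, 12, 14], [11, 13, 15]]
assert reconstruct(residuals(tile)) == tile
assert rct_inverse(*rct_forward(12, 200, 37)) == (12, 200, 37)
```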

    Implementation of Statistical Compression Methods

    The aim of this thesis is to describe statistical methods of data compression. The introduction covers the theoretical minimum of data compression. The core of the work is a description of the individual methods and an implementation of the Burrows-Wheeler compression algorithm in the C programming language. It contains test results for each method and their evaluation.
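
    The thesis implements the Burrows-Wheeler compression algorithm in C; as a language-neutral illustration of its central step, the following Python sketch computes a naive BWT and inverts it with the LF-mapping. It assumes a NUL sentinel that does not occur in the text and sorts below every other character; the quadratic rotation sort is for clarity only, and the function names are hypothetical rather than taken from the thesis.

```python
from typing import Dict, List


def bwt_forward(text: str, sentinel: str = "\0") -> str:
    """Naive BWT: last column of the sorted rotations of text + sentinel.

    Assumes the sentinel is absent from text and sorts before every other character.
    """
    s = text + sentinel
    rotations = sorted(range(len(s)), key=lambda i: s[i:] + s[:i])
    return "".join(s[i - 1] for i in rotations)


def bwt_inverse(last: str) -> str:
    """Invert the BWT with the LF-mapping, assuming a unique smallest sentinel."""
    # ranks[i]: how many times last[i] occurs in last[:i].
    seen: Dict[str, int] = {}
    ranks: List[int] = []
    for ch in last:
        ranks.append(seen.get(ch, 0))
        seen[ch] = seen.get(ch, 0) + 1
    # first[ch]: row where ch first appears in the sorted first column F.
    first: Dict[str, int] = {}
    total = 0
    for ch in sorted(seen):
        first[ch] = total
        total += seen[ch]
    # Row 0 is the rotation starting with the sentinel; following LF spells the text backwards.
    row, out = 0, []
    for _ in range(len(last) - 1):
        ch = last[row]
        out.append(ch)
        row = first[ch] + ranks[row]
    return "".join(reversed(out))


assert bwt_inverse(bwt_forward("mississippi")) == "mississippi"
```

The naive sort keeps the sketch short; a practical C implementation would build the BWT from a suffix array instead.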

    Implementation of Statistical Compression Methods

    This diploma thesis describes the Burrows-Wheeler compression algorithm. It focuses in detail on the individual parts of the algorithm, above all on the global structure transformation and the entropy coders. The global structure transformation methods described include move-to-front, inverse frequencies, and interval coding, among others. The entropy coders described are Huffman, arithmetic, and Rice-Golomb coding. Finally, the global structure transformation methods and entropy coders are tested, and the best combination is compared with the most widely used compression algorithms.
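
    As a sketch of how one global structure transformation and one entropy coder from this list fit together, the following Python code applies move-to-front to a byte string and then Rice-codes the resulting indices with a fixed parameter `k`. The unary convention (ones terminated by a zero), the fixed `k`, and the function names are assumptions; the thesis's actual parameters and combinations are not reproduced.

```python
from typing import List


def mtf_encode(data: bytes) -> List[int]:
    """Move-to-front: emit each byte's current table index, then move it to the front."""
    table = list(range(256))
    out = []
    for b in data:
        i = table.index(b)
        out.append(i)
        table.insert(0, table.pop(i))
    return out


def rice_encode(values: List[int], k: int) -> List[int]:
    """Rice code: quotient v >> k in unary (1s ended by a 0), then the k low bits."""
    bits: List[int] = []
    for v in values:
        q, r = v >> k, v & ((1 << k) - 1)
        bits.extend([1] * q)
        bits.append(0)
        bits.extend((r >> i) & 1 for i in range(k - 1, -1, -1))
    return bits


def rice_decode(bits: List[int], k: int, count: int) -> List[int]:
    """Inverse of rice_encode for a known number of values."""
    values, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == 1:
            q, pos = q + 1, pos + 1
        pos += 1  # skip the terminating 0
        r = 0
        for _ in range(k):
            r, pos = (r << 1) | bits[pos], pos + 1
        values.append((q << k) | r)
    return values


print(mtf_encode(b"aaabbb"))  # [97, 0, 0, 98, 0, 0]
print(rice_encode([9], 2))    # [1, 1, 0, 0, 1]
```

For example, `rice_encode([9], 2)` splits 9 into quotient 2 and remainder 1 and emits `1 1 0` followed by `0 1`. Rice codes are cheap for the many small indices that move-to-front produces on BWT output, but a fixed `k` pays heavily for the occasional large index.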

    Algorithms and Lower Bounds for Ordering Problems on Strings

    This dissertation presents novel algorithms and conditional lower bounds for a collection of string and text-compression-related problems. These results are unified under the theme of ordering constraint satisfaction. Utilizing the connections to ordering constraint satisfaction, we provide hardness results and algorithms for the following: recognizing a type of labeled graph amenable to text indexing known as Wheeler graphs, minimizing the number of maximal unary substrings occurring in the Burrows-Wheeler Transform of a text, minimizing the number of factors occurring in the Lyndon factorization of a text, and finding an optimal reference string for relative Lempel-Ziv encoding.
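
    One of these problems, minimizing the number of Lyndon factors, can be made concrete with Duval's algorithm, a standard linear-time method for computing the Lyndon factorization. The Python sketch below runs it under a chosen alphabet order; it is an illustration, not the dissertation's algorithm, and the function name is hypothetical.

```python
from typing import List, Sequence


def lyndon_factors(s: str, order: Sequence[str]) -> List[str]:
    """Duval's algorithm: Lyndon factorization of s under a custom alphabet order, O(n)."""
    rank = {c: i for i, c in enumerate(order)}
    t = [rank[c] for c in s]
    n, i, factors = len(t), 0, []
    while i < n:
        j, k = i + 1, i
        # Grow t[i:j] while it stays a prefix of a power of a Lyndon word.
        while j < n and t[k] <= t[j]:
            k = i if t[k] < t[j] else k + 1
            j += 1
        # Emit the repeated Lyndon word of length j - k as many times as it occurs.
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


print(lyndon_factors("banana", "abn"))  # ['b', 'an', 'an', 'a']
print(lyndon_factors("banana", "bna"))  # ['banana']
```

On `banana`, the order a < b < n yields four factors, while making `b` the smallest symbol turns the whole string into a single Lyndon word.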