
    Optimal Prefix Codes with Fewer Distinct Codeword Lengths are Faster to Construct

    A new method for constructing minimum-redundancy binary prefix codes is described. Our method does not explicitly build a Huffman tree; instead it uses a property of optimal prefix codes to compute the codeword lengths corresponding to the input weights. Let $n$ be the number of weights and $k$ be the number of distinct codeword lengths in the optimal code produced by the algorithm. The running time of our algorithm is $O(k \cdot n)$. Following our previous work in \cite{be}, no algorithm can construct optimal prefix codes in $o(k \cdot n)$ time. When the given weights are presorted, our algorithm performs $O(9^k \cdot \log^{2k} n)$ comparisons.
    Comment: 23 pages, a preliminary version appeared in STACS 200
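
    The abstract does not spell out the paper's $O(k \cdot n)$ procedure, so the following is only a point of reference: a minimal sketch of the classical heap-based construction that computes optimal codeword lengths from input weights, i.e. the explicit tree-building step the paper avoids. The function name, tie-breaking, and example weights are illustrative assumptions, not taken from the paper.

```python
import heapq

def huffman_code_lengths(weights):
    """Return the optimal (Huffman) codeword length for each input weight.

    Standard O(n log n) heap-based construction, shown only as a baseline;
    the paper's algorithm achieves O(k*n) without building the tree explicitly.
    """
    n = len(weights)
    if n == 1:
        return [1]  # a single symbol conventionally gets a one-bit codeword
    # Heap entries are (weight, node id); internal nodes get ids >= n.
    heap = [(w, i) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    parent = {}          # node id -> parent id
    next_id = n
    while len(heap) > 1:
        w1, a = heapq.heappop(heap)
        w2, b = heapq.heappop(heap)
        parent[a] = parent[b] = next_id
        heapq.heappush(heap, (w1 + w2, next_id))
        next_id += 1
    root = heap[0][1]
    # Codeword length of a leaf = its depth in the code tree.
    lengths = []
    for leaf in range(n):
        depth, node = 0, leaf
        while node != root:
            node = parent[node]
            depth += 1
        lengths.append(depth)
    return lengths

# Example with k = 4 distinct lengths among n = 5 weights.
print(huffman_code_lengths([10, 6, 2, 1, 1]))  # -> [1, 2, 3, 4, 4]
```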

    Design and application of variable-to-variable length codes

    This work addresses the design of minimum-redundancy variable-to-variable length (V2V) codes and studies their suitability for use in the probability interval partitioning entropy (PIPE) coding concept as an alternative to binary arithmetic coding. Several properties of V2V codes and new concepts are discussed, and a polynomial-based principle for designing V2V codes is proposed. Various minimum-redundancy V2V codes are derived and combined with the PIPE coding concept, and their redundancy is compared to that of the binary arithmetic coder of the video compression standard H.265/HEVC.
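
    As a hedged illustration of the V2V idea only (not one of the minimum-redundancy codes derived in this work), the sketch below encodes a binary source by parsing it into variable-length source words and replacing each with a variable-length codeword. The table assumes a source with P(0) ≈ 0.8 and is chosen purely for demonstration; names such as V2V_TABLE are hypothetical.

```python
# Toy V2V code for a binary source with P(0) = 0.8 (illustrative values only).
# The source words form a complete prefix-free parsing of {0,1}*, and the
# codewords form a prefix-free binary code, so both directions parse greedily.
V2V_TABLE = {
    "000": "0",
    "001": "111",
    "01":  "110",
    "1":   "10",
}
INVERSE_TABLE = {cw: sw for sw, cw in V2V_TABLE.items()}

def v2v_encode(bits, table=V2V_TABLE):
    """Greedily parse the source string into source words and emit codewords."""
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in table:
            out.append(table[buf])
            buf = ""
    # A real coder needs a termination rule for a leftover partial word;
    # here we simply require the input to end on a source-word boundary.
    assert buf == "", "input did not end on a source-word boundary"
    return "".join(out)

def v2v_decode(code, table=INVERSE_TABLE):
    """Greedily parse the code string into codewords and emit source words."""
    out, buf = [], ""
    for b in code:
        buf += b
        if buf in table:
            out.append(table[buf])
            buf = ""
    return "".join(out)

src = "0001000101"
enc = v2v_encode(src)
assert v2v_decode(enc) == src
print(src, "->", enc)  # 10 source bits -> 9 code bits for this input
```

    For this table the expected rate is about 0.728 bits per source bit against a source entropy of roughly 0.722, which is the kind of small redundancy gap that motivates searching for minimum-redundancy V2V codes.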

    Efficient compression of large repetitive strings

    When it comes to managing large volumes of data, general-purpose compressors such as gzip are ubiquitous. They are fast, practical, and available on every modern platform, from standard desktops to mobile devices. These tools exploit local redundancy in a text using a fixed-size sliding window. This window is usually very small relative to the text, although in principle it can be as large as available memory. The window acts as a dictionary: compression is achieved by replacing substrings with pointers to previous occurrences found in the dictionary. This type of algorithm becomes problematic when dealing with collections that are larger than physical memory, as it fails to capture any non-local redundancy, that is, repetition that occurs outside its search window. With rapid growth in the already enormous amount of data we store and process, there is a pressing need to improve compression effectiveness, reducing both storage requirements and decompression costs. However, many systems still use general-purpose compression tools on large, highly repetitive data collections. In this thesis we focus on addressing this issue. We explore compression in a variety of domains where large volumes of data need to be stored and accessed and where general-purpose compression tools are the norm. First we discuss our work on web corpus compression; then we discuss the implementation of a practical index for repetitive texts that gives strong theoretical bounds in terms of size and access; and finally, we discuss our work on compression of high-throughput sequencing reads. We show that in all cases our new methods improve on current techniques in both run-time and compression effectiveness, and provide important functionality such as fast decoding and random access.
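
    To make the window limitation concrete, here is a naive sketch of a greedy LZ77-style parse with a bounded sliding window; when the window is smaller than the distance between repeats, the repeated material cannot be matched and is re-emitted literally. The function name, window sizes, and input string are illustrative, and this is not the compressor developed in the thesis.

```python
def lz77_parse(text, window=32):
    """Greedy LZ77-style factorisation with a bounded sliding window.

    Emits (offset, length, next_char) triples. A window much smaller than the
    distance between repeats cannot reach earlier copies, so repetition is
    re-encoded from scratch -- the non-local redundancy problem.
    """
    factors, i, n = [], 0, len(text)
    while i < n:
        start = max(0, i - window)     # dictionary = last `window` characters
        best_off, best_len = 0, 0
        for j in range(start, i):      # naive O(window * match length) search
            length = 0
            while i + length < n - 1 and text[j + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        nxt = text[i + best_len]
        factors.append((best_off, best_len, nxt))
        i += best_len + 1
    return factors

doc = "the quick brown fox. " * 3     # highly repetitive input, period 21
small = lz77_parse(doc, window=8)     # window shorter than the repeat distance
large = lz77_parse(doc, window=64)    # window covers all earlier text
print(len(small), "factors vs", len(large), "factors")  # far fewer factors with the large window
```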