
    Inversion Ranks for Lossless Compression of Color Palette Images

    Palette images are widely used in World Wide Web (WWW) and game cartridge applications. Many images used on the WWW are stored and transmitted after being compressed losslessly with the standard Graphics Interchange Format (GIF) or Portable Network Graphics (PNG). Well-known two-dimensional compression schemes, such as JPEG-LS and CALIC, fail to yield better compression than GIF or PNG because the pixel values represent indices that point to color values in a look-up table. The GIF standard uses Lempel-Ziv compression, which treats the image as a one-dimensional sequence of index values, ignoring its two-dimensional nature. Bzip, another universal compressor, yields even better compression gains than GIF, PNG, JPEG-LS, and CALIC. Variants of block sorting coders, such as Bzip2, utilize the Burrows-Wheeler transformation (BWT) (Burrows and Wheeler, 1994), followed by a move-to-front (MTF) transformation (Bentley, 1986; Elias, 1987), before using a statistical coder at the final stage. In this paper, we show that the compression performance of the block sorting coder can be improved by almost 14% on average by utilizing inversion ranks instead of move-to-front coding.
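    As an illustration of the pipeline described above, here is a minimal sketch of the BWT + move-to-front front end of a block sorting coder (naive rotation sort, for short inputs only); the paper's proposal would replace the MTF stage with inversion ranks, which are not shown here.

```python
# Minimal sketch of the BWT + move-to-front (MTF) front end of a block
# sorting coder. Illustration only: a real coder (e.g. Bzip2) adds a
# sentinel symbol, works on large blocks, and follows this stage with
# run-length and entropy coding.

def bwt(s: str) -> tuple[str, int]:
    """Naive Burrows-Wheeler transform: sort all rotations of s."""
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    last_column = "".join(r[-1] for r in rotations)
    return last_column, rotations.index(s)

def mtf_encode(s: str) -> list[int]:
    """Move-to-front: emit the current rank of each symbol, then move it to the front."""
    alphabet = sorted(set(s))
    out = []
    for c in s:
        r = alphabet.index(c)
        out.append(r)
        alphabet.insert(0, alphabet.pop(r))
    return out

if __name__ == "__main__":
    transformed, primary_index = bwt("abracadabra")
    print(transformed, primary_index)   # runs of equal symbols appear after the BWT
    print(mtf_encode(transformed))      # small ranks dominate, which is easy to entropy-code
```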

    Information profiles for DNA pattern discovery

    Finite-context modeling is a powerful tool for compressing, and hence for representing, DNA sequences. We describe an algorithm to detect genomic regularities within a blind discovery strategy. The algorithm uses information profiles built from suitable combinations of finite-context models. We used the genome of the fission yeast Schizosaccharomyces pombe strain 972 h- for illustration, unveiling locations of low information content, which are usually associated with DNA regions of potential biological interest. (Full version of the DCC 2014 paper "Information profiles for DNA pattern discovery".)
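    To make the notion of an information profile concrete, here is a minimal sketch using a single adaptive order-k finite-context model with additive smoothing (the order and the smoothing constant are illustrative assumptions); the actual algorithm combines several such models and post-processes the resulting profile.

```python
import math
from collections import defaultdict

# Minimal sketch of a per-position information profile from one adaptive
# order-k finite-context model. The per-symbol cost is -log2 P(symbol | context),
# estimated from counts with additive (Laplace-style) smoothing.

ALPHABET = "ACGT"

def information_profile(seq: str, k: int = 4, alpha: float = 1.0) -> list[float]:
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
    profile = []
    for i, sym in enumerate(seq):
        ctx = seq[max(0, i - k):i]
        c = counts[ctx]
        total = sum(c.values())
        p = (c[sym] + alpha) / (total + alpha * len(ALPHABET))
        profile.append(-math.log2(p))   # bits needed to encode this symbol
        c[sym] += 1                     # adaptive model: update after coding
    return profile

if __name__ == "__main__":
    dna = "ACGT" * 50 + "TTTTTTTTTTTT" + "ACGT" * 50
    prof = information_profile(dna, k=4)
    # low values mark predictable regions, high values mark surprising ones
    print(round(min(prof), 2), round(max(prof), 2))
```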

    On the Representability of Complete Genomes by Multiple Competing Finite-Context (Markov) Models

    A finite-context (Markov) model of order k yields the probability distribution of the next symbol in a sequence of symbols, given the recent past up to depth k. Markov modeling has long been applied to DNA sequences, for example to find gene-coding regions. With the first studies came the discovery that DNA sequences are non-stationary: distinct regions require distinct model orders. Since then, Markov and hidden Markov models have been extensively used to describe the gene structure of prokaryotes and eukaryotes. However, to our knowledge, a comprehensive study of the potential of Markov models to describe complete genomes is still lacking. We address this gap in this paper. Our approach relies on (i) multiple competing Markov models of different orders, (ii) careful programming techniques that allow orders as large as sixteen, (iii) adequate inverted repeat handling, and (iv) probability estimates suited to the wide range of context depths used. To measure how well a model fits the data at a particular position in the sequence, we use the negative logarithm of the probability estimate at that position. This measure yields information profiles of the sequence, which are of independent interest. Its average over the entire sequence, which amounts to the average number of bits per base needed to describe the sequence, is used as a global performance measure. Our main conclusion is that, from the probabilistic or information-theoretic point of view and according to this performance measure, multiple competing Markov models explain entire genomes almost as well as, or even better than, state-of-the-art DNA compression methods, such as XM, which rely on very different statistical models. This is surprising because Markov models are local (short-range), in contrast with the statistical models underlying the other methods, which exploit the extensive data repetitions in DNA sequences and therefore have a non-local character.
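    A rough sketch of the competing-models idea follows: several finite-context models of different orders run in parallel and, at each position, the smallest -log2 probability is taken as the local cost, averaged into bits per base. The chosen orders, the estimator constant, and the per-position "best model" selection are illustrative assumptions only; the paper's method additionally handles inverted repeats and uses estimators adapted to each context depth.

```python
import math
from collections import defaultdict

# Sketch of multiple competing finite-context models: every model scores the
# current symbol with -log2 P(symbol | context); the best (smallest) value is
# taken as the local measure and averaged into bits per base.

ALPHABET = "ACGT"

class FiniteContextModel:
    def __init__(self, order: int, alpha: float = 1.0 / 16):
        self.order = order
        self.alpha = alpha
        self.counts = defaultdict(lambda: defaultdict(int))

    def _key(self, ctx: str) -> str:
        return ctx[-self.order:] if self.order else ""

    def bits(self, ctx: str, sym: str) -> float:
        c = self.counts[self._key(ctx)]
        total = sum(c.values())
        p = (c[sym] + self.alpha) / (total + self.alpha * len(ALPHABET))
        return -math.log2(p)

    def update(self, ctx: str, sym: str) -> None:
        self.counts[self._key(ctx)][sym] += 1

def average_bits_per_base(seq: str, orders=(2, 5, 8, 11)) -> float:
    models = [FiniteContextModel(k) for k in orders]
    total_bits = 0.0
    for i, sym in enumerate(seq):
        ctx = seq[:i]
        total_bits += min(m.bits(ctx, sym) for m in models)  # best competing model
        for m in models:
            m.update(ctx, sym)
    return total_bits / len(seq)

if __name__ == "__main__":
    print(average_bits_per_base("ACGTACGTAAAACCCCGGGGTTTT" * 20))
```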

    GReEn: a tool for efficient compression of genome resequencing data

    Research in the genomic sciences is confronted with the volume of sequencing and resequencing data increasing at a higher pace than that of data storage and communication resources, shifting a significant part of research budgets from the sequencing component of a project to the computational one. Hence, being able to efficiently store sequencing and resequencing data is a problem of paramount importance. In this article, we describe GReEn (Genome Resequencing Encoding), a tool for compressing genome resequencing data using a reference genome sequence. It overcomes some drawbacks of the recently proposed tool GRS, namely, the possibility of compressing sequences that cannot be handled by GRS, faster running times, and compression gains of over 100-fold for some sequences. This tool is freely available for non-commercial use at ftp://ftp.ieeta.pt/~ap/codecs/GReEn1.tar.gz.
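    The toy sketch below illustrates only the general idea behind reference-based encoding (store just the positions where the target deviates from an aligned reference); it is not GReEn's actual mechanism, which drives an arithmetic coder with a copy model of the reference, and it assumes pre-aligned sequences of equal length.

```python
# Toy illustration of reference-based compression of a resequenced genome:
# only the positions where the target differs from the reference are kept.

def encode_against_reference(reference: str, target: str) -> list[tuple[int, str]]:
    assert len(reference) == len(target), "sketch assumes pre-aligned, equal-length sequences"
    return [(i, t) for i, (r, t) in enumerate(zip(reference, target)) if r != t]

def decode_against_reference(reference: str, diffs: list[tuple[int, str]]) -> str:
    out = list(reference)
    for i, t in diffs:
        out[i] = t
    return "".join(out)

if __name__ == "__main__":
    ref = "ACGTACGTACGTACGT"
    tgt = "ACGTACGAACGTACGT"   # a single substitution
    diffs = encode_against_reference(ref, tgt)
    print(diffs)                                   # [(7, 'A')]
    assert decode_against_reference(ref, diffs) == tgt
```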

    Towards practical minimum-entropy universal decoding

    Minimum-entropy decoding is a universal decoding algorithm used in decoding block compression of discrete memoryless sources as well as block transmission of information across discrete memoryless channels. Extensions can also be applied to multiterminal decoding problems, such as the Slepian-Wolf source coding problem. The 'method of types' has been used to show that there exist linear codes for which minimum-entropy decoders achieve the same error exponent as maximum-likelihood decoders. Since minimum-entropy decoding is NP-hard in general, minimum-entropy decoders have existed primarily in the theory literature. We introduce practical approximation algorithms for minimum-entropy decoding. Our approach, which relies on ideas from linear programming, exploits two key observations. First, the 'method of types' shows that the number of distinct types grows polynomially in n. Second, recent results in the optimization literature have presented polytope projection algorithms whose complexity is a function of the number of vertices of the projected polytope. Combining these two ideas, we leverage recent results on linear programming relaxations for error-correcting codes to construct polynomial-complexity algorithms for this setting. In the binary case, we explicitly demonstrate linear code constructions that admit provably good performance.
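    For intuition, a brute-force minimum-entropy decoder for a tiny binary linear code can be written directly: choose the codeword whose error pattern relative to the received word has the smallest empirical entropy. This exhaustive search is exponential in the code dimension, which is exactly what the LP-based approximations in the paper avoid; the [7,4] Hamming generator matrix below is just a convenient example, not taken from the paper.

```python
import itertools
import math
import numpy as np

# Brute-force minimum-entropy decoding over a small binary linear code:
# pick the codeword whose error pattern has minimal empirical entropy.

def binary_entropy(p: float) -> float:
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def min_entropy_decode(G: np.ndarray, y: np.ndarray) -> np.ndarray:
    k, n = G.shape
    best, best_h = None, float("inf")
    for msg in itertools.product([0, 1], repeat=k):
        x = (np.array(msg) @ G) % 2       # candidate codeword
        e = (x + y) % 2                   # error pattern
        h = binary_entropy(e.mean())      # empirical entropy of the noise type
        if h < best_h:
            best, best_h = x, h
    return best

if __name__ == "__main__":
    # [7,4] Hamming code generator matrix in one common systematic form
    G = np.array([[1, 0, 0, 0, 1, 1, 0],
                  [0, 1, 0, 0, 1, 0, 1],
                  [0, 0, 1, 0, 0, 1, 1],
                  [0, 0, 0, 1, 1, 1, 1]])
    y = np.array([1, 0, 0, 0, 1, 1, 1])   # codeword [1,0,0,0,1,1,0] with one bit flipped
    print(min_entropy_decode(G, y))
```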

    Improved Sequential MAP estimation of CABAC encoded data with objective adjustment of the complexity/efficiency tradeoff

    This paper presents an efficient MAP estimator for the joint source-channel decoding of data encoded with a context-adaptive binary arithmetic coder (CABAC). The decoding process is compatible with realistic implementations of CABAC in standards like H.264, i.e., handling adaptive probabilities, context modeling, and integer arithmetic coding. Soft decoding is obtained using an improved sequential decoding technique, which makes it possible to obtain various tradeoffs between complexity and efficiency. The algorithms are simulated in a context reminiscent of H.264. Error detection is realized by exploiting, on the one hand, the properties of the binarization scheme and, on the other hand, the redundancy left in the code string. As a result, the CABAC compression efficiency is preserved and no additional redundancy is introduced in the bit stream. Simulation results outline the efficiency of the proposed techniques for encoded data sent over AWGN and UMTS-OFDM channels.
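    A highly simplified sketch of the kind of sequential (M-algorithm) search underlying such soft decoders is given below: candidate bit sequences are scored by a channel metric plus a source prior and pruned to the best few. The fixed bit prior and the LLR-based channel metric are illustrative assumptions only; an actual CABAC decoder would re-run the adaptive arithmetic decoder and its context models to score each candidate and to detect invalid code strings.

```python
import math

# Sketch of a sequential (M-algorithm) soft-input MAP search: extend each
# surviving bit sequence by 0 and 1, score with channel metric + bit prior,
# and keep only the `keep` best paths.

def m_algorithm_decode(llrs, p1=0.3, keep=8):
    """llrs[i] = log( P(y_i | bit=0) / P(y_i | bit=1) ); returns the best bit sequence."""
    prior = {0: math.log(1 - p1), 1: math.log(p1)}
    paths = [([], 0.0)]                     # (bits so far, log a-posteriori score)
    for llr in llrs:
        chan = {0: llr / 2, 1: -llr / 2}    # symmetric form of the channel metric
        extended = [
            (bits + [b], score + chan[b] + prior[b])
            for bits, score in paths
            for b in (0, 1)
        ]
        extended.sort(key=lambda p: p[1], reverse=True)
        paths = extended[:keep]             # pruning: larger `keep` = more complexity, better efficiency
    return paths[0][0]

if __name__ == "__main__":
    # Positive LLRs favour bit 0, negative favour bit 1 (e.g. soft values from an AWGN channel).
    print(m_algorithm_decode([2.1, -1.7, 0.2, -0.4, 3.0]))
```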