
    Compressing Proteomes: The Relevance of Medium Range Correlations

    We study the nonrandomness of proteome sequences by analysing the correlations that arise between amino acids at short and medium range, more specifically, between amino acids located 10 or 100 residues apart, respectively. We show that statistical models that take these two types of correlation into account are better able to capture the information contained in protein sequences and thus achieve good compression rates. Finally, we propose that this redundancy is related to the evolutionary origin of proteomes and protein sequences.
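
    As an illustration of the kind of medium-range statistics involved, the following sketch (not from the paper; the sequence fragment and gap are arbitrary) estimates the mutual information between amino acids a fixed number of residues apart; values above zero indicate redundancy a statistical model could exploit.

```python
# Hedged sketch: estimate I(X; Y) between residues i and i + k from
# empirical counts. The fragment and k are arbitrary illustrations.
from collections import Counter
from math import log2

def mutual_information(seq: str, k: int) -> float:
    pairs = [(seq[i], seq[i + k]) for i in range(len(seq) - k)]
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        p = c / n
        mi += p * log2(p / ((left[a] / n) * (right[b] / n)))
    return mi  # in bits; > 0 suggests exploitable correlation

fragment = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQ"
print(mutual_information(fragment, 10))
```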

    A joint motion & disparity motion estimation technique for 3D integral video compression using evolutionary strategy

    3D imaging techniques have the potential to establish a future mass market in the fields of entertainment and communications. Integral imaging, which can capture true 3D color images with only one camera, has been seen as the right technology to offer stress-free viewing to audiences of more than one person. Like any digital video, 3D video sequences must be compressed to make them suitable for consumer-domain applications. However, ordinary compression techniques found in state-of-the-art video coding standards such as H.264, MPEG-4 and MPEG-2 cannot produce enough compression while preserving the 3D cues. Fortunately, a large amount of redundancy can be found in an integral video sequence in terms of motion and disparity. This paper discusses a novel approach that uses both motion and disparity information to compress 3D integral video sequences. We propose to decompose the integral video sequence into viewpoint video sequences and to exploit motion and disparity redundancies jointly to maximize compression. We further propose an optimization technique based on evolutionary strategies to minimize the computational complexity of the joint motion-disparity estimation. Experimental results demonstrate that joint motion and disparity estimation can achieve an objective quality gain of over 1 dB compared with normal motion estimation. Combined with the evolutionary strategy, it can achieve up to 94% savings in computational cost.
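
    A minimal sketch of the joint selection idea described above, under assumed inputs (grayscale NumPy frames; the block size, search radius and SAD criterion are illustrative choices, and the paper's evolutionary-strategy search is not reproduced here): each block is predicted either from the previous frame of the same viewpoint (motion) or from the same frame of a neighbouring viewpoint (disparity), whichever matches better.

```python
import numpy as np

def best_match(block, ref, y, x, radius=4):
    """Exhaustive search in ref around (y, x); returns (sad, dy, dx)."""
    h, w = block.shape
    b = block.astype(np.int32)
    best = (np.inf, 0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy <= ref.shape[0] - h and 0 <= xx <= ref.shape[1] - w:
                cand = ref[yy:yy + h, xx:xx + w].astype(np.int32)
                sad = int(np.abs(b - cand).sum())
                if sad < best[0]:
                    best = (sad, dy, dx)
    return best

def joint_estimate(cur, prev_frame, neighbour_view, y, x, bs=8):
    """Choose between a motion vector (temporal reference) and a
    disparity vector (inter-view reference) for one bs x bs block."""
    block = cur[y:y + bs, x:x + bs]
    motion = best_match(block, prev_frame, y, x)
    disparity = best_match(block, neighbour_view, y, x)
    return ("motion", motion) if motion[0] <= disparity[0] else ("disparity", disparity)
```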

    Using semantic knowledge to improve compression on log files

    With the move towards global and multi-national companies, information technology infrastructure requirements are increasing. As the size of these computer networks increases, it becomes more and more difficult to monitor, control, and secure them. Networks consist of a number of diverse devices, sensors, and gateways which are often spread over large geographical areas. Each of these devices produces log files which need to be analysed and monitored to provide network security and satisfy regulations. Data compression programs such as gzip and bzip2 are commonly used to reduce the quantity of data for archival purposes after the log files have been rotated. However, many other compression programs exist, each with its own advantages and disadvantages. These programs use different amounts of memory and take different compression and decompression times to achieve different compression ratios. System log files also contain redundancy which is not necessarily exploited by standard compression programs. Log messages usually follow a similar format with a defined syntax. In the log files, not all the ASCII characters are used, and the messages contain certain "phrases" which are often repeated. This thesis investigates the use of compression as a means of data reduction and how the use of semantic knowledge can improve data compression (also applying the results to different scenarios that can occur in a distributed computing environment). It presents the results of a series of tests performed on different log files. It also examines the semantic knowledge which exists in maillog files and how it can be exploited to improve the compression results. The results from a series of text preprocessors which exploit this knowledge are presented and evaluated. These preprocessors include one which replaces the timestamps and IP addresses with their binary equivalents and one which replaces words from a dictionary with unused ASCII characters. In this thesis, data compression is shown to be an effective method of data reduction, producing up to 98 percent reduction in file size on a corpus of log files. The use of preprocessors which exploit semantic knowledge results in up to 56 percent improvement in overall compression time and up to 32 percent reduction in compressed size.
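
    A toy version of the first preprocessor mentioned above (an assumption-level sketch, not the thesis code): IPv4 addresses are replaced by a marker byte plus their 4-byte binary form before a general-purpose compressor runs, shrinking "192.168.0.1" from 11 bytes to 5.

```python
# Hedged sketch: replace each IPv4 address with a marker byte plus its
# 4-byte binary form. The pattern is naive (it does not validate octet
# ranges) and the marker byte is assumed unused in the original logs.
import re
import socket

IP_RE = re.compile(rb"(?:\d{1,3}\.){3}\d{1,3}")
MARKER = b"\x01"

def pack_ips(line: bytes) -> bytes:
    return IP_RE.sub(lambda m: MARKER + socket.inet_aton(m.group().decode()), line)

print(pack_ips(b"Jan 10 12:00:01 sshd: connection from 192.168.0.1"))
```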

    Adaptive edge-based prediction for lossless image compression

    Many lossless image compression methods have been suggested, with established results that are hard to surpass. However, there are some aspects that can be considered to improve the performance further. This research focuses on the two-phase prediction-encoding method, studying each phase separately and suggesting new techniques. In the prediction module, the proposed Edge-Based Predictor (EBP) and Least-Squares Edge-Based Predictor (LS-EBP) emphasize image edges and make predictions accordingly. EBP is a gradient-based nonlinear adaptive predictor; it switches between prediction rules based on a few threshold parameters automatically determined by a pre-analysis procedure, which makes a first pass over the image. LS-EBP also uses these parameters, but optimizes the prediction at each edge location identified by the pre-analysis, thus applying the least-squares approach only at the edge points. For the encoding module, a novel method inspired by the Burrows-Wheeler Transform (BWT) is suggested, which performs better than applying the BWT directly to the images. We also present a context-based adaptive error modeling and encoding scheme. When coupled with the above-mentioned prediction schemes, the result is the best-known compression performance among compression schemes with the same time and space complexity.
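
    The following sketch illustrates gradient-switched edge-based prediction in the spirit of EBP (the rules and thresholds are illustrative, not the paper's; in the paper the thresholds come from the pre-analysis pass):

```python
# Hedged sketch of a gradient-switched predictor (illustrative rules).
# W, N, NW are the west, north and north-west neighbours of the current
# pixel; T1 < T2 are edge thresholds, assumed here rather than derived.
def predict(W: int, N: int, NW: int, T1: int = 8, T2: int = 32) -> int:
    dh = abs(N - NW)          # horizontal gradient estimate
    dv = abs(W - NW)          # vertical gradient estimate
    if dv - dh > T2:
        return W              # sharp horizontal edge: predict from the left
    if dh - dv > T2:
        return N              # sharp vertical edge: predict from above
    p = W + N - NW            # planar (MED-style) prediction
    if dv - dh > T1:
        return (p + W) // 2   # mild horizontal edge
    if dh - dv > T1:
        return (p + N) // 2   # mild vertical edge
    return p                  # smooth region
```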

    Compressão e análise de dados genómicos (Compression and analysis of genomic data)

    PhD thesis in Informatics. Genomic sequences are large coded messages describing most of the structure of all known living organisms. Since the presentation of the first genomic sequence, a huge amount of genomic data has been generated, with diversified characteristics, rendering the data-deluge phenomenon a serious problem in most genomics centers. As such, most of the data are discarded (when possible), while others are compressed using general-purpose algorithms, often attaining modest data reduction results. Several specific algorithms have been proposed for the compression of genomic data, but unfortunately only a few of them have been made available as usable and reliable compression tools, and most of those were developed for some specific purpose. In this thesis, we propose a compressor for genomic sequences of multiple natures, able to function in a referential or reference-free mode. Besides, it is very flexible and can cope with diverse hardware specifications. It uses a mixture of finite-context models (FCMs) and eXtended FCMs. The results show improvements over state-of-the-art compressors. Since the compressor can be seen as an unsupervised, alignment-free method to estimate the algorithmic complexity of genomic sequences, it is the ideal candidate to perform analysis of and between sequences. Accordingly, we define a way to approximate the Normalized Information Distance (NID) directly, aiming to identify evolutionary similarities within and between species. Moreover, we introduce a new concept, the Normalized Relative Compression (NRC), that is able to quantify and infer new characteristics of the data, previously undetected by other methods. We also investigate local measures, able to locate specific events using complexity profiles. Furthermore, we present and explore a method based on complexity profiles to detect and visualize genomic rearrangements between sequences, yielding several insights into the genomic evolution of humans. Finally, we introduce the concept of relative uniqueness and apply it to the Ebolavirus, identifying three regions that appear in all the virus sequences of the outbreak but nowhere in the human genome. In fact, we show that these sequences are sufficient to classify different sub-species. We also identify regions in human chromosomes that are absent from the DNA of close primates, pointing to novel traits of human uniqueness.
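
    For reference, the two compression-based measures named above are usually defined as follows in this literature (a hedged reconstruction, not quoted from the thesis; C(x) denotes the compressed size of x, C(x||y) the size of x compressed using a model built exclusively from y, and A the sequence alphabet):

```latex
\[
  \mathrm{NCD}(x,y) = \frac{C(xy) - \min\{C(x),\, C(y)\}}{\max\{C(x),\, C(y)\}},
  \qquad
  \mathrm{NRC}(x \,\|\, y) = \frac{C(x \,\|\, y)}{|x| \log_2 |A|}.
\]
```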

    Distributed Source Coding Techniques for Lossless Compression of Hyperspectral Images

    This paper deals with the application of distributed source coding (DSC) theory to remote sensing image compression. Although DSC exhibits significant potential in many application fields, the results obtained so far on real signals fall short of the theoretical bounds and often impose additional system-level constraints. The objective of this paper is to assess the potential of DSC for lossless image compression carried out onboard a remote platform. We first provide a brief overview of DSC of correlated information sources. We then focus on onboard lossless image compression and apply DSC techniques to reduce the complexity of the onboard encoder, at the expense of the decoder's, by exploiting the correlation between different bands of a hyperspectral dataset. Specifically, we propose two different compression schemes: one based on powerful binary error-correcting codes employed as source codes, and one based on simpler multilevel coset codes. The performance of both schemes is evaluated on a few AVIRIS scenes and compared with other state-of-the-art 2D and 3D coders. Both schemes achieve competitive compression performance, and one of them also has reduced complexity. Based on these results, we highlight the main issues that still have to be solved to further improve the performance of DSC-based remote sensing systems.
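
    A scalar toy example of the coset idea behind the second scheme (illustrative only, not the paper's codes): the encoder sends only the k least significant bits of a sample, its coset index, and the decoder resolves the ambiguity using the co-located sample of a correlated band as side information.

```python
# Hedged scalar coset-coding sketch. Correct decoding assumes the
# correlation is strong enough that |x - side_info| < 2**(k - 1).
def encode(x: int, k: int) -> int:
    return x % (1 << k)                    # k-bit coset index

def decode(coset: int, side_info: int, k: int) -> int:
    step = 1 << k
    base = side_info - side_info % step + coset
    # choose the coset member closest to the side information
    return min((base - step, base, base + step), key=lambda v: abs(v - side_info))

x, y = 137, 140       # pixel in band i and co-located pixel in band i - 1
c = encode(x, 3)      # only 3 bits are transmitted instead of 8
assert decode(c, y, 3) == x
```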

    Lossless compression of images with specific characteristics

    PhD thesis in Electrical Engineering. The compression of some types of images is a challenge for some standard compression techniques. This thesis investigates the lossless compression of images with specific characteristics, namely simple images, color-indexed images and microarray images. We are interested in the development of complete compression methods and in the study of preprocessing algorithms that can be used together with standard compression methods. Histogram sparseness, a property of simple images, is addressed in this thesis. We developed a preprocessing technique, denoted histogram packing, that exploits this property and can be used with standard compression methods to improve their efficiency significantly. Histogram packing and palette reordering algorithms can be used as a preprocessing step to improve the lossless compression of color-indexed images. This thesis presents several algorithms and a comprehensive study of the existing methods. Specific compression methods, such as binary tree decomposition, are also addressed. The use of microarray expression data in state-of-the-art biology is well established and, due to the significant volume of data generated per experiment, efficient lossless compression methods are needed. In this thesis, we explore the use of standard image coding techniques and present new algorithms to compress this type of image efficiently, based on finite-context modeling and arithmetic coding.
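
    A minimal sketch of histogram packing as described above (an assumption-level illustration, not the thesis implementation): the sparse set of intensities actually used by a simple image is remapped to consecutive integers before a standard lossless coder runs, and the inverse table is kept to undo the mapping.

```python
import numpy as np

def pack_histogram(img: np.ndarray):
    used = np.unique(img)                       # sorted distinct intensities
    lut = np.zeros(int(used.max()) + 1, img.dtype)
    lut[used] = np.arange(len(used), dtype=img.dtype)
    return lut[img], used                       # packed image + inverse table

def unpack_histogram(packed: np.ndarray, used: np.ndarray):
    return used[packed]                         # restore original intensities

img = np.array([[0, 17, 17], [255, 0, 98]], dtype=np.uint8)
packed, table = pack_histogram(img)             # values become 0..3
assert np.array_equal(unpack_histogram(packed, table), img)
```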