
    Lossless compression algorithms for microarray images and whole genome alignments

    Doctorate in Informatics. Nowadays, in the 21st century, the never-ending expansion of information is a major global concern. Storage and communication resources are not evolving fast enough to compensate for this trend. To overcome this issue, sophisticated and efficient compression tools are required. The goal of compression is to represent information with as few bits as possible. There are two kinds of compression, lossy and lossless. In lossless compression, no information loss is tolerated, so the decoded information is exactly the same as the encoded information. In lossy compression, on the other hand, some loss is acceptable. In this work we focused on lossless methods. The goal of this thesis was to create lossless compression tools for two types of data. The first type is known in the literature as microarray images. These images have 16 bits per pixel and a high spatial resolution. The other data type is commonly called Whole Genome Alignments (WGA), in particular as stored in MAF files. Regarding microarray images, we improved existing microarray-specific methods by using pre-processing techniques (segmentation and bitplane reduction). We also developed a compression method based on pixel value estimates and a mixture of finite-context models. Furthermore, an approach based on binary-tree decomposition was considered. Two compression tools were developed for MAF files. The first is based on a mixture of finite-context models and arithmetic coding, and considers only the DNA bases and alignment gaps. The second tool, designated MAFCO, is a complete compression tool that can handle all the information found in MAF files. MAFCO relies on several finite-context models and allows parallel compression/decompression of MAF files.
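The first MAF tool drives an arithmetic coder with a mixture of finite-context models over DNA bases and gaps. The following is a minimal Python sketch of that idea under stated assumptions: the alphabet `ACGT-` (uppercase bases plus the gap symbol), the context orders 2 and 8, the additive estimator with `alpha = 1` and the forgetting-based weight update with `gamma = 0.9` are all illustrative choices, not parameters taken from the thesis. Instead of running an arithmetic coder, it reports the ideal code length in bits, which a well-implemented arithmetic coder closely approaches.

```python
import math

ALPHABET = "ACGT-"  # assumed alphabet: uppercase DNA bases plus the gap symbol


class FiniteContextModel:
    """Order-k finite-context model with an additive (alpha) probability estimator."""

    def __init__(self, order, alpha=1.0):
        self.order = order
        self.alpha = alpha
        self.counts = {}  # context string -> per-symbol occurrence counts

    def probabilities(self, context):
        counts = self.counts.get(context, [0] * len(ALPHABET))
        total = sum(counts) + self.alpha * len(ALPHABET)
        return [(n + self.alpha) / total for n in counts]

    def update(self, context, symbol):
        counts = self.counts.setdefault(context, [0] * len(ALPHABET))
        counts[ALPHABET.index(symbol)] += 1


def mixture_code_length(sequence, orders=(2, 8), gamma=0.9):
    """Ideal code length, in bits, of `sequence` under a weighted mixture of FCMs."""
    models = [FiniteContextModel(k) for k in orders]
    weights = [1.0 / len(models)] * len(models)
    bits = 0.0
    for i, symbol in enumerate(sequence):
        s = ALPHABET.index(symbol)
        # Near the start of the sequence the context is shorter than the model order.
        contexts = [sequence[max(0, i - m.order):i] for m in models]
        probs = [m.probabilities(c)[s] for m, c in zip(models, contexts)]
        p_mix = sum(w * p for w, p in zip(weights, probs))
        bits -= math.log2(p_mix)  # cost an ideal arithmetic coder would pay
        # Favor models that predicted this symbol well; gamma < 1 forgets the past.
        weights = [w ** gamma * p for w, p in zip(weights, probs)]
        norm = sum(weights)
        weights = [w / norm for w in weights]
        for m, c in zip(models, contexts):
            m.update(c, symbol)
    return bits


if __name__ == "__main__":
    alignment_row = "ACGT-" * 200  # toy, highly repetitive input
    print(mixture_code_length(alignment_row) / len(alignment_row), "bits/symbol")
```

With no history, every symbol costs log2(5) ≈ 2.32 bits; on repetitive alignment rows the low-order model quickly dominates the mixture and the cost per symbol drops well below that.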

    Compression of DNA microarrays

    The technology for generating DNA microarray images is a very important tool for discovering the structure and function of our genetic information. Within this field, detecting the behavior of certain genes under specific conditions is highly relevant. Microarray images are, on average, large, owing to the large number of genes analyzed and the effort to preserve as much information as possible in the images. They are generated as pairs of grayscale images. To make microarray images easier for specialists to analyze, this project combines each pair of images into a single RGB color image and, at the same time, reduces the size of the images so as not to waste physical storage space. In the field of DNA microarrays, losing information in the images amounts to losing analysis data; therefore, image size is reduced through lossless compression with the JPEG 2000 standard, which reduces their size while preserving the original information they contain.
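The abstract does not say how the two scans are mapped to color channels. A minimal sketch, assuming one scan goes in red, the other in green and the blue channel is left at zero, shows that the combination step is itself lossless, so nothing is lost before the lossless JPEG 2000 stage. Images are plain nested lists of 16-bit integers, the function names are hypothetical, and the JPEG 2000 encoding itself is not shown.

```python
def combine_channels(red, green):
    """Stack two same-sized grayscale scans into one RGB image (blue left at 0)."""
    if len(red) != len(green) or any(len(r) != len(g) for r, g in zip(red, green)):
        raise ValueError("the two scans must have identical dimensions")
    return [[(r, g, 0) for r, g in zip(row_r, row_g)]
            for row_r, row_g in zip(red, green)]


def split_channels(rgb):
    """Recover the original pair of scans from the combined image."""
    red = [[px[0] for px in row] for row in rgb]
    green = [[px[1] for px in row] for row in rgb]
    return red, green
```

Because `split_channels(combine_channels(r, g))` returns the two scans unchanged, the only place information could be lost is the codec, which is why a reversible JPEG 2000 configuration is required.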

    Lossless compression of images with specific characteristics

    Doctorate in Electrical Engineering. The compression of some types of images is a challenge for some standard compression techniques. This thesis investigates the lossless compression of images with specific characteristics, namely simple images, color-indexed images and microarray images. We are interested in the development of complete compression methods and in the study of preprocessing algorithms that could be used together with standard compression methods. Histogram sparseness, a property of simple images, is addressed in this thesis. We developed a preprocessing technique, denoted histogram packing, that exploits this property and can be used with standard compression methods to significantly improve their efficiency. Histogram packing and palette reordering algorithms can be used as a preprocessing step for improving the lossless compression of color-indexed images. This thesis presents several algorithms and a comprehensive study of existing methods. Specific compression methods, such as binary tree decomposition, are also addressed. The use of microarray expression data in state-of-the-art biology is well established and, due to the significant volume of data generated per experiment, efficient lossless compression methods are needed. In this thesis, we explore the use of standard image coding techniques and present new algorithms, based on finite-context modeling and arithmetic coding, to efficiently compress this type of image.
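Histogram packing can be sketched in a few lines. This minimal version, assuming a flat list of integer intensities, maps the distinct values actually used, in increasing order, onto 0..n-1 and keeps the sorted list of used values as side information so the decoder can undo the mapping. Because the mapping is monotonic, intensities keep their relative order, so a predictive coder that follows it sees the same structure on a denser histogram. The function names are illustrative, not taken from the thesis.

```python
def histogram_pack(pixels):
    """Map the intensities actually used onto 0..n-1, preserving their order."""
    used = sorted(set(pixels))                      # side information for the decoder
    to_packed = {value: index for index, value in enumerate(used)}
    return [to_packed[p] for p in pixels], used


def histogram_unpack(packed, used):
    """Invert the packing using the transmitted table of used values."""
    return [used[i] for i in packed]
```

For an image that only uses the values 0, 64, 128 and 255, packing turns them into 0, 1, 2 and 3; the cost is the table of used values, which is small when the histogram is sparse.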

    Efficient architectures of heterogeneous FPGA-GPU for 3-D medical image compression

    Advances in three-dimensional (3-D) imaging modalities such as magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET), and ultrasound (US) have generated a massive amount of volumetric data. Existing surveys reveal a large gap for further research in exploiting reconfigurable computing for 3-D medical image compression. This research proposes an FPGA-based co-processing solution to accelerate such medical imaging systems. The HWT block was implemented on the sbRIO-9632 FPGA board, which uses a Spartan-3 (XC3S2000) chip. Analysis and performance evaluation were conducted on 3-D images. Furthermore, a novel architecture is presented for the context-based adaptive binary arithmetic coder (CABAC), the advanced entropy coding tool employed by the Main and higher profiles of H.264/AVC. This research focuses on a GPU implementation of CABAC and on a comparative study of 3-D medical image compression with and without the discrete wavelet transform (DWT). Implementation results on MRI and CT images show the GPU implementation significantly outperforming a single-threaded CPU implementation. Overall, for both CT and MRI, compression with DWT outperforms compression without DWT in terms of compression ratio, peak signal-to-noise ratio (PSNR), and latency. For heterogeneous computing, MRI images of various sizes and formats, such as JPEG and DICOM, were processed. Evaluation results are reported for each memory iteration, with GPU-to-CPU transfers consuming more bandwidth (throughput). For a 786,486-byte JPEG image, the bandwidth consumed in the two directions tends to balance. Bandwidth depends on the transfer size: larger transfers incur more latency and consume more throughput. Next, an OpenCL implementation for concurrent tasks on a dedicated FPGA is presented. The implementation results show that OpenCL in batch-processing mode with AOC techniques gives substantial results, with the amount of logic, area, registers, and memory increasing in proportion to the number of batches. This is because the kernel block is replicated according to the batch number, so the number of memory banks grows with the number of kernel blocks. A comparative study found that the balanced-tree and loop-unrolling architecture performs better in terms of local memory, latency, and throughput.
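The abstract names an HWT block and compares compression with and without the DWT. Assuming HWT stands for a Haar wavelet transform, the sketch below shows one level of the integer-to-integer Haar transform (the S-transform) in lifting form on a 1-D signal. A 3-D transform would apply the same step separably along each axis of the volume. Because it maps integers to integers and is exactly invertible, it suits lossless coding; it is an illustration, not the FPGA design from the thesis.

```python
def haar_forward(x):
    """One level of the integer Haar (S-transform); len(x) must be even."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    approx, detail = [], []
    for a, b in zip(x[0::2], x[1::2]):
        d = a - b            # detail (high-pass) coefficient
        s = b + (d >> 1)     # approximation, equal to floor((a + b) / 2)
        approx.append(s)
        detail.append(d)
    return approx, detail


def haar_inverse(approx, detail):
    """Exactly undo haar_forward by reversing the two lifting steps."""
    x = []
    for s, d in zip(approx, detail):
        b = s - (d >> 1)
        a = d + b
        x.extend([a, b])
    return x
```

Python's `>>` on negative integers rounds toward minus infinity, which keeps the forward and inverse rounding consistent, so `haar_inverse(*haar_forward(x)) == x` for any even-length integer list.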

    Wavelet-based noise reduction of cDNA microarray images

    The advent of microarray imaging technology has led to enormous progress in the life sciences by allowing scientists to analyze the expression of thousands of genes at a time. For complementary DNA (cDNA) microarray experiments, the raw data are a pair of red and green channel images corresponding to the treatment and control samples. These images are contaminated by a high level of noise due to the numerous noise sources affecting image formation. A major challenge of microarray image analysis is the extraction of accurate gene expression measurements from the noisy microarray images. A crucial step in this process is denoising, which consists of reducing the noise in the observed microarray images while preserving the signal information as much as possible. This thesis deals with the problem of developing novel methods for reducing noise in cDNA microarray images for accurate estimation of gene expression levels. Denoising methods based on the wavelet transform have shown significant success when applied to natural images. However, these methods are not very efficient for reducing noise in cDNA microarray images. An important reason for this is that existing methods can only process the red and green channel images separately. In doing so, they ignore the signal correlation as well as the noise correlation that exists between the wavelet coefficients of the two channels. The primary objective of this research is to design efficient wavelet-based noise reduction algorithms for cDNA microarray images that take these inter-channel dependencies into account by jointly estimating the noise-free coefficients in both channels. Denoising algorithms are developed using two types of wavelet transforms, namely the frequently used discrete wavelet transform (DWT) and the complex wavelet transform (CWT). The main advantage of using the DWT for denoising is that this transform is computationally very efficient. To obtain better denoising performance for microarray images, however, the CWT is preferred to the DWT because the former has good directional selectivity, which is necessary for better representation of the circular edges of spots. The linear minimum mean squared error and maximum a posteriori estimation techniques are used to develop bivariate estimators for the noise-free coefficients of the two images. These estimators are derived by utilizing appropriate joint probability density functions for the image coefficients as well as the noise coefficients of the two channels. Extensive experiments are carried out on a large set of cDNA microarray images to evaluate the performance of the proposed denoising methods against existing ones. Comparisons are made using standard metrics such as the peak signal-to-noise ratio (PSNR), to measure the amount of noise removed from the pixels of the images, and the mean absolute error, to measure the accuracy of the estimated log-intensity ratios obtained from the denoised images. Results indicate that the proposed denoising methods, developed specifically for microarray images, do indeed lead to more accurate estimation of gene expression levels. Thus, it is expected that the proposed methods will play a significant role in improving the reliability of the results obtained from practical microarray experiments.
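The joint estimation idea can be illustrated with the generic linear minimum mean squared error (LMMSE) estimator for a pair of observed coefficients `y = x + n`, where `x` holds the noise-free red and green coefficients at the same position. The sketch assumes zero-mean, mutually independent signal and noise with known 2x2 covariance matrices `C_x` and `C_n`; in practice these would have to be estimated, for example locally in each subband. The estimate is then `x_hat = C_x (C_x + C_n)^-1 y`. This is not the specific estimator derived from the joint densities in the thesis, and the function name is hypothetical.

```python
def joint_lmmse(y_red, y_green, cov_signal, cov_noise):
    """LMMSE estimate of noise-free coefficient pairs from noisy red/green pairs.

    cov_signal, cov_noise: 2x2 covariance matrices [[var_r, cov_rg], [cov_gr, var_g]]
    of the (zero-mean, mutually independent) signal and noise coefficients.
    """
    # Covariance of the observations: C_y = C_x + C_n.
    a = cov_signal[0][0] + cov_noise[0][0]
    b = cov_signal[0][1] + cov_noise[0][1]
    c = cov_signal[1][0] + cov_noise[1][0]
    d = cov_signal[1][1] + cov_noise[1][1]
    det = a * d - b * c
    inv_cy = [[d / det, -b / det], [-c / det, a / det]]
    # Gain matrix W = C_x C_y^-1; the estimate is x_hat = W y.
    w = [[sum(cov_signal[i][k] * inv_cy[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    x_red = [w[0][0] * r + w[0][1] * g for r, g in zip(y_red, y_green)]
    x_green = [w[1][0] * r + w[1][1] * g for r, g in zip(y_red, y_green)]
    return x_red, x_green
```

With diagonal covariances the off-diagonal gains vanish and each channel is shrunk independently, which is the separate-channel processing the thesis criticizes. With correlated signal, for example `C_x = [[1, 1], [1, 1]]` and `C_n` the identity, both estimates become `(y_r + y_g) / 3`, so each channel borrows information from the other.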

    Fifth Biennial Report : June 1999 - August 2001


    Using MapReduce Streaming for Distributed Life Simulation on the Cloud

    Distributed software simulations are indispensable in the study of large-scale life models but often require the use of technically complex lower-level distributed computing frameworks, such as MPI. We propose to overcome the complexity challenge by applying the emerging MapReduce (MR) model to distributed life simulations and by running such simulations on the cloud. Technically, we design optimized MR streaming algorithms for discrete and continuous versions of Conway’s Life according to a general MR streaming pattern. We chose Life because it is simple enough to serve as a testbed for MR’s applicability to a-life simulations and general enough to make our results applicable to various lattice-based a-life models. We implement and empirically evaluate our algorithms’ performance on Amazon’s Elastic MR cloud. Our experiments demonstrate that a single MR optimization technique called strip partitioning can reduce the execution time of continuous life simulations by 64%. To the best of our knowledge, we are the first to propose and evaluate MR streaming algorithms for lattice-based simulations. Our algorithms can serve as prototypes in the development of novel MR simulation algorithms for large-scale lattice-based a-life models.
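One generation of discrete Life maps naturally onto a map, shuffle and reduce pass. The mapper emits, for every live cell, a marker for the cell itself and a neighbor vote for each of its eight neighbors. The shuffle groups records by cell, and the reducer applies Conway's rules: a dead cell with 3 neighbors is born, and a live cell with 2 or 3 neighbors survives. This minimal in-process Python sketch is a textbook formulation assumed for illustration. It does not include the paper's strip-partitioning optimization or the continuous variant, and the function names are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

NEIGHBOR_OFFSETS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def mapper(live_cells):
    """Emit one marker per live cell and one neighbor vote per adjacent position."""
    for x, y in live_cells:
        yield (x, y), "alive"
        for dx, dy in NEIGHBOR_OFFSETS:
            yield (x + dx, y + dy), 1


def reducer(key, values):
    """Apply Conway's rules (B3/S23) to one cell, given all records for its key."""
    alive, neighbors = False, 0
    for v in values:
        if v == "alive":
            alive = True
        else:
            neighbors += v
    if neighbors == 3 or (alive and neighbors == 2):
        yield key


def step(live_cells):
    """One generation: map, shuffle (sort and group by key), reduce."""
    records = sorted(mapper(live_cells), key=itemgetter(0))
    new_cells = set()
    for key, group in groupby(records, key=itemgetter(0)):
        new_cells.update(reducer(key, (v for _, v in group)))
    return new_cells
```

In Hadoop Streaming, the mapper and reducer would instead be separate scripts exchanging tab-separated key/value lines over stdin and stdout, with the framework's sort and grouping taking the place of `sorted` and `groupby`.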

    Cereal Genomics II

    During the last decades, major advances have been made in the field of cereal genomics. For instance, high-density genetic maps, physical maps, QTL maps and even draft genome sequences have become available for several cereal species. This has been facilitated by the development of next-generation sequencing (NGS) technologies, so that it is now possible to sequence the genomes of hundreds or thousands of accessions of an individual cereal crop. The large amounts of data generated by these NGS technologies have created a demand for computational tools to analyse them. These developments in technology and tools, along with their applications not only to plant and genome biology but also to breeding, are documented in this volume. The volume, entitled “Cereal Genomics II”, therefore supplements the earlier edited volume “Cereal Genomics” published in 2004. The new volume has updated chapters, from leading authorities in their fields, on molecular markers, next-generation sequencing platforms and their use for QTL analysis, domestication studies, functional genomics and molecular breeding. In addition, there are chapters on computational genomics, whole genome sequencing and comparative genomics of cereals. The book should prove useful to students, teachers and young researchers as a ready reference to the latest information on cereal genomics.