38 research outputs found

    Influence of Dictionary Size on the Lossless Compression of Microarray Images

    A key challenge in the management of microarray data is the large size of the images that constitute the output of microarray experiments. Therefore, only the expression values extracted from these experiments are generally made available. However, the extraction of expression data is affected by a variety of factors, such as the thresholds used for background intensity correction, the method used for grid determination, and the parameters used in foreground (spot)-background delineation. This information is not always available or consistent across experiments, and it impacts downstream data analysis. Furthermore, the lack of access to the image-based primary data often leads to costly replication of experiments. Both lossy and lossless compression techniques have been developed for microarray images. While lossy algorithms deliver better compression, a significant advantage of the lossless techniques is that they guarantee against loss of information that is putatively of biological importance. A key challenge therefore is the development of more efficacious lossless compression techniques. Dictionary-based compression is one of the critical methods used in lossless microarray compression. However, image-based microarray data has potentially infinite variability, so the selection of the dictionary size and its effect on the compression rate are crucial. Our paper examines this problem and shows that increasing the dictionary size beyond a certain point does not lead to better compression. Our investigations also point to strategies for determining the optimal dictionary size.
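One way to probe the dictionary-size question is to compress the same data with several dictionary sizes and compare the output sizes. The sketch below is only an illustration under stated assumptions: it uses DEFLATE from Python's `zlib`, where the sliding window (`wbits`) plays the role of the dictionary, and a synthetic 16-bit image generated by the hypothetical helper `synthetic_microarray`. It is not the dictionary coder or the data set used in the paper.

```python
import random
import zlib


def synthetic_microarray(n_pixels: int = 20_000, seed: int = 0) -> bytes:
    """Hypothetical stand-in for a 16-bit scan: noisy background plus bright spots."""
    rng = random.Random(seed)
    out = bytearray()
    for i in range(n_pixels):
        in_spot = (i // 40) % 5 == 0  # crude periodic "spots" (assumption)
        value = rng.randint(20_000, 60_000) if in_spot else rng.randint(0, 300)
        out += value.to_bytes(2, "big")  # two bytes per pixel
    return bytes(out)


def deflate(data: bytes, wbits: int) -> bytes:
    """Compress with DEFLATE using a 2**wbits-byte sliding window."""
    comp = zlib.compressobj(9, zlib.DEFLATED, wbits)
    return comp.compress(data) + comp.flush()


def sizes_by_window(data: bytes) -> dict[int, int]:
    """Compressed size in bytes for window sizes 2**9 .. 2**15."""
    return {wbits: len(deflate(data, wbits)) for wbits in range(9, 16)}


if __name__ == "__main__":
    image = synthetic_microarray()
    for wbits, size in sizes_by_window(image).items():
        print(f"window 2^{wbits} bytes -> {size} compressed bytes")
```

Running the same comparison on real microarray scans, rather than on this synthetic input, would show whether and where the gain from a larger window levels off for a given image.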

    Standard and specific compression techniques for DNA microarray images

    We review the state of the art in DNA microarray image compression and provide original comparisons between standard and microarray-specific compression techniques that validate and expand previous work. First, we describe the most relevant approaches published in the literature and classify them according to the stage of the typical image compression process where each approach makes its contribution. We then summarize the compression results reported for these microarray-specific image compression schemes. In a set of experiments conducted for this paper, we obtain new results for several popular image coding techniques, including the most recent coding standards. The prediction-based schemes CALIC and JPEG-LS are the best-performing standard compressors, but are improved upon by the best microarray-specific technique, Battiato's CNN-based scheme.

    Lossless compression algorithms for microarray images and whole genome alignments (Algoritmos de compressão sem perdas para imagens de microarrays e alinhamento de genomas completos)

    Doctorate in Informatics. Nowadays, in the 21st century, the never-ending expansion of information is a major global concern. The pace at which storage and communication resources are evolving is not fast enough to compensate for this tendency. To overcome this issue, sophisticated and efficient compression tools are required. The goal of compression is to represent information with as few bits as possible. There are two kinds of compression, lossy and lossless. In lossless compression, information loss is not tolerated, so the decoded information is exactly the same as the encoded one. In lossy compression, on the other hand, some loss is acceptable. In this work we focused on lossless methods. The goal of this thesis was to create lossless compression tools for two types of data. The first type is known in the literature as microarray images. These images have 16 bits per pixel and a high spatial resolution. The other data type is commonly called Whole Genome Alignments (WGA), in particular as stored in MAF files. Regarding the microarray images, we improved existing microarray-specific methods by using pre-processing techniques (segmentation and bitplane reduction). We also developed a compression method based on pixel value estimates and a mixture of finite-context models, and considered an approach based on binary-tree decomposition. Two compression tools were developed to compress MAF files. The first is based on a mixture of finite-context models and arithmetic coding, where only the DNA bases and alignment gaps were considered. The second tool, designated MAFCO, is a complete compression tool that can handle all the information found in MAF files. MAFCO relies on several finite-context models and allows parallel compression/decompression of MAF files.
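A finite-context model of order k estimates the probability of each symbol from counts conditioned on the k preceding symbols, and an arithmetic coder then spends roughly -log2(p) bits on a symbol of probability p. The sketch below estimates that ideal code length for a DNA string with a single adaptive model. The function name `fcm_code_length`, the smoothing parameter `alpha`, and the four-letter alphabet are assumptions made for illustration; the thesis tools mix several models and are not reproduced here.

```python
import math
from collections import defaultdict


def fcm_code_length(sequence: str, order: int = 2,
                    alphabet: str = "ACGT", alpha: float = 1.0) -> float:
    """Ideal code length in bits under one adaptive order-`order` model.

    Every symbol of `sequence` must belong to `alphabet`.
    """
    # One count table per context, created on first use.
    counts = defaultdict(lambda: dict.fromkeys(alphabet, 0))
    total_bits = 0.0
    for i, symbol in enumerate(sequence):
        context = sequence[max(0, i - order):i]  # shorter at the very start
        table = counts[context]
        seen = sum(table.values())
        # Additive smoothing keeps unseen symbols at non-zero probability.
        p = (table[symbol] + alpha) / (seen + alpha * len(alphabet))
        total_bits += -math.log2(p)
        table[symbol] += 1  # update after coding, as a decoder could
    return total_bits


if __name__ == "__main__":
    seq = "ACGT" * 50
    bits = fcm_code_length(seq, order=2)
    print(f"{bits / len(seq):.3f} bits per base")
```

Because the counts are updated only after each symbol is costed, a decoder could rebuild the same model from the symbols it has already decoded, which is what makes the scheme lossless.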

    Topics in genomic image processing

    Image processing methodologies that have been actively studied and developed now play a very significant role in flourishing biotechnology research. This work studies, develops and implements several image processing techniques for M-FISH and cDNA microarray images. In particular, we focus on three important areas: M-FISH image compression, microarray image processing and expression-based classification. Two schemes, embedded M-FISH image coding (EMIC) and Microarray BASICA: Background Adjustment, Segmentation, Image Compression and Analysis, have been introduced for M-FISH image compression and microarray image processing, respectively. In the expression-based classification area, we investigate the relationship between the optimal number of features and the sample size, either analytically or through simulation, for various classifiers.

    Effect of image compression and scaling on automated scoring of immunohistochemical stainings and segmentation of tumor epithelium

    Background: Digital whole-slide scanning of tissue specimens produces large images demanding increasing storing capacity. To reduce the need of extensive data storage systems image files can be compressed and scaled down. The aim of this article is to study the effect of different levels of image compression and scaling on automated image analysis of immunohistochemical (IHC) stainings and automated tumor segmentation. Methods: Two tissue microarray (TMA) slides containing 800 samples of breast cancer tissue immunostained against Ki-67 protein and two TMA slides containing 144 samples of colorectal cancer immunostained against EGFR were digitized with a whole-slide scanner. The TMA images were JPEG2000 wavelet compressed with four compression ratios: lossless, and 1:12, 1:25 and 1:50 lossy compression. Each of the compressed breast cancer images was furthermore scaled down either to 1:1, 1:2, 1:4, 1:8, 1:16, 1:32, 1:64 or 1:128. Breast cancer images were analyzed using an algorithm that quantitates the extent of staining in Ki-67 immunostained images, and EGFR immunostained colorectal cancer images were analyzed with an automated tumor segmentation algorithm. The automated tools were validated by comparing the results from losslessly compressed and non-scaled images with results from conventional visual assessments. Percentage agreement and kappa statistics were calculated between results from compressed and scaled images and results from lossless and non-scaled images. Results: Both of the studied image analysis methods showed good agreement between visual and automated results. In the automated IHC quantification, an agreement of over 98% and a kappa value of over 0.96 was observed between losslessly compressed and non-scaled images and combined compression ratios up to 1:50 and scaling down to 1:8. In automated tumor segmentation, an agreement of over 97% and a kappa value of over 0.93 was observed between losslessly compressed images and compression ratios up to 1:25. Conclusions: The results of this study suggest that images stored for assessment of the extent of immunohistochemical staining can be compressed and scaled significantly, and images of tumors to be segmented can be compressed without compromising computer-assisted analysis results using studied methods. Virtual slides: The virtual slide(s) for this article can be found here: http://www.diagnosticpathology.diagnomx.eu/vs/2442925476534995
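The two agreement measures named in the Methods can be computed from paired categorical results, for example the class assigned to each sample from the lossless image and from a compressed one. The sketch below assumes the unweighted Cohen's kappa; the abstract does not say which kappa variant was used, and the function names and example labels are hypothetical.

```python
from collections import Counter


def percent_agreement(a, b) -> float:
    """Fraction of paired observations with identical labels."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b) -> float:
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from the marginal label frequencies of each rater.
    labels = count_a.keys() | count_b.keys()
    p_e = sum(count_a[k] * count_b[k] for k in labels) / (n * n)
    if p_e == 1.0:  # both raters used one and the same label throughout
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    reference = [0, 0, 1, 1]   # e.g. classes from lossless images
    compressed = [0, 0, 1, 0]  # e.g. classes from compressed images
    print(percent_agreement(reference, compressed))  # 0.75
    print(cohen_kappa(reference, compressed))        # 0.5
```

Kappa is reported alongside raw agreement because it discounts the agreement expected by chance from the label frequencies alone.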

    Lossless compression of images with specific characteristics

    Doctorate in Electrical Engineering. The compression of some types of images is a challenge for some standard compression techniques. This thesis investigates the lossless compression of images with specific characteristics, namely simple images, color-indexed images and microarray images. We are interested in the development of complete compression methods and in the study of preprocessing algorithms that could be used together with standard compression methods. Histogram sparseness, a property of simple images, is addressed in this thesis. We developed a preprocessing technique, called histogram packing, that exploits this property and can be used with standard compression methods to significantly improve their efficiency. Histogram packing and palette reordering algorithms can be used as a preprocessing step to improve the lossless compression of color-indexed images. This thesis presents several algorithms and a comprehensive study of the existing methods. Specific compression methods, such as binary tree decomposition, are also addressed. The use of microarray expression data in state-of-the-art biology is well established and, due to the significant volume of data generated per experiment, efficient lossless compression methods are needed. In this thesis, we explore the use of standard image coding techniques and present new algorithms, based on finite-context modeling and arithmetic coding, to efficiently compress this type of image.
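The idea behind histogram packing can be sketched as follows: when an image uses only a few of the possible intensity values, those values are mapped to consecutive indices before coding, and the mapping table is kept so the original values can be restored exactly. This is a minimal illustration with hypothetical function names, assuming a flat list of pixel values; the thesis's actual algorithms are not reproduced here.

```python
def pack_histogram(pixels):
    """Map the distinct intensities in `pixels` to indices 0..m-1.

    Returns the packed pixels and the sorted table needed to invert the map.
    """
    table = sorted(set(pixels))                 # intensities actually used
    index = {value: i for i, value in enumerate(table)}
    return [index[value] for value in pixels], table


def unpack_histogram(packed, table):
    """Restore the original intensities from packed indices."""
    return [table[i] for i in packed]


if __name__ == "__main__":
    pixels = [0, 40, 40, 200, 0, 120]           # sparse use of the 0..255 range
    packed, table = pack_histogram(pixels)
    print(packed)  # [0, 1, 1, 3, 0, 2]
    print(table)   # [0, 40, 120, 200]
    assert unpack_histogram(packed, table) == pixels
```

Sorting the table keeps the mapping order-preserving, so brighter pixels still receive larger indices, which matters for coders that predict a pixel from its neighbours.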

    Efficient architectures of heterogeneous FPGA-GPU for 3-D medical image compression

    Advances in three-dimensional (3-D) imaging modalities such as magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET), and ultrasound (US) have generated a massive amount of volumetric image data. Existing surveys reveal a large gap in research on exploiting reconfigurable computing for 3-D medical image compression. This research proposes an FPGA-based co-processing solution to accelerate such medical imaging systems. The HWT block is implemented on the sbRIO-9632 FPGA board, a prototyping board based on the Spartan 3 (XC3S2000) chip. Analysis and performance evaluation of the 3-D images were conducted. Furthermore, a novel architecture is presented for the context-based adaptive binary arithmetic coder (CABAC), the advanced entropy coding tool employed by the main and higher profiles of H.264/AVC. This research focuses on a GPU implementation of CABAC and a comparative study of 3-D medical image compression systems with and without the discrete wavelet transform (DWT). Implementation results on MRI and CT images show the GPU significantly outperforming a single-threaded CPU implementation. Overall, CT and MRI images processed with the DWT outperform those processed without it in terms of compression ratio, peak signal-to-noise ratio (PSNR), and latency. For heterogeneous computing, MRI images of various sizes and formats, such as JPEG and DICOM, were used. Evaluation results are shown for each memory iteration; transfers from GPU to CPU consume more bandwidth or throughput. For the 786,486-byte JPEG image, the bandwidth consumed in the two directions tends to balance. Bandwidth depends on the transfer size: larger transfers incur more latency and throughput. Next, an OpenCL implementation for concurrent tasks on a dedicated FPGA is presented. The implementation shows that OpenCL in batch processing mode with AOC techniques offers substantial results, with the amount of logic, area, registers, and memory increasing in proportion to the number of batches. This is because the kernel block is replicated according to the batch number, so the memory banks increase along with the kernel blocks. A comparative study found that the tree-balance and loop-unroll architecture performs better in terms of local memory, latency, and throughput.

    Learning in the compressed data domain: Application to milk quality prediction

    Smart dairy farming has become one of the most exciting and challenging areas in cloud-based data analytics. Transferring raw data from all farms to a central cloud is currently not feasible, because applications are generating more data while internet connectivity is lacking on rural farms. As a solution, Fog computing has become a key factor in processing data near the farm and deriving farm insights by exchanging data between on-farm applications and transferring some data to the cloud. In this context, learning in the compressed data domain, where decompression is not necessary, is highly desirable as it minimizes the energy used for communication and computation, reduces the required memory and storage, and improves application latency. Mid-infrared spectroscopy (MIRS) is used globally to predict several milk quality parameters as well as to derive many animal-level phenotypes. Therefore, compressed learning on MIRS data is beneficial both for data processing in the Fog and for storing large data sets in the cloud. In this paper, we used principal component analysis and the wavelet transform as two techniques for compressed learning to convert MIRS data into a compressed data domain. The study derives near-lossless compression parameters for both techniques to transform MIRS data without impacting the prediction accuracy for a selection of milk quality traits.