388 research outputs found

    Data compression for sequencing data

    Get PDF
    Post-Sanger sequencing methods produce tons of data, and there is a general agreement that the challenge to store and process them must be addressed with data compression. In this review we first answer the question “why compression” in a quantitative manner. Then we also answer the questions “what” and “how”, by sketching the fundamental compression ideas, describing the main sequencing data types and formats, and comparing the specialized compression algorithms and tools. Finally, we go back to the question “why compression” and give other, perhaps surprising answers, demonstrating the pervasiveness of data compression techniques in computational biology

    Full-State Quantum Circuit Simulation by Using Data Compression

    Full text link
    Quantum circuit simulations are critical for evaluating quantum algorithms and machines. However, the number of state amplitudes required for full simulation increases exponentially with the number of qubits. In this study, we leverage data compression to reduce memory requirements, trading computation time and fidelity for memory space. Specifically, we develop a hybrid solution by combining the lossless compression and our tailored lossy compression method with adaptive error bounds at each timestep of the simulation. Our approach optimizes for compression speed and makes sure that errors due to lossy compression are uncorrelated, an important property for comparing simulation output with physical machines. Experiments show that our approach reduces the memory requirement of simulating the 61-qubit Grover's search algorithm from 32 exabytes to 768 terabytes of memory on Argonne's Theta supercomputer using 4,096 nodes. The results suggest that our techniques can increase the simulation size by 2 to 16 qubits for general quantum circuits.Comment: Published in SC2019. Please cite the SC versio

    A forensics software toolkit for DNA steganalysis.

    Get PDF
    Recent advances in genetic engineering have allowed the insertion of artificial DNA strands into the living cells of organisms. Several methods have been developed to insert information into a DNA sequence for the purpose of data storage, watermarking, or communication of secret messages. The ability to detect, extract, and decode messages from DNA is important for forensic data collection and for data security. We have developed a software toolkit that is able to detect the presence of a hidden message within a DNA sequence, extract that message, and then decode it. The toolkit is able to detect, extract, and decode messages that have been encoded with a variety of different coding schemes. The goal of this project is to enable our software toolkit to determine with which coding scheme a message has been encoded in DNA and then to decode it. The software package is able to decode messages that have been encoded with every variation of most of the coding schemes described in this document. The software toolkit has two different options for decoding that can be selected by the user. The first is a frequency analysis approach that is very commonly used in cryptanalysis. This approach is very fast, but is unable to decode messages shorter than 200 words accurately. The second option is using a Genetic Algorithm (GA) in combination with a Wisdom of Artificial Crowds (WoAC) technique. This approach is very time consuming, but can decode shorter messages with much higher accuracy

    Video coding for compression and content-based functionality

    Get PDF
    The lifetime of this research project has seen two dramatic developments in the area of digital video coding. The first has been the progress of compression research leading to a factor of two improvement over existing standards, much wider deployment possibilities and the development of the new international ITU-T Recommendation H.263. The second has been a radical change in the approach to video content production with the introduction of the content-based coding concept and the addition of scene composition information to the encoded bit-stream. Content-based coding is central to the latest international standards efforts from the ISO/IEC MPEG working group. This thesis reports on extensions to existing compression techniques exploiting a priori knowledge about scene content. Existing, standardised, block-based compression coding techniques were extended with work on arithmetic entropy coding and intra-block prediction. These both form part of the H.263 and MPEG-4 specifications respectively. Object-based coding techniques were developed within a collaborative simulation model, known as SIMOC, then extended with ideas on grid motion vector modelling and vector accuracy confidence estimation. An improved confidence measure for encouraging motion smoothness is proposed. Object-based coding ideas, with those from other model and layer-based coding approaches, influenced the development of content-based coding within MPEG-4. This standard made considerable progress in this newly adopted content based video coding field defining normative techniques for arbitrary shape and texture coding. The means to generate this information, the analysis problem, for the content to be coded was intentionally not specified. Further research work in this area concentrated on video segmentation and analysis techniques to exploit the benefits of content based coding for generic frame based video. The work reported here introduces the use of a clustering algorithm on raw data features for providing initial segmentation of video data and subsequent tracking of those image regions through video sequences. Collaborative video analysis frameworks from COST 21 l qual and MPEG-4, combining results from many other segmentation schemes, are also introduced

    The 1995 Science Information Management and Data Compression Workshop

    Get PDF
    This document is the proceedings from the 'Science Information Management and Data Compression Workshop,' which was held on October 26-27, 1995, at the NASA Goddard Space Flight Center, Greenbelt, Maryland. The Workshop explored promising computational approaches for handling the collection, ingestion, archival, and retrieval of large quantities of data in future Earth and space science missions. It consisted of fourteen presentations covering a range of information management and data compression approaches that are being or have been integrated into actual or prototypical Earth or space science data information systems, or that hold promise for such an application. The Workshop was organized by James C. Tilton and Robert F. Cromp of the NASA Goddard Space Flight Center

    A taxonomy of Lempel-Ziv compression algorithms

    Get PDF
    • …
    corecore