2,572 research outputs found

    Data Compression in the Petascale Astronomy Era: a GERLUMPH case study

    Full text link
    As the volume of data grows, astronomers are increasingly faced with choices on what data to keep -- and what to throw away. Recent work evaluating the JPEG2000 (ISO/IEC 15444) standards as a future data format standard in astronomy has shown promising results on observational data. However, there is still a need to evaluate its potential on other types of astronomical data, such as output from numerical simulations. GERLUMPH (the GPU-Enabled High Resolution cosmological MicroLensing parameter survey) is an example of a data-intensive project in theoretical astrophysics. In the next phase of processing, the ~27 terabyte GERLUMPH dataset is set to grow by a factor of 100 -- well beyond the current storage capabilities of the supercomputing facility on which it resides. In order to minimise bandwidth usage, file transfer time, and storage space, this work evaluates several data compression techniques. Specifically, we investigate off-the-shelf and custom lossless compression algorithms as well as the lossy JPEG2000 compression format. Lossless compression of GERLUMPH data products yields small compression ratios (from 1.35:1 to 4.69:1 of input file size), varying with the nature of the input data. Our results suggest that JPEG2000 could be suitable for other numerical datasets stored as gridded or volumetric data. When applying lossy compression, one should keep in mind the intended purpose of the data and evaluate the effect of the loss on future analysis. In our case study, lossy compression at a high compression ratio does not significantly compromise the intended use of the data: constraining quasar source profiles from cosmological microlensing.
    Comment: 15 pages, 9 figures, 5 tables. Published in the Special Issue of Astronomy & Computing on the future of astronomical data formats.
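    As a concrete illustration of the lossless side of such a benchmark, the sketch below compresses a synthetic integer-valued grid with three standard Python codecs and reports the compression ratios. The grid shape and Poisson pixel statistics are illustrative assumptions standing in for a magnification map, not GERLUMPH's actual data layout.

    ```python
    # Hedged sketch: off-the-shelf lossless codecs on synthetic gridded data.
    import bz2, lzma, zlib
    import numpy as np

    # Stand-in for a gridded data product: integer counts with Poisson statistics.
    grid = np.random.default_rng(0).poisson(100, (1024, 1024)).astype(np.int32)
    raw = grid.tobytes()

    for name, codec in [("zlib", zlib), ("bz2", bz2), ("lzma", lzma)]:
        packed = codec.compress(raw)
        print(f"{name}: {len(raw) / len(packed):.2f}:1")
    ```

    The ratios depend strongly on the entropy of the input, which is the paper's point about compression varying with the nature of the data.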

    Lossless compression with latent variable models

    Get PDF
    We develop a simple and elegant method for lossless compression using latent variable models, which we call 'bits back with asymmetric numeral systems' (BB-ANS). The method involves interleaving encode and decode steps, and achieves an optimal rate when compressing batches of data. We demonstrate it first on the MNIST test set, showing that state-of-the-art lossless compression is possible using a small variational autoencoder (VAE) model. We then make use of a novel empirical insight -- that fully convolutional generative models trained on small images are able to generalize to images of arbitrary size -- and extend BB-ANS to hierarchical latent variable models, enabling state-of-the-art lossless compression of full-size colour images from the ImageNet dataset. We describe 'Craystack', a modular software framework which we have developed for rapid prototyping of compression using deep generative models.
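    BB-ANS builds on the asymmetric numeral systems (ANS) family of entropy coders. Below is a minimal, non-streaming rANS codec in Python, illustrating the stack-like encode/decode behaviour that bits-back coding interleaves with the latent-variable model. Fixed integer frequencies and Python's arbitrary-precision integers stand in for the renormalised streaming state of a production coder (and for Craystack itself), so this is a sketch of the underlying primitive, not the authors' implementation.

    ```python
    # Minimal non-streaming rANS sketch.
    def build_tables(freqs):
        """Cumulative frequency table and total mass M."""
        cum, c = {}, 0
        for s, f in freqs.items():
            cum[s] = c
            c += f
        return cum, c  # c is M, the total frequency mass

    def rans_encode(symbols, freqs):
        cum, M = build_tables(freqs)
        x = 1  # initial state
        for s in symbols:
            f = freqs[s]
            x = (x // f) * M + cum[s] + (x % f)
        return x

    def rans_decode(x, n, freqs):
        cum, M = build_tables(freqs)
        slots = {k: s for s, f in freqs.items() for k in range(cum[s], cum[s] + f)}
        out = []
        for _ in range(n):
            slot = x % M
            s = slots[slot]
            x = freqs[s] * (x // M) + slot - cum[s]
            out.append(s)
        return out[::-1]  # symbols pop off in reverse (stack order)

    freqs = {'a': 3, 'b': 1}  # P(a) = 3/4, P(b) = 1/4
    msg = list('aababaaa')
    x = rans_encode(msg, freqs)
    assert rans_decode(x, len(msg), freqs) == msg
    ```

    The last-in-first-out behaviour visible in the decoder is exactly what makes the 'bits back' trick possible: decoding can be run on previously encoded bits to sample a latent.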

    Image Steganography with Dual Layer Security Using Fragment and Unite Technique

    Get PDF
    At a time when ever-increasing volumes of data are created in different forms, stored, and transferred, online security is a critical concern. Techniques such as cryptography, steganography, and digital watermarking are used to protect data. The proposed framework provides a dual layer of security by combining cryptography and steganography. Data is first encrypted using the AES encryption algorithm. The encrypted data is then embedded into the system's default image using the least-significant-bit (LSB) algorithm. The steganographed default image is then fragmented into uniform parts, which are reassembled in reverse sequence using the Uniform Fragment and Unite technique. This reversed steganographed image is finally hidden inside another image. The proposed framework thus addresses both safety and security.
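    As a rough illustration of the embedding and rearrangement steps (not the paper's exact implementation), the sketch below embeds a ciphertext's bits into the LSBs of a greyscale cover image, then performs one plausible reading of the Uniform Fragment and Unite step: splitting the image into uniform strips and reassembling them in reverse order. The image shape and payload are hypothetical, and the AES step is represented by a placeholder byte string.

    ```python
    import numpy as np

    def lsb_embed(cover, payload):
        """Write the bits of `payload` into the least significant bits of the cover."""
        bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
        flat = cover.flatten()                      # flatten() returns a copy
        assert bits.size <= flat.size, "payload too large for cover image"
        flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits
        return flat.reshape(cover.shape)

    def lsb_extract(stego, n_bytes):
        """Read n_bytes back out of the LSBs."""
        bits = stego.flatten()[:n_bytes * 8] & 1
        return np.packbits(bits).tobytes()

    def fragment_and_unite(img, parts=4):
        """Split into uniform horizontal strips, reassemble in reverse order
        (one plausible reading of the Uniform Fragment and Unite step)."""
        return np.vstack(np.array_split(img, parts, axis=0)[::-1])

    cover = np.random.default_rng(0).integers(0, 256, (64, 64), dtype=np.uint8)
    ciphertext = b"stand-in for AES output"        # real system: AES-encrypt first
    stego = fragment_and_unite(lsb_embed(cover, ciphertext))
    # Receiver undoes the fragment step (reversing equal strips twice restores
    # the original order), then extracts and AES-decrypts.
    restored = fragment_and_unite(stego)
    assert lsb_extract(restored, len(ciphertext)) == ciphertext
    ```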

    Streaming visualisation of quantitative mass spectrometry data based on a novel raw signal decomposition method

    Get PDF
    As data rates rise, there is a danger that informatics for high-throughput LC-MS becomes more opaque and inaccessible to practitioners. It is therefore critical that efficient visualisation tools are available to facilitate quality control, verification, validation, interpretation, and sharing of raw MS data and the results of MS analyses. Currently, MS data is stored as contiguous spectra. Recall of individual spectra is quick, but panoramas, zooming, and panning across whole datasets necessitate processing/memory overheads impractical for interactive use. Moreover, visualisation is challenging if significant quantification data is missing due to data-dependent acquisition of MS/MS spectra. To tackle these issues, we leverage our seaMass technique for novel signal decomposition: LC-MS data is modelled as a 2D surface through selection of a sparse set of weighted B-spline basis functions from an over-complete dictionary. By ordering and spatially partitioning the weights with an R-tree data model, efficient streaming visualisations are achieved. In this paper, we describe the core MS1 visualisation engine and the overlay of MS/MS annotations. This enables the mass spectrometrist to quickly inspect whole runs for ionisation/chromatographic issues, MS/MS precursors for coverage problems, or putative biomarkers for interferences, for example. The open-source software is available from http://seamass.net/viz/
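    To make the signal model concrete, the sketch below fits a smoothing B-spline to a noisy 1D intensity trace using SciPy's FITPACK wrappers, showing how a dense raw signal reduces to a much smaller set of knots and coefficients that can be cheaply re-evaluated at any zoom level. It is a one-dimensional toy on assumed data, not seaMass's 2D sparse dictionary selection or R-tree partitioning.

    ```python
    import numpy as np
    from scipy.interpolate import splrep, splev

    # Hypothetical m/z axis with one Gaussian-like peak plus noise.
    mz = np.linspace(400.0, 401.0, 500)
    truth = np.exp(-0.5 * ((mz - 400.5) / 0.01) ** 2)
    noisy = truth + 0.02 * np.random.default_rng(0).standard_normal(mz.size)

    # Smoothing spline: a larger `s` permits fewer knots, i.e. a sparser model.
    tck = splrep(mz, noisy, k=3, s=mz.size * 0.02**2)
    knots, coeffs, degree = tck
    print(f"{mz.size} raw samples -> {len(coeffs)} B-spline coefficients")

    recon = splev(mz, tck)  # cheap re-evaluation anywhere on the axis
    print("max deviation from the noise-free peak:", np.abs(recon - truth).max())
    ```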

    New Lossless Compression Method using Cyclic Reversible Low Contrast Mapping (CRLCM)

    Get PDF
    Compression methods are generally developed to reduce redundancy in data. This study takes a different approach: some bits of one datum in the image data are embedded into another datum using a Reversible Low Contrast Mapping (RLCM) transformation. Besides using RLCM for embedding, the method also exploits the properties of RLCM to compress each datum before it is embedded. The algorithm employs a queue and recursive indexing, and encodes the data in a cyclic manner. In contrast to RLCM itself, the proposed method is a coding method, like Huffman coding. This research uses publicly available image data to examine the proposed method. For all test images, the proposed method achieves a higher compression ratio than Huffman coding.
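    Since Huffman coding is the paper's baseline, a minimal Huffman code builder is sketched below for reference; the CRLCM transform itself is not reproduced here, as its details are specific to the paper. The sample input is illustrative.

    ```python
    # Minimal Huffman code construction (the paper's comparison baseline).
    import heapq
    from collections import Counter

    def huffman_code(data):
        """Return {symbol: bitstring} for the classic Huffman code of `data`."""
        heap = [(f, i, {s: ''}) for i, (s, f) in enumerate(Counter(data).items())]
        heapq.heapify(heap)
        tie = len(heap)  # unique tiebreaker so dicts are never compared
        while len(heap) > 1:
            f1, _, c1 = heapq.heappop(heap)
            f2, _, c2 = heapq.heappop(heap)
            merged = {s: '0' + b for s, b in c1.items()}
            merged.update({s: '1' + b for s, b in c2.items()})
            heapq.heappush(heap, (f1 + f2, tie, merged))
            tie += 1
        return heap[0][2]

    data = b"abracadabra"
    code = huffman_code(data)
    n_bits = sum(len(code[s]) for s in data)
    print(f"{len(data) * 8} bits raw -> {n_bits} bits Huffman-coded")
    ```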
    • …