
    Parallel Implementation of Lossy Data Compression for Temporal Data Sets

    Many scientific data sets contain a temporal dimension: they record values at the same spatial locations at different time stamps. Some of the largest temporal data sets are produced by parallel computing applications such as simulations of climate change and fluid dynamics. Temporal data sets can be very large and take a long time to transfer between storage locations. Data compression lets files transfer faster and occupy less storage space. NUMARCK is a lossy data compression algorithm for temporal data sets that learns the emerging distributions of element-wise change ratios along the temporal dimension and encodes them into an index table for a concise representation. This paper presents a parallel implementation of NUMARCK. Evaluated with six data sets obtained from climate and astrophysics simulations, parallel NUMARCK achieved scalable speedups of up to 8788 when running 12800 MPI processes on a parallel computer. We also compare the compression ratios against two lossy data compression algorithms, ISABELA and ZFP. The results show that NUMARCK achieved higher compression ratios than ISABELA and ZFP. Comment: 10 pages, HiPC 201
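
The change-ratio idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the equal-width binning, the function names `encode_change_ratios` and `decode_change_ratios`, and the parameters `num_bins` and `tol` are all assumptions made for illustration. Elements whose binned ratio misses the tolerance, or whose previous value is zero, are stored exactly.

```python
def encode_change_ratios(prev, curr, num_bins=256, tol=0.001):
    """Encode curr relative to prev as bin indices of element-wise change ratios.

    Hypothetical sketch: equal-width bins stand in for a learned distribution.
    """
    # Change ratio per element; None marks elements where prev is zero.
    ratios = [(c - p) / p if p != 0 else None for p, c in zip(prev, curr)]
    valid = [r for r in ratios if r is not None]
    lo = min(valid) if valid else 0.0
    hi = max(valid) if valid else 0.0
    span = hi - lo
    width = span / num_bins if span > 0 else 1.0
    # The index table: one representative ratio (bin center) per index.
    centers = ([lo + (i + 0.5) * width for i in range(num_bins)]
               if span > 0 else [lo])
    indices, exact = [], {}
    for i, r in enumerate(ratios):
        if r is not None:
            b = min(int((r - lo) / width), len(centers) - 1)
            if abs(centers[b] - r) <= tol:
                indices.append(b)
                continue
        # Fall back to storing the value exactly (index -1 is a sentinel).
        indices.append(-1)
        exact[i] = curr[i]
    return centers, indices, exact


def decode_change_ratios(prev, centers, indices, exact):
    """Rebuild the current time step from the previous one and the index table."""
    return [exact[i] if b < 0 else p * (1.0 + centers[b])
            for i, (p, b) in enumerate(zip(prev, indices))]


prev = [1.0, 2.0, 4.0, 0.0]
curr = [1.01, 2.02, 4.4, 3.0]
centers, indices, exact = encode_change_ratios(prev, curr)
decoded = decode_change_ratios(prev, centers, indices, exact)
```

In this sketch the error of each binned element is bounded by `tol` relative to the magnitude of the previous value, and only the small indices plus the table of bin centers need to be stored for those elements.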

    Optimizing Lossy Compression Rate-Distortion from Automatic Online Selection between SZ and ZFP

    With ever-increasing volumes of scientific data produced by HPC applications, significantly reducing data size is critical because of limited storage capacity and potential I/O or network bottlenecks when writing, reading, or transferring data. SZ and ZFP are the two leading lossy compressors available for scientific data sets. However, their performance is not consistent across different data sets or across different fields of some data sets: for some fields SZ provides better compression performance, while other fields are better compressed with ZFP. This situation raises the need for an automatic online (during compression) selection between SZ and ZFP, with minimal overhead. In this paper, the automatic selection optimizes the rate-distortion, an important statistical quality metric based on the signal-to-noise ratio. To optimize for rate-distortion, we investigate the principles of SZ and ZFP. We then propose an efficient online, low-overhead selection algorithm that accurately predicts the compression quality of the two compressors in early processing stages and selects the best-fit compressor for each data field. We implement the selection algorithm in an open-source library, and we evaluate the effectiveness of our proposed solution against plain SZ and ZFP in a parallel environment with 1,024 cores. Evaluation results on three data sets representing about 100 fields show that our selection algorithm improves the compression ratio by up to 70% at the same level of data distortion because of very accurate selection (around 99%) of the best-fit compressor, with little overhead (less than 7% in the experiments). Comment: 14 pages, 9 figures, first revisio

    Temporal Lossy In-Situ Compression for Computational Fluid Dynamics Simulations

    CFD simulations of molten metal carried out within SFB920 on the Taurus HPC cluster in Dresden produce very large amounts of data, and handling them severely slows the scientific workflow. First, transferring the data to visualization systems is very time-consuming. Second, interactive analysis of time-dependent processes is nearly impossible because of the memory bottleneck. For these reasons, this dissertation develops so-called temporal in-situ compression for scientific data directly within CFD simulations. New quantization methods compress the data to ~10% of its original size, with the decompressed data showing an error of at most 1%. In contrast to non-temporal compression, temporal compression compresses the difference between time steps in order to increase the compression ratio. Because the data volume is many times smaller, storage and transfer costs are reduced. Because compression, transfer, and decompression together run up to 4 times faster than the transfer of uncompressed data, the scientific workflow is accelerated.

    Feasibility and performances of compressed-sensing and sparse map-making with Herschel/PACS data

    The Herschel Space Observatory of ESA was launched in May 2009 and has been in operation since. From its distant orbit around L2 it needs to transmit a huge quantity of information through a very limited bandwidth. This is especially true for the PACS imaging camera, which needs to compress its data far more than can be achieved with lossless compression. This is currently solved by including lossy averaging and rounding steps on board. Recently, a new theory called compressed-sensing emerged from the statistics community. This theory makes use of the sparsity of natural (or astrophysical) images to optimize the acquisition scheme of the data needed to estimate those images. Thus, it can lead to high compression factors. A previous article by Bobin et al. (2008) showed how the new theory could be applied to simulated Herschel/PACS data to solve the compression requirement of the instrument. In this article, we show that compressed-sensing theory can indeed be successfully applied to actual Herschel/PACS data and gives significant improvements over the standard pipeline. In order to fully use the redundancy present in the data, we perform full sky map estimation and decompression at the same time, which cannot be done in most other compression methods. We also demonstrate that the various artifacts affecting the data (pink noise and glitches, whose behavior is a priori not well compatible with compressed-sensing) can be handled as well in this new framework. Finally, we compare the methods from the compressed-sensing scheme with data acquired with the standard compression scheme. We discuss improvements that can be made on the ground for the creation of sky maps from the data. Comment: 11 pages, 6 figures, 5 tables, peer-reviewed articl