Compression Methods for Structured Floating-Point Data and their Application in Climate Research
The use of new technologies, such as GPU boosters, has led to a dramatic increase in the computing power of High-Performance Computing (HPC) centres. This development, coupled with new climate models that can better utilise this computing power thanks to their software design and internal structure, has shifted the bottleneck from solving the differential equations describing Earth's atmospheric interactions to actually storing the variables. The current approach to the storage problem is inadequate: either the number of variables to be stored is limited or the temporal resolution of the output is reduced. If it is subsequently determined that another variable is required which has not been saved, the simulation must be run again. This thesis deals with the development of novel compression algorithms for structured floating-point data, such as climate data, so that they can be stored at full resolution.
Compression is performed by decorrelation and subsequent coding of the data. The decorrelation step eliminates redundant information in the data. During coding, the actual compression takes place and the data is written to disk. A lossy compression algorithm additionally has an approximation step that unifies the data for better coding: it reduces the complexity of the data for the subsequent coding step, e.g. by quantization. This work makes a new scientific contribution to each of the three steps described above.
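To make the three-step pipeline concrete, the following minimal Python sketch (not taken from the thesis) uses uniform quantization as the approximation step, first-order differencing as a stand-in decorrelator, and zlib as a generic coder; the step size and the synthetic data are illustrative assumptions.

```python
import numpy as np
import zlib

def compress(series: np.ndarray, step: float = 0.01) -> bytes:
    """Toy lossy pipeline: approximate -> decorrelate -> code."""
    # Approximation: uniform quantization reduces value complexity.
    q = np.round(series / step).astype(np.int64)
    # Decorrelation: first-order differencing removes redundancy
    # between neighbouring samples.
    residuals = np.diff(q, prepend=0)
    # Coding: entropy-code the (mostly small) residuals.
    return zlib.compress(residuals.tobytes())

def decompress(blob: bytes, step: float = 0.01) -> np.ndarray:
    residuals = np.frombuffer(zlib.decompress(blob), dtype=np.int64)
    q = np.cumsum(residuals)
    return q.astype(np.float64) * step

data = np.sin(np.linspace(0, 20, 10_000))
blob = compress(data)
print(len(blob), "compressed bytes for", data.nbytes, "raw bytes")
```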
This thesis presents a novel lossy compression method for time-series data that uses an Autoregressive Integrated Moving Average (ARIMA) model to decorrelate the data. In addition, the concept of information spaces and contexts is presented to exploit information across dimensions for decorrelation. Furthermore, a new coding scheme is described which reduces the weaknesses of the eXclusive-OR (XOR) difference calculation and achieves a better compression factor than current lossless compression methods for floating-point numbers. Finally, a modular framework is introduced that allows the creation of user-defined compression algorithms.
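As a rough illustration of ARIMA-based decorrelation (a sketch based only on the abstract, not the thesis's actual algorithm), one can store the residuals between a fitted model's one-step predictions and the true values; the model order and the synthetic series below are assumptions.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
t = np.linspace(0, 8 * np.pi, 2_000)
series = np.sin(t) + 0.05 * rng.standard_normal(t.size)  # stand-in for a climate variable

# Fit an ARIMA model; its one-step-ahead predictions act as the
# decorrelation stage, so only the (small) residuals need coding.
model = ARIMA(series, order=(2, 1, 2)).fit()
predictions = model.predict(start=0, end=series.size - 1)
residuals = series - predictions

# Residuals should carry far less energy than the raw signal,
# which is what makes the subsequent coding step effective.
print("signal variance:  ", series.var())
print("residual variance:", residuals.var())
```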
The experiments presented in this thesis show that it is possible to increase the information content of lossily compressed time-series data by applying an adaptive compression technique which preserves selected data with higher precision. Attempts at lossless compression of these time series were unsuccessful. However, the lossy ARIMA compression model proposed here is able to capture all relevant information. The reconstructed data can reproduce the time series to such an extent that statistically relevant information for the description of climate dynamics is preserved.
Experiments indicate that the compression factor depends significantly on the selected traversal sequence and the underlying data model. The influence of these structural dependencies on prediction-based compression methods is investigated in this thesis. For this purpose, the concept of Information Spaces (IS) is introduced. IS improves the predictions of the individual predictors by nearly 10% on average. Perhaps more importantly, the standard deviation of compression results is on average 20% lower, so using IS provides both better predictions and more consistent compression results.
Furthermore, it is shown that shifting the prediction and the true value leads to a better compression factor with minimal additional computational cost. This allows more resource-efficient prediction algorithms to achieve the same or a better compression factor, or higher throughput during compression or decompression. The coding scheme proposed here achieves a better compression factor than current state-of-the-art methods.
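The abstract does not detail the proposed scheme, but the baseline it improves on, XOR differencing of IEEE-754 bit patterns, can be sketched as follows. Close predictions yield residuals with long runs of leading zero bits; the known weakness is that small numeric differences can still flip high-order bits, e.g. across exponent boundaries. The values below are illustrative.

```python
import struct

def xor_residual(prediction: float, actual: float) -> int:
    """XOR the IEEE-754 bit patterns; a good prediction yields
    many leading zero bits, which an entropy coder exploits."""
    p, = struct.unpack("<Q", struct.pack("<d", prediction))
    a, = struct.unpack("<Q", struct.pack("<d", actual))
    return p ^ a

def reconstruct(prediction: float, residual: int) -> float:
    p, = struct.unpack("<Q", struct.pack("<d", prediction))
    value, = struct.unpack("<d", struct.pack("<Q", p ^ residual))
    return value

residual = xor_residual(20.214, 20.217)
print(f"{residual:064b}")               # long run of leading zeros
print(reconstruct(20.214, residual))    # 20.217 recovered exactly
```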
Finally, this thesis presents a modular framework for the development of compression algorithms. The framework supports the creation of user-defined predictors and offers functionalities such as the execution of benchmarks, the random subdivision of n-dimensional data, the quality evaluation of predictors, the creation of ensemble predictors, and the execution of validity tests for sequential and parallel compression algorithms.
This research was initiated by the needs of climate science, but the application of its contributions is not limited to that field. The results of this thesis are of major benefit for developing and improving any compression algorithm for structured floating-point data.
QARV: Quantization-Aware ResNet VAE for Lossy Image Compression
This paper addresses the problem of lossy image compression, a fundamental
problem in image processing and information theory that is involved in many
real-world applications. We start by reviewing the framework of variational
autoencoders (VAEs), a powerful class of generative probabilistic models that
has a deep connection to lossy compression. Based on VAEs, we develop a novel
scheme for lossy image compression, which we name quantization-aware ResNet VAE
(QARV). Our method incorporates a hierarchical VAE architecture integrated with
test-time quantization and quantization-aware training, without which efficient
entropy coding would not be possible. In addition, we design the neural network
architecture of QARV specifically for fast decoding and propose an adaptive
normalization operation for variable-rate compression. Extensive experiments
are conducted, and results show that QARV achieves variable-rate compression,
high-speed decoding, and a better rate-distortion performance than existing
baseline methods. The code of our method is publicly accessible at
https://github.com/duanzhiihao/lossy-vae
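The paper's exact architecture is in the linked repository; the PyTorch sketch below shows only the generic quantization-aware training trick common in learned compression, additive uniform noise during training and hard rounding at test time, which the abstract alludes to. It is not claimed to be QARV's precise mechanism.

```python
import torch

class Quantizer(torch.nn.Module):
    """Train-time: additive uniform noise approximates rounding so
    gradients can flow; test-time: hard rounding produces the
    integers an entropy coder needs. Sketched from the abstract,
    not from the QARV code."""
    def forward(self, z: torch.Tensor) -> torch.Tensor:
        if self.training:
            return z + torch.empty_like(z).uniform_(-0.5, 0.5)
        return torch.round(z)

q = Quantizer()
z = torch.randn(4, 8)
q.train()
print(q(z)[0, :4])   # noisy, differentiable surrogate
q.eval()
print(q(z)[0, :4])   # integers ready for entropy coding
```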
Lossless compression with latent variable models
We develop a simple and elegant method for lossless compression using latent variable models, which we call `bits back with asymmetric numeral systems' (BB-ANS). The method involves interleaving encode and decode steps, and achieves an optimal rate when compressing batches of data. We demonstrate it firstly on the MNIST test set, showing that state-of-the-art lossless compression is possible using a small variational autoencoder (VAE) model. We then make use of a novel empirical insight, that fully convolutional generative models, trained on small images, are able to generalize to images of arbitrary size, and extend BB-ANS to hierarchical latent variable models, enabling state-of-the-art lossless compression of full-size colour images from the ImageNet dataset. We describe `Craystack', a modular software framework which we have developed for rapid prototyping of compression using deep generative models.
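BB-ANS builds on asymmetric numeral systems; as an illustration of the ANS component alone (the bits-back interleaving with a latent variable model is omitted), a minimal unbounded-precision rANS coder for an assumed three-symbol alphabet looks like this:

```python
# Assumed frequencies for a toy 3-symbol alphabet; M is their total.
freq = {"a": 3, "b": 2, "c": 1}
cum = {"a": 0, "b": 3, "c": 5}
M = 6

def encode(symbols, x=1):
    # Each symbol folds into the integer state x; frequent symbols
    # grow the state less, which is where the compression comes from.
    for s in symbols:
        x = (x // freq[s]) * M + cum[s] + x % freq[s]
    return x

def decode(x, n):
    # Decoding pops symbols in reverse (ANS is stack-like),
    # so the output is reversed at the end.
    out = []
    for _ in range(n):
        slot = x % M
        s = next(k for k in freq if cum[k] <= slot < cum[k] + freq[k])
        out.append(s)
        x = freq[s] * (x // M) + slot - cum[s]
    return "".join(reversed(out)), x

state = encode("abacba")
message, _ = decode(state, 6)
print(state, message)  # decoded message matches the input
```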
Adapted generalized lifting schemes for scalable lossless image coding
Still image coding occasionally uses linear predictive coding together with multi-resolution decompositions, as may be found in several papers. Those related approaches do not take into account all the information available at the decoder in the prediction stage. In this paper, we introduce an adapted generalized lifting scheme in which the predictor is built upon two filters, which takes advantage of all this available information. With this structure included in a multi-resolution decomposition framework, we study two kinds of adaptation based on least squares estimation, under different assumptions: either global or local second-order stationarity of the image. The efficiency of these decompositions in lossless coding is shown on synthetic images, and their performance is compared with that of well-known codecs (S+P, JPEG-LS, JPEG 2000, CALIC) on actual images. Four families of images are distinguished: natural, MRI medical, satellite, and fingerprint textures. On natural and medical images, the performance of our codecs does not exceed that of the classical codecs. For satellite images and textures, however, they offer a slightly noticeable coding gain (about 0.05 to 0.08 bpp) over the other codecs that permit progressive coding in resolution, at the cost of a greater coding time.
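The adapted predictors in the paper are least-squares filters built at the decoder; for orientation only, here is the fixed-filter baseline they generalize: one predict/update level of the integer 5/3 lifting kernel known from lossless JPEG 2000 (periodic boundary handling is an assumption of this sketch).

```python
import numpy as np

def lifting_forward(x):
    """One level of the integer 5/3 lifting transform:
    split, predict odds from evens, update evens."""
    even, odd = x[0::2].copy(), x[1::2].copy()
    # Predict step: detail = odd minus a prediction from neighbours.
    odd -= (even + np.roll(even, -1)) // 2
    # Update step: smooth the evens so they form a valid low band.
    even += (odd + np.roll(odd, 1) + 2) // 4
    return even, odd

def lifting_inverse(even, odd):
    # Undo the steps in reverse order with the same integer filters.
    even = even - (odd + np.roll(odd, 1) + 2) // 4
    odd = odd + (even + np.roll(even, -1)) // 2
    x = np.empty(even.size + odd.size, dtype=even.dtype)
    x[0::2], x[1::2] = even, odd
    return x

x = np.arange(16, dtype=np.int64) ** 2
lo, hi = lifting_forward(x)
assert np.array_equal(lifting_inverse(lo, hi), x)  # perfectly invertible
```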
Perceptually-Driven Video Coding with the Daala Video Codec
The Daala project is a royalty-free video codec that attempts to compete with
the best patent-encumbered codecs. Part of our strategy is to replace core
tools of traditional video codecs with alternative approaches, many of them
designed to take perceptual aspects into account, rather than optimizing for
simple metrics like PSNR. This paper documents some of our experiences with
these tools, which ones worked and which did not. We evaluate which tools are
easy to integrate into a more traditional codec design, and show results in the
context of the codec being developed by the Alliance for Open Media.Comment: 19 pages, Proceedings of SPIE Workshop on Applications of Digital
Image Processing (ADIP), 201