229 research outputs found
Lossless compression with latent variable models
We develop a simple and elegant method for lossless compression using latent variable models, which we call `bits back with asymmetric numeral systems' (BB-ANS). The method involves interleaving encode and decode steps, and achieves an optimal rate when compressing batches of data. We demonstrate it rstly on the MNIST test set, showing that state-of-the-art lossless compression is possible using a small variational autoencoder (VAE) model. We then make use of a novel empirical insight, that fully convolutional generative models, trained on small images, are able to generalize to images of arbitrary size, and extend BB-ANS to hierarchical latent variable models, enabling state-of-the-art lossless compression of full-size colour images from the ImageNet dataset. We describe `Craystack', a modular software framework which we have developed for rapid prototyping of compression using deep generative models
Audio Coding Based on Integer Transforms
Die Audiocodierung hat sich in den letzten Jahren zu einem sehr
populären Forschungs- und Anwendungsgebiet entwickelt. Insbesondere
gehörangepasste Verfahren zur Audiocodierung, wie etwa MPEG-1 Layer-3
(MP3) oder MPEG-2 Advanced Audio Coding (AAC), werden häufig zur
effizienten Speicherung und Übertragung von Audiosignalen verwendet. Für
professionelle Anwendungen, wie etwa die Archivierung und Übertragung im
Studiobereich, ist hingegen eher eine verlustlose Audiocodierung angebracht.
Die bisherigen Ansätze für gehörangepasste und verlustlose
Audiocodierung sind technisch völlig verschieden. Moderne
gehörangepasste Audiocoder basieren meist auf Filterbänken, wie etwa der
überlappenden orthogonalen Transformation "Modifizierte Diskrete
Cosinus-Transformation" (MDCT). Verlustlose Audiocoder hingegen
verwenden meist prädiktive Codierung zur Redundanzreduktion. Nur wenige
Ansätze zur transformationsbasierten verlustlosen Audiocodierung wurden
bisher versucht.
Diese Arbeit präsentiert einen neuen Ansatz hierzu, der das
Lifting-Schema auf die in der gehörangepassten Audiocodierung
verwendeten überlappenden Transformationen anwendet. Dies ermöglicht
eine invertierbare Integer-Approximation der ursprünglichen
Transformation, z.B. die IntMDCT als Integer-Approximation der MDCT. Die
selbe Technik kann auch für Filterbänke mit niedriger Systemverzögerung
angewandt werden. Weiterhin ermöglichen ein neuer, mehrdimensionaler
Lifting-Ansatz und eine Technik zur Spektralformung von
Quantisierungsfehlern eine Verbesserung der Approximation der
ursprünglichen Transformation.
Basierend auf diesen neuen Integer-Transformationen werden in dieser
Arbeit neue Verfahren zur Audiocodierung vorgestellt. Die Verfahren
umfassen verlustlose Audiocodierung, eine skalierbare verlustlose
Erweiterung eines gehörangepassten Audiocoders und einen integrierten
Ansatz zur fein skalierbaren gehörangepassten und verlustlosen
Audiocodierung. Schließlich wird mit Hilfe der Integer-Transformationen
ein neuer Ansatz zur unhörbaren Einbettung von Daten mit hohen
Datenraten in unkomprimierte Audiosignale vorgestellt.In recent years audio coding has become a very popular field for
research and applications. Especially perceptual audio coding schemes,
such as MPEG-1 Layer-3 (MP3) and MPEG-2 Advanced Audio Coding (AAC), are
widely used for efficient storage and transmission of music
signals. Nevertheless, for professional applications, such as archiving
and transmission in studio environments, lossless audio coding schemes
are considered more appropriate.
Traditionally, the technical approaches used in perceptual and lossless
audio coding have been separate worlds. In perceptual audio coding, the
use of filter banks, such as the lapped orthogonal transform "Modified
Discrete Cosine Transform" (MDCT), has been the approach of choice being
used by many state of the art coding schemes. On the other hand,
lossless audio coding schemes mostly employ predictive coding of
waveforms to remove redundancy. Only few attempts have been made so far
to use transform coding for the purpose of lossless audio coding.
This work presents a new approach of applying the lifting scheme to
lapped transforms used in perceptual audio coding. This allows for an
invertible integer-to-integer approximation of the original transform,
e.g. the IntMDCT as an integer approximation of the MDCT. The same
technique can also be applied to low-delay filter banks. A generalized,
multi-dimensional lifting approach and a noise-shaping technique are
introduced, allowing to further optimize the accuracy of the
approximation to the original transform.
Based on these new integer transforms, this work presents new audio
coding schemes and applications. The audio coding applications cover
lossless audio coding, scalable lossless enhancement of a perceptual
audio coder and fine-grain scalable perceptual and lossless audio
coding. Finally an approach to data hiding with high data rates in
uncompressed audio signals based on integer transforms is described
UDC: Unified DNAS for Compressible TinyML Models
Deploying TinyML models on low-cost IoT hardware is very challenging, due to
limited device memory capacity. Neural processing unit (NPU) hardware address
the memory challenge by using model compression to exploit weight quantization
and sparsity to fit more parameters in the same footprint. However, designing
compressible neural networks (NNs) is challenging, as it expands the design
space across which we must make balanced trade-offs. This paper demonstrates
Unified DNAS for Compressible (UDC) NNs, which explores a large search space to
generate state-of-the-art compressible NNs for NPU. ImageNet results show UDC
networks are up to smaller (iso-accuracy) or 6.25% more accurate
(iso-model size) than previous work
The Alternating BWT: An algorithmic perspective
The Burrows-Wheeler Transform (BWT) is a word transformation introduced in 1994 for Data Compression. It has become a fundamental tool for designing self-indexing data structures, with important applications in several areas in science and engineering. The Alternating Burrows-Wheeler Transform (ABWT) is another transformation recently introduced in Gessel et al. (2012) [21] and studied in the field of Combinatorics on Words. It is analogous to the BWT, except that it uses an alternating lexicographical order instead of the usual one. Building on results in Giancarlo et al. (2018) [23], where we have shown that BWT and ABWT are part of a larger class of reversible transformations, here we provide a combinatorial and algorithmic study of the novel transform ABWT. We establish a deep analogy between BWT and ABWT by proving they are the only ones in the above mentioned class to be rank-invertible, a novel notion guaranteeing efficient invertibility. In addition, we show that the backward-search procedure can be efficiently generalized to the ABWT; this result implies that also the ABWT can be used as a basis for efficient compressed full text indices. Finally, we prove that the ABWT can be efficiently computed by using a combination of the Difference Cover suffix sorting algorithm (K\ue4rkk\ue4inen et al., 2006 [28]) with a linear time algorithm for finding the minimal cyclic rotation of a word with respect to the alternating lexicographical order
Audio Coding Based on Integer Transforms
Die Audiocodierung hat sich in den letzten Jahren zu einem sehr
populären Forschungs- und Anwendungsgebiet entwickelt. Insbesondere
gehörangepasste Verfahren zur Audiocodierung, wie etwa MPEG-1 Layer-3
(MP3) oder MPEG-2 Advanced Audio Coding (AAC), werden häufig zur
effizienten Speicherung und Übertragung von Audiosignalen verwendet. Für
professionelle Anwendungen, wie etwa die Archivierung und Übertragung im
Studiobereich, ist hingegen eher eine verlustlose Audiocodierung angebracht.
Die bisherigen Ansätze für gehörangepasste und verlustlose
Audiocodierung sind technisch völlig verschieden. Moderne
gehörangepasste Audiocoder basieren meist auf Filterbänken, wie etwa der
überlappenden orthogonalen Transformation "Modifizierte Diskrete
Cosinus-Transformation" (MDCT). Verlustlose Audiocoder hingegen
verwenden meist prädiktive Codierung zur Redundanzreduktion. Nur wenige
Ansätze zur transformationsbasierten verlustlosen Audiocodierung wurden
bisher versucht.
Diese Arbeit präsentiert einen neuen Ansatz hierzu, der das
Lifting-Schema auf die in der gehörangepassten Audiocodierung
verwendeten überlappenden Transformationen anwendet. Dies ermöglicht
eine invertierbare Integer-Approximation der ursprünglichen
Transformation, z.B. die IntMDCT als Integer-Approximation der MDCT. Die
selbe Technik kann auch für Filterbänke mit niedriger Systemverzögerung
angewandt werden. Weiterhin ermöglichen ein neuer, mehrdimensionaler
Lifting-Ansatz und eine Technik zur Spektralformung von
Quantisierungsfehlern eine Verbesserung der Approximation der
ursprünglichen Transformation.
Basierend auf diesen neuen Integer-Transformationen werden in dieser
Arbeit neue Verfahren zur Audiocodierung vorgestellt. Die Verfahren
umfassen verlustlose Audiocodierung, eine skalierbare verlustlose
Erweiterung eines gehörangepassten Audiocoders und einen integrierten
Ansatz zur fein skalierbaren gehörangepassten und verlustlosen
Audiocodierung. Schließlich wird mit Hilfe der Integer-Transformationen
ein neuer Ansatz zur unhörbaren Einbettung von Daten mit hohen
Datenraten in unkomprimierte Audiosignale vorgestellt.In recent years audio coding has become a very popular field for
research and applications. Especially perceptual audio coding schemes,
such as MPEG-1 Layer-3 (MP3) and MPEG-2 Advanced Audio Coding (AAC), are
widely used for efficient storage and transmission of music
signals. Nevertheless, for professional applications, such as archiving
and transmission in studio environments, lossless audio coding schemes
are considered more appropriate.
Traditionally, the technical approaches used in perceptual and lossless
audio coding have been separate worlds. In perceptual audio coding, the
use of filter banks, such as the lapped orthogonal transform "Modified
Discrete Cosine Transform" (MDCT), has been the approach of choice being
used by many state of the art coding schemes. On the other hand,
lossless audio coding schemes mostly employ predictive coding of
waveforms to remove redundancy. Only few attempts have been made so far
to use transform coding for the purpose of lossless audio coding.
This work presents a new approach of applying the lifting scheme to
lapped transforms used in perceptual audio coding. This allows for an
invertible integer-to-integer approximation of the original transform,
e.g. the IntMDCT as an integer approximation of the MDCT. The same
technique can also be applied to low-delay filter banks. A generalized,
multi-dimensional lifting approach and a noise-shaping technique are
introduced, allowing to further optimize the accuracy of the
approximation to the original transform.
Based on these new integer transforms, this work presents new audio
coding schemes and applications. The audio coding applications cover
lossless audio coding, scalable lossless enhancement of a perceptual
audio coder and fine-grain scalable perceptual and lossless audio
coding. Finally an approach to data hiding with high data rates in
uncompressed audio signals based on integer transforms is described
Remote Sensing Data Compression
A huge amount of data is acquired nowadays by different remote sensing systems installed on satellites, aircrafts, and UAV. The acquired data then have to be transferred to image processing centres, stored and/or delivered to customers. In restricted scenarios, data compression is strongly desired or necessary. A wide diversity of coding methods can be used, depending on the requirements and their priority. In addition, the types and properties of images differ a lot, thus, practical implementation aspects have to be taken into account. The Special Issue paper collection taken as basis of this book touches on all of the aforementioned items to some degree, giving the reader an opportunity to learn about recent developments and research directions in the field of image compression. In particular, lossless and near-lossless compression of multi- and hyperspectral images still remains current, since such images constitute data arrays that are of extremely large size with rich information that can be retrieved from them for various applications. Another important aspect is the impact of lossless compression on image classification and segmentation, where a reasonable compromise between the characteristics of compression and the final tasks of data processing has to be achieved. The problems of data transition from UAV-based acquisition platforms, as well as the use of FPGA and neural networks, have become very important. Finally, attempts to apply compressive sensing approaches in remote sensing image processing with positive outcomes are observed. We hope that readers will find our book useful and interestin
- …