843 research outputs found
Perceptually-Driven Video Coding with the Daala Video Codec
The Daala project is a royalty-free video codec that attempts to compete with
the best patent-encumbered codecs. Part of our strategy is to replace core
tools of traditional video codecs with alternative approaches, many of them
designed to take perceptual aspects into account, rather than optimizing for
simple metrics like PSNR. This paper documents some of our experiences with
these tools, which ones worked and which did not. We evaluate which tools are
easy to integrate into a more traditional codec design, and show results in the
context of the codec being developed by the Alliance for Open Media.Comment: 19 pages, Proceedings of SPIE Workshop on Applications of Digital
Image Processing (ADIP), 201
Audio Coding Based on Integer Transforms
Die Audiocodierung hat sich in den letzten Jahren zu einem sehr
populären Forschungs- und Anwendungsgebiet entwickelt. Insbesondere
gehörangepasste Verfahren zur Audiocodierung, wie etwa MPEG-1 Layer-3
(MP3) oder MPEG-2 Advanced Audio Coding (AAC), werden häufig zur
effizienten Speicherung und Übertragung von Audiosignalen verwendet. Für
professionelle Anwendungen, wie etwa die Archivierung und Übertragung im
Studiobereich, ist hingegen eher eine verlustlose Audiocodierung angebracht.
Die bisherigen Ansätze für gehörangepasste und verlustlose
Audiocodierung sind technisch völlig verschieden. Moderne
gehörangepasste Audiocoder basieren meist auf Filterbänken, wie etwa der
überlappenden orthogonalen Transformation "Modifizierte Diskrete
Cosinus-Transformation" (MDCT). Verlustlose Audiocoder hingegen
verwenden meist prädiktive Codierung zur Redundanzreduktion. Nur wenige
Ansätze zur transformationsbasierten verlustlosen Audiocodierung wurden
bisher versucht.
Diese Arbeit präsentiert einen neuen Ansatz hierzu, der das
Lifting-Schema auf die in der gehörangepassten Audiocodierung
verwendeten überlappenden Transformationen anwendet. Dies ermöglicht
eine invertierbare Integer-Approximation der ursprünglichen
Transformation, z.B. die IntMDCT als Integer-Approximation der MDCT. Die
selbe Technik kann auch für Filterbänke mit niedriger Systemverzögerung
angewandt werden. Weiterhin ermöglichen ein neuer, mehrdimensionaler
Lifting-Ansatz und eine Technik zur Spektralformung von
Quantisierungsfehlern eine Verbesserung der Approximation der
ursprünglichen Transformation.
Basierend auf diesen neuen Integer-Transformationen werden in dieser
Arbeit neue Verfahren zur Audiocodierung vorgestellt. Die Verfahren
umfassen verlustlose Audiocodierung, eine skalierbare verlustlose
Erweiterung eines gehörangepassten Audiocoders und einen integrierten
Ansatz zur fein skalierbaren gehörangepassten und verlustlosen
Audiocodierung. Schließlich wird mit Hilfe der Integer-Transformationen
ein neuer Ansatz zur unhörbaren Einbettung von Daten mit hohen
Datenraten in unkomprimierte Audiosignale vorgestellt.In recent years audio coding has become a very popular field for
research and applications. Especially perceptual audio coding schemes,
such as MPEG-1 Layer-3 (MP3) and MPEG-2 Advanced Audio Coding (AAC), are
widely used for efficient storage and transmission of music
signals. Nevertheless, for professional applications, such as archiving
and transmission in studio environments, lossless audio coding schemes
are considered more appropriate.
Traditionally, the technical approaches used in perceptual and lossless
audio coding have been separate worlds. In perceptual audio coding, the
use of filter banks, such as the lapped orthogonal transform "Modified
Discrete Cosine Transform" (MDCT), has been the approach of choice being
used by many state of the art coding schemes. On the other hand,
lossless audio coding schemes mostly employ predictive coding of
waveforms to remove redundancy. Only few attempts have been made so far
to use transform coding for the purpose of lossless audio coding.
This work presents a new approach of applying the lifting scheme to
lapped transforms used in perceptual audio coding. This allows for an
invertible integer-to-integer approximation of the original transform,
e.g. the IntMDCT as an integer approximation of the MDCT. The same
technique can also be applied to low-delay filter banks. A generalized,
multi-dimensional lifting approach and a noise-shaping technique are
introduced, allowing to further optimize the accuracy of the
approximation to the original transform.
Based on these new integer transforms, this work presents new audio
coding schemes and applications. The audio coding applications cover
lossless audio coding, scalable lossless enhancement of a perceptual
audio coder and fine-grain scalable perceptual and lossless audio
coding. Finally an approach to data hiding with high data rates in
uncompressed audio signals based on integer transforms is described
Audio Coding Based on Integer Transforms
Die Audiocodierung hat sich in den letzten Jahren zu einem sehr
populären Forschungs- und Anwendungsgebiet entwickelt. Insbesondere
gehörangepasste Verfahren zur Audiocodierung, wie etwa MPEG-1 Layer-3
(MP3) oder MPEG-2 Advanced Audio Coding (AAC), werden häufig zur
effizienten Speicherung und Übertragung von Audiosignalen verwendet. Für
professionelle Anwendungen, wie etwa die Archivierung und Übertragung im
Studiobereich, ist hingegen eher eine verlustlose Audiocodierung angebracht.
Die bisherigen Ansätze für gehörangepasste und verlustlose
Audiocodierung sind technisch völlig verschieden. Moderne
gehörangepasste Audiocoder basieren meist auf Filterbänken, wie etwa der
überlappenden orthogonalen Transformation "Modifizierte Diskrete
Cosinus-Transformation" (MDCT). Verlustlose Audiocoder hingegen
verwenden meist prädiktive Codierung zur Redundanzreduktion. Nur wenige
Ansätze zur transformationsbasierten verlustlosen Audiocodierung wurden
bisher versucht.
Diese Arbeit präsentiert einen neuen Ansatz hierzu, der das
Lifting-Schema auf die in der gehörangepassten Audiocodierung
verwendeten überlappenden Transformationen anwendet. Dies ermöglicht
eine invertierbare Integer-Approximation der ursprünglichen
Transformation, z.B. die IntMDCT als Integer-Approximation der MDCT. Die
selbe Technik kann auch für Filterbänke mit niedriger Systemverzögerung
angewandt werden. Weiterhin ermöglichen ein neuer, mehrdimensionaler
Lifting-Ansatz und eine Technik zur Spektralformung von
Quantisierungsfehlern eine Verbesserung der Approximation der
ursprünglichen Transformation.
Basierend auf diesen neuen Integer-Transformationen werden in dieser
Arbeit neue Verfahren zur Audiocodierung vorgestellt. Die Verfahren
umfassen verlustlose Audiocodierung, eine skalierbare verlustlose
Erweiterung eines gehörangepassten Audiocoders und einen integrierten
Ansatz zur fein skalierbaren gehörangepassten und verlustlosen
Audiocodierung. Schließlich wird mit Hilfe der Integer-Transformationen
ein neuer Ansatz zur unhörbaren Einbettung von Daten mit hohen
Datenraten in unkomprimierte Audiosignale vorgestellt.In recent years audio coding has become a very popular field for
research and applications. Especially perceptual audio coding schemes,
such as MPEG-1 Layer-3 (MP3) and MPEG-2 Advanced Audio Coding (AAC), are
widely used for efficient storage and transmission of music
signals. Nevertheless, for professional applications, such as archiving
and transmission in studio environments, lossless audio coding schemes
are considered more appropriate.
Traditionally, the technical approaches used in perceptual and lossless
audio coding have been separate worlds. In perceptual audio coding, the
use of filter banks, such as the lapped orthogonal transform "Modified
Discrete Cosine Transform" (MDCT), has been the approach of choice being
used by many state of the art coding schemes. On the other hand,
lossless audio coding schemes mostly employ predictive coding of
waveforms to remove redundancy. Only few attempts have been made so far
to use transform coding for the purpose of lossless audio coding.
This work presents a new approach of applying the lifting scheme to
lapped transforms used in perceptual audio coding. This allows for an
invertible integer-to-integer approximation of the original transform,
e.g. the IntMDCT as an integer approximation of the MDCT. The same
technique can also be applied to low-delay filter banks. A generalized,
multi-dimensional lifting approach and a noise-shaping technique are
introduced, allowing to further optimize the accuracy of the
approximation to the original transform.
Based on these new integer transforms, this work presents new audio
coding schemes and applications. The audio coding applications cover
lossless audio coding, scalable lossless enhancement of a perceptual
audio coder and fine-grain scalable perceptual and lossless audio
coding. Finally an approach to data hiding with high data rates in
uncompressed audio signals based on integer transforms is described
Scalable and perceptual audio compression
This thesis deals with scalable perceptual audio compression. Two scalable perceptual solutions as well as a scalable to lossless solution are proposed and investigated. One of the scalable perceptual solutions is built around sinusoidal modelling of the audio signal whilst the other is built on a transform coding paradigm. The scalable coders are shown to scale both in a waveform matching manner as well as a psychoacoustic manner. In order to measure the psychoacoustic scalability of the systems investigated in this thesis, the similarity between the original signal\u27s psychoacoustic parameters and that of the synthesized signal are compared. The psychoacoustic parameters used are loudness, sharpness, tonahty and roughness. This analysis technique is a novel method used in this thesis and it allows an insight into the perceptual distortion that has been introduced by any coder analyzed in this manner
Combined Industry, Space and Earth Science Data Compression Workshop
The sixth annual Space and Earth Science Data Compression Workshop and the third annual Data Compression Industry Workshop were held as a single combined workshop. The workshop was held April 4, 1996 in Snowbird, Utah in conjunction with the 1996 IEEE Data Compression Conference, which was held at the same location March 31 - April 3, 1996. The Space and Earth Science Data Compression sessions seek to explore opportunities for data compression to enhance the collection, analysis, and retrieval of space and earth science data. Of particular interest is data compression research that is integrated into, or has the potential to be integrated into, a particular space or earth science data information system. Preference is given to data compression research that takes into account the scien- tist's data requirements, and the constraints imposed by the data collection, transmission, distribution and archival systems
Audio Compression using a Modified Vector Quantization algorithm for Mastering Applications
Audio data compression is used to reduce the transmission bandwidth and storage requirements of audio data. It is the second stage in the audio mastering process with audio equalization being the first stage. Compression algorithms such as BSAC, MP3 and AAC are used as standards in this paper. The challenge faced in audio compression is compressing the signal at low bit rates. The previous algorithms which work well at low bit rates cannot be dominant at higher bit rates and vice-versa. This paper proposes an altered form of vector quantization algorithm which produces a scalable bit stream which has a number of fine layers of audio fidelity. This modified form of the vector quantization algorithm is used to generate a perceptually audio coder which is scalable and uses the quantization and encoding stages which are responsible for the psychoacoustic and arithmetical terminations that are actually detached as practically all the data detached during the prediction phases at the encoder side is supplemented towards the audio signal at decoder stage. Therefore, clearly the quantization phase which is modified to produce a bit stream which is scalable. This modified algorithm works well at both lower and higher bit rates. Subjective evaluations were done by audio professionals using the MUSHRA test and the mean normalized scores at various bit rates was noted and compared with the previous algorithms
3D Medical Image Lossless Compressor Using Deep Learning Approaches
The ever-increasing importance of accelerated information processing, communica-tion, and storing are major requirements within the big-data era revolution. With the extensive rise in data availability, handy information acquisition, and growing data rate, a critical challenge emerges in efficient handling. Even with advanced technical hardware developments and multiple Graphics Processing Units (GPUs) availability, this demand is still highly promoted to utilise these technologies effectively. Health-care systems are one of the domains yielding explosive data growth. Especially when considering their modern scanners abilities, which annually produce higher-resolution and more densely sampled medical images, with increasing requirements for massive storage capacity. The bottleneck in data transmission and storage would essentially be handled with an effective compression method. Since medical information is critical and imposes an influential role in diagnosis accuracy, it is strongly encouraged to guarantee exact reconstruction with no loss in quality, which is the main objective of any lossless compression algorithm. Given the revolutionary impact of Deep Learning (DL) methods in solving many tasks while achieving the state of the art results, includ-ing data compression, this opens tremendous opportunities for contributions. While considerable efforts have been made to address lossy performance using learning-based approaches, less attention was paid to address lossless compression. This PhD thesis investigates and proposes novel learning-based approaches for compressing 3D medical images losslessly.Firstly, we formulate the lossless compression task as a supervised sequential prediction problem, whereby a model learns a projection function to predict a target voxel given sequence of samples from its spatially surrounding voxels. Using such 3D local sampling information efficiently exploits spatial similarities and redundancies in a volumetric medical context by utilising such a prediction paradigm. The proposed NN-based data predictor is trained to minimise the differences with the original data values while the residual errors are encoded using arithmetic coding to allow lossless reconstruction.Following this, we explore the effectiveness of Recurrent Neural Networks (RNNs) as a 3D predictor for learning the mapping function from the spatial medical domain (16 bit-depths). We analyse Long Short-Term Memory (LSTM) models’ generalisabil-ity and robustness in capturing the 3D spatial dependencies of a voxel’s neighbourhood while utilising samples taken from various scanning settings. We evaluate our proposed MedZip models in compressing unseen Computerized Tomography (CT) and Magnetic Resonance Imaging (MRI) modalities losslessly, compared to other state-of-the-art lossless compression standards.This work investigates input configurations and sampling schemes for a many-to-one sequence prediction model, specifically for compressing 3D medical images (16 bit-depths) losslessly. The main objective is to determine the optimal practice for enabling the proposed LSTM model to achieve a high compression ratio and fast encoding-decoding performance. A solution for a non-deterministic environments problem was also proposed, allowing models to run in parallel form without much compression performance drop. Compared to well-known lossless codecs, experimental evaluations were carried out on datasets acquired by different hospitals, representing different body segments, and have distinct scanning modalities (i.e. CT and MRI).To conclude, we present a novel data-driven sampling scheme utilising weighted gradient scores for training LSTM prediction-based models. The objective is to determine whether some training samples are significantly more informative than others, specifically in medical domains where samples are available on a scale of billions. The effectiveness of models trained on the presented importance sampling scheme was evaluated compared to alternative strategies such as uniform, Gaussian, and sliced-based sampling
Language Modeling Is Compression
It has long been established that predictive models can be transformed into
lossless compressors and vice versa. Incidentally, in recent years, the machine
learning community has focused on training increasingly large and powerful
self-supervised (language) models. Since these large language models exhibit
impressive predictive capabilities, they are well-positioned to be strong
compressors. In this work, we advocate for viewing the prediction problem
through the lens of compression and evaluate the compression capabilities of
large (foundation) models. We show that large language models are powerful
general-purpose predictors and that the compression viewpoint provides novel
insights into scaling laws, tokenization, and in-context learning. For example,
Chinchilla 70B, while trained primarily on text, compresses ImageNet patches to
43.4% and LibriSpeech samples to 16.4% of their raw size, beating
domain-specific compressors like PNG (58.5%) or FLAC (30.3%), respectively.
Finally, we show that the prediction-compression equivalence allows us to use
any compressor (like gzip) to build a conditional generative model
Context-based bit plane golomb coder for scalable image coding
Master'sMASTER OF ENGINEERIN
- …