328 research outputs found
Coding overcomplete representations of audio using the MCLT
We propose a system for audio coding using the modulated complex
lapped transform (MCLT). In general, it is difficult to encode signals using
overcomplete representations without avoiding a penalty in rate-distortion
performance. We show that the penalty can be significantly reduced for
MCLT-based representations, without the need for iterative methods of
sparsity reduction. We achieve that via a magnitude-phase polar quantization
and the use of magnitude and phase prediction. Compared to systems based
on quantization of orthogonal representations such as the modulated lapped
transform (MLT), the new system allows for reduced warbling artifacts and
more precise computation of frequency-domain auditory masking functions
Efficiency in audio processing : filter banks and transcoding
Audio transcoding is the conversion of digital audio from one compressed form A to another compressed form B, where A and B have different compression properties, such as a different bit-rate, sampling frequency or compression method. This is typically achieved by decoding A to an intermediate uncompressed form, and then encoding it to B. A significant portion of the involved computational effort pertains to operating the synthesis filter bank, which is an important processing block in the decoding stage, and the analysis filter bank, which is an important processing block in the encoding stage. This thesis presents methods for efficient implementations of filter banks and audio transcoders, and is separated into two main parts. In the first part, a new class of Frequency Response Masking (FRM) filter banks is introduced. These filter banks are usually characterized by comprising a tree-structured cascade of subfilters, which have small individual filter lengths. Methods of complexity reduction are proposed for the scenarios when the filter banks are operated in single-rate mode, and when they are operated in multirate mode; and for the scenarios when the input signal is real-valued, and when it is complex-valued. An efficient variable bandwidth FRM filter bank is designed by using signed-powers-of-two reduction of its subfilter coefficients. Our design has a complexity an order lower than that of an octave filter bank with the same specifications. In the second part, the audio transcoding process is analyzed. Audio transcoding is modeled as a cascaded quantization process, and the cascaded quantization of an input signal is analyzed under different conditions, for the MPEG 1 Layer 2 and MP3 compression methods. One condition is the input-to-output delay of the transcoder, which is known to have an impact on the audio quality of the transcoded material. Methods to reduce the error in a cascaded quantization process are also proposed. An ultra-fast MP3 transcoder that requires only integer operations is proposed and implemented in software. Our implementation shows an improvement by a factor of 5 to 16 over other best known transcoders in terms of execution speed
Audio Coding Based on Integer Transforms
Die Audiocodierung hat sich in den letzten Jahren zu einem sehr
populÀren Forschungs- und Anwendungsgebiet entwickelt. Insbesondere
gehörangepasste Verfahren zur Audiocodierung, wie etwa MPEG-1 Layer-3
(MP3) oder MPEG-2 Advanced Audio Coding (AAC), werden hÀufig zur
effizienten Speicherung und Ăbertragung von Audiosignalen verwendet. FĂŒr
professionelle Anwendungen, wie etwa die Archivierung und Ăbertragung im
Studiobereich, ist hingegen eher eine verlustlose Audiocodierung angebracht.
Die bisherigen AnsĂ€tze fĂŒr gehörangepasste und verlustlose
Audiocodierung sind technisch völlig verschieden. Moderne
gehörangepasste Audiocoder basieren meist auf FilterbÀnken, wie etwa der
ĂŒberlappenden orthogonalen Transformation "Modifizierte Diskrete
Cosinus-Transformation" (MDCT). Verlustlose Audiocoder hingegen
verwenden meist prÀdiktive Codierung zur Redundanzreduktion. Nur wenige
AnsÀtze zur transformationsbasierten verlustlosen Audiocodierung wurden
bisher versucht.
Diese Arbeit prÀsentiert einen neuen Ansatz hierzu, der das
Lifting-Schema auf die in der gehörangepassten Audiocodierung
verwendeten ĂŒberlappenden Transformationen anwendet. Dies ermöglicht
eine invertierbare Integer-Approximation der ursprĂŒnglichen
Transformation, z.B. die IntMDCT als Integer-Approximation der MDCT. Die
selbe Technik kann auch fĂŒr FilterbĂ€nke mit niedriger Systemverzögerung
angewandt werden. Weiterhin ermöglichen ein neuer, mehrdimensionaler
Lifting-Ansatz und eine Technik zur Spektralformung von
Quantisierungsfehlern eine Verbesserung der Approximation der
ursprĂŒnglichen Transformation.
Basierend auf diesen neuen Integer-Transformationen werden in dieser
Arbeit neue Verfahren zur Audiocodierung vorgestellt. Die Verfahren
umfassen verlustlose Audiocodierung, eine skalierbare verlustlose
Erweiterung eines gehörangepassten Audiocoders und einen integrierten
Ansatz zur fein skalierbaren gehörangepassten und verlustlosen
Audiocodierung. SchlieĂlich wird mit Hilfe der Integer-Transformationen
ein neuer Ansatz zur unhörbaren Einbettung von Daten mit hohen
Datenraten in unkomprimierte Audiosignale vorgestellt.In recent years audio coding has become a very popular field for
research and applications. Especially perceptual audio coding schemes,
such as MPEG-1 Layer-3 (MP3) and MPEG-2 Advanced Audio Coding (AAC), are
widely used for efficient storage and transmission of music
signals. Nevertheless, for professional applications, such as archiving
and transmission in studio environments, lossless audio coding schemes
are considered more appropriate.
Traditionally, the technical approaches used in perceptual and lossless
audio coding have been separate worlds. In perceptual audio coding, the
use of filter banks, such as the lapped orthogonal transform "Modified
Discrete Cosine Transform" (MDCT), has been the approach of choice being
used by many state of the art coding schemes. On the other hand,
lossless audio coding schemes mostly employ predictive coding of
waveforms to remove redundancy. Only few attempts have been made so far
to use transform coding for the purpose of lossless audio coding.
This work presents a new approach of applying the lifting scheme to
lapped transforms used in perceptual audio coding. This allows for an
invertible integer-to-integer approximation of the original transform,
e.g. the IntMDCT as an integer approximation of the MDCT. The same
technique can also be applied to low-delay filter banks. A generalized,
multi-dimensional lifting approach and a noise-shaping technique are
introduced, allowing to further optimize the accuracy of the
approximation to the original transform.
Based on these new integer transforms, this work presents new audio
coding schemes and applications. The audio coding applications cover
lossless audio coding, scalable lossless enhancement of a perceptual
audio coder and fine-grain scalable perceptual and lossless audio
coding. Finally an approach to data hiding with high data rates in
uncompressed audio signals based on integer transforms is described
Audio Coding Based on Integer Transforms
Die Audiocodierung hat sich in den letzten Jahren zu einem sehr
populÀren Forschungs- und Anwendungsgebiet entwickelt. Insbesondere
gehörangepasste Verfahren zur Audiocodierung, wie etwa MPEG-1 Layer-3
(MP3) oder MPEG-2 Advanced Audio Coding (AAC), werden hÀufig zur
effizienten Speicherung und Ăbertragung von Audiosignalen verwendet. FĂŒr
professionelle Anwendungen, wie etwa die Archivierung und Ăbertragung im
Studiobereich, ist hingegen eher eine verlustlose Audiocodierung angebracht.
Die bisherigen AnsĂ€tze fĂŒr gehörangepasste und verlustlose
Audiocodierung sind technisch völlig verschieden. Moderne
gehörangepasste Audiocoder basieren meist auf FilterbÀnken, wie etwa der
ĂŒberlappenden orthogonalen Transformation "Modifizierte Diskrete
Cosinus-Transformation" (MDCT). Verlustlose Audiocoder hingegen
verwenden meist prÀdiktive Codierung zur Redundanzreduktion. Nur wenige
AnsÀtze zur transformationsbasierten verlustlosen Audiocodierung wurden
bisher versucht.
Diese Arbeit prÀsentiert einen neuen Ansatz hierzu, der das
Lifting-Schema auf die in der gehörangepassten Audiocodierung
verwendeten ĂŒberlappenden Transformationen anwendet. Dies ermöglicht
eine invertierbare Integer-Approximation der ursprĂŒnglichen
Transformation, z.B. die IntMDCT als Integer-Approximation der MDCT. Die
selbe Technik kann auch fĂŒr FilterbĂ€nke mit niedriger Systemverzögerung
angewandt werden. Weiterhin ermöglichen ein neuer, mehrdimensionaler
Lifting-Ansatz und eine Technik zur Spektralformung von
Quantisierungsfehlern eine Verbesserung der Approximation der
ursprĂŒnglichen Transformation.
Basierend auf diesen neuen Integer-Transformationen werden in dieser
Arbeit neue Verfahren zur Audiocodierung vorgestellt. Die Verfahren
umfassen verlustlose Audiocodierung, eine skalierbare verlustlose
Erweiterung eines gehörangepassten Audiocoders und einen integrierten
Ansatz zur fein skalierbaren gehörangepassten und verlustlosen
Audiocodierung. SchlieĂlich wird mit Hilfe der Integer-Transformationen
ein neuer Ansatz zur unhörbaren Einbettung von Daten mit hohen
Datenraten in unkomprimierte Audiosignale vorgestellt.In recent years audio coding has become a very popular field for
research and applications. Especially perceptual audio coding schemes,
such as MPEG-1 Layer-3 (MP3) and MPEG-2 Advanced Audio Coding (AAC), are
widely used for efficient storage and transmission of music
signals. Nevertheless, for professional applications, such as archiving
and transmission in studio environments, lossless audio coding schemes
are considered more appropriate.
Traditionally, the technical approaches used in perceptual and lossless
audio coding have been separate worlds. In perceptual audio coding, the
use of filter banks, such as the lapped orthogonal transform "Modified
Discrete Cosine Transform" (MDCT), has been the approach of choice being
used by many state of the art coding schemes. On the other hand,
lossless audio coding schemes mostly employ predictive coding of
waveforms to remove redundancy. Only few attempts have been made so far
to use transform coding for the purpose of lossless audio coding.
This work presents a new approach of applying the lifting scheme to
lapped transforms used in perceptual audio coding. This allows for an
invertible integer-to-integer approximation of the original transform,
e.g. the IntMDCT as an integer approximation of the MDCT. The same
technique can also be applied to low-delay filter banks. A generalized,
multi-dimensional lifting approach and a noise-shaping technique are
introduced, allowing to further optimize the accuracy of the
approximation to the original transform.
Based on these new integer transforms, this work presents new audio
coding schemes and applications. The audio coding applications cover
lossless audio coding, scalable lossless enhancement of a perceptual
audio coder and fine-grain scalable perceptual and lossless audio
coding. Finally an approach to data hiding with high data rates in
uncompressed audio signals based on integer transforms is described
Detection and localization of double compression in MP3 audio tracks
In this work, by exploiting the traces left by double compression in the statistics of quantized modified discrete cosine transform coefficients, a single measure has been derived that allows to decide whether an MP3 file is singly or doubly compressed and, in the last case, to devise also the bit-rate of the first compression. Moreover, the proposed method as well as two state-of-the-art methods have been applied to analyze short temporal windows of the track, allowing the localization of possible tampered portions in the MP3 file under analysis. Experiments confirm the good performance of the proposed scheme and demonstrate that current detection methods are useful for tampering localization, thus offering a new tool for the forensic analysis of MP3 audio tracks
Pixinwav: Residual steganography for hiding pixels in audio
Steganography comprises the mechanics of hiding data in a host media that may be publicly available. While previous works focused on unimodal setups (e.g., hiding images in images, or hiding audio in audio), PixInWav targets the multimodal case of hiding images in audio. To this end, we propose a novel residual architecture operating on top of short-time discrete cosine transform (STDCT) audio spectrograms. Among our results, we find that the residual steganography setup we propose allows an encoding of the hidden image that is independent from the host audio without compromising quality. Accordingly, while previous works require both host and hidden signals to hide a signal, PixInWav can encode images offlineâwhich can be later hidden, in a residual fashion, into any audio signal.Work partially supported by the European Union through the Erasmus+ student mobility program, Science Foundation Ireland (SFI) under grant numbers SFI/15/SIRG/3283 and SFI/12/RC/2289 P2, and the Spanish Research Agency (AEI) under project PID2020117142GB-I00 of the call MCIN/ AEI /10.13039/501100011033.Peer ReviewedPostprint (author's final draft
Design of digital IP block for discrete cosine transform
Tato diplomovĂĄ prĂĄce se zabĂœvĂĄ nĂĄvrhem IP bloku pro diskrĂ©tnĂ kosinovou transformaci. V~teoretickĂ© ÄĂĄsti jsou shrnuty algoritmy pro vĂœpoÄet diskrĂ©tnĂ kosinovĂ© transformace a diskutovĂĄna jejich pouĆŸitelnost v~hardwaru. ZvolenĂœ algoritmus pro hardwarovou implementaci je modelovĂĄn v jazyce C. PotĂ© je popsĂĄn na RTL Ășrovni, verifikovĂĄn a je provedena syntĂ©za v~technologii TSMC 65 nm. HardwarovĂĄ implementace je potĂ© zhodnocena s ohledem na datovou propustnost, plochu, rychlost and spotĆebu.This diploma thesis deals with design of IP block for discrete cosine transform. Theoretical part summarizes algorithms for computation of discrete cosine transform and their hardware usability is discussed. Chosen algorithm for hardware implementation is modeled in C language. Algorithm is described at RTL level, verified and synthesized to TSMC 65 nm technology. Hardware implementation is then evaluated with respect of throughput, area, speed and power consumption.
- âŠ