599 research outputs found
Lexicographic Bit Allocation for MPEG Video
We consider the problem of allocating bits among pictures in an MPEG video coder to equalize
the visual quality of the coded pictures, while meeting bu er and channel constraints imposed by
the MPEG Video Bu ering Veri er. We address this problem within a framework that consists of
three components: 1) a bit production model for the input pictures, 2) a set of bit-rate constraints
imposed by the Video Bu ering Veri er, and 3) a novel lexicographic criterion for optimality.
Under this framework, we derive simple necessary and su cient conditions for optimality that lead
to e cient algorithms
Audio Coding Based on Integer Transforms
Die Audiocodierung hat sich in den letzten Jahren zu einem sehr
populären Forschungs- und Anwendungsgebiet entwickelt. Insbesondere
gehörangepasste Verfahren zur Audiocodierung, wie etwa MPEG-1 Layer-3
(MP3) oder MPEG-2 Advanced Audio Coding (AAC), werden häufig zur
effizienten Speicherung und Ăśbertragung von Audiosignalen verwendet. FĂĽr
professionelle Anwendungen, wie etwa die Archivierung und Ăśbertragung im
Studiobereich, ist hingegen eher eine verlustlose Audiocodierung angebracht.
Die bisherigen Ansätze für gehörangepasste und verlustlose
Audiocodierung sind technisch völlig verschieden. Moderne
gehörangepasste Audiocoder basieren meist auf Filterbänken, wie etwa der
ĂĽberlappenden orthogonalen Transformation "Modifizierte Diskrete
Cosinus-Transformation" (MDCT). Verlustlose Audiocoder hingegen
verwenden meist prädiktive Codierung zur Redundanzreduktion. Nur wenige
Ansätze zur transformationsbasierten verlustlosen Audiocodierung wurden
bisher versucht.
Diese Arbeit präsentiert einen neuen Ansatz hierzu, der das
Lifting-Schema auf die in der gehörangepassten Audiocodierung
verwendeten überlappenden Transformationen anwendet. Dies ermöglicht
eine invertierbare Integer-Approximation der ursprĂĽnglichen
Transformation, z.B. die IntMDCT als Integer-Approximation der MDCT. Die
selbe Technik kann auch für Filterbänke mit niedriger Systemverzögerung
angewandt werden. Weiterhin ermöglichen ein neuer, mehrdimensionaler
Lifting-Ansatz und eine Technik zur Spektralformung von
Quantisierungsfehlern eine Verbesserung der Approximation der
ursprĂĽnglichen Transformation.
Basierend auf diesen neuen Integer-Transformationen werden in dieser
Arbeit neue Verfahren zur Audiocodierung vorgestellt. Die Verfahren
umfassen verlustlose Audiocodierung, eine skalierbare verlustlose
Erweiterung eines gehörangepassten Audiocoders und einen integrierten
Ansatz zur fein skalierbaren gehörangepassten und verlustlosen
Audiocodierung. SchlieĂźlich wird mit Hilfe der Integer-Transformationen
ein neuer Ansatz zur unhörbaren Einbettung von Daten mit hohen
Datenraten in unkomprimierte Audiosignale vorgestellt.In recent years audio coding has become a very popular field for
research and applications. Especially perceptual audio coding schemes,
such as MPEG-1 Layer-3 (MP3) and MPEG-2 Advanced Audio Coding (AAC), are
widely used for efficient storage and transmission of music
signals. Nevertheless, for professional applications, such as archiving
and transmission in studio environments, lossless audio coding schemes
are considered more appropriate.
Traditionally, the technical approaches used in perceptual and lossless
audio coding have been separate worlds. In perceptual audio coding, the
use of filter banks, such as the lapped orthogonal transform "Modified
Discrete Cosine Transform" (MDCT), has been the approach of choice being
used by many state of the art coding schemes. On the other hand,
lossless audio coding schemes mostly employ predictive coding of
waveforms to remove redundancy. Only few attempts have been made so far
to use transform coding for the purpose of lossless audio coding.
This work presents a new approach of applying the lifting scheme to
lapped transforms used in perceptual audio coding. This allows for an
invertible integer-to-integer approximation of the original transform,
e.g. the IntMDCT as an integer approximation of the MDCT. The same
technique can also be applied to low-delay filter banks. A generalized,
multi-dimensional lifting approach and a noise-shaping technique are
introduced, allowing to further optimize the accuracy of the
approximation to the original transform.
Based on these new integer transforms, this work presents new audio
coding schemes and applications. The audio coding applications cover
lossless audio coding, scalable lossless enhancement of a perceptual
audio coder and fine-grain scalable perceptual and lossless audio
coding. Finally an approach to data hiding with high data rates in
uncompressed audio signals based on integer transforms is described
Audio Coding Based on Integer Transforms
Die Audiocodierung hat sich in den letzten Jahren zu einem sehr
populären Forschungs- und Anwendungsgebiet entwickelt. Insbesondere
gehörangepasste Verfahren zur Audiocodierung, wie etwa MPEG-1 Layer-3
(MP3) oder MPEG-2 Advanced Audio Coding (AAC), werden häufig zur
effizienten Speicherung und Ăśbertragung von Audiosignalen verwendet. FĂĽr
professionelle Anwendungen, wie etwa die Archivierung und Ăśbertragung im
Studiobereich, ist hingegen eher eine verlustlose Audiocodierung angebracht.
Die bisherigen Ansätze für gehörangepasste und verlustlose
Audiocodierung sind technisch völlig verschieden. Moderne
gehörangepasste Audiocoder basieren meist auf Filterbänken, wie etwa der
ĂĽberlappenden orthogonalen Transformation "Modifizierte Diskrete
Cosinus-Transformation" (MDCT). Verlustlose Audiocoder hingegen
verwenden meist prädiktive Codierung zur Redundanzreduktion. Nur wenige
Ansätze zur transformationsbasierten verlustlosen Audiocodierung wurden
bisher versucht.
Diese Arbeit präsentiert einen neuen Ansatz hierzu, der das
Lifting-Schema auf die in der gehörangepassten Audiocodierung
verwendeten überlappenden Transformationen anwendet. Dies ermöglicht
eine invertierbare Integer-Approximation der ursprĂĽnglichen
Transformation, z.B. die IntMDCT als Integer-Approximation der MDCT. Die
selbe Technik kann auch für Filterbänke mit niedriger Systemverzögerung
angewandt werden. Weiterhin ermöglichen ein neuer, mehrdimensionaler
Lifting-Ansatz und eine Technik zur Spektralformung von
Quantisierungsfehlern eine Verbesserung der Approximation der
ursprĂĽnglichen Transformation.
Basierend auf diesen neuen Integer-Transformationen werden in dieser
Arbeit neue Verfahren zur Audiocodierung vorgestellt. Die Verfahren
umfassen verlustlose Audiocodierung, eine skalierbare verlustlose
Erweiterung eines gehörangepassten Audiocoders und einen integrierten
Ansatz zur fein skalierbaren gehörangepassten und verlustlosen
Audiocodierung. SchlieĂźlich wird mit Hilfe der Integer-Transformationen
ein neuer Ansatz zur unhörbaren Einbettung von Daten mit hohen
Datenraten in unkomprimierte Audiosignale vorgestellt.In recent years audio coding has become a very popular field for
research and applications. Especially perceptual audio coding schemes,
such as MPEG-1 Layer-3 (MP3) and MPEG-2 Advanced Audio Coding (AAC), are
widely used for efficient storage and transmission of music
signals. Nevertheless, for professional applications, such as archiving
and transmission in studio environments, lossless audio coding schemes
are considered more appropriate.
Traditionally, the technical approaches used in perceptual and lossless
audio coding have been separate worlds. In perceptual audio coding, the
use of filter banks, such as the lapped orthogonal transform "Modified
Discrete Cosine Transform" (MDCT), has been the approach of choice being
used by many state of the art coding schemes. On the other hand,
lossless audio coding schemes mostly employ predictive coding of
waveforms to remove redundancy. Only few attempts have been made so far
to use transform coding for the purpose of lossless audio coding.
This work presents a new approach of applying the lifting scheme to
lapped transforms used in perceptual audio coding. This allows for an
invertible integer-to-integer approximation of the original transform,
e.g. the IntMDCT as an integer approximation of the MDCT. The same
technique can also be applied to low-delay filter banks. A generalized,
multi-dimensional lifting approach and a noise-shaping technique are
introduced, allowing to further optimize the accuracy of the
approximation to the original transform.
Based on these new integer transforms, this work presents new audio
coding schemes and applications. The audio coding applications cover
lossless audio coding, scalable lossless enhancement of a perceptual
audio coder and fine-grain scalable perceptual and lossless audio
coding. Finally an approach to data hiding with high data rates in
uncompressed audio signals based on integer transforms is described
Frequency-warped autoregressive modeling and filtering
This thesis consists of an introduction and nine articles. The articles are related to the application of frequency-warping techniques to audio signal processing, and in particular, predictive coding of wideband audio signals. The introduction reviews the literature and summarizes the results of the articles.
Frequency-warping, or simply warping techniques are based on a modification of a conventional signal processing system so that the inherent frequency representation in the system is changed. It is demonstrated that this may be done for basically all traditional signal processing algorithms. In audio applications it is beneficial to modify the system so that the new frequency representation is close to that of human hearing. One of the articles is a tutorial paper on the use of warping techniques in audio applications.
Majority of the articles studies warped linear prediction, WLP, and its use in wideband audio coding. It is proposed that warped linear prediction would be particularly attractive method for low-delay wideband audio coding. Warping techniques are also applied to various modifications of classical linear predictive coding techniques. This was made possible partly by the introduction of a class of new implementation techniques for recursive filters in one of the articles. The proposed implementation algorithm for recursive filters having delay-free loops is a generic technique. This inspired to write an article which introduces a generalized warped linear predictive coding scheme. One example of the generalized approach is a linear predictive algorithm using almost logarithmic frequency representation.reviewe
Recommended from our members
Error resilient video transcoding for robust inter-network communications using GPRS
A novel fully comprehensive mobile video communications
system is proposed in this paper. This system exploits
the useful rate management features of the video transcoders and
combines them with error resilience for transmissions of coded
video streams over general packet radio service (GPRS) mobileaccess
networks. The error-resilient video transcoding operation
takes place at a centralized point, referred to as a video proxy,
which provides the necessary output transmission rates with the
required amount of robustness. With the use of this proposed
algorithm, error resilience can be added to an already compressed
video stream at an intermediate stage at the edge of two or more
different networks through two resilience schemes, namely the
adaptive intra refresh (AIR) and feedback control signaling (FCS)
methods. Both resilience tools impose an output rate increase
which can also be prevented with the proposed novel technique in
this paper. Thus, an error-resilient video transcoding scheme is
presented to give robust video outputs at near target transmission
rates that only require the same number of GPRS timeslots as
the nonresilient schemes. Moreover, an ultimate robustness is
also accomplished with the combination of the two resilience
algorithms at the video proxy. Extensive computer simulations
demonstrate the effectiveness of the proposed system
Cascaded Cross-Module Residual Learning towards Lightweight End-to-End Speech Coding
Speech codecs learn compact representations of speech signals to facilitate
data transmission. Many recent deep neural network (DNN) based end-to-end
speech codecs achieve low bitrates and high perceptual quality at the cost of
model complexity. We propose a cross-module residual learning (CMRL) pipeline
as a module carrier with each module reconstructing the residual from its
preceding modules. CMRL differs from other DNN-based speech codecs, in that
rather than modeling speech compression problem in a single large neural
network, it optimizes a series of less-complicated modules in a two-phase
training scheme. The proposed method shows better objective performance than
AMR-WB and the state-of-the-art DNN-based speech codec with a similar network
architecture. As an end-to-end model, it takes raw PCM signals as an input, but
is also compatible with linear predictive coding (LPC), showing better
subjective quality at high bitrates than AMR-WB and OPUS. The gain is achieved
by using only 0.9 million trainable parameters, a significantly less complex
architecture than the other DNN-based codecs in the literature.Comment: Accepted for publication in INTERSPEECH 201
Seminario sullo Standard MPEG-4: utilizzo ed aspetti implementativi
Una delle tecnologie chiave che hanno permesso il grande sviluppo della televisione digitale è la compressione video. La tecnologia di codifica video nota come MPEG-2, sviluppata nei primi anni novanta, è diventata lo standard di trasmissione DTV (Digital TV) sia satellitare sia terrestre in quasi tutti i paesi del mondo. Da allora la velocità dei microprocessori e le capacità di memoria dei dispositivi hardware per la codifica e la decodifica sono migliorate significativamente rendendo possibile lo sviluppo e
l’implementazione di algoritmi di codifica innovativi in grado di abbattere significativamente i limiti di compressione dello standard MPEG-2. Tali innovazioni, sfociate nel 2003 nello standard MPEG-4 AVC (Advanced Video Coding), non hanno permesso di mantenere la compatibilità all’indietro con l’MPEG-2, e questo ha inizialmente costituito un limite alla loro introduzione nei sistemi di trasmissione DTV. Tuttavia, negli ultimi anni la codifica MPEG-4 AVC si è diffusa rapidamente, è stata adottata dal progetto DVB, recentemente dall’ATSC, ed è lo standard di codifica nell’IPTV.
L’obiettivo di questo seminario, che si articola in due giornate, è quello di presentare lo standard di codifica MPEG-4 AVC con particolare attenzione agli aspetti implementativi del livello di codifica video.2008-11-18Sardegna Ricerche, Edificio 2, Località Piscinamanna 09010 Pula (CA) - ItaliaSeminario sullo Standard MPEG-4: utilizzo ed aspetti implementativ
- …