1,063 research outputs found

    Audio Coding Based on Integer Transforms

    Get PDF
    Die Audiocodierung hat sich in den letzten Jahren zu einem sehr populären Forschungs- und Anwendungsgebiet entwickelt. Insbesondere gehörangepasste Verfahren zur Audiocodierung, wie etwa MPEG-1 Layer-3 (MP3) oder MPEG-2 Advanced Audio Coding (AAC), werden häufig zur effizienten Speicherung und Übertragung von Audiosignalen verwendet. Für professionelle Anwendungen, wie etwa die Archivierung und Übertragung im Studiobereich, ist hingegen eher eine verlustlose Audiocodierung angebracht. Die bisherigen Ansätze für gehörangepasste und verlustlose Audiocodierung sind technisch völlig verschieden. Moderne gehörangepasste Audiocoder basieren meist auf Filterbänken, wie etwa der überlappenden orthogonalen Transformation "Modifizierte Diskrete Cosinus-Transformation" (MDCT). Verlustlose Audiocoder hingegen verwenden meist prädiktive Codierung zur Redundanzreduktion. Nur wenige Ansätze zur transformationsbasierten verlustlosen Audiocodierung wurden bisher versucht. Diese Arbeit präsentiert einen neuen Ansatz hierzu, der das Lifting-Schema auf die in der gehörangepassten Audiocodierung verwendeten überlappenden Transformationen anwendet. Dies ermöglicht eine invertierbare Integer-Approximation der ursprünglichen Transformation, z.B. die IntMDCT als Integer-Approximation der MDCT. Die selbe Technik kann auch für Filterbänke mit niedriger Systemverzögerung angewandt werden. Weiterhin ermöglichen ein neuer, mehrdimensionaler Lifting-Ansatz und eine Technik zur Spektralformung von Quantisierungsfehlern eine Verbesserung der Approximation der ursprünglichen Transformation. Basierend auf diesen neuen Integer-Transformationen werden in dieser Arbeit neue Verfahren zur Audiocodierung vorgestellt. Die Verfahren umfassen verlustlose Audiocodierung, eine skalierbare verlustlose Erweiterung eines gehörangepassten Audiocoders und einen integrierten Ansatz zur fein skalierbaren gehörangepassten und verlustlosen Audiocodierung. Schließlich wird mit Hilfe der Integer-Transformationen ein neuer Ansatz zur unhörbaren Einbettung von Daten mit hohen Datenraten in unkomprimierte Audiosignale vorgestellt.In recent years audio coding has become a very popular field for research and applications. Especially perceptual audio coding schemes, such as MPEG-1 Layer-3 (MP3) and MPEG-2 Advanced Audio Coding (AAC), are widely used for efficient storage and transmission of music signals. Nevertheless, for professional applications, such as archiving and transmission in studio environments, lossless audio coding schemes are considered more appropriate. Traditionally, the technical approaches used in perceptual and lossless audio coding have been separate worlds. In perceptual audio coding, the use of filter banks, such as the lapped orthogonal transform "Modified Discrete Cosine Transform" (MDCT), has been the approach of choice being used by many state of the art coding schemes. On the other hand, lossless audio coding schemes mostly employ predictive coding of waveforms to remove redundancy. Only few attempts have been made so far to use transform coding for the purpose of lossless audio coding. This work presents a new approach of applying the lifting scheme to lapped transforms used in perceptual audio coding. This allows for an invertible integer-to-integer approximation of the original transform, e.g. the IntMDCT as an integer approximation of the MDCT. The same technique can also be applied to low-delay filter banks. A generalized, multi-dimensional lifting approach and a noise-shaping technique are introduced, allowing to further optimize the accuracy of the approximation to the original transform. Based on these new integer transforms, this work presents new audio coding schemes and applications. The audio coding applications cover lossless audio coding, scalable lossless enhancement of a perceptual audio coder and fine-grain scalable perceptual and lossless audio coding. Finally an approach to data hiding with high data rates in uncompressed audio signals based on integer transforms is described

    Audio Coding Based on Integer Transforms

    Get PDF
    Die Audiocodierung hat sich in den letzten Jahren zu einem sehr populären Forschungs- und Anwendungsgebiet entwickelt. Insbesondere gehörangepasste Verfahren zur Audiocodierung, wie etwa MPEG-1 Layer-3 (MP3) oder MPEG-2 Advanced Audio Coding (AAC), werden häufig zur effizienten Speicherung und Übertragung von Audiosignalen verwendet. Für professionelle Anwendungen, wie etwa die Archivierung und Übertragung im Studiobereich, ist hingegen eher eine verlustlose Audiocodierung angebracht. Die bisherigen Ansätze für gehörangepasste und verlustlose Audiocodierung sind technisch völlig verschieden. Moderne gehörangepasste Audiocoder basieren meist auf Filterbänken, wie etwa der überlappenden orthogonalen Transformation "Modifizierte Diskrete Cosinus-Transformation" (MDCT). Verlustlose Audiocoder hingegen verwenden meist prädiktive Codierung zur Redundanzreduktion. Nur wenige Ansätze zur transformationsbasierten verlustlosen Audiocodierung wurden bisher versucht. Diese Arbeit präsentiert einen neuen Ansatz hierzu, der das Lifting-Schema auf die in der gehörangepassten Audiocodierung verwendeten überlappenden Transformationen anwendet. Dies ermöglicht eine invertierbare Integer-Approximation der ursprünglichen Transformation, z.B. die IntMDCT als Integer-Approximation der MDCT. Die selbe Technik kann auch für Filterbänke mit niedriger Systemverzögerung angewandt werden. Weiterhin ermöglichen ein neuer, mehrdimensionaler Lifting-Ansatz und eine Technik zur Spektralformung von Quantisierungsfehlern eine Verbesserung der Approximation der ursprünglichen Transformation. Basierend auf diesen neuen Integer-Transformationen werden in dieser Arbeit neue Verfahren zur Audiocodierung vorgestellt. Die Verfahren umfassen verlustlose Audiocodierung, eine skalierbare verlustlose Erweiterung eines gehörangepassten Audiocoders und einen integrierten Ansatz zur fein skalierbaren gehörangepassten und verlustlosen Audiocodierung. Schließlich wird mit Hilfe der Integer-Transformationen ein neuer Ansatz zur unhörbaren Einbettung von Daten mit hohen Datenraten in unkomprimierte Audiosignale vorgestellt.In recent years audio coding has become a very popular field for research and applications. Especially perceptual audio coding schemes, such as MPEG-1 Layer-3 (MP3) and MPEG-2 Advanced Audio Coding (AAC), are widely used for efficient storage and transmission of music signals. Nevertheless, for professional applications, such as archiving and transmission in studio environments, lossless audio coding schemes are considered more appropriate. Traditionally, the technical approaches used in perceptual and lossless audio coding have been separate worlds. In perceptual audio coding, the use of filter banks, such as the lapped orthogonal transform "Modified Discrete Cosine Transform" (MDCT), has been the approach of choice being used by many state of the art coding schemes. On the other hand, lossless audio coding schemes mostly employ predictive coding of waveforms to remove redundancy. Only few attempts have been made so far to use transform coding for the purpose of lossless audio coding. This work presents a new approach of applying the lifting scheme to lapped transforms used in perceptual audio coding. This allows for an invertible integer-to-integer approximation of the original transform, e.g. the IntMDCT as an integer approximation of the MDCT. The same technique can also be applied to low-delay filter banks. A generalized, multi-dimensional lifting approach and a noise-shaping technique are introduced, allowing to further optimize the accuracy of the approximation to the original transform. Based on these new integer transforms, this work presents new audio coding schemes and applications. The audio coding applications cover lossless audio coding, scalable lossless enhancement of a perceptual audio coder and fine-grain scalable perceptual and lossless audio coding. Finally an approach to data hiding with high data rates in uncompressed audio signals based on integer transforms is described

    A new model-based algorithm for optimizing the MPEG-AAC in MS-stereo

    No full text
    International audienceIn this paper, a new model-based algorithm for optimizing the MPEG-Advanced Audio Coder (AAC) in MS-stereo mode is presented. This algorithm is an extension to stereo signals of prior work on a statistical model of quantization noise. Traditionally, MS-stereo coding approaches replace the Left (L) and Right (R) channels by the Middle (M) and Sides (S) channels, each channel being independently processed, almost like a monophonic signal. In contrast, our method proposes a global approach for coding both channels in the same process. A model for the quantization error allows us to tune the quantizers on channels M and S with respect to a distortion constraint on the reconstructed channels L and R as they will appear in the decoder. This approach leads to a more efficient perceptual noise-shaping and avoids using complex psychoacoustic models built on the M and S channels. Furthermore, it provides a straightforward scheme to choose between LR and MS modes in each subband for each frame. Subjective listening tests prove that the coding efficiency at a medium bitrate (96 kbits/s for both channels) is significantly better with our algorithm than with the standard algorithm, without increase of complexity

    Scalable and perceptual audio compression

    Get PDF
    This thesis deals with scalable perceptual audio compression. Two scalable perceptual solutions as well as a scalable to lossless solution are proposed and investigated. One of the scalable perceptual solutions is built around sinusoidal modelling of the audio signal whilst the other is built on a transform coding paradigm. The scalable coders are shown to scale both in a waveform matching manner as well as a psychoacoustic manner. In order to measure the psychoacoustic scalability of the systems investigated in this thesis, the similarity between the original signal\u27s psychoacoustic parameters and that of the synthesized signal are compared. The psychoacoustic parameters used are loudness, sharpness, tonahty and roughness. This analysis technique is a novel method used in this thesis and it allows an insight into the perceptual distortion that has been introduced by any coder analyzed in this manner

    MASCOT : metadata for advanced scalable video coding tools : final report

    Get PDF
    The goal of the MASCOT project was to develop new video coding schemes and tools that provide both an increased coding efficiency as well as extended scalability features compared to technology that was available at the beginning of the project. Towards that goal the following tools would be used: - metadata-based coding tools; - new spatiotemporal decompositions; - new prediction schemes. Although the initial goal was to develop one single codec architecture that was able to combine all new coding tools that were foreseen when the project was formulated, it became clear that this would limit the selection of the new tools. Therefore the consortium decided to develop two codec frameworks within the project, a standard hybrid DCT-based codec and a 3D wavelet-based codec, which together are able to accommodate all tools developed during the course of the project

    Efficient Coding of Shape and Transparency for Video Objects

    Get PDF

    Network streaming and compression for mixed reality tele-immersion

    Get PDF
    Bulterman, D.C.A. [Promotor]Cesar, P.S. [Copromotor

    Novel Ideas for Lossless Audio Coding

    Get PDF
    Novel ideas for lossless audio coding analyzed in the paper are linked with forward predictor adaptation, and concern optimization of predictors on the basis of zero-orderentropy and MMAE criterions, and context sound coding. Direct use of the former criterion is linked with exponential growth of optimization procedure, hence, a suboptimal algorithm having polynomial complexity is proposed. It is shown that on average the new types of predictors are better than those obtained by MMSE technique, while two- and three context systems are on average better than a single predictor one. It also appears that 7-bit PARCOR coefficients in the MPEG-4 ALS standard have insufficient precision for some predictor length, and that for very long frames coding results improve with the predictor rank practically in unlimited way
    corecore