516 research outputs found

    Efficient compression of motion compensated residuals

    EThOS - Electronic Theses Online Service, United Kingdom

    Perceptually-Driven Video Coding with the Daala Video Codec

    The Daala project is a royalty-free video codec that attempts to compete with the best patent-encumbered codecs. Part of our strategy is to replace core tools of traditional video codecs with alternative approaches, many of them designed to take perceptual aspects into account rather than optimizing for simple metrics like PSNR. This paper documents some of our experiences with these tools, including which ones worked and which did not. We evaluate which tools are easy to integrate into a more traditional codec design, and show results in the context of the codec being developed by the Alliance for Open Media. Comment: 19 pages, Proceedings of SPIE Workshop on Applications of Digital Image Processing (ADIP), 201

    Gaussian Mixture Model-based Quantization of Line Spectral Frequencies for Adaptive Multirate Speech Codec

    In this paper, we investigate the use of a Gaussian Mixture Model (GMM)-based quantizer to quantize the Line Spectral Frequencies (LSFs) in the Adaptive Multi-Rate (AMR) speech codec. We estimate a parametric GMM of the probability density function (pdf) of the prediction error (residual) of the mean-removed LSF parameters that the AMR codec uses to represent the speech spectral envelope. The studied GMM-based quantizer is based on transform coding using the Karhunen-Loeve transform (KLT) and transform-domain scalar quantizers (SQ) individually designed for each Gaussian mixture. We have investigated the applicability of such a quantization scheme in the existing AMR codec by replacing only the AMR LSF quantization algorithm segment. The main novelty of this paper lies in applying and adapting entropy-constrained (EC) coding for fixed-rate scalar quantization of transformed residuals, thereby allowing better adaptation to the local statistics of the source. We study and evaluate the compression efficiency, computational complexity and memory requirements of the proposed algorithm. Experimental results show that the GMM-based EC quantizer provides better rate/distortion performance than the quantization schemes used in the reference AMR codec, saving up to 7.32 bits/frame at much lower rate-independent computational complexity and memory requirements.
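    The transform-coding step named in this abstract (a KLT followed by scalar quantization of each transformed coefficient) can be illustrated with a minimal sketch. It assumes a single Gaussian component, two-dimensional vectors and a uniform quantizer with a hypothetical step size `step`; the paper's per-mixture, entropy-constrained quantizer design is not reproduced here, and all function names are illustrative.

```python
import math

def klt_2d(samples):
    """Estimate the mean and the KLT rotation angle from 2-D samples."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    cxx = sum((x - mx) ** 2 for x, _ in samples) / n
    cyy = sum((y - my) ** 2 for _, y in samples) / n
    cxy = sum((x - mx) * (y - my) for x, y in samples) / n
    # Angle of the principal eigenvector of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    return (mx, my), theta

def forward(v, mean, theta):
    """Remove the mean, then rotate into the decorrelated (KLT) domain."""
    x, y = v[0] - mean[0], v[1] - mean[1]
    c, s = math.cos(theta), math.sin(theta)
    return (c * x + s * y, -s * x + c * y)

def encode(v, mean, theta, step):
    """Uniform scalar quantization of each transform coefficient (assumed design)."""
    t0, t1 = forward(v, mean, theta)
    return (round(t0 / step), round(t1 / step))

def decode(idx, mean, theta, step):
    """Dequantize, rotate back and restore the mean."""
    t0, t1 = idx[0] * step, idx[1] * step
    c, s = math.cos(theta), math.sin(theta)
    return (c * t0 - s * t1 + mean[0], s * t0 + c * t1 + mean[1])
```

    Because the rotation is orthonormal, the quantization error in the transform domain carries over unchanged in magnitude to the reconstructed vector, which is why each coefficient can be quantized independently.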

    Scalable and perceptual audio compression

    This thesis deals with scalable perceptual audio compression. Two scalable perceptual solutions as well as a scalable-to-lossless solution are proposed and investigated. One of the scalable perceptual solutions is built around sinusoidal modelling of the audio signal, whilst the other is built on a transform coding paradigm. The scalable coders are shown to scale both in a waveform-matching manner and in a psychoacoustic manner. To measure the psychoacoustic scalability of the systems investigated in this thesis, the psychoacoustic parameters of the original signal are compared with those of the synthesized signal. The psychoacoustic parameters used are loudness, sharpness, tonality and roughness. This analysis technique is a novel method introduced in this thesis, and it gives insight into the perceptual distortion introduced by any coder analyzed in this manner.

    Parental finite state vector quantizer and vector wavelet transform-linear predictive coding.

    by Lam Chi Wah. Thesis submitted in: December 1997. Thesis (M.Phil.)--Chinese University of Hong Kong, 1998. Includes bibliographical references (leaves 89-91). Abstract also in Chinese.
    Chapter 1 --- Introduction to Data Compression and Image Coding
        1.1 --- Introduction
        1.2 --- Fundamental Principle of Data Compression
        1.3 --- Some Data Compression Algorithms
        1.4 --- Image Coding Overview
        1.5 --- Image Transformation
        1.6 --- Quantization
        1.7 --- Lossless Coding
    Chapter 2 --- Subband Coding and Wavelet Transform
        2.1 --- Subband Coding Principle
        2.2 --- Perfect Reconstruction
        2.3 --- Multi-Channel System
        2.4 --- Discrete Wavelet Transform
    Chapter 3 --- Vector Quantization (VQ)
        3.1 --- Introduction
        3.2 --- Basic Vector Quantization Procedure
        3.3 --- Codebook Searching and the LBG Algorithm
            3.3.1 --- Codebook
            3.3.2 --- LBG Algorithm
        3.4 --- Problem of VQ and Variations of VQ
            3.4.1 --- Classified VQ (CVQ)
            3.4.2 --- Finite State VQ (FSVQ)
        3.5 --- Vector Quantization on Wavelet Coefficients
    Chapter 4 --- Vector Wavelet Transform-Linear Predictor Coding
        4.1 --- Image Coding Using Wavelet Transform with Vector Quantization
            4.1.1 --- Future Standard
            4.1.2 --- Drawback of DCT
            4.1.3 --- Wavelet Coding and VQ, the Future Trend
        4.2 --- Mismatch between Scalar Transformation and VQ
        4.3 --- Vector Wavelet Transform (VWT)
        4.4 --- Example of Vector Wavelet Transform
        4.5 --- Vector Wavelet Transform - Linear Predictive Coding (VWT-LPC)
        4.6 --- An Example of VWT-LPC
    Chapter 5 --- Vector Quantization with Inter-band Bit Allocation (IBBA)
        5.1 --- Bit Allocation Problem
        5.2 --- Bit Allocation for Wavelet Subband Vector Quantizer
            5.2.1 --- Multiple Codebooks
            5.2.2 --- Inter-band Bit Allocation (IBBA)
    Chapter 6 --- Parental Finite State Vector Quantizers (PFSVQ)
        6.1 --- Introduction
        6.2 --- Parent-Child Relationship Between Subbands
        6.3 --- Wavelet Subband Vector Structures for VQ
            6.3.1 --- VQ on Separate Bands
            6.3.2 --- InterBand Information for Intraband Vectors
            6.3.3 --- Cross band Vector Methods
        6.4 --- Parental Finite State Vector Quantization Algorithms
            6.4.1 --- Scheme I: Parental Finite State VQ with Parent Index Equals Child Class Number
            6.4.2 --- Scheme II: Parental Finite State VQ with Parent Index Larger than Child Class Number
    Chapter 7 --- Simulation Result
        7.1 --- Introduction
        7.2 --- Simulation Result of Vector Wavelet Transform (VWT)
        7.3 --- Simulation Result of Vector Wavelet Transform - Linear Predictive Coding (VWT-LPC)
            7.3.1 --- First Test
            7.3.2 --- Second Test
            7.3.3 --- Third Test
        7.4 --- Simulation Result of Vector Quantization Using Inter-band Bit Allocation (IBBA)
        7.5 --- Simulation Result of Parental Finite State Vector Quantizers (PFSVQ)
    Chapter 8 --- Conclusion
    References

    Audio Compression using a Modified Vector Quantization algorithm for Mastering Applications

    Audio data compression is used to reduce the transmission bandwidth and storage requirements of audio data. It is the second stage in the audio mastering process, with audio equalization being the first. Compression algorithms such as BSAC, MP3 and AAC are used as reference standards in this paper. The main challenge in audio compression is compressing the signal at low bit rates: earlier algorithms that work well at low bit rates do not perform as well at higher bit rates, and vice versa. This paper proposes a modified vector quantization algorithm that produces a scalable bit stream with a number of fine layers of audio fidelity. The modified algorithm is used to build a scalable perceptual audio coder in which the quantization and encoding stages are responsible for the psychoacoustic and arithmetic removal of data, because practically all the data removed during the prediction phases at the encoder is restored to the audio signal at the decoder. It is therefore the quantization phase that is modified to produce a scalable bit stream. The modified algorithm works well at both lower and higher bit rates. Subjective evaluations were carried out by audio professionals using the MUSHRA test, and the mean normalized scores at various bit rates were noted and compared with those of the previous algorithms.

    Variational Speech Waveform Compression to Catalyze Semantic Communications

    We propose a novel neural waveform compression method to catalyze emerging speech semantic communications. By introducing nonlinear transforms and variational modeling, we effectively capture the dependencies within speech frames and estimate the probability distribution of the speech features more accurately, giving rise to better compression performance. In particular, the speech signals are analyzed and synthesized by a pair of nonlinear transforms, yielding latent features. An entropy model with a hyperprior is built to capture the probability distribution of the latent features, followed by quantization and entropy coding. The proposed waveform codec can be optimized flexibly towards an arbitrary rate, and it can also be easily optimized for any differentiable loss function, including the perceptual losses used in semantic communications. To further improve fidelity, we incorporate residual coding to mitigate the degradation arising from quantization distortion in the latent space. Results indicate that, at the same performance, the proposed method saves up to 27% in coding rate compared with the widely used adaptive multi-rate wideband (AMR-WB) codec as well as emerging neural waveform coding methods.
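    The link between an entropy model and the coding rate mentioned in this abstract can be illustrated with a short sketch. It assumes, as is common in learned compression but not stated in the abstract, that each latent is rounded to the nearest integer and modeled by a Gaussian whose mean and scale would be supplied by the hyperprior; the function names and parameters are hypothetical.

```python
import math

def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def symbol_probability(q, mu, sigma):
    """Probability mass of integer symbol q: Gaussian integrated over [q - 0.5, q + 0.5]."""
    p = gaussian_cdf(q + 0.5, mu, sigma) - gaussian_cdf(q - 0.5, mu, sigma)
    return max(p, 1e-12)  # floor avoids log(0) for very unlikely symbols

def estimate_bits(latents, mus, sigmas):
    """Ideal code length in bits of the rounded latents under the assumed model."""
    total = 0.0
    for y, mu, sigma in zip(latents, mus, sigmas):
        q = round(y)  # quantization by rounding (an assumption of this sketch)
        total += -math.log2(symbol_probability(q, mu, sigma))
    return total
```

    The more sharply the predicted distribution concentrates on the symbol that actually occurs, the fewer bits an ideal entropy coder spends on it, which is the sense in which a more accurate distribution estimate improves compression.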