High Quality Audio Coding with MDCTNet
We propose a neural audio generative model, MDCTNet, operating in the
perceptually weighted domain of an adaptive modified discrete cosine transform
(MDCT). The architecture of the model captures correlations in both time and
frequency directions with recurrent layers (RNNs). An audio coding system is
obtained by training MDCTNet on a diverse set of fullband monophonic audio
signals at 48 kHz sampling, conditioned by a perceptual audio encoder. In a
subjective listening test with ten excerpts chosen to be balanced across
content types, yet stressful for both codecs, the mean performance of the
proposed system for 24 kb/s variable bitrate (VBR) is similar to that of Opus
at twice the bitrate. Comment: Five pages, five figures
Entropy coding of Quantized Spectral Components in FDLP audio codec
An audio codec based on Frequency Domain Linear Prediction (FDLP) exploits autoregressive modeling to approximate the instantaneous energy in critical frequency sub-bands of relatively long input segments. The current version of the FDLP codec, operating at 66 kbps, has been shown to provide subjective listening quality comparable to state-of-the-art codecs at similar bit-rates, even without employing strategic blocks such as entropy coding or simultaneous masking. This paper describes experimental work on increasing the compression efficiency of the FDLP codec by employing entropy coding. Instead of the Huffman coding traditionally used in current audio coding systems, we describe an efficient way to use arithmetic coding to entropy-code the quantized magnitude spectral components of the sub-band FDLP residuals. This approach outperforms Huffman coding and provides a bit-rate reduction of more than 3 kbps.
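Arithmetic coding can beat Huffman coding because a Huffman code assigns a whole number of bits to each symbol. An arithmetic coder, by contrast, approaches the Shannon entropy of the source over long sequences. The sketch below does not implement an arithmetic coder. It compares the Huffman average code length with the entropy bound for a hypothetical, strongly skewed distribution of quantized magnitudes. The probabilities and function names are illustrative assumptions and are not taken from the paper.

```python
import heapq
import math


def huffman_code_lengths(probs: dict[int, float]) -> dict[int, int]:
    """Code length (bits) of each symbol in a Huffman code for `probs`."""
    if len(probs) == 1:
        return {sym: 1 for sym in probs}
    lengths = {sym: 0 for sym in probs}
    # Heap entries: (subtree probability, unique tie-breaker, symbols in subtree).
    heap = [(p, i, [sym]) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1  # each merge adds one bit to every symbol below it
        heapq.heappush(heap, (p1 + p2, next_id, syms1 + syms2))
        next_id += 1
    return lengths


def huffman_average_bits(probs: dict[int, float]) -> float:
    lengths = huffman_code_lengths(probs)
    return sum(probs[sym] * lengths[sym] for sym in probs)


def entropy_bits(probs: dict[int, float]) -> float:
    # Lower bound on bits/symbol; an arithmetic coder approaches it on long inputs.
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)
```

With `probs = {0: 0.9, 1: 0.05, 2: 0.05}`, Huffman coding needs 1.1 bits per symbol, while the entropy is only about 0.57 bits. The gap is largest for highly probable symbols such as a dominant zero magnitude, which Huffman coding cannot code in less than one bit.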
Autoregressive Modelling of Hilbert Envelopes for Wide-band Audio Coding
Frequency Domain Linear Prediction (FDLP) is a technique for approximating the temporal envelopes of a signal using autoregressive models. In this paper, we propose a wide-band audio coding system based on FDLP. Specifically, FDLP is applied to critically sampled sub-bands to model their Hilbert envelopes. The residual of the linear prediction forms the Hilbert carrier, which is transmitted along with the envelope parameters. The decoder reverses this process to reconstruct the signal. In objective and subjective quality evaluations, the FDLP-based audio codec at kbps provides results competitive with those of state-of-the-art codecs at similar bit-rates.
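The core FDLP idea is linear prediction applied to the DCT coefficients of a segment instead of to its time samples. The all-pole model obtained this way, evaluated as a function of time, gives a smooth approximation of the squared Hilbert envelope. The sketch below assumes a single full-band segment with no QMF split, a direct O(N^2) DCT-II, the autocorrelation method with Levinson-Durbin, and a model order of 20. It omits the carrier. All names and parameter values are illustrative assumptions, not the codec's actual settings.

```python
import cmath
import math


def dct2(x: list[float]) -> list[float]:
    # Unnormalized DCT-II, computed directly.
    n_len = len(x)
    return [
        sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * n_len)) for n in range(n_len))
        for k in range(n_len)
    ]


def levinson_durbin(r: list[float], order: int) -> tuple[list[float], float]:
    # Prediction-error filter A(z) = 1 + a1 z^-1 + ... and final error power.
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def fdlp_envelope(x: list[float], order: int = 20) -> list[float]:
    """All-pole approximation of the squared Hilbert envelope of `x`."""
    coeffs = dct2(x)
    n_len = len(coeffs)
    # Autocorrelation of the DCT sequence (linear prediction in frequency).
    r = [sum(coeffs[k] * coeffs[k + m] for k in range(n_len - m)) for m in range(order + 1)]
    a, err = levinson_durbin(r, order)
    envelope = []
    for n in range(n_len):
        # Time index n maps to "frequency" pi*(n + 0.5)/N of the DCT sequence.
        omega = math.pi * (n + 0.5) / n_len
        denom = sum(a[j] * cmath.exp(-1j * omega * j) for j in range(order + 1))
        envelope.append(err / abs(denom) ** 2)
    return envelope
```

For a 256-sample segment that is quiet except for a burst between samples 150 and 180, the returned envelope peaks inside the burst. The model order controls how much temporal detail the envelope keeps.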
Non-uniform QMF Decomposition for Wide-band Audio Coding based on Frequency Domain Linear Prediction
This paper presents a new technique for perfect-reconstruction non-uniform QMF decomposition, developed to increase the efficiency of a generic wide-band audio coding system based on Frequency Domain Linear Prediction (FDLP). The baseline FDLP codec, operating at high bit-rates (~136 kbps), uses a uniform QMF decomposition into 64 sub-bands followed by FDLP-based sub-band processing. Here, we propose a non-uniform QMF decomposition into 32 frequency sub-bands obtained by merging the 64 uniform QMF bands. The merging is performed so that the bandwidths of the resulting critically sampled sub-bands emulate the critical band filters of the human auditory system. When employed in the FDLP audio codec, this frequency decomposition reduces the bit-rate by 40% relative to the baseline. We also describe the complete audio codec, which provides high-fidelity audio compression at ~66 kbps. In subjective listening tests, the FDLP codec outperforms MPEG-1 Layer 3 (MP3) and achieves quality similar to the MPEG-4 AAC+ standard.
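One way to picture the merging step is to group adjacent uniform bands greedily, so that each merged band stays no wider than the critical bandwidth at its centre. The sketch below makes several assumptions: 48 kHz sampling, so each of the 64 uniform bands spans 375 Hz, and the Zwicker and Terhardt approximation of critical bandwidth. This is a hypothetical heuristic and not the paper's merging rule. The number of groups it produces need not match the 32 bands reported, and it covers only the band grouping, not the perfect-reconstruction filter design.

```python
def critical_bandwidth_hz(f_hz: float) -> float:
    # Zwicker & Terhardt approximation of the critical bandwidth (f in kHz inside).
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69


def merge_uniform_bands(n_bands: int = 64, fs_hz: float = 48000.0) -> list[list[int]]:
    """Greedily group adjacent uniform bands so merged widths track critical bands."""
    width = fs_hz / 2 / n_bands  # bandwidth of one uniform band
    groups = []
    start = 0
    while start < n_bands:
        end = start + 1  # exclusive end; a group always holds at least one band
        while end < n_bands:
            lo = start * width
            hi = (end + 1) * width  # edges if band `end` were added
            if hi - lo > critical_bandwidth_hz((lo + hi) / 2):
                break
            end += 1
        groups.append(list(range(start, end)))
        start = end
    return groups
```

Low-frequency bands stay unmerged because 375 Hz already exceeds the critical bandwidth there. At high frequencies, where critical bands are several kHz wide, many uniform bands collapse into one.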
Reducing Audible Spectral Discontinuities
In this paper, a common problem in diphone synthesis is discussed, namely the occurrence of audible discontinuities at diphone boundaries. Informal observations show that spectral mismatch is most likely the cause of this phenomenon. We first set out to find an objective spectral measure for discontinuity. To this end, several spectral distance measures are related to the results of a listening experiment. Then, we studied the feasibility of extending the diphone database with context-sensitive diphones to reduce the occurrence of audible discontinuities. The number of additional diphones is limited by clustering, on the basis of the best-performing distance measure, consonant contexts that have a similar effect on the surrounding vowels. A listening experiment has shown that the addition of these context-sensitive diphones significantly reduces the number of audible discontinuities.
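As an illustration of the kind of spectral distance measure that can be scored at a diphone boundary, the sketch below computes an RMS log-spectral distance. It compares the last frame of one diphone with the first frame of the next. The abstract does not say which measures were compared or which performed best. The Hann window, the frame handling, and this particular measure are assumptions made for illustration.

```python
import cmath
import math


def log_magnitude_spectrum(frame: list[float], eps: float = 1e-10) -> list[float]:
    """Hann-windowed log-magnitude spectrum (dB), bins 0..N/2, via a direct DFT."""
    n_len = len(frame)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (n_len - 1)) for n in range(n_len)]
    windowed = [x * w for x, w in zip(frame, window)]
    spectrum = []
    for k in range(n_len // 2 + 1):
        value = sum(windowed[n] * cmath.exp(-2j * math.pi * k * n / n_len) for n in range(n_len))
        spectrum.append(20.0 * math.log10(abs(value) + eps))
    return spectrum


def log_spectral_distance(left_frame: list[float], right_frame: list[float]) -> float:
    """RMS difference (dB) between the log spectra of two equal-length frames."""
    left = log_magnitude_spectrum(left_frame)
    right = log_magnitude_spectrum(right_frame)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(left, right)) / len(left))
```

A large distance between the frames on either side of a join flags a likely audible discontinuity. Relating such scores to listener judgements is what allows one measure to be chosen over another.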
Parametric coding of stereo audio
Parametric-stereo coding is a technique to efficiently code a stereo audio signal as a monaural signal plus a small amount of parametric overhead to describe the stereo image. The stereo properties are analyzed, encoded, and reinstated in a decoder according to spatial psychoacoustical principles. The monaural signal can be encoded using any (conventional) audio coder. Experiments show that the parameterized description of spatial properties enables a highly efficient, high-quality stereo audio representation
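The analysis side of parametric stereo can be sketched as a mono downmix plus a few spatial parameters per time/frequency tile. The abstract does not name the parameters. The sketch assumes two that are commonly used in parametric stereo: the inter-channel intensity difference (IID, in dB) and the inter-channel coherence (ICC). It treats a single broadband frame and omits the band split, phase parameters, quantization, and decoder-side reinstatement. The function name is hypothetical.

```python
import math


def stereo_parameters(left: list[float], right: list[float], eps: float = 1e-12):
    """Return (mono downmix, IID in dB, ICC) for one broadband frame."""
    e_left = sum(v * v for v in left)
    e_right = sum(v * v for v in right)
    cross = sum(l * r for l, r in zip(left, right))
    iid_db = 10.0 * math.log10((e_left + eps) / (e_right + eps))  # level difference
    icc = cross / math.sqrt((e_left + eps) * (e_right + eps))  # normalized correlation
    mono = [(l + r) / 2 for l, r in zip(left, right)]  # signal sent to the core coder
    return mono, iid_db, icc
```

If the left channel is the right channel scaled by 2, the IID is about 6.02 dB and the ICC is 1. Only these few numbers per tile need to accompany the mono signal, which is where the small parametric overhead comes from.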
Scalable and perceptual audio compression
This thesis deals with scalable perceptual audio compression. Two scalable perceptual solutions as well as a scalable-to-lossless solution are proposed and investigated. One of the scalable perceptual solutions is built around sinusoidal modelling of the audio signal, whilst the other is built on a transform coding paradigm. The scalable coders are shown to scale both in a waveform-matching manner and in a psychoacoustic manner. To measure the psychoacoustic scalability of the systems investigated in this thesis, the psychoacoustic parameters of the original signal are compared with those of the synthesized signal. The psychoacoustic parameters used are loudness, sharpness, tonality and roughness. This analysis technique is a novel method introduced in this thesis, and it gives insight into the perceptual distortion introduced by any coder analyzed in this manner.
Error Resilient Speech Coding Using Sub-band Hilbert Envelopes
Frequency Domain Linear Prediction (FDLP) is a technique for autoregressive modelling of the Hilbert envelopes of a signal. In this paper, we propose a speech coding technique that uses FDLP in Quadrature Mirror Filter (QMF) sub-bands of short (25 ms) segments of the speech signal. The Line Spectral Frequency parameters of the autoregressive models and the spectral components of the residual signals are transmitted. To simulate the effects of lossy transmission channels, bit-packets are dropped at random. In objective and subjective quality evaluations, the proposed FDLP speech codec is judged to be more resilient to bit-packet losses than the state-of-the-art Adaptive Multi-Rate Wide-Band (AMR-WB) codec at 12 kbps.
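A random-loss channel like the one described can be simulated by dropping each packet independently with a fixed probability. The sketch assumes independent (Bernoulli) losses, a seeded generator for repeatability, and `None` as the marker for a lost packet. The abstract does not specify the loss model, the packet size, or how the decoder conceals gaps, so these choices and the function name are hypothetical.

```python
import random


def simulate_packet_loss(packets: list[bytes], loss_rate: float, seed: int = 0) -> list:
    """Drop each packet independently with probability `loss_rate` (None = lost)."""
    rng = random.Random(seed)  # seeded so experiments are repeatable
    return [None if rng.random() < loss_rate else packet for packet in packets]
```

The decoder then sees a stream with gaps, and each codec under test must reconstruct speech from whatever packets survive. Comparing codecs at the same loss rate and seed exposes them to the same loss pattern.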