75 research outputs found
Vector Sum Excited Linear Prediction (VSELP) speech coding at 4.8 kbps
Code Excited Linear Prediction (CELP) speech coders exhibit good performance at data rates as low as 4800 bps. The major drawback of CELP-type coders is their large computational requirements. The Vector Sum Excited Linear Prediction (VSELP) speech coder utilizes a codebook with a structure that allows for a very efficient search procedure. Other advantages of the VSELP codebook structure are discussed, and a detailed description of a 4.8 kbps VSELP coder is given. This coder is an improved version of the VSELP algorithm, which finished first in the NSA's evaluation of the 4.8 kbps speech coders. The coder uses a subsample-resolution single-tap long-term predictor, a single VSELP excitation codebook, a novel gain quantizer that is robust to channel errors, and a new adaptive pre/postfilter arrangement.
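The abstract above does not spell out the codebook structure. As general background on VSELP (not taken from this abstract), each codevector is a sum of M basis vectors with coefficients of +1 or -1, so the search can work from a few precomputed correlations instead of filtering all 2^M codevectors. The following is a minimal Python sketch of that idea under simplifying assumptions: the function name `vselp_search` is hypothetical, the basis vectors are assumed to be already filtered, the search is a plain exhaustive loop rather than an incremental one, and only sign patterns with non-negative correlation are kept so that the implied gain is non-negative.

```python
from itertools import product


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def vselp_search(target, basis):
    """Pick the +/-1 sign pattern whose vector sum best matches `target`.

    Hypothetical sketch: `basis` holds M (already filtered) basis vectors.
    The match criterion is correlation**2 / energy, the usual gain-optimized
    squared-error criterion for a single codevector.
    """
    m = len(basis)
    # Precompute once: target/basis correlations and the basis Gram matrix.
    r = [dot(target, b) for b in basis]
    d = [[dot(bi, bj) for bj in basis] for bi in basis]

    best_score, best_signs = None, None
    for signs in product((-1, 1), repeat=m):
        corr = sum(s * ri for s, ri in zip(signs, r))
        if corr < 0:
            continue  # the negated pattern covers this case with positive gain
        energy = sum(signs[i] * signs[j] * d[i][j]
                     for i in range(m) for j in range(m))
        if energy <= 0:
            continue
        score = corr * corr / energy
        if best_score is None or score > best_score:
            best_score, best_signs = score, signs
    return best_signs
```

The point of the structure is that `r` and `d` are computed once per subframe, so evaluating a candidate needs only sums over M terms rather than a full filtering pass.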
Vector adaptive predictive coder for speech and audio
A real-time vector adaptive predictive coder approximates each vector of K speech samples by using each of M fixed vectors in a first codebook to excite a time-varying synthesis filter and picking the vector that minimizes distortion. Predictive analysis for each frame determines the parameters used to compute, from the vectors in the first codebook, zero-state response vectors that are stored at the same address (index) in a second codebook. Encoding of input speech vectors s_n is then carried out using the second codebook. When the vector that minimizes distortion is found, its index is transmitted to a decoder, which has a codebook identical to the first codebook of the encoder. There the index is used to read out a vector that is used to synthesize an output speech vector s_n. The parameters used in the encoder are quantized, for example by using a table, and the indices are transmitted to the decoder, where they are decoded to specify the transfer characteristics of the filters used in producing the vector s_n from the receiver codebook vector selected by the transmitted vector index.
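The two-codebook scheme described above can be sketched roughly as follows. This is an illustrative Python sketch, not the patented implementation: the function names are hypothetical, the synthesis filter is assumed to be a plain all-pole filter with coefficients `a`, and gain scaling, perceptual weighting, and removal of the zero-input response from the target are all omitted.

```python
def zero_state_response(codevector, a):
    """Filter `codevector` through an all-pole filter with zero initial state.

    Assumed filter form: y[n] = x[n] + sum_k a[k-1] * y[n-k], k = 1..len(a).
    """
    y = []
    for n, x in enumerate(codevector):
        acc = x
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:  # samples before the vector start are zero
                acc += ak * y[n - k]
        y.append(acc)
    return y


def build_zsr_codebook(codebook, a):
    """Second codebook: one zero-state response per first-codebook entry,
    stored at the same index."""
    return [zero_state_response(c, a) for c in codebook]


def search(target, zsr_codebook):
    """Return (index, squared error) of the closest zero-state response."""
    best_index, best_err = 0, float("inf")
    for i, z in enumerate(zsr_codebook):
        err = sum((t - v) ** 2 for t, v in zip(target, z))
        if err < best_err:
            best_index, best_err = i, err
    return best_index, best_err


# Per frame: rebuild the second codebook once, then search it per vector.
codebook = [[1, 0, 0, 0], [0, 1, 0, 0], [1, -1, 0, 0]]
zsr = build_zsr_codebook(codebook, a=[0.5])
index, err = search([0, 1, 0.5, 0.25], zsr)
```

Only `index` would be transmitted; the decoder reads the same entry from its copy of the first codebook and runs it through its own synthesis filter.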
New Directions in Subband Coding
Two very different subband coders are described. The first is a modified dynamic bit-allocation subband coder (D-SBC) designed for variable-rate coding situations and easily adaptable to noisy channel environments. It can operate at rates as low as 12 kb/s and still give good-quality speech. The second coder is a 16-kb/s waveform coder, based on a combination of subband coding and vector quantization (VQ-SBC). The key feature of this coder is its short coding delay, which makes it suitable for real-time communication networks. The speech quality of both coders has been enhanced by adaptive postfiltering. The coders have been implemented on a single AT&T DSP32 signal processor.
NoLACE: Improving Low-Complexity Speech Codec Enhancement Through Adaptive Temporal Shaping
Speech codec enhancement methods are designed to remove distortions added by
speech codecs. While classical methods are very low in complexity and add zero
delay, their effectiveness is rather limited. Compared to that, DNN-based
methods deliver higher quality but they are typically high in complexity and/or
require delay. The recently proposed Linear Adaptive Coding Enhancer (LACE)
addresses this problem by combining DNNs with classical long-term/short-term
postfiltering resulting in a causal low-complexity model. A shortcoming of the
LACE model is, however, that quality quickly saturates when the model size is
scaled up. To mitigate this problem, we propose a novel adaptive temporal
shaping module that adds high temporal resolution to the LACE model resulting
in the Non-Linear Adaptive Coding Enhancer (NoLACE). We adapt NoLACE to enhance
the Opus codec and show that NoLACE significantly outperforms both the Opus
baseline and an enlarged LACE model at 6, 9 and 12 kb/s. We also show that LACE
and NoLACE are well-behaved when used with an ASR system. Comment: submitted to ICASSP 202
Postfiltering techniques in low bit-rate speech coders
Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999. Includes bibliographical references (leaves 78-80). By Azhar K. Mustapha. M.Eng
LACE: A light-weight, causal model for enhancing coded speech through adaptive convolutions
Classical speech coding uses low-complexity postfilters with zero lookahead
to enhance the quality of coded speech, but their effectiveness is limited by
their simplicity. Deep Neural Networks (DNNs) can be much more effective, but
require high complexity and model size, or added delay. We propose a DNN model
that generates classical filter kernels on a per-frame basis with a model of
just 300 K parameters and 100 MFLOPS complexity, which is a practical
complexity for desktop or mobile device CPUs. The lack of added delay allows it
to be integrated into the Opus codec, and we demonstrate that it enables
effective wideband encoding for bitrates down to 6 kb/s. Comment: 5 pages, accepted at WASPAA 202