Time and frequency domain algorithms for speech coding
The promise of digital hardware economies (due to recent advances in
VLSI technology) has focussed much attention on more complex and sophisticated
speech coding algorithms which offer improved quality at relatively
low bit rates.
This thesis describes the results (obtained from computer simulations)
of research into various efficient (time and frequency domain) speech
encoders operating at a transmission bit rate of 16 Kbps.
In the time domain, Adaptive Differential Pulse Code Modulation (ADPCM)
systems employing both forward and backward adaptive prediction were
examined. A number of algorithms were proposed and evaluated, including
several variants of the Stochastic Approximation Predictor (SAP). A
Backward Block Adaptive (BBA) predictor was also developed and found to
outperform the conventional stochastic methods, even though its complexity
in terms of signal processing requirements is lower. A simplified
Adaptive Predictive Coder (APC) employing a single-tap pitch predictor,
considered next, provided a slight improvement in performance over ADPCM,
but with rather greater complexity.
The ultimate test of any speech coding system is the perceptual performance
of the received speech. Recent research has indicated that this
may be enhanced by suitable control of the noise spectrum according to
the theory of auditory masking. Various noise shaping ADPCM
configurations were examined, and it was demonstrated that a proposed
pre-/post-filtering arrangement, which exploits the
predictor-quantizer interaction to advantage, leads to the best subjective
performance in both forward and backward prediction systems.
Adaptive quantization is instrumental to the performance of ADPCM systems.
Both the forward adaptive quantizer (AQF) and the backward one-word
memory adaptation (AQJ) were examined. In addition, a novel method
of decreasing quantization noise in ADPCM-AQJ coders, which involves the
application of correction to the decoded speech samples, provided
reduced output noise across the spectrum, with considerable high frequency
noise suppression.
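
As a rough illustration of the backward one-word memory idea described above, the following sketch pairs a fixed first-order predictor with a 2-bit quantizer whose step size is rescaled after every sample using only the previous code word. The predictor coefficient, the step multipliers, the step limits and the function names are all assumptions chosen for illustration; they are not taken from the thesis, and the proposed correction method is not reproduced here.

```python
# Minimal ADPCM sketch with a one-word-memory (Jayant-style) adaptive quantizer.
# Every constant here is an illustrative assumption, not a value from the thesis.

MULTIPLIERS = (0.8, 1.6)          # step multiplier for inner / outer magnitude level
STEP_MIN, STEP_MAX = 1e-3, 1e3    # keep the step size within sane bounds
PRED_COEF = 0.85                  # fixed first-order predictor coefficient


def _quantize(error, step):
    """Map a prediction error to a 2-bit code: (sign bit, magnitude level)."""
    sign = 1 if error < 0 else 0
    level = 1 if abs(error) >= step else 0
    return sign, level


def _dequantize(code, step):
    """Mid-rise reconstruction: output levels at 0.5*step and 1.5*step."""
    sign, level = code
    magnitude = (level + 0.5) * step
    return -magnitude if sign else magnitude


def _adapt(step, code):
    """One-word memory: the next step depends only on the latest code word."""
    return min(STEP_MAX, max(STEP_MIN, step * MULTIPLIERS[code[1]]))


def encode(samples, step0=0.1):
    step, recon, codes = step0, 0.0, []
    for x in samples:
        pred = PRED_COEF * recon
        code = _quantize(x - pred, step)
        recon = pred + _dequantize(code, step)   # track the decoder's state
        step = _adapt(step, code)
        codes.append(code)
    return codes


def decode(codes, step0=0.1):
    step, recon, out = step0, 0.0, []
    for code in codes:
        recon = PRED_COEF * recon + _dequantize(code, step)
        step = _adapt(step, code)
        out.append(recon)
    return out
```

Because the step size is derived from the transmitted code words alone, the decoder can follow the encoder without any side information, which is the property that distinguishes the backward scheme from the forward adaptive quantizer.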
More powerful (and inevitably more complex) frequency domain speech
coders such as the Adaptive Transform Coder (ATC) and the Sub-band Coder
(SBC) offer good quality speech at 16 Kbps. To reduce complexity and
coding delay, whilst retaining the advantage of sub-band coding, a novel
transform based split-band coder (TSBC) was developed and found to compare
closely in performance with the SBC.
To prevent the heavy side information requirement associated with a
large number of bands in split-band coding schemes from impairing coding
accuracy, without forgoing the efficiency provided by adaptive bit
allocation, a method employing AQJs to code the sub-band signals together
with vector quantization of the bit allocation patterns was also
proposed.
Finally, 'pipeline' methods of bit allocation and step size estimation
(using the Fast Fourier Transform (FFT) on the input signal) were examined.
Such methods, although less accurate, are nevertheless useful in
limiting coding delay associated with SBC schemes employing Quadrature
Mirror Filters (QMF).
Bit rates in audio source coding
The goal is to introduce and solve the audio coding optimization problem. Psychoacoustic results such as masking and excitation pattern models are combined with results from rate distortion theory to formulate the audio coding optimization problem. The solution of the audio optimization problem is a masked error spectrum, prescribing how quantization noise must be distributed over the audio spectrum to obtain a minimal bit rate and inaudible coding errors. This result can not only be used to estimate performance bounds, but can also be directly applied in audio coding systems. Subband coding applications to magnetic recording and transmission are discussed in some detail. Performance bounds for this type of subband coding system are derived.
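
To make the link between a masked error spectrum and a bit rate concrete, the sketch below uses the textbook rate-distortion function of a Gaussian source, R = max(0, 0.5 * log2(signal variance / noise variance)), per band. This is an assumed simplification for illustration only; the abstract does not state the model actually used, and the function names and the per-band variances are hypothetical.

```python
import math


def band_bits(signal_var, mask_var):
    """Bits per sample for one band if noise is allowed up to the masked level.

    Uses the Gaussian rate-distortion function as an assumed model.
    A band whose signal lies below its masked threshold needs no bits.
    """
    if signal_var <= mask_var:
        return 0.0
    return 0.5 * math.log2(signal_var / mask_var)


def rate_estimate(signal_vars, mask_vars):
    """Average bits per sample over equal-width bands."""
    bits = [band_bits(s, m) for s, m in zip(signal_vars, mask_vars)]
    return sum(bits) / len(bits)
```

For example, a band with signal variance 16 and masked noise variance 1 needs 2 bits per sample under this model, while a band whose signal is already below the masked threshold contributes nothing to the rate.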
Band-pass filtering of the time sequences of spectral parameters for robust wireless speech recognition
In this paper we address the problem of automatic speech recognition when wireless speech communication systems are involved. In this context, three main sources of distortion should be considered: acoustic environment, speech coding and transmission errors. Whilst the first one has already received a lot of attention, the last two deserve further investigation in our opinion. We have found that band-pass filtering of the recognition features improves ASR performance when distortions due to these particular communication systems are present. Furthermore, we have evaluated two alternative configurations at different bit error rates (BER) typical of these channels: band-pass filtering the LP-MFCC parameters, and a modification of RASTA-PLP using a sharper low-pass section. These perform consistently better than LP-MFCC and RASTA-PLP, respectively.
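
Band-pass filtering of feature time sequences means filtering each coefficient along the frame axis rather than along frequency. The sketch below applies a RASTA-style IIR filter to every feature trajectory. The numerator and pole values are the ones often quoted for RASTA processing and are used here as an assumption; the filters evaluated in the paper are not given in the abstract, and the function names are hypothetical.

```python
def bandpass_trajectory(values, pole=0.94):
    """Filter one feature coefficient over time with a RASTA-style band-pass.

    Assumed transfer function:
    H(z) = 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - pole * z^-1)
    The numerator sums to zero, so constant (DC) offsets are removed.
    """
    numerator = (0.2, 0.1, 0.0, -0.1, -0.2)
    out = []
    prev_y = 0.0
    for n in range(len(values)):
        acc = 0.0
        for k, b in enumerate(numerator):
            if n - k >= 0:
                acc += b * values[n - k]
        prev_y = acc + pole * prev_y
        out.append(prev_y)
    return out


def filter_features(frames, pole=0.94):
    """frames: list of equal-length feature vectors, one per analysis frame."""
    if not frames:
        return []
    dims = len(frames[0])
    tracks = [bandpass_trajectory([f[i] for f in frames], pole) for i in range(dims)]
    return [list(column) for column in zip(*tracks)]
```

A constant offset added to a coefficient, of the kind a fixed channel or coder might introduce, decays to zero at the filter output, while slower or faster modulations than the pass band are attenuated.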
Hyperspectral image compression : adapting SPIHT and EZW to Anisotropic 3-D Wavelet Coding
Hyperspectral images present some specific characteristics that should be used by an efficient compression system. In compression, wavelets have shown a good adaptability to a wide range of data, while being of reasonable complexity. Some wavelet-based compression algorithms have been successfully used for some hyperspectral space missions. This paper focuses on the optimization of a full wavelet compression system for hyperspectral images. Each step of the compression algorithm is studied and optimized. First, an algorithm to find the optimal 3-D wavelet decomposition in a rate-distortion sense is defined. Then, it is shown that a specific fixed decomposition has almost the same performance, while being more useful in terms of complexity issues. It is shown that this decomposition improves significantly on the classical isotropic decomposition. One of the most useful properties of this fixed decomposition is that it allows the use of zerotree algorithms. Various tree structures, creating a relationship between coefficients, are compared. Two efficient compression methods based on zerotree coding (EZW and SPIHT) are adapted to this near-optimal decomposition with the best tree structure found. Performances are compared with the adaptation of JPEG 2000 for hyperspectral images on six different areas presenting different statistical properties.
Subband coding for image data archiving
The use of subband coding on image data is discussed. An overview of subband coding is given. Advantages of subbanding for browsing and progressive resolution are presented. Implementations for lossless and lossy coding are discussed. Algorithm considerations and simple implementations of subband systems are given
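
One simple way to picture a lossless subband system with a low-resolution browse image is a single-level integer Haar split (the S-transform), sketched below. This is an assumed example, not the implementation discussed in the paper; the function names are hypothetical, and the image is assumed to be a list of integer rows with even width and height.

```python
def split_1d(seq):
    """Split an even-length integer sequence into low (average) and high (difference) bands."""
    low = [(seq[i] + seq[i + 1]) >> 1 for i in range(0, len(seq), 2)]
    high = [seq[i] - seq[i + 1] for i in range(0, len(seq), 2)]
    return low, high


def merge_1d(low, high):
    """Exactly invert split_1d using integer arithmetic only."""
    out = []
    for l, h in zip(low, high):
        a = l + ((h + 1) >> 1)
        out.extend((a, a - h))
    return out


def _split_columns(block):
    pairs = [split_1d(list(col)) for col in zip(*block)]
    low = [list(row) for row in zip(*[p[0] for p in pairs])]
    high = [list(row) for row in zip(*[p[1] for p in pairs])]
    return low, high


def _merge_columns(low, high):
    merged = [merge_1d(list(l), list(h)) for l, h in zip(zip(*low), zip(*high))]
    return [list(row) for row in zip(*merged)]


def split_2d(image):
    """Return (LL, LH, HL, HH); LL is a half-resolution browse image."""
    rows = [split_1d(row) for row in image]
    ll, lh = _split_columns([r[0] for r in rows])
    hl, hh = _split_columns([r[1] for r in rows])
    return ll, lh, hl, hh


def merge_2d(ll, lh, hl, hh):
    """Rebuild the full-resolution image from the four subbands without loss."""
    low = _merge_columns(ll, lh)
    high = _merge_columns(hl, hh)
    return [merge_1d(l, h) for l, h in zip(low, high)]
```

The LL band alone can be shown for browsing, and the three detail bands can be sent afterwards to refine it to full resolution. Applying `split_2d` again to LL gives further resolution levels.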