29 research outputs found
Time and frequency domain algorithms for speech coding
The promise of digital hardware economies (due to recent advances in
VLSI technology), has focussed much attention on more complex and sophisticated
speech coding algorithms which offer improved quality at relatively
low bit rates.
This thesis describes the results (obtained from computer simulations)
of research into various efficient (time and frequency domain) speech
encoders operating at a transmission bit rate of 16 Kbps.
In the time domain, Adaptive Differential Pulse Code Modulation (ADPCM)
systems employing both forward and backward adaptive prediction were
examined. A number of algorithms were proposed and evaluated, including
several variants of the Stochastic Approximation Predictor (SAP). A
Backward Block Adaptive (BBA) predictor was also developed and found to
outperform the conventional stochastic methods, even though its complexity
in terms of signal processing requirements is lower. A simplified
Adaptive Predictive Coder (APC) employing a single tap pitch predictor
considered next provided a slight improvement in performance over ADPCM,
but with rather greater complexity.
The ultimate test of any speech coding system is the perceptual performance
of the received speech. Recent research has indicated that this
may be enhanced by suitable control of the noise spectrum according to
the theory of auditory masking. Various noise shaping ADPCM
configurations were examined, and it was demonstrated that a proposed
pre-/post-filtering arrangement which exploits advantageously the
predictor-quantizer interaction, leads to the best subjective
performance in both forward and backward prediction systems.
Adaptive quantization is instrumental to the performance of ADPCM systems.
Both the forward adaptive quantizer (AQF) and the backward oneword
memory adaptation (AQJ) were examined. In addition, a novel method
of decreasing quantization noise in ADPCM-AQJ coders, which involves the
application of correction to the decoded speech samples, provided
reduced output noise across the spectrum, with considerable high frequency
noise suppression.
More powerful (and inevitably more complex) frequency domain speech
coders such as the Adaptive Transform Coder (ATC) and the Sub-band Coder
(SBC) offer good quality speech at 16 Kbps. To reduce complexity and
coding delay, whilst retaining the advantage of sub-band coding, a novel
transform based split-band coder (TSBC) was developed and found to compare
closely in performance with the SBC.
To prevent the heavy side information requirement associated with a
large number of bands in split-band coding schemes from impairing coding
accuracy, without forgoing the efficiency provided by adaptive bit
allocation, a method employing AQJs to code the sub-band signals together
with vector quantization of the bit allocation patterns was also
proposed.
Finally, 'pipeline' methods of bit allocation and step size estimation
(using the Fast Fourier Transform (FFT) on the input signal) were examined.
Such methods, although less accurate, are nevertheless useful in
limiting coding delay associated with SRC schemes employing Quadrature
Mirror Filters (QMF)
Differential encoding techniques applied to speech signals
The increasing use of digital communication systems has
produced a continuous search for efficient methods of speech
encoding.
This thesis describes investigations of novel differential
encoding systems. Initially Linear First Order DPCM systems
employing a simple delayed encoding algorithm are examined.
The systems detect an overload condition in the encoder, and
through a simple algorithm reduce the overload noise at the
expense of some increase in the quantization (granular) noise.
The signal-to-noise ratio (snr) performance of such d codec has
1 to 2 dB's advantage compared to the First Order Linear DPCM
system.
In order to obtain a large improvement in snr the high
correlation between successive pitch periods as well as the
correlation between successive samples in the voiced speech
waveform is exploited. A system called "Pitch Synchronous
First Order DPCM" (PSFOD) has been developed. Here the difference
Sequence formed between the samples of the input sequence in the
current pitch period and the samples of the stored decoded
sequence from the previous pitch period are encoded. This
difference sequence has a smaller dynamic range than the original
input speech sequence enabling a quantizer with better resolution
to be used for the same transmission bit rate. The snr is increased
by 6 dB compared with the peak snr of a First Order DPCM codea.
A development of the PSFOD system called a Pitch Synchronous
Differential Predictive Encoding system (PSDPE) is next investigated.
The principle of its operation is to predict the next sample in
the voiced-speech waveform, and form the prediction error which
is then subtracted from the corresponding decoded prediction
error in the previous pitch period. The difference is then
encoded and transmitted. The improvement in snr is approximately
8 dB compared to an ADPCM codea, when the PSDPE system uses an
adaptive PCM encoder. The snr of the system increases further
when the efficiency of the predictors used improve. However,
the performance of a predictor in any differential system is
closely related to the quantizer used. The better the quantization
the more information is available to the predictor and the better
the prediction of the incoming speech samples. This leads
automatically to the investigation in techniques of efficient
quantization. A novel adaptive quantization technique called
Dynamic Ratio quantizer (DRQ) is then considered and its theory
presented. The quantizer uses an adaptive non-linear element
which transforms the input samples of any amplitude to samples
within a defined amplitude range. A fixed uniform quantizer
quantizes the transformed signal. The snr for this quantizer
is almost constant over a range of input power limited in practice
by the dynamia range of the adaptive non-linear element, and it
is 2 to 3 dB's better than the snr of a One Word Memory adaptive
quantizer.
Digital computer simulation techniques have been used widely
in the above investigations and provide the necessary experimental
flexibility. Their use is described in the text
Speech coding at medium bit rates using analysis by synthesis techniques
Speech coding at medium bit rates using analysis by synthesis technique
Recommended from our members
Speech coding
Speech is the predominant means of communication between human beings and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained to be the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of speech signal getting corrupted by noise, cross-talk and distortion Long haul transmissions which use repeaters to compensate for the loss in signal strength on transmission links also increase the associated noise and distortion. On the other hand digital transmission is relatively immune to noise, cross-talk and distortion primarily because of the capability to faithfully regenerate digital signal at each repeater purely based on a binary decision. Hence end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link Hence from a transmission point of view digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modem requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term Speech Coding is often referred to techniques that represent or code speech signals either directly as a waveform or as a set of parameters by analyzing the speech signal. In either case, the codes are transmitted to the distant end where speech is reconstructed or synthesized using the received set of codes. A more generic term that is applicable to these techniques that is often interchangeably used with speech coding is the term voice coding. This term is more generic in the sense that the coding techniques are equally applicable to any voice signal whether or not it carries any intelligible information, as the term speech implies. Other terms that are commonly used are speech compression and voice compression since the fundamental idea behind speech coding is to reduce (compress) the transmission rate (or equivalently the bandwidth) And/or reduce storage requirements In this document the terms speech and voice shall be used interchangeably
Media gateway utilizando um GPU
Mestrado em Engenharia de Computadores e Telemátic
Perceptual models in speech quality assessment and coding
The ever-increasing demand for good communications/toll
quality speech has created a renewed interest into the
perceptual impact of rate compression. Two general areas are
investigated in this work, namely speech quality assessment
and speech coding.
In the field of speech quality assessment, a model is
developed which simulates the processing stages of the
peripheral auditory system. At the output of the model a
"running" auditory spectrum is obtained. This represents
the auditory (spectral) equivalent of any acoustic sound such
as speech. Auditory spectra from coded speech segments serve
as inputs to a second model. This model simulates the
information centre in the brain which performs the speech
quality assessment. [Continues.
The low bit-rate coding of speech signals
Imperial Users onl