53 research outputs found
Novel Pitch Detection Algorithm With Application to Speech Coding
This thesis introduces a novel method for accurate pitch detection and speech segmentation, named Multi-feature, Autocorrelation (ACR) and Wavelet Technique (MAWT). MAWT uses feature extraction, and ACR applied on Linear Predictive Coding (LPC) residuals, with a wavelet-based refinement step. MAWT opens the way for a unique approach to modeling: although speech is divided into segments, the success of voicing decisions is not crucial. Experiments demonstrate the superiority of MAWT in pitch period detection accuracy over existing methods, and illustrate its advantages for speech segmentation. These advantages are more pronounced for gain-varying and transitional speech, and under noisy conditions
Speech coding at medium bit rates using analysis by synthesis techniques
Speech coding at medium bit rates using analysis by synthesis technique
Structured codebook design in CELP
Codebook Excited Linear Protection (CELP) is a popular analysis by synthesis technique for quantizing speech at bit rates from 4 to 6 kbps. Codebook design techniques to date have been largely based on either random (often Gaussian) codebooks, or on known binary or ternary codes which efficiently map the space of (assumed white) excitation codevectors. It has been shown that by introducing symmetries into the codebook, good complexity reduction can be realized with only marginal decrease in performance. Codebook design algorithms are considered for a wide range of structured codebooks
Low Delay Sparse and Mixed Excitation CELP Coders for Wideband Speech Coding
Code Excited Linear Prediction (CELP) algorithmsare proposed for compression of speech in 8 kHz band atswitched or variable bit rate and algorithmic delay not exceeding2 msec. Two structures of Low-Delay CELP coders are analyzed:Low-delay sparse excitation and mixed excitation CELP. Sparseexcitation is based on MP-MLQ and multilayer models. Mixedexcitation CELP algorithm stems from the narrowband G.728standard. As opposed to G.728 LD-CELP coder, mixed excitationcodebook consists of pseudorandom vectors and sequencesobtained with Long-Term Prediction (LTP). Variable rate codingconsists in maximizing vector dimension while keeping therequired speech quality. Good speech quality (MOS=3.9according to PESQ algorithm) is obtained at average bit rate 33.5kbit/sec
Perceptual models in speech quality assessment and coding
The ever-increasing demand for good communications/toll
quality speech has created a renewed interest into the
perceptual impact of rate compression. Two general areas are
investigated in this work, namely speech quality assessment
and speech coding.
In the field of speech quality assessment, a model is
developed which simulates the processing stages of the
peripheral auditory system. At the output of the model a
"running" auditory spectrum is obtained. This represents
the auditory (spectral) equivalent of any acoustic sound such
as speech. Auditory spectra from coded speech segments serve
as inputs to a second model. This model simulates the
information centre in the brain which performs the speech
quality assessment. [Continues.
A code excited linear predictive coder: using a moments algorithm
A speech coding algorithm was developed which was based on a new method of selecting the excitation signal from a codebook of residual error sequences. The residual error sequences in the codebook were generated from 512 frames of real speech signals. L.P.C. inverse filtering was used to obtain the residual signal.
Each residual error signal was assigned an index. The index was generated using a moments algorithm. These indices were stored on a Graded Binary Tree. A Binary Search was then used to select the correct index. The use of a Graded Binary Tree in the coding algorithm reduced the search time.
The algorithm faithfully reproduced the original speech when the test residual error signal was chosen from the training data. When the test residual error signal was outside the training data, synthetic speech of a recognisable quality was produced.
Finally, the fundamentals of speech coders are discussed in detail and various developments are suggested
Recommended from our members
Speech coding
Speech is the predominant means of communication between human beings and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained to be the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of speech signal getting corrupted by noise, cross-talk and distortion Long haul transmissions which use repeaters to compensate for the loss in signal strength on transmission links also increase the associated noise and distortion. On the other hand digital transmission is relatively immune to noise, cross-talk and distortion primarily because of the capability to faithfully regenerate digital signal at each repeater purely based on a binary decision. Hence end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link Hence from a transmission point of view digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modem requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term Speech Coding is often referred to techniques that represent or code speech signals either directly as a waveform or as a set of parameters by analyzing the speech signal. In either case, the codes are transmitted to the distant end where speech is reconstructed or synthesized using the received set of codes. A more generic term that is applicable to these techniques that is often interchangeably used with speech coding is the term voice coding. This term is more generic in the sense that the coding techniques are equally applicable to any voice signal whether or not it carries any intelligible information, as the term speech implies. Other terms that are commonly used are speech compression and voice compression since the fundamental idea behind speech coding is to reduce (compress) the transmission rate (or equivalently the bandwidth) And/or reduce storage requirements In this document the terms speech and voice shall be used interchangeably
- …