4,838 research outputs found
Quantisation mechanisms in multi-protoype waveform coding
Prototype Waveform Coding is one of the most promising methods for speech coding at low bit rates over telecommunications networks. This thesis investigates quantisation mechanisms in Multi-Prototype Waveform (MPW) coding, and two prototype waveform quantisation algorithms for speech coding at bit rates of 2.4kb/s are proposed. Speech coders based on these algorithms have been found to be capable of producing coded speech with equivalent perceptual quality to that generated by the US 1016 Federal Standard CELP-4.8kb/s algorithm. The two proposed prototype waveform quantisation algorithms are based on Prototype Waveform Interpolation (PWI). The first algorithm is in an open loop architecture (Open Loop Quantisation). In this algorithm, the speech residual is represented as a series of prototype waveforms (PWs). The PWs are extracted in both voiced and unvoiced speech, time aligned and quantised and, at the receiver, the excitation is reconstructed by smooth interpolation between them. For low bit rate coding, the PW is decomposed into a slowly evolving waveform (SEW) and a rapidly evolving waveform (REW). The SEW is coded using vector quantisation on both magnitude and phase spectra. The SEW codebook search is based on the best matching of the SEW and the SEW codebook vector. The REW phase spectra is not quantised, but it is recovered using Gaussian noise. The REW magnitude spectra, on the other hand, can be either quantised with a certain update rate or only derived according to SEW behaviours
Recommended from our members
A novel framework for high-quality voice source analysis and synthesis
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.The analysis, parameterization and modeling of voice source estimates obtained via inverse filtering of recorded speech are some of the most challenging areas of speech processing owing to the fact humans produce a wide range of voice source realizations and that the voice source estimates commonly contain artifacts due to the non-linear time-varying source-filter coupling. Currently, the most widely adopted representation of voice source signal is Liljencrants-Fant's (LF) model which was developed in late 1985. Due to the overly simplistic interpretation of voice source dynamics, LF model can not represent the fine temporal structure of glottal flow derivative realizations nor can it carry the sufficient spectral richness to facilitate a truly natural sounding speech synthesis. In this thesis we have introduced Characteristic Glottal Pulse Waveform Parameterization and Modeling (CGPWPM) which constitutes an entirely novel framework for voice source analysis, parameterization and reconstruction. In comparative evaluation of CGPWPM and LF model we have demonstrated that the proposed method is able to preserve higher levels of speaker dependant information from the voice source estimates and realize a more natural sounding speech synthesis. In general, we have shown that CGPWPM-based speech synthesis rates highly on the scale of absolute perceptual acceptability and that speech signals are faithfully reconstructed on consistent basis, across speakers, gender. We have applied CGPWPM to voice quality profiling and text-independent voice quality conversion method. The proposed voice conversion method is able to achieve the desired perceptual effects and the modified
speech remained as natural sounding and intelligible as natural speech. In this thesis, we have also developed an optimal wavelet thresholding strategy for voice source signals which is able to suppress aspiration noise and still retain both the slow and the rapid variations in the voice source estimate
- …