361 research outputs found
Speech coding at medium bit rates using analysis by synthesis techniques
Speech coding at medium bit rates using analysis by synthesis technique
Structure-Constrained Basis Pursuit for Compressively Sensing Speech
Compressed Sensing (CS) exploits the sparsity of many signals to enable sampling below the Nyquist rate. If the original signal is sufficiently sparse, the Basis Pursuit (BP) algorithm will perfectly reconstruct the original signal. Unfortunately many signals that intuitively appear sparse do not meet the threshold for sufficient sparsity . These signals require so many CS samples for accurate reconstruction that the advantages of CS disappear. This is because Basis Pursuit/Basis Pursuit Denoising only models sparsity. We developed a Structure-Constrained Basis Pursuit that models the structure of somewhat sparse signals as upper and lower bound constraints on the Basis Pursuit Denoising solution. We applied it to speech, which seems sparse but does not compress well with CS, and gained improved quality over Basis Pursuit Denoising. When a single parameter (i.e. the phone) is encoded, Normalized Mean Squared Error (NMSE) decreases by between 16.2% and 1.00% when sampling with CS between 1/10 and 1/2 the Nyquist rate, respectively. When bounds are coded as a sum of Gaussians, NMSE decreases between 28.5% and 21.6% in the same range. SCBP can be applied to any somewhat sparse signal with a predictable structure to enable improved reconstruction quality with the same number of samples
Perceptual models in speech quality assessment and coding
The ever-increasing demand for good communications/toll
quality speech has created a renewed interest into the
perceptual impact of rate compression. Two general areas are
investigated in this work, namely speech quality assessment
and speech coding.
In the field of speech quality assessment, a model is
developed which simulates the processing stages of the
peripheral auditory system. At the output of the model a
"running" auditory spectrum is obtained. This represents
the auditory (spectral) equivalent of any acoustic sound such
as speech. Auditory spectra from coded speech segments serve
as inputs to a second model. This model simulates the
information centre in the brain which performs the speech
quality assessment. [Continues.
Using a low-bit rate speech enhancement variable post-filter as a speech recognition system pre-filter to improve robustness to GSM speech
Includes bibliographical references.Performance of speech recognition systems degrades when they are used to recognize speech that has been transmitted through GS1 (Global System for Mobile Communications) voice communication channels (GSM speech). This degradation is mainly due to GSM speech coding and GSM channel noise on speech signals transmitted through the network. This poor recognition of GSM channel speech limits the use of speech recognition applications over GSM networks. If speech recognition technology is to be used unlimitedly over GSM networks recognition accuracy of GSM channel speech has to be improved. Different channel normalization techniques have been developed in an attempt to improve recognition accuracy of voice channel modified speech in general (not specifically for GSM channel speech). These techniques can be classified into three broad categories, namely, model modification, signal pre-processing and feature processing techniques. In this work, as a contribution toward improving the robustness of speech recognition systems to GSM speech, the use of a low-bit speech enhancement post-filter as a speech recognition system pre-filter is proposed. This filter is to be used in recognition systems in combination with channel normalization techniques
- …