113 research outputs found

    Quantisation mechanisms in multi-protoype waveform coding

    Get PDF
    Prototype Waveform Coding is one of the most promising methods for speech coding at low bit rates over telecommunications networks. This thesis investigates quantisation mechanisms in Multi-Prototype Waveform (MPW) coding, and two prototype waveform quantisation algorithms for speech coding at bit rates of 2.4kb/s are proposed. Speech coders based on these algorithms have been found to be capable of producing coded speech with equivalent perceptual quality to that generated by the US 1016 Federal Standard CELP-4.8kb/s algorithm. The two proposed prototype waveform quantisation algorithms are based on Prototype Waveform Interpolation (PWI). The first algorithm is in an open loop architecture (Open Loop Quantisation). In this algorithm, the speech residual is represented as a series of prototype waveforms (PWs). The PWs are extracted in both voiced and unvoiced speech, time aligned and quantised and, at the receiver, the excitation is reconstructed by smooth interpolation between them. For low bit rate coding, the PW is decomposed into a slowly evolving waveform (SEW) and a rapidly evolving waveform (REW). The SEW is coded using vector quantisation on both magnitude and phase spectra. The SEW codebook search is based on the best matching of the SEW and the SEW codebook vector. The REW phase spectra is not quantised, but it is recovered using Gaussian noise. The REW magnitude spectra, on the other hand, can be either quantised with a certain update rate or only derived according to SEW behaviours

    Comparison of CELP speech coder with a wavelet method

    Get PDF
    This thesis compares the speech quality of Code Excited Linear Predictor (CELP, Federal Standard 1016) speech coder with a new wavelet method to compress speech. The performances of both are compared by performing subjective listening tests. The test signals used are clean signals (i.e. with no background noise), speech signals with room noise and speech signals with artificial noise added. Results indicate that for clean signals and signals with predominantly voiced components the CELP standard performs better than the wavelet method but for signals with room noise the wavelet method performs much better than the CELP. For signals with artificial noise added, the results are mixed depending on the level of artificial noise added with CELP performing better for low level noise added signals and the wavelet method performing better for higher noise levels

    A variable rate speech compressor for mobile applications

    Get PDF
    One of the most promising speech coder at the bit rate of 9.6 to 4.8 kbits/s is CELP. Code Excited Linear Prediction (CELP) has been dominating 9.6 to 4.8 kbits/s region during the past 3 to 4 years. Its set back however, is its expensive implementation. As an alternative to CELP, the Base-Band CELP (CELP-BB) was developed which produced good quality speech comparable to CELP and a single chip implementable complexity as reported previously. Its robustness was also improved to tolerate errors up to 1.0 pct. and maintain intelligibility up to 5.0 pct. and more. Although, CELP-BB produces good quality speech at around 4.8 kbits/s, it has a fundamental problem when updating the pitch filter memory. A sub-optimal solution is proposed for this problem. Below 4.8 kbits/s, however, CELP-BB suffers from noticeable quantization noise as a result of the large vector dimensions used. Efficient representation of speech below 4.8 kbits/s is reported by introducing Sinusoidal Transform Coding (STC) to represent the LPC excitation which is called Sine Wave Excited LPC (SWELP). In this case, natural sounding good quality synthetic speech is obtained at around 2.4 kbits/s

    Vector Sum Excited Linear Prediction (VSELP) speech coding at 4.8 kbps

    Get PDF
    Code Excited Linear Prediction (CELP) speech coders exhibit good performance at data rates as low as 4800 bps. The major drawback to CELP type coders is their larger computational requirements. The Vector Sum Excited Linear Prediction (VSELP) speech coder utilizes a codebook with a structure which allows for a very efficient search procedure. Other advantages of the VSELP codebook structure is discussed and a detailed description of a 4.8 kbps VSELP coder is given. This coder is an improved version of the VSELP algorithm, which finished first in the NSA's evaluation of the 4.8 kbps speech coders. The coder uses a subsample resolution single tap long term predictor, a single VSELP excitation codebook, a novel gain quantizer which is robust to channel errors, and a new adaptive pre/postfilter arrangement

    Burst Packet Loss Concealment Using Multiple Codebooks and Comfort Noise for CELP-Type Speech Coders in Wireless Sensor Networks

    Get PDF
    In this paper, a packet loss concealment (PLC) algorithm for CELP-type speech coders is proposed in order to improve the quality of decoded speech under burst packet loss conditions in a wireless sensor network. Conventional receiver-based PLC algorithms in the G.729 speech codec are usually based on speech correlation to reconstruct the decoded speech of lost frames by using parameter information obtained from the previous correctly received frames. However, this approach has difficulty in reconstructing voice onset signals since the parameters such as pitch, linear predictive coding coefficient, and adaptive/fixed codebooks of the previous frames are mostly related to silence frames. Thus, in order to reconstruct speech signals in the voice onset intervals, we propose a multiple codebook-based approach that includes a traditional adaptive codebook and a new random codebook composed of comfort noise. The proposed PLC algorithm is designed as a PLC algorithm for G.729 and its performance is then compared with that of the PLC algorithm currently employed in G.729 via a perceptual evaluation of speech quality, a waveform comparison, and a preference test under different random and burst packet loss conditions. It is shown from the experiments that the proposed PLC algorithm provides significantly better speech quality than the PLC algorithm employed in G.729 under all the test conditions

    Hybrid techniques for speech coding

    Get PDF

    An investigation into glottal waveform based speech coding

    Get PDF
    Coding of voiced speech by extraction of the glottal waveform has shown promise in improving the efficiency of speech coding systems. This thesis describes an investigation into the performance of such a system. The effect of reverberation on the radiation impedance at the lips is shown to be negligible under normal conditions. Also, the accuracy of the Image Method for adding artificial reverberation to anechoic speech recordings is established. A new algorithm, Pre-emphasised Maximum Likelihood Epoch Detection (PMLED), for Glottal Closure Instant detection is proposed. The algorithm is tested on natural speech and is shown to be both accurate and robust. Two techniques for giottai waveform estimation, Closed Phase Inverse Filtering (CPIF) and Iterative Adaptive Inverse Filtering (IAIF), are compared. In tandem with an LF model fitting procedure, both techniques display a high degree of accuracy However, IAIF is found to be slightly more robust. Based on these results, a Glottal Excited Linear Predictive (GELP) coding system for voiced speech is proposed and tested. Using a differential LF parameter quantisation scheme, the system achieves speech quality similar to that of U S Federal Standard 1016 CELP at a lower mean bit rate while incurring no extra delay

    Advanced signal processing techniques for pitch synchronous sinusoidal speech coders

    Get PDF
    Recent trends in commercial and consumer demand have led to the increasing use of multimedia applications in mobile and Internet telephony. Although audio, video and data communications are becoming more prevalent, a major application is and will remain the transmission of speech. Speech coding techniques suited to these new trends must be developed, not only to provide high quality speech communication but also to minimise the required bandwidth for speech, so as to maximise that available for the new audio, video and data services. The majority of current speech coders employed in mobile and Internet applications employ a Code Excited Linear Prediction (CELP) model. These coders attempt to reproduce the input speech signal and can produce high quality synthetic speech at bit rates above 8 kbps. Sinusoidal speech coders tend to dominate at rates below 6 kbps but due to limitations in the sinusoidal speech coding model, their synthetic speech quality cannot be significantly improved even if their bit rate is increased. Recent developments have seen the emergence and application of Pitch Synchronous (PS) speech coding techniques to these coders in order to remove the limitations of the sinusoidal speech coding model. The aim of the research presented in this thesis is to investigate and eliminate the factors that limit the quality of the synthetic speech produced by PS sinusoidal coders. In order to achieve this innovative signal processing techniques have been developed. New parameter analysis and quantisation techniques have been produced which overcome many of the problems associated with applying PS techniques to sinusoidal coders. In sinusoidal based coders, two of the most important elements are the correct formulation of pitch and voicing values from the' input speech. The techniques introduced here have greatly improved these calculations resulting in a higher quality PS sinusoidal speech coder than was previously available. A new quantisation method which is able to reduce the distortion from quantising speech spectral information has also been developed. When these new techniques are utilised they effectively raise the synthetic speech quality of sinusoidal coders to a level comparable to that produced by CELP based schemes, making PS sinusoidal coders a promising alternative at low to medium bit rates.EThOS - Electronic Theses Online ServiceGBUnited Kingdo

    Frequency-warped autoregressive modeling and filtering

    Get PDF
    This thesis consists of an introduction and nine articles. The articles are related to the application of frequency-warping techniques to audio signal processing, and in particular, predictive coding of wideband audio signals. The introduction reviews the literature and summarizes the results of the articles. Frequency-warping, or simply warping techniques are based on a modification of a conventional signal processing system so that the inherent frequency representation in the system is changed. It is demonstrated that this may be done for basically all traditional signal processing algorithms. In audio applications it is beneficial to modify the system so that the new frequency representation is close to that of human hearing. One of the articles is a tutorial paper on the use of warping techniques in audio applications. Majority of the articles studies warped linear prediction, WLP, and its use in wideband audio coding. It is proposed that warped linear prediction would be particularly attractive method for low-delay wideband audio coding. Warping techniques are also applied to various modifications of classical linear predictive coding techniques. This was made possible partly by the introduction of a class of new implementation techniques for recursive filters in one of the articles. The proposed implementation algorithm for recursive filters having delay-free loops is a generic technique. This inspired to write an article which introduces a generalized warped linear predictive coding scheme. One example of the generalized approach is a linear predictive algorithm using almost logarithmic frequency representation.reviewe

    A multiband excited waveform-interpolated 2.35-kbps speech codec for bandlimited channels

    Full text link
    corecore