63 research outputs found

    Low Delay Sparse and Mixed Excitation CELP Coders for Wideband Speech Coding

    Code Excited Linear Prediction (CELP) algorithms are proposed for compression of speech in the 8 kHz band at switched or variable bit rate and with algorithmic delay not exceeding 2 msec. Two structures of Low-Delay CELP coders are analyzed: low-delay sparse excitation and mixed excitation CELP. Sparse excitation is based on MP-MLQ and multilayer models. The mixed excitation CELP algorithm stems from the narrowband G.728 standard. As opposed to the G.728 LD-CELP coder, the mixed excitation codebook consists of pseudorandom vectors and sequences obtained with Long-Term Prediction (LTP). Variable rate coding consists in maximizing vector dimension while keeping the required speech quality. Good speech quality (MOS=3.9 according to the PESQ algorithm) is obtained at an average bit rate of 33.5 kbit/sec.

    Sparsity in Linear Predictive Coding of Speech

    nrpages: 197; status: published

    Structure-Constrained Basis Pursuit for Compressively Sensing Speech

    Compressed Sensing (CS) exploits the sparsity of many signals to enable sampling below the Nyquist rate. If the original signal is sufficiently sparse, the Basis Pursuit (BP) algorithm will perfectly reconstruct the original signal. Unfortunately, many signals that intuitively appear sparse do not meet the threshold for sufficient sparsity. These signals require so many CS samples for accurate reconstruction that the advantages of CS disappear. This is because Basis Pursuit and Basis Pursuit Denoising model only sparsity. We developed a Structure-Constrained Basis Pursuit (SCBP) that models the structure of somewhat sparse signals as upper and lower bound constraints on the Basis Pursuit Denoising solution. We applied it to speech, which seems sparse but does not compress well with CS, and gained improved quality over Basis Pursuit Denoising. When a single parameter (i.e. the phone) is encoded, Normalized Mean Squared Error (NMSE) decreases by between 16.2% and 1.00% when sampling with CS between 1/10 and 1/2 the Nyquist rate, respectively. When bounds are coded as a sum of Gaussians, NMSE decreases by between 28.5% and 21.6% in the same range. SCBP can be applied to any somewhat sparse signal with a predictable structure to enable improved reconstruction quality with the same number of samples.
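    The NMSE figures quoted in the abstract above can be made concrete with a minimal sketch. It assumes one common definition of NMSE (squared reconstruction error divided by the energy of the reference signal); the abstract does not state which normalization the authors used, and the function names `nmse` and `relative_decrease_percent` are hypothetical.

```python
import numpy as np


def nmse(reference, estimate):
    """Normalized mean squared error: ||x - x_hat||^2 / ||x||^2.

    Assumed definition; other normalizations exist.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    error_energy = np.sum((reference - estimate) ** 2)
    signal_energy = np.sum(reference ** 2)
    return float(error_energy / signal_energy)


def relative_decrease_percent(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline
```

    Under this reading, a reported decrease of 16.2% would mean that the NMSE of the constrained reconstruction is 16.2% lower than the NMSE of the Basis Pursuit Denoising reconstruction of the same signal.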

    An investigation into glottal waveform based speech coding

    Coding of voiced speech by extraction of the glottal waveform has shown promise in improving the efficiency of speech coding systems. This thesis describes an investigation into the performance of such a system. The effect of reverberation on the radiation impedance at the lips is shown to be negligible under normal conditions. Also, the accuracy of the Image Method for adding artificial reverberation to anechoic speech recordings is established. A new algorithm, Pre-emphasised Maximum Likelihood Epoch Detection (PMLED), for Glottal Closure Instant detection is proposed. The algorithm is tested on natural speech and is shown to be both accurate and robust. Two techniques for glottal waveform estimation, Closed Phase Inverse Filtering (CPIF) and Iterative Adaptive Inverse Filtering (IAIF), are compared. In tandem with an LF model fitting procedure, both techniques display a high degree of accuracy. However, IAIF is found to be slightly more robust. Based on these results, a Glottal Excited Linear Predictive (GELP) coding system for voiced speech is proposed and tested. Using a differential LF parameter quantisation scheme, the system achieves speech quality similar to that of U.S. Federal Standard 1016 CELP at a lower mean bit rate while incurring no extra delay.

    Comparison of CELP speech coder with a wavelet method

    This thesis compares the speech quality of the Code Excited Linear Predictor (CELP, Federal Standard 1016) speech coder with a new wavelet method for compressing speech. The performance of the two is compared through subjective listening tests. The test signals used are clean signals (i.e. with no background noise), speech signals with room noise, and speech signals with artificial noise added. Results indicate that for clean signals and signals with predominantly voiced components the CELP standard performs better than the wavelet method, but for signals with room noise the wavelet method performs much better than CELP. For signals with artificial noise added, the results are mixed and depend on the noise level: CELP performs better at low noise levels and the wavelet method performs better at higher noise levels.

    Speech coding at medium bit rates using analysis by synthesis techniques


    The development of speech coding and the first standard coder for public mobile telephony

    This thesis describes in its core chapter (Chapter 4) the original algorithmic and design features of the first coder for public mobile telephony, the GSM full-rate speech coder, as standardized in 1988. It has never been described in so much detail as presented here. The coder is put in a historical perspective by two preceding chapters on the history of speech production models and the development of speech coding techniques until the mid 1980s, respectively. In the epilogue a brief review is given of later developments in speech coding. The introductory Chapter 1 starts with some preliminaries. It is defined what speech coding is, and the reader is introduced to speech coding standards and the standardization institutes which set them. Then, the attributes of a speech coder playing a role in standardization are explained. Subsequently, several applications of speech coders, including mobile telephony, are discussed and the state of the art in speech coding is illustrated on the basis of some worldwide recognized standards. Chapter 2 starts with a summary of the features of speech signals and their source, the human speech organ. Then, historical models of speech production which form the basis of different kinds of modern speech coders are discussed. Starting with a review of ancient mechanical models, we arrive at the electrical source-filter model of the 1930s. Subsequently, the acoustic-tube models as they arose in the 1950s and 1960s are discussed. Finally the 1970s are reviewed, which brought the discrete-time filter model on the basis of linear prediction. In a unique way the logical sequencing of these models is exposed, and the links are discussed. Whereas the historical models are discussed in a narrative style, the acoustic-tube models and the linear prediction technique as applied to speech are subject to more mathematical analysis in order to create a sound basis for the treatise of Chapter 4. This trend continues in Chapter 3, whenever instrumental in completing that basis. In Chapter 3 the reader is taken by the hand on a guided tour through time during which successive speech coding methods pass in review. In an original way special attention is paid to the evolutionary aspect. Specifically, for each newly proposed method it is discussed what it added to the known techniques of the time. After presenting the relevant predecessors, starting with Pulse Code Modulation (PCM) and the early vocoders of the 1930s, we arrive at Residual-Excited Linear Predictive (RELP) coders, Analysis-by-Synthesis systems and Regular-Pulse Excitation in 1984. The latter forms the basis of the GSM full-rate coder. In Chapter 4, which constitutes the core of this thesis, explicit forms of Multi-Pulse Excited (MPE) and Regular-Pulse Excited (RPE) analysis-by-synthesis coding systems are developed. Starting from the pulse-amplitude computation methods current in 1984, which included solving sets of equations (typically of order 10-16) two hundred times a second, several explicit-form designs are considered by which solving sets of equations in real time is avoided. Then, the design of a specific explicit-form RPE coder and an associated efficient architecture are described. The explicit forms and the resulting architectural features have never been published in so much detail as presented here. Implementation of such a codec enabled real-time operation on a state-of-the-art single-chip digital signal processor of the time. This coder, at a bit rate of 13 kbit/s, was selected as the Full-Rate GSM standard in 1988. Its performance is recapitulated. Chapter 5 is an epilogue briefly reviewing the major developments in speech coding technology after 1988. Many speech coding standards have been set, for mobile telephony as well as for other applications, since then. The chapter is concluded by an outlook.
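    As a rough illustration of the linear-prediction filter model referred to in the abstract above, the following sketch estimates predictor coefficients from autocorrelation values with the textbook Levinson-Durbin recursion. It is not taken from the thesis and is not the GSM full-rate procedure; the function names and the sign convention (error filter A(z) = 1 + a_1 z^-1 + ... + a_p z^-p) are assumptions made for this example.

```python
import numpy as np


def autocorrelation(x, order):
    """Biased autocorrelation lags r[0..order] of a signal frame."""
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations recursively.

    Returns (a, err): a[0] == 1 and a[1..order] are the coefficients of
    the prediction-error filter; err is the final prediction-error energy.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = float(r[0])
    for i in range(1, order + 1):
        # Correlation between the current prediction error and lag i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient for this stage
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

    The recursion needs on the order of p^2 operations for a predictor of order p, compared with roughly p^3 for a general linear solver, which is one reason it is commonly used for frame-by-frame linear prediction.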