66 research outputs found

    Novel Pitch Detection Algorithm With Application to Speech Coding

    This thesis introduces a novel method for accurate pitch detection and speech segmentation, named the Multi-feature, Autocorrelation (ACR) and Wavelet Technique (MAWT). MAWT combines feature extraction and ACR applied to Linear Predictive Coding (LPC) residuals with a wavelet-based refinement step. MAWT opens the way for a unique approach to modeling: although speech is divided into segments, the success of voicing decisions is not crucial. Experiments demonstrate the superiority of MAWT over existing methods in pitch period detection accuracy and illustrate its advantages for speech segmentation. These advantages are more pronounced for gain-varying and transitional speech, and under noisy conditions.
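As a rough illustration of the autocorrelation (ACR) step mentioned in the abstract above, the following is a minimal sketch of a plain autocorrelation pitch estimator. It is not the MAWT algorithm: it works on the raw frame rather than on an LPC residual, and it has no feature extraction or wavelet refinement. The function name `autocorr_pitch` and the search range `f_min`/`f_max` are assumptions made for this example.

```python
import math

def autocorr_pitch(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate pitch in Hz from the strongest autocorrelation peak.

    Hypothetical helper for illustration; f_min and f_max bound the
    lag search and are assumed values, not taken from the thesis.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    energy = sum(x * x for x in frame)
    if energy == 0.0 or lag_min < 1 or lag_max <= lag_min:
        return None  # silent frame or frame too short to search
    best_lag, best_val = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag, normalised by frame energy.
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        r /= energy
        if r > best_val:
            best_lag, best_val = lag, r
    if best_lag is None:
        return None  # no positive correlation peak found
    return sample_rate / best_lag

# Synthetic check: a 100 Hz sine sampled at 8 kHz (period of 80 samples).
fs = 8000
frame = [math.sin(2.0 * math.pi * 100.0 * n / fs) for n in range(400)]
pitch = autocorr_pitch(frame, fs)
```

On real speech, a raw autocorrelation peak is prone to octave errors and to formant interference, which is one reason methods in this area often operate on the LPC residual instead.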

    Speech coding at medium bit rates using analysis by synthesis techniques


    Sparsity in Linear Predictive Coding of Speech

    nrpages: 197; status: published

    Perceptual models in speech quality assessment and coding

    The ever-increasing demand for good communications/toll quality speech has created a renewed interest in the perceptual impact of rate compression. Two general areas are investigated in this work, namely speech quality assessment and speech coding. In the field of speech quality assessment, a model is developed which simulates the processing stages of the peripheral auditory system. At the output of the model, a "running" auditory spectrum is obtained. This represents the auditory (spectral) equivalent of any acoustic sound such as speech. Auditory spectra from coded speech segments serve as inputs to a second model, which simulates the information centre in the brain that performs the speech quality assessment. [Continues.]

    Novel multiscale methods for nonlinear speech analysis

    This thesis presents exploratory research on the application of a nonlinear multiscale formalism, the Microcanonical Multiscale Formalism (MMF), to the analysis of speech signals. Derived from principles of statistical physics, the MMF allows accurate geometric analysis of the nonlinear dynamics of complex signals. It relies on the estimation of local geometrical parameters, the singularity exponents (SE), which quantify the degree of predictability at each point of the signal domain. When correctly defined and estimated, these exponents provide valuable information about the local dynamics of complex signals and have been used successfully in applications ranging from signal representation to inference and prediction. We show the relevance of the MMF to speech analysis and develop several applications that demonstrate the strength and potential of the formalism. Using the MMF, we introduce: a novel and accurate text-independent phonetic segmentation algorithm; a novel waveform coder; a robust, accurate algorithm for detecting glottal closure instants; a closed-form solution to the problem of sparse linear prediction analysis; and an efficient algorithm for estimating the excitation source signal.

    A robust low bit rate quad-band excitation LSP vocoder.

    by Chiu Kim Ming. Thesis (M.Phil.)--Chinese University of Hong Kong, 1994. Includes bibliographical references (leaves 103-108).

    Contents:
    Chapter 1 --- Introduction
        1.1 --- Speech production
        1.2 --- Low bit rate speech coding
    Chapter 2 --- Speech analysis & synthesis
        2.1 --- Linear prediction of speech signal
        2.2 --- LPC vocoder
            2.2.1 --- Pitch and voiced/unvoiced decision
            2.2.2 --- Spectral envelope representation
        2.3 --- Excitation
            2.3.1 --- Regular pulse excitation and Multipulse excitation
            2.3.2 --- Coded excitation and vector sum excitation
        2.4 --- Multiband excitation
        2.5 --- Multiband excitation vocoder
    Chapter 3 --- Dual-band and Quad-band excitation
        3.1 --- Dual-band excitation
        3.2 --- Quad-band excitation
        3.3 --- Parameters determination
            3.3.1 --- Pitch detection
            3.3.2 --- Voiced/unvoiced pattern generation
        3.4 --- Excitation generation
    Chapter 4 --- A low bit rate Quad-Band Excitation LSP Vocoder
        4.1 --- Architecture of QBELSP vocoder
        4.2 --- Coding of excitation parameters
            4.2.1 --- Coding of pitch value
            4.2.2 --- Coding of voiced/unvoiced pattern
        4.3 --- Spectral envelope estimation and coding
            4.3.1 --- Spectral envelope & the gain value
            4.3.2 --- Line Spectral Pairs (LSP)
            4.3.3 --- Coding of LSP frequencies
            4.3.4 --- Coding of gain value
    Chapter 5 --- Performance evaluation
        5.1 --- Spectral analysis
        5.2 --- Subjective listening test
            5.2.1 --- Mean Opinion Score (MOS)
            5.2.2 --- Diagnostic Rhyme Test (DRT)
    Chapter 6 --- Conclusions and discussions
    References
    Appendix A --- Subroutine of pitch detection
    Appendix B --- Subroutine of voiced/unvoiced decision
    Appendix C --- Subroutine of LPC coefficients calculation using Durbin's recursive method
    Appendix D --- Subroutine of LSP calculation using Chebyshev Polynomials
    Appendix E --- Single syllable word pairs for Diagnostic Rhyme Test
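The contents above list a subroutine for LPC coefficient calculation using Durbin's recursive method. The thesis's own subroutine is not reproduced in this listing, so the following is only a minimal sketch of the textbook Levinson-Durbin recursion. The function name `levinson_durbin` and the sign convention A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p are choices made for this example.

```python
def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r[0..order].

    Returns (a, err): a[0] is 1.0 and a[1..order] are the coefficients of
    the prediction-error filter A(z) = 1 + a[1] z^-1 + ... ; err is the
    final prediction-error energy. Illustrative sketch only.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            raise ValueError("non-positive prediction error; check r")
        # Reflection coefficient for stage i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        # Update coefficients of orders 1..i-1, then append the new one.
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err

# Autocorrelation of a first-order process with r[k] = 0.5 ** k:
# the order-2 solution should need only the first coefficient.
coeffs, residual_energy = levinson_durbin([1.0, 0.5, 0.25], 2)
```

With this sign convention, the predictor for sample x[n] is minus the weighted sum of past samples, so `coeffs[1]` of -0.5 corresponds to predicting x[n] as 0.5 * x[n-1].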

    Speech coding
