232 research outputs found

    Inverse Filtering Techniques in Speech Analysis

    Get PDF
    This paper reviews certain speech analytical techniques to which the label 'inverse filtering' has been applied. The unifying features of these techniques are presented, namely:1. a basis in the source-filter theory of speech production,2. the use of a network whose transfer function is the inverse of the transfer function of one or a combination of the articulatory system filters to modify the speech wave either in the time domain or in the frequency domain.However their differences, which lie in the particular system filter being inverted and in the manner of realisation. provide a basis for the classification adopted in the paper which is as follows: (1) inverse vocal tract analogue filtering. (2) inverse vocal tract digital filtering. (3) direct inverse glottal filtering. (4) linear predictive coding. An assessment of the comparative usefulness of inverse-filtering in contemporary speech studies is given

    Time-Varying Modeling of Glottal Source and Vocal Tract and Sequential Bayesian Estimation of Model Parameters for Speech Synthesis

    Get PDF
    abstract: Speech is generated by articulators acting on a phonatory source. Identification of this phonatory source and articulatory geometry are individually challenging and ill-posed problems, called speech separation and articulatory inversion, respectively. There exists a trade-off between decomposition and recovered articulatory geometry due to multiple possible mappings between an articulatory configuration and the speech produced. However, if measurements are obtained only from a microphone sensor, they lack any invasive insight and add additional challenge to an already difficult problem. A joint non-invasive estimation strategy that couples articulatory and phonatory knowledge would lead to better articulatory speech synthesis. In this thesis, a joint estimation strategy for speech separation and articulatory geometry recovery is studied. Unlike previous periodic/aperiodic decomposition methods that use stationary speech models within a frame, the proposed model presents a non-stationary speech decomposition method. A parametric glottal source model and an articulatory vocal tract response are represented in a dynamic state space formulation. The unknown parameters of the speech generation components are estimated using sequential Monte Carlo methods under some specific assumptions. The proposed approach is compared with other glottal inverse filtering methods, including iterative adaptive inverse filtering, state-space inverse filtering, and the quasi-closed phase method.Dissertation/ThesisMasters Thesis Electrical Engineering 201

    Models and Analysis of Vocal Emissions for Biomedical Applications

    Get PDF
    The Models and Analysis of Vocal Emissions with Biomedical Applications (MAVEBA) workshop came into being in 1999 from the particularly felt need of sharing know-how, objectives and results between areas that until then seemed quite distinct such as bioengineering, medicine and singing. MAVEBA deals with all aspects concerning the study of the human voice with applications ranging from the neonate to the adult and elderly. Over the years the initial issues have grown and spread also in other aspects of research such as occupational voice disorders, neurology, rehabilitation, image and video analysis. MAVEBA takes place every two years always in Firenze, Italy

    Joint estimation of vocal tract and source parameters of a speech production model

    Get PDF
    This thesis describes algorithms developed to jointly estimate vocal tract shapes and source signals from real speech. The methodology was developed and evaluated using simple articulatory models of the vocal tract, coupled with lumped parametric models of the loss mechanisms in the tract. The vocal tract is modelled by a five parameter area function model [Lm, 1990] Energy losses due to wall vibration and glottal resistance are modelled as a pole- zero filter placed at the glottis. A model described in [Lame, 1982] is used to approximate the lip radiation characteristic. An articulatory-to-acoustic "linked codebook" of approximately 1600 shapes is generated and exhaustively searched to estimate the vocal tract parameters. Glottal waveforms (input signals) are obtained by inverse filtering real speech using the estimated vocal tract parameters. The inverse filter is constructed using the estimated area function. A new method is proposed to fit the Liljencrants - Fant glottal flow model [Fant, Liljencrants and Lm, 1985] to the inverse filtered signals Estimates of the parameters are found from both the inverse filtered signal and its derivative. The descnbed model successfully estimates articulatory parameters for artificial speech waveforms. Tests on recorded vowels suggest that the technique is applicable to real speech. The technique has applications in the development of natural sounding speech synthesis, the treatment of speech disorders and the reduction of data bit rates in speech codin

    Models and analysis of vocal emissions for biomedical applications: 5th International Workshop: December 13-15, 2007, Firenze, Italy

    Get PDF
    The MAVEBA Workshop proceedings, held on a biannual basis, collect the scientific papers presented both as oral and poster contributions, during the conference. The main subjects are: development of theoretical and mechanical models as an aid to the study of main phonatory dysfunctions, as well as the biomedical engineering methods for the analysis of voice signals and images, as a support to clinical diagnosis and classification of vocal pathologies. The Workshop has the sponsorship of: Ente Cassa Risparmio di Firenze, COST Action 2103, Biomedical Signal Processing and Control Journal (Elsevier Eds.), IEEE Biomedical Engineering Soc. Special Issues of International Journals have been, and will be, published, collecting selected papers from the conference

    Vowel coding using an articulatory model

    Get PDF
    An investigation into articulatory vocoding for vowels, as a means of achieving high quality coding at low bit rates, is carried out in this thesis. Methods of estimating the vocal tract transfer function from the speech wave are compared, and an algorithm for closed glottis interval (CGI) analysis is developed. CGI analysis is chosen over autocorrelation based inverse filtering methods. Various distortion measures for use in Vector Quantization are evaluated, and a new covariance distortion measure is proposed. It is shown that this measure yields close matches from an acoustic codebook. An articulatory coding system is designed, including a linked codebook of articulatory shapes, based on synthetic speech. A method of generating a similar codebook from real speech is proposed, and an investigation into estimating articulatory parameters from the speech wave is carried out to this end

    Multi-parametric source-filter separation of speech and prosodic voice restoration

    Get PDF
    In this thesis, methods and models are developed and presented aiming at the estimation, restoration and transformation of the characteristics of human speech. During a first period of the thesis, a concept was developed that allows restoring prosodic voice features and reconstruct more natural sounding speech from pathological voices using a multi-resolution approach. Inspired from observations with respect to this approach, the necessity of a novel method for the separation of speech into voice source and articulation components emerged in order to improve the perceptive quality of the restored speech signal. This work subsequently represents the main part of this work and therefore is presented first in this thesis. The proposed method is evaluated on synthetic, physically modelled, healthy and pathological speech. A robust, separate representation of source and filter characteristics has applications in areas that go far beyond the reconstruction of alaryngeal speech. It is potentially useful for efficient speech coding, voice biometrics, emotional speech synthesis, remote and/or non-invasive voice disorder diagnosis, etc. A key aspect of the voice restoration method is the reliable separation of the speech signal into voice source and articulation for it is mostly the voice source that requires replacement or enhancement in alaryngeal speech. Observations during the evaluation of above method highlighted that this separation is insufficient with currently known methods. Therefore, the main part of this thesis is concerned with the modelling of voice and vocal tract and the estimation of the respective model parameters. Most methods for joint source filter estimation known today represent a compromise between model complexity, estimation feasibility and estimation efficiency. Typically, single-parametric models are used to represent the source for the sake of tractable optimization or multi-parametric models are estimated using inefficient grid searches over the entire parameter space. The novel method presented in this work proposes advances in the direction of efficiently estimating and fitting multi-parametric source and filter models to healthy and pathological speech signals, resulting in a more reliable estimation of voice source and especially vocal tract coefficients. In particular, the proposed method is exhibits a largely reduced bias in the estimated formant frequencies and bandwidths over a large variety of experimental conditions such as environmental noise, glottal jitter, fundamental frequency, voice types and glottal noise. The methods appears to be especially robust to environmental noise and improves the separation of deterministic voice source components from the articulation. Alaryngeal speakers often have great difficulty at producing intelligible, not to mention prosodic, speech. Despite great efforts and advances in surgical and rehabilitative techniques, currently known methods, devices and modes of speech rehabilitation leave pathological speakers with a lack in the ability to control key aspects of their voice. The proposed multiresolution approach presented at the end of this thesis provides alaryngeal speakers an intuitive manner to increase prosodic features in their speech by reconstructing a more intelligible, more natural and more prosodic voice. The proposed method is entirely non-invasive. Key prosodic cues are reconstructed and enhanced at different temporal scales by inducing additional volatility estimated from other, still intact, speech features. The restored voice source is thus controllable in an intuitive way by the alaryngeal speaker. Despite the above mentioned advantages there is also a weak point of the proposed joint source-filter estimation method to be mentioned. The proposed method exhibits a susceptibility to modelling errors of the glottal source. On the other hand, the proposed estimation framework appears to be well suited for future research on exactly this topic. A logical continuation of this work is the leverage the efficiency and reliability of the proposed method for the development of new, more accurate glottal source models

    Estimation of vocal tract shape trajectory using lossy Kelly-Lochbaum model

    Get PDF
    On esitetty teorioita, joiden mukaan puheen ymmärtämistä helpottaa aikaisempi kokemus puheen tuottamisesta. Muuntamalla akustinen puhesignaali hypoteesiksi puhujan artikulaatioeleistä voidaan saavuttaa puhujariippumattomampi ja äänteitä paremmin erotteleva kuvaus puheesta. Tämä työ esittelee metodin, jolla ääntöväylän liikeratoja voidaan arvioida suoraan puhesignaaleista. Tässä työssä luodaan Kelly-Lochbaum-tyyppinen ääntöväylämalli käyttäen apuna puheentuottamisen teoriaa. Malli on varustettu huulisäteilyllä ja säädettävällä huulten pituudella. Mallia käyttäen luodaan hakutaulukko, joka kuvaa vastaavuuksia puheen hetkellisten spektriominaisuuksien ja artikulatoristen muotojen välillä. Hakutaulukkoa voidaan käyttää mappaukseen akustisen ja artikulatorisen avaruuden välillä. Luotua mallia käytetään ääntöväylän liikeratojen arvioinnissa jatkuvan puheen aikana. Liikeradat löydetään käyttämällä yksinkertaista optimointialgoritmia, joka estimoi liikeradan minimoimalla artikulaatioon kuluvaa energiaa.There are theories that during speech perception, the understanding of speech is boosted by the knowledge of the articulatory gestures based on former speech production experience. By transforming an acoustic speech signal into a hypothesis about the articulatory gestures of the speaker, it is possible to obtain a more accurate, speaker-independent description of speech. This thesis introduces a method of estimating vocal tract trajectories from speech signals. Using the theory of speech production, a lossy Kelly-Lochbaum vocal tract model equipped with lip radiation impedance and variable lip rounding length is created. A lookup table consisting of correspondences between spectral qualities of instantaneous speech signals and articulatory shapes is created using this model. The lookup table can be used to perform acoustic-to-articulatory mapping. The obtained model is used in estimation of vocal tract shape trajectories in continuous speech. Smooth and minimum energy trajectories are found by using a simple optimization algorithm
    corecore