2 research outputs found

    Estimation of glottal source waveforms and vocal tract shapes from speech signals based on ARX-LF model

    A widely used method for estimating the glottal source waveform and vocal tract shape is to process the speech signal with an inverse filter and then fit the residual signal with a glottal source model. However, source-tract interactions reduce the estimation accuracy. In this paper, we propose a method that estimates the glottal source waveform and vocal tract shape simultaneously, based on an analysis-by-synthesis approach with a source-filter model constructed from an auto-regressive eXogenous (ARX) model combined with the Liljencrants-Fant (LF) model. Since optimizing multiple parameters makes simultaneous estimation difficult, the method proceeds in two steps: the glottal source parameters are first initialized using the inverse filter method, and then accurate parameters of the glottal source and the vocal tract shape are estimated simultaneously using the analysis-by-synthesis approach. Experimental results with synthetic and real speech signals showed that the proposed method achieves higher estimation accuracy than the inverse filter

    Simultaneous Estimation of Glottal Source Waveforms and Vocal Tract Shapes from Speech Signals Based on ARX-LF Model

    Estimating glottal source waveforms and vocal tract shapes is typically done by processing the speech signal using an inverse filter and then fitting the residual signal using the glottal source model. However, due to source-tract interactions, the estimation accuracy is reduced. In this paper, we propose a method to estimate glottal source waveforms and vocal tract shapes simultaneously based on an analysis-by-synthesis approach with a source-filter model constructed of an Auto-Regressive eXogenous (ARX) model and the Liljencrants-Fant (LF) model. Since the optimization of multiple parameters makes simultaneous estimation difficult, we first initialize the glottal source parameters using the inverse filter method, and then simultaneously estimate the accurate parameters of the glottal sources and the vocal tract shapes using an analysis-by-synthesis approach. Experimental results with synthetic and real speech signals showed that the proposed method had higher estimation accuracy than using the inverse filter
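
The joint estimation idea in these abstracts can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the ARX model is taken in the common form s(n) = -sum_k a_k s(n-k) + b0 u(n) + e(n); a simple raised-cosine pulse stands in for the LF model; a grid search over a single hypothetical source parameter (`open_quotient`) replaces both the inverse-filter initialization and the optimizer; and the pitch period is assumed known. All function names are illustrative.

```python
import numpy as np

def glottal_pulse(n_samples, open_quotient):
    """One period of a simplified glottal flow derivative (stand-in for LF)."""
    n_open = max(2, int(round(open_quotient * n_samples)))
    t = np.arange(n_open) / n_open
    flow = np.zeros(n_samples)
    flow[:n_open] = 0.5 * (1.0 - np.cos(2.0 * np.pi * t))  # raised-cosine flow
    return np.diff(flow, prepend=0.0)  # flow derivative drives the filter

def synthesize(source, a, b0=1.0):
    """All-pole synthesis: s[n] = -sum_k a[k-1] * s[n-k] + b0 * source[n]."""
    s = np.zeros(len(source))
    for n in range(len(source)):
        acc = b0 * source[n]
        for k in range(1, len(a) + 1):
            if n - k >= 0:
                acc -= a[k - 1] * s[n - k]
        s[n] = acc
    return s

def fit_arx(speech, source, order):
    """Least-squares fit of the ARX coefficients for a fixed candidate source."""
    n = np.arange(order, len(speech))
    past = np.column_stack([-speech[n - k] for k in range(1, order + 1)])
    X = np.column_stack([past, source[n]])
    theta, *_ = np.linalg.lstsq(X, speech[n], rcond=None)
    err = speech[n] - X @ theta  # equation error e(n)
    return theta[:order], theta[order], float(np.sum(err ** 2))

def analysis_by_synthesis(speech, period, order, candidates):
    """Try each candidate source, fit the filter, keep the lowest-error pair."""
    n_periods = len(speech) // period
    best = None
    for oq in candidates:
        source = np.tile(glottal_pulse(period, oq), n_periods)
        a, b0, cost = fit_arx(speech[:len(source)], source, order)
        if best is None or cost < best[0]:
            best = (cost, float(oq), a, b0)
    return best  # (cost, open_quotient, AR coefficients, input gain)
```

Because the filter coefficients enter the model linearly once the source is fixed, each candidate source needs only one least-squares solve; the search itself runs over the source parameters alone. Calling `analysis_by_synthesis(speech, period, order, candidates)` on a voiced frame returns the source parameter and filter coefficients that jointly minimize the equation error.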