7 research outputs found

    LF model based glottal source parameter estimation by extended Kalman filtering

    A new algorithm for glottal source parameter estimation of voiced speech, based on the Liljencrants-Fant (LF) model, is presented in this work. Each pitch period of the inverse-filtered glottal flow derivative is divided into two phases at the glottal closing instant, and an extended Kalman filter is applied iteratively to estimate the shape-controlling parameters of each phase. By searching for the minimal mean square error between the reconstructed LF pulse and the original signal, an optimal set of estimates is obtained. Preliminary experimental results show that the proposed algorithm is effective across a wide range of LF parameters, voice qualities, and noise levels; its accuracy, especially for the return-phase parameters, compares favorably with standard time-domain fitting methods while requiring a significantly lower computational load.
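The estimation loop described in this abstract can be sketched as an extended Kalman filter whose state is the (static) parameter vector of a simplified opening-phase LF pulse, run repeatedly over one pitch period. The model form, all parameter values, and the noise settings below are hypothetical illustrations, not the paper's exact formulation:

```python
import numpy as np

# Simplified opening-phase LF flow derivative: E0 * exp(a*t) * sin(wg*t).
# Parameter values below are hypothetical, chosen only to exercise the loop.
def lf_opening(t, E0, a, wg):
    return E0 * np.exp(a * t) * np.sin(wg * t)

fs = 8000.0
t = np.arange(0, 0.005, 1.0 / fs)                      # one opening phase
true_theta = np.array([1.0, 300.0, 2 * np.pi * 200.0])  # E0, a, wg
rng = np.random.default_rng(0)
z = lf_opening(t, *true_theta) + 0.01 * rng.normal(size=t.size)

theta = np.array([0.8, 200.0, 2 * np.pi * 150.0])      # initial guess
P = np.diag([1.0, 1e4, 1e5])                           # parameter covariance
Q = np.diag([1e-6, 1e-2, 1e-1])                        # small process noise
R = 1e-4                                               # measurement noise var

def jacobian(tk, th, eps=1e-6):
    # Numerical Jacobian of the measurement model w.r.t. the parameters
    base = lf_opening(tk, *th)
    J = np.zeros(3)
    for i in range(3):
        d = th.copy()
        step = eps * max(1.0, abs(th[i]))
        d[i] += step
        J[i] = (lf_opening(tk, *d) - base) / step
    return J

# Apply the EKF iteratively over the same pitch period (several passes)
for _ in range(3):
    for tk, zk in zip(t, z):
        P = P + Q                                # predict: parameters static
        H = jacobian(tk, theta)
        S = H @ P @ H + R                        # innovation variance
        K = P @ H / S                            # Kalman gain
        theta = theta + K * (zk - lf_opening(tk, *theta))
        P = P - np.outer(K, H @ P)               # covariance update

mse = np.mean((lf_opening(t, *theta) - z) ** 2)  # fit quality for selection
```

In the paper's scheme, this fit would be repeated per phase and the parameter set with the minimal mean square error against the original signal retained.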

    HMM-Based Speech Synthesis Utilizing Glottal Inverse Filtering


    Analysis of glottal pulses

    This work addresses the estimation of glottal pulses from recorded speech. It describes the speech production process, the instruments used to measure glottal pulses, and an overview of software tools for estimating glottal pulses from the speech signal, and it covers the IAIF method and Sahoo's method for glottal pulse estimation. A graphical user interface (GUI) was created in MATLAB to make these methods easier to use.
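The IAIF (iterative adaptive inverse filtering) method mentioned above alternates between estimating the vocal tract and the glottal contribution with linear prediction, inverse filtering one out to refine the other. A heavily simplified two-stage sketch (orders, the leak coefficient, and the toy signal are all assumptions; the thesis uses the full IAIF pipeline in MATLAB):

```python
import numpy as np

def lpc(x, order):
    # Autocorrelation-method LPC via the Yule-Walker normal equations
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R + 1e-9 * np.eye(order), r[1:order + 1])
    return np.concatenate(([1.0], -a))      # prediction-error filter

def inverse_filter(x, a):
    # FIR prediction-error filtering removes the modeled resonances
    return np.convolve(x, a)[:len(x)]

def integrate(x, leak=0.99):
    # Leaky integrator approximating cancellation of lip radiation
    y = np.zeros_like(x)
    for n in range(1, len(x)):
        y[n] = x[n] + leak * y[n - 1]
    return y

def iaif(speech, vt_order=16, glottis_order=2):
    # Simplified two-stage IAIF (a sketch, not the full published pipeline)
    g1 = lpc(speech, 1)                     # 1st-order glottal pre-estimate
    s1 = inverse_filter(speech, g1)
    vt1 = lpc(s1, vt_order)                 # preliminary vocal tract model
    g_wave = integrate(inverse_filter(speech, vt1))
    g2 = lpc(g_wave, glottis_order)         # refined glottal model
    s2 = integrate(inverse_filter(speech, g2))
    vt2 = lpc(s2, vt_order)                 # final vocal tract model
    return integrate(inverse_filter(speech, vt2))  # estimated glottal flow

# Toy usage: an impulse train through a single resonance stands in for speech
n = np.arange(400)
excitation = (n % 80 == 0).astype(float)
pole = 0.95 * np.exp(1j * 2 * np.pi * 700 / 8000.0)
speech = np.zeros(400)
for k in range(400):
    speech[k] = (excitation[k]
                 + (2 * pole.real * speech[k - 1] if k > 0 else 0.0)
                 - (abs(pole) ** 2 * speech[k - 2] if k > 1 else 0.0))
glottal_flow = iaif(speech)
```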

    Observations on the dynamic control of an articulatory synthesizer using speech production data

    This dissertation explores the automatic generation of gestural-score-based control structures for a three-dimensional articulatory speech synthesizer. The gestural scores are optimized in an articulatory resynthesis paradigm using a dynamic programming algorithm and a cost function that measures the deviation from a gold standard in the form of natural speech production data. This data had been recorded using electromagnetic articulography from the same speaker to whom the synthesizer's vocal tract model had previously been adapted. Future work to create an English voice for the synthesizer and integrate it into a text-to-speech platform is outlined.
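The optimization step above can be illustrated with a toy dynamic program over one articulatory dimension: a piecewise-constant gold-standard trajectory stands in for the EMA data, and the cost combines deviation from the gold standard with a penalty on transitions between consecutive gestural targets. The segment boundaries, target grid, and weighting are all hypothetical; the dissertation's actual gestural scores and cost function are far richer:

```python
import numpy as np

# Gold-standard trajectory (stand-in for EMA data): three steady segments
gold = np.concatenate([np.full(20, 0.2), np.full(20, 0.8), np.full(20, 0.5)])

targets = np.linspace(0.0, 1.0, 11)        # candidate gestural target values
boundaries = [(0, 20), (20, 40), (40, 60)]  # segment boundaries (assumed known)
LAMBDA = 0.5                                # transition-penalty weight (toy)

n_seg, n_tgt = len(boundaries), len(targets)
cost = np.zeros((n_seg, n_tgt))
back = np.zeros((n_seg, n_tgt), dtype=int)

for s, (lo, hi) in enumerate(boundaries):
    # Deviation of each candidate target from the gold standard in segment s
    local = ((gold[lo:hi, None] - targets[None, :]) ** 2).sum(axis=0)
    if s == 0:
        cost[s] = local
    else:
        # Penalize large jumps between consecutive targets
        trans = LAMBDA * (targets[:, None] - targets[None, :]) ** 2
        total = cost[s - 1][:, None] + trans   # rows: previous, cols: current
        back[s] = np.argmin(total, axis=0)
        cost[s] = local + np.min(total, axis=0)

# Backtrack the minimum-cost target sequence
seq = [int(np.argmin(cost[-1]))]
for s in range(n_seg - 1, 0, -1):
    seq.append(int(back[s, seq[-1]]))
seq.reverse()
best_targets = targets[np.array(seq)]      # recovers [0.2, 0.8, 0.5]
```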

    Glottal Spectral Separation for Parametric Speech Synthesis

    This paper presents a method to control the characteristics of synthetic speech flexibly by integrating articulatory features into a Hidden Markov Model (HMM) based parametric speech synthesis system. In contrast to model adaptation and interpolation approaches to speaking-style control, this method is driven by phonetic knowledge, and target speech samples are not required. The joint distribution of parallel acoustic and articulatory features, considering cross-stream feature dependency, is estimated. At synthesis time, acoustic and articulatory features are generated simultaneously based on the maximum-likelihood criterion. The synthetic speech can be controlled flexibly by modifying the generated articulatory features according to arbitrary phonetic rules during parameter generation. Our experiments show that the proposed method is effective both in changing the overall character of the synthesized speech and in controlling the quality of a specific vowel.
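As a toy illustration of the last point, suppose the cross-stream dependency between one articulatory feature (tongue height) and one acoustic feature (first formant) is summarized by a linear map; a phonetic rule then shifts the vowel's articulatory trajectory and the acoustic consequence follows. The mapping and every number here are hypothetical; in the actual system the features are regenerated jointly under the maximum-likelihood criterion:

```python
import numpy as np

# Hypothetical linear stand-in for the cross-stream dependency:
# first formant F1 (Hz) falls as tongue height rises.
W, b = -800.0, 900.0

tongue_height = np.full(50, 0.4)       # generated trajectory for one vowel
f1_before = W * tongue_height + b      # about 580 Hz

# Phonetic rule applied during parameter generation:
# raise the vowel's tongue height to shift its quality.
modified = tongue_height + 0.1
f1_after = W * modified + b            # about 500 Hz
```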