3 research outputs found

    Inversion from Audiovisual Speech to Articulatory Information by Exploiting Multimodal Data

    Get PDF
    We present an inversion framework to identify speech production properties from audiovisual information. Our system is built on a multimodal articulatory dataset comprising ultrasound, X-ray, and magnetic resonance images as well as audio and stereovisual recordings of the speaker. Visual information is captured via stereovision, while the vocal tract state is represented by a properly trained articulatory model. Inversion is based on an adaptive piecewise linear approximation of the audiovisual-to-articulation mapping. The presented system can recover the hidden vocal tract shapes and may serve as a basis for a more widely applicable inversion setup.
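
    A minimal sketch of one way such an adaptive piecewise linear inversion could be organized: the audiovisual feature space is partitioned into regions, and a separate affine map to the articulatory parameters is fitted per region. The clustering step, the feature dimensions, and the `PiecewiseLinearInverter` class are illustrative assumptions, not the authors' actual system.

```python
# Sketch: piecewise linear audiovisual-to-articulatory inversion.
# Assumptions: regions are found by k-means over audiovisual features,
# and each region gets its own least-squares affine map.
import numpy as np
from sklearn.cluster import KMeans


class PiecewiseLinearInverter:
    def __init__(self, n_pieces=8):
        self.n_pieces = n_pieces

    def fit(self, av_feats, artic_params):
        """av_feats: (N, D_av) audiovisual features; artic_params: (N, D_art) targets."""
        self.kmeans = KMeans(n_clusters=self.n_pieces, n_init=10).fit(av_feats)
        labels = self.kmeans.labels_
        self.maps = []
        for k in range(self.n_pieces):
            X = av_feats[labels == k]
            Y = artic_params[labels == k]
            X1 = np.hstack([X, np.ones((len(X), 1))])   # affine (bias) term
            W, *_ = np.linalg.lstsq(X1, Y, rcond=None)  # local linear map
            self.maps.append(W)
        return self

    def invert(self, av_feats):
        """Map audiovisual features to articulatory parameters, piece by piece."""
        labels = self.kmeans.predict(av_feats)
        X1 = np.hstack([av_feats, np.ones((len(av_feats), 1))])
        return np.vstack([X1[i] @ self.maps[k] for i, k in enumerate(labels)])
```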

    Kinematic formant-to-area mapping

    No full text
    This article presents a method of formant-to-area mapping consisting of the direct calculation of the time derivatives of the cross-sections and length of a vocal tract model so that the time derivatives of the observed formant frequencies and the model's eigenfrequencies match. The vocal tract model is a concatenation of uniform tubelets whose cross-section areas and lengths can vary in time. Time derivatives of the tubelet parameters are obtained by solving a linear algebraic system of equations. The derivatives are then numerically integrated to arrive at cross-section and length movements. Since more than one area function is compatible with the observed formant frequencies, pseudo-energy constraints are used to determine a unique solution. The results show that the formant-matched movements of the tubelet cross-sections and lengths are smooth, and that the agreement between the observed and model-generated formant frequencies is better than 0.01 Hz.