
    Using a biomechanical model for tongue tracking in ultrasound images

    We propose in this paper a new method for tongue tracking in ultrasound images which is based on a biomechanical model of the tongue. The deformation is guided both by points tracked at the surface of the tongue and by inner points of the tongue. Possible uncertainties on the tracked points are handled by the algorithm. Experiments show that the method is efficient even in the case of abrupt movements.
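The abstract does not give the biomechanical model itself, but the idea of handling per-point uncertainty can be illustrated with a minimal sketch: weight each tracked surface point by its inverse variance so that unreliable points contribute less to the estimated motion. The function name, the rigid-translation simplification, and all numbers below are hypothetical stand-ins, not the authors' method.

```python
import numpy as np

# Hypothetical sketch: estimate a rigid translation of the tongue surface
# between two ultrasound frames from tracked points, down-weighting
# points with high uncertainty (inverse-variance weighting). The paper's
# biomechanical deformation model is far richer; this only illustrates
# how per-point uncertainties can be handled.

def weighted_displacement(prev_pts, curr_pts, sigmas):
    """Inverse-variance weighted mean displacement between two frames."""
    w = 1.0 / np.asarray(sigmas) ** 2            # weight = 1 / sigma^2
    d = np.asarray(curr_pts) - np.asarray(prev_pts)
    return (w[:, None] * d).sum(axis=0) / w.sum()

prev = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
curr = prev + np.array([0.5, 0.2])               # true shift of all points
curr[2] += np.array([3.0, -1.0])                 # one badly tracked point
disp = weighted_displacement(prev, curr, sigmas=[0.1, 0.1, 2.0])
```

Because the third point carries a large uncertainty, the estimate stays close to the true shift of (0.5, 0.2) despite the outlier.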

    A multilinear tongue model derived from speech related MRI data of the human vocal tract

    We present a multilinear statistical model of the human tongue that captures anatomical and tongue pose related shape variations separately. The model is derived from 3D magnetic resonance imaging data of 11 speakers sustaining speech related vocal tract configurations. The extraction is performed by a minimally supervised method that combines an image segmentation approach with a template fitting technique. Furthermore, it uses image denoising to deal with possibly corrupt data, palate surface information reconstruction to handle palatal tongue contacts, and a bootstrap strategy to refine the obtained shapes. Our evaluation concludes that limiting the degrees of freedom for the anatomical and speech related variations to 5 and 4, respectively, produces a model that can reliably register unknown data while avoiding overfitting effects. Furthermore, we show that it can be used to generate a plausible tongue animation by tracking sparse motion capture data.
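A multilinear model of this kind can be sketched as a Tucker-style factorization: a learned core tensor is contracted with a speaker (anatomy) coefficient vector and a pose coefficient vector to produce a tongue shape. The sketch below uses the degrees of freedom the abstract reports (5 anatomical, 4 speech-related); the core tensor, mesh size, and coefficients are random stand-ins, not the authors' trained model.

```python
import numpy as np

# Hedged sketch of a multilinear (Tucker-style) shape model with the
# degrees of freedom the abstract reports: 5 anatomical modes and
# 4 speech-pose modes. All values are random placeholders.

rng = np.random.default_rng(0)
n_vertices = 300                                  # hypothetical mesh size
core = rng.normal(size=(3 * n_vertices, 5, 4))    # "learned" core tensor

def synthesize(anatomy_w, pose_w):
    """Contract the core with both factor weights: one shape vector out."""
    return np.einsum('vap,a,p->v', core, anatomy_w, pose_w)

anatomy_w = rng.normal(size=5)    # speaker-specific coefficients
pose_w = rng.normal(size=4)       # articulation-specific coefficients
shape = synthesize(anatomy_w, pose_w).reshape(n_vertices, 3)
```

The separation of factors is what lets the model register an unknown speaker (fit `anatomy_w` once) and then animate them (vary only `pose_w`), as the abstract describes for tracking sparse motion capture data.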

    Artimate: an articulatory animation framework for audiovisual speech synthesis

    We present a modular framework for articulatory animation synthesis using speech motion capture data obtained with electromagnetic articulography (EMA). Adapting a skeletal animation approach, the articulatory motion data is applied to a three-dimensional (3D) model of the vocal tract, creating a portable resource that can be integrated in an audiovisual (AV) speech synthesis platform to provide realistic animation of the tongue and teeth for a virtual character. The framework also provides an interface to articulatory animation synthesis, as well as an example application to illustrate its use with a 3D game engine. We rely on cross-platform, open-source software and open standards to provide a lightweight, accessible, and portable workflow.
    Comment: Workshop on Innovation and Applications in Speech Technology (2012)

    Towards Automatic Speech Identification from Vocal Tract Shape Dynamics in Real-time MRI

    Vocal tract configurations play a vital role in generating distinguishable speech sounds, by modulating the airflow and creating different resonant cavities in speech production. They contain abundant information that can be utilized to better understand the underlying speech production mechanism. As a step towards automatic mapping of vocal tract shape geometry to acoustics, this paper employs effective video action recognition techniques, like Long-term Recurrent Convolutional Network (LRCN) models, to identify different vowel-consonant-vowel (VCV) sequences from dynamic shaping of the vocal tract. Such a model typically combines a CNN-based deep hierarchical visual feature extractor with recurrent networks, which ideally makes the network spatio-temporally deep enough to learn the sequential dynamics of a short video clip for video classification tasks. We use a database consisting of 2D real-time MRI of vocal tract shaping during VCV utterances by 17 speakers. The comparative performances of this class of algorithms under various parameter settings and for various classification tasks are discussed. Interestingly, the results show a marked difference in model performance in the context of speech classification with respect to generic sequence or video classification tasks.
    Comment: To appear in the INTERSPEECH 2018 Proceedings
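The LRCN structure the abstract describes, a shared per-frame feature extractor feeding a recurrent network whose final state is classified, can be sketched in a few lines. For brevity the per-frame extractor below is a linear map rather than a CNN, and all weights and clip sizes are random, illustrative stand-ins, not the paper's trained model.

```python
import numpy as np

# Minimal numpy sketch of the LRCN idea: a shared per-frame feature
# extractor, a recurrent pass over time, and a softmax classifier on the
# final hidden state. A real LRCN uses a CNN per frame and an LSTM;
# here both are simplified stand-ins with random weights.

rng = np.random.default_rng(42)
T, H, W = 8, 16, 16                # toy real-time-MRI clip dimensions
n_feat, n_classes = 32, 10         # hypothetical feature / VCV class counts

W_feat = rng.normal(size=(H * W, n_feat)) * 0.05   # "CNN" stand-in
W_in = rng.normal(size=(n_feat, n_feat)) * 0.05    # RNN input weights
W_rec = rng.normal(size=(n_feat, n_feat)) * 0.05   # RNN recurrent weights
W_out = rng.normal(size=(n_feat, n_classes))       # classifier head

def lrcn_forward(clip):
    """clip: (T, H, W) video -> (n_classes,) softmax probabilities."""
    h = np.zeros(n_feat)
    for frame in clip:
        f = np.tanh(frame.reshape(-1) @ W_feat)    # per-frame features
        h = np.tanh(f @ W_in + h @ W_rec)          # recurrent update
    logits = h @ W_out                             # classify final state
    e = np.exp(logits - logits.max())
    return e / e.sum()

probs = lrcn_forward(rng.normal(size=(T, H, W)))
```

The key design point carried over from the abstract is weight sharing: the same feature extractor is applied to every frame, and only the recurrent state accumulates the clip's temporal dynamics.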

    Tongue Movements in Feeding and Speech

    The position of the tongue relative to the upper and lower jaws is regulated in part by the position of the hyoid bone, which, with the anterior and posterior suprahyoid muscles, controls the angulation and length of the floor of the mouth on which the tongue body 'rides'. The instantaneous shape of the tongue is controlled by the 'extrinsic muscles' acting in concert with the 'intrinsic' muscles. Recent anatomical research in non-human mammals has shown that the intrinsic muscles can best be regarded as a 'laminated segmental system' with tightly packed layers of the 'transverse', 'longitudinal', and 'vertical' muscle fibers. Each segment receives separate innervation from branches of the hypoglossal nerve. These new anatomical findings are contributing to the development of functional models of the tongue, many based on increasingly refined finite element modeling techniques. They also begin to explain the observed behavior of the jaw-hyoid-tongue complex, or the hyomandibular 'kinetic chain', in feeding and consecutive speech. Similarly, major efforts, involving many imaging techniques (cinefluorography, ultrasound, electro-palatography, NMRI, and others), have examined the spatial and temporal relationships of the tongue surface in sound production. The feeding literature shows localized tongue-surface change as the process progresses. The speech literature shows extensive change in tongue shape between classes of vowels and consonants. Although there is a fundamental dichotomy between the referential framework and the methodological approach to studies of the orofacial complex in feeding and speech, it is clear that many of the shapes adopted by the tongue in speaking are seen in feeding. It is suggested that the range of shapes used in feeding is the matrix for both behaviors.