
    Using a biomechanical model for tongue tracking in ultrasound images

    We propose in this paper a new method for tongue tracking in ultrasound images which is based on a biomechanical model of the tongue. The deformation is guided both by points tracked at the surface of the tongue and by inner points of the tongue. Possible uncertainties on the tracked points are handled by the algorithm. Experiments show that the method is efficient even in the case of abrupt movements.
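The abstract does not give the biomechanical model itself, but the idea of handling per-point uncertainty can be illustrated with a minimal sketch: weight each tracked surface point by its inverse variance so that unreliable points contribute less to the estimated motion. The function name, the rigid-translation simplification, and all numbers below are hypothetical stand-ins, not the authors' method.

```python
import numpy as np

# Hypothetical sketch: estimate a rigid translation of the tongue surface
# between two ultrasound frames from tracked points, down-weighting
# points with high uncertainty (inverse-variance weighting). The paper's
# biomechanical deformation model is far richer; this only illustrates
# how per-point uncertainties can be handled.

def weighted_displacement(prev_pts, curr_pts, sigmas):
    """Inverse-variance weighted mean displacement between two frames."""
    w = 1.0 / np.asarray(sigmas) ** 2            # weight = 1 / sigma^2
    d = np.asarray(curr_pts) - np.asarray(prev_pts)
    return (w[:, None] * d).sum(axis=0) / w.sum()

prev = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
curr = prev + np.array([0.5, 0.2])               # true shift of all points
curr[2] += np.array([3.0, -1.0])                 # one badly tracked point
disp = weighted_displacement(prev, curr, sigmas=[0.1, 0.1, 2.0])
```

Because the third point carries a large uncertainty, the estimate stays close to the true shift of (0.5, 0.2) despite the outlier.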

    A multilinear tongue model derived from speech related MRI data of the human vocal tract

    We present a multilinear statistical model of the human tongue that captures anatomical and tongue pose related shape variations separately. The model is derived from 3D magnetic resonance imaging data of 11 speakers sustaining speech related vocal tract configurations. The extraction is performed by a minimally supervised method that combines an image segmentation approach with a template fitting technique. Furthermore, it uses image denoising to deal with possibly corrupt data, palate surface information reconstruction to handle palatal tongue contacts, and a bootstrap strategy to refine the obtained shapes. Our evaluation concludes that limiting the degrees of freedom for the anatomical and speech related variations to 5 and 4, respectively, produces a model that can reliably register unknown data while avoiding overfitting effects. Furthermore, we show that it can be used to generate a plausible tongue animation by tracking sparse motion capture data.
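A multilinear model of this kind can be sketched as a Tucker-style factorization: a learned core tensor is contracted with a speaker (anatomy) coefficient vector and a pose coefficient vector to produce a tongue shape. The sketch below uses the degrees of freedom the abstract reports (5 anatomical, 4 speech-related); the core tensor, mesh size, and coefficients are random stand-ins, not the authors' trained model.

```python
import numpy as np

# Hedged sketch of a multilinear (Tucker-style) shape model with the
# degrees of freedom the abstract reports: 5 anatomical modes and
# 4 speech-pose modes. All values are random placeholders.

rng = np.random.default_rng(0)
n_vertices = 300                                  # hypothetical mesh size
core = rng.normal(size=(3 * n_vertices, 5, 4))    # "learned" core tensor

def synthesize(anatomy_w, pose_w):
    """Contract the core with both factor weights: one shape vector out."""
    return np.einsum('vap,a,p->v', core, anatomy_w, pose_w)

anatomy_w = rng.normal(size=5)    # speaker-specific coefficients
pose_w = rng.normal(size=4)       # articulation-specific coefficients
shape = synthesize(anatomy_w, pose_w).reshape(n_vertices, 3)
```

The separation of factors is what lets the model register an unknown speaker (fit `anatomy_w` once) and then animate them (vary only `pose_w`), as the abstract describes for tracking sparse motion capture data.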

    Artimate: an articulatory animation framework for audiovisual speech synthesis

    We present a modular framework for articulatory animation synthesis using speech motion capture data obtained with electromagnetic articulography (EMA). Adapting a skeletal animation approach, the articulatory motion data is applied to a three-dimensional (3D) model of the vocal tract, creating a portable resource that can be integrated in an audiovisual (AV) speech synthesis platform to provide realistic animation of the tongue and teeth for a virtual character. The framework also provides an interface to articulatory animation synthesis, as well as an example application to illustrate its use with a 3D game engine. We rely on cross-platform, open-source software and open standards to provide a lightweight, accessible, and portable workflow.
    Comment: Workshop on Innovation and Applications in Speech Technology (2012)

    Towards Automatic Speech Identification from Vocal Tract Shape Dynamics in Real-time MRI

    Vocal tract configurations play a vital role in generating distinguishable speech sounds, by modulating the airflow and creating different resonant cavities in speech production. They contain abundant information that can be utilized to better understand the underlying speech production mechanism. As a step towards automatic mapping of vocal tract shape geometry to acoustics, this paper employs effective video action recognition techniques, like Long-term Recurrent Convolutional Network (LRCN) models, to identify different vowel-consonant-vowel (VCV) sequences from dynamic shaping of the vocal tract. Such a model typically combines a CNN-based deep hierarchical visual feature extractor with recurrent networks, which ideally makes the network spatio-temporally deep enough to learn the sequential dynamics of a short video clip for video classification tasks. We use a database consisting of 2D real-time MRI of vocal tract shaping during VCV utterances by 17 speakers. The comparative performances of this class of algorithms under various parameter settings and for various classification tasks are discussed. Interestingly, the results show a marked difference in model performance in the context of speech classification with respect to generic sequence or video classification tasks.
    Comment: To appear in the INTERSPEECH 2018 Proceedings
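The LRCN structure the abstract describes, a shared per-frame feature extractor feeding a recurrent network whose final state is classified, can be sketched in a few lines. For brevity the per-frame extractor below is a linear map rather than a CNN, and all weights and clip sizes are random, illustrative stand-ins, not the paper's trained model.

```python
import numpy as np

# Minimal numpy sketch of the LRCN idea: a shared per-frame feature
# extractor, a recurrent pass over time, and a softmax classifier on the
# final hidden state. A real LRCN uses a CNN per frame and an LSTM;
# here both are simplified stand-ins with random weights.

rng = np.random.default_rng(42)
T, H, W = 8, 16, 16                # toy real-time-MRI clip dimensions
n_feat, n_classes = 32, 10         # hypothetical feature / VCV class counts

W_feat = rng.normal(size=(H * W, n_feat)) * 0.05   # "CNN" stand-in
W_in = rng.normal(size=(n_feat, n_feat)) * 0.05    # RNN input weights
W_rec = rng.normal(size=(n_feat, n_feat)) * 0.05   # RNN recurrent weights
W_out = rng.normal(size=(n_feat, n_classes))       # classifier head

def lrcn_forward(clip):
    """clip: (T, H, W) video -> (n_classes,) softmax probabilities."""
    h = np.zeros(n_feat)
    for frame in clip:
        f = np.tanh(frame.reshape(-1) @ W_feat)    # per-frame features
        h = np.tanh(f @ W_in + h @ W_rec)          # recurrent update
    logits = h @ W_out                             # classify final state
    e = np.exp(logits - logits.max())
    return e / e.sum()

probs = lrcn_forward(rng.normal(size=(T, H, W)))
```

The key design point carried over from the abstract is weight sharing: the same feature extractor is applied to every frame, and only the recurrent state accumulates the clip's temporal dynamics.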

    Tongue Movements in Feeding and Speech

    The position of the tongue relative to the upper and lower jaws is regulated in part by the position of the hyoid bone, which, with the anterior and posterior suprahyoid muscles, controls the angulation and length of the floor of the mouth on which the tongue body 'rides'. The instantaneous shape of the tongue is controlled by the 'extrinsic muscles' acting in concert with the 'intrinsic' muscles. Recent anatomical research in non-human mammals has shown that the intrinsic muscles can best be regarded as a 'laminated segmental system' with tightly packed layers of the 'transverse', 'longitudinal', and 'vertical' muscle fibers. Each segment receives separate innervation from branches of the hypoglossal nerve. These new anatomical findings are contributing to the development of functional models of the tongue, many based on increasingly refined finite element modeling techniques. They also begin to explain the observed behavior of the jaw-hyoid-tongue complex, or the hyomandibular 'kinetic chain', in feeding and consecutive speech. Similarly, major efforts, involving many imaging techniques (cinefluorography, ultrasound, electro-palatography, NMRI, and others), have examined the spatial and temporal relationships of the tongue surface in sound production. The feeding literature shows localized tongue-surface change as the process progresses. The speech literature shows extensive change in tongue shape between classes of vowels and consonants. Although there is a fundamental dichotomy between the referential framework and the methodological approach to studies of the orofacial complex in feeding and speech, it is clear that many of the shapes adopted by the tongue in speaking are seen in feeding. It is suggested that the range of shapes used in feeding is the matrix for both behaviors.