Using a biomechanical model for tongue tracking in ultrasound images
We propose in this paper a new method for tongue tracking in ultrasound images based on a biomechanical model of the tongue. The deformation is guided both by points tracked at the surface of the tongue and by inner points of the tongue. Possible uncertainties in the tracked points are handled by the algorithm. Experiments show that the method is efficient even in the case of abrupt movements.
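The abstract's idea of handling per-point tracking uncertainty can be illustrated with a minimal sketch: this is not the paper's biomechanical solver, only an inverse-variance weighting of tracked surface points when estimating a simple rigid translation of the tongue contour. All point values and `sigmas` are hypothetical.

```python
import numpy as np

def weighted_translation(src, dst, sigmas):
    """Estimate a 2D translation from point correspondences,
    down-weighting points with larger tracking uncertainty."""
    w = 1.0 / np.asarray(sigmas) ** 2          # inverse-variance weights
    w = w / w.sum()                            # normalize to sum to 1
    return (w[:, None] * (dst - src)).sum(axis=0)

# Three tracked contour points; the third is unreliable.
src = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
dst = src + np.array([0.5, -0.2])              # true translation
dst[2] += np.array([3.0, 3.0])                 # corrupt the uncertain point
t = weighted_translation(src, dst, sigmas=[0.1, 0.1, 5.0])
print(t)  # close to [0.5, -0.2]: the uncertain point barely contributes
```

A full tracker would estimate a non-rigid deformation, but the same weighting principle lets noisy points contribute less to the solution.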
A multilinear tongue model derived from speech related MRI data of the human vocal tract
We present a multilinear statistical model of the human tongue that captures
anatomical and tongue pose related shape variations separately. The model is
derived from 3D magnetic resonance imaging data of 11 speakers sustaining
speech related vocal tract configurations. The extraction is performed by using
a minimally supervised method that uses as basis an image segmentation approach
and a template fitting technique. Furthermore, it uses image denoising to deal
with possibly corrupt data, palate surface information reconstruction to handle
palatal tongue contacts, and a bootstrap strategy to refine the obtained
shapes. Our evaluation concludes that limiting the degrees of freedom for the
anatomical and speech related variations to 5 and 4, respectively, produces a
model that can reliably register unknown data while avoiding overfitting
effects. Furthermore, we show that it can be used to generate a plausible
tongue animation by tracking sparse motion capture data.
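The separation of anatomical and pose-related variation described above can be sketched as a multilinear (Tucker-style) contraction: a core tensor combined with one weight vector per factor. The dimensions below follow the abstract's 5 anatomical and 4 speech-related degrees of freedom; the core tensor, mean shape, and vertex count are random, hypothetical stand-ins for quantities the paper learns from MRI data.

```python
import numpy as np

rng = np.random.default_rng(0)

# 5 anatomical modes, 4 pose modes, and a hypothetical tongue mesh
# with 500 vertices (1500 stacked coordinates).
N_ANAT, N_POSE, N_COORDS = 5, 4, 1500

# Core tensor and mean shape (learned from data in the paper;
# random here purely for illustration).
core = rng.standard_normal((N_ANAT, N_POSE, N_COORDS))
mean_shape = rng.standard_normal(N_COORDS)

def synthesize(anat_w, pose_w):
    """Reconstruct a tongue shape from separate anatomy and pose weights:
    shape = mean + sum_ij anat_w[i] * pose_w[j] * core[i, j, :]."""
    return mean_shape + np.einsum("i,j,ijk->k", anat_w, pose_w, core)

shape = synthesize(np.ones(N_ANAT), np.ones(N_POSE))
print(shape.shape)  # (1500,)
```

Because anatomy and pose enter through separate weight vectors, a speaker's anatomical weights can be held fixed while only the pose weights are fitted to sparse motion capture data, which is the registration setting the abstract describes.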
Artimate: an articulatory animation framework for audiovisual speech synthesis
We present a modular framework for articulatory animation synthesis using
speech motion capture data obtained with electromagnetic articulography (EMA).
Adapting a skeletal animation approach, the articulatory motion data is applied
to a three-dimensional (3D) model of the vocal tract, creating a portable
resource that can be integrated in an audiovisual (AV) speech synthesis
platform to provide realistic animation of the tongue and teeth for a virtual
character. The framework also provides an interface to articulatory animation
synthesis, as well as an example application to illustrate its use with a 3D
game engine. We rely on cross-platform, open-source software and open standards
to provide a lightweight, accessible, and portable workflow. (Comment: Workshop on Innovation and Applications in Speech Technology, 2012.)
Towards Automatic Speech Identification from Vocal Tract Shape Dynamics in Real-time MRI
Vocal tract configurations play a vital role in generating distinguishable
speech sounds, by modulating the airflow and creating different resonant
cavities in speech production. They contain abundant information that can be
utilized to better understand the underlying speech production mechanism. As a
step towards automatic mapping of vocal tract shape geometry to acoustics, this
paper employs effective video action recognition techniques, like Long-term
Recurrent Convolutional Networks (LRCN) models, to identify different
vowel-consonant-vowel (VCV) sequences from dynamic shaping of the vocal tract.
Such a model typically combines a CNN based deep hierarchical visual feature
extractor with Recurrent Networks, that ideally makes the network
spatio-temporally deep enough to learn the sequential dynamics of a short video
clip for video classification tasks. We use a database consisting of 2D
real-time MRI of vocal tract shaping during VCV utterances by 17 speakers. The
comparative performances of this class of algorithms under various parameter
settings and for various classification tasks are discussed. Interestingly, the
results show a marked difference in the model performance in the context of
speech classification with respect to generic sequence or video classification
tasks. (Comment: To appear in the INTERSPEECH 2018 Proceedings.)
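The LRCN pattern the abstract relies on, a per-frame CNN feature extractor feeding a recurrent network that classifies the whole clip, can be sketched in a few lines. Everything here is a toy stand-in with random, untrained weights: the "CNN" is a fixed linear projection, the recurrent part is a plain tanh RNN, and the frame size, feature width, and class count are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

T, FEAT, HID, N_CLASSES = 20, 64, 32, 10   # frames, feature/hidden dims, VCV classes

def cnn_features(frames):
    """Placeholder for the per-frame CNN feature extractor:
    a fixed random linear projection of each flattened frame."""
    W = rng.standard_normal((frames.shape[1], FEAT)) * 0.01
    return frames @ W

def lrcn_classify(feats):
    """Simple tanh RNN unrolled over the frame features, with a
    softmax over the final hidden state (the LRCN pattern)."""
    Wx = rng.standard_normal((FEAT, HID)) * 0.1
    Wh = rng.standard_normal((HID, HID)) * 0.1
    Wo = rng.standard_normal((HID, N_CLASSES)) * 0.1
    h = np.zeros(HID)
    for x in feats:                        # step through time
        h = np.tanh(x @ Wx + h @ Wh)
    logits = h @ Wo
    p = np.exp(logits - logits.max())      # numerically stable softmax
    return p / p.sum()

frames = rng.standard_normal((T, 28 * 28))  # T flattened toy MRI frames
probs = lrcn_classify(cnn_features(frames))
print(probs.shape)  # (10,)
```

A trained LRCN would use learned convolutional layers and an LSTM, but the structure is the same: spatial features per frame, then a recurrence that accumulates the vocal-tract shaping dynamics before classification.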
Tongue Movements in Feeding and Speech
The position of the tongue relative to the upper and lower jaws is regulated in part by the position of the hyoid bone, which, with the anterior and posterior suprahyoid muscles, controls the angulation and length of the floor of the mouth on which the tongue body 'rides'. The instantaneous shape of the tongue is controlled by the 'extrinsic' muscles acting in concert with the 'intrinsic' muscles. Recent anatomical research in non-human mammals has shown that the intrinsic muscles can best be regarded as a 'laminated segmental system' with tightly packed layers of 'transverse', 'longitudinal', and 'vertical' muscle fibers. Each segment receives separate innervation from branches of the hypoglossal nerve. These new anatomical findings are contributing to the development of functional models of the tongue, many based on increasingly refined finite element modeling techniques. They also begin to explain the observed behavior of the jaw-hyoid-tongue complex, or the hyomandibular 'kinetic chain', in feeding and consecutive speech. Similarly, major efforts, involving many imaging techniques (cinefluorography, ultrasound, electropalatography, NMRI, and others), have examined the spatial and temporal relationships of the tongue surface in sound production. The feeding literature shows localized tongue-surface change as the process progresses. The speech literature shows extensive change in tongue shape between classes of vowels and consonants. Although there is a fundamental dichotomy between the referential framework and the methodological approach to studies of the orofacial complex in feeding and speech, it is clear that many of the shapes adopted by the tongue in speaking are seen in feeding. It is suggested that the range of shapes used in feeding is the matrix for both behaviors.