1,797 research outputs found
Towards Automatic Speech Identification from Vocal Tract Shape Dynamics in Real-time MRI
Vocal tract configurations play a vital role in generating distinguishable
speech sounds, by modulating the airflow and creating different resonant
cavities in speech production. They contain abundant information that can be
utilized to better understand the underlying speech production mechanism. As a
step towards automatic mapping of vocal tract shape geometry to acoustics, this
paper employs effective video action recognition techniques, like Long-term
Recurrent Convolutional Networks (LRCN) models, to identify different
vowel-consonant-vowel (VCV) sequences from dynamic shaping of the vocal tract.
Such a model typically combines a CNN based deep hierarchical visual feature
extractor with Recurrent Networks, that ideally makes the network
spatio-temporally deep enough to learn the sequential dynamics of a short video
clip for video classification tasks. We use a database consisting of 2D
real-time MRI of vocal tract shaping during VCV utterances by 17 speakers. The
comparative performances of this class of algorithms under various parameter
settings and for various classification tasks are discussed. Interestingly, the
results show a marked difference in the model performance in the context of
speech classification with respect to generic sequence or video classification
tasks.Comment: To appear in the INTERSPEECH 2018 Proceeding
Magnetic resonance imaging of the vocal tract: techniques and applications
Magnetic resonance (MR) imaging has been used to analyse and evaluate the vocal tract shape through different techniques and with promising results in several fields. Our purpose is to demonstrate the relevance of MR and image processing for the vocal tract study. The extraction of contours of the air cavities allowed the set-up of a number of 3D reconstruction image stacks by means of the combination of orthogonally oriented sets of slices for each articulatory gesture, as a new approach to solve the expected spatial under sampling of the imaging process. In result these models give improved information for the visualization of morphologic and anatomical aspects and are useful for partial measurements of the vocal tract shape in different situations. Potential use can be found in Medical and therapeutic applications as well as in acoustic articulatory speech modelling
Magnetic resonance imaging of the vocal tract: techniques and applications
Magnetic resonance (MR) imaging has been used to analyse and evaluate the vocal tract shape through different techniques and with promising results in several fields. Our purpose is to demonstrate the relevance of MR and image processing for the vocal tract study. The extraction of contours of the air cavities allowed the set - up of a number of 3D reconstruction image stacks by means of the combination of orthogonally oriented sets of slices for e ach articulatory gesture, as a new approach to solve the expected spatial under sampling of the imaging process. In result these models give improved information for the visualization of morphologic and anatomical aspects and are useful for partial measure ments of the vocal tract shape in different situations. Potential use can be found in Medical and therapeutic applications as well as in acoustic articulatory speech modelling
Three-dimensional modeling of tongue during speech using MRI data
The tongue is the most important and dynamic articulator for speech formation, because
of its anatomic aspects (particularly, the large volume of this muscular organ
comparatively to the surrounding organs of the vocal tract) and also due to the wide
range of movements and flexibility that are involved. In speech communication
research, a variety of techniques have been used for measuring the three-dimensional
vocal tract shapes. More recently, magnetic resonance imaging (MRI) becomes
common; mainly, because this technique allows the collection of a set of static and
dynamic images that can represent the entire vocal tract along any orientation. Over the
years, different anatomical organs of the vocal tract have been modelled; namely, 2D
and 3D tongue models, using parametric or statistical modelling procedures. Our aims
are to present and describe some 3D reconstructed models from MRI data, for one
subject uttering sustained articulations of some typical Portuguese sounds. Thus, we
present a 3D database of the tongue obtained by stack combinations with the subject
articulating Portuguese vowels. This 3D knowledge of the speech organs could be very
important; especially, for clinical purposes (for example, for the assessment of
articulatory impairments followed by tongue surgery in speech rehabilitation), and also
for a better understanding of acoustic theory in speech formation
An open-source toolbox for measuring vocal tract shape from real-time magnetic resonance images
Real-time magnetic resonance imaging (rtMRI) is a technique that provides high-contrast videographic data of human anatomy in motion. Applied to the vocal tract, it is a powerful method for capturing the dynamics of speech and other vocal behaviours by imaging structures internal to the mouth and throat. These images provide a means of studying the physiological basis for speech, singing, expressions of emotion, and swallowing that are otherwise not accessible for external observation. However, taking quantitative measurements from these images is notoriously difficult. We introduce a signal processing pipeline that produces outlines of the vocal tract from the lips to the larynx as a quantification of the dynamic morphology of the vocal tract. Our approach performs simple tissue classification, but constrained to a researcher-specified region of interest. This combination facilitates feature extraction while retaining the domain-specific expertise of a human analyst. We demonstrate that this pipeline generalises well across datasets covering behaviours such as speech, vocal size exaggeration, laughter, and whistling, as well as producing reliable outcomes across analysts, particularly among users with domain-specific expertise. With this article, we make this pipeline available for immediate use by the research community, and further suggest that it may contribute to the continued development of fully automated methods based on deep learning algorithms
Development of an Atlas-Based Segmentation of Cranial Nerves Using Shape-Aware Discrete Deformable Models for Neurosurgical Planning and Simulation
Twelve pairs of cranial nerves arise from the brain or brainstem and control our sensory functions such as vision, hearing, smell and taste as well as several motor functions to the head and neck including facial expressions and eye movement. Often, these cranial nerves are difficult to detect in MRI data, and thus represent problems in neurosurgery planning and simulation, due to their thin anatomical structure, in the face of low imaging resolution as well as image artifacts. As a result, they may be at risk in neurosurgical procedures around the skull base, which might have dire consequences such as the loss of eyesight or hearing and facial paralysis. Consequently, it is of great importance to clearly delineate cranial nerves in medical images for avoidance in the planning of neurosurgical procedures and for targeting in the treatment of cranial nerve disorders. In this research, we propose to develop a digital atlas methodology that will be used to segment the cranial nerves from patient image data. The atlas will be created from high-resolution MRI data based on a discrete deformable contour model called 1-Simplex mesh. Each of the cranial nerves will be modeled using its centerline and radius information where the centerline is estimated in a semi-automatic approach by finding a shortest path between two user-defined end points. The cranial nerve atlas is then made more robust by integrating a Statistical Shape Model so that the atlas can identify and segment nerves from images characterized by artifacts or low resolution. To the best of our knowledge, no such digital atlas methodology exists for segmenting nerves cranial nerves from MRI data. Therefore, our proposed system has important benefits to the neurosurgical community
A multilinear tongue model derived from speech related MRI data of the human vocal tract
We present a multilinear statistical model of the human tongue that captures
anatomical and tongue pose related shape variations separately. The model is
derived from 3D magnetic resonance imaging data of 11 speakers sustaining
speech related vocal tract configurations. The extraction is performed by using
a minimally supervised method that uses as basis an image segmentation approach
and a template fitting technique. Furthermore, it uses image denoising to deal
with possibly corrupt data, palate surface information reconstruction to handle
palatal tongue contacts, and a bootstrap strategy to refine the obtained
shapes. Our evaluation concludes that limiting the degrees of freedom for the
anatomical and speech related variations to 5 and 4, respectively, produces a
model that can reliably register unknown data while avoiding overfitting
effects. Furthermore, we show that it can be used to generate a plausible
tongue animation by tracking sparse motion capture data
- …