A multispeaker dataset of raw and reconstructed speech production real-time MRI video and 3D volumetric images
Real-time magnetic resonance imaging (RT-MRI) of human speech production is
enabling significant advances in speech science, linguistics, bio-inspired
speech technology development, and clinical applications. However, easy access
to RT-MRI is limited, and comprehensive datasets with broad access are needed to
catalyze research across numerous domains. The imaging of the rapidly moving
articulators and dynamic airway shaping during speech demands high
spatio-temporal resolution and robust reconstruction methods. Further, while
reconstructed images have been published, to-date there is no open dataset
providing raw multi-coil RT-MRI data from an optimized speech production
experimental setup. Such datasets could enable new and improved methods for
dynamic image reconstruction, artifact correction, feature extraction, and
direct extraction of linguistically-relevant biomarkers. The present dataset
offers a unique corpus of 2D sagittal-view RT-MRI videos along with
synchronized audio for 75 subjects performing linguistically motivated speech
tasks, alongside the corresponding first-ever public domain raw RT-MRI data.
The dataset also includes 3D volumetric vocal tract MRI during sustained speech
sounds and high-resolution static anatomical T2-weighted upper airway MRI for
each subject.
Comment: 27 pages, 6 figures, 5 tables, submitted to Nature Scientific Data
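One use the abstract anticipates for the raw multi-coil data is developing new reconstruction methods. A minimal sketch of the standard root-sum-of-squares baseline for combining coil images, assuming a simulated Cartesian k-space layout (the dataset's actual acquisition and reconstruction are more sophisticated, and all array sizes here are made up):

```python
import numpy as np

# Illustrative only: simulated multi-coil k-space data with shape
# (n_coils, ny, nx); not the dataset's actual acquisition scheme.
rng = np.random.default_rng(0)
kspace = rng.standard_normal((8, 64, 64)) + 1j * rng.standard_normal((8, 64, 64))

# Per-coil inverse FFT back to image space.
coil_images = np.fft.ifft2(np.fft.ifftshift(kspace, axes=(-2, -1)))

# Root-sum-of-squares combination across coils, a common
# reconstruction baseline for multi-coil MRI.
rss = np.sqrt(np.sum(np.abs(coil_images) ** 2, axis=0))
```

The combined magnitude image has the spatial shape of a single coil image, with one non-negative value per pixel.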
A multilinear tongue model derived from speech related MRI data of the human vocal tract
We present a multilinear statistical model of the human tongue that captures
anatomical and tongue pose related shape variations separately. The model is
derived from 3D magnetic resonance imaging data of 11 speakers sustaining
speech related vocal tract configurations. The extraction is performed by using
a minimally supervised method that combines an image segmentation approach
with a template fitting technique. Furthermore, it uses image denoising to deal
with possibly corrupt data, palate surface information reconstruction to handle
palatal tongue contacts, and a bootstrap strategy to refine the obtained
shapes. Our evaluation concludes that limiting the degrees of freedom for the
anatomical and speech related variations to 5 and 4, respectively, produces a
model that can reliably register unknown data while avoiding overfitting
effects. Furthermore, we show that it can be used to generate a plausible
tongue animation by tracking sparse motion capture data.
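A multilinear model of this kind is evaluated by contracting a core tensor with separate anatomy and pose weight vectors. A minimal sketch, keeping the 5 anatomical and 4 pose degrees of freedom from the abstract but with every other quantity (vertex count, core tensor contents) made up for illustration:

```python
import numpy as np

# Hypothetical multilinear tongue model: a core tensor maps separate
# anatomy (5 DOF) and pose (4 DOF) weights to 3D vertex offsets.
n_vertices = 100
core = np.random.default_rng(1).standard_normal((5, 4, n_vertices * 3))
mean_shape = np.zeros(n_vertices * 3)

def evaluate(anatomy_w, pose_w):
    # Contract the core with both weight vectors (Tucker-style evaluation);
    # anatomy and pose variations stay in separate modes of the tensor.
    shape = mean_shape + np.einsum("a,p,apv->v", anatomy_w, pose_w, core)
    return shape.reshape(n_vertices, 3)

verts = evaluate(np.ones(5) / 5, np.ones(4) / 4)
```

Registering unknown data then amounts to optimizing the two weight vectors so that the evaluated surface fits the observations.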
Registration and statistical analysis of the tongue shape during speech production
This thesis analyzes the human tongue shape during speech production. First, a semi-supervised approach is derived for estimating the tongue shape from volumetric magnetic resonance imaging data of the human vocal tract. Results of this extraction are used to derive parametric tongue models. Next, a framework is presented for registering sparse motion capture data of the tongue by means of such a model. This method makes it possible to generate full three-dimensional animations of the tongue. Finally, a multimodal and statistical text-to-speech system is developed that is able to synthesize audio and synchronized tongue motion from text.
German Research Foundation
3D vocal tract reconstruction using magnetic resonance imaging data to study fricative consonant production
The use of Magnetic Resonance Imaging (MRI) in clinical practice has grown rapidly. Currently, the use of MRI in speech research provides useful and accurate qualitative and quantitative data on speech articulation. The aim of this work was to describe an effective method to extract the vocal tract from MRI images and compute its volume during speech production. Using a 3.0 Tesla MRI system, 2D and 3D images of the vocal tract were collected and used to analyze the vocal tract during the production of fricative consonants. These images were also used to build the associated 3D models and compute their volumes. This approach showed that, in general, the volumes measured for the voiceless consonants are smaller than those of their voiced counterparts.
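Once a binary 3D segmentation of the vocal tract is available, its volume is the voxel count times the volume of one voxel. A minimal sketch, in which the segmentation and the voxel dimensions are made up rather than taken from the paper:

```python
import numpy as np

# Hypothetical binary vocal tract segmentation on a 3D voxel grid.
seg = np.zeros((50, 50, 50), dtype=bool)
seg[10:40, 15:35, 20:30] = True  # crude stand-in for the airway region

# Assumed acquisition resolution in millimetres per voxel edge.
voxel_size_mm = (1.0, 1.0, 1.2)
voxel_volume_cm3 = np.prod(voxel_size_mm) / 1000.0

# Volume = number of segmented voxels x volume per voxel.
volume_cm3 = seg.sum() * voxel_volume_cm3
```

Comparing such volumes across voiced/voiceless consonant pairs is the kind of quantitative analysis the abstract describes.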
An open-source toolbox for measuring vocal tract shape from real-time magnetic resonance images
Real-time magnetic resonance imaging (rtMRI) is a technique that provides high-contrast videographic data of human anatomy in motion. Applied to the vocal tract, it is a powerful method for capturing the dynamics of speech and other vocal behaviours by imaging structures internal to the mouth and throat. These images provide a means of studying the physiological basis for speech, singing, expressions of emotion, and swallowing that are otherwise not accessible for external observation. However, taking quantitative measurements from these images is notoriously difficult. We introduce a signal processing pipeline that produces outlines of the vocal tract from the lips to the larynx as a quantification of the dynamic morphology of the vocal tract. Our approach performs simple tissue classification, but constrained to a researcher-specified region of interest. This combination facilitates feature extraction while retaining the domain-specific expertise of a human analyst. We demonstrate that this pipeline generalises well across datasets covering behaviours such as speech, vocal size exaggeration, laughter, and whistling, as well as producing reliable outcomes across analysts, particularly among users with domain-specific expertise. With this article, we make this pipeline available for immediate use by the research community, and further suggest that it may contribute to the continued development of fully automated methods based on deep learning algorithms.
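The released toolbox is the reference for the actual pipeline, but the core idea of tissue classification constrained to a researcher-specified region of interest can be sketched as follows; the image, ROI, and the simple mean-intensity threshold rule are all illustrative stand-ins:

```python
import numpy as np

# Hypothetical mid-sagittal rtMRI frame and a researcher-drawn
# rectangular region of interest covering the vocal tract.
rng = np.random.default_rng(2)
frame = rng.uniform(0.0, 1.0, size=(128, 128))
roi = np.zeros_like(frame, dtype=bool)
roi[30:100, 40:110] = True

# Classify tissue vs. airway inside the ROI only, here with a crude
# global threshold; the published pipeline is more sophisticated.
threshold = frame[roi].mean()
tissue = roi & (frame >= threshold)
airway = roi & (frame < threshold)
```

Restricting classification to the ROI is what lets a simple intensity rule work: the analyst's anatomical expertise is encoded in where the region is drawn, not in the classifier itself.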
Segmentation of tongue shapes during vowel production in magnetic resonance images based on statistical modelling
Quantification of the anatomic and functional aspects of the tongue is pertinent to analyse the mechanisms involved in speech production. Speech requires dynamic and complex articulation of the vocal tract organs, and the tongue is one of the main articulators during speech production. Magnetic resonance imaging has been widely used in speech-related studies. Moreover, the segmentation of such images of speech organs is required to extract reliable statistical data. However, standard solutions to analyse a large set of articulatory images have not yet been established. Therefore, this article presents an approach to segment the tongue in two-dimensional magnetic resonance images and statistically model the segmented tongue shapes. The proposed approach assesses the articulator morphology based on an active shape model, which captures the shape variability of the tongue during speech production. To validate this new approach, a dataset of mid-sagittal magnetic resonance images acquired from four subjects was used, and key aspects of the shape of the tongue during the vocal production of relevant European Portuguese vowels were evaluated.
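The statistical core of an active shape model is a point distribution model: PCA over aligned landmark vectors yields a mean shape plus a few principal modes of variation. A minimal sketch on synthetic data, where the landmark count (20) and number of retained modes (4) are assumptions for illustration, not values from the article:

```python
import numpy as np

# Hypothetical training set: aligned tongue contours, each with 20
# 2D landmarks, flattened to 40-D vectors (one row per training image).
rng = np.random.default_rng(3)
shapes = rng.standard_normal((30, 40))

# Point distribution model: mean shape plus principal modes via SVD.
mean = shapes.mean(axis=0)
centered = shapes - mean
_, s, vt = np.linalg.svd(centered, full_matrices=False)

k = 4  # retain the first few modes of variation
modes = vt[:k]

def synthesize(b):
    # A plausible tongue contour from mode coefficients b; constraining
    # b keeps segmentations within the learned shape variability.
    return (mean + b @ modes).reshape(20, 2)

contour = synthesize(np.zeros(4))  # zero coefficients give the mean shape
```

During segmentation, the model is fitted to a new image by searching for the mode coefficients (and pose) whose synthesized contour best matches the image evidence.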
Models and Analysis of Vocal Emissions for Biomedical Applications
The proceedings of the MAVEBA Workshop, held every two years, collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are: the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, as well as biomedical engineering methods for the analysis of voice signals and images, as a support to clinical diagnosis and the classification of vocal pathologies.