A multispeaker dataset of raw and reconstructed speech production real-time MRI video and 3D volumetric images
Real-time magnetic resonance imaging (RT-MRI) of human speech production is
enabling significant advances in speech science, linguistics, bio-inspired
speech technology development, and clinical applications. However, easy access
to RT-MRI is limited, and comprehensive datasets with broad access are needed to
catalyze research across numerous domains. The imaging of the rapidly moving
articulators and dynamic airway shaping during speech demands high
spatio-temporal resolution and robust reconstruction methods. Further, while
reconstructed images have been published, to-date there is no open dataset
providing raw multi-coil RT-MRI data from an optimized speech production
experimental setup. Such datasets could enable new and improved methods for
dynamic image reconstruction, artifact correction, feature extraction, and
direct extraction of linguistically-relevant biomarkers. The present dataset
offers a unique corpus of 2D sagittal-view RT-MRI videos along with
synchronized audio for 75 subjects performing linguistically motivated speech
tasks, alongside the corresponding first-ever public domain raw RT-MRI data.
The dataset also includes 3D volumetric vocal tract MRI during sustained speech
sounds and high-resolution static anatomical T2-weighted upper airway MRI for
each subject.
Comment: 27 pages, 6 figures, 5 tables, submitted to Nature Scientific Data
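One use the abstract anticipates for the raw multi-coil data is developing new reconstruction methods. A minimal sketch of the standard root-sum-of-squares baseline for combining coil images, assuming a simulated Cartesian k-space layout (the dataset's actual acquisition and reconstruction are more sophisticated, and all array sizes here are made up):

```python
import numpy as np

# Illustrative only: simulated multi-coil k-space data with shape
# (n_coils, ny, nx); not the dataset's actual acquisition scheme.
rng = np.random.default_rng(0)
kspace = rng.standard_normal((8, 64, 64)) + 1j * rng.standard_normal((8, 64, 64))

# Per-coil inverse FFT back to image space.
coil_images = np.fft.ifft2(np.fft.ifftshift(kspace, axes=(-2, -1)))

# Root-sum-of-squares combination across coils, a common
# reconstruction baseline for multi-coil MRI.
rss = np.sqrt(np.sum(np.abs(coil_images) ** 2, axis=0))
```

The combined magnitude image has the spatial shape of a single coil image, with one non-negative value per pixel.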
A multilinear tongue model derived from speech related MRI data of the human vocal tract
We present a multilinear statistical model of the human tongue that captures
anatomical and tongue pose related shape variations separately. The model is
derived from 3D magnetic resonance imaging data of 11 speakers sustaining
speech related vocal tract configurations. The extraction is performed by using
a minimally supervised method that combines an image segmentation approach
with a template fitting technique. Furthermore, it uses image denoising to deal
with possibly corrupt data, palate surface information reconstruction to handle
palatal tongue contacts, and a bootstrap strategy to refine the obtained
shapes. Our evaluation concludes that limiting the degrees of freedom for the
anatomical and speech related variations to 5 and 4, respectively, produces a
model that can reliably register unknown data while avoiding overfitting
effects. Furthermore, we show that it can be used to generate a plausible
tongue animation by tracking sparse motion capture data.
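A multilinear model of this kind is evaluated by contracting a core tensor with separate anatomy and pose weight vectors. A minimal sketch, keeping the 5 anatomical and 4 pose degrees of freedom from the abstract but with every other quantity (vertex count, core tensor contents) made up for illustration:

```python
import numpy as np

# Hypothetical multilinear tongue model: a core tensor maps separate
# anatomy (5 DOF) and pose (4 DOF) weights to 3D vertex offsets.
n_vertices = 100
core = np.random.default_rng(1).standard_normal((5, 4, n_vertices * 3))
mean_shape = np.zeros(n_vertices * 3)

def evaluate(anatomy_w, pose_w):
    # Contract the core with both weight vectors (Tucker-style evaluation);
    # anatomy and pose variations stay in separate modes of the tensor.
    shape = mean_shape + np.einsum("a,p,apv->v", anatomy_w, pose_w, core)
    return shape.reshape(n_vertices, 3)

verts = evaluate(np.ones(5) / 5, np.ones(4) / 4)
```

Registering unknown data then amounts to optimizing the two weight vectors so that the evaluated surface fits the observations.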
Registration and statistical analysis of the tongue shape during speech production
This thesis analyzes the human tongue shape during speech production. First, a semi-supervised approach is derived for estimating the tongue shape from volumetric magnetic resonance imaging data of the human vocal tract. Results of this extraction are used to derive parametric tongue models. Next, a framework is presented for registering sparse motion capture data of the tongue by means of such a model. This method makes it possible to generate full three-dimensional animations of the tongue. Finally, a multimodal and statistical text-to-speech system is developed that is able to synthesize audio and synchronized tongue motion from text.
German Research Foundation
3D vocal tract reconstruction using magnetic resonance imaging data to study fricative consonant production
The use of Magnetic Resonance Imaging (MRI) in clinical practice has grown rapidly. Currently, the use of MRI in speech research provides useful and accurate qualitative and quantitative data on speech articulation. The aim of this work was to describe an effective method to extract the vocal tract from MRI images and compute its volume during speech production. Using a 3.0 Tesla MRI system, 2D and 3D images of the vocal tract were collected and used to analyze the vocal tract during the production of fricative consonants. These images were also used to build the associated 3D models and compute their volumes. This approach showed that, in general, the volumes measured for the voiceless consonants are smaller than those of their voiced counterparts.
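Once a binary 3D segmentation of the vocal tract is available, its volume is the voxel count times the volume of one voxel. A minimal sketch, in which the segmentation and the voxel dimensions are made up rather than taken from the paper:

```python
import numpy as np

# Hypothetical binary vocal tract segmentation on a 3D voxel grid.
seg = np.zeros((50, 50, 50), dtype=bool)
seg[10:40, 15:35, 20:30] = True  # crude stand-in for the airway region

# Assumed acquisition resolution in millimetres per voxel edge.
voxel_size_mm = (1.0, 1.0, 1.2)
voxel_volume_cm3 = np.prod(voxel_size_mm) / 1000.0

# Volume = number of segmented voxels x volume per voxel.
volume_cm3 = seg.sum() * voxel_volume_cm3
```

Comparing such volumes across voiced/voiceless consonant pairs is the kind of quantitative analysis the abstract describes.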
An open-source toolbox for measuring vocal tract shape from real-time magnetic resonance images
Real-time magnetic resonance imaging (rtMRI) is a technique that provides high-contrast videographic data of human anatomy in motion. Applied to the vocal tract, it is a powerful method for capturing the dynamics of speech and other vocal behaviours by imaging structures internal to the mouth and throat. These images provide a means of studying the physiological basis for speech, singing, expressions of emotion, and swallowing that are otherwise not accessible for external observation. However, taking quantitative measurements from these images is notoriously difficult. We introduce a signal processing pipeline that produces outlines of the vocal tract from the lips to the larynx as a quantification of the dynamic morphology of the vocal tract. Our approach performs simple tissue classification, but constrained to a researcher-specified region of interest. This combination facilitates feature extraction while retaining the domain-specific expertise of a human analyst. We demonstrate that this pipeline generalises well across datasets covering behaviours such as speech, vocal size exaggeration, laughter, and whistling, as well as producing reliable outcomes across analysts, particularly among users with domain-specific expertise. With this article, we make this pipeline available for immediate use by the research community, and further suggest that it may contribute to the continued development of fully automated methods based on deep learning algorithms.
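The released toolbox is the reference for the actual pipeline, but the core idea of tissue classification constrained to a researcher-specified region of interest can be sketched as follows; the image, ROI, and the simple mean-intensity threshold rule are all illustrative stand-ins:

```python
import numpy as np

# Hypothetical mid-sagittal rtMRI frame and a researcher-drawn
# rectangular region of interest covering the vocal tract.
rng = np.random.default_rng(2)
frame = rng.uniform(0.0, 1.0, size=(128, 128))
roi = np.zeros_like(frame, dtype=bool)
roi[30:100, 40:110] = True

# Classify tissue vs. airway inside the ROI only, here with a crude
# global threshold; the published pipeline is more sophisticated.
threshold = frame[roi].mean()
tissue = roi & (frame >= threshold)
airway = roi & (frame < threshold)
```

Restricting classification to the ROI is what lets a simple intensity rule work: the analyst's anatomical expertise is encoded in where the region is drawn, not in the classifier itself.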
Segmentation of tongue shapes during vowel production in magnetic resonance images based on statistical modelling
Quantification of the anatomic and functional aspects of the tongue is pertinent to analyse the mechanisms involved in speech production. Speech requires dynamic and complex articulation of the vocal tract organs, and the tongue is one of the main articulators during speech production. Magnetic resonance imaging has been widely used in speech-related studies. Moreover, the segmentation of such images of speech organs is required to extract reliable statistical data. However, standard solutions to analyse a large set of articulatory images have not yet been established. Therefore, this article presents an approach to segment the tongue in two-dimensional magnetic resonance images and statistically model the segmented tongue shapes. The proposed approach assesses the articulator morphology based on an active shape model, which captures the shape variability of the tongue during speech production. To validate this new approach, a dataset of mid-sagittal magnetic resonance images acquired from four subjects was used, and key aspects of the shape of the tongue during the vocal production of relevant European Portuguese vowels were evaluated.
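The statistical core of an active shape model is a point distribution model: PCA over aligned landmark vectors yields a mean shape plus a few principal modes of variation. A minimal sketch on synthetic data, where the landmark count (20) and number of retained modes (4) are assumptions for illustration, not values from the article:

```python
import numpy as np

# Hypothetical training set: aligned tongue contours, each with 20
# 2D landmarks, flattened to 40-D vectors (one row per training image).
rng = np.random.default_rng(3)
shapes = rng.standard_normal((30, 40))

# Point distribution model: mean shape plus principal modes via SVD.
mean = shapes.mean(axis=0)
centered = shapes - mean
_, s, vt = np.linalg.svd(centered, full_matrices=False)

k = 4  # retain the first few modes of variation
modes = vt[:k]

def synthesize(b):
    # A plausible tongue contour from mode coefficients b; constraining
    # b keeps segmentations within the learned shape variability.
    return (mean + b @ modes).reshape(20, 2)

contour = synthesize(np.zeros(4))  # zero coefficients give the mean shape
```

During segmentation, the model is fitted to a new image by searching for the mode coefficients (and pose) whose synthesized contour best matches the image evidence.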
Models and Analysis of Vocal Emissions for Biomedical Applications
The proceedings of the MAVEBA Workshop, held every two years, collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are: the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, as well as biomedical engineering methods for the analysis of voice signals and images, as a support to clinical diagnosis and the classification of vocal pathologies.