1,878 research outputs found

Towards Automatic Speech Identification from Vocal Tract Shape Dynamics in Real-time MRI

Author: Fels Sidney
Saha Pramit
Srungarapu Praneeth
Publication date: 29/07/2018

Vocal tract configurations play a vital role in generating distinguishable speech sounds by modulating the airflow and creating different resonant cavities in speech production. They contain abundant information that can be utilized to better understand the underlying speech production mechanism. As a step towards automatic mapping of vocal tract shape geometry to acoustics, this paper employs effective video action recognition techniques, like Long-term Recurrent Convolutional Networks (LRCN) models, to identify different vowel-consonant-vowel (VCV) sequences from dynamic shaping of the vocal tract. Such a model typically combines a CNN-based deep hierarchical visual feature extractor with recurrent networks, which ideally makes the network spatio-temporally deep enough to learn the sequential dynamics of a short video clip for video classification tasks. We use a database consisting of 2D real-time MRI of vocal tract shaping during VCV utterances by 17 speakers. The comparative performances of this class of algorithms under various parameter settings and for various classification tasks are discussed. Interestingly, the results show a marked difference in the model performance in the context of speech classification with respect to generic sequence or video classification tasks. Comment: To appear in the INTERSPEECH 2018 Proceedings.

arXiv.org e-Print Archive

Crossref

Data-Driven Critical Tract Variable Determination for European Portuguese

Author: Almeida N.
Cunha C.
Frahm J.
Joseph A.
Silva S.
Teixeira A.
Publication venue: 'MDPI AG'
Publication date: 21/10/2020

Technologies, such as real-time magnetic resonance (RT-MRI), can provide valuable information to evolve our understanding of the static and dynamic aspects of speech by contributing to the determination of which articulators are essential (critical) in producing specific sounds and how (gestures). While a visual analysis and comparison of imaging data or vocal tract profiles can already provide relevant findings, the sheer amount of available data demands and can strongly profit from unsupervised data-driven approaches. Recent work, in this regard, has asserted the possibility of determining critical articulators from RT-MRI data by considering a representation of vocal tract configurations based on landmarks placed on the tongue, lips, and velum, yielding meaningful results for European Portuguese (EP). Advancing this previous work to obtain a characterization of EP sounds grounded on Articulatory Phonology, important to explore critical gestures and advance, for example, articulatory speech synthesis, entails the consideration of a novel set of tract variables. To this end, this article explores critical variable determination considering a vocal tract representation aligned with Articulatory Phonology and the Task Dynamics framework. The overall results, obtained considering data for three EP speakers, show the applicability of this approach and are consistent with existing descriptions of EP sounds

Multidisciplinary Digital Publishing Institute

MPG.PuRe

Magnetic resonance imaging of the vocal tract: techniques and applications

Author: Diamantino Freitas
João Manuel R. S. Tavares
Sandra M. Rua Ventura
Publication date: 01/01/2009

Magnetic resonance (MR) imaging has been used to analyse and evaluate the vocal tract shape through different techniques and with promising results in several fields. Our purpose is to demonstrate the relevance of MR and image processing for the vocal tract study. The extraction of contours of the air cavities allowed the set-up of a number of 3D reconstruction image stacks by means of the combination of orthogonally oriented sets of slices for each articulatory gesture, as a new approach to solve the expected spatial undersampling of the imaging process. As a result, these models give improved information for the visualization of morphologic and anatomical aspects and are useful for partial measurements of the vocal tract shape in different situations. Potential use can be found in medical and therapeutic applications as well as in acoustic articulatory speech modelling.

Repositório Aberto da Universidade do Porto

Magnetic resonance imaging of the vocal tract: techniques and applications

Author: Freitas Diamantino Rui
Tavares João Manuel
Ventura Sandra Moreira Rua
Publication venue: Institute for Systems and Technologies of Information, Control and Communication Press
Publication date: 17/11/2013

Magnetic resonance (MR) imaging has been used to analyse and evaluate the vocal tract shape through different techniques and with promising results in several fields. Our purpose is to demonstrate the relevance of MR and image processing for the vocal tract study. The extraction of contours of the air cavities allowed the set-up of a number of 3D reconstruction image stacks by means of the combination of orthogonally oriented sets of slices for each articulatory gesture, as a new approach to solve the expected spatial undersampling of the imaging process. As a result, these models give improved information for the visualization of morphologic and anatomical aspects and are useful for partial measurements of the vocal tract shape in different situations. Potential use can be found in medical and therapeutic applications as well as in acoustic articulatory speech modelling.

Repositório Científico do Instituto Politécnico do Porto

Neural Modeling and Imaging of the Cortical Interactions Underlying Syllable Production

Author: Ghosh Satrajit
Guenther Frank
Tourville Jason
Publication venue: Boston University Center for Adaptive Systems and Department of Cognitive and Neural Systems
Publication date: 01/08/2004

This paper describes a neural model of speech acquisition and production that accounts for a wide range of acoustic, kinematic, and neuroimaging data concerning the control of speech movements. The model is a neural network whose components correspond to regions of the cerebral cortex and cerebellum, including premotor, motor, auditory, and somatosensory cortical areas. Computer simulations of the model verify its ability to account for compensation to lip and jaw perturbations during speech. Specific anatomical locations of the model's components are estimated, and these estimates are used to simulate fMRI experiments of simple syllable production with and without jaw perturbations. National Institute on Deafness and Other Communication Disorders (R01 DC02852, RO1 DC01925)

Boston University Institutional Repository (OpenBU)

Registration and statistical analysis of the tongue shape during speech production

Author: Hewer Alexander
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/01/2019

This thesis analyzes the human tongue shape during speech production. First, a semi-supervised approach is derived for estimating the tongue shape from volumetric magnetic resonance imaging data of the human vocal tract. Results of this extraction are used to derive parametric tongue models. Next, a framework is presented for registering sparse motion capture data of the tongue by means of such a model. This method makes it possible to generate full three-dimensional animations of the tongue. Finally, a multimodal and statistical text-to-speech system is developed that is able to synthesize audio and synchronized tongue motion from text. German Research Foundation

Universaar

Three-dimensional modeling of tongue during speech using MRI data

Author: Freitas Diamantino Rui
Tavares João Manuel
Ventura Sandra Moreira Rua
Publication date: 17/11/2013

The tongue is the most important and dynamic articulator for speech formation, because of its anatomic aspects (particularly, the large volume of this muscular organ compared to the surrounding organs of the vocal tract) and also due to the wide range of movements and flexibility that are involved. In speech communication research, a variety of techniques have been used for measuring the three-dimensional vocal tract shapes. More recently, magnetic resonance imaging (MRI) has become common, mainly because this technique allows the collection of a set of static and dynamic images that can represent the entire vocal tract along any orientation. Over the years, different anatomical organs of the vocal tract have been modelled, namely 2D and 3D tongue models, using parametric or statistical modelling procedures. Our aims are to present and describe some 3D reconstructed models from MRI data, for one subject uttering sustained articulations of some typical Portuguese sounds. Thus, we present a 3D database of the tongue obtained by stack combinations with the subject articulating Portuguese vowels. This 3D knowledge of the speech organs could be very important, especially for clinical purposes (for example, for the assessment of articulatory impairments following tongue surgery in speech rehabilitation), and also for a better understanding of acoustic theory in speech formation.

Repositório Científico do Instituto Politécnico do Porto

A multilinear tongue model derived from speech related MRI data of the human vocal tract

Author: Alexander Hewer
Ingmar Steiner
Korin Richmond
Stefanie Wuhrer
Publication venue: 'Elsevier BV'
Publication date: 21/02/2018

We present a multilinear statistical model of the human tongue that captures anatomical and tongue pose related shape variations separately. The model is derived from 3D magnetic resonance imaging data of 11 speakers sustaining speech related vocal tract configurations. The extraction is performed by using a minimally supervised method that uses as basis an image segmentation approach and a template fitting technique. Furthermore, it uses image denoising to deal with possibly corrupt data, palate surface information reconstruction to handle palatal tongue contacts, and a bootstrap strategy to refine the obtained shapes. Our evaluation concludes that limiting the degrees of freedom for the anatomical and speech related variations to 5 and 4, respectively, produces a model that can reliably register unknown data while avoiding overfitting effects. Furthermore, we show that it can be used to generate a plausible tongue animation by tracking sparse motion capture data

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Edinburgh Research Explorer