Using Active Shape Modeling Based on MRI to Study Morphologic and Pitch-Related Functional Changes Affecting Vocal Structures and the Airway

Author: Aspden Richard Malcolm
Gilbert Fiona Jane
Gregory Jenny
Miller Nicola
Stollery Pete
Publication venue: 'Elsevier BV'
Publication date: 15/03/2014

Towards Automatic Speech Identification from Vocal Tract Shape Dynamics in Real-time MRI

Author: Fels Sidney
Saha Pramit
Srungarapu Praneeth
Publication date: 29/07/2018

Vocal tract configurations play a vital role in generating distinguishable speech sounds by modulating the airflow and creating different resonant cavities in speech production. They contain abundant information that can be utilized to better understand the underlying speech production mechanism. As a step towards automatic mapping of vocal tract shape geometry to acoustics, this paper employs effective video action recognition techniques, such as Long-term Recurrent Convolutional Network (LRCN) models, to identify different vowel-consonant-vowel (VCV) sequences from the dynamic shaping of the vocal tract. Such a model typically combines a CNN-based deep hierarchical visual feature extractor with recurrent networks, which ideally makes the network spatio-temporally deep enough to learn the sequential dynamics of a short video clip for video classification tasks. We use a database consisting of 2D real-time MRI of vocal tract shaping during VCV utterances by 17 speakers. The comparative performance of this class of algorithms under various parameter settings and for various classification tasks is discussed. Interestingly, the results show a marked difference in model performance for speech classification compared with generic sequence or video classification tasks.

Comment: To appear in the INTERSPEECH 2018 Proceeding

arXiv.org e-Print Archive

Crossref

Magnetic resonance imaging of the brain and vocal tract: Applications to the study of speech production and language learning

Author: Daniel Carey
Carolyn McGettigan
Publication venue: 'Elsevier BV'
Publication date: 01/04/2017

The human vocal system is highly plastic, allowing for the flexible expression of language, mood and intentions. However, this plasticity is not stable throughout the life span, and it is well documented that adult learners encounter greater difficulty than children in acquiring the sounds of foreign languages. Researchers have used magnetic resonance imaging (MRI) to interrogate the neural substrates of vocal imitation and learning, and the correlates of individual differences in phonetic “talent”. In parallel, a growing body of work using MR technology to directly image the vocal tract in real time during speech has offered primarily descriptive accounts of phonetic variation within and across languages. In this paper, we review the contribution of neural MRI to our understanding of vocal learning, and give an overview of vocal tract imaging and its potential to inform the field. We propose methods by which our understanding of speech production and learning could be advanced through the combined measurement of articulation and brain activity using MRI. Specifically, we describe a novel paradigm, developed in our laboratory, that uses both MRI techniques to map directly, for the first time, between neural, articulatory and acoustic data in the investigation of vocalisation. This non-invasive, multimodal imaging method could be used to track central and peripheral correlates of spoken language learning and of speech recovery in clinical settings, as well as to provide insights into potential sites for targeted neural interventions.

Crossref

Royal Holloway - Pure

UCL Discovery

An open-source toolbox for measuring vocal tract shape from real-time magnetic resonance images

Author: Belyk Michel
Carignan Christopher
McGettigan Carolyn
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 28/07/2023

Real-time magnetic resonance imaging (rtMRI) is a technique that provides high-contrast videographic data of human anatomy in motion. Applied to the vocal tract, it is a powerful method for capturing the dynamics of speech and other vocal behaviours by imaging structures internal to the mouth and throat. These images provide a means of studying the physiological basis for speech, singing, expressions of emotion, and swallowing, which are otherwise not accessible to external observation. However, taking quantitative measurements from these images is notoriously difficult. We introduce a signal processing pipeline that produces outlines of the vocal tract from the lips to the larynx as a quantification of the dynamic morphology of the vocal tract. Our approach performs simple tissue classification, constrained to a researcher-specified region of interest. This combination facilitates feature extraction while retaining the domain-specific expertise of a human analyst. We demonstrate that this pipeline generalises well across datasets covering behaviours such as speech, vocal size exaggeration, laughter, and whistling, and that it produces reliable outcomes across analysts, particularly among users with domain-specific expertise. With this article, we make this pipeline available for immediate use by the research community, and further suggest that it may contribute to the continued development of fully automated methods based on deep learning algorithms.
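The idea of tissue classification restricted to a region of interest can be illustrated with a minimal sketch. This is not the toolbox's code: the function name `classify_air_in_roi`, the rectangular ROI, the fixed intensity threshold and the toy frame are all assumptions made for illustration. Pixels darker than the threshold are labelled as airway, but only inside the analyst-chosen ROI.

```python
def classify_air_in_roi(image, roi, threshold):
    """Label dark (air) pixels, but only inside a rectangular ROI.

    image: list of rows of grey-level intensities (hypothetical toy frame)
    roi: (row_start, row_stop, col_start, col_stop), half-open bounds
    threshold: intensities strictly below this count as air (assumed value)
    """
    row_start, row_stop, col_start, col_stop = roi
    air = set()
    for r in range(row_start, row_stop):
        for c in range(col_start, col_stop):
            if image[r][c] < threshold:
                air.add((r, c))
    return air


# Toy 3x4 frame: bright values stand for tissue, dark values for air.
frame = [
    [90, 90, 10, 90],
    [90, 12, 11, 90],
    [8, 90, 90, 90],
]
roi = (0, 2, 1, 4)  # rows 0-1, columns 1-3; the dark pixel at (2, 0) lies outside
air = classify_air_in_roi(frame, roi, threshold=50)
print(sorted(air))  # [(0, 2), (1, 1), (1, 2)]
```

Restricting the classification to the ROI is what keeps the dark pixel at (2, 0) from being counted, which mirrors the role the abstract assigns to the human analyst.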

UCL Discovery

Active adjustment of the cervical spine during pitch production compensates for shape: The ArtiVarK study

Author: Dediu D.
Moisik S.
Zhi Yun D.
Publication date: 01/08/2019

The anterior lordosis of the cervical spine is thought to contribute to pitch (fo) production by influencing cricoid rotation as a function of larynx height. This study examines inter-individual variation in cervical spine shape and whether it influences how fo is produced along increasing or decreasing scales, using the ArtiVarK dataset, which contains real-time MRI pitch production data. We find that the cervical spine actively participates in fo production, but the amount of displacement depends on individual shape. In general, anterior spine motion (tending towards cervical lordosis) occurs for low fo, while posterior movement (tending towards cervical kyphosis) occurs for high fo.

MPG.PuRe

Segmentation and 3D reconstruction of the vocal tract from MR images - a comparative study

Author: D. R. Freitas
I. M. Ramos
João Manuel R. S. Tavares
S. R. Ventura
Publication date: 01/01/2010

Speech production is an important human function involving a set of organs with specific morphological and dynamic aspects. Inter-speaker variability, coarticulation and nasality are some of the aspects of interest for improving realistic 3D modeling of the vocal tract. For this, understanding the mechanism of speech production is crucial, as current image data are not sufficient to reproduce the speaker's anatomy and articulation faithfully. Hence, the goal of 3D modeling is to generate the complete geometrical and dynamical information concerning the vocal tract from medical images, such as magnetic resonance imaging (MRI). This work describes and compares two segmentation techniques for obtaining the 3D shape of the vocal tract during speech production from MR images: the former based on manual tracing of the vocal tract contours and the latter based on image thresholding. The segmented cross-sectional areas were measured, and 3D models were built from the sagittal data by blending the contours obtained from the two segmentation techniques. The mean error of the computed measures was low for both segmentation techniques, which lets us conclude that the techniques are useful for evaluating the vocal tract geometry accurately. Additionally, the 3D models built using both segmentation techniques were very similar and faithful. However, when the coronal data was used, various difficulties occurred.
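As a rough illustration of the thresholding route described in this abstract, the sketch below segments each slice by a fixed threshold, converts the pixel count to a cross-sectional area, and sums the areas into a volume estimate. It is a simplified stand-in, not the authors' procedure: the threshold, pixel size, slice spacing, toy slices and all function names are assumptions.

```python
def threshold_mask(image, threshold):
    """Binary mask: True where intensity is below the (assumed) air threshold."""
    return [[value < threshold for value in row] for row in image]


def cross_sectional_area(mask, pixel_w_mm, pixel_h_mm):
    """Area in mm^2 = number of segmented pixels x area of one pixel."""
    n_pixels = sum(1 for row in mask for flag in row if flag)
    return n_pixels * pixel_w_mm * pixel_h_mm


def stack_volume(areas_mm2, slice_spacing_mm):
    """Volume in mm^3, approximating each slice as a slab of constant area."""
    return sum(areas_mm2) * slice_spacing_mm


# Two toy 2x2 slices; dark values stand for the airway lumen.
slices = [
    [[100, 20], [15, 100]],
    [[10, 10], [10, 100]],
]
areas = [
    cross_sectional_area(threshold_mask(s, threshold=50), 1.5, 1.5)
    for s in slices
]
volume = stack_volume(areas, slice_spacing_mm=2.0)
print(areas, volume)
```

A manual-tracing workflow would replace `threshold_mask` with a hand-drawn mask; the area and volume steps would stay the same, which is what makes the two segmentation techniques directly comparable.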

Repositório Aberto da Universidade do Porto

A multispeaker dataset of raw and reconstructed speech production real-time MRI video and 3D volumetric images

Author: Bliesener Yannick
Byrd Dani
Chen Weiyi
Godinez Bianca
Goldstein Louis
Harper Sarah
Lee Yoonjeong
Lim Yongwan
Lingala Sajan Goud
Montesserin Mairym Lloréns
Narayanan Shrikanth S.
Nayak Krishna S.
Oh Miran
Smith Caitlin
Sorensen Tanner
Tian Ye
Toutios Asterios
Töger Johannes
Vaz Colin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 15/02/2021

Real-time magnetic resonance imaging (RT-MRI) of human speech production is enabling significant advances in speech science, linguistics, bio-inspired speech technology development, and clinical applications. Easy access to RT-MRI is however limited, and comprehensive datasets with broad access are needed to catalyze research across numerous domains. The imaging of the rapidly moving articulators and dynamic airway shaping during speech demands high spatio-temporal resolution and robust reconstruction methods. Further, while reconstructed images have been published, to date there is no open dataset providing raw multi-coil RT-MRI data from an optimized speech production experimental setup. Such datasets could enable new and improved methods for dynamic image reconstruction, artifact correction, feature extraction, and direct extraction of linguistically relevant biomarkers. The present dataset offers a unique corpus of 2D sagittal-view RT-MRI videos along with synchronized audio for 75 subjects performing linguistically motivated speech tasks, alongside the corresponding first-ever public domain raw RT-MRI data. The dataset also includes 3D volumetric vocal tract MRI during sustained speech sounds and high-resolution static anatomical T2-weighted upper airway MRI for each subject.

Comment: 27 pages, 6 figures, 5 tables, submitted to Nature Scientific Dat

arXiv.org e-Print Archive

Directory of Open Access Journals

High-resolution three-dimensional hybrid MRI + low dose CT vocal tract modeling: A cadaveric pilot study

Author: Alam Wahidul
Atha Jarron
Christensen Gary E
Hoffman Eric
Howard David
Lingala Sajan Goud
Meyer David
Rusho Rushdi Zahid
Story Brad
Titze Ingo R
Publication date: 28/10/2022

Summary. Objectives - MRI-based vocal tract models have many applications in voice research and education. These models do not adequately capture bony structures (e.g. teeth, mandible), and spatial resolution is often relatively low in order to minimize scanning time. Most MRI sequences achieve 3D vocal tract coverage at gross resolutions of 2 mm³ within a scan time of <20 seconds. Computed tomography (CT) is well suited for vocal tract imaging, but is infrequently used due to the risk of ionizing radiation. In this cadaveric study, a single, extremely low-dose CT scan of the bony structures is blended with accelerated high-resolution (1 mm³) MRI scans of the soft tissues, creating a high-resolution hybrid CT-MRI vocal tract model. Methods - Minimum CT dosages were determined and a custom 16-channel airway receiver coil for accelerated high-resolution (1 mm³) MRI was evaluated. A rigid-body, landmark-based partial volume registration scheme was then applied to the images, creating a hybrid CT-MRI model that was segmented in Slicer. Results - Ultra-low dose CT produced images with sufficient quality to clearly visualize the bone, and exposed the cadaver to 0.06 mSv. This is comparable to atmospheric exposures during a round-trip transatlantic flight. The custom 16-channel vocal tract coil produced acceptable image quality at 1 mm³ resolution when reconstructed from approximately 6-fold undersampled data. High-resolution (1 mm³) MR imaging of short (<10 seconds) sustained sounds was achieved. The feasibility of hybrid CT-MRI vocal tract modeling was successfully demonstrated using the rigid-body, landmark-based partial volume registration scheme. Segmentations of CT and hybrid CT-MRI images provided more detailed 3D representations of the vocal tract than 2 mm³ MRI-based segmentations. Conclusions - The method described in this study indicates that high-resolution CT and MR image sets can be combined so that structures such as teeth and bone are accurately represented in vocal tract reconstructions. Such scans will aid learning and deepen understanding of anatomical features that relate to voice production, as well as furthering knowledge of the static and dynamic functioning of individual structures relating to voice production.
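Rigid-body landmark registration, in general terms, estimates the rotation and translation that best map one set of landmarks onto a matching set. The sketch below shows the closed-form least-squares solution for the 2D case only; it is a deliberately simplified illustration, not the study's 3D partial volume scheme, and the function names and landmark coordinates are hypothetical.

```python
import math


def fit_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src onto dst.

    src, dst: equal-length lists of (x, y) landmark pairs (hypothetical data).
    Returns (theta, (tx, ty)) such that dst ~= R(theta) @ src + t.
    """
    n = len(src)
    src_cx = sum(p[0] for p in src) / n
    src_cy = sum(p[1] for p in src) / n
    dst_cx = sum(q[0] for q in dst) / n
    dst_cy = sum(q[1] for q in dst) / n

    dot = 0.0    # sum of p . q over centred landmark pairs
    cross = 0.0  # sum of p x q over centred landmark pairs
    for (px, py), (qx, qy) in zip(src, dst):
        px, py = px - src_cx, py - src_cy
        qx, qy = qx - dst_cx, qy - dst_cy
        dot += px * qx + py * qy
        cross += px * qy - py * qx

    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation carries the rotated source centroid onto the target centroid.
    tx = dst_cx - (c * src_cx - s * src_cy)
    ty = dst_cy - (s * src_cx + c * src_cy)
    return theta, (tx, ty)


def apply_rigid_2d(theta, t, point):
    """Apply the fitted rotation and translation to one (x, y) point."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = point
    return (c * x - s * y + t[0], s * x + c * y + t[1])


# Landmarks rotated by 90 degrees and shifted by (2, 3).
mri_landmarks = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
ct_landmarks = [(2.0, 3.0), (2.0, 4.0), (1.0, 3.0)]
theta, t = fit_rigid_2d(mri_landmarks, ct_landmarks)
print(math.degrees(theta), t)
```

In 3D the same least-squares problem is usually solved with a singular value decomposition rather than a single angle, but the structure (centre both landmark sets, estimate the rotation, then recover the translation) is the same.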

Royal Holloway - Pure

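Morfometria do trato vocal por ressonância magnética: simulação de padrões patológicos articulatórios [Vocal tract morphometry by magnetic resonance imaging: simulation of pathological articulatory patterns]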

Author: Diamantino Rui S. Freitas
Isabel Maria A. P. Ramos
João Manuel R. S. Tavares
Sandra M. Rua Ventura
Publication date: 01/01/2014

Introduction - The shape analysis, or morphometry, of anatomical structures such as the vocal tract can be performed from two-dimensional (2D) images or from volumetric (3D) acquisitions of magnetic resonance imaging (MRI). This imaging technique has seen increasing use in the study of speech production. Objectives - To demonstrate how morphometric analysis of the vocal tract can be performed from magnetic resonance imaging, and to present normal anatomical patterns during the production of the vowels [i a u] together with two pathological articulatory patterns in a simulated context. Methods - The image data were collected from 2D (Turbo Spin Echo) and 3D (Flash Gradient Echo) MRI acquisitions of four subjects during the production of the vowels under study; in addition, two articulatory disorders were assessed using the same MRI protocol. The morphometry of the vocal tract was extracted using manual techniques (to extract five articulatory measurements) and automatic techniques (to determine volumes) of image processing and analysis. Results - Based on five measurements describing the positioning of the articulators during vowel production, it was possible to analyse the entire vocal tract, including the position and shape of the articulators. These measurements made it possible to identify the strategies most commonly adopted in the production of each sound, namely the articulatory posture and the variation of each measurement for each of the subjects under study. Across speakers, there was notable variability in the estimated vocal tract volumes for each sound and, in particular, an increased vocal tract volume in the articulatory disorder of sigmatism. Conclusion - MRI is a promising technique for the study of speech: it is safe and non-invasive, and it provides reliable information on the morphometry of the vocal tract.

Repositório Científico do Instituto Politécnico do Porto

Directory of Open Access Journals

Repositório Aberto da Universidade do Porto