3,451 research outputs found

    Using Active Shape Modeling Based on MRI to Study Morphologic and Pitch-Related Functional Changes Affecting Vocal Structures and the Airway

    Copyright © 2013 The Voice Foundation. Published by Mosby, Inc. All rights reserved. Peer reviewed. Postprint.

    Towards Automatic Speech Identification from Vocal Tract Shape Dynamics in Real-time MRI

    Vocal tract configurations play a vital role in generating distinguishable speech sounds, by modulating the airflow and creating different resonant cavities in speech production. They contain abundant information that can be utilized to better understand the underlying speech production mechanism. As a step towards automatic mapping of vocal tract shape geometry to acoustics, this paper employs effective video action recognition techniques, like Long-term Recurrent Convolutional Networks (LRCN) models, to identify different vowel-consonant-vowel (VCV) sequences from dynamic shaping of the vocal tract. Such a model typically combines a CNN-based deep hierarchical visual feature extractor with recurrent networks, which ideally makes the network spatio-temporally deep enough to learn the sequential dynamics of a short video clip for video classification tasks. We use a database consisting of 2D real-time MRI of vocal tract shaping during VCV utterances by 17 speakers. The comparative performances of this class of algorithms under various parameter settings and for various classification tasks are discussed. Interestingly, the results show a marked difference in the model performance in the context of speech classification with respect to generic sequence or video classification tasks. Comment: To appear in the INTERSPEECH 2018 Proceedings.

    Magnetic resonance imaging of the brain and vocal tract: Applications to the study of speech production and language learning

    The human vocal system is highly plastic, allowing for the flexible expression of language, mood and intentions. However, this plasticity is not stable throughout the life span, and it is well documented that adult learners encounter greater difficulty than children in acquiring the sounds of foreign languages. Researchers have used magnetic resonance imaging (MRI) to interrogate the neural substrates of vocal imitation and learning, and the correlates of individual differences in phonetic “talent”. In parallel, a growing body of work using MR technology to directly image the vocal tract in real time during speech has offered primarily descriptive accounts of phonetic variation within and across languages. In this paper, we review the contribution of neural MRI to our understanding of vocal learning, and give an overview of vocal tract imaging and its potential to inform the field. We propose methods by which our understanding of speech production and learning could be advanced through the combined measurement of articulation and brain activity using MRI. Specifically, we describe a novel paradigm, developed in our laboratory, that uses both MRI techniques to map directly, for the first time, between neural, articulatory and acoustic data in the investigation of vocalisation. This non-invasive, multimodal imaging method could be used to track central and peripheral correlates of spoken language learning and of speech recovery in clinical settings, as well as provide insights into potential sites for targeted neural interventions.

    An open-source toolbox for measuring vocal tract shape from real-time magnetic resonance images

    Real-time magnetic resonance imaging (rtMRI) is a technique that provides high-contrast videographic data of human anatomy in motion. Applied to the vocal tract, it is a powerful method for capturing the dynamics of speech and other vocal behaviours by imaging structures internal to the mouth and throat. These images provide a means of studying the physiological basis for speech, singing, expressions of emotion, and swallowing, which is otherwise not accessible to external observation. However, taking quantitative measurements from these images is notoriously difficult. We introduce a signal processing pipeline that produces outlines of the vocal tract from the lips to the larynx as a quantification of the dynamic morphology of the vocal tract. Our approach performs simple tissue classification, but constrained to a researcher-specified region of interest. This combination facilitates feature extraction while retaining the domain-specific expertise of a human analyst. We demonstrate that this pipeline generalises well across datasets covering behaviours such as speech, vocal size exaggeration, laughter, and whistling, and that it produces reliable outcomes across analysts, particularly among users with domain-specific expertise. With this article, we make this pipeline available for immediate use by the research community, and further suggest that it may contribute to the continued development of fully automated methods based on deep learning algorithms.

    Active adjustment of the cervical spine during pitch production compensates for shape: The ArtiVarK study

    The anterior lordosis of the cervical spine is thought to contribute to pitch (fo) production by influencing cricoid rotation as a function of larynx height. Using the ArtiVarK dataset, which contains real-time MRI pitch production data, this study examines inter-individual variation in cervical spine shape and whether it influences how fo is produced along increasing or decreasing scales. We find that the cervical spine actively participates in fo production, but the amount of displacement depends on individual shape. In general, anterior spine motion (tending towards cervical lordosis) occurs for low fo, while posterior movement (tending towards cervical kyphosis) occurs for high fo.

    Segmentation and 3D reconstruction of the vocal tract from MR images - a comparative study

    Speech production is an important human function involving a set of organs with specific morphological and dynamic aspects. Inter-speaker variability, coarticulation, and nasality are among the aspects that a realistic 3D model of the vocal tract should capture. Understanding the mechanism of speech production is therefore crucial, as current image data are not sufficient to reproduce a speaker's anatomy and articulation faithfully. Hence, the goal of 3D modeling is to generate the complete geometrical and dynamical information concerning the vocal tract from medical images, such as magnetic resonance imaging (MRI). This work describes and compares two segmentation techniques for obtaining the 3D shape of the vocal tract during speech production from MR images: the first based on manual tracing of the vocal tract contours and the second based on image thresholding. The segmented cross-sectional areas were measured, and 3D models were built from the sagittal data by blending the contours obtained from the two segmentation techniques. The mean error of the computed measures was low for both segmentation techniques, from which we conclude that both are useful for evaluating the vocal tract geometry accurately. Additionally, the 3D models built using the two segmentation techniques were very similar and faithful. However, when the coronal data was used, various difficulties occurred.

    A multispeaker dataset of raw and reconstructed speech production real-time MRI video and 3D volumetric images

    Real-time magnetic resonance imaging (RT-MRI) of human speech production is enabling significant advances in speech science, linguistics, bio-inspired speech technology development, and clinical applications. Easy access to RT-MRI is however limited, and comprehensive datasets with broad access are needed to catalyze research across numerous domains. The imaging of the rapidly moving articulators and dynamic airway shaping during speech demands high spatio-temporal resolution and robust reconstruction methods. Further, while reconstructed images have been published, to date there is no open dataset providing raw multi-coil RT-MRI data from an optimized speech production experimental setup. Such datasets could enable new and improved methods for dynamic image reconstruction, artifact correction, feature extraction, and direct extraction of linguistically relevant biomarkers. The present dataset offers a unique corpus of 2D sagittal-view RT-MRI videos along with synchronized audio for 75 subjects performing linguistically motivated speech tasks, alongside the corresponding first-ever public domain raw RT-MRI data. The dataset also includes 3D volumetric vocal tract MRI during sustained speech sounds and high-resolution static anatomical T2-weighted upper airway MRI for each subject. Comment: 27 pages, 6 figures, 5 tables, submitted to Nature Scientific Data.

    High-resolution three-dimensional hybrid MRI + low dose CT vocal tract modeling: A cadaveric pilot study

    Summary
    Objectives: MRI-based vocal tract models have many applications in voice research and education. These models do not adequately capture bony structures (e.g. teeth, mandible), and spatial resolution is often relatively low in order to minimize scanning time. Most MRI sequences achieve 3D vocal tract coverage at gross resolutions of 2 mm³ within a scan time of <20 seconds. Computed tomography (CT) is well suited for vocal tract imaging, but is infrequently used due to the risk of ionizing radiation. In this cadaveric study, a single, extremely low-dose CT scan of the bony structures is blended with accelerated high-resolution (1 mm³) MRI scans of the soft tissues, creating a high-resolution hybrid CT-MRI vocal tract model.
    Methods: Minimum CT dosages were determined and a custom 16-channel airway receiver coil for accelerated high-resolution (1 mm³) MRI was evaluated. A rigid-body, landmark-based partial volume registration scheme was then applied to the images, creating a hybrid CT-MRI model that was segmented in Slicer.
    Results: Ultra-low-dose CT produced images of sufficient quality to clearly visualize the bone, and exposed the cadaver to 0.06 mSv. This is comparable to atmospheric exposures during a round-trip transatlantic flight. The custom 16-channel vocal tract coil produced acceptable image quality at 1 mm³ resolution when reconstructed from ∼6-fold undersampled data. High-resolution (1 mm³) MR imaging of short (<10 seconds) sustained sounds was achieved. The feasibility of hybrid CT-MRI vocal tract modeling was successfully demonstrated using the rigid-body, landmark-based partial volume registration scheme. Segmentations of CT and hybrid CT-MRI images provided more detailed 3D representations of the vocal tract than 2 mm³ MRI-based segmentations.
    Conclusions: The method described in this study indicates that high-resolution CT and MR image sets can be combined so that structures such as teeth and bone are accurately represented in vocal tract reconstructions. Such scans will aid learning and deepen understanding of anatomical features that relate to voice production, as well as furthering knowledge of the static and dynamic functioning of individual structures relating to voice production.

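    Morfometria do trato vocal por ressonância magnética: simulação de padrões patológicos articulatórios [Vocal tract morphometry by magnetic resonance imaging: simulation of pathological articulatory patterns]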

    Introduction: The shape analysis, or morphometry, of anatomical structures such as the vocal tract can be performed on two-dimensional (2D) images or on volumetric (3D) acquisitions from magnetic resonance imaging (MRI). This imaging technique is increasingly used in the study of speech production.
    Objectives: To demonstrate how morphometric analysis of the vocal tract can be performed from MRI, and to present normal anatomical patterns during the production of the vowels [i a u] together with two pathological articulatory patterns in a simulated context.
    Methods: The images were collected from 2D (Turbo Spin Echo) and 3D (Flash Gradient Echo) MRI acquisitions of four subjects during production of the vowels under study; in addition, two articulatory disorders were assessed using the same MRI protocol. The morphometry of the vocal tract was extracted using manual techniques (for five articulatory measurements) and automatic techniques (for volume determination) of image processing and analysis.
    Results: Based on five measurements describing articulator positioning, it was possible to analyse the entire vocal tract during vowel production, including the position and shape of the articulators. These measurements made it possible to identify the strategies most commonly adopted in producing each sound, namely the articulatory posture and the variation of each measurement for each subject. Across speakers, the estimated vocal tract volumes for each sound varied markedly; in particular, vocal tract volume increased in the articulatory disorder of sigmatism.
    Conclusion: MRI is a promising technique for the study of speech: it is safe and non-invasive, and it provides reliable information on vocal tract morphometry.