2,558 research outputs found
The Effects of Humming and Pitch on Craniofacial and Craniocervical Morphology Measured Using MRI
Peer reviewed · Preprint
Relationships Between Vocal Structures, the Airway, and Craniocervical Posture Investigated Using Magnetic Resonance Imaging
Peer reviewed · Preprint
Ultrax: An Animated Midsagittal Vocal Tract Display for Speech Therapy
Speech sound disorders (SSDs) are the most common communication impairment in childhood and can hamper social development and learning. Current speech therapy interventions rely predominantly on the auditory skills of the child, as little technology is available to assist in the diagnosis and therapy of SSDs. Real-time visualisation of tongue movements has the potential to bring enormous benefit to speech therapy. Ultrasound scanning offers this possibility, although its display may be hard to interpret. Our ultimate goal is to exploit ultrasound to track tongue movement, while displaying a simplified, diagrammatic vocal tract that is easier for the user to interpret. In this paper, we outline a general approach to this problem, combining a latent space model with a dimensionality-reducing model of vocal tract shapes. We assess the feasibility of this approach using magnetic resonance imaging (MRI) scans to train a model of vocal tract shapes, which is animated using electromagnetic articulography (EMA) data from the same speaker. Index Terms: ultrasound, speech therapy, vocal tract visualisation
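The latent-space idea in this abstract can be sketched in a few lines. The following is a minimal illustration only, not the authors' model: the synthetic contour data, the choice of PCA (via SVD) as the dimensionality-reducing model, and the names `encode` and `decode` are all assumptions made for demonstration.

```python
import numpy as np

# Illustrative sketch: a linear latent-space model of vocal tract shapes.
# All data below is synthetic; dimensions and names are hypothetical.
rng = np.random.default_rng(0)

# Toy "MRI-derived" tongue contours: 200 frames x 40 (x, y) points, flattened.
n_frames, n_points = 200, 40
latent_true = rng.normal(size=(n_frames, 3))       # 3 underlying articulatory factors
basis_true = rng.normal(size=(3, 2 * n_points))
contours = latent_true @ basis_true + 0.01 * rng.normal(size=(n_frames, 2 * n_points))

# Fit a PCA "latent space" via SVD of the centred shape data.
mean_shape = contours.mean(axis=0)
U, S, Vt = np.linalg.svd(contours - mean_shape, full_matrices=False)
k = 3
components = Vt[:k]                                 # k x (2 * n_points) shape basis

def encode(shape):
    """Project a full contour onto the k-dimensional latent space."""
    return (shape - mean_shape) @ components.T

def decode(z):
    """Reconstruct a full contour (for a diagrammatic display) from latent coords."""
    return z @ components + mean_shape

# Round trip: in the paper's setting, sparse sensor data (e.g. EMA) would drive
# the latent coordinates, while decode() animates the simplified display.
recon = decode(encode(contours[0]))
err = np.linalg.norm(recon - contours[0]) / np.linalg.norm(contours[0])
```

The key property exploited here is that a few latent coordinates suffice to reconstruct the full shape, so a low-dimensional measurement stream can plausibly drive a full vocal tract display.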
A multispeaker dataset of raw and reconstructed speech production real-time MRI video and 3D volumetric images
Real-time magnetic resonance imaging (RT-MRI) of human speech production is enabling significant advances in speech science, linguistics, bio-inspired speech technology development, and clinical applications. Easy access to RT-MRI is, however, limited, and comprehensive datasets with broad access are needed to catalyze research across numerous domains. Imaging the rapidly moving articulators and the dynamic airway shaping during speech demands high spatio-temporal resolution and robust reconstruction methods. Further, while reconstructed images have been published, to date there is no open dataset providing raw multi-coil RT-MRI data from an optimized speech production experimental setup. Such datasets could enable new and improved methods for dynamic image reconstruction, artifact correction, feature extraction, and direct extraction of linguistically relevant biomarkers. The present dataset offers a unique corpus of 2D sagittal-view RT-MRI videos along with synchronized audio for 75 subjects performing linguistically motivated speech tasks, alongside the corresponding first-ever public-domain raw RT-MRI data. The dataset also includes 3D volumetric vocal tract MRI during sustained speech sounds and high-resolution static anatomical T2-weighted upper airway MRI for each subject.
Comment: 27 pages, 6 figures, 5 tables, submitted to Nature Scientific Data
Rapid dynamic speech imaging at 3 Tesla using combination of a custom vocal tract coil, variable density spirals and manifold regularization
Purpose: To improve dynamic speech imaging at 3 Tesla.
Methods: A novel scheme combining a 16-channel vocal tract coil, variable-density spirals (VDS), and manifold regularization was developed. Spirals with a short readout duration (1.3 ms) were used to minimize sensitivity to off-resonance. The manifold model leveraged similarities between frames sharing similar vocal tract postures, without explicit motion binning. Reconstruction was posed as a SENSE-based non-local, soft-weighted temporal regularization scheme, and the self-navigating capability of VDS was leveraged to learn the structure of the manifold. Our approach was compared against low-rank and temporal finite-difference reconstruction constraints on two volunteers performing repetitive and arbitrary speaking tasks. Blinded image quality evaluation in the categories of alias artifacts, spatial blurring, and temporal blurring was performed by three experts in voice research.
Results: We achieved a spatial resolution of 2.4 mm²/pixel and a temporal resolution of 17.4 ms/frame for single-slice imaging, and 52.2 ms/frame for concurrent three-slice imaging. Implicit motion binning by the manifold scheme was demonstrated for both repetitive and fluent speaking tasks. The manifold scheme provided superior fidelity in modeling articulatory motion compared to the low-rank and temporal finite-difference schemes, reflected in higher image quality scores in the spatial and temporal blurring categories. Our technique exhibited faint alias artifacts, but offered a reduced interquartile range of scores compared to the other methods in the alias-artifact category.
Conclusion: The synergistic combination of a custom vocal tract coil, variable-density spirals, and manifold regularization enables robust dynamic speech imaging at 3 Tesla.
Comment: 30 pages, 10 figures
Diphthong Synthesis Using the Dynamic 3D Digital Waveguide Mesh
Articulatory speech synthesis has the potential to offer more natural-sounding synthetic speech than established concatenative or parametric synthesis methods. Time-domain acoustic models are particularly suited to the dynamic nature of the speech signal, and recent work has demonstrated the potential of dynamic vocal tract models that accurately reproduce the vocal tract geometry. This paper presents a dynamic 3D digital waveguide mesh (DWM) vocal tract model capable of movement to produce diphthongs. The technique is compared to existing dynamic 2D and static 3D DWM models, for both monophthongs and diphthongs. The results indicate that the proposed model provides improved formant accuracy over existing DWM vocal tract models. Furthermore, the computational requirements of the proposed method are significantly lower than those of comparable dynamic simulation techniques. This paper represents another step toward a fully functional articulatory vocal tract model, which will lead to more natural speech synthesis systems for use across society.
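The waveguide principle behind the DWM can be illustrated in its simplest, one-dimensional form: two counter-propagating delay lines with reflections at the ends model a uniform acoustic tube. This is a greatly simplified cousin of the paper's 3D mesh, written as a sketch; the tube length, sample rate, and reflection coefficients below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative 1D digital waveguide: a uniform tube, closed at the glottis
# and open (lossy, sign-inverting) at the lips. A closed-open tube resonates
# near c / (4 L), i.e. roughly 500 Hz for a 17 cm tract.
fs = 44100.0
c = 343.0                      # speed of sound, m/s (assumed)
length = 0.17                  # tube length, m (typical vocal tract, assumed)
n = int(round(length / c * fs))  # one-way delay-line length in samples

right = np.zeros(n)            # wave travelling glottis -> lips
left = np.zeros(n)             # wave travelling lips -> glottis
n_samples = 8192
out = np.zeros(n_samples)
impulse = 1.0                  # excite once at the glottis end

for t in range(n_samples):
    # Boundary reflections: near-rigid glottis (+0.99), open lips (-0.99).
    glottal_in = 0.99 * left[0] + impulse
    impulse = 0.0
    lip_in = -0.99 * right[-1]
    out[t] = right[-1]         # signal radiated at the lips
    right = np.concatenate(([glottal_in], right[:-1]))
    left = np.concatenate((left[1:], [lip_in]))

# Locate the first resonance (the "formant" of the uniform tube).
spec = np.abs(np.fft.rfft(out))
freqs = np.fft.rfftfreq(n_samples, 1 / fs)
f1 = freqs[np.argmax(spec[freqs < 1500])]
```

The 2D and 3D DWM models compared in the paper generalize this scattering idea to a mesh of such junctions, which is what allows the cross-sectional geometry of the tract, and its movement during a diphthong, to shape the formants.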