4 research outputs found
A multispeaker dataset of raw and reconstructed speech production real-time MRI video and 3D volumetric images
Real-time magnetic resonance imaging (RT-MRI) of human speech production is
enabling significant advances in speech science, linguistics, bio-inspired
speech technology development, and clinical applications. However, easy access
to RT-MRI is limited, and comprehensive datasets with broad access are needed to
catalyze research across numerous domains. The imaging of the rapidly moving
articulators and dynamic airway shaping during speech demands high
spatio-temporal resolution and robust reconstruction methods. Further, while
reconstructed images have been published, to date there is no open dataset
providing raw multi-coil RT-MRI data from an optimized speech production
experimental setup. Such datasets could enable new and improved methods for
dynamic image reconstruction, artifact correction, feature extraction, and
direct extraction of linguistically relevant biomarkers. The present dataset
offers a unique corpus of 2D sagittal-view RT-MRI videos along with
synchronized audio for 75 subjects performing linguistically motivated speech
tasks, alongside the corresponding first-ever public domain raw RT-MRI data.
The dataset also includes 3D volumetric vocal tract MRI during sustained speech
sounds and high-resolution static anatomical T2-weighted upper airway MRI for
each subject.
Comment: 27 pages, 6 figures, 5 tables, submitted to Nature Scientific Data
Rapid dynamic speech imaging at 3 Tesla using combination of a custom vocal tract coil, variable density spirals and manifold regularization
Purpose: To improve dynamic speech imaging at 3 Tesla.
Methods: A novel scheme combining a 16-channel vocal tract coil, variable
density spirals (VDS), and manifold regularization was developed. Short readout
duration spirals (1.3 ms long) were used to minimize sensitivity to
off-resonance. The manifold model leveraged similarities between frames sharing
similar vocal tract postures without explicit motion binning. Reconstruction
was posed as a SENSE-based non-local soft weighted temporal regularization
scheme. The self-navigating capability of VDS was leveraged to learn the
structure of the manifold. Our approach was compared against low-rank and
finite difference reconstruction constraints on two volunteers performing
repetitive and arbitrary speaking tasks. Blinded image quality evaluation in
the categories of alias artifacts, spatial blurring, and temporal blurring was
performed by three experts in voice research.
Results: We achieved a spatial resolution of 2.4 mm²/pixel and a temporal
resolution of 17.4 ms/frame for single slice imaging, and 52.2 ms/frame for
concurrent 3-slice imaging. Implicit motion binning of the manifold scheme for
both repetitive and fluent speaking tasks was demonstrated. The manifold scheme
provided superior fidelity in modeling articulatory motion compared to the
low-rank and temporal finite difference schemes. This was reflected in higher
image quality scores in the spatial and temporal blurring categories. Our
technique exhibited faint alias artifacts, but offered a reduced interquartile
range of scores compared to the other methods in the alias artifact category.
Conclusion: Synergistic combination of a custom vocal-tract coil, variable
density spirals and manifold regularization enables robust dynamic speech
imaging at 3 Tesla.
Comment: 30 pages, 10 figures
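As a rough illustration of the manifold regularization idea described in this abstract (a toy sketch, not the authors' implementation), the code below recovers a sequence of frames while softly encouraging frames with similar vocal-tract postures to agree. The forward operator `A`, the similarity weights `W`, and all parameter values are illustrative stand-ins; the actual method operates on multi-coil spiral k-space data with weights learned from the VDS self-navigators.

```python
import numpy as np

def manifold_regularized_recon(A, b, W, lam=0.1, step=1e-3, n_iter=500):
    """Toy sketch of manifold-regularized recovery (illustrative only).

    Minimizes  ||x A^T - b||_F^2 + lam * sum_{i,j} W[i,j] ||x[i] - x[j]||^2
    by gradient descent.  Row x[i] stands in for the image of frame i,
    A for a per-frame forward (coil/sampling) operator, and W for learned
    frame-similarity weights: frames with similar vocal-tract postures get
    large W[i,j] and are softly pulled together, with no explicit motion
    binning.
    """
    n_frames = b.shape[0]
    n_pix = A.shape[1]
    x = np.zeros((n_frames, n_pix))
    # Graph Laplacian of the frame-similarity graph (W assumed symmetric),
    # so that sum_{i,j} W[i,j] ||x[i]-x[j]||^2 = 2 * trace(x^T L x).
    L = np.diag(W.sum(axis=1)) - W
    for _ in range(n_iter):
        grad = 2.0 * (x @ A.T - b) @ A + 4.0 * lam * (L @ x)
        x -= step * grad
    return x
```

The non-local character of the regularizer is the key point: `W` connects any pair of frames with similar postures, not just temporal neighbors, which is what distinguishes this scheme from temporal finite difference constraints.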
Vocal tract cross-distance estimation from real-time MRI using region-of-interest analysis
Real-Time Magnetic Resonance Imaging affords speech articulation data with good spatial and temporal resolution and complete midsagittal views of the moving vocal tract, but also brings many challenges in the domain of image processing and analysis. Region-of-interest analysis has previously been proposed for simple, efficient and robust extraction of linguistically meaningful constriction degree information. However, the accuracy of such methods has not been rigorously evaluated, and no method has been proposed to calibrate the pixel intensity values or convert them into absolute measurements of length. This work provides such an evaluation, as well as insights into the placement of regions in the image plane and calibration of the resultant pixel intensity measurements. Measurement errors are shown to be generally at or below the spatial resolution of the imaging protocol, with a high degree of consistency across time and overall vocal tract configuration, validating the utility of this method of image analysis.
4 page(s)
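The general region-of-interest idea can be sketched as follows; the linear calibration model, function names, and reference intensities here are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def roi_cross_distance(frame, roi, i_air, i_tissue, pixel_mm):
    """Estimate a vocal-tract cross-distance from mean ROI intensity (sketch).

    In midsagittal RT-MRI, tissue appears bright and the airway dark, so the
    mean intensity of a rectangular ROI spanning the airway falls roughly
    linearly as the constriction opens.  Calibrating against reference
    intensities for pure air (i_air) and pure tissue (i_tissue) converts the
    intensity into an absolute length.

    frame:    2D image array
    roi:      (row_slice, col_slice) rectangle crossing the airway, with the
              cross-distance axis taken along the rows here
    pixel_mm: in-plane pixel size in mm
    """
    rows, cols = roi
    patch = frame[rows, cols]
    # Fraction of the ROI occupied by air, clipped to [0, 1] for robustness
    # to noise outside the calibration range.
    air_frac = np.clip((i_tissue - patch.mean()) / (i_tissue - i_air), 0.0, 1.0)
    roi_len_px = rows.stop - rows.start  # ROI extent along the cross-distance axis
    return air_frac * roi_len_px * pixel_mm
```

For example, on a synthetic frame where a 10-pixel-tall ROI contains a 4-pixel-wide dark band, the estimate is 0.4 of the ROI extent times the pixel size, illustrating why calibration of `i_air` and `i_tissue` and careful ROI placement drive the accuracy evaluated in the paper.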