22 research outputs found

    Synthesize MRI vocal tract data during CV production

    A set of rtMRI image transformations across time is computed during the production of a CV sequence and is then applied to a new speaker in order to synthesize his/her pseudo-rtMRI CV data. The synthesized images are compared with the original ones using image cross-correlation. Purpose: to enlarge MRI speech corpora by synthesizing data
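The comparison step described above, scoring synthesized frames against the originals by image cross-correlation, can be sketched as follows. This is a minimal illustration, not the authors' pipeline; the frame arrays are hypothetical placeholders.

```python
import numpy as np

def normalized_cross_correlation(a: np.ndarray, b: np.ndarray) -> float:
    """Zero-mean normalized cross-correlation between two same-size images."""
    a = a.astype(float) - a.mean()
    b = b.astype(float) - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    if denom == 0.0:
        return 0.0  # one image is constant; correlation is undefined
    return float((a * b).sum() / denom)

# Identical frames correlate perfectly; an unrelated frame scores near zero.
rng = np.random.default_rng(0)
frame = rng.random((64, 64))
print(normalized_cross_correlation(frame, frame))            # 1.0 (up to rounding)
print(normalized_cross_correlation(frame, rng.random((64, 64))))
```

A score near 1 indicates that the synthesized frame closely reproduces the original; the zero-mean normalization makes the score insensitive to global brightness and contrast differences between acquisitions.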

    A Multimodal Real-Time MRI Articulatory Corpus of French for Speech Research

    In this work we describe the creation of ArtSpeechMRIfr: a real-time and static magnetic resonance imaging (rtMRI, 3D MRI) database of the vocal tract. The database also contains processed data: denoised audio, its phonetically aligned annotation, articulatory contours, and vocal tract volume information, which provides a rich resource for speech research. The database is built on data from two male speakers of French. It covers a number of phonetic contexts in the controlled part, as well as spontaneous speech, 3D MRI scans of sustained vocalic articulations, and scans of the dental casts of the subjects. The rtMRI corpus consists of 79 synthetic sentences constructed from a phonetized dictionary, which makes it possible to shorten the duration of the acquisitions while keeping very good coverage of the phonetic contexts that exist in French. The 3D MRI includes acquisitions for 12 French vowels and 10 consonants, each of which was pronounced in several vocalic contexts. Articulatory contours (tongue, jaw, epiglottis, larynx, velum, lips) as well as 3D volumes were manually drawn for a subset of the images

    Towards the prediction of the vocal tract shape from the sequence of phonemes to be articulated

    In this work, we address the prediction of the speech articulators' temporal geometric positions from the sequence of phonemes to be articulated. We start from a set of real-time MRI sequences uttered by a female French speaker. The contours of five articulators were tracked automatically in each frame of the MRI videos. We then explore the capacity of a bidirectional GRU to correctly predict each articulator's shape and position given the sequence of phonemes and their durations. We propose a 5-fold cross-validation experiment to evaluate the generalization capacity of the model. In a second experiment, we evaluate our model's data efficiency by reducing the training data. We evaluate the point-to-point Euclidean distance and Pearson's correlation along time between the predicted and the target shapes. We also evaluate the produced shapes of the critical articulators of specific phonemes. We show that our model can achieve good results with minimal data, producing very realistic vocal tract shapes
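The two evaluation metrics named in this abstract, point-to-point Euclidean distance and Pearson's correlation along time, can be sketched as below. The contour arrays and their shapes are illustrative assumptions, not the paper's actual data layout.

```python
import numpy as np

def mean_p2p_distance(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean Euclidean distance between corresponding contour points.
    pred, target: arrays of shape (frames, points, 2)."""
    return float(np.linalg.norm(pred - target, axis=-1).mean())

def pearson_over_time(pred: np.ndarray, target: np.ndarray) -> float:
    """Pearson correlation along the time axis, averaged over
    all contour coordinates."""
    p = pred.reshape(pred.shape[0], -1)
    t = target.reshape(target.shape[0], -1)
    p = p - p.mean(axis=0)
    t = t - t.mean(axis=0)
    denom = np.sqrt((p ** 2).sum(axis=0) * (t ** 2).sum(axis=0))
    r = (p * t).sum(axis=0) / np.where(denom == 0, 1.0, denom)
    return float(r.mean())

rng = np.random.default_rng(1)
target = rng.random((10, 50, 2))              # 10 frames, 50 contour points
pred = target + 0.01 * rng.random((10, 50, 2))  # a near-perfect prediction
print(mean_p2p_distance(pred, target), pearson_over_time(pred, target))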

    Towards a method of dynamic vocal tract shapes generation by combining static 3D and dynamic 2D MRI speech data

    We present an algorithm for augmenting the shape of the vocal tract using 3D static and 2D dynamic speech MRI data. While static 3D images have better resolution and provide spatial information, 2D dynamic images capture the transitions. The aim of this work is to combine the strengths of these two types of data to improve the image quality of the 2D dynamic images and to extend them to the 3D domain. To produce a 3D dynamic consonant-vowel (CV) sequence, our algorithm takes as input the 2D CV transition and the static 3D targets for C and V. To obtain the enhanced sequence of images, the first step is to find a transformation between the 2D images and the mid-sagittal slice of the acoustically corresponding 3D image stack, and then to find a transformation between neighbouring sagittal slices in the 3D static image stack. The combination of these transformations produces the final set of images. In the present study we first examined the transformation from the 3D mid-sagittal frame to the 2D video in order to improve image quality, and then the extension of the 2D video to the third dimension with the aim of enriching the spatial information
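The key idea above, chaining a 2D-frame-to-mid-sagittal-slice mapping with a slice-to-neighbouring-slice mapping, can be illustrated with affine transforms in homogeneous coordinates. This is a toy sketch under a strong simplifying assumption: the paper estimates dense image registrations, not the affine matrices used here, and the numeric values are made up.

```python
import numpy as np

def affine(scale: float = 1.0, tx: float = 0.0, ty: float = 0.0) -> np.ndarray:
    """3x3 homogeneous 2D affine: uniform scale plus translation."""
    return np.array([[scale, 0.0,   tx],
                     [0.0,   scale, ty],
                     [0.0,   0.0,   1.0]])

# Hypothetical mappings: 2D dynamic frame -> mid-sagittal 3D slice,
# then mid-sagittal slice -> an adjacent sagittal slice of the stack.
to_midsagittal = affine(scale=1.25, tx=2.0)
to_neighbour   = affine(ty=1.6)

# Composing the two gives a direct mapping from the 2D dynamic frame
# to the neighbouring slice, which is how the final images are produced.
composed = to_neighbour @ to_midsagittal
point = np.array([10.0, 4.0, 1.0])   # a contour point in homogeneous coords
print(composed @ point)              # [14.5  6.6  1. ]
```

Because homogeneous transforms compose by matrix multiplication, the per-frame and per-slice registrations can be estimated independently and then chained, which mirrors the two-step structure of the algorithm described in the abstract.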

    Automatic generation of the complete vocal tract shape from the sequence of phonemes to be articulated

    Articulatory speech synthesis requires generating realistic vocal tract shapes from the sequence of phonemes to be articulated. This work proposes the first model trained on rt-MRI films to automatically predict the contours of all of the vocal tract articulators. The data are the contours tracked in the rt-MRI database recorded for one speaker. Those contours were used to train an encoder-decoder network to map the sequence of phonemes and their durations to the exact gestures performed by the speaker. Unlike other works, all the individual articulator contours are predicted separately, allowing the investigation of their interactions. We measure four tract variables closely coupled with critical articulators and observe their variations over time. The test demonstrates that our model can produce high-quality shapes of the complete vocal tract, with good correlation between the predicted variables and the target variables observed in the rt-MRI films, even though the tract variables are not included in the optimization procedure
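A tract variable such as lip aperture can be read off predicted contours as the minimum distance between the two lip curves. The sketch below is illustrative only: the contour coordinates are hypothetical, and the paper's four tract variables are not necessarily defined this way.

```python
import numpy as np

def lip_aperture(upper: np.ndarray, lower: np.ndarray) -> float:
    """Minimum Euclidean distance between two contours of shape (n, 2)."""
    diffs = upper[:, None, :] - lower[None, :, :]   # all pairwise differences
    return float(np.linalg.norm(diffs, axis=-1).min())

# Toy lip contours in image coordinates (hypothetical values).
upper_lip = np.array([[0.0, 2.0], [1.0, 1.5], [2.0, 2.0]])
lower_lip = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 0.0]])
print(lip_aperture(upper_lip, lower_lip))   # 1.0: closest pair (1,1.5)-(1,0.5)
```

Measuring such a variable on both the predicted and the tracked contours, frame by frame, is one way to check that critical constrictions (e.g., lip closure for a bilabial) are reproduced even when the variable itself is not part of the training loss, as the abstract notes.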

    Intracellular Detection and Localization of Nanoparticles by Refractive Index Measurement

    The measurement of nanoparticle toxicity faces an important limitation: it is based on exposure metrics, i.e., the concentration to which cells are exposed, instead of the true concentration inside the cells. In vitro studies of nanomaterials would benefit from direct measurement of the true intracellular dose of nanoparticles. The objective of the present study was to determine whether the intracellular detection of nanodiamonds is possible by measuring the refractive index. Based on optical diffraction tomography of treated live cells, the results show that unlabeled nanoparticles can be detected and localized inside cells. The results were confirmed by fluorescence measurements. Optical diffraction tomography paves the way to measuring the true intracellular concentrations and localizations of nanoparticles, which will improve the dose-response paradigm of pharmacology and toxicology in the field of nanomaterials

    Super-Resolved Dynamic 3D Reconstruction of the Vocal Tract during Natural Speech

    MRI is the gold-standard modality for speech imaging. However, it remains relatively slow, which complicates imaging of fast movements; thus, MRI of the vocal tract is often performed in 2D. While 3D MRI provides more information, the quality of such images is often insufficient. The goal of this study was to test the applicability of super-resolution algorithms to dynamic vocal tract MRI. In total, 25 sagittal slices of 8 mm with an in-plane resolution of 1.6 × 1.6 mm2 were acquired consecutively using a highly undersampled radial 2D FLASH sequence. The volunteers read a text in French under two different protocols. The slices were aligned using the simultaneously recorded sound. The super-resolution strategy was used to reconstruct 1.6 × 1.6 × 1.6 mm3 isotropic volumes. The resulting images were less sharp than the native 2D images but demonstrated a higher signal-to-noise ratio. It was also shown that super-resolution eliminates inter-slice inconsistencies, yielding smooth transitions between the slices. Additionally, using visual stimuli and shorter text fragments improved the inter-slice consistency and the sharpness of the super-resolved images. Therefore, with an appropriate choice of speech task, the proposed method allows the reconstruction of high-quality dynamic 3D volumes of the vocal tract during natural speech
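The geometry of the problem, turning a stack of 25 thick (8 mm) slices into a 1.6 mm isotropic volume, can be illustrated with plain linear interpolation along the slice axis. This is only a baseline sketch of the resampling step, not the study's super-resolution reconstruction, which additionally exploits the overlap and alignment of the acquired data; the in-plane matrix size is a made-up example.

```python
import numpy as np

def to_isotropic(stack: np.ndarray, slice_mm: float = 8.0,
                 inplane_mm: float = 1.6) -> np.ndarray:
    """Linearly interpolate an anisotropic slice stack (slices, y, x)
    along the slice axis so that voxels become isotropic."""
    n = stack.shape[0]
    factor = slice_mm / inplane_mm                 # 5.0 for the protocol above
    new_n = int(round((n - 1) * factor)) + 1
    pos = np.linspace(0.0, n - 1, new_n)           # fractional slice positions
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, n - 1)
    w = (pos - lo)[:, None, None]                  # interpolation weights
    return (1 - w) * stack[lo] + w * stack[hi]

stack = np.random.default_rng(2).random((25, 136, 136))  # 25 slices of 8 mm
vol = to_isotropic(stack)
print(vol.shape)   # (121, 136, 136): 1.6 mm isotropic grid
```

Simple interpolation like this fills the through-plane gaps but cannot recover detail finer than the 8 mm slices; the point of the super-resolution strategy described above is precisely to go beyond this baseline while also raising the signal-to-noise ratio.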