
    Synthesize MRI vocal tract data during CV production

    A set of rtMR image transformations across time is computed during the production of a CV and is afterwards applied to a new speaker in order to synthesize his/her pseudo rtMRI CV data. Synthesized images are compared with the original ones using image cross-correlation. Purpose: to enlarge the MRI speech corpus by synthesizing data.
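    The steps just described (applying a sequence of precomputed image transformations to a new speaker's frame and comparing the result with the originals) can be sketched as follows, assuming the transformations are dense displacement fields; the registration step that produces them is not shown and the function names are illustrative only.

```python
# Sketch only: assumes each transformation is a dense displacement field
# of shape (2, H, W) holding (dy, dx); not the authors' implementation.
import numpy as np
from scipy.ndimage import map_coordinates

def warp(frame, field):
    """Apply a dense displacement field (dy, dx) to a 2D frame."""
    h, w = frame.shape
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([yy + field[0], xx + field[1]])
    return map_coordinates(frame, coords, order=1, mode="nearest")

def synthesize_cv(new_speaker_frame, fields):
    """Chain the transformations over time, starting from a single frame of
    the new speaker, to produce pseudo rtMRI CV frames."""
    frames, current = [], new_speaker_frame
    for field in fields:
        current = warp(current, field)
        frames.append(current)
    return frames

def cross_correlation(a, b):
    """Normalized image cross-correlation for comparing synthesized and
    original frames."""
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return float((a * b).mean())
```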

    3D dynamic spatiotemporal atlas of the vocal tract during consonant-vowel production from 2D real time MRI

    In this work we address the problem of creating a 3D dynamic atlas of the vocal tract that captures the dynamics of the articulators in all three dimensions, in order to create a global speaker model independent of speaker-specific characteristics. The core steps of the proposed method are temporal alignment of the real-time MR images acquired in several sagittal planes and their combination with adaptive kernel regression. As a preprocessing step, a reference space was created in order to remove the anatomical information of the speakers and keep only the variability in speech production for the construction of the atlas. The adaptive kernel regression makes the choice of atlas time points independent of the time points of the frames that are used as input for the construction. The atlas construction method was evaluated by mapping two new speakers to the atlas and checking how similar the resulting mapped images are. The use of the atlas helps in reducing subject variability. Results show that the proposed atlas can capture the dynamic behavior of the articulators and is able to generalize the speech production process by creating a universal-speaker reference space.
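    The combination step can be illustrated with plain Gaussian kernel regression over normalized time; this is a sketch under the assumptions that the frames are already registered to the common reference space and that the paper's adaptive bandwidth selection is replaced by a fixed bandwidth, with illustrative names throughout.

```python
# Sketch only: fixed-bandwidth Gaussian kernel regression standing in for
# the adaptive kernel regression described in the abstract.
import numpy as np

def kernel_regress_atlas(frames, times, atlas_times, bandwidth=0.05):
    """Build atlas frames as kernel-weighted averages of registered frames.

    frames      : array (N, H, W) of spatially registered images
    times       : array (N,) of normalized acquisition times in [0, 1]
    atlas_times : array (T,) of desired atlas time points
    """
    frames = np.asarray(frames, dtype=float)
    atlas = []
    for t in atlas_times:
        weights = np.exp(-0.5 * ((times - t) / bandwidth) ** 2)
        weights /= weights.sum()
        atlas.append(np.tensordot(weights, frames, axes=1))
    return np.stack(atlas)
```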

    MRI Vocal Tract Sagittal Slices Estimation during Speech Production of CV

    In this paper we propose an algorithm for estimating vocal tract parasagittal slices in order to have a better overview of the behaviour of the articulators during speech production. The first step is to align the consonant-vowel (CV) data of the sagittal planes with one another for the training speaker. Sets of transformations that connect the midsagittal frames with the neighbouring ones are acquired for the training speaker. Another set of transformations, which maps the midsagittal frames of the training speaker to the corresponding midsagittal frames of the test speaker, is used to adapt the previously computed sets of transformations to the test speaker's domain. The newly adapted transformations are applied to the midsagittal frames of the test speaker in order to estimate the neighbouring sagittal frames. Several single-speaker models are combined to produce the final frame estimation. To evaluate the results, image cross-correlation between the original and the estimated frames was used. Results show good agreement between the original and the estimated frames.
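    A rough sketch of the adaptation and estimation steps follows, under the assumptions that all transformations are dense displacement fields and that adapting a field to the test speaker can be approximated by warping its components with the midsagittal speaker-mapping field (a crude stand-in for a proper composition of transformations); the names are illustrative only.

```python
# Sketch only: field "adaptation" approximated by warping the field
# components; not the authors' implementation.
import numpy as np
from scipy.ndimage import map_coordinates

def warp(image, field):
    """Apply a dense displacement field (dy, dx) to a 2D image or to one
    component of another field."""
    h, w = image.shape
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return map_coordinates(image, np.stack([yy + field[0], xx + field[1]]),
                           order=1, mode="nearest")

def adapt(train_field, speaker_field):
    """Carry a training-speaker mid-to-parasagittal field into the test
    speaker's domain via the midsagittal train-to-test mapping."""
    return np.stack([warp(train_field[0], speaker_field),
                     warp(train_field[1], speaker_field)])

def estimate_parasagittal(test_mid_frame, train_fields, speaker_field):
    """Estimate neighbouring sagittal frames of the test speaker from his/her
    midsagittal frame using the adapted transformations."""
    return [warp(test_mid_frame, adapt(f, speaker_field))
            for f in train_fields]
```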

    Using Silence MR Image to Synthesise Dynamic MRI Vocal Tract Data of CV

    In this work we present an algorithm for synthesising pseudo rtMRI data of the vocal tract. rtMRI data on the midsagittal plane were used to synthesise a target consonant-vowel (CV) using only a silence frame of the target speaker. For this purpose, several single-speaker models were created. The inputs of the algorithm are a silence frame of both the training and the target speaker and the rtMRI data of the target CV. An image transformation is computed from each CV frame to the next one, creating a set of transformations that describes the dynamics of the CV production. Another image transformation is computed from the silence frame of the training speaker to the silence frame of the target speaker and is used to adapt the previously computed set of transformations to the target speaker. The adapted set of transformations is applied to the silence frame of the target speaker to synthesise his/her pseudo rtMRI CV data. Synthesised images from multiple single-speaker models are frame-aligned and then averaged to create the final version of the synthesised images. Synthesised images are compared with the original ones using image cross-correlation. Results show good agreement between the synthesised and the original images.
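    The final combination and evaluation steps can be sketched as a per-frame average followed by normalised cross-correlation, assuming the synthesised sequences produced by the single-speaker models have already been temporally aligned to a common length; the names are illustrative only.

```python
# Sketch only: averaging frame-aligned synthesised sequences and scoring
# them against the original frames.
import numpy as np

def combine_models(aligned_sequences):
    """Average frame-aligned synthesised sequences.

    aligned_sequences : array (M, T, H, W) -- M single-speaker models,
                        T frames per sequence."""
    return np.mean(np.asarray(aligned_sequences, dtype=float), axis=0)

def cross_correlation(a, b):
    """Normalised image cross-correlation between two frames."""
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return float((a * b).mean())

def evaluate(synthesised, original):
    """Mean per-frame correlation between synthesised and original frames."""
    return float(np.mean([cross_correlation(s, o)
                          for s, o in zip(synthesised, original)]))
```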