2 research outputs found

    Speech organ contour extraction using real-time MRI and machine learning method

    Chiba Institute of Technology; Konan University; National Institute for Japanese Language and Linguistics

    Real-time MRI can be used to obtain videos that capture articulatory movements during running speech. For detailed analysis based on a large number of video frames, the contours of speech organs, such as the tongue, must be extracted semi-automatically. The present study attempted to extract the contours of speech organs from videos using a machine learning method. First, an expert operator manually extracted the contours from the frames of a video to build training data sets. The learning operators, or learners, then extracted the contours from each frame of the video. Finally, the errors, defined as the geometrical distance between the extracted contours and the ground truth (the contours held out from the training data sets), were examined. The results showed that the contours extracted using machine learning were closer to the ground truth than the contours traced by other expert and non-expert operators. In addition, using the same learners, contours were extracted from other unseen videos obtained during different speech tasks of the same subject. The errors in those videos were similar to those in the video on which the learners were trained.
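    The abstract does not specify the exact formulation of the geometrical distance used as the error measure, so the sketch below illustrates one plausible choice: a symmetric mean nearest-point distance between a predicted contour and a ground-truth contour, each represented as an array of (x, y) pixel coordinates. The representation and function names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def mean_nearest_point_distance(contour_a, contour_b):
    """Mean distance from each point of contour_a to its nearest point on contour_b.

    contour_a: (N, 2) array of (x, y) pixel coordinates.
    contour_b: (M, 2) array of (x, y) pixel coordinates.
    (Hypothetical error measure; the paper only states that a geometrical
    distance between extracted contours and ground truth was examined.)
    """
    # Pairwise Euclidean distances between every point of the two contours.
    diffs = contour_a[:, None, :] - contour_b[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # For each point of contour_a, keep its closest counterpart on contour_b.
    return dists.min(axis=1).mean()

def symmetric_contour_error(predicted, ground_truth):
    """Symmetric version: average of the two directed mean distances."""
    return 0.5 * (mean_nearest_point_distance(predicted, ground_truth)
                  + mean_nearest_point_distance(ground_truth, predicted))
```

    With such a measure, the per-frame error of the learners' contours can be compared directly against the error of contours traced by other expert and non-expert operators.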

    A supervised air-tissue boundary segmentation technique in real-time magnetic resonance imaging video using a novel measure of contrast and dynamic programming

    No full text
    This paper introduces a technique for the supervised segmentation of air-tissue boundaries (ATBs) in the upper airway of the vocal tract in real-time magnetic resonance imaging (rtMRI) videos. The proposed technique uses a novel measure of contrast across a boundary based on the Fisher discriminant function. The ATBs in all frames of an rtMRI video are jointly estimated by maximizing the proposed measure of contrast around the predicted ATBs and incorporating a smoothness constraint to ensure that the ATBs in consecutive frames do not change drastically. Dynamic programming is used for this purpose. The accuracy of the proposed technique is evaluated separately for the upper and lower ATBs using the Dynamic Time Warping distance between the predicted and the ground-truth contours. Experiments with rtMRI videos from four subjects show that the error in ATB prediction using the proposed technique is 8.99% lower than that of a semi-supervised grid-based segmentation approach. A key feature of the proposed approach is that, unlike existing methods, it can reliably predict the ATB outside the vocal tract.
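    The core idea of contrast maximization under a smoothness constraint can be sketched in simplified form: score each candidate boundary row in a column by a Fisher-discriminant-style contrast between the pixels just above and below it, then use dynamic programming to pick the row per column that maximizes total contrast while penalizing jumps between neighbouring columns. This single-frame, column-wise formulation, the window size, and the smoothness weight are assumptions for illustration; the paper's method additionally enforces smoothness across consecutive frames and estimates the ATBs jointly over the whole video.

```python
import numpy as np

def fisher_contrast(above, below, eps=1e-6):
    """Fisher-discriminant-style contrast between two pixel populations:
    squared difference of means divided by the sum of variances."""
    return (above.mean() - below.mean()) ** 2 / (above.var() + below.var() + eps)

def trace_boundary(frame, window=5, smooth_weight=2.0):
    """Estimate one boundary row per column of a grayscale frame by dynamic
    programming (simplified single-frame sketch of contrast + smoothness).

    frame: (H, W) array. Returns an array of W row indices.
    """
    H, W = frame.shape
    rows = np.arange(window, H - window)
    # Contrast score for every (row, column) candidate boundary position.
    score = np.array([[fisher_contrast(frame[r - window:r, c],
                                       frame[r:r + window, c])
                       for c in range(W)] for r in rows])
    R = len(rows)
    best = score[:, 0].copy()              # best cumulative score per row
    back = np.zeros((R, W), dtype=int)     # backpointers for path recovery
    for c in range(1, W):
        # Penalize large jumps between adjacent columns (smoothness constraint).
        jump = smooth_weight * np.abs(rows[:, None] - rows[None, :])
        total = best[None, :] - jump       # (current row, previous row)
        back[:, c] = total.argmax(axis=1)
        best = score[:, c] + total.max(axis=1)
    # Backtrack the highest-scoring path from the last column to the first.
    path = [int(best.argmax())]
    for c in range(W - 1, 0, -1):
        path.append(int(back[path[-1], c]))
    return rows[np.array(path[::-1])]
```

    Evaluation in the paper compares the predicted and ground-truth contours with a Dynamic Time Warping distance, which tolerates differences in the number and spacing of contour points.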