    Text spotting in large speech databases for under-resourced languages

    Abstract—Lightly supervised acoustic modeling in under-resourced languages raises new issues, owing to the poor accuracy of Automatic Speech Recognition (ASR) systems for such languages and the low quality of the speech transcriptions that are available. Under these conditions, common alignment techniques are not always able to align the ASR output with the approximate transcription. We propose two alignment methods that overcome these issues. The first applies an image processing algorithm to the matching matrix of the two texts to be aligned; the second is based on segmental Dynamic Time Warping (DTW). Both approaches outperform the standard DTW alignment currently in use, extracting on average 29% and 27% more speech data, respectively.
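    The abstract contrasts a baseline DTW text alignment with alignment driven by the matching matrix of the two texts. As a rough illustration only, the Python sketch below builds a binary matching matrix between an ASR hypothesis and an approximate transcript, aligns them with plain word-level DTW, and, as a deliberately simple stand-in for image-style processing, scans the matrix diagonals for runs of consecutive matches. The function names, the 0/1 word-match cost, the `min_len` threshold and the toy sentences are hypothetical choices for this sketch; it is not the authors' image processing algorithm or their segmental DTW variant.

```python
from typing import List, Tuple

# Hypothetical sketch for illustration; not the paper's algorithms.
Path = List[Tuple[int, int]]


def matching_matrix(hyp: List[str], ref: List[str]) -> List[List[int]]:
    """Binary 'image': M[i][j] = 1 when ASR word i equals transcript word j."""
    return [[1 if h == r else 0 for r in ref] for h in hyp]


def dtw_align(hyp: List[str], ref: List[str]) -> Tuple[int, Path]:
    """Baseline word-level DTW (cost 0 on a match, 1 otherwise).

    Every word of both sequences ends up on the path, including
    misrecognized or untranscribed regions.
    """
    if not hyp or not ref:
        raise ValueError("both word sequences must be non-empty")
    n, m = len(hyp), len(ref)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # Backtrack from the last cell; on cost ties, min() prefers the diagonal step.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
        path.append((i - 1, j - 1))
    path.reverse()
    return int(D[n][m]), path


def diagonal_runs(M: List[List[int]], min_len: int = 3) -> List[Tuple[int, int, int]]:
    """Return (i, j, length) for each run of >= min_len consecutive 1s on a diagonal.

    Each run is a stretch where ASR output and transcript agree word for word.
    """
    n = len(M)
    m = len(M[0]) if n else 0
    runs = []
    for d in range(-(n - 1), m):  # d = j - i identifies the diagonal
        i, j = max(0, -d), max(0, d)
        start = None
        while True:
            inside = i < n and j < m
            if inside and M[i][j]:
                if start is None:
                    start = i  # a run of matches begins here
            else:
                # A run ends at a mismatch or at the matrix border.
                if start is not None and i - start >= min_len:
                    runs.append((start, start + d, i - start))
                start = None
                if not inside:
                    break
            i += 1
            j += 1
    return runs


hyp = "uh so the old man when down to the river to fish".split()  # toy ASR output
ref = "so the old man went down to the river to fish".split()      # toy transcript
cost, path = dtw_align(hyp, ref)
print(cost)                                      # 2
print(diagonal_runs(matching_matrix(hyp, ref)))  # [(1, 0, 4), (6, 5, 6)]
```

    In this toy example, DTW is forced to place the filler "uh" and the misrecognized "when" on its path, whereas the diagonal runs keep only the two stretches where both texts agree word for word. Reliably matched segments of this kind are one plausible basis for deciding which speech can be extracted, but the thresholds and scoring used in the paper are not given here.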