
    Data Selection in the Framework of Automatic Speech Recognition

    Training a speech recognition system requires audio data and the corresponding exact transcriptions. However, manual transcription is expensive, labor-intensive, and error-prone. Some sources, such as TV broadcasts, have subtitles. Subtitles are close to the exact transcription but not identical: some sentences may be paraphrased, deleted, or changed in word order. Building an automatic speech recognition system from inexact subtitles may result in poor models and a low-performance system. Selecting data is therefore crucial to obtaining high-performance models. In this work, we explore the lightly supervised approach, a process for selecting good acoustic data to train Deep Neural Network acoustic models. We study data selection methods based on phone matched error rate and average word duration. Furthermore, we propose a new data selection method that combines three recognizers. Each resulting model is assessed by recognizing the development set, with word error rate as the measure of model quality. The data selection methods are evaluated on a real TV broadcast dataset.
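
    The two selection criteria named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that phone matched error rate is the phone-level edit distance between the subtitle and the recognizer output, normalized by the subtitle length, and that average word duration is the segment duration divided by its word count. The Segment fields, the function names, and all threshold values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


@dataclass
class Segment:
    # Hypothetical container for one subtitle-aligned audio segment.
    subtitle_phones: List[str]  # phone sequence derived from the subtitle
    decoded_phones: List[str]   # phone sequence from the recognizer output
    num_words: int              # number of words in the subtitle
    duration_sec: float         # segment duration in seconds


def phone_matched_error_rate(seg: Segment) -> float:
    """Assumed definition: phone edit distance / number of subtitle phones."""
    if not seg.subtitle_phones:
        return float("inf")
    return edit_distance(seg.subtitle_phones, seg.decoded_phones) / len(seg.subtitle_phones)


def average_word_duration(seg: Segment) -> float:
    """Assumed definition: segment duration / number of words."""
    if seg.num_words == 0:
        return float("inf")
    return seg.duration_sec / seg.num_words


def select_segments(segments: List[Segment],
                    max_pmer: float = 0.2,   # illustrative threshold
                    min_awd: float = 0.15,   # illustrative threshold (seconds)
                    max_awd: float = 0.7) -> List[Segment]:
    """Keep segments whose subtitle plausibly matches the audio."""
    return [s for s in segments
            if phone_matched_error_rate(s) <= max_pmer
            and min_awd <= average_word_duration(s) <= max_awd]
```

    Under these assumptions, a segment is kept only when its subtitle agrees with the decoded phones and its words have a plausible average length; an unusually long average word duration suggests that the subtitle omits spoken words.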