research

Confidence Scoring and Speaker Adaptation in Mobile Automatic Speech Recognition Applications

Abstract

Generally, the user group of a language is remarkably diverse in terms of speaker-specific characteristics such as dialect and speaking style. Hence, quality of spoken content varies notably from one individual to another. This diversity causes problems for Automatic Speech Recognition systems. An Automatic Speech Recognition system should be able to assess the hypothesised results. This can be done by evaluating a confidence measure on the recognition results and comparing the resulting measure to a specified threshold. This threshold value, referred to as confidence score, informs how reliable a particular recognition result is for the given speech. A system should perform optimally irrespective of input speaker characteristics. However, most systems are inflexible and non-adaptive and thus, speaker adaptability can be improved. For achieving these purposes, a solid criterion is required to evaluate the quality of spoken content and the system should be made robust and adaptive towards new speakers as well. This thesis implements a confidence score using posterior probabilities to examine the quality of the output, based on the speech data and corpora provided by Devoca Oy. Furthermore, speaker adaptation algorithms: Maximum Likelihood Linear Regression and Maximum a Posteriori are applied on a GMM-HMM system and their results are compared. Experiments show that Maximum a Posteriori adaptation brings 2% to 25% improvement in word error rates of semi-continuous model and is recommended for use in the commercial product. The results of other methods are also reported. In addition, word graph is suggested as the method for obtaining posterior probabilities. Since it guarantees no such improvement in the results, the confidence score is proposed as an optional feature for the system

    Similar works