Filler model based word level confidence measures for spoken dialogue systems

Abstract

In conversational dialogue applications, it is critical to understand requests accurately. However, the performance of current speech recognition systems is far from perfect. To function effectively with imperfect speech recognition, an accurate confidence scoring mechanism should be employed. To determine a confidence score for a hypothesis, certain confidence features are combined. In this thesis, the performance of filler-model based confidence features has been investigated. Five types of filler model were defined: triphone network, phone network, phone-class network, 5-state catch-all model and 3-state catch-all model. First, all models were evaluated in terms of their ability to correctly tag recognition hypotheses as either recognition errors or correct. Here, the best performance was obtained from the triphone recognition network. Then the performance of reliable combinations of these models was investigated, and it was observed that certain reliable combinations of filler models could significantly improve the accuracy of the confidence annotation. On the practical side of the work, a dialogue management system was implemented for Turkish. For acoustic model training, 3 hours of speech data were collected. To increase the speech recognition accuracy, some parameters were tied by using decision trees. For accurate clustering, an efficient set of questions was constructed for Turkish triphones. With this setup, we obtained 97.2% word recognition accuracy.
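The abstract does not give the exact form of the confidence features, so the following is only a minimal sketch of one common way a filler-model score can be turned into a word-level tag: the duration-normalized log-likelihood ratio between the recognizer's word hypothesis and a filler model scored over the same frames. The class name, field names, function names, and the threshold value are all hypothetical and chosen for illustration; they are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class WordHypothesis:
    """One recognized word with the scores needed for tagging (hypothetical layout)."""
    word: str
    num_frames: int            # number of acoustic frames aligned to the word
    recognizer_loglik: float   # acoustic log-likelihood under the recognized word
    filler_loglik: float       # log-likelihood of the same frames under a filler model


def filler_confidence(hyp: WordHypothesis) -> float:
    """Per-frame log-likelihood ratio between the word model and the filler model.

    Values near zero or above mean the word model explains the audio about as
    well as the unconstrained filler; strongly negative values suggest an error.
    """
    if hyp.num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return (hyp.recognizer_loglik - hyp.filler_loglik) / hyp.num_frames


def tag(hyp: WordHypothesis, threshold: float = -0.5) -> str:
    """Tag a hypothesis as 'correct' or 'error'; the threshold is an assumed value."""
    return "correct" if filler_confidence(hyp) >= threshold else "error"


if __name__ == "__main__":
    # Made-up scores, used only to exercise the functions.
    good = WordHypothesis("evet", 40, -2400.0, -2390.0)
    bad = WordHypothesis("hayir", 50, -3100.0, -3000.0)
    print(good.word, filler_confidence(good), tag(good))
    print(bad.word, filler_confidence(bad), tag(bad))
```

In practice the threshold would be tuned on held-out data, and scores from several filler models (for example the triphone network and a catch-all model) could be combined before tagging; how the thesis combines them is not stated in the abstract.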
