Context dependent speech recognition

Abstract

Poor speech recognition is a problem when developing spoken dialogue systems, but several studies have shown that recognition can be improved by post-processing the recogniser output: the dialogue context, acoustic properties of the user utterance and other available resources are used to train a statistical model that acts as a filter between the speech recogniser and the dialogue manager. In this thesis, a corpus of logged interactions between users and a dialogue system was used to extract features from the previous dialogue context, the acoustics of the user utterance and the n-best recognition hypotheses. These features were used to train maximum entropy models with different feature sets to rerank the n-best hypotheses. Although the models only partially succeed in predicting the intended labels, using the reranked output means that 94.9% of the adequate hypotheses are sent to the dialogue manager, a 44.6% relative error reduction over the baseline, which shows that contextual reranking can improve speech recognition for dialogue systems. Future work involves developing the current feature sets and maximum entropy models to better classify whether a hypothesis should be accepted or rejected by the dialogue system, rather than reranking the hypotheses.
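
To illustrate the reranking idea described above, the following is a minimal sketch, not the thesis implementation: a maximum entropy model (multinomial logistic regression) is trained on per-hypothesis features and then used to reorder an n-best list. The feature names (ASR confidence, n-best rank, context match score) and all data values are hypothetical placeholders, not taken from the thesis corpus.

```python
# Minimal sketch of maximum entropy reranking of ASR n-best hypotheses.
# Features and data are illustrative assumptions, not the thesis feature set.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: features for one hypothesis, e.g.
# [ASR confidence, n-best rank, context match score].
# Label 1 = adequate hypothesis, 0 = inadequate.
X_train = np.array([
    [0.92, 0, 1.0],
    [0.40, 3, 0.0],
    [0.75, 1, 1.0],
    [0.30, 4, 0.0],
])
y_train = np.array([1, 0, 1, 0])

# A maximum entropy classifier is equivalent to multinomial logistic regression.
maxent = LogisticRegression()
maxent.fit(X_train, y_train)

def rerank(nbest_features):
    """Return hypothesis indices ordered by predicted probability of adequacy."""
    probs = maxent.predict_proba(nbest_features)[:, 1]
    return np.argsort(-probs)

# Usage: three hypotheses from a new utterance; the hypothesis that matches
# the dialogue context may be promoted above the top ASR hypothesis.
nbest = np.array([
    [0.60, 0, 0.0],
    [0.55, 1, 1.0],
    [0.20, 2, 0.0],
])
print(rerank(nbest))
```

In this sketch the reranked list (or only its top hypothesis) would then be passed to the dialogue manager, which corresponds to the filtering step between recogniser and dialogue manager described in the abstract.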
