Spoken and multimodal dialogue systems typically make use of confidence scores to choose among (or reject) a speech recognizer’s N-best hypotheses for a particular utterance. We argue that it is beneficial to instead choose among a list of candidate system responses. We propose a novel method in which a confidence score for each response is derived from a classifier trained on acoustic and lexical features emitted by the recognizer, as well as features culled from the generation of the candidate response itself. Our response-based method yields statistically significant improvements in F-measure over a baseline in which hypotheses are chosen based on recognition confidence scores only.
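To make the idea of choosing among candidate responses concrete, the following is a minimal sketch rather than the classifier used in this work. It assumes each candidate response carries a small feature dictionary mixing recognizer-side and response-side features, scores it with a logistic function, and rejects the utterance when no candidate reaches a threshold. The feature names (`asr_confidence`, `nbest_rank`, `response_resolved`), the weights, the bias, and the threshold are all hypothetical placeholders; in practice the weights would be learned from labeled data.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    response: str    # candidate system response
    features: dict   # feature name -> numeric value


# Hypothetical, hand-set weights standing in for a trained classifier.
WEIGHTS = {"asr_confidence": 3.0, "nbest_rank": -0.5, "response_resolved": 1.5}
BIAS = -2.0


def confidence(features):
    """Logistic score in (0, 1) computed from the weighted features."""
    z = BIAS + sum(WEIGHTS.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def select_response(candidates, threshold=0.5):
    """Return the highest-scoring response, or None to reject the utterance."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: confidence(c.features))
    return best.response if confidence(best.features) >= threshold else None
```

In this sketch a lower-ranked hypothesis can still win if its response-level features are strong enough, which is the kind of outcome a selector based only on recognition confidence cannot produce.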