In this paper we describe an approach to automatic evaluation of both the
speech recognition and understanding capabilities of a spoken dialogue system
for train timetable information. Recognition performance is measured by word
accuracy and understanding performance by concept accuracy. Both measures are
calculated by comparing the respective module's output with a correct reference answer.
We report evaluation results for a spontaneous speech corpus with about 10000
utterances. We observed a nearly linear relationship between word accuracy and
concept accuracy.

Comment: 4 pages PS, LaTeX2e source importing 2 eps figures, uses icslp.cls,
caption.sty, psfig.sty; to appear in the Proceedings of the Fourth
International Conference on Spoken Language Processing (ICSLP 96)
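
As an illustration of how a measure of this kind can be computed against a reference answer, the following is a minimal sketch. It assumes the commonly used definition of accuracy, (N - S - D - I) / N, where N is the number of reference units and S, D, I are the substitutions, deletions and insertions of a minimum edit distance alignment; the abstract itself does not spell out the formula. The function names, the example utterance and the attribute=value concept notation are hypothetical and chosen only for this sketch.

```python
def edit_counts(reference, hypothesis):
    """Return (substitutions, deletions, insertions) of a minimum-cost alignment."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = (total errors, subs, dels, ins) for reference[:i] vs hypothesis[:j]
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)      # only deletions
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)      # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            total, subs, dels, ins = cost[i - 1][j - 1]
            if reference[i - 1] == hypothesis[j - 1]:
                diagonal = (total, subs, dels, ins)          # match, no error
            else:
                diagonal = (total + 1, subs + 1, dels, ins)  # substitution
            total, subs, dels, ins = cost[i - 1][j]
            deletion = (total + 1, subs, dels + 1, ins)
            total, subs, dels, ins = cost[i][j - 1]
            insertion = (total + 1, subs, dels, ins + 1)
            # Tuples compare by total error count first.
            cost[i][j] = min(diagonal, deletion, insertion)
    return cost[n][m][1:]


def accuracy(reference, hypothesis):
    """Accuracy (N - S - D - I) / N; assumed definition, may be negative."""
    n = len(reference)
    if n == 0:
        raise ValueError("reference must not be empty")
    subs, dels, ins = edit_counts(reference, hypothesis)
    return (n - subs - dels - ins) / n


# Word accuracy: units are words (hypothetical utterance).
ref_words = "i want to go to hamburg tomorrow".split()
hyp_words = "i want to go hamburg tomorrow morning".split()
word_accuracy = accuracy(ref_words, hyp_words)

# Concept accuracy: units are semantic items (hypothetical attribute=value notation).
ref_concepts = ["destination=hamburg", "date=tomorrow"]
hyp_concepts = ["destination=hamburg"]
concept_accuracy = accuracy(ref_concepts, hyp_concepts)
```

Using the same function for both measures reflects only the assumption that concept accuracy is defined analogously to word accuracy over semantic units rather than words.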