In the development of speech recognition algorithms, it is important to know whether an apparent difference in performance between algorithms is statistically significant, yet this issue is almost always overlooked. We present two simple tests for deciding whether the difference in error rates between two algorithms tested on the same data set is statistically significant. The first (McNemar’s test) requires the errors made by an algorithm to be independent events and is most appropriate for isolated-word algorithms. The second (a matched-pairs test) can be used even when errors are not independent events and is more appropriate for connected speech.
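The two tests can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's exact procedure. The function names `mcnemar_exact` and `matched_pairs` are hypothetical. For McNemar's test, the sketch assumes a per-word (or per-utterance) correct/incorrect outcome for each algorithm on the same items and computes an exact two-sided binomial p-value from the discordant items only. For the matched-pairs test, it assumes the test data has already been divided into segments whose errors can be treated as independent of one another. How those segments are chosen is not shown. The test statistic is the mean per-segment difference in error counts divided by its standard error, and it is compared against a standard normal distribution. That comparison is only an approximation, and it is reasonable only when there are many segments.

```python
import math
from typing import Sequence


def mcnemar_exact(a_correct: Sequence[bool], b_correct: Sequence[bool]) -> float:
    """Two-sided exact McNemar p-value from paired per-item outcomes (hypothetical helper).

    Only discordant items (one algorithm right, the other wrong) carry information.
    Under the null hypothesis each discordant item is equally likely to favour
    either algorithm, so the count favouring A is Binomial(k, 0.5).
    """
    if len(a_correct) != len(b_correct):
        raise ValueError("outcome lists must be the same length")
    n01 = sum(1 for a, b in zip(a_correct, b_correct) if a and not b)  # A right, B wrong
    n10 = sum(1 for a, b in zip(a_correct, b_correct) if b and not a)  # B right, A wrong
    k = n01 + n10
    if k == 0:
        return 1.0  # no discordant items: no evidence of a difference
    m = min(n01, n10)
    # P(X <= m) for X ~ Binomial(k, 0.5); doubled for a two-sided test
    tail = sum(math.comb(k, i) for i in range(m + 1)) / 2 ** k
    return min(1.0, 2 * tail)


def matched_pairs(errors_a: Sequence[int], errors_b: Sequence[int]) -> tuple[float, float]:
    """Matched-pairs test on per-segment error counts (hypothetical helper).

    Z_i = errors_a[i] - errors_b[i]; W = mean(Z) / (s / sqrt(n)), where s is the
    sample standard deviation of Z. Returns (W, p), with p the two-sided p-value
    from a normal approximation, which assumes a large number of segments.
    """
    if len(errors_a) != len(errors_b):
        raise ValueError("error-count lists must be the same length")
    z = [ea - eb for ea, eb in zip(errors_a, errors_b)]
    n = len(z)
    if n < 2:
        raise ValueError("need at least two segments")
    mean = sum(z) / n
    var = sum((zi - mean) ** 2 for zi in z) / (n - 1)  # unbiased sample variance
    if var == 0:
        raise ValueError("all segment differences are identical; W is undefined")
    w = mean / math.sqrt(var / n)
    p = math.erfc(abs(w) / math.sqrt(2))  # equals 2 * (1 - Phi(|W|))
    return w, p


if __name__ == "__main__":
    # Illustrative data only: 8 items where only A is right, 1 where only B is right.
    a = [True] * 8 + [False] + [True] * 5
    b = [False] * 8 + [True] + [True] * 5
    print(mcnemar_exact(a, b))  # 0.0390625

    # Illustrative per-segment error counts for two algorithms.
    print(matched_pairs([2, 1, 3, 0], [1, 1, 1, 0]))
```

The split follows the abstract. McNemar's test discards the items on which both algorithms agree and treats every remaining item as an independent trial. The matched-pairs test works only with per-segment totals, so errors that are correlated within a segment (for example, one misrecognition causing its neighbour to be wrong) do not violate its assumptions. It does still require independence *between* segments. With only four segments, as in the illustrative call, the normal approximation is not trustworthy. That call only shows the arithmetic.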