Hidden Markov models (HMMs) have been successfully applied to automatic
speech recognition for more than 35 years in spite of the fact that a key HMM
assumption -- the statistical independence of frames -- is obviously violated
by speech data. In fact, this data/model mismatch has inspired many attempts to
modify or replace HMMs with alternative models that are better able to take
into account the statistical dependence of frames. However it is fair to say
that in 2010 the HMM is the consensus model of choice for speech recognition
and that HMMs are at the heart of both commercially available products and
contemporary research systems. In this paper we present a preliminary
exploration aimed at understanding how speech data depart from HMMs and what
effect this departure has on the accuracy of HMM-based speech recognition. Our
analysis uses standard diagnostic tools from the field of statistics --
hypothesis testing, simulation and resampling -- which are rarely used in the
field of speech recognition. Our main result, obtained by novel manipulations
of real and resampled data, demonstrates that real data have statistical
dependency and that this dependency is responsible for significant numbers of
recognition errors. We also demonstrate, using simulation and resampling, that
if we `remove' the statistical dependency from data, then the resulting
recognition error rates become negligible. Taken together, these results
suggest that a better understanding of the structure of the statistical
dependency in speech data is a crucial first step towards improving HMM-based
speech recognition