Institute of Transport Studies, University of Leeds
Abstract
It is known that problems arise in long sessions of voice
tape-recording for off-line data entry to computers via speech
recognition systems, as a result of operator fatigue or loss of
attention. In this study the task of reading vehicle licence
plates aloud for one hour was simulated under laboratory conditions,
each speaker undergoing one recording session with feedback on
recognition accuracy and one without feedback. No significant
difference in recognition success rates was found between the two
conditions. The audio tape-recordings of the
sessions were analysed acoustically for fatigue-induced changes
both in long-term prosodic characteristics (including fundamental
frequency, intensity, spectral balance and rate) and in segmental
characteristics such as frequency of occurrence of different
sound types and segmental durations. Although a number of
intra-speaker differences were detected in various measures, no
consistent tendencies were found across all speakers. It is concluded
that the choice of speakers and conditions resulted in
insufficient fatigue to produce clear-cut effects. However, it is
felt that the more sophisticated techniques developed during the
project, based on the automatic segmentation of continuous
speech, are capable of more revealing analysis than the relatively
crude techniques used previously. It is not thought likely that this
would lead to automatic techniques for improving speech
recognition accuracy, but it could form the basis for more
effective operator training and for the assessment of new
applications and techniques.