Voice Degradation in Using Speech Recognisers for Transcribing Inventory Data: Draft Final Report

Abstract

Long sessions of voice tape-recording for off-line data entry to computers via speech recognition systems are known to give rise to problems as a result of operator fatigue or loss of attention. In this study the task of reading vehicle licence plates aloud for one hour was simulated under laboratory conditions, with each speaker undergoing one recording session with feedback on recognition accuracy and one without. No significant difference in recognition success rates was found between the two conditions. The audio tape-recordings of the sessions were analysed acoustically for fatigue-induced changes both in long-term prosodic characteristics (including fundamental frequency, intensity, spectral balance and rate) and in segmental characteristics such as the frequency of occurrence of different sound types and segmental durations. Although a number of intra-speaker differences were detected in various measures, no tendencies were found that were consistent across all speakers. It is concluded that the choice of speakers and conditions resulted in insufficient fatigue to produce clear-cut effects; however, the more sophisticated techniques developed during the project, based on the automatic segmentation of continuous speech, are felt to be capable of more revealing analysis than the relatively crude techniques used previously. It is not felt likely that this would result in automatic techniques for improving speech recognition accuracy, but it could form the basis for more effective operator training and assessment of new applications and techniques.

    This paper was published in White Rose Research Online.
