Application of Automatic Speech Recognition Technology for Dysphonic Speech Assessment

Abstract

Dysphonia is a communication disorder secondary to a problem with voice production. Speakers with dysphonia often report decreased intelligibility, particularly in noisy communication environments. Intelligibility is the primary measure of a speaker’s communicative ability; however, it is not routinely assessed in clinical settings today. This lack of intelligibility assessment can be partly attributed to the time-consuming, labor-intensive nature of manually transcribing a speaker’s utterances. Recent advances in automatic speech recognition technology have significantly increased the ease and accuracy of speech-to-text transcription, and incorporating this technology may dramatically increase the efficiency of clinical intelligibility assessment. Therefore, this project examined the feasibility of using an automatic speech-to-text transcription program to describe speech production abnormalities among speakers with dysphonia. Audio recordings of the Rainbow Passage from 30 adult female speakers with normal voice and 23 adult female speakers with dysphonic voice were transcribed using the IBM Watson speech-to-text transcription service. Differences between the groups were evaluated based on three measures: 1) error rate in transcribed words, 2) confidence level of transcribed words, and 3) number of possible alternatives for transcribed words. The results indicated that the confidence level was significantly lower and the number of possible alternatives was significantly higher in the dysphonic group. Interestingly, there was no significant between-group difference in the error rate. Clinical implications of these findings and future directions are discussed.
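As an illustration only, the three measures named above could be computed per recording along the following lines. This is a minimal sketch, not the authors' procedure: the abstract does not define how the error rate was calculated, so a standard word-level edit-distance error rate is assumed here, and the function names and the `(word, confidence, n_alternatives)` tuple layout are hypothetical rather than the IBM Watson response format.

```python
from statistics import mean


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length (assumed definition)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def summarize_words(words):
    """Mean confidence and mean alternative count for one recording.

    `words` is a hypothetical list of (word, confidence, n_alternatives)
    tuples extracted from whatever the transcription service returns.
    """
    if not words:
        raise ValueError("no transcribed words to summarize")
    return {
        "mean_confidence": mean(conf for _, conf, _ in words),
        "mean_alternatives": mean(n_alt for _, _, n_alt in words),
    }
```

Per-speaker values produced this way could then be compared between the normal and dysphonic groups with whatever statistical test the study design calls for.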
