thesis

PRODUCTION AND PERCEPTUAL ANALYSIS OF SPEECH PRODUCED IN NOISE

Abstract

When exposed to noise, speakers modify the way they speak, possibly in an effort to maintain intelligible communication. These modifications are collectively referred to as the Lombard effect. The work described in this thesis compares the speech production changes induced by noises with various spectral and temporal characteristics, and explores the perceptual consequences of these changes. The thesis consists of a series of experimental studies involving the analysis of speech corpora collected under different noise conditions, with and without a communicative task. Intelligibility is also measured, and is predicted using a computer model.

The first study concerns the acoustic and phonetic consequences of N-talker “babble” noise on sentence production for a range of values of N from 1 (competing talker) to “infinity” (speech-shaped noise). The effect of noise on speech production increased with N and with noise level, both of which increase the energetic masking effect of the noise. In a background of stationary noise, noise-induced speech was always more intelligible than speech produced in quiet, and the gain in intelligibility increased with N and noise level, suggesting that talkers modify their productions to ameliorate energetic masking at the ears of the listener. The effect of low- and high-pass filtered noise on speech production was also examined to address whether speakers can compensate for energetic masking by actively shifting their spectral energy to the regions least affected by the noise. Little evidence was found that speakers are able to modify their speech production to take advantage of spectral regions clear of noise.

To evaluate the origin of the increased intelligibility of Lombard speech, the fundamental frequency and spectral tilt of speech produced in quiet were artificially manipulated to match those of speech produced in speech-shaped noise. A perceptual evaluation showed that spectral flattening made the larger contribution to Lombard speech intelligibility, whereas no influence of an increase in fundamental frequency was found. A computational modeling study indicated that durational changes could also play an important role in increasing intelligibility. These findings suggest that speech modifications which reallocate energy in time and frequency, introducing more “glimpses” of clean speech in the presence of noise, can contribute to speech intelligibility.

An analysis of the effect of noise on speech production requires material recorded while talkers undertake realistic tasks. The effect of a communication factor was explored using conversational speech collected in the presence of maskers with differing degrees of energetic and informational masking potential. The size of the speech production changes was found to scale with the energetic masking potential of the background noise, extending the findings for read speech to a communicative task. In addition, relative to the non-communicative task, speakers exploited temporal planning to reduce the amount of overlap with a modulated background noise, an effect which was stronger when the noise contained intelligible speech.

In conclusion, the strategies used by talkers to promote successful speech communication under various noise conditions, as reported in this thesis, could enable spoken output applications such as dialogue systems to adapt to the communicative environment.
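
The idea of “glimpses” mentioned in the abstract can be illustrated with a toy calculation. The sketch below is not the computer model used in the thesis; it assumes that a glimpse is any time-frequency cell whose local speech-to-noise ratio exceeds a fixed threshold (3 dB here, an assumed value), and the function name `glimpse_proportion` and all power values are hypothetical. It shows how flattening the spectral tilt at constant total energy could expose more cells above a low-frequency-weighted noise.

```python
import math


def glimpse_proportion(speech_power, noise_power, threshold_db=3.0):
    """Fraction of time-frequency cells in which speech exceeds the noise
    by more than threshold_db (an assumed local SNR criterion).

    speech_power, noise_power: equally shaped lists of frames, each frame a
    list of non-negative power values, one per frequency band.
    """
    total = 0
    glimpsed = 0
    for s_frame, n_frame in zip(speech_power, noise_power):
        for s, n in zip(s_frame, n_frame):
            total += 1
            if s <= 0:
                continue  # no speech energy in this cell, so no glimpse
            if n <= 0 or 10.0 * math.log10(s / n) > threshold_db:
                glimpsed += 1
    return glimpsed / total if total else 0.0


# One frame, four frequency bands (low to high); all values are made up.
noise = [[4.0, 1.0, 1.0, 1.0]]   # noise with more energy in the lowest band
tilted = [[9.0, 1.5, 1.0, 0.5]]  # steep spectral tilt, total energy 12
flat = [[3.0, 3.0, 3.0, 3.0]]    # flattened tilt, same total energy 12

gp_tilted = glimpse_proportion(tilted, noise)  # 0.25: only the lowest band
gp_flat = glimpse_proportion(flat, noise)      # 0.75: the three upper bands
```

In this made-up example, the flattened spectrum yields three times as many glimpsed cells as the tilted one despite equal total energy, which is the kind of energy reallocation the abstract describes.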
