
PRODUCTION AND PERCEPTUAL ANALYSIS OF SPEECH PRODUCED IN NOISE

By YOUYI LU

Abstract

When exposed to noise, speakers modify the way they speak, possibly in an effort to maintain intelligible communication. These modifications are collectively referred to as the Lombard effect. The work described in this thesis compares speech production changes induced by noise with various spectral and temporal characteristics, and explores the perceptual consequence of these changes. The thesis consists of a series of experimental studies, which involve the analysis of speech corpora collected under different noise conditions, with and without a communicative task. Intelligibility is also measured and predicted using a computer model.

The first study concerns the acoustic and phonetic consequences of N-talker “babble” noise on sentence production for a range of values of N from 1 (competing talker) to “infinity” (speech-shaped noise). The effect of noise on speech production increased with N and noise level, both of which act to increase the energetic masking effect of the noise. In a background of stationary noise, noise-induced speech was always more intelligible than speech produced in quiet, and the gain in intelligibility increased with N and noise level, suggesting that talkers modify their productions to ameliorate energetic masking at the ears of the listener.

The effect of low- and high-pass filtered noise on speech production was also examined to address the issue of whether speakers can compensate for energetic masking by actively shifting their spectral energy to regions least affected by the noise. Little evidence was found that speakers are able to modify their speech production to take advantage of those spectral regions clear of noise.

To evaluate the origin of the increased intelligibility of Lombard speech, the fundamental frequency and spectral tilt of speech produced in quiet were artificially manipulated to match those of speech produced in speech-shaped noise. A perceptual evaluation showed that spectral flattening made a larger contribution to Lombard speech intelligibility, but failed to find an influence of an increase in fundamental frequency. A computational modeling study indicated that durational changes could also play an important role in increasing intelligibility. These findings suggest that speech modifications which reallocate energy in time and frequency to introduce more “glimpses” of clean speech in the presence of noise are able to contribute to speech intelligibility.

An analysis of the effect of noise on speech production requires material recorded while undertaking realistic tasks. The effect of a communication factor was explored using conversational speech collected in the presence of maskers with differing degrees of energetic and informational masking potential. The size of speech production changes was found to scale with the energetic masking potential of background noise, extending the findings with read speech to a communicative task. In addition, relative to the non-communicative task, speakers exploited temporal planning to reduce the amount of overlap with a modulated background noise, an effect which was stronger when the noise contained intelligible speech.

In conclusion, the strategies used by talkers to promote successful speech communication under various noise conditions reported in this thesis could enable spoken output applications such as dialogue systems to adapt to communicational environment

Publisher: Computer Science (Sheffield)
Year: 2010
OAI identifier: oai:etheses.whiterose.ac.uk:816

