Enhanced amplitude modulations contribute to the Lombard intelligibility benefit: Evidence from the Nijmegen Corpus of Lombard Speech
Speakers adjust their voice when talking in noise, which is known as Lombard speech. These acoustic adjustments facilitate speech comprehension in noise relative to plain speech (i.e., speech produced in quiet). However, exactly which characteristics of Lombard speech drive this intelligibility benefit in noise remain unclear. This study assessed the contribution of enhanced amplitude modulations to the Lombard speech intelligibility benefit by demonstrating that (1) native speakers of Dutch in the Nijmegen Corpus of Lombard Speech (NiCLS) produce more pronounced amplitude modulations in noise vs. in quiet; (2) more enhanced amplitude modulations correlate positively with intelligibility in a speech-in-noise perception experiment; (3) transplanting the amplitude modulations from Lombard speech onto plain speech leads to an intelligibility improvement, suggesting that enhanced amplitude modulations in Lombard speech contribute towards intelligibility in noise. Results are discussed in light of recent neurobiological models of speech perception, with reference to neural oscillators phase-locking to the amplitude modulations in speech and guiding speech processing.
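The envelope-transplantation manipulation described in point (3) can be sketched as follows. This is a minimal illustration, not the corpus's actual processing pipeline: it assumes two time-aligned, equal-length mono signals at the same sample rate, and the function names and the 10 Hz envelope cutoff are illustrative choices.

```python
# Sketch: impose the amplitude envelope of a Lombard utterance onto a
# time-aligned plain utterance. Illustrative only; assumes aligned signals.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def amplitude_envelope(x, fs, cutoff_hz=10.0):
    """Low-pass-filtered Hilbert envelope (slow amplitude modulations)."""
    env = np.abs(hilbert(x))
    b, a = butter(4, cutoff_hz / (fs / 2), btype="low")
    return np.maximum(filtfilt(b, a, env), 1e-8)  # floor avoids divide-by-zero

def transplant_envelope(plain, lombard, fs):
    """Replace the plain signal's slow envelope with the Lombard one."""
    env_plain = amplitude_envelope(plain, fs)
    env_lombard = amplitude_envelope(lombard, fs)
    return plain * (env_lombard / env_plain)

# Toy demonstration: a flat carrier vs. a 4 Hz amplitude-modulated version.
fs = 16000
t = np.arange(fs) / fs
plain = np.sin(2 * np.pi * 200 * t)                      # flat envelope
lombard = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * plain  # 4 Hz modulation
hybrid = transplant_envelope(plain, lombard, fs)         # plain carrier, Lombard envelope
```

After the transplant, the hybrid signal carries the plain-speech fine structure but the deeper Lombard amplitude modulations.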
Evaluation of near-end speech enhancement under equal-loudness constraint for listeners with normal hearing and mild-to-moderate hearing loss
Four algorithms designed to enhance the intelligibility of speech when noise is added after processing were evaluated under the constraint that the speech should have the same loudness before and after processing, as determined using a loudness model. The algorithms applied spectral modifications, and two of them included dynamic-range compression. On average, the methods with dynamic-range compression required the least level adjustment to equate loudness for the unprocessed and processed speech. Subjects with normal hearing (experiment 1) and mild-to-moderate hearing loss (experiment 2) were tested using unmodified and enhanced speech presented in speech-shaped noise (SSN) and a competing speaker (CS). The results showed (a) the algorithms with dynamic-range compression yielded the largest intelligibility gains in both experiments and for both types of background; (b) the algorithms without dynamic-range compression either yielded benefit only with the SSN or yielded no consistent benefit; (c) speech reception thresholds for unprocessed speech were higher for hearing-impaired than for normal-hearing subjects, by about 2 dB for the SSN and 6 dB for the CS. It is concluded that the enhancement methods incorporating dynamic-range compression can improve intelligibility under the equal-loudness constraint for both normal-hearing and hearing-impaired subjects and for both steady and fluctuating backgrounds.
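The dynamic-range compression these algorithms rely on can be sketched in a few lines. This is a generic feed-forward compressor, not the evaluated algorithms themselves; the threshold, ratio, and smoothing time constant are illustrative values.

```python
# Sketch of feed-forward dynamic-range compression: gain is reduced when
# the smoothed short-term level exceeds a threshold. Illustrative parameters.
import numpy as np

def compress(x, fs, threshold_db=-20.0, ratio=3.0, tau=0.01):
    """Compress x sample-by-sample using an exponentially smoothed power level."""
    alpha = np.exp(-1.0 / (tau * fs))            # one-pole smoothing coefficient
    level = 0.0
    y = np.empty_like(x)
    for n, sample in enumerate(x):
        level = alpha * level + (1 - alpha) * sample * sample
        level_db = 10 * np.log10(max(level, 1e-12))
        over = level_db - threshold_db           # dB above threshold
        gain_db = -over * (1 - 1 / ratio) if over > 0 else 0.0
        y[n] = sample * 10 ** (gain_db / 20)
    return y

# Toy demonstration: a quiet segment followed by a loud one.
fs = 16000
quiet = 0.05 * np.random.randn(fs)   # well below threshold
loud = 1.0 * np.random.randn(fs)     # well above threshold
out = compress(np.concatenate([quiet, loud]), fs)
```

The loud segment is attenuated much more than the quiet one, which raises the relative level of low-energy speech cues (one reason compression helped under the equal-loudness constraint).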
Environment- and listener-oriented speaking style adaptations across the lifespan
This dissertation examines how age affects the ability to produce intelligibility-enhancing speaking style adaptations in response to environment-related difficulties (noise-adapted speech) and in response to listeners' perceptual difficulties (clear speech). Materials consisted of conversational and clear speech sentences produced in quiet and in response to noise by children (11-13 years), young adults (18-29 years), and older adults (60-84 years). Acoustic measures of global, segmental, and voice characteristics were obtained. Young adult listeners participated in word-recognition-in-noise and perceived age tasks. The study also examined relative talker intelligibility as well as the relationship between the acoustic measurements and intelligibility results. Several age-related differences in speaking style adaptation strategies were found. Children increased mean F0 and F1 more than adults in response to noise, and exhibited greater changes to voice quality when producing clear speech (increased HNR, decreased shimmer). Older adults lengthened pause duration more in clear speech compared to younger talkers. Word-recognition-in-noise results revealed no age-related differences in the intelligibility of conversational speech. Noise-adapted and clear speech modifications increased intelligibility for all talker groups. However, the acoustic changes implemented by children when producing noise-adapted and clear speech were less efficient in enhancing intelligibility compared to the young adult talkers. Children were also less intelligible than older adults for speech produced in quiet. Results confirmed that the talkers formed 3 perceptually-distinct age groups. Correlation analyses revealed that relative talker intelligibility was consistent for conversational and clear speech in quiet. However, relative talker intelligibility was found to be more variable with the inclusion of additional speaking style adaptations.
1-3 kHz energy, speaking rate, and vowel and pause durations all emerged as significant acoustic-phonetic predictors of intelligibility. This is the first study to investigate how clear speech and noise-adapted speech benefits interact with each other across multiple talker groups. The findings enhance our understanding of intelligibility variation across the lifespan and have implications for a number of applied realms, from audiologic rehabilitation to speech synthesis.
Individual and environment-related acoustic-phonetic strategies for communicating in adverse conditions
In many situations it is necessary to produce speech in 'adverse conditions': that is, conditions that make speech communication difficult. Research has demonstrated that speaker strategies, as described by a range of acoustic-phonetic measures, can vary both at the individual level and according to the environment, and are argued to facilitate communication. There has been debate as to the environmental specificity of these adaptations and their effectiveness in overcoming communication difficulty. Furthermore, the manner and extent to which adaptation strategies differ between individuals is not yet well understood. This thesis presents three studies that explore the acoustic-phonetic adaptations of speakers in noisy and degraded communication conditions and their relationship with intelligibility. Study 1 investigated the effects of temporally fluctuating maskers on global acoustic-phonetic measures associated with speech in noise (Lombard speech). The results replicated findings of increased power in the modulation spectrum in Lombard speech, but showed little evidence of adaptation to masker fluctuations via the temporal envelope. Study 2 collected a larger corpus of semi-spontaneous communicative speech in noise and other degradations perturbing specific acoustic dimensions. Speakers showed different adaptations across the environments, likely suited to overcoming steady and temporally fluctuating noise, the restriction of spectral and pitch information by a noise-excited vocoder, and a simulated sensorineural hearing loss. Analyses of inter-speaker variation in both studies 1 and 2 showed behaviour was highly variable, and some strategy combinations were identified.
Study 3 investigated the intelligibility of strategies 'tailored' to specific environments and the relationship between intelligibility and speaker acoustics, finding a benefit of tailored speech adaptations and discussing the potential roles of speaker flexibility, adaptation level, and intrinsic intelligibility. The overall results are discussed in relation to models of communication in adverse conditions, and a model accounting for individual variability in these conditions is proposed.
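The modulation-spectrum measure referred to in Study 1 is, in its simplest form, the power spectrum of the signal's slow amplitude envelope. The sketch below is one common way to compute it, not the thesis's exact analysis; the envelope extraction method and the 100 Hz envelope rate are assumptions.

```python
# Sketch: a simple modulation spectrum, i.e. the power spectrum of the
# amplitude envelope. Enhanced low-frequency (roughly 1-10 Hz) modulation
# power is the kind of effect reported for Lombard speech.
import numpy as np
from scipy.signal import hilbert

def modulation_spectrum(x, fs, env_rate=100):
    """Return modulation frequencies (Hz) and envelope power spectrum."""
    env = np.abs(hilbert(x))                 # amplitude envelope
    step = fs // env_rate                    # downsample envelope to env_rate Hz
    env = env[::step] - np.mean(env[::step]) # remove DC before the FFT
    spec = np.abs(np.fft.rfft(env)) ** 2
    freqs = np.fft.rfftfreq(env.size, d=1.0 / env_rate)
    return freqs, spec

# Toy demonstration: a 500 Hz tone amplitude-modulated at 4 Hz.
fs = 16000
t = np.arange(2 * fs) / fs
signal = (1 + 0.5 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 500 * t)
freqs, spec = modulation_spectrum(signal, fs)
peak_hz = freqs[np.argmax(spec)]             # dominant modulation frequency
```

For real speech the spectrum peaks broadly around the syllable rate (roughly 3-5 Hz); "more power in the modulation spectrum" means this region is elevated relative to plain speech.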
Context-aware speech synthesis: A human-inspired model for monitoring and adapting synthetic speech
The aim of this PhD thesis is to present the development of a computational model for speech synthesis that mimics the behaviour of human speakers when they adapt their production to their communicative conditions.
The PhD project was motivated by the observed differences between state-of-the-art synthetic speech and human production. In particular, synthesiser output does not exhibit any adaptation to the communicative context, such as environmental disturbances, the listener's needs, or the meaning of the speech content, as human speech does. No evaluation is performed by standard synthesisers to check whether their production is suitable for the communication requirements.
Inspired by Lindblom's Hyper and Hypo articulation (H&H) theory of speech production, the computational model of Hyper and Hypo articulation (C2H) is proposed. This novel computational model for automatic speech production is designed to monitor its output and to control the effort involved in generating the synthetic speech.
Speech transformations are based on the hypothesis that low-effort attractors for a human speech production system can be identified. Such acoustic configurations are close to the minimum effort that a speaker can make in speech production. The interpolation/extrapolation along the key dimension of hypo/hyper-articulation can be motivated by energetic considerations of phonetic contrast. The complete reactive speech synthesis is enabled by adding a negative perception feedback loop to the speech production chain, in order to constantly assess the communicative effectiveness of the proposed adaptation. The distance to the original communicative intent is the control signal that drives the speech transformations.
A hidden Markov model (HMM)-based speech synthesiser along with the continuous adaptation of its statistical models is used to implement the C2H model.
A standard version of the synthesis software does not allow for transformations of speech during parameter generation. Therefore, the generation algorithm of one of the most well-known speech synthesis frameworks, the HMM/DNN-based speech synthesis framework (HTS), is modified. A short-time implementation of the speech intelligibility index (SII), the extended speech intelligibility index (eSII), is chosen as the main perception measure in the feedback loop to control the transformation.
The effectiveness of the proposed model is tested by performing acoustic analyses and objective and subjective evaluations. A key assessment is to measure the control of speech clarity in noisy conditions, and the similarity between the emerging modifications and human behaviour. Two objective scoring methods are used to assess the speech intelligibility of the implemented system: the speech intelligibility index (SII) and the index based upon the Dau measure (Dau).
Results indicate that the intelligibility of C2H-generated speech can be continuously controlled. The effectiveness of reactive speech synthesis and of the transforms motivated by phonetic contrast is confirmed by the acoustic and objective results. More precisely, for the maximum-strength hyper-articulation transformations, the improvement with respect to non-adapted speech is above 10% for all intelligibility indices and tested noise conditions.
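The negative perception feedback loop at the heart of C2H can be caricatured as follows. This is not the thesis's HTS-based implementation: the intelligibility proxy below is a toy band-SNR average standing in for SII/eSII, and the only "effort" dimension is output gain rather than hypo/hyper-articulation.

```python
# Sketch of a reactive loop: increase effort (here, just gain) until a crude
# intelligibility proxy reaches a target. Toy stand-in for the SII/eSII loop.
import numpy as np

def sii_proxy(speech, noise, fs, n_bands=8):
    """Toy SII-like score: mean of per-band SNRs clipped and mapped to [0, 1]."""
    s = np.abs(np.fft.rfft(speech)) ** 2
    n = np.abs(np.fft.rfft(noise)) ** 2
    edges = np.linspace(0, s.size, n_bands + 1, dtype=int)
    snrs = [10 * np.log10(np.sum(s[a:b]) / np.sum(n[a:b]))
            for a, b in zip(edges[:-1], edges[1:])]
    # Map -15 dB..+15 dB band SNR onto a 0..1 audibility-like scale.
    return float(np.mean(np.clip((np.array(snrs) + 15) / 30, 0, 1)))

def reactive_gain(speech, noise, fs, target=0.5, step_db=1.0, max_db=20.0):
    """Raise gain in 1 dB steps until the proxy meets the target (or a cap)."""
    gain_db = 0.0
    while sii_proxy(speech * 10 ** (gain_db / 20), noise, fs) < target:
        gain_db += step_db
        if gain_db >= max_db:
            break
    return gain_db

# Toy demonstration: speech well below the noise floor.
fs = 16000
rng = np.random.default_rng(0)
speech = 0.1 * rng.standard_normal(fs)
noise = 0.3 * rng.standard_normal(fs)
gain_db = reactive_gain(speech, noise, fs)   # effort needed in this noise
```

In the full model the control signal would drive articulation-level transforms rather than a single gain, but the closed-loop structure (synthesise, measure, adapt, repeat) is the same.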
Speech communication strategies in older children: acoustic-phonetic and linguistic adaptations to a hearing-impaired peer
This thesis examines the communication strategies used by both normal-hearing (NH) and hearing-impaired (HI) children when interacting with a peer with hearing loss, focusing on the acoustic-phonetic and linguistic properties of their speech. To elicit frequent repetitions of segmental contrasts in HI children's spontaneous speech in interaction, a new task was developed using minimal pair keywords in a communicative game context. In addition, another referential communication task, the 'spot the difference' Diapix task (Van Engen et al., 2010), was used. Eighteen NH and eighteen HI children between 9 and 15 years of age performed the two tasks in pairs, once with a friend with normal hearing (NH-directed speech) and once with a friend with a hearing impairment (HI-directed speech). Task difficulty increased in interactions involving a HI interlocutor, implying a need for speaker-listener adaptations. Participants' global acoustic-phonetic (articulation rate, F0 median and range, speech intensity and pausing), segmental (/p/-/b/, /s/-/ʃ/, and /i/-/ɪ/) and linguistic (phrase length, lexical frequency, lexical diversity and speech overlap) adaptations to a HI interlocutor were explored. Although HI speakers were found to differ from NH speakers in many aspects of their speech and language, the two groups used similar, mostly global and linguistic, strategies to adapt to the needs of their HI friend, and the HI children's ability to adapt did not seem to be related to their own speech level. Only a subset of speakers was found to increase the discriminability of phonetic contrasts in speech, perhaps partly due to speakers using segmental and linguistic strategies as alternative methods of adaptation. Both NH and HI speakers appeared to adjust the extent of their adaptations to the specific needs of their HI interlocutor, implying surprising sensitivity to listener needs. Implications for models of speech communication are discussed.