
    Perceptual contribution of vowels to Mandarin sentence intelligibility under conditions of spectral degradation

    A recent study showed a vowel advantage over consonants for sentence intelligibility in Mandarin. Given that many acoustic cues important for sentence intelligibility are carried by the vowel segments, the present study investigated the effect of spectral degradation, and its interaction with vowel duration, on the intelligibility of Mandarin vowel-only sentences. Three types of spectrally degraded stimuli were generated: fundamental-frequency-flattened (F0F), sine-wave-synthesized (SWS), and noise-vocoded (NV) vowel-only sentences. Different proportions of the vowel centers were preserved using a noise-replacement paradigm. Listening experiments showed that the fundamental frequency contour had only a minimal effect on vowel-only sentence intelligibility, whereas harmonic cues had a more notable effect. Intelligibility of NV sentences was significantly lower than that of SWS sentences, suggesting that other acoustic cues, such as formant frequency information, contribute to the vowel advantage when harmonic cues are discarded. Discarding vowel edges had a significantly negative effect on vowel-only sentence intelligibility under conditions of spectral degradation. The present study supports an emphasis on preserving harmonic cues and vowel duration in speech processing strategies.
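    The noise-replacement paradigm mentioned above lends itself to a compact illustration. The Python sketch below keeps only the central portion of each labelled vowel and overwrites the rest of the waveform with RMS-matched noise; the segment labels, the 50% preservation proportion, and the noise type are illustrative assumptions, not the study's exact parameters.

```python
import numpy as np

def replace_with_noise(speech, fs, vowel_segments, keep_proportion=0.5):
    """Keep only the central portion of each labelled vowel; replace
    everything else with noise matched in RMS to the original signal."""
    rms = np.sqrt(np.mean(speech ** 2))
    out = np.random.randn(len(speech)) * rms          # noise background
    for start, end in vowel_segments:                 # times in seconds
        dur = end - start
        trim = dur * (1.0 - keep_proportion) / 2.0    # trim both vowel edges
        i0 = int((start + trim) * fs)
        i1 = int((end - trim) * fs)
        out[i0:i1] = speech[i0:i1]                    # preserve vowel center
    return out

# Example: keep the central 50% of two hypothetical vowels in 1 s of audio.
fs = 16000
speech = np.random.randn(fs)                          # placeholder waveform
stimulus = replace_with_noise(speech, fs, [(0.10, 0.25), (0.60, 0.80)])
```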

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised, at least for some listeners, by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns in response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.

    The influence of channel and source degradations on intelligibility and physiological measurements of effort

    Despite the fact that everyday listening is compromised by acoustic degradations, individuals show a remarkable ability to understand degraded speech. However, recent trends in speech perception research emphasise the cognitive load imposed by degraded speech on both normal-hearing and hearing-impaired listeners. The perception of degraded speech is often studied through channel degradations such as background noise. However, source degradations determined by talkers’ acoustic-phonetic characteristics have been studied to a lesser extent, especially in the context of listening effort models. Similarly, little attention has been given to speaking effort, i.e., the effort experienced by talkers when producing speech under channel degradations. This thesis aims to provide a holistic understanding of communication effort, i.e., one that takes into account both listener and talker factors. Three pupillometry studies are presented. In the first study, speech was recorded from 16 Southern British English speakers and presented to normal-hearing listeners in quiet and in combination with three degradations: noise-vocoding, masking, and time-compression. Results showed that acoustic-phonetic talker characteristics predicted the intelligibility of degraded speech, but not listening effort, as likely indexed by pupil dilation. In the second study, older hearing-impaired listeners were presented with fast time-compressed speech under simulated room acoustics. Intelligibility was kept at high levels. Results showed that both fast speech and reverberant speech were associated with higher listening effort, as suggested by pupillometry. Discrepancies between pupillometry and perceived effort ratings suggest that both methods should be employed in speech perception research to pinpoint processing effort. While findings from the first two studies support models of degraded speech perception, emphasising the relevance of source degradations, they also have methodological implications for pupillometry paradigms. In the third study, pupillometry was combined with a speech production task, aiming to establish an equivalent to listening effort for talkers: speaking effort. Normal-hearing participants were asked to read and produce speech in quiet or in the presence of different types of masking: stationary and modulated speech-shaped noise, and competing-talker masking. Results indicated that while talkers acoustically enhance their speech more under stationary masking, the larger pupil dilation associated with competing-talker masking reflected higher speaking effort. Results from all three studies are discussed in conjunction with models of degraded speech perception and production. Listening effort models are revisited to incorporate pupillometry results from speech production paradigms. Given the new approach of investigating source factors using pupillometry, methodological issues are discussed as well. The main insight provided by this thesis, i.e., the feasibility of applying pupillometry to situations involving both listener and talker factors, is suggested as a guide for future research employing naturalistic conversations.
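    As one concrete illustration of the pupillometry measure referred to throughout, the following Python sketch performs a common analysis step: subtractive baseline correction of a pupil trace followed by averaging over an analysis window. The sampling rate, window bounds, and function names are assumptions, not the thesis's actual pipeline.

```python
import numpy as np

def baseline_corrected_dilation(trace, fs, onset_idx,
                                baseline_s=1.0, window_s=(0.5, 3.0)):
    """Subtract the mean pupil size in the pre-stimulus baseline, then
    average the corrected trace over the analysis window."""
    b0 = onset_idx - int(baseline_s * fs)
    baseline = np.mean(trace[b0:onset_idx])
    w0 = onset_idx + int(window_s[0] * fs)
    w1 = onset_idx + int(window_s[1] * fs)
    return np.mean(trace[w0:w1] - baseline)

# Example with a 50 Hz eye tracker: 2 s pre-stimulus, 4 s post-stimulus.
fs = 50
trace = np.concatenate([np.full(2 * fs, 4.0),            # stable baseline
                        np.full(4 * fs, 4.3)])           # dilated response
print(baseline_corrected_dilation(trace, fs, onset_idx=2 * fs))  # ~0.3
```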

    Encoding speech rate in challenging listening conditions: White noise and reverberation

    Temporal contrasts in speech are perceived relative to the speech rate of the surrounding context. That is, following a fast context sentence, listeners interpret a given target sound as longer than following a slow context, and vice versa. This rate effect, often referred to as “rate-dependent speech perception,” has been suggested to be the result of a robust, low-level perceptual process, typically examined in quiet laboratory settings. However, speech perception often occurs in more challenging listening conditions. Therefore, we asked whether rate-dependent perception would be (partially) compromised by signal degradation relative to a clear listening condition. Specifically, we tested the effects of white noise and reverberation, with the latter specifically distorting temporal information. We hypothesized that signal degradation would reduce the precision of encoding the speech rate in the context and thereby reduce the rate effect relative to a clear context. This prediction was borne out for both types of degradation in Experiment 1, where the context sentences but not the subsequent target words were degraded. However, in Experiment 2, which compared rate effects when contexts and targets were coherent in terms of signal quality, no reduction of the rate effect was found. This suggests that, when confronted with coherently degraded signals, listeners adapt to challenging listening situations, eliminating the difference between rate-dependent perception in clear and degraded conditions. Overall, the present study contributes towards understanding the consequences of different types of listening environments on the functioning of the low-level perceptual processes that listeners use during speech perception.
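    For readers unfamiliar with how a reverberation degradation of this kind can be simulated, the hedged Python sketch below convolves speech with a synthetic impulse response (exponentially decaying noise). The 0.8 s decay time and the noise-based impulse response are illustrative stand-ins for a measured room response, not the study's actual stimuli.

```python
import numpy as np
from scipy.signal import fftconvolve

def add_reverb(speech, fs, rt60=0.8):
    """Convolve with exponentially decaying noise whose -60 dB point
    approximates the requested reverberation time (RT60)."""
    n = int(rt60 * fs)
    t = np.arange(n) / fs
    # Amplitude reaches 10**-3 (-60 dB) at t = rt60.
    ir = np.random.randn(n) * np.exp(-3.0 * np.log(10.0) * t / rt60)
    wet = fftconvolve(speech, ir)[:len(speech)]
    return wet / np.max(np.abs(wet))    # normalise to avoid clipping

fs = 16000
speech = np.random.randn(fs)            # placeholder 1 s waveform
reverberant = add_reverb(speech, fs, rt60=0.8)
```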

    Acoustic voice characteristics with and without wearing a facemask

    Facemasks are essential for healthcare workers, but the characteristics of the voice produced whilst wearing this personal protective equipment are not well understood. In the present study, we compared acoustic voice measures in recordings of sixteen adults producing standardised vocal tasks with and without wearing either a surgical mask or a KN95 mask. Data were analysed for mean spectral levels in the 0–1 kHz and 1–8 kHz regions, an energy ratio between the 0–1 and 1–8 kHz bands (LH1000), harmonics-to-noise ratio (HNR), smoothed cepstral peak prominence (CPPS), and vocal intensity. In connected speech there was significant attenuation of the mean spectral level in the 1–8 kHz region, with no significant change at 0–1 kHz. Mean spectral levels for the vowel did not change significantly in the mask-wearing conditions. LH1000 for connected speech increased significantly whilst wearing either a surgical mask or a KN95 mask, but no significant change in this measure was found for the vowel. HNR was higher in the mask-wearing conditions than in the no-mask condition. CPPS and vocal intensity did not change in the mask-wearing conditions. These findings imply an attenuating effect of wearing these types of masks on the voice spectrum, with the surgical mask showing less impact than the KN95.
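    The LH1000-style measure above is straightforward to compute from the long-term spectrum. The Python sketch below takes the level difference between the 0–1 kHz and 1–8 kHz bands; the Welch estimator settings are assumptions, and the paper's exact computation may differ.

```python
import numpy as np
from scipy.signal import welch

def band_level_db(f, psd, lo, hi):
    """Integrated power in [lo, hi) Hz, in dB."""
    band = (f >= lo) & (f < hi)
    return 10.0 * np.log10(np.sum(psd[band]) + 1e-12)

def lh1000(signal, fs):
    """Level difference between the 0-1 kHz and 1-8 kHz bands."""
    f, psd = welch(signal, fs=fs, nperseg=2048)
    low = band_level_db(f, psd, 0.0, 1000.0)
    high = band_level_db(f, psd, 1000.0, 8000.0)
    return low - high   # larger value = relatively more low-band energy

fs = 16000
voice = np.random.randn(5 * fs)         # placeholder 5 s recording
print(lh1000(voice, fs))
```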

    The impact of reverberant self-masking and overlap-masking effects on speech intelligibility by cochlear implant listeners (L)

    This is the published version, also available here: http://dx.doi.org/10.1121/1.3614539.

    The purpose of this study is to determine the relative impact of reverberant self-masking and overlap-masking effects on speech intelligibility by cochlear implant listeners. Sentences were presented in two conditions: one wherein reverberant consonant segments were replaced with clean consonants, and another wherein reverberant vowel segments were replaced with clean vowels. The underlying assumption is that self-masking effects would dominate in the first condition, whereas overlap-masking effects would dominate in the second condition. Results indicated that the degradation of speech intelligibility in reverberant conditions is caused primarily by self-masking effects that give rise to flattened formant transitions.
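    The segment-replacement logic of the two conditions can be sketched as a simple splicing operation in Python, assuming time-aligned clean and reverberant renderings of the same sentence and hand-labelled segment boundaries; crossfading and alignment details are omitted here.

```python
import numpy as np

def splice_clean_segments(reverberant, clean, fs, segments):
    """Replace labelled (start, end) spans, in seconds, of the reverberant
    signal with the corresponding spans of the time-aligned clean signal."""
    out = reverberant.copy()
    for start, end in segments:
        i0, i1 = int(start * fs), int(end * fs)
        out[i0:i1] = clean[i0:i1]
    return out

fs = 16000
clean = np.random.randn(fs)                 # placeholder aligned signals
reverb = np.random.randn(fs)
consonants = [(0.05, 0.12), (0.40, 0.48)]   # hypothetical consonant labels
condition_one = splice_clean_segments(reverb, clean, fs, consonants)
```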

    The impact of spectrally asynchronous delay on the intelligibility of conversational speech

    Conversationally spoken speech is rampant with rapidly changing and complex acoustic cues that individuals are able to hear, process, and encode to meaning. For many hearing-impaired listeners, a hearing aid is necessary to hear these spectral and temporal acoustic cues of speech. For listeners with mild-to-moderate high-frequency sensorineural hearing loss, open-fit digital signal processing (DSP) hearing aids are the most common amplification option. Open-fit DSP hearing aids introduce a spectrally asynchronous delay to the acoustic signal by allowing audible low-frequency information to pass to the eardrum unimpeded while the aid delivers amplified high-frequency sound whose onset is delayed relative to the natural pathway of sound. These spectrally asynchronous delays may disrupt the natural acoustic pattern of speech. The primary goal of this study is to measure the effect of spectrally asynchronous delay on the intelligibility of conversational speech by normal-hearing and hearing-impaired listeners. A group of normal-hearing listeners (n = 25) and a group of listeners with mild-to-moderate high-frequency sensorineural hearing loss (n = 25) participated in this study. The acoustic stimuli included 200 conversationally spoken recordings of the low-predictability sentences from the revised Speech Perception in Noise test (R-SPIN). These 200 sentences were modified to control for audibility for the hearing-impaired group and so that the acoustic energy above 2 kHz was delayed by either 0 ms (control), 4 ms, 8 ms, or 32 ms relative to the low-frequency energy. The data were analyzed to determine the effect of each of the four delay conditions on the intelligibility of the final key word of each sentence. Normal-hearing listeners were minimally affected by the asynchronous delay. However, the hearing-impaired listeners were deleteriously affected by increasing amounts of spectrally asynchronous delay. Although the hearing-impaired listeners performed well overall in their perception of conversationally spoken speech in quiet, the intelligibility of conversationally spoken sentences decreased significantly when the delay values were equal to or greater than 4 ms. Therefore, hearing aid manufacturers need to restrict the amount of delay introduced by DSP so that it does not distort the acoustic patterns of conversational speech.
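    The delay manipulation described above can be approximated with a two-band crossover, as in the Python sketch below: split the signal at 2 kHz, delay the high band, and re-sum. The 4th-order Butterworth crossover is an assumption; the study's stimulus processing may have used different filters.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def asynchronous_delay(speech, fs, delay_ms, crossover_hz=2000.0):
    """Delay energy above the crossover relative to the low band."""
    sos_lo = butter(4, crossover_hz, btype='low', fs=fs, output='sos')
    sos_hi = butter(4, crossover_hz, btype='high', fs=fs, output='sos')
    low = sosfilt(sos_lo, speech)
    high = sosfilt(sos_hi, speech)
    d = int(round(delay_ms * fs / 1000.0))
    delayed_high = np.concatenate([np.zeros(d), high])[:len(high)]
    return low + delayed_high

fs = 16000
speech = np.random.randn(fs)                  # placeholder waveform
for delay in (0, 4, 8, 32):                   # the study's four conditions
    processed = asynchronous_delay(speech, fs, delay)
```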

    The effects of binaural spectral resolution mismatch on Mandarin speech perception in simulated electric hearing

    This study assessed the effects of binaural spectral resolution mismatch on the intelligibility of Mandarin speech in noise using bilateral cochlear implant simulations. Noise-vocoded Mandarin speech, corrupted by speech-shaped noise at 0 and 5 dB signal-to-noise ratios, was presented unilaterally or bilaterally to normal-hearing listeners with mismatched spectral resolution between ears. Significant binaural benefits for Mandarin speech recognition were observed only with matched spectral resolution between ears. In addition, performance on tone identification was more robust to noise than performance on sentence recognition, suggesting that factors other than tone identification might account more for the degraded sentence recognition in noise.
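    Mixing speech with a masker at a fixed SNR, as in the 0 and 5 dB conditions above, reduces to scaling the masker by the appropriate power ratio. In the Python sketch below, white noise stands in for the study's speech-shaped noise; the mixing logic is the standard textbook computation, not the authors' exact code.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale the masker so the speech-to-noise power ratio equals snr_db,
    then add. Assumes the noise is at least as long as the speech."""
    noise = noise[:len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise

fs = 16000
speech = np.random.randn(fs)            # placeholder sentence
masker = np.random.randn(fs)            # stand-in for speech-shaped noise
for snr in (0, 5):                      # SNRs used in the study
    noisy = mix_at_snr(speech, masker, snr)
```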

    Communication Biophysics

    Contains reports on six research projects. Funding: National Institutes of Health (Grant 5 PO1 NS13126); National Institutes of Health (Grant 5 RO1 NS18682); National Institutes of Health (Grant 5 RO1 NS20322); National Institutes of Health (Grant 5 R01 NS20269); National Institutes of Health (Grant 5 T32 NS07047); Symbion, Inc.; National Science Foundation (Grant BNS 83-19874); National Science Foundation (Grant BNS 83-19887); National Institutes of Health (Grant 6 RO1 NS12846); National Institutes of Health (Grant 1 RO1 NS21322).

    Individual and environment-related acoustic-phonetic strategies for communicating in adverse conditions

    In many situations it is necessary to produce speech in ‘adverse conditions’: that is, conditions that make speech communication difficult. Research has demonstrated that speaker strategies, as described by a range of acoustic-phonetic measures, can vary both at the individual level and according to the environment, and are argued to facilitate communication. There has been debate as to the environmental specificity of these adaptations and their effectiveness in overcoming communication difficulty. Furthermore, the manner and extent to which adaptation strategies differ between individuals is not yet well understood. This thesis presents three studies that explore the acoustic-phonetic adaptations of speakers in noisy and degraded communication conditions and their relationship with intelligibility. Study 1 investigated the effects of temporally fluctuating maskers on global acoustic-phonetic measures associated with speech in noise (Lombard speech). The results replicated findings of increased power in the modulation spectrum in Lombard speech, but showed little evidence of adaptation to masker fluctuations via the temporal envelope. Study 2 collected a larger corpus of semi-spontaneous communicative speech in noise and in other degradations perturbing specific acoustic dimensions. Speakers showed different adaptations across the environments, likely suited to overcoming steady and temporally fluctuating noise, the restriction of spectral and pitch information by a noise-excited vocoder, and a sensorineural hearing loss simulation. Analyses of inter-speaker variation in Studies 1 and 2 showed that behaviour was highly variable, and some strategy combinations were identified. Study 3 investigated the intelligibility of strategies ‘tailored’ to specific environments and the relationship between intelligibility and speaker acoustics, finding a benefit of tailored speech adaptations and discussing the potential roles of speaker flexibility, adaptation level, and intrinsic intelligibility. The overall results are discussed in relation to models of communication in adverse conditions, and a model accounting for individual variability in these conditions is proposed.
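    Study 1's modulation-spectrum analysis can be illustrated with a rough Python sketch: extract a broadband temporal envelope via the Hilbert transform, low-pass and demean it, and take its power spectrum. The filter cutoff and spectral-estimation settings are illustrative assumptions, not the thesis's actual analysis parameters.

```python
import numpy as np
from scipy.signal import hilbert, butter, sosfilt, welch

def modulation_spectrum(speech, fs, env_cutoff_hz=32.0):
    """Power spectrum of the smoothed broadband temporal envelope."""
    envelope = np.abs(hilbert(speech))                 # broadband envelope
    sos = butter(4, env_cutoff_hz, btype='low', fs=fs, output='sos')
    smooth_env = sosfilt(sos, envelope)
    smooth_env = smooth_env - np.mean(smooth_env)      # remove DC component
    f, psd = welch(smooth_env, fs=fs, nperseg=fs)      # ~1 Hz resolution
    keep = f <= env_cutoff_hz                          # modulation rates only
    return f[keep], psd[keep]

fs = 16000
speech = np.random.randn(5 * fs)          # placeholder 5 s utterance
f_mod, power = modulation_spectrum(speech, fs)
```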