119 research outputs found

    Factors affecting the perception of noise-vocoded speech: stimulus properties and listener variability.

    This thesis presents an investigation of two general factors affecting speech perception in normal-hearing adults. Two sets of experiments are described, in which speakers of English are presented with degraded (noise-vocoded) speech. The first set of studies investigates the importance of linguistic rhythm as a cue for perceptual adaptation to noise-vocoded sentences. Results indicate that the presence of native English rhythmic patterns benefits speech recognition and adaptation, but not when higher-level linguistic information is absent (i.e. when the sentences are in a foreign language). It is proposed that rhythm may help in the perceptual encoding of degraded speech in phonological working memory. Experiments in this strand also present evidence against a critical role for indexical characteristics of the speaker in the adaptation process. The second set of studies concerns the issue of individual differences in speech perception. A psychometric curve-fitting approach is selected as the preferred method of quantifying variability in noise-vocoded sentence recognition. Measures of working memory and verbal IQ are identified as candidate correlates of performance with noise-vocoded sentences. When the listener is exposed to noise-vocoded stimuli from different linguistic categories (consonants and vowels, isolated words, sentences), there is evidence for the interplay of two initial listening 'modes' in response to the degraded speech signal, representing 'top-down' cognitive-linguistic processing and 'bottom-up' acoustic-phonetic analysis. Detailed analysis of segment recognition presents a perceptual role for temporal information across all the linguistic categories, and suggests that performance could be improved through training regimes that direct attention to the most informative acoustic properties of the stimulus. Across several experiments, the results also demonstrate long-term aspects of perceptual learning. 
In sum, this thesis demonstrates that consideration of both stimulus-based and listener-based factors forms a promising approach to the characterization of speech perception processes in the healthy adult listener.
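The psychometric curve-fitting approach mentioned in this abstract can be illustrated with a minimal sketch. It assumes a two-parameter logistic function of the number of vocoder channels and a coarse grid-search least-squares fit; the function names, the grids, and the synthetic scores are hypothetical and are not taken from the thesis, which may use a different function and fitting procedure.

```python
import math

def logistic(x, midpoint, slope):
    """Proportion correct predicted by a two-parameter logistic curve."""
    return 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))

def fit_logistic(xs, ps, midpoints, slopes):
    """Grid search for the (midpoint, slope) pair with the smallest squared error."""
    best = None
    for m in midpoints:
        for s in slopes:
            err = sum((logistic(x, m, s) - p) ** 2 for x, p in zip(xs, ps))
            if best is None or err < best[0]:
                best = (err, m, s)
    return best[1], best[2]

# Synthetic (made-up) listener: scores generated from a known curve,
# so the fit can be checked against the generating parameters.
channels = [1, 2, 4, 6, 8, 12, 16]
scores = [logistic(c, 5.0, 0.8) for c in channels]

midpoint_grid = [i * 0.25 for i in range(4, 49)]  # 1.0 to 12.0 channels
slope_grid = [i * 0.1 for i in range(1, 31)]      # 0.1 to 3.0 per channel

fitted_midpoint, fitted_slope = fit_logistic(channels, scores, midpoint_grid, slope_grid)
print(fitted_midpoint, fitted_slope)
```

The fitted midpoint (the number of channels at 50% correct) gives one number per listener, which is the kind of summary that can then be correlated with cognitive measures.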

    Noise reduction algorithms and performance metrics for improving speech reception in noise by cochlear-implant users

    Thesis (Ph. D.)--Harvard University--MIT Division of Health Sciences and Technology, 2005. Includes bibliographical references (p. 229-233). This thesis addresses the design and evaluation of algorithms to improve speech reception for cochlear-implant (CI) users in adverse listening environments. We develop and assess performance metrics for use in the algorithm design process; such metrics make algorithm evaluation efficient, consistent, and subject independent. One promising performance metric is the Speech Transmission Index (STI), which is well correlated with speech reception by normal-hearing listeners for additive noise and reverberation. We expect the STI will effectively predict speech reception by CI users since typical CI sound-processing strategies, like the STI, rely on the envelope signals in frequency bands spanning the speech spectrum. However, STI-based metrics have proven unsatisfactory for assessing the effects of nonlinear operations on the intelligibility of processed speech. In this work we consider modifications to the STI that account for nonlinear operations commonly found in CI sound-processing and noise reduction algorithms. We consider a number of existing speech-based STI metrics and propose novel metrics applicable to nonlinear operations. A preliminary evaluation results in the selection of three candidate metrics for extensive evaluation. In four central experiments, we consider the effects of acoustic degradation, N-of-M processing, spectral subtraction, and binaural noise reduction on the intelligibility of CI-processed speech. We assess the ability of the candidate metrics to predict speech reception scores. Subjects include CI users as well as normal-hearing subjects listening to a noise-vocoder simulation of CI sound-processing.
Our results show that: 1) both spectral subtraction and binaural noise reduction improve the intelligibility of CI-processed speech and 2) of the candidate metrics, one method (the normalized correlation metric) consistently predicts the major trends in speech reception scores for all four experiments. By Raymond Lee Goldsworthy. Ph.D.

    On the Relationship Between Short-Time Objective Intelligibility and Short-Time Spectral-Amplitude Mean-Square Error for Speech Enhancement

    The majority of deep neural network (DNN) based speech enhancement algorithms rely on the mean-square error (MSE) criterion of short-time spectral amplitudes (STSA), which has no apparent link to human perception, e.g. speech intelligibility. Short-Time Objective Intelligibility (STOI), a popular state-of-the-art speech intelligibility estimator, on the other hand, relies on linear correlation of speech temporal envelopes. This raises the question of whether a DNN training criterion based on envelope linear correlation (ELC) can lead to improved speech intelligibility performance of DNN based speech enhancement algorithms compared to algorithms based on the STSA-MSE criterion. In this paper we derive that, under certain general conditions, the STSA-MSE and ELC criteria are practically equivalent, and we provide empirical data to support our theoretical results. Furthermore, our experimental findings suggest that the standard STSA minimum-MSE estimator is near optimal if the objective is to enhance noisy speech in a manner that is optimal with respect to the STOI speech intelligibility estimator.
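To make the two criteria named in this abstract concrete, the following is a minimal sketch that computes a mean-square error and a linear (Pearson) correlation between two short envelope sequences. The function names and the toy envelope values are assumptions for illustration only; the sketch does not reproduce the paper's derivation or the full STOI procedure.

```python
import math

def mse(x, y):
    """Mean-square error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def envelope_linear_correlation(x, y):
    """Pearson correlation between two equal-length envelope sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den if den > 0 else 0.0

# Made-up envelope samples: the "estimate" is the clean envelope scaled by 2.
clean = [0.2, 0.8, 1.0, 0.5, 0.1]
estimate = [2.0 * v for v in clean]

print(mse(clean, estimate))                          # nonzero: MSE penalises the gain error
print(envelope_linear_correlation(clean, estimate))  # approximately 1: correlation ignores gain
```

In this toy case the two measures disagree because of a pure gain mismatch; the paper's equivalence result is stated under its own conditions, which this sketch does not attempt to model.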

    Speech Intelligibility Prediction for Hearing Aid Systems


    Measuring listening effort using physiological, behavioral and subjective methods in normal hearing subjects: Effect of signal to noise ratio and presentation level

    The main objective of the study is to compare the effectiveness of pupillometry, working memory, and a subjective rating scale (the physiological, behavioral, and subjective measures of listening effort, respectively) at different signal-to-noise ratios (SNR) and presentation levels when administered together. Eleven young normal-hearing individuals with a mean age of 21.7 years (SD = 1.9 years) participated in the study. The HINT sentences were used for the speech-perception-in-noise task. Listening effort was quantified using peak pupil dilation, working memory, working memory difference, and subjective ratings of listening and recall effort. Ratings of perceived performance, frustration level, and disengagement were also obtained. Using a repeated-measures design, we examined how SNR (+6 dB to -10 dB) and presentation level (50 and 65 dB SPL) affect listening effort. Tobii eye-tracker software and custom MATLAB programming were used for stimulus presentation and data analysis. SNR had a significant effect on peak pupil dilation, working memory, working memory difference, and subjective rating of listening effort. Speech intelligibility correlated significantly with all of the listening effort measures except working memory difference. The listening effort measures did not correlate significantly with one another when controlled for speech intelligibility, indicating different underlying constructs. When effect sizes are compared, working memory (partial η² = 0.98) was most sensitive to the SNR effect, followed by subjective rating of listening effort (partial η² = 0.84), working memory difference (partial η² = 0.52), and peak pupil dilation (partial η² = 0.40). Only peak pupil dilation showed a significant effect of presentation level. The physiological, behavioral, and subjective measures of listening effort have different underlying constructs, and the sensitivity of these measures varies in representing the effects of SNR and presentation level.
The individual data trend analysis shows different breakdown points for the physiological, behavioral, and subjective measures. There is a need to further explore the relationship among listening effort measures across different SNRs, and how these relationships change in persons with hearing loss.
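The effect sizes quoted in this abstract are partial eta squared values, conventionally defined as the effect sum of squares divided by the sum of the effect and error sums of squares. A minimal sketch follows; the sums of squares in the example are made-up numbers chosen only to show the arithmetic and are not data from the study.

```python
def partial_eta_squared(ss_effect, ss_error):
    """Partial eta squared: SS_effect / (SS_effect + SS_error)."""
    total = ss_effect + ss_error
    if total <= 0:
        raise ValueError("sums of squares must add up to a positive value")
    return ss_effect / total

# Hypothetical sums of squares for one factor in a repeated-measures ANOVA.
example = partial_eta_squared(ss_effect=49.0, ss_error=1.0)
print(round(example, 2))
```

Because the denominator includes only the error term associated with that effect, values from different measures can be compared as in the abstract, but they do not sum to 1 across effects.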

    The feasibility of the dual-task paradigm as a framework for a clinical test of listening effort in cochlear implant users

    The overall aim of this thesis is to evaluate the feasibility of using the behavioural framework of the dual-task paradigm as the basis of a clinical test of listening effort (LE) in cochlear implant (CI) users. It is hypothesised that, if a primary listening task is performed together with a secondary visual task, performance in the visual task will deteriorate as the listening task becomes harder. This deterioration in secondary visual task performance can then provide an index of LE. An initial series of six experiments progressively modified the dual-task design (in an attempt to optimise its sensitivity to LE), leading to the selection of British English Lexicon (BEL) sentences for the listening task and a digit stream visual task. A further three experiments applied this dual-task to 30 normal hearing (NH) participants listening to normal speech, 30 NH participants listening to CI simulations, and 25 CI users listening through their speech processors. Performance in quiet conditions was compared to that in different levels of background noise. Adaptive tracking procedures were used in an attempt to ensure that the challenge of noise was equal for all participants. This principle was also applied to equalise difficulty in terms of the number of channels used in the spectral resolution of the CI simulations. As expected, NH participants only exhibited significant deterioration in visual accuracy when noise was present (p<.001), suggesting increased LE. Interestingly, however, when CI simulations were applied, this significant visual deterioration occurred immediately in quiet (p<.001). The same result occurred in quiet for the CI users too (p<.001). Therefore, it appears that the degraded auditory input provided by CI induces LE even in optimal listening conditions. These results suggest that the dual-task paradigm could feasibly become a framework for developing a clinical test of LE in the CI user population.

    Realising the head-shadow benefit to cochlear implant users

    Cochlear implant (CI) users struggle to understand speech in noise. They suffer from elevated hearing thresholds and, with practically no binaural unmasking, they rely heavily on better-ear listening and lip reading. Traditional measures of spatial release from masking (SRM) quantify the speech reception threshold (SRT) improvement due to the azimuthal separation of speech and interferers when directly facing the speech source. The Jelfs et al. (2011) model of SRM predicts substantial benefits of orienting the head away from the target speech. Audio-only and audio-visual (AV) SRTs in normally hearing (NH) listeners and CI users confirmed model predictions of speech-facing SRM and head-orientation benefit (HOB). The lip-reading benefit (LRB) was not disrupted by a modest 30° orientation. When attending to speech with a gradually diminishing speech-to-noise ratio (SNR), CI users were found to make little spontaneous use of their available HOB. Following a simple instruction to explore their HOB, CI users immediately reached as much as 5 dB lower SNRs. AV speech presentation significantly inhibited head movements (it nearly eradicated CI users’ spontaneous head turns), but had a limited impact on the SNRs reached post-instruction, compared to audio-only presentation. NH listeners age-matched to our CI participants made more spontaneous head turns in the free-head experiment but were poorer than CI users at exploiting their HOB post-instruction, despite exhibiting a larger objective HOB. NH listeners’ and CI users’ LRB measured 3 and 5 dB, respectively. Our findings both dispel the erroneous beliefs held by CI professionals that facing the speech constitutes an optimal listening strategy (whether for lip-reading or to optimise the use of microphone directionality) and pave the way to obvious translational applications.

    Individual and environment-related acoustic-phonetic strategies for communicating in adverse conditions

    In many situations it is necessary to produce speech in ‘adverse conditions’: that is, conditions that make speech communication difficult. Research has demonstrated that speaker strategies, as described by a range of acoustic-phonetic measures, can vary both at the individual level and according to the environment, and are argued to facilitate communication. There has been debate as to the environmental specificity of these adaptations, and their effectiveness in overcoming communication difficulty. Furthermore, the manner and extent to which adaptation strategies differ between individuals is not yet well understood. This thesis presents three studies that explore the acoustic-phonetic adaptations of speakers in noisy and degraded communication conditions and their relationship with intelligibility. Study 1 investigated the effects of temporally fluctuating maskers on global acoustic-phonetic measures associated with speech in noise (Lombard speech). The results replicated findings of increased power in the modulation spectrum in Lombard speech, but showed little evidence of adaptation to masker fluctuations via the temporal envelope. Study 2 collected a larger corpus of semi-spontaneous communicative speech in noise and other degradations perturbing specific acoustic dimensions. Speakers showed different adaptations across the environments that were likely suited to overcome noise (steady and temporally fluctuating), restricted spectral and pitch information by a noise-excited vocoder, and a sensorineural hearing loss simulation. Analyses of inter-speaker variation in both studies 1 and 2 showed behaviour was highly variable and some strategy combinations were identified. 
Study 3 investigated the intelligibility of strategies ‘tailored’ to specific environments and the relationship between intelligibility and speaker acoustics, finding a benefit of tailored speech adaptations and discussing the potential roles of speaker flexibility, adaptation level, and intrinsic intelligibility. The overall results are discussed in relation to models of communication in adverse conditions, and a model accounting for individual variability in these conditions is proposed.

    Máscaras tempo-frequência para a redução de ruído aditivo em implantes cocleares

    Thesis (doctorate), Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Engenharia Elétrica, Florianópolis, 2019. Abstract: Cochlear implants (CI) are devices that partially restore hearing in subjects with severe deafness through electrical stimulation of the auditory nerve. Even though the information they provide is limited by poor time and frequency resolution, cochlear implant users may score up to about 80% in speech intelligibility experiments. However, this performance is significantly reduced in the presence of noise, which is the case in most everyday acoustic scenarios. Noise reduction techniques are an alternative for enhancing acoustic perception by cochlear implant users. The main techniques proposed for noise reduction in cochlear implants consist of time-frequency masks, such as the binary mask (BM) and the Wiener filter (WF) and its variations (parametric and constrained). In this work, a new unified theory for time-frequency masks is presented. By setting two parameters, different suppression functions may be realized, including well-established masks such as the BM and the WF. Another advantage of the proposed theory is that the masks derived from it are optimal in some sense, unlike heuristic masks such as the parametric Wiener filter (WP). In addition, the proposed mask can be adjusted within a wider range of suppression functions than the WP.
Extensive numerical simulations show that the proposed mask and the WP may benefit CI users' speech perception in noisy environments. Nevertheless, the development of those masks does not take into account specific characteristics of the device. Most CI devices present only the signal's temporal envelope information to the user and discard phase information entirely. Thus, a new time-domain filter is proposed to estimate the temporal envelope of each speech sub-band. Numerical simulations show that this second proposed filter leads to better estimates of the speech envelope than the WF. Psychoacoustic experiments with normal-hearing subjects using a CI simulator, as well as with actual CI users, indicate that the proposed envelope estimator leads to better intelligibility results than the WF, mainly for signals corrupted at SNR < -5 dB.
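The two well-established masks named in this abstract can be written as gain functions of the local SNR in a time-frequency cell. The sketch below uses the common textbook forms: a binary mask with a 0 dB threshold, the Wiener gain ξ/(1+ξ), and a generic parametric Wiener form (ξ/(ξ+α))^β. The function names, the threshold, and the parameters `alpha` and `beta` are assumptions for illustration; they are not the thesis's unified mask or its two parameters.

```python
def db_to_linear(snr_db):
    """Convert an SNR in dB to a linear power ratio."""
    return 10.0 ** (snr_db / 10.0)

def binary_mask_gain(snr_db, threshold_db=0.0):
    """Binary mask: keep the cell if its SNR exceeds the threshold, else discard it."""
    return 1.0 if snr_db > threshold_db else 0.0

def wiener_gain(snr_db):
    """Wiener gain for a cell with SNR xi (linear): xi / (1 + xi)."""
    xi = db_to_linear(snr_db)
    return xi / (1.0 + xi)

def parametric_wiener_gain(snr_db, alpha=1.0, beta=1.0):
    """Generic parametric Wiener form (assumed): (xi / (xi + alpha)) ** beta."""
    xi = db_to_linear(snr_db)
    return (xi / (xi + alpha)) ** beta

# Compare the suppression applied by each mask across a range of local SNRs.
for snr_db in (-10.0, -5.0, 0.0, 5.0, 10.0):
    print(snr_db,
          binary_mask_gain(snr_db),
          round(wiener_gain(snr_db), 3),
          round(parametric_wiener_gain(snr_db, alpha=2.0, beta=0.5), 3))
```

With `alpha=1` and `beta=1` the parametric form reduces to the Wiener gain, which illustrates the general idea that a small number of parameters can span several suppression functions.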