8,640 research outputs found

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Get PDF
    International audienceSpeech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output

    Speaker-normalized sound representations in the human auditory cortex

    Get PDF
    The acoustic dimensions that distinguish speech sounds (like the vowel differences in “boot” and “boat”) also differentiate speakers’ voices. Therefore, listeners must normalize across speakers without losing linguistic information. Past behavioral work suggests an important role for auditory contrast enhancement in normalization: preceding context affects listeners’ perception of subsequent speech sounds. Here, using intracranial electrocorticography in humans, we investigate whether and how such context effects arise in auditory cortex. Participants identified speech sounds that were preceded by phrases from two different speakers whose voices differed along the same acoustic dimension as target words (the lowest resonance of the vocal tract). In every participant, target vowels evoke a speaker-dependent neural response that is consistent with the listener’s perception, and which follows from a contrast enhancement model. Auditory cortex processing thus displays a critical feature of normalization, allowing listeners to extract meaningful content from the voices of diverse speakers

    An evaluation of intrusive instrumental intelligibility metrics

    Full text link
    Instrumental intelligibility metrics are commonly used as an alternative to listening tests. This paper evaluates 12 monaural intrusive intelligibility metrics: SII, HEGP, CSII, HASPI, NCM, QSTI, STOI, ESTOI, MIKNN, SIMI, SIIB, and sEPSMcorr\text{sEPSM}^\text{corr}. In addition, this paper investigates the ability of intelligibility metrics to generalize to new types of distortions and analyzes why the top performing metrics have high performance. The intelligibility data were obtained from 11 listening tests described in the literature. The stimuli included Dutch, Danish, and English speech that was distorted by additive noise, reverberation, competing talkers, pre-processing enhancement, and post-processing enhancement. SIIB and HASPI had the highest performance achieving a correlation with listening test scores on average of ρ=0.92\rho=0.92 and ρ=0.89\rho=0.89, respectively. The high performance of SIIB may, in part, be the result of SIIBs developers having access to all the intelligibility data considered in the evaluation. The results show that intelligibility metrics tend to perform poorly on data sets that were not used during their development. By modifying the original implementations of SIIB and STOI, the advantage of reducing statistical dependencies between input features is demonstrated. Additionally, the paper presents a new version of SIIB called SIIBGauss\text{SIIB}^\text{Gauss}, which has similar performance to SIIB and HASPI, but takes less time to compute by two orders of magnitude.Comment: Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing, 201

    The evolution of auditory contrast

    Get PDF
    This paper reconciles the standpoint that language users do not aim at improving their sound systems with the observation that languages seem to improve their sound systems. Computer simulations of inventories of sibilants show that Optimality-Theoretic learners who optimize their perception grammars automatically introduce a so-called prototype effect, i.e. the phenomenon that the learner’s preferred auditory realization of a certain phonological category is more peripheral than the average auditory realization of this category in her language environment. In production, however, this prototype effect is counteracted by an articulatory effect that limits the auditory form to something that is not too difficult to pronounce. If the prototype effect and the articulatory effect are of a different size, the learner must end up with an auditorily different sound system from that of her language environment. The computer simulations show that, independently of the initial auditory sound system, a stable equilibrium is reached within a small number of generations. In this stable state, the dispersion of the sibilants of the language strikes an optimal balance between articulatory ease and auditory contrast. The important point is that this is derived within a model without any goal-oriented elements such as dispersion constraints

    Enhanced amplitude modulations contribute to the Lombard intelligibility benefit: Evidence from the Nijmegen Corpus of Lombard Speech

    No full text
    Speakers adjust their voice when talking in noise, which is known as Lombard speech. These acoustic adjustments facilitate speech comprehension in noise relative to plain speech (i.e., speech produced in quiet). However, exactly which characteristics of Lombard speech drive this intelligibility benefit in noise remains unclear. This study assessed the contribution of enhanced amplitude modulations to the Lombard speech intelligibility benefit by demonstrating that (1) native speakers of Dutch in the Nijmegen Corpus of Lombard Speech (NiCLS) produce more pronounced amplitude modulations in noise vs. in quiet; (2) more enhanced amplitude modulations correlate positively with intelligibility in a speech-in-noise perception experiment; (3) transplanting the amplitude modulations from Lombard speech onto plain speech leads to an intelligibility improvement, suggesting that enhanced amplitude modulations in Lombard speech contribute towards intelligibility in noise. Results are discussed in light of recent neurobiological models of speech perception with reference to neural oscillators phase-locking to the amplitude modulations in speech, guiding the processing of speech

    Objective measures for predicting the intelligibility of spectrally smoothed speech with artificial excitation

    Get PDF
    A study is presented on how well objective measures of speech quality and intelligibility can predict the subjective in- telligibility of speech that has undergone spectral envelope smoothing and simplification of its excitation. Speech modi- fications are made by resynthesising speech that has been spec- trally smoothed. Objective measures are applied to the mod- ified speech and include measures of speech quality, signal- to-noise ratio and intelligibility, as well as proposing the nor- malised frequency-weighted spectral distortion (NFD) measure. The measures are compared to subjective intelligibility scores where it is found that several have high correlation (|r| ≥ 0.7), with NFD achieving the highest correlation (r = −0.81
    corecore