129 research outputs found

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.

    Listeners’ Spectral Reallocation Preferences for Speech in Noise

    Modifying the spectrum of recorded or synthetic speech is an effective strategy for boosting intelligibility in noise without increasing the speech level. However, the wider impact of changes to the spectral energy distribution of speech is poorly understood. The present study explored the influence of spectral modifications using an experimental paradigm in which listeners were able to adjust speech parameters directly with real-time audio feedback, allowing the joint elicitation of preferences and word recognition scores. In two experiments involving full-bandwidth and bandwidth-limited speech, respectively, listeners adjusted one of eight features that altered the speech spectrum, and then immediately carried out a sentence-in-noise recognition task at the chosen setting. Listeners’ preferred adjustments in most conditions involved the transfer of speech energy from the sub-1 kHz region to the 1–4 kHz range. Preferences were not random, even when intelligibility was at the ceiling or constant across a range of adjustment values, suggesting that listener choices encompass more than a desire to maintain comprehensibility. Olympia Simantiraki was funded by the European Commission under the Marie Curie European Training Network ENRICH (675324).

    Intelligibility-enhancing speech modifications: the Hurricane Challenge

    Speech output is used extensively, including in situations where correct message reception is threatened by adverse listening conditions. Recently, there has been a growing interest in algorithmic modifications that aim to increase the intelligibility of both natural and synthetic speech when presented in noise. The Hurricane Challenge is the first large-scale open evaluation of algorithms designed to enhance speech intelligibility. Eighteen systems operating on a common data set were subjected to extensive listening tests and compared to unmodified natural and text-to-speech (TTS) baselines. The best-performing systems achieved gains over unmodified natural speech of 4.4 and 5.1 dB in competing speaker and stationary noise, respectively, while TTS systems made gains of 5.6 and 5.1 dB over their baseline. Surprisingly, for most conditions the largest gains were observed for noise-independent algorithms, suggesting that performance in this task can be further improved by exploiting information in the masking signal. Index Terms: intelligibility, speech modification, TTS.

    Kinematic signatures of prosody in Lombard speech

    Peer reviewed

    Speech Modifications for Supporting Auditory Comprehension in Aphasia

    Speaking “clearly” is a common strategy used to support auditory comprehension for people with hearing loss (Picheny, Durlach, & Braida, 1986). Recent preliminary research has also found that modifying speaking behaviors can facilitate comprehension for all people, not just those with hearing loss. This technique of using “clear speech” was shown to help people with language disorders following neurological impairment (aphasia) as well as typical control adults. The aim of the present study was to extend these findings by analyzing the benefits of clear speech for people with neurological impairment and typical control peers in less-than-optimal listening environments (background noise). No significant differences were found in participant response accuracy or reaction time, regardless of speaking style or listening environment. However, the results were limited by small participant numbers and simple stimuli that led to observed ceiling effects.

    Speech produced in noise: Relationship between listening difficulty and acoustic and durational parameters.

    Conversational speech produced in noise can be characterised by increases in intelligibility relative to such speech produced in quiet. Listening difficulty (LD) is a metric that can be used to evaluate speech transmission performance more sensitively than intelligibility scores in situations in which performance is likely to be high. The objectives of the present study were to evaluate the LD of speech produced in different noise and style conditions, to evaluate the spectral and durational speech modifications associated with these conditions, and to determine whether any of the spectral and durational parameters predicted LD. Nineteen subjects were instructed to speak at normal and loud volumes in the presence of background noise at 40.5 dB(A) and babble noise at 61 dB(A). The speech signals were amplitude-normalised, combined with pink noise to obtain a signal-to-noise ratio of -6 dB, and presented to twenty raters who judged their LD. Vowel duration, fundamental frequency and the proportion of the spectral energy in high vs low frequencies increased with the noise level within both styles. LD was lowest when the speech was produced in the presence of high-level noise and at a loud volume, indicating improved intelligibility. Spectrum balance was observed to predict LD.
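
The abstract above describes combining speech with noise at a signal-to-noise ratio of -6 dB. The following is a minimal sketch of how such a mixture could be built, not the study's actual procedure. It assumes SNR is defined on RMS levels, uses a sine tone in place of recorded speech, and substitutes white Gaussian noise for pink noise to stay within the standard library. The names `rms`, `mix_at_snr` and `measure_snr_db` are hypothetical helpers introduced for illustration.

```python
import math
import random


def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def measure_snr_db(speech, noise):
    """SNR in dB, assuming the RMS-based definition 20*log10(rms_s / rms_n)."""
    return 20 * math.log10(rms(speech) / rms(noise))


def mix_at_snr(speech, noise, target_snr_db):
    """Scale `noise` so that speech-to-noise SNR equals `target_snr_db`, then add it."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    speech_rms, noise_rms = rms(speech), rms(noise)
    if speech_rms == 0 or noise_rms == 0:
        raise ValueError("signals must be non-silent")
    # Solve 20*log10(speech_rms / (gain * noise_rms)) = target_snr_db for gain.
    gain = speech_rms / (noise_rms * 10 ** (target_snr_db / 20))
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise


# Toy signals (assumptions): a 200 Hz tone stands in for speech,
# white Gaussian noise stands in for the pink noise used in the study.
fs = 16000
rng = random.Random(0)
speech = [0.5 * math.sin(2 * math.pi * 200 * n / fs) for n in range(fs)]
noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]

mixture, scaled_noise = mix_at_snr(speech, noise, -6.0)
achieved = measure_snr_db(speech, scaled_noise)
print(f"achieved SNR: {achieved:.2f} dB")
```

At a negative SNR the scaled noise has a higher RMS level than the speech, which is why the unmixed speech is level-normalised first: the listener's task difficulty is then set by the noise gain alone.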