iMetricGAN: Intelligibility Enhancement for Speech-in-Noise using Generative Adversarial Network-based Metric Learning
The intelligibility of natural speech is seriously degraded when exposed to
adverse noisy environments. In this work, we propose a deep learning-based
speech modification method to compensate for the intelligibility loss, with the
constraint that the root mean square (RMS) level and duration of the speech
signal are maintained before and after modifications. Specifically, we utilize
an iMetricGAN approach to optimize the speech intelligibility metrics with
generative adversarial networks (GANs). Experimental results show that the
proposed iMetricGAN outperforms conventional state-of-the-art algorithms in
terms of objective measures, i.e., speech intelligibility in bits (SIIB) and
extended short-time objective intelligibility (ESTOI), under a Cafeteria noise
condition. In addition, formal listening tests reveal significant
intelligibility gains when both noise and reverberation exist.
Comment: 5 pages, Submitted to INTERSPEECH 202
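The equal-RMS constraint described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `rms` and `match_rms` and the sample values are hypothetical, and the "modified" signal simply stands in for the output of any speech modification step. Because the rescaling is a single gain applied sample by sample, the number of samples (and hence the duration at a fixed sampling rate) is unchanged.

```python
import math


def rms(signal):
    """Root mean square level of a non-empty sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def match_rms(modified, reference):
    """Rescale `modified` so that its RMS level equals that of `reference`."""
    current = rms(modified)
    if current == 0.0:
        # A silent signal cannot be rescaled; return it unchanged.
        return list(modified)
    gain = rms(reference) / current
    return [gain * x for x in modified]


# Hypothetical sample values, for illustration only.
original = [0.1, -0.2, 0.3, -0.4]
modified = [0.5, -0.1, 0.9, -0.3]  # stand-in for a modified speech signal
restored = match_rms(modified, original)
```

After the call, `restored` has the same length as `original` and, up to floating-point error, the same RMS level, so any intelligibility difference cannot be attributed to a simple increase in loudness.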
The listening talker: A review of human and algorithmic context-induced modifications of speech
Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.