Effects of Lombard Reflex on the Performance of Deep-Learning-Based
  Audio-Visual Speech Enhancement Systems

Jensen, Jesper; Michelsanti, Daniel; Sigurdsson, Sigurdur; Tan, Zheng-Hua



Authors: Jesper Jensen, Daniel Michelsanti, Sigurdur Sigurdsson, Zheng-Hua Tan
Publication date: 15 November 2018
Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Abstract

Humans tend to change their way of speaking when they are immersed in a noisy environment, a reflex known as the Lombard effect. Current speech enhancement systems based on deep learning do not usually take this change in speaking style into account, because they are trained on neutral (non-Lombard) speech utterances recorded under quiet conditions, to which noise is artificially added. In this paper, we investigate the effects that the Lombard reflex has on the performance of audio-visual speech enhancement systems based on deep learning. The results show a performance gap of up to approximately 5 dB between the systems trained on neutral speech and those trained on Lombard speech. This indicates the benefit of taking into account the mismatch between neutral and Lombard speech in the design of audio-visual speech enhancement systems.
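
The abstract notes that training data is usually built by artificially adding noise to speech recorded in quiet. As a rough illustration of that step only, the sketch below scales a noise signal so that the mixture has a chosen signal-to-noise ratio. It is not the authors' pipeline: the function names, the sine wave standing in for speech, the Gaussian noise, and the -5 dB target are all assumptions made for the example.

```python
import math
import random


def mean_power(signal):
    """Average power of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the requested SNR in dB.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    speech_power = mean_power(speech)
    noise_power = mean_power(noise)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("speech and noise must both have non-zero power")
    # SNR_dB = 10 * log10(P_speech / (gain^2 * P_noise)), solved for gain.
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """SNR of `noisy` relative to the clean `speech` it was built from."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(mean_power(speech) / mean_power(residual))


# Assumed stand-ins: a 220 Hz tone as "speech" and white Gaussian noise.
random.seed(0)
rate = 16000
clean = [math.sin(2 * math.pi * 220 * t / rate) for t in range(rate)]
noise = [random.gauss(0.0, 1.0) for _ in range(rate)]
noisy = mix_at_snr(clean, noise, snr_db=-5.0)
```

Mixing of this kind changes only the noise level. The clean recording keeps its neutral speaking style, which is the mismatch with real Lombard speech that the abstract describes.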
