This paper presents a configurable version of Extreme Bandwidth Extension
Network (EBEN), a Generative Adversarial Network (GAN) designed to improve
audio captured with body-conduction microphones. We show that although these
microphones significantly reduce environmental noise, this insensitivity to
ambient noise happens at the expense of the bandwidth of the speech signal
acquired by the wearer of the devices. The obtained captured signals therefore
require the use of signal enhancement techniques to recover the full-bandwidth
speech. EBEN leverages a configurable multiband decomposition of the raw
captured signal. This decomposition allows the data time domain dimensions to
be reduced and the full band signal to be better controlled. The multiband
representation of the captured signal is processed through a U-Net-like model,
which combines feature and adversarial losses to generate an enhanced speech
signal. We also benefit from this original representation in the proposed
configurable discriminators architecture. The configurable EBEN approach can
achieve state-of-the-art enhancement results on synthetic data with a
lightweight generator that allows real-time processing.Comment: Accepted in IEEE/ACM Transactions on Audio, Speech and Language
Processing on 14/08/202