Time-domain speech enhancement using generative adversarial networks

Bonafonte Cávez, Antonio; Pascual de la Puente, Santiago; Serra, Joan

Publication date: 1 November 2019
Publisher: Elsevier BV

Abstract

Speech enhancement improves recorded voice utterances by removing noise that may impair their intelligibility or degrade their quality. Typical speech enhancement systems are based on regression approaches that either subtract noise or predict clean signals, and most of them do not operate directly on waveforms. In this work, we propose a generative approach that regenerates corrupted signals into a clean version by applying generative adversarial networks to the raw signal. We also explore several variations of the proposed system, gaining insight into suitable architectural choices for an adversarially trained convolutional autoencoder applied to speech. We assess the proposed method with both objective and subjective evaluations. The objective evaluation helps us choose among variations and tune hyperparameters, while the subjective evaluation, a listening experiment with 42 subjects, confirms the effectiveness of the approach in the real world. We also demonstrate that the approach applies to more generalized speech enhancement, where voices must be regenerated from whispered signals.

Peer Reviewed. Postprint (author's final draft).
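The abstract does not state the training objective for the adversarially trained autoencoder. As an illustration only, the sketch below assumes a least-squares GAN formulation in which the discriminator scores (waveform, noisy input) pairs, plus an L1 term that keeps the generated waveform close to the clean reference. The function names and the weight `lam` are hypothetical, and the generator and discriminator networks themselves are omitted. Only the loss arithmetic is shown, operating on arrays of discriminator scores and waveform samples.

```python
import numpy as np


def discriminator_loss(d_real: np.ndarray, d_fake: np.ndarray) -> float:
    """Least-squares discriminator loss (assumed objective, not from the paper).

    d_real: discriminator scores on (clean, noisy) pairs, pushed toward 1.
    d_fake: discriminator scores on (enhanced, noisy) pairs, pushed toward 0.
    """
    return 0.5 * float(np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2))


def generator_loss(
    d_fake: np.ndarray,
    enhanced: np.ndarray,
    clean: np.ndarray,
    lam: float = 100.0,  # hypothetical weight of the L1 waveform term
) -> float:
    """Adversarial term plus L1 distance to the clean waveform (assumed objective).

    The adversarial term rewards enhanced waveforms that the discriminator
    scores as clean (close to 1); the L1 term is averaged over samples.
    """
    adversarial = 0.5 * float(np.mean((d_fake - 1.0) ** 2))
    l1 = float(np.mean(np.abs(enhanced - clean)))
    return adversarial + lam * l1


# Usage: a fully fooled discriminator and an exact reconstruction give zero loss.
print(generator_loss(np.ones(3), np.zeros(4), np.zeros(4)))  # 0.0
# A discriminator that scores real pairs as 0 and fake pairs as 1 is maximally wrong.
print(discriminator_loss(np.zeros(2), np.ones(2)))  # 1.0
```

In this kind of setup, the L1 term anchors the output to the reference waveform, while the adversarial term pushes it toward outputs the discriminator judges to be clean speech.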
