Monaural Singing Voice Separation with Skip-Filtering Connections and
  Recurrent Inference of Time-Frequency Mask

Bengio, Yoshua; Drossos, Konstantinos; Mimilakis, Stylianos Ioannis; Santos, João F.; Schuller, Gerald; Virtanen, Tuomas

Authors: Yoshua Bengio, Konstantinos Drossos, Stylianos Ioannis Mimilakis, João F. Santos, Gerald Schuller, Tuomas Virtanen
Publication date: 13 February 2018

Abstract

Singing voice separation based on deep learning relies on time-frequency masking. In many cases the masking process is not a learnable function or is not included in the deep learning optimization. Consequently, most existing methods rely on a post-processing step that uses generalized Wiener filtering. This work proposes a method that learns and optimizes a source-dependent mask during training and does not need that post-processing step. We introduce a recurrent inference algorithm, a sparse transformation step to improve the mask generation process, and a learned denoising filter. The results show an increase of 0.49 dB in signal-to-distortion ratio and 0.30 dB in signal-to-interference ratio compared to previous state-of-the-art approaches for monaural singing voice separation.
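The masking ideas named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the toy spectrograms, the Wiener exponent, and the stopping rule (iterate until the mean squared change drops below a threshold, up to a maximum number of iterations) are all assumptions made for illustration. In the proposed method the mask would be produced by a trained network; here it is derived from known toy sources only so that the arithmetic can be checked.

```python
import numpy as np


def skip_filter(mixture_mag, mask):
    """Apply a time-frequency mask directly to the mixture magnitude
    spectrogram (the mixture "skips" to the output and is filtered there)."""
    return mask * mixture_mag


def wiener_like_mask(source_mags, target_index, power=2.0, eps=1e-8):
    """Generalized Wiener-style mask built from per-source magnitude
    estimates. `power` and `eps` are illustrative choices."""
    powered = [np.abs(s) ** power for s in source_mags]
    return powered[target_index] / (np.sum(powered, axis=0) + eps)


def recurrent_inference(step_fn, init, max_iter=10, tol=1e-3):
    """Re-apply `step_fn` to its own output until the mean squared change
    between consecutive estimates is below `tol` (assumed stopping rule)."""
    current = init
    for iteration in range(1, max_iter + 1):
        updated = step_fn(current)
        if np.mean((updated - current) ** 2) < tol:
            return updated, iteration
        current = updated
    return current, max_iter


# Toy magnitude spectrograms (frequency bins x time frames), hypothetical data.
voice = np.array([[1.0, 0.0], [0.0, 4.0]])
accompaniment = np.array([[0.0, 2.0], [3.0, 0.0]])
mixture = voice + accompaniment

mask = wiener_like_mask([voice, accompaniment], target_index=0)
voice_estimate = skip_filter(mixture, mask)

# A stand-in refinement step (halving) used only to exercise the loop.
refined, iterations = recurrent_inference(lambda x: 0.5 * x, np.ones(4))
```

Because the toy sources do not overlap in any time-frequency bin, the mask is close to binary and `voice_estimate` recovers `voice` almost exactly; real mixtures overlap, which is why the quality of the mask matters.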

Available Versions

Fraunhofer-ePrints

oai:fraunhofer.de:N-520118

Last time updated on 17/01/2019

ZENODO

oai:zenodo.org:1064805

Last time updated on 05/01/2018

Crossref

Last time updated on 10/08/2021