Joint singing voice separation and F0 estimation with deep U-net architectures

Authors: andreas; chao-ling; colin; emmanuel; geoffrey; jean-louis; jong; justin; matthias; olaf; rachel; rachel; scott; sheng; sungheon; tak-shing; tuomas; yipeng
Publication date: 18 November 2019
Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Abstract

Vocal source separation and fundamental frequency estimation in music are tightly related tasks. The outputs of vocal source separation systems have previously been used as inputs to vocal fundamental frequency estimation systems; conversely, vocal fundamental frequency has been used as side information to improve vocal source separation. In this paper, we propose several approaches for jointly separating vocals and estimating fundamental frequency. We show that joint learning is advantageous for these tasks, and that a stacked architecture which first performs vocal separation outperforms the other configurations considered. Furthermore, the best joint model achieves state-of-the-art results for vocal-f0 estimation on the iKala dataset. Finally, we highlight the importance of performing polyphonic, rather than monophonic, vocal-f0 estimation for many real-world cases.
