On the use of U-Net for dominant melody estimation in polyphonic music

Abstract

Estimation of dominant melody in polyphonic music remains a difficult task, even though promising breakthroughs have been made recently with the introduction of the Harmonic CQT and the use of fully convolutional networks. In this paper, we build upon this idea and describe how U-Net, a neural network originally designed for medical image segmentation, can be used to estimate the dominant melody in polyphonic audio. In particular, we propose an original layer-by-layer sequential training method, and show that this method, used along with careful training data conditioning, improves the results compared to plain convolutional networks.
