Deep Clustering and Conventional Networks for Music Separation: Stronger Together
Deep clustering is the first method to handle general audio separation
scenarios with multiple sources of the same type and an arbitrary number of
sources, performing impressively in speaker-independent speech separation
tasks. However, little is known about its effectiveness in other challenging
situations such as music source separation. Contrary to conventional networks
that directly estimate the source signals, deep clustering generates an
embedding for each time-frequency bin, and separates sources by clustering the
bins in the embedding space. We show that deep clustering outperforms
conventional networks on a singing voice separation task, in both matched and
mismatched conditions, even though conventional networks have the advantage of
end-to-end training for best signal approximation, presumably because deep clustering's more
flexible objective engenders better regularization. Since the strengths of deep
clustering and conventional network architectures appear complementary, we
explore combining them in a single hybrid network trained via an approach akin
to multi-task learning. Remarkably, the combination significantly outperforms
either of its components.
Comment: Published in ICASSP 201
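The clustering step described in this abstract (one embedding per time-frequency bin, then sources separated by clustering the bins) can be illustrated with a minimal sketch. It assumes the embeddings have already been produced by some network and uses plain k-means in NumPy; the function name `cluster_masks`, the array layout, and the choice of k-means are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def cluster_masks(embeddings, n_sources, n_iter=20, seed=0):
    """Cluster time-frequency embeddings into binary masks.

    embeddings: array of shape (T, F, D), one D-dim vector per bin
                (hypothetical layout, assumed for this sketch).
    Returns an array of shape (n_sources, T, F) with one 0/1 mask per source.
    """
    T, F, D = embeddings.shape
    X = embeddings.reshape(-1, D).astype(float)
    rng = np.random.default_rng(seed)
    # Initialise the centres from randomly chosen bins.
    centers = X[rng.choice(len(X), size=n_sources, replace=False)].copy()

    def assign(points, ctrs):
        # Squared Euclidean distance from every bin to every centre.
        dists = ((points[:, None, :] - ctrs[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(X, centers)
        for k in range(n_sources):
            members = X[labels == k]
            if len(members) > 0:  # leave an empty cluster's centre unchanged
                centers[k] = members.mean(axis=0)

    labels = assign(X, centers)
    masks = np.stack([labels == k for k in range(n_sources)])
    return masks.reshape(n_sources, T, F).astype(float)
```

Each mask would then be multiplied with the mixture spectrogram to obtain one source estimate. Because every bin is assigned to exactly one cluster, the masks sum to one at each bin.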
Joint singing voice separation and F0 estimation with deep U-net architectures
Vocal source separation and fundamental frequency estimation in music are tightly related tasks. The outputs of vocal source separation systems have previously been used as inputs to vocal fundamental frequency estimation systems; conversely, vocal fundamental frequency has been used as side information to improve vocal source separation. In this paper, we propose several different approaches for jointly separating vocals and estimating fundamental frequency. We show that joint learning is advantageous for these tasks, and that a stacked architecture which first performs vocal separation outperforms the other configurations considered. Furthermore, the best joint model achieves state-of-the-art results for vocal-f0 estimation on the iKala dataset. Finally, we highlight the importance of performing polyphonic, rather than monophonic, vocal-f0 estimation for many real-world cases.
A Comparison of Extended Source-Filter Models for Musical Signal Reconstruction
China Scholarship Council (CSC)/Queen Mary Joint PhD scholarship; Royal Academy of Engineering Research Fellowship
Upmixing from Mono : a Source Separation Approach
We present a system for upmixing mono recordings to stereo through the use of sound source separation techniques. The use of sound source separation has the advantage of allowing sources to be placed at distinct points in the stereo field, resulting in more natural sounding upmixes. The system separates an input signal into a number of sources, which can then be imported into a digital audio workstation for upmixing to stereo. Considerations to be taken into account when upmixing are discussed, and a brief overview of the various sound source separation techniques used in the system is given. The effectiveness of the proposed system is then demonstrated on real-world mono recordings.
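As a rough illustration of placing separated sources at distinct points in the stereo field, the sketch below applies constant-power panning to already-separated mono sources and sums them into two channels. The abstract describes doing this step in a digital audio workstation, so the panning law, the function name `pan_sources`, and the position convention (-1 for hard left, +1 for hard right) are assumptions made for this example only.

```python
import numpy as np

def pan_sources(sources, positions):
    """Mix separated mono sources into a stereo signal.

    sources:   array-like of shape (n_sources, n_samples).
    positions: one pan position per source in [-1, 1]
               (-1 = hard left, 0 = centre, +1 = hard right; assumed convention).
    Returns an array of shape (2, n_samples): row 0 is left, row 1 is right.
    """
    sources = np.asarray(sources, dtype=float)
    positions = np.asarray(positions, dtype=float)
    # Map [-1, 1] to an angle in [0, pi/2]; cos/sin gains keep power constant.
    theta = (positions + 1.0) * np.pi / 4.0
    left = (np.cos(theta)[:, None] * sources).sum(axis=0)
    right = (np.sin(theta)[:, None] * sources).sum(axis=0)
    return np.stack([left, right])
```

With this law the squared left and right gains of each source sum to one, so a source keeps the same overall power wherever it is placed.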
PoLyScriber: Integrated Training of Extractor and Lyrics Transcriber for Polyphonic Music
Lyrics transcription of polyphonic music is challenging as the background
music affects lyrics intelligibility. Typically, lyrics transcription is
performed by a two-step pipeline, i.e., a singing vocal extraction frontend
followed by a lyrics transcriber backend, where the frontend and backend are
trained separately. Such a two-step pipeline suffers from both imperfect vocal
extraction and mismatch between frontend and backend. In this work, we propose
a novel end-to-end integrated training framework, which we call PoLyScriber, to
globally optimize the vocal extractor frontend and lyrics transcriber backend
for lyrics transcription in polyphonic music. The experimental results show
that our proposed integrated training model achieves substantial improvements
over the existing approaches on publicly available test datasets.
Comment: 13 page