    Time-domain music source separation for choirs and ensembles

    Music source separation is the task of separating musical sources from an audio mixture. It has various direct applications, including automatic karaoke generation, enhancement of musical recordings, and 3D-audio upmixing, and it also benefits downstream music information retrieval tasks such as multi-instrument transcription. However, the majority of research has focused on fixed-stem separation of vocals, drums, and bass. While such models have demonstrated the capabilities of deep-learning-based source separation, their applicability is limited to very few use cases: they are unable to separate most other instruments due to insufficient training data, and class-based separation inherently prevents them from separating monotimbral mixtures. This thesis focuses on separating musical sources without requiring timbral distinction among the sources. Preliminary attempts address the separation of vocal harmonies from choral ensembles using time-domain models with permutation invariant training. The method performs well but fails to generalise across datasets, mainly due to the lack of sizeable clean training data. Recognising the challenge of obtaining sizeable, bleed-free data for ensemble recordings, a new high-quality synthesised dataset, "EnsembleSet", is presented and used to train a monotimbral ensemble separation model for string ensembles. Moreover, a model trained with permutation invariant training is found to be capable of separating mixtures of identical, distinct, and unseen timbres. Although models trained on EnsembleSet can separate mixtures from unseen real-world datasets, performance drops are observed on out-of-domain test data. Subsequently, improving cross-dataset performance through fine-tuning is explored for time-domain and complex-domain separation models. The behaviour of these timbre-agnostic separation models is further investigated under different training strategies and in different musical contexts. The techniques developed in this work are currently being used in industry for vocal harmony separation and lay the groundwork for future exploration toward universal source separation based on monophonic sound event separation.
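
    The abstract leans on permutation invariant training (PIT) to make separation timbre-agnostic: the model is not tied to a fixed source ordering, so mixtures of identical timbres can still be separated. The sketch below is a minimal illustration of the idea only, not the thesis's implementation; it assumes a PyTorch-style setup, and the function name pit_loss, the tensor shapes, and the choice of mean-squared error (rather than a signal-level loss such as negative SI-SDR, which is more common for time-domain models) are illustrative assumptions.

```python
import itertools
import torch

def pit_loss(estimates: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Permutation invariant training (PIT) loss (illustrative sketch).

    estimates, targets: tensors of shape (batch, n_sources, n_samples).
    For each mixture, the per-source loss is evaluated under every
    assignment of estimated sources to reference sources, and the
    cheapest permutation is used, so the training signal does not
    depend on any fixed ordering of the sources.
    """
    batch, n_sources, _ = estimates.shape
    per_perm_losses = []
    for perm in itertools.permutations(range(n_sources)):
        permuted = estimates[:, list(perm), :]
        # Mean-squared error per mixture under this assignment
        # (a signal-level loss could be substituted here).
        per_perm_losses.append(((permuted - targets) ** 2).mean(dim=(1, 2)))
    all_perms = torch.stack(per_perm_losses, dim=1)  # (batch, n_permutations)
    min_loss, _ = all_perms.min(dim=1)               # best permutation per mixture
    return min_loss.mean()
```

    Exhaustive enumeration of permutations, as above, is only practical for small source counts (e.g. the four voices of a choir); larger ensembles would call for an assignment algorithm rather than brute force.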

    Proceedings of the 19th Sound and Music Computing Conference

    Proceedings of the 19th Sound and Music Computing Conference - June 5-12, 2022 - Saint-Étienne (France). https://smc22.grame.f