4 research outputs found

    MixCycle: Unsupervised Speech Separation via Cyclic Mixture Permutation Invariant Training

    We introduce two unsupervised source separation methods, which involve self-supervised training from single-channel two-source speech mixtures. Our first method, mixture permutation invariant training (MixPIT), enables learning a neural network model which separates the underlying sources via a challenging proxy task without supervision from the reference sources. Our second method, cyclic mixture permutation invariant training (MixCycle), uses MixPIT as a building block in a cyclic fashion for continuous learning. MixCycle gradually converts the problem from separating mixtures of mixtures into separating single mixtures. We compare our methods to common supervised and unsupervised baselines: permutation invariant training with dynamic mixing (PIT-DM) and mixture invariant training (MixIT). We show that MixCycle outperforms MixIT and reaches a performance level very close to the supervised baseline (PIT-DM) while circumventing the over-separation issue of MixIT. Also, we propose a self-evaluation technique inspired by MixCycle that estimates model performance without utilizing any reference sources. We show that it yields results consistent with an evaluation on reference sources (LibriMix) and also with an informal listening test conducted on a real-life mixtures dataset (REAL-M).
    WOS:000910559500004 · Scopus: 2-s2.0-85146250664 · Science Citation Index Expanded · Article · No international collaboration · January 2022 · YÖK - 2022-2
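    The permutation invariant training (PIT) objective underlying all of these methods can be sketched as follows. This is a minimal illustration, not the paper's implementation; `pit_mse` is a hypothetical helper name. In MixPIT, the "references" would be the two original mixtures whose sum is fed to the model:

```python
import itertools
import numpy as np

def pit_mse(estimates, references):
    """Permutation invariant MSE: score every pairing of estimated
    sources with reference sources and keep the best (lowest) one."""
    n = len(references)
    best = float("inf")
    for perm in itertools.permutations(range(n)):
        # Mean of per-source MSEs under this particular assignment.
        loss = np.mean([np.mean((estimates[perm[i]] - references[i]) ** 2)
                        for i in range(n)])
        best = min(best, loss)
    return best
```

    Because the loss is minimized over all assignments, the network is free to emit its outputs in any order; MixPIT applies the same objective to a mixture of mixtures, using the two constituent mixtures as targets.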

    Joint source separation and classification using variational autoencoders (Değişimli oto-kodlayıcılar kullanılarak birleşik kaynak ayrıştırma ve sınıflandırma)

    In this paper, we propose a novel multi-task variational autoencoder (VAE) based approach for joint source separation and classification. The network uses a probabilistic encoder for each source to map the input data to a latent space. The latent representation is then used by a probabilistic decoder for two tasks: source separation and source classification. Across a variety of experiments on various image and audio datasets, the source separation performance of our method is as good as that of a method that performs source separation under source class supervision. In addition, the proposed method does not require class labels and can predict them.
    Istanbul Medipol Univ · WOS:000653136100066 · Scopus - Affiliation ID: 60105072 · Conference Proceedings Citation Index - Science · Proceedings Paper · No international collaboration · October 2020 · YÖK - 2020-2
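    The shared-encoder, two-head layout described above can be sketched roughly as below. All dimensions, weight shapes, and the class name `MultiTaskVAE` are illustrative assumptions, and the linear maps stand in for the paper's actual network layers:

```python
import numpy as np

rng = np.random.default_rng(0)

class MultiTaskVAE:
    """Sketch: one probabilistic encoder feeding two decoders,
    one for source separation and one for source classification."""
    def __init__(self, in_dim=16, latent_dim=4, n_classes=3):
        # Randomly initialized linear layers as stand-ins for real networks.
        self.enc_mu = rng.standard_normal((in_dim, latent_dim)) * 0.1
        self.enc_logvar = rng.standard_normal((in_dim, latent_dim)) * 0.1
        self.dec_sep = rng.standard_normal((latent_dim, in_dim)) * 0.1
        self.dec_cls = rng.standard_normal((latent_dim, n_classes)) * 0.1

    def forward(self, x):
        mu = x @ self.enc_mu
        logvar = x @ self.enc_logvar
        # Reparameterisation trick: sample z from N(mu, exp(logvar)).
        z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)
        source = z @ self.dec_sep                       # separation head
        logits = z @ self.dec_cls                       # classification head
        probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
        return source, probs
```

    Training would combine a reconstruction/separation loss on the first head with a classification loss on the second, plus the usual KL term on the latent code.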

    Weak label supervision for monaural source separation using non-negative denoising variational autoencoders

    Deep learning models are very effective at source separation when large amounts of labeled data are available. However, it is not always possible to have carefully labeled datasets. In this paper, we propose a weak supervision method that uses only class information, rather than source signals, for learning to separate short utterance mixtures. We associate a variational autoencoder (VAE) with each class within a non-negative model. We demonstrate that deep convolutional VAEs provide a prior model for identifying complex signals in a sound mixture without access to any source signal. We show that the separation results are on par with source signal supervision.
    WOS:000518994300189 · Scopus - Affiliation ID: 60105072 · Conference Proceedings Citation Index - Science · Article · April 2019 · YÖK - 2018-1

    Audio source separation using variational autoencoders and weak class supervision

    In this letter, we propose a source separation method that is trained by observing the mixtures and the class labels of the sources present in the mixture, without any access to isolated sources. Since our method does not require source class labels for every time-frequency bin but only a single label for each source constituting the mixture signal, we call this scenario weak class supervision. We associate a variational autoencoder (VAE) with each source class within a non-negative (compositional) model. Each VAE provides a prior model to identify the signal from its associated class in a sound mixture. After training the model on mixtures, we obtain a generative model for each source class and demonstrate our method on one-second mixtures of utterances of digits from 0 to 9. We show that the separation performance obtained by source class supervision is as good as the performance obtained by source signal supervision.
    WOS:000480311900003 · Science Citation Index Expanded · Q2 · Article · No international collaboration · September 2019 · YÖK - 2019-2
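    The compositional (non-negative) setup in the last two abstracts can be sketched as follows. This is a toy version under stated assumptions: `separate` is a hypothetical helper, and each entry of `priors` stands in for the gradient of a learned per-class VAE log-prior, which here is simply zero:

```python
import numpy as np

def separate(mixture, priors, n_iter=200, lr=0.05):
    """Estimate one non-negative component per source class so that the
    components sum to the mixture, while each class prior pulls its
    component toward signals plausible for that class."""
    comps = [np.full_like(mixture, mixture.mean() / len(priors))
             for _ in priors]
    for _ in range(n_iter):
        resid = mixture - sum(comps)              # unexplained part of the mixture
        for c, prior in enumerate(priors):
            grad = -resid - prior(comps[c])       # data-fit term + prior pull
            # Projected gradient step keeps every component non-negative.
            comps[c] = np.maximum(comps[c] - lr * grad, 0.0)
    return comps
```

    With the trained VAEs in place of the zero priors, each component is additionally constrained to look like a signal from its associated class, which is what lets class labels alone substitute for isolated source signals.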