10 research outputs found

    MixCycle: Unsupervised Speech Separation via Cyclic Mixture Permutation Invariant Training

    We introduce two unsupervised source separation methods, which involve self-supervised training from single-channel two-source speech mixtures. Our first method, mixture permutation invariant training (MixPIT), enables learning a neural network model which separates the underlying sources via a challenging proxy task without supervision from the reference sources. Our second method, cyclic mixture permutation invariant training (MixCycle), uses MixPIT as a building block in a cyclic fashion for continuous learning. MixCycle gradually converts the problem from separating mixtures of mixtures into separating single mixtures. We compare our methods to common supervised and unsupervised baselines: permutation invariant training with dynamic mixing (PIT-DM) and mixture invariant training (MixIT). We show that MixCycle outperforms MixIT and reaches a performance level very close to the supervised baseline (PIT-DM) while circumventing the over-separation issue of MixIT. Also, we propose a self-evaluation technique inspired by MixCycle that estimates model performance without utilizing any reference sources. We show that it yields results consistent with an evaluation on reference sources (LibriMix) and also with an informal listening test conducted on a dataset of real-life mixtures (REAL-M).
    WOS:000910559500004; 2-s2.0-85146250664; Science Citation Index Expanded; Article; Without international collaboration - NO; January 2022; YÖK - 2022-2
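
The permutation invariant training (PIT) objective named in the abstract above can be illustrated with a generic sketch. This is not the authors' implementation: the function name `pit_mse`, the mean-squared-error criterion, and the toy signals are assumptions chosen for illustration. The loss is computed for every assignment of estimated signals to reference signals, and the smallest value is kept, so the order of the network outputs does not matter.

```python
from itertools import permutations

import numpy as np


def pit_mse(estimates, references):
    """Return (lowest MSE, best permutation) over all output orderings.

    estimates, references: array-likes of shape (n_sources, n_samples).
    The permutation p means estimates[p[i]] is matched to references[i].
    """
    estimates = np.asarray(estimates, dtype=float)
    references = np.asarray(references, dtype=float)
    n_sources = references.shape[0]

    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n_sources)):
        # Reorder the estimates and compare them with the references.
        loss = float(np.mean((estimates[list(perm)] - references) ** 2))
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


if __name__ == "__main__":
    # Toy example: the estimates equal the references in swapped order.
    refs = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
    ests = [[0.5, 0.5, 0.5], [1.0, 0.0, -1.0]]
    print(pit_mse(ests, refs))
```

The exhaustive search over permutations is practical for the two-source case discussed in the abstract; for many sources an assignment solver would be the usual substitute.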

    A Bayesian allocation model based approach to mixed membership stochastic blockmodels

    Although detecting communities in networks has attracted considerable recent attention, estimating the number of communities is still an open problem. In this paper, we propose a model, BAM-MMSB, which replicates the generative process of the mixed-membership stochastic block model (MMSB) within the generic allocation framework of the Bayesian allocation model (BAM). In contrast to traditional blockmodels, BAM-MMSB considers the observations as Poisson counts generated by a base Poisson process and marked according to the generative process of MMSB. Moreover, the optimal number of communities for BAM-MMSB is estimated by computing the variational approximations of the marginal likelihood for each model order. Experiments on synthetic and real data sets show that the proposed approach promises a generalized model selection solution that can choose not only the model size but also the most appropriate decomposition.
    WOS:000750893600001; Scopus - Affiliation ID: 60105072; Science Citation Index Expanded; Q3-Q4; Article; Early Access; Without international collaboration - NO; February 2022; YÖK - 2021-22

    Joint source separation and classification using variational autoencoders (original title: Değişimli oto-kodlayıcılar kullanılarak birleşik kaynak ayrıştırma ve sınıflandırma)

    In this paper, we propose a novel multi-task variational autoencoder (VAE) based approach for joint source separation and classification. The network uses a probabilistic encoder for each source to map the input data to the latent space. The latent representation is then used by a probabilistic decoder for the two tasks: source separation and source classification. Across a variety of experiments performed on various image and audio datasets, the source separation performance of our method is as good as that of a method that performs source separation under source class supervision. In addition, the proposed method does not require the class labels and can predict the labels.
    Istanbul Medipol Univ; WOS:000653136100066; Scopus - Affiliation ID: 60105072; Conference Proceedings Citation Index - Science; Proceedings Paper; Without international collaboration - NO; October 2020; YÖK - 2020-2

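    Convolutional neural network based emotion recognition using data augmentation and processing methods (original title: Veri artırma ve işleme yöntemleri kullanarak evrişimli sinir ağı tabanlı duygu tanıma)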

    In this paper, a system that recognizes emotion from human faces is designed using Convolutional Neural Networks (CNN). CNNs are known to perform well when trained with a large database. The lack of large and balanced publicly available databases that can be used by deep learning methods for emotion recognition is still a challenge. To overcome this problem, the amount of data is increased by merging the FER+, CK+ and KDEF databases, and preprocessing is applied to the face images in order to reduce the variations in the database. Data augmentation methods are used to reduce the imbalance in the data distribution that remains even after the merge. The CNN-based method developed using database merging, image preprocessing and data augmentation achieved emotion recognition with 80% accuracy.
    2-s2.0-85178277713; Eki

    Model selection for relational data factorization

    As a fundamental problem in relational data analysis, model selection for relational data factorization is still an open problem. In our work, we propose to estimate the model order for mixed membership blockmodels (MMSB) within the generic allocation framework of the Bayesian allocation model (BAM). We describe how relational data is represented as Poisson counts of the allocation model, and demonstrate our results on both synthetic and real-world data sets. We believe that the generic allocation perspective promises a generalized model selection solution where we not only select the model order, but also choose the most appropriate factorization.
    IEEE Turkey Sect; Turkcell; Turkhavacilik Uzaysanayii; Turitak Bilgem; Gebze Teknik Univ; SAP, Detaysoft; NETAS; Havelsan; WOS:000518994300248; Scopus - Affiliation ID: 60105072; Conference Proceedings Citation Index - Science; Proceedings Paper; April 2019; YÖK - 2018-1

    Weak label supervision for monaural source separation using non-negative denoising variational autoencoders

    Deep learning models are very effective in source separation when large amounts of labeled data are available. However, it is not always possible to have carefully labeled datasets. In this paper, we propose a weak supervision method that uses only class information, rather than source signals, for learning to separate short utterance mixtures. We associate a variational autoencoder (VAE) with each class within a nonnegative model. We demonstrate that deep convolutional VAEs provide a prior model to identify complex signals in a sound mixture without having access to any source signal. We show that the separation results are on par with source signal supervision.
    WOS:000518994300189; Scopus - Affiliation ID: 60105072; Conference Proceedings Citation Index - Science; Article; April 2019; YÖK - 2018-1

    Audio source separation using variational autoencoders and weak class supervision

    In this letter, we propose a source separation method that is trained by observing the mixtures and the class labels of the sources present in the mixture, without any access to isolated sources. Since our method does not require source class labels for every time-frequency bin but only a single label for each source constituting the mixture signal, we call this scenario weak class supervision. We associate a variational autoencoder (VAE) with each source class within a nonnegative (compositional) model. Each VAE provides a prior model to identify the signal from its associated class in a sound mixture. After training the model on mixtures, we obtain a generative model for each source class and demonstrate our method on one-second mixtures of utterances of digits from 0 to 9. We show that the separation performance obtained by source class supervision is as good as the performance obtained by source signal supervision.
    WOS:000480311900003; Science Citation Index Expanded; Q2; Article; Without international collaboration - NO; September 2019; YÖK - 2019-2

    Dialogue enhancement using kernel additive modelling (original title: Çekirdek Katkili Modelleme Kullanarak Diyalog Geliştirme)

    Finding the right balance between the dialogue signals and the ambient sources is a major problem for sound engineers. It is also one of the main causes of audience concerns: listeners want to adjust the sound balance according to their personal preferences, listening environment and hearing. In this work, a method is proposed for enhancing the dialogue signals in stereo recordings that consist of more than one source. Kernel additive modelling (KAM), which has been used successfully in sound source separation, is used to extract the dialogue and the ambient sources from movie sounds. The separated dialogue and ambient sources can later be upmixed by the user to make a personal mix. The separation performance of the proposed method is evaluated on sounds generated by mixing sources taken from the dialogue-only and music-only parts of movies. It is shown that the KAM-based method can be successfully used for dialogue enhancement. © 2015 IEEE.
    2-s2.0-8493921331

    Perceptual coding-based informed source separation

    Informed Source Separation (ISS) techniques enable manipulation of the source signals that compose an audio mixture, based on a coder-decoder configuration. Provided the source signals are known at the encoder, low-bitrate side information is sent to the decoder and enables efficient source separation. Recent research has focused on a Coding-based ISS framework, which has the advantage of encoding the desired audio objects while exploiting their mixture in an information-theoretic framework. Here, we show how the perceptual quality of the separated sources can be improved by inserting perceptual source coding techniques in this framework, achieving a continuum of optimal bitrate-perceptual distortion trade-offs.
    Index Terms: informed source separation, source coding, perceptual models