5 research outputs found

    Restoration of clipped audio signals

    Restoration is performed to remove degradations from audio signals. In the restoration of clipped audio signals, one such degradation, the aim is to recover the degraded region to its original state using the signal content of the undegraded region. Sparse representation makes it possible to transform the signal from the time domain, in which it is normally given or recorded, into a different domain and thereby reduce the number of samples needed to represent it. In this study, a restoration method is presented that relies on a sparse representation built from the discrete Fourier transform coefficients of the signal. To evaluate the performance of the proposed method, experiments were carried out on various speech and music signals. It is shown that the proposed method achieves better signal-to-noise ratio performance than the compared methods at higher clipping ratios.
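The general idea of sparse-DFT declipping can be illustrated with a toy sketch: alternate a k-sparse projection in the DFT domain with a clipping-consistency projection in the time domain. The function name, parameters, and the iterative-hard-thresholding scheme are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def declip_iht(y, clip_level, k=2, n_iter=200):
    """Toy declipping sketch: enforce k-sparsity of the DFT and
    consistency with the clipped observation, alternately."""
    rel = np.abs(y) < clip_level                # unclipped (reliable) samples
    hi = y >= clip_level                        # samples clipped from above
    lo = y <= -clip_level                       # samples clipped from below
    x = y.astype(float).copy()
    for _ in range(n_iter):
        X = np.fft.fft(x)
        keep = np.argsort(np.abs(X))[-k:]       # k largest DFT bins
        Xs = np.zeros_like(X)
        Xs[keep] = X[keep]
        x = np.fft.ifft(Xs).real                # k-sparse resynthesis
        x[rel] = y[rel]                         # match reliable samples
        x[hi] = np.maximum(x[hi], clip_level)   # stay beyond the clip level
        x[lo] = np.minimum(x[lo], -clip_level)
    return x
```

On a pure sinusoid clipped at 60% of its amplitude, this simple alternation recovers the clipped peaks almost exactly, because the true signal is 2-sparse in the DFT basis.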

    Variational Bayesian Inference for Source Separation and Robust Feature Extraction

    We consider the task of separating and classifying individual sound sources mixed together. The main challenge is to achieve robust classification despite residual distortion of the separated source signals. A promising paradigm is to estimate the uncertainty about the separated source signals and to propagate it through the subsequent feature extraction and classification stages. We argue that variational Bayesian (VB) inference offers a mathematically rigorous way of deriving uncertainty estimators, which contrasts with state-of-the-art estimators based on heuristics or on maximum likelihood (ML) estimation. We propose a general VB source separation algorithm, which makes it possible to jointly exploit spatial and spectral models of the sources. This algorithm achieves 6% and 5% relative error reduction compared to ML uncertainty estimation on the CHiME noise-robust speaker identification and speech recognition benchmarks, respectively, and it opens the way for more complex VB approximations of uncertainty.

    In this article, we consider the problem of extracting the features of each source in a multi-source audio recording using a general source separation algorithm. The difficulty lies in estimating the uncertainty about the sources and propagating it to the features, so that they are estimated robustly despite separation errors. State-of-the-art methods estimate the uncertainty heuristically, whereas we propose to integrate over the parameters of the source separation algorithm. To this end, we describe a variational Bayesian inference method for estimating the posterior distribution of the sources, and we then compute the expectation of the features by propagating the uncertainty via moment matching. We evaluate the accuracy of the features in terms of mean squared error and run speaker recognition experiments to observe the resulting performance on a real problem. In both cases, the proposed method gives the best results.
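The uncertainty-propagation idea can be illustrated with a toy sketch: given a Gaussian posterior over a separated source sample, the expectation of a nonlinear feature (here, log-power) differs from the feature evaluated at the posterior mean. The function name and the Monte Carlo stand-in for moment matching are illustrative assumptions, not the paper's estimator.

```python
import numpy as np

def expected_log_power(mu, var, n_samples=10000, rng=None):
    """Propagate Gaussian uncertainty N(mu, var) about a separated
    source sample through the log-power feature, estimating
    E[log(s^2 + eps)] by Monte Carlo."""
    rng = np.random.default_rng(0) if rng is None else rng
    s = rng.normal(mu, np.sqrt(var), size=n_samples)
    return np.mean(np.log(s**2 + 1e-8))
```

Because the log is concave, the uncertainty-aware expectation lies below the naive plug-in value log(mu^2 + var), which is exactly why ignoring separation uncertainty biases the features.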

    Underdetermined convolutive source separation using two dimensional non-negative factorization techniques

    PhD Thesis. In this thesis, underdetermined audio source separation is considered: estimating the original audio sources from the observed mixture when the number of sources exceeds the number of channels. The separation is carried out using two approaches: blind audio source separation and informed audio source separation. The blind approach depends on the mixture signal only and assumes that separation is accomplished without any prior information (or with as little as possible) about the sources. The informed approach uses an exemplar in addition to the mixture signal to emulate the target speech signal to be separated. Both approaches are based on two-dimensional factorization techniques that decompose the signal into two tensors convolved in both the temporal and spectral directions, and both are applied to convolutive and highly reverberant convolutive mixtures, which are more realistic than instantaneous mixtures. In this work, a novel algorithm based on nonnegative matrix factor two-dimensional deconvolution (NMF2D) with adaptive sparsity is proposed to separate audio sources mixed in an underdetermined convolutive mixture. Additionally, a novel Gamma Exponential Process is proposed for estimating the convolutive parameters and the number of components of the NMF2D/NTF2D, and for initializing the NMF2D parameters. The effects of different window lengths are also investigated to determine the model that best fits the characteristics of the audio signal. Furthermore, a novel algorithm, the fusion of K models of full-rank weighted nonnegative tensor factor two-dimensional deconvolution (K-wNTF2D), is proposed.
    The K-wNTF2D is developed for its ability to model both the spectral and temporal changes and the spatial covariance matrix that addresses the high-reverberation problem. A variable sparsity term derived from the Gibbs distribution is optimized under the Itakura-Saito divergence and adapted into the K-wNTF2D model. The tensors of this algorithm are initialized by a novel initialization method, the SVD two-dimensional deconvolution (SVD2D). Finally, two novel informed source separation algorithms, a semi-exemplar-based algorithm and an exemplar-based algorithm, are proposed. These algorithms are based on the NMF2D model and the proposed two-dimensional nonnegative matrix partial co-factorization (2DNMPCF) model. The idea of incorporating the exemplar is to inform the proposed separation algorithms about the target signal to be separated by initializing their parameters and guiding the separation. Adaptive sparsity is derived for both of the proposed algorithms. A multistage version of the proposed exemplar-based algorithm is also proposed to further enhance separation performance. Results show that the proposed separation algorithms are very promising, more flexible, and offer an alternative to conventional methods.
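For orientation, the NMF2D and NTF2D models above extend plain nonnegative matrix factorization with convolution along time and frequency. A minimal sketch of the plain, non-convolutive special case with Euclidean multiplicative updates follows; the function name, random initialization, and iteration count are illustrative assumptions, not the thesis's algorithms.

```python
import numpy as np

def nmf(V, r, n_iter=500, eps=1e-9, rng=None):
    """Plain NMF, V ~= W @ H with W, H >= 0, via Lee-Seung
    multiplicative updates for the Euclidean cost."""
    rng = np.random.default_rng(0) if rng is None else rng
    m, n = V.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update basis spectra
    return W, H
```

The multiplicative form keeps both factors nonnegative at every step; the convolutive 2D models replace the products W @ H with sums of time- and frequency-shifted factor pairs.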

    CASCADE: Channel-Aware Structured Cosparse Audio DEclipper

    This work features a new algorithm, CASCADE, which leverages a structured cosparse prior across channels to address the multichannel audio declipping problem. CASCADE outperforms the state-of-the-art method A-SPADE, applied to each channel separately, in all tested settings, while retaining a similar runtime.
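The clipping-consistency constraint that SPADE-style declippers enforce can be sketched as a simple set projection: reliable samples must match the observation, and saturated samples must stay beyond the clip level. The helper name below is illustrative, not from the paper.

```python
import numpy as np

def project_clip_consistent(x, y, clip_level):
    """Project an estimate x onto the set of signals consistent
    with the clipped observation y."""
    out = x.copy()
    rel = np.abs(y) < clip_level        # reliable samples: copy observation
    out[rel] = y[rel]
    hi = y >= clip_level                # clipped high: at least +clip_level
    lo = y <= -clip_level               # clipped low: at most -clip_level
    out[hi] = np.maximum(out[hi], clip_level)
    out[lo] = np.minimum(out[lo], -clip_level)
    return out
```

Declippers of this family alternate such a projection with a sparsity- or cosparsity-promoting step in a transform domain; CASCADE's contribution is to couple that step across channels.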