Deep Karaoke: Extracting Vocals from Musical Mixtures Using a Convolutional Deep Neural Network
Identification and extraction of singing voice from within musical mixtures
is a key challenge in source separation and machine audition. Recently, deep
neural networks (DNNs) have been used to estimate 'ideal' binary masks for
carefully controlled cocktail-party speech separation problems. However, it is
not yet known whether these methods are capable of generalizing to the
discrimination of voice and non-voice in the context of musical mixtures. Here,
we trained a convolutional DNN (of around a billion parameters) to provide
probabilistic estimates of the ideal binary mask for separation of vocal sounds
from real-world musical mixtures. We contrast our DNN results with more
traditional linear methods. Our approach may be useful for automatic removal of
vocal sounds from musical mixtures for 'karaoke'-type applications.
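The ideal binary mask mentioned above can be made concrete with a small sketch. It assumes the vocal and accompaniment magnitude spectrograms are available separately (as they are for training data), so the mask is 1 wherever the vocal dominates; a probabilistic estimate, such as a network's output, is then thresholded and applied to the mixture STFT. The function names and the 0.5 threshold are illustrative assumptions, and the convolutional network itself is not shown.

```python
import numpy as np


def ideal_binary_mask(vocal_mag, accomp_mag):
    """Training target: 1 where the vocal magnitude exceeds the accompaniment, else 0."""
    return (vocal_mag > accomp_mag).astype(float)


def apply_probabilistic_mask(mixture_stft, prob_mask, threshold=0.5):
    """Binarize a probabilistic mask estimate (values in [0, 1]) and keep only
    the mixture time-frequency bins judged to be vocal; other bins are zeroed."""
    return mixture_stft * (prob_mask >= threshold)


def karaoke_residual(mixture_stft, prob_mask, threshold=0.5):
    """Complementary mask (hypothetical helper): drop the vocal bins to leave
    the accompaniment, as a karaoke-style application would."""
    return mixture_stft * (prob_mask < threshold)
```

Because the two masks are complementary, the vocal estimate and the karaoke residual add back up to the original mixture STFT; an inverse STFT of either one would give a time-domain signal.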
Evaluations on underdetermined blind source separation in adverse environments using time-frequency masking
The successful implementation of speech processing systems in the real world depends on their ability to handle adverse acoustic conditions involving undesirable factors such as room reverberation and background noise. In this study, an extension to the established multiple sensors degenerate unmixing estimation technique (MENUET) algorithm for blind source separation is proposed, based on fuzzy c-means clustering, to improve separation ability in underdetermined situations using a nonlinear microphone array. Rather than testing blind source separation ability solely under reverberant conditions, this paper extends the evaluation to a variety of simulated and real-world noisy environments. The results show encouraging separation ability and improved perceptual quality of the separated sources under such adverse conditions. This not only establishes the proposed methodology as a credible improvement to the system but also suggests further applicability in areas such as noise suppression in adverse acoustic environments.
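The fuzzy c-means step can be sketched as follows. This is the textbook fuzzy c-means update, not the authors' code; it assumes MENUET-style feature vectors (for example, normalized level ratios and phase differences per time-frequency point) have already been extracted into the rows of `X`. Under that assumption, each column of the returned membership matrix `U` can act as a soft time-frequency mask for one source.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means clustering.

    X: (n_points, n_features) feature vectors, e.g. one per time-frequency point.
    m: fuzzifier (> 1); larger values give softer memberships.
    Returns (centers, U), where U[i, k] is the membership of point i in cluster k.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)  # memberships of each point sum to 1
    for _ in range(n_iter):
        Um = U ** m
        # Cluster centers are membership-weighted means of the feature vectors.
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # avoid division by zero at a center
        # u_ik = d_ik^(-2/(m-1)) / sum_j d_ij^(-2/(m-1))
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U
```

Compared with hard k-means assignment, the soft memberships let time-frequency points that lie between clusters contribute partially to several sources, which is one plausible reason to prefer fuzzy clustering in reverberant or noisy conditions.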
A novel underdetermined source recovery algorithm based on k-sparse component analysis
Sparse component analysis (SCA) is a popular method for addressing underdetermined blind source separation in array signal processing applications. We are motivated by applications in which the sources are densely sparse (i.e., the number of active sources is high and very close to the number of sensors). The separation performance of current underdetermined source recovery (USR) solutions, including the relaxation and greedy families, degrades as the mixing system dimension decreases and the sparsity level (k) increases. In this paper, we present a k-SCA-based algorithm that is suitable for USR in low-dimensional mixing systems. Assuming the sources are at most (m−1)-sparse, where m is the number of mixtures, the proposed method recovers the sources from the mixtures, given the mixing matrix, using a subspace detection framework. Simulation results show that the proposed algorithm achieves better separation performance under k-SCA conditions than state-of-the-art USR algorithms such as basis pursuit, L1-norm minimization, smoothed L0, the focal underdetermined system solver and orthogonal matching pursuit.
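The subspace-detection idea can be illustrated with a brute-force sketch. If a mixture vector x = A s has at most m−1 active sources, x lies in the span of some m−1 columns of the known mixing matrix A, so one can try each candidate support and keep the one with the smallest least-squares residual. This exhaustive search is an assumption made for clarity, not the paper's detection procedure, and its cost grows combinatorially with the number of sources.

```python
import itertools

import numpy as np


def recover_sources(A, x, k=None):
    """Recover a k-sparse source vector s from x = A @ s, with A (m x n) known.

    Brute-force subspace detection: try every support of size k (default m - 1)
    and keep the one whose column span explains x with the smallest residual.
    """
    m, n = A.shape
    k = m - 1 if k is None else k
    best_res, best_support, best_coef = np.inf, None, None
    for support in itertools.combinations(range(n), k):
        sub = A[:, list(support)]
        coef, *_ = np.linalg.lstsq(sub, x, rcond=None)
        res = np.linalg.norm(x - sub @ coef)
        if res < best_res:
            best_res, best_support, best_coef = res, support, coef
    s = np.zeros(n)
    s[list(best_support)] = best_coef
    return s
```

For a generic A, any m−1 columns are linearly independent and x almost surely lies in only one such column subspace, so the minimum-residual support is the true one and the least-squares coefficients are exact.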
Blind partial separation of underdetermined convolutive mixtures of complex sources based on differential normalized kurtosis
From blind source separation to blind source cancellation in the underdetermined case: A new approach based on time-frequency analysis
Many source separation methods are restricted to non-Gaussian, stationary and independent sources. This causes problems in real applications, where the sources often do not satisfy these hypotheses. Moreover, in some cases there are more sources than available observations, a situation that is critical for most classical source separation approaches. In this paper, we propose a new, simple source separation method that uses time-frequency information to cancel one source signal from two observations in linear instantaneous mixtures. This efficient method is designed directly for non-stationary sources and applies to various dependent or Gaussian signals that have different time-frequency representations.
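The cancellation principle can be sketched as follows. For a linear instantaneous mixture x1 = a11 s1 + a12 s2 and x2 = a21 s1 + a22 s2, any time-frequency zone where only s1 is active gives X2/X1 = a21/a11 (X1, X2 being the STFTs), and then x2 − (a21/a11) x1 = (a22 − a21 a12/a11) s2 no longer contains s1. The sketch below treats the zone where X2/X1 is most stable across frames as the single-source zone; this variance criterion, the STFT settings and all function names are assumptions for illustration rather than the paper's exact procedure.

```python
import numpy as np


def stft(x, n_fft=256, hop=128):
    """Hann-windowed short-time Fourier transform, shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def estimate_column_ratio(x1, x2, zone=4, rel_floor=1e-2):
    """Return the mean of X2/X1 over the most stable time-frequency zone.

    In a zone where a single source j is active, X2/X1 equals the real
    constant a2j / a1j, so its variance across frames is close to zero.
    """
    X1, X2 = stft(x1), stft(x2)
    ratio = X2 / np.where(np.abs(X1) > 0, X1, 1.0)
    floor = rel_floor * np.abs(X1).max()
    best_var, best_ratio = np.inf, None
    n_frames, n_bins = ratio.shape
    for f in range(n_bins):
        for t in range(n_frames - zone + 1):
            if np.abs(X1[t:t + zone, f]).min() < floor:
                continue  # low-energy zones give unreliable ratios
            seg = ratio[t:t + zone, f]
            var = np.var(seg)  # real-valued even for complex input
            if var < best_var:
                best_var, best_ratio = var, seg.mean().real
    return best_ratio


def cancel_source(x1, x2):
    """Return (x2 - c * x1, c); this removes the source whose ratio X2/X1 is c."""
    c = estimate_column_ratio(x1, x2)
    return x2 - c * x1, c
```

Which source gets cancelled depends on which one owns the most stable zone; the output is a scaled copy of the other source, so with two observations and more than two sources the same step removes one source while the others remain mixed.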