
    Hidden Markov models as priors for regularized nonnegative matrix factorization in single-channel source separation

    We propose a new method to incorporate rich statistical priors that model temporal gain sequences into the solutions of nonnegative matrix factorization (NMF). The proposed method can be used for single-channel source separation (SCSS) applications. In NMF-based SCSS, NMF is used to decompose the spectra of the observed mixed signal as a weighted linear combination of a set of trained basis vectors. In this work, the NMF decomposition weights are made to reflect statistical and temporal prior information on the weight combination patterns that the trained basis vectors can jointly receive for each source in the observed mixed signal. A hidden Markov model (HMM) is used as the prior model for the log-normalized gains (weights) of the NMF solution. The normalization makes the prior models energy independent. The HMM serves as a rich model that characterizes the statistics of sequential data. The NMF solutions for the weights are encouraged to increase the log-likelihood under the trained gain prior HMMs while reducing the NMF reconstruction error at the same time.

    Spectro-temporal post-enhancement using MMSE estimation in NMF based single-channel source separation

    We propose to use minimum mean squared error (MMSE) estimates to enhance the signals that are separated by nonnegative matrix factorization (NMF). In single-channel source separation (SCSS), NMF is used to train a set of basis vectors for each source from their training spectrograms. NMF is then used to decompose the mixed signal spectrogram as a weighted linear combination of the trained basis vectors, from which estimates of each corresponding source can be obtained. In this work, we treat the spectrogram of each separated signal as a 2D distorted signal that needs to be restored. A multiplicative distortion model is assumed, where the logarithm of the true signal distribution is modeled with a Gaussian mixture model (GMM) and the distortion is modeled as having a log-normal distribution. The parameters of the GMM are learned from training data, whereas the distortion parameters are learned online from each separated signal. The initial source estimates are improved and replaced with their MMSE estimates under this new probabilistic framework. The experimental results show that using the proposed MMSE estimation technique as a post-enhancement step after NMF improves the quality of the separated signal.

    Gaussian mixture gain priors for regularized nonnegative matrix factorization in single-channel source separation

    We propose a new method to incorporate statistical priors on the solution of nonnegative matrix factorization (NMF) for single-channel source separation (SCSS) applications. A Gaussian mixture model (GMM) is used as the prior model for the log-normalized gains of the NMF solution. The normalization makes the prior models energy independent. In NMF-based SCSS, NMF is used to decompose the spectra of the observed mixed signal as a weighted linear combination of a set of trained basis vectors. In this work, the NMF decomposition weights are made to reflect statistical prior information on the weight combination patterns that the trained basis vectors can jointly receive for each source in the observed mixed signal. The NMF solutions for the weights are encouraged to increase the log-likelihood under the trained gain prior GMMs while reducing the NMF reconstruction error at the same time.

    Semi-blind speech-music separation using sparsity and continuity priors

    In this paper, we propose an approach to the problem of single-channel source separation of speech and music signals. Our approach is based on representing each source's power spectral density using dictionaries and nonlinearly projecting the mixture signal spectrum onto the combined span of the dictionary entries. We encourage sparsity and continuity of the dictionary coefficients using penalty terms (or log-priors) in an optimization framework. We propose a novel coordinate descent technique for optimization, which handles nonnegativity constraints and nonquadratic penalty terms. We use an adaptive Wiener filter and spectral subtraction to reconstruct both sources from the mixture data after the corresponding power spectral densities (PSDs) are estimated for each source. Using conventional metrics, we measure the performance of the system on simulated mixtures of single-person speech and piano music sources. The results indicate that the proposed method is a promising technique for low speech-to-music ratio conditions and that sparsity and continuity priors help improve the performance of the proposed system.

    Single channel speech music separation using nonnegative matrix factorization and spectral masks

    A single-channel speech-music separation algorithm based on nonnegative matrix factorization (NMF) with spectral masks is proposed in this work. The proposed algorithm uses training data of speech and music signals with NMF, followed by masking, to separate the mixed signal. In the training stage, NMF uses the training data to train a set of basis vectors for each source. These bases are trained using NMF in the magnitude spectrum domain. After observing the mixed signal, NMF is used to decompose its magnitude spectra into a linear combination of the trained bases for both sources. The decomposition results are used to build a mask, which describes the contribution of each source to the mixed signal. Experimental results show that using masks after NMF improves the separation even when NMF is run with fewer iterations, which yields a faster separation process.
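    The decompose-then-mask step described in this abstract can be sketched for a single spectral frame as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the NMF cost function or the mask formula, so the Euclidean multiplicative update, the Wiener-like soft mask with exponent `p`, and the names `decompose` and `separate_frame` are all assumptions.

```python
def decompose(v, bases, n_iter=200, eps=1e-12):
    """Estimate nonnegative weights g so that sum_k g[k] * bases[k] ~ v.

    Assumption: multiplicative updates for the Euclidean cost, with the
    trained bases held fixed. The abstract does not name the cost function.
    """
    n_bins = len(v)
    g = [1.0] * len(bases)
    for _ in range(n_iter):
        # Current reconstruction of the frame from all bases.
        approx = [sum(g[k] * bases[k][f] for k in range(len(bases)))
                  for f in range(n_bins)]
        for k, b in enumerate(bases):
            num = sum(b[f] * v[f] for f in range(n_bins))
            den = sum(b[f] * approx[f] for f in range(n_bins)) + eps
            g[k] *= num / den
    return g


def separate_frame(v, speech_bases, music_bases, p=2.0, eps=1e-12):
    """Split one mixture magnitude frame v into speech and music parts."""
    bases = speech_bases + music_bases
    g = decompose(v, bases)
    n_s = len(speech_bases)
    n_bins = len(v)
    # Initial estimates: weighted sums over each source's own bases.
    s_est = [sum(g[k] * bases[k][f] for k in range(n_s))
             for f in range(n_bins)]
    m_est = [sum(g[k] * bases[k][f] for k in range(n_s, len(bases)))
             for f in range(n_bins)]
    # Assumed soft mask: each bin's share attributed to speech.
    mask = [s ** p / (s ** p + m ** p + eps) for s, m in zip(s_est, m_est)]
    speech = [h * x for h, x in zip(mask, v)]
    music = [(1.0 - h) * x for h, x in zip(mask, v)]
    return speech, music


mixture = [2.0, 2.5, 3.0]
speech, music = separate_frame(mixture, [[1.0, 0.5, 0.0]], [[0.0, 0.5, 1.0]])
```

    Because the mask and its complement sum to one in every bin, the two masked outputs always add back up to the mixture frame, whatever the quality of the initial NMF estimates.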

    Single channel speech music separation using nonnegative matrix factorization with sliding windows and spectral masks

    A single-channel speech-music separation algorithm based on nonnegative matrix factorization (NMF) with sliding windows and spectral masks is proposed in this work. We train a set of basis vectors for each source signal using NMF in the magnitude spectral domain. Rather than forming each column of the matrices to be decomposed by NMF from a single spectral frame, we build it from multiple spectral frames stacked in one column. After observing the mixed signal, NMF is used to decompose its magnitude spectra into a weighted linear combination of the trained basis vectors for both sources. An initial spectrogram estimate for each source is found, and a spectral mask is built using these initial estimates. This mask is used to weight the mixed signal spectrogram to find the contribution of each source signal to the mixed signal. The method is shown to perform better than the conventional NMF approach.
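    The frame-stacking idea can be illustrated with a short sketch. The function name `stack_frames` and the choice to slide the window one frame at a time are assumptions; the abstract only says that multiple spectral frames are stacked in one column.

```python
def stack_frames(frames, width):
    """Build NMF input columns from `width` consecutive spectral frames.

    frames: list of magnitude-spectrum frames in time order, each a list.
    Returns one column per window position; each column is the
    concatenation of the frames inside the window.
    Assumption: the window advances by one frame per column.
    """
    if width < 1 or width > len(frames):
        raise ValueError("width must be between 1 and the number of frames")
    return [
        [x for frame in frames[t:t + width] for x in frame]
        for t in range(len(frames) - width + 1)
    ]


frames = [[1, 2], [3, 4], [5, 6]]
columns = stack_frames(frames, 2)
```

    With `width=1` the columns are just the original frames, which corresponds to the conventional single-frame NMF setup that the abstract compares against.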

    Single channel speech-music separation using matching pursuit and spectral masks

    A single-channel speech-music separation algorithm based on matching pursuit (MP) with multiple dictionaries and spectral masks is proposed in this work. Training data for speech and music signals is used to build two sets of magnitude spectral vectors, one for each source signal. These sets of vectors are called dictionaries, and the vectors are called atoms. Matching pursuit is used to sparsely decompose the magnitude spectrum of the observed mixed signal as a nonnegative weighted linear combination of the atoms in the two dictionaries that best match the mixed signal structure. The weighted sum of the resulting decomposition terms that include atoms from the speech dictionary is used as an initial estimate of the speech signal contribution to the mixed signal, and the weighted sum of the remaining terms is used as the initial estimate of the music signal contribution. The initial estimate of each source is used to build a spectral mask that is used to reconstruct the source signals. Experimental results show that integrating MP with spectral masks gives good separation results.
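    A greedy decomposition of this kind can be sketched as below. This is a textbook-style matching pursuit restricted to positive weights, offered as an illustration under stated assumptions rather than the paper's exact procedure: atoms are assumed to be unit-norm, the stopping rule is hypothetical, and the names `matching_pursuit` and `split_estimates` are invented here.

```python
import math


def normalize(vec):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def matching_pursuit(x, atoms, n_terms=10, tol=1e-9):
    """Greedy MP that keeps only positive weights.

    atoms: unit-norm vectors (speech atoms first, then music atoms).
    Returns a list of (atom_index, weight) terms.
    """
    residual = list(x)
    terms = []
    for _ in range(n_terms):
        # Correlate every atom with what is still unexplained.
        scores = [sum(a * r for a, r in zip(atom, residual)) for atom in atoms]
        best = max(range(len(atoms)), key=lambda k: scores[k])
        weight = scores[best]
        if weight <= tol:  # assumed stopping rule: no positive match left
            break
        terms.append((best, weight))
        residual = [r - weight * a for r, a in zip(residual, atoms[best])]
    return terms


def split_estimates(terms, atoms, n_speech, n_bins):
    """Sum the terms into initial speech and music spectrum estimates."""
    speech = [0.0] * n_bins
    music = [0.0] * n_bins
    for idx, weight in terms:
        target = speech if idx < n_speech else music
        for f in range(n_bins):
            target[f] += weight * atoms[idx][f]
    return speech, music


atoms = [normalize([1.0, 0.0, 0.0]), normalize([0.0, 1.0, 1.0])]
terms = matching_pursuit([2.0, 3.0, 3.0], atoms)
speech_est, music_est = split_estimates(terms, atoms, n_speech=1, n_bins=3)
```

    The two initial estimates returned by `split_estimates` are what the abstract then feeds into the spectral mask to reconstruct each source.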

    Using local temporal features of bounding boxes for walking/running classification

    For intelligent surveillance, one of the major tasks is to recognize the activities present in the scene of interest. Human subjects are the most important elements in a surveillance system, and it is crucial to classify human actions. In this paper, we tackle the problem of classifying human actions in videos as running or walking. We propose using local temporal features extracted from the rectangular boxes that surround the subject of interest in each frame. We test the system using a database of hand-labeled walking and running videos. Our experiments yield a low classification error rate of 2.5% using period-based features and the local speed computed over a range of frames around the current frame. Shorter-range time-derivative features are not very useful since they are highly variable. Our results show that the system is able to correctly recognize running or walking activities despite differences in the appearance and clothing of subjects.
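    One way to compute a local speed feature over a range of frames is sketched below. The abstract does not define the feature precisely, so everything here is an assumption: boxes are taken as `(x, y, width, height)` tuples, only horizontal motion of the box center is used, and the speed is normalized by the mean box height to reduce dependence on subject distance.

```python
def local_speed(boxes, t, radius):
    """Speed of the bounding-box center around frame t.

    boxes: one (x, y, width, height) tuple per frame (assumed layout).
    The displacement is measured between the frames `radius` before and
    after t (clipped at the sequence ends) and expressed in box heights
    per frame. Using a wider range smooths out frame-to-frame jitter.
    """
    lo = max(0, t - radius)
    hi = min(len(boxes) - 1, t + radius)
    if hi == lo:
        return 0.0

    def center_x(box):
        return box[0] + box[2] / 2.0

    mean_height = sum(b[3] for b in boxes[lo:hi + 1]) / (hi - lo + 1)
    return abs(center_x(boxes[hi]) - center_x(boxes[lo])) / ((hi - lo) * mean_height)


# Hypothetical track: the box moves 10 pixels per frame and is 50 pixels tall.
boxes = [(10 * i, 0, 20, 50) for i in range(5)]
speed = local_speed(boxes, t=2, radius=2)
```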

    Güvenilir biyometrik kıyım yöntemi (Trustworthy biometric hashing method)

    In this paper, we propose a novel biometric hashing method. We apply a password-generated random projection matrix directly to the face images, instead of to features extracted from the face images, and improve on the methods in the literature. We aim to preserve privacy while achieving desirable accuracy in a biometric verification system. We perform the verification in the hash domain and ensure irreversibility. In addition, we can obtain a new hash value simply by changing the password, which provides the cancelable biometrics property. We achieve zero equal error rate (EER) on the Carnegie Mellon University face database. Furthermore, we achieve an EER of 0.0061 even if the attackers compromise the password and the random number generator. We also test the robustness of the proposed system against possible degradations due to sensor and environment imperfections. The norm of the error is below the optimum threshold obtained at the EER for all distortions.
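    The general shape of a password-seeded random projection hash can be sketched as follows. This is an illustrative toy under explicit assumptions, not the method evaluated in the paper: deriving the seed with SHA-256, drawing Gaussian projection rows, mean-centering the pixels, and binarizing each projection at zero are all choices made here, and the names `biohash` and `hamming_distance` are hypothetical.

```python
import hashlib
import random


def biohash(image_pixels, password, n_bits=64):
    """Project raw pixels with a password-derived random matrix, then binarize.

    Assumptions: SHA-256 seed derivation, Gaussian projection rows,
    mean-centered pixels, and a zero threshold for each output bit.
    """
    seed = int.from_bytes(hashlib.sha256(password.encode("utf-8")).digest(), "big")
    rng = random.Random(seed)
    mean = sum(image_pixels) / len(image_pixels)
    centered = [p - mean for p in image_pixels]
    bits = []
    for _ in range(n_bits):
        # One row of the random projection matrix, regenerated from the seed.
        row = [rng.gauss(0.0, 1.0) for _ in centered]
        projection = sum(r * c for r, c in zip(row, centered))
        bits.append(1 if projection > 0.0 else 0)
    return bits


def hamming_distance(a, b):
    """Number of differing bits; verification compares this to a threshold."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical 32-pixel "image"; a real system would use a full face image.
image = [float(i % 7) for i in range(32)]
enrolled = biohash(image, "old password")
reissued = biohash(image, "new password")
```

    Changing the password changes the projection matrix and therefore the hash, which is the cancelable property the abstract refers to: a compromised hash can be revoked without changing the underlying face image.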

    The Einstein-Vlasov system/Kinetic theory

    The main purpose of this article is to provide a guide to theorems on global properties of solutions to the Einstein-Vlasov system. This system couples Einstein's equations to a kinetic matter model. Kinetic theory has been an important field of research for several decades, in which the main focus has been on nonrelativistic and special relativistic physics, i.e., to model the dynamics of neutral gases, plasmas, and Newtonian self-gravitating systems. In 1990, Rendall and Rein initiated a mathematical study of the Einstein-Vlasov system. Since then many theorems on global properties of solutions to this system have been established. The Vlasov equation describes matter phenomenologically, and it should be stressed that most of the theorems presented in this article are not presently known for other such matter models (i.e., fluid models). This paper gives introductions to kinetic theory in non-curved spacetimes and then the Einstein-Vlasov system is introduced. We believe that a good understanding of kinetic theory in non-curved spacetimes is fundamental to good comprehension of kinetic theory in general relativity. Comment: 40 pages, updated version, to appear in Living Reviews in Relativit