134 research outputs found

    Direction of Arrival with One Microphone, a few LEGOs, and Non-Negative Matrix Factorization

    Conventional approaches to sound source localization require at least two microphones. It is known, however, that people with unilateral hearing loss can also localize sounds. Monaural localization is possible thanks to scattering by the head, though it hinges on learning the spectra of the various sources. We take inspiration from this human ability to propose algorithms for accurate sound source localization using a single microphone embedded in an arbitrary scattering structure. The structure modifies the frequency response of the microphone in a direction-dependent way, giving each direction a signature. While knowing those signatures is sufficient to localize sources of white noise, localizing speech is much more challenging: it is an ill-posed inverse problem, which we regularize with prior knowledge in the form of learned non-negative dictionaries. We demonstrate a monaural speech localization algorithm based on non-negative matrix factorization that does not depend on sophisticated, designed scatterers. In fact, we show experimental results with ad hoc scatterers made of LEGO bricks. Even with these rudimentary structures we can accurately localize arbitrary speakers; that is, we do not need to learn the dictionary for the particular speaker to be localized. Finally, we discuss multi-source localization and the related limitations of our approach.
    Comment: This article has been accepted for publication in IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP).
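
    The white-noise case mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes the direction signatures are available as a matrix of per-frequency power responses, and it picks the direction whose signature best matches the time-averaged power spectrum by cosine similarity. The names `localize_white_source`, `simulate_white_source`, and `signatures` are hypothetical.

```python
import numpy as np

def localize_white_source(power_frames, signatures):
    """Return the index of the best-matching direction signature.

    power_frames: (n_frames, n_freq) observed power spectra.
    signatures:   (n_directions, n_freq) power response per direction
                  (assumed known, e.g. from a calibration measurement).
    """
    # A white source has a flat spectrum, so the average observed power
    # is roughly proportional to the signature of the true direction.
    mean_power = power_frames.mean(axis=0)
    sig_unit = signatures / np.linalg.norm(signatures, axis=1, keepdims=True)
    obs_unit = mean_power / np.linalg.norm(mean_power)
    # Cosine similarity ignores the unknown source gain.
    return int(np.argmax(sig_unit @ obs_unit))

def simulate_white_source(signatures, direction, n_frames, rng):
    """Simulate power spectra of white noise filtered by one signature."""
    n_freq = signatures.shape[1]
    source = (rng.standard_normal((n_frames, n_freq))
              + 1j * rng.standard_normal((n_frames, n_freq)))
    return np.abs(source) ** 2 * signatures[direction]
```

    The same matching fails for speech because the source spectrum is no longer flat, which is why the abstract turns to learned non-negative dictionaries.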

    A Generative Product-of-Filters Model of Audio

    We propose the product-of-filters (PoF) model, a generative model that decomposes audio spectra as sparse linear combinations of "filters" in the log-spectral domain. PoF makes similar assumptions to those used in the classic homomorphic filtering approach to signal processing, but replaces hand-designed decompositions built of basic signal processing operations with a learned decomposition based on statistical inference. This paper formulates the PoF model and derives a mean-field method for posterior inference and a variational EM algorithm to estimate the model's free parameters. We demonstrate PoF's potential for audio processing on a bandwidth expansion task, and show that PoF can serve as an effective unsupervised feature extractor for a speaker identification task.
    Comment: ICLR 2014 conference-track submission. Added link to the source code.

    Algorithms for nonnegative matrix factorization with the beta-divergence

    This paper describes algorithms for nonnegative matrix factorization (NMF) with the beta-divergence (beta-NMF). The beta-divergence is a family of cost functions parametrized by a single shape parameter beta that takes the Euclidean distance, the Kullback-Leibler divergence and the Itakura-Saito divergence as special cases (beta = 2, 1, 0, respectively). The proposed algorithms are based on a surrogate auxiliary function (a local majorization of the criterion function). We first describe a majorization-minimization (MM) algorithm that leads to multiplicative updates, which differ from standard heuristic multiplicative updates by a beta-dependent power exponent. The monotonicity of the heuristic algorithm can, however, be proven for beta in (0,1) using the proposed auxiliary function. Then we introduce the concept of a majorization-equalization (ME) algorithm, which produces updates that move along constant level sets of the auxiliary function and lead to larger steps than MM. Simulations on synthetic and real data illustrate the faster convergence of the ME approach. The paper also describes how the proposed algorithms can be adapted to two common variants of NMF: penalized NMF (i.e., when a penalty function of the factors is added to the criterion function) and convex-NMF (when the dictionary is assumed to belong to a known subspace).
    Comment: to appear in Neural Computation.
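
    The multiplicative updates described in this abstract can be sketched as follows. This is an illustrative NumPy version, not the paper's reference code. The exponent schedule in `mm_exponent` (1/(2 - beta) for beta < 1, 1 for beta in [1, 2], 1/(beta - 1) for beta > 2) is the one commonly quoted for this MM scheme and should be treated as an assumption here; setting the exponent to 1 everywhere gives the heuristic updates. The function names are hypothetical, and `V` is assumed strictly positive.

```python
import numpy as np

def beta_divergence(V, V_hat, beta):
    """Sum of element-wise beta-divergences between V and V_hat."""
    if beta == 1:  # generalized Kullback-Leibler divergence
        return np.sum(V * np.log(V / V_hat) - V + V_hat)
    if beta == 0:  # Itakura-Saito divergence
        ratio = V / V_hat
        return np.sum(ratio - np.log(ratio) - 1)
    # General case; beta = 2 gives half the squared Euclidean distance.
    num = V**beta + (beta - 1) * V_hat**beta - beta * V * V_hat**(beta - 1)
    return np.sum(num / (beta * (beta - 1)))

def mm_exponent(beta):
    """Beta-dependent power exponent of the MM update (assumed schedule)."""
    if beta < 1:
        return 1.0 / (2.0 - beta)
    if beta <= 2:
        return 1.0
    return 1.0 / (beta - 1.0)

def beta_nmf(V, rank, beta, n_iter=200, seed=0):
    """Factorize V (F x N) as W @ H with multiplicative MM updates."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, rank)) + 0.1
    H = rng.random((rank, n_cols)) + 0.1
    gamma = mm_exponent(beta)
    costs = []
    for _ in range(n_iter):
        V_hat = W @ H
        H *= ((W.T @ (V * V_hat**(beta - 2)))
              / (W.T @ V_hat**(beta - 1))) ** gamma
        V_hat = W @ H
        W *= (((V * V_hat**(beta - 2)) @ H.T)
              / (V_hat**(beta - 1) @ H.T)) ** gamma
        costs.append(beta_divergence(V, W @ H, beta))
    return W, H, costs
```

    Because each update only multiplies by a nonnegative ratio, `W` and `H` stay nonnegative without any projection step, and the recorded `costs` make it easy to check the monotone decrease that the MM construction is meant to guarantee.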

    Speech Enhancement Using an Iterative Posterior NMF

    Over the years, various methods for speech enhancement have been proposed, such as spectral subtraction (SS) and minimum mean square error (MMSE) estimators. These methods require neither prior knowledge about the speech and noise signals nor a training stage, so they are highly flexible and can be implemented in many situations. However, these algorithms usually assume that the noise is stationary and therefore handle nonstationary noise poorly, especially under low signal-to-noise ratio (SNR) conditions. To overcome these drawbacks, nonnegative matrix factorization (NMF) is introduced; the NMF approach is more robust to nonstationary noise. This chapter focuses on speech enhancement using the NMF approach. A speech enhancement method based on regularized NMF for nonstationary Gaussian noise is proposed. The spectral components of speech and noise are modeled as Gamma and Rayleigh, respectively. We propose to adaptively estimate the sufficient statistics of these distributions to obtain a natural regularization of the NMF criterion.
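
    For context, the spectral subtraction baseline named in this abstract can be sketched in a few lines. This is a generic textbook-style version, not the chapter's method: it assumes the first few frames of the magnitude spectrogram contain noise only, and the parameter names `n_noise_frames` and `floor` are hypothetical.

```python
import numpy as np

def spectral_subtraction(mag, n_noise_frames=5, floor=0.01):
    """Subtract a stationary noise estimate from a magnitude spectrogram.

    mag: (n_freq, n_frames) nonnegative magnitudes.
    The noise spectrum is estimated once, from the leading frames,
    which is exactly the stationarity assumption the abstract criticizes.
    """
    noise_est = mag[:, :n_noise_frames].mean(axis=1, keepdims=True)
    cleaned = mag - noise_est
    # Spectral floor: keep a small fraction of the input instead of
    # letting the result go negative.
    return np.maximum(cleaned, floor * mag)
```

    Because the noise estimate is fixed after the leading frames, any later change in the noise spectrum passes straight through, which is the weakness that motivates the NMF-based approach.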