5,682 research outputs found
A Review of Audio Features and Statistical Models Exploited for Voice Pattern Design
Audio fingerprinting, also named as audio hashing, has been well-known as a
powerful technique to perform audio identification and synchronization. It
basically involves two major steps: fingerprint (voice pattern) design and
matching search. While the first step concerns the derivation of a robust and
compact audio signature, the second step usually requires knowledge about
database and quick-search algorithms. Though this technique offers a wide range
of real-world applications, to the best of the authors' knowledge, a
comprehensive survey of existing algorithms appeared more than eight years ago.
Thus, in this paper, we present a more up-to-date review and, for emphasizing
on the audio signal processing aspect, we focus our state-of-the-art survey on
the fingerprint design step for which various audio features and their
tractable statistical models are discussed.Comment: http://www.iaria.org/conferences2015/PATTERNS15.html ; Seventh
International Conferences on Pervasive Patterns and Applications (PATTERNS
2015), Mar 2015, Nice, Franc
Hidden Markov models as priors for regularized nonnegative matrix factorization in single-channel source separation
We propose a new method to incorporate rich statistical priors, modeling temporal gain sequences in the solutions of nonnegative matrix factorization (NMF). The proposed method can be used for single-channel source separation (SCSS) applications. In NMF based SCSS, NMF is used to decompose the spectra of the observed mixed signal as a weighted linear combination of a set of trained basis vectors. In this work, the NMF decomposition weights are enforced to consider statistical and temporal prior information on the weight combination patterns that the trained basis vectors can jointly receive for each source in the observed mixed signal. The Hidden Markov Model (HMM) is used as a log-normalized gains (weights) prior model for the NMF solution. The normalization makes the prior models energy independent. HMM is used as a rich model that characterizes the statistics of sequential data. The NMF solutions for the weights are encouraged to increase the log-likelihood with the trained gain prior HMMs while reducing the NMF reconstruction error at the same time
Multichannel high resolution NMF for modelling convolutive mixtures of non-stationary signals in the time-frequency domain
Several probabilistic models involving latent components have been proposed for modeling time-frequency (TF) representations of audio signals such as spectrograms, notably in the nonnegative matrix factorization (NMF) literature. Among them, the recent high-resolution NMF (HR-NMF) model is able to take both phases and local correlations in each frequency band into account, and its potential has been illustrated in applications such as source separation and audio inpainting. In this paper, HR-NMF is extended to multichannel signals and to convolutive mixtures. The new model can represent a variety of stationary and non-stationary signals, including autoregressive moving average (ARMA) processes and mixtures of damped sinusoids. A fast variational expectation-maximization (EM) algorithm is proposed to estimate the enhanced model. This algorithm is applied to piano signals, and proves capable of accurately modeling reverberation, restoring missing observations, and separating pure tones with close frequencies
- …