Online Localization and Tracking of Multiple Moving Speakers in Reverberant Environments
We address the problem of online localization and tracking of multiple moving
speakers in reverberant environments. The paper has the following
contributions. We use the direct-path relative transfer function (DP-RTF), an
inter-channel feature that encodes acoustic information robust against
reverberation, and we propose an online algorithm well suited for estimating
DP-RTFs associated with moving audio sources. Another crucial ingredient of the
proposed method is its ability to properly assign DP-RTFs to audio-source
directions. Towards this goal, we adopt a maximum-likelihood formulation and we
propose to use an exponentiated gradient (EG) to efficiently update
source-direction estimates starting from their currently available values. The
problem of multiple speaker tracking is computationally intractable because the
number of possible associations between observed source directions and physical
speakers grows exponentially with time. We adopt a Bayesian framework and we
propose a variational approximation of the posterior filtering distribution
associated with multiple speaker tracking, as well as an efficient variational
expectation-maximization (VEM) solver. The proposed online localization and
tracking method is thoroughly evaluated using two datasets that contain
recordings performed in real environments. Comment: IEEE Journal of Selected Topics in Signal Processing, 201
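The exponentiated-gradient step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes, hypothetically, that the source-direction estimates are held as non-negative weights over a grid of candidate directions that sum to one; the abstract does not state this representation. The function name `eg_update`, the toy gradient, and the step size are illustrative only and are not taken from the paper.

```python
import math


def eg_update(weights, gradient, step_size):
    """One exponentiated-gradient step on the probability simplex.

    Each weight is multiplied by exp(-step_size * gradient_i) and the
    result is renormalized, so the weights stay positive and sum to one.
    """
    exponents = [-step_size * g for g in gradient]
    # Shift by the largest exponent for numerical stability; the shift
    # cancels in the normalization.
    shift = max(exponents)
    unnormalized = [w * math.exp(e - shift) for w, e in zip(weights, exponents)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Toy example (assumed values): four candidate directions, uniform start.
# The gradient of the cost is smallest for candidate 1, so its weight grows.
weights = [0.25, 0.25, 0.25, 0.25]
gradient = [1.0, 0.0, 2.0, 0.5]
weights = eg_update(weights, gradient, step_size=0.5)
print(weights)
```

Because the update is multiplicative, it starts from the currently available weights and needs no explicit projection back onto the simplex, which may be one reason such an update suits online estimation.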
A Geometric Approach to Sound Source Localization from Time-Delay Estimates
This paper addresses the problem of sound-source localization from time-delay
estimates using arbitrarily-shaped non-coplanar microphone arrays. A novel
geometric formulation is proposed, together with a thorough algebraic analysis
and a global optimization solver. The proposed model is thoroughly described
and evaluated. The geometric analysis, stemming from the direct acoustic
propagation model, leads to necessary and sufficient conditions for a set of
time delays to correspond to a unique position in the source space. Such sets
of time delays are referred to as feasible sets. We formally prove that every
feasible set corresponds to exactly one position in the source space, whose
value can be recovered using a closed-form localization mapping. Therefore, we
seek the optimal feasible set of time delays given, as input, the received
microphone signals. This time delay estimation problem is naturally cast into a
programming task, constrained by the feasibility conditions derived from the
geometric analysis. A global branch-and-bound optimization technique is
proposed to solve the problem at hand, hence estimating the best set of
feasible time delays and, subsequently, localizing the sound source. Extensive
experiments with both simulated and real data are reported; we compare our
methodology to four state-of-the-art techniques. This comparison clearly shows
that the proposed method combined with the branch-and-bound algorithm
outperforms existing methods. This in-depth geometric understanding, the practical
algorithms, and the encouraging results open several opportunities for future
work. Comment: 13 pages, 2 figures, 3 tables, journa
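To make the notion of time delays under a direct-path propagation model concrete, here is a minimal sketch. It computes the delays t_ij = (|s - m_i| - |s - m_j|) / c for a known source position and checks two simple necessary conditions that any such set must satisfy. These checks are not the paper's necessary and sufficient feasibility conditions, and the function names, the speed of sound, and the array geometry are all assumptions chosen for illustration.

```python
import math
from itertools import combinations

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def time_delays(source, mics, c=SPEED_OF_SOUND):
    """Return {(i, j): t_ij} for i < j under direct-path propagation."""
    dists = [math.dist(source, m) for m in mics]
    return {(i, j): (dists[i] - dists[j]) / c
            for i, j in combinations(range(len(mics)), 2)}


def passes_basic_checks(delays, mics, c=SPEED_OF_SOUND, tol=1e-9):
    """Check two necessary (not sufficient) conditions on a delay set."""
    # Bound: |t_ij| cannot exceed the travel time between the two microphones.
    for (i, j), t in delays.items():
        if abs(t) > math.dist(mics[i], mics[j]) / c + tol:
            return False
    # Consistency: t_ij + t_jk must equal t_ik for every microphone triple.
    for i, j, k in combinations(range(len(mics)), 3):
        if abs(delays[(i, j)] + delays[(j, k)] - delays[(i, k)]) > tol:
            return False
    return True


# Hypothetical non-coplanar four-microphone array (metres) and source.
mics = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.0, 0.0, 0.1)]
source = (1.0, 2.0, 0.5)
delays = time_delays(source, mics)
print(passes_basic_checks(delays, mics))
```

Delays estimated from real microphone signals generally violate such conditions because of noise and reverberation, which is the motivation for constraining the estimation to feasible sets.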
Probabilistic Modeling Paradigms for Audio Source Separation
This is the author's final version of the article, first published as E. Vincent, M. G. Jafari, S. A. Abdallah, M. D. Plumbley, M. E. Davies. Probabilistic Modeling Paradigms for Audio Source Separation. In W. Wang (Ed), Machine Audition: Principles, Algorithms and Systems. Chapter 7, pp. 162-185. IGI Global, 2011. ISBN 978-1-61520-919-4. DOI: 10.4018/978-1-61520-919-4.ch007. Most sound scenes result from the superposition of several sources, which can be separately perceived and analyzed by human listeners. Source separation aims to provide machine listeners with similar skills by extracting the sounds of individual sources from a given scene. Existing separation systems operate either by emulating the human auditory system or by inferring the parameters of probabilistic sound models. In this chapter, the authors focus on the latter approach and provide a joint overview of established and recent models, including independent component analysis, local time-frequency models and spectral template-based models. They show that most models are instances of one of the following two general paradigms: linear modeling or variance modeling. They compare the merits of either paradigm and report objective performance figures. They also conclude by discussing promising combinations of probabilistic priors and inference algorithms that could form the basis of future state-of-the-art systems.
Square root-based multi-source early PSD estimation and recursive RETF update in reverberant environments by means of the orthogonal Procrustes problem
Multi-channel short-time Fourier transform (STFT) domain-based processing of
reverberant microphone signals commonly relies on power-spectral-density (PSD)
estimates of early source images, where early refers to reflections contained
within the same STFT frame. State-of-the-art approaches to multi-source early
PSD estimation, given an estimate of the associated relative early transfer
functions (RETFs), conventionally minimize the approximation error defined with
respect to the early correlation matrix, requiring non-negative inequality
constraints on the PSDs. Instead, we here propose to factorize the early
correlation matrix and minimize the approximation error defined with respect to
the early-correlation-matrix square root. The proposed minimization problem --
constituting a generalization of the so-called orthogonal Procrustes problem --
seeks a unitary matrix and the square roots of the early PSDs up to an
arbitrary complex argument, making non-negative inequality constraints
redundant. A solution is obtained iteratively, requiring one singular value
decomposition (SVD) per iteration. The estimated unitary matrix and early PSD
square roots further allow the RETF estimate to be updated recursively, which is
not inherently possible in the conventional approach. An estimate of the said
early-correlation-matrix square root itself is obtained by means of the
generalized eigenvalue decomposition (GEVD), where we further propose to
restore non-stationarities by desmoothing the generalized eigenvalues in order
to compensate for inevitable recursive averaging. Simulation results indicate
fast convergence of the proposed multi-source early PSD estimation approach in
only one iteration if initialized appropriately, and better performance as
compared to the conventional approach.
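For reference, the classical orthogonal Procrustes problem, which the abstract above says the proposed minimization generalizes, has a closed-form SVD solution. The block below states only that classical result; the symbols A, B, and Q are generic placeholders and are not the paper's notation, and the paper's generalized problem (which also involves the PSD square roots) is not reproduced here.

```latex
% Classical orthogonal Procrustes problem (generic symbols, assumed notation)
\hat{Q} = \arg\min_{Q \,:\, Q^{H} Q = I} \; \lVert A - B Q \rVert_{F}^{2}

% Expanding the norm, using Q^{H} Q = I:
\lVert A - B Q \rVert_{F}^{2}
  = \lVert A \rVert_{F}^{2} + \lVert B \rVert_{F}^{2}
    - 2\,\mathrm{Re}\,\mathrm{tr}\!\left( Q^{H} B^{H} A \right)

% With the SVD B^{H} A = U \Sigma V^{H}, the trace term is maximized by
\hat{Q} = U V^{H}
```

This shows why a problem of this family can be solved with one SVD per step, consistent with the per-iteration cost quoted in the abstract.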
Self-Localization of Ad-Hoc Arrays Using Time Difference of Arrivals
This work was supported by the U.K. Engineering and Physical Sciences Research Council (EPSRC) under Grant EP/K007491/1.
A Speech Distortion and Interference Rejection Constraint Beamformer
Signals captured by a set of microphones in a speech communication system are mixtures of desired and undesired signals and ambient noise. Existing beamformers can be divided into those that preserve the desired signal and those that distort it. Beamformers that preserve the desired signal include the linearly constrained minimum variance (LCMV) beamformer, which ideally rejects the undesired signal and reduces the ambient noise power, and the minimum variance distortionless response (MVDR) beamformer, which reduces the interference-plus-noise power. The multichannel Wiener filter, on the other hand, reduces the interference-plus-noise power without preserving the desired signal. In this paper, a speech distortion and interference rejection constraint (SDIRC) beamformer is derived that minimizes the ambient noise power subject to specific constraints that allow a tradeoff between speech distortion and interference-plus-noise reduction on the one hand, and undesired signal and ambient noise reductions on the other hand. Closed-form expressions for the performance measures of the SDIRC beamformer are derived, and its relations to the aforementioned beamformers are established. The performance evaluation demonstrates the tradeoffs that can be made using the SDIRC beamformer.
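As background for the two signal-preserving beamformers named in the abstract above, their standard textbook forms are sketched below. The notation is assumed rather than taken from the paper: d is the steering (transfer function) vector of the desired source, d_u that of the undesired source, and each Φ denotes the covariance matrix of whatever is being minimized. The SDIRC beamformer itself is not reproduced, since the abstract does not give its expression.

```latex
% MVDR: minimize the interference-plus-noise power, no distortion of the desired signal
\mathbf{w}_{\mathrm{MVDR}}
  = \arg\min_{\mathbf{w}} \; \mathbf{w}^{H} \boldsymbol{\Phi}_{\mathrm{in}} \mathbf{w}
    \quad \text{s.t.} \quad \mathbf{d}^{H} \mathbf{w} = 1
  \;\;\Longrightarrow\;\;
  \mathbf{w}_{\mathrm{MVDR}}
  = \frac{\boldsymbol{\Phi}_{\mathrm{in}}^{-1} \mathbf{d}}
         {\mathbf{d}^{H} \boldsymbol{\Phi}_{\mathrm{in}}^{-1} \mathbf{d}}

% LCMV: minimize the ambient-noise power under linear constraints C^{H} w = g,
% e.g. C = [d, d_u] and g = [1, 0]^{T} to pass the desired and null the undesired signal
\mathbf{w}_{\mathrm{LCMV}}
  = \arg\min_{\mathbf{w}} \; \mathbf{w}^{H} \boldsymbol{\Phi}_{\mathrm{v}} \mathbf{w}
    \quad \text{s.t.} \quad \mathbf{C}^{H} \mathbf{w} = \mathbf{g}
  \;\;\Longrightarrow\;\;
  \mathbf{w}_{\mathrm{LCMV}}
  = \boldsymbol{\Phi}_{\mathrm{v}}^{-1} \mathbf{C}
    \left( \mathbf{C}^{H} \boldsymbol{\Phi}_{\mathrm{v}}^{-1} \mathbf{C} \right)^{-1} \mathbf{g}
```

Both solutions satisfy their constraints exactly, which is what "preserving the desired signal" means in this context.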