Online Localization and Tracking of Multiple Moving Speakers in Reverberant Environments
We address the problem of online localization and tracking of multiple moving
speakers in reverberant environments. The paper has the following
contributions. We use the direct-path relative transfer function (DP-RTF), an
inter-channel feature that encodes acoustic information robust against
reverberation, and we propose an online algorithm well suited for estimating
DP-RTFs associated with moving audio sources. Another crucial ingredient of the
proposed method is its ability to properly assign DP-RTFs to audio-source
directions. Towards this goal, we adopt a maximum-likelihood formulation and we
propose to use an exponentiated gradient (EG) to efficiently update
source-direction estimates starting from their currently available values. The
problem of multiple speaker tracking is computationally intractable because the
number of possible associations between observed source directions and physical
speakers grows exponentially with time. We adopt a Bayesian framework and we
propose a variational approximation of the posterior filtering distribution
associated with multiple speaker tracking, as well as an efficient variational
expectation-maximization (VEM) solver. The proposed online localization and
tracking method is thoroughly evaluated using two datasets that contain
recordings performed in real environments.
Comment: IEEE Journal of Selected Topics in Signal Processing, 201
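The exponentiated gradient (EG) update mentioned in this abstract can be illustrated with a minimal sketch. It assumes the direction estimates are represented as non-negative weights over a grid of candidate directions that sum to one, which the abstract does not state. The toy likelihood table `lik`, the step size, and the function names are hypothetical and only serve the illustration.

```python
import numpy as np

def eg_step(weights, grad, step=0.1):
    """One exponentiated-gradient descent step on the probability simplex."""
    # Multiplicative update; subtracting grad.min() only improves numerical
    # stability, since the normalization below cancels any constant shift.
    w = weights * np.exp(-step * (grad - grad.min()))
    return w / w.sum()

def neg_log_lik_grad(weights, lik):
    """Gradient of -sum_t log(sum_d w_d * lik[t, d]) with respect to w."""
    mix = lik @ weights            # shape (T,): mixture likelihood per observation
    return -(lik / mix[:, None]).sum(axis=0)

# Toy data (hypothetical): 3 candidate directions, 3 observations that are
# mostly explained by direction index 1.
lik = np.array([[0.1, 0.8, 0.1],
                [0.2, 0.7, 0.1],
                [0.1, 0.9, 0.0]])
w = np.full(3, 1.0 / 3.0)          # start from the currently available estimate
for _ in range(50):
    w = eg_step(w, neg_log_lik_grad(w, lik), step=0.1)
```

Because the update is multiplicative and followed by normalization, the weights stay non-negative and sum to one without an explicit projection, which is the usual reason for choosing EG over a plain gradient step in this kind of problem.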
EXPERIMENTAL EVALUATION OF MODIFIED PHASE TRANSFORM FOR SOUND SOURCE DETECTION
The detection of sound sources with microphone arrays can be enhanced by processing the individual microphone signals prior to the delay-and-sum operation. One method in particular, the Phase Transform (PHAT), has demonstrated improvement in sound source location images, especially in reverberant and noisy environments. Recent work proposed a modification to the PHAT that allows varying degrees of spectral whitening through a single parameter, β, and showed improved target detection in simulation. This work focuses on experimental evaluation of the modified SRP-PHAT algorithm. Performance results are computed from an experimental setup with an 8-element perimeter array, using a receiver operating characteristic (ROC) analysis for detecting sound sources. The results verify the simulation results for PHAT-β in improving target detection probabilities. The ROC analysis demonstrates the relationships between target type (narrowband and broadband), room reverberation level (high and low), and noise level (different SNRs) with respect to the optimal β. The experimental results strongly agree with the simulations: PHAT significantly improves detection performance for narrowband and broadband signals, especially at low SNR and in the presence of high levels of reverberation.
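The idea of partial spectral whitening through a single exponent can be sketched for one microphone pair as follows. This is an assumption-laden illustration, not the evaluated SRP implementation: the function name `gcc_phat_beta`, the `eps` regularizer, and the pairwise (rather than steered-array) formulation are choices made here. Setting `beta=0` gives plain cross-correlation and `beta=1` gives the full PHAT.

```python
import numpy as np

def gcc_phat_beta(x1, x2, beta=1.0, eps=1e-12):
    """Estimate the delay of x2 relative to x1, in samples.

    The cross-spectrum magnitude is raised to the power beta before division,
    so beta controls the degree of spectral whitening.
    A positive result means x2 lags x1.
    """
    n = len(x1) + len(x2)                    # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)                 # cross-spectrum
    cross /= (np.abs(cross) + eps) ** beta   # partial whitening
    cc = np.fft.irfft(cross, n=n)            # generalized cross-correlation
    max_lag = n // 2
    # Reorder so that index 0 corresponds to lag -max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(cc)) - max_lag
```

A quick check is to delay a white-noise signal by a known number of samples and confirm that the function returns that delay for several values of `beta`.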
Jointly Tracking and Separating Speech Sources Using Multiple Features and the generalized labeled multi-Bernoulli Framework
This paper proposes a novel joint multi-speaker tracking-and-separation
method based on the generalized labeled multi-Bernoulli (GLMB) multi-target
tracking filter, using sound mixtures recorded by microphones. Standard
multi-speaker tracking algorithms usually only track speaker locations, and
ambiguity occurs when speakers are spatially close. The proposed multi-feature
GLMB tracking filter treats the set of vectors of associated speaker features
(location, pitch and sound) as the multi-target multi-feature observation,
characterizes transitioning features with corresponding transition models and
overall likelihood function, and thus jointly tracks and separates each
multi-feature speaker while addressing the spatial ambiguity problem. Numerical
evaluation verifies that the proposed method can correctly track the locations of
multiple speakers while separating their speech signals.
Acoustic SLAM based on the Direction-of-Arrival and the Direct-to-Reverberant Energy Ratio
This paper proposes a new method that fuses acoustic measurements in the
reverberation field and low-accuracy inertial measurement unit (IMU) motion
reports for simultaneous localization and mapping (SLAM). Different from
existing studies that only use acoustic data for direction-of-arrival (DoA)
estimates, the source's distance from sensors is calculated with the
direct-to-reverberant energy ratio (DRR) and applied as a new constraint to
eliminate the nonlinear noise from motion reports. A particle filter is applied
to estimate the critical distance, which is key for associating the source's
distance with the DRR. A keyframe method is used to eliminate the deviation of
the source position estimation toward the robot. The proposed DoA-DRR acoustic
SLAM (D-D SLAM) is designed for three-dimensional motion and is suitable for
most robots. The method is the first acoustic SLAM algorithm that has been
validated on a real-world indoor scene dataset that contains only acoustic data
and IMU measurements. Compared with previous methods, D-D SLAM has acceptable
performance in locating the robot and building a source map from a real-world
indoor dataset. The average location accuracy is 0.48 m, while the source
position error converges to less than 0.25 m within 2.8 s. These results prove
the effectiveness of D-D SLAM in real-world indoor scenes, which may be
especially useful in search and rescue missions after disasters where the
environment is foggy, i.e., unsuitable for light or laser irradiation.
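The link between source distance, DRR, and critical distance can be sketched with the textbook diffuse-field model, in which direct energy decays as 1/d² while reverberant energy is constant, so the DRR equals 0 dB at the critical distance. Whether the paper uses exactly this model is an assumption. The function names are hypothetical.

```python
import math

def distance_from_drr(drr_db, critical_distance_m):
    """Source distance in metres under the model DRR = (d_c / d)^2."""
    return critical_distance_m * 10.0 ** (-drr_db / 20.0)

def drr_from_distance(distance_m, critical_distance_m):
    """Inverse relation: DRR in dB for a source at the given distance."""
    return 20.0 * math.log10(critical_distance_m / distance_m)
```

Under this model a negative DRR places the source beyond the critical distance, which is why an estimate of the critical distance is needed before a DRR measurement can be turned into a range constraint.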
Bayesian framework for multiple acoustic source tracking
Acoustic source (speaker) tracking in a room environment plays an important role in many
speech and audio applications such as multimedia, hearing aids, hands-free speech communication
and teleconferencing systems; the position information can be fed into a higher
processing stage for high-quality speech acquisition, enhancement of a specific speech signal
in the presence of other competing talkers, or keeping a camera focused on the speaker in
a video-conferencing scenario. Most existing systems focus on the single-source tracking
problem, which assumes that exactly one source is active at all times, and the state to be estimated
is simply the source position. However, in practical scenarios, multiple speakers may
be simultaneously active, and the tracking algorithm should be able to localise each individual
source and estimate the number of sources. This thesis contains three contributions towards
solutions to multiple acoustic source tracking in a moderately noisy and reverberant environment.
The first contribution of this thesis is a time-delay-of-arrival (TDOA) estimation
approach for multiple sources. Although the phase transform (PHAT) weighted generalised
cross-correlation (GCC) method has been employed to extract the TDOAs of multiple sources,
it is primarily used for a single source scenario and its performance for multiple TDOA estimation
has not been comprehensively studied. The proposed approach combines the degenerate
unmixing estimation technique (DUET) and GCC method. Since the speech mixtures are assumed
window-disjoint orthogonal (WDO) in the time-frequency domain, the spectrograms can
be separated by employing DUET, and the GCC method can then be applied to the spectrogram
of each individual source. The probabilities of detection and false alarm are also proposed to
evaluate the TDOA estimation performance under a series of experimental parameters.
Next, considering that multiple acoustic sources may appear nonconcurrently, an extended Kalman
particle filter (EKPF) is developed for a special multiple acoustic source tracking problem,
namely “nonconcurrent multiple acoustic tracking (NMAT)”. The extended Kalman filter
(EKF) is used to approximate the optimum weights, and the subsequent particle filtering (PF)
naturally takes the previous position estimates as well as the current TDOA measurements into
account. The proposed approach is thus able to lock onto sharp changes of the source position
quickly, and avoids the tracking lag of the general sequential importance resampling (SIR) PF.
Finally, these investigations are extended into an approach for tracking an unknown and
time-varying number of acoustic sources. The DUET-GCC method is used to obtain the TDOA
measurements for multiple sources, and a random finite set (RFS) based Rao-Blackwellised PF
is employed and modified to track the sources. Each particle has an RFS form encapsulating
the states of all sources and is capable of addressing source dynamics: source survival, new
source appearance and source deactivation. A data association variable is defined to describe the
source dynamics and their relation to the measurements. The Rao-Blackwellisation step is used
to decompose the state: the source positions are marginalised by using an EKF, and only the
data association variable needs to be handled by a PF. The performance of all the proposed
approaches is extensively studied under different noisy and reverberant environments, and
compares favorably with existing tracking techniques.
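For orientation, the baseline sequential importance resampling (SIR) particle filter that the thesis compares against can be sketched in one dimension. This is not the EKPF or the RFS-based filter described above. The random-walk motion model, the Gaussian measurement model, the noise levels, and the scalar state are all assumptions made for illustration.

```python
import numpy as np

def sir_step(particles, measurement, rng, process_std=0.1, meas_std=0.2):
    """One predict / weight / resample cycle of a bootstrap (SIR) particle filter."""
    # Predict: random-walk motion model (an assumption, not the thesis's model).
    predicted = particles + rng.normal(0.0, process_std, size=particles.shape)
    # Weight: Gaussian likelihood of the scalar measurement given each particle.
    log_w = -0.5 * ((measurement - predicted) / meas_std) ** 2
    w = np.exp(log_w - log_w.max())          # subtract the max for numerical stability
    w /= w.sum()
    # Resample: draw particle indices in proportion to their weights.
    idx = rng.choice(len(predicted), size=len(predicted), p=w)
    return predicted[idx]

rng = np.random.default_rng(0)
particles = rng.normal(0.0, 1.0, size=2000)  # initial belief centred away from the source
for _ in range(30):
    particles = sir_step(particles, measurement=2.0, rng=rng)
estimate = particles.mean()                  # posterior-mean position estimate
```

Because the proposal here is the motion model alone, particles move toward a suddenly displaced source only as fast as the process noise allows, which is the tracking lag that the EKF-based proposal in the thesis is meant to reduce.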
Acoustic Speaker Localization with Strong Reverberation and Adaptive Feature Filtering with a Bayes RFS Framework
The thesis investigates the challenges of speaker localization in the presence of strong reverberation, multi-speaker tracking, and multi-feature multi-speaker state filtering, using sound recordings from microphones. Novel reverberation-robust speaker localization algorithms are derived from the signal and room acoustics models. A multi-speaker tracking filter and a multi-feature multi-speaker state filter are developed based upon the generalized labeled multi-Bernoulli random finite set framework. Experiments and comparative studies have verified and demonstrated the benefits of the proposed methods.