Acoustic source (speaker) tracking in room environments plays an important role in many
speech and audio applications, such as multimedia, hearing aids, hands-free speech communication
and teleconferencing systems. The position information can be fed into a higher-level
processing stage for high-quality speech acquisition, for enhancement of a specific speech signal
in the presence of other competing talkers, or for keeping a camera focused on the speaker in
a video-conferencing scenario. Most existing systems focus on the single-source tracking
problem, which assumes that exactly one source is active at all times and that the state to be estimated
is simply the source position. However, in practical scenarios, multiple speakers may
be simultaneously active, and the tracking algorithm should be able to localise each individual
source and estimate the number of sources. This thesis makes three contributions towards
multiple acoustic source tracking in moderately noisy and reverberant environments.

The first contribution is a time-delay of arrival (TDOA) estimation
approach for multiple sources. Although the phase transform (PHAT)-weighted generalised
cross-correlation (GCC) method has been employed to extract the TDOAs of multiple sources,
it is primarily used in single-source scenarios, and its performance for multiple TDOA estimation
has not been comprehensively studied. The proposed approach combines the degenerate
unmixing estimation technique (DUET) with the GCC method. Since the speech sources are assumed
to be window-disjoint orthogonal (WDO) in the time-frequency domain, the mixture spectrogram can
be separated into per-source spectrograms by DUET, and the GCC method can then be applied to the
spectrogram of each individual source. Probabilities of detection and false alarm are also proposed
as metrics to evaluate TDOA estimation performance over a range of experimental parameters.
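The DUET-GCC idea can be illustrated with a minimal Python/NumPy sketch. It is not the thesis's implementation, and it makes several simplifying assumptions:

- An idealised two-microphone STFT model is used. Every time-frequency bin is owned by exactly one source (perfect WDO), and microphone 2 receives that source with a pure delay. Reverberation is ignored.
- The DUET stage is reduced to 1-D k-means clustering of per-bin delay estimates, with the number of sources assumed known. Full DUET clusters a 2-D attenuation-delay histogram instead.
- GCC-PHAT is evaluated on a delay grid using only the bins assigned to each cluster.

The function names (`simulate_wdo_mixture`, `duet_masks`, `gcc_phat_tdoa`, `duet_gcc`), the 16 kHz sampling rate and the microsecond-scale TDOAs are all hypothetical.

```python
import numpy as np


def simulate_wdo_mixture(tdoas, fs=16000, n_fft=512, n_frames=200, noise=0.05, seed=0):
    """Idealised two-microphone STFT of WDO sources (assumption: narrowband model, no reverb).

    Each time-frequency bin is owned by exactly one source, and mic 2 receives
    that source delayed by its TDOA.
    """
    rng = np.random.default_rng(seed)
    freqs = np.arange(1, n_fft // 2 + 1) * fs / n_fft      # Hz, DC bin skipped
    shape = (n_frames, freqs.size)

    def cnoise():
        return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

    owner = rng.integers(0, len(tdoas), size=shape)         # WDO: one source per bin
    tau = np.asarray(tdoas)[owner]                          # TDOA of each bin's owner (s)
    s = cnoise()
    x1 = s + noise * cnoise()
    x2 = s * np.exp(-2j * np.pi * freqs * tau) + noise * cnoise()
    return x1, x2, freqs


def duet_masks(x1, x2, freqs, n_sources, n_iter=50):
    """Simplified DUET: binary masks from 1-D k-means on per-bin delay estimates."""
    # per-bin delay (s); assumes a small mic spacing so the phase does not wrap
    delta = np.angle(x1 * np.conj(x2)) / (2 * np.pi * freqs)
    centres = np.quantile(delta, np.linspace(0.1, 0.9, n_sources))
    for _ in range(n_iter):
        labels = np.argmin(np.abs(delta[..., None] - centres), axis=-1)
        centres = np.array([delta[labels == k].mean() if np.any(labels == k) else centres[k]
                            for k in range(n_sources)])
    return [labels == k for k in range(n_sources)]


def gcc_phat_tdoa(x1, x2, freqs, mask, max_tau, n_grid=2001):
    """Grid search for the peak of the PHAT-weighted GCC, restricted to masked bins."""
    cross = x1 * np.conj(x2)
    phat = np.where(mask, cross / np.maximum(np.abs(cross), 1e-12), 0.0)
    spectrum = phat.sum(axis=0)                             # accumulate over frames
    taus = np.linspace(-max_tau, max_tau, n_grid)
    gcc = np.real(np.exp(-2j * np.pi * np.outer(taus, freqs)) @ spectrum)
    return taus[np.argmax(gcc)]


def duet_gcc(x1, x2, freqs, n_sources, max_tau):
    """Separate the mixture with DUET masks, then run GCC-PHAT per source."""
    masks = duet_masks(x1, x2, freqs, n_sources)
    return sorted(float(gcc_phat_tdoa(x1, x2, freqs, m, max_tau)) for m in masks)


true_tdoas = [-40e-6, 30e-6]       # hypothetical TDOAs (s), roughly a 2 cm mic spacing
x1, x2, freqs = simulate_wdo_mixture(true_tdoas)
print(duet_gcc(x1, x2, freqs, n_sources=2, max_tau=60e-6))
```

With these assumed TDOAs, the two printed estimates are expected to lie close to -40 μs and 30 μs. The per-bin delay feature only works when the maximum inter-microphone delay stays below half a period at the highest frequency. This is why the sketch uses a very small array.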
Next, considering that multiple acoustic sources may be active nonconcurrently, an extended Kalman
particle filter (EKPF) is developed for a special multiple acoustic source tracking problem,
namely “nonconcurrent multiple acoustic tracking (NMAT)”. The extended Kalman filter
(EKF) is used to approximate the optimum weights, and the subsequent particle filtering (PF)
naturally takes the previous position estimates as well as the current TDOA measurements into
account. The proposed approach is thus able to lock onto sharp changes in the source position
quickly, avoiding the tracking lag of the general sequential importance resampling (SIR) PF.
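A generic EKPF step can be sketched in Python/NumPy, under the following assumptions:

- a 2-D source with random-walk dynamics;
- a hypothetical square array of four microphones, with microphone 0 as the TDOA reference;
- Gaussian TDOA noise.

Each particle runs an EKF update with the current TDOA vector and samples its new position from the resulting Gaussian. It is then weighted by likelihood times transition prior divided by proposal, and systematic resampling follows. This is the textbook EKPF construction, not the NMAT-specific filter developed in the thesis. The names `ekpf_step` and `run_demo`, the track, the noise levels and the particle count are illustrative assumptions.

```python
import numpy as np

C = 343.0  # speed of sound in air (m/s)


def tdoa(x, mics, pairs):
    """Noise-free TDOAs (s) of a 2-D source at x for each microphone pair (i, j)."""
    d = np.linalg.norm(x - mics, axis=1)
    return (d[pairs[:, 0]] - d[pairs[:, 1]]) / C


def tdoa_jacobian(x, mics, pairs):
    """Jacobian of tdoa() w.r.t. x: difference of unit vectors divided by C."""
    u = (x - mics) / np.linalg.norm(x - mics, axis=1, keepdims=True)
    return (u[pairs[:, 0]] - u[pairs[:, 1]]) / C


def log_gauss(x, mean, cov):
    d = x - mean
    _, logdet = np.linalg.slogdet(2 * np.pi * cov)
    return -0.5 * (d @ np.linalg.solve(cov, d) + logdet)


def ekpf_step(parts, covs, z, mics, pairs, q, r, rng):
    """One EKPF step: a per-particle EKF update with the current z forms the proposal."""
    n = len(parts)
    Q, R, I = q**2 * np.eye(2), r**2 * np.eye(len(pairs)), np.eye(2)
    new_parts, new_covs, logw = np.empty_like(parts), np.empty_like(covs), np.empty(n)
    for i in range(n):
        P_pred = covs[i] + Q                             # random-walk prediction (assumption)
        H = tdoa_jacobian(parts[i], mics, pairs)
        S = H @ P_pred @ H.T + R
        K = np.linalg.solve(S, H @ P_pred).T             # K = P H^T S^-1
        m = parts[i] + K @ (z - tdoa(parts[i], mics, pairs))
        A = I - K @ H
        P = A @ P_pred @ A.T + K @ R @ K.T               # Joseph form stays symmetric PSD
        x = m + np.linalg.cholesky(P) @ rng.standard_normal(2)
        new_parts[i], new_covs[i] = x, P
        # importance weight: likelihood * transition prior / EKF proposal
        logw[i] = (log_gauss(z, tdoa(x, mics, pairs), R)
                   + log_gauss(x, parts[i], Q) - log_gauss(x, m, P))
    w = np.exp(logw - logw.max())
    w /= w.sum()
    estimate = w @ new_parts                             # weighted-mean position estimate
    idx = np.searchsorted(np.cumsum(w), (rng.random() + np.arange(n)) / n)
    idx = np.minimum(idx, n - 1)                         # systematic resampling
    return new_parts[idx], new_covs[idx], estimate


def run_demo(seed=1, n_particles=100, q=0.05, r=1e-5):
    rng = np.random.default_rng(seed)
    mics = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 4.0], [0.0, 4.0]])  # hypothetical array
    pairs = np.array([[1, 0], [2, 0], [3, 0]])           # mic 0 is the TDOA reference
    # hypothetical track: a static talker, then an abrupt jump at frame 15
    truth = np.array([[1.2, 1.5]] * 15 + [[2.6, 2.4]] * 15)
    parts = rng.normal([2.0, 2.0], 0.5, size=(n_particles, 2))
    covs = np.tile(0.25 * np.eye(2), (n_particles, 1, 1))
    errors = []
    for pos in truth:
        z = tdoa(pos, mics, pairs) + r * rng.standard_normal(len(pairs))
        parts, covs, est = ekpf_step(parts, covs, z, mics, pairs, q, r, rng)
        errors.append(float(np.linalg.norm(est - pos)))
    return errors


errors = run_demo()
print([round(e, 3) for e in errors])   # position error (m) per frame
```

Every particle is pulled towards the current measurement before it is weighted. As a result, the error is expected to drop back to a small value shortly after the jump. A prior-proposal SIR filter with the same small random-walk noise would instead have to diffuse its particles across the whole jump.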
Finally, these investigations are extended into an approach for tracking an unknown and
time-varying number of acoustic sources. The DUET-GCC method is used to obtain the TDOA
measurements for multiple sources, and a random finite set (RFS) based Rao-Blackwellised PF
is employed and modified to track the sources. Each particle takes an RFS form encapsulating
the states of all sources and is capable of addressing source dynamics: source survival, new
source appearance and source deactivation. A data association variable is defined to describe the
source dynamics and their relation to the measurements. The Rao-Blackwellisation step is used
to decompose the state: the source positions are marginalised using an EKF, and only the
data association variable needs to be handled by a PF. The performance of all the proposed
approaches is extensively studied in different noisy and reverberant environments, and
compares favourably with existing tracking techniques.
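The Rao-Blackwellised structure can be illustrated with a deliberately reduced Python/NumPy sketch. Each particle is a set (here a list) of per-source Gaussian states. Source states are therefore updated in closed form, and only source survival and the measurement-to-source association are sampled.

To keep the sketch short, it assumes:

- 1-D source positions with linear Gaussian measurements, so a Kalman filter stands in for the EKF;
- uniform birth and clutter densities;
- hypothetical survival, detection, birth and clutter rates.

Associations are sampled one measurement at a time from their normalised probabilities. The product of the normalisers is the importance weight. This is not the thesis's RFS model or its data association variable, and the names `rbpf_step` and `run_demo` are hypothetical.

```python
import numpy as np


def rbpf_step(particles, zs, rng, lo=0.0, hi=10.0, p_s=0.995, p_d=0.95,
              lam_b=0.02, lam_c=0.1, q=0.02, r=0.1):
    """One step of a toy RFS Rao-Blackwellised PF for 1-D sources (all rates assumed).

    Each particle is a list of (mean, var) Kalman states, one per active source.
    Only survival and data association are sampled; positions are updated in
    closed form (a linear Kalman filter stands in for the EKF).
    """
    vol = hi - lo
    n = len(particles)
    new_particles, logw = [], np.zeros(n)
    for i, srcs in enumerate(particles):
        # source survival / deactivation, then random-walk prediction
        srcs = [(m, v + q**2) for m, v in srcs if rng.random() < p_s]
        logw[i] = len(srcs) * np.log(1 - p_d)           # start from "all sources missed"
        free, born = set(range(len(srcs))), []
        for z in zs:
            cand = sorted(free)
            s2 = np.array([srcs[j][1] + r**2 for j in cand])
            mu = np.array([srcs[j][0] for j in cand])
            # unnormalised association probabilities: existing sources, new source, clutter
            g = np.concatenate([p_d / (1 - p_d) * np.exp(-0.5 * (z - mu)**2 / s2)
                                / np.sqrt(2 * np.pi * s2), [lam_b / vol, lam_c / vol]])
            k = rng.choice(len(g), p=g / g.sum())
            logw[i] += np.log(g.sum())                   # weight = product of normalisers
            if k < len(cand):                            # Kalman update of the chosen source
                j = cand[k]
                m, v = srcs[j]
                gain = v / (v + r**2)
                srcs[j] = (m + gain * (z - m), (1 - gain) * v)
                free.remove(j)
            elif k == len(cand):                         # new source appearance
                born.append((z, r**2))
        new_particles.append(srcs + born)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    counts = np.array([len(p) for p in new_particles])
    n_hat = int(np.argmax(np.bincount(counts, weights=w)))   # most probable source count
    sel = np.flatnonzero(counts == n_hat)
    # average sorted positions; assumes well-separated sources keep a consistent order
    means = np.array([sorted(m for m, _ in new_particles[i]) for i in sel])
    est = np.average(means.reshape(len(sel), n_hat), axis=0, weights=w[sel])
    idx = np.minimum(np.searchsorted(np.cumsum(w), (rng.random() + np.arange(n)) / n), n - 1)
    return [new_particles[j] for j in idx], n_hat, est


def run_demo(n_frames=40, n_particles=50, seed=0):
    rng = np.random.default_rng(seed)
    truth = [2.0, 7.0]                                   # hypothetical static sources
    particles = [[] for _ in range(n_particles)]         # start with no active sources
    for _ in range(n_frames):
        zs = [x + 0.1 * rng.standard_normal() for x in truth if rng.random() < 0.95]
        zs += list(rng.uniform(0.0, 10.0, size=rng.poisson(0.1)))   # uniform clutter
        particles, n_hat, est = rbpf_step(particles, zs, rng)
    return n_hat, est


n_hat, est = run_demo()
print(n_hat, np.round(est, 2))
```

The reported number of sources is the weighted mode of the particle cardinalities. Using the mode keeps a single freshly born clutter source, or a single wrongly deactivated source, from changing the estimate.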