An Online Solution for Localisation, Tracking and Separation of Moving Speech Sources
Separating a time-varying number of speech sources in a room is a difficult problem. The challenge lies in estimating the number and the location of these speech sources. Furthermore, the tracked speech sources need to be separated. This thesis proposes a solution which uses the Random Finite Set approach to estimate the number and location of these speech sources and subsequently separates the speech source mixture via time-frequency masking.
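The time-frequency masking step can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes that each time-frequency bin already has a direction-of-arrival estimate in degrees, that the tracker supplies one direction per source, and that a binary mask assigns each bin to the nearest tracked direction. The names binary_masks and apply_mask are hypothetical.

```python
def binary_masks(bin_doas, source_doas):
    """Build one binary mask per source (assumed scheme, for illustration).

    bin_doas[t][f] is the estimated direction (degrees) of bin (t, f);
    source_doas holds one tracked direction per source (at least one).
    Each bin is assigned to the source whose direction is nearest.
    """
    masks = [[[0] * len(row) for row in bin_doas] for _ in source_doas]
    for t, row in enumerate(bin_doas):
        for f, doa in enumerate(row):
            nearest = min(range(len(source_doas)),
                          key=lambda k: abs(doa - source_doas[k]))
            masks[nearest][t][f] = 1
    return masks


def apply_mask(mixture, mask):
    """Keep only the mixture bins selected by the mask."""
    return [[x * m for x, m in zip(x_row, m_row)]
            for x_row, m_row in zip(mixture, mask)]


# Toy data: two frames, two frequency bins, two tracked sources.
bin_doas = [[-30.0, 40.0], [35.0, -25.0]]
mixture = [[1 + 1j, 2 + 0j], [3 + 0j, 4 - 1j]]
masks = binary_masks(bin_doas, [-30.0, 40.0])
separated = [apply_mask(mixture, m) for m in masks]
```

Because the masks are binary and disjoint, the separated spectrograms sum back to the mixture. The sketch also assumes directions in a range such as -90 to 90 degrees, so no angle wrap-around is handled.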
Online Audio-Visual Multi-Source Tracking and Separation: A Labeled Random Finite Set Approach
The dissertation proposes an online solution for separating an unknown and time-varying number of moving sources using audio and visual data. The random finite set framework is used for the modeling and fusion of audio and visual data. This enables an online tracking algorithm to estimate the source positions and identities at each time point. With this information, a set of beamformers can be designed to separate each desired source and suppress the interfering sources.
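As a rough illustration of steering a beamformer towards a tracked source position, the sketch below uses a narrowband delay-and-sum beamformer on a uniform linear array. This is a deliberately simple stand-in and not the dissertation's beamformer design: the array geometry, the far-field plane-wave model, the 2 kHz frequency and all function names are assumptions.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # metres per second, assumed room value


def steering_vector(mic_positions, angle_deg, freq_hz):
    """Far-field phase factors for microphones on a line (positions in metres)."""
    omega = 2.0 * math.pi * freq_hz
    s = math.sin(math.radians(angle_deg))
    return [cmath.exp(-1j * omega * x * s / SPEED_OF_SOUND)
            for x in mic_positions]


def delay_and_sum_weights(mic_positions, look_angle_deg, freq_hz):
    """Weights that phase-align and average the signals from the look direction."""
    a = steering_vector(mic_positions, look_angle_deg, freq_hz)
    return [v / len(a) for v in a]


def response(weights, mic_positions, angle_deg, freq_hz):
    """Magnitude of the beamformer output for a unit plane wave from angle_deg."""
    a = steering_vector(mic_positions, angle_deg, freq_hz)
    return abs(sum(w.conjugate() * v for w, v in zip(weights, a)))


mics = [0.0, 0.05, 0.10, 0.15]
weights = delay_and_sum_weights(mics, 0.0, 2000.0)
gain_target = response(weights, mics, 0.0, 2000.0)
gain_interferer = response(weights, mics, 60.0, 2000.0)
```

In this toy setting the gain towards the look direction is one, while a source at 60 degrees is attenuated. A delay-and-sum design does not place explicit nulls on interferers, so a design that actively suppresses them would need the interferer positions as well.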
Multiple moving speaker tracking via degenerate unmixing estimation technique and Cardinality Balanced Multi-target Multi-Bernoulli Filter (DUET-CBMeMBer)
The "cocktail party problem" has always been challenging, and many blind source separation algorithms have been proposed as solutions. The problem has mainly been discussed for non-moving sound sources, but it remains unsolved for moving sound sources and highly reverberant conditions. The ability to localise and track multiple moving speakers is a prerequisite to solving this problem. The aim of this paper is to show that a combination of the Degenerate Unmixing Estimation Technique and a Cardinality Balanced Multitarget Multi-Bernoulli Filter provides a viable way to track multiple sound sources and subsequently address the problem of sound separation for moving targets.
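The per-bin features that the Degenerate Unmixing Estimation Technique clusters can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: it assumes a two-microphone anechoic model in which the second channel equals the first scaled by an attenuation a and delayed by delta, and the function names are hypothetical.

```python
import cmath


def duet_features(x1, x2, omega):
    """Relative attenuation and delay of one time-frequency bin.

    x1 and x2 are the complex STFT values of the two microphones at the
    same bin, and omega is the bin's angular frequency (non-zero).
    Assumes |omega * delay| < pi so that the phase does not wrap.
    """
    ratio = x2 / x1
    attenuation = abs(ratio)
    delay = -cmath.phase(ratio) / omega
    return attenuation, delay


def symmetric_attenuation(attenuation):
    """Symmetric attenuation a - 1/a, commonly used for the DUET histogram."""
    return attenuation - 1.0 / attenuation


# Synthetic bin generated from the assumed model with a = 0.8, delta = 0.5.
omega = 2.0
x1 = 1.5 + 0.5j
x2 = 0.8 * cmath.exp(-1j * omega * 0.5) * x1
attenuation, delay = duet_features(x1, x2, omega)
```

In a full system these (attenuation, delay) pairs would be collected over many bins and their clusters passed to the tracker as measurements; that stage is omitted here.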
Acoustic source localisation and tracking using microphone arrays
This thesis considers the domain of acoustic source localisation and tracking in an indoor environment.
Acoustic tracking has applications in security, human-computer interaction, and the
diarisation of meetings. Source localisation and tracking is typically a computationally expensive
task, making it hard to process on-line, especially as the number of speakers to track increases.
Much of the literature considers single-source localisation; however, a practical system
must be able to cope with multiple speakers, possibly active simultaneously, without knowing
beforehand how many speakers are present. Techniques are explored for reducing the computational
requirements of an acoustic localisation system. Techniques to localise and track
multiple active sources are also explored, and developed to be more computationally efficient
than the current state-of-the-art algorithms, whilst being able to track more speakers.
The first contribution is the modification of a recent single-speaker source localisation technique,
which improves the localisation speed. This is achieved by formalising the assumption,
implicit in the algorithm being modified, that speaker height is uniformly distributed on the vertical
axis. Estimating height information effectively reduces the search space for speakers who have
previously been detected: they may have moved over the horizontal plane, but are unlikely
to have significantly changed height. This is developed to allow multiple non-simultaneously
active sources to be located. This is applicable when the system is given information from a
secondary source such as a set of cameras allowing the efficient identification of active speakers
rather than just the locations of people in the environment.
The next contribution of the thesis is the application of a particle swarm technique to significantly
further decrease the computational cost of localising a single source in an indoor environment,
compared with the state of the art. Several variants of the particle swarm technique are
explored, including novel variants designed specifically for localising acoustic sources. Each
method is characterised in terms of its computational complexity as well as the average localisation
error. The techniques’ responses to acoustic noise are also considered, and they are
found to be robust.
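The idea of searching an acoustic objective function with a particle swarm can be sketched as below. This is a generic global-best particle swarm, not one of the thesis's variants. The objective here is a toy stand-in (negative squared distance to an assumed source position) for a real acoustic objective function, and the room size, parameter values and names are all assumptions.

```python
import random


def pso_localise(objective, bounds, n_particles=30, n_iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Maximise objective(position) inside box bounds with a basic swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # best position seen per particle
    best_val = [objective(p) for p in pos]
    g = max(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # best position seen by swarm
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Clamp the particle to the room boundaries.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val > best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val > g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


# Toy objective peaking at an assumed source position in a 5 m by 4 m room.
TRUE_SOURCE = (3.2, 1.5)


def toy_objective(p):
    return -((p[0] - TRUE_SOURCE[0]) ** 2 + (p[1] - TRUE_SOURCE[1]) ** 2)


estimate, value = pso_localise(toy_objective, [(0.0, 5.0), (0.0, 4.0)])
```

The computational saving comes from evaluating the objective only at particle positions (here at most 30 initial evaluations plus 30 per iteration) rather than on a dense grid over the room. A real acoustic objective is typically multi-modal and noisy, so convergence would be less clean than on this smooth toy function.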
A further contribution is made by using multi-optima swarm techniques to localise multiple
simultaneously active sources. This makes use of techniques which extend the single-source
particle swarm techniques to finding multiple optima of the acoustic objective function. Several
techniques are investigated and their performance in terms of localisation accuracy and computational
complexity is characterised. Consideration is also given to how these metrics change
when an increasing number of active speakers are to be localised.
Finally, the application of the multi-optima localisation methods as an input to a multi-target
tracking system is presented. Tracking multiple speakers is a more complex task than tracking
a single acoustic source, as observations of audio activity must be associated in some way with
distinct speakers. The tracker used is known to be a relatively efficient technique, and the nature
of the multi-optima output format is modified to allow the application of this technique to the
task of speaker tracking.