Acoustic source localisation and tracking using microphone arrays
This thesis considers the domain of acoustic source localisation and tracking in an indoor environment.
Acoustic tracking has applications in security, human-computer interaction, and the
diarisation of meetings. Source localisation and tracking is typically a computationally expensive
task, making it hard to process on-line, especially as the number of speakers to track increases.
Much of the literature considers single-source localisation; however, a practical system
must be able to cope with multiple speakers, possibly active simultaneously, without knowing
beforehand how many speakers are present. Techniques are explored for reducing the computational
requirements of an acoustic localisation system. Techniques to localise and track
multiple active sources are also explored, and developed to be more computationally efficient
than current state-of-the-art algorithms, whilst being able to track more speakers.
The first contribution is the modification of a recent single-speaker source localisation technique,
which improves the localisation speed. This is achieved by formalising the implicit assumption,
made by the algorithm being modified, that speaker height is uniformly distributed on the vertical
axis. Estimating height information effectively reduces the search space for speakers who have
previously been detected and may since have moved in the horizontal plane, but are unlikely
to have significantly changed height. This is developed to allow multiple non-simultaneously
active sources to be located. This is applicable when the system is given information from a
secondary source, such as a set of cameras, and allows the efficient identification of active speakers
rather than just the locations of people in the environment.
The next contribution of the thesis is the application of a particle swarm technique to decrease
the computational cost of localising a single source in an indoor environment significantly further,
compared with the state of the art. Several variants of the particle swarm technique are
explored, including novel variants designed specifically for localising acoustic sources. Each
method is characterised in terms of its computational complexity as well as the average localisation
error. The techniques' responses to acoustic noise are also considered, and they are
found to be robust.
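The swarm search described above can be illustrated with a minimal global-best particle swarm optimiser (PSO). The abstract does not specify the acoustic objective function, so this sketch substitutes a hypothetical time-difference-of-arrival cost for four microphones in the corners of a 5 m by 5 m room. The room, microphone layout, cost function and PSO coefficients (inertia `w`, acceleration constants `c1` and `c2`) are all assumptions, and none of the thesis's own variants is reproduced here.

```python
import math
import random

# Hypothetical 5 m x 5 m room (2-D) with a microphone in each corner.
MICS = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0), (5.0, 5.0)]
ROOM = (5.0, 5.0)


def range_differences(pos):
    """Distance from pos to each microphone minus its distance to microphone 0 (metres)."""
    d = [math.dist(pos, m) for m in MICS]
    return [di - d[0] for di in d[1:]]


def cost(pos, measured):
    """Squared mismatch between predicted and measured range differences; lower is better."""
    return sum((p - m) ** 2 for p, m in zip(range_differences(pos), measured))


def pso_localise(measured, n_particles=30, n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over the room; returns (best position, best cost)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(0, ROOM[0]), rng.uniform(0, ROOM[1])] for _ in range(n_particles)]
    vel = [[0.0, 0.0] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p, measured) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for k in range(2):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards this particle's best + pull towards the swarm's best.
                vel[i][k] = (w * vel[i][k]
                             + c1 * r1 * (pbest[i][k] - pos[i][k])
                             + c2 * r2 * (gbest[k] - pos[i][k]))
                # Move, then clamp the coordinate to the room boundaries.
                pos[i][k] = min(max(pos[i][k] + vel[i][k], 0.0), ROOM[k])
            c = cost(pos[i], measured)
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost


true_source = (1.5, 3.2)
estimate, residual = pso_localise(range_differences(true_source))
print(estimate, residual)
```

The toy cost is cheap to evaluate, but with a real steered-response objective each evaluation is expensive, which is why evaluating a few dozen particles per iteration, rather than every point of a dense spatial grid, can reduce computation.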
A further contribution is made by using multi-optima swarm techniques to localise multiple
simultaneously active sources. This makes use of techniques which extend the single-source
particle swarm techniques to finding multiple optima of the acoustic objective function. Several
techniques are investigated and their performance in terms of localisation accuracy and computational
complexity is characterised. Consideration is also given to how these metrics change
as the number of active speakers to be localised increases.
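One way a swarm can report several optima at once is speciation. This sketch shows only the seed-identification step of species-based PSO: particles are ranked by objective value, and a particle founds a new species if it lies farther than a chosen radius from every existing seed. The abstract does not say whether this particular multi-optima variant is among those investigated, and the particle positions, fitness values and radius below are made up for illustration.

```python
import math


def species_seeds(particles, fitness, radius):
    """Species-seed identification as in species-based PSO (higher fitness is better).

    particles: list of positions; fitness: matching list of objective values
    radius: species radius, a tuning parameter (the value used below is hypothetical)
    """
    order = sorted(range(len(particles)), key=lambda i: fitness[i], reverse=True)
    seeds = []
    for i in order:
        # A particle becomes a new seed only if no better seed lies within the radius.
        if all(math.dist(particles[i], particles[s]) > radius for s in seeds):
            seeds.append(i)
    return [particles[s] for s in seeds]


# Two clusters of particles, e.g. around two simultaneously active speakers.
particles = [(1.0, 1.0), (1.1, 0.9), (3.0, 3.0), (2.9, 3.1), (1.05, 1.0)]
fitness = [0.9, 0.7, 0.95, 0.6, 0.8]
print(species_seeds(particles, fitness, radius=0.5))
```

Each seed then acts as the local best for the particles in its species, so the swarm can keep several peaks of the objective function alive instead of collapsing onto one.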
Finally, the application of the multi-optima localisation methods as an input to a multi-target
tracking system is presented. Tracking multiple speakers is a more complex task than tracking
a single acoustic source, as observations of audio activity must be associated in some way with
distinct speakers. The tracker used is known to be a relatively efficient technique, and the
multi-optima output format is modified to allow this technique to be applied to the
task of speaker tracking.
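The abstract does not name the tracker, so as a generic illustration of the association step only, this sketch uses greedy gated nearest-neighbour association, which is not necessarily the technique used in the thesis. The track identifiers, predicted positions, detections and the 0.5 m gate are hypothetical.

```python
import math


def associate(tracks, detections, gate=0.5):
    """Greedy gated nearest-neighbour association.

    tracks: dict mapping track id -> predicted (x, y) position
    detections: list of (x, y) localisation outputs for one frame
    gate: maximum association distance in metres (hypothetical value)
    Returns (assignments as {track id: detection index}, unassigned detection indices).
    """
    pairs = sorted(
        (math.dist(tpos, det), tid, j)
        for tid, tpos in tracks.items()
        for j, det in enumerate(detections)
    )
    assignments, used_tracks, used_dets = {}, set(), set()
    for dist, tid, j in pairs:
        if dist > gate:
            break  # pairs are sorted, so every remaining pair is outside the gate
        if tid in used_tracks or j in used_dets:
            continue
        assignments[tid] = j
        used_tracks.add(tid)
        used_dets.add(j)
    unassigned = [j for j in range(len(detections)) if j not in used_dets]
    return assignments, unassigned


tracks = {"spk1": (1.0, 1.0), "spk2": (3.0, 2.0)}
detections = [(2.9, 2.1), (1.1, 0.9), (4.5, 4.5)]
print(associate(tracks, detections))
```

In a fuller tracker, detections left unassigned would typically be candidates for starting new tracks, and tracks that go unassigned for several frames would be candidates for deletion.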
Array signal processing for source localization and enhancement
"A common approach to the wide-band microphone array problem is to assume a certain array geometry and then design optimal weights (often in subbands) to meet a set of desired criteria. In addition to weights, we consider the geometry of the microphone arrangement to be part of the optimization problem. Our approach is to use particle swarm optimization (PSO) to search for the optimal geometry while using an optimal weight design method to compute the weights for each particle's geometry. The resulting directivity indices (DIs) and white noise SNR gains (WNGs) form the basis of the PSO's fitness function. Another important consideration in the optimal weight design is the choice of several regularization parameters. By including those parameters in the particles, their values are also optimized by the PSO. The proposed method allows the user great flexibility in specifying desired DIs and WNGs over frequency by virtue of the PSO fitness function.
Although the above method addresses beam and null steering for fixed locations, in real-time scenarios the source positions must be estimated so that the beam can be steered adaptively. We also investigate the localization of sound and RF sources using machine learning techniques. For RF source localization, we consider radio frequency identification (RFID) antenna tags. Using a planar RFID antenna array with beam-steering capability and the received signal strength indicator (RSSI) value captured for each beam position, the position of each RFID antenna tag is estimated. The proposed approach is also shown to perform well under various challenging scenarios."--Abstract, page iv
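To make the fitness ingredients of the geometry search concrete, this sketch computes the white noise gain (WNG) and directivity index (DI) for a candidate microphone geometry. It uses plain delay-and-sum weights rather than the optimal, regularized weight design described in the abstract, and assumes far-field sources, a 2-D array, a speed of sound of 343 m/s and spherically isotropic noise for the DI. The `fitness` function and its targets are hypothetical, showing only one way DI and WNG could be folded into a single score per particle.

```python
import cmath
import math

C = 343.0  # assumed speed of sound in m/s


def steering_vector(positions, freq, azimuth):
    """Far-field plane-wave steering vector for 2-D microphone positions in metres."""
    k = 2 * math.pi * freq / C
    u = (math.cos(azimuth), math.sin(azimuth))
    return [cmath.exp(-1j * k * (p[0] * u[0] + p[1] * u[1])) for p in positions]


def array_gain(w, d, gamma):
    """|w^H d|^2 / (w^H Gamma w) for weights w, steering vector d and noise coherence Gamma."""
    num = abs(sum(wi.conjugate() * di for wi, di in zip(w, d))) ** 2
    den = sum(w[i].conjugate() * gamma[i][j] * w[j]
              for i in range(len(w)) for j in range(len(w))).real
    return num / den


def wng_and_di(positions, freq, azimuth):
    """WNG and DI in dB of delay-and-sum weights steered towards azimuth (radians)."""
    m = len(positions)
    d = steering_vector(positions, freq, azimuth)
    w = [dm / m for dm in d]  # delay-and-sum: unit response in the look direction
    k = 2 * math.pi * freq / C
    identity = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    # Spherically isotropic (diffuse) noise coherence sin(k r) / (k r);
    # assumes no two microphones share a position.
    diffuse = [[1.0 if i == j else
                math.sin(k * math.dist(positions[i], positions[j]))
                / (k * math.dist(positions[i], positions[j]))
                for j in range(m)] for i in range(m)]
    wng_db = 10 * math.log10(array_gain(w, d, identity))
    di_db = 10 * math.log10(array_gain(w, d, diffuse))
    return wng_db, di_db


def fitness(positions, freq, azimuth, target_wng_db, target_di_db):
    """Hypothetical PSO fitness: total shortfall below the desired WNG and DI (0 is ideal)."""
    wng_db, di_db = wng_and_di(positions, freq, azimuth)
    return max(0.0, target_wng_db - wng_db) + max(0.0, target_di_db - di_db)


# Four-microphone line array with half-wavelength spacing at 1 kHz, steered to broadside.
freq = 1000.0
spacing = C / (2 * freq)
ula = [(n * spacing, 0.0) for n in range(4)]
print(wng_and_di(ula, freq, math.pi / 2))
print(fitness(ula, freq, math.pi / 2, target_wng_db=6.0, target_di_db=6.0))
```

With half-wavelength spacing the diffuse-noise coherence matrix of a line array reduces to the identity, so DI and WNG coincide at 10 log10(4), about 6.02 dB, which makes a convenient sanity check. In a geometry search, each particle would encode a set of microphone positions, and the score would be accumulated over the frequencies of interest.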
Deep Learning for Distant Speech Recognition
Deep learning is an emerging technology that is considered one of the most
promising directions for reaching higher levels of artificial intelligence.
Among other achievements, building computers that understand speech
represents a crucial leap towards intelligent machines. Despite the great
efforts of the past decades, however, a natural and robust human-machine speech
interaction still appears to be out of reach, especially when users interact
with a distant microphone in noisy and reverberant environments. The latter
disturbances severely hamper the intelligibility of a speech signal, making
Distant Speech Recognition (DSR) one of the major open challenges in the field.
This thesis addresses the latter scenario and proposes some novel techniques,
architectures, and algorithms to improve the robustness of distant-talking
acoustic models. We first elaborate on methodologies for realistic data
contamination, with a particular emphasis on DNN training with simulated data.
We then investigate approaches for better exploiting speech contexts,
proposing some original methodologies for both feed-forward and recurrent
neural networks. Lastly, inspired by the idea that cooperation across different
DNNs could be the key for counteracting the harmful effects of noise and
reverberation, we propose a novel deep learning paradigm called network of deep
neural networks. The analysis of the original concepts was based on extensive
experimental validations conducted on both real and simulated data, considering
different corpora, microphone configurations, environments, noisy conditions,
and ASR tasks.
Comment: PhD Thesis Unitn, 201
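Realistic data contamination is commonly done by convolving close-talk speech with a room impulse response and then adding noise at a chosen signal-to-noise ratio (SNR). A minimal sketch under those assumptions follows. The thesis's actual contamination pipeline is not described in the abstract, and the toy impulse response and random signals are placeholders for measured or simulated data.

```python
import math
import random


def convolve(x, h):
    """Full linear convolution of signal x with impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def power(s):
    """Mean power of a signal."""
    return sum(v * v for v in s) / len(s)


def contaminate(clean, rir, noise, snr_db):
    """Reverberate clean speech with rir, then add noise scaled to the target SNR in dB.

    Returns (noisy, reverberant, scaled_noise). Assumes len(noise) >= len(clean) + len(rir) - 1.
    """
    reverberant = convolve(clean, rir)
    noise = noise[:len(reverberant)]
    # Choose the gain so that power(reverberant) / power(gain * noise) == 10 ** (snr_db / 10).
    gain = math.sqrt(power(reverberant) / (power(noise) * 10 ** (snr_db / 10)))
    scaled_noise = [gain * n for n in noise]
    noisy = [r + n for r, n in zip(reverberant, scaled_noise)]
    return noisy, reverberant, scaled_noise


rng = random.Random(0)
clean = [rng.gauss(0, 1) for _ in range(800)]   # stand-in for a clean speech segment
rir = [1.0, 0.0, 0.5, 0.0, 0.25]                # toy impulse response (hypothetical)
noise = [rng.gauss(0, 1) for _ in range(1000)]  # stand-in for recorded noise
noisy, reverberant, scaled_noise = contaminate(clean, rir, noise, snr_db=10.0)
measured_snr = 10 * math.log10(power(reverberant) / power(scaled_noise))
print(len(noisy), measured_snr)
```

Here the SNR is measured against the reverberant speech rather than the dry signal. Which reference to use is a design choice that changes the effective noise level in the training data.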
Acoustic Source Localisation in constrained environments
Acoustic Source Localisation (ASL) is a problem with real-world applications
across multiple domains, from smart assistants to acoustic detection and tracking.
And yet, despite the level of attention in recent years, a technique for rapid and
robust ASL remains elusive, not least in the constrained environments in which
such techniques are most likely to be deployed.
In this work, we seek to address some of these current limitations by presenting
improvements to the ASL method for three commonly encountered constraints: the
number and configuration of sensors; the limited signal sampling potentially available;
and the nature and volume of training data required to accurately estimate Direction
of Arrival (DOA) when deploying a particular supervised machine learning technique.
In regard to the number and configuration of sensors, we find that accuracy can be
maintained at the level of the state-of-the-art Steered Response Power (SRP) method while
reducing computation sixfold, based on direct optimisation of well-known ASL formulations.
Moreover, we find that the circular microphone configuration is the least desirable,
as it yields the highest localisation error.
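SRP is commonly implemented as SRP-PHAT, which sums phase-transform-weighted generalised cross-correlations (GCC-PHAT) over microphone pairs at the delays implied by each candidate location. As a minimal sketch of that pairwise building block only, not of the thesis's optimised method, the code below estimates the delay between two equal-length signals. It uses a naive O(N^2) DFT and circular delays purely to stay dependency-free; the signal length and delay are arbitrary.

```python
import cmath
import math
import random


def dft(x):
    """Naive discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)) for f in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[f] * cmath.exp(2j * math.pi * f * t / n) for f in range(n)) / n
            for t in range(n)]


def gcc_phat_lag(x1, x2, eps=1e-12):
    """Estimate how many samples x2 lags x1 using GCC-PHAT (circular, equal lengths)."""
    X1, X2 = dft(x1), dft(x2)
    cross = [b * a.conjugate() for a, b in zip(X1, X2)]
    # PHAT weighting discards the magnitude and keeps only the phase of the cross-spectrum.
    phat = [c / (abs(c) + eps) for c in cross]
    r = [v.real for v in idft(phat)]
    n = len(r)
    peak = max(range(n), key=lambda t: r[t])
    return peak - n if peak > n // 2 else peak  # map the circular index to a signed lag


rng = random.Random(1)
x1 = [rng.gauss(0, 1) for _ in range(64)]
x2 = x1[-5:] + x1[:-5]  # x1 circularly delayed by 5 samples
print(gcc_phat_lag(x1, x2), gcc_phat_lag(x2, x1))
```

In SRP-PHAT, rather than taking each pair's peak independently, the correlation values at the candidate location's predicted delays are summed across all pairs, and the location with the largest sum is chosen.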
In regard to signal sampling, we demonstrate that the computer vision inspired
algorithm presented in this work, which extracts selected keypoints from the signal spectrogram and uses them to select signal samples, outperforms an audio
fingerprinting baseline while maintaining a compression ratio of 40:1.
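The abstract does not detail the keypoint extractor, so as a loose illustration only, this sketch treats keypoints as strict local maxima of a magnitude spectrogram, in the spirit of the peak picking used in audio fingerprinting. The neighbourhood `radius`, the `threshold` and the toy spectrogram are hypothetical, and the step of selecting signal samples around each keypoint is not shown.

```python
def spectrogram_keypoints(spec, radius=1, threshold=0.0):
    """Return (frame, bin) cells that are strict maxima of their neighbourhood.

    spec: 2-D list indexed [frame][bin] of spectrogram magnitudes
    radius: neighbourhood half-width in frames and bins (hypothetical parameter)
    threshold: cells whose magnitude is not above this value are ignored
    """
    n_frames, n_bins = len(spec), len(spec[0])
    keypoints = []
    for t in range(n_frames):
        for f in range(n_bins):
            v = spec[t][f]
            if v <= threshold:
                continue
            neighbours = (
                spec[tt][ff]
                for tt in range(max(0, t - radius), min(n_frames, t + radius + 1))
                for ff in range(max(0, f - radius), min(n_bins, f + radius + 1))
                if (tt, ff) != (t, f)
            )
            if all(v > nv for nv in neighbours):
                keypoints.append((t, f))
    return keypoints


spec = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.8],
    [0.0, 0.1, 0.1, 0.2],
]
print(spectrogram_keypoints(spec, radius=1, threshold=0.5))
```

Only a small fraction of time-frequency cells survive this kind of selection, which is what makes high compression ratios possible; the toy example does not reproduce the 40:1 figure reported above.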
In regard to the training data employed in machine learning ASL techniques,
we show that the use of music training data yields a 19% improvement over
a noise-data baseline while maintaining accuracy with only 25% of the training
data, and that training with speech as opposed to noise improves DOA estimation by
an average of 17%, outperforming the Generalised Cross-Correlation technique by
125% in scenarios in which the test and training acoustic environments are matched.
Heriot-Watt University James Watt Scholarship (JSW) in the School of Engineering & Physical Sciences
ATHENA Research Book
The ATHENA European University is an alliance of nine Higher Education Institutions with the mission of fostering excellence in research and innovation by facilitating international cooperation. The ATHENA acronym stands for Advanced Technologies in Higher Education Alliance. The partner institutions are from France, Germany, Greece, Italy, Lithuania, Portugal, and Slovenia: the University of Orléans, the University of Siegen, the Hellenic Mediterranean University, the Niccolò Cusano University, the Vilnius Gediminas Technical University, the Polytechnic Institute of Porto, and the University of Maribor. In 2022, institutions from Poland and Spain joined the alliance: the Maria Curie-Skłodowska University and the University of Vigo.
This research book presents a selection of the ATHENA university partners' research activities. It incorporates peer-reviewed original articles, reprints and student contributions. The ATHENA Research Book provides a platform that promotes joint and interdisciplinary research projects of both advanced and early-career researchers.
ATHENA Research Book, Volume 1
The ATHENA European University is an alliance of nine Higher Education Institutions with the mission of fostering excellence in research and innovation by facilitating international cooperation. The ATHENA acronym stands for Advanced Technologies in Higher Education Alliance. The partner institutions are from France, Germany, Greece, Italy, Lithuania, Portugal, and Slovenia: the University of Orléans, the University of Siegen, the Hellenic Mediterranean University, the Niccolò Cusano University, the Vilnius Gediminas Technical University, the Polytechnic Institute of Porto, and the University of Maribor. In 2022, institutions from Poland and Spain joined the alliance: the Maria Curie-Skłodowska University and the University of Vigo.
This research book presents a selection of the ATHENA university partners' research activities. It incorporates peer-reviewed original articles, reprints and student contributions. The ATHENA Research Book provides a platform that promotes joint and interdisciplinary research projects of both advanced and early-career researchers.