137 research outputs found
Design exploration and performance strategies towards power-efficient FPGA-based architectures for sound source localization
Many applications rely on MEMS microphone arrays for locating sound sources prior to their execution. Those applications are not only executed under real-time constraints but are also often embedded on low-power devices. These environments become challenging when increasing the number of microphones or requiring dynamic responses. Field-Programmable Gate Arrays (FPGAs) are usually chosen due to their flexibility and computational power. This work intends to guide the design of reconfigurable acoustic beamforming architectures, which are not only able to accurately determine the sound Direction-Of-Arrival (DoA) but also capable of satisfying the most demanding applications in terms of power efficiency. Design considerations of the operations required to perform the sound localization are discussed and analysed in order to facilitate the elaboration of reconfigurable acoustic beamforming architectures. Performance strategies are proposed and evaluated based on the characteristics of the presented architecture. This power-efficient architecture is compared to a different architecture prioritizing performance in order to reveal the unavoidable design trade-offs.
Towards End-to-End Acoustic Localization using Deep Learning: from Audio Signal to Source Position Coordinates
This paper presents a novel approach for indoor acoustic source localization
using microphone arrays and based on a Convolutional Neural Network (CNN). The
proposed solution is, to the best of our knowledge, the first published work in
which the CNN is designed to directly estimate the three-dimensional position
of an acoustic source, using the raw audio signal as the input and
avoiding the use of hand-crafted audio features. Given the limited amount of
available localization data, we propose in this paper a training strategy based
on two steps. We first train our network using semi-synthetic data, generated
from close talk speech recordings, and where we simulate the time delays and
distortion suffered in the signal that propagates from the source to the array
of microphones. We then fine-tune this network using a small amount of real
data. Our experimental results show that this strategy is able to produce
networks that significantly improve on existing localization methods based on
SRP-PHAT strategies. In addition, our experiments show that our CNN
method exhibits better resistance against varying gender of the speaker and
different window sizes compared with the other methods.
Comment: 18 pages, 3 figures, 8 tables
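The abstract above mentions simulating the time delays a signal suffers on its way from the source to each microphone. The following is only a minimal sketch of that idea, not the authors' propagation model: it assumes a free-field point source, a fixed speed of sound of 343 m/s, and integer-sample delays. The function names `propagation_delays` and `delay_signal` are hypothetical and introduced here for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value, not taken from the paper


def propagation_delays(source, mics, fs):
    """Return the source-to-microphone delay, in (fractional) samples, for each microphone.

    source: (x, y, z) position in metres
    mics:   list of (x, y, z) microphone positions in metres
    fs:     sampling rate in Hz
    """
    return [math.dist(source, mic) / SPEED_OF_SOUND * fs for mic in mics]


def delay_signal(signal, delay_samples, gain=1.0):
    """Delay a signal by a non-negative delay rounded to whole samples.

    The output has the same length as the input: zeros are prepended and the
    tail that no longer fits is dropped.
    """
    d = min(int(round(delay_samples)), len(signal))
    return [0.0] * d + [gain * s for s in signal[:len(signal) - d]]
```

For example, a close-talk recording could be passed through `delay_signal` once per microphone, using the delays returned by `propagation_delays`, to obtain a crude multichannel signal. Reverberation, attenuation with distance, and fractional delays are deliberately left out of this sketch.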
Design, implementation and evaluation of an acoustic source localization system using Deep Learning techniques
This Master Thesis presents a novel approach for indoor acoustic source localization using microphone
arrays, based on a Convolutional Neural Network (CNN) that we call the ASLNet. It directly estimates
the three-dimensional position of a single acoustic source using as inputs the raw audio signals from a set
of microphones. We use supervised learning methods to train our network end-to-end. The amount of
labeled training data available for this problem is however small. This Thesis presents a training strategy
based on two steps that mitigates this problem. We first train our network using semi-synthetic data
generated from close talk speech recordings and a mathematical model for signal propagation from the
source to the microphones. The amount of semi-synthetic data can be virtually as large as needed. We
then fine-tune the resulting network using a small amount of real data. Our experimental results, evaluated
on a publicly available dataset recorded in a real room, show that this approach is able to improve on existing
localization methods based on SRP-PHAT strategies and also those presented in very recent proposals
based on Convolutional Recurrent Neural Networks (CRNN). In addition, our experiments show that the
performance of the ASLNet does not show a relevant dependency on the speaker’s gender, nor on the
size of the signal window being used. This work also investigates methods to improve the generalization
properties of our network using only semi-synthetic data for training. This is a highly important objective
due to the cost of labelling localization data. We proceed by including specific effects in the input signals
to force the network to be insensitive to multipath, high noise and distortion likely to be present in real
scenarios. We obtain promising results with this strategy, although they still lag behind strategies based
on fine-tuning.
Máster Universitario en Ingeniería de Telecomunicación (M125
Frequency-Sliding Generalized Cross-Correlation: A Sub-band Time Delay Estimation Approach
The generalized cross correlation (GCC) is regarded as the most popular
approach for estimating the time difference of arrival (TDOA) between the
signals received at two sensors. Time delay estimates are obtained by
maximizing the GCC output, where the direct-path delay is usually observed as a
prominent peak. Moreover, GCCs also play an important role in steered response
power (SRP) localization algorithms, where the SRP functional can be written as
an accumulation of the GCCs computed from multiple sensor pairs. Unfortunately,
the accuracy of TDOA estimates is affected by multiple factors, including
noise, reverberation and signal bandwidth. In this paper, a sub-band approach
for time delay estimation aimed at improving the performance of the
conventional GCC is presented. The proposed method is based on the extraction
of multiple GCCs corresponding to different frequency bands of the cross-power
spectrum phase in a sliding-window fashion. The major contributions of this
paper include: 1) a sub-band GCC representation of the cross-power spectrum
phase that, despite having a reduced temporal resolution, provides a more
suitable representation for estimating the true TDOA; 2) such matrix
representation is shown to be rank one in the ideal noiseless case, a property
that is exploited in more adverse scenarios to obtain a more robust and
accurate GCC; 3) we propose a set of low-rank approximation alternatives for
processing the sub-band GCC matrix, leading to better TDOA estimates and source
localization performance. An extensive set of experiments is presented to
demonstrate the validity of the proposed approach.
Comment: Article accepted in IEEE/ACM Transactions on Audio, Speech, and
Language Processing
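For readers unfamiliar with the conventional GCC that the abstract above takes as its baseline, the sketch below shows the full-band GCC with PHAT weighting and TDOA estimation by peak picking. It is not the sub-band, frequency-sliding method proposed in the paper. It uses a naive O(n²) DFT from the standard library so that it stays self-contained; a practical implementation would use an FFT routine. The names `dft` and `gcc_phat` and the `eps` regularizer are assumptions made for this illustration.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2)); adequate for short illustrative signals."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat(x1, x2, max_lag, eps=1e-12):
    """Return (lag, values): the lag in samples that maximizes the GCC-PHAT of x1 and x2.

    A positive lag means x1 is a delayed copy of x2. max_lag must be smaller
    than half of the zero-padded length len(x1) + len(x2).
    """
    n = len(x1) + len(x2)  # zero-pad so the circular correlation does not wrap around
    X1 = dft(list(x1) + [0.0] * (n - len(x1)))
    X2 = dft(list(x2) + [0.0] * (n - len(x2)))
    cross = [a * b.conjugate() for a, b in zip(X1, X2)]  # cross-power spectrum
    phat = [c / (abs(c) + eps) for c in cross]           # keep only the phase
    cc = dft(phat, inverse=True)
    lags = range(-max_lag, max_lag + 1)
    values = [cc[lag % n].real for lag in lags]
    best = max(range(len(values)), key=values.__getitem__)
    return lags[best], values
```

Dividing the returned lag by the sampling rate gives the TDOA estimate in seconds. Summing such GCC values over several microphone pairs, each evaluated at the lag implied by a candidate source position, is the accumulation that the abstract refers to as the SRP functional.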
Practical considerations for acoustic source localization in the IoT era: Platforms, energy efficiency, and performance
The rapid development of the Internet of Things (IoT) has posed important changes in the way emerging acoustic signal processing applications are conceived. While traditional acoustic processing applications have been developed taking into account high-throughput computing platforms equipped with expensive multichannel audio interfaces, the IoT paradigm is demanding the use of more flexible and energy-efficient systems. In this context, algorithms for source localization and ranging in wireless acoustic sensor networks can be considered an enabling technology for many IoT-based environments, including security, industrial, and health-care applications. This paper is aimed at evaluating important aspects dealing with the practical deployment of IoT systems for acoustic source localization. Recent systems-on-chip composed of low-power multicore processors, combined with a small graphics accelerator (or GPU), yield a notable increment of the computational capacity needed in intensive signal processing algorithms while partially retaining the appealing low power consumption of embedded systems. Different algorithms and implementations over several state-of-the-art platforms are discussed, analyzing important aspects, such as the tradeoffs between performance, energy efficiency, and exploitation of parallelism by taking into account real-time constraints.
This work was supported in part by the Post-Doctoral Fellowship from Generalitat
Valenciana under Grant APOSTD/2016/069, in part by the Spanish
Government under Grant TIN2014-53495-R, Grant TIN2015-65277-R, and
Grant BIA2016-76957-C3-1-R, and in part by the Universidad Jaume I under
Project UJI-B2016-20.
Acoustic Speaker Localization with Strong Reverberation and Adaptive Feature Filtering with a Bayes RFS Framework
The thesis investigates the challenges of speaker localization in the presence of strong reverberation, multi-speaker tracking, and multi-feature multi-speaker state filtering, using sound recordings from microphones. Novel reverberation-robust speaker localization algorithms are derived from the signal and room acoustics models. A multi-speaker tracking filter and a multi-feature multi-speaker state filter are developed based upon the generalized labeled multi-Bernoulli random finite set framework. Experiments and comparative studies have verified and demonstrated the benefits of the proposed methods.
A code-division, multiple beam sonar imaging system
Submitted in partial fulfillment of the requirements for the degree of Master of Science at the Massachusetts Institute of Technology and the Woods Hole Oceanographic Institution, August 1989.
In this thesis, a new active sonar imaging concept is explored using the principle
of code-division and the simultaneous transmission of multiple coded signals. The signals
are sixteen symbol, four-bit, non-linear, block Frequency-Shift Keyed (FSK)
codes, each of which is projected into a different direction. Upon reception of the reflected
waveform, each signal is separately detected and the results are inverted to yield
an estimation of the spatial location of an object in three dimensions. The code-division
sonar is particularly effective operating in situations where the phase of the transmitted
signal is perturbed by the propagation media and the target. Most imaging techniques
presently used rely on preservation of the phase of the received signal over the dimension
of the receiving array. In the code-division sonar, spatial resolution is obtained by
using the combined effects of code-to-code rejection and the a-priori knowledge of
which direction each code was transmitted. The coded signals are shown to be highly
tolerant of phase distortion over the duration of the transmission. The result is a high-resolution,
three-dimensional image, obtainable in a highly perturbative environment.
Additionally, the code-division sonar is capable of a high frame rate due to the simplicity
of the processing required. Two algorithms are presented which estimate the spatial
coordinates of an object in the ensonified aperture of the system, and the performance of
the two is compared for different signal to noise levels. Finally, the concept of code-division
imaging is employed in a series of experiments in which a code-division sonar
was used to image objects under a variety of conditions. The results of the experiments
are presented, showing the resolution capabilities of the system.
Analysis of Vector Sensor Data Collected in Gulf of Mexico
In 2015, the Naval Oceanographic Office collected vector sensor data in approximately 100 meters of water southwest of Panama City, Florida in the Gulf of Mexico. The vector sensor was deployed at a center mass height of one foot above the seafloor and de-coupled from its mooring through lightweight springs to measure local acoustical pressure and particle velocity.
Accuracy of the data across frequency and source azimuth is measured by evaluating acoustical impedance as a function of frequency and source azimuthal direction. Results indicate the vector sensor has an effective band from 50 to 450 Hz with mooring reflections and resonances degrading performance above this band. Localization using three spatial processing methods is analyzed for high and low Signal to Noise Ratio (SNR) sources. Directional accuracy is approximately 3 degrees up to 350 Hz and 10 degrees above 350 Hz.
Noise sources from air guns, ships, and mammals are spatially processed and the results show that the vector sensor is capable of discriminating the location of two high SNR sources in the environment that are sufficiently separated in either location, time, or frequency.
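The abstract above does not name its three spatial processing methods. As a generic illustration of how co-located pressure and particle-velocity measurements can yield a bearing, the sketch below uses the time-averaged acoustic intensity, a standard textbook approach that is assumed here and not attributed to the authors. For a single plane wave the intensity vector points along the propagation direction, so the source lies in the opposite direction. The function name `bearing_from_vector_sensor` is hypothetical.

```python
import math


def bearing_from_vector_sensor(p, vx, vy):
    """Estimate source azimuth in degrees [0, 360) from pressure and horizontal particle velocity.

    p, vx, vy: equal-length sample sequences of pressure and the x and y
    particle-velocity components. Assumes one dominant plane-wave source.
    """
    n = len(p)
    ix = sum(a * b for a, b in zip(p, vx)) / n  # time-averaged intensity, x component
    iy = sum(a * b for a, b in zip(p, vy)) / n  # time-averaged intensity, y component
    # Intensity points away from the source, so negate it to get the bearing to the source.
    return math.degrees(math.atan2(-iy, -ix)) % 360.0
```

Because only the ratio of the two intensity components enters the angle, the overall scaling between pressure and velocity (the acoustic impedance) does not affect the estimate in this idealized single-source case.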
Low-frequency bottom backscattering data analysis using multiple constraints beamforming
Submitted in partial fulfillment of the requirements for the degree of Ocean Engineer at the Massachusetts Institute of Technology and the Woods Hole Oceanographic Institution, May 1995.
The data analysis of a deep-sea bottom backscattering experiment, carried out over a
sediment pond on the western flank of the Mid-Atlantic Ridge in July 1993 with a
250-650 Hz chirp source and a vertical receiving array suspended near the flat seafloor,
is presented in this thesis. Reflected signals in the normal incidence direction as
the output of endfire beamforming are used to determine the sediment structure.
The sediment is found to be horizontally stratified, except for two irregular regions,
each about 20 m thick, located around 18 m and 60 m beneath the water-sediment
interface. Multiple constraints beamforming is shown to be effective in removing
coherent reflections from internal stratified layers, which is critical to the analysis of
bottom backscattering. With backscattered signals obtained by beamforming, the
above-mentioned two inhomogeneous regions are found to be the dominant factors on
the bottom backscattered field, both in the normal incidence and oblique directions.
The backscattering strength as a function of grazing angle is estimated for each of
the two regions.
A study into the design of steerable microphone arrays
Beamforming, being a multi-channel signal processing technique, can offer both spatial and temporal selective filtering. It has much more potential than single-channel signal processing in various commercial applications. This thesis presents a study on steerable robust broadband beamformers together with a number of their design formulations. The design formulations allow a simple steering mechanism and yet maintain a frequency-invariant property as well as achieve robustness against practical imperfections
- …