137 research outputs found

    Design exploration and performance strategies towards power-efficient FPGA-based architectures for sound source localization

    Many applications rely on MEMS microphone arrays for locating sound sources prior to their execution. These applications are not only executed under real-time constraints but are also often embedded in low-power devices. These environments become challenging when the number of microphones increases or when dynamic responses are required. Field-Programmable Gate Arrays (FPGAs) are usually chosen for their flexibility and computational power. This work intends to guide the design of reconfigurable acoustic beamforming architectures, which are not only able to accurately determine the sound Direction-Of-Arrival (DoA) but also capable of satisfying the most demanding applications in terms of power efficiency. Design considerations for the operations required to locate the sound source are discussed and analysed in order to facilitate the elaboration of reconfigurable acoustic beamforming architectures. Performance strategies are proposed and evaluated based on the characteristics of the presented architecture. This power-efficient architecture is compared to a different architecture prioritizing performance in order to reveal the unavoidable design trade-offs.
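
    As a rough illustration of the acoustic beamforming step that such architectures accelerate, the sketch below estimates the DoA with a frequency-domain delay-and-sum beamformer. It is not the FPGA design from the paper: the function name `das_doa`, the far-field plane-wave model, the 2-D microphone layout and the speed of sound `c=343.0` are all assumptions made for this example.

```python
import numpy as np

def das_doa(signals, mic_xy, fs, angles_deg, c=343.0):
    """Return the candidate angle (degrees) with the highest delay-and-sum power.

    signals: array of shape (n_mics, n_samples)
    mic_xy:  array of shape (n_mics, 2), microphone positions in metres
    """
    n = signals.shape[1]
    spectra = np.fft.rfft(signals, axis=1)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    powers = []
    for ang in np.deg2rad(angles_deg):
        # Unit vector pointing from the array origin towards the candidate source.
        direction = np.array([np.cos(ang), np.sin(ang)])
        # Far-field model: microphones closer to the source hear the wave earlier,
        # so their arrival time relative to the origin is negative.
        delays = -(mic_xy @ direction) / c
        # Undo each delay with a phase shift, then sum across microphones.
        steer = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
        beam = np.sum(spectra * steer, axis=0)
        powers.append(np.sum(np.abs(beam) ** 2))
    return angles_deg[int(np.argmax(powers))]
```

    The angular grid passed as `angles_deg` sets both the resolution and the cost, since the work grows with the number of microphones times the number of candidate angles. That scaling is one way to see why larger arrays become demanding under real-time and low-power constraints.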

    Towards End-to-End Acoustic Localization using Deep Learning: from Audio Signal to Source Position Coordinates

    This paper presents a novel approach for indoor acoustic source localization using microphone arrays, based on a Convolutional Neural Network (CNN). The proposed solution is, to the best of our knowledge, the first published work in which the CNN is designed to directly estimate the three-dimensional position of an acoustic source, using the raw audio signal as the input and avoiding the use of hand-crafted audio features. Given the limited amount of available localization data, we propose a two-step training strategy. We first train our network using semi-synthetic data generated from close-talk speech recordings, in which we simulate the time delays and distortion suffered by the signal as it propagates from the source to the microphone array. We then fine-tune this network using a small amount of real data. Our experimental results show that this strategy produces networks that significantly improve on existing localization methods based on SRP-PHAT strategies. In addition, our experiments show that our CNN method is more robust to variations in speaker gender and window size than the other methods. Comment: 18 pages, 3 figures, 8 tables

    Design, implementation and evaluation of an acoustic source localization system using Deep Learning techniques

    This Master's Thesis presents a novel approach for indoor acoustic source localization using microphone arrays, based on a Convolutional Neural Network (CNN) that we call the ASLNet. It directly estimates the three-dimensional position of a single acoustic source using as inputs the raw audio signals from a set of microphones. We use supervised learning methods to train our network end-to-end. The amount of labeled training data available for this problem is, however, small. This Thesis presents a two-step training strategy that mitigates this problem. We first train our network using semi-synthetic data generated from close-talk speech recordings and a mathematical model for signal propagation from the source to the microphones. The amount of semi-synthetic data can be virtually as large as needed. We then fine-tune the resulting network using a small amount of real data. Our experimental results, evaluated on a publicly available dataset recorded in a real room, show that this approach improves on existing localization methods based on SRP-PHAT strategies and also on those presented in very recent proposals based on Convolutional Recurrent Neural Networks (CRNN). In addition, our experiments show that the performance of the ASLNet does not depend significantly on the speaker's gender or on the size of the signal window being used. This work also investigates methods to improve the generalization properties of our network using only semi-synthetic data for training. This is a highly important objective due to the cost of labelling localization data. We proceed by including specific effects in the input signals to force the network to be insensitive to the multipath, high noise and distortion likely to be present in real scenarios. We obtain promising results with this strategy, although they still lag behind strategies based on fine-tuning. Máster Universitario en Ingeniería de Telecomunicación (M125)

    Frequency-Sliding Generalized Cross-Correlation: A Sub-band Time Delay Estimation Approach

    The generalized cross-correlation (GCC) is regarded as the most popular approach for estimating the time difference of arrival (TDOA) between the signals received at two sensors. Time delay estimates are obtained by maximizing the GCC output, where the direct-path delay is usually observed as a prominent peak. Moreover, GCCs also play an important role in steered response power (SRP) localization algorithms, where the SRP functional can be written as an accumulation of the GCCs computed from multiple sensor pairs. Unfortunately, the accuracy of TDOA estimates is affected by multiple factors, including noise, reverberation and signal bandwidth. In this paper, a sub-band approach for time delay estimation aimed at improving the performance of the conventional GCC is presented. The proposed method is based on the extraction of multiple GCCs corresponding to different frequency bands of the cross-power spectrum phase in a sliding-window fashion. The major contributions of this paper include: 1) a sub-band GCC representation of the cross-power spectrum phase that, despite having a reduced temporal resolution, provides a more suitable representation for estimating the true TDOA; 2) a demonstration that this matrix representation is rank one in the ideal noiseless case, a property that is exploited in more adverse scenarios to obtain a more robust and accurate GCC; 3) a set of low-rank approximation alternatives for processing the sub-band GCC matrix, leading to better TDOA estimates and source localization performance. An extensive set of experiments is presented to demonstrate the validity of the proposed approach. Comment: Article accepted in IEEE/ACM Transactions on Audio, Speech, and Language Processing
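
    For orientation, the conventional full-band GCC that the paper sets out to improve can be sketched as follows, here with the common PHAT weighting. This is only the baseline, not the frequency-sliding sub-band method itself. The function name `gcc_phat`, the `max_tau` parameter and the small regularization constant are assumptions made for this example.

```python
import numpy as np

def gcc_phat(x, y, fs, max_tau=None):
    """Estimate the delay of y relative to x, in seconds, with PHAT-weighted GCC.

    A positive result means y lags x.
    """
    n = len(x) + len(y)                  # zero-pad so the circular correlation does not wrap
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)               # cross-power spectrum
    cross /= np.abs(cross) + 1e-12       # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)        # generalized cross-correlation over lags
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that index max_shift corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs
```

    The TDOA estimate is the lag of the largest peak, which matches the "maximizing the GCC output" step described in the abstract. Restricting the search with `max_tau` to the largest physically possible delay for a given sensor spacing is a common way to reject spurious peaks.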

    Practical considerations for acoustic source localization in the IoT era: Platforms, energy efficiency, and performance

    The rapid development of the Internet of Things (IoT) has brought important changes in the way emerging acoustic signal processing applications are conceived. While traditional acoustic processing applications have been developed with high-throughput computing platforms equipped with expensive multichannel audio interfaces in mind, the IoT paradigm demands more flexible and energy-efficient systems. In this context, algorithms for source localization and ranging in wireless acoustic sensor networks can be considered an enabling technology for many IoT-based environments, including security, industrial, and health-care applications. This paper evaluates important aspects of the practical deployment of IoT systems for acoustic source localization. Recent systems-on-chip composed of low-power multicore processors, combined with a small graphics accelerator (or GPU), yield a notable increase in the computational capacity needed by intensive signal processing algorithms while partially retaining the appealing low power consumption of embedded systems. Different algorithms and implementations on several state-of-the-art platforms are discussed, analyzing important aspects such as the tradeoffs between performance, energy efficiency, and exploitation of parallelism, taking into account real-time constraints. This work was supported in part by the Post-Doctoral Fellowship from Generalitat Valenciana under Grant APOSTD/2016/069, in part by the Spanish Government under Grant TIN2014-53495-R, Grant TIN2015-65277-R, and Grant BIA2016-76957-C3-1-R, and in part by the Universidad Jaume I under Project UJI-B2016-20. Published.

    Acoustic Speaker Localization with Strong Reverberation and Adaptive Feature Filtering with a Bayes RFS Framework

    The thesis investigates the challenges of speaker localization in the presence of strong reverberation, multi-speaker tracking, and multi-feature multi-speaker state filtering, using sound recordings from microphones. Novel reverberation-robust speaker localization algorithms are derived from the signal and room acoustics models. A multi-speaker tracking filter and a multi-feature multi-speaker state filter are developed based upon the generalized labeled multi-Bernoulli random finite set framework. Experiments and comparative studies have verified and demonstrated the benefits of the proposed methods.

    A code-division, multiple beam sonar imaging system

    Submitted in partial fulfillment of the requirements for the degree of Master of Science at the Massachusetts Institute of Technology and the Woods Hole Oceanographic Institution, August 1989. In this thesis, a new active sonar imaging concept is explored using the principle of code-division and the simultaneous transmission of multiple coded signals. The signals are sixteen-symbol, four-bit, non-linear, block Frequency-Shift Keyed (FSK) codes, each of which is projected in a different direction. Upon reception of the reflected waveform, each signal is separately detected and the results are inverted to yield an estimate of the spatial location of an object in three dimensions. The code-division sonar is particularly effective in situations where the phase of the transmitted signal is perturbed by the propagation medium and the target. Most imaging techniques presently used rely on preservation of the phase of the received signal over the dimension of the receiving array. In the code-division sonar, spatial resolution is obtained by using the combined effects of code-to-code rejection and the a priori knowledge of the direction in which each code was transmitted. The coded signals are shown to be highly tolerant of phase distortion over the duration of the transmission. The result is a high-resolution, three-dimensional image, obtainable in a highly perturbative environment. Additionally, the code-division sonar is capable of a high frame rate due to the simplicity of the processing required. Two algorithms are presented which estimate the spatial coordinates of an object in the ensonified aperture of the system, and the performance of the two is compared for different signal-to-noise levels. Finally, the concept of code-division imaging is employed in a series of experiments in which a code-division sonar was used to image objects under a variety of conditions. The results of the experiments are presented, showing the resolution capabilities of the system.

    Analysis of Vector Sensor Data Collected in Gulf of Mexico

    In 2015, the Naval Oceanographic Office collected vector sensor data in approximately 100 meters of water southwest of Panama City, Florida, in the Gulf of Mexico. The vector sensor was deployed at a center-of-mass height of one foot above the seafloor and decoupled from its mooring through lightweight springs to measure local acoustic pressure and particle velocity. Accuracy of the data is measured by evaluating acoustic impedance as a function of frequency and source azimuthal direction. Results indicate the vector sensor has an effective band from 50 to 450 Hz, with mooring reflections and resonances degrading performance above this band. Localization using three spatial processing methods is analyzed for high and low Signal-to-Noise Ratio (SNR) sources. Directional accuracy is approximately 3 degrees up to 350 Hz and 10 degrees above 350 Hz. Noise sources from air guns, ships, and mammals are spatially processed, and the results show that the vector sensor is capable of discriminating the location of two high-SNR sources in the environment that are sufficiently separated in location, time, or frequency.
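
    The abstract does not name its three spatial processing methods. As a generic illustration of how co-located pressure and particle-velocity measurements yield a bearing, the sketch below uses the time-averaged acoustic intensity vector, which may or may not be one of the methods in the thesis. The function name `intensity_azimuth`, the two-axis horizontal sensor and the single plane-wave source are assumptions made for this example.

```python
import numpy as np

def intensity_azimuth(p, vx, vy):
    """Estimate the source azimuth (degrees, 0 to 360) from a vector sensor.

    p:      acoustic pressure samples
    vx, vy: particle-velocity samples along the sensor's x and y axes
    """
    # Time-averaged active intensity components.
    ix = np.mean(p * vx)
    iy = np.mean(p * vy)
    # For a plane wave the intensity vector points along the propagation
    # direction, which is away from the source.
    propagation = np.degrees(np.arctan2(iy, ix))
    # The source bearing is therefore the opposite direction.
    return (propagation + 180.0) % 360.0
```

    Because this estimator averages over the whole record, two sources that overlap in time and frequency are blended into a single bearing. Applying it per time window or per frequency band is one way to separate sources that differ in time or frequency.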

    Low-frequency bottom backscattering data analysis using multiple constraints beamforming

    Submitted in partial fulfillment of the requirements for the degree of Ocean Engineer at the Massachusetts Institute of Technology and the Woods Hole Oceanographic Institution, May 1995. This thesis presents the data analysis of a deep-sea bottom backscattering experiment, carried out over a sediment pond on the western flank of the Mid-Atlantic Ridge in July 1993 with a 250-650 Hz chirp source and a vertical receiving array suspended near the flat seafloor. Reflected signals in the normal incidence direction, obtained as the output of endfire beamforming, are used to determine the sediment structure. The sediment is found to be horizontally stratified, except for two irregular regions, each about 20 m thick, located around 18 m and 60 m beneath the water-sediment interface. Multiple constraints beamforming is shown to be effective in removing coherent reflections from internal stratified layers, which is critical to the analysis of bottom backscattering. With backscattered signals obtained by beamforming, the two inhomogeneous regions mentioned above are found to be the dominant contributors to the bottom backscattered field, both in the normal incidence and oblique directions. The backscattering strength as a function of grazing angle is estimated for each of the two regions.

    A study into the design of steerable microphone arrays

    Beamforming, being a multi-channel signal processing technique, can offer both spatial and temporal selective filtering. It has much more potential than single-channel signal processing in various commercial applications. This thesis presents a study on steerable robust broadband beamformers together with a number of their design formulations. The design formulations allow a simple steering mechanism while maintaining a frequency-invariant property and achieving robustness against practical imperfections.