
    Sound Source Localization in a Multipath Environment Using Convolutional Neural Networks

    The propagation of sound in a shallow water environment is characterized by boundary reflections from the sea surface and sea floor. These reflections result in multiple (indirect) sound propagation paths, which can degrade the performance of passive sound source localization methods. This paper proposes the use of convolutional neural networks (CNNs) for the localization of sources of broadband acoustic radiated noise (such as motor vessels) in shallow water multipath environments. It is shown that CNNs operating on cepstrogram and generalized cross-correlogram inputs are able to more reliably estimate the instantaneous range and bearing of transiting motor vessels when the source localization performance of conventional passive ranging methods is degraded. The ensuing improvement in source localization performance is demonstrated using real data collected during an at-sea experiment.

    Comment: 5 pages, 5 figures. Final draft of paper submitted to the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 15-20 April 2018, Calgary, Alberta, Canada. arXiv admin note: text overlap with arXiv:1612.0350
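The two input representations named above, the cepstrogram and the generalized cross-correlogram, can be sketched per signal frame. A minimal numpy version follows; function names, FFT sizes, and the toy delayed-noise check are illustrative choices of mine, not taken from the paper:

```python
import numpy as np

def gcc_phat(x, y, n_fft=1024):
    """Generalized cross-correlation with PHAT weighting between two sensors.
    The peak location gives the inter-sensor time-difference of arrival."""
    X = np.fft.rfft(x, n=n_fft)
    Y = np.fft.rfft(y, n=n_fft)
    cross = np.conj(X) * Y
    cross /= np.abs(cross) + 1e-12        # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n_fft)
    return np.fft.fftshift(cc)            # lag 0 at the centre of the array

def real_cepstrum(x, n_fft=1024):
    """Real cepstrum of one frame; a multipath echo at delay d samples
    appears as a peak at quefrency d."""
    log_mag = np.log(np.abs(np.fft.rfft(x, n=n_fft)) + 1e-12)
    return np.fft.irfft(log_mag, n=n_fft)

# toy check: a 5-sample inter-sensor delay shows up as a GCC-PHAT peak at lag 5
rng = np.random.default_rng(0)
s = rng.standard_normal(1024)
cc = gcc_phat(s, np.roll(s, 5), n_fft=1024)
lag = int(np.argmax(cc)) - len(cc) // 2
```

Stacking these two quantities over successive frames yields the cross-correlogram and cepstrogram images that the paper feeds to the CNN.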

    Towards End-to-End Acoustic Localization using Deep Learning: from Audio Signal to Source Position Coordinates

    This paper presents a novel approach for indoor acoustic source localization using microphone arrays, based on a Convolutional Neural Network (CNN). The proposed solution is, to the best of our knowledge, the first published work in which a CNN is designed to directly estimate the three-dimensional position of an acoustic source from the raw audio signal, avoiding hand-crafted audio features. Given the limited amount of available localization data, we propose a two-step training strategy. We first train the network on semi-synthetic data generated from close-talk speech recordings, in which we simulate the time delays and distortion suffered by the signal as it propagates from the source to the microphone array. We then fine-tune this network with a small amount of real data. Our experimental results show that this strategy produces networks that significantly outperform existing localization methods based on SRP-PHAT strategies. In addition, our experiments show that the CNN method is more robust than the other methods to varying speaker gender and to different window sizes.

    Comment: 18 pages, 3 figures, 8 tables
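The semi-synthetic generation step described above amounts to applying a per-microphone propagation delay (and, in the paper, distortion) to a close-talk recording. A minimal free-field sketch with numpy follows; the function names, the FFT-domain fractional delay, and the 1/r attenuation are my own simplifications, and no reverberation is modelled:

```python
import numpy as np

def fractional_delay(signal, delay_samples):
    """Delay a signal by a (possibly fractional) number of samples via an
    FFT-domain linear phase shift. The shift is circular; zero-pad first
    if wrap-around at the edges matters."""
    n = len(signal)
    freqs = np.fft.rfftfreq(n)                     # cycles per sample
    spectrum = np.fft.rfft(signal)
    spectrum = spectrum * np.exp(-2j * np.pi * freqs * delay_samples)
    return np.fft.irfft(spectrum, n=n)

def simulate_array(source, mic_positions, src_position, fs=16000, c=343.0):
    """Semi-synthetic multichannel data: per-microphone propagation delay
    plus 1/r attenuation applied to a single close-talk recording."""
    src = np.asarray(src_position, dtype=float)
    channels = []
    for mic in mic_positions:
        dist = np.linalg.norm(np.asarray(mic, dtype=float) - src)
        delay = dist / c * fs                      # delay in samples
        channels.append(fractional_delay(source, delay) / max(dist, 1e-3))
    return np.stack(channels)
```

Pairs of such simulated channels, labelled with the known `src_position`, stand in for scarce real localization data during the first training step.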

    Acoustic localization of people in reverberant environments using deep learning techniques

    Localizing people from acoustic information is increasingly important in real-world applications such as security, surveillance, and human-robot interaction. In many cases it is necessary to accurately localize people or objects from the sound they generate, especially in noisy and reverberant environments where traditional localization methods may fail, or in scenarios where video-based methods are not feasible because such sensors are unavailable or because of significant occlusions. For example, in security and surveillance, the ability to accurately localize a sound source can help identify potential threats or intruders. In healthcare settings, acoustic localization can be used to monitor the movements and activities of patients, especially those with mobility problems. In human-robot interaction, robots equipped with acoustic localization capabilities can better perceive and respond to their environment, enabling more natural and intuitive interactions with humans. The development of accurate and robust acoustic localization systems using advanced techniques such as deep learning is therefore of great practical importance. This doctoral thesis addresses the problem along three main lines of research: (i) the design of an end-to-end system based on neural networks capable of improving the localization rates of existing state-of-the-art systems; (ii) the design of a system capable of localizing one or several simultaneous speakers in environments with different characteristics and different sensor-array geometries, without re-training; (iii) the design of systems capable of refining the acoustic power maps used to localize the acoustic sources, in order to achieve better subsequent localization. To evaluate these objectives, several realistic databases with different characteristics have been used, in which the people involved in the scenes can act without any restriction. All the proposed systems have been evaluated under the same conditions, outperforming the current state-of-the-art systems in terms of localization error.
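The "acoustic power maps" mentioned in line (iii) are typically steered-response power (SRP) maps: the microphone signals are phase-aligned toward each candidate position on a grid and the power of the coherent sum is stored. A minimal delay-and-sum sketch with numpy follows; the function name, unweighted (non-PHAT) alignment, and grid handling are illustrative assumptions, not the thesis's exact front-end:

```python
import numpy as np

def srp_map(frames, mic_positions, grid, fs=16000, c=343.0):
    """Steered-response power map. For each candidate grid point, undo the
    expected propagation delay at every microphone in the frequency domain,
    sum the aligned spectra coherently, and store the resulting power.
    The true source position shows up as the maximum of the map."""
    n = frames.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)         # Hz
    spectra = np.fft.rfft(frames, axis=1)
    power = np.zeros(len(grid))
    for i, point in enumerate(grid):
        dists = np.linalg.norm(mic_positions - point, axis=1)
        delays = dists / c                         # seconds
        aligned = spectra * np.exp(2j * np.pi * freqs * delays[:, None])
        power[i] = np.sum(np.abs(aligned.sum(axis=0)) ** 2)
    return power
```

A map-refinement stage, as in line (iii), would take such a `power` grid as input and sharpen it before the final peak search.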

    Deep Learning for Audio Signal Processing

    Given the recent surge in developments of deep learning, this article provides a review of state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side by side in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross-fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, and more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, i.e., audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, and generative models for speech, sound, and music synthesis). Finally, key issues and open questions regarding deep learning applied to audio signal processing are identified.

    Comment: 15 pages, 2 PDF figures
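The log-mel spectrogram named above as a dominant feature representation can be built from scratch in a few steps: windowed STFT magnitudes, a triangular filterbank spaced on the mel scale, and a log compression. The following numpy sketch uses common but arbitrary defaults (512-point FFT, 40 mel bands); production code would normally use a library such as librosa instead:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(x, fs=16000, n_fft=512, hop=256, n_mels=40):
    """Log-mel spectrogram: windowed STFT magnitudes projected through a
    triangular mel filterbank, then log-compressed."""
    # frame, window, and take STFT magnitudes
    n_frames = 1 + (len(x) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))      # (frames, n_fft//2 + 1)
    # triangular filterbank with edges equally spaced on the mel scale
    mel_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2),
                                    n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_pts / fs).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    return np.log(mag @ fbank.T + 1e-10)           # (frames, n_mels)
```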

    Array signal processing for source localization and enhancement

    “A common approach to the wide-band microphone array problem is to assume a certain array geometry and then design optimal weights (often in subbands) to meet a set of desired criteria. In addition to the weights, we consider the geometry of the microphone arrangement to be part of the optimization problem. Our approach is to use particle swarm optimization (PSO) to search for the optimal geometry, while an optimal weight design produces the weights for each particle's geometry. The resulting directivity indices (DIs) and white-noise SNR gains (WNGs) form the basis of the PSO's fitness function. Another important consideration in the optimal weight design is a set of regularization parameters; by including these parameters in the particles, we optimize their values as well during the operation of the PSO. The proposed method gives the user great flexibility in specifying desired DIs and WNGs over frequency by virtue of the PSO fitness function. Although the above method addresses beam and null steering for fixed locations, real-time scenarios require estimating the source positions so that the beam can be steered adaptively. We therefore also investigate source localization of sound and RF sources using machine learning techniques. For RF source localization, we consider radio-frequency identification (RFID) antenna tags. Using a planar RFID antenna array with beam-steering capability, and using the received signal strength indicator (RSSI) value captured for each beam position, the position of each RFID antenna tag is estimated. The proposed approach is also shown to perform well under various challenging scenarios”--Abstract, page iv
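The PSO loop the abstract describes, particles encoding candidate geometries (and regularization parameters), scored by a fitness built from DIs and WNGs, can be sketched generically. In the toy below, the real DI/WNG fitness is replaced by a stand-in (maximize the minimum spacing of four microphones on a unit line), and all hyperparameter values are conventional defaults of mine, not the thesis's:

```python
import numpy as np

def pso(fitness, dim, n_particles=30, iters=200, bounds=(0.0, 1.0), seed=0):
    """Minimal particle swarm optimizer (maximization). Each particle holds
    one candidate geometry; velocities mix inertia, attraction to each
    particle's personal best, and attraction to the global best."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pos = rng.uniform(lo, hi, (n_particles, dim))
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()
    pbest_val = np.array([fitness(p) for p in pos])
    gbest = pbest[np.argmax(pbest_val)].copy()
    w, c1, c2 = 0.7, 1.5, 1.5              # inertia, cognitive, social weights
    for _ in range(iters):
        r1 = rng.uniform(size=pos.shape)
        r2 = rng.uniform(size=pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([fitness(p) for p in pos])
        better = vals > pbest_val
        pbest[better], pbest_val[better] = pos[better], vals[better]
        gbest = pbest[np.argmax(pbest_val)].copy()
    return gbest, pbest_val.max()

# stand-in fitness: spread 4 mics to maximize the minimum pairwise spacing
def min_spacing(geometry):
    return np.min(np.diff(np.sort(geometry)))

best, val = pso(min_spacing, dim=4)
```

Replacing `min_spacing` with a function that runs the optimal weight design for the particle's geometry and combines the resulting DIs and WNGs recovers the structure of the thesis's method.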

    Human localization and activity classification by machine learning on Wi-Fi channel state information

    Devices communicating via Wi-Fi adjust subcarrier correction coefficients in real time. The stream of correction coefficients for all subcarriers is called channel state information (CSI). CSI can be used for human body sensing, in particular to infer current activity and location. The thesis aims to create a robust, environment-agnostic activity classifier: a neural network (NN) trained to recognize and classify human actions in one location should not dramatically lose prediction capability when transferred to another. This purpose has been achieved in three steps. First, for a neural network to abstract from a particular environment, diverse data have to be collected; therefore, dedicated laboratory equipment that automates the physical movement and rotation of a Wi-Fi access point (AP) has been developed and constructed. Second, after the training data were collected, the environment-specific information was removed by classic signal processing algorithms. Finally, dedicated adjustments to the neural network architecture were implemented. Altogether, the goal of environment-agnostic classification for the target "sit down", "stand up", "lie down", and "unlie" activities is achieved, although classification accuracy depends on the similarity between the train and test human subjects. The work argues that the activity classification and localization tasks have orthogonal goals and focus on different aspects of the CSI information. In particular, the activity classification NN is interested in features of the ongoing physical movement and should work independently of the person's location; conversely, the localization NN should ignore the subject's activities and infer only position. These two tasks are therefore separated into standalone NN architectures. Since environments such as apartments can differ greatly from each other, it is assumed that training a universal NN localizer is not possible.
Since the localization NN needs to be re-trained for each particular environment, its virtue is an inexpensive training cycle. To achieve this goal, the localization NN is substantially reduced in size, from 58.8 to 0.4 million parameters.
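The "classic signal processing" cleanup step mentioned above typically works on CSI amplitudes, stripping static per-subcarrier bias and smoothing hardware noise. The following numpy sketch is one plausible such pipeline; the data shape (packets x subcarriers), the standardization, and the moving-average filter are my assumptions, not the thesis's exact algorithms:

```python
import numpy as np

def clean_csi(csi, kernel=7):
    """Cleanup of a complex CSI stream of shape (packets, subcarriers).
    Steps: take amplitudes, standardize each subcarrier over time to remove
    the static environment-specific bias, then apply a moving-average filter
    along time to suppress high-frequency hardware noise."""
    amp = np.abs(csi)
    amp = (amp - amp.mean(axis=0)) / (amp.std(axis=0) + 1e-9)
    pad = kernel // 2
    padded = np.pad(amp, ((pad, pad), (0, 0)), mode="edge")
    window = np.ones(kernel) / kernel
    return np.stack([np.convolve(padded[:, s], window, mode="valid")
                     for s in range(amp.shape[1])], axis=1)
```

The cleaned stream, rather than raw CSI, would then be fed to the activity and localization networks.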