Broadband DOA estimation using Convolutional neural networks trained with noise signals
A convolutional neural network (CNN) based classification method for broadband
DOA estimation is proposed, where the phase component of the short-time Fourier
transform coefficients of the received microphone signals are directly fed into
the CNN and the features required for DOA estimation are learnt during
training. Since only the phase component of the input is used, the CNN can be
trained with synthesized noise signals, thereby making the preparation of the
training data set easier compared to using speech signals. Through experimental
evaluation, the ability of the proposed noise trained CNN framework to
generalize to speech sources is demonstrated. In addition, the robustness of
the system to noise, small perturbations in microphone positions, as well as
its ability to adapt to different acoustic conditions is investigated using
experiments with simulated and real data.
Comment: Published in Proceedings of IEEE Workshop on Applications of Signal
Processing to Audio and Acoustics (WASPAA) 201
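The input representation described in this abstract can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `stft_phase_map`, the frame length, the hop size, the Hann window, and the four-microphone setup are all assumptions chosen for the example. It only shows how a per-frame phase map (microphones by frequency bins) could be built from synthesized white noise.

```python
import numpy as np

def stft_phase_map(signals, frame_len=256, hop=128):
    """Return STFT phase per frame, shape (frames, channels, bins).

    signals: array of shape (channels, samples).
    frame_len and hop are illustrative values, not taken from the paper.
    """
    signals = np.asarray(signals, dtype=float)
    n_ch, n_samples = signals.shape
    window = np.hanning(frame_len)
    n_frames = 1 + (n_samples - frame_len) // hop
    phase = np.empty((n_frames, n_ch, frame_len // 2 + 1))
    for t in range(n_frames):
        # Window one frame of every channel and keep only the phase.
        frame = signals[:, t * hop : t * hop + frame_len] * window
        phase[t] = np.angle(np.fft.rfft(frame, axis=1))
    return phase

# Synthesized white noise for 4 hypothetical microphones.
rng = np.random.default_rng(0)
noise = rng.standard_normal((4, 4096))
phase_maps = stft_phase_map(noise)
print(phase_maps.shape)
```

Each `phase_maps[t]` is one candidate network input. Because the magnitude is discarded, the spectral shape of the training signal matters less than the inter-microphone phase relations, which is the property the abstract relies on when training with noise.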
Sound Source Localization in a Multipath Environment Using Convolutional Neural Networks
The propagation of sound in a shallow water environment is characterized by
boundary reflections from the sea surface and sea floor. These reflections
result in multiple (indirect) sound propagation paths, which can degrade the
performance of passive sound source localization methods. This paper proposes
the use of convolutional neural networks (CNNs) for the localization of sources
of broadband acoustic radiated noise (such as motor vessels) in shallow water
multipath environments. It is shown that CNNs operating on cepstrogram and
generalized cross-correlogram inputs are able to more reliably estimate the
instantaneous range and bearing of transiting motor vessels when the source
localization performance of conventional passive ranging methods is degraded.
The ensuing improvement in source localization performance is demonstrated
using real data collected during an at-sea experiment.
Comment: 5 pages, 5 figures, Final draft of paper submitted to 2018 IEEE
International Conference on Acoustics, Speech and Signal Processing (ICASSP)
15-20 April 2018 in Calgary, Alberta, Canada. arXiv admin note: text overlap
with arXiv:1612.0350
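A generalized cross-correlogram stacks generalized cross-correlation (GCC) vectors over time. The sketch below computes one such vector for a single pair of sensors. The abstract does not state which GCC weighting is used, so the PHAT weighting here is an assumption, as are the function name `gcc_phat`, the signal length, and the 5-sample test delay.

```python
import numpy as np

def gcc_phat(x, y, max_lag):
    """PHAT-weighted cross-correlation of x and y for lags in [-max_lag, max_lag].

    max_lag must be at least 1. A positive peak lag means x is delayed
    relative to y by that many samples.
    """
    n = len(x) + len(y)                      # zero-pad to avoid circular wrap
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12           # PHAT: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    lags = np.arange(-max_lag, max_lag + 1)
    return lags, cc

# Hypothetical test: y is white noise, x is y delayed by 5 samples.
rng = np.random.default_rng(0)
y = rng.standard_normal(1024)
x = np.concatenate((np.zeros(5), y[:-5]))
lags, cc = gcc_phat(x, y, max_lag=20)
delay = lags[np.argmax(cc)]
print(delay)
```

In a multipath channel the vector `cc` shows additional peaks from surface and bottom reflections. Picking only the largest peak discards that structure, whereas a network operating on the full correlogram can in principle use it.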
Exploiting CNNs for Improving Acoustic Source Localization in Noisy and Reverberant Conditions
This paper discusses the application of convolutional neural networks (CNNs) to minimum variance distortionless response (MVDR) localization schemes. We investigate direction-of-arrival estimation in noisy and reverberant conditions using a uniform linear array (ULA). CNNs are used to process the multichannel data from the ULA and to improve the data fusion scheme that is performed in the steered response power (SRP) computation. CNNs improve the incoherent frequency fusion of the narrowband response power by weighting the components, reducing the deleterious effects of components affected by artifacts due to noise and reverberation. Using CNNs avoids the need to first encode the multichannel data into selected acoustic cues and exploits their ability to recognize geometrical pattern similarity. Experiments with both simulated and real acoustic data demonstrate the superior localization performance of the proposed SRP beamformer with respect to other state-of-the-art techniques.
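The weighted incoherent frequency fusion mentioned above can be illustrated with a small sketch. It is not the paper's method: the weights are fixed placeholder numbers standing in for the CNN output, the covariance matrices are idealized (one far-field source plus white noise), and the array geometry, frequencies, and function names are assumptions made for the example.

```python
import numpy as np

def steering_vector(freq, angle_rad, n_mics, spacing, c=343.0):
    """Far-field steering vector of a uniform linear array."""
    delays = np.arange(n_mics) * spacing * np.cos(angle_rad) / c
    return np.exp(-2j * np.pi * freq * delays)

def mvdr_power(R, a):
    """Narrowband MVDR response power 1 / (a^H R^-1 a)."""
    return 1.0 / np.real(np.conj(a) @ np.linalg.solve(R, a))

def fused_response(powers, weights):
    """Weighted incoherent fusion over frequency.

    powers: (n_freqs, n_angles), weights: (n_freqs,).
    Each narrowband map is normalized to a peak of 1 before weighting.
    """
    powers = powers / powers.max(axis=1, keepdims=True)
    weights = weights / weights.sum()
    return weights @ powers

n_mics, spacing = 8, 0.04                    # hypothetical ULA
freqs = np.array([1000.0, 2000.0, 3000.0])   # hypothetical frequency bins
angles = np.deg2rad(np.arange(0, 181))       # 1-degree search grid
src_angle = angles[60]                       # simulated source at 60 degrees

powers = np.empty((len(freqs), len(angles)))
for i, f in enumerate(freqs):
    a_src = steering_vector(f, src_angle, n_mics, spacing)
    # Idealized covariance: unit-power source plus weak white noise.
    R = np.outer(a_src, np.conj(a_src)) + 0.01 * np.eye(n_mics)
    for j, th in enumerate(angles):
        powers[i, j] = mvdr_power(R, steering_vector(f, th, n_mics, spacing))

weights = np.array([0.2, 1.0, 0.6])          # placeholder for CNN-predicted weights
fused = fused_response(powers, weights)
print(np.argmax(fused))
```

With clean covariances every frequency bin already peaks at the source, so the weights change little here. The benefit described in the abstract arises when some bins are corrupted by noise or reverberation and receive small weights.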
Multi-scale aggregation of phase information for reducing computational cost of CNN based DOA estimation
In a recent work on direction-of-arrival (DOA) estimation of multiple
speakers with convolutional neural networks (CNNs), the phase component of
short-time Fourier transform (STFT) coefficients of the microphone signal is
given as input and small filters are used to learn the phase relations between
neighboring microphones. Due to this chosen filter size, M-1 convolution
layers are required to achieve the best performance for a microphone array with
M microphones. For arrays with a large number of microphones, this requirement
leads to a high computational cost making the method practically infeasible. In
this work, we propose to use systematic dilations of the convolution filters in
each of the convolution layers of the previously proposed CNN for expansion of
the receptive field of the filters to reduce the computational cost of the
method. Different strategies for expansion of the receptive field of the
filters for a specific microphone array are explored. With experimental
analysis of the different strategies, it is shown that an aggressive expansion
strategy results in a considerable reduction in computational cost while a
relatively gradual expansion of the receptive field exhibits the best DOA
estimation performance along with reduction in the computational cost.
Comment: arXiv admin note: text overlap with arXiv:1807.1172
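The effect of dilation on network depth can be checked with simple receptive-field arithmetic. The sketch below assumes two-tap filters along the microphone axis and stride 1, and it uses a doubling dilation schedule as one example of an aggressive expansion; neither the filter size nor the schedule is stated in the abstract, and the helper `layers_to_cover` is hypothetical.

```python
def layers_to_cover(n_mics, next_dilation, kernel_size=2):
    """Count stride-1 conv layers needed for the receptive field to span n_mics.

    next_dilation(i) returns the dilation of layer i (0-based).
    Each layer widens the receptive field by (kernel_size - 1) * dilation.
    """
    rf, layers = 1, 0
    while rf < n_mics:
        rf += (kernel_size - 1) * next_dilation(layers)
        layers += 1
    return layers

M = 16  # hypothetical array size
plain = layers_to_cover(M, lambda i: 1)          # no dilation
doubling = layers_to_cover(M, lambda i: 2 ** i)  # dilations 1, 2, 4, 8, ...
print(plain, doubling)
```

Under these assumptions the undilated network needs one layer per additional microphone, while the doubling schedule needs a number of layers that grows only logarithmically with the array size. The abstract reports that a more gradual schedule than the most aggressive one gave the best estimation accuracy, so depth alone is not the only consideration.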