On the difference-to-sum power ratio of speech and wind noise based on the Corcos model
The difference-to-sum power ratio was proposed and used to suppress wind
noise under specific acoustic conditions. In this contribution, a general
formulation of the difference-to-sum power ratio associated with a mixture of
speech and wind noise is proposed and analyzed. In particular, it is assumed
that the complex coherence of convective turbulence can be modelled by the
Corcos model. In contrast to the work in which the power ratio was first
presented, the employed Corcos model holds for every possible air stream
direction and takes into account the lateral coherence decay rate. The obtained
expression is subsequently validated with real data for a dual microphone
set-up. Finally, the difference-to-sum power ratio is exploited as a spatial
feature to indicate the frame-wise presence of wind noise, obtaining improved
detection performance when compared to an existing multi-channel wind noise
detection approach.
Comment: 5 pages, 3 figures, IEEE-ICSEE Eilat-Israel conference (special session)
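To make the quantities in this abstract concrete, here is a minimal sketch of a Corcos-type coherence for a microphone pair and the resulting difference-to-sum power ratio (DSR). It assumes that speech and wind noise are mutually uncorrelated, that each has the same power spectral density at both microphones, and that speech arrives as a free-field plane wave. It also assumes a coherence of the form exp(-(a_long|cos θ| + a_lat|sin θ|)ωd/U_c)·exp(jωd cos θ/U_c) with U_c = 0.8U. The function names, the parameter values (a_long = 0.125, a_lat = 0.7, U = 5 m/s) and the sign convention are illustrative assumptions. This is not the expression derived in the paper.

```python
import numpy as np


def corcos_coherence(f, d, theta, u=5.0, a_long=0.125, a_lat=0.7, conv_ratio=0.8):
    """Corcos-type complex coherence of wind noise at two microphones d metres apart.

    theta is the angle (rad) between the air stream and the microphone axis.
    The default wind speed u, decay constants and convective ratio are
    illustrative assumptions, not values taken from the paper.
    """
    w = 2 * np.pi * np.asarray(f, dtype=float)
    uc = conv_ratio * u  # assumed convective speed of the turbulence
    # Coherence decay: longitudinal part along the stream, lateral part across it.
    decay = (a_long * abs(np.cos(theta)) + a_lat * abs(np.sin(theta))) * w * d / uc
    # Phase term: the turbulence is convected past the pair at speed uc.
    return np.exp(-decay) * np.exp(1j * w * d * np.cos(theta) / uc)


def speech_coherence(f, d, phi, c=343.0):
    """Free-field plane-wave coherence for a source at angle phi (rad) to the mic axis."""
    w = 2 * np.pi * np.asarray(f, dtype=float)
    return np.exp(-1j * w * d * np.cos(phi) / c)


def mixture_dsr(psd_s, psd_w, gamma_s, gamma_w):
    """Difference-to-sum power ratio of uncorrelated speech plus wind noise.

    Assumes each component has the same PSD at both microphones, so that
    E|X1 - X2|^2 = 2 * psd * (1 - Re{gamma}) and
    E|X1 + X2|^2 = 2 * psd * (1 + Re{gamma}).
    """
    diff = psd_s * (1 - np.real(gamma_s)) + psd_w * (1 - np.real(gamma_w))
    summ = psd_s * (1 + np.real(gamma_s)) + psd_w * (1 + np.real(gamma_w))
    return diff / summ


if __name__ == "__main__":
    f = np.array([100.0, 1000.0, 4000.0])  # Hz
    d = 0.02                               # assumed 2 cm microphone spacing
    print("speech only:", mixture_dsr(1.0, 0.0, speech_coherence(f, d, np.pi / 3), 0.0))
    print("wind only:  ", mixture_dsr(0.0, 1.0, 0.0, corcos_coherence(f, d, 0.0)))
    print("mixture:    ", mixture_dsr(1.0, 1.0, speech_coherence(f, d, np.pi / 3),
                                      corcos_coherence(f, d, 0.0)))
```

For speech alone, the ratio reduces to tan²(ωd cos φ/(2c)), which stays small at low frequencies for closely spaced microphones. At high frequencies, the Corcos coherence decays and the wind-only ratio approaches 1. For speech alone, the ratio is undefined where ωd cos φ/c is an odd multiple of π.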
Broadband DOA estimation using Convolutional neural networks trained with noise signals
A convolutional neural network (CNN) based classification method for broadband
DOA estimation is proposed, where the phase component of the short-time Fourier
transform coefficients of the received microphone signals is directly fed into
the CNN and the features required for DOA estimation are learnt during
training. Since only the phase component of the input is used, the CNN can be
trained with synthesized noise signals, thereby making the preparation of the
training data set easier compared to using speech signals. Through experimental
evaluation, the ability of the proposed noise trained CNN framework to
generalize to speech sources is demonstrated. In addition, the robustness of
the system to noise and to small perturbations in microphone positions, and
its ability to adapt to different acoustic conditions, are investigated using
experiments with simulated and real data.
Comment: Published in Proceedings of IEEE Workshop on Applications of Signal
Processing to Audio and Acoustics (WASPAA) 201
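The phase-only input described above can be sketched with NumPy. This minimal sketch is not the authors' code. It synthesizes a white-noise training example for a hypothetical uniform linear array (free-field plane wave, with delays applied in the frequency domain). It then extracts the per-channel STFT phase that would be fed to the CNN. The array geometry, STFT sizes and function names are assumptions. The CNN itself and the DOA class grid are omitted.

```python
import numpy as np


def simulate_noise_ula(doa_deg, n_mics=4, spacing=0.08, fs=16000, length=16000,
                       c=343.0, seed=0):
    """White-noise plane wave arriving at a uniform linear array (hypothetical set-up).

    Delays are applied as linear phase shifts in the DFT domain, which gives a
    circular rather than a linear delay; this is acceptable for a sketch.
    """
    rng = np.random.default_rng(seed)
    source = rng.standard_normal(length)
    spectrum = np.fft.rfft(source)
    freqs = np.fft.rfftfreq(length, d=1.0 / fs)
    # Propagation delay of each microphone relative to microphone 0.
    tau = np.arange(n_mics) * spacing * np.cos(np.deg2rad(doa_deg)) / c
    shifted = spectrum[None, :] * np.exp(-2j * np.pi * freqs[None, :] * tau[:, None])
    return np.fft.irfft(shifted, n=length, axis=1)


def stft_phase(x, n_fft=256, hop=128):
    """Phase of the STFT of every channel: (M, T) -> (M, n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (x.shape[1] - n_fft) // hop
    frames = np.stack(
        [x[:, i * hop:i * hop + n_fft] * window for i in range(n_frames)], axis=-1
    )  # shape (M, n_fft, n_frames)
    # Only the phase is kept; the magnitude is discarded.
    return np.angle(np.fft.rfft(frames, axis=1))


if __name__ == "__main__":
    x = simulate_noise_ula(doa_deg=60.0)
    phase = stft_phase(x)
    # In this sketch each frame phase[:, :, n] (M x K) is one training input,
    # labelled with the DOA class that contains 60 degrees.
    print(phase.shape)
```

Because the magnitude is dropped, the noise source only has to produce the right inter-microphone phase relations. This is why, following the abstract, noise can replace speech when preparing training data.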
Simulating Multi-channel Wind Noise Based on the Corcos Model
A novel multi-channel artificial wind noise generator based on a fluid
dynamics model, namely the Corcos model, is proposed. In particular, the model
is used to approximate the complex coherence function of wind noise signals
measured with closely spaced microphones in the free field, assuming a
time-invariant wind stream direction and speed. Preliminary experiments focus
on a spatial analysis of recorded wind noise signals and the validation of the
Corcos model for diverse measurement set-ups. Subsequently, the Corcos model is
used to synthetically generate wind noise signals exhibiting the desired
complex coherence. The multi-channel generator is designed by extending an
existing single-channel generator to create N mutually uncorrelated signals,
while the predefined complex coherence function is obtained by exploiting an
algorithm developed to generate multi-channel non-stationary noise signals
under a complex coherence constraint. Temporal, spectral and spatial
characteristics of the synthetic signals match those observed in measured wind
noise. Artificial generation avoids the time-consuming task of collecting pure
wind noise samples for noise reduction evaluations and provides flexibility in
the number of generated signals used in the simulations.
Comment: 5 pages, 2 figures, IWAENC 201
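Imposing a target complex coherence on mutually uncorrelated signals can be sketched as a per-frequency mixing step. This simplified stand-in is in the spirit of coherence-constrained generation, not the referenced algorithm itself. The target coherence matrix Γ(f) is factorized as A(f)A(f)^H using an eigendecomposition, and each frequency bin of the uncorrelated signals is multiplied by A(f). White Gaussian noise replaces the single-channel wind noise generator. The Corcos parameters are illustrative assumptions, and all function names are hypothetical.

```python
import numpy as np


def corcos_coherence(f, d, theta, u=5.0, a_long=0.125, a_lat=0.7, conv_ratio=0.8):
    """Corcos-type coherence for a microphone pair (illustrative parameter values)."""
    w = 2 * np.pi * np.asarray(f, dtype=float)
    uc = conv_ratio * u
    decay = (a_long * abs(np.cos(theta)) + a_lat * abs(np.sin(theta))) * w * d / uc
    return np.exp(-decay) * np.exp(1j * w * d * np.cos(theta) / uc)


def mixing_matrices(gamma):
    """Per-bin matrices A with A @ A^H = gamma, from a clipped eigendecomposition.

    gamma: (F, M, M) Hermitian coherence matrices.
    """
    vals, vecs = np.linalg.eigh(gamma)          # batched over the first axis
    vals = np.clip(vals, 0.0, None)             # guard against tiny negative eigenvalues
    return vecs * np.sqrt(vals)[:, None, :]     # scale each eigenvector column


def impose_coherence(noise, gamma):
    """Mix M mutually uncorrelated, equal-power signals (M, T) so that their
    coherence follows gamma, given at the rfft bins of length T."""
    spec = np.fft.rfft(noise, axis=1)                  # (M, F)
    a = mixing_matrices(gamma)                         # (F, M, M)
    mixed = np.einsum("fij,jf->if", a, spec)           # X(f) = A(f) N(f)
    return np.fft.irfft(mixed, n=noise.shape[1], axis=1)


if __name__ == "__main__":
    fs, length, d, theta = 16000, 2 ** 16, 0.02, np.pi / 4
    rng = np.random.default_rng(1)
    noise = rng.standard_normal((2, length))  # stand-in for uncorrelated wind noise
    freqs = np.fft.rfftfreq(length, d=1.0 / fs)
    g = corcos_coherence(freqs, d, theta)
    gamma = np.empty((freqs.size, 2, 2), dtype=complex)
    gamma[:, 0, 0] = 1.0
    gamma[:, 1, 1] = 1.0
    gamma[:, 0, 1] = g
    gamma[:, 1, 0] = np.conj(g)
    x = impose_coherence(noise, gamma)
    print(x.shape)
```

The eigendecomposition is used instead of a Cholesky factorization because it still works when Γ(f) is only positive semidefinite. This happens at DC, where the Corcos coherence equals 1.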
Multi-scale aggregation of phase information for reducing computational cost of CNN based DOA estimation
In a recent work on direction-of-arrival (DOA) estimation of multiple
speakers with convolutional neural networks (CNNs), the phase component of
short-time Fourier transform (STFT) coefficients of the microphone signal is
given as input and small filters are used to learn the phase relations between
neighboring microphones. Due to this chosen filter size, M-1 convolution
layers are required to achieve the best performance for a microphone array with
M microphones. For arrays with a large number of microphones, this requirement
leads to a high computational cost, making the method practically infeasible. In
this work, we propose systematic dilations of the convolution filters in each
convolution layer of the previously proposed CNN, which expand the receptive
field of the filters and thereby reduce the computational cost of the method.
Different strategies for expanding the receptive field of the filters for a
specific microphone array are explored. Experimental analysis of these
strategies shows that an aggressive expansion strategy considerably reduces
the computational cost, while a more gradual expansion of the receptive field
yields the best DOA estimation performance together with a reduced
computational cost.
Comment: arXiv admin note: text overlap with arXiv:1807.1172
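A little arithmetic shows why dilation reduces the layer count. As a simplification, assume 'valid' convolutions whose kernels span two neighboring microphones along the microphone axis. A layer with dilation r then shortens that axis by r. Undilated layers therefore need M-1 layers to collapse M microphones, whereas dilations summing to M-1 achieve the same in fewer layers. The doubling schedule below is a hypothetical example of an aggressive strategy, not one of the schedules evaluated in the paper. The cost proxy ignores the frequency dimension and the channel counts.

```python
def mic_axis_lengths(n_mics, dilations):
    """Length of the microphone axis after each 'valid' convolution whose
    kernel spans 2 microphones, with the given dilation per layer."""
    lengths = [n_mics]
    for r in dilations:
        nxt = lengths[-1] - r  # a size-2 kernel with dilation r covers 1 + r mics
        if nxt < 1:
            raise ValueError("dilation exceeds the remaining microphone axis")
        lengths.append(nxt)
    return lengths


def undilated_schedule(n_mics):
    """Dilation 1 in every layer: one layer per neighboring-microphone step."""
    return [1] * (n_mics - 1)


def doubling_schedule(n_mics):
    """Hypothetical aggressive schedule: dilations 1, 2, 4, ..., with the last
    one clipped so that the receptive field spans exactly n_mics microphones."""
    dilations, field, r = [], 1, 1
    while field < n_mics:
        step = min(r, n_mics - field)
        dilations.append(step)
        field += step
        r *= 2
    return dilations


def filter_positions(lengths):
    """Crude cost proxy: number of filter applications along the microphone
    axis, assuming all other dimensions are identical across layers."""
    return sum(lengths[1:])


if __name__ == "__main__":
    for m in (4, 8, 16):
        for name, sched in (("undilated", undilated_schedule(m)),
                            ("doubling", doubling_schedule(m))):
            lens = mic_axis_lengths(m, sched)
            print(m, name, len(sched), "layers", lens, "cost proxy", filter_positions(lens))
```

For M = 8, the sketch needs 7 layers without dilation and 3 layers (dilations 1, 2, 4) with the doubling schedule. You can pass a more gradual schedule, such as the equally hypothetical [1, 1, 2, 3], to mic_axis_lengths to compare intermediate cases.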
Modal Decomposition of Feedback Delay Networks
Feedback delay networks (FDNs) belong to a general class of recursive filters
which are widely used in sound synthesis and physical modeling applications. We
present a numerical technique to compute the modal decomposition of the FDN
transfer function. The proposed pole finding algorithm is based on the
Ehrlich-Aberth iteration for matrix polynomials and improves computational
performance by up to three orders of magnitude compared to a scalar polynomial
root finder. We demonstrate how explicit knowledge of the FDN's modal behavior
facilitates analysis and improvements for artificial reverberation. The
statistical distributions of mode frequencies and residue magnitudes show
that relatively few modes contribute a large portion of the impulse response
energy.
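A small sketch can illustrate what a modal decomposition of an FDN yields. It does not use the Ehrlich-Aberth matrix-polynomial iteration from the abstract but a different, brute-force technique. The FDN is rewritten in state-space form, with one shift register per delay line. The eigenvalues of the state matrix then give the poles, and the eigenvectors give the residues, so that h[n] = Σ_k r_k p_k^(n-1) for n ≥ 1. The cost of this route grows with the cube of the total delay length, so it is only practical for toy networks. The feedback matrix, gains and delays in the example are arbitrary assumptions, and the state matrix is assumed to be diagonalizable.

```python
from collections import deque

import numpy as np


def fdn_state_space(A, b, c, delays):
    """State-space form of an FDN with one shift register per delay line.

    State index offsets[i] + k holds q_i(n - 1 - k) for k = 0..m_i - 1.
    """
    delays = list(delays)
    offsets = np.concatenate(([0], np.cumsum(delays)[:-1]))
    size = int(sum(delays))
    As, Bs, Cs = np.zeros((size, size)), np.zeros(size), np.zeros(size)
    for i, m in enumerate(delays):
        o = offsets[i]
        Bs[o] = b[i]
        Cs[o + m - 1] = c[i]                      # delay-line output s_i(n)
        for k in range(1, m):
            As[o + k, o + k - 1] = 1.0            # shift within the line
        for j, mj in enumerate(delays):
            As[o, offsets[j] + mj - 1] = A[i, j]  # feedback: q_i = sum_j A_ij s_j
    return As, Bs, Cs


def modal_decomposition(A, b, c, delays):
    """Poles p_k and residues r_k with h[n] = sum_k r_k * p_k**(n - 1) for n >= 1."""
    As, Bs, Cs = fdn_state_space(A, b, c, delays)
    poles, V = np.linalg.eig(As)
    residues = (Cs @ V) * np.linalg.solve(V, Bs)
    return poles, residues


def fdn_impulse_response(A, b, c, d, delays, n):
    """Direct time-domain simulation, used to check the modal sum."""
    lines = [deque([0.0] * m, maxlen=m) for m in delays]
    h = np.zeros(n)
    for t in range(n):
        x = 1.0 if t == 0 else 0.0
        s = np.array([line[-1] for line in lines])  # oldest sample = line output
        h[t] = c @ s + d * x
        q = A @ s + b * x
        for line, qi in zip(lines, q):
            line.appendleft(qi)                     # maxlen drops the oldest sample
    return h


if __name__ == "__main__":
    n_lines = 3
    A = 0.9 * (np.eye(n_lines) - 2.0 / n_lines * np.ones((n_lines, n_lines)))
    b, c, d = np.ones(n_lines), np.ones(n_lines), 0.0
    delays = [3, 5, 7]
    poles, residues = modal_decomposition(A, b, c, delays)
    n = np.arange(1, 40)
    h_modal = np.real((residues * poles ** (n[:, None] - 1)).sum(axis=1))
    h_direct = fdn_impulse_response(A, b, c, d, delays, 40)
    print(len(poles), np.max(np.abs(h_direct[1:] - h_modal)))
```

The number of poles equals the total delay length. Sorting np.abs(residues) then gives a rough view of which modes dominate the response.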