Blind MultiChannel Identification and Equalization for Dereverberation and Noise Reduction based on Convolutive Transfer Function
This paper addresses the problems of blind channel identification and
multichannel equalization for speech dereverberation and noise reduction. The
time-domain cross-relation method is not suitable for blind room impulse
response identification because long impulse responses have near-common zeros.
We extend the cross-relation method to the short-time Fourier transform (STFT)
domain, in which the time-domain impulse responses are approximately
represented by convolutive transfer functions (CTFs) with far fewer
coefficients. The CTFs, however, suffer from common zeros caused by the
oversampled STFT. We propose to identify the CTFs using an STFT with
oversampled signals and critically sampled CTFs, which is a good compromise
between frequency aliasing of the signals and the common-zeros problem of the
CTFs. In addition, we propose a normalization of the CTFs that removes the gain
ambiguity across sub-bands. In the STFT domain, the identified CTFs are used for
multichannel equalization, which exploits the sparsity of speech signals. We
propose to perform inverse filtering by minimizing the ℓ1-norm of the source
signal, subject to a constraint on the relaxed ℓ2-norm fitting error between
the microphone signals and the convolution of the estimated source signal with
the CTFs. This method is advantageous in that noise can be reduced by relaxing
the ℓ2-norm to a tolerance corresponding to the noise power, and this tolerance
can be set automatically. The experiments confirm the effectiveness of the
proposed method even under conditions with high reverberation levels and
intense noise.
Comment: 13 pages, 5 figures, 5 tables
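The cross-relation idea can be sketched in a single STFT sub-band. This minimal Python/NumPy version assumes the sub-band microphone sequences are already available and ignores the STFT design choices discussed above (oversampled signals, critically sampled CTFs). The helper names and the synthetic data are illustrative assumptions. So is fixing the sub-band gain by setting the first CTF coefficient of channel 1 to one: this is not necessarily the normalization the paper proposes.

```python
import numpy as np


def signal_conv_matrix(x, L):
    """(len(x) + L - 1) x L matrix C with C @ h == np.convolve(x, h) for len(h) == L."""
    N = len(x)
    C = np.zeros((N + L - 1, L), dtype=complex)
    for l in range(L):
        C[l:l + N, l] = x
    return C


def cross_relation_ctfs(x1, x2, L):
    """Blindly estimate two length-L CTFs in one sub-band.

    If x1 = h1 * s and x2 = h2 * s (convolution across STFT frames), then
    x1 * h2 - x2 * h1 = 0, i.e. [C(x1), -C(x2)] @ [h2; h1] = 0, so the
    stacked CTFs span the null space of that matrix.
    """
    A = np.hstack([signal_conv_matrix(x1, L), -signal_conv_matrix(x2, L)])
    _, _, vh = np.linalg.svd(A)
    v = vh[-1].conj()  # right singular vector of the smallest singular value
    h2, h1 = v[:L], v[L:]
    # Assumed normalization (not necessarily the paper's): fix the unknown
    # complex gain of this sub-band by forcing h1[0] == 1.
    return h1 / h1[0], h2 / h1[0]


# Usage on synthetic, noise-free sub-band data (all values are made up).
rng = np.random.default_rng(0)
L, P = 4, 200  # hypothetical CTF length and number of STFT frames
s = rng.standard_normal(P) + 1j * rng.standard_normal(P)
h1_true = np.concatenate(
    [[1.0], 0.5 * (rng.standard_normal(L - 1) + 1j * rng.standard_normal(L - 1))]
)
h2_true = rng.standard_normal(L) + 1j * rng.standard_normal(L)
x1, x2 = np.convolve(s, h1_true), np.convolve(s, h2_true)
h1_hat, h2_hat = cross_relation_ctfs(x1, x2, L)
print(np.max(np.abs(h1_hat - h1_true)), np.max(np.abs(h2_hat - h2_true)))
```

With noise-free data and channels that share no zeros, the null space is one-dimensional, so the estimate matches the true CTFs up to the fixed gain. Near-common zeros make the smallest singular values cluster, which is the failure mode the abstract attributes to long time-domain responses.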
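The inverse filter described above is a constrained problem: it minimizes the ℓ1-norm of the source subject to an ℓ2 bound on the fitting error. The sketch below swaps in the closely related penalized (Lagrangian) form, 0.5 Σᵢ ‖xᵢ − hᵢ ∗ s‖² + λ‖s‖₁, solved with plain ISTA (iterative soft thresholding). It is not the authors' solver. Here λ loosely plays the role that the noise-dependent tolerance plays in the paper. The function names, CTF values, noise level, and sparse test source are hypothetical.

```python
import numpy as np


def ctf_conv_matrix(h, N):
    """(N + len(h) - 1) x N matrix C with C @ s == np.convolve(h, s) for len(s) == N."""
    L = len(h)
    C = np.zeros((N + L - 1, N), dtype=complex)
    for n in range(N):
        C[n:n + L, n] = h
    return C


def soft_threshold(z, t):
    """Complex soft thresholding: shrink magnitudes by t, keep phases."""
    mag = np.abs(z)
    return np.where(mag > t, (1.0 - t / np.maximum(mag, 1e-300)) * z, 0.0)


def l1_inverse_filter(xs, hs, N, lam, n_iter=3000):
    """ISTA for min_s 0.5 * sum_i ||x_i - h_i * s||^2 + lam * ||s||_1."""
    H = np.vstack([ctf_conv_matrix(h, N) for h in hs])  # stack all channels
    x = np.concatenate(xs)
    step = 1.0 / np.linalg.norm(H, 2) ** 2  # 1 / Lipschitz constant of the gradient
    s = np.zeros(N, dtype=complex)
    for _ in range(n_iter):
        grad = H.conj().T @ (H @ s - x)  # gradient of the quadratic fitting term
        s = soft_threshold(s - step * grad, lam * step)
    return s


# Usage with two hypothetical sub-band CTFs and a sparse source.
h1 = np.array([1.0, 0.5, 0.2, 0.1], dtype=complex)
h2 = np.array([0.8, -0.3j, 0.25, 0.05])
N = 60
s_true = np.zeros(N, dtype=complex)
s_true[[5, 17, 30, 42, 55]] = [1.5, -1j, 0.8 + 0.6j, -2.0, 1.0]
rng = np.random.default_rng(1)
xs = [
    np.convolve(h, s_true)
    + 1e-3 * (rng.standard_normal(N + 3) + 1j * rng.standard_normal(N + 3))
    for h in (h1, h2)
]
s_hat = l1_inverse_filter(xs, [h1, h2], N, lam=1e-3)
print(np.linalg.norm(s_hat - s_true) / np.linalg.norm(s_true))
```

A larger λ suppresses more noise at the cost of shrinking the true source coefficients. The constrained form in the abstract avoids tuning λ directly, because the tolerance can be tied to the noise power.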
Robust Audio Adversarial Example for a Physical Attack
We propose a method to generate audio adversarial examples that can attack a
state-of-the-art speech recognition model in the physical world. Previous work
assumes that generated adversarial examples are fed directly to the recognition
model, and therefore cannot mount such a physical attack, because playback
environments introduce reverberation and noise. In contrast, our method
obtains robust adversarial examples by simulating the transformations caused by
playback or recording in the physical world and incorporating these
transformations into the generation process. An evaluation and a listening
experiment demonstrated that our adversarial examples can attack the model
without being noticed by humans. This result suggests that audio adversarial
examples generated by the proposed method may become a real threat.
Comment: Accepted to IJCAI 201
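The following sketch illustrates one way to fold simulated playback into the optimization. At every step, the gradient is averaged over freshly sampled reverberation-plus-noise transformations, in the spirit of expectation over transformation. Everything here is an assumption for illustration: the synthetic impulse responses, the noise level, the ℓ∞ budget `eps`, and the tanh-based `toy_score` that stands in for a speech recognizer. This is not the authors' attack on a real recognition model.

```python
import numpy as np


def random_rir(rng, length=32, decay=0.8, tail_gain=0.3):
    """Hypothetical synthetic room impulse response: unit direct path plus
    an exponentially decaying random tail (a crude stand-in for real RIRs)."""
    h = np.zeros(length)
    h[0] = 1.0
    h[1:] = tail_gain * rng.standard_normal(length - 1) * decay ** np.arange(1, length)
    return h


def simulate_playback(audio, rng, noise_std=0.01):
    """Simulated playback/recording: convolve with a random RIR, truncate to
    the input length, add white Gaussian noise. Returns (signal, rir)."""
    h = random_rir(rng)
    y = np.convolve(audio, h)[: len(audio)] + noise_std * rng.standard_normal(len(audio))
    return y, h


def toy_score(y, w):
    """Hypothetical stand-in for a recognizer's target-transcription score."""
    return float(np.sum(w * np.tanh(y)))


def eot_perturbation(x, w, eps=0.05, alpha=0.005, steps=100, batch=8, seed=0):
    """Projected sign-gradient ascent on the toy score averaged over
    randomly simulated playback transformations."""
    rng = np.random.default_rng(seed)
    n = len(x)
    delta = np.zeros(n)
    for _ in range(steps):
        grad = np.zeros(n)
        for _ in range(batch):
            y, h = simulate_playback(x + delta, rng)
            u = w * (1.0 - np.tanh(y) ** 2)  # d score / d y
            # Backpropagate through the truncated convolution y = (a * h)[:n].
            grad += np.convolve(u[::-1], h)[:n][::-1]
        delta = np.clip(delta + alpha * np.sign(grad), -eps, eps)  # stay in the L-inf ball
    return delta


# Usage on placeholder data (random "audio" and scorer weights).
rng = np.random.default_rng(42)
x = 0.1 * rng.standard_normal(400)
w = rng.standard_normal(400)
delta = eot_perturbation(x, w)
eval_rng = np.random.default_rng(7)
before = np.mean([toy_score(simulate_playback(x, eval_rng)[0], w) for _ in range(50)])
eval_rng = np.random.default_rng(7)
after = np.mean([toy_score(simulate_playback(x + delta, eval_rng)[0], w) for _ in range(50)])
print(before, after)
```

Resetting `eval_rng` to the same seed reuses identical simulated rooms and noise for `before` and `after`, so the comparison isolates the effect of `delta`. Averaging over many transformations is what distinguishes this from a perturbation tuned to a single, direct-feed input.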
- …