Spotting adversarial samples for speaker verification by neural vocoders
Automatic speaker verification (ASV), one of the most important technologies
for biometric identification, has been widely adopted in security-critical
applications, including transaction authentication and access control. However,
previous work has shown that ASV is seriously vulnerable to recently emerged
adversarial attacks, yet effective countermeasures against them are limited. In
this paper, we adopt neural vocoders to spot adversarial samples for ASV. We
use the neural vocoder to re-synthesize audio and find that the difference
between the ASV scores for the original and re-synthesized audio is a good
indicator for discrimination between genuine and adversarial samples. This
effort is, to the best of our knowledge, among the first to pursue such a
technical direction for detecting adversarial samples for ASV, and hence there
is a lack of established baselines for comparison. Consequently, we implement
the Griffin-Lim algorithm as the detection baseline. The proposed approach
achieves effective detection performance that outperforms all the baselines in
all the settings. We also show that the neural vocoder adopted in the detection
framework is dataset-independent. Our code will be made open source to support
comparison in future work. Comment: Submitted to ASRU 202
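The detection rule described above reduces to thresholding the absolute difference between the ASV scores of the original and re-synthesized audio. A minimal sketch, assuming cosine-similarity scoring over speaker embeddings; the embeddings, the `detect_adversarial` helper, and the threshold value are illustrative stand-ins, and a real system would obtain the re-synthesized embedding by passing the audio through a neural vocoder:

```python
import numpy as np

def asv_score(emb_enroll, emb_test):
    # Cosine similarity stands in for a real ASV scoring backend (assumption).
    return float(np.dot(emb_enroll, emb_test) /
                 (np.linalg.norm(emb_enroll) * np.linalg.norm(emb_test)))

def detect_adversarial(emb_enroll, emb_test, emb_resynth, threshold=0.1):
    # Flag a trial as adversarial when re-synthesis shifts the ASV score
    # by more than `threshold` (the value 0.1 is illustrative).
    delta = abs(asv_score(emb_enroll, emb_test) -
                asv_score(emb_enroll, emb_resynth))
    return delta > threshold

# Toy embeddings: genuine audio barely changes after re-synthesis, while
# an adversarial sample loses its perturbation and shifts noticeably.
enroll = np.array([1.0, 0.0, 0.0])
genuine, genuine_resynth = np.array([0.9, 0.1, 0.0]), np.array([0.88, 0.12, 0.0])
adv, adv_resynth = np.array([0.9, 0.1, 0.0]), np.array([0.3, 0.8, 0.1])

print(detect_adversarial(enroll, genuine, genuine_resynth))  # False
print(detect_adversarial(enroll, adv, adv_resynth))          # True
```

The same scaffold accommodates the Griffin-Lim baseline: only the source of the re-synthesized embedding changes.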
Diffusion-Based Adversarial Purification for Speaker Verification
Automatic speaker verification (ASV) systems based on deep learning are easily
compromised by adversarial attacks, a new type of attack that injects
imperceptible perturbations into audio signals so as to make ASV produce
wrong decisions. This poses a significant threat to the security and
reliability of ASV systems. To address this issue, we propose a Diffusion-Based
Adversarial Purification (DAP) method that enhances the robustness of ASV
systems against such adversarial attacks. Our method leverages a conditional
denoising diffusion probabilistic model to effectively purify the adversarial
examples and mitigate the impact of perturbations. DAP first introduces
controlled noise into adversarial examples, and then performs a reverse
denoising process to reconstruct clean audio. Experimental results demonstrate
the efficacy of the proposed DAP in enhancing the security of ASV and meanwhile
minimizing the distortion of the purified audio signals.
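The two stages described above (controlled noising, then reverse denoising) rest on standard DDPM arithmetic. A toy illustration of the forward equation and its inversion, assuming a linear noise schedule and an oracle noise estimate in place of the trained conditional denoiser the method uses; the schedule values and signal are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear noise schedule (illustrative values, not from the DAP paper).
T = 50
betas = np.linspace(1e-4, 0.05, T)
alpha_bars = np.cumprod(1.0 - betas)

def forward_diffuse(x0, t, eps):
    # q(x_t | x_0): inject controlled Gaussian noise at step t.
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def reconstruct_x0(x_t, t, eps_pred):
    # Invert the forward equation given a noise estimate eps_pred.
    return (x_t - np.sqrt(1.0 - alpha_bars[t]) * eps_pred) / np.sqrt(alpha_bars[t])

# Toy "audio": a clean tone carrying an adversarial perturbation.
clean = np.sin(np.linspace(0, 4 * np.pi, 256))
adversarial = clean + 0.05 * rng.standard_normal(256)

t = 20
eps = rng.standard_normal(256)
x_t = forward_diffuse(adversarial, t, eps)

# With a perfect noise estimate the reconstruction is exact; a trained
# conditional denoiser estimates eps from x_t and so recovers a signal
# close to the clean audio rather than the perturbed input.
x0_hat = reconstruct_x0(x_t, t, eps)
print(np.allclose(x0_hat, adversarial))  # True
```

The purification effect comes entirely from the denoiser's bias toward clean speech: the injected noise drowns the small adversarial perturbation, and the reverse process does not put it back.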
LMD: A Learnable Mask Network to Detect Adversarial Examples for Speaker Verification
The security of automatic speaker verification (ASV) is seriously threatened
by recently emerged adversarial attacks, and although some countermeasures
have been proposed to alleviate the threat, many defense approaches not
only require the prior knowledge of the attackers but also possess weak
interpretability. To address this issue, in this paper, we propose an
attacker-independent and interpretable method, named learnable mask detector
(LMD), to separate adversarial examples from the genuine ones. It utilizes
score variation as an indicator to detect adversarial examples, where the score
variation is the absolute discrepancy between the ASV scores of an original
audio recording and its transformed audio synthesized from its masked complex
spectrogram. A core component of the detector is a neural network that
generates the masked spectrogram. The neural network needs only
genuine examples for training, which makes it an attacker-independent approach.
Its interpretability lies in the fact that the neural network is trained to
minimize the score variation of the targeted ASV while maximizing the number
of masked spectrogram bins of the genuine training examples. It is founded on
the observation that masking out the vast majority of spectrogram bins, which
carry little speaker information, inevitably introduces a large score
variation for adversarial examples but only a small one for genuine examples.
Experimental results with 12 attackers and two representative
ASV systems show that our proposed method outperforms five state-of-the-art
baselines. The extensive experimental results can also serve as a benchmark
for detection-based ASV defenses. Comment: 13 pages, 9 figures
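The score-variation statistic described above can be sketched end to end. A minimal illustration, assuming a hand-crafted top-magnitude mask in place of the learnable mask network and cosine similarity with an enrollment signal in place of a real ASV backend; all names and signals here are hypothetical stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_and_reconstruct(x, keep_fraction=0.1):
    # Zero all but the highest-magnitude DFT bins, then invert.
    # LMD learns the mask with a neural network trained on genuine audio
    # only; a top-magnitude heuristic stands in here (assumption).
    spec = np.fft.rfft(x)
    k = max(1, int(keep_fraction * spec.size))
    keep = np.argsort(np.abs(spec))[-k:]
    masked = np.zeros_like(spec)
    masked[keep] = spec[keep]
    return np.fft.irfft(masked, n=x.size)

def score_variation(asv_score, x):
    # Absolute discrepancy between the ASV scores of the original audio
    # and its reconstruction from the masked spectrogram.
    return abs(asv_score(x) - asv_score(mask_and_reconstruct(x)))

# Toy ASV score: cosine similarity with an "enrollment" tone.
n = np.arange(256)
enroll = np.sin(2 * np.pi * 8 * n / 256)
asv = lambda x: float(np.dot(x, enroll) /
                      (np.linalg.norm(x) * np.linalg.norm(enroll)))

genuine = enroll.copy()
adversarial = enroll + 0.3 * rng.standard_normal(256)  # broadband perturbation

# Masking barely moves the score of the genuine tone, but strips the
# broadband perturbation and shifts the adversarial score noticeably.
print(score_variation(asv, genuine) < score_variation(asv, adversarial))  # True
```

Thresholding the score variation then separates adversarial from genuine trials, mirroring the detector the abstract describes.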