36 research outputs found
Voice Spoofing Countermeasures: Taxonomy, State-of-the-art, experimental analysis of generalizability, open challenges, and the way forward
Malicious actors may use various voice-spoofing attacks to fool automatic speaker
verification (ASV) systems and even exploit them to spread misinformation.
Numerous countermeasures have been proposed to detect these spoofing attacks.
Given the extensive work on spoofing detection in ASV systems over the last 6-7
years, there is a need to classify the research and to compare state-of-the-art
countermeasures both qualitatively and quantitatively.
Additionally, no existing survey paper has reviewed integrated solutions to
voice spoofing evaluation and speaker verification, adversarial/anti-forensics
attacks on spoofing countermeasures, and ASV itself, or unified solutions to
detect multiple attacks using a single model. Further, no work has been done to
provide an apples-to-apples comparison of published countermeasures in order to
assess their generalizability by evaluating them across corpora. In this work,
we conduct a review of the literature on spoofing detection using hand-crafted
features, deep learning, end-to-end, and universal spoofing countermeasure
solutions to detect speech synthesis (SS), voice conversion (VC), and replay
attacks. We also review integrated solutions to voice spoofing
evaluation and speaker verification, adversarial and anti-forensics attacks on
voice countermeasures, and ASV. The limitations and challenges of the existing
spoofing countermeasures are also presented. We report the performance of these
countermeasures on several datasets and evaluate them across corpora. For the
experiments, we employ the ASVspoof 2019 and VSDC datasets along with GMM, SVM,
CNN, and CNN-GRU classifiers. (For reproducibility of the results, the code of
the test bed can be found in our GitHub repository.)
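The cross-corpus comparisons described above are typically reported in terms of the equal error rate (EER). As a minimal sketch of that metric, the following computes the EER from bona fide and spoofed scores; the score distributions are synthetic, for illustration only:

```python
import numpy as np

def compute_eer(bona_scores, spoof_scores):
    """Equal error rate: the operating point where the false-acceptance
    rate (spoof scored as bona fide) equals the false-rejection rate.
    Convention: higher score = more bona fide."""
    thresholds = np.sort(np.concatenate([bona_scores, spoof_scores]))
    far = np.array([(spoof_scores >= t).mean() for t in thresholds])
    frr = np.array([(bona_scores < t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0

# Synthetic, well-separated score distributions for illustration only.
rng = np.random.default_rng(0)
bona = rng.normal(2.0, 1.0, size=1000)
spoof = rng.normal(-2.0, 1.0, size=1000)
eer = compute_eer(bona, spoof)   # roughly P(N(2,1) < 0), about 2%
```

A cross-corpus evaluation then amounts to fitting the classifier on one corpus and computing this EER on the scores it produces for another.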
GANBA: Generative Adversarial Network for Biometric Anti-Spoofing
Acknowledgments: Alejandro Gomez-Alanis holds an FPU fellowship (FPU16/05490) from the
Spanish Ministry of Education and Vocational Training. Jose A. Gonzalez-Lopez also holds a Juan
de la Cierva-Incorporación fellowship (IJCI-2017-32926) from the Spanish Ministry of Science and
Innovation. Furthermore, we acknowledge the support of Nvidia with the donation of a Titan X GPU.

Data Availability Statement: The ASVspoof 2019 datasets were used in this study. They are publicly
available at https://datashare.ed.ac.uk/handle/10283/3336 (accessed on 5 December 2021).

Automatic speaker verification (ASV) is a voice biometric technology whose security
might be compromised by spoofing attacks. To increase the robustness against spoofing attacks,
presentation attack detection (PAD) or anti-spoofing systems for detecting replay, text-to-speech and
voice conversion-based spoofing attacks are being developed. However, it was recently shown that
adversarial spoofing attacks may seriously fool anti-spoofing systems. Moreover, the robustness of the
whole biometric system (ASV + PAD) against this new type of attack is completely unexplored. In
this work, a new generative adversarial network for biometric anti-spoofing (GANBA) is proposed.
GANBA has a twofold basis: (1) it jointly employs the anti-spoofing and ASV losses to yield very
damaging adversarial spoofing attacks, and (2) it trains the PAD models as
discriminators in order to make them more robust against these types of
adversarial attacks. The proposed system is able to generate
adversarial spoofing attacks which can fool the complete voice biometric system. Then, the resulting
PAD discriminators of the proposed GANBA can be used as a defense technique for detecting both
original and adversarial spoofing attacks. The physical access (PA) and logical access (LA) scenarios of
the ASVspoof 2019 database were employed to carry out the experiments. The experimental results
show that the GANBA attacks are quite effective, outperforming other adversarial techniques when
applied in white-box and black-box attack setups. In addition, the resulting PAD discriminators are
more robust against both original and adversarial spoofing attacks.

Funding: FEDER/Junta de Andalucía-Consejería de Transformación Económica, Industria, Conocimiento
y Universidades, Proyecto PY20_00902; PID2019-104206GB-I00 funded by MCIN/AEI/10.13039/50110001103
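GANBA itself trains a generative adversarial network; as a much-simplified illustration of the joint PAD+ASV loss idea, the sketch below runs sign-gradient (FGSM-style) steps against two toy linear scorers. All models, dimensions, and step sizes here are invented for illustration, not taken from the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy linear scorers standing in for the PAD and ASV subsystems:
# positive score = "accept" (bona fide / target speaker).
rng = np.random.default_rng(0)
dim = 64
w_pad = rng.normal(size=dim)
w_asv = rng.normal(size=dim)

def joint_loss(x):
    # The attacker minimises the sum of both rejection losses,
    # i.e. pushes BOTH subsystems towards acceptance.
    return -np.log(sigmoid(w_pad @ x)) - np.log(sigmoid(w_asv @ x))

def fgsm_step(x, eps=0.1):
    # Analytic gradient of the joint logistic loss w.r.t. the input.
    grad = -(1 - sigmoid(w_pad @ x)) * w_pad - (1 - sigmoid(w_asv @ x)) * w_asv
    return x - eps * np.sign(grad)  # sign-gradient descent on the loss

# Stand-in spoofed feature vector, constructed so that both
# subsystems initially reject it (negative scores).
x = -(w_pad + w_asv) / 16.0
x_adv = x
for _ in range(10):
    x_adv = fgsm_step(x_adv)
```

In GANBA the perturbation generator is a network trained against the PAD discriminator rather than a per-sample gradient step, but the twofold objective, lowering both the anti-spoofing and the ASV rejection losses at once, is the same.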
Malafide: a novel adversarial convolutive noise attack against deepfake and spoofing detection systems
We present Malafide, a universal adversarial attack against automatic speaker
verification (ASV) spoofing countermeasures (CMs). By introducing convolutional
noise using an optimised linear time-invariant filter, Malafide attacks can be
used to compromise CM reliability while preserving other speech attributes such
as quality and the speaker's voice. In contrast to other adversarial attacks
proposed recently, Malafide filters are optimised independently of the input
utterance and duration, are tuned instead to the underlying spoofing attack,
and require the optimisation of only a small number of filter coefficients.
Even so, they degrade CM performance estimates by an order of magnitude, even
in black-box settings, and can also be configured to overcome integrated CM and
ASV subsystems. Integrated solutions that use self-supervised learning CMs,
however, are more robust under both black-box and white-box settings.
Comment: Accepted at INTERSPEECH 202
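Malafide optimises its filter taps against the countermeasure; the sketch below only illustrates the convolutive-noise mechanism itself, using a fixed near-identity FIR filter with a small number of coefficients (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# A short FIR filter: a small number of coefficients, independent of the
# input utterance and its duration. Here the taps are a fixed
# near-identity perturbation; Malafide instead optimises them
# against the countermeasure.
n_taps = 17
filt = 0.01 * rng.normal(size=n_taps)   # small convolutive noise
filt[n_taps // 2] += 1.0                # delta at the centre = identity part

# Stand-in utterance: one second of a 220 Hz tone at 16 kHz.
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
speech = np.sin(2.0 * np.pi * 220.0 * t)

# mode="same" keeps the utterance length, so the same filter applies to
# inputs of any duration.
filtered = np.convolve(speech, filt, mode="same")
distortion = np.max(np.abs(filtered - speech))  # small: quality is preserved
```

Because the attack is a linear time-invariant convolution, the perturbation is structured rather than per-sample additive noise, which is why only the handful of tap values needs optimising.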
Training strategy for a lightweight countermeasure model for automatic speaker verification
The countermeasure (CM) model is developed to protect Automatic Speaker
Verification (ASV) systems from spoof attacks and prevent resulting personal
information leakage. Based on practicality and security considerations, the CM
model is usually deployed on edge devices, which have more limited computing
resources and storage space than cloud-based systems. This work proposes
training strategies for a lightweight CM model for ASV, using generalized
end-to-end (GE2E) pre-training and adversarial fine-tuning to improve
performance, and applying knowledge distillation (KD) to reduce the size of the
CM model. In the evaluation phase of the ASVspoof 2021 Logical Access task, the
lightweight ResNetSE model reaches a min t-DCF of 0.2695 and an EER of 3.54%.
Compared to the teacher model, the lightweight student model uses only 22.5% of
the parameters and 21.1% of the multiply-accumulate operations.
Comment: ASVspoof202
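The GE2E pre-training and adversarial fine-tuning are model-specific, but the knowledge-distillation objective used to shrink the CM model can be sketched generically. A minimal numpy version of the standard soft-target KD loss follows; the temperature and weighting values are illustrative, not the paper's:

```python
import numpy as np

def softmax(logits, T=1.0):
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Standard KD objective: temperature-softened KL from teacher to
    student, blended with the usual hard-label cross-entropy."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    kl = (p_teacher * (np.log(p_teacher) - np.log(p_student))).sum(axis=-1).mean()
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels]).mean()
    # T^2 rescales the soft-target term (standard Hinton et al. recipe)
    return alpha * (T * T) * kl + (1.0 - alpha) * ce
```

The student minimises this loss against the teacher's logits, which is what lets the small model retain most of the teacher's detection accuracy at a fraction of the parameters and MACs.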