Voice Spoofing Countermeasures: Taxonomy, State-of-the-art, experimental analysis of generalizability, open challenges, and the way forward
Malicious actors may seek to use different voice-spoofing attacks to fool ASV
systems and even use them for spreading misinformation. Various countermeasures
have been proposed to detect these spoofing attacks. Due to the extensive work
done on spoofing detection in automated speaker verification (ASV) systems in
the last 6-7 years, there is a need to classify the research and perform
qualitative and quantitative comparisons on state-of-the-art countermeasures.
Additionally, no existing survey paper has reviewed integrated solutions to
voice spoofing evaluation and speaker verification, adversarial/antiforensics
attacks on spoofing countermeasures, and ASV itself, or unified solutions to
detect multiple attacks using a single model. Further, no work has been done to
provide an apples-to-apples comparison of published countermeasures in order to
assess their generalizability by evaluating them across corpora. In this work,
we conduct a review of the literature on spoofing detection using hand-crafted
features, deep learning, end-to-end, and universal spoofing countermeasure
solutions to detect speech synthesis (SS), voice conversion (VC), and replay
attacks. We also review integrated solutions to voice spoofing
evaluation and speaker verification, adversarial and anti-forensics attacks on
voice countermeasures, and ASV. The limitations and challenges of the existing
spoofing countermeasures are also presented. We report the performance of these
countermeasures on several datasets and evaluate them across corpora. For the
experiments, we employ the ASVspoof2019 and VSDC datasets along with GMM, SVM,
CNN, and CNN-GRU classifiers. (For reproducibility of the results, the code of
the test bed can be found in our GitHub repository.)
Anti-Spoofing Using Transfer Learning with Variational Information Bottleneck
Recent advances in sophisticated synthetic speech generated by
text-to-speech (TTS) or voice conversion (VC) systems pose a threat to
existing automatic speaker verification (ASV) systems. Since such synthetic
speech is generated by diverse algorithms, the ability to generalize from
limited training data is indispensable for a robust anti-spoofing system. In
this work, we propose a transfer learning scheme based on the wav2vec 2.0
pretrained model with a variational information bottleneck (VIB) for the speech
anti-spoofing task. Evaluation on the ASVspoof 2019 logical access (LA)
database shows that our method improves the performance of distinguishing
unseen spoofed and genuine speech, outperforming current state-of-the-art
anti-spoofing systems. Furthermore, we show that the proposed system
significantly improves performance in low-resource and cross-dataset settings
of the anti-spoofing task, demonstrating that our system is also robust in terms
of data size and data distribution. Comment: Submitted to Interspeech 202
Subband modeling for spoofing detection in automatic speaker verification
Spectrograms - time-frequency representations of audio signals - have found widespread use in neural network-based spoofing detection. While deep models are trained on the fullband spectrum of the signal, we argue that not all frequency bands are useful for these tasks. In this paper, we systematically investigate the impact of different subbands and their importance for replay spoofing detection on two benchmark datasets: ASVspoof 2017 v2.0 and ASVspoof 2019 PA. We propose a joint subband modelling framework that employs n different sub-networks to learn subband-specific features. These features are later combined and passed to a classifier, and the weights of the whole network are updated during training. Our findings on the ASVspoof 2017 dataset suggest that the most discriminative information appears to be in the first and the last 1 kHz frequency bands, and the joint model trained on these two subbands shows the best performance, outperforming the baselines by a large margin. However, these findings do not generalise to the ASVspoof 2019 PA dataset. This suggests that the datasets available for training these models do not reflect real-world replay conditions, and that datasets for training replay spoofing countermeasures need to be designed carefully.
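As a rough illustration of the subband idea in the abstract above, the following sketch slices the first and the last 1 kHz out of a magnitude spectrogram. It is not the authors' code: the function names, the 16 kHz sampling rate, the 257-bin layout (a 512-point FFT), and the random input are all assumptions made for the example, and the sub-networks that would consume each band are left out.

```python
import numpy as np

def subband_rows(n_bins, sample_rate, low_hz, high_hz):
    """Indices of spectrogram rows whose centre frequency lies in [low_hz, high_hz].

    Assumes rows are linearly spaced from 0 Hz to the Nyquist frequency.
    """
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    return np.where((freqs >= low_hz) & (freqs <= high_hz))[0]

def first_and_last_khz(spec, sample_rate):
    """Split a (frequency, time) spectrogram into its first and last 1 kHz bands."""
    nyquist = sample_rate / 2.0
    low_rows = subband_rows(spec.shape[0], sample_rate, 0.0, 1000.0)
    high_rows = subband_rows(spec.shape[0], sample_rate, nyquist - 1000.0, nyquist)
    return spec[low_rows], spec[high_rows]

# Hypothetical input: 257 frequency bins, 100 frames, 16 kHz audio.
spec = np.random.default_rng(0).random((257, 100))
low_band, high_band = first_and_last_khz(spec, 16000)
```

In a joint model of the kind the abstract describes, each band would be fed to its own sub-network and the resulting features combined before the classifier.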
Can spoofing countermeasure and speaker verification systems be jointly optimised?
Spoofing countermeasure (CM) and automatic speaker verification (ASV)
sub-systems can be used in tandem with a backend classifier as a solution to
the spoofing aware speaker verification (SASV) task. The two sub-systems are
typically trained independently to solve different tasks. While our previous
work demonstrated the potential of joint optimisation, it also showed a
tendency to over-fit to speakers and a lack of sub-system complementarity.
Using only a modest quantity of auxiliary data collected from new speakers, we
show that joint optimisation degrades the performance of separate CM and ASV
sub-systems, but that it nonetheless improves complementarity, thereby
delivering superior SASV performance. Using standard SASV evaluation data and
protocols, joint optimisation reduces the equal error rate by 27% relative to
performance obtained using fixed, independently-optimised sub-systems under
like-for-like training conditions. Comment: Accepted to ICASSP 2023. Code will be available soon.
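For readers unfamiliar with the metric quoted above, here is a minimal sketch of how an equal error rate (EER) and a relative reduction can be computed from detection scores. It is a simple threshold sweep, not the official SASV evaluation code; the function names and the toy scores are assumptions made for the example.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    A trial is accepted when its score is >= the threshold. The EER is taken
    where the false acceptance and false rejection rates are closest.
    """
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    far = np.array([(nontarget_scores >= t).mean() for t in thresholds])
    frr = np.array([(target_scores < t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0

def relative_reduction(baseline, improved):
    """Relative reduction of an error rate with respect to a baseline."""
    return (baseline - improved) / baseline

# Toy scores, chosen only to exercise the functions.
eer = equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.1, 0.2, 0.4, 0.6])
```

Under this definition, a system whose EER falls from a baseline value `b` to `0.73 * b` shows a 27% relative reduction.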