55 research outputs found
Anti-spoofing Methods for Automatic Speaker Verification System
Growing interest in automatic speaker verification (ASV) systems has led to a
significant improvement in the quality of spoofing attacks on them. Many research
works confirm that, despite their low equal error rate (EER), ASV systems are still
vulnerable to spoofing attacks. In this work we overview different acoustic
feature spaces and classifiers to determine reliable and robust countermeasures
against spoofing attacks. We compared several spoofing detection systems
presented so far on the development and evaluation datasets of the Automatic
Speaker Verification Spoofing and Countermeasures (ASVspoof) Challenge
2015. Experimental results presented in this paper demonstrate that combining
magnitude and phase information contributes substantially to
the efficiency of spoofing detection systems. Wavelet-based features also
show impressive results in terms of equal error rate. In our overview we compare
spoofing detection performance for systems based on different classifiers. The
comparison results demonstrate that the linear SVM classifier outperforms the
conventional GMM approach. However, many researchers, inspired by the great success
of deep neural network (DNN) approaches in automatic speech recognition, have applied
DNNs to the spoofing detection task and obtained quite low EERs for both known and
unknown types of spoofing attacks.
Comment: 12 pages, 0 figures, published in Springer Communications in Computer
and Information Science (CCIS) vol. 66
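The equal error rate used to compare these systems is the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how it might be approximated from two lists of detection scores. The function name `equal_error_rate`, the convention that higher scores mean "more likely genuine", and the threshold sweep over observed scores are assumptions made for illustration, not the scoring tool of the ASVspoof challenge.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by sweeping thresholds over all observed scores.

    Assumption: a higher score means the trial is more likely genuine.
    """
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection rate: genuine trials scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance rate: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


if __name__ == "__main__":
    genuine = [0.9, 0.8, 0.7, 0.4]
    spoof = [0.1, 0.2, 0.3, 0.6]
    print(equal_error_rate(genuine, spoof))
```

With only a handful of trials the result is coarse; evaluation toolkits typically interpolate between thresholds rather than picking the closest observed one.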
Machine Learning Mitigants for Speech Based Cyber Risk
Statistical analysis of speech is an emerging area of machine learning. In this paper, we tackle the biometric challenge in Automatic Speaker Verification (ASV) of differentiating between samples generated by two distinct populations of utterances: those of an authentic human voice and those generated by a synthetic one. Solving this problem from a statistical perspective requires defining a decision rule function and a learning procedure to identify the optimal classifier. Classical state-of-the-art countermeasures rely on strong assumptions, such as stationarity or local stationarity of speech, that may rarely hold in practice. In this regard, we explore a robust non-linear and non-stationary signal decomposition method known as Empirical Mode Decomposition, combined in a novel fashion with Mel-Frequency Cepstral Coefficients and a refined classifier technique known as the multi-kernel Support Vector Machine. We undertake significant real-data case studies covering multiple ASV systems using different datasets, including the ASVspoof 2019 challenge database. The obtained results overwhelmingly demonstrate the significance of our feature extraction and classifier approach versus existing conventional methods in reducing the threat of cyber-attacks perpetrated by synthetic voice replication seeking unauthorised access.
Effects of Waveform PMF on Anti-Spoofing Detection
In the context of detecting identity impersonation in speaker recognition, we observed that the waveform probability mass function (PMF) of genuine speech differs significantly from the PMF of identity-theft excerpts. This is true for synthesized or converted speech as well as for replayed speech. In this work, we mainly ask whether this observation has a significant impact on spoofing detection performance. In a second step, we want to reduce the distribution gap of waveforms between authentic speech and spoofing speech. We propose a genuinization of the spoofing speech (by analogy with Gaussianisation), i.e. to obtain spoofing speech with a PMF close to the PMF of genuine speech. Our genuinization is evaluated on the ASVspoof 2019 challenge datasets, using the baseline system provided by the challenge organization. In the case of constant Q cepstral coefficient (CQCC) features, the genuinization leads to a degradation of the baseline system performance by a factor of 10, which shows a potentially large impact of the distribution of waveforms on spoofing detection performance. However, by "playing" with all configurations, we also observed different behaviors, including performance improvements in specific cases. This leads us to conclude that waveform distribution plays an important role and must be taken into account by anti-spoofing systems.
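To make the notion of a waveform PMF concrete, here is a small sketch that estimates the empirical PMF of integer-valued samples and measures the gap between two such PMFs. The helper names `waveform_pmf` and `total_variation` are hypothetical, and total variation distance is an illustrative choice of gap measure; the abstract does not say how the authors quantify the difference, and this is not their genuinization procedure.

```python
from collections import Counter


def waveform_pmf(samples):
    """Empirical PMF over integer amplitude values (e.g. 16-bit PCM samples)."""
    counts = Counter(samples)
    n = len(samples)
    return {value: count / n for value, count in counts.items()}


def total_variation(p, q):
    """Total variation distance between two PMFs given as dicts.

    Assumption: used here only as an illustrative gap measure.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)


if __name__ == "__main__":
    # Toy amplitude sequences standing in for genuine and spoofed waveforms.
    genuine = [0, 0, 1, -1]
    spoofed = [0, 1, 1, 1]
    print(total_variation(waveform_pmf(genuine), waveform_pmf(spoofed)))
```

A distance of 0 means the two amplitude distributions are identical, and 1 means they share no amplitude values at all.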
Deep Generative Variational Autoencoding for Replay Spoof Detection in Automatic Speaker Verification
Automatic speaker verification (ASV) systems are highly vulnerable to presentation attacks, also called spoofing attacks. Replay is among the simplest attacks to mount, yet it is difficult to detect reliably. The generalization failure of spoofing countermeasures (CMs) has driven the community to study various alternative deep learning CMs. The majority of them are supervised approaches that learn a human-spoof discriminator. In this paper, we advocate a different, deep generative approach that leverages powerful unsupervised manifold learning in classification. The potential benefits include the possibility to sample new data and to obtain insights into the latent features of genuine and spoofed speech. To this end, we propose to use variational autoencoders (VAEs) as an alternative backend for replay attack detection, via three alternative models that differ in their class-conditioning. The first one, similar to the use of Gaussian mixture models (GMMs) in spoof detection, is to train two VAEs independently, one for each class. The second one is to train a single conditional model (C-VAE) by injecting a one-hot class label vector into the encoder and decoder networks. Our final proposal integrates an auxiliary classifier to guide the learning of the latent space. Our experimental results using constant-Q cepstral coefficient (CQCC) features on the ASVspoof 2017 and 2019 physical access subtask datasets indicate that the C-VAE offers substantial improvement in comparison to training two separate VAEs, one for each class. On the 2019 dataset, the C-VAE outperforms the VAE and the baseline GMM by an absolute 9-10% in both equal error rate (EER) and tandem detection cost function (t-DCF) metrics. Finally, we propose VAE residuals, the absolute difference between the original input and its reconstruction, as features for spoofing detection. The proposed frontend approach augmented with a convolutional neural network classifier demonstrated substantial improvement over the VAE backend use case.
Voice Spoofing Countermeasures: Taxonomy, State-of-the-art, experimental analysis of generalizability, open challenges, and the way forward
Malicious actors may seek to use different voice-spoofing attacks to fool
automatic speaker verification (ASV) systems and even use them for spreading
misinformation. Various countermeasures have been proposed to detect these
spoofing attacks. Due to the extensive work done on spoofing detection in ASV
systems in the last 6-7 years, there is a need to classify the research and perform
qualitative and quantitative comparisons on state-of-the-art countermeasures.
Additionally, no existing survey paper has reviewed integrated solutions to
voice spoofing evaluation and speaker verification, adversarial/antiforensics
attacks on spoofing countermeasures, and ASV itself, or unified solutions to
detect multiple attacks using a single model. Further, no work has been done to
provide an apples-to-apples comparison of published countermeasures in order to
assess their generalizability by evaluating them across corpora. In this work,
we conduct a review of the literature on spoofing detection using hand-crafted
features, deep learning, end-to-end, and universal spoofing countermeasure
solutions to detect speech synthesis (SS), voice conversion (VC), and replay
attacks. We also review integrated solutions to voice spoofing
evaluation and speaker verification, adversarial and anti-forensics attacks on
voice countermeasures, and ASV. The limitations and challenges of the existing
spoofing countermeasures are also presented. We report the performance of these
countermeasures on several datasets and evaluate them across corpora. For the
experiments, we employ the ASVspoof2019 and VSDC datasets along with GMM, SVM,
CNN, and CNN-GRU classifiers. (For reproducibility of the results, the code of
the test bed can be found in our GitHub Repository
- …