130 research outputs found
Voice Spoofing Countermeasures: Taxonomy, State-of-the-art, experimental analysis of generalizability, open challenges, and the way forward
Malicious actors may seek to use different voice-spoofing attacks to fool ASV
systems and even use them for spreading misinformation. Various countermeasures
have been proposed to detect these spoofing attacks. Due to the extensive work
done on spoofing detection in automated speaker verification (ASV) systems in
the last 6-7 years, there is a need to classify the research and perform
qualitative and quantitative comparisons on state-of-the-art countermeasures.
Additionally, no existing survey paper has reviewed integrated solutions to
voice spoofing evaluation and speaker verification, adversarial/antiforensics
attacks on spoofing countermeasures, and ASV itself, or unified solutions to
detect multiple attacks using a single model. Further, no work has been done to
provide an apples-to-apples comparison of published countermeasures in order to
assess their generalizability by evaluating them across corpora. In this work,
we conduct a review of the literature on spoofing detection using hand-crafted
features, deep learning, end-to-end, and universal spoofing countermeasure
solutions to detect speech synthesis (SS), voice conversion (VC), and replay
attacks. Additionally, we also review integrated solutions to voice spoofing
evaluation and speaker verification, adversarial and anti-forensics attacks on
voice countermeasures, and ASV. The limitations and challenges of the existing
spoofing countermeasures are also presented. We report the performance of these
countermeasures on several datasets and evaluate them across corpora. For the
experiments, we employ the ASVspoof2019 and VSDC datasets along with GMM, SVM,
CNN, and CNN-GRU classifiers. (For reproducibility of the results, the code of
the test bed can be found in our GitHub Repository
Secure Automatic Speaker Verification Systems
A growing number of voice-enabled devices and applications treat automatic speaker verification (ASV) as a fundamental component. However, wide adoption of ASV in critical domains, e.g., financial services and health care, is not possible unless we overcome the security breaches caused by voice cloning and replayed audio, collectively known as spoofing attacks. Audio spoofing attacks on ASV systems strictly limit the usability of voice-enabled applications, and the counterfeiter also remains untraceable. Therefore, to overcome these vulnerabilities, a secure ASV (SASV) system is presented in this dissertation. The proposed SASV system is based on novel sign modified acoustic local ternary pattern (sm-ALTP) features and an asymmetric bagging-based classifier ensemble. The proposed audio representation approach clusters the high and low-frequency components in audio frames by normally distributing frequency components against a convex function. Then, neighborhood statistics are applied to capture the user-specific vocal tract information. This information is then utilized by the classifier ensemble, which uses a weighted normalized voting rule to detect various spoofing attacks. In contrast to existing ASV systems, the proposed SASV system detects not only the conventional spoofing attacks (i.e., voice cloning and replays) but also new attacks that are still unexplored by the research community and will need to be addressed in the future. In this regard, the concept of cloned replays is presented in this dissertation, where replayed audio contains the microphone characteristics as well as the voice cloning artifacts. This depicts the scenario in which voice cloning is applied in real time. The voice cloning artifacts suppress the microphone characteristics and thus defeat replay detection modules; similarly, the added microphone characteristics deceive voice cloning detection.
Furthermore, the proposed scheme can be utilized to obtain a possible clue against the counterfeiter through a voice cloning algorithm detection module, which is also a novel concept proposed in this dissertation. The voice cloning algorithm detection module determines the voice cloning algorithm used to generate the fake audio. Overall, the proposed SASV system simultaneously verifies the bona fide speakers and detects the voice cloning attack, the cloning algorithm used to synthesize the cloned audio (in the defined settings), and voice-replay attacks over the ASVspoof 2019 dataset. In addition, the proposed method detects the voice replay and cloned voice replay attacks over the VSDC dataset. Rigorous experimentation against state-of-the-art approaches also confirms the robustness of the proposed research.
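The abstract does not define sm-ALTP in detail, so the following is only a minimal sketch of a generic one-dimensional local ternary pattern, the family of features the name refers to. It is not the author's sm-ALTP: the two-neighbor window, the fixed `threshold`, and the function names are assumptions made for illustration.

```python
from collections import Counter


def local_ternary_codes(frame, threshold):
    """Return one ternary code tuple per interior sample of a 1-D frame.

    Each neighbor is coded +1 if it exceeds the center sample by more than
    `threshold`, -1 if it falls below the center by more than `threshold`,
    and 0 otherwise. The (left, right) neighbor window is an assumption.
    """
    codes = []
    for i in range(1, len(frame) - 1):
        center = frame[i]
        code = []
        for neighbor in (frame[i - 1], frame[i + 1]):
            diff = neighbor - center
            if diff > threshold:
                code.append(1)
            elif diff < -threshold:
                code.append(-1)
            else:
                code.append(0)
        codes.append(tuple(code))
    return codes


def ternary_histogram(frame, threshold):
    """Count how often each ternary code occurs in the frame."""
    return Counter(local_ternary_codes(frame, threshold))
```

A histogram of such codes per frame is one plausible way to obtain the kind of neighborhood statistics the abstract mentions before passing them to a classifier ensemble.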
When the Differences in Frequency Domain are Compensated: Understanding and Defeating Modulated Replay Attacks on Automatic Speech Recognition
Automatic speech recognition (ASR) systems have been widely deployed in
modern smart devices to provide convenient and diverse voice-controlled
services. Since ASR systems are vulnerable to audio replay attacks that can
spoof and mislead ASR systems, a number of defense systems have been proposed
to identify replayed audio signals based on the speakers' unique acoustic
features in the frequency domain. In this paper, we uncover a new type of
replay attack called modulated replay attack, which can bypass the existing
frequency domain based defense systems. The basic idea is to compensate for the
frequency distortion of a given electronic speaker using an inverse filter that
is customized to the speaker's transform characteristics. Our experiments on
real smart devices confirm the modulated replay attacks can successfully escape
the existing detection mechanisms that rely on identifying suspicious features
in the frequency domain. To defeat modulated replay attacks, we design and
implement a countermeasure named DualGuard. We discover and formally prove that
no matter how the replay audio signals could be modulated, the replay attacks
will either leave ringing artifacts in the time domain or cause spectrum
distortion in the frequency domain. Therefore, by jointly checking suspicious
features in both frequency and time domains, DualGuard can successfully detect
various replay attacks including the modulated replay attacks. We implement a
prototype of DualGuard on a popular voice interactive platform, ReSpeaker Core
v2. The experimental results show DualGuard can achieve 98% accuracy on
detecting modulated replay attacks.
Comment: 17 pages, 24 figures, In Proceedings of the 2020 ACM SIGSAC
Conference on Computer and Communications Security (CCS '20)
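The inverse-filter idea described in this abstract can be illustrated with a toy sketch. It assumes the loudspeaker is modeled as a real, symmetric gain per DFT bin (magnitude only, circular filtering); the gain values, the `floor` clamp, and all function names are hypothetical and not taken from the paper.

```python
import cmath


def dft(x):
    """Plain O(n^2) discrete Fourier transform of a real or complex list."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def apply_response(signal, response):
    """Filter `signal` by scaling each DFT bin with the matching gain."""
    spectrum = dft(signal)
    return idft([s * h for s, h in zip(spectrum, response)])


def inverse_response(response, floor=1e-3):
    """Invert per-bin gains, clamping small gains to avoid huge boosts."""
    return [1.0 / max(h, floor) for h in response]


# Hypothetical loudspeaker gains, symmetric so real input stays real.
speaker = [1.0, 0.8, 0.5, 0.3, 0.2, 0.3, 0.5, 0.8]
original = [0.0, 1.0, 0.5, -0.5, -1.0, 0.25, 0.75, -0.25]

plain_replay = apply_response(original, speaker)
modulated = apply_response(original, inverse_response(speaker))
modulated_replay = apply_response(modulated, speaker)
```

Under these assumptions `plain_replay` carries the loudspeaker's spectral distortion, while `modulated_replay` matches `original` up to rounding, which is why a purely frequency-domain check would not flag it in this toy model.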
Anti-spoofing Methods for Automatic Speaker Verification System
Growing interest in automatic speaker verification (ASV) systems has led to
significant quality improvement of spoofing attacks on them. Many research works
confirm that despite the low equal error rate (EER), ASV systems are still
vulnerable to spoofing attacks. In this work we overview different acoustic
feature spaces and classifiers to determine reliable and robust countermeasures
against spoofing attacks. We compared several spoofing detection systems
presented so far on the development and evaluation datasets of the Automatic
Speaker Verification Spoofing and Countermeasures (ASVspoof) Challenge
2015. Experimental results presented in this paper demonstrate that the use of
combined magnitude and phase information contributes substantially to
the efficiency of the spoofing detection systems. Also, wavelet-based features
show impressive results in terms of equal error rate. In our overview we compare
spoofing detection performance for systems based on different classifiers. Comparison
results demonstrate that the linear SVM classifier outperforms the conventional
GMM approach. However, many researchers, inspired by the great success of deep
neural network (DNN) approaches in automatic speech recognition, applied
DNNs to the spoofing detection task and obtained quite low EER for known and
unknown types of spoofing attacks.
Comment: 12 pages, 0 figures, published in Springer Communications in Computer
and Information Science (CCIS) vol. 66
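Since this abstract compares systems by equal error rate, a small sketch of how EER can be approximated from detection scores may help. It assumes higher scores mean "more likely bona fide" and sweeps only the observed scores as thresholds; the function name and the convention of averaging the two error rates at the closest point are illustrative choices, not the challenge's official scoring tool.

```python
def equal_error_rate(bona_fide_scores, spoof_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    A trial is accepted when its score is >= threshold. The function returns
    the mean of the false acceptance rate (spoof accepted) and the false
    rejection rate (bona fide rejected) at the threshold where they are
    closest to each other.
    """
    best_gap, eer = None, None
    for threshold in sorted(set(bona_fide_scores) | set(spoof_scores)):
        far = sum(s >= threshold for s in spoof_scores) / len(spoof_scores)
        frr = sum(s < threshold for s in bona_fide_scores) / len(bona_fide_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# One spoof trial scores above one bona fide trial, so both rates meet at 25%.
print(equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1]))  # 0.25
```

Production evaluations typically interpolate between thresholds, so values from this sketch can differ slightly from officially reported EERs.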
Speech Replay Detection with x-Vector Attack Embeddings and Spectral Features
We present our system submission to the ASVspoof 2019 Challenge Physical
Access (PA) task. The objective for this challenge was to develop a
countermeasure that identifies speech audio as either bona fide or intercepted
and replayed. The target prediction was a value indicating that a speech
segment was bona fide (positive values) or "spoofed" (negative values). Our
system used convolutional neural networks (CNNs) and a representation of the
speech audio that combined x-vector attack embeddings with signal processing
features. The x-vector attack embeddings were created from mel-frequency
cepstral coefficients (MFCCs) using a time-delay neural network (TDNN). These
embeddings jointly modeled 27 different environments and 9 types of attacks
from the labeled data. We also used sub-band spectral centroid magnitude
coefficients (SCMCs) as features. We included an additive Gaussian noise layer
during training as a way to augment the data to make our system more robust to
previously unseen attack examples. We report system performance using the
tandem detection cost function (tDCF) and equal error rate (EER). Our approach
performed better than both of the challenge baselines. Our technique suggests
that our x-vector attack embeddings can help regularize the CNN predictions
even when environments or attacks are more challenging.
Comment: Presented at Interspeech 201
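The additive Gaussian noise layer mentioned in this abstract can be sketched as a function that perturbs features only in training mode. The noise level `std`, the `training` flag, and the flat list representation are assumptions for illustration; the abstract does not state the actual layer configuration.

```python
import random


def gaussian_noise_layer(features, std, training=True, rng=None):
    """Add zero-mean Gaussian noise to each feature during training only.

    At inference time (training=False) the features pass through unchanged,
    which mirrors how noise layers in common deep learning toolkits behave.
    """
    if not training:
        return list(features)
    if rng is None:
        rng = random.Random()
    return [x + rng.gauss(0.0, std) for x in features]


# Hypothetical feature vector and noise level, seeded for repeatability.
clean = [0.2, -0.1, 0.4, 0.0]
noisy = gaussian_noise_layer(clean, std=0.05, rng=random.Random(0))
```

Because each training pass sees a slightly different version of the same example, the classifier is discouraged from relying on exact feature values, which is the robustness effect the abstract describes.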