234 research outputs found
Anti-spoofing Methods for Automatic SpeakerVerification System
Growing interest in automatic speaker verification (ASV)systems has lead to
significant quality improvement of spoofing attackson them. Many research works
confirm that despite the low equal er-ror rate (EER) ASV systems are still
vulnerable to spoofing attacks. Inthis work we overview different acoustic
feature spaces and classifiersto determine reliable and robust countermeasures
against spoofing at-tacks. We compared several spoofing detection systems,
presented so far,on the development and evaluation datasets of the Automatic
SpeakerVerification Spoofing and Countermeasures (ASVspoof) Challenge
2015.Experimental results presented in this paper demonstrate that the useof
magnitude and phase information combination provides a substantialinput into
the efficiency of the spoofing detection systems. Also wavelet-based features
show impressive results in terms of equal error rate. Inour overview we compare
spoofing performance for systems based on dif-ferent classifiers. Comparison
results demonstrate that the linear SVMclassifier outperforms the conventional
GMM approach. However, manyresearchers inspired by the great success of deep
neural networks (DNN)approaches in the automatic speech recognition, applied
DNN in thespoofing detection task and obtained quite low EER for known and
un-known type of spoofing attacks.Comment: 12 pages, 0 figures, published in Springer Communications in Computer
and Information Science (CCIS) vol. 66
Audio Deepfake Detection: A Survey
Audio deepfake detection is an emerging active topic. A growing number of
literatures have aimed to study deepfake detection algorithms and achieved
effective performance, the problem of which is far from being solved. Although
there are some review literatures, there has been no comprehensive survey that
provides researchers with a systematic overview of these developments with a
unified evaluation. Accordingly, in this survey paper, we first highlight the
key differences across various types of deepfake audio, then outline and
analyse competitions, datasets, features, classifications, and evaluation of
state-of-the-art approaches. For each aspect, the basic techniques, advanced
developments and major challenges are discussed. In addition, we perform a
unified comparison of representative features and classifiers on ASVspoof 2021,
ADD 2023 and In-the-Wild datasets for audio deepfake detection, respectively.
The survey shows that future research should address the lack of large scale
datasets in the wild, poor generalization of existing detection methods to
unknown fake attacks, as well as interpretability of detection results
Secure Automatic Speaker Verification Systems
The growing number of voice-enabled devices and applications consider automatic speaker verification (ASV) a fundamental component. However, maximum outreach for ASV in critical domains e.g., financial services and health care, is not possible unless we overcome security breaches caused by voice cloning, and replayed audios collectively known as the spoofing attacks. The audio spoofing attacks over ASV systems on one hand strictly limit the usability of voice-enabled applications; and on the other hand, the counterfeiter also remains untraceable. Therefore, to overcome these vulnerabilities, a secure ASV (SASV) system is presented in this dissertation. The proposed SASV system is based on the concept of novel sign modified acoustic local ternary pattern (sm-ALTP) features and asymmetric bagging-based classifier-ensemble. The proposed audio representation approach clusters the high and low-frequency components in audio frames by normally distributing frequency components against a convex function. Then, the neighborhood statistics are applied to capture the user specific vocal tract information. This information is then utilized by the classifier ensemble that is based on the concept of weighted normalized voting rule to detect various spoofing attacks. Contrary to the existing ASV systems, the proposed SASV system not only detects the conventional spoofing attacks (i.e. voice cloning, and replays), but also the new attacks that are still unexplored by the research community and a requirement of the future. In this regard, a concept of cloned replays is presented in this dissertation, where, replayed audios contains the microphone characteristics as well as the voice cloning artifacts. This depicts the scenario when voice cloning is applied in real-time. The voice cloning artifacts suppresses the microphone characteristics thus fails replay detection modules and similarly with the amalgamation of microphone characteristics the voice cloning detection gets deceived. Furthermore, the proposed scheme can be utilized to obtain a possible clue against the counterfeiter through voice cloning algorithm detection module that is also a novel concept proposed in this dissertation. The voice cloning algorithm detection module determines the voice cloning algorithm used to generate the fake audios. Overall, the proposed SASV system simultaneously verifies the bonafide speakers and detects the voice cloning attack, cloning algorithm used to synthesize cloned audio (in the defined settings), and voice-replay attacks over the ASVspoof 2019 dataset. In addition, the proposed method detects the voice replay and cloned voice replay attacks over the VSDC dataset. Rigorous experimentation against state-of-the-art approaches also confirms the robustness of the proposed research
Uncovering the Deceptions: An Analysis on Audio Spoofing Detection and Future Prospects
Audio has become an increasingly crucial biometric modality due to its
ability to provide an intuitive way for humans to interact with machines. It is
currently being used for a range of applications, including person
authentication to banking to virtual assistants. Research has shown that these
systems are also susceptible to spoofing and attacks. Therefore, protecting
audio processing systems against fraudulent activities, such as identity theft,
financial fraud, and spreading misinformation, is of paramount importance. This
paper reviews the current state-of-the-art techniques for detecting audio
spoofing and discusses the current challenges along with open research
problems. The paper further highlights the importance of considering the
ethical and privacy implications of audio spoofing detection systems. Lastly,
the work aims to accentuate the need for building more robust and generalizable
methods, the integration of automatic speaker verification and countermeasure
systems, and better evaluation protocols.Comment: Accepted in IJCAI 202
Speech Replay Detection with x-Vector Attack Embeddings and Spectral Features
We present our system submission to the ASVspoof 2019 Challenge Physical
Access (PA) task. The objective for this challenge was to develop a
countermeasure that identifies speech audio as either bona fide or intercepted
and replayed. The target prediction was a value indicating that a speech
segment was bona fide (positive values) or "spoofed" (negative values). Our
system used convolutional neural networks (CNNs) and a representation of the
speech audio that combined x-vector attack embeddings with signal processing
features. The x-vector attack embeddings were created from mel-frequency
cepstral coefficients (MFCCs) using a time-delay neural network (TDNN). These
embeddings jointly modeled 27 different environments and 9 types of attacks
from the labeled data. We also used sub-band spectral centroid magnitude
coefficients (SCMCs) as features. We included an additive Gaussian noise layer
during training as a way to augment the data to make our system more robust to
previously unseen attack examples. We report system performance using the
tandem detection cost function (tDCF) and equal error rate (EER). Our approach
performed better that both of the challenge baselines. Our technique suggests
that our x-vector attack embeddings can help regularize the CNN predictions
even when environments or attacks are more challenging.Comment: Presented at Interspeech 201
- …