Attentive filtering networks for audio replay attack detection
An attacker may use a variety of techniques to fool an automatic speaker
verification system into accepting them as a genuine user. Anti-spoofing
methods meanwhile aim to make the system robust against such attacks. The
ASVspoof 2017 Challenge focused specifically on replay attacks, with the
intention of measuring the limits of replay attack detection as well as
developing countermeasures against them. In this work, we propose a replay
attack detection system, the Attentive Filtering Network, which is composed of an
attention-based filtering mechanism that enhances feature representations in
both the frequency and time domains, and a ResNet-based classifier. We show
that the network enables us to visualize the automatically acquired feature
representations that are helpful for spoofing detection. Attentive Filtering
Network attains an evaluation EER of 8.99% on the ASVspoof 2017 Version 2.0
dataset. With system fusion, our best system further obtains a 30% relative
improvement over the ASVspoof 2017 enhanced baseline system.
Comment: Submitted to ICASSP 201
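The equal error rate (EER) and the relative improvement quoted in the abstract above can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the threshold sweep, and the toy scores are assumptions made for illustration, and it assumes that higher scores mean "more likely bona fide".

```python
def equal_error_rate(bona_fide_scores, spoof_scores):
    """Return (eer, threshold) at the point where the false-accept rate
    and the false-reject rate are closest to each other.

    Assumption: a trial is accepted as bona fide when score >= threshold.
    """
    thresholds = sorted(set(bona_fide_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # Fraction of spoofed trials wrongly accepted at this threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # Fraction of bona fide trials wrongly rejected at this threshold.
        frr = sum(s < t for s in bona_fide_scores) / len(bona_fide_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


def relative_improvement(baseline_error, system_error):
    """Relative reduction in an error metric (lower is better)."""
    return (baseline_error - system_error) / baseline_error


# Toy scores, invented for illustration only.
bona_fide = [0.9, 0.8, 0.7, 0.3]
spoofed = [0.1, 0.2, 0.4, 0.6]
eer, threshold = equal_error_rate(bona_fide, spoofed)

# Hypothetical error rates, not the values reported in the paper.
gain = relative_improvement(10.0, 7.0)
```

With these toy scores, one spoofed trial in four is accepted and one bona fide trial in four is rejected at the selected threshold, so the two error rates coincide. Real evaluations typically interpolate between thresholds on much larger score sets.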
Voice Spoofing Countermeasures: Taxonomy, State-of-the-art, experimental analysis of generalizability, open challenges, and the way forward
Malicious actors may seek to use different voice-spoofing attacks to fool
automatic speaker verification (ASV) systems and even use them for spreading
misinformation. Various countermeasures have been proposed to detect these
spoofing attacks. Due to the extensive work done on spoofing detection in ASV systems in
the last 6-7 years, there is a need to classify the research and perform
qualitative and quantitative comparisons on state-of-the-art countermeasures.
Additionally, no existing survey paper has reviewed integrated solutions to
voice spoofing evaluation and speaker verification, adversarial and anti-forensics
attacks on spoofing countermeasures, and ASV itself, or unified solutions to
detect multiple attacks using a single model. Further, no work has been done to
provide an apples-to-apples comparison of published countermeasures in order to
assess their generalizability by evaluating them across corpora. In this work,
we conduct a review of the literature on spoofing detection using hand-crafted
features, deep learning, end-to-end, and universal spoofing countermeasure
solutions to detect speech synthesis (SS), voice conversion (VC), and replay
attacks. Additionally, we also review integrated solutions to voice spoofing
evaluation and speaker verification, adversarial and anti-forensics attacks on
voice countermeasures, and ASV. The limitations and challenges of the existing
spoofing countermeasures are also presented. We report the performance of these
countermeasures on several datasets and evaluate them across corpora. For the
experiments, we employ the ASVspoof2019 and VSDC datasets along with GMM, SVM,
CNN, and CNN-GRU classifiers. For reproducibility of the results, the code of
the test bed can be found in our GitHub repository.
Speech Replay Detection with x-Vector Attack Embeddings and Spectral Features
We present our system submission to the ASVspoof 2019 Challenge Physical
Access (PA) task. The objective for this challenge was to develop a
countermeasure that identifies speech audio as either bona fide or intercepted
and replayed. The target prediction was a value indicating that a speech
segment was bona fide (positive values) or "spoofed" (negative values). Our
system used convolutional neural networks (CNNs) and a representation of the
speech audio that combined x-vector attack embeddings with signal processing
features. The x-vector attack embeddings were created from mel-frequency
cepstral coefficients (MFCCs) using a time-delay neural network (TDNN). These
embeddings jointly modeled 27 different environments and 9 types of attacks
from the labeled data. We also used sub-band spectral centroid magnitude
coefficients (SCMCs) as features. We included an additive Gaussian noise layer
during training as a way to augment the data to make our system more robust to
previously unseen attack examples. We report system performance using the
tandem detection cost function (tDCF) and equal error rate (EER). Our approach
performed better than both of the challenge baselines. Our technique suggests
that our x-vector attack embeddings can help regularize the CNN predictions
even when environments or attacks are more challenging.
Comment: Presented at Interspeech 201
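The additive Gaussian noise layer mentioned in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the `stddev` default, and the list-of-lists feature layout are hypothetical. The one behavior taken from the abstract is that the noise is applied during training.

```python
import random


def gaussian_noise_layer(features, stddev=0.1, training=True, rng=None):
    """Add zero-mean Gaussian noise to every feature value during training.

    features: list of frames, each frame a list of floats (assumed layout).
    stddev:   standard deviation of the noise (hypothetical default).
    At inference time (training=False) the features pass through unchanged.
    """
    if not training or stddev == 0:
        # Return a copy so callers never share rows with the input.
        return [row[:] for row in features]
    rng = rng or random.Random()
    return [[x + rng.gauss(0.0, stddev) for x in row] for row in features]


# Toy feature matrix: 2 frames x 3 coefficients, invented for illustration.
clean = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]
noisy = gaussian_noise_layer(clean, stddev=0.1, rng=random.Random(0))
unchanged = gaussian_noise_layer(clean, training=False)
```

Because a fresh noise sample is drawn on every pass, the network sees a slightly different version of each training example each epoch, which is the augmentation effect the abstract describes.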
Audio Deepfake Detection: A Survey
Audio deepfake detection is an emerging and active research topic. A growing
number of studies have investigated deepfake detection algorithms and achieved
effective performance, yet the problem is far from being solved. Although
there are some review articles, there has been no comprehensive survey that
provides researchers with a systematic overview of these developments with a
unified evaluation. Accordingly, in this survey paper, we first highlight the
key differences across various types of deepfake audio, then outline and
analyse competitions, datasets, features, classifications, and evaluation of
state-of-the-art approaches. For each aspect, the basic techniques, advanced
developments and major challenges are discussed. In addition, we perform a
unified comparison of representative features and classifiers on the ASVspoof 2021,
ADD 2023 and In-the-Wild datasets for audio deepfake detection.
The survey shows that future research should address the lack of large-scale
datasets in the wild, the poor generalization of existing detection methods to
unknown fake attacks, and the interpretability of detection results.
- …