Speech Replay Detection with x-Vector Attack Embeddings and Spectral Features
We present our system submission to the ASVspoof 2019 Challenge Physical
Access (PA) task. The objective for this challenge was to develop a
countermeasure that identifies speech audio as either bona fide or intercepted
and replayed. The target prediction was a value indicating that a speech
segment was bona fide (positive values) or "spoofed" (negative values). Our
system used convolutional neural networks (CNNs) and a representation of the
speech audio that combined x-vector attack embeddings with signal processing
features. The x-vector attack embeddings were created from mel-frequency
cepstral coefficients (MFCCs) using a time-delay neural network (TDNN). These
embeddings jointly modeled 27 different environments and 9 types of attacks
from the labeled data. We also used sub-band spectral centroid magnitude
coefficients (SCMCs) as features. We included an additive Gaussian noise layer
during training as a way to augment the data to make our system more robust to
previously unseen attack examples. We report system performance using the
tandem detection cost function (tDCF) and equal error rate (EER). Our approach
performed better than both of the challenge baselines. Our results suggest
that the x-vector attack embeddings can help regularize the CNN predictions
even when environments or attacks are more challenging.
Comment: Presented at Interspeech 201
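The EER reported above can be computed directly from its definition. The following is a minimal sketch, not the challenge's official scoring code, and it assumes the score convention stated in the abstract (positive values for bona fide, negative for spoofed):

```python
import numpy as np

def compute_eer(bona_scores, spoof_scores):
    """Equal error rate: the operating point where the miss rate on
    bona fide trials equals the false-alarm rate on spoofed trials."""
    scores = np.concatenate([bona_scores, spoof_scores])
    labels = np.concatenate([np.ones_like(bona_scores),
                             np.zeros_like(spoof_scores)])
    order = np.argsort(scores)
    labels = labels[order]
    n_bona = labels.sum()
    n_spoof = len(labels) - n_bona
    # Threshold sweep: bona fide trials at or below the candidate
    # threshold are misses; spoofed trials above it are false alarms.
    miss = np.cumsum(labels) / n_bona
    fa = 1.0 - np.cumsum(1 - labels) / n_spoof
    idx = np.argmin(np.abs(miss - fa))
    return float((miss[idx] + fa[idx]) / 2)
```

With perfectly separated scores the EER is zero; as the score distributions overlap it rises toward 0.5 (chance).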
Voice biometric system security: Design and analysis of countermeasures for replay attacks.
PhD Thesis
Voice biometric systems use automatic speaker verification (ASV) technology for
user authentication. Even though it is among the most convenient means of biometric
authentication, the robustness and security of ASV in the face of spoofing attacks
(or presentation attacks) is of growing concern and is now well acknowledged
by the research community. A spoofing attack involves illegitimate access to
the personal data of a targeted user. Replay is among the simplest attacks to
mount, yet difficult to detect reliably, and is the focus of this thesis.
This research focuses on the analysis and design of existing and novel countermeasures
for replay attack detection in ASV, organised in two major parts.
The first part of the thesis investigates existing methods for spoofing detection
from several perspectives. I first study the generalisability of hand-crafted features
for replay detection that show promising results on synthetic speech detection.
I find, however, that it is difficult to achieve similar levels of performance
due to the acoustically different problem under investigation. In addition, I show
how class-dependent cues in a benchmark dataset (ASVspoof 2017) can lead to
the manipulation of class predictions. I then analyse the performance of several
countermeasure models under varied replay attack conditions. I find that it is
difficult to account for the effects of various factors in a replay attack: the acoustic
environment, the playback device and recording device, and their interactions.
Subsequently, I develop and study a convolutional neural network (CNN)
model that demonstrates comparable performance to the one that ranked first
in the ASVspoof 2017 challenge. Here, the experiment analyses what the CNN
has learned for replay detection using a method from interpretable machine
learning. The findings suggest that the model attends strongly to the first few
milliseconds of test recordings in order to make predictions. I then perform
an in-depth analysis of a benchmark dataset (ASVspoof 2017) for spoofing detection
and demonstrate that any machine learning countermeasure model can
still exploit the artefacts I identified in this dataset.
The second part of the thesis studies the design of countermeasures for ASV,
focusing on model robustness and avoiding dataset biases. First, I propose
an ensemble model combining shallow and deep machine learning methods for
spoofing detection, and demonstrate its effectiveness on the latest benchmark
dataset (ASVspoof 2019). Next, I propose the use of speech endpoint detection
for reliable and robust model predictions on the ASVspoof 2017 dataset.
For this, I created a publicly available collection of hand-annotated speech
endpoints for the same dataset, and I also develop new benchmark results for
both frame-based and utterance-based countermeasures.
I then propose spectral subband modelling using CNNs for replay detection.
My results indicate that models that learn subband-specific information
substantially outperform models trained on complete spectrograms. Finally, I
propose the use of variational autoencoders (deep unsupervised generative models)
as an alternative backend for spoofing detection and demonstrate encouraging
results when compared with the traditional Gaussian mixture model backend.
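The subband modelling idea can be illustrated with a small sketch. The band count and spectrogram shape below are illustrative assumptions, not the thesis's actual configuration:

```python
import numpy as np

def split_subbands(spectrogram, n_bands):
    """Split a (freq_bins, frames) spectrogram into roughly equal
    frequency subbands, one per subband-specific model."""
    freq_bins = spectrogram.shape[0]
    edges = np.linspace(0, freq_bins, n_bands + 1, dtype=int)
    return [spectrogram[lo:hi, :] for lo, hi in zip(edges[:-1], edges[1:])]

# Illustrative input: 257 frequency bins (512-point FFT), 100 frames.
spec = np.random.rand(257, 100)
bands = split_subbands(spec, 4)
```

Each subband is then fed to its own CNN, so a model can specialise in the frequency region where replay artefacts are most discriminative.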
How to Construct Perfect and Worse-than-Coin-Flip Spoofing Countermeasures: A Word of Warning on Shortcut Learning
Shortcut learning, or the 'Clever Hans effect', refers to situations where a
learning agent (e.g., a deep neural network) learns spurious correlations
present in data, resulting in biased models. We focus on finding shortcuts in
deep learning based spoofing countermeasures (CMs) that predict whether a given
utterance is spoofed or not. While prior work has addressed specific data
artifacts, such as silence, no general normative framework has been explored
for analyzing shortcut learning in CMs. In this study, we propose a generic
approach to identifying shortcuts by introducing systematic interventions on
the training and test sides, including the boundary cases of 'near-perfect' and
'worse than coin flip' (label flip). Using three different models, ranging
from classic to state-of-the-art, we demonstrate the presence of shortcut
learning in five simulated conditions. We analyze the results using a
regression model to understand how biases affect the class-conditional score
statistics.
Comment: Interspeech 202
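A toy version of such an intervention can be sketched as follows. The feature names and the threshold learner are hypothetical stand-ins for the paper's models, used only to show how a training-side shortcut combined with a test-side label flip yields worse-than-coin-flip accuracy:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_split(n, flip):
    """Synthetic trials with a weak genuine cue and a spurious
    'trailing silence'-style shortcut; flip inverts the shortcut."""
    labels = rng.integers(0, 2, n)                 # 1 = bona fide, 0 = spoof
    signal = rng.normal(labels, 2.0)               # weak, noisy genuine cue
    shortcut = labels if not flip else 1 - labels  # spurious cue
    return np.stack([signal, shortcut], axis=1), labels

def train_threshold(X, y):
    """Pick the single feature and threshold with the best training
    accuracy: a stand-in for a model free to latch onto the shortcut."""
    best = None
    for f in range(X.shape[1]):
        for thr in np.unique(X[:, f]):
            acc = np.mean((X[:, f] >= thr) == y)
            if best is None or acc > best[0]:
                best = (acc, f, thr)
    return best[1], best[2]

Xtr, ytr = make_split(200, flip=False)
f, thr = train_threshold(Xtr, ytr)        # latches onto the shortcut
Xte, yte = make_split(200, flip=True)     # intervention: flipped shortcut
test_acc = np.mean((Xte[:, f] >= thr) == yte)
```

Because the learner selects the shortcut feature (it is perfectly predictive at training time), the flipped test split drives its accuracy below a coin flip.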
Ensemble Models for Spoofing Detection in Automatic Speaker Verification
Detecting spoofing attempts of automatic speaker verification (ASV) systems is challenging, especially when using only one modelling approach. For robustness, we use both deep neural networks and traditional machine learning models and combine them as ensemble models through logistic regression. They are trained to detect logical access (LA) and physical access (PA) attacks on the dataset released as part of the ASV Spoofing and Countermeasures Challenge 2019. We propose dataset partitions that ensure different attack types are present during training and validation to improve system robustness. Our ensemble model outperforms all our single models and the baselines from the challenge for both attack types. We investigate why some models on the PA dataset strongly outperform others and find that spoofed recordings in the dataset tend to have longer silences at the end than genuine ones. By removing them, the PA task becomes much more challenging, with the tandem detection cost function (t-DCF) of our best single model rising from 0.1672 to 0.5018 and equal error rate (EER) increasing from 5.98% to 19.8% on the development set.
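The trailing-silence artefact described above suggests a simple preprocessing step. This is a hedged sketch with an illustrative frame size and energy threshold, not the authors' actual pipeline:

```python
import numpy as np

def trim_trailing_silence(wave, frame=160, threshold=1e-3):
    """Drop low-energy frames from the end of a waveform: the kind of
    dataset artefact the paper found countermeasures exploiting."""
    n_frames = len(wave) // frame
    end = n_frames
    # Walk backwards until the first frame with non-negligible energy.
    for i in range(n_frames - 1, -1, -1):
        energy = np.mean(wave[i * frame:(i + 1) * frame] ** 2)
        if energy > threshold:
            break
        end = i
    return wave[:end * frame]
```

Evaluating with such trimming applied is what exposes how much of a model's apparent PA performance came from the silence cue rather than replay artefacts.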
Restrictive Voting Technique for Faces Spoofing Attack
Face anti-spoofing has become widely used due to the increasing use of biometric authentication systems that rely on facial recognition. It is a critical issue in biometric authentication systems that aim to prevent unauthorized access. In this paper, we propose a modified version of majority voting that ensembles the votes of six classifiers for multiple video chunks to improve the accuracy of face anti-spoofing. Our approach involves sampling sub-videos of 2 seconds each with a one-second overlap and classifying each sub-video using multiple classifiers. We then ensemble the classifications for each sub-video across all classifiers to decide the complete video classification. We focus on the False Acceptance Rate (FAR) metric to highlight the importance of preventing unauthorized access. We evaluated our method using the Replay Attack dataset and achieved a zero FAR. We also reported the Half Total Error Rate (HTER) and Equal Error Rate (EER) and gained a better result than most state-of-the-art methods. Our experimental results show that our proposed method significantly reduces the FAR, which is crucial for real-world face anti-spoofing applications.
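The chunking and voting scheme can be sketched as follows. The tie-breaking rule in favour of 'spoof' is an assumption motivated by the paper's emphasis on a low FAR, not a detail taken from it:

```python
def chunk_starts(duration, win=2.0, hop=1.0):
    """Start times of overlapping sub-videos: 2 s windows, 1 s hop,
    matching the sampling described in the abstract."""
    starts, t = [], 0.0
    while t + win <= duration:
        starts.append(t)
        t += hop
    return starts

def majority_vote(votes):
    """Collapse per-chunk, per-classifier votes into one video label.
    'spoof' wins ties: a restrictive choice that biases the system
    against false acceptances (assumed, not specified in the paper)."""
    n_real = sum(1 for v in votes if v == "real")
    return "real" if n_real > len(votes) - n_real else "spoof"
```

For a 5-second video this yields four overlapping sub-videos; each is scored by all six classifiers before the final vote.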
Biometric Presentation Attack Detection for Mobile Devices Using Gaze Information
Facial recognition systems are among the most widely deployed in biometric applications. However, such systems are vulnerable to presentation attacks (spoofing), where a person tries to impersonate someone else by mimicking their biometric data and thereby gaining access to the system. Significant research attention has been directed toward developing robust strategies for detecting such attacks and thus assuring the security of these systems in real-world applications. This thesis is focused on presentation attack detection for face recognition systems using a gaze tracking approach.
The proposed challenge-response presentation attack detection system assesses the gaze of the user in response to a randomly moving stimulus on the screen. The user is required to track the moving stimulus with their gaze with natural head/eye movements. If the response is adequately similar to the challenge, the access attempt is seen as genuine. The attack scenarios considered in this work included the use of hand-held displayed photos, 2D masks, and 3D masks. Due to the nature of the proposed challenge-response approaches for presentation attack detection, none of the existing public databases were appropriate and a new database has been collected. The Kent Gaze Dynamics Database (KGDD) consists of 2,400 sets of genuine and attack-based presentation attempts collected from 80 participants. The use of a mobile device was simulated on a desktop PC for two possible geometries corresponding to mobile phone and tablet devices. Three different types of challenge trajectories were used in this data collection exercise.
A number of novel gaze-based features were explored to develop the presentation attack detection algorithm. Initial experiments using the KGDD provided an encouraging indication of the potential of the proposed system for attack detection. In order to explore the feasibility of the scheme on a real hand held device, another database, the Mobile KGDD (MKGDD), was collected from 30 participants using a single mobile device (Google Nexus 6), to test the proposed features.
Comprehensive experimental analysis has been performed on the two collected databases for each of the proposed features. Performance evaluation results indicate that the proposed gaze-based features are effective in discriminating between genuine and presentation attack attempts.
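A challenge-response check of the kind described above could be sketched as a trajectory correlation. The similarity measure and acceptance threshold here are hypothetical illustrations, not the thesis's actual gaze features:

```python
import numpy as np

def gaze_similarity(challenge, response):
    """Per-axis Pearson correlation between the stimulus trajectory
    and the tracked gaze, averaged over axes. Both inputs are
    (n_samples, 2) arrays of screen coordinates."""
    scores = []
    for axis in range(challenge.shape[1]):
        c = challenge[:, axis] - challenge[:, axis].mean()
        r = response[:, axis] - response[:, axis].mean()
        denom = np.linalg.norm(c) * np.linalg.norm(r)
        scores.append(float(c @ r / denom) if denom > 0 else 0.0)
    return sum(scores) / len(scores)

def is_genuine(challenge, response, threshold=0.8):
    """Accept only when the gaze closely follows the random stimulus;
    the 0.8 threshold is an illustrative assumption."""
    return gaze_similarity(challenge, response) >= threshold
```

A static photo or mask cannot follow a randomly moving stimulus, so its gaze trace decorrelates from the challenge and the attempt is rejected.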