
    Speech Replay Detection with x-Vector Attack Embeddings and Spectral Features

    We present our system submission to the ASVspoof 2019 Challenge Physical Access (PA) task. The objective for this challenge was to develop a countermeasure that identifies speech audio as either bona fide or intercepted and replayed. The target prediction was a value indicating that a speech segment was bona fide (positive values) or "spoofed" (negative values). Our system used convolutional neural networks (CNNs) and a representation of the speech audio that combined x-vector attack embeddings with signal processing features. The x-vector attack embeddings were created from mel-frequency cepstral coefficients (MFCCs) using a time-delay neural network (TDNN). These embeddings jointly modeled 27 different environments and 9 types of attacks from the labeled data. We also used sub-band spectral centroid magnitude coefficients (SCMCs) as features. We included an additive Gaussian noise layer during training as a way to augment the data and make our system more robust to previously unseen attack examples. We report system performance using the tandem detection cost function (tDCF) and equal error rate (EER). Our approach performed better than both of the challenge baselines. Our results suggest that the x-vector attack embeddings help regularize the CNN predictions even when environments or attacks are more challenging.
    Comment: Presented at Interspeech 201
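
    Purely as an illustration of the training-time augmentation described above (not the authors' actual architecture or feature pipeline), the sketch below shows a small CNN countermeasure with an additive Gaussian noise layer that is active only during training and a single real-valued output score (positive for bona fide, negative for spoofed). The layer sizes, noise standard deviation, and input shape are all assumptions.

```python
# Hypothetical sketch of a CNN countermeasure with a train-time Gaussian noise layer.
# Not the authors' system; shapes and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class GaussianNoise(nn.Module):
    """Adds zero-mean Gaussian noise to the input while in training mode only."""
    def __init__(self, sigma: float = 0.1):
        super().__init__()
        self.sigma = sigma

    def forward(self, x):
        if self.training and self.sigma > 0:
            return x + self.sigma * torch.randn_like(x)
        return x

class ReplayCM(nn.Module):
    """Toy CNN producing one score: positive -> bona fide, negative -> spoofed."""
    def __init__(self, in_channels: int = 1):
        super().__init__()
        self.net = nn.Sequential(
            GaussianNoise(sigma=0.1),                      # data augmentation during training
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, 1),                              # single real-valued countermeasure score
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

if __name__ == "__main__":
    model = ReplayCM()
    feats = torch.randn(8, 1, 60, 400)   # (batch, channel, feature bins, frames) - assumed shape
    scores = model(feats)                # higher -> more likely bona fide
    print(scores.shape)                  # torch.Size([8])
```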

    Voice biometric system security: Design and analysis of countermeasures for replay attacks.

    PhD Thesis. Voice biometric systems use automatic speaker verification (ASV) technology for user authentication. Even though it is among the most convenient means of biometric authentication, the robustness and security of ASV in the face of spoofing attacks (or presentation attacks) is of growing concern and is now well acknowledged by the research community. A spoofing attack involves illegitimate access to the personal data of a targeted user. Replay is among the simplest attacks to mount, yet it is difficult to detect reliably and is the focus of this thesis. This research focuses on the analysis and design of existing and novel countermeasures for replay attack detection in ASV, organised in two major parts. The first part of the thesis investigates existing methods for spoofing detection from several perspectives. I first study the generalisability of hand-crafted features for replay detection that show promising results on synthetic speech detection. I find, however, that it is difficult to achieve similar levels of performance due to the acoustically different problem under investigation. In addition, I show how class-dependent cues in a benchmark dataset (ASVspoof 2017) can lead to the manipulation of class predictions. I then analyse the performance of several countermeasure models under varied replay attack conditions. I find that it is difficult to account for the effects of the various factors in a replay attack: acoustic environment, playback device and recording device, and their interactions. Subsequently, I develop and study a convolutional neural network (CNN) model that demonstrates performance comparable to the model that ranked first in the ASVspoof 2017 challenge. Here, the experiment analyses what the CNN has learned for replay detection using a method from interpretable machine learning. The findings suggest that the model attends strongly to the first few milliseconds of test recordings in order to make predictions. Then, I perform an in-depth analysis of a benchmark dataset (ASVspoof 2017) for spoofing detection and demonstrate that any machine learning countermeasure model can still exploit the artefacts I identified in this dataset. The second part of the thesis studies the design of countermeasures for ASV, focusing on model robustness and avoiding dataset biases. First, I propose an ensemble model combining shallow and deep machine learning methods for spoofing detection and demonstrate its effectiveness on the latest benchmark datasets (ASVspoof 2019). Next, I propose the use of speech endpoint detection for reliable and robust model predictions on the ASVspoof 2017 dataset. For this, I create a publicly available collection of hand-annotated speech endpoints for the same dataset and establish new benchmark results for both frame-based and utterance-based countermeasures. I then propose spectral subband modelling using CNNs for replay detection. My results indicate that models that learn subband-specific information substantially outperform models trained on complete spectrograms. Finally, I propose the use of variational autoencoders, deep unsupervised generative models, as an alternative backend for spoofing detection and demonstrate encouraging results when compared with the traditional Gaussian mixture model.
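
    As a rough illustration of the spectral subband modelling idea mentioned in this abstract (not the thesis implementation), the sketch below computes a log power spectrogram and splits its frequency axis into contiguous subbands so that a separate countermeasure model could be trained per band. The STFT settings and the number of bands are assumptions.

```python
# Hypothetical sketch: split a log power spectrogram into spectral subbands
# for per-band countermeasure training. STFT parameters and band count are assumed.
import numpy as np

def log_spectrogram(wave: np.ndarray, n_fft: int = 512, hop: int = 160) -> np.ndarray:
    """Return a (freq_bins, frames) log power spectrogram via a simple framed FFT."""
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(wave) - n_fft + 1, hop):
        seg = wave[start:start + n_fft] * window
        power = np.abs(np.fft.rfft(seg)) ** 2
        frames.append(np.log(power + 1e-10))
    return np.stack(frames, axis=1)

def split_subbands(spec: np.ndarray, n_bands: int = 4):
    """Split the frequency axis into contiguous, roughly equal subbands."""
    return np.array_split(spec, n_bands, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    wave = rng.standard_normal(16000)      # 1 s of noise at 16 kHz as stand-in audio
    spec = log_spectrogram(wave)
    for i, band in enumerate(split_subbands(spec, n_bands=4)):
        print(f"band {i}: {band.shape}")   # each band would feed its own model
```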

    How to Construct Perfect and Worse-than-Coin-Flip Spoofing Countermeasures: A Word of Warning on Shortcut Learning

    Shortcut learning, or the 'Clever Hans' effect, refers to situations where a learning agent (e.g., a deep neural network) learns spurious correlations present in the data, resulting in biased models. We focus on finding shortcuts in deep learning based spoofing countermeasures (CMs) that predict whether a given utterance is spoofed or not. While prior work has addressed specific data artifacts, such as silence, no general normative framework has been explored for analyzing shortcut learning in CMs. In this study, we propose a generic approach to identifying shortcuts by introducing systematic interventions on the training and test sides, including the boundary cases of 'near-perfect' and 'worse than coin flip' (label flip) performance. Using three different models, ranging from classic to state-of-the-art, we demonstrate the presence of shortcut learning in five simulated conditions. We analyze the results using a regression model to understand how biases affect the class-conditional score statistics.
    Comment: Interspeech 202
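
    To make the idea of a controlled shortcut intervention concrete, here is a hypothetical sketch (not the paper's protocol): a class-dependent artifact (leading silence) is planted in the training data and either kept or flipped on the test side, which is one way such 'near-perfect' or 'worse than coin flip' boundary cases can arise. The artifact choice, label convention, and silence length are assumptions.

```python
# Hypothetical shortcut intervention: plant a class-dependent artifact (leading
# silence) in training data, then keep or flip the association at test time.
import numpy as np

def add_leading_silence(wave: np.ndarray, n_samples: int = 1600) -> np.ndarray:
    """Prepend a short run of zeros (silence) to a waveform."""
    return np.concatenate([np.zeros(n_samples, dtype=wave.dtype), wave])

def intervene(waves, labels, flip: bool = False):
    """Attach silence to the spoofed class (label 1), or to bona fide (label 0) if flipped."""
    target = 0 if flip else 1
    return [add_leading_silence(w) if y == target else w
            for w, y in zip(waves, labels)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    waves = [rng.standard_normal(16000) for _ in range(4)]
    labels = [0, 1, 0, 1]                          # 0 = bona fide, 1 = spoofed
    train = intervene(waves, labels, flip=False)   # shortcut aligned with labels -> 'near-perfect'
    test = intervene(waves, labels, flip=True)     # shortcut flipped -> 'worse than coin flip'
    print([len(w) for w in train], [len(w) for w in test])
```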

    Ensemble Models for Spoofing Detection in Automatic Speaker Verification

    Detecting spoofing attempts against automatic speaker verification (ASV) systems is challenging, especially when using only one modelling approach. For robustness, we use both deep neural networks and traditional machine learning models and combine them as ensemble models through logistic regression. They are trained to detect logical access (LA) and physical access (PA) attacks on the dataset released as part of the ASV Spoofing and Countermeasures Challenge 2019. We propose dataset partitions that ensure different attack types are present during training and validation to improve system robustness. Our ensemble model outperforms all our single models and the challenge baselines for both attack types. We investigate why some models on the PA dataset strongly outperform others and find that spoofed recordings in the dataset tend to have longer silences at the end than genuine ones. By removing them, the PA task becomes much more challenging, with the tandem detection cost function (t-DCF) of our best single model rising from 0.1672 to 0.5018 and the equal error rate (EER) increasing from 5.98% to 19.8% on the development set.
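
    As a minimal illustration of score-level fusion with logistic regression as described above (the scores, model count, and data below are synthetic stand-ins, not the authors' systems), a fuser can be trained on development-set scores from several single models and then applied to evaluation scores:

```python
# Hypothetical score-level fusion sketch: combine several models' scores with
# logistic regression. All scores and labels below are synthetic stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_dev = 1000

# One column per single model; rows are development-set utterances.
dev_labels = rng.integers(0, 2, size=n_dev)                      # 0 = spoofed, 1 = bona fide
dev_scores = dev_labels[:, None] + rng.standard_normal((n_dev, 3))  # crude class separation

fuser = LogisticRegression()
fuser.fit(dev_scores, dev_labels)

# Fused score for evaluation utterances: the log-odds of the bona fide class.
eval_scores = rng.standard_normal((5, 3))
print(fuser.decision_function(eval_scores))
```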

    Restrictive Voting Technique for Faces Spoofing Attack

    Face anti-spoofing has become increasingly important due to the growing use of biometric authentication systems that rely on facial recognition. It is a critical issue in biometric authentication systems that aim to prevent unauthorized access. In this paper, we propose a modified version of majority voting that combines the votes of six classifiers over multiple video chunks to improve the accuracy of face anti-spoofing. Our approach involves sampling sub-videos of 2 seconds each with a one-second overlap and classifying each sub-video using multiple classifiers. We then combine the per-chunk classifications across all classifiers to decide the classification of the complete video. We focus on the False Acceptance Rate (FAR) metric to highlight the importance of preventing unauthorized access. We evaluated our method using the Replay Attack dataset and achieved a zero FAR. We also report the Half Total Error Rate (HTER) and Equal Error Rate (EER), achieving better results than most state-of-the-art methods. Our experimental results show that the proposed method significantly reduces the FAR, which is crucial for real-world face anti-spoofing applications.
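
    Below is a minimal sketch of the chunk-and-vote idea, assuming 2-second windows with 1-second overlap and a unanimity rule as one possible 'restrictive' vote; the frame rate, the placeholder classifiers, and the exact voting rule are assumptions rather than the paper's specification.

```python
# Hypothetical chunk-and-vote sketch: 2 s sub-videos with 1 s overlap, scored by
# several placeholder classifiers, accepted only on a unanimous (restrictive) vote.
import numpy as np

def chunk_indices(n_frames: int, fps: int = 25, win_s: int = 2, hop_s: int = 1):
    """Yield (start, end) frame indices for overlapping sub-videos."""
    win, hop = win_s * fps, hop_s * fps
    for start in range(0, max(n_frames - win + 1, 1), hop):
        yield start, min(start + win, n_frames)

def restrictive_vote(decisions: np.ndarray) -> int:
    """Accept (return 1) only if every chunk/classifier decision says 'real'."""
    return int(decisions.all())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = rng.random((125, 64, 64, 3))                 # 5 s of stand-in frames at 25 fps
    classifiers = [lambda clip, t=t: int(clip.mean() > t) for t in (0.49, 0.5, 0.51)]
    votes = np.array([[clf(video[s:e]) for clf in classifiers]
                      for s, e in chunk_indices(len(video))])
    print("decision:", restrictive_vote(votes))          # 1 = real, 0 = spoof
```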

    Analysing and Preventing Self-Issued Voice Commands

    Biometric Presentation Attack Detection for Mobile Devices Using Gaze Information

    Facial recognition systems are among the most widely deployed in biometric applications. However, such systems are vulnerable to presentation attacks (spoofing), where a person tries to impersonate someone else by mimicking their biometric data and thereby gain access to the system. Significant research attention has been directed toward developing robust strategies for detecting such attacks and thus assuring the security of these systems in real-world applications. This thesis is focused on presentation attack detection for face recognition systems using a gaze tracking approach. The proposed challenge-response presentation attack detection system assesses the gaze of the user in response to a randomly moving stimulus on the screen. The user is required to track the moving stimulus with their gaze using natural head/eye movements. If the response is sufficiently similar to the challenge, the access attempt is accepted as genuine. The attack scenarios considered in this work included the use of hand-held displayed photos, 2D masks, and 3D masks. Due to the nature of the proposed challenge-response approaches for presentation attack detection, none of the existing public databases were appropriate, and a new database has been collected. The Kent Gaze Dynamics Database (KGDD) consists of 2,400 sets of genuine and attack-based presentation attempts collected from 80 participants. The use of a mobile device was simulated on a desktop PC for two possible geometries corresponding to mobile phone and tablet devices. Three different types of challenge trajectories were used in this data collection exercise. A number of novel gaze-based features were explored to develop the presentation attack detection algorithm. Initial experiments using the KGDD provided an encouraging indication of the potential of the proposed system for attack detection. In order to explore the feasibility of the scheme on a real hand-held device, another database, the Mobile KGDD (MKGDD), was collected from 30 participants using a single mobile device (Google Nexus 6) to test the proposed features. Comprehensive experimental analysis has been performed on the two collected databases for each of the proposed features. Performance evaluation results indicate that the proposed gaze-based features are effective in discriminating between genuine and presentation attack attempts.
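
    As a simplified illustration of the challenge-response idea (not the thesis algorithm or its features), the sketch below scores an access attempt by correlating the recorded gaze trajectory with the stimulus trajectory and accepting it only above a threshold; the similarity measure and threshold are assumptions.

```python
# Hypothetical challenge-response check: accept only if the gaze trajectory is
# sufficiently correlated with the on-screen stimulus trajectory (assumed metric).
import numpy as np

def trajectory_similarity(gaze: np.ndarray, stimulus: np.ndarray) -> float:
    """Mean Pearson correlation of the x and y coordinate series."""
    corrs = [np.corrcoef(gaze[:, d], stimulus[:, d])[0, 1] for d in range(2)]
    return float(np.mean(corrs))

def is_genuine(gaze: np.ndarray, stimulus: np.ndarray, threshold: float = 0.8) -> bool:
    return trajectory_similarity(gaze, stimulus) >= threshold

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0, 2 * np.pi, 200)
    stimulus = np.stack([np.sin(t), np.cos(t)], axis=1)              # challenge path on screen
    genuine = stimulus + 0.05 * rng.standard_normal(stimulus.shape)  # a live user tracks the path
    attack = rng.standard_normal(stimulus.shape)                     # a photo/mask cannot track it
    print(is_genuine(genuine, stimulus), is_genuine(attack, stimulus))
```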