11 research outputs found

    Robust Audio Anti-Spoofing with Fusion-Reconstruction Learning on Multi-Order Spectrograms

    Full text link
    Robust audio anti-spoofing has been increasingly challenging due to the recent advancements on deepfake techniques. While spectrograms have demonstrated their capability for anti-spoofing, complementary information presented in multi-order spectral patterns have not been well explored, which limits their effectiveness for varying spoofing attacks. Therefore, we propose a novel deep learning method with a spectral fusion-reconstruction strategy, namely S2pecNet, to utilise multi-order spectral patterns for robust audio anti-spoofing representations. Specifically, spectral patterns up to second-order are fused in a coarse-to-fine manner and two branches are designed for the fine-level fusion from the spectral and temporal contexts. A reconstruction from the fused representation to the input spectrograms further reduces the potential fused information loss. Our method achieved the state-of-the-art performance with an EER of 0.77% on a widely used dataset: ASVspoof2019 LA Challenge

    INCREASING ROBUSTNESS OF I-VECTORS VIA MASKING: A CASE STUDY IN SYNTHETIC SPEECH DETECTION

    Get PDF
    Ensuring security in speaker recognition systems is crucial. In the past years, it has been demonstrated that spoofing attacks can fool these systems. In order to deal with this issue, spoof speech detection systems have been developed. While these systems have served with a good performance, their effectiveness tends to degrade under noise. Traditional speech enhancement methods are not efficient for improving performance, they even make it worse. In this research paper, performance of the noise mask obtained via a convolutional neural network structure for reducing the noise effects was investigated. The mask is used to suppress noisy regions of spectrograms in order to extract robust i-vectors. The proposed system is tested on the ASVspoof 2015 database with three different noise types and accomplished superior performance compared to the traditional systems. However, there is a loss of performance in noise types that are not encountered during training phase

    An Investigation of Deep-Learning Frameworks for Speaker Verification Antispoofing

    No full text

    Voice biometric system security: Design and analysis of countermeasures for replay attacks.

    Get PDF
    PhD ThesisVoice biometric systems use automatic speaker veri cation (ASV) technology for user authentication. Even if it is among the most convenient means of biometric authentication, the robustness and security of ASV in the face of spoo ng attacks (or presentation attacks) is of growing concern and is now well acknowledged by the research community. A spoo ng attack involves illegitimate access to personal data of a targeted user. Replay is among the simplest attacks to mount | yet di cult to detect reliably and is the focus of this thesis. This research focuses on the analysis and design of existing and novel countermeasures for replay attack detection in ASV, organised in two major parts. The rst part of the thesis investigates existing methods for spoo ng detection from several perspectives. I rst study the generalisability of hand-crafted features for replay detection that show promising results on synthetic speech detection. I nd, however, that it is di cult to achieve similar levels of performance due to the acoustically di erent problem under investigation. In addition, I show how class-dependent cues in a benchmark dataset (ASVspoof 2017) can lead to the manipulation of class predictions. I then analyse the performance of several countermeasure models under varied replay attack conditions. I nd that it is di cult to account for the e ects of various factors in a replay attack: acoustic environment, playback device and recording device, and their interactions. Subsequently, I developed and studied a convolutional neural network (CNN) model that demonstrates comparable performance to the one that ranked rst in the ASVspoof 2017 challenge. Here, the experiment analyses what the CNN has learned for replay detection using a method from interpretable machine learning. The ndings suggest that the model highly attends at the rst few milliseconds of test recordings in order to make predictions. Then, I perform an in-depth analysis of a benchmark dataset (ASVspoof 2017) for spoo ng detection and demonstrate that any machine learning countermeasure model can still exploit the artefacts I identi ed in this dataset. The second part of the thesis studies the design of countermeasures for ASV, focusing on model robustness and avoiding dataset biases. First, I proposed an ensemble model combining shallow and deep machine learning methods for spoo ng detection, and then demonstrate its e ectiveness on the latest benchmark datasets (ASVspoof 2019). Next, I proposed the use of speech endpoint detection for reliable and robust model predictions on the ASVspoof 2017 dataset. For this, I created a publicly available collection of hand-annotations of speech endpoints for the same dataset, and new benchmark results for both frame-based and utterance-based countermeasures are also developed. I then proposed spectral subband modelling using CNNs for replay detection. My results indicate that models that learn subband-speci c information substantially outperform models trained on complete spectrograms. Finally, I proposed to use variational autoencoders | deep unsupervised generative models | as an alternative backend for spoo ng detection and demonstrate encouraging results when compared with the traditional Gaussian mixture mode
    corecore