111 research outputs found

    One-shot lip-based biometric authentication: extending behavioral features with authentication phrase information

    Lip-based biometric authentication (LBBA) authenticates a person from the lip movements made during speech, captured as video by a camera sensor. LBBA can utilize both physical and behavioral characteristics of lip movements without requiring any sensory equipment beyond an RGB camera. State-of-the-art (SOTA) approaches use one-shot learning to train deep siamese neural networks that produce an embedding vector from these features; the embeddings are then used to compute the similarity between an enrolled user and the user being authenticated. A flaw of these approaches is that they model behavioral features purely as style of speech, without relation to what is being said, which leaves the system vulnerable to replay attacks using video of the client speaking any phrase. To solve this problem, we propose a one-shot approach that models behavioral features to discriminate on what is being said in addition to style of speech. We achieve this by customizing the GRID dataset to obtain the required triplets and training a siamese neural network based on 3D convolutions and recurrent neural network layers. A custom triplet loss for batch-wise hard-negative mining is proposed. Using an open-set protocol, the obtained results are 3.2% FAR and 3.8% FRR on the test set of the customized GRID dataset. An additional analysis quantifies the influence and discriminatory power of behavioral and physical features for LBBA.
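
    The custom triplet loss with batch-wise hard-negative mining mentioned above is not spelled out in the abstract; the following is a minimal PyTorch sketch of the common batch-hard formulation, where the margin value and the Euclidean distance metric are assumptions rather than the paper's exact choices.

```python
import torch

def batch_hard_triplet_loss(embeddings: torch.Tensor,
                            labels: torch.Tensor,
                            margin: float = 0.2) -> torch.Tensor:
    """For each anchor, use the farthest same-label embedding as the
    hardest positive and the closest different-label embedding as the
    hardest negative (illustrative sketch, not the paper's exact loss)."""
    dists = torch.cdist(embeddings, embeddings)        # (B, B) pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # (B, B) same-identity mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)

    # Hardest positive: maximum distance among same-label pairs, excluding self.
    pos = dists.masked_fill(~same | eye, float('-inf')).max(dim=1).values
    # Hardest negative: minimum distance among different-label pairs.
    neg = dists.masked_fill(same, float('inf')).min(dim=1).values

    return torch.relu(pos - neg + margin).mean()
```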

    Face Anti-Spoofing and Deep Learning Based Unsupervised Image Recognition Systems

    One of the main problems of a supervised deep learning approach is that it requires large amounts of labeled training data, which are not always easily available. This PhD dissertation addresses this problem with a novel unsupervised deep learning face verification system called UFace, which does not require labeled training data: it automatically, in an unsupervised way, generates training data from even a relatively small dataset. The method starts by selecting, in an unsupervised way, the k most similar and k most dissimilar images for a given face image. The dissertation also proposes a new loss function suited to this method. Specifically, the loss is computed k times for the similar images and k times for the dissimilar images of each input image, in order to increase the discriminative power of the feature vectors and to learn inter-class and intra-class face variability. Training is carried out on the similar and dissimilar input face image vectors, rather than on the same training input face image vector, in order to extract face embeddings. UFace is evaluated on four benchmark face verification datasets: Labeled Faces in the Wild (LFW), YouTube Faces (YTF), Cross-age LFW (CALFW) and Celebrities in Frontal Profile in the Wild (CFP-FP). It achieves accuracies of 99.40%, 96.04%, 95.12% and 97.89%, respectively; despite being unsupervised, these results are on par with similar but fully supervised methods.

    A related area of research is face anti-spoofing. State-of-the-art face anti-spoofing systems use either deep learning or manually extracted image quality features. However, many of the image quality features used in existing systems do not discriminate well between spoofed and genuine faces, and state-of-the-art systems based on deep learning do not generalize well. To address this, the dissertation proposes a hybrid face anti-spoofing system that takes the best from both the image quality feature and deep learning approaches. It selects and proposes a set of seven novel no-reference image quality measures that discriminate well between spoofed and genuine faces, to complement the deep learning approach, and then proposes two approaches. In the first, the scores from the image quality features are fused with the deep learning classifier scores in a weighted fashion, and the combined score determines whether a given input face image is genuine or spoofed (see the sketch below). In the second, the image quality features are concatenated with the deep learning features, and the concatenated feature vector is fed to the classifier to improve the performance and generalization of the anti-spoofing system. Extensive evaluations are conducted on five benchmark face anti-spoofing datasets: Replay-Attack, CASIA-MFSD, MSU-MFSD, OULU-NPU and SiW. Experiments on these datasets show better results than several state-of-the-art anti-spoofing systems in many scenarios.
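
    A minimal sketch of the weighted score-level fusion in the first approach; the weights, the assumption that all scores are liveness probabilities in [0, 1], and the decision threshold are illustrative, not the dissertation's tuned values.

```python
import numpy as np

def fused_decision(quality_scores: np.ndarray,  # (7,) scores, one per quality measure
                   dl_score: float,             # deep learning classifier score
                   weights: np.ndarray,         # (7,) per-feature fusion weights
                   w_dl: float = 0.5,           # weight of the deep learning branch
                   threshold: float = 0.5) -> bool:
    """Weighted fusion of image-quality-feature scores with a deep learning
    classifier score; returns True if the face is judged genuine."""
    # Weighted average of the seven no-reference image quality scores.
    quality_score = float(weights @ quality_scores) / float(weights.sum())
    fused = (1.0 - w_dl) * quality_score + w_dl * dl_score
    return bool(fused >= threshold)
```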

    EfficientWord-Net: An Open Source Hotword Detection Engine based on One-shot Learning

    Voice assistants like Siri, Google Assistant and Alexa are used widely across the globe for home automation. They require special phrases, known as hotwords, such as "Hey Alexa!", "Ok Google!" and "Hey Siri!", to wake them up and perform an action; these hotwords are detected by lightweight real-time engines. This paper presents the design and implementation of a hotword detection engine based on one-shot learning, which detects a hotword uttered by the user in real time with just one or a few training samples of that hotword. This approach is efficient compared to existing implementations, where adding a new hotword requires enormous amounts of positive and negative training samples and the model must be retrained for every hotword, making those systems inefficient in terms of computation and cost. The architecture proposed in this paper achieves an accuracy of 94.51%.
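
    A minimal sketch of the one-shot enrollment-and-matching idea described above, assuming an embedding network trained for similarity; `embed_audio` and the decision threshold are placeholders, not EfficientWord-Net's actual API.

```python
import numpy as np

def enroll(samples: list, embed_audio) -> np.ndarray:
    """Average the embeddings of one or a few recordings of the hotword
    into a single unit-norm reference vector."""
    embs = np.stack([embed_audio(s) for s in samples])
    ref = embs.mean(axis=0)
    return ref / np.linalg.norm(ref)

def is_hotword(window: np.ndarray, ref: np.ndarray, embed_audio,
               threshold: float = 0.85) -> bool:
    """Compare a live audio window against the enrolled reference by
    cosine similarity; fires when the similarity clears the threshold."""
    e = embed_audio(window)
    e = e / np.linalg.norm(e)
    return float(e @ ref) >= threshold
```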

    Learning Domain Invariant Information to Enhance Presentation Attack Detection in Visible Face Recognition Systems

    Face signatures, including size, shape, texture, skin tone, eye color, appearance, and scars/marks, are widely used as discriminative, biometric information for access control. Despite recent advancements in facial recognition systems, presentation attacks on facial recognition systems have become increasingly sophisticated. The ability to detect presentation attacks or spoofing attempts is a pressing concern for the integrity, security, and trust of facial recognition systems. Multi-spectral imaging has been previously introduced as a way to improve presentation attack detection by utilizing sensors that are sensitive to different regions of the electromagnetic spectrum (e.g., visible, near infrared, long-wave infrared). Although multi-spectral presentation attack detection systems may be discriminative, the need for additional sensors and computational resources substantially increases complexity and costs. Instead, we propose a method that exploits information from infrared imagery during training to increase the discriminability of visible-based presentation attack detection systems. We introduce (1) a new cross-domain presentation attack detection framework that increases the separability of bonafide and presentation attacks using only visible spectrum imagery, (2) an inverse domain regularization technique for added training stability when optimizing our cross-domain presentation attack detection framework, and (3) a dense domain adaptation subnetwork to transform representations between visible and non-visible domains.
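
    The cross-domain framework and dense domain adaptation subnetwork are described only at a high level above; the following is a speculative sketch of one way a training step could combine a visible-spectrum PAD loss with a visible-to-infrared alignment term. All module names (`backbone`, `adapter`, `classifier`) and the loss weighting are assumptions, not the dissertation's actual design.

```python
import torch
import torch.nn.functional as F

def training_step(vis_batch: torch.Tensor,   # visible-spectrum face images
                  ir_batch: torch.Tensor,    # paired infrared images (train-time only)
                  labels: torch.Tensor,      # 0 = attack, 1 = bonafide
                  backbone, adapter, classifier,
                  alpha: float = 0.1) -> torch.Tensor:
    """PAD classification loss on visible features plus a cross-domain
    alignment term; at test time only the visible branch is used."""
    f_vis = backbone(vis_batch)
    f_ir = backbone(ir_batch)
    # Dense adaptation: map visible features toward the infrared domain.
    f_vis_to_ir = adapter(f_vis)
    pad_loss = F.cross_entropy(classifier(f_vis), labels)
    align_loss = F.mse_loss(f_vis_to_ir, f_ir.detach())
    return pad_loss + alpha * align_loss
```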

    Backdoor Attacks and Defences on Deep Neural Networks

    Nowadays, due to the huge amount of resources required for network training, pre-trained models are commonly exploited in all kinds of deep learning tasks, such as image classification and natural language processing. These models are deployed directly in real environments, or fine-tuned on a limited set of data collected, for instance, from the Internet. A natural question arises: can we trust pre-trained models or the data downloaded from the Internet? The answer is 'No'. An attacker can easily perform a so-called backdoor attack, hiding a backdoor in a pre-trained model by poisoning the dataset used for training, or indirectly by releasing poisoned data on the Internet as bait. Such an attack is stealthy: the hidden backdoor does not affect the behaviour of the network in normal operating conditions, and the malicious behaviour is activated only when a triggering signal is presented at the network input. In this thesis, we present a general framework for backdoor attacks and defences, and review state-of-the-art backdoor attacks and the corresponding defences in the field of image classification by casting them in the introduced framework. Focusing on the face recognition domain, we propose two new backdoor attacks, effective under different threat models. Finally, we design a universal method to defend against backdoor attacks regardless of the specific attack setting, namely the poisoning strategy and the triggering signal.
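
    As an illustration of the poisoning strategy described above, here is a minimal sketch of corrupting a training set with a fixed visual trigger; the patch shape, location, and poisoning rate are arbitrary illustrative choices, not taken from the thesis.

```python
import numpy as np

def poison_dataset(images: np.ndarray,   # (N, H, W, C) floats in [0, 1] (assumed layout)
                   labels: np.ndarray,   # (N,) integer class labels
                   target_class: int,
                   rate: float = 0.05,
                   patch_size: int = 4):
    """Stamp a small trigger patch onto a random fraction of the training
    images and relabel them with the attacker's target class."""
    images, labels = images.copy(), labels.copy()
    rng = np.random.default_rng(0)
    idx = rng.choice(len(images), size=int(rate * len(images)), replace=False)
    # White square trigger in the bottom-right corner of each poisoned image.
    images[idx, -patch_size:, -patch_size:, :] = 1.0
    labels[idx] = target_class  # malicious label flipping
    return images, labels
```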

    A Study of Using Cepstrogram for Countermeasure Against Replay Attacks

    In this paper, we investigate the properties of the cepstrogram and demonstrate its effectiveness as a powerful feature for countermeasures against replay attacks. Cepstrum analysis of replay attacks suggests that crucial information for anti-spoofing against replay attacks may be retained in the cepstrogram. Experimental results on the ASVspoof 2019 physical access (PA) database demonstrate that, compared with other features, the cepstrogram dominates in both single and fusion systems when building countermeasures against replay attacks. Our LCNN-based single and fusion systems with the cepstrogram feature outperform the corresponding LCNN-based systems without it, as well as several state-of-the-art (SOTA) single and fusion systems in the literature.
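
    The abstract does not define the cepstrogram; a common construction is the frame-wise real cepstrum (inverse FFT of the log magnitude spectrum) stacked over time, sketched below with typical speech-processing frame settings that are assumptions rather than the paper's configuration.

```python
import numpy as np

def cepstrogram(signal: np.ndarray, frame_len: int = 512,
                hop: int = 256) -> np.ndarray:
    """Frame-wise real cepstrum stacked over time: (quefrency, time)."""
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        spectrum = np.fft.rfft(frame)
        log_mag = np.log(np.abs(spectrum) + 1e-10)     # avoid log(0)
        cepstrum = np.fft.irfft(log_mag, n=frame_len)  # real cepstrum
        frames.append(cepstrum[:frame_len // 2])       # keep low quefrencies
    return np.stack(frames, axis=1)
```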

    Deep Generative Variational Autoencoding for Replay Spoof Detection in Automatic Speaker Verification

    Automatic speaker verification (ASV) systems are highly vulnerable to presentation attacks, also called spoofing attacks. Replay is among the simplest attacks to mount, yet difficult to detect reliably. The generalization failure of spoofing countermeasures (CMs) has driven the community to study various alternative deep learning CMs, the majority of which are supervised approaches that learn a human-spoof discriminator. In this paper, we advocate a different, deep generative approach that leverages powerful unsupervised manifold learning for classification. The potential benefits include the possibility to sample new data and to obtain insights into the latent features of genuine and spoofed speech. To this end, we propose variational autoencoders (VAEs) as an alternative backend for replay attack detection, via three alternative models that differ in their class-conditioning. The first, similar to the use of Gaussian mixture models (GMMs) in spoof detection, trains two VAEs independently, one for each class. The second trains a single conditional model (C-VAE) by injecting a one-hot class label vector into the encoder and decoder networks. Our final proposal integrates an auxiliary classifier to guide the learning of the latent space. Our experimental results using constant-Q cepstral coefficient (CQCC) features on the ASVspoof 2017 and 2019 physical access subtask datasets indicate that the C-VAE offers a substantial improvement over training two separate VAEs, one per class. On the 2019 dataset, the C-VAE outperforms the VAE and the baseline GMM by an absolute 9-10% in both equal error rate (EER) and tandem detection cost function (t-DCF) metrics. Finally, we propose VAE residuals, the absolute difference between the original input and its reconstruction, as features for spoofing detection. The proposed frontend approach, augmented with a convolutional neural network classifier, demonstrated substantial improvement over the VAE backend use case.
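
    A minimal sketch of two ingredients named above: one-hot class conditioning of a VAE (C-VAE) and the VAE-residual feature. `encoder` and `decoder` are placeholders for trained modules; the architecture is not taken from the paper.

```python
import torch
import torch.nn.functional as F

def cvae_forward(x: torch.Tensor, label: int, n_classes: int,
                 encoder, decoder) -> torch.Tensor:
    """Inject a one-hot class vector into both encoder and decoder (C-VAE)."""
    y = F.one_hot(torch.tensor([label]), n_classes).float().expand(len(x), -1)
    mu, logvar = encoder(x, y)
    # Reparameterization trick: sample z ~ N(mu, sigma^2).
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
    return decoder(z, y)

def vae_residual(x: torch.Tensor, x_hat: torch.Tensor) -> torch.Tensor:
    """Residual feature for the CNN frontend: elementwise |input - reconstruction|."""
    return (x - x_hat).abs()
```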

    Audio Deepfake Detection: A Survey

    Audio deepfake detection is an emerging and active topic. A growing body of literature has studied deepfake detection algorithms and achieved effective performance, yet the problem is far from solved. Although some reviews exist, there has been no comprehensive survey that provides researchers with a systematic overview of these developments together with a unified evaluation. Accordingly, in this survey paper, we first highlight the key differences across various types of deepfake audio, then outline and analyse the competitions, datasets, features, classifiers, and evaluation of state-of-the-art approaches. For each aspect, the basic techniques, advanced developments and major challenges are discussed. In addition, we perform a unified comparison of representative features and classifiers on the ASVspoof 2021, ADD 2023 and In-the-Wild datasets for audio deepfake detection. The survey shows that future research should address the lack of large-scale in-the-wild datasets, the poor generalization of existing detection methods to unknown fake attacks, and the interpretability of detection results.