A Review on Face Anti-Spoofing
A biometric system is a security technology that uses information derived from a living person's characteristics, such as the face, to verify or recognize identity. Face recognition has numerous real-world applications, such as access control and surveillance, but it is vulnerable to spoofing. Face anti-spoofing is the task of preventing false authorization in which a face recognition system is breached with a photo, video, mask, or other substitute for an authorized person's face. There is also growing research on new datasets that introduce new attack types or greater diversity to achieve better generalization. This paper reviews recent developments, covering a general understanding of face spoofing, anti-spoofing methods, and the latest approaches for handling various spoof types.
PipeNet: Selective Modal Pipeline of Fusion Network for Multi-Modal Face Anti-Spoofing
Face anti-spoofing has become an increasingly important and critical security
feature for authentication systems, due to rampant and easily launchable
presentation attacks. Addressing the shortage of multi-modal face dataset,
CASIA recently released the largest up-to-date CASIA-SURF Cross-ethnicity Face
Anti-spoofing (CeFA) dataset, covering 3 ethnicities, 3 modalities, 1607
subjects, and 2D plus 3D attack types in four protocols, and focusing on the
challenge of improving the generalization capability of face anti-spoofing in
cross-ethnicity and multi-modal continuous data. In this paper, we propose a
novel pipeline-based multi-stream CNN architecture called PipeNet for
multi-modal face anti-spoofing. Unlike previous works, Selective Modal Pipeline
(SMP) is designed to enable a customized pipeline for each data modality to
take full advantage of multi-modal data. Limited Frame Vote (LFV) is designed
to ensure stable and accurate prediction for video classification. The proposed
method won third place in the final ranking of the ChaLearn Multi-modal
Cross-ethnicity Face Anti-spoofing Recognition Challenge@CVPR2020. Our final
submission achieves the Average Classification Error Rate (ACER) of 2.21 with
Standard Deviation of 1.26 on the test set.
Comment: Accepted to appear in CVPR2020 WM
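The ACER figure quoted above can be illustrated with a minimal sketch. It assumes the common definition of ACER as the mean of the attack presentation classification error rate (APCER) and the bona fide presentation classification error rate (BPCER), with a single attack type and hard decisions. The function name `acer` and the label convention (1 = bona fide, 0 = attack) are illustrative assumptions, not taken from the paper.

```python
def acer(labels, predictions):
    """Return (APCER, BPCER, ACER) for binary decisions.

    Assumed convention: 1 = bona fide (live) face, 0 = presentation attack.
    """
    attacks = [p for y, p in zip(labels, predictions) if y == 0]
    bona_fide = [p for y, p in zip(labels, predictions) if y == 1]
    if not attacks or not bona_fide:
        raise ValueError("need at least one attack and one bona fide sample")

    # APCER: fraction of attack samples wrongly accepted as bona fide.
    apcer = sum(1 for p in attacks if p == 1) / len(attacks)
    # BPCER: fraction of bona fide samples wrongly rejected as attacks.
    bpcer = sum(1 for p in bona_fide if p == 0) / len(bona_fide)
    # ACER: the average of the two error rates.
    return apcer, bpcer, (apcer + bpcer) / 2


if __name__ == "__main__":
    y_true = [0, 0, 0, 0, 1, 1, 1, 1]
    y_pred = [0, 0, 0, 1, 1, 1, 0, 0]
    print(acer(y_true, y_pred))
```

With several attack types, evaluation protocols often take the worst-case APCER across types before averaging; that variant is not shown here.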
Benchmarking Joint Face Spoofing and Forgery Detection with Visual and Physiological Cues
Face anti-spoofing (FAS) and face forgery detection play vital roles in
securing face biometric systems from presentation attacks (PAs) and vicious
digital manipulation (e.g., deepfakes). Despite promising performance upon
large-scale data and powerful deep models, the generalization problem of
existing approaches is still an open issue. Most recent approaches focus on
1) unimodal visual appearance or physiological (i.e., remote
photoplethysmography (rPPG)) cues; and 2) separated feature representation for
FAS or face forgery detection. On one side, unimodal appearance and rPPG
features are respectively vulnerable to high-fidelity face 3D mask and video
replay attacks, inspiring us to design reliable multi-modal fusion mechanisms
for generalized face attack detection. On the other side, there are rich common
features across FAS and face forgery detection tasks (e.g., periodic rPPG
rhythms and vanilla appearance for bonafides), providing solid evidence to
design a joint FAS and face forgery detection system in a multi-task learning
fashion. In this paper, we establish the first joint face spoofing and forgery
detection benchmark using both visual appearance and physiological rPPG cues.
To enhance the rPPG periodicity discrimination, we design a two-branch
physiological network using both facial spatio-temporal rPPG signal map and its
continuous wavelet transformed counterpart as inputs. To mitigate the modality
bias and improve the fusion efficacy, we conduct a weighted batch and layer
normalization for both appearance and rPPG features before multi-modal fusion.
We find that the generalization capacities of both unimodal (appearance or
rPPG) and multi-modal (appearance+rPPG) models can be clearly improved via
joint training on these two tasks. We hope this new benchmark will facilitate
the future research of both FAS and deepfake detection communities.
Comment: Accepted by IEEE Transactions on Dependable and Secure Computing
(TDSC). Corresponding authors: Zitong Yu and Wenhan Yan
Taming Self-Supervised Learning for Presentation Attack Detection: De-Folding and De-Mixing
Biometric systems are vulnerable to Presentation Attacks (PA) performed using
various Presentation Attack Instruments (PAIs). Even though there are numerous
Presentation Attack Detection (PAD) techniques based on both deep learning and
hand-crafted features, the generalization of PAD for unknown PAI is still a
challenging problem. In this work, we empirically prove that the initialization
of the PAD model is a crucial factor for the generalization, which is rarely
discussed in the community. Based on this observation, we propose a
self-supervised learning-based method, denoted as DF-DM. Specifically, DF-DM is
based on a global-local view coupled with De-Folding and De-Mixing to derive
the task-specific representation for PAD. During De-Folding, the proposed
technique will learn region-specific features to represent samples in a local
pattern by explicitly minimizing a generative loss, while De-Mixing drives
detectors to obtain the instance-specific features with global information for
more comprehensive representation by minimizing interpolation-based
consistency. Extensive experimental results show that the proposed method can
achieve significant improvements in terms of both face and fingerprint PAD in
more complicated and hybrid datasets when compared with state-of-the-art
methods. When trained on CASIA-FASD and Idiap Replay-Attack, the proposed
method can achieve an 18.60% Equal Error Rate (EER) on OULU-NPU and MSU-MFSD,
exceeding baseline performance by 9.54%. The source code of the proposed
technique is available at https://github.com/kongzhecn/dfdm.
Comment: Accepted by IEEE Transactions on Neural Networks and Learning Systems
(TNNLS)
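For readers unfamiliar with the Equal Error Rate quoted in this abstract, the following minimal sketch shows one way it can be estimated from detector scores. It assumes that higher scores mean "more likely bona fide", and it scans the observed scores as candidate thresholds to find the point where the false acceptance and false rejection rates are closest. The function name `equal_error_rate` and the label convention are illustrative assumptions, and this is not the evaluation code used by the authors.

```python
def equal_error_rate(scores, labels):
    """Estimate the EER from scores (higher = more bona fide).

    Assumed convention: label 1 = bona fide, label 0 = attack.
    """
    attacks = [s for s, y in zip(scores, labels) if y == 0]
    bona_fide = [s for s, y in zip(scores, labels) if y == 1]
    if not attacks or not bona_fide:
        raise ValueError("need at least one attack and one bona fide sample")

    best_gap, eer = None, None
    for threshold in sorted(set(scores)):
        # False acceptance rate: attacks scoring at or above the threshold.
        far = sum(1 for s in attacks if s >= threshold) / len(attacks)
        # False rejection rate: bona fide samples scoring below the threshold.
        frr = sum(1 for s in bona_fide if s < threshold) / len(bona_fide)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # Report the midpoint where the two rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


if __name__ == "__main__":
    print(equal_error_rate([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

In a cross-dataset setting such as the one described, the scores would come from a model trained on the source datasets and applied to the target datasets; finer estimates interpolate between thresholds rather than picking the closest one.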
STIDNet: Identity-Aware Face Forgery Detection with Spatiotemporal Knowledge Distillation
The impressive development of facial manipulation techniques has raised severe public concerns. Identity-aware methods, which use an additional reference video and are especially suitable for protecting celebrities, are seen as one of the promising approaches to face forgery detection. However, without in-depth observation of the characteristics of fake videos, most existing identity-aware algorithms merely imitate face verification models and fail to exploit discriminative information. In this article, we argue that both spatial and temporal perspectives must be considered to obtain adequate inconsistency clues, and we propose a novel forgery detector named SpatioTemporal IDentity network (STIDNet). To effectively capture heterogeneous spatiotemporal information in a unified formulation, STIDNet follows a knowledge distillation architecture in which the student identity extractor receives supervision from a spatial information encoder (SIE) and a temporal information encoder (TIE) through multiteacher training. Specifically, a region-sensitive identity modelling paradigm is proposed in the SIE: facial blending augmentation is introduced while the identity label is kept uniform, which encourages the model to focus on spatially discriminative regions such as the outer face. Meanwhile, considering the strong temporal correlation between audio and talking-face video, the TIE is devised in a cross-modal pattern in which audio information supervises the model in exploiting personalized temporal movements. Benefiting from knowledge transfer from the SIE and TIE, STIDNet is able to capture an individual’s essential spatiotemporal identity attributes and is sensitive to even subtle identity deviations caused by manipulation. Extensive experiments indicate the superiority of STIDNet over previous works. Moreover, we demonstrate that STIDNet is more suitable for real-world implementation in terms of model complexity and reference set size.
Deep spatial gradient and temporal depth learning for face anti-spoofing
Face anti-spoofing is critical to the security of face recognition systems. Depth supervised learning has proven to be one of the most effective methods for face anti-spoofing. Despite this success, most previous works still formulate the problem as a single-frame multi-task one by simply augmenting the loss with depth, while neglecting the detailed fine-grained information and the interplay between facial depths and moving patterns. In contrast, we design a new approach to detect presentation attacks from multiple frames based on two insights: 1) detailed discriminative clues (e.g., spatial gradient magnitude) between living and spoofing faces may be discarded through stacked vanilla convolutions, and 2) the dynamics of 3D moving faces provide important clues for detecting spoofing faces. The proposed method is able to capture discriminative details via a Residual Spatial Gradient Block (RSGB) and to encode spatio-temporal information efficiently with a Spatio-Temporal Propagation Module (STPM). Moreover, a novel Contrastive Depth Loss is presented for more accurate depth supervision. To assess the efficacy of our method, we also collect a Double-modal Anti-spoofing Dataset (DMAD), which provides actual depth for each sample. The experiments demonstrate that the proposed approach achieves state-of-the-art results on five benchmark datasets including OULU-NPU, SiW, CASIA-MFSD, Replay-Attack, and the new DMAD. Code will be available at https://github.com/clks-wzz/FAS-SGTD
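The abstract does not state the formula of the Contrastive Depth Loss, so the following is only a sketch of one plausible reading: instead of comparing predicted and ground-truth depth values pixel by pixel, compare the differences between each pixel and its eight neighbors. The eight-neighbor formulation, the mean-squared-error reduction, and the names `contrast_maps` and `contrastive_depth_loss` are assumptions made for illustration; depth maps are plain nested lists to keep the example dependency-free.

```python
# Offsets of the eight neighbors around a center pixel (assumed formulation).
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1),
             (0, -1),           (0, 1),
             (1, -1),  (1, 0),  (1, 1)]


def contrast_maps(depth):
    """Return eight maps of (neighbor - center) over the interior pixels."""
    h, w = len(depth), len(depth[0])
    if h < 3 or w < 3:
        raise ValueError("depth map must be at least 3x3")
    return [
        [[depth[y + dy][x + dx] - depth[y][x] for x in range(1, w - 1)]
         for y in range(1, h - 1)]
        for dy, dx in NEIGHBORS
    ]


def contrastive_depth_loss(pred, target):
    """Mean squared error between the contrast maps of two depth maps."""
    total, count = 0.0, 0
    for pred_map, target_map in zip(contrast_maps(pred), contrast_maps(target)):
        for pred_row, target_row in zip(pred_map, target_map):
            for p, t in zip(pred_row, target_row):
                total += (p - t) ** 2
                count += 1
    return total / count


if __name__ == "__main__":
    target = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
    flat = [[0.0] * 3 for _ in range(3)]
    print(contrastive_depth_loss(flat, target))
```

Under this reading, the loss ignores a constant offset in the predicted depth and penalizes errors in local depth structure, which is consistent with the stated aim of more accurate depth supervision.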