Deep Learning for Face Anti-Spoofing: A Survey
Face anti-spoofing (FAS) has lately attracted increasing attention due to its
vital role in securing face recognition systems from presentation attacks
(PAs). As more and more realistic PAs with novel types spring up, traditional
FAS methods based on handcrafted features become unreliable due to their
limited representation capacity. With the emergence of large-scale academic
datasets in the recent decade, deep-learning-based FAS has achieved remarkable
performance and now dominates this area. However, existing reviews in this field
mainly focus on handcrafted features, which are outdated and offer little
inspiration for the progress of the FAS community. In this paper, to stimulate
future research, we present the first comprehensive review of recent advances in
deep-learning-based FAS. It covers several novel and insightful components: 1) besides
supervision with binary labels (e.g., '0' for bonafide vs. '1' for PAs), we also
investigate recent methods with pixel-wise supervision (e.g., pseudo depth
maps); 2) in addition to traditional intra-dataset evaluation, we collect and
analyze the latest methods specially designed for domain generalization and
open-set FAS; and 3) besides commercial RGB cameras, we summarize deep
learning applications with multi-modal (e.g., depth and infrared) or
specialized (e.g., light field and flash) sensors. We conclude this survey by
emphasizing current open issues and highlighting potential prospects.
Comment: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
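To make the contrast between binary and pixel-wise supervision concrete, here is a minimal Python sketch of the two loss types, using the survey's labelling ('0' for bonafide, '1' for PAs). The pixel-wise target convention (a bonafide face regresses to its pseudo depth map, an attack to an all-zero map) is a common choice in the FAS literature and is assumed here. The function names are hypothetical, and the network that produces the logit or map is left out.

```python
import math

# Label convention from the survey: 0 = bonafide, 1 = presentation attack (PA).

def binary_bce(logit, label):
    """Binary cross-entropy for one sample, computed stably from a raw logit
    (the logit is the model's score for the PA class)."""
    # log(1 + e^z) - y * z, rearranged so exp() never overflows for large |z|
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def pixelwise_depth_loss(pred_map, label, pseudo_depth):
    """Mean squared error between a predicted map and its pixel-wise target.
    Assumed convention: bonafide (label 0) regresses to its pseudo depth map,
    a PA (label 1) regresses to an all-zero (flat) map."""
    rows, cols = len(pred_map), len(pred_map[0])
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            target = pseudo_depth[i][j] if label == 0 else 0.0
            total += (pred_map[i][j] - target) ** 2
    return total / (rows * cols)

pseudo = [[0.2, 0.5], [0.5, 0.8]]
print(binary_bce(0.0, 1))                       # log(2), about 0.693
print(pixelwise_depth_loss(pseudo, 0, pseudo))  # 0.0: perfect bonafide prediction
print(pixelwise_depth_loss(pseudo, 1, pseudo))  # about 0.295: a PA should map to zeros
```

The pixel-wise loss gives the network a dense, spatially local training signal for every pixel, whereas the binary loss provides a single scalar per image.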
Unmasking the imposters: towards improving the generalisation of deep learning methods for face presentation attack detection.
Identity theft has undermined the reliability of face recognition, which is extensively employed in security applications, and presentation attacks are its most prevalent form: by using a photo, video, or mask of an authorized user, attackers can bypass face recognition systems. Face presentation attack detection is used to detect such fake presentations at the camera sensors of face recognition systems. Convolutional neural networks, which are commonly used in computer vision applications, can be applied to detect presentation attacks. This research presents an in-depth analysis of current deep learning methods to examine various aspects of detecting face presentation attacks. A number of new techniques are implemented and evaluated, including pre-trained models, manual feature extraction, and data aggregation. The thesis explores how effectively various machine learning and deep learning models improve detection performance on publicly available datasets, using dataset partitions that differ from those specified in the official dataset protocols. Furthermore, the research investigates how deep models and data aggregation can be used to detect face presentation attacks, and proposes a novel approach that combines manual features with deep features to improve detection accuracy. Moreover, task-specific features are extracted using pre-trained deep models to further enhance detection performance and generalisation. The work is motivated by the need to generalise to new and rapidly evolving attack variants. Although identifiable features can be extracted from known presentation attack variants in order to detect them, new methods are needed to deal with emerging attacks and improve generalisation capability. This thesis examines the measures necessary to detect face presentation attacks in a more robust and generalised manner.
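The idea of combining manual features with deep features can be illustrated by simple early fusion. The sketch below assumes local binary patterns (LBP) as the handcrafted descriptor and treats the deep embedding as a ready-made vector from some pre-trained model. Neither choice is taken from the thesis, and all function names are hypothetical.

```python
import math

def lbp_histogram(gray):
    """Basic 8-neighbour, radius-1 LBP over interior pixels of a 2-D list,
    returned as a normalised 256-bin histogram."""
    # Neighbours visited clockwise from the top-left; each contributes one bit.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    h, w = len(gray), len(gray[0])
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            centre = gray[i][j]
            code = 0
            for bit, (di, dj) in enumerate(offsets):
                if gray[i + di][j + dj] >= centre:
                    code |= 1 << bit
            hist[code] += 1
    n = sum(hist)
    return [v / n for v in hist] if n else [0.0] * 256

def l2_normalise(vec):
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else [0.0] * len(vec)

def fuse_features(handcrafted, deep):
    """Early fusion: normalise each descriptor separately so neither dominates
    by scale, then concatenate into one vector for a downstream classifier."""
    return l2_normalise(handcrafted) + l2_normalise(deep)

face_patch = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
deep_embedding = [0.3, -1.2, 0.8]  # placeholder for a pre-trained model's output
fused = fuse_features(lbp_histogram(face_patch), deep_embedding)
print(len(fused))  # 259 = 256 LBP bins + 3 deep dimensions
```

Normalising each part before concatenation is one simple design choice; learned weighting or late (score-level) fusion are common alternatives.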
Dual Teacher Knowledge Distillation with Domain Alignment for Face Anti-spoofing
Face recognition systems are vulnerable to different presentation attacks,
which has made system security an increasingly
critical concern. Although many face anti-spoofing (FAS) methods perform well
in intra-dataset scenarios, their generalization remains a challenge. To
address this issue, some methods adopt domain adversarial training (DAT) to
extract domain-invariant features. However, the competition between the encoder
and the domain discriminator can make the network difficult to train and to
converge. In this paper, we propose a domain adversarial attack (DAA) method that
mitigates this training instability by adding perturbations to the input
images that make them indistinguishable across domains, thereby enabling domain
alignment. Moreover, since models trained on limited data and a limited set of attack types
cannot generalize well to unknown attacks, we propose a dual perceptual and
generative knowledge distillation framework for face anti-spoofing that
utilizes pre-trained face-related models containing rich face priors.
Specifically, we adopt two different face-related models as teachers to
transfer knowledge to the target student model. The pre-trained teacher models
are not from the task of face anti-spoofing but from perceptual and generative
tasks, respectively, which implicitly augment the data. By combining both DAA
and dual-teacher knowledge distillation, we develop a dual teacher knowledge
distillation with domain alignment framework (DTDA) for face anti-spoofing. The
advantages of the proposed method are verified through extensive ablation
studies and comparisons with state-of-the-art methods on public datasets across
multiple protocols.
Voice Spoofing Countermeasures: Taxonomy, State-of-the-art, experimental analysis of generalizability, open challenges, and the way forward
Malicious actors may use different voice-spoofing attacks to fool automated
speaker verification (ASV) systems and even use them to spread misinformation.
Various countermeasures have been proposed to detect these spoofing attacks.
Given the extensive work done on spoofing detection in ASV systems over
the last 6-7 years, there is a need to classify the research and perform
qualitative and quantitative comparisons of state-of-the-art countermeasures.
Additionally, no existing survey paper has reviewed integrated solutions to
voice spoofing evaluation and speaker verification, adversarial/anti-forensics
attacks on spoofing countermeasures and on ASV itself, or unified solutions to
detect multiple attacks using a single model. Further, no work has been done to
provide an apples-to-apples comparison of published countermeasures in order to
assess their generalizability by evaluating them across corpora. In this work,
we conduct a review of the literature on spoofing detection using hand-crafted
features, deep learning, end-to-end, and universal spoofing countermeasure
solutions to detect speech synthesis (SS), voice conversion (VC), and replay
attacks. We also review integrated solutions to voice spoofing
evaluation and speaker verification, adversarial and anti-forensics attacks on
voice countermeasures, and ASV. The limitations and challenges of the existing
spoofing countermeasures are also presented. We report the performance of these
countermeasures on several datasets and evaluate them across corpora. For the
experiments, we employ the ASVspoof2019 and VSDC datasets along with GMM, SVM,
CNN, and CNN-GRU classifiers. (For reproducibility of the results, the code of
the test bed can be found in our GitHub repository.)
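Cross-corpus comparisons of spoofing countermeasures are typically reported as an equal error rate (EER), a standard metric in the ASVspoof evaluations. Which metrics the survey reports for each experiment is not stated in this abstract, so the sketch below is only an illustration: it approximates the EER by sweeping every observed score as a threshold, assuming higher scores mean "more likely bonafide".

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate EER by sweeping each observed score as a threshold.
    Assumed convention: a sample is accepted as bonafide when score >= threshold."""
    best_gap, eer = float("inf"), None
    for t in sorted(set(bonafide_scores) | set(spoof_scores)):
        # False acceptance: spoofed samples that pass the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection: bonafide samples that fall below it.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Cross-corpus use: score corpus B with a model trained only on corpus A.
print(equal_error_rate([0.9, 0.8], [0.1, 0.2]))  # 0.0, perfectly separated
print(equal_error_rate([0.6, 0.4], [0.5, 0.3]))  # 0.5
```

An apples-to-apples cross-corpus test, in this sketch's terms, means training each classifier on one corpus (for example ASVspoof2019) and calling the EER function on its scores for the other (for example VSDC), and vice versa.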
FLIP: Cross-domain Face Anti-spoofing with Language Guidance
Face anti-spoofing (FAS) or presentation attack detection is an essential
component of face recognition systems deployed in security-critical
applications. Existing FAS methods have poor generalizability to unseen spoof
types, camera sensors, and environmental conditions. Recently, vision
transformer (ViT) models have been shown to be effective for the FAS task due
to their ability to capture long-range dependencies among image patches.
However, adaptive modules or auxiliary loss functions are often required to
adapt pre-trained ViT weights learned on large-scale datasets such as ImageNet.
In this work, we first show that initializing ViTs with multimodal (e.g., CLIP)
pre-trained weights improves generalizability for the FAS task, which is in
line with the zero-shot transfer capabilities of vision-language pre-trained
(VLP) models. We then propose a novel approach for robust cross-domain FAS by
grounding visual representations with the help of natural language.
Specifically, we show that aligning the image representation with an ensemble
of class descriptions (based on natural language semantics) improves FAS
generalizability in low-data regimes. Finally, we propose a multimodal
contrastive learning strategy to boost feature generalization further and
bridge the gap between source and target domains. Extensive experiments on
three standard protocols demonstrate that our method significantly outperforms
the state-of-the-art methods, achieving better zero-shot transfer performance
than five-shot transfer of adaptive ViTs. Code:
https://github.com/koushiksrivats/FLIP
Comment: Accepted to ICCV-2023. Project Page:
https://koushiksrivats.github.io/FLIP
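Aligning an image representation with an ensemble of class descriptions can be sketched in CLIP-style zero-shot form: average the embeddings of several descriptions per class into one prototype, then take a softmax over cosine similarities. The 2-D vectors below are made-up stand-ins for text- and image-encoder outputs, the temperature of 0.01 (a logit scale of 100) is assumed, and FLIP's actual prompts, encoders, and contrastive training are not reproduced.

```python
import math

def normalise(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def class_prototype(description_embeddings):
    """Average the L2-normalised embeddings of several natural-language
    descriptions of one class, then re-normalise (a prompt ensemble)."""
    normed = [normalise(e) for e in description_embeddings]
    mean = [sum(col) / len(normed) for col in zip(*normed)]
    return normalise(mean)

def zero_shot_probs(image_embedding, descriptions_by_class, temperature=0.01):
    """Softmax over cosine similarities between the image and each prototype."""
    img = normalise(image_embedding)
    logits = {c: sum(a * b for a, b in zip(img, class_prototype(d))) / temperature
              for c, d in descriptions_by_class.items()}
    m = max(logits.values())  # subtract the max for a numerically stable softmax
    exps = {c: math.exp(v - m) for c, v in logits.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}

# Hypothetical embeddings of descriptions such as "a photo of a real face"
# and "a photo of a printed face".
descriptions = {"real": [[1.0, 0.0], [0.9, 0.1]],
                "spoof": [[0.0, 1.0], [0.1, 0.9]]}
print(zero_shot_probs([1.0, 0.05], descriptions))
```

Averaging several descriptions per class smooths out the wording of any single prompt, which is one plausible reason a description ensemble helps in low-data regimes.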
Rethinking Domain Generalization for Face Anti-spoofing: Separability and Alignment
This work studies the generalization issue of face anti-spoofing (FAS) models
on domain gaps, such as image resolution, blurriness and sensor variations.
Most prior works regard domain-specific signals as a negative impact, and apply
metric learning or adversarial losses to remove them from feature
representation. Though learning a domain-invariant feature space is viable for
the training data, we show that the feature shift still exists in an unseen
test domain, which undermines the generalizability of the classifier. In this
work, instead of constructing a domain-invariant feature space, we encourage
domain separability while aligning the live-to-spoof transition (i.e., the
trajectory from live to spoof) to be the same for all domains. We formulate
this FAS strategy of separability and alignment (SA-FAS) as a problem of
invariant risk minimization (IRM), learning domain-variant feature
representations but a domain-invariant classifier. We demonstrate the
effectiveness of SA-FAS on challenging cross-domain FAS datasets and establish
state-of-the-art performance.
Comment: Accepted in CVPR202
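The "domain-invariant classifier" requirement of IRM can be made concrete with the IRMv1 penalty of Arjovsky et al. (2019): for each training domain, take the gradient of that domain's risk with respect to a dummy scalar scale on the classifier output, evaluated at 1, and penalise its square. A small penalty means the same classifier is already locally optimal in every domain. This is a generic stand-in under that assumption, not the SA-FAS objective, whose separability and alignment terms are not shown.

```python
import math

def sigmoid(z):
    # Branch on sign so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def bce_with_logits(logits, labels):
    """Mean binary cross-entropy, stable form: log(1 + e^z) - y * z."""
    return sum(max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
               for z, y in zip(logits, labels)) / len(logits)

def irmv1_penalty(logits, labels):
    """Squared gradient of one domain's risk w.r.t. a dummy classifier scale w,
    evaluated at w = 1. For BCE on w * z, dR/dw = mean((sigmoid(w * z) - y) * z)."""
    grad = sum((sigmoid(z) - y) * z for z, y in zip(logits, labels)) / len(logits)
    return grad ** 2

def irm_objective(domains, lam=1.0):
    """domains: one (logits, labels) pair per training domain (e.g. dataset)."""
    risk = sum(bce_with_logits(z, y) for z, y in domains)
    penalty = sum(irmv1_penalty(z, y) for z, y in domains)
    return risk + lam * penalty

domains = [([2.0, -1.5, 0.5], [1, 0, 1]), ([1.0, -0.5, -2.0], [1, 1, 0])]
print(irm_objective(domains, lam=10.0))
```

The hypothetical weight lam trades off fitting each domain against keeping one classifier optimal across all of them; features themselves are left free to carry domain-specific information.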