Face presentation attacks, also known as spoofing attacks, pose a significant
threat to biometric systems that rely on facial recognition, such as access
control, mobile payment, and identity verification systems. To prevent
spoofing, the literature has proposed several video-based methods that analyze
facial motion across successive video frames. However, estimating the motion
between adjacent frames is difficult and computationally expensive. In this
paper, we reformulate the face
anti-spoofing task as a motion prediction problem and introduce a deep ensemble
learning model with a frame skipping mechanism. The proposed frame skipping is
based on uniform sampling: the original video is divided into fixed-size clips,
and every nth frame of each clip is selected so that temporal patterns can be
perceived more easily during the training of three different recurrent neural
networks (RNNs). Motivated by the individual performance of each RNN, we
develop a meta-model that combines the predictions of the individual RNNs to
improve the overall recognition performance.
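The frame skipping and ensemble steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the clip length `clip_len` and skip interval `step` are hypothetical parameters, a trailing partial clip is assumed to be dropped, and the meta-model is assumed to be a logistic regression trained on the three RNNs' output probabilities (a common stacking choice; the form of the meta-model is not specified here). Labels use 1 for genuine and 0 for attack, which is also an assumption.

```python
import math
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def skip_frames(frames: Sequence[T], clip_len: int, step: int) -> List[List[T]]:
    """Split a video into consecutive fixed-size clips and keep every `step`-th
    frame of each clip. Assumption: a trailing clip shorter than clip_len is dropped."""
    if clip_len <= 0 or step <= 0:
        raise ValueError("clip_len and step must be positive")
    clips = []
    for start in range(0, len(frames) - clip_len + 1, clip_len):
        clip = frames[start:start + clip_len]
        clips.append(list(clip[::step]))  # uniform sampling inside the clip
    return clips


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def meta_predict(w: List[float], b: float, x: Sequence[float]) -> float:
    """Meta-model output: probability of 'genuine' given the base RNN probabilities x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fit_meta_model(base_probs: Sequence[Sequence[float]], labels: Sequence[int],
                   lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Hypothetical stacking meta-model: logistic regression fitted by batch
    gradient descent on the base models' predicted probabilities."""
    n_models = len(base_probs[0])
    n = len(labels)
    w, b = [0.0] * n_models, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_models, 0.0
        for x, y in zip(base_probs, labels):
            err = meta_predict(w, b, x) - y  # gradient of the log loss w.r.t. the logit
            for j in range(n_models):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


if __name__ == "__main__":
    # Toy example: each row holds the genuine-class probabilities of three RNNs.
    train_x = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.6], [0.7, 0.6, 0.9],
               [0.2, 0.3, 0.1], [0.1, 0.2, 0.4], [0.3, 0.1, 0.2]]
    train_y = [1, 1, 1, 0, 0, 0]
    w, b = fit_meta_model(train_x, train_y)
    print(skip_frames(list(range(20)), clip_len=10, step=3))
    print(meta_predict(w, b, [0.85, 0.8, 0.75]))
```

Stacking a learned meta-model, rather than simply averaging the three outputs, lets the ensemble weight each RNN according to how reliable it was on held-out data.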
Extensive experiments were conducted on four datasets, and state-of-the-art
half total error rates (HTER) are reported for MSU-MFSD (3.12\%),
Replay-Attack (11.19\%), and OULU-NPU (12.23\%) in the most challenging,
cross-dataset test scenario.
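HTER is the mean of the false acceptance rate (FAR, the fraction of attacks accepted as genuine) and the false rejection rate (FRR, the fraction of genuine faces rejected). The sketch below computes it at a given score threshold. In cross-dataset evaluation, a common practice is to fix the threshold on the training (source) dataset, here at its equal error rate point, and apply it unchanged to the unseen test dataset. The function names, the score convention (higher means genuine, label 1 = genuine), and the EER-based threshold choice are assumptions for illustration; the exact protocol is not stated here.

```python
from typing import Sequence, Tuple


def far_frr(scores: Sequence[float], labels: Sequence[int], threshold: float) -> Tuple[float, float]:
    """FAR and FRR when scores >= threshold are accepted as genuine (label 1 = genuine, 0 = attack)."""
    attacks = [s for s, y in zip(scores, labels) if y == 0]
    genuine = [s for s, y in zip(scores, labels) if y == 1]
    far = sum(s >= threshold for s in attacks) / len(attacks)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def hter(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Half total error rate: (FAR + FRR) / 2."""
    far, frr = far_frr(scores, labels, threshold)
    return (far + frr) / 2


def eer_threshold(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Assumed threshold rule: the observed score where |FAR - FRR| is smallest."""
    best_t, best_gap = scores[0], float("inf")
    for t in sorted(set(scores)):
        far, frr = far_frr(scores, labels, t)
        if abs(far - frr) < best_gap:
            best_t, best_gap = t, abs(far - frr)
    return best_t


if __name__ == "__main__":
    # Threshold chosen on the source dataset, then reused on the target dataset.
    t = eer_threshold([0.9, 0.7, 0.4, 0.35, 0.2, 0.1], [1, 1, 1, 0, 0, 0])
    print(t, hter([0.9, 0.8, 0.3, 0.6, 0.2, 0.1], [1, 1, 1, 0, 0, 0], t))
```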