Combining EfficientNet and Vision Transformers for Video Deepfake Detection
Deepfakes are the result of digital manipulation to obtain credible videos in
order to deceive the viewer. This is done through deep learning techniques
based on autoencoders or GANs that are becoming more accessible and accurate
year after year, resulting in fake videos that are very difficult to distinguish
from real ones. Traditionally, CNNs have been used to perform deepfake
detection, with the best results obtained using methods based on EfficientNet
B7. In this study, we combine various types of Vision Transformers with a
convolutional EfficientNet B0 used as a feature extractor, obtaining results
comparable to some very recent methods that use Vision Transformers. Unlike
the state-of-the-art approaches, we use neither distillation nor ensemble
methods. The best model achieved an AUC of 0.951 and an F1 score of 88.0%, very
close to the state-of-the-art on the DeepFake Detection Challenge (DFDC)
- …
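The following NumPy sketch illustrates the general hybrid idea described above. A CNN feature map is flattened into a sequence of tokens, a [CLS] token is prepended, a Transformer encoder layer is applied, and the [CLS] output is mapped to a fake probability. It is not the authors' implementation, and its specifics are assumptions. All weights are random. There is a single encoder layer with one attention head, and ReLU stands in for GELU. The input is assumed to be a 7×7×1280 map, which is the shape of EfficientNet B0's final feature map for a 224×224 crop. The class name `HybridCNNViTSketch` is hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class HybridCNNViTSketch:
    """Hypothetical toy model: CNN feature map -> tokens -> 1 encoder layer -> P(fake).

    Weights are random; this only demonstrates the data flow, not a trained detector.
    """

    def __init__(self, feat_channels=1280, grid=7, d_model=64, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.02
        self.d_model = d_model
        self.proj = rng.normal(0, s, (feat_channels, d_model))   # per-cell token embedding
        self.cls = rng.normal(0, s, (1, d_model))                 # learnable [CLS] token
        self.pos = rng.normal(0, s, (grid * grid + 1, d_model))   # positional embeddings
        self.wq, self.wk, self.wv, self.wo = (
            rng.normal(0, s, (d_model, d_model)) for _ in range(4)
        )
        self.w1 = rng.normal(0, s, (d_model, 4 * d_model))        # MLP expansion
        self.w2 = rng.normal(0, s, (4 * d_model, d_model))        # MLP projection
        self.head = rng.normal(0, s, (d_model,))                  # binary classifier head

    @staticmethod
    def layer_norm(x, eps=1e-6):
        # Per-token normalization (no learned scale/shift, for brevity).
        mu = x.mean(-1, keepdims=True)
        var = x.var(-1, keepdims=True)
        return (x - mu) / np.sqrt(var + eps)

    def encode(self, feature_map):
        # feature_map: (H, W, C) output of the CNN backbone (assumed EfficientNet B0).
        h, w, c = feature_map.shape
        tokens = feature_map.reshape(h * w, c) @ self.proj        # (H*W, d_model)
        x = np.concatenate([self.cls, tokens], axis=0) + self.pos  # (H*W + 1, d_model)
        # Pre-norm single-head self-attention with a residual connection.
        y = self.layer_norm(x)
        q, k, v = y @ self.wq, y @ self.wk, y @ self.wv
        attn = softmax(q @ k.T / np.sqrt(self.d_model))
        x = x + (attn @ v) @ self.wo
        # Pre-norm feed-forward block with a residual connection.
        y = self.layer_norm(x)
        x = x + np.maximum(y @ self.w1, 0.0) @ self.w2
        return x

    def predict_proba(self, feature_map):
        # Use the [CLS] token (row 0) as the video-frame representation.
        cls_out = self.layer_norm(self.encode(feature_map))[0]
        logit = cls_out @ self.head
        return 1.0 / (1.0 + np.exp(-logit))


# Usage with a random stand-in for a backbone feature map.
features = np.random.default_rng(1).normal(size=(7, 7, 1280))
model = HybridCNNViTSketch()
print(model.predict_proba(features))
```

Keeping the convolutional backbone as the tokenizer means the Transformer attends over 49 spatially meaningful feature vectors rather than raw pixel patches. A real detector would train both parts end to end and aggregate per-frame scores across a video.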