A critical yet frequently overlooked challenge in the field of deepfake
detection is the lack of a standardized, unified, comprehensive benchmark. This
issue leads to unfair performance comparisons and potentially misleading
results. Specifically, there is a lack of uniformity in data processing
pipelines, resulting in inconsistent data inputs for detection models.
Additionally, experimental settings differ noticeably across studies, and
evaluation strategies and metrics lack standardization. To fill this gap, we
present the first comprehensive benchmark for deepfake detection, called
DeepfakeBench, which offers three key contributions: 1) a unified data
management system to ensure consistent input across all detectors, 2) an
integrated framework for implementing state-of-the-art detection methods, and 3)
standardized evaluation metrics and protocols to promote transparency and
reproducibility. Featuring an extensible, modular codebase, DeepfakeBench
contains 15 state-of-the-art detection methods, 9 deepfake datasets, a series
of evaluation protocols and analysis tools, and comprehensive evaluations.
comprehensive evaluations. Moreover, we provide new insights based on extensive
analysis of these evaluations from various perspectives (e.g., data
augmentations, backbones). We hope that our efforts could facilitate future
research and foster innovation in this increasingly critical domain. All codes,
evaluations, and analyses of our benchmark are publicly available at
https://github.com/SCLBD/DeepfakeBench.
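
The three contributions above amount to sharing one preprocessing pipeline, one detector interface, and one metric protocol across all methods. The sketch below is a minimal illustration of that structure under stated assumptions; it is not DeepfakeBench's actual API, and the registry, `preprocess`, `evaluate`, and `ToyCNN` names are all hypothetical.

```python
# Hypothetical sketch (not the real DeepfakeBench API) of a unified benchmark:
# one shared preprocessing step, one detector registry, one standard metric.
import numpy as np
import torch
from sklearn.metrics import roc_auc_score

DETECTORS = {}  # registry mapping detector name -> class

def register(name):
    """Decorator that adds a detector class to the shared registry."""
    def wrap(cls):
        DETECTORS[name] = cls
        return cls
    return wrap

def preprocess(frames):
    """Shared preprocessing so every detector sees identical inputs:
    scale uint8 RGB frames to [0, 1] tensors in NCHW layout."""
    x = torch.from_numpy(frames).float() / 255.0
    return x.permute(0, 3, 1, 2)  # NHWC -> NCHW

@torch.no_grad()
def evaluate(detector, frames, labels):
    """Standardized protocol: same inputs, same metric (frame-level AUC)."""
    probs = detector(preprocess(frames)).sigmoid().squeeze(1).numpy()
    return roc_auc_score(labels, probs)

@register("toy_cnn")
class ToyCNN(torch.nn.Module):
    """Stand-in detector; real entries would be Xception, EfficientNet, etc."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Conv2d(3, 8, 3, stride=2), torch.nn.ReLU(),
            torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
            torch.nn.Linear(8, 1),
        )
    def forward(self, x):
        return self.net(x)

if __name__ == "__main__":
    # Synthetic stand-in data: 16 random 64x64 RGB frames, balanced labels.
    frames = np.random.randint(0, 256, (16, 64, 64, 3), dtype=np.uint8)
    labels = np.array([0, 1] * 8)
    for name, cls in DETECTORS.items():
        print(name, "AUC:", evaluate(cls().eval(), frames, labels))
```

Because every detector is invoked through the same `preprocess` and `evaluate` path, score differences between registered methods cannot be attributed to divergent data pipelines or metrics, which is the kind of fairness guarantee the benchmark targets.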