Image forgery detection aims to detect and locate forged regions in an image.
Most existing forgery detection algorithms formulate the task as a
classification problem, classifying pixels as either forged or pristine.
However, the definition of forged and pristine pixels is only relative within a
single image; e.g., a forged region in image A may actually be a pristine one
in its source image B (splicing forgery).
Such a relative definition has been severely overlooked by existing methods,
which unnecessarily mix forged (pristine) regions across different images into
the same category. To resolve this dilemma, we propose the FOrensic ContrAstive
cLustering (FOCAL) method, a novel, simple yet very effective paradigm based on
contrastive learning and unsupervised clustering for the image forgery
detection. Specifically, FOCAL 1) utilizes pixel-level contrastive learning to
supervise the high-level forensic feature extraction in an image-by-image
manner, explicitly reflecting the above relative definition; 2) employs an
on-the-fly unsupervised clustering algorithm (instead of a trained one) to
cluster the learned features into forged/pristine categories, further
suppressing cross-image influence from the training data; and 3) allows the
detection performance to be further boosted via simple feature-level
concatenation without the need for retraining. Extensive experimental results
on six public
testing datasets demonstrate that our proposed FOCAL significantly outperforms
the state-of-the-art competing algorithms by large margins: +24.3% on Coverage,
+18.6% on Columbia, +17.5% on FF++, +14.2% on MISD, +13.5% on CASIA and +10.3%
on NIST in terms of IoU. The paradigm of FOCAL could bring fresh insights and
serve as a novel benchmark for the image forgery detection task. The code is
available at https://github.com/HighwayWu/FOCAL.
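To make the on-the-fly clustering step (2) concrete, the sketch below clusters
per-pixel features of a single image into two groups and labels the minority
cluster as forged. This is only an illustration under stated assumptions: it
uses a plain two-way k-means (not necessarily the clusterer used by FOCAL), a
deterministic min/max-norm initialization, and a minority-cluster heuristic;
the function name and all details are hypothetical, not the authors' code.

```python
import numpy as np

def cluster_forgery_mask(features, n_iter=20):
    """Cluster per-pixel forensic features into two groups on the fly.

    Illustrative 2-means stand-in for FOCAL's unsupervised clustering step:
    the clusterer is fitted per image (no trained parameters), so no
    cross-image statistics from the training data can leak in.

    features: (H, W, C) array of extracted forensic features.
    Returns an (H, W) binary mask where 1 marks the (assumed minority)
    forged cluster.
    """
    h, w, c = features.shape
    x = features.reshape(-1, c).astype(np.float64)
    # Deterministic init: the two pixels with smallest/largest feature norm.
    norms = np.linalg.norm(x, axis=1)
    centers = np.stack([x[np.argmin(norms)], x[np.argmax(norms)]])
    for _ in range(n_iter):
        # Assign each pixel to its nearest centroid, then recompute centroids.
        d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for k in range(2):
            if (labels == k).any():
                centers[k] = x[labels == k].mean(axis=0)
    # Heuristic assumption: forged regions cover the minority of pixels.
    if (labels == 1).sum() > (labels == 0).sum():
        labels = 1 - labels
    return labels.reshape(h, w)
```

In the full method, the input features would come from an extractor supervised
image by image with pixel-level contrastive learning (step 1); the snippet only
illustrates why a training-free, per-image clusterer respects the relative
forged/pristine definition.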