Deepfakes are tailored, synthetically generated videos that are now
prevalent and spreading on a large scale, threatening the trustworthiness of
information available online. While existing datasets contain different
kinds of deepfakes that vary in their generation technique, they do not
consider the progression of deepfakes in a "phylogenetic" manner. The face in
an existing deepfake can itself be swapped with another face. This process of
face swapping can be repeated multiple times, and the resulting deepfake can
evolve to confuse deepfake detection algorithms. Further, many databases
do not provide the generative model employed as a target label. Model
attribution enhances the explainability of detection results by
providing information on the generative model employed. In order to enable the
research community to address these questions, this paper proposes DeePhy, a
novel Deepfake Phylogeny dataset which consists of 5040 deepfake videos
generated using three different generation techniques. There are 840 videos of
one-time swapped deepfakes, 2520 videos of two-times swapped deepfakes and 1680
videos of three-times swapped deepfakes. Over 30 GB in size, the database
was prepared in over 1100 hours using 18 GPUs with 1,352 GB of cumulative memory. We
also present a benchmark on the DeePhy dataset using six deepfake detection
algorithms. The results highlight the need to advance research on model
attribution of deepfakes and to generalize it across a variety of deepfake
generation techniques. The database is available at:
http://iab-rubric.org/deephy-database
Comment: Accepted at the 2022 International Joint Conference on Biometrics (IJCB 2022)