Synthetically generated face images have been shown to be indistinguishable from
real images to human observers and as such can lead to a lack of trust in digital
content, as they can, for instance, be used to spread misinformation. Therefore,
the need to develop algorithms for detecting entirely synthetic face images is
apparent. Of particular interest are images generated by state-of-the-art deep
learning-based models, as these exhibit a high level of visual realism. Recent
works have demonstrated that detecting such synthetic face images under
realistic conditions remains difficult, as new and improved generative models
are proposed at a rapid pace and arbitrary image post-processing can be
applied. In this work, we propose a multi-channel architecture for detecting
entirely synthetic face images which analyses information in both the frequency
and visible spectra and is supervised using Cross Modal Focal Loss. We compare the proposed
architecture with several related architectures trained using Binary Cross
Entropy and show in cross-model experiments that the proposed architecture
supervised using Cross Modal Focal Loss generally achieves the most competitive
performance.
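
The abstract names Cross Modal Focal Loss without defining it. As a rough illustration only, the sketch below follows one commonly stated formulation from prior multi-channel work, in which each branch's loss is down-weighted when both branches are confidently correct. The function names, the pairing of a "visible" and a "frequency" branch with a joint head, and the values of `alpha`, `gamma` and `lam` are assumptions made for this example, not details taken from this work.

```python
import math

EPS = 1e-12  # numerical guard for log(0); an assumption of this sketch


def prob_of_true_class(p, label):
    """Return p_t, the probability a branch assigns to the correct class."""
    return p if label == 1 else 1.0 - p


def binary_cross_entropy(p, label):
    """Binary cross entropy for a single sample."""
    return -math.log(max(prob_of_true_class(p, label), EPS))


def cross_modal_weight(p_t, q_t):
    """Harmonic mean of both branch confidences, scaled by the other branch."""
    harmonic_mean = 2.0 * p_t * q_t / (p_t + q_t + EPS)
    return q_t * harmonic_mean


def cross_modal_focal_loss(p, q, label, alpha=1.0, gamma=3.0):
    """Loss for the branch predicting p, modulated by the other branch's q."""
    p_t = prob_of_true_class(p, label)
    q_t = prob_of_true_class(q, label)
    modulation = (1.0 - cross_modal_weight(p_t, q_t)) ** gamma
    return -alpha * modulation * math.log(max(p_t, EPS))


def total_loss(p_joint, p_visible, p_frequency, label, lam=0.5):
    """Blend BCE on the joint head with the two cross-modal branch losses."""
    auxiliary = (cross_modal_focal_loss(p_visible, p_frequency, label)
                 + cross_modal_focal_loss(p_frequency, p_visible, label))
    return (1.0 - lam) * binary_cross_entropy(p_joint, label) + lam * auxiliary
```

In this formulation, a branch that is already confident and correct contributes little to the loss when the other branch agrees, and contributes more when the other branch is uncertain. Setting the hypothetical `lam` to 0 reduces the objective to plain Binary Cross Entropy on the joint head.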