Generalisation -- the ability of a model to perform well on unseen data -- is
crucial for building reliable audio deepfake detectors. However, recent studies
have shown that current audio deepfake detectors fall short of this desideratum.
In this paper we show that pretrained self-supervised representations followed
by a simple logistic regression classifier achieve strong generalisation
capabilities, reducing the equal error rate from 30% to 8% on the newly
introduced In-the-Wild dataset. Importantly, this approach also produces
considerably better-calibrated models than previous approaches. This means
that we can trust our model's predictions more and use them for downstream
tasks, such as uncertainty estimation. In particular, we show that
the entropy of the estimated probabilities provides a reliable way of rejecting
uncertain samples and further improving the accuracy.
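
For concreteness, here is a minimal sketch of the kind of pipeline the abstract describes: frozen self-supervised speech embeddings followed by a logistic regression classifier. The choice of wav2vec 2.0 as the encoder, the HuggingFace model name, and the mean-pooling step are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch only: a frozen SSL encoder (wav2vec 2.0 is an assumption;
# the abstract does not name the model) feeding a logistic regression detector.
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model
from sklearn.linear_model import LogisticRegression

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").eval()

def embed(waveform: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Mean-pool the frozen SSL hidden states into one utterance-level vector."""
    inputs = extractor(waveform, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state  # shape (1, T, D)
    return hidden.mean(dim=1).squeeze(0).numpy()      # shape (D,)

def fit_detector(waveforms, labels) -> LogisticRegression:
    """waveforms: 16 kHz mono arrays; labels: 0 = bona fide, 1 = fake."""
    feats = np.stack([embed(w) for w in waveforms])
    return LogisticRegression(max_iter=1000).fit(feats, labels)
```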
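
The entropy-based rejection mentioned at the end of the abstract can be sketched as follows: compute the entropy of the predicted class probabilities and abstain on samples above a threshold, so that accuracy is measured only on the confident subset. The threshold value below is arbitrary (for a binary classifier the entropy ranges from 0 to ln 2 ≈ 0.693).

```python
import numpy as np

def predict_with_rejection(clf, feats: np.ndarray, max_entropy: float = 0.5):
    """Abstain on samples whose predictive entropy exceeds max_entropy.

    Returns (predictions, accept_mask); accuracy on the accepted subset
    should improve as max_entropy is lowered, at the cost of coverage.
    """
    probs = clf.predict_proba(feats)                          # (N, 2)
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)  # per-sample H
    accept = entropy <= max_entropy                           # confident subset
    return probs.argmax(axis=1), accept
```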