System Fingerprint Recognition for Deepfake Audio: An Initial Dataset and Investigation

Abstract

The malicious use of deep speech synthesis models may pose a significant threat to society. Many studies have therefore emerged to detect so-called "deepfake audio". However, these studies focus on binary detection of real versus fake audio. In some realistic application scenarios, it is also necessary to know which tool or model generated the deepfake audio. This raises a question: can we recognize the system fingerprints of deepfake audio? In this paper, we propose a deepfake audio dataset for system fingerprint recognition (SFR) and conduct an initial investigation. We collected the dataset from five speech synthesis systems that use the latest state-of-the-art deep learning technologies, and it includes both clean and compressed sets. In addition, to facilitate the further development of system fingerprint recognition methods, we provide benchmarks for comparison along with our research findings. The dataset will be publicly available.

Comment: 12 pages, 3 figures. arXiv admin note: text overlap with arXiv:2208.0964
