Using audio fingerprinting for duplicate detection and thumbnail generation

By Christopher J. C. Burges, Daniel Plastina, John C. Platt, Erin Renshaw and Henrique S. Malvar


Audio fingerprinting is a powerful tool for identifying file-based or streaming audio, using a database of fingerprints. This paper presents two new applications of audio fingerprinting: duplicate detection, whose goal is to identify duplicate audio clips in a set, even if they differ in compression quality or duration, and thumbnail generation, which aims to provide a representative short clip of a music track. Neither application requires an external database of fingerprints. Thanks to the robustness of the fingerprinting engine, both applications perform well: the duplicate detector has a false positive rate that is conservatively bounded above by 1% on a very large data set, and the thumbnail generator significantly outperforms using a fixed window.

We build these two applications using the RARE (Robust Audio Recognition Engine) AFP system [5], which converts a segment of audio to 64 floating-point numbers (a fingerprint), and identifies clips using a weighted Euclidean distance. RARE has been shown to be very robust to distortions of the original audio [5]. In the following, “trace” will mean any kind of fingerprint extracted from audio, and “fingerprint” will mean a reference fingerprint against which traces are compared to determine the audio identity.

2. THE RARE DUPLICATE DETECTOR

The RARE duplicate detector DupDet works as shown in Fig. 1, recursively processing all audio files in a directory tree. It creates a set of traces for each file, and checks them
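The comparison of a trace against a fingerprint by weighted Euclidean distance, mentioned above, can be illustrated with a minimal Python sketch. Only the 64-number fingerprint size and the use of a weighted Euclidean distance come from the text; the function names, the exact form of the weighting (a per-dimension weight on the squared difference), the weight values, and the acceptance threshold are assumptions made for illustration and are not taken from RARE.

```python
import math
from typing import Sequence

# The text states that a fingerprint is 64 floating-point numbers.
FINGERPRINT_DIM = 64


def weighted_euclidean(trace: Sequence[float],
                       fingerprint: Sequence[float],
                       weights: Sequence[float]) -> float:
    """Return sqrt(sum_i w_i * (trace_i - fingerprint_i)^2).

    Assumption: one non-negative weight per dimension. How RARE actually
    chooses its weights is not described in this excerpt.
    """
    if not (len(trace) == len(fingerprint) == len(weights) == FINGERPRINT_DIM):
        raise ValueError(f"expected {FINGERPRINT_DIM} values per vector")
    return math.sqrt(sum(w * (t - f) ** 2
                         for t, f, w in zip(trace, fingerprint, weights)))


def is_match(trace: Sequence[float],
             fingerprint: Sequence[float],
             weights: Sequence[float],
             threshold: float) -> bool:
    """Hypothetical decision rule: accept when the distance is within a threshold."""
    return weighted_euclidean(trace, fingerprint, weights) <= threshold


# Usage with made-up vectors: the trace differs from the fingerprint
# in two dimensions only, and all weights are set to 1.
fingerprint = [0.0] * FINGERPRINT_DIM
trace = [0.0] * FINGERPRINT_DIM
trace[0], trace[1] = 3.0, 4.0
uniform_weights = [1.0] * FINGERPRINT_DIM
distance = weighted_euclidean(trace, fingerprint, uniform_weights)
```

With uniform weights the function reduces to the ordinary Euclidean distance; non-uniform weights let some of the 64 dimensions count more than others when deciding whether two clips are the same audio.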

Year: 2005
OAI identifier: oai:CiteSeerX.psu:
Provided by: CiteSeerX