    Perceptual Video Hashing for Content Identification and Authentication

    Perceptual hashing has been broadly used in the literature to identify similar contents for video copy detection. It has also been adopted to detect malicious manipulations for video authentication. However, targeting both applications with a single system using the same hash would be highly desirable, as this saves storage space and reduces computational complexity. This paper proposes a perceptual video hashing system for content identification and authentication. The objective is to design a hash extraction technique that can withstand signal processing operations on the one hand and detect malicious attacks on the other. The proposed system relies on a new signal calibration technique for extracting the hash using the discrete cosine transform (DCT) and the discrete sine transform (DST). This consists of determining the number of samples, called the normalizing shift, by which a digital signal must be shifted so that the shifted version matches a certain pattern according to its DCT/DST coefficients. The rationale for the calibration idea is that the normalizing shift resists signal processing operations while remaining sensitive to local tampering (i.e., replacing a small portion of the signal with a different one). While the same hash serves both applications, two different similarity measures are proposed, one for video identification and one for authentication. Through intensive experiments with various types of video distortions and manipulations, the proposed system has been shown to outperform related state-of-the-art video hashing techniques in terms of identification and authentication, with the added ability to locate tampered regions.
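The calibration idea in the abstract above can be illustrated with a minimal sketch. The abstract does not say which DCT/DST pattern the shifted signal has to match, so the rule used here (pick the circular shift that maximizes a single DCT-II coefficient) is purely an assumption made for illustration, and the names `dct_coefficient` and `normalizing_shift` are hypothetical.

```python
import math


def dct_coefficient(signal, k):
    """Unnormalized DCT-II coefficient k of a 1-D signal."""
    n = len(signal)
    return sum(v * math.cos(math.pi * (i + 0.5) * k / n)
               for i, v in enumerate(signal))


def normalizing_shift(signal, k=1):
    """Return the circular shift (in samples) that maximizes DCT coefficient k.

    ASSUMPTION: the "pattern" is "coefficient k is as large as possible".
    The paper's actual criterion is not given in the abstract.
    """
    n = len(signal)
    best_shift, best_value = 0, -math.inf
    for s in range(n):
        shifted = signal[s:] + signal[:s]  # circular shift by s samples
        value = dct_coefficient(shifted, k)
        if value > best_value:
            best_shift, best_value = s, value
    return best_shift


signal = [0.0, 1.0, 4.0, 9.0, 3.0, 2.0, 1.0, 0.0]
louder = [2.0 * v for v in signal]  # a global gain change
print(normalizing_shift(signal), normalizing_shift(louder))
```

Under this assumed rule, a global gain change leaves the shift untouched, which loosely mirrors the claimed resistance to signal processing operations, whereas replacing a run of samples can move the maximum to a different shift.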

    Fast fallback watermark detection using perceptual hashes

    Forensic watermarking is often used to enable the tracing of digital pirates that leak copyright-protected videos. However, existing watermarking methods have limited robustness and may be vulnerable to targeted attacks. Our previous work proposed a fallback detection method that uses secondary watermarks rather than the primary watermarks embedded by existing methods. However, the previously proposed fallback method is slow and requires access to all watermarked videos. This paper proposes to make the fallback watermark detection method faster by using perceptual hashes instead of uncompressed secondary watermark signals. These perceptual hashes can be calculated prior to detection, such that the actual detection process is sped up by a factor of approximately 26,000 to 92,000. In this way, the proposed method addresses the main criticism of the slow fallback method, namely its limited practical usability. The fast detection comes at the cost of a modest decrease in robustness, although the fast fallback detection method can still outperform the existing primary watermark method. In conclusion, the proposed method enables fast and more robust detection of watermarks that were embedded by existing watermarking methods.
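The speed-up described above comes from computing hashes ahead of time, so that detection reduces to comparing short bit strings. The following is a minimal sketch of that general pattern only: the median-threshold hash, the feature vectors, and the names `perceptual_hash`, `hamming`, and `detect` are hypothetical stand-ins, not the method from the paper.

```python
from statistics import median


def perceptual_hash(features):
    """Pack one bit per feature: 1 if the feature exceeds the median.

    ASSUMPTION: a toy hash for illustration; the abstract does not
    describe the actual hash function or the features it uses.
    """
    m = median(features)
    bits = 0
    for v in features:
        bits = (bits << 1) | (1 if v > m else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two integer-packed hashes."""
    return bin(a ^ b).count("1")


def detect(leak_features, hash_db):
    """Return the recipient whose precomputed hash is closest to the leak."""
    leak_hash = perceptual_hash(leak_features)
    return min(hash_db, key=lambda rid: hamming(hash_db[rid], leak_hash))


# Hashes are computed once, before any leak is investigated.
hash_db = {
    "user_a": perceptual_hash([1, 5, 2, 8, 3, 9, 4, 7]),
    "user_b": perceptual_hash([9, 1, 8, 2, 7, 3, 6, 4]),
}
# A slightly distorted copy of user_a's features.
leak = [1.1, 5.2, 2.1, 7.9, 3.0, 9.3, 4.2, 6.8]
print(detect(leak, hash_db))
```

Because only the compact hashes are needed at detection time, the watermarked videos themselves do not have to be accessed again, which is the usability point the abstract raises.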

    Deep Cross-Modal Correlation Learning for Audio and Lyrics in Music Retrieval

    Deep cross-modal learning has demonstrated excellent performance in cross-modal multimedia retrieval, with the aim of learning joint representations between different data modalities. Unfortunately, little research focuses on cross-modal correlation learning where the temporal structures of different data modalities, such as audio and lyrics, should be taken into account. Motivated by the inherently temporal structure of music, we aim to learn the deep sequential correlation between audio and lyrics. In this work, we propose a deep cross-modal correlation learning architecture involving two-branch deep neural networks for the audio modality and the text modality (lyrics). Data in different modalities are converted to the same canonical space, where inter-modal canonical correlation analysis is utilized as an objective function to calculate the similarity of temporal structures. This is the first study that uses deep architectures for learning the temporal correlation between audio and lyrics. A pre-trained Doc2Vec model followed by fully-connected layers is used to represent lyrics. Two significant contributions are made in the audio branch, as follows: (i) We propose an end-to-end network to learn cross-modal correlation between audio and lyrics, where feature extraction and correlation learning are performed simultaneously and a joint representation is learned by considering temporal structures. (ii) For feature extraction, we further represent an audio signal by a short sequence of local summaries (VGG16 features) and apply a recurrent neural network to compute a compact feature that better captures the temporal structures of music audio. Experimental results, using audio to retrieve lyrics or using lyrics to retrieve audio, verify the effectiveness of the proposed deep correlation learning architectures in cross-modal music retrieval.
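Once both branches map their inputs into a shared space, retrieval in either direction amounts to ranking candidates by similarity to the query. The sketch below assumes the embeddings have already been produced by the two branches and uses cosine similarity as the ranking score; both the similarity choice and the names `cosine` and `retrieve` are assumptions made for illustration, not details taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query, candidates, top_k=1):
    """Rank candidate embeddings by similarity to the query embedding.

    ASSUMPTION: `query` and every value in `candidates` already live in
    the shared space learned by the two network branches.
    """
    ranked = sorted(candidates,
                    key=lambda name: cosine(query, candidates[name]),
                    reverse=True)
    return ranked[:top_k]


# Hypothetical lyric embeddings and one audio query in the shared space.
lyrics = {"song_a": [0.9, 0.1, 0.0], "song_b": [0.0, 0.2, 0.9]}
audio_query = [0.8, 0.2, 0.1]
print(retrieve(audio_query, lyrics))
```

The same function serves lyrics-to-audio retrieval by swapping the roles of query and candidates.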