2 research outputs found

    A Deterministic Eviction Model for Removing Redundancies in Video Corpus

    Traditional storage approaches are being challenged by huge data volumes. In multimedia content, a file is not necessarily an exact duplicate of another; files are prone to editing, which produces similar copies of the same file. This paper proposes a similarity-based deduplication approach that evicts similar duplicates from archive storage by comparing samples of binary hashes to identify duplicates. Eviction begins by dividing the query video into dynamic key frames based on the video length. The binary hash codes of these frames are then compared with those of existing key frames to identify differences. A similarity score is computed from these differences, and this score decides the eradication strategy for the duplicate copy. Duplicate elimination proceeds in two levels, namely removal of exact duplicates and removal of similar duplicates. The proposed approach shortens the comparison window by comparing only candidate hash codes derived from the dynamic key frames, and aims at accurate, lossless duplicate removal. The presented work is implemented and tested on a synthetic video dataset. Results show a reduction in redundant data and an increase in available storage space. Binary hashes and similarity scores contributed to achieving a good deduplication ratio and overall performance.
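    The key-frame comparison described in this abstract can be pictured with the short sketch below. It is only an illustration under assumptions: the 64-bit hash width, the Hamming-distance similarity measure, the thresholds, and all function names are hypothetical, not the paper's actual implementation.

```python
# Illustrative sketch only: hash width, thresholds, and names are assumptions,
# not the published method.

def frame_hash_distance(h1: int, h2: int) -> int:
    """Hamming distance between two binary frame hashes."""
    return bin(h1 ^ h2).count("1")

def similarity_score(query_hashes: list[int], stored_hashes: list[int],
                     bits: int = 64) -> float:
    """Average bit-level similarity between aligned key-frame hashes (0..1)."""
    n = min(len(query_hashes), len(stored_hashes))
    if n == 0:
        return 0.0
    matching_bits = sum(bits - frame_hash_distance(q, s)
                        for q, s in zip(query_hashes[:n], stored_hashes[:n]))
    return matching_bits / (n * bits)

def eviction_decision(score: float,
                      exact_threshold: float = 1.0,
                      similar_threshold: float = 0.9) -> str:
    """Two-level decision: exact duplicates first, then similar duplicates."""
    if score >= exact_threshold:
        return "remove-exact-duplicate"
    if score >= similar_threshold:
        return "remove-similar-duplicate"
    return "keep"
```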

    Cross-Layer Fragment Indexing based File Deduplication using Hyper Spectral Hash Duplicate Filter (HSHDF) for Optimized Cloud Storage

    Cloud computing and storage is a major service for maintaining large amounts of data on a centralized server, where data are stored and retrieved under a pay-as-you-use service model. Because duplicate copies accumulate under different scenarios, storage size grows and cost increases with it. To resolve this problem, we propose Cross-Layer Fragment Indexing (CLFI) based file deduplication using a Hyper Spectral Hash Duplicate Filter (HSHDF) for optimized cloud storage. Initially, file storage indexing is carried out with a Lexical Syntactic Parser (LSP) to split the files into blocks. A comparative sector is then created based on chunk stacking. Based on the file frequency weight, the relative indexing is verified through Cross-Layer Fragment Indexing (CLFI). The fragmented index is then grouped by the maximum relative threshold margin using Intra-Subset Near-Duplicate Clusters (ISNDC). Hashing is applied to obtain comparative index points based on a hyper-correlation comparer using the Hyper Spectral Hash Duplicate Filter (HSHDF), which filters near-duplicate content depending on file content differences to identify duplicates. The proposed system produces high performance compared to other systems; it optimizes cloud storage and achieves a higher precision rate than other methods.
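    The block-splitting and hash-based duplicate-filtering steps outlined above can be illustrated with the sketch below. Because the abstract does not specify CLFI, ISNDC, or HSHDF in code, the fixed block size, the SHA-256 fingerprints, and the matching threshold are assumptions standing in for those components.

```python
# Illustrative sketch only: block size, SHA-256 fingerprints, and the threshold
# are assumptions, not the CLFI/HSHDF pipeline itself.
import hashlib

def split_into_blocks(data: bytes, block_size: int = 4096) -> list[bytes]:
    """Split a file into fixed-size blocks (stand-in for the LSP splitting step)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def block_fingerprints(blocks: list[bytes]) -> list[str]:
    """Hash every block so candidate duplicates can be found by index lookups."""
    return [hashlib.sha256(b).hexdigest() for b in blocks]

def near_duplicate_filter(file_a: bytes, file_b: bytes,
                          threshold: float = 0.8) -> bool:
    """Flag file_b as a near duplicate of file_a if most of its block hashes match."""
    fingerprints_a = set(block_fingerprints(split_into_blocks(file_a)))
    fingerprints_b = block_fingerprints(split_into_blocks(file_b))
    if not fingerprints_b:
        return False
    shared = sum(1 for h in fingerprints_b if h in fingerprints_a)
    return shared / len(fingerprints_b) >= threshold
```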