
    A Robust Fault-Tolerant and Scalable Cluster-wide Deduplication for Shared-Nothing Storage Systems

    Full text link
    Deduplication has been widely employed in distributed storage systems to improve space efficiency. Traditional deduplication research ignores the design specifications of shared-nothing distributed storage systems, such as the absence of a central metadata bottleneck, scalability, and storage rebalancing. Further, deduplication introduces transactional changes, which are prone to errors in the event of a system failure, resulting in inconsistencies between data and deduplication metadata. In this paper, we propose a robust, fault-tolerant and scalable cluster-wide deduplication that can eliminate duplicate copies across the cluster. We design a distributed deduplication metadata shard which guarantees performance scalability while preserving the design constraints of shared-nothing storage systems. The placement of chunks and deduplication metadata is made cluster-wide based on the content fingerprint of chunks. To ensure transactional consistency and garbage identification, we employ a flag-based asynchronous consistency mechanism. We implement the proposed deduplication on Ceph. The evaluation shows high disk-space savings with minimal performance degradation, as well as high robustness in the event of sudden server failure. Comment: 6 pages including references
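    As a rough illustration of the fingerprint-driven placement described above, the hedged Python sketch below hashes each chunk, derives both its metadata shard and its storage node from that fingerprint, and flags new entries before the chunk write completes. All names, shard and node counts, and the flag handling are illustrative assumptions, not the paper's actual Ceph implementation.

```python
# Hypothetical sketch of fingerprint-based placement for cluster-wide
# deduplication; shard/node counts and names are illustrative only.
import hashlib

NUM_METADATA_SHARDS = 64   # deduplication metadata shards spread across the cluster
NUM_STORAGE_NODES = 16     # shared-nothing storage nodes

def fingerprint(chunk: bytes) -> str:
    """Content fingerprint of a chunk (SHA-256 here, by assumption)."""
    return hashlib.sha256(chunk).hexdigest()

def placement(fp: str) -> tuple[int, int]:
    """Derive both the metadata shard and the storage node from the
    fingerprint, so no central metadata server needs to be consulted."""
    h = int(fp, 16)
    return h % NUM_METADATA_SHARDS, h % NUM_STORAGE_NODES

def store_on_node(node_id: int, fp: str, chunk: bytes) -> None:
    """Stub for writing the chunk to its shared-nothing storage node."""
    pass

def write_chunk(chunk: bytes, shards: list[dict]) -> None:
    fp = fingerprint(chunk)
    shard_id, node_id = placement(fp)
    shard = shards[shard_id]
    entry = shard.get(fp)
    if entry is None:
        # First copy: flag the entry as pending before storing the chunk, so a
        # crash between the two steps can later be detected and repaired
        # (loosely modeled on the abstract's flag-based consistency idea).
        shard[fp] = {"node": node_id, "refs": 1, "flag": "pending"}
        store_on_node(node_id, fp, chunk)
        shard[fp]["flag"] = "committed"
    else:
        entry["refs"] += 1   # duplicate chunk: only bump the reference count
```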

    SEARS: Space Efficient And Reliable Storage System in the Cloud

    Full text link
    Today's cloud storage services must offer storage reliability and fast data retrieval for large amounts of data without sacrificing storage cost. We present SEARS, a cloud-based storage system which integrates erasure coding and data deduplication to support efficient and reliable data storage with fast user response time. With proper association of data to storage server clusters, SEARS provides flexible mixing of different configurations, suitable for real-time and archival applications. Our prototype implementation of SEARS over Amazon EC2 shows that it outperforms existing storage systems in storage efficiency and file retrieval time. For 3 MB files, SEARS delivers a retrieval time of 2.5 s, compared to 7 s with existing systems. Comment: 4 pages, IEEE LCN 201
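    The following hedged sketch illustrates the general idea of combining deduplication with erasure coding as the abstract describes it: only unique chunks are fragmented and spread across servers, with a single XOR parity fragment standing in for a real erasure code. The class and function names, the 2+1 coding layout, and the placement rule are assumptions for illustration, not SEARS's actual design.

```python
# Illustrative sketch (not SEARS's actual design): deduplicate unique chunks,
# then erasure-code each unique chunk across servers. A single XOR parity
# fragment (2 data + 1 parity) stands in for a production erasure code.
import hashlib

def split_and_encode(chunk: bytes) -> list[bytes]:
    """Split a chunk into 2 equal data fragments plus 1 XOR parity fragment."""
    half = (len(chunk) + 1) // 2
    d0, d1 = chunk[:half], chunk[half:].ljust(half, b"\0")
    parity = bytes(a ^ b for a, b in zip(d0, d1))
    return [d0, d1, parity]

class DedupErasureStore:
    def __init__(self, servers: int = 3):
        self.index = {}                                # fingerprint -> fragment locations
        self.servers = [dict() for _ in range(servers)]

    def put(self, chunk: bytes) -> str:
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in self.index:                       # only unique chunks are encoded and stored
            locations = []
            for i, frag in enumerate(split_and_encode(chunk)):
                server = (int(fp, 16) + i) % len(self.servers)
                self.servers[server][(fp, i)] = frag   # each fragment lands on a different server
                locations.append(server)
            self.index[fp] = locations
        return fp                                      # duplicates resolve to the same fingerprint
```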

    A Survey on Data Deduplication

    Get PDF
    Nowadays, the demand for data storage capacity is increasing drastically. Because of this growing demand, the computing community is turning toward cloud storage. Data security and cost are important challenges in cloud storage. A duplicate file not only wastes storage, it also increases access time, so the detection and removal of duplicate data is an essential task. Data deduplication, an efficient approach to data reduction, has gained increasing attention and popularity in large-scale storage systems. It eliminates redundant data at the file or subfile level and identifies duplicate content by its cryptographically secure hash signature. Detecting duplicates is tricky because duplicate files neither share a common key nor contain errors that would distinguish them. There are several approaches to identifying and removing redundant data at the file and chunk levels. This paper covers the background and key features of data deduplication, then summarizes and classifies the data deduplication process according to its key workflow.
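    To make the hash-signature workflow concrete, here is a minimal chunk-level deduplication sketch: a file is split into fixed-size chunks, each chunk is fingerprinted with SHA-256, and only previously unseen chunks are stored. The fixed chunk size and the in-memory store are simplifying assumptions; production systems typically use content-defined chunking and persistent indexes.

```python
# Minimal fixed-size chunk-level deduplication sketch (illustrative only).
import hashlib

CHUNK_SIZE = 4096          # fixed chunk size, chosen purely for illustration
chunk_store = {}           # fingerprint -> chunk bytes (the unique-chunk store)

def dedup_file(data: bytes) -> list[str]:
    """Return the file's recipe: an ordered list of chunk fingerprints.
    Chunk bytes are stored only the first time a fingerprint is seen."""
    recipe = []
    for off in range(0, len(data), CHUNK_SIZE):
        chunk = data[off:off + CHUNK_SIZE]
        fp = hashlib.sha256(chunk).hexdigest()   # cryptographically secure signature
        if fp not in chunk_store:
            chunk_store[fp] = chunk              # new content: store it once
        recipe.append(fp)                        # duplicate content: reference it
    return recipe

def restore_file(recipe: list[str]) -> bytes:
    """Rebuild the original file from its recipe."""
    return b"".join(chunk_store[fp] for fp in recipe)
```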

    Optimal Data Deduplication In Cloud With Authorization

    Get PDF
    Cloud technology is widely used because it allows sharing and centralized storage of data, shared data processing, and online access to computer services and resources from various types of devices. One of the critical challenges of cloud storage services is the management of the ever-increasing volume of data. Data deduplication is a novel technique for addressing this challenge: it removes, and prevents the creation of, duplicate copies of the same data. Although deduplication has several benefits, it adds concerns related to user privacy and security, as it can lead to insider or outsider attacks. Achieving deduplication together with data security in a cloud environment makes the problem more critical to solve. The objective of this paper on Optimal Authorized Data Deduplication in Cloud is to present the proposed system and an analysis of deduplication techniques and optimal authorization measures for security alongside deduplication in a cloud environment. DOI: 10.17762/ijritcc2321-8169.15073
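    The abstract does not spell out its authorization mechanism, so the sketch below shows one common formulation of authorized deduplication for illustration only: the duplicate-check token is derived from both the content hash and a per-privilege key, so duplicates are detected only within the privileges a user holds. The privilege names, keys, and token construction are assumptions, not necessarily this paper's scheme.

```python
# Hedged sketch of privilege-scoped duplicate checking (a common formulation
# of authorized deduplication; not necessarily this paper's exact scheme).
import hashlib
import hmac

privilege_keys = {"finance": b"k_finance", "hr": b"k_hr"}   # hypothetical per-privilege keys
stored_tokens = set()                                        # tokens of files already stored

def dedup_token(data: bytes, privilege: str) -> str:
    """Token depends on content AND privilege, so a user can only detect
    duplicates among files their privilege is authorized to access."""
    return hmac.new(privilege_keys[privilege],
                    hashlib.sha256(data).digest(),
                    hashlib.sha256).hexdigest()

def upload(data: bytes, privilege: str) -> str:
    token = dedup_token(data, privilege)
    if token in stored_tokens:
        return "duplicate: store a reference only"
    stored_tokens.add(token)
    return "new file: upload and store"
```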

    A Review on Deduplication-Cost Efficient Method to Store Data Over Cloud Using Convergent Encryption

    Get PDF
    This paper describes how, among the many techniques used to eliminate duplicate copies of repeating data, data deduplication is the most important data compression technique. Convergent encryption has been used to encrypt data before outsourcing, for privacy and security. In the proposed system, we apply the technique of cryptographic tuning to make the encryption more secure and flexible, addressing a limitation of convergent encryption in previous systems. Data deduplication does not allow the storage of repetitive blocks; instead, it places pointers to the existing blocks, and the data owner has the freedom to select which users have access to the published file. Access control is provided in the application. The integrity of data outsourced to the cloud is managed by hash calculation of the content, following the proof-of-ownership module. The proposed system calculates the hash value of the data content on both sides, i.e., the destination as well as the source side, and requests the hash from the cloud side to detect tampering of the data. The expected analysis shows an improvement in execution time and development cost.
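    A minimal sketch of baseline convergent encryption may help here, since the proposed system builds on it: the key is derived from the plaintext's hash, so identical files produce identical ciphertexts and remain deduplicable, and a hash of the ciphertext can serve as the value compared during the duplicate check or proof-of-ownership step. The use of AES-GCM via the third-party cryptography package is an assumption; the paper does not name its cipher or the details of its cryptographic tuning.

```python
# Baseline convergent-encryption sketch (illustrative; cipher choice assumed).
import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def convergent_encrypt(data: bytes) -> tuple[bytes, bytes, bytes]:
    key = hashlib.sha256(data).digest()          # convergent key = H(plaintext)
    nonce = hashlib.sha256(key).digest()[:12]    # deterministic nonce: intentional here,
                                                 # since determinism is what makes the
                                                 # ciphertext deduplicable
    ciphertext = AESGCM(key).encrypt(nonce, data, None)
    tag = hashlib.sha256(ciphertext).digest()    # hash used for duplicate check /
                                                 # proof-of-ownership comparison
    return key, ciphertext, tag

def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    nonce = hashlib.sha256(key).digest()[:12]
    return AESGCM(key).decrypt(nonce, ciphertext, None)
```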

    Secure and scalable deduplication of horizontally partitioned health data for privacy-preserving distributed statistical computation

    Get PDF
    Background: Techniques have been developed to compute statistics on distributed datasets without revealing private information except the statistical results. However, duplicate records in a distributed dataset may lead to incorrect statistical results. Therefore, to increase the accuracy of the statistical analysis of a distributed dataset, secure deduplication is an important preprocessing step.
    Methods: We designed a secure protocol for the deduplication of horizontally partitioned datasets with deterministic record linkage algorithms. We provided a formal security analysis of the protocol in the presence of semi-honest adversaries. The protocol was implemented and deployed across three microbiology laboratories located in Norway, and we ran experiments on datasets in which the number of records for each laboratory varied. Experiments were also performed on simulated microbiology datasets and data custodians connected through a local area network.
    Results: The security analysis demonstrated that the protocol protects the privacy of individuals and data custodians under a semi-honest adversarial model. More precisely, the protocol remains secure with the collusion of up to N − 2 corrupt data custodians. The total runtime for the protocol scales linearly with the addition of data custodians and records. One million simulated records distributed across 20 data custodians were deduplicated within 45 s. The experimental results showed that the protocol is more efficient and scalable than previous protocols for the same problem.
    Conclusions: The proposed deduplication protocol is efficient and scalable for practical use while protecting the privacy of patients and data custodians.
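    The full protocol relies on secure multiparty techniques that are beyond a short example, but the hedged sketch below shows the deterministic-record-linkage building block in isolation: each custodian replaces a record's linkage fields with a keyed hash, so duplicates can be counted across sites without exposing identifiers. The field names, the shared secret, and the central counting step are illustrative assumptions, not the paper's actual protocol.

```python
# Illustrative building block only (NOT the paper's full protocol): each data
# custodian keyed-hashes the deterministic linkage fields of its records, so
# cross-site duplicates can be counted without revealing identifiers.
import hashlib
import hmac

SHARED_SECRET = b"agreed-out-of-band"     # hypothetical secret shared by the custodians

def pseudonymize(record: dict) -> str:
    """Deterministic linkage key: identical patient fields -> identical pseudonym.
    The field names are hypothetical examples."""
    linkage = "|".join([record["national_id"], record["birth_date"]]).lower()
    return hmac.new(SHARED_SECRET, linkage.encode(), hashlib.sha256).hexdigest()

def deduplicate(partitions: list[list[dict]]) -> int:
    """Count unique records across horizontally partitioned datasets,
    one list of records per data custodian."""
    seen = set()
    for records in partitions:
        for record in records:
            seen.add(pseudonymize(record))
    return len(seen)
```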