A Robust Fault-Tolerant and Scalable Cluster-wide Deduplication for Shared-Nothing Storage Systems

Hamandawana, Prince; Khan, Awais; Kim, Youngjae; Lee, Chang-Gyu; Park, Sungyong

Publication date: 20 March 2018

Abstract

Deduplication has been widely employed in distributed storage systems to improve space efficiency. However, traditional deduplication research ignores the design constraints of shared-nothing distributed storage systems, such as the absence of a central metadata bottleneck, scalability, and storage rebalancing. Further, deduplication introduces transactional changes that are prone to errors in the event of a system failure, resulting in inconsistencies in data and deduplication metadata. In this paper, we propose a robust, fault-tolerant, and scalable cluster-wide deduplication that can eliminate duplicate copies across the cluster. We design a distributed deduplication metadata shard that guarantees performance scalability while preserving the design constraints of shared-nothing storage systems. Chunks and their deduplication metadata are placed cluster-wide based on the content fingerprint of each chunk. To ensure transactional consistency and garbage identification, we employ a flag-based asynchronous consistency mechanism. We implement the proposed deduplication on Ceph. The evaluation shows high disk-space savings with minimal performance degradation, as well as high robustness in the event of a sudden server failure.

Comment: 6 pages, including references
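The abstract states that chunks and deduplication metadata are placed cluster-wide according to each chunk's content fingerprint. The sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. It assumes fixed-size 4 KiB chunks, SHA-256 fingerprints, and a hypothetical `Cluster` class whose per-server `shards` dictionaries stand in for the distributed deduplication metadata shard (fingerprint to reference count). Ceph itself places objects with its CRUSH algorithm; rendezvous (highest-random-weight) hashing is swapped in here only as a simple placement function that, like CRUSH, needs no central lookup table and moves relatively few keys when servers join or leave.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size in bytes (hypothetical)


def fingerprint(chunk: bytes) -> str:
    """Content fingerprint of a chunk (SHA-256, hex-encoded)."""
    return hashlib.sha256(chunk).hexdigest()


def owner_server(fp: str, servers: list[str]) -> str:
    """Rendezvous hashing: the server with the highest hash(server, fp) owns fp."""
    return max(servers, key=lambda s: hashlib.sha256(f"{s}:{fp}".encode()).digest())


class Cluster:
    """Hypothetical model of a shared-nothing cluster with sharded dedup metadata."""

    def __init__(self, servers: list[str]) -> None:
        self.servers = servers
        # Per-server metadata shard: fingerprint -> reference count.
        self.shards: dict[str, dict[str, int]] = {s: {} for s in servers}
        # Per-server chunk store: fingerprint -> chunk bytes.
        self.stores: dict[str, dict[str, bytes]] = {s: {} for s in servers}

    def write(self, data: bytes) -> list[str]:
        """Split data into chunks, deduplicate cluster-wide, return the recipe."""
        recipe = []
        for off in range(0, len(data), CHUNK_SIZE):
            chunk = data[off:off + CHUNK_SIZE]
            fp = fingerprint(chunk)
            srv = owner_server(fp, self.servers)  # chunk and its metadata share an owner
            shard = self.shards[srv]
            if fp in shard:
                shard[fp] += 1                    # duplicate: only bump the refcount
            else:
                shard[fp] = 1
                self.stores[srv][fp] = chunk      # first copy: store the data
            recipe.append(fp)
        return recipe

    def read(self, recipe: list[str]) -> bytes:
        """Reassemble an object by fetching each chunk from its owner server."""
        return b"".join(self.stores[owner_server(fp, self.servers)][fp] for fp in recipe)
```

Because the owner of a chunk is a pure function of its fingerprint, any server can locate both the chunk and its metadata without consulting a central index, and a duplicate chunk written by any client always lands on the same shard, where it is detected and only its reference count is incremented.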
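The abstract does not detail the flag-based asynchronous consistency mechanism, so the following is only one plausible reading of how flags can provide transactional consistency and garbage identification, not the protocol used in the paper. It assumes (hypothetically) that each object recipe carries a `committed` flag that is set only after all of its chunks are stored, that a `recover()` pass discards recipes whose flag was never set, and that a background `gc()` pass uses per-chunk mark flags to find chunks that no remaining recipe references. All names (`DedupStore`, `Recipe`, `crash_before_commit`) are illustrative.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Recipe:
    # Ordered chunk fingerprints that make up one object.
    fingerprints: list[str] = field(default_factory=list)
    # Hypothetical commit flag: set only after every chunk is stored.
    committed: bool = False


class DedupStore:
    """Hypothetical single-node sketch of flag-based consistency and GC."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}    # fingerprint -> chunk data
        self.live: dict[str, bool] = {}       # fingerprint -> GC mark flag
        self.recipes: dict[str, Recipe] = {}  # object name -> recipe

    def write(self, name: str, chunks: list[bytes], crash_before_commit: bool = False) -> None:
        recipe = Recipe()
        self.recipes[name] = recipe              # 1. recorded as uncommitted
        for chunk in chunks:
            fp = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(fp, chunk)    # 2. duplicates are stored once
            self.live.setdefault(fp, False)
            recipe.fingerprints.append(fp)
        if crash_before_commit:
            return                               # simulated failure: flag never set
        recipe.committed = True                  # 3. one flag flip commits the write

    def recover(self) -> None:
        """After a failure, drop recipes whose commit flag was never set."""
        for name in [n for n, r in self.recipes.items() if not r.committed]:
            del self.recipes[name]

    def delete(self, name: str) -> None:
        del self.recipes[name]                   # chunks are reclaimed later by gc()

    def gc(self) -> list[str]:
        """Mark-and-sweep pass; returns the fingerprints reclaimed as garbage.

        Recipes still present (committed or in flight) protect their chunks,
        so in this sketch recover() must run before gc() after a failure.
        """
        for fp in self.live:                     # clear every mark flag
            self.live[fp] = False
        for recipe in self.recipes.values():     # mark chunks any recipe still uses
            for fp in recipe.fingerprints:
                self.live[fp] = True
        garbage = [fp for fp, marked in self.live.items() if not marked]
        for fp in garbage:                       # sweep unmarked chunks
            del self.chunks[fp]
            del self.live[fp]
        return garbage
```

Because chunk writes are content-addressed and idempotent, the single flag flip acts as the commit point: a crash before it leaves only orphaned chunks, which the background sweep later identifies as garbage, so the write path never has to undo partial work synchronously.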
