A Robust Fault-Tolerant and Scalable Cluster-wide Deduplication for Shared-Nothing Storage Systems
Deduplication has been widely employed in distributed storage systems to
improve space efficiency. Traditional deduplication research ignores the design
specifications of shared-nothing distributed storage systems such as no central
metadata bottleneck, scalability, and storage rebalancing. Further,
deduplication introduces transactional changes, which are prone to errors in
the event of a system failure, resulting in inconsistencies in data and
deduplication metadata. In this paper, we propose a robust, fault-tolerant and
scalable cluster-wide deduplication that can eliminate duplicate copies across
the cluster. We design a distributed deduplication metadata shard which
guarantees performance scalability while preserving the design constraints of
shared-nothing storage systems. The placement of chunks and deduplication
metadata is made cluster-wide based on the content fingerprint of chunks. To
ensure transactional consistency and garbage identification, we employ a
flag-based asynchronous consistency mechanism. We implement the proposed
deduplication on Ceph. The evaluation shows high disk-space savings with
minimal performance degradation as well as high robustness in the event of
sudden server failure.
Comment: 6 Pages including reference
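
The two mechanisms named in the abstract, fingerprint-based placement and a flag-based consistency check, can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration: the `Shard` and `Cluster` classes, the SHA-256 fingerprint, and the modulo shard mapping are hypothetical stand-ins, not the paper's implementation or Ceph's actual placement logic. The `committed` flag is set in a separate step so that an entry whose write was interrupted stays unflagged and can later be identified as garbage.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size, chosen for illustration


def fingerprint(chunk: bytes) -> str:
    """Content fingerprint of a chunk (SHA-256 is an assumed choice)."""
    return hashlib.sha256(chunk).hexdigest()


def shard_for(fp: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a fingerprint to a shard; modulo stands in for real placement."""
    return int(fp, 16) % num_shards


class Shard:
    """One node holding both chunks and their deduplication metadata."""

    def __init__(self):
        self.chunks = {}  # fingerprint -> chunk bytes
        self.meta = {}    # fingerprint -> {"refs": int, "committed": bool}

    def write(self, fp: str, chunk: bytes) -> bool:
        """Store a chunk; return False if it was a committed duplicate."""
        entry = self.meta.get(fp)
        if entry is not None and entry["committed"]:
            entry["refs"] += 1  # duplicate: only bump the reference count
            return False
        # New (or previously interrupted) chunk: store it with the flag unset.
        self.meta[fp] = {"refs": 1, "committed": False}
        self.chunks[fp] = chunk
        return True

    def commit(self, fp: str) -> None:
        """Set the flag once the write is known to have completed."""
        self.meta[fp]["committed"] = True

    def collect_garbage(self) -> list:
        """Drop entries whose flag was never set; return their fingerprints."""
        stale = [fp for fp, e in self.meta.items() if not e["committed"]]
        for fp in stale:
            del self.meta[fp]
            self.chunks.pop(fp, None)
        return stale


class Cluster:
    """Routes each chunk to the shard selected by its fingerprint."""

    def __init__(self, num_shards: int = NUM_SHARDS):
        self.shards = [Shard() for _ in range(num_shards)]

    def put(self, chunk: bytes, commit: bool = True):
        fp = fingerprint(chunk)
        shard = self.shards[shard_for(fp, len(self.shards))]
        stored = shard.write(fp, chunk)
        if stored and commit:
            shard.commit(fp)
        return fp, stored
```

Because identical chunks always hash to the same shard, a duplicate lookup touches one node only and needs no central metadata server. Calling `put(chunk, commit=False)` simulates a failure between the write and the flag update; a later `collect_garbage()` on that shard removes the unflagged entry.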