Topology-Aware Node Selection for Data Regeneration in Heterogeneous Distributed Storage Systems
Distributed storage systems introduce redundancy to protect data from node
failures. After a storage node fails, the lost data should be regenerated at a
replacement storage node as soon as possible to maintain the same level of
redundancy. Minimizing such a regeneration time is critical to the reliability
of distributed storage systems. Existing work aims to reduce the
regeneration time by either minimizing the regenerating traffic or adjusting
the regenerating traffic patterns, whereas the nodes participating in data
regeneration are generally assumed to be given beforehand. However, the
regeneration time also depends heavily on the selection of the participating
nodes. Selecting different participating nodes involves different data
links between the nodes. Real-world distributed storage systems usually exhibit
heterogeneous link capacities. It is possible to further reduce the
regeneration time by exploiting such link capacity differences and avoiding
the link bottlenecks. In this paper, we consider the minimization of the
regeneration time by selecting the participating nodes in heterogeneous
networks. We analyze the regeneration time and propose node selection
algorithms for overlay networks and real-world topologies. Because the number
of data blocks retrieved from each provider is flexible and can strongly
influence the regeneration time, we design several techniques to enhance our
schemes in overlay networks. Experimental results show that our node selection schemes can
significantly reduce the regeneration time for each topology, especially in
practical networks with heterogeneous link capacities.
Comment: 14 pages, 7 pages, 4 algorithms
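
To illustrate why the choice of participating nodes matters, here is a minimal sketch under simplifying assumptions that are not taken from the paper: a star-shaped overlay in which every provider sends one block of equal size directly to the replacement node, all transfers run in parallel, and the regeneration time is set by the slowest selected link. The names `select_providers` and `regeneration_time`, and the capacity values, are hypothetical. This is not the paper's algorithm, only a greedy baseline that avoids the bottleneck link.

```python
def select_providers(capacities, d):
    """Pick the d providers with the highest link capacity to the newcomer.

    capacities: dict mapping provider name -> link capacity (e.g. MB/s).
    Assumption: direct provider-to-newcomer links (star overlay).
    """
    if d <= 0 or d > len(capacities):
        raise ValueError("d must be between 1 and the number of candidates")
    ranked = sorted(capacities, key=capacities.get, reverse=True)
    return ranked[:d]


def regeneration_time(capacities, providers, block_size):
    """Time for parallel transfers of equal-size blocks: the slowest link dominates."""
    bottleneck = min(capacities[p] for p in providers)
    return block_size / bottleneck


# Hypothetical link capacities in MB/s.
capacities = {"a": 100.0, "b": 10.0, "c": 50.0, "d": 80.0}
chosen = select_providers(capacities, 2)
fast = regeneration_time(capacities, chosen, 400.0)
slow = regeneration_time(capacities, ["a", "b"], 400.0)
print(chosen, fast, slow)
```

With these made-up capacities, an arbitrary selection that includes the 10 MB/s link takes eight times as long as the capacity-aware selection. In a real topology, links are shared and multi-hop, so this simple ranking no longer suffices.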