Tradeoff for Heterogeneous Distributed Storage Systems between Storage and Repair Cost
In this paper, we consider heterogeneous distributed storage systems (DSSs)
having flexible reconstruction degree, where each node in the system has
dynamic repair bandwidth and dynamic storage capacity. In particular, a data
collector can reconstruct the file at any time using an arbitrary set of
nodes in the system, and a node failure can be repaired by an arbitrary set
of helper nodes. Using a min-cut bound, we investigate the fundamental
tradeoff between storage and repair cost for our model of heterogeneous DSS. In
particular, the problem is formulated as a bi-objective linear programming
problem. For an arbitrary DSS, it is shown that the calculated min-cut bound
is tight.
Comment: 10 pages, 5 figures, draft
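The tradeoff studied here generalizes the classical homogeneous cut-set tradeoff for regenerating codes. As a point of reference only, and not the paper's heterogeneous bound, a minimal sketch that traces the homogeneous storage/repair-bandwidth curve (file size M, any k nodes reconstruct the file, d helpers per repair) could look like:

```python
def min_repair_bandwidth(M, k, d, alpha):
    """Smallest per-helper download beta satisfying the cut-set bound
    sum_{i=0}^{k-1} min(alpha, (d - i) * beta) >= M, found by binary
    search (homogeneous regenerating-code model, illustrative only)."""
    if k * alpha < M:
        return None  # file cannot be stored at this node capacity
    lo, hi = 0.0, M  # beta = M always satisfies the bound
    for _ in range(100):
        mid = (lo + hi) / 2
        if sum(min(alpha, (d - i) * mid) for i in range(k)) >= M:
            hi = mid
        else:
            lo = mid
    return hi

# Sweeping node storage alpha traces the storage/repair-bandwidth curve.
curve = [(a / 10, min_repair_bandwidth(1.0, 3, 4, a / 10))
         for a in range(4, 11)]
```

The two ends of such a curve correspond to the minimum-storage (MSR) and minimum-bandwidth (MBR) operating points.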
Non-homogeneous Two-Rack Model for Distributed Storage Systems
In the traditional two-rack distributed storage system (DSS) model, due to
the assumption that the storage capacity of each node is the same, the minimum
bandwidth regenerating (MBR) point becomes infeasible. In this paper, we design
a new non-homogeneous two-rack model by proposing a generalization of the
threshold function used to compute the tradeoff curve. We prove that when the
nodes in the rack with the higher regenerating bandwidth store more
information, all the points on the tradeoff curve, including the MBR point,
become feasible. Finally, we show how the non-homogeneous two-rack model
outperforms the traditional model in the tradeoff curve between the storage per
node and the repair bandwidth.
Comment: ISIT 2013. arXiv admin note: text overlap with arXiv:1004.0785 by
other authors
Modeling and Optimization of Latency in Erasure-coded Storage Systems
As consumers increasingly engage in social networking and E-commerce
activities, businesses grow to rely on Big Data analytics for intelligence,
and traditional IT infrastructures continue to migrate to the cloud and edge,
demand for distributed data storage is rising at an unprecedented speed.
Erasure coding has quickly emerged as a promising technique to
reduce storage cost while providing similar reliability as replicated systems,
widely adopted by companies like Facebook, Microsoft and Google. However, it
also brings new challenges in characterizing and optimizing the access latency
when erasure codes are used in distributed storage. The aim of this monograph
is to provide a review of recent progress (both theoretical and practical) on
systems that employ erasure codes for distributed storage.
In this monograph, we first identify the key challenges and present a
taxonomy of the research problems, then give an overview of the approaches that
have been developed to quantify and model latency of erasure-coded storage.
This includes recent work leveraging MDS-Reservation, Fork-Join, Probabilistic,
and Delayed-Relaunch scheduling policies, as well as their applications to
characterize access latency (e.g., mean, tail, asymptotic latency) of
erasure-coded distributed storage systems. We will also extend the problem to
the case when users are streaming videos from erasure-coded distributed storage
systems. Next, we bridge the gap between theory and practice, and discuss
lessons learned from prototype implementation. In particular, we will discuss
exemplary implementations of erasure-coded storage, illuminate key design
degrees of freedom and tradeoffs, and summarize remaining challenges in
real-world storage systems such as in content delivery and caching. Open
problems for future research are discussed at the end of each chapter.
Comment: Monograph for use by researchers interested in latency aspects of
distributed storage systems
Multi-Rack Distributed Data Storage Networks
The majority of works in distributed storage networks assume a simple network
model with a collection of identical storage nodes with the same communication
cost between the nodes. In this paper, we consider a realistic multi-rack
distributed data storage network and present a code design framework for this
model. Considering the cheaper data transmission within the racks, our code
construction method is able to locally repair node failures within a rack
using only the surviving nodes in the same rack. However, in the case of
severe failure patterns, when the information content of the surviving nodes is
not sufficient to repair the failures, other racks will participate in the
repair process. By employing the criteria of our multi-rack storage code, we
establish a linear programming bound on the size of the code in order to
maximize the code rate.
Accelerating Data Regeneration for Distributed Storage Systems with Heterogeneous Link Capacities
Distributed storage systems provide large-scale reliable data storage
services by spreading redundancy across a large group of storage nodes. In such
a large system, node failures take place on a regular basis. When a storage
node breaks down, a replacement node is expected to regenerate the redundant
data as soon as possible in order to maintain the same level of redundancy.
Previous results have been mainly focused on the minimization of network
traffic in regeneration. However, in practical networks, where link capacities
vary in a wide range, minimizing network traffic does not always yield the
minimum regeneration time. In this paper, we investigate two approaches to the
problem of minimizing regeneration time in networks with heterogeneous link
capacities. The first approach is to download different amounts of repair data
from the helper nodes according to their link capacities. The second approach
generalizes the conventional star-structured regeneration topology to
tree-structured topologies so that we can utilize the links between helper
nodes while bypassing low-capacity links. Simulation results show that the
flexible tree-structured regeneration scheme that combines the advantages of
both approaches can achieve a substantial reduction in the regeneration time.
Comment: submitted to Trans. IT in Feb. 201
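The first approach above has a simple closed form in a star topology: if each helper link i has capacity c_i and transfer time x_i / c_i, the bottleneck finish time max_i x_i / c_i is minimized by downloading amounts proportional to the capacities. A small sketch with a hypothetical function name and illustrative numbers:

```python
def proportional_allocation(total_bytes, link_caps):
    """Split the repair download across helper links in proportion to
    capacity; this minimizes the bottleneck time max_i x_i / c_i
    subject to sum_i x_i = total_bytes."""
    c_sum = sum(link_caps)
    alloc = [total_bytes * c / c_sum for c in link_caps]
    finish = max(x / c for x, c in zip(alloc, link_caps))
    return alloc, finish

caps = [100.0, 50.0, 10.0]  # helper link capacities, MB/s
equal_time = max((300.0 / len(caps)) / c for c in caps)  # 10.0 s
alloc, prop_time = proportional_allocation(300.0, caps)  # 1.875 s
```

Under the proportional split every link finishes at the same time, which is exactly the optimality condition for the bottleneck objective.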
Joint Latency and Cost Optimization for Erasure-coded Data Center Storage
Modern distributed storage systems offer large capacity to satisfy the
exponentially increasing need of storage space. They often use erasure codes to
protect against disk and node failures to increase reliability, while trying to
meet the latency requirements of the applications and clients. This paper
provides an insightful upper bound on the average service delay of such
erasure-coded storage with arbitrary service time distribution and consisting
of multiple heterogeneous files. Not only does the result supersede known delay
bounds that only work for a single file or homogeneous files, it also enables a
novel problem of joint latency and storage cost minimization over three
dimensions: selecting the erasure code, placement of encoded chunks, and
optimizing scheduling policy. The problem is efficiently solved via the
computation of a sequence of convex approximations with provable convergence.
We further prototype our solution in an open-source, cloud storage deployment
over three geographically distributed data centers. Experimental results
validate our theoretical delay analysis and show significant latency reduction,
providing valuable insights into the proposed latency-cost tradeoff in
erasure-coded storage.
Comment: 14 pages, presented in part at IFIP Performance, Oct 201
Capacity of Distributed Storage Systems with Clusters and Separate Nodes
In distributed storage systems (DSSs), the optimal tradeoff between node
storage and repair bandwidth is an important issue for designing distributed
coding strategies to ensure large scale data reliability. The capacity of DSSs
is obtained as a function of node storage and repair bandwidth parameters,
characterizing the tradeoff. There is a large body of work on DSSs with
clusters (racks), where intra-cluster and cross-cluster repair bandwidths are
differentiated. However, although separate nodes are also prevalent in
realistic DSSs, work on DSSs with both clusters and separate nodes (CSN-DSSs)
remains scarce. In this paper, we formulate the capacity of CSN-DSSs with one
separate node for the first time, where a separate node is repaired using
cross-cluster bandwidth. Consequently, the optimal tradeoff between node
storage and repair bandwidth is derived and compared with that of clustered
DSSs. A regenerating
code instance is constructed based on the tradeoff. Furthermore, the influence
of adding a separate node is analyzed and formulated theoretically. We prove
that when each cluster contains R nodes and any k nodes suffice to recover the
original file (MDS property), adding an extra separate node preserves the
capacity if R divides k (R|k) and reduces the capacity otherwise.
Diffusive Load Balancing of Loosely-Synchronous Parallel Programs over Peer-to-Peer Networks
The use of under-utilized Internet resources is widely recognized as a viable
form of high-performance computing. Sustained processing power of roughly 40
TFLOPS using 4 million volunteered Internet hosts has been reported for
embarrassingly parallel problems. At the same time, peer-to-peer (P2P) file
sharing networks, with more than 50 million participants, have demonstrated the
capacity for scale in distributed systems. This paper contributes a study of
load balancing techniques for a general class of loosely-synchronous parallel
algorithms when executed over a P2P network. We show that decentralized,
diffusive load balancing can be effective at balancing load and is facilitated
by the dynamic properties of P2P. While a moderate degree of dynamicity can
benefit load balancing, significant dynamicity hinders the parallel program
performance due to the need for increased load migration. To the best of our
knowledge this study provides new insight into the performance of
loosely-synchronous parallel programs over the Internet.
Comment: 14 pages with 10 figures
On the Latency and Energy Efficiency of Erasure-Coded Cloud Storage Systems
The increase in data storage and power consumption at data-centers has made
it imperative to design energy efficient Distributed Storage Systems (DSS). The
energy efficiency of DSS is strongly influenced not only by the volume of data,
frequency of data access and redundancy in data storage, but also by the
heterogeneity exhibited by the DSS in these dimensions. To this end, we propose
and analyze the energy efficiency of a heterogeneous distributed storage system
in which storage servers (disks) store the data of distinct classes.
Data of each class is encoded using its own erasure code, and the (random)
data retrieval requests can also vary across classes. We show that the energy
efficiency of such systems is closely related to the average latency, which
motivates us to study energy efficiency through the lens of average latency.
Through this connection, we show that erasure coding serves the dual purpose of
reducing latency and increasing energy efficiency. We present a queuing
theoretic analysis of the proposed model and establish upper and lower bounds
on the average latency for each data class under various scheduling policies.
Through extensive simulations, we present qualitative insights which reveal the
impact of coding rate, number of servers, service distribution and number of
redundant requests on the average latency and energy efficiency of the DSS.
Comment: Submitted to IEEE Transactions on Cloud Computing. Contains 24 pages,
13 figures
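A zero-queueing toy calculation, not the paper's queueing-theoretic model, shows why redundant requests help: with i.i.d. exponential service times, a read that needs any k of n chunks completes at the k-th order statistic of the service times, whose mean has a closed form:

```python
from math import fsum

def mean_kth_completion(n, k, mu):
    """Mean time until the fastest k of n parallel exponential(mu)
    services finish: (1/mu) * sum_{i=n-k+1}^{n} 1/i, by memorylessness
    and the order statistics of exponentials."""
    return fsum(1.0 / (mu * i) for i in range(n - k + 1, n + 1))

# Querying all n = 6 chunk servers and keeping the first k = 4 replies
# beats querying exactly k = 4 servers with no redundancy.
redundant = mean_kth_completion(6, 4, 1.0)      # 0.95
no_redundancy = mean_kth_completion(4, 4, 1.0)  # ~2.08
```

The gap comes entirely from discarding the slowest tails; a full analysis must also account for the extra load the redundant requests place on the queues.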
On the Duality and File Size Hierarchy of Fractional Repetition Codes
Distributed storage systems that deploy erasure codes can provide attractive
features such as lower storage overhead and higher data reliability. In this
paper, we focus on fractional repetition (FR) codes, which are a class of
storage codes characterized by the features of uncoded exact repair and minimum
repair bandwidth. We study the duality of FR codes, and investigate the
relationship between the supported file size of an FR code and its dual code.
Based on the established relationship, we derive an improved dual bound on the
supported file size of FR codes. We further show that FR codes constructed
from t-designs are optimal when the size of the stored file is sufficiently
large.
Moreover, we present the tensor product technique for combining FR codes, and
elaborate on the file size hierarchy of the resulting codes.
Comment: Submitted for possible journal publication