
    Tradeoff for Heterogeneous Distributed Storage Systems between Storage and Repair Cost

    In this paper, we consider heterogeneous distributed storage systems (DSSs) with a flexible reconstruction degree, where each node in the system has dynamic repair bandwidth and dynamic storage capacity. In particular, a data collector can reconstruct the file at time t using an arbitrary set of nodes in the system, and a node failure can be repaired by an arbitrary set of nodes. Using a min-cut bound, we investigate the fundamental tradeoff between storage and repair cost for our model of heterogeneous DSS. The problem is formulated as a bi-objective linear programming problem. For an arbitrary DSS, it is shown that the calculated min-cut bound is tight. Comment: 10 pages, 5 figures, draft
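
    For context, the classical homogeneous min-cut bound (which the heterogeneous formulation above presumably generalizes by allowing node-dependent storage and repair bandwidth) states that a file of size B, stored on nodes of capacity α, each repairable by downloading β from d helpers and reconstructible from any k nodes, must satisfy

```latex
\sum_{i=0}^{k-1} \min\{\alpha,\,(d-i)\beta\} \;\ge\; B .
```

    Replacing the single pair (α, β) with per-node values is what turns the storage-versus-repair-cost question into the bi-objective linear program mentioned in the abstract.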

    Non-homogeneous Two-Rack Model for Distributed Storage Systems

    In the traditional two-rack distributed storage system (DSS) model, due to the assumption that the storage capacity of each node is the same, the minimum bandwidth regenerating (MBR) point becomes infeasible. In this paper, we design a new non-homogeneous two-rack model by proposing a generalization of the threshold function used to compute the tradeoff curve. We prove that by having the nodes in the rack with higher regenerating bandwidth store more information, all the points on the tradeoff curve, including the MBR point, become feasible. Finally, we show how the non-homogeneous two-rack model outperforms the traditional model in the tradeoff curve between the storage per node and the repair bandwidth. Comment: ISIT 2013. arXiv admin note: text overlap with arXiv:1004.0785 by other authors
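
    For reference, the sketch below traces the classical single-rack (homogeneous) tradeoff curve that the two-rack threshold function above generalizes: it computes the smallest feasible per-node storage α for a given per-helper download β directly from the min-cut bound, and evaluates the MSR and MBR endpoints. B, k, and d are arbitrary example values; the paper's two-rack model is not implemented here.

```python
# Illustrative sketch of the classical single-rack storage/repair-bandwidth tradeoff,
# obtained numerically from the min-cut bound  sum_{i=0}^{k-1} min(alpha, (d-i)*beta) >= B.
# Example parameters only; assumes beta is feasible (beta >= beta_MBR).

def min_alpha(B: float, k: int, d: int, beta: float) -> float:
    """Smallest per-node storage alpha satisfying the min-cut bound for a given beta."""
    lo, hi = 0.0, B  # for feasible beta, alpha = B is a safe upper bracket
    for _ in range(100):  # bisection on the piecewise-linear cut value
        mid = (lo + hi) / 2
        cut = sum(min(mid, (d - i) * beta) for i in range(k))
        lo, hi = (mid, hi) if cut < B else (lo, mid)
    return hi

if __name__ == "__main__":
    B, k, d = 1.0, 5, 9
    beta_mbr = 2 * B / (k * (2 * d - k + 1))   # minimum-bandwidth (MBR) operating point
    beta_msr = B / (k * (d - k + 1))           # minimum-storage (MSR) operating point
    for name, beta in [("MBR", beta_mbr), ("MSR", beta_msr)]:
        print(name, "gamma =", round(d * beta, 4), "alpha =", round(min_alpha(B, k, d, beta), 4))
```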

    Modeling and Optimization of Latency in Erasure-coded Storage Systems

    As consumers are increasingly engaged in social networking and E-commerce activities, businesses grow to rely on Big Data analytics for intelligence, and traditional IT infrastructures continue to migrate to the cloud and edge, these trends cause distributed data storage demand to rise at an unprecedented speed. Erasure coding has quickly emerged as a promising technique to reduce storage cost while providing similar reliability to replicated systems, and has been widely adopted by companies like Facebook, Microsoft and Google. However, it also brings new challenges in characterizing and optimizing the access latency when erasure codes are used in distributed storage. The aim of this monograph is to provide a review of recent progress (both theoretical and practical) on systems that employ erasure codes for distributed storage. In this monograph, we will first identify the key challenges and taxonomy of the research problems and then give an overview of different approaches that have been developed to quantify and model latency of erasure-coded storage. This includes recent work leveraging MDS-Reservation, Fork-Join, Probabilistic, and Delayed-Relaunch scheduling policies, as well as their applications to characterize access latency (e.g., mean, tail, asymptotic latency) of erasure-coded distributed storage systems. We will also extend the problem to the case when users are streaming videos from erasure-coded distributed storage systems. Next, we bridge the gap between theory and practice, and discuss lessons learned from prototype implementation. In particular, we will discuss exemplary implementations of erasure-coded storage, illuminate key design degrees of freedom and tradeoffs, and summarize remaining challenges in real-world storage systems such as in content delivery and caching. Open problems for future research are discussed at the end of each chapter. Comment: Monograph for use by researchers interested in latency aspects of distributed storage systems
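
    A minimal Monte Carlo sketch of the Fork-Join intuition surveyed above: a read of a file encoded with an (n, k) MDS code is forked to all n servers and completes when the fastest k chunk downloads finish. The exponential service times and the (14, 10) versus (10, 10) comparison are illustrative assumptions, not the monograph's models.

```python
import random

# Toy (n, k) fork-join read: fork to all n servers, finish when any k complete.
# Exponential service times and the parameters below are illustrative only.

def fork_join_latency(n: int, k: int, mean_service: float = 1.0, trials: int = 100_000) -> float:
    total = 0.0
    for _ in range(trials):
        times = sorted(random.expovariate(1.0 / mean_service) for _ in range(n))
        total += times[k - 1]  # the k-th fastest server completes the request
    return total / trials

if __name__ == "__main__":
    print("fork-join (14,10):", fork_join_latency(14, 10))   # redundancy helps
    print("no redundancy (10,10):", fork_join_latency(10, 10))
```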

    Multi-Rack Distributed Data Storage Networks

    The majority of works on distributed storage networks assume a simple network model with a collection of identical storage nodes and the same communication cost between the nodes. In this paper, we consider a realistic multi-rack distributed data storage network and present a code design framework for this model. Exploiting the cheaper data transmission within racks, our code construction method is able to locally repair node failures within a rack by using only the surviving nodes in the same rack. However, in the case of severe failure patterns, when the information content of the surviving nodes is not sufficient to repair the failures, other racks participate in the repair process. By employing the criteria of our multi-rack storage code, we establish a linear programming bound on the size of the code in order to maximize the code rate.
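
    A hypothetical sketch of the two-level repair policy described above (the rack layout, node names, and the `min_local_helpers` threshold are illustrative assumptions, not the paper's code construction): repair stays within the failed node's rack when enough survivors remain, and escalates to cross-rack helpers otherwise.

```python
# Two-level repair decision: prefer cheap intra-rack repair, fall back to cross-rack.

def plan_repair(racks: dict[str, set[str]], failed: dict[str, set[str]],
                min_local_helpers: int) -> dict[str, str]:
    plan = {}
    for rack_id, lost_nodes in failed.items():
        survivors = racks[rack_id] - lost_nodes
        for node in lost_nodes:
            if len(survivors) >= min_local_helpers:
                plan[node] = f"intra-rack repair in {rack_id} from {len(survivors)} survivors"
            else:
                plan[node] = f"cross-rack repair: {rack_id} survivors plus other racks"
    return plan

if __name__ == "__main__":
    racks = {"rack1": {"a1", "a2", "a3", "a4"}, "rack2": {"b1", "b2", "b3", "b4"}}
    failed = {"rack1": {"a1"}, "rack2": {"b1", "b2", "b3"}}  # rack2 suffers a severe failure pattern
    for node, action in plan_repair(racks, failed, min_local_helpers=3).items():
        print(node, "->", action)
```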

    Accelerating Data Regeneration for Distributed Storage Systems with Heterogeneous Link Capacities

    Distributed storage systems provide large-scale reliable data storage services by spreading redundancy across a large group of storage nodes. In such a large system, node failures take place on a regular basis. When a storage node breaks down, a replacement node is expected to regenerate the redundant data as soon as possible in order to maintain the same level of redundancy. Previous results have mainly focused on minimizing the network traffic incurred by regeneration. However, in practical networks, where link capacities vary over a wide range, minimizing network traffic does not always yield the minimum regeneration time. In this paper, we investigate two approaches to the problem of minimizing regeneration time in networks with heterogeneous link capacities. The first approach is to download different amounts of repair data from the helping nodes according to the link capacities. The second approach generalizes the conventional star-structured regeneration topology to tree-structured topologies, so that links between helping nodes can be utilized and low-capacity links bypassed. Simulation results show that the flexible tree-structured regeneration scheme that combines the advantages of both approaches can achieve a substantial reduction in the regeneration time. Comment: submitted to Trans. IT in Feb. 201
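
    A sketch of the tree-structured idea, under the simplifying assumption (mine, not necessarily the paper's exact model) that every edge of the regeneration tree forwards the same amount of repair data β in parallel, so regeneration time is set by the tree's bottleneck link; a maximum spanning tree over link capacities then maximizes that bottleneck. The example shows how this can beat the star topology when the newcomer's direct links are slow. Requires the networkx package; the capacities are hypothetical.

```python
import networkx as nx

# Tree-structured regeneration over heterogeneous links: under the assumption that
# each tree edge carries the same repair amount beta in parallel, time is governed
# by the minimum-capacity edge, so we pick a maximum spanning tree.

def regeneration_time(capacities: dict[tuple[str, str], float], beta: float) -> float:
    G = nx.Graph()
    for (u, v), cap in capacities.items():
        G.add_edge(u, v, weight=cap)
    tree = nx.maximum_spanning_tree(G, weight="weight")
    bottleneck = min(data["weight"] for _, _, data in tree.edges(data=True))
    return beta / bottleneck

if __name__ == "__main__":
    # Hypothetical link capacities (MB/s) between newcomer 'r' and helpers a, b, c
    links = {("r", "a"): 10.0, ("r", "b"): 2.0, ("r", "c"): 1.0,
             ("a", "b"): 8.0, ("a", "c"): 6.0, ("b", "c"): 5.0}
    print("tree-structured repair time:", regeneration_time(links, beta=100.0))
    star_bottleneck = min(links[("r", h)] for h in ("a", "b", "c"))
    print("star repair time:", 100.0 / star_bottleneck)  # limited by the slowest direct link
```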

    Joint Latency and Cost Optimization for Erasure-coded Data Center Storage

    Modern distributed storage systems offer large capacity to satisfy the exponentially increasing need for storage space. They often use erasure codes to protect against disk and node failures to increase reliability, while trying to meet the latency requirements of the applications and clients. This paper provides an insightful upper bound on the average service delay of such erasure-coded storage with an arbitrary service time distribution and multiple heterogeneous files. Not only does the result supersede known delay bounds that only work for a single file or homogeneous files, it also enables a novel problem of joint latency and storage cost minimization over three dimensions: selecting the erasure code, placement of encoded chunks, and optimizing the scheduling policy. The problem is efficiently solved via the computation of a sequence of convex approximations with provable convergence. We further prototype our solution in an open-source cloud storage deployment over three geographically distributed data centers. Experimental results validate our theoretical delay analysis and show significant latency reduction, providing valuable insights into the proposed latency-cost tradeoff in erasure-coded storage. Comment: 14 pages, presented in part at IFIP Performance, Oct 201
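
    A toy sketch of the joint latency-cost view (not the paper's delay bound or its convex-approximation algorithm): candidate (n, k) codes are scored by a no-queueing latency proxy, the mean of the k-th fastest of n exponential servers, plus a storage-cost term weighted by a hypothetical parameter theta.

```python
# Joint latency + storage-cost scoring of candidate (n, k) erasure codes.
# The latency proxy ignores queueing; theta and the candidate list are illustrative.

def latency_proxy(n: int, k: int, mu: float = 1.0) -> float:
    """Mean time for the fastest k of n parallel exp(mu) servers (no queueing)."""
    return sum(1.0 / i for i in range(n - k + 1, n + 1)) / mu

def joint_objective(n: int, k: int, theta: float) -> float:
    return latency_proxy(n, k) + theta * (n / k)  # latency + weighted storage overhead

if __name__ == "__main__":
    candidates = [(10, 10), (12, 10), (14, 10), (20, 10)]
    for nk in candidates:
        print(nk, round(joint_objective(*nk, theta=0.5), 3))
    print("best:", min(candidates, key=lambda nk: joint_objective(*nk, theta=0.5)))
```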

    Capacity of Distributed Storage Systems with Clusters and Separate Nodes

    In distributed storage systems (DSSs), the optimal tradeoff between node storage and repair bandwidth is an important issue for designing distributed coding strategies that ensure large-scale data reliability. The capacity of a DSS is obtained as a function of the node storage and repair bandwidth parameters, characterizing this tradeoff. There is a large body of work on DSSs with clusters (racks), where intra-cluster and cross-cluster repair bandwidths are differentiated. However, separate nodes are also prevalent in realistic DSSs, and work on DSSs with clusters and separate nodes (CSN-DSSs) remains limited. In this paper, we formulate the capacity of CSN-DSSs with one separate node for the first time, where a separate node is repaired using cross-cluster bandwidth. Consequently, the optimal tradeoff between node storage and repair bandwidth is derived and compared with that of clustered DSSs. A regenerating code instance is constructed based on the tradeoff. Furthermore, the influence of adding a separate node is analyzed and formulated theoretically. We prove that when each cluster contains R nodes and any k nodes suffice to recover the original file (MDS property), adding an extra separate node preserves the capacity if R divides k, and reduces the capacity otherwise.

    Diffusive Load Balancing of Loosely-Synchronous Parallel Programs over Peer-to-Peer Networks

    The use of under-utilized Internet resources is widely recognized as a viable form of high performance computing. Sustained processing power of roughly 40 TFLOPS using 4 million volunteered Internet hosts has been reported for embarrassingly parallel problems. At the same time, peer-to-peer (P2P) file sharing networks, with more than 50 million participants, have demonstrated the capacity for scale in distributed systems. This paper contributes a study of load balancing techniques for a general class of loosely-synchronous parallel algorithms when executed over a P2P network. We show that decentralized, diffusive load balancing can be effective at balancing load and is facilitated by the dynamic properties of P2P. While a moderate degree of dynamicity can benefit load balancing, significant dynamicity hinders parallel program performance due to the need for increased load migration. To the best of our knowledge, this study provides new insight into the performance of loosely-synchronous parallel programs over the Internet. Comment: 14 pages with 10 figures
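
    A minimal sketch of first-order diffusive load balancing of the kind studied above: in each round every peer moves a fraction α of its load difference toward each overlay neighbor. The ring overlay, α = 0.25, and the initial load vector are illustrative assumptions; churn, which the paper shows can both help and hinder balancing, is not modeled.

```python
# Synchronous first-order diffusion: each node exchanges alpha * (load difference)
# with every neighbor per round; total load is conserved and loads equalize over time.

def diffuse(load: list[float], neighbors: dict[int, list[int]], alpha: float, rounds: int) -> list[float]:
    for _ in range(rounds):
        new = load[:]
        for i, nbrs in neighbors.items():
            for j in nbrs:
                new[i] += alpha * (load[j] - load[i])  # pull from heavier, push to lighter
        load = new
    return load

if __name__ == "__main__":
    n = 8
    ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}   # ring overlay
    loads = [40.0, 0, 0, 0, 0, 0, 0, 0]                        # all work starts on one peer
    print([round(x, 2) for x in diffuse(loads, ring, alpha=0.25, rounds=50)])
    # After a few dozen rounds the loads are close to the balanced value of 5 per peer.
```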

    On the Latency and Energy Efficiency of Erasure-Coded Cloud Storage Systems

    The increase in data storage and power consumption at data centers has made it imperative to design energy-efficient Distributed Storage Systems (DSS). The energy efficiency of a DSS is strongly influenced not only by the volume of data, frequency of data access and redundancy in data storage, but also by the heterogeneity exhibited by the DSS in these dimensions. To this end, we propose and analyze the energy efficiency of a heterogeneous distributed storage system in which n storage servers (disks) store the data of R distinct classes. Data of class i is encoded using an (n, k_i) erasure code and the (random) data retrieval requests can also vary across classes. We show that the energy efficiency of such systems is closely related to the average latency, which motivates us to study the energy efficiency through the lens of average latency. Through this connection, we show that erasure coding serves the dual purpose of reducing latency and increasing energy efficiency. We present a queuing-theoretic analysis of the proposed model and establish upper and lower bounds on the average latency for each data class under various scheduling policies. Through extensive simulations, we present qualitative insights which reveal the impact of coding rate, number of servers, service distribution and number of redundant requests on the average latency and energy efficiency of the DSS. Comment: Submitted to IEEE Transactions on Cloud Computing. Contains 24 pages, 13 figures
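
    As a crude point of reference (ignoring queueing, unlike the analysis above): if a read is forked to n idle servers with i.i.d. exponential service times of rate μ and completes once any k chunks arrive, its expected latency is the expectation of the k-th order statistic,

```latex
\mathbb{E}\!\left[T_{(k)}\right] \;=\; \frac{1}{\mu}\sum_{i=n-k+1}^{n}\frac{1}{i} \;=\; \frac{H_n - H_{n-k}}{\mu},
```

    which already indicates why redundant requests (larger n for a fixed k) reduce average latency; the bounds in the paper additionally account for queueing and per-class arrival rates.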

    On the Duality and File Size Hierarchy of Fractional Repetition Codes

    Distributed storage systems that deploy erasure codes can achieve lower storage overhead and higher data reliability. In this paper, we focus on fractional repetition (FR) codes, a class of storage codes characterized by uncoded exact repair and minimum repair bandwidth. We study the duality of FR codes and investigate the relationship between the supported file size of an FR code and that of its dual code. Based on the established relationship, we derive an improved dual bound on the supported file size of FR codes. We further show that FR codes constructed from t-designs are optimal when the size of the stored file is sufficiently large. Moreover, we present the tensor product technique for combining FR codes, and elaborate on the file size hierarchy of the resulting codes. Comment: Submitted for possible journal publication
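
    As a concrete toy example of the uncoded exact repair that characterizes FR codes (not one of the paper's duality or tensor-product constructions), the sketch below builds the textbook FR code from the complete graph K_5: each coded packet is an edge stored on its two endpoint nodes, so a failed node is repaired by copying each of its packets verbatim from the other node that holds it.

```python
from itertools import combinations

# FR code from K_5: 10 packets (edges), each replicated on 2 nodes; every node stores 4 packets.

def fr_from_complete_graph(n_nodes: int = 5) -> dict[int, set[frozenset[int]]]:
    packets = [frozenset(e) for e in combinations(range(n_nodes), 2)]
    return {v: {p for p in packets if v in p} for v in range(n_nodes)}

def repair(storage: dict[int, set[frozenset[int]]], failed: int) -> set[frozenset[int]]:
    # Uncoded exact repair: fetch each lost packet from a surviving node that stores it.
    return {p for v, packets in storage.items() if v != failed for p in packets if failed in p}

if __name__ == "__main__":
    dss = fr_from_complete_graph()
    lost = dss[2]
    assert repair(dss, failed=2) == lost  # every lost packet is recovered verbatim
    print("node 2 repaired by copying", len(lost), "packets, one from each surviving node")
```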