2,328 research outputs found

    A Class of MSR Codes for Clustered Distributed Storage

    Clustered distributed storage models real data centers where intra- and cross-cluster repair bandwidths are different. In this paper, exact-repair minimum-storage-regenerating (MSR) codes achieving capacity of clustered distributed storage are designed. The focus is on two cases: $\epsilon = 0$ and $\epsilon = 1/(n-k)$, where $\epsilon$ is the ratio of the available cross- and intra-cluster repair bandwidths, $n$ is the total number of distributed nodes and $k$ is the number of contact nodes in data retrieval. The former represents the scenario where cross-cluster communication is not allowed, while the latter corresponds to the case of minimum cross-cluster bandwidth that is possible under the minimum storage overhead constraint. For the $\epsilon = 0$ case, two types of locally repairable codes are proven to achieve the MSR point. As for $\epsilon = 1/(n-k)$, an explicit MSR coding scheme is suggested for the two-cluster situation under the specific condition of $n = 2k$. Comment: 9 pages, a part of this paper is submitted to IEEE ISIT201

    Increasing Availability in Distributed Storage Systems via Clustering

    We introduce the Fixed Cluster Repair System (FCRS) as a novel architecture for Distributed Storage Systems (DSS), achieving a small repair bandwidth while guaranteeing a high availability. Specifically, we partition the set of servers in a DSS into $s$ clusters and allow a failed server to choose any cluster other than its own as its repair group. Thereby, we guarantee an availability of $s-1$. We characterize the repair bandwidth vs. storage trade-off for the FCRS under functional repair and show that the minimum repair bandwidth can be improved by an asymptotic multiplicative factor of $2/3$ compared to the state-of-the-art coding techniques that guarantee the same availability. We further introduce Cubic Codes designed to minimize the repair bandwidth of the FCRS under the exact repair model. We prove an asymptotic multiplicative improvement of $0.79$ in the minimum repair bandwidth compared to the existing exact repair coding techniques that achieve the same availability. We show that Cubic Codes are information-theoretically optimal for the FCRS with $2$ and $3$ complete clusters. Furthermore, under the repair-by-transfer model, Cubic Codes are optimal irrespective of the number of clusters.
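
The clustering idea in this abstract can be illustrated with a minimal sketch. It is not the authors' construction: the round-robin partition, the function names `partition_into_clusters` and `repair_groups`, and the example sizes are all assumptions made for illustration. It only shows that a failed server has one candidate repair group per cluster other than its own, which appears to be where the stated availability of s-1 comes from.

```python
from typing import Dict, List


def partition_into_clusters(num_servers: int, s: int) -> Dict[int, List[int]]:
    """Split servers 0..num_servers-1 into s clusters (round-robin is an assumption)."""
    clusters: Dict[int, List[int]] = {c: [] for c in range(s)}
    for server in range(num_servers):
        clusters[server % s].append(server)
    return clusters


def repair_groups(failed_server: int, clusters: Dict[int, List[int]]) -> List[List[int]]:
    """Return every cluster other than the failed server's own cluster."""
    own = next(c for c, members in clusters.items() if failed_server in members)
    return [members for c, members in clusters.items() if c != own]


# Hypothetical example: 12 servers, s = 3 clusters, server 4 fails.
clusters = partition_into_clusters(12, 3)
groups = repair_groups(4, clusters)
print(len(groups))  # number of disjoint candidate repair groups
```

With s = 3, the failed server has two disjoint groups to choose from; the sketch says nothing about what data each helper sends, which is what the codes in the abstract address.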

    Functional broadcast repair of multiple partial failures in wireless distributed storage systems

    We consider a distributed storage system with n nodes, where a user can recover the stored file from any k nodes, and study the problem of repairing r partially failed nodes. We consider broadcast repair, that is, d surviving nodes transmit broadcast messages on an error-free wireless channel to the r nodes being repaired, which are then used, together with the surviving data in the local memories of the failed nodes, to recover the lost content. First, we derive the trade-off between the storage capacity and the repair bandwidth for partial repair of multiple failed nodes, based on the cut-set bound for information flow graphs. It is shown that utilizing the broadcast nature of the wireless medium and the surviving contents at the partially failed nodes reduces the repair bandwidth per node significantly. Then, we list a set of invariant conditions that are sufficient for a functional repair code to be feasible. We further propose a scheme for functional repair of multiple failed nodes that satisfies the invariant conditions with high probability, and its extension to the repair of partial failures. The performance of the proposed scheme meets the cut-set bound on all the points on the trade-off curve for all admissible parameters when k is divisible by r, while employing linear subpacketization, which is an important practical consideration in the design of distributed storage codes. Unlike random linear codes, which are conventionally used for functional repair of failed nodes, the proposed repair scheme has lower overhead, lower input-output cost, and lower computational complexity during repair.