548 research outputs found

    Practical Functional Regenerating Codes for Broadcast Repair of Multiple Nodes

    A code construction and repair scheme for optimal functional regeneration of multiple node failures is presented, which is based on stitching together short MDS codes on carefully chosen sets of points lying on a linearized polynomial. The nodes are connected wirelessly, hence all transmissions by helper nodes during a repair round are available to all the nodes being repaired. The scheme is simple and practical because of low subpacketization, low I/O cost and low computational cost. Achievability of the minimum-bandwidth regenerating (MBR) point, as well as an interior point, on the optimal storage-repair bandwidth tradeoff curve is shown. The subspace properties derived in the paper provide insight into the general properties of functional regenerating codes. Comment: 5 pages, ISIT 201

    Repair Strategies for Storage on Mobile Clouds

    We study the data reliability problem for a community of devices forming a mobile cloud storage system. We consider the application of regenerating codes for file maintenance within a geographically limited area. Such codes require lower bandwidth to regenerate lost data fragments than file replication or reconstruction. We investigate threshold-based repair strategies where data repair is initiated after a threshold number of data fragments have been lost due to node mobility. We show that in a low departure-to-repair rate regime, a lazy repair strategy, in which repairs are initiated after several nodes have left the system, outperforms eager repair, in which repairs are initiated after a single departure. This ordering is reversed when nodes are highly mobile. We further compare distributed and centralized repair strategies and derive the optimal repair threshold for minimizing the average repair cost per unit of time, as a function of the underlying code parameters. In addition, we examine cooperative repair strategies and show performance improvements compared to non-cooperative codes. We investigate several models for the time needed for node repair, including a simple fixed-time model that allows for the computation of closed-form expressions and a more realistic model that takes into account the number of repaired nodes. We derive the conditions under which the former model approximates the latter. Finally, an extended model where additional failures are allowed during the repair process is investigated. Overall, our results establish the joint effect of code design and repair algorithms on the maintenance cost of distributed storage systems. Comment: 23 pages, 11 figures

    Decentralized Minimum-Cost Repair for Distributed Storage Systems

    Many applications of distributed storage systems have emerged, e.g., in wireless sensor networks and cloud storage. Since storage nodes in wireless sensor networks have limited battery capacity, it is valuable to find a repair scheme with optimal transmission costs (e.g., energy). Optimal-cost repair has recently been investigated in a centralized setting. However, a centralized control mechanism may be unavailable or very expensive. For such scenarios, it is interesting to study optimal-cost repair in a decentralized setup. We formulate optimal-cost repair as convex optimization problems for networks with convex transmission costs. We then use primal and dual decomposition approaches to decouple the problem into subproblems that are solved locally. Thus, each surviving node, collaborating with other nodes, can minimize its transmission cost such that the global cost is minimized. We further study the optimality and convergence of the algorithms. Finally, we discuss the code construction and determine the field size for finding feasible network codes in our approaches.

    Determinant Codes with Helper-Independent Repair for Single and Multiple Failures

    Determinant codes are a class of exact-repair regenerating codes for distributed storage systems with parameters (n, k = d, d). These codes cover the entire trade-off between per-node storage and repair bandwidth. In an earlier work of the authors, the repair data that a helper node sends to repair a failed node depends on the identity of the other helper nodes participating in the process, which is undesirable in practice. In this work, a new repair mechanism is proposed for determinant codes that removes this dependency while preserving all other properties of the code. Moreover, it is shown that determinant codes are capable of repairing multiple failures, with a per-node repair bandwidth that scales sub-linearly with the number of failures.

    Repairing Multiple Failures for Scalar MDS Codes

    In distributed storage, erasure codes, such as Reed-Solomon codes, are often employed to provide reliability. In this setting, it is desirable to be able to repair one or more failed nodes while minimizing the repair bandwidth. In this work, motivated by Reed-Solomon codes, we study the problem of repairing multiple failed nodes in a scalar MDS code. We extend the framework of (Guruswami and Wootters, 2017) to give a framework for constructing repair schemes for multiple failures in general scalar MDS codes, in the centralized repair model. We then specialize our framework to Reed-Solomon codes, and extend and improve upon recent results of (Dau et al., 2017).
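For orientation, the baseline that such repair schemes try to beat can be sketched as follows. This is a minimal illustration of naive centralized repair for a Reed-Solomon code, not the scheme of the paper: the repair center downloads k complete symbols from surviving nodes and interpolates the lost ones. The prime field size `P`, the evaluation points, and the function names `rs_encode` and `interpolate_at` are assumptions chosen only for this example.

```python
# Naive centralized repair of a Reed-Solomon codeword over a prime field.
# All parameters below are hypothetical and chosen for illustration only.
P = 257  # assumed prime field size


def rs_encode(message, points):
    """Evaluate the message polynomial (coefficients low to high) at each point mod P."""
    return [sum(c * pow(x, i, P) for i, c in enumerate(message)) % P for x in points]


def interpolate_at(xs, ys, x0):
    """Evaluate at x0 the unique polynomial of degree < len(xs) through (xs, ys)."""
    total = 0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for m, xm in enumerate(xs):
            if m != j:
                num = num * (x0 - xm) % P
                den = den * (xj - xm) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yj * num * pow(den, -1, P)) % P
    return total


message = [5, 17, 42]          # k = 3 data symbols
points = [1, 2, 3, 4, 5, 6]    # n = 6 storage nodes, one evaluation point each
codeword = rs_encode(message, points)

failed = [1, 4]                # indices of the failed nodes
helpers = [0, 2, 5]            # any k surviving nodes suffice (MDS property)
xs = [points[i] for i in helpers]
ys = [codeword[i] for i in helpers]

# The repair center downloads k full symbols and recomputes every lost symbol.
repaired = {i: interpolate_at(xs, ys, points[i]) for i in failed}
print(repaired == {i: codeword[i] for i in failed})
```

The naive approach always downloads k full symbols regardless of how many nodes failed; bandwidth-efficient repair schemes instead have helpers send less than a full symbol each.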

    The Storage vs Repair Bandwidth Trade-off for Multiple Failures in Clustered Storage Networks

    We study the trade-off between storage overhead and inter-cluster repair bandwidth in clustered storage systems, while recovering from multiple node failures within a cluster. A cluster is a collection of $m$ nodes, and there are $n$ clusters. For data collection, we download the entire content from any $k$ clusters. For repair of $t \geq 2$ nodes within a cluster, we take help from $\ell$ local nodes, as well as $d$ helper clusters. We characterize the optimal trade-off under functional repair, and also under exact repair for the minimum storage and minimum inter-cluster bandwidth (MBR) operating points. Our bounds show the following interesting facts: 1) When $t \mid (m-\ell)$, the trade-off is the same as that under $t=1$, and thus there is no advantage in jointly repairing multiple nodes. 2) When $t \nmid (m-\ell)$, the optimal file size at the MBR point under exact repair can be strictly less than that under functional repair. 3) Unlike the case of $t=1$, increasing the number of local helper nodes does not necessarily increase the system capacity under functional repair. Comment: Accepted to IEEE Information Theory Workshop (ITW) 201

    Capacity of Wireless Distributed Storage Systems with Broadcast Repair

    In wireless distributed storage systems, storage nodes are connected by wireless channels, which are broadcast in nature. This paper exploits this unique feature to design an efficient repair mechanism, called broadcast repair, for wireless distributed storage systems in the presence of multiple-node failures. Due to the broadcast nature of wireless transmission, we advocate a new measure of repair performance called repair-transmission bandwidth. In contrast to repair bandwidth, which measures the average number of packets downloaded by a newcomer to replace a failed node, repair-transmission bandwidth measures the average number of packets transmitted by helper nodes per failed node. A fundamental study on the storage capacity of wireless distributed storage systems with broadcast repair is conducted by modeling the storage system as a multicast network and analyzing the minimum cut of the corresponding information flow graph. The fundamental tradeoff between storage efficiency and repair-transmission bandwidth is also obtained for functional repair. The performance of broadcast repair is compared both analytically and numerically with that of cooperative repair, the basic repair method for wired distributed storage systems with multiple-node failures. While cooperative repair is based on the idea of allowing newcomers to exchange packets, broadcast repair is based on the idea of allowing a helper to broadcast packets to all newcomers simultaneously. We show that broadcast repair outperforms cooperative repair, offering a better tradeoff between storage efficiency and repair-transmission bandwidth. Comment: 28 pages, 7 figures
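As background for the min-cut analysis mentioned above, the classical cut-set bound for the wired, single-failure setting can be sketched as follows. This is not the broadcast-repair bound derived in the paper; it is the standard bound for regenerating codes, in which each node stores `alpha` symbols and a newcomer downloads `beta` symbols from each of `d` helpers. The function names and the numeric parameters are assumptions for illustration.

```python
from fractions import Fraction


def cutset_capacity(k, d, alpha, beta):
    """Classical single-failure cut-set bound: sum over i < k of min(alpha, (d - i) * beta)."""
    return sum(min(alpha, (d - i) * beta) for i in range(k))


def msr_point(file_size, k, d):
    """Minimum-storage regenerating point (alpha, beta) for a file of the given size."""
    alpha = Fraction(file_size, k)
    beta = Fraction(file_size, k * (d - k + 1))
    return alpha, beta


def mbr_point(file_size, k, d):
    """Minimum-bandwidth regenerating point (alpha, beta); here alpha equals d * beta."""
    beta = Fraction(2 * file_size, k * (2 * d - k + 1))
    return d * beta, beta


# Hypothetical parameters: file of 12 symbols, any k = 2 nodes recover it, d = 3 helpers.
file_size, k, d = 12, 2, 3
for name, (alpha, beta) in [("MSR", msr_point(file_size, k, d)),
                            ("MBR", mbr_point(file_size, k, d))]:
    repair_bandwidth = d * beta
    print(name, alpha, repair_bandwidth, cutset_capacity(k, d, alpha, beta))
```

At both extreme points the bound is met with equality, so the printed capacity equals the file size; the two points differ in how they split the cost between per-node storage and repair bandwidth.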

    Storage-Repair Bandwidth Trade-off for Wireless Caching with Partial Failure and Broadcast Repair

    Repair of multiple partially failed cache nodes is studied in a distributed wireless content caching system, where $r$ out of a total of $n$ cache nodes lose part of their cached data. Broadcast repair of failed cache contents at the network edge is studied; that is, the surviving cache nodes transmit broadcast messages to the failed ones, which are then used, together with the surviving data in their local cache memories, to recover the lost content. The trade-off between the storage capacity and the repair bandwidth is derived. It is shown that utilizing the broadcast nature of the wireless medium and the surviving cache contents at partially failed nodes significantly reduces the required repair bandwidth per node. Comment: Conference version of this paper has been submitted for review in ITW 2018. This submission includes the proof of theorem

    On the Achievability Region of Regenerating Codes for Multiple Erasures

    We study the problem of centralized exact repair of multiple failures in distributed storage. We describe constructions that achieve a new set of interior points under exact repair. The constructions build upon the layered code construction by Tian et al., designed for exact repair of a single failure. We first improve upon the layered construction for general system parameters. Then, we extend the improved construction to support the repair of multiple failures, with a varying number of helpers. In particular, we prove the optimality of one point on the functional repair tradeoff of multiple failures for some parameters. Finally, considering minimum bandwidth cooperative repair (MBCR) codes as centralized repair codes, we determine explicitly the best achievable region obtained by space-sharing among all known points, including the MBCR point.

    Concurrent Regenerating Codes and Scalable Application in Network Storage

    To recover from simultaneous multiple failures in erasure-coded storage systems, Patrick Lee et al. introduced minimum-storage regenerating codes based on concurrent repair to reduce repair traffic. The architecture of this approach is simpler and more practical than that of the cooperative mechanism in environments that are not fully distributed. This paper therefore unifies this class of regenerating codes as concurrent regenerating codes and studies its characteristics by analyzing the cut-based information flow graph in the multiple-node recovery model. We present a general storage-bandwidth tradeoff and give closed-form expressions for the points on the curve, including the concurrent repair mechanism based on minimum-bandwidth regenerating codes. We show that general concurrent regenerating codes can be constructed by reforming existing single-node regenerating codes or multiple-node cooperative regenerating codes. Moreover, a connection to strong-MDS codes is also analyzed. Furthermore, the application of regenerating codes (RGC) is not limited to "repairing". It is also of great significance for "scaling", a scenario where we need to increase (decrease) the number of nodes to upgrade (degrade) redundancy and reliability. Thus, by clarifying the similarities and differences, we integrate them into a unified model that adapts to the dynamic storage network. Comment: 12 pages, 7 figures