517 research outputs found

    Repair-Optimal MDS Array Codes over GF(2)

    Full text link
    Maximum-distance separable (MDS) array codes with high rate and an optimal repair property were introduced recently. These codes could be applied in distributed storage systems, where they minimize the communication and disk access required for the recovery of failed nodes. However, the encoding and decoding algorithms of the proposed codes use arithmetic over finite fields of order greater than 2, which could result in a complex implementation. In this work, we present a construction of 2-parity MDS array codes, that allow for optimal repair of a failed information node using XOR operations only. The reduction of the field order is achieved by allowing more parity bits to be updated when a single information bit is being changed by the user.Comment: 5 pages, submitted to ISIT 201

    Two-layer Locally Repairable Codes for Distributed Storage Systems

    Full text link
    In this paper, we propose locally repairable codes (LRCs) with optimal minimum distance for distributed storage systems (DSS). A two-layer encoding structure is employed to ensure data reconstruction and the designated repair locality. The data is first encoded in the first layer by any existing maximum distance separable (MDS) codes, and then the encoded symbols are divided into non-overlapping groups and encoded by an MDS array code in the second layer. The encoding in the second layer provides enough redundancy for local repair, while the overall code performs recovery of the data based on redundancy from both layers. Our codes can be constructed over a finite field with size growing linearly with the total number of nodes in the DSS, and facilitate efficient degraded reads.Comment: This paper has been withdrawn by the author due to inaccuracy of Claim

    A Repair Framework for Scalar MDS Codes

    Full text link
    Several works have developed vector-linear maximum-distance separable (MDS) storage codes that min- imize the total communication cost required to repair a single coded symbol after an erasure, referred to as repair bandwidth (BW). Vector codes allow communicating fewer sub-symbols per node, instead of the entire content. This allows non trivial savings in repair BW. In sharp contrast, classic codes, like Reed- Solomon (RS), used in current storage systems, are deemed to suffer from naive repair, i.e. downloading the entire stored message to repair one failed node. This mainly happens because they are scalar-linear. In this work, we present a simple framework that treats scalar codes as vector-linear. In some cases, this allows significant savings in repair BW. We show that vectorized scalar codes exhibit properties that simplify the design of repair schemes. Our framework can be seen as a finite field analogue of real interference alignment. Using our simplified framework, we design a scheme that we call clique-repair which provably identifies the best linear repair strategy for any scalar 2-parity MDS code, under some conditions on the sub-field chosen for vectorization. We specify optimal repair schemes for specific (5,3)- and (6,4)-Reed- Solomon (RS) codes. Further, we present a repair strategy for the RS code currently deployed in the Facebook Analytics Hadoop cluster that leads to 20% of repair BW savings over naive repair which is the repair scheme currently used for this code.Comment: 10 Pages; accepted to IEEE JSAC -Distributed Storage 201

    Convertible Codes: New Class of Codes for Efficient Conversion of Coded Data in Distributed Storage

    Get PDF
    Erasure codes are typically used in large-scale distributed storage systems to provide durability of data in the face of failures. In this setting, a set of k blocks to be stored is encoded using an [n, k] code to generate n blocks that are then stored on different storage nodes. A recent work by Kadekodi et al. [Kadekodi et al., 2019] shows that the failure rate of storage devices vary significantly over time, and that changing the rate of the code (via a change in the parameters n and k) in response to such variations provides significant reduction in storage space requirement. However, the resource overhead of realizing such a change in the code rate on already encoded data in traditional codes is prohibitively high. Motivated by this application, in this work we first present a new framework to formalize the notion of code conversion - the process of converting data encoded with an [n^I, k^I] code into data encoded with an [n^F, k^F] code while maintaining desired decodability properties, such as the maximum-distance-separable (MDS) property. We then introduce convertible codes, a new class of code pairs that allow for code conversions in a resource-efficient manner. For an important parameter regime (which we call the merge regime) along with the widely used linearity and MDS decodability constraint, we prove tight bounds on the number of nodes accessed during code conversion. In particular, our achievability result is an explicit construction of MDS convertible codes that are optimal for all parameter values in the merge regime albeit with a high field size. We then present explicit low-field-size constructions of optimal MDS convertible codes for a broad range of parameters in the merge regime. Our results thus show that it is indeed possible to achieve code conversions with significantly lesser resources as compared to the default approach of re-encoding

    Achieving Maximum Distance Separable Private Information Retrieval Capacity With Linear Codes

    Get PDF
    We propose three private information retrieval (PIR) protocols for distributed storage systems (DSSs) where data is stored using an arbitrary linear code. The first two protocols, named Protocol 1 and Protocol 2, achieve privacy for the scenario with noncolluding nodes. Protocol 1 requires a file size that is exponential in the number of files in the system, while Protocol 2 requires a file size that is independent of the number of files and is hence simpler. We prove that, for certain linear codes, Protocol 1 achieves the maximum distance separable (MDS) PIR capacity, i.e., the maximum PIR rate (the ratio of the amount of retrieved stored data per unit of downloaded data) for a DSS that uses an MDS code to store any given (finite and infinite) number of files, and Protocol 2 achieves the asymptotic MDS-PIR capacity (with infinitely large number of files in the DSS). In particular, we provide a necessary and a sufficient condition for a code to achieve the MDS-PIR capacity with Protocols 1 and 2 and prove that cyclic codes, Reed-Muller (RM) codes, and a class of distance-optimal local reconstruction codes achieve both the finite MDS-PIR capacity (i.e., with any given number of files) and the asymptotic MDS-PIR capacity with Protocols 1 and 2, respectively. Furthermore, we present a third protocol, Protocol 3, for the scenario with multiple colluding nodes, which can be seen as an improvement of a protocol recently introduced by Freij-Hollanti et al.. Similar to the noncolluding case, we provide a necessary and a sufficient condition to achieve the maximum possible PIR rate of Protocol 3. Moreover, we provide a particular class of codes that is suitable for this protocol and show that RM codes achieve the maximum possible PIR rate for the protocol. For all three protocols, we present an algorithm to optimize their PIR rates.Comment: This work is the extension of the work done in arXiv:1612.07084v2. The current version introduces further refinement to the manuscript. Current version will appear in the IEEE Transactions on Information Theor

    Low-Complexity Codes for Random and Clustered High-Order Failures in Storage Arrays

    Get PDF
    RC (Random/Clustered) codes are a new efficient array-code family for recovering from 4-erasures. RC codes correct most 4-erasures, and essentially all 4-erasures that are clustered. Clustered erasures are introduced as a new erasure model for storage arrays. This model draws its motivation from correlated device failures, that are caused by physical proximity of devices, or by age proximity of endurance-limited solid-state drives. The reliability of storage arrays that employ RC codes is analyzed and compared to known codes. The new RC code is significantly more efficient, in all practical implementation factors, than the best known 4-erasure correcting MDS code. These factors include: small-write update-complexity, full-device update-complexity, decoding complexity and number of supported devices in the array
    corecore