316 research outputs found

    Long MDS Codes for Optimal Repair Bandwidth

    MDS codes are erasure-correcting codes that can correct the maximum number of erasures given the number of redundancy or parity symbols. If an MDS code has r parities and no more than r erasures occur, then by transmitting all the remaining data in the code one can recover the original information. However, it was shown that in order to recover a single symbol erasure, only a fraction of 1/r of the information needs to be transmitted. This fraction is called the repair bandwidth (fraction). Explicit code constructions were given in previous works. If we view each symbol in the code as a vector or a column, then the code forms a 2D array, and such codes are especially widely used in storage systems. In this paper, we ask the following question: given the length of the column l, can we construct high-rate MDS array codes with optimal repair bandwidth of 1/r, whose code length is as long as possible? We give code constructions such that the code length is (r + 1) log_r l.

    Explicit MDS Codes for Optimal Repair Bandwidth

    MDS codes are erasure-correcting codes that can correct the maximum number of erasures for a given number of redundancy or parity symbols. If an MDS code has r parities and no more than r erasures occur, then by transmitting all the remaining data in the code, the original information can be recovered. However, it was shown that in order to recover a single symbol erasure, only a fraction of 1/r of the information needs to be transmitted. This fraction is called the repair bandwidth (fraction). Explicit code constructions were given in previous works. If we view each symbol in the code as a vector or a column over some field, then the code forms a 2D array and such codes are especially widely used in storage systems. In this paper, we address the following question: given the length of the column l, number of parities r, can we construct high-rate MDS array codes with optimal repair bandwidth of 1/r, whose code length is as long as possible? In this paper, we give code constructions such that the code length is (r + 1) log_r l. Comment: 17 pages
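
To make the parameters in the abstract above concrete, here is a minimal sketch that evaluates the stated code length (r + 1) log_r l and the repair fraction 1/r. The function names and the restriction to column lengths l that are exact powers of r are assumptions made for this illustration; they are not taken from the paper.

```python
from fractions import Fraction


def log_r_exact(l: int, r: int) -> int:
    """Return m with r**m == l; raise if l is not an exact power of r.

    Assumption for this sketch: l is a power of r, so log_r l is an integer.
    """
    if r < 2 or l < 1:
        raise ValueError("need r >= 2 and l >= 1")
    m, value = 0, 1
    while value < l:
        value *= r
        m += 1
    if value != l:
        raise ValueError("l must be an exact power of r in this sketch")
    return m


def code_length(l: int, r: int) -> int:
    """Code length (r + 1) * log_r(l) as stated in the abstract."""
    return (r + 1) * log_r_exact(l, r)


def repair_fraction(r: int) -> Fraction:
    """Fraction of the remaining information transmitted to repair one erasure."""
    return Fraction(1, r)


if __name__ == "__main__":
    # Hypothetical parameters: r = 2 parities, column length l = 8.
    print(code_length(8, 2))      # 9
    print(repair_fraction(2))     # 1/2
```

For example, with r = 3 parities and column length l = 81, log_3 81 = 4, so the formula gives a code length of 16.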

    Coding Schemes for Distributed Storage Systems

    This thesis is devoted to problems in error-correcting codes motivated by data integrity problems arising in large-scale distributed storage systems. We study properties and constructions of Maximum Distance Separable (MDS) codes, which are widely used in storage applications since they provide the maximum failure tolerance for a given amount of storage overhead. Among the parameters of the code that are important for storage applications are the amount of data transferred in the system during node repair (the repair bandwidth), which characterizes the network usage, and the volume of accessed data, which corresponds to the number of disk I/O operations. Therefore, recent research on MDS codes for distributed storage has focused on codes that can minimize these two quantities. A lower bound on the repair bandwidth of a code, called the cut-set bound, was proved by Dimakis et al. in 2010, and codes that attain this bound are said to have the optimal repair property. Explicit optimal-repair low-rate (rate ≤ 1/2) MDS codes were constructed by Rashmi et al. in 2011. At the same time, large-scale distributed systems such as the Google File System and Hadoop Distributed File System employ high-rate (rate > 1/2) MDS codes because of the need to reduce storage overhead. Until recently, except for some particular cases, no general explicit constructions of high-rate optimal-repair MDS codes were known. In this thesis, we present the first explicit constructions of optimal-repair MDS codes, thereby providing a solution to the general construction problem of such codes for the high-rate regime. More specifically, we construct explicit MDS codes that can repair any number of failed nodes from any number of helper nodes with the smallest possible amount of downloaded/accessed data. For the particular case of repairing a single node failure, we further present an explicit family of MDS codes that minimize the amount of accessed data during the repair. 
This family of codes has an additional favorable property that the node size (the amount of information stored in the node) is also the smallest possible. Reducing the node size directly translates into reducing the complexity of storage systems. While most studies on MDS codes with optimal repair bandwidth focus on array codes, the repair problem of widely used scalar codes such as Reed-Solomon codes has also recently attracted the attention of researchers. It has been an open problem whether scalar linear MDS codes can achieve the cut-set bound. In this thesis, we answer this question in the affirmative by giving explicit constructions of Reed-Solomon codes that can be repaired at the cut-set bound. We also prove a lower bound on the node size of optimally repairable scalar MDS codes, showing that the node size of our RS codes is close to the best possible for scalar linear codes. Finally, we extend the concept of repair bandwidth from erasure correction to error correction, which forms a new problem in coding theory. We prove a bound on the amount of downloaded information for this problem and present explicit code families that attain this bound for a wide range of parameters.
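
For orientation, the cut-set bound referred to in this abstract is commonly stated as follows: repairing one failed node of an (n, k) MDS code with node size l from d helper nodes (k ≤ d ≤ n − 1) requires downloading at least d·l/(d − k + 1) symbols in total. The sketch below simply evaluates that expression and compares it with the naive repair that downloads k full nodes. The function names and the sample parameters are illustrative assumptions, not values from the thesis.

```python
from fractions import Fraction


def cut_set_bound(n: int, k: int, d: int, node_size: int) -> Fraction:
    """Minimum total download (in symbols) to repair one node from d helpers.

    Evaluates d * node_size / (d - k + 1), the usual form of the cut-set
    bound for MDS codes.
    """
    if not (k <= d <= n - 1):
        raise ValueError("need k <= d <= n - 1")
    return Fraction(d * node_size, d - k + 1)


def naive_repair_download(k: int, node_size: int) -> int:
    """Download of the naive repair: read k whole nodes and re-decode."""
    return k * node_size


if __name__ == "__main__":
    # Hypothetical parameters: a (14, 10) code, all 13 surviving nodes help,
    # and each node stores 256 symbols.
    n, k, d, l = 14, 10, 13, 256
    print(cut_set_bound(n, k, d, l))        # 832
    print(naive_repair_download(k, l))      # 2560
```

With d = n − 1 helpers and r = n − k parities, the bound works out to l/r symbols per helper, which matches the repair fraction 1/r quoted in the first two abstracts of this listing.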

    Error correction based on partial information

    We consider the decoding of linear and array codes from errors when we are only allowed to download a part of the codeword. More specifically, suppose that we have encoded k data symbols using an (n, k) code with code length n and dimension k. During storage, some of the codeword coordinates might be corrupted by errors. We aim to recover the original data by reading the corrupted codeword with a limit on the transmitting bandwidth, namely, we can only download an α proportion of the corrupted codeword. For a given α, our objective is to design a code and a decoding scheme such that we can recover the original data from the largest possible number of errors. A naive scheme is to read αn coordinates of the codeword. This method used in conjunction with MDS codes guarantees recovery from any ⌊(αn − k)/2⌋ errors. In this paper we show that we can instead read an α proportion from each of the codeword's coordinates. For a well-designed MDS code, this method can guarantee recovery from ⌊(n − k/α)/2⌋ errors, which is 1/α times more than the naive method, and is also the maximum number of errors that an (n, k) code can correct by downloading only an α proportion of the codeword. We present two families of such optimal constructions and decoding schemes. One is a Reed-Solomon code with evaluation points in a subfield and the other is based on Folded Reed-Solomon codes. We further show that both code constructions attain asymptotically optimal list decoding radius when downloading only a part of the corrupted codeword. We also construct an ensemble of random codes that with high probability approaches the upper bound on the number of correctable errors when the decoder downloads an α proportion of the corrupted codeword. Comment: Extended version of the conference paper in ISIT 201
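
The two error counts quoted in this abstract can be compared numerically. The following is a small sketch, assuming α is passed as an exact fraction; the function names and the sample parameters are hypothetical and only evaluate the formulas stated above, they do not implement the paper's decoding schemes.

```python
import math
from fractions import Fraction


def naive_correctable_errors(n: int, k: int, alpha: Fraction) -> int:
    """Errors correctable when reading alpha*n whole coordinates of an MDS code:
    floor((alpha*n - k) / 2), clamped at zero."""
    return max(0, math.floor((alpha * n - k) / 2))


def fractional_correctable_errors(n: int, k: int, alpha: Fraction) -> int:
    """Errors correctable when reading an alpha proportion of every coordinate:
    floor((n - k/alpha) / 2), clamped at zero."""
    return max(0, math.floor((n - k / alpha) / 2))


if __name__ == "__main__":
    # Hypothetical parameters: a (24, 6) code, downloading half of the codeword.
    n, k, alpha = 24, 6, Fraction(1, 2)
    print(naive_correctable_errors(n, k, alpha))       # 3
    print(fractional_correctable_errors(n, k, alpha))  # 6
```

In this example the second count is exactly 1/α = 2 times the first, in line with the identity (n − k/α)/2 = (1/α)·(αn − k)/2 before the floors are taken.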

    Convertible Codes: New Class of Codes for Efficient Conversion of Coded Data in Distributed Storage

    Erasure codes are typically used in large-scale distributed storage systems to provide durability of data in the face of failures. In this setting, a set of k blocks to be stored is encoded using an [n, k] code to generate n blocks that are then stored on different storage nodes. A recent work by Kadekodi et al. [Kadekodi et al., 2019] shows that the failure rates of storage devices vary significantly over time, and that changing the rate of the code (via a change in the parameters n and k) in response to such variations provides a significant reduction in storage space requirements. However, the resource overhead of realizing such a change in the code rate on already encoded data in traditional codes is prohibitively high. Motivated by this application, in this work we first present a new framework to formalize the notion of code conversion: the process of converting data encoded with an [n^I, k^I] code into data encoded with an [n^F, k^F] code while maintaining desired decodability properties, such as the maximum-distance-separable (MDS) property. We then introduce convertible codes, a new class of code pairs that allow for code conversions in a resource-efficient manner. For an important parameter regime (which we call the merge regime), along with the widely used linearity and MDS decodability constraints, we prove tight bounds on the number of nodes accessed during code conversion. In particular, our achievability result is an explicit construction of MDS convertible codes that are optimal for all parameter values in the merge regime, albeit with a high field size. We then present explicit low-field-size constructions of optimal MDS convertible codes for a broad range of parameters in the merge regime. Our results thus show that it is indeed possible to achieve code conversions with significantly fewer resources than the default approach of re-encoding.