Long MDS Codes for Optimal Repair Bandwidth
MDS codes are erasure-correcting codes that can
correct the maximum number of erasures given the number of
redundancy or parity symbols. If an MDS code has r parities
and no more than r erasures occur, then by transmitting all
the remaining data in the code one can recover the original
information. However, it was shown that in order to recover a
single symbol erasure, only a fraction of 1/r of the information
needs to be transmitted. This fraction is called the repair
bandwidth (fraction). Explicit code constructions were given in
previous works. If we view each symbol in the code as a vector
or a column, then the code forms a 2D array and such codes
are especially widely used in storage systems. In this paper, we
ask the following question: given the length of the column l, can
we construct high-rate MDS array codes with optimal repair
bandwidth of 1/r, whose code length is as long as possible? We
give code constructions such that the code length
is (r + 1) log_r l.
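As a rough illustration of the two quantities in this abstract, the sketch below evaluates the repair fraction 1/r and the code length (r + 1) log_r l for small parameters. The function names are hypothetical, and the sketch assumes that l is an exact power of r so that the logarithm is an integer; it only evaluates the formulas and does not construct any code.

```python
from fractions import Fraction


def repair_fraction(r: int) -> Fraction:
    """Fraction of the remaining data sent to repair one erased symbol (1/r)."""
    if r < 2:
        raise ValueError("this sketch assumes at least two parities")
    return Fraction(1, r)


def code_length(r: int, l: int) -> int:
    """Evaluate (r + 1) * log_r(l), assuming l is an exact power of r."""
    if r < 2 or l < 1:
        raise ValueError("need r >= 2 and l >= 1")
    exponent, power = 0, 1
    while power < l:  # integer logarithm, avoids floating-point rounding
        power *= r
        exponent += 1
    if power != l:
        raise ValueError("l must be an exact power of r in this sketch")
    return (r + 1) * exponent


if __name__ == "__main__":
    # Two parities, column length 8: log_2(8) = 3, so the length is 3 * 3.
    print(repair_fraction(2), code_length(2, 8))
```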
Explicit MDS Codes for Optimal Repair Bandwidth
MDS codes are erasure-correcting codes that can correct the maximum number of
erasures for a given number of redundancy or parity symbols. If an MDS code has
r parities and no more than r erasures occur, then by transmitting all the
remaining data in the code, the original information can be recovered. However,
it was shown that in order to recover a single symbol erasure, only a fraction
of 1/r of the information needs to be transmitted. This fraction is called
the repair bandwidth (fraction). Explicit code constructions were given in
previous works. If we view each symbol in the code as a vector or a column over
some field, then the code forms a 2D array and such codes are especially widely
used in storage systems. In this paper, we address the following question:
given the length of the column l and the number of parities r, can we construct
high-rate MDS array codes with optimal repair bandwidth of 1/r, whose code
length is as long as possible? We give code constructions such
that the code length is (r + 1) log_r l.
Comment: 17 pages
Coding Schemes for Distributed Storage Systems
This thesis is devoted to problems in error-correcting codes motivated by data integrity problems arising in large-scale distributed storage systems. We study properties and constructions of Maximum Distance Separable (MDS) codes, which are widely used in storage applications since they provide the maximum failure tolerance for a given amount of storage overhead.
Among the parameters of the code that are important for storage applications are: the amount of data transferred in the system during node repair (the repair bandwidth), which characterizes the network usage, and the volume of accessed data, which corresponds to the number of disk I/O operations. Therefore, recent research on MDS codes for distributed storage has focused on codes that can minimize these two quantities. A lower bound on the repair bandwidth of a code, called the cut-set bound, was proved by Dimakis et al. in 2010, and codes that attain this bound are said to have the optimal repair property. Explicit optimal-repair low-rate (rate k/n ≤ 1/2) MDS codes were constructed by Rashmi et al. in 2011. At the same time, large-scale distributed systems such as the Google File System and Hadoop Distributed File System employ high-rate (rate k/n > 1/2) MDS codes because they need to reduce storage overhead. Until recently, except for some particular cases, no general explicit constructions of high-rate optimal-repair MDS codes were known.
In this thesis, we present the first explicit constructions of optimal-repair MDS codes, thereby providing a solution to the general construction problem of such codes for the high-rate regime.
More specifically, we construct explicit MDS codes that can repair any number of failed nodes from any number of helper nodes with the smallest possible amount of downloaded/accessed data. For the particular case of repairing a single node failure, we further present an explicit family of MDS codes that minimize the amount of accessed data during the repair. This family of codes has an additional favorable property that the node size (the amount of information stored in the node) is also the smallest possible. Reducing the node size directly translates into reducing the complexity of storage systems.
While most studies on MDS codes with optimal repair bandwidth focus on array codes, the repair problem of widely used scalar codes such as Reed-Solomon codes has also recently attracted the attention of researchers. It has been an open problem whether scalar linear MDS codes can achieve the cut-set bound. In this thesis, we answer this question in the affirmative by giving explicit constructions of Reed-Solomon codes that can be repaired at the cut-set bound. We also prove a lower bound on the node size of optimally repairable scalar MDS codes, showing that the node size of our RS codes is close to the best possible for scalar linear codes.
Finally, we extend the concept of repair bandwidth from erasure correction to error correction, which forms a new problem in coding theory. We prove a bound on the amount of downloaded information for this problem and present explicit code families that attain this bound for a wide range of parameters.
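The cut-set bound mentioned in this abstract can be evaluated directly. The sketch below uses the standard form of the bound for an (n, k) MDS code repairing one node from d helper nodes: each helper sends at least a 1/(d - k + 1) fraction of its contents. The function name and the parameters (14, 10) are illustrative assumptions, not values taken from the thesis.

```python
from fractions import Fraction


def cut_set_repair_bandwidth(n: int, k: int, d: int, node_size: int = 1) -> Fraction:
    """Minimum total download to repair one node of an (n, k) MDS code
    from d helpers, each storing node_size symbols (cut-set bound)."""
    if not (1 <= k <= d <= n - 1):
        raise ValueError("need 1 <= k <= d <= n - 1")
    # Each helper contributes node_size / (d - k + 1) symbols.
    return Fraction(d * node_size, d - k + 1)


if __name__ == "__main__":
    n, k = 14, 10  # hypothetical high-rate parameters (rate 10/14)
    optimal = cut_set_repair_bandwidth(n, k, d=n - 1)
    naive = cut_set_repair_bandwidth(n, k, d=k)  # d = k: download k whole nodes
    print(optimal, naive)
```

With d = n - 1 helpers, each helper sends a 1/(n - k) fraction of its node, which is the 1/r repair fraction for r = n - k parities.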
Error correction based on partial information
We consider the decoding of linear and array codes from errors when we are
only allowed to download a part of the codeword. More specifically, suppose
that we have encoded k data symbols using an (n, k) code with code length n
and dimension k. During storage, some of the codeword coordinates might
be corrupted by errors. We aim to recover the original data by reading the
corrupted codeword with a limit on the transmission bandwidth, namely, we can
only download an α proportion of the corrupted codeword. For a given α,
our objective is to design a code and a decoding scheme such that we
can recover the original data from the largest possible number of errors. A
naive scheme is to read αn coordinates of the codeword. This method
used in conjunction with MDS codes guarantees recovery from any
⌊(αn − k)/2⌋ errors. In this paper we show that we can instead read an
α proportion from each of the codeword's coordinates. For a
well-designed MDS code, this method can guarantee recovery from
⌊(n − k/α)/2⌋ errors, which is 1/α times more than the naive
method, and is also the maximum number of errors that an (n, k) code can
correct by downloading only an α proportion of the codeword. We present
two families of such optimal constructions and decoding schemes. One is a
Reed-Solomon code with evaluation points in a subfield and the other is based
on Folded Reed-Solomon codes. We further show that both code constructions
attain asymptotically optimal list decoding radius when downloading only a part
of the corrupted codeword. We also construct an ensemble of random codes that
with high probability approaches the upper bound on the number of correctable
errors when the decoder downloads an α proportion of the corrupted
codeword.
Comment: Extended version of the conference paper in ISIT 201
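To make the comparison between the two download strategies concrete, the sketch below evaluates both error counts for small parameters. It assumes the notation n for the code length, k for the dimension, and α (here `alpha`) for the downloaded proportion, and it assumes the bounds ⌊(αn − k)/2⌋ for the naive scheme and ⌊(n − k/α)/2⌋ for reading an α proportion of every coordinate. The function names and the sample parameters are hypothetical; the code only evaluates the formulas and implements no decoder.

```python
from fractions import Fraction
from math import floor


def naive_correctable_errors(n: int, k: int, alpha: Fraction) -> int:
    """Errors correctable when alpha*n whole coordinates of an MDS code are read."""
    return max(0, floor((alpha * n - k) / 2))


def fractional_correctable_errors(n: int, k: int, alpha: Fraction) -> int:
    """Errors correctable when an alpha proportion of every coordinate is read."""
    return max(0, floor((n - k / alpha) / 2))


if __name__ == "__main__":
    n, k, alpha = 12, 3, Fraction(1, 2)  # illustrative parameters only
    print(naive_correctable_errors(n, k, alpha))       # floor((6 - 3) / 2)
    print(fractional_correctable_errors(n, k, alpha))  # floor((12 - 6) / 2)
```

Exact rational arithmetic via `Fraction` keeps the floor operations free of floating-point rounding.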
Convertible Codes: New Class of Codes for Efficient Conversion of Coded Data in Distributed Storage
Erasure codes are typically used in large-scale distributed storage systems to provide durability of data in the face of failures. In this setting, a set of k blocks to be stored is encoded using an [n, k] code to generate n blocks that are then stored on different storage nodes. A recent work by Kadekodi et al. [Kadekodi et al., 2019] shows that the failure rates of storage devices vary significantly over time, and that changing the rate of the code (via a change in the parameters n and k) in response to such variations provides a significant reduction in storage space requirements. However, the resource overhead of realizing such a change in the code rate on already encoded data in traditional codes is prohibitively high.
Motivated by this application, in this work we first present a new framework to formalize the notion of code conversion - the process of converting data encoded with an [n^I, k^I] code into data encoded with an [n^F, k^F] code while maintaining desired decodability properties, such as the maximum-distance-separable (MDS) property. We then introduce convertible codes, a new class of code pairs that allow for code conversions in a resource-efficient manner. For an important parameter regime (which we call the merge regime) along with the widely used linearity and MDS decodability constraints, we prove tight bounds on the number of nodes accessed during code conversion. In particular, our achievability result is an explicit construction of MDS convertible codes that are optimal for all parameter values in the merge regime, albeit with a high field size. We then present explicit low-field-size constructions of optimal MDS convertible codes for a broad range of parameters in the merge regime. Our results thus show that it is indeed possible to achieve code conversions with significantly fewer resources than the default approach of re-encoding.
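A toy example can show why conversion may need fewer node accesses than re-encoding. The sketch below merges two single-parity [3, 2] codewords into one [5, 4] codeword, either by reading all data blocks again or by reading only the two old parity blocks. This XOR-based single-parity code is an illustrative assumption chosen for simplicity and is not the construction from the paper; all function names are hypothetical.

```python
from functools import reduce
from operator import xor


def encode_single_parity(data: list[int]) -> list[int]:
    """Append one XOR parity block; a single-parity code is MDS."""
    return list(data) + [reduce(xor, data, 0)]


def merge_by_reencoding(cw1: list[int], cw2: list[int]) -> tuple[list[int], int]:
    """Default approach: read every data block and encode from scratch."""
    data = cw1[:-1] + cw2[:-1]
    blocks_read = len(data)
    return encode_single_parity(data), blocks_read


def merge_by_conversion(cw1: list[int], cw2: list[int]) -> tuple[list[int], int]:
    """Read only the two old parities; data blocks are modelled as staying in place."""
    new_parity = cw1[-1] ^ cw2[-1]  # XOR of the parities equals the parity of all data
    blocks_read = 2
    return cw1[:-1] + cw2[:-1] + [new_parity], blocks_read


if __name__ == "__main__":
    first = encode_single_parity([5, 9])
    second = encode_single_parity([3, 6])
    print(merge_by_reencoding(first, second))
    print(merge_by_conversion(first, second))
```

Both routes produce the same final codeword, but the conversion route reads two blocks instead of four.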