Repair-Optimal MDS Array Codes over GF(2)
Maximum-distance separable (MDS) array codes with high rate and an optimal
repair property were introduced recently. These codes could be applied in
distributed storage systems, where they minimize the communication and disk
access required for the recovery of failed nodes. However, the encoding and
decoding algorithms of the proposed codes use arithmetic over finite fields of
order greater than 2, which could result in a complex implementation.
In this work, we present a construction of 2-parity MDS array codes that
allow for optimal repair of a failed information node using XOR operations
only. The reduction of the field order is achieved by allowing more parity bits
to be updated when a single information bit is changed by the user. Comment: 5 pages, submitted to ISIT 201
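The point of this construction is that repair needs XOR only. As a minimal illustration of XOR-based repair over GF(2), and not the repair-optimal construction itself (which adds a second parity and reads only part of each surviving node), the sketch below rebuilds one failed information column from a single row parity. The column contents and function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a, b):
    """Bitwise XOR of two equal-length byte strings (addition over GF(2))."""
    return bytes(x ^ y for x, y in zip(a, b))


def row_parity(columns):
    """Row parity: XOR of all information columns."""
    return reduce(xor_bytes, columns)


def repair_column(columns, parity, failed):
    """Rebuild the failed column from the surviving columns and the parity, XOR only."""
    survivors = [c for i, c in enumerate(columns) if i != failed]
    return reduce(xor_bytes, survivors, parity)


data = [b"\x0f\xa0", b"\x33\x01", b"\xc4\x7e"]
p = row_parity(data)
rebuilt = repair_column(data, p, failed=1)
```

Here `rebuilt` equals `data[1]`, but this naive repair reads every surviving column in full; the codes above reduce that access while keeping the XOR-only property.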
Two-layer Locally Repairable Codes for Distributed Storage Systems
In this paper, we propose locally repairable codes (LRCs) with optimal
minimum distance for distributed storage systems (DSS). A two-layer encoding
structure is employed to ensure data reconstruction and the designated repair
locality. The data is first encoded in the first layer by any existing maximum
distance separable (MDS) code, and the encoded symbols are then divided into
non-overlapping groups and encoded by an MDS array code in the second layer.
The encoding in the second layer provides enough redundancy for local repair,
while the overall code performs recovery of the data based on redundancy from
both layers. Our codes can be constructed over a finite field with size growing
linearly with the total number of nodes in the DSS, and facilitate efficient
degraded reads. Comment: This paper has been withdrawn by the author due to inaccuracy of Claim
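The abstract does not fix the component codes. As a hedged toy instance of the two-layer idea, the sketch below uses the simplest MDS codes over GF(2), a single XOR parity, in both layers. The paper uses general MDS codes and an MDS array code in the second layer over a larger field, so this shows only the locality mechanism (repair from within one group), not the optimal minimum distance. All names are hypothetical.

```python
from functools import reduce


def xor_all(symbols):
    """XOR of byte values: a single-parity [m+1, m] MDS encoding over GF(2)."""
    return reduce(lambda a, b: a ^ b, symbols, 0)


def encode_two_layer(data, group_size):
    """Toy two-layer encoding (hypothetical simplification).

    Layer 1: append one global XOR parity to the k data symbols.
    Layer 2: split the layer-1 symbols into non-overlapping groups and
    append one local XOR parity per group.
    """
    layer1 = list(data) + [xor_all(data)]
    groups = [layer1[i:i + group_size] for i in range(0, len(layer1), group_size)]
    return [g + [xor_all(g)] for g in groups]


def local_repair(groups, group_idx, pos):
    """Recover one lost symbol by reading only the other members of its group.

    Returns the recovered symbol and the number of symbols read.
    """
    helpers = [s for j, s in enumerate(groups[group_idx]) if j != pos]
    return xor_all(helpers), len(helpers)


groups = encode_two_layer([3, 5, 7, 9, 11], group_size=3)
value, reads = local_repair(groups, group_idx=0, pos=1)
```

With k = 5 data symbols and groups of 3, the lost symbol 5 is recovered by reading 3 symbols from its own group rather than k from across the system.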
A Repair Framework for Scalar MDS Codes
Several works have developed vector-linear maximum-distance separable (MDS)
storage codes that minimize the total communication cost required to repair a
single coded symbol after an erasure, referred to as the repair bandwidth (BW).
Vector codes allow each node to communicate fewer sub-symbols instead of its
entire content, which yields nontrivial savings in repair BW. In sharp
contrast, classic codes used in current storage systems, such as Reed-Solomon (RS)
codes, are deemed to suffer from naive repair, i.e., downloading the entire
stored message to repair one failed node. This happens mainly because they are
scalar-linear. In this work, we present a simple framework that treats scalar
codes as vector-linear. In some cases, this allows significant savings in
repair BW. We show that vectorized scalar codes exhibit properties that
simplify the design of repair schemes. Our framework can be seen as a finite
field analogue of real interference alignment. Using our simplified framework,
we design a scheme that we call clique-repair which provably identifies the
best linear repair strategy for any scalar 2-parity MDS code, under some
conditions on the sub-field chosen for vectorization. We specify optimal repair
schemes for specific (5,3) and (6,4) RS codes. Further, we
present a repair strategy for the RS code currently deployed in the Facebook
Analytics Hadoop cluster that yields 20% repair-BW savings over naive
repair, the scheme currently used for this code. Comment: 10 pages; accepted to IEEE JSAC - Distributed Storage 201
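Vectorizing a scalar code rests on the fact that multiplying by a fixed field element is a linear map over a subfield. A minimal sketch, assuming the base field GF(2^8) with the AES modulus and vectorization over GF(2) (the paper's field and subfield choices may differ): each scalar coefficient c becomes an 8x8 binary matrix, so a codeword symbol can be handled as a vector of 8 sub-symbols. The sketch shows only this change of representation; the repair-BW savings come from helpers sending fewer than 8 sub-symbols, which is not modeled here.

```python
MOD = 0x11B  # assumed irreducible polynomial x^8 + x^4 + x^3 + x + 1 (the AES field)


def gf_mul(a, b):
    """Multiply two GF(2^8) elements by shift-and-add, reducing modulo MOD."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= MOD
    return result


def to_bits(x):
    """Coordinates of a field element over GF(2): bit i is the coefficient of x^i."""
    return [(x >> i) & 1 for i in range(8)]


def mul_matrix(c):
    """8x8 binary matrix M with M * bits(x) = bits(c * x) over GF(2).

    Column j is the image of the basis element x^j, i.e. bits(c * 2^j).
    """
    cols = [to_bits(gf_mul(c, 1 << j)) for j in range(8)]
    return [[cols[j][i] for j in range(8)] for i in range(8)]


def apply(matrix, bits):
    """Matrix-vector product over GF(2)."""
    return [sum(m * b for m, b in zip(row, bits)) % 2 for row in matrix]


M = mul_matrix(0x57)
vector_result = apply(M, to_bits(0x83))  # equals to_bits(gf_mul(0x57, 0x83))
```

The scalar product and the vectorized product agree for every input, which is what lets a scalar RS code be analyzed with vector-code repair tools.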
Convertible Codes: New Class of Codes for Efficient Conversion of Coded Data in Distributed Storage
Erasure codes are typically used in large-scale distributed storage systems to provide durability of data in the face of failures. In this setting, a set of k blocks to be stored is encoded using an [n, k] code to generate n blocks that are then stored on different storage nodes. Recent work by Kadekodi et al. [Kadekodi et al., 2019] shows that the failure rate of storage devices varies significantly over time, and that changing the rate of the code (via a change in the parameters n and k) in response to such variations significantly reduces storage space requirements. However, for traditional codes, the resource overhead of realizing such a change in the code rate on already-encoded data is prohibitively high.
Motivated by this application, in this work we first present a new framework to formalize the notion of code conversion: the process of converting data encoded with an [n^I, k^I] code into data encoded with an [n^F, k^F] code while maintaining desired decodability properties, such as the maximum-distance-separable (MDS) property. We then introduce convertible codes, a new class of code pairs that allow for code conversions in a resource-efficient manner. For an important parameter regime (which we call the merge regime), along with the widely used linearity and MDS decodability constraints, we prove tight bounds on the number of nodes accessed during code conversion. In particular, our achievability result is an explicit construction of MDS convertible codes that are optimal for all parameter values in the merge regime, albeit over a large field. We then present explicit low-field-size constructions of optimal MDS convertible codes for a broad range of parameters in the merge regime. Our results thus show that code conversion is indeed possible with significantly fewer resources than the default approach of re-encoding.
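In the merge regime, several initial codewords are combined into a single final codeword. A toy instance, assuming single-parity-check codes over GF(2) rather than the general MDS codes studied in the paper: merging λ codewords of a [k+1, k] code into one [λk+1, λk] codeword needs to read only the λ old parity nodes, whereas re-encoding reads all λk data nodes. Function names are hypothetical.

```python
from functools import reduce


def xor_all(symbols):
    """XOR of an iterable of byte values."""
    return reduce(lambda a, b: a ^ b, symbols, 0)


def encode_spc(data):
    """Systematic single-parity-check code: data symbols followed by their XOR."""
    return list(data) + [xor_all(data)]


def merge_convert(codewords):
    """Convert lambda [k+1, k] codewords into one [lambda*k + 1, lambda*k] codeword.

    The data symbols stay on their nodes (they are listed here only to assemble
    the final codeword, not read); the new parity is the XOR of the old parities,
    so only the lambda old parity nodes are accessed.
    """
    data = [s for cw in codewords for s in cw[:-1]]
    new_parity = xor_all(cw[-1] for cw in codewords)
    nodes_read = len(codewords)
    return data + [new_parity], nodes_read


initial = [encode_spc([1, 2]), encode_spc([4, 8])]
final, nodes_read = merge_convert(initial)
```

Here two [3, 2] codewords become the [5, 4] codeword `[1, 2, 4, 8, 15]` after reading 2 parity nodes instead of 4 data nodes.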
Achieving Maximum Distance Separable Private Information Retrieval Capacity With Linear Codes
We propose three private information retrieval (PIR) protocols for
distributed storage systems (DSSs) where data is stored using an arbitrary
linear code. The first two protocols, named Protocol 1 and Protocol 2, achieve
privacy for the scenario with noncolluding nodes. Protocol 1 requires a file
size that is exponential in the number of files in the system, while Protocol 2
requires a file size that is independent of the number of files and is hence
simpler. We prove that, for certain linear codes, Protocol 1 achieves the
maximum distance separable (MDS) PIR capacity, i.e., the maximum PIR rate (the
ratio of the amount of retrieved stored data per unit of downloaded data) for a
DSS that uses an MDS code to store any given (finite or infinite) number of
files, and Protocol 2 achieves the asymptotic MDS-PIR capacity (with an infinitely
large number of files in the DSS). In particular, we provide a necessary and a
sufficient condition for a code to achieve the MDS-PIR capacity with Protocols
1 and 2 and prove that cyclic codes, Reed-Muller (RM) codes, and a class of
distance-optimal local reconstruction codes achieve both the finite MDS-PIR
capacity (i.e., with any given number of files) and the asymptotic MDS-PIR
capacity with Protocols 1 and 2, respectively. Furthermore, we present a third
protocol, Protocol 3, for the scenario with multiple colluding nodes, which can
be seen as an improvement on a protocol recently introduced by Freij-Hollanti
et al. As in the noncolluding case, we provide a necessary and a
sufficient condition to achieve the maximum possible PIR rate of Protocol 3.
Moreover, we provide a particular class of codes that is suitable for this
protocol and show that RM codes achieve the maximum possible PIR rate for the
protocol. For all three protocols, we present an algorithm to optimize their
PIR rates. Comment: This work is an extension of the work done in arXiv:1612.07084v2.
The current version further refines the manuscript and will appear in the IEEE
Transactions on Information Theor
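For reference, the MDS-PIR capacity that Protocols 1 and 2 are measured against has a known closed form, due to Banawan and Ulukus and not stated in the abstract: for an [n, k] MDS code storing M files on noncolluding nodes, C = (1 + k/n + (k/n)^2 + ... + (k/n)^(M-1))^(-1), which tends to 1 - k/n as M grows. A small sketch using exact rational arithmetic; the function names are hypothetical.

```python
from fractions import Fraction


def mds_pir_capacity(n, k, num_files):
    """MDS-PIR capacity for an [n, k] MDS-coded DSS with num_files files
    and noncolluding nodes: 1 / (1 + R + R^2 + ... + R^(M-1)), R = k/n."""
    rate = Fraction(k, n)
    return 1 / sum(rate ** i for i in range(num_files))


def asymptotic_mds_pir_capacity(n, k):
    """Limit of the MDS-PIR capacity as the number of files grows: 1 - k/n."""
    return 1 - Fraction(k, n)


finite = mds_pir_capacity(n=2, k=1, num_files=2)    # Fraction(2, 3)
limit = asymptotic_mds_pir_capacity(n=2, k=1)       # Fraction(1, 2)
```

The finite-file value decreases toward the asymptotic one as files are added, which is why a protocol that is optimal for every finite M (Protocol 1) is a stronger result than one that is only asymptotically optimal (Protocol 2).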
Low-Complexity Codes for Random and Clustered High-Order Failures in Storage Arrays
RC (Random/Clustered) codes are a new efficient array-code family for recovering from 4-erasures. RC codes correct most 4-erasures, and essentially all 4-erasures that are clustered. Clustered erasures are introduced as a new erasure model for storage arrays. This model is motivated by correlated device failures caused by the physical proximity of devices or by the age proximity of endurance-limited solid-state drives. The reliability of storage arrays that employ RC codes is analyzed and compared to that of known codes. The new RC code is significantly more efficient, in all practical implementation factors, than the best-known 4-erasure-correcting MDS code. These factors include small-write update complexity, full-device update complexity, decoding complexity, and the number of supported devices in the array.
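The abstract does not define small-write update complexity; it is usually taken to be the average number of parity bits that must change when a single information bit changes. A minimal sketch that computes it for any XOR-based array code described by its parity equations. The input format and the toy code are hypothetical, not the RC construction.

```python
def small_write_update_complexity(parity_sets, num_info_bits):
    """Average number of parity bits touched when one information bit changes.

    parity_sets[p] is the set of information-bit indices XORed into parity
    bit p (a hypothetical input format). Changing information bit i flips
    every parity bit whose set contains i.
    """
    touched = [sum(1 for s in parity_sets if i in s) for i in range(num_info_bits)]
    return sum(touched) / num_info_bits


# Toy 2x2 information array, bits indexed row-major 0..3.
row_and_column = [{0, 1}, {2, 3}, {0, 2}, {1, 3}]
with_diagonal = row_and_column + [{0, 3}]

baseline = small_write_update_complexity(row_and_column, 4)   # 2.0
heavier = small_write_update_complexity(with_diagonal, 4)     # 2.5
```

Adding parity equations improves erasure protection but raises this cost, which is the trade-off the comparison above quantifies.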
- …