Reliability of Erasure Coded Storage Systems: A Geometric Approach
We consider the probability of data loss, or equivalently, the reliability
function for an erasure coded distributed data storage system under worst case
conditions. Data loss in an erasure coded system depends on probability
distributions for the disk repair duration and the disk failure duration. In
previous works, the data loss probability of such systems has been studied
under the assumption of exponentially distributed disk failure and disk repair
durations, using well-known analytic methods from the theory of Markov
processes. These methods lead to an estimate of the integral of the reliability
function.
Here, we address the problem of directly calculating the data loss
probability for general repair and failure duration distributions. A closed
limiting form is developed for the probability of data loss and it is shown
that the probability of the event that a repair duration exceeds a failure
duration is sufficient for characterizing the data loss probability.
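As a rough illustration of the quantity named above, the following sketch estimates the probability that a repair duration exceeds a failure duration by Monte Carlo sampling and compares it with two textbook closed forms. It assumes that "failure duration" means the time until a disk fails. The function name, rates and constant repair time are hypothetical and chosen only for illustration. The sketch does not reproduce the closed limiting form from the paper; it only computes the event probability itself.

```python
import math
import random


def prob_repair_exceeds_failure(sample_repair, sample_failure,
                                trials=200_000, seed=0):
    """Monte Carlo estimate of P(repair duration > failure duration).

    Both arguments are callables that take a random.Random instance
    and return one sampled duration.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if sample_repair(rng) > sample_failure(rng):
            hits += 1
    return hits / trials


# Hypothetical parameters, not taken from the paper.
FAILURE_RATE = 0.2      # failures per unit time
REPAIR_RATE = 1.0       # repairs per unit time
CONSTANT_REPAIR = 0.5   # fixed repair duration

# Case 1: exponential repair and exponential failure durations.
# For independent exponentials, P(R > F) = lambda / (lambda + mu).
exp_estimate = prob_repair_exceeds_failure(
    lambda rng: rng.expovariate(REPAIR_RATE),
    lambda rng: rng.expovariate(FAILURE_RATE),
)
exp_closed_form = FAILURE_RATE / (FAILURE_RATE + REPAIR_RATE)

# Case 2: constant repair duration T and exponential failure duration.
# Here P(T > F) = 1 - exp(-lambda * T).
const_estimate = prob_repair_exceeds_failure(
    lambda rng: CONSTANT_REPAIR,
    lambda rng: rng.expovariate(FAILURE_RATE),
)
const_closed_form = 1.0 - math.exp(-FAILURE_RATE * CONSTANT_REPAIR)

print("exponential repair:", exp_estimate, "vs", exp_closed_form)
print("constant repair:   ", const_estimate, "vs", const_closed_form)
```

Because the samplers are passed in as callables, any other repair or failure distribution can be substituted without changing the estimator.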
For the case of constant repair duration, we develop an expression for the
conditional data loss probability given the number of failures experienced by
each node in a given time window. We do so by developing a geometric approach
that relies on the computation of volumes of a family of polytopes that are
related to the code. An exact calculation is provided and an upper bound on the
data loss probability is obtained by posing the problem as a set avoidance
problem. Theoretical calculations are compared to simulation results.
Comment: 28 pages. 8 figures. Presented in part at IEEE International
Conference on BigData 2013, Santa Clara, CA, Oct. 2013 and to be presented in
part at 2014 IEEE Information Theory Workshop, Tasmania, Australia, Nov.
2014. New analysis added May 2015. Further Update Aug. 201
Fractional repetition codes with flexible repair from combinatorial designs
Fractional repetition (FR) codes are a class of regenerating codes for
distributed storage systems with an exact (table-based) repair process that is
also uncoded, i.e., upon failure, a node is regenerated by simply downloading
packets from the surviving nodes. In our work, we present constructions of FR
codes based on Steiner systems and resolvable combinatorial designs such as
affine geometries, Hadamard designs and mutually orthogonal Latin squares. The
failure resilience of our codes can be varied in a simple manner. We construct
codes with normalized repair bandwidth ($\beta$) strictly larger than one;
these cannot be obtained trivially from codes with $\beta = 1$. Furthermore, we
present the Kronecker product technique for generating new codes from existing
ones and elaborate on their properties. FR codes with locality are those where
the repair degree is smaller than the number of nodes contacted for
reconstructing the stored file. For these codes we establish a tradeoff between
the local repair property and failure resilience and construct codes that meet
this tradeoff. Much of the prior work provides only lower bounds on the FR code
rate. In our work, for most of our constructions we determine the code rate for
certain parameter ranges.
Comment: 27 pages in IEEE two-column format. IEEE Transactions on Information
Theory (to appear
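
To make the table-based, uncoded repair idea concrete, here is a minimal sketch using the Fano plane, a standard Steiner system S(2, 3, 7). It is not one of the paper's specific constructions. It assumes that points play the role of packets and lines play the role of storage nodes; the names `FANO_NODES`, `is_steiner_2_design` and `repair_plan` are hypothetical, and the outer code that would normally protect the file is omitted.

```python
from itertools import combinations

# Fano plane: 7 packets (points 0..6) placed on 7 nodes (lines).
# Each node stores 3 packets and each packet is stored on 3 nodes.
FANO_NODES = [
    {0, 1, 2}, {0, 3, 4}, {0, 5, 6},
    {1, 3, 5}, {1, 4, 6}, {2, 3, 6}, {2, 4, 5},
]


def is_steiner_2_design(nodes, num_packets):
    """Check that every pair of packets is stored together on exactly one node."""
    return all(
        sum(1 for node in nodes if a in node and b in node) == 1
        for a, b in combinations(range(num_packets), 2)
    )


def repair_plan(failed, nodes):
    """Map each packet of the failed node to one surviving node that holds a copy.

    Repair is uncoded: each packet is downloaded as-is from the chosen helper.
    The first surviving node holding the packet is used as the helper.
    """
    plan = {}
    for packet in sorted(nodes[failed]):
        for helper, contents in enumerate(nodes):
            if helper != failed and packet in contents:
                plan[packet] = helper
                break
    return plan


assert is_steiner_2_design(FANO_NODES, 7)
plan = repair_plan(0, FANO_NODES)
print(plan)  # {0: 1, 1: 3, 2: 5}
```

Since two distinct lines of the Fano plane share exactly one point, each helper contributes exactly one packet, so the failed node is rebuilt by contacting three distinct survivors.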