Distributed data storage systems are essential to meet the need to store
massive volumes of data. Making such a system fault-tolerant requires some
form of redundancy, which incurs various overheads, most prominently in
storage space and in the bandwidth required for maintenance.
Erasure codes, originally designed for communication over lossy channels,
provide a storage-efficient alternative to replication-based redundancy.
However, they entail a high communication overhead for maintenance, when
encoded fragments must be replenished with new ones after some storage devices
fail. As an alternative, we propose a new family of erasure codes called
self-repairing codes (SRC) that take into account the peculiarities of
distributed storage systems, specifically the maintenance process. SRC has the
following salient features: (a) encoded fragments can be repaired directly from
other subsets of encoded fragments by downloading less data than the size of
the complete object, and (b) a fragment is repaired from a fixed number of
encoded fragments, a number that depends only on how many encoded blocks are
missing and not on which specific blocks are missing. This
paper lays the foundations by defining the novel self-repairing codes and
explaining why their defining characteristics are desirable for distributed
storage systems. Homomorphic self-repairing codes (HSRC) are then proposed as a
concrete instance; their various aspects and properties are studied and
compared, quantitatively or qualitatively, with other codes, including
traditional erasure codes as well as other recent codes designed specifically
for storage applications.

Comment: arXiv admin note: significant text overlap with arXiv:1008.006
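The homomorphic property behind the HSRC name, and the fixed-size repair described in feature (b), can be illustrated with a toy sketch. It assumes (as an illustration, not a description given in this abstract) that an object is split into k pieces in GF(2^m) that serve as coefficients of a linearized polynomial p(X) = sum_i o_i X^(2^i), and that each stored fragment is p(alpha) for a distinct nonzero alpha. Because raising to a power of 2 is additive in characteristic 2, p(a + b) = p(a) + p(b), so a lost fragment can be rebuilt by XOR-ing two live ones. The field size m = 4, the modulus x^4 + x + 1, the piece values, and the helper names `gf_mul`, `gf_pow`, `encode`, and `repair` are all hypothetical choices.

```python
"""Toy homomorphic repair over GF(2^4); all parameters are illustrative assumptions."""

M = 4                # assumed field GF(2^4)
MODULUS = 0b10011    # assumed irreducible polynomial x^4 + x + 1


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^M) by shift-and-add; field addition is XOR."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):  # reduce modulo the field polynomial
            a ^= MODULUS
    return result


def gf_pow(a: int, e: int) -> int:
    """Raise a to the non-negative integer power e in GF(2^M)."""
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result


def encode(pieces: list[int], alpha: int) -> int:
    """Evaluate the linearized polynomial sum_i pieces[i] * alpha^(2^i)."""
    value = 0
    for i, piece in enumerate(pieces):
        value ^= gf_mul(piece, gf_pow(alpha, 2 ** i))
    return value


def repair(fragments: dict[int, int], lost: int) -> int:
    """Rebuild p(lost) as p(b) XOR p(lost ^ b) for any live pair of points."""
    for b, value in fragments.items():
        partner = lost ^ b
        if b != lost and partner != lost and partner in fragments:
            return value ^ fragments[partner]
    raise ValueError("no live pair of fragments sums to the lost point")


if __name__ == "__main__":
    pieces = [0b0011, 0b1010, 0b0110]  # k = 3 hypothetical 4-bit pieces
    fragments = {alpha: encode(pieces, alpha) for alpha in range(1, 16)}
    lost = 0b0101
    original = fragments.pop(lost)     # simulate losing one fragment
    print("repaired correctly:", repair(fragments, lost) == original)
```

With these toy parameters the object occupies 12 bits while a repair downloads two 4-bit fragments (8 bits), and any of the seven disjoint pairs (b, lost XOR b) among the remaining fragments works equally well, so the number of fragments contacted stays at two regardless of which pair is available.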