CORE: Augmenting Regenerating-Coding-Based Recovery for Single and Concurrent Failures in Distributed Storage Systems
Data availability is critical in distributed storage systems, especially when
node failures are prevalent in real life. A key requirement is to minimize the
amount of data transferred among nodes when recovering the lost or unavailable
data of failed nodes. This paper explores recovery solutions based on
regenerating codes, which are shown to provide fault-tolerant storage and
minimum recovery bandwidth. Existing optimal regenerating codes are designed
for single node failures. We build a system called CORE, which augments
existing optimal regenerating codes to support a general number of failures
including single and concurrent failures. We theoretically show that CORE
achieves the minimum possible recovery bandwidth for most cases. We implement
CORE and evaluate our prototype atop a Hadoop HDFS cluster testbed with up to
20 storage nodes. We demonstrate that our CORE prototype conforms to our
theoretical findings and achieves recovery bandwidth savings compared to
the conventional recovery approach based on erasure codes. Comment: 25 page
HFR Code: A Flexible Replication Scheme for Cloud Storage Systems
Fractional repetition (FR) codes are a family of repair-efficient storage
codes that provide exact and uncoded node repair at the minimum bandwidth
regenerating point. The advantageous repair properties are achieved by a
tailor-made two-layer encoding scheme which concatenates an outer
maximum-distance-separable (MDS) code and an inner repetition code. In this
paper, we generalize the application of FR codes and propose the heterogeneous
fractional repetition (HFR) code, which adapts to scenarios where the
repetition degrees of coded packets are different. We provide explicit code
constructions by utilizing group divisible designs, which allow the design of
HFR codes over a large range of parameters. The constructed codes achieve the
system storage capacity under random access repair and have multiple repair
alternatives for node failures. Further, we take advantage of the systematic
feature of MDS codes and present a novel design framework of HFR codes, in
which storage nodes can be wisely partitioned into clusters such that data
reconstruction time can be reduced when contacting nodes in the same cluster. Comment: Accepted for publication in IET Communications, Jul. 201
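The uncoded repair property of FR codes described above can be illustrated with a minimal sketch. It assumes the well-known complete-graph placement with repetition degree 2, not the group-divisible-design construction of the HFR paper, and it treats the packets as already produced by an outer MDS code, which is not implemented here. The function names `complete_graph_fr` and `repair_plan` are hypothetical.

```python
from itertools import combinations


def complete_graph_fr(n):
    """Assumed placement: one coded packet per edge of the complete graph K_n.

    Node v stores the packets on its incident edges, so every packet is
    stored on exactly two nodes (repetition degree 2).
    """
    nodes = {v: set() for v in range(n)}
    for pkt, (u, v) in enumerate(combinations(range(n), 2)):
        nodes[u].add(pkt)
        nodes[v].add(pkt)
    return nodes


def repair_plan(nodes, failed):
    """Map each lost packet to a surviving node that holds a replica.

    Repair is uncoded: helpers send stored packets as they are, with no
    arithmetic over the code's field.
    """
    plan = {}
    for pkt in sorted(nodes[failed]):
        helper = next(v for v, held in nodes.items()
                      if v != failed and pkt in held)
        plan[pkt] = helper
    return plan


nodes = complete_graph_fr(5)      # 10 packets, 4 packets per node
plan = repair_plan(nodes, failed=0)
print(plan)                       # lost packet -> helper node
```

With this placement the failed node downloads exactly as many packets as it stored, one from each surviving node. A heterogeneous variant would let different packets have different replica counts.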
On Distributed Storage Codes
Distributed storage systems are studied. Interest in such systems has grown with the increasing amount of information that must be stored in data centers and various kinds of cloud systems. Many solutions exist for storing information across distributed devices, depending on the needs of the system designer. This thesis studies the design of such storage systems and their fundamental limits. Namely, the subjects of interest include heterogeneous distributed storage systems, distributed storage systems with the exact repair property, and locally repairable codes. For distributed storage systems with either functional or exact repair, capacity results are proved. In the case of locally repairable codes, the minimum distance is studied.
Constructions for exact-repairing codes between minimum bandwidth regeneration (MBR) and minimum storage regeneration (MSR) points are given. These codes exceed the time-sharing line of the extremal points in many cases. Other properties of exact-regenerating codes are also studied. For the heterogeneous setup, the main result is that the capacity of such systems is always smaller than or equal to the capacity of a homogeneous system with symmetric repair with average node size and average repair bandwidth. A randomized construction for a locally repairable code with good minimum distance is given. It is shown that a random linear code of a certain natural type has a good minimum distance with high probability. Other properties of locally repairable codes are also studied.
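For orientation, the MSR and MBR points and the time-sharing line mentioned above can be computed with a short sketch. It assumes the standard homogeneous setting from the regenerating-codes literature: file size B, any k nodes suffice to reconstruct, a repair contacts d helpers, each node stores alpha, and each helper sends beta (total repair bandwidth gamma = d * beta). The function names and the example parameters are illustrative, not taken from the thesis.

```python
from fractions import Fraction


def cut_set_capacity(k, d, alpha, beta):
    """Functional-repair capacity: sum over i < k of min(alpha, (d - i) * beta)."""
    return sum(min(alpha, (d - i) * beta) for i in range(k))


def msr_point(B, k, d):
    """Minimum-storage point: returns (alpha, gamma)."""
    return Fraction(B, k), Fraction(B * d, k * (d - k + 1))


def mbr_point(B, k, d):
    """Minimum-bandwidth point: alpha equals gamma."""
    alpha = Fraction(2 * B * d, k * (2 * d - k + 1))
    return alpha, alpha


def time_sharing(B, k, d, t):
    """Point on the line between MSR (t = 0) and MBR (t = 1)."""
    (a0, g0), (a1, g1) = msr_point(B, k, d), mbr_point(B, k, d)
    return (1 - t) * a0 + t * a1, (1 - t) * g0 + t * g1


B, k, d = 12, 3, 4                      # illustrative parameters
for name, (alpha, gamma) in [("MSR", msr_point(B, k, d)),
                             ("MBR", mbr_point(B, k, d))]:
    capacity = cut_set_capacity(k, d, alpha, gamma / d)
    print(name, alpha, gamma, capacity)
```

Both extremal points meet the cut-set capacity with equality. A code "exceeds the time-sharing line" when it achieves a storage and bandwidth pair strictly below the interpolated points returned by `time_sharing`.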
Functional repair codes: a view from projective geometry
Storage codes are used to ensure reliable storage of data in distributed systems. Here we
consider functional repair codes, where individual storage nodes that fail may be repaired
efficiently and the ability to recover original data and to further repair failed nodes is preserved.
There are two predominant approaches to repair codes: a coding theoretic approach
and a vector space approach. We explore the relationship between the two and frame the
latter in terms of projective geometry. We find that many of the constructions proposed in
the literature can be seen to arise from natural and well-studied geometric objects, and that
this perspective gives a framework that provides opportunities for generalisations and new
constructions that can lead to greater flexibility in trade-offs between various desirable properties.
We also frame the cut-set bound obtained from network coding in terms of projective
geometry.
We explore the notion of strictly functional repair codes, for which there exist nodes that
cannot be replaced exactly. Currently only one known example is given in the literature,
due to Hollmann and Poh. We examine this phenomenon from a projective geometry point
of view, and discuss how strict functionality can arise.
Finally, we consider the issue that the view of a repair code as a collection of sets of
vector/projective subspaces is recursive in nature and makes it hard to visualise what a
collection of nodes looks like and how one might approach a construction. Here we provide
another view of using directed graphs that gives us non-recursive criteria for determining
whether a family of collections of subspaces constitutes a functional, exact, or strictly functional
repair code, which may be of use in searching for new codes with desirable properties
- …