Storage codes -- coding rate and repair locality
The repair locality of a distributed storage code is the maximum number
of nodes that ever needs to be contacted during the repair of a failed node.
Having small repair locality is desirable, since it is proportional to the
number of disk accesses during repair. However, recent publications show that
small repair locality comes with a penalty in terms of code distance or storage
overhead if exact repair is required.
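As a toy illustration of repair locality (not a construction from the paper), the sketch below assumes a simple locally repairable layout: data blocks are cut into groups of r, and each group gets one XOR parity block. A lost block is rebuilt from the other blocks of its group, so at most r nodes are contacted regardless of how many groups exist. The function names encode_local_groups and repair are hypothetical. This layout stores r + 1 blocks for every r data blocks, so its rate is r/(r + 1).

```python
from functools import reduce


def xor_bytes(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode_local_groups(data_blocks, r):
    """Hypothetical toy encoder: split data into groups of r blocks and
    append one XOR parity block to each group."""
    groups = []
    for start in range(0, len(data_blocks), r):
        group = list(data_blocks[start:start + r])
        group.append(xor_bytes(group))  # local parity for this group only
        groups.append(group)
    return groups


def repair(group, failed_index):
    """Rebuild one lost block from the other blocks of its group.

    Returns the rebuilt block and the number of nodes contacted
    (the repair locality of this block)."""
    helpers = [blk for i, blk in enumerate(group) if i != failed_index]
    return xor_bytes(helpers), len(helpers)
```

With four data blocks and r = 2, repair(encode_local_groups([b"ab", b"cd", b"ef", b"gh"], 2)[0], 1) returns (b"cd", 2). Each group tolerates only one erasure; codes with locality used in practice also add global parities to raise the distance, which is where the trade-offs mentioned above arise.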
Here, we first review some of the main results on storage codes under various
repair regimes and discuss the recent work on possible
(information-theoretical) trade-offs between repair locality and other code
parameters like storage overhead and code distance, under the exact repair
regime.
Then we present some new information theoretical lower bounds on the storage
overhead as a function of the repair locality, valid for all common coding and
repair models. In particular, we show that if each of the nodes in a
distributed storage system has storage capacity α and if, at any time, a
failed node can be functionally repaired by contacting some set of r
nodes (which may depend on the actual state of the system) and downloading
an amount β of data from each, then in the extreme cases where α = β or
α = rβ, the maximal coding rate is at most or 1/2, respectively
(that is, the excess storage overhead is at least or 1, respectively).
Comment: Accepted for publication in ICNC'13, San Diego, US
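The two forms of the bound in the abstract's last sentence are linked by a one-line identity. Assuming the excess storage overhead of a code that stores k data units in n units is (n - k)/k (this matches the stated pairing of rate 1/2 with overhead 1), it equals 1/R - 1 for rate R = k/n. Because 1/R - 1 decreases as R grows, every upper bound on the rate turns into a lower bound on the excess overhead. A minimal check, with the helper name excess_storage_overhead chosen for illustration:

```python
from fractions import Fraction


def excess_storage_overhead(rate):
    """Excess storage overhead (n - k) / k = 1/R - 1 for a code of rate R = k/n.

    Uses exact rational arithmetic so that, e.g., rate 1/2 maps to exactly 1."""
    rate = Fraction(rate)
    if not 0 < rate <= 1:
        raise ValueError("rate must lie in (0, 1]")
    return 1 / rate - 1


# Rate 1/2 (e.g. two-way replication) means storing one extra unit per data unit.
print(excess_storage_overhead(Fraction(1, 2)))  # prints 1
```

Exact fractions are used instead of floats so that equalities such as overhead 1 at rate 1/2 hold without rounding.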
Explicit MBR All-Symbol Locality Codes
Node failures are inevitable in distributed storage systems (DSS). To enable
efficient repair when faced with such failures, two main techniques are known:
Regenerating codes, i.e., codes that minimize the total repair bandwidth; and
codes with locality, which minimize the number of nodes participating in the
repair process. This paper focuses on regenerating codes with locality, using
pre-coding based on Gabidulin codes, and presents constructions that utilize
minimum bandwidth regenerating (MBR) local codes. The constructions achieve
maximum resilience (i.e., optimal minimum distance) and have maximum capacity
(i.e., maximum rate). Finally, the same pre-coding mechanism can be combined
with a subclass of fractional-repetition codes to enable maximum resilience and
repair-by-transfer simultaneously.
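To make repair-by-transfer concrete, here is a sketch of one well-known fractional-repetition layout, assumed for illustration and not necessarily the subclass used in the paper. With n nodes, one packet is placed on every edge {i, j} of the complete graph K_n, so each node stores alpha = n - 1 packets and every packet lives on exactly two nodes. A failed node i is rebuilt by having each of the d = n - 1 survivors copy the single packet it shares with i, so beta = 1 and alpha = d * beta (the MBR operating point), with no arithmetic at the helpers. The outer pre-coding (Gabidulin codes in the abstract) that turns a file into these packets is omitted, and the function names are hypothetical.

```python
from itertools import combinations


def fr_layout(n):
    """Place one packet on every edge {i, j} of the complete graph K_n.

    Returns a dict node -> set of packet labels; node i stores the n - 1
    packets whose edge touches i, so every packet is stored twice."""
    packets = list(combinations(range(n), 2))
    return {node: {p for p in packets if node in p} for node in range(n)}


def repair_by_transfer(layout, failed):
    """Rebuild node `failed`: each surviving node sends the one packet on
    edge {failed, helper}. Packets are copied, never combined.

    Returns the rebuilt packet set and a dict helper -> packet sent."""
    transfers = {}
    for helper, stored in layout.items():
        if helper == failed:
            continue
        edge = tuple(sorted((failed, helper)))
        if edge not in stored:
            raise KeyError(f"helper {helper} does not hold packet {edge}")
        transfers[helper] = edge  # beta = 1 packet per helper
    return set(transfers.values()), transfers
```

For n = 5, repairing node 2 contacts four helpers, each sending one packet, and reproduces exactly the four packets node 2 held.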
Node Repair for Distributed Storage Systems over Fading Channels
Distributed storage systems and associated storage codes can efficiently
store a large amount of data while ensuring that data is retrievable in case of
node failure. The study of such systems, particularly the design of storage
codes over finite fields, assumes that the physical channel through which the
nodes communicate is error-free. This is not always the case, for example, in a
wireless storage system.
We study the probability that a subpacket is repaired incorrectly during node
repair in a distributed storage system in which the nodes communicate over
AWGN or Rayleigh fading channels. The asymptotic probability (as SNR increases)
that a node is repaired incorrectly is shown to be completely determined by the
repair locality of the DSS and the symbol error rate of the wireless channel.
Lastly, we propose some design criteria for physical-layer coding in this
scenario, and use them to compute optimally rotated QAM constellations for use in
wireless distributed storage systems.
Comment: To appear in ISITA 201
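The abstract states that the asymptotic probability of incorrect repair depends only on the repair locality and the channel's symbol error rate; the exact expression is in the paper and is not reproduced here. As rough intuition only, the sketch below assumes a much simpler model: each of the r helper symbols is received in error independently with probability p, and the repaired subpacket is wrong whenever at least one of them is. Under that assumption the failure probability is 1 - (1 - p)^r, which behaves like r * p as p goes to 0, so it grows linearly with the locality r. The function names are hypothetical.

```python
def repair_error_probability(r, p):
    """Simplified model (an assumption, not the paper's result): probability
    that at least one of r independently received helper symbols is wrong."""
    if r < 1 or not 0.0 <= p <= 1.0:
        raise ValueError("need r >= 1 and 0 <= p <= 1")
    return 1.0 - (1.0 - p) ** r


def first_order_approximation(r, p):
    """Leading term of 1 - (1 - p)^r for small p: proportional to locality r."""
    return r * p
```

At a symbol error rate of 1e-4, a locality-4 repair fails with probability just under 4e-4 in this model, twice that of a locality-2 repair to first order.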
Coding for the Clouds: Coding Techniques for Enabling Security, Locality, and Availability in Distributed Storage Systems
Cloud systems have become the backbone of many applications such as multimedia
streaming, e-commerce, and cluster computing. At the foundation of any cloud architecture
lies a large-scale, distributed, data storage system. To accommodate the massive
amount of data being stored on the cloud, these distributed storage systems (DSS) have
been scaled to contain hundreds to thousands of nodes that are connected through a networking
infrastructure. Such data centers are usually built from commodity components,
which make failures the norm rather than the exception.
In order to combat node failures, data is typically stored in a redundant fashion. Due to
the exponential growth of data, many DSS are turning to error-control coding instead of
conventional replication, as coding offers higher storage efficiency. This
paradigm shift from replication to coding, along with the need to guarantee reliability, efficiency,
and security in DSS, has created a new set of challenges and opportunities, opening
up a new area of research. This thesis addresses several of these challenges and opportunities
through two broad contributions: (i) we design practically amenable,
low-complexity coding schemes that guarantee the security of cloud systems, ensure quick
recovery from failures, and provide high availability for retrieving partial information; and
(ii) we analyze fundamental performance limits and optimal trade-offs between the key
performance metrics of these coding schemes.
More specifically, we first consider the problem of achieving information-theoretic
security in DSS against an eavesdropper that can observe a limited number of nodes. We
present a framework that enables the design of secure repair-efficient codes through a joint
construction of inner and outer codes. Then, we consider a practically appealing notion
of weakly secure coding, and construct coset codes that can weakly secure a wide class of regenerating codes that reduce the amount of data downloaded during node repair.
Second, we consider the problem of meeting repair locality constraints, which specify
the number of nodes participating in the repair process. We propose a notion of unequal
locality, which enables different locality values for different nodes, ensuring quick recovery
for nodes storing important data. We establish tight upper bounds on the minimum
distance of linear codes with unequal locality, and present optimal code constructions.
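A minimal sketch of the unequal-locality idea, assuming plain XOR local groups rather than the optimal constructions from the thesis: data is cut into groups of different sizes, and each group gets one XOR parity, so a block in a group with g data blocks is repaired from g other blocks. Placing important data in the small groups gives it faster repair. The helpers encode_unequal and locality_per_group are hypothetical, and the sketch does not address the minimum-distance bounds.

```python
from functools import reduce


def xor_bytes(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode_unequal(data_blocks, group_sizes):
    """Cut data into consecutive groups of the given sizes and append one
    XOR parity per group (hypothetical toy construction)."""
    if sum(group_sizes) != len(data_blocks):
        raise ValueError("group sizes must cover all data blocks exactly")
    groups, start = [], 0
    for size in group_sizes:
        group = list(data_blocks[start:start + size])
        group.append(xor_bytes(group))  # local parity
        groups.append(group)
        start += size
    return groups


def locality_per_group(groups):
    """Every block in a group is repaired from the other len(group) - 1 blocks."""
    return [len(group) - 1 for group in groups]
```

With group sizes [2, 4], the first two data blocks have locality 2 while the remaining four have locality 4.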
Next, we extend the notion of locality from the Hamming metric to the rank and subspace
metrics, with the goal of designing codes for efficient data recovery from special types of
correlated failures in DSS. We construct a family of locally recoverable rank-metric codes
with optimal data recovery properties.
Finally, we consider the problem of providing high availability, which is ensured by
enabling node repair from multiple disjoint subsets of nodes of small size. We study
codes with availability from a queuing-theoretical perspective by analyzing the average
time necessary to download a block of data under the Poisson request arrival model when
each node takes a random amount of time to fetch its contents. We compare the delay
performance of the availability codes with several alternatives such as conventional erasure
codes and replication schemes.
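The availability property itself (not the queueing analysis) can be illustrated with a toy grid code, assumed here for illustration: data symbols are arranged in a t x t grid, and one XOR parity is added per row and per column. Each data symbol then has two disjoint repair sets of size t, namely the rest of its row plus the row parity and the rest of its column plus the column parity, so two concurrent requests for the same symbol can be served by different nodes. The labels and function names are hypothetical.

```python
from functools import reduce
from operator import xor


def encode_grid(data):
    """Return (row_parities, col_parities) for a square grid of integer symbols."""
    row_parities = [reduce(xor, row) for row in data]
    col_parities = [reduce(xor, col) for col in zip(*data)]
    return row_parities, col_parities


def repair_sets(i, j, t):
    """Two disjoint repair sets for data symbol (i, j) of a t x t grid.

    Labels: ("d", a, b) is data symbol (a, b), ("r", a) is the parity of
    row a, and ("c", b) is the parity of column b."""
    row_set = [("d", i, b) for b in range(t) if b != j] + [("r", i)]
    col_set = [("d", a, j) for a in range(t) if a != i] + [("c", j)]
    return row_set, col_set


def recover(data, parities, repair_set):
    """XOR together the symbols named in repair_set (the target is never read)."""
    row_parities, col_parities = parities
    value = 0
    for label in repair_set:
        if label[0] == "d":
            value ^= data[label[1]][label[2]]
        elif label[0] == "r":
            value ^= row_parities[label[1]]
        else:
            value ^= col_parities[label[1]]
    return value
```

For the grid [[3, 5], [6, 9]], symbol (0, 1) = 5 can be recovered either from {3, row parity 6} or from {9, column parity 12}, two sets that share no node.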