Capacity of Locally Recoverable Codes
Motivated by applications in distributed storage, the notion of a locally
recoverable code (LRC) was introduced a few years back. In an LRC, any
coordinate of a codeword is recoverable by accessing only a small number of
other coordinates. While different properties of LRCs have been well-studied,
their performance on channels with random erasures or errors has been mostly
unexplored. In this note, we analyze the performance of LRCs over such
stochastic channels. In particular, for input-symmetric discrete memoryless
channels, we give a tight characterization of the gap to Shannon capacity when
LRCs are used over the channel. Comment: Invited paper to the Information Theory Workshop (ITW) 201
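The locality property described in this abstract can be illustrated with a toy code. The sketch below is not the construction analyzed in the paper; it assumes a hypothetical scheme in which the data symbols are split into groups of r symbols and each group gets one XOR parity, so that any single lost symbol is recoverable from the r other symbols of its group. The function names and the grouping layout are illustrative assumptions.

```python
from functools import reduce
from operator import xor


def encode_with_local_parities(data, r):
    """Split `data` into groups of r symbols and append one XOR parity per group.

    Assumes len(data) is a multiple of r (an assumption of this sketch).
    """
    codeword = []
    for start in range(0, len(data), r):
        group = data[start:start + r]
        codeword.extend(group)
        codeword.append(reduce(xor, group))  # local parity of this group
    return codeword


def repair(codeword, lost, r):
    """Recover the symbol at index `lost` using only its local group.

    Returns the recovered value and the indices that were read.
    """
    group_size = r + 1  # r data symbols plus one parity
    group_start = (lost // group_size) * group_size
    helpers = [i for i in range(group_start, group_start + group_size) if i != lost]
    value = reduce(xor, (codeword[i] for i in helpers))
    return value, helpers


codeword = encode_with_local_parities([5, 9, 12, 7, 3, 8], r=3)
value, helpers = repair(codeword, lost=1, r=3)
print(value, helpers)
```

Each repair reads exactly r symbols regardless of the block length, which is the locality constraint whose cost in rate the paper quantifies.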
Update-Efficiency and Local Repairability Limits for Capacity Approaching Codes
Motivated by distributed storage applications, we investigate the degree to
which capacity achieving encodings can be efficiently updated when a single
information bit changes, and the degree to which such encodings can be
efficiently (i.e., locally) repaired when a single encoded bit is lost.
Specifically, we first develop conditions under which optimum
error-correction and update-efficiency are possible, and establish that the
number of encoded bits that must change in response to a change in a single
information bit must scale logarithmically in the block-length of the code if
we are to achieve any nontrivial rate with vanishing probability of error over
the binary erasure or binary symmetric channels. Moreover, we show there exist
capacity-achieving codes with this scaling.
With respect to local repairability, we develop tight upper and lower bounds
on the number of remaining encoded bits that are needed to recover a single
lost bit of the encoding. In particular, we show that if the code-rate is
$\epsilon$ less than the capacity, then for optimal codes, the maximum number
of codeword symbols required to recover one lost symbol must scale as
$\log 1/\epsilon$.
Several variations on, and extensions of, these results are also developed. Comment: Accepted to appear in JSA
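For a linear code, the update-efficiency notion in this abstract has a simple concrete reading: flipping information bit i changes exactly the codeword positions where row i of the generator matrix is nonzero. The sketch below checks this on a small binary generator matrix chosen only for illustration; the matrix and the function names are assumptions, not taken from the paper.

```python
def encode(message, generator):
    """Encode a binary message as message * generator over GF(2)."""
    n = len(generator[0])
    return [
        sum(message[i] * generator[i][j] for i in range(len(message))) % 2
        for j in range(n)
    ]


def bits_changed_by_flip(message, generator, i):
    """Count codeword bits that change when information bit i is flipped."""
    flipped = list(message)
    flipped[i] ^= 1
    before = encode(message, generator)
    after = encode(flipped, generator)
    return sum(a != b for a, b in zip(before, after))


# Illustrative systematic generator matrix of a length-7, dimension-4 binary code.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

message = [1, 0, 1, 1]
for i in range(len(message)):
    # The update cost of bit i equals the Hamming weight of row i of G.
    print(i, bits_changed_by_flip(message, G, i), sum(G[i]))
```

In this reading, the result stated in the abstract says that the row weights of any such encoding must grow logarithmically with the block length if the code is to achieve a nontrivial rate with vanishing error probability.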
Coding for the Clouds: Coding Techniques for Enabling Security, Locality, and Availability in Distributed Storage Systems
Cloud systems have become the backbone of many applications such as multimedia
streaming, e-commerce, and cluster computing. At the foundation of any cloud architecture
lies a large-scale, distributed, data storage system. To accommodate the massive
amount of data being stored on the cloud, these distributed storage systems (DSS) have
been scaled to contain hundreds to thousands of nodes that are connected through a networking
infrastructure. Such data-centers are usually built out of commodity components,
which make failures the norm rather than the exception.
In order to combat node failures, data is typically stored in a redundant fashion. Due to
the exponential data growth rate, many DSS are beginning to resort to error control coding
over conventional replication methods, as coding offers high storage space efficiency. This
paradigm shift from replication to coding, along with the need to guarantee reliability, efficiency,
and security in DSS, has created a new set of challenges and opportunities, opening
up a new area of research. This thesis addresses several of these challenges and opportunities
by broadly making the following contributions. (i) We design practically amenable,
low-complexity coding schemes that guarantee security of cloud systems, ensure quick
recovery from failures, and provide high availability for retrieving partial information; and
(ii) We analyze fundamental performance limits and optimal trade-offs between the key
performance metrics of these coding schemes.
More specifically, we first consider the problem of achieving information-theoretic
security in DSS against an eavesdropper that can observe a limited number of nodes. We
present a framework that enables design of secure repair-efficient codes through a joint
construction of inner and outer codes. Then, we consider a practically appealing notion
of weakly secure coding, and construct coset codes that can weakly secure a wide class of regenerating codes that reduce the amount of data downloaded during node repair.
Second, we consider the problem of meeting repair locality constraints, which specify
the number of nodes participating in the repair process. We propose a notion of unequal
locality, which enables different locality values for different nodes, ensuring quick recovery
for nodes storing important data. We establish tight upper bounds on the minimum
distance of linear codes with unequal locality, and present optimal code constructions.
Next, we extend the notion of locality from the Hamming metric to the rank and subspace
metrics, with the goal of designing codes for efficient data recovery from special types of
correlated failures in DSS. We construct a family of locally recoverable rank-metric codes
with optimal data recovery properties.
Finally, we consider the problem of providing high availability, which is ensured by
enabling node repair from multiple disjoint subsets of nodes of small size. We study
codes with availability from a queuing-theoretical perspective by analyzing the average
time necessary to download a block of data under the Poisson request arrival model when
each node takes a random amount of time to fetch its contents. We compare the delay
performance of the availability codes with several alternatives such as conventional erasure
codes and replication schemes.
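The availability property mentioned in this abstract, repair from multiple disjoint subsets of nodes, can be illustrated with a toy layout. The sketch below is not one of the thesis's constructions; it assumes a hypothetical two-dimensional parity scheme in which data symbols sit in a grid with one XOR parity per row and one per column, so each data symbol has two disjoint recovery sets. All names are illustrative.

```python
from functools import reduce
from operator import xor


def build_parities(grid):
    """Return (row_parities, col_parities) for a rectangular grid of integers."""
    row_parities = [reduce(xor, row) for row in grid]
    col_parities = [reduce(xor, col) for col in zip(*grid)]
    return row_parities, col_parities


def recover_two_ways(grid, row_parities, col_parities, r, c):
    """Recover grid[r][c] from its row set and, independently, from its column set.

    The row set reads the other symbols of row r plus the row parity; the
    column set reads the other symbols of column c plus the column parity.
    The two sets share no node, so two requests can be served in parallel.
    """
    row_helpers = [grid[r][j] for j in range(len(grid[r])) if j != c]
    col_helpers = [grid[i][c] for i in range(len(grid)) if i != r]
    from_row = reduce(xor, row_helpers, row_parities[r])
    from_col = reduce(xor, col_helpers, col_parities[c])
    return from_row, from_col


grid = [[3, 14, 7], [9, 1, 12]]
rows, cols = build_parities(grid)
print(recover_two_ways(grid, rows, cols, 0, 1))
```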
A Study on the Impact of Locality in the Decoding of Binary Cyclic Codes
In this paper, we study the impact of locality on the decoding of binary
cyclic codes under two approaches, namely ordered statistics decoding (OSD) and
trellis decoding. Given a binary cyclic code having locality or availability,
we suitably modify the OSD to obtain gains in terms of the signal-to-noise
ratio (SNR), for a given reliability and essentially the same level of decoder
complexity. With regard to trellis decoding, we show that careful introduction
of locality results in the creation of cyclic subcodes having lower maximum
state complexity. We also present a simple upper-bounding technique on the
state complexity profile, based on the zeros of the code. Finally, it is shown
how the decoding speed can be significantly increased in the presence of
locality, in the moderate-to-high SNR regime, by making use of a quick-look
decoder that often returns the ML codeword. Comment: Extended version of a paper submitted to ISIT 201
Locally Decodable Index Codes
An index code for a broadcast channel with receiver side information is locally
decodable if each receiver can decode its demand by observing only a subset of
the transmitted codeword symbols instead of the entire codeword. Local
decodability in index coding is known to reduce receiver complexity, improve
user privacy and decrease decoding error probability in wireless fading
channels. Conventional index coding solutions assume that the receivers observe
the entire codeword, and as a result, for these codes the number of codeword
symbols queried by a user per decoded message symbol, which we refer to as
locality, could be large. In this paper, we pose the index coding problem as
that of minimizing the broadcast rate for a given value of locality (or vice
versa) and designing codes that achieve the optimal trade-off between locality
and rate. We identify the optimal broadcast rate corresponding to the minimum
possible value of locality for all single unicast problems. We present new
structural properties of index codes which allow us to characterize the optimal
trade-off achieved by: vector linear codes when the side information graph is a
directed cycle; and scalar linear codes when the minrank of the side
information graph is one less than the order of the problem. We also identify
the optimal trade-off among all codes, including non-linear codes, when the
side information graph is a directed 3-cycle. Finally, we present techniques to
design locally decodable index codes for arbitrary single unicast problems and
arbitrary values of locality. Comment: Accepted for publication in the IEEE Transactions on Information
Theory. Parts of this manuscript were presented at IEEE ISIT 2018 and IEEE
ISIT 2019. This arXiv manuscript subsumes the contents of arXiv:1801.03895
and arXiv:1901.0590
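The trade-off between broadcast rate and locality described in this abstract can be seen in a small example on a directed 3-cycle. The sketch below is a standard scalar linear index code, not a construction claimed to be the paper's; it assumes receiver i wants message i and knows message (i + 1) mod 3 as side information. Two coded symbols are broadcast instead of three, but one receiver must read both of them.

```python
def broadcast(messages):
    """Encode three messages into two coded symbols using XOR."""
    x0, x1, x2 = messages
    return [x0 ^ x1, x1 ^ x2]


def decode(receiver, codeword, side_info):
    """Decode the demand of `receiver` (0, 1 or 2).

    `side_info` is the message with index (receiver + 1) mod 3.
    Returns the decoded message and the number of coded symbols read.
    """
    if receiver == 0:
        return codeword[0] ^ side_info, 1  # (x0 ^ x1) ^ x1
    if receiver == 1:
        return codeword[1] ^ side_info, 1  # (x1 ^ x2) ^ x2
    # Receiver 2 knows x0 and needs both symbols: (x0 ^ x1) ^ (x1 ^ x2) ^ x0
    return codeword[0] ^ codeword[1] ^ side_info, 2


messages = [6, 11, 13]
codeword = broadcast(messages)
for i in range(3):
    print(i, decode(i, codeword, messages[(i + 1) % 3]))
```

Sending all three messages uncoded would let every receiver read a single symbol, at the cost of a third transmission; this is the kind of exchange between rate and locality that the abstract refers to.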