87,677 research outputs found
Update-Efficient Regenerating Codes with Minimum Per-Node Storage
Regenerating codes provide an efficient way to recover data at failed nodes
in distributed storage systems. It has been shown that regenerating codes can
be designed to minimize the per-node storage (called MSR) or minimize the
communication overhead for regeneration (called MBR). In this work, we propose
a new encoding scheme for [n,d] error- correcting MSR codes that generalizes
our earlier work on error-correcting regenerating codes. We show that by
choosing a suitable diagonal matrix, any generator matrix of the [n,{\alpha}]
Reed-Solomon (RS) code can be integrated into the encoding matrix. Hence, MSR
codes with the least update complexity can be found. An efficient decoding
scheme is also proposed that utilizes the [n,{\alpha}] RS code to perform data
reconstruction. The proposed decoding scheme has better error correction
capability and incurs the least number of node accesses when errors are
present.Comment: Submitted to IEEE ISIT 201
Update-Efficiency and Local Repairability Limits for Capacity Approaching Codes
Motivated by distributed storage applications, we investigate the degree to
which capacity achieving encodings can be efficiently updated when a single
information bit changes, and the degree to which such encodings can be
efficiently (i.e., locally) repaired when single encoded bit is lost.
Specifically, we first develop conditions under which optimum
error-correction and update-efficiency are possible, and establish that the
number of encoded bits that must change in response to a change in a single
information bit must scale logarithmically in the block-length of the code if
we are to achieve any nontrivial rate with vanishing probability of error over
the binary erasure or binary symmetric channels. Moreover, we show there exist
capacity-achieving codes with this scaling.
With respect to local repairability, we develop tight upper and lower bounds
on the number of remaining encoded bits that are needed to recover a single
lost bit of the encoding. In particular, we show that if the code-rate is
less than the capacity, then for optimal codes, the maximum number
of codeword symbols required to recover one lost symbol must scale as
.
Several variations on---and extensions of---these results are also developed.Comment: Accepted to appear in JSA
A Unified Coded Deep Neural Network Training Strategy Based on Generalized PolyDot Codes for Matrix Multiplication
This paper has two contributions. First, we propose a novel coded matrix
multiplication technique called Generalized PolyDot codes that advances on
existing methods for coded matrix multiplication under storage and
communication constraints. This technique uses "garbage alignment," i.e.,
aligning computations in coded computing that are not a part of the desired
output. Generalized PolyDot codes bridge between Polynomial codes and MatDot
codes, trading off between recovery threshold and communication costs. Second,
we demonstrate that Generalized PolyDot can be used for training large Deep
Neural Networks (DNNs) on unreliable nodes prone to soft-errors. This requires
us to address three additional challenges: (i) prohibitively large overhead of
coding the weight matrices in each layer of the DNN at each iteration; (ii)
nonlinear operations during training, which are incompatible with linear
coding; and (iii) not assuming presence of an error-free master node, requiring
us to architect a fully decentralized implementation without any "single point
of failure." We allow all primary DNN training steps, namely, matrix
multiplication, nonlinear activation, Hadamard product, and update steps as
well as the encoding/decoding to be error-prone. We consider the case of
mini-batch size , as well as , leveraging coded matrix-vector
products, and matrix-matrix products respectively. The problem of DNN training
under soft-errors also motivates an interesting, probabilistic error model
under which a real number MDS code is shown to correct errors
with probability as compared to for the
more conventional, adversarial error model. We also demonstrate that our
proposed strategy can provide unbounded gains in error tolerance over a
competing replication strategy and a preliminary MDS-code-based strategy for
both these error models.Comment: Presented in part at the IEEE International Symposium on Information
Theory 2018 (Submission Date: Jan 12 2018); Currently under review at the
IEEE Transactions on Information Theor
- β¦