Optimal Codes Detecting Deletions in Concatenated Binary Strings Applied to Trace Reconstruction
Consider two or more binary strings that are concatenated to form a longer
string. Suppose that up to a fixed number of deletions occur in each of the
concatenated strings. Since deletions alter the lengths of the strings, a
fundamental question to ask is: how much redundancy do we need to introduce in
the concatenated string in order to recover the boundaries of the individual
strings? This boundary problem is equivalent to the
problem of designing codes that can detect the exact number of deletions in
each concatenated string. In this work, we answer the question above by first
deriving converse results that give lower bounds on the redundancy of
deletion-detecting codes. Then, we present a marker-based code construction
whose redundancy is asymptotically optimal among all families of
deletion-detecting codes, and exactly optimal among all block-by-block
decodable codes. To exemplify the usefulness of such deletion-detecting codes,
we apply our code to trace reconstruction and design an efficient coded
reconstruction scheme that requires a constant number of traces.
Comment: Accepted for publication in the IEEE Transactions on Information Theory. arXiv admin note: substantial text overlap with arXiv:2207.05126, arXiv:2105.0021
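The boundary-detection idea can be illustrated with a simplified marker scheme (a toy sketch, not the paper's optimal construction; all names and parameters below are illustrative): if every fixed-length data block avoids the substring "11" and blocks are joined by runs of delta + 2 ones, then after at most delta deletions of interior data bits, any run of at least delta + 2 ones still marks a boundary, and each block's deletion count is simply its length deficit.

```python
import re

def encode(blocks, delta):
    """Join fixed-length blocks (each avoiding the substring '11')
    with separator runs of delta + 2 ones."""
    assert all("11" not in b for b in blocks)
    return ("1" * (delta + 2)).join(blocks)

def detect_deletions(received, n, delta):
    """Return the number of deletions in each block, assuming deletions
    hit only interior data bits: a block hit by at most delta deletions
    contains no run of ones longer than delta + 1, so any longer run is
    a separator, and each segment's deficit from n counts its deletions."""
    segments = re.split("1{%d,}" % (delta + 2), received)
    return [n - len(s) for s in segments]

# one interior deletion in each of the last two blocks
received = "010010" + "111" + "00110" + "111" + "01010"
print(detect_deletions(received, n=6, delta=1))  # -> [0, 1, 1]
```

The paper's construction achieves far lower redundancy; the sketch only shows why markers let a decoder localize and count deletions block by block.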
Guess & Check Codes for Deletions and Synchronization
We consider the problem of constructing codes that can correct multiple
deletions occurring in an arbitrary binary string.
Varshamov-Tenengolts (VT) codes can correct all possible single deletions
with an asymptotically optimal redundancy. Finding similar codes
for multiple deletions is an open problem. We propose a new family of
codes, that we call Guess & Check (GC) codes, that can correct, with high
probability, a constant number of deletions occurring at uniformly
random positions within an arbitrary string. The GC codes are based on MDS
codes and have an asymptotically optimal redundancy. We provide deterministic polynomial time encoding and decoding schemes for
these codes. We also describe the applications of GC codes to file
synchronization.
Comment: Accepted in ISIT 201
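As background, the single-deletion correction guaranteed by VT codes can be demonstrated with a brute-force decoder (an illustrative sketch; practical VT decoders run in linear time). A VT code with parameter a consists of the length-n binary words whose weighted sum of i * x_i is congruent to a modulo n + 1; after one deletion, trying every reinsertion and checking the syndrome recovers the unique codeword.

```python
def vt_syndrome(x):
    """Weighted checksum sum(i * x_i) mod (len(x) + 1), indices from 1."""
    return sum(i * b for i, b in enumerate(x, start=1)) % (len(x) + 1)

def vt_decode(y, n, a=0):
    """Recover the length-n VT codeword (syndrome a) from y, the codeword
    with one bit deleted.  Every reinsertion matching the syndrome yields
    the same codeword, so the first match is returned."""
    for pos in range(n):
        for bit in (0, 1):
            candidate = y[:pos] + [bit] + y[pos:]
            if vt_syndrome(candidate) == a:
                return candidate

# [1, 0, 0, 0, 1] has syndrome (1 + 5) mod 6 = 0; delete its first bit
print(vt_decode([0, 0, 0, 1], n=5, a=0))  # -> [1, 0, 0, 0, 1]
```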
Coding for Trace Reconstruction over Multiple Channels with Vanishing Deletion Probabilities
Motivated by DNA-based storage applications, we study the problem of
reconstructing a coded sequence from multiple traces. We consider the model
where the traces are outputs of independent deletion channels, where each
channel deletes each bit of the input codeword
independently with some probability. We focus on the regime where the deletion
probability vanishes as the blocklength grows. Our main contribution is
designing a novel code for trace reconstruction that allows reconstructing a
coded sequence efficiently from a constant number of traces. We provide
theoretical results on the performance of our code in addition to simulation
results where we compare the performance of our code to other reconstruction
techniques in terms of the edit distance error.
Comment: This is the full version of the short paper accepted at ISIT 202
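The channel model is easy to simulate: each trace is an independent pass of the codeword through a deletion channel that drops each bit i.i.d. with probability p. The sketch below (with illustrative parameter names) generates such traces; in the vanishing-deletion regime one would let p shrink as the blocklength grows.

```python
import random

def deletion_channel(x, p, rng):
    """Delete each bit of x independently with probability p."""
    return [b for b in x if rng.random() >= p]

def make_traces(x, p, num_traces, seed=0):
    """Independent deletion-channel outputs (traces) of the same codeword."""
    rng = random.Random(seed)
    return [deletion_channel(x, p, rng) for _ in range(num_traces)]

codeword = [1, 0, 1, 1, 0, 0, 1, 0]
for trace in make_traces(codeword, p=0.2, num_traces=3):
    print(trace)
```

Reconstruction then amounts to estimating the codeword from such traces; the code structure is what makes this possible with only a constant number of them.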
Fast and Straggler-Tolerant Distributed SGD with Reduced Computation Load
In distributed machine learning, a central node outsources computationally
expensive calculations to external worker nodes. The properties of optimization
procedures like stochastic gradient descent (SGD) can be leveraged to mitigate
the effect of unresponsive or slow workers, called stragglers, that otherwise
degrade the benefit of outsourcing the computation. This can be done by only
waiting for a subset of the workers to finish their computation at each
iteration of the algorithm. Previous works proposed to adapt the number of
workers to wait for as the algorithm evolves to optimize the speed of
convergence. In contrast, we model the communication and computation times
using independent random variables. Considering this model, we construct a
novel scheme that adapts both the number of workers and the computation load
throughout the run-time of the algorithm. Consequently, we improve the
convergence speed of distributed SGD while significantly reducing the
computation load, at the expense of a slight increase in communication load.
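The effect of waiting for only a subset of workers can be seen in a small simulation (a sketch under an assumed exponential delay model; the paper's scheme additionally adapts the per-worker computation load): with worker finish times modeled as independent communication-plus-computation delays, the per-iteration time is an order statistic, which drops sharply once stragglers are excluded.

```python
import random

def finish_times(num_workers, rng, comm_rate=2.0, comp_rate=1.0):
    """One iteration's sorted worker finish times: independent exponential
    communication and computation delays (a modeling assumption)."""
    return sorted(rng.expovariate(comm_rate) + rng.expovariate(comp_rate)
                  for _ in range(num_workers))

rng = random.Random(1)
times = finish_times(10, rng)
# waiting for the 3 fastest workers vs. waiting for all 10
print(times[2], times[9])
```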
Adaptive Distributed Stochastic Gradient Descent for Minimizing Delay in the Presence of Stragglers
We consider the setting where a master wants to run a distributed stochastic
gradient descent (SGD) algorithm on a number of workers, each having a subset of the
data. Distributed SGD may suffer from the effect of stragglers, i.e., slow or
unresponsive workers who cause delays. One solution studied in the literature
is to wait at each iteration for the responses of the fastest workers
before updating the model, where the number of workers to wait for is a fixed
parameter. The choice of this parameter presents a trade-off between the
runtime (i.e., convergence rate) of SGD and the error of the model. Towards
optimizing the error-runtime trade-off, we investigate distributed SGD where
the number of workers to wait for is adaptive. We first design an adaptive
policy for varying this number that optimizes the trade-off based on an upper
bound on the error as a function of the wall-clock time which we derive. Then,
we propose an algorithm for adaptive distributed SGD that is based on a
statistical heuristic. We implement our algorithm and provide numerical
simulations which confirm our intuition and theoretical analysis.
Comment: Accepted to IEEE ICASSP 202
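A toy error-runtime simulation (illustrative only: a 1-D quadratic objective, exponential worker delays, and gradient noise whose variance shrinks as more workers are averaged) shows why varying the number of workers to wait for matters: waiting for few workers makes iterations fast but noisy, waiting for many makes them slow but accurate, and an adaptive schedule interpolates between the two.

```python
import math
import random

def run_sgd(k_schedule, num_workers=10, lr=0.1, w0=5.0, seed=0):
    """Minimize f(w) = w^2 / 2 with noisy gradients.  At each iteration,
    wait for the k fastest of num_workers workers (k from k_schedule);
    averaging k gradients scales the noise by 1/sqrt(k).
    Returns the final iterate and the total wall-clock time."""
    rng = random.Random(seed)
    w, clock = w0, 0.0
    for k in k_schedule:
        times = sorted(rng.expovariate(1.0) for _ in range(num_workers))
        clock += times[k - 1]                      # k-th order statistic
        noise = rng.gauss(0.0, 1.0) / math.sqrt(k)
        w -= lr * (w + noise)                      # grad f(w) = w, plus noise
    return w, clock

T = 50
ramp = [2 + (6 * i) // T for i in range(T)]        # k grows from 2 to 7
_, t_fast = run_sgd([2] * T)
_, t_adaptive = run_sgd(ramp)
_, t_slow = run_sgd([8] * T)
print(t_fast, t_adaptive, t_slow)
```

With a shared seed the three runs see identical delay draws, so the adaptive schedule's wall-clock time always lands between the two fixed choices; the paper's policies choose the schedule to optimize the resulting error-runtime curve rather than fixing it in advance.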