8 research outputs found

    Optimal Codes Detecting Deletions in Concatenated Binary Strings Applied to Trace Reconstruction

    Full text link
    Consider two or more strings $\mathbf{x}^1, \mathbf{x}^2, \ldots$ that are concatenated to form $\mathbf{x} = \langle \mathbf{x}^1, \mathbf{x}^2, \ldots \rangle$. Suppose that up to $\delta$ deletions occur in each of the concatenated strings. Since deletions alter the lengths of the strings, a fundamental question to ask is: how much redundancy do we need to introduce in $\mathbf{x}$ in order to recover the boundaries of $\mathbf{x}^1, \mathbf{x}^2, \ldots$? This boundary problem is equivalent to the problem of designing codes that can detect the exact number of deletions in each concatenated string. In this work, we answer the question above by first deriving converse results that give lower bounds on the redundancy of deletion-detecting codes. Then, we present a marker-based code construction whose redundancy is asymptotically optimal in $\delta$ among all families of deletion-detecting codes, and exactly optimal among all block-by-block decodable codes. To exemplify the usefulness of such deletion-detecting codes, we apply our code to trace reconstruction and design an efficient coded reconstruction scheme that requires a constant number of traces.
    Comment: Accepted for publication in the IEEE Transactions on Information Theory. arXiv admin note: substantial text overlap with arXiv:2207.05126, arXiv:2105.0021
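    The sketch below illustrates only the equivalence stated in the abstract, not the paper's marker-based construction: once a deletion-detecting code reports the exact number of deletions $d_i$ in each concatenated block (originally $n$ bits each), the boundaries follow immediately, since block $i$ occupies $n - d_i$ bits of the received string. The function name and the assumption that `deletion_counts` comes from such a decoder are illustrative.

    ```python
    def recover_boundaries(received: str, n: int, deletion_counts: list[int]) -> list[str]:
        """Split a received string into blocks, given the exact number of
        deletions detected in each originally n-bit block."""
        blocks, pos = [], 0
        for d in deletion_counts:
            blocks.append(received[pos:pos + n - d])
            pos += n - d
        assert pos == len(received), "counts must account for every deletion"
        return blocks

    # Two 8-bit blocks: 1 deletion in the first, 2 in the second.
    received = "0110101" + "010011"
    print(recover_boundaries(received, 8, [1, 2]))  # ['0110101', '010011']
    ```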

    Guess & Check Codes for Deletions and Synchronization

    Full text link
    We consider the problem of constructing codes that can correct $\delta$ deletions occurring in an arbitrary binary string of length $n$ bits. Varshamov-Tenengolts (VT) codes can correct all possible single deletions ($\delta = 1$) with an asymptotically optimal redundancy. Finding similar codes for $\delta \geq 2$ deletions is an open problem. We propose a new family of codes, which we call Guess & Check (GC) codes, that can correct, with high probability, a constant number of deletions $\delta$ occurring at uniformly random positions within an arbitrary string. GC codes are based on MDS codes and have an asymptotically optimal redundancy that is $\Theta(\delta \log n)$. We provide deterministic polynomial-time encoding and decoding schemes for these codes. We also describe the applications of GC codes to file synchronization.
    Comment: Accepted in ISIT 201
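    As a toy illustration of the guess-and-check principle (and not the paper's construction, which uses MDS-code parities over chunks and runs in polynomial time), the brute-force sketch below guesses every way the missing bits could be re-inserted and keeps the candidates that pass a check; a truncated hash stands in for the parity symbols.

    ```python
    import hashlib
    from itertools import combinations, product

    def check(s: str) -> str:
        # Stand-in for the parity symbols of the actual GC construction.
        return hashlib.sha256(s.encode()).hexdigest()[:8]

    def guess_and_check(received: str, n: int, chk: str) -> set[str]:
        """Re-insert the n - len(received) deleted bits in every possible way
        and return all candidates consistent with the checksum."""
        k = n - len(received)
        candidates = set()
        for positions in combinations(range(n), k):  # guess deletion positions
            for bits in product("01", repeat=k):     # guess deleted bit values
                s = list(received)
                for pos, b in zip(positions, bits):  # positions are ascending
                    s.insert(pos, b)
                cand = "".join(s)
                if check(cand) == chk:               # keep consistent guesses
                    candidates.add(cand)
        return candidates

    x = "1011001"
    print(guess_and_check(x[:3] + x[4:], len(x), check(x)))  # {'1011001'}
    ```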

    Coding for Trace Reconstruction over Multiple Channels with Vanishing Deletion Probabilities

    Full text link
    Motivated by DNA-based storage applications, we study the problem of reconstructing a coded sequence from multiple traces. We consider the model where the traces are outputs of independent deletion channels, where each channel deletes each bit of the input codeword $\mathbf{x} \in \{0,1\}^n$ independently with probability $p$. We focus on the regime where the deletion probability $p \to 0$ as $n \to \infty$. Our main contribution is designing a novel code for trace reconstruction that allows reconstructing a coded sequence efficiently from a constant number of traces. We provide theoretical results on the performance of our code, in addition to simulation results where we compare the performance of our code to other reconstruction techniques in terms of the edit distance error.
    Comment: This is the full version of the short paper accepted at ISIT 202
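    Below is a small, self-contained simulation of the channel model described in the abstract (i.i.d. deletions with probability $p$ per bit, independent traces), together with the edit-distance error metric. The reconstruction code itself is the paper's contribution and is not reproduced here; this only sets up the model and the metric.

    ```python
    import random

    def deletion_channel(x: str, p: float) -> str:
        """Each bit of x is deleted independently with probability p."""
        return "".join(b for b in x if random.random() > p)

    def edit_distance(a: str, b: str) -> int:
        """Standard Levenshtein dynamic program."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                 # delete ca
                               cur[j - 1] + 1,              # insert cb
                               prev[j - 1] + (ca != cb)))   # substitute
            prev = cur
        return prev[-1]

    x = "".join(random.choice("01") for _ in range(100))
    traces = [deletion_channel(x, p=0.05) for _ in range(5)]
    print([edit_distance(x, t) for t in traces])
    ```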

    Fast and Straggler-Tolerant Distributed SGD with Reduced Computation Load

    Full text link
    In distributed machine learning, a central node outsources computationally expensive calculations to external worker nodes. The properties of optimization procedures like stochastic gradient descent (SGD) can be leveraged to mitigate the effect of unresponsive or slow workers, called stragglers, that otherwise degrade the benefit of outsourcing the computation. This can be done by waiting for only a subset of the workers to finish their computation at each iteration of the algorithm. Previous works proposed to adapt the number of workers to wait for as the algorithm evolves, in order to optimize the speed of convergence. In contrast, we model the communication and computation times using independent random variables. Under this model, we construct a novel scheme that adapts both the number of workers and the computation load throughout the run-time of the algorithm. Consequently, we improve the convergence speed of distributed SGD while significantly reducing the computation load, at the expense of a slight increase in communication load.
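    The order-statistic sketch below illustrates the timing model in this abstract under an assumed exponential distribution for both delays (the paper's actual distributions and adaptation scheme may differ): with $n$ workers, a per-worker computation load `load`, and the master waiting for the fastest $k$ responses, the per-iteration wall-clock time is the $k$-th order statistic of the workers' communication-plus-computation times.

    ```python
    import random

    def iteration_time(n: int, k: int, load: float,
                       comm_rate: float = 1.0, comp_rate: float = 1.0) -> float:
        """Wall-clock time of one iteration when the master waits for the
        fastest k of n workers; computation time scales with the load."""
        finish = sorted(random.expovariate(comm_rate)
                        + load * random.expovariate(comp_rate)
                        for _ in range(n))
        return finish[k - 1]  # the k-th fastest worker closes the iteration

    n, trials = 20, 10_000
    for k in (5, 10, 20):
        avg = sum(iteration_time(n, k, load=1.0) for _ in range(trials)) / trials
        print(f"k={k:2d}: average iteration time = {avg:.2f}")
    ```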

    Adaptive Distributed Stochastic Gradient Descent for Minimizing Delay in the Presence of Stragglers

    Full text link
    We consider the setting where a master wants to run a distributed stochastic gradient descent (SGD) algorithm on $n$ workers, each having a subset of the data. Distributed SGD may suffer from the effect of stragglers, i.e., slow or unresponsive workers who cause delays. One solution studied in the literature is to wait at each iteration for the responses of the fastest $k < n$ workers before updating the model, where $k$ is a fixed parameter. The choice of the value of $k$ presents a trade-off between the runtime (i.e., convergence rate) of SGD and the error of the model. Towards optimizing the error-runtime trade-off, we investigate distributed SGD with an adaptive $k$. We first design an adaptive policy for varying $k$ that optimizes this trade-off based on an upper bound, which we derive, on the error as a function of the wall-clock time. Then, we propose an algorithm for adaptive distributed SGD that is based on a statistical heuristic. We implement our algorithm and provide numerical simulations which confirm our intuition and theoretical analysis.
    Comment: Accepted to IEEE ICASSP 202
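    The toy experiment below is meant only to illustrate the error-runtime trade-off described above, not the paper's policy: on a one-dimensional quadratic $f(w) = w^2/2$, averaging the gradients of the fastest $k$ of $n$ workers reduces gradient noise by a factor of $k$ but costs the $k$-th order statistic of the (assumed exponential) response times, so a schedule that grows $k$ over time can improve on a fixed $k$.

    ```python
    import random

    def run(schedule, n=20, steps=200, lr=0.1, noise=1.0):
        """Distributed SGD on f(w) = w^2/2 with noisy per-worker gradients."""
        w, t = 10.0, 0.0
        for s in range(steps):
            k = schedule(s)
            times = sorted(random.expovariate(1.0) for _ in range(n))
            t += times[k - 1]  # wait for the fastest k workers
            g = sum(w + random.gauss(0, noise) for _ in range(k)) / k
            w -= lr * g        # gradient of w^2/2 is w
        return t, 0.5 * w * w  # total wall-clock time, final error f(w)

    random.seed(0)
    for name, sched in [("fixed k=2 ", lambda s: 2),
                        ("fixed k=20", lambda s: 20),
                        ("adaptive  ", lambda s: 2 if s < 100 else 20)]:
        t, err = run(sched)
        print(f"{name}: time = {t:7.1f}, error = {err:.5f}")
    ```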

    Guess & Check Codes for Deletions, Insertions, and Synchronization

    No full text