8 research outputs found

    Optimal Codes Detecting Deletions in Concatenated Binary Strings Applied to Trace Reconstruction

    Full text link
    Consider two or more strings $\mathbf{x}^1, \mathbf{x}^2, \ldots$ that are concatenated to form $\mathbf{x} = \langle \mathbf{x}^1, \mathbf{x}^2, \ldots \rangle$. Suppose that up to $\delta$ deletions occur in each of the concatenated strings. Since deletions alter the lengths of the strings, a fundamental question to ask is: how much redundancy do we need to introduce in $\mathbf{x}$ in order to recover the boundaries of $\mathbf{x}^1, \mathbf{x}^2, \ldots$? This boundary problem is equivalent to the problem of designing codes that can detect the exact number of deletions in each concatenated string. In this work, we answer the question above by first deriving converse results that give lower bounds on the redundancy of deletion-detecting codes. Then, we present a marker-based code construction whose redundancy is asymptotically optimal in $\delta$ among all families of deletion-detecting codes, and exactly optimal among all block-by-block decodable codes. To exemplify the usefulness of such deletion-detecting codes, we apply our code to trace reconstruction and design an efficient coded reconstruction scheme that requires a constant number of traces.
    Comment: Accepted for publication in the IEEE Transactions on Information Theory. arXiv admin note: substantial text overlap with arXiv:2207.05126, arXiv:2105.0021
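    The sketch below illustrates only the equivalence stated in the abstract, not the paper's marker-based construction: once a deletion-detecting code reports the exact number of deletions $d_i$ in each concatenated block (originally $n$ bits each), the boundaries follow immediately, since block $i$ occupies $n - d_i$ bits of the received string. The function name and the assumption that `deletion_counts` comes from such a decoder are illustrative.

    ```python
    def recover_boundaries(received: str, n: int, deletion_counts: list[int]) -> list[str]:
        """Split a received string into blocks, given the exact number of
        deletions detected in each originally n-bit block."""
        blocks, pos = [], 0
        for d in deletion_counts:
            blocks.append(received[pos:pos + n - d])
            pos += n - d
        assert pos == len(received), "counts must account for every deletion"
        return blocks

    # Two 8-bit blocks: 1 deletion in the first, 2 in the second.
    received = "0110101" + "010011"
    print(recover_boundaries(received, 8, [1, 2]))  # ['0110101', '010011']
    ```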

    Guess & Check Codes for Deletions and Synchronization

    Full text link
    We consider the problem of constructing codes that can correct $\delta$ deletions occurring in an arbitrary binary string of length $n$ bits. Varshamov-Tenengolts (VT) codes can correct all possible single deletions ($\delta = 1$) with an asymptotically optimal redundancy. Finding similar codes for $\delta \geq 2$ deletions is an open problem. We propose a new family of codes, which we call Guess & Check (GC) codes, that can correct, with high probability, a constant number of deletions $\delta$ occurring at uniformly random positions within an arbitrary string. GC codes are based on MDS codes and have an asymptotically optimal redundancy that is $\Theta(\delta \log n)$. We provide deterministic polynomial-time encoding and decoding schemes for these codes. We also describe the applications of GC codes to file synchronization.
    Comment: Accepted in ISIT 201
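    As a toy illustration of the guess-and-check principle (and not the paper's construction, which uses MDS-code parities over chunks and runs in polynomial time), the brute-force sketch below guesses every way the missing bits could be re-inserted and keeps the candidates that pass a check; a truncated hash stands in for the parity symbols.

    ```python
    import hashlib
    from itertools import combinations, product

    def check(s: str) -> str:
        # Stand-in for the parity symbols of the actual GC construction.
        return hashlib.sha256(s.encode()).hexdigest()[:8]

    def guess_and_check(received: str, n: int, chk: str) -> set[str]:
        """Re-insert the n - len(received) deleted bits in every possible way
        and return all candidates consistent with the checksum."""
        k = n - len(received)
        candidates = set()
        for positions in combinations(range(n), k):  # guess deletion positions
            for bits in product("01", repeat=k):     # guess deleted bit values
                s = list(received)
                for pos, b in zip(positions, bits):  # positions are ascending
                    s.insert(pos, b)
                cand = "".join(s)
                if check(cand) == chk:               # keep consistent guesses
                    candidates.add(cand)
        return candidates

    x = "1011001"
    print(guess_and_check(x[:3] + x[4:], len(x), check(x)))  # {'1011001'}
    ```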

    Coding for Trace Reconstruction over Multiple Channels with Vanishing Deletion Probabilities

    Full text link
    Motivated by DNA-based storage applications, we study the problem of reconstructing a coded sequence from multiple traces. We consider the model where the traces are outputs of independent deletion channels, where each channel deletes each bit of the input codeword $\mathbf{x} \in \{0,1\}^n$ independently with probability $p$. We focus on the regime where the deletion probability $p \to 0$ as $n \to \infty$. Our main contribution is designing a novel code for trace reconstruction that allows reconstructing a coded sequence efficiently from a constant number of traces. We provide theoretical results on the performance of our code, in addition to simulation results where we compare the performance of our code to other reconstruction techniques in terms of the edit distance error.
    Comment: This is the full version of the short paper accepted at ISIT 202
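    Below is a small, self-contained simulation of the channel model described in the abstract (i.i.d. deletions with probability $p$ per bit, independent traces), together with the edit-distance error metric. The reconstruction code itself is the paper's contribution and is not reproduced here; this only sets up the model and the metric.

    ```python
    import random

    def deletion_channel(x: str, p: float) -> str:
        """Each bit of x is deleted independently with probability p."""
        return "".join(b for b in x if random.random() > p)

    def edit_distance(a: str, b: str) -> int:
        """Standard Levenshtein dynamic program."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                 # delete ca
                               cur[j - 1] + 1,              # insert cb
                               prev[j - 1] + (ca != cb)))   # substitute
            prev = cur
        return prev[-1]

    x = "".join(random.choice("01") for _ in range(100))
    traces = [deletion_channel(x, p=0.05) for _ in range(5)]
    print([edit_distance(x, t) for t in traces])
    ```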

    Fast and Straggler-Tolerant Distributed SGD with Reduced Computation Load

    Full text link
    In distributed machine learning, a central node outsources computationally expensive calculations to external worker nodes. The properties of optimization procedures like stochastic gradient descent (SGD) can be leveraged to mitigate the effect of unresponsive or slow workers, called stragglers, that otherwise degrade the benefit of outsourcing the computation. This can be done by waiting for only a subset of the workers to finish their computation at each iteration of the algorithm. Previous works proposed to adapt the number of workers to wait for as the algorithm evolves, in order to optimize the speed of convergence. In contrast, we model the communication and computation times using independent random variables. Under this model, we construct a novel scheme that adapts both the number of workers and the computation load throughout the run-time of the algorithm. Consequently, we improve the convergence speed of distributed SGD while significantly reducing the computation load, at the expense of a slight increase in communication load.
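    The order-statistic sketch below illustrates the timing model in this abstract under an assumed exponential distribution for both delays (the paper's actual distributions and adaptation scheme may differ): with $n$ workers, a per-worker computation load `load`, and the master waiting for the fastest $k$ responses, the per-iteration wall-clock time is the $k$-th order statistic of the workers' communication-plus-computation times.

    ```python
    import random

    def iteration_time(n: int, k: int, load: float,
                       comm_rate: float = 1.0, comp_rate: float = 1.0) -> float:
        """Wall-clock time of one iteration when the master waits for the
        fastest k of n workers; computation time scales with the load."""
        finish = sorted(random.expovariate(comm_rate)
                        + load * random.expovariate(comp_rate)
                        for _ in range(n))
        return finish[k - 1]  # the k-th fastest worker closes the iteration

    n, trials = 20, 10_000
    for k in (5, 10, 20):
        avg = sum(iteration_time(n, k, load=1.0) for _ in range(trials)) / trials
        print(f"k={k:2d}: average iteration time = {avg:.2f}")
    ```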

    Adaptive Distributed Stochastic Gradient Descent for Minimizing Delay in the Presence of Stragglers

    Full text link
    We consider the setting where a master wants to run a distributed stochastic gradient descent (SGD) algorithm on $n$ workers, each having a subset of the data. Distributed SGD may suffer from the effect of stragglers, i.e., slow or unresponsive workers who cause delays. One solution studied in the literature is to wait at each iteration for the responses of the fastest $k < n$ workers before updating the model, where $k$ is a fixed parameter. The choice of the value of $k$ presents a trade-off between the runtime (i.e., convergence rate) of SGD and the error of the model. Towards optimizing the error-runtime trade-off, we investigate distributed SGD with an adaptive $k$. We first design an adaptive policy for varying $k$ that optimizes this trade-off based on an upper bound, which we derive, on the error as a function of the wall-clock time. Then, we propose an algorithm for adaptive distributed SGD that is based on a statistical heuristic. We implement our algorithm and provide numerical simulations which confirm our intuition and theoretical analysis.
    Comment: Accepted to IEEE ICASSP 202
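    The toy experiment below is meant only to illustrate the error-runtime trade-off described above, not the paper's policy: on a one-dimensional quadratic $f(w) = w^2/2$, averaging the gradients of the fastest $k$ of $n$ workers reduces gradient noise by a factor of $k$ but costs the $k$-th order statistic of the (assumed exponential) response times, so a schedule that grows $k$ over time can improve on a fixed $k$.

    ```python
    import random

    def run(schedule, n=20, steps=200, lr=0.1, noise=1.0):
        """Distributed SGD on f(w) = w^2/2 with noisy per-worker gradients."""
        w, t = 10.0, 0.0
        for s in range(steps):
            k = schedule(s)
            times = sorted(random.expovariate(1.0) for _ in range(n))
            t += times[k - 1]  # wait for the fastest k workers
            g = sum(w + random.gauss(0, noise) for _ in range(k)) / k
            w -= lr * g        # gradient of w^2/2 is w
        return t, 0.5 * w * w  # total wall-clock time, final error f(w)

    random.seed(0)
    for name, sched in [("fixed k=2 ", lambda s: 2),
                        ("fixed k=20", lambda s: 20),
                        ("adaptive  ", lambda s: 2 if s < 100 else 20)]:
        t, err = run(sched)
        print(f"{name}: time = {t:7.1f}, error = {err:.5f}")
    ```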

    Guess & Check Codes for Deletions, Insertions, and Synchronization

    No full text