
### Synchronization Strings: Codes for Insertions and Deletions Approaching the Singleton Bound

We introduce synchronization strings as a novel way of efficiently dealing
with synchronization errors, i.e., insertions and deletions. Synchronization
errors are strictly more general and much harder to deal with than commonly
considered half-errors, i.e., symbol corruptions and erasures. For every
$\epsilon >0$, synchronization strings allow one to index a sequence with an
$\epsilon^{-O(1)}$ size alphabet such that one can efficiently transform $k$
synchronization errors into $(1+\epsilon)k$ half-errors. This powerful new
technique has many applications. In this paper, we focus on designing insdel
codes, i.e., error-correcting block codes (ECCs) for insertion-deletion
channels.
While ECCs for both half-errors and synchronization errors have been
intensely studied, the latter has largely resisted progress. Indeed, it took
until 1999 for the first insdel codes with constant rate, constant distance,
and constant alphabet size to be constructed by Schulman and Zuckerman. Insdel
codes for asymptotically large or small noise rates were given in 2016 by
Guruswami et al., but these codes are still polynomially far from the optimal
rate-distance tradeoff. This makes the understanding of insdel codes up to this
work equivalent to what was known for regular ECCs after Forney introduced
concatenated codes in his doctoral thesis 50 years ago.
A direct application of our synchronization strings based indexing method
gives a simple black-box construction which transforms any ECC into an equally
efficient insdel code with a slightly larger alphabet size. This instantly
transfers much of the highly developed understanding for regular ECCs over
large constant alphabets into the realm of insdel codes. Most notably, we
obtain efficient insdel codes which get arbitrarily close to the optimal
rate-distance tradeoff given by the Singleton bound for the complete noise
spectrum.
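The indexing operation at the heart of this black-box transform is easy to state: each codeword symbol is paired with the corresponding symbol of a fixed indexing string, enlarging the alphabet to a product alphabet. A minimal sketch, in which the index string below is only an illustrative placeholder and not an actual synchronization string (whose construction is the technical content of the paper):

```python
# Index a codeword symbol-by-symbol with a fixed indexing string.
# A real synchronization string has special combinatorial properties;
# the string used here is only an illustrative placeholder over a
# constant-size alphabet.
def index_with(codeword, sync):
    assert len(codeword) == len(sync)
    return list(zip(codeword, sync))   # symbols over the product alphabet

codeword = "HELLOWORLD"                # a codeword of the underlying ECC
sync = "0120120120"                    # placeholder, NOT a real sync string
indexed = index_with(codeword, sync)
# Conceptually, the receiver uses the second coordinates to re-synchronize
# positions after insertions/deletions and then hands the first coordinates
# to the decoder of the underlying ECC as symbols with half-errors.
```

The point of the transform is that all the difficulty of synchronization is absorbed by the index string, so any off-the-shelf ECC over a large constant alphabet can be reused unchanged.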

### Interactive Channel Capacity Revisited

We provide the first capacity approaching coding schemes that robustly
simulate any interactive protocol over an adversarial channel that corrupts any
$\epsilon$ fraction of the transmitted symbols. Our coding schemes achieve a
communication rate of $1 - O(\sqrt{\epsilon \log \log 1/\epsilon})$ over any
adversarial channel. This can be improved to $1 - O(\sqrt{\epsilon})$ for
random, oblivious, and computationally bounded channels, or if parties have
shared randomness unknown to the channel.
Surprisingly, these rates exceed the $1 - \Omega(\sqrt{H(\epsilon)}) = 1 -
\Omega(\sqrt{\epsilon \log 1/\epsilon})$ interactive channel capacity bound
which [Kol and Raz; STOC'13] recently proved for random errors. We conjecture
$1 - \Theta(\sqrt{\epsilon \log \log 1/\epsilon})$ and $1 -
\Theta(\sqrt{\epsilon})$ to be the optimal rates for their respective settings
and therefore to capture the interactive channel capacity for random and
adversarial errors.
In addition to being very communication efficient, our randomized coding
schemes have multiple other advantages. They are computationally efficient,
extremely natural, and significantly simpler than prior (non-capacity
approaching) schemes. In particular, our protocols do not employ any coding but
allow the original protocol to be performed as-is, interspersed only by short
exchanges of hash values. When hash values do not match, the parties backtrack.
Our approach is, as we feel, by far the simplest and most natural explanation
for why and how robust interactive communication in a noisy environment is
possible.
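The hash-and-backtrack mechanism can be illustrated with a toy one-way simulation. All parameters below are illustrative, and the hash exchange itself is assumed noiseless, which sidesteps the issues the actual coding schemes must handle:

```python
import hashlib
import random

def digest(seq):
    """Short hash standing in for the exchanged hash values."""
    return hashlib.sha256(bytes(seq)).digest()[:4]

random.seed(0)
EPS = 0.05                      # fraction of corrupted symbols (illustrative)
CHUNK = 8                       # symbols between hash exchanges (illustrative)
msg = list(range(64))           # the "protocol transcript" to transmit

received = []                   # receiver's transcript
checkpoint = 0                  # length of the last hash-verified prefix
while checkpoint < len(msg):
    end = min(checkpoint + CHUNK, len(msg))
    for s in msg[checkpoint:end]:           # run the protocol as-is ...
        noisy = s if random.random() > EPS else random.randrange(256)
        received.append(noisy)              # ... over the noisy channel
    # short hash exchange (assumed noiseless in this toy sketch)
    if digest(received) == digest(msg[:end]):
        checkpoint = end                    # prefix verified, move on
    else:
        received = received[:checkpoint]    # mismatch: backtrack and retry
```

With high probability the loop terminates with the receiver holding the correct transcript; the rate cost is governed by how often hashes are exchanged and how far the parties must backtrack, which is what the paper's parameter choices optimize.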

### Synchronization Strings: Explicit Constructions, Local Decoding, and Applications

This paper gives new results for synchronization strings, a powerful
combinatorial object that allows one to efficiently deal with insertions and
deletions in various communication settings:
$\bullet$ We give a deterministic, linear time synchronization string
construction, improving over an $O(n^5)$ time randomized construction.
Independently of this work, a deterministic $O(n\log^2\log n)$ time
construction was just put on arXiv by Cheng, Li, and Wu. We also give a
deterministic linear time construction of an infinite synchronization string,
which was not known to be computable before. Both constructions are highly
explicit, i.e., the $i^{th}$ symbol can be computed in $O(\log i)$ time.
$\bullet$ This paper also introduces a generalized notion we call
long-distance synchronization strings that allow for local and very fast
decoding. In particular, only $O(\log^3 n)$ time and access to logarithmically
many symbols is required to decode any index.
We give several applications for these results:
$\bullet$ For any $\delta, \epsilon > 0$ we provide an insdel correcting
code with rate $1-\delta-\epsilon$ which can correct any $O(\delta)$ fraction
of insdel errors in $O(n\log^3n)$ time. This near linear computational
efficiency is surprising given that we do not even know how to compute the
(edit) distance between the decoding input and output in sub-quadratic time. We
show that such codes can efficiently recover not only from a $\delta$ fraction
of insdel errors but, similarly to [Schulman, Zuckerman; TransInf'99], also
from any $O(\delta/\log n)$ fraction of block transpositions and replications.
$\bullet$ We show that high explicitness and local decoding allow for
infinite channel simulations with exponentially smaller memory and decoding
time requirements. These simulations can be used to give the first near linear
time interactive coding scheme for insdel errors.

### Near-Linear Time Insertion-Deletion Codes and (1+$\varepsilon$)-Approximating Edit Distance via Indexing

We introduce fast-decodable indexing schemes for edit distance which can be
used to speed up edit distance computations to near-linear time if one of the
strings is indexed by an indexing string $I$. In particular, for every length
$n$ and every $\varepsilon >0$, one can in near linear time construct a string
$I \in \Sigma'^n$ with $|\Sigma'| = O_{\varepsilon}(1)$, such that, indexing
any string $S \in \Sigma^n$, symbol-by-symbol, with $I$ results in a string $S'
\in \Sigma''^n$ where $\Sigma'' = \Sigma \times \Sigma'$ for which edit
distance computations are easy, i.e., one can compute a
$(1+\varepsilon)$-approximation of the edit distance between $S'$ and any other
string in $O(n \text{poly}(\log n))$ time.
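For contrast, exact edit distance is classically computed with a quadratic-time dynamic program, which is precisely the cost that the indexing-based $(1+\varepsilon)$-approximation circumvents. A standard implementation:

```python
def edit_distance(a, b):
    """Textbook O(|a|*|b|) dynamic program for exact edit distance
    (unit-cost insertions, deletions, and substitutions)."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))              # distances for the empty prefix of a
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + 1,      # delete a[i-1]
                         cur[j - 1] + 1,   # insert b[j-1]
                         prev[j - 1] + (a[i - 1] != b[j - 1]))  # substitute
        prev = cur
    return prev[n]
```

No sub-quadratic exact algorithm is known, which is why a near-linear time $(1+\varepsilon)$-approximation for indexed strings is significant.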
Our indexing schemes can be used to improve the decoding complexity of
state-of-the-art error correcting codes for insertions and deletions. In
particular, they lead to near-linear time decoding algorithms for the
insertion-deletion codes of [Haeupler, Shahrasbi; STOC '17] and faster decoding
algorithms for list-decodable insertion-deletion codes of [Haeupler, Shahrasbi,
Sudan; ICALP '18]. Interestingly, the latter codes are a crucial ingredient in
the construction of fast-decodable indexing schemes.

### Lower Bounds on the van der Waerden Numbers: Randomized- and Deterministic-Constructive

The van der Waerden number W(k,2) is the smallest integer n such that every
2-coloring of 1 to n has a monochromatic arithmetic progression of length k.
The existence of such an n for any k is due to van der Waerden but known upper
bounds on W(k,2) are enormous. Much effort was put into developing lower bounds
on W(k,2). Most of these lower bound proofs employ the probabilistic method,
often in combination with the Lov\'asz Local Lemma. While these proofs show the
existence of a 2-coloring that has no monochromatic arithmetic progression of
length k, they provide no efficient algorithm to find such a coloring. Proofs
of this kind are often informally called nonconstructive, in contrast to
constructive proofs that provide an efficient algorithm.
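The definition is easy to check computationally for small k. A brute-force test for monochromatic arithmetic progressions, verifying the well-known value W(3,2) = 9 (the length-8 coloring below is a standard witness):

```python
from itertools import product

def has_mono_ap(coloring, k):
    """Return True iff the given 2-coloring (a 0/1 list, zero-indexed)
    contains a monochromatic k-term arithmetic progression."""
    n = len(coloring)
    for start in range(n):
        for step in range(1, (n - 1 - start) // (k - 1) + 1):
            terms = range(start, start + (k - 1) * step + 1, step)
            if len({coloring[i] for i in terms}) == 1:
                return True
    return False

# W(3,2) = 9: the coloring RRBBRRBB of {1,...,8} avoids a monochromatic
# 3-term AP, while every 2-coloring of {1,...,9} contains one.
assert not has_mono_ap([0, 0, 1, 1, 0, 0, 1, 1], 3)
assert all(has_mono_ap(list(c), 3) for c in product([0, 1], repeat=9))
```

Exhaustive search like this is hopeless for larger k, which is exactly why constructive lower bound proofs, i.e., efficient coloring algorithms, are of interest.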
This paper clarifies these notions and gives definitions for deterministic-
and randomized-constructive proofs as different types of constructive proofs.
We then survey the literature on lower bounds on W(k,2) in this light. We show
how known nonconstructive lower bound proofs based on the Lov\'asz Local Lemma
can be made randomized-constructive using the recent algorithms of Moser and
Tardos. We also use a derandomization of Chandrasekaran, Goyal and Haeupler to
transform these proofs into deterministic-constructive proofs. We provide
greatly simplified and fully self-contained proofs and descriptions for these
algorithms.

### Optimal Gossip with Direct Addressing

Gossip algorithms spread information by having nodes repeatedly forward
information to a few random contacts. By their very nature, gossip algorithms
tend to be distributed and fault tolerant. If done right, they can also be fast
and message-efficient. A common model for gossip communication is the random
phone call model, in which in each synchronous round each node can PUSH or PULL
information to or from a random other node. For example, Karp et al. [FOCS
2000] gave algorithms in this model that spread a message to all nodes in
$\Theta(\log n)$ rounds while sending only $O(\log \log n)$ messages per node
on average.
Recently, Avin and Els\"asser [DISC 2013] studied the random phone call
model with the natural and commonly used assumption of direct addressing.
Direct addressing allows nodes to directly contact nodes whose ID (e.g., IP
address) was learned before. They show that in this setting, one can "break the
$\log n$ barrier" and achieve a gossip algorithm running in $O(\sqrt{\log n})$
rounds, albeit while using $O(\sqrt{\log n})$ messages per node.
We study the same model and give a simple gossip algorithm which spreads a
message in only $O(\log \log n)$ rounds. We also prove a matching $\Omega(\log
\log n)$ lower bound which shows that this running time is best possible. In
particular, we show that any gossip algorithm takes with high probability at
least $0.99 \log \log n$ rounds to terminate. Lastly, our algorithm can be
tweaked to send only $O(1)$ messages per node on average with only $O(\log n)$
bits per message. Our algorithm therefore simultaneously achieves the optimal
round-, message-, and bit-complexity for this setting. Like all prior gossip
algorithms, ours is also robust against failures: if an oblivious adversary
fails any $F$ nodes at the beginning, our algorithm still, with high
probability, informs all but $o(F)$ of the surviving nodes.
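A simulation of plain PUSH-PULL in the random phone call model (without the direct addressing that enables the $O(\log \log n)$ algorithm) illustrates the baseline behavior the paper improves upon; the parameters here are illustrative:

```python
import random

def push_pull_rounds(n, seed=0):
    """Rounds until a rumor started at node 0 reaches all n nodes when,
    in each synchronous round, every node calls one uniformly random node
    and the rumor crosses the call in either direction (PUSH or PULL)."""
    rng = random.Random(seed)
    informed = [False] * n
    informed[0] = True
    rounds = 0
    while not all(informed):
        start = informed[:]             # state at the start of the round
        for u in range(n):
            v = rng.randrange(n)        # the random phone call
            if start[u] or start[v]:
                informed[u] = informed[v] = True
        rounds += 1
    return rounds
```

Plain PUSH-PULL completes in $O(\log n)$ rounds with high probability; direct addressing, where a node may later call back any ID it has already learned, is what allows the paper's algorithm to beat this bound.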

- …