Exact Reconstruction from Insertions in Synchronization Codes
This work studies data reconstruction, an area with numerous applications. In
particular, we examine the reconstruction of binary and non-binary sequences
drawn from synchronization (insertion/deletion-correcting) codes. Each
sequence is corrupted by a fixed number of symbol insertions (larger than the
minimum edit distance of the code), yielding a number of distinct traces to be
used for reconstruction. We ask for the minimum number of traces needed for
exact reconstruction. This generalizes a problem tackled by Levenshtein for
uncoded sequences.
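As a rough illustration of this setting, the Python sketch below brute-forces the set of traces produced by exactly t insertions and the largest overlap between two such sets. The function names (insertion_ball, ball_size_formula, max_common_supersequences) are hypothetical and not taken from the paper. The exhaustive search is only feasible for very short sequences and stands in for, rather than reproduces, the paper's closed-form results.

```python
from itertools import combinations
from math import comb


def insertion_ball(x, t, alphabet):
    """Return every sequence obtainable from x by exactly t symbol insertions."""
    current = {x}
    for _ in range(t):
        # Insert each alphabet symbol at each possible position.
        current = {
            s[:i] + a + s[i:]
            for s in current
            for i in range(len(s) + 1)
            for a in alphabet
        }
    return current


def ball_size_formula(n, t, q):
    """Standard count of length-(n+t) supersequences of a length-n q-ary sequence.

    The count does not depend on the particular sequence.
    """
    return sum(comb(n + t, i) * (q - 1) ** i for i in range(t + 1))


def max_common_supersequences(words, t, alphabet):
    """Largest number of t-insertion traces shared by two distinct words (brute force)."""
    balls = {w: insertion_ball(w, t, alphabet) for w in words}
    return max(len(balls[u] & balls[v]) for u, v in combinations(words, 2))


if __name__ == "__main__":
    uncoded = ["00", "01", "10", "11"]  # all binary sequences of length 2
    worst = max_common_supersequences(uncoded, 1, "01")
    print("largest overlap:", worst)
    print("distinct traces that always suffice:", worst + 1)
```

If no two candidate sequences share more than N traces, then N + 1 distinct traces always identify the transmitted sequence, because they cannot all lie in the overlap of two candidates. Restricting the candidate list to the codewords of a code can only shrink that overlap.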
We introduce an exact formula for the maximum number of common supersequences
shared by sequences at a certain edit distance, yielding an upper bound on the
number of distinct traces necessary to guarantee exact reconstruction. Without
specific knowledge of the codewords, this upper bound is tight. We apply our
results to the famous single deletion/insertion-correcting Varshamov-Tenengolts
(VT) codes and show that a significant number of VT codeword pairs achieve the
worst-case number of outputs needed for exact reconstruction. We also consider
extensions to other channels, such as adversarial deletion and
insertion/deletion channels and probabilistic channels.
Comment: 18 pages, 3 figures. Accepted to IEEE Transactions on Information
Theor
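To make the VT-code setting concrete, here is a minimal Python sketch that enumerates a binary VT code from its usual checksum definition (the sum of i * x_i, with positions counted from 1, is congruent to a modulo n + 1) and counts the traces shared by each codeword pair. The helper names (vt_code, insertion_ball, pair_overlaps) are hypothetical, and the brute-force enumeration is an assumption made for illustration, not the method used in the paper.

```python
from itertools import combinations, product


def vt_code(n, a=0):
    """Binary VT code VT_a(n): words whose weighted checksum is a mod (n + 1)."""
    words = []
    for bits in product("01", repeat=n):
        # Positions are 1-indexed in the checksum.
        checksum = sum(i for i, b in enumerate(bits, start=1) if b == "1")
        if checksum % (n + 1) == a % (n + 1):
            words.append("".join(bits))
    return words


def insertion_ball(x, t, alphabet="01"):
    """Return every sequence obtainable from x by exactly t symbol insertions."""
    current = {x}
    for _ in range(t):
        current = {
            s[:i] + c + s[i:]
            for s in current
            for i in range(len(s) + 1)
            for c in alphabet
        }
    return current


def pair_overlaps(code, t):
    """Map each codeword pair to the number of t-insertion traces they share."""
    balls = {w: insertion_ball(w, t) for w in code}
    return {(u, v): len(balls[u] & balls[v]) for u, v in combinations(code, 2)}


if __name__ == "__main__":
    code = vt_code(4)
    print(code)
    # One insertion: the overlaps are all zero, so a single trace suffices.
    print(pair_overlaps(code, 1))
    # Two insertions exceed what the code corrects, so pairs may share traces.
    print(pair_overlaps(code, 2))
```

With one insertion every pair overlap is zero, which reflects the single-insertion-correcting property. With two insertions, the largest overlap plus one gives the number of distinct traces that always suffices for this small code.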