Guess & Check Codes for Deletions and Synchronization
We consider the problem of constructing codes that can correct deletions
occurring in an arbitrary binary string of n bits. Varshamov-Tenengolts (VT)
codes can correct all possible single deletions with an asymptotically optimal
redundancy. Finding similar codes for multiple deletions is an open problem.
We propose a new family of codes, which we call Guess & Check (GC) codes, that
can correct, with high probability, a constant number of deletions occurring at
uniformly random positions within an arbitrary string. GC codes are based on
MDS codes and have an asymptotically optimal redundancy. We provide
deterministic polynomial-time encoding and decoding schemes for these codes.
We also describe applications of GC codes to file synchronization.
Comment: Accepted in ISIT 201
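For context on the single-deletion case the abstract mentions: a VT_a(n) code is the set of binary strings of length n whose weighted checksum sum(i * x_i) is congruent to a mod (n+1), and Levenshtein's decoding rule recovers a single deleted bit from that checksum. A minimal sketch (function names are ours, not from the paper):

```python
def vt_syndrome(bits):
    """Weighted VT checksum: sum of i * bits[i-1] over 1-based positions i."""
    return sum(i * b for i, b in enumerate(bits, start=1))

def vt_decode(y, n, a):
    """Recover the VT_a(n) codeword from y, a copy with one bit deleted.

    Levenshtein's rule: let w = weight(y) and
    delta = (a - syndrome(y)) mod (n + 1).
    If delta <= w, a 0 was deleted with exactly delta ones to its right;
    otherwise a 1 was deleted with exactly delta - w - 1 zeros to its left.
    """
    y = list(y)
    w = sum(y)
    delta = (a - vt_syndrome(y)) % (n + 1)
    if delta <= w:
        # Reinsert a 0 so that exactly `delta` ones lie to its right.
        i, ones_right = len(y), 0
        while ones_right < delta:
            i -= 1
            ones_right += y[i]
        return y[:i] + [0] + y[i:]
    # Reinsert a 1 so that exactly delta - w - 1 zeros lie to its left.
    i, zeros_left = 0, 0
    while zeros_left < delta - w - 1:
        zeros_left += 1 - y[i]
        i += 1
    return y[:i] + [1] + y[i:]

# Example: x = 0110 is in VT_0(4) since 2 + 3 = 5 ≡ 0 (mod 5).
# Deleting its second bit gives y = 010; the decoder restores x.
print(vt_decode([0, 1, 0], n=4, a=0))  # [0, 1, 1, 0]
```

The decoder never searches over codewords; the checksum deficit alone pins down the deleted bit's value and (up to an equivalent position inside a run) its location, which is why the redundancy is a single value mod (n+1), i.e. about log2(n+1) bits.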
Deletion codes in the high-noise and high-rate regimes
The noise model of deletions poses significant challenges in coding theory,
with basic questions like the capacity of the binary deletion channel still
being open. In this paper, we study the harder model of worst-case deletions,
with a focus on constructing efficiently decodable codes for the two extreme
regimes of high-noise and high-rate. Specifically, we construct polynomial-time
decodable codes with the following trade-offs (for any eps > 0):
(1) Codes that can correct a fraction 1-eps of deletions with rate poly(eps)
over an alphabet of size poly(1/eps);
(2) Binary codes of rate 1-O~(sqrt(eps)) that can correct a fraction eps of
deletions; and
(3) Binary codes that can be list decoded from a fraction (1/2-eps) of
deletions with rate poly(eps).
Our work is the first to achieve the qualitative goals of correcting a
deletion fraction approaching 1 over bounded alphabets, and correcting a
constant fraction of bit deletions with rate approaching 1. The above results
bring our understanding of deletion code constructions in these regimes to a
level similar to that for worst-case errors.
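None of the constructions above are reproduced here, but the decoding task itself can be stated concretely: a received word y is obtainable from a codeword x by deletions if and only if y is a subsequence of x, so a (hopelessly inefficient) list decoder simply returns every codeword containing y as a subsequence. A brute-force sketch for intuition only; the toy codebook is illustrative, not one of the paper's constructions:

```python
def is_subsequence(y, x):
    """True iff y can be obtained from x by deleting symbols."""
    it = iter(x)
    # `ch in it` advances the iterator, so matches must occur in order.
    return all(ch in it for ch in y)

def list_decode_deletions(received, codebook):
    """All codewords that could have produced `received` via deletions."""
    return [c for c in codebook if is_subsequence(received, c)]

# Toy binary codebook (illustrative only).
code = ["000000", "111000", "010101", "111111"]
print(list_decode_deletions("1100", code))  # ['111000']
```

The combinatorial challenge the abstract refers to is precisely to design codebooks where this list is small (or a singleton) even after a large fraction of deletions, while keeping the rate high and the decoder polynomial time rather than a scan over all codewords.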
A Proof of Entropy Minimization for Outputs in Deletion Channels via Hidden Word Statistics
From the output produced by a memoryless deletion channel acting on a
uniformly random input of known length n, one obtains a posterior distribution
on the channel input. The difference between the Shannon entropy of this
distribution and that of the uniform prior measures the amount of information
about the channel input conveyed by the output, and it is natural to ask for
which outputs this quantity is extremized. This question was posed in a
previous work, where it was conjectured, on the basis of experimental data,
that the entropy of the posterior is minimized and maximized by the constant
strings (all zeros or all ones) and the alternating strings, respectively. In
the present work we confirm the minimization conjecture in the asymptotic
limit using results from hidden word statistics. We show how the
analytic-combinatorial methods of Flajolet, Szpankowski and Vallée for the
hidden pattern matching problem can be applied to resolve the case of fixed
output length as the input length tends to infinity, by obtaining estimates
for the entropy in terms of the moments of the posterior distribution and
establishing its minimization via a measure of autocorrelation.
Comment: 11 pages, 2 figures
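The posterior in question is easy to compute exactly for small lengths: under a uniform prior and an i.i.d. deletion channel, P(x | y) is proportional to the number of embeddings of y as a subsequence of x, because the per-embedding factor d^(n-m)(1-d)^m is the same for every input once the output length m is fixed. A small brute-force sketch (our own illustration, not the paper's analytic machinery):

```python
from itertools import product
from math import log2

def embedding_count(x, y):
    """Number of distinct ways y occurs as a subsequence of x (standard DP)."""
    # dp[j] = number of embeddings of y[:j] into the prefix of x scanned so far
    dp = [1] + [0] * len(y)
    for xi in x:
        for j in range(len(y) - 1, -1, -1):  # descending j: each xi used once
            if y[j] == xi:
                dp[j + 1] += dp[j]
    return dp[len(y)]

def posterior_entropy(y, n):
    """Entropy in bits of the posterior over uniform length-n inputs given y."""
    counts = [embedding_count(x, y) for x in product("01", repeat=n)]
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

# Compare a constant output with an alternating one for short inputs.
n = 6
print(posterior_entropy("000", n), posterior_entropy("010", n))
```

Running this for small n reproduces the pattern behind the conjecture: constant outputs concentrate the posterior (many embeddings into inputs rich in the repeated symbol), giving lower entropy than alternating outputs of the same length.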