2 research outputs found
File Updates Under Random/Arbitrary Insertions And Deletions
A client/encoder edits a file, as modeled by an insertion-deletion (InDel)
process. An old copy of the file is stored remotely at a data-centre/decoder,
and is also available to the client. We consider the problem of throughput- and
computationally-efficient communication from the client to the data-centre, to
enable the server to update its copy to the newly edited file. We study two
models for the source files/edit patterns: the random pre-edit sequence
left-to-right random InDel (RPES-LtRRID) process, and the arbitrary pre-edit
sequence arbitrary InDel (APES-AID) process. In both models, we consider the
regime in which the number of insertions/deletions is a small (but constant)
fraction of the original file. For both models we prove information-theoretic
lower bounds on the best possible compression rates that enable file updates.
Conversely, our compression algorithms use dynamic programming (DP) and entropy
coding, and achieve rates that are approximately optimal.
Comment: The paper is an extended version of our paper to appear at ITW
201
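As a rough illustration of the kind of dynamic programming involved, the sketch below computes the minimum number of single-symbol insertions and deletions needed to turn an old file into a new one. It is a textbook recurrence, not the algorithm from the paper; the function name `indel_distance` and the use of Python strings as files are assumptions made for this example.

```python
def indel_distance(old: str, new: str) -> int:
    """Minimum number of single-symbol insertions/deletions turning old into new.

    Illustrative only: a standard DP over prefixes, using O(len(new)) memory.
    """
    # prev[j] = distance between the current prefix of `old` and new[:j]
    prev = list(range(len(new) + 1))
    for i, a in enumerate(old, start=1):
        cur = [i]  # turning old[:i] into the empty string takes i deletions
        for j, b in enumerate(new, start=1):
            if a == b:
                cur.append(prev[j - 1])  # symbols match, no edit needed
            else:
                # delete a from old, or insert b into old
                cur.append(1 + min(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    print(indel_distance("abcde", "abde"))
```

Substitutions are deliberately not allowed here, which matches the InDel model: replacing one symbol costs one deletion plus one insertion.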
Codes for Synchronization in Channels and Sources with Edits
Edit channels are a class of communication channels where the output of the channel is
an edited version of the input. The edits are considered to be deletions and insertions.
DNA-based data storage systems are one of the motivations for this model. This thesis
studies various problems related to edit channels and to the edit synchronization problem.
Varshamov-Tenengolts (VT) codes are first introduced. These codes can correct a
single deletion or insertion and have a linear-time decoder. The problem of efficient
encoding of the non-binary version of VT codes is addressed, where a simple linear-time
encoding method to systematically map binary message sequences onto VT codewords
is proposed.
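For readers unfamiliar with VT codes, the following minimal sketch shows the standard binary construction and the classical linear-time correction of a single deletion. It covers only the binary case and is not the non-binary encoder proposed in the thesis; the names `vt_syndrome`, `vt_codewords`, and `decode_single_deletion` are assumptions made for this example, and the brute-force codeword enumeration is only meant for small lengths.

```python
from itertools import product


def vt_syndrome(bits):
    """Weighted checksum sum(i * x_i) mod (n + 1), with positions starting at 1."""
    return sum(i * b for i, b in enumerate(bits, start=1)) % (len(bits) + 1)


def vt_codewords(n, a=0):
    """All binary sequences of length n with syndrome a (brute force, small n only)."""
    return [list(x) for x in product((0, 1), repeat=n) if vt_syndrome(x) == a]


def decode_single_deletion(y, n, a=0):
    """Recover the length-n codeword of syndrome a from y, which lost one bit."""
    assert len(y) == n - 1
    w = sum(y)  # number of ones that survived
    # Amount by which the checksum dropped because of the deletion.
    s = (a - sum(i * b for i, b in enumerate(y, start=1))) % (n + 1)
    if s <= w:
        # A 0 was deleted: reinsert it with exactly s ones to its right.
        pos, ones = len(y), 0
        while ones < s:
            pos -= 1
            ones += y[pos]
        return y[:pos] + [0] + y[pos:]
    # A 1 was deleted: reinsert it with exactly s - w - 1 zeros to its left.
    pos, zeros = 0, 0
    while zeros < s - w - 1:
        zeros += 1 - y[pos]
        pos += 1
    return y[:pos] + [1] + y[pos:]


if __name__ == "__main__":
    codeword = vt_codewords(6)[3]
    received = codeword[:2] + codeword[3:]  # delete the third bit
    print(codeword, received, decode_single_deletion(received, 6))
```

The decoder makes one pass over the received sequence to compute the checksum and one more to find the insertion point, which is why decoding is linear in the codeword length.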
Another model that is studied is segmented edit channels, where we have the
additional assumption that the channel input sequence is implicitly divided into
segments such that at most one edit can occur within a segment. A code construction
is proposed for this model based on subsets of VT codes chosen with pre-determined
prefixes and/or suffixes. Also an upper bound is derived on the rate of any zero-error
code for the segmented edit channel in terms of the segment length. This upper bound
shows that the rate scaling of the proposed codes as the segment length increases is
the same as that of the maximal code.
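To make the channel model concrete, the sketch below simulates a segmented edit channel: the input is cut into consecutive segments of a fixed length, and each segment independently suffers at most one deletion or insertion. The function name `segmented_edit_channel`, the parameter `p_edit`, and the uniform choice between deletion and insertion are assumptions made for this example; the thesis does not prescribe a particular edit distribution.

```python
import random


def segmented_edit_channel(bits, seg_len, rng, p_edit=0.5):
    """Apply at most one edit (deletion or insertion) to each length-seg_len segment."""
    out = []
    for start in range(0, len(bits), seg_len):
        seg = list(bits[start:start + seg_len])
        if rng.random() < p_edit:
            if rng.random() < 0.5:
                # Delete one symbol at a uniformly chosen position.
                del seg[rng.randrange(len(seg))]
            else:
                # Insert a random bit at a uniformly chosen position.
                seg.insert(rng.randrange(len(seg) + 1), rng.randint(0, 1))
        out.extend(seg)
    return out


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    x = [0, 1] * 8
    print(segmented_edit_channel(x, seg_len=4, rng=rng))
```

The output is a single concatenated sequence, so a decoder sees no segment boundaries; recovering them is part of what a code for this channel has to handle.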
Edit synchronization is another problem studied in this thesis. In this model, there
are two remote nodes (encoder and decoder), each having a binary sequence. The
sequence X, available at the encoder, is the updated sequence and differs from Y
(available at the decoder) by a small number of edits. The goal is to construct a message
M, to be sent via a one-way error-free link, such that the decoder can reconstruct X
using M and Y. A coding scheme is devised for this one-way synchronization model.
The scheme is based on multiple layers of VT codes combined with off-the-shelf linear
error-correcting codes and uses a list decoder.
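The simplest instance of this one-way synchronization model can be sketched as follows, assuming Y is X with exactly one bit deleted. The encoder sends only the VT checksum of X as the message M, and the decoder tries every single-bit insertion into Y, keeping the candidates whose checksum matches. This is not the multi-layer scheme of the thesis, only the single-deletion special case; the names `vt_syndrome` and `reconstruct` are assumptions made for this example.

```python
def vt_syndrome(bits):
    """Weighted checksum sum(i * x_i) mod (n + 1), with positions starting at 1."""
    return sum(i * b for i, b in enumerate(bits, start=1)) % (len(bits) + 1)


def reconstruct(y, message):
    """Return every single-bit insertion into y whose VT checksum equals message."""
    y = tuple(y)
    candidates = set()
    for pos in range(len(y) + 1):
        for bit in (0, 1):
            cand = y[:pos] + (bit,) + y[pos:]
            if vt_syndrome(cand) == message:
                candidates.add(cand)
    return candidates


if __name__ == "__main__":
    x = (1, 0, 1, 1, 0, 0, 1, 0)   # updated sequence at the encoder
    y = x[:3] + x[4:]              # decoder's copy: x with one bit deleted
    m = vt_syndrome(x)             # message M, a value in 0..len(x)
    print(reconstruct(y, m))
```

Because sequences sharing a VT checksum cannot be confused after a single deletion, the candidate set contains exactly one sequence, and M takes only len(X) + 1 possible values. The brute-force search is quadratic and is used here for clarity rather than efficiency.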
Motivated by the sequence reconstruction problem from traces in DNA-based storage,
the problem of designing codes for the deletion channel when multiple observations
(or traces) are available to the decoder is considered. Simple binary and non-binary
codes are proposed that split the codeword into blocks and employ a VT code in each
block. The availability of multiple traces helps the decoder to identify deletion-free
copies of a block, and to avoid mis-synchronization while decoding. The encoding
complexity of the proposed scheme is linear in the codeword length; the decoding
complexity is linear in the codeword length and quadratic in the number of deletions
and the number of traces. The list decoding technique for the proposed code is also
considered.