File Updates Under Random/Arbitrary Insertions And Deletions

Authors: Cadambe Viveck; Jaggi Sidharth; Médard Muriel; Schwartz Moshe; Wang Qiwen
Publication date: 27/02/2015

A client/encoder edits a file, as modeled by an insertion-deletion (InDel) process. An old copy of the file is stored remotely at a data-centre/decoder, and is also available to the client. We consider the problem of throughput- and computationally-efficient communication from the client to the data-centre, to enable the data-centre to update its copy to the newly edited file. We study two models for the source files/edit patterns: the random pre-edit sequence left-to-right random InDel (RPES-LtRRID) process, and the arbitrary pre-edit sequence arbitrary InDel (APES-AID) process. In both models, we consider the regime in which the number of insertions/deletions is a small (but constant) fraction of the original file. For both models we prove information-theoretic lower bounds on the best possible compression rates that enable file updates. Conversely, our compression algorithms use dynamic programming (DP) and entropy coding, and achieve rates that are approximately optimal.

Comment: The paper is an extended version of our paper to appear at ITW 201
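To make the role of dynamic programming concrete, the following is a minimal sketch of how a client that holds both the old and the new file could compute a shortest insertion/deletion script with a longest-common-subsequence table. It is an illustration only and not the algorithm from the paper: the function names `indel_script` and `apply_script` and the operation format are assumptions introduced here, and the entropy-coding step that would compress the script is omitted.

```python
def indel_script(old, new):
    """Return a shortest list of insert/delete operations turning old into new.

    Operations are ("del", i) to drop old[i] and ("ins", i, c) to insert
    character c just before old[i]. Indices always refer to the OLD file.
    (Hypothetical format chosen for this sketch.)
    """
    n, m = len(old), len(new)
    # lcs[i][j] = length of the longest common subsequence of old[i:] and new[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if old[i] == new[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk the table from the top-left corner and emit operations.
    ops = []
    i = j = 0
    while i < n and j < m:
        if old[i] == new[j]:
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            ops.append(("del", i))
            i += 1
        else:
            ops.append(("ins", i, new[j]))
            j += 1
    ops.extend(("del", k) for k in range(i, n))   # leftover old symbols
    ops.extend(("ins", n, c) for c in new[j:])    # leftover new symbols
    return ops


def apply_script(old, ops):
    """Replay a script from indel_script on the old file (decoder side)."""
    out = []
    pos = 0
    for op in ops:
        idx = op[1]
        out.extend(old[pos:idx])   # copy the untouched stretch
        pos = idx
        if op[0] == "del":
            pos = idx + 1          # skip the deleted symbol
        else:
            out.append(op[2])      # emit the inserted symbol
    out.extend(old[pos:])
    return "".join(out)
```

In this sketch the script length equals `len(old) + len(new)` minus twice the LCS length, so a file with few edits yields a short script; the table itself costs quadratic time and memory, which is one reason a practical scheme would need something more refined.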

arXiv.org e-Print Archive

DSpace@MIT

Crossref


Codes for Synchronization in Channels and Sources with Edits

Author: Abroshan Mahed
Publication venue: University of Cambridge
Publication date: 22/07/2019

Edit channels are a class of communication channels where the output of the channel is an edited version of the input. The edits are considered to be deletions and insertions. DNA-based data storage systems are one of the motivations for this model. This thesis studies various problems related to edit channels and to the edit synchronization problem. Varshamov-Tenengolts (VT) codes are first introduced. These codes can correct a single deletion or insertion and have a linear-time decoder. The problem of efficient encoding of the non-binary version of VT codes is addressed, where a simple linear-time encoding method to systematically map binary message sequences onto VT codewords is proposed. Another model that is studied is segmented edit channels, where we have the additional assumption that the channel input sequence is implicitly divided into segments such that at most one edit can occur within a segment. A code construction is proposed for this model based on subsets of VT codes chosen with pre-determined prefixes and/or suffixes. Also, an upper bound is derived on the rate of any zero-error code for the segmented edit channel in terms of the segment length. This upper bound shows that the rate scaling of the proposed codes as the segment length increases is the same as that of the maximal code. Edit synchronization is another problem studied in this thesis. In this model, there are two remote nodes (encoder and decoder), each having a binary sequence. The sequence X, available at the encoder, is the updated sequence and differs from Y (available at the decoder) by a small number of edits. The goal is to construct a message M, to be sent via a one-way error-free link, such that the decoder can reconstruct X using M and Y. A coding scheme is devised for this one-way synchronization model. The scheme is based on multiple layers of VT codes combined with off-the-shelf linear error-correcting codes and uses a list decoder.
Motivated by the sequence reconstruction problem from traces in DNA-based storage, the problem of designing codes for the deletion channel when multiple observations (or traces) are available to the decoder is considered. Simple binary and non-binary codes are proposed that split the codeword into blocks and employ a VT code in each block. The availability of multiple traces helps the decoder to identify deletion-free copies of a block, and to avoid mis-synchronization while decoding. The encoding complexity of the proposed scheme is linear in the codeword length; the decoding complexity is linear in the codeword length and quadratic in the number of deletions and the number of traces. The list decoding technique for the proposed code is also considered.
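As background for the VT codes mentioned above, here is a minimal sketch of the standard binary construction: the code VT_a(n) consists of the length-n binary words whose weighted sum 1·x_1 + 2·x_2 + ... + n·x_n is congruent to a modulo n+1, and a single deletion can be undone in linear time with the classical decoding rule attributed to Levenshtein. The function names and the list-of-bits representation are assumptions made for this illustration; the sketch does not cover the thesis's non-binary encoder, segmented codes, or multi-trace scheme.

```python
def vt_syndrome(bits):
    """Weighted checksum sum(i * x_i) mod (n + 1), with positions i = 1..n."""
    n = len(bits)
    return sum(i * b for i, b in enumerate(bits, start=1)) % (n + 1)


def vt_decode_deletion(received, n, a=0):
    """Recover the codeword of VT_a(n) from `received`, which is assumed to be
    that codeword with exactly one bit deleted (or no bit deleted at all)."""
    received = list(received)
    if len(received) == n:
        return received
    if len(received) != n - 1:
        raise ValueError("expected a word of length n or n - 1")

    weight = sum(received)
    checksum = sum(i * b for i, b in enumerate(received, start=1))
    deficiency = (a - checksum) % (n + 1)

    if deficiency <= weight:
        # A 0 was deleted; it had exactly `deficiency` ones to its right.
        pos, ones = len(received), 0
        while ones < deficiency:
            pos -= 1
            ones += received[pos]
        return received[:pos] + [0] + received[pos:]

    # A 1 was deleted; it had exactly `deficiency - weight - 1` zeros to its left.
    zeros_needed = deficiency - weight - 1
    pos, zeros = 0, 0
    while zeros < zeros_needed:
        zeros += 1 - received[pos]
        pos += 1
    return received[:pos] + [1] + received[pos:]


codeword = [1, 0, 0, 0, 0, 1]          # 1 + 6 = 7, so it lies in VT_0(6)
print(vt_syndrome(codeword))           # 0
print(vt_decode_deletion([1, 0, 0, 0, 1], n=6))  # [1, 0, 0, 0, 0, 1]
```

The decoder does not need to find the exact deleted position: any position inside the same run of equal bits gives the same codeword, which is why counting ones to the right (or zeros to the left) is enough.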

Apollo (Cambridge)