File Updates Under Random/Arbitrary Insertions And Deletions
A client/encoder edits a file, as modeled by an insertion-deletion (InDel)
process. An old copy of the file is stored remotely at a data-centre/decoder,
and is also available to the client. We consider the problem of throughput- and
computationally-efficient communication from the client to the data-centre, to
enable the server to update its copy to the newly edited file. We study two
models for the source files/edit patterns: the random pre-edit sequence
left-to-right random InDel (RPES-LtRRID) process, and the arbitrary pre-edit
sequence arbitrary InDel (APES-AID) process. In both models, we consider the
regime in which the number of insertions/deletions is a small (but constant)
fraction of the original file. For both models we prove information-theoretic
lower bounds on the best possible compression rates that enable file updates.
Conversely, our compression algorithms use dynamic programming (DP) and entropy
coding, and achieve rates that are approximately optimal.
Comment: The paper is an extended version of our paper to appear at ITW 201
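As a rough illustration of the dynamic-programming ingredient mentioned in the abstract above, the sketch below computes the minimum number of single-symbol insertions and deletions needed to turn an old file into a new one. It is a textbook InDel-distance recurrence, not the paper's compression algorithm, and the function name `indel_distance` is an assumption made for this example.

```python
def indel_distance(old: str, new: str) -> int:
    """Minimum number of single-symbol insertions/deletions turning old into new.

    Textbook DP over prefixes (hypothetical helper, not taken from the paper).
    """
    # Row for the empty prefix of `old`: building new[:j] takes j insertions.
    prev = list(range(len(new) + 1))
    for i, a in enumerate(old, start=1):
        # Turning old[:i] into the empty string takes i deletions.
        curr = [i]
        for j, b in enumerate(new, start=1):
            if a == b:
                # Matching symbols cost nothing.
                curr.append(prev[j - 1])
            else:
                # Either delete `a` from old or insert `b` from new.
                curr.append(1 + min(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


print(indel_distance("kitten", "sitting"))  # 5
```

The table is kept one row at a time, so memory is linear in the length of the new file while time remains quadratic.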
Scalable string reconciliation by recursive content-dependent shingling
We consider the problem of reconciling similar strings in a distributed system. Specifically, we are interested in performing this reconciliation efficiently, minimizing the communication cost. Our problem applies to several types of large-scale distributed networks, file synchronization utilities, and any system that manages the consistency of string-encoded ordered data. We present the novel Recursive Content-Dependent Shingling (RCDS) protocol, which can handle large strings and has communication complexity that scales with the edit distance between the reconciling strings. We also provide analysis, experimental results, and comparisons of an implementation of our protocol against existing synchronization software such as the Rsync utility.
2019-12-03T00:00:00
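The abstract above does not spell out how RCDS partitions a string, so the following is only a generic content-defined chunking sketch, not the RCDS protocol. It shows the general idea that cut points chosen from local content stay aligned after an edit elsewhere in the string. The names `cut_points` and `chunks`, the CRC32 hash, and the `window` and `divisor` parameters are all assumptions made for illustration.

```python
import zlib


def cut_points(data: bytes, window: int = 4, divisor: int = 8) -> list:
    """End offsets where a cut is placed (generic sketch, not RCDS itself).

    A cut is placed at offset `end` when the CRC32 of the `window` bytes
    just before it is divisible by `divisor`, so the decision depends only
    on local content, not on absolute position.
    """
    cuts = []
    for end in range(window, len(data)):
        if zlib.crc32(data[end - window:end]) % divisor == 0:
            cuts.append(end)
    return cuts


def chunks(data: bytes, window: int = 4, divisor: int = 8) -> list:
    """Split `data` at the content-defined cut points."""
    bounds = [0] + cut_points(data, window, divisor) + [len(data)]
    return [data[a:b] for a, b in zip(bounds, bounds[1:])]
```

Because each cut depends only on the preceding `window` bytes, two strings that share a long suffix place the same cuts inside that suffix, and only chunks near the edit differ.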
Low-Complexity Interactive Algorithms for Synchronization From Deletions, Insertions, and Substitutions
Consider two remote nodes having binary sequences X and Y, respectively. Y
is an edited version of X, where the editing involves random deletions,
insertions, and substitutions, possibly in bursts. The goal is for the node
with Y to reconstruct X with minimal exchange of information over a
noiseless link. The communication is measured in terms of both the total number
of bits exchanged and the number of interactive rounds of communication.
This paper focuses on the setting where the number of edits is
o(n/log n), where n is the length of X. We first consider the
case where the edits are a mixture of insertions and deletions (indels), and
propose an interactive synchronization algorithm with near-optimal
communication rate and average computational complexity of O(n) arithmetic
operations. The algorithm uses interaction to efficiently split the source
sequence into substrings containing exactly one deletion or insertion. Each of
these substrings is then synchronized using an optimal one-way synchronization
code based on the single-deletion correcting channel codes of Varshamov and
Tenengolts (VT codes).
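To make the single-deletion step concrete, here is a minimal sketch of one-way recovery with a VT syndrome, assuming the sender holds the original binary substring and the receiver holds a copy with exactly one bit deleted. It uses the standard VT decoding rule and covers only the deletion case. The function names `vt_syndrome` and `vt_recover` are hypothetical, and the paper's full interactive algorithm is not reproduced here.

```python
def vt_syndrome(bits):
    """VT syndrome: sum of i * x_i over 1-based positions, modulo n + 1."""
    n = len(bits)
    return sum(i * b for i, b in enumerate(bits, start=1)) % (n + 1)


def vt_recover(received, syndrome):
    """Rebuild the original sequence from a copy with exactly one deletion.

    `received` is the shortened list of 0/1 values and `syndrome` is the
    VT syndrome of the original. Assumes exactly one bit was deleted.
    """
    n = len(received) + 1
    weight = sum(received)
    partial = sum(i * b for i, b in enumerate(received, start=1))
    deficit = (syndrome - partial) % (n + 1)
    if deficit <= weight:
        # A 0 was deleted: reinsert it with exactly `deficit` ones to its right.
        pos, ones_right = len(received), 0
        while ones_right < deficit:
            pos -= 1
            ones_right += received[pos]
        return received[:pos] + [0] + received[pos:]
    # A 1 was deleted: reinsert it with `deficit - weight - 1` zeros to its left.
    zeros_needed = deficit - weight - 1
    pos, zeros_left = 0, 0
    while zeros_left < zeros_needed:
        zeros_left += 1 - received[pos]
        pos += 1
    return received[:pos] + [1] + received[pos:]


original = [1, 0, 1, 1, 0, 0]
damaged = original[:2] + original[3:]  # delete the bit at index 2
assert vt_recover(damaged, vt_syndrome(original)) == original
```

The sender transmits only the syndrome, a value in the range 0 to n, which is why the per-substring cost is about log(n + 1) bits.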
We then build on this synchronization algorithm in three different ways.
First, it is modified to work with a single round of interaction. The reduction
in the number of rounds comes at the expense of higher communication, which is
quantified. Next, we present an extension to the practically important case
where the insertions and deletions may occur in (potentially large) bursts.
Finally, we show how to synchronize the sources to within a target Hamming
distance. This feature can be used to differentiate between substitution and
indel edits. In addition to theoretical performance bounds, we provide several
validating simulation results for the proposed algorithms.
This is the author accepted manuscript. The final version is available from IEEE via http://dx.doi.org/10.1109/TIT.2015.246663
Block Edit Errors with Transpositions: Deterministic Document Exchange Protocols and Almost Optimal Binary Codes
Document exchange and error correcting codes are two fundamental problems in communications. In the first problem, Alice and Bob each hold a string, and the goal is for Alice to send a short sketch to Bob so that Bob can recover Alice's string. In the second problem, Alice sends a message with some redundant information to Bob through a channel that can add adversarial errors, and the goal is for Bob to correctly recover the message despite the errors. In both problems, an upper bound is placed on the number of errors between the two strings or that the channel can add, and a major goal is to minimize the size of the sketch or the redundant information. In this paper we focus on deterministic document exchange protocols and binary error correcting codes.
Both problems have been studied extensively. In the case of Hamming errors (i.e., bit substitutions) and bit erasures, we have explicit constructions with asymptotically optimal parameters. However, other error types are still rather poorly understood. In a recent work [Kuan Cheng et al., 2018], the authors constructed explicit deterministic document exchange protocols and binary error correcting codes for edit errors with almost optimal parameters. Unfortunately, the constructions in [Kuan Cheng et al., 2018] do not work for other common errors such as block transpositions.
In this paper, we generalize the constructions in [Kuan Cheng et al., 2018] to handle a much larger class of errors. These include bursts of insertions and deletions, as well as block transpositions. Specifically, we consider document exchange and error correcting codes where the total number of block insertions, block deletions, and block transpositions is at most k <= alpha n/log n for some constant 0 < alpha < 1. In addition, the total number of bits inserted and deleted by the first two kinds of operations is at most t <= beta n for some constant 0 < beta < 1, where n is the length of Alice's string or message. We construct explicit, deterministic document exchange protocols with sketch size O((k log n + t) log^2 (n/(k log n + t))) and explicit binary error correcting codes with O(k log n log log log n + t) redundant bits. For comparison, the information-theoretic optimum for both problems is Theta(k log n + t). As far as we know, there were previously no known explicit deterministic document exchange protocols in this case, and the best known binary code needs Omega(n) redundant bits even to correct just one block transposition [L. J. Schulman and D. Zuckerman, 1999].