Efficient File Synchronization: a Distributed Source Coding Approach
The problem of reconstructing a source sequence in the presence of decoder
side-information that is mis-synchronized to the source due to deletions is
studied in a distributed source coding framework. Motivated by practical
applications, the deletion process is assumed to be bursty and is modeled by a
Markov chain. The minimum rate needed to reconstruct the source sequence with
high probability is characterized in terms of an information theoretic
expression, which is interpreted as the amount of information of the deleted
content and the locations of deletions, subtracting "nature's secret", that is,
the uncertainty of the locations given the source and side-information. For
small bursty deletion probability, the asymptotic expansion of the minimum rate
is computed. Comment: 9 pages, 2 figures. A shorter version will appear in IEEE
International Symposium on Information Theory (ISIT), 201
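As a concrete picture of the channel model, the bursty deletion process can be simulated with a two-state (keep/delete) Markov chain. The sketch below is purely illustrative, not the paper's construction, and the transition probabilities p_enter and p_stay are made-up placeholders:

```python
import random

# Illustrative simulation of a bursty deletion channel governed by a
# two-state (keep / delete) Markov chain, in the spirit of the model
# above.  p_enter and p_stay are made-up placeholders, not values from
# the paper.
def bursty_delete(source, p_enter=0.05, p_stay=0.8, seed=0):
    rng = random.Random(seed)
    side_info, deleting = [], False
    for sym in source:
        # Enter (or stay in) the deleting state per the chain's probabilities.
        deleting = rng.random() < (p_stay if deleting else p_enter)
        if not deleting:
            side_info.append(sym)     # symbol survives the channel
    return side_info

src_rng = random.Random(1)
source = [src_rng.randrange(2) for _ in range(1000)]
side = bursty_delete(source)          # decoder's mis-synchronized copy
assert len(side) < len(source)        # some symbols were deleted in bursts
```

The decoder then holds `side` as side-information; the question studied above is the minimum rate at which the source must be described so the decoder can reconstruct it despite the burstily deleted symbols.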
On the Combinatorial Version of the Slepian-Wolf Problem
We study the following combinatorial version of the Slepian-Wolf coding
scheme. Two isolated Senders are given binary strings x and y respectively;
the length of each string is equal to n, and the Hamming distance between the
strings is at most k. The Senders compress their strings and
communicate the results to the Receiver. Then the Receiver must reconstruct
both strings x and y. The aim is to minimise the lengths of the transmitted
messages.
For an asymmetric variant of this problem (where one of the Senders transmits
the input string to the Receiver without compression) with deterministic
encoding a nontrivial lower bound was found by A. Orlitsky and K. Viswanathan.
In our paper we prove a new lower bound for the schemes with syndrome coding,
where at least one of the Senders uses linear encoding of the input string.
For the combinatorial Slepian-Wolf problem with randomized encoding the
theoretical optimum of communication complexity was recently found by the first
author, though effective protocols with optimal lengths of messages remained
unknown. We close this gap and present a polynomial time randomized protocol
that achieves the optimal communication complexity.Comment: 20 pages, 14 figures. Accepted to IEEE Transactions on Information
Theory (June 2018
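To make the syndrome-coding idea concrete, here is a minimal sketch (a textbook construction, not this paper's protocol): when the Receiver already holds one of the two strings, the other Sender need only transmit its string's syndrome under the [7,4] Hamming code's parity-check matrix, which suffices whenever the Hamming distance is at most 1:

```python
import numpy as np

# Minimal syndrome-coding sketch (textbook construction, not the
# paper's protocol).  Column i of H is the binary representation of
# i + 1, so the syndrome of a weight-1 difference directly names the
# flipped position -- the classic [7,4] Hamming parity-check matrix.
H = np.array([[(i >> b) & 1 for i in range(1, 8)] for b in range(3)])

def encode(x):
    """Sender: transmit only the 3-bit syndrome of the 7-bit string x."""
    return H @ x % 2

def decode(syndrome, y):
    """Receiver: recover x from its syndrome and side information y,
    assuming the Hamming distance between x and y is at most 1."""
    s = (syndrome + H @ y) % 2        # = syndrome of the difference x XOR y
    x = y.copy()
    if s.any():                       # nonzero: exactly one position differs
        pos = int(sum(bit << b for b, bit in enumerate(s))) - 1
        x[pos] ^= 1
    return x

x = np.array([1, 0, 1, 1, 0, 0, 1])
y = x.copy(); y[4] ^= 1               # side information at distance 1
assert (decode(encode(x), y) == x).all()   # 3 bits sent instead of 7
```

The same mechanism underlies the linear (syndrome) encodings addressed by the lower bound: compressing an n-bit string to n - k syndrome bits works whenever the code's minimum distance exceeds twice the allowed Hamming distance.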
Communication Cost for Updating Linear Functions when Message Updates are Sparse: Connections to Maximally Recoverable Codes
We consider a communication problem in which an update of the source message
needs to be conveyed to one or more distant receivers that are interested in
maintaining specific linear functions of the source message. The setting is one
in which the updates are sparse in nature, and where neither the source nor the
receiver(s) knows the exact difference vector; both know only the amount of
sparsity present in it. Under this
setting, we are interested in devising linear encoding and decoding schemes
that minimize the communication cost involved. We show that the optimal
solution to this problem is closely related to the notion of maximally
recoverable codes (MRCs), which were originally introduced in the context of
coding for storage systems. In the context of storage, MRCs guarantee optimal
erasure protection when the system is partially constrained to have local
parity relations among the storage nodes. In our problem, we show that optimal
solutions exist if and only if MRCs of certain kind (identified by the desired
linear functions) exist. We consider point-to-point and broadcast versions of
the problem, and identify connections to MRCs under both these settings. For
the point-to-point setting, we show that our linear-encoder based achievable
scheme is optimal even when non-linear encoding is permitted. The theory is
illustrated in the context of updating erasure coded storage nodes. We present
examples based on modern storage codes such as the minimum bandwidth
regenerating codes. Comment: To appear in IEEE Transactions on Information Theory.
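A toy version of the sparse-update problem can be sketched with generic syndrome decoding (not the paper's MRC-based construction): the receiver holds the old message, the source transmits a few linear measurements of the new one, and the receiver searches for the sparse difference consistent with them. All sizes and the random binary measurement matrix A below are illustrative:

```python
import itertools
import numpy as np

# Toy sketch of the sparse-update problem (generic syndrome decoding,
# not the paper's MRC construction).  Sizes and the random measurement
# matrix A are illustrative.
n, s, r = 10, 2, 8                     # message length, sparsity, rows sent

def sparse_vectors(n, s):
    """All binary vectors of length n and Hamming weight at most s."""
    for w in range(s + 1):
        for supp in itertools.combinations(range(n), w):
            e = np.zeros(n, dtype=int)
            e[list(supp)] = 1
            yield e

def decodable(A):
    """True iff every weight-<=s difference has a distinct syndrome, i.e.
    no nonzero vector of weight <= 2s lies in the null space of A."""
    seen = set()
    for e in sparse_vectors(A.shape[1], s):
        key = tuple(A @ e % 2)
        if key in seen:
            return False
        seen.add(key)
    return True

rng = np.random.default_rng(1)
A = rng.integers(0, 2, (r, n))
while not decodable(A):                # resample until A is usable
    A = rng.integers(0, 2, (r, n))

def apply_update(m_old, measurement):
    """Receiver: measurement = A @ m_new mod 2; subtracting A @ m_old
    leaves the syndrome of the sparse difference, found by search."""
    target = (measurement - A @ m_old) % 2
    for e in sparse_vectors(n, s):
        if ((A @ e) % 2 == target).all():
            return (m_old + e) % 2

m_old = rng.integers(0, 2, n)
m_new = m_old.copy()
m_new[[2, 7]] ^= 1                     # a 2-sparse update
assert (apply_update(m_old, A @ m_new % 2) == m_new).all()  # r bits, not n
```

The `decodable` check mirrors the abstract's existence condition: a linear scheme of rate r/n works exactly when the measurement matrix separates all admissible sparse differences, which is the role the paper assigns to maximally recoverable codes.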
The Computational Power of Distributed Shared-Memory Models with Bounded-Size Registers
The celebrated Asynchronous Computability Theorem of Herlihy and Shavit (STOC
1993 and STOC 1994) provided a topological characterization of the tasks that
are solvable in a distributed system where processes are communicating by
writing and reading shared registers, and where any number of processes can
fail by crashing. However, this characterization assumes the use of
full-information protocols, that is, protocols in which each time any of the
processes writes in the shared memory, it communicates everything it learned
since the beginning of the execution. Thus, the characterization implicitly
assumes that each register in the shared memory is of unbounded size. Whether
unbounded size registers are unavoidable for the model of computation to be
universal is the central question studied in this paper. Specifically, is any
task that is solvable using unbounded registers solvable using registers of
bounded size? More generally, when at most t processes can crash, is the
model with bounded-size registers universal? These are the questions answered
in this paper.
Lossless Differential Compression for Synchronizing Arbitrary Single-Dimensional Strings
Differential compression allows expressing a modified document as differences relative to another version of the document. A compressed string requires space proportional to the amount of change, irrespective of the original document sizes. The purpose of this study was to determine which algorithms are suitable for universal lossless differential compression for synchronizing two arbitrary documents, either locally or remotely.
Two main problems in differential compression are finding the differences (differencing), and compactly communicating the differences (encoding). We discussed local differencing algorithms based on subsequence searching, hashtable lookups, suffix searching, and projection. We also discussed probabilistic remote algorithms based on both recursive comparison and characteristic polynomial interpolation of hashes computed from variable-length content-defined substrings. We described various heuristics for approximating optimal algorithms when arbitrarily long strings and memory limitations force discarding information. Discussion also included compact delta encoding and in-place reconstruction. We presented results from empirical testing using the discussed algorithms.
The conclusions were that multiple algorithms need to be integrated into a hybrid implementation, which heuristically chooses algorithms based on evaluation of the input data. Algorithms based on hashtable lookups are faster on average and require less memory, but algorithms based on suffix searching find the fewest differences. Interpolating characteristic polynomials was found to be too slow for general use. With remote hash comparison, content-defined chunks and recursive comparison can reduce protocol overhead. A differential compressor should be merged with a state-of-the-art non-differential compressor to enable more compact delta encoding. Input should be processed multiple times to allow a constant space bound without significant reduction in compression efficiency. Compression efficiency of current popular synchronizers could be improved, as our empirical testing showed that a non-differential compressor produced smaller files without having access to one of the two strings.
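The remote hash-comparison approach discussed above can be sketched with content-defined chunking: chunk boundaries depend only on local content, so a local edit invalidates only nearby chunks and the rest can be matched by hash. The window size, boundary mask, and test data below are illustrative choices, not parameters from the study:

```python
import hashlib
import random

# Minimal content-defined-chunking sketch (rsync-style hash comparison).
# Window size and boundary mask are illustrative placeholders.
def chunks(data, window=16, mask=0x3F):
    out, start, acc = [], 0, 0
    for i, b in enumerate(data):
        acc += b
        if i >= window:
            acc -= data[i - window]          # rolling sum of last `window` bytes
        if i - start >= window and (acc & mask) == mask:
            out.append(data[start:i + 1])    # boundary depends only on content
            start = i + 1
    if start < len(data):
        out.append(data[start:])
    return out

def bytes_to_send(old, new):
    # Receiver advertises hashes of its chunks of `old`; the sender
    # ships only the chunks of `new` the receiver does not already hold.
    have = {hashlib.sha256(c).digest() for c in chunks(old)}
    return sum(len(c) for c in chunks(new)
               if hashlib.sha256(c).digest() not in have)

rng = random.Random(0)
old = bytes(rng.randrange(256) for _ in range(4096))
new = old[:2000] + b"PATCH" + old[2000:]     # one local edit
assert b"".join(chunks(old)) == old          # chunking is a partition
assert bytes_to_send(old, new) < len(new)    # far cheaper than resending all
```

Because the boundary test uses only the trailing window of bytes, chunk boundaries re-synchronize shortly after the inserted edit, which is what keeps the protocol overhead proportional to the change rather than to the document size.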
One-way communication and error-correcting codes
Abstract—We establish a further connection between one-way communication where a sender conveys information to a receiver who has related information, and error-correction coding where a sender attempts to communicate reliably over a noisy channel. Using this connection we obtain three results on the two problems. We derive an often-tight lower bound on the number of bits required for one-way communication based on the largest code for the corresponding error-correction problem. We construct an error-correcting code whose minimum distance properties are similar to those of Bose–Chaudhuri–Hocquenghem (BCH) codes based on a one-way communication protocol for set reconciliation. Finally, we prove that one-way communication is suboptimal for a large class of Hamming-distance problems. Index Terms—Error correction, interactive communication, Slepian–Wolf theorem, zero-error information theory.