Efficient and Error-Correcting Data Structures for Membership and Polynomial Evaluation
We construct efficient data structures that are resilient against a constant
fraction of adversarial noise. Our model requires that the decoder answers most
queries correctly with high probability and for the remaining queries, the
decoder with high probability either answers correctly or declares "don't
know." Furthermore, if there is no noise on the data structure, it answers all
queries correctly with high probability. Our model is the common generalization
of a model proposed recently by de Wolf and the notion of "relaxed locally
decodable codes" developed in the PCP literature.
We measure the efficiency of a data structure in terms of its length,
measured by the number of bits in its representation, and query-answering time,
measured by the number of bit-probes to the (possibly corrupted)
representation. In this work, we study two data structure problems: membership
and polynomial evaluation. We show that these two problems have constructions
that are simultaneously efficient and error-correcting.
Comment: An abridged version of this paper appears in STACS 2010
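The paper's constructions are far more involved, but the decoding contract of the model (answer correctly, or declare "don't know"; never err on an uncorrupted structure) can be illustrated with a toy repetition-coded membership table. The names `encode` and `query` and the unanimity rule are illustrative choices for this sketch, not the paper's scheme.

```python
def encode(universe_n, s, reps=5):
    """Store the characteristic vector of set `s` under a repetition code:
    `reps` copies of each bit (toy encoding, not the paper's construction)."""
    bits = [1 if i in s else 0 for i in range(universe_n)]
    return [b for b in bits for _ in range(reps)]

def query(codeword, i, reps=5):
    """Membership query for element i using `reps` bit-probes.
    Unanimous copies give a confident answer; any disagreement is reported
    as "don't know".  On a noiseless structure every query is correct."""
    block = codeword[i * reps:(i + 1) * reps]
    ones = sum(block)
    if ones == reps:
        return 1
    if ones == 0:
        return 0
    return "don't know"
```

With this rule an adversary must corrupt an entire block of copies to force a wrong answer; partial corruption only triggers "don't know", mirroring the model's guarantee.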
Error-Correcting Data Structures
We study data structures in the presence of adversarial noise. We want to
encode a given object in a succinct data structure that enables us to
efficiently answer specific queries about the object, even if the data
structure has been corrupted by a constant fraction of errors. This new model
is the common generalization of (static) data structures and locally decodable
error-correcting codes. The main issue is the tradeoff between the space used
by the data structure and the time (number of probes) needed to answer a query
about the encoded object. We prove a number of upper and lower bounds on
various natural error-correcting data structure problems. In particular, we
show that the optimal length of error-correcting data structures for the
Membership problem (where we want to store subsets of size s from a universe of
size n) is closely related to the optimal length of locally decodable codes for
s-bit strings.
Comment: 15 pages LaTeX; an abridged version will appear in the Proceedings of
the STACS 2009 conference
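The connection to locally decodable codes can be made concrete with the classic Hadamard code, which recovers any bit of an s-bit string with only 2 bit-probes to a (possibly corrupted) length-2^s representation. This is the standard textbook construction used only to illustrate the connection; it is not the paper's bound.

```python
import random

def hadamard_encode(x, s):
    """Hadamard code of an s-bit message x: one parity <x,z> for every
    z in {0,1}^s, so the codeword has length 2^s."""
    return [bin(x & z).count("1") & 1 for z in range(1 << s)]

def local_decode(codeword, i, s, rng=random):
    """Recover bit i of x with 2 probes: <x,z> xor <x, z xor e_i> = <x, e_i>
    = x_i by linearity.  If a delta fraction of the codeword is corrupted,
    each probe hits an error with probability at most delta, so the answer
    is correct with probability at least 1 - 2*delta."""
    z = rng.randrange(1 << s)
    return codeword[z] ^ codeword[z ^ (1 << i)]
```

Note the exponential length: this 2-probe decoder is why the optimal length of error-correcting Membership structures is tied to the optimal length of locally decodable codes for s-bit strings.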
The Cell Probe Complexity of Succinct Data Structures
In the cell probe model with word size 1 (the bit probe model), a
static data structure problem is given by a map
$f: \{0,1\}^n \times \{0,1\}^m \to \{0,1\}$,
where $\{0,1\}^n$ is a set of possible data to be stored,
$\{0,1\}^m$ is a set of possible queries (for natural problems, we
have $m \ll n$) and $f(x,y)$ is
the answer to question $y$ about data $x$.
A solution is given by a
representation $\phi: \{0,1\}^n \to \{0,1\}^N$ and a query algorithm
$q$ so that $q(\phi(x), y) = f(x,y)$. The time $t$ of the query algorithm
is the number of bits it reads in $\phi(x)$.
In this paper, we consider the case of {\em succinct} representations
where $N = n + r$ for some {\em redundancy} $r \ll n$.
For
a boolean version of the problem of polynomial
evaluation with preprocessing of coefficients, we show a lower bound on
the redundancy-query time tradeoff of the form
\[ (r+1) t \geq \Omega(n/\log n). \]
In particular, for very small
redundancies $r$, we get an almost optimal lower bound stating that the
query algorithm has to inspect almost the entire data structure
(up to a logarithmic factor).
We show similar lower bounds for problems satisfying a certain
combinatorial property of a coding theoretic flavor.
Previously, no lower bounds were known on $t$
in the general model for explicit functions, even for very small
redundancies.
By restricting our attention to {\em systematic} or {\em index}
structures satisfying $\phi(x) = x \cdot \phi^*(x)$ for some
map $\phi^*$ (where $\cdot$ denotes concatenation) we show
similar lower bounds on the redundancy-query time tradeoff
for the natural data structuring problems of Prefix Sum
and Substring Search.
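A minimal sketch of a systematic structure for Prefix Sum shows where a tradeoff of this shape comes from on the upper-bound side: store the raw bits x plus r sampled prefix sums, so a query probes one sample and at most about n/(r+1) raw bits, giving (r+1)*t on the order of n. The sampling scheme and the names `build`/`prefix_sum` are this sketch's own choices, not the paper's lower-bound construction.

```python
def build(x, r):
    """Systematic structure: the raw bits x themselves, plus r redundancy
    samples, where sample i stores the sum of the first (i+1)*block bits."""
    n = len(x)
    block = -(-n // (r + 1))          # ceil(n / (r+1))
    samples = [sum(x[:(i + 1) * block]) for i in range(r)]
    return x, samples, block

def prefix_sum(structure, j):
    """Sum of x[0..j-1]: probe one sample plus at most `block` raw bits,
    so the query time is roughly n/(r+1) bit-probes."""
    x, samples, block = structure
    i = min(j // block, len(samples))  # nearest sampled prefix at or below j
    acc = samples[i - 1] if i > 0 else 0
    return acc + sum(x[i * block:j])   # at most `block` raw-bit probes
```

Here t is about n/(r+1), i.e. (r+1)*t is about n; the paper's result says that for the problems above one cannot do much better than this, up to a logarithmic factor.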
Passive network tomography for erroneous networks: A network coding approach
Passive network tomography uses end-to-end observations of network
communication to characterize the network, for instance to estimate the network
topology and to localize random or adversarial glitches. Under the setting of
linear network coding this work provides a comprehensive study of passive
network tomography in the presence of network (random or adversarial) glitches.
To be concrete, this work is developed along two directions: 1. Tomographic
upper and lower bounds (i.e., the most adverse conditions in each problem
setting under which network tomography is possible, and corresponding schemes
(computationally efficient, if possible) that achieve this performance) are
presented for random linear network coding (RLNC). We consider RLNC designed
with common randomness, i.e., the receiver knows the random code-books of all
nodes. (To justify this, we show an upper bound for the problem of topology
estimation in networks using RLNC without common randomness.) In this setting
we present the first set of algorithms that characterize the network topology
exactly. Our algorithm for topology estimation with random network errors has
time complexity that is polynomial in network parameters. For the problem of
network error localization given the topology information, we present the first
computationally tractable algorithm to localize random errors, and prove it is
computationally intractable to localize adversarial errors. 2. New network
coding schemes are designed that improve the tomographic performance of RLNC
while maintaining the desirable low-complexity, throughput-optimal, distributed
linear network coding properties of RLNC. In particular, we design network
codes based on Reed-Solomon codes so that a maximal number of adversarial
errors can be localized in a computationally efficient manner even without the
information of network topology.
Comment: 40 pages, under submission for IEEE Trans. on Information Theory
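A toy sketch of random linear network coding over GF(2) illustrates the "common randomness" setting: the receiver knows each coded packet's coefficient vector (the code-book) and recovers the sources by Gaussian elimination. The function names are this sketch's own; it ignores errors and topology entirely, and practical RLNC works over larger fields such as GF(2^8).

```python
import random

def rlnc_encode(packets, n_coded, rng):
    """Each coded packet is a random GF(2) linear combination of the source
    packets.  Returns (coefficient_vector, payload) pairs; in the common-
    randomness setting the receiver already knows the coefficients."""
    k, m = len(packets), len(packets[0])
    coded = []
    for _ in range(n_coded):
        coeffs = [rng.randint(0, 1) for _ in range(k)]
        payload = [0] * m
        for c, p in zip(coeffs, packets):
            if c:
                payload = [a ^ b for a, b in zip(payload, p)]
        coded.append((coeffs, payload))
    return coded

def rlnc_decode(coded, k):
    """Gaussian elimination over GF(2): recovers the k sources once the
    received coefficient vectors have full rank (raises StopIteration on a
    rank-deficient batch in this minimal sketch)."""
    rows = [(list(c), list(p)) for c, p in coded]
    for col in range(k):
        pivot = next(r for r in range(col, len(rows)) if rows[r][0][col])
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(len(rows)):
            if r != col and rows[r][0][col]:
                rows[r] = ([a ^ b for a, b in zip(rows[r][0], rows[col][0])],
                           [a ^ b for a, b in zip(rows[r][1], rows[col][1])])
    return [rows[i][1] for i in range(k)]
```

Tomography exploits exactly this linear-algebraic structure: deviations between the received combinations and what the known code-books predict carry information about where the network glitched.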
Distributed PCP Theorems for Hardness of Approximation in P
We present a new distributed model of probabilistically checkable proofs
(PCP). A satisfying assignment to a CNF formula $\varphi(x_1,\dots,x_n)$ is
shared between two parties, where Alice knows $x_1,\dots,x_{n/2}$, Bob knows
$x_{n/2+1},\dots,x_n$, and both parties know $\varphi$. The goal is to have
Alice and Bob jointly write a PCP that $x$ satisfies $\varphi$, while
exchanging little or no information. Unfortunately, this model as-is does not
allow for nontrivial query complexity. Instead, we focus on a non-deterministic
variant, where the players are helped by Merlin, a third party who knows all of
$x$.
Using our framework, we obtain, for the first time, PCP-like reductions from
the Strong Exponential Time Hypothesis (SETH) to approximation problems in P.
In particular, under SETH we show that there are no truly-subquadratic
approximation algorithms for Bichromatic Maximum Inner Product over
{0,1}-vectors, Bichromatic LCS Closest Pair over permutations, Approximate
Regular Expression Matching, and Diameter in Product Metric. All our
inapproximability factors are nearly-tight. In particular, for the first two
problems we obtain nearly-polynomial factors of $2^{(\log n)^{1-o(1)}}$; only
$(1+o(1))$-factor lower bounds (under SETH) were known before.
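For concreteness, here is the exhaustive quadratic-time baseline for Bichromatic Maximum Inner Product over {0,1}-vectors, the running time that, under SETH, cannot be beaten by a truly-subquadratic algorithm even approximately. This brute force is standard and only sets the stage; it is not from the paper.

```python
def max_inner_product(A, B):
    """Exhaustive O(|A|*|B|*d) baseline for Bichromatic Max-IP over
    {0,1}-vectors: return the largest inner product <a, b> over all
    bichromatic pairs a in A, b in B."""
    best = 0
    for a in A:
        for b in B:
            best = max(best, sum(x & y for x, y in zip(a, b)))
    return best
```

Bit-packing the vectors into machine words improves the constant factors but not the quadratic dependence on |A|*|B|, which is exactly what the SETH-based lower bound addresses.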
The streaming $k$-mismatch problem
We consider the streaming complexity of a fundamental task in approximate
pattern matching: the $k$-mismatch problem. It asks to compute Hamming
distances between a pattern of length $n$ and all length-$n$ substrings of a
text for which the Hamming distance does not exceed a given threshold $k$. In
our problem formulation, we report not only the Hamming distance but also, on
demand, the full \emph{mismatch information}, that is the list of mismatched
pairs of symbols and their indices. The twin challenges of streaming pattern
matching derive from the need both to achieve small working space and also to
guarantee that every arriving input symbol is processed quickly.
We present a streaming algorithm for the $k$-mismatch problem which uses
$O(k \log n \log(n/k))$ bits of space and spends
$O(\sqrt{k\log k}\,\operatorname{polylog} n)$ time on
each symbol of the input stream, which consists of the pattern followed by the
text. The running time almost matches the classic offline solution and the
space usage is within a logarithmic factor of optimal.
Our new algorithm therefore effectively resolves and also extends an open
problem first posed in FOCS'09. En route to this solution, we also give a
deterministic $O(k \log(n/k))$-bit encoding of all
the alignments with Hamming distance at most $k$ of a length-$n$ pattern within
a text of length $O(n)$. This secondary result provides an optimal solution to
a natural communication complexity problem which may be of independent
interest.
Comment: 27 pages
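A naive offline reference pins down the terminology: for every alignment of the pattern in the text, report the mismatch information (index, pattern symbol, text symbol) whenever at most k symbols differ. It spends linear time per alignment, far from the streaming bounds above, and is purely illustrative rather than the paper's algorithm.

```python
def k_mismatch(pattern, text, k):
    """Offline brute force for the k-mismatch problem: map each alignment
    i with Hamming distance <= k to its full mismatch information, the
    list of (position, pattern symbol, text symbol) triples."""
    n = len(pattern)
    results = {}
    for i in range(len(text) - n + 1):
        mismatches = [(j, pattern[j], text[i + j])
                      for j in range(n) if pattern[j] != text[i + j]]
        if len(mismatches) <= k:       # report only near-matches
            results[i] = mismatches
    return results
```

The streaming setting forbids storing the text (and even the pattern) explicitly, which is what makes matching this input-output behavior in small space and fast per-symbol time nontrivial.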