4,599,452 research outputs found
Error-Correcting Data Structures
We study data structures in the presence of adversarial noise. We want to
encode a given object in a succinct data structure that enables us to
efficiently answer specific queries about the object, even if the data
structure has been corrupted by a constant fraction of errors. This new model
is the common generalization of (static) data structures and locally decodable
error-correcting codes. The main issue is the tradeoff between the space used
by the data structure and the time (number of probes) needed to answer a query
about the encoded object. We prove a number of upper and lower bounds on
various natural error-correcting data structure problems. In particular, we
show that the optimal length of error-correcting data structures for the
Membership problem (where we want to store subsets of size s from a universe of
size n) is closely related to the optimal length of locally decodable codes for
s-bit strings.Comment: 15 pages LaTeX; an abridged version will appear in the Proceedings of
the STACS 2009 conferenc
Optimal Joins Using Compact Data Structures
Worst-case optimal join algorithms have gained a lot of attention in the database literature. We now count with several algorithms that are optimal in the worst case, and many of them have been implemented and validated in practice. However, the implementation of these algorithms often requires an enhanced indexing structure: to achieve optimality we either need to build completely new indexes, or we must populate the database with several instantiations of indexes such as B+-trees. Either way, this means spending an extra amount of storage space that may be non-negligible.
We show that optimal algorithms can be obtained directly from a representation that regards the relations as point sets in variable-dimensional grids, without the need of extra storage. Our representation is a compact quadtree for the static indexes, and a dynamic quadtree sharing subtrees (which we dub a qdag) for intermediate results. We develop a compositional algorithm to process full join queries under this representation, and show that the running time of this algorithm is worst-case optimal in data complexity. Remarkably, we can extend our framework to evaluate more expressive queries from relational algebra by introducing a lazy version of qdags (lqdags). Once again, we can show that the running time of our algorithms is worst-case optimal
Compressed Data Structures for Dynamic Sequences
We consider the problem of storing a dynamic string over an alphabet
in compressed form. Our representation
supports insertions and deletions of symbols and answers three fundamental
queries: returns the -th symbol in ,
counts how many times a symbol occurs among the
first positions in , and finds the position
where a symbol occurs for the -th time. We present the first
fully-dynamic data structure for arbitrarily large alphabets that achieves
optimal query times for all three operations and supports updates with
worst-case time guarantees. Ours is also the first fully-dynamic data structure
that needs only bits, where is the -th order
entropy and is the string length. Moreover our representation supports
extraction of a substring in optimal time
Composite repetition-aware data structures
In highly repetitive strings, like collections of genomes from the same
species, distinct measures of repetition all grow sublinearly in the length of
the text, and indexes targeted to such strings typically depend only on one of
these measures. We describe two data structures whose size depends on multiple
measures of repetition at once, and that provide competitive tradeoffs between
the time for counting and reporting all the exact occurrences of a pattern, and
the space taken by the structure. The key component of our constructions is the
run-length encoded BWT (RLBWT), which takes space proportional to the number of
BWT runs: rather than augmenting RLBWT with suffix array samples, we combine it
with data structures from LZ77 indexes, which take space proportional to the
number of LZ77 factors, and with the compact directed acyclic word graph
(CDAWG), which takes space proportional to the number of extensions of maximal
repeats. The combination of CDAWG and RLBWT enables also a new representation
of the suffix tree, whose size depends again on the number of extensions of
maximal repeats, and that is powerful enough to support matching statistics and
constant-space traversal.Comment: (the name of the third co-author was inadvertently omitted from
previous version
Concurrent Data Structures Linked in Time
Arguments about correctness of a concurrent data structure are typically
carried out by using the notion of linearizability and specifying the
linearization points of the data structure's procedures. Such arguments are
often cumbersome as the linearization points' position in time can be dynamic
(depend on the interference, run-time values and events from the past, or even
future), non-local (appear in procedures other than the one considered), and
whose position in the execution trace may only be determined after the
considered procedure has already terminated.
In this paper we propose a new method, based on a separation-style logic, for
reasoning about concurrent objects with such linearization points. We embrace
the dynamic nature of linearization points, and encode it as part of the data
structure's auxiliary state, so that it can be dynamically modified in place by
auxiliary code, as needed when some appropriate run-time event occurs. We name
the idea linking-in-time, because it reduces temporal reasoning to spatial
reasoning. For example, modifying a temporal position of a linearization point
can be modeled similarly to a pointer update in separation logic. Furthermore,
the auxiliary state provides a convenient way to concisely express the
properties essential for reasoning about clients of such concurrent objects. We
illustrate the method by verifying (mechanically in Coq) an intricate optimal
snapshot algorithm due to Jayanti, as well as some clients
Predicate Abstraction for Linked Data Structures
We present Alias Refinement Types (ART), a new approach to the verification
of correctness properties of linked data structures. While there are many
techniques for checking that a heap-manipulating program adheres to its
specification, they often require that the programmer annotate the behavior of
each procedure, for example, in the form of loop invariants and pre- and
post-conditions. Predicate abstraction would be an attractive abstract domain
for performing invariant inference, existing techniques are not able to reason
about the heap with enough precision to verify functional properties of data
structure manipulating programs. In this paper, we propose a technique that
lifts predicate abstraction to the heap by factoring the analysis of data
structures into two orthogonal components: (1) Alias Types, which reason about
the physical shape of heap structures, and (2) Refinement Types, which use
simple predicates from an SMT decidable theory to capture the logical or
semantic properties of the structures. We prove ART sound by translating types
into separation logic assertions, thus translating typing derivations in ART
into separation logic proofs. We evaluate ART by implementing a tool that
performs type inference for an imperative language, and empirically show, using
a suite of data-structure benchmarks, that ART requires only 21% of the
annotations needed by other state-of-the-art verification techniques
- …
