Invertible Bloom Lookup Tables with Listing Guarantees
The Invertible Bloom Lookup Table (IBLT) is a probabilistic concise data
structure for set representation that supports a listing operation as the
recovery of the elements in the represented set. Its applications can be found
in network synchronization and traffic monitoring as well as in
error-correction codes. IBLT can list its elements with probability affected by
the size of the allocated memory and the size of the represented set, such that
it can fail with small probability even for relatively small sets. While
previous works only studied the failure probability of IBLT, this work
initiates the worst case analysis of IBLT that guarantees successful listing
for all sets of a certain size. The worst case study is important since the
failure of IBLT imposes high overhead. We describe a novel approach that
guarantees successful listing when the set satisfies a tunable upper bound on
its size. To allow that, we develop multiple constructions that are based on
various coding techniques such as stopping sets and the stopping redundancy of
error-correcting codes, Steiner systems, and covering arrays as well as new
methodologies we develop. We analyze the sizes of IBLTs with listing guarantees
obtained by the various methods as well as their mapping memory consumption.
Lastly, we study lower bounds on the achievable sizes of IBLT with listing
guarantees and verify the results in the paper by simulations.
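
As a rough illustration of the listing operation described above, here is a minimal sketch of a standard (probabilistic) IBLT that stores integer keys only. It is not one of the constructions with listing guarantees from the paper; the class name `IBLT`, the SHA-256 based hashing, the checksum field, and the partition into `k` sub-tables are all assumptions made for this example.

```python
import hashlib


class IBLT:
    """Toy IBLT for non-negative integer keys (keys only, no values)."""

    def __init__(self, m, k=3):
        assert m % k == 0, "m must be a multiple of k"
        self.m, self.k = m, k
        self.count = [0] * m      # number of keys mapped to each cell
        self.key_sum = [0] * m    # XOR of the keys mapped to each cell
        self.chk_sum = [0] * m    # XOR of the key checksums in each cell

    @staticmethod
    def _h(key, salt):
        # Hypothetical hash: SHA-256 of "salt:key", truncated to 64 bits.
        digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def _cells(self, key):
        # One cell per sub-table, so the k cells of a key are distinct.
        width = self.m // self.k
        return [i * width + self._h(key, i) % width for i in range(self.k)]

    def _update(self, key, delta):
        chk = self._h(key, "chk")
        for c in self._cells(key):
            self.count[c] += delta
            self.key_sum[c] ^= key
            self.chk_sum[c] ^= chk

    def insert(self, key):
        self._update(key, +1)

    def delete(self, key):
        self._update(key, -1)

    def _pure(self, c):
        # A cell is pure if it appears to hold exactly one key.
        return (self.count[c] == 1
                and self.chk_sum[c] == self._h(self.key_sum[c], "chk"))

    def list_entries(self):
        """Peel pure cells until none remain. Destructive.

        Returns (recovered_keys, success); success is False when the
        peeling process gets stuck before the table is empty.
        """
        recovered = []
        stack = [c for c in range(self.m) if self._pure(c)]
        while stack:
            c = stack.pop()
            if not self._pure(c):
                continue  # cell changed since it was pushed
            key = self.key_sum[c]
            recovered.append(key)
            self._update(key, -1)
            stack.extend(x for x in self._cells(key) if self._pure(x))
        return recovered, not any(self.count)


table = IBLT(m=600)
for key in range(100, 110):
    table.insert(key)
table.delete(105)
keys, ok = table.list_entries()
```

With a table this lightly loaded, listing is expected to succeed, but in this sketch success is still only probabilistic; that residual failure case is what the worst-case constructions in the abstract aim to rule out.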
Invertible Bloom Lookup Tables with Less Memory and Less Randomness
In this work we study Invertible Bloom Lookup Tables (IBLTs) with small
failure probabilities. IBLTs are highly versatile data structures that have
found applications in set reconciliation protocols, error-correcting codes, and
even the design of advanced cryptographic primitives. For storing elements
and ensuring correctness with probability at least , existing IBLT
constructions require space and
they crucially rely on fully random hash functions.
We present new constructions of IBLTs that are simultaneously more space
efficient and require less randomness. For storing elements with a failure
probability of at most , our data structure only requires
space and
-wise independent hash functions.
As a key technical ingredient we show that hashing keys with any -wise
independent hash function for some sufficiently large constant
guarantees with probability that at least keys
will have a unique hash value. Proving this is highly non-trivial as
approaches . We believe that the techniques used to prove this statement may
be of independent interest.
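
To make the notion of limited-independence hashing concrete, the sketch below uses the textbook construction of a k-wise independent family: a random polynomial of degree less than k over a prime field. The modulus `P`, the function names, and the final reduction modulo the number of bins (which makes the output only approximately uniform) are assumptions of this example and are not taken from the paper.

```python
import random
from collections import Counter

P = (1 << 61) - 1  # Mersenne prime; assumed to be larger than every key


def make_kwise_hash(k, num_bins, rng):
    """Return h(x) = (a_0 + a_1 x + ... + a_{k-1} x^{k-1} mod P) mod num_bins."""
    coeffs = [rng.randrange(P) for _ in range(k)]

    def h(x):
        acc = 0
        for a in reversed(coeffs):  # Horner evaluation
            acc = (acc * x + a) % P
        return acc % num_bins

    return h


def count_unique(keys, h):
    """Count the keys whose hash value is shared with no other key."""
    hits = Counter(h(x) for x in keys)
    return sum(1 for x in keys if hits[h(x)] == 1)


rng = random.Random(0)
keys = list(range(1000))
h = make_kwise_hash(k=8, num_bins=8 * len(keys), rng=rng)
unique = count_unique(keys, h)
```

Here `unique` is the quantity the key technical statement is about: the number of keys that receive a hash value of their own.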
Multi-Party Set Reconciliation Using Characteristic Polynomials
In the standard set reconciliation problem, there are two parties, each
holding a set of elements. The goal is for both parties to obtain the union
of the two sets. In many distributed computing settings the sets may be
large but the set difference is small. In these cases one aims to achieve
reconciliation efficiently in terms of communication; ideally, the
communication should depend on the size of the set difference, and not on the
size of the sets.
Recent work has considered generalizations of the reconciliation problem to
multi-party settings, using a framework based on a specific type of linear
sketch called an Invertible Bloom Lookup Table. Here, we consider multi-party
set reconciliation using the alternative framework of characteristic
polynomials, which have previously been used for efficient pairwise set
reconciliation protocols, and compare their performance with Invertible Bloom
Lookup Tables for these problems. Comment: 6 pages
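
The reason characteristic polynomials suit reconciliation can be seen in a small pairwise example. The characteristic polynomial of a set S is the product of (z - x) over all x in S, and in the ratio of two such polynomials every common element cancels, so the ratio depends only on the set difference. The sketch below checks that identity numerically; the field modulus `P`, the set names, and the evaluation points are assumptions of this example, and the rational-function interpolation step that a full protocol would need is omitted.

```python
P = (1 << 61) - 1  # prime field modulus; assumed larger than every element


def char_poly_eval(s, z):
    """Evaluate chi_S(z) = prod over x in S of (z - x), modulo P."""
    acc = 1
    for x in s:
        acc = acc * (z - x) % P
    return acc


def ratio(num, den):
    """num / den in the field F_P (den must be non-zero), via Fermat inverse."""
    return num * pow(den, P - 2, P) % P


set_a = set(range(1, 1001))
set_b = set(range(3, 1003))
points = [10**6 + i for i in range(5)]  # evaluation points outside both sets

# Each party evaluates its own polynomial; only these values are exchanged.
full = [ratio(char_poly_eval(set_a, z), char_poly_eval(set_b, z))
        for z in points]
# The same ratio computed from the two (small) difference sets only.
reduced = [ratio(char_poly_eval(set_a - set_b, z),
                 char_poly_eval(set_b - set_a, z))
           for z in points]
identity_holds = full == reduced
```

Since the reduced rational function has degree equal to the size of the set difference, the number of evaluation points needed to recover it scales with the difference rather than with the sets themselves.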
Irregular Invertible Bloom Look-Up Tables
We consider invertible Bloom lookup tables (IBLTs), which are probabilistic
data structures that store key-value pairs. An IBLT supports insertion
and deletion of key-value pairs, as well as the recovery of all key-value pairs
that have been inserted, as long as the number of key-value pairs stored in the
IBLT does not exceed a certain number. The recovery operation on an IBLT can be
represented as a peeling process on a bipartite graph. We present a density
evolution analysis of IBLTs that predicts the maximum number of
key-value pairs that can be inserted in the table so that recovery is still
successful with high probability. This analysis holds for arbitrary irregular
degree distributions and generalizes results in the literature. We complement
our analysis with numerical simulations of our own IBLT design, which
recovers a larger number of key-value pairs than state-of-the-art IBLTs of the
same size. Comment: Accepted for presentation at ISTC 202
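
A density evolution analysis of peeling can be sketched numerically. The code below uses the standard recursion for a large table in which every key picks its cells uniformly at random, so that cell degrees are approximately Poisson; this modelling choice, the function names, and the representation of the degree distribution as a dictionary (key degree mapped to the fraction of keys with that degree) are assumptions of this example and may differ from the analysis in the paper.

```python
import math


def residual_fraction(load, degree_dist, max_iter=5000, tol=1e-12):
    """Iterate density evolution for the peeling decoder.

    load        -- number of keys divided by number of cells
    degree_dist -- {degree: fraction of keys with that degree}
    Returns 0.0 if the recursion predicts complete recovery, otherwise
    the value reached by the edge erasure probability after max_iter steps.
    """
    avg_deg = sum(d * f for d, f in degree_dist.items())
    # Edge-perspective distribution: probability an edge belongs to a degree-d key.
    edge_dist = {d: d * f / avg_deg for d, f in degree_dist.items()}
    x = 1.0  # probability that an edge still carries an unrecovered key
    for _ in range(max_iter):
        # Probability that a cell holds at least one other unrecovered key.
        blocked = 1.0 - math.exp(-avg_deg * load * x)
        # A key stays unrecovered if all of its other cells are blocked.
        x = sum(w * blocked ** (d - 1) for d, w in edge_dist.items())
        if x < tol:
            return 0.0
    return x


def load_threshold(degree_dist, steps=30):
    """Bisect for the largest load at which recovery is still predicted."""
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if residual_fraction(mid, degree_dist) == 0.0:
            lo = mid
        else:
            hi = mid
    return lo


regular_3 = load_threshold({3: 1.0})
regular_4 = load_threshold({4: 1.0})
mixed = load_threshold({3: 0.5, 4: 0.5})  # an arbitrary irregular example
```

The returned threshold is the predicted maximum ratio of stored keys to cells for which peeling succeeds with high probability in the large-table limit; finite tables behave somewhat worse.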
Efficient Reconciliation of Genomic Datasets of High Similarity
We apply Invertible Bloom Lookup Tables (IBLTs) to the comparison of k-mer sets originating from large DNA sequence datasets. We show that for similar datasets, IBLTs provide a more space-efficient and, at the same time, more accurate method for estimating the Jaccard similarity of the underlying k-mer sets than MinHash, a go-to sketching technique for efficient pairwise similarity estimation. This is achieved by combining IBLTs with k-mer sampling based on syncmers, which constitute a context-independent alternative to minimizers and provide an unbiased estimator of Jaccard similarity. A key property of our method is that the data structures involved require space proportional to the difference of the k-mer sets and independent of the size of the sets themselves. As another application, we show how our ideas can be applied to efficiently compute (an approximation of) the k-mers that differ between two datasets, still using space only proportional to their number. We experimentally illustrate our results on both simulated and real data (SARS-CoV-2 and Streptococcus pneumoniae genomes).
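
The link between a recovered set difference and Jaccard similarity can be made explicit. If the sizes of both sets and the size D of their symmetric difference are known, then the intersection has (|A| + |B| - D) / 2 elements and the union has (|A| + |B| + D) / 2, so J = (|A| + |B| - D) / (|A| + |B| + D). The sketch below checks this on two short strings; the function names and toy sequences are assumptions of this example, the symmetric difference is computed directly rather than decoded from an IBLT, and no syncmer sampling is performed.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_from_difference(size_a, size_b, sym_diff_size):
    """Jaccard similarity from the set sizes and the symmetric difference size."""
    total = size_a + size_b
    if total + sym_diff_size == 0:
        return 1.0  # both sets empty
    return (total - sym_diff_size) / (total + sym_diff_size)


a = kmers("ACGTACGTGACCTGA", 4)
b = kmers("ACGTACGTGACGTGA", 4)
diff_size = len(a ^ b)  # stands in for the size of an IBLT-decoded difference
estimate = jaccard_from_difference(len(a), len(b), diff_size)
exact = len(a & b) / len(a | b)
```

In this toy setting `estimate` and `exact` agree because the difference is known exactly; with sampled k-mers the same formula would give an estimate of the similarity of the full sets.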