
    Invertible Bloom Lookup Tables with Listing Guarantees

    The Invertible Bloom Lookup Table (IBLT) is a concise probabilistic data structure for set representation that supports a listing operation, i.e., the recovery of the elements in the represented set. Its applications can be found in network synchronization and traffic monitoring, as well as in error-correcting codes. An IBLT can list its elements with a probability that depends on the size of the allocated memory and the size of the represented set, so that it can fail with small probability even for relatively small sets. While previous works only studied the failure probability of IBLTs, this work initiates a worst-case analysis of IBLTs that guarantees successful listing for all sets up to a certain size. The worst-case study is important since a listing failure imposes high overhead. We describe a novel approach that guarantees successful listing whenever the set satisfies a tunable upper bound on its size. To this end, we develop multiple constructions based on various coding techniques, such as stopping sets and the stopping redundancy of error-correcting codes, Steiner systems, and covering arrays, as well as new methodologies we develop. We analyze the sizes of IBLTs with listing guarantees obtained by the various methods, as well as their mapping memory consumption. Lastly, we study lower bounds on the achievable sizes of IBLTs with listing guarantees and verify the results in the paper by simulations.
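    The basic structure and the listing (peeling) operation the abstract refers to can be sketched as follows. This is a minimal illustrative IBLT, not one of the paper's guaranteed constructions; the cell layout, table size, and hashing scheme are assumptions chosen for brevity.

```python
# Minimal IBLT sketch (illustrative only). Each key is hashed to one cell in
# each of K blocks; a cell stores a count and an XOR of its keys. Listing
# "peels" pure cells (count == 1) until the table is empty or stuck.
import hashlib

K = 3  # hash functions / blocks per key (assumed)
B = 5  # cells per block (assumed); total table size is K * B

def _cell(key: int, block: int) -> int:
    # Derive a cell index inside the given block from the key (illustrative).
    h = hashlib.sha256(f"{block}:{key}".encode()).digest()
    return block * B + int.from_bytes(h[:4], "big") % B

class IBLT:
    def __init__(self):
        self.count = [0] * (K * B)
        self.key_xor = [0] * (K * B)

    def insert(self, key: int) -> None:
        for b in range(K):
            c = _cell(key, b)
            self.count[c] += 1
            self.key_xor[c] ^= key

    def delete(self, key: int) -> None:
        for b in range(K):
            c = _cell(key, b)
            self.count[c] -= 1
            self.key_xor[c] ^= key

    def list_keys(self):
        # Peeling: a cell with count == 1 holds exactly one key; recover it,
        # delete it, and repeat. Listing fails (returns a partial list) if no
        # pure cell remains while the table is still non-empty.
        recovered = []
        progress = True
        while progress:
            progress = False
            for c in range(K * B):
                if self.count[c] == 1:
                    key = self.key_xor[c]
                    recovered.append(key)
                    self.delete(key)
                    progress = True
        return recovered
```

    Because the peeling process can get stuck whenever no pure cell exists, listing only succeeds with some probability; the worst-case constructions in the paper are precisely about choosing the key-to-cell mapping so that peeling provably never gets stuck below a given set size.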

    Invertible Bloom Lookup Tables with Less Memory and Less Randomness

    In this work we study Invertible Bloom Lookup Tables (IBLTs) with small failure probabilities. IBLTs are highly versatile data structures that have found applications in set reconciliation protocols, error-correcting codes, and even the design of advanced cryptographic primitives. For storing $n$ elements and ensuring correctness with probability at least $1 - \delta$, existing IBLT constructions require $\Omega(n(\frac{\log(1/\delta)}{\log(n)}+1))$ space and they crucially rely on fully random hash functions. We present new constructions of IBLTs that are simultaneously more space efficient and require less randomness. For storing $n$ elements with a failure probability of at most $\delta$, our data structure only requires $\mathcal{O}(n + \log(1/\delta)\log\log(1/\delta))$ space and $\mathcal{O}(\log(\log(n)/\delta))$-wise independent hash functions. As a key technical ingredient, we show that hashing $n$ keys with any $k$-wise independent hash function $h:U \to [Cn]$ for some sufficiently large constant $C$ guarantees with probability $1 - 2^{-\Omega(k)}$ that at least $n/2$ keys will have a unique hash value. Proving this is highly non-trivial as $k$ approaches $n$. We believe that the techniques used to prove this statement may be of independent interest.
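    The key lemma, that hashing $n$ keys into $Cn$ buckets leaves at least $n/2$ keys with a unique hash value, is easy to check empirically. In the sketch below a seeded PRNG stands in for the $k$-wise independent hash family (an assumption for illustration only); $C = 4$ is likewise an arbitrary choice.

```python
# Empirical sketch of the unique-hash-value lemma: hash n keys into C*n
# buckets and count the keys whose bucket is shared with no other key.
# A seeded PRNG is used as a stand-in for a k-wise independent family.
import random
from collections import Counter

def unique_fraction(n: int, C: int = 4, seed: int = 0) -> float:
    rng = random.Random(seed)
    buckets = [rng.randrange(C * n) for _ in range(n)]
    occupancy = Counter(buckets)
    unique = sum(1 for b in buckets if occupancy[b] == 1)
    return unique / n
```

    For $C = 4$ and truly random hashing, a key is unique with probability roughly $e^{-1/4} \approx 0.78$, comfortably above the $1/2$ the lemma asks for; the hard part of the paper's proof is obtaining this guarantee with only $k$-wise independence.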


    Multi-Party Set Reconciliation Using Characteristic Polynomials

    In the standard set reconciliation problem, there are two parties $A_1$ and $A_2$, holding sets of elements $S_1$ and $S_2$, respectively. The goal is for both parties to obtain the union $S_1 \cup S_2$. In many distributed computing settings the sets may be large but the set difference $|S_1 - S_2| + |S_2 - S_1|$ is small. In these cases one aims to achieve reconciliation efficiently in terms of communication; ideally, the communication should depend on the size of the set difference, and not on the size of the sets. Recent work has considered generalizations of the reconciliation problem to multi-party settings, using a framework based on a specific type of linear sketch called an Invertible Bloom Lookup Table. Here, we consider multi-party set reconciliation using the alternative framework of characteristic polynomials, which have previously been used for efficient pairwise set reconciliation protocols, and compare their performance with Invertible Bloom Lookup Tables for these problems. Comment: 6 pages
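    The core idea behind characteristic-polynomial reconciliation can be sketched in a few lines. Define $\chi_S(z) = \prod_{x \in S}(z - x)$ over a prime field; in the ratio $\chi_{S_1}(z)/\chi_{S_2}(z)$ the factors of common elements cancel, so the evaluations a party must communicate depend only on the set difference. The modulus and the sample values below are illustrative assumptions, not choices from the paper.

```python
# Characteristic-polynomial sketch over a prime field: chi_S(z) is the
# product of (z - x) mod P over the set's elements. Common elements cancel
# in the ratio of the two parties' evaluations.
P = 2**31 - 1  # Mersenne prime modulus (assumed choice)

def char_poly_eval(S, z):
    acc = 1
    for x in S:
        acc = acc * ((z - x) % P) % P
    return acc

S1 = {1, 2, 3, 10}
S2 = {1, 2, 3, 99}
z = 12345  # an evaluation point (illustrative)

# Divide via the modular inverse (Fermat's little theorem, P prime).
ratio = char_poly_eval(S1, z) * pow(char_poly_eval(S2, z), P - 2, P) % P

# The common elements 1, 2, 3 cancel: the ratio is (z - 10) / (z - 99) mod P,
# a rational function whose degrees equal the sizes of the two differences.
difference_only = (z - 10) * pow(z - 99, P - 2, P) % P
assert ratio == difference_only
```

    In a full protocol, each party sends evaluations at a few agreed points and the differing elements are recovered by rational-function interpolation, so the communication scales with the difference size rather than the set sizes.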

    Irregular Invertible Bloom Look-Up Tables

    We consider invertible Bloom lookup tables (IBLTs), which are probabilistic data structures that store key-value pairs. An IBLT supports insertion and deletion of key-value pairs, as well as the recovery of all key-value pairs that have been inserted, as long as the number of key-value pairs stored in the IBLT does not exceed a certain threshold. The recovery operation on an IBLT can be represented as a peeling process on a bipartite graph. We present a density evolution analysis of IBLTs which allows predicting the maximum number of key-value pairs that can be inserted into the table so that recovery is still successful with high probability. This analysis holds for arbitrary irregular degree distributions and generalizes results in the literature. We complement our analysis with numerical simulations of our own IBLT design, which allows recovering a larger number of key-value pairs than state-of-the-art IBLTs of the same size. Comment: Accepted for presentation at ISTC 202
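    For the regular special case (every key hashed to $k$ cells), the density-evolution recursion the abstract alludes to can be sketched directly: track $x$, the probability that an edge of the bipartite graph remains unpeeled, and iterate $x \leftarrow (1 - e^{-k\alpha x})^{k-1}$ where $\alpha$ is the load in keys per cell. This is the textbook regular-degree recursion, not the paper's irregular analysis; iteration counts and tolerances below are arbitrary.

```python
# Density-evolution sketch for regular IBLT peeling: k cells per key,
# load alpha = keys per cell. Peeling succeeds w.h.p. iff the recursion
# x <- (1 - exp(-k * alpha * x))**(k - 1) converges to 0.
import math

def peeling_succeeds(alpha: float, k: int = 3, iters: int = 2000) -> bool:
    x = 1.0
    for _ in range(iters):
        x = (1.0 - math.exp(-k * alpha * x)) ** (k - 1)
    return x < 1e-6
```

    For $k = 3$ this recursion predicts a threshold load of about $0.818$ keys per cell, the known 2-core threshold of random 3-uniform hypergraphs; replacing the regular degree with an irregular distribution changes the recursion and is what lets the paper's designs push the recoverable load higher.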

    Efficient Reconciliation of Genomic Datasets of High Similarity

    We apply Invertible Bloom Lookup Tables (IBLTs) to the comparison of k-mer sets originating from large DNA sequence datasets. We show that for similar datasets, IBLTs provide a more space-efficient and, at the same time, more accurate method for estimating the Jaccard similarity of the underlying k-mer sets, compared to MinHash, which is the go-to sketching technique for efficient pairwise similarity estimation. This is achieved by combining IBLTs with k-mer sampling based on syncmers, which constitute a context-independent alternative to minimizers and provide an unbiased estimator of Jaccard similarity. A key property of our method is that the involved data structures require space proportional to the difference of the k-mer sets and are independent of the size of the sets themselves. As another application, we show how our ideas can be applied to efficiently compute (an approximation of) the k-mers that differ between two datasets, still using space only proportional to their number. We experimentally illustrate our results on both simulated and real data (SARS-CoV-2 and Streptococcus pneumoniae genomes).
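    The syncmer sampling step can be sketched as follows. A closed syncmer is a k-mer whose smallest s-mer occurs at the first or last possible offset; since the test looks only at the k-mer itself, selection is context-independent, unlike minimizers. Lexicographic ordering and the parameters below are illustrative assumptions; practical tools typically order s-mers by a hash and work on canonical k-mers.

```python
# Closed-syncmer sampling sketch (parameters k, s are illustrative): keep a
# k-mer iff its lexicographically smallest s-mer sits at the first or last
# offset. The decision depends only on the k-mer, making it context-independent.
def is_closed_syncmer(kmer: str, s: int) -> bool:
    smers = [kmer[i:i + s] for i in range(len(kmer) - s + 1)]
    smallest = min(smers)
    return smers[0] == smallest or smers[-1] == smallest

def sample_kmers(seq: str, k: int, s: int):
    # Slide over the sequence and keep only the k-mers that are syncmers.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)
            if is_closed_syncmer(seq[i:i + k], s)]
```

    Only the sampled k-mers are inserted into the IBLTs, so two highly similar genomes yield sketches that differ in few entries, which is exactly the regime where IBLT size can track the difference rather than the full set.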