Invertible Bloom Lookup Tables with Listing Guarantees

Authors: Daniella Bar-Lev, Avi Mizrahi, Ori Rottenstreich, Eitan Yaakobi
Publication date: 28/12/2022

The Invertible Bloom Lookup Table (IBLT) is a probabilistic, concise data structure for set representation that supports a listing operation, i.e., the recovery of the elements in the represented set. Its applications can be found in network synchronization and traffic monitoring as well as in error-correction codes. IBLT can list its elements with a probability that depends on the size of the allocated memory and the size of the represented set, so listing can fail with small probability even for relatively small sets. While previous works only studied the failure probability of IBLT, this work initiates the worst-case analysis of IBLT, which guarantees successful listing for all sets of a certain size. The worst-case study is important since the failure of IBLT imposes high overhead. We describe a novel approach that guarantees successful listing when the set satisfies a tunable upper bound on its size. To allow that, we develop multiple constructions that are based on various coding techniques, such as stopping sets and the stopping redundancy of error-correcting codes, Steiner systems, and covering arrays, as well as new methodologies we develop. We analyze the sizes of IBLTs with listing guarantees obtained by the various methods as well as their mapping memory consumption. Lastly, we study lower bounds on the achievable sizes of IBLT with listing guarantees and verify the results in the paper by simulations.
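The listing operation described above can be sketched as follows. This is a minimal, hypothetical Python illustration of a standard randomized IBLT (per-cell count, key XOR and checksum XOR; listing by repeatedly "peeling" pure cells). It does not implement the paper's constructions with listing guarantees, so listing here can still fail with small probability. The class and method names, the BLAKE2b-based hashing, and the partitioned table layout are assumptions.

```python
import hashlib


def _hash(key: int, salt: str, mod: int) -> int:
    """Deterministic salted hash of an integer key into range(mod)."""
    digest = hashlib.blake2b(f"{salt}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % mod


class IBLT:
    """Hypothetical minimal IBLT over non-negative integer keys."""

    def __init__(self, m: int, k: int = 3):
        self.m, self.k = m, k
        self.count = [0] * m    # signed number of keys mapped to each cell
        self.key_xor = [0] * m  # XOR of the keys mapped to each cell
        self.chk_xor = [0] * m  # XOR of key checksums, used to detect pure cells

    def _cells(self, key: int) -> list[int]:
        # One cell in each of k disjoint sub-tables, so a key's k cells are distinct.
        part = self.m // self.k
        return [i * part + _hash(key, f"cell{i}", part) for i in range(self.k)]

    def _update(self, key: int, delta: int) -> None:
        chk = _hash(key, "chk", 2**32)
        for c in self._cells(key):
            self.count[c] += delta
            self.key_xor[c] ^= key
            self.chk_xor[c] ^= chk

    def insert(self, key: int) -> None:
        self._update(key, +1)

    def delete(self, key: int) -> None:
        self._update(key, -1)

    def list_entries(self):
        """Peel pure cells on a copy; return (inserted, deleted, success)."""
        count, kx, cx = self.count[:], self.key_xor[:], self.chk_xor[:]
        inserted, deleted = set(), set()
        progress = True
        while progress:
            progress = False
            for i in range(self.m):
                if count[i] not in (1, -1):
                    continue
                key, chk = kx[i], _hash(kx[i], "chk", 2**32)
                if cx[i] != chk:
                    continue  # impure cell: several keys XORed together
                sign = count[i]
                (inserted if sign == 1 else deleted).add(key)
                for c in self._cells(key):  # remove the key from all its cells
                    count[c] -= sign
                    kx[c] ^= key
                    cx[c] ^= chk
                progress = True
        # Listing succeeded only if every cell was emptied.
        success = not any(count) and not any(kx) and not any(cx)
        return inserted, deleted, success
```

For example, inserting the keys 1 to 20 into `IBLT(m=300)` and deleting the absent key 999 should, with high probability, list the inserted keys in `inserted` and 999 in `deleted`. The partitioned layout is one common way to guarantee that each key touches distinct cells.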

arXiv.org e-Print Archive

Invertible Bloom Lookup Tables with Less Memory and Less Randomness

Authors: Nils Fleischhacker, Kasper Green Larsen, Maciej Obremski, Mark Simkin
Publication date: 13/06/2023

In this work we study Invertible Bloom Lookup Tables (IBLTs) with small failure probabilities. IBLTs are highly versatile data structures that have found applications in set reconciliation protocols, error-correcting codes, and even the design of advanced cryptographic primitives. For storing $n$ elements and ensuring correctness with probability at least $1 - \delta$, existing IBLT constructions require $\Omega(n(\frac{\log(1/\delta)}{\log(n)}+1))$ space and they crucially rely on fully random hash functions. We present new constructions of IBLTs that are simultaneously more space efficient and require less randomness. For storing $n$ elements with a failure probability of at most $\delta$, our data structure only requires $\mathcal{O}(n + \log(1/\delta)\log\log(1/\delta))$ space and $\mathcal{O}(\log(\log(n)/\delta))$-wise independent hash functions. As a key technical ingredient we show that hashing $n$ keys with any $k$-wise independent hash function $h:U \to [Cn]$ for some sufficiently large constant $C$ guarantees with probability $1 - 2^{-\Omega(k)}$ that at least $n/2$ keys will have a unique hash value. Proving this is highly non-trivial as $k$ approaches $n$. We believe that the techniques used to prove this statement may be of independent interest.
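The key technical statement above (at least $n/2$ of $n$ keys receive a unique hash value under a $k$-wise independent $h:U \to [Cn]$) can be checked empirically with a small experiment. This is a hypothetical Python sketch, not the authors' proof or construction: it uses the classic family of random polynomials of degree $k-1$ over a prime field, which is $k$-wise independent, followed by a reduction modulo $Cn$, which makes the final hash only approximately $k$-wise independent. The choices $P = 2^{61}-1$, $k = 16$, $C = 8$ and the function names are assumptions for illustration.

```python
import random
from collections import Counter

P = (1 << 61) - 1  # Mersenne prime; keys are assumed to lie in [0, P)


def make_kwise_hash(k: int, range_size: int, rng: random.Random):
    """Random degree-(k-1) polynomial over GF(P), reduced mod range_size."""
    coeffs = [rng.randrange(P) for _ in range(k)]

    def h(x: int) -> int:
        acc = 0
        for a in coeffs:  # Horner's rule evaluation mod P
            acc = (acc * x + a) % P
        return acc % range_size

    return h


def count_unique(keys, h) -> int:
    """Number of keys whose hash value is shared with no other key."""
    values = [h(x) for x in keys]
    freq = Counter(values)
    return sum(1 for v in values if freq[v] == 1)


rng = random.Random(2023)
n, C, k = 1000, 8, 16
keys = rng.sample(range(1, 10**9), n)
h = make_kwise_hash(k, C * n, rng)
unique = count_unique(keys, h)
print(f"{unique} of {n} keys have a unique hash value")
```

With a fully random hash into $Cn = 8n$ buckets, roughly a $(1 - 1/(Cn))^{n-1} \approx e^{-1/8} \approx 0.88$ fraction of keys would be unique, so the experiment is expected to land well above $n/2$; it illustrates the statement but of course proves nothing about the worst case.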

arXiv.org e-Print Archive

Invertible Bloom Lookup Tables with Less Memory and Randomness

Authors: Kasper Green Larsen, Maciej Obremski, Mark Simkin, Nils Fleischhacker
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 12/06/2023
Cryptology ePrint Archive

Multi-Party Set Reconciliation Using Characteristic Polynomials

Authors: Anudhyan Boral, Michael Mitzenmacher
Publication date: 09/10/2014

In the standard set reconciliation problem, there are two parties $A_1$ and $A_2$, holding sets of elements $S_1$ and $S_2$, respectively. The goal is for both parties to obtain the union $S_1 \cup S_2$. In many distributed computing settings the sets may be large but the set difference $|S_1 - S_2| + |S_2 - S_1|$ is small. In these cases one aims to achieve reconciliation efficiently in terms of communication; ideally, the communication should depend on the size of the set difference, and not on the size of the sets. Recent work has considered generalizations of the reconciliation problem to multi-party settings, using a framework based on a specific type of linear sketch called an Invertible Bloom Lookup Table. Here, we consider multi-party set reconciliation using the alternative framework of characteristic polynomials, which have previously been used for efficient pairwise set reconciliation protocols, and compare their performance with Invertible Bloom Lookup Tables for these problems.
Comment: 6 pages
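The characteristic-polynomial framework mentioned above rests on one identity. For a set $S$ over a prime field, $\chi_S(z) = \prod_{s \in S}(z - s)$, and for any $z \notin S_1 \cup S_2$ the common elements cancel: $\chi_{S_1}(z)/\chi_{S_2}(z) = \chi_{S_1 - S_2}(z)/\chi_{S_2 - S_1}(z)$. The hypothetical Python sketch below only checks this identity at a few evaluation points. A full pairwise protocol would additionally interpolate this rational function from roughly $|S_1 - S_2| + |S_2 - S_1|$ evaluations and factor it, and the multi-party protocols studied in the paper are not shown. The prime modulus and the function names are assumptions.

```python
P = 2**31 - 1  # prime modulus; elements and evaluation points assumed in [0, P)


def char_poly_eval(s: set[int], z: int) -> int:
    """Evaluate chi_S(z) = prod_{x in S} (z - x) mod P."""
    acc = 1
    for x in s:
        acc = acc * (z - x) % P
    return acc


def ratio(s1: set[int], s2: set[int], z: int) -> int:
    """chi_{S1}(z) / chi_{S2}(z) mod P; requires z not in S2."""
    return char_poly_eval(s1, z) * pow(char_poly_eval(s2, z), -1, P) % P


S1 = {1, 2, 3, 4, 5, 10}
S2 = {1, 2, 3, 4, 5, 20}
for z in (100, 101, 102):  # evaluation points outside S1 | S2
    # Only the set difference survives: (z - 10) / (z - 20).
    assert ratio(S1, S2, z) == ratio(S1 - S2, S2 - S1, z)
```

This cancellation is why the communication can depend only on the size of the set difference: each party sends evaluations of its own $\chi$, and the shared elements drop out of the ratio.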

arXiv.org e-Print Archive

Crossref

Irregular Invertible Bloom Look-Up Tables

Authors: Francisco Lázaro, Balázs Matuz
Publication date: 01/01/2021

We consider invertible Bloom lookup tables (IBLTs), which are probabilistic data structures that allow storing key-value pairs. An IBLT supports insertion and deletion of key-value pairs, as well as the recovery of all key-value pairs that have been inserted, as long as the number of key-value pairs stored in the IBLT does not exceed a certain number. The recovery operation on an IBLT can be represented as a peeling process on a bipartite graph. We present a density evolution analysis of IBLTs which allows us to predict the maximum number of key-value pairs that can be inserted in the table so that recovery is still successful with high probability. This analysis holds for arbitrary irregular degree distributions and generalizes results in the literature. We complement our analysis by numerical simulations of our own IBLT design, which allows recovery of a larger number of key-value pairs than state-of-the-art IBLTs of the same size.
Comment: Accepted for presentation at ISTC 202
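The view of recovery as peeling on a bipartite graph (keys on one side, cells on the other) can be illustrated with an irregular IBLT, in which each key is mapped to a number of cells drawn from a degree distribution. The following is a hypothetical Python sketch: the distribution `DEGREES`, the per-key seeding of `random.Random`, and all names are assumptions chosen for illustration, not the design optimized in the paper, and the density evolution analysis itself is not reproduced. Only insertions are modelled, so a pure cell has count 1.

```python
import hashlib
import random

# Hypothetical key-degree distribution: degree -> probability (not the paper's design).
DEGREES = {2: 0.2, 3: 0.6, 6: 0.2}


def key_cells(key: int, m: int) -> list[int]:
    """Deterministically draw a degree for the key, then that many distinct cells."""
    rng = random.Random(key)  # seeded per key so insertion and peeling agree
    degree = rng.choices(list(DEGREES), weights=list(DEGREES.values()))[0]
    return rng.sample(range(m), degree)


def checksum(key: int) -> int:
    digest = hashlib.blake2b(str(key).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def build(keys, m: int):
    table = [[0, 0, 0] for _ in range(m)]  # [count, key XOR, checksum XOR]
    for key in keys:
        for c in key_cells(key, m):
            table[c][0] += 1
            table[c][1] ^= key
            table[c][2] ^= checksum(key)
    return table


def peel(table, m: int) -> set[int]:
    """Repeatedly remove keys found in pure cells (count 1, matching checksum)."""
    table = [row[:] for row in table]
    recovered = set()
    stack = list(range(m))
    while stack:
        i = stack.pop()
        count, key, chk = table[i]
        if count != 1 or chk != checksum(key):
            continue
        recovered.add(key)
        for c in key_cells(key, m):  # delete the key's edges from the graph
            table[c][0] -= 1
            table[c][1] ^= key
            table[c][2] ^= chk
            stack.append(c)  # neighbouring cells may have become pure
    return recovered
```

For instance, `peel(build(range(1, 51), 1000), 1000)` is expected to recover all 50 keys; peeling stops early, and recovery is partial, when the remaining keys form a stopping set in which every nonempty cell holds at least two keys.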

arXiv.org e-Print Archive

Institute of Transport Research:Publications

Efficient Reconciliation of Genomic Datasets of High Similarity

Authors: Djamal Belazzougui, Gregory Kucherov, Yoshihiro Shibuya
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)
Publication date: 01/01/2022

We apply Invertible Bloom Lookup Tables (IBLTs) to the comparison of k-mer sets originating from large DNA sequence datasets. We show that for similar datasets, IBLTs provide a more space-efficient and, at the same time, more accurate method for estimating Jaccard similarity of underlying k-mer sets, compared to MinHash, which is a go-to sketching technique for efficient pairwise similarity estimation. This is achieved by combining IBLTs with k-mer sampling based on syncmers, which constitute a context-independent alternative to minimizers and provide an unbiased estimator of Jaccard similarity. A key property of our method is that the involved data structures require space proportional to the difference of the k-mer sets and are independent of the size of the sets themselves. As another application, we show how our ideas can be applied to efficiently compute (an approximation of) the k-mers that differ between two datasets, still using space only proportional to their number. We experimentally illustrate our results on both simulated and real data (SARS-CoV-2 and Streptococcus pneumoniae genomes).
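The key property mentioned above, that the sketch size depends on the size of the k-mer set difference rather than on the sets themselves, follows from IBLT subtraction: elements common to both sets cancel cell by cell, and decoding the difference table yields $A \setminus B$ and $B \setminus A$. With $|A|$ and $|B|$ known, $|A \cap B| = |A| - |A \setminus B|$ and $|A \cup B| = |A| + |B| - |A \cap B|$, which gives the Jaccard similarity. The Python below is a hypothetical sketch of that idea only: integers stand in for hashed k-mers, syncmer sampling is omitted, and the table size `M`, cell count `K` and helper names are assumptions. Decoding fails if the difference is too large for the table.

```python
import hashlib

M, K = 300, 3  # hypothetical table size and number of cells per element


def _h(x: int, salt: object, mod: int) -> int:
    d = hashlib.blake2b(f"{salt}:{x}".encode(), digest_size=8).digest()
    return int.from_bytes(d, "big") % mod


def cells(x: int) -> list[int]:
    part = M // K  # one cell per disjoint sub-table
    return [i * part + _h(x, i, part) for i in range(K)]


def encode(s) -> list[list[int]]:
    table = [[0, 0, 0] for _ in range(M)]  # [count, key XOR, checksum XOR]
    for x in s:
        chk = _h(x, "chk", 2**32)
        for c in cells(x):
            table[c][0] += 1
            table[c][1] ^= x
            table[c][2] ^= chk
    return table


def subtract(a, b):
    """Cell-wise difference; elements present in both sets cancel out."""
    return [[p[0] - q[0], p[1] ^ q[1], p[2] ^ q[2]] for p, q in zip(a, b)]


def decode(table):
    """Peel the difference table into (A minus B, B minus A, success)."""
    table = [row[:] for row in table]
    only_a, only_b = set(), set()
    changed = True
    while changed:
        changed = False
        for row in table:
            cnt, x, chk = row
            if cnt in (1, -1) and chk == _h(x, "chk", 2**32):
                (only_a if cnt == 1 else only_b).add(x)
                for c in cells(x):
                    table[c][0] -= cnt
                    table[c][1] ^= x
                    table[c][2] ^= chk
                changed = True
    success = all(row == [0, 0, 0] for row in table)
    return only_a, only_b, success


def jaccard(size_a: int, size_b: int, table_a, table_b) -> float:
    only_a, only_b, ok = decode(subtract(table_a, table_b))
    if not ok:
        raise ValueError("difference too large to decode with this table size")
    intersection = size_a - len(only_a)
    return intersection / (size_a + size_b - intersection)
```

For example, with `A = set(range(1000))` and `B = set(range(5, 1005))`, both tables have 300 cells even though each set has 1000 elements; decoding the difference recovers the 10 differing elements, giving an intersection of 995 and a union of 1005.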

Dagstuhl Research Online Publication Server

HAL-Ecole des Ponts ParisTech