Parallel Peeling Algorithms
The analysis of several algorithms and data structures can be framed as a
peeling process on a random hypergraph: vertices with degree less than k are
removed until there are no vertices of degree less than k left. The remaining
hypergraph is known as the k-core. In this paper, we analyze parallel peeling
processes, where in each round, all vertices of degree less than k are removed.
It is known that, below a specific edge density threshold, the k-core is empty
with high probability. We show that, with high probability, below this
threshold, only (log log n)/log((k-1)(r-1)) + O(1) rounds of peeling are needed
to obtain the empty k-core for r-uniform hypergraphs. Interestingly, we show
that above this threshold, Omega(log n) rounds of peeling are required to find
the non-empty k-core. Since most algorithms and data structures aim to peel to
an empty k-core, this asymmetry appears fortunate. We verify the theoretical
results both with simulation and with a parallel implementation using graphics
processing units (GPUs). Our implementation provides insights into how to
structure parallel peeling algorithms for efficiency in practice.
Comment: Appears in SPAA 2014. Minor typo corrections relative to previous
version.
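The parallel process described above can be simulated directly. The sketch below is ours, not the paper's GPU implementation. It assumes a simple random model in which each of the m edges is a uniformly random set of r distinct vertices. In each round it deletes all vertices of degree less than k simultaneously, together with their incident edges, and counts the rounds that delete at least one vertex. `predicted_rounds` evaluates only the leading term of the bound quoted above, so the simulated count should match it only up to an additive constant.

```python
import math
import random

def random_hypergraph(n, m, r, seed=0):
    """m edges, each a uniformly random set of r distinct vertices from range(n).
    Assumption: this is one common random model, not necessarily the paper's."""
    rng = random.Random(seed)
    return [tuple(rng.sample(range(n), r)) for _ in range(m)]

def parallel_peel(n, edges, k):
    """Synchronous peeling: each round deletes ALL vertices of degree < k at once,
    together with their incident edges.

    Returns (rounds, surviving_edge_ids); the survivors form the k-core,
    so an empty set means the k-core is empty.
    """
    incident = [[] for _ in range(n)]
    for e, verts in enumerate(edges):
        for v in verts:
            incident[v].append(e)
    degree = [len(lst) for lst in incident]
    alive_v = set(range(n))
    alive_e = set(range(len(edges)))
    rounds = 0
    while True:
        # The victims of a round are fixed before any deletion in that round.
        doomed = [v for v in alive_v if degree[v] < k]
        if not doomed:
            return rounds, alive_e
        rounds += 1
        for v in doomed:
            alive_v.discard(v)
            for e in incident[v]:
                if e in alive_e:
                    alive_e.discard(e)
                    for u in edges[e]:
                        degree[u] -= 1

def predicted_rounds(n, k, r):
    """Leading term log log n / log((k-1)(r-1)); requires (k-1)(r-1) >= 2."""
    return math.log(math.log(n)) / math.log((k - 1) * (r - 1))

if __name__ == "__main__":
    n, k, r = 100_000, 2, 3
    # Edge density 0.7, below the ~0.818 threshold for k = 2, r = 3.
    edges = random_hypergraph(n, int(0.7 * n), r, seed=1)
    rounds, core = parallel_peel(n, edges, k)
    print(rounds, len(core), round(predicted_rounds(n, k, r), 2))
```

Each round scans all surviving vertices. That is fine for a simulation, but it is not how an efficient parallel implementation would organize the work.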
Cache-Oblivious Peeling of Random Hypergraphs
The computation of a peeling order in a randomly generated hypergraph is the
most time-consuming step in a number of constructions, such as perfect hashing
schemes, random -SAT solvers, error-correcting codes, and approximate set
encodings. While a straightforward linear-time algorithm exists, its poor
I/O performance makes it impractical for hypergraphs whose size exceeds the
available internal memory.
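The straightforward linear-time algorithm can be sketched in a few lines. This is a minimal in-memory version under our own assumptions:

- vertices are the integers 0..n-1;
- each edge lists distinct vertices;
- the one remaining edge of a degree-1 vertex is found by keeping the XOR of its incident edge ids (a common trick, not claimed to be the paper's).

The accesses to `edges[e]`, `degree[u]` and `xor_edges[u]` land at essentially random positions. This is the I/O behaviour that makes the algorithm impractical once the hypergraph exceeds internal memory.

```python
from collections import deque

def peeling_order(n, edges):
    """Sequential linear-time peeling.

    Repeatedly picks a vertex of degree 1 and removes its unique incident edge.
    Returns the removed (edge_id, vertex) pairs in order, or None if a
    non-empty 2-core is left (the hypergraph is not peelable).
    Assumes vertices are 0..n-1 and each edge lists distinct vertices.
    """
    degree = [0] * n
    xor_edges = [0] * n          # XOR of the ids of the edges still incident to each vertex
    for e, verts in enumerate(edges):
        for v in verts:
            degree[v] += 1
            xor_edges[v] ^= e
    queue = deque(v for v in range(n) if degree[v] == 1)
    order = []
    while queue:
        v = queue.popleft()
        if degree[v] != 1:
            continue              # v lost its last edge after being enqueued
        e = xor_edges[v]          # exactly one incident edge left, so the XOR is its id
        order.append((e, v))
        for u in edges[e]:        # random accesses: the source of the poor I/O behaviour
            degree[u] -= 1
            xor_edges[u] ^= e
            if degree[u] == 1:
                queue.append(u)
    return order if len(order) == len(edges) else None
```

For example, `peeling_order(5, [(0, 1, 2), (2, 3, 4)])` returns `[(0, 0), (1, 3)]`. Two copies of the same edge return `None`, because every vertex then has degree 2.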
We show how to reduce the computation of a peeling order to a small number of
sequential scans and sorts, and analyze its I/O complexity in the
cache-oblivious model. The resulting algorithm requires
I/Os and time to peel a random hypergraph with edges.
We experimentally evaluate the performance of our implementation of this
algorithm in a real-world scenario by using the construction of minimal perfect
hash functions (MPHF) as our test case: our algorithm builds an MPHF of
billion keys in less than hours on a single machine. The resulting data
structure is both more space-efficient and faster than that obtained with the
current state-of-the-art MPHF construction for large-scale key sets.
Dense peelable random uniform hypergraphs
We describe a new family of k-uniform hypergraphs with independent random edges. The hypergraphs have a high probability of being peelable, i.e., of admitting no non-empty sub-hypergraph of minimum degree 2, even when the edge density (number of edges over vertices) is close to 1.
In our construction, the vertex set is partitioned into linearly arranged segments and each edge is incident to random vertices of k consecutive segments. Quite surprisingly, the linear geometry allows our graphs to be peeled "from the outside in". The density thresholds f_k for peelability of our hypergraphs (f_3 ~ 0.918, f_4 ~ 0.977, f_5 ~ 0.992, ...) are well beyond the corresponding thresholds (c_3 ~ 0.818, c_4 ~ 0.772, c_5 ~ 0.702, ...) of standard k-uniform random hypergraphs.
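The segmented construction can be sketched as follows. The parameter names (`num_segments`, `segment_size`) and the exact distribution are our assumptions for illustration, and the paper's precise definition may differ. In this sketch the window of k consecutive segments starts at a uniformly random valid position, and the edge takes one uniform vertex from each segment in the window. `is_peelable` checks whether repeatedly deleting an edge that contains a degree-1 vertex empties the hypergraph.

```python
import random
from collections import deque

def coupled_hypergraph(num_segments, segment_size, k, m, seed=0):
    """Hypothetical parameterisation: vertex v lies in segment v // segment_size.
    Each edge picks a window of k consecutive segments uniformly at random and
    one uniform vertex inside each segment of that window."""
    rng = random.Random(seed)
    edges = []
    for _ in range(m):
        start = rng.randrange(num_segments - k + 1)
        edges.append(tuple((start + i) * segment_size + rng.randrange(segment_size)
                           for i in range(k)))
    return edges

def standard_hypergraph(n, k, m, seed=0):
    """Baseline: each edge is a uniformly random set of k distinct vertices."""
    rng = random.Random(seed)
    return [tuple(rng.sample(range(n), k)) for _ in range(m)]

def is_peelable(n, edges):
    """True iff peeling (deleting edges at degree-1 vertices) removes every edge."""
    incident = [[] for _ in range(n)]
    for e, verts in enumerate(edges):
        for v in verts:
            incident[v].append(e)
    degree = [len(lst) for lst in incident]   # number of not-yet-removed incident edges
    removed = [False] * len(edges)
    queue = deque(v for v in range(n) if degree[v] == 1)
    peeled = 0
    while queue:
        v = queue.popleft()
        if degree[v] != 1:
            continue
        e = next(f for f in incident[v] if not removed[f])   # the unique remaining edge
        removed[e] = True
        peeled += 1
        for u in edges[e]:
            degree[u] -= 1
            if degree[u] == 1:
                queue.append(u)
    return peeled == len(edges)

if __name__ == "__main__":
    num_segments, segment_size, k = 200, 50, 3
    n = num_segments * segment_size
    m = int(0.9 * n)   # density 0.9: above c_3 ~ 0.818, below f_3 ~ 0.918
    print("segmented:", is_peelable(n, coupled_hypergraph(num_segments, segment_size, k, m, seed=1)))
    print("standard: ", is_peelable(n, standard_hypergraph(n, k, m, seed=1)))
```

No outcome is claimed for the demo. The thresholds quoted above are asymptotic, and at finite sizes the number of segments and their size both affect whether a particular sample peels.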
To get a grip on f_k, we analyse an idealised peeling process on the random weak limit of our hypergraph family. The process can be described in terms of an operator on [0,1]^Z and f_k can be linked to thresholds relating to the operator. These thresholds are then tractable with numerical methods.
Random hypergraphs underlie the construction of various data structures based on hashing, for instance invertible Bloom filters, perfect hash functions, retrieval data structures, error-correcting codes and cuckoo hash tables, where inputs are mapped to edges using hash functions. Frequently, the data structures rely on peelability of the hypergraph, or peelability allows for simple linear-time algorithms. Memory efficiency is closely tied to edge density, while worst-case and average-case query times are tied to maximum and average edge size.
To demonstrate the usefulness of our construction, we used our 3-uniform hypergraphs as a drop-in replacement for the standard 3-uniform hypergraphs in a retrieval data structure by Botelho et al. [Fabiano Cupertino Botelho et al., 2013]. This reduces memory usage from 1.23m bits to 1.12m bits (m being the input size) with almost no change in running time. Using k > 3 yields further improvements in memory usage at a small cost in running time.
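To make the retrieval use case concrete, here is a minimal sketch of the general peeling-based retrieval idea on 3-uniform hypergraphs. Each key is hashed to an edge, the edges are peeled, and table entries are then assigned in reverse peeling order so that the XOR of an edge's three entries equals the key's value. This is not Botelho et al.'s code. The following are all our assumptions:

- the blake2b-based hashing;
- splitting the vertex set into thirds, which keeps an edge's vertices distinct;
- retrying with a new seed when peeling fails;
- the helper names.

With standard hypergraphs the table needs roughly 1.23 cells per key. Replacing `edge_of` with a segmented construction is where the memory saving described above would come from.

```python
import hashlib
from collections import deque

def edge_of(key, seed, n):
    """Hash a bytes key to 3 vertices, one in each third of range(n).
    Assumption: blake2b salted with the seed serves as the hash family."""
    third = n // 3
    d = hashlib.blake2b(key, digest_size=24, salt=seed.to_bytes(16, "little")).digest()
    return tuple(i * third + int.from_bytes(d[8 * i:8 * i + 8], "little") % third
                 for i in range(3))

def peeling_order(n, edges):
    """(edge, vertex) pairs in peeling order, or None if a non-empty 2-core remains."""
    degree, xor_edges = [0] * n, [0] * n
    for e, verts in enumerate(edges):
        for v in verts:
            degree[v] += 1
            xor_edges[v] ^= e
    queue = deque(v for v in range(n) if degree[v] == 1)
    order = []
    while queue:
        v = queue.popleft()
        if degree[v] != 1:
            continue
        e = xor_edges[v]
        order.append((e, v))
        for u in edges[e]:
            degree[u] -= 1
            xor_edges[u] ^= e
            if degree[u] == 1:
                queue.append(u)
    return order if len(order) == len(edges) else None

def build(items, n, max_seeds=32):
    """items: dict mapping bytes keys to int values. Returns (seed, table)."""
    keys = list(items)
    for seed in range(max_seeds):
        edges = [edge_of(k, seed, n) for k in keys]
        order = peeling_order(n, edges)
        if order is None:
            continue                      # not peelable: retry with fresh hash functions
        table = [0] * n
        for e, v in reversed(order):
            a, b, c = edges[e]
            # table[v] is still 0 here, so this leaves value ^ (the other two entries)
            table[v] = items[keys[e]] ^ table[a] ^ table[b] ^ table[c]
        return seed, table
    raise ValueError("no peelable seed found; increase n")

def query(structure, key):
    """Return the stored value for a key from the build set (arbitrary for other keys)."""
    seed, table = structure
    a, b, c = edge_of(key, seed, len(table))
    return table[a] ^ table[b] ^ table[c]

if __name__ == "__main__":
    m = 10_000
    items = {f"key{i}".encode(): i & 1 for i in range(m)}   # 1-bit values
    n = 3 * (13 * m // 30 + 1)                               # about 1.3 cells per key
    s = build(items, n)
    print(all(query(s, k) == v for k, v in items.items()))
```

Assigning entries in reverse peeling order works because the vertex peeled with an edge belongs to no edge peeled later. Its entry is therefore still free when that edge is processed, and no later assignment in the reverse pass touches the edge's three entries again.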
Simple Set Sketching
Imagine handling collisions in a hash table by storing, in each cell, the
bit-wise exclusive-or of the set of keys hashing there. This appears to be a
terrible idea: For keys and buckets, where is constant,
we expect that a constant fraction of the keys will be unrecoverable due to
collisions.
We show that if this collision resolution strategy is repeated three times
independently the situation reverses: If is below a threshold of
then we can recover the set of all inserted keys in linear time
with high probability.
Even though the description of our data structure is simple, its analysis is
nontrivial. Our approach can be seen as a variant of the Invertible Bloom
Filter (IBF) of Eppstein and Goodrich. While IBFs involve an explicit checksum
per bucket to decide whether the bucket stores a single key, we exploit the
idea of quotienting, namely that some bits of the key are implicit in the
location where it is stored. We let those serve as an implicit checksum. These
bits are not quite enough to ensure that no errors occur and the main technical
challenge is to show that decoding can recover from these errors.
Comment: To be published at SIAM Symposium on Simplicity in Algorithms
(SOSA23).
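The collision-resolution idea can be sketched as follows. This is a simplified illustration under our own assumptions, not the paper's data structure:

- keys are nonzero 64-bit integers and are stored in full, whereas the paper stores fewer bits via quotienting;
- the per-table hash is blake2b salted with the table index;
- the implicit check is simply that a cell's content hashes back to that cell.

XOR-ing a value into its three cells is an involution. A possibly spurious extraction is therefore recorded as a toggle in the output set and is undone if the same value is extracted again. Decoding is declared successful only if every cell ends up empty.

```python
import hashlib
import random

NUM_TABLES = 3

def cell(key, t, b):
    """Cell index of a 64-bit key in table t (assumption: blake2b salted with t)."""
    d = hashlib.blake2b(key.to_bytes(8, "little"), digest_size=8,
                        salt=t.to_bytes(16, "little")).digest()
    return int.from_bytes(d, "little") % b

def sketch(keys, b):
    """Three independent tables of b cells; each cell holds the XOR of the keys hashing there."""
    tables = [[0] * b for _ in range(NUM_TABLES)]
    for x in keys:
        for t in range(NUM_TABLES):
            tables[t][cell(x, t, b)] ^= x
    return tables

def decode(tables):
    """Recover the key set by peeling, or return None on failure.

    A cell is treated as holding a single key x when x != 0 and x hashes back to
    that very cell (the implicit check). Extracting x XORs it out of all its
    cells and toggles it in the output, so a spurious extraction is undone if
    the same value is extracted again later.
    """
    b = len(tables[0])
    tables = [row[:] for row in tables]            # do not modify the caller's sketch
    out = set()
    work = [(t, c) for t in range(NUM_TABLES) for c in range(b)]
    budget = 10 * NUM_TABLES * b                    # cap on extractions, guards against cycling
    while work and budget > 0:
        t, c = work.pop()
        x = tables[t][c]
        if x == 0 or cell(x, t, b) != c:
            continue                                # empty, or fails the implicit check
        budget -= 1
        out ^= {x}                                  # toggle membership
        for u in range(NUM_TABLES):
            j = cell(x, u, b)
            tables[u][j] ^= x
            work.append((u, j))                     # only changed cells can become decodable
    # Success only if every cell is empty: the tables then represent the empty set.
    if any(any(row) for row in tables):
        return None
    return out

if __name__ == "__main__":
    rng = random.Random(7)
    b = 1000
    keys = {rng.randrange(1, 2**64) for _ in range(700)}   # about 0.7 keys per cell per table
    print(decode(sketch(keys, b)) == keys)
```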
Invertible Bloom Lookup Tables with Listing Guarantees
The Invertible Bloom Lookup Table (IBLT) is a probabilistic, concise data
structure for set representation that supports a listing operation, which
recovers the elements of the represented set. Its applications can be found
in network synchronization and traffic monitoring as well as in
error-correcting codes. An IBLT lists its elements successfully with a
probability that depends on the size of the allocated memory and the size of
the represented set, so it can fail with small probability even for
relatively small sets. While previous works only studied the failure
probability of IBLTs, this work initiates the worst-case analysis of IBLTs
that guarantee successful listing for all sets of a certain size. The
worst-case study is important since the failure of an IBLT imposes high
overhead. We describe a novel approach that
guarantees successful listing when the set satisfies a tunable upper bound on
its size. To allow that, we develop multiple constructions that are based on
various coding techniques such as stopping sets and the stopping redundancy of
error-correcting codes, Steiner systems, and covering arrays as well as new
methodologies we develop. We analyze the sizes of IBLTs with listing guarantees
obtained by the various methods as well as their mapping memory consumption.
Lastly, we study lower bounds on the achievable sizes of IBLTs with listing
guarantees and verify the results in the paper by simulations.
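For comparison, here is a minimal sketch of a conventional randomized IBLT, the kind of structure whose listing can fail with small probability. It does not implement the paper's constructions with listing guarantees (stopping sets, Steiner systems, covering arrays). The following are assumptions for illustration:

- the cell layout: a count, the XOR of keys and the XOR of per-key checksums;
- k subtables with one cell per key in each subtable;
- blake2b hashing;
- the checksum salt `CHECKSUM_SALT`.

```python
import copy
import hashlib

CHECKSUM_SALT = 2**32   # hypothetical salt, distinct from the subtable salts 0..k-1

def _hash(x, salt, mod):
    """blake2b of a 64-bit integer, salted, reduced modulo mod (assumed hash family)."""
    d = hashlib.blake2b(x.to_bytes(8, "little"), digest_size=8,
                        salt=salt.to_bytes(16, "little")).digest()
    return int.from_bytes(d, "little") % mod

class IBLT:
    """Randomized IBLT sketch: k subtables, each cell = (count, key XOR, checksum XOR)."""

    def __init__(self, cells_per_table, k=3):
        self.m, self.k = cells_per_table, k
        self.count = [[0] * cells_per_table for _ in range(k)]
        self.key_xor = [[0] * cells_per_table for _ in range(k)]
        self.check_xor = [[0] * cells_per_table for _ in range(k)]

    def _update(self, x, delta):
        chk = _hash(x, CHECKSUM_SALT, 2**64)
        for t in range(self.k):
            j = _hash(x, t, self.m)        # one cell per subtable, so the k cells are distinct
            self.count[t][j] += delta
            self.key_xor[t][j] ^= x
            self.check_xor[t][j] ^= chk

    def insert(self, x):
        self._update(x, 1)

    def delete(self, x):
        self._update(x, -1)

    def list_entries(self):
        """Peel pure cells. Returns (inserted, deleted) sets, or None if listing fails."""
        work = copy.deepcopy(self)         # listing must not destroy the sketch
        inserted, deleted = set(), set()
        progress = True
        while progress:
            progress = False
            for t in range(work.k):
                for j in range(work.m):
                    cnt, x = work.count[t][j], work.key_xor[t][j]
                    # Pure cell: one inserted (+1) or deleted (-1) key whose checksum matches.
                    if cnt in (1, -1) and work.check_xor[t][j] == _hash(x, CHECKSUM_SALT, 2**64):
                        (inserted if cnt == 1 else deleted).add(x)
                        work._update(x, -cnt)   # remove x from all of its cells
                        progress = True
        fields = (work.count, work.key_xor, work.check_xor)
        empty = all(not any(row) for tbl in fields for row in tbl)
        return (inserted, deleted) if empty else None

if __name__ == "__main__":
    s = IBLT(cells_per_table=50)
    for x in range(1, 1001):
        s.insert(x)                        # set A
    for x in range(11, 1011):
        s.delete(x)                        # set B
    print(s.list_entries())               # lists A \ B and B \ A when listing succeeds
```

Listing fails, and the method returns `None`, when the remaining keys form a 2-core in the cell hypergraph. This is the probabilistic failure mode that motivates the worst-case guarantees studied in the paper.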