The computation of a peeling order in a randomly generated hypergraph is the
most time-consuming step in a number of constructions, such as perfect hashing
schemes, random r-SAT solvers, error-correcting codes, and approximate set
encodings. While there exists a straightforward linear time algorithm, its poor
I/O performance makes it impractical for hypergraphs whose size exceeds the
available internal memory.
We show how to reduce the computation of a peeling order to a small number of
sequential scans and sorts, and analyze its I/O complexity in the
cache-oblivious model. The resulting algorithm requires O(sort(n))
I/Os and O(nlogn) time to peel a random hypergraph with n edges.
We experimentally evaluate the performance of our implementation of this
algorithm in a real-world scenario by using the construction of minimal perfect
hash functions (MPHF) as our test case: our algorithm builds a MPHF of 7.6
billion keys in less than 21 hours on a single machine. The resulting data
structure is both more space-efficient and faster than that obtained with the
current state-of-the-art MPHF construction for large-scale key sets