1,055 research outputs found

Fast Scalable Construction of (Minimal Perfect Hash) Functions

Author: A Goerdt
AM Frieze
AM Odlyzko
BA LaMacchia
BS Majewski
D Belazzougui
FC Botelho
M Aumüller
M Dietzfelbinger
N Fountoulakis
Publication date: 22/03/2016

Recent advances in random linear systems on finite fields have paved the way for the construction of constant-time data structures representing static functions and minimal perfect hash functions using less space than existing techniques. The main obstruction for any practical application of these results is the cubic-time Gaussian elimination required to solve these linear systems: although they can be made very small, the computation is still too slow to be feasible. In this paper we describe in detail a number of heuristics and programming techniques to speed up the solution of these systems by several orders of magnitude, making the overall construction competitive with the standard and widely used MWHC technique, which is based on hypergraph peeling. In particular, we introduce broadword programming techniques for fast equation manipulation and a lazy Gaussian elimination algorithm. We also describe a number of technical improvements to the data structure which further reduce space usage and improve lookup speed. Our implementation of these techniques yields a minimal perfect hash function data structure occupying 2.24 bits per element, compared to 2.68 for MWHC-based ones, and a static function data structure which reduces the multiplicative overhead from 1.23 to 1.03.
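
To make the idea of bit-packed equation manipulation concrete, here is a minimal sketch of plain Gaussian elimination over GF(2), assuming the field has two elements and each equation is stored as a Python integer bitmask. It is not the lazy elimination algorithm of the paper, and the name `solve_gf2` and its interface are hypothetical.

```python
def solve_gf2(rows, rhs, num_vars):
    """Solve A x = b over GF(2) (illustrative sketch, not the paper's method).

    rows[i] is an int whose bit j is the coefficient of x_j in equation i;
    rhs[i] is 0 or 1. Returns a list of 0/1 values, or None if the system
    is inconsistent. Free variables are set to 0.
    """
    rows = list(rows)
    rhs = list(rhs)
    pivot_of_col = {}  # pivot column -> index of the equation that owns it
    for i in range(len(rows)):
        # Remove every existing pivot variable from equation i.
        for col, p in pivot_of_col.items():
            if (rows[i] >> col) & 1:
                rows[i] ^= rows[p]  # a single XOR adds two whole equations
                rhs[i] ^= rhs[p]
        if rows[i] == 0:
            if rhs[i]:
                return None  # reduced to 0 = 1: no solution
            continue  # redundant equation
        # Use the lowest set bit of the reduced equation as its pivot.
        col = (rows[i] & -rows[i]).bit_length() - 1
        # Clear the new pivot variable from the earlier pivot equations.
        for p in pivot_of_col.values():
            if (rows[p] >> col) & 1:
                rows[p] ^= rows[i]
                rhs[p] ^= rhs[i]
        pivot_of_col[col] = i
    x = [0] * num_vars
    for col, p in pivot_of_col.items():
        x[col] = rhs[p]
    return x
```

Packing coefficients into machine words is what lets one XOR combine many coefficients at once; the sketch leaves the cubic worst-case cost untouched.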

arXiv.org e-Print Archive

Crossref

Cache-Oblivious Peeling of Random Hypergraphs

Author: Belazzougui Djamal
Boldi Paolo
Ottaviano Giuseppe
Venturini Rossano
Vigna Sebastiano
Publication date: 02/12/2013

The computation of a peeling order in a randomly generated hypergraph is the most time-consuming step in a number of constructions, such as perfect hashing schemes, random $r$-SAT solvers, error-correcting codes, and approximate set encodings. While there exists a straightforward linear time algorithm, its poor I/O performance makes it impractical for hypergraphs whose size exceeds the available internal memory. We show how to reduce the computation of a peeling order to a small number of sequential scans and sorts, and analyze its I/O complexity in the cache-oblivious model. The resulting algorithm requires $O(\mathrm{sort}(n))$ I/Os and $O(n \log n)$ time to peel a random hypergraph with $n$ edges. We experimentally evaluate the performance of our implementation of this algorithm in a real-world scenario by using the construction of minimal perfect hash functions (MPHF) as our test case: our algorithm builds a MPHF of 7.6 billion keys in less than 21 hours on a single machine. The resulting data structure is both more space-efficient and faster than that obtained with the current state-of-the-art MPHF construction for large-scale key sets.
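
For orientation, the straightforward in-memory peeling algorithm that the abstract contrasts with can be sketched as follows. This is a minimal sketch, assuming edges are tuples of distinct vertex identifiers; the function name `peel` is hypothetical, and this is the simple baseline, not the cache-oblivious algorithm of the paper.

```python
from collections import defaultdict


def peel(edges):
    """Return a peeling order [(edge_index, free_vertex), ...], or None.

    An edge can be peeled when one of its vertices belongs to no other
    remaining edge. None means peeling got stuck before all edges were removed.
    """
    incident = defaultdict(set)  # vertex -> indices of remaining edges
    for e, verts in enumerate(edges):
        for v in verts:
            incident[v].add(e)
    # Start from every vertex that currently has degree one.
    stack = [v for v, es in incident.items() if len(es) == 1]
    order = []
    while stack:
        v = stack.pop()
        if len(incident[v]) != 1:
            continue  # degree changed since v was pushed
        (e,) = incident[v]
        order.append((e, v))
        # Remove edge e; its other vertices may now have degree one.
        for u in edges[e]:
            incident[u].discard(e)
            if len(incident[u]) == 1:
                stack.append(u)
    return order if len(order) == len(edges) else None
```

Each step touches an essentially random vertex, which is the access pattern that performs poorly once the hypergraph no longer fits in internal memory.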

arXiv.org e-Print Archive

Crossref

AIR Universita degli studi di Milano

Archivio della Ricerca - Università di Pisa

An Epitome of Multi Secret Sharing Schemes for General Access Structure

Author: Binu V P
Sreekumar A
Publication date: 21/06/2014

Secret sharing schemes are now widely used in applications that need more security, trust and reliability. In a secret sharing scheme, the secret is divided among the participants, and only an authorized set of participants can recover the secret by combining their shares. The collection of authorized sets of participants is called the access structure of the scheme. In a Multi-Secret Sharing Scheme (MSSS), k different secrets are distributed among the participants, each one according to an access structure. Multi-secret sharing schemes have been studied extensively by the cryptographic community. A number of schemes with various features have been proposed for threshold multi-secret sharing and for multi-secret sharing according to a generalized access structure. In this survey we explore the important constructions of multi-secret sharing for the generalized access structure, with their merits and demerits. The evaluation considers features such as whether shares can be reused, whether participants can be enrolled or dis-enrolled efficiently, and whether shares have to be modified in the renewal phase.
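
As a minimal illustration of the basic definition only, the sketch below shares a single secret so that the one authorized set is the set of all participants (an n-out-of-n scheme built from XOR). It is not one of the surveyed multi-secret constructions; the function names are hypothetical, and it assumes n is at least 2.

```python
import secrets
from functools import reduce


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_secret(secret, n):
    """Split `secret` (bytes) into n shares; all n are needed to recover it."""
    # n - 1 shares are uniformly random; the last one is chosen so that
    # the XOR of all n shares equals the secret.
    shares = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    last = reduce(xor_bytes, shares, secret)
    return shares + [last]


def recover_secret(shares):
    """Combine every share; any proper subset is just random bytes."""
    return reduce(xor_bytes, shares)
```

General access structures allow many different authorized sets, which is where the constructions compared in the survey come in.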

arXiv.org e-Print Archive

CiteSeerX

The parameterised complexity of counting connected subgraphs and graph motifs

Author: Jerrum Mark
Meeks Kitty
Publication venue: Elsevier BV
Publication date: 01/01/2014

We introduce a family of parameterised counting problems on graphs, p-#Induced Subgraph With Property(Φ), which generalises a number of problems which have previously been studied. This paper focuses on the case in which Φ defines a family of graphs whose edge-minimal elements all have bounded treewidth; this includes the special case in which Φ describes the property of being connected. We show that exactly counting the number of connected induced k-vertex subgraphs in an n-vertex graph is #W[1]-hard, but on the other hand there exists an FPTRAS for the problem; more generally, we show that there exists an FPTRAS for p-#Induced Subgraph With Property(Φ) whenever Φ is monotone and all the minimal graphs satisfying Φ have bounded treewidth. We then apply these results to a counting version of the Graph Motif problem
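
To pin down the quantity being counted, here is a minimal brute-force sketch, assuming the graph is given as a dictionary mapping each vertex to its neighbours and that k is at least 1. The name `count_connected_induced` is hypothetical; the code examines every k-subset of vertices, so it is an exhaustive baseline and not the approximation scheme from the paper.

```python
from itertools import combinations


def count_connected_induced(adj, k):
    """Count k-vertex subsets of the graph that induce a connected subgraph."""

    def connected(vertices):
        allowed = set(vertices)
        start = next(iter(allowed))
        seen = {start}
        frontier = [start]
        # Depth-first search restricted to the chosen vertices.
        while frontier:
            u = frontier.pop()
            for w in adj[u]:
                if w in allowed and w not in seen:
                    seen.add(w)
                    frontier.append(w)
        return len(seen) == len(allowed)

    return sum(1 for vs in combinations(adj, k) if connected(vs))
```

On an n-vertex graph this tests on the order of n^k subsets, which is the kind of dependence on k that the hardness result suggests is difficult to avoid for exact counting.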

arXiv.org e-Print Archive

CiteSeerX

Crossref

Enlighten

Learned Monotone Minimal Perfect Hashing

Author: Ferragina Paolo
Lehmann Hans-Peter
Sanders Peter
Vinciguerra Giorgio
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Publication date: 12/09/2023

A Monotone Minimal Perfect Hash Function (MMPHF) constructed on a set S of keys is a function that maps each key in S to its rank. On keys not in S, the function returns an arbitrary value. Applications range from databases, search engines, data encryption, to pattern-matching algorithms. In this paper, we describe LeMonHash, a new technique for constructing MMPHFs for integers. The core idea of LeMonHash is surprisingly simple and effective: we learn a monotone mapping from keys to their rank via an error-bounded piecewise linear model (the PGM-index), and then we solve the collisions that might arise among keys mapping to the same rank estimate by associating small integers with them in a retrieval data structure (BuRR). On synthetic random datasets, LeMonHash needs 34% less space than the next larger competitor, while achieving about 16 times faster queries. On real-world datasets, the space usage is very close to or much better than the best competitors, while achieving up to 19 times faster queries than the next larger competitor. As far as the construction of LeMonHash is concerned, we get an improvement by a factor of up to 2, compared to the competitor with the next best space usage. We also investigate the case of keys being variable-length strings, introducing the so-called LeMonHash-VL: it needs space within 13% of the best competitors while achieving up to 3 times faster queries than the next larger competitor
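
The two-level idea in the abstract (a monotone rank estimate, plus a small stored integer that resolves keys sharing an estimate) can be sketched in a toy form. Everything here is an assumption made for illustration: a single linear model stands in for the PGM-index, a Python dictionary stands in for the BuRR retrieval structure (a real retrieval structure does not store the keys), and the class name `ToyMonotoneMPHF` is hypothetical. The key set must be non-empty.

```python
from itertools import accumulate


class ToyMonotoneMPHF:
    """Toy monotone minimal perfect hash for a non-empty set of integers."""

    def __init__(self, keys):
        keys = sorted(set(keys))
        self.n = len(keys)
        self.lo = keys[0]
        self.span = keys[-1] - keys[0] + 1
        counts = [0] * self.n
        self.local = {}  # stand-in for a retrieval data structure
        for key in keys:
            b = self._bucket(key)
            self.local[key] = counts[b]  # position of key inside its bucket
            counts[b] += 1
        # start[b] = number of keys that fall in buckets before b.
        self.start = [0] + list(accumulate(counts))[:-1]

    def _bucket(self, key):
        # Monotone linear estimate of the rank, always in [0, n - 1].
        return (key - self.lo) * self.n // self.span

    def rank(self, key):
        # Bucket offset plus the stored local position.
        return self.start[self._bucket(key)] + self.local[key]
```

For example, `ToyMonotoneMPHF([3, 10, 11, 12, 500, 1000])` maps the six keys to ranks 0 through 5 in sorted order. Unlike a real MMPHF, which returns an arbitrary value on keys outside S, this toy raises `KeyError` for them.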

KITopen

Learned Monotone Minimal Perfect Hashing

Author: Ferragina Paolo
Lehmann Hans-Peter
Sanders Peter
Vinciguerra Giorgio
Publication date: 21/04/2023

A Monotone Minimal Perfect Hash Function (MMPHF) constructed on a set S of keys is a function that maps each key in S to its rank. On keys not in S, the function returns an arbitrary value. Applications range from databases, search engines, data encryption, to pattern-matching algorithms. In this paper, we describe LeMonHash, a new technique for constructing MMPHFs for integers. The core idea of LeMonHash is surprisingly simple and effective: we learn a monotone mapping from keys to their rank via an error-bounded piecewise linear model (the PGM-index), and then we solve the collisions that might arise among keys mapping to the same rank estimate by associating small integers with them in a retrieval data structure (BuRR). On synthetic random datasets, LeMonHash needs 35% less space than the next best competitor, while achieving about 16 times faster queries. On real-world datasets, the space usage is very close to or much better than the best competitors, while achieving up to 19 times faster queries than the next larger competitor. As far as the construction of LeMonHash is concerned, we get an improvement by a factor of up to 2, compared to the competitor with the next best space usage. We also investigate the case of keys being variable-length strings, introducing the so-called LeMonHash-VL: it needs space within 10% of the best competitors while achieving up to 3 times faster queries

arXiv.org e-Print Archive