Finding succinct ordered minimal perfect hashing functions

Author: Hirschberg Daniel S.
Seiden Steven S.
Publication venue: eScholarship, University of California
Publication date: 28/02/1992

An ordered minimal perfect hash table is one in which no collisions occur among a predefined set of keys, no space is unused, and the data are placed in the table in order. A new method for creating ordered minimal perfect hashing functions is presented. It is based on a method developed by Fox, Heath, Daoud, and Chen, but it creates hash functions with representation space requirements closer to the theoretical lower bound. The new method requires approximately 10% less space to represent generated hash functions and is easier to implement than Fox et al.'s. However, its higher time complexity makes it practical only for small sets (fewer than 1000 keys).
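As a minimal illustration of the definition in this abstract (not the authors' construction), the following Python sketch checks whether a candidate function is an ordered minimal perfect hash for a list of keys. The name `is_ordered_minimal_perfect` and the toy keys are hypothetical, and "in order" is assumed here to mean that the i-th key of the given list lands in slot i.

```python
from typing import Callable, Hashable, Sequence


def is_ordered_minimal_perfect(h: Callable[[Hashable], int],
                               keys: Sequence[Hashable]) -> bool:
    """Return True if h maps the i-th key to slot i for every key.

    That single condition implies all three properties: no collisions
    (perfect), no unused slot in a table of size len(keys) (minimal),
    and keys stored in their given order (ordered).
    """
    return all(h(key) == i for i, key in enumerate(keys))


# Toy example (assumption): three keys whose lengths happen to be 3, 4, 5.
keys = ["ant", "bear", "camel"]
h = lambda key: len(key) - 3

print(is_ordered_minimal_perfect(h, keys))    # True
print(is_ordered_minimal_perfect(len, keys))  # False: "ant" maps to 3, not 0
```

The checker says nothing about how to find such a function or how compactly it can be represented, which is the actual subject of the paper.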

eScholarship - University of California

Succinct Indexable Dictionaries with Applications to Encoding $k$-ary Trees, Prefix Sums and Multisets

Author: Fich F. E.
Grossi R.
Hagerup T.
Jansson J.
Munro J. I.
Paul W. J.
Rajeev Raman
Raman R.
Raman V.
Srinivasa Rao Satti
Venkatesh Raman
Publication venue: Association for Computing Machinery (ACM)
Publication date: 04/05/2007

We consider the indexable dictionary problem, which consists of storing a set $S \subseteq \{0,\ldots,m-1\}$ for some integer $m$, while supporting the operations of $\mathrm{Rank}(x)$, which returns the number of elements in $S$ that are less than $x$ if $x \in S$, and $-1$ otherwise; and $\mathrm{Select}(i)$, which returns the $i$-th smallest element in $S$. We give a data structure that supports both operations in $O(1)$ time on the RAM model and requires $\mathcal{B}(n,m) + o(n) + O(\lg \lg m)$ bits to store a set of size $n$, where $\mathcal{B}(n,m) = \lceil \lg \binom{m}{n} \rceil$ is the minimum number of bits required to store any $n$-element subset from a universe of size $m$. Previous dictionaries taking this space only supported (yes/no) membership queries in $O(1)$ time. In the cell probe model we can remove the $O(\lg \lg m)$ additive term in the space bound, answering a question raised by Fich and Miltersen, and Pagh. We present extensions and applications of our indexable dictionary data structure, including: an information-theoretically optimal representation of a $k$-ary cardinal tree that supports standard operations in constant time; a representation of a multiset of size $n$ from $\{0,\ldots,m-1\}$ in $\mathcal{B}(n,m+n) + o(n)$ bits that supports (appropriate generalizations of) $\mathrm{Rank}$ and $\mathrm{Select}$ operations in constant time; and a representation of a sequence of $n$ non-negative integers summing up to $m$ in $\mathcal{B}(n,m+n) + o(n)$ bits that supports prefix sum queries in constant time. Comment: Final version of SODA 2002 paper; supersedes Leicester Tech report 2002/1
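To make the query semantics and the space bound in this abstract concrete, here is a small Python sketch. It is only a reference model backed by a sorted list, so it is neither succinct nor constant-time and is not the authors' data structure. The class name `ToyIndexableDictionary`, the function `info_bound_bits`, and the 0-based indexing of `select` (chosen so that `select(rank(x)) == x`) are assumptions made for illustration.

```python
import bisect
import math


class ToyIndexableDictionary:
    """Reference semantics only: a sorted list, neither succinct nor O(1)."""

    def __init__(self, elements, m):
        self.m = m
        self.sorted = sorted(set(elements))
        if self.sorted and not (0 <= self.sorted[0] and self.sorted[-1] < m):
            raise ValueError("elements must lie in {0, ..., m-1}")

    def rank(self, x):
        """Number of elements smaller than x if x is in S, else -1."""
        i = bisect.bisect_left(self.sorted, x)
        if i < len(self.sorted) and self.sorted[i] == x:
            return i
        return -1

    def select(self, i):
        """The i-th smallest element, counting from 0 (an assumption)."""
        return self.sorted[i]


def info_bound_bits(n, m):
    """B(n, m) = ceil(lg C(m, n)), computed with exact integer arithmetic."""
    return (math.comb(m, n) - 1).bit_length()


d = ToyIndexableDictionary([3, 7, 12], m=16)
print(d.rank(7), d.rank(8), d.select(2))  # 1 -1 12
print(info_bound_bits(3, 16))             # 10, since C(16, 3) = 560 <= 2**10
```

For comparison, a plain 16-bit bitmap would use 16 bits for the same set, while the bound B(3, 16) is 10 bits.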

arXiv.org e-Print Archive

Crossref

PlaDs - Platzeffiziente Datenstrukturen (Succinct Data Structures)

Author: Duong Hong Diem
Freese Maximilian
Herlez Alexander
Kalb Antonia
Meinhövel Jan
Tarnowski Jan-Philipp
Tayyem Mariam
Todtenhoefer Jennifer
Publication date: 25/10/2020

Eldorado - Ressourcen aus und für Lehre, Studium und Forschung

Succinct Data Structures for Retrieval and Approximate Membership

Author: Dietzfelbinger Martin
Pagh Rasmus
Publication date: 01/01/2008

The retrieval problem is the problem of associating data with keys in a set. Formally, the data structure must store a function f: U -> {0,1}^r that has specified values on the elements of a given set S, a subset of U with |S| = n, but may have any value on elements outside S. Minimal perfect hashing makes it possible to avoid storing the set S, but this induces a space overhead of Theta(n) bits in addition to the nr bits needed for function values. In this paper we show how to eliminate this overhead. Moreover, we show that for any k, query time O(k) can be achieved using space that is within a factor 1+e^{-k} of optimal, asymptotically for large n. If we allow logarithmic evaluation time, the additive overhead can be reduced to O(log log n) bits with high probability. The expected time to construct the data structure is O(n). A main technical ingredient is to utilize existing tight bounds on the probability that almost square random matrices with rows of low weight have full row rank. In addition to direct constructions, we point out a close connection between retrieval structures and hash tables where keys are stored in an array and some kind of probing scheme is used. Further, we propose a general reduction that transfers the results on retrieval into analogous results on approximate membership, a problem traditionally addressed using Bloom filters. Again, we show how to eliminate the space overhead present in previously known methods, and get arbitrarily close to the lower bound. The evaluation procedures of our data structures are extremely simple (similar to a Bloom filter). For the results stated above we assume free access to fully random hash functions. However, we show how to justify this assumption using extra space o(n) to simulate full randomness on a RAM.

arXiv.org e-Print Archive

CiteSeerX

The IT University of Copenhagen's Repository

Constructing Minimal Perfect Hash Functions Using SAT Technology

Author: Heule Marijn
Weaver Sean
Publication date: 22/11/2019

Minimal perfect hash functions (MPHFs) are used to provide efficient access to values of large dictionaries (sets of key-value pairs). Discovering new algorithms for building MPHFs is an area of active research, especially from the perspective of storage efficiency. The information-theoretic limit for MPHFs is 1/(ln 2) or roughly 1.44 bits per key. The current best practical algorithms range between 2 and 4 bits per key. In this article, we propose two SAT-based constructions of MPHFs. Our first construction yields MPHFs near the information-theoretic limit. For this construction, current state-of-the-art SAT solvers can handle instances where the dictionaries contain up to 40 elements, thereby outperforming the existing (brute-force) methods. Our second construction uses XOR-SAT filters to realize a practical approach with long-term storage of approximately 1.83 bits per key. Comment: Accepted for AAAI 202
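The figure of roughly 1.44 bits per key can be motivated by a standard counting argument, sketched here under the assumption that the key universe is much larger than the number of keys. The symbols $n$ (number of keys) and $p$ (probability that a random function is a bijection on the keys) are introduced only for this illustration and do not come from the abstract.

```latex
% Probability that a uniformly random function from n keys to n slots
% is a bijection, simplified with Stirling's approximation:
p = \frac{n!}{n^n} \approx \sqrt{2\pi n}\; e^{-n}

% Bits needed to single out one good function among about 1/p candidates:
\log_2 \frac{1}{p} \approx n \log_2 e - \tfrac{1}{2}\log_2 (2\pi n),
\qquad
\log_2 e = \frac{1}{\ln 2} \approx 1.4427
```

Dividing by $n$ and letting $n$ grow gives about 1.44 bits per key, matching the limit quoted in the abstract.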

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Advances in Learning Bayesian Networks of Bounded Treewidth

Author: de Campos Cassio Polpo
Ji Qiang
Maua Denis Deratani
Nie Siqi
Publication date: 01/01/2014

This work presents novel algorithms for learning Bayesian network structures with bounded treewidth. Both exact and approximate methods are developed. The exact method combines mixed-integer linear programming formulations for structure learning and treewidth computation. The approximate method consists in uniformly sampling $k$-trees (maximal graphs of treewidth $k$), and subsequently selecting, exactly or approximately, the best structure whose moral graph is a subgraph of that $k$-tree. Some properties of these methods are discussed and proven. The approaches are empirically compared to each other and to a state-of-the-art method for learning bounded treewidth structures on a collection of public data sets with up to 100 variables. The experiments show that our exact algorithm outperforms the state of the art, and that the approximate approach is fairly accurate. Comment: 23 pages, 2 figures, 3 tables

arXiv.org e-Print Archive

Queen's University Belfast Research Portal

CiteSeerX

Repository TU/e