941 research outputs found
Recommended from our members
Finding succinct ordered minimal perfect hashing functions
An ordered minimal perfect hash table is one in which no collisions occur among a predefined set of keys, no space is unused, and the data are placed in the table in order. A new method for creating ordered minimal perfect hashing functions is presented. The method presented is based on a method developed by Fox, Heath, Daoud, and Chen, but it creates hash functions with representation space requirements closer to the theoretical lower bound. The method presented requires approximately 10% less space to represent generated hash functions, and is easier to implement than Fox et al's. However, a higher time complexity makes it practical for small sets only (< 1000)
Succinct Indexable Dictionaries with Applications to Encoding -ary Trees, Prefix Sums and Multisets
We consider the {\it indexable dictionary} problem, which consists of storing
a set for some integer , while supporting the
operations of \Rank(x), which returns the number of elements in that are
less than if , and -1 otherwise; and \Select(i) which returns
the -th smallest element in . We give a data structure that supports both
operations in O(1) time on the RAM model and requires bits to store a set of size , where {\cal B}(n,m) = \ceil{\lg
{m \choose n}} is the minimum number of bits required to store any -element
subset from a universe of size . Previous dictionaries taking this space
only supported (yes/no) membership queries in O(1) time. In the cell probe
model we can remove the additive term in the space bound,
answering a question raised by Fich and Miltersen, and Pagh.
We present extensions and applications of our indexable dictionary data
structure, including:
An information-theoretically optimal representation of a -ary cardinal
tree that supports standard operations in constant time,
A representation of a multiset of size from in bits that supports (appropriate generalizations of) \Rank
and \Select operations in constant time, and
A representation of a sequence of non-negative integers summing up to
in bits that supports prefix sum queries in constant
time.Comment: Final version of SODA 2002 paper; supersedes Leicester Tech report
2002/1
Succinct Data Structures for Retrieval and Approximate Membership
The retrieval problem is the problem of associating data with keys in a set.
Formally, the data structure must store a function f: U ->{0,1}^r that has
specified values on the elements of a given set S, a subset of U, |S|=n, but
may have any value on elements outside S. Minimal perfect hashing makes it
possible to avoid storing the set S, but this induces a space overhead of
Theta(n) bits in addition to the nr bits needed for function values. In this
paper we show how to eliminate this overhead. Moreover, we show that for any k
query time O(k) can be achieved using space that is within a factor 1+e^{-k} of
optimal, asymptotically for large n. If we allow logarithmic evaluation time,
the additive overhead can be reduced to O(log log n) bits whp. The time to
construct the data structure is O(n), expected. A main technical ingredient is
to utilize existing tight bounds on the probability of almost square random
matrices with rows of low weight to have full row rank. In addition to direct
constructions, we point out a close connection between retrieval structures and
hash tables where keys are stored in an array and some kind of probing scheme
is used. Further, we propose a general reduction that transfers the results on
retrieval into analogous results on approximate membership, a problem
traditionally addressed using Bloom filters. Again, we show how to eliminate
the space overhead present in previously known methods, and get arbitrarily
close to the lower bound. The evaluation procedures of our data structures are
extremely simple (similar to a Bloom filter). For the results stated above we
assume free access to fully random hash functions. However, we show how to
justify this assumption using extra space o(n) to simulate full randomness on a
RAM
Constructing Minimal Perfect Hash Functions Using SAT Technology
Minimal perfect hash functions (MPHFs) are used to provide efficient access
to values of large dictionaries (sets of key-value pairs). Discovering new
algorithms for building MPHFs is an area of active research, especially from
the perspective of storage efficiency. The information-theoretic limit for
MPHFs is 1/(ln 2) or roughly 1.44 bits per key. The current best practical
algorithms range between 2 and 4 bits per key. In this article, we propose two
SAT-based constructions of MPHFs. Our first construction yields MPHFs near the
information-theoretic limit. For this construction, current state-of-the-art
SAT solvers can handle instances where the dictionaries contain up to 40
elements, thereby outperforming the existing (brute-force) methods. Our second
construction uses XOR-SAT filters to realize a practical approach with
long-term storage of approximately 1.83 bits per key.Comment: Accepted for AAAI 202
Advances in Learning Bayesian Networks of Bounded Treewidth
This work presents novel algorithms for learning Bayesian network structures
with bounded treewidth. Both exact and approximate methods are developed. The
exact method combines mixed-integer linear programming formulations for
structure learning and treewidth computation. The approximate method consists
in uniformly sampling -trees (maximal graphs of treewidth ), and
subsequently selecting, exactly or approximately, the best structure whose
moral graph is a subgraph of that -tree. Some properties of these methods
are discussed and proven. The approaches are empirically compared to each other
and to a state-of-the-art method for learning bounded treewidth structures on a
collection of public data sets with up to 100 variables. The experiments show
that our exact algorithm outperforms the state of the art, and that the
approximate approach is fairly accurate.Comment: 23 pages, 2 figures, 3 table
- …