3,156 research outputs found
Wear Minimization for Cuckoo Hashing: How Not to Throw a Lot of Eggs into One Basket
We study wear-leveling techniques for cuckoo hashing, showing that it is
possible to achieve a memory wear bound of after the
insertion of items into a table of size for a suitable constant
using cuckoo hashing. Moreover, we study our cuckoo hashing method empirically,
showing that it significantly improves on the memory wear performance for
classic cuckoo hashing and linear probing in practice.Comment: 13 pages, 1 table, 7 figures; to appear at the 13th Symposium on
Experimental Algorithms (SEA 2014
Succinct Indexable Dictionaries with Applications to Encoding -ary Trees, Prefix Sums and Multisets
We consider the {\it indexable dictionary} problem, which consists of storing
a set for some integer , while supporting the
operations of \Rank(x), which returns the number of elements in that are
less than if , and -1 otherwise; and \Select(i) which returns
the -th smallest element in . We give a data structure that supports both
operations in O(1) time on the RAM model and requires bits to store a set of size , where {\cal B}(n,m) = \ceil{\lg
{m \choose n}} is the minimum number of bits required to store any -element
subset from a universe of size . Previous dictionaries taking this space
only supported (yes/no) membership queries in O(1) time. In the cell probe
model we can remove the additive term in the space bound,
answering a question raised by Fich and Miltersen, and Pagh.
We present extensions and applications of our indexable dictionary data
structure, including:
An information-theoretically optimal representation of a -ary cardinal
tree that supports standard operations in constant time,
A representation of a multiset of size from in bits that supports (appropriate generalizations of) \Rank
and \Select operations in constant time, and
A representation of a sequence of non-negative integers summing up to
in bits that supports prefix sum queries in constant
time.Comment: Final version of SODA 2002 paper; supersedes Leicester Tech report
2002/1
Revisiting Kernelized Locality-Sensitive Hashing for Improved Large-Scale Image Retrieval
We present a simple but powerful reinterpretation of kernelized
locality-sensitive hashing (KLSH), a general and popular method developed in
the vision community for performing approximate nearest-neighbor searches in an
arbitrary reproducing kernel Hilbert space (RKHS). Our new perspective is based
on viewing the steps of the KLSH algorithm in an appropriately projected space,
and has several key theoretical and practical benefits. First, it eliminates
the problematic conceptual difficulties that are present in the existing
motivation of KLSH. Second, it yields the first formal retrieval performance
bounds for KLSH. Third, our analysis reveals two techniques for boosting the
empirical performance of KLSH. We evaluate these extensions on several
large-scale benchmark image retrieval data sets, and show that our analysis
leads to improved recall performance of at least 12%, and sometimes much
higher, over the standard KLSH method.Comment: 15 page
Hashing for Similarity Search: A Survey
Similarity search (nearest neighbor search) is a problem of pursuing the data
items whose distances to a query item are the smallest from a large database.
Various methods have been developed to address this problem, and recently a lot
of efforts have been devoted to approximate search. In this paper, we present a
survey on one of the main solutions, hashing, which has been widely studied
since the pioneering work locality sensitive hashing. We divide the hashing
algorithms two main categories: locality sensitive hashing, which designs hash
functions without exploring the data distribution and learning to hash, which
learns hash functions according the data distribution, and review them from
various aspects, including hash function design and distance measure and search
scheme in the hash coding space
- …