A New Approach to Analyzing Robin Hood Hashing
Robin Hood hashing is a variation on open addressing hashing designed to
reduce the maximum search time as well as the variance in the search time for
elements in the hash table. While the insertion-only case of Robin Hood
hashing is well understood, its behavior under deletions has remained open. Here
we show that Robin Hood hashing can be analyzed under the framework of
finite-level finite-dimensional jump Markov chains. This framework allows us to
re-derive some past results for the insertion-only case with some new insight,
as well as provide a new analysis for a standard deletion model, where we
alternate between deleting a random old key and inserting a new one. In
particular, we show that a simple but apparently unstudied approach for
handling deletions with Robin Hood hashing offers good performance even under
high loads.
Comment: 19 pages, draft version. Updated from the previous version with some
new proofs, in particular a full proof of the log log n + O(1) maximum age
bound in the static setting, by converting the fluid limit argument into a
layered induction.
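The core mechanism discussed above can be illustrated with a minimal sketch of Robin Hood insertion into an open-addressing table. The names (`RobinHoodTable`, `max_age`) are illustrative, not from the paper; "age" here means an element's probe distance from its home slot, the quantity whose maximum the abstract bounds by log log n + O(1).

```python
# Sketch only: assumes the table is never completely full, and that keys
# are hashable and distinct.

class RobinHoodTable:
    def __init__(self, capacity=16):
        self.capacity = capacity
        self.slots = [None] * capacity  # each slot: (key, age) or None

    def _home(self, key):
        return hash(key) % self.capacity

    def insert(self, key):
        pos = self._home(key)
        entry = (key, 0)
        while True:
            slot = self.slots[pos]
            if slot is None:
                self.slots[pos] = entry
                return
            # Robin Hood rule: the "richer" resident (smaller age) yields
            # its slot to the "poorer" incoming element (larger age).
            if slot[1] < entry[1]:
                self.slots[pos], entry = entry, slot
            pos = (pos + 1) % self.capacity
            entry = (entry[0], entry[1] + 1)

    def search(self, key):
        pos, age = self._home(key), 0
        while age < self.capacity:
            slot = self.slots[pos]
            # Invariant: once we reach an empty slot, or a resident whose
            # age is smaller than our current probe age, the key is absent.
            if slot is None or slot[1] < age:
                return False
            if slot[0] == key:
                return True
            pos = (pos + 1) % self.capacity
            age += 1
        return False

    def max_age(self):
        # Maximum displacement over all stored elements.
        return max((s[1] for s in self.slots if s is not None), default=0)
```

The swap-on-collision step is what equalizes probe distances across elements and keeps the maximum age small.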
Hashing with Linear Probing and Referential Integrity
We describe a variant of linear probing hash tables that never moves elements
and thus supports referential integrity, i.e., pointers to elements remain
valid while this element is in the hash table. This is achieved by the folklore
method of marking some table entries as formerly occupied (tombstones). The
innovation is that the number of tombstones is minimized. Experiments indicate
that this allows an unbounded number of operations with bounded overhead
compared to linear probing without tombstones (and without referential
integrity).
Comment: 5 pages, 3 figures.
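The folklore scheme the abstract builds on can be sketched as follows. This shows only the basic tombstone idea (deletion marks a slot and never moves other entries, so an element's slot index stays valid); the paper's tombstone-minimization policy is not reproduced here, and the class name is my own.

```python
# Sketch only: assumes distinct keys; a production insert would check for
# the key before reusing a tombstone slot.

EMPTY, TOMBSTONE = object(), object()

class LinearProbingTable:
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.slots = [EMPTY] * capacity

    def _home(self, key):
        return hash(key) % self.capacity

    def insert(self, key):
        pos = self._home(key)
        for _ in range(self.capacity):
            if self.slots[pos] is EMPTY or self.slots[pos] is TOMBSTONE:
                self.slots[pos] = key
                return pos  # index stays valid for this key's lifetime
            pos = (pos + 1) % self.capacity
        raise RuntimeError("table full")

    def search(self, key):
        pos = self._home(key)
        for _ in range(self.capacity):
            if self.slots[pos] is EMPTY:
                return None  # note: a TOMBSTONE does NOT stop the search
            if self.slots[pos] == key:
                return pos
            pos = (pos + 1) % self.capacity
        return None

    def delete(self, key):
        pos = self.search(key)
        if pos is not None:
            # Mark the slot; no other entry is ever shifted, which is
            # exactly what preserves referential integrity.
            self.slots[pos] = TOMBSTONE
```

Because searches skip tombstones instead of stopping at them, probe sequences grow as tombstones accumulate, which is why bounding their number matters.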
Robin Hood Hashing really has constant average search cost and variance in full tables
Thirty years ago, the Robin Hood collision resolution strategy was introduced
for open addressing hash tables, and a recurrence equation was found for the
distribution of its search cost. Although this recurrence could not be solved
analytically, it allowed for numerical computations that, remarkably, suggested
that the variance of the search cost approached a constant value when the
table was full. Furthermore, by using a non-standard mean-centered search
algorithm, this would imply that searches could be performed in expected
constant time even in a full table.
In spite of the time elapsed since these observations were made, no progress
has been made in proving them. In this paper we introduce a technique to work
around the intractability of the recurrence equation by solving instead an
associated differential equation. While this does not provide an exact
solution, it is sufficiently powerful to prove a bound for the variance, and
thus obtain a proof that the variance of Robin Hood hashing is bounded by a small
constant for load factors arbitrarily close to 1. As a corollary, this proves
that the mean-centered search algorithm runs in expected constant time.
We also use this technique to study the performance of Robin Hood hash tables
under a long sequence of insertions and deletions, where deletions are
implemented by marking elements as deleted. We prove that, in this case, the
variance remains bounded by a constant that depends only on the load factor.
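The deletion model analyzed here, marking elements as deleted rather than removing or moving them, can be sketched on top of Robin Hood insertion as follows. This is an illustrative reading of "marking elements as deleted" (the class name and details are my own): a dead entry keeps its slot and recorded age, so the probe-distance structure is untouched, and lookups simply skip it.

```python
# Sketch only: assumes the table never completely fills and keys are
# distinct; deleted slots are never reclaimed in this toy version.

class RHTableWithTombstones:
    def __init__(self, capacity=16):
        self.capacity = capacity
        # slot: None, or [key, age, alive_flag]
        self.slots = [None] * capacity

    def _home(self, key):
        return hash(key) % self.capacity

    def insert(self, key):
        pos = self._home(key)
        entry = [key, 0, True]
        while True:
            slot = self.slots[pos]
            if slot is None:
                self.slots[pos] = entry
                return
            # Robin Hood swap; dead entries are displaced like live ones,
            # so the age ordering along each run is preserved.
            if slot[1] < entry[1]:
                self.slots[pos], entry = entry, slot
            pos = (pos + 1) % self.capacity
            entry[1] += 1

    def _find(self, key):
        pos, age = self._home(key), 0
        while age < self.capacity:
            slot = self.slots[pos]
            if slot is None or slot[1] < age:
                return None  # Robin Hood invariant: key absent
            if slot[2] and slot[0] == key:
                return pos
            pos = (pos + 1) % self.capacity
            age += 1
        return None

    def search(self, key):
        return self._find(key) is not None

    def delete(self, key):
        pos = self._find(key)
        if pos is None:
            return False
        self.slots[pos][2] = False  # mark as deleted; nothing moves
        return True
```

Keeping dead entries in place is what makes the process amenable to the same analysis as the insertion-only table, at the cost of load that only grows.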
To model the behavior of these hash tables, we use a unified approach that
can also be applied to study the First-Come-First-Served and
Last-Come-First-Served collision resolution disciplines, both with and without
deletions.