
    On deletions in open addressing hashing

    Deletions in open addressing tables have often been seen as problematic. The usual solution is to use a special mark ‘deleted’ so that probe sequences continue past deleted slots, as if there were an element still sitting there. Such a solution, notwithstanding its wide applicability, may involve performance degradation. In the first part of this paper we review a practical implementation of the often overlooked deletion algorithm for linear probing hash tables, analyze its properties and performance, and provide several strong arguments in favor of the Robin Hood variant. In particular, we show how a small variation can yield substantial improvements for unsuccessful search. In the second part we propose an algorithm for true deletion in open addressing hashing with secondary clustering, like quadratic hashing. As far as we know, this is the first time that such an algorithm appears in the literature. Moreover, for tables built using the Robin Hood variant the deletion algorithm strongly preserves randomness (the resulting table is identical to the table that would result if the item were not inserted at all). Although it involves some extra memory for bookkeeping, the algorithm is comparatively easy and efficient, and it might be of some practical value, besides its theoretical interest. Peer Reviewed. Postprint (author's final draft)
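
To make the contrast in this abstract concrete, here is a minimal sketch of the two deletion strategies for a linear probing table: marking a slot as ‘deleted’ versus truly removing the key by shifting later keys back. It is not the paper's implementation. The names `insert`, `find`, `delete_with_tombstone`, and `delete_with_shift` are hypothetical, and the sketch assumes distinct non-negative integer keys, the hash `key % n`, and a table that is never completely full.

```python
EMPTY = None
DELETED = "<deleted>"  # tombstone marker; keys are assumed to be ints


def insert(slots, key):
    """Insert a key (assumed not already present) by linear probing."""
    n = len(slots)
    for step in range(n):
        idx = (key % n + step) % n
        if slots[idx] is EMPTY or slots[idx] is DELETED:
            slots[idx] = key
            return idx
    raise RuntimeError("table is full")


def find(slots, key):
    """Return the slot index of key, or None. Probes continue past tombstones."""
    n = len(slots)
    for step in range(n):
        idx = (key % n + step) % n
        if slots[idx] is EMPTY:
            return None
        if slots[idx] is not DELETED and slots[idx] == key:
            return idx
    return None


def delete_with_tombstone(slots, key):
    """Usual solution: leave a mark so later probe sequences are not cut short."""
    idx = find(slots, key)
    if idx is None:
        return False
    slots[idx] = DELETED
    return True


def delete_with_shift(slots, key):
    """True deletion: empty the slot, then pull back any later key in the
    cluster whose home position does not lie cyclically in (hole, its slot]."""
    n = len(slots)
    hole = find(slots, key)
    if hole is None:
        return False
    j = hole
    while True:
        slots[hole] = EMPTY
        while True:
            j = (j + 1) % n
            if slots[j] is EMPTY:
                return True  # end of cluster reached, nothing left to move
            home = slots[j] % n
            # The key at j may stay only if its home is strictly after the hole.
            if (j - home) % n < (j - hole) % n:
                continue
            break
        slots[hole] = slots[j]  # move the key back into the hole
        hole = j
```

With keys 1, 9, and 17 in a table of size 8 (all hashing to slot 1), the tombstone variant leaves a marked slot between 1 and 17, while the shifting variant moves 17 back one slot so that no mark remains and an unsuccessful search stops earlier.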

    On deletions in open addressing hashing

    Deletions in open addressing tables have often been seen as problematic. The usual solution is to use a special mark ‘deleted’ so that probe sequences continue past deleted slots, as if there were an element still sitting there. Such a solution, notwithstanding its wide applicability, may involve serious performance degradation. In the first part of this paper we review a practical implementation of the often overlooked deletion algorithm for linear probing hash tables, analyze its properties and performance, and provide several strong arguments in favor of the Robin Hood variant. In particular, we show how a small variation can yield substantial improvements for unsuccessful search. In the second part we propose an algorithm for true deletion in open addressing hashing with secondary clustering, like quadratic hashing. As far as we know, this is the first time that such an algorithm appears in the literature. Although it involves some extra memory for bookkeeping, the algorithm is comparatively easy and efficient, and might be of practical value, besides its theoretical interest. Postprint (published version)

    The Maximum Size of Dynamic Data Structures

    This paper develops two probabilistic methods that allow the analysis of the maximum data structure size encountered during a sequence of insertions and deletions in data structures such as priority queues, dictionaries, linear lists, and symbol tables, and in sweepline structures for geometry and Very-Large-Scale-Integration (VLSI) applications. The notion of the "maximum" is basic to issues of resource preallocation. The methods here are applied to combinatorial models of file histories and probabilistic models, as well as to a non-Markovian process (algorithm) for processing sweepline information in an efficient way, called "hashing with lazy deletion" (HwLD). Expressions are derived for the expected maximum data structure size that are asymptotically exact, that is, correct up to lower-order terms; in several cases of interest the expected value of the maximum size is asymptotically equal to the maximum expected size. This solves several open problems, including longstanding questions in queueing theory. Both of these approaches are robust and rely upon novel applications of techniques from the analysis of algorithms. At a high level, the first method isolates the primary contribution to the maximum and bounds the lesser effects. In the second technique the continuous-time probabilistic model is related to its discrete analog--the maximum slot occupancy in hashing

    Dynamic Data Structures for Document Collections and Graphs

    In the dynamic indexing problem, we must maintain a changing collection of text documents so that we can efficiently support insertions, deletions, and pattern matching queries. We are especially interested in developing efficient data structures that store and query the documents in compressed form. All previous compressed solutions to this problem rely on answering rank and select queries on a dynamic sequence of symbols. Because of the lower bound in [Fredman and Saks, 1989], answering rank queries presents a bottleneck in compressed dynamic indexing. In this paper we show how this lower bound can be circumvented using our new framework. We demonstrate that the gap between static and dynamic variants of the indexing problem can be almost closed. Our method is based on a novel framework for adding dynamism to static compressed data structures. Our framework also applies more generally to dynamizing other problems. We show, for example, how our framework can be applied to develop compressed representations of dynamic graphs and binary relations.

    Sorting using complete subintervals and the maximum number of runs in a randomly evolving sequence

    We study the space requirements of a sorting algorithm where only items that at the end will be adjacent are kept together. This is equivalent to the following combinatorial problem: Consider a string of fixed length n that starts as a string of 0's, and then evolves by changing each 0 to 1, with the n changes done in random order. What is the maximal number of runs of 1's? We give asymptotic results for the distribution and mean. It turns out that, as in many problems involving a maximum, the maximum is asymptotically normal, with fluctuations of order n^{1/2}, and to the first order well approximated by the number of runs at the instance when the expectation is maximized, in this case when half the elements have changed to 1; there is also a second order term of order n^{1/3}. We also treat some variations, including priority queues. The proofs use methods originally developed for random graphs. Comment: 31 pages
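
The combinatorial problem in this abstract is easy to simulate. The sketch below is an illustration only, not the paper's analysis: the hypothetical function `max_runs` takes the order in which positions flip from 0 to 1 and tracks the number of runs of 1's incrementally, returning the largest value seen.

```python
import random


def max_runs(order):
    """Return the maximal number of runs of 1's while the positions listed
    in `order` (a permutation of range(n)) are flipped from 0 to 1."""
    n = len(order)
    bits = [0] * n
    runs = 0
    best = 0
    for pos in order:
        left = pos > 0 and bits[pos - 1] == 1
        right = pos < n - 1 and bits[pos + 1] == 1
        bits[pos] = 1
        if left and right:
            runs -= 1  # the new 1 merges two existing runs
        elif not left and not right:
            runs += 1  # the new 1 starts a run of its own
        # with exactly one neighbour set, an existing run is extended
        best = max(best, runs)
    return best


def random_max_runs(n, seed=None):
    """Run one experiment with the n changes done in uniformly random order."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return max_runs(order)
```

Calling `random_max_runs(n)` repeatedly for a large `n` gives an empirical view of the distribution that the abstract describes. The result always lies between 1 and `(n + 1) // 2`, since two runs must be separated by at least one 0.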