    Amortized Rotation Cost in AVL Trees

    An AVL tree is the original type of balanced binary search tree. An insertion in an $n$-node AVL tree takes at most two rotations, but a deletion in an $n$-node AVL tree can take $\Theta(\log n)$ rotations. A natural question is whether deletions can take many rotations not only in the worst case but in the amortized case as well. A sequence of $n$ successive deletions in an $n$-node tree takes $O(n)$ rotations, but what happens when insertions are intermixed with deletions? Haeupler, Sen, and Tarjan conjectured that alternating insertions and deletions in an $n$-node AVL tree can cause each deletion to do $\Omega(\log n)$ rotations, but they provided no construction to justify their claim. We provide such a construction: we show that, for infinitely many $n$, there is a set $E$ of {\it expensive} $n$-node AVL trees with the property that, given any tree in $E$, deleting a certain leaf and then reinserting it produces a tree in $E$, with the deletion having done $\Theta(\log n)$ rotations. One can do an arbitrary number of such expensive deletion-insertion pairs. The difficulty in obtaining such a construction is that in general the tree produced by an expensive deletion-insertion pair is not the original tree. Indeed, if the trees in $E$ have even height $k$, $2^{k/2}$ deletion-insertion pairs are required to reproduce the original tree.
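
    The single-deletion worst case that this abstract starts from can be seen on a Fibonacci (minimum-size) AVL tree. The Python sketch below is an illustration under stated assumptions, not the authors' construction of the set $E$: it builds the Fibonacci tree of height $h$, in which every non-leaf node is left-heavy, deletes the largest key, and counts single rotations. Every ancestor on the right spine should have to rebalance, giving $\lfloor (h-1)/2 \rfloor$ rotations, and since $h = \Theta(\log n)$ for these trees that is $\Theta(\log n)$. The names `AVLTree`, `fibonacci_tree`, and `rotations_to_delete_max` are hypothetical.

```python
from itertools import count


class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.height = 1 + max(height(left), height(right))


def height(node):
    return node.height if node else 0


def update(node):
    node.height = 1 + max(height(node.left), height(node.right))


class AVLTree:
    def __init__(self, root=None):
        self.root = root
        self.rotations = 0  # number of single rotations performed so far

    def _rotate_right(self, y):
        x = y.left
        y.left, x.right = x.right, y
        update(y)  # y is now below x, so fix its height first
        update(x)
        self.rotations += 1
        return x

    def _rotate_left(self, x):
        y = x.right
        x.right, y.left = y.left, x
        update(x)
        update(y)
        self.rotations += 1
        return y

    def _rebalance(self, node):
        update(node)
        balance = height(node.left) - height(node.right)
        if balance > 1:
            if height(node.left.left) < height(node.left.right):
                node.left = self._rotate_left(node.left)  # double rotation
            return self._rotate_right(node)
        if balance < -1:
            if height(node.right.right) < height(node.right.left):
                node.right = self._rotate_right(node.right)  # double rotation
            return self._rotate_left(node)
        return node

    def delete(self, key):
        self.root = self._delete(self.root, key)

    def _delete(self, node, key):
        if node is None:
            return None  # key not present
        if key < node.key:
            node.left = self._delete(node.left, key)
        elif key > node.key:
            node.right = self._delete(node.right, key)
        elif node.left is None or node.right is None:
            return node.left or node.right  # splice out a node with at most one child
        else:
            succ = node.right  # two children: copy in the successor, then delete it
            while succ.left:
                succ = succ.left
            node.key = succ.key
            node.right = self._delete(node.right, succ.key)
        return self._rebalance(node)  # fix heights and balance on the way back up


def fibonacci_tree(h, keys):
    """Minimum-size AVL tree of height h; every non-leaf node is left-heavy."""
    if h == 0:
        return None
    if h == 1:
        return Node(next(keys))
    left = fibonacci_tree(h - 1, keys)  # build the left subtree first: in-order keys
    key = next(keys)
    right = fibonacci_tree(h - 2, keys)
    return Node(key, left, right)


def rotations_to_delete_max(h):
    tree = AVLTree(fibonacci_tree(h, count()))
    node = tree.root
    while node.right:
        node = node.right
    tree.delete(node.key)
    return tree.rotations


print([rotations_to_delete_max(h) for h in range(2, 12)])
```

    One expensive deletion in isolation says nothing about the amortized question; the point of the set $E$ is that such deletions can be repeated indefinitely when each is followed by a reinsertion.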

    A Back-to-Basics Empirical Study of Priority Queues

    The theory community has proposed several new heap variants in the recent past that have remained largely untested experimentally. We take the field back to the drawing board, with straightforward implementations of both classic and novel structures using only standard, well-known optimizations. We study the behavior of each structure on a variety of inputs, including artificial workloads, workloads generated by running algorithms on real map data, and workloads from a discrete event simulator used in recent systems networking research. We provide observations about which characteristics are most strongly correlated with performance. For example, we find that the L1 cache miss rate appears to be strongly correlated with wall-clock time. We also provide observations about how the input sequence affects the relative performance of the different heap variants. For example, we show (both theoretically and in practice) that certain random insertion-deletion sequences are degenerate and can lead to misleading results. Overall, our findings suggest that while the conventional wisdom holds in some cases, it is sorely mistaken in others.

    Finding Dominators via Disjoint Set Union

    The problem of finding dominators in a directed graph has many important applications, notably in global optimization of computer code. Although linear and near-linear-time algorithms exist, they use sophisticated data structures. We develop an algorithm for finding dominators that uses only a "static tree" disjoint set data structure in addition to simple lists and maps. The algorithm runs in near-linear or linear time, depending on the implementation of the disjoint set data structure. We give several versions of the algorithm, including one that computes loop nesting information (needed in many kinds of global code optimization) and that can be made self-certifying, so that the correctness of the computed dominators is very easy to verify.
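
    For checking small examples, dominators can be computed straight from their definition: $d$ dominates $v$ if every path from the entry vertex to $v$ passes through $d$. The Python sketch below uses the textbook iterative data-flow formulation, not the disjoint-set algorithm this abstract describes; it is slow (quadratic or worse) but easy to trust, which makes it a convenient reference for testing a faster implementation. The graph format (a dict of successor lists) and the function names `dominators` and `immediate_dominators` are assumptions.

```python
def dominators(succ, entry):
    """Return {v: set of dominators of v} for every vertex reachable from entry.

    Iterates Dom(entry) = {entry} and Dom(v) = {v} | intersection of Dom(p)
    over predecessors p of v, starting from "everything dominates everything".
    """
    order, seen, stack = [], {entry}, [entry]
    while stack:  # stack-based traversal to collect the reachable vertices
        v = stack.pop()
        order.append(v)
        for w in succ.get(v, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    preds = {v: [] for v in order}  # predecessors restricted to reachable vertices
    for v in order:
        for w in succ.get(v, ()):
            preds[w].append(v)
    dom = {v: set(order) for v in order}
    dom[entry] = {entry}
    changed = True
    while changed:  # repeat until a fixed point is reached
        changed = False
        for v in order:
            if v == entry:
                continue
            new = set.intersection(*(dom[p] for p in preds[v])) | {v}
            if new != dom[v]:
                dom[v], changed = new, True
    return dom


def immediate_dominators(dom, entry):
    """idom(v) is the strict dominator of v with exactly one fewer dominator than v."""
    return {v: next(d for d in ds - {v} if len(dom[d]) == len(ds) - 1)
            for v, ds in dom.items() if v != entry}


succ = {"r": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["d"], "d": ["a"]}
print(immediate_dominators(dominators(succ, "r"), "r"))
```

    The immediate-dominator step relies on the fact that the strict dominators of a vertex form a chain, so the closest one is the unique strict dominator whose own dominator set is one element smaller.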

    Hollow Heaps

    We introduce the hollow heap, a very simple data structure with the same amortized efficiency as the classical Fibonacci heap. All heap operations except delete and delete-min take $O(1)$ time, worst case as well as amortized; delete and delete-min take $O(\log n)$ amortized time on a heap of $n$ items. Hollow heaps are by far the simplest structure to achieve this. Hollow heaps combine two novel ideas: the use of lazy deletion and re-insertion to do decrease-key operations, and the use of a dag (directed acyclic graph) instead of a tree or set of trees to represent a heap. Lazy deletion produces hollow nodes (nodes without items), giving the data structure its name. Comment: 27 pages, 7 figures, preliminary version appeared in ICALP 201
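
    The two ideas named in this abstract, lazy deletion with re-insertion for decrease-key and a dag in which a node can have a second parent, can be sketched in Python as follows. This is a minimal, unoptimized sketch intended to follow the two-parent variant of hollow heaps; treat it as an illustration rather than the authors' code. The field names (`ep` for the extra parent, `rank`, `next`), the `Item` handle class, and the method names are assumptions. Decrease-key moves the item into a new node and leaves the old node hollow; delete only hollows a node, and restructuring happens when the root becomes hollow, first by linking full roots of equal rank and then by linking whatever remains.

```python
class Item:
    """A heap entry; `node` points at the unique full node currently holding it."""

    def __init__(self, value):
        self.value = value
        self.node = None


class Node:
    def __init__(self, item, key):
        self.item, self.key = item, key  # item becomes None once the node is hollow
        self.child = None  # first child
        self.next = None   # next sibling
        self.ep = None     # extra (second) parent, set only by decrease_key
        self.rank = 0
        item.node = self


def _link(v, w):
    """Make the root with the larger key the first child of the other; return the winner."""
    if v.key >= w.key:
        v.next, w.child = w.child, v
        return w
    w.next, v.child = v.child, w
    return v


class HollowHeap:
    def __init__(self):
        self.root = None

    def find_min(self):
        return None if self.root is None else self.root.item

    def insert(self, value, key):
        node = Node(Item(value), key)
        self.root = node if self.root is None else _link(node, self.root)
        return node.item

    def decrease_key(self, item, key):
        u = item.node
        if u is self.root:
            u.key = key
            return
        v = Node(item, key)     # move the item into a fresh node ...
        u.item = None           # ... leaving u behind as a hollow node
        v.rank = max(0, u.rank - 2)
        v.child, u.ep = u, v    # u keeps its old parent and gains v as a second one
        self.root = _link(v, self.root)

    def delete_min(self):
        item = self.find_min()
        if item is not None:
            self.delete(item)
        return item

    def delete(self, item):
        item.node.item = None   # lazy deletion: just hollow out the node
        item.node = None
        if self.root.item is not None:
            return              # the root is still full, so nothing else to do
        ranked = {}             # rank -> full root waiting for a ranked link
        h = self.root           # list (via .next) of hollow nodes to dismantle
        h.next = None
        while h is not None:
            w, x, h = h.child, h, h.next
            while w is not None:
                u, w = w, w.next
                if u.item is None:            # hollow child
                    if u.ep is None:          # x was its last parent: dismantle it too
                        u.next, h = h, u
                    else:                     # u keeps its other parent
                        if u.ep is x:
                            w = None          # u is x's last child; u.next is not x's
                        else:
                            u.next = None     # u becomes the last child of u.ep
                        u.ep = None
                else:                         # full child: ranked links
                    while u.rank in ranked:
                        u = _link(u, ranked.pop(u.rank))
                        u.rank += 1
                    ranked[u.rank] = u
        root = None                           # unranked links of what is left
        for r in sorted(ranked):
            root = ranked[r] if root is None else _link(root, ranked[r])
        self.root = root


heap = HollowHeap()
a = heap.insert("a", 5)
heap.insert("b", 3)
heap.decrease_key(a, 1)
print(heap.delete_min().value, heap.delete_min().value)
```

    The `u.ep is x` test matters because a node created by decrease-key gets the hollow node as its first child, so the hollow node stays the last child of its extra parent while its `next` pointer still belongs to the sibling list under its original parent.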

    Optimal resizable arrays

    A \emph{resizable array} is an array that can \emph{grow} and \emph{shrink} by the addition or removal of items from its end, or both its ends, while still supporting constant-time \emph{access} to each item stored in the array given its \emph{index}. Since the size of an array, i.e., the number of items in it, varies over time, space-efficient maintenance of a resizable array requires dynamic memory management. A standard doubling technique allows the maintenance of an array of size~$N$ using only $O(N)$ space, with $O(1)$ amortized time, or even $O(1)$ worst-case time, per operation. Sitarski and Brodnik et al.\ describe much better solutions that maintain a resizable array of size~$N$ using only $N+O(\sqrt{N})$ space, still with $O(1)$ time per operation. Brodnik et al.\ give a simple proof that this is best possible. We distinguish between the space needed for \emph{storing} a resizable array and accessing its items, and the \emph{temporary} space that may be needed while growing or shrinking the array. For every integer $r\ge 2$, we show that $N+O(N^{1/r})$ space is sufficient for storing and accessing an array of size~$N$, if $N+O(N^{1-1/r})$ space can be used briefly during grow and shrink operations. Accessing an item by index takes $O(1)$ worst-case time while grow and shrink operations take $O(r)$ amortized time. Using an exact analysis of a \emph{growth game}, we show that for any data structure from a wide class of data structures that uses only $N+O(N^{1/r})$ space to store the array, the amortized cost of grow is $\Omega(r)$, even if only grow and access operations are allowed. The time for grow and shrink operations cannot be made worst-case, unless $r=2$. Comment: To appear in SOSA 202
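
    To make the $N+O(\sqrt{N})$ space bound concrete, here is a simplified Python sketch of one way to reach it, assuming data block $b$ holds $b+1$ items, so that $k$ blocks hold $k(k+1)/2$ items and the unused slots number $O(\sqrt{N})$. It is not Sitarski's layout, not Brodnik et al.'s layout, and not the $N+O(N^{1/r})$ structure of this paper. Indexing uses an integer square root, and because Python initializes each new block, grow is $O(1)$ here only in the amortized sense. The class name `SqrtArray` and the one-spare-block rule in `shrink` are illustrative choices.

```python
import math


class SqrtArray:
    """Resizable array whose data block b (b = 0, 1, 2, ...) has room for b + 1 items.

    k blocks hold k(k+1)/2 items, so the unused slots (the tail of the current
    block plus at most one spare block after it) number O(sqrt(N)).
    """

    def __init__(self):
        self.blocks = []  # index of data blocks; each block is a fixed-size list
        self.size = 0

    @staticmethod
    def _locate(i):
        # Block b covers global indices b(b+1)/2 .. b(b+1)/2 + b, so b is the
        # largest integer with (2b+1)^2 <= 8i+1.
        b = (math.isqrt(8 * i + 1) - 1) // 2
        return b, i - b * (b + 1) // 2

    def __getitem__(self, i):
        if not 0 <= i < self.size:
            raise IndexError(i)
        b, off = self._locate(i)
        return self.blocks[b][off]

    def __setitem__(self, i, value):
        if not 0 <= i < self.size:
            raise IndexError(i)
        b, off = self._locate(i)
        self.blocks[b][off] = value

    def grow(self, value):
        b, off = self._locate(self.size)
        if b == len(self.blocks):
            self.blocks.append([None] * (b + 1))  # allocate the next, larger block
        self.blocks[b][off] = value
        self.size += 1

    def shrink(self):
        if self.size == 0:
            raise IndexError("shrink from empty array")
        self.size -= 1
        b, off = self._locate(self.size)
        value, self.blocks[b][off] = self.blocks[b][off], None
        while len(self.blocks) > b + 2:  # keep at most one spare block past block b
            self.blocks.pop()
        return value


arr = SqrtArray()
for v in range(10):
    arr.grow(v * v)
print(arr[7], arr.shrink(), arr.size)
```

    Keeping one spare block in `shrink` avoids allocating and freeing the same block over and over when grow and shrink alternate at a block boundary.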