15,975 research outputs found

    The Skip Quadtree: A Simple Dynamic Data Structure for Multidimensional Data

    Full text link
    We present a new multi-dimensional data structure, which we call the skip quadtree (for point data in R^2) or the skip octree (for point data in R^d, with constant d>2). Our data structure combines the best features of two well-known data structures, in that it has the well-defined "box"-shaped regions of region quadtrees and the logarithmic-height search and update hierarchical structure of skip lists. Indeed, the bottom level of our structure is exactly a region quadtree (or octree for higher dimensional data). We describe efficient algorithms for inserting and deleting points in a skip quadtree, as well as fast methods for performing point location and approximate range queries.Comment: 12 pages, 3 figures. A preliminary version of this paper appeared in the 21st ACM Symp. Comp. Geom., Pisa, 2005, pp. 296-30

    Top-Down Skiplists

    Full text link
    We describe todolists (top-down skiplists), a variant of skiplists (Pugh 1990) that can execute searches using at most log2εn+O(1)\log_{2-\varepsilon} n + O(1) binary comparisons per search and that have amortized update time O(ε1logn)O(\varepsilon^{-1}\log n). A variant of todolists, called working-todolists, can execute a search for any element xx using log2εw(x)+o(logw(x))\log_{2-\varepsilon} w(x) + o(\log w(x)) binary comparisons and have amortized search time O(ε1logw(w))O(\varepsilon^{-1}\log w(w)). Here, w(x)w(x) is the "working-set number" of xx. No previous data structure is known to achieve a bound better than 4log2w(x)4\log_2 w(x) comparisons. We show through experiments that, if implemented carefully, todolists are comparable to other common dictionary implementations in terms of insertion times and outperform them in terms of search times.Comment: 18 pages, 5 figure

    Self-Improving Algorithms

    Full text link
    We investigate ways in which an algorithm can improve its expected performance by fine-tuning itself automatically with respect to an unknown input distribution D. We assume here that D is of product type. More precisely, suppose that we need to process a sequence I_1, I_2, ... of inputs I = (x_1, x_2, ..., x_n) of some fixed length n, where each x_i is drawn independently from some arbitrary, unknown distribution D_i. The goal is to design an algorithm for these inputs so that eventually the expected running time will be optimal for the input distribution D = D_1 * D_2 * ... * D_n. We give such self-improving algorithms for two problems: (i) sorting a sequence of numbers and (ii) computing the Delaunay triangulation of a planar point set. Both algorithms achieve optimal expected limiting complexity. The algorithms begin with a training phase during which they collect information about the input distribution, followed by a stationary regime in which the algorithms settle to their optimized incarnations.Comment: 26 pages, 8 figures, preliminary versions appeared at SODA 2006 and SoCG 2008. Thorough revision to improve the presentation of the pape

    Online Sorting via Searching and Selection

    Full text link
    In this paper, we present a framework based on a simple data structure and parameterized algorithms for the problems of finding items in an unsorted list of linearly ordered items based on their rank (selection) or value (search). As a side-effect of answering these online selection and search queries, we progressively sort the list. Our algorithms are based on Hoare's Quickselect, and are parameterized based on the pivot selection method. For example, if we choose the pivot as the last item in a subinterval, our framework yields algorithms that will answer q<=n unique selection and/or search queries in a total of O(n log q) average time. After q=\Omega(n) queries the list is sorted. Each repeated selection query takes constant time, and each repeated search query takes O(log n) time. The two query types can be interleaved freely. By plugging different pivot selection methods into our framework, these results can, for example, become randomized expected time or deterministic worst-case time. Our methods are easy to implement, and we show they perform well in practice

    The Wavelet Trie: Maintaining an Indexed Sequence of Strings in Compressed Space

    Full text link
    An indexed sequence of strings is a data structure for storing a string sequence that supports random access, searching, range counting and analytics operations, both for exact matches and prefix search. String sequences lie at the core of column-oriented databases, log processing, and other storage and query tasks. In these applications each string can appear several times and the order of the strings in the sequence is relevant. The prefix structure of the strings is relevant as well: common prefixes are sought in strings to extract interesting features from the sequence. Moreover, space-efficiency is highly desirable as it translates directly into higher performance, since more data can fit in fast memory. We introduce and study the problem of compressed indexed sequence of strings, representing indexed sequences of strings in nearly-optimal compressed space, both in the static and dynamic settings, while preserving provably good performance for the supported operations. We present a new data structure for this problem, the Wavelet Trie, which combines the classical Patricia Trie with the Wavelet Tree, a succinct data structure for storing a compressed sequence. The resulting Wavelet Trie smoothly adapts to a sequence of strings that changes over time. It improves on the state-of-the-art compressed data structures by supporting a dynamic alphabet (i.e. the set of distinct strings) and prefix queries, both crucial requirements in the aforementioned applications, and on traditional indexes by reducing space occupancy to close to the entropy of the sequence

    The Lazy Flipper: MAP Inference in Higher-Order Graphical Models by Depth-limited Exhaustive Search

    Full text link
    This article presents a new search algorithm for the NP-hard problem of optimizing functions of binary variables that decompose according to a graphical model. It can be applied to models of any order and structure. The main novelty is a technique to constrain the search space based on the topology of the model. When pursued to the full search depth, the algorithm is guaranteed to converge to a global optimum, passing through a series of monotonously improving local optima that are guaranteed to be optimal within a given and increasing Hamming distance. For a search depth of 1, it specializes to Iterated Conditional Modes. Between these extremes, a useful tradeoff between approximation quality and runtime is established. Experiments on models derived from both illustrative and real problems show that approximations found with limited search depth match or improve those obtained by state-of-the-art methods based on message passing and linear programming.Comment: C++ Source Code available from http://hci.iwr.uni-heidelberg.de/software.ph

    Dynamic Data Structures for Document Collections and Graphs

    Full text link
    In the dynamic indexing problem, we must maintain a changing collection of text documents so that we can efficiently support insertions, deletions, and pattern matching queries. We are especially interested in developing efficient data structures that store and query the documents in compressed form. All previous compressed solutions to this problem rely on answering rank and select queries on a dynamic sequence of symbols. Because of the lower bound in [Fredman and Saks, 1989], answering rank queries presents a bottleneck in compressed dynamic indexing. In this paper we show how this lower bound can be circumvented using our new framework. We demonstrate that the gap between static and dynamic variants of the indexing problem can be almost closed. Our method is based on a novel framework for adding dynamism to static compressed data structures. Our framework also applies more generally to dynamizing other problems. We show, for example, how our framework can be applied to develop compressed representations of dynamic graphs and binary relations

    Longest Common Extensions in Sublinear Space

    Get PDF
    The longest common extension problem (LCE problem) is to construct a data structure for an input string TT of length nn that supports LCE(i,j)(i,j) queries. Such a query returns the length of the longest common prefix of the suffixes starting at positions ii and jj in TT. This classic problem has a well-known solution that uses O(n)O(n) space and O(1)O(1) query time. In this paper we show that for any trade-off parameter 1τn1 \leq \tau \leq n, the problem can be solved in O(nτ)O(\frac{n}{\tau}) space and O(τ)O(\tau) query time. This significantly improves the previously best known time-space trade-offs, and almost matches the best known time-space product lower bound.Comment: An extended abstract of this paper has been accepted to CPM 201
    corecore