
    Radix Sorting With No Extra Space

    It is well known that $n$ integers in the range $[1, n^c]$ can be sorted in $O(n)$ time in the RAM model using radix sorting. More generally, integers in any range $[1, U]$ can be sorted in $O(n\sqrt{\log\log n})$ time. However, these algorithms use $O(n)$ words of extra memory. Is this necessary? We present a simple, stable, integer sorting algorithm for words of size $O(\log n)$, which works in $O(n)$ time and uses only $O(1)$ words of extra memory on a RAM model. This is the integer sorting case most useful in practice. We extend this result, with the same bounds, to the case when the keys are read-only, which is of theoretical interest. Another interesting question is the case of arbitrary $c$. Here we present a black-box transformation from any RAM sorting algorithm to a sorting algorithm which uses only $O(1)$ extra space and has the same running time. This settles the complexity of in-place sorting in terms of the complexity of sorting.
    Comment: Full version of a paper accepted to ESA 2007 (17 pages).
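
    For context, the following is a minimal C++ sketch of the classic LSD radix sort that the abstract takes as its baseline: it is stable and runs in $O(n)$ time for word-sized keys, but it allocates the $O(n)$ words of extra memory that the paper's algorithm eliminates. The in-place, $O(1)$-extra-space algorithm itself is more involved; this sketch only fixes ideas.

        // Classic LSD radix sort on 32-bit keys, one byte per pass. Stable and
        // O(n), but the scratch buffer uses O(n) extra words -- exactly the
        // overhead the paper's in-place algorithm removes.
        #include <array>
        #include <cstdint>
        #include <vector>

        void lsd_radix_sort(std::vector<uint32_t>& a) {
            std::vector<uint32_t> buf(a.size());          // O(n)-word scratch space
            for (int shift = 0; shift < 32; shift += 8) {
                std::array<size_t, 257> count{};          // bucket counts, offset by 1
                for (uint32_t x : a) ++count[((x >> shift) & 0xFF) + 1];
                for (int d = 0; d < 256; ++d) count[d + 1] += count[d];
                for (uint32_t x : a)                      // stable scatter into buckets
                    buf[count[(x >> shift) & 0xFF]++] = x;
                a.swap(buf);
            }
        }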

    Worst-Case Efficient Sorting with QuickMergesort

    The two most prominent solutions for the sorting problem are Quicksort and Mergesort. While Quicksort is very fast on average, Mergesort additionally gives worst-case guarantees, but needs extra space for a linear number of elements. Worst-case efficient in-place sorting, however, remains a challenge: the standard solution, Heapsort, suffers from bad cache behavior and is also not overly fast for in-cache instances. In this work we present median-of-medians QuickMergesort (MoMQuickMergesort), a new variant of QuickMergesort, which combines Quicksort with Mergesort, allowing the latter to be implemented in place. Our new variant applies the median-of-medians algorithm for selecting pivots in order to circumvent the quadratic worst case. Indeed, we show that it uses at most $n\log n + 1.6n$ comparisons for $n$ large enough. We experimentally confirm the theoretical estimates and show that the new algorithm outperforms Heapsort by far and is only around 10% slower than Introsort (the std::sort implementation of libstdc++), which has a rather poor guarantee for the worst case. We also simulate the worst case, which is only around 10% slower than the average case. In particular, the new algorithm is a natural candidate to replace Heapsort as a worst-case stopper in Introsort.
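
    The median-of-medians selection that MoMQuickMergesort uses for pivots can be sketched as follows. This is a generic textbook version in C++ under assumed names (mom_select, mom_pivot), not the authors' tuned implementation; it shows why no input can force a quadratic worst case: the k-th smallest element is found in guaranteed linear time.

        // Median-of-medians selection: find the k-th smallest (0-indexed)
        // element of a[lo, hi). Grouping into fives and recursing on the group
        // medians guarantees a balanced split, hence worst-case linear time.
        #include <algorithm>
        #include <vector>

        int mom_select(std::vector<int>& a, int lo, int hi, int k);

        static int mom_pivot(std::vector<int>& a, int lo, int hi) {
            int m = lo;
            for (int i = lo; i < hi; i += 5) {            // median of each group of <= 5
                int end = std::min(i + 5, hi);
                std::sort(a.begin() + i, a.begin() + end);
                std::swap(a[m++], a[i + (end - i) / 2]);  // collect group medians up front
            }
            return mom_select(a, lo, m, (m - lo) / 2);    // true median of the medians
        }

        int mom_select(std::vector<int>& a, int lo, int hi, int k) {
            while (hi - lo > 5) {
                int p = mom_pivot(a, lo, hi);
                int lt = int(std::partition(a.begin() + lo, a.begin() + hi,
                                            [p](int x) { return x < p; }) - a.begin());
                int eq = int(std::partition(a.begin() + lt, a.begin() + hi,
                                            [p](int x) { return x == p; }) - a.begin());
                if (k < lt - lo) hi = lt;                 // answer is left of the pivot
                else if (k < eq - lo) return p;           // answer equals the pivot
                else { k -= eq - lo; lo = eq; }           // answer is right of the pivot
            }
            std::sort(a.begin() + lo, a.begin() + hi);
            return a[lo + k];
        }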

    Maximally Consistent Sampling and the Jaccard Index of Probability Distributions

    We introduce simple, efficient algorithms for computing a MinHash of a probability distribution, suitable for both sparse and dense data, with running times equivalent to the state of the art in both cases. The collision probability of these algorithms is a new measure of the similarity of positive vectors, which we investigate in detail. We describe the sense in which this collision probability is optimal for any locality-sensitive hash based on sampling. We argue that this similarity measure is more useful for probability distributions than the similarity pursued by other algorithms for weighted MinHash, and that it is the natural generalization of the Jaccard index.
    Comment: To appear in ICDMW 201
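
    For orientation, here is a C++ sketch of the classic unweighted set MinHash that the paper generalizes: the probability that two sets receive the same MinHash equals their Jaccard index. The hashing and mixing constants below are arbitrary assumptions for the sketch; the paper's algorithms extend this collision-probability idea to probability distributions.

        // Set MinHash: for a random hash function h, Pr[min h(A) == min h(B)]
        // equals |A ∩ B| / |A ∪ B|, the Jaccard index. Averaging collisions
        // over k seeded hash functions gives an unbiased estimator.
        #include <algorithm>
        #include <cstdint>
        #include <functional>
        #include <string>
        #include <vector>

        uint64_t minhash(const std::vector<std::string>& items, uint64_t seed) {
            uint64_t best = UINT64_MAX;
            for (const auto& s : items) {
                uint64_t h = std::hash<std::string>{}(s) ^ (seed * 0x9E3779B97F4A7C15ULL);
                h *= 0xBF58476D1CE4E5B9ULL;                // cheap avalanche mixing
                h ^= h >> 31;
                best = std::min(best, h);
            }
            return best;
        }

        // Estimate Jaccard(A, B) as the fraction of k independent MinHashes that collide.
        double jaccard_estimate(const std::vector<std::string>& a,
                                const std::vector<std::string>& b, int k = 128) {
            int hits = 0;
            for (int seed = 1; seed <= k; ++seed)
                hits += (minhash(a, seed) == minhash(b, seed));
            return double(hits) / k;
        }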

    Optimal Substring-Equality Queries with Applications to Sparse Text Indexing

    We consider the problem of encoding a string of length $n$ from an integer alphabet of size $\sigma$ so that access and substring-equality queries (that is, determining the equality of any two substrings) can be answered efficiently. Any uniquely-decodable encoding supporting access must take $n\log\sigma + \Theta(\log(n\log\sigma))$ bits. We describe a new data structure matching this lower bound when $\sigma\leq n^{O(1)}$ while supporting both queries in optimal $O(1)$ time. Furthermore, we show that the string can be overwritten in place with this structure. The redundancy of $\Theta(\log n)$ bits and the constant query time break exponentially a lower bound that is known to hold in the read-only model. Using our new string representation, we obtain the first in-place subquadratic (indeed, even sublinear in some cases) algorithms for several string-processing problems in the restore model: the input string is rewritable and must be restored before the computation terminates. In particular, we describe the first in-place subquadratic Monte Carlo solutions to the sparse suffix sorting, sparse LCP array construction, and suffix selection problems. With the sole exception of suffix selection, our algorithms are also the first running in sublinear time for small enough sets of input suffixes. Combining these solutions, we obtain the first sublinear-time Monte Carlo algorithm for building the sparse suffix tree in compact space. We also show how to derandomize our algorithms using small space. This leads to the first Las Vegas in-place algorithm computing the full LCP array in $O(n\log n)$ time and to the first Las Vegas in-place algorithms solving the sparse suffix sorting and sparse LCP array construction problems in $O(n^{1.5}\sqrt{\log\sigma})$ time. Running times of these Las Vegas algorithms hold in the worst case with high probability.
    Comment: Refactored according to TALG's reviews; new w.h.p. bounds and Las Vegas algorithms.
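
    The paper's in-place encoding is beyond a short sketch, but the query interface it supports can be illustrated with standard Karp-Rabin prefix fingerprints in C++: after $O(n)$-time preprocessing, any two substrings can be compared in $O(1)$ time, correct with high probability. Note this baseline spends $O(n)$ extra words, whereas the paper achieves the same query time with only $\Theta(\log n)$ bits of redundancy.

        // Karp-Rabin fingerprints over the Mersenne prime 2^61 - 1. fp(i, j) is
        // the fingerprint of s[i, j); two substrings of equal length are equal
        // iff their fingerprints match, w.h.p. over the choice of base.
        #include <cstdint>
        #include <string>
        #include <vector>

        struct Fingerprinter {
            static constexpr uint64_t MOD = (1ULL << 61) - 1;
            uint64_t base;
            std::vector<uint64_t> pref, pw;               // prefix hashes, base powers

            static uint64_t mulmod(uint64_t a, uint64_t b) {
                __uint128_t p = (__uint128_t)a * b;       // GCC/Clang extension
                uint64_t r = (uint64_t)(p >> 61) + (uint64_t)(p & MOD);
                return r >= MOD ? r - MOD : r;
            }

            explicit Fingerprinter(const std::string& s, uint64_t b = 131) : base(b) {
                pref.assign(s.size() + 1, 0);
                pw.assign(s.size() + 1, 1);
                for (size_t i = 0; i < s.size(); ++i) {
                    pref[i + 1] = (mulmod(pref[i], base) + (unsigned char)s[i]) % MOD;
                    pw[i + 1] = mulmod(pw[i], base);
                }
            }

            uint64_t fp(size_t i, size_t j) const {       // fingerprint of s[i, j)
                uint64_t sub = pref[j] + MOD - mulmod(pref[i], pw[j - i]);
                return sub >= MOD ? sub - MOD : sub;
            }

            bool equal(size_t i, size_t j, size_t len) const {
                return fp(i, i + len) == fp(j, j + len);  // O(1) substring equality
            }
        };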

    Smooth heaps and a dual view of self-adjusting data structures

    We present a new connection between self-adjusting binary search trees (BSTs) and heaps, two fundamental, extensively studied, and practically relevant families of data structures. Roughly speaking, we map an arbitrary heap algorithm within a natural model to a corresponding BST algorithm with the same cost on a dual sequence of operations (i.e. the same sequence with the roles of time and key-space switched). This is the first general transformation between the two families of data structures. There is a rich theory of dynamic optimality for BSTs (i.e. the theory of competitiveness between BST algorithms). The lack of an analogous theory for heaps has been noted in the literature. Through our connection, we transfer all instance-specific lower bounds known for BSTs to a general model of heaps, initiating a theory of dynamic optimality for heaps. On the algorithmic side, we obtain a new, simple and efficient heap algorithm, which we call the smooth heap. We show the smooth heap to be the heap counterpart of Greedy, the BST algorithm with the strongest proven and conjectured properties from the literature, widely believed to be instance-optimal. Assuming the optimality of Greedy, the smooth heap is also optimal within our model of heap algorithms. As corollaries of results known for Greedy, we obtain instance-specific upper bounds for the smooth heap, with applications in adaptive sorting. Intriguingly, the smooth heap, although derived from a non-practical BST algorithm, is simple and easy to implement (e.g. it stores no auxiliary data besides the keys and tree pointers). It can be seen as a variation on the popular pairing heap data structure, extending it with a "power-of-two-choices" type of heuristic.
    Comment: Presented at STOC 2018; light revision, additional figure.
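
    Since the smooth heap is presented as a variation on the pairing heap, a minimal C++ sketch of the classic pairing heap may help fix the picture: insert and meld are single comparison-and-link operations, and delete-min restructures the root's children with a two-pass pairing. The smooth heap replaces this linking rule with its "power-of-two-choices" style heuristic; the sketch below is the standard structure, not the paper's algorithm.

        // Classic pairing heap (min-heap). Each node stores only a key and two
        // pointers, matching the abstract's point that no auxiliary data is needed.
        #include <utility>
        #include <vector>

        struct Node {
            int key;
            Node* child = nullptr;     // leftmost child
            Node* sibling = nullptr;   // next sibling to the right
        };

        // Link two roots: the root with the larger key becomes the leftmost
        // child of the other. One comparison is the heap's basic operation.
        Node* link(Node* a, Node* b) {
            if (!a) return b;
            if (!b) return a;
            if (b->key < a->key) std::swap(a, b);
            b->sibling = a->child;
            a->child = b;
            return a;
        }

        Node* insert(Node* root, Node* x) { return link(root, x); }

        // Delete-min (caller reads root->key first; the old root is freed):
        // pair the children left to right, then fold the pairs right to left.
        Node* delete_min(Node* root) {
            std::vector<Node*> pairs;
            for (Node* c = root->child; c; ) {
                Node* d = c->sibling;
                Node* next = d ? d->sibling : nullptr;
                c->sibling = nullptr;
                if (d) d->sibling = nullptr;
                pairs.push_back(link(c, d));
                c = next;
            }
            Node* result = nullptr;
            for (auto it = pairs.rbegin(); it != pairs.rend(); ++it)
                result = link(result, *it);
            delete root;
            return result;
        }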

    Improved parallel integer sorting without concurrent writing

    We show that $n$ integers in the range $1 \ldots n$ can be stably sorted on an EREW PRAM using $O(t)$ time and $O(n(\sqrt{\log n\log\log n}+(\log n)^2/t))$ operations, for arbitrary given $t\ge\log n\log\log n$, and on a CREW PRAM using $O(t)$ time and $O(n(\sqrt{\log n}+\log n/2^{t/\log n}))$ operations, for arbitrary given $t\ge\log n$. In addition, we are able to sort $n$ arbitrary integers on a randomized CREW PRAM within the same resource bounds with high probability. In each case our algorithm is a factor of almost $\Theta(\sqrt{\log n})$ closer to optimality than all previous algorithms for the stated problem in the stated model, and our third result matches the operation count of the best known sequential algorithm. We also show that $n$ integers in the range $1 \ldots m$ can be sorted in $O((\log n)^2)$ time with $O(n)$ operations on an EREW PRAM using a nonstandard word length of $O(\log n\log\log n\log m)$ bits, thereby greatly improving the upper bound on the word length necessary to sort integers with a linear time-processor product, even sequentially. Our algorithms were inspired by, and in one case directly use, the fusion trees of Fredman and Willard.
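
    The workhorse behind stable parallel integer sorting of this kind is the work-efficient parallel prefix sum (scan), the canonical EREW PRAM primitive. The C++ sketch below is a generic illustration, not the paper's algorithm: it is written sequentially, but structured so that every inner loop touches disjoint cells and can run concurrently without read or write conflicts, giving $O(\log n)$ rounds and $O(n)$ work on an EREW PRAM.

        // Blelloch-style work-efficient exclusive scan, in place. On an EREW
        // PRAM each inner loop is one conflict-free parallel round; the two
        // sweeps together do O(n) work in O(log n) rounds.
        #include <vector>

        void exclusive_scan(std::vector<long>& a) {       // n must be a power of two here
            const size_t n = a.size();
            for (size_t d = 1; d < n; d <<= 1)            // up-sweep: build a reduction tree
                for (size_t i = 2 * d - 1; i < n; i += 2 * d)
                    a[i] += a[i - d];
            a[n - 1] = 0;                                 // clear the root
            for (size_t d = n / 2; d >= 1; d >>= 1)       // down-sweep: distribute prefixes
                for (size_t i = 2 * d - 1; i < n; i += 2 * d) {
                    long t = a[i - d];
                    a[i - d] = a[i];                      // left child gets parent's prefix
                    a[i] += t;                            // right child adds left subtree sum
                }
        }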

    Fast integer merging on the EREW PRAM

    We investigate the complexity of merging sequences of small integers on the EREW PRAM. Our most surprising result is that two sorted sequences of $n$ bits each can be merged in $O(\log\log n)$ time. More generally, we describe an algorithm to merge two sorted sequences of $n$ integers drawn from the set $\{0,\ldots,m-1\}$ in $O(\log\log n+\log m)$ time using an optimal number of processors. No sublogarithmic merging algorithm for this model of computation was previously known. The algorithm not only produces the merged sequence, but also computes the rank of each input element in the merged sequence. On the other hand, we show a lower bound of $\Omega(\log\min\{n,m\})$ on the time needed to merge two sorted sequences of length $n$ each with elements in the set $\{0,\ldots,m-1\}$, implying that our merging algorithm is as fast as possible for $m=(\log n)^{\Omega(1)}$. If we impose an additional stability condition requiring the ranks of each input sequence to form an increasing sequence, then the time complexity of the problem becomes $\Theta(\log n)$, even for $m=2$. Stable merging is thus harder than nonstable merging.
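
    The cross-ranking output mentioned in the abstract can be pinned down with a simple sequential C++ sketch (hypothetical, not the PRAM algorithm): a single merge pass that records, for every input element, its position in the merged sequence. The PRAM algorithm computes exactly these ranks, only in $O(\log\log n+\log m)$ parallel time.

        // Merge two sorted sequences and report each element's rank in the
        // merged order. Ties go to sequence a, the stable convention.
        #include <vector>

        void merge_with_ranks(const std::vector<int>& a, const std::vector<int>& b,
                              std::vector<int>& rank_a, std::vector<int>& rank_b) {
            rank_a.assign(a.size(), 0);
            rank_b.assign(b.size(), 0);
            size_t i = 0, j = 0;
            while (i < a.size() || j < b.size()) {
                if (j == b.size() || (i < a.size() && a[i] <= b[j])) {
                    rank_a[i] = int(i + j);               // position in merged sequence
                    ++i;
                } else {
                    rank_b[j] = int(i + j);
                    ++j;
                }
            }
        }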