5,972 research outputs found
Radix Sorting With No Extra Space
It is well known that n integers in the range [1,n^c] can be sorted in O(n)
time in the RAM model using radix sorting. More generally, integers in any
range [1,U] can be sorted in O(n sqrt{loglog n}) time. However, these
algorithms use O(n) words of extra memory. Is this necessary?
We present a simple, stable, integer sorting algorithm for words of size
O(log n), which works in O(n) time and uses only O(1) words of extra memory on
a RAM model. This is the integer sorting case most useful in practice. We
extend this result with same bounds to the case when the keys are read-only,
which is of theoretical interest. Another interesting question is the case of
arbitrary c. Here we present a black-box transformation from any RAM sorting
algorithm to a sorting algorithm which uses only O(1) extra space and has the
same running time. This settles the complexity of in-place sorting in terms of
the complexity of sorting.Comment: Full version of paper accepted to ESA 2007. (17 pages
Worst-Case Efficient Sorting with QuickMergesort
The two most prominent solutions for the sorting problem are Quicksort and
Mergesort. While Quicksort is very fast on average, Mergesort additionally
gives worst-case guarantees, but needs extra space for a linear number of
elements. Worst-case efficient in-place sorting, however, remains a challenge:
the standard solution, Heapsort, suffers from a bad cache behavior and is also
not overly fast for in-cache instances.
In this work we present median-of-medians QuickMergesort (MoMQuickMergesort),
a new variant of QuickMergesort, which combines Quicksort with Mergesort
allowing the latter to be implemented in place. Our new variant applies the
median-of-medians algorithm for selecting pivots in order to circumvent the
quadratic worst case. Indeed, we show that it uses at most
comparisons for large enough.
We experimentally confirm the theoretical estimates and show that the new
algorithm outperforms Heapsort by far and is only around 10% slower than
Introsort (std::sort implementation of stdlibc++), which has a rather poor
guarantee for the worst case. We also simulate the worst case, which is only
around 10% slower than the average case. In particular, the new algorithm is a
natural candidate to replace Heapsort as a worst-case stopper in Introsort
Maximally Consistent Sampling and the Jaccard Index of Probability Distributions
We introduce simple, efficient algorithms for computing a MinHash of a
probability distribution, suitable for both sparse and dense data, with
equivalent running times to the state of the art for both cases. The collision
probability of these algorithms is a new measure of the similarity of positive
vectors which we investigate in detail. We describe the sense in which this
collision probability is optimal for any Locality Sensitive Hash based on
sampling. We argue that this similarity measure is more useful for probability
distributions than the similarity pursued by other algorithms for weighted
MinHash, and is the natural generalization of the Jaccard index.Comment: To appear in ICDMW 201
Optimal Substring-Equality Queries with Applications to Sparse Text Indexing
We consider the problem of encoding a string of length from an integer
alphabet of size so that access and substring equality queries (that
is, determining the equality of any two substrings) can be answered
efficiently. Any uniquely-decodable encoding supporting access must take
bits. We describe a new data
structure matching this lower bound when while supporting
both queries in optimal time. Furthermore, we show that the string can
be overwritten in-place with this structure. The redundancy of
bits and the constant query time break exponentially a lower bound that is
known to hold in the read-only model. Using our new string representation, we
obtain the first in-place subquadratic (indeed, even sublinear in some cases)
algorithms for several string-processing problems in the restore model: the
input string is rewritable and must be restored before the computation
terminates. In particular, we describe the first in-place subquadratic Monte
Carlo solutions to the sparse suffix sorting, sparse LCP array construction,
and suffix selection problems. With the sole exception of suffix selection, our
algorithms are also the first running in sublinear time for small enough sets
of input suffixes. Combining these solutions, we obtain the first
sublinear-time Monte Carlo algorithm for building the sparse suffix tree in
compact space. We also show how to derandomize our algorithms using small
space. This leads to the first Las Vegas in-place algorithm computing the full
LCP array in time and to the first Las Vegas in-place algorithms
solving the sparse suffix sorting and sparse LCP array construction problems in
time. Running times of these Las Vegas
algorithms hold in the worst case with high probability.Comment: Refactored according to TALG's reviews. New w.h.p. bounds and Las
Vegas algorithm
Smooth heaps and a dual view of self-adjusting data structures
We present a new connection between self-adjusting binary search trees (BSTs)
and heaps, two fundamental, extensively studied, and practically relevant
families of data structures. Roughly speaking, we map an arbitrary heap
algorithm within a natural model, to a corresponding BST algorithm with the
same cost on a dual sequence of operations (i.e. the same sequence with the
roles of time and key-space switched). This is the first general transformation
between the two families of data structures.
There is a rich theory of dynamic optimality for BSTs (i.e. the theory of
competitiveness between BST algorithms). The lack of an analogous theory for
heaps has been noted in the literature. Through our connection, we transfer all
instance-specific lower bounds known for BSTs to a general model of heaps,
initiating a theory of dynamic optimality for heaps.
On the algorithmic side, we obtain a new, simple and efficient heap
algorithm, which we call the smooth heap. We show the smooth heap to be the
heap-counterpart of Greedy, the BST algorithm with the strongest proven and
conjectured properties from the literature, widely believed to be
instance-optimal. Assuming the optimality of Greedy, the smooth heap is also
optimal within our model of heap algorithms. As corollaries of results known
for Greedy, we obtain instance-specific upper bounds for the smooth heap, with
applications in adaptive sorting.
Intriguingly, the smooth heap, although derived from a non-practical BST
algorithm, is simple and easy to implement (e.g. it stores no auxiliary data
besides the keys and tree pointers). It can be seen as a variation on the
popular pairing heap data structure, extending it with a "power-of-two-choices"
type of heuristic.Comment: Presented at STOC 2018, light revision, additional figure
Improved parallel integer sorting without concurrent writing
We show that integers in the range 1 \twodots n can be stably sorted on an \linebreak EREW PRAM using \nolinebreak time \linebreak and operations, for arbitrary given \linebreak , and on a CREW PRAM using % time and time and operations, for arbitrary given . In addition, we are able to sort arbitrary integers on a randomized CREW PRAM % using % time and operations within the same resource bounds with high probability. In each case our algorithm is a factor of almost closer to optimality than all previous algorithms for the stated problem in the stated model, and our third result matches the operation count of the best known sequential algorithm. We also show that integers in the range 1 \twodots m can be sorted in time with operations on an EREW PRAM using a nonstandard word length of bits, thereby greatly improving the upper bound on the word length necessary to sort integers with a linear time-processor product, even sequentially. Our algorithms were inspired by, and in one case directly use, the fusion trees of Fredman and Willard
Fast integer merging on the EREW PRAM
We investigate the complexity of merging sequences of small integers on the EREW PRAM. Our most surprising result is that two sorted sequences of bits each can be merged in time. More generally, we describe an algorithm to merge two sorted sequences of integers drawn from the set in time using an optimal number of processors. No sublogarithmic merging algorithm for this model of computation was previously known. The algorithm not only produces the merged sequence, but also computes the rank of each input element in the merged sequence. On the other hand, we show a lower bound of on the time needed to merge two sorted sequences of length each with elements in the set , implying that our merging algorithm is as fast as possible for . If we impose an additional stability condition requiring the ranks of each input sequence to form an increasing sequence, then the time complexity of the problem becomes , even for . Stable merging is thus harder than nonstable merging
- …