
    Radix Sorting With No Extra Space

    It is well known that $n$ integers in the range $[1, n^c]$ can be sorted in $O(n)$ time in the RAM model using radix sorting. More generally, integers in any range $[1, U]$ can be sorted in $O(n\sqrt{\log\log n})$ time. However, these algorithms use $O(n)$ words of extra memory. Is this necessary? We present a simple, stable, integer sorting algorithm for words of size $O(\log n)$, which works in $O(n)$ time and uses only $O(1)$ words of extra memory on a RAM model. This is the integer sorting case most useful in practice. We extend this result, with the same bounds, to the case when the keys are read-only, which is of theoretical interest. Another interesting question is the case of arbitrary $c$. Here we present a black-box transformation from any RAM sorting algorithm to a sorting algorithm which uses only $O(1)$ extra space and has the same running time. This settles the complexity of in-place sorting in terms of the complexity of sorting.
    Comment: Full version of a paper accepted to ESA 2007 (17 pages).
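
    For context, the following is a minimal C++ sketch of the classic LSD radix sort that the abstract takes as its baseline: it is stable and runs in $O(n)$ time for word-sized keys, but it allocates the $O(n)$ words of extra memory that the paper's algorithm eliminates. The in-place, $O(1)$-extra-space algorithm itself is more involved; this sketch only fixes ideas.

        // Classic LSD radix sort on 32-bit keys, one byte per pass. Stable and
        // O(n), but the scratch buffer uses O(n) extra words -- exactly the
        // overhead the paper's in-place algorithm removes.
        #include <array>
        #include <cstdint>
        #include <vector>

        void lsd_radix_sort(std::vector<uint32_t>& a) {
            std::vector<uint32_t> buf(a.size());          // O(n)-word scratch space
            for (int shift = 0; shift < 32; shift += 8) {
                std::array<size_t, 257> count{};          // bucket counts, offset by 1
                for (uint32_t x : a) ++count[((x >> shift) & 0xFF) + 1];
                for (int d = 0; d < 256; ++d) count[d + 1] += count[d];
                for (uint32_t x : a)                      // stable scatter into buckets
                    buf[count[(x >> shift) & 0xFF]++] = x;
                a.swap(buf);
            }
        }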

    Worst-Case Efficient Sorting with QuickMergesort

    The two most prominent solutions for the sorting problem are Quicksort and Mergesort. While Quicksort is very fast on average, Mergesort additionally gives worst-case guarantees, but needs extra space for a linear number of elements. Worst-case efficient in-place sorting, however, remains a challenge: the standard solution, Heapsort, suffers from bad cache behavior and is also not overly fast for in-cache instances. In this work we present median-of-medians QuickMergesort (MoMQuickMergesort), a new variant of QuickMergesort, which combines Quicksort with Mergesort, allowing the latter to be implemented in place. Our new variant applies the median-of-medians algorithm for selecting pivots in order to circumvent the quadratic worst case. Indeed, we show that it uses at most $n\log n + 1.6n$ comparisons for $n$ large enough. We experimentally confirm the theoretical estimates and show that the new algorithm outperforms Heapsort by far and is only around 10% slower than Introsort (the std::sort implementation of libstdc++), which has a rather poor guarantee for the worst case. We also simulate the worst case, which is only around 10% slower than the average case. In particular, the new algorithm is a natural candidate to replace Heapsort as a worst-case stopper in Introsort.
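
    The median-of-medians selection that MoMQuickMergesort uses for pivots can be sketched as follows. This is a generic textbook version in C++ under assumed names (mom_select, mom_pivot), not the authors' tuned implementation; it shows why no input can force a quadratic worst case: the k-th smallest element is found in guaranteed linear time.

        // Median-of-medians selection: find the k-th smallest (0-indexed)
        // element of a[lo, hi). Grouping into fives and recursing on the group
        // medians guarantees a balanced split, hence worst-case linear time.
        #include <algorithm>
        #include <vector>

        int mom_select(std::vector<int>& a, int lo, int hi, int k);

        static int mom_pivot(std::vector<int>& a, int lo, int hi) {
            int m = lo;
            for (int i = lo; i < hi; i += 5) {            // median of each group of <= 5
                int end = std::min(i + 5, hi);
                std::sort(a.begin() + i, a.begin() + end);
                std::swap(a[m++], a[i + (end - i) / 2]);  // collect group medians up front
            }
            return mom_select(a, lo, m, (m - lo) / 2);    // true median of the medians
        }

        int mom_select(std::vector<int>& a, int lo, int hi, int k) {
            while (hi - lo > 5) {
                int p = mom_pivot(a, lo, hi);
                int lt = int(std::partition(a.begin() + lo, a.begin() + hi,
                                            [p](int x) { return x < p; }) - a.begin());
                int eq = int(std::partition(a.begin() + lt, a.begin() + hi,
                                            [p](int x) { return x == p; }) - a.begin());
                if (k < lt - lo) hi = lt;                 // answer is left of the pivot
                else if (k < eq - lo) return p;           // answer equals the pivot
                else { k -= eq - lo; lo = eq; }           // answer is right of the pivot
            }
            std::sort(a.begin() + lo, a.begin() + hi);
            return a[lo + k];
        }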

    Maximally Consistent Sampling and the Jaccard Index of Probability Distributions

    We introduce simple, efficient algorithms for computing a MinHash of a probability distribution, suitable for both sparse and dense data, with running times equivalent to the state of the art in both cases. The collision probability of these algorithms is a new measure of the similarity of positive vectors, which we investigate in detail. We describe the sense in which this collision probability is optimal for any locality-sensitive hash based on sampling. We argue that this similarity measure is more useful for probability distributions than the similarity pursued by other algorithms for weighted MinHash, and that it is the natural generalization of the Jaccard index.
    Comment: To appear in ICDMW 201
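
    For orientation, here is a C++ sketch of the classic unweighted set MinHash that the paper generalizes: the probability that two sets receive the same MinHash equals their Jaccard index. The hashing and mixing constants below are arbitrary assumptions for the sketch; the paper's algorithms extend this collision-probability idea to probability distributions.

        // Set MinHash: for a random hash function h, Pr[min h(A) == min h(B)]
        // equals |A ∩ B| / |A ∪ B|, the Jaccard index. Averaging collisions
        // over k seeded hash functions gives an unbiased estimator.
        #include <algorithm>
        #include <cstdint>
        #include <functional>
        #include <string>
        #include <vector>

        uint64_t minhash(const std::vector<std::string>& items, uint64_t seed) {
            uint64_t best = UINT64_MAX;
            for (const auto& s : items) {
                uint64_t h = std::hash<std::string>{}(s) ^ (seed * 0x9E3779B97F4A7C15ULL);
                h *= 0xBF58476D1CE4E5B9ULL;                // cheap avalanche mixing
                h ^= h >> 31;
                best = std::min(best, h);
            }
            return best;
        }

        // Estimate Jaccard(A, B) as the fraction of k independent MinHashes that collide.
        double jaccard_estimate(const std::vector<std::string>& a,
                                const std::vector<std::string>& b, int k = 128) {
            int hits = 0;
            for (int seed = 1; seed <= k; ++seed)
                hits += (minhash(a, seed) == minhash(b, seed));
            return double(hits) / k;
        }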

    Optimal Substring-Equality Queries with Applications to Sparse Text Indexing

    We consider the problem of encoding a string of length $n$ from an integer alphabet of size $\sigma$ so that access and substring-equality queries (that is, determining the equality of any two substrings) can be answered efficiently. Any uniquely-decodable encoding supporting access must take $n\log\sigma + \Theta(\log(n\log\sigma))$ bits. We describe a new data structure matching this lower bound when $\sigma\leq n^{O(1)}$ while supporting both queries in optimal $O(1)$ time. Furthermore, we show that the string can be overwritten in place with this structure. The redundancy of $\Theta(\log n)$ bits and the constant query time break exponentially a lower bound that is known to hold in the read-only model. Using our new string representation, we obtain the first in-place subquadratic (indeed, even sublinear in some cases) algorithms for several string-processing problems in the restore model: the input string is rewritable and must be restored before the computation terminates. In particular, we describe the first in-place subquadratic Monte Carlo solutions to the sparse suffix sorting, sparse LCP array construction, and suffix selection problems. With the sole exception of suffix selection, our algorithms are also the first running in sublinear time for small enough sets of input suffixes. Combining these solutions, we obtain the first sublinear-time Monte Carlo algorithm for building the sparse suffix tree in compact space. We also show how to derandomize our algorithms using small space. This leads to the first Las Vegas in-place algorithm computing the full LCP array in $O(n\log n)$ time and to the first Las Vegas in-place algorithms solving the sparse suffix sorting and sparse LCP array construction problems in $O(n^{1.5}\sqrt{\log\sigma})$ time. Running times of these Las Vegas algorithms hold in the worst case with high probability.
    Comment: Refactored according to TALG's reviews; new w.h.p. bounds and Las Vegas algorithms.
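
    The paper's in-place encoding is beyond a short sketch, but the query interface it supports can be illustrated with standard Karp-Rabin prefix fingerprints in C++: after $O(n)$-time preprocessing, any two substrings can be compared in $O(1)$ time, correct with high probability. Note this baseline spends $O(n)$ extra words, whereas the paper achieves the same query time with only $\Theta(\log n)$ bits of redundancy.

        // Karp-Rabin fingerprints over the Mersenne prime 2^61 - 1. fp(i, j) is
        // the fingerprint of s[i, j); two substrings of equal length are equal
        // iff their fingerprints match, w.h.p. over the choice of base.
        #include <cstdint>
        #include <string>
        #include <vector>

        struct Fingerprinter {
            static constexpr uint64_t MOD = (1ULL << 61) - 1;
            uint64_t base;
            std::vector<uint64_t> pref, pw;               // prefix hashes, base powers

            static uint64_t mulmod(uint64_t a, uint64_t b) {
                __uint128_t p = (__uint128_t)a * b;       // GCC/Clang extension
                uint64_t r = (uint64_t)(p >> 61) + (uint64_t)(p & MOD);
                return r >= MOD ? r - MOD : r;
            }

            explicit Fingerprinter(const std::string& s, uint64_t b = 131) : base(b) {
                pref.assign(s.size() + 1, 0);
                pw.assign(s.size() + 1, 1);
                for (size_t i = 0; i < s.size(); ++i) {
                    pref[i + 1] = (mulmod(pref[i], base) + (unsigned char)s[i]) % MOD;
                    pw[i + 1] = mulmod(pw[i], base);
                }
            }

            uint64_t fp(size_t i, size_t j) const {       // fingerprint of s[i, j)
                uint64_t sub = pref[j] + MOD - mulmod(pref[i], pw[j - i]);
                return sub >= MOD ? sub - MOD : sub;
            }

            bool equal(size_t i, size_t j, size_t len) const {
                return fp(i, i + len) == fp(j, j + len);  // O(1) substring equality
            }
        };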

    Smooth heaps and a dual view of self-adjusting data structures

    We present a new connection between self-adjusting binary search trees (BSTs) and heaps, two fundamental, extensively studied, and practically relevant families of data structures. Roughly speaking, we map an arbitrary heap algorithm within a natural model to a corresponding BST algorithm with the same cost on a dual sequence of operations (i.e. the same sequence with the roles of time and key-space switched). This is the first general transformation between the two families of data structures. There is a rich theory of dynamic optimality for BSTs (i.e. the theory of competitiveness between BST algorithms). The lack of an analogous theory for heaps has been noted in the literature. Through our connection, we transfer all instance-specific lower bounds known for BSTs to a general model of heaps, initiating a theory of dynamic optimality for heaps. On the algorithmic side, we obtain a new, simple and efficient heap algorithm, which we call the smooth heap. We show the smooth heap to be the heap counterpart of Greedy, the BST algorithm with the strongest proven and conjectured properties from the literature, widely believed to be instance-optimal. Assuming the optimality of Greedy, the smooth heap is also optimal within our model of heap algorithms. As corollaries of results known for Greedy, we obtain instance-specific upper bounds for the smooth heap, with applications in adaptive sorting. Intriguingly, the smooth heap, although derived from a non-practical BST algorithm, is simple and easy to implement (e.g. it stores no auxiliary data besides the keys and tree pointers). It can be seen as a variation on the popular pairing heap data structure, extending it with a "power-of-two-choices" type of heuristic.
    Comment: Presented at STOC 2018; light revision, additional figure.
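
    Since the smooth heap is presented as a variation on the pairing heap, a minimal C++ sketch of the classic pairing heap may help fix the picture: insert and meld are single comparison-and-link operations, and delete-min restructures the root's children with a two-pass pairing. The smooth heap replaces this linking rule with its "power-of-two-choices" style heuristic; the sketch below is the standard structure, not the paper's algorithm.

        // Classic pairing heap (min-heap). Each node stores only a key and two
        // pointers, matching the abstract's point that no auxiliary data is needed.
        #include <utility>
        #include <vector>

        struct Node {
            int key;
            Node* child = nullptr;     // leftmost child
            Node* sibling = nullptr;   // next sibling to the right
        };

        // Link two roots: the root with the larger key becomes the leftmost
        // child of the other. One comparison is the heap's basic operation.
        Node* link(Node* a, Node* b) {
            if (!a) return b;
            if (!b) return a;
            if (b->key < a->key) std::swap(a, b);
            b->sibling = a->child;
            a->child = b;
            return a;
        }

        Node* insert(Node* root, Node* x) { return link(root, x); }

        // Delete-min (caller reads root->key first; the old root is freed):
        // pair the children left to right, then fold the pairs right to left.
        Node* delete_min(Node* root) {
            std::vector<Node*> pairs;
            for (Node* c = root->child; c; ) {
                Node* d = c->sibling;
                Node* next = d ? d->sibling : nullptr;
                c->sibling = nullptr;
                if (d) d->sibling = nullptr;
                pairs.push_back(link(c, d));
                c = next;
            }
            Node* result = nullptr;
            for (auto it = pairs.rbegin(); it != pairs.rend(); ++it)
                result = link(result, *it);
            delete root;
            return result;
        }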

    Improved parallel integer sorting without concurrent writing

    We show that $n$ integers in the range $1 \ldots n$ can be stably sorted on an EREW PRAM using $O(t)$ time and $O(n(\sqrt{\log n\log\log n}+(\log n)^2/t))$ operations, for arbitrary given $t\ge\log n\log\log n$, and on a CREW PRAM using $O(t)$ time and $O(n(\sqrt{\log n}+\log n/2^{t/\log n}))$ operations, for arbitrary given $t\ge\log n$. In addition, we are able to sort $n$ arbitrary integers on a randomized CREW PRAM within the same resource bounds with high probability. In each case our algorithm is a factor of almost $\Theta(\sqrt{\log n})$ closer to optimality than all previous algorithms for the stated problem in the stated model, and our third result matches the operation count of the best known sequential algorithm. We also show that $n$ integers in the range $1 \ldots m$ can be sorted in $O((\log n)^2)$ time with $O(n)$ operations on an EREW PRAM using a nonstandard word length of $O(\log n\log\log n\log m)$ bits, thereby greatly improving the upper bound on the word length necessary to sort integers with a linear time-processor product, even sequentially. Our algorithms were inspired by, and in one case directly use, the fusion trees of Fredman and Willard.
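
    The workhorse behind stable parallel integer sorting of this kind is the work-efficient parallel prefix sum (scan), the canonical EREW PRAM primitive. The C++ sketch below is a generic illustration, not the paper's algorithm: it is written sequentially, but structured so that every inner loop touches disjoint cells and can run concurrently without read or write conflicts, giving $O(\log n)$ rounds and $O(n)$ work on an EREW PRAM.

        // Blelloch-style work-efficient exclusive scan, in place. On an EREW
        // PRAM each inner loop is one conflict-free parallel round; the two
        // sweeps together do O(n) work in O(log n) rounds.
        #include <vector>

        void exclusive_scan(std::vector<long>& a) {       // n must be a power of two here
            const size_t n = a.size();
            for (size_t d = 1; d < n; d <<= 1)            // up-sweep: build a reduction tree
                for (size_t i = 2 * d - 1; i < n; i += 2 * d)
                    a[i] += a[i - d];
            a[n - 1] = 0;                                 // clear the root
            for (size_t d = n / 2; d >= 1; d >>= 1)       // down-sweep: distribute prefixes
                for (size_t i = 2 * d - 1; i < n; i += 2 * d) {
                    long t = a[i - d];
                    a[i - d] = a[i];                      // left child gets parent's prefix
                    a[i] += t;                            // right child adds left subtree sum
                }
        }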

    Fast integer merging on the EREW PRAM

    We investigate the complexity of merging sequences of small integers on the EREW PRAM. Our most surprising result is that two sorted sequences of $n$ bits each can be merged in $O(\log\log n)$ time. More generally, we describe an algorithm to merge two sorted sequences of $n$ integers drawn from the set $\{0,\ldots,m-1\}$ in $O(\log\log n+\log m)$ time using an optimal number of processors. No sublogarithmic merging algorithm for this model of computation was previously known. The algorithm not only produces the merged sequence, but also computes the rank of each input element in the merged sequence. On the other hand, we show a lower bound of $\Omega(\log\min\{n,m\})$ on the time needed to merge two sorted sequences of length $n$ each with elements in the set $\{0,\ldots,m-1\}$, implying that our merging algorithm is as fast as possible for $m=(\log n)^{\Omega(1)}$. If we impose an additional stability condition requiring the ranks of each input sequence to form an increasing sequence, then the time complexity of the problem becomes $\Theta(\log n)$, even for $m=2$. Stable merging is thus harder than nonstable merging.
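
    The cross-ranking output mentioned in the abstract can be pinned down with a simple sequential C++ sketch (hypothetical, not the PRAM algorithm): a single merge pass that records, for every input element, its position in the merged sequence. The PRAM algorithm computes exactly these ranks, only in $O(\log\log n+\log m)$ parallel time.

        // Merge two sorted sequences and report each element's rank in the
        // merged order. Ties go to sequence a, the stable convention.
        #include <vector>

        void merge_with_ranks(const std::vector<int>& a, const std::vector<int>& b,
                              std::vector<int>& rank_a, std::vector<int>& rank_b) {
            rank_a.assign(a.size(), 0);
            rank_b.assign(b.size(), 0);
            size_t i = 0, j = 0;
            while (i < a.size() || j < b.size()) {
                if (j == b.size() || (i < a.size() && a[i] <= b[j])) {
                    rank_a[i] = int(i + j);               // position in merged sequence
                    ++i;
                } else {
                    rank_b[j] = int(i + j);
                    ++j;
                }
            }
        }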