57 research outputs found
Distributional convergence for the number of symbol comparisons used by QuickSort
Most previous studies of the sorting algorithm QuickSort have used the number
of key comparisons as a measure of the cost of executing the algorithm. Here we
suppose that the n independent and identically distributed (i.i.d.) keys are
each represented as a sequence of symbols from a probabilistic source and that
QuickSort operates on individual symbols, and we measure the execution cost as
the number of symbol comparisons. Assuming only a mild "tameness" condition on
the source, we show that there is a limiting distribution for the number of
symbol comparisons after normalization: first centering by the mean and then
dividing by n. Additionally, under a condition that grows more restrictive as p
increases, we have convergence of moments of orders p and smaller. In
particular, we have convergence in distribution and convergence of moments of
every order whenever the source is memoryless, that is, whenever each key is
generated as an infinite string of i.i.d. symbols. This is somewhat surprising;
even for the classical model that each key is an i.i.d. string of unbiased
("fair") bits, the mean exhibits periodic fluctuations of order n.
Comment: Published at http://dx.doi.org/10.1214/12-AAP866 in the Annals of
Applied Probability (http://www.imstat.org/aap/) by the Institute of
Mathematical Statistics (http://www.imstat.org
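To make the cost measure in the abstract above concrete, the following is a minimal sketch that counts both key comparisons and symbol comparisons in a simple QuickSort. It is an illustration only, not the authors' construction: the helper names (`compare_symbols`, `quicksort`, `random_keys`), the first-key pivot rule, and the truncation of the infinite symbol strings to a fixed finite length are all assumptions made here.

```python
import random


def compare_symbols(x, y, counter):
    """Lexicographically compare two equal-length symbol sequences.

    Returns -1, 0, or 1. Adds one to counter["keys"] per call and one to
    counter["symbols"] for every pair of symbols inspected.
    """
    counter["keys"] += 1
    for a, b in zip(x, y):
        counter["symbols"] += 1
        if a != b:
            return -1 if a < b else 1
    return 0


def quicksort(keys, counter):
    """Sort keys using the first key as pivot (an assumed pivot rule)."""
    if len(keys) <= 1:
        return list(keys)
    pivot, rest = keys[0], keys[1:]
    smaller, larger = [], []
    for key in rest:
        if compare_symbols(key, pivot, counter) < 0:
            smaller.append(key)
        else:
            larger.append(key)
    return quicksort(smaller, counter) + [pivot] + quicksort(larger, counter)


def random_keys(n, length, rng):
    """Draw n keys of i.i.d. fair bits, truncated to `length` symbols."""
    return [tuple(rng.randint(0, 1) for _ in range(length)) for _ in range(n)]


if __name__ == "__main__":
    counter = {"keys": 0, "symbols": 0}
    keys = random_keys(1000, 64, random.Random(0))
    quicksort(keys, counter)
    print(counter)
```

Each key comparison inspects symbols up to and including the first position where the two keys differ, so the symbol count is never smaller than the key count.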
Analysis of Quickselect under Yaroslavskiy's Dual-Pivoting Algorithm
There is excitement within the algorithms community about a new partitioning
method introduced by Yaroslavskiy. This algorithm renders Quicksort slightly
faster than the case when it runs under classic partitioning methods. We show
that this improved performance in Quicksort is not sustained in Quickselect, a
variant of Quicksort for finding order statistics. We investigate the number of
comparisons made by Quickselect to find a key with a randomly selected rank
under Yaroslavskiy's algorithm. This grand averaging is a smoothing operator
over all individual distributions for specific fixed order statistics. We give
the exact grand average. The grand distribution of the number of comparisons
(when suitably scaled) is given as the fixed-point solution of a distributional
equation of a contraction in the Zolotarev metric space. Our investigation
shows that Quickselect under older partitioning methods slightly outperforms
Quickselect under Yaroslavskiy's algorithm, for an order statistic of a random
rank. Similar results are obtained for extremal order statistics, where again
we find the exact average, and the distribution for the number of comparisons
(when suitably scaled). Both limiting distributions are of perpetuities (a sum
of products of independent mixed continuous random variables).
Comment: full version with appendices; otherwise identical to Algorithmica
version
Distributional convergence for the number of symbol comparisons used by QuickSelect
When the search algorithm QuickSelect compares keys during its execution in
order to find a key of target rank, it must operate on the keys'
representations or internal structures, which were ignored by the previous
studies that quantified the execution cost for the algorithm in terms of the
number of required key comparisons. In this paper, we analyze running costs for
the algorithm that take into account not only the number of key comparisons but
also the cost of each key comparison. We suppose that keys are represented as
sequences of symbols generated by various probabilistic sources and that
QuickSelect operates on individual symbols in order to find the target key. We
identify limiting distributions for the costs and derive integral and series
expressions for the expectations of the limiting distributions. These
expressions are used to recapture previously obtained results on the number of
key comparisons required by the algorithm.
Comment: The first paragraph in the proof of Theorem 3.1 has been corrected in
this revision, and references have been updated
The limiting distribution for the number of symbol comparisons used by QuickSort is nondegenerate (extended abstract)
In a continuous-time setting, Fill (2010) proved, for a large class of
probabilistic sources, that the number of symbol comparisons used by QuickSort,
when centered by subtracting the mean and scaled by dividing by time, has a
limiting distribution, but proved little about that limiting random variable Y
-- not even that it is nondegenerate. We establish the nondegeneracy of Y. The
proof is perhaps surprisingly difficult.
Analysis of pivot sampling in dual-pivot Quicksort: A holistic analysis of Yaroslavskiy's partitioning scheme
The final publication is available at Springer via http://dx.doi.org/10.1007/s00453-015-0041-7
The new dual-pivot Quicksort by Vladimir Yaroslavskiy, used in Oracle's Java runtime library since version 7, features intriguing asymmetries. They make a basic variant of this algorithm use fewer comparisons than classic single-pivot Quicksort. In this paper, we extend the analysis to the case where the two pivots are chosen as fixed order statistics of a random sample. Surprisingly, dual-pivot Quicksort then needs more comparisons than a corresponding version of classic Quicksort, so it is clear that counting comparisons is not sufficient to explain the running time advantages observed for Yaroslavskiy's algorithm in practice. Consequently, we take a more holistic approach and also give the precise leading term of the average number of swaps, the number of executed Java Bytecode instructions, and the number of scanned elements, a new simple cost measure that approximates I/O costs in the memory hierarchy. We determine optimal order statistics for each of the cost measures. It turns out that the asymmetries in Yaroslavskiy's algorithm render pivots with a systematic skew more efficient than the symmetric choice. Moreover, we finally have a convincing explanation for the success of Yaroslavskiy's algorithm in practice: compared with corresponding versions of classic single-pivot Quicksort, dual-pivot Quicksort needs significantly fewer I/Os, both with and without pivot sampling.
Peer Reviewed
Postprint (author's final draft)
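For readers unfamiliar with the partitioning scheme discussed above, here is a simplified sketch of a dual-pivot Quicksort in the style of Yaroslavskiy's method. It is an assumption-laden illustration: the two pivots are simply the outermost elements (no pivot sampling), there is no insertion-sort cutoff as used in library implementations, and the function name `dual_pivot_quicksort` is hypothetical.

```python
def dual_pivot_quicksort(a, left=0, right=None):
    """Sort list `a` in place with two pivots p <= q.

    Invariant during partitioning:
      a[left+1 .. l-1]  <  p
      a[l .. k-1]       in [p, q]
      a[g+1 .. right-1] >  q  (or equal to q)
    """
    if right is None:
        right = len(a) - 1
    if right <= left:
        return
    # Choose the outermost elements as pivots and order them.
    if a[left] > a[right]:
        a[left], a[right] = a[right], a[left]
    p, q = a[left], a[right]
    l, g = left + 1, right - 1
    k = l
    while k <= g:
        if a[k] < p:
            # Small element: move it to the left region.
            a[k], a[l] = a[l], a[k]
            l += 1
        elif a[k] >= q:
            # Large element: find a slot from the right, then swap.
            while a[g] > q and k < g:
                g -= 1
            a[k], a[g] = a[g], a[k]
            g -= 1
            # The element swapped in may itself be small.
            if a[k] < p:
                a[k], a[l] = a[l], a[k]
                l += 1
        k += 1
    l -= 1
    g += 1
    # Put the pivots into their final positions.
    a[left], a[l] = a[l], a[left]
    a[right], a[g] = a[g], a[right]
    dual_pivot_quicksort(a, left, l - 1)
    dual_pivot_quicksort(a, l + 1, g - 1)
    dual_pivot_quicksort(a, g + 1, right)


if __name__ == "__main__":
    data = [5, 3, 9, 1, 5, 7, 2, 8]
    dual_pivot_quicksort(data)
    print(data)
```

The asymmetry mentioned in the abstract is visible in the loop: each element is compared with `p` first and with `q` only if the first test fails, while elements scanned from the right are compared with `q` first.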
A general framework for the realistic analysis of sorting and searching algorithms. Application to some popular algorithms
We describe a general framework for realistic analysis of sorting and searching algorithms, and we apply it to the average-case analysis of five basic algorithms: three sorting algorithms (QuickSort, InsertionSort, BubbleSort) and two selection algorithms (QuickMin and SelectionMin). Usually, the analysis deals with the mean number of key comparisons, but here we view keys as words produced by the same source, which are compared via their symbols in the lexicographic order. The "realistic" cost of the algorithm is now the total number of symbol comparisons performed by the algorithm, and, in this context, the average-case analysis aims to provide estimates for the mean number of symbol comparisons used by the algorithm. For sorting algorithms, and with respect to key comparisons, the average-case complexity of QuickSort is asymptotic to 2n log n, that of InsertionSort to n^2/4, and that of BubbleSort to n^2/2. With respect to symbol comparisons, we prove that their average-case complexity becomes Theta(n log^2 n), Theta(n^2), and Theta(n^2 log n), respectively. For selection algorithms, and with respect to key comparisons, the average-case complexity of QuickMin is asymptotic to 2n, and that of SelectionMin is n - 1. With respect to symbol comparisons, we prove that their average-case complexity remains Theta(n). In these five cases, we describe the dominant constants, which exhibit the probabilistic behaviour of the source (namely, entropy and various notions of coincidence) with respect to the algorithm.
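The difference between the two cost measures for SelectionMin can be sketched as follows. This is a hedged illustration rather than the paper's framework: the names `symbol_less` and `selection_min` are hypothetical, keys are ordinary finite Python strings, and a comparison that runs off the end of the shorter word is assumed to cost no extra symbol comparison.

```python
def symbol_less(x, y, counter):
    """Return True if word x precedes word y lexicographically.

    Counts one key comparison per call and one symbol comparison per
    pair of symbols inspected.
    """
    counter["keys"] += 1
    for a, b in zip(x, y):
        counter["symbols"] += 1
        if a != b:
            return a < b
    # One word is a prefix of the other: the shorter word comes first.
    return len(x) < len(y)


def selection_min(words, counter):
    """Find the minimum of a non-empty list with a single linear scan."""
    best = words[0]
    for w in words[1:]:
        if symbol_less(w, best, counter):
            best = w
    return best


if __name__ == "__main__":
    counter = {"keys": 0, "symbols": 0}
    words = ["banana", "band", "apple", "bandana"]
    print(selection_min(words, counter), counter)
```

The scan always performs exactly n - 1 key comparisons, while the number of symbol comparisons depends on how long the common prefixes of the compared words are.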