8 research outputs found

    Distributional convergence for the number of symbol comparisons used by QuickSort

    Full text link
    Most previous studies of the sorting algorithm QuickSort have used the number of key comparisons as a measure of the cost of executing the algorithm. Here we suppose that the n independent and identically distributed (i.i.d.) keys are each represented as a sequence of symbols from a probabilistic source and that QuickSort operates on individual symbols, and we measure the execution cost as the number of symbol comparisons. Assuming only a mild "tameness" condition on the source, we show that there is a limiting distribution for the number of symbol comparisons after normalization: first centering by the mean and then dividing by n. Additionally, under a condition that grows more restrictive as p increases, we have convergence of moments of orders p and smaller. In particular, we have convergence in distribution and convergence of moments of every order whenever the source is memoryless, that is, whenever each key is generated as an infinite string of i.i.d. symbols. This is somewhat surprising; even for the classical model that each key is an i.i.d. string of unbiased ("fair") bits, the mean exhibits periodic fluctuations of order n.Comment: Published in at http://dx.doi.org/10.1214/12-AAP866 the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Analysis of pivot sampling in dual-pivot Quicksort: A holistic analysis of Yaroslavskiy's partitioning scheme

    Get PDF
    The final publication is available at Springer via http://dx.doi.org/10.1007/s00453-015-0041-7The new dual-pivot Quicksort by Vladimir Yaroslavskiy-used in Oracle's Java runtime library since version 7-features intriguing asymmetries. They make a basic variant of this algorithm use less comparisons than classic single-pivot Quicksort. In this paper, we extend the analysis to the case where the two pivots are chosen as fixed order statistics of a random sample. Surprisingly, dual-pivot Quicksort then needs more comparisons than a corresponding version of classic Quicksort, so it is clear that counting comparisons is not sufficient to explain the running time advantages observed for Yaroslavskiy's algorithm in practice. Consequently, we take a more holistic approach and give also the precise leading term of the average number of swaps, the number of executed Java Bytecode instructions and the number of scanned elements, a new simple cost measure that approximates I/O costs in the memory hierarchy. We determine optimal order statistics for each of the cost measures. It turns out that the asymmetries in Yaroslavskiy's algorithm render pivots with a systematic skew more efficient than the symmetric choice. Moreover, we finally have a convincing explanation for the success of Yaroslavskiy's algorithm in practice: compared with corresponding versions of classic single-pivot Quicksort, dual-pivot Quicksort needs significantly less I/Os, both with and without pivot sampling.Peer ReviewedPostprint (author's final draft
    corecore