Online Sorting via Searching and Selection
In this paper, we present a framework, built on a simple data structure and
parameterized algorithms, for finding items in an unsorted list of linearly
ordered items by rank (selection) or by value (search). As a side effect of
answering these online selection and search queries, we progressively sort the
list. Our algorithms are based on Hoare's Quickselect and are parameterized by
the pivot selection method.
For example, if we choose the pivot as the last item in a subinterval, our
framework yields algorithms that answer q ≤ n unique selection and/or search
queries in a total of O(n log q) average time. After q = Ω(n) queries the list
is sorted. Each repeated selection query takes constant time, and each
repeated search query takes O(log n) time. The two query types can be
interleaved freely. By plugging different pivot selection methods into our
framework, these results can, for example, become randomized expected-time or
deterministic worst-case-time bounds. Our methods are easy to implement, and
we show they perform well in practice.
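To make the core idea concrete, here is a minimal sketch (class and method names are ours; the paper's framework is more general, also supports value searches, and parameterizes the pivot rule) of a Quickselect-based structure that answers rank queries and, as a side effect, progressively sorts the list by remembering pivot positions as "fences":

```python
import bisect
import random

class OnlineSorter:
    """Sketch of online selection via Quickselect (hypothetical class)."""

    def __init__(self, items):
        self.a = list(items)
        # A fence at position f means max(a[:f]) <= min(a[f:]).
        # The list is fully sorted once every position is a fence.
        self.fences = [0, len(self.a)]

    def select(self, k):
        """Return the item of rank k (0-based), partitioning lazily."""
        i = bisect.bisect_right(self.fences, k)
        lo, hi = self.fences[i - 1], self.fences[i]  # unsorted run holding k
        while hi - lo > 1:
            p = self._partition(lo, hi, random.randrange(lo, hi))
            self._add_fence(p)      # the pivot is in its final position,
            self._add_fence(p + 1)  # so both sides of it become fences
            if k < p:
                hi = p
            elif k > p:
                lo = p + 1
            else:
                break
        return self.a[k]

    def _add_fence(self, f):
        i = bisect.bisect_left(self.fences, f)
        if i == len(self.fences) or self.fences[i] != f:
            self.fences.insert(i, f)

    def _partition(self, lo, hi, pidx):
        # Lomuto partition of a[lo:hi]; returns the pivot's final index.
        a = self.a
        a[pidx], a[hi - 1] = a[hi - 1], a[pidx]
        store = lo
        for t in range(lo, hi - 1):
            if a[t] < a[hi - 1]:
                a[t], a[store] = a[store], a[t]
                store += 1
        a[store], a[hi - 1] = a[hi - 1], a[store]
        return store

# Example: repeated queries reuse earlier partitioning work.
s = OnlineSorter([5, 2, 9, 1, 7])
assert s.select(2) == 5   # the median
assert s.select(0) == 1   # cheaper now: the list is partially sorted
```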
Worst-Case Efficient Sorting with QuickMergesort
The two most prominent solutions for the sorting problem are Quicksort and
Mergesort. While Quicksort is very fast on average, Mergesort additionally
gives worst-case guarantees, but needs extra space for a linear number of
elements. Worst-case efficient in-place sorting, however, remains a challenge:
the standard solution, Heapsort, suffers from bad cache behavior and is also
not particularly fast on in-cache instances.
In this work we present median-of-medians QuickMergesort (MoMQuickMergesort),
a new variant of QuickMergesort, which combines Quicksort with Mergesort
allowing the latter to be implemented in place. Our new variant applies the
median-of-medians algorithm for selecting pivots in order to circumvent the
quadratic worst case. Indeed, we show that it uses at most
n log n + 1.59n + O(n^0.8) comparisons for n large enough.
We experimentally confirm the theoretical estimates and show that the new
algorithm outperforms Heapsort by far and is only around 10% slower than
Introsort (the std::sort implementation of libstdc++), which has a rather poor
guarantee for the worst case. We also simulate the worst case, which is only
around 10% slower than the average case. In particular, the new algorithm is a
natural candidate to replace Heapsort as a worst-case stopper in Introsort.
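The deterministic pivot rule at the heart of this variant is the classical median-of-medians selection algorithm. The following is a minimal sketch of that subroutine alone (the classical textbook version, not MoMQuickMergesort itself, which additionally interleaves an in-place Mergesort):

```python
def mom_select(a, k):
    """Median-of-medians selection: the k-th smallest element of list a,
    found in worst-case linear time."""
    if len(a) <= 5:
        return sorted(a)[k]
    # Median of each group of 5, then recursively their median:
    medians = [sorted(a[i:i + 5])[len(a[i:i + 5]) // 2]
               for i in range(0, len(a), 5)]
    pivot = mom_select(medians, len(medians) // 2)
    # The pivot's rank is guaranteed to lie within roughly 30%-70%,
    # which rules out the quadratic worst case of bad pivot choices.
    lows = [x for x in a if x < pivot]
    highs = [x for x in a if x > pivot]
    if k < len(lows):
        return mom_select(lows, k)
    if k >= len(a) - len(highs):
        return mom_select(highs, k - (len(a) - len(highs)))
    return pivot
```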
The analysis of approximate quickselect and related problems
Approximate Quickselect, a simple modification of the well-known Quickselect algorithm for selection, can be used to efficiently find an element whose rank k lies in a given range [i..j], out of n given elements. We study basic cost measures of Approximate Quickselect by computing exact and asymptotic results for the expected number of passes, comparisons and data moves during the execution of this algorithm. The key element in the analysis of Approximate Quickselect is a trivariate recurrence that we solve in full generality. The general solution of the recurrence proves very useful, as it allows us to tackle several related problems beyond the analysis that originally motivated us. In particular, we carry out a precise analysis of the expected number of moves of the i-th element when selecting the j-th smallest element with standard Quickselect, giving both exact and asymptotic results. Moreover, we apply our general results to obtain exact and asymptotic results for several parameters in binary search trees, namely the expected number of common ancestors of the nodes with ranks i and j, the expected size of the subtree rooted at the least common ancestor of the nodes with ranks i and j, and the expected distance between the nodes of ranks i and j.
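A compact sketch of the algorithm under analysis (our own implementation, with random pivots and our own naming) shows why it is cheaper than exact selection: the recursion stops as soon as any pivot lands inside the target rank range:

```python
import random

def partition(a, lo, hi, pidx):
    # Lomuto partition of a[lo..hi] (inclusive); returns pivot's final index.
    a[pidx], a[hi] = a[hi], a[pidx]
    store = lo
    for t in range(lo, hi):
        if a[t] < a[hi]:
            a[t], a[store] = a[store], a[t]
            store += 1
    a[store], a[hi] = a[hi], a[store]
    return store

def approx_quickselect(a, i, j):
    """Return some element of a whose rank lies in [i..j] (0-based,
    inclusive). With i == j this degenerates to standard Quickselect."""
    lo, hi = 0, len(a) - 1
    while True:
        p = partition(a, lo, hi, random.randint(lo, hi))
        if i <= p <= j:
            return a[p]        # pivot rank is inside the target range
        if p < i:
            lo = p + 1         # continue in the right part
        else:
            hi = p - 1         # continue in the left part
```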
Perspects in astrophysical databases
Astrophysics has become a domain extremely rich in scientific data. Data
mining tools are needed to extract information from such large datasets. This
calls for an approach to data management that emphasizes the efficiency and
simplicity of data access: efficiency is obtained using multidimensional
access methods, and simplicity is achieved by properly handling metadata.
Moreover, clustering and classification techniques on large datasets pose
additional requirements in terms of computational and memory scalability and
interpretability of results. In this study we review some possible solutions.
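As an illustration of what a multidimensional access method buys in this setting (a toy sketch with made-up data, using a k-d tree rather than any specific system the paper reviews):

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical catalogue: 100,000 objects with (ra, dec) in degrees.
rng = np.random.default_rng(0)
catalogue = rng.uniform([0.0, -90.0], [360.0, 90.0], size=(100_000, 2))

tree = cKDTree(catalogue)       # multidimensional index over the catalogue
centre = [180.0, 0.0]
# All objects within 1 degree of the centre (flat-sky approximation,
# good enough to illustrate an indexed range query versus a full scan):
hits = tree.query_ball_point(centre, r=1.0)
print(len(hits), "objects found in the search region")
```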
Analysis of pivot sampling in dual-pivot Quicksort: A holistic analysis of Yaroslavskiy's partitioning scheme
The new dual-pivot Quicksort by Vladimir Yaroslavskiy, used in Oracle's Java runtime library since version 7, features intriguing asymmetries. They make a basic variant of this algorithm use fewer comparisons than classic single-pivot Quicksort. In this paper, we extend the analysis to the case where the two pivots are chosen as fixed order statistics of a random sample. Surprisingly, dual-pivot Quicksort then needs more comparisons than a corresponding version of classic Quicksort, so it is clear that counting comparisons is not sufficient to explain the running time advantages observed for Yaroslavskiy's algorithm in practice. Consequently, we take a more holistic approach and also give the precise leading term of the average number of swaps, the number of executed Java Bytecode instructions, and the number of scanned elements, a new simple cost measure that approximates I/O costs in the memory hierarchy. We determine optimal order statistics for each of the cost measures. It turns out that the asymmetries in Yaroslavskiy's algorithm render pivots with a systematic skew more efficient than the symmetric choice. Moreover, we finally have a convincing explanation for the success of Yaroslavskiy's algorithm in practice: compared with corresponding versions of classic single-pivot Quicksort, dual-pivot Quicksort needs significantly fewer I/Os, both with and without pivot sampling.
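For reference, here is a compact sketch of Yaroslavskiy-style dual-pivot Quicksort (our simplified version, taking the first and last elements as pivots; the paper instead draws the pivots as order statistics of a random sample):

```python
def dual_pivot_quicksort(a, lo=0, hi=None):
    """Yaroslavskiy-style three-way partitioning around two pivots p <= q."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    if a[lo] > a[hi]:
        a[lo], a[hi] = a[hi], a[lo]
    p, q = a[lo], a[hi]
    lt, gt, k = lo + 1, hi - 1, lo + 1
    while k <= gt:
        if a[k] < p:                      # goes to the left part
            a[k], a[lt] = a[lt], a[k]
            lt += 1
        elif a[k] > q:                    # goes to the right part
            while a[gt] > q and k < gt:
                gt -= 1
            a[k], a[gt] = a[gt], a[k]
            gt -= 1
            if a[k] < p:
                a[k], a[lt] = a[lt], a[k]
                lt += 1
        k += 1                            # middle part: p <= a[k] <= q
    lt -= 1
    gt += 1
    a[lo], a[lt] = a[lt], a[lo]           # pivots to their final positions
    a[hi], a[gt] = a[gt], a[hi]
    dual_pivot_quicksort(a, lo, lt - 1)
    dual_pivot_quicksort(a, lt + 1, gt - 1)
    dual_pivot_quicksort(a, gt + 1, hi)
```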