    Engineering Faster Sorters for Small Sets of Items

    Sorting a set of items is a task that can be useful by itself or as a building block for more complex operations. That is why a lot of effort has been put into finding sorting algorithms that sort large sets as fast as possible. But the more sophisticated and complex the algorithms become, the less efficient they are for small sets of items due to large constant factors. We aim to determine if there is a faster way than insertion sort to sort small sets of items to provide a more efficient base case sorter. We looked at sorting networks, at how they can improve the speed of sorting few elements, and how to implement them in an efficient manner by using conditional moves. Since sorting networks need to be implemented explicitly for each set size, providing networks for larger sizes becomes less efficient due to increased code sizes. To also enable the sorting of slightly larger base cases, we adapted sample sort to Register Sample Sort, to break down those larger sets into sizes that can in turn be sorted by sorting networks. From our experiments we found that when sorting only small sets, the sorting networks outperform insertion sort by a factor of at least 1.76 for any array size between six and sixteen, and by a factor of 2.72 on average across all machines and array sizes. When integrating sorting networks as a base case sorter into Quicksort, we achieved far smaller performance improvements, which is probably due to the networks having a larger code size and cluttering the L1 instruction cache. But for x86 machines with a larger L1 instruction cache of 64 KiB or more, we obtained speedups of 12.7% when using sorting networks as a base case sorter in std::sort. In conclusion, the desired improvement in speed could only be achieved under special circumstances, but the results clearly show the potential of using conditional moves in the field of sorting algorithms. Comment: arXiv admin note: substantial text overlap with arXiv:1908.0811
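As a rough illustration of the sorting-network idea in the abstract above, here is a minimal Python sketch of a fixed 4-input network. The names `NETWORK_4`, `compare_exchange`, and `sort4` are hypothetical and not taken from the paper. Python does not expose conditional-move instructions, so the `min`/`max` pair only mirrors the branch-free compare-exchange that the paper implements at a lower level.

```python
from itertools import permutations

# Comparator pairs of a well-known 5-comparator sorting network for 4 inputs.
# (Illustrative choice; the paper's own networks are not reproduced here.)
NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]


def compare_exchange(a, i, j):
    """Put the smaller of a[i], a[j] at index i and the larger at index j.

    Written with min/max instead of an if-statement to mirror the
    branch-free style of a conditional-move implementation.
    """
    lo, hi = min(a[i], a[j]), max(a[i], a[j])
    a[i], a[j] = lo, hi


def sort4(a):
    """Sort a 4-element list in place using the fixed comparator sequence."""
    for i, j in NETWORK_4:
        compare_exchange(a, i, j)
    return a


if __name__ == "__main__":
    # The comparator sequence is data-independent: every input takes
    # exactly the same five compare-exchange steps.
    assert all(sort4(list(p)) == [0, 1, 2, 3] for p in permutations(range(4)))
```

Because the sequence of comparisons never depends on the data, a network like this has to be written out separately for each input size, which is the code-size cost the abstract mentions.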

    QuickXsort: Efficient Sorting with n log n - 1.399n +o(n) Comparisons on Average

    In this paper we generalize the idea of QuickHeapsort leading to the notion of QuickXsort. Given some external sorting algorithm X, QuickXsort yields an internal sorting algorithm if X satisfies certain natural conditions. With QuickWeakHeapsort and QuickMergesort we present two examples for the QuickXsort-construction. Both are efficient algorithms that incur approximately n log n - 1.26n + o(n) comparisons on the average. A worst case of n log n + O(n) comparisons can be achieved without significantly affecting the average case. Furthermore, we describe an implementation of MergeInsertion for small n. Taking MergeInsertion as a base case for QuickMergesort, we establish a worst-case efficient sorting algorithm calling for n log n - 1.3999n + o(n) comparisons on average. QuickMergesort with constant size base cases shows the best performance on practical inputs: when sorting integers it is only 15% slower than STL-Introsort.

    An analytical approach to sorting in periodic potentials

    There has been a recent revolution in the ability to manipulate micrometer-sized objects on surfaces patterned by traps or obstacles of controllable configurations and shapes. One application of this technology is to separate particles driven across such a surface by an external force according to some particle characteristic such as size or index of refraction. The surface features cause the trajectories of particles driven across the surface to deviate from the direction of the force by an amount that depends on the particular characteristic, thus leading to sorting. While models of this behavior have provided a good understanding of these observations, the solutions have so far been primarily numerical. In this paper we provide analytic predictions for the dependence of the angle between the direction of motion and the external force on a number of model parameters for periodic as well as random surfaces. We test these predictions against exact numerical simulations

    An Efficient Multiway Mergesort for GPU Architectures

    Sorting is a primitive operation that is a building block for countless algorithms. As such, it is important to design sorting algorithms that approach peak performance on a range of hardware architectures. Graphics Processing Units (GPUs) are particularly attractive architectures as they provide massive parallelism and computing power. However, the intricacies of their compute and memory hierarchies make designing GPU-efficient algorithms challenging. In this work we present GPU Multiway Mergesort (MMS), a new GPU-efficient multiway mergesort algorithm. MMS employs a new partitioning technique that exposes the parallelism needed by modern GPU architectures. To the best of our knowledge, MMS is the first sorting algorithm for the GPU that is asymptotically optimal in terms of global memory accesses and that is completely free of shared memory bank conflicts. We realize an initial implementation of MMS, evaluate its performance on three modern GPU architectures, and compare it to competitive implementations available in state-of-the-art GPU libraries. Despite these implementations being highly optimized, MMS compares favorably, achieving performance improvements for most random inputs. Furthermore, unlike MMS, state-of-the-art algorithms are susceptible to bank conflicts. We find that for certain inputs that cause these algorithms to incur large numbers of bank conflicts, MMS can achieve up to a 37.6% speedup over its fastest competitor. Overall, even though its current implementation is not fully optimized, due to its efficient use of the memory hierarchy, MMS outperforms the fastest comparison-based sorting implementations available to date.

    On the average running time of odd-even merge sort

    This paper is concerned with the average running time of Batcher's odd-even merge sort when implemented on a collection of processors. We consider the case where $n$, the size of the input, is an arbitrary multiple of the number $p$ of processors used. We show that Batcher's odd-even merge (for two sorted lists of length $n$ each) can be implemented to run in time $O((n/p)(\log(2+p^2/n)))$ on the average, and that odd-even merge sort can be implemented to run in time $O((n/p)(\log n+\log p\log(2+p^2/n)))$ on the average. In the case of merging (sorting), the average is taken over all possible outcomes of the merging (all possible permutations of $n$ elements). That means that odd-even merge and odd-even merge sort have an optimal average running time if $n\geq p^2$. The constants involved are also quite small.
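For readers unfamiliar with Batcher's construction, the following minimal Python sketch generates the comparator pairs of the classic sequential odd-even merge sort network for a power-of-two input size. It does not model the multi-processor implementation or the average-case analysis that the paper studies, and the function names are illustrative assumptions.

```python
def odd_even_merge(lo, hi, r):
    """Yield comparators that merge the two sorted halves of a[lo..hi].

    Indices are inclusive; r is the stride between the elements
    considered at this recursion level.
    """
    step = r * 2
    if step < hi - lo:
        # Recursively merge the even-indexed and odd-indexed subsequences,
        yield from odd_even_merge(lo, hi, step)
        yield from odd_even_merge(lo + r, hi, step)
        # then fix up neighbouring pairs between the two results.
        for i in range(lo + r, hi - r, step):
            yield (i, i + r)
    else:
        yield (lo, lo + r)


def odd_even_merge_sort_range(lo, hi):
    """Yield comparators that sort a[lo..hi] (inclusive, power-of-two length)."""
    if hi - lo >= 1:
        mid = lo + (hi - lo) // 2
        yield from odd_even_merge_sort_range(lo, mid)
        yield from odd_even_merge_sort_range(mid + 1, hi)
        yield from odd_even_merge(lo, hi, 1)


def odd_even_merge_sort_network(n):
    """Return the comparator list for n inputs; n is assumed to be a power of two."""
    return list(odd_even_merge_sort_range(0, n - 1))


def apply_network(a, pairs):
    """Apply compare-exchange steps to a list in place and return it."""
    for i, j in pairs:
        if a[i] > a[j]:
            a[i], a[j] = a[j], a[i]
    return a


if __name__ == "__main__":
    pairs = odd_even_merge_sort_network(8)
    print(len(pairs))  # 19 comparators for 8 inputs
    print(apply_network([7, 3, 6, 0, 5, 1, 4, 2], pairs))
```

Generating the comparator list separately from applying it keeps the data-independent structure visible: the same pairs are used for every input of a given size.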

    Improved Average Complexity for Comparison-Based Sorting

    This paper studies the average complexity on the number of comparisons for sorting algorithms. Its information-theoretic lower bound is $n \lg n - 1.4427n + O(\log n)$. For many efficient algorithms, the first $n\lg n$ term is easy to achieve and our focus is on the (negative) constant factor of the linear term. The current best value is $-1.3999$ for the MergeInsertion sort. Our new value is $-1.4106$, narrowing the gap by some $25\%$. An important building block of our algorithm is "two-element insertion," which inserts two numbers $A$ and $B$, $A<B$, into a sorted sequence $T$. This insertion algorithm is still sufficiently simple for rigorous mathematical analysis and works well for a certain range of the length of $T$ for which the simple binary insertion does not, thus allowing us to take a complementary approach with the binary insertion. Comment: 21 pages, 2 figures
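The "some 25%" figure in the abstract above can be checked from the three quoted constants alone. This is a quick arithmetic sketch, not a derivation from the paper, and the labels $g_{\text{old}}$ and $g_{\text{new}}$ are introduced here only for illustration. Taking the gap to be the distance between an algorithm's linear-term coefficient and the lower-bound coefficient $-1.4427$:

$$
\begin{aligned}
g_{\text{old}} &= 1.4427 - 1.3999 = 0.0428,\\
g_{\text{new}} &= 1.4427 - 1.4106 = 0.0321,\\
\frac{g_{\text{old}} - g_{\text{new}}}{g_{\text{old}}} &= \frac{0.0107}{0.0428} = 0.25 .
\end{aligned}
$$

So the new constant closes one quarter of the remaining distance to the information-theoretic bound.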