Engineering Faster Sorters for Small Sets of Items
Sorting a set of items is a task that can be useful by itself or as a
building block for more complex operations. That is why a lot of effort has
been put into finding sorting algorithms that sort large sets as fast as
possible. But the more sophisticated and complex the algorithms become, the
less efficient they are for small sets of items due to large constant factors.
We aim to determine if there is a faster way than insertion sort to sort small
sets of items to provide a more efficient base case sorter. We looked at
sorting networks, at how they can improve the speed of sorting few elements,
and how to implement them in an efficient manner by using conditional moves.
Since sorting networks need to be implemented explicitly for each set size,
providing networks for larger sizes becomes less efficient due to increased
code sizes. To also enable the sorting of slightly larger base cases, we
adapted sample sort to Register Sample Sort, to break down those larger sets
into sizes that can in turn be sorted by sorting networks. From our experiments
we found that when sorting only small sets, the sorting networks outperform
insertion sort by a factor of at least 1.76 for any array size between six and
sixteen, and by a factor of 2.72 on average across all machines and array
sizes. When integrating sorting networks as a base case sorter into Quicksort,
we achieved far smaller performance improvements, probably because the
networks' larger code size clutters the L1 instruction cache. But
for x86 machines with a larger L1 instruction cache of 64 KiB or more, we
obtained speedups of 12.7% when using sorting networks as a base case sorter in
std::sort. In conclusion, the desired improvement in speed could only be
achieved under special circumstances, but the results clearly show the
potential of using conditional moves in the field of sorting algorithms.
Comment: arXiv admin note: substantial text overlap with arXiv:1908.0811
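The compare-exchange primitive behind these networks is small enough to sketch. The following Python illustration is ours, not the paper's (the paper works in C++, where the comparator compiles to conditional-move instructions; `min`/`max` below merely stand in for that branchless idea, and `NET4` is a classic textbook network, not one of the paper's generated networks).

```python
from itertools import permutations

# One comparator: order the pair (a[i], a[j]) without an explicit branch.
# In branchless C++ this becomes two conditional moves.
def compare_exchange(a, i, j):
    a[i], a[j] = min(a[i], a[j]), max(a[i], a[j])

# A classic 5-comparator sorting network for 4 elements.
NET4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]

def sort4(a):
    for i, j in NET4:
        compare_exchange(a, i, j)
    return a

# A network is data-oblivious: the same comparator sequence sorts every
# input, which is exactly what makes branch-free code generation possible.
for p in permutations(range(4)):
    assert sort4(list(p)) == [0, 1, 2, 3]
```

Because the comparator sequence is fixed per size, each array size needs its own unrolled network, which is the code-size issue the abstract mentions.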
QuickXsort: Efficient Sorting with n log n - 1.399n + o(n) Comparisons on Average
In this paper we generalize the idea of QuickHeapsort leading to the notion
of QuickXsort. Given some external sorting algorithm X, QuickXsort yields an
internal sorting algorithm if X satisfies certain natural conditions.
With QuickWeakHeapsort and QuickMergesort we present two examples for the
QuickXsort construction. Both are efficient algorithms that incur approximately
n log n - 1.26n + o(n) comparisons on the average. A worst case of n log n +
O(n) comparisons can be achieved without significantly affecting the average
case.
Furthermore, we describe an implementation of MergeInsertion for small n.
Taking MergeInsertion as a base case for QuickMergesort, we establish a
worst-case efficient sorting algorithm calling for n log n - 1.3999n + o(n)
comparisons on average. QuickMergesort with constant size base cases shows the
best performance on practical inputs: when sorting integers, it is slower than
STL-Introsort by only 15%.
An analytical approach to sorting in periodic potentials
There has been a recent revolution in the ability to manipulate
micrometer-sized objects on surfaces patterned by traps or obstacles of
controllable configurations and shapes. One application of this technology is
to separate particles driven across such a surface by an external force
according to some particle characteristic such as size or index of refraction.
The surface features cause the trajectories of particles driven across the
surface to deviate from the direction of the force by an amount that depends on
the particular characteristic, thus leading to sorting. While models of this
behavior have provided a good understanding of these observations, the
solutions have so far been primarily numerical. In this paper we provide
analytic predictions for the dependence of the angle between the direction of
motion and the external force on a number of model parameters for periodic as
well as random surfaces. We test these predictions against exact numerical
simulations.
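The sorting mechanism described here can be illustrated with a toy numerical model. The sketch below is entirely ours, with arbitrary parameters: an overdamped particle is driven at an angle across a separable periodic potential, and its migration direction locks to a lattice axis instead of following the force, which is the angular deviation the paper analyzes.

```python
import math

def migration_angle(theta_f, f=1.0, u=0.08, dt=0.01, steps=20000):
    """Overdamped dynamics x' = F - grad V on the illustrative potential
    V(x, y) = u * (cos(2*pi*x) + cos(2*pi*y)); all parameters are toy values."""
    fx, fy = f * math.cos(theta_f), f * math.sin(theta_f)
    x = y = 0.0
    for _ in range(steps):
        # -dV/dx = 2*pi*u*sin(2*pi*x), and likewise for y (forward Euler).
        x += (fx + 2 * math.pi * u * math.sin(2 * math.pi * x)) * dt
        y += (fy + 2 * math.pi * u * math.sin(2 * math.pi * y)) * dt
    return math.atan2(y, x)

# A force tilted 10 degrees off a lattice axis: the y-component is too weak
# to hop the barriers, so the trajectory locks to the x direction and the
# migration angle deviates strongly from the force angle.
angle = migration_angle(math.radians(10.0))
```

With the potential switched off (u = 0) the particle simply follows the force, so the deviation is entirely a surface effect.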
An Efficient Multiway Mergesort for GPU Architectures
Sorting is a primitive operation that is a building block for countless
algorithms. As such, it is important to design sorting algorithms that approach
peak performance on a range of hardware architectures. Graphics Processing
Units (GPUs) are particularly attractive architectures as they provide massive
parallelism and computing power. However, the intricacies of their compute and
memory hierarchies make designing GPU-efficient algorithms challenging. In this
work we present GPU Multiway Mergesort (MMS), a new GPU-efficient multiway
mergesort algorithm. MMS employs a new partitioning technique that exposes the
parallelism needed by modern GPU architectures. To the best of our knowledge,
MMS is the first sorting algorithm for the GPU that is asymptotically optimal
in terms of global memory accesses and that is completely free of shared memory
bank conflicts.
We realize an initial implementation of MMS, evaluate its performance on
three modern GPU architectures, and compare it to competitive implementations
available in state-of-the-art GPU libraries. Despite these implementations
being highly optimized, MMS compares favorably, achieving performance
improvements for most random inputs. Furthermore, unlike MMS, state-of-the-art
algorithms are susceptible to bank conflicts. We find that for certain inputs
that cause these algorithms to incur large numbers of bank conflicts, MMS can
achieve up to a 37.6% speedup over its fastest competitor. Overall, even though
its current implementation is not fully optimized, due to its efficient use of
the memory hierarchy, MMS outperforms the fastest comparison-based sorting
implementations available to date.
On the average running time of odd-even merge sort
This paper is concerned with the average running time of Batcher's odd-even
merge sort when implemented on a collection of p processors. We consider the
case where n, the size of the input, is an arbitrary multiple of the number p
of processors used. We show that Batcher's odd-even merge (for two sorted
lists of length n each) can be implemented to run in time O((n/p)(log(2 +
p^2/n))) on the average, and that odd-even merge sort can be implemented to
run in time O((n/p)(log n + log p * log(2 + p^2/n))) on the average. In the
case of merging (sorting), the average is taken over all possible outcomes of
the merging (all possible permutations of n elements). That means that
odd-even merge and odd-even merge sort have an optimal average running time
if n >= p^2. The constants involved are also quite small.
Improved Average Complexity for Comparison-Based Sorting
This paper studies the average complexity on the number of comparisons for
sorting algorithms. Its information-theoretic lower bound is n lg n - 1.4427n
+ O(log n). For many efficient algorithms, the first n lg n term is easy to
achieve and our focus is on the (negative) constant factor of the linear term.
The current best value is -1.3999 for the MergeInsertion sort. Our new value
is -1.4106, narrowing the gap by some 25%. An important building block of our
algorithm is "two-element insertion," which inserts two numbers x and y,
x < y, into a sorted sequence T. This insertion algorithm is still
sufficiently simple for rigorous mathematical analysis and works well for a
certain range of the length of T for which the simple binary insertion does
not, thus allowing us to take a complementary approach with the binary
insertion.
Comment: 21 pages, 2 figures