Engineering Faster Sorters for Small Sets of Items

Author: Bingmann Timo
Marianczuk Jasper
Sanders Peter
Publication date: 02/10/2020

Sorting a set of items is a task that can be useful by itself or as a building block for more complex operations. That is why a lot of effort has been put into finding sorting algorithms that sort large sets as fast as possible. But the more sophisticated and complex the algorithms become, the less efficient they are for small sets of items due to large constant factors. We aim to determine if there is a faster way than insertion sort to sort small sets of items to provide a more efficient base case sorter. We looked at sorting networks, at how they can improve the speed of sorting a few elements, and at how to implement them efficiently using conditional moves. Since sorting networks need to be implemented explicitly for each set size, providing networks for larger sizes becomes less efficient due to increased code size. To also enable the sorting of slightly larger base cases, we adapted sample sort to Register Sample Sort, which breaks those larger sets down into sizes that can in turn be sorted by sorting networks. In our experiments, when sorting only small sets, the sorting networks outperform insertion sort by a factor of at least 1.76 for any array size between six and sixteen, and by a factor of 2.72 on average across all machines and array sizes. When integrating sorting networks as a base case sorter into Quicksort, we achieved far smaller performance improvements, probably because the networks have a larger code size and clutter the L1 instruction cache. But for x86 machines with a larger L1 instruction cache of 64 KiB or more, we obtained speedups of 12.7% when using sorting networks as a base case sorter in std::sort. In conclusion, the desired improvement in speed could only be achieved under special circumstances, but the results clearly show the potential of using conditional moves in the field of sorting algorithms.

Comment: arXiv admin note: substantial text overlap with arXiv:1908.0811
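To make the sorting-network idea in this abstract concrete, here is a minimal sketch for four items. It is not the authors' implementation: the names `NETWORK_4`, `compare_exchange`, and `sort4` are hypothetical, and the comparator list is the standard five-comparator network for four inputs. Python's `min`/`max` only mimic the data-independent compare-exchange step; the paper's speedups depend on compiled code that uses conditional-move instructions.

```python
from itertools import permutations

# Standard 5-comparator sorting network for 4 inputs (illustrative choice).
NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]

def compare_exchange(a, i, j):
    """Place the smaller of a[i], a[j] at index i and the larger at index j.

    The same two writes happen regardless of the data, which is the property
    that lets compiled implementations use conditional moves instead of branches.
    """
    lo, hi = min(a[i], a[j]), max(a[i], a[j])
    a[i], a[j] = lo, hi

def sort4(items):
    """Sort exactly four items by applying the fixed comparator sequence."""
    a = list(items)
    for i, j in NETWORK_4:
        compare_exchange(a, i, j)
    return a

# Exhaustive check over all 24 input orders of four distinct keys.
all_sorted = all(sort4(p) == [0, 1, 2, 3] for p in permutations(range(4)))
```

Because the comparator sequence is fixed, a separate network is needed for each input size, which is the code-size cost the abstract mentions.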

arXiv.org e-Print Archive

KITopen

QuickXsort: Efficient Sorting with n log n - 1.399n + o(n) Comparisons on Average

Author: A. Elmasry
C. Martínez
D. Cantone
D.R. Musser
I. Wegener
J. Ford
J. Katajainen
K. Reinhardt
M. Blum
R.D. Dutton
S. Edelkamp
V. Diekert
Publication date: 11/07/2013

In this paper we generalize the idea of QuickHeapsort leading to the notion of QuickXsort. Given some external sorting algorithm X, QuickXsort yields an internal sorting algorithm if X satisfies certain natural conditions. With QuickWeakHeapsort and QuickMergesort we present two examples for the QuickXsort-construction. Both are efficient algorithms that incur approximately n log n - 1.26n + o(n) comparisons on the average. A worst case of n log n + O(n) comparisons can be achieved without significantly affecting the average case. Furthermore, we describe an implementation of MergeInsertion for small n. Taking MergeInsertion as a base case for QuickMergesort, we establish a worst-case efficient sorting algorithm calling for n log n - 1.3999n + o(n) comparisons on average. QuickMergesort with constant size base cases shows the best performance on practical inputs: when sorting integers it is only 15% slower than STL-Introsort.

arXiv.org e-Print Archive

Crossref

An analytical approach to sorting in periodic potentials

Author: A. M. Lacasta
H. Risken
J. M. Sancho
James P. Gleeson
Katja Lindenberg
Publication venue: 'American Physical Society (APS)'
Publication date: 28/12/2005

There has been a recent revolution in the ability to manipulate micrometer-sized objects on surfaces patterned by traps or obstacles of controllable configurations and shapes. One application of this technology is to separate particles driven across such a surface by an external force according to some particle characteristic such as size or index of refraction. The surface features cause the trajectories of particles driven across the surface to deviate from the direction of the force by an amount that depends on the particular characteristic, thus leading to sorting. While models of this behavior have provided a good understanding of these observations, the solutions have so far been primarily numerical. In this paper we provide analytic predictions for the dependence of the angle between the direction of motion and the external force on a number of model parameters for periodic as well as random surfaces. We test these predictions against exact numerical simulations.

arXiv.org e-Print Archive

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Diposit Digital de la Universitat de Barcelona

An Efficient Multiway Mergesort for GPU Architectures

Author: Casanova Henri
Iacono John
Karsin Ben
Sitchinava Nodari
Weichert Volker
Publication date: 01/01/2017

Sorting is a primitive operation that is a building block for countless algorithms. As such, it is important to design sorting algorithms that approach peak performance on a range of hardware architectures. Graphics Processing Units (GPUs) are particularly attractive architectures as they provide massive parallelism and computing power. However, the intricacies of their compute and memory hierarchies make designing GPU-efficient algorithms challenging. In this work we present GPU Multiway Mergesort (MMS), a new GPU-efficient multiway mergesort algorithm. MMS employs a new partitioning technique that exposes the parallelism needed by modern GPU architectures. To the best of our knowledge, MMS is the first sorting algorithm for the GPU that is asymptotically optimal in terms of global memory accesses and that is completely free of shared memory bank conflicts. We realize an initial implementation of MMS, evaluate its performance on three modern GPU architectures, and compare it to competitive implementations available in state-of-the-art GPU libraries. Despite these implementations being highly optimized, MMS compares favorably, achieving performance improvements for most random inputs. Furthermore, unlike MMS, state-of-the-art algorithms are susceptible to bank conflicts. We find that for certain inputs that cause these algorithms to incur large numbers of bank conflicts, MMS can achieve up to a 37.6% speedup over its fastest competitor. Overall, even though its current implementation is not fully optimized, due to its efficient use of the memory hierarchy, MMS outperforms the fastest comparison-based sorting implementations available to date.
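For readers unfamiliar with multiway merging, the sketch below shows only the general idea on a CPU: split the input into k runs, sort each run recursively, and merge all k runs in one pass. It is not the MMS algorithm and models none of its GPU partitioning or bank-conflict avoidance. The function name `multiway_mergesort` and the parameters `k` and `base` are hypothetical; the k-way merge step uses the standard library's `heapq.merge`.

```python
import heapq

def multiway_mergesort(items, k=4, base=8):
    """Sequential k-way mergesort sketch (requires k >= 2).

    Inputs of at most `base` items are handed to the built-in sort;
    larger inputs are split into at most k runs of near-equal length.
    """
    items = list(items)
    if len(items) <= base:
        return sorted(items)
    chunk = -(-len(items) // k)  # ceiling division: length of each run
    runs = [
        multiway_mergesort(items[start:start + chunk], k, base)
        for start in range(0, len(items), chunk)
    ]
    # heapq.merge lazily merges any number of already-sorted iterables.
    return list(heapq.merge(*runs))

# Deterministic sample input with repeated keys.
sample = [(i * 37) % 101 for i in range(200)]
result = multiway_mergesort(sample)
```

A larger k reduces the number of passes over the data at the price of a more expensive merge step, which is the trade-off a memory-access-optimal design has to balance.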

arXiv.org e-Print Archive

DI-fusion

On the average running time of odd-even merge sort

Author: Rüb C.
Publication venue: Max-Planck-Institut für Informatik
Publication date: 01/01/1995

This paper is concerned with the average running time of Batcher's odd-even merge sort when implemented on a collection of processors. We consider the case where $n$, the size of the input, is an arbitrary multiple of the number $p$ of processors used. We show that Batcher's odd-even merge (for two sorted lists of length $n$ each) can be implemented to run in time $O((n/p)(\log (2+p^2/n)))$ on the average, and that odd-even merge sort can be implemented to run in time $O((n/p)(\log n+\log p\log (2+p^2/n)))$ on the average. In the case of merging (sorting), the average is taken over all possible outcomes of the merging (all possible permutations of $n$ elements). That means that odd-even merge and odd-even merge sort have an optimal average running time if $n\geq p^2$. The constants involved are also quite small.
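As background for this abstract, the following sketch generates the comparator sequence of Batcher's odd-even merge sort for an input size that is a power of two and applies it sequentially. It illustrates the network only, not the multiprocessor implementation or the average-case analysis of the paper. The function names are hypothetical, and the index ranges use inclusive endpoints.

```python
from itertools import product

def oddeven_merge(lo, hi, r):
    """Comparators that merge the two sorted halves of indices lo..hi (inclusive)."""
    step = r * 2
    if step < hi - lo:
        yield from oddeven_merge(lo, hi, step)      # merge even subsequence
        yield from oddeven_merge(lo + r, hi, step)  # merge odd subsequence
        yield from ((i, i + r) for i in range(lo + r, hi - r, step))
    else:
        yield (lo, lo + r)

def oddeven_merge_sort(lo, hi):
    """Comparators that sort indices lo..hi (inclusive); the length must be a power of two."""
    if hi - lo >= 1:
        mid = lo + (hi - lo) // 2
        yield from oddeven_merge_sort(lo, mid)
        yield from oddeven_merge_sort(mid + 1, hi)
        yield from oddeven_merge(lo, hi, 1)

def apply_network(comparators, items):
    """Run the compare-exchange steps one after another on a copy of items."""
    a = list(items)
    for i, j in comparators:
        if a[i] > a[j]:
            a[i], a[j] = a[j], a[i]
    return a

comparators_8 = list(oddeven_merge_sort(0, 7))

# Zero-one principle: a network that sorts every 0/1 input sorts every input.
zero_one_ok = all(
    apply_network(comparators_8, bits) == sorted(bits)
    for bits in product((0, 1), repeat=8)
)
```

The comparator positions never depend on the data, so comparators that touch disjoint indices can run at the same time on different processors.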

MPG.PuRe

Improved Average Complexity for Comparison-Based Sorting

Author: DE Knuth
FK Hwang
GK Manacher
J Schulte
LR Ford
M Ayala-Rincón
M Peczarski
M Thanh
Publication date: 02/05/2017

This paper studies the average complexity on the number of comparisons for sorting algorithms. Its information-theoretic lower bound is $n \lg n - 1.4427n + O(\log n)$. For many efficient algorithms, the first $n\lg n$ term is easy to achieve and our focus is on the (negative) constant factor of the linear term. The current best value is $-1.3999$ for the MergeInsertion sort. Our new value is $-1.4106$, narrowing the gap by some $25\%$. An important building block of our algorithm is "two-element insertion," which inserts two numbers $A$ and $B$, $A<B$, into a sorted sequence $T$. This insertion algorithm is still sufficiently simple for rigorous mathematical analysis and works well for a certain range of the length of $T$ for which the simple binary insertion does not, thus allowing us to take a complementary approach with the binary insertion.

Comment: 21 pages, 2 figures
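The numbers in this abstract can be checked with a short calculation. The sketch below computes the information-theoretic bound $\lg(n!)$, compares it with the leading terms $n \lg n - 1.4427n$, and reproduces the "some 25%" figure from the three quoted constants. The function names and the choice $n = 1000$ are illustrative assumptions, not taken from the paper.

```python
import math

def lower_bound(n):
    """lg(n!): minimum average number of comparisons for sorting n distinct keys."""
    return math.lgamma(n + 1) / math.log(2)

def leading_terms(n, c=1.4427):
    """The n lg n - c*n part of the bound, without the O(log n) term."""
    return n * math.log2(n) - c * n

n = 1000
exact = lower_bound(n)
approx = leading_terms(n)
residual = exact - approx  # the O(log n) remainder; small compared with n

# Share of the gap between the old constant and the lower-bound constant
# that is closed by the new constant quoted in the abstract.
old_c, new_c, bound_c = 1.3999, 1.4106, 1.4427
gap_closed = (new_c - old_c) / (bound_c - old_c)
```

Here `gap_closed` evaluates to roughly 0.25, since $0.0107 / 0.0428 = 0.25$, which matches the abstract's claim.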

arXiv.org e-Print Archive

Crossref