Highly scalable parallel sorting
Abstract: Sorting is a commonly used process with a wide breadth of applications in the high performance computing field. Early research in parallel processing has provided us with comprehensive analysis and theory for parallel sorting algorithms. However, modern supercomputers have advanced rapidly in size and changed significantly in architecture, forcing new adaptations to these algorithms. Fully utilizing the potential of highly parallel machines requires tens of thousands of processors. Efficiently scaling parallel sorting on machines of this magnitude is inhibited by the communication-intensive problem of migrating large amounts of data between processors. The challenge is to design a highly scalable sorting algorithm that uses minimal communication, maximizes overlap between computation and communication, and uses memory efficiently. This paper presents a scalable extension of the Histogram Sorting method, making fundamental modifications to the original algorithm in order to minimize message contention and exploit overlap. We implement Histogram Sort, Sample Sort, and Radix Sort in CHARM++ and compare their performance. The choice of algorithm as well as the importance of the optimizations is validated by performance tests on two predominant modern supercomputer architectures: XT4 at ORNL (Jaguar) and Blue Gene/P at ANL (Intrepid).
Twenty-Five Comparators is Optimal when Sorting Nine Inputs (and Twenty-Nine for Ten)
This paper describes a computer-assisted non-existence proof of nine-input
sorting networks consisting of 24 comparators, hence showing that the
25-comparator sorting network found by Floyd in 1964 is optimal. As a
corollary, we obtain that the 29-comparator network found by Waksman in 1969 is
optimal when sorting ten inputs.
This closes the two smallest open instances of the optimal size sorting
network problem, which have been open since the results of Floyd and Knuth from
1966 proving optimality for sorting networks of up to eight inputs.
The proof involves a combination of two methodologies: one based on
exploiting the abundance of symmetries in sorting networks, and the other,
based on an encoding of the problem to that of satisfiability of propositional
logic. We illustrate that, while each of these can single-handedly solve smaller
instances of the problem, it is their combination which leads to an efficient
solution for nine inputs.
Comment: 18 pages
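
For readers unfamiliar with the objects involved, the sketch below shows what a comparator network is and how one can check by brute force that it sorts. It is not the paper's proof technique (which combines symmetry exploitation with a propositional satisfiability encoding); it only uses the well-known zero-one principle, under which a network sorts all inputs if and only if it sorts every input of 0s and 1s. The function names and the five-comparator, four-input example network are illustrative choices, not taken from the paper.

```python
from itertools import product

def apply_network(network, values):
    """Run a comparator network: each (i, j) with i < j puts the
    smaller of the two values on wire i and the larger on wire j."""
    out = list(values)
    for i, j in network:
        if out[i] > out[j]:
            out[i], out[j] = out[j], out[i]
    return out

def is_sorting_network(n, network):
    """Zero-one principle: checking all 2**n binary inputs suffices."""
    return all(
        apply_network(network, bits) == sorted(bits)
        for bits in product((0, 1), repeat=n)
    )

# A standard five-comparator network on four inputs (illustrative).
FOUR_INPUT = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]
```

Calling `is_sorting_network(4, FOUR_INPUT)` exercises all 16 binary inputs. The same exhaustive check over inputs grows as 2^n, and the number of candidate networks grows far faster, which is why optimality results for nine and ten inputs need the more sophisticated methods the abstract describes.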