143 research outputs found
Engineering Parallel String Sorting
We discuss how string sorting algorithms can be parallelized on modern
multi-core shared memory machines. As a synthesis of the best sequential string
sorting algorithms and successful parallel sorting algorithms for atomic
objects, we first propose string sample sort. The algorithm makes effective use
of the memory hierarchy, uses additional word level parallelism, and largely
avoids branch mispredictions. Then we focus on NUMA architectures, and develop
parallel multiway LCP-merge and -mergesort to reduce the number of random
memory accesses to remote nodes. Additionally, we parallelize variants of
multikey quicksort and radix sort that are also useful in certain situations.
Comprehensive experiments on five current multi-core platforms are then
reported and discussed. The experiments show that our implementations scale
very well on real-world inputs and modern machines.Comment: 46 pages, extension of "Parallel String Sample Sort" arXiv:1305.115
Scalable String and Suffix Sorting: Algorithms, Techniques, and Tools
This dissertation focuses on two fundamental sorting problems: string sorting
and suffix sorting. The first part considers parallel string sorting on
shared-memory multi-core machines, the second part external memory suffix
sorting using the induced sorting principle, and the third part distributed
external memory suffix sorting with a new distributed algorithmic big data
framework named Thrill.
Comment: 396 pages, dissertation, Karlsruher Institut für Technologie (2018). arXiv admin note: text overlap with arXiv:1101.3448 by other authors
Scalable Construction of Text Indexes with Thrill
The suffix array is the key to efficient solutions for a myriad of string processing problems in different application domains, like data compression, data mining, or bioinformatics. With the rapid growth of available data, suffix array construction algorithms have to be adapted to advanced computational models such as external memory and distributed computing. In this article, we present five suffix array construction algorithms utilizing the new algorithmic big data batch processing framework Thrill, which allows scalable processing on distributed systems of input sizes orders of magnitude larger than have been considered before.
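To make concrete what these construction algorithms compute, a definition-level suffix array routine is sketched below in C++. It simply sorts suffix start positions with library string comparison, so it is quadratic in the worst case; it is not one of the article's five scalable Thrill-based algorithms.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Definition-level suffix array construction: sort all suffix start positions
// of `text` lexicographically. Suitable only for tiny inputs; shown to
// illustrate what the scalable algorithms in the article compute.
std::vector<std::size_t> naive_suffix_array(const std::string& text) {
    std::vector<std::size_t> sa(text.size());
    std::iota(sa.begin(), sa.end(), std::size_t{0});   // positions 0..n-1
    std::sort(sa.begin(), sa.end(), [&](std::size_t a, std::size_t b) {
        // Compare the suffix starting at a with the suffix starting at b.
        return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
    });
    return sa;
}

// Usage: for "banana", naive_suffix_array returns {5, 3, 1, 0, 4, 2}.
```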
Engineering Faster Sorters for Small Sets of Items
Sorting a set of items is a task that can be useful by itself or as a
building block for more complex operations. That is why a lot of effort has
been put into finding sorting algorithms that sort large sets as fast as
possible. But the more sophisticated and complex the algorithms become, the
less efficient they are for small sets of items due to large constant factors.
We aim to determine if there is a faster way than insertion sort to sort small
sets of items to provide a more efficient base case sorter. We looked at
sorting networks, at how they can improve the speed of sorting few elements,
and how to implement them in an efficient manner by using conditional moves.
Since sorting networks need to be implemented explicitly for each set size,
providing networks for larger sizes becomes less efficient due to increased
code sizes. To also enable the sorting of slightly larger base cases, we
adapted sample sort to Register Sample Sort, to break down those larger sets
into sizes that can in turn be sorted by sorting networks. From our experiments
we found that when sorting only small sets, the sorting networks outperform
insertion sort by a factor of at least 1.76 for any array size between six and
sixteen, and by a factor of 2.72 on average across all machines and array
sizes. When integrating sorting networks as a base case sorter into Quicksort,
we achieved far smaller performance improvements, which is probably due to the
networks having a larger code size and cluttering the L1 instruction cache. But
for x86 machines with a larger L1 instruction cache of 64 KiB or more, we
obtained speedups of 12.7% when using sorting networks as a base case sorter in
std::sort. In conclusion, the desired improvement in speed could only be
achieved under special circumstances, but the results clearly show the
potential of using conditional moves in the field of sorting algorithms.
Comment: arXiv admin note: substantial text overlap with arXiv:1908.0811
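The conditional-move idea at the core of this result can be sketched with a branchless compare-exchange and a small network, as below in C++. This is a minimal sketch assuming 32-bit integer keys; the paper's generated networks cover larger sizes and are tuned per machine, so this is illustrative rather than the authors' code.

```cpp
#include <algorithm>
#include <cstdint>

// Branchless compare-exchange: for integer keys, optimizing compilers typically
// lower std::min/std::max to conditional moves rather than branches, which is
// the effect exploited when implementing sorting networks.
static inline void cmp_exchange(std::uint32_t& a, std::uint32_t& b) {
    const std::uint32_t lo = std::min(a, b);
    const std::uint32_t hi = std::max(a, b);
    a = lo;
    b = hi;
}

// Optimal five-comparator sorting network for four elements.
static inline void sort4(std::uint32_t v[4]) {
    cmp_exchange(v[0], v[1]);
    cmp_exchange(v[2], v[3]);
    cmp_exchange(v[0], v[2]);
    cmp_exchange(v[1], v[3]);
    cmp_exchange(v[1], v[2]);
}
```

Because every comparator executes unconditionally, the instruction stream is branch-free and identical for every input, which is what makes such networks attractive as base case sorters.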
Engineering faster sorters for small sets of items
Sorting a set of items is a task that can be useful by itself or as a building block for more complex operations. That is why a lot of effort has been put into finding sorting algorithms that sort large sets as efficiently as possible. But the more sophisticated and complex the algorithms become, the less efficient they are for small sets of items due to large constant factors. A relatively simple sorting algorithm that is often used as a base case sorter is insertion sort, because it has small code size and small constant factors influencing its execution time. We aim to determine if there is a faster way to sort small sets of items to provide an efficient base case sorter. We looked at sorting networks, at how they can improve the speed of sorting few elements, and how to implement them in an efficient manner using conditional moves. Since sorting networks need to be implemented explicitly for each set size, providing networks for larger sizes becomes less efficient due to increased code sizes. To also enable the sorting of slightly larger base cases, we adapted sample sort to Register Sample Sort, to break down those larger sets into sizes that can in turn be sorted by sorting networks. From our experiments we found that when sorting only small sets of integers, the sorting networks outperform insertion sort by a factor of at least 1.76 for any array size between six and 16, and by a factor of 2.72 on average across all machines and array sizes. When integrating sorting networks as a base case sorter into Quicksort, we achieved far smaller performance improvements over using insertion sort, which is probably due to the networks having a larger code size and cluttering the L1 instruction cache. The same effect occurs when including Register Sample Sort as a base case sorter for IPS4o. But for x86 machines that have a larger L1 instruction cache of 64 KiB or more, we obtained speedups of 12.7% when using sorting networks as a base case sorter in std::sort, and of 5%–6% when integrating Register Sample Sort as a base case sorter into IPS4o, each in comparison to using insertion sort as the base case sorter. In conclusion, the desired improvement in speed could only be achieved under special circumstances, but the results clearly show the potential of using conditional moves in the field of sorting algorithms.
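The Register Sample Sort step mentioned here can be sketched roughly as follows in C++. The splitter count of three, the sample size, and the use of std::sort as the stand-in base case sorter are illustrative assumptions, not the authors' implementation, which keeps splitters in registers and hands buckets to sorting networks.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Rough sketch of the Register Sample Sort idea: keep a few splitters in local
// variables (registers), classify every element with branchless comparisons
// into one of four buckets, then finish each small bucket with a base case
// sorter. std::sort stands in for the sorting networks used in the paper.
void register_sample_sort_sketch(std::vector<std::uint32_t>& v) {
    if (v.size() < 8) { std::sort(v.begin(), v.end()); return; }

    // Take a small sorted sample and pick three splitters from it.
    std::vector<std::uint32_t> sample(v.begin(), v.begin() + 8);
    std::sort(sample.begin(), sample.end());
    const std::uint32_t s1 = sample[2], s2 = sample[4], s3 = sample[6];

    // The bucket index is a sum of 0/1 comparison results, so classification
    // needs no unpredictable branches.
    std::vector<std::uint32_t> bucket[4];
    for (std::uint32_t x : v)
        bucket[(x > s1) + (x > s2) + (x > s3)].push_back(x);

    // Sort each bucket with the base case sorter and concatenate the results.
    v.clear();
    for (auto& b : bucket) {
        std::sort(b.begin(), b.end());
        v.insert(v.end(), b.begin(), b.end());
    }
}
```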