143 research outputs found

    Engineering Parallel String Sorting

    We discuss how string sorting algorithms can be parallelized on modern multi-core shared memory machines. As a synthesis of the best sequential string sorting algorithms and successful parallel sorting algorithms for atomic objects, we first propose string sample sort. The algorithm makes effective use of the memory hierarchy, uses additional word-level parallelism, and largely avoids branch mispredictions. Then we focus on NUMA architectures, and develop parallel multiway LCP-merge and -mergesort to reduce the number of random memory accesses to remote nodes. Additionally, we parallelize variants of multikey quicksort and radix sort that are also useful in certain situations. Comprehensive experiments on five current multi-core platforms are then reported and discussed. The experiments show that our implementations scale very well on real-world inputs and modern machines. Comment: 46 pages, extension of "Parallel String Sample Sort" arXiv:1305.115
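    The abstract describes the algorithms only at a high level. As a point of reference, a minimal sequential multikey (ternary) quicksort, the classical string sorter that the parallelized variants above build on, might look like the following C++ sketch. This is an illustrative baseline under our own assumptions (function name multikey_qsort, toy input), not the authors' string sample sort or LCP-mergesort implementation.

        // Minimal sequential multikey (ternary) quicksort sketch: sorts a range of
        // strings by the character at position 'depth', recursing one character
        // deeper only into the class of strings whose character equals the pivot.
        // Illustrative only; not the parallel implementation from the paper.
        #include <algorithm>
        #include <cstddef>
        #include <iostream>
        #include <string>
        #include <vector>

        static void multikey_qsort(std::vector<std::string>& strs,
                                   std::size_t lo, std::size_t hi, std::size_t depth) {
            if (hi - lo <= 1) return;
            // Character at 'depth' of string i; -1 marks "past end of string".
            auto char_at = [&](std::size_t i) -> int {
                return depth < strs[i].size()
                           ? static_cast<unsigned char>(strs[i][depth]) : -1;
            };
            int pivot = char_at(lo + (hi - lo) / 2);
            // Three-way partition on that single character (Dijkstra-style).
            std::size_t lt = lo, gt = hi, i = lo;
            while (i < gt) {
                int c = char_at(i);
                if (c < pivot)      { std::swap(strs[i], strs[lt]); ++i; ++lt; }
                else if (c > pivot) { --gt; std::swap(strs[i], strs[gt]); }
                else                { ++i; }
            }
            multikey_qsort(strs, lo, lt, depth);                      // smaller character
            if (pivot >= 0) multikey_qsort(strs, lt, gt, depth + 1);  // equal: go one char deeper
            multikey_qsort(strs, gt, hi, depth);                      // larger character
        }

        int main() {
            std::vector<std::string> strs = {"banana", "band", "bandana", "apple", "ape"};
            multikey_qsort(strs, 0, strs.size(), 0);
            for (const auto& s : strs) std::cout << s << '\n';  // ape apple banana band bandana
        }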

    Scalable String and Suffix Sorting: Algorithms, Techniques, and Tools

    This dissertation focuses on two fundamental sorting problems: string sorting and suffix sorting. The first part considers parallel string sorting on shared-memory multi-core machines, the second part external memory suffix sorting using the induced sorting principle, and the third part distributed external memory suffix sorting with a new distributed algorithmic big data framework named Thrill. Comment: 396 pages, dissertation, Karlsruher Institut für Technologie (2018). arXiv admin note: text overlap with arXiv:1101.3448 by other authors

    Scalable Construction of Text Indexes with Thrill

    The suffix array is the key to efficient solutions for a myriad of string processing problems in different application domains, like data compression, data mining, or bioinformatics. With the rapid growth of available data, suffix array construction algorithms have to be adapted to advanced computational models such as external memory and distributed computing. In this article, we present five suffix array construction algorithms utilizing the new algorithmic big data batch processing framework Thrill, which enables scalable processing on distributed systems of input sizes orders of magnitude larger than have been considered before.
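    For readers unfamiliar with the structure itself: the suffix array of a text is simply the array of suffix start positions in lexicographic order. The following deliberately naive C++ sketch (our own helper name naive_suffix_array, O(n² log n) comparison sort) only illustrates the definition; the Thrill-based algorithms in the article exist precisely because this approach does not scale.

        // Naive suffix array construction: sort all suffix start positions by
        // comparing the suffixes directly. For illustration only.
        #include <algorithm>
        #include <iostream>
        #include <numeric>
        #include <string>
        #include <vector>

        std::vector<std::size_t> naive_suffix_array(const std::string& text) {
            std::vector<std::size_t> sa(text.size());
            std::iota(sa.begin(), sa.end(), 0);  // start positions 0..n-1
            std::sort(sa.begin(), sa.end(), [&](std::size_t a, std::size_t b) {
                // Compare the suffixes starting at a and b lexicographically.
                return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
            });
            return sa;
        }

        int main() {
            std::string text = "banana";
            for (std::size_t p : naive_suffix_array(text))
                std::cout << p << ": " << text.substr(p) << '\n';
            // 5: a, 3: ana, 1: anana, 0: banana, 4: na, 2: nana
        }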

    Engineering Faster Sorters for Small Sets of Items

    Sorting a set of items is a task that can be useful by itself or as a building block for more complex operations. That is why a lot of effort has been put into finding sorting algorithms that sort large sets as fast as possible. But the more sophisticated and complex the algorithms become, the less efficient they are for small sets of items due to large constant factors. We aim to determine whether there is a faster way than insertion sort to sort small sets of items, in order to provide a more efficient base case sorter. We looked at sorting networks, at how they can improve the speed of sorting a few elements, and at how to implement them efficiently using conditional moves. Since sorting networks need to be implemented explicitly for each set size, providing networks for larger sizes becomes less efficient due to increased code size. To also enable the sorting of slightly larger base cases, we adapted sample sort into Register Sample Sort to break down those larger sets into sizes that can in turn be sorted by sorting networks. From our experiments we found that when sorting only small sets, the sorting networks outperform insertion sort by a factor of at least 1.76 for any array size between six and sixteen, and by a factor of 2.72 on average across all machines and array sizes. When integrating sorting networks as a base case sorter into Quicksort, we achieved far smaller performance improvements, probably because the networks have a larger code size and clutter the L1 instruction cache. But for x86 machines with a larger L1 instruction cache of 64 KiB or more, we obtained speedups of 12.7% when using sorting networks as a base case sorter in std::sort. In conclusion, the desired improvement in speed could only be achieved under special circumstances, but the results clearly show the potential of using conditional moves in the field of sorting algorithms. Comment: arXiv admin note: substantial text overlap with arXiv:1908.0811
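    The abstract's key technique, implementing sorting networks with conditional moves instead of branches, can be illustrated by a minimal hand-written C++ sketch: a branch-free compare-exchange wired into the optimal 5-comparator network for four elements. This assumes the compiler lowers the ternary operators to cmov-style instructions; it is not the generated code measured in the paper.

        // Branch-free compare-exchange: after the call, a <= b. The ternaries
        // are typically compiled to conditional moves rather than branches.
        #include <array>
        #include <iostream>

        inline void cmp_swap(int& a, int& b) {
            int lo = b < a ? b : a;
            int hi = b < a ? a : b;
            a = lo;
            b = hi;
        }

        // Optimal 5-comparator sorting network for 4 elements.
        inline void sort4(std::array<int, 4>& v) {
            cmp_swap(v[0], v[1]);
            cmp_swap(v[2], v[3]);
            cmp_swap(v[0], v[2]);
            cmp_swap(v[1], v[3]);
            cmp_swap(v[1], v[2]);
        }

        int main() {
            std::array<int, 4> v = {42, 7, 19, 3};
            sort4(v);
            for (int x : v) std::cout << x << ' ';  // 3 7 19 42
            std::cout << '\n';
        }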

    Engineering faster sorters for small sets of items

    Sorting a set of items is a task that can be useful by itself or as a building block for more complex operations. That is why a lot of effort has been put into finding sorting algorithms that sort large sets as efficiently as possible. But the more sophisticated and complex the algorithms become, the less efficient they are for small sets of items due to large constant factors. A relatively simple sorting algorithm that is often used as a base case sorter is insertion sort, because it has a small code size and small constant factors influencing its execution time. We aim to determine whether there is a faster way to sort small sets of items, in order to provide an efficient base case sorter. We looked at sorting networks, at how they can improve the speed of sorting a few elements, and at how to implement them efficiently using conditional moves. Since sorting networks need to be implemented explicitly for each set size, providing networks for larger sizes becomes less efficient due to increased code size. To also enable the sorting of slightly larger base cases, we adapted sample sort into Register Sample Sort to break down those larger sets into sizes that can in turn be sorted by sorting networks. From our experiments we found that when sorting only small sets of integers, the sorting networks outperform insertion sort by a factor of at least 1.76 for any array size between six and sixteen, and by a factor of 2.72 on average across all machines and array sizes. When integrating sorting networks as a base case sorter into Quicksort, we achieved far smaller performance improvements over using insertion sort, probably because the networks have a larger code size and clutter the L1 instruction cache. The same effect occurs when including Register Sample Sort as a base case sorter in IPS4o. But for x86 machines that have a larger L1 instruction cache of 64 KiB or more, we obtained speedups of 12.7% when using sorting networks as a base case sorter in std::sort, and of 5%–6% when integrating Register Sample Sort as a base case sorter into IPS4o, each in comparison to using insertion sort as the base case sorter. In conclusion, the desired improvement in speed could only be achieved under special circumstances, but the results clearly show the potential of using conditional moves in the field of sorting algorithms.
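    The sample-sort idea behind Register Sample Sort can be sketched loosely in C++: classify elements against a few splitters, scatter them into buckets, then finish each small bucket with a base case sorter. The names (mini_sample_sort), the splitter choice, and the std::sort stand-in for the base case sorter are our assumptions; the paper's version keeps the splitters in registers and classifies branchlessly before handing buckets to sorting networks.

        // Loose sample-sort sketch: three splitters -> four buckets, each bucket
        // finished by a stand-in base case sorter. Illustrative only.
        #include <algorithm>
        #include <array>
        #include <iostream>
        #include <vector>

        void mini_sample_sort(std::vector<int>& v) {
            if (v.size() <= 16) { std::sort(v.begin(), v.end()); return; }

            // Crude splitter choice; real implementations sample and sort a
            // larger candidate set to pick splitters.
            std::array<int, 3> splitters = {v[v.size() / 4], v[v.size() / 2], v[3 * v.size() / 4]};
            std::sort(splitters.begin(), splitters.end());

            std::array<std::vector<int>, 4> buckets;
            for (int x : v) {
                // Branch-light classification: count how many splitters x exceeds.
                int b = (x > splitters[0]) + (x > splitters[1]) + (x > splitters[2]);
                buckets[b].push_back(x);
            }

            v.clear();
            for (auto& bucket : buckets) {
                std::sort(bucket.begin(), bucket.end());  // base case sorter stand-in
                v.insert(v.end(), bucket.begin(), bucket.end());
            }
        }

        int main() {
            std::vector<int> v = {9, 1, 8, 2, 7, 3, 6, 4, 5, 0, 15, 11, 14, 12, 13, 10, 20, 17};
            mini_sample_sort(v);
            for (int x : v) std::cout << x << ' ';  // 0 1 2 ... 17 20
            std::cout << '\n';
        }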

    Practical Massively Parallel Sorting
