Parallel String Sample Sort

Author: J. Kärkkäinen
J. Wassenberg
K. Mehlhorn
P. Sanders
P.M. McIlroy
R. Sinha
T. Hagerup
W. Ng
Publication date: 01/01/2013

arXiv.org e-Print Archive

CiteSeerX

Crossref

KITopen

A Memory Bandwidth-Efficient Hybrid Radix Sort on GPUs

Author: Cisco
Davidson A.
Dehne F.
Harris M.
Kipfer P.
Krueger J.
Merrill D.
Wassenberg J.
Ye X.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 19/05/2017

Sorting is at the core of many database operations, such as index creation, sort-merge joins, and user-requested output sorting. As GPUs are emerging as a promising platform to accelerate various operations, sorting on GPUs becomes a viable endeavour. Over the past few years, several improvements have been proposed for sorting on GPUs, leading to the first radix sort implementations that achieve a sorting rate of over one billion 32-bit keys per second. Yet, state-of-the-art approaches are heavily memory bandwidth-bound, as they require substantially more memory transfers than their CPU-based counterparts. Our work proposes a novel approach that almost halves the amount of memory transfers and, therefore, considerably lifts the memory bandwidth limitation. Being able to sort two gigabytes of eight-byte records in as little as 50 milliseconds, our approach achieves a 2.32-fold improvement over the state-of-the-art GPU-based radix sort for uniform distributions, sustaining a minimum speed-up of no less than a factor of 1.66 for skewed distributions. To address inputs that either do not reside on the GPU or exceed the available device memory, we build on our efficient GPU sorting approach with a pipelined heterogeneous sorting algorithm that mitigates the overhead associated with PCIe data transfers. Comparing the end-to-end sorting performance to the state-of-the-art CPU-based radix sort running 16 threads, our heterogeneous approach achieves a 2.06-fold and a 1.53-fold improvement for sorting 64 GB key-value pairs with a skewed and a uniform distribution, respectively.

Comment: 16 pages, accepted at SIGMOD 201
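
As background for the abstract's point about memory transfers, here is a minimal sketch of a generic sequential LSD radix sort. It is not the paper's hybrid GPU algorithm; the function name `lsd_radix_sort` and its parameters are assumptions made for illustration. Every pass reads and rewrites all keys, so the number of passes (key width divided by digit width, rounded up) is a rough proxy for the memory traffic a radix sort generates.

```python
def lsd_radix_sort(keys, key_bits=32, bits_per_pass=8):
    """Sort non-negative integers below 2**key_bits.

    Returns (sorted_keys, passes). Hypothetical helper for illustration only.
    """
    radix = 1 << bits_per_pass
    mask = radix - 1
    passes = 0
    for shift in range(0, key_bits, bits_per_pass):
        # Histogram of the current digit.
        counts = [0] * radix
        for k in keys:
            counts[(k >> shift) & mask] += 1
        # Exclusive prefix sum gives the first output slot of each digit.
        offsets = [0] * radix
        total = 0
        for d in range(radix):
            offsets[d] = total
            total += counts[d]
        # Stable scatter: one full read and one full write of the data.
        out = [0] * len(keys)
        for k in keys:
            d = (k >> shift) & mask
            out[offsets[d]] = k
            offsets[d] += 1
        keys = out
        passes += 1
    return keys, passes


if __name__ == "__main__":
    data = [170, 45, 75, 90, 802, 24, 2, 66]
    print(lsd_radix_sort(data, bits_per_pass=8))   # 4 passes over 32-bit keys
    print(lsd_radix_sort(data, bits_per_pass=16))  # 2 passes over 32-bit keys
```

Widening the digit reduces the pass count but enlarges the histogram, which is one of the trade-offs any radix sort has to balance.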

arXiv.org e-Print Archive

Crossref

Cache-Oblivious Peeling of Random Hypergraphs

Author: Belazzougui Djamal
Boldi Paolo
Ottaviano Giuseppe
Venturini Rossano
Vigna Sebastiano
Publication date: 02/12/2013

The computation of a peeling order in a randomly generated hypergraph is the most time-consuming step in a number of constructions, such as perfect hashing schemes, random $r$-SAT solvers, error-correcting codes, and approximate set encodings. While there exists a straightforward linear time algorithm, its poor I/O performance makes it impractical for hypergraphs whose size exceeds the available internal memory. We show how to reduce the computation of a peeling order to a small number of sequential scans and sorts, and analyze its I/O complexity in the cache-oblivious model. The resulting algorithm requires $O(\mathrm{sort}(n))$ I/Os and $O(n \log n)$ time to peel a random hypergraph with $n$ edges. We experimentally evaluate the performance of our implementation of this algorithm in a real-world scenario by using the construction of minimal perfect hash functions (MPHF) as our test case: our algorithm builds a MPHF of $7.6$ billion keys in less than $21$ hours on a single machine. The resulting data structure is both more space-efficient and faster than that obtained with the current state-of-the-art MPHF construction for large-scale key sets.
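
For readers unfamiliar with peeling, the following is a minimal in-memory sketch of the straightforward approach the abstract contrasts itself with, not the cache-oblivious algorithm of the paper. The function name `peel` and the representation of edges as tuples of vertex ids are assumptions made for illustration. The idea: repeatedly pick a vertex that lies in exactly one remaining edge, remove that edge, and record the pair.

```python
from collections import defaultdict


def peel(edges):
    """Return a peeling order as a list of (edge_index, vertex) pairs.

    Returns None when some edges cannot be peeled (the hypergraph has a
    non-empty 2-core). Hypothetical helper for illustration only.
    """
    # Map each vertex to the set of remaining edges that contain it.
    incident = defaultdict(set)
    for i, edge in enumerate(edges):
        for v in edge:
            incident[v].add(i)

    # Start with every vertex of degree one.
    stack = [v for v, es in incident.items() if len(es) == 1]
    order = []
    while stack:
        v = stack.pop()
        if len(incident[v]) != 1:
            continue  # degree changed since v was pushed
        (e,) = incident[v]
        order.append((e, v))
        # Remove edge e; vertices that drop to degree one become candidates.
        for u in edges[e]:
            incident[u].discard(e)
            if len(incident[u]) == 1:
                stack.append(u)
    return order if len(order) == len(edges) else None


if __name__ == "__main__":
    print(peel([(0, 1, 2), (1, 2, 3), (2, 3, 4)]))
    print(peel([(0, 1), (0, 1)]))  # not peelable
```

The random accesses into `incident` are what make this approach I/O-unfriendly once the hypergraph no longer fits in internal memory.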

arXiv.org e-Print Archive

Crossref

AIR Universita degli studi di Milano

Archivio della Ricerca - Università di Pisa

Engineering Parallel String Sorting

Author: Bingmann Timo
Eberle Andreas
Sanders Peter
Publication date: 09/03/2014

We discuss how string sorting algorithms can be parallelized on modern multi-core shared memory machines. As a synthesis of the best sequential string sorting algorithms and successful parallel sorting algorithms for atomic objects, we first propose string sample sort. The algorithm makes effective use of the memory hierarchy, uses additional word level parallelism, and largely avoids branch mispredictions. Then we focus on NUMA architectures, and develop parallel multiway LCP-merge and -mergesort to reduce the number of random memory accesses to remote nodes. Additionally, we parallelize variants of multikey quicksort and radix sort that are also useful in certain situations. Comprehensive experiments on five current multi-core platforms are then reported and discussed. The experiments show that our implementations scale very well on real-world inputs and modern machines.

Comment: 46 pages, extension of "Parallel String Sample Sort" arXiv:1305.115
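
Multikey quicksort, one of the algorithms the abstract mentions, can be sketched sequentially as below. This is a textbook-style illustration and not the authors' parallel implementation; the function name `multikey_quicksort` is an assumption. Strings are partitioned three ways by the character at the current depth, and only the "equal" group advances to the next character.

```python
def multikey_quicksort(strings, depth=0):
    """Return the strings sorted by code point order.

    Sequential sketch for illustration; all strings are assumed to share
    their first `depth` characters.
    """
    if len(strings) <= 1:
        return list(strings)

    def char_at(s):
        # -1 marks end of string, so shorter strings sort first.
        return ord(s[depth]) if depth < len(s) else -1

    pivot = char_at(strings[len(strings) // 2])
    less = [s for s in strings if char_at(s) < pivot]
    equal = [s for s in strings if char_at(s) == pivot]
    greater = [s for s in strings if char_at(s) > pivot]

    # Strings in `equal` share one more character; if the pivot is the
    # end-of-string marker they are all identical and need no more work.
    if pivot >= 0:
        equal = multikey_quicksort(equal, depth + 1)
    return (multikey_quicksort(less, depth)
            + equal
            + multikey_quicksort(greater, depth))


if __name__ == "__main__":
    words = ["banana", "band", "ban", "apple", "", "apple"]
    print(multikey_quicksort(words))
```

Because characters already known to be equal are never compared again, the work grows with the distinguishing prefixes of the input, not with the full string lengths.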

arXiv.org e-Print Archive

CiteSeerX

KITopen