633 research outputs found

Parallel String Sample Sort

Author: J. Kärkkäinen
J. Wassenberg
K. Mehlhorn
P. Sanders
P.M. McIlroy
R. Sinha
T. Hagerup
W. Ng
Publication date: 01/01/2013

arXiv.org e-Print Archive

CiteSeerX

Crossref

KITopen

Mach-Based Channel Library

Author: Chandy K. Mani
Manohar Rajit
Publication venue: 'California Institute of Technology Library'
Publication date: 07/07/1994

[No Abstract]

Caltech Authors

Engineering Parallel String Sorting

Author: Bingmann Timo
Eberle Andreas
Sanders Peter
Publication date: 09/03/2014

We discuss how string sorting algorithms can be parallelized on modern multi-core shared memory machines. As a synthesis of the best sequential string sorting algorithms and successful parallel sorting algorithms for atomic objects, we first propose string sample sort. The algorithm makes effective use of the memory hierarchy, uses additional word level parallelism, and largely avoids branch mispredictions. Then we focus on NUMA architectures, and develop parallel multiway LCP-merge and -mergesort to reduce the number of random memory accesses to remote nodes. Additionally, we parallelize variants of multikey quicksort and radix sort that are also useful in certain situations. Comprehensive experiments on five current multi-core platforms are then reported and discussed. The experiments show that our implementations scale very well on real-world inputs and modern machines. Comment: 46 pages, extension of "Parallel String Sample Sort" arXiv:1305.115
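The abstract above names multikey quicksort as one of the sequential string sorters that are parallelized. As a rough illustration only, the following is a minimal sequential sketch of the classic multikey quicksort idea (three-way partitioning on one character position at a time). It is not the authors' implementation; the function name `multikey_quicksort`, the list-based partitioning, and the middle-element pivot choice are assumptions made for brevity.

```python
def multikey_quicksort(strings, depth=0):
    """Sort strings that already share their first `depth` characters.

    Sequential illustration only (hypothetical helper, not the paper's code).
    """
    if len(strings) <= 1:
        return list(strings)

    def char_at(s):
        # Use -1 as an "end of string" marker that sorts before any character.
        return ord(s[depth]) if depth < len(s) else -1

    # Assumed pivot rule: character of the middle string at this depth.
    pivot = char_at(strings[len(strings) // 2])

    less = [s for s in strings if char_at(s) < pivot]
    equal = [s for s in strings if char_at(s) == pivot]
    greater = [s for s in strings if char_at(s) > pivot]

    if pivot == -1:
        # Every string in `equal` ends at this depth, so they are identical.
        middle = equal
    else:
        # Strings in `equal` agree on one more character: advance the depth.
        middle = multikey_quicksort(equal, depth + 1)

    return (multikey_quicksort(less, depth)
            + middle
            + multikey_quicksort(greater, depth))


if __name__ == "__main__":
    words = ["banana", "band", "", "apple", "ban", "banana", "b", "app"]
    print(multikey_quicksort(words))
```

The point of the sketch is that each character position is inspected once per partitioning step, rather than re-comparing shared prefixes as a generic comparison sort would.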

arXiv.org e-Print Archive

CiteSeerX

KITopen

An Efficient Multiway Mergesort for GPU Architectures

Author: Casanova Henri
Iacono John
Karsin Ben
Sitchinava Nodari
Weichert Volker
Publication date: 01/01/2017

Sorting is a primitive operation that is a building block for countless algorithms. As such, it is important to design sorting algorithms that approach peak performance on a range of hardware architectures. Graphics Processing Units (GPUs) are particularly attractive architectures as they provide massive parallelism and computing power. However, the intricacies of their compute and memory hierarchies make designing GPU-efficient algorithms challenging. In this work we present GPU Multiway Mergesort (MMS), a new GPU-efficient multiway mergesort algorithm. MMS employs a new partitioning technique that exposes the parallelism needed by modern GPU architectures. To the best of our knowledge, MMS is the first sorting algorithm for the GPU that is asymptotically optimal in terms of global memory accesses and that is completely free of shared memory bank conflicts. We realize an initial implementation of MMS, evaluate its performance on three modern GPU architectures, and compare it to competitive implementations available in state-of-the-art GPU libraries. Despite these implementations being highly optimized, MMS compares favorably, achieving performance improvements for most random inputs. Furthermore, unlike MMS, state-of-the-art algorithms are susceptible to bank conflicts. We find that for certain inputs that cause these algorithms to incur large numbers of bank conflicts, MMS can achieve up to a 37.6% speedup over its fastest competitor. Overall, even though its current implementation is not fully optimized, due to its efficient use of the memory hierarchy, MMS outperforms the fastest comparison-based sorting implementations available to date.
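For readers unfamiliar with the term, the general shape of a multiway (k-way) mergesort can be sketched sequentially as below. This is only the textbook recursion on a CPU; it does not reproduce MMS, its partitioning technique, or any GPU memory behavior. The function name and the parameters `k` and `base` are assumptions chosen for illustration.

```python
import heapq


def multiway_mergesort(items, k=4, base=8):
    """Sequential k-way mergesort sketch (hypothetical helper, not MMS)."""
    if k < 2:
        raise ValueError("k must be at least 2")
    if len(items) <= base:
        # Small inputs: fall back to the built-in sort.
        return sorted(items)

    # Split into at most k runs of roughly equal size (ceiling division).
    chunk = -(-len(items) // k)
    runs = [
        multiway_mergesort(items[i:i + chunk], k, base)
        for i in range(0, len(items), chunk)
    ]

    # Merge all sorted runs in a single pass instead of pairwise merging.
    return list(heapq.merge(*runs))


if __name__ == "__main__":
    data = [17, 3, 44, 8, 8, 0, 23, 5, 91, 12, 7, 64, 2, 30, 1, 19, 55]
    print(multiway_mergesort(data))
```

Merging k runs at once reduces the number of passes over the data from about log2(n) to about log_k(n), which is the property that makes multiway merging attractive when memory accesses dominate the cost.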

arXiv.org e-Print Archive

DI-fusion

Exploiting non-constant safe memory in resilient algorithms and data structures

Author: De Stefani Lorenzo
Silvestri Francesco
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015

We extend the Faulty RAM model by Finocchi and Italiano (2008) by adding a safe memory of arbitrary size $S$, and we then derive tradeoffs between the performance of resilient algorithmic techniques and the size of the safe memory. Let $\delta$ and $\alpha$ denote, respectively, the maximum amount of faults which can happen during the execution of an algorithm and the actual number of occurred faults, with $\alpha \leq \delta$. We propose a resilient algorithm for sorting $n$ entries which requires $O\left(n\log n+\alpha (\delta/S + \log S)\right)$ time and uses $\Theta(S)$ safe memory words. Our algorithm outperforms previous resilient sorting algorithms which do not exploit the available safe memory and require $O\left(n\log n+ \alpha\delta\right)$ time. Finally, we exploit our sorting algorithm for deriving a resilient priority queue. Our implementation uses $\Theta(S)$ safe memory words and $\Theta(n)$ faulty memory words for storing $n$ keys, and requires $O\left(\log n + \delta/S\right)$ amortized time for each insert and deletemin operation. Our resilient priority queue improves the $O\left(\log n + \delta\right)$ amortized time required by the state of the art. Comment: To appear in Theoretical Computer Science, 201
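
To make the tradeoff in this abstract concrete, here is a worked instance under one assumed setting that the abstract itself does not single out: a safe memory as large as the fault bound, $S = \Theta(\delta)$ with $\delta \geq 2$. Then $\delta/S = \Theta(1)$ and $\log S = \Theta(\log \delta)$, so the stated bounds simplify as follows.

```latex
% Assumption for illustration only: S = \Theta(\delta), \delta \geq 2.
\begin{align*}
O\!\left(n\log n + \alpha\left(\frac{\delta}{S} + \log S\right)\right)
  &= O\!\left(n\log n + \alpha\,(1 + \log\delta)\right)
   = O\!\left(n\log n + \alpha\log\delta\right), \\
O\!\left(\log n + \frac{\delta}{S}\right)
  &= O\!\left(\log n\right).
\end{align*}
```

Under this assumption the fault-dependent term for sorting drops from $\alpha\delta$ to $\alpha\log\delta$, and the per-operation cost of the priority queue no longer depends on $\delta$.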

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Padova

Heaps and heapsort on secondary storage

Author: Fadel R.
Jakobsen K.V.
Katajainen J.
Teuhola J.
Publication venue: Published by Elsevier B.V.
Publication date: 01/01/1999

A heap structure designed for secondary storage is suggested that tries to make the best use of the available buffer space in primary memory. The heap is a complete multi-way tree, with multi-page blocks of records as nodes, satisfying a generalized heap property. A special feature of the tree is that the nodes may be partially filled, as in B-trees. The structure is complemented with priority-queue operations insert and delete-max. When handling a sequence of $S$ operations, the number of page transfers performed is shown to be $O\left(\sum_{i=1}^{S} (1/P) \log_{M/P}(N_i/P)\right)$, where $P$ denotes the number of records fitting into a page, $M$ the capacity of the buffer space in records, and $N_i$ the number of records in the heap prior to the $i$th operation (assuming $P \geq 1$ and $S > M \geq c \cdot P$, where $c$ is a small positive constant). The number of comparisons required when handling the sequence is $O\left(\sum_{i=1}^{S} \log_2 N_i\right)$. Using the suggested data structure we obtain an optimal external heapsort that performs $O\left((N/P) \log_{M/P}(N/P)\right)$ page transfers and $O(N \log_2 N)$ comparisons in the worst case when sorting $N$ records.
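The multi-way heap shape described in this abstract can be illustrated with a much simpler in-memory analogue: a d-ary max-heap with insert and delete-max, used to drive a heapsort. This sketch keeps one record per node and ignores pages, buffers, and partially filled nodes entirely, so it does not reflect the paper's external-memory structure or its I/O bounds. The class name `MultiwayMaxHeap`, the fan-out parameter `d`, and the helper `heapsort_descending` are assumptions made for illustration.

```python
class MultiwayMaxHeap:
    """In-memory d-ary max-heap (simplified stand-in, one record per node)."""

    def __init__(self, d=4):
        if d < 2:
            raise ValueError("fan-out d must be at least 2")
        self.d = d
        self.a = []  # implicit complete d-ary tree stored in a list

    def insert(self, x):
        a = self.a
        a.append(x)
        i = len(a) - 1
        # Sift up: swap with the parent while the parent is smaller.
        while i > 0:
            p = (i - 1) // self.d
            if a[p] >= a[i]:
                break
            a[p], a[i] = a[i], a[p]
            i = p

    def delete_max(self):
        a = self.a
        if not a:
            raise IndexError("delete_max from an empty heap")
        top = a[0]
        last = a.pop()
        if a:
            a[0] = last
            i, n = 0, len(a)
            # Sift down: swap with the largest child while it is larger.
            while True:
                first = self.d * i + 1
                if first >= n:
                    break
                largest = max(range(first, min(first + self.d, n)),
                              key=a.__getitem__)
                if a[largest] <= a[i]:
                    break
                a[i], a[largest] = a[largest], a[i]
                i = largest
        return top


def heapsort_descending(items, d=4):
    """Sort by inserting everything, then repeatedly deleting the maximum."""
    heap = MultiwayMaxHeap(d)
    for x in items:
        heap.insert(x)
    return [heap.delete_max() for _ in range(len(items))]


if __name__ == "__main__":
    print(heapsort_descending([5, 1, 9, 3, 9, 2, 7]))
```

A larger fan-out makes the tree shallower, which is the same intuition behind using multi-page blocks as nodes when each level costs a page transfer.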

Elsevier - Publisher Connector

Copenhagen University Research Information System