2,772 research outputs found
Engineering Parallel String Sorting
We discuss how string sorting algorithms can be parallelized on modern
multi-core shared memory machines. As a synthesis of the best sequential string
sorting algorithms and successful parallel sorting algorithms for atomic
objects, we first propose string sample sort. The algorithm makes effective use
of the memory hierarchy, uses additional word level parallelism, and largely
avoids branch mispredictions. Then we focus on NUMA architectures, and develop
parallel multiway LCP-merge and -mergesort to reduce the number of random
memory accesses to remote nodes. Additionally, we parallelize variants of
multikey quicksort and radix sort that are also useful in certain situations.
Comprehensive experiments on five current multi-core platforms are then
reported and discussed. The experiments show that our implementations scale
very well on real-world inputs and modern machines.Comment: 46 pages, extension of "Parallel String Sample Sort" arXiv:1305.115
Parallel String Sample Sort
We discuss how string sorting algorithms can be parallelized on modern
multi-core shared memory machines. As a synthesis of the best sequential string
sorting algorithms and successful parallel sorting algorithms for atomic
objects, we propose string sample sort. The algorithm makes effective use of
the memory hierarchy, uses additional word level parallelism, and largely
avoids branch mispredictions. Additionally, we parallelize variants of multikey
quicksort and radix sort that are also useful in certain situations.Comment: 34 pages, 7 figures and 12 table
Why Is Dual-Pivot Quicksort Fast?
I discuss the new dual-pivot Quicksort that is nowadays used to sort arrays
of primitive types in Java. I sketch theoretical analyses of this algorithm
that offer a possible, and in my opinion plausible, explanation why (a)
dual-pivot Quicksort is faster than the previously used (classic) Quicksort and
(b) why this improvement was not already found much earlier.Comment: extended abstract for Theorietage 2015
(https://www.uni-trier.de/index.php?id=55089) (v2 fixes a small bug in the
pseudocode
Analysis of pivot sampling in dual-pivot Quicksort: A holistic analysis of Yaroslavskiy's partitioning scheme
The final publication is available at Springer via http://dx.doi.org/10.1007/s00453-015-0041-7The new dual-pivot Quicksort by Vladimir Yaroslavskiy-used in Oracle's Java runtime library since version 7-features intriguing asymmetries. They make a basic variant of this algorithm use less comparisons than classic single-pivot Quicksort. In this paper, we extend the analysis to the case where the two pivots are chosen as fixed order statistics of a random sample. Surprisingly, dual-pivot Quicksort then needs more comparisons than a corresponding version of classic Quicksort, so it is clear that counting comparisons is not sufficient to explain the running time advantages observed for Yaroslavskiy's algorithm in practice. Consequently, we take a more holistic approach and give also the precise leading term of the average number of swaps, the number of executed Java Bytecode instructions and the number of scanned elements, a new simple cost measure that approximates I/O costs in the memory hierarchy. We determine optimal order statistics for each of the cost measures. It turns out that the asymmetries in Yaroslavskiy's algorithm render pivots with a systematic skew more efficient than the symmetric choice. Moreover, we finally have a convincing explanation for the success of Yaroslavskiy's algorithm in practice: compared with corresponding versions of classic single-pivot Quicksort, dual-pivot Quicksort needs significantly less I/Os, both with and without pivot sampling.Peer ReviewedPostprint (author's final draft
Contract-Based General-Purpose GPU Programming
Using GPUs as general-purpose processors has revolutionized parallel
computing by offering, for a large and growing set of algorithms, massive
data-parallelization on desktop machines. An obstacle to widespread adoption,
however, is the difficulty of programming them and the low-level control of the
hardware required to achieve good performance. This paper suggests a
programming library, SafeGPU, that aims at striking a balance between
programmer productivity and performance, by making GPU data-parallel operations
accessible from within a classical object-oriented programming language. The
solution is integrated with the design-by-contract approach, which increases
confidence in functional program correctness by embedding executable program
specifications into the program text. We show that our library leads to modular
and maintainable code that is accessible to GPGPU non-experts, while providing
performance that is comparable with hand-written CUDA code. Furthermore,
runtime contract checking turns out to be feasible, as the contracts can be
executed on the GPU
Searching for invariants using genetic programming and mutation testing
Invariants are concise and useful descriptions of a program's behaviour. As most programs are not annotated with invariants, previous research has attempted to automatically generate them from source code. In this paper, we propose a new approach to invariant generation using search. We reuse the trace generation front-end of existing tool Daikon and integrate it with genetic programming and a mutation testing tool. We demonstrate that our system can find the same invariants through search that Daikon produces via template instantiation, and we also find useful invariants that Daikon does not. We then present a method of ranking invariants such that we can identify those that are most interesting, through a novel application of program mutation
Quicksort asymptotics
The number of comparisons X_n used by Quicksort to sort an array of n
distinct numbers has mean mu_n of order n log n and standard deviation of order
n. Using different methods, Regnier and Roesler each showed that the normalized
variate Y_n := (X_n - mu_n) / n converges in distribution, say to Y; the
distribution of Y can be characterized as the unique fixed point with zero mean
of a certain distributional transformation.
We provide the first rates of convergence for the distribution of Y_n to that
of Y, using various metrics. In particular, we establish the bound 2 n^{- 1 /
2} in the d_2-metric, and the rate O(n^{epsilon - (1 / 2)}) for
Kolmogorov-Smirnov distance, for any positive epsilon.Comment: 23 pages. See also http://www.mts.jhu.edu/~fill/ and
http://www.math.uu.se/~svante/ . To be submitted for publication in May, 200
- …