55 research outputs found
In search of the fastest sorting algorithm
This paper explores, chronologically, the concepts, structures, and algorithms
that programmers and computer scientists have tried out in their attempts to
produce and improve the process of sorting. The measure of 'fastness' in this
paper is given mainly in terms of Big O notation.
Fast k-NNG construction with GPU-based quick multi-select
In this paper we describe a new brute force algorithm for building the
k-Nearest Neighbor Graph (k-NNG). The k-NNG algorithm has many applications
in areas such as machine learning, bioinformatics, and clustering analysis.
While there are very efficient algorithms for data of low dimensions, for
high-dimensional data the brute force search is the best algorithm. There are
two main parts to the algorithm: the first is finding the distances between
the input vectors, which may be formulated as a matrix multiplication problem;
the second is the selection of the k-NNs for each of the query vectors. For
the second part, we describe a novel graphics processing unit (GPU)-based
multi-select algorithm based on quicksort. Our optimization makes clever use
of warp voting functions available on the latest GPUs along with
user-controlled cache. Benchmarks show significant improvement over
state-of-the-art implementations of the k-NN search on GPUs.
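A minimal CPU-side sketch of the first part, the distance computation cast as a
matrix product, may help: squared Euclidean distances expand as
||q - x||^2 = ||q||^2 + ||x||^2 - 2 q.x, so the dominant cost is the
inner-product term Q * X^T. This is illustrative only; the paper's
implementation runs on the GPU, and the function name here is ours.

```cpp
#include <cstddef>
#include <vector>

// Pairwise squared Euclidean distances via the expansion
// ||q - x||^2 = ||q||^2 + ||x||^2 - 2 q.x; the dot-product term is a
// matrix multiplication Q * X^T, which is what makes a GPU formulation
// attractive. (Hypothetical CPU sketch, not the authors' CUDA code.)
std::vector<std::vector<float>> pairwise_sq_dists(
    const std::vector<std::vector<float>>& Q,   // m query vectors
    const std::vector<std::vector<float>>& X) { // n data vectors
  std::size_t m = Q.size(), n = X.size(), d = Q[0].size();
  std::vector<float> qn(m, 0.0f), xn(n, 0.0f); // squared norms
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t k = 0; k < d; ++k) qn[i] += Q[i][k] * Q[i][k];
  for (std::size_t j = 0; j < n; ++j)
    for (std::size_t k = 0; k < d; ++k) xn[j] += X[j][k] * X[j][k];
  std::vector<std::vector<float>> D(m, std::vector<float>(n));
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < n; ++j) {
      float dot = 0.0f;  // one entry of Q * X^T
      for (std::size_t k = 0; k < d; ++k) dot += Q[i][k] * X[j][k];
      D[i][j] = qn[i] + xn[j] - 2.0f * dot;
    }
  return D;
}
```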
Simple and Fast BlockQuicksort using Lomuto's Partitioning Scheme
This paper presents simple variants of the BlockQuicksort algorithm described
by Edelkamp and Weiss (ESA 2016). The simplification is achieved by using
Lomuto's partitioning scheme instead of Hoare's crossing pointer technique to
partition the input. To achieve a robust sorting algorithm that works well on
many different input types, the paper introduces a novel two-pivot variant of
Lomuto's partitioning scheme. A surprisingly simple twist to the generic
two-pivot quicksort approach makes the algorithm robust. The paper provides an
analysis of the theoretical properties of the proposed algorithms and compares
them to their competitors. The analysis shows that Lomuto-based approaches
incur a higher average sorting cost than the Hoare-based approach of
BlockQuicksort. Moreover, the analysis is particularly useful to reason about
pivot choices that suit the two-pivot approach. An extensive experimental study
shows that, despite their worse theoretical behavior, the simpler variants
perform as well as the original version of BlockQuicksort.
Comment: Accepted at ALENEX 2019
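For reference, the classic single-pivot Lomuto partition that these variants
build on, as a minimal sketch; BlockQuicksort's block-wise, branch-avoiding
bookkeeping and the paper's two-pivot twist are not reproduced here.

```cpp
#include <utility>
#include <vector>

// Classic Lomuto partition: index j sweeps the array while i tracks the
// boundary of elements smaller than the pivot. Returns the pivot's final
// position. (Illustrative sketch only.)
int lomuto_partition(std::vector<int>& a, int lo, int hi) {
  int pivot = a[hi];  // pivot chosen as the last element
  int i = lo;         // a[lo..i-1] holds elements < pivot
  for (int j = lo; j < hi; ++j)
    if (a[j] < pivot) std::swap(a[i++], a[j]);
  std::swap(a[i], a[hi]);  // place the pivot between the two sides
  return i;
}

void quicksort(std::vector<int>& a, int lo, int hi) {
  if (lo >= hi) return;
  int p = lomuto_partition(a, lo, hi);
  quicksort(a, lo, p - 1);
  quicksort(a, p + 1, hi);
}
```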
Data Structures & Algorithm Analysis in C++
This is the textbook for CSIS 215 at Liberty University.
How Good Is Multi-Pivot Quicksort?
Multi-Pivot Quicksort refers to variants of classical quicksort where in the
partitioning step k pivots are used to split the input into k+1 segments.
For many years, multi-pivot quicksort was regarded as impractical, but in 2009
a 2-pivot approach by Yaroslavskiy, Bentley, and Bloch was chosen as the
standard sorting algorithm in Sun's Java 7. In 2014 at ALENEX, Kushagra et al.
introduced an even faster algorithm that uses three pivots. This paper studies
what possible advantages multi-pivot quicksort might offer in general. The
contributions are as follows: Natural comparison-optimal algorithms for
multi-pivot quicksort are devised and analyzed. The analysis shows that the
benefits of using multiple pivots with respect to the average comparison count
are marginal, and these strategies are inferior to simpler strategies such as
the well-known median-of-k approach. A substantial part of the partitioning
cost is caused by rearranging elements. A rigorous analysis of an algorithm for
rearranging elements in the partitioning step is carried out, observing mainly
how often array cells are accessed during partitioning. The algorithm behaves
best if 3 to 5 pivots are used. Experiments show that this translates into good
cache behavior and is closest to predicting observed running times of
multi-pivot quicksort algorithms. Finally, it is studied how choosing pivots
from a sample affects sorting cost. The study is theoretical in the sense that
although the findings motivate design recommendations for multi-pivot quicksort
algorithms that lead to running time improvements over known algorithms in an
experimental setting, these improvements are small.
Comment: Submitted to a journal, v2: Fixed statement of Gibbs' inequality, v3:
Revised version, especially improving on the experiments in Section
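For concreteness, a sketch of a Yaroslavskiy-style two-pivot partition of the
kind mentioned above: two pivots p <= q split the array into three segments
(< p, between p and q, > q). This is an illustration of the generic two-pivot
scheme, not the comparison-optimal strategies the paper devises.

```cpp
#include <utility>
#include <vector>

// Two-pivot (dual-pivot) partition in the style of Yaroslavskiy's 2009
// scheme. Returns the final positions of the two pivots; a full sort
// would then recurse on the three segments. (Illustrative sketch only.)
std::pair<int, int> two_pivot_partition(std::vector<int>& a, int lo, int hi) {
  if (a[lo] > a[hi]) std::swap(a[lo], a[hi]);
  int p = a[lo], q = a[hi];  // pivots, p <= q
  int lt = lo + 1, gt = hi - 1, i = lo + 1;
  while (i <= gt) {
    if (a[i] < p)
      std::swap(a[i++], a[lt++]);        // grow the "< p" segment
    else if (a[i] > q)
      std::swap(a[i], a[gt--]);          // grow "> q"; re-examine a[i]
    else
      ++i;                               // a[i] lies in [p, q]
  }
  std::swap(a[lo], a[--lt]);  // move pivot p to its final slot
  std::swap(a[hi], a[++gt]);  // move pivot q to its final slot
  return {lt, gt};
}
```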
Notes on Randomized Algorithms
Lecture notes for the Yale Computer Science course CPSC 469/569 Randomized
Algorithms. Suitable for use as a supplementary text for an introductory
graduate or advanced undergraduate course on randomized algorithms. Discusses
tools from probability theory, including random variables and expectations,
union bound arguments, concentration bounds, applications of martingales and
Markov chains, and the Lovász Local Lemma. Algorithmic topics include
analysis of classic randomized algorithms such as Quicksort and Hoare's FIND,
randomized tree data structures, hashing, Markov chain Monte Carlo sampling,
randomized approximate counting, derandomization, quantum computing, and some
examples of randomized distributed algorithms.
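To give a flavor of the analyses in such notes, here is a sketch of randomized
quickselect in the spirit of Hoare's FIND, which finds the k-th smallest
element in expected linear time. For brevity this sketch uses a Lomuto-style
partition rather than Hoare's original crossing-pointer partition, and the
names are ours.

```cpp
#include <random>
#include <utility>
#include <vector>

// Randomized selection (Hoare's FIND, with a Lomuto-style partition):
// returns the k-th smallest element of a[lo..hi], k 0-indexed. The
// random pivot is the source of the expected O(n) running time that
// the lecture notes analyze.
int quickselect(std::vector<int>& a, int lo, int hi, int k,
                std::mt19937& rng) {
  while (true) {
    if (lo == hi) return a[lo];
    std::uniform_int_distribution<int> dist(lo, hi);
    std::swap(a[hi], a[dist(rng)]);  // random pivot moved to the end
    int pivot = a[hi], i = lo;       // Lomuto-style partition
    for (int j = lo; j < hi; ++j)
      if (a[j] < pivot) std::swap(a[i++], a[j]);
    std::swap(a[i], a[hi]);          // pivot now at its final rank i
    if (k == i) return a[i];
    if (k < i) hi = i - 1; else lo = i + 1;  // recurse on one side only
  }
}
```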
Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable
There has been significant recent interest in parallel graph processing due
to the need to quickly analyze the large graphs available today. Many graph
codes have been designed for distributed memory or external memory. However,
today even the largest publicly-available real-world graph (the Hyperlink Web
graph with over 3.5 billion vertices and 128 billion edges) can fit in the
memory of a single commodity multicore server. Nevertheless, most experimental
work in the literature reports results on much smaller graphs, and the ones for
the Hyperlink graph use distributed or external memory. Therefore, it is
natural to ask whether we can efficiently solve a broad class of graph problems
on this graph in memory.
This paper shows that theoretically-efficient parallel graph algorithms can
scale to the largest publicly-available graphs using a single machine with a
terabyte of RAM, processing them in minutes. We give implementations of
theoretically-efficient parallel algorithms for 20 important graph problems. We
also present the optimizations and techniques that we used in our
implementations, which were crucial in enabling us to process these large
graphs quickly. We show that the running times of our implementations
outperform existing state-of-the-art implementations on the largest real-world
graphs. For many of the problems that we consider, this is the first time they
have been solved on graphs at this scale. We have made the implementations
developed in this work publicly-available as the Graph-Based Benchmark Suite
(GBBS).
Comment: This is the full version of the paper appearing in the ACM Symposium
on Parallelism in Algorithms and Architectures (SPAA), 2018