65 research outputs found
Fast Parallel Algorithms for Basic Problems
Parallel processing is one of the most active research areas these days. We are interested in one aspect of parallel processing, i.e. the design and analysis of parallel algorithms. Here, we focus on non-numerical parallel algorithms for basic combinatorial problems, such as data structures, selection, searching, merging and sorting. The purposes of studying these types of problems are to obtain basic building blocks which will be useful in solving complex problems, and to develop fundamental algorithmic techniques.
In this thesis, we study the following problems: priority queues, multiple search and multiple selection, and reconstruction of a binary tree from its traversals. The research on priority queue was motivated by its various applications. The purpose of studying multiple search and multiple selection is to explore the relationships between four of the most fundamental problems in algorithm design, that is, selection, searching, merging and sorting; while our parallel solutions can be used as subroutines in algorithms for other problems. The research on the last problem, reconstruction of a binary tree from its traversals, was stimulated by a challenge proposed in a recent paper by Berkman et al. ( Highly Parallelizable Problems, STOC 89) to design doubly logarithmic time optimal parallel algorithms because a remarkably small number of such parallel algorithms exist
A parallel priority queue with fast updates for GPU architectures
The high computational throughput of modern graphics processing units (GPUs)
make them the de-facto architecture for high-performance computing
applications. However, to achieve peak performance, GPUs require highly
parallel workloads, as well as memory access patterns that exhibit good
locality of reference. As a result, many state-of-the-art algorithms and data
structures designed for GPUs sacrifice work-optimality to achieve the necessary
parallelism. Furthermore, some abstract data types are avoided completely due
to there being no corresponding data structure that performs well on the GPU.
One such abstract data type is the priority queue. Many well-known algorithms
rely on priority queue operations as a building block. While various priority
queue structures have been developed that are parallel, cache-aware, or
cache-oblivious, none has been shown to be efficient on GPUs. In this paper, we
present the parBucketHeap, a parallel, cache-efficient data structure designed
for modern GPU architectures that supports standard priority queue operations,
as well as bulk update. We analyze the structure in several well-known
computational models and show that it provides both optimal parallelism and is
cache-efficient. We implement the parBucketHeap and, using it, we solve the
single-source shortest path (SSSP) problem. Experimental results indicate that,
for sufficiently large, dense graphs with high diameter, we out-perform current
state-of-the-art SSSP algorithms on the GPU by up to a factor of 5. Unlike
existing GPU SSSP algorithms, our approach is work-optimal and places
significantly less load on the GPU, reducing power consumption
A Bulk-Parallel Priority Queue in External Memory with STXXL
We propose the design and an implementation of a bulk-parallel external
memory priority queue to take advantage of both shared-memory parallelism and
high external memory transfer speeds to parallel disks. To achieve higher
performance by decoupling item insertions and extractions, we offer two
parallelization interfaces: one using "bulk" sequences, the other by defining
"limit" items. In the design, we discuss how to parallelize insertions using
multiple heaps, and how to calculate a dynamic prediction sequence to prefetch
blocks and apply parallel multiway merge for extraction. Our experimental
results show that in the selected benchmarks the priority queue reaches 75% of
the full parallel I/O bandwidth of rotational disks and and 65% of SSDs, or the
speed of sorting in external memory when bounded by computation.Comment: extended version of SEA'15 conference pape
Some Optimally Adaptive Parallel Graph Algorithms on EREW PRAM Model
The study of graph algorithms is an important area of research in computer science, since graphs offer useful tools to model many real-world situations. The commercial availability of parallel computers have led to the development of efficient parallel graph algorithms.
Using an exclusive-read and exclusive-write (EREW) parallel random access machine (PRAM) as the computation model with a fixed number of processors, we design and analyze parallel algorithms for seven undirected graph problems, such as, connected components, spanning forest, fundamental cycle set, bridges, bipartiteness, assignment problems, and approximate vertex coloring. For all but the last two problems, the input data structure is an unordered list of edges, and divide-and-conquer is the paradigm for designing algorithms. One of the algorithms to solve the assignment problem makes use of an appropriate variant of dynamic programming strategy. An elegant data structure, called the adjacency list matrix, used in a vertex-coloring algorithm avoids the sequential nature of linked adjacency lists.
Each of the proposed algorithms achieves optimal speedup, choosing an optimal granularity (thus exploiting maximum parallelism) which depends on the density or the number of vertices of the given graph. The processor-(time)2 product has been identified as a useful parameter to measure the cost-effectiveness of a parallel algorithm. We derive a lower bound on this measure for each of our algorithms
Hammock-on-ears decomposition: a technique for the efficient parallel solution of shortest paths and other problems
We show how to decompose efficiently in parallel {\em any} graph into a number, , of outerplanar subgraphs (called {\em hammocks}) satisfying certain separator properties. Our work combines and extends the sequential hammock decomposition technique introduced by G.~Frederickson and the parallel ear decomposition technique, thus we call it the {\em hammock-on-ears decomposition}. We mention that hammock-on-ears decomposition also draws from techniques in computational geometry and that an embedding of the graph does not need to be provided with the input. We achieve this decomposition in time using CREW PRAM processors, for an -vertex, -edge graph or digraph. The hammock-on-ears decomposition implies a general framework for solving graph problems efficiently. Its value is demonstrated by a variety of applications on a significant class of (di)graphs, namely that of {\em sparse (di)graphs}. This class consists of all (di)graphs which have a between and , and includes planar graphs and graphs with genus . We improve previous bounds for certain instances of shortest paths and related problems, in this class of graphs. These problems include all pairs shortest paths, all pairs reachability
Recommended from our members
Efficient Parallel Algorithms and Data Structures Related to Trees
The main contribution of this dissertation proposes a new paradigm, called the parentheses matching paradigm. It claims that this paradigm is well suited for designing efficient parallel algorithms for a broad class of nonnumeric problems. To demonstrate its applicability, we present three cost-optimal parallel algorithms for breadth-first traversal of general trees, sorting a special class of integers, and coloring an interval graph with the minimum number of colors
- …