Search CORE

65 research outputs found

Fast Parallel Algorithms for Basic Problems

Author: Wen Zhaofang
Publication venue: ODU Digital Commons
Publication date: 01/07/1991
Field of study

Parallel processing is one of the most active research areas these days. We are interested in one aspect of parallel processing, i.e. the design and analysis of parallel algorithms. Here, we focus on non-numerical parallel algorithms for basic combinatorial problems, such as data structures, selection, searching, merging and sorting. The purposes of studying these types of problems are to obtain basic building blocks which will be useful in solving complex problems, and to develop fundamental algorithmic techniques. In this thesis, we study the following problems: priority queues, multiple search and multiple selection, and reconstruction of a binary tree from its traversals. The research on priority queue was motivated by its various applications. The purpose of studying multiple search and multiple selection is to explore the relationships between four of the most fundamental problems in algorithm design, that is, selection, searching, merging and sorting; while our parallel solutions can be used as subroutines in algorithms for other problems. The research on the last problem, reconstruction of a binary tree from its traversals, was stimulated by a challenge proposed in a recent paper by Berkman et al. ( Highly Parallelizable Problems, STOC 89) to design doubly logarithmic time optimal parallel algorithms because a remarkably small number of such parallel algorithms exist

Old Dominion University

A parallel priority queue with fast updates for GPU architectures

Author: Iacono John
Karsin Ben
Sitchinava Nodari
Publication venue
Publication date: 01/06/2019
Field of study

The high computational throughput of modern graphics processing units (GPUs) make them the de-facto architecture for high-performance computing applications. However, to achieve peak performance, GPUs require highly parallel workloads, as well as memory access patterns that exhibit good locality of reference. As a result, many state-of-the-art algorithms and data structures designed for GPUs sacrifice work-optimality to achieve the necessary parallelism. Furthermore, some abstract data types are avoided completely due to there being no corresponding data structure that performs well on the GPU. One such abstract data type is the priority queue. Many well-known algorithms rely on priority queue operations as a building block. While various priority queue structures have been developed that are parallel, cache-aware, or cache-oblivious, none has been shown to be efficient on GPUs. In this paper, we present the parBucketHeap, a parallel, cache-efficient data structure designed for modern GPU architectures that supports standard priority queue operations, as well as bulk update. We analyze the structure in several well-known computational models and show that it provides both optimal parallelism and is cache-efficient. We implement the parBucketHeap and, using it, we solve the single-source shortest path (SSSP) problem. Experimental results indicate that, for sufficiently large, dense graphs with high diameter, we out-perform current state-of-the-art SSSP algorithms on the GPU by up to a factor of 5. Unlike existing GPU SSSP algorithms, our approach is work-optimal and places significantly less load on the GPU, reducing power consumption

arXiv.org e-Print Archive

DI-fusion

Parallel algorithms for the construction of special subgraphs

Author: Moussa Mahmoud Ibrahim
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2005
Field of study

KITopen

A Bulk-Parallel Priority Queue in External Memory with STXXL

Author: GS Brodal
J Singler
JS Vitter
L Arge
MC Pinotti
N Deo
P Sanders
P Sanders
PJ Varman
R Dementiev
Publication venue
Publication date: 01/01/2015
Field of study

We propose the design and an implementation of a bulk-parallel external memory priority queue to take advantage of both shared-memory parallelism and high external memory transfer speeds to parallel disks. To achieve higher performance by decoupling item insertions and extractions, we offer two parallelization interfaces: one using "bulk" sequences, the other by defining "limit" items. In the design, we discuss how to parallelize insertions using multiple heaps, and how to calculate a dynamic prediction sequence to prefetch blocks and apply parallel multiway merge for extraction. Our experimental results show that in the selected benchmarks the priority queue reaches 75% of the full parallel I/O bandwidth of rotational disks and and 65% of SSDs, or the speed of sorting in external memory when bounded by computation.Comment: extended version of SEA'15 conference pape

arXiv.org e-Print Archive

Crossref

KITopen

Some Optimally Adaptive Parallel Graph Algorithms on EREW PRAM Model

Author: Das Sajal K.
Publication venue: University of Central Florida
Publication date: 01/01/1988
Field of study

The study of graph algorithms is an important area of research in computer science, since graphs offer useful tools to model many real-world situations. The commercial availability of parallel computers have led to the development of efficient parallel graph algorithms. Using an exclusive-read and exclusive-write (EREW) parallel random access machine (PRAM) as the computation model with a fixed number of processors, we design and analyze parallel algorithms for seven undirected graph problems, such as, connected components, spanning forest, fundamental cycle set, bridges, bipartiteness, assignment problems, and approximate vertex coloring. For all but the last two problems, the input data structure is an unordered list of edges, and divide-and-conquer is the paradigm for designing algorithms. One of the algorithms to solve the assignment problem makes use of an appropriate variant of dynamic programming strategy. An elegant data structure, called the adjacency list matrix, used in a vertex-coloring algorithm avoids the sequential nature of linked adjacency lists. Each of the proposed algorithms achieves optimal speedup, choosing an optimal granularity (thus exploiting maximum parallelism) which depends on the density or the number of vertices of the given graph. The processor-(time)2 product has been identified as a useful parameter to measure the cost-effectiveness of a parallel algorithm. We derive a lower bound on this measure for each of our algorithms

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Hammock-on-ears decomposition: a technique for the efficient parallel solution of shortest paths and other problems

Author: Kavvadias G.
Pantziou G.
Spirakis P.
Zaroliagis C.
Publication venue: Max-Planck-Institut für Informatik
Publication date: 01/01/1994
Field of study

We show how to decompose efficiently in parallel {\em any} graph into a number,

\tilde{\gamma}

, of outerplanar subgraphs (called {\em hammocks}) satisfying certain separator properties. Our work combines and extends the sequential hammock decomposition technique introduced by G.~Frederickson and the parallel ear decomposition technique, thus we call it the {\em hammock-on-ears decomposition}. We mention that hammock-on-ears decomposition also draws from techniques in computational geometry and that an embedding of the graph does not need to be provided with the input. We achieve this decomposition in

O(\log n\log\log n)

time using

O(n+m)

CREW PRAM processors, for an

n

-vertex,

m

-edge graph or digraph. The hammock-on-ears decomposition implies a general framework for solving graph problems efficiently. Its value is demonstrated by a variety of applications on a significant class of (di)graphs, namely that of {\em sparse (di)graphs}. This class consists of all (di)graphs which have a

\tilde{\gamma}

between

1

and

\Theta(n)

, and includes planar graphs and graphs with genus

o(n)

. We improve previous bounds for certain instances of shortest paths and related problems, in this class of graphs. These problems include all pairs shortest paths, all pairs reachability

MPG.PuRe

A Practical Scalable Shared-Memory Parallel Algorithm for Computing Minimum Spanning Trees

Author: Zhou Wei
Publication venue: Karlsruher Institut für Technologie
Publication date: 01/01/2017
Field of study

KITopen

Recommended from our members

Efficient Parallel Algorithms and Data Structures Related to Trees

Author: Chen Calvin Ching-Yuen
Publication venue: 'University of North Texas Libraries'
Publication date: 01/12/1991
Field of study

The main contribution of this dissertation proposes a new paradigm, called the parentheses matching paradigm. It claims that this paradigm is well suited for designing efficient parallel algorithms for a broad class of nonnumeric problems. To demonstrate its applicability, we present three cost-optimal parallel algorithms for breadth-first traversal of general trees, sorting a special class of integers, and coloring an interval graph with the minimum number of colors

UNT Digital Library