An efficient implementation of the Bellman-Ford algorithm for Kepler GPU architectures
Finding the shortest paths from a single source to all other vertices is a common problem in graph analysis. The Bellman-Ford algorithm solves this single-source shortest path (SSSP) problem and is well suited to parallelization on many-core architectures. However, its high degree of parallelism comes at the cost of low work efficiency: compared with similar algorithms in the literature (e.g., Dijkstra's), it performs much more redundant work and consequently wastes power. This article presents a parallel implementation of the Bellman-Ford algorithm that exploits the architectural characteristics of recent GPU architectures (i.e., NVIDIA Kepler, Maxwell) to improve both performance and work efficiency. It presents several optimizations to the implementation, targeting both the algorithm and the architecture. The experimental results show that the proposed implementation provides an average speedup of 5x over the most efficient existing parallel implementations for SSSP, that it works on graphs where those implementations cannot work or are inefficient (e.g., graphs with negative-weight edges, sparse graphs), and that it noticeably reduces the redundant work caused by the parallelization process.
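To make the trade-off between parallelism and redundant work concrete, here is a minimal sequential sketch of round-based Bellman-Ford in Python. It is not the GPU implementation described in the abstract. It assumes a directed graph given as `(u, v, weight)` tuples, and the function name `bellman_ford` and the frontier set are illustrative choices. Relaxing only the out-edges of vertices whose distance changed in the previous round is one common way to cut redundant relaxations. On a GPU, each round's relaxations would run concurrently.

```python
from math import inf


def bellman_ford(num_vertices, edges, source):
    """Round-based Bellman-Ford restricted to an active-vertex frontier.

    edges: iterable of (u, v, weight) tuples for a directed graph.
    Returns the list of shortest distances from source.
    Raises ValueError if a negative-weight cycle is reachable from source.
    """
    # Group outgoing edges by tail vertex so a round only touches active vertices.
    adjacency = [[] for _ in range(num_vertices)]
    for u, v, w in edges:
        adjacency[u].append((v, w))

    dist = [inf] * num_vertices
    dist[source] = 0
    frontier = {source}  # vertices whose distance changed in the last round

    for _ in range(num_vertices):
        if not frontier:
            return dist  # converged early: nothing left to relax
        next_frontier = set()
        for u in frontier:
            for v, w in adjacency[u]:
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    next_frontier.add(v)
        frontier = next_frontier

    # Distances still changing after num_vertices rounds imply a negative cycle.
    if frontier:
        raise ValueError("negative-weight cycle reachable from source")
    return dist


if __name__ == "__main__":
    example_edges = [(0, 1, 4), (0, 2, 5), (2, 1, -3), (1, 3, 2)]
    print(bellman_ford(4, example_edges, 0))
```

The example graph contains a negative-weight edge, which Dijkstra's algorithm cannot handle in general but Bellman-Ford can.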
A Simple Boosting Framework for Transshipment
Transshipment, also known under the names of earth mover's distance,
uncapacitated min-cost flow, or Wasserstein's metric, is an important and
well-studied problem that asks to find a flow of minimum cost that routes a
general demand vector. Adding to its importance, recent advancements in our
understanding of algorithms for transshipment have led to breakthroughs for the
fundamental problem of computing shortest paths. Specifically, the recent
near-optimal (1+ε)-approximate single-source shortest path
algorithms in the parallel and distributed settings crucially solve
transshipment as a central step of their approach.
The key property that differentiates transshipment from other similar
problems like shortest path is the so-called boosting: one can boost a
(bad) approximate solution to a near-optimal (1+ε)-approximate
solution. This conceptually reduces the problem to finding an approximate
solution. However, not all approximations can be boosted -- there have been
several proposed approaches that were shown to be susceptible to boosting, and
a few others where boosting was left as an open question.
The main takeaway of our paper is that any black-box α-approximate
transshipment solver that computes a dual solution can be boosted to a
(1+ε)-approximate solver. Moreover, we significantly simplify and
decouple previous approaches to transshipment (in sequential, parallel, and
distributed settings) by showing all of them (implicitly) obtain approximate
dual solutions.
Our analysis is very simple and relies only on the well-known multiplicative
weights framework. Furthermore, to keep the paper completely self-contained, we
provide a new (and arguably much simpler) analysis of multiplicative weights
that leverages well-known optimization tools to bypass the ad-hoc calculations
used in the standard analyses.
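For readers unfamiliar with the multiplicative weights framework mentioned above, the following Python sketch shows its generic experts-setting form (often called Hedge). It is not the paper's boosting procedure or its analysis. The function name `hedge`, the learning rate `eta`, and the assumption that losses lie in [0, 1] are illustrative choices.

```python
import math


def hedge(loss_rounds, eta=0.5):
    """Generic multiplicative-weights (Hedge) update over a fixed set of experts.

    loss_rounds: list of per-round loss vectors, one loss in [0, 1] per expert.
    Returns (final probability distribution, total expected loss incurred).
    """
    num_experts = len(loss_rounds[0])
    weights = [1.0] * num_experts
    expected_loss = 0.0

    for losses in loss_rounds:
        total = sum(weights)
        probs = [w / total for w in weights]
        # Loss suffered when following an expert drawn from the current distribution.
        expected_loss += sum(p * l for p, l in zip(probs, losses))
        # Multiplicatively shrink the weight of every expert that incurred loss.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]

    total = sum(weights)
    return [w / total for w in weights], expected_loss


if __name__ == "__main__":
    # Expert 0 never errs; the distribution concentrates on it over time.
    final_probs, loss = hedge([[0.0, 1.0, 1.0]] * 20)
    print(final_probs, loss)
```

The total expected loss stays within an additive term of the best expert's loss. Analyses of multiplicative weights establish guarantees of this kind.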
A parallel priority queue with fast updates for GPU architectures
The high computational throughput of modern graphics processing units (GPUs)
makes them the de-facto architecture for high-performance computing
applications. However, to achieve peak performance, GPUs require highly
parallel workloads, as well as memory access patterns that exhibit good
locality of reference. As a result, many state-of-the-art algorithms and data
structures designed for GPUs sacrifice work-optimality to achieve the necessary
parallelism. Furthermore, some abstract data types are avoided completely due
to there being no corresponding data structure that performs well on the GPU.
One such abstract data type is the priority queue. Many well-known algorithms
rely on priority queue operations as a building block. While various priority
queue structures have been developed that are parallel, cache-aware, or
cache-oblivious, none has been shown to be efficient on GPUs. In this paper, we
present the parBucketHeap, a parallel, cache-efficient data structure designed
for modern GPU architectures that supports standard priority queue operations,
as well as bulk update. We analyze the structure in several well-known
computational models and show that it both provides optimal parallelism and is
cache-efficient. We implement the parBucketHeap and, using it, we solve the
single-source shortest path (SSSP) problem. Experimental results indicate that,
for sufficiently large, dense graphs with high diameter, we outperform current
state-of-the-art SSSP algorithms on the GPU by up to a factor of 5. Unlike
existing GPU SSSP algorithms, our approach is work-optimal and places
significantly less load on the GPU, reducing power consumption.
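As a point of reference for how SSSP uses priority queue operations as a building block, here is a minimal sequential Dijkstra sketch in Python. It uses the standard-library binary heap (`heapq`) as a stand-in and is not the parBucketHeap. It assumes non-negative edge weights and an adjacency dictionary that lists every vertex as a key. Because `heapq` has no decrease-key or bulk-update operation, the sketch pushes duplicate entries and skips stale ones.

```python
import heapq
from math import inf


def dijkstra(adjacency, source):
    """Dijkstra's algorithm driven by a binary-heap priority queue.

    adjacency: dict mapping each vertex to a list of (neighbor, weight) pairs,
               with non-negative weights and every vertex present as a key.
    Returns a dict of shortest distances from source.
    """
    dist = {v: inf for v in adjacency}
    dist[source] = 0
    heap = [(0, source)]  # entries are (tentative distance, vertex)

    while heap:
        d, u = heapq.heappop(heap)  # extract-min
        if d > dist[u]:
            continue  # stale entry: a shorter path to u was already settled
        for v, w in adjacency[u]:
            candidate = d + w
            if candidate < dist[v]:
                dist[v] = candidate
                # Emulate decrease-key by pushing a fresh entry.
                heapq.heappush(heap, (candidate, v))
    return dist


if __name__ == "__main__":
    graph = {
        "a": [("b", 7), ("c", 2)],
        "b": [("d", 1)],
        "c": [("b", 3), ("d", 8)],
        "d": [],
    }
    print(dijkstra(graph, "a"))
```

Each vertex is settled once and each edge relaxed at most once. This is the work-efficiency property that highly parallel SSSP algorithms typically give up.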
Parallel Processing of Large Graphs
More and more large data collections are gathered worldwide in various IT
systems. Many of them have a networked nature and need to be processed and
analysed as graph structures. Due to their size, they often require a
parallel paradigm for efficient computation. Three parallel techniques are
compared in the paper: MapReduce, its map-side join extension, and Bulk
Synchronous Parallel (BSP). They are implemented for two different graph
problems: calculation of single source shortest paths (SSSP) and collective
classification of graph nodes by means of relational influence propagation
(RIP). The methods and algorithms are applied to several network datasets
differing in size and structural profile, originating from three domains:
telecommunication, multimedia and microblog. The results reveal that
iterative graph processing with the BSP implementation consistently and
significantly outperforms MapReduce, by up to a factor of 10, especially for
algorithms with many iterations and sparse communication. The MapReduce
extension based on map-side join is also usually noticeably more efficient than
plain MapReduce, although not as much as BSP. Nevertheless, MapReduce remains a good
alternative for enormous networks whose data structures do not fit in local
memories. Comment: Preprint submitted to Future Generation Computer Systems
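The iterative, message-driven style that BSP favors can be illustrated with a small single-process simulation of SSSP in Python. It is only a sketch of the superstep pattern, not the paper's implementation or any particular BSP framework. The function name `bsp_sssp` and the inbox/outbox dictionaries are hypothetical, and non-negative edge weights are assumed so that the loop terminates.

```python
from math import inf


def bsp_sssp(adjacency, source):
    """Simulate vertex-centric BSP supersteps for single-source shortest paths.

    adjacency: dict mapping each vertex to a list of (neighbor, weight) pairs,
               with non-negative weights and every vertex present as a key.
    Returns (distances, number of supersteps executed).
    """
    dist = {v: inf for v in adjacency}
    inbox = {source: [0]}  # messages delivered at the start of a superstep
    supersteps = 0

    while inbox:
        outbox = {}
        # Compute phase: every vertex with messages acts independently.
        for v, messages in inbox.items():
            best = min(messages)
            if best < dist[v]:
                dist[v] = best
                # Communication phase: propose new distances to neighbors.
                for neighbor, w in adjacency[v]:
                    outbox.setdefault(neighbor, []).append(best + w)
        # Barrier: messages sent now become visible only in the next superstep.
        inbox = outbox
        supersteps += 1

    return dist, supersteps


if __name__ == "__main__":
    graph = {
        "a": [("b", 7), ("c", 2)],
        "b": [("d", 1)],
        "c": [("b", 3), ("d", 8)],
        "d": [],
    }
    print(bsp_sssp(graph, "a"))
```

Only vertices that received messages do any work in a superstep. This is consistent with the abstract's observation that BSP does well on algorithms with many iterations and sparse communication.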
Distributed Approximation Algorithms for Weighted Shortest Paths
A distributed network is modeled by a graph having nodes (processors) and
diameter . We study the time complexity of approximating weighted
(undirected) shortest paths on distributed networks with a
bandwidth restriction on edges (the standard synchronous CONGEST model). The
question of whether approximation algorithms help speed up shortest-path
(more precisely, distance) computation was raised at least as early as 2004 by Elkin
(SIGACT News 2004). The unweighted case of this problem is well-understood,
while its weighted counterpart is a fundamental problem in the area of
distributed approximation algorithms and remains widely open. We present new
algorithms for computing both single-source shortest paths (SSSP) and
all-pairs shortest paths (APSP) in the weighted case.
Our main result is an algorithm for SSSP. Previous results are the classic
-time Bellman-Ford algorithm and an -time
-approximation algorithm, for any integer
, which follows from the result of Lenzen and Patt-Shamir (STOC 2013).
(Note that Lenzen and Patt-Shamir in fact solve a harder problem, and we use
to hide the O(poly log n) term.) We present an -time -approximation algorithm for SSSP. This
algorithm is sublinear-time as long as is sublinear, thus yielding a
sublinear-time algorithm with an almost optimal solution. When is small, our
running time matches the lower bound of by Das Sarma
et al. (SICOMP 2012), which holds even when , up to a
poly log n factor. Comment: Full version of STOC 201