23,906 research outputs found
Multi-GPU Graph Analytics
We present a single-node, multi-GPU programmable graph processing library
that allows programmers to easily extend single-GPU graph algorithms to achieve
scalable performance on large graphs with billions of edges. Directly using the
single-GPU implementations, our design only requires programmers to specify a
few algorithm-dependent concerns, hiding most multi-GPU related implementation
details. We analyze the theoretical and practical limits to scalability in the
context of varying graph primitives and datasets. We describe several
optimizations, such as direction optimizing traversal, and a just-enough memory
allocation scheme, for better performance and smaller memory consumption.
Compared to previous work, we achieve best-of-class performance across
operations and datasets, including excellent strong and weak scalability on
most primitives as we increase the number of GPUs in the system.Comment: 12 pages. Final version submitted to IPDPS 201
Gunrock: A High-Performance Graph Processing Library on the GPU
For large-scale graph analytics on the GPU, the irregularity of data access
and control flow, and the complexity of programming GPUs have been two
significant challenges for developing a programmable high-performance graph
library. "Gunrock", our graph-processing system designed specifically for the
GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on
operations on a vertex or edge frontier. Gunrock achieves a balance between
performance and expressiveness by coupling high performance GPU computing
primitives and optimization strategies with a high-level programming model that
allows programmers to quickly develop new graph primitives with small code size
and minimal GPU programming knowledge. We evaluate Gunrock on five key graph
primitives and show that Gunrock has on average at least an order of magnitude
speedup over Boost and PowerGraph, comparable performance to the fastest GPU
hardwired primitives, and better performance than any other GPU high-level
graph library.Comment: 14 pages, accepted by PPoPP'16 (removed the text repetition in the
previous version v5
Data fragmentation for parallel transitive closure strategies
Addresses the problem of fragmenting a relation to make the parallel computation of the transitive closure efficient, based on the disconnection set approach. To better understand this design problem, the authors focus on transportation networks. These are characterized by loosely interconnected clusters of nodes with a high internal connectivity rate. Three requirements that have to be fulfilled by a fragmentation are formulated, and three different fragmentation strategies are presented, each emphasizing one of these requirements. Some test results are presented to show the performance of the various fragmentation strategie
Gunrock: GPU Graph Analytics
For large-scale graph analytics on the GPU, the irregularity of data access
and control flow, and the complexity of programming GPUs, have presented two
significant challenges to developing a programmable high-performance graph
library. "Gunrock", our graph-processing system designed specifically for the
GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on
operations on a vertex or edge frontier. Gunrock achieves a balance between
performance and expressiveness by coupling high performance GPU computing
primitives and optimization strategies with a high-level programming model that
allows programmers to quickly develop new graph primitives with small code size
and minimal GPU programming knowledge. We characterize the performance of
various optimization strategies and evaluate Gunrock's overall performance on
different GPU architectures on a wide range of graph primitives that span from
traversal-based algorithms and ranking algorithms, to triangle counting and
bipartite-graph-based algorithms. The results show that on a single GPU,
Gunrock has on average at least an order of magnitude speedup over Boost and
PowerGraph, comparable performance to the fastest GPU hardwired primitives and
CPU shared-memory graph libraries such as Ligra and Galois, and better
performance than any other GPU high-level graph library.Comment: 52 pages, invited paper to ACM Transactions on Parallel Computing
(TOPC), an extended version of PPoPP'16 paper "Gunrock: A High-Performance
Graph Processing Library on the GPU
Persistent Homology Guided Force-Directed Graph Layouts
Graphs are commonly used to encode relationships among entities, yet their
abstractness makes them difficult to analyze. Node-link diagrams are popular
for drawing graphs, and force-directed layouts provide a flexible method for
node arrangements that use local relationships in an attempt to reveal the
global shape of the graph. However, clutter and overlap of unrelated structures
can lead to confusing graph visualizations. This paper leverages the persistent
homology features of an undirected graph as derived information for interactive
manipulation of force-directed layouts. We first discuss how to efficiently
extract 0-dimensional persistent homology features from both weighted and
unweighted undirected graphs. We then introduce the interactive persistence
barcode used to manipulate the force-directed graph layout. In particular, the
user adds and removes contracting and repulsing forces generated by the
persistent homology features, eventually selecting the set of persistent
homology features that most improve the layout. Finally, we demonstrate the
utility of our approach across a variety of synthetic and real datasets
Distributed Edge Connectivity in Sublinear Time
We present the first sublinear-time algorithm for a distributed
message-passing network sto compute its edge connectivity exactly in
the CONGEST model, as long as there are no parallel edges. Our algorithm takes
time to compute and a
cut of cardinality with high probability, where and are the
number of nodes and the diameter of the network, respectively, and
hides polylogarithmic factors. This running time is sublinear in (i.e.
) whenever is. Previous sublinear-time
distributed algorithms can solve this problem either (i) exactly only when
[Thurimella PODC'95; Pritchard, Thurimella, ACM
Trans. Algorithms'11; Nanongkai, Su, DISC'14] or (ii) approximately [Ghaffari,
Kuhn, DISC'13; Nanongkai, Su, DISC'14].
To achieve this we develop and combine several new techniques. First, we
design the first distributed algorithm that can compute a -edge connectivity
certificate for any in time .
Second, we show that by combining the recent distributed expander decomposition
technique of [Chang, Pettie, Zhang, SODA'19] with techniques from the
sequential deterministic edge connectivity algorithm of [Kawarabayashi, Thorup,
STOC'15], we can decompose the network into a sublinear number of clusters with
small average diameter and without any mincut separating a cluster (except the
`trivial' ones). Finally, by extending the tree packing technique from [Karger
STOC'96], we can find the minimum cut in time proportional to the number of
components. As a byproduct of this technique, we obtain an -time
algorithm for computing exact minimum cut for weighted graphs.Comment: Accepted at 51st ACM Symposium on Theory of Computing (STOC 2019
On the Complexity of Local Distributed Graph Problems
This paper is centered on the complexity of graph problems in the
well-studied LOCAL model of distributed computing, introduced by Linial [FOCS
'87]. It is widely known that for many of the classic distributed graph
problems (including maximal independent set (MIS) and -vertex
coloring), the randomized complexity is at most polylogarithmic in the size
of the network, while the best deterministic complexity is typically
. Understanding and narrowing down this exponential gap
is considered to be one of the central long-standing open questions in the area
of distributed graph algorithms. We investigate the problem by introducing a
complexity-theoretic framework that allows us to shed some light on the role of
randomness in the LOCAL model. We define the SLOCAL model as a sequential
version of the LOCAL model. Our framework allows us to prove completeness
results with respect to the class of problems which can be solved efficiently
in the SLOCAL model, implying that if any of the complete problems can be
solved deterministically in rounds in the LOCAL model, we can
deterministically solve all efficient SLOCAL-problems (including MIS and
-coloring) in rounds in the LOCAL model. We show
that a rather rudimentary looking graph coloring problem is complete in the
above sense: Color the nodes of a graph with colors red and blue such that each
node of sufficiently large polylogarithmic degree has at least one neighbor of
each color. The problem admits a trivial zero-round randomized solution. The
result can be viewed as showing that the only obstacle to getting efficient
determinstic algorithms in the LOCAL model is an efficient algorithm to
approximately round fractional values into integer values
- …