MPI+X: task-based parallelization and dynamic load balance of finite element assembly
The main computing tasks of a finite element (FE) code for solving partial
differential equations (PDEs) are the algebraic system assembly and the
iterative solver. This work focuses on the first task, in the context of a
hybrid MPI+X paradigm. Although we will describe algorithms in the FE context,
a similar strategy can be straightforwardly applied to other discretization
methods, like the finite volume method. The matrix assembly consists of a loop
over the elements of the MPI partition to compute element matrices and
right-hand sides and to assemble them into the system local to each MPI
partition. In an MPI+X hybrid parallelism context, X has traditionally consisted
of loop parallelism using OpenMP. Several strategies have been proposed in the
literature to implement this loop parallelism, like coloring or substructuring
techniques to circumvent the race condition that appears when assembling the
element system into the local system. The main drawback of the first technique
is the decrease of the IPC due to bad spatial locality. The second technique
avoids this issue but requires extensive changes in the implementation, which
can be cumbersome when several element loops should be treated. We propose an
alternative, based on the task parallelism of the element loop using some
extensions to the OpenMP programming model. The taskification of the assembly
solves both aforementioned problems. In addition, dynamic load balance will be
applied using the DLB library, especially efficient in the presence of hybrid
meshes, where the relative costs of the different elements are impossible to
estimate a priori. This paper presents the proposed methodology, its
implementation and its validation through the solution of large computational
mechanics problems up to 16k cores.
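As an illustration of the coloring strategy mentioned in the abstract above (not the task-based method the paper proposes), the following minimal Python sketch assembles a 1D mesh of two-node elements. The mesh, the element matrix, and all function names are assumptions made for this example. Elements of the same color share no node, so their scatter-adds into the global matrix never touch the same entry and could run concurrently without a race.

```python
import numpy as np

def element_matrix(h):
    """Stiffness matrix of a 1D linear element of length h (illustrative choice)."""
    return (1.0 / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])

def color_elements(elements):
    """Greedy coloring: elements that share a node receive different colors."""
    node_colors = {}  # node -> colors already used by elements touching it
    colors = []
    for nodes in elements:
        used = set().union(*(node_colors.get(n, set()) for n in nodes))
        c = 0
        while c in used:
            c += 1
        colors.append(c)
        for n in nodes:
            node_colors.setdefault(n, set()).add(c)
    return colors

def assemble_serial(elements, n_nodes, h):
    """Reference assembly: one element after another."""
    A = np.zeros((n_nodes, n_nodes))
    for nodes in elements:
        A[np.ix_(nodes, nodes)] += element_matrix(h)
    return A

def assemble_by_color(elements, n_nodes, h):
    """Assembly in color batches; each batch is free of write conflicts."""
    colors = color_elements(elements)
    A = np.zeros((n_nodes, n_nodes))
    for c in range(max(colors) + 1):
        # Elements of color c are node-disjoint, so this inner loop is the
        # part a parallel loop could distribute across threads.
        for nodes, col in zip(elements, colors):
            if col == c:
                A[np.ix_(nodes, nodes)] += element_matrix(h)
    return A

# Hypothetical mesh: 8 elements on the unit interval.
n_elem = 8
elements = [(e, e + 1) for e in range(n_elem)]
h = 1.0 / n_elem
A_serial = assemble_serial(elements, n_elem + 1, h)
A_colored = assemble_by_color(elements, n_elem + 1, h)
```

In this sketch, elements of one color are never neighbours in the mesh, which is the spatial-locality drawback the abstract attributes to coloring.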
Optimality program in segment and string graphs
Planar graphs are known to allow subexponential algorithms running in time
or for most of the paradigmatic
problems, while the brute-force time is very likely to be
asymptotically best on general graphs. Intrigued by an algorithm packing curves
in by Fox and Pach [SODA'11], we investigate which
problems have subexponential algorithms on the intersection graphs of curves
(string graphs) or segments (segment intersection graphs) and which problems
have no such algorithms under the ETH (Exponential Time Hypothesis). Among our
results, we show that, quite surprisingly, 3-Coloring can also be solved in
time on string graphs while an algorithm running
in time for 4-Coloring even on axis-parallel segments (of unbounded
length) would disprove the ETH. For 4-Coloring of unit segments, we show a
weaker ETH lower bound of which exploits the celebrated
Erdős-Szekeres theorem. The subexponential running time also carries over
to Min Feedback Vertex Set but not to Min Dominating Set and Min Independent
Dominating Set. Comment: 19 pages, 15 figures
Gunrock: GPU Graph Analytics
For large-scale graph analytics on the GPU, the irregularity of data access
and control flow, and the complexity of programming GPUs, have presented two
significant challenges to developing a programmable high-performance graph
library. "Gunrock", our graph-processing system designed specifically for the
GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on
operations on a vertex or edge frontier. Gunrock achieves a balance between
performance and expressiveness by coupling high performance GPU computing
primitives and optimization strategies with a high-level programming model that
allows programmers to quickly develop new graph primitives with small code size
and minimal GPU programming knowledge. We characterize the performance of
various optimization strategies and evaluate Gunrock's overall performance on
different GPU architectures on a wide range of graph primitives that span from
traversal-based algorithms and ranking algorithms, to triangle counting and
bipartite-graph-based algorithms. The results show that on a single GPU,
Gunrock has on average at least an order of magnitude speedup over Boost and
PowerGraph, comparable performance to the fastest GPU hardwired primitives and
CPU shared-memory graph libraries such as Ligra and Galois, and better
performance than any other GPU high-level graph library. Comment: 52 pages, invited paper to ACM Transactions on Parallel Computing
(TOPC), an extended version of PPoPP'16 paper "Gunrock: A High-Performance
Graph Processing Library on the GPU"
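To make the idea of bulk-synchronous operations on a vertex frontier concrete, here is a minimal sequential Python sketch of breadth-first search written as repeated frontier steps. It is not Gunrock's API and involves no GPU code; the function names `advance`, `filter_unvisited`, and `bfs`, and the example graph, are hypothetical choices for this illustration.

```python
def advance(graph, frontier):
    """Expand the frontier: collect all neighbors (duplicates allowed)."""
    return [v for u in frontier for v in graph[u]]

def filter_unvisited(candidates, depth):
    """Keep each vertex that has no depth label yet, once."""
    out = []
    seen = set()
    for v in candidates:
        if v not in depth and v not in seen:
            seen.add(v)
            out.append(v)
    return out

def bfs(graph, source):
    """Level-synchronous BFS: one advance + filter step per level."""
    depth = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        frontier = filter_unvisited(advance(graph, frontier), depth)
        for v in frontier:
            depth[v] = level
    return depth

# Hypothetical directed graph as an adjacency list; vertex 5 is unreachable from 0.
graph = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: [], 5: [0]}
depths = bfs(graph, 0)
```

Each loop iteration processes the whole frontier before the next one starts, which is the bulk-synchronous structure the abstract refers to; on a GPU, the per-vertex work inside each step is what would run in parallel.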
Distributed Symmetry Breaking in Hypergraphs
Fundamental local symmetry breaking problems such as Maximal Independent Set
(MIS) and coloring have been recognized as important by the community, and
studied extensively in (standard) graphs. In particular, fast (i.e.,
logarithmic run time) randomized algorithms are well-established for MIS and
-coloring in both the LOCAL and CONGEST distributed computing
models. On the other hand, comparatively much less is known on the complexity
of distributed symmetry breaking in hypergraphs. In particular, a key
question is whether a fast (randomized) algorithm for MIS exists for
hypergraphs.
In this paper, we study the distributed complexity of symmetry breaking in
hypergraphs by presenting distributed randomized algorithms for a variety of
fundamental problems under a natural distributed computing model for
hypergraphs. We first show that MIS in hypergraphs (of arbitrary dimension) can
be solved in rounds ( is the number of nodes of the
hypergraph) in the LOCAL model. We then present a key result of this paper ---
an -round hypergraph MIS algorithm in
the CONGEST model where is the maximum node degree of the hypergraph
and is any arbitrarily small constant.
To demonstrate the usefulness of hypergraph MIS, we present applications of
our hypergraph algorithm to solving problems in (standard) graphs. In
particular, the hypergraph MIS yields fast distributed algorithms for the
balanced minimal dominating set problem (left open in Harris et al. [ICALP
2013]) and the minimal connected dominating set problem. We also present
distributed algorithms for coloring, maximal matching, and maximal clique in
hypergraphs. Comment: Changes from the previous version: More references added
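For readers unfamiliar with the hypergraph version of MIS, the following Python sketch uses the standard definition: a node set is independent if no hyperedge lies entirely inside it, and maximal if no further node can be added. The greedy procedure below is sequential and centralized, so it is not one of the distributed algorithms of the paper; the function names and the example hypergraph are assumptions for illustration.

```python
def is_independent(chosen, hyperedges):
    """True if no hyperedge is fully contained in the chosen node set."""
    return not any(set(e) <= chosen for e in hyperedges)

def greedy_mis(nodes, hyperedges):
    """Scan nodes in order; keep a node if the set stays independent."""
    chosen = set()
    for v in nodes:
        if is_independent(chosen | {v}, hyperedges):
            chosen.add(v)
    return chosen

def is_maximal(chosen, nodes, hyperedges):
    """True if adding any remaining node would complete some hyperedge."""
    return all(
        not is_independent(chosen | {v}, hyperedges)
        for v in nodes
        if v not in chosen
    )

# Hypothetical hypergraph with one hyperedge of size 3 and two of size 2.
nodes = [0, 1, 2, 3, 4]
hyperedges = [(0, 1, 2), (2, 3), (3, 4)]
mis = greedy_mis(nodes, hyperedges)
```

Unlike in ordinary graphs, two nodes of the same hyperedge may both be chosen, as long as at least one node of every hyperedge is left out.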
Algorithms for Coloring Quadtrees
We describe simple linear time algorithms for coloring the squares of
balanced and unbalanced quadtrees so that no two adjacent squares are given the
same color. If squares sharing sides are defined as adjacent, we color balanced
quadtrees with three colors, and unbalanced quadtrees with four colors; these
results are both tight, as some quadtrees require this many colors. If squares
sharing corners are defined as adjacent, we color balanced or unbalanced
quadtrees with six colors; for some quadtrees, at least five colors are
required. Comment: 7 pages, 9 figures
Distributed local approximation algorithms for maximum matching in graphs and hypergraphs
We describe approximation algorithms in Linial's classic LOCAL model of
distributed computing to find maximum-weight matchings in a hypergraph of rank
. Our main result is a deterministic algorithm to generate a matching which
is an -approximation to the maximum weight matching, running in rounds. (Here, the
notation hides and factors).
This is based on a number of new derandomization techniques extending methods
of Ghaffari, Harris & Kuhn (2017).
As a main application, we obtain nearly-optimal algorithms for the
long-studied problem of maximum-weight graph matching. Specifically, we get a
approximation algorithm using randomized time and deterministic time.
The second application is a faster algorithm for hypergraph maximal matching,
a versatile subroutine introduced in Ghaffari et al. (2017) for a variety of
local graph algorithms. This gives an algorithm for -edge-list
coloring in rounds deterministically or
rounds randomly. Another consequence (with
additional optimizations) is an algorithm which generates an edge-orientation
with out-degree at most for a graph of
arboricity ; for fixed this runs in
rounds deterministically or rounds randomly
- …