MPI+X: task-based parallelization and dynamic load balance of finite element assembly

Author: Artigues Antoni
Ferrer Roger
Garcia-Gasulla Marta
Houzeaux Guillaume
Labarta Jesús
López Victor
Vázquez Mariano
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2018

The main computing tasks of a finite element (FE) code for solving partial differential equations (PDEs) are the algebraic system assembly and the iterative solver. This work focuses on the first task, in the context of a hybrid MPI+X paradigm. Although we describe the algorithms in the FE context, a similar strategy can be applied straightforwardly to other discretization methods, such as the finite volume method. The matrix assembly consists of a loop over the elements of the MPI partition to compute element matrices and right-hand sides and to assemble them into the system local to each MPI partition. In an MPI+X hybrid parallelism context, X has traditionally consisted of loop parallelism using OpenMP. Several strategies have been proposed in the literature to implement this loop parallelism, such as coloring or substructuring techniques, which circumvent the race condition that appears when assembling the element system into the local system. The main drawback of the first technique is the decrease in IPC (instructions per cycle) due to poor spatial locality. The second technique avoids this issue but requires extensive changes in the implementation, which can be cumbersome when several element loops must be treated. We propose an alternative based on task parallelism of the element loop, using some extensions to the OpenMP programming model. The taskification of the assembly solves both of the aforementioned problems. In addition, dynamic load balance is applied using the DLB library, which is especially efficient in the presence of hybrid meshes, where the relative costs of the different elements are impossible to estimate a priori. This paper presents the proposed methodology, its implementation, and its validation through the solution of large computational mechanics problems on up to 16k cores.
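The coloring strategy that the abstract cites as prior work can be illustrated with a minimal Python sketch. This is not the authors' task-based method and not their code: the function names `color_elements` and `assemble_by_color`, the toy contribution of 1.0 per node, and the 1D chain mesh are all assumptions made for illustration. The idea shown is that elements sharing a node receive different colors, so all elements of one color can be accumulated into the local system without a race.

```python
from collections import defaultdict


def color_elements(elements):
    """Greedy coloring: two elements that share a node never get the same color."""
    node_to_elems = defaultdict(list)
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elems[n].append(e)

    colors = [-1] * len(elements)
    for e, nodes in enumerate(elements):
        # Colors already taken by neighboring elements (those sharing a node).
        used = {colors[o] for n in nodes for o in node_to_elems[n] if colors[o] >= 0}
        c = 0
        while c in used:
            c += 1
        colors[e] = c
    return colors


def assemble_by_color(elements, colors, num_nodes):
    """Accumulate a toy element contribution (1.0 per node), one color at a time."""
    rhs = [0.0] * num_nodes
    for c in range(max(colors) + 1):
        # Elements of the same color touch disjoint nodes, so this inner loop
        # is the part that could run in parallel without atomics.
        for e, nodes in enumerate(elements):
            if colors[e] == c:
                for n in nodes:
                    rhs[n] += 1.0
    return rhs


# Hypothetical mesh: a 1D chain of four two-node elements on five nodes.
elements = [(0, 1), (1, 2), (2, 3), (3, 4)]
colors = color_elements(elements)
rhs = assemble_by_color(elements, colors, num_nodes=5)
```

Processing elements color by color visits them out of mesh order, which is one way to picture the loss of spatial locality that the abstract attributes to coloring.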

arXiv.org e-Print Archive

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Optimality program in segment and string graphs

Author: BS Baker
C McDiarmid
D Marx
DE Knuth
ED Demaine
IE Zverovich
J Alber
J Fox
J Kratochvíl
J Matoušek
M Schaefer
N Robertson
R Impagliazzo
RJ Lipton
S Cabello
Publication date: 10/10/2018

Planar graphs are known to allow subexponential algorithms running in time $2^{O(\sqrt n)}$ or $2^{O(\sqrt n \log n)}$ for most of the paradigmatic problems, while the brute-force time $2^{\Theta(n)}$ is very likely to be asymptotically best on general graphs. Intrigued by an algorithm packing curves in $2^{O(n^{2/3}\log n)}$ by Fox and Pach [SODA'11], we investigate which problems have subexponential algorithms on the intersection graphs of curves (string graphs) or segments (segment intersection graphs) and which problems have no such algorithms under the ETH (Exponential Time Hypothesis). Among our results, we show that, quite surprisingly, 3-Coloring can also be solved in time $2^{O(n^{2/3}\log^{O(1)}n)}$ on string graphs, while an algorithm running in time $2^{o(n)}$ for 4-Coloring even on axis-parallel segments (of unbounded length) would disprove the ETH. For 4-Coloring of unit segments, we show a weaker ETH lower bound of $2^{o(n^{2/3})}$, which exploits the celebrated Erdős-Szekeres theorem. The subexponential running time also carries over to Min Feedback Vertex Set but not to Min Dominating Set and Min Independent Dominating Set.

Comment: 19 pages, 15 figures
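To make the objects in this abstract concrete, here is a small Python sketch that builds the intersection graph of axis-parallel segments and colors it by exhaustive search. It is only the exponential brute-force baseline ($k^n$ assignments), not any of the subexponential algorithms from the paper. The helper names and the four example segments are assumptions for illustration.

```python
from itertools import product


def intersects(a, b):
    """Closed axis-parallel segments given as (x1, y1, x2, y2), x1 <= x2, y1 <= y2.

    Such a segment is a degenerate rectangle, so two of them intersect
    exactly when their x-ranges and y-ranges both overlap.
    """
    return (max(a[0], b[0]) <= min(a[2], b[2])
            and max(a[1], b[1]) <= min(a[3], b[3]))


def intersection_graph(segments):
    """Edge list of the segment intersection graph (one vertex per segment)."""
    n = len(segments)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if intersects(segments[i], segments[j])]


def brute_force_coloring(n, edges, k):
    """Try all k**n color assignments; return the first proper one, or None."""
    for colors in product(range(k), repeat=n):
        if all(colors[i] != colors[j] for i, j in edges):
            return colors
    return None


# Hypothetical input: three pairwise-intersecting segments plus one isolated one.
segments = [(0, 0, 4, 0), (1, -1, 1, 1), (0, 0, 2, 0), (6, 0, 7, 0)]
edges = intersection_graph(segments)
two = brute_force_coloring(len(segments), edges, 2)
three = brute_force_coloring(len(segments), edges, 3)
```

The first three segments pairwise intersect and so form a triangle, which is why no proper 2-coloring exists for this input while a 3-coloring does.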

arXiv.org e-Print Archive

Crossref

Gunrock: GPU Graph Analytics

Author: Davidson Andrew
Liu Weitang
Osama Muhammad
Owens John D.
Pan Yuechao
Riffel Andy T.
Wang Leyuan
Wang Yangzihao
Wu Yuduo
Yang Carl
Yuan Chenshan
Publication date: 04/01/2017

For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs, have presented two significant challenges to developing a programmable high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high-performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We characterize the performance of various optimization strategies and evaluate Gunrock's overall performance on different GPU architectures on a wide range of graph primitives that span from traversal-based algorithms and ranking algorithms to triangle counting and bipartite-graph-based algorithms. The results show that on a single GPU, Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives and CPU shared-memory graph libraries such as Ligra and Galois, and better performance than any other GPU high-level graph library.

Comment: 52 pages, invited paper to ACM Transactions on Parallel Computing (TOPC), an extended version of PPoPP'16 paper "Gunrock: A High-Performance Graph Processing Library on the GPU"
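The frontier-centric, bulk-synchronous style described above can be sketched on the CPU in plain Python. This is not Gunrock's API and does not run on a GPU. The function name `bfs_frontier` and the "expand" and "prune" step labels are assumptions chosen for illustration. Each loop iteration is one bulk step that turns the current vertex frontier into the next one.

```python
def bfs_frontier(adj, source):
    """Breadth-first search written as repeated operations on a vertex frontier."""
    depth = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        # Expand: gather every neighbor of every vertex in the current frontier.
        candidates = [v for u in frontier for v in adj[u]]
        # Prune: keep only vertices not seen before (this also drops duplicates).
        next_frontier = []
        for v in candidates:
            if v not in depth:
                depth[v] = level
                next_frontier.append(v)
        frontier = next_frontier
    return depth


# Hypothetical directed graph as an adjacency list; vertex 4 is unreachable.
adj = {0: [1, 2], 1: [3], 2: [3], 3: [], 4: []}
depths = bfs_frontier(adj, 0)
```

On a GPU each of the two steps would be a data-parallel operation over the whole frontier. Here they are ordinary loops, so the sketch shows only the structure of the computation.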

arXiv.org e-Print Archive

eScholarship - University of California

FigShare

Distributed Symmetry Breaking in Hypergraphs

Author: A. Ephremides
A.D. Sarma
C. Avin
D. Peleg
D.G. Harris
D.P. Dubhashi
F. Dai
H. Balakrishnan
I. Chlamtac
J.A. Garay
M. Ghaffari
N. Linial
R. Thurimella
S. Kutten
T. Luczak
Y. Métivier
Publication date: 01/01/2014

Fundamental local symmetry breaking problems such as Maximal Independent Set (MIS) and coloring have been recognized as important by the community, and studied extensively in (standard) graphs. In particular, fast (i.e., logarithmic run time) randomized algorithms are well-established for MIS and $\Delta + 1$-coloring in both the LOCAL and CONGEST distributed computing models. On the other hand, comparatively much less is known about the complexity of distributed symmetry breaking in hypergraphs. In particular, a key question is whether a fast (randomized) algorithm for MIS exists for hypergraphs. In this paper, we study the distributed complexity of symmetry breaking in hypergraphs by presenting distributed randomized algorithms for a variety of fundamental problems under a natural distributed computing model for hypergraphs. We first show that MIS in hypergraphs (of arbitrary dimension) can be solved in $O(\log^2 n)$ rounds (where $n$ is the number of nodes of the hypergraph) in the LOCAL model. We then present a key result of this paper: an $O(\Delta^{\epsilon}\,\text{polylog}(n))$-round hypergraph MIS algorithm in the CONGEST model, where $\Delta$ is the maximum node degree of the hypergraph and $\epsilon > 0$ is any arbitrarily small constant. To demonstrate the usefulness of hypergraph MIS, we present applications of our hypergraph algorithm to solving problems in (standard) graphs. In particular, the hypergraph MIS yields fast distributed algorithms for the balanced minimal dominating set problem (left open in Harris et al. [ICALP 2013]) and the minimal connected dominating set problem. We also present distributed algorithms for coloring, maximal matching, and maximal clique in hypergraphs.

Comment: Changes from the previous version: More references added
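The following Python sketch shows what a hypergraph MIS is, assuming the usual definition (which the abstract does not spell out): a set of nodes is independent if it fully contains no hyperedge, and maximal if no further node can be added. The sketch is a sequential greedy construction plus a checker, not the distributed LOCAL or CONGEST algorithms of the paper. The function names and the example hypergraph are hypothetical.

```python
def contains_hyperedge(nodes, hyperedges):
    """True if some hyperedge lies entirely inside the given node set."""
    return any(set(e) <= nodes for e in hyperedges)


def greedy_hypergraph_mis(num_nodes, hyperedges):
    """Scan nodes in index order, adding each one that keeps the set independent."""
    chosen = set()
    for v in range(num_nodes):
        trial = chosen | {v}
        if not contains_hyperedge(trial, hyperedges):
            chosen = trial
    return chosen


def is_maximal_independent(num_nodes, hyperedges, s):
    """Check independence and maximality under the assumed definition."""
    independent = not contains_hyperedge(s, hyperedges)
    maximal = all(
        contains_hyperedge(s | {v}, hyperedges)
        for v in range(num_nodes) if v not in s
    )
    return independent and maximal


# Hypothetical hypergraph on nodes 0..3 with one 3-node and one 2-node hyperedge.
hyperedges = [(0, 1, 2), (2, 3)]
mis = greedy_hypergraph_mis(4, hyperedges)
```

Unlike in ordinary graphs, two nodes of the same hyperedge may both be selected, as long as at least one node of every hyperedge stays out.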

arXiv.org e-Print Archive

Crossref

Algorithms for Coloring Quadtrees

Author: Bern Marshall W.
Eppstein David
Hutchings Brad
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 19/07/1999

We describe simple linear-time algorithms for coloring the squares of balanced and unbalanced quadtrees so that no two adjacent squares are given the same color. If squares sharing sides are defined as adjacent, we color balanced quadtrees with three colors, and unbalanced quadtrees with four colors; these results are both tight, as some quadtrees require this many colors. If squares sharing corners are defined as adjacent, we color balanced or unbalanced quadtrees with six colors; for some quadtrees, at least five colors are required.

Comment: 7 pages, 9 figures
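The side-adjacency notion used above can be made concrete with a short Python sketch. It represents quadtree leaves as `(x, y, size)` squares, tests whether two squares share a side segment of positive length, and applies a generic greedy coloring. This is not the paper's algorithm and carries no guarantee on the number of colors. The representation, the function names and the example subdivision are assumptions for illustration.

```python
def share_side(a, b):
    """True if two interior-disjoint squares (x, y, size) touch along a side."""
    ax, ay, asz = a
    bx, by, bsz = b
    x_overlap = min(ax + asz, bx + bsz) - max(ax, bx)
    y_overlap = min(ay + asz, by + bsz) - max(ay, by)
    # Touching along a vertical edge: zero x-overlap and positive y-overlap.
    # Touching along a horizontal edge: the symmetric case.
    # Corner contact gives zero overlap in both directions and is excluded.
    return (x_overlap == 0 and y_overlap > 0) or (y_overlap == 0 and x_overlap > 0)


def greedy_color(squares):
    """Give each square the smallest color unused by its side-adjacent predecessors."""
    colors = []
    for i, sq in enumerate(squares):
        used = {colors[j] for j in range(i) if share_side(sq, squares[j])}
        c = 0
        while c in used:
            c += 1
        colors.append(c)
    return colors


# Hypothetical quadtree: a 4x4 root split into quadrants, with the lower-left
# quadrant split once more into four unit squares.
leaves = [(0, 0, 1), (1, 0, 1), (0, 1, 1), (1, 1, 1),
          (2, 0, 2), (0, 2, 2), (2, 2, 2)]
leaf_colors = greedy_color(leaves)
```

The pairwise adjacency test makes this sketch quadratic in the number of leaves, whereas the abstract describes linear-time algorithms that exploit the tree structure.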

arXiv.org e-Print Archive

CiteSeerX

Crossref

Distributed local approximation algorithms for maximum matching in graphs and hypergraphs

Author: Harris David G.
Publication date: 23/03/2020

We describe approximation algorithms in Linial's classic LOCAL model of distributed computing to find maximum-weight matchings in a hypergraph of rank $r$. Our main result is a deterministic algorithm to generate a matching which is an $O(r)$-approximation to the maximum weight matching, running in $\tilde O(r \log \Delta + \log^2 \Delta + \log^* n)$ rounds. (Here, the $\tilde O()$ notation hides $\text{polyloglog } \Delta$ and $\text{polylog } r$ factors.) This is based on a number of new derandomization techniques extending methods of Ghaffari, Harris & Kuhn (2017). As a main application, we obtain nearly-optimal algorithms for the long-studied problem of maximum-weight graph matching. Specifically, we get a $(1+\epsilon)$ approximation algorithm using $\tilde O(\log \Delta / \epsilon^3 + \text{polylog}(1/\epsilon, \log \log n))$ randomized time and $\tilde O(\log^2 \Delta / \epsilon^4 + \log^*n / \epsilon)$ deterministic time. The second application is a faster algorithm for hypergraph maximal matching, a versatile subroutine introduced in Ghaffari et al. (2017) for a variety of local graph algorithms. This gives an algorithm for