
    MPI+X: task-based parallelization and dynamic load balance of finite element assembly

    The main computing tasks of a finite element (FE) code for solving partial differential equations (PDEs) are the algebraic system assembly and the iterative solver. This work focuses on the first task, in the context of a hybrid MPI+X paradigm. Although we describe the algorithms in the FE context, a similar strategy can be applied straightforwardly to other discretization methods, such as the finite volume method. The matrix assembly consists of a loop over the elements of the MPI partition to compute element matrices and right-hand sides and to assemble them into the system local to each MPI partition. In an MPI+X hybrid parallelism context, X has traditionally consisted of loop parallelism using OpenMP. Several strategies have been proposed in the literature to implement this loop parallelism, such as coloring or substructuring techniques, to circumvent the race condition that appears when assembling the element system into the local system. The main drawback of the first technique is the decrease of the IPC due to bad spatial locality. The second technique avoids this issue but requires extensive changes in the implementation, which can be cumbersome when several element loops must be treated. We propose an alternative, based on the task parallelism of the element loop using some extensions to the OpenMP programming model. The taskification of the assembly solves both aforementioned problems. In addition, dynamic load balance is applied using the DLB library, which is especially efficient in the presence of hybrid meshes, where the relative costs of the different elements are impossible to estimate a priori. This paper presents the proposed methodology, its implementation and its validation through the solution of large computational mechanics problems on up to 16k cores.
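The coloring strategy mentioned in the abstract above can be illustrated with a minimal sketch. This is not the task-based method the paper proposes, and it runs serially; it only shows why elements of one color can be assembled without a race. The function names `color_elements` and `assemble_by_color`, the toy 1D mesh, and the constant element contribution are all assumptions made for illustration.

```python
from collections import defaultdict


def color_elements(elements):
    """Greedy coloring: two elements that share a node never get the same color."""
    node_to_elems = defaultdict(list)
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elems[n].append(e)

    colors = [-1] * len(elements)
    for e, nodes in enumerate(elements):
        # Colors already taken by elements that touch any of this element's nodes.
        used = {colors[o] for n in nodes for o in node_to_elems[n] if colors[o] >= 0}
        c = 0
        while c in used:
            c += 1
        colors[e] = c
    return colors


def assemble_by_color(elements, colors, n_nodes, element_value):
    """Accumulate element contributions into a nodal vector, one color at a time."""
    rhs = [0.0] * n_nodes
    for c in range(max(colors, default=-1) + 1):
        # Elements of one color touch disjoint nodes, so this inner loop is the
        # part that could run in parallel without atomics (serial here).
        for e, nodes in enumerate(elements):
            if colors[e] == c:
                for n in nodes:
                    rhs[n] += element_value(e)
    return rhs


# Hypothetical 1D mesh: four two-node elements on five nodes.
elements = [(0, 1), (1, 2), (2, 3), (3, 4)]
colors = color_elements(elements)
rhs = assemble_by_color(elements, colors, 5, lambda e: 1.0)
```

Note that consecutive elements end up in different colors, so each color visits memory with a stride; this is the kind of access pattern behind the spatial-locality drawback the abstract attributes to coloring.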

    Optimality program in segment and string graphs

    Planar graphs are known to allow subexponential algorithms running in time $2^{O(\sqrt n)}$ or $2^{O(\sqrt n \log n)}$ for most of the paradigmatic problems, while the brute-force time $2^{\Theta(n)}$ is very likely to be asymptotically best on general graphs. Intrigued by an algorithm packing curves in time $2^{O(n^{2/3}\log n)}$ by Fox and Pach [SODA'11], we investigate which problems have subexponential algorithms on the intersection graphs of curves (string graphs) or segments (segment intersection graphs) and which problems have no such algorithms under the ETH (Exponential Time Hypothesis). Among our results, we show that, quite surprisingly, 3-Coloring can also be solved in time $2^{O(n^{2/3}\log^{O(1)}n)}$ on string graphs while an algorithm running in time $2^{o(n)}$ for 4-Coloring even on axis-parallel segments (of unbounded length) would disprove the ETH. For 4-Coloring of unit segments, we show a weaker ETH lower bound of $2^{o(n^{2/3})}$ which exploits the celebrated Erdős-Szekeres theorem. The subexponential running time also carries over to Min Feedback Vertex Set but not to Min Dominating Set and Min Independent Dominating Set. Comment: 19 pages, 15 figures

    Gunrock: GPU Graph Analytics

    For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs, have presented two significant challenges to developing a programmable high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high-performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We characterize the performance of various optimization strategies and evaluate Gunrock's overall performance on different GPU architectures on a wide range of graph primitives that span from traversal-based algorithms and ranking algorithms to triangle counting and bipartite-graph-based algorithms. The results show that on a single GPU, Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives and CPU shared-memory graph libraries such as Ligra and Galois, and better performance than any other GPU high-level graph library. Comment: 52 pages, invited paper to ACM Transactions on Parallel Computing (TOPC), an extended version of PPoPP'16 paper "Gunrock: A High-Performance Graph Processing Library on the GPU"
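The frontier-centric, bulk-synchronous style described in the abstract above can be sketched on the CPU with a breadth-first search. This is a plain Python illustration of the idea, not Gunrock's API; the function name `bfs_frontier` and the step labels "expand" and "filter" are naming assumptions made here.

```python
def bfs_frontier(adj, source):
    """Breadth-first search written as repeated operations on a vertex frontier."""
    depth = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        # Expand step: gather the neighbors of every vertex in the current frontier.
        candidates = [v for u in frontier for v in adj[u]]
        # Filter step: keep each unvisited vertex once and record its depth.
        next_frontier = []
        for v in candidates:
            if v not in depth:
                depth[v] = level
                next_frontier.append(v)
        # The loop boundary acts as the bulk-synchronous barrier between steps.
        frontier = next_frontier
    return depth


# Hypothetical directed graph given as adjacency lists.
adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
depth = bfs_frontier(adj, 0)
```

Each iteration consumes one frontier and produces the next, so on a GPU the expand and filter steps are the parts that would be executed as data-parallel kernels.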

    Distributed Symmetry Breaking in Hypergraphs

    Fundamental local symmetry breaking problems such as Maximal Independent Set (MIS) and coloring have been recognized as important by the community, and studied extensively in (standard) graphs. In particular, fast (i.e., logarithmic run time) randomized algorithms are well-established for MIS and $(\Delta+1)$-coloring in both the LOCAL and CONGEST distributed computing models. On the other hand, comparatively much less is known on the complexity of distributed symmetry breaking in hypergraphs. In particular, a key question is whether a fast (randomized) algorithm for MIS exists for hypergraphs. In this paper, we study the distributed complexity of symmetry breaking in hypergraphs by presenting distributed randomized algorithms for a variety of fundamental problems under a natural distributed computing model for hypergraphs. We first show that MIS in hypergraphs (of arbitrary dimension) can be solved in $O(\log^2 n)$ rounds ($n$ is the number of nodes of the hypergraph) in the LOCAL model. We then present a key result of this paper: an $O(\Delta^{\epsilon}\,\text{polylog}(n))$-round hypergraph MIS algorithm in the CONGEST model, where $\Delta$ is the maximum node degree of the hypergraph and $\epsilon > 0$ is any arbitrarily small constant. To demonstrate the usefulness of hypergraph MIS, we present applications of our hypergraph algorithm to solving problems in (standard) graphs. In particular, the hypergraph MIS yields fast distributed algorithms for the balanced minimal dominating set problem (left open in Harris et al. [ICALP 2013]) and the minimal connected dominating set problem. We also present distributed algorithms for coloring, maximal matching, and maximal clique in hypergraphs. Comment: Changes from the previous version: More references added
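For context on the well-established randomized MIS algorithms for standard graphs that the abstract above refers to, here is a minimal sequential simulation of a Luby-style round structure. It is not the hypergraph algorithm of the paper; the function name `luby_mis`, the fixed seed, and the assumption of a simple undirected graph (no self-loops, symmetric adjacency lists) are all choices made for this sketch.

```python
import random


def luby_mis(adj, seed=0):
    """Simulate Luby-style rounds: local minima of random values join the MIS."""
    rng = random.Random(seed)
    active = set(adj)
    mis = set()
    while active:
        # Every active vertex draws a fresh random value each round.
        r = {v: rng.random() for v in active}
        # A vertex wins if its value is strictly below all active neighbors' values.
        winners = {
            v for v in active
            if all(r[v] < r[u] for u in adj[v] if u in active)
        }
        mis |= winners
        # Winners and their active neighbors leave the graph.
        removed = set(winners)
        for v in winners:
            removed.update(u for u in adj[v] if u in active)
        active -= removed
    return mis


# Hypothetical input: a cycle on six vertices.
adj = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
mis = luby_mis(adj)
```

In a distributed setting each round needs only an exchange of values between neighbors, which is what makes this pattern a natural fit for message-passing models such as LOCAL.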

    Algorithms for Coloring Quadtrees

    We describe simple linear time algorithms for coloring the squares of balanced and unbalanced quadtrees so that no two adjacent squares are given the same color. If squares sharing sides are defined as adjacent, we color balanced quadtrees with three colors, and unbalanced quadtrees with four colors; these results are both tight, as some quadtrees require this many colors. If squares sharing corners are defined as adjacent, we color balanced or unbalanced quadtrees with six colors; for some quadtrees, at least five colors are required. Comment: 7 pages, 9 figures

    Distributed local approximation algorithms for maximum matching in graphs and hypergraphs

    We describe approximation algorithms in Linial's classic LOCAL model of distributed computing to find maximum-weight matchings in a hypergraph of rank $r$. Our main result is a deterministic algorithm to generate a matching which is an $O(r)$-approximation to the maximum weight matching, running in $\tilde O(r \log \Delta + \log^2 \Delta + \log^* n)$ rounds. (Here, the $\tilde O()$ notation hides $\text{polyloglog } \Delta$ and $\text{polylog } r$ factors.) This is based on a number of new derandomization techniques extending methods of Ghaffari, Harris & Kuhn (2017). As a main application, we obtain nearly-optimal algorithms for the long-studied problem of maximum-weight graph matching. Specifically, we get a $(1+\epsilon)$ approximation algorithm using $\tilde O(\log \Delta / \epsilon^3 + \text{polylog}(1/\epsilon, \log \log n))$ randomized time and $\tilde O(\log^2 \Delta / \epsilon^4 + \log^* n / \epsilon)$ deterministic time. The second application is a faster algorithm for hypergraph maximal matching, a versatile subroutine introduced in Ghaffari et al. (2017) for a variety of local graph algorithms. This gives an algorithm for $(2 \Delta - 1)$-edge-list coloring in $\tilde O(\log^2 \Delta \log n)$ rounds deterministically or $\tilde O((\log \log n)^3)$ rounds randomly. Another consequence (with additional optimizations) is an algorithm which generates an edge-orientation with out-degree at most $\lceil (1+\epsilon) \lambda \rceil$ for a graph of arboricity $\lambda$; for fixed $\epsilon$ this runs in $\tilde O(\log^6 n)$ rounds deterministically or $\tilde O(\log^3 n)$ rounds randomly.