
    The DUNE-ALUGrid Module

    In this paper we present the new DUNE-ALUGrid module. This module contains a major overhaul of the sources from the ALUGrid library and of the bindings to the DUNE software framework. The main changes include user-defined load balancing, parallel grid construction, and a redesign of the 2d grid, which can now also be used for parallel computations. In addition, many improvements have been introduced into the code to increase the parallel efficiency and to decrease the memory footprint. The original ALUGrid library is widely used within the DUNE community due to its good parallel performance for problems requiring local adaptivity and dynamic load balancing. Therefore, this new module will benefit a number of DUNE users. In addition, we have added features to increase the range of problems for which the grid manager can be used, for example by introducing a 3d tetrahedral grid using a parallel newest vertex bisection algorithm for conforming grid refinement. In this paper we discuss the new features, describe the extensions to the DUNE interface, and explain with various examples how the code is used in parallel environments.
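
    As an illustration of how such a grid might be used from DUNE, the following minimal sketch declares a 3d conforming simplex ALUGrid, refines it, and rebalances it across the MPI ranks. It assumes a standard dune-alugrid installation; the header paths, the StructuredGridFactory call and the mesh sizes are conventional DUNE usage chosen for illustration, not code taken from the paper.

        // Minimal sketch (assumptions noted above): a 3d tetrahedral ALUGrid with
        // conforming refinement by newest vertex bisection, refined and load balanced.
        #include <array>
        #include <iostream>
        #include <dune/common/parallel/mpihelper.hh>
        #include <dune/common/fvector.hh>
        #include <dune/grid/utility/structuredgridfactory.hh>
        #include <dune/alugrid/grid.hh>

        int main(int argc, char** argv)
        {
          auto& mpi = Dune::MPIHelper::instance(argc, argv);   // initialise MPI

          // simplex element type, conforming refinement via newest vertex bisection
          using Grid = Dune::ALUGrid<3, 3, Dune::simplex, Dune::conforming>;

          // parallel grid construction from a small structured simplex mesh of the unit cube
          Dune::FieldVector<double, 3> lower(0.0), upper(1.0);
          std::array<unsigned int, 3> cells{{4, 4, 4}};
          auto grid = Dune::StructuredGridFactory<Grid>::createSimplexGrid(lower, upper, cells);

          grid->globalRefine(2);   // conforming bisection refinement
          grid->loadBalance();     // redistribute the elements over the MPI ranks

          if (mpi.rank() == 0)
            std::cout << "leaf elements on rank 0: " << grid->leafGridView().size(0) << std::endl;
          return 0;
        }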

    High-Quality Shared-Memory Graph Partitioning

    Partitioning graphs into blocks of roughly equal size such that few edges run between blocks is a frequently needed operation when processing graphs. Recently, the size, variety, and structural complexity of these networks have grown dramatically. Unfortunately, previous approaches to parallel graph partitioning have problems in this context, since they often show a negative trade-off between speed and quality. We present an approach to multi-level shared-memory parallel graph partitioning that guarantees balanced solutions, shows high speed-ups for a variety of large graphs, and yields very good quality independently of the number of cores used. For example, on 31 cores, our algorithm partitions our largest test instance into 16 blocks cutting less than half as many edges as our main competitor when both algorithms are given the same amount of time. Important ingredients include parallel label propagation for both coarsening and improvement, parallel initial partitioning, a simple yet effective approach to parallel localized local search, and fast locality-preserving hash tables.
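
    To make the label propagation ingredient concrete, the sketch below shows one shared-memory pass of label propagation used as a refinement step: each node moves to the neighbouring block to which it has the most edges, provided the move does not overload that block. The graph layout, the OpenMP scheduling and the relaxed, racy handling of block weights are illustrative assumptions, not the authors' implementation.

        // Hedged sketch: one parallel label propagation pass over a k-way partition.
        #include <omp.h>
        #include <atomic>
        #include <cstdint>
        #include <vector>

        // Compressed sparse row adjacency of an undirected, unweighted graph.
        struct CSRGraph {
          std::vector<std::int64_t> xadj;    // offsets, size n+1
          std::vector<int>          adjncy;  // concatenated neighbour lists
          std::vector<int>          nodeWeight;
        };

        void labelPropagationRefine(const CSRGraph& g, std::vector<int>& block,
                                    int k, std::int64_t maxBlockWeight)
        {
          const int n = static_cast<int>(g.xadj.size()) - 1;

          // current weight of every block (updated with tolerated races in this sketch)
          std::vector<std::atomic<std::int64_t>> blockWeight(k);
          for (int b = 0; b < k; ++b) blockWeight[b] = 0;
          for (int v = 0; v < n; ++v) blockWeight[block[v]] += g.nodeWeight[v];

          #pragma omp parallel
          {
            std::vector<std::int64_t> edgesTo(k, 0);   // thread-local scratch
            std::vector<int> touched;

            #pragma omp for schedule(dynamic, 1024)
            for (int v = 0; v < n; ++v) {
              touched.clear();
              for (std::int64_t e = g.xadj[v]; e < g.xadj[v + 1]; ++e) {
                const int b = block[g.adjncy[e]];
                if (edgesTo[b]++ == 0) touched.push_back(b);
              }
              // pick the most strongly connected neighbouring block that still has room
              int best = block[v];
              for (int b : touched)
                if (edgesTo[b] > edgesTo[best] &&
                    blockWeight[b] + g.nodeWeight[v] <= maxBlockWeight)
                  best = b;
              if (best != block[v]) {
                blockWeight[block[v]] -= g.nodeWeight[v];
                blockWeight[best]     += g.nodeWeight[v];
                block[v] = best;
              }
              for (int b : touched) edgesTo[b] = 0;    // reset scratch for the next node
            }
          }
        }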

    PT-Scotch: A tool for efficient parallel graph ordering

    The parallel ordering of large graphs is a difficult problem, because, on the one hand, minimum degree algorithms do not parallelize well, and, on the other hand, obtaining high-quality orderings with the nested dissection algorithm requires efficient graph bipartitioning heuristics, the best sequential implementations of which are also hard to parallelize. This paper presents a set of algorithms, implemented in the PT-Scotch software package, which allow one to order large graphs in parallel, yielding orderings whose quality is only slightly worse than that of state-of-the-art sequential algorithms. Our implementation uses the classical nested dissection approach but relies on several novel features to solve the parallel graph bipartitioning problem. Thanks to these improvements, PT-Scotch produces consistently better orderings than ParMeTiS on large numbers of processors.
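
    The following sketch illustrates the nested dissection principle that PT-Scotch parallelises: find a vertex separator, order the two halves recursively, and number the separator vertices last. The separator heuristic used here (cutting at the median BFS level) is a deliberately crude stand-in for the bipartitioning heuristics discussed in the paper; this is not PT-Scotch code and the helper names are invented for illustration.

        // Conceptual sketch of nested dissection ordering with a naive separator.
        #include <queue>
        #include <vector>

        using Adj = std::vector<std::vector<int>>;   // adjacency lists

        // Order the vertices listed in `part` (a subset of the graph), appending to `order`.
        void nestedDissection(const Adj& g, std::vector<int> part, std::vector<int>& order)
        {
          if (part.size() <= 8) {                    // small block: order directly
            order.insert(order.end(), part.begin(), part.end());
            return;
          }
          // BFS from the first vertex of the part, restricted to the part.
          std::vector<int> level(g.size(), -1);
          std::vector<char> inPart(g.size(), 0);
          for (int v : part) inPart[v] = 1;
          std::queue<int> q;
          q.push(part[0]); level[part[0]] = 0;
          std::vector<int> bfsOrder;
          while (!q.empty()) {
            int v = q.front(); q.pop();
            bfsOrder.push_back(v);
            for (int w : g[v])
              if (inPart[w] && level[w] < 0) { level[w] = level[v] + 1; q.push(w); }
          }
          // Use the BFS level of the median reached vertex as a (crude) separator.
          const int sepLevel = level[bfsOrder[bfsOrder.size() / 2]];
          std::vector<int> left, right, sep;
          for (int v : part) {
            if (level[v] < sepLevel) left.push_back(v);        // unreached vertices (-1) go left
            else if (level[v] == sepLevel) sep.push_back(v);
            else right.push_back(v);
          }
          nestedDissection(g, left, order);
          nestedDissection(g, right, order);
          order.insert(order.end(), sep.begin(), sep.end());   // separator numbered last
        }

    Calling nestedDissection with part = {0, ..., n-1} appends a fill-reducing elimination ordering of the whole graph to order.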

    Parallel Graph Partitioning for Complex Networks

    Processing large complex networks like social networks or web graphs has recently attracted considerable interest. In order to do this in parallel, we need to partition them into pieces of about equal size. Unfortunately, previous parallel graph partitioners, originally developed for more regular mesh-like networks, do not work well for these networks. This paper addresses this problem by parallelizing and adapting the label propagation technique originally developed for graph clustering. By introducing size constraints, label propagation becomes applicable to both the coarsening and the refinement phase of multilevel graph partitioning. We obtain very high quality by applying a highly parallel evolutionary algorithm to the coarsened graph. The resulting system is both more scalable and achieves higher quality than state-of-the-art systems like ParMetis or PT-Scotch. For large complex networks the performance differences are substantial: for example, our algorithm can partition a web graph with 3.3 billion edges in less than sixteen seconds using 512 cores of a high-performance cluster while producing a high-quality partition -- none of the competing systems can handle this graph on our system.
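
    The size constraint is what turns plain label propagation into a usable coarsening step: a vertex only joins the cluster most frequent among its neighbours if the combined cluster weight stays below a maximum size. The sketch below shows a single sequential pass of this idea, complementing the parallel refinement pass sketched earlier; the data layout and the single-pass structure are simplifications for illustration, not the paper's iterated, parallel algorithm.

        // Hedged sketch: one pass of size-constrained label propagation clustering.
        #include <unordered_map>
        #include <vector>

        struct SimpleGraph {
          std::vector<std::vector<int>> adj;      // adjacency lists
          std::vector<long long>        weight;   // vertex weights
        };

        // Returns a cluster id per vertex; clusters never exceed maxClusterWeight.
        std::vector<int> sizeConstrainedLabelPropagation(const SimpleGraph& g,
                                                         long long maxClusterWeight)
        {
          const int n = static_cast<int>(g.adj.size());
          std::vector<int> cluster(n);
          std::vector<long long> clusterWeight(n);
          for (int v = 0; v < n; ++v) { cluster[v] = v; clusterWeight[v] = g.weight[v]; }

          for (int v = 0; v < n; ++v) {
            std::unordered_map<int, int> freq;           // cluster id -> #neighbours in it
            for (int w : g.adj[v]) ++freq[cluster[w]];
            int best = cluster[v];
            int bestCount = 0;
            for (auto [c, count] : freq)
              if (count > bestCount &&
                  clusterWeight[c] + g.weight[v] <= maxClusterWeight)
                { best = c; bestCount = count; }
            if (best != cluster[v]) {                    // move v into the new cluster
              clusterWeight[cluster[v]] -= g.weight[v];
              clusterWeight[best]       += g.weight[v];
              cluster[v] = best;
            }
          }
          return cluster;                                // used to contract the graph
        }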

    Large-Scale CFD Parallel Computing Dealing with Massive Mesh


    Parallel Mesh Partitioning in Alya

    The Alya System is the BSC simulation code for multi-physics problems [1]. It is based on a Variational Multiscale Finite Element Method for unstructured meshes. Work distribution is achieved by partitioning the original mesh into subdomains (submeshes). Until now, this pre-partitioning step has been done serially by a single process using the metis library [2]. This is a huge bottleneck when larger meshes with millions of elements have to be partitioned: either the data does not fit in the memory of a single computing node, or, when it does fit, Alya takes too long in the partitioning step. In this document we explain the work done to design, implement and test a new parallel partitioning algorithm for Alya. In this algorithm, a subset of the workers is in charge of partitioning the mesh in parallel using the parmetis library [3]. The partitioning workers load consecutive parts of the main mesh and use a parallel space-partitioning bin structure [4] to obtain the boundary elements adjacent to their respective submeshes. With this local mesh, each partitioning worker builds its local element adjacency graph and partitions the mesh. We have validated the new algorithm using a Navier-Stokes problem on a small cube mesh of 1000 elements. We then performed a scalability test on a 30M-element mesh to check whether the time to partition the mesh decreases proportionally with the number of partitioning workers. We have also compared metis and parmetis with respect to the balance of the element distribution among the subdomains, and tested how using many partitioning workers affects the scalability of Alya. These tests indicate that it is better to use fewer partitioning workers to partition the mesh. Finally, two sections discuss the results and the future work needed to finalise and improve the parallel partitioning algorithm.
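
    For orientation, the sketch below shows the kind of ParMETIS call a partitioning worker could issue once it holds its local part of the element adjacency (dual) graph in distributed CSR form. The array names follow the ParMETIS manual; the helper function, the absence of weights and the 5% imbalance tolerance are assumptions for illustration and do not reflect Alya's actual implementation.

        // Hedged sketch: k-way partitioning of a distributed dual graph with ParMETIS.
        #include <mpi.h>
        #include <parmetis.h>
        #include <vector>

        // vtxdist: global element ranges owned by each worker (size nworkers+1)
        // xadj/adjncy: local CSR adjacency of the dual graph (element-to-element)
        std::vector<idx_t> partitionDualGraph(std::vector<idx_t>& vtxdist,
                                              std::vector<idx_t>& xadj,
                                              std::vector<idx_t>& adjncy,
                                              idx_t nparts, MPI_Comm comm)
        {
          int rank;  MPI_Comm_rank(comm, &rank);
          const idx_t nLocal = vtxdist[rank + 1] - vtxdist[rank];   // local elements
          idx_t wgtflag = 0, numflag = 0, ncon = 1, edgecut = 0;
          idx_t options[3] = {0, 0, 0};                             // use default options

          std::vector<real_t> tpwgts(ncon * nparts, real_t(1.0) / nparts);  // equal target sizes
          std::vector<real_t> ubvec(ncon, real_t(1.05));            // 5% imbalance tolerance
          std::vector<idx_t>  part(nLocal);                         // output: subdomain per local element

          ParMETIS_V3_PartKway(vtxdist.data(), xadj.data(), adjncy.data(),
                               nullptr, nullptr,                    // no vertex/edge weights
                               &wgtflag, &numflag, &ncon, &nparts,
                               tpwgts.data(), ubvec.data(), options,
                               &edgecut, part.data(), &comm);
          return part;                                              // local element -> subdomain id
        }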