The DUNE-ALUGrid Module
In this paper we present the new DUNE-ALUGrid module. This module contains a
major overhaul of the sources from the ALUGrid library and the bindings to the
DUNE software framework. The main changes include user-defined load balancing,
parallel grid construction, and a redesign of the 2d grid, which can now also
be used for parallel computations. In addition, many improvements have been
introduced into the code to increase the parallel efficiency and to decrease
the memory footprint.
The original ALUGrid library is widely used within the DUNE community due to
its good parallel performance for problems requiring local adaptivity and
dynamic load balancing. Therefore, this new module will benefit a number of DUNE
users. In addition we have added features to increase the range of problems for
which the grid manager can be used, for example, introducing a 3d tetrahedral
grid using a parallel newest vertex bisection algorithm for conforming grid
refinement. In this paper we will discuss the new features, extensions to the
DUNE interface, and explain for various examples how the code is used in
parallel environments.
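The conforming refinement mentioned in the abstract can be illustrated with a toy sketch of newest vertex bisection in 2d. The names and data layout below are illustrative assumptions, not the DUNE-ALUGrid API: each triangle stores its newest vertex last, and bisection splits the opposite (refinement) edge at its midpoint, which becomes the newest vertex of both children.

```python
# Minimal sketch of newest-vertex bisection for a single 2D triangle.
# A triangle is stored as (v0, v1, v2) with v2 the "newest" vertex; the
# refinement edge is the one opposite to it, (v0, v1). These names are
# illustrative, not the DUNE-ALUGrid API.

def bisect(tri, coords, new_vertex_id):
    """Split a triangle by inserting the midpoint of its refinement edge."""
    v0, v1, v2 = tri
    mid = new_vertex_id
    coords[mid] = tuple((a + b) / 2 for a, b in zip(coords[v0], coords[v1]))
    # Each child gets the midpoint as its newest vertex, which is what
    # keeps repeated bisection from degenerating the triangles.
    return [(v2, v0, mid), (v1, v2, mid)]

coords = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 1.0)}
children = bisect((0, 1, 2), coords, new_vertex_id=3)
```

In a full implementation the refinement edge must be bisected compatibly in both neighbouring triangles to keep the grid conforming; the sketch only shows the single-element step.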
High-Quality Shared-Memory Graph Partitioning
Partitioning graphs into blocks of roughly equal size such that few edges run
between blocks is a frequently needed operation in processing graphs. Recently,
the size, variety, and structural complexity of these networks have grown
dramatically. Unfortunately, previous approaches to parallel graph partitioning
have problems in this context since they often show a negative trade-off
between speed and quality. We present an approach to multi-level shared-memory
parallel graph partitioning that guarantees balanced solutions, shows high
speed-ups for a variety of large graphs and yields very good quality
independently of the number of cores used. For example, on 31 cores, our
algorithm partitions our largest test instance into 16 blocks, cutting less than
half as many edges as our main competitor when both algorithms are
given the same amount of time. Important ingredients include parallel label
propagation for both coarsening and improvement, parallel initial partitioning,
a simple yet effective approach to parallel localized local search, and fast
locality-preserving hash tables.
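The two quantities this abstract optimizes, the edge cut between blocks and the balance of block sizes, can be stated concretely. The adjacency-dict layout below is an assumption for illustration, not the data structures of the partitioner itself:

```python
# Illustrative helpers for the two partition-quality measures named in
# the abstract: edge cut (edges running between blocks) and balance
# (largest block relative to the ideal size n/k). The graph layout is
# an assumption, not the paper's internal representation.
from collections import Counter

def edge_cut(adj, block):
    """Count edges whose endpoints lie in different blocks."""
    cut = 0
    for u, neighbours in adj.items():
        for v in neighbours:
            if u < v and block[u] != block[v]:  # count each edge once
                cut += 1
    return cut

def balance(block, k):
    """Largest block size divided by the ideal size n/k (1.0 is perfect)."""
    sizes = Counter(block.values())
    return max(sizes.values()) / (len(block) / k)

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
block = {0: 0, 1: 0, 2: 1, 3: 1}
```

A multilevel partitioner tries to minimize `edge_cut` subject to `balance` staying below a tolerance such as 1.03.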
PT-Scotch: A tool for efficient parallel graph ordering
The parallel ordering of large graphs is a difficult problem, because on the
one hand minimum degree algorithms do not parallelize well, and on the other
hand the obtainment of high quality orderings with the nested dissection
algorithm requires efficient graph bipartitioning heuristics, the best
sequential implementations of which are also hard to parallelize. This paper
presents a set of algorithms, implemented in the PT-Scotch software package,
which allow one to order large graphs in parallel, yielding orderings whose
quality is only slightly worse than that of state-of-the-art
sequential algorithms. Our implementation uses the classical nested dissection
approach but relies on several novel features to solve the parallel graph
bipartitioning problem. Thanks to these improvements, PT-Scotch produces
consistently better orderings than ParMeTiS on large numbers of processors.
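The nested dissection scheme the abstract builds on can be sketched in miniature: recursively bipartition the graph, order the two halves first, and number the separator vertices last. The crude index-halving bipartition and boundary-vertex separator below are toy assumptions; PT-Scotch uses far more sophisticated parallel bipartitioning heuristics.

```python
# Toy sketch of nested dissection ordering. The bipartition here is a
# naive split of the sorted vertex list, and the separator is simply
# the left-side vertices with a neighbour on the right; real tools
# compute much better separators.

def nested_dissection(vertices, adj, order):
    if len(vertices) <= 2:            # small subgraph: order it directly
        order.extend(sorted(vertices))
        return
    left = set(sorted(vertices)[: len(vertices) // 2])
    right = set(vertices) - left
    # Separator: vertices in `left` that touch `right`.
    sep = {u for u in left if any(v in right for v in adj[u])}
    nested_dissection(left - sep, adj, order)
    nested_dissection(right, adj, order)
    order.extend(sorted(sep))         # separator vertices come last

# Path graph 0-1-2-3-4-5.
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 5] for i in range(6)}
order = []
nested_dissection(set(range(6)), adj, order)
```

Ordering separators last is what limits fill-in during sparse factorization, which is why separator quality drives ordering quality.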
Parallel Graph Partitioning for Complex Networks
Processing large complex networks like social networks or web graphs has
recently attracted considerable interest. In order to do this in parallel, we
need to partition them into pieces of about equal size. Unfortunately, previous
parallel graph partitioners originally developed for more regular mesh-like
networks do not work well for these networks. This paper addresses this problem
by parallelizing and adapting the label propagation technique originally
developed for graph clustering. By introducing size constraints, label
propagation becomes applicable for both the coarsening and the refinement phase
of multilevel graph partitioning. We obtain very high quality by applying a
highly parallel evolutionary algorithm to the coarsened graph. The resulting
system is both more scalable and achieves higher quality than state-of-the-art
systems like ParMetis or PT-Scotch. For large complex networks the performance
differences are very big. For example, our algorithm can partition a web graph
with 3.3 billion edges in less than sixteen seconds using 512 cores of a high
performance cluster while producing a high quality partition -- none of the
competing systems can handle this graph on our system.
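The key adaptation this abstract describes, adding size constraints to label propagation so it can serve as a partitioning coarsener and refiner, can be sketched as follows. The data layout, round limit, and tie-breaking are illustrative assumptions, not the actual system:

```python
# Minimal sketch of size-constrained label propagation: a vertex adopts
# the most frequent label among its neighbours, but only if the target
# block has not yet reached the size limit. Parameters and layout are
# assumptions for illustration.
from collections import Counter

def label_propagation(adj, max_size, rounds=10):
    label = {u: u for u in adj}        # each vertex starts in its own block
    size = Counter(label.values())
    for _ in range(rounds):
        moved = False
        for u in adj:
            counts = Counter(label[v] for v in adj[u])
            for target, _ in counts.most_common():
                if target == label[u]:
                    break              # already in the preferred block
                if size[target] < max_size:   # the size constraint
                    size[label[u]] -= 1
                    size[target] += 1
                    label[u] = target
                    moved = True
                    break
        if not moved:
            break
    return label

# Two triangles joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
labels = label_propagation(adj, max_size=3)
```

Without the `max_size` check this is plain clustering and can collapse everything into one block; the constraint is what makes the result usable as a balanced (coarse) partition.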
Parallel Mesh Partitioning in Alya
The Alya System is the BSC simulation code for multi-physics problems [1]. It is based on a Variational
Multiscale Finite Element Method for unstructured meshes.
Work distribution is achieved by partitioning the original mesh into subdomains (submeshes). Until now, this
pre-partitioning step has been done in serial by a single process, using the metis library [2]. This is a major
bottleneck when larger meshes with millions of elements have to be partitioned: either the data does not fit in
the memory of a single computing node, or, in the cases where it does fit, Alya takes too long in the
partitioning step.
In this document we explain the tasks done to design, implement and test a new parallel partitioning algorithm
for Alya. In this algorithm, a subset of the workers is in charge of partitioning the mesh in parallel, using the
parmetis library [3].
The partitioning workers load consecutive parts of the main mesh using a parallel space-partitioning bin
structure [4], which is capable of obtaining the adjacent boundary elements of their respective submeshes. With
this local mesh, each of the partitioning workers is able to create its local element adjacency graph and to
partition the mesh.
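The local step described above, building an element adjacency graph from a worker's submesh, can be sketched for a toy mesh. The connectivity layout and the "shared face means two common nodes" rule for 2D quads are assumptions for illustration, not Alya's internals:

```python
# Illustrative sketch of building an element adjacency graph: two mesh
# elements are adjacent when they share a face (for these 2D quads, an
# edge, i.e. at least two common nodes). This is the kind of graph a
# partitioning worker would hand to a graph partitioner; names and
# layout are assumptions, not Alya's data structures.

def element_adjacency(elements):
    """Map element id -> ids of elements sharing at least 2 nodes."""
    adj = {e: [] for e in elements}
    ids = list(elements)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if len(set(elements[a]) & set(elements[b])) >= 2:
                adj[a].append(b)
                adj[b].append(a)
    return adj

# Three quads in a row: element id -> its four corner nodes.
elements = {0: [0, 1, 5, 4], 1: [1, 2, 6, 5], 2: [2, 3, 7, 6]}
adj = element_adjacency(elements)
```

In the parallel setting each worker holds only a consecutive element range plus the adjacent boundary elements, so the pairwise check runs over a local submesh rather than the whole mesh.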
We have validated our new algorithm using a Navier-Stokes problem on a small cube mesh of 1000 elements.
Then we performed a scalability test on a 30M element mesh to check if the time to partition the mesh is reduced
proportionally with the number of partitioning workers.
We have also compared metis and parmetis with respect to the balancing of the element distribution among
the domains, to test how the use of many partitioning workers to partition the mesh affects the scalability of
Alya. These tests show that it is better to use fewer partitioning workers to partition the mesh.
Finally, we have two sections explaining the results and the future work that has to be done in order to finalise
and improve the parallel partitioning algorithm.