The DUNE-ALUGrid Module
In this paper we present the new DUNE-ALUGrid module. This module contains a
major overhaul of the sources from the ALUGrid library and of the bindings to
the DUNE software framework. The main changes include user-defined load
balancing, parallel grid construction, and a redesign of the 2d grid, which can
now also be used for parallel computations. In addition, many improvements have
been introduced into the code to increase the parallel efficiency and to
decrease the memory footprint.
The original ALUGrid library is widely used within the DUNE community due to
its good parallel performance for problems requiring local adaptivity and
dynamic load balancing; this new module will therefore benefit a number of DUNE
users. In addition, we have added features to increase the range of problems
for which the grid manager can be used, for example, introducing a 3d
tetrahedral grid using a parallel newest vertex bisection algorithm for
conforming grid refinement. In this paper we discuss the new features and
extensions to the DUNE interface, and explain with various examples how the
code is used in parallel environments.
Comment: 25 pages, 11 figures
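The newest-vertex-bisection rule mentioned in the abstract can be sketched for a single 2D triangle (a deliberately simplified illustration; DUNE-ALUGrid's actual implementation handles 3D tetrahedra and propagates conformity closure across processor boundaries):

```python
def midpoint(a, b):
    """Midpoint of two 2D points given as coordinate tuples."""
    return tuple((x + y) / 2.0 for x, y in zip(a, b))

def bisect(tri):
    """Bisect triangle (v0, v1, v2) across its refinement edge (v0, v1),
    the edge opposite the newest vertex v2.  The midpoint is listed last
    in each child, so it becomes that child's newest vertex; repeating
    this rule keeps recursive refinement shape-regular."""
    v0, v1, v2 = tri
    m = midpoint(v0, v1)
    return [(v2, v0, m), (v1, v2, m)]
```

Bisecting both triangles sharing a refinement edge at matching levels is what keeps the grid conforming; the parallel version must additionally communicate these closure refinements between processors.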
A scalable parallel finite element framework for growing geometries. Application to metal additive manufacturing
This work introduces an innovative parallel, fully-distributed finite element
framework for growing geometries and its application to metal additive
manufacturing. It is well-known that virtual part design and qualification in
additive manufacturing requires highly-accurate multiscale and multiphysics
analyses. Only high performance computing tools are able to handle such
complexity in time frames compatible with time-to-market. However, efficiency,
without loss of accuracy, has rarely held the centre stage in the numerical
community. Here, in contrast, the framework is designed to adequately exploit
the resources of high-end distributed-memory machines. It is grounded on three
building blocks: (1) Hierarchical adaptive mesh refinement with octree-based
meshes; (2) a parallel strategy to model the growth of the geometry; (3)
state-of-the-art parallel iterative linear solvers. Computational experiments
consider the heat transfer analysis at the part scale of the printing process
by powder-bed technologies. After verification against a 3D benchmark, a
strong-scaling analysis assesses performance and identifies major sources of
parallel overhead. A third numerical example examines the efficiency and
robustness of (2) in a curved 3D shape. Unprecedented parallelism and
scalability were achieved in this work. Hence, this framework contributes to
taking on higher complexity and/or accuracy, not only in part-scale
simulations of metal or polymer additive manufacturing, but also in welding,
sedimentation, atherosclerosis, or any other physical problem where the
physical domain of interest grows in time.
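The abstract does not detail the growth strategy (2); as a loose illustration only, a common way to model a growing printed part is "element birth", where elements activate once the process reaches their layer. The function name and the layer-based activation rule below are assumptions, not the paper's actual method:

```python
def active_elements(centroids, layer_height, t, layers_per_second=1.0):
    """Hypothetical element-birth rule: an element, given by its centroid
    (x, y, z), becomes active once printing has reached its height.
    Inactive elements are simply excluded from the thermal analysis."""
    current_height = layers_per_second * t * layer_height
    return [i for i, (_, _, z) in enumerate(centroids) if z <= current_height]
```

In a distributed setting the interesting difficulty, which the paper addresses, is doing this activation scalably when the mesh is adaptively refined and partitioned across many processes.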
A Scalable and Modular Software Architecture for Finite Elements on Hierarchical Hybrid Grids
In this article, a new generic higher-order finite-element framework for
massively parallel simulations is presented. The modular software architecture
is carefully designed to exploit the resources of modern and future
supercomputers. Combining an unstructured topology with structured grid
refinement facilitates high geometric adaptability and matrix-free multigrid
implementations with excellent performance. Different abstraction levels and
fully distributed data structures additionally ensure high flexibility,
extensibility, and scalability. The software concepts support sophisticated
load balancing and flexibly combining finite element spaces. Example scenarios
with coupled systems of PDEs show the applicability of the concepts to
performing geophysical simulations.Comment: Preprint of an article submitted to International Journal of
Parallel, Emergent and Distributed Systems (Taylor & Francis
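The matrix-free idea behind such frameworks can be illustrated in one dimension: instead of assembling and storing a sparse matrix, the operator is applied stencil-by-stencil on the fly (a toy sketch, not the framework's implementation, which applies structured stencils on refined macro-elements in C++):

```python
def apply_laplacian_1d(u, h):
    """Matrix-free application of the 1D finite-difference Laplacian -u'':
    the three-point stencil is evaluated on the fly, so no matrix is ever
    stored.  Boundary entries are left untouched (Dirichlet rows)."""
    n = len(u)
    out = [0.0] * n
    for i in range(1, n - 1):
        out[i] = (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return out
```

On structured grid refinements the stencil is identical for every interior node, which is what makes this approach both memory-lean and fast on modern hardware.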
Improving the scalability of parallel N-body applications with an event driven constraint based execution model
The scalability and efficiency of graph applications are significantly
constrained by conventional systems and their supporting programming models.
Technology trends like multicore, manycore, and heterogeneous system
architectures are introducing further challenges and possibilities for emerging
application domains such as graph applications. This paper explores the space
of effective parallel execution of ephemeral graphs that are dynamically
generated using the Barnes-Hut algorithm to exemplify dynamic workloads. The
workloads are expressed using the semantics of an Exascale computing execution
model called ParalleX. For comparison, results using conventional execution
model semantics are also presented. We find that the advanced semantics for
Exascale computing improve efficiency through better load balancing at runtime
and automatic parallelism discovery.
Comment: 11 figures
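The Barnes-Hut algorithm referenced above approximates distant groups of bodies by their centre of mass. A minimal serial traversal sketch follows (the node layout and THETA value are illustrative assumptions; the ParalleX version distributes this tree and overlaps traversal with communication):

```python
import math

THETA = 0.5  # opening angle; smaller values are more accurate but slower

def accel(node, pos, eps=1e-3):
    """Recursive Barnes-Hut traversal in 2D (G = 1 units): a cell is used
    as a single pseudo-particle when size/distance < THETA, otherwise its
    children are visited.  eps softens the singularity at zero distance."""
    dx = node["com"][0] - pos[0]
    dy = node["com"][1] - pos[1]
    d = math.sqrt(dx * dx + dy * dy) + eps
    if not node["children"] or node["size"] / d < THETA:
        f = node["mass"] / (d * d * d)
        return (f * dx, f * dy)
    ax = ay = 0.0
    for child in node["children"]:
        cax, cay = accel(child, pos, eps)
        ax += cax
        ay += cay
    return (ax, ay)
```

Because the tree is rebuilt every time step, these workloads are "ephemeral graphs" in the paper's terminology: the irregular, short-lived structure is what stresses conventional execution models.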
A Massively Parallel Implementation of Multilevel Monte Carlo for Finite Element Models
The Multilevel Monte Carlo (MLMC) method has proven to be an effective
variance-reduction statistical method for Uncertainty Quantification (UQ) in
Partial Differential Equation (PDE) models, combining model computations at
different levels to create an accurate estimate. Still, the computational
complexity of the resulting method is extremely high, particularly for 3D
models, which requires advanced algorithms for the efficient exploitation of
High Performance Computing (HPC). In this article we present a new
implementation of the MLMC method on massively parallel computer
architectures, exploiting parallelism within and between each level of the
hierarchy. The
numerical approximation of the PDE is performed using the finite element method
but the algorithm is quite general and could be applied to other discretization
methods as well, although the focus is on parallel sampling. The two key
ingredients of an efficient parallel implementation are a good processor
partition scheme together with a good scheduling algorithm to assign work to
different processors. We introduce a multiple partition of the set of
processors that permits the simultaneous execution of different levels and we
develop a dynamic scheduling algorithm to exploit it. The problem of finding
the optimal scheduling of distributed tasks in a parallel computer is an
NP-complete problem. We propose and analyze a new greedy scheduling algorithm
to assign samples and we show that it is a 2-approximation, which is the best
that may be expected under general assumptions. On top of this result we design
a distributed memory implementation using the Message Passing Interface (MPI)
standard. Finally we present a set of numerical experiments illustrating its
scalability properties.
Comment: 21 pages, 13 figures
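The 2-approximation cited above matches Graham's classical list-scheduling bound: assigning each task to the currently least-loaded processor yields a makespan at most (2 - 1/m) times optimal. A generic greedy sketch in that spirit (the paper's actual algorithm, which also respects its multiple processor partition, may differ):

```python
import heapq

def greedy_schedule(costs, m):
    """Greedy list scheduling: assign each task, in order, to the currently
    least-loaded of m processors.  Returns the processor assignment per task
    and the resulting makespan (maximum processor load)."""
    heap = [(0.0, p) for p in range(m)]  # (current load, processor id)
    heapq.heapify(heap)
    assignment = [None] * len(costs)
    for i, c in enumerate(costs):
        load, p = heapq.heappop(heap)
        assignment[i] = p
        heapq.heappush(heap, (load + c, p))
    makespan = max(load for load, _ in heap)
    return assignment, makespan
```

For the costs [3, 3, 2, 2, 2] on 2 processors, greedy yields a makespan of 7 while the optimum is 6 (groups {3, 3} and {2, 2, 2}), comfortably within the (2 - 1/m) = 1.5 factor.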
An adaptive hierarchical domain decomposition method for parallel contact dynamics simulations of granular materials
A fully parallel version of the contact dynamics (CD) method is presented in
this paper. For large enough systems, 100% efficiency has been demonstrated for
up to 256 processors using a hierarchical domain decomposition with dynamic
load balancing. The iterative scheme to calculate the contact forces is left
domain-wise sequential, with data exchange after each iteration step, which
ensures its stability. The number of additional iterations required for
convergence due to the partially parallel updates at the domain boundaries
becomes negligible with an increasing number of particles, which allows for an
effective parallelization. Compared to the sequential implementation, we found
no influence of the parallelization on the simulation results.
Comment: 19 pages, 15 figures, published in Journal of Computational Physics
(2011)
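The "domain-wise sequential iteration with exchange after each step" pattern can be mimicked with a toy two-subdomain relaxation (illustrative only; the paper iterates contact forces in granular assemblies, not a Laplace-type smoother):

```python
def sweep(u, lo, hi):
    """Sequential Gauss-Seidel sweep over indices [lo, hi)."""
    for i in range(lo, hi):
        u[i] = 0.5 * (u[i - 1] + u[i + 1])

def two_domain_iterate(u, iters):
    """Two-subdomain relaxation sketch: each 'process' sweeps its own
    subdomain sequentially on a private snapshot, and boundary (halo)
    values are exchanged only after each full iteration, mirroring the
    partially parallel updates at the domain boundaries described above."""
    n = len(u)
    mid = n // 2
    for _ in range(iters):
        left = u[:]    # private copies stand in for per-process memory
        right = u[:]
        sweep(left, 1, mid)
        sweep(right, mid, n - 1)
        u[:] = left[:mid] + right[mid:]  # the halo-exchange step
    return u
```

Interior updates see fresh values (Gauss-Seidel-like), while interface updates see one-iteration-old neighbour data (Jacobi-like); this is exactly why a few extra iterations are needed, and why their relative cost vanishes as subdomains grow.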