The DUNE-ALUGrid Module
In this paper we present the new DUNE-ALUGrid module. This module contains a
major overhaul of the sources from the ALUgrid library and the binding to the
DUNE software framework. The main changes include user defined load balancing,
parallel grid construction, and a redesign of the 2d grid which can now also
be used for parallel computations. In addition many improvements have been
introduced into the code to increase the parallel efficiency and to decrease
the memory footprint.
The original ALUGrid library is widely used within the DUNE community due to
its good parallel performance for problems requiring local adaptivity and
dynamic load balancing. Therefore, this new module will benefit a number of DUNE
users. In addition we have added features to increase the range of problems for
which the grid manager can be used, for example, introducing a 3d tetrahedral
grid using a parallel newest vertex bisection algorithm for conforming grid
refinement. In this paper we will discuss the new features, extensions to the
DUNE interface, and explain for various examples how the code is used in
parallel environments.
Comment: 25 pages, 11 figures
A generic finite element framework on parallel tree-based adaptive meshes
We present highly scalable parallel distributed-memory algorithms and associated data structures for a generic finite element framework that supports h-adaptivity on computational domains represented as multiple connected adaptive trees (a forest-of-trees), thus providing multi-scale resolution on problems governed by partial differential equations. The framework is grounded on a rich representation of the adaptive mesh suitable for generic finite elements that is built on top of a low-level, light-weight forest-of-trees data structure handled by a specialized, highly parallel adaptive meshing engine. Along the way, we have identified the requirements that the forest-of-trees layer must fulfill to be coupled into our framework. Essentially, it must be able to describe neighboring relationships between cells in the adapted mesh (apart from hierarchical relationships) across the lower-dimensional objects at the boundary of the cells. Atop this two-layered mesh representation, we build the rest of the data structures required for the numerical integration and assembly of the discrete system of linear equations. We consider algorithms that are suitable for both subassembled and fully-assembled distributed data layouts of linear system matrices. The proposed framework has been implemented within the FEMPAR scientific software library, using p4est as a practical forest-of-octrees demonstrator. A comprehensive strong scaling study of this implementation when applied to Poisson and Maxwell problems reveals remarkable scalability up to 32.2K CPU cores and 482.2M degrees of freedom. Besides, the implementation in FEMPAR of the proposed approach is up to 2.6 and 3.4 times faster than the state-of-the-art deal.II finite element software in the h-adaptive approximation of a Poisson problem with first- and second-order Lagrangian finite elements, respectively (excluding the linear solver step from the comparison).
Evaluation of an efficient stack-RLE clustering concept for dynamically adaptive grids
This is the author accepted manuscript. The final version is available from the Society for Industrial and Applied Mathematics via the DOI in this record.
Abstract.
One approach to tackle the challenge of efficient implementations for parallel PDE simulations
on dynamically changing grids is the usage of space-filling curves (SFC). While SFC algorithms
possess advantageous properties such as low memory requirements and close-to-optimal partitioning
approaches with linear complexity, they require efficient communication strategies for keeping and
utilizing the connectivity information, in particular for dynamically changing grids. Our approach
is to use a sparse communication graph to store the connectivity information and to transfer data
block-wise. This permits efficient generation of multiple partitions per memory context (denoted
by clustering) which - in combination with a run-length encoding (RLE) - directly leads to elegant
solutions for shared, distributed and hybrid parallelization and allows cluster-based optimizations.
While previous work focused on specific aspects, we present in this paper an overall compact
summary of the stack-RLE clustering approach completed by aspects on the vertex-based communication
that ease understanding of the approach. The central contribution of this work is the proof
of suitability of the stack-RLE clustering approach for an efficient realization of different, relevant
building blocks of Scientific Computing methodology and real-life CSE applications: We show 95%
strong scalability for small-scale scalability benchmarks on 512 cores and weak scalability of over 90%
on 8192 cores for finite-volume solvers and changing grid structure in every time step; optimizations
of simulation data backends by writer tasks; comparisons of analytical benchmarks to analyze the
adaptivity criteria; and a Tsunami simulation as a representative real-world showcase of a wave propagation
for our approach which reduces the overall workload by 95% for parallel fully-adaptive mesh
refinement and, based on a comparison with SFC-ordered regular grid cells, reduces the computation
time by a factor of 7.6 with improved results and a factor of 62.2 with results of similar accuracy of
buoy station data.
This work was partly supported by the German Research
Foundation (DFG) as part of the Transregional Collaborative Research Centre “Invasive
Computing” (SFB/TR 89)
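As a rough illustration of the run-length encoding idea mentioned in the abstract above, the following minimal sketch compresses the list of neighboring partitions along a cluster boundary into (partition id, run length) pairs. It is not the authors' implementation; the function names and the sample data are hypothetical, and the only assumption is that boundary edges are visited in space-filling-curve order so that edges shared with the same neighbor tend to be contiguous.

```python
from itertools import groupby


def rle_encode(neighbor_ids):
    """Compress a sequence of neighbor-partition ids into (id, run_length) pairs."""
    return [(pid, sum(1 for _ in run)) for pid, run in groupby(neighbor_ids)]


def rle_decode(runs):
    """Expand (id, run_length) pairs back into the per-edge sequence."""
    return [pid for pid, count in runs for _ in range(count)]


# Hypothetical data: the partition adjacent to each boundary edge of one
# cluster, listed in the order the space-filling curve visits the edges.
edge_neighbors = [3, 3, 3, 7, 7, 2, 2, 2, 2]

runs = rle_encode(edge_neighbors)
print(runs)  # [(3, 3), (7, 2), (2, 4)]
```

Each run then describes one block of data to exchange with a single neighbor, which is what makes block-wise transfer natural in this representation.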
Albany: Using Component-based Design to Develop a Flexible, Generic Multiphysics Analysis Code
Abstract:
Albany is a multiphysics code constructed by assembling a set of reusable, general components. It is an implicit, unstructured grid finite element code that hosts a set of advanced features that are readily combined within a single analysis run. Albany uses template-based generic programming methods to provide extensibility and flexibility; it employs a generic residual evaluation interface to support the easy addition and modification of physics. This interface is coupled to powerful automatic differentiation utilities that are used to implement efficient nonlinear solvers and preconditioners, and also to enable sensitivity analysis and embedded uncertainty quantification capabilities as part of the forward solve. The flexible application programming interfaces in Albany couple to two different adaptive mesh libraries; it internally employs generic integration machinery that supports tetrahedral, hexahedral, and hybrid meshes of user specified order. We present the overall design of Albany, and focus on the specifics of the integration of many of its advanced features. As Albany and the components that form it are openly available on the internet, it is our goal that the reader might find some of the design concepts useful in their own work. Albany results in a code that enables the rapid development of parallel, numerically efficient multiphysics software tools. In discussing the features and details of the integration of many of the components involved, we show the reader the wide variety of solution components that are available and what is possible when they are combined within a simulation capability.
Key Words: partial differential equations, finite element analysis, template-based generic programming
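The combination of a generic residual interface with automatic differentiation described in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions, not Albany's API: Albany is a C++ code using templates, whereas this sketch uses Python duck typing to play the same role, and the `Dual` class, the `residual` function, and the scalar test problem are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Value together with its derivative (forward-mode AD)."""
    value: float
    deriv: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __sub__(self, other):
        other = _lift(other)
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __rsub__(self, other):
        return _lift(other) - self

    def __mul__(self, other):
        other = _lift(other)
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def _lift(x):
    """Promote a plain number to a Dual with zero derivative."""
    return x if isinstance(x, Dual) else Dual(float(x))


def residual(u):
    """Hypothetical scalar residual r(u) = u^3 - 2u - 5, written once
    and evaluated with either floats or Dual numbers."""
    return u * u * u - 2 * u - 5


def newton(u, steps=20, tol=1e-12):
    """Newton iteration whose Jacobian comes from the Dual evaluation."""
    for _ in range(steps):
        r = residual(Dual(u, 1.0))
        if abs(r.value) < tol:
            break
        u -= r.value / r.deriv
    return u


root = newton(2.0)
```

The physics is written once in `residual`; swapping the scalar type is what yields the derivative needed by the nonlinear solver, without a hand-coded Jacobian.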
Performance and Optimization Abstractions for Large Scale Heterogeneous Systems in the Cactus/Chemora Framework
We describe a set of lower-level abstractions to improve performance on
modern large scale heterogeneous systems. These provide portable access to
system- and hardware-dependent features, automatically apply dynamic
optimizations at run time, and target stencil-based codes used in finite
differencing, finite volume, or block-structured adaptive mesh refinement
codes.
These abstractions include a novel data structure to manage refinement
information for block-structured adaptive mesh refinement, an iterator
mechanism to efficiently traverse multi-dimensional arrays in stencil-based
codes, and a portable API and implementation for explicit SIMD vectorization.
These abstractions can either be employed manually, or be targeted by
automated code generation, or be used via support libraries by compilers during
code generation. The implementations described below are available in the
Cactus framework, and are used e.g. in the Einstein Toolkit for relativistic
astrophysics simulations
Design and Analysis of a Task-based Parallelization over a Runtime System of an Explicit Finite-Volume CFD Code with Adaptive Time Stepping
FLUSEPA (Registered trademark in France No. 134009261) is an advanced
simulation tool which performs a wide range of aerodynamic studies. It is the
unstructured finite-volume solver developed by the Airbus Safran Launchers company
to calculate compressible, multidimensional, unsteady, viscous and reactive
flows around bodies in relative motion. The time integration in FLUSEPA is done
using an explicit temporal adaptive method. The current production version of
the code is based on MPI and OpenMP. This implementation leads to significant
synchronization overhead that must be reduced. To tackle this problem, we present the
study of a task-based parallelization of the aerodynamic solver of FLUSEPA
using the runtime system StarPU and combining up to three levels of
parallelism. We validate our solution by the simulation (using a finite-volume
mesh with 80 million cells) of a take-off blast wave propagation for the Ariane 5
launcher.
Comment: Accepted manuscript of a paper in Journal of Computational Science