One machine, one minute, three billion tetrahedra
This paper presents a new scalable parallelization scheme to generate the 3D
Delaunay triangulation of a given set of points. Our first contribution is an
efficient serial implementation of the incremental Delaunay insertion
algorithm. A simple dedicated data structure, efficient sorting of the
points, and optimization of the insertion algorithm yield a threefold
speedup over reference implementations. Our second contribution
is a multi-threaded version of the Delaunay kernel that is able to concurrently
insert vertices. Moore curve coordinates are used to partition the point set,
avoiding heavy synchronization overheads. Conflicts are managed by modifying
the partitions with a simple rescaling of the space-filling curve. The
performance of our implementation has been measured on three different
processors: an Intel Core i7, an Intel Xeon Phi, and an AMD EPYC, on which we
have been able to compute 3 billion tetrahedra in 53 seconds. This corresponds
to a generation rate of over 55 million tetrahedra per second. We finally show
how this very efficient parallel Delaunay triangulation can be integrated into a
Delaunay refinement mesh generator that takes as input the triangulated
surface boundary of the volume to be meshed.
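The partitioning idea in the abstract above (order the points along a space-filling curve, then hand contiguous runs of that order to different threads) can be illustrated with a minimal sketch. It rests on assumptions that are not from the paper: it uses a Morton (Z-order) curve rather than the Moore curve the authors name, because Morton keys are simpler to compute, and it omits the conflict handling by curve rescaling. The names `spread_bits`, `morton_key` and `partition` are hypothetical.

```python
def spread_bits(v, bits=10):
    """Place bit i of v at position 3*i, leaving two zero bits in between."""
    out = 0
    for i in range(bits):
        out |= ((v >> i) & 1) << (3 * i)
    return out


def morton_key(p, bits=10):
    """Map a point of the unit cube to its Z-order index (stand-in for a Moore curve)."""
    scale = 1 << bits
    # Quantize each coordinate to an integer cell, clamping the upper boundary.
    ix, iy, iz = (min(max(int(c * scale), 0), scale - 1) for c in p)
    return (spread_bits(ix, bits)
            | (spread_bits(iy, bits) << 1)
            | (spread_bits(iz, bits) << 2))


def partition(points, n_parts):
    """Sort points along the curve and cut the order into contiguous chunks."""
    ordered = sorted(points, key=morton_key)
    size = max(1, -(-len(ordered) // n_parts))  # ceiling division, at least 1
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]
```

Because nearby keys tend to correspond to nearby points, each chunk covers a compact region of space, which is what keeps concurrent insertions from colliding often. A call such as `partition(points, 4)` returns at most four lists, one per worker.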
Construction and Application of an AMR Algorithm for Distributed Memory Computers
While the parallelization of block-structured adaptive mesh refinement techniques is relatively straightforward on shared memory architectures, appropriate distribution strategies for the emerging generation of distributed
memory machines are a topic of ongoing research. In this paper, a locality-preserving domain decomposition is proposed that partitions the entire AMR hierarchy from the base level on. It is shown that the approach reduces the
communication costs and simplifies the implementation. Emphasis is put on the effective parallelization of the flux correction procedure at coarse-fine boundaries, which is indispensable for conservative finite volume schemes. An
easily reproducible standard benchmark and a highly resolved parallel AMR
simulation of a diffracting hydrogen-oxygen detonation demonstrate the proposed
strategy in practice.
Distributed memory compiler design for sparse problems
A compiler and runtime support mechanism is described and demonstrated. The methods presented are capable of solving a wide range of sparse and unstructured problems in scientific computing. The compiler takes as input a FORTRAN 77 program enhanced with specifications for distributing data and outputs a message passing program that runs on a distributed memory computer. The runtime support for this compiler is a library of primitives designed to efficiently support irregular patterns of distributed array accesses and irregular distributed array partitions. A variety of Intel iPSC/860 performance results obtained through the use of this compiler are presented.
A scalable parallel finite element framework for growing geometries. Application to metal additive manufacturing
This work introduces an innovative parallel, fully-distributed finite element
framework for growing geometries and its application to metal additive
manufacturing. It is well-known that virtual part design and qualification in
additive manufacturing requires highly-accurate multiscale and multiphysics
analyses. Only high performance computing tools are able to handle such
complexity in time frames compatible with time-to-market. However, efficiency,
without loss of accuracy, has rarely taken centre stage in the numerical
community. Here, in contrast, the framework is designed to adequately exploit
the resources of high-end distributed-memory machines. It is grounded on three
building blocks: (1) Hierarchical adaptive mesh refinement with octree-based
meshes; (2) a parallel strategy to model the growth of the geometry; (3)
state-of-the-art parallel iterative linear solvers. Computational experiments
consider the heat transfer analysis at the part scale of the printing process
by powder-bed technologies. After verification against a 3D benchmark, a
strong-scaling analysis assesses performance and identifies major sources of
parallel overhead. A third numerical example examines the efficiency and
robustness of (2) in a curved 3D shape. Unprecedented parallelism and
scalability were achieved in this work. Hence, this framework helps to
tackle higher complexity and/or accuracy, not only in part-scale simulations
of metal or polymer additive manufacturing, but also in welding, sedimentation,
atherosclerosis, or any other physical problem where the physical domain of
interest grows in time.
Computational Aerodynamics on unstructured meshes
New 2D and 3D unstructured-grid-based flow solvers have been developed for simulating steady compressible flows for aerodynamic applications. The codes employ the full compressible Euler/Navier-Stokes equations. The Spalart-Allmaras one-equation turbulence model is used to model turbulence effects. The spatial discretisation has been obtained using a cell-centred finite volume scheme on unstructured grids, consisting of triangles in 2D and of tetrahedral and prismatic elements in 3D. The temporal discretisation has been obtained with an explicit multistage Runge-Kutta scheme. An "inflation" mesh generation technique is introduced to effectively reduce the difficulty in generating highly stretched 2D/3D viscous grids in regions near solid surfaces. The explicit flow method is accelerated by the use of a multigrid method with consideration of the high grid aspect ratio in viscous flow simulations. A solution mesh adaptation technique is incorporated to improve the overall accuracy of the 2D inviscid and viscous flow solutions. The 3D flow solvers are parallelised in a MIMD fashion aimed at a PC cluster system to reduce the computing time for aerodynamic applications. The numerical methods are first applied to several 2D inviscid flow cases, including subsonic flow in a bump channel, transonic flow around a NACA 0012 airfoil and transonic flow around the RAE 2822 airfoil, to validate the numerical algorithms. The rest of the 2D case studies concentrate on viscous flow simulations, including laminar/turbulent flow over a flat plate, transonic turbulent flow over the RAE 2822 airfoil, and low-speed turbulent flows in a turbine cascade with massive separations. The results are compared to experimental data to assess the accuracy of the method. The over-resolved problem with mesh adaptation on viscous flow simulations is addressed with a two-phase mesh reconstruction procedure.
The solution convergence rate with the aspect-ratio-adaptive multigrid method and the direct-connectivity-based multigrid is assessed in several viscous turbulent flow simulations. Several 3D test cases are presented to validate the numerical algorithms for solving the Euler/Navier-Stokes equations. Inviscid flow around the M6 wing is simulated with the tetrahedron-based 3D flow solver, using an upwind scheme and a second-order spatial finite volume method. The efficiency of the multigrid for inviscid flow simulations is examined. The efficiency of the parallelised 3D flow solver and the PC cluster system is assessed with simulations of the same case with different partitioning schemes. The present parallelised 3D flow solvers on the PC cluster system show satisfactory parallel computing performance. Turbulent flows over a flat plate are simulated with the tetrahedron-based and prismatic-based flow solvers to validate the viscous term treatment. Next, simulation of turbulent flow over the M6 wing is carried out with the parallelised 3D flow solvers to demonstrate the overall accuracy of the algorithms and the efficiency of the multigrid method. The results show very good agreement with experimental data. A highly stretched and well-formed computational grid near the solid wall and wake regions is generated with the "inflation" method. The aspect-ratio-adaptive multigrid displayed a good acceleration rate. Finally, low-speed flow around the NREL Phase II wind turbine is simulated and the results are compared to the experimental data.
Parallel unstructured solvers for linear partial differential equations
This thesis presents the development of a parallel algorithm to solve symmetric
systems of linear equations and the computational implementation of a parallel
partial differential equations solver for unstructured meshes. The proposed
method, called distributive conjugate gradient (DCG), is based on a single-level
domain decomposition method and the conjugate gradient method to obtain a
highly scalable parallel algorithm.
An overview of methods for the discretization of domains and partial differential
equations is given. The partitioning and refinement of meshes are discussed, and
the formulation of the weighted residual method in two and three dimensions
is presented. Some of the methods to solve systems of linear equations are introduced,
highlighting the conjugate gradient method and domain decomposition
methods. A parallel unstructured PDE solver is proposed and its implementation
is presented. Emphasis is given to the data partition adopted, and the
scheme used for communication among adjacent subdomains is explained. A series
of processor scalability experiments is also reported.
The derivation and parallelization of DCG are presented and the method is validated
through numerical experiments. The method's capabilities and limitations
were investigated by solving the Poisson equation with various source
terms. The experimental results obtained using the parallel solver developed as
part of this work show that the algorithm presented is accurate and highly scalable,
achieving roughly linear parallel speed-up in many of the cases tested.
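For readers unfamiliar with the building block named in this abstract, the following is a minimal sketch of the standard serial conjugate gradient method applied to a 1D Poisson problem. It is not the thesis's DCG algorithm: there is no domain decomposition and no parallelism, and the function names, the finite-difference discretization and the test problem are all assumptions chosen for illustration.

```python
def apply_poisson_1d(u, h):
    """Apply the finite-difference operator for -u'' with zero Dirichlet ends."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * u[i] - left - right) / (h * h))
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(apply_a, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite operator apply_a."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for the zero initial guess
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = apply_a(p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        # New direction: residual plus a multiple of the previous direction.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Example: -u'' = 1 on (0, 1) with u(0) = u(1) = 0, on 31 interior nodes.
n = 31
h = 1.0 / (n + 1)
u = conjugate_gradient(lambda v: apply_poisson_1d(v, h), [1.0] * n)
```

For this source term the exact solution is u(x) = x(1 - x)/2, which the second-order stencil reproduces at the nodes, so `u` can be checked directly against that formula. In a domain-decomposed setting, the operator application and the dot products are the steps that would require communication between subdomains.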
Parallel processors and nonlinear structural dynamics algorithms and software
The adaptation of a finite element program with explicit time integration to a massively parallel SIMD (single instruction multiple data) computer, the Connection Machine, is described. The adaptation required the development of a new algorithm, called the exchange algorithm, in which all nodal variables are allocated to the element with an exchange of nodal forces at each time step. The architectural and C* programming language features of the Connection Machine are also summarized. Various alternative data structures and associated algorithms for nonlinear finite element analysis are discussed and compared. Results are presented which demonstrate that the Connection Machine is capable of outperforming the CRAY X-MP/14.