8,107 research outputs found
Adaptive mesh refinement computation of acoustic radiation from an engine intake
A block-structured adaptive mesh refinement (AMR) method was applied to the computational problem of acoustic radiation from an aeroengine intake. The aim is to improve the computational and storage efficiency of aeroengine noise prediction by reducing the number of computational cells. A parallel implementation of the AMR algorithm was achieved using the Message Passing Interface (MPI). It combined a range of 2nd- and 4th-order spatial stencils, a 4th-order low-dissipation and low-dispersion Runge–Kutta scheme for time integration, and several different interpolation methods. Both the parallel AMR algorithms and the numerical issues were introduced briefly in this work. To solve the problem of acoustic radiation from an aeroengine intake, the code was extended to support body-fitted grid structures. The acoustic radiation problem was solved with the linearised Euler equations. The AMR results were compared with previous results computed on a uniformly fine mesh to demonstrate the accuracy and efficiency of the current AMR strategy. Because the computational load of the whole adaptively refined mesh has to be balanced between nodes on-line, the parallel performance of the existing code deteriorates as the number of processors increases, owing to expensive inter-node communication costs. A potential solution was suggested at the end.
Vectorization and Parallelization of the Adaptive Mesh Refinement N-body Code
In this paper, we describe our vectorized and parallelized adaptive mesh
refinement (AMR) N-body code with shared time steps, and report its performance
on a Fujitsu VPP5000 vector-parallel supercomputer. Our AMR N-body code puts
hierarchical meshes recursively where higher resolution is required, and the
time step is the same for all particles. The parts which are the most difficult
to vectorize are loops that access the mesh data and particle data. We
vectorized such parts by changing the loop structure, so that the innermost
loop steps through the cells instead of the particles in each cell, in other
words, by changing the loop order from the depth-first order to the
breadth-first order. Mass assignment is also vectorizable using this loop order
exchange and splitting the loop into $2^{N_{\rm dim}}$ loops, if the cloud-in-cell
scheme is adopted. Here, $N_{\rm dim}$ is the number of dimensions. These
vectorization schemes which eliminate the unvectorized loops are applicable to
parallelization of loops for shared-memory multiprocessors. We also
parallelized our code for distributed memory machines. The important part of
parallelization is data decomposition. We sorted the hierarchical mesh data by
the Morton order, or the recursive N-shaped order, level by level and split and
allocated the mesh data to the processors. Particles are allocated to the
processor to which the finest refined cells including the particles are also
assigned. Our timing analysis using the $\Lambda$-dominated cold dark matter
simulations shows that our parallel code speeds up almost ideally up to 32
processors, the largest number of processors in our test.
Comment: 21 pages, 16 figures, to be published in PASJ (Vol. 57, No. 5, Oct. 2005)
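The Morton-order decomposition described in this abstract can be illustrated with a minimal sketch. It assumes integer cell coordinates on a single refinement level and an equal-count split between processors; the function names `morton_index` and `split_by_morton` and the `bits` parameter are hypothetical and do not come from the paper, whose actual level-by-level scheme may differ.

```python
def morton_index(ix, iy, iz, bits=10):
    """Interleave the low `bits` bits of (ix, iy, iz) into one Morton key."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (3 * b)      # x occupies bit 3b
        code |= ((iy >> b) & 1) << (3 * b + 1)  # y occupies bit 3b + 1
        code |= ((iz >> b) & 1) << (3 * b + 2)  # z occupies bit 3b + 2
    return code


def split_by_morton(cells, n_procs, bits=10):
    """Sort cells along the Morton curve and cut the list into
    n_procs contiguous chunks of nearly equal size (assumed policy)."""
    ordered = sorted(cells, key=lambda c: morton_index(c[0], c[1], c[2], bits))
    chunk, extra = divmod(len(ordered), n_procs)
    parts, start = [], 0
    for p in range(n_procs):
        size = chunk + (1 if p < extra else 0)
        parts.append(ordered[start:start + size])
        start += size
    return parts


# Usage: distribute a 2 x 2 x 2 block of cells over two processors.
cells = [(x, y, z) for x in range(2) for y in range(2) for z in range(2)]
parts = split_by_morton(cells, n_procs=2)
```

Because cells that are close along the curve tend to be close in space, contiguous chunks of the sorted list give each processor a compact region, which is the usual motivation for this kind of ordering.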
Relativistic MHD with Adaptive Mesh Refinement
This paper presents a new computer code to solve the general relativistic
magnetohydrodynamics (GRMHD) equations using distributed parallel adaptive mesh
refinement (AMR). The fluid equations are solved using a finite difference
Convex ENO method (CENO) in 3+1 dimensions, and the AMR is Berger-Oliger.
Hyperbolic divergence cleaning is used to control the $\nabla\cdot{\bf B}=0$
constraint. We present results from three flat space tests, and examine the
accretion of a fluid onto a Schwarzschild black hole, reproducing the Michel
solution. The AMR simulations substantially improve performance while
reproducing the resolution-equivalent unigrid simulation results. Finally, we
discuss strong scaling results for parallel unigrid and AMR runs.
Comment: 24 pages, 14 figures, 3 tables
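For orientation, a commonly used flat-space form of hyperbolic divergence cleaning (the generalised Lagrange multiplier approach of Dedner et al.) is sketched below. It is an assumption that the relativistic formulation in this paper has the same structure; the scalar field $\psi$ and the speeds $c_h$ and $c_p$ are illustrative symbols that do not appear in the abstract.

```latex
\begin{align}
  \partial_t \mathbf{B}
    + \nabla\cdot\left(\mathbf{v}\,\mathbf{B}^{T} - \mathbf{B}\,\mathbf{v}^{T}\right)
    + \nabla\psi &= 0, \\
  \partial_t \psi + c_h^{2}\,\nabla\cdot\mathbf{B}
    &= -\frac{c_h^{2}}{c_p^{2}}\,\psi .
\end{align}
```

In this form, errors in $\nabla\cdot\mathbf{B}$ are carried away at speed $c_h$ and damped on a time scale $c_p^{2}/c_h^{2}$, rather than accumulating where they are generated.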
Refficientlib: an efficient load-rebalanced adaptive mesh refinement algorithm for high-performance computational physics meshes
In this paper we present a novel algorithm for adaptive mesh refinement in computational physics meshes in a distributed memory parallel setting. The proposed method is developed for nodally based parallel domain partitions where the nodes of the mesh belong to a single processor, whereas the elements can belong to multiple processors. Some of the main features of the algorithm presented in this paper are its capability of handling multiple types of elements in two and three dimensions (triangular, quadrilateral, tetrahedral, and hexahedral), the small amount of memory required per processor, and the parallel scalability up to thousands of processors. The presented algorithm is also capable of dealing with nonbalanced hierarchical refinement, where multirefinement level jumps are possible between neighbor elements. An algorithm for dealing with load rebalancing is also presented, which allows us to move the hierarchical data structure between processors so that load imbalance is kept below an acceptable level at all times during the simulation. A particular feature of the proposed algorithm is that arbitrary renumbering algorithms can be used in the load rebalancing step, including both graph partitioning and space-filling renumbering algorithms. The presented algorithm is packaged in the Fortran 2003 object-oriented library RefficientLib; its interface calls, which allow it to be used from any computational physics code, are summarized. Finally, numerical experiments illustrating the performance and scalability of the algorithm are presented.
Hydra: A Parallel Adaptive Grid Code
We describe the first parallel implementation of an adaptive
particle-particle, particle-mesh code with smoothed particle hydrodynamics.
Parallelisation of the serial code, "Hydra", is achieved by using CRAFT, a
Cray proprietary language which allows rapid implementation of a serial code on
a parallel machine by allowing global addressing of distributed memory.
The collisionless variant of the code has already completed several 16.8
million particle cosmological simulations on a 128 processor Cray T3D whilst
the full hydrodynamic code has completed several 4.2 million particle combined
gas and dark matter runs. The efficiency of the code now allows parameter-space
explorations to be performed routinely using particles of each species.
A complete run including gas cooling, from high redshift to the present epoch
requires approximately 10 hours on 64 processors.
In this paper we present implementation details and results of the
performance and scalability of the CRAFT version of Hydra under varying degrees
of particle clustering.
Comment: 23 pages, LaTeX plus encapsulated figures
Optimisation of patch distribution strategies for AMR applications
As core counts increase in the world's most powerful supercomputers, applications are becoming limited not only by computational power, but also by data availability. In the race to exascale, efficient and effective communication policies are key to achieving optimal application performance. Applications using adaptive mesh refinement (AMR) trade off communication for computational load balancing, to enable the focused computation of specific areas of interest. This class of application is particularly susceptible to the communication performance of the underlying architectures, and is inherently difficult to scale efficiently. In this paper we present a study of the effect of patch distribution strategies on the scalability of an AMR code. We demonstrate the significance of patch placement on communication overheads, and by balancing the computation and communication costs of patches, we develop a scheme to optimise the performance of a specific, industry-strength benchmark application.
A scalable parallel finite element framework for growing geometries. Application to metal additive manufacturing
This work introduces an innovative parallel, fully-distributed finite element
framework for growing geometries and its application to metal additive
manufacturing. It is well-known that virtual part design and qualification in
additive manufacturing requires highly-accurate multiscale and multiphysics
analyses. Only high performance computing tools are able to handle such
complexity in time frames compatible with time-to-market. However, efficiency,
without loss of accuracy, has rarely held the centre stage in the numerical
community. Here, in contrast, the framework is designed to adequately exploit
the resources of high-end distributed-memory machines. It is grounded on three
building blocks: (1) Hierarchical adaptive mesh refinement with octree-based
meshes; (2) a parallel strategy to model the growth of the geometry; (3)
state-of-the-art parallel iterative linear solvers. Computational experiments
consider the heat transfer analysis at the part scale of the printing process
by powder-bed technologies. After verification against a 3D benchmark, a
strong-scaling analysis assesses performance and identifies major sources of
parallel overhead. A third numerical example examines the efficiency and
robustness of (2) in a curved 3D shape. Unprecedented parallelism and
scalability were achieved in this work. Hence, this framework helps to
take on higher complexity and/or accuracy, not only in part-scale simulations
of metal or polymer additive manufacturing, but also in welding, sedimentation,
atherosclerosis, or any other physical problem where the physical domain of
interest grows in time.
An adaptive fixed-mesh ALE method for free surface flows
In this work we present a Fixed-Mesh ALE method for the numerical simulation of free surface flows capable of using an adaptive finite element mesh covering a background domain. This mesh is successively refined and unrefined at each time step in order to focus the computational effort on the spatial regions where it is required. Some of the main ingredients of the formulation are the use of an Arbitrary-Lagrangian–Eulerian formulation for computing temporal derivatives; the use of stabilization terms for stabilizing convection, the lack of compatibility between velocity and pressure interpolation spaces, and the ill-conditioning introduced by the cuts on the background finite element mesh; and the coupling of the algorithm with an adaptive mesh refinement procedure suitable for running on distributed memory environments. Algorithmic steps for the projection between meshes are presented together with the algebraic fractional step approach used for improving the condition number of the linear systems to be solved. The method is tested in several numerical examples. The expected convergence rates both in space and time are observed. Smooth solution fields for both velocity and pressure are obtained (as a result of the contribution of the stabilization terms). Finally, good agreement between the numerical results and the reference experimental data is obtained.