A Parallel Adaptive P3M code with Hierarchical Particle Reordering
We discuss the design and implementation of HYDRA_OMP, a parallel
implementation of the Smoothed Particle Hydrodynamics-Adaptive P3M (SPH-AP3M)
code HYDRA. The code is designed primarily for conducting cosmological
hydrodynamic simulations and is written in Fortran77+OpenMP. A number of
optimizations for RISC processors and SMP-NUMA architectures have been
implemented, the most important optimization being hierarchical reordering of
particles within chaining cells, which greatly improves data locality, thereby
removing the cache misses typically associated with linked lists. Parallel
scaling is good, with a minimum parallel scaling of 73% achieved on 32 nodes
for a variety of modern SMP architectures. We give performance data in terms of
the number of particle updates per second, which is a more useful performance
metric than raw MFlops. A basic version of the code will be made available to
the community in the near future.

Comment: 34 pages, 12 figures, accepted for publication in Computer Physics
Communications
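The hierarchical reordering described above can be sketched in a few lines: sort the particle arrays by chaining-cell index so that particles in the same cell become contiguous in memory, replacing pointer-chasing through linked lists with streaming array access. This is an illustrative sketch, not the HYDRA_OMP implementation; the function name and parameters are ours.

```python
import numpy as np

def reorder_by_chaining_cell(pos, cell_size, ncell):
    """Sort particle arrays so that particles in the same chaining cell
    are contiguous in memory (better cache locality than linked lists)."""
    # Integer cell coordinates of each particle.
    ijk = np.floor(pos / cell_size).astype(int) % ncell
    # Flatten (i, j, k) to a single cell index and stable-sort by it.
    flat = (ijk[:, 0] * ncell + ijk[:, 1]) * ncell + ijk[:, 2]
    order = np.argsort(flat, kind="stable")
    return pos[order], flat[order]

pos = np.random.rand(1000, 3)  # particles in a unit box
sorted_pos, sorted_cells = reorder_by_chaining_cell(pos, cell_size=0.1, ncell=10)
assert np.all(np.diff(sorted_cells) >= 0)  # each cell's particles are contiguous
```

After the sort, a loop over the particles of one cell touches a contiguous slice of memory, which is what removes the cache misses the abstract refers to.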
A fast GPU Monte Carlo Radiative Heat Transfer Implementation for Coupling with Direct Numerical Simulation
We implemented a fast Reciprocal Monte Carlo algorithm to accurately solve
radiative heat transfer in turbulent flows of non-grey participating media that
can be coupled to fully resolved turbulent flows, namely to Direct Numerical
Simulation (DNS). The spectrally varying absorption coefficient is treated in a
narrow-band fashion with a correlated-k distribution. The implementation is
verified with analytical solutions and validated with results from literature
and line-by-line Monte Carlo computations. The method is implemented on GPU
with thorough attention to memory transfer and computational efficiency. The
bottlenecks that dominate the computational expenses are addressed and several
techniques are proposed to optimize the GPU execution. By implementing the
proposed algorithmic accelerations, a speed-up of up to 3 orders of magnitude
can be achieved while maintaining the same accuracy.
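The core Monte Carlo ingredient can be illustrated with a deliberately simplified grey-gas sketch (not the non-grey correlated-k GPU implementation described above): sample photon free paths from the Beer-Lambert distribution and count the rays that traverse a homogeneous slab.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_transmissivity(kappa, length, n_rays=100_000):
    """Monte Carlo estimate of the transmissivity of a homogeneous grey
    slab: the fraction of rays whose sampled absorption distance exceeds
    the slab thickness. The analytical value is exp(-kappa * length)."""
    # Free paths follow an exponential (Beer-Lambert) distribution.
    s = -np.log(rng.random(n_rays)) / kappa
    return np.mean(s > length)

est = mc_transmissivity(kappa=2.0, length=0.5)
# Analytical reference: exp(-1) ~ 0.368; the estimate converges as 1/sqrt(n_rays).
```

The statistical error decays only as the inverse square root of the ray count, which is why algorithmic accelerations and GPU-friendly memory layouts, as in the work above, matter so much in a coupled DNS setting.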
An Efficient Sliding Mesh Interface Method for High-Order Discontinuous Galerkin Schemes
Sliding meshes are a powerful method to treat deformed domains in
computational fluid dynamics, where different parts of the domain are in
relative motion. In this paper, we present an efficient implementation of a
sliding mesh method into a discontinuous Galerkin compressible Navier-Stokes
solver and its application to a large eddy simulation of a 1-1/2 stage turbine.
The method is based on the mortar method and is high-order accurate. It can
handle three-dimensional sliding mesh interfaces with various interface shapes.
For plane interfaces, which are the most common case, conservativity and
free-stream preservation are ensured. We put an emphasis on efficient parallel
implementation. Our implementation generates little computational and storage
overhead. Inter-node communication via MPI in a dynamically changing mesh
topology is reduced to a bare minimum by ensuring a priori knowledge of
communication partners and by data sorting. We provide performance and scaling
results showing the capability of the implementation strategy. Apart from
analytical validation computations and convergence results, we present a
wall-resolved implicit LES of the 1-1/2 stage Aachen turbine test case as a
large-scale practical application example.
Hypercube algorithms on mesh connected multicomputers
A new methodology named CALMANT (CC-cube Algorithms on Meshes and Tori) is proposed for mapping a class of algorithms that we call CC-cube algorithms onto multicomputers with hypercube, mesh, or torus interconnection topology. This methodology is suitable when the initial problem can be expressed as a set of processes that communicate through a hypercube topology (a CC-cube algorithm). Many important algorithms fit the CC-cube type. CALMANT is based on three techniques: (a) standard embedding to assign the processes of the algorithm to the nodes of the mesh multicomputer; (b) communication pipelining to increase the level of communication parallelism inherent in CC-cube algorithms; and (c) optimal message-scheduling algorithms, proposed in this work, that avoid conflicts and thereby minimize communication time. Although CALMANT is proposed for multicomputers with different interconnection network topologies, the paper focuses only on the particular case of meshes.
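Technique (a), the standard embedding, can be sketched with the classic binary-reflected Gray-code construction (an assumption on our part about what "standard embedding" denotes here; the function names are ours, not CALMANT's): label the mesh position (i, j) with the hypercube node gray(i)·2^b + gray(j), so that mesh-adjacent positions map to hypercube nodes differing in exactly one bit.

```python
def gray(n: int) -> int:
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)

def standard_embedding(a_bits: int, b_bits: int):
    """Map each position (i, j) of a 2^a x 2^b mesh to an (a+b)-bit
    hypercube node label so that mesh-adjacent positions correspond to
    hypercube nodes differing in exactly one bit."""
    return {
        (i, j): (gray(i) << b_bits) | gray(j)
        for i in range(1 << a_bits)
        for j in range(1 << b_bits)
    }

m = standard_embedding(2, 2)  # 4x4 mesh, 16-node hypercube
# Mesh neighbours differ in exactly one hypercube dimension:
assert bin(m[(0, 0)] ^ m[(0, 1)]).count("1") == 1
assert bin(m[(0, 0)] ^ m[(1, 0)]).count("1") == 1
```

Under this assignment, a hypercube communication step of a CC-cube algorithm becomes a set of mesh paths, which the message-scheduling technique (c) then orders to avoid link conflicts.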
Numerical wave propagation for the triangular P1DG-P2 finite element pair
Inertia-gravity mode and Rossby mode dispersion properties are examined for
discretisations of the linearized rotating shallow-water equations using the
P1DG-P2 finite element pair on arbitrary triangulations in planar
geometry. A discrete Helmholtz decomposition of the functions in the velocity
space based on potentials taken from the pressure space is used to provide a
complete description of the numerical wave propagation for the discretised
equations. In the f-plane case, this decomposition is used to obtain
decoupled equations for the geostrophic modes, the inertia-gravity modes, and
the inertial oscillations. As has been noticed previously, the geostrophic
modes are steady. The Helmholtz decomposition is used to show that the
resulting inertia-gravity wave equation is third-order accurate in space. In
general the P1DG-P2 finite element pair is second-order accurate, so this leads
to very accurate wave propagation. It is further shown that the only spurious
modes supported by this discretisation are spurious inertial oscillations which
have frequency f, and which do not propagate. The Helmholtz decomposition
also allows a simple derivation of the quasi-geostrophic limit of the
discretised P1DG-P2 equations in the f-plane case, resulting in a
Rossby wave equation which is also third-order accurate.

Comment: Revised version prior to final journal submission
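The discrete Helmholtz decomposition at the heart of the analysis can be written schematically as follows (the notation is ours, for illustration; the paper's own definitions take precedence):

```latex
% Each discrete velocity field u_h is split using two potentials
% \psi_h, \phi_h drawn from the pressure space:
u_h \;=\; \nabla^{\perp}\psi_h \;+\; \nabla\phi_h,
\qquad \nabla^{\perp} \equiv (-\partial_y,\ \partial_x).
% On the f-plane, the divergence-free part \nabla^{\perp}\psi_h carries the
% steady geostrophic modes, while \nabla\phi_h carries the inertia-gravity
% modes; this is what decouples the two wave systems in the analysis.
```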
Vectorization and Parallelization of the Adaptive Mesh Refinement N-body Code
In this paper, we describe our vectorized and parallelized adaptive mesh
refinement (AMR) N-body code with shared time steps, and report its performance
on a Fujitsu VPP5000 vector-parallel supercomputer. Our AMR N-body code puts
hierarchical meshes recursively where higher resolution is required and the
time steps of all particles are the same. The parts that are most difficult
to vectorize are loops that access the mesh data and particle data. We
vectorized such parts by changing the loop structure, so that the innermost
loop steps through the cells instead of the particles in each cell, in other
words, by changing the loop order from the depth-first order to the
breadth-first order. Mass assignment is also vectorizable using this loop order
exchange and splitting the loop into 2^d loops, if the cloud-in-cell
scheme is adopted. Here, d is the number of dimensions. These
vectorization schemes which eliminate the unvectorized loops are applicable to
parallelization of loops for shared-memory multiprocessors. We also
parallelized our code for distributed memory machines. The important part of
parallelization is data decomposition. We sorted the hierarchical mesh data by
the Morton order, or the recursive N-shaped order, level by level and split and
allocated the mesh data to the processors. Particles are allocated to the
processor to which the finest refined cells including the particles are also
assigned. Our timing analysis using the Λ-dominated cold dark matter
simulations shows that our parallel code speeds up almost ideally up to 32
processors, the largest number of processors in our test.

Comment: 21 pages, 16 figures, to be published in PASJ (Vol. 57, No. 5, Oct.
2005)
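The Morton (Z/N-shaped) ordering used above for data decomposition can be sketched in 2-D (the code itself works in 3-D; this simplification and the function name are ours): interleave the bits of the cell coordinates to form a key, and sort cells by that key so that spatially nearby cells of the same level land on the same processor.

```python
def morton2(i: int, j: int) -> int:
    """Interleave the bits of 2-D cell coordinates (i, j) into a Morton
    key; sorting by this key yields the recursive N-shaped traversal."""
    key = 0
    for b in range(32):
        key |= ((i >> b) & 1) << (2 * b + 1)  # i bits -> odd positions
        key |= ((j >> b) & 1) << (2 * b)      # j bits -> even positions
    return key

cells = [(i, j) for i in range(4) for j in range(4)]
cells.sort(key=lambda c: morton2(*c))
# The first four cells form the lower-left 2x2 quadrant of the 4x4 grid.
assert cells[:4] == [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Splitting the Morton-sorted cell list into equal contiguous chunks then gives each processor a compact block of space, which is why particles can simply follow their finest refined cell.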