15 research outputs found

A Parallel Adaptive P3M code with Hierarchical Particle Reordering

Author: Anderson, Bagla, Balsara, Barnes, Becciani, Blumenthal, Bode, Boris, Brieu, Couchman, Dave, Decyk, Dubinski, Eastwood, Efstathiou, Evrard, Ferrell, Frenk, Frigo, Gingold, Greengard, H.M.P. Couchman, Hernquist, Hockney, Kawata, Kravtsov, Li, Lia, MacFarland, Miocchi, Monaghan, Navarro, Pearce, Robert J. Thacker, Serna, Snir, Spergel, Springel, Steinmetz, Sugimoto, Swarztrauber, Thacker, Theuns, Vetterling, Wadsley, White, Wisdom, Wood
Publication venue: 'Elsevier BV'
Publication date: 01/01/2005

We discuss the design and implementation of HYDRA_OMP, a parallel implementation of the Smoothed Particle Hydrodynamics-Adaptive P3M (SPH-AP3M) code HYDRA. The code is designed primarily for conducting cosmological hydrodynamic simulations and is written in Fortran77+OpenMP. A number of optimizations for RISC processors and SMP-NUMA architectures have been implemented, the most important being hierarchical reordering of particles within chaining cells, which greatly improves data locality and thereby removes the cache misses typically associated with linked lists. Parallel scaling is good, with a minimum parallel scaling of 73% achieved on 32 nodes for a variety of modern SMP architectures. We give performance data in terms of the number of particle updates per second, which is a more useful performance metric than raw MFlops. A basic version of the code will be made available to the community in the near future.

Comment: 34 pages, 12 figures, accepted for publication in Computer Physics Communications
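
The reordering idea in the abstract above can be illustrated with a minimal single-level sketch in Python (the code itself is Fortran77+OpenMP, and its reordering is hierarchical, which this sketch does not reproduce). The function names `cell_index` and `reorder_by_cell`, the cubic box, and the flattened cell numbering are all assumptions made for illustration. Particles are sorted by chaining-cell index so that each cell's particles are contiguous in memory, replacing linked-list pointer chasing with a range lookup.

```python
def cell_index(pos, box_size, cells_per_side):
    """Map a 3-D position in [0, box_size) to a flattened chaining-cell index."""
    ix, iy, iz = (
        min(int(c / box_size * cells_per_side), cells_per_side - 1) for c in pos
    )
    return (ix * cells_per_side + iy) * cells_per_side + iz


def reorder_by_cell(positions, box_size, cells_per_side):
    """Sort particles by cell and return (sorted_positions, starts).

    Particles of cell c occupy sorted_positions[starts[c]:starts[c + 1]].
    """
    keys = [cell_index(p, box_size, cells_per_side) for p in positions]
    # Stable sort of particle indices by their cell key.
    order = sorted(range(len(positions)), key=keys.__getitem__)
    sorted_positions = [positions[i] for i in order]

    # Count particles per cell, then prefix-sum into start offsets.
    n_cells = cells_per_side ** 3
    starts = [0] * (n_cells + 1)
    for k in keys:
        starts[k + 1] += 1
    for c in range(n_cells):
        starts[c + 1] += starts[c]
    return sorted_positions, starts
```

After reordering, a loop over a cell's neighbours walks contiguous slices of the particle array, which is the data-locality benefit the abstract describes.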

arXiv.org e-Print Archive

CiteSeerX

Crossref

CERN Document Server

Fast Multipole Method for Gravitational Lensing: Application to High-magnification Quasar Microlensing

Author: Jiménez Vicente Jorge
Publication venue: 'American Astronomical Society'
Publication date: 13/12/2022

We introduce the use of the fast multipole method (FMM) to speed up gravitational lensing ray tracing calculations. The method allows very fast calculation of ray deflections when a large number of deflectors, N_*, are involved, while keeping rigorous control on the errors. In particular, we apply this method, in combination with the inverse polygon mapping (IPM) technique, to quasar microlensing to generate microlensing magnification maps with very high workloads (high magnification, large size, and/or high resolution) that require a very large number of deflectors. Using FMM-IPM, the computation time can be reduced by a factor of ~10^5 with respect to standard inverse ray shooting (IRS), making the use of this algorithm on a personal computer comparable to the use of standard IRS on GPUs. We also provide a flexible web interface for easy calculation of microlensing magnification maps using FMM-IPM (see https://gloton.ugr.es/microlensing/). We exemplify the power of this new method by applying it to some challenging and interesting astrophysical scenarios, including clustered primordial black holes and extremely magnified stars close to the giant arcs of galaxy clusters. We also show the performance/use of FMM to calculate ray deflection for a halo resulting from cosmological simulations composed of a large number (N of order 10^7) of elements.

Funding: MCIN/AEI PID2020-118687GB-C33, PID2020-118687GB-C31; Junta de Andalucia FQM-108, P20_00334, A-FQM-510-UGR20/FEDE
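
For context, the cost that FMM attacks is the direct deflection sum over all deflectors for every ray. The sketch below shows that direct-sum baseline only; it is not the FMM or IPM algorithm from the paper. It assumes point-mass lenses, positions encoded as complex numbers in Einstein-radius units, and no external convergence or shear; the names `deflection` and `shoot_ray` are hypothetical.

```python
def deflection(ray, lens_positions, lens_masses):
    """Direct-sum deflection at one image-plane position.

    Positions are complex numbers x + 1j*y. Each point lens contributes
    mass * (ray - lens) / |ray - lens|**2, so the cost per ray is O(N_*).
    """
    alpha = 0j
    for z_lens, mass in zip(lens_positions, lens_masses):
        dz = ray - z_lens
        alpha += mass * dz / abs(dz) ** 2
    return alpha


def shoot_ray(ray, lens_positions, lens_masses):
    """Map an image-plane ray to the source plane: y = x - alpha(x)."""
    return ray - deflection(ray, lens_positions, lens_masses)
```

Shooting R rays past N_* lenses this way costs O(R N_*) operations, which is why a hierarchical approximation of the sum pays off when N_* is very large.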

Repositorio Institucional Universidad de Granada

Experimental cosmology: The early universe after COBE

Publication venue: 'Office of Scientific and Technical Information (OSTI)'

Crossref

Hybrid Systems for N-body Simulations

Author: Spinnato P.F.
Publication venue: Eigen Beheer
Publication date: 01/01/2003

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Development and Application of Numerical Methods in Biomolecular Solvation

Author: Wilson Leighton
Publication date: 01/01/2021

This work addresses the development of fast summation methods for long-range particle interactions and their application to problems in biomolecular solvation, which describes the interaction of proteins or other biomolecules with their solvent environment. At the core of this work are treecodes, tree-based fast summation methods which, for N particles, reduce the cost of computing particle interactions from O(N^2) to O(N log N). Background on fast summation methods and treecodes in particular, as well as several treecode improvements developed in the early stages of this work, are presented. Building on treecodes, dual tree traversal (DTT) methods are another class of tree-based fast summation methods which reduce the cost of computing particle interactions for N particles to O(N). The primary result of this work is the development of an O(N) dual tree traversal fast summation method based on barycentric Lagrange polynomial interpolation (BLDTT). This method is implemented to run across multiple GPU compute nodes in the software package BaryTree. Across different problem sizes, particle distributions, geometries, and interaction kernels, the BLDTT shows consistently better performance than the previously developed barycentric Lagrange treecode (BLTC). The first major biomolecular solvation application of fast summation methods presented is to the Poisson–Boltzmann implicit solvent model, and in particular, the treecode-accelerated boundary integral Poisson–Boltzmann solver (TABI-PB). The work on TABI-PB consists of three primary projects and an application. The first project investigates the impact of various biomolecular surface meshing codes on TABI-PB and integrates the NanoShaper software into the package, resulting in significantly better performance. Second, a node patch method for discretizing the system of integral equations is introduced to replace the previous centroid collocation scheme, resulting in faster convergence of solvation energies. Third, a new version of TABI-PB with GPU acceleration based on the BLDTT is developed, resulting in further improved scalability. An application investigating the binding of biomolecular complexes is undertaken using the previous Taylor treecode-based version of TABI-PB. In addition to these projects, work performed over the course of this thesis integrated TABI-PB into the popular Adaptive Poisson–Boltzmann Solver (APBS) developed at Pacific Northwest National Laboratory. The second major application of fast summation methods is to the 3D reference interaction site model (3D-RISM), a statistical-mechanics-based continuum solvation model. This work applies cluster-particle Taylor expansion treecodes to treat long-range asymptotic Coulomb-like potentials in 3D-RISM, and results in significant speedups and improved scalability for the 3D-RISM package implemented in AmberTools. Additionally, preliminary work on specialized GPU-accelerated treecodes based on BaryTree for 3D-RISM long-range asymptotic functions is presented.

PhD, Applied and Interdisciplinary Mathematics, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/168120/1/lwwilson_1.pd
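
The interpolation ingredient named in the abstract, barycentric Lagrange interpolation, can be sketched in one dimension as follows. This is not BaryTree code; the function names are hypothetical, and the choice of Chebyshev points of the second kind with their standard closed-form weights is an assumption of the sketch. In a treecode, the same formula is applied per dimension to approximate the interaction kernel on a small grid of proxy points in each cluster.

```python
import math


def chebyshev_points(n):
    """Return the n + 1 Chebyshev points of the second kind on [-1, 1]."""
    return [math.cos(math.pi * j / n) for j in range(n + 1)]


def barycentric_weights(n):
    """Weights for the points above: (-1)**j, halved at both endpoints."""
    w = [(-1.0) ** j for j in range(n + 1)]
    w[0] *= 0.5
    w[n] *= 0.5
    return w


def interpolate(x, nodes, weights, values):
    """Evaluate the interpolating polynomial at x (second barycentric form)."""
    num = 0.0
    den = 0.0
    for s, w, f in zip(nodes, weights, values):
        if x == s:  # x coincides with a node: return the data value directly
            return f
        t = w / (x - s)
        num += t * f
        den += t
    return num / den
```

A short usage example: with `nodes = chebyshev_points(4)` and `values = [s * s for s in nodes]`, `interpolate(0.3, nodes, barycentric_weights(4), values)` reproduces the quadratic up to rounding error, since the interpolant is exact for polynomials of degree at most n.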

Deep Blue Documents at the University of Michigan

The 3-D Vortex Particle Method and the Fast Summation Algorithm. G.U. Aero Report 9620

Author: Qian L., Vezza M.
Publication venue: Department of Aerospace Engineering, University of Glasgow
Publication date: 01/09/1996

In this report, the vortex particle method developed by G.S. Winckelmans and A. Leonard for the computation of 3-D unsteady viscous flows is briefly reviewed. Numerical results are given for the fusion of two vortex rings, which show that the method works well for long-time computation. To reduce the high computational cost of direct summation, a fast hierarchical algorithm for 3-D vortex particle interactions is being implemented.

Enlighten

Parallel, out-of-core methods for N-body simulation

Author: Salmon J., Warren M.S.
Publication venue: Los Alamos National Laboratory
Publication date: 01/03/1997

Hierarchical treecodes have, to a large extent, converted the compute-bound N-body problem into a memory-bound problem. The large ratio of DRAM to disk pricing suggests the use of out-of-core techniques to overcome memory capacity limitations. The authors describe a parallel, out-of-core treecode library, targeted at machines with independent secondary storage associated with each processor. Borrowing the space-filling curve techniques from the in-core library and paging manually results in excellent spatial and temporal locality and very good performance.
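
The space-filling curve idea mentioned above can be illustrated with a Morton (Z-order) key, one common choice; the abstract does not say which curve the library uses, so that choice and the function name `morton_key` are assumptions of this sketch. Interleaving the bits of integer cell coordinates gives a single key, and sorting particles by that key places spatially nearby particles close together in memory or on disk pages.

```python
def morton_key(ix, iy, iz, bits=10):
    """Interleave the low `bits` bits of three integer coordinates.

    Bit b of ix, iy, iz lands at key bits 3b + 2, 3b + 1, 3b respectively.
    """
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b + 2)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b)
    return key


def sort_by_morton(cells, bits=10):
    """Return integer cell coordinates ordered along the Z-order curve."""
    return sorted(cells, key=lambda c: morton_key(c[0], c[1], c[2], bits))
```

Because consecutive keys tend to belong to neighbouring cells, a page holding a contiguous key range holds a compact region of space, which is the locality property the abstract relies on.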

UNT Digital Library

A geometrically non-linear time-domain unsteady lifting-line theory

Author: Bird H. J. A., Boutet J., Garrick I. E., Greenberg J. M., Hepperle M., Holten T. V., Leishman G. J., McCroskey W. J., Prandtl L., Press W. H., Ramesh K.
Publication venue: 'American Institute of Aeronautics and Astronautics (AIAA)'
Publication date: 06/01/2019

Crossref

Edinburgh Research Explorer

FieldPlacer - A flexible, fast and unconstrained force-directed placement method for heterogeneous reconfigurable logic architectures

Author: Feld Dustin
Publication venue: Fraunhofer Verlag
Publication date: 01/01/2017

The field of placement methods for components of integrated circuits, especially in the domain of reconfigurable chip architectures, is dominated by a handful of concepts. While some of these are easy to apply but difficult to adapt to new situations, others are more flexible but rather complex to realize. This work presents the FieldPlacer framework, a flexible, fast and unconstrained force-directed placement method for heterogeneous reconfigurable logic architectures, in particular for heterogeneous FPGAs. In contrast to many other force-directed placers, this approach is called ‘unconstrained’ as it does not require a priori fixed logic elements in order to calculate a force equilibrium as the solution to a system of equations. Instead, it is based on a free spring embedder simulation of a graph representation which includes all logic block types of a design simultaneously. The FieldPlacer framework offers great flexibility in applying different distance norms (e.g., the Manhattan distance) for the force-directed layout and aims at creating adapted layouts for various objective functions, e.g., highest performance or improved routability. Depending on the individual situation, a runtime-quality trade-off can be considered to either produce a decent placement in a very short time or to generate an exceptionally good placement, which takes longer. An extensive comparison with the latest simulated annealing placement method from the well-known Versatile Place and Route (VPR) framework shows that the FieldPlacer approach can create placements of comparable quality much faster than VPR or, alternatively, generate better placements in the same time. The flexibility in defining arbitrary objective functions and the intuitive adaptability of the method, which, among others, includes different concepts from the field of graph drawing, should facilitate further developments with this framework, e.g., for new optimization targets like the energy consumption of an implemented design.
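
As a rough illustration of what a free spring embedder does, the sketch below relaxes a graph under Hooke-style spring forces along its edges and measures the result with a Manhattan wirelength. It is a generic textbook-style model, not FieldPlacer's force model, which the abstract does not specify; the names `spring_step` and `manhattan_wirelength`, the rest length, the stiffness, and the absence of repulsive forces are all assumptions.

```python
import math


def spring_step(pos, edges, rest_length=1.0, stiffness=0.1):
    """Apply one relaxation step; pos maps node -> (x, y)."""
    force = {node: [0.0, 0.0] for node in pos}
    for u, v in edges:
        dx = pos[v][0] - pos[u][0]
        dy = pos[v][1] - pos[u][1]
        dist = math.hypot(dx, dy)
        if dist == 0.0:
            continue  # coincident nodes: direction undefined, skip this edge
        # Positive pull draws the endpoints together, negative pushes apart.
        pull = stiffness * (dist - rest_length) / dist
        force[u][0] += pull * dx
        force[u][1] += pull * dy
        force[v][0] -= pull * dx
        force[v][1] -= pull * dy
    return {
        node: (pos[node][0] + force[node][0], pos[node][1] + force[node][1])
        for node in pos
    }


def manhattan_wirelength(pos, edges):
    """Total Manhattan length of all edges, a simple placement cost."""
    return sum(
        abs(pos[u][0] - pos[v][0]) + abs(pos[u][1] - pos[v][1]) for u, v in edges
    )
```

No node is pinned in place here, which mirrors the 'unconstrained' property described in the abstract: the layout settles into a force equilibrium without a priori fixed elements.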

Kölner UniversitätsPublikationsServer

Fraunhofer-ePrints