15 research outputs found

    A Parallel Adaptive P3M code with Hierarchical Particle Reordering

    We discuss the design and implementation of HYDRA_OMP, a parallel implementation of the Smoothed Particle Hydrodynamics-Adaptive P3M (SPH-AP3M) code HYDRA. The code is designed primarily for conducting cosmological hydrodynamic simulations and is written in Fortran77+OpenMP. A number of optimizations for RISC processors and SMP-NUMA architectures have been implemented, the most important being hierarchical reordering of particles within chaining cells, which greatly improves data locality and thereby removes the cache misses typically associated with linked lists. Parallel scaling is good, with a minimum parallel scaling of 73% achieved on 32 nodes for a variety of modern SMP architectures. We give performance data in terms of the number of particle updates per second, which is a more useful performance metric than raw MFlops. A basic version of the code will be made available to the community in the near future.
    Comment: 34 pages, 12 figures, accepted for publication in Computer Physics Communications
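The reordering idea in this abstract can be illustrated with a minimal sketch. It is not the HYDRA_OMP implementation (which is Fortran77+OpenMP and hierarchical); it only shows a single-level sort of particles by chaining-cell index, so that the particles of one cell sit contiguously in memory instead of being chased through a linked list. The function name `reorder_by_cell` and its parameters are hypothetical.

```python
import numpy as np

def reorder_by_cell(pos, box_size, n_cells):
    """Sort particles by chaining-cell index (single level, illustrative only).

    pos      : (N, 3) array of positions in [0, box_size]
    n_cells  : number of chaining cells per dimension (assumed cubic grid)
    Returns the reordered positions, the permutation, and per-cell offsets.
    """
    # Integer cell coordinates; clip guards particles sitting exactly at box_size.
    ijk = np.clip((pos / box_size * n_cells).astype(np.int64), 0, n_cells - 1)
    # Flatten (i, j, k) into one cell index.
    cell_id = (ijk[:, 0] * n_cells + ijk[:, 1]) * n_cells + ijk[:, 2]
    # A stable sort keeps the original relative order inside each cell.
    order = np.argsort(cell_id, kind="stable")
    sorted_ids = cell_id[order]
    # Particles of cell c occupy the slice start[c]:start[c + 1] after sorting.
    start = np.searchsorted(sorted_ids, np.arange(n_cells**3 + 1))
    return pos[order], order, start
```

With this layout, a loop over the particles of cell `c` reads one contiguous slice, `sorted_pos[start[c]:start[c + 1]]`, which is the data-locality benefit the abstract attributes to reordering.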

    Fast Multipole Method for Gravitational Lensing: Application to High-magnification Quasar Microlensing

    We introduce the use of the fast multipole method (FMM) to speed up gravitational lensing ray tracing calculations. The method allows very fast calculation of ray deflections when a large number of deflectors, N_*, are involved, while keeping rigorous control on the errors. In particular, we apply this method, in combination with the inverse polygon mapping (IPM) technique, to quasar microlensing to generate microlensing magnification maps with very high workloads (high magnification, large size, and/or high resolution) that require a very large number of deflectors. Using FMM-IPM, the computation time can be reduced by a factor of ~10^5 with respect to standard inverse ray shooting (IRS), making the use of this algorithm on a personal computer comparable to the use of standard IRS on GPUs. We also provide a flexible web interface for easy calculation of microlensing magnification maps using FMM-IPM (see https://gloton.ugr.es/microlensing/). We exemplify the power of this new method by applying it to some challenging and interesting astrophysical scenarios, including clustered primordial black holes and extremely magnified stars close to the giant arcs of galaxy clusters. We also show the performance/use of FMM to calculate ray deflection for a halo resulting from cosmological simulations composed of a large number (N ≳ 10^7) of elements.
    Funding: MCIN/AEI PID2020-118687GB-C33, PID2020-118687GB-C31; Junta de Andalucia FQM-108, P20_00334, A-FQM-510-UGR20/FEDE
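For context, the baseline that FMM accelerates can be sketched as a direct sum over deflectors. The snippet below is an illustrative assumption, not the authors' code: it uses point-mass lenses in units where a unit-mass lens has Einstein radius 1, omits external convergence and shear, and does not implement FMM or IPM. The names `deflection_direct` and `shoot_rays` are hypothetical.

```python
import numpy as np

def deflection_direct(x, lens_pos, lens_mass):
    """Direct-sum deflection: alpha(x) = sum_i m_i (x - x_i) / |x - x_i|^2.

    x         : (n_rays, 2) image-plane ray positions
    lens_pos  : (n_lenses, 2) deflector positions
    lens_mass : (n_lenses,) deflector masses (normalized units, assumed)
    Cost is O(n_rays * n_lenses), which is what a fast summation method avoids.
    """
    d = x[:, None, :] - lens_pos[None, :, :]   # (n_rays, n_lenses, 2) separations
    r2 = np.sum(d * d, axis=-1)                # squared distances
    return np.sum(lens_mass[None, :, None] * d / r2[..., None], axis=1)

def shoot_rays(x, lens_pos, lens_mass):
    """Map image-plane rays to the source plane: y = x - alpha(x)."""
    return x - deflection_direct(x, lens_pos, lens_mass)
```

As a sanity check on the convention, a ray at the Einstein radius of a single unit-mass lens at the origin, `x = (1, 0)`, maps to the source position `y = (0, 0)`.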

    Experimental cosmology: The early universe after COBE

    Development and Application of Numerical Methods in Biomolecular Solvation

    This work addresses the development of fast summation methods for long range particle interactions and their application to problems in biomolecular solvation, which describes the interaction of proteins or other biomolecules with their solvent environment. At the core of this work are treecodes, tree-based fast summation methods which, for N particles, reduce the cost of computing particle interactions from O(N^2) to O(N log N). Background on fast summation methods and treecodes in particular, as well as several treecode improvements developed in the early stages of this work, are presented. Building on treecodes, dual tree traversal (DTT) methods are another class of tree-based fast summation methods which reduce the cost of computing particle interactions for N particles to O(N). The primary result of this work is the development of an O(N) dual tree traversal fast summation method based on barycentric Lagrange polynomial interpolation (BLDTT). This method is implemented to run across multiple GPU compute nodes in the software package BaryTree. Across different problem sizes, particle distributions, geometries, and interaction kernels, the BLDTT shows consistently better performance than the previously developed barycentric Lagrange treecode (BLTC). The first major biomolecular solvation application of fast summation methods presented is to the Poisson–Boltzmann implicit solvent model, and in particular, the treecode-accelerated boundary integral Poisson–Boltzmann solver (TABI-PB). The work on TABI-PB consists of three primary projects and an application. The first project investigates the impact of various biomolecular surface meshing codes on TABI-PB and integrates the NanoShaper software into the package, resulting in significantly better performance. Second, a node patch method for discretizing the system of integral equations is introduced to replace the previous centroid collocation scheme, resulting in faster convergence of solvation energies. Third, a new version of TABI-PB with GPU acceleration based on the BLDTT is developed, further improving scalability. An application investigating the binding of biomolecular complexes is undertaken using the previous Taylor treecode-based version of TABI-PB. In addition to these projects, work performed over the course of this thesis integrated TABI-PB into the popular Adaptive Poisson–Boltzmann Solver (APBS) developed at Pacific Northwest National Laboratory. The second major application of fast summation methods is to the 3D reference interaction site model (3D-RISM), a statistical-mechanics based continuum solvation model. This work applies cluster-particle Taylor expansion treecodes to treat long-range asymptotic Coulomb-like potentials in 3D-RISM, and results in significant speedups and improved scalability to the 3D-RISM package implemented in AmberTools. Additionally, preliminary work on specialized GPU-accelerated treecodes based on BaryTree for 3D-RISM long-range asymptotic functions is presented.
    PhD, Applied and Interdisciplinary Mathematics, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/168120/1/lwwilson_1.pd
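The building block named in this abstract, barycentric Lagrange interpolation, can be sketched in one dimension. This is a generic textbook form and not the BaryTree implementation; the choice of Chebyshev points of the second kind and the helper names `cheb_points`, `bary_weights`, and `bary_interp` are assumptions made for illustration.

```python
import numpy as np

def cheb_points(n):
    """n + 1 Chebyshev points of the second kind on [-1, 1]."""
    return np.cos(np.pi * np.arange(n + 1) / n)

def bary_weights(n):
    """Barycentric weights for those points: alternating signs, halved at the ends."""
    w = np.ones(n + 1)
    w[1::2] = -1.0
    w[0] *= 0.5
    w[-1] *= 0.5
    return w

def bary_interp(x, nodes, weights, fvals):
    """Evaluate p(x) = sum_j [w_j f_j / (x - x_j)] / sum_j [w_j / (x - x_j)]."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    diff = x[:, None] - nodes[None, :]
    exact = diff == 0.0
    diff[exact] = 1.0                     # placeholder to avoid division by zero
    terms = weights / diff
    result = (terms @ fvals) / terms.sum(axis=1)
    # Where x coincides with a node, the interpolant equals the node value.
    rows, cols = np.nonzero(exact)
    result[rows] = fvals[cols]
    return result
```

In a treecode, the same formula is applied to the interaction kernel on a tensor-product grid of such points inside each cluster, so that a distant cluster is represented by a small set of interpolation nodes rather than by all of its particles.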

    The 3-D Vortex Particle Method and the Fast Summation Algorithm. G.U. Aero Report 9620

    In this report, the vortex particle method developed by G.S. Winckelmans and A. Leonard for the computation of 3-D unsteady viscous flows is briefly reviewed. Numerical results are given for the fusion of two vortex rings, which show that the method works well for long-time computation. To reduce the high computational cost of direct summation, a fast hierarchical algorithm for 3-D vortex particle interactions is being implemented.

    FieldPlacer - A flexible, fast and unconstrained force-directed placement method for heterogeneous reconfigurable logic architectures

    The field of placement methods for components of integrated circuits, especially in the domain of reconfigurable chip architectures, is mainly dominated by a handful of concepts. While some of these are easy to apply but difficult to adapt to new situations, others are more flexible but rather complex to realize. This work presents the FieldPlacer framework, a flexible, fast and unconstrained force-directed placement method for heterogeneous reconfigurable logic architectures, in particular for the ever-important heterogeneous FPGAs. In contrast to many other force-directed placers, this approach is called ‘unconstrained’ as it does not require a priori fixed logic elements in order to calculate a force equilibrium as the solution to a system of equations. Instead, it is based on a free spring embedder simulation of a graph representation which includes all logic block types of a design simultaneously. The FieldPlacer framework offers considerable flexibility in applying different distance norms (e.g., the Manhattan distance) for the force-directed layout and aims at creating adapted layouts for various objective functions, e.g., highest performance or improved routability. Depending on the individual situation, a runtime-quality trade-off can be considered to either produce a decent placement in a very short time or to generate an exceptionally good placement, which takes longer. An extensive comparison with the latest simulated annealing placement method from the well-known Versatile Place and Route (VPR) framework shows that the FieldPlacer approach can create placements of comparable quality much faster than VPR or, alternatively, generate better placements in the same time. The flexibility in defining arbitrary objective functions and the intuitive adaptability of the method, which, among other things, includes different concepts from the field of graph drawing, should facilitate further developments with this framework, e.g., for new upcoming optimization targets like the energy consumption of an implemented design.