3,627 research outputs found

    The cosmological simulation code GADGET-2

    We discuss the cosmological simulation code GADGET-2, a new massively parallel TreeSPH code, capable of following a collisionless fluid with the N-body method, and an ideal gas by means of smoothed particle hydrodynamics (SPH). Our implementation of SPH manifestly conserves energy and entropy in regions free of dissipation, while allowing for fully adaptive smoothing lengths. Gravitational forces are computed with a hierarchical multipole expansion, which can optionally be applied in the form of a TreePM algorithm, where only short-range forces are computed with the 'tree' method while long-range forces are determined with Fourier techniques. Time integration is based on a quasi-symplectic scheme where long-range and short-range forces can be integrated with different timesteps. Individual and adaptive short-range timesteps may also be employed. The domain decomposition used in the parallelisation algorithm is based on a space-filling curve, resulting in high flexibility and tree force errors that do not depend on the way the domains are cut. The code is efficient in terms of memory consumption and required communication bandwidth. It has been used to compute the first cosmological N-body simulation with more than 10^10 dark matter particles, reaching a homogeneous spatial dynamic range of 10^5 per dimension in a 3D box. It has also been used to carry out very large cosmological SPH simulations that account for radiative cooling and star formation, reaching total particle numbers of more than 250 million. We present the algorithms used by the code and discuss their accuracy and performance using a number of test problems. GADGET-2 is publicly released to the research community. Comment: submitted to MNRAS, 31 pages, 20 figures (reduced resolution), code available at http://www.mpa-garching.mpg.de/gadge
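The domain decomposition idea in this abstract can be illustrated with a small sketch. The abstract says only that a space-filling curve is used; the code below substitutes a Morton (Z-order) key as a simple stand-in and is not GADGET-2's implementation. The function names, the `bits` parameter, and the equal-count splitting rule are all assumptions made for illustration.

```python
def morton_key(ix, iy, iz, bits=10):
    """Interleave the bits of three integer cell coordinates into one key."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key


def decompose(positions, n_domains, box_size=1.0, bits=10):
    """Assign each particle to a domain by cutting the key-sorted list into
    contiguous chunks of (nearly) equal particle count (hypothetical rule)."""
    cells = 1 << bits
    keyed = []
    for i, pos in enumerate(positions):
        # Map each coordinate to an integer cell index, clamped to the grid.
        ix, iy, iz = (min(int(c / box_size * cells), cells - 1) for c in pos)
        keyed.append((morton_key(ix, iy, iz, bits), i))
    keyed.sort()
    owner = [0] * len(positions)
    for rank, (_, i) in enumerate(keyed):
        owner[i] = rank * n_domains // len(positions)
    return owner
```

Because each domain is a contiguous segment of the curve, particles that are close in space tend to land on the same processor, and the cuts can be moved along the curve to rebalance work.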

    A Parallel Adaptive P3M code with Hierarchical Particle Reordering

    We discuss the design and implementation of HYDRA_OMP, a parallel implementation of the Smoothed Particle Hydrodynamics-Adaptive P3M (SPH-AP3M) code HYDRA. The code is designed primarily for conducting cosmological hydrodynamic simulations and is written in Fortran77+OpenMP. A number of optimizations for RISC processors and SMP-NUMA architectures have been implemented, the most important being hierarchical reordering of particles within chaining cells, which greatly improves data locality and thereby removes the cache misses typically associated with linked lists. Parallel scaling is good, with a minimum parallel scaling of 73% achieved on 32 nodes for a variety of modern SMP architectures. We give performance data in terms of the number of particle updates per second, which is a more useful performance metric than raw MFlops. A basic version of the code will be made available to the community in the near future. Comment: 34 pages, 12 figures, accepted for publication in Computer Physics Communication
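The data-locality argument can be made concrete with a minimal sketch. It performs a single-level sort of particles by chaining-cell index, so that each cell's particles occupy one contiguous slice instead of being chained through a linked list. The hierarchical scheme described in the paper is more elaborate; the function name, the row-major cell numbering, and the return layout here are assumptions for illustration.

```python
def reorder_by_cell(positions, box_size, n_cells):
    """Sort particles by chaining-cell index and return (sorted_positions, starts).

    Particles of cell c occupy sorted_positions[starts[c]:starts[c + 1]].
    """
    def cell_index(pos):
        # Row-major index of the chaining cell containing this position.
        ix, iy, iz = (min(int(c / box_size * n_cells), n_cells - 1) for c in pos)
        return (ix * n_cells + iy) * n_cells + iz

    order = sorted(range(len(positions)), key=lambda i: cell_index(positions[i]))
    sorted_positions = [positions[i] for i in order]

    # Count particles per cell, then prefix-sum the counts into start offsets.
    total_cells = n_cells ** 3
    counts = [0] * total_cells
    for pos in sorted_positions:
        counts[cell_index(pos)] += 1
    starts = [0] * (total_cells + 1)
    for c in range(total_cells):
        starts[c + 1] = starts[c] + counts[c]
    return sorted_positions, starts
```

After reordering, a loop over the particles of one cell walks a contiguous block of memory, which is the property that reduces cache misses.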

    Refficientlib: an efficient load-rebalanced adaptive mesh refinement algorithm for high-performance computational physics meshes

    In this paper we present a novel algorithm for adaptive mesh refinement in computational physics meshes in a distributed memory parallel setting. The proposed method is developed for nodally based parallel domain partitions where the nodes of the mesh belong to a single processor, whereas the elements can belong to multiple processors. Some of the main features of the algorithm presented in this paper are its capability of handling multiple types of elements in two and three dimensions (triangular, quadrilateral, tetrahedral, and hexahedral), the small amount of memory required per processor, and the parallel scalability up to thousands of processors. The presented algorithm is also capable of dealing with nonbalanced hierarchical refinement, where multirefinement level jumps are possible between neighbor elements. An algorithm for dealing with load rebalancing is also presented, which allows us to move the hierarchical data structure between processors so that load unbalancing is kept below an acceptable level at all times during the simulation. A particular feature of the proposed algorithm is that arbitrary renumbering algorithms can be used in the load rebalancing step, including both graph partitioning and space-filling renumbering algorithms. The presented algorithm is packed in the Fortran 2003 object oriented library RefficientLib, whose interface calls, which allow it to be used from any computational physics code, are summarized. Finally, numerical experiments illustrating the performance and scalability of the algorithm are presented. Peer Reviewed. Postprint (published version
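The phrase "load unbalancing is kept below an acceptable level" can be read as a threshold test on some imbalance measure. The abstract does not say which measure is used; the sketch below assumes a common one (maximum load over mean load, minus one) and a hypothetical tolerance of 10%. Neither the function names nor the default tolerance come from RefficientLib.

```python
def load_imbalance(elements_per_rank):
    """Relative excess of the busiest rank over the mean load (0.0 is perfect balance)."""
    if not elements_per_rank:
        raise ValueError("need at least one rank")
    mean = sum(elements_per_rank) / len(elements_per_rank)
    if mean == 0:
        raise ValueError("total load must be positive")
    return max(elements_per_rank) / mean - 1.0


def needs_rebalance(elements_per_rank, tolerance=0.1):
    """True when the imbalance exceeds the (assumed) acceptable level."""
    return load_imbalance(elements_per_rank) > tolerance
```

In an adaptive run, a check of this kind would be evaluated after each refinement step, with the renumbering and data migration triggered only when it returns true.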

    Parallel TreeSPH

    We describe PTreeSPH, a gravity treecode combined with an SPH hydrodynamics code designed for massively parallel supercomputers having distributed memory. Our computational algorithm is based on the popular TreeSPH code of Hernquist & Katz (1989). PTreeSPH utilizes a domain decomposition procedure and a synchronous hypercube communication paradigm to build self-contained subvolumes of the simulation on each processor at every timestep. Computations then proceed in a manner analogous to a serial code. We use the Message Passing Interface (MPI) communications package, making our code easily portable to a variety of parallel systems. PTreeSPH uses individual smoothing lengths and timesteps, with a communication algorithm designed to minimize exchange of information while still providing all information required to accurately perform SPH computations. We have additionally incorporated cosmology, periodic boundary conditions with forces calculated using a quadrupole Ewald summation method, and radiative cooling and heating from a parameterized ionizing background following Katz, Weinberg & Hernquist (1996). The addition of other physical processes, such as star formation, is straightforward. A cosmological simulation from z=49 to z=2 with 64^3 gas particles and 64^3 dark matter particles requires ~6000 node-hours on a Cray T3D, with a communications overhead of ~10%, and is load balanced to a ~90% level. When used on the new Cray T3E, this code will be capable of performing cosmological hydrodynamical simulations down to z=0 with ~2x10^6 particles, or to z=2 with ~10^7 particles, in a reasonable amount of time. Even larger simulations will be practical in situations where the matter is not highly clustered or when periodic boundaries are not required. Comment: 30 pages, 6 Postscript figures, Submitted to New Astronom

    The DUNE-ALUGrid Module

    In this paper we present the new DUNE-ALUGrid module. This module contains a major overhaul of the sources from the ALUGrid library and the binding to the DUNE software framework. The main changes include user-defined load balancing, parallel grid construction, and a redesign of the 2d grid, which can now also be used for parallel computations. In addition, many improvements have been introduced into the code to increase the parallel efficiency and to decrease the memory footprint. The original ALUGrid library is widely used within the DUNE community due to its good parallel performance for problems requiring local adaptivity and dynamic load balancing. Therefore, this new model will benefit a number of DUNE users. In addition, we have added features to increase the range of problems for which the grid manager can be used, for example, introducing a 3d tetrahedral grid using a parallel newest vertex bisection algorithm for conforming grid refinement. In this paper we discuss the new features and extensions to the DUNE interface, and explain with various examples how the code is used in parallel environments. Comment: 25 pages, 11 figure

    An adaptive fixed-mesh ALE method for free surface flows

    In this work we present a Fixed-Mesh ALE method for the numerical simulation of free surface flows capable of using an adaptive finite element mesh covering a background domain. This mesh is successively refined and unrefined at each time step in order to focus the computational effort on the spatial regions where it is required. Some of the main ingredients of the formulation are the use of an Arbitrary Lagrangian–Eulerian formulation for computing temporal derivatives; the use of stabilization terms to handle convection, the lack of compatibility between velocity and pressure interpolation spaces, and the ill-conditioning introduced by the cuts on the background finite element mesh; and the coupling of the algorithm with an adaptive mesh refinement procedure suitable for running on distributed memory environments. Algorithmic steps for the projection between meshes are presented together with the algebraic fractional step approach used for improving the condition number of the linear systems to be solved. The method is tested in several numerical examples. The expected convergence rates both in space and time are observed. Smooth solution fields for both velocity and pressure are obtained (as a result of the contribution of the stabilization terms). Finally, good agreement between the numerical results and the reference experimental data is obtained. Postprint (published version

    Energy Efficient Ant Colony Algorithms for Data Aggregation in Wireless Sensor Networks

    In this paper, a family of ant colony algorithms for data aggregation, called DAACA, is presented. It consists of three phases: initialization, packet transmission, and operations on pheromones. After initialization, each node estimates the remaining energy and the amount of pheromones to compute the probabilities used for dynamically selecting the next hop. After a certain number of transmission rounds, pheromone adjustment is performed periodically, combining the advantages of both global and local pheromone adjustment for evaporating or depositing pheromones. Four different pheromone adjustment strategies are designed to achieve the globally optimal network lifetime, namely Basic-DAACA, ES-DAACA, MM-DAACA and ACS-DAACA. Compared with some other data aggregation algorithms, DAACA performs better in terms of average degree of nodes, energy efficiency, network lifetime, computational complexity, and success ratio of one-hop transmission. Finally, we analyze the characteristics of DAACA with respect to robustness, fault tolerance, and scalability. Comment: To appear in Journal of Computer and System Science
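The next-hop selection step can be sketched as a weighted random choice. The abstract states only that the probabilities depend on remaining energy and pheromone amount; the specific weighting below (pheromone to the power `alpha` times energy to the power `beta`, normalized over neighbors) is a generic ant-colony form assumed for illustration and is not taken from the DAACA paper. All function and parameter names are hypothetical.

```python
import random


def next_hop_probabilities(neighbors, alpha=1.0, beta=1.0):
    """Map each neighbor to a selection probability.

    neighbors: dict of node id -> (pheromone, remaining_energy).
    The weight pheromone**alpha * energy**beta is an assumed form.
    """
    weights = {
        node: (pheromone ** alpha) * (energy ** beta)
        for node, (pheromone, energy) in neighbors.items()
    }
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("no neighbor has positive weight")
    return {node: w / total for node, w in weights.items()}


def choose_next_hop(neighbors, rng=random, alpha=1.0, beta=1.0):
    """Draw one neighbor at random according to its selection probability."""
    probs = next_hop_probabilities(neighbors, alpha, beta)
    nodes = list(probs)
    return rng.choices(nodes, weights=[probs[n] for n in nodes], k=1)[0]
```

Under this weighting, a neighbor with depleted energy is never selected, which is one way a rule of this kind could spread traffic away from nodes that are close to failing.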