The cosmological simulation code GADGET-2

Author: Abel, Appel, Ascasibar, Bagla, Balsara, Barnes, Bate, Bode, Bonnell, Boss, Bryan, Burkert, Cen, Couchman, Cox, Cuadra, Davé, Dehnen, Di Matteo, Dolag, Dubinski, Duncan, Efstathiou, Evrard, Frenk, Fryxell, Fukushige, Gao, Gingold, Gnedin, Hairer, Heitmann, Hernquist, Hockney, Hut, Jenkins, Jernigan, Jubelgas, Kang, Katz, Kay, Klein, Klypin, Knebe, Kravtsov, Linder, Lucy, Makino, Marri, Monaghan, Motl, Navarro, Norman, O'Shea, Owen, Pen, Poludnenko, Power, Quilis, Quinn, Rasio, Refregier, Saha, Salmon, Scannapieco, Serna, Sommer-Larsen, Springel, Stadel, Steinmetz, Teyssier, Tissera, Tormen, Tornatore, Van Den Bosch, Volker Springel, Wadsley, Warren, White, Whitehurst, Xu, Yepes, Yoshida
Publication venue: 'Wiley'
Publication date: 01/01/2005

We discuss the cosmological simulation code GADGET-2, a new massively parallel TreeSPH code, capable of following a collisionless fluid with the N-body method, and an ideal gas by means of smoothed particle hydrodynamics (SPH). Our implementation of SPH manifestly conserves energy and entropy in regions free of dissipation, while allowing for fully adaptive smoothing lengths. Gravitational forces are computed with a hierarchical multipole expansion, which can optionally be applied in the form of a TreePM algorithm, where only short-range forces are computed with the 'tree' method while long-range forces are determined with Fourier techniques. Time integration is based on a quasi-symplectic scheme where long-range and short-range forces can be integrated with different timesteps. Individual and adaptive short-range timesteps may also be employed. The domain decomposition used in the parallelisation algorithm is based on a space-filling curve, resulting in high flexibility and tree force errors that do not depend on the way the domains are cut. The code is efficient in terms of memory consumption and required communication bandwidth. It has been used to compute the first cosmological N-body simulation with more than 10^10 dark matter particles, reaching a homogeneous spatial dynamic range of 10^5 per dimension in a 3D box. It has also been used to carry out very large cosmological SPH simulations that account for radiative cooling and star formation, reaching total particle numbers of more than 250 million. We present the algorithms used by the code and discuss their accuracy and performance using a number of test problems. GADGET-2 is publicly released to the research community. Comment: submitted to MNRAS, 31 pages, 20 figures (reduced resolution), code available at http://www.mpa-garching.mpg.de/gadge

arXiv.org e-Print Archive
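The abstract states that the domain decomposition is based on a space-filling curve. As a rough illustration only, the sketch below assumes a 2D Hilbert curve on a 2^order grid of cells. GADGET-2 itself works in 3D with a Peano-Hilbert curve and balances computational work rather than bare particle counts. The names `hilbert_index` and `decompose` are hypothetical. Particles are sorted by curve key, and the sorted list is cut into contiguous pieces, so each domain is one compact segment of the curve.

```python
"""Sketch: space-filling-curve domain decomposition in 2D (illustrative only)."""


def hilbert_index(order: int, x: int, y: int) -> int:
    """Return the position of integer cell (x, y) along a Hilbert curve
    covering a 2**order x 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def decompose(points, n_domains: int, order: int = 10):
    """Assign each point of the unit square to one of n_domains domains.

    Points are sorted by Hilbert key and the sorted list is cut into
    contiguous pieces whose sizes differ by at most one particle.
    """
    n = 1 << order
    keyed = []
    for i, (px, py) in enumerate(points):
        cx = min(int(px * n), n - 1)  # clamp px == 1.0 into the last cell
        cy = min(int(py * n), n - 1)
        keyed.append((hilbert_index(order, cx, cy), i))
    keyed.sort()
    owner = [0] * len(points)
    for rank, (_, i) in enumerate(keyed):
        owner[i] = rank * n_domains // len(points)
    return owner


if __name__ == "__main__":
    import random

    rng = random.Random(1)
    pts = [(rng.random(), rng.random()) for _ in range(1000)]
    owner = decompose(pts, 4)
    print([owner.count(k) for k in range(4)])  # [250, 250, 250, 250]
```

Because consecutive keys are neighboring cells, cutting the sorted list anywhere yields spatially compact domains; moving a cut point shifts work between two adjacent domains without changing the geometry of any others.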

A Parallel Adaptive P3M code with Hierarchical Particle Reordering

Author: Anderson, Bagla, Balsara, Barnes, Becciani, Blumenthal, Bode, Boris, Brieu, Couchman, Dave, Decyk, Dubinski, Eastwood, Efstathiou, Evrard, Ferrell, Frenk, Frigo, Gingold, Greengard, H.M.P. Couchman, Hernquist, Hockney, Kawata, Kravtsov, Li, Lia, MacFarland, Miocchi, Monaghan, Navarro, Pearce, Robert J. Thacker, Serna, Snir, Spergel, Springel, Steinmetz, Sugimoto, Swarztrauber, Thacker, Theuns, Vetterling, Wadsley, White, Wisdom, Wood
Publication venue: 'Elsevier BV'
Publication date: 01/01/2005

We discuss the design and implementation of HYDRA_OMP, a parallel implementation of the Smoothed Particle Hydrodynamics-Adaptive P3M (SPH-AP3M) code HYDRA. The code is designed primarily for conducting cosmological hydrodynamic simulations and is written in Fortran77+OpenMP. A number of optimizations for RISC processors and SMP-NUMA architectures have been implemented, the most important being hierarchical reordering of particles within chaining cells, which greatly improves data locality, thereby removing the cache misses typically associated with linked lists. Parallel scaling is good, with a minimum parallel scaling of 73% achieved on 32 nodes for a variety of modern SMP architectures. We give performance data in terms of the number of particle updates per second, which is a more useful performance metric than raw MFlops. A basic version of the code will be made available to the community in the near future. Comment: 34 pages, 12 figures, accepted for publication in Computer Physics Communications

arXiv.org e-Print Archive

CiteSeerX

Crossref

CERN Document Server
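The central optimization in HYDRA_OMP is reordering particles so that those sharing a chaining cell are contiguous in memory, instead of being reached through linked-list pointers. A minimal sketch, assuming positions lie in a cube [0, box)^3 split into ncell^3 chaining cells, is a counting sort by cell index. It reorders at a single level only, whereas the paper's scheme is hierarchical, and the names `cell_of` and `reorder_by_cell` are hypothetical.

```python
"""Sketch: reorder particles by chaining cell so each cell's particles are contiguous."""


def cell_of(pos, box: float, ncell: int) -> int:
    """Flattened index of the chaining cell containing pos = (x, y, z), 0 <= x, y, z < box."""
    ix, iy, iz = (min(int(c / box * ncell), ncell - 1) for c in pos)
    return (ix * ncell + iy) * ncell + iz


def reorder_by_cell(positions, box: float, ncell: int):
    """Counting sort of particles by cell.

    Returns (new_positions, start, count): the particles of cell c are
    new_positions[start[c]:start[c] + count[c]], so a neighbour loop walks
    memory sequentially instead of chasing linked-list pointers.
    """
    ncells = ncell ** 3
    cells = [cell_of(p, box, ncell) for p in positions]
    count = [0] * ncells
    for c in cells:
        count[c] += 1
    start = [0] * ncells
    for c in range(1, ncells):
        start[c] = start[c - 1] + count[c - 1]
    fill = start[:]  # next free slot in each cell's block
    order = [0] * len(positions)
    for i, c in enumerate(cells):
        order[fill[c]] = i
        fill[c] += 1
    new_positions = [positions[i] for i in order]
    return new_positions, start, count
```

After reordering, the `head`/`next` arrays of a classic linked list are replaced by the `start`/`count` pair, and a sweep over a cell and its neighbors touches a few contiguous blocks of memory.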

Refficientlib: an efficient load-rebalanced adaptive mesh refinement algorithm for high-performance computational physics meshes

Author: Baiges Aznar Joan, Bayona Roa Camilo Andrés
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2017

In this paper we present a novel algorithm for adaptive mesh refinement in computational physics meshes in a distributed-memory parallel setting. The proposed method is developed for nodally based parallel domain partitions, where the nodes of the mesh belong to a single processor whereas the elements can belong to multiple processors. The main features of the algorithm are its capability to handle multiple element types in two and three dimensions (triangular, quadrilateral, tetrahedral, and hexahedral), the small amount of memory required per processor, and parallel scalability up to thousands of processors. The algorithm can also deal with nonbalanced hierarchical refinement, in which jumps of several refinement levels are possible between neighboring elements. An algorithm for load rebalancing is also presented, which allows the hierarchical data structure to be moved between processors so that load imbalance is kept below an acceptable level at all times during the simulation. A particular feature of the proposed algorithm is that arbitrary renumbering algorithms can be used in the load-rebalancing step, including both graph-partitioning and space-filling-curve renumbering algorithms. The algorithm is packaged in the Fortran 2003 object-oriented library RefficientLib, whose interface calls, which allow it to be used from any computational physics code, are summarized. Finally, numerical experiments illustrating the performance and scalability of the algorithm are presented. Peer Reviewed. Postprint (published version)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Scipedia
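The rebalancing step described above accepts any renumbering (graph partitioning or a space-filling curve) before the hierarchical data are redistributed. A minimal sketch of what could follow such a renumbering, assuming per-element work weights and a hypothetical imbalance tolerance `tol`: cut the renumbered element list into contiguous blocks of roughly equal weight, and repartition only when max load / mean load exceeds `tol`. The function names are illustrative and are not RefficientLib's interface.

```python
"""Sketch: weight-balanced contiguous partition of a renumbered element list."""


def partition_contiguous(weights, nparts: int):
    """Give element k (in renumbered order) to part floor(mid_k * nparts / total),
    where mid_k is the centre of its slice of the cumulative weight."""
    total = float(sum(weights))
    owner = []
    acc = 0.0
    for w in weights:
        part = int((acc + 0.5 * w) * nparts / total)
        owner.append(min(part, nparts - 1))
        acc += w
    return owner


def imbalance(weights, owner, nparts: int) -> float:
    """Max part load divided by mean part load (1.0 means perfectly balanced)."""
    loads = [0.0] * nparts
    for w, p in zip(weights, owner):
        loads[p] += w
    return max(loads) / (sum(loads) / nparts)


def rebalance_if_needed(weights, owner, nparts: int, tol: float = 1.1):
    """Return a new owner list if the imbalance exceeds tol, else the old one."""
    if imbalance(weights, owner, nparts) > tol:
        return partition_contiguous(weights, nparts)
    return owner


if __name__ == "__main__":
    weights = [1.0] * 12
    owner = partition_contiguous(weights, 3)  # three blocks of 4 elements
    weights[:4] = [4.0] * 4                   # the first 4 elements were refined
    print(imbalance(weights, owner, 3))       # 2.0
    owner = rebalance_if_needed(weights, owner, 3)
    print(owner)  # [0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]
```

Because the assigned part is nondecreasing along the renumbered list, each processor receives one contiguous range, which is why the quality of the renumbering (graph-based or space-filling) determines how compact the resulting subdomains are.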

Parallel TreeSPH

Author: Aarseth, Barnes, Bryan, Colella, Couchman, Dikaiakos, Dubinski, Efstathiou, Evrard, Ewald, Gammie, Gingold, Gnedin, Hernquist, Heyl, Hockney, Jenkins, John Dubinski, Kang, Katz, Kauffmann, Kundic, Lars Hernquist, Lucy, Mihos, Monaghan, Pearce, Pen, Press, Romeel Davé, Ryu, Salmon, Somerville, Steinmetz, Villumsen, Warren, Weinberg, Xu, Zel'dovich
Publication venue: 'Elsevier BV'
Publication date: 16/01/1997

We describe PTreeSPH, a gravity treecode combined with an SPH hydrodynamics code designed for massively parallel supercomputers with distributed memory. Our computational algorithm is based on the popular TreeSPH code of Hernquist & Katz (1989). PTreeSPH uses a domain decomposition procedure and a synchronous hypercube communication paradigm to build self-contained subvolumes of the simulation on each processor at every timestep. Computations then proceed in a manner analogous to a serial code. We use the Message Passing Interface (MPI) communications package, making our code easily portable to a variety of parallel systems. PTreeSPH uses individual smoothing lengths and timesteps, with a communication algorithm designed to minimize the exchange of information while still providing all the information required to accurately perform SPH computations. We have additionally incorporated cosmology, periodic boundary conditions with forces calculated using a quadrupole Ewald summation method, and radiative cooling and heating from a parameterized ionizing background following Katz, Weinberg & Hernquist (1996). The addition of other physical processes, such as star formation, is straightforward. A cosmological simulation from z=49 to z=2 with 64^3 gas particles and 64^3 dark matter particles requires ~6000 node-hours on a Cray T3D, with a communication overhead of ~10%, and is load-balanced to the ~90% level. When used on the new Cray T3E, this code will be capable of performing cosmological hydrodynamical simulations down to z=0 with ~2x10^6 particles, or to z=2 with ~10^7 particles, in a reasonable amount of time. Even larger simulations will be practical in situations where the matter is not highly clustered or when periodic boundaries are not required. Comment: 30 pages, 6 PostScript figures, submitted to New Astronomy

arXiv.org e-Print Archive

Crossref

CERN Document Server
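PTreeSPH combines individual smoothing lengths with periodic boundary conditions. A minimal serial sketch of the SPH density estimate behind this, assuming the standard 3D cubic-spline kernel with support radius 2h, a minimum-image distance in a periodic cube, and a hypothetical rule that sets each h_i so the sphere of radius 2h_i reaches the particle's `n_ngb`-th nearest neighbor. The brute-force O(N^2) neighbor search is for illustration only; the paper's code uses a tree and distributes the work with MPI, and all function names here are hypothetical.

```python
"""Sketch: SPH density with individual smoothing lengths in a periodic cube."""
import math


def cubic_spline_w(r: float, h: float) -> float:
    """3D cubic-spline kernel with support radius 2h (normalised to unit integral)."""
    q = r / h
    sigma = 1.0 / (math.pi * h ** 3)
    if q < 1.0:
        return sigma * (1.0 - 1.5 * q ** 2 + 0.75 * q ** 3)
    if q < 2.0:
        return sigma * 0.25 * (2.0 - q) ** 3
    return 0.0


def periodic_distance(a, b, box: float) -> float:
    """Minimum-image distance between points a and b in a periodic cube of side box."""
    d2 = 0.0
    for ai, bi in zip(a, b):
        dx = ai - bi
        dx -= box * round(dx / box)
        d2 += dx * dx
    return math.sqrt(d2)


def smoothing_lengths(pos, box: float, n_ngb: int):
    """h_i = half the distance to the n_ngb-th nearest neighbour, so 2h_i reaches it."""
    hs = []
    for i, p in enumerate(pos):
        dists = sorted(periodic_distance(p, q, box) for j, q in enumerate(pos) if j != i)
        hs.append(0.5 * dists[n_ngb - 1])
    return hs


def sph_density(pos, mass, box: float, hs):
    """rho_i = sum_j m_j W(|r_i - r_j|, h_i), including the self term j = i."""
    return [
        sum(m * cubic_spline_w(periodic_distance(p, q, box), hs[i]) for q, m in zip(pos, mass))
        for i, p in enumerate(pos)
    ]
```

As a quick check, a 6 x 6 x 6 lattice of equal masses summing to 1 in a unit periodic box, with `n_ngb = 32`, should give every particle a density very close to 1.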

The DUNE-ALUGrid Module

Author: Alkämper Martin, Dedner Andreas, Klöfkorn Robert, Nolte Martin
Publication date: 15/08/2015

In this paper we present the new DUNE-ALUGrid module. This module contains a major overhaul of the sources from the ALUGrid library and of its binding to the DUNE software framework. The main changes include user-defined load balancing, parallel grid construction, and a redesign of the 2d grid, which can now also be used for parallel computations. In addition, many improvements have been introduced into the code to increase parallel efficiency and to decrease the memory footprint. The original ALUGrid library is widely used within the DUNE community due to its good parallel performance for problems requiring local adaptivity and dynamic load balancing; therefore, this new module will benefit a number of DUNE users. We have also added features that increase the range of problems the grid manager can handle, for example a 3d tetrahedral grid that uses a parallel newest vertex bisection algorithm for conforming grid refinement. In this paper we discuss the new features and the extensions to the DUNE interface, and explain with various examples how the code is used in parallel environments. Comment: 25 pages, 11 figures

arXiv.org e-Print Archive

UiS Brage
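The abstract mentions newest vertex bisection for conforming refinement. A minimal 2D sketch, assuming each triangle is stored as (a, b, c) with c its newest vertex, so that edge (a, b) is its refinement edge. ALUGrid applies the idea to tetrahedra and in parallel, and the closure step that also bisects neighbors to remove hanging nodes is omitted here. `bisect_triangle` and `area` are hypothetical names.

```python
"""Sketch: 2D newest vertex bisection (serial, no closure step)."""
from fractions import Fraction


def bisect_triangle(tri):
    """Split tri = (a, b, c), c newest, at the midpoint m of its refinement edge (a, b).

    Both children list m last, so m becomes their newest vertex and each
    child's refinement edge is the one opposite m.
    """
    a, b, c = tri
    m = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    return (c, a, m), (b, c, m)


def area(tri):
    """Unsigned triangle area via the shoelace formula."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    return abs((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)) / 2


if __name__ == "__main__":
    F = Fraction
    # Unit square split along its diagonal; the diagonal is the refinement
    # edge of both triangles, so both bisections insert the same midpoint.
    t1 = ((F(0), F(0)), (F(1), F(1)), (F(1), F(0)))  # newest vertex (1, 0)
    t2 = ((F(1), F(1)), (F(0), F(0)), (F(0), F(1)))  # newest vertex (0, 1)
    children = [*bisect_triangle(t1), *bisect_triangle(t2)]
    print(len(children), sum(area(t) for t in children))
```

Exact `Fraction` coordinates are used so that midpoints created from either side of a shared edge compare equal, which is what makes it easy to confirm that no hanging node appears when both triangles sharing a refinement edge are bisected.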

An adaptive fixed-mesh ALE method for free surface flows

Author: Baiges Aznar Joan, Castillo Ernesto, Codina Ramon, Pont Ribas Arnau
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017

In this work we present a Fixed-Mesh ALE method for the numerical simulation of free surface flows that can use an adaptive finite element mesh covering a background domain. This mesh is refined and unrefined at each time step to concentrate the computational effort in the spatial regions where it is required. The main ingredients of the formulation are: the use of an Arbitrary-Lagrangian–Eulerian formulation for computing temporal derivatives; the use of stabilization terms to stabilize convection, the lack of compatibility between the velocity and pressure interpolation spaces, and the ill-conditioning introduced by the cuts on the background finite element mesh; and the coupling of the algorithm with an adaptive mesh refinement procedure suitable for running in distributed-memory environments. Algorithmic steps for the projection between meshes are presented, together with the algebraic fractional-step approach used to improve the condition number of the linear systems to be solved. The method is tested in several numerical examples, and the expected convergence rates in both space and time are observed. Smooth solution fields are obtained for both velocity and pressure, as a result of the contribution of the stabilization terms. Finally, good agreement between the numerical results and reference experimental data is obtained. Postprint (published version)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Scipedia
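Because the background mesh is refined and unrefined at every time step, nodal fields have to be carried from the old mesh to the new one. The paper presents its own projection steps; as a much simpler stand-in, the sketch below assumes a 1D mesh with piecewise-linear nodal fields, refines marked elements by inserting midpoints, and transfers the field by linear interpolation. The names `refine` and `transfer` are hypothetical.

```python
"""Sketch: 1D mesh refinement and nodal-field transfer by linear interpolation."""
from bisect import bisect_right


def refine(nodes, marked):
    """Insert the midpoint of every element e = (nodes[e], nodes[e + 1]) with e in marked."""
    out = [nodes[0]]
    for e in range(len(nodes) - 1):
        if e in marked:
            out.append(0.5 * (nodes[e] + nodes[e + 1]))
        out.append(nodes[e + 1])
    return out


def transfer(old_nodes, old_values, new_nodes):
    """Evaluate the old piecewise-linear field at each new node."""
    new_values = []
    for x in new_nodes:
        k = bisect_right(old_nodes, x) - 1
        k = max(0, min(k, len(old_nodes) - 2))  # clamp to a valid element
        x0, x1 = old_nodes[k], old_nodes[k + 1]
        t = (x - x0) / (x1 - x0)
        new_values.append((1.0 - t) * old_values[k] + t * old_values[k + 1])
    return new_values


if __name__ == "__main__":
    nodes = [0.0, 0.25, 0.5, 1.0]
    values = [2.0 * x + 1.0 for x in nodes]
    fine = refine(nodes, {0, 2})
    print(fine, transfer(nodes, values, fine))
```

Interpolation reproduces linear fields exactly in both directions (refining and coarsening), but it is not conservative in general; a formulation that must preserve integrals across meshes would need a projection instead.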

Energy Efficient Ant Colony Algorithms for Data Aggregation in Wireless Sensor Networks

Author: Li Mingchu, Lin Chi, Pei Zhongyi, Wu Guowei, Xia Feng, Yao Lin
Publication date: 30/12/2011

In this paper we present DAACA, a family of ant colony algorithms for data aggregation that comprises three phases: initialization, packet transmission, and pheromone operations. After initialization, each node estimates the remaining energy and the amount of pheromone to compute the probabilities used for dynamically selecting the next hop. After a certain number of transmission rounds, pheromone adjustment is performed periodically, combining the advantages of both global and local pheromone adjustment for evaporating or depositing pheromone. Four different pheromone adjustment strategies, namely Basic-DAACA, ES-DAACA, MM-DAACA and ACS-DAACA, are designed to achieve a globally optimal network lifetime. Compared with several other data aggregation algorithms, DAACA performs better in terms of average node degree, energy efficiency, network lifetime, computational complexity, and one-hop transmission success ratio. Finally, we analyze the robustness, fault tolerance, and scalability of DAACA. Comment: To appear in Journal of Computer and System Sciences

arXiv.org e-Print Archive

Elsevier - Publisher Connector
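In DAACA, each node picks its next hop from pheromone levels and the neighbors' remaining energy, and pheromone is periodically evaporated and deposited. A minimal sketch of that loop, assuming the generic ant-colony rule p_j proportional to tau_j^alpha * E_j^beta and a single evaporation rate rho. The parameters `alpha`, `beta`, `rho` and `deposit` are hypothetical, and the four DAACA variants differ precisely in how the adjustment step works, which this sketch does not reproduce.

```python
"""Sketch: ant-colony next-hop selection weighted by pheromone and residual energy."""
import random


def next_hop_probabilities(neighbors, pheromone, energy, alpha=1.0, beta=2.0):
    """p_j = tau_j**alpha * E_j**beta, normalised over the candidate neighbours."""
    scores = {j: pheromone[j] ** alpha * energy[j] ** beta for j in neighbors}
    total = sum(scores.values())
    return {j: s / total for j, s in scores.items()}


def choose_next_hop(neighbors, pheromone, energy, rng, alpha=1.0, beta=2.0):
    """Sample one neighbour according to next_hop_probabilities."""
    probs = next_hop_probabilities(neighbors, pheromone, energy, alpha, beta)
    return rng.choices(neighbors, weights=[probs[j] for j in neighbors], k=1)[0]


def adjust_pheromone(pheromone, used_hops, rho=0.1, deposit=1.0):
    """Evaporate every trail by a factor (1 - rho), then reinforce the hops just used."""
    for j in pheromone:
        pheromone[j] *= 1.0 - rho
    for j in used_hops:
        pheromone[j] += deposit


if __name__ == "__main__":
    rng = random.Random(42)
    nbrs = ["A", "B"]
    tau = {"A": 1.0, "B": 1.0}
    energy = {"A": 1.0, "B": 2.0}
    print(next_hop_probabilities(nbrs, tau, energy))  # {'A': 0.2, 'B': 0.8}
    hop = choose_next_hop(nbrs, tau, energy, rng)
    adjust_pheromone(tau, [hop])
```

Raising energy to the power beta steers traffic away from nearly depleted nodes, which is how this kind of rule aims to extend network lifetime; evaporation keeps old trails from locking in a route after the energy landscape has changed.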