3,024 research outputs found
Recommended from our members
Applying an abstract data structure description approach to parallelizing scientific pointer programs
Even though impressive progress has been made in the area of parallelizing scientific programs with arrays, the application of similar techniques to programs with pointer data structures has remained difficult. Unlike arrays which have a small number of well-defined properties that can be utilized by a parallelizing compiler, pointer data structures are used to implement a wide variety of structures that exhibit a much more diverse set of properties. The complexity and diversity of such properties means that, in general, scientific programs with pointer data structures cannot be effectively analyzed by an optimizing and parallelizing compiler.In order to provide a system in which the compiler can fully utilize the properties of different types of pointer data structures, we have developed a mechanism for the Abstract Description of Data Structures (ADDS). With our approach, the programmer can explicitly describe important properties such as dimensionality of the pointer data structure, independence of dimensions, and direction of traversal. These abstract descriptions of pointer data structures are then used by the compiler to guide analysis, optimization, and parallelization.In this paper we summarize the ADDS approach through the use of numerous examples of data structures used in scientific computations, we illustrate how such declarations are natural and non-tedious to specify, and we show how the ADDS declarations can be used to improve compile-time analysis. In order to demonstrate the viability of our approach, we show how such techniques can be used to parallelize an important class of scientific codes which naturally use recursive pointer data structures. In particular, we use our approach to develop the parallelization of an N-body simulation that is based on a relatively complicated pointer data structure, and we report the speedup results for a Sequent multiprocessor
GADGET: A code for collisionless and gasdynamical cosmological simulations
We describe the newly written code GADGET which is suitable both for
cosmological simulations of structure formation and for the simulation of
interacting galaxies. GADGET evolves self-gravitating collisionless fluids with
the traditional N-body approach, and a collisional gas by smoothed particle
hydrodynamics. Along with the serial version of the code, we discuss a parallel
version that has been designed to run on massively parallel supercomputers with
distributed memory. While both versions use a tree algorithm to compute
gravitational forces, the serial version of GADGET can optionally employ the
special-purpose hardware GRAPE instead of the tree. Periodic boundary
conditions are supported by means of an Ewald summation technique. The code
uses individual and adaptive timesteps for all particles, and it combines this
with a scheme for dynamic tree updates. Due to its Lagrangian nature, GADGET
thus allows a very large dynamic range to be bridged, both in space and time.
So far, GADGET has been successfully used to run simulations with up to 7.5e7
particles, including cosmological studies of large-scale structure formation,
high-resolution simulations of the formation of clusters of galaxies, as well
as workstation-sized problems of interacting galaxies. In this study, we detail
the numerical algorithms employed, and show various tests of the code. We
publically release both the serial and the massively parallel version of the
code.Comment: 32 pages, 14 figures, replaced to match published version in New
Astronomy. For download of the code, see
http://www.mpa-garching.mpg.de/gadget (new version 1.1 available
Tangos: the agile numerical galaxy organization system
We present Tangos, a Python framework and web interface for database-driven
analysis of numerical structure formation simulations. To understand the role
that such a tool can play, consider constructing a history for the absolute
magnitude of each galaxy within a simulation. The magnitudes must first be
calculated for all halos at all timesteps and then linked using a merger tree;
folding the required information into a final analysis can entail significant
effort. Tangos is a generic solution to this information organization problem,
aiming to free users from the details of data management. At the querying
stage, our example of gathering properties over history is reduced to a few
clicks or a simple, single-line Python command. The framework is highly
extensible; in particular, users are expected to define their own properties
which tangos will write into the database. A variety of parallelization options
are available and the raw simulation data can be read using existing libraries
such as pynbody or yt. Finally, tangos-based databases and analysis pipelines
can easily be shared with collaborators or the broader community to ensure
reproducibility. User documentation is provided separately.Comment: Clarified various points and further improved code performance;
accepted for publication in ApJS. Tutorials (including video) at
http://tiny.cc/tango
A Parallel Adaptive P3M code with Hierarchical Particle Reordering
We discuss the design and implementation of HYDRA_OMP a parallel
implementation of the Smoothed Particle Hydrodynamics-Adaptive P3M (SPH-AP3M)
code HYDRA. The code is designed primarily for conducting cosmological
hydrodynamic simulations and is written in Fortran77+OpenMP. A number of
optimizations for RISC processors and SMP-NUMA architectures have been
implemented, the most important optimization being hierarchical reordering of
particles within chaining cells, which greatly improves data locality thereby
removing the cache misses typically associated with linked lists. Parallel
scaling is good, with a minimum parallel scaling of 73% achieved on 32 nodes
for a variety of modern SMP architectures. We give performance data in terms of
the number of particle updates per second, which is a more useful performance
metric than raw MFlops. A basic version of the code will be made available to
the community in the near future.Comment: 34 pages, 12 figures, accepted for publication in Computer Physics
Communication
SPH-EXA: Enhancing the Scalability of SPH codes Via an Exascale-Ready SPH Mini-App
Numerical simulations of fluids in astrophysics and computational fluid
dynamics (CFD) are among the most computationally-demanding calculations, in
terms of sustained floating-point operations per second, or FLOP/s. It is
expected that these numerical simulations will significantly benefit from the
future Exascale computing infrastructures, that will perform 10^18 FLOP/s. The
performance of the SPH codes is, in general, adversely impacted by several
factors, such as multiple time-stepping, long-range interactions, and/or
boundary conditions. In this work an extensive study of three SPH
implementations SPHYNX, ChaNGa, and XXX is performed, to gain insights and to
expose any limitations and characteristics of the codes. These codes are the
starting point of an interdisciplinary co-design project, SPH-EXA, for the
development of an Exascale-ready SPH mini-app. We implemented a rotating square
patch as a joint test simulation for the three SPH codes and analyzed their
performance on a modern HPC system, Piz Daint. The performance profiling and
scalability analysis conducted on the three parent codes allowed to expose
their performance issues, such as load imbalance, both in MPI and OpenMP.
Two-level load balancing has been successfully applied to SPHYNX to overcome
its load imbalance. The performance analysis shapes and drives the design of
the SPH-EXA mini-app towards the use of efficient parallelization methods,
fault-tolerance mechanisms, and load balancing approaches.Comment: arXiv admin note: substantial text overlap with arXiv:1809.0801
A Massive Data Parallel Computational Framework for Petascale/Exascale Hybrid Computer Systems
Heterogeneous systems are becoming more common on High Performance Computing
(HPC) systems. Even using tools like CUDA and OpenCL it is a non-trivial task
to obtain optimal performance on the GPU. Approaches to simplifying this task
include Merge (a library based framework for heterogeneous multi-core systems),
Zippy (a framework for parallel execution of codes on multiple GPUs), BSGP (a
new programming language for general purpose computation on the GPU) and
CUDA-lite (an enhancement to CUDA that transforms code based on annotations).
In addition, efforts are underway to improve compiler tools for automatic
parallelization and optimization of affine loop nests for GPUs and for
automatic translation of OpenMP parallelized codes to CUDA.
In this paper we present an alternative approach: a new computational
framework for the development of massively data parallel scientific codes
applications suitable for use on such petascale/exascale hybrid systems built
upon the highly scalable Cactus framework. As the first non-trivial
demonstration of its usefulness, we successfully developed a new 3D CFD code
that achieves improved performance.Comment: Parallel Computing 2011 (ParCo2011), 30 August -- 2 September 2011,
Ghent, Belgiu
- …