On the impact of communication complexity in the design of parallel numerical algorithms
This paper describes two models of the cost of data movement in parallel numerical algorithms. The first, a generalization of an approach due to Hockney, is suitable for shared-memory multiprocessors in which each processor has vector capabilities. The second applies to highly parallel, nonshared-memory MIMD systems; in this model, algorithm performance is characterized in terms of the design of the communication network. Techniques from VLSI complexity theory are also applied, and algorithm-independent upper bounds on system performance are derived for several problems that are important to scientific computation.
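As background for the first model, the classical two-parameter Hockney model describes an operation on n elements (a vector operation or a data transfer) by an asymptotic rate r_inf and a half-performance length n_half, giving a time t(n) = (n + n_half) / r_inf. The minimal sketch below assumes only this basic form; the paper's generalization is not reproduced, and the function names and parameter values are hypothetical.

```python
def hockney_time(n, r_inf, n_half):
    """Time for n elements under the two-parameter Hockney model.

    t(n) = (n + n_half) / r_inf: a start-up cost t0 = n_half / r_inf
    plus n elements streamed at the asymptotic rate r_inf.
    """
    return (n + n_half) / r_inf


def effective_rate(n, r_inf, n_half):
    """Achieved rate n / t(n); it equals r_inf / 2 exactly when n == n_half."""
    return n / hockney_time(n, r_inf, n_half)


if __name__ == "__main__":
    # Hypothetical machine: r_inf = 100 elements per time unit, n_half = 100.
    r_inf, n_half = 100.0, 100.0
    for n in (10, 100, 1000, 10000):
        print(n, hockney_time(n, r_inf, n_half), effective_rate(n, r_inf, n_half))
```

Short transfers are dominated by the start-up term, so their effective rate sits far below r_inf; this is the kind of trade-off such cost models make explicit.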
A study of the communication cost of the FFT on torus multicomputers
The computation of a one-dimensional FFT on a c-dimensional torus multicomputer is analyzed. Three approaches are proposed, which differ in how they use the interconnection network. The first approach is based on the multidimensional index mapping technique for the FFT computation. The second approach starts from a hypercube algorithm and then embeds the hypercube onto the torus. The third approach reduces the communication cost of the hypercube algorithm by pipelining the communication operations. A novel methodology for pipelining communication operations on a torus is proposed. Analytical models are presented to compare the different approaches. The comparison shows that the best approach depends on the number of dimensions of the torus and on the communication start-up and transfer times. The analytical models therefore allow the most efficient approach to be selected for the available machine.
Peer Reviewed. Postprint (published version).
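The multidimensional index mapping used by the first approach can be illustrated serially. Writing N = n1*n2, n = n2*a + b and k = k1 + n1*k2 turns one length-N DFT into n2 DFTs of length n1, a twiddle-factor multiplication, and n1 DFTs of length n2 (often called the four-step decomposition). In a distributed layout, the regrouping between the two stages is typically where data must move between processors. The sketch below is a single-process Python version with naive O(N^2) DFTs for clarity; it is not the paper's torus algorithm, and the function names are hypothetical.

```python
import cmath


def dft(x):
    """Naive O(N^2) DFT: X[k] = sum_j x[j] * exp(-2*pi*i*j*k/N)."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def index_mapped_fft(x, n1, n2):
    """Length-N DFT via the mapping n = n2*a + b, k = k1 + n1*k2, with N = n1*n2."""
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")
    # Stage 1: for each b, a length-n1 DFT over a, then the twiddle W_N^(b*k1).
    stage1 = []
    for b in range(n2):
        col = dft([x[n2 * a + b] for a in range(n1)])
        stage1.append(
            [col[k1] * cmath.exp(-2j * cmath.pi * b * k1 / n) for k1 in range(n1)]
        )
    # Regrouping: stage 2 needs, for each k1, the values produced for every b.
    # On a distributed machine this is the step that requires communication.
    out = [0j] * n
    for k1 in range(n1):
        row = dft([stage1[b][k1] for b in range(n2)])  # length-n2 DFT over b
        for k2 in range(n2):
            out[k1 + n1 * k2] = row[k2]
    return out


if __name__ == "__main__":
    x = [complex(i % 5, 0.0) for i in range(12)]
    ref = dft(x)
    got = index_mapped_fft(x, 3, 4)
    print(max(abs(g - r) for g, r in zip(got, ref)))  # should be round-off sized
```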
Cosmological Simulations Using Special Purpose Computers: Implementing P3M on Grape
An adaptation of the Particle-Particle/Particle-Mesh (P3M) code to the
special-purpose hardware GRAPE is presented. The short-range force is
calculated by a four-chip GRAPE-3A board, while the rest of the calculation is
performed on a Sun Sparc 10/51 workstation. The limited precision of the GRAPE
hardware and algorithm constraints introduce stochastic errors of the order of
a few percent in the gravitational forces. Tests of this new P3MG3A code show
that it is a robust tool for cosmological simulations. The code currently
achieves a peak speed of one third that of the vectorized P3M code on
a Cray C-90, and significant improvements are planned in the near future.
Special-purpose computers like GRAPE are therefore an attractive alternative to
supercomputers for numerical cosmology.
Comment: 9 pages (ApJS style); uuencoded compressed PostScript file (371 kb).
Also available by anonymous ftp to astro.Princeton.EDU [128.112.24.45] in:
summers/grape/p3mg3a.ps (668 kb) and on the WWW at:
http://astro.Princeton.EDU/~library/prep.html (as POPe-600).
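GRAPE-type boards accelerate direct summation of pairwise softened gravitational forces. As a rough sketch of the particle-particle (short-range) part that P3M delegates to such hardware, the function below sums Plummer-softened forces (G = 1) from neighbors inside a cutoff radius. The plain softened 1/r^2 law, the sharp cutoff, and the names are assumptions: a real P3M code uses a shaped short-range force that complements the mesh force, and the hardware evaluates the sums at limited precision.

```python
def short_range_accel(pos, mass, eps, r_cut):
    """Direct-sum, Plummer-softened accelerations (G = 1) from neighbors within r_cut.

    pos: list of (x, y, z) tuples; mass: list of floats; eps: softening length.
    Assumption: a softened 1/r^2 law with a sharp cutoff stands in for the
    shaped short-range force of a real P3M code. O(N^2) loop for clarity.
    With eps == 0, two coincident particles raise ZeroDivisionError.
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    r_cut2 = r_cut * r_cut
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = dx[0] * dx[0] + dx[1] * dx[1] + dx[2] * dx[2]
            if r2 > r_cut2:
                continue  # beyond the cutoff: left to the mesh (PM) part
            inv_r3 = (r2 + eps * eps) ** -1.5
            for k in range(3):
                acc[i][k] += mass[j] * dx[k] * inv_r3  # pull toward particle j
    return acc


if __name__ == "__main__":
    pos = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
    print(short_range_accel(pos, [1.0, 1.0, 1.0], eps=0.01, r_cut=2.0))
```

Particle pairs farther apart than r_cut contribute nothing here, which keeps the direct sum local; in P3M the long-range contribution is supplied by the mesh calculation on the host.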
A low-cost parallel implementation of direct numerical simulation of wall turbulence
A numerical method for the direct numerical simulation of incompressible wall
turbulence in rectangular and cylindrical geometries is presented. Its
distinctive feature is a design targeted at efficient distributed-memory
parallel computing on commodity hardware. The adopted
discretization is spectral in the two homogeneous directions; fourth-order
accurate, compact finite-difference schemes over a variable-spacing mesh in the
wall-normal direction are key to our parallel implementation. The parallel
algorithm is designed in such a way as to minimize data exchange among the
computing machines, and in particular to avoid taking a global transpose of the
data during the pseudo-spectral evaluation of the non-linear terms. The
computing machines can then be connected to each other through low-cost network
devices. The code is optimized for memory usage, and its memory requirements can
be divided among the computing nodes. The layout of a simple, dedicated and
optimized computing system based on commodity hardware is described. The
performance of the numerical method on this computing system is evaluated and
compared with that of other codes described in the literature, as well as with
that of the same code implementing a commonly employed strategy for the
pseudo-spectral calculation.
Comment: To be published in J. Comp. Physics
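The fourth-order compact finite-difference schemes described above compute derivatives implicitly by coupling neighboring derivative values. A minimal sketch of the classical fourth-order tridiagonal (Padé) first-derivative scheme, (1/4) f'_{i-1} + f'_i + (1/4) f'_{i+1} = (3/2) (f_{i+1} - f_{i-1}) / (2h), is shown below. The uniform periodic grid and the small dense solver are simplifications assumed for brevity: the code described here applies such schemes on a variable-spacing wall-normal mesh bounded by walls, which needs non-uniform coefficients and boundary closures not shown.

```python
import math


def solve_dense(a, b):
    """Gaussian elimination without pivoting (safe for diagonally dominant a)."""
    n = len(b)
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            if factor == 0.0:
                continue
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = b[i] - sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / a[i][i]
    return x


def compact_derivative_periodic(f, h):
    """Fourth-order compact first derivative of periodic samples f with spacing h."""
    n = len(f)
    alpha, coef = 0.25, 1.5  # the classical fourth-order tridiagonal choice
    lhs = [[0.0] * n for _ in range(n)]
    rhs = [0.0] * n
    for i in range(n):
        lhs[i][i] = 1.0
        lhs[i][(i - 1) % n] += alpha  # periodic wrap-around couples the ends
        lhs[i][(i + 1) % n] += alpha
        rhs[i] = coef * (f[(i + 1) % n] - f[(i - 1) % n]) / (2.0 * h)
    return solve_dense(lhs, rhs)


if __name__ == "__main__":
    n = 32
    h = 2.0 * math.pi / n
    f = [math.sin(i * h) for i in range(n)]
    d = compact_derivative_periodic(f, h)
    print(max(abs(d[i] - math.cos(i * h)) for i in range(n)))
```

Because each derivative value depends on all samples through the linear solve, such schemes are usually applied along a direction whose data are available locally or solved with a parallel tridiagonal method; how the paper arranges this is not reproduced here.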
The cosmological simulation code GADGET-2
We discuss the cosmological simulation code GADGET-2, a new massively
parallel TreeSPH code, capable of following a collisionless fluid with the
N-body method, and an ideal gas by means of smoothed particle hydrodynamics
(SPH). Our implementation of SPH manifestly conserves energy and entropy in
regions free of dissipation, while allowing for fully adaptive smoothing
lengths. Gravitational forces are computed with a hierarchical multipole
expansion, which can optionally be applied in the form of a TreePM algorithm,
where only short-range forces are computed with the 'tree' method, while
long-range forces are determined with Fourier techniques. Time integration is
based on a quasi-symplectic scheme where long-range and short-range forces can
be integrated with different timesteps. Individual and adaptive short-range
timesteps may also be employed. The domain decomposition used in the
parallelisation algorithm is based on a space-filling curve, resulting in high
flexibility and tree force errors that do not depend on the way the domains are
cut. The code is efficient in terms of memory consumption and required
communication bandwidth. It has been used to compute the first cosmological
N-body simulation with more than 10^10 dark matter particles, reaching a
homogeneous spatial dynamic range of 10^5 per dimension in a 3D box. It has
also been used to carry out very large cosmological SPH simulations that
account for radiative cooling and star formation, reaching total particle
numbers of more than 250 million. We present the algorithms used by the code
and discuss their accuracy and performance using a number of test problems.
GADGET-2 is publicly released to the research community.
Comment: submitted to MNRAS, 31 pages, 20 figures (reduced resolution), code
available at http://www.mpa-garching.mpg.de/gadge
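The TreePM idea of splitting gravity into a short-range part computed with the tree and a long-range part computed with Fourier techniques can be sketched with a Gaussian split. The form assumed here multiplies the Newtonian potential by exp(-k^2 r_s^2) in Fourier space for the long-range part, which leaves a real-space short-range pair force of G m / r^2 [erfc(r / (2 r_s)) + r / (sqrt(pi) r_s) exp(-r^2 / (4 r_s^2))]. The split scale r_s and the function names are illustrative assumptions; the sketch only shows that the short-range part approaches the full Newtonian force at small r and vanishes within a few r_s.

```python
import math


def short_range_factor(r, r_s):
    """Fraction of the Newtonian pair force assigned to the short-range (tree) part.

    Assumes a Gaussian split: the long-range potential is -G m erf(r / (2 r_s)) / r,
    so the short-range force is G m / r**2 times this factor.
    """
    u = r / (2.0 * r_s)
    return math.erfc(u) + (r / (r_s * math.sqrt(math.pi))) * math.exp(-u * u)


def split_pair_force(r, r_s, G=1.0, m=1.0):
    """Return (short, long_part) force magnitudes; they sum to G m / r**2."""
    newton = G * m / (r * r)
    short = newton * short_range_factor(r, r_s)
    long_part = newton - short
    return short, long_part


if __name__ == "__main__":
    r_s = 1.0
    for r in (0.1, 1.0, 2.0, 5.0, 10.0):
        print(r, short_range_factor(r, r_s))
```

Since the short-range factor is negligible beyond a few r_s, the tree walk can be restricted to a local region, while the smooth long-range part is well suited to a Fourier-space solution on a mesh.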