859 research outputs found

On the impact of communication complexity in the design of parallel numerical algorithms

Author: Gannon D., Vanrosendale J.

This paper describes two models of the cost of data movement in parallel numerical algorithms. One model is a generalization of an approach due to Hockney and is suitable for shared-memory multiprocessors where each processor has vector capabilities. The other model is applicable to highly parallel nonshared-memory MIMD systems. In the second model, algorithm performance is characterized in terms of the communication network design. Techniques used in VLSI complexity theory are also brought in, and algorithm-independent upper bounds on system performance are derived for several problems that are important to scientific computation.

NASA Technical Reports Server

A study of the communication cost of the FFT on torus multicomputers

Author: Díaz de Cerio Ripalda Luis Manuel, González Colás Antonio María, Valero García Miguel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1995

The computation of a one-dimensional FFT on a c-dimensional torus multicomputer is analyzed. Three approaches are proposed, which differ in the way they use the interconnection network. The first approach is based on the multidimensional index mapping technique for the FFT computation. The second approach starts from a hypercube algorithm and then embeds the hypercube onto the torus. The third approach reduces the communication cost of the hypercube algorithm by pipelining the communication operations. A novel methodology to pipeline the communication operations on a torus is proposed. Analytical models are presented to compare the different approaches. This comparison shows that the best approach depends on the number of dimensions of the torus and on the communication start-up and transfer times. The analytical models allow us to select the most efficient approach for the available machine.

Peer Reviewed. Postprint (published version)
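
The abstract above names communication start-up and transfer times as the deciding parameters. A minimal sketch of how such a model can be evaluated is given here, assuming a linear per-message cost and the textbook binary-exchange FFT on a hypercube. The function names and parameters (`message_time`, `hypercube_fft_comm_time`, `t_startup`, `t_word`) are illustrative assumptions and are not taken from the paper, and the extra hop distances introduced by embedding the hypercube onto a torus are not modeled.

```python
def message_time(words, t_startup, t_word):
    """Linear cost model: fixed start-up time plus per-word transfer time."""
    return t_startup + words * t_word


def hypercube_fft_comm_time(n_points, n_procs, t_startup, t_word):
    """Estimated communication time of a binary-exchange FFT on a hypercube.

    Assumptions (illustrative, not from the paper):
    - n_procs is a power of two and divides n_points,
    - each processor holds one block of n_points / n_procs values,
    - each of the log2(n_procs) butterfly stages that cross processor
      boundaries exchanges the full local block with one neighbour.
    """
    if n_procs < 1 or n_procs & (n_procs - 1):
        raise ValueError("n_procs must be a power of two")
    if n_points % n_procs:
        raise ValueError("n_procs must divide n_points")
    stages = n_procs.bit_length() - 1  # log2(n_procs)
    block = n_points // n_procs        # words sent per stage
    return stages * message_time(block, t_startup, t_word)


if __name__ == "__main__":
    # Start-up-dominated versus transfer-dominated parameter choices.
    for t_s, t_w in [(100.0, 0.1), (1.0, 1.0)]:
        t = hypercube_fft_comm_time(1 << 16, 64, t_s, t_w)
        print(f"t_startup={t_s}, t_word={t_w}: {t:.1f} time units")
```

Varying `t_startup` relative to `t_word` in this sketch shows why the ranking of approaches can change from one machine to another: with a large start-up time the number of messages dominates, while with a large transfer time the volume of data moved dominates.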

UPCommons. Portal del coneixement obert de la UPC

Cosmological Simulations Using Special Purpose Computers: Implementing P3M on Grape

Author: Brieu Philippe P., Ostriker Jeremiah P., Summers FJ
Publication venue: 'University of Chicago Press'
Publication date: 31/10/1994

An adaptation of the Particle-Particle/Particle-Mesh (P3M) code to the special-purpose hardware GRAPE is presented. The short-range force is calculated by a four-chip GRAPE-3A board, while the rest of the calculation is performed on a Sun Sparc 10/51 workstation. The limited precision of the GRAPE hardware and algorithm constraints introduce stochastic errors of the order of a few percent in the gravitational forces. Tests of this new P3MG3A code show that it is a robust tool for cosmological simulations. The code currently achieves a peak efficiency of one third the speed of the vectorized P3M code on a Cray C-90, and significant improvements are planned in the near future. Special-purpose computers like GRAPE are therefore an attractive alternative to supercomputers for numerical cosmology.

Comment: 9 pages (ApJS style); uuencoded compressed PostScript file (371 kb). Also available by anonymous ftp to astro.Princeton.EDU [128.112.24.45] in summers/grape/p3mg3a.ps (668 kb) and on the WWW at http://astro.Princeton.EDU/~library/prep.html (as POPe-600).

arXiv.org e-Print Archive

Crossref

A low-cost parallel implementation of direct numerical simulation of wall turbulence

Author: Bertolotti, del Álamo, Dmitruk, Günther, Iovieno, Jiménez, Kim, Kwok, Lele, Mahesh, Maurizio Quadrio, Moin, Moser, Na, Paolo Luchini, Pelz, Pozzi, Quadrio, Spotz, Thomas
Publication venue: 'Elsevier BV'
Publication date: 18/06/2005

A numerical method for the direct numerical simulation of incompressible wall turbulence in rectangular and cylindrical geometries is presented. Its distinctive feature is that it is designed for efficient distributed-memory parallel computing on commodity hardware. The adopted discretization is spectral in the two homogeneous directions; fourth-order accurate, compact finite-difference schemes over a variable-spacing mesh in the wall-normal direction are key to our parallel implementation. The parallel algorithm is designed to minimize data exchange among the computing machines, and in particular to avoid taking a global transpose of the data during the pseudo-spectral evaluation of the non-linear terms. The computing machines can then be connected to each other through low-cost network devices. The code is optimized for memory requirements, which can moreover be subdivided among the computing nodes. The layout of a simple, dedicated and optimized computing system based on commodity hardware is described. The performance of the numerical method on this computing system is evaluated and compared with that of other codes described in the literature, as well as with that of the same code implementing a commonly employed strategy for the pseudo-spectral calculation.

Comment: To be published in J. Comp. Physic

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref

Archivio della Ricerca - Università di Salerno

CERN Document Server

The cosmological simulation code GADGET-2

Author: Abel, Appel, Ascasibar, Bagla, Balsara, Barnes, Bate, Bode, Bonnell, Boss, Bryan, Burkert, Cen, Couchman, Cox, Cuadra, Davé, Dehnen, Di Matteo, Dolag, Dubinski, Duncan, Efstathiou, Evrard, Frenk, Fryxell, Fukushige, Gao, Gingold, Gnedin, Hairer, Heitmann, Hernquist, Hockney, Hut, Jenkins, Jernigan, Jubelgas, Kang, Katz, Kay, Klein, Klypin, Knebe, Kravtsov, Linder, Lucy, Makino, Marri, Monaghan, Motl, Navarro, Norman, O'Shea, Owen, Pen, Poludnenko, Power, Quilis, Quinn, Rasio, Refregier, Saha, Salmon, Scannapieco, Serna, Sommer-Larsen, Springel, Stadel, Steinmetz, Teyssier, Tissera, Tormen, Tornatore, Van Den Bosch, Volker Springel, Wadsley, Warren, White, Whitehurst, Xu, Yepes, Yoshida
Publication venue: 'Wiley'
Publication date: 01/01/2005

We discuss the cosmological simulation code GADGET-2, a new massively parallel TreeSPH code, capable of following a collisionless fluid with the N-body method, and an ideal gas by means of smoothed particle hydrodynamics (SPH). Our implementation of SPH manifestly conserves energy and entropy in regions free of dissipation, while allowing for fully adaptive smoothing lengths. Gravitational forces are computed with a hierarchical multipole expansion, which can optionally be applied in the form of a TreePM algorithm, where only short-range forces are computed with the 'tree' method while long-range forces are determined with Fourier techniques. Time integration is based on a quasi-symplectic scheme where long-range and short-range forces can be integrated with different timesteps. Individual and adaptive short-range timesteps may also be employed. The domain decomposition used in the parallelisation algorithm is based on a space-filling curve, resulting in high flexibility and tree force errors that do not depend on the way the domains are cut. The code is efficient in terms of memory consumption and required communication bandwidth. It has been used to compute the first cosmological N-body simulation with more than 10^10 dark matter particles, reaching a homogeneous spatial dynamic range of 10^5 per dimension in a 3D box. It has also been used to carry out very large cosmological SPH simulations that account for radiative cooling and star formation, reaching total particle numbers of more than 250 million. We present the algorithms used by the code and discuss their accuracy and performance using a number of test problems. GADGET-2 is publicly released to the research community.

Comment: submitted to MNRAS, 31 pages, 20 figures (reduced resolution), code available at http://www.mpa-garching.mpg.de/gadge
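
The idea of a domain decomposition based on a space-filling curve can be illustrated with a short sketch. The abstract does not say which curve is used, so this example substitutes a Morton (Z-order) curve because it is simple to write down; it is not presented as the code's actual method. The names `morton_key` and `decompose`, the `bits` parameter, and the equal-count splitting rule are all assumptions made for illustration.

```python
def morton_key(ix, iy, iz, bits=10):
    """Interleave the low `bits` bits of three cell indices into one key."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b + 2)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b)
    return key


def decompose(positions, box_size, n_domains, bits=10):
    """Split particles into contiguous segments of the Z-order curve.

    positions: list of (x, y, z) tuples with 0 <= coordinate <= box_size.
    Returns a list of n_domains lists of particle indices, with roughly
    equal particle counts per domain (an assumed balancing rule).
    """
    cells = 1 << bits
    keyed = []
    for idx, pos in enumerate(positions):
        # Map each coordinate to an integer cell index, clamped to the grid.
        ix, iy, iz = (min(int(c / box_size * cells), cells - 1) for c in pos)
        keyed.append((morton_key(ix, iy, iz, bits), idx))
    keyed.sort()  # order particles along the curve
    domains = [[] for _ in range(n_domains)]
    for rank, (_, idx) in enumerate(keyed):
        domains[rank * n_domains // len(keyed)].append(idx)
    return domains


if __name__ == "__main__":
    pts = [(0.1, 0.1, 0.1), (0.9, 0.9, 0.9), (0.1, 0.1, 0.2), (0.9, 0.9, 0.8)]
    print(decompose(pts, box_size=1.0, n_domains=2))
```

Because particles that are close along the curve tend to be close in space, cutting the sorted key list into contiguous segments gives each domain a spatially compact set of particles, and the cut points can be moved freely to rebalance the load.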

arXiv.org e-Print Archive