Fast Fourier Transform algorithm design and tradeoffs
The Fast Fourier Transform (FFT) is a mainstay of certain numerical techniques for solving fluid dynamics problems. The Connection Machine CM-2 is the target for an investigation into the design of multidimensional Single Instruction Stream/Multiple Data (SIMD) parallel FFT algorithms for high performance. Critical algorithm design issues are discussed, necessary machine performance measurements are identified and made, and the performance of the developed FFT programs is measured. The FFT programs are compared to the currently best Cray-2 FFT program.
Plasma simulation using the massively parallel processor
Two-dimensional electrostatic simulation codes using the particle-in-cell model are developed on the Massively Parallel Processor (MPP). The conventional plasma simulation procedure, which computes electric fields at particle positions by means of a gridded system, is found to be inefficient on the MPP. The MPP simulation code is thus based on the gridless system, in which particles are assigned to processing elements and electric fields are computed directly via the Discrete Fourier Transform. Currently, the gridless model on the MPP in two dimensions is about nine times slower than the gridded system on the CRAY X-MP, without considering I/O time. However, the gridless system on the MPP can be improved by incorporating faster I/O between the staging memory and the Array Unit and a more efficient procedure for taking floating point sums over processing elements. The initial results suggest that parallel processors have the potential for performing large scale plasma simulations.
Solving the Klein-Gordon equation using Fourier spectral methods: A benchmark test for computer performance
The cubic Klein-Gordon equation is a simple but non-trivial partial
differential equation whose numerical solution has the main building blocks
required for the solution of many other partial differential equations. In this
study, the library 2DECOMP&FFT is used in a Fourier spectral scheme to solve
the Klein-Gordon equation and strong scaling of the code is examined on
thirteen different machines for a problem size of 512^3. The results are useful
in assessing likely performance of other parallel fast Fourier transform based
programs for solving partial differential equations. The problem is chosen to
be large enough to solve on a workstation, yet also of interest to solve
quickly on a supercomputer, in particular for parametric studies. Unlike other
high performance computing benchmarks, for this problem size, the time to
solution will not be improved by simply building a bigger supercomputer. Comment: 10 page
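As a rough illustration of the Fourier spectral idea mentioned in the abstract above, the following minimal sketch differentiates a periodic function in one dimension by multiplying its Fourier coefficients by i*k. It is not the paper's code and does not use 2DECOMP&FFT: the function names `dft` and `spectral_derivative` are hypothetical, and a direct O(n^2) transform stands in for a real FFT so that the example stays self-contained.

```python
import cmath
import math


def dft(values, inverse=False):
    """Direct O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(
            v * cmath.exp(sign * 2j * math.pi * k * m / n)
            for m, v in enumerate(values)
        )
        # The 1/n normalisation is applied on the inverse transform only.
        out.append(total / n if inverse else total)
    return out


def spectral_derivative(samples):
    """Derivative of a 2*pi-periodic function sampled at x_m = 2*pi*m/n."""
    n = len(samples)
    coeffs = dft(samples)
    # Integer wavenumbers in DFT order: 0, 1, ..., then the negative ones.
    wavenumbers = [k if k < n / 2 else k - n for k in range(n)]
    if n % 2 == 0:
        # Zero the Nyquist mode, a common convention for odd derivatives.
        wavenumbers[n // 2] = 0
    deriv_coeffs = [1j * k * c for k, c in zip(wavenumbers, coeffs)]
    return [v.real for v in dft(deriv_coeffs, inverse=True)]


if __name__ == "__main__":
    n = 16
    xs = [2 * math.pi * m / n for m in range(n)]
    approx = spectral_derivative([math.sin(x) for x in xs])
    err = max(abs(a - math.cos(x)) for a, x in zip(approx, xs))
    print("max error for d/dx sin(x):", err)
```

In a three-dimensional parallel setting the same multiply-by-i*k step would be applied along each axis, with the transforms distributed across processes; that part is outside the scope of this sketch.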
Accuracy and speed in computing the Chebyshev collocation derivative
We studied several algorithms for computing the Chebyshev spectral derivative and compared their roundoff error. For a large number of collocation points, the elements of the Chebyshev differentiation matrix, if constructed in the usual way, are not computed accurately. A subtle cause is found to account for the poor accuracy when computing the derivative by the matrix-vector multiplication method. Methods for accurately computing the elements of the matrix are presented, and we find that if the entries of the matrix are computed accurately, the roundoff error of the matrix-vector multiplication is as small as that of the transform-recursion algorithm. Results of CPU time usage are shown for several different algorithms for computing the derivative by the Chebyshev collocation method for a wide variety of two-dimensional grid sizes on both an IBM and a Cray 2 computer. We found that which algorithm is fastest on a particular machine depends not only on the grid size, but also on small details of the computer hardware. For most practical grid sizes used in computation, the even-odd decomposition algorithm is found to be faster than the transform-recursion method.
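To make the matrix-vector multiplication method concrete, here is a minimal sketch of the standard Chebyshev differentiation matrix on Gauss-Lobatto points. The diagonal is filled in as the negative sum of the off-diagonal entries in each row, which is one widely used way to reduce roundoff error; it is an assumption here and not necessarily the method proposed in the paper. The names `cheb_diff_matrix` and `apply_matrix` are hypothetical.

```python
import math


def cheb_diff_matrix(n):
    """Return (x, D) for the n+1 Chebyshev-Gauss-Lobatto points x_j = cos(pi*j/n)."""
    x = [math.cos(math.pi * j / n) for j in range(n + 1)]
    # Endpoint weights: c_0 = c_n = 2, all others 1.
    c = [2.0 if j in (0, n) else 1.0 for j in range(n + 1)]
    D = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(n + 1):
            if i != j:
                sign = -1.0 if (i + j) % 2 else 1.0
                D[i][j] = (c[i] / c[j]) * sign / (x[i] - x[j])
        # Diagonal as the negative row sum, so that the derivative of a
        # constant vanishes up to rounding (assumed remedy, see lead-in).
        D[i][i] = -sum(D[i][j] for j in range(n + 1) if j != i)
    return x, D


def apply_matrix(D, f):
    """Matrix-vector product D @ f using plain Python lists."""
    return [sum(d * v for d, v in zip(row, f)) for row in D]


if __name__ == "__main__":
    x, D = cheb_diff_matrix(8)
    df = apply_matrix(D, [xi * xi for xi in x])
    err = max(abs(a - 2.0 * xi) for a, xi in zip(df, x))
    print("max error for d/dx x^2:", err)
```

The dense product costs O(n^2) operations per derivative, which is why the abstract compares it against transform-based alternatives in terms of CPU time.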
A hybrid MPI-OpenMP scheme for scalable parallel pseudospectral computations for fluid turbulence
A hybrid scheme that utilizes MPI for distributed memory parallelism and
OpenMP for shared memory parallelism is presented. The work is motivated by the
desire to achieve exceptionally high Reynolds numbers in pseudospectral
computations of fluid turbulence on emerging petascale, high core-count,
massively parallel processing systems. The hybrid implementation derives from
and augments a well-tested scalable MPI-parallelized pseudospectral code. The
hybrid paradigm leads to a new picture for the domain decomposition of the
pseudospectral grids, which is helpful in understanding, among other things,
the 3D transpose of the global data that is necessary for the parallel fast
Fourier transforms that are the central component of the numerical
discretizations. Details of the hybrid implementation are provided, and
performance tests illustrate the utility of the method. It is shown that the
hybrid scheme achieves near ideal scalability up to ~20000 compute cores with a
maximum mean efficiency of 83%. Data are presented that demonstrate how to
choose the optimal number of MPI processes and OpenMP threads in order to
optimize code performance on two different platforms. Comment: Submitted to Parallel Computin
Hydra: A Parallel Adaptive Grid Code
We describe the first parallel implementation of an adaptive
particle-particle, particle-mesh code with smoothed particle hydrodynamics.
Parallelisation of the serial code, "Hydra", is achieved by using CRAFT, a
Cray proprietary language which allows rapid implementation of a serial code on
a parallel machine by allowing global addressing of distributed memory.
The collisionless variant of the code has already completed several 16.8
million particle cosmological simulations on a 128 processor Cray T3D whilst
the full hydrodynamic code has completed several 4.2 million particle combined
gas and dark matter runs. The efficiency of the code now allows parameter-space
explorations to be performed routinely using particles of each species.
A complete run including gas cooling, from high redshift to the present epoch
requires approximately 10 hours on 64 processors.
In this paper we present implementation details and results of the
performance and scalability of the CRAFT version of Hydra under varying degrees
of particle clustering. Comment: 23 pages, LaTeX plus encapsulated figure