A systolic algorithm rhythmically computes and passes data through a network of processors. We investigate the performance of systolic algorithms for implementing the gravitational N-body problem on distributedmemory computers. Systolic algorithms minimize memory requirements by distributing the particles between processors. We show that the performance of systolic routines can be greatly enhanced by the use of nonblocking communication, which allows particle coordinates to be communicated at the same time that force calculations are being carried out. The performance enhancement is particularly great when block sizes are small, i.e. when only a small fraction of the N particles need their forces computed in each time step. Hyper-systolic algorithms reduce the communication complexity from O(Np), with p the processor number, to O(N √ p), at the expense of increased memory demands. We describe a hyper-systolic algorithm that will work with a block time step algorithm and analyze its performance. As an example of an application requiring large N, we use the systolic algorithm to carry out direct-summation simulations using 10 6 particles of the Brownian motion of the supermassive black hole at the center of the Milky Way galaxy. We predict a 3D random velocity of ∼ 0.4 km s −1 for the black hole. 1
To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.