A Parallel Tree code for large Nbody simulation: dynamic load balance and data distribution on CRAY T3D system
N-body algorithms for long-range unscreened interactions like gravity belong
to a class of highly irregular problems whose optimal solution is a challenging
task for present-day massively parallel computers. In this paper we describe a
strategy for optimal memory and work distribution which we have applied to our
parallel implementation of the Barnes & Hut (1986) recursive tree scheme on a
Cray T3D using the CRAFT programming environment. We have performed a series of
tests to find an "optimal data distribution" in the T3D memory and to
identify a strategy for "Dynamic Load Balance", in order to obtain good
performance when running large simulations (more than 10 million particles).
The test results show that the step duration depends on two main factors:
data locality and T3D network contention. Increasing data locality minimizes
the step duration when the closest bodies (those in direct interaction) reside
in the same PE's local memory (contiguous block subdivision, high granularity),
while the tree properties have a fine-grain distribution. In very large
simulations, network contention gives rise to an unbalanced load. To remedy
this, we have devised an automatic work-redistribution mechanism that provides
good Dynamic Load Balance at the price of an insignificant overhead.
Comment: 16 pages with 11 figures included (LaTeX, elsart.sty). Accepted by
Computer Physics Communications
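The abstract does not spell out how the work redistribution is carried out. As a hedged illustration only, one common approach (assumed here, not necessarily the authors' mechanism) keeps the contiguous-block subdivision but moves the block boundaries so that each PE receives roughly equal measured work from the previous step. In this Python sketch the function names `partition_by_work` and `imbalance` and the per-body cost array `work` are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def partition_by_work(work, n_pe):
    """Split bodies into n_pe contiguous blocks of roughly equal total work.

    work[i] is the cost measured for body i during the previous step
    (an assumption: e.g. its interaction count or elapsed time).
    Returns n_pe half-open (start, stop) index ranges.
    """
    cum = [0, *accumulate(work)]        # cum[s] = work[0] + ... + work[s-1]
    total = cum[-1]
    bounds = [0]
    for pe in range(1, n_pe):
        target = total * pe / n_pe      # ideal cumulative work before PE `pe`
        bounds.append(bisect_left(cum, target))  # first s with cum[s] >= target
    bounds.append(len(work))
    return list(zip(bounds[:-1], bounds[1:]))


def imbalance(work, blocks):
    """Ratio of the busiest PE's work to the average (1.0 means perfect balance)."""
    loads = [sum(work[a:b]) for a, b in blocks]
    return max(loads) / (sum(loads) / len(loads))


if __name__ == "__main__":
    work = [1, 1, 1, 1, 4, 4]           # toy costs: the last bodies are expensive
    naive = [(0, 2), (2, 4), (4, 6)]    # equal body counts per PE
    balanced = partition_by_work(work, 3)
    print(balanced)                     # [(0, 4), (4, 5), (5, 6)]
    print(imbalance(work, naive), imbalance(work, balanced))  # 2.0 1.0
```

Repeating the partition every few steps with fresh cost measurements is what would make such a scheme dynamic, and keeping the blocks contiguous would preserve the data locality the abstract identifies as important.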
A Work- and Data-Sharing Parallel Tree N-body Code
We describe a new parallel N-body code for cosmological simulations. The code
is based on a work- and data-sharing scheme and is implemented within Cray
Research Corporation's CRAFT programming environment. Different data
distribution schemes have been adopted for the body and tree data structures.
Tests performed for two different types of initial distributions show that the
performance scales almost ideally with the size of the system and with the
number of processors. We discuss the factors affecting the absolute speedup and
how it can be increased with a better data distribution scheme for the tree.
Comment: 16 pages, 8 figures. Uses elsart.sty and epsf.sty. Accepted for
publication in Computer Physics Communications
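The abstract does not say which schemes are used for each structure. As a hedged, self-contained illustration of the trade-off both abstracts describe (contiguous blocks for bodies, fine grain for the tree), this Python sketch maps an element index to its owning PE under a block distribution and under a cyclic (round-robin) distribution. The function names, the toy interaction list, and the breadth-first cell numbering are assumptions for illustration, not the authors' CRAFT directives.

```python
def block_owner(i, n, n_pe):
    """PE owning element i when n elements are split into contiguous blocks."""
    size = -(-n // n_pe)               # ceil(n / n_pe) elements per PE
    return i // size


def cyclic_owner(i, n_pe):
    """PE owning element i under a fine-grain (round-robin) distribution."""
    return i % n_pe


def remote_fraction(pairs, owner):
    """Fraction of interacting pairs (i, j) whose elements live on different PEs."""
    remote = sum(owner(i) != owner(j) for i, j in pairs)
    return remote / len(pairs)


if __name__ == "__main__":
    n, n_pe = 16, 4
    # Toy assumption: index neighbours are spatial neighbours,
    # and each body interacts directly with the next one.
    pairs = [(i, i + 1) for i in range(n - 1)]
    print(remote_fraction(pairs, lambda i: block_owner(i, n, n_pe)))  # 0.2
    print(remote_fraction(pairs, lambda i: cyclic_owner(i, n_pe)))    # 1.0
    # Assumed breadth-first numbering: cells 0..3 are top cells read by most tree walks.
    print([block_owner(c, n, n_pe) for c in range(4)])   # [0, 0, 0, 0]
    print([cyclic_owner(c, n_pe) for c in range(4)])     # [0, 1, 2, 3]
```

Under these toy assumptions, the block layout keeps most direct interactions on the same PE, while the cyclic layout spreads the frequently read top cells across PEs instead of concentrating requests on a single PE.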