    A Parallel Tree code for large Nbody simulation: dynamic load balance and data distribution on CRAY T3D system

    N-body algorithms for long-range unscreened interactions like gravity belong to a class of highly irregular problems whose optimal solution is a challenging task for present-day massively parallel computers. In this paper we describe a strategy for optimal memory and work distribution which we have applied to our parallel implementation of the Barnes & Hut (1986) recursive tree scheme on a Cray T3D using the CRAFT programming environment. We have performed a series of tests to find an "optimal data distribution" in the T3D memory, and to identify a strategy for the "Dynamic Load Balance" in order to obtain good performance when running large simulations (more than 10 million particles). The test results show that the step duration depends on two main factors: data locality and T3D network contention. By increasing data locality, we are able to minimize the step duration when the closest bodies (direct interaction) tend to be located in the same PE local memory (contiguous block subdivision, high granularity), whereas the tree properties have a fine-grain distribution. In a very large simulation, network contention gives rise to an unbalanced load. To remedy this we have devised an automatic work redistribution mechanism which provides a good Dynamic Load Balance at the price of an insignificant overhead. Comment: 16 pages with 11 figures included (LaTeX, elsart.style). Accepted by Computer Physics Communications

    Can we do better than Hybrid Monte Carlo in Lattice QCD?

    The Hybrid Monte Carlo algorithm for the simulation of QCD with dynamical staggered fermions is compared with the Kramers equation algorithm. We find substantially different autocorrelation times for local and nonlocal observables. The calculations have been performed on the parallel computer CRAY T3D. Comment: Talk presented at LATTICE96(algorithms), LaTeX 3 pages, uses espcrc2, epsf, 2 postscript figure

    The Eta-prime and Cooling with Staggered Fermions

    We present a calculation of the mass of the eta-prime meson using quenched and dynamical staggered fermions. We also discuss the effects of "cooling" and suggest its use as a quantitative tool. Comment: 4 pages, LaTeX with 7 EPS figs, contribution to Lattice 9

    Ludwig: A parallel Lattice-Boltzmann code for complex fluids

    This paper describes 'Ludwig', a versatile code for the simulation of Lattice-Boltzmann (LB) models in 3-D on cubic lattices. In fact 'Ludwig' is not a single code, but a set of codes that share certain common routines, such as I/O and communications. If 'Ludwig' is used as intended, a variety of complex fluid models with different equilibrium free energies are simple to code, so that the user may concentrate on the physics of the problem, rather than on parallel computing issues. Thus far, the main application of 'Ludwig' has been to symmetric binary fluid mixtures. We first explain the philosophy and structure of 'Ludwig', which we argue is a very effective way of developing large codes for academic consortia. Next we elaborate on some parallel implementation issues such as parallel I/O, and the use of MPI to achieve full portability and good efficiency on both MPP and SMP systems. Finally, we describe how to implement generic solid boundaries, and look in detail at the particular case of a symmetric binary fluid mixture near a solid wall. We present a novel scheme for the thermodynamically consistent simulation of wetting phenomena, in the presence of static and moving solid boundaries, and check its performance. Comment: Submitted to Computer Physics Communications

    Optimisation of a parallel ocean general circulation model

    This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.

    Evaluation of High Performance Fortran through Application Kernels

    Since the definition of the High Performance Fortran (HPF) standard, we have been maintaining a suite of application kernel codes with the aim of using them to evaluate the available compilers. This paper presents the results and conclusions from this study, for sixteen codes, on compilers from IBM, DEC, and the Portland Group Inc. (PGI), and on three machines: a DEC Alphafarm, an IBM SP-2, and a Cray T3D. From this, we hope to show the prospective HPF user that scalable performance is possible with modest effort, and also where the current weaknesses lie.