
    A Parallel Tree code for large Nbody simulation: dynamic load balance and data distribution on CRAY T3D system

    N-body algorithms for long-range unscreened interactions such as gravity belong to a class of highly irregular problems whose optimal solution is a challenging task for present-day massively parallel computers. In this paper we describe a strategy for optimal memory and work distribution which we have applied to our parallel implementation of the Barnes & Hut (1986) recursive tree scheme on a Cray T3D using the CRAFT programming environment. We have performed a series of tests to find an "optimal data distribution" in the T3D memory, and to identify a strategy for "Dynamic Load Balance" that yields good performance when running large simulations (more than 10 million particles). The test results show that the step duration depends on two main factors: data locality and T3D network contention. By increasing data locality we are able to minimize the step duration when the closest bodies (direct interactions) tend to be located in the same PE's local memory (contiguous block subdivision, high granularity), whereas the tree properties have a fine-grained distribution. In very large simulations, network contention gives rise to an unbalanced load. To remedy this we have devised an automatic work redistribution mechanism which provides good Dynamic Load Balance at the cost of negligible overhead. Comment: 16 pages with 11 figures included (LaTeX, elsart.style). Accepted by Computer Physics Communications.
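    Below is a minimal sketch of the contiguous-block rebalancing idea described in this abstract, written in Python rather than the paper's CRAFT environment. It assumes per-body work estimates measured during the previous step (a hypothetical costs list) and recomputes block boundaries so that each processing element (PE) receives roughly equal total work while bodies stay in contiguous index ranges; the function name rebalance_blocks and the toy cost profile are illustrative assumptions, not taken from the paper.

    def rebalance_blocks(costs, n_pe):
        # Illustrative sketch only: split body indices into n_pe contiguous blocks
        # whose summed costs are roughly equal. `costs` holds per-body work
        # estimates (e.g. interaction counts from the previous step).
        target = sum(costs) / n_pe
        blocks, start, acc = [], 0, 0.0
        for i, c in enumerate(costs):
            acc += c
            remaining_pes = n_pe - 1 - len(blocks)
            # close the block once it reaches the target, but leave at least one
            # body for every PE still waiting for a block
            if acc >= target and remaining_pes > 0 and i + 1 <= len(costs) - remaining_pes:
                blocks.append((start, i + 1))
                start, acc = i + 1, 0.0
        blocks.append((start, len(costs)))
        return blocks

    if __name__ == "__main__":
        # toy cost profile: a clump of expensive bodies in the middle of the array
        costs = [1.0] * 40 + [8.0] * 20 + [1.0] * 40
        for pe, (lo, hi) in enumerate(rebalance_blocks(costs, 4)):
            print(f"PE {pe}: bodies [{lo}:{hi}), work = {sum(costs[lo:hi]):.1f}")

    With this toy profile the four blocks carry 64, 64, 64 and 48 units of work, whereas equal-sized 25-body blocks would carry 25, 95, 95 and 25. The point is only to illustrate cost-weighted block boundaries, not the paper's actual redistribution mechanism.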

    A Work- and Data-Sharing Parallel Tree N-body Code

    We describe a new parallel N-body code for cosmological simulations. The code is based on a work- and data-sharing scheme, and is implemented within the Cray Research Corporation's CRAFT programming environment. Different data distribution schemes have been adopted for the bodies' and the tree's structures. Tests performed for two different types of initial distribution show that the performance scales almost ideally as a function of the size of the system and of the number of processors. We discuss the factors affecting the absolute speedup and how it can be increased with a better data distribution scheme for the tree. Comment: 16 pages, 8 figures. Uses elsart.sty and epsf.sty. Accepted for publication in Computer Physics Communications.
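    As a toy illustration of the work- and data-sharing idea (not the paper's code), the Python sketch below builds a small 2D Barnes-Hut quadtree that all workers share, and gives each notional worker a contiguous block of bodies whose accelerations it computes against that shared tree. The names (Node, insert, accel), the opening angle theta and the softening eps are illustrative assumptions.

    import random

    class Node:
        def __init__(self, cx, cy, half):
            self.cx, self.cy, self.half = cx, cy, half  # square cell: centre and half-width
            self.mass = 0.0
            self.mx = self.my = 0.0                     # mass-weighted position sums
            self.body = None                            # index of the single body, if leaf
            self.kids = None                            # four children once subdivided

    def child_for(node, bodies, i):
        # quadrant of `node` that body i falls into
        x, y, _ = bodies[i]
        return node.kids[(2 if x >= node.cx else 0) + (1 if y >= node.cy else 0)]

    def insert(node, bodies, i):
        x, y, m = bodies[i]
        if node.kids is None and node.body is None:
            node.body = i                               # empty leaf: store the body here
        else:
            if node.kids is None:                       # occupied leaf: subdivide, push old body down
                node.kids = [Node(node.cx + dx * node.half / 2,
                                  node.cy + dy * node.half / 2, node.half / 2)
                             for dx in (-1, 1) for dy in (-1, 1)]
                old, node.body = node.body, None
                insert(child_for(node, bodies, old), bodies, old)
            insert(child_for(node, bodies, i), bodies, i)
        node.mass += m
        node.mx += m * x
        node.my += m * y

    def accel(node, bodies, i, theta=0.5, eps=1e-3):
        # acceleration on body i from cell `node` (G = 1, Plummer softening eps)
        if node.mass == 0.0 or node.body == i:
            return 0.0, 0.0
        x, y, _ = bodies[i]
        dx, dy = node.mx / node.mass - x, node.my / node.mass - y
        d = (dx * dx + dy * dy + eps * eps) ** 0.5
        if node.kids is None or 2 * node.half / d < theta:  # leaf or distant cell: monopole
            f = node.mass / d ** 3
            return f * dx, f * dy
        ax = ay = 0.0
        for kid in node.kids:
            kx, ky = accel(kid, bodies, i, theta, eps)
            ax, ay = ax + kx, ay + ky
        return ax, ay

    if __name__ == "__main__":
        random.seed(0)
        bodies = [(random.random(), random.random(), 1.0) for _ in range(200)]  # (x, y, mass)
        root = Node(0.5, 0.5, 0.5)                      # unit square
        for i in range(len(bodies)):
            insert(root, bodies, i)
        n_workers, n = 4, len(bodies)
        for w in range(n_workers):                      # contiguous body block per notional worker
            lo, hi = w * n // n_workers, (w + 1) * n // n_workers
            block = [accel(root, bodies, i) for i in range(lo, hi)]
            print(f"worker {w}: bodies [{lo}:{hi}), computed {len(block)} accelerations")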

    Toward High Performance Computing Education

    Get PDF
    High Performance Computing (HPC) is the ability to process data and perform complex calculations at extremely high speeds. Current HPC platforms can perform on the order of quadrillions of calculations per second, with quintillions on the horizon. The past three decades have witnessed a vast increase in the use of HPC across different scientific, engineering and business communities, for example in sequencing the genome, predicting climate change, designing modern aerodynamics, or establishing customer preferences. Although HPC has been well incorporated into science curricula such as bioinformatics, the same cannot be said for most computing programs. This working group will explore how HPC can make inroads into computer science education, from the undergraduate to the postgraduate level. The group will address research questions designed to investigate topics such as identifying and handling barriers that inhibit the adoption of HPC in educational environments, how to incorporate HPC into various curricula, and how HPC can be leveraged to enhance applied critical thinking and problem-solving skills. The four deliverables are: (1) a catalog of core HPC educational concepts, (2) HPC curricula for contemporary computing needs, such as artificial intelligence, cyberanalytics, data science and engineering, or the Internet of Things, (3) possible infrastructures for implementing HPC coursework, and (4) HPC-related feedback to the CC2020 project.

    Bibliographic Snapshots of High-Performance/High-Productivity Computing
