
    Implementation and analysis of a Navier-Stokes algorithm on parallel computers

    The results of the implementation of a Navier-Stokes algorithm on three parallel/vector computers are presented. The object of this research is to determine how well, or poorly, a single numerical algorithm would map onto three different architectures. The algorithm is a compact difference scheme for the solution of the incompressible, two-dimensional, time-dependent Navier-Stokes equations. The computers were chosen so as to encompass a variety of architectures. They are the following: the MPP, an SIMD machine with 16K bit-serial processors; the Flex/32, an MIMD machine with 20 processors; and the Cray/2. The implementation of the algorithm is discussed in relation to these architectures, and measures of the performance on each machine are given. The basic comparison is among SIMD instruction parallelism on the MPP, MIMD process parallelism on the Flex/32, and vectorization of a serial code on the Cray/2. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented.
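
    As a minimal illustration of what such a simple performance model can look like, the sketch below splits per-step time into a serial term, a parallel term divided across processors, and a communication term; the linear form and all parameter names are assumptions for illustration, not the paper's actual model.

        # Hypothetical fixed-size performance model (illustrative only): per-step time
        # as a serial term, a parallel term shared by n_procs processors, and a
        # per-step communication cost.
        def step_time(t_serial, t_parallel, t_comm, n_procs):
            return t_serial + t_parallel / n_procs + t_comm

        def speedup(t_serial, t_parallel, t_comm, n_procs):
            t1 = t_serial + t_parallel  # single-processor time, no communication cost
            return t1 / step_time(t_serial, t_parallel, t_comm, n_procs)

        # Example with made-up timings: communication quickly becomes the limiting factor.
        print(round(speedup(t_serial=0.1, t_parallel=10.0, t_comm=0.5, n_procs=20), 2))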

    A Parallel Tree code for large Nbody simulation: dynamic load balance and data distribution on CRAY T3D system

    N-body algorithms for long-range unscreened interactions like gravity belong to a class of highly irregular problems whose optimal solution is a challenging task for present-day massively parallel computers. In this paper we describe a strategy for optimal memory and work distribution which we have applied to our parallel implementation of the Barnes & Hut (1986) recursive tree scheme on a Cray T3D using the CRAFT programming environment. We have performed a series of tests to find an "optimal data distribution" in the T3D memory, and to identify a strategy for "Dynamic Load Balance" in order to obtain good performance when running large simulations (more than 10 million particles). The test results show that the step duration depends on two main factors: data locality and T3D network contention. By increasing data locality we are able to minimize the step duration if the closest bodies (direct interactions) tend to be located in the same PE's local memory (contiguous block subdivision, high granularity), whereas the tree properties have a fine-grained distribution. In a very large simulation an unbalanced load arises due to network contention. To remedy this we have devised an automatic work-redistribution mechanism which provided a good Dynamic Load Balance at the price of an insignificant overhead.
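
    For readers unfamiliar with the Barnes & Hut scheme, the sketch below shows the opening test that drives the recursive tree walk; the node layout, helper names, and the opening-angle value are illustrative assumptions and are not taken from the paper's CRAFT implementation.

        import math
        from collections import namedtuple

        # Hypothetical tree-node record; the real node layout in the paper's code differs.
        Node = namedtuple("Node", "mass com size is_leaf children")

        THETA = 0.7  # assumed opening angle, not a value stated in the paper

        def accept(node_size, node_com, particle_pos, theta=THETA):
            # Barnes-Hut criterion: treat the node as one pseudo-particle when its
            # size divided by its distance from the particle is below theta.
            d = math.dist(node_com, particle_pos)
            return d > 0.0 and node_size / d < theta

        def walk(node, particle_pos, interactions):
            # Recursive tree walk: leaves and accepted nodes contribute a single
            # (mass, centre-of-mass) interaction; all other nodes are opened.
            if node.is_leaf or accept(node.size, node.com, particle_pos):
                interactions.append((node.mass, node.com))
            else:
                for child in node.children:
                    walk(child, particle_pos, interactions)

        # Tiny example: a distant cell is accepted as a single pseudo-particle.
        leaf = Node(1.0, (0.0, 0.0, 0.0), 0.0, True, ())
        cell = Node(2.0, (5.0, 0.0, 0.0), 1.0, False, (leaf, leaf))
        out = []
        walk(cell, (50.0, 0.0, 0.0), out)
        print(out)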

    Solving the Cauchy-Riemann equations on parallel computers

    Discussed is the implementation of a single algorithm on three parallel-vector computers. The algorithm is a relaxation scheme for the solution of the Cauchy-Riemann equations, a set of coupled first-order partial differential equations. The computers were chosen so as to encompass a variety of architectures. They are: the MPP, an SIMD machine with 16K bit-serial processors; the FLEX/32, an MIMD machine with 20 processors; and the CRAY/2, an MIMD machine with four vector processors. The machine architectures are briefly described. The implementation of the algorithm is discussed in relation to these architectures, and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Conclusions are presented.
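
    The paper's relaxation scheme itself is not reproduced here, but since the components of a smooth Cauchy-Riemann pair are harmonic, a point-Jacobi sweep for Laplace's equation gives the flavour of the grid relaxation being mapped onto these machines; the grid size and boundary data below are assumptions.

        import numpy as np

        def jacobi_sweep(u):
            # One point-Jacobi relaxation sweep over the interior of a 2-D grid;
            # boundary values are held fixed.
            new = u.copy()
            new[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                                      u[1:-1, :-2] + u[1:-1, 2:])
            return new

        # Example: relax a 64 x 64 grid with assumed Dirichlet data on one edge.
        u = np.zeros((64, 64))
        u[0, :] = 1.0
        for _ in range(200):
            u = jacobi_sweep(u)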

    Performance of the Cray T3D and Emerging Architectures on Canopy QCD Applications

    The Cray T3D, an MIMD system with NUMA shared-memory capabilities and, in principle, very low communications latency, can support the Canopy framework for grid-oriented applications. Canopy has been ported to the T3D with the intent of making it available to a spectrum of users. The performance of the T3D running Canopy has been benchmarked on five QCD applications extensively run on ACPMAPS at Fermilab, requiring a variety of data access patterns. The net performance and scaling behavior reveal an efficiency relative to peak Gflops almost identical to that achieved on ACPMAPS. Detailed studies of the major factors impacting performance are presented. Generalizations applying this analysis to the newly emerging crop of commercial systems reveal where their limitations will lie. On these applications, efficiencies above 25% are not to be expected; eliminating overheads due to Canopy will improve matters, but by less than a factor of two.
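
    As a small reading aid for the efficiency figures quoted above, the sketch below computes efficiency relative to peak; the sustained and peak rates are placeholder numbers, not measurements from the paper.

        def efficiency(sustained_gflops, peak_gflops):
            # Efficiency relative to peak: sustained rate divided by theoretical peak.
            return sustained_gflops / peak_gflops

        # Placeholder numbers only; chosen to land below the ~25% ceiling discussed above.
        print(f"{efficiency(35.0, 150.0):.1%}")   # 23.3%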

    Report from the MPP Working Group to the NASA Associate Administrator for Space Science and Applications

    NASA's Office of Space Science and Applications (OSSA) gave a select group of scientists the opportunity to test and implement their computational algorithms on the Massively Parallel Processor (MPP) located at Goddard Space Flight Center, beginning in late 1985. One year later, the Working Group presented its report, which addressed the following: algorithms, programming languages, architecture, programming environments, the relationship to theory, and measured performance. The findings point to a number of demonstrated computational techniques for which the MPP architecture is ideally suited. For example, besides executing much faster on the MPP than on conventional computers, systolic VLSI simulation (where distances are short), lattice simulation, neural network simulation, and image problems were found to be easier to program on the MPP's architecture than on a CYBER 205 or even a VAX. The report also makes technical recommendations covering all aspects of MPP use, and recommendations concerning the future of the MPP and machines based on similar architectures, expansion of the Working Group, and study of the role of future parallel processors for the space station, EOS, and the Great Observatories era.

    Scalable Parallel Computers for Real-Time Signal Processing

    We assess the state-of-the-art technology in massively parallel processors (MPPs) and their variations in different architectural platforms. Architectural and programming issues are identified in using MPPs for time-critical applications such as adaptive radar signal processing. We review the enabling technologies. These include high-performance CPU chips and system interconnects, distributed memory architectures, and various latency-hiding mechanisms. We characterize the concept of scalability in three areas: resources, applications, and technology. Scalable performance attributes are analytically defined. Then we compare MPPs with symmetric multiprocessors (SMPs) and clusters of workstations (COWs). The purpose is to reveal their capabilities, limits, and effectiveness in signal processing. We evaluate the IBM SP2 at MHPCC, the Intel Paragon at SDSC, the Cray T3D at the Cray Eagan Center, and the Cray T3E and ASCI TeraFLOP system proposed by Intel. On the software and programming side, we evaluate existing parallel programming environments, including the models, languages, compilers, software tools, and operating systems. Some guidelines for program parallelization are provided. We examine data-parallel, shared-variable, message-passing, and implicit programming models. Communication functions and their performance overhead are discussed. Available software tools and communication libraries are also introduced.
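
    The paper's analytical definitions are not reproduced here, but the sketch below records the standard scalable-performance attributes (speedup, efficiency, and an Amdahl-style bound) that such an analysis builds on; the example timings are made up.

        def speedup(t1, tn):
            # Fixed-workload speedup: one-processor time over n-processor time.
            return t1 / tn

        def efficiency(t1, tn, n):
            # Fraction of ideal linear speedup actually achieved on n processors.
            return speedup(t1, tn) / n

        def amdahl_bound(serial_fraction, n):
            # Upper bound on speedup when a fraction of the work is inherently serial.
            return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)

        print(efficiency(t1=100.0, tn=2.0, n=64))        # 0.78 with hypothetical timings
        print(amdahl_bound(serial_fraction=0.05, n=64))  # about 15.4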

    Implementation and Performance Analysis of Numerical Algorithms on the MPP, Flex/32, and Cray/2

    This dissertation presents the results of the implementation of a number of numerical algorithms on three parallel/vector computers. The object of this research is to determine how well, or poorly, a number of numerical algorithms would map onto three different architectures and to analyze the performance of these architectures using these algorithms. The algorithms are: a relaxation scheme for the solution of the Cauchy-Riemann equations, an ADI method for the solution of the diffusion equation, and a compact difference scheme for the solution of the two-dimensional Navier-Stokes equations. The computers were chosen so as to encompass a variety of architectures. They are: the MPP, an SIMD machine with 16K bit-serial processors; the Flex/32, an MIMD machine with 20 processors; and the Cray/2. The machine architectures are briefly described. The implementation of these algorithms is discussed in relation to these architectures, and measures of the performance on each machine are given. The basic comparison is among SIMD instruction parallelism on the MPP, MIMD process parallelism on the Flex/32, and vectorization of a serial code on the Cray/2. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for these algorithms on these architectures. Finally, conclusions are presented.
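
    Of the three algorithms, the ADI method reduces each half-step to a set of independent tridiagonal solves along one grid direction; the Thomas-algorithm sketch below illustrates that building block only and is a generic stand-in, not the dissertation's code (array conventions and the diffusion number r are assumptions).

        import numpy as np

        def thomas(a, b, c, d):
            # Solve a tridiagonal system: a is the sub-diagonal (a[0] unused), b the
            # diagonal, c the super-diagonal (c[-1] unused), d the right-hand side.
            n = len(d)
            cp = np.zeros(n)
            dp = np.zeros(n)
            cp[0] = c[0] / b[0]
            dp[0] = d[0] / b[0]
            for i in range(1, n):
                m = b[i] - a[i] * cp[i - 1]
                cp[i] = c[i] / m
                dp[i] = (d[i] - a[i] * dp[i - 1]) / m
            x = np.zeros(n)
            x[-1] = dp[-1]
            for i in range(n - 2, -1, -1):
                x[i] = dp[i] - cp[i] * x[i + 1]
            return x

        # Example: one implicit diffusion sweep with an assumed diffusion number r.
        r, n = 0.5, 8
        a = np.full(n, -r); b = np.full(n, 1.0 + 2.0 * r); c = np.full(n, -r)
        print(thomas(a, b, c, np.ones(n)))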

    Functional requirements document for the Earth Observing System Data and Information System (EOSDIS) Scientific Computing Facilities (SCF) of the NASA/MSFC Earth Science and Applications Division, 1992

    Five scientists at MSFC/ESAD have EOS SCF investigator status. Each SCF has unique tasks which require the establishment of a computing facility dedicated to accomplishing those tasks. An SCF Working Group was established at ESAD with the charter of defining the computing requirements of the individual SCFs and recommending options for meeting these requirements. The primary goal of the working group was to determine which computing needs can be satisfied using either shared resources or separate but compatible resources, and which needs require unique individual resources. The requirements investigated included CPU-intensive vector and scalar processing, visualization, data storage, connectivity, and I/O peripherals. A review of computer industry directions and a market survey of computing hardware provided information regarding important industry standards and candidate computing platforms. It was determined that the total SCF computing requirements might be most effectively met using a hierarchy consisting of shared and individual resources. This hierarchy is composed of five major system types: (1) a supercomputer-class vector processor; (2) a high-end scalar multiprocessor workstation; (3) a file server; (4) a few medium- to high-end visualization workstations; and (5) several low- to medium-range personal graphics workstations. Specific recommendations for meeting the needs of each of these types are presented.

    Parallel computing and the generation of basic plasma data

    Comprehensive simulations of the processing plasmas used in semiconductor fabrication will depend on the availability of basic data for many microscopic processes that occur in the plasma and at the surface. Cross sections for electron collisions, a principal mechanism for producing reactive species in these plasmas, are among the most important such data; however, electron-collision cross sections are difficult to measure, and the available data are, at best, sketchy for the polyatomic feed gases of interest. While computational approaches to obtaining such data are thus potentially of significant value, studies of electron collisions with polyatomic gases at relevant energies are numerically intensive. In this article, we report on the progress we have made in exploiting large-scale distributed-memory parallel computers, consisting of hundreds of interconnected microprocessors, to generate electron-collision cross sections for gases of interest in plasma simulations.
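
    A minimal sketch of the coarse-grained decomposition such calculations admit: independent collision energies farmed out to processors. The use of Python's multiprocessing, the function names, and the energy grid are illustrative assumptions, not the authors' distributed-memory implementation.

        from multiprocessing import Pool

        def cross_section(energy_ev):
            # Placeholder for the numerically intensive scattering calculation at a
            # single collision energy.
            return energy_ev ** -0.5

        if __name__ == "__main__":
            energies = [0.5 * (i + 1) for i in range(100)]   # assumed energy grid (eV)
            with Pool() as pool:
                sigmas = pool.map(cross_section, energies)   # independent energies in parallel
            print(len(sigmas), "cross sections computed")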