3,155 research outputs found

    A performance comparison of the Cray-2 and the Cray X-MP

    A suite of thirteen large Fortran benchmark codes was run on Cray-2 and Cray X-MP supercomputers. These codes were a mix of compute-intensive scientific application programs (mostly Computational Fluid Dynamics) and some special vectorized computation exercise programs. For the general class of programs tested on the Cray-2, most of which were not specially tuned for speed, the floating point operation rates varied under a variety of system load configurations from 40 percent up to 125 percent of X-MP performance rates. It is concluded that the Cray-2, in the original system configuration studied (without memory pseudo-banking), will run untuned Fortran code, on average, at about 70 percent of X-MP speeds

    Plasma simulation using the massively parallel processor

    Two-dimensional electrostatic simulation codes using the particle-in-cell model are developed on the Massively Parallel Processor (MPP). The conventional plasma simulation procedure that computes electric fields at particle positions by means of a gridded system is found to be inefficient on the MPP. The MPP simulation code is thus based on the gridless system in which particles are assigned to processing elements and electric fields are computed directly via Discrete Fourier Transform. Currently, the gridless model on the MPP in two dimensions is about nine times slower than the gridded system on the CRAY X-MP without considering I/O time. However, the gridless system on the MPP can be improved by incorporating a faster I/O between the staging memory and Array Unit and a more efficient procedure for taking floating point sums over processing elements. The initial results suggest that the parallel processors have the potential for performing large scale plasma simulations

    Cray performance data from five benchmarks

    The five benchmark programs discussed in TM-88956, February 1987, were run on the CRAY X-MP/24 under different operating systems and compilers. Performance data is reported for runs under early versions of UNICOS and CFT77. The most recent data includes a system configuration for an X-MP hardware upgrade. Performance figures for the Y-MP are shown for comparison. Differences in the figures are analyzed and discussed

    Multitasking and microtasking experience on the NAS Cray-2 and ACF Cray X-MP

    The fast Fourier transform (FFT) kernel of the NAS benchmark program has been utilized to experiment with the multitasking library on the Cray-2 and Cray X-MP/48, and microtasking directives on the Cray X-MP. Some performance figures are shown, and the state of multitasking software is described

    A comparison of the Cray-2 performance before and after the installation of memory pseudo-banking

    A suite of 13 large Fortran benchmark codes was run on a Cray-2 configured with memory pseudo-banking circuits, and floating point operation rates were measured for each under a variety of system load configurations. These were compared with similar flop measurements taken on the same system before installation of the pseudo-banking. A useful memory access efficiency parameter was defined and calculated for both sets of performance rates, allowing a crude quantitative measure of the improvement in efficiency due to pseudo-banking. Programs were categorized as either highly scalar (S) or highly vectorized (V) and either memory-intensive or register-intensive, giving 4 categories: S-memory, S-register, V-memory, and V-register. Using flop rates as a simple quantifier of these 4 categories, a scatter plot of efficiency gain vs Mflops roughly illustrates the improvement in floating point processing speed due to pseudo-banking. On the Cray-2 system tested, this improvement ranged from 1 percent for S-memory codes to about 12 percent for V-memory codes. No significant gains were made for V-register codes, which was to be expected

    Partitioning strategy for efficient nonlinear finite element dynamic analysis on multiprocessor computers

    A computational procedure is presented for the nonlinear dynamic analysis of unsymmetric structures on vector multiprocessor systems. The procedure is based on a novel hierarchical partitioning strategy in which the response of the unsymmetric structure is approximated by a combination of symmetric and antisymmetric response vectors (modes), each obtained by using only a fraction of the degrees of freedom of the original finite element model. The three key elements of the procedure which result in a high degree of concurrency throughout the solution process are: (1) mixed (or primitive variable) formulation with independent shape functions for the different fields; (2) operator splitting or restructuring of the discrete equations at each time step to delineate the symmetric and antisymmetric vectors constituting the response; and (3) two-level iterative process for generating the response of the structure. An assessment is made of the effectiveness of the procedure on the CRAY X-MP/4 computers

    Efficient multitasking of Choleski matrix factorization on CRAY supercomputers

    A Choleski method is described and used to solve linear systems of equations that arise in large scale structural analysis. The method uses a novel variable-band storage scheme and is structured to exploit fast local memory caches while minimizing data access delays between main memory and vector registers. Several parallel implementations of this method are described for the CRAY-2 and CRAY Y-MP computers demonstrating the use of microtasking and autotasking directives. A portable parallel language, FORCE, is used for comparison with the microtasked and autotasked implementations. Results are presented comparing the matrix factorization times for three representative structural analysis problems from runs made in both dedicated and multi-user modes on both computers. CPU and wall clock timings are given for the parallel implementations and are compared to single processor timings of the same algorithm
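As a rough illustration of the idea behind a variable-band storage scheme, the following is a minimal serial sketch in Python. It is not the implementation from the paper above: the storage layout (each row of the lower triangle kept from its first nonzero column to the diagonal) and the function names `to_variable_band` and `cholesky_variable_band` are assumptions made for illustration, and no multitasking or vectorization is attempted.

```python
import math


def to_variable_band(a):
    """Pack the lower triangle of a dense symmetric matrix row by row.

    Hypothetical layout: row i keeps the entries from its first nonzero
    column first[i] up to and including the diagonal.
    """
    rows, first = [], []
    for i, row in enumerate(a):
        j0 = next(j for j in range(i + 1) if row[j] != 0.0 or j == i)
        first.append(j0)
        rows.append(list(row[j0:i + 1]))
    return rows, first


def cholesky_variable_band(rows, first):
    """Return the factor L (same packed layout) such that A = L * L^T.

    Fill-in stays inside the envelope, so L reuses the layout of A.
    """
    n = len(rows)
    L = [list(r) for r in rows]
    for i in range(n):
        for j in range(first[i], i + 1):
            # Only columns stored in both row i and row j contribute.
            lo = max(first[i], first[j])
            s = sum(L[i][k - first[i]] * L[j][k - first[j]]
                    for k in range(lo, j))
            if j < i:
                L[i][j - first[i]] = (L[i][j - first[i]] - s) / L[j][j - first[j]]
            else:
                d = L[i][i - first[i]] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i - first[i]] = math.sqrt(d)
    return L
```

Skipping the zeros outside each row's envelope is what shortens the inner products; a production code would lay the packed rows out contiguously so that those inner products run as vector operations.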

    CRAY mini manual. Revision D

    This document briefly describes the use of the CRAY supercomputers that are an integral part of the Supercomputing Network Subsystem of the Central Scientific Computing Complex at LaRC. Features of the CRAY supercomputers are covered, including: FORTRAN, C, PASCAL, architectures of the CRAY-2 and CRAY Y-MP, the CRAY UNICOS environment, batch job submittal, debugging, performance analysis, parallel processing, utilities unique to CRAY, and documentation. The document is intended for all CRAY users as a ready reference to frequently asked questions and to more detailed information contained in the vendor manuals. It is appropriate for both the novice and the experienced user

    Solving the shallow water equations on the Cray X-MP/48 and the Connection Machine 2

    The two-dimensional shallow water equations in Cartesian coordinates are solved on the Connection Machine 2 (CM-2) using both the spectral and finite difference methods. A description of these implementations is presented together with a brief discussion of the CM-2 as it relates to these specific computations. The finite difference code was written both in C* and *LISP and the spectral code was written in *LISP. The performance of the codes is compared with a FORTRAN version that was optimized for the Cray X-MP/48

    Using a Cray Y-MP as an array processor for a RISC Workstation

    As microprocessors increase in power, the economics of centralized computing have changed dramatically. At the beginning of the 1980s, mainframes and supercomputers were often considered to be cost-effective machines for scalar computing. Today, microprocessor-based RISC (reduced-instruction-set computer) systems have displaced many uses of mainframes and supercomputers. Supercomputers are still cost competitive when processing jobs that require both large memory size and high memory bandwidth. One such application is array processing. Certain numerical operations are appropriate to use in a Remote Procedure Call (RPC)-based environment. Matrix multiplication is an example of an operation that can have a sufficient number of arithmetic operations to amortize the cost of an RPC call. An experiment which demonstrates that matrix multiplication can be executed remotely on a large system to speed the execution over that experienced on a workstation is described
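The amortization argument in the abstract above can be made concrete with a simple cost model, sketched here in Python. The model is an assumption for illustration, not the experiment from the paper: multiplying two n x n matrices is taken to cost about 2n^3 floating point operations, while the RPC moves about 3n^2 words (two operands out, one result back) plus a fixed call latency. The function names and every parameter value are hypothetical.

```python
def local_time(n, local_flops):
    """Seconds to multiply two n x n matrices on the workstation."""
    return 2.0 * n**3 / local_flops


def remote_time(n, remote_flops, bandwidth_bytes, rpc_latency, word_bytes=8):
    """Seconds for the same product done remotely, including the RPC cost.

    Assumes 3 * n^2 words cross the network: two inputs and one result.
    """
    transfer = 3.0 * n**2 * word_bytes / bandwidth_bytes
    return rpc_latency + transfer + 2.0 * n**3 / remote_flops


def break_even_size(local_flops, remote_flops, bandwidth_bytes,
                    rpc_latency, n_max=10000):
    """Smallest n (up to n_max) for which the remote call is faster, else None."""
    for n in range(1, n_max + 1):
        if (remote_time(n, remote_flops, bandwidth_bytes, rpc_latency)
                < local_time(n, local_flops)):
            return n
    return None


# Hypothetical machine parameters, chosen only to exercise the model.
n_star = break_even_size(local_flops=10e6, remote_flops=200e6,
                         bandwidth_bytes=1e6, rpc_latency=0.01)
```

Because arithmetic grows as n^3 and data movement only as n^2, the remote call wins past some matrix size whenever the remote machine is faster; where that crossover falls depends entirely on the assumed rates.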