6,051 research outputs found

    Transferring ecosystem simulation codes to supercomputers

    Many ecosystem simulation computer codes have been developed in the last twenty-five years. This development took place initially on main-frame computers, then mini-computers, and more recently, on micro-computers and workstations. Supercomputing platforms (both parallel and distributed systems) have been largely unused, however, because of the perceived difficulty in accessing and using the machines. Also, significant differences in the system architectures of sequential, scalar computers and parallel and/or vector supercomputers must be considered. We have transferred a grassland simulation model (developed on a VAX) to a Cray Y-MP/C90. We describe porting the model to the Cray and the changes we made to exploit the parallelism in the application and improve code execution. The Cray executed the model 30 times faster than the VAX and 10 times faster than a Unix workstation. We achieved an additional speedup of 30 percent by using the compiler's vectoring and 'in-line' capabilities. The code runs at only about 5 percent of the Cray's peak speed because it ineffectively uses the vector and parallel processing capabilities of the Cray. We expect that by restructuring the code, it could execute an additional six to ten times faster

    Efficient multitasking of Choleski matrix factorization on CRAY supercomputers

    A Choleski method is described and used to solve linear systems of equations that arise in large scale structural analysis. The method uses a novel variable-band storage scheme and is structured to exploit fast local memory caches while minimizing data access delays between main memory and vector registers. Several parallel implementations of this method are described for the CRAY-2 and CRAY Y-MP computers demonstrating the use of microtasking and autotasking directives. A portable parallel language, FORCE, is used for comparison with the microtasked and autotasked implementations. Results are presented comparing the matrix factorization times for three representative structural analysis problems from runs made in both dedicated and multi-user modes on both computers. CPU and wall clock timings are given for the parallel implementations and are compared to single processor timings of the same algorithm
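As an illustration of the idea in the abstract above, here is a minimal sequential sketch of Choleski (Cholesky) factorization with a variable-band (skyline) row storage. It is not the authors' implementation: the function names `to_variable_band`, `cholesky_variable_band`, and `solve_variable_band` are hypothetical, and the microtasking, autotasking, and FORCE aspects are not shown. Each row of the lower triangle is stored only from its first nonzero column to the diagonal, and the factor stays inside that same profile.

```python
import math


def to_variable_band(a):
    """Store each lower-triangle row from its first nonzero column to the diagonal."""
    rows, first = [], []
    for i, row in enumerate(a):
        j0 = next(j for j in range(i + 1) if row[j] != 0.0 or j == i)
        first.append(j0)
        rows.append([float(v) for v in row[j0:i + 1]])
    return rows, first


def cholesky_variable_band(rows, first):
    """Return L (same storage layout) with A = L * L^T; A must be symmetric positive definite."""
    l = [list(r) for r in rows]
    for i in range(len(l)):
        for j in range(first[i], i + 1):
            s = l[i][j - first[i]]
            # Only columns present in both row i and row j contribute.
            for k in range(max(first[i], first[j]), j):
                s -= l[i][k - first[i]] * l[j][k - first[j]]
            if j == i:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                l[i][j - first[i]] = math.sqrt(s)
            else:
                l[i][j - first[i]] = s / l[j][j - first[j]]
    return l


def solve_variable_band(l, first, b):
    """Solve A x = b using the factor L: forward substitution, then back substitution."""
    n = len(l)
    x = [float(v) for v in b]
    for i in range(n):  # L y = b
        for k in range(first[i], i):
            x[i] -= l[i][k - first[i]] * x[k]
        x[i] /= l[i][i - first[i]]
    for i in range(n - 1, -1, -1):  # L^T x = y, processed column by column
        x[i] /= l[i][i - first[i]]
        for k in range(first[i], i):
            x[k] -= l[i][k - first[i]] * x[i]
    return x
```

A typical call sequence would be `rows, first = to_variable_band(A)`, then `L = cholesky_variable_band(rows, first)`, then `x = solve_variable_band(L, first, b)`. Storing by row profile keeps the inner loops contiguous, which is the kind of access pattern that vector hardware favors.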

    Numerical solution of large structured DAE systems in chemical process simulation

    Parallelizable numerical methods for solving large-scale DAE systems are developed at the level of differential, nonlinear and linear equations. For this purpose, the subsystem-wise structure of the DAE systems based on unit-oriented modelling is explored. Partitionings are used to parallelize waveform relaxation and structured Newton methods. Initial values are computed with a modified Newton method. To solve large sparse systems of linear equations, a special Gaussian elimination method is used. The algorithms were implemented on a CRAY C90 vector computer, as well as on both moderately parallel CRAY J90 vector computers and massively parallel CRAY T3D machines. The methods were tested using several real-life examples.

    Solving the Cauchy-Riemann equations on parallel computers

    Discussed is the implementation of a single algorithm on three parallel-vector computers. The algorithm is a relaxation scheme for the solution of the Cauchy-Riemann equations, a set of coupled first-order partial differential equations. The computers were chosen so as to encompass a variety of architectures. They are: the MPP, an SIMD machine with 16K bit-serial processors; FLEX/32, an MIMD machine with 20 processors; and CRAY/2, an MIMD machine with four vector processors. The machine architectures are briefly described. The implementation of the algorithm is discussed in relation to these architectures, and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Conclusions are presented.

    Implementation and analysis of a Navier-Stokes algorithm on parallel computers

    The results of the implementation of a Navier-Stokes algorithm on three parallel/vector computers are presented. The object of this research is to determine how well, or poorly, a single numerical algorithm would map onto three different architectures. The algorithm is a compact difference scheme for the solution of the incompressible, two-dimensional, time-dependent Navier-Stokes equations. The computers were chosen so as to encompass a variety of architectures. They are the following: the MPP, an SIMD machine with 16K bit serial processors; Flex/32, an MIMD machine with 20 processors; and Cray/2. The implementation of the algorithm is discussed in relation to these architectures and measures of the performance on each machine are given. The basic comparison is among SIMD instruction parallelism on the MPP, MIMD process parallelism on the Flex/32, and vectorization of a serial code on the Cray/2. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented

    Analysis and performance prediction of scramjet inlets utilizing a three-dimensional Navier-Stokes code

    A series of inlet analysis codes (2-D, axisymmetric, 3-D) were developed which can analyze complicated flow through complex inlet geometries in a reasonably efficient manner. The codes were verified and are being used extensively to analyze practical inlet geometries both at Langley and in industry. The newly installed VPS 32 computer will allow more complex configurations to be analyzed. Scalar FORTRAN versions are available to increase transportability of the codes for use on other scalar computers and on the Cray vector processing computer.

    Lanczos eigensolution method for high-performance computers

    The theory, computational analysis, and applications of a Lanczos algorithm on high-performance computers are presented. The computationally intensive steps of the algorithm are identified as the matrix factorization, the forward/backward equation solution, and the matrix-vector multiplies. These computational steps are optimized to exploit the vector and parallel capabilities of high-performance computers. The savings in computational time from applying optimization techniques such as variable-band and sparse data storage and access, loop unrolling, use of local memory, and compiler directives are presented. Two large-scale structural analysis applications are described: the buckling of a composite blade-stiffened panel with a cutout, and the vibration analysis of a high-speed civil transport. The sequential computational time for the panel problem executed on a CONVEX computer, 181.6 seconds, was decreased to 14.1 seconds with the optimized vector algorithm. The best computational time for the transport problem with 17,000 degrees of freedom, 23 seconds, was obtained on the Cray Y-MP using an average of 3.63 processors.
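To make the structure of the method concrete, the following is a minimal sketch of the basic Lanczos iteration for a symmetric operator, with full reorthogonalization for robustness. It is a generic textbook form, not the paper's optimized code; the names `lanczos`, `matvec`, `dot`, and `norm` are assumptions introduced here. In a structural analysis setting of the kind the abstract describes, `matvec` would typically wrap the factorization and forward/backward solves, which is presumably why those steps dominate the cost; here it is just a plain matrix-vector product.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def lanczos(matvec, v0, m):
    """Run up to m Lanczos steps; return (alphas, betas, basis).

    alphas and betas are the diagonal and off-diagonal of the tridiagonal
    matrix T whose eigenvalues approximate those of the operator.
    """
    scale = norm(v0)
    basis = [[x / scale for x in v0]]
    alphas, betas = [], []
    for j in range(m):
        w = matvec(basis[j])
        alphas.append(dot(w, basis[j]))
        # Full reorthogonalization: remove components along every basis vector
        # (this also performs the usual alpha and beta subtractions).
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        beta = norm(w)
        if j == m - 1 or beta < 1e-12:
            break  # finished, or an invariant subspace was found
        betas.append(beta)
        basis.append([wi / beta for wi in w])
    return alphas, betas, basis
```

The eigenvalues of the small tridiagonal matrix built from `alphas` and `betas` can then be computed with any dense eigensolver. Full reorthogonalization costs extra inner products; production codes often use selective variants instead.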

    COSMIC/NASTRAN on the Cray Computer Systems

    COSMIC/NASTRAN was converted to the CRAY computer systems. The CRAY version is currently available and provides users with access to all of the machine independent source code of COSMIC/NASTRAN. Future releases of COSMIC/NASTRAN will be made available on the CRAY soon after they are released by COSMIC

    A simple parallel prefix algorithm for compact finite-difference schemes

    A compact scheme is a discretization scheme that is advantageous in obtaining highly accurate solutions. However, the resulting systems from compact schemes are tridiagonal systems that are difficult to solve efficiently on parallel computers. Considering the almost symmetric Toeplitz structure, a parallel algorithm, simple parallel prefix (SPP), is proposed. The SPP algorithm requires less memory than the conventional LU decomposition and is highly efficient on parallel machines. It consists of a prefix communication pattern and AXPY operations. Both the computation and the communication can be truncated without degrading the accuracy when the system is diagonally dominant. A formal accuracy study was conducted to provide a simple truncation formula. Experimental results were measured on a MasPar MP-1 SIMD machine and on a Cray 2 vector machine. Experimental results show that the simple parallel prefix algorithm is a good algorithm for the compact scheme on high-performance computers
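For context, the sketch below shows how a compact scheme leads to a diagonally dominant tridiagonal Toeplitz system. It uses the classical fourth-order Padé first-derivative scheme, (1/4) f'(i-1) + f'(i) + (1/4) f'(i+1) = 3 (f(i+1) - f(i-1)) / (4h), and solves it with the sequential Thomas algorithm. This is a baseline for comparison only, not the SPP algorithm from the abstract, whose details are not given here. The function names and the choice to supply exact boundary derivatives are assumptions made for illustration.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system sequentially (lower[0] and upper[-1] are unused)."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def compact_first_derivative(f, h, d_left, d_right):
    """Fourth-order Pade derivative on a uniform grid.

    d_left and d_right are the (assumed known) derivatives at the two end points.
    """
    m = len(f) - 2  # number of interior unknowns
    rhs = [0.75 / h * (f[i + 1] - f[i - 1]) for i in range(1, m + 1)]
    rhs[0] -= 0.25 * d_left
    rhs[-1] -= 0.25 * d_right
    interior = thomas([0.25] * m, [1.0] * m, [0.25] * m, rhs)
    return [d_left] + interior + [d_right]
```

The forward and backward sweeps in `thomas` are inherently sequential, since each step depends on the previous one. That dependence is the obstacle on parallel machines that motivates prefix-style reformulations.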

    Cluster vs Single-Spin Algorithms -- Which are More Efficient?

    A comparison between single-cluster and single-spin algorithms is made for the Ising model in 2 and 3 dimensions. We compare the amount of computer time needed to achieve a given level of statistical accuracy, rather than the speed in terms of site updates per second or the dynamical critical exponents. Our main result is that the cluster algorithms become more efficient when the system size, $L^d$, exceeds $L \sim 70$--$300$ for $d=2$ and $L \sim 80$--$200$ for $d=3$. The exact value of the crossover is dependent upon the computer being used. The lower end of the crossover range is typical of workstations while the higher end is typical of vector computers. Hence, even for workstations, the system sizes needed for efficient use of the cluster algorithm are relatively large. Comment: 13 pages, postscript file, HLRZ 21/9
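The two update types being compared can be sketched as follows for the 2D Ising model with coupling J = 1 and periodic boundaries. This is a minimal illustration in the standard textbook forms (Metropolis single-spin flips and Wolff single-cluster flips), not the code used in the paper; the function names and the explicit `rng` argument are assumptions.

```python
import math
import random


def neighbors(i, j, size):
    """Four nearest neighbors on a periodic square lattice."""
    return [((i + 1) % size, j), ((i - 1) % size, j),
            (i, (j + 1) % size), (i, (j - 1) % size)]


def metropolis_sweep(spins, beta, rng):
    """One sweep of single-spin Metropolis updates (size*size attempted flips)."""
    size = len(spins)
    for _ in range(size * size):
        i, j = rng.randrange(size), rng.randrange(size)
        field = sum(spins[x][y] for x, y in neighbors(i, j, size))
        d_energy = 2 * spins[i][j] * field
        if d_energy <= 0 or rng.random() < math.exp(-beta * d_energy):
            spins[i][j] = -spins[i][j]


def wolff_update(spins, beta, rng):
    """Grow and flip one Wolff cluster; return the cluster size."""
    size = len(spins)
    p_add = 1.0 - math.exp(-2.0 * beta)  # bond activation probability
    i, j = rng.randrange(size), rng.randrange(size)
    seed = spins[i][j]
    spins[i][j] = -seed  # flipping on entry also marks the site as visited
    stack, cluster_size = [(i, j)], 1
    while stack:
        x, y = stack.pop()
        for nx, ny in neighbors(x, y, size):
            if spins[nx][ny] == seed and rng.random() < p_add:
                spins[nx][ny] = -seed
                stack.append((nx, ny))
                cluster_size += 1
    return cluster_size
```

The Metropolis loop has a fixed, regular amount of work per sweep, whereas the cluster growth follows an irregular, data-dependent path through the lattice. That difference in access pattern is one plausible reason the crossover size depends on the machine, as the abstract notes.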