15,926 research outputs found

    Performance Analysis for Mesh and Mesh-Spectral Archetype Applications

    Get PDF
    This document outlines a simple method for benchmarking a parallel communication library and for using the results to model the performance of applications developed with that communication library. We use compositional performance analysis - decomposing a parallel program into its modular parts and analyzing their respective performances - to gain perspective on the performance of the whole program. This model is useful for predicting parallel program execution times for different types of program archetypes, (e.g., mesh and mesh-spectral) using communication libraries built with different message-passing schemes (e.g., Fortran M and Fortran with MPI) running on different architectures (e.g., IBM SP2 and a network of Pentium personal computers)

    Development of programs for computing characteristics of ultraviolet radiation

    Get PDF
    Efficient programs were developed for computing all four characteristics of the radiation scattered by a plane-parallel, turbid, terrestrial atmospheric model. They were developed (FORTRAN 4) and tested on the IBM /360 computers with 2314 direct access storage facility. The storage requirement varies between 200K and 750K bytes depending upon the task. The scattering phase matrix (or function) is expanded in a Fourier series whose number of terms depend upon the zenith angles of the incident and scattered radiations, as well as on the nature of aerosols. A Gauss-Seidel procedure is used for obtaining the numerical solution of the transfer equation

    Parallel Implementation of the PHOENIX Generalized Stellar Atmosphere Program

    Get PDF
    We describe the parallel implementation of our generalized stellar atmosphere and NLTE radiative transfer computer program PHOENIX. We discuss the parallel algorithms we have developed for radiative transfer, spectral line opacity, and NLTE opacity and rate calculations. Our implementation uses a MIMD design based on a relatively small number of MPI library calls. We report the results of test calculations on a number of different parallel computers and discuss the results of scalability tests.Comment: To appear in ApJ, 1997, vol 483. LaTeX, 34 pages, 3 Figures, uses AASTeX macros and styles natbib.sty, and psfig.st

    A Monte Carlo investigation of thrust imbalance of solid rocket motor pairs

    Get PDF
    A technique is described for theoretical, statistical evaluation of the thrust imbalance of pairs of solid-propellant rocket motors (SRMs) firing in parallel. Sets of the significant variables, determined as a part of the research, are selected using a random sampling technique and the imbalance calculated for a large number of motor pairs. The performance model is upgraded to include the effects of statistical variations in the ovality and alignment of the motor case and mandrel. Effects of cross-correlations of variables are minimized by selecting for the most part completely independent input variables, over forty in number. The imbalance is evaluated in terms of six time - varying parameters as well as eleven single valued ones which themselves are subject to statistical analysis. A sample study of the thrust imbalance of 50 pairs of 146 in. dia. SRMs of the type to be used on the space shuttle is presented. The FORTRAN IV computer program of the analysis and complete instructions for its use are included. Performance computation time for one pair of SRMs is approximately 35 seconds on the IBM 370/155 using the FORTRAN H compiler

    The impact of fourth generation computers on NASTRAN

    Get PDF
    The impact of 'fourth generation' computers (STAR 100 or ILLIAC 4) on NASTRAN is considered. The desired characteristics of large programs designed for execution on 4G machines are described

    On the acceleration of wavefront applications using distributed many-core architectures

    Get PDF
    In this paper we investigate the use of distributed graphics processing unit (GPU)-based architectures to accelerate pipelined wavefront applications—a ubiquitous class of parallel algorithms used for the solution of a number of scientific and engineering applications. Specifically, we employ a recently developed port of the LU solver (from the NAS Parallel Benchmark suite) to investigate the performance of these algorithms on high-performance computing solutions from NVIDIA (Tesla C1060 and C2050) as well as on traditional clusters (AMD/InfiniBand and IBM BlueGene/P). Benchmark results are presented for problem classes A to C and a recently developed performance model is used to provide projections for problem classes D and E, the latter of which represents a billion-cell problem. Our results demonstrate that while the theoretical performance of GPU solutions will far exceed those of many traditional technologies, the sustained application performance is currently comparable for scientific wavefront applications. Finally, a breakdown of the GPU solution is conducted, exposing PCIe overheads and decomposition constraints. A new k-blocking strategy is proposed to improve the future performance of this class of algorithm on GPU-based architectures

    An LLVM Instrumentation Plug-in for Score-P

    Full text link
    Reducing application runtime, scaling parallel applications to higher numbers of processes/threads, and porting applications to new hardware architectures are tasks necessary in the software development process. Therefore, developers have to investigate and understand application runtime behavior. Tools such as monitoring infrastructures that capture performance relevant data during application execution assist in this task. The measured data forms the basis for identifying bottlenecks and optimizing the code. Monitoring infrastructures need mechanisms to record application activities in order to conduct measurements. Automatic instrumentation of the source code is the preferred method in most application scenarios. We introduce a plug-in for the LLVM infrastructure that enables automatic source code instrumentation at compile-time. In contrast to available instrumentation mechanisms in LLVM/Clang, our plug-in can selectively include/exclude individual application functions. This enables developers to fine-tune the measurement to the required level of detail while avoiding large runtime overheads due to excessive instrumentation.Comment: 8 page
    corecore