18,175 research outputs found
PAN AIR: A computer program for predicting subsonic or supersonic linear potential flows about arbitrary configurations using a higher order panel method. Volume 4: Maintenance document (version 3.0)
The Maintenance Document Version 3.0 is a guide to the PAN AIR software system, which computes the subsonic or supersonic linear potential flow about a body of nearly arbitrary shape, using a higher order panel method. The document describes the overall system and each program module of the system. Sufficient detail is given for program maintenance, updating, and modification. It is assumed that the reader is familiar with programming and CRAY computer systems. The PAN AIR system was written in FORTRAN 4 language except for a few CAL language subroutines which exist in the PAN AIR library. Structured programming techniques were used to provide code documentation and maintainability. The operating systems accommodated are COS 1.11, COS 1.12, COS 1.13, and COS 1.14 on the CRAY 1S, 1M, and X-MP computing systems. The system consists of a data base management system, a program library, an execution control module, and nine separate FORTRAN technical modules. Each module calculates part of the posed PAN AIR problem. The data base manager is used to communicate between modules and within modules. The technical modules must be run in a prescribed fashion for each PAN AIR problem. To ease the problem of supplying the many JCL cards required to execute the modules, a set of CRAY procedures (PAPROCS) was created to automatically supply most of the JCL cards. Most of this document has not changed for Version 3.0. It now, however, strictly applies only to PAN AIR version 3.0. The major changes are: (1) additional sections covering the new FDP module (which calculates streamlines and offbody points); (2) a complete rewrite of the section on the MAG module; and (3) strict applicability to CRAY computing systems.
Integrated risk/cost planning models for the US Air Traffic system
A prototype network planning model for the U.S. Air Traffic control system is described. The model encompasses the dual objectives of managing collision risks and transportation costs where traffic flows can be related to these objectives. The underlying structure is a network graph with nonseparable convex costs; the model is solved efficiently by capitalizing on its intrinsic characteristics. Two specialized algorithms for solving the resulting problems are described: (1) truncated Newton, and (2) simplicial decomposition. The feasibility of the approach is demonstrated using data collected from a control center in the Midwest. Computational results with different computer systems are presented, including a vector supercomputer (CRAY-XMP). The risk/cost model has two primary uses: (1) as a strategic planning tool using aggregate flight information, and (2) as an integrated operational system for forecasting congestion and monitoring (controlling) flow throughout the U.S. In the latter case, access to a supercomputer is required due to the model's enormous size
COSMIC/NASTRAN on the Cray Computer Systems
COSMIC/NASTRAN was converted to the CRAY computer systems. The CRAY version is currently available and provides users with access to all of the machine-independent source code of COSMIC/NASTRAN. Future releases of COSMIC/NASTRAN will be made available on the CRAY soon after they are released by COSMIC.
Numerical Techniques for the Study of Long-Time Correlations
In the study of long-time correlations extremely long orbits must be
calculated. This may be accomplished much more reliably using fixed-point
arithmetic. Use of this arithmetic on the Cray-1 computer is illustrated.
Comment: Plain TeX, 10 pages. Proc. Workshop on Orbital Dynamics and
Applications to Accelerators, Lawrence Berkeley Laboratory, Berkeley,
California, March 7-12, 198
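The reliability claim above can be illustrated with a minimal sketch. It assumes a standard-map-like kick on an integer lattice; the abstract does not name the map, and the lattice size `M`, the kick strength `K`, and all function names are hypothetical choices for illustration, not the Cray-1 implementation. Because every update is an exact integer operation modulo `M`, the map is exactly invertible and an orbit can be retraced with no round-off drift.

```python
import math

M = 1 << 32   # lattice points per unit interval (assumed value)
K = 0.9       # kick strength (assumed value)

def force(x):
    """Integer-valued kick: a deterministic function of the integer position."""
    return round(K * M / (2 * math.pi) * math.sin(2 * math.pi * x / M))

def step(x, p):
    """One forward iteration using only exact integer arithmetic modulo M."""
    p = (p + force(x)) % M
    x = (x + p) % M
    return x, p

def step_back(x, p):
    """Exact inverse of step(): undo the drift, then undo the kick."""
    x = (x - p) % M
    p = (p - force(x)) % M
    return x, p

def iterate(state, n, f):
    """Apply the map f to state n times."""
    x, p = state
    for _ in range(n):
        x, p = f(x, p)
    return x, p

start = (123456789, 987654321)
forward = iterate(start, 10000, step)
restored = iterate(forward, 10000, step_back)
print(restored == start)
```

The floating-point sine is evaluated only on an exact integer argument, so the forward and backward passes see identical kicks. That is what makes the round trip exact in this sketch.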
Cluster vs Single-Spin Algorithms -- Which are More Efficient?
A comparison between single-cluster and single-spin algorithms is made for
the Ising model in 2 and 3 dimensions. We compare the amount of computer time
needed to achieve a given level of statistical accuracy, rather than the speed
in terms of site updates per second or the dynamical critical exponents. Our
main result is that the cluster algorithms become more efficient when the
system size, , exceeds, -- for and --
for . The exact value of the crossover is dependent upon the computer
being used. The lower end of the crossover range is typical of workstations
while the higher end is typical of vector computers. Hence, even for
workstations, the system sizes needed for efficient use of the cluster
algorithm are relatively large.
Comment: 13 pages, postscript file, HLRZ 21/9
Experiences in porting mini-applications to OpenACC and OpenMP on heterogeneous systems
This article studies mini-applications (Minisweep, GenASiS, GPP, and FF) that use computational methods commonly encountered in HPC. We have ported these applications to develop OpenACC and OpenMP versions, and evaluated their performance on Titan (Cray XK7 with K20x GPUs), Cori (Cray XC40 with Intel KNL), Summit (IBM AC922 with Volta GPUs), and Cori-GPU (Cray CS-Storm 500NX with Intel Skylake and Volta GPUs). Our goals are for these new ports to be useful to both application and compiler developers, to document and describe the lessons learned and the methodology to create optimized OpenMP and OpenACC versions, and to provide a description of possible migration paths between the two specifications. Cases where specific directives or code patterns result in improved performance for a given architecture are highlighted. We also include discussions of the functionality and maturity of the latest compilers available on the above platforms with respect to OpenACC or OpenMP implementations.
A parallel nearly implicit time-stepping scheme
Across-the-space parallelism still remains the most mature, convenient and natural way to parallelize large scale problems. One of the major problems here is that implicit time stepping is often difficult to parallelize due to the structure of the system. Approximate implicit schemes have been suggested to circumvent the problem. These schemes have attractive stability properties and they also parallelize very well.
The purpose of this article is to give an overall assessment of the parallelism of the method.
Parallel computing for the finite element method
A finite element method is presented to compute time harmonic microwave
fields in three dimensional configurations. Nodal-based finite elements have
been coupled with an absorbing boundary condition to solve open boundary
problems. This paper describes how the modeling of large devices has been made
possible using parallel computation. New algorithms are then proposed to
implement this formulation on a cluster of workstations (10 DEC ALPHA 300X) and
on a CRAY C98. Analysis of the computation efficiency is performed using simple
problems. The electromagnetic scattering of a plane wave by a perfectly
conducting airplane is finally given as an example.
Scalability Analysis of Parallel GMRES Implementations
Applications involving large sparse nonsymmetric linear systems encourage parallel implementations of robust iterative solution methods, such as GMRES(k). Two parallel versions of GMRES(k) based on different data distributions and using Householder reflections in the orthogonalization phase, and variations of these which adapt the restart value k, are analyzed with respect to scalability (their ability to maintain fixed efficiency with an increase in problem size and number of processors). A theoretical algorithm-machine model for scalability is derived and validated by experiments on three parallel computers, each with different machine characteristics.
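The notion of scalability used above, holding efficiency fixed as the problem size and processor count grow together, can be sketched with a toy cost model. The model below is a hypothetical illustration, not the algorithm-machine model derived in the paper: it assumes that the parallel time is the serial work divided by the processor count plus a communication overhead of `c * log2(p)`, and every name in it is invented for the example.

```python
import math

def overhead(p, c=1.0):
    """Assumed per-processor communication cost, growing like log2(p)."""
    return c * math.log2(p)

def parallel_time(work, p, c=1.0):
    """Toy model: perfectly divided work plus the assumed overhead."""
    return work / p + overhead(p, c)

def efficiency(work, p, c=1.0):
    """Parallel efficiency E = T_1 / (p * T_p), with T_1 = work."""
    return work / (p * parallel_time(work, p, c))

def work_for_efficiency(target, p, c=1.0):
    """Problem size that keeps efficiency at `target` on p processors.

    Solving E = W / (W + p * overhead) for W gives
    W = E / (1 - E) * p * overhead.
    """
    return target / (1.0 - target) * p * overhead(p, c)

for p in (4, 16, 64, 256):
    w = work_for_efficiency(0.8, p)
    print(p, w, round(efficiency(w, p), 3))
```

Under these assumptions the work needed to hold 80% efficiency grows like `p * log2(p)`. The slower the required growth, the more scalable the algorithm-machine combination.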
Massive Parallel Quantum Computer Simulator
We describe portable software to simulate universal quantum computers on
massive parallel computers. We illustrate the use of the simulation software by
running various quantum algorithms on different computer architectures, such as
an IBM BlueGene/L, an IBM Regatta p690+, a Hitachi SR11000/J1, a Cray X1E, an SGI
Altix 3700 and clusters of PCs running Windows XP. We study the performance of
the software by simulating quantum computers containing up to 36 qubits, using
up to 4096 processors and up to 1 TB of memory. Our results demonstrate that
the simulator exhibits nearly ideal scaling as a function of the number of
processors and suggest that the simulation software described in this paper may
also serve as a benchmark for testing high-end parallel computers.
Comment: To appear in Comp. Phys. Com
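The 36-qubit and 1 TB figures quoted above are consistent with a simple back-of-the-envelope count, sketched here. It assumes the simulator stores the full state vector with one double-precision complex amplitude (16 bytes) per basis state; the abstract does not state the storage format, so `BYTES_PER_AMPLITUDE` and the function name are assumptions.

```python
BYTES_PER_AMPLITUDE = 16  # assumed: two 8-byte floats (real and imaginary parts)

def state_vector_bytes(n_qubits):
    """Memory needed to hold all 2**n complex amplitudes of an n-qubit state."""
    return (2 ** n_qubits) * BYTES_PER_AMPLITUDE

for n in (30, 33, 36):
    size = state_vector_bytes(n)
    print(n, size, size / 2 ** 40)  # size in bytes and in units of 2**40 bytes
```

Under this assumption each added qubit doubles the memory, and 36 qubits need exactly 2**40 bytes.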