Search CORE

453 research outputs found

Guidelines for developing vectorizable computer programs

Author: Miner E. W.
Publication venue
Publication date
Field of study

Some fundamental principles for developing computer programs which are compatible with array-oriented computers are presented. The emphasis is on basic techniques for structuring computer codes which are applicable in FORTRAN and do not require a special programming language or exact a significant penalty on a scalar computer. Researchers who are using numerical techniques to solve problems in engineering can apply these basic principles and thus develop transportable computer programs (in FORTRAN) which contain much vectorizable code. The vector architecture of the ASC is discussed so that the requirements of array processing can be better appreciated. The "vectorization" of a finite-difference viscous shock-layer code is used as an example to illustrate the benefits and some of the difficulties involved. Increases in computing speed with vectorization are illustrated with results from the viscous shock-layer code and from a finite-element shock tube code. The applicability of these principles was substantiated through running programs on other computers with array-associated computing characteristics, such as the Hewlett-Packard (H-P) 1000-F

NASA Technical Reports Server

Supercomputer optimizations for stochastic optimal control applications

Author: Chung Siu-Leung
Hanson Floyd B.
Xu Huihuang
Publication venue
Publication date
Field of study

Supercomputer optimizations for a computational method of solving stochastic, multibody, dynamic programming problems are presented. The computational method is valid for a general class of optimal control problems that are nonlinear, multibody dynamical systems, perturbed by general Markov noise in continuous time, i.e., nonsmooth Gaussian as well as jump Poisson random white noise. Optimization techniques for vector multiprocessors or vectorizing supercomputers include advanced data structures, loop restructuring, loop collapsing, blocking, and compiler directives. These advanced computing techniques and superconducting hardware help alleviate Bellman's curse of dimensionality in dynamic programming computations, by permitting the solution of large multibody problems. Possible applications include lumped flight dynamics models for uncertain environments, such as large scale and background random aerospace fluctuations

NASA Technical Reports Server

Accuracy and speed in computing the Chebyshev collocation derivative

Author: Don Wai-Sun
Solomonoff Alex
Publication venue
Publication date
Field of study

We studied several algorithms for computing the Chebyshev spectral derivative and compare their roundoff error. For a large number of collocation points, the elements of the Chebyshev differentiation matrix, if constructed in the usual way, are not computed accurately. A subtle cause is is found to account for the poor accuracy when computing the derivative by the matrix-vector multiplication method. Methods for accurately computing the elements of the matrix are presented, and we find that if the entities of the matrix are computed accurately, the roundoff error of the matrix-vector multiplication is as small as that of the transform-recursion algorithm. Results of CPU time usage are shown for several different algorithms for computing the derivative by the Chebyshev collocation method for a wide variety of two-dimensional grid sizes on both an IBM and a Cray 2 computer. We found that which algorithm is fastest on a particular machine depends not only on the grid size, but also on small details of the computer hardware as well. For most practical grid sizes used in computation, the even-odd decomposition algorithm is found to be faster than the transform-recursion method

NASA Technical Reports Server

Development of a CRAY 1 version of the SINDA program

Author: Fogerson P. E.
Juba S. M.
Publication venue
Publication date
Field of study

The SINDA thermal analyzer program was transferred from the UNIVAC 1110 computer to a CYBER And then to a CRAY 1. Significant changes to the code of the program were required in order to execute efficiently on the CYBER and CRAY. The program was tested on the CRAY using a thermal math model of the shuttle which was too large to run on either the UNIVAC or CYBER. An effort was then begun to further modify the code of SINDA in order to make effective use of the vector capabilities of the CRAY

NASA Technical Reports Server

Initial operating capability for the hypercluster parallel-processing test bed

Author: Blech Richard A.
Cole Gary L.
Quealy Angela
Publication venue
Publication date
Field of study

The NASA Lewis Research Center is investigating the benefits of parallel processing to applications in computational fluid and structural mechanics. To aid this investigation, NASA Lewis is developing the Hypercluster, a multi-architecture, parallel-processing test bed. The initial operating capability (IOC) being developed for the Hypercluster is described. The IOC will provide a user with a programming/operating environment that is interactive, responsive, and easy to use. The IOC effort includes the development of the Hypercluster Operating System (HYCLOPS). HYCLOPS runs in conjunction with a vendor-supplied disk operating system on a Front-End Processor (FEP) to provide interactive, run-time operations such as program loading, execution, memory editing, and data retrieval. Run-time libraries, that augment the FEP FORTRAN libraries, are being developed to support parallel and vector processing on the Hypercluster. Special utilities are being provided to enable passage of information about application programs and their mapping to the operating system. Communications between the FEP and the Hypercluster are being handled by dedicated processors, each running a Message-Passing Kernel, (MPK). A shared-memory interface allows rapid data exchange between HYCLOPS and the communications processors. Input/output handlers are built into the HYCLOPS-MPK interface, eliminating the need for the user to supply separate I/O support programs on the FEP

NASA Technical Reports Server

Towards an Achievable Performance for the Loop Nests

Author: A Darte
AH Ashouri
AW Lim
DA Padua
G Fursin
Georgios Tournavitis
J Demšar
K Kennedy
K Stock
MJ Wolfe
Padua
R Allen
R Cammarota
T Grosser
U Bondhugula
W Li
Zhangxiaowen Gong
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019
Field of study

Numerous code optimization techniques, including loop nest optimizations, have been developed over the last four decades. Loop optimization techniques transform loop nests to improve the performance of the code on a target architecture, including exposing parallelism. Finding and evaluating an optimal, semantic-preserving sequence of transformations is a complex problem. The sequence is guided using heuristics and/or analytical models and there is no way of knowing how close it gets to optimal performance or if there is any headroom for improvement. This paper makes two contributions. First, it uses a comparative analysis of loop optimizations/transformations across multiple compilers to determine how much headroom may exist for each compiler. And second, it presents an approach to characterize the loop nests based on their hardware performance counter values and a Machine Learning approach that predicts which compiler will generate the fastest code for a loop nest. The prediction is made for both auto-vectorized, serial compilation and for auto-parallelization. The results show that the headroom for state-of-the-art compilers ranges from 1.10x to 1.42x for the serial code and from 1.30x to 1.71x for the auto-parallelized code. These results are based on the Machine Learning predictions.Comment: Accepted at the 31st International Workshop on Languages and Compilers for Parallel Computing (LCPC 2018

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

Parallel machine architecture and compiler design facilities

Author: Kuck David J.
Padua David
Sameh Ahmed
Veidenbaum Alex
Yew Pen-Chung
Publication venue
Publication date
Field of study

The objective is to provide an integrated simulation environment for studying and evaluating various issues in designing parallel systems, including machine architectures, parallelizing compiler techniques, and parallel algorithms. The status of Delta project (which objective is to provide a facility to allow rapid prototyping of parallelized compilers that can target toward different machine architectures) is summarized. Included are the surveys of the program manipulation tools developed, the environmental software supporting Delta, and the compiler research projects in which Delta has played a role

NASA Technical Reports Server

Vector computers, Monte Carlo simulation and regression analysis: An introduction.

Author: Annink B.
Kleijnen J.P.C.
Publication venue
Publication date
Field of study

Research Papers in Economics

Use of CYBER 203 and CYBER 205 computers for three-dimensional transonic flow calculations

Author: Keller J. D.
Melson N. D.
Publication venue
Publication date
Field of study

Experiences are discussed for modifying two three-dimensional transonic flow computer programs (FLO 22 and FLO 27) for use on the CDC CYBER 203 computer system. Both programs were originally written for use on serial machines. Several methods were attempted to optimize the execution of the two programs on the vector machine: leaving the program in a scalar form (i.e., serial computation) with compiler software used to optimize and vectorize the program, vectorizing parts of the existing algorithm in the program, and incorporating a vectorizable algorithm (ZEBRA I or ZEBRA II) in the program. Comparison runs of the programs were made on CDC CYBER 175. CYBER 203, and two pipe CDC CYBER 205 computer systems

NASA Technical Reports Server

Loop Optimizations in C and C++ Compilers: An Overview

Author: Kovács Réka
Porkoláb Zoltán
Publication venue: 'Annales Mathematicae et Informaticae - AMI'
Publication date: 01/01/2020
Field of study

The evolution of computer hardware in the past decades has truly been remarkable. From scalar instruction execution through superscalar and vector to parallel, processors are able to reach astonishing speeds – if programmed accordingly. Now, writing programs that take all the hardware details into consideration for the sake of efficiency is extremely difficult and error-prone. Therefore we increasingly rely on compilers to do the heavy-lifting for us. A significant part of optimizations done by compilers are loop optimiza- tions. Loops are inherently expensive parts of a program in terms of run time, and it is important that they exploit superscalar and vector instructions. In this paper, we give an overview of the scientific literature on loop optimization technology, and summarize the status of current implementations in the most widely used C and C++ compilers in the industry

Repository of the Academy's Library