
1,687 research outputs found

A bibliography on parallel and vector numerical algorithms

Authors: Ortega J. M.; Voigt R. G.

This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming languages, and other topics of interest to scientific computing. Certain conference proceedings and anthologies that have been published in book form are also listed.

NASA Technical Reports Server

Solving the Klein-Gordon equation using Fourier spectral methods: A benchmark test for computer performance

Authors: Aseeri S.; Batrašev O.; Icardi M.; Leu B.; Li N.; Liu A.; Muite B. K.; Müller E.; Palen B.; Quell M.; Servat H.; Sheth P.; Speck R.; Van Moer M.; Vienne J.
Publication date: 01/01/2015

The cubic Klein-Gordon equation is a simple but non-trivial partial differential equation whose numerical solution has the main building blocks required for the solution of many other partial differential equations. In this study, the library 2DECOMP&FFT is used in a Fourier spectral scheme to solve the Klein-Gordon equation and strong scaling of the code is examined on thirteen different machines for a problem size of 512^3. The results are useful in assessing likely performance of other parallel fast Fourier transform based programs for solving partial differential equations. The problem is chosen to be large enough to solve on a workstation, yet also of interest to solve quickly on a supercomputer, in particular for parametric studies. Unlike other high performance computing benchmarks, for this problem size, the time to solution will not be improved by simply building a bigger supercomputer. Comment: 10 page

arXiv.org e-Print Archive

OPUS

Juelich Shared Electronic Resources

Ordered fast Fourier transforms on a massively parallel hypercube multiprocessor

Authors: Swarztrauber Paul N.; Tong Charles

Design alternatives for ordered fast Fourier transform (FFT) algorithms are examined on massively parallel hypercube multiprocessors such as the Connection Machine. Particular emphasis is placed on reducing communication, which is known to dominate the overall computing time. To this end, the ordering and computational phases of the FFT are combined, and sequence-to-processor maps that reduce communication are used. The class of ordered transforms is expanded to include any FFT in which the order of the transform is the same as that of the input sequence. Two such orderings are examined, namely standard-order and A-order, which can be implemented with equal ease on the Connection Machine, where orderings are determined by geometries and priorities. If the sequence has N = 2^r elements and the hypercube has P = 2^d processors, then a standard-order FFT can be implemented with d + r/2 + 1 parallel transmissions. An A-order sequence can be transformed with 2d - r/2 parallel transmissions, which is r - d + 1 fewer than the standard order. A parallel method for computing the trigonometric coefficients is presented that does not use trigonometric functions or interprocessor communication. A performance of 0.9 GFLOPS was obtained for an A-order transform on the Connection Machine.
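
The transmission counts quoted in the abstract above can be checked with a few lines of arithmetic. The following is a minimal sketch, not code from the paper: the function names are hypothetical, and it assumes r is even and r >= d so that the counts come out as whole numbers, which the abstract does not state.

```python
# Illustrative check of the transmission counts quoted in the abstract.
# Assumptions (not from the source): function names are made up for this
# sketch, r is even, and r >= d.

def standard_order_transmissions(r, d):
    """Parallel transmissions for a standard-order FFT: d + r/2 + 1."""
    return d + r / 2 + 1


def a_order_transmissions(r, d):
    """Parallel transmissions for an A-order FFT: 2d - r/2."""
    return 2 * d - r / 2


def transmissions_saved(r, d):
    """How many fewer transmissions the A-order transform needs."""
    return standard_order_transmissions(r, d) - a_order_transmissions(r, d)


if __name__ == "__main__":
    # Example sizes chosen only for illustration: N = 2^20 elements on
    # P = 2^16 processors.
    r, d = 20, 16
    print(standard_order_transmissions(r, d))  # d + r/2 + 1
    print(a_order_transmissions(r, d))         # 2d - r/2
    # The difference should equal r - d + 1, as the abstract claims.
    print(transmissions_saved(r, d) == r - d + 1)
```

Subtracting the two expressions gives (d + r/2 + 1) - (2d - r/2) = r - d + 1, which agrees with the saving stated in the abstract.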

NASA Technical Reports Server

On the impact of communication complexity in the design of parallel numerical algorithms

Authors: Gannon D.; Van Rosendale J.

This paper describes two models of the cost of data movement in parallel numerical algorithms. One model is a generalization of an approach due to Hockney, and is suitable for shared memory multiprocessors where each processor has vector capabilities. The other model is applicable to highly parallel nonshared memory MIMD systems. In the second model, algorithm performance is characterized in terms of the communication network design. Techniques used in VLSI complexity theory are also brought in, and algorithm-independent upper bounds on system performance are derived for several problems that are important to scientific computation.

NASA Technical Reports Server

Solution of partial differential equations on vector and parallel computers

Authors: Ortega J. M.; Voigt R. G.

The present status of numerical methods for partial differential equations on vector and parallel computers is reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations, as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.

NASA Technical Reports Server

An investigation on the applicability of multi-microprocessing in the two dimensional digital filtering problem

Author: Whitcher Timothy J.
Publication venue: RIT Scholar Works
Publication date: 01/05/1980

Digital image processing has been receiving an increasing amount of development in recent years, largely because high speed digital computers are becoming readily available. In addition, the advent of the microprocessor has revolutionized the capabilities of compact electronic systems. This thesis examines the applicability of microprocessors to digital image processing. Using a Z8000 microprocessor as a baseline, the computation time for a two-dimensional fast Fourier transform is estimated for various microprocessor architectures. These results are later compared to the manufacturer's specified computation times on a Floating Point Systems AP-120B Array Processor.

RIT Scholar Works

The Cascading Haar Wavelet algorithm for computing the Walsh-Hadamard Transform

Author: Thompson Andrew
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 21/09/2016

We propose a novel algorithm for computing the Walsh-Hadamard Transform (WHT) which consists entirely of Haar wavelet transforms. We prove that the algorithm, which we call the Cascading Haar Wavelet (CHW) algorithm, shares precisely the same serial complexity as the popular divide-and-conquer algorithm for the WHT. We also propose a natural way of parallelizing the algorithm which has a number of attractive features.
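
For orientation, the divide-and-conquer algorithm that the abstract uses as its point of comparison can be sketched as below. This is the standard in-place butterfly form of the fast WHT in natural (Hadamard) ordering, not the CHW algorithm itself, whose details the abstract does not give. The function name `fwht` is chosen for this sketch, and the transform is left unnormalized.

```python
# Sketch of the standard divide-and-conquer fast Walsh-Hadamard transform.
# This is NOT the Cascading Haar Wavelet algorithm from the paper; it is the
# well-known baseline the abstract compares against. The name `fwht` and the
# unnormalized, natural-ordered output are choices made for this sketch.

def fwht(values):
    """Return the unnormalized WHT of a sequence whose length is a power of two."""
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    h = 1
    while h < n:
        # Combine pairs of length-h blocks with sum/difference butterflies.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                x, y = a[i], a[i + h]
                a[i], a[i + h] = x + y, x - y
        h *= 2
    return a


if __name__ == "__main__":
    data = [1, 0, 1, 0, 0, 1, 1, 0]
    transformed = fwht(data)
    print(transformed)
    # Applying the unnormalized transform twice scales the input by n.
    print(fwht(transformed) == [len(data) * v for v in data])
```

The loop performs log2(n) passes of n/2 butterflies each, so the serial cost is n log2(n) additions and subtractions; per the abstract, the CHW algorithm has the same serial complexity.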

arXiv.org e-Print Archive

Oxford University Research Archive

Gigaflop performance on a CRAY-2: Multitasking a computational fluid dynamics application

Authors: Lambiotte Jules J.; Overman Andrea L.; Streett Craig L.; Tennille Geoffrey M.

The methodology is described for converting a large, long-running applications code that executed on a single processor of a CRAY-2 supercomputer to a version that executed efficiently on multiple processors. Although the conversion of every application is different, a discussion of the types of modification used to achieve gigaflop performance is included to assist others in the parallelization of applications for CRAY computers, especially those that were developed for other computers. An existing application, from the discipline of computational fluid dynamics, that had utilized over 2000 hrs of CPU time on a CRAY-2 during the previous year was chosen as a test case to study the effectiveness of multitasking on a CRAY-2. The nature of dominant calculations within the application indicated that a sustained computational rate of 1 billion floating-point operations per second, or 1 gigaflop, might be achieved. The code was first analyzed and modified for optimal performance on a single processor in a batch environment. After optimal performance on a single CPU was achieved, the code was modified to use multiple processors in a dedicated environment. The results of these two efforts were merged into a single code that had a sustained computational rate of over 1 gigaflop on a CRAY-2. Timings and analysis of performance are given for both single- and multiple-processor runs.

NASA Technical Reports Server