Search CORE

224 research outputs found

Ordered fast fourier transforms on a massively parallel hypercube multiprocessor

Author: Swarztrauber Paul N.
Tong Charles
Publication venue
Publication date
Field of study

Design alternatives for ordered Fast Fourier Transformation (FFT) algorithms were examined on massively parallel hypercube multiprocessors such as the Connection Machine. Particular emphasis is placed on reducing communication which is known to dominate the overall computing time. To this end, the order and computational phases of the FFT were combined, and the sequence to processor maps that reduce communication were used. The class of ordered transforms is expanded to include any FFT in which the order of the transform is the same as that of the input sequence. Two such orderings are examined, namely, standard-order and A-order which can be implemented with equal ease on the Connection Machine where orderings are determined by geometries and priorities. If the sequence has N = 2 exp r elements and the hypercube has P = 2 exp d processors, then a standard-order FFT can be implemented with d + r/2 + 1 parallel transmissions. An A-order sequence can be transformed with 2d - r/2 parallel transmissions which is r - d + 1 fewer than the standard order. A parallel method for computing the trigonometric coefficients is presented that does not use trigonometric functions or interprocessor communication. A performance of 0.9 GFLOPS was obtained for an A-order transform on the Connection Machine

NASA Technical Reports Server

Fast Fourier Transform algorithm design and tradeoffs

Author: Adams George B., III
Kamin Ray A., III
Publication venue
Publication date
Field of study

The Fast Fourier Transform (FFT) is a mainstay of certain numerical techniques for solving fluid dynamics problems. The Connection Machine CM-2 is the target for an investigation into the design of multidimensional Single Instruction Stream/Multiple Data (SIMD) parallel FFT algorithms for high performance. Critical algorithm design issues are discussed, necessary machine performance measurements are identified and made, and the performance of the developed FFT programs are measured. Fast Fourier Transform programs are compared to the currently best Cray-2 FFT program

NASA Technical Reports Server

Static Mapping of Functional Programs: An Example in Signal Processing

Author
Publication venue: 'Hindawi Limited'
Publication date: 01/01/1996
Field of study

Crossref

Solving the shallow water equations on the Cray X-MP/48 and the connection machine 2

Author: Sato Richard K.
Swarztrauber Paul N.
Publication venue
Publication date
Field of study

The shallow water equations in Cartesian coordinates and 2-D are solved on the Connection Machine 2 (CM-2) using both the spectral and finite difference methods. A description of these implementations is presented together with a brief discussion of the CM-2 as it relates to these specific computations. The finite difference code was written both in C* and *LISP and the spectral code was written in *LISP. The performance of the codes is compared with a FORTRAN version that was optimized for the Cray X-MP/48

NASA Technical Reports Server

Tackling Latency Using FG

Author: Natarajan Priya
Publication venue: Dartmouth Digital Commons
Publication date: 01/09/2011
Field of study

Applications that operate on datasets which are too big to fit in main memory, known in the literature as external-memory or out-of-core applications, store their data on one or more disks. Several of these applications make multiple passes over the data, where each pass reads data from disk, operates on it, and writes data back to disk. Compared with an in-memory operation, a disk-I/O operation takes orders of magnitude (approx. 100,000 times) longer; that is, disk-I/O is a high-latency operation. Out-of-core algorithms often run on a distributed-memory cluster to take advantage of a cluster\u27s computing power, memory, disk space, and bandwidth. By doing so, however, they introduce another high-latency operation: interprocessor communication. Efficient implementations of these algorithms access data in blocks to amortize the cost of a single data transfer over the disk or the network, and they introduce asynchrony to overlap high-latency operations and computations. FG, short for Asynchronous Buffered Computation Design and Engineering Framework Generator, is a programming framework that helps to mitigate latency in out-of-core programs that run on distributed-memory clusters. An FG program is composed of a pipeline of stages operating on buffers. FG runs the stages asynchronously so that stages performing high-latency operations can overlap their work with other stages. FG supplies the code to create a pipeline, synchronize the stages, and manage data buffers; the user provides a straightforward function, containing only synchronous calls, for each stage. In this thesis, we use FG to tackle latency and exploit the available parallelism in out-of-core and distributed-memory programs. We show how FG helps us design out-of-core programs and think about parallel computing in general using three instances: an out-of-core, distribution-based sorting program; an implementation of external-memory suffix arrays; and a scientific-computing application called the fast Gauss transform. FG\u27s interaction with these real-world programs is symbiotic: FG enables efficient implementations of these programs, and the design of the first two of these programs pointed us toward further extensions for FG. Today\u27s era of multicore machines compels us to harness all opportunities for parallelism that are available in a program, and so in the latter two applications, we combine FG\u27s multithreading capabilities with the routines that OpenMP offers for in-core parallelism. In the fast Gauss transform application, we use this strategy to realize an up to 20-fold performance improvement compared with an alternate fast Gauss transform implementation. In addition, we use our experience with designing programs in FG to provide some suggestions for the next version of FG

Dartmouth Digital Commons (Dartmouth College)

Parallel FFTCAP : a parallel precorrected FFT based capacitance extraction program for signal integrity analysis

Author: Nadkarni Vivek Bhalchandra, 1975-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1998
Field of study

Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998.Includes bibliographical references (p. 57-58).by Vivek Bhalchandra Nadkarni.M.Eng

DSpace@MIT

A multidomain spectral method for solving elliptic equations

Author: Adams
Axelsson
Baden
Barrett
Boyd
Brandt
Canuto
Cook
Cook
Demaret
Deville
Funaro
Gervasio
Gottlieb
Grandclément
Harald P. Pfeiffer
Kidder
Ku
Lawrence E. Kidder
Macaraeg
Mark A. Scheel
Marronetti
Orszag
Orszag
Pfeiffer
Pinelli
Press
Saad
Saul A. Teukolsky
Smith
Swarztrauber
Swarztrauber
Swarztrauber
Publication venue: 'Elsevier BV'
Publication date: 27/02/2002
Field of study

We present a new solver for coupled nonlinear elliptic partial differential equations (PDEs). The solver is based on pseudo-spectral collocation with domain decomposition and can handle one- to three-dimensional problems. It has three distinct features. First, the combined problem of solving the PDE, satisfying the boundary conditions, and matching between different subdomains is cast into one set of equations readily accessible to standard linear and nonlinear solvers. Second, touching as well as overlapping subdomains are supported; both rectangular blocks with Chebyshev basis functions as well as spherical shells with an expansion in spherical harmonics are implemented. Third, the code is very flexible: The domain decomposition as well as the distribution of collocation points in each domain can be chosen at run time, and the solver is easily adaptable to new PDEs. The code has been used to solve the equations of the initial value problem of general relativity and should be useful in many other problems. We compare the new method to finite difference codes and find it superior in both runtime and accuracy, at least for the smooth problems considered here.Comment: 31 pages, 8 figure

arXiv.org e-Print Archive

Crossref

Caltech Authors

CERN Document Server