Determining an Out-of-Core FFT Decomposition Strategy for Parallel Disks by Dynamic Programming
We present an out-of-core FFT algorithm based on the in-core FFT method developed by Swarztrauber. Our algorithm uses a recursive divide-and-conquer strategy, and each stage in the recursion presents several possibilities for how to split the problem into subproblems. We give a recurrence for the algorithm's I/O complexity on the Parallel Disk Model and show how to use dynamic programming to determine optimal splits at each recursive stage. The algorithm to determine the optimal splits takes only Theta(lg^2 N) time for an N-point FFT, and it is practical. The out-of-core FFT algorithm itself takes considerably longer
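The planning step described above can be illustrated with a small dynamic program. This is a hedged sketch, not the paper's algorithm: the cost model below (a fixed per-pass charge plus the costs of the two recursive subproblems) is a placeholder for the paper's actual Parallel Disk Model recurrence, and the names `PASS_COST` and `best_split` are illustrative.

```python
# Sketch: dynamic programming over two-power splits of an N-point FFT.
# A split N = N1 * N2 incurs the subproblem costs plus one (hypothetical)
# fixed I/O pass; memoization makes the search run in O(lg^2 N) time.

from functools import lru_cache

PASS_COST = 1.0  # hypothetical I/O cost of one pass over the data


@lru_cache(maxsize=None)
def best_split(log_n):
    """Return (cost, log_n1) minimizing T(N) = T(N1) + T(N2) + PASS_COST
    over splits N = N1 * N2, where N = 2**log_n and N1 = 2**log_n1."""
    if log_n <= 1:
        return (0.0, None)          # base case: assumed to fit in memory
    best = (float("inf"), None)
    for log_n1 in range(1, log_n):  # try every two-power factorization
        c1, _ = best_split(log_n1)
        c2, _ = best_split(log_n - log_n1)
        cost = c1 + c2 + PASS_COST
        if cost < best[0]:
            best = (cost, log_n1)
    return best


cost, split = best_split(20)  # plan a 2**20-point FFT
```

Examining all O(lg N) candidate splits at each of the O(lg N) memoized subproblem sizes is what gives a planning time polylogarithmic in N, matching the Theta(lg^2 N) bound quoted in the abstract.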
ERS-1 SAR data processing
To take full advantage of the synthetic aperture radar (SAR) to be flown on board the European Space Agency's Remote Sensing Satellite (ERS-1) (1989) and the Canadian Radarsat (1990), the implementation of a receiving station in Alaska is being studied to gather and process SAR data pertaining in particular to regions within the station's range of reception. The current SAR data processing requirement is estimated to be on the order of 5 minutes of data per day. The Interim Digital SAR Processor (IDP), which was under continual development through Seasat (1978) and SIR-B (1984), can process slightly more than 2 minutes of ERS-1 data per day. On the other hand, the Advanced Digital SAR Processor (ADSP), currently under development for the Shuttle Imaging Radar C (SIR-C, 1988) and the Venus Radar Mapper (VRM, 1988), is capable of processing ERS-1 SAR data at a real-time rate. To better suit the anticipated ERS-1 SAR data processing requirement, both a modified IDP and an ADSP derivative are being examined. For the modified IDP, a pipelined architecture is proposed for the mini-computer plus array processor arrangement to improve throughput. For the ADSP derivative, a simplified version is proposed to enhance ease of implementation and maintainability while maintaining real-time throughput rates. These processing systems are discussed and evaluated
Multiprocessor Out-of-Core FFTs with Distributed Memory and Parallel Disks
This paper extends an earlier out-of-core Fast Fourier Transform (FFT) method for a uniprocessor with the Parallel Disk Model (PDM) to use multiple processors. Four out-of-core multiprocessor methods are examined. Operationally, these methods differ in the size of mini-butterfly computed in memory and how the data are organized on the disks and in the distributed memory of the multiprocessor. The methods also perform differing amounts of I/O and communication. Two of them have the remarkable property that even though they are computing the FFT on a multiprocessor, all interprocessor communication occurs outside the mini-butterfly computations. Performance results on a small workstation cluster indicate that except for unusual combinations of problem size and memory size, the methods that do not perform interprocessor communication during the mini-butterfly computations require approximately 86% of the time of those that do. Moreover, the faster methods are much easier to implement
Optimizing the Dimensional Method for Performing Multidimensional, Multiprocessor, Out-of-Core FFTs
We present an improved version of the Dimensional Method for computing multidimensional Fast Fourier Transforms (FFTs) on a multiprocessor system when the data consist of too many records to fit into memory. Data are spread across parallel disks and processed in sections. We use the Parallel Disk Model for analysis. The simple Dimensional Method performs the 1-dimensional FFTs for each dimension in turn. Between each dimension, an out-of-core permutation is used to rearrange the data to contiguous locations. The improved Dimensional Method processes multiple dimensions at a time. We show that determining an optimal sequence and groupings of dimensions is NP-complete. We then analyze the effects of two modifications to the Dimensional Method independently: processing multiple dimensions at one time, and processing single dimensions in a different order. Finally, we show a lower bound on the I/O complexity of the Dimensional Method and present an algorithm that is approximately asymptotically optimal
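The core idea of the Dimensional Method — 1-D FFTs along one dimension, a permutation, then 1-D FFTs along the next — can be shown in miniature on an in-memory 2-D array. This is a hedged sketch of the row-column structure only, not the paper's out-of-core code: the in-memory transpose below merely stands in for the out-of-core permutation, and all names and sizes are illustrative.

```python
# Sketch: 2-D FFT via the dimensional idea — transform rows (dimension 1),
# permute (here a plain transpose), transform the other axis (dimension 2).

import cmath


def fft(a):
    """Radix-2 Cooley-Tukey FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return a[:]
    even = fft(a[0::2])
    odd = fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft2_dimensional(grid):
    """2-D FFT computed one dimension at a time."""
    rows = [fft(row) for row in grid]             # dimension 1
    permuted = [list(col) for col in zip(*rows)]  # stand-in for disk permutation
    cols = [fft(row) for row in permuted]         # dimension 2
    return [list(col) for col in zip(*cols)]      # restore row-major layout
```

In the out-of-core setting the interesting cost is the permutation between dimensions, which is exactly what the abstract's improved method tries to amortize by grouping dimensions.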
Solving the Klein-Gordon equation using Fourier spectral methods: A benchmark test for computer performance
The cubic Klein-Gordon equation is a simple but non-trivial partial
differential equation whose numerical solution has the main building blocks
required for the solution of many other partial differential equations. In this
study, the library 2DECOMP&FFT is used in a Fourier spectral scheme to solve
the Klein-Gordon equation and strong scaling of the code is examined on
thirteen different machines for a problem size of 512^3. The results are useful
in assessing likely performance of other parallel fast Fourier transform based
programs for solving partial differential equations. The problem is chosen to
be large enough to solve on a workstation, yet also of interest to solve
quickly on a supercomputer, in particular for parametric studies. Unlike other
high performance computing benchmarks, for this problem size, the time to
solution will not be improved by simply building a bigger supercomputer.
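The building block underlying any Fourier spectral scheme like the one above is differentiation in frequency space. The following is a hedged 1-D illustration, not the paper's 2DECOMP&FFT code: it uses a direct O(n^2) DFT for clarity, and the function names are illustrative.

```python
# Sketch: spectral second derivative of a 2*pi-periodic signal —
# transform, multiply mode k by -(k^2), transform back.

import cmath
import math


def dft(a, sign):
    """Direct DFT (O(n^2)); sign=-1 forward, sign=+1 inverse (unnormalized)."""
    n = len(a)
    return [sum(a[j] * cmath.exp(sign * 2j * cmath.pi * k * j / n)
                for j in range(n)) for k in range(n)]


def second_derivative_spectral(u):
    """d^2 u / dx^2 for samples of a 2*pi-periodic function."""
    n = len(u)
    u_hat = dft(u, -1)
    # signed wavenumbers: 0, 1, ..., n/2, -(n/2 - 1), ..., -1
    ks = [k if k <= n // 2 else k - n for k in range(n)]
    d2_hat = [-(k * k) * c for k, c in zip(ks, u_hat)]
    return [v.real / n for v in dft(d2_hat, +1)]


# sin(x) on [0, 2*pi): its second derivative is -sin(x)
n = 16
u = [math.sin(2 * math.pi * j / n) for j in range(n)]
d2 = second_derivative_spectral(u)
```

A production spectral solver replaces the direct DFT with an FFT (in the paper's case, a 3-D parallel FFT via 2DECOMP&FFT) and couples this spatial operator to a time-stepping scheme for the Klein-Gordon equation.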
Out-of-Core Hydrodynamic Simulations for Cosmological Applications
We present an out-of-core hydrodynamic code for high resolution cosmological
simulations that require terabytes of memory. Out-of-core computation refers to
the technique of using disk space as virtual memory and transferring data in
and out of main memory at high I/O bandwidth. The code is based on a two-level
mesh scheme where short-range physics is solved on a high-resolution, localized
mesh while long-range physics is captured on a lower resolution, global mesh.
The two-level mesh gravity solver allows FFTs to operate on data stored
entirely in memory, which is much faster than the alternative of computing the
transforms out-of-core through non-sequential disk accesses. We also describe
an out-of-core initial conditions generator that is used to prepare large data
sets for cosmological simulations. The out-of-core code is accurate,
cost-effective, and memory-efficient and the current version is implemented to
run in parallel on shared-memory machines. I/O overhead is significantly
reduced down to less than 10% by performing disk operations concurrently with
numerical calculations. The current computational setup, which includes a 32
processor Alpha server and a 3 TB striped SCSI disk array, allows us to run
cosmological simulations with up to 4000^3 grid cells and 2000^3 dark matter
particles. (19 pages, 10 figures; accepted by New Astronomy)
First-principles molecular dynamics with ultrasoft pseudopotentials: parallel implementation and application to extended bio-inorganic systems
We present a plane-wave ultrasoft pseudopotential implementation of
first-principles molecular dynamics, which is well suited to model large
molecular systems containing transition metal centers. We describe an efficient
strategy for parallelization that includes special features to deal with the
augmentation charge in the context of Vanderbilt's ultrasoft pseudopotentials. We
also discuss a simple approach to model molecular systems with a net charge
and/or large dipole/quadrupole moments. We present test applications to
manganese and iron porphyrins representative of a large class of biologically
relevant metallorganic systems. Our results show that accurate
Density-Functional Theory calculations on systems with several hundred atoms
are feasible with access to moderate computational resources.