A study of the communication cost of the FFT on torus multicomputers

Author: Díaz de Cerio Ripalda Luis Manuel
González Colás Antonio María
Valero García Miguel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1995

The computation of a one-dimensional FFT on a c-dimensional torus multicomputer is analyzed. Different approaches are proposed which differ in the way they use the interconnection network. The first approach is based on the multidimensional index mapping technique for the FFT computation. The second approach starts from a hypercube algorithm and then embeds the hypercube onto the torus. The third approach reduces the communication cost of the hypercube algorithm by pipelining the communication operations. A novel methodology to pipeline the communication operations on a torus is proposed. Analytical models are presented to compare the different approaches. This comparison study shows that the best approach depends on the number of dimensions of the torus and the communication start-up and transfer times. The analytical models allow us to select the most efficient approach for the available machine. Peer Reviewed. Postprint (published version).
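As a rough illustration of the index mapping idea named in this abstract, the sketch below splits a length-N transform with N = N1*N2 into N2 small transforms of length N1, a twiddle-factor multiplication, and N1 small transforms of length N2. It is the generic textbook decomposition in sequential form, not the authors' torus algorithm, and it models no communication. The function names `dft` and `fft_index_mapping` are hypothetical.

```python
import cmath


def dft(x):
    """Direct O(n^2) DFT, used for the small transforms and as a reference."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft_index_mapping(x, n1_len, n2_len):
    """Length N1*N2 DFT via the mapping n = N2*n1 + n2, k = k1 + N1*k2."""
    n = n1_len * n2_len
    if len(x) != n:
        raise ValueError("len(x) must equal n1_len * n2_len")

    # Step 1: for each n2, an N1-point DFT over n1; result is inner[n2][k1].
    inner = [
        dft([x[n2_len * n1 + n2] for n1 in range(n1_len)])
        for n2 in range(n2_len)
    ]

    # Step 2: multiply by the twiddle factors W_N^(n2*k1).
    for n2 in range(n2_len):
        for k1 in range(n1_len):
            inner[n2][k1] *= cmath.exp(-2j * cmath.pi * n2 * k1 / n)

    # Step 3: for each k1, an N2-point DFT over n2, stored at k = k1 + N1*k2.
    out = [0j] * n
    for k1 in range(n1_len):
        column = dft([inner[n2][k1] for n2 in range(n2_len)])
        for k2 in range(n2_len):
            out[k1 + n1_len * k2] = column[k2]
    return out
```

In a distributed setting, steps 1 and 3 would operate on different data layouts, so the data movement between them is where the communication cost studied in the paper would arise.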

UPCommons. Portal del coneixement obert de la UPC

Some fast elliptic solvers on parallel architectures and their complexities

Author: Gallopoulos E.
Saad Youcef

The discretization of separable elliptic partial differential equations leads to linear systems with special block triangular matrices. Several methods are known to solve these systems, the most general of which is the Block Cyclic Reduction (BCR) algorithm, which handles equations with nonconstant coefficients. A method was recently proposed to parallelize and vectorize BCR. Here, the mapping of BCR on distributed memory architectures is discussed, and its complexity is compared with that of other approaches, including the Alternating-Direction method. A fast parallel solver is also described, based on an explicit formula for the solution, which has parallel computational complexity lower than that of parallel BCR.

NASA Technical Reports Server

Hypercube technology

Author: Cwik Tom
Ferraro Robert D.
Liewer Paulett C.
Parker Jay W.
Patterson Jean E.

The JPL-designed Mark III hypercube supercomputer has been in application service since June 1988 and has been applied successfully to a broad problem set including electromagnetic scattering, discrete event simulation, plasma transport, matrix algorithms, neural network simulation, image processing, and graphics. Currently, problems that are not homogeneous are being attempted, and, through this involvement with real-world applications, the software is evolving to handle the heterogeneous class of problems efficiently.

NASA Technical Reports Server

Preconditioning of Improved and ``Perfect'' Fermion Actions

Author: A. Frommer
B. Medeke
G. Weuffen
K. Schilling
N. Eicker
Th. Lippert
W. Bietenholz
Publication venue: 'Elsevier BV'
Publication date: 01/01/1998

We construct a locally-lexicographic SSOR preconditioner to accelerate the parallel iterative solution of linear systems of equations for two improved discretizations of lattice fermions: (i) the Sheikholeslami-Wohlert scheme, where a non-constant block-diagonal term is added to the Wilson fermion matrix, and (ii) renormalization group improved actions, which incorporate couplings beyond nearest neighbors of the lattice fermion fields. In case (i) we find the block ll-SSOR scheme to be more effective by a factor of about 2 than odd-even preconditioned solvers in terms of convergence rates, at beta=6.0. For type (ii) actions, we show that our preconditioner accelerates the iterative solution of a linear system of hypercube fermions by a factor of 3 to 4. Comment: 27 pages, LaTeX, 17 figures included.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Juelich Shared Electronic Resources

CERN Document Server

Spectral element methods: Algorithms and architectures

Author: Dewey Daniel
Fischer Paul
Patera Anthony T.
Ronquist Einar M.

Spectral element methods are high-order weighted residual techniques for partial differential equations that combine the geometric flexibility of finite element methods with the rapid convergence of spectral techniques. Spectral element methods are described for the simulation of incompressible fluid flows, with special emphasis on implementation of spectral element techniques on medium-grained parallel processors. Two parallel architectures are considered: the first, a commercially available message-passing hypercube system; the second, a developmental reconfigurable architecture based on Geometry-Defining Processors. High parallel efficiency is obtained in hypercube spectral element computations, indicating that load balancing and communication issues can be successfully addressed by a high-order technique/medium-grained processor algorithm-architecture coupling

NASA Technical Reports Server

Algorithmic considerations of integrated design for CSI on a hypercube architecture

Author: Oezguener F.
Oezguener UE.

An approach is presented to the integrated design problem for actively controlled large, flexible mechanical systems for which Control Structure Interaction (CSI) problems are of concern. The two coupled design problems were identified as the optimal Structural Design problem and the optimal Controller Design problem. These two problems can be addressed within a decision-making loop that would consider each separately, and then sequentially analyze the effects of one on the other. Embedded in such a loop would be the simulation and coordination tasks as part of the decision tools required in a total (software) package. All of the above are compute-intensive tasks. In any such task, possible decompositions and gains due to the inherent parallelism have to be exploited. The problems under consideration, as applied to large flexible mechanical structures, are particularly suited to be mapped onto multicomputer systems in a hypercube topology.

NASA Technical Reports Server

Probabilistic structural mechanics research for parallel processing computers

Author: Chen Heh-Chyun
Martin William R.
Sues Robert H.
Twisdale Lawrence A.

Aerospace structures and spacecraft are a complex assemblage of structural components that are subjected to a variety of complex, cyclic, and transient loading conditions. Significant modeling uncertainties are present in these structures, in addition to the inherent randomness of material properties and loads. To properly account for these uncertainties in evaluating and assessing the reliability of these components and structures, probabilistic structural mechanics (PSM) procedures must be used. Much research has focused on basic theory development and the development of approximate analytic solution methods in random vibrations and structural reliability. Practical application of PSM methods was hampered by their computationally intense nature. Solution of PSM problems requires repeated analyses of structures that are often large, and exhibit nonlinear and/or dynamic response behavior. These methods are all inherently parallel and ideally suited to implementation on parallel processing computers. New hardware architectures and innovative control software and solution methodologies are needed to make solution of large scale PSM problems practical

NASA Technical Reports Server

Hypercube matrix computation task

Author: Calalo R.
Imbriale W.
Liewer P.
Lyons J.
Manshadi F.
Patterson J.

The Hypercube Matrix Computation (Year 1986-1987) task investigated the applicability of a parallel computing architecture to the solution of large scale electromagnetic scattering problems. Two existing electromagnetic scattering codes were selected for conversion to the Mark III Hypercube concurrent computing environment. They were selected so that the underlying numerical algorithms would be different, thereby providing a more thorough evaluation of the appropriateness of the parallel environment for these types of problems. The first code was a frequency domain method of moments solution, NEC-2, developed at Lawrence Livermore National Laboratory. The second code was a time domain finite difference solution of Maxwell's equations to solve for the scattered fields. Once the codes were implemented on the hypercube and verified to obtain correct solutions by comparing the results with those from sequential runs, several measures were used to evaluate the performance of the two codes. First, a comparison was provided of the problem size possible on the hypercube with 128 megabytes of memory for a 32-node configuration with that available in a typical sequential user environment of 4 to 8 megabytes. Then, the performance of the codes was analyzed for the computational speedup attained by the parallel architecture.
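The two measures mentioned in this abstract, available memory and speedup, can be made concrete with a small sketch. It uses only the memory figures quoted above (128 megabytes across 32 nodes, against 4 to 8 megabytes sequentially) together with the conventional definitions of speedup and parallel efficiency. The abstract does not say which definitions the report used, so those are assumptions, and the helper names are hypothetical. No timings from the report are reproduced here.

```python
def memory_per_node(total_mb, nodes):
    """Memory available on each node if the total is split evenly (assumed)."""
    return total_mb / nodes


def memory_ratio(parallel_mb, sequential_mb):
    """How many times more memory the parallel machine offers."""
    return parallel_mb / sequential_mb


def speedup(t_sequential, t_parallel):
    """Conventional speedup: sequential run time over parallel run time."""
    return t_sequential / t_parallel


def efficiency(t_sequential, t_parallel, nodes):
    """Conventional parallel efficiency: speedup divided by node count."""
    return speedup(t_sequential, t_parallel) / nodes


if __name__ == "__main__":
    # Figures taken from the abstract: 128 MB total on a 32-node configuration.
    print(memory_per_node(128, 32))   # MB per node under an even split
    print(memory_ratio(128, 8))       # versus an 8 MB sequential environment
    print(memory_ratio(128, 4))       # versus a 4 MB sequential environment
```

Under these assumptions the hypercube offers 16 to 32 times the memory of the sequential environment described, which bounds how much larger a problem could fit.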

NASA Technical Reports Server

Hypercube matrix computation task

Author: Calalo Ruel H.
Imbriale William A.
Jacobi Nathan
Liewer Paulett C.
Lockhart Thomas G.
Lyons James R.
Lyzenga Gregory A.
Manshadi Farzin
Patterson Jean E.

A major objective of the Hypercube Matrix Computation effort at the Jet Propulsion Laboratory (JPL) is to investigate the applicability of a parallel computing architecture to the solution of large-scale electromagnetic scattering problems. Three scattering analysis codes are being implemented and assessed on a JPL/California Institute of Technology (Caltech) Mark 3 Hypercube. The codes, which utilize different underlying algorithms, give a means of evaluating the general applicability of this parallel architecture. The three analysis codes being implemented are a frequency domain method of moments code, a time domain finite difference code, and a frequency domain finite elements code. These analysis capabilities are being integrated into an electromagnetics interactive analysis workstation which can serve as a design tool for the construction of antennas and other radiating or scattering structures. The first two years of work on the Hypercube Matrix Computation effort are summarized. The summary includes new developments and results as well as work previously reported in the Hypercube Matrix Computation Task: Final Report for 1986 to 1987 (JPL Publication 87-18).

NASA Technical Reports Server

Group implicit concurrent algorithms in nonlinear structural dynamics

Author: Ortiz M.
Sotelino E. D.

During the 1970s and 1980s, considerable effort was devoted to developing efficient and reliable time stepping procedures for transient structural analysis. Mathematically, the equations governing this type of problem are generally stiff, i.e., they exhibit a wide spectrum in the linear range. The algorithms best suited to this type of application are those which accurately integrate the low frequency content of the response without necessitating the resolution of the high frequency modes. This means that the algorithms must be unconditionally stable, which in turn rules out explicit integration. The most exciting possibility in the algorithms development area in recent years has been the advent of parallel computers with multiprocessing capabilities. This work is therefore mainly concerned with the development of parallel algorithms in the area of structural dynamics. A primary objective is to devise unconditionally stable and accurate time stepping procedures which lend themselves to an efficient implementation on concurrent machines. Some features of the new computer architecture are summarized. A brief survey of current efforts in the area is presented. A new class of concurrent procedures, called Group Implicit (GI) algorithms, is introduced and analyzed. The numerical simulation shows that GI algorithms hold considerable promise for application in coarse grain as well as medium grain parallel computers.

NASA Technical Reports Server