Search CORE

138 research outputs found

A study of the communication cost of the FFT on torus multicomputers

Author: Díaz de Cerio Ripalda Luis Manuel
González Colás Antonio María
Valero García Miguel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1995
Field of study

The computation of a one-dimensional FFT on a c-dimensional torus multicomputer is analyzed. Different approaches are proposed which differ in the way they use the interconnection network. The first approach is based on the multidimensional index mapping technique for the FFT computation. The second approach starts from a hypercube algorithm and then embeds the hypercube onto the torus. The third approach reduces the communication cost of the hypercube algorithm by pipelining the communication operations. A novel methodology to pipeline the communication operations on a torus is proposed. Analytical models are presented to compare the different approaches. This comparison study shows that the best approach depends on the number of dimensions of the torus and the communication start-up and transfer times. The analytical models allow us to select the most efficient approach for the available machine.Peer ReviewedPostprint (published version

UPCommons. Portal del coneixement obert de la UPC

Directed taskgraph scheduling using simulated annealing

Author: D'Hollander Erik
Devis Yves
Publication venue: 'Informa UK Limited'
Publication date: 01/01/1991
Field of study

Ghent University Academic Bibliography

A parallel progressive radiosity algorithm based on patch data circulation

Author: Aykanat C.
Capin T.
Ozguc B.
Publication venue: 'Elsevier BV'
Publication date: 01/01/1996
Field of study

Cataloged from PDF version of article.Current research on radiosity has concentrated on increasing the accuracy and the speed of the solution. Although algorithmic and meshing techniques decrease the execution time, still excessive computational power is required for complex scenes. Hence, parallelism can be exploited for speeding up the method further. This paper aims at providing a thorough examination of parallelism in the basic progressive refinement radiosity, and investigates its parallelization on distributed-memory parallel architectures. A synchronous scheme, based on static task assignment, is proposed to achieve better coherence for shooting patch selections. An efficient global circulation scheme is proposed for the parallel light distribution computations, which reduces the total volume of concurrent communication by an asymptotical factor. The proposed parallel algorithm is implemented on an Intel's iPSC/2 hypercube multicomputer. Load balance qualities of the proposed static assignment schemes are evaluated experimentally. The effect of coherence in the parallel light distribution computations on the shooting patch selection sequence is also investigated. Theoretical and experimental evaluation is also presented to verify that the proposed parallelization scheme yields equally good performance on multicomputers implementing the simplest (e.g. ring) as well as the richest (e.g. hypercube) interconnection topologies. This paper also proposes and presents a parallel load re-balancing scheme which enhances our basic parallel radiosity algorithm to be usable in the parallelization of radiosity methods adopting adaptive subdivision and meshing techniques. (C) 1996 Elsevier Science Lt

Bilkent University Institutional Repository

Parallel Genetic Algorithms with Application to Load Balancing for Parallel Computing

Author: Fox Geoffrey C.
Mansouri N.
Publication venue: SURFACE at Syracuse University
Publication date: 01/09/1991
Field of study

A new coarse grain parallel genetic algorithm (PGA) and a new implementation of a data-parallel GA are presented in this paper. They are based on models of natural evolution in which the population is formed of discontinuous or continuous subpopulations. In addition to simulating natural evolution, the intrinsic parallelism in the two PGA\u27s minimizes the possibility of premature convergence that the implementation of classic GA\u27s often encounters. Intrinsic parallelism also allows the evolution of fit genotypes in a smaller number of generations in the PGA\u27s than in sequential GA\u27s, leading to superlinear speed-ups. The PGA\u27s have been implemented on a hypercube and a Connection Machine, and their operation is demonstrated by applying them to the load balancing problem in parallel computing. The PGA\u27s have found near-optimal solutions which are comparable to the solutions of a simulated annealing algorithm and are better than those produced by a sequential GA and by other load balancing methods. On one hand, The PGA\u27s accentuate the advantage of parallel computers for simulating natural evolution. On the other hand, they represent new techniques for load balancing parallel computations

Syracuse University Research Facility and Collaborative Environment

Efficient processor management strategies for multicomputer systems

Author: Chang Chung-Yen
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/1997
Field of study

Multicomputers are cost-effective alternatives to the conventional supercomputers. Contemporary processor management schemes tend to underutilize the processors and leave many of the processors in the system idle while jobs are waiting for execution;Instead of designing faster processors or interconnection networks, a substantial performance improvement can be obtained by implementing better processor management strategies. This dissertation studies the performance issues related to the processor management schemes and proposes several ways to enhance the multicomputer systems by means of processor management. The proposed schemes incorporate the concepts of size-reduction, non-contiguous allocation, as well as job migration. Job scheduling using a bypass-queue is also studied. All the proposed schemes are proven effective in improving the system performance via extensive simulations. Each proposed scheme has different implementation cost and constraints. In order to take advantage of these schemes, judicious selection of system parameters is important and is discussed

Digital Repository @ Iowa State University (ISU)

An Evolutionary Approach to Load Balancing Parallel Computations

Author: Fox Geoffrey C.
Mansouri N.
Publication venue: SURFACE at Syracuse University
Publication date: 01/04/1991
Field of study

We present a new approach to balancing the workload in a multicomputer when the problem is decomposed into subproblems mapped to the processors. It is based on a hybrid genetic algorithm. A number of design choices for genetic algorithms are combined in order to ameliorate the problem of premature convergence that is often encountered in the implementation of classical genetic algorithms. The algorithm is hybridized by including a hill climbing procedure which significantly improves the efficiency of the evolution. Moreover, it makes use of problem specific information to evade some computational costs and to reinforce favorable aspects of the genetic search at some appropriate points. The experimental results show that the hybrid genetic algorithm can find solutions within 3% of the optimum in a reasonable time. They also suggest that this approach is not biased towards particular problem structures

Syracuse University Research Facility and Collaborative Environment

Static allocation of computation to processors in multicomputers

Author: Norman Michael G.
Publication venue: The University of Edinburgh
Publication date: 01/01/1993
Field of study

Edinburgh Research Archive

Benchmarking hypercube hardware and software

Author: Grunwald Dirk C.
Reed Daniel A.
Publication venue
Publication date
Field of study

It was long a truism in computer systems design that balanced systems achieve the best performance. Message passing parallel processors are no different. To quantify the balance of a hypercube design, an experimental methodology was developed and the associated suite of benchmarks was applied to several existing hypercubes. The benchmark suite includes tests of both processor speed in the absence of internode communication and message transmission speed as a function of communication patterns

NASA Technical Reports Server