131 research outputs found
An Evolutionary Approach to Load Balancing Parallel Computations
We present a new approach to balancing the workload in a multicomputer when the problem is decomposed into subproblems mapped to the processors. It is based on a hybrid genetic algorithm. A number of design choices for genetic algorithms are combined to ameliorate the premature convergence often encountered in implementations of classical genetic algorithms. The algorithm is hybridized by including a hill-climbing procedure that significantly improves the efficiency of the evolution. Moreover, it makes use of problem-specific information to avoid some computational costs and to reinforce favorable aspects of the genetic search at appropriate points. The experimental results show that the hybrid genetic algorithm can find solutions within 3% of the optimum in a reasonable time. They also suggest that this approach is not biased towards particular problem structures.
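The hybrid scheme described above (a genetic search whose offspring are polished by hill climbing) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the fitness function (makespan), population size, mutation rate, and the hill-climbing move (shifting the lightest task off the most loaded processor) are all assumptions.

```python
import random

def makespan(assign, weights, P):
    # load of each processor; the fitness to be minimized is the maximum load
    loads = [0.0] * P
    for task, proc in enumerate(assign):
        loads[proc] += weights[task]
    return max(loads)

def hill_climb(assign, weights, P, steps=50):
    # local repair: move the lightest task off the most loaded processor
    # while doing so strictly reduces the imbalance
    assign = assign[:]
    for _ in range(steps):
        loads = [0.0] * P
        for t, p in enumerate(assign):
            loads[p] += weights[t]
        hi, lo = loads.index(max(loads)), loads.index(min(loads))
        on_hi = [t for t, p in enumerate(assign) if p == hi]
        if not on_hi:
            break
        t = min(on_hi, key=lambda t: weights[t])
        if loads[hi] - loads[lo] > weights[t]:
            assign[t] = lo
        else:
            break
    return assign

def hybrid_ga(weights, P, pop=30, gens=100, seed=0):
    rng = random.Random(seed)
    n = len(weights)
    popn = [[rng.randrange(P) for _ in range(n)] for _ in range(pop)]
    for _ in range(gens):
        popn.sort(key=lambda a: makespan(a, weights, P))
        nxt = popn[:2]                        # elitism: keep the two best
        while len(nxt) < pop:
            a, b = rng.sample(popn[:10], 2)   # select among the fittest
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]         # one-point crossover
            if rng.random() < 0.3:            # mutation: reassign one task
                child[rng.randrange(n)] = rng.randrange(P)
            nxt.append(hill_climb(child, weights, P))  # the hybrid step
        popn = nxt
    return min(popn, key=lambda a: makespan(a, weights, P))
```

Running every offspring through `hill_climb` before it enters the population is what distinguishes the hybrid from a plain genetic algorithm: the genetic operators explore, the local search exploits.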
Parallel Computers and Complex Systems
We present an overview of the state of the art and future trends in high performance parallel and distributed computing, and discuss techniques for using such computers in the simulation of complex problems in computational science. The use of high performance parallel computers can help improve our understanding of complex systems, and the converse is also true --- we can apply techniques used for the study of complex systems to improve our understanding of parallel computing. We consider parallel computing as the mapping of one complex system --- typically a model of the world --- into another complex system --- the parallel computer. We study static, dynamic, spatial and temporal properties of both the complex systems and the map between them. The result is a better understanding of which computer architectures are good for which problems, and of software structure, automatic partitioning of data, and the performance of parallel machines.
Semi-Distributed Load Balancing for Massively Parallel Multicomputer Systems
This paper presents a semi-distributed approach for load balancing in large parallel and distributed systems, which differs from the conventional centralized and fully distributed approaches. The proposed strategy uses a two-level hierarchical control by partitioning the interconnection structure of a distributed or multiprocessor system into independent symmetric regions (spheres) centered at some control points. These control points, called schedulers, optimally schedule tasks within their spheres and maintain state information with low overhead. We consider interconnection structures belonging to a number of families of distance transitive graphs for evaluation, and using their algebraic characteristics, show that identification of spheres and their scheduling points is, in general, an NP-complete problem. An efficient solution to this problem is presented by making exclusive use of a combinatorial structure known as the Hadamard matrix. Performance of the proposed strategy has been evaluated and compared with an efficient fully distributed strategy through an extensive simulation study. In addition to yielding high performance in terms of response time and better resource utilization, the proposed strategy incurs less overhead in terms of control messages. It is also shown to be less sensitive to the communication delay of the underlying network.
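The abstract does not spell out how a Hadamard matrix identifies the sphere centers. One standard property it can exploit, shown here as a hedged sketch (not necessarily the paper's exact construction): mapping the rows of a Sylvester-type Hadamard matrix to binary addresses yields hypercube nodes that are pairwise at Hamming distance n/2, i.e. maximally spread-out candidate scheduler locations.

```python
import numpy as np

def sylvester(k):
    # Sylvester construction: H_1 = [1], H_{2n} = [[H, H], [H, -H]]
    H = np.array([[1]])
    for _ in range(k):
        H = np.block([[H, H], [H, -H]])
    return H

def scheduler_centers(k):
    # Map +1 -> 0 and -1 -> 1, so each row of the (2^k x 2^k) Hadamard
    # matrix becomes a node address in the 2^k-dimensional hypercube.
    # Orthogonal +-1 rows disagree in exactly half their coordinates,
    # so distinct centers sit at Hamming distance 2^(k-1) from each other.
    return ((1 - sylvester(k)) // 2).astype(int)

centers = scheduler_centers(3)  # 8 well-separated centers in an 8-cube
```

Because the centers are mutually far apart, the spheres around them overlap as little as possible, which is the property a sphere-based scheduling scheme needs.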
Principles for problem aggregation and assignment in medium scale multiprocessors
One of the most important issues in parallel processing is the mapping of workload to processors. This paper considers a large class of problems having a high degree of potential fine grained parallelism, and execution requirements that are either not predictable, or are too costly to predict. The main issues in mapping such a problem onto medium scale multiprocessors are those of aggregation and assignment. We study a method of parameterized aggregation that makes few assumptions about the workload. The mapping of aggregate units of work onto processors is uniform, and exploits locality of workload intensity to balance the unknown workload. In general, a finer aggregate granularity leads to a better balance at the price of increased communication/synchronization costs; the aggregation parameters can be adjusted to find a reasonable granularity. The effectiveness of this scheme is demonstrated on three model problems: an adaptive one-dimensional fluid dynamics problem with message passing, a sparse triangular linear system solver on both a shared memory and a message-passing machine, and a two-dimensional time-driven battlefield simulation employing message passing. Using the model problems, the tradeoffs are studied between balanced workload and the communication/synchronization costs. Finally, an analytical model is used to explain why the method balances workload and minimizes the variance in system behavior.
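A minimal sketch of parameterized aggregation for a one-dimensional workload; the granularity parameter g and the cyclic assignment rule are illustrative assumptions, not the paper's exact scheme.

```python
def aggregate_and_assign(n_tasks, g, P):
    """Group n_tasks consecutive fine-grained tasks into aggregate blocks
    of granularity g, then deal the blocks out cyclically over P
    processors.  Neighbouring blocks land on different processors, so a
    local hot spot in the (unknown) workload is spread across all of
    them; shrinking g improves balance at the cost of more
    communication/synchronization along block boundaries."""
    blocks = [range(i, min(i + g, n_tasks)) for i in range(0, n_tasks, g)]
    owner = {b: b % P for b in range(len(blocks))}
    return blocks, owner

blocks, owner = aggregate_and_assign(10, 3, 4)
# blocks cover tasks 0..2, 3..5, 6..8, 9; owners cycle 0, 1, 2, 3
```

Tuning g is exactly the granularity trade-off the abstract describes: g = 1 gives the finest balance and the most boundary traffic, g = n_tasks / P gives a pure block decomposition with no spreading at all.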
Parallel mapping and circuit partitioning heuristics based on mean field annealing
Ankara : Department of Computer Engineering and Information Science and the Institute of Engineering and Science of Bilkent University, 1992. Thesis (Master's) -- Bilkent University, 1992. Includes bibliographical references.
The Mean Field Annealing (MFA) algorithm, recently proposed for solving combinatorial optimization problems, combines the characteristics of neural networks and simulated annealing. In this thesis, MFA is formulated for the mapping problem and the circuit partitioning problem. Efficient implementation schemes, which decrease the complexity of the proposed algorithms by asymptotical factors, are also given. Performances of the proposed MFA algorithms are evaluated in comparison with two well-known heuristics: simulated annealing and Kernighan-Lin. Results of the experiments indicate that MFA can be used as an alternative heuristic for the mapping problem and the circuit partitioning problem. The inherent parallelism of MFA is exploited by designing efficient parallel algorithms for the proposed MFA heuristics. Parallel MFA algorithms proposed for solving the circuit partitioning problem are implemented on an iPSC/2 hypercube multicomputer. Experimental results show that the proposed heuristics can be efficiently parallelized, which is crucial for algorithms that solve such computationally hard problems. Bultan, Tevfik (M.S.)
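A minimal sketch of mean field annealing applied to graph bipartitioning, in the spirit of the thesis; the weight-matrix format, balance penalty lam, cooling rate alpha, and sweep count are all illustrative assumptions. Spin averages relax under their mean field while the temperature is lowered.

```python
import math, random

def mfa_bipartition(w, T0=10.0, alpha=0.9, sweeps=200, lam=0.5, seed=0):
    """Mean field annealing sketch for graph bipartitioning.
    w is a symmetric weight matrix.  Each spin average m_i in (-1, 1)
    relaxes under its mean field; the temperature T is lowered
    geometrically, and the final signs give the two parts."""
    rng = random.Random(seed)
    n = len(w)
    m = [rng.uniform(-0.1, 0.1) for _ in range(n)]   # near-random start
    T = T0
    for _ in range(sweeps):
        for i in range(n):
            # attraction to neighbours (uncut edges) minus a balance
            # penalty that discourages putting every node on one side
            h = sum(w[i][j] * m[j] for j in range(n) if j != i)
            h -= lam * sum(m[j] for j in range(n) if j != i)
            m[i] = math.tanh(h / T)                  # mean field update
        T *= alpha                                   # cooling schedule
    return [0 if mi < 0 else 1 for mi in m]
```

On a toy instance of two disjoint triangles ({0,1,2} and {3,4,5}) this sketch recovers the natural split; realistic circuits would need a tuned penalty and cooling schedule, and the thesis's efficient implementation schemes avoid recomputing the fields from scratch each sweep.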
Cluster partitioning approaches to parallel Monte Carlo simulation on multiprocessors
We consider the parallelization of Monte Carlo algorithms for analyzing numerical models of charge transport used in semiconductor device physics. Parallel algorithms for the standard k-space Monte Carlo simulation of a three band model of bulk GaAs on hypercube multicomputers are first presented. This Monte Carlo model includes scattering due to polar-optical, intervalley, and acoustic phonons, as well as electron-electron scattering. The k-space Monte Carlo program, excluding electron-electron scattering, is then extended to simulate a semiconductor device by adding the real space position of each simulated particle and assigning particle charge, using a cloud-in-cell scheme, to solve Poisson's equation with the particle dynamics. Techniques for effectively partitioning this device so as to balance the computational load while minimizing the communication overhead are discussed. Approaches for improving the efficiency of the parallel algorithm, either by dynamically balancing the load or by employing the usual techniques for enhancing rare events in Monte Carlo simulations, are also considered. The parallel algorithms were implemented on a 64-node NCUBE multiprocessor, and test results were generated to validate the parallel k-space program as well as the device simulation program. Timing measurements were also made to study the variation of speedups as both the problem size and the number of processors are varied. The effective exploitation of the computational power of message passing multiprocessors requires the efficient mapping of parallel programs onto processors so as to balance the computational load while minimizing the communication overhead between processors. A lower bound for this communication volume when mapping arbitrary task graphs onto distributed processor systems is derived.
For a K processor system this lower bound can be computed from the K (possibly) largest eigenvalues of the adjacency matrix of the task graph and the eigenvalues of the adjacency matrix of the processor graph. We also derive the eigenvalues of the adjacency matrix of the processor graph for a hypercube and give test results comparing the lower bound for the communication volume with the values given by a heuristic algorithm for a number of task graphs.
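The hypercube spectrum referenced here is the standard closed form: the adjacency matrix of the d-dimensional hypercube has eigenvalues d - 2k with multiplicity C(d, k), for k = 0..d. A quick numerical check (the code is an illustration, not taken from the paper):

```python
import numpy as np
from math import comb

def hypercube_adjacency(d):
    # nodes are d-bit addresses; neighbours differ in exactly one bit
    n = 1 << d
    A = np.zeros((n, n))
    for u in range(n):
        for bit in range(d):
            A[u, u ^ (1 << bit)] = 1.0
    return A

d = 3
eig = np.sort(np.linalg.eigvalsh(hypercube_adjacency(d)))
# closed form: eigenvalue d - 2k with multiplicity C(d, k)
predicted = np.sort([d - 2 * k for k in range(d + 1)
                     for _ in range(comb(d, k))])
```

These processor-graph eigenvalues are one of the two ingredients of the lower bound; the other (the K largest eigenvalues of the task graph) can be computed the same way from the task graph's adjacency matrix.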