2,539 research outputs found
Parallel Aerodynamic Simulation on Open Workstation Clusters. Department of Aerospace Engineering Report no. 9830
The parallel execution of an aerodynamic simulation code on a non-dedicated, heterogeneous
cluster of workstations is examined. This type of facility is commonly
available to CFD developers and users in academia, industry and government laboratories
and is attractive in terms of cost for CFD simulations. However, practical
considerations appear at present to be discouraging widespread adoption of this technology.
The main obstacles to achieving an efficient, robust parallel CFD capability
in a demanding multi-user environment are investigated. A static load-balancing
method, which takes account of varying processor speeds, is described. A dynamic
re-allocation method to account for varying processor loads has been developed.
Use of proprietary management software has facilitated the implementation of the
method
Parallel Computers and Complex Systems
We present an overview of the state of the art and future trends in high performance parallel and distributed computing, and discuss techniques for using such computers in the simulation of complex problems in computational science. The use of high performance parallel computers can help improve our understanding of complex systems, and the converse is also true --- we can apply techniques used for the study of complex systems to improve our understanding of parallel computing. We consider parallel computing as the mapping of one complex system --- typically a model of the world --- into another complex system --- the parallel computer. We study static, dynamic, spatial and temporal properties of both the complex systems and the map between them. The result is a better understanding of which computer architectures are good for which problems, and of software structure, automatic partitioning of data, and the performance of parallel machines
Neural Networks and Dynamic Complex Systems
We describe the use of neural networks for optimization and inference associated with a variety of complex systems. We show how a string formalism can be used for parallel computer decomposition, message routing and sequential optimizing compilers. We extend these ideas to a general treatment of spatial assessment and distributed artificial intelligence
Parallel Genetic Algorithms with Application to Load Balancing for Parallel Computing
A new coarse grain parallel genetic algorithm (PGA) and a new implementation of a data-parallel GA are presented in this paper. They are based on models of natural evolution in which the population is formed of discontinuous or continuous subpopulations. In addition to simulating natural evolution, the intrinsic parallelism in the two PGA\u27s minimizes the possibility of premature convergence that the implementation of classic GA\u27s often encounters. Intrinsic parallelism also allows the evolution of fit genotypes in a smaller number of generations in the PGA\u27s than in sequential GA\u27s, leading to superlinear speed-ups. The PGA\u27s have been implemented on a hypercube and a Connection Machine, and their operation is demonstrated by applying them to the load balancing problem in parallel computing. The PGA\u27s have found near-optimal solutions which are comparable to the solutions of a simulated annealing algorithm and are better than those produced by a sequential GA and by other load balancing methods. On one hand, The PGA\u27s accentuate the advantage of parallel computers for simulating natural evolution. On the other hand, they represent new techniques for load balancing parallel computations
Recommended from our members
A strategy for mapping unstructured mesh computational mechanics programs onto distributed memory parallel architectures
The motivation of this thesis was to develop strategies that would enable unstructured mesh based computational mechanics codes to exploit the computational advantages offered by distributed memory parallel processors. Strategies that successfully map structured mesh codes onto parallel machines have been developed over the previous decade and used to build a toolkit for automation of the parallelisation process. Extension of the capabilities of this toolkit to include unstructured mesh codes requires new strategies to be developed.
This thesis examines the method of parallelisation by geometric domain decomposition using the single program multi data programming paradigm with explicit message passing. This technique involves splitting (decomposing) the problem definition into P parts that may be distributed over P processors in a parallel machine. Each processor runs the same program and operates only on its part of the problem. Messages passed between the processors allow data exchange to maintain consistency with the original algorithm.
The strategies developed to parallelise unstructured mesh codes should meet a number of requirements:
The algorithms are faithfully reproduced in parallel.
The code is largely unaltered in the parallel version.
The parallel efficiency is maximised.
The techniques should scale to highly parallel systems.
The parallelisation process should become automated.
Techniques and strategies that meet these requirements are developed and tested in this dissertation using a state of the art integrated computational fluid dynamics and solid mechanics code. The results presented demonstrate the importance of the problem partition in the definition of inter-processor communication and hence parallel performance.
The classical measure of partition quality based on the number of cut edges in the mesh partition can be inadequate for real parallel machines. Consideration of the topology of the parallel machine in the mesh partition is demonstrated to be a more significant factor than the number of cut edges in the achieved parallel efficiency. It is shown to be advantageous to allow an increase in the volume of communication in order to achieve an efficient mapping dominated by localised communications. The limitation to parallel performance resulting from communication startup latency is clearly revealed together with strategies to minimise the effect.
The generic application of the techniques to other unstructured mesh codes is discussed in the context of automation of the parallelisation process. Automation of parallelisation based on the developed strategies is presented as possible through the use of run time inspector loops to accurately determine the dependencies that define the necessary inter-processor communication
Recommended from our members
Task Partitioning and Mapping Algorithms for Multi-core Packet Processing Systems
In this work, we have explored different task mapping algorithms for multi-core, packet processing systems. we also implemented these algorithms and compared the results of the algorithms. We first reviewed the previously designed algorithms which include UDFS algorithm and duplication process. We then applied the KL algorithm to our problem and were able to reduce the inter-processor communication by 20% while maintaining the similar utilization. We then modified the original KL algorithm by considering utilization during the mapping process. In this extended KL algorithm, we incorporated a tradeoff factor alpha to tradeoff between inter-processor communication and processor utilization. The best alpha is different for different system configurations in terms of communication bandwidth and computing power. Simulated annealing(SA) algorithm was then implemented. The parameters for SA algorithm were decided by following literature or by doing experiments. Results from SA algorithm shows that it can produce decent results that are comparable to KL algorithm. In order to further improve the utilization, merging operation was applied to the task graph before mapping algorithms were applied. The mapping results showed that merging is a good way to improve the utilization and at the same time keep the communication cost lower. Finally, we applied the mapping algorithms to different packet processing system architectures. The results show how inter-processor communication cost and processor utilization change as system architecture changes
Mapping large-scale FEM-graphs to highly parallel computers with grid-like topology by self-organization
We consider the problem of mapping large scale FEM graphs for
the solution of partial differential equations to highly parallel
distributed memory computers. Typically, these programs show a
low-dimensional grid-like communication structure.
We argue that conventional domain decomposition methods that are
usually employed today are not well suited for future highly
parallel computers as they do not take into account the
interconnection structure of the parallel computer resulting in a
large communication overhead.
Therefore we propose a new mapping heuristic which performs both,
partitioning of the solution domain and processor allocation in
one integrated step. Our procedure is based on the ability of
Kohonen neural networks to exploit topological similarities of an
input space and a grid-like structured network to compute a
neighborhood preserving mapping between the set of discretization
points and the parallel computer.
We report about results of mapping up to 44,000-node FEM graphs to
a 4096-processor parallel computer and demonstrate the capability
of the proposed scheme for dynamic remapping considering adaptive
refinement of the discretization graph
- …