58 research outputs found

    An Evolutionary Approach to Load Balancing Parallel Computations

    Get PDF
    We present a new approach to balancing the workload in a multicomputer when the problem is decomposed into subproblems mapped to the processors. It is based on a hybrid genetic algorithm. A number of design choices for genetic algorithms are combined in order to ameliorate the problem of premature convergence that is often encountered in the implementation of classical genetic algorithms. The algorithm is hybridized by including a hill climbing procedure which significantly improves the efficiency of the evolution. Moreover, it makes use of problem specific information to evade some computational costs and to reinforce favorable aspects of the genetic search at some appropriate points. The experimental results show that the hybrid genetic algorithm can find solutions within 3% of the optimum in a reasonable time. They also suggest that this approach is not biased towards particular problem structures

    A Comparison of Load Balancing Algorithms for Parallel Computations

    Get PDF
    Three physical optimization methods are considered in this paper for load balancing parallel computations. These are simulated annealing, genetic algorithms, and neural networks. Some design choices and the inclusion of additional steps lead to new versions of the algorithms with different solution qualities and execution times. The performances of these versions are critically evaluated and compared for test cases with different topologies and sizes. Orthogonal recursive coordinate bisection is also included in the comparison as a typical simple deterministic method. Simulation results show that the algorithms have diverse properties. Hence, different algorithms can be applied to different problems and requirements. For example, the annealing and genetic algorithms produce better solutions and do not show a bias towards particular problem structures. But, they are slower than the neural network and recursive bisection. Preprocessing graph contraction is one of the additional steps suggested for the physical methods. It produces a significant reduction in execution time, which is necessary for their applicability to large-scale problems

    Parallel Computers and Complex Systems

    Get PDF
    We present an overview of the state of the art and future trends in high performance parallel and distributed computing, and discuss techniques for using such computers in the simulation of complex problems in computational science. The use of high performance parallel computers can help improve our understanding of complex systems, and the converse is also true --- we can apply techniques used for the study of complex systems to improve our understanding of parallel computing. We consider parallel computing as the mapping of one complex system --- typically a model of the world --- into another complex system --- the parallel computer. We study static, dynamic, spatial and temporal properties of both the complex systems and the map between them. The result is a better understanding of which computer architectures are good for which problems, and of software structure, automatic partitioning of data, and the performance of parallel machines

    A new load balancing heuristic using self-organizing maps

    Get PDF
    Ankara : The Department of Computer Engineering and Information Science and the Institute of Engineering and Science of Bilkent Univ., 1999.Thesis (Master's) -- Bilkent University, 1999.Includes bibliographical references leaves 68-71.Atun, MuratM.S

    Mapping large-scale FEM-graphs to highly parallel computers with grid-like topology by self-organization

    Get PDF
    We consider the problem of mapping large scale FEM graphs for the solution of partial differential equations to highly parallel distributed memory computers. Typically, these programs show a low-dimensional grid-like communication structure. We argue that conventional domain decomposition methods that are usually employed today are not well suited for future highly parallel computers as they do not take into account the interconnection structure of the parallel computer resulting in a large communication overhead. Therefore we propose a new mapping heuristic which performs both, partitioning of the solution domain and processor allocation in one integrated step. Our procedure is based on the ability of Kohonen neural networks to exploit topological similarities of an input space and a grid-like structured network to compute a neighborhood preserving mapping between the set of discretization points and the parallel computer. We report about results of mapping up to 44,000-node FEM graphs to a 4096-processor parallel computer and demonstrate the capability of the proposed scheme for dynamic remapping considering adaptive refinement of the discretization graph

    Upper Bound Analysis and Routing in Optical Benes Networks

    Get PDF
    Multistage Interconnection Networks (MIN) are popular in switching and communication applications. It has been used in telecommunication and parallel computing systems for many years. The new challenge facing optical MIN is crosstalk, which is caused by coupling two signals within a switching element. Crosstalk is not too big an issue in the Electrical Domain, but due to the stringent Bit Error Rate (BER) constraint, it is a big major concern in the Optical Domain. In this research dissertation, we will study the blocking probability in the optical network and we will study the deterministic conditions for strictly non-blocking Vertical Stacked Optical Benes Networks (VSOBN) with and without worst-case scenarios. We will establish the upper bound on blocking probability of Vertical Stacked Optical Benes Networks with respect to the number of planes used when the non-blocking requirement is not met. We will then study routing in WDM Benes networks and propose a new routing algorithm so that the number of wavelengths can be reduced. Since routing in WDM optical network is an NP-hard problem, many heuristic algorithms are designed by many researchers to perform this routing. We will also develop a genetic algorithm, simulated annealing algorithm and ant colony technique and apply these AI algorithms to route the connections in WDM Benes network

    Active Processor Scheduling Using Evolution Algorithms

    Get PDF
    The allocation of processes to processors has long been of interest to engineers. The processor allocation problem considered here assigns multiple applications onto a computing system. With this algorithm researchers could more efficiently examine real-time sensor data like that used by United States Air Force digital signal processing efforts or real-time aerosol hazard detection as examined by the Department of Homeland Security. Different choices for the design of a load balancing algorithm are examined in both the problem and algorithm domains. Evolutionary algorithms are used to find near-optimal solutions. These algorithms incorporate multiobjective coevolutionary and parallel principles to create an effective and efficient algorithm for real-world allocation problems. Three evolutionary algorithms (EA) are developed. The primary algorithm generates a solution to the processor allocation problem. This allocation EA is capable of evaluating objectives in both an aggregate single objective and a Pareto multiobjective manner. The other two EAs are designed for fine turning returned allocation EA solutions. One coevolutionary algorithm is used to optimize the parameters of the allocation algorithm. This meta-EA is parallelized using a coarse-grain approach to improve performance. Experiments are conducted that validate the improved effectiveness of the parallelized algorithm. Pareto multiobjective approach is used to optimize both effectiveness and efficiency objectives. The other coevolutionary algorithm generates difficult allocation problems for testing the capabilities of the allocation EA. The effectiveness of both coevolutionary algorithms for optimizing the allocation EA is examined quantitatively using standard statistical methods. Also the allocation EAs objective tradeoffs are analyzed and compared

    Doctor of Philosophy

    Get PDF
    dissertationPortable electronic devices will be limited to available energy of existing battery chemistries for the foreseeable future. However, system-on-chips (SoCs) used in these devices are under a demand to offer more functionality and increased battery life. A difficult problem in SoC design is providing energy-efficient communication between its components while maintaining the required performance. This dissertation introduces a novel energy-efficient network-on-chip (NoC) communication architecture. A NoC is used within complex SoCs due it its superior performance, energy usage, modularity, and scalability over traditional bus and point-to-point methods of connecting SoC components. This is the first academic research that combines asynchronous NoC circuits, a focus on energy-efficient design, and a software framework to customize a NoC for a particular SoC. Its key contribution is demonstrating that a simple, asynchronous NoC concept is a good match for low-power devices, and is a fruitful area for additional investigation. The proposed NoC is energy-efficient in several ways: simple switch and arbitration logic, low port radix, latch-based router buffering, a topology with the minimum number of 3-port routers, and the asynchronous advantages of zero dynamic power consumption while idle and the lack of a clock tree. The tool framework developed for this work uses novel methods to optimize the topology and router oorplan based on simulated annealing and force-directed movement. It studies link pipelining techniques that yield improved throughput in an energy-efficient manner. A simulator is automatically generated for each customized NoC, and its traffic generators use a self-similar message distribution, as opposed to Poisson, to better match application behavior. Compared to a conventional synchronous NoC, this design is superior by achieving comparable message latency with half the energy
    corecore