
    Parallel Computers and Complex Systems

    We present an overview of the state of the art and future trends in high performance parallel and distributed computing, and discuss techniques for using such computers in the simulation of complex problems in computational science. The use of high performance parallel computers can help improve our understanding of complex systems, and the converse is also true --- we can apply techniques used for the study of complex systems to improve our understanding of parallel computing. We consider parallel computing as the mapping of one complex system --- typically a model of the world --- into another complex system --- the parallel computer. We study static, dynamic, spatial and temporal properties of both the complex systems and the map between them. The result is a better understanding of which computer architectures are suited to which problems, and of software structure, automatic partitioning of data, and the performance of parallel machines.

    A parallel simulated annealing algorithm for standard cell placement on a hypercube computer

    A parallel version of a simulated annealing algorithm is presented which is targeted to run on a hypercube computer. A strategy for mapping the cells in a two-dimensional area of a chip onto processors in an n-dimensional hypercube is proposed such that both short- and long-distance moves can be applied. Two types of moves are allowed: cell exchanges and cell displacements. The computation of the cost function in parallel among all the processors in the hypercube is described, along with a distributed data structure that must be stored in the hypercube to support parallel cost evaluation. A novel tree broadcasting strategy is used extensively in the algorithm for updating cell locations in the parallel environment. Studies of the algorithm's performance on example industrial circuits show that it is faster and gives better final placement results than uniprocessor simulated annealing algorithms. An improved uniprocessor algorithm is also proposed, based on insights gained from parallelizing the simulated annealing algorithm.
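    The move set and acceptance rule described above can be summarised in a short sketch. The snippet below is an illustrative sequential skeleton, not the paper's implementation: `cells`, `sites` and `cost` are placeholder names for the standard cells, the chip locations and the placement cost function, and the hypercube-specific parts (cell-to-processor mapping, parallel cost evaluation, tree broadcast of accepted moves) are only indicated in comments.

```python
import math
import random

def anneal_placement(cells, sites, cost, t0=10.0, alpha=0.95, moves_per_t=200):
    """Sequential skeleton of simulated-annealing cell placement.
    Assumes len(sites) >= len(cells) >= 2.  In the paper's hypercube
    version each processor owns a sub-area of the chip, proposes moves on
    its own cells, evaluates the cost function in parallel, and accepted
    moves are propagated to all nodes with a tree broadcast so that the
    distributed placement data structure stays consistent."""
    place = dict(zip(cells, random.sample(sites, len(cells))))  # cell -> site
    temp = t0
    while temp > 0.01:
        for _ in range(moves_per_t):
            trial = dict(place)
            occupied = set(trial.values())
            empty = [s for s in sites if s not in occupied]
            if not empty or random.random() < 0.5:
                a, b = random.sample(cells, 2)      # move type 1: cell exchange
                trial[a], trial[b] = trial[b], trial[a]
            else:
                a = random.choice(cells)            # move type 2: cell displacement
                trial[a] = random.choice(empty)
            delta = cost(trial) - cost(place)
            if delta <= 0 or random.random() < math.exp(-delta / temp):
                place = trial                       # accept (Metropolis criterion)
        temp *= alpha                               # geometric cooling schedule
    return place
```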

    Distributed Simulation of High-Level Algebraic Petri Nets

    In the field of Petri nets, simulation is an essential tool to validate and evaluate models. Conventional simulation techniques, designed for use on sequential computers, are too slow if the system to simulate is large or complex. The aim of this work is to search for techniques that accelerate simulations by exploiting the parallelism available in current commercial multicomputers, and to use these techniques to study a class of Petri nets called high-level algebraic nets. These nets exploit the rich theory of algebraic specifications for high-level Petri nets: Petri nets gain a great deal of modelling power by representing dynamically changing items as structured tokens, while algebraic specifications have proved an adequate and flexible instrument for handling structured items. In this work we focus on ECATNets (Extended Concurrent Algebraic Term Nets), whose most distinctive feature is a semantics defined in terms of rewriting logic. Nevertheless, ECATNets have two drawbacks: they obscure the notion of time, and they make poor use of the parallelism inherent in the models. Three distributed simulation techniques have been considered: asynchronous conservative, asynchronous optimistic and synchronous. These algorithms have been implemented in a multicomputer environment: a network of workstations. The influence that factors such as the characteristics of the simulated models, the organisation of the simulators and the characteristics of the target multicomputer have on the performance of the simulations has been measured and characterised. It is concluded that synchronous distributed simulation techniques are not suitable for the considered kind of models, although they may provide good performance in other environments. Conservative and optimistic distributed simulation techniques perform well, especially if the model to simulate is complex or large, precisely the worst case for traditional, sequential simulators. This way, studies previously considered unrealisable due to their exceedingly high computational cost can be performed in reasonable time. Additionally, the range of uses of multicomputers can be broadened beyond purely numerical applications.
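    To make the comparison concrete, the following sketch shows the synchronous (lockstep) scheme in miniature, with each logical process simulating one fragment of the net. It is an assumption-laden illustration rather than the thesis' simulator: `SubnetLP`, its pre-scheduled event list and the `send` callback are invented placeholders, and the conservative and optimistic variants would replace the global barrier with null messages or rollback, respectively.

```python
import heapq

class SubnetLP:
    """One logical process simulating a fragment of the Petri net.
    The transitions and their firing times are placeholders; a real
    simulator would derive them from the subnet's current marking."""
    def __init__(self, name, scheduled):
        self.name = name
        self.events = list(scheduled)        # (firing time, transition) pairs
        heapq.heapify(self.events)

    def next_time(self):
        return self.events[0][0] if self.events else float("inf")

    def fire_until(self, horizon, send):
        """Fire every local event up to the agreed horizon; tokens crossing
        the partition boundary would be handed to `send`."""
        while self.events and self.events[0][0] <= horizon:
            time, transition = heapq.heappop(self.events)
            send(self.name, transition, time)

def synchronous_simulation(lps, end_time):
    """Lockstep scheme: all logical processes agree on the minimum next
    event time, fire, and resynchronise.  Conservative and optimistic
    schemes replace this global barrier with null messages or rollback."""
    clock = 0.0
    while clock < end_time:
        clock = min(lp.next_time() for lp in lps)      # global synchronisation
        if clock == float("inf"):
            break
        for lp in lps:
            lp.fire_until(clock, send=lambda src, tr, t:
                          print(f"{t:6.2f}  {src} fired {tr}"))
    return clock

# toy partition: two subnets with pre-scheduled transition firings
lps = [SubnetLP("A", [(1.0, "t1"), (3.0, "t2")]),
       SubnetLP("B", [(2.0, "t3")])]
synchronous_simulation(lps, end_time=10.0)
```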

    Optimal processor assignment for pipeline computations

    The availability of large scale multitasked parallel architectures introduces the following processor assignment problem for pipelined computations. Given a set of tasks, their precedence constraints, and their experimentally determined response times for different processor counts, find an assignment of processors to tasks. Two objectives are of interest: minimal response time given a throughput requirement, and maximal throughput given a response time requirement. These assignment problems differ considerably from the classical mapping problem, in which several tasks share a processor; here it is assumed that a large number of processors are to be assigned to a relatively small number of tasks. Efficient assignment algorithms were developed for different classes of task structures. For a p-processor system and a series-parallel precedence graph with n constituent tasks, an O(np²) algorithm is provided that finds the optimal assignment for the response time optimization problem, and the assignment optimizing throughput under a response time constraint is found in O(np² log p) time. Special cases of linear, independent, and tree graphs are also considered.
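    For the special case of a linear pipeline, the response time optimization can be written as a dynamic program over tasks and processor counts that runs in O(np²) time, matching the bound quoted above. The sketch below is illustrative, not the paper's algorithm: `resp[i][k]` stands for the experimentally determined response time of stage i on k processors, `period` encodes the throughput requirement, and it assumes response times never increase when more processors are added.

```python
def assign_processors(resp, total_procs, period=float("inf")):
    """Minimum end-to-end response time of a linear pipeline and the
    per-stage processor assignment.  resp[i][k] is the (measured) response
    time of stage i on k processors, indexed from k = 1; `period` is the
    largest stage time allowed by the throughput requirement.  Assumes
    response times never increase with more processors, so all
    `total_procs` processors are used.  Runs in O(n * p^2) time."""
    n = len(resp)
    INF = float("inf")
    dp = [0.0] + [INF] * total_procs        # dp[q]: best time using q processors
    choice = [[0] * (total_procs + 1) for _ in range(n)]
    for i in range(n):
        new_dp = [INF] * (total_procs + 1)
        for q in range(1, total_procs + 1):
            for k in range(1, q + 1):
                t = resp[i][k]
                if t <= period and dp[q - k] + t < new_dp[q]:
                    new_dp[q] = dp[q - k] + t
                    choice[i][q] = k        # stage i gets k processors here
        dp = new_dp
    if dp[total_procs] == INF:
        return INF, None                    # throughput requirement infeasible
    assignment, q = [], total_procs
    for i in range(n - 1, -1, -1):          # backtrack the optimal choices
        k = choice[i][q]
        assignment.append(k)
        q -= k
    assignment.reverse()
    return dp[total_procs], assignment

# toy example: 3 stages, 4 processors, resp[i][k] indexed from k = 1
resp = [[None, 8.0, 4.5, 3.2, 2.6],
        [None, 6.0, 3.5, 2.5, 2.1],
        [None, 4.0, 2.4, 1.8, 1.6]]
print(assign_processors(resp, total_procs=4, period=8.0))   # (14.5, [2, 1, 1])
```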

    Mapping large-scale FEM-graphs to highly parallel computers with grid-like topology by self-organization

    We consider the problem of mapping large-scale FEM graphs, arising in the solution of partial differential equations, onto highly parallel distributed-memory computers. Typically, these problems exhibit a low-dimensional, grid-like communication structure. We argue that the conventional domain decomposition methods usually employed today are not well suited for future highly parallel computers, as they do not take into account the interconnection structure of the parallel computer, resulting in a large communication overhead. We therefore propose a new mapping heuristic which performs both partitioning of the solution domain and processor allocation in one integrated step. Our procedure is based on the ability of Kohonen neural networks to exploit topological similarities between an input space and a grid-like structured network, computing a neighborhood-preserving mapping between the set of discretization points and the parallel computer. We report results of mapping FEM graphs of up to 44,000 nodes onto a 4096-processor parallel computer and demonstrate the capability of the proposed scheme for dynamic remapping under adaptive refinement of the discretization graph.
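    The core of such a self-organizing mapping can be sketched in a few lines. The code below illustrates the general Kohonen scheme under simplifying assumptions (2-D mesh coordinates in the unit square, a rectangular processor grid, Gaussian neighborhood), not the authors' implementation: `points` is a hypothetical list of (x, y) mesh-node coordinates and `grid_w` x `grid_h` is the processor grid.

```python
import math
import random

def som_map(points, grid_w, grid_h, iters=5000, lr0=0.5, radius0=None):
    """Kohonen-style mapping of 2-D mesh points onto a grid_w x grid_h
    processor grid.  Each processor carries a prototype position in the
    mesh's coordinate space; training pulls the winning processor and its
    grid neighbors toward randomly drawn mesh points, so nearby mesh nodes
    end up on nearby processors (a neighborhood-preserving mapping)."""
    if radius0 is None:
        radius0 = max(grid_w, grid_h) / 2.0
    # prototype[i][j] = position of processor (i, j) in mesh coordinates
    proto = [[[random.random(), random.random()] for _ in range(grid_h)]
             for _ in range(grid_w)]

    def winner(x, y):
        return min(((i, j) for i in range(grid_w) for j in range(grid_h)),
                   key=lambda ij: (proto[ij[0]][ij[1]][0] - x) ** 2 +
                                  (proto[ij[0]][ij[1]][1] - y) ** 2)

    for step in range(iters):
        frac = step / iters
        lr = lr0 * (1.0 - frac)                       # decaying learning rate
        radius = max(1.0, radius0 * (1.0 - frac))     # shrinking neighborhood
        x, y = random.choice(points)
        wi, wj = winner(x, y)                         # best-matching processor
        for i in range(grid_w):
            for j in range(grid_h):
                d2 = (i - wi) ** 2 + (j - wj) ** 2    # distance in the grid
                h = math.exp(-d2 / (2 * radius * radius))
                proto[i][j][0] += lr * h * (x - proto[i][j][0])
                proto[i][j][1] += lr * h * (y - proto[i][j][1])

    # final assignment: each mesh node goes to its best-matching processor
    return {p: winner(*p) for p in points}
```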

    A new load balancing heuristic using self-organizing maps

    Thesis (Master's) -- Bilkent University, Department of Computer Engineering and Information Science and Institute of Engineering and Science, Ankara, 1999. Includes bibliographical references (leaves 68-71). Atun, Murat. M.S.

    MPS: a multiagent production system


    A grammar-based technique for genetic search and optimization

    The genetic algorithm (GA) is a robust search technique which has been theoretically and empirically proven to provide efficient search for a variety of problems. Due largely to the semantic and expressive limitations of adopting a bitstring representation, however, the traditional GA has not found wide acceptance in the Artificial Intelligence community. In addition, binary chromosomes can unevenly weight genetic search, reduce the effectiveness of recombination operators, make it difficult to solve problems whose solution schemata are of high order and defining length, and hinder new schema discovery in cases where chromosome-wide changes are required.

    The research presented in this dissertation describes a grammar-based approach to genetic algorithms. Under this new paradigm, all members of the population are strings produced by a problem-specific grammar. Since any structure which can be expressed in Backus-Naur Form can thus be manipulated by genetic operators, a grammar-based GA strategy provides a consistent methodology for handling any population structure expressible in terms of a context-free grammar.

    In order to lend theoretical support to the development of the syntactic GA, the concept of a trace schema --- a similarity template for matching the derivation traces of grammar-defined rules --- was introduced. An analysis of the manner in which a grammar-based GA operates yielded a Trace Schema Theorem for rule processing, which states that above-average trace schemata containing relatively few non-terminal productions are sampled with increasing frequency by syntactic genetic search. Schemata thus serve as the building blocks in the construction of the complex rule structures manipulated by syntactic GAs.

    As part of the research presented in this dissertation, the GEnetic Rule Discovery System (GERDS) implementation of the grammar-based GA was developed. A comparison between the performance of GERDS and the traditional GA showed that the class of problems solvable by a syntactic GA is a superset of the class solvable by its binary counterpart, and that the added expressiveness greatly facilitates the representation of GA problems. To strengthen that conclusion, several experiments encompassing diverse domains were performed with favorable results.
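    To illustrate what a grammar-based population looks like, the sketch below derives individuals from a toy BNF-style grammar and performs crossover by exchanging subtrees rooted at the same non-terminal, so offspring always remain derivable from the grammar. The grammar, the derivation-tree representation and the operator details are illustrative assumptions, not a description of GERDS.

```python
import random

# A toy context-free grammar (illustrative only; GERDS uses problem-specific
# grammars describing rule structures).
GRAMMAR = {
    "<expr>": [["<expr>", "<op>", "<expr>"], ["<var>"]],
    "<op>":   [["+"], ["*"]],
    "<var>":  [["x"], ["y"], ["1"]],
}

def derive(symbol, depth=0, max_depth=5):
    """Randomly expand `symbol` into a derivation tree (symbol, children)."""
    if symbol not in GRAMMAR:
        return (symbol, [])                               # terminal symbol
    options = GRAMMAR[symbol]
    # at maximum depth, take the shortest production to force termination
    rule = min(options, key=len) if depth >= max_depth else random.choice(options)
    return (symbol, [derive(s, depth + 1, max_depth) for s in rule])

def flatten(tree):
    """Terminal string produced by a derivation tree."""
    symbol, children = tree
    return symbol if not children else "".join(flatten(c) for c in children)

def nodes(tree, path=()):
    """Enumerate (path, symbol) for every node of a derivation tree."""
    symbol, children = tree
    yield path, symbol
    for i, child in enumerate(children):
        yield from nodes(child, path + (i,))

def subtree(tree, path):
    for i in path:
        tree = tree[1][i]
    return tree

def replace(tree, path, new):
    if not path:
        return new
    symbol, children = tree
    i = path[0]
    return (symbol, children[:i] + [replace(children[i], path[1:], new)] + children[i + 1:])

def crossover(a, b):
    """Swap a subtree of `a` with a subtree of `b` rooted at the same
    non-terminal, so the offspring remains derivable from the grammar."""
    paths_a = [p for p, s in nodes(a) if s in GRAMMAR]
    random.shuffle(paths_a)
    for pa in paths_a:
        symbol = subtree(a, pa)[0]
        matches = [pb for pb, s in nodes(b) if s == symbol]
        if matches:
            return replace(a, pa, subtree(b, random.choice(matches)))
    return a

# usage: a small population of grammar-derived individuals
population = [derive("<expr>") for _ in range(10)]
child = crossover(population[0], population[1])
print(flatten(population[0]), "x", flatten(population[1]), "->", flatten(child))
```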

    A specification-based design tool for artificial neural networks.

    Wong Wai. Thesis (M.Phil.)--Chinese University of Hong Kong, 1992. Includes bibliographical references (leaves 78-80).
    Contents: 1. Introduction (specification environment, specification analysis); 2. Survey (concurrence specification, specification analysis, cyclic dependency); 3. The Design Tool (framework of formal neurons, configuration and control neuron, dataflow specification, user interface, specification analysis); 4. BP-Net Specification (BP-Net paradigm, constant declarations, formal neuron, configuration and control neuron specifications); 5. Data Dependency Analysis (graph construction, internal and global dependency graphs, cycle detection for the BP-Net, Perceptron and Boltzmann Machine, dependency cycle analysis, scheduling, symmetry in graph construction); 6. Attribute Analysis (parameter analysis, constraint checking, complete checking procedure); 7. Conclusions (limitations). Appendices: form syntax, algorithms, deadlock and dependency cycles, case studies (BP-Net, Perceptron, Boltzmann Machine).
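    Since detecting cycles in the internal and global dependency graphs is central to the analysis outlined above, a minimal depth-first cycle detector over a dependency graph is sketched below. It is an illustration only, not the thesis' algorithm; the graph representation, parameter names and example dependencies are invented.

```python
def find_cycles(graph):
    """Depth-first search for dependency cycles in a directed graph given
    as {node: [successors]}, the kind of check applied to parameter
    dependency graphs before scheduling a network specification.
    Returns one representative cycle per back edge found."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {v: WHITE for v in graph}
    stack, cycles = [], []

    def visit(v):
        colour[v] = GREY
        stack.append(v)
        for w in graph.get(v, []):
            if colour.get(w, WHITE) == GREY:          # back edge: cycle found
                cycles.append(stack[stack.index(w):] + [w])
            elif colour.get(w, WHITE) == WHITE:
                visit(w)
        stack.pop()
        colour[v] = BLACK

    for v in graph:
        if colour[v] == WHITE:
            visit(v)
    return cycles

# e.g. a weight parameter that depends on itself through the error term
deps = {"weight": ["error"], "error": ["output"], "output": ["weight"]}
print(find_cycles(deps))   # [['weight', 'error', 'output', 'weight']]
```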