13 research outputs found

    Mapping unstructured grid problems to the connection machine

    Get PDF
    We present a highly parallel graph mapping technique that enables one to solve unstructured grid problems on massively parallel computers. Many implicit and explicit methods for solving discretizated partial differential equations require each point in the discretization to exchange data with its neighboring points every time step or iteration. The time spent communicating can limit the high performance promised by massively parallel computing. To eliminate this bottleneck, we map the graph of the irregular problem to the graph representing the interconnection topology of the computer such that the sum of the distances that the messages travel is minimized. We show that, in comparison to a naive assignment of processors, our heuristic mapping algorithm significantly reduces the communication time on the Connection Machine, CM-2

    Load Redistribution on Hypercubes in the Presence of Faults

    Get PDF
    In this paper, we present load redistribution algorithms for hypercubes in the presence of faults. Our algorithms complete in low-order polynomial of the number of faulty nodes and exhibit excellent experimental performance. These algorithms are topology independent and can be applied to a wide variety of networks

    Neural Networks and Dynamic Complex Systems

    Get PDF
    We describe the use of neural networks for optimization and inference associated with a variety of complex systems. We show how a string formalism can be used for parallel computer decomposition, message routing and sequential optimizing compilers. We extend these ideas to a general treatment of spatial assessment and distributed artificial intelligence

    Placement de processus sur des architectures complexes et des partitions d'icelles

    Get PDF
    Data locality is a critical issue in order to achieve performance on today's high-end parallel machines. As these machines are highly non-uniform, distributing computations across their processing elements does not only require to minimize inter-process communication, but also to favor local communication over distant communication. For that purpose, static and/or dynamic (re)mapping tools have been devised, that allow one to map process graphs onto architecture graphs describing the topology and architectural features of such machines. However, in practice, the real problem to solve is to map a process graph onto possibly disconnected parts of a non-uniform parallel machine, such as a set of nodes provided by some batch scheduler.This paper presents a set of algorithms to perform this task in an efficient way. Efficiency is achieved thanks to a multilevel description of target architectures. All the presented algorithms have been implemented in the \scotch\ static mapping software. Experiments evidence the quality of the produced mappings.La localité des données est une question critique afin d'obtenir des performances sur les machines massivement parallèles actuelles. Comme ces machines sont hautement non-uniformes, distribuer efficacement les calculs sur leurs éléments de traitement ne nécessite pas seulement de minimiser la communication inter-processus, mais aussi de favoriser la communication locale par rapport à la communication distante. Dans ce but, des outils de (re)placement statique et/ou dynamique ont été conçus, qui permettent de placer des graphes de processus sur des graphes d'architecture représentant la topologie et les caractéristiques architecturales de ces machines. Cependant, en pratique, le vrai problème à résoudre est de placer un graphe de processus sur des parties potentiellement déconnectées d'une machine parallèle non uniforme, telles que des ensembles de nœuds attribués par un ordonnanceur batch.Cet article présente un ensemble d'algorithmes effectuant cette tâche d'une façon efficace. L'efficacité est obtenue grâce à une descriptionmulti-niveaux des architectures cibles. Tous les algorithmes présentés ici ont été implémentés dans le logiciel de placement statiqueScotch. Des expérimentations illustrent la qualité des placements produits

    Neural Networks and Dynamic Complex Systems

    Get PDF
    We describe the use of neural networks for optimization and inference associated with a variety of complex systems. We show how a string formalism can be used for parallel computer decomposition, message routing and sequential optimizing compilers. We extend these ideas to a general treatment of spatial assessment and distributed artificial intelligence

    Predicting application performance using supervised learning on communication features

    Get PDF
    Abstract not provide

    Mapping of portable parallel programs

    Get PDF
    An efficient parallel program designed for a parallel architecture includes a detailed outline of accurate assignments of concurrent computations onto processors, and data transfers onto communication links, such that the overall execution time is minimized. This process may be complex depending on the application task and the target multiprocessor architecture. Furthermore, this process is to be repeated for every different architecture even though the application task may be the same. Consequently, this has a major impact on the ever increasing cost of software development for multiprocessor systems. A remedy for this problem would be to design portable parallel programs which can be mapped efficiently onto any computer system. In this dissertation, we present a portable programming tool called Cluster-M. The three components of Cluster-M are the Specification Module, the Representation Module, and the Mapping Module. In the Specification Module, for a given problem, a machine-independent program is generated and represented in the form of a clustered task graph called Spec graph. Similarly, in the Representation Module, for a given architecture or heterogeneous suite of computers, a clustered system graph called Rep graph is generated. The Mapping Module is responsible for efficient mapping of Spec graphs onto Rep graphs. As part of this module, we present the first algorithm which produces a near-optimal mapping of an arbitrary non-uniform machine-independent task graph with M modules, onto an arbitrary non-uniform task-independent system graph having N processors, in 0(M P) time, where P = max(M, N). Our experimental results indicate that Cluster-M produces better or similar mapping results compared to other leading techniques which work only for restricted task or system graphs
    corecore