13 research outputs found

Mapping unstructured grid problems to the connection machine

Author: Hammond Steven W.
Schreiber Robert

We present a highly parallel graph mapping technique that enables one to solve unstructured grid problems on massively parallel computers. Many implicit and explicit methods for solving discretized partial differential equations require each point in the discretization to exchange data with its neighboring points at every time step or iteration. The time spent communicating can limit the high performance promised by massively parallel computing. To eliminate this bottleneck, we map the graph of the irregular problem to the graph representing the interconnection topology of the computer such that the sum of the distances that the messages travel is minimized. We show that, in comparison to a naive assignment of processors, our heuristic mapping algorithm significantly reduces the communication time on the Connection Machine CM-2.
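The objective named in this abstract, the sum of the distances that messages travel, can be illustrated with a small sketch. It assumes a hypercube interconnect, where the hop count between two processors is the Hamming distance of their addresses. The four-point ring graph, the function names, and the brute-force search are all hypothetical stand-ins; the abstract does not describe the authors' heuristic, and exhaustive search is only feasible for toy sizes.

```python
from itertools import permutations

def hamming(a: int, b: int) -> int:
    """Hop count between two hypercube nodes: number of differing address bits."""
    return bin(a ^ b).count("1")

def mapping_cost(edges, placement):
    """Sum of hop distances over all grid edges for a given placement."""
    return sum(hamming(placement[u], placement[v]) for u, v in edges)

# Hypothetical grid graph: four points connected in a ring 0-1-2-3-0.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

# Naive placement: grid point i goes to processor i of a 2-dimensional hypercube.
naive = {0: 0b00, 1: 0b01, 2: 0b10, 3: 0b11}

# Exhaustive search over all placements (a stand-in for a real heuristic).
best = min(
    (dict(zip(range(4), perm)) for perm in permutations(range(4))),
    key=lambda placement: mapping_cost(edges, placement),
)

print(mapping_cost(edges, naive))  # 6 hops in total
print(mapping_cost(edges, best))   # 4 hops in total
```

In this toy case the naive placement sends two of the four messages across two hops, while the best placement follows a Gray-code ordering so that every neighbor pair is one hop apart.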

NASA Technical Reports Server

Load Redistribution on Hypercubes in the Presence of Faults

Author: Ranka Sanjay
Wang Jhy-Chun
Publication venue: SURFACE at Syracuse University
Publication date: 01/07/1990

In this paper, we present load redistribution algorithms for hypercubes in the presence of faults. Our algorithms complete in time that is a low-order polynomial in the number of faulty nodes and exhibit excellent experimental performance. These algorithms are topology independent and can be applied to a wide variety of networks.

Syracuse University Research Facility and Collaborative Environment

Neural Networks and Dynamic Complex Systems

Author: Fox Geoffrey
Furmanski Wojtek
Ho Alex
Koller Jeff
Simic Petar
Wong Isaac
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/03/1989

We describe the use of neural networks for optimization and inference associated with a variety of complex systems. We show how a string formalism can be used for parallel computer decomposition, message routing and sequential optimizing compilers. We extend these ideas to a general treatment of spatial assessment and distributed artificial intelligence.

Caltech Authors

Placement de processus sur des architectures complexes et des partitions d'icelles

Author: Lachat Cédric
Pellegrini François
Publication venue: HAL CCSD
Publication date: 21/12/2017

Data locality is critical to achieving performance on today's high-end parallel machines. Because these machines are highly non-uniform, distributing computations across their processing elements requires not only minimizing inter-process communication but also favoring local communication over distant communication. For that purpose, static and/or dynamic (re)mapping tools have been devised that map process graphs onto architecture graphs describing the topology and architectural features of such machines. In practice, however, the real problem is to map a process graph onto possibly disconnected parts of a non-uniform parallel machine, such as a set of nodes provided by a batch scheduler. This paper presents a set of algorithms that perform this task efficiently. Efficiency is achieved through a multilevel description of target architectures. All the presented algorithms have been implemented in the Scotch static mapping software. Experiments demonstrate the quality of the produced mappings.
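The problem stated in this abstract, placing a process graph on a possibly disconnected set of allocated nodes, can be sketched as a weighted cost function. Everything below is an illustrative assumption rather than Scotch's method or API: the six-node chain machine, the allocation {0, 1, 4, 5}, the communication weights, and the choice to measure distances in the full machine graph (on the assumption that messages may route through nodes not allocated to the job). The multilevel architecture description mentioned in the abstract is not reproduced here.

```python
from collections import deque

def hop_distances(adjacency, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def mapping_cost(process_edges, placement, adjacency):
    """Sum of (communication weight * hop distance) over process-graph edges."""
    total = 0
    for u, v, weight in process_edges:
        total += weight * hop_distances(adjacency, placement[u])[placement[v]]
    return total

# Hypothetical machine: six nodes in a chain 0-1-2-3-4-5.
machine = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

# The scheduler grants nodes {0, 1, 4, 5}: two pieces that are not adjacent.
# Hypothetical process graph: a-b and c-d talk heavily, b-c only lightly.
process_edges = [("a", "b", 10), ("c", "d", 10), ("b", "c", 1)]

local_first = {"a": 0, "b": 1, "c": 4, "d": 5}  # heavy pairs share a piece
scattered = {"a": 0, "b": 4, "c": 1, "d": 5}    # heavy pairs straddle the gap

print(mapping_cost(process_edges, local_first, machine))  # 23
print(mapping_cost(process_edges, scattered, machine))    # 83
```

Both placements use the same four nodes; the difference in cost comes entirely from keeping heavily communicating processes inside the same connected piece of the allocation.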

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Predicting application performance using supervised learning on communication features

Author: Bhatele A
Gamblin T
Jain N
Kale L V
Robson M P
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2013

Abstract not provided

Crossref

UNT Digital Library

Smith College: Smith ScholarWorks

Mapping of portable parallel programs

Author: Chen Song
Publication venue: Digital Commons @ NJIT
Publication date: 31/05/1995

An efficient parallel program designed for a parallel architecture includes a detailed outline of accurate assignments of concurrent computations onto processors, and data transfers onto communication links, such that the overall execution time is minimized. This process may be complex depending on the application task and the target multiprocessor architecture. Furthermore, this process must be repeated for every different architecture even though the application task may be the same. Consequently, this has a major impact on the ever-increasing cost of software development for multiprocessor systems. A remedy for this problem would be to design portable parallel programs which can be mapped efficiently onto any computer system. In this dissertation, we present a portable programming tool called Cluster-M. The three components of Cluster-M are the Specification Module, the Representation Module, and the Mapping Module. In the Specification Module, for a given problem, a machine-independent program is generated and represented in the form of a clustered task graph called a Spec graph. Similarly, in the Representation Module, for a given architecture or heterogeneous suite of computers, a clustered system graph called a Rep graph is generated. The Mapping Module is responsible for efficient mapping of Spec graphs onto Rep graphs. As part of this module, we present the first algorithm which produces a near-optimal mapping of an arbitrary non-uniform machine-independent task graph with M modules onto an arbitrary non-uniform task-independent system graph having N processors, in O(MP) time, where P = max(M, N). Our experimental results indicate that Cluster-M produces better or similar mapping results compared to other leading techniques which work only for restricted task or system graphs.

Digital Commons @ New Jersey Institute of Technology (NJIT)