115 research outputs found

    Distributed Memory Compiler Methods for Irregular Problems -- Data Copy Reuse and Runtime Partitioning

    Get PDF
    This paper outlines two methods which we believe will play an important role in any distributed memory compiler able to handle sparse and unstructured problems. We describe how to link runtime partitioners to distributed memory compilers. In our scheme, programmers can implicitly specify how data and loop iterations are to be distributed between processors. This insulates users from having to deal explicitly with potentially complex algorithms that carry out work and data partitioning. We also describe a viable mechanism for tracking and reusing copies of off-processor data. In many programs, several loops access the same off-processor memory locations. As long as it can be verified that the values assigned to off-processor memory locations remain unmodified, we show that we can effectively reuse stored off-processor data. We present experimental data from a 3-D unstructured Euler solver run on an iPSC/860 to demonstrate the usefulness of our methods

    Compiling global name-space programs for distributed execution

    Get PDF
    Distributed memory machines do not provide hardware support for a global address space. Thus programmers are forced to partition the data across the memories of the architecture and use explicit message passing to communicate data between processors. The compiler support required to allow programmers to express their algorithms using a global name-space is examined. A general method is presented for analysis of a high level source program and along with its translation to a set of independently executing tasks communicating via messages. If the compiler has enough information, this translation can be carried out at compile-time. Otherwise run-time code is generated to implement the required data movement. The analysis required in both situations is described and the performance of the generated code on the Intel iPSC/2 is presented

    The Architecture and Programming of a Fine-Grain Multicomputer

    Get PDF
    The research presented in this thesis was conducted in the context of the Mosaic C, an experimental, fine-grain multicomputer. The objective of the Mosaic experiment was to develop a concurrent-computing system with maximum performance per unit cost, while still retaining a general-purpose application span. A stipulation of the Mosaic project was that the complexity of a Mosaic node be limited by the silicon complexity available on a single VLSI chip. The two most important original results reported in the thesis are: (1) The design and implementation of C+-, a concurrent, object-oriented programming system. Syntactically, C+- is an extension of C++. The concurrent semantics of C+- are contained within the process concept. A C+- process is analogous to a C++ object, but it is also an autonomous computing agent, and a unit of potential concurrency. Atomic single-process updates that can be individually enabled and disabled are the execution units of the concurrent computation. The limited set of primitives that C+- provides is shown to be sufficient to express a variety of concurrent-programming problems concisely and efficiently. An important design requirement for C+- was that efficient implementations should exist on a variety of concurrent architectures, and, in particular, on the simple and inexpensive hardware of the Mosaic node. The Mosaic runtime system was written entirely in C+-. (2) Pipeline synchronization, a novel, generally- applicable technique for hardware synchronization. This technique is a simple, low-cost, high-bandwidth, high- reliability solution to interfaces between synchronous and asynchronous systems, or between synchronous systems operating from different clocks. The technique can sustain the full communication bandwidth and achieve an arbitrarily low, non-zero probability of synchronization failure, Pf, with the price in both latency and chip area being O(log 1/Pf). Pipeline synchronization has been successfully applied to the highperformance inter-computer communication in Mosaic node ensembles

    An Application Perspective on High-Performance Computing and Communications

    Get PDF
    We review possible and probable industrial applications of HPCC focusing on the software and hardware issues. Thirty-three separate categories are illustrated by detailed descriptions of five areas -- computational chemistry; Monte Carlo methods from physics to economics; manufacturing; and computational fluid dynamics; command and control; or crisis management; and multimedia services to client computers and settop boxes. The hardware varies from tightly-coupled parallel supercomputers to heterogeneous distributed systems. The software models span HPF and data parallelism, to distributed information systems and object/data flow parallelism on the Web. We find that in each case, it is reasonably clear that HPCC works in principle, and postulate that this knowledge can be used in a new generation of software infrastructure based on the WebWindows approach, and discussed in an accompanying paper
    corecore