147 research outputs found

    Reconfiguration for Fault Tolerance and Performance Analysis

    Get PDF
    Architecture reconfiguration, the ability of a system to alter the active interconnection among modules, has a history of different purposes and strategies. Its purposes develop from the relatively simple desire to formalize procedures that all processes have in common to reconfiguration for the improvement of fault-tolerance, to reconfiguration for performance enhancement, either through the simple maximizing of system use or by sophisticated notions of wedding topology to the specific needs of a given process. Strategies range from straightforward redundancy by means of an identical backup system to intricate structures employing multistage interconnection networks. The present discussion surveys the more important contributions to developments in reconfigurable architecture. The strategy here is in a sense to approach the field from an historical perspective, with the goal of developing a more coherent theory of reconfiguration. First, the Turing and von Neumann machines are discussed from the perspective of system reconfiguration, and it is seen that this early important theoretical work contains little that anticipates reconfiguration. Then some early developments in reconfiguration are analyzed, including the work of Estrin and associates on the fixed plus variable restructurable computer system, the attempt to theorize about configurable computers by Miller and Cocke, and the work of Reddi and Feustel on their restructable computer system. The discussion then focuses on the most sustained systems for fault tolerance and performance enhancement that have been proposed. An attempt will be made to define fault tolerance and to investigate some of the strategies used to achieve it. By investigating four different systems, the Tandern computer, the C.vmp system, the Extra Stage Cube, and the Gamma network, the move from dynamic redundancy to reconfiguration is observed. Then reconfiguration for performance enhancement is discussed. A survey of some proposals is attempted, then the discussion focuses on the most sustained systems that have been proposed: PASM, the DC architecture, the Star local network, and the NYU Ultracomputer. The discussion is organized around a comparison of control, scheduling, communication, and network topology. Finally, comparisons are drawn between fault tolerance and performance enhancement, in order to clarify the notion of reconfiguration and to reveal the common ground of fault tolerance and performance enhancement as well as the areas in which they diverge. An attempt is made in the conclusion to derive from this survey and analysis some observations on the nature of reconfiguration, as well as some remarks on necessary further areas of research

    Expanded delta networks for very large parallel computers

    Get PDF
    In this paper we analyze a generalization of the traditional delta network, introduced by Patel [21], and dubbed Expanded Delta Network (EDN). These networks provide in general multiple paths that can be exploited to reduce contention in the network resulting in increased performance. The crossbar and traditional delta networks are limiting cases of this class of networks. However, the delta network does not provide the multiple paths that the more general expanded delta networks provide, and crossbars are to costly to use for large networks. The EDNs are analyzed with respect to their routing capabilities in the MIMD and SIMD models of computation.The concepts of capacity and clustering are also addressed. In massively parallel SIMD computers, it is the trend to put a larger number processors on a chip, but due to I/O constraints only a subset of the total number of processors may have access to the network. This is introduced as a Restricted Access Expanded Delta Network of which the MasPar MP-1 router network is an example

    Parallel software caches

    Get PDF
    We investigate the construction and application of parallel software caches in shared memory multiprocessors. In contrast to maintaining a private cache for each thread, a parallel cache allows the re-use of results of lengthy computations by other threads. This is especially important in irregular applications where the re-use of intermediate results by scheduling is not possible. Example applications are the computation of intersections between a scanline and a polygon in computational geometry, and the computation of intersections between rays and objects in ray tracing. A parallel software cache is based on a readers/writers lock, i.e. as long as no thread alters the cache data structure, multiple threads may read simultaneously. If a thread wants to alter the cache because of a cache miss, it waits until all other threads have left the data structure, then it can update the contents of the cache. Other threads can access the cache only after the writer has finished its work. To increase utilization, the cache has a number of slots that can be locked separately. We investigate the tradeoff between slot size, search time in the cache, and the time to re-compute a cache entry. Another major difference between sequential and parallel software caches is the replacement strategy. We adapt classic replacement strategies such as LRU and random replacement for parallel caches. As execution platform, we use the SB-PRAM, but the concepts might be portable to machines such as NYU Ultracomputer, Tera MTA, and Stanford DASH

    A note on implementing combining networks

    Get PDF
    In shared-memory multiprocessors, combining networks serve to eliminate hot spots due to concurrent access to the same memory location. Examples are the NYU Ultracomputer, the IBM RP3 and the Fluent Machine. We present a problem that occurs when one tries to implement the Fluent Machine`s network nodes with network chips that do not know their position within the network. We formulate the problem mathematically and present two solutions. The first solution requires some additional hardware around nodes that can be put outside network chips. The second solution requires a minor modification of the routing algorithm, but one can prove that there is no performance loss

    Shared versus distributed memory multiprocessors

    Get PDF
    The question of whether multiprocessors should have shared or distributed memory has attracted a great deal of attention. Some researchers argue strongly for building distributed memory machines, while others argue just as strongly for programming shared memory multiprocessors. A great deal of research is underway on both types of parallel systems. Special emphasis is placed on systems with a very large number of processors for computation intensive tasks and considers research and implementation trends. It appears that the two types of systems will likely converge to a common form for large scale multiprocessors

    Granularity of parallel memories

    Get PDF
    Consider algorithms which are designed for shared memory models of parallel computation in which processors are allowed to have fairly unrestricted access patterns to the shared memory. General fast simulations of such algorithms by parallel machines in which the shared memory is organized in modules where only one cell of each module can be accessed at a time are proposed. The paper provides a comprehensive study of the problem. The solution involves three stages: (a) Before a simulation, distribute randomly the memory addresses among the memory modules. (b) Keep several copies of each address and assign memory requests of processors to the "right\u27; copies at any time. (c) Satisfy these assigned memory requests according to specifications of the parallel machine
    corecore