15 research outputs found

    Fast Broadcast and Multicast in Wormhole Multistage Networks with Multidestination Worms

    This paper presents a new approach to implement fast broadcast and multicast operations in bidirectional wormhole Multistage Interconnection Networks (MIN) with loopback, as used in the IBM SP1/SP2 network. The novelty lies in using a multidestination message-passing mechanism instead of single-destination (unicast) messages. For broadcast/multicast operation, it is shown that a single worm with multiple destinations is sufficient to allow pipelined replication of flits at appropriate intermediate switches and deliver copies to the required destinations. For higher communication start-up ($t_s$), for an $n$-processor system, this new approach leads to an asymptotic improvement by a factor of $\lceil \log_2 n \rceil$ compared to unicast-based message passing. Two schemes for broadcast and multicast are presented together with the necessary architectural supports at a switch-level. Storage requirements at a switch to ensure deadlock freedom are also derived. These schemes are evaluated and compared with t..

    Reliable Hardware Barrier Synchronization Schemes

    Barrier synchronization is a crucial operation for parallel systems. Many schemes have been proposed in the literature to achieve fast barrier through software, hardware, or a combination of these mechanisms. However, none of these schemes emphasize fault-tolerant barrier operations. In this paper, we describe inexpensive support that can be added to network switches for achieving reliable hardware-based barrier synchronization while recovering from lost or corrupted messages. Necessary modifications to the switch architecture and the associated fault-tolerant message-passing protocols are presented. The protocols are optimized for the no-fault case while providing means to detect the failure of any step of the operation and to recover from it. The proposed scheme is evaluated with and without specialized support at the network interface and compared with similar approaches using software-based schemes. It promises significant potential to be applied to switch-based parallel systems, e..

    HIPIQS: A High-Performance Switch Architecture using Input Queuing

    Switch-based interconnects are used in a number of application domains including parallel system interconnects, local area networks, and wide area networks. However, very few switches have been designed that are suitable for more than one of these application domains. Such a switch must offer both extremely low latency and very high throughput for a variety of different message sizes. While some architectures with output queuing have been shown to perform extremely well in terms of throughput, their performance can suffer when used in systems where a significant portion of the packets are extremely small. On the other hand, architectures with input queuing offer limited throughput, or require fairly complex and centralized arbitration that increases latency. In this paper we present a new input queue-based switch architecture called HIPIQS (HIgh-Performance Input-Queued Switch). It offers low latency for a range of message sizes, and provides throughput comparable to that of output qu..

    Multicasting in Irregular Networks with Cut-Through Switches using Tree-Based Multidestination Worms

    Multidestination message passing has been proposed as a mechanism to achieve efficient multicast in regular direct and indirect networks. The application of this technique to parallel systems based on irregular networks has, however, not been studied. In this paper we propose two schemes for performing multicast using multidestination worms on irregular networks and examine the extent to which multidestination message passing can improve multicast performance. For each of the schemes we propose solutions for the associated problems, such as methods for encoding and decoding multidestination headers, alterations to the setup algorithm run by the switches, logic to perform header manipulation, etc. We perform extensive simulations to evaluate our schemes under a variety of changing parameters: software startup overhead per message, system size, switch size, message length, and degree of connectivity. Our results establish that even a very naive multicasting algorithm using multidestina..

    Implementing Multidestination Worms in Switch-Based Parallel Systems: Architectural Alternatives and their Impact

    Multidestination message passing has been proposed as an attractive mechanism for efficiently implementing multicast and other collective operations on direct networks. However, applying this mechanism to switch-based parallel systems is non-trivial. In this paper we propose alternative switch architectures with differing buffer organizations to implement multidestination worms on switch-based parallel systems. First, we discuss issues related to such implementation (deadlock-freedom, replication mechanisms, header encoding, and routing). Next, we demonstrate how an existing central-buffer-based switch architecture supporting unicast message passing can be enhanced to accommodate multidestination message passing. Similarly, implementing multidestination worms on an input-buffer-based switch architecture is discussed. Both of these implementations are evaluated against each other as well as against a software-based scheme using the central buffer organization. Simulation experiments und..

    A Reliable Hardware Barrier Synchronization Scheme

    Barrier synchronization is a crucial operation for parallel systems. Many schemes have been proposed in the literature to achieve fast barrier synchronization through software, hardware, or a combination of these mechanisms. However, few of these schemes emphasize fault-tolerant barrier operations. In this paper, we describe inexpensive support that can be added to network switches for achieving reliable hardware-based barrier synchronization while recovering from lost or corrupted messages. Necessary modifications to the switch architecture and the associated fault-tolerant message-passing protocols are presented. The protocols are optimized for the no-fault case while providing means to detect the failure of any step of the operation and to recover from it. The proposed scheme shows significant potential for use in parallel systems, especially the emerging systems based on networks of workstations. 1. Introduction Barrier synchronization, or barrier-sync, is a crucial collective co..

    PASSION Runtime Library for the Intel Paragon

    We are developing a runtime library which provides a number of routines to perform the I/O required in parallel applications in an efficient and convenient manner. This is part of a project called PASSION, which aims to provide software support for high-performance parallel I/O at the compiler, runtime and file system levels. The PASSION Runtime Library uses a high-level interface which makes it easy for the user to specify the I/O required in the program. The user only needs to specify what portion of the data structure needs to be read from or written to the file, and the PASSION routines will perform all the necessary I/O efficiently. This paper gives an overview of the PASSION Runtime Library and describes in detail its high-level interface. 1 Introduction Parallel computers are becoming increasingly powerful day by day. This has made possible the solution of many problems which were previously considered intractable. These include large scale applications in physics, chemistry, bio..

    Where to Provide Support for Efficient Multicasting in Irregular Networks: Network Interface or Switch?

    Recent research has proposed methods for enhancing the performance of multicast in networks with irregular topologies. These methods fall into two broad categories: (a) network interface (NI) based schemes that make use of enhanced functionality of the software/firmware running at the NI processor, and (b) switch-based methods that use enhancements to the switch architecture to support hardware multicast. However, it is not clear how these methods compare to each other and when it makes sense to use one over the other. In order to answer such questions, we perform a number of simulation experiments to compare the performance of three efficient multicasting schemes: an NI-based multicasting scheme that uses a k-binomial tree [5], a switch-based multicasting scheme that uses path-based multidestination worms [4], and a switch-based multicasting scheme that uses a single tree-based multidestination worm [14]. We first study the performance of the three schemes for single multicast traffic ..