96 research outputs found

    Optimal Multicast with Packetization and Network Interface Support

    No full text
    Most multicast algorithms proposed in the literature assume an arbitrarily long message being communicated as a single packet. Modern networks typically limit the size of the largest packet, and long messages are packetized and transmitted. Such networks also provide network interface support for nodes, which typically includes a coprocessor and memory, to implement the lower layers of the communication protocol. Such network interfaces can be programmed to support efficient multicasting to eliminate software overhead for higher layers during absorb and retransmit. In this paper, we present an optimal multicast algorithm for systems with such smart network interface support for packetization. Two implementations of the smart network interface, First-Child-First-Served (FCFS) and First-Packet-First-Served (FPFS), are studied and compared. It is shown that the FPFS network interface support is more practical and efficient. Next, the multicast latency is modeled for the FPFS impleme..
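The two interface policies named in the abstract can be illustrated with a minimal sketch. It assumes a literal reading of the names: FCFS sends every packet to the first child before moving to the next child, while FPFS sends each packet to every child before moving to the next packet. The function names and the `(packet, child)` representation are hypothetical, and the sketch shows only forwarding order, not the latency model from the paper.

```python
def fcfs_order(num_packets, children):
    """First-Child-First-Served (assumed reading): finish one child
    completely before forwarding anything to the next child."""
    return [(p, c) for c in children for p in range(num_packets)]


def fpfs_order(num_packets, children):
    """First-Packet-First-Served (assumed reading): forward one packet
    to every child before forwarding the next packet."""
    return [(p, c) for p in range(num_packets) for c in children]


if __name__ == "__main__":
    kids = ["a", "b"]
    print("FCFS:", fcfs_order(2, kids))
    print("FPFS:", fpfs_order(2, kids))
```

Under this reading, the last child of an FPFS interface receives its first packet after one packet per sibling has been sent, whereas under FCFS it waits for the whole message to reach each earlier sibling.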

    Building Efficient Limited Directory-Based DSMs: A Multidestination Message Passing Based

    No full text
    A cost-effective distributed shared memory (DSM) system typically uses a limited directory protocol to enforce cache coherence. This paper presents a new family of protocols, called Limited directory with Region-based Broadcast (Limited-RB), to efficiently implement cache coherence in wormhole-routed DSM systems. This protocol family uses multidestination-based cache invalidation mechanisms to distribute invalidation requests to and collect the associated acknowledgments from separate regions. As a result, a write invalidation can be accomplished with fewer messages, less network traffic, and reduced occupancy at home nodes. These reductions contribute to decreasing invalidation latency and improving overall system performance. Directory organization under this new protocol is developed for 2D systems with e-cube routing and evaluated through simulations for a set of applications. The results indicate that with a small directory storage, the Limited-RB protocol family can achieve supe..

    Multicasting on Switch-based Irregular Networks using Multi-drop Path-based Multidestination Worms

    No full text
    This paper presents a novel concept of multi-drop path-based multidestination message passing on switch-based irregular networks. First, the multi-drop mechanism is defined with an associated header encoding scheme, and this mechanism is used to develop path-based multidestination worms. Next, a method is proposed to identify valid multidestination paths on arbitrary irregular networks with a typical deadlock-free routing. Then, the deadlock-free property of multi-drop path-based worms is emphasized. Using the above concepts, three multicast algorithms are proposed: multi-drop path-based greedy (MDP-G), multi-drop path-based less-greedy (MDP-LG), and multi-drop path-based binomial (MDP-B). The proposed algorithms are compared with each other and with the best unicast-based algorithm, the CCO [5], for a range of system and technological parameters. The MDP-LG scheme is shown to be the best to implement multicast with reduced latency. Keywords: Interconnection network, collective commu..

    Fast Broadcast and Multicast in Wormhole Multistage Networks with Multidestination Worms

    No full text
    This paper presents a new approach to implement fast broadcast and multicast operations in bidirectional wormhole Multistage Interconnection Networks (MIN) with loopback, as used in the IBM SP1/SP2 network. The novelty lies in using a multidestination message-passing mechanism instead of single-destination (unicast) messages. For broadcast/multicast operation, it is shown that a single worm with multiple destinations is sufficient to allow pipelined replication of flits at appropriate intermediate switches and deliver copies to the required destinations. For higher communication start-up (t_s), for an n-processor system, this new approach leads to an asymptotic improvement by a factor of ⌈log_2 n⌉ compared to unicast-based message passing. Two schemes for broadcast and multicast are presented together with the necessary architectural supports at the switch level. Storage requirements at a switch to ensure deadlock freedom are also derived. These schemes are evaluated and compared with t..
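The logarithmic factor quoted in the abstract can be checked with a small sketch. It assumes the unicast baseline is a recursive-doubling (binomial-tree) broadcast in which every informed node sends to one new node per step and each step pays one start-up; the abstract does not name the baseline, so this is an illustrative assumption. Function names are hypothetical, and per-flit and propagation costs are ignored.

```python
def unicast_broadcast_steps(n):
    """Count the steps of a recursive-doubling unicast broadcast to n nodes.

    Assumption: in each step every informed node sends to one
    uninformed node, so the informed set at most doubles.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    informed, steps = 1, 0
    while informed < n:
        informed *= 2
        steps += 1
    return steps


def ceil_log2(n):
    """Integer ceil(log2(n)) for n >= 1, without floating point."""
    return (n - 1).bit_length()


def startup_ratio(n):
    """Start-up cost of the unicast scheme (steps * t_s) divided by that
    of a single multidestination worm (1 * t_s); t_s cancels out."""
    return unicast_broadcast_steps(n) / 1


if __name__ == "__main__":
    for n in (2, 16, 64, 100):
        print(n, unicast_broadcast_steps(n), ceil_log2(n))
```

When start-up dominates, the ratio of the two start-up costs is the step count itself, which matches ⌈log_2 n⌉ for every n in this model.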

    Efficient Schemes for Limited Directory-Based DSMs Using Multidestination Message Passing

    No full text
    Many limited directory schemes have been proposed in the literature to build cost-effective DSM systems. However, all these schemes are based on networks supporting only point-to-point (unicast) message passing. New generation networks are providing architectural support to implement collective communication operations (broadcast, multicast, reduction, etc.) with reduced latency. This paper explores the impact of such architectural support in designing efficient and more cost-effective limited directory schemes. The study is carried out for wormhole k-ary n-cube networks supporting a multidestination message-passing mechanism, and it proceeds along two major directions. First, a variation of the dir_i B scheme to work with multidestination-based broadcast and gather operations is proposed. This scheme is defined as the m dir_i B scheme. Next, two coarse vector schemes (m dir_i CB_1 and m dir_i CB_2) are proposed to work efficiently with selective broadcast operation..

    Alleviating Consumption Channel Bottleneck in Wormhole-Routed k-ary n-cube Systems

    No full text
    This paper identifies performance degradation in wormhole-routed k-ary n-cube networks due to the limited number of router-to-processor consumption channels at each node. Much recent research in wormhole routing has advocated the advantages of adaptive routing and virtual channel flow control schemes to deliver better network performance. However, most of these results are based on infinite message consumption capacity, leading to unrealistic design guidelines. This paper indicates that the advantages associated with these schemes cannot be realized with limited consumption capacity. To alleviate this performance bottleneck, a new solution using multiple consumption channels is proposed. It is shown that wormhole networks with higher routing adaptivity, dimensionality, degree of hot-spot traffic, and number of virtual networks have to take advantage of multiple consumption channels to deliver better performance. The interplay between system topology, routing algorithm, number of virtual c..

    Efficient Collective Communication on Heterogeneous Networks of Workstations

    No full text
    Networks of Workstations (NOW) have become an attractive alternative platform for high performance computing. Due to the commodity nature of workstations and interconnects and the multiplicity of vendors and platforms for NOW systems, NOW environments are being gradually redefined as Heterogeneous Networks of Workstations (HNOW) environments. This paper presents a new framework for implementing collective communication operations (as defined by the Message Passing Interface (MPI) standard) efficiently for the emerging HNOW environments. We first classify different types of heterogeneity in HNOW and then focus on one important characteristic: communication capabilities of workstations. Taking this characteristic into account, we show that algorithms such as the binomial-tree-based algorithms currently used for implementing collective operations are not efficient. We propose two new approaches (Speed-Partitioned Ordered Chain (SPOC) and Fastest-Node First (FNF)) to..

    Multiple Multicast with Minimized Node Contention on Wormhole k-ary n-cube Networks

    No full text
    This paper presents a new approach to minimize node contention while performing multiple multicast/broadcast on wormhole k-ary n-cube networks with overlapped destination sets. The existing multicast algorithms in the literature deliver poor performance under multiple multicast because these algorithms have been designed with only single multicast in mind. The new algorithms introduced in this paper do not use any global knowledge about the respective destination sets of the concurrent multicasts. Instead, only local information and a source-specific partitioning approach are used. For systems supporting unicast message passing, a new SPUmesh (Source-Partitioned Umesh) algorithm is proposed and shown to be superior to the conventional Umesh algorithm [14] for multiple multicast. Two different algorithms, SQHL (Source-Quadrant Hierarchical Leader) and SCHL (Source-Centered Hierarchical Leader), are proposed for systems with multidestination message passing and shown to be superior t..