362 research outputs found

    A Framework for Adaptive Routing in Multicomputer Networks

    Message-passing concurrent computers, also known as multicomputers, such as the Caltech Cosmic Cube [47] and its commercial descendants, consist of many computing nodes that interact with each other by sending and receiving messages over communication channels between the nodes. The communication networks of the second-generation machines, such as the Symult Series 2010 and the Intel iPSC2 [2], employ an oblivious wormhole-routing technique that guarantees deadlock freedom. The network performance of this highly evolved oblivious technique has reached a limit: under random traffic, it delivers a stable maximum sustained throughput of approximately 45 to 50% of the limit set by the network bisection bandwidth, while maintaining acceptable network latency. This thesis examines adaptive routing as an approach to further improving the performance and reliability of these networks. In an adaptive multipath routing scheme, message trajectories are no longer deterministic, but are continuously perturbed by local message loading. Message packets tend to follow their shortest-distance routes to destinations under normal traffic loading, but can be detoured to longer but less-loaded routes as local congestion occurs. A simple adaptive cut-through packet-switching framework is described, and a number of fundamental issues concerning the theoretical feasibility of the adaptive approach are studied. Freedom from communication deadlock is achieved by following a coherent channel protocol and by applying voluntary misrouting as needed. Packet deliveries are assured by resolving channel-access conflicts according to a priority assignment. Fairness of network access is assured either by sending round-trip packets or by having each node follow a local injection-synchronization protocol.
The performance behavior of the proposed adaptive cut-through framework is studied with stochastic modeling and analysis, as well as through extensive simulation experiments for the 2D and 3D rectilinear networks. Theoretical bounds on various average network-performance metrics are derived for these rectilinear networks. These bounds provide a standard frame of reference for interpreting the performance results. In addition to the potential gain in network performance, the adaptive approach offers the potential for exploiting the inherent path redundancy found in richly connected networks in order to perform fault-tolerant routing. Two convexity-related notions are introduced to characterize the conditions under which our adaptive routing formulation is adequate to provide fault-tolerant routing with minimal change in routing hardware. The effectiveness of these notions is studied through extensive simulations. The 2D octagonal-mesh network is suggested; it displays excellent fault-tolerant potential under the adaptive routing framework. Both performance and reliability behaviors of the octagonal mesh are studied in detail. A number of implementation issues are examined. Encoding schemes are developed for packet headers that admit simple incremental updates while providing all necessary routing information in the first flit of a relatively narrow flit width. A pipelined control structure that allows a packet to cut through an intermediate node with a minimum delay of two cycles is described. A distributed clocking scheme is developed that eliminates the problem of global clock-signal distribution. Under this clocking scheme, the adaptive routers can be tessellated to form a network of arbitrary size.
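The routing idea in the abstract above (prefer shortest-distance channels, detour when they are congested) can be illustrated with a minimal sketch. This is not the thesis's router: the 2D mesh coordinates, the channel names `E`/`W`/`N`/`S`, the `busy` set, and the function `choose_output` are all hypothetical, and the sketch ignores the channel protocol, priorities, and fairness mechanisms the abstract mentions. It assumes the packet has not yet reached its destination.

```python
from typing import Dict, Optional, Set, Tuple

Node = Tuple[int, int]

# Unit moves for the four output channels of a 2D mesh router (hypothetical names).
MOVES: Dict[str, Node] = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def manhattan(a: Node, b: Node) -> int:
    """Hop distance between two nodes of a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def choose_output(cur: Node, dst: Node, busy: Set[str], size: int) -> Optional[str]:
    """Pick an output channel for a packet at `cur` headed to `dst`.

    Among the free channels that stay inside the size x size mesh, the one
    whose neighbor is closest to `dst` is chosen. If every distance-reducing
    channel is busy, this falls back to a free channel that lengthens the
    route (misrouting). Ties are broken alphabetically by channel name.
    Returns None when no channel is free.
    """
    candidates = []
    for name, (dx, dy) in MOVES.items():
        nxt = (cur[0] + dx, cur[1] + dy)
        if name in busy:
            continue  # channel currently occupied by another packet
        if not (0 <= nxt[0] < size and 0 <= nxt[1] < size):
            continue  # no neighbor in that direction at the mesh edge
        candidates.append((manhattan(nxt, dst), name))
    if not candidates:
        return None
    return min(candidates)[1]
```

For example, a packet at `(1, 1)` bound for `(3, 1)` takes `E` when that channel is free, and is detoured over `N` when `E` is busy.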

    On the performance of routing algorithms in wormhole-switched multicomputer networks

    This paper presents a comparative performance study of adaptive and deterministic routing algorithms in wormhole-switched hypercubes and investigates the performance vicissitudes of these routing schemes under a variety of network operating conditions. Contrary to previously reported results, our results show that adaptive routing does not consistently outperform deterministic routing, even for high-dimensional networks. In fact, the superiority of adaptive routing appears to be highly dependent on the broadcast traffic rate generated at each node, and it deteriorates as the broadcast rate of generated messages grows.

    A general analytical model of adaptive wormhole routing in k-ary n-cubes

    Several analytical models of fully adaptive routing have recently been proposed for k-ary n-cubes and hypercube networks under the uniform traffic pattern. Although the hypercube is a special case of the k-ary n-cube topology, the modeling approach for the hypercube is more accurate than that for k-ary n-cubes due to its simpler structure. This paper proposes a general analytical model to predict message latency in wormhole-routed k-ary n-cubes with fully adaptive routing, using a modeling approach similar to that for the hypercube. The analysis focuses on Duato's fully adaptive routing algorithm [12], which is widely accepted as the most general algorithm for achieving adaptivity in wormhole-routed networks while allowing for an efficient router implementation. The proposed model is general enough that it can be used for the hypercube and for other fully adaptive routing algorithms.

    Submicron Systems Architecture Project: Semiannual Technical Report

    No abstract available

    Submicron Systems Architecture Project: Semiannual Technical Report

    The Mosaic C is an experimental fine-grain multicomputer based on single-chip nodes. The Mosaic C chip includes 64KB of fast dynamic RAM, processor, packet interface, ROM for bootstrap and self-test, and a two-dimensional self-timed router. The chip architecture provides low-overhead and low-latency handling of message packets, and high memory and network bandwidth. Sixty-four Mosaic chips are packaged by tape-automated bonding (TAB) in an 8 x 8 array on circuit boards that can, in turn, be arrayed in two dimensions to build arbitrarily large machines. These 8 x 8 boards are now in prototype production under a subcontract with Hewlett-Packard. We are planning to construct a 16K-node Mosaic C system from 256 of these boards. The suite of Mosaic C hardware also includes host-interface boards and high-speed communication cables. The hardware developments and activities of the past eight months are described in section 2.1. The programming system that we are developing for the Mosaic C is based on the same message-passing, reactive-process, computational model that we have used with earlier multicomputers, but the model is implemented for the Mosaic in a way that supports fine-grain concurrency. A process executes only in response to receiving a message, and may in execution send messages, create new processes, and modify its persistent variables before it either exits or becomes dormant in preparation for receiving another message. These computations are expressed in an object-oriented programming notation, a derivative of C++ called C+-. The computational model and the C+- programming notation are described in section 2.2. The Mosaic C runtime system, which is written in C+-, provides automatic process placement and highly distributed management of system resources. The Mosaic C runtime system is described in section 2.3.

    Submicron Systems Architecture Project: Semiannual Technical Report

    No abstract available

    Contention and achieved performance in multicomputer wormhole routing networks


    The Effect Of Hot Spots On The Performance Of Mesh-Based Networks

    Direct network performance is affected by several design parameters, including the number of virtual channels, number of ports, routing algorithm, switching technique, deadlock handling technique, packet size, and buffer size. Another factor that affects network performance is the traffic pattern. In this thesis, we study the effect of hotspot traffic on system performance. Specifically, we study the effect of hotspot factor, hotspot number, and hotspot location on the performance of mesh-based networks. Simulations are run on two network topologies, the mesh and the torus. We pay more attention to meshes because they are widely used in commercial machines. Comparisons between oblivious wormhole switching and chaotic packet switching are reported. Overall, packet switching proved to be more efficient in terms of throughput than wormhole switching. In the case of uniform random traffic, it is shown that chaotic and oblivious routing are indistinguishable in performance. Networks with a low number of hotspots show better performance. As the number of hotspots increases, network latency tends to increase. It is shown that when the hotspot factor increases, packet switching performs better than wormhole switching. It is also shown that the location of hotspots affects network performance, particularly with the oblivious routers, since their achieved latencies proved to be more vulnerable to changes in the hotspot location. It is also shown that the smaller the network, the earlier network saturation occurs. Further, it is shown that the chaos router’s adaptivity is useful in this case. Finally, for tori, performance is not greatly affected by hotspot presence. This is mostly due to the symmetric nature of tori.
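The abstract above does not define its hotspot parameters, so the following is only a sketch of one common way such traffic is generated in simulations. It assumes that the hotspot factor is the probability that a message targets one of the hotspot nodes, and that all remaining traffic is uniform random. The function `pick_destination` and its parameters are hypothetical names.

```python
import random
from typing import Sequence, Tuple

Node = Tuple[int, int]


def pick_destination(
    src: Node,
    nodes: Sequence[Node],
    hotspots: Sequence[Node],
    hotspot_prob: float,
    rng: random.Random,
) -> Node:
    """Draw a destination for a message injected at `src`.

    With probability `hotspot_prob` (the assumed meaning of "hotspot factor")
    the destination is chosen uniformly among the hotspot nodes; otherwise it
    is chosen uniformly among all nodes. A node never sends to itself.
    """
    candidates = []
    if hotspots and rng.random() < hotspot_prob:
        candidates = [n for n in hotspots if n != src]
    if not candidates:
        # Uniform background traffic, also used when src is the only hotspot.
        candidates = [n for n in nodes if n != src]
    return rng.choice(candidates)
```

Varying the length of `hotspots`, the positions it contains, and `hotspot_prob` corresponds to the three quantities the abstract studies: hotspot number, hotspot location, and hotspot factor.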

    New Fault Tolerant Multicast Routing Techniques to Enhance Distributed-Memory Systems Performance

    Distributed-memory systems are key to achieving high-performance computing and are the most favorable architectures for advanced research problems. Mesh-connected multicomputers are among the most popular architectures and have been implemented in many distributed-memory systems. These systems must support communication operations efficiently to achieve good performance. The wormhole switching technique, in which a packet is divided into small flits, has been widely used in the design of distributed-memory systems. Multicast communication, in which one source node sends the same message to several destination nodes, has also been widely used in distributed-memory systems. Fault tolerance refers to the ability of the system to operate correctly in the presence of faults. Developing fault-tolerant multicast routing algorithms for 2D mesh networks is an important issue. This dissertation presents new fault-tolerant multicast routing algorithms for distributed-memory systems built on wormhole-routed 2D meshes. The algorithms are described for fault-tolerant routing in 2D mesh networks, but they can also be extended to other topologies. They combine a unicast-based multicast algorithm with tree-based multicast algorithms. They work effectively for the most commonly encountered faults in mesh networks: f-rings, f-chains, and concave fault regions. It is shown that the proposed routing algorithms are effective even in the presence of a large number of fault regions and of large fault regions. The algorithms are proved to be deadlock-free, and the problem of overlapping fault regions is solved. Four essential performance metrics in mesh networks are considered and calculated. The algorithms use limited-global-information-based multicasting, which is a compromise between the local-information-based and global-information-based approaches. Data mining is used to validate the results and to enlarge the sample.
The proposed new multicast routing techniques are used to enhance the performance of distributed-memory systems. Simulation results are presented to demonstrate the efficiency of the proposed algorithms.
