159 research outputs found

    Analytical modelling of hot-spot traffic in deterministically-routed k-ary n-cubes

    Get PDF
    Many research studies have proposed analytical models to evaluate the performance of k-ary n-cubes with deterministic wormhole routing. Such models however have so far been confined to uniform traffic distributions. There has been hardly any model proposed that deal with non-uniform traffic distributions that could arise due to, for instance, the presence of hot-spots in the network. This paper proposes the first analytical model to predict message latency in k-ary n-cubes with deterministic routing in the presence of hot-spots. The validity of the model is demonstrated by comparing analytical results with those obtained through extensive simulation experiments

    A Dag Based Wormhole Routing Strategy

    Get PDF
    The wormhole routing (WR) technique is replacing the hitherto popular storeand- forward routing in message passing multicomputers. This is because the latter has speed and node size constraints. The wormhole routing is, on the other hand, susceptible to deadlock. A few WR schemes suggested recently in the literature, concentrate on avoiding deadlock. This thesis presents a Directed Acyclic Graph (DAG) based WR technique. At low traffic levels the proposed method follows a minimal path. But the routing is adaptive at higher traffic levels. We prove that the algorithm is deadlock-free. This method is compared for its performance with a deterministic algorithm which is a de facto standard. We also compare its implementation costs with other adaptive routing algorithms and the relative merits and demerits are highlighted in the text

    The Effect Of Hot Spots On The Performance Of Mesh--Based Networks

    Get PDF
    Direct network performance is affected by different design parameters which include number of virtual channels, number of ports, routing algorithm, switching technique, deadlock handling technique, packet size, and buffer size. Another factor that affects network performance is the traffic pattern. In this thesis, we study the effect of hotspot traffic on system performance. Specifically, we study the effect of hotspot factor, hotspot number, and hot spot location on the performance of mesh-based networks. Simulations are run on two network topologies, both the mesh and torus. We pay more attention to meshes because they are widely used in commercial machines. Comparisons between oblivious wormhole switching and chaotic packet switching are reported. Overall packet switching proved to be more efficient in terms of throughput when compared to wormhole switching. In the case of uniform random traffic, it is shown that the differences between chaotic and oblivious routing are indistinguishable. Networks with low number of hotspots show better performance. As the number of hotspots increases network latency tends to increase. It is shown that when the hotspot factor increases, performance of packet switching is better than that of wormhole switching. It is also shown that the location of hotspots affects network performance particularly with the oblivious routers since their achieved latencies proved to be more vulnerable to changes in the hotspot location. It is also shown that the smaller the size of the network the earlier network saturation occurs. Further, it is shown that the chaos router’s adaptivity is useful in this case. Finally, for tori, performance is not greatly affected by hotspot presence. This is mostly due to the symmetric nature of tori

    A method of computation for worst-case delay analysis on SpaceWire networks

    Get PDF
    SpaceWire is a standard for on-board satellite networks chosen by the ESA as the basis for future data-handling architectures. However, network designers need tools to ensure that the network is able to deliver critical messages on time. Current research only seek to determine probabilistic results for end-to-end delays on Wormhole networks like SpaceWire. This does not provide sufficient guarantee for critical traffic. Thus, in this paper, we propose a method to compute an upper-bound on the worst-case end-to-end delay of a packet in a SpaceWire network

    New Fault Tolerant Multicast Routing Techniques to Enhance Distributed-Memory Systems Performance

    Get PDF
    Distributed-memory systems are a key to achieve high performance computing and the most favorable architectures used in advanced research problems. Mesh connected multicomputer are one of the most popular architectures that have been implemented in many distributed-memory systems. These systems must support communication operations efficiently to achieve good performance. The wormhole switching technique has been widely used in design of distributed-memory systems in which the packet is divided into small flits. Also, the multicast communication has been widely used in distributed-memory systems which is one source node sends the same message to several destination nodes. Fault tolerance refers to the ability of the system to operate correctly in the presence of faults. Development of fault tolerant multicast routing algorithms in 2D mesh networks is an important issue. This dissertation presents, new fault tolerant multicast routing algorithms for distributed-memory systems performance using wormhole routed 2D mesh. These algorithms are described for fault tolerant routing in 2D mesh networks, but it can also be extended to other topologies. These algorithms are a combination of a unicast-based multicast algorithm and tree-based multicast algorithms. These algorithms works effectively for the most commonly encountered faults in mesh networks, f-rings, f-chains and concave fault regions. It is shown that the proposed routing algorithms are effective even in the presence of a large number of fault regions and large size of fault region. These algorithms are proved to be deadlock-free. Also, the problem of fault regions overlap is solved. Four essential performance metrics in mesh networks will be considered and calculated; also these algorithms are a limited-global-information-based multicasting which is a compromise of local-information-based approach and global-information-based approach. Data mining is used to validate the results and to enlarge the sample. The proposed new multicast routing techniques are used to enhance the performance of distributed-memory systems. Simulation results are presented to demonstrate the efficiency of the proposed algorithms

    Performance analysis of wormhole switched interconnection networks with virtual channels and finite buffers

    Get PDF
    An efficient interconnection network that provides high bandwidth and low latency interprocessor communication is critical to harness fully the computational power of large scale multicomputer. K-ary n-cube networks have been widely adopted in contemporary multicomputers due to their desirable properties. As such, the present study focuses on a performance analysis of K-ary n-cubes employing wormhole switching, virtual channels, and adaptive routing. The objective of this dissertation is twofold: to examine the performance of these networks, and to compare the performance merits of various topologies under different working conditions, by means of analytical modelling. Most existing analytical models reported in the literature have used a method originally proposed by Dally to capture the effects of virtual channels on network performance. This method is based on a Markov chain and it has been shown that its prediction accuracy degrades as traffic increases. Moreover, these studies have also constrained the buffer capacity to a single flit per channel, a simplifying assumption that has often been invoked to ease the derivation of the analytical models. Motivated by these observations, the first part of this research proposes a new method for modelling virtual channels, based on an M/G/1 queue. Owing to the generality of this method. Daily's method is shown to be a special case when the message service time is exponentially distributed. The second part of this research uses theoretical results of queuing systems to relax the single-flit buffer assumption. New analytical models are then proposed to capture the effects of deploying arbitrary size buffers on the performance of deterministic and adaptive routing algorithms. Simulation experiments reveal that results from the proposed analytical models are in close agreement with those obtained through simulation. Building on these new analytical models, the third part of this research compares the relative performance merits of K-ary n-cubes under different operating conditions, in the presence of finite size buffers and multiple virtual channels. Namely, the analysis first revisits the relative performance merits of the well-known 2D torus, 3D torus and hypercube under different implementation constraints. The analysis has then been extended to investigate the performance impact of arranging the total buffer space, allocated to a physical channel, into multiple virtual channels. Finally, the performance of adaptive routing has been compared to that of deterministic routing. While previous similar studies have only taken account of channel and router costs, the present analysis incorporates different intra-router delays, as well, and thus generates more realistic results. In fact, the results of this research differ notably from those reported in previous studies, illustrating the sensitivity of such studies to the level of detail, degree of accuracy and the realism of the assumptions adopted

    Performance evaluation of distributed crossbar switch hypermesh

    Get PDF
    The interconnection network is one of the most crucial components in any multicomputer as it greatly influences the overall system performance. Several recent studies have suggested that hypergraph networks, such as the Distributed Crossbar Switch Hypermesh (DCSH), exhibit superior topological and performance characteristics over many traditional graph networks, e.g. k-ary n-cubes. Previous work on the DCSH has focused on issues related to implementation and performance comparisons with existing networks. These comparisons have so far been confined to deterministic routing and unicast (one-to-one) communication. Using analytical models validated through simulation experiments, this thesis extends that analysis to include adaptive routing and broadcast communication. The study concentrates on wormhole switching, which has been widely adopted in practical multicomputers, thanks to its low buffering requirement and the reduced dependence of latency on distance under low traffic. Adaptive routing has recently been proposed as a means of improving network performance, but while the comparative evaluation of adaptive and deterministic routing has been widely reported in the literature, the focus has been on graph networks. The first part of this thesis deals with adaptive routing, developing an analytical model to measure latency in the DCSH, and which is used throughout the rest of the work for performance comparisons. Also, an investigation of different routing algorithms in this network is presented. Conventional k-ary n-cubes have been the underlying topology of contemporary multicomputers, but it is only recently that adaptive routing has been incorporated into such systems. The thesis studies the relative performance merits of the DCSH and k-ary n-cubes under adaptive routing strategy. The analysis takes into consideration real-world factors, such as router complexity and bandwidth constraints imposed by implementation technology. However, in any network, the routing of unicast messages is not the only factor in traffic control. In many situations (for example, parallel iterative algorithms, memory update and invalidation procedures in shared memory systems, global notification of network errors), there is a significant requirement for broadcast traffic. The DCSH, by virtue of its use of hypergraph links, can implement broadcast operations particularly efficiently. The second part of the thesis examines how the DCSH and k-ary n-cube performance is affected by the presence of a broadcast traffic component. In general, these studies demonstrate that because of their relatively high diameter, k-ary n-cubes perform poorly when message lengths are short. This is consistent with earlier more simplistic analyses which led to the proposal for the express-cube, an enhancement of the basic k-ary n-cube structure, which provides additional express channels, allowing messages to bypass groups of nodes along their paths. The final part of the thesis investigates whether this "partial bypassing" can compete with the "total bypassing" capability provided inherently by the DCSH topology

    Design and implementation of NoC routers and their application to Prdt-based NoC\u27s

    Full text link
    With a communication-centric design style, Networks-on-Chips (NoCs) emerges as a new paradigm of Systems-on-Chips (SoCs) to overcome the limitations of bus-based communication infrastructure. An important problem in the design of NoCs is the router design, which has great impact on the cost and performance of a NoC system. This thesis is focused on the design and implementation of an optimized parameterized router which can be applied in mesh/torus-based and Perfect Recursive Diagonal Torus (PRDT)-based NoCs; In specific, the router design includes the design and implementation of two routing algorithms (vector routing and circular coded vector routing), the wormhole switching scheme, the scheduling scheme, buffering strategy, and flow control scheme. Correspondingly, the following components are designed and implemented: input controller, output controller, crossbar switch, and scheduler. Verilog HDL codes are generated and synthesized on ASIC platforms. Most components are designed in parameterized way. Performance evaluation of each component of the router in terms of timing, area, and power consumption is conducted. The efficiency of the two routing algorithms and tradeoff between computational time (tsetup) and area are analyzed; To reduce the area cost of the router design, the two major components, the crossbar switch and the scheduler, are optimized. Particularly, for crossbar switch, a comparative study of two crossbar designs is performed with the aid of Magic Layout editor, Synopsys CosmosSE and Awaves; Based on the router design, the PRDT network composed of 4x4 routers is designed and synthesized on ASIC platforms
    • …
    corecore