263 research outputs found

    An Aggregate Scalable Scheme for Expanding the Crossbar Switch Network; Design and Performance Analysis

    Get PDF
    New computer network topology, called Penta-S, is simulated. This network is built of cross bar switch modules. Each module connects 32 computer nodes. Each node has two ports, one connects the node to the crossbar switch module and the other connects the node to a correspondent client node in another module through a shuffle link. The performance of this network is simulated under various network sizes, packet lengths and loads. The results are compared with those obtained from Macramé project for Clos multistage interconnection network and 2D-Grid network. The throughput of Penta-S falls between the throughput of Clos and the throughput of 2D-Grid networks. The maximum throughput of Penta-S was obtained at packet length of 128 bytes. Also the throughput grows linearly with the network size. On the opposite of Clos and 2D-Grid networks, the per-node throughput of Penta-S improves as the network size grows. The per-packet latency proved to be better than that of Clos network for large packet lengths and high loads. Also the packet latency proved to be nearly constant against various loads. The cost-efficiency of Penta-S proved to be better than those of 2D-Grid and Clos networks for large number of nodes (>200 nodes in the case of 2D-Grid and >350 nodes in the case of Clos).On the opposite of other networks, the cost-efficiency of Penta-S grows as its size grows. So this topology suits large networks and high traffic loads

    Symmetric rearrangeable networks and algorithms

    Get PDF
    A class of symmetric rearrangeable nonblocking networks has been considered in this thesis. A particular focus of this thesis is on Benes networks built with 2 x 2 switching elements. Symmetric rearrangeable networks built with larger switching elements have also being considered. New applications of these networks are found in the areas of System on Chip (SoC) and Network on Chip (NoC). Deterministic routing algorithms used in NoC applications suffer low scalability and slow execution time. On the other hand, faster algorithms are blocking and thus limit throughput. This will be an acceptable trade-off for many applications where achieving ”wire speed” on the on-chip network would require extensive optimisation of the attached devices. In this thesis I designed an algorithm that has much lower blocking probabilities than other suboptimal algorithms but a much faster execution time than deterministic routing algorithms. The suboptimal method uses the looping algorithm in its outermost stages and then in the two distinct subnetworks deeper in the switch uses a fast but suboptimal path search method to find available paths. The worst case time complexity of this new routing method is O(NlogN) using a single processor, which matches the best known results reported in the literature. Disruption of the ongoing communications in this class of networks during rearrangements is an open issue. In this thesis I explored a modification of the topology of these networks which gives rise to what is termed as repackable networks. A repackable topology allows rearrangements of paths without intermittently losing connectivity by breaking the existing communication paths momentarily. The repackable network structure proposed in this thesis is efficient in its use of hardware when compared to other proposals in the literature. As most of the deterministic algorithms designed for Benes networks implement a permutation of all inputs to find the routing tags for the requested inputoutput pairs, I proposed a new algorithm that can work for partial permutations. If the network load is defined as ρ, the mean number of active inputs in a partial permutation is, m = ρN, where N is the network size. This new method is based on mapping the network stages into a set of sub-matrices and then determines the routing tags for each pair of requests by populating the cells of the sub-matrices without creating a blocking state. Overall the serial time complexity of this method is O(NlogN) and O(mlogN) where all N inputs are active and with m < N active inputs respectively. With minor modification to the serial algorithm this method can be made to work in the parallel domain. The time complexity of this routing algorithm in a parallel machine with N completely connected processors is O(log^2 N). With m active requests the time complexity goes down to (logmlogN), which is better than the O(log^2 m + logN), reported in the literature for 2^0.5((log^2 -4logN)^0.5-logN)<= ρ <= 1. I also designed multistage symmetric rearrangeable networks using larger switching elements and implement a new routing algorithm for these classes of networks. The network topology and routing algorithms presented in this thesis should allow large scale networks of modest cost, with low setup times and moderate blocking rates, to be constructed. Such switching networks will be required to meet the bandwidth requirements of future communication networks

    Performance Study of Multilayered Multistage Interconnection Networks under Hotspot Traffic Conditions

    Get PDF
    The performance of Multistage Interconnection Networks (MINs) under hotspot traffic, where some percentage of the traffic is targeted at single nodes, which are also called hot spots, is of crucial interest. The prioritizing of packets has already been proposed at previous works as alleviation to the tree saturation problem, leading to a scheme that natively supports 2-class priority traffic. In order to prevent hotspot traffic from degrading uniform traffic we expand previous studies by introducing multilayer Switching Elements (SEs) at last stages in an attempt to balance between MIN performance and cost. In this paper the performance evaluation of dual-priority, double-buffered, multilayer MINs under single hotspot setups is presented and analyzed using simulation experiments. The findings of this paper can be used by MIN designers to optimally configure their networks

    Reading list of selected PASM-related publications

    Get PDF
    Prepared for a chapter to be published in the forthcoming Encyclopedia of Parallel Computing by Springer Publishing Company. The Encyclopedia will contain a broad coverage of the field and will include entries on machine organization, programming, algorithms, and applications. The broad coverage, together with extensive pointers to the literature for in-depth study, is expected to make the Encyclopedia a useful reference tool in parallel computing

    Quarc: a novel network-on-chip architecture

    Get PDF
    This paper introduces the Quarc NoC, a novel NoC architecture inspired by the Spidergon NoC. The Quarc scheme significantly outperforms the Spidergon NoC through balancing the traffic which is the result of the modifications applied to the topology and the routing elements.The proposed architecture is highly efficient in performing collective communication operations including broadcast and multicast. We present the topology, routing discipline and switch architecture for the Quarc NoC and demonstrate the performance with the results obtained from discrete event simulations

    Quarc: an architecture for efficient on-chip communication

    Get PDF
    The exponential downscaling of the feature size has enforced a paradigm shift from computation-based design to communication-based design in system on chip development. Buses, the traditional communication architecture in systems on chip, are incapable of addressing the increasing bandwidth requirements of future large systems. Networks on chip have emerged as an interconnection architecture offering unique solutions to the technological and design issues related to communication in future systems on chip. The transition from buses as a shared medium to networks on chip as a segmented medium has given rise to new challenges in system on chip realm. By leveraging the shared nature of the communication medium, buses have been highly efficient in delivering multicast communication. The segmented nature of networks, however, inhibits the multicast messages to be delivered as efficiently by networks on chip. Relying on extensive research on multicast communication in parallel computers, several network on chip architectures have offered mechanisms to perform the operation, while conforming to resource constraints of the network on chip paradigm. Multicast communication in majority of these networks on chip is implemented by establishing a connection between source and all multicast destinations before the message transmission commences. Establishing the connections incurs an overhead and, therefore, is not desirable; in particular in latency sensitive services such as cache coherence. To address high performance multicast communication, this research presents Quarc, a novel network on chip architecture. The Quarc architecture targets an area-efficient, low power, high performance implementation. The thesis covers a detailed representation of the building blocks of the architecture, including topology, router and network interface. The cost and performance comparison of the Quarc architecture against other network on chip architectures reveals that the Quarc architecture is a highly efficient architecture. Moreover, the thesis introduces novel performance models of complex traffic patterns, including multicast and quality of service-aware communication

    The Effect Of Hot Spots On The Performance Of Mesh--Based Networks

    Get PDF
    Direct network performance is affected by different design parameters which include number of virtual channels, number of ports, routing algorithm, switching technique, deadlock handling technique, packet size, and buffer size. Another factor that affects network performance is the traffic pattern. In this thesis, we study the effect of hotspot traffic on system performance. Specifically, we study the effect of hotspot factor, hotspot number, and hot spot location on the performance of mesh-based networks. Simulations are run on two network topologies, both the mesh and torus. We pay more attention to meshes because they are widely used in commercial machines. Comparisons between oblivious wormhole switching and chaotic packet switching are reported. Overall packet switching proved to be more efficient in terms of throughput when compared to wormhole switching. In the case of uniform random traffic, it is shown that the differences between chaotic and oblivious routing are indistinguishable. Networks with low number of hotspots show better performance. As the number of hotspots increases network latency tends to increase. It is shown that when the hotspot factor increases, performance of packet switching is better than that of wormhole switching. It is also shown that the location of hotspots affects network performance particularly with the oblivious routers since their achieved latencies proved to be more vulnerable to changes in the hotspot location. It is also shown that the smaller the size of the network the earlier network saturation occurs. Further, it is shown that the chaos router’s adaptivity is useful in this case. Finally, for tori, performance is not greatly affected by hotspot presence. This is mostly due to the symmetric nature of tori
    corecore