250 research outputs found

    Adaptive turn-prohibition routing algorithm for the networks of workstations

    Get PDF
    Deadlock occurrence is a critical problem for any computer network. Various solutions have been proposed over last two decades to solve problem of deadlocks in networks using different routing schemes, like up/down routing algorithm used in Myrinet switches. However, most of existing approaches for deadlock-free routing either try to eliminate any possibility of deadlock occurrence, which can result in putting extra restrictions on the routing in the networks or put no restrictions on routing, which leads to other approach namely deadlock recovery. In this thesis emphasis is on developing hybrid approach for routing in wormhole networks, wherein some prohibition is imposed on routing along with some kind of deadlock recovery. This adaptive approach allows changing the amount of routing restrictions depending on network traffic, thus providing a flexible method to achieve better network performance compared to the existing techniques. The main idea of the proposed method consists in the sequential selections of some turns, which are prohibited to be selected during routing. After each additional turn is added, the probability of deadlock occurrence decreases gradually. Cost formula is proposed to estimate cost of implementing both strategies in a network which is basis of proposed adaptive model

    An Efficient Routing Implementation for Irregular Networks

    Get PDF
    with the recent advancements in multi-core era workstation clusters have emerged as a cost-effective approach to build a network of workstations NOWs NOWs connect the small groups of processors to a network of switching elements that form irregular topologies Designing an efficient routing and a deadlock avoidance algorithm for irregular networks is quite complicated in terms of latency and area of the routing tables thus impractical for scalability of On Chip Networks Many deadlock free routing mechanisms have been proposed for regular networks but they cannot be employed in irregular networks In this paper a new methodology has been proposed for efficient routing scheme called LBDR-UD which save the average 64 59 routing tables in the switch for irregular networks as compare to up down routing The Basic concept of routing scheme is combination of up down and Logic Based Distributed Routing By simulation it has been shown that the LBDR-UD is deadlock free and adaptive to all dynamic network traffic condition

    APEnet+: a 3D toroidal network enabling Petaflops scale Lattice QCD simulations on commodity clusters

    Full text link
    Many scientific computations need multi-node parallelism for matching up both space (memory) and time (speed) ever-increasing requirements. The use of GPUs as accelerators introduces yet another level of complexity for the programmer and may potentially result in large overheads due to the complex memory hierarchy. Additionally, top-notch problems may easily employ more than a Petaflops of sustained computing power, requiring thousands of GPUs orchestrated with some parallel programming model. Here we describe APEnet+, the new generation of our interconnect, which scales up to tens of thousands of nodes with linear cost, thus improving the price/performance ratio on large clusters. The project target is the development of the Apelink+ host adapter featuring a low latency, high bandwidth direct network, state-of-the-art wire speeds on the links and a PCIe X8 gen2 host interface. It features hardware support for the RDMA programming model and experimental acceleration of GPU networking. A Linux kernel driver, a set of low-level RDMA APIs and an OpenMPI library driver are available, allowing for painless porting of standard applications. Finally, we give an insight of future work and intended developments

    Quarc: an architecture for efficient on-chip communication

    Get PDF
    The exponential downscaling of the feature size has enforced a paradigm shift from computation-based design to communication-based design in system on chip development. Buses, the traditional communication architecture in systems on chip, are incapable of addressing the increasing bandwidth requirements of future large systems. Networks on chip have emerged as an interconnection architecture offering unique solutions to the technological and design issues related to communication in future systems on chip. The transition from buses as a shared medium to networks on chip as a segmented medium has given rise to new challenges in system on chip realm. By leveraging the shared nature of the communication medium, buses have been highly efficient in delivering multicast communication. The segmented nature of networks, however, inhibits the multicast messages to be delivered as efficiently by networks on chip. Relying on extensive research on multicast communication in parallel computers, several network on chip architectures have offered mechanisms to perform the operation, while conforming to resource constraints of the network on chip paradigm. Multicast communication in majority of these networks on chip is implemented by establishing a connection between source and all multicast destinations before the message transmission commences. Establishing the connections incurs an overhead and, therefore, is not desirable; in particular in latency sensitive services such as cache coherence. To address high performance multicast communication, this research presents Quarc, a novel network on chip architecture. The Quarc architecture targets an area-efficient, low power, high performance implementation. The thesis covers a detailed representation of the building blocks of the architecture, including topology, router and network interface. The cost and performance comparison of the Quarc architecture against other network on chip architectures reveals that the Quarc architecture is a highly efficient architecture. Moreover, the thesis introduces novel performance models of complex traffic patterns, including multicast and quality of service-aware communication

    Tree-structured small-world connected wireless network-on-chip with adaptive routing

    Get PDF
    Traditional Network-on-Chip (NoC) systems comprised of many cores suffer from debilitating bottlenecks of latency and significant power dissipation due to the overhead inherent in multi-hop communication. In addition, these systems remain vulnerable to malicious circuitry incorporated into the design by untrustworthy vendors in a world where complex multi-stage design and manufacturing processes require the collective specialized services of a variety of contractors. This thesis proposes a novel small-world tree-based network-on-chip (SWTNoC) structure designed for high throughput, acceptable energy consumption, and resiliency to attacks and node failures resulting from the insertion of hardware Trojans. This tree-based implementation was devised as a means of reducing average network hop count, providing a large degree of local connectivity, and effective long-range connectivity by means of a novel wireless link approach based on carbon nanotube (CNT) antenna design. Network resiliency is achieved by means of a devised adaptive routing algorithm implemented to work with TRAIN (Tree-based Routing Architecture for Irregular Networks). Comparisons are drawn with benchmark architectures with optimized wireless link placement by means of the simulated annealing (SA) metaheuristic. Experimental results demonstrate a 21% throughput improvement and a 23% reduction in dissipated energy per packet over the closest competing architecture. Similar trends are observed at increasing system sizes. In addition, the SWTNoC maintains this throughput and energy advantage in the presence of a fault introduced into the system. By designing a hierarchical topology and designating a higher level of importance on a subset of the nodes, much higher network throughput can be attained while simultaneously guaranteeing deadlock freedom as well as a high degree of resiliency and fault-tolerance

    The performance evaluation of a 3D torus network using partial link-sharing method in NoC router buffer

    Get PDF
    The high performance network-on-chip (NoC) router using minimal hardware resources to minimize the layout area is very essential for NoC design. In this paper, we have proposed a memory sharing method of a wormhole routed NoC architecture to alleviate the area overhead of a NoC router. In the proposed method, a memory is shared by multiple physical links by using a multi-port memory. In this paper, we have proposed a partial link-sharing method and evaluated the communication performance using the proposed method. It is revealed that the resulted communication performance by the proposed methods is higher than that of the conventional method, and the progress ratio of the 3D-torus network is higher than that of 2D-torus network. It is shown that the improvement of communication performance using partial link sharing method is achieved with slightly increase of hardware cost. Copyright © 2017 The Institute of Electronics, Information and Communication Engineers

    Efficient routing mechanisms for Dragonfly networks

    Get PDF
    High-radix hierarchical networks are cost-effective topologies for large scale computers. In such networks, routers are organized in super nodes, with local and global interconnections. These networks, known as Dragonflies, outperform traditional topologies such as multi-trees or tori, in cost and scalability. However, depending on the traffic pattern, network congestion can lead to degraded performance. Misrouting (non-minimal routing) can be employed to avoid saturated global or local links. Nevertheless, with the current deadlock avoidance mechanisms used for these networks, supporting misrouting implies routers with a larger number of virtual channels. This exacerbates the buffer memory requirements that constitute one of the main constraints in high-radix switches. In this paper we introduce two novel deadlock-free routing mechanisms for Dragonfly networks that support on-the-fly adaptive routing. Using these schemes both global and local misrouting are allowed employing the same number of virtual channels as in previous proposals. Opportunistic Local Misrouting obtains the best performance by providing the highest routing freedom, and relying on a deadlock-free escape path to the destination for every packet. However, it requires Virtual Cut-Through flow-control. By contrast, Restricted Local Misrouting prevents the appearance of cycles thanks to a restriction of the possible routes within super nodes. This makes this mechanism suitable for both Virtual Cut-Through and Wormhole networks. Evaluations show that the proposed deadlock-free routing mechanisms prevent the most frequent pathological issues of Dragonfly networks. As a result, they provide higher performance than previous schemes, while requiring the same area devoted to router buffers.This work has been supported by the Spanish Ministry of Science under contracts TIN2010-21291-C02-02, TIN2012-34557, and by the European HiPEAC Network of Excellence. The research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP/2007-2013) / ERC Grant Agreement n. ERC-2012-Adg-321253-RoMoL. M. Garc´ıa and M. Odriozola participated in this research work while they were affiliated with the University of Cantabria.Peer ReviewedPostprint (author's final draft
    corecore