798 research outputs found

    Quarc: a novel network-on-chip architecture

    This paper introduces the Quarc NoC, a novel NoC architecture inspired by the Spidergon NoC. The Quarc scheme significantly outperforms the Spidergon NoC by balancing the traffic, which is the result of modifications applied to the topology and the routing elements. The proposed architecture is highly efficient in performing collective communication operations, including broadcast and multicast. We present the topology, routing discipline and switch architecture for the Quarc NoC and demonstrate its performance with results obtained from discrete event simulations.

    Quarc: a high-efficiency network on-chip architecture

    The novel Quarc NoC architecture, inspired by the Spidergon scheme, is introduced as a NoC architecture that is highly efficient in performing collective communication operations, including broadcast and multicast. The efficiency of the Quarc architecture is achieved by balancing the traffic, which is the result of modifications applied to the topology and the routing elements of the Spidergon NoC. This paper provides an ASIC implementation of both architectures using UMC's 0.13 µm CMOS technology and presents an analysis and comparison of the cost and performance of the Quarc and the Spidergon NoCs.

    CLEX: Yet Another Supercomputer Architecture?

    We propose the CLEX supercomputer topology and routing scheme. We prove that CLEX can utilize a constant fraction of the total bandwidth for point-to-point communication, at delays proportional to the sum of the number of intermediate hops and the maximum physical distance between any two nodes. Moreover, applying an asymmetric bandwidth assignment to the links, all-to-all communication can be realized $(1+o(1))$-optimally both with regard to bandwidth and delays. This is achieved at node degrees of $n^{\varepsilon}$, for an arbitrarily small constant $\varepsilon \in (0,1]$. In contrast, these results are impossible in any network featuring constant or polylogarithmic node degrees. Through simulation, we assess the benefits of an implementation of the proposed communication strategy. Our results indicate that, for a million processors, CLEX can increase bandwidth utilization and reduce average routing path length by factors of at least 10 and 5, respectively, in comparison to a torus network. Furthermore, the CLEX communication scheme features several other properties, such as deadlock-freedom, inherent fault-tolerance, and canonical partition into smaller subsystems.

    Nature-Inspired Interconnects for Self-Assembled Large-Scale Network-on-Chip Designs

    Future nano-scale electronics built up from an Avogadro number of components need efficient, highly scalable, and robust means of communication in order to be competitive with traditional silicon approaches. In recent years, the Networks-on-Chip (NoC) paradigm emerged as a promising solution to interconnect challenges in silicon-based electronics. Current NoC architectures are either highly regular or fully customized, both of which represent implausible assumptions for emerging bottom-up self-assembled molecular electronics that are generally assumed to have a high degree of irregularity and imperfection. Here, we pragmatically and experimentally investigate important design trade-offs and properties of an irregular, abstract, yet physically plausible 3D small-world interconnect fabric that is inspired by modern network-on-chip paradigms. We vary the framework's key parameters, such as the connectivity, the number of switch nodes, and the distribution of long- versus short-range connections, and measure the network's relevant communication characteristics. We further explore the robustness against link failures and the ability and efficiency to solve a simple toy problem, the synchronization task. The results confirm (1) that computation in irregular assemblies is a promising and disruptive computing paradigm for self-assembled nano-scale electronics and (2) that 3D small-world interconnect fabrics with a power-law decaying distribution of shortcut lengths are physically plausible and have major advantages over local 2D and 3D regular topologies.

    Energy efficient torus networks with on/off links

    Future exascale computing systems will require energy and performance efficient interconnection networks to respond to the high data movement demands of new applications, such as those coming from the big-data and artificial intelligence areas. The network structure plays a major role in the overall interconnect performance, which is why the torus is a common topology in the largest current supercomputers. There are several proposals to improve the energy efficiency of interconnection networks. However, few works combine both energy and performance, and sometimes they are treated as opposing issues. In this paper, we try to determine which torus network configuration offers the best performance/energy ratio when high-radix switches are used to build the interconnect system. The performance/energy evaluation has been performed by trace-driven simulation under realistic scenarios, where several mixes of scientific applications share a supercomputer system and are scheduled to be executed with the available resources at each moment.
    This work has been supported by the Spanish MINECO and European Commission (FEDER funds) under project TIN2015-66972-05-1-R and project TIN2015-66972-05-2-R. Francisco J. Andujar is also funded by the Spanish MINECO under a Juan de la Cierva grant FJCI-2015-26080.
    Andújar, FJ.; Coll, S.; Alonso Díaz, M.; Martínez-Rubio, J.; López Rodríguez, PJ.; Sánchez, JL.; Alfaro, FJ.... (2019). Energy efficient torus networks with on/off links. Journal of Parallel and Distributed Computing. 130:37-49. https://doi.org/10.1016/j.jpdc.2019.03.015

    Quarc: an architecture for efficient on-chip communication

    The exponential downscaling of the feature size has enforced a paradigm shift from computation-based design to communication-based design in system on chip development. Buses, the traditional communication architecture in systems on chip, are incapable of addressing the increasing bandwidth requirements of future large systems. Networks on chip have emerged as an interconnection architecture offering unique solutions to the technological and design issues related to communication in future systems on chip. The transition from buses as a shared medium to networks on chip as a segmented medium has given rise to new challenges in the system on chip realm. By leveraging the shared nature of the communication medium, buses have been highly efficient in delivering multicast communication. The segmented nature of networks, however, prevents multicast messages from being delivered as efficiently by networks on chip. Relying on extensive research on multicast communication in parallel computers, several network on chip architectures have offered mechanisms to perform the operation while conforming to the resource constraints of the network on chip paradigm. Multicast communication in the majority of these networks on chip is implemented by establishing a connection between the source and all multicast destinations before the message transmission commences. Establishing the connections incurs an overhead and is therefore undesirable, in particular for latency-sensitive services such as cache coherence. To address high performance multicast communication, this research presents Quarc, a novel network on chip architecture. The Quarc architecture targets an area-efficient, low power, high performance implementation. The thesis covers a detailed representation of the building blocks of the architecture, including topology, router and network interface. The cost and performance comparison of the Quarc architecture against other network on chip architectures reveals that the Quarc architecture is highly efficient. Moreover, the thesis introduces novel performance models of complex traffic patterns, including multicast and quality of service-aware communication.

    Constructing virtual 5-dimensional tori out of lower-dimensional network cards

    In the Top500 and Graph500 lists of recent years, some of the most powerful systems implement a torus topology to interconnect the millions of computing nodes they include. Some of these torus networks have five or six dimensions, which implies an additional difficulty as the node degree increases. In previous works, we proposed and evaluated the nD Twin (nDT) torus topology to virtually increase the dimensions a torus is able to implement. We showed that this new topology reduces the distances between nodes and therefore increases global network performance. In this work, we present how to build a 5DT torus network using a specific commercial 6-port network card (EXTOLL card) to interconnect those nodes. We show that, using the same number of cards, the performance of the 5DT torus network we are able to implement using our proposal is higher than the performance of the 3D torus network for the same number of compute nodes.
    Spanish MINECO; European Commission, Grant/Award Number: TIN2015-66972-C5-1-R and TIN2015-66972-C5-2-R; JCCM, Grant/Award Number: PEII-2014-028-P; Spanish MICINN, Grant/Award Number: FJCI-2015-26080
    Andújar-Muñoz, FJ.; Villar, JA.; Sanchez Garcia, JL.; Alfaro Cortes, FJ.; Duato Marín, JF.; Fröning, H. (2017). Constructing virtual 5-dimensional tori out of lower-dimensional network cards. Concurrency and Computation: Practice and Experience. 1-17. https://doi.org/10.1002/cpe.4361
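    The claim that more dimensions shorten distances can be illustrated with a minimal sketch for plain regular tori under minimal routing. This is not the nDT topology or the EXTOLL-based construction from the abstract; the function names and the 32,768-node example sizes are assumptions chosen only for illustration.

```python
def ring_avg_distance(k: int) -> float:
    """Average hop count from one node to a uniformly chosen node
    (itself included) on a bidirectional ring of k nodes."""
    return sum(min(d, k - d) for d in range(k)) / k


def torus_avg_distance(dims: list[int]) -> float:
    """Average minimal hop count in a regular torus.

    Each dimension is an independent ring, so the expected distance
    is the sum of the per-dimension ring averages.
    """
    return sum(ring_avg_distance(k) for k in dims)


# Two hypothetical tori with the same node count: 32^3 = 8^5 = 32768.
avg_3d = torus_avg_distance([32, 32, 32])
avg_5d = torus_avg_distance([8, 8, 8, 8, 8])
print(f"3D torus: {avg_3d} hops, 5D torus: {avg_5d} hops")
```

    For these sizes the regular 5D torus averages 10 hops against 24 for the 3D torus, at the cost of a node degree of 10 instead of 6. That degree growth is the difficulty the abstract refers to.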