1,010 research outputs found

    An efficient 2D router architecture for extending the performance of inhomogeneous 3D NoC-based multi-core architectures

    Get PDF
    To meet the performance and scalability demands of the fast-paced technological growth towards exascale and Big-Data processing with the performance bottleneck of conventional metal based interconnects, alternative interconnect fabrics such as inhomogeneous three dimensional integrated Network-on-Chip (3D NoC) has emanated as a cost-effective solution for emerging multi-core design. However, these interconnects trade-off optimized performance for cost by restricting the number of area and power hungry 3D routers. Consequently, in this paper, we propose a low-latency adaptive router with a low-complexity single-cycle bypassing mechanism to alleviate the performance degradation due to the slow 2D routers in inhomogeneous 3D NoCs. By combining the low-complexity bypassing technique with adaptive routing, the proposed router is able to balance the traffic in the network to reduce the average packet latency under various traffic loads. Simulation shows that, the proposed router can reduce the average packet delay by an average of 45% in 3D NoCs

    Quarc: an architecture for efficient on-chip communication

    Get PDF
    The exponential downscaling of the feature size has enforced a paradigm shift from computation-based design to communication-based design in system on chip development. Buses, the traditional communication architecture in systems on chip, are incapable of addressing the increasing bandwidth requirements of future large systems. Networks on chip have emerged as an interconnection architecture offering unique solutions to the technological and design issues related to communication in future systems on chip. The transition from buses as a shared medium to networks on chip as a segmented medium has given rise to new challenges in system on chip realm. By leveraging the shared nature of the communication medium, buses have been highly efficient in delivering multicast communication. The segmented nature of networks, however, inhibits the multicast messages to be delivered as efficiently by networks on chip. Relying on extensive research on multicast communication in parallel computers, several network on chip architectures have offered mechanisms to perform the operation, while conforming to resource constraints of the network on chip paradigm. Multicast communication in majority of these networks on chip is implemented by establishing a connection between source and all multicast destinations before the message transmission commences. Establishing the connections incurs an overhead and, therefore, is not desirable; in particular in latency sensitive services such as cache coherence. To address high performance multicast communication, this research presents Quarc, a novel network on chip architecture. The Quarc architecture targets an area-efficient, low power, high performance implementation. The thesis covers a detailed representation of the building blocks of the architecture, including topology, router and network interface. The cost and performance comparison of the Quarc architecture against other network on chip architectures reveals that the Quarc architecture is a highly efficient architecture. Moreover, the thesis introduces novel performance models of complex traffic patterns, including multicast and quality of service-aware communication

    Modeling high-performance wormhole NoCs for critical real-time embedded systems

    Get PDF
    Manycore chips are a promising computing platform to cope with the increasing performance needs of critical real-time embedded systems (CRTES). However, manycores adoption by CRTES industry requires understanding task's timing behavior when their requests use manycore's network-on-chip (NoC) to access hardware shared resources. This paper analyzes the contention in wormhole-based NoC (wNoC) designs - widely implemented in the high-performance domain - for which we introduce a new metric: worst-contention delay (WCD) that captures wNoC impact on worst-case execution time (WCET) in a tighter manner than the existing metric, worst-case traversal time (WCTT). Moreover, we provide an analytical model of the WCD that requests can suffer in a wNoC and we validate it against wNoC designs resembling those in the Tilera-Gx36 and the Intel-SCC 48-core processors. Building on top of our WCD analytical model, we analyze the impact on WCD that different design parameters such as the number of virtual channels, and we make a set of recommendations on what wNoC setups to use in the context of CRTES.Peer ReviewedPostprint (author's final draft

    Towards the practical design of performance-aware resilient wireless NoC architectures

    Get PDF
    Recently, an improved surface wave-enabled communication fabric has been proposed to solve the reliability issues of emerging hybrid wired-wireless Network-on-Chip (WiNoC) architectures. Thus, providing a promising solution to the performance and scalability demands of the fast-paced technological growth towards exascale and Big-Data processing on future System-on-Chip (SoC) design. However, WiNoCs trade-off optimized performance for cost by restricting the number of area and power hungry wireless nodes. Consequently, in this paper, we propose a low-latency adaptive router with a low-complexity single-cycle bypassing mechanism to alleviate the performance degradation due to the slow wired routers in such emerging hyhbrid NoCs. The proposed router is able to redistribute traffic in the network to alleviate average packet latency at both low and high traffic conditions. As a second contribution the paper presents an experimental evaluation of a practically implemented surface wave communication fabric. By reducing the latency between the wired nodes and wireless nodes the proposed router can improve performance efficiency in terms of average packet delay by an average of 50% in WiNoCs

    Extending the performance of hybrid NoCs beyond the limitations of network heterogeneity

    Get PDF
    To meet the performance and scalability demands of the fast-paced technological growth towards exascale and Big-Data processing with the performance bottleneck of conventional metal based interconnects (wireline), alternative interconnect fabrics such as inhomogeneous three-dimensional integrated Network-on-Chip (3D NoC) and hybrid wired-wireless Network-on-Chip (WiNoC) have emanated as a cost-effective solution for emerging System-on-Chip (SoC) design. However, these interconnects trade-off optimized performance for cost by restricting the number of area and power hungry 3D routers and wireless nodes. Moreover, the non-uniform distributed traffic in chip multiprocessor (CMP) demands an on-chip communication infrastructure which can avoid congestion under high traffic conditions while possessing minimal pipeline delay at low-load conditions. To this end, in this paper, we propose a low-latency adaptive router with a low-complexity single-cycle bypassing mechanism to alleviate the performance degradation due to the slow 2D routers in such emerging hybrid NoCs. The proposed router transmits a flit using dimension-ordered routing (DoR) in the bypass datapath at low-loads. When the output port required for intra-dimension bypassing is not available, the packet is routed adaptively to avoid congestion. The router also has a simplified virtual channel allocation (VA) scheme that yields a non-speculative low-latency pipeline. By combining the low-complexity bypassing technique with adaptive routing, the proposed router is able balance the traffic in hybrid NoCs to achieve low-latency communication under various traffic loads. Simulation shows that, the proposed router can reduce applications’ execution time by an average of 16.9% compared to low-latency routers such as SWIFT. By reducing the latency between 2D routers (or wired nodes) and 3D routers (or wireless nodes) the proposed router can improve performance efficiency in terms of average packet delay by an average of 45% (or 50%) in 3D NoCs (or WiNoCs)

    System architecture and hardware implementations for a reconfigurable MPLS router

    Get PDF
    With extremely wide bandwidth and good channel properties, optical fibers have brought fast and reliable data transmission to today’s data communications. However, to handle heavy traffic flowing through optical physical links, much faster processing speed is required or else congestion can take place at network nodes. Also, to provide people with voice, data and all categories of multimedia services, distinguishing between different data flows is a requirement. To address these router performance, Quality of Service /Class of Service and traffic engineering issues, Multi-Protocol Label Switching (MPLS) was proposed for IP-based Internetworks. In addition, routers flexible in hardware architecture in order to support ever-evolving protocols and services without causing big infrastructure modification or replacement are also desirable. Therefore, reconfigurable hardware implementation of MPLS was proposed in this project to obtain the overall fast processing speed at network nodes. The long-term goal of this project is to develop a reconfigurable MPLS router, which uniquely integrates the best features of operations being conducted in software and in run-time-reconfigurable hardware. The scope of this thesis includes system architecture and service algorithm considerations, Verilog coding and testing for an actual device. The hardware and software co-design technique was used to partition and schedule the protocol code for execution on both a general-purpose processor and stream-based hardware. A novel RPS scheme that is practically easy to build and can realize pipelined packet-by-packet data transfer at each output was proposed to take the place of the traditional crossbar switching. In RPS, packets with variable lengths can be switched intelligently without performing packet segmentation and reassembly. Primary theoretical analysis of queuing issues was discussed and an improved multiple queue service scheduling policy UD-WRR was proposed, which can reduce packet-waiting time without sacrificing the performance. In order to have the tests carried out appropriately, dedicated circuitry for the MPLS functional block to interface a specific MAC chip was implemented as well. The hardware designs for all functions were realized with a single Field Programmable Gate Array (FPGA) device in this project. The main result presented in this thesis was the MPLS function implementation realizing a major part of layer three routing at the reconfigurable hardware level, which advanced a great step towards the goal of building a router that is both fast and flexible
    • …
    corecore