
    The performance evaluation of a 3D torus network using partial link-sharing method in NoC router buffer

    A high-performance network-on-chip (NoC) router that uses minimal hardware resources to minimize layout area is essential for NoC design. In this paper, we propose a memory-sharing method for a wormhole-routed NoC architecture to reduce the area overhead of the NoC router. In the proposed method, a memory is shared by multiple physical links by using a multi-port memory. We also propose a partial link-sharing method and evaluate the communication performance it achieves. The results show that the communication performance of the proposed methods is higher than that of the conventional method, and that the progress ratio of the 3D-torus network is higher than that of the 2D-torus network. The improvement in communication performance from the partial link-sharing method is achieved with only a slight increase in hardware cost. Copyright © 2017 The Institute of Electronics, Information and Communication Engineers

    An Energy and Performance Exploration of Network-on-Chip Architectures

    In this paper, we explore the designs of a circuit-switched router, a wormhole router, a quality-of-service (QoS) supporting virtual channel router and a speculative virtual channel router and accurately evaluate the energy-performance tradeoffs they offer. Power results from the designs placed and routed in a 90-nm CMOS process show that all the architectures dissipate significant idle state power. The additional energy required to route a packet through the router is then shown to be dominated by the data path. This leads to the key result that, if this trend continues, the use of more elaborate control can be justified and will not be immediately limited by the energy budget. A performance analysis also shows that dynamic resource allocation leads to the lowest network latencies, while static allocation may be used to meet QoS goals. Combining the power and performance figures then allows an energy-latency product to be calculated to judge the efficiency of each of the networks. The speculative virtual channel router was shown to have a very similar efficiency to the wormhole router, while providing a better performance, supporting its use for general purpose designs. Finally, area metrics are also presented to allow a comparison of implementation costs

    Least Upper Delay Bound for VBR Flows in Networks-on-Chip with Virtual Channels

    Real-time applications such as multimedia and gaming require stringent performance guarantees, usually enforced by a tight upper bound on the maximum end-to-end delay. For FIFO-multiplexed on-chip packet-switched networks, we consider worst-case delay bounds for Variable Bit-Rate (VBR) flows with aggregate scheduling, which schedules multiple flows as an aggregate flow. VBR flows are characterized by a maximum transfer size, peak rate, burstiness, and average sustainable rate. Based on network calculus, we present and prove theorems to derive per-flow end-to-end Equivalent Service Curves (ESCs), which are in turn used for computing Least Upper Delay Bounds (LUDBs) of individual flows. In a realistic case study we find that the end-to-end delay bound is up to 46.9% more accurate than the case without considering the traffic peak behavior. Results show similar improvements for synthetic traffic patterns. The proposed methodology is implemented in C++ and has low run-time complexity, enabling quick evaluation for large and complex SoCs.
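
As a rough illustration of why modeling the peak rate tightens a delay bound, the sketch below applies the textbook single-node network-calculus result. It is not the ESC derivation from the paper. It assumes a VBR arrival curve alpha(t) = min(L + p*t, sigma + rho*t) and a rate-latency service curve beta(t) = R * max(t - T, 0); the function names and the example numbers are hypothetical.

```python
def delay_bound_vbr(L, p, sigma, rho, R, T):
    """Delay bound (horizontal deviation) for a VBR flow at one node.

    Arrival curve: alpha(t) = min(L + p*t, sigma + rho*t)
      L     = maximum transfer size, p   = peak rate,
      sigma = burstiness,            rho = average sustainable rate.
    Service curve: beta(t) = R * max(t - T, 0)  (rate R, latency T).
    These curve shapes are assumptions of this sketch.
    """
    if rho > R:
        raise ValueError("delay is unbounded: sustainable rate exceeds service rate")
    if sigma < L or p < rho:
        raise ValueError("expected sigma >= L and p >= rho")
    if p <= R:
        # The server keeps up with the peak rate; only the first transfer queues.
        return T + L / R
    # Time at which the peak-rate segment meets the sustained-rate segment.
    theta = (sigma - L) / (p - rho)
    return T + (L + theta * (p - R)) / R


def delay_bound_token_bucket(sigma, rho, R, T):
    """Same bound when the peak rate is ignored: alpha(t) = sigma + rho*t."""
    if rho > R:
        raise ValueError("delay is unbounded: sustainable rate exceeds service rate")
    return T + sigma / R


if __name__ == "__main__":
    # Hypothetical flow and server parameters (flits and cycles).
    with_peak = delay_bound_vbr(L=2, p=1.0, sigma=8, rho=0.25, R=0.5, T=4)
    without_peak = delay_bound_token_bucket(sigma=8, rho=0.25, R=0.5, T=4)
    print(with_peak, without_peak)
```

With these made-up parameters the peak-aware bound is smaller than the token-bucket-only bound, which is the qualitative effect the abstract reports; the 46.9% figure comes from the paper's own case study and is not reproduced here.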

    Buffer-aware Worst Case Timing Analysis of Wormhole Network On Chip

    A buffer-aware worst-case timing analysis of wormhole NoCs is proposed in this paper to integrate the impact of buffer size on the different dependency relationships between flows, i.e. direct and indirect blocking flows, and consequently on timing performance. First, more accurate definitions of the sets of direct and indirect blocking flows are introduced to take the impact of buffer size into account. Then, the modeling and worst-case timing analysis of wormhole NoCs are detailed, based on the Network Calculus formalism and the newly defined sets of blocking flows. The approach is illustrated on a realistic NoC case study to show the trade-off between latency and buffer size. A comparative analysis of the proposed buffer-aware timing analysis against conventional approaches shows noticeable improvements in terms of maximum latency.

    Priority Based Switch Allocator in Adaptive Physical Channel Regulator for On Chip Interconnects

    Chip multiprocessors (CMPs), in which a number of relatively simple cores are integrated on a single die, are now a popular design paradigm for microprocessors due to their power, performance and complexity advantages. The on-chip interconnection network (NoC) is an excellent architectural paradigm that offers a stable and generalized communication platform for large-scale chip multiprocessors. The existing APCR model has three regulation schemes designed for the switch allocation stage of the NoC router pipeline: monopolizing, fair-sharing and channel-stealing. Its aim is to fairly allocate physical bandwidth at the granularity of flits while breaking the conventional assumption that the flit size is the same as the phit size. The channel-stealing scheme was implemented using the existing round-robin scheduler, a well-known scheduling algorithm for providing fairness, which is not an optimal solution. In this thesis, we extend the APCR model and propose three efficient scheduling policies for the channel-stealing scheme in order to provide better quality of service (QoS). Our work can be divided into three parts. In the first part, we implement a ratio-based scheduling technique in which we keep track of the average number of flits sent from each input in every cycle. It not only provides fairness among virtual channels (VCs), but also increases the saturation throughput of the network. In the second part, we implement an age-based scheduling technique in which we prioritize VCs based on the age of the requesting flits. The age of each request is calculated as the difference between the time of injection and the current simulation time. The age-based scheduler minimizes packet latency. In the last part, we implement a static-priority-based scheduler. In this case, we assign random priorities to the packets at the time of their injection into the network. High-priority packets can be forwarded to any of the VCs, whereas low-priority packets can be forwarded to only a limited number of VCs. The static-priority-based scheduler thus limits the number of accessible VCs depending on the packet priority. We study performance metrics such as the average packet latency and the saturation throughput obtained with all three new scheduling techniques. We present simulation results for all three scheduling policies under bit complement, transpose and uniform random traffic, for injection rates ranging from very low (no load) to high load. We evaluate the performance improvement due to our proposed scheduling techniques in APCR by comparing with the performance of the basic NoC design. The performance is also compared with the results for the monopolizing, fair-sharing and round-robin channel-stealing schemes of APCR. Simulation results from our detailed cycle-accurate simulator show that our new scheduling policies implemented in the APCR model improve the network throughput by 10% under synthetic workloads, compared with the existing round-robin scheme. Also, our scheduling policy in the APCR model outperforms the baseline router by 28X under synthetic workloads.
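
The age-based and static-priority ideas described in this abstract can be sketched as follows. This is a minimal illustration and not the thesis's simulator code: the `Request` record, the function names, the tie-breaking rule (lowest VC index wins) and the "priority > 0 means high priority" convention are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Request:
    vc: int            # virtual channel issuing the request
    injected_at: int   # cycle at which the packet entered the network
    priority: int = 0  # static priority assigned at injection (hypothetical field)


def pick_oldest(requests, now):
    """Age-based arbitration: grant the request with the largest age,
    where age = current cycle - injection cycle.
    Ties go to the lowest VC index (an assumption of this sketch)."""
    if not requests:
        return None
    return max(requests, key=lambda r: (now - r.injected_at, -r.vc))


def allowed_vcs(priority, num_vcs, low_priority_vcs=1):
    """Static-priority eligibility: high-priority packets may use any VC,
    low-priority packets only the first `low_priority_vcs` VCs."""
    if priority > 0:
        return list(range(num_vcs))
    return list(range(min(low_priority_vcs, num_vcs)))


if __name__ == "__main__":
    reqs = [Request(vc=0, injected_at=90),
            Request(vc=1, injected_at=40),
            Request(vc=2, injected_at=40)]
    winner = pick_oldest(reqs, now=100)
    print(winner.vc)                           # VC holding the oldest flit
    print(allowed_vcs(priority=1, num_vcs=4))  # high priority: every VC
    print(allowed_vcs(priority=0, num_vcs=4))  # low priority: restricted set
```

A round-robin arbiter would ignore `injected_at` entirely, which is why an age-aware choice can reduce worst-case packet latency at the cost of tracking a timestamp per request.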

    Dynamic Power Management of High Performance Network on Chip

    With the increased density of modern systems-on-chip (SoCs), communication between nodes has become a major problem. Network-on-chip (NoC) is a novel on-chip communication paradigm that addresses this problem with a highly scalable and efficient packet-switched network. Adding intelligent networking to the chip increases the chip's power consumption, which makes the management of communication power an interesting and challenging research problem. While VLSI techniques have evolved over time to enable power reduction at the circuit level, the highly dynamic nature of modern large SoCs demands more than that. This dissertation explores innovative dynamic solutions to manage the ever-increasing communication power in the post-sub-micron era. Today's highly integrated SoCs require a great degree of cross-layer optimization to provide maximum efficiency. This dissertation approaches the dynamic power management problem from the top down: system-level distribution and management, followed by microarchitecture enhancements, were found necessary to deliver maximum power efficiency. A distributed power budget sharing technique is proposed. To efficiently satisfy the established power budget, a novel flow control and throttling technique is proposed. Finally, the power efficiency of the underlying microarchitecture is explored and novel buffer and link management techniques are developed. All of the proposed techniques yield improvements in the power-performance efficiency of the NoC infrastructure.

    Scalability of broadcast performance in wireless network-on-chip

    Networks-on-Chip (NoCs) are currently the paradigm of choice to interconnect the cores of a chip multiprocessor. However, conventional NoCs may not suffice to fulfill the on-chip communication requirements of processors with hundreds or thousands of cores. The main reason is that the performance of such networks drops as the number of cores grows, especially in the presence of multicast and broadcast traffic. This not only limits the scalability of current multiprocessor architectures, but also sets a performance wall that prevents the development of architectures that generate moderate-to-high levels of multicast. In this paper, a Wireless Network-on-Chip (WNoC) where all cores share a single broadband channel is presented. Such a design is conceived to provide low latency and ordered delivery for multicast/broadcast traffic, in an attempt to complement a wireline NoC that will transport the rest of the communication flows. To assess the feasibility of this approach, the network performance of the WNoC is analyzed as a function of the system size and the channel capacity, and then compared to that of wireline NoCs with embedded multicast support. Based on this evaluation, preliminary results on the potential performance of the proposed hybrid scheme are provided, together with guidelines for the design of MAC protocols for WNoC. Peer reviewed. Postprint (published version).