30 research outputs found
Throttling Control for Bufferless Routing in On-Chip Networks
As the number of cores integrated on a single die grows, buffers consume significant energy and occupy chip area. Bufferless deflection routing, which eliminates the router's input port buffers, can save considerable energy and chip area while providing performance similar to existing buffered routing, especially at low-to-medium network loads. However, when congestion increases, bufferless routing frequently causes flit deflections and misrouting, which degrade network performance. In this paper, we propose IRT (Injection Rate Throttling), a local throttling mechanism that reduces deflection and misrouting in high-load bufferless networks. IRT controls the injection rate independently for each network node, which reduces network congestion. Our results from a cycle-accurate simulator show that IRT reduces average transmission latency by 8.65% compared to traditional bufferless routing.
On the Effectiveness of Source Throttling for Networks-on-Chip in Chip Multiprocessor Designs
In modern chip-multiprocessor (CMP) designs, traffic between cores keeps increasing as the number of cores grows. Consequently, on-chip interconnection networks face increasingly large communication bandwidth demands. This thesis focuses on Quality-of-Service (QoS) of Networks-on-Chip (NoC). NoC is considered a scalable interconnection approach compared to conventional bus-based architectures. Like Ethernet, NoC faces common QoS issues such as bandwidth utilization and fairness. This thesis studies the effectiveness of source throttling for NoC, covering fairness and overall performance measures such as program run time and packet latency. Source throttling is a well-known technique for traffic regulation, and previous studies have shown it to be effective for bufferless NoC. Because traffic behaviors and characteristics differ, however, it is not obvious whether source throttling is effective for general buffered NoC. The first part of this research is a set of network simulations on various synthetic traffic cases. The results indicate that source throttling can reduce application runtime when (1) the network is congested, (2) there are dependencies among communication requests, and (3) the width of the dependence graph is sufficiently large. The second part consists of full-system simulations on public benchmark suites. Source throttling brings no benefit in these relatively realistic cases. Further experiments reveal that the aforementioned conditions are not satisfied. This explains why source throttling is of little use for general buffered NoC in CMP designs.
Round Robin based Arbitration Mechanism for Signaling Approach based Router Architecture
In a Network-on-Chip, the flow control mechanism determines how effectively network resources are allocated. There are two types of flow control mechanisms: buffered and bufferless. Compared to buffered flow control methods, bufferless flow control mechanisms are easier to use, need less power, and take up less space. When there are congestion and resource conflicts, however, bufferless flow control experiences higher packet loss and packet misrouting inside the network. A good buffered flow control mechanism is useful because it overcomes these limitations. Numerous buffered and bufferless flow control methods are available. In this paper, a signaling-based Virtual Output Queue Router Arbiter Mechanism is used to explore credit-based flow control. This mechanism works on a new concept called the “stress value”. Information is generated in the form of a credit whenever any input buffer has free space. Then, using this credit data, the node's stress value is determined. Free buffer space takes precedence over stress value if it is bigger. The stress value increases if there is less available buffer space. To handle the congestion problem, the signaling block then sends this stress value to a neighboring router. To help the arbiter make a more accurate decision, the crediting system constantly operates in tandem with arbitration.
A Multifunctional Integrated Circuit Router for Body Area Network Wearable Systems
A multifunctional router IC to be included in the nodes of a wearable body sensor network is described and evaluated. The router targets different application scenarios, especially those including tens of sensors embedded into textile materials and with high data-rate communication demands. The router IC supports two different functionality sets, one for sensor nodes and another for the base node, both based on the same circuit module. The nodes are connected to each other by means of woven thick conductive yarns forming a mesh topology with the base node at the center. From the standpoint of the network, each sensor node is a four-port router capable of handling packets from destination nodes to the base node, with sufficient redundant paths. The adopted hybrid circuit and packet switching scheme significantly improves network performance in terms of end-to-end delay, throughput and power consumption. The IC also implements a highly precise, sub-microsecond one-way time synchronization protocol which is used for time stamping the acquired data. The communication module was implemented in a 4-metal, 0.35 μm CMOS technology. The maximum data rate of the system is 35 Mbps while supporting up to 250 sensors, which exceeds the requirements of current BAN application scenarios.
This work was supported in part by the Fundação para a Ciência e a Tecnologia (FCT) (Portuguese Foundation for Science and Technology) under Project PROLIMB PTDC/EEAELC/103683/2008 and through the Ph.D. Grant SFRH/BD/75324/2010, and in part by the CREaTION, FCT/MEC through national funds and co-funded by the FEDER-PT2020 partnership agreement under Project UIDB/EEA/50008/2020, Project CONQUEST (CMU/ECE/030/2017), Project COST CA15104, and ORCIP. (Corresponding author: Fardin Derogarian Miyandoab.)
Theoretical Analysis and Evaluation of NoCs with Weighted Round-Robin Arbitration
Fast and accurate performance analysis techniques are essential in early
design space exploration and pre-silicon evaluations, including software
eco-system development. In particular, on-chip communication continues to play
an increasingly important role as the many-core processors scale up. This paper
presents the first performance analysis technique that targets networks-on-chip
(NoCs) that employ weighted round-robin (WRR) arbitration. Besides fairness,
WRR arbitration provides flexibility in allocating bandwidth proportionally to
the importance of the traffic classes, unlike basic round-robin and
priority-based arbitration. The proposed approach first estimates the effective
service time of the packets in the queue due to WRR arbitration. Then, it uses
the effective service time to compute the average waiting time of the packets.
Next, we incorporate a decomposition technique to extend the analytical model
to handle NoC of any size. The proposed approach achieves less than 5% error
while executing real applications and 10% error under challenging synthetic
traffic with different burstiness levels.
Comment: This paper is accepted in International Conference on Computer Aided
Design (ICCAD), 202
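
For readers unfamiliar with the arbitration scheme named in the abstract above, the following is a minimal Python sketch of generic weighted round-robin arbitration. It is not the paper's analytical model or router design. The function name `wrr_arbitrate`, the per-class queue layout, and the example weights are assumptions made for illustration only.

```python
from collections import deque


def wrr_arbitrate(queues, weights):
    """Grant packets from per-class queues in weighted round-robin order.

    queues  : list of deques, one per traffic class (hypothetical layout)
    weights : list of positive ints; class i may send up to weights[i]
              packets in each round before the arbiter moves on
    Returns the packets in the order they were granted.
    """
    granted = []
    # Keep cycling over the classes until every queue is drained.
    while any(queues):
        for queue, weight in zip(queues, weights):
            for _ in range(weight):
                if not queue:
                    break  # class has nothing left to send this round
                granted.append(queue.popleft())
    return granted


# Example: class A has weight 2, class B has weight 1.
queues = [deque(["a1", "a2", "a3", "a4"]), deque(["b1", "b2"])]
order = wrr_arbitrate(queues, [2, 1])
print(order)  # ['a1', 'a2', 'b1', 'a3', 'a4', 'b2']
```

With weights 2 and 1, class A receives two grants for every grant to class B while both queues are non-empty. This is the proportional bandwidth allocation that the abstract contrasts with basic round-robin and priority-based arbitration.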