Since Network-on-Chip (NoC) was proposed as the communication infrastructure for many-core architectures, it has become one of the most investigated research topics. The term "saturation" is frequently used when discussing the performance and power figures of a NoC. Router buffer utilization is one of the key factors for understanding network saturation status. In this paper, we carry out a detailed study of router buffer utilization under various traffic loads. According to our study, network packets can be classified, relative to a given router, into three types: destination packets, passing packets, and deterring packets. These types contribute differently to router buffer occupancy.
Introduction
As traditional ad-hoc or bus interconnects cannot effectively provide communication services for the large number of processing cores integrated into a single chip (SoC), Network-on-Chip (NoC) has been proposed to convey information within a single chip [1, 2, 3, 4, 5, 6].
Network saturation is widely considered in NoC research, and the studies that refer to it are too numerous to list. In general, the network status is closely related to the buffer space occupancy level. In this paper, we carry out a detailed study of how the buffer space occupancy level of a single router varies. According to our study, the packets in the network can be classified, relative to a given router, into three types: destination packets, passing packets, and deterring packets. These types contribute differently to router buffer occupancy. If a router receives only destination packets, its buffer utilization stays at a very low level. If a router receives both destination packets and passing packets, and the passing packets are blocked by another flow, its buffer slots are gradually used up. Deterring packets, in contrast, may decrease a router's buffer utilization.
Although a large variety of NoC topologies have been presented in the literature, the 2D mesh is the most widely studied [4, 8]. A large number of routing algorithms have been proposed for the 2D mesh topology to overcome congestion and improve performance. In a 2D mesh, some paths have to be prohibited in order to avoid deadlock. Routing algorithms can be classified into deterministic routing, in which each communication pair has exactly one path, and adaptive routing, in which more than one path is allowed for a communication pair. If all paths provided by the topology are allowed, the routing is fully adaptive.
Study on Network Status
In NoCs, wormhole switching is the most widely adopted switching technique [7]. In wormhole switching, each packet is sliced into a set of flits, whose size usually equals the physical channel width. The first flit is called the header flit and the last flit is called the tail flit; the remaining flits are body flits. When the header flit of a packet reaches a node, the routing logic chooses the output channel and reserves it if it is free. The body and tail flits then follow along the reserved channel, and the tail flit finally releases it. The packet is thus transmitted in a pipelined fashion. Any other packets contending for the same output channel are blocked until it is released, and a blocked packet may span several routers. Once the channel is released, the header flits of all blocked packets can compete for it.
The main property of wormhole switching is that each router requires only small buffers, since a blocked packet can be temporarily stored across the buffers of several routers.
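The header/body/tail mechanism described above can be illustrated with a minimal Python sketch. It is not taken from the simulator used in this paper: the names `Flit`, `slice_packet`, and `OutputChannel` are hypothetical, and the sketch models only the reservation and release of one output channel, not buffering or timing.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class FlitType(Enum):
    HEAD = "head"
    BODY = "body"
    TAIL = "tail"


@dataclass(frozen=True)
class Flit:
    packet_id: int
    index: int       # position of the flit inside its packet
    kind: FlitType


def slice_packet(packet_id: int, num_flits: int) -> List[Flit]:
    """Slice one packet into a header flit, body flits, and a tail flit."""
    if num_flits < 2:
        raise ValueError("this sketch assumes at least a header and a tail flit")
    flits = []
    for i in range(num_flits):
        if i == 0:
            kind = FlitType.HEAD
        elif i == num_flits - 1:
            kind = FlitType.TAIL
        else:
            kind = FlitType.BODY
        flits.append(Flit(packet_id, i, kind))
    return flits


class OutputChannel:
    """Output channel reserved by a header flit and released by a tail flit."""

    def __init__(self) -> None:
        self.owner: Optional[int] = None  # packet_id holding the channel

    def try_forward(self, flit: Flit) -> bool:
        """Return True if the flit may use the channel, False if it is blocked."""
        if self.owner is None:
            if flit.kind is not FlitType.HEAD:
                return False              # only a header flit can reserve a free channel
            self.owner = flit.packet_id
            return True
        if self.owner != flit.packet_id:
            return False                  # channel is reserved by another packet
        if flit.kind is FlitType.TAIL:
            self.owner = None             # the tail flit releases the reservation
        return True
```

In this sketch, a header flit of a second packet is rejected by `try_forward` until the tail flit of the first packet has passed, which is the blocking behaviour that the rest of the study measures through buffer utilization.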
Under wormhole switching, network saturation has distinctive characteristics. This section analyzes router buffer utilization through simulation.
In an M×N 2D mesh NoC, each node is denoted as a point in a 2D coordinate system whose origin is the top-left corner node. The X axis is the horizontal direction and the Y axis is the vertical direction; the positive X direction points east and the positive Y direction points south. Each node is represented by the coordinate (x, y), where 0≤x<M and 0≤y<N. Every node also has an identifier (ID) calculated by the formula: ID = y * M + x.
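As a small illustration of this numbering, the following Python sketch converts between coordinates and IDs. The function names `node_id` and `node_coord` are hypothetical and chosen only for this example.

```python
from typing import Tuple


def node_id(x: int, y: int, m: int, n: int) -> int:
    """Return ID = y * M + x for a node of an M x N mesh."""
    if not (0 <= x < m and 0 <= y < n):
        raise ValueError("coordinate lies outside the mesh")
    return y * m + x


def node_coord(node: int, m: int, n: int) -> Tuple[int, int]:
    """Invert the ID formula and return (x, y)."""
    if not 0 <= node < m * n:
        raise ValueError("ID lies outside the mesh")
    y, x = divmod(node, m)
    return x, y
```

For an 8×8 mesh, `node_coord(32, 8, 8)` gives (0, 4) and `node_coord(39, 8, 8)` gives (7, 4), so nodes 32 and 39 are the west and east ends of the same row.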
The 8×8 mesh network is depicted in Figure 1, where each node is labeled with its ID. The simulation setup used in this section is summarized in Table 1. The network load is regulated by the packet injection rate (PIR). For example, a node with a PIR of 0.2 injects, on average, two packets into the network every 10 cycles. Each simulation runs for 50,000 cycles after a warm-up period of 1,000 cycles. Router buffer utilization is one of the key factors for understanding network saturation status. In this section, we measure the router buffer utilization under various traffic loads.
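To make the meaning of the PIR concrete, the following Python sketch counts the packets injected by one node. It assumes that injection is an independent random decision in every cycle with probability equal to the PIR; this is an assumption made for illustration, and the traffic generator actually configured in Table 1 may differ. The function name `count_injected_packets` and the `seed` parameter are hypothetical.

```python
import random


def count_injected_packets(pir: float, cycles: int, seed: int = 0) -> int:
    """Count packets injected by one node, one random decision per cycle."""
    if not 0.0 <= pir <= 1.0:
        raise ValueError("PIR must lie between 0 and 1")
    rng = random.Random(seed)  # private generator so that runs are repeatable
    return sum(1 for _ in range(cycles) if rng.random() < pir)
```

Under this assumption the expected number of injected packets is PIR × cycles, so a PIR of 0.2 over 50,000 cycles corresponds to about 10,000 packets per node.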
With a single data flow from node 32 to node 39 (see Figure 1), the router buffer utilization is as depicted in Figure 2. When the PIR is increased from 0.04 to 0.08, the average packet latency increases from 18 to 1886. Throughout, the router buffer utilization stays at a very low level, between 5% and 6%. The reason is that, with a single data flow, the generated packets are queued in the source node's input queue. The queued packets are then sliced into flits and serialized into the network router. In the Noxim simulator, it takes two cycles to serialize a flit. When a router receives a flit, it forwards the flit to the downstream router, which also takes two cycles. Each router can therefore forward flits as fast as it receives them, so its buffers are never occupied by blocked flits; only one buffer slot is used to temporarily hold a received flit. Consequently, the buffer utilization is very low (edge routers show slightly higher utilization because they have fewer buffer slots in total).
As the PIR increases, packets take thousands of cycles to reach the destination node even though no congestion occurs at any router. During this time, the packets are held up not in the network routers but in the source node's input queue.

With two contending data flows, one from 32 to 39 and the other from 33 to 39, the buffer utilization is as shown in Figure 3. The two flows enter router 33 through its west input port and its local input port, and they contend for the east output port of router 33. When the PIR is increased from 0.02 to 0.06, the average packet latency increases from 27 to 4784.
When both the west input port and the local input port of router 33 hold flits that need its east output port, only one of them can acquire it; the flits at the other port have to wait in their buffer. As the PIR increases, the waiting flits gradually consume all the buffer slots of the west and local input ports of router 33. Because of backpressure, the buffer slots of router 32's local input port are also used up. Consequently, the maximum buffer utilization is 40% for router 33 and 25% for router 32.
For routers 34 to 39, flits are received at the same rate at which they are forwarded, so their buffer utilizations all remain small.

Next, we examine how the buffer utilization of a destination router varies. The two data flows are from 35 to 28 and from 37 to 28. They enter router 36 through its west and east input ports and contend for its north output port. When the PIR increases from 0.02 to 0.05, the average packet latency increases from 18 to 3566. The router buffer utilizations are depicted in Figure 4. The two contending flows use up all the buffer slots of router 36's west and east input ports, so that its buffer utilization reaches the maximum of 40%. Due to backpressure, the buffer slots of the local input ports of routers 35 and 37 are also used up, and the buffer utilization of each of these two routers reaches its maximum of 20%.
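The maximum values reported above (40%, 25%, and 20%) are consistent with a simple port-counting model, sketched below in Python. The sketch assumes that every input port has the same buffer depth, that a router has one local input port plus one input port per existing mesh neighbour, and that utilization is the fraction of occupied slots; these are assumptions made for illustration rather than parameters taken from Table 1. The function names are hypothetical.

```python
def input_port_count(node: int, m: int = 8, n: int = 8) -> int:
    """Local port plus one input port per existing neighbour of the node."""
    x, y = node % m, node // m
    neighbours = (x > 0) + (x < m - 1) + (y > 0) + (y < n - 1)
    return neighbours + 1


def max_utilization(node: int, saturated_ports: int, m: int = 8, n: int = 8) -> float:
    """Utilization when `saturated_ports` input buffers are full and the rest empty."""
    ports = input_port_count(node, m, n)
    if not 0 <= saturated_ports <= ports:
        raise ValueError("more saturated ports than the router has")
    return saturated_ports / ports
```

Under these assumptions, router 33 with two full input ports out of five gives 2/5 = 40%, the edge router 32 with one full port out of four gives 1/4 = 25%, and routers 35 and 37 with one full port out of five give 1/5 = 20%.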
However, no matter how severe the congestion is, the buffer utilization of router 28, the destination router, always remains small.
Next, we show when the buffer slots of a destination router are also used up. In addition to the data flows from 35 to 28 and from 37 to 28, two data flows are added: from 44 to 20 and from 29 to 20. The router buffer utilizations are shown in Figure 5. The average packet latency increases from 13 to 4172 as the PIR increases from 0.01 to 0.05. Router 28 is the destination of two data flows. It also forwards packets belonging to the data flow from 44 to 20. The data flows from 44 to 20 and from 29 to 20 contend for the north output port of router 28, and this contention leads to congestion. When the output port is reserved by a packet from 29 to 20, the flits belonging to the flow from 44 to 20 are blocked in router 28's south input port. In this way, the buffer slots of router 28's south input port are also gradually used up, even though router 28 is the destination of two data flows. The buffer utilization of router 28 increases from a minimum of 5% to a maximum of 40%.
From these examples, we can observe that a router's buffer occupancy settles at different levels depending on the packets it receives. The packets received by a router can be classified into two categories: destination packets and passing packets. Destination packets are those addressed to the router itself. For example, packets from 35 to 28 and from 37 to 28 are destination packets for router 28. Passing packets are those that pass through the router on the way to another node. For example, packets from 44 to 20 are passing packets for router 28.
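The two categories can be expressed as a short Python sketch. The routing algorithm is not restated in this section, so the sketch assumes dimension-order XY routing (X direction first, then Y), which reproduces the paths described in the examples above, such as 44 to 20 entering router 28 through its south input port. The names `xy_route` and `classify` are hypothetical.

```python
from typing import List, Tuple


def xy_route(src: int, dst: int, m: int = 8) -> List[int]:
    """Routers visited from src to dst, moving along X first and then along Y."""
    x, y = src % m, src // m
    dst_x, dst_y = dst % m, dst // m
    path = [src]
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append(y * m + x)
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append(y * m + x)
    return path


def classify(flow: Tuple[int, int], router: int, m: int = 8) -> str:
    """Classify the packets of a (source, destination) flow relative to a router."""
    src, dst = flow
    if router == dst:
        return "destination"
    if router in xy_route(src, dst, m):
        # Simplification of this sketch: packets injected at the router
        # itself are also counted as passing.
        return "passing"
    return "unrelated"
```

With these definitions, `classify((35, 28), 28)` returns "destination" and `classify((44, 20), 28)` returns "passing", matching the examples given for router 28.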
If a router receives only destination packets, its buffer utilization stays at a very low level. If a router receives both destination packets and passing packets, and the passing packets are blocked by another flow, its buffer slots are gradually used up. In addition to these two types, other packets also affect a router's buffer utilization, as the next example shows.
In this example, there are seven data flows, each denoted by its source and destination nodes: 35-28, 37-28, 44-20, 29-20, 35-37, 37-35, 44-36. The first four are the same as in the previous example. We compare the buffer utilization of router 28 in the two examples; the results are shown in Figure 6. The previous example, with four data flows, is labeled "four flows", and this example is labeled "seven flows".
In the seven-flow example, data flows 35-37, 37-35, and 44-36 can block packets that are trying to reach router 28's south input port. Consequently, the buffer utilization of router 28 is lower than in the four-flow example. This example shows that a router's buffer occupancy level can be decreased by another packet type: deterring packets. Deterring packets are those that deter other packets from reaching the router. For example, packets belonging to 35-37, 37-35, and 44-36 are deterring packets for router 28: they can deter packets of flows such as 35-28, 37-28, and 44-20 from reaching router 28. Consequently, deterring packets may decrease a router's buffer utilization.
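One possible way to identify deterring flows automatically is sketched below in Python. It rests on two assumptions that go beyond the text: routes follow dimension-order XY routing, and a flow is counted as deterring for a router if it does not visit that router but shares a directed link with another flow upstream of the router. This link-sharing criterion is an interpretation of the example above, not a definition given in this paper, and all function names are hypothetical.

```python
from typing import List, Set, Tuple

Flow = Tuple[int, int]
Link = Tuple[int, int]


def xy_route(src: int, dst: int, m: int = 8) -> List[int]:
    """Routers visited from src to dst under assumed XY routing."""
    x, y = src % m, src // m
    dst_x, dst_y = dst % m, dst // m
    path = [src]
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append(y * m + x)
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append(y * m + x)
    return path


def links_before(path: List[int], router: int) -> Set[Link]:
    """Directed links a flow uses before its packets arrive at `router`."""
    upstream = path[: path.index(router) + 1]
    return set(zip(upstream, upstream[1:]))


def deterring_flows(flows: List[Flow], router: int, m: int = 8) -> List[Flow]:
    """Flows that avoid `router` but share an upstream link with a flow visiting it."""
    paths = {flow: xy_route(flow[0], flow[1], m) for flow in flows}
    upstream_links: Set[Link] = set()
    for path in paths.values():
        if router in path:
            upstream_links |= links_before(path, router)
    result = []
    for flow, path in paths.items():
        if router in path:
            continue  # destination or passing flow, not deterring
        if upstream_links & set(zip(path, path[1:])):
            result.append(flow)
    return result
```

Applied to the seven flows of this example and router 28, the sketch returns 35-37, 37-35, and 44-36, and it returns no flow for the four-flow example.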
Conclusions
In this paper, we carried out a detailed study of how the buffer space occupancy level of a single router varies. According to our study, the packets in the network can be classified, relative to a given router, into three types: destination packets, passing packets, and deterring packets. These types contribute differently to router buffer occupancy. If a router receives only destination packets, its buffer utilization stays at a very low level. If a router receives both destination packets and passing packets, and the passing packets are blocked by another flow, its buffer slots are gradually used up. Deterring packets may deter other packets from reaching the router. Consequently, they may decrease a router's buffer utilization.
