Packet-based Adaptive Virtual Channel Configuration for NoC Systems  by Gharan, Masoud Oveis & Khan, Gul N.
 Procedia Computer Science  34 ( 2014 )  552 – 558 
1877-0509 © 2014 Elsevier B.V. This is an open access article under the CC BY-NC-ND license 
(http://creativecommons.org/licenses/by-nc-nd/3.0/).
Selection and peer-review under responsibility of Conference Program Chairs
doi: 10.1016/j.procs.2014.07.069 
ScienceDirect
Available online at www.sciencedirect.com
2014 International Workshop on the Design and Performance of Network on Chip                
(DPNoC 2014) 
Packet-based Adaptive Virtual Channel Configuration for             
NoC Systems 
Masoud Oveis Gharan and Gul N. Khan*
Electrical and Computer Engineering, Ryerson University, Toronto, ON M5B 2K3 Canada  
 
Abstract 
The growing number of on-chip cores requires an efficient communication structure such as a Network-on-Chip (NoC). In NoC
design, the channel buffer organization facilitates the use of Virtual Channels (VCs) for on-chip communication. A VC structure
can be categorized as static or dynamic. In a dynamic VC structure, a variable number of buffer slots can be employed by each VC
according to the traffic conditions in the NoC. We introduce a Packet-Based Virtual Channel (PBVC) scheme, where a VC
is reserved when a packet enters the router and released when the packet leaves. A VC holds the flits of only one packet at a
time, which removes Head-of-Line (HoL) blocking. The PBVC technique is an amended version of dynamically allocated
multi-queue schemes, where an input or output port employs a centralized buffer whose slots are dynamically allocated to VCs.
The experimental results show that our approach improves network latency and throughput as compared to other VC designs.
© 2014 The Authors. Published by Elsevier B.V. 
Peer-review under responsibility of the Program Chairs of FNC-2014. 
Keywords: adaptive virtual channels; head-of-line blocking; NoC; on-chip communication; VC organization. 
1. Introduction 
In the Network-on-Chip (NoC) domain, wormhole routing is mainly employed for communication among the various
cores of a NoC [1]. The flits of a packet are stored in the channel buffers. The header flit of a packet starts from the
source core router and passes through a series of routers, reserving the route path. This type of routing can cause
traffic congestion. The blocking of one packet leads to the blocking of other packets queued for the channel. This
blocking is known as Head-of-Line (HoL) blocking; it increases latency and lowers throughput and
buffer utilization. The HoL problem can be alleviated by employing Virtual Channels (VCs) [2]. The VC mechanism enables the
multiplexing and buffering of several packets in a single router channel [3]. However, it does not remove HoL blocking
completely [4,5,6]. The queue structure is an important component of a NoC router that stores the flits of message packets. It
is the main storage of a router and temporarily stores packet flits in First-Come-First-Served
(FCFS) order until network resources become available. The terms ‘queue’ and ‘FIFO’ are sometimes used
interchangeably. Many queue architectures have been proposed, such as the FIFO queue, the circular queue, the Dynamically
Allocated Multi-Queue (DAMQ) and their variants [7,8]. A DAMQ is a single storage array that maintains multiple FIFO
queues [4]. It can provide a solution to the HoL blocking problem [10]. The packets are stored in the queues of a multi-queue
of the output port for routing. Therefore, if the current packet faces a blockage, the other packets
destined to that output port can also become blocked. Choi and Pinkston claimed that this type of blocking is not HoL blocking [9].
However, we argue that it is in fact HoL blocking. Moreover, HoL blocking can also happen inside a queue of a
DAMQ buffer [2,11]. We present a new methodology that entirely removes the effect of HoL blocking.

* Corresponding author. Tel.: +1-416-979-5000 x6084; fax: +1-416-979-5280.
E-mail address: gnkhan@ryerson.ca

Fig. 1. Reserved buffer space for virtual channels.
2.  DAMQ based VC organization  
A DAMQ buffer organization scheme for virtual channels (VCs) has been presented in the past [6]. In Fig. 1,
two buffer slots are reserved for each virtual channel before the buffer accepts an incoming flit. The rest of the buffer
slots are free to be employed by any of the VCs according to the traffic demand. Various research groups have
presented the design and implementation of DAMQ buffers [2,4,5]. These designs are either expensive in terms of
hardware or inefficient due to data dependencies. A centralized shared buffer architecture called the Virtual Channel
Regulator (ViChaR) was introduced by Nicopoulos et al. [2]. ViChaR requires a large arbiter that can create
latency bottlenecks in the critical path and may limit the overall NoC frequency [11]. It dynamically allocates VCs on a
First-Come-First-Served basis, and new packets are given no priority. In the case of blocking, a packet can
occupy all the slots of a channel buffer and prevent any new packet from passing through the router. If the blocking of
that packet continues, all the upstream routers will be occupied by the packet, and a new packet may not pass
through the routers on that route. Another drawback of ViChaR is its large hardware cost for some configurations.
Two more approaches involving the DAMQ mechanism were introduced by Evripidou et al. [11]. These are named the
Mask-based and Link-List-based techniques. The Mask-based approach is cheaper in terms of hardware, but it is of a
synchronous nature and its performance is low. The Link-List-based approach, which mimics the DAMQ organization, is
expensive; however, it leads to higher performance. All the existing DAMQ based VC buffer structures suffer from
HoL blocking except ViChaR, which imposes high latency and an expensive hardware overhead. Therefore, a general
solution to remove HoL blocking from the DAMQ buffer organization is needed. These points have motivated
us to investigate and design the PBVC approach for avoiding HoL blocking.
2.1. Conventional wormhole VC organization  
The Conventional Wormhole Virtual Channel (CWVC) mechanism is a common approach used in most existing
VC based NoCs. It is usually used as a baseline against which new
approaches are compared [5,8,9,12,13]. In Fig. 2, the VC implementation of a physical channel is illustrated for a conventional static
queue structure. The buffer slots in a static queue structure are statically allocated to incoming packets, whereas the
buffer slots are dynamically allocated to incoming packets in the DAMQ organization. In the CWVC mechanism, when the
header flit of a packet enters a VC buffer, the VC is reserved by the packet. The reservation of the VC is kept until
the tail flit arrives. The VC can then accept a new packet while it may still hold flits of the previous packet. In this way, it
can contain parts of two packets at a time. When a VC holds parts of two packets, the blocking of the HoL
packet blocks the second packet even if the route of the second packet is open. This is the main source of HoL blocking.
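
The HoL situation described above can be reproduced with a minimal Python sketch. It models one CWVC virtual channel as a FIFO that holds the remaining flits of one packet followed by the first flits of a second packet. The function name `drain_vc`, the flit tuples and the `route_open` map are illustrative assumptions, not part of the original design.

```python
from collections import deque

def drain_vc(vc_fifo, route_open):
    """Forward flits from the head of a VC FIFO while the head flit's route is open.

    vc_fifo    : deque of (packet_id, flit_type) tuples, head at the left
    route_open : dict mapping packet_id -> True if its output route is free
    Returns the list of flits that left the VC.
    """
    sent = []
    # Only the flit at the head of the FIFO may leave; nothing can overtake it.
    while vc_fifo and route_open[vc_fifo[0][0]]:
        sent.append(vc_fifo.popleft())
    return sent

# CWVC: the tail of packet 0 has arrived, so the VC has already accepted packet 4.
vc = deque([(0, "B"), (0, "T"), (4, "H"), (4, "B")])

# Packet 0 is blocked downstream, while the route of packet 4 is open.
sent = drain_vc(vc, {0: False, 4: True})
print(sent)      # [] : packet 4 is stuck behind packet 0 (HoL blocking)
print(len(vc))   # 4
```

Because a FIFO only ever releases its head, the open route of packet 4 does not help as long as packet 0 sits in front of it.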
 
 
 
 
 
  
Fig. 2. Input-port with static queue.
For DAMQ based VC buffers, the depth of a VC varies dynamically according to the traffic situation. To better
illustrate the structure and operation of DAMQ buffers, two different cases are illustrated in Fig. 3. In the first case,
the depth of VC0 is thirteen, while in the second case the depth of VC0 is zero. The varying depth of VC0 can be due to
different traffic conditions. This property of a DAMQ buffer, where a VC can occupy all the buffer space, can lead to
lower performance. In the case of a DAMQ based CWVC organization, there are two situations that lower
performance. First, when a packet is blocked in a VC and other packets can still move into that VC buffer, the
VC grows and occupies all the buffer space. This prevents the other VCs from performing efficiently due to a lack
of buffer space. Second, this blockage leads to blocking in the upstream routers (related to the blocked packets). In
other words, the buffer blockage spreads to other parts of the NoC, causing global congestion.
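
The varying VC depth can be sketched as a shared slot pool with one linked-list FIFO per VC. The following is a minimal sketch, assuming a 16-slot buffer shared by four VCs; the class name `DamqBuffer` and its methods are hypothetical, and the reserved-slot and credit rules are deliberately left out.

```python
class DamqBuffer:
    """Shared slot pool with one linked-list FIFO per VC (illustrative only)."""

    def __init__(self, num_slots=16, num_vcs=4):
        self.data = [None] * num_slots        # stands in for the buffer memory
        self.next_slot = [None] * num_slots   # slot -> next slot of the same VC
        self.free = list(range(num_slots))    # unoccupied slots
        self.head = [None] * num_vcs          # oldest slot of each VC
        self.tail = [None] * num_vcs          # newest slot of each VC
        self.depth = [0] * num_vcs            # current depth of each VC

    def push(self, vc, flit):
        """Append a flit to a VC; returns False when no slot is free."""
        if not self.free:
            return False
        slot = self.free.pop(0)
        self.data[slot] = flit
        self.next_slot[slot] = None
        if self.head[vc] is None:
            self.head[vc] = slot              # the VC was empty
        else:
            self.next_slot[self.tail[vc]] = slot
        self.tail[vc] = slot
        self.depth[vc] += 1
        return True

    def pop(self, vc):
        """Remove and return the oldest flit of a VC (None if the VC is empty)."""
        slot = self.head[vc]
        if slot is None:
            return None
        flit = self.data[slot]
        self.head[vc] = self.next_slot[slot]
        if self.head[vc] is None:
            self.tail[vc] = None
        self.free.append(slot)
        self.depth[vc] -= 1
        return flit


buf = DamqBuffer()
for vc_id in (1, 2, 3):
    buf.push(vc_id, f"H{vc_id}")              # one flit in each of VC1..VC3
while buf.push(0, "flit"):                    # a blocked VC0 keeps absorbing flits
    pass
print(buf.depth)                              # [13, 1, 1, 1]
```

In this run VC0 ends up holding thirteen of the sixteen slots, which mirrors the situation where a blocked VC crowds out the others.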
2.2. PBVC approach  
Our PBVC approach provides an effective solution to remove HoL blocking. The methodology is suitable for
the DAMQ buffer organization. In this approach, when the header flit of a packet enters a VC, the VC is reserved by the
packet. The VC becomes free when the tail flit of the packet exits. The VC does not accept a new packet as long as the
VC buffer holds any flit of the existing packet. In this way, our PBVC methodology removes HoL blocking
completely, as only one packet can occupy a VC at any instant. Its main features are listed below:
• The chances of getting a free VC for unblocked packets of a channel are much higher in the PBVC approach
than in the CWVC mechanism. Consider the two situations of Figures 4 and 5. In Fig. 4, which represents the CWVC
organization, packets 4 and 5 remain blocked as long as packet-0 is facing a blockage. For the PBVC organization,
when one of the VCs (i.e. VC1, VC2 or VC3) becomes free, packets 4 and 5 can occupy the buffer as
illustrated in Fig. 5.
• In our PBVC approach, when a packet faces a blockage, its VC gets minimum space from the
channel buffer. Consider a situation of packet blockage as shown in Fig. 6. For the CWVC scheme, the upstream
router continues sending new packets to the VC, which keeps allocating more buffer slots and can occupy all the
free area of the DAMQ buffer as illustrated in Fig. 6a. In the case of PBVC, the new packets stay in the upstream
routers until a VC becomes empty. As a result, more free space is available for the other unblocked VCs as shown in
Fig. 6b. Consequently, PBVC performance and buffer utilization are better under such scenarios.
• We expect the packets to reach their destinations in sequential order. In the traditional CWVC mechanism, if
any packet of a series faces HoL blocking, the following packets of the series can travel via a free VC
and reach the destination before the blocked packet. However, our PBVC approach avoids HoL blocking and
each packet of a series reaches the destination in order.
 
 
 
 
 
  
Fig. 3. Two different cases in a DAMQ buffer for 4 VCs/channel and 4 flits/packet: (a) first case; (b) second case. H: header flit; B: body flit; T: tail flit.
Fig. 4. CWVC: Packet-0 blockage causes HoL blocking for P4 & P5.
Fig. 5. PBVC: Packet-0 blocked but no HoL blocking.
• The buffer utilization of PBVC can be slightly lower than that of the CWVC approach. However, the adaptive
nature of the buffer can compensate for this. Moreover, in NoCs where the communication involves a
large number of HoL blockings, the PBVC mechanism will show better buffer utilization.
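
The difference between the two reservation rules can be summarized in a short Python sketch. It assumes a VC is reserved when a header flit arrives, and released either when the tail flit arrives (CWVC) or when the tail flit departs (PBVC). The class `VirtualChannel` and its method names are hypothetical and only illustrate the rule stated above.

```python
class VirtualChannel:
    """Reservation behaviour of one VC under CWVC or PBVC (illustrative only)."""

    def __init__(self, packet_based):
        self.packet_based = packet_based   # True: PBVC, False: CWVC
        self.flits = []                    # queued (packet_id, kind) flits
        self.reserved = False              # True while a packet owns the VC

    def can_accept_header(self):
        return not self.reserved

    def arrive(self, packet_id, kind):
        """A flit of kind 'H', 'B' or 'T' enters the VC."""
        if kind == "H":
            if self.reserved:
                raise ValueError("VC is still reserved by another packet")
            self.reserved = True
        self.flits.append((packet_id, kind))
        if kind == "T" and not self.packet_based:
            self.reserved = False          # CWVC: released on tail arrival

    def depart(self):
        """The oldest flit leaves the VC."""
        packet_id, kind = self.flits.pop(0)
        if kind == "T" and self.packet_based:
            self.reserved = False          # PBVC: released on tail departure
        return packet_id, kind


cwvc = VirtualChannel(packet_based=False)
pbvc = VirtualChannel(packet_based=True)
for channel in (cwvc, pbvc):
    for kind in ("H", "B", "T"):           # packet 0 arrives and is then blocked
        channel.arrive(0, kind)

print(cwvc.can_accept_header())            # True  : a second packet may queue behind packet 0
print(pbvc.can_accept_header())            # False : the new packet waits upstream
```

Once the three flits of packet 0 have departed from `pbvc`, `can_accept_header()` returns `True` again.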
3. PBVC router structure 
A DAMQ based CWVC 5×5 router architecture for a mesh topology is given in Fig. 7. Generally, the micro-architecture
of a router consists of input and/or output ports, an arbiter and a crossbar switch. Each input or output
port can utilize VCs to control flow and to enable the sharing of a NoC communication channel. We assume in this
paper that a router has VCs at the input-port level. The NoC router organization and architecture are similar for both
the CWVC and PBVC approaches. The only difference is the organization of their input-ports. The micro-architecture of
a typical CWVC input-port is illustrated in Fig. 8. It contains an SRAM, five arrival/departure tables and other logic
and control circuits. The SRAM module serves as the channel buffer. The word size of the SRAM is equal to the packet-flit
size. The data pointed to by the Read-Address always appears at the SRAM output. When credit-in is active,
the data is stored in the SRAM slot pointed to by the Write-Address. Five tables are used to implement a dynamic
Link-List based CWVC mechanism. The VC-State table keeps a record of the occupied VCs. The Header-List
table keeps the addresses of the channel buffer (SRAM) slots that hold the header flits of the VCs. The Tail-List table keeps
the addresses of the SRAM slots that hold the tail flits of the VCs. The Link-List table keeps, for
each buffer slot in the SRAM, the address of the next slot. In effect, it links the addresses of the flits of each VC in a FIFO manner. The Slot-State table
keeps a record of the occupied slots in the SRAM. In a CWVC router with reserved space for each VC, a credit signal
is sent to the upstream router indicating the VC state in the downstream router. When a VC is full, the
credit signal changes to close the VC. Assuming that one slot is reserved per VC, the capacity of each VC is
dynamic and varies from one to M buffer slots, where:
 $$M = \text{SRAM slots} - \text{Total VCs} + 1$$
Assuming a total of 16 buffer slots and four virtual channels, the dynamic capacity of each VC varies from 1 slot
to 13 slots. In the dynamic buffer implementation, the credits are regulated by only two conditions, as given below:
 if {(Current VC is Empty) OR (Free Slots > Free VCs)} then credit = ON
In other words, a VC is open if it is empty, or if a free slot still remains for each free VC after it accepts a flit. These conditions
guarantee that at least one slot is dedicated to each VC. The rest of the slots are dynamically shared among
all the VCs. This also removes starvation and protocol-level deadlocks in the NoC communication.
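
The capacity bound and the credit rule can be checked with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, "free VCs" is taken to mean the VCs that currently hold no flit, and the comparison `free_slots > free_vcs` is the reading under which the worked example (16 slots, four VCs, depth from 1 to 13) comes out.

```python
def max_vc_depth(sram_slots, total_vcs):
    """Largest depth one VC can reach when one slot is reserved per VC."""
    return sram_slots - total_vcs + 1

def credit_on(vc_depth, free_slots, free_vcs):
    """Open the VC if it is empty, or if taking one more slot still leaves
    a free slot for every empty (free) VC."""
    return vc_depth == 0 or free_slots > free_vcs

# 16 slots, 4 VCs: let VC0 grow while the other three VCs stay empty.
slots, vcs = 16, 4
depth = 0
while True:
    free_vcs = vcs if depth == 0 else vcs - 1   # VC0 counts as free only while empty
    if not credit_on(depth, slots - depth, free_vcs):
        break
    depth += 1

print(depth, max_vc_depth(slots, vcs))          # 13 13
```

The loop closes VC0 at a depth of thirteen, leaving exactly one slot for each of the three empty VCs.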
Only a small amount of additional hardware is required to implement the PBVC approach in each router. In fact, the coding
of two "if statements" is required to open and close each VC, as illustrated in Fig. 9a. The first "if statement"
closes the VC when the arriving flit is a tail flit. The second "if statement" opens the VC when the
exiting flit is a tail flit. To open and close a VC, we use a single-bit register, PBVC-Blk. It is set to close the VC
and reset otherwise. The PBVC-Blk output is inverted and ANDed with the credit-out signal of the related VC.
Therefore, when PBVC-Blk is set, the VC is closed, as shown in Fig. 9b.
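
The behaviour of the PBVC-Blk register can be sketched in Python as follows. This is a behavioural model only, not the authors' Verilog. The class and argument names are hypothetical, the signal pairing (credit-in with tail for arrival, grant with tail for departure) is read from the labels of Fig. 9, and giving priority to the departure when both events fall in the same cycle is an assumption.

```python
class PbvcBlk:
    """One-bit PBVC-Blk register for a single VC (behavioural sketch)."""

    def __init__(self):
        self.blk = 0   # 0: VC open, 1: VC closed

    def clock(self, credit_in, grant, tail_in, tail_out):
        """Update the register on one clock edge.

        credit_in and tail_in : a tail flit is arriving at this VC
        grant and tail_out    : a tail flit is leaving this VC
        """
        if credit_in and tail_in:
            self.blk = 1       # first "if statement": close the VC
        if grant and tail_out:
            self.blk = 0       # second "if statement": open the VC (assumed priority)

    def credit_out(self, credit):
        """PBVC-Blk is inverted and ANDed with the credit signal of the VC."""
        return int(bool(credit) and not self.blk)


reg = PbvcBlk()
print(reg.credit_out(1))                                  # 1 : VC open
reg.clock(credit_in=1, grant=0, tail_in=1, tail_out=0)    # tail flit arrives
print(reg.credit_out(1))                                  # 0 : VC closed
reg.clock(credit_in=0, grant=1, tail_in=0, tail_out=1)    # tail flit leaves
print(reg.credit_out(1))                                  # 1 : VC open again
```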
The PBVC and CWVC routers are also implemented on two platforms: SYNOPSYS and FPGA. The SYNOPSYS
platform uses 0.25μ technology to determine the power and area overhead of implementing the PBVC approach. Both
routers use a dual-ported SRAM for data buffering. The SRAMs have 16 slots with 16 bits in each slot. The
power and chip area values of both routers are listed in Table 1. The PBVC router requires an extra 0.006% chip area
and consumes 0.24% more power than CWVC. We have also investigated the amount of extra hardware for the PBVC
router when implemented on the FPGA platform. The PBVC router requires 0.4% more combinational logic and
0.9% more registers as compared to the CWVC router.

Fig. 6. HoL blocking due to a blocked Packet-0: (a) CWVC; (b) PBVC.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
  
 
 
 
 
 
 
 
 
 
 
 
        
Table 1. PBVC and CWVC router design.
Router Type      ASIC Area (μm2)   ASIC Power (mW)   FPGA Combinational Logic (elements)   FPGA Registers (bits)
CWVC             3420235           231               8991                                  2244
PBVC             3420435           231.6             9032                                  2264
Extra Hardware   0.006%            0.24%             0.4%                                  0.9%

Fig. 7. A 5×5 router architecture for 2D-mesh NoCs.
Fig. 8. Micro-architecture of a CWVC router input-port.
Fig. 9. Extra hardware per VC for PBVC router input-port: (a) logic for if statements; (b) logic for Credit-out.

4. Experimental results
The conventional DAMQ organization with one slot reserved per VC is employed in the CWVC mechanism, and the
same terminology is used. We simulated two different traffic patterns: Random and HoL specific traffic. For
Random traffic, all the destination cores are chosen randomly. HoL specific traffic creates a situation with more HoL
blocking. By evaluating the results of these traffic patterns, the efficiency of our PBVC approach is
demonstrated in this section. We set up our simulator for the PBVC and CWVC modes with a DAMQ Link-List based
router and input-port micro-architecture. We change the number of traffic packets and VCs to measure the
performance in terms of throughput and latency. The NoC topology selected is a 4×4 mesh with XY routing.
Packets are communicated using wormhole switching, where the channel width is equal
to the flit size (32 bits). Each packet is made of 16 flits and the buffer depth of each input-port is 16 slots of one
flit each. Each flit is sent or received by a source core/router in two clock cycles. We assume that the link
delays between routers are negligible as compared to the router delay. The performance of PBVC is compared with
the CWVC approach in the following experiments, and the results are presented in Figures 10, 11, 12 and 13.
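
For readers unfamiliar with XY routing, a minimal Python sketch of the dimension-order rule follows. A packet first travels along the X dimension until it reaches the destination column and then along the Y dimension. The `(x, y)` coordinate convention and the function name `xy_route` are assumptions made for illustration; the paper does not specify its node numbering.

```python
def xy_route(src, dst):
    """Return the (x, y) routers visited from src to dst using XY
    (dimension-order) routing: move along X first, then along Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

print(xy_route((0, 0), (2, 3)))
# [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (2, 3)]
```

Because every packet resolves X before Y, the route between a given source and destination is fixed, so a blocked packet cannot be steered around congestion by the routing function itself.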
In the case of Random traffic, all the sources, destinations and routers are clocked at the same rate (e.g. with a 1 ns clock period). In
the second experiment, the HoL specific traffic pattern is employed, where all the source cores send their first packet
to one destination. That destination is set to be two times slower than the other destination cores. After sending the
first packet, the rest of the packets of all the sources are sent randomly to all the destination cores. This condition
increases HoL blocking, especially when the second or later packets are transferred. Our simulator is coded in
Verilog and the simulation is run using ModelSim for the Altera FPGA platform. In these experiments, the
throughput is measured as the ratio of the number of packets received to the maximum number of packets (ideally) sent by a specific
time. The latency is measured as the time taken for a specific number of packets to be sent and received by the NoC.
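
The throughput metric can be illustrated with a short Python sketch. It rests on several assumptions that the text does not state explicitly: all 16 nodes of the 4×4 mesh act as sources, the clock period is 1 ns, and "ideally sent" means every source injects packets back to back without stalls. The function names are hypothetical, and the value of 256 received packets is an arbitrary input, not a result from the experiments.

```python
def ideal_packets_sent(time_ns, sources=16, flits_per_packet=16,
                       cycles_per_flit=2, clock_ns=1.0):
    """Upper bound on packets injected by time_ns if every source sends
    back to back with no stalls (an assumed reading of 'ideally sent')."""
    packet_time_ns = flits_per_packet * cycles_per_flit * clock_ns
    return sources * int(time_ns // packet_time_ns)

def throughput(packets_received, time_ns, **kwargs):
    """Ratio of packets received to the ideal number of packets sent."""
    ideal = ideal_packets_sent(time_ns, **kwargs)
    return packets_received / ideal if ideal else 0.0

print(ideal_packets_sent(2048))   # 1024
print(throughput(256, 2048))      # 0.25
```

With 16 flits per packet and two clock cycles per flit, one packet takes 32 cycles to inject, which gives 64 packets per source in 2048 ns under these assumptions.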
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
      
Fig. 10. Throughput for random traffic (throughput as rate of receiving versus time in ns; CWVC and PBVC with 2, 3 and 4 VCs).
Fig. 11. Average latency for random traffic (average latency in ns versus packets sent; CWVC and PBVC with 2, 3 and 4 VCs).
Fig. 12. Throughput for HoL specific traffic (same axes and series as Fig. 10).
Fig. 13. Average latency for HoL specific traffic (same axes and series as Fig. 11).
Figures 10 and 11 show the throughput and latency results in the case of Random traffic. At the beginning of the
simulation (around 132 ns), the performance of PBVC is much higher than that of CWVC, and as time passes
this advantage diminishes. This is due to the fact that at the beginning of the simulation the traffic is not crowded, and
when HoL blocking occurs in a channel, the incoming packet can move out of the channel. This situation
improves the performance of the PBVC approach. In the case of four virtual channels (i.e. the VC4 case in Figures 10 and
11), the throughput of PBVC is 2.6% higher and the latency is 8.2% lower than those of CWVC. In the second part
of the experiment, both models are evaluated in a high contention environment.
Figures 12 and 13 show the throughput and latency results for the HoL specific traffic, where two conditions
apply. In the first condition, the first packets of all the sources are intended for one destination, and afterward
the packets travel to all the destinations randomly. In the second condition, that destination is two times slower than
the other destinations. In this traffic pattern, the performance improvement of PBVC over CWVC is much larger. For
1024 packets, the average latency of PBVC is 40% lower than that of CWVC, and the average throughput is 23% higher than that of CWVC
for 2048 ns. This is due to the large amount of HoL blocking occurring at the beginning of the simulation. As time passes, the
occurrence of HoL blocking reduces, and the throughputs of the two methods approach each other.
Another important point is that as the number of VCs is reduced to two, the advantage of PBVC also diminishes.
This is due to the fact that when HoL blocking occurs in PBVC and there are free VCs, the new packet passes
through these free VCs, which improves the performance.
5.   Conclusions  
The architecture and structure of the packet-based virtual channel (PBVC) approach are presented. The PBVC buffer has
its roots in dynamically allocated multi-queue (DAMQ) buffers. We conclude that PBVC is able to completely
remove HoL blocking in NoCs. To verify our claims, PBVC and the conventional wormhole VCs (CWVC) are
implemented using DAMQ buffers. In the experiments, two traffic patterns, i.e. random and HoL specific traffic, are
applied to PBVC and CWVC based NoCs. Throughput and latency for the PBVC based NoC are compared with those
of the CWVC based NoC. The performance results are obtained for varying numbers of VCs and traffic conditions. The
PBVC results are better on average as compared to CWVC. In the case of HoL specific traffic, the average
latency and throughput improve for our PBVC approach as compared to the traditional CWVC.
References 
1. Yoo HJ, Lee K, Kim JK. Network-on-Chip based SoC. In: Low-Power NoC for High-Performance SoC Design. Boca Raton: CRC Press;
2008. p. 142–5.
2. Nicopoulos CA, Dongkook P, Jongman K, Vijaykrishnan N, Yousif MS, Das CR. ViChaR: A dynamic virtual channel regulator for 
Network-on-Chip routers. In Proc. 39th Annual IEEE/ACM International Symposium on Microarchitecture, 2006. p. 333–46.  
3. Dally WJ.  Virtual-channel flow control. IEEE Transactions on Parallel and Distributed Systems, 1992; 3:194–205. 
4. Frazier GL, Tamir Y. The design and implementation of a multiqueue buffer for VLSI communication switches. In Proc. IEEE International 
Conference on Computer Design: VLSI in Computers and Processors, 1989. p. 466–71. 
5. Liu J, Delgado-Frias JG. DAMQ self-compacting buffer schemes for systems with Network-on-Chip. In Proc. International Conference on 
Computer Design, 2005.  p. 97–103.  
6. Tamir Y, Frazier GL. Dynamically-Allocated Multi-Queue Buffers for VLSI Communication Switches. IEEE Transactions on Computers, 
1992; 41:725–37. 
7. Park J, O’Krafka BW, Vassiliadis S, Delgado-Frias J. Design and evaluation of a DAMQ multiprocessor network with self-compacting 
buffers. In Proc. Supercomputing, 1994. p. 713–22.  
8. Benini L, Micheli GD. Register designs for queuing buffer. In: Networks on Chips: Technology And Tools. San Francisco: Morgan Kaufmann
Publishers; 2006.
9. Choi Y, Pinkston TM. Evaluation of queue designs for true fully adaptive routers. Journal of Parallel and Distributed Computing, 2004; 
64:606–16. 
10. Xu Y, Zhao B, Zhang Y, Yang J. Simple virtual channel allocation for high throughput and high frequency on-chip routers. In Proc. 
International Symposium on High Performance Computer Architecture, 2010. p. 1–11.  
11. Evripidou M, Nicopoulos C, Soteriou V, Kim J. Virtualizing virtual channels for increased Network-on-Chip robustness and upgradeability.
In Proc. IEEE Computer Society Annual Symposium on VLSI, 2012. p. 21–6.
12. Zhang H, Wang K, Dai Y, Liu L. A Multi-VC Dynamically Shared Buffer with Prefetch for Network on Chip. In Proc. IEEE 7th 
International Conference on Networking, Architecture and Storage, 2012. p. 320–7. 
13. Oveis-Gharan M, Khan GN. A novel virtual channel implementation technique for multi-core on-chip communication. In Proc. IEEE Symp. 
Computer Architecture and High Performance Computing (WAMCA’12), 2012. p. 36–40. 
