Abstract
Introduction
The continuous scaling of transistors and the increase in their operating frequency have increased the overall power consumption of the chip. The growth in chip complexity, in accordance with Moore's Law, has also led to increased power densities within the die, as shown in Figure 1. Power densities over 100 W/cm² have become a matter of concern, as the frequency of operation has to be compromised in order to keep the power dissipation under a sustainable limit.
Figure 1: Power density of Intel microprocessor families (http://www.nanowerk.com/spotlight/spotid=1762.php)
The basic concepts of NoC are provided in (Micheli et al, 2006), (Nurmi, 2005), (Dally et al, 2004), (Dally et al, 2001) and (Rijpkema, 2001). Different arbitration schemes are discussed in (Kariniemi, 2004), and virtual channels are discussed in (Kavaldjiev, 2004). Arbiter-centric designs were exploited in (Santhosh, 2015), (Jou et al, 2010), (Wang et al, 2010) and (Teja et al, 2007). Monemi et al, 2017 presented various types of arbiters, e.g. the Thermo Coded arbiter, the PingPong arbiter and their proposed Ping-Lock arbiter. Shelke et al, 2012 discussed clock gating techniques for designing a power-efficient 2-D mesh NoC. Dynamic power reduction of digital circuits by clock gating was implemented in (Kaushik et al, 2013). (Simunic et al, 2004) and (Lee et al, 2014) showed techniques for reducing NoC power by tracking changes in the system parameters and by Smart Power Saving, respectively.
Switching activity in a circuit leads to dynamic power dissipation. A higher frequency of operation is a major source of switching activity and therefore increases dynamic power dissipation. The clock signal is a major contributor to dynamic power dissipation. Clock gating is a power-saving technique used in synchronous circuits to reduce this dynamic power by disabling the clock to idle logic.
Figure 2 shows the traditional buffered wormhole router microarchitecture. The virtual channel (VC) allocator reserves a buffer slot in the next router. The switch allocator chooses one of the VCs from each input port of the router and arbitrates among the selected channels for output port selection. These allocators are composed of arbiters, each of which grants one request out of many. In this paper, we exploit the round-robin arbiter for reducing power, as this arbiter is strongly fair to all input requests.
Literature Review
Various implementations of arbiters have been reported in the literature. In (Artan et al, 2009), the authors proposed a hierarchical structure for the round-robin arbiter and showed that, as the number of inputs increases, its energy consumption is lower than that of conventional designs. Shin et al, 2002 presented an implementation of a round-robin arbiter. Different round-robin algorithms are discussed in (Mckeown et al, 1999), which presents the iterative round-robin algorithm (iSLIP), and in (Chao et al, 1998), which presents the dual round-robin matching (DRRM) algorithm. More than 50% of the dynamic power may be due to the clock buffers, whose toggle rate is the highest among all components (Kaushik et al, 2013). In that work, the authors supply the clock to the sequential circuits through an AND gate, where one input of the gate is the clock and the other comes from a controlling circuit. The same concept is used in this paper for clock gating.
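To make the link between clock toggling and dynamic power concrete, the following is a minimal sketch based on the standard CMOS switching-power estimate P = alpha * C * Vdd^2 * f. It is written in Python so that it can be run directly and is not taken from the cited work. The function name and every numeric value are hypothetical, chosen only for illustration, and the sketch assumes that gating simply scales the clock activity factor by the fraction of cycles in which the enable is high.

```python
def dynamic_power(alpha, c_farads, vdd_volts, f_hz):
    """Standard CMOS switching-power estimate: P = alpha * C * Vdd^2 * f."""
    return alpha * c_farads * vdd_volts ** 2 * f_hz


# Hypothetical values for illustration only (not measurements from the paper).
C_CLK = 2e-12       # switched clock capacitance, in farads
VDD = 1.0           # supply voltage, in volts
F_CLK = 100e6       # clock frequency, in hertz
ENABLE_DUTY = 0.25  # assumed fraction of cycles in which the block is enabled

# An ungated clock net charges once per cycle, so its activity factor is 1.
ungated = dynamic_power(1.0, C_CLK, VDD, F_CLK)

# With AND-gate clock gating the register only sees the clock while enabled,
# which is modelled here as scaling the activity factor by the enable duty.
gated = dynamic_power(1.0 * ENABLE_DUTY, C_CLK, VDD, F_CLK)

saving = 1.0 - gated / ungated
print(f"ungated: {ungated:.3e} W, gated: {gated:.3e} W, saving: {saving:.1%}")
```

Under these assumptions the saving depends only on how often the enable is low; the estimate ignores the power of the gating logic itself and any static power.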
Round Robin Arbiter
A round-robin arbiter is a fair arbiter which, after granting a request, assigns the lowest priority to that request. This can be accomplished by generating the next priority vector p from the current grant vector g. In Verilog, this logic is given by:
assign next_p = |g ? {g[n-2:0],g[n-1]} : p ;
Consider a four-bit round-robin arbiter. If a grant was issued on the current cycle, one of the g[i] lines will be high, causing p[i+1] to go high on the next cycle. This gives the request next to the one receiving the grant the highest priority on the next cycle, and gives the request that received the grant the lowest priority. If no grant is asserted on the current cycle, all g lines are low and the priority generator holds its present state. The round-robin arbiter exhibits strong fairness: after a request is served, it is given the lowest priority, and all other pending requests are serviced before priority rotates around so that it can be serviced again.
Figure 3 shows the block diagram of a 4X4 round robin arbiter. Figure 4 shows the clock gating technique for the counter, in which one AND gate is inserted. Figure 5 shows the output of the counter. It can be observed from Figure 5 that, when the counter is positive-edge triggered and the enable signal changes between one positive edge and the next, a small glitch on the gated clock makes the counter increment one extra time, which gives a wrong output. In Figure 6, the enable signal is therefore applied through a latch. However, the delay of the logic that computes En may fall on the critical path of the circuit, and its effect must be taken into account during timing verification. The circuit gates the clock of a negative-edge counter using a negative-latch-based AND gate. The corrected waveform obtained with the latch is shown in Figure 7. Figure 8 illustrates the waveforms of the former (AND gate only) and the latter (latch-based) approach.
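The extra count caused by a glitch on the gated clock, and its removal by a latch, can be reproduced with a small sampled-waveform model. The sketch below is in Python so that it can be run directly. It is not the authors' circuit: it assumes a positive-edge counter, an enable latch that is transparent while the clock is low, and four samples per clock period. The function names and the waveforms are hypothetical.

```python
def rising_edges(wave):
    """Count 0 -> 1 transitions in a sampled waveform."""
    return sum(1 for a, b in zip(wave, wave[1:]) if a == 0 and b == 1)


def and_gated_clock(clk, en):
    """Plain AND gating: any enable change while clk is high reaches the output."""
    return [c & e for c, e in zip(clk, en)]


def latch_gated_clock(clk, en):
    """Latch-based gating (assumed latch: transparent while clk is low).

    The enable is frozen during the high phase, so the gated clock can only
    rise together with the real clock edge.
    """
    gated, held = [], 0
    for c, e in zip(clk, en):
        if c == 0:
            held = e  # latch is transparent, follow the enable
        gated.append(c & held)
    return gated


# Four samples per period: high, high, low, low (rising edges at 4, 8, 12).
clk = [1, 1, 0, 0] * 4
# Hypothetical enable that rises and falls in the middle of a high phase.
en = [0] * 5 + [1] * 8 + [0] * 3

and_count = rising_edges(and_gated_clock(clk, en))
latch_count = rising_edges(latch_gated_clock(clk, en))
print("edges seen by counter, AND gate only:", and_count)
print("edges seen by counter, latch + AND  :", latch_count)
```

In this model the enable is high at two real clock edges, so a correct counter should advance twice. The plain AND gate also passes the partial pulse created when the enable rises during the high phase, which is the extra increment described above; the latch-based version does not.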
Implementation
Round-robin token passing provides strong fairness. The worst-case wait time is proportional to the number of requestors minus one. In each cycle, one of the masters (in round-robin order) has the highest priority (i.e., owns the token) for access to a shared resource.
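The token rotation and the resulting worst-case wait can be checked with a short behavioural model. The sketch below is in Python for easy execution and is not the synthesized design. It mirrors the priority update from the Round Robin Arbiter section (rotate the grant vector by one position when a grant is issued, otherwise hold the priority) and assumes a one-hot priority vector stored as a list whose index i holds bit i. All function names are hypothetical.

```python
def rotate_left(bits):
    """Model of {g[n-2:0], g[n-1]}: bit i moves to position i+1, with wrap-around."""
    return [bits[-1]] + bits[:-1]


def arbitrate(requests, priority):
    """Grant the first asserted request at or after the one-hot priority position."""
    n = len(requests)
    start = priority.index(1)
    grant = [0] * n
    for offset in range(n):
        i = (start + offset) % n
        if requests[i]:
            grant[i] = 1
            break
    return grant


def next_priority(grant, priority):
    """Rotate the grant when one was issued, otherwise hold the present state."""
    return rotate_left(grant) if any(grant) else priority


def run(requests_per_cycle, n):
    """Return the index granted in each cycle (None when nothing is requested)."""
    priority = [1] + [0] * (n - 1)  # requester 0 owns the token first
    winners = []
    for requests in requests_per_cycle:
        grant = arbitrate(requests, priority)
        winners.append(grant.index(1) if any(grant) else None)
        priority = next_priority(grant, priority)
    return winners


# All four requesters keep their request asserted for eight cycles.
winners = run([[1, 1, 1, 1]] * 8, n=4)
print(winners)
```

With every requester active, each one is granted once per rotation, so a requester that has just been served waits for the other n - 1 requesters before it is served again, which matches the worst-case wait stated above.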
Simulation Results
The blocks of the clock-gated round-robin arbiter are written separately in Verilog. Table 2 and Figure 11 show the power dissipation of the different techniques. The results show that this clock gating technique has a significant effect on the power consumed, especially when a latch is inserted in the logic path. With the latch, the power saving compared to the traditional arbiter is 65.20%. The simulation results also show that, as expected, this technique has little or even a negative effect on static power but a stronger effect on dynamic power.
Conclusion & Future Scope
This paper has provided an integrated solution for the design of a 4X4 clock-gating round-robin arbiter generator (C-RAG). Clock gating can reduce the clock switching power consumed by flip-flops. The BA generated using C-RAG is low power, fair, fast, and has a low and predictable worst-case wait time. The clock gate enable functions can be identified by Boolean analysis of the logic inputs of the C-RAG. However, the enable functions of the clock gates can be further simplified, and the average number of C-RAG blocks driven by the enable functions can be improved. The 4X4 C-RAG design was simulated using ISE Design Suite 14.2.
Each block of the design is modelled in Verilog, logically verified, and synthesized. This paper addresses power optimization only; it does not report the effect of the extra circuitry on the area or on the performance of the overall network. Further work is needed in these aspects to assess the effect on area as well as on network throughput and delay.
