I. INTRODUCTION
Design complexity in modern digital systems is increasing steadily. Owing to their reconfigurable architecture, low non-recurring engineering (NRE) cost and ease of design, Field Programmable Gate Arrays (FPGAs) have become an attractive solution for managing this complexity. They are well suited to applications that require both performance and flexibility.
Over the past few years, FPGAs have gained market share by providing a fast time-to-market alternative to ASICs. Because of scaling trends and the need to support reconfigurability, FPGAs use more transistors, which increases the leakage current. Since leakage power is proportional to the total transistor count, leakage optimization is one of the major design challenges for future FPGA technologies.
The biggest challenge for FPGAs implemented in nanometer CMOS technologies is increasing power dissipation, in particular leakage power. In addition, as technology scales down, ground bounce noise becomes a metric of comparable importance to active power, delay and area in the analysis and design of battery operated devices.
Traditionally, leakage power reduction in FPGAs has been overshadowed by interest in reducing dynamic power dissipation and improving overall performance. Recently, several research projects have addressed leakage power in FPGAs. The most popular techniques employ dual Vdd, transistor sizing, dual Vth, body biasing, multithreshold CMOS (MTCMOS), and input vector forcing.
Shortening the gate length of a transistor increases its power consumption because of the increased leakage current between the transistor's source and drain when no signal voltage is applied at the gate [1], [2]. In addition to the subthreshold leakage current, gate tunneling current also increases as the gate oxide thickness is scaled. Each new technology generation results in a nearly 30x increase in gate leakage [2], [4]. Leakage power is expected to reach more than 50% of total power in the sub-100nm technology generation [5]. Hence, it has become extremely important to develop design techniques that reduce static power dissipation during periods of inactivity. The power reduction must be achieved without trading off performance, which makes it harder to reduce leakage during normal (runtime) operation. On the other hand, several techniques are available to reduce leakage power [6].
Power gating is one such well-known technique, in which a sleep transistor is added between the actual ground rail and the circuit ground (called virtual ground) [7], [8], [9], [10]. This device is turned off in sleep mode to cut off the leakage path. It has been shown that this technique provides a substantial reduction in leakage with minimal impact on performance [11], [12], [13], and a further reduction in the peak of ground bounce noise is possible with the proposed technique.
The biggest challenge for FPGAs implemented in nanometer CMOS technologies is increasing power dissipation, in particular leakage power, together with ground bounce noise. Leakage power dissipation increases exponentially with CMOS process scaling and is expected to dominate total chip power dissipation in deep submicron CMOS processes. To mitigate the leakage problem, power gating is widely used to suppress leakage power in standby mode. Sleep transistors (ST), which are used to gate the power, degrade the noise characteristics of the circuit because their drain-to-source voltage drop shifts the virtual rails of the circuit [14, 15, 16]. Furthermore, during mode transitions, especially from sleep mode to active mode, power gating schemes cause large power and ground bounce that greatly affects the reliability of nearby circuits in a mixed-signal design. It is therefore essential to address the problem of ground bounce when power gating is used in low-voltage CMOS circuits [17]-[21].
Ground bounce noise is the voltage induced at the internal power/ground connections by switching currents passing through the parasitic inductance of the bonding and package lead wiring. Traditionally, it has been associated with input/output buffers, which typically drive large capacitive loads: large currents flow through the parasitic inductance and induce significant voltage glitches at the internal power/ground connections. Switching noise affects the performance of integrated circuits. Given technology trends, ground bounce due to internal logic has become an important issue in the design of high-performance integrated circuits, mainly because of the increased speed and higher density of scaled-down technologies.
Devices targeted for mobile applications typically require standby currents in the range of tens to hundreds of µA, whereas current low-cost FPGAs consume up to 100mW of standby power. If the leakage power of FPGAs can be decreased, they can be used in fast-growing mobile IC applications. With proper leakage power and ground bounce noise analysis of different deep submicron FPGA devices, they can also be used in low-power wireless, biomedical and other battery operated applications. This paper focuses on reducing the leakage power consumption and ground bounce noise of a low-power FPGA benchmark circuit using a power gating scheme. We provide a design that achieves a significant reduction in standby leakage and ground bounce noise so that it can be used in battery operated devices.
The rest of the paper is organized as follows. Section II discusses the background and related contributions in the field of FPGAs. Section III compares the enhanced high-performance power gating techniques in terms of ground bounce noise, standby leakage power, average power and delay. Section IV provides simulation results for the presented power gating techniques and a performance analysis of the LUT on different FPGA devices. Finally, Section V concludes the paper.
II. RELATED WORK
A variety of power reduction techniques have been proposed in the literature. The basic idea of leakage power is described in [22], where a detailed leakage power analysis of a low-cost 90nm FPGA is carried out by device-level simulation with a methodology that accounts for the design. The paper describes the percentage of resource utilization in the FPGA and the total power consumption of a particular configurable logic block (CLB). Many recent works also describe FPGA power consumption [23]-[26] and show that the power consumed by current FPGA devices is increasing, with such devices consuming watts of power. In [27] the authors introduce a high-threshold sleep transistor into the N-network (or P-network) of CMOS gates. The sleep transistors are turned ON when the circuit is in active mode and OFF when the circuit is in standby mode, thereby decreasing the power of the circuit.
In [28] the author provides a detailed performance analysis of a low-power, high-speed DTMOS-based 4-input multiplexer switch circuit for FPGA applications. All the required transistors are sized to achieve improved delay and an optimum power-delay product (PDP), so that the circuit can be used in fast-growing low-power, high-performance applications.
In [29] the basic idea of DTMOS operation for ultra-low-voltage VLSI circuits is described. The drawback of the paper is that the DTMOS threshold voltage drops as the gate voltage is raised, which results in a much higher current drive than a conventional MOSFET. In [30] a high-speed buffer circuit is used to minimize delays; the transistors are sized to minimize the propagation delay through the switch and to balance the rising and falling transition times.
The approach in [31] is based on the fact that a circuit's leakage depends on its input state. A specific input vector that minimizes leakage power in the circuit is identified, and this vector is applied to the circuit inputs when the circuit is placed in standby mode. This method reduces leakage in some circuits by up to 70%.
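The input-vector idea can be illustrated with a minimal sketch. It assumes an exhaustive search over all input vectors, which is only practical for small circuits; the function name `min_leakage_vector` and the leakage values in `TOY_LEAKAGE` are hypothetical placeholders in arbitrary units, not data from [31] or from this paper.

```python
from itertools import product

def min_leakage_vector(leakage_of, n_inputs):
    """Return (vector, leakage) with the smallest leakage over all 2**n_inputs vectors."""
    best = min(product((0, 1), repeat=n_inputs), key=leakage_of)
    return best, leakage_of(best)

# Hypothetical leakage table (arbitrary units) for a 2-input gate.
# The numbers only illustrate that leakage depends on the input state.
TOY_LEAKAGE = {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 2.5, (1, 1): 4.0}

vector, leak = min_leakage_vector(TOY_LEAKAGE.__getitem__, 2)
print(vector, leak)  # (0, 0) 1.0
```

In practice `leakage_of` would be backed by a circuit simulator or a characterized leakage model rather than a lookup table.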
In [32] a detailed performance analysis of a low-power, high-speed LUT is carried out using a circuit technique. All the sleep transistors in the LUT are properly sized to achieve an optimum power-delay relationship. This design saves 12.8% of average power in high-speed mode and 56.7% in low-power mode. In this paper we focus on power gating techniques that improve leakage and ground bounce noise so that FPGAs can be used in battery operated devices.
III. LOW LEAKAGE LOW GROUND BOUNCE NOISE REDUCTION AWARE POWER GATING TECHNIQUES
A. Basic LUT Structure of Benchmark Circuit
Fig. 1 shows the basic LUT design of the benchmark circuit 74182. This design is taken as the conventional case for all comparisons. The remainder of this section presents the power gating techniques used to reduce leakage current and ground bounce noise.
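For readers unfamiliar with the benchmark, which this paper describes as a carry look-ahead circuit, the following minimal sketch shows generic textbook 4-bit carry look-ahead logic. It is an assumption-laden illustration: signals are active high, the function names are hypothetical, and it reproduces neither the 74182 pinout nor the LUT mapping of fig. 1.

```python
def lookahead_carries(g, p, c_in):
    """Carries c1..c4 from generate/propagate bits (active high, LSB first)."""
    c1 = g[0] | (p[0] & c_in)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c_in)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c_in)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c_in))
    return [c1, c2, c3, c4]

def add4(a, b, c_in=0):
    """4-bit addition whose carries are all computed directly from g, p and c_in."""
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    g = [x & y for x, y in zip(a_bits, b_bits)]   # generate
    p = [x ^ y for x, y in zip(a_bits, b_bits)]   # propagate
    carries = [c_in] + lookahead_carries(g, p, c_in)
    total = sum((p[i] ^ carries[i]) << i for i in range(4))
    return total | (carries[4] << 4)

print(add4(9, 7))  # 16
```

Each carry is a two-level Boolean function of the inputs, which is the kind of function a look-up table stores directly.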
Stacking power gating technique
Here we present the benchmark circuit (fig. 2) using the stacking power gating technique. In this technique, stacked sleep transistors are used to reduce the magnitude of peak current and voltage glitches in the power rails, i.e. ground bounce noise. The leakage current is reduced by the stacking effect when both sleep transistors MSL1 and MSL2 are turned off. The SELECT input is applied in such a way that the ground bounce noise is minimized. This is achieved by adjusting ∆T, the delay introduced into the SL signal by a delay buffer, so that the sum of the ground bounce noise of the two transistors is minimized. When ∆T is half of the oscillation period of the ground bounce noise, the positive peak of one noise waveform is superimposed on the negative peak of the other, bringing the sum closer to zero.
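The effect of choosing ∆T equal to half the oscillation period can be sketched numerically. The model below is an assumption made for illustration only: each transistor's ground bounce is represented by an identical damped sinusoid, and the period, damping constant `tau` and function names are hypothetical rather than extracted from the simulated circuit.

```python
import math

def bounce(t, period, tau, amp=1.0):
    """Damped sinusoid standing in for one transistor's ground bounce (zero before t = 0)."""
    if t < 0:
        return 0.0
    return amp * math.exp(-t / tau) * math.sin(2 * math.pi * t / period)

def peak_of_sum(delay, period=1.0, tau=5.0, steps=4000, span=10.0):
    """Peak |v1(t) + v2(t - delay)| sampled uniformly over [0, span]."""
    return max(
        abs(bounce(t, period, tau) + bounce(t - delay, period, tau))
        for t in (span * k / steps for k in range(steps + 1))
    )

aligned = peak_of_sum(delay=0.0)     # both transistors switched together
staggered = peak_of_sum(delay=0.5)   # second transistor delayed by half a period
print(staggered < aligned)  # True
```

In this toy model the two waveforms are in anti-phase once the second transistor turns on, so only the first half period of the first waveform remains largely uncancelled.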
Diode based stacking power gating technique
Operating the sleep transistor as a diode within stacking power gating leads to diode based stacking power gating [15]-[19]. The stacked sleep transistors (T1, T2) in the diode based stacking power gating scheme shown in fig. 3 reduce the magnitude of peak current and voltage glitches in the power rails, i.e. ground bounce noise.
Fig. 3: Look Up Table with diode based stacking power gating technique
The diode based stacking power gating scheme consists of 5 parts:
• Transistors T1 and T2 are the sleep transistors; they are high-Vt transistors for lower leakage current.
• Transistor S1 is a control transistor used to make the sleep transistor T1 work as a diode during mode transition.
• TG1 is the transmission gate.
• Tn is the time delay provided for T1 and T2.
• C2 is the capacitor inserted at the intermediate node VGND2.
In this scheme, 3 strategies are used to reduce the peak of ground bounce noise and the leakage current:
1. Making the sleep transistor work as a diode for some period of time during mode transition; this limits the large transient current and hence reduces the peak of ground bounce noise.
2. Isolating the ground for a small duration during mode transition; this is achieved by the delay circuitry.
3. Turning on transistor T2 in the linear region instead of the saturation region to decrease the current surge; this is achieved by the capacitor placed at the intermediate node.
There are several benefits of combining stacked sleep transistors with capacitors. First, the magnitude of power supply voltage fluctuations (ground bounce noise) during mode transitions is reduced because these transitions are gradual. The leakage current is also reduced by the stacking effect when both sleep transistors T1 and T2 are turned off. In terms of the peak of ground bounce noise, the technique works in two stages:
1. In the first stage, sleep transistor T1 works as a diode: the control transistor S1, which is connected across the drain and gate of T1, is turned on. This reduces the voltage fluctuation on the ground and power nets and also reduces the circuit wakeup time. In the sleep-to-active transition, T1 is turned on first, and after a small duration T2 is turned on to reduce the ground bounce noise.
2. In the second stage, the control transistor is turned off so that the sleep transistor works normally.
During mode transition, T1 is turned on and T2 is turned on after a small delay Tn.
Diode based staggered phase damping power gating technique
When the circuit goes from sleep to active mode, a two-stage procedure is followed. The procedure is the same for both sleep transistors, but the second is operated with a delay of half the oscillation period: we delay the activation time of one sleep transistor relative to the other by a time equal to half the resonant oscillation period. In stage I, the transmission gate of SG1 is turned off and the sleep control signal is cut off, so the input node of the sleep transistor SG1 is a floating node. At the same time, the control transistor CT1 is turned on to make SG1 work as a diode. The charge stored in cluster1 is discharged through SG1.
We follow the same procedure for the sleep transistor SG2. The noise induced by the first sleep transistor SG1 is similar to that induced by the second sleep transistor, but with a phase shift. This phase shift suppresses the overall power mode transition noise. The same signals are applied to cluster2, delayed by half of the calculated oscillation period. As a result, noise cancellation occurs once the second sleep transistor SG2 turns on, because of the phase shift between the noise induced by the two transistors, and the peak of ground bounce noise is reduced.
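The staggered wake-up sequence can be written out as an ordered list of control events. This is a minimal sketch under stated assumptions: the function name and the diode-mode duration `t_diode` are hypothetical, and the second stage (control transistor off, sleep transistor operating normally) is assumed to mirror the two-stage description given for diode based stacking.

```python
def wakeup_schedule(t_diode, half_period):
    """Ordered (time, event) pairs for a two-cluster staggered wake-up."""
    events = []
    for name, offset in (("SG1", 0.0), ("SG2", half_period)):
        # Stage I: gate is floated and the sleep transistor is diode-connected.
        events.append((offset, f"{name}: transmission gate off, control transistor on (diode mode)"))
        # Stage II (assumed): control transistor released, normal operation.
        events.append((offset + t_diode, f"{name}: control transistor off, sleep transistor operates normally"))
    return sorted(events)

for time, event in wakeup_schedule(t_diode=1.0, half_period=4.0):
    print(time, event)
```

The only coupling between the two clusters is the fixed offset of half the oscillation period, which is what produces the phase shift between their noise waveforms.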
IV. PERFORMANCE ANALYSIS AND SIMULATION RESULTS
The simulation setup was built for the look up table (LUT) of benchmark circuit 74182 (carry look ahead adder) to characterize the peak of ground bounce noise, standby leakage current and average power with different power gating techniques. The look up table of benchmark circuit 74182 was designed using the techniques described above and compared with the conventional case. All simulation results were obtained with the Cadence Virtuoso analog design environment in 45nm technology. The detailed simulation setup was:
1) Tool - Cadence
2) Module - Virtuoso Analog Design Environment
3) Simulator - Spectre
4) Technology - 45nm
5) Technology File - gpdk045
6) Power supply - 0.7V
Average Power
Table I shows the average power of the LUT design for the different power gating techniques. The diode based stacking power gating technique has the lowest average power. Here the conventional case means the LUT without a sleep transistor. The reduction relative to the conventional case is almost 99.6%, the largest among the power gating techniques considered.
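The percentage reductions quoted in this paper follow from a simple ratio. As a worked check, the sketch below uses the average power figures reported in Section V for the Spartan-3A device (2388nW in conventional mode and 9.22nW with diode based stacking); the helper name `percent_reduction` is hypothetical.

```python
def percent_reduction(baseline, value):
    """Reduction of `value` relative to `baseline`, in percent."""
    return 100.0 * (baseline - value) / baseline

# Conventional mode vs. diode based stacking, average power in nW.
print(round(percent_reduction(2388.0, 9.22), 2))  # 99.61
```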

Standby Leakage Current
The standby leakage current of the LUT of benchmark circuit 74182 is evaluated in this section. Standby leakage is measured when the circuit is in standby mode. The sleep transistor is connected to the pull-down network of the benchmark circuit and is turned off by applying 0V to its input. Standby leakage is measured for different input combinations of the circuit. Standby leakage current is greatly reduced in the diode based staggered phase damping technique. Table I compares the leakage current of the different power gating techniques. Transistor stacking is a very effective way to reduce standby leakage current in the various power gating techniques.
Ground Bounce Noise
During the power mode transition, an instantaneous charge current passes through the sleep transistor, which is operating in its saturation region, and creates current surges elsewhere. Because of the self-inductance of the off-chip bonding wires and the parasitic inductance inherent to the on-chip power rails, these surges result in voltage fluctuations in the power rails. If the magnitude of the voltage surge is large enough, the circuit may erroneously latch to the wrong value or switch at the wrong time. Ground bounce noise can be reduced by limiting the large transient current flowing through the sleep transistors during mode transition. Table I shows the ground bounce noise for the different power gating techniques. The results show that ground bounce noise is reduced substantially in all of the power gating techniques.
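The dependence of the voltage fluctuation on the current surge is commonly expressed with the standard inductor relation. The symbols below are introduced here for illustration and do not appear in the original text: $L_{\mathrm{par}}$ is the combined bonding-wire and rail inductance, and $i_{\mathrm{ST}}$ is the current through the sleep transistor.

```latex
v_{\mathrm{gb}}(t) = L_{\mathrm{par}} \, \frac{\mathrm{d}\, i_{\mathrm{ST}}(t)}{\mathrm{d}t}
```

Under this relation, any technique that slows the rise of $i_{\mathrm{ST}}$ during wake-up lowers the peak of $v_{\mathrm{gb}}$, which is consistent with limiting the transient current as described above.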
Delay
The delay is measured between the trigger input edge reaching 50% of the supply voltage value and the circuit output edge reaching 50% of the supply voltage value. The effect of delay can be studied from Table I .
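The 50%-to-50% measurement can be sketched for sampled waveforms. This is a minimal illustration, assuming linear interpolation between samples and a single transition per waveform; the function names and the sample data are hypothetical, with the 0.7V supply taken from the simulation setup.

```python
def crossing_time(times, values, level):
    """First time the waveform reaches `level`, by linear interpolation."""
    for i in range(len(times) - 1):
        v0, v1 = values[i], values[i + 1]
        if v0 == level:
            return times[i]
        if (v0 - level) * (v1 - level) < 0:
            return times[i] + (level - v0) * (times[i + 1] - times[i]) / (v1 - v0)
    if values and values[-1] == level:
        return times[-1]
    raise ValueError("waveform never crosses the requested level")

def propagation_delay(times, v_in, v_out, vdd):
    """Delay between the input and output edges at 50% of the supply voltage."""
    half = 0.5 * vdd
    return crossing_time(times, v_out, half) - crossing_time(times, v_in, half)

t = [0.0, 1.0, 2.0, 3.0, 4.0]              # arbitrary time units
v_in = [0.0, 0.7, 0.7, 0.7, 0.7]           # rising input edge
v_out = [0.7, 0.7, 0.7, 0.0, 0.0]          # falling output edge
print(propagation_delay(t, v_in, v_out, vdd=0.7))  # 2.0
```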
Table I clearly shows that, compared with the base case, the diode based staggered phase damping technique has the minimum delay. The reduction in delay is 80.2%.
Performance Analysis of 74182 Benchmark Circuit
This section discusses the performance analysis of the LUT of benchmark circuit 74182 on four FPGA devices: Spartan-3A DSP (90nm), Virtex-5 (65nm), Virtex-6 LP (40nm) and Kintex-7 (28nm). Table II summarizes the resource utilization of the benchmark circuit on these FPGA devices. The performance analysis of leakage current, average power and ground bounce noise of the LUT on the different FPGA devices, carried out using Xilinx ISE 14.2, is shown in Table III.
The Spartan-3 FPGA belongs to the fifth generation Xilinx family. It is specifically designed to meet the needs of high-volume, low-unit-cost electronic systems and consists of five fundamental programmable functional elements: CLBs, IOBs, block RAMs, dedicated multipliers and digital clock managers (DCMs). The Spartan-3A DSP family extends the Spartan-3A FPGA family by increasing the amount of memory per logic and adding XtremeDSP DSP48A slices, which replace the 18 x 18 multipliers found in the Spartan-3A devices. The two-member family offers densities ranging from 1.8 to 3.4 million system gates; the XC3SD1800A and XC3SD3400A devices are tailored for DSP applications and have additional block RAM and XtremeDSP DSP48A slices. New features improve system performance and reduce the cost of configuration. Combined with proven 90 nm process technology, these enhancements deliver more functionality and bandwidth per dollar and address the design challenges of most high-volume, cost-sensitive, high-performance DSP applications. Spartan-3A DSP FPGAs suit applications such as blade servers, medical devices, automotive infotainment and GPS, as well as a wide range of consumer electronics applications such as broadband access, home networking, display/projection and digital television. The Spartan-3A DSP family is a superior alternative to mask programmed ASICs.
FPGAs avoid the high initial cost, lengthy development cycles, and the inherent inflexibility of conventional ASICs. Also, FPGA programmability permits design upgrades in the field with no hardware replacement necessary, an impossibility with ASICs.
The Virtex-5 devices, built on a 65nm state-of-the-art copper process technology, are a programmable alternative to custom ASIC technology [32]. The Virtex-5 family is the first FPGA platform to offer a real 6-input look up table (LUT) with fully independent inputs, which increases logic fabric performance by reducing the critical path delay through the LUTs. The family provides power-optimized high-speed serial transceiver blocks for enhanced serial connectivity, tri-mode Ethernet MACs and high-performance PPC 440 microprocessor embedded blocks. Virtex-5 devices also use triple-oxide technology to reduce static power consumption. Each Virtex-5 slice includes four LUTs that can be configured as 6-input LUTs with a 1-bit output or 5-input LUTs with a 2-bit output, along with three dedicated user-controlled multiplexers for combinational logic (F7AMUX, F7BMUX and F8MUX). Power consumption is reduced because the larger LUT reduces the amount of interconnect required.
The Virtex-6 family provides the newest, most advanced features in the FPGA market. Virtex-6 FPGAs are the programmable silicon foundation for Targeted Design Platforms that deliver integrated software and hardware components to enable designers to focus on innovation as soon as their development cycle begins. Using the third-generation ASMBL™ (Advanced Silicon Modular Block) column-based architecture, the Virtex-6 family contains multiple distinct sub-families. This overview covers the devices in the LXT, SXT, and HXT sub-families. Each sub-family contains a different ratio of features to most efficiently address the needs of a wide variety of advanced logic designs. In addition to the high-performance logic fabric, Virtex-6 FPGAs contain many built-in system-level blocks. These features allow logic designers to build the highest levels of performance and functionality into their FPGA-based systems. Built on a 40 nm state-of-the-art copper process technology, Virtex-6 FPGAs are a programmable alternative to custom ASIC technology. Virtex-6 FPGAs offer the best solution for addressing the needs of high-performance logic designers, high-performance DSP designers, and high-performance embedded systems designers with unprecedented logic, DSP, connectivity, and soft microprocessor capabilities.
The look-up tables (LUTs) in Virtex-6 FPGAs can be configured as either one 6-input LUT (64-bit ROM) with one output, or as two 5-input LUTs (32-bit ROMs) with separate outputs but common addresses or logic inputs. Each LUT output can optionally be registered in a flip-flop. Four such LUTs and their eight flip-flops, as well as multiplexers and arithmetic carry logic, form a slice, and two slices form a configurable logic block (CLB). Four flip-flops per slice (one per LUT) can optionally be configured as latches. In that case, the remaining four flip-flops in that slice must remain unused.
Between 25% and 50% of all slices can also use their LUTs as distributed 64-bit RAM, as 32-bit shift registers (SRL32) or as two SRL16s. Modern synthesis tools take advantage of these highly efficient logic, arithmetic and memory features. Expert designers can also instantiate them directly [33].
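The view of a LUT as a small ROM can be made concrete with a minimal behavioural sketch. The bit ordering is an assumption made for illustration (input I0 is the address LSB, and the lower and upper 32 bits of the 64-bit contents serve the two 5-input functions); the function names are hypothetical and this is not a vendor primitive model.

```python
def lut6(init, inputs):
    """6-input LUT as a 64-bit ROM; `inputs` is (I0..I5) with I0 as address LSB."""
    addr = sum(bit << i for i, bit in enumerate(inputs))
    return (init >> addr) & 1

def dual_lut5(init, inputs):
    """The same 64 bits read as two 32-bit ROMs sharing one 5-bit address."""
    addr = sum(bit << i for i, bit in enumerate(inputs))
    return (init >> addr) & 1, (init >> (32 + addr)) & 1

def make_init(fn, n_inputs, shift=0):
    """Pack fn's truth table into an integer, one bit per address, from bit `shift`."""
    init = 0
    for addr in range(1 << n_inputs):
        bits = tuple((addr >> i) & 1 for i in range(n_inputs))
        init |= (fn(bits) & 1) << (shift + addr)
    return init

and6 = make_init(lambda b: int(all(b)), 6)
xor5_and5 = (make_init(lambda b: sum(b) % 2, 5)
             | make_init(lambda b: int(all(b)), 5, shift=32))

print(lut6(and6, (1, 1, 1, 1, 1, 1)))          # 1
print(dual_lut5(xor5_and5, (1, 0, 1, 0, 0)))   # (0, 0)
```

Any Boolean function of up to six inputs fits in one table, which is why a larger LUT can absorb logic that would otherwise need several smaller LUTs and the interconnect between them.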
Kintex-7 FPGAs are available in -3, -2, -1, and -2L speed grades, with -3 having the highest performance. The -2L devices can operate at either of two VCCINT voltages, 0.9V and 1.0V, and are screened for lower maximum static power. When operated at VCCINT = 1.0V, the speed specification of a -2L device is the same as the -2 speed grade. When operated at VCCINT = 0.9V, the -2L performance and its static and dynamic power are reduced. Kintex-7 FPGA DC and AC characteristics are specified in commercial, extended, and industrial temperature ranges. Except for the operating temperature range, or unless otherwise noted, all the DC and AC electrical parameters are the same for a particular speed grade (that is, the timing characteristics of a -1 speed grade industrial device are the same as for a -1 speed grade commercial device). However, only selected speed grades and/or devices are available in each temperature range. All supply voltage and junction temperature specifications are representative of worst-case conditions. The parameters included are common to popular designs and typical applications. This FPGA is based on 28nm technology [33].
Table III clearly shows that leakage current is reduced most by the diode based staggered phase damping technique. Compared with the conventional mode, this power gating technique reduces leakage current by 59.9%. The average power is reduced most by the diode based stacking power gating technique on the Virtex-6 LP FPGA device; this reduction is almost 99.6% compared with the conventional mode. The ground bounce noise is also reduced most by the diode based stacking power gating technique.
On the basis of Table III, we classified the power gating techniques from the perspective of leakage current, ground bounce noise and average power. For leakage current, the diode based staggered phase damping technique is the best, while the diode based stacking power gating technique is the best for both ground bounce noise and average power. This can be seen in fig. 5.
V. CONCLUSIONS
In this paper, we evaluated different power gating techniques for the benchmark circuit 74182, a high-speed carry look ahead adder, and for the LUT design of this benchmark circuit, with the aim of reducing leakage current and ground bounce noise. We also analyzed the performance of the LUT design on different FPGA devices, namely Spartan-3A DSP, Virtex-5, Virtex-6 LP and Kintex-7, in terms of leakage current, average power and ground bounce noise. The stacking power gating, diode based stacking power gating and diode based staggered phase damping power gating techniques have been presented for the LUT design of the benchmark circuit; they reduce ground bounce noise during mode transition and leakage current in standby mode. Simulation results show that, compared with the conventional case, the average power is reduced by 99.61% (diode based stacking), 99.58% (stacking power gating) and 99.2% (diode based staggered phase damping). Standby leakage current is reduced by 60%, 42.2% and 43.2% in the diode based staggered phase damping, stacking power gating and diode based stacking techniques respectively. The diode based stacking technique achieves a ground bounce reduction of 82% compared to the other techniques.
Further, a performance analysis of the LUT design on different FPGA devices, namely Spartan-3A (90nm), Virtex-5 (65nm), Virtex-6 LP (40nm) and Kintex-7 (28nm), has been carried out. A comparison chart of the leakage current, average power and ground bounce noise of these devices in conventional mode and with the stacking power gating, diode based stacking and diode based staggered phase damping power gating techniques has been prepared. It shows that leakage current, average power and ground bounce noise decrease with all of the power gating techniques. The power gating techniques have also been classified from the perspective of leakage, ground bounce noise and average power. From this classification we conclude that diode based staggered phase damping is the best for leakage current, while diode based stacking power gating is the best for both ground bounce noise and average power.
With the stacking power gating technique on Spartan-3A, Virtex-5, Virtex-6 LP and Kintex-7 devices, leakage currents of 2311.23fA, 1926fA, 1540.8fA and 1540.8fA respectively have been observed, as compared to 3981.6E-9nA in conventional mode. Similarly, with the diode based stacking power gating technique on Spartan-3A, Virtex-5, Virtex-6 LP and Kintex-7 devices, leakage currents of 2268fA, 1890fA, 1512fA and 1512fA respectively have been observed, as compared to 3981.6E-9nA in conventional mode. With the diode based staggered phase damping technique, leakage currents of 1595.4fA, 1329.5fA, 1063.6fA and 1063.6fA have been observed for the same FPGA devices, as compared to 3981.6E-9nA in conventional mode.
Similarly, with the stacking power gating technique on Spartan-3A, Virtex-5, Virtex-6 LP and Kintex-7 devices, average power consumption of 9.81nW, 8.175nW, 6.54nW and 6.54nW respectively has been observed, as compared to 2388nW in conventional mode. With the diode based stacking power gating technique on Spartan-3A, Virtex-5, Virtex-6 LP and Kintex-7 devices, average power consumption of 9.22nW, 6.14nW, 6.14nW and 7.686nW respectively has been observed, as compared to 2388nW in conventional mode. With the diode based staggered phase damping technique, average power consumption of 9.79nW, 8.16nW, 6.52nW and 6.52nW has been observed for the same FPGA devices, as compared to 2388nW in conventional mode.
Ground bounce noise has also been calculated for these FPGA devices. With the stacking power gating technique on Spartan-3A, Virtex-5, Virtex-6 LP and Kintex-7 devices, ground bounce noise of 1585.8nV, 1321.5nV, 1057.2nV and 1057.2nV respectively has been observed. With the diode based stacking power gating technique, ground bounce noise of 717.6nV, 588nV, 478.4nV and 478.4nV respectively has been calculated. With the diode based staggered phase damping technique, ground bounce noise is reduced to 1017.2nV, 847.7nV, 678.1nV and 678.1nV for the same FPGA devices.
