I. INTRODUCTION
The growing use of FPGAs in portable devices, together with the shrinking feature size of each process generation, increases power consumption [1]; the FPGA structure should therefore be modified, through new methods and changes to its internal blocks, to reduce power. In a digital complementary metal-oxide-semiconductor (CMOS) circuit, dynamic power dominates the total power dissipation. Reducing the supply voltage Vdd is the most effective approach to reducing dynamic power dissipation. Lowering Vdd is also important in deep submicron (DSM) technologies to avoid reliability problems [2]. However, reducing the supply voltage alone seriously degrades circuit performance. One way to maintain performance is to scale down both Vdd and the threshold voltage Vth. However, reducing Vth increases the subthreshold leakage current exponentially [3]. In [4], an integrated low-power FPGA architecture is proposed and implemented that fully exploits fine-grain assignment of Vdd/Vth in time and space, with four configurable logic blocks (CLBs) grouped into one shared block. In [5], the average switching activity of the input and output state variables is reduced by minimizing the number of bit changes during state transitions. In [6], a variable-threshold-voltage keeper circuit technique is proposed for simultaneous power reduction. Power gating [7]-[8] is the most popular circuit technique for suppressing subthreshold leakage. It consists of gating, or cutting off, a circuit from its power supply rails during standby mode. When a footer, located between a logic block and Vss, is turned off, the voltage at the virtual ground (Vssv), where the footer has its drain, rises slowly until it reaches a steady-state potential, which is usually close to Vdd. Similarly, if a header is used and turned off, the voltage at the virtual Vdd (Vddv) slowly falls to a steady-state potential close to Vss.
Because either Vssv or Vddv collapses during standby, the circuit states represented by sequential elements and primary outputs have to be captured in advance and preserved. Data retention elements (flip-flops and isolation circuits) preserve circuit states during standby mode if the states are needed again after wake-up. These elements must be controlled by an external power management unit, which requires a network of control signals implemented with extra wires and buffers. A power-gated circuit with autonomous data retention (APG) is proposed in [9] (Figure 1). In both the conventional and the APG circuit, combinational gates are placed between Vdd and Vssv. Vssv is controlled through the footer: it is powered on in the active state and powered off in the standby state. In APG [9], the retention flip-flops and isolation circuits are replaced with autonomous retention flip-flops (ARFs) and autonomous retention isolation (ARI) circuits, which store data while the footer is off. For this purpose, one slave latch, which holds the current data, is connected directly to Vss and Vdd to capture the flip-flop data [9]-[10]. In [11], area-efficient circuits are presented for programmable fine-grained power gating of individual unused interconnect switches. Fine-grain power gating is more effective than coarse-grain power gating at reducing the active leakage power of unused logic and interconnection resources [12]. In [13], a modification to the FPGA fabric is presented that enables dynamically controlled power gating, in which logic clusters can be selectively powered down at run time. Detecting data arrival in advance prevents both the increase in wake-up delay and the power consumed by unnecessary power switching [14]. In [15], an architecture is presented that enables switch boxes (SBs) to be selectively powered down along with the logic blocks during their idle periods.
In [16], a tool is presented that is capable of modeling the power usage of many different field-programmable gate array (FPGA) architectures. In [17], a novel directional coarse-grained power gating architecture for switch boxes is presented. In [18], an FPGA architecture is presented that enables dynamically controlled power gating, in which FPGA resources can be selectively powered down at run time. In [19], area, delay, and energy are compared for two intra-CLB topologies. Figure 2 shows how logic blocks are controlled through the Vssv state via the footer switch. In this paper, power dissipation in the CLB is investigated by controlling, through the footer switch, the time period during which Vssv holds a given logic state. Section 2 first investigates the power consumption of five sample CLB circuits. Section 3 applies our power reduction method to the sample CLB circuits, and Section 4 presents the resulting power recovery. Throughout this paper, Vssv = 1 or Vssv = 0 denotes the logic state of Vssv, not its voltage.
A. CLB Design through Transistor Models of Logic Gates
The basic logic gates (NOT, AND, OR, ...) are simulated with NMOS and PMOS models in 65 nm technology to design the CLB of the FPGA. The CMOS structures of the inverter, 2-input NAND, 2-input NOR, and SR flip-flop used in this research are shown in Figure 3.
Because the CLB architecture is built from logic gates, these CMOS transistor models allow us to design the main blocks of the FPGA. In the simulation process, the flip-flop is first designed by combining the transistor models of the OR, AND, and NOT gates. A 2-input MUX is then obtained by combining two AND gates, one NOT gate, and one OR gate (the gate count depends on the number of MUX inputs). Next, the LUT is modeled from the logic gates; finally, the MUX, LUT, and flip-flop are combined into the CLB, which is the most important block in the FPGA. The power dissipation of each logic gate that contributes to the CLB design is then investigated.
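As a purely behavioral illustration of the MUX composition described above (two AND gates, one NOT gate, and one OR gate), the following minimal sketch models the gates as Boolean functions. It does not represent the transistor-level 65 nm models used in the simulations; the function names and the select polarity (sel = 0 chooses d0) are assumptions made for illustration.

```python
# Behavioral sketch only: gates are modeled as functions on 0/1 integers.
# Function names and select polarity are hypothetical, not from the paper.

def not_gate(a: int) -> int:
    return 1 - a

def and_gate(a: int, b: int) -> int:
    return a & b

def or_gate(a: int, b: int) -> int:
    return a | b

def mux2(d0: int, d1: int, sel: int) -> int:
    """2-input MUX built from two ANDs, one NOT, and one OR."""
    pass_d0 = and_gate(d0, not_gate(sel))  # active when sel = 0
    pass_d1 = and_gate(d1, sel)            # active when sel = 1
    return or_gate(pass_d0, pass_d1)

if __name__ == "__main__":
    for sel in (0, 1):
        for d0 in (0, 1):
            for d1 in (0, 1):
                print(f"sel={sel} d0={d0} d1={d1} -> {mux2(d0, d1, sel)}")
```

A wider MUX could be built the same way from a tree of such 2-input stages, which is one reading of the remark that the gate count depends on the number of inputs.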
B. Power Equations
Allowable power consumption is a main concern as circuit integration and processing speed increase. Depending on the design, different measures of power dissipation can be considered. Peak power is more useful when sizing the voltage and current of the power supply; when battery consumption or the cooling system is the main concern, the average power Pav, given in (1), is the preferable parameter [20].
In (1), P(t) is the instantaneous power and Isup is the current drawn from the supply Vsup over the interval [0, t]. After the power dissipation of each gate is obtained for the given inputs and its CMOS transistor model, the final results are summarized in Table 1. Table 1 shows the average power (Pav) of the logic gates, flip-flop, MUX, LUT, and CLB.
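For reference, the average power over an interval is commonly written in the form below. This is a sketch inferred from the symbols defined in the text (P, Isup, Vsup, and the interval from 0 to t) and assumes a constant supply voltage; the dummy integration variable \tau is introduced here for illustration.

```latex
P_{av} \;=\; \frac{1}{t}\int_{0}^{t} P(\tau)\,d\tau
       \;=\; \frac{V_{sup}}{t}\int_{0}^{t} I_{sup}(\tau)\,d\tau
```

Under this form, lowering the supply current during part of the interval, as power gating does, lowers Pav in proportion to the time spent in the low-current state.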
C. Implementation of Sample Circuits on CLB Block
Figure 4 shows the internal schematic of the CLB, designed from logic gates and a flip-flop, with the LUT at the input, the MUX in the middle, and the flip-flop at the output. This main CLB is used as the first sample circuit under test. Five sample circuits based on the CLB of Figure 4 are presented in this paper; in each successive circuit, two logic gates are added to the LUT section of the previous sample circuit so that power dissipation can be investigated and compared across the sample circuits.
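The LUT, MUX, and flip-flop ordering described above can be sketched as a small behavioral model. Everything specific in this sketch is an assumption for illustration: the 2-input LUT width, the idea that the MUX selects between a G LUT and an F LUT, the use of a simple clocked register in place of the SR flip-flop, and all class and variable names. The actual wiring is defined by Figure 4, not by this code.

```python
# Hypothetical behavioral model of a LUT -> MUX -> flip-flop datapath.
# Widths, wiring, and names are illustrative assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class Lut2:
    table: tuple  # four output bits, indexed by (a << 1) | b

    def eval(self, a: int, b: int) -> int:
        return self.table[(a << 1) | b]

AND_LUT = Lut2((0, 0, 0, 1))
XOR_LUT = Lut2((0, 1, 1, 0))

class Clb:
    def __init__(self, g_lut: Lut2, f_lut: Lut2):
        self.g_lut = g_lut
        self.f_lut = f_lut
        self.q = 0  # stored flip-flop output

    def clock(self, a: int, b: int, sel: int) -> int:
        """Evaluate both LUTs, select one with the MUX, latch the result."""
        g = self.g_lut.eval(a, b)
        f = self.f_lut.eval(a, b)
        self.q = f if sel else g  # MUX: sel = 0 picks G, sel = 1 picks F
        return self.q

if __name__ == "__main__":
    clb = Clb(g_lut=AND_LUT, f_lut=XOR_LUT)
    print(clb.clock(1, 1, 0))  # G path (AND)
    print(clb.clock(1, 1, 1))  # F path (XOR)
```

In this reading, each later sample circuit would correspond to adding another G/F pair of LUT functions in front of the same MUX and flip-flop.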
D. Dissipation Power in the First Sample Circuit
The main CLB architecture is used as the first sample circuit under test. In this test, power dissipation is simulated in the two logic states Vssv = 0 and Vssv = 1. The results in Table 3 show that power dissipation is lower in the Vssv = 1 state.
E. Dissipation Power in the 5th Sample Circuit
In the 5th sample circuit, another LUT, comprising G6 (XOR) and F6 (NAND), is added to the 4th sample circuit. In total, compared with the main CLB circuit, four LUTs comprising G3 (AND), F3 (XOR), G4 (AND), F4 (OR), G5 (NAND), F5 (OR), G6 (XOR), and F6 (NAND) are added to the inputs, as shown in Figure 5. The power dissipation of the 5th sample circuit in the two logic states Vssv = 0 and Vssv = 1 is shown in Table 3(V). According to the results, even though the CLB is expanded with the added LUTs, power dissipation falls dramatically when Vssv is kept at logic 1.
F. Dissipation Power Results for the Sample Circuits
Table 3 compares the power dissipation of the five sample circuits in the two logic states Vssv = 1 and Vssv = 0 and shows that, despite the additional LUTs in the CLB, power is reduced in the Vssv = 1 (power-gated) state. In Table 4, columns 2 to 6 give the power dissipation of the sample CLB circuits for a (per) interval of 10 µs to 50 µs out of 100 µs, in five steps. Columns 7 and 8 show the power dissipation in the power-gated and non-power-gated states, respectively.
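The (per) sweep can be read as a time-weighted average of two power levels: one while the block is gated (Vssv = 1) and one while it is not. The sketch below illustrates that arithmetic under a simple two-level assumption. The power values are placeholders in arbitrary units, chosen only so that a 50 µs interval out of 100 µs gives the 49% saving quoted in this paper; they are not taken from Tables 3 or 4, and the function names are hypothetical.

```python
# Two-level duty-cycle model of average power (illustrative assumption).
# p_active and p_gated are placeholder values, not measured results.

def average_power(p_active: float, p_gated: float,
                  per_us: float, total_us: float = 100.0) -> float:
    """Time-weighted average power when gated for per_us out of total_us."""
    if not 0.0 <= per_us <= total_us:
        raise ValueError("per_us must lie within [0, total_us]")
    duty = per_us / total_us
    return duty * p_gated + (1.0 - duty) * p_active

def power_saving(p_active: float, p_gated: float,
                 per_us: float, total_us: float = 100.0) -> float:
    """Fractional saving relative to never gating the block."""
    return 1.0 - average_power(p_active, p_gated, per_us, total_us) / p_active

if __name__ == "__main__":
    P_ACTIVE, P_GATED = 100.0, 2.0  # arbitrary units, placeholders
    for per in (10, 20, 30, 40, 50):
        saving = power_saving(P_ACTIVE, P_GATED, per)
        print(f"per = {per} us: saving = {saving:.1%}")
```

In this model the saving grows linearly with (per), and its slope is set by the gap between the two power levels, so a circuit whose non-gated power is larger relative to its gated power shows a steeper line.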
According to Figure 7, the longer the (per) period (the time during which Vssv = 1), the lower the power dissipation; the power reduction is linearly proportional to (per), and the slope of the lines increases as the sample CLB circuits are expanded. Maximum power saving requires keeping Vssv = 1 for 50% of the total cycle, which saves 49% of the energy; however, this imposes extra delay, which limits the method to low-speed applications. Keeping Vssv = 1 for a smaller fraction of the cycle reduces the energy saving but improves time efficiency and therefore performance, so the method can be tuned to the design and application. The power recovery factor makes it possible to measure the energy saved by controlling the Vssv duty cycle during operation, and it can be applied to previous works that require more power saving at lower speed.
Approximately 16 percent of FPGA power is consumed by the CLBs alone [21]. As technology nodes scale down, the leakage power in these logic blocks will increase. Its effect on total power should also not be ignored for FPGA cores that can be embedded within an ASIC architecture [21]. The components of the CLB are latches (in place of SRAM cells), MUXes, and a flip-flop. The methods considered in previous works are multi-threshold CMOS, clock gating, and variable-gate-length transistors [21]. Previous work on creating a low-energy FPGA explored many architectures and circuit techniques [22]. The lookup table size was chosen because previous studies revealed that it is a good choice for speed and density. A variable gate length method was introduced to reduce the power consumption of the CLB. Variable-gate-length transistors allow leakage to be reduced in circuits that do not require high performance.
The clock gating of the D flip-flop need not be fast if the flip-flop does not switch very frequently. Therefore, the clock gating circuitry can slowly compare the input and output of the flip-flop and have a value ready when the flip-flop is clocked. This method reduced the power consumption in sequential mode by reducing the leakage in those gates. The comparison of the three CLB power reduction methods can be seen in Figure 9. The delay of the logic block increases when the clock gating technique is integrated. The addition of the variable-gate-length transistors increases the delay only slightly but results in a large reduction in power consumption [21]. In this paper, power optimization of the CLB by controlling the timing of the virtual ground (Vssv) is investigated. The results show that when the CLB operates in the Vssv = 1 logic state for 50% of the total time, the average power is reduced by up to 49%; whether the designer can keep the CLB off for that period depends on the application. Merging this method with previous works offers further potential for reducing power dissipation, which can be pursued in future work.
Fig. 9. Power of optimized CLB in previous works (f = 200 MHz) [21]
VI. CONCLUSION
In this research, the power consumption of the logic gates that make up the CLB is first obtained; power reduction is then considered by investigating the internal CLB block of the FPGA in the two logic states Vssv = 1 and Vssv = 0. It is observed that, as logic gates are added to the LUT section of the CLB, power dissipation changes dramatically with the Vssv state: when Vssv = 0, power rises, and when Vssv = 1, the circuit enters the power-gated state and power dissipation is reduced. Results also show that
