Abstract: Advances in CMOS technology have made it possible to place millions of transistors on a single silicon chip. This has drastically increased device performance and enabled much faster operation. On the other hand, placing more transistors on a chip increases power consumption, so the designer faces a trade-off between performance and power consumption. For reconfigurable hardware such as FPGAs the situation is particularly severe and demands attention. This paper therefore presents optimization techniques that are applied to FPGAs at different levels of abstraction. Benchmark circuits (ALU, register, counter and RAM) are used for experimental measurements to validate the results. After simulation and power analysis of the benchmark circuits at different frequencies, a power-aware utility software is developed that optimizes power, while taking performance into consideration, at a given frequency for the selected FPGA. The circuits have been implemented in VHDL, and simulation is carried out using Xilinx ISE 14.1 targeting Virtex-4, Virtex-5 and Artix-7 FPGAs.
Introduction
Since FPGAs were first introduced in the mid-1980s, their popularity has grown very quickly, and they now account for more than half of the 3 billion dollar programmable logic industry. FPGAs are programmable logic devices that can implement digital circuits with millions of gates and can operate at speeds in the hundreds of megahertz. Custom ASICs are generally regarded as their primary competitor, but ASICs need weeks or months for fabrication. FPGAs, on the other hand, can be programmed in seconds and reprogrammed any number of times. This is a key advantage of FPGAs over ASICs, because it reduces time to market, which is a crucial requirement in the electronics industry for the development of new products. However, FPGAs consume a large amount of power, and reducing this power dissipation while maintaining performance has become a challenge for designers. Their programmability involves significant hardware overhead in the form of numerous interconnects and programmable switches, and the generic logic structures in FPGAs consume more power than the dedicated circuitry used in ASICs. Power has therefore been called a limiting factor in the ability of FPGAs to replace ASICs. For FPGAs to remain competitive with Application Specific Integrated Circuits, this problem must be handled with power optimization techniques that preserve performance. This motivates our power and timing analysis, which applies power minimization techniques while taking timing constraints into account. We have carried out the analysis on benchmark circuits by applying power reduction techniques at different operating frequencies from the processor point of view. A Power Aware Utility software has been developed, and a GUI is provided to give a user-friendly environment.
Background and Related Work
Power consumption in CMOS circuits can be classified as either dynamic or static. Static power, also called leakage power, is dissipated when a logic circuit is in a quiescent state. The principal leakage mechanisms in a MOS transistor include subthreshold leakage, gate oxide leakage, and junction leakage. Dynamic power, on the other hand, is due to the logic transitions that occur on the signals of a logic circuit. Such transitions occur as a normal part of useful computation, and dynamic power scales in proportion to the rate of computation. Dynamic power is consumed through two mechanisms: short-circuit current and the charging and discharging of capacitance. We concentrate on dynamic power here since it constitutes the major part of the total power consumed.
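The proportionality between dynamic power and the rate of computation can be illustrated with the standard first-order switching power model, P = alpha * C * Vdd^2 * f. The following is a minimal Python sketch; the function name and all numeric values are illustrative assumptions, not measurements from this work, and short-circuit current is not modeled.

```python
def dynamic_power(activity: float, capacitance_f: float, vdd: float, freq_hz: float) -> float:
    """First-order switching power in watts: alpha * C * Vdd^2 * f.

    activity      -- average switching activity factor alpha (0..1), assumed
    capacitance_f -- total switched capacitance in farads, assumed
    vdd           -- supply voltage in volts
    freq_hz       -- clock frequency in hertz
    """
    return activity * capacitance_f * vdd ** 2 * freq_hz


# Illustrative (not measured) values: same circuit at two clock frequencies.
p_72 = dynamic_power(0.2, 10e-12, 1.0, 72e6)
p_100 = dynamic_power(0.2, 10e-12, 1.0, 100e6)

# Halving the switching activity (e.g. by gating or encoding) halves the power.
p_100_half_activity = dynamic_power(0.1, 10e-12, 1.0, 100e6)
```

In this model, techniques that lower the activity factor (clock gating, data encoding) or the switched capacitance reduce dynamic power without changing the clock frequency.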
Literature survey
Lamoureux et al. [1] give an overview of low-power techniques from the FPGA point of view. They describe FPGA architecture and the basics of power dissipation in it, and cover system-level and device-level design techniques aimed mainly at commercial applications. Anderson [2] presented a number of power optimization and prediction techniques for FPGAs, covering dynamic as well as leakage power. Two CAD techniques are proposed there that reduce leakage power at no cost, with no effect on area, efficiency, IC fabrication cost or speed. Khaleel et al. [3] focused on an efficient binary coded decimal (BCD) digit adder. The authors proposed two designs using VHDL and Xilinx ISE 10.1, targeting the Xilinx Virtex-5 XC5VLX30-3 FPGA. The first minimizes the delay of the adder and follows a CAD flow, i.e., generation of the truth table and Boolean expressions, which are then expressed in VHDL code and simulated. The second addresses area efficiency using look-up tables. Huda et al. [4] discussed clock gating architectures and how they can be used for power reduction with the help of a flexible placement algorithm that operates at various gating granularities. Pandey et al. [9] give three power reduction techniques: clock gating, clock enable and blocking inputs. These reduce power consumption by turning off part of a system, switching between active and standby modes, or disabling the data path of the circuit. Clock gating is applied to reduce the dynamic power consumption of the target design, which is a 90 nm Spartan-3. Pandey et al. [10] dealt with mapping, which means optimizing one line of code. For low-power design, a low fan-out clock enable is necessary, and there are two strategies for this. The first is to use synthesis attributes to control the use of control signals at the signal or module level. The second is coding (behavioral HDL or dataflow HDL) for a low fan-out clock enable.
Madhok et al. [12] things. Aggarwal et al. [13] designed a green and energy-efficient ECG machine on FPGA using different logic families. Gupta et al. [14] presented a counter design based on voltage scaling to improve energy efficiency. Verma et al. [15] used a thermal-aware approach in RAM design and tested its thermal stability at different ambient temperatures. They also checked the compatibility of the device with wireless networks by carrying out the experiment at different processor frequencies. Mishra et al. [16] dealt with a BCD adder for power efficiency at the architectural level through techniques such as pipelining and parallelism. Verma et al. [17] presented low-power techniques that can be applied to different target platforms at different levels of the design hierarchy. Verma et al. [18] performed power analysis by scaling parameters such as voltage, frequency, load capacitance and airflow for power optimization.
Low Power Techniques Used
A. Clock Gating
Dynamic power can be reduced by clock gating, which temporarily turns off the inactive parts of a design or puts unused modules in standby mode. This is a simple technique in which a clock enable is used. Clock gating [4] can be used to control switching activity at the functional unit level by inhibiting input updates to the functional units whose outputs are not needed for a given operation.
The selection of I/O standards also has a great impact on power dissipation. This selection has been made using the UCF (User Constraints File), which provides a complete and realistic set of constraints for the utilisation and performance of both I/O logic and core logic. Virtex devices support a number of I/O standards, such as LVTTL, LVCMOS, HSTL and PCI, but we have used three of them for analysis: LVCMOS, HSTL and SSTL.
LVCMOS (Low Voltage Complementary Metal Oxide Semiconductor)
It is a widely used switching standard implemented in CMOS transistors, defined by JEDEC (JESD 4. SSTL_15, 1.5 V. The Digitally Controlled Impedance (DCI) variants of all I/O standards are also used. DCI adjusts the output impedance or the input termination to accurately match the characteristic impedance of the transmission line. It does so by making the impedance of the input/output equal to an external reference resistance, which compensates for changes in input/output impedance due to process variations. DCI also continuously adjusts the input/output impedance to compensate for variations in temperature and supply voltage fluctuations.
C. Logic level power Minimization through coding
At the logic level, power can be reduced by reducing the number of transitions, i.e., the switching activity, at the I/O interface of the processor. One approach is to encode the data suitably before sending it over the I/O interface and to use a decoder at the receiving end to recover the original data. The input is applied using different coding schemes, and the power is then calculated from the generated SAIF file, which XPower Analyzer uses to produce the power report. The coding styles used are binary coding, Gray coding, SILENT coding and bus inverse coding.
Binary Coding
Binary Coding is a coding scheme for representing a number to the base 2. Here, each place of a number corresponds to a power of 2. It uses only the digits 1 and 0.
Gray Coding
Gray coding produces a code word sequence in which adjacent code words differ in only 1 bit, i.e., with a Hamming distance of 1. For large values of n, the average number of transitions per increment in an n-bit binary counting sequence approaches 2. In contrast, a Gray code sequence always has exactly 1 transition per increment.
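The transition counts for the two codes can be checked numerically. The sketch below, in Python, uses the standard reflected binary Gray code conversion; the helper names and the choice n = 8 are assumptions made for illustration only.

```python
def binary_to_gray(value: int) -> int:
    """Convert a binary number to its reflected Gray code equivalent."""
    return value ^ (value >> 1)


def count_transitions(words):
    """Total number of bit flips between consecutive words (sum of Hamming distances)."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


n = 8  # word width, chosen only for illustration
binary_seq = list(range(2 ** n))                    # counting sequence 0 .. 2^n - 1
gray_seq = [binary_to_gray(v) for v in binary_seq]  # same sequence, Gray encoded

steps = len(binary_seq) - 1
binary_avg = count_transitions(binary_seq) / steps  # approaches 2 as n grows
gray_avg = count_transitions(gray_seq) / steps      # exactly 1 for every step
```

For a counting sequence, Gray coding therefore roughly halves the switching activity on the bus relative to plain binary.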
SILENT Coding
This is a serialized low-energy transmission technique that minimizes the transmission energy on the serial wire [11]. Here b(t)[n-1:0] represents the n-bit data word from the sender at time t, and the encoded word is defined bit by bit for i = 0 to n-1. By serializing these encoded words, the number of transitions on the serial wire is reduced and the wire appears silent.
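The encoding equation is not legible in the text above. The Python sketch below therefore assumes the XOR-based formulation commonly given for SILENT coding, in which each encoded bit is B(t)[i] = b(t)[i] XOR b(t-1)[i], with the previous word taken as zero at start-up. The function names, the LSB-first serialization order and the sample data are illustrative assumptions.

```python
def silent_encode(words, n):
    """Assumed SILENT encoding: XOR each n-bit word with the previous data word."""
    mask = (1 << n) - 1
    prev = 0  # assumed initial state of the sender
    encoded = []
    for w in words:
        encoded.append((w ^ prev) & mask)
        prev = w
    return encoded


def silent_decode(encoded, n):
    """Inverse operation: XOR each received word with the previously decoded word."""
    mask = (1 << n) - 1
    prev = 0
    decoded = []
    for e in encoded:
        prev = (e ^ prev) & mask
        decoded.append(prev)
    return decoded


def serialize(words, n):
    """Flatten n-bit words into a serial bit stream, LSB first (an assumed order)."""
    return [(w >> i) & 1 for w in words for i in range(n)]


def serial_transitions(bits):
    """Number of level changes seen on the serial wire."""
    return sum(a != b for a, b in zip(bits, bits[1:]))


# Slowly varying sample data (illustrative only): consecutive words differ in few bits.
data = [0xA5, 0xA5, 0xA4, 0xA4, 0xA5, 0xA7]
enc = silent_encode(data, 8)
raw_count = serial_transitions(serialize(data, 8))
enc_count = serial_transitions(serialize(enc, 8))
```

When consecutive words are strongly correlated, the encoded words consist mostly of zeros, so the serialized stream contains long runs without transitions. For uncorrelated data this benefit is not guaranteed.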
Bus Inverse Coding
Bus inverse coding is a coding scheme that requires only 1 redundant bit, i.e., m = n+1, for the transmission of information words. If the Hamming distance between Data(t) and Bus(t-1) is more than n/2, the encoder sends the complement of Data(t) over the bus as B(t). If instead the Hamming distance between Data(t) and Bus(t-1) is less than or equal to n/2, the encoder sends Data(t) as B(t). The redundant bit P is added to indicate whether or not B(t) is an inverted version of Data(t). With this encoding technique, the number of transitions is reduced by 10-20% for data buses, which helps to reduce power dissipation.
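The encoding rule described above can be sketched in a few lines of Python. The function names, the assumption that the bus starts at all zeros, and the sample data are illustrative and are not taken from the VHDL implementation used in this work.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(data_words, n):
    """Return a list of (B(t), P) pairs for n-bit data words.

    The complement of Data(t) is sent when its Hamming distance to the
    previous bus value Bus(t-1) exceeds n/2; P = 1 flags the inversion.
    """
    mask = (1 << n) - 1
    bus = 0  # assumed initial bus state
    encoded = []
    for d in data_words:
        if hamming(d, bus) > n / 2:
            bus, p = ~d & mask, 1
        else:
            bus, p = d & mask, 0
        encoded.append((bus, p))
    return encoded


def bus_invert_decode(encoded, n):
    """Recover Data(t) by re-inverting every word whose P bit is set."""
    mask = (1 << n) - 1
    return [(~b & mask) if p else b for b, p in encoded]


data = [0x00, 0xFF, 0x0F, 0xF0]  # illustrative worst-case style pattern
enc = bus_invert_encode(data, 8)

# Data-line transitions with and without encoding, starting from an all-zero bus.
raw_toggles = sum(hamming(a, b) for a, b in zip([0] + data, data))
enc_words = [b for b, _ in enc]
enc_toggles = sum(hamming(a, b) for a, b in zip([0] + enc_words, enc_words))
```

By construction, no more than n/2 of the n data lines toggle in any cycle. The extra P line adds at most one further transition per cycle.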
Power Aware Utility
A Graphical User Interface that incorporates a power-aware utility algorithm has been developed using MATLAB R2012b. The algorithm works on the data values obtained after synthesizing the benchmark circuits for different combinations of configurations. Around 500 simulation runs have been performed, and the observed data has been stored in the form of look-up tables. The algorithm calculates the optimized power in watts for the selected digital circuit at a particular frequency for a specific FPGA, as per the flowchart shown in Fig. 2. The advantage of this software is that it takes very little time to perform power optimization across all the power reduction techniques while also taking timing into account, i.e., timing constraints are included in the analysis. A technique is considered for optimization only if the timing constraints are met. The minimum power value is then calculated and displayed, along with the percentage reduction in power and the technique that gives the minimum power. For the Counter and RAM circuits, the best result has been given by Clock Gating at 72 MHz and 100 MHz. These two circuits cannot be used at 1.2 GHz and 1.3 GHz, however, since the timing constraints are not met for any technique and thus no optimized power has been calculated.
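The selection step described above (filter by timing, then pick the minimum power) can be sketched as follows. This is a Python illustration under stated assumptions, not the MATLAB implementation: the table layout, the `Entry` type, the `optimize` function, the use of an unoptimized baseline for the percentage, and every numeric value are hypothetical placeholders rather than measured results.

```python
from typing import NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    technique: str    # power reduction technique applied
    power_w: float    # power in watts (placeholder value, not measured)
    timing_met: bool  # whether timing constraints were met


# Hypothetical look-up table keyed by (FPGA, circuit, frequency in MHz).
LOOKUP = {
    ("Artix-7", "Counter", 100): {
        "baseline_w": 0.200,  # assumed power without any technique
        "entries": [
            Entry("Clock Gating", 0.120, True),
            Entry("Gray Coding", 0.150, True),
            Entry("Bus Inverse Coding", 0.110, False),  # rejected: timing not met
        ],
    },
    ("Artix-7", "Counter", 1200): {
        "baseline_w": 0.900,
        "entries": [Entry("Clock Gating", 0.700, False)],
    },
}


def optimize(fpga: str, circuit: str, freq_mhz: int) -> Optional[Tuple[str, float, float]]:
    """Return (technique, minimum power in W, % reduction), or None if no
    technique meets the timing constraints for this configuration."""
    record = LOOKUP[(fpga, circuit, freq_mhz)]
    candidates = [e for e in record["entries"] if e.timing_met]
    if not candidates:
        return None  # no optimized power is reported
    best = min(candidates, key=lambda e: e.power_w)
    reduction = 100.0 * (record["baseline_w"] - best.power_w) / record["baseline_w"]
    return best.technique, best.power_w, reduction


result_100 = optimize("Artix-7", "Counter", 100)
result_1200 = optimize("Artix-7", "Counter", 1200)
```

Because the table is precomputed from synthesis runs, each query reduces to a filter and a minimum over a handful of entries, which is why the utility responds quickly.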
Conclusion
After observing the results, we conclude that Artix-7 gives the best optimized results for all the digital circuits among the three FPGAs at 72 MHz and 100 MHz, but it cannot be used at 1.2 GHz and 1.3 GHz because the timing constraints are not met in these two cases.
Future Scope
This work leaves scope for further refinement. Power analysis can be extended to other benchmark circuits. Performance parameters other than timing score can also be included. The software can thus be generalized by considering different FPGA packages together with different power and timing techniques.
