Abstract: This paper deals with the design and implementation of a Clock Gating Aware Low Power Arithmetic and Logic Unit (ALU). The ALU was developed as part of a low-power processor design on the Xilinx ISE 14.2 platform and synthesized for a 90 nm Spartan-3 FPGA. Clock power contributes 45-60 percent of total dynamic power, so clock power reduction is necessary in low-power design. In this paper, we derive a theoretical clock power reduction of 93.75% in the ALU using clock gating techniques. In simulation, we achieved an 88.23% clock power reduction using latch-based clock gating and a 70.58% clock power reduction using latch-free clock gating.
I. INTRODUCTION
The low-power ALU design applies clock gates to turn off the ALU sub-modules that are not used by the currently executing instruction, as determined by the instruction decoder unit. According to [1] - [3], clock power consumes 50-70 percent of total chip power and is expected to increase in upcoming hardware generations at 32 nm and below. Reducing clock power is therefore very important. Clock gating is a key power reduction technique used by hardware designers, and it is typically applied and evaluated using RTL-level HDL simulators or gate-level power analysis tools.
A. Statement of the Problem:
Clock gating is used in VLSI circuit design to reduce dynamic power. It gates off the functional units that are not used by the currently executing instruction, as determined by the instruction decoder unit.
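The decoder-driven gating described above can be sketched as a 4-bit opcode that selects exactly one of 16 clock-enable lines. This is a hedged illustration only. The function name decode_enables and the mapping (opcode i enables module i) are hypothetical assumptions, not the paper's actual encoding.

```python
# Hypothetical sketch: the instruction decoder drives one clock-enable line per
# ALU sub-module. The opcode-to-module mapping (opcode i -> module i) is assumed.
NUM_MODULES = 16

def decode_enables(opcode: int) -> list[int]:
    """Return a one-hot list of 16 clock-enable bits for a 4-bit opcode."""
    if not 0 <= opcode < NUM_MODULES:
        raise ValueError("opcode must be a 4-bit value (0..15)")
    return [1 if i == opcode else 0 for i in range(NUM_MODULES)]

if __name__ == "__main__":
    en = decode_enables(5)
    print(en)                                  # only index 5 is 1
    print(sum(en), "of", NUM_MODULES, "modules clocked")
```

A one-hot enable vector keeps exactly one module clocked per instruction, and this is the situation the 93.75% figure assumes.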
II. LITERATURE REVIEW
According to [1], a clock-enable implementation consumes more power than clock gating. According to [2], power optimization was traditionally relegated to the synthesis and circuit levels but has now shifted to the system level and the register-transfer level (RTL). This shift is possible because clock gating switches off the inactive units of a design and reduces overall power consumption. Many clock gating (CG) styles are available to optimize power in VLSI circuits. They include:
1) Latch-free CG design.
2) Latch-based CG design.
3) Flip-flop-based CG design.
4) Intelligent CG design.
A. Latch-free Clock Gated ALU design
A latch-free clock gate uses an AND gate when the gated logic is triggered on the rising clock edge, and an OR gate when it is triggered on the falling edge. Using the ideas given in [2] and [4], we developed the ALU design shown in Fig. 1.
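The AND and OR gating rules above can be modelled behaviourally by sampling each clock period at four points. The model also shows the known weakness of latch-free gating: if the enable changes while the clock is high, a truncated pulse reaches the gated logic. Function names and waveforms are illustrative assumptions, not the paper's RTL.

```python
# Behavioural sketch of latch-free clock gates, sampled 4 times per clock period.
# Waveforms and function names are illustrative assumptions, not the paper's RTL.

def and_gate_clock(clk: list[int], en: list[int]) -> list[int]:
    """AND-gate gating for rising-edge logic: output held low while en = 0."""
    return [c & e for c, e in zip(clk, en)]

def or_gate_clock(clk: list[int], en: list[int]) -> list[int]:
    """OR-gate gating for falling-edge logic: output held high while en = 0."""
    return [c | (1 - e) for c, e in zip(clk, en)]

def rising_edges(wave: list[int]) -> int:
    """Count 0 -> 1 transitions in a sampled waveform."""
    return sum(1 for a, b in zip(wave, wave[1:]) if (a, b) == (0, 1))

CLK = [0, 0, 1, 1] * 3                # three periods, high in the second half
EN_GLITCH = [0] * 7 + [1] + [0] * 4   # enable pulses high only while CLK is high

if __name__ == "__main__":
    gclk = and_gate_clock(CLK, EN_GLITCH)
    print(gclk)                       # a truncated pulse leaks through at sample 7
    print(rising_edges(gclk))         # 1 spurious edge, although no cycle was enabled
```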
C. Latch Based Clock Gated ALU Design
The latch-based clock gate adds a level-sensitive latch that holds the enable signal from the active edge to the inactive edge of the clock, as shown in Fig. 3 and Fig. 4.
Reference [5] presents the design and implementation of a self-timed arithmetic logic unit (ALU) developed as part of an asynchronous microprocessor. The ALU in [5] has an inherently low power consumption because its synchronization signals stop when the execution of an operation finishes (a stoppable clock), which is a precursor of clock gating. Our clock gating work extends the idea in [5], i.e., switching off a functional unit when it is not in use.
The ALU in [6] performs 16 instructions and has a two-stage pipelined architecture. For low power consumption, [6] proposes a new ALU architecture with an efficient ELM adder built from propagation (P) and generation (G) blocks. The adder in that ALU is disabled while a logical operation is performed, and vice versa. Our clock gating approach follows the same concept: it switches off the arithmetic functions while a logical function is in use, and vice versa. In [6], the outputs of the P block are separated into a dual bus to reduce switching capacitance during ALU operation. The ALU generates four flags: Zero (Z), Carry (C), Sign (S), and Parity (P). The unary logic functions do not affect the flags. The shift function affects only the C flag. All other ALU functions affect all flags.
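The latch-based gate can be modelled with the same sampling as a latch-free gate. In this model, a latch that is transparent while the clock is low captures the enable. While the clock is high, the latch holds its value, so enable changes cannot reach the AND gate. This mirrors the common structure of an integrated clock gating cell. The waveforms and function names are illustrative assumptions.

```python
# Behavioural sketch of a latch-based clock gate, sampled 4 times per clock period.

def latch_based_clock(clk: list[int], en: list[int]) -> list[int]:
    """Latch transparent while clk is low; AND gate combines clk with the latch output."""
    gclk = []
    latched = 0
    for c, e in zip(clk, en):
        if c == 0:              # low phase: latch follows the enable
            latched = e
        gclk.append(c & latched)  # high phase: latched value is held
    return gclk

def rising_edges(wave: list[int]) -> int:
    """Count 0 -> 1 transitions in a sampled waveform."""
    return sum(1 for a, b in zip(wave, wave[1:]) if (a, b) == (0, 1))

CLK = [0, 0, 1, 1] * 3                # three periods, high in the second half
EN_GLITCH = [0] * 7 + [1] + [0] * 4   # enable glitch while CLK is high
EN_CLEAN = [0] * 4 + [1] * 8          # enable asserted during a low phase

if __name__ == "__main__":
    print(rising_edges(latch_based_clock(CLK, EN_GLITCH)))  # 0: glitch blocked
    print(rising_edges(latch_based_clock(CLK, EN_CLEAN)))   # 2 full clock pulses
```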
A. Clear Function of ALU
The clear function resets the ALU output to 8'h00. If a clock gate is used instead of de-multiplexing the clock signal to all 16 ALU sub-modules, only the clear module is clocked, and the theoretical clock power reduction is 93.75%, as shown in Fig. 6 (a-b).
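The 93.75% figure can be checked by counting the clock edges delivered to the 16 sub-modules over an instruction trace, with and without gating. This is a minimal sketch under simplifying assumptions: exactly one module is enabled per instruction, every module presents the same clock load, and opcode 0 for the clear function is hypothetical.

```python
# Sketch: clock edges delivered to the 16 ALU sub-modules over an instruction trace.
# Assumes one enabled module per instruction and equal clock load per module.
NUM_MODULES = 16
CLEAR = 0  # hypothetical opcode for the clear function

def clock_edges(trace: list[int], gated: bool) -> int:
    total = 0
    for opcode in trace:
        enables = [int(i == opcode) for i in range(NUM_MODULES)]
        # Ungated: every module sees the edge. Gated: only enabled modules do.
        total += sum(enables) if gated else NUM_MODULES
    return total

def reduction_percent(trace: list[int]) -> float:
    ungated = clock_edges(trace, gated=False)
    gated = clock_edges(trace, gated=True)
    return 100.0 * (ungated - gated) / ungated

if __name__ == "__main__":
    print(reduction_percent([CLEAR] * 10))  # 93.75
```

Under these assumptions the result is 15/16 for any trace, because each instruction clocks one module instead of sixteen.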
B. Save Operand Register Value in ALU
ALU_Out = B; the value of B is passed to the ALU output. With clock gating, the clock supply to the 15 modules other than Save B is turned off, so the theoretical clock power reduction is 93.75%, as shown in Fig. 7 (a-b).
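When a module is gated off, it receives no clock edge, so its output register keeps its last value. The sketch below models one such module, the Save B function (ALU_Out = B), as an 8-bit register. The class name GatedRegister and its tick method are hypothetical and serve only as illustration.

```python
# Hypothetical sketch: an 8-bit module output register behind a clock gate.
class GatedRegister:
    """Loads d only on cycles where its gated clock edge is present."""

    def __init__(self) -> None:
        self.q = 0

    def tick(self, enable: bool, d: int) -> int:
        if enable:              # clock edge reaches the register
            self.q = d & 0xFF   # keep the value 8 bits wide
        return self.q           # gated off: previous value is retained

if __name__ == "__main__":
    save_b = GatedRegister()
    print(hex(save_b.tick(True, 0x3C)))   # 0x3c: ALU_Out = B while enabled
    print(hex(save_b.tick(False, 0x99)))  # 0x3c: no clock, value held
```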
D. Hold Data Bus Value
ALU_Out = A; the value of A is passed to the ALU output. With clock gating, the other 15 functional units are turned off, as shown in Fig. 9, so the theoretical clock power reduction is 93.75%.
F. Decrement Data Bus Value
ALU_Out = A - 1; the decremented value of A is passed to the ALU output. With clock gating, the other 15 functional units are turned off, as shown in Fig. 11, so the theoretical clock power reduction is 93.75%.
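The decrement can be modelled on 8-bit values together with the four flags Z, C, S and P listed earlier. This is a hedged sketch. Two choices are assumptions not stated in the source: P = 1 for an even number of 1 bits, and C signals a borrow out of the decrement.

```python
# Sketch of an 8-bit decrement with Z, C, S, P flags.
# Assumptions: P = 1 for even parity; C = 1 signals a borrow (A was 0x00).

def flags(result: int, carry: int) -> dict[str, int]:
    r = result & 0xFF
    return {
        "Z": int(r == 0),                        # zero
        "C": carry,                              # carry / borrow
        "S": (r >> 7) & 1,                       # sign = bit 7
        "P": int(bin(r).count("1") % 2 == 0),    # assumed even-parity convention
    }

def decrement(a: int) -> tuple[int, dict[str, int]]:
    result = (a - 1) & 0xFF            # 8-bit wrap-around: 0x00 - 1 -> 0xFF
    borrow = int((a & 0xFF) == 0)      # assumed meaning of C for decrement
    return result, flags(result, borrow)

if __name__ == "__main__":
    print(decrement(0x01))
    print(decrement(0x00))
```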
H. Left Shift Data Bus Value
ALU_Out = A << 1; A is shifted left by 1 bit and the result is passed to the ALU output. With clock gating, the other 15 functional units are turned off, as shown in Fig. 13, so the theoretical clock power reduction is 93.75%.
With clock gating, the other 15 functional units are turned off, as shown in Fig. 15, so the theoretical clock power reduction is 93.75%.
K. Addition with Carry
ALU_Out = A + B + Carry_in; the three values A, B and Carry_in are added and the result is stored in ALU_Out. With clock gating, the 15 functional units other than ADDC are turned off, as shown in Fig. 16, so the theoretical clock power reduction is (15/16) × 100 = 93.75%.
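The left shift and ADDC operations can be sketched on 8-bit values. Each function returns the result together with the carry. This matches the statement that the shift function affects only the C flag, and here the carry is taken to be the bit shifted out of position 7. The function names are hypothetical.

```python
# Sketch of 8-bit left shift and add-with-carry (function names are hypothetical).

def shift_left(a: int) -> tuple[int, int]:
    result = (a << 1) & 0xFF
    carry = (a >> 7) & 1            # bit shifted out of position 7 goes to C
    return result, carry

def add_with_carry(a: int, b: int, carry_in: int) -> tuple[int, int]:
    total = (a & 0xFF) + (b & 0xFF) + (carry_in & 1)   # at most 511
    return total & 0xFF, total >> 8                     # (ALU_Out, carry out)

if __name__ == "__main__":
    print(shift_left(0x81))               # (2, 1)
    print(add_with_carry(0xFF, 0x00, 1))  # (0, 1)
```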
L. Subtraction with Carry
ALU_Out = A - B - Carry_in; the value of B is subtracted from the value of A, Carry_in is then subtracted from that result, and the final value is passed to ALU_Out. With clock gating, the other 15 functional units are turned off, as shown in Fig. 17, so the theoretical clock power reduction is 93.75%.
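Subtraction with carry can be sketched the same way. In this sketch, the carry output is assumed to act as a borrow flag that is set when A - B - Carry_in is negative. Some processors invert this convention, so treat it as an assumption. The function name is hypothetical.

```python
# Sketch of 8-bit subtract-with-borrow; C = 1 is assumed to mean "borrow occurred".

def subtract_with_borrow(a: int, b: int, carry_in: int) -> tuple[int, int]:
    diff = (a & 0xFF) - (b & 0xFF) - (carry_in & 1)
    borrow = int(diff < 0)
    return diff & 0xFF, borrow      # wrap to 8 bits, e.g. -1 -> 0xFF

if __name__ == "__main__":
    print(subtract_with_borrow(0x10, 0x01, 1))  # (14, 0)
    print(subtract_with_borrow(0x00, 0x00, 1))  # (255, 1)
```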
O. Logical XOR Operation
ALU_Out = A ⊕ B; the logical XOR A ⊕ B is computed and passed to ALU_Out. With clock gating, the other 15 functional units are turned off, as shown in Fig. 20, so the theoretical clock power reduction is 93.75%.
P. Logical XNOR Operation
ALU_Out = (A ⊕ B)'; the logical XNOR (A ⊕ B)' is computed and passed to ALU_Out. With clock gating, the other 15 functional units are turned off, as shown in Fig. 21, so the theoretical clock power reduction is 93.75%.
Frequency     Power (mW)
100 MHz       2        1        1        0
1000 MHz      17       9        10       4
10 GHz        168      48       88       41
100 GHz       1679     153      802      410
1000 GHz      16795    1198     7983     4099
In the next phase, clock gating turns off the remaining 15 modules whenever any one module is executing, which gives a theoretical clock power reduction of 93.75%. Table II shows an 88.23% clock power reduction using latch-based clock gating. Table III shows a 70.58% clock power reduction using latch-free clock gating.
VI. CONCLUSION
Power optimization was traditionally handled at the synthesis, circuit design, and placement and routing stages, and it has now moved to the system level and the register-transfer level (RTL). This shift is possible because clock gating switches off the inactive units of a design and reduces overall power consumption. The RTL approach is important because hardware designers generally verify power only at the gate level, and any change made at RTL to reduce power otherwise requires many design iterations. Our ALU has 16 functions, each with one dedicated module. When an instruction executes in its module, the clock gate must gate off the other modules that the current instruction does not use. Hence, when any one module executes, clock gating turns off the remaining 15 modules, and the power reduction is (15/16) × 100 = 93.75%.
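The (15/16) × 100 = 93.75% figure can be written as a short derivation from the usual dynamic power relation. This is a sketch under stated assumptions. Each of the N = 16 modules is assumed to present the same clock load C_m, supply voltage and frequency are assumed unchanged by gating, and the overhead of the gating cells and the shared clock network is ignored. The symbols C_m and N are introduced here for illustration.

```latex
\begin{align}
P_{\text{clk}} &\propto C_{\text{clk}}\, V_{DD}^{2}\, f \\
C_{\text{clk}}^{\text{ungated}} &= N\, C_m, \qquad C_{\text{clk}}^{\text{gated}} = C_m \\
\text{Reduction} &= \frac{N C_m - C_m}{N C_m} = \frac{N-1}{N} = \frac{15}{16} = 93.75\% \qquad (N = 16)
\end{align}
```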
VII. FUTURE SCOPE
Clock gating is one of the best techniques for reducing dynamic power, and there is a need to extend it to reduce leakage power consumption. The Virtex-6 FPGA is based on 40 nm technology. The latest FPGAs, such as Virtex-7, Kintex-7 and Artix-7, are based on 28 nm technology and have significant leakage power consumption. Clock gating therefore needs to be optimized to reduce leakage power along with dynamic power.
