This paper presents a comparative study of low-power and high-speed 
Introduction
With the explosion of mobile computers and other portable devices, low-power and low-energy design has become a must. Power and energy go hand in hand: power reduction leads to lower energy consumption over a fixed time span. Arithmetic circuits are considerable contributors to power and energy consumption in computation-intensive applications and therefore require a careful power-delay design tradeoff [1] [2] [3].
Addition is a basic arithmetic operation in many VLSI systems such as DSPs and microprocessors. Propagation delay, power consumption, and power-delay product (PDP) are the significant quality measures for most full adder designs, and the full adder affects the overall performance of the system. That is why optimizing the efficiency of addition remains an attractive research topic. The XOR-XNOR circuits are basic building blocks in various full adder cell circuits [4].
In this paper, a comprehensive approach to analyzing full adder cells is presented. It is based on decomposing the adder cell into small modules and then using simulation to measure the
Source of Power Consumption
There are three major components of power dissipation in CMOS circuits:
1). Switching Power: the power consumed in charging and discharging the circuit node capacitances during transistor switching. It may be thought of as useful power, in that it establishes information by charging and discharging signal lines.
2). Short-Circuit Power: wasted power that comes from short-circuit currents flowing directly from the supply voltage to ground. It represents a small percentage of the total energy consumption.
3). Static Power: the power consumed by static and leakage currents flowing while the circuit is in a stable state.
The first two components are referred to as dynamic power. Dynamic power constitutes the majority of the power dissipated in CMOS circuits. It is the power dissipated in charging and discharging the load capacitances of a given circuit [6]. It depends on the input pattern, which at every clock cycle either causes the transistors to switch (consuming dynamic power) or leaves them unswitched (consuming none). If the capacitance is $C$, then the amount of charge is $CV_{dd}$, and the current is that charge multiplied by how often the output switches. The total energy consumption ($E_{total}$) per cycle is expressed in equation (1).
The dynamic energy consumption ($E_{dyn}$) is shown in equation (2).
Here the summation is over each gate, $C_i$ is the load capacitance at output node $i$, and $V_{dd}$ is the supply voltage. $f_i$ is the average rate at which the output of gate $i$ charges and discharges, and $I_i$ is the average short-circuit current flowing through gate $i$. Reducing the values of these components is therefore the route to circuits with low power consumption.
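As a rough illustration of how the quantities defined above combine, the following sketch sums a switching term (C_i times V_dd squared times f_i) and a short-circuit term (I_i times V_dd) over the gates. This is the textbook form of dynamic power, assumed here because the exact equations are not reproduced in this text; the `Gate` type, the function name, and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gate:
    load_cap_f: float       # C_i: load capacitance at output node i (farads)
    switch_rate_hz: float   # f_i: average charge/discharge rate of the output (hertz)
    short_current_a: float  # I_i: average short-circuit current through the gate (amperes)


def dynamic_power(gates, vdd):
    """Assumed textbook form: sum over gates of C_i*Vdd^2*f_i + I_i*Vdd (watts)."""
    switching = sum(g.load_cap_f * vdd ** 2 * g.switch_rate_hz for g in gates)
    short_circuit = sum(g.short_current_a * vdd for g in gates)
    return switching + short_circuit


# Hypothetical example: two gates at a 1.2 V supply.
gates = [Gate(20e-15, 100e6, 1e-6), Gate(10e-15, 50e6, 0.5e-6)]
power_w = dynamic_power(gates, vdd=1.2)
print(f"dynamic power: {power_w * 1e6:.2f} uW")
```

The switching term scales with the square of the supply voltage, which is why supply scaling is usually the most effective lever among the three quantities.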
Estimating the power of a large circuit is a complex task. We can find the best design by estimating the power of the decomposed modules and then connecting the
Full Adder Building Modules
The full adder cell can be implemented in two logic styles: static and dynamic. Static full adders are generally more reliable, simpler, and consume less power than dynamic ones. Dynamic adders, by contrast, suffer from charge sharing and from high power consumption due to high switching activity and clock load [7].
The simplest way to implement a static full-adder circuit is to take the logic equations and translate them directly into a circuit. The typical full-adder function can be described as follows.
There are standard implementations of the full-adder cell, which is the fundamental unit of a multi-bit adder, and we take these adders into consideration [8] [9] [10]. From equations (3)-(4), most adders' logic expressions are based on two XOR-XNOR circuits: one to generate $H$ (XOR) and $\bar{H}$ (XNOR), and the other to generate the Sum output. $C_{in}$ serves not only as the carry-in bit but also has a multiplexing effect. Equations (3) and (4) can be rewritten as
From equations (5)-(7), it is clear that optimizing the generation of $H$ and $\bar{H}$ can significantly enhance the performance of the full adder cell. A block diagram of the full adder cell and its building blocks is shown in Figure 1 [11]. Module 1 generates $H$ and its complement $\bar{H}$, which are the key variables in both equations. Module 2 generates the Sum from $C_{in}$, $H$, and $\bar{H}$. Module 3 generates $C_{out}$ from $H$, $\bar{H}$, $A$, and $C_{in}$. As mentioned above, the XOR-XNOR module, which generates the $H$ and $\bar{H}$ bits, is the critical component of the adder cell.
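The three-module decomposition can be checked against the arithmetic definition of a full adder. The sketch below assumes the standard forms H = A xor B, Sum = H xor C_in, and C_out = A·(not H) + C_in·H, which are consistent with the module description above but are not copied from the paper's own equations; the function names are hypothetical.

```python
from itertools import product


def full_adder_reference(a, b, cin):
    """Reference behaviour: arithmetic sum of three bits as (Sum, Cout)."""
    total = a + b + cin
    return total & 1, total >> 1


def full_adder_modules(a, b, cin):
    """Three-module decomposition built around H = A xor B (assumed standard form)."""
    h = a ^ b                            # module 1: XOR output H
    h_bar = 1 - h                        # module 1: XNOR output, the complement of H
    s = (h & (1 - cin)) | (h_bar & cin)  # module 2: Sum = H xor Cin, with Cin choosing H or its complement
    cout = (a & h_bar) | (cin & h)       # module 3: H selects between A and Cin
    return s, cout


# Exhaustively compare both descriptions over all eight input combinations.
mismatches = [bits for bits in product((0, 1), repeat=3)
              if full_adder_modules(*bits) != full_adder_reference(*bits)]
print("mismatching input combinations:", mismatches)
```

Because both the Sum and carry expressions consume H and its complement, any delay or swing degradation in module 1 propagates to both outputs, which is why the text treats that module as critical.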
Circuit Analysis of Adder Cell Modules
Module 1 is required to generate the XOR and XNOR functions. The Sum and $C_{out}$ are generated by module 2 and module 3, respectively, which are required to provide enough driving power to the following stage. In other words, the driving cell must provide almost full-swing outputs to the driven cell. Otherwise, the performance of the circuit will be seriously degraded, or the circuit will become non-operative at low supply voltage. In order to optimize the performance of the full adder, it is necessary to analyze each module in detail.
We begin with the first module, which must generate both the XOR and XNOR functions. The XNOR function can be realized by an XOR function followed by an inverter. Another method is to use separate XOR and XNOR modules to generate the XOR-XNOR function, but this needs more transistors and dissipates more power. Several logic styles that have been proposed to implement the XOR-XNOR functions are shown in Figure 2. The smallest circuit presented in Figure 2 uses six transistors, while modules with more than 14 transistors are not competitive because of their large power consumption.
Copyright ⓒ 2014 SERSC
In the following paragraphs, the characteristics of the first-module circuits are briefly described.
The outputs of the nMOSFET pass-transistor network suffer from a threshold voltage drop, which results in the incomplete turn-off of the pMOSFETs in the inverters. This causes large power consumption and also leads to performance degradation and severe design limitations. This structure may require buffers to achieve the desired outputs. One drawback of CPL logic is its limited current-driving capability; delay increases with long pass-transistor chains, so buffering is needed to restore the transmitted signal and improve the driving capability [12] [13] [14]. However, the inverters in turn cause large power consumption due to short-circuit current between the power supply and ground.
(3). Double Pass-Transistor Logic (DPL)
The circuit in Figure (2G) has 10 transistors, including two inverters, and uses double pass-transistor logic, in which both NMOS and PMOS logic networks are used [15]. This gate avoids the nMOSFET threshold voltage drop of the CPL design and eliminates the static power consumption. The problems of narrow noise margin and performance degradation at low supply voltages, which occur in CPL circuits due to the threshold voltage drop, are thereby avoided. However, the drawback of this gate is its large area and input capacitance because of the PMOS transistors used [16].
(4). Transmission Gate Logic
The circuit in Figure (2C) has eight transistors: one inverter, one transmission gate, and two pass transistors produce the XOR function, and the XNOR function is then obtained with an additional inverter. Although it keeps full-swing operation, the circuit consumes more static power due to the inverters and the larger transistor count.
(5). Complementary cross-coupled structure
The circuit in Figure (2D) uses a complementary cross-coupled structure to produce the XOR and XNOR functions. It is able to provide full voltage swing at the output nodes because of two feedback transistors (M1 and M6); however, the feedback lowers the maximum operating frequency and requires the MOSFETs to be ratioed. The propagation delay may increase slightly while the feedback transistors recover the threshold voltage loss. This can be illustrated as follows. When A = 1 and B = 1, NMOS transistor M2 charges the output $\bar{H}$ towards 1. However, since an NMOS transistor passes only a weak '1', $\bar{H}$ is charged only to $V_{dd} - V_{tn}$. This voltage at the $\bar{H}$ node drives NMOS transistor M6 to fully discharge $H$ to 0, which in turn drives PMOS transistor M1 to charge $\bar{H}$ to a strong '1'. Thus, for A = 1 and B = 1, the $\bar{H}$ output node produces a full '1' without a threshold voltage loss. However, when the supply voltage is below $2|V_{tp}|$ and $V_{dd} - V_{tn}$ cannot turn on the NMOS transistor, the feedback circuitry is not effective in restoring the output voltages to their full logic states.
As noted above, the cell can readily be used for low-power operation when the supply voltage is scaled down, as long as it stays above $2|V_{tp}|$.
Simulation results: Following the discussion of the characteristics of the XOR-XNOR cells above, the simulation results at 100 MHz input frequency and 20 fF load capacitance are summarized in Table 1.
B. Module2: Sum circuit
This module can be realized with the XOR-XNOR function. An important requirement of the second module is to provide enough driving capability to the following circuits. Four different designs of the second module are presented in [17]. For this paper, we choose the one that uses transmission gate logic to implement the XOR-XNOR function, as shown in Figure 5.
Simulation of Full Adder Cells
A full adder can be built by connecting the three modules together. This section introduces eight different 1-bit adder cells for analysis and comparison. The cells are shown in Figure 7 on the last page of this paper.
Simulation Environment Setup
All the circuits are designed in the Cadence VIRTUOSO environment using a CMOS design kit. The netlists of all adders are extracted, and simulations are carried out at 27°C with input frequencies of 25 MHz, 50 MHz, 100 MHz, and 200 MHz. The transistors of all adders considered are sized to bring the power-delay product (PDP) as close to its minimum as possible.
We use six input patterns to cover all test situations. Each model is simulated at four frequencies: 25 MHz, 50 MHz, 100 MHz, and 200 MHz. Four different output loads of 0 fF, 10 fF, 20 fF, and 40 fF are used for power and delay measurements. Thus, for each adder, 96 HSPICE simulation runs (4 frequencies * 6 patterns * 4 loads) are executed. This gives a total of 768 simulation runs for comparison (96 simulations/adder * 8 adder cells).
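The run count follows from a full sweep over four frequencies, six input patterns, and four output loads for each of the eight cells. A minimal sketch of enumerating that sweep is shown here; the pattern and cell labels are placeholders, since the six input patterns are not listed in the text.

```python
from itertools import product

frequencies_mhz = [25, 50, 100, 200]
loads_ff = [0, 10, 20, 40]
input_patterns = [f"pattern{i}" for i in range(1, 7)]  # placeholder names for the six patterns
adder_cells = [f"cell{i}" for i in range(1, 9)]        # placeholder names for the eight cells

# One simulation run per (frequency, pattern, load) combination for each adder.
runs_per_adder = list(product(frequencies_mhz, input_patterns, loads_ff))
all_runs = [(cell, *run) for cell in adder_cells for run in runs_per_adder]

print(len(runs_per_adder), "runs per adder,", len(all_runs), "runs in total")
```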
Each simulation covers 50 complete periods. The average power of every adder is taken from the beginning of the second period to the end of the fiftieth period. The first period is excluded from the measurement in order to avoid transient glitches.
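The averaging window can be sketched as follows, assuming the simulator output has been exported as uniformly sampled instantaneous power values. The function name, the sampling scheme, and the trace values are hypothetical and stand in for whatever measurement the simulator itself provides.

```python
def average_power(samples_w, samples_per_period, skip_periods=1):
    """Mean of uniformly sampled instantaneous power, ignoring the initial periods."""
    start = skip_periods * samples_per_period
    window = samples_w[start:]
    if not window:
        raise ValueError("no samples left after skipping the start-up periods")
    return sum(window) / len(window)


# Hypothetical trace: 50 periods of 4 samples each; the first period carries a start-up glitch.
samples_per_period = 4
trace = [9.0, 9.0, 9.0, 9.0] + [1.0, 3.0, 1.0, 3.0] * 49

avg_with_skip = average_power(trace, samples_per_period)                     # periods 2..50
avg_without_skip = average_power(trace, samples_per_period, skip_periods=0)  # all 50 periods
print(avg_with_skip, avg_without_skip)
```

Comparing the two averages shows how much a start-up transient in the first period would bias the reported power if it were included.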
In this paper, the delay is defined as the maximum delay, associated with the longest path, measured for the SUM and $C_{out}$ outputs of the circuit. It is measured from the 50% voltage level of the input signals (after the buffers) to the 50% voltage level of the output signal. The power-delay product (PDP) is a significant measure of efficiency and represents a compromise between power dissipation and speed for CMOS circuits [18]. It is calculated as the worst-case delay multiplied by the average power consumption, as given in equation (24).
To produce more realistic performance in the simulation, buffers are added to all three input nodes. The complete simulation environment is shown in Figure 8. The circuit signals are probed at the outputs and at the inputs of the output inverters [19]. The cells were simulated with HSPICE using a 130 nm CMOS technology at a 1.2 V supply voltage.
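The 50%-to-50% delay measurement and the PDP calculation can be sketched as below. This is only an illustration on sampled waveforms with linear interpolation between samples; the function names and waveform values are hypothetical, and in practice the simulator's own measurement facilities would be used.

```python
def crossing_time(times, volts, level):
    """First time the waveform crosses `level`, using linear interpolation."""
    points = list(zip(times, volts))
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if v0 != v1 and min(v0, v1) <= level <= max(v0, v1):
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def propagation_delay(times, v_in, v_out, vdd):
    """Delay from the 50% point of the input to the 50% point of the output."""
    half = 0.5 * vdd
    return crossing_time(times, v_out, half) - crossing_time(times, v_in, half)


def power_delay_product(delays_s, avg_power_w):
    """PDP = worst-case (maximum) delay multiplied by average power, in joules."""
    return max(delays_s) * avg_power_w


# Hypothetical waveforms at a 1.2 V supply (times in seconds).
t = [0.0, 1e-10, 2e-10, 3e-10, 4e-10]
v_in = [0.0, 1.2, 1.2, 1.2, 1.2]   # buffered input rises through 0.6 V
v_sum = [0.0, 0.0, 0.0, 1.2, 1.2]  # SUM output rises two samples later

delay_sum = propagation_delay(t, v_in, v_sum, vdd=1.2)
delay_cout = 1.5e-10               # hypothetical delay of the Cout path
pdp = power_delay_product([delay_sum, delay_cout], avg_power_w=5e-6)
print(f"delay = {delay_sum:.3e} s, PDP = {pdp:.3e} J")
```

Taking the maximum over the SUM and carry delays before multiplying keeps the PDP tied to the longest path, as the definition above requires.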
Simulation Results and Comparison
We compared the performance of eight full adder cells; the simulation results are summarized in Figure 10 and Figure 12. They show that cell8 and cell2 have larger delays than the rest of the analyzed cells at all four load capacitances. This is due to the pass transistor logic used in the first module, which produces a non-full-swing intermediate signal and has poor output driving capability. Cell6 has the lowest delay of all the adder cells simulated here, and its propagation delay is almost equal at the four different frequencies.
Considering power dissipation, it is clear that cells with the same number of transistors can have different power consumption. Figure 10 indicates that power dissipation increases as frequency increases. It also shows that the cell consuming the least power is Cell6, although it does not have the lowest transistor count. Its schematic and layout are shown in Figure 11. It uses bootstrapped pass transistor logic with SOI MOSFETs to improve both speed and power consumption. Cell6 consumes approximately 8% to 35% less power than the seven other full adder cells at the four different frequencies and four different loads. Cell3, which is based on transmission gate logic, comes second. Cell4, which has two feedback transistors in the first module, dissipates the most power of all the cells, due to the short-circuit path from $V_{dd}$ to $V_{SS}$ that appears when it is simulated in a more realistic environment. This can be illustrated as follows: consider the input pattern A = B = 0 (these signals are generally fed from inverters) applied to the gate. If B now changes from 0 to 1, a momentary short-circuit path from $V_{dd}$ to $V_{SS}$ arises, as shown in Figure 9 [20].
The PDP is a quantitative measure of the trade-off between power consumption and propagation delay, and it is particularly important when low-power operation is needed. The simulation results under different frequencies and loads are shown in Figure 12. They show that Cell6 (Novel circuit) has the best performance: its PDP is approximately 5% to 45% lower than that of all the others under the four different frequencies at a load of 0 fF, while at a load of 40 fF it becomes higher by 3% to 17%.
Overall, Cell6 (Novel circuit), which uses bootstrapped pass transistor logic in a silicon-on-insulator (SOI) process, is an excellent alternative for PDP-efficient designs, and the simulation results of this research are expected to help designers select the full adder cell appropriate to their specific applications.
Conclusion
In this paper, the simulation results show that Cell6 (Novel circuit), which uses bootstrapped pass transistor logic in a silicon-on-insulator (SOI) process, outperforms the other adders analyzed by reducing both power consumption and delay.
The comprehensive simulation shows that the PDP of the Cell6 circuit is improved by 5% to 45% compared with all other reference adder cells at a load of 0 fF, while at a load of 40 fF it becomes higher by 3% to 17%. The experimental results confirm that Cell6 is novel and efficient for system applications.
