Abstract: Scaling the feature size below 0.1 micron makes leakage power dominant, along with the power consumed by interconnects. Among the alternatives for power optimization, the circuit-level approach is the best option for low power consumption. It is used in this work to implement a novel MOS Switch Integrated Ultra-Low Power (MOSSI-ULP) 1-bit full adder using a 32 nm BPTM file. The adder is then used as the base cell in an array multiplier. Its performance is compared with similar adders: SERF, ULPFA, TGA, TFA and BBL-PT. The analysis shows that the proposed design consumes as much as 11 times less average power than ULPFA. Although MOSSI-ULP switches at a moderate frequency, its PDP is 17 times lower than that of ULPFA and its area is up to 3 times smaller.
Introduction
Feature size scaling yields several benefits for low-power operation. One is improved device performance with $V_{dd}$ under 1 V. Scaling also allows the circuit to operate with a lower threshold voltage ($V_t$). Improved interconnect technology minimizes parasitic effects, and reduced junction capacitance is another result of scaling the device. The availability of multiple and variable threshold devices leads to MTCMOS technology, which allows good management of the active and standby power trade-off and a higher density of integration. Sleep transistors and power gating are the familiar methods used in connection with scaled devices.
Even with scaled CMOS devices, high performance is harder to obtain because of velocity saturation [1]. At short channel length L, the drain current per channel width (W) is no longer proportional to $(V_{gs} - V_t)^2 / L$, where $V_{gs}$ is the gate-to-source voltage and $V_t$ is the threshold voltage. Instead, the drain current is proportional to $v_{sat} (V_{gs} - V_t)$, where $v_{sat}$ is the saturation velocity [2]. In sub-micron technology the difference between $V_{dd}$ and $V_t$ is very small, so static leakage dominates and increases the static power consumption. This increases the design complexity, and hence care is needed. This paper implements a 1-bit full adder, named the MOS Switch Integrated Ultra Low Power (MOSSI-ULP) full adder, by integrating PMOS and NMOS transistors as switches using a 32 nm model file. It is used as the base cell in a multiplier array and compared with five other adders of this kind.
Previous unconventional 1-bit full adders
This section reviews previous non-static CMOS adder implementations. In a conventional CMOS circuit, all the MOS transistors swing between the supply rail and ground. In contrast, the designs compared with MOSSI-ULP propagate signals mostly from the applied inputs, and $V_{dd}$ is applied at only a few nodes (static inverters), which makes them hybrid designs. The Transmission Gate Adder (TGA) and Transmission Function Adder (TFA) are similar circuits that use back-to-back coupled PMOS and NMOS devices to offer dynamic switching [3]. The Branch Based Logic and Pass Transistor combined (BBL-PT) adder and the Ultra Low Power Full Adder (ULPFA) use the XOR function for the sum circuit and a static CMOS circuit for the carry [4]. The Static Energy Recovery Full adder (SERF) eliminates the path to ground and so avoids the short-circuit power $P_{sc}$ [5]. Each of these is implemented as the base cell of a multiplier for analysis and comparison with the proposed MOSSI-ULP adder.
Proposed MOSSI-ULP 1-bit adder
Transmission gates use a parallel combination of PMOS and NMOS transistors to realize a specific function. The proposed MOSSI-ULP adder instead has two PMOS transistors at the first stage, a static inverter at the intermediate stage and two pairs of back-to-back coupled PMOS and NMOS transistors at the output end: one pair for the sum output S and the other for the carry output $C_{out}$, as in Fig. 1(a). The remaining part of this section describes the functions and derivations of the proposed adder.
The drains of the two PMOS transistors at the first stage are coupled together, and this node is named q. Input A is applied to the gate of the upper PMOS and to the source of the lower one. Input B is applied to the gate of the lower PMOS and to the source of the upper one. The upper transistor passes $A'B$ while the lower one passes $AB'$. At node q these two signals combine as a wired-OR function, $q = A'B + AB'$, which is an XOR function built using only PMOS transistors as switches (verified in Fig. 3(f)). When $V_g \approx 0$, the PMOS conducts and yields $V_d \approx V_{gs}$, where $V_d$ is the drain voltage, which is at logic '0' for $V_{gs} = 0$ and at logic '1' for $V_d \approx 1$ V. On the other hand, the PMOS transistor is cut off for $|V_{gs}| < |V_{TP}|$ and hence offers $V_{TP}$ as logic '0', termed a weak '0' [6, 7]. Here $V_{TP}$ is the threshold voltage of the PMOS (for 32 nm, $V_{TP} = -0.35$ V). These two conditions are expressed as

$$V_{gs} < V_{TP} \ \text{or} \ |V_{gs}| > |V_{TP}|, \quad \text{for conduction} \tag{1}$$

$$V_{gs} > V_{TP} \ \text{or} \ |V_{gs}| < |V_{TP}|, \quad \text{for cut-off} \tag{2}$$

At node r the static inverter gives the complement of q, $r = q' = (A'B + AB')'$, which is an XNOR function. With respect to $C_{in}$, the sum output S is expressed by Eq. (3).
The carry output $C_{out}$ reuses the signal at node r, which is already computed for the sum output. This logic sharing is one of the circuit-level low-power techniques. Node r is connected to the gate terminals of the PMOS and NMOS transistors of the bottom pair, whose drains are coupled together to produce $C_{out}$. The sources of the pair are driven by inputs C and B respectively, yielding a wired-OR function $C_{out}$ at the coupled drains, given by Eq. (4).
Since $r = q'$, it is also true that $q = r'$. Hence Eq. (4) can be rewritten in terms of q as Eq. (5).
Eq. (3) and Eq. (5) are the sum and carry outputs of MOSSI-ULP. They are verified by the signals shown in Fig. 1(b), while Fig. 1(c) shows the full custom layout.
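The logic described in this section can be checked exhaustively with a short script. The following is a minimal sketch, not the simulation used in this work: every switch is modelled as an ideal Boolean element, so weak logic levels are ignored. Because the equations themselves are not reproduced in this text, the sketch assumes that Eq. (3) reduces to $S = q \oplus C_{in}$ and that Eq. (4) has the form $C_{out} = r' C_{in} + r B$, with the PMOS of the carry pair passing $C_{in}$ and the NMOS passing B. The names `mossi_ulp` and `matches_full_adder` are illustrative.

```python
from itertools import product


def mossi_ulp(a: int, b: int, c_in: int) -> tuple[int, int]:
    """Ideal switch-level model of the adder (assumed equation forms)."""
    # First stage: the PMOS gated by A passes B when A = 0, and the PMOS
    # gated by B passes A when B = 0. The wired-OR at node q is A'B + AB'.
    q = ((1 - a) & b) | (a & (1 - b))
    # Intermediate stage: the static inverter produces r = q' (XNOR).
    r = 1 - q
    # Sum pair (assumed form of Eq. 3): S = q XOR C_in.
    s = q ^ c_in
    # Carry pair (assumed form of Eq. 4): the PMOS conducts when r = 0 and
    # passes C_in; the NMOS conducts when r = 1 and passes B.
    c_out = ((1 - r) & c_in) | (r & b)
    return s, c_out


def matches_full_adder() -> bool:
    """Compare against the arithmetic definition for all 8 input rows."""
    return all(
        s + 2 * c == a + b + c_in
        for a, b, c_in in product((0, 1), repeat=3)
        for s, c in [mossi_ulp(a, b, c_in)]
    )


if __name__ == "__main__":
    # Print the truth table as: A B C_in S C_out
    for a, b, c_in in product((0, 1), repeat=3):
        print(a, b, c_in, *mossi_ulp(a, b, c_in))
    print("matches full adder:", matches_full_adder())
```

Under these assumptions the carry expression selects $C_{in}$ when A and B differ and B when they are equal, which is why the single node r can serve both outputs.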
Simulation results and comparative analysis
The 32 nm Berkeley Predictive Technology Model (BPTM) file is used to implement the adders under consideration, and they are used as base cells to test and analyze their performance. Fig. 2 shows the simulation test setup used to validate the performance of the various logic methods. This top module uses two-input AND gates to generate the product terms of the multiplication, and these gates are drawn in two forms. Both forms denote the same structure: one is the logic symbol and the other is the dot product notation, used to save space. The half adder is also a commonly used cell. The 1-bit full adder is the Design Under Test (DUT), for which each of the discussed adders is substituted in turn for simulation. All the methods use the same test environment, with a supply voltage of $V_{dd} \approx 1$ V, so that the results are obtained uniformly and as accurately as possible. Full adders and half adders (Fig. 3(e)) are used depending on the depth of the vector merging cells. Transistor sizing is the key factor that decides the value of $V_t$ and reduces power consumption as well as the circuit's switching activity [8]. Here all the methods use the same channel width profiles. The width of the NMOS transistors is twice the feature size (channel length: 32 nm). For the PMOS transistors the width is double that of the NMOS. The results for the chosen adders are listed in Table I.
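As a worked instance of the sizing rule stated above (the numerical widths are derived here from that rule and are not quoted from Table I):

$$L = 32\ \text{nm}, \qquad W_n = 2L = 64\ \text{nm}, \qquad W_p = 2W_n = 128\ \text{nm}$$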
[Figure: Multiplier array using RCA blocks to test the adders]

The RCA structure uses 2T AND gates, as in Fig. 3(b), to produce the partial products; these need less power than the conventional gate in Fig. 3(a). For an m x n array with regular AND gates, the transistor count is

$$TC_{AND} = 6mn \tag{6}$$

where $TC_{AND}$ is the transistor count of the AND gates, m is the number of digits of the multiplicand and n is that of the multiplier. For the 2T gate used in the RCA, it is

$$TC_{AND} = 2mn \tag{7}$$

Hence the 2T structure needs one third of the transistors of the regular gate; its waveforms are shown in Fig. 3(c). Fig. 3(d) shows the conventional half adder, which can be replaced with the circuit in Fig. 3(e), whose signals are given in Fig. 3(f). Short-circuit power ($P_{sc}$) and leakage power ($P_{leak}$) are parts of the total power dissipation. The former occurs when the output is in a meta-stable state, while the latter is caused by the leakage current that flows when the input voltage is less than $V_t$ [9].
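The transistor counts of Eq. (6) and Eq. (7) can be compared for a concrete array size. The sketch below is illustrative only: the 4 x 4 array is a hypothetical example, since the array size is not stated here, and the helper name `and_array_transistors` is not from the original work.

```python
def and_array_transistors(m: int, n: int, per_gate: int) -> int:
    """Transistors needed by the m x n array of partial-product AND gates."""
    if m <= 0 or n <= 0 or per_gate <= 0:
        raise ValueError("m, n and per_gate must be positive")
    return per_gate * m * n


# Hypothetical 4 x 4 multiplier, chosen only for illustration.
conventional = and_array_transistors(4, 4, per_gate=6)  # Eq. (6): 6mn
two_t = and_array_transistors(4, 4, per_gate=2)         # Eq. (7): 2mn

print(conventional, two_t, conventional // two_t)
```

The ratio is 3 for any m and n, because the mn factor cancels.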
In deep sub-micron technology, $P_{sc}$ and $P_{leak}$ are less important than the dynamic power, which dominates and depends on the frequency as given by Eq. (8).
In Eq. (8), for a full-swing signal the input voltage $V_i \approx V_{dd}$, so the voltage term reduces to $V_{dd}^2$. A circuit with smaller load capacitance, higher delay and fewer nodes consumes less power. Hence, with the fewest nodes, the proposed MOSSI-ULP consumes 13 times less power than the ULPFA. Fig. 4(a) compares the average power for $V_{dd}$ ranging from 0.8 V to 1.6 V. The TGA and TFA switch faster than the other designs because the supply voltage is available at their intermediate nodes. For all the adder types, the delay ($t_d$) shows no major variation with $V_{dd}$, because the difference in delay becomes distinct only when $V_{dd}$ is less than $V_t$ or when different feature sizes are compared. The delay is given as,
Here k is the transconductance and W is the width of the MOS transistors. The suffixes n and p refer to the NMOS and PMOS transistors respectively. The switching factor is calculated as the number of nodes making a transition from logic '0' to '1' per unit time. Area is calculated using λ-based design rules, with a pitch of 8λ. The proposed architecture uses the least area, whereas the ULPFA requires more than three times as much. The MOSSI-ULP adder is also efficient in terms of Power Delay Product (PDP), as shown in Fig. 4(b), which uses a logarithmic scale to make the comparison legible.
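The relation between dynamic power, supply voltage and PDP discussed in this section can be sketched numerically. This is a minimal sketch under stated assumptions: it takes the dynamic power to have the usual switching form, activity factor times load capacitance times $V_i V_{dd}$ times frequency, which matches the description of Eq. (8) but is not copied from it, and it takes PDP as average power multiplied by delay. All numerical values are placeholders and none come from Table I.

```python
def dynamic_power(alpha: float, c_load: float, v_in: float,
                  v_dd: float, freq: float) -> float:
    """Switching power in watts: alpha * C_L * V_i * V_dd * f (assumed form)."""
    return alpha * c_load * v_in * v_dd * freq


def power_delay_product(power: float, delay: float) -> float:
    """PDP in joules: average power multiplied by propagation delay."""
    return power * delay


# Placeholder operating point: full swing, so V_i is taken equal to V_dd.
p = dynamic_power(alpha=0.5, c_load=1e-15, v_in=1.0, v_dd=1.0, freq=1e9)
pdp = power_delay_product(p, delay=100e-12)

# Lowering the supply to 0.8 V scales the power by 0.8 squared.
p_low = dynamic_power(alpha=0.5, c_load=1e-15, v_in=0.8, v_dd=0.8, freq=1e9)

print(f"P_dyn = {p:.3e} W, PDP = {pdp:.3e} J, P_dyn at 0.8 V = {p_low:.3e} W")
```

Because PDP multiplies power by delay, a design that is slower can still have the lower PDP when its power advantage exceeds its delay penalty.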
Conclusion
Similar circuits, which pass signals to the output with the help of the applied inputs, are simulated and compared with the MOSSI-ULP adder. With just 8 transistors and fewer nodes, the MOSSI-ULP consumes 4 times less power than the SERF adder, which is the smallest margin among the compared designs. Its switching speed is moderate: it is nearly 5 times slower than the TFA and TGA circuits, which propagate faster because the supply voltage is available at their intermediate nodes. The Power Delay Product of the MOSSI-ULP is 30% lower than that of TFA and TGA, and 17 times lower than that of the ULPFA. The layout area of the proposed architecture is up to 3.4 times more compact than that of the other adders. Signal integrity issues are left as future work.
