ABSTRACT
INTRODUCTION
In most digital systems, adders are a fundamental component in the design of application-specific integrated circuits such as RISC processors, digital signal processors (DSPs) and microprocessors. The design criteria of a full adder cell are usually multi-fold. Transistor count is, of course, a primary concern, since it largely determines the design complexity of many functional units such as multipliers and arithmetic logic units (ALUs). The basic principle in designing a digital adder circuit revolves around reducing the required hardware, and thus the cost. To achieve this, logic optimization is used to obtain the minimum number of literals, which minimizes the transistor count and the power consumption and increases the speed of operation.
A logic expression can be written in various logic forms, which differ in literal count. In widely used MOS circuits, the number of transistors needed to implement a Boolean expression is directly proportional to the literal count of its logic form [1] [2]. Thus, logic optimization amounts to deriving a logic form with the fewest literals. Logic-level optimization is the design task in which an RTL circuit description is optimized in terms of area, delay and power.
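As a small illustration of how equivalent logic forms can differ in literal count, the sketch below compares two forms of the full-adder carry function. The two expressions and the helper names (`count_literals`, `equivalent`) are assumptions chosen for illustration and are not taken from the expressions evaluated in this paper; a literal is counted simply as one occurrence of an input variable.

```python
import re
from itertools import product

# Two logically equivalent forms of the carry-out function (illustrative only).
FORM_SOP = "(A and B) or (B and C) or (A and C)"   # sum-of-products form
FORM_FACTORED = "(A and B) or (C and (A or B))"    # factored form

def count_literals(expr: str) -> int:
    """Count every occurrence of an input variable (A, B or C) in the expression."""
    return len(re.findall(r"\b[ABC]\b", expr))

def equivalent(expr1: str, expr2: str) -> bool:
    """Check logical equivalence by evaluating both forms on all 8 input combinations."""
    for a, b, c in product([False, True], repeat=3):
        env = {"A": a, "B": b, "C": c}
        if bool(eval(expr1, {}, env)) != bool(eval(expr2, {}, env)):
            return False
    return True

print("SOP literals:     ", count_literals(FORM_SOP))
print("Factored literals:", count_literals(FORM_FACTORED))
print("Equivalent:       ", equivalent(FORM_SOP, FORM_FACTORED))
```

Under the proportionality assumption stated above, the factored form would be expected to need fewer transistors than the sum-of-products form, although the actual count depends on the logic style used.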
Conventionally, logic-level optimization is achieved in two steps: Technology-Independent (TI) and Technology-Dependent (TD) optimization. In the former, the circuit's Boolean description is optimized without regard to the technology in which the circuit will be implemented. In the latter, the output of the technology-independent step (i.e. the optimized Boolean network) is optimized for the adopted technology. During the TI step there is much flexibility to restructure the circuit logic to minimize the number of nodes and literals, thereby reducing the area of the circuit. It is also at this stage that the circuit can be most effectively restructured to meet the delay constraints critical for circuit performance. During the TD step, the delay characteristics of the target library are available, but very little restructuring of the circuit is possible.
Logical effort [13, 14] has been widely used in a variety of application domains as well as in industry-standard EDA synthesis tools. Designing a circuit to achieve the greatest speed or to meet a delay constraint presents a bewildering array of choices [13, 14]. The method of logical effort is a design procedure for achieving the least delay along a path of a logic network. It is based on a simple approximation that treats MOS circuits as networks of resistance and capacitance. This RC model allows the circuit's maximum speed to be obtained from simple calculations. This paper also presents a delay model for the optimized full adder circuit and the resulting delay estimate.
In this paper, we propose 20 different Boolean expressions (logic constructions) to implement a 1-bit full adder circuit. All the Boolean expressions are realized in CMOS logic. The optimization method used in this work is the technology-independent optimization step. The realizations are analyzed in terms of transistor count, delay and power dissipation using Tanner EDA with TSMC MOSIS 250 nm technology. From this analysis the optimized equation is selected, implemented in terms of multiplexers, and incorporated in selected existing adder topologies: ripple carry adder, carry look-ahead adder, carry skip adder, carry select adder, carry increment adder and carry save adder. The performance of each topology is analyzed in terms of area (slices used) and maximum combinational path delay as a function of size. The existing and logic-optimized schemes are compared on cell-based VLSI technologies, such as standard-cell based FPGAs. The cell-based approach is justified by its widespread use in the ASIC design community and its compatibility with hardware synthesis, which in turn satisfies the demand for ever higher productivity. This work presents the adder comparison in terms of CLBs occupied and the maximum combinational delay of each adder topology.
The paper is organized as follows. Section 2 describes the existing adder topologies. Section 3 presents the Boolean expressions for the design of the 1-bit full adder cell. Section 4 presents the simulation and analysis of the full adder using Tanner EDA. Section 5 presents the FPGA implementation of the different adder topologies. Section 6 summarizes the comparison. Finally, the conclusion is presented in Section 7.
REVIEW OF EXISTING ADDER TOPOLOGY
Most VLSI applications, such as digital signal processing, image and video processing, and microprocessors, make extensive use of arithmetic operations. Addition, subtraction, multiplication, and multiply-and-accumulate (MAC) are among the most commonly used operations. The 1-bit full-adder cell is the building block of all these modules, so enhancing its performance is critical for enhancing overall module performance. This section presents an overview of the existing adder topologies.
In FPGAs, the adder is the most fundamental component implemented for high-speed applications such as microprocessors, arithmetic logic units, program counters and multiply-accumulate units. Many implementations of these adder topologies have been reported that optimize area, delay and power dissipation. Reference [1] provides an overview for comparing adders in the early design phase, so that an appropriate structure can be selected under constraints on area, delay and power dissipation. It also covers pre-estimation of the energy-delay product and power estimation in the energy-delay space. Reference [2] proposes high-speed, low-power full adder cells designed with pass-transistor logic styles to reduce the power-delay product (PDP). It also reports a performance comparison of adder cells built in CMOS, DCVS, CPL, DPL, swing-restored CPL and hybrid styles, and shows adder cells with an enhanced carry generation stage implemented with multiplexers. With this logic, no internal signals need to be generated to control the selection of the output multiplexers, which reduces the overall propagation delay.
The adder topologies present in the literature [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] can be summarized as follows. The Ripple Carry Adder (RCA) is the simplest but slowest adder, with O(n) area and O(n) delay, where n is the operand size in bits. Carry Look-Ahead (CLA) adders have O(n log(n)) area and O(log(n)) delay, but typically suffer from irregular layout. Carry skip, carry increment and carry select adders, on the other hand, have O(n) area and O(√n) delay, providing a good compromise between area and delay along with a simple and regular layout. Carry save adders have O(n) area and O(log n) delay. The ripple carry adder, the most basic variant, sits at one extreme of the spectrum, with the fewest CLBs but the highest delay. CLA adders can be realized in two gate levels provided there is no limit on fan-in and fan-out. Carry select adders reduce the computation time by pre-computing the sum for both possible carry values ('0' and '1'); once the carry becomes available, the correct sum is selected using a multiplexer. Carry select adders belong to the class of fast adders, but they suffer from a fan-out limitation, since the number of multiplexers that must be driven by the carry signal increases exponentially. In the worst case, a carry signal is used to select n/2 multiplexers in an n-bit adder. When three or more operands are to be added using two-operand adders, the time-consuming carry propagation must be repeated several times: if the number of operands is k, carries have to propagate (k-1) times.
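The carry select principle described above can be sketched behaviourally as follows. This is a minimal bit-level model, not the structure implemented in this paper: the fixed block size of 4 bits, the 16-bit default width and the function names are assumptions made for illustration.

```python
def ripple_add(a_bits, b_bits, carry_in):
    """Ripple-carry add two little-endian bit lists; returns (sum_bits, carry_out)."""
    carry = carry_in
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sum_bits, carry

def carry_select_add(x, y, width=16, block=4):
    """Add x and y with a carry-select structure; returns (sum, carry_out)."""
    x_bits = [(x >> i) & 1 for i in range(width)]
    y_bits = [(y >> i) & 1 for i in range(width)]
    carry = 0
    result_bits = []
    for start in range(0, width, block):
        blk_x = x_bits[start:start + block]
        blk_y = y_bits[start:start + block]
        # Pre-compute the block sum for both possible incoming carries.
        sum0, carry0 = ripple_add(blk_x, blk_y, 0)
        sum1, carry1 = ripple_add(blk_x, blk_y, 1)
        # The real incoming carry acts as the multiplexer select line.
        result_bits.extend(sum1 if carry else sum0)
        carry = carry1 if carry else carry0
    total = sum(bit << i for i, bit in enumerate(result_bits))
    return total, carry

print(carry_select_add(0x1234, 0x0FFF))
print(carry_select_add(0xFFFF, 0x0001))
```

In hardware the two candidate sums of every block are computed in parallel, so only the multiplexer chain lies on the carry path; the sequential loop here reproduces the function, not the timing.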
MATHEMATICAL EQUATIONS FOR FULL ADDER
A full adder is a combinational circuit that computes the arithmetic sum of three bits: A, B and a carry-in C from a previous addition. It produces the corresponding sum, SUM (S), and a carry-out, CARRY. The various equations for SUM and CARRY given below are implemented in CMOS logic using the technology-independent optimization process, and their performance is analyzed in terms of transistor count, delay and power dissipation using Tanner EDA with TSMC MOSIS 250 nm technology. From this analysis the optimized equation is selected, implemented in terms of multiplexers, and incorporated in selected existing adder topologies (ripple carry adder, carry look-ahead adder, carry skip adder, carry select adder, carry increment adder and carry save adder), whose performance is analyzed in terms of area (slices used) and maximum combinational path delay as a function of size.
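For reference, the textbook form of the two full-adder functions is given below, together with an equivalent form of the carry that maps directly onto a 2:1 multiplexer selected by the XOR of A and B. These are standard identities and are not necessarily written in the same form as any of the 20 expressions evaluated in this paper.

```latex
\begin{align}
S &= A \oplus B \oplus C \\
\mathrm{CARRY} &= AB + BC + CA \\
\mathrm{CARRY} &= (A \oplus B)\,C + \overline{(A \oplus B)}\,A
\end{align}
```

The last line holds because A and B are equal whenever their XOR is 0, so the carry is then simply A, while for unequal A and B the carry-in C is passed through.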
It is also possible to calculate the delay of a circuit mathematically, by constructing delay models with the method of logical effort instead of using simulation tools. Logical effort provides a simple "back of an envelope" method [13, 14] for choosing the best topology (logical construct) and number of logic stages for a function. The speed of a circuit network can be optimized with this method, which indicates how many stages of logic are required for the fastest implementation of a given logic function. The speed of the circuit depends on the capacitive load that the logic gate drives and on the logic function of the gate. The delay incurred in a logic gate is expressed as the sum of two components, the parasitic delay p and the effort delay f, as follows [13] [14]:
$$d = f + p$$
The delay in a single-stage network is expressed as
$$d = gh + p$$
where
g is the logical effort (the ability of the logic gate's topology to produce output current),
h is the electrical effort (the ratio of output capacitance to input capacitance),
p is the parasitic, or intrinsic, delay (the delay of the gate due to its own internal capacitance).
Table 1 presents the logical effort of common static CMOS gates, assuming a pull-up to pull-down aspect ratio of 2:1 so that rise and fall delays are equal. Table 2 presents the parasitic delay of CMOS logic gates, which is independent of the size of the gate and of the load capacitance it drives. The principal contribution to the parasitic delay is the capacitance of the source/drain regions of the transistors that drive the gate's output.
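The single-stage relation, delay = g·h + p, can be evaluated with a few lines of code. The sketch below is only an illustration: the gate parameters are the commonly quoted textbook values for static CMOS gates with 2:1 sizing and are assumed here rather than copied from Tables 1 and 2, the function names are hypothetical, and the result is in normalized delay units rather than picoseconds.

```python
# Commonly quoted logical-effort parameters for static CMOS gates with
# 2:1 pull-up/pull-down sizing (assumed values, not taken from Tables 1 and 2).
GATES = {
    "inverter": {"g": 1.0, "p": 1.0},
    "nand2": {"g": 4.0 / 3.0, "p": 2.0},
    "nor2": {"g": 5.0 / 3.0, "p": 2.0},
}

def stage_delay(gate, c_out, c_in):
    """Delay of one stage, d = g*h + p, with h = c_out / c_in (normalized units)."""
    params = GATES[gate]
    h = c_out / c_in  # electrical effort
    return params["g"] * h + params["p"]

def path_delay(stages):
    """Total delay of a path given as a list of (gate, c_out, c_in) tuples."""
    return sum(stage_delay(gate, c_out, c_in) for gate, c_out, c_in in stages)

# An inverter driving a load equal to its own input capacitance.
print(stage_delay("inverter", c_out=10.0, c_in=10.0))
# A two-stage path: a 2-input NAND followed by an inverter driving 4x its input.
print(path_delay([("nand2", 10.0, 10.0), ("inverter", 40.0, 10.0)]))
```

Converting the normalized result into an absolute delay requires the process-dependent time constant of the target technology, which is not modelled here.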
An example calculation of the delay of a full adder is shown in Figure 2. The circuit is realized as a two-stage network (stage 1 and stage 2). Assume an input capacitance of 10 pF on each input and a maximum output load capacitance of 10 pF. The total delay is then the sum of the CARRY and SUM delays, which equals 22.6 ps. This shows that the delay of the circuit varies with the input and output capacitance values.
SIMULATION AND PERFORMANCE ANALYSIS OF FULL ADDER
The proposed 20 Boolean expressions (logic constructions) are simulated using Tanner EDA with BSIM3v3 250 nm technology, with the supply voltage ranging from 1 V to 2 V in steps of 0.2 V. All the full adders are simulated at multiple design corners (TT, FF, FS and SS) to verify operation across variations in device characteristics and environment. The simulation setup for the optimized full adder (using XOR and MUX), comprising its test bed, gate-level equivalent and input/output waveforms, is shown in Figure 3. The test bed is supplied with voltages up to a nominal 2 V in steps of 0.2 V, uses the technology library file Generic 025, and is run under TT, FF, FS and SS conditions. The W/L ratios of both nMOS and pMOS transistors are taken as 2.5/0.25 µm. To establish an unbiased testing environment, the simulations have been carried out using a comprehensive input signal pattern that covers every possible transition for a 1-bit full adder.
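To make "every possible transition" concrete, the sketch below enumerates all ordered pairs of distinct input patterns for the three inputs A, B and C. It only illustrates the size of the stimulus set; the variable and function names are hypothetical, and how the patterns were actually sequenced in the Tanner EDA test bed is not specified here.

```python
from itertools import permutations, product

# All 3-bit input patterns for (A, B, C), from "000" to "111".
PATTERNS = ["".join(str(bit) for bit in bits) for bits in product((0, 1), repeat=3)]

# Every ordered pair of distinct patterns is one input transition.
TRANSITIONS = list(permutations(PATTERNS, 2))

def switching_inputs(start, end):
    """Number of inputs that change value in the transition start -> end."""
    return sum(s != e for s, e in zip(start, end))

print(len(PATTERNS), "patterns,", len(TRANSITIONS), "transitions")
print(switching_inputs("000", "111"), "inputs switch in 000 -> 111")
```

With 8 patterns there are 8 × 7 = 56 ordered transitions, including those in which two or three inputs switch simultaneously.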
The frequencies have been chosen in the range from 10 to 200 MHz, and the input and output capacitances are set to 10 pF. The three inputs to the full adder are A, B and C; all the test vectors are generated and fed into the adder cell. The cell delay is measured from the moment the inputs reach 50% of the supply voltage to the moment the later of the SUM and CARRY signals reaches the same voltage level. All transitions from one input combination to another (8 patterns in total: 000, 001, 010, 011, 100, 101, 110, 111) have been tested, and the delay at each transition has been measured. The average is reported as the cell delay. The power consumption is also measured for these input patterns, and the average power is reported together with the other simulation results in Table 3. The performance of all the full adders has been analyzed in terms of delay, transistor count and power dissipation. It is observed that the adder designed with XOR and MUX has the least delay, transistor count and power dissipation of all the gate combinations. The adder realized with MUX and XOR is therefore considered the optimized adder in terms of delay, transistor count and power dissipation. The second-best full adder is realized from XNOR, NOT and MUX.
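The 50% measurement rule described above can be expressed as a short post-processing routine. This is a sketch under stated assumptions: the waveforms are taken to be sampled as lists of time and voltage points, crossings are located by linear interpolation, and the function names and sample data are hypothetical rather than part of the Tanner EDA flow.

```python
from statistics import mean

def crossing_time(times, volts, threshold):
    """First time the sampled waveform crosses the threshold (linear interpolation)."""
    samples = list(zip(times, volts))
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if v0 != v1 and (v0 - threshold) * (v1 - threshold) <= 0:
            return t0 + (threshold - v0) * (t1 - t0) / (v1 - v0)
    return None  # the signal never crosses the threshold

def cell_delay(times, v_in, v_sum, v_carry, vdd):
    """Delay from the input 50% point to the later of the SUM and CARRY 50% points."""
    half = vdd / 2.0
    t_in = crossing_time(times, v_in, half)
    t_outs = [crossing_time(times, v, half) for v in (v_sum, v_carry)]
    t_outs = [t for t in t_outs if t is not None]  # ignore outputs that do not switch
    return max(t_outs) - t_in

# Synthetic example waveforms (time in ns, voltage in V, supply of 2 V).
times = [0.0, 1.0, 2.0, 3.0, 4.0]
v_in = [0.0, 2.0, 2.0, 2.0, 2.0]
v_sum = [0.0, 0.0, 2.0, 2.0, 2.0]
v_carry = [2.0, 2.0, 2.0, 0.0, 0.0]

delay = cell_delay(times, v_in, v_sum, v_carry, vdd=2.0)
print("delay for this transition:", delay, "ns")
# The reported cell delay is the mean over all measured transitions.
print("average cell delay:", mean([delay, 1.0, 3.0]), "ns")
```

The extra values 1.0 and 3.0 in the last line are placeholders standing in for other transitions, not measured results.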
FPGA IMPLEMENTATION
The adder structures used in this work are the Ripple Carry Adder, Carry Look-Ahead Adder, Carry Save Adder, Carry Increment Adder, Carry Select Adder and Carry Skip Adder. From Section 4 it is observed that the optimized equation for implementing the 1-bit full adder uses XOR and MUX. The primitive of this adder cell is therefore implemented with multiplexers, and this module is incorporated in the existing adder topologies. The target FPGA device chosen for the implementation of these adders was the Spartan-3E XC3S500-5FG320, using Xilinx ISE 12.1. This device was chosen because the Spartan-3E family of Field-Programmable Gate Arrays (FPGAs) is specifically designed to meet the needs of high-volume, cost-sensitive consumer electronic applications. The Spartan-3E family builds on the success of the earlier Spartan-3 family by increasing the amount of logic per I/O, significantly reducing the cost per logic cell. These Spartan-3E enhancements, combined with advanced 90 nm process technology, deliver more functionality and bandwidth. Each adder type was implemented with operand sizes of 8, 16, 32 and 64 bits. This range of sizes gives more insight into the performance of each adder in terms of area and delay as a function of size. Structural gate-level modeling in Verilog HDL was used to model each adder, and the Xilinx ISE Foundation version 12.1i software was used for synthesis and implementation. Figure 6 shows the different adder topologies implemented with the optimized equations realized in terms of multiplexers. It is noticed that the delay, area and power-delay product are lower than with the normal expression.
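The idea of building the adder cell from multiplexers and reusing it in a larger topology can be checked with a small behavioural model. The paper's implementation is structural Verilog; the Python sketch below is only a functional stand-in, and the particular mapping (XOR built from a 2:1 multiplexer, carry selected by the XOR of A and B) and all function names are assumptions rather than the exact netlist used.

```python
def mux2(sel, d0, d1):
    """2:1 multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0

def xor_mux(a, b):
    """XOR built from one multiplexer: pass b when a is 0, the inverse of b when a is 1."""
    return mux2(a, b, 1 - b)

def full_adder_mux(a, b, c):
    """1-bit full adder using multiplexers only; returns (sum, carry)."""
    p = xor_mux(a, b)        # propagate signal A xor B
    s = xor_mux(p, c)        # SUM = A xor B xor C
    carry = mux2(p, a, c)    # CARRY = C when A != B, otherwise A
    return s, carry

def ripple_carry_add(x, y, width, carry_in=0):
    """Chain 'width' full adder cells; returns (sum, carry_out)."""
    carry = carry_in
    total = 0
    for i in range(width):
        s, carry = full_adder_mux((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry

for width in (8, 16, 32, 64):
    x = (1 << width) - 1     # all ones
    print(width, ripple_carry_add(x, 1, width))
```

Adding 1 to an all-ones operand exercises the full carry chain, which corresponds to the worst-case combinational path of the ripple carry structure.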
SUMMARY
The proposed 20 Boolean expressions (logic constructions) are simulated using Tanner EDA with BSIM3v3 250 nm technology, with the supply voltage ranging from 1 V to 2 V in steps of 0.2 V. It is observed that the adder designed with XOR and MUX has the least delay, transistor count and power dissipation of all the gate combinations, so it is considered the optimized adder in these terms. A new low-power, high-speed full adder cell is proposed using XOR and MUX gates, and its performance has been analyzed and reported in Section 4. This optimized adder is designed as a fully MUX-based structure on the FPGA using Verilog HDL, and the module is incorporated in the existing adder topologies for comparison. The target FPGA device chosen for the implementation of these adders was the Spartan-3E XC3S500-5FG320, using Xilinx ISE 12.1. The adders are compared in terms of delay, slices occupied, AT and power dissipation. It is observed that the delays of RCA and CLA are the same; their distribution is shown in the graph (Figure 7a). The number of slices used is also identical for RCA and CLA, so their distribution appears as a single red line in the chart (Figure 7b). From the AT chart (Figure 7c) it is noticed that the AT value is largest for the 64-bit carry select adder, while the ripple carry, carry look-ahead and carry increment adders have lower AT values. From the PD distribution (Figure 7d), the carry increment and ripple carry adders dissipate the least power, and the carry save and carry skip adders dissipate the most. According to the presented results, the adder topologies with the best compromise between area, delay and power dissipation are the carry look-ahead and carry increment adders, which are suitable for high-performance and low-power circuits. The fastest adders are the carry select and carry save adders, at the cost of area.
The simplest adder topologies, suitable for low-power applications, are the ripple carry adder, carry skip adder and carry bypass adder, which have the lowest gate count and the maximum delay.
CONCLUSION
An extensive performance analysis of 1-bit full-adder cells has been presented. Technology-independent logic optimization was used to design a 1-bit full adder with 20 different Boolean expressions, and their performance was analyzed in terms of transistor count, delay and power dissipation using Tanner EDA with TSMC MOSIS 250 nm technology. From this analysis, the XOR and MUX based expression provides the lowest transistor count, delay and power dissipation of all the logic equations. The second-best full adder can be realized using XNOR, NOT and MUX. Other optimized solutions for constructing full adders use NAND gates only, the XOR, XNOR, MUX combination, or the XOR, AND, OR, MUX combination. The worst-case full adder construction uses NOR gates, which requires a large transistor count, dissipates more power and has a longer delay. A logical effort delay model to estimate the parasitic delay is also presented. Using the optimized expression, the primitive adder cell is implemented with multiplexers, and this module is incorporated in existing adder topologies (ripple carry adder, carry look-ahead adder, carry skip adder, carry select adder, carry increment adder and carry save adder), whose performance is analyzed in terms of area (slices used) and maximum combinational path delay as a function of size. The target FPGA device chosen for the implementation of these adders was the Spartan-3E XC3S500-5FG320, using Xilinx ISE 12.1. The comparison and its simulation results have been presented. Based on the comparison, it is observed that the number of slices occupied, the power dissipation and the delay are lower with the optimized expression. The work presented in this paper gives more insight into, and a deeper understanding of, the constituent modules of the adder cell, to help designers make their choices.
