Abstract: In this paper the main topologies of one-bit full adders, including the most interesting of those recently proposed, are analyzed and compared in terms of speed, power consumption, and power-delay product (PDP). The comparison has been performed on two classes of circuits: the former with minimum transistor size to minimize power consumption, the latter with optimized transistor dimensions to minimize the power-delay product. The investigation has been carried out with properly defined simulation runs in a Cadence environment using a 0.35-μm process, including the parasitics extracted from layout. Performance has also been compared for different supply voltage values. From these results, design guidelines have been derived to select the most suitable topology for the required design features. This paper also proposes a novel figure of merit to realistically compare N-bit adders implemented as a chain of one-bit full adders. The results differ from those previously published, owing both to the more realistic simulations carried out and to the more appropriate figure of merit used. They show that, except for short chains of blocks or for cases where minimum power consumption is desired, topologies with only pass transistors or transmission gates are not attractive. In contrast, the most interesting implementations in terms of the tradeoff between power and delay are the traditional CMOS and Mirror topologies. Moreover, the Dual-rail Domino and the CPL offer the best speed performance.
I. INTRODUCTION

Due to increasing interest in low-power ICs [1]-[7] for portable measurement instrumentation, laptop computers, cellular communications, etc., design choices that take low-power features into consideration along with other circuit features are of the utmost importance.
Addition is the most commonly used arithmetic operation in microprocessors and DSPs, and it is often one of the speed-limiting elements [8], [9]. Hence, the adder should be optimized in terms of speed and/or power consumption. During the design of an adder, two choices have to be made at different design abstraction levels. The first defines the adder's architecture, implemented with the one-bit full adder as a building block. The second defines the specific transistor-level design style used to implement the one-bit full adder. In this paper we focus on the lower design level: we analyze and compare the most interesting known topologies for implementing a one-bit full adder. Moreover, considerations at the architecture level that follow directly from the kind of one-bit full adder adopted are also included. The one-bit full adder is a three-input, two-output block. The inputs are the two bits to be summed, $A$ and $B$, and the carry bit $C_{in}$, which derives from the computation of the previous digits. The outputs are the result of the sum operation, $S$, and the resulting value of the carry bit, $C_{out}$. More specifically, the sum and carry outputs are given by
$$S = A \oplus B \oplus C_{in} \qquad (1)$$
$$C_{out} = AB + C_{in}\,(A \oplus B) \qquad (2)$$
From (2) it is evident that if $A = B$, the carry output is equal to their value. If $A \neq B$, we have $C_{out} = C_{in}$ (the full adder is said to be in propagate mode). Hence, the full adder has to wait for the computation of $C_{in}$.
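The two Boolean equations can be checked exhaustively. The following Python sketch enumerates the eight input combinations. It verifies that $2C_{out} + S = A + B + C_{in}$, and that $C_{out} = C_{in}$ whenever $A \neq B$, which is the propagate mode discussed above. The function name `full_adder` is illustrative only.

```python
from itertools import product


def full_adder(a: int, b: int, cin: int) -> tuple:
    """Return (sum, carry_out) following Eqs. (1) and (2)."""
    s = a ^ b ^ cin                    # Eq. (1): S = A xor B xor C_in
    cout = (a & b) | (cin & (a ^ b))   # Eq. (2): C_out = AB + C_in (A xor B)
    return s, cout


for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert 2 * cout + s == a + b + cin   # outputs encode the arithmetic sum
    if a == b:
        assert cout == a                 # carry fixed by A = B, C_in ignored
    else:
        assert cout == cin               # propagate mode: C_out = C_in
print("Eqs. (1)-(2) verified for all 8 input combinations")
```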
Until now, in the literature there have been some comparisons between full adder circuits [10] , [11] . However, in the former work, no low-power topologies were analyzed at all, whereas in the latter, new topologies which appear to be promising are not taken into account [12] - [14] . Moreover, the effects of the interconnection parasitics of low-power full adders in [11] were extracted from layout only for the CMOS and CPL topologies, while they were only approximately estimated for the other circuits. For these reasons, these approximate results differ from those presented in this paper.
Apart from [10] and [11] , no systematic comparisons have been developed in the literature for other topologies, and recently proposed circuits are compared to existing ones by applying different simulation and comparison strategies, and even using different technologies. Hence, it is not easy to compare performances in a fair and clearly understandable way.
The analysis and comparison developed here have been carried out in terms of speed, power consumption, and power-delay product (PDP). The investigation also includes the most interesting recently proposed one-bit full adders. It is based on postlayout simulation runs in a Cadence environment using a 0.35-μm process, so parasitics are taken into account. Two design strategies have been used to size each topology. The former minimizes power consumption by adopting minimum-size transistors; the latter achieves the minimum power-delay product by suitable transistor sizing. Performance for both design strategies has also been compared for different supply voltage values. In Section II the considered topologies are reported and briefly described, together with the reasons for not considering other known topologies. Section III provides a detailed description of the simulation strategy adopted, which enabled accurate and consistent results to be obtained. The performance of the one-bit full adders in terms of propagation delay, power, and PDP is then analyzed and compared. More specifically, the analysis and comparison in Section IV are based on transistor sizing for minimum power dissipation, while those in Section V assume transistor sizing for minimum PDP. In Section VI a higher design level is considered. In particular, it is shown that for an adder, or its building blocks, designed as cascaded one-bit full adders, speed performance cannot simply be represented in terms of propagation delay.
Thus, a more realistic figure of merit is introduced and used to compare N-bit adders implemented as a chain of full adders. Final remarks and conclusions are reported in Section VII.
II. FULL ADDER TOPOLOGIES ANALYZED
The most significant one-bit full adders suitable for low power dissipation and/or high performance proposed in the literature are briefly reported in this section.
The CMOS full adder, in Fig. 1(a), is a simple implementation of (1)-(2), obtained by noting that $S = ABC_{in} + \overline{C_{out}}\,(A + B + C_{in})$ [8], [9]. The Mirror adder, in Fig. 2(a), is derived from the CMOS adder by connecting the series PMOS transistors directly to the supply, in both the carry and the sum circuits [8], [9]. This is possible because the sum and carry functions are both self-dual, so the pull-up network can mirror the pull-down network. The CPL full adder [15], shown in Fig. 3(a), is made up of NMOS pass transistors and has differential inputs and outputs. Cross-coupled PMOS transistors are introduced to restore the logic levels, thus reducing short-circuit power consumption.
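Two logic properties can be checked by enumeration:

- the CMOS adder builds the sum from the complemented carry, $S = ABC_{in} + \overline{C_{out}}(A + B + C_{in})$;
- the sum and carry functions are self-dual, which is the property that lets a pull-up network have the same topology as the pull-down one.

This is a minimal logic-level sketch with illustrative function names. It says nothing about transistor-level behavior.

```python
from itertools import product


def majority(a: int, b: int, c: int) -> int:
    """Carry output: 1 when at least two inputs are 1 (equivalent to Eq. (2))."""
    return (a & b) | (b & c) | (a & c)


def cmos_sum(a: int, b: int, c: int) -> int:
    """Sum as built in the CMOS adder: S = A B C_in + not(C_out) (A + B + C_in)."""
    cout_n = 1 - majority(a, b, c)
    return (a & b & c) | (cout_n & (a | b | c))


for a, b, c in product((0, 1), repeat=3):
    assert cmos_sum(a, b, c) == a ^ b ^ c          # same as Eq. (1)
    # Self-duality: complementing every input complements each output.
    assert majority(1 - a, 1 - b, 1 - c) == 1 - majority(a, b, c)
    assert cmos_sum(1 - a, 1 - b, 1 - c) == 1 - cmos_sum(a, b, c)
print("CMOS sum identity and self-duality hold")
```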
The LEAP full adder [16], shown in Fig. 4(a), is derived from the CPL. It has fewer transistors because it uses only one NMOS tree for each output, with the complementary output obtained by a simple inverter. Due to the swing degradation at the first inverter's input, the LEAP adder requires a higher minimum supply voltage than the other topologies. The LP full adder [13], shown in Fig. 5(a), has low power consumption because it is based on the low-power XOR and XNOR cells [17].
The TG full adder [8], shown in Fig. 6(a), is based on transmission gates and was introduced for its low power dissipation [14]. As in the case of the LP circuit, cascading TG full adders leads to an overall propagation delay roughly proportional to $N^2$, where $N$ is the number of cascaded full adders. This delay becomes excessive for long chains. The drawback is solved in the TGdrivcap [8], shown in Fig. 7(a), which adds output buffers that interrupt the transmission gate chain when full adders are cascaded.
The Dual-rail Domino circuit [18] is the differential implementation of the traditional dynamic Domino full adder. It overcomes the well-known inability of the latter to implement inverting logic functions within a single gate [19]. Since it exhibits a low delay, it is usually used in high-performance circuits [20]. Its topology in Fig. 8(a) is based on an NMOS pull-down network and a clock-driven PMOS pair that places the gate in precharge or evaluation mode. The output inverters ensure that no races occur, while the weak feedback PMOS transistors reduce the charge-redistribution problem, increasing noise immunity [21].
The 10T full adder proposed in [14] has not been considered since its output voltage levels are severely degraded, and hence it cannot work at low supply voltages. Finally, the other known pass-transistor topologies have not been considered here due to their lower performance compared with the CPL and LEAP.
The full adder topologies can be classified into two categories: circuits with driving capability and circuits without driving capability. In circuits of the former class, inputs and outputs are decoupled, since the path between them includes the gate terminal of one or more transistors. In the latter class, inputs and outputs are not decoupled. Examples are implementations with pass transistors or transmission gates that do not include inverters. In this paper, all the topologies considered except LP and TG have driving capability.
The layout views of the topologies considered are shown in two sets of figures:

- Figs. 1(b)-7(b) show the minimum power designs (i.e., minimum-sized transistors) of the topologies in Figs. 1(a)-7(a), respectively.
- Figs. 1(c)-4(c), 6(c)-7(c), and 8(b) show the circuits designed for optimized power-delay product.

It is worth noting that the LP full adder sized for minimum power consumption also exhibits minimum PDP. Moreover, the Dual-rail Domino circuit designed for minimum power consumption has not been considered because of its high power consumption, which makes it suitable only for high-speed applications.
The transistor count and the area of the circuits analyzed are reported in Table I for the two design strategies, with the values normalized to the smallest circuit in brackets. This table shows that, in full adder circuits designed to minimize power consumption, transistor count is strongly related to silicon area. More specifically, for the process considered, the ratio between area and transistor count has an average value of 16.7 and a standard deviation of 2.7.
III. SIMULATION STRATEGIES
To compare the one-bit full adders' performance, we have evaluated delay and power dissipation by performing simulation runs in a Cadence environment using a 0.35-μm CMOS technology, whose main parameters are reported in Table II, with parasitics extracted from the layout.
The simulations have been performed for different power supply voltages. This allows us to compare how the considered topologies trade speed degradation against power reduction as the supply is lowered. In particular, we considered four values of $V_{DD}$: 3.3 V (the maximum value allowed by the technology), 2.5 V, 1.8 V, and 1.2 V. The last value is slightly lower than the sum of the threshold voltages of the NMOS and PMOS transistors, $V_{Tn} + |V_{Tp}|$.
Each one-bit full adder has been analyzed in terms of propagation delay, average power dissipation, and their product. The propagation delay has been measured as the time interval between two instants: when the input signal reaches 50% of its logic swing, and when the output reaches the same value. For differential inputs and outputs, we took the worse of the values obtained for the output and its complement. The power dissipation has been evaluated by averaging the power flowing into the full adder.
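The 50%-to-50% delay definition above can be applied as a post-processing step on sampled waveforms. The sketch below assumes the waveforms are available as plain Python lists of time and voltage samples (for example, exported from the simulator). The helper names `crossing_time` and `propagation_delay` are hypothetical and not part of any Cadence tool. For differential outputs, both rails are passed in and the worst delay is kept, as in the text.

```python
def crossing_time(t, v, level, rising):
    """First time at which the sampled waveform v(t) crosses `level`
    in the requested direction, found by linear interpolation."""
    for k in range(1, len(v)):
        v0, v1 = v[k - 1], v[k]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            return t[k - 1] + (level - v0) * (t[k] - t[k - 1]) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def propagation_delay(t, v_in, outputs, vdd, in_rising):
    """50%-to-50% delay. `outputs` is a list of (waveform, rising) pairs;
    for differential outputs pass both rails and the worst delay is returned."""
    half = vdd / 2
    t_in = crossing_time(t, v_in, half, in_rising)
    return max(crossing_time(t, v_out, half, out_rising) - t_in
               for v_out, out_rising in outputs)


# Illustrative ramps: the carry output crosses 50% one time unit after the input.
t = [0.0, 1.0, 2.0, 3.0, 4.0]
v_in = [0.0, 0.0, 3.3, 3.3, 3.3]
v_cout = [3.3, 3.3, 3.3, 0.0, 0.0]
print(propagation_delay(t, v_in, [(v_cout, False)], vdd=3.3, in_rising=True))
```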
It is well known that the input waveform significantly affects delay and short-circuit power dissipation. Hence, to avoid underestimating delay and power consumption, we have fed realistic waveforms to the circuit:

- Inputs $A$ and $B$ are driven through two symmetrical inverters inserted between the ideal voltage sources and the input nodes.
- The carry input $C_{in}$ is generated by an identical full adder in propagate mode.

The sum output is loaded with a minimum-sized inverter, and the carry output with the carry input of an identical full adder.
For each supply value, we first carried out the functional verification of each full adder following the strategy in [22], considering all possible input transitions. Moreover, the overall speed of an N-bit adder mainly depends on the carry output delay in propagate mode, $t_p$ [9]. It therefore makes sense to take $t_p$ as the measure of speed performance.
The average power dissipation, $P_{avg}$, has been evaluated by applying a sequence of 1000 random patterns, with 0 and 1 equally probable for each input. To ensure a fair comparison, the pattern frequency has been set low enough for each full adder output to reach the steady state for all considered values of $V_{DD}$ and all transitions. Otherwise, the slowest topologies would have been favored over the fastest ones. Their internal capacitances would not have been charged to the full swing voltage, so their delay and dynamic power contribution would have been underestimated. For the considered process and values of $V_{DD}$, this criterion led us to set the switching frequency to 50 MHz.
Finally, we have evaluated the PDP as the product of $t_p$ and the average dissipated power $P_{avg}$.
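The pattern generation and the two averaged quantities can be sketched as follows. Two assumptions are made here and are not taken from the paper:

- the supply current is available as uniformly spaced samples spanning the whole 1000-pattern sequence;
- only the supply branch is integrated, whereas the paper averages the power flowing into the full adder, which may include contributions through the inputs.

All names are illustrative.

```python
import random


def random_patterns(n: int = 1000, seed: int = 0) -> list:
    """n input vectors (A, B, C_in), each bit 0 or 1 with equal probability."""
    rng = random.Random(seed)
    return [tuple(rng.randint(0, 1) for _ in range(3)) for _ in range(n)]


def average_power(vdd: float, i_dd_samples: list) -> float:
    """P_avg = V_DD * mean(I_DD), assuming uniformly spaced current samples
    covering the whole pattern sequence."""
    return vdd * sum(i_dd_samples) / len(i_dd_samples)


def power_delay_product(t_p: float, p_avg: float) -> float:
    """PDP = t_p * P_avg (an energy: joules if t_p is in s and P_avg in W)."""
    return t_p * p_avg


patterns = random_patterns()
f_clk = 50e6  # pattern rate used in the text
print(f"{len(patterns)} patterns span {len(patterns) / f_clk * 1e6:.1f} us")
```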
IV. MINIMUM POWER SINGLE FULL ADDER PERFORMANCE
Circuits with minimum-sized transistors have been compared in terms of delay, average power, and power-delay product. The Dual-rail Domino has not been considered because, as a dynamic circuit, its power consumption is high relative to its speed performance. Results and comments are reported in the subsections that follow.
A. Delay Comparison
The values of $t_p$ obtained for the considered values of $V_{DD}$ (1.2 V, 1.8 V, 2.5 V, and 3.3 V) are shown in Table III, with the results normalized to the best reported in brackets. To make the comparison easier, we have also plotted $t_p$ of each topology versus $V_{DD}$ in Fig. 9(a). To highlight the speed degradation at low supply voltages, we have also defined a degradation index: $t_p$ normalized to its value at $V_{DD}$ = 3.3 V, plotted versus $V_{DD}$ in Fig. 9(b). The topologies without driving capability (TG and LP) clearly have the smallest $t_p$. However, their speed advantage is greatly reduced at low $V_{DD}$ due to their heavier speed degradation [see Fig. 9(b)]. Moreover, the TG full adder is always faster than LP, especially at high supply voltages.
Apart from TG and LP, the CPL full adder is faster than the others regardless of the supply voltage, and its speed advantage increases at low values of $V_{DD}$. Indeed, its speed degradation is the lowest, as shown in Fig. 9(b). This means that, as $V_{DD}$ is reduced, pass-transistor logic degrades less in speed than traditional CMOS circuits (the CMOS and Mirror adders). This contrasts with [11], where the CPL appeared slower than the Mirror adder and its speed appeared to degrade more at low $V_{DD}$.

In [23] the CPL adder was reported to be slower than the CMOS adder and to dissipate less at low $V_{DD}$. However, the pass-transistor topology considered there did not include the buffer and level-restoring circuitry. Its pass-transistor tree was also slightly different, since the path between carry input and carry output was not topologically optimized. Moreover, the CMOS adder in [23] was implemented inefficiently with logic gates, so the comparison was not realistic.

Along with the CPL topology, the LEAP full adder has a high speed at high $V_{DD}$, being slightly faster than CPL (by 4%). However, its performance quickly degrades at low $V_{DD}$, where it is 62% slower than CPL, and it does not work at $V_{DD}$ = 1.2 V. Thus, it is not suitable for low-voltage circuits. As $V_{DD}$ is lowered, its speed degrades more than that of CPL and traditional CMOS. The reason is that the high input voltage of the inverters is reduced by $V_{Tn}$, so the PMOS swing restorer is activated late. Simulations confirm this: at low $V_{DD}$, the transient in which the PMOS turns on is significantly slower than the other transitions.
The CMOS full adder is only slightly faster than the Mirror full adder (by at most 4%). Both have a significantly higher delay than CPL only at low $V_{DD}$; at high $V_{DD}$ their delay is only 25% greater than that of CPL.
As Fig. 9(b) shows, the circuits without driving capability (TG and LP) suffer from greater speed degradation than those with driving capability. This heavy degradation at low $V_{DD}$ arises because the transmission gate connecting $C_{in}$ and $C_{out}$ behaves very differently depending on whether $V_{DD}$ is greater or lower than $V_{Tn} + |V_{Tp}|$. Assume gate voltages of $V_{DD}$ and 0 for the NMOS and PMOS, respectively. Then, for $V_{DD} > V_{Tn} + |V_{Tp}|$, the transmission gate roughly behaves like a linear resistance proportional to $1/(V_{DD} - V_{Tn} - |V_{Tp}|)$.
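The $1/(V_{DD} - V_{Tn} - |V_{Tp}|)$ dependence follows from summing the linear-region conductances of the two devices:

- NMOS: $k(V_{DD} - V - V_{Tn})$;
- PMOS: $k(V - |V_{Tp}|)$.

Assuming equal transconductance parameters $k$ and neglecting body effect, the sum is $k(V_{DD} - V_{Tn} - |V_{Tp}|)$, independent of the node voltage $V$. The sketch below evaluates the resulting relative resistance for the four supply values of Section III. The threshold values are hypothetical, not those of Table II.

```python
def tg_relative_resistance(vdd: float, vtn: float, vtp_abs: float) -> float:
    """Relative on-resistance of a transmission gate with both devices in the
    linear region: R ~ 1 / (V_DD - V_Tn - |V_Tp|). Returns inf where this
    approximation no longer applies (V_DD <= V_Tn + |V_Tp|)."""
    overdrive = vdd - vtn - vtp_abs
    return float("inf") if overdrive <= 0 else 1.0 / overdrive


VTN, VTP_ABS = 0.6, 0.7  # hypothetical thresholds, not the values of Table II
r_ref = tg_relative_resistance(3.3, VTN, VTP_ABS)
for vdd in (3.3, 2.5, 1.8, 1.2):
    ratio = tg_relative_resistance(vdd, VTN, VTP_ABS) / r_ref
    print(f"V_DD = {vdd} V: R / R(3.3 V) = {ratio:.2f}")
```

With these hypothetical thresholds, the resistance roughly quadruples between 3.3 V and 1.8 V. At 1.2 V, where $V_{DD} < V_{Tn} + |V_{Tp}|$, the linear-resistance approximation breaks down altogether.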
B. Power Dissipation Comparison
The average power dissipation evaluated under different supply voltages is summarized in Table IV and plotted in Fig. 10. The CPL adder always has the highest dissipation, so the CPL topology should not be used when the primary target is low consumption. This confirms that, for a given value of $V_{DD}$, the CPL full adder does not have a low power consumption [11], even though the literature has considered CPL a low-power logic style [15], [16], [23]-[27]. The LEAP adder has a lower power dissipation, which decreases rapidly at low $V_{DD}$. More specifically, its power saving with respect to CPL is 30% at high $V_{DD}$ and 40% at low $V_{DD}$. The topologies without driving capability always dissipate less than the others. Hence, the TG and LP circuits are suitable for low-power adders at every value of $V_{DD}$, and between them the TG adder has a stronger advantage at high supply voltages.
The Mirror adder dissipates slightly less than the CMOS adder (by at most 5%), and both have the lowest power dissipation after TG and LP. More specifically, at both high and low $V_{DD}$ the Mirror adder consumes at most 25% more than the best value, and at intermediate $V_{DD}$ this difference is smaller. Hence, the Mirror and CMOS adders are also suitable for low-power digital circuits.
The TGdrivcap always consumes significantly more than CMOS and Mirror. In particular, depending on $V_{DD}$, it dissipates between 34% and 54% more than the Mirror adder. Hence, it is not suitable for low-power adders.
The results summarized in Table IV show that power dissipation decreases more rapidly than the expected $V_{DD}^2$. This is because the dynamic power contribution scales as $V_{DD}^2$, while the short-circuit contribution scales even more quickly.
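The sketch below illustrates why the total falls faster than $V_{DD}^2$. It compares the $V_{DD}^2$ scaling of dynamic power with Veendrick's classic short-circuit estimate for a symmetric, unloaded inverter, $P_{sc} = (\beta/12)(V_{DD} - 2V_T)^3\,\tau/T$. This model is swapped in for illustration and is not the paper's analysis. It holds the input edge time $\tau$ fixed, and the threshold value is hypothetical.

```python
def dynamic_power(vdd: float, c_eff: float, f: float) -> float:
    """Switching power C_eff * V_DD^2 * f."""
    return c_eff * vdd ** 2 * f


def short_circuit_power(vdd: float, beta: float, vt: float,
                        t_edge: float, period: float) -> float:
    """Veendrick's estimate for an unloaded symmetric inverter:
    (beta / 12) * (V_DD - 2 V_T)^3 * t_edge / period."""
    overdrive = max(vdd - 2 * vt, 0.0)
    return beta / 12 * overdrive ** 3 * t_edge / period


VT = 0.6  # hypothetical threshold, with V_Tn = |V_Tp| assumed
for vdd in (2.5, 1.8, 1.2):
    dyn = dynamic_power(vdd, 1.0, 1.0) / dynamic_power(3.3, 1.0, 1.0)
    sc = (short_circuit_power(vdd, 1.0, VT, 1.0, 1.0)
          / short_circuit_power(3.3, 1.0, VT, 1.0, 1.0))
    print(f"V_DD = {vdd} V: dynamic x{dyn:.3f}, short-circuit x{sc:.3f}")
```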
C. PDP
The PDP is a quantitative measure of the efficiency of the tradeoff between power dissipation and speed, and is particularly important when low-power operation is needed.
The PDP values, computed from Tables III and IV for the different supply voltages, are summarized in Table V and plotted in Fig. 11(a). The PDP normalized to its value at a reference $V_{DD}$ is plotted in Fig. 11(b). As expected, the PDP has a flat minimum at intermediate values of $V_{DD}$. Apart from the LEAP adder, the minima are located at around the same $V_{DD}$ for all the topologies considered. For the traditional CMOS topologies (i.e., the CMOS and Mirror adders), this confirms the result stated in [5] that the optimum PDP lies at $V_{DD}$ about equal to $3V_T$. In [5], the PMOS and NMOS threshold voltages were assumed equal.
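A first-order model shows where such an optimum comes from; the model is assumed here and not taken from the paper. With power evaluated at a fixed frequency, $P_{avg} \propto C V_{DD}^2 f$. With the alpha-power delay model, $t_p \propto C V_{DD}/(V_{DD} - V_T)^{\alpha}$. The PDP is then proportional to $V_{DD}^3/(V_{DD} - V_T)^{\alpha}$. Setting the derivative of its logarithm to zero, $3/V_{DD} = \alpha/(V_{DD} - V_T)$, gives $V_{DD} = 3V_T/(3 - \alpha)$. This reduces to $3V_T$ for the square-law case, $\alpha = 2$. The Python sketch below confirms the closed form numerically for a hypothetical threshold.

```python
def pdp_model(vdd: float, vt: float, alpha: float = 2.0) -> float:
    """PDP up to a constant: V_DD^2 (power at fixed f) times V_DD / (V_DD - V_T)^alpha (delay)."""
    return vdd ** 3 / (vdd - vt) ** alpha


def optimum_vdd(vt: float, alpha: float = 2.0, iters: int = 200) -> float:
    """Ternary search for the PDP minimum on (V_T, 10 V_T]; for alpha < 3 the
    model grows without bound at both ends and has a single minimum there."""
    lo, hi = vt * 1.0001, vt * 10.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if pdp_model(m1, vt, alpha) < pdp_model(m2, vt, alpha):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


vt = 0.5  # hypothetical threshold voltage
print(optimum_vdd(vt), 3 * vt)                         # square law: 3 V_T
print(optimum_vdd(vt, alpha=1.5), 3 * vt / (3 - 1.5))  # alpha-power law
```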
Table V shows that the circuits without driving capability have the lowest PDP. Among those with driving capability, the Mirror and CMOS adders are the best, with a large advantage over the others (except the LEAP). Hence, the TG, LP, CMOS, and Mirror adders are the most suitable topologies for low-power, high-performance circuits. They are also attractive when $V_{DD}$ can be lowered, trading speed for power dissipation.
TABLE V
The CPL adder has a high PDP (roughly 30% more than CMOS and Mirror adders), hence it is not convenient to use it in low-power circuits.
The LEAP adder has a favorable PDP only at high $V_{DD}$, and its PDP degrades significantly as the power supply is reduced. More specifically, at high $V_{DD}$ its PDP is 15% lower than that of the Mirror adder, while at low $V_{DD}$ it is 26% higher.
V. MINIMUM-PDP ONE-BIT FULL ADDER PERFORMANCE
By optimizing the transistor sizes of the full adders considered, the delay of all topologies can be reduced without significantly increasing power consumption, and transistor sizes can be chosen to achieve minimum PDP. In particular, we have redesigned the considered topologies to minimize PDP, iterating the design after postlayout simulations. Only for the LP topology was it found that minimum-sized transistors also guarantee minimum PDP.
Comparison of single full adders designed to achieve minimum PDP is discussed below. In particular, the following three subsections refer to delay, average power, and PDP, respectively. 
A. Delay Comparison
Fig. 12(a) shows the percentage delay reduction after PDP minimization versus $V_{DD}$. The optimization benefits the LEAP full adder less than the others (at most 20%), and the same holds for the TG adder at medium to high values of $V_{DD}$. The other topologies exhibit a greater delay reduction, on the order of 40%-50%. Fig. 12(a) also shows that all the topologies except LEAP have a lower speed degradation at low $V_{DD}$ than the corresponding minimum-transistor circuits. Transistor optimization thus makes circuit performance less sensitive to supply voltage scaling, especially for the TG topology. Hence, the degradation index of all the topologies optimized for minimum PDP, shown in Fig. 13(b), is always more favorable than with minimum transistors, especially at low $V_{DD}$. The reduction is within 20%, except for the TG full adder, whose reduction is about 50%.
The values of $t_p$ obtained for the considered values of $V_{DD}$ are shown in Table VI and plotted versus $V_{DD}$ in Fig. 13(a). As detailed below, many of the observations on delay and speed degradation made for minimum sizing remain valid for nonminimum sizing.

As with minimum transistor sizing, the topologies without driving capability (TG and LP) have the smallest $t_p$, and their speed advantage strongly decreases at low $V_{DD}$. However, this reduction is almost halved for the optimized TG full adder compared with the minimum-transistor circuit. The TG full adder's advantage over the LP is therefore stronger when designed for minimum PDP, especially at low values of $V_{DD}$.

Among the circuits with driving capability, the CPL is confirmed to be the fastest, especially at low values of $V_{DD}$. Compared with minimum transistor sizing, its advantage over the other topologies with driving capability is stronger at high $V_{DD}$ and unchanged at low $V_{DD}$. Moreover, with sizing for minimum PDP, the delay of CPL comes closer to that of the TG/LP topologies, with one exception among the supply voltages considered. Therefore, as already highlighted in Section IV, the CPL is suitable for high-performance arithmetic circuits, especially with the design strategy that minimizes PDP.
Unlike the minimum-transistor case, the CMOS is slower than the Mirror full adder, by roughly 15% (slightly more at one of the supply values), due to its higher parasitic capacitances. Moreover, the Mirror adder exhibits a delay roughly 25% greater than that of CPL for all values of $V_{DD}$. The optimized LEAP full adder is not fast compared with the other topologies. It is always slower than the Mirror adder, by 15% to 100% for $V_{DD}$ ranging from 3.3 V to 1.8 V. In contrast with [11], where LEAP was simulated without parasitics, the LEAP circuit is also always slower than the CPL, by 45% to 150% over the same $V_{DD}$ range. The TGdrivcap does not offer any advantage over the Mirror adder, since it is always slower.
As expected, the dynamic Dual-rail Domino full adder proves to be very fast. Its delay is lower than that of the CPL, by 10% at low $V_{DD}$ and by 40% at high $V_{DD}$, and close to that of the TG topology at low $V_{DD}$. Hence, it is particularly suitable for high-performance arithmetic circuits.
B. Power Dissipation Comparison
As shown in Fig. 12(b) , although the power dissipation increases with respect to the cases in Section IV, many of the considerations highlighted in the case of minimum transistor size still hold. More specifically, as reported in Table VII and plotted in Fig. 14 , the CPL adder dissipation is the highest after the Dual-rail Domino. Moreover, as expected, the Dual-rail Domino full adder has a power consumption much higher than CPL, by a factor of three, and greater than that of Mirror adder by a factor of five (of this amount, 15% is consumed in the clock input). 
TABLE VII
The power dissipation of LEAP full adder is about half of that of CPL, and quite close to that of the Mirror adder.
Among circuits without driving capability, the LP has the lowest power dissipation, simply because its optimized version uses minimum transistors. In contrast, the TG full adder optimized for minimum PDP no longer dissipates little power compared with the other circuits. More specifically, the CMOS and Mirror power dissipation is comparable to or even lower than that of the TG topology, and the CMOS typically dissipates 7% less than the Mirror.
The power wasted by the TGdrivcap is confirmed to be greater than that of CMOS and Mirror by about 50%.
C. PDP
Fig. 12(c) shows the PDP reduction achieved by sizing for minimum PDP, relative to minimum-sized transistors. Except for LEAP, and for TG at medium and high $V_{DD}$, the reduction is about 30%-40%. The PDP values obtained from Tables VI and VII are summarized in Table VIII and plotted in Fig. 15(a). The PDP normalized to its value at a reference $V_{DD}$ is plotted in Fig. 15(b).
Table VIII shows that the circuits without driving capability still have the lowest PDP. However, at medium and high $V_{DD}$ their advantage is significantly smaller than with minimum sizing. Among circuits with driving capability, the Mirror and CMOS adders are confirmed to be the most efficient, and their advantage over the others is greater than with minimum transistors. The CPL and the Dual-rail Domino circuits have a PDP higher than that of the Mirror adder by 30% and 400%, respectively. They should therefore be used only when high performance is mandatory, especially the latter. The LEAP and TGdrivcap adders have a higher PDP than the Mirror adder, especially at low $V_{DD}$. Since they also offer no significant speed advantage, they are convenient in neither high-performance nor low-power circuits.
VI. N-BIT ADDER CIRCUITS PERFORMANCE
In the previous sections, we analyzed the performance of a one-bit full adder to understand the potential of each topology independently of the architecture adopted. In practice, most adder circuits are based on chains of full adders [8], [9], [28]. In this section we analyze the performance of the topologies considered when they implement chains with different numbers of cascaded full adders.
A full adder chain does not allow a fair comparison of power consumption, so the most correct way to compare the dissipation of full adder topologies is to consider a single full adder. Indeed, evaluating power dissipation on a full adder chain favors the slowest circuits, because they are less affected by glitching than faster ones [9]. The glitch power also depends on the number of cascaded full adders and on the specific architecture. Hence, simulating a chain does not give a good estimate of the intrinsic properties of each circuit. For this reason, in this section we only consider the speed performance of cascaded full adders.
A. Full Adder Chain Performance
Consider a chain of full adders implementing an N-bit adder. Its critical path delay (through the carry input and output nodes) grows with the number of bits in very different ways, depending on the design style of the one-bit full adder. In particular, the topologies with and without driving capability have to be considered separately.
For circuits in the first category, the propagation delay of the full adder chain is simply the sum of the propagation delays of the full adders in propagate mode:
$$t_{p,N}^{D} = N\,t_p^{D} \qquad (3)$$
where the superscript D denotes full adders with driving capability and $t_p^{D}$ is the one-bit propagation delay.

In contrast, consider the circuits without driving capability (i.e., the TG and LP full adders), denoted by the superscript ND. Here the critical path from the input to the output of the chain is an uninterrupted chain of transmission gates. It can therefore be represented as a linear RC ladder network, as shown in Fig. 16, where $R$ is the equivalent linear resistance of a transmission gate and $C$ the equivalent capacitance at each node. Approximate the RC network with a single-pole behavior [29], [30], whose time constant is $\tau = RC\,N(N+1)/2$, and assume a step carry input signal. The propagation delay is then
$$t_{p,N}^{ND} = 0.69\,RC\,\frac{N(N+1)}{2} \qquad (4)$$
Note that (4) gives $t_p^{ND} = 0.69\,RC$ for a one-bit full adder without driving capability.
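Equations (3) and (4) can be sketched directly. Node $i$ of a uniform ladder shares $i$ series resistances with the path to the output. The Elmore time constant at the last node is therefore $\sum_{i=1}^{N} C \cdot iR = RC\,N(N+1)/2$. A single-pole step response then crosses 50% at $\ln 2 \cdot \tau \approx 0.69\tau$. The function names are illustrative.

```python
import math


def elmore_ladder_tau(n: int, r: float, c: float) -> float:
    """Elmore time constant at the last node of an n-stage uniform RC ladder
    (series R between nodes, C from each node to ground): node i sees i*R upstream."""
    return sum(c * (i * r) for i in range(1, n + 1))


def chain_delay_no_drive(n: int, r: float, c: float) -> float:
    """Eq. (4): 50% delay of a single-pole step response, ln(2) * tau (about 0.69 tau)."""
    return math.log(2) * elmore_ladder_tau(n, r, c)


def chain_delay_drive(n: int, t_p_one_bit: float) -> float:
    """Eq. (3): with driving capability the stage delays simply add."""
    return n * t_p_one_bit


for n in (1, 2, 4, 8):
    assert math.isclose(elmore_ladder_tau(n, 1.0, 1.0), n * (n + 1) / 2)
    print(n, chain_delay_drive(n, 1.0), round(chain_delay_no_drive(n, 1.0, 1.0), 3))
```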
The chain propagation delay depends strongly on the number of bits, $N$, and the dependence differs between implementations with and without driving capability. A comparison limited to a single full adder is therefore not realistic. This is shown by the ratio between (4) and (3):
$$\frac{t_{p,N}^{ND}}{t_{p,N}^{D}} = \frac{N+1}{2}\,\frac{t_p^{ND}}{t_p^{D}} \qquad (5)$$
where $t_p^{D}$ and $t_p^{ND}$ denote the one-bit delays of full adders with and without driving capability, respectively. According to (5), the speed advantage of the TG and LP topologies shown in the previous sections is too optimistic, especially for high values of $N$.
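Ratio (5) makes the crossover explicit. Let $r = t_p^{ND}/t_p^{D}$ be the one-bit delay ratio. The unbuffered chain is faster only while $(N+1)r/2 < 1$, i.e., $N < 2/r - 1$. The sketch below finds the largest such $N$ by direct search. The ratios in the loop are hypothetical, not values from Tables III or VI.

```python
def delay_ratio(n: int, r: float) -> float:
    """Eq. (5): t_p,N(no drive) / t_p,N(drive) = (N + 1) / 2 * r,
    with r = one-bit delay ratio t_p(no drive) / t_p(drive)."""
    return (n + 1) / 2 * r


def max_bits_with_advantage(r: float, n_max: int = 64) -> int:
    """Largest chain length N for which the unbuffered chain is still faster
    (0 if even a single full adder has no advantage)."""
    n = 0
    while n < n_max and delay_ratio(n + 1, r) < 1.0:
        n += 1
    return n


for r in (0.25, 0.5, 0.8):  # hypothetical one-bit delay ratios
    print(f"r = {r}: advantage up to N = {max_bits_with_advantage(r)}")
```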
To validate the theoretical results on the speed performance of an N-bit adder, simulations versus the number of bits have been performed at different power supplies, using circuits with minimum transistors. The results are plotted in Fig. 17. The number of bits ranges from one to eight, and Fig. 17(a)-(d) refer to $V_{DD}$ equal to 1.2 V, 1.8 V, 2.5 V, and 3.3 V, respectively. The simulations show a linear dependence of the delay on $N$ for full adders with driving capability and a quadratic dependence for those without. In agreement with the theory, the advantage of the one-bit LP and TG full adders is lost as the number of bits increases. It holds only for a small number of bits, which decreases as the power supply is lowered. Moreover, at a given power supply, the speed advantage of TG over LP is roughly independent of $N$.
The scenario is even worse for the implementations without driving capability. To evaluate the speed performance of a whole system, we must account for the fact that these circuits drive other circuits. We should therefore consider not only the propagation delay but also the shape of the output node response. For example, compare the carry output waveforms of the full adders in an 8-bit chain of Mirror adders, plotted in Fig. 18, with those of the TG implementation, plotted in Fig. 19. Clearly, speed performance is also strongly affected by the rise time.
B. A Novel Parameter to Evaluate Full Adder Speed
The simple propagation delay metric gives only limited information on full adder speed performance. To overcome this, we introduce a novel figure of merit that includes both the propagation delay $t_p$ and the rise/fall time $t_r$. The former can be thought of as the delay that signals encounter in crossing the circuit, without taking into account the shape of the output signal (i.e., the time it takes to go from one state to the other). The latter gives information only on the dynamics of the output, without any link to the input signals.
The novel figure of merit, $T$, is defined in (6) from $t_p$ and $t_r$. It represents the time the output needs to almost reach its steady-state value, measured from the instant at which the input signal crosses the logic threshold.

For a full adder with driving capability (superscript D), the rise/fall time is independent of the number of bits. It is therefore equal to that of a one-bit full adder. It can be approximated by assuming that the output capacitance is charged/discharged by a constant current source (i.e., a linear charge/discharge). For this class of circuits, the propagation delay can be approximated as 50% of a linear charge/discharge of the output node [9]. The rise/fall time is therefore proportional to the one-bit propagation delay, $t_r^{D} \propto t_p^{D}$ (7).

In contrast, for a full adder without driving capability (superscript ND), the output waveform is exponential, quite close to the response of a one-pole transfer function. The rise/fall time is then linearly related to the propagation delay of the chain, $t_{r,N}^{ND} \propto t_{p,N}^{ND}$ (8), and so depends on the number of bits.

To compare the two classes with the proposed figure of merit, the ratio between their values of $T$ is evaluated in (9). As expected, this ratio shows that the real advantage of the implementations without driving capability holds for fewer bits than indicated by (5). A detailed comparison shows that this number of bits is at least halved with respect to the simple comparison based on propagation delay.
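The proportionality constants behind (7) and (8) depend on how the rise/fall time is defined. The text does not state which definition is used; assume the common 10%-90% definition. A linear ramp whose 50% point defines $t_p$ then gives $t_r = 1.6\,t_p$, while a single-pole step response gives $t_r = (\ln 9/\ln 2)\,t_p \approx 3.2\,t_p$. The sketch below computes both ratios. It illustrates the proportionality only and is not a reconstruction of the paper's equations.

```python
import math


def linear_rise_over_delay(lo: float = 0.1, hi: float = 0.9) -> float:
    """Linear ramp v = t / T_ramp with t_p = T_ramp / 2: t_r / t_p = (hi - lo) / 0.5."""
    return (hi - lo) / 0.5


def single_pole_rise_over_delay(lo: float = 0.1, hi: float = 0.9) -> float:
    """Step response v = 1 - exp(-t / tau): time to reach fraction x is -tau * ln(1 - x)."""
    def time_to(x: float) -> float:
        return -math.log(1.0 - x)  # in units of tau
    return (time_to(hi) - time_to(lo)) / time_to(0.5)


print(f"linear ramp : t_r = {linear_rise_over_delay():.2f} t_p")
print(f"single pole : t_r = {single_pole_rise_over_delay():.2f} t_p")
```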
VII. FINAL REMARKS AND CONCLUSIONS
In this paper we have compared the most suitable full-adder topologies, including recently proposed ones. The information obtained is useful in the early phases of adder design, since architectural optimization techniques rely on knowledge of the full-adder cell used.
The full-adder cells have been simulated both as single circuits and in chains, to capture both their intrinsic properties and their performance in actual adder circuits. Moreover, the parasitics extracted from layout have also been considered. The paper includes the most promising recently proposed topologies. For the well-known topologies, the results in some cases differ from those previously published in the literature, where layout was often not considered. The comparison has been carried out both for circuits with minimum transistor size, to minimize power consumption, and for circuits with transistors sized to optimize the power-delay product.
The analysis has shown that the one-bit full adders without driving capability (TG and LP) are the fastest and have fairly low power dissipation. Hence they are suitable for low-power adders, even though their advantage decreases strongly as the supply voltage is reduced. Among the one-bit full adders with driving capability, the Dual-rail Domino has a very high power dissipation, greater than that of the CMOS by a factor ranging from five (at low supply voltage) to eight (at high supply voltage). However, it exhibits the lowest delay, close to that of TG and LP. Hence, the Dual-rail Domino circuit is suitable only for arithmetic circuits where no compromise on performance is allowed. The CPL topology offers a more reasonable trade-off between power and delay for high-performance circuits: its delay is lower than that of the Mirror adder, but its power penalty exceeds its speed improvement. The TGdrivcap topology provides no advantage in terms of either power or speed, and it should therefore never be used. The LEAP topology offers only a small speed advantage at high supply voltage when transistors are minimum-sized; thus, in general, it is not a good choice.
We have also shown that the advantage of the topologies without driving capability is lost in a chain of one-bit full adders as the number of blocks in the chain increases. For example, consider the TG and the CMOS topologies with minimum transistors, which are the fastest and the slowest, respectively. The ratio of their propagation delays goes from about one-fifth with a 3.3-V power supply to about one-half with a 1.2-V power supply. Hence, according to (5), which considers only the propagation delay of the chain, the advantage of the TG is maintained up to a chain of 10 blocks (i.e., 10 bits) at 3.3 V and 4 blocks (i.e., 4 bits) at 1.2 V. Even this count is optimistic. Indeed, using the proposed figure of merit, which leads to (9), we find that the advantage of the TG is maintained only for chains of up to 5 and 2 bits at supply voltages of 3.3 V and 1.2 V, respectively. Thus, at low supply voltage the advantage of the topologies without driving capability is almost completely lost. It is worth noting that the ratio between the power consumption of the topologies without and with driving capability remains almost constant as the number of bits increases. Hence the better efficiency of the TG and LP one-bit full adders, shown by the evaluated PDP, decreases in a multi-bit chain in the same way as their speed advantage does.
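The break-even chain lengths quoted above can be reproduced with a simple model. As an assumption (the exact form of (5) is given earlier in the paper), we take the delay of a chain of N blocks without driving capability to grow quadratically, like the Elmore delay of an RC ladder, approximately t_TG·N²/2 for large N. We take the delay of a chain of N blocks with driving capability to grow linearly, as t_CMOS·N. Equating the two gives N = 2·t_CMOS/t_TG. The function name below is hypothetical.

```python
def break_even_bits(delay_ratio_passive_to_driving: float) -> float:
    """Chain length at which a chain without driving capability stops being faster.

    Assumed model (a sketch, not necessarily (5) verbatim):
      delay_passive(N) ~ t_passive * N**2 / 2   (Elmore delay of an RC ladder, large N)
      delay_driving(N) = t_driving * N          (each block drives its own output)
    Equating the two gives N = 2 * t_driving / t_passive.
    """
    return 2.0 / delay_ratio_passive_to_driving


if __name__ == "__main__":
    # Delay ratios t_TG / t_CMOS quoted in the text (minimum-size transistors).
    for vdd, ratio in ((3.3, 1 / 5), (1.2, 1 / 2)):
        print(f"VDD = {vdd} V: break-even at about {break_even_bits(ratio):.0f} bits")
```

With the ratios of one-fifth and one-half, the model gives 10 and 4 bits. Keeping the exact Elmore sum N(N+1)/2 instead of its large-N approximation would lower each count by one bit, which indicates how approximate these break-even figures are.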
In conclusion, except for short chains of blocks or for cases in which minimum power consumption is required, the topologies without driving capability are not attractive. Among the solutions with driving capability, the most interesting implementations are the traditional Mirror and the CMOS (even though the latter is worse in speed and PDP). They have roughly the same performance and require an energy comparable to that of the TG and LP circuits. If the fastest solution is required, the Dual-rail Domino should be chosen or, to save power, the CPL.
