This paper presents the design, analysis and complete characterization of a novel split-path Data Driven Dynamic (sp-D3L) full adder cell in IBM's 65 nm CMOS process. The split-path D3L design style, derived from standard D3L, allows the design of high speed dynamic circuits without the power overhead of the clock tree, while providing significantly higher performance than D3L due to reduced capacitance at the pre-charge node. To demonstrate the performance benefits of the new split-path dynamic approach, we present a comparison of the proposed adder with conventional static and dynamic adder cells. All the adder circuits were characterized for speed, power, area, noise margins, supply voltage scaling and fan-out capability. To evaluate the combined impact of the load driven by the adder and the load presented by the adder to the driving circuit, a combined fan-in/fan-out analysis with varying loads was also performed. Monte Carlo simulations were performed to evaluate the reliability of the adder design against random process, voltage and temperature variations. To compare with the state of the art, we also compared the proposed adder with several low power and high performance adders recently proposed in literature. Furthermore, to simulate the behavior of the adder in data path elements, we built ripple carry adders of varying lengths using the proposed adder. The new design was found to achieve a 16% to 27% performance advantage over its static and dynamic counterparts at nominal supply voltage. With supply voltage scaled from 1 V to 0.8 V, the adder shows 12%, 34% and 39% PDP advantage over domino, static and conventional D3L designs respectively. Fan-out analysis showed the adder to perform with 11% to 41% better PDP than the others at worst case FO32 loading.
INTRODUCTION
Addition forms the basis for many processing operations, from counting to multiplication to filtering. As a result, adder circuits are of great interest to digital system designers. An extensive assortment of adder circuits and architectures has been proposed in literature to meet various performance targets in the design of data path sub-systems and processor control structures. Most high performance circuits, both general purpose processors as well as application specific architectures employ several arithmetic circuits. Since arithmetic circuits often form the critical components in a data path sub-system, the overall performance and throughput of these systems depends on the speed and power efficiency of the critical arithmetic components.
The full adder is the building block for several high performance binary adders. Hence, enhancing the performance of the 1-bit full adder has always attracted a great amount of interest in research circles. Several circuit solutions have been proposed in literature to implement the full adder functions in different design styles. As with every other digital circuit, delay, power and area have long been the primary considerations in full-adder design, and over the years several new adder designs have been proposed at the circuit level to achieve the optimum trade-off between them. [1] [2] [3] [4] [5] Besides these, several interesting combinations of logic and circuit design have been proposed to optimize full adder circuits. One of the most popular strategies is the design of adder circuits using majority functions. Refs. [6] [7] [8] [9] present the design of full adder cells based on majority function implementation. In Ref. [10] Navi et al. combine the majority function approach with a special circuit style called "bridge implementation" to achieve high speed with reduced power consumption, increased area density and layout symmetry.
Another approach to implementing the full adder SUM and CARRY functions is to use different static and dynamic design styles. While static design styles provide robust operation with low power consumption and reasonable speed, dynamic design implementations allow the realization of high performance adder circuits. Domino [11] is one of the most popular dynamic design styles, providing a high performance, low area implementation, but it comes with the additional cost of high power consumption due to the clock distribution network. Domino designs also suffer from the inability to provide inverting functions, which is a drawback considering that full-adder circuits are most commonly used in cascaded carry-linked configurations. Solutions such as dual-rail dynamic designs [12] [13] and np-CMOS implementation [14] have been suggested to tackle this issue. However, each of these design styles is still plagued by the additional power dissipation associated with clock tree design. As a solution, Rafati et al. [15] [16] [17] proposed data pre-charged or data driven dynamic logic (D3L), where the pre-charge and evaluate functions are achieved through the selection of an appropriate combination of the input signals of the circuit. D3L and dual-rail D3L [18] show significant improvements in power-delay trade-offs compared to domino, np-CMOS and dual-rail domino. Frustaci et al. [19] recently presented a new method of realizing D3L functions by splitting the pre-charge paths. This approach allows a lower number of transistors in the pre-charge path, thereby speeding up the operation of the circuit. The split-path D3L was shown to provide 25-34% better energy delay product compared to domino and traditional D3L implementations when used in the implementation of a multiplier.
In this paper, we describe the complete design and characterization of a new hybrid full adder topology. The adder is derived from the Data Driven Dynamic Logic design methodology. The proposed adder circuit provides the speed advantages of dynamic design styles without the additional power consumption associated with the clock distribution network, yielding excellent performance metrics in terms of speed, area, power, reliability and driving strength. The new split-path pre-charge network employed in the adder provides further improvement in speed over the conventional D3L design. An important observation upon studying the numerous full adder designs proposed in literature is the severity of the trade-offs involved. As an example, the standard 28 transistor static CMOS full adder cell provides robust SUM and CARRY evaluations but is weaker in power consumption and area. While most designs focus on optimizing speed or reducing power, several other parameters such as drivability, fan-out capability, noise margins, and robustness to process variations are often overlooked. We consider an ideal adder to be one which performs reasonably well in most of these areas along with the traditional speed, power and area criteria. Especially as we move towards more advanced technology nodes, 65 nm and beyond, it becomes particularly important to characterize the circuits' behavior under random process and mismatch variations. Hence, to give the reader a complete picture of the trade-offs involved in the design, we implemented the adder in IBM's 65 nm technology node and characterized the design not only for speed, power, and area, but also for fan-in, fan-out and, more importantly, tolerance to process, voltage and temperature variations.
Due to the wide range of adder circuits presented in literature, [1] [2] [3] [4] [5] [6] [7] [8] [9] it becomes difficult to compare the performance of a new design with all published results. As a result, we present a comparison of our adder cell with other adders in two stages. First, we present a comparison of our design with standard CMOS, domino and D3L adders in terms of power delay product, fan-in, fan-out and process variation tolerance. This comparison essentially serves as a benchmark for adder performance in all the above mentioned areas. Next, we present a comparison with recently proposed high speed as well as low power adder designs.
The rest of the paper is organized as follows. Section 2 briefly describes the basic D3L design approach which forms the basis of the proposed design cell. Section 3 presents the implementation of the proposed full adder cell. The performance characterization of the full adder is presented in Section 4. Section 5 presents the statistical analysis of the circuits to characterize reliability against process variations. Finally, some concluding remarks follow in Section 6.
DATA DRIVEN DYNAMIC LOGIC: AN OVERVIEW
Dynamic circuit design styles increased in popularity as low area, high speed alternatives to standard CMOS. Unlike standard CMOS, which requires implementation of the function using both PMOS and NMOS networks, dynamic styles rely on the use of fast NMOS-only realizations of functions. Most dynamic circuit families use only a single PMOS transistor, commonly known as the pre-charge transistor, to pre-charge the output node, and an NMOS evaluate transistor in the Pull-Down Network (PDN) to pull the output node to 0 during evaluation. Thus, dynamic circuits allow the implementation of an N input function using only N + 2 transistors, in contrast to the 2N transistors required by standard CMOS. This not only results in a clear advantage in terms of area, but also reduces the logical and electrical effort on the driver circuit, thereby also offering relaxed transistor sizing constraints. Moreover, due to the elimination of a full PMOS Pull-Up Network (PUN), the effective capacitance driven at the output is reduced significantly, allowing significant improvements in the speed of the circuit. An important trade-off in dynamic circuits, however, is the large amount of power consumed by the clock signal. As technology scales, the supported clock frequencies have already entered the GHz range, and clock distribution networks account for 40-60% of the total power consumption in the circuit. [20] The use of a heavily loaded clock signal also implies a careful design and distribution of the clock tree. [21] In our previous work in Ref. [22] we found that the clock tree accounted for almost 39% of the total power consumption of an 8 bit integer data path implemented using the domino design style.
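The N + 2 versus 2N transistor counts quoted above can be tabulated with a short sketch. The function names are illustrative only, and the counts cover just the gate itself, without any output inverter or keeper.

```python
def static_cmos_transistors(n_inputs: int) -> int:
    """Complementary static CMOS: one NMOS and one PMOS per input (2N)."""
    return 2 * n_inputs


def dynamic_transistors(n_inputs: int) -> int:
    """Clocked dynamic gate: N NMOS devices in the pull-down network plus
    one pre-charge PMOS and one evaluate NMOS (N + 2)."""
    return n_inputs + 2


# Compare the two counts for a few fan-in values
for n in (2, 3, 4, 8):
    print(f"N = {n}: static = {static_cmos_transistors(n)}, "
          f"dynamic = {dynamic_transistors(n)}")
```

For N = 2 the two counts are equal, so the area advantage of the dynamic style only appears for gates with three or more inputs.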
Rafati et al. proposed a dynamic design style called Data Driven Dynamic Logic (D3L). [15] [16] [17] This design style works in an alternating pre-charge and evaluate fashion similar to traditional domino or np-CMOS design styles. The main difference, however, is that unlike other dynamic design styles, D3L employs a suitable combination of the input signals to produce the alternating pre-charge and evaluate behavior. The circuit is designed so that during the pre-charge phase the PDN is completely cut off with no path to ground, thus charging the output node to VDD. During the evaluate phase, depending on the input signal combination, the output node evaluates and discharges to ground or maintains its pre-charged value. Figure 1 shows the basic D3L design methodology. Figure 2 shows basic 2-input AND, OR and XOR gates implemented in D3L. A minimum sized keeper may be employed to restore the charge at the output node. This design style offers the high speed advantages of dynamic design without the additional power overhead of the clock network, at the cost of slightly higher area. Recently, we demonstrated the application of this D3L methodology in the design of high throughput reconfigurable data paths. [22] [23] In 90 nm CMOS technology, D3L outperformed static and domino design styles by over 35% and 12% respectively while consuming 29% less power than domino implementations. The investigation clearly established the design style as an attractive alternative approach to dynamic circuit design. Given these advantages, it is natural to explore the design of the full adder in this design style.
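The alternating pre-charge and evaluate behavior described above can be sketched as a small behavioral model. This is a logic-level illustration only: the `D3LNode` class is hypothetical, and the choice of input A as the pre-charge signal of the 2-input AND gate is an assumption rather than a reading of Figure 2.

```python
from typing import Callable, Sequence


class D3LNode:
    """Behavioral model of one data-pre-charged dynamic node (hypothetical)."""

    def __init__(self,
                 precharge: Callable[[Sequence[int]], bool],
                 pulldown: Callable[[Sequence[int]], bool]) -> None:
        self.precharge = precharge  # True when the pre-charge PMOS path conducts
        self.pulldown = pulldown    # True when the NMOS pull-down network conducts
        self.node = 1               # dynamic node, assumed pre-charged at start

    def apply(self, inputs: Sequence[int]) -> int:
        if self.precharge(inputs):
            self.node = 1           # node charged to VDD
        elif self.pulldown(inputs):
            self.node = 0           # node discharged to ground
        # otherwise neither path conducts and the keeper holds the old value
        return 1 - self.node        # output taken through an inverter (assumed)


# Assumed 2-input AND: a PMOS gated by A pre-charges the node,
# and a series NMOS pair A-B discharges it.
and_gate = D3LNode(precharge=lambda x: x[0] == 0,
                   pulldown=lambda x: x[0] == 1 and x[1] == 1)

for a, b in [(0, 0), (1, 1), (0, 0), (1, 0), (0, 1)]:
    print((a, b), and_gate.apply((a, b)))
```

In this model the node only returns to its charged state when the pre-charge condition is met, which is why a pre-charge vector has to be applied between successive evaluations.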
The following section details the implementation of the full adder in conventional D3L as well as the new modified split-path full adder circuit.
SPLIT-PATH D3L FULL ADDER DESIGN

Conventional D3L Full Adder Design
A full adder circuit using the pure D3L design methodology is shown in Figure 3. As per the conventional D3L methodology, the PDN is designed by carefully selecting a combination of inputs such that the discharge path to ground is completely cut off during the pre-charge phase. As a general rule, the critical transistor combinations closest to the supply and ground rails are designed to be duals of each other. This means that a parallel combination of three NMOS transistors driven by A, B and C, which connects to the ground rail, is matched by its dual, a series connection of PMOS transistors driven by the same signals. The remainder of the logic path in the PDN does not need to be replicated in the PUN. A similar strategy is followed in the design of the carry path. Both circuits employ minimum sized keepers to ensure correct and complete charging of the output node.
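The duality rule for the rail-side transistor groups can be checked exhaustively at the logic level. The sketch below is not a circuit simulation; it only confirms that three parallel NMOS devices and three series PMOS devices driven by the same signals never conduct together, and that only the all-low vector turns on the pre-charge path.

```python
from itertools import product


def nmos_parallel_on(a: int, b: int, c: int) -> bool:
    """Three parallel NMOS devices conduct when any gate is high."""
    return bool(a or b or c)


def pmos_series_on(a: int, b: int, c: int) -> bool:
    """Three series PMOS devices conduct only when every gate is low."""
    return not a and not b and not c


for a, b, c in product((0, 1), repeat=3):
    # Exactly one of the two networks conducts for every input vector
    assert nmos_parallel_on(a, b, c) != pmos_series_on(a, b, c)

print("pre-charge vector:",
      [v for v in product((0, 1), repeat=3) if pmos_series_on(*v)])
```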
Although this arrangement offers advantages over other dynamic styles, it suffers from longer pre-charge times and poor fan-in, due to the series stacked PMOS transistors. We observed that it is possible to improve the performance of the adder design by reducing the capacitance at the output node. This allows for higher fan-in, lower pre-charge times and relaxed sizing requirements. The new adder design with reduced pre-charge capacitance is described in detail in the following sub-section.
Hybrid Split-Path D3L Full Adder Design
A first observation upon looking at any full adder design is the relatively small and simple CARRY path compared to the SUM circuit. Due to the relatively simple logic function and fewer transistors, the CARRY path has fewer internal nodes. The smaller device sizes mean reduced capacitance at the output node. This in turn reduces the pre-charge time as well as the probability of charge sharing. As a result, the circuit is relatively less susceptible to speed and signal integrity issues. For a better SUM path, it is therefore logical to incorporate some of these factors in the design of the pull-up and pull-down networks. To improve the slow pre-charge phase of the SUM path in the original D3L adder, we split the pre-charge path into two. The PDN is also divided into two paths. Effectively, the output node of the original D3L logic has now been split into two, marked D1 and D2. Splitting the pre-charge and evaluation paths results in significantly reduced capacitance at the output node. Combined with faster pre-charge, the circuit also achieves a degree of resistance to charge-sharing issues. The split-path approach effectively halves the possible paths for charge sharing within the adder circuit, as compared to the conventional D3L or standard domino implementations. Minimum sized keepers further help to restore the charge at nodes D1 and D2. The signals D1 and D2 drive a standard CMOS NAND structure to generate the final output. An inverted SUM output goes back to drive the gates of the two keeper transistors. The charge on D1 and D2 is thus restored through a common path, allowing both nodes to be restored simultaneously and minimizing the probability of incorrect operation. The hybrid adder circuit thus incorporates the key advantages of the D3L logic style as well as those of standard CMOS. The use of a shorter pre-charge path ensures faster circuit operation.
The final standard CMOS topology brings along a high noise margin, improved signal robustness, sharper rise and fall times as well as an overall increase in reliability. Splitting the pre-charge and evaluation networks also means a lower number of stacked PMOS and NMOS transistors. This allows relaxed sizing of the transistors (even minimum sized devices) and translates into improved fan-in and fan-out performance of the circuit in both super-threshold and sub-threshold operating regions. The circuit diagram for the adder is shown in Figure 4. As seen from Figure 4, the sum function has been split into two functions, D1 and D2, to split the pre-charge path of the conventional D3L implementation. The final SUM is obtained as the NAND of D1 and D2. In any dynamic implementation, the pre-charged node loses charge either when the function evaluates to zero or due to charge leakage, charge sharing, capacitive coupling and clock feedthrough [20] in the NMOS chain in the pull-down network. Hence, to maintain correct pre-charge at this node, keeper transistors have traditionally been used as pre-charge or pre-discharge transistors in dynamic circuits. In the SUM implementation, the keeper transistors are driven by the signal D1·D2. This arrangement ensures that whenever the SUM output evaluates to 1, which means that either of the two nodes D1 and D2 is discharged to ground, the keepers driven by signal OUT help in restoring the charge during the pre-charge phase. This logical behavior can be understood more clearly from Table I, which elaborates the operation of the SUM circuit for each possible combination of inputs. Note that the table includes the combination (A, B, C) = (0, 0, 0) after each input combination. This reflects the need for all the inputs to simultaneously go into the pre-charge phase (all inputs go low) during each operation cycle. The corresponding values at nodes D1, D2, OUT and SUM have also been listed to give a clearer understanding of the pre-charge and charge restoration process.
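The relationship between D1, D2, OUT and SUM can be illustrated with a logic-level sketch. The way the four SUM minterms are divided between the two paths (`PATH1` and `PATH2`) is a hypothetical choice made for illustration; the actual partition is the one drawn in Figure 4. Any partition gives the same SUM, because NAND(D1, D2) is high whenever either node has discharged.

```python
from itertools import product

# Hypothetical split of the SUM minterms between the two pull-down paths;
# the real assignment in the circuit may differ.
PATH1 = {(0, 0, 1), (0, 1, 0)}
PATH2 = {(1, 0, 0), (1, 1, 1)}


def evaluate_sum(a: int, b: int, c: int) -> dict:
    """Evaluate one cycle starting from pre-charged nodes D1 = D2 = 1."""
    d1 = 0 if (a, b, c) in PATH1 else 1  # D1 discharges if path 1 conducts
    d2 = 0 if (a, b, c) in PATH2 else 1  # D2 discharges if path 2 conducts
    out = d1 & d2                        # keeper drive signal D1·D2
    total = 1 - out                      # SUM = NAND(D1, D2)
    return {"D1": d1, "D2": d2, "OUT": out, "SUM": total}


for a, b, c in product((0, 1), repeat=3):
    row = evaluate_sum(a, b, c)
    assert row["SUM"] == a ^ b ^ c       # matches the full-adder SUM function
    print((a, b, c), row)
    # Each evaluation is followed by the all-low pre-charge vector
    assert evaluate_sum(0, 0, 0) == {"D1": 1, "D2": 1, "OUT": 1, "SUM": 0}
```

The all-low vector leaves both nodes charged and SUM at 0, which is consistent with using (0, 0, 0) as the pre-charge combination between operations.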
To evaluate the quality of the design, a library of popular standard adder topologies was designed. All the circuits were characterized for speed, power, area, and reliability. The results of simulations in the IBM CMOS 65 nm technology node are discussed in the sections to follow.
PERFORMANCE CHARACTERIZATION IN SUPER-THRESHOLD OPERATING REGION
This section describes the characterization and performance comparison of the adder circuits in the IBM CMOS 65 nm process flow. All the delays have been characterized at 50/50% input/output threshold levels. For all the circuits presented in the paper, the sum delay was found to be lower than that of the carry stage. We simulated the circuits for all 8 input combinations and then identified the worst case transition, which is the one reported in the paper. The power reported is the average power measured over a 50 ns simulation with all the input combinations occurring randomly, as in a real-world application of the adder circuits.
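A 50/50% delay measurement of this kind can be sketched on sampled waveforms as follows. This is a minimal illustration that assumes piecewise-linear waveforms stored as plain lists; the helper names and the sample data are hypothetical and do not come from the simulations reported here.

```python
from typing import Sequence


def crossing_time(t: Sequence[float], v: Sequence[float], level: float) -> float:
    """Return the first time at which waveform v crosses `level`,
    using linear interpolation between adjacent samples."""
    for i in range(1, len(t)):
        lo, hi = v[i - 1], v[i]
        if lo != hi and min(lo, hi) <= level <= max(lo, hi):
            frac = (level - lo) / (hi - lo)
            return t[i - 1] + frac * (t[i] - t[i - 1])
    raise ValueError("waveform never crosses the requested level")


def propagation_delay(t: Sequence[float], v_in: Sequence[float],
                      v_out: Sequence[float], vdd: float = 1.0) -> float:
    """Delay between the 50% points of the input and output transitions."""
    half = 0.5 * vdd
    return crossing_time(t, v_out, half) - crossing_time(t, v_in, half)


# Illustrative waveforms (time in ps, voltage in V); not measured data.
t = [0.0, 10.0, 20.0, 30.0, 40.0]
v_in = [0.0, 1.0, 1.0, 1.0, 1.0]
v_out = [1.0, 1.0, 1.0, 0.0, 0.0]
print(propagation_delay(t, v_in, v_out), "ps")
```

Repeating the measurement over every input transition and keeping the maximum gives a worst case figure of the kind reported in the tables.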
Comparison of Adder Performance with Standard Circuits
In order to get an accurate understanding of the behavior of the adder, several sizing methodologies were investigated. For low power applications, a fully minimum sized design methodology was adopted. For performance oriented applications, we implemented both the standard sizing methodology and the progressive sizing described in Ref. [20]. Table II presents the performance comparison of the proposed full adder cell with the three different sizing methodologies. It was observed that progressive sizing allowed for 12% and 14% speed advantage over standard and minimum sized designs respectively. Overall, the adder with all transistors minimum sized achieved 7% and 30% better Power Delay Product (PDP) compared with its progressively sized and standard sized versions respectively. The next step in the characterization process was to compare the performance of the adder with standard full adder designs, both static and dynamic. For this purpose we also implemented standard CMOS, domino and D3L full adders in the same technology. Figure 5 shows the simulation setup used. The results are presented in Table III. It can be observed that the new adder provides the SUM computation 16% to 27% faster than the standard circuits and provides a 14% speed improvement in the CARRY computation. The adder consumed 30% more power than its static counterpart for single cell simulations at the nominal 1 V operating voltage. However, the speed and power consumption numbers as well as the overall power efficiency improved significantly with supply voltage scaling, and when the cell was used in the implementation of large adder chains of 8, 16, 32 bits and higher. This will be demonstrated in the later sub-sections. The power consumption of the domino circuit appears low because the clock power is not evident when simulating a single adder. This power becomes more evident in the design of larger adder chains.
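The PDP figures used for these comparisons are the product of average power and worst case delay. The sketch below shows the calculation and the percentage improvement between two cells; the numbers are placeholders chosen for illustration and are not taken from Tables II or III.

```python
def pdp(power_w: float, delay_s: float) -> float:
    """Power-delay product in joules."""
    return power_w * delay_s


def improvement_percent(reference: float, candidate: float) -> float:
    """Percentage by which `candidate` is lower (better) than `reference`."""
    return 100.0 * (reference - candidate) / reference


# Placeholder values only: a reference cell and a faster cell that
# consumes more power.
ref = pdp(power_w=4.0e-6, delay_s=100e-12)
new = pdp(power_w=5.0e-6, delay_s=70e-12)
print(f"PDP improvement: {improvement_percent(ref, new):.1f}%")
```

As the placeholder case shows, a cell that draws more power can still have the better PDP when its delay advantage is large enough.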
Table IV presents a comparison of adder performance with state of the art adder designs for speed, power, area and noise margins. The delays in the sum path and the total average power consumed by the adder have been factored in for the comparison to match the published results of other competing adder designs. To measure the noise margins of the proposed adder, we performed incremental simulations to identify the highest value of input voltage recognized by the adder as logic 0 (V_IL) and the lowest value of input voltage recognized by the adder as logic 1 (V_IH). The values of high and low output voltages (V_OH and V_OL) were fixed at 1 V and 0 V respectively. For an accurate comparison of the adders across technologies, all the reported results from competing designs were normalized to the 65 nm technology node considering a 1 GHz operating frequency. The proposed adder achieves higher speeds than the recently proposed designs while consuming higher power. However, detailed results on other metrics such as area, noise margins, fan-out performance and process variability were not presented for any of the previously published designs.
To get an idea of the adder performance when used in large arithmetic circuits, we implemented 8, 16 and 32 bit chains of ripple carry adders. All the reported results have been calculated using the average power consumption of the adder. As mentioned above, we simulated the adder for all possible combinations of inputs and measured the worst-case delay. For power measurement, we measured the average power using a random set of inputs over a 50 ns simulation, ensuring that every input combination occurs at least once. This method provides a more accurate picture of the PDP behavior of the adders, as already presented by us in Ref. [23]. For the domino ripple carry adders, to account for the power consumed by the clock distribution network, we used buffers with PMOS to NMOS ratios of 2:1, 4:2 and 8:4. Every 8 bits were driven by a separate branch in the clock tree. The adders were simulated for all input combinations to identify the worst case propagation delay across the adder chains. This analysis was repeated at supply voltages from 1 V to 0.8 V to account for the impact of supply voltage scaling on the power-delay performance of the adders. Table V shows the worst case propagation delays of the adders for all the supply voltages considered in the analysis. To estimate the power efficiency of the adder chains, Figure 7 shows a plot of the proposed adder's PDP variation for different bit lengths. When included in the design of large adder chains, the proposed sp-D3L solution was found to achieve 8% to 25% better power efficiency compared to the static adder. The new design, when used in a 32 bit ripple carry configuration, showed almost 3X improvement in power efficiency over the conventional D3L adder and almost 5X improvement over the domino design. The PDP figure of the domino adder deteriorates due to the higher power now consumed by the simple clock tree.
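The ripple carry structure used for these chains can be sketched functionally as below. This is a logic-level model only and says nothing about timing or power; the function names are illustrative.

```python
from itertools import product


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Logical behavior of a 1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)  # majority of the three inputs
    return s, cout


def ripple_carry_add(x: int, y: int, width: int, cin: int = 0) -> tuple[int, int]:
    """Add two `width`-bit integers by chaining full-adder cells, LSB first."""
    result, carry = 0, cin
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


# Exhaustive functional check for a short 4-bit chain
for x, y in product(range(16), repeat=2):
    total, cout = ripple_carry_add(x, y, width=4)
    assert total + (cout << 4) == x + y

print(ripple_carry_add(0xFF, 0x01, width=8))
```

An operand pair such as 0xFF + 0x01 forces the carry to ripple through every stage, which is the kind of vector that sets the worst case propagation delay of the chain.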
Design Space Exploration of Split-Path Data Driven Dynamic Full Adder
Performance Variation with Supply Voltage Scaling
Power consumption in CMOS circuits has always been one of the primary concerns for designers. Especially with the advent of high performance portable electronics, designing high speed circuits operating on strict power budgets has received increasing attention. The power consumption in a CMOS circuit consists of the dynamic switching power, the short-circuit power, and the static power due to leakage currents and sub-threshold conduction. Although leakage power has come to occupy an increasingly higher proportion of the total power consumption, dynamic switching power still dominates. The dynamic power in a CMOS circuit is given by

$$P = \alpha \, C_L \, V_{DD}^{2} \, f_{max}$$

where $P$ is the dynamic power, $f_{max}$ the operating frequency, $C_L$ the load capacitance, $V_{DD}$ the supply voltage and $\alpha$ the activity factor.
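The quadratic dependence of dynamic switching power on supply voltage can be illustrated numerically, assuming the standard expression P = α·C_L·VDD²·f. The activity factor, load capacitance and frequency below are assumed values for illustration, not parameters of the circuits characterized here.

```python
def dynamic_power(alpha: float, c_load: float, vdd: float, freq: float) -> float:
    """Dynamic switching power P = alpha * C_L * VDD^2 * f, in watts."""
    return alpha * c_load * vdd ** 2 * freq


# Assumed illustrative values: activity factor 0.5, 10 fF load, 1 GHz.
ALPHA, C_LOAD, FREQ = 0.5, 10e-15, 1e9

p_nominal = dynamic_power(ALPHA, C_LOAD, 1.0, FREQ)
for vdd in (1.0, 0.9, 0.8):
    p = dynamic_power(ALPHA, C_LOAD, vdd, FREQ)
    print(f"VDD = {vdd:.1f} V: {p * 1e6:.2f} uW "
          f"({100 * p / p_nominal:.0f}% of nominal)")
```

With frequency, load and activity held constant, scaling VDD from 1 V to 0.8 V reduces the dynamic power to 0.8² = 64% of its nominal value.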
From the expression it is clear that one of the best ways to reduce power consumption in a CMOS circuit is by scaling the supply voltage. Reduction in the operating VDD of the circuit means less current and hence slower device operation. In a typical CMOS circuit, during normal operation, most transistors are already velocity saturated. [5] This is especially significant in advanced technologies, 90 nm and beyond. From a supply voltage scaling point of view, it means that modern CMOS technologies provide more headroom for the designer to scale the supply voltage without significant degradation in circuit performance.
For this reason, it becomes essential to characterize the performance of any circuit under scaled supply voltage. We simulated the proposed adder along with the static, domino and D3L implementations for VDD varying from 1 V down to 0.8 V. A value of 0.8 V was selected as the lower bound of the supply voltage sweep in order to achieve reliable operation of the entire simulation setup at the 1 GHz characterization frequency. As 0.8 V is approximately the sum of the threshold voltages of the PMOS and NMOS transistors in the target technology, scaling the supply voltage below 0.8 V requires a reduction in operating frequency to get reliable operation. Table VI presents the delay, power and PDP variation of each of the adder cells with VDD variation. The proposed adder shows the lowest degradation in delay with decreasing VDD. With supply voltage scaled from 1 V to 0.8 V, the adder shows 12%, 34% and 39% PDP advantage in SUM computation over domino, static and conventional D3L designs respectively. When comparing the CARRY path in each design, we observed the conventional D3L to always show a better PDP than the other designs. It was also observed, however, that the proposed adder shows around 10% better overall PDP performance compared to the static and domino implementations when operated under scaled supply voltages. Thus, the new design performs significantly better than the other circuits in a scaled supply voltage scenario. This makes it an ideal candidate for data path units in portable electronics such as in Ref. [24].
Fig. 7. Comparison of adder PDP for different loading conditions.
Fan-in and Fan-out Analysis
For a full-adder circuit to be truly useful, it is necessary to analyze its performance under various loading conditions. Such an analysis helps to evaluate the reliability of the adder under loading when used in applications such as address generators, instruction counters, multipliers, etc. Hence, a detailed fan-out analysis was performed on the proposed adder and also on the standard designs to compare their performance under loading. Each circuit was simulated driving fan-out loads of 4, 8, 16 and 32 minimum sized inverters. Figure 7 shows PDP curves of the adders under various loading conditions. The analysis reveals that the split-path D3L adder circuit shows the best performance amongst all the adders considered, providing 11% to 41% better PDP than the others at worst case FO32 loading. This analysis thus supports the reliability of the proposed circuit when operating under multiple fan-out conditions.
Equally important is the impact of the adder circuit on the previous driving stage. The fan-in load presented by the adder to its driving stage affects the delay and power consumption of the driving stage as well as the performance of the adder itself. The best example to illustrate this is the carry generation circuits of the D3L and the split-path circuits. Both adders employ identical carry generation circuits with identical transistor sizing. But as seen from Figure 8, the delay of the carry path in the two circuits varies differently under different loading conditions. A part of this behavior is attributed to the different loading presented to the inputs A, B and C_in due to the difference in the two SUM circuits.
Thus, in order to fully characterize the adder, it is necessary to perform a detailed fan-in analysis on all the adder cells. An intuitive method of performing this analysis would be similar to the fan-out analysis. This would include simulating a CMOS circuit driving various numbers of identical adder circuits to observe the performance impact. We propose a new fan-in characterization method which allows us to evaluate the combined impact of fan-in and fan-out together. Figure 9 shows the simulation set-up used for this analysis. Each input of the adder is driven by a minimum size buffer. The sum and carry outputs are loaded with varying fan-out loads. In each case we calculate the sum and carry delays two times: (i) Delay w.r.t buffer input (ii) Delay w.r.t buffer output. The PDP is calculated in each case. We then evaluate the combined impact of fan-in and fan-out on the PDP by calculating the ratio of the two PDPs for both Sum and Carry. Although it is intuitive that a self-loading scheme is more appropriate to take the fan-in into consideration, even in case of cascading multiple adders, the outputs of the adders are generated through inverters and have to drive the local inverters in the next adder stage before they reach the actual adder. Hence, we believe that having an inverter as a fan-out load for the stand-alone adder analysis is sufficiently accurate to predict the behavior of the adders while also allowing for simpler simulation setup and shorter simulation time. This analysis combined with the results from the previously described fan-out analysis and ripple carry structures gives a more complete picture of the design trade-offs involved when deploying the adder cells in ripple carry schemes of various depths. Figure 10 shows plots of the PDP before and after the input buffers and the PDP impact ratio due to the combined effect of the output load driven by the adder and the input load offered by the adder. 
We propose that this analysis gives a better estimate of the performance of the adder in big circuits than stand-alone fan-in analysis. The curves show that the proposed adder achieves the lowest impact factor amongst all the designs. The domino circuit offers very little load to the driving circuit, as can be seen from the curves showing a low impact factor. However, it suffers from poor drivability under high load conditions. This is evident from the higher PDP of the domino circuit under high load conditions compared to the other adders under study.
Fig. 10. Impact of fan-in and fan-out on full adder performance.
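The PDP impact ratio used in this combined fan-in/fan-out analysis can be sketched as follows. The orientation of the ratio (PDP measured from the buffer input divided by PDP measured from the buffer output) is an assumption, since the text only states that the ratio of the two PDPs is taken, and the data values are placeholders rather than simulation results.

```python
def pdp_impact_ratio(pdp_from_buffer_input: float,
                     pdp_from_buffer_output: float) -> float:
    """Ratio of the PDP referred to the buffer input to the PDP referred to
    the buffer output (assumed orientation of the ratio)."""
    if pdp_from_buffer_output <= 0:
        raise ValueError("PDP referred to the buffer output must be positive")
    return pdp_from_buffer_input / pdp_from_buffer_output


# Placeholder PDP values in joules for fan-out loads of 4, 8, 16 and 32
FANOUTS = (4, 8, 16, 32)
PDP_IN = (1.2e-15, 1.5e-15, 2.1e-15, 3.3e-15)
PDP_OUT = (1.0e-15, 1.3e-15, 1.9e-15, 3.1e-15)

for fo, p_in, p_out in zip(FANOUTS, PDP_IN, PDP_OUT):
    print(f"FO{fo}: impact ratio = {pdp_impact_ratio(p_in, p_out):.3f}")
```

Under this orientation, a ratio close to 1 means the adder adds little delay and energy penalty to its driving buffers at that output load.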
In the analysis of various aspects of adder performance presented here, we have considered the delay in the sum path of the adder when comparing it to other published designs. The power considered is the average power consumption of the adder, to give the reader an understanding of the performance-power trade-offs. It can be noted that in the proposed adder, the SUM computation is faster than the carry. For this reason, whenever a comparison with other standard adders is presented, we have included the performance of the carry path as well. For the fan-in/fan-out analyses, only the SUM PDP has been considered. This is because the SUM function is more critical when considering the choice of the adder, as this is the output which drives the next stage in a data path. While the carry is critical in carry-linked structures, its impact is automatically factored in when comparing the performance of long chains of adders. Furthermore, it should be noted that all the analyses presented here are meant to serve as a design space exploration for adders under various operating conditions, rather than a point solution to speed, power, fan-out and fan-in issues.
IMPACT OF PROCESS, VOLTAGE, AND TEMPERATURE VARIATIONS
As technology shrinks into the nanometer regime, several problems considered secondary in the past have been brought into the foreground. Perhaps the most significant impact of technology scaling has been on the reliability of the devices. With smaller device sizes and lower threshold voltages, modern high performance circuits in advanced technology nodes become increasingly susceptible to reliability issues, and circuit performance suffers from process and environmental variations. In advanced technology nodes, process variations have been shown to strongly influence the speed of the circuit. Also, as leakage power is becoming an increasingly dominant component of circuit power consumption, the power performance of the circuits is considered as well. To incorporate the impact of power supply voltage variations on the circuits, the Monte Carlo analysis was performed at VDD ranging from 1 V to 0.8 V. Table VI shows the results of the Monte Carlo simulations.
To compare the behavior of adder cells under process variations, we use the mean (μ) and standard deviation (σ). It should be noted that the mean delay values reported in the table may differ from those reported in the previous tables, because the previous tables report measurements from stand-alone simulations of adders whereas the mean delay represents the mean value obtained after 1000-sample Monte Carlo simulations. The ratio between the maximum spread and the mean value has been used as a measure of the uncertainty that process variations introduce in a particular performance parameter. In the following discussion this ratio is referred to as variability. Based on the results obtained from the Monte Carlo analysis, we evaluated the worst case power and delay (μ + 3σ) and the corresponding worst case PDP. Figure 11 shows the variation in worst case PDP due to process and supply voltage variations. It is clear from Table VII and Figure 11 that, as expected, the static full adder performs best in terms of variability due to process and voltage variations. The domino adder shows the worst variation in speed due to random process and mismatch variations. The proposed adder achieves speed and delay robustness comparable to the other designs, but suffers in terms of power consumption variability, showing about 2-3% more variation than the standard designs. However, as seen from the curves in Figure 11, the worst case PDP of the proposed adder is still lower than that of the standard designs considered in the comparison when the supply voltage is scaled down to 0.9 V and 0.8 V. Based on these curves it can be argued that the proposed adder shows robustness against random process variations even in scenarios of scaled supply voltages. Monte Carlo simulations were also performed at temperatures of 27, 50 and 100 °C. Despite an overall drop in performance, no significant degradation in performance and power efficiency was observed.
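The statistics used in this section can be sketched as follows. This is a minimal illustration, not the flow used to produce the reported tables. The function names are hypothetical, the sample standard deviation is assumed, and "maximum spread" is assumed to mean the difference between the largest and smallest Monte Carlo sample, which the text does not define explicitly.

```python
import statistics


def worst_case(samples):
    """Worst case value of a parameter, taken as mean + 3 * standard deviation."""
    mu = statistics.mean(samples)
    sigma = statistics.stdev(samples)  # sample standard deviation (assumption)
    return mu + 3.0 * sigma


def variability(samples):
    """Ratio of the maximum spread to the mean value.

    The spread is assumed here to be max(samples) - min(samples).
    """
    return (max(samples) - min(samples)) / statistics.mean(samples)


def worst_case_pdp(power_samples, delay_samples):
    """Worst case PDP as the product of worst case power and worst case delay."""
    return worst_case(power_samples) * worst_case(delay_samples)


# Placeholder samples only, not simulation data from the paper.
delay_samples = [1.0, 2.0, 3.0]
power_samples = [1.0, 2.0, 3.0]
wc_delay = worst_case(delay_samples)
wc_pdp = worst_case_pdp(power_samples, delay_samples)
```

In practice the sample lists would hold the 1000 delay and power values from one Monte Carlo run at a given supply voltage, and the calculation would be repeated for each VDD to trace a curve of worst case PDP against supply voltage.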
CONCLUSION
This paper presented the design and complete characterization of a new split-path D3L full adder circuit. The proposed adder provides up to 27% speed improvement over standard static, domino and D3L adders. When tested over a wide spectrum of supply voltages and fan-out scenarios, the new design shows almost 41% improvement in PDP over the standard designs. The combined fan-in fan-out analysis revealed that the adder has the least impact on the previous stage driving the circuit. Extensive Monte Carlo simulations revealed that the adder offers extended robustness against random process, mismatch, power supply and temperature variations compared to the traditional D3L implementation. During this design and characterization process, we also proposed a new technique to analyze circuits for fan-in and fan-out capabilities. The simulation-based analysis procedure is not limited to adders but can be extended to evaluate other families of digital circuits. We suggest that any new design proposed in the literature should be tested extensively for all the factors mentioned in this paper to get a true understanding of its strengths and weaknesses. This is of particular importance in digital circuit design, where new designs are proposed in rapid succession, as it gives the reader a more detailed understanding and better appreciation of the proposed work.
