In this brief, we introduce a novel low-power dual mode logic (DML) 
ISSN: 2089-3191 
Subthreshold Dual Mode Logic (J Nageswara Reddy)
evaluation. Note that all gates can be designed either as Type A or Type B, ignoring the optimization guidelines mentioned above. The optimal design methodology with DML gates is to cascade Type A and Type B gates alternately, exactly as in np-CMOS. Although this methodology allows maximum performance, minimum area, and maximum power efficiency, it is also possible to connect gates of the same type by inserting an inverter buffer between them, as is done in domino logic. Connecting gates of the same type without inverters is also possible when a footer/header is used at each stage; however, this structure causes glitching from the end of precharge until the evaluation data ripples through the chain. These are standard problems when designing with dynamic gates [11]. However, in contrast to standard dynamic logic, DML's inherent keeper helps recover the logical value.
Comparative Performance Analysis through Simulations and Measurements
We compared DML gates to their CMOS and domino counterparts in terms of speed, power, and robustness. All the test gates were examined and characterized in a standard low-power 80-nm process, using the Cadence Virtuoso-based Spectre simulator. Power supplies between 150 mV and 600 mV were tested for energy estimation. Monte Carlo statistical simulations were performed at 300 mV to compare the sensitivity of the simulated gates to process variations and mismatch. The DML gates tested in the rest of this brief are unfooted, except in Section III-C, where the footed DML gates are compared to their footed dynamic counterparts. For DML gates without footers, the simulation results include the overhead of generating the ripple precharge signals. To provide a fair comparison, the same metric was used to design all gates (CMOS, domino, and DML): all gates were designed to conduct the same Ion current during evaluation. This current is equal to the Ion current flowing through a single transistor of a CMOS inverter.
Speed
We set up a framework for evaluating frequency consisting of fanout-of-three (FO-3) NAND and NOR gates. We compared standard CMOS gates, unfooted DML gates, and domino gates both with and without a keeper. The role of the keeper in achieving acceptable robustness will be discussed in Section III-C. A test chain was composed of 20 consecutive NAND and NOR gates, in which the NOR gate was implemented in the Type A topology and the NAND gate in the Type B topology, giving a structure similar to an np-CMOS design. While this np-CMOS-like chain demonstrated better results, we also show the performance of consecutive DML gates of the same type. We tested the minimal functional ...

After the precharge phase, the output of a dynamic NOR gate is high, and when no switching occurs, the low-to-high propagation delay is effectively tplh = 0. When switching does occur, the output capacitance CL is discharged through the pull-down network. Usually, CL is the input capacitance of the next node in the dynamic chain, so it is substantially smaller than the input capacitance of the CMOS equivalent. The switching time is thus reduced, while the current-sinking capability of the pull-down network remains similar to that of the CMOS design. This analysis seems somewhat unfair, since it does not take the precharge phase into account. However, it is very often possible to hide the precharge behind other system functions.

Figure 2 depicts a comparison of the maximum gate frequency as a function of VDD for CMOS, dynamic, and DML chains. First, as expected, the highest frequency is achieved by unfooted dynamic logic. However, dynamic logic is very sensitive to process variations (discussed in Section III-C), which makes it unusable in the subthreshold regime. Second, the dynamic DML gates achieve, on average, an order of magnitude higher frequency than CMOS.
Third, the unfavorable case of consecutive gates of the same type (in this case the chain was composed of interleaved Type A and Type B NAND gates) shows a speed degradation of 17% compared to the DML chain of consecutive NAND and NOR gates. Fourth, CMOS logic achieves a lower frequency than the dynamic DML. Fifth, and last, is the static DML, which offers on average 55% of the achievable CMOS frequency. This means that switching from static mode to dynamic mode in DML offers a 14× frequency boost on average, with energy consumption consequences that will be discussed in the following section.
Energy Dissipation
The energy consumption was analyzed by simulating the same chain of 20 consecutive NAND-NOR gates. We used the test chain to estimate the total energy consumed during one switch. We used only footed dynamic gates, since, as previously noted, an unfooted dynamic gate does not withstand process variation. The results of the analysis are shown in Figure 3. VDD varies from 0.2 to 0.6 V, and the minimum energy point (MEP) is marked with an "X." The DML static mode demonstrated the lowest energy consumption, on average 2.2× less than CMOS and 5× less than domino. As can be observed, the MEP for DML gates is located in the subthreshold region. Although it is not always possible, the optimal operating voltage for ultralow-power applications is VDD,MEP, the supply voltage at the MEP [12]. If VDD is higher than VDD,MEP, dynamic energy is wasted, and if VDD is below VDD,MEP, leakage energy is wasted, due to the prolonged TCycle [13]. Herein lies an interesting DML feature: the circuit can be tuned to operate at an MEP bound to a certain nominal frequency, but, when higher throughput is required, a higher frequency can easily be achieved by changing the operation mode to dynamic, with an acceptable energy penalty. The opposite is also possible: the circuit can operate at a high frequency, but at standby the consumed energy can drop to 20% of the nominal consumption. As expected, domino logic consumes the highest amount of energy, due to the precharging, high leakage, and the extra transistors used as keepers.
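The trade-off between dynamic and leakage energy around the MEP can be illustrated with a toy model. The sketch below is not the simulation used in this brief: it assumes a quadratic dynamic term and a leakage term that grows exponentially as VDD drops (because the cycle time grows), and every name and default value (`c_sw`, `leak_weight`, `n_slope`, `v_thermal`) is a hypothetical choice made only so that the minimum falls inside the tested supply range.

```python
import math

def energy_per_operation(vdd, c_sw=1.0, leak_weight=770.0,
                         n_slope=1.5, v_thermal=0.026):
    """Toy energy per operation in arbitrary units (all defaults hypothetical)."""
    # Dynamic (switching) energy grows quadratically with the supply voltage.
    dynamic = c_sw * vdd ** 2
    # Leakage energy = leakage power x cycle time. With an exponential
    # subthreshold delay, this scales as V^2 * exp(-V / (n * vT)).
    leakage = leak_weight * vdd ** 2 * math.exp(-vdd / (n_slope * v_thermal))
    return dynamic + leakage

def find_mep(v_min=0.15, v_max=0.60, steps=451):
    """Grid search for the supply voltage that minimizes the toy energy."""
    grid = [v_min + i * (v_max - v_min) / (steps - 1) for i in range(steps)]
    return min(grid, key=energy_per_operation)

if __name__ == "__main__":
    v_mep = find_mep()
    print(f"toy-model MEP at VDD = {v_mep:.3f} V")
```

With these assumed parameters the minimum lies in the interior of the sweep: raising VDD above it increases the dynamic term, and lowering VDD below it increases the leakage term, which is the qualitative behavior described above.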
Robustness and Sensitivity to Process Variations
The subthreshold regime, while offering low power consumption, suffers from susceptibility to process variation and reduced noise margins. In the following sections, we present two metrics used to quantitatively estimate the robustness of DML logic versus CMOS and domino design.

1) Static Noise Margin (SNM): The metric used to estimate logic gate failure is the SNM for logic gates, as introduced by Kwong and Chandrakasan [14]. This metric relies on a simple analysis of the butterfly curve. Logic failure is defined as a butterfly plot with no inscribed square, analogous to a 6T static random-access memory (SRAM) cell displaying negative SNM. To test DML, we connected a NAND gate to a NOR gate back to back, as was done in [15] for an SRAM cell. SNM is defined as the side of the largest inscribed square in the smaller lobe of a butterfly plot. We used this criterion only for CMOS and static DML, since dynamic logic and dynamic DML cannot be tested correctly using this analysis. Figure 4 shows the DML and CMOS SNMs at VDD = 300 mV. A Monte Carlo analysis with 1 k points, which takes into account both local and global variations, was used. The simulated SNM for CMOS is μCMOS = 77 mV, σCMOS = 7.7 mV, and the DML static SNM is μDML = 52 mV, σDML = 11.2 mV. The SNR of the SNM obtained for CMOS is somewhat higher than that of static DML, which implies higher robustness of CMOS. However, it can be seen that static DML is still very robust. Moreover, it should be noted that when DML was optimized for improved robustness rather than improved speed, better SNM values were obtained. In the following section, we evaluate the robustness of dynamic DML versus domino.

Logical Level (LL) Analysis
To evaluate the process variation susceptibility of the dynamic DML and domino, we introduced LL analysis. We used LL analysis as a framework to evaluate the ability of the tested dynamic logic to handle leakage currents.
According to the LL analysis, a gate is either precharged to VDD or pre-discharged to 0 V, and after a predefined period, the output voltages of the different gates are compared. Dynamic gates suffer from charge leakage, which becomes more severe in subthreshold due to long static periods. This analysis takes into account all the parasitic leakages and approximates the ability of the dynamic gate to hold a logical 0 or a logical 1. In the test, a single gate in a chain was precharged and, after a period suitable for 10-MHz operation, the voltage at its output was measured. We tested the DML unfooted gates versus the domino gates with a keeper. We used a keeper since domino gates without a keeper failed to operate. The LL analysis was performed using a 1-k-point Monte Carlo simulation with local and global interdie variations, which simulate a sampling of logic gates across various dies. The results are shown in Figure 5. These results strongly indicate an improved robustness of DML dynamic logic versus the standard domino implementation. It can be noted that a fairly large number of the tested domino gates failed to keep the LL "1," due to the topology, in which a stack of n-MOS transistors competes with a feeble p-MOS precharge transistor on some of the simulated dies. We also examined the lowest possible VDD for CMOS, domino, and DML under global and local variations. The results were 285 mV for CMOS, 470 mV for domino, and 300 mV for DML.
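As a quick check of the static-mode SNM statistics reported earlier in this section, the sketch below computes the ratio μ/σ for CMOS and static DML. Reading "SNR" as μ/σ is an assumption, as is the normal distribution used to estimate the probability of a negative SNM; the function names are hypothetical.

```python
import math

def snm_ratio(mean_mv, sigma_mv):
    """Mean-to-sigma ratio of the SNM (assumed meaning of 'SNR')."""
    return mean_mv / sigma_mv

def failure_probability(mean_mv, sigma_mv):
    """P(SNM < 0), assuming the SNM is normally distributed."""
    z = mean_mv / sigma_mv
    # Lower tail of the standard normal: Phi(-z) = 0.5 * erfc(z / sqrt(2)).
    return 0.5 * math.erfc(z / math.sqrt(2.0))

if __name__ == "__main__":
    # Values reported for VDD = 300 mV.
    for name, mu, sigma in [("CMOS", 77.0, 7.7), ("static DML", 52.0, 11.2)]:
        print(name, round(snm_ratio(mu, sigma), 2),
              failure_probability(mu, sigma))
```

Under these assumptions the mean SNM of static DML sits more than four standard deviations above zero, which is consistent with the statement that static DML remains very robust even though its ratio is lower than that of CMOS.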
Delay Variation
Delay variation affects performance, which in turn affects yield. It is well known that circuits operating in the subthreshold regime are more sensitive to variations than circuits operating above threshold. This is due to the exponential dependence of the subthreshold current on Vth. The common assumption is that Vth is normally distributed; hence, the subthreshold current is log-normally distributed. The delay of a subthreshold logic gate can be modeled as
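The expression itself is not reproduced at this point. As an assumption, a form widely used for subthreshold gate delay, and consistent with the symbols defined next, is sketched below; the subthreshold slope factor n and the thermal voltage V_T are symbols introduced here for illustration and are not defined in the text.

```latex
t_d = \frac{K \, C_g \, V_{DD}}{I_0 \exp\!\left( \dfrac{V_{DD} - V_{th}}{n \, V_T} \right)}
```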
where K is a fitting parameter and Cg is the extracted output capacitance. The denominator is the active current, modeled using I0 as a fitting parameter, which takes into account the total current flowing through the n-MOS and p-MOS transistors. Assuming nonvarying output capacitance, we predict that the delay will also be log-normally distributed, since it is inversely proportional to the on-current and the reciprocal of a log-normal variable is itself log-normal. Indeed, the 1-k Monte Carlo analysis of the average delay yields a log-normal distribution, as depicted in Figure 6. The results are, from the fastest to the slowest: domino with μDomino = 12.77 ns, DML dynamic mode with μDML_w_footer = 16.22 ns, CMOS with μCMOS = 18.8 ns, and DML static mode with μDML_static = 31 ns. Domino offers the highest frequency, but as previously discussed, it suffers greatly from leakage and consequently exhibits a very low yield. For example, if the target operation frequency is 10 MHz at 300 mV, the Monte Carlo results indicate almost 100% yield for DML and less than 40% for domino. Thus, in practice, standard domino logic is unsuitable for the subthreshold regime.
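The log-normal argument can be illustrated numerically. The sketch below is not the Monte Carlo setup used in this brief: it assumes a delay of the form K·Cg·VDD / (I0·exp((VDD − Vth)/(n·vT))) with a normally distributed Vth, and every parameter value is a hypothetical placeholder. It draws samples and checks that the delay is strongly skewed while its logarithm is approximately symmetric, as expected for a log-normal variable.

```python
import math
import random
import statistics

def sample_delays(n_samples, vdd=0.3, vth_mean=0.45, vth_sigma=0.03,
                  n_slope=1.5, v_thermal=0.026, k=1.0, c_g=1e-15,
                  i0=1e-6, seed=0):
    """Draw gate delays for a normally distributed Vth (hypothetical values)."""
    rng = random.Random(seed)
    delays = []
    for _ in range(n_samples):
        vth = rng.gauss(vth_mean, vth_sigma)
        # Assumed exponential subthreshold on-current.
        i_on = i0 * math.exp((vdd - vth) / (n_slope * v_thermal))
        delays.append(k * c_g * vdd / i_on)
    return delays

def skewness(values):
    """Population skewness: third standardized moment."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)

if __name__ == "__main__":
    delays = sample_delays(1000)
    logs = [math.log(d) for d in delays]
    print("skewness of delay:     ", skewness(delays))
    print("skewness of log(delay):", skewness(logs))
```

In this model the spread of log(delay) equals the Vth spread divided by n·vT, so even a few tens of millivolts of threshold variation produce a long tail of slow gates.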
Test Chip Measurements-Preliminary Proof-of-Concept
To provide a preliminary proof-of-concept of the proposed family, we fabricated two DML test structures as part of a test chip in a low-power 80-nm TSMC process. The fabricated DML structures are 100-stage chains with the following architectures: 1) a Type A gate followed by a Type B gate, denoted as AB, and 2) a Type A gate followed by a CMOS inverter, denoted as AI. Figure 7 shows the layout and die photograph of the test chip, which includes other projects as well. The DML devices under test are marked in Figure 7(a). The chip was covered by metal layers for density reasons. Post-silicon testing was performed with supply voltages from 400 mV to 1.1 V at 27 °C. All control signals and biases were generated externally using a Pulse Instruments 4000 Series Test System. Static and dynamic behaviors were measured using the Agilent B1500A semiconductor device analyzer. Figure 8 provides positive evidence for the functionality of the DML family. The static operation, shown in Figure 8(b), was also verified. To activate the static mode, Clk was connected to VDD. As expected, both chains behaved as CMOS gates. The measured chain delay was approximately 200 ns, which is about 10× higher than in dynamic operation. A comparison between simulation and measurement results, which is not presented in this brief due to length limitations, showed agreement between simulated and measured results, with average and maximum differences of 13% and 25%, respectively.
Conclusion
In this brief, we presented a novel logic family, DML, and demonstrated it for subthreshold operation. We showed that the DML dynamic mode achieved an average 10× speed improvement compared to CMOS, and improved robustness compared to standard dynamic logic.
The DML static mode demonstrated the lowest energy dissipation: 2.2× less than CMOS on average, and 5× less than domino. We presented a basic proof-of-concept of the proposed DML logic through measurements of an 80-nm test chip. Future work will include the optimization of the DML gates for operation with standard supply voltages, the development of a standard library, and the design of a benchmark circuit using a standard ASIC flow.
