Abstract: Domino dynamic circuits are widely used in critical parts of high-performance systems. In this paper we show that, in addition to the functional limitation associated with the non-inverting behavior of Domino gates, they also have robustness disadvantages when compared to inverting dynamic gates. We analyze and compare the tolerance to parameter and operating-condition variations of gate-level pipelines implemented with Domino and with DOE, an inverting dynamic gate we have recently proposed. Our experiments confirm that DOE pipelines are more robust and that the improvements are due to its inverting nature.
I. INTRODUCTION
Design of functional units implementing very fine-grained pipelining for high-performance applications is currently an area of active research. These solutions do not apply conventional pipeline techniques, which insert flip-flops to shorten signal propagation paths in combinational logic; instead, they rely on logic circuit styles that naturally exhibit the capacity to block data propagation. Thus, they are well suited to implementing pipeline architectures without memory elements. The potential of dynamic logic, with its precharge and evaluation phases, to implement this kind of pipelining was recognized long ago. In [1], the operation of the well-known dynamic Domino logic in a pipelined fashion, using an overlapping multi-phase clock scheme without latches between consecutive clock phases, was analyzed in depth. Many variations of this multi-phase solution achieving high performance have been developed [2]-[8], and some of them have been applied to speed up critical parts of commercial microprocessors. In particular, architectures with a single gate per clock phase (nanopipelines) have been proposed and have demonstrated high operating frequency and throughput [4], [5], [7], [8].
In spite of their speed advantages, dynamic gates are well known to have a functional limitation: only non-inverting blocks can be chained (in Domino, a static inverter is added between every two dynamic stages to guarantee that all inputs to the next logic block are set to 0 after the precharge period). This functional limitation is not the only penalty associated with the non-inverting behavior of Domino gates. We have observed that the maximum operating frequency of Domino nanopipelines is not independent of the number of pipeline stages, as it should be. This behavior arises from the fact that, in order to produce a logic one, a non-inverting gate requires one or more of its inputs to be at logic one as well. As a result, non-ideal logic ones get worse as they propagate through the logic network, eventually leading to a functional failure. Non-ideal logic ones can result from variations in parameters or operating conditions. In this paper, the use of inverting Domino-like gates to improve circuit robustness is explored.
The paper is organized as follows. In Section II, the gate topologies of both the conventional non-inverting Domino and the recently proposed inverting DOE logic style [9] are described. In Section III, the operation of pipelines built from them is analyzed and the implications of the non-inverting behavior of Domino are illustrated. In Section IV, DOE and Domino nanopipelines are evaluated through simulation experiments and compared in terms of robustness. Finally, some conclusions are given in Section V.
II. DOMINO AND DOE GATE TOPOLOGY

A. Domino Topology
A conventional dynamic gate (or Domino gate) is shown in Figure 1a; it consists of a dynamic stage and a static output stage. A keeper transistor (M_K) is added to protect the dynamic node against leakage and noise. The gate operates in two phases, called precharge (CLK = 0) and evaluation (CLK = 1). During the precharge phase, the dynamic node is precharged to V_DD through M_PREC (and, thus, the output is discharged), whereas during the evaluation phase the pull-down network (PDN) and the footer transistor M_FOOTER discharge the dynamic node whenever the input combination turns the PDN on. The sizing of three elements, namely the PDN, the keeper transistor and the PMOS transistor of the output inverter, determines the evaluation delay of the conventional gate.
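The two-phase behavior described above can be captured, in a purely logical and idealized way, by the following Python sketch. It is not a circuit model: it assumes ideal logic levels, ignores timing and the keeper (which only holds a high dynamic node), and the names `domino_output` and `or8_pdn` are hypothetical. It makes visible that the output is forced to 0 during precharge and that an output one requires the PDN to conduct, i.e., for an OR gate, at least one input at logic one.

```python
from typing import Callable, Sequence

# Hypothetical type: returns True when the pull-down network conducts for these inputs.
PullDown = Callable[[Sequence[int]], bool]


def or8_pdn(inputs: Sequence[int]) -> bool:
    """Eight parallel NMOS devices: the PDN conducts if any input is at logic one."""
    if len(inputs) != 8:
        raise ValueError("OR-8 expects eight inputs")
    return any(inputs)


def domino_output(clk: int, inputs: Sequence[int], pdn: PullDown) -> int:
    """Ideal logic value at the Domino output for the given clock phase."""
    if clk == 0:
        dynamic_node = 1  # precharge: M_PREC pulls the dynamic node to V_DD
    else:
        # evaluation: M_FOOTER is on; the node discharges only if the PDN conducts
        dynamic_node = 0 if pdn(inputs) else 1
    return 1 - dynamic_node  # static output inverter makes the gate non-inverting


if __name__ == "__main__":
    one_hot = [0] * 7 + [1]
    print(domino_output(1, one_hot, or8_pdn))   # -> 1: one input high during evaluation
    print(domino_output(1, [0] * 8, or8_pdn))   # -> 0: all inputs low
    print(domino_output(0, [1] * 8, or8_pdn))   # -> 0: precharge forces the output low
```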
B. DOE Topology
Recently, we have proposed a new topology called Delayed Output Evaluation (DOE) [9], which modifies the static output stage of the conventional Domino gate. Figure 1b depicts the schematic of the DOE topology, in which the static inverter of the Domino output stage has been replaced by a static NAND gate followed by a static inverter. The inputs of the NAND are the dynamic node and a delayed version of the clock, V_DCLK, whose rising edge is delayed by Δ_CLK with respect to the rising edge of V_CLK while, ideally, both falling edges are simultaneous (see Fig. 1b). For V_DCLK = 1, the NAND gate evaluates its inputs. For input combinations that discharge the dynamic node, the pull-down network is on and the gate output remains low. For input combinations that do not discharge the dynamic node, the NAND output node is pulled down and V_OUT is pulled up. DOE topologies therefore implement inverting functions, since a logic one is produced when an input combination that does not discharge the dynamic node is applied. This is a clear advantage over conventional Domino gates in terms of logic flexibility.
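A similarly idealized Python sketch of the DOE gate follows. It assumes ideal logic levels and no timing; the Δ_CLK window is represented only by the clock state in which V_CLK = 1 but V_DCLK = 0, and the names `doe_output` and `or8_pdn` are hypothetical. With the same OR-8 pull-down network as a Domino OR-8, the DOE gate computes NOR-8.

```python
from typing import Callable, Sequence

# Hypothetical type: returns True when the pull-down network conducts for these inputs.
PullDown = Callable[[Sequence[int]], bool]


def or8_pdn(inputs: Sequence[int]) -> bool:
    """Same eight-input parallel pull-down network as in a Domino OR-8."""
    if len(inputs) != 8:
        raise ValueError("expects eight inputs")
    return any(inputs)


def doe_output(clk: int, dclk: int, inputs: Sequence[int], pdn: PullDown) -> int:
    """Ideal logic value at the DOE output; dclk is the delayed clock V_DCLK."""
    if dclk and not clk:
        # Ideally V_DCLK rises after V_CLK and both fall together.
        raise ValueError("V_DCLK = 1 with V_CLK = 0 is not a valid clock state")
    if clk == 0:
        dynamic_node = 1                        # precharge
    else:
        dynamic_node = 0 if pdn(inputs) else 1  # evaluation of the dynamic stage
    nand_out = 1 - (dynamic_node & dclk)        # static NAND(dynamic node, V_DCLK)
    return 1 - nand_out                         # static output inverter


if __name__ == "__main__":
    print(doe_output(1, 1, [0] * 8, or8_pdn))        # -> 1: node not discharged
    print(doe_output(1, 1, [0] * 7 + [1], or8_pdn))  # -> 0: node discharged (NOR)
    print(doe_output(1, 0, [0] * 8, or8_pdn))        # -> 0: still inside the delay window
```

Note that a logic one at the DOE output corresponds to all PDN inputs being at zero, which is the inverting behavior the following analysis relies on.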
Unlike in the conventional gate, in DOE the evaluation delay is determined by the speed of the NAND-INV static stage and by the amount Δ_CLK by which evaluation of the NAND is delayed. The gate delay is therefore, to some extent, independent of how fast the dynamic node discharges. As a result, the delay/noise-tolerance tradeoff achieved is significantly better than in Domino gates, as we showed in [9]. The design of a Kogge-Stone adder using DOE gates is also reported in [9] to validate their capability to build logic networks.
III. ANALYSIS OF PIPELINE OPERATION
Figure 2a shows the block diagram of the target gate-level pipeline interconnection scheme, using, in this example, three overlapped clock phases. We have implemented it with Domino OR gates, simulated it at different frequencies and analyzed the output of each stage. Above a certain frequency, logic ones start to degrade as they propagate through the chain of gates, eventually leading to a functional failure after several stages, as shown in Figure 2b. Note that the final output of the Domino chain (V_OUT,CONV in Fig. 2b) is zero due to the degradation of the intermediate outputs. This behavior can be explained by considering the input combinations that produce a zero-to-one output transition. Since Domino is non-inverting, this output transition is associated with input combinations that discharge the dynamic node. Discharging the dynamic node requires one or more inputs to be at logic one, and "good" input ones are required to fully discharge the dynamic node and produce a "good" output one. Moreover, the non-ideal behavior of consecutive stages accumulates. A non-ideal input one leaves the dynamic node only partially discharged, so the node is precharged back faster and the output logic one becomes even narrower. Thus, the dynamic node of the next stage discharges only down to a higher voltage level. In contrast, for pipeline stages implementing inverting functions, the zero-to-one output transition occurs for input combinations that do not discharge the dynamic node. Thus, how good the output logic one is does not depend on how good the input logic ones are.
To illustrate the differences, we have simulated ten-stage pipelines implemented with both logic styles and measured, at different frequencies, the voltage level to which the dynamic node of each stage discharges. Simulation results are summarized in Fig. 2c, which depicts this voltage level versus stage number for the Domino and DOE networks. In Domino, the discharge of the dynamic node degrades progressively in consecutive stages and the complete chain does not operate correctly. DOE behavior is completely different: the minimum voltage level increases slightly from the first stage to the second, owing to non-ideal inputs, but then remains constant, so the accumulative effect observed in Domino is avoided. The DOE circuits work up to a given frequency, and all stages fail if the frequency is increased further; that is, circuit failure is not caused by an accumulative effect.
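The accumulation argument can be illustrated with a deliberately crude one-parameter abstraction, sketched below in Python. Each stage maps the "quality" of the logic one at its input (1.0 being ideal) to the quality of its output one. The 0.9 factors and the 0.5 failure threshold are arbitrary illustrative values, not measurements, and all function names are hypothetical; the point is only the structural difference between a stage whose output one depends on its input ones and a stage whose output one does not.

```python
from typing import Callable, List, Optional


def propagate(stage: Callable[[float], float], n_stages: int,
              q_in: float = 1.0) -> List[float]:
    """Apply the same stage model n_stages times; return each stage's output quality."""
    qualities: List[float] = []
    q = q_in
    for _ in range(n_stages):
        q = stage(q)
        qualities.append(q)
    return qualities


def first_failing_stage(qualities: List[float],
                        threshold: float = 0.5) -> Optional[int]:
    """1-based index of the first stage whose output one falls below threshold, else None."""
    for k, q in enumerate(qualities, start=1):
        if q < threshold:
            return k
    return None


def non_inverting_stage(q_in: float) -> float:
    # Domino-like: the output one comes from discharging the dynamic node with the
    # input one, so a degraded input one yields an even more degraded output one.
    return 0.9 * q_in


def inverting_stage(q_in: float) -> float:
    # DOE-like: the output one is produced when the dynamic node is NOT discharged,
    # so its quality does not depend on q_in (argument intentionally unused).
    return 0.9


if __name__ == "__main__":
    print("non-inverting:", first_failing_stage(propagate(non_inverting_stage, 10)))
    print("inverting:", first_failing_stage(propagate(inverting_stage, 10)))
```

With these toy numbers the non-inverting chain first drops below the threshold at stage 7 (0.9^7 ≈ 0.48), while the inverting chain never does, a trend that is qualitatively, not quantitatively, similar to the one described for Fig. 2c.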
These results suggest that the use of inverting stages brings robustness advantages. DOE pipelines could be less sensitive to variations in parameters or operating conditions, since the effect of such variations does not accumulate along the network. The next section describes the experiments carried out to compare the robustness of the Domino and DOE logic styles in pipeline operation.
IV. SIMULATION RESULTS
Several experiments have been performed to evaluate the robustness of a set of ten-stage chains, like those described in the previous section, implemented in both Domino and DOE. A very conservative operating frequency has been calculated for each benchmark pipeline. Then, the behavior of each pipeline at that frequency is analyzed by simulating the SS (slow-slow) process corner and by determining the minimum supply voltage V_DD at which it operates correctly.
Benchmark pipelines. Gate-level pipelines built from three types of gates (NOR/OR-8, NOR/OR-16 and NAND/AND-2) have been analyzed, each with four different sizes of the keeper and precharge transistors, so that different behaviors of the dynamic node (NOD) are obtained. The dynamic stages, the keeper transistors, and the feedback and output inverters have been identically sized in the Domino and DOE counterparts. The NAND sizing and the clock delay in DOE (Δ_CLK) are selected so that the gate delays of DOE and Domino are comparable. Gates are connected such that input changes propagate through the circuit, and each gate is excited with the worst-case input combination. Simulations have been carried out in a commercial 1.2 V, 130 nm technology for each gate type.
Estimation of conservative frequency. Five conservative timing constraints, involving the clock period and the precharge and evaluation delays of the dynamic and output nodes, have been derived to guarantee correct operation of the pipeline. For each pipeline, the maximum frequency that fulfills all the constraints listed below is obtained.
• RST 1: One third of the period must be larger than the evaluation delay of the output node, to ensure that the input to the next stage is available when its evaluation phase starts.
$T/3 > \mathrm{OUT}_E$ (output node evaluation delay)
• RST 2: One third of the period must be larger than the evaluation delay of the dynamic node, ensuring that the dynamic node is sufficiently discharged before the evaluation of the next stage starts.
$T/3 > \mathrm{NOD}_E$ (dynamic node evaluation delay)
• RST 3: Half of the period must be larger than the precharge delay of the output node, to guarantee that the output node is fully precharged.
$T/2 > \mathrm{OUT}_P$ (output node precharge delay)
• RST 4: Half of the period must be larger than the precharge delay of the dynamic node, to guarantee that the dynamic node is fully precharged.
$T/2 > \mathrm{NOD}_P$ (dynamic node precharge delay)
• RST 5: One sixth of the period plus the output node precharge delay (of the previous stage) must be larger than the evaluation delay of the dynamic node, to guarantee that evaluation is not interrupted by the precharge of the previous stage.
$T/6 + \mathrm{OUT}_P > \mathrm{NOD}_E$
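The five constraints follow from the clock scheme if one assumes three phases with a 50% duty cycle, each shifted by T/3 from the previous one (an assumption consistent with the fractions above): the next stage starts evaluating T/3 after the current one (RST 1, RST 2), precharge lasts T/2 (RST 3, RST 4), and the previous stage enters precharge T/2 - T/3 = T/6 after the current stage starts evaluating, after which its output still needs OUT_P to fall (RST 5). The Python sketch below, with hypothetical names and illustrative delay values (not those of Table I), derives these fractions from the phase model and returns the resulting lower bound on T.

```python
from dataclasses import dataclass

N_PHASES = 3  # three overlapped clock phases, as in the example of Fig. 2a
DUTY = 0.5    # assumption: each phase evaluates for half of the period


@dataclass
class GateDelays:
    out_e: float  # OUT_E, output node evaluation delay (s)
    nod_e: float  # NOD_E, dynamic node evaluation delay (s)
    out_p: float  # OUT_P, output node precharge delay (s)
    nod_p: float  # NOD_P, dynamic node precharge delay (s)


def min_period(d: GateDelays) -> float:
    """Lower bound on the clock period T implied by RST 1-5 (T must exceed it)."""
    shift = 1.0 / N_PHASES   # offset between consecutive evaluation starts: T/3
    precharge = 1.0 - DUTY   # precharge window: T/2
    overlap = DUTY - shift   # previous stage keeps evaluating this long: T/6
    bounds = [
        d.out_e / shift,                        # RST 1: T/3 > OUT_E
        d.nod_e / shift,                        # RST 2: T/3 > NOD_E
        d.out_p / precharge,                    # RST 3: T/2 > OUT_P
        d.nod_p / precharge,                    # RST 4: T/2 > NOD_P
        max(d.nod_e - d.out_p, 0.0) / overlap,  # RST 5: T/6 + OUT_P > NOD_E
    ]
    return max(bounds)


def conservative_frequency(d: GateDelays) -> float:
    """Supremum of the clock frequencies (Hz) that satisfy all five constraints."""
    return 1.0 / min_period(d)


if __name__ == "__main__":
    # Illustrative delays only.
    d = GateDelays(out_e=60e-12, nod_e=50e-12, out_p=40e-12, nod_p=70e-12)
    print(f"T > {min_period(d) * 1e12:.1f} ps, "
          f"f < {conservative_frequency(d) / 1e9:.2f} GHz")
```

For the illustrative delays, RST 1 dominates and the bound is T > 180 ps (f below about 5.56 GHz); a gate with a slow dynamic node and a fast output precharge would instead be limited by RST 5.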
To calculate the conservative frequencies, gate delays have been characterized under the same load conditions and input combinations as when the gates operate in the pipelines. Four delays have been measured according to the following criteria:
• Evaluation (E): measured from the 50% point of the clock rising edge to the 10% point of the dynamic node falling edge (NOD_E) and to the 90% point of the output rising edge (OUT_E).
• Precharge (P): measured from the 50% point of the clock falling edge to the 90% point of the dynamic node rising edge (NOD_P) and to the 10% point of the output falling edge (OUT_P).
Table I reports these delays for the 12 Domino gates and their DOE counterparts from which the pipelines are built. Keeper and precharge transistors are K_P and K_PRE times the minimum size of the technology, respectively. As expected, similar results for the dynamic node are obtained in both counterparts. Larger values of K_P make the dynamic node discharge more slowly, whereas larger values of K_PRE speed up the dynamic node precharge. A wide range of dynamic node behaviors is covered (evaluation delays from 32 ps to 104 ps and precharge delays from 37 ps to 106 ps). The output node delays differ between the two styles: in DOE they are not directly linked to the keeper and precharge transistor sizes, since they depend on Δ_CLK and on the delays of the static NAND and the output inverter. In spite of these differences, the delays of DOE and Domino counterparts are comparable owing to the design criteria explained above.
Results. Table II summarizes the results of the two experiments carried out on the pipelines. Concerning the minimum V_DD at which the chains are operative at their conservative frequency, DOE can be operated at a lower V_DD in 9 of the 12 pipelines. Only for one benchmark can Domino operate at a lower V_DD than DOE.
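The evaluation and precharge delay definitions given earlier in this section can be applied to sampled simulation waveforms by threshold-crossing detection. The Python sketch below is one possible post-processing step, assuming arrays of time and voltage samples (for instance exported from a SPICE transient run), linear interpolation between samples, a clock swinging between 0 and V_DD, and reading "the 10%/90% point of an edge" as the crossing of 10%/90% of V_DD; all function names are hypothetical.

```python
from typing import Sequence, Tuple


def crossing_time(t: Sequence[float], v: Sequence[float], level: float,
                  rising: bool, after: float = float("-inf")) -> float:
    """Time of the first crossing of `level` at or after `after` (linear interpolation)."""
    for i in range(1, len(t)):
        v0, v1 = v[i - 1], v[i]
        hit = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if hit:
            tc = t[i - 1] + (level - v0) * (t[i] - t[i - 1]) / (v1 - v0)
            if tc >= after:
                return tc
    raise ValueError("requested crossing not found")


def evaluation_delays(t, clk, nod, out, vdd) -> Tuple[float, float]:
    """(NOD_E, OUT_E): clock rising 50% -> dynamic node falling 10% / output rising 90%."""
    t0 = crossing_time(t, clk, 0.5 * vdd, rising=True)
    nod_e = crossing_time(t, nod, 0.1 * vdd, rising=False, after=t0) - t0
    out_e = crossing_time(t, out, 0.9 * vdd, rising=True, after=t0) - t0
    return nod_e, out_e


def precharge_delays(t, clk, nod, out, vdd) -> Tuple[float, float]:
    """(NOD_P, OUT_P): clock falling 50% -> dynamic node rising 90% / output falling 10%."""
    t0 = crossing_time(t, clk, 0.5 * vdd, rising=False)
    nod_p = crossing_time(t, nod, 0.9 * vdd, rising=True, after=t0) - t0
    out_p = crossing_time(t, out, 0.1 * vdd, rising=False, after=t0) - t0
    return nod_p, out_p


if __name__ == "__main__":
    # Synthetic piecewise-linear waveforms (time in arbitrary units, V_DD = 1.2 V).
    t = [0.0, 1.0, 2.0, 3.0, 4.0]
    clk = [0.0, 0.0, 1.2, 1.2, 1.2]
    nod = [1.2, 1.2, 1.2, 0.0, 0.0]
    out = [0.0, 0.0, 0.0, 0.0, 1.2]
    print(evaluation_delays(t, clk, nod, out, 1.2))
```

Passing `after` keeps the search from picking up an edge belonging to an earlier clock cycle when the waveform spans several periods.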
The second experiment is a corner analysis. DOE passes the SS corner for all the chains, whereas Domino passes for only 3 of the 12, two of which are built from the simplest gates (AND-2). No Domino pipeline built from the most complex gates (OR-16) passes the SS corner.
These results support our suggestion in Section III that DOE pipelines are less sensitive than Domino to variations in parameters or operating conditions. However, further analysis is required to show that the improvements are due not to the gate itself but to the way the gates behave when cascaded. To this end, the pipelines have been simulated with a V_DD 100 mV below their measured minimum V_DD, and the first failing stage has been identified (STG_VDD in Table II). The same identification has been carried out for the Domino pipelines that do not pass the SS corner (STG_SS). Domino failures appear at different stages, whereas in DOE the failing stage is the second one (the first to receive non-ideal inputs). This confirms that Domino failures are, in most cases, due to its accumulative effect, while DOE failures are due to gate delay degradation. Parameter or V_DD variations that do not degrade gate delays enough to make a DOE pipeline fail are nevertheless not tolerated by Domino pipelines after several stages, because their effect accumulates.
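The text does not state how the minimum operating V_DD of each pipeline was found. One common approach, sketched below in Python under that assumption, is a bisection over V_DD around a pass/fail check. Here `passes` is a hypothetical callback that would run the pipeline simulation at its conservative frequency and compare every stage output with the expected values; the search assumes that a pipeline working at some V_DD also works at any higher V_DD. The helper `stress_vdd` then gives the supply used to locate the first failing stage, 100 mV below the minimum.

```python
from typing import Callable


def min_operating_vdd(passes: Callable[[float], bool],
                      v_low: float, v_high: float, tol: float = 1e-3) -> float:
    """Bisection for the lowest supply voltage (V) at which `passes` returns True.

    Assumes monotonic behavior: failure at v_low, correct operation at v_high.
    The returned voltage passes and lies within `tol` of the true threshold.
    """
    if passes(v_low) or not passes(v_high):
        raise ValueError("expected failure at v_low and correct operation at v_high")
    while v_high - v_low > tol:
        v_mid = 0.5 * (v_low + v_high)
        if passes(v_mid):
            v_high = v_mid  # still works: the threshold is at or below v_mid
        else:
            v_low = v_mid   # fails: the threshold is above v_mid
    return v_high


def stress_vdd(v_min: float, margin: float = 0.100) -> float:
    """Supply voltage for locating the first failing stage: 100 mV below the minimum."""
    return v_min - margin


if __name__ == "__main__":
    # Stand-in for a real simulation: pretend the pipeline works for V_DD >= 0.83 V.
    def toy_passes(vdd: float) -> bool:
        return vdd >= 0.83

    v_min = min_operating_vdd(toy_passes, 0.5, 1.2)
    print(f"minimum V_DD ~ {v_min:.3f} V, stress point {stress_vdd(v_min):.3f} V")
```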
V. CONCLUSIONS
We have analyzed and compared the operation of gate-level pipelines implemented with Domino and with DOE gates. We have shown that, owing to the non-inverting nature of Domino, a non-ideal logic one degrades as it propagates through a Domino network, eventually leading to a functional failure. DOE, being inverting, does not exhibit this behavior. Our experiments indicate that this translates into a larger tolerance of DOE pipelines to variations.
