Improving robustness of dynamic logic based pipelines by Quintero Álvarez, Héctor Javier et al.
Improving robustness of dynamic logic based 
pipelines 
 
Héctor J. Quintero, María J. Avedillo and Juan Núñez 
Instituto de Microelectrónica de Sevilla, IMSE-CNM (CSIC/Universidad de Sevilla) 
Av. Américo Vespucio s/n 41092, Seville (Spain) 
Emails: {quintero, avedillo, jnunez}@imse-cnm.csic.es 
 
Abstract— Domino dynamic circuits are widely used in 
critical parts of high-performance systems. In this paper we show
that, in addition to the functional limitation associated with the
non-inverting behavior of Domino gates, there are also robustness
disadvantages when compared to inverting dynamic gates. We
analyze and compare the tolerance to variations in parameters and
operating conditions of gate-level pipelines implemented with
Domino and with DOE, an inverting dynamic gate we have
recently proposed. Our experiments confirm that DOE pipelines
are more robust and that the improvements are due to its
inverting behavior.
Keywords— Nanopipeline, Dynamic logic, Robust design 
techniques. 
I. INTRODUCTION 
The design of functional units implementing very fine-grained
pipelining for high-performance applications is currently an
area of active research. These solutions do not apply
conventional pipeline techniques, which insert flip-flops to
shorten signal propagation paths in combinational logic,
but instead rely on logic circuit styles that naturally exhibit
the capacity to block data propagation. Thus, they are well
suited to implementing pipeline architectures without memory
elements. The potential of dynamic logic, with its precharge and
evaluation phases, to implement this kind of pipelining was
recognized long ago. In [1], the operation of the well-known
dynamic-based Domino logic in a pipelined fashion,
using an overlapping multi-phase clock scheme and without
latches between consecutive clock phases, was analyzed in
depth. Many variations of this multi-phase
solution have been developed, achieving high performance [2]-[8],
and some of them have been applied to speed up critical
parts of commercial microprocessors. In particular,
architectures with a single gate per clock phase (nanopipelines)
have been proposed and have demonstrated high operating
frequency and throughput [4],[5],[7],[8].
In spite of their speed advantages, it is well known that
Domino dynamic gates have the limitation that only non-inverting
blocks can be chained (a static inverter is added between each
two dynamic stages to guarantee that all inputs to the next
logic block are set to 0 after the precharge period).
The functional limitation is not the only penalty related to the
non-inverting behavior of Domino gates. We have observed
that the operating frequency of Domino nanopipelines is not
independent of the number of pipeline stages, as it should be.
This behavior arises from the fact that, in order to produce a
logic one, a non-inverting gate requires one or more of its
inputs to be also at logic one. As a consequence, non-ideal
logic ones get worse as they propagate through the logic
network, eventually leading to a functional failure. Non-ideal
logic ones can result from variations in parameters or operating
conditions. In this paper, the use of inverting
Domino-like gates to improve circuit robustness is explored.
The paper is organized as follows. In Section II, gate
topologies for both the conventional non-inverting Domino
and the recently proposed inverting DOE logic style [9] are
described. In Section III, the operation of pipelines built from
them is analyzed and the implications of the non-inverting
behavior of Domino are illustrated. In Section IV, DOE and
Domino nanopipelines are evaluated through simulation
experiments and compared in terms of robustness. Finally,
some conclusions are given in Section V.
 
II. DOMINO AND DOE GATE TOPOLOGY 
A. Domino Topology 
A conventional dynamic gate (or Domino gate) is shown in
Figure 1a and consists of a dynamic stage and a static output
stage. A keeper transistor (MK) is added to protect the dynamic
node against leakage and noise. The gate operates in two phases called
precharge (CLK = 0) and evaluation (CLK = 1). During the
precharge phase, the dynamic node is precharged to VDD
through MPREC (and, thus, the output is discharged), whereas
during the evaluation phase, the pull-down network (PDN)
and the footer transistor MFOOTER discharge the dynamic node
for input combinations that turn the PDN on.
The sizing of three elements, the PDN, the keeper
transistor and the PMOS transistor of the output stage, determines the
evaluation delay in the conventional gate.
B. DOE Topology 
Recently, we have proposed a new topology called
Delayed Output Evaluation (DOE) [9], which modifies the
static output stage of the conventional Domino gate. Figure 1b
depicts the schematic of the DOE topology, in which the static
inverter of the output stage of the Domino gate has been
replaced by a static NAND gate followed by a static inverter. Note
that the inputs of the NAND are the dynamic node and a delayed
version of the clock, V^D_CLK, whose rising edge is delayed with
respect to the rising edge of VCLK by ΔCLK, while, ideally, both
falling edges are simultaneous (see Fig. 1b). For V^D_CLK = 0,
VNAND is pulled up independently of VDYN. The static inverter is
added to guarantee that the precharge value of the gate output
(VOUT) is low, as in Domino logic. For V^D_CLK = 1, the NAND
gate evaluates its input. For input combinations that
discharge the dynamic node, the PDN is on, VNAND stays high and
the gate output remains low. For input combinations that do not
discharge the dynamic node, the NAND output node is pulled
down and VOUT is pulled up.
978-1-4673-7228-2/15/$31.00 ©2015 IEEE
DOE topologies implement inverting functions since,
as described above, a logic
one is obtained at the output when an input combination that does not
discharge the dynamic node is applied. Clearly, this is an
advantage over conventional Domino gates in terms of
logic flexibility.
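As an illustration of the inverting behavior, the following minimal sketch models both output stages at the logic level only. It assumes ideal 0/1 signals and an OR-type PDN. The function names (pdn_or, dynamic_node, domino_out, doe_out) are hypothetical, and timing, the keeper and all analog effects are ignored.

```python
def pdn_or(inputs):
    """Assumed OR-type pull-down network: conducts if any input is 1."""
    return any(inputs)


def dynamic_node(inputs, clk):
    """Ideal dynamic node: high in precharge (clk = 0), discharged in
    evaluation (clk = 1) only when the PDN conducts."""
    if clk == 0 or not pdn_or(inputs):
        return 1
    return 0


def domino_out(inputs, clk):
    """Domino output stage: static inverter on the dynamic node."""
    return 1 - dynamic_node(inputs, clk)


def doe_out(inputs, clk, dclk):
    """DOE output stage: NAND of dynamic node and delayed clock,
    followed by a static inverter."""
    v_nand = 1 - (dynamic_node(inputs, clk) & dclk)
    return 1 - v_nand


# Evaluation with the delayed clock already high.
domino_eval = [domino_out(x, 1) for x in ([0, 0], [0, 1], [1, 1])]
doe_eval = [doe_out(x, 1, 1) for x in ([0, 0], [0, 1], [1, 1])]
# Precharge: both outputs are low regardless of the inputs.
domino_pre = domino_out([1, 1], 0)
doe_pre = doe_out([1, 1], 0, 0)
# Evaluation before the delayed clock rises: DOE output still low.
doe_waiting = doe_out([0, 0], 1, 0)
```

With the same PDN, the Domino model behaves as an OR gate and the DOE model as a NOR gate, while both outputs stay low during precharge.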
Unlike in the conventional gate, in DOE the evaluation delay is
determined by the speed of the NAND-INV static stage and by
the amount by which evaluation of the NAND is delayed. Gate
delay is, to some extent, independent of how fast the dynamic node
discharges. As a result, the achieved delay-noise tolerance trade-off
is significantly better than in Domino gates, as we showed
in [9]. The design of a Kogge-Stone adder using DOE
gates is also reported in [9] to validate their capability to
build logic networks.
III. ANALYSIS OF PIPELINE OPERATION 
Figure 2a shows the block diagram of the target gate-level
pipeline interconnection scheme, using, in this example, three
overlapped clock phases. We have implemented it with
Domino OR gates and simulated it at different frequencies. The
output of each stage is analyzed. Above a given frequency, logic
ones start to degrade as they propagate through the chain of
gates, eventually leading to a functional failure after several
stages, as shown in Figure 2b. Note that the final output of the
Domino chain (VOUT,CONV in Fig. 2b) is zero due to the
degradation of the intermediate outputs.
This behavior can be explained on the basis of the input
combination producing a zero-to-one transition. In Domino,
being non-inverting, this output transition is associated with
input combinations discharging the dynamic node.
Discharging the dynamic node requires one or more inputs
to be at logic one. “Good” ones are required to fully
discharge the dynamic node and produce a “good” output one.
Moreover, the non-ideal behavior of consecutive stages
accumulates. A non-ideal one means that the dynamic node is
not fully discharged. This results in a faster precharge of the
dynamic node and so an even narrower logic-one output pulse.
Thus, the dynamic node of the next stage is discharged to a higher
voltage level.

Fig 1. (a) Domino gate. (b) DOE gate.

Fig 2. (a) Nanopipeline and three-phase clock scheme. (b)
Simulation results corresponding to a chain of ten 16-input Domino
OR gates. (c) Voltage level to which the dynamic node of each stage
discharges.

In contrast, for pipeline stages implementing inverting
functions, the zero-to-one output transition occurs for input
combinations that do not discharge the dynamic node. Thus,
the quality of the output logic one does not depend on the
quality of the input logic ones.
To illustrate the differences, we have simulated ten-stage
pipelines implemented with both logic styles. The
voltage level to which the dynamic node of each stage
discharges has been measured at different frequencies.
Simulation results are summarized in Fig. 2c, which plots this
voltage level versus stage number for Domino and DOE
networks. In Domino, the discharge of the
dynamic node is progressively degraded in consecutive stages
and the complete chain does not operate correctly. DOE
behavior is completely different. The minimum voltage level
increases slightly from the first stage to the second one due to
non-ideal inputs but then remains constant. The accumulative
effect observed in Domino is avoided. The DOE circuits work up to
a given frequency, and all stages fail if the frequency is further
increased. That is, it is not the accumulative effect that
causes circuit failure.
These results suggest robustness advantages
related to the use of inverting stages. DOE pipelines could be
less sensitive to variations in parameters or operating conditions,
since their effect does not accumulate along the network. The next
section describes the experiments carried out to compare
the robustness of the Domino and DOE logic styles for pipeline
operation.
IV. SIMULATION RESULTS 
Several experiments have been performed to
evaluate the robustness of a set of ten-stage chains like those
described in the previous section, implemented in both Domino
and DOE. A very conservative operating frequency has been
calculated for each benchmark pipeline. Their behavior at that
frequency is then analyzed by simulating the SS corner and by
determining the minimum supply voltage at which they operate.
Benchmark pipelines.- Gate-level pipelines built from three
types of gates (NOR/OR-8, NOR/OR-16 and NAND/AND-2)
have been analyzed, each one with four different sizes of the
keeper and the precharge transistor, so that different
behaviors of the dynamic node (NOD) are obtained. The dynamic
stages, as well as the keeper transistor and the feedback and output
inverters, have been identically sized in the Domino and DOE
counterparts. The NAND sizing and the amount by which the
clock is delayed in DOE (ΔCLK) are selected such that the gate
delays of DOE and Domino are comparable. Gates are
connected such that input changes propagate through the
circuit and each gate is excited with the worst-case input
combination. Simulations have been carried out in a
commercial 1.2 V, 130 nm technology for each gate type.
Estimation of conservative frequency.- Five conservative
timing constraints have been derived to guarantee
correct operation of the pipeline, involving both the period (T)
and the precharge and evaluation delays of the dynamic and
output nodes. The maximum frequency that fulfills all the
constraints described below is obtained for each pipeline.
• RST 1: One third of the period must be larger than the
evaluation delay of the output node to ensure
that the input to the next stage is available when its
evaluation phase starts.
T/3 > Output Node Evaluation Delay (OUTE)
• RST 2: One third of the period must be larger than the
evaluation delay of the dynamic node, thus ensuring
that the dynamic node is sufficiently discharged before
evaluation of the next stage starts.
T/3 > Dynamic Node Evaluation Delay (NODE)
• RST 3: Half of the period must be larger than the
precharge delay of the output node to guarantee that the
output node is fully precharged.
T/2 > Output Node Precharge Delay (OUTP)
• RST 4: Half of the period must be larger than the
precharge delay of the dynamic node to guarantee that
the dynamic node is fully precharged.
T/2 > Dynamic Node Precharge Delay (NODP)
• RST 5: One sixth of the period plus the precharge delay of
the output node must be larger than the evaluation delay of the
dynamic node to guarantee that evaluation is not
stopped by the precharge of the previous stage.
T/6 + Output Node Precharge Delay (OUTP) > Dynamic
Node Evaluation Delay (NODE)
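The five constraints can be combined into a single lower bound on the period. The sketch below is one possible way to do so and is not necessarily the exact procedure used to obtain the reported frequencies. The function name min_period_ps is hypothetical, delays are in ps, and the example values are those listed for the first NAND/AND-2 gate in Table I.

```python
def min_period_ps(node, oute, nodp, outp):
    """Lower bound on the clock period T (ps) implied by RST 1 to RST 5.

    The constraints are strict inequalities, so T must be chosen
    slightly above the returned value.
    """
    bounds = [
        3 * oute,           # RST 1: T/3 > OUTE
        3 * node,           # RST 2: T/3 > NODE
        2 * outp,           # RST 3: T/2 > OUTP
        2 * nodp,           # RST 4: T/2 > NODP
        6 * (node - outp),  # RST 5: T/6 + OUTP > NODE
    ]
    return max(bounds)


# Delays (ps) of the first NAND/AND-2 entry of Table I.
domino = dict(node=34.4, oute=43.98, nodp=37.7, outp=47.82)
doe = dict(node=33.53, oute=42.34, nodp=36.24, outp=53.81)

t_domino = min_period_ps(**domino)
t_doe = min_period_ps(**doe)

# Limit frequencies in GHz (1000 / period in ps) and their ratio.
f_domino_ghz = 1000.0 / t_domino
f_doe_ghz = 1000.0 / t_doe
f_norm_doe = f_doe_ghz / f_domino_ghz
```

With these values the bound is set by RST 1 for both gates (about 131.9 ps for Domino and 127.0 ps for DOE), giving a normalized DOE frequency of roughly 1.04, in line with the 1.03 listed in Table I.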
To calculate the conservative frequencies, gate delays have
been characterized using the same load conditions and input
combinations as when the gates operate in the pipelines. Four delays
have been measured according to the following criteria:
• Evaluation (E): measured from 50% of the
rising edge of the clock to 10% of the falling edge of
the dynamic node (NODE) and to 90% of the rising
edge of the output (OUTE).
• Precharge (P): measured from 50% of the
falling edge of the clock to 90% of the rising edge of
the dynamic node (NODP) and to 10% of the falling
edge of the output (OUTP).
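The measurement criteria above can be sketched as threshold crossings on sampled waveforms. This is a minimal illustration under stated assumptions: percentages are taken relative to VDD (so "10% of the falling edge" is read as the node having fallen to 0.1 VDD), crossings are located by linear interpolation, and the helper names and the synthetic waveforms are hypothetical rather than taken from the simulations.

```python
def crossing_time(times, volts, level, rising):
    """First time the waveform crosses `level` in the given direction,
    using linear interpolation between samples; None if it never does."""
    for i in range(1, len(times)):
        v0, v1 = volts[i - 1], volts[i]
        crosses = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crosses:
            frac = (level - v0) / (v1 - v0)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    return None


def evaluation_delays(times, clk, dyn, out, vdd):
    """Return (NODE, OUTE): delays from 50% of the rising clock edge to
    10% of the falling dynamic node and to 90% of the rising output."""
    t_clk = crossing_time(times, clk, 0.5 * vdd, rising=True)
    node = crossing_time(times, dyn, 0.1 * vdd, rising=False) - t_clk
    oute = crossing_time(times, out, 0.9 * vdd, rising=True) - t_clk
    return node, oute


# Synthetic piecewise-linear waveforms (times in ps, voltages in V).
VDD = 1.2
t = [0.0, 10.0, 20.0, 50.0, 60.0, 80.0]
v_clk = [0.0, VDD, VDD, VDD, VDD, VDD]   # clock rises between 0 and 10 ps
v_dyn = [VDD, VDD, VDD, 0.0, 0.0, 0.0]   # node falls between 20 and 50 ps
v_out = [0.0, 0.0, 0.0, 0.0, VDD, VDD]   # output rises between 50 and 60 ps

node_delay, oute_delay = evaluation_delays(t, v_clk, v_dyn, v_out, VDD)
```

The precharge delays (NODP, OUTP) would follow the same pattern with the falling clock edge as the reference and the crossing directions swapped.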
Table I reports these delays for the 12 Domino and DOE
gates from which the pipelines are built. Keeper and precharge
transistors are KP and KPRE times the minimum size of the
technology, respectively. As expected, similar results for the
dynamic node are obtained in both counterparts. Larger values
of KP imply that the dynamic node discharges more slowly, whereas
larger values of KPRE speed up the dynamic node precharge. It
can be observed that a wide range of dynamic node behaviors
is covered (evaluation delays from 32 ps to 104 ps and
precharge delays from 37 ps to 106 ps). There are differences in
the output node delays. In DOE they are not directly linked to
the keeper and precharge transistor sizes, since they depend on
ΔCLK and on the delays of the static NAND and the output
inverter. In spite of these differences, the delays of the DOE and
Domino counterparts are comparable due to the design criteria
explained above. Table I also includes the derived
conservative operating frequency for DOE pipelines (FN,DOE),
normalized with respect to that of the Domino counterpart.
TABLE I.  GATE DELAYS
(Delays in ps. E: evaluation, P: precharge. NOD: dynamic node, OUT: output node.)

Gate         KPRE  KP  Phase  Domino NOD  Domino OUT  DOE NOD  DOE OUT  FN,DOE
NAND/AND 2   5     1   E      34.4        43.98       33.53    42.34    1.03
                       P      37.7        47.82       36.24    53.81
             3     1   E      32.08       42.57       31.28    43.15    0.98
                       P      53.59       58.52       51.1     53.67
             3     3   E      44.17       48.42       43.19    42.96    1.12
                       P      55.99       60.53       54.2     53.85
             5     3   E      46.66       50.01       45.71    42.39    1.09
                       P      40.28       49.1        38.77    53.87
NOR/OR 8     7     5   E      61.99       61.57       55.12    58.03    1.06
                       P      57.84       33.84       53.11    51.43
             7     3   E      49.19       55.17       45.69    54.98    1.00
                       P      56.14       33.13       51.23    51.5
             5     5   E      60.3        60.4        56.4     55.44    1.07
                       P      73.27       41.89       67.26    51.6
             5     3   E      73.09       54.02       66.99    55.68    0.97
                       P      72.39       82.48       66.73    61.91
NOR/OR 16    10    7   E      108.4       93.72       104.1    87.53    1.08
                       P      64.59       64.91       61.19    92.68
             10    5   E      87.52       80.99       83.59    87.39    1.00
                       P      63.21       64          59.72    92.69
             5     7   E      104.2       90.3        99.81    87.73    1.04
                       P      106.4       97.1        101.2    92.56
             7     5   E      85.12       79.27       81.2     87.49    1.08
                       P      82.25       77.82       77.9     92.54
 
Results.- Table II summarizes the results of the two
experiments carried out on the pipelines. Concerning the
minimal VDD at which the chains are operative at their
conservative frequency, in 9 of the 12 pipelines DOE can be
operated at a lower VDD. For only one benchmark can Domino
operate at a lower VDD than DOE.
The second experiment consists of a corner analysis. It
can be observed that DOE passes the SS corner for all the
chains. Domino passes for 3 of the 12, two of which are built
from the simplest gates (AND2). No Domino pipeline built
from the more complex gates (OR16) passes the SS corner.
These results support the claim that DOE pipelines are less sensitive
than Domino to variations in parameters or operating conditions,
as we suggested in Section III. However, further analysis is
required to show that the improvements are not due to the gate
itself but to advantages in cascading. The pipelines have
been simulated with VDD 100 mV below their measured
minimal VDD, and the first failing stage has been identified
(STGVDD in Table II). The same has been done at the SS corner
for the Domino pipelines that do not pass it
(STGSS). Domino failures appear at different stages, while in DOE
it is always the second stage (the first one driven by non-ideal
outputs of a previous stage) that fails. This confirms that Domino
failures are due, in most cases, to the accumulative effect. In
contrast, DOE failures are due to gate delay degradation. Parameter or
VDD variations that do not degrade gate delays enough to
make DOE fail are not tolerated in Domino pipelines
after several stages because their effects accumulate.
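TABLE II.  MINIMAL VDD AND CORNERS ANALYSIS
(SS: result at the SS corner. STGVDD, STGSS: first failing stage; "-" where the SS corner is passed.)

#   Building     Domino         Domino         Domino     Domino        DOE         DOE         DOE
    block        VDD(V)         STGVDD         SS         STGSS         VDD(V)      STGVDD      SS
1   NAND/AND 2   1              9              pass       -             0.9         2           pass
2                1.1            8              fail       4             1.1         2           pass
3                1              2              pass       -             1.1         2           pass
4                1              4              fail       8             0.9         2           pass
5   NOR/OR 8     1              4              fail       4             0.9         2           pass
6                1              5              fail       8             0.9         2           pass
7                1              5              fail       5             0.9         2           pass
8                0.9            3              pass       -             0.9         2           pass
9   NOR/OR 16    1.1            2              fail       3             0.8         2           pass
10               1              3              fail       3             0.8         2           pass
11               1              6              fail       6             0.8         2           pass
12               1              2              fail       7             0.8         2           pass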
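
The counts quoted in the Results paragraph can be checked directly against Table II. The short tally below is only a bookkeeping aid. It assumes that the SS column is read as pass/fail, a reading chosen to be consistent with the counts stated in the text, and the tuple layout is a hypothetical convenience.

```python
# One tuple per pipeline of Table II:
# (Domino minimal VDD, Domino passes SS, DOE minimal VDD, DOE passes SS)
rows = [
    (1.0, True, 0.9, True),    # 1  NAND/AND 2
    (1.1, False, 1.1, True),   # 2
    (1.0, True, 1.1, True),    # 3
    (1.0, False, 0.9, True),   # 4
    (1.0, False, 0.9, True),   # 5  NOR/OR 8
    (1.0, False, 0.9, True),   # 6
    (1.0, False, 0.9, True),   # 7
    (0.9, True, 0.9, True),    # 8
    (1.1, False, 0.8, True),   # 9  NOR/OR 16
    (1.0, False, 0.8, True),   # 10
    (1.0, False, 0.8, True),   # 11
    (1.0, False, 0.8, True),   # 12
]

doe_lower_vdd = sum(doe < dom for dom, _, doe, _ in rows)
domino_lower_vdd = sum(dom < doe for dom, _, doe, _ in rows)
same_vdd = sum(dom == doe for dom, _, doe, _ in rows)
domino_ss_pass = sum(ss for _, ss, _, _ in rows)
doe_ss_pass = sum(ss for _, _, _, ss in rows)
```

Under this reading, DOE reaches a lower minimal VDD in 9 pipelines, Domino in 1, and the two remaining pipelines tie. Domino passes the SS corner in 3 pipelines and DOE in all 12.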
 
V. CONCLUSIONS 
We have analyzed and compared the operation of gate-level
pipelines implemented with Domino and with DOE
gates. We have shown that, because Domino is non-inverting,
a non-ideal one degrades as it
propagates through the Domino network, eventually leading to
a functional failure. DOE,
being inverting, does not exhibit this behavior. Our
experiments support that this translates into greater tolerance of
DOE pipelines to variations.
ACKNOWLEDGMENT 
This work has been funded by Ministerio de Economía y 
Competitividad del Gobierno de España with support from 
FEDER under Project TEC2013-40670-P. 
 
REFERENCES 
[1] D. Harris and M.A. Horowitz, "Skew-tolerant domino circuits", IEEE J.
of Solid-State Circuits, vol. 32, no. 11, pp. 1702-1711, Nov. 1997.
[2] R. Hossain, "High Performance ASIC Design", Cambridge, 2008.
[3] S. Horne, D. Glowka, S. McMahon, P. Nixon, M. Seningen and G.
Vijayan, "Fast14 Technology: design technology for the automation of
multi-gigahertz digital logic", International Conference on Integrated
Circuit Design and Technology, pp. 165-173, 2004.
[4] W. Belluomini, D. Jamsek, A. Martin, C. McDowell, R. Montoye, T. et
al., "An 8 GHz floating point multiply", IEEE International Solid-State
Circuits Conference, pp. 374-604, 2005.
[5] J. Sivagnaname, H.C. Ngo, K.J. Nowka, R.K. Montoye and R.B.
Brown, "Wide limited switch dynamic logic circuit implementations",
IEEE International Conference on VLSI Design, 2006.
[6] R.J. Sung and D.G. Elliot, "Clock-logic domino circuits for high-speed
and energy-efficient microprocessor pipelines", IEEE Trans. on Circuits
and Systems II: Express Briefs, vol. 54, no. 5, pp. 460-464, 2007.
[7] C.K. Jerry, W.-H. Ma, S. Kim and M. Papaefthymiou, "2.07 GHz
floating-point unit with resonant-clock precharge logic", IEEE Asian
Solid State Circuits Conference (A-SSCC), pp. 1-4, Nov. 2010.
[8] Z. Owda, Y. Tsiatoushas and T. Haniotakis, "High Performance and
Low Power Dynamic Circuit Design", IEEE New Circuits and System
Conference, pp. 502-505, 2011.
[9] J. Núñez, M.J. Avedillo, J.M. Quintana and H.J. Quintero, "Novel
Dynamic Gate Topology for Superpipelines in DSM Technologies",
Proceedings Digital System Design 2013, pp. 280-28, 2013.
 
