is fully racefree, yet has high logic flexibility. 
I. INTRODUCTION
In the conventional CMOS technique there is an inherent redundancy of information.
For each n-type device there is a corresponding p-type device. In fact, a complete logic function is built with the n devices and repeated with the p devices. As a consequence of this approach, substantial amounts of silicon are wasted, especially for complex logic. Also, power dissipation and speed are degraded by the extra area and extra transistors.
Another important problem of CMOS technique is clock races in pipelined circuits.
To latch the information between two pipelined sections, transmission gates are usually employed. In CMOS logic, these transmission gates are generally implemented with p-n gates in parallel, controlled by clocks φ and φ̄, as shown in Fig. 1. The use of single gates (p- or n-type) is to be avoided in CMOS due to power dissipation and low noise margin as a result of clock feedthrough and bulk effect. CMOS p-n transmission gates, controlled by clocks φ and φ̄, suffer from signal races. As depicted in Fig. 1, this results from the unavoidable overlap of the clock phases during the clock transitions. During the phase overlaps, all the transmission gates are switched on, which may cause an illegal flow of information, depending on the ratio between the gate delay and the clock skew. This race problem is usually bypassed by careful synchronization of the two clock phases to within a small fraction of the gate delay (a few nanoseconds). This clock skew control is extremely difficult, especially for high-speed technologies, for unmatched clock loads, or for distributed-clock VLSI circuits [1]. This leads to highly critical and untestable designs. A possible solution to the clock race is the use of four clock phases, which, however, requires too much silicon area.
Manuscript received November 8, 1982; revised January 18, 1983. This work was supported in part by Fundação de Amparo à Pesquisa do Estado de São Paulo, Brazil.
The authors are with the Department Elektrotechniek-ESAT, Katholieke Universiteit Leuven, B 3030-Heverlee, Belgium.
To overcome the redundancy of information in the conventional [...] to a high level while the current path to ground is turned off.
Then, for phase φ = 1, the path to the high level is turned off by the clock and the path to ground is turned on. Therefore, depending on the state of the inputs, the output node will either float at the high level or be pulled down. A clear advantage of this CMOS dynamic block is the reduced silicon area. Whereas there are 2n transistors in a conventional n-input CMOS gate, the dynamic configuration needs only n + 2. Also, due to the smaller area and consequently smaller capacitances, power dissipation and speed are, in principle, improved by the dynamic approach.
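The transistor counts quoted above can be tabulated with a short sketch. The function names are hypothetical and introduced only for illustration; the counts themselves (2n for the conventional gate, n + 2 for the dynamic block) come from the text.

```python
def static_cmos_transistors(n_inputs: int) -> int:
    """Conventional CMOS gate: one n-device and one p-device per input."""
    return 2 * n_inputs


def dynamic_cmos_transistors(n_inputs: int) -> int:
    """Dynamic block: n logic devices plus the two clocked devices that
    switch the path to the high level and the path to ground."""
    return n_inputs + 2


# Print the two counts side by side for a few gate sizes.
for n in (2, 4, 8):
    print(n, static_cmos_transistors(n), dynamic_cmos_transistors(n))
```

Under these formulas the two styles break even at n = 2, and the saving grows with the number of inputs, which is consistent with the remark that the waste is worst for complex logic.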
A strong limitation of this dynamic structure is the impossibility of cascading the logic blocks for implementing complex logic. Consider, for instance, the circuit in Fig. 3. During the precharge phase, nodes N1 and N2 are set up to the high level "1." In the evaluation phase (φ = 1), internal delay in block 1, associated with a "1" → "0" transition of node N1, can cause an incorrect discharge of node N2. This occurs because, during the evaluation phase and while node N1 is still "1," there is a direct path between node N2 and ground. When this path is eliminated by the effective transition of node N1 to "0," the precharge information of node N2 could already be gone. We define such a race as the "internal delay problem."
In the Domino technique, Krambeck et al. [4] have solved the internal race by placing a static inverter after every dynamic block, as indicated in Fig. 4 . During the precharge phase, the outputs of all the static inverters are set up to a low level. Consequently, all the n-type transistors driven by these inputs are set up to an OFF condition. Now, during the evaluation phase, internal delays cannot incorrectly discharge the dynamic storage nodes since during the entire delay period the path to ground is turned off.
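The internal delay problem and the Domino remedy can be illustrated with a small discrete-time sketch. It is a behavioral toy model, not a circuit simulation: each block is assumed to be a single-input n-type dynamic inverter, time advances in abstract steps, and the function name and parameters are hypothetical.

```python
def evaluate_chain(a: int, delay_steps: int, domino: bool, total_steps: int = 10):
    """Evaluation phase of two cascaded n-type dynamic inverters.

    a           -- section input of block 1 (already set up during precharge)
    delay_steps -- internal delay of block 1 before node N1 can discharge
    domino      -- if True, a static inverter sits between N1 and block 2
    Returns the final levels of the storage nodes (N1, N2).
    """
    n1, n2 = 1, 1  # both storage nodes were precharged to "1"
    for step in range(total_steps):
        # Block 1 pulls N1 low only after its internal delay has elapsed.
        if a == 1 and step >= delay_steps:
            n1 = 0
        # Block 2 is driven by N1 directly, or by its complement (Domino).
        in2 = (1 - n1) if domino else n1
        # A dynamic node can be discharged but never recharged during evaluation.
        if in2 == 1:
            n2 = 0
    return n1, n2


print(evaluate_chain(a=1, delay_steps=0, domino=False))  # ideal, zero delay
print(evaluate_chain(a=1, delay_steps=3, domino=False))  # N2 lost during the delay
print(evaluate_chain(a=1, delay_steps=3, domino=True))   # delay no longer matters
```

In the direct connection the final level of N2 depends on the delay of block 1, which is the race described above. With the inverter in place, block 2 sees a "0" until N1 has actually switched, so the result is the same for any delay.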
Logic composition rules to combine these functions, preserving the racefree properties, will be presented in Section III. As will further be shown, to guarantee fully racefree operation in pipelined circuits, the storage of information must always be performed by a C2MOS function block (C2MOS latch stage). In a previous paper [6], the NORA (NO RAce) technique was called n-p-CMOS, due to the possible employment of n- and p-dynamic blocks. We decided to change the name because the p-dynamic block is not essential to the racefree principle; it is only used to increase the logic flexibility.
The pipelined circuit in Fig. 5 is defined as a φ-section. For phase φ = 0, φ̄ = 1, the φ-section is in the precharge phase. The outputs of all the n- and p-dynamic blocks are precharged to "1" and "0," respectively. Also during this phase, the φ-section inputs are in a sampling mode, i.e., these inputs are set up.
For phase φ = 1, φ̄ = 0, the φ-section is in the evaluation phase. The φ-section inputs are held constant, and the outputs of all the dynamic blocks are evaluated as a function of the φ-section inputs and of the internal inputs.¹ From these output results, those which must be transferred to the next pipelined section are stored in C2MOS latch stages.
In the circuit of [...]
1) When direct coupling between dynamic blocks is desired, the logic function is implemented by alternating p- and n-logic blocks. If the inverter is required, a Domino-like connection is employed, i.e., sequences of the same block type are used (n-n or p-p).
2) n-p as well as p-n sequences are possible and the sequences can be of arbitrary logical depth. Therefore, many logic levels can be operated in only half a clock period.
By interchanging φ and φ̄ in the circuit of [...]
¹For convenience, the inputs of a dynamic block have been separated into section inputs and internal inputs. The section inputs are set up during the precharge phase.
III. COMPOSITION RULES
In this section the racefree properties of the NORA technique will be carefully analyzed. Logic composition rules to combine dynamic, conventional, and C2MOS function blocks will also be derived.
A. Internal Delay Racefree Property
The internal delay racefree property is defined as the capability of the dynamic block to keep its precharge signal during the delay time needed by the previous blocks to set up the internal inputs. It is easy to prove that a dynamic block will have the internal delay racefree property if the following conditions hold:
1) During the precharge phase, the internal inputs are set up in such a way that they cut off their corresponding transistors.
2) During the evaluation phase, the internal inputs are glitchfree, i.e., these inputs can make only one transition.
From the above conditions the following results can be derived:
a) When the number of "static" inversions between two dynamic blocks is even, complementary types of logic blocks must be used for these two blocks (n-p or p-n). For instance, in Fig. 5, this corresponds to alternating p- and n-logic blocks when direct coupling between dynamic blocks is desired.
b) The same type of dynamic blocks (n-n or p-p) must be used when the number of "static" inversions is odd. In Fig. 5
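Results a) and b) can be checked mechanically from condition 1) and the precharge levels given earlier (n-blocks precharge to "1," p-blocks to "0"). The sketch below is illustrative only; the names are hypothetical, and it assumes that an n-device is cut off by a "0" at its gate and a p-device by a "1."

```python
PRECHARGE_LEVEL = {"n": 1, "p": 0}  # output level of each block type after precharge


def next_block_type(prev_type: str, static_inversions: int) -> str:
    """Block type whose logic transistors are cut off during precharge when
    driven by a block of prev_type through the given number of static inversions."""
    level = PRECHARGE_LEVEL[prev_type]
    if static_inversions % 2 == 1:
        level = 1 - level  # an odd number of inversions flips the precharge level
    # Assumption: a "0" at the gate cuts off an n-device, a "1" cuts off a p-device.
    return "n" if level == 0 else "p"


for prev in ("n", "p"):
    for inversions in (0, 1, 2):
        print(prev, inversions, next_block_type(prev, inversions))
```

An even count always yields the complementary type and an odd count the same type, matching a) and b).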
B. Clock Racefree Properties
As indicated in Fig. 6, to have a working pipelined system the results generated during the evaluation phase must be held constant until the end of the transfer phase. The latched information should not be altered by the precharge signal or by input variations. It will now be proven that after the evaluation phase a NORA pipelined section keeps its output results in spite of high-high or low-low clock overlaps (clock skew). For simplicity, let us initially consider that all the circuits in the pipelined section are built only with dynamic blocks, the two exceptions being the C2MOS latch stage and the static inverter for connecting complementary dynamic blocks. For this circuit, two possible cases should be analyzed.
Case I-Precharge Racefree
During the evaluation phase, the dynamic block which precedes the C2MOS latch stage has its precharge signal modified by the inputs. Such a situation is indicated in Fig. 7 for an n-type and a p-type dynamic block. As indicated in Fig. 7, the alteration of the output information is controlled by only one of the phases φ or φ̄. Therefore, these outputs are not influenced by the other phase. The outputs are, for instance, completely immune to the overlap of the phases. This kind of output latch control by only one phase (φ or φ̄) is completely different from the conventional case with transmission gates, where the output latch is controlled simultaneously by the two phases φ and φ̄. In contrast to the critical clock skew specification of the conventional transmission gates (a few nanoseconds), the NORA technique imposes no restriction on the amount of clock skew. The other possible case, i.e., when the dynamic block keeps the precharge signal, is illustrated in Fig. 8.
If the dynamic block keeps the precharge signal, at least one of the logic transistors must be off. If this transistor is controlled by an internal input, the dynamic block which generates this input has also kept its precharge signal. This occurs because the internal inputs are precharged in such a way that the corresponding driven transistors are off. Therefore, there must be at least one sequence of dynamic blocks with precharge signals preserved. Fig. 9 depicts this sequence. Again, as shown in Fig. 9, the alteration of the output information is controlled by only one of the phases φ or φ̄. Therefore, the outputs are not influenced by the overlap of the phases.
For the case being analyzed, the racefree property has been derived from the interrelation between a dynamic CMOS block and a C2MOS latch stage. Let us now show that the input variation racefree property can also be derived by the action of two C2MOS stages: "A NORA pipelined circuit is input variation racefree if the total number of inversions (static and dynamic) between two C2MOS latch stages is even." The proof is indicated in Fig. 10. This racefree property can also be used to solve the clock race condition of some conventional CMOS circuits. An important circuit which can be built using the above property is the shift register.
Combining the racefree properties derived from two C2MOS functions and from C2MOS with a dynamic block, the following result can be proven.
1) Precharge racefree:
a) There is an even number of inversions between the C2MOS output stage and the last dynamic block (see Fig. 11).
2) Input variation racefree:
b1) There is a dynamic block such that there is an even number of inversions between this dynamic block and the C2MOS input stage (see Fig. 12); or
b2) the total number of inversions between the two (input, output) C2MOS stages is even (see Fig. 10).
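The parity conditions above lend themselves to a simple checker. The following is a minimal sketch under several assumptions that are not stated in the text: a pipelined section is reduced to a single path between its input and output C2MOS stages, written as a list of "D" (dynamic block) and "S" (static gate); every stage counts as exactly one inversion; conditions 1) and 2) must both hold; and a path with no dynamic block is treated as trivially precharge racefree. All names are hypothetical.

```python
def precharge_racefree(path):
    """Condition a): even number of inversions between the last dynamic
    block and the C2MOS output stage."""
    if "D" not in path:
        return True  # assumption: nothing is precharged, so nothing can be lost
    last_dynamic = len(path) - 1 - path[::-1].index("D")
    inversions_after = len(path) - 1 - last_dynamic
    return inversions_after % 2 == 0


def input_variation_racefree(path):
    """Condition b1) or b2)."""
    # b1) some dynamic block has an even number of inversions before it.
    b1 = any(stage == "D" and index % 2 == 0 for index, stage in enumerate(path))
    # b2) the total number of inversions between the two C2MOS stages is even.
    b2 = len(path) % 2 == 0
    return b1 or b2


def clock_racefree(path):
    # Assumption: both groups of conditions are required together.
    return precharge_racefree(path) and input_variation_racefree(path)


for path in ([], ["D"], ["S"], ["D", "S"], ["S", "D"]):
    print(path, clock_racefree(path))
```

The empty path corresponds to two C2MOS stages connected directly, as in the shift register mentioned above; it has zero inversions, which is even, and so satisfies b2).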
If the pipelined section does not satisfy the clock race conditions, generally, circuit modifications can be easily included.
By way of example, consider the nonracefree pipelined section indicated in Fig. 13(a). For this example, the following circuit modifications would eliminate the race condition: 1) conversion of one static function to a dynamic function [see Fig. 13 [...] stage, provided that the racefree property of the next pipelined section would not be destroyed [see Fig. 13(d)].
IV. DYNAMIC CMOS LIMITATIONS
In this section the limitations of the NORA technique will be presented. These limitations are directly related to the dynamic storage of information and, therefore, they are common to all the dynamic techniques.
A. Charge Redistribution
The output signal of the dynamic blocks relies on storage nodes. As indicated in Fig. 14, by commutation of an OFF transistor to an ON state, a charge redistribution effect may appear between the output capacitance and the parasitic logic tree capacitances.
Normally, there will be no charge redistribution between the precharged node and the logic tree nodes controlled by section inputs. This occurs because these inputs are set up during the precharge phase and, therefore, the logic tree nodes will also be precharged. Yet, some charge redistribution effect will exist if the precharge period after input setup is too short. This extra period of precharge generally does not result in a speed limitation for the pipelined system, due to the small capacitances of the logic trees.
For the internal inputs, such attenuation of the charge redistribution does not exist, since these inputs are set up only after the precharge period. In this case, the charge redistribution must be minimized by layout and by proper logic tree arrangement. The transistors driven by internal inputs must be placed as far as possible from the output storage node.
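The size of the effect can be estimated with the usual charge conservation argument for two capacitors that become connected. The sketch below is illustrative: the function name and the capacitance values are hypothetical and are not taken from the measured circuits.

```python
def voltage_after_charge_sharing(v_precharge, c_out, c_parasitic, v_parasitic=0.0):
    """Final voltage when the precharged output capacitance c_out is connected
    to a logic tree capacitance c_parasitic initially at v_parasitic.
    Total charge is conserved and both nodes settle to a common voltage."""
    total_charge = c_out * v_precharge + c_parasitic * v_parasitic
    return total_charge / (c_out + c_parasitic)


# Hypothetical values: 100 fF output node, 25 fF of discharged logic tree nodes.
print(voltage_after_charge_sharing(5.0, 100e-15, 25e-15))
# Logic tree nodes already precharged (the section input case): no loss.
print(voltage_after_charge_sharing(5.0, 100e-15, 25e-15, v_parasitic=5.0))
```

With these example values the output node drops from 5 V to about 4 V. Placing the transistors driven by internal inputs far from the output node keeps the capacitance that can be switched onto it small, which is the motivation for the layout rule above.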
B. Leakage and Noise Margin
Another limitation of the dynamic CMOS techniques is the leakage of the storage nodes. Due to clock feedthrough, power supply variation, noise, etc., the inputs of the dynamic block can be altered from the ideal zero and VDD values. Consequently, the logic transistors are driven to weak inversion. This leakage effect imposes a limit on the lowest operating frequency and on the noise margin of the circuit.
The threshold levels of the n-type and p-type devices are +1 V and -1 V, respectively. Fig. 18 shows experimental results for a very large clock skew of 150 ns at a 1 MHz clock frequency; the skew does not disturb the circuit operation.
Notice that during the evaluation phase φ = 1, φ̄ = 0, the results are obtained and then held constant until the end of the transfer phase φ = 0, φ̄ = 1. Also from experimental results, the minimum working frequency was less than 1 kHz at room temperature, indicating that the current leakage due to weak inversion is not a critical limitation.
The measured power dissipation of the serial full adder was 17 µW/MHz at a supply voltage of 5 V. The circuits were designed for a maximum operating frequency of 14 MHz, and the devices have been tested on the wafer probe up to 10 MHz. More careful measurements of speed and noise margin are under investigation and will be presented in a later publication.
This simplifies the design and greatly increases the reliability, feasibility, and testability of CMOS circuits. The NORA technique also provides very high density layouts, which compare favorably with NMOS solutions.
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
