effective channel length) is presented. The purpose of the prescaler implementation is to evaluate the potential of the E-TSPC technique.
This paper is organized as follows. In Section II, the principal features of the E-TSPC technique, namely its blocks and design rules, are presented. In Section III, several dual-modulus implementations are analyzed. Experimental results and comparisons are reported in Section IV, and the principal conclusions are drawn in Section V.
II. E-TSPC CIRCUIT BLOCKS AND COMPOSITION RULES

A. Basic CMOS Blocks
An E-TSPC circuit may use any of the following blocks: the CMOS static block, the n- and p-dynamic blocks and the n- and p-latches (Fig. 1), and the high (PH) and low (PL) data-precharged blocks (Fig. 2).
In Fig. 1, the clocked transistors of the n- and p-latches are placed close to the power rail, following the suggestion of [11]. This configuration can attain a higher speed but suffers from charge-sharing problems. Latch configurations with the clocked transistors placed close to either the power rail or the block output are both admissible.
In data-precharged blocks [10], some input signals, called precharging inputs or pc-inputs, control the output precharge (see Fig. 2). If all pc-inputs of a PH block are high, or if all pc-inputs of a PL block are low, the block is precharged: the PH block output goes low, and the PL block output goes high. In Fig. 2, the CMOS static block that executes the logic function is drawn, along with all equivalent PH and PL blocks [Fig. 2(b) and (c)]. The pc-inputs of each block are also indicated. The PH and PL blocks whose outputs are precharged when the clock is low will be called n-Dp blocks; similarly, the PH and PL blocks whose outputs are precharged when the clock is high will be called p-Dp blocks.
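The precharge conditions above can be captured in a small behavioral model. The sketch below is a hypothetical illustration, not the authors' method: signals are 0/1, timing and the clock relation that distinguishes n-Dp from p-Dp blocks are ignored, and it assumes that, outside the precharge condition, a data-precharged block computes the same function as the static block it replaces. The two-input NOR and NAND gates, and the names `dp_output` and `matches_static`, are illustrative choices only.

```python
from itertools import product
from typing import Callable, Sequence


def is_precharged(kind: str, pc_inputs: Sequence[int]) -> bool:
    """PH: precharged when all pc-inputs are high; PL: when all are low."""
    if kind == "PH":
        return all(v == 1 for v in pc_inputs)
    if kind == "PL":
        return all(v == 0 for v in pc_inputs)
    raise ValueError(f"unknown data-precharged block kind: {kind!r}")


def dp_output(kind: str, logic_fn: Callable[..., int],
              pc_inputs: Sequence[int], other_inputs: Sequence[int]) -> int:
    """Behavioral output: the precharged value, else the static logic function."""
    if is_precharged(kind, pc_inputs):
        return 0 if kind == "PH" else 1  # PH precharges low, PL precharges high
    return logic_fn(*pc_inputs, *other_inputs)


def matches_static(kind: str, logic_fn: Callable[..., int],
                   n_pc: int, n_other: int) -> bool:
    """True if the Dp model never disagrees with the static gate it replaces."""
    for pcs in product((0, 1), repeat=n_pc):
        for others in product((0, 1), repeat=n_other):
            if dp_output(kind, logic_fn, pcs, others) != logic_fn(*pcs, *others):
                return False
    return True


# Example gates (assumptions for illustration); pc-inputs are passed first.
def nor2(a: int, b: int) -> int:
    return int(not (a or b))


def nand2(a: int, b: int) -> int:
    return int(not (a and b))
```

A high pc-input already forces a NOR output low, so `matches_static("PH", nor2, 1, 1)` holds, while pairing a PH block with a NAND gate fails the check. In this purely logical model a valid PH or PL block is indistinguishable from its static gate; the differences the text relies on (fewer transistors, precharge timing) are electrical and are not modeled.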
B. Composition Rules
First, the definition of data chains, fundamental to the design rules, is given.
Definition: An n-data chain is any noncyclic signal propagation path:
1) containing at least one n-latch, one n-dynamic, or one n-Dp block;
2) starting at an external circuit input or at the output of a p-latch, p-dynamic, or p-Dp block; when this output is followed by static blocks in the normal data flow, the data chain starts at the output of the last static block;
3) going through static, n-dynamic, n-Dp, or n-latch blocks;
4) regardless of the number and ordering of the blocks defined above;
5) finishing at an external circuit output or at the input of the first p-latch, p-dynamic, or p-Dp block.
For the p-data chains, an equivalent definition applies, replacing n with p and vice versa.
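Under the definition above, a single linear path can be split into n- and p-data chains mechanically. The Python sketch below is a hypothetical simplification: real circuits have fan-out and reconvergent paths, whereas this function walks one noncyclic path from an external input to an external output. Static blocks that follow an n- or p-type block stay with that block's chain (which finishes at the input of the next opposite-type block), and static blocks at the very start of the path join the chain of the first clocked or precharged block. Labels such as "n-dynamic" or "p-Dp" are illustrative.

```python
from typing import List, Optional, Tuple

CLOCKED_KINDS = {"dynamic", "latch", "Dp"}


def polarity(block: str) -> Optional[str]:
    """Return 'n' or 'p' for dynamic, latch, and Dp blocks; None for static ones."""
    if block == "static":
        return None
    pol, _, kind = block.partition("-")
    if pol not in ("n", "p") or kind not in CLOCKED_KINDS:
        raise ValueError(f"unknown block: {block!r}")
    return pol


def split_data_chains(path: List[str]) -> List[Tuple[str, List[str]]]:
    """Split one linear path into (polarity, blocks) data chains."""
    chains: List[Tuple[str, List[str]]] = []
    leading: List[str] = []  # static blocks seen before any n- or p-type block
    for block in path:
        pol = polarity(block)
        if pol is None:
            if chains:
                chains[-1][1].append(block)  # static block stays with current chain
            else:
                leading.append(block)
        elif chains and chains[-1][0] == pol:
            chains[-1][1].append(block)      # same polarity: chain continues
        else:
            chains.append((pol, leading + [block]))  # opposite polarity: new chain
            leading = []
    return chains


example = ["static", "n-dynamic", "static", "n-latch",
           "p-latch", "static", "n-Dp", "static"]
print(split_data_chains(example))
```

A path made only of static blocks yields no data chain, since condition 1) requires at least one block of the chain's own polarity.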
When the clock is high, n-data chains are in the evaluation phase; otherwise, they are in the holding phase. P-data chains evaluate when the clock is low. In Fig. 3, part of a circuit schematic is depicted with seven complete n-data chains. Some examples are the data chain starting at input and going through blocks , , , and ; the data chain starting at and going through , , , , and ; and the data chain starting at and going through , , , and . Five of the six E-TSPC composition rules are now listed. Their purpose is to ensure that certain constraints are observed during the evaluation and holding phases. To simplify the rule statements, the symbol will be used to denote n or p in nouns like -data chain, -dynamic block, etc.
Composition Rule ( ):
The -data chain input should be an input of a dynamic block, an input of a latch, or a non-pc-input of a Dp block.
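Read literally, this rule constrains only the block that the data-chain input enters. A minimal Python sketch, with a hypothetical function name and block kinds given without polarity: a chain input that enters a static block, or a pc-input of a Dp block, fails the check.

```python
def chain_input_ok(first_kind: str, via_pc_input: bool = False) -> bool:
    """Hypothetical check of the first composition rule.

    first_kind: kind of the block the data-chain input feeds, one of
        "static", "dynamic", "latch", or "Dp".
    via_pc_input: True if the input enters a Dp block through a pc-input.
    """
    if first_kind in ("dynamic", "latch"):
        return True
    if first_kind == "Dp":
        return not via_pc_input  # only non-pc-inputs of a Dp block qualify
    return False                 # e.g., a static block receives the chain input
```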
Composition Rule ( ): A -latch must not drive, directly or through static blocks, a -dynamic or a -Dp block.
Composition Rule ( ): The number of inversions between: ) any two adjacent dynamic blocks must be odd¹; ) any two adjacent Dp blocks of the same type (PH and PH, or PL and PL) must be odd; ) any two adjacent Dp blocks of complementary types must be even; ) a PH (PL) block and an adjacent n-(p)-dynamic block (or vice versa) in an n-(p)-data chain must be even; ) a PL (PH) block and an adjacent n-(p)-dynamic block (or vice versa) in an n-(p)-data chain must be odd.
Composition Rule ( ): Consider the last dynamic block in the -data chain (when it exists). The number of inversions (due to any block) from this dynamic block up to at least one -latch must be even.
¹ Throughout all the rules, a count of zero inversions is considered even.
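The parity requirements of the inversion rule above reduce to a small lookup. In the hypothetical Python sketch below, adjacent blocks are described only by their kind ("dynamic", "PH", or "PL") and the chain polarity ("n" or "p"), and zero inversions count as even, following footnote 1. One way to read the Dp cases, offered as an aid rather than as the paper's derivation: a PH output is precharged low, while a PH block needs all pc-inputs high to precharge, so an odd number of inversions is needed between two PH blocks; a PL block needs its pc-inputs low, so a PH output can reach it through an even number.

```python
ODD, EVEN = "odd", "even"
DP_KINDS = ("PH", "PL")


def required_parity(a: str, b: str, chain: str) -> str:
    """Required parity of the inversion count between adjacent blocks a and b."""
    if chain not in ("n", "p"):
        raise ValueError("chain must be 'n' or 'p'")
    if a == "dynamic" and b == "dynamic":
        return ODD                          # two adjacent dynamic blocks
    if a in DP_KINDS and b in DP_KINDS:
        return ODD if a == b else EVEN      # same type: odd; complementary: even
    if "dynamic" in (a, b):
        dp = b if a == "dynamic" else a
        if dp not in DP_KINDS:
            raise ValueError(f"unknown block kind: {dp!r}")
        # PH next to an n-dynamic block in an n-chain, or PL next to a p-dynamic
        # block in a p-chain: even. The opposite pairings: odd.
        same_sense = (dp == "PH") == (chain == "n")
        return EVEN if same_sense else ODD
    raise ValueError(f"unknown block kinds: {a!r}, {b!r}")


def obeys_inversion_rule(a: str, b: str, chain: str, inversions: int) -> bool:
    parity = EVEN if inversions % 2 == 0 else ODD  # zero counts as even
    return parity == required_parity(a, b, chain)
```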
Composition Rule ( ): The -data chain must have one of the following two configurations:
) at least one dynamic block and one latch;
) at least two latches and an even number of inversions (latches or static blocks) between them.
It is worth noting that these five composition rules are very similar to the five rules proposed in the NORA technique [6].
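The two configurations of this rule can be checked mechanically on one data chain. The Python sketch below is a hypothetical illustration: each block is a `(kind, inverting)` pair with kind in {"static", "dynamic", "latch", "Dp"}, and, as an assumption, the inversions "between" two latches are counted over the blocks strictly between them. A p-data chain consisting of a single p-latch, as in the D-FF connection of Fig. 4 discussed under the exception rule, satisfies neither configuration.

```python
from itertools import combinations
from typing import List, Tuple

Block = Tuple[str, bool]  # (kind, inverting); kind in {"static", "dynamic", "latch", "Dp"}


def obeys_configuration_rule(chain: List[Block]) -> bool:
    kinds = [kind for kind, _ in chain]
    latches = [i for i, kind in enumerate(kinds) if kind == "latch"]
    # Configuration 1: at least one dynamic block and one latch.
    if "dynamic" in kinds and latches:
        return True
    # Configuration 2: two latches with an even number of inversions between
    # them (assumed: counted over the blocks strictly between the two latches).
    for i, j in combinations(latches, 2):
        inversions = sum(1 for _, inverting in chain[i + 1:j] if inverting)
        if inversions % 2 == 0:
            return True
    return False
```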
In a circuit where all data chains obey the five rules, it can be proved (see the six theorems presented in [1] and [2]) that: a) all data-precharged blocks are precharged during the holding phase of the data chains to which they belong; b) the dynamic and data-precharged blocks are not incorrectly discharged during the evaluation phase; c) the output of the last latch of a data chain is steady during the holding phase of that data chain.
C. Exception Rule
Although the rules described above are necessary to avoid race problems, typical TSPC systems do not follow some of them. The most common exception is found when two D flip-flops (D-FF's) are connected, as shown in Fig. 4. In such a configuration, the p-data chains consist of only one p-latch block, namely, or ( violation). Consequently, the p-latch output may change during its holding time. An example of a faulty sequence follows: consider an initial state in which the signals clock, input, and output a are low, and both blocks and are evaluating. At the end of the evaluation period, the outputs and are high. Subsequently, when the clock goes high, the other blocks evaluate. Suppose that works properly, holding its former value (high). In this case, the node goes low, output a goes high, and goes low. As a result, the transistor is cut off, and the final value of node will depend on the circuit delays.
Usually, the delay between nodes and is long enough to ensure that is fully discharged through transistors and ; in this case, the second D-FF works properly. A simple exception rule is therefore added to cover the use of the well-established TSPC D-FF's (Fig. 4).
Exception Rule ( ): Configurations similar to that of Fig. 4 , where rules and are not obeyed, are accepted if enough delay exists.
The data chains to which is applied, in place of and , do not have a latch with a steady output during the holding phase. Since the correct operation of the circuit then depends on the block delays, the exception rule should be used with caution.
Compared with the connection rules presented in previous works [7]-[10], our six proposed rules differ in the following aspects.
a) The "nonlatched domino logic," a timing strategy considered in [10], is not accepted in our proposal. b) The proposed rules permit a more flexible use of both data-precharged blocks, owing to the distinction between pc- and non-pc-inputs, and static logic blocks (static logic is allowed between dynamic and latch blocks). In Fig. 2, where no rule violations occur, several connections not allowed by the rules of previous works are provided, for instance, the connection between blocks and , between and , between and , etc.
D. NMOS-Like Logic Extension
When high speed is also a requirement, restrictions on the use of p-dynamic and p-latch blocks should be imposed. These blocks have at least two p-transistors in series, which may considerably reduce the maximum speed. In such applications, the p-data chains are limited to one block, and most logic operations are handled by n-data chains with limited logic depth. Thus, deep pipelines will be necessary to implement complex and fast logic designs.
NMOS-like dynamic and latch blocks can be used to mitigate this difficulty and also to increase the n-data chain speed. They are ratioed logic blocks, in which the n-transistor section and the p-transistor section may conduct simultaneously. A similar technique was used in [12], but restricted to D-FF's. In Fig. 1, the NMOS-like versions of the dynamic and latch blocks are drawn. To ensure correct operation, these blocks should satisfy the constraints summarized in Table I. The transistor section that must impose the output value when both sections are conducting is drawn with bold lines in Fig. 1.
The NMOS-like blocks are faster due to the reduced number of transistors in series but, unfortunately, they consume more power. Consequently, they should be used only in critical data chains, where the desired speed has not otherwise been reached. Since the connection characteristics do not depend on whether a block is conventional or NMOS-like, the composition rules ( -and ) are valid and necessary for both; as a result, NMOS-like blocks and conventional blocks can replace one another, which makes the judicious selection of NMOS-like blocks easy.
Summarizing, the static blocks, the n/p-dynamic, n/p-latch, PH/PL data-precharged, and NMOS-like blocks, together with the composition rules -and , compose the E-TSPC technique.
III. DUAL-MODULUS DESIGN
Dual-modulus prescalers, circuits with applications in frequency synthesis systems, have frequently been used to compare different high-speed implementations [12], [13], which is also our current goal. A high-speed dual-modulus prescaler (divide by 128/129) was designed using a standard 0.8-µm CMOS bulk process.
The schematic of the dual-modulus prescaler is depicted in Fig. 5. The circuit inside the cross-hatched box, composed of three D-FF's and two logic gates, forms a divide-by-4/5 counter. The div32 signal selects whether it counts up to four (div32 high) or up to five (div32 low). The five D-FF's at the bottom of the figure form a divide-by-32 counter. The division ratio of the prescaler, 128 or 129, is selected according to the signal. Four different approaches were applied to draw a layout of the divide-by-4/5 counter, which is the critical high-speed part of the prescaler. The approaches are:
) design with the conventional rising-edge-triggered TSPC D-FF (Fig. 4);
) design with the rising-edge-triggered D-FF, and further optimization applying the E-TSPC technique;
) design with a modified falling-edge-triggered D-FF [12];
) design with the falling-edge-triggered D-FF, and further optimization applying the E-TSPC technique.
In Fig. 6, the transistor schematic of the approach, with transistor dimensions, is depicted. The three cross-hatched boxes mark the D-FF's; the first D-FF (left) has a buffered output. The maximum speed and the power consumption of each design are shown in Table II. These results were obtained from SPICE simulations of the netlists extracted from the layouts, for slow process parameters, room temperature, and a 5-V power supply. The comparison of the results exhibits some advantages of the E-TSPC technique. From the to approach, the speed improvement is higher than 70%, and from to it is 20%. On the other hand, the power consumption increases by 72% from to . As uses only NMOS-like blocks, the latter result is not surprising, and it confirms that these blocks should be restricted to critical circuit parts. Since the composition rules favor the replacement of conventional blocks with NMOS-like ones and vice versa, E-TSPC circuits can reach high speed and keep the power consumption low.
To better evaluate the above results, the following notes should be taken into account:
• all approaches use small transistor sizes, usually minimum sizes (as indicated in Fig. 6 );
• the divide-by-4/5 counter schematic of Fig. 5 was slightly modified for each design ( ) to conform to its structural characteristics;
• the NOR configuration of Fig. 6 is similar to NMOS logic, but with a PMOS transistor as the load. It is faster than the CMOS static NOR and is used in the , , and approaches;
• and blocks, Fig. 6, drive the clock signal to the divide-by-32 counter. All four designs have a similar configuration.
IV. EXPERIMENTAL RESULTS
The full prescaler circuit, occupying an area of 0.0126 mm², was formed with the counter . The D-FF's of the divide-by-32 asynchronous counter were built with the conventional rising-edge-triggered TSPC D-FF (Fig. 4). The clock signal from the divide-by-4/5 counter (Fig. 6) is inverted before being sent to the divide-by-32 counter. This expedient allows a longer time interval for preparing the div32 signal.
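The 128/129 ratio follows from simple counting: if div32 selects divide-by-5 in exactly one of the 32 output periods of the divide-by-4/5 counter, one divide-by-32 period lasts 31 × 4 + 5 = 129 input cycles, and 32 × 4 = 128 otherwise. The cycle-level Python sketch below assumes that arrangement (which sub-period uses divide-by-5 is an arbitrary choice here), ignores the clock inversion and all circuit delays, and uses a hypothetical flag `mode_129` to stand for the modulus-control signal.

```python
from typing import List


def simulate_prescaler(mode_129: bool, n_input_cycles: int) -> List[int]:
    """Return the input-clock indices at which the divide-by-32 counter wraps."""
    sub_count = 0    # state of the divide-by-4/5 counter
    div32_count = 0  # state of the divide-by-32 counter
    wraps = []
    for t in range(n_input_cycles):
        # Assumed: divide by 5 (div32 low) only while div32_count == 0 in 129 mode.
        modulus = 5 if (mode_129 and div32_count == 0) else 4
        sub_count += 1
        if sub_count == modulus:
            sub_count = 0
            div32_count = (div32_count + 1) % 32
            if div32_count == 0:
                wraps.append(t)
    return wraps


def division_ratio(mode_129: bool) -> int:
    """Input cycles between two consecutive wraps of the divide-by-32 counter."""
    wraps = simulate_prescaler(mode_129, 4 * 129)
    return wraps[1] - wraps[0]


print(division_ratio(False), division_ratio(True))
```

Counting cycles this way only checks the division arithmetic; the timing margin gained by inverting the clock before the divide-by-32 counter is outside the scope of such a model.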
The prescaler test chip, whose photograph is shown in Fig. 7, was mounted on an alumina substrate using the chip-on-board technique. A coplanar radio-frequency probe was used to feed the only high-speed signal of the prescaler, the clock input.
In Fig. 8, the measured maximum frequency and current consumption are shown as a function of the power supply voltage. Since the pulse generator used has a maximum swing of 3 V, the actual maximum frequencies of the circuit are expected to be slightly higher than the measured results for supply voltages above 3 V.
Performance results of this work, of two recently published prescalers using TSPC D-FF's, and of a new prescaler architecture are summarized in Table III. In [13], the prescaler is implemented with rising-edge-triggered TSPC D-FF's, which were size-optimized to reach maximum speed; consequently, not only the circuit speed but also the area and power consumption are high. Falling-edge-triggered TSPC D-FF's with small-sized transistors and with some NMOS-like blocks are used in [12]. The resulting circuit has a small area and low power consumption but a reduced maximum operating rate. Our implementation, with the E-TSPC technique and small-sized transistors, provides the smallest area and the lowest power consumption; in addition, its speed is comparable to that of [13] and [14].
V. CONCLUSIONS
A complete high-speed dual-modulus prescaler (divide by 128/129) was developed in a 0.8-µm CMOS process. The measured circuit attained 1.59 GHz and 8.0 mW/MHz power consumption with a 5-V power supply. It compares favorably with other implementations in terms of area and power consumption; in terms of speed, it matches the fastest TSPC prescaler. The studies done during the design reveal that, to take full advantage of the TSPC technique, every possible configuration should be considered. The E-TSPC technique, being an extension of TSPC, permits exploring a larger number of solutions and, consequently, finding the best configuration. The dual-modulus prescaler results exhibit some significant improvements produced by the E-TSPC technique.
