Abstract-A high-speed, low-power 8-bit carry-lookahead adder is presented. It uses two-phase modified dual-threshold-voltage (dual-Vt) domino logic blocks arranged in a programmable logic array (PLA)-like design style with pipelining. The modified domino logic circuits employ dual-Vt transistors and reversed bulk-source biases to reduce subthreshold leakage current in advanced deep-submicrometer processes. Moreover, an nMOS transistor is inserted in the discharging path of the output inverter so that the modified domino logic can be properly applied in a pipeline structure to reduce the power consumption. The addition of two 8-bit binary operands is executed in two cycles. The design is also suitable for long adders, and measurement results on silicon show that the dynamic power consumption is reduced by more than 10%.
I. INTRODUCTION
FAST ADDERS are key elements in digital circuits, e.g., multipliers and digital signal processing (DSP) chips. Many efforts have been focused on the improvement of adder designs [4]-[8]. CMOS dynamic logic has been recognized as one of the promising options for reaching gigahertz operation in adder design [1]. However, the major tradeoff of these prior gigahertz logic circuits is their high power consumption, which is not a tolerable price to pay in recent mobile technologies. These circuits unavoidably consume power even when they are in a standby condition. A dual-threshold voltage (dual-Vt) circuit technique was proposed in [2] for reducing standby power dissipation while still maintaining high performance in domino logic. Lim et al. [11] proposed an energy-efficient carry-lookahead adder (CLA) using reversible energy recovery logic with a self-energy-recovery circuit to reduce the reversibility overhead. Kursun et al. [3] employed sleep switches and dual-Vt CMOS technology to place an idle domino logic circuit into a low-leakage state. In this paper, we propose a low-power programmable logic array (PLA)-like structure using modified dual-Vt domino logic blocks. An 8-bit CLA is built from the dual-Vt domino logic blocks, which are arranged in a PLA-like manner [6] and synchronously triggered. It is implemented on silicon to verify the power reduction as well as the preservation of high speed. The major advantage of the low-power design methodology is that it remains robust for long data words, e.g., 64-bit binary data. The power reduction is found to be more than 10% compared to the prior works.
II. LOW-POWER HIGH-SPEED 8-bit CLA

A. Typical Dual-Vt Domino Logic Circuits
The use of dual-Vt transistors for reducing subthreshold leakage current in domino logic circuits was proposed in [2]. Following that scheme, a typical dual-Vt domino logic circuit is shown in Fig. 1. The high-Vt transistors are represented in Fig. 1 by a thick line in the channel region. The domino logic operation is divided into two phases: the precharge phase and the evaluation phase.
1) During the precharge phase (clock = 0), P11 is on and N11 is off. Node A is then precharged to VDD and the output is initialized low.
2) During the evaluation phase (clock = 1), P11 is off and N11 is on. If the low-Vt evaluation block evaluates to "pass," the charge at node A is discharged to ground through the low-Vt evaluation block and N11, and the output becomes a logic high. If the low-Vt evaluation block evaluates to "stop," there is no discharging path for node A. A "keeper" pMOS, P13, is added to hold node A at VDD, and the output remains a logic low.
In summary, the output is high when the low-Vt evaluation block evaluates to "pass," i.e., "1," during clock = 1. Conversely, the output is low when the low-Vt evaluation block evaluates to "stop," i.e., "0," during clock = 1.
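The two-phase behavior described above can be captured in a small behavioral model. The following Python sketch is only an illustration at the logic level, assuming ideal switches and ignoring delay, leakage, and charge sharing; the function name `typical_domino` is hypothetical and does not appear in the original design.

```python
def typical_domino(clock: int, block_passes: bool) -> int:
    """Logic-level sketch of the typical dual-Vt domino gate of Fig. 1.

    clock = 0 is the precharge phase, clock = 1 is the evaluation phase.
    block_passes is True when the low-Vt evaluation block conducts ("pass").
    Returns the logic value at the gate output.
    """
    if clock == 1 and block_passes:
        node_a = 0  # node A discharges through the evaluation block and N11
    else:
        node_a = 1  # precharged by P11, or held at VDD by the keeper P13
    return 1 - node_a  # the output inverter


# Enumerate both phases for both evaluation results.
for clock in (0, 1):
    for block_passes in (False, True):
        print(clock, block_passes, typical_domino(clock, block_passes))
```

In this model the output is high only in the single case of clock = 1 with a "pass" evaluation, and it always returns to low during precharge.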
The critical signal transitions, which determine the delay of the domino logic circuits, occur along the evaluation path when node A is discharging. Hence, in the dual-Vt domino logic circuits, high-Vt transistors are used in the noncritical precharge paths, whereas low-Vt transistors must be used in the speed-critical evaluation paths [2]. As a result, the subthreshold leakage current of the dual-Vt domino logic circuits is expected to be smaller than that of an all-low-Vt domino logic circuit.
B. Modified Dual-Vt Domino Logic Circuits
However, such a typical dual-Vt domino logic circuit has a problem: its output cannot hold its logic state during the precharge phase of the next cycle. For example, Fig. 2 shows a typical two-stage dual-Vt domino logic circuit used to construct a pipeline structure. For pipelined operation, the two stages must alternate, one operating in the precharge phase while the other is in the evaluation phase. When the first stage is in the precharge phase and the second stage is in the evaluation phase, node B is forced low rather than holding its previous state, so the second stage cannot evaluate its function. Therefore, the typical circuit cannot be directly applied in a pipeline structure. To resolve this difficulty, a modified dual-Vt domino logic circuit is proposed, as shown in Fig. 3. A clock-controlled nMOS transistor, N33 in Fig. 3, is inserted in the discharging path of the output inverter. The operation of the modified dual-Vt domino logic circuit is the same as that of the typical dual-Vt domino logic circuit except in the precharge phase. During the precharge phase (clock = 0), P31 is on, and N31 and N33 are both off. Node A is precharged to VDD, so P32 is switched off. The output then has neither a charging path nor a discharging path, so it keeps its previous state. As a further consequence, the circuit consumes less power.
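The hold behavior of the modified gate can be illustrated with a stateful logic-level model. This Python sketch is an assumption-laden illustration, not the authors' circuit: it treats the floating output as an ideal storage node, and the class name `ModifiedDomino` and its zero initial state are hypothetical.

```python
class ModifiedDomino:
    """Logic-level sketch of the modified dual-Vt domino gate of Fig. 3."""

    def __init__(self) -> None:
        self.output = 0  # assumed initial output state (hypothetical)

    def step(self, clock: int, block_passes: bool) -> int:
        """Advance one clock phase and return the output value."""
        if clock == 1:
            # Evaluation: N33 is on, so the output inverter works normally.
            # "pass" discharges node A and P32 pulls the output high;
            # "stop" keeps node A high and the output is pulled low.
            self.output = 1 if block_passes else 0
        # Precharge (clock = 0): P32 and N33 are both off, so the output
        # has no charging or discharging path and keeps its previous value.
        return self.output


gate = ModifiedDomino()
print(gate.step(1, True))   # evaluation with "pass"
print(gate.step(0, False))  # precharge: previous value is held
print(gate.step(1, False))  # evaluation with "stop"
```

Because the output survives the precharge phase, a following stage that evaluates during this time still sees the result of the previous evaluation, which is the property the pipeline relies on.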
C. High-Vt Transistors With Reverse-Biased Bulk-Source Voltage
According to [9], the threshold voltage is given by

$$V_{TH} = V_{TH0} + \gamma\left(\sqrt{|2\phi_F| + V_{BS}} - \sqrt{|2\phi_F|}\right) \tag{1}$$

where $V_{TH0}$ denotes the threshold voltage with zero bulk bias, $\phi_F$ is the electrostatic potential, $\gamma$ is the body coefficient, and $V_{BS}$ is the bulk-source bias. When a positive $V_{BS}$ bias is applied, $V_{TH}$ increases and the subthreshold leakage current is consequently reduced. To reduce the subthreshold leakage current while preserving the high speed of the proposed design, the high-Vt transistors P31 and P33 in the noncritical precharge path are given reversed bulk-source biases. The variation of the subthreshold leakage current of the high-Vt pMOS transistors with reversed bulk-source biases in the dual-Vt domino logic circuit is simulated. In the simulation, the bulk-source junctions of both high-Vt transistors P31 and P33 are reverse biased by 1.2 V, i.e., $V_B$ = 3 V. Meanwhile, the low-Vt evaluation block is set to "pass," and the domino logic circuit is operated in the evaluation phase (clock = 1). The resulting subthreshold leakage current is compared with that of P31 and P33 with 0 V $V_{BS}$. Notably, although the $I_{BS}$ of P31 and P33 increases with reversed bulk-source biases, it can still be ignored when compared with the subthreshold leakage current.
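As a worked illustration of the body effect, the sketch below evaluates the standard body-effect expression V_TH = V_TH0 + γ(√(|2φF| + V_BS) − √|2φF|), which is the form (1) appears to follow. All quantities are treated as magnitudes, and the numeric values of `vth0`, `gamma`, and `phi_f` are hypothetical placeholders chosen only for illustration; they are not parameters of the TSMC process used in this work. Only the 1.2 V reverse bias is taken from the text.

```python
import math


def threshold_voltage(vth0: float, gamma: float, phi_f: float, v_bs: float) -> float:
    """Threshold-voltage magnitude under a reverse bulk-source bias v_bs (in volts)."""
    surface = abs(2.0 * phi_f)
    return vth0 + gamma * (math.sqrt(surface + v_bs) - math.sqrt(surface))


# Hypothetical device parameters (assumptions, not from the paper).
VTH0 = 0.5    # zero-bias threshold magnitude, V
GAMMA = 0.4   # body coefficient, V^0.5
PHI_F = 0.4   # electrostatic potential, V

vth_zero_bias = threshold_voltage(VTH0, GAMMA, PHI_F, 0.0)
vth_reverse = threshold_voltage(VTH0, GAMMA, PHI_F, 1.2)  # 1.2 V reverse bias, as in the text
print(f"V_TH at 0 V bias:   {vth_zero_bias:.3f} V")
print(f"V_TH at 1.2 V bias: {vth_reverse:.3f} V")
```

With any positive body coefficient, the reverse-biased threshold is larger than the zero-bias value, which is the mechanism that lowers the subthreshold leakage of P31 and P33.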
D. PLA-Styled 8-bit CLA Design
If the propagate signals (Pi) and the generate signals (Gi) of a CLA are produced by combinational logic function blocks before they are fed into the function blocks for the Si's and Ci's, then the Boolean equations of the Si's and Ci's imply that a two-level AND-OR logic function block is a possible solution for achieving high-speed operation. Thus, the PLA-styled design is suitable for such a function block. A conceptual PLA-styled design for the CLA is shown in Fig. 4. A typical PLA consists of an AND array and an OR array. It is well known that the series nMOS transistors in the evaluation block of NAND or AND gates produce long discharging delays, which subsequently slow down the entire circuit. We can take advantage of the noninverting feature of the domino logic to use a NOT-OR-NOT-OR configuration instead of the typical AND-OR style, where the two OR planes are made of the modified dual-Vt domino logic circuits shown in Fig. 3. This also minimizes the series transistor count in the low-Vt evaluation block. The OR array is made of the modified dual-Vt domino logic with a predefined low-Vt evaluation block. The inputs to the first OR array are the inverted Pi and Gi signals, which are also produced by other modified dual-Vt domino logic units, as shown in Fig. 5. Notably, we define the propagate signals differently from the traditional Pi = Ai + Bi: we use Pi = Ai ⊕ Bi, because it can be reused to generate the sum term Si.
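The equivalence between the AND-OR form and the NOT-OR-NOT-OR form follows from De Morgan's law: each product term equals the inverted OR of its inverted inputs. The Python sketch below checks this at the bit level for a flat n-bit CLA. It is an illustration under stated assumptions, not the authors' netlist: the function name `cla_add`, the explicit carry-in `c0`, and the use of the XOR-type propagate signal with Si = Pi ⊕ Ci are assumptions made for the example.

```python
def cla_add(a: int, b: int, c0: int = 0, n: int = 8) -> int:
    """Add two n-bit operands with carries formed in NOT-OR-NOT-OR style."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # propagate (XOR form)
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]  # generate
    not_p = [1 - x for x in p]  # inverted inputs to the first OR plane
    not_g = [1 - x for x in g]
    not_c0 = 1 - c0

    carries = [c0]
    for i in range(n):
        terms = []
        # j = i .. 0 selects the term P_i ... P_(j+1) G_j;
        # j = -1 selects the term P_i ... P_0 C_0.
        for j in range(i, -2, -1):
            last = not_g[j] if j >= 0 else not_c0
            inverted_inputs = not_p[j + 1 : i + 1] + [last]
            first_or = int(any(inverted_inputs))  # first OR plane
            terms.append(1 - first_or)            # inverted output = product term
        carries.append(int(any(terms)))           # second OR plane gives C_(i+1)

    total = carries[n] << n  # carry-out becomes the top bit
    for i in range(n):
        total |= (p[i] ^ carries[i]) << i  # sum bit reuses the propagate signal
    return total


print(cla_add(200, 100))  # same value as 200 + 100
```

Every evaluation block in this arrangement is a pure OR of its inputs, so the pull-down network consists of parallel transistors rather than a long series stack.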
E. Cycle-Based Operation and Area Analysis
1) Cycle-Based Operation: The critical path of an adder lies in the generation of the carry signals, i.e., C8 in the 8-bit adder. After the binary operands are ready, the generation of the Pi's and Gi's by the modified dual-Vt domino logic takes the high half of a full cycle. That is, the results of the GP-blocks are ready when the clock is low. The inverted Pi's and Gi's are then fed into the first OR plane of the modified dual-Vt domino-based PLA. The inverted outputs of the first OR plane are presented to the second OR plane at the high half of the second cycle. The final Ci results are then ready in the low half of the second cycle. Right after the Ci's are generated, they are inverted and fed into the Si function blocks. Another half cycle is then required to produce all of the Si's. The final result is latched after two cycles, as shown in Fig. 6.
2) Area: For the PLA-styled implementation of a CLA using all-N-transistor (ANT) logic, an analytic form of the transistor count has been derived in [10]. By a similar derivation, the total number of transistors required to implement the proposed n-bit CLA with the PLA-styled design using the modified dual-Vt domino logic is as follows:
$$\frac{1}{6}(n+1)(n+2)(n+3) + \frac{9}{2}\,n(n+1) + 48n + 9. \tag{2}$$
For instance, for an 8-bit adder using our proposed design, the overall transistor count is 882.
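The count quoted for the 8-bit case can be checked numerically. The sketch below evaluates the transistor-count expression with a coefficient of 1/6 on the cubic term and 9/2 on the quadratic term; the 1/6 factor is an assumption in the sense that it is the coefficient that reproduces the stated total of 882 for n = 8. The function name `transistor_count` is hypothetical.

```python
def transistor_count(n: int) -> int:
    """Transistor count of the n-bit PLA-styled CLA, per the assumed form of (2)."""
    cubic = (n + 1) * (n + 2) * (n + 3)  # three consecutive integers: divisible by 6
    quadratic = 9 * n * (n + 1)          # n(n + 1) is even: divisible by 2
    return cubic // 6 + quadratic // 2 + 48 * n + 9


print(transistor_count(8))  # 882, matching the count stated for the 8-bit adder
```

Integer division is exact here because a product of three consecutive integers is always a multiple of 6. For long words the cubic term dominates the total.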
III. SIMULATIONS AND IMPLEMENTATION
To reveal the power-saving advantage of the proposed low-power design, two 8-bit CLAs are implemented, one with the modified single-Vt domino logic and one with the modified dual-Vt domino logic, using the same CMOS process. The detailed schematic and die photo of the two CLAs, implemented in the Taiwan Semiconductor Manufacturing Company (TSMC) 0.18-µm 1P6M CMOS process, are shown in Figs. 7 and 8, respectively. The proposed CLA using the modified dual-Vt domino logic passes all model (FF, TT, and SS) and temperature (0 °C to 75 °C) corner simulations. A post-layout simulation example is shown in Fig. 9, operated at 1 GHz under the worst case of the SS model at 75 °C. It illustrates that the result of an addition appears after two clock cycles. Fig. 10 shows the waveforms of the modified dual-Vt domino logic CLA measured by an Agilent 93000 system-on-a-chip (SOC) test system. The characteristics of the proposed low-power CLA are tabulated in Table II. The power consumption of both CLAs is summarized in Table III given the same randomly generated input sequences. A power consumption comparison with several prior works is shown in Table IV. The proposed design has the lowest power consumption among them.
Notes for the comparison with prior works: Operand Length Ratio = (bit length of the prior design)/(bit length of the proposed design); Voltage Scaling = ( of the prior design / of the proposed design); Energy Operation Scaling = (process of the prior design)/(process of the proposed design); Scaled Power = Avg. Power/Power Reduction.
IV. CONCLUSION
We propose a low-power, high-speed PLA-styled dual-Vt domino logic design for adder implementation. A modified dual-Vt domino logic circuit is used for the pipeline structure so that unnecessary power consumption is avoided. Not only is functional correctness in the gigahertz range preserved, but the power dissipation is also reduced. The PLA-styled dual-Vt domino logic structure, using only one clock, produces the result of an 8-bit addition in two cycles. The proposed design can be easily expanded to a hierarchical 64-bit adder whose result is attained in four cycles.
