Abstract-Noise in deep submicron technology combined with the move toward dynamic circuit techniques have raised concerns about reliability and energy efficiency of VLSI systems in the deep submicron era. To address this problem, a new noise-tolerant dynamic circuit technique is presented. The average noise threshold energy (ANTE) and the energy normalized ANTE (NANTE) metrics are proposed to quantify the noise immunity and energy efficiency, respectively. Simulation results in 0.35-µm CMOS for NAND gate and full-adder designs indicate that the proposed technique improves the ANTE and NANTE by 2× and 1.4× over conventional domino circuits. The improvement in the NANTE is 11% higher than the existing noise-tolerance techniques. Furthermore, the proposed technique has a smaller area overhead (36%) as compared to static circuits whose area overhead is 60%. Also presented in this paper is an ASIC developed in 0.35-µm CMOS to evaluate the performance of the proposed technique. Experimental results demonstrate a 27% average improvement in noise immunity over conventional dynamic circuits.
IV. EXPERIMENTAL RESULTS
A micrograph of the filter implementation is shown in Fig. 6. Its core area is 1.1 × 1.4 mm². The chip is fully functional for clock frequencies up to 20 MHz, while powered from a 1-V power supply. The average energy consumption for a low pass filter configuration was measured to be 330 pJ per biquad section. The energy of the adders and registers dominates the total dissipation (58%) and the interconnects are responsible for an additional 25%.
The leakage current was found to contribute 8% to the total energy consumption of the full 32nd-order filter configuration. Based on the total leakage path width, the contribution of the memory blocks to the total leakage dissipation is approximately three times greater than that of other parts of the filter circuit. The leakage current would exhaust a typical 1-V 30-mAh battery in 193 days if the filter were held inactive. In the full 32nd-order configuration, the filter would run on the same battery for approximately 11 days.
V. CONCLUSION
This brief has addressed the issue of implementation of low-power low-voltage DSP systems in low-$V_t$ CMOS processes. An architectural approach that minimizes leakage dissipation was adopted. Minimization of the overall computational dissipation was attempted for the chosen architecture. Energy consumption properties of multiplexers, latches, and registers were highlighted, and some energy-saving solutions proposed. The observations made about dissipation in the multiplexer-latch combination and the register glitching effect are quite general and apply to most DSP datapaths. Probabilistic analysis of leakage paths in SRAM blocks was performed, demonstrating the possibility for reduction of leakage current across SRAM busses.
The experimental results have revealed that a single low-threshold CMOS process is a viable implementation solution in cases when the processing element can be reused many times within one sampling period, allowing the high ratio of the memory circuit size to the processing element circuit size. In such cases, the dominant source of leakage dissipation is RAM, while the dominant source of switching dissipation is the processing element. Our design has shown that this condition can be easily met for relatively low sampling rates such as those of audio filtering applications.
I. INTRODUCTION
Technology scaling combined with aggressive design practices have made deep submicron noise a major issue that limits the reliability and integrity of high performance ICs [1], [2]. While static circuits are deemed robust to noise, the need for high-speed and low-power operations has forced IC designers to consider dynamic techniques [3]-[5] for the next generation of high performance VLSI systems.
While dynamic circuits are faster and consume less power than their static counterparts, they are inherently susceptible to noise [2]. For this reason, noise-tolerant dynamic circuit techniques have been developed [6]-[9]. However, these techniques do not explicitly consider energy-efficiency as a design metric of interest. In this paper, we present a noise-tolerant dynamic circuit technique that has better noise immunity, energy efficiency, speed, and area, as compared to existing techniques [7], [8]. Also presented in this paper is the design of a multiply-accumulate (MAC) ASIC in 0.35-µm CMOS. Experimental results further confirm the advantages of the proposed technique over conventional dynamic circuits.

Manuscript received September 1999; revised June 2000. This work was supported by the National Science Foundation under CAREER Award MIP-9623737, Award CCR-000987, and by Intel Corporation. This paper was recommended by Associate Editor E. Friedman.
The paper is organized as follows. In Section II, we introduce the existing noise-tolerance techniques [7], [8]. In Section III, we present the proposed technique and develop the concept of average noise threshold energy (ANTE) to quantify the noise-immunity. Simulation results in 0.35-µm CMOS are presented in comparison to the static and conventional dynamic circuits. In Section IV, we describe the design of the MAC ASIC, along with measured results.
II. EXISTING NOISE-TOLERANT DYNAMIC CIRCUIT TECHNIQUES
Noise in VLSI circuits is defined as any disturbance that drives node voltages away from a nominal value. Noise sources that have substantial impact on the performance of digital circuits include ground bounce, IR drop, crosstalk, charge sharing, process variations, charge leakage, alpha particles, and electromagnetic radiation [1], [2].
Dynamic circuits are susceptible to noise due to their low switching threshold voltage $V_{th}$, defined as the input voltage at which the output changes state. For conventional dynamic circuits, e.g., the domino NAND gate shown in Fig. 1(a), $V_{th} = V_{tn}$, where $V_{tn}$ is the threshold voltage of an nMOS transistor. Therefore, one method to improve noise immunity is to increase the switching threshold voltage $V_{th}$ of the gate.
Doing this inevitably sacrifices circuit performance metrics such as speed and power consumption, which are features that make dynamic circuits attractive in the first place. Thus, any noise-tolerance technique should provide substantial improvement in noise-immunity with minimal speed and power penalty. Several techniques have been developed so far to improve the noise immunity of dynamic circuits. In this paper, we mainly compare two techniques: the CMOS inverter technique [7] [see Fig. 1(b)] and the pMOS pull-up technique [8] [see Fig. 1(c)]. Note that the CMOS inverter technique cannot be used for dynamic OR/NOR gates, since some input logic combinations will short the power supply to ground. On the other hand, the pMOS pull-up technique suffers from a large static power dissipation due to the direct path from the pull-up pMOS to the last nMOS in the network. Therefore, it is not suitable for low-power applications.
Note that "keeper" transistors, which are utilized mainly to combat charge sharing noise [10] , are usually designed in such a way that the dynamic node switches as soon as the inputs switch. An input noise pulse with sufficient amplitude and duration can easily turn off the keeper transistor and discharge the protected dynamic node. Therefore, the existing noise-tolerance techniques present certain drawbacks and in general are not energy-efficient. Hence, it is of interest to develop energy and throughput efficient noise-tolerant dynamic circuit techniques such as the one described in this paper.
III. MIRROR TECHNIQUE: A NEW NOISE-TOLERANT DYNAMIC CIRCUIT TECHNIQUE
In Section III-A, we present an energy-efficient noise-tolerant dynamic circuit technique referred to as the mirror technique. In order to quantify the noise-immunity and energy penalty incurred in improving noise-immunity, we propose the metrics of ANTE and energy normalized ANTE (NANTE) in Section III-B. Simulation results of NAND gate and full-adder designs in 0.35-µm CMOS technology are provided in Section III-C.
A. Mirror Technique
As shown in Fig. 2(a), the proposed noise-tolerant dynamic circuit (based on the Schmitt trigger [11]) requires two identical nMOS evaluation nets. One additional nMOS transistor M1, whose gate voltage is controlled by the signal $V_x$, provides a conduction path between the common node of the two evaluation nets and $V_{DD}$. During the precharge phase, clock signal $\Phi$ turns M2 on, and voltage $V_x$ is charged up to $V_{DD}$. If the common node voltage $V_1 = 0$ V initially, then $V_1$ reaches the value of ($V_{DD} - V_{tn}$). While the lower nMOS net still suffers from input noise which may discharge the common node voltage $V_1$, the switching threshold voltage of the upper nMOS net is increased (due to body effect) as long as $V_1$ is not fully discharged. This enhances the noise-immunity of the gate.
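To illustrate why a partially charged common node raises the threshold of the upper nMOS net, the following is a minimal sketch of the standard first-order body-effect expression, in which the threshold grows with the source-to-body bias. The parameter values (`VT0`, `GAMMA`, `TWO_PHI_F`) and the function name are illustrative assumptions; they are not taken from the 0.35-µm process or from the circuit in Fig. 2(a).

```python
import math

# Illustrative assumptions only, not process data from the paper.
VT0 = 0.6        # zero-bias nMOS threshold voltage (V)
GAMMA = 0.4      # body-effect coefficient (V^0.5)
TWO_PHI_F = 0.6  # magnitude of twice the Fermi potential (V)


def threshold_with_body_effect(v_sb: float) -> float:
    """First-order nMOS threshold voltage for a source-to-body bias v_sb >= 0."""
    if v_sb < 0:
        raise ValueError("this sketch only covers v_sb >= 0")
    return VT0 + GAMMA * (math.sqrt(TWO_PHI_F + v_sb) - math.sqrt(TWO_PHI_F))


if __name__ == "__main__":
    # The common-node voltage V1 acts as the source bias of the upper net.
    for v1 in (0.0, 0.5, 1.0, 2.0, 2.7):
        vt = threshold_with_body_effect(v1)
        print(f"V1 = {v1:.1f} V -> Vt = {vt:.3f} V")
```

Under these assumed parameters the threshold rises monotonically with $V_1$, which is the qualitative behavior the text relies on.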
It must be mentioned that the proposed technique does not consume static power. However, there can be a speed penalty if the devices are not resized. The area penalty due to transistor resizing of the proposed technique has been found to be less than, or close to, that of the existing noise-tolerant techniques and static CMOS style. This will be demonstrated in Section III-C.
B. ANTE
Noise pulses must have sufficiently high amplitude and long duration to cause unrecoverable logic errors in dynamic circuits. This fact is embodied in the noise-immunity curves (denoted by $C_{nic}$) [12]. Fig. 3 shows two typical noise-immunity curves, where all the points on and above the curves represent the noise pulses that will cause logic errors. Obviously, a circuit with a noise-immunity curve given by $C_{nic1}$ is more robust to noise than the one with $C_{nic2}$ as its noise-immunity curve. Note that the vertical asymptote of a noise-immunity curve reflects the best-case circuit speed. This is because the noise-immunity curve for, say, a NOR gate is measured when all nMOS pull-down transistors are subject to the input noise, whereas the worst-case delay of the gate is measured with only one nMOS pull-down transistor being on.
For comparison of different noise-tolerance techniques, we propose the ANTE metric, which is defined as the average input noise energy that the circuit can tolerate. Note that each point on the noise-immunity curve represents an amplitude $V_n$ and width $T_n$ of the input noise pulse that causes logic errors. Defining the pulse energy as being equal to the energy dissipated in a 1-Ω resistor subject to a voltage waveform with amplitude $V_n$ and width $T_n$, the ANTE measure is defined as

$$\mathrm{ANTE} = E\left(V_n^2 T_n\right) \tag{1}$$
where $E(\cdot)$ denotes the expectation operator. Clearly, an input noise pulse with amplitude $V_n \geq V_{th}$ will turn on the pull-down nMOS transistor and discharge the dynamic node. On the other hand, if $V_n < V_{th}$, subthreshold leakage current can discharge the dynamic node erroneously provided that the noise pulse duration $T_n$ is sufficiently long. In order to motivate the ANTE metric further, consider a generic circuit shown in Fig. 4, where the input noise pulse $V_n$ discharges a node $x$ with voltage $V_x$. The differential equation describing this event is

$$C_x \frac{dV_x}{dt} = -i_x. \tag{2}$$
For the sake of simplicity, we only consider the $V_n \geq V_{th}$ case. Assuming the transistor to be in saturation region, the discharging current $i_x$ can be expressed as

where $k_n$ is the nMOS transconductance and $i_{pull-up}$ accounts for the counteracting current, if present (such as the current in the keeper). Substituting for $i_x$ from (3) into (2) and integrating, we obtain
where $\Delta V$ is the voltage drop at node $x$ that causes a logic error, and $T_n$ is the corresponding time duration of the input noise pulse $V_n$. Note that $\Delta V$ is a constant which depends only upon the circuit to which the node $x$ is connected as input. For example, $\Delta V = V_{th}$ for domino logic, where $V_{th}$ is the switching threshold voltage of the inverter. Considering $T_n$ and $V_n$ to be random variables, we take the expectation of (4) over the probability distribution of $T_n$ and $V_n$ to obtain
For any noise distribution with a finite $E(T_n)$, the first two terms on the left side of (5) are constants. In most cases, for speed considerations, $i_{pull-up}$ will be small compared to the current generated by the noise $V_n$. Therefore, a larger ANTE measure in (5) implies that a higher noise pulse amplitude $V_n$, or equivalently larger noise energy, is needed to discharge the dynamic node and cause a logic error. Noise-tolerance techniques provide improved noise-immunity at the expense of area, speed, and power. While noise-immunity curves, such as those in Fig. 3, and the ANTE measure (1) provide comparisons of noise-immunity, they do not indicate the energy or speed penalty involved. Therefore, we employ the NANTE defined as follows

$$\mathrm{NANTE} = \frac{\mathrm{ANTE}}{\varepsilon} \tag{6}$$

where $\varepsilon$ represents the energy dissipated per cycle, as a measure of the energy penalty incurred in improving noise-immunity. Note that $\varepsilon$ must include all energy components, such as those from the increased fan-in (input) capacitance, static power dissipation, etc. All the comparisons in this paper are based on the circuits with the same speed. Hence, a speed-normalized ANTE metric is not considered.
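The metrics in this subsection can be illustrated numerically. The sketch below samples a noise-immunity curve from a simplified discharge model and then averages $V_n^2 T_n$ over the samples, dividing by an energy per cycle to obtain an energy-normalized figure. Several things here are assumptions rather than the paper's own formulation: the square-law current with a factor of 1/2, the constant-current discharge, the uniform weighting of curve points in place of a real noise distribution, and all function names and numeric values.

```python
from statistics import fmean


def min_pulse_width(v_n, v_th, k_n, c_x, delta_v, i_pullup=0.0):
    """Shortest pulse width (s) at amplitude v_n that removes delta_v from node x.

    Assumes a constant saturation current 0.5 * k_n * (v_n - v_th)**2 opposed by
    i_pullup. Returns None when the pulse cannot discharge the node in this model.
    """
    if v_n <= v_th:
        return None  # subthreshold discharge is ignored in this sketch
    i_x = 0.5 * k_n * (v_n - v_th) ** 2 - i_pullup
    if i_x <= 0:
        return None
    return c_x * delta_v / i_x


def ante(points):
    """Average of V_n^2 * T_n over (V_n, T_n) curve points, uniformly weighted."""
    return fmean(v * v * t for v, t in points)


def nante(points, energy_per_cycle):
    """ANTE divided by the energy dissipated per cycle."""
    return ante(points) / energy_per_cycle


if __name__ == "__main__":
    # Illustrative values only.
    v_th, k_n, c_x, delta_v = 0.6, 2e-4, 20e-15, 0.6
    amplitudes = [0.8 + 0.1 * i for i in range(26)]  # 0.8 V to 3.3 V
    curve = []
    for v in amplitudes:
        t = min_pulse_width(v, v_th, k_n, c_x, delta_v)
        if t is not None:
            curve.append((v, t))
    print("ANTE  (V^2 s):", ante(curve))
    print("NANTE (V^2 s / J):", nante(curve, energy_per_cycle=1e-13))
```

Raising `v_th` in this model shifts the sampled curve toward wider pulses at every amplitude, so the computed ANTE grows, which matches the qualitative argument above.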
C. Simulation Results and Comparisons
Next, we present the simulation results of NAND gate and full-adder designs in a 0.35-µm 3.3-V CMOS process.
1) Simulation Results of a NAND Gate: Fig. 2(b) shows the NAND gate implemented by the proposed noise-tolerance technique, while those using the CMOS inverter technique and pMOS pull-up technique are shown in Fig. 1(b) and Fig. 1(c), respectively. To account for the increased fan-in (input) capacitance in multistage implementation, we simulated three serially connected identical NAND gates and measured the delay of the first two gates. Power consumption averaged over the three gates is compared.

Fig. 12 illustrates the output block, where the 22-bit parallel data are converted to three bit-serial outputs. The ASIC is designed and fabri-

Fig. 1(a) was designed to meet the specifications 1)-3). Table I indicates that the proposed technique improves the ANTE and NANTE by 1.84× and 1.42× over the conventional domino circuit in Fig. 1(a). The improvement in the NANTE is 11% higher than the existing noise-tolerance techniques. In addition, the proposed technique has a smaller area overhead (41%) as compared to the pMOS pull-up technique whose area overhead is 49%. It must be mentioned that while the CMOS inverter technique has similar noise-immunity as the proposed technique, it cannot be used for designing dynamic OR/NOR gates. Another observation is that the pMOS pull-up technique degrades the NANTE by 36% due to its large static power dissipation.
2) Simulation Results of a Full Adder:
The performance of the conventional dynamic full adder [see Fig. 6(a)], the CMOS static full adder (not shown), and the proposed technique [see Fig. 6(b)] has been studied. Note that the full-adder SUM output cannot be implemented directly by conventional dynamic logic, and thus, is not protected by the proposed technique. Even so, the proposed technique still improves the noise-immunity of the entire MAC by 27%, as shown in Section IV-B.
All the full adders satisfy the following specifications: 1) power supply $V_{DD}$ = 3.3 V; 2) load capacitor $C_{load}$ = 20 fF; and 3) clock frequency $f_{clk}$ = 1 GHz. The switching threshold voltage $V_{th}$ for the CARRY output equals 0.6, 1.65, and 1.8 V for the dynamic full adder, static full adder, and noise-tolerant full adder, respectively. Because the MAC ASIC in Section IV is pipelined at full-adder level, the effect of the increased fan-in (input) capacitance in multistage implementation is not investigated here. Noise-immunity curves in Fig. 7 demonstrate that the proposed technique has better noise-immunity than conventional dynamic circuits and static circuits. Table II also indicates that the proposed technique improves the ANTE and NANTE by 2× and 1.48× over the conventional dynamic full adder. In comparison, the static full adder improves the ANTE by 1.2× but degrades the NANTE by 32%. In addition, the proposed technique has a smaller area overhead (36%), as compared to the static full adder whose area overhead is 60%.
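The full-adder figures can be cross-checked with a short calculation. If the NANTE is taken to be the ANTE divided by the energy per cycle, then the energy ratio relative to the conventional dynamic adder follows from the two reported gains. The sketch below reads the quoted full-adder numbers as a 2× ANTE gain with a 1.48× NANTE gain for the proposed technique, and a 1.2× ANTE gain with a 32% NANTE degradation for the static adder. The function name is hypothetical, and the results inherit the rounding of the quoted figures.

```python
def implied_energy_ratio(ante_gain: float, nante_gain: float) -> float:
    """Energy-per-cycle ratio implied by ANTE and NANTE gains.

    Assumes NANTE = ANTE / energy, so energy_ratio = ante_gain / nante_gain.
    """
    if nante_gain <= 0:
        raise ValueError("nante_gain must be positive")
    return ante_gain / nante_gain


if __name__ == "__main__":
    proposed = implied_energy_ratio(2.0, 1.48)
    static = implied_energy_ratio(1.2, 1.0 - 0.32)
    print(f"proposed technique: {proposed:.2f}x energy per cycle")
    print(f"static full adder:  {static:.2f}x energy per cycle")
```

Under this reading, the proposed adder would spend roughly 1.35× the energy per cycle of the conventional dynamic adder, against roughly 1.76× for the static adder.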
IV. MAC ASIC DESIGN
In this section, we describe the architecture of a MAC ASIC designed in 0.35-µm CMOS that employs the conventional dynamic technique and the proposed noise-tolerance technique. Measured results are presented to demonstrate the merits of the proposed technique.
A. Chip Architecture
The chip consists of five functional blocks (see Fig. 8): the input block, noise-injection circuits (NICs), dynamic multiplier-accumulator (dynamic MAC), noise-tolerant dynamic multiplier-accumulator (mirror NT MAC), and the output block. Separate power supplies are provided for input and output blocks in order to isolate them from the NICs. In order to operate each MAC in the presence of ground bounce noise generated by its own NIC, we provide the two MACs with independent power supplies, each shared with its own NIC.
The main functional blocks in the ASIC are the dynamic MAC and mirror NT MAC (see Fig. 9). Both MACs are bit-level pipelined unsigned array structures. Pipelining at full-adder level facilitates the detection of logic errors because the output D-latch can easily capture an erroneous output. The two MACs have 8-bit inputs and 22-bit outputs, indicating that a 64-tap FIR filter can be programmed. The inputs of two MACs are identical so that any discrepancy at the outputs will be due to the logic errors in the MACs. Fig. 6 shows the transistor-level schematics of the conventional dynamic full adder and noise-tolerant full adder employed in the corresponding MACs. Fig. 10 depicts the block diagram of a NIC for ground bounce noise.
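The link between the 8-bit inputs, the 22-bit outputs, and the 64-tap figure quoted for the MACs can be checked with a worst-case accumulator calculation. The sketch below assumes unsigned operands, as stated for the array structure, and asks how many maximum-valued products fit in the accumulator without overflow. The function name is hypothetical.

```python
def max_taps(input_bits: int, acc_bits: int) -> int:
    """Largest number of unsigned products that can never overflow the accumulator."""
    max_product = (2 ** input_bits - 1) ** 2  # both operands at full scale
    max_acc = 2 ** acc_bits - 1               # largest representable sum
    return max_acc // max_product


if __name__ == "__main__":
    print("worst-case tap count for 8-bit inputs, 22-bit output:", max_taps(8, 22))
```

Each 8-bit by 8-bit product needs 16 bits, so the remaining 6 bits of headroom allow 2^6 = 64 accumulations.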
Each NIC contains eight 4-stage super buffers with a scale factor of 3. The number of the external load capacitors connected to each NIC can be adjusted to control the magnitude of the injected ground bounce noise. A 26-tap linear feedback shift register (LFSR) provides pseudorandom input sequences to the super buffers. The control signal ENABLE activates the NIC when it is logic high. The input block (see Fig. 11) provides data and coefficients to the two MACs. Both the data and coefficients are in bit-serial format to reduce the pin count. The input data can either be read from an external data source or be generated internally by an on-chip LFSR, which provides pseudorandom sequences to minimize data-dependent logic errors during the testing.
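For readers unfamiliar with LFSRs, the following is a minimal software sketch of a Fibonacci-style LFSR of the kind that could drive the super buffers or generate test data. The text does not give the feedback polynomial used on the chip, so the 26-bit tap set `TAPS_26` is an assumption (a tap choice commonly tabulated as maximal-length), and all names are hypothetical. The small 4-bit register is included only because its period is easy to verify.

```python
def lfsr_step(state: int, taps, width: int) -> int:
    """Advance a Fibonacci LFSR by one clock; taps are 1-indexed bit positions."""
    feedback = 0
    for t in taps:
        feedback ^= (state >> (t - 1)) & 1
    return ((state << 1) | feedback) & ((1 << width) - 1)


def lfsr_bits(seed: int, taps, width: int, count: int):
    """Yield `count` pseudorandom bits (the register MSB) starting from `seed`."""
    if seed == 0 or seed >> width:
        raise ValueError("seed must be nonzero and fit in the register")
    state = seed
    for _ in range(count):
        yield (state >> (width - 1)) & 1
        state = lfsr_step(state, taps, width)


def lfsr_period(seed: int, taps, width: int) -> int:
    """Count clocks until the register returns to `seed` (small widths only)."""
    state, n = lfsr_step(seed, taps, width), 1
    while state != seed:
        state = lfsr_step(state, taps, width)
        n += 1
    return n


TAPS_4 = (4, 3)          # x^4 + x^3 + 1, period 15
TAPS_26 = (26, 6, 2, 1)  # assumed tap set; not taken from the paper

if __name__ == "__main__":
    print("4-bit period:", lfsr_period(0b0001, TAPS_4, 4))
    print("first 32 bits of the assumed 26-bit LFSR:",
          "".join(str(b) for b in lfsr_bits(1, TAPS_26, 26, 32)))
```

A nonzero seed is required because the all-zero state is a fixed point of an XOR-feedback register.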
B. Experimental Results
We compare the noise-immunity achieved by the mirror NT MAC and dynamic MAC. A general expression for ground bounce noise is [10]

$$L \left(\frac{di}{dt}\right)_{\max}$$

where $L$ is the inductance of the bonding wire, $C_{load}$ is the load capacitor, and $t_s$ is the gate switching time, which we assume to be approximately twice the gate delay. This is given by
From (9), the ground bounce noise on power supply increases with VDD. A higher error-free power supply voltage in the presence of ground bounce noise implies better noise-immunity. Hence, we tested the two MACs under different clock speeds and measured the maximum power supply voltage at which errors start appearing at the outputs. The experimental results are shown in Fig. 14 , where we observe that the maximum error-free power supply voltage increases with clock speed. This is because the available discharging time is reduced at a faster clock speed; thus, only those noise pulses with large amplitude can cause logic errors. On the other hand, as seen from (9), a higher power supply voltage will induce ground bounce noise pulses with larger amplitude. We calculate the relative noise-immunity improvement (RNI) from (9), normalized by the corresponding maximum error-free power supply voltages, as 
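The dependence of ground bounce on supply voltage discussed above can be illustrated with a common first-order estimate. The sketch below assumes a triangular supply-current pulse that delivers the charge $C_{load} V_{DD}$ within the switching time $t_s$, which gives a peak current slope of $4 C_{load} V_{DD} / t_s^2$. This textbook approximation is an assumption and is not claimed to be the paper's expression (9). In particular, it holds $t_s$ fixed, whereas the gate delay itself may vary with $V_{DD}$. The component values and the function name are illustrative.

```python
def ground_bounce(l_bond: float, c_load: float, v_dd: float, t_s: float) -> float:
    """Peak ground-bounce estimate L * (di/dt)_max for a triangular current pulse.

    The pulse carries charge c_load * v_dd over duration t_s and peaks at t_s / 2,
    so the peak current is 2*C*V/t_s and its slope is 4*C*V/t_s**2.
    """
    if t_s <= 0:
        raise ValueError("t_s must be positive")
    di_dt_max = 4.0 * c_load * v_dd / t_s ** 2
    return l_bond * di_dt_max


if __name__ == "__main__":
    # Illustrative values: 5 nH bond wire, 10 pF load, 1 ns switching time.
    for v_dd in (2.0, 2.5, 3.0, 3.3):
        v_n = ground_bounce(5e-9, 10e-12, v_dd, 1e-9)
        print(f"VDD = {v_dd:.1f} V -> estimated bounce = {v_n:.3f} V")
```

With $t_s$ held fixed the estimate is linear in $V_{DD}$, consistent with the statement that a higher supply voltage induces larger ground bounce pulses.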
V. CONCLUSION
In this paper, we have presented a new energy-efficient noise-tolerant dynamic circuit technique and a noise-immunity metric. The proposed technique can significantly improve the noise-immunity with a performance loss much less than the existing noise-tolerance techniques and static circuits. The proposed technique was employed in the design of a 0.35-µm CMOS MAC ASIC. The experimental results demonstrate the noise-immunity improvement over conventional dynamic circuits. Future work involves minimizing the performance penalty of the proposed technique and providing flexibility in terms of tuning the noise-immunity.
