Abstract-Certain logic functions, such as the control units of VLSI processors, are difficult to implement with random logic. Because programmable logic arrays (PLA's) can implement almost any Boolean function, they have become popular devices for realizing both combinational and sequential circuits. We present a low-power, high-speed complementary metal-oxide-semiconductor (CMOS) circuit implementation of a NOR-NOR PLA using a single-phased clock. Buffering static NAND gates are inserted between the NOR planes to eliminate the racing problem and shorten the duration of glitches, so that the dynamic power is reduced. The design also offers low static power dissipation, no ground switch, no charge sharing, and zero offset.
I. INTRODUCTION
PLA's can be implemented in either static or dynamic styles. The style is chosen according to the timing and power strategies. Modern CAD tools are required to support the integration of commonly used single-phased edge-triggered basic elements [1], including programmable logic arrays (PLA's). Before discussing the proposed PLA design, we list the shortcomings of several existing PLA design methods.
Pseudo-NMOS [5]: This is the simplest design style for realizing PLA's. Its main disadvantage is the dc-path dissipation. In addition, because the design is ratioed, both the PMOS and NMOS transistors have to be enlarged when the pull-up time is critical. Meanwhile, the ratioed design will reduce the speed.
Dynamic NOR-NOR [4], [5]: The major problem of this type of logic is the racing problem that arises when two dynamic logic gates are cascaded. The output of the first gate may wrongly turn the second gate ON or OFF, so that the final result is incorrect. Thus, it is necessary to generate a delayed clock for the second gate in order to prevent the racing problem, which reduces the operating speed. In addition, the ground switch produces a large parasitic capacitance, which further reduces the speed.
Domino [5]: In domino-logic design, the gates are all precharged and connected to the next stage through inverters. Although SOP domino circuits are excellent with regard to power saving, the series NMOS transistors of the front AND plane cause a large pull-down delay. In addition, the series NMOS transistors could cause charge-sharing problems.
Dhong's Design [3]: Dhong et al. proposed a PLA design approach which employs a precharged OR array and a charge-sharing AND array to eliminate the ground switch of the second gate. Since charge sharing is used, the output voltage V_OH can only reach approximately 3.0 V when V_dd is 5.0 V. The design therefore cannot provide a full voltage swing and suffers from a low noise margin. A delayed clock is also needed in order to prevent the racing problem. In addition to these shortcomings, the design requires capacitors, which implies large area consumption.
Blair's Design [2]: Blair replaced the usual AND plane with a predischarging pseudo-NMOS NOR plane in order to shorten the series NMOS transistors in the evaluation block. The PMOS load transistor is constrained by the sizing ratio, so that it is hard to drive a large capacitive load and the speed is reduced. In addition, the static power is increased during the evaluation period of the clock. (Notably, ratioed designs will reduce speed.)
We combine the dynamic, pseudo-NMOS, and domino-logic design styles to develop a low-power and high-speed design for PLA's using only one clock. The basic concept is to insert a buffering NAND gate between the two NOR planes in order to eliminate the ground switch, reduce the duration of dynamic power spikes, and avoid racing problems.
II. LOW-POWER AND HIGH-SPEED SINGLE-CLOCK PLA DESIGN

A. Low-Power and High-Speed (LP-HS) PLA Circuit
Referring to Fig. 1, all of the inputs of the first NOR plane are ANDed with the clock signal clk by using the triggered one-bit decoder [3] before they are fed into the gates of the evaluation block. Thus, during the precharging phase of the clock (clk = 0), the node p is charged high. A NAND gate is utilized as the buffer. The advantage of this NAND gate, as shown in Fig. 1, is that it precharges node q high while predischarging node r to ground, which prevents the racing problem.
When the clock turns high (clk = 1), the input is fed through the triggered one-bit decoder to the NMOS transistors in the evaluation block. In the meantime, the buffering NAND gate turns into an inverter. If the pull-down NMOS network resolves high, the node p is discharged, which keeps q and r high and low, respectively, and the state of output s remains unchanged. If the pull-down NMOS network resolves low, the node p remains high, which in turn flips q and r to low and high, respectively, and output s is then pulled to ground.
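The node behavior described above can be summarized in a small logic-level sketch. It is only a behavioral illustration, not a circuit simulation: it assumes the buffering NAND gate takes p and clk as its inputs and that r is the inverse of q, which is consistent with the states described in the text but is not read from Fig. 1. The function names `product_line` and `output_line` are hypothetical.

```python
def product_line(clk, literals):
    """Logic-level model of one product line (nodes p, q, r).

    clk      : 0 during precharge, 1 during evaluation
    literals : 0/1 values of the inputs connected to this line's NMOS gates
    """
    # The triggered one-bit decoder ANDs every input with clk.
    gated = [clk & x for x in literals]
    # Node p is precharged high and discharged if any NMOS conducts.
    p = 0 if any(gated) else 1
    # Assumed buffer: q = NAND(p, clk); acts as an inverter when clk = 1.
    q = 1 - (p & clk)
    # Assumed inverter from q to r (r is always the complement of q in the text).
    r = 1 - q
    return p, q, r


def output_line(clk, product_literals):
    """Node s of the second NOR plane fed by several product lines."""
    rs = [product_line(clk, lits)[2] for lits in product_literals]
    # s is precharged high and pulled to ground if any r is high.
    return 0 if any(rs) else 1


if __name__ == "__main__":
    for clk in (0, 1):
        for lits in ([0, 0], [1, 0]):
            print("clk =", clk, "inputs =", lits, "(p, q, r) =", product_line(clk, lits))
```

In this model r is low for every input pattern while clk = 0, so the second plane cannot be turned on before evaluation starts, which is the property the text relies on to avoid a delayed clock.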
B. Analysis of Speed and Power
Speed: The speed of the dynamic-style PLA depends on the discharging speed of nodes p and s. The buffering NAND gate charges node q high during the precharging phase, so that r is low and the pull-down NMOS network of the second gate is turned off before the evaluation phase. Thus, there is no racing problem, and the delayed clock can be eliminated to improve the speed.
Power: Because of the triggered one-bit decoder and the buffering NAND gate, there is no dc path from V_dd to ground. The most important factor regarding power dissipation is that the buffering NAND gate statistically reduces the switching activity in the PLA. Table I tabulates the switching activities of our design and of the NOR-NOR PLA.
Note that the switching activity of the other PLA's, except the domino PLA, is the same as that of the NOR-NOR PLA. Table I predicts that the power cost of our PLA decreases as the number of inputs increases. In contrast, the power cost of the other NOR-NOR style PLA's increases as the number of inputs increases.
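As a rough illustration of why the activity behind the buffering NAND gate can fall as the number of inputs grows, the sketch below enumerates the input patterns of one product line. It assumes independent, equally likely input bits and a line that connects all n inputs; these assumptions are made here for illustration only, and the values are not the entries of Table I. The function name `stay_high_probability` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def stay_high_probability(n):
    """Fraction of input patterns that leave a precharged n-input NOR node high.

    Assumes every pattern of the n connected inputs is equally likely.
    """
    patterns = list(product((0, 1), repeat=n))
    # The node stays high only if no NMOS conducts, i.e. all inputs are 0.
    stays_high = sum(1 for bits in patterns if not any(bits))
    return Fraction(stays_high, len(patterns))


if __name__ == "__main__":
    for n in range(1, 7):
        high = stay_high_probability(n)
        print("n =", n, "p stays high:", high, "p discharges:", 1 - high)
```

Under these assumptions, q and r flip only in the cases where p stays high, and that fraction shrinks as n grows, while the fraction of cases in which p discharges grows.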
C. Area Overhead
The total area overhead is (3n + 2m - ground switches - delayed clock circuits) transistors, where n is the number of inputs and m is the number of minterms. The 3n term results from the triggered one-bit decoders, each with three transistors, while the 2m term results from using a buffering NAND gate to replace the traditional buffering inverter.
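The overhead expression can be evaluated with a short helper. The transistor counts saved by removing the ground switches and the delayed clock circuits are not given in the text, so they appear here as hypothetical parameters that the user must supply; `area_overhead` is likewise a hypothetical name.

```python
def area_overhead(n, m, ground_switch_transistors=0, delayed_clock_transistors=0):
    """Transistor overhead: 3n + 2m minus the transistors that are removed.

    n : number of inputs (three extra transistors per triggered one-bit decoder)
    m : number of minterms (two extra transistors per NAND replacing an inverter)
    The two savings terms are placeholders; the text does not give their values.
    """
    added = 3 * n + 2 * m
    removed = ground_switch_transistors + delayed_clock_transistors
    return added - removed


if __name__ == "__main__":
    # Upper bound for a PLA with 4 inputs and 3 minterms, with no savings counted.
    print(area_overhead(4, 3))
```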
III. SIMULATION AND ANALYSIS
Speed (Delay) Simulations: In order to verify the proposed low-power high-speed PLA configuration, we conduct a series of PLA simulations and compare the results with those of the other PLA designs shown in Fig. 2. The different PLA designs are implemented in TSMC 0.6-μm SPTM technology with PMOS (W/L = 2.25/0.6) and NMOS (W/L = 0.9/0.6) transistors, except that the PMOS load used in the pseudo-NMOS PLA and Blair's PLA is ratioed to W/L = 0.9/1.2. Fig. 3 shows the timing responses of these PLA configurations. For the comparison, the output load of the first planes of the PLA's is assumed to be 0.5 pF, that of the ground switch 1.0 pF, that of the buffers 1.0 pF, and the output load of the PLA's 1.0 pF. The waveforms in Fig. 3 are simulated with CADENCE and HSPICE tools at V_dd = 5.0 V. The average delays of these PLA's are tabulated in Table II. The delay is measured from 2.5 V of the input voltage to 50% of the output voltage.
Our proposed PLA is the fastest circuit among all of the PLA design approaches. Notably, Dhong's design is a normally low operation, unlike the other designs: during the precharge period, the output of Dhong's design is low. Thus, the critical delay of Dhong's design is the rising edge delay, which is 22.9 ns, instead of the falling edge delay.
Power Dissipation Simulations: For the power consumption comparison, we also conduct a series of simulations which employ the Monte Carlo method of HSPICE. The number of sweeps is 30, and the signal frequency is 1.67 MHz (clock period = 600 ns). The power dissipation results are tabulated in Table III. The proposed PLA consumes the least power among these PLA design approaches other than the domino PLA. These results correspond to what we expect regarding dynamic power consumption when n increases. As for the comparison between our PLA and the domino PLA, although the domino PLA consumes less power when n increases, its pull-down delay grows because the number of series NMOS transistors in the evaluation block increases. If we consider the power-delay product as a measure, Table IV reveals the superiority of our PLA design.
IV. CONCLUSION
In short, the pseudo-NMOS PLA and Blair's PLA are ratioed designs and dissipate dc power; the NOR-NOR PLA and Dhong's PLA need a delayed clock; and the domino PLA has series NMOS transistors in its AND plane. They all have their individual problems. The proposed PLA configuration, which uses one NAND gate instead of one inverter between the product line and the output line, eliminates the ground switch. It also keeps the inputs of the second plane low before the evaluation phase, which prevents the racing problem and removes the need for delayed clocks. Thus, the speed is enhanced. The buffering NAND gate also reduces the switching probability, so that the dynamic power consumption becomes much smaller. This approach makes a low-power, high-speed PLA possible. Its performance is verified by the simulations.
I. INTRODUCTION
BiCMOS technologies are emerging as the next-generation techniques for digital VLSI circuits [1], [2]. They can also be a viable approach for analog circuits, improving system performance by combining both bipolar and CMOS technologies [3], [4]. Moreover, the trend toward higher device densities per unit chip area requires short-channel-length devices and, consequently, lower supply voltages in the VLSI chip. Thus, it is desirable to develop an analog integrated circuit suitable for low supply voltages. Multipliers [5], [6] are very important building blocks in many applications, such as adaptive filters, frequency doublers, and modulators. Some BiCMOS multipliers [7]-[9] have been presented, but few of them are suitable for low supply voltages. The triode-based multiplier can provide higher linearity and a smaller supply voltage [10]. In this paper, a new low-voltage BiCMOS four-quadrant multiplier using triode-region transistors is presented. It has an advantage over the circuits in [7], [9], which require some additional control circuitry to achieve the same goal. Experimental results are given to verify the theoretical analysis.
II. CIRCUIT DESCRIPTION
The proposed BiCMOS four-quadrant multiplier is shown in Fig. 1 
1057-7122/99$10.00 © 1999 IEEE
