Abstract: The performance of subthreshold source-coupled logic (STSCL) circuits for ultra-low-power applications is explored. It is shown that the power consumption of STSCL circuits can be reduced well below the level set by the subthreshold leakage current of static CMOS circuits. STSCL circuits exhibit a better power-delay performance than their static CMOS counterparts in situations where the leakage current constitutes a significant part of the power dissipation of static CMOS gates. The superior control of power consumption, in addition to the lower sensitivity to process and supply voltage variations, makes the STSCL topology very suitable for implementing ultra-low-power low-frequency digital systems in modern nanometer-scale technologies. An analytical approach for comparing the power-delay performance of these two topologies is proposed.
I. INTRODUCTION
TO OPTIMIZE the power consumption of integrated digital CMOS systems, different approaches have been proposed in the literature [1]. These techniques (e.g., multiple threshold voltage devices or various power management techniques [1]-[3]) can be used to reduce the system power dissipation with respect to the workload.
In ultra-low-power applications, where the power dissipation is a crucial parameter, the supply voltage V DD is generally reduced below the threshold voltage V T of metal-oxide-semiconductor (MOS) devices [4]. Reducing the supply voltage or choosing high-threshold-voltage (HVT) devices results in a smaller V eff = V DD − V T value and, hence, less power consumption [2]. However, reducing V eff reduces the ratio of the ON-current of a logic gate I ON to its leakage current I OFF, as shown in Fig. 1(a). Reduction in γ = I ON /I OFF results in degradation of reliability and power efficiency of the circuit, requiring special design techniques to implement robust logic operations [4].
The wide variation of circuit characteristics, such as the speed of operation and power dissipation, due to variations of process parameters, supply voltage, and temperature (PVT), is another important issue in the design of ultra-low-power digital circuits in modern nanometer-scale technologies [5]. The effects of such variations become more evident when the devices are biased in the subthreshold regime. Fig. 1(a) depicts the variation of γ for different process corner parameters, and Fig. 1(b) shows the variation of drain current versus temperature at different V GS values. As illustrated in Fig. 1(b), the variation in drain bias current increases as the operating point moves toward the subthreshold regime. Meanwhile, in this regime, the operation frequency and power consumption both depend exponentially on the supply voltage. Therefore, very accurate control of V DD is required. The design of such high-precision supply voltage control systems, however, becomes more challenging in battery-operated systems, where the power budget is very restricted and the battery voltage reduces with time.
The subthreshold source-coupled logic (STSCL) topology has recently been shown as an alternative approach for implementing ultra-low-power circuits [6], [7]. The accurate control of the power consumption of each gate makes this topology very suitable for very low bias current operations, where the dissipation of conventional static CMOS circuits is limited by their subthreshold leakage current. Meanwhile, the gate delay in this configuration does not depend on the supply voltage, and hence, there is a very low sensitivity to the supply voltage variations. The performance variation due to the PVT variations is also much less in this type of circuit compared with the static CMOS topology, as will be shown later in this brief.
Fig. 2. STSCL buffer (inverter) circuit schematic [7].
1549-7747/$25.00 © 2009 IEEE
In this brief, an analytical approach for analyzing and comparing the leakage and power-delay performance of CMOS and STSCL topologies is presented. In Section II, after a very short introduction to the STSCL topology, its main performance parameters are analyzed. In Section III, power-speed tradeoffs for the CMOS topology are studied, and Section IV provides a comparison between the two topologies.
II. PERFORMANCE ANALYSIS OF STSCL
A. STSCL Topology
Fig. 2 shows the topology of an STSCL circuit [6]. In this topology, the n-channel MOS (NMOS) switching transistors and the p-channel MOS (PMOS) load devices are biased in the subthreshold regime. To execute a Boolean operation, the voltage swing at the input and output of this circuit should be V SW > 4 · n n U T (n n is the subthreshold slope factor of the NMOS differential pair devices, and U T is the thermal voltage). When this constraint is satisfied, the circuit shown in Fig. 2 also provides enough gain for a successful logic operation [7]. To provide the required voltage swing at very low tail bias current values I SS, very high valued load resistances are required (R L = V SW /I SS). The load resistances should occupy a very small area and be well controllable, so that their resistance can be adjusted with respect to the tail bias current. In Fig. 2, PMOS transistors with shorted drain-bulk terminals have been utilized to implement the proposed high-resistance load devices [6]. Using small-size PMOS devices, this structure can be used to implement very high valued resistances with a relatively high voltage swing at the output. A replica bias circuit can be used to control the resistance of the load devices and, hence, adjust the output voltage swing with respect to the tail bias current [7]. The replica bias circuit also tracks the variations in temperature and supply voltage and, hence, compensates for their effect on the circuit performance.
B. Power-Speed Tradeoff in STSCL Circuits
In contrast to the CMOS gates, where there is no static power consumption (neglecting the leakage current), each STSCL gate draws a constant bias current of I SS from a supply source (Fig. 2). Therefore, the power consumption of each STSCL gate can be calculated by

$$P_{\mathrm{STSCL}} = V_{DD}\, I_{SS}. \tag{1}$$

Meanwhile, the time constant at the output node of each STSCL gate, i.e.,

$$\tau_{\mathrm{STSCL}} = R_L\, C_L = \frac{V_{SW}\, C_L}{I_{SS}} \tag{2}$$

is the main speed-limiting factor in this topology (C L is the total output loading capacitance). Based on (2), one can choose the proper I SS value to operate at the desired frequency. Since the power consumption and delay of each gate only depend on I SS , which can very precisely be controlled, this circuit exhibits a very low sensitivity to the process variations. Meanwhile, since the speed of operation in this case does not depend on the threshold voltage of the MOS devices, it is not necessary to use special process options to have low threshold voltage devices, as frequently used for static CMOS. Fig. 3 shows that the gate delay is adjustable in a very wide range proportional to the tail bias current. This figure shows that the tail bias current can be reduced to about 10 pA, where the forward bias current of the source-bulk diode of the PMOS load devices becomes comparable to I SS . Considering (1), it can also be concluded that the power consumption is constant and independent of the operation frequency. Therefore, it is necessary to use the STSCL circuits at their maximum activity rate to achieve the maximum achievable efficiency. It is also important to note that the gate delay does not depend on the supply voltage, whereas it linearly varies with the tail bias current. This property can be exploited for applications in which the supply can vary during the operation.
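The relations in this subsection can be sketched numerically. The following minimal Python example assumes P = V DD · I SS for the gate power and τ = R L · C L with R L = V SW /I SS for the output time constant, as described in the text. The function names and the operating point (V DD = 0.4 V, V SW = 0.2 V, C L = 10 fF) are hypothetical values chosen only for illustration and are not taken from the brief.

```python
import math


def stscl_gate(i_ss, v_dd, v_sw, c_load):
    """Return (power_W, tau_s, r_load_ohm) for one STSCL gate.

    Assumes P = V_DD * I_SS and tau = R_L * C_L with R_L = V_SW / I_SS.
    """
    r_load = v_sw / i_ss      # load resistance needed for the target swing
    tau = r_load * c_load     # output-node time constant
    power = v_dd * i_ss       # static power drawn by the tail current
    return power, tau, r_load


def bias_for_time_constant(tau_target, v_sw, c_load):
    """Tail bias current that yields the requested output time constant."""
    return v_sw * c_load / tau_target


if __name__ == "__main__":
    # Hypothetical operating point, for illustration only.
    power, tau, r_load = stscl_gate(i_ss=10e-12, v_dd=0.4, v_sw=0.2, c_load=10e-15)
    print(f"P = {power:.3e} W, tau = {tau:.3e} s, R_L = {r_load:.3e} ohm")
```

Under these assumptions, scaling I SS by some factor scales the power by the same factor and the time constant by its inverse, and V DD does not enter the time constant at all, which is the supply-independence of the gate delay noted above.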
Based on (1) and (2), the power-delay product (PDP) of each gate can be approximately calculated by

$$\mathrm{PDP}_{\mathrm{STSCL}} \approx \ln 2 \cdot V_{DD}\, V_{SW}\, C_L \tag{3}$$

which is directly proportional to the supply voltage, the voltage swing at the output of the gate, and the total load capacitance.
To have a better understanding of the power-speed tradeoff in the STSCL configuration, consider a simple STSCL circuit constructed of N cascaded identical gates (indeed, N is the logic depth) that is operating at a frequency of f op. Using (1) and (2), it can be shown that the total power consumption of this chain will be

$$P_{\mathrm{STSCL}} \approx 2 \ln 2 \cdot N^{2}\, V_{SW}\, V_{DD}\, C_L\, f_{op} \tag{4}$$

which quadratically increases with the logic depth and linearly increases with the operation frequency.
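The quadratic dependence on logic depth can be checked with a short sketch. It rests on two assumptions that are not stated explicitly in the text: the gate delay is taken as t d ≈ ln 2 · τ (first-order RC settling), and the whole chain is required to settle within half a clock period, N · t d = 1/(2 f op). The function name and numerical values are hypothetical.

```python
import math


def stscl_chain_power(n_gates, f_op, v_dd, v_sw, c_load):
    """Total power of N cascaded STSCL gates clocked at f_op.

    Assumptions (illustrative): t_d = ln(2) * tau per gate, and the
    chain must settle in half a period, N * t_d = 1 / (2 * f_op).
    """
    t_d = 1.0 / (2.0 * n_gates * f_op)   # delay budget per gate
    tau = t_d / math.log(2)              # required output time constant
    i_ss = v_sw * c_load / tau           # tail current from tau = V_SW * C_L / I_SS
    return n_gates * v_dd * i_ss         # N gates, each drawing I_SS from V_DD


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for n in (5, 10, 20):
        p = stscl_chain_power(n, f_op=1e3, v_dd=0.4, v_sw=0.2, c_load=10e-15)
        print(f"N = {n:2d}: P = {p:.3e} W")
```

Each gate must become N times faster as the chain grows, and there are N gates, so the total power grows as N² under these assumptions, while it grows only linearly with f op.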
C. Process and Temperature Variation
Considering (4), it can be concluded that the device parameters, particularly the threshold voltage, do not influence the speed-power consumption tradeoff in the STSCL topology. As mentioned before, the replica bias circuit (which is used to generate and adjust the gate voltage of the PMOS load devices) compensates for the effect of temperature variations [7]. Therefore, this topology exhibits a very low sensitivity to PVT variations. Fig. 4 shows the simulated gate delay versus load capacitance at different temperatures. It can be seen that the variation in gate delay due to the temperature variations is less than 4%.
D. Minimum Supply Voltage
Since the devices are biased in weak inversion, it is possible to use HVT devices in STSCL circuits without affecting the speed of operation. The minimum supply voltage of an STSCL gate is (Fig. 2)

$$V_{DD,\min} = V_{GS,1} + V_{CS} \tag{5}$$

where V CS is the required headroom for the current source.
Since all the devices are in subthreshold, V CS ≥ 4U T. Meanwhile, V GS,1 = V T0 + n n U T ln(I SS /I 0) (V T0 stands for the threshold voltage of M1-M2, and I 0 is the corresponding current parameter defined in [8]). Notice that for complete switching, V GS,1 should always be larger than V SW, i.e., V GS,1 > V SW. Therefore, assuming V SW ≈ 6U T, the minimum supply voltage will be

$$V_{DD,\min} \approx 10\, U_T. \tag{6}$$

Measurements show that it is possible to reduce the supply voltage of an (8 × 8) multiplier implemented based on the STSCL topology down to 350 mV [7] .
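As a rough numerical check of the headroom argument, the sketch below evaluates the bounds quoted in the text (V CS ≥ 4U T, V GS,1 > V SW ≈ 6U T) at room temperature. It assumes the supply floor is the sum of the current-source headroom and the gate-source voltage of the switching pair; the function names are hypothetical, and the physical constants are the exact SI values.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C


def thermal_voltage(temp_k=300.0):
    """Thermal voltage U_T = k*T/q in volts."""
    return K_B * temp_k / Q_E


def v_gs_subthreshold(i_ss, i_0, v_t0, n_n, u_t):
    """V_GS,1 = V_T0 + n_n * U_T * ln(I_SS / I_0), as given in the text."""
    return v_t0 + n_n * u_t * math.log(i_ss / i_0)


def v_dd_floor(u_t, cs_headroom_factor=4.0, swing_factor=6.0):
    """Lower bound on V_DD, assuming V_DD,min = V_CS + V_GS,1 with
    V_CS >= 4*U_T and V_GS,1 > V_SW ~ 6*U_T (illustrative assumption)."""
    return (cs_headroom_factor + swing_factor) * u_t


if __name__ == "__main__":
    u_t = thermal_voltage(300.0)
    print(f"U_T at 300 K      = {u_t * 1e3:.2f} mV")
    print(f"V_DD floor (10 U_T) = {v_dd_floor(u_t) * 1e3:.1f} mV")
```

At 300 K this bound is roughly 0.26 V, which sits below the 350-mV supply reported for the measured multiplier, as one would expect for a lower bound.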
III. PERFORMANCE ANALYSIS OF CMOS LOGIC CIRCUITS
The required power consumption of a chain of N STSCL gates operating at a frequency of f op was calculated in (4). Similar to that case, consider a chain of identical CMOS gates. Fig. 5(a) illustrates the proposed test structure, and Fig. 5(b) depicts the simplified waveform of the current drawn from a supply source by a single gate. The peak current I peak and leakage current I leak drawn from supply by the proposed logic cell both depend on V DD and the size ratio of devices. Meanwhile, I peak depends on the transition time at the input of the proposed gate. To simplify the calculations, we are assuming that the transition time at the input of each gate is comparable to the intrinsic transition time at the output of that gate when it drives C L . This assumption is very close to reality when the logic depth is high. With this constraint, I peak will only depend on V DD .
The root-mean-square (RMS) power consumption of this circuit shown in Fig. 5(a) can be calculated by
Considering the simplified waveform of Fig. 5(b) for the supply current, the total RMS power consumption of the circuit will be
where α = f op /f max represents the activity rate, f max = 1/(2t d ) is the maximum operation frequency of a single gate,
Here, η is used to take into account that the supply current only depends on the current that is used for charging the load capacitances. As expected, the minimum power consumption of the circuit is determined by the leakage current when the activity rate is very low (α ≈ 0). Fig. 6 illustrates the power consumption versus speed of operation (or activity rate) as predicted by (8) . By increasing the logic depth, the total power consumption proportionally scales up, whereas the maximum speed of operation reduces by the same factor. Based on (8) , it can be found that for activity rates smaller than a "critical activity rate" α C , which is given by
the subthreshold leakage power consumption will be dominant, whereas for higher activity rates, the dynamic power consumption comprises the main part of the power consumption. Since α C is proportional to 1/γ^2 = (I leak /I peak)^2, α C increases quadratically as γ decreases. This means that in more advanced CMOS technologies, the contribution of the leakage current will be more evident, and α C will be higher. As illustrated in Fig. 7, α C considerably increases by moving toward technologies with smaller feature sizes. While in a 0.18-μm CMOS technology, α C ≈ 10^−4 for V DD = 0.2 V, it increases by almost four orders of magnitude in a 65-nm CMOS technology at the same supply voltage (for N = 10).
Based on Fig. 5(b) , the maximum operating frequency of a CMOS gate f max can be estimated by
Although this is a simplified relationship, it can predict f max with good accuracy. To complete the calculations, it is necessary to estimate the peak and leakage currents. The Enz-Krummenacher-Vittoz (EKV) model can provide a general expression for the drain current of MOS devices operating in different regions and at different supply voltages [8]. Using the EKV model, it is possible to calculate the peak and leakage currents at V GS = V DD and V GS = 0 V, respectively. Fig. 8 depicts the peak and leakage currents for a CMOS inverter gate designed in a 65-nm technology. It is noticeable that the leakage current does not reduce exponentially with the supply voltage when the devices operate in subthreshold. This implies that reducing the supply voltage does not help very much in reducing the leakage current. The other important parameter is γ = I peak /I leak, which is an indicator of the power efficiency in the CMOS topology. While γ ≈ 10^4 for V DD > 0.6 V, it rapidly reduces as the supply voltage is reduced, and ultimately, it gets close to unity for very low supply voltages. In addition to (8), the EKV model provides the necessary information to estimate the power consumption versus speed of operation for the CMOS topology.
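The behavior of γ with supply voltage can be illustrated with a simplified long-channel form of the EKV drain-current interpolation. This is only a sketch: it approximates the pinch-off voltage as V P ≈ (V G − V T0)/n, ignores short-channel effects such as drain-induced barrier lowering, and uses hypothetical device parameters (V T0 = 0.4 V, n = 1.3, specific current 1 μA) that are not taken from the 65-nm technology discussed in the text.

```python
import math


def ekv_drain_current(v_g, v_d, v_s=0.0, v_t0=0.4, n=1.3,
                      i_spec=1e-6, u_t=0.02585):
    """Simplified long-channel EKV drain current (bulk-referred voltages).

    I_D = I_spec * (F(V_S) - F(V_D)), F(v) = ln^2(1 + exp((V_P - v) / (2*U_T))),
    with the pinch-off voltage approximated as V_P = (V_G - V_T0) / n.
    All default parameter values are hypothetical.
    """
    v_p = (v_g - v_t0) / n

    def interp(v):
        return math.log1p(math.exp((v_p - v) / (2.0 * u_t))) ** 2

    return i_spec * (interp(v_s) - interp(v_d))


def on_off_ratio(v_dd, **device):
    """gamma = I_peak / I_leak with V_DS = V_DD in both cases."""
    i_peak = ekv_drain_current(v_g=v_dd, v_d=v_dd, **device)  # V_GS = V_DD
    i_leak = ekv_drain_current(v_g=0.0, v_d=v_dd, **device)   # V_GS = 0 V
    return i_peak / i_leak


if __name__ == "__main__":
    for v_dd in (0.05, 0.1, 0.2, 0.4, 0.6):
        print(f"V_DD = {v_dd:.2f} V -> gamma = {on_off_ratio(v_dd):.3g}")
```

In this simplified model γ approaches unity as V DD tends to zero and grows rapidly with V DD, which reproduces the qualitative trend described above; the absolute values depend entirely on the assumed parameters.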
The analysis done in this section does not depend on the type of logic cell used in the test structure shown in Fig. 5 , and it is sufficient to use the I peak and I leak values corresponding to the proposed logic circuit to complete the discussion.
IV. PERFORMANCE COMPARISON
Using (4) and (8), it is possible to compare the power consumption of two chains of identical gates with a logic depth of N that are constructed based on CMOS and STSCL topologies. Based on this comparison, the maximum logic depth for which the STSCL topology exhibits lower power consumption compared with the CMOS topology is
where V DD is the supply voltage of the CMOS circuit. Fig. 9 compares the power consumption of CMOS and STSCL XOR gates for a logic depth of 20 as a function of the operation frequency based on simulation results in 65-nm CMOS. It can clearly be seen that the power consumption of CMOS gates cannot be reduced beyond a certain level due to leakage (both for low-threshold-voltage (LVT) and HVT cases), whereas the STSCL topology offers smaller power consumption below the crossover frequency.
The maximum logic depth for which an STSCL circuit with an operating frequency of f op consumes less power compared with its CMOS counterpart is shown in Fig. 9(b) for a 65-nm CMOS technology. The comparison has been made for XOR gates, and the simulation results have been depicted for both HVT and LVT devices. As expected, increasing the logic depth reduces the efficiency of the STSCL topology. However, for low supply voltages or at low operation frequencies, where the leakage current is more evident, STSCL starts to exhibit better performance. This can also be concluded from (11). On the other hand, Fig. 9 and (11) imply that as the operation frequency reduces, N max increases, and hence, the power efficiency of STSCL increases in comparison with CMOS. In other words, in nanometer-scale technologies, where the subthreshold leakage current in the CMOS topology is more evident, the STSCL topology can offer a more power-efficient solution, even at low activity rates (or, equivalently, for higher logic depths). This is in addition to the superior power-delay performance of the SCL topology at very high activity rates [7]. Fig. 9(b) also shows that with HVT devices, the power efficiency of the CMOS topology improves. However, the main issue with HVT devices is that they cannot be used at very low supply voltages, mainly because of reliability issues.
Fig. 10 shows the measurement results for two (8 × 8) array multipliers designed based on CMOS and STSCL topologies. The test circuits are implemented in a 0.18-μm CMOS technology, where the leakage current is much less than in 65-nm CMOS. As depicted in Fig. 10, for frequencies below 80 kHz, the STSCL topology consumes less power and exhibits less variation due to the process and temperature differences. As predicted in Fig. 9, it is expected that in more advanced technologies, the crossover frequency increases.
V. CONCLUSION
An analytical approach for studying and comparing the performance of ultra-low-power CMOS and STSCL circuits has been presented. While there is a tight tradeoff among the power consumption, speed of operation, and supply voltage in the design of CMOS digital circuits, the STSCL topology provides a more flexible design option for ultra-low-power applications. The frequency range in which the STSCL topology exhibits a superior performance over the static CMOS topology depends on the logic depth and the leakage current in CMOS circuits. Although the STSCL topology occupies more area and its supply voltage cannot be reduced below 10U T, it can successfully be utilized for reducing the power consumption of digital systems well below the levels limited by the CMOS subthreshold leakage current when the circuit operates at low frequencies.
