Abstract-The large subthreshold leakage current of static CMOS logic circuits designed in modern nanometer-scale technologies is one of the main barriers to implementing ultra-low power digital systems. Subthreshold source-coupled logic (STSCL) circuits are based on an NMOS differential pair that switches a constant tail bias current between the two output branches while biased at very low current levels. The power consumption of each STSCL gate depends on the tail bias current, which can be controlled very well even at current levels in the range of a few tens of picoamperes. This precise control over the power consumption of each gate makes the topology very attractive for ultra-low power applications, where the power consumption of conventional static CMOS systems is practically limited by the subthreshold leakage current. In this work, an analytical approach supported by simulation and measurement results is presented to study the main issues in the design of ultra-low power static CMOS and STSCL systems.
I. INTRODUCTION
The energy consumption of static CMOS circuits can be minimized effectively by reducing the supply voltage. In most nanometer-scale applications where power consumption is crucial, the supply voltage is generally reduced below the threshold voltage of the MOS devices [1]. However, the presence of different leakage current sources (mainly subthreshold leakage current) in CMOS gates prevents continuous reduction of power consumption by reducing the supply voltage [2]. Indeed, at low supply voltages the ratio of the on current of the logic gate ($I_{ON}$) to the leakage current ($I_{OFF}$) becomes very small, and the power efficiency as well as the reliability of the static CMOS topology diminish. Using devices with high threshold voltage (HVT) is a remedy to control the subthreshold leakage current. The main issue with high threshold voltage devices is that the delay times tend to increase significantly, and as a result the minimum supply voltage needs to be increased. A sufficiently high supply voltage is especially important in the design of sequential logic circuits such as flip-flops or memory cells. Therefore, one might use different types of devices for different purposes.
In addition to the tight tradeoffs among power dissipation ($P_{diss}$), speed of operation ($f_{op}$), and supply voltage ($V_{DD}$), the very wide process variations of the device characteristics should also be considered, especially for nanometer-scale technologies [1]. Meanwhile, the exponential dependence of the operating frequency and power consumption on the supply voltage in the subthreshold regime requires very careful control of $V_{DD}$ [3]. In addition to precise control of $V_{DD}$, the supply system needs to be robust against very large current spikes. The design of such control systems becomes more critical in battery-operated systems, where the power budget is very restricted and the battery voltage drops over time.
Recently, subthreshold source-coupled logic (STSCL) circuits have been proposed for implementing ultra-low power applications [4], [5]. The proposed topology is based on a differential NMOS switching network that performs the logic operation, together with very compact, high-value load resistances. The current consumption of each cell can be controlled very accurately through the tail bias transistor. This precise control of the gate current consumption provides the opportunity to reduce the bias current of each cell well below the subthreshold leakage current of CMOS logic circuits.
Section II provides a short overview of the STSCL topology. The main focus of this article is to compare the performance of the CMOS and STSCL topologies, based on the analyses provided in Sections II and III; the comparison is summarized in Section IV.
II. PERFORMANCE ANALYSIS OF SUBTHRESHOLD SOURCE-COUPLED LOGIC
In this section, after a short overview of source-coupled logic (SCL) and subthreshold SCL (STSCL) circuits [4], the main constraints in the design of STSCL circuits operating with ultra-low power consumption will be studied [5].
A. STSCL Topology
Figure 1 shows the topology of a subthreshold SCL circuit [4]. In this topology, all transistors are biased in the subthreshold regime. To obtain a successful Boolean operation, the voltage swing at the input and output of this circuit should be $V_{SW} > 4 n_n U_T$ [6], where $n_n$ is the subthreshold slope factor of the NMOS differential pair devices and $U_T = kT/q$ is the thermal voltage ($k$ is Boltzmann's constant, $T$ is the junction temperature in kelvin, and $q$ is the elementary charge). When this constraint is satisfied, the circuit shown in Fig. 1 also provides enough gain for successful logic operation [5]. To provide the required voltage swing at very low tail bias current values ($I_{SS}$), very high-value load resistances are required ($R_L = V_{SW}/I_{SS}$). This load resistance should occupy a very small area and be well controllable, so that its resistance can be adjusted with respect to the tail bias current. In Fig. 1, PMOS transistors with shorted drain-bulk terminals have been used to implement the proposed high-resistance load devices. Using small PMOS devices, this structure can implement very high-value resistances with a relatively high voltage swing at the output. A replica bias circuit can be used to control the load resistance and hence adjust the output voltage swing with respect to the tail bias current [5].
Fig. 1. Subthreshold CMOS SCL buffer (inverter) circuit schematic [5].
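As a rough numerical illustration of the swing constraint $V_{SW} > 4 n_n U_T$ and the load relation $R_L = V_{SW}/I_{SS}$, the following Python sketch evaluates both for sample values. The slope factor of 1.3, the temperature of 300 K, the 0.2 V swing, and the 10 pA tail current are illustrative assumptions rather than values taken from this work, and the function names are hypothetical.

```python
# Physical constants (exact SI values)
K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C


def thermal_voltage(temp_kelvin=300.0):
    """Thermal voltage U_T = kT/q in volts."""
    return K_BOLTZMANN * temp_kelvin / Q_ELECTRON


def min_voltage_swing(n_n=1.3, temp_kelvin=300.0):
    """Lower bound on V_SW from V_SW > 4 * n_n * U_T (n_n is an assumed value)."""
    return 4.0 * n_n * thermal_voltage(temp_kelvin)


def load_resistance(v_sw, i_ss):
    """Load resistance R_L = V_SW / I_SS in ohms."""
    if i_ss <= 0.0:
        raise ValueError("tail bias current must be positive")
    return v_sw / i_ss


# Assumed example: 0.2 V swing at a 10 pA tail current.
print("U_T        =", thermal_voltage(), "V")
print("V_SW bound =", min_voltage_swing(), "V")
print("R_L        =", load_resistance(0.2, 10e-12), "ohm")
```

With these assumed numbers the required load is on the order of tens of gigaohms, which illustrates why a compact, bias-controlled PMOS load is used instead of a passive resistor.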
B. Power-Speed Tradeoff in STSCL
In contrast to CMOS gates, in which there is no static power consumption (neglecting the leakage current), each STSCL gate draws a constant bias current $I_{SS}$ from the supply (Fig. 1). Therefore, the power consumption of each STSCL gate can be calculated by
$$P_{diss} = V_{DD} \cdot I_{SS} \qquad (1)$$
Meanwhile, the time constant at the output node of each STSCL gate, i.e.
$$\tau = R_L C_L = \frac{V_{SW} C_L}{I_{SS}} \qquad (2)$$
is the main speed-limiting factor in this topology ($C_L$ is the total output load capacitance). Based on (2), one can choose the proper $I_{SS}$ value to operate at the desired frequency. From (1), it can be concluded that the power consumption is constant and independent of the operating frequency. Therefore, STSCL circuits should always be operated at their maximum activity rate to achieve the maximum efficiency. It is also notable that the gate delay does not depend on the supply voltage, while it is inversely proportional to the tail bias current. This property can be exploited in applications where the supply can vary during operation. Based on (1) and (2), and taking the gate delay as $t_d \approx \ln 2 \cdot \tau$, the power-delay product (PDP) of each gate can be approximately calculated by
$$PDP \approx \ln 2 \cdot V_{DD} \cdot V_{SW} \cdot C_L \qquad (3)$$
which is directly proportional to the supply voltage, voltage swing at the output of the gate, and the total load capacitance.
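The tradeoff described above can be checked numerically. The sketch below assumes the gate power $V_{DD} I_{SS}$, the output time constant $V_{SW} C_L / I_{SS}$, and a first-order delay of $\ln 2$ times that time constant (the last point is an assumption of this example). The supply voltage, swing, load capacitance, and bias currents are hypothetical sample values.

```python
import math


def stscl_gate_metrics(v_dd, v_sw, c_load, i_ss):
    """Return (power, tau, delay, pdp) for one STSCL gate.

    power = V_DD * I_SS            static power of the gate
    tau   = V_SW * C_L / I_SS      output time constant (R_L = V_SW / I_SS)
    delay = ln(2) * tau            assumed first-order RC delay
    pdp   = power * delay          power-delay product
    """
    power = v_dd * i_ss
    tau = v_sw * c_load / i_ss
    delay = math.log(2.0) * tau
    return power, tau, delay, power * delay


# Assumed sample values: 0.4 V supply, 0.2 V swing, 10 fF load.
for i_ss in (10e-12, 1e-9):
    power, tau, delay, pdp = stscl_gate_metrics(0.4, 0.2, 10e-15, i_ss)
    print(f"I_SS={i_ss:.1e} A  P={power:.3e} W  t_d={delay:.3e} s  PDP={pdp:.3e} J")
```

Under these assumptions, scaling the tail current by a factor of 100 scales the power up and the delay down by the same factor, so the PDP stays unchanged and depends only on the supply voltage, the swing, and the load capacitance.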
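To better understand the power-speed tradeoff in the STSCL configuration, consider a simple STSCL circuit constructed of $N$ cascaded identical gates ($N$ being the logic depth) that operates at a frequency of $f_{op}$. Using (1) and (2), with the delay of each gate ($t_d \approx \ln 2 \cdot \tau$) limited to $1/(2 N f_{op})$, it can be shown that the total power consumption of the chain will be:
$$P_{diss} \approx 2 \ln 2 \cdot N^2 \cdot V_{DD} \cdot V_{SW} \cdot C_L \cdot f_{op} \qquad (4)$$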
which increases quadratically with the logic depth and linearly with the operating frequency. It is important to note that the speed of operation in the STSCL configuration does not depend on the device threshold voltage. Indeed, since the devices are biased in weak inversion, it is possible to use high threshold voltage (HVT) devices without affecting the speed of operation. The minimum supply voltage of an STSCL gate is $V_{DD,min} = V_{CS} + V_{GS1}$, in which $V_{CS}$ is the required headroom for the current source. Since all the devices are in subthreshold, $V_{CS}$ needs to be only a few $U_T$ [7]. Notice that for complete switching, $V_{GS1}$ should always be larger than $V_{SW}$ ($V_{GS1} > V_{SW}$). Therefore, assuming $V_{SW} \approx 6 U_T$, the minimum supply voltage will be:
$$V_{DD,min} \approx V_{CS} + V_{SW} \approx V_{CS} + 6 U_T \qquad (5)$$
which is again independent of device threshold voltage. Therefore, using low threshold voltage transistors does not always help to reduce the supply voltage, especially for the STSCL topology.
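The quadratic dependence of the chain power on logic depth can be illustrated with a short sketch. It rests on two assumptions made for this example: the gate delay is taken as $\ln 2$ times the output time constant $V_{SW} C_L / I_{SS}$, and the $N$ gate delays must fit within half a clock period, mirroring the definition $f_{Max} = 1/(2 t_d)$ used for a single gate in Section III. The function names and sample values are hypothetical.

```python
import math


def required_tail_current(n_depth, f_op, v_sw, c_load):
    """Tail current per gate so that N gate delays fit in half a period.

    Assumes t_d = ln(2) * V_SW * C_L / I_SS and N * t_d = 1 / (2 * f_op).
    """
    t_d = 1.0 / (2.0 * n_depth * f_op)
    return math.log(2.0) * v_sw * c_load / t_d


def chain_power(n_depth, f_op, v_dd, v_sw, c_load):
    """Total static power of N identical STSCL gates, each drawing I_SS."""
    i_ss = required_tail_current(n_depth, f_op, v_sw, c_load)
    return n_depth * v_dd * i_ss


# Assumed sample values: 0.4 V supply, 0.2 V swing, 10 fF load, 100 kHz.
for depth in (5, 10, 20):
    p = chain_power(depth, 100e3, 0.4, 0.2, 10e-15)
    print(f"N={depth:2d}  P_chain={p:.3e} W")
```

Doubling the logic depth quadruples the chain power in this model: each gate needs twice the tail current to meet the tighter per-gate delay, and there are twice as many gates.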
C. Improving Power-Speed Performance
Several techniques can be employed to improve the performance of STSCL circuits in terms of power consumption and speed of operation. A simple approach is to use source-follower buffers at the output of each gate [8]. This technique helps to reduce the circuit PDP by a factor of about two. Fine-grained pipelining is another approach, which can reduce the system PDP by a factor of about $N/2$ [5]. In addition to these techniques, compound STSCL gates built from stacked NMOS switching networks can improve the system performance even further [5].
III. PERFORMANCE ANALYSIS OF CMOS LOGIC CIRCUITS
The static CMOS topology has been widely used for implementing digital systems for different applications and different specifications [9]. The main focus of this section is to study the performance of the CMOS topology for ultra-low power applications and to develop a suitable basis for comparing its performance with that of the STSCL topology.
A. Power-Speed Tradeoff in CMOS
Developing a general model for the analysis of $P_{diss}$ and $f_{op}$ of the CMOS topology is not straightforward. The complex dynamic behavior of MOS devices, especially when interacting with other CMOS gates, makes it very difficult to obtain a closed-form model [9]. In this section, we develop an approximate model for a simple test structure that is sufficiently accurate for studying the behavior and for comparison purposes. As shown in (4), one can simply calculate the required power consumption of a chain of $N$ STSCL gates operating at frequency $f_{op}$. Similarly, consider a chain of identical CMOS gates. For simplicity, we assume that the transition time of the input signal is equal to the inherent transition time of each CMOS gate at the supply voltage at which the circuit is operating. Figure 2(a) illustrates the proposed test structure and Fig. 2(b) depicts the simplified waveform of the current drawn from the supply by a single gate. The peak current ($I_{peak}$) and the leakage current ($I_{leak}$) both depend on $V_{DD}$ and on the size of the devices. Based on Fig. 2(b), the total rms power consumption of the circuit will be:
where $\alpha = f_{op}/f_{Max}$ represents the activity rate of the proposed circuit, $f_{Max} = 1/(2 t_d)$ is the maximum operating frequency of a single gate, $\gamma = I_{peak}/I_{leak}$, and:
Here, $\eta$ is used to take into account that the supply current only depends on the current that is used for charging the load capacitances. As expected, the minimum power consumption of the circuit is determined by the leakage current when the activity rate is zero ($\alpha = 0$). At higher operating frequencies, where the dynamic power consumption becomes dominant, the power dissipation is proportional to the square root of the operating frequency. Figure 3 illustrates the power consumption versus speed of operation (or activity rate) as predicted by (6). As the logic depth increases, the total power consumption scales up proportionally while the maximum speed of operation is reduced by the same factor. Based on (6), for activity rates smaller than
the subthreshold leakage power consumption will be dominant, while for higher activity rates the dynamic power consumption makes up the main part of the power consumption. Since $\alpha_C$ is proportional to $1/\gamma^2 = (I_{leak}/I_{peak})^2$, $\alpha_C$ increases quadratically as $\gamma$ decreases. This means that in more advanced CMOS technologies the contribution of the leakage current will be more dominant and $\alpha_C$ will be higher.
The maximum operating frequency of a CMOS gate ($f_{Max}$) can be estimated by:
To complete the calculations, it is necessary to estimate the peak and leakage currents. The EKV model provides a general expression for the drain current of MOS devices operating in different regions and at different supply voltages [7]. Based on the EKV model, it is possible to calculate the peak and leakage currents at $|V_{GS}| = V_{DD}$ and $|V_{GS}| = 0$ V, respectively. Figure 4 depicts the peak and leakage currents for an inverter gate designed in a 65 nm technology. It is notable that the leakage current does not decrease exponentially as the supply voltage is reduced when the devices are in subthreshold. This is mainly due to the finite output impedance of the gates. It implies that reducing the supply voltage does not help very much to reduce the inverter leakage current. The other important parameter is $\gamma = I_{peak}/I_{leak}$, which is an indicator of power efficiency in the CMOS topology. While $\gamma \approx 10^4$ for $V_{DD} > 0.6$ V, it drops rapidly as the supply voltage is reduced and ultimately approaches unity for very low supply voltages. Together with (6), the EKV model provides the information needed to estimate the power consumption versus speed of operation for the CMOS topology.
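The behavior of $\gamma = I_{peak}/I_{leak}$ versus supply voltage can be sketched with the basic long-channel EKV interpolation for the drain current, $I_D = I_S [\ln^2(1 + e^{(V_P - V_S)/2U_T}) - \ln^2(1 + e^{(V_P - V_D)/2U_T})]$ with $V_P \approx (V_G - V_{T0})/n$. This is a simplified form that omits DIBL and other short-channel effects, so it is not expected to reproduce Fig. 4. The threshold voltage, slope factor, and specific current below are hypothetical values, not parameters of the 65 nm process used in this work.

```python
import math

U_T = 0.02585  # thermal voltage near room temperature, in volts


def ekv_drain_current(v_g, v_d, v_s=0.0, v_t0=0.3, n=1.3, i_spec=1e-6):
    """Basic long-channel EKV drain current of an NMOS device (bulk-referenced).

    v_t0, n and i_spec are assumed illustrative parameters.
    """
    v_p = (v_g - v_t0) / n  # pinch-off voltage approximation

    def inversion_term(v):
        # ln^2(1 + exp((V_P - V) / (2 U_T))): forward or reverse component
        return math.log1p(math.exp((v_p - v) / (2.0 * U_T))) ** 2

    return i_spec * (inversion_term(v_s) - inversion_term(v_d))


def on_off_ratio(v_dd):
    """gamma = I_peak / I_leak with V_GS = V_DD and V_GS = 0, both at V_DS = V_DD."""
    i_peak = ekv_drain_current(v_dd, v_dd)
    i_leak = ekv_drain_current(0.0, v_dd)
    return i_peak / i_leak


for v_dd in (0.05, 0.1, 0.2, 0.3, 0.4, 0.6):
    print(f"V_DD={v_dd:.2f} V  gamma={on_off_ratio(v_dd):.3e}")
```

Even in this simplified model the ratio collapses toward unity at very low supply voltages, which is the trend the text describes: the on current falls exponentially once $V_{DD}$ drops below threshold, while the leakage current changes comparatively little.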
IV. PERFORMANCE COMPARISON
Using (4) and (6), it is possible to compare the power consumption of two chains of identical gates with a logic depth of $N$ that are constructed based on the CMOS and STSCL topologies. Based on this comparison, the maximum logic depth for which the STSCL topology exhibits lower power consumption than the CMOS topology is:
where $V_{DD}$ is the supply voltage of the CMOS circuit and $F$ depends on the supply voltage and voltage swing of the STSCL circuit as:
The maximum logic depth for which an STSCL circuit with an operating frequency of $f_{op}$ consumes less power than its CMOS counterpart is shown in Fig. 5. These results have been obtained for a 65 nm CMOS technology. Based on this figure, for applications that operate at frequencies below 100 kHz and a $V_{DD}$ of 300 mV, the STSCL topology with a logic depth of more than 10 will have an advantage in terms of power dissipation. Figure 6 shows the measurement results for two (8×8) multipliers designed based on the CMOS and STSCL topologies. The test circuits are implemented in a 0.18 µm CMOS technology, where the leakage current is much lower than in 65 nm CMOS. As depicted in Fig. 6, for frequencies below 80 kHz the STSCL topology consumes less power and exhibits less variation with process and temperature. The lower sensitivity of the STSCL topology to process variation can be traced back to (2), which shows that the delay of an STSCL gate does not depend on device parameters such as the threshold voltage. Meanwhile, it is expected that in more advanced technologies, where the subthreshold leakage current is more pronounced, the cross-over points in Fig. 6 will move toward higher frequencies.
V. CONCLUSION
An analytical approach for studying and comparing the performance of ultra-low power CMOS and STSCL circuits has been presented. While there is a tight tradeoff among power consumption, speed of operation, and supply voltage in the design of CMOS digital circuits, the STSCL topology provides a more convenient design opportunity for ultra-low power applications. It is shown that the frequency range in which the STSCL topology exhibits superior performance compared to the CMOS topology depends on the logic depth and also on the leakage current of the CMOS gates.
