Abstract-This article explores the effect of using source follower buffers (SFBs) at the output of source coupled logic (SCL) circuits. This technique can improve the power-delay product (PDP) of an SCL gate by approximately a factor of two. The proposed approach has been applied to improve the PDP of sub-threshold SCL circuits that have been developed for ultra-low power applications. Designed in a conventional digital 0.18 μm CMOS technology, the proposed SCL gate utilizing an SFB at the output achieves a PDP of 0.5 fJ/fF/gate while the gate draws 10 nA from a 0.6 V supply voltage.
I. INTRODUCTION
The demand for ultra-low power complex digital circuits has made the design of logic circuits in the sub-threshold regime very attractive for many modern applications. As shown in [1]-[4], it is possible to bias CMOS logic circuits in the sub-threshold regime to achieve very low power consumption. In this approach, the power consumption is reduced by reducing the supply voltage. However, the supply voltage cannot be reduced arbitrarily: the noise margin and the operational requirements of various logic circuits (such as memory cells, flip-flops, etc.) impose a lower limit on the supply voltage and thus on the power consumption of CMOS logic circuits [1], [2].
On the other hand, in source coupled logic (SCL) circuits the power consumption is controlled by the tail bias current, and the circuit speed is almost independent of the supply voltage. Theoretically, this property can be exploited to reduce the power consumption of SCL-based logic gates to levels below what can be achieved with CMOS logic circuits. However, sub-threshold SCL circuits require special design techniques to reach the desired performance, particularly when the bias current of each gate is as low as tens of picoamperes [5]. This paper proposes a simple technique to improve the power-delay product of SCL circuits biased in the deep sub-threshold regime. To make the performance of SCL gates comparable to that of their CMOS counterparts, it is necessary to reduce the PDP of this type of circuit as much as possible. Using more complex logic networks in SCL gates with several stacked differential pairs is one approach to reducing the PDP.
Section II very briefly describes the design technique applied for implementing sub-threshold SCL circuits. Section III explores the effect of utilizing source follower buffers (SFBs) at the output of SCL gates.
II. SUB-THRESHOLD SCL
SCL circuits are based on a simple operating principle: the tail current $I_{SS}$ is switched between two branches of a differential circuit, as illustrated in Fig. 1. The switched current is converted to a voltage via the load resistances, creating the logic levels needed to drive the following logic stages. This conversion is the main speed-limiting factor in SCL circuits.
For a given tail bias current $I_{SS}$, the load resistance depends on the desired output voltage swing, $V_{SW}$, as $R_L = V_{SW}/I_{SS}$. Therefore, in the sub-threshold regime, when the bias current is in the nanoampere range (or even less), the load resistance would be in the range of hundreds of MΩ, which makes an area-efficient implementation of this circuit very difficult. Some techniques for implementing very high-value resistors in CMOS technology have been reported [6]. However, in the proposed application it is necessary to have very good control over the resistance value. Meanwhile, the load device should be very small and hence exhibit very low parasitic capacitive loading at the output.

Fig. 1. A conventional SCL-based buffer stage and the corresponding replica bias circuit.
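To give a feel for the magnitudes involved, the relation $R_L = V_{SW}/I_{SS}$ can be evaluated numerically. The following is only a minimal sketch: the function name `load_resistance` and the 200 mV swing are illustrative assumptions, and the bias currents are sample points rather than design values.

```python
def load_resistance(v_sw, i_ss):
    """Load resistance R_L = V_SW / I_SS, in ohms."""
    if i_ss <= 0:
        raise ValueError("tail bias current must be positive")
    return v_sw / i_ss


V_SW = 0.2  # volts; illustrative output swing (assumption)

# Sample tail currents from tens of pA up to hundreds of nA.
for i_ss in (30e-12, 1e-9, 10e-9, 200e-9):
    r_l = load_resistance(V_SW, i_ss)
    print(f"I_SS = {i_ss:.1e} A -> R_L = {r_l / 1e6:.3g} MOhm")
```

At nanoampere bias levels the required resistance reaches hundreds of MΩ, which is why a passive resistor is impractical here.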
As shown in [5], it is possible to use a pMOS device with its bulk shorted to its drain to implement the required load resistance in a very small area. Since in this configuration $V_{BD} = 0$ V, the threshold voltage of the device depends on its drain voltage. When the drain voltage of the load device is reduced, the absolute value of the threshold voltage of this pMOS device decreases and hence the device current ($I_{SD}$) increases. Therefore, this technique can be applied to implement the desired load resistance. The resistivity of this pMOS load, which is illustrated in Fig. 2, can be controlled through its gate voltage. The load device can be realized with very small transistors, which reduces its area overhead significantly. A replica bias circuit, similar to the one shown in Fig. 1, can be used to adjust this gate voltage and hence set the output voltage swing to the desired value [7].
As the input differential pair transistors are in the sub-threshold regime, the theoretical minimum required voltage swing is $V_{SW,min} \approx 4U_T$ to $6U_T$ ($U_T = kT/q$, where $k$ is Boltzmann's constant, $T$ is the temperature, and $q$ is the elementary charge). Since all the devices are in the sub-threshold regime and the required voltage swing does not depend on the bias current, this circuit can be used over a very wide range of tail bias currents without changing the device sizes. Simulations confirmed by measurement show that the proposed sub-threshold SCL circuit is operational for 30 pA < $I_{SS}$ < 200 nA [5]. As the maximum frequency of operation in this type of circuit is proportional to the bias current [7], [8], the circuit power consumption can simply be adjusted to the maximum required frequency of operation by tuning the bias current accordingly.
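As a quick numerical check of the swing bound above, the thermal voltage $U_T = kT/q$ and the resulting range of $4U_T$ to $6U_T$ can be computed directly. This is a small sketch; the helper names and the chosen temperatures are assumptions made for illustration.

```python
K_BOLTZMANN = 1.380649e-23      # J/K (exact SI value)
Q_ELEMENTARY = 1.602176634e-19  # C   (exact SI value)


def thermal_voltage(temp_kelvin):
    """Thermal voltage U_T = k*T/q, in volts."""
    return K_BOLTZMANN * temp_kelvin / Q_ELEMENTARY


def min_swing_range(temp_kelvin):
    """Return (4*U_T, 6*U_T), the theoretical minimum swing range in volts."""
    u_t = thermal_voltage(temp_kelvin)
    return 4.0 * u_t, 6.0 * u_t


# Illustrative temperatures (assumption): cold, room, hot.
for temp in (250.0, 300.0, 350.0):
    low, high = min_swing_range(temp)
    print(f"T = {temp:.0f} K: U_T = {thermal_voltage(temp) * 1e3:.2f} mV, "
          f"V_SW,min = {low * 1e3:.0f} to {high * 1e3:.0f} mV")
```

Near room temperature $6U_T$ comes out at roughly 155 mV, and the bound scales linearly with absolute temperature.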
III. UTILIZING SOURCE FOLLOWER BUFFERS

A. Speed Limitation in SCL Circuits
In [7]-[10], analytical approaches have been proposed for the optimized design of SCL circuits, which can also be applied to sub-threshold SCL circuits. It can be shown that the time constant at the output node (which is the main speed-limiting factor) depends on the voltage swing and the tail bias current as:

$$\tau_{SCL} = R_L C_L = \frac{V_{SW}}{I_{SS}} \, C_L \qquad (1)$$

Hence, reducing the tail bias current is expected to reduce the speed of operation as well. Therefore, to obtain a better speed for a given tail bias current, the output voltage swing ($V_{SW}$) and the load capacitance ($C_L$) should be minimized. Measurements on D flip-flops implemented based on the idea shown in Fig. 2 show that, for reliable operation, $V_{SW}$ can be as low as 150 mV ($\approx 6U_T$). The other possibility is to reduce the loading effect of $C_L$, either by reducing the parasitic capacitances at the output node through physical design or by inserting a buffer stage at the output.
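The trade-off described above follows from combining $R_L = V_{SW}/I_{SS}$ with the output RC time constant. The sketch below evaluates $\tau = V_{SW} C_L / I_{SS}$ for a few operating points; the function name and the particular load and bias values are assumptions picked only to illustrate the scaling.

```python
def scl_time_constant(v_sw, c_load, i_ss):
    """Output time constant of a simple SCL gate: tau = V_SW * C_L / I_SS (s)."""
    if i_ss <= 0:
        raise ValueError("tail bias current must be positive")
    return v_sw * c_load / i_ss


V_SW = 0.15  # volts; the minimum reliable swing quoted in the text

# (load capacitance in F, tail current in A): illustrative points (assumption)
for c_load, i_ss in ((10e-15, 1e-9), (100e-15, 1e-9), (100e-15, 10e-9)):
    tau = scl_time_constant(V_SW, c_load, i_ss)
    print(f"C_L = {c_load * 1e15:.0f} fF, I_SS = {i_ss * 1e9:.0f} nA "
          f"-> tau = {tau * 1e6:.2f} us")
```

Halving either the swing or the load capacitance halves the time constant, while halving the bias current doubles it.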
B. Using Source Follower at the Output
The output load capacitance in a complex design is generally due to the interconnections and can be as high as hundreds of fF. In this case, using a simple buffer stage can relax the power-delay trade-off in SCL circuits considerably. As illustrated in Fig. 3, the SCL core then drives only the input capacitance of the buffer stage. This capacitance is composed of the gate-drain overlap capacitance and the gate-source contribution of M3 (strongly reduced by the Miller effect), and is therefore very low [8].
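In Fig. 3, the time constant at the output node would be

$$\tau_{SFB} \approx \frac{C_L}{g_{m3}} \qquad (2)$$

in which $g_{m3}$ is the transconductance of M3. Since the device is in the sub-threshold regime, it can be approximated by $g_{m3} \approx I_B/(nU_T)$ ($n$ is the sub-threshold slope factor of M3). Neglecting the delay of the SCL core in Fig. 3 (since this stage simply drives the low input capacitance of the source follower), the output time constant improves with respect to a simple SCL gate, for which $\tau_{SCL} = V_{SW} C_L / I_{SS}$, by the factor

$$\gamma_\tau = \frac{\tau_{SCL}}{\tau_{SFB}} \approx \frac{V_{SW}}{nU_T} \cdot \frac{I_B}{I_{SS}} \qquad (3)$$

Assuming that $V_{SW} = 6U_T$ and $I_B = I_{SS}$, then $\gamma_\tau \approx 4.62$ (which corresponds to $n \approx 1.3$), meaning that the time constant at the output node improves by a factor of 4.62. In this case, the power-delay product improves by a factor of less than 4.62, since the total bias current of the SCL-SFB gate is $I_{DD,SCL-SFB} = 2 \cdot I_B + I_{SS,C} > 2 \cdot I_{SS}$, where $I_{SS,C}$ is the tail current of the SCL core. Therefore, for equal supply voltages,

$$\gamma_{PDP} = \gamma_\tau \cdot \frac{I_{SS}}{2 I_B + I_{SS,C}} < \frac{\gamma_\tau}{2} \approx 2.31 \qquad (4)$$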
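The improvement factors discussed above can be reproduced numerically. The sketch below assumes the standard small-signal source-follower time constant $C_L/g_{m3}$ with $g_{m3} \approx I_B/(nU_T)$, and a slope factor $n = 1.3$, which is an assumption chosen because it reproduces the quoted value of 4.62. The core current `i_ss_core` and the function names are likewise hypothetical.

```python
U_T = 0.025852  # volts; thermal voltage at about 300 K


def gamma_tau(v_sw, i_b, i_ss, n, u_t=U_T):
    """Ratio of SCL to SFB output time constants, (V_SW/(n*U_T)) * (I_B/I_SS)."""
    return (v_sw / (n * u_t)) * (i_b / i_ss)


def gamma_pdp(v_sw, i_b, i_ss, i_ss_core, n, u_t=U_T):
    """PDP improvement: gamma_tau scaled by the ratio of total bias currents."""
    total_sfb_current = 2.0 * i_b + i_ss_core
    return gamma_tau(v_sw, i_b, i_ss, n, u_t) * i_ss / total_sfb_current


N_SLOPE = 1.3  # assumed slope factor, not stated in the text
I_SS = 1e-9    # illustrative tail current (assumption)

g_tau = gamma_tau(6.0 * U_T, I_SS, I_SS, N_SLOPE)
g_pdp = gamma_pdp(6.0 * U_T, I_SS, I_SS, 0.2 * I_SS, N_SLOPE)
print(f"gamma_tau = {g_tau:.2f}, gamma_pdp = {g_pdp:.2f}")
```

Because the buffered gate draws more than twice the bias current, the PDP gain stays below half of the time-constant gain, which is in line with the factor of about two reported for this technique.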
Meanwhile, it should be mentioned that the time constant calculated in (2) is based on a small-signal model of the devices. In large-signal operation, when the gate voltage of the SFB rises and M3 sources current into the output load, this equation remains valid. However, when the gate voltage goes down and the current source $I_B$ has to discharge the load capacitance, the output node slews down with a slope of $I_B/C_L$. This means that, while the rise time improves considerably by adding the SFB, during a falling transition at the output M3 is cut off and the output load is discharged by $I_B$ alone.
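The asymmetry between the two edges can be illustrated with a rough estimate: the rising edge is governed by the small-signal time constant $nU_TC_L/I_B$, whereas the falling edge is slew-limited at $I_B/C_L$ and therefore needs about $C_L V_{SW}/I_B$ to traverse the full swing. The sketch below assumes $n = 1.3$, a 200 mV swing, and sample values for $C_L$ and $I_B$. All of these, and the function names, are illustrative assumptions.

```python
U_T = 0.025852  # volts; thermal voltage at about 300 K


def rise_time_constant(c_load, i_b, n, u_t=U_T):
    """Small-signal rising-edge time constant: n * U_T * C_L / I_B (s)."""
    return n * u_t * c_load / i_b


def fall_slew_time(c_load, v_sw, i_b):
    """Time for the current source I_B to discharge C_L by V_SW (s)."""
    return c_load * v_sw / i_b


C_LOAD = 100e-15  # farads (assumption)
I_B = 5e-9        # amperes (assumption)
V_SW = 0.2        # volts
N_SLOPE = 1.3     # assumed slope factor

t_rise = rise_time_constant(C_LOAD, I_B, N_SLOPE)
t_fall = fall_slew_time(C_LOAD, V_SW, I_B)
print(f"rise tau = {t_rise * 1e6:.2f} us, fall slew time = {t_fall * 1e6:.2f} us")
print(f"fall/rise ratio = {t_fall / t_rise:.2f}")
```

The ratio reduces to $V_{SW}/(nU_T)$, so it does not depend on the load capacitance or the bias current. Only a lower swing narrows the gap between the two edges.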
To have a fair comparison, let us assume that the power consumption in both topologies is equal, i.e., both have the same supply voltage $V_{DD}$ and the same total bias current, so that the tail current of the simple SCL gate equals the total current $2 \cdot I_B + I_{SS,C}$ of the SCL-SFB gate. It is then possible to compare the delay of these two topologies at the same bias current level and for different $\gamma_I$ values. Here, $\gamma_I$ represents the ratio of the current that is consumed in the core of the SCL gate compared to the total bias current of the source followers. When the load capacitance is high, the time constant at the output node is dominant and a small $\gamma_I$ value is preferred. On the other hand, if the load capacitance is small, a large $\gamma_I$ value (close to 0.5) is preferred for an optimum total settling time. The delay times of these two topologies are compared in Fig. 4. As can be seen in this plot, the delay improves by a factor of about two for high values of the load capacitance ($C_L$ > 100 fF). This means that the SCL-SFB topology can achieve a PDP two times better than a simple SCL topology when $C_L$ > 30 fF.
As illustrated in Fig. 5, the SCL-SFB topology shows a better rise time when the load capacitance gets large. The crossover point in this figure lies in the range 20 fF < $C_L$ < 90 fF. For the 180 nm process used in this comparison, this loading capacitance corresponds to the parasitic capacitance of an interconnection of about 20 to 100 μm. Considering typical interconnection distances, it can be argued that in a practical design the load capacitance will always exceed this limit (20 μm or $C_L$ > 20 fF for the 180 nm process used in this comparison). This means that using the SCL-SFB topology can reduce the rise time considerably.

Figure 6 compares the fall times of these two topologies. As explained in Section III-B, there is no big difference between the fall times of the two topologies. For a falling edge, M3 turns off and the load capacitance is discharged by the bias current $I_B$.

Fig. 7 shows the transient simulation results. While this circuit shows a very fast rising edge at the output, the fall time is relatively slow and the output waveform is asymmetric. The fast rise time, as shown in Fig. 7, is achieved at the cost of a current spike at each rising edge. This current is almost equal to $I_{Peak} \approx g_{m3} \cdot V_{SW}$, which gives $I_{Peak} \approx 5 \cdot I_B$ assuming $V_{SW}$ = 200 mV.

Figure 8 shows the operating frequency of two ring oscillators designed based on the two logic topologies. As can be seen, the oscillation frequency of the SCL-SFB topology is much higher at high $C_L$ values, as expected. Based on Fig. 8, it is possible to calculate the PDP of the two topologies. The normalized PDP of a simple SCL gate is close to 1 fJ/fF (normalized to the load capacitance $C_L$), while the SCL-SFB topology shows a PDP of 0.5 fJ/fF/gate at high $C_L$ values, about two times less than a simple SCL gate.
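To relate the normalized PDP figures quoted above to absolute numbers, the PDP per gate is the normalized value multiplied by the load capacitance, and dividing it by the gate power gives the implied delay. The sketch below uses the 10 nA, 0.6 V operating point reported for the SCL-SFB gate. The 100 fF load and the function names are assumptions for illustration, and the result is a back-of-the-envelope figure rather than a simulated one.

```python
def gate_power(v_dd, i_dd):
    """Static power drawn by one gate, in watts."""
    return v_dd * i_dd


def pdp_joules(pdp_norm_fj_per_ff, c_load_ff):
    """Absolute PDP in joules from a normalized PDP (fJ/fF) and a load (fF)."""
    return pdp_norm_fj_per_ff * c_load_ff * 1e-15


def implied_delay(pdp_norm_fj_per_ff, c_load_ff, v_dd, i_dd):
    """Delay implied by PDP = power * delay, in seconds."""
    return pdp_joules(pdp_norm_fj_per_ff, c_load_ff) / gate_power(v_dd, i_dd)


C_LOAD_FF = 100.0  # fF; illustrative load (assumption)

for name, pdp_norm in (("SCL", 1.0), ("SCL-SFB", 0.5)):
    energy = pdp_joules(pdp_norm, C_LOAD_FF)
    print(f"{name}: PDP = {energy * 1e15:.0f} fJ per gate at {C_LOAD_FF:.0f} fF")

delay = implied_delay(0.5, C_LOAD_FF, v_dd=0.6, i_dd=10e-9)
print(f"SCL-SFB implied delay at 6 nW: {delay * 1e6:.2f} us")
```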
IV. CONCLUSION
We propose to use source follower buffers at the output of source coupled logic (SCL) circuits to improve the power-delay product (PDP). Using this technique, the PDP of SCL circuits can be improved by a factor of two, as shown by analytical examination and confirmed by detailed circuit-level simulations. As an example, this technique has been applied to the design of ultra-low power SCL gates biased in the sub-threshold regime. The simulations show that the proposed circuit, designed in a 0.18 μm digital CMOS technology, achieves a PDP on the order of 0.5 fJ/fF/gate.
