Abstract: Power-frequency scaling in subthreshold source-coupled logic (STSCL) systems is studied and analyzed. It is shown that the operating frequency of such systems can be adjusted over about three decades with linearly proportional power dissipation. The heart of such a system is a phase-locked loop (PLL)-based clock generator (CG) with a very wide tuning range that controls the dynamics of the STSCL system. The design of a wide-tuning-range PLL that utilizes a novel self-adjustable loop filter and generates the reference clock as well as the bias current for the STSCL system is described. The PLL-based CG exhibits linear power-frequency characteristics in order to minimize its power consumption overhead (7 pJ with 350 nA standby current). Implemented in 0.13 μm CMOS, the CG occupies 0.06 mm$^2$ and operates with a supply voltage that can be reduced down to $V_{DD}$ = 0.9 V.
I. INTRODUCTION
While the energy dissipation in CMOS digital systems depends on the frequency of operation [1]-[3], current-mode logic or source-coupled logic (SCL) systems exhibit a relatively constant energy figure over a wide range of frequencies [4]. At high frequencies, the supply voltage of CMOS circuits needs to be increased to reduce the delay of the gates, and hence energy consumption increases as $E_{CMOS} \approx C V_{DD}^2$ (here $V_{DD}$ is the supply voltage and $C$ stands for the total parasitic capacitance to be charged; the energy dissipation due to leakage power is ignored). At very low frequencies, supply voltage reduction does not considerably reduce the leakage power dissipation. Moreover, when the supply voltage is reduced toward the subthreshold region, the gate delay increases very rapidly. Hence, the optimum point for energy dissipation lies somewhere in the mid-frequency range, depending on technology. As the supply voltage for optimal energy consumption in CMOS circuits generally lies close to the threshold voltage ($V_{TH}$) of the devices, process and supply variations have a great influence on system performance and, hence, reliability [2], [5]. Fig. 1 shows the circuit diagram of an AND/NAND logic operator implemented in CMOS and SCL topologies. In [4], a comparative study analyzing the performance of these two topologies in the presence of process variations has been presented. As shown in Fig. 2, the energy consumption of an SCL system is relatively constant over a wide range of operating frequencies. This energy consumption can be expressed as $E_{SCL} \approx C V_{DD} V_{SW}$. Here, $V_{SW}$ stands for the voltage swing at the output of each gate and can be much smaller than the supply voltage. Therefore, for digital systems with shallow logic depth, the SCL architecture potentially consumes less energy than the CMOS topology.
When the operating frequency is reduced, $V_{DD}$ and $V_{SW}$ generally both remain constant, which means that the energy consumption in the SCL architecture is independent of operating frequency. The subthreshold SCL (STSCL) architecture, however, makes it possible to reduce both $V_{DD}$ and $V_{SW}$, and hence to reduce the energy consumption at very low operating frequencies (Fig. 2). In conclusion, unlike the CMOS topology, STSCL provides the opportunity to save energy when moving toward very low frequencies. In addition, one can employ pipelining to improve the energy efficiency of such systems [4]. This article is a continuation of [6], where we introduce and expand the concept of dynamic power management for SCL systems operating in either the strong-inversion or the subthreshold region. It is also shown that a phase-locked loop (PLL)-based clock generator (CG) circuit can be employed in widely tunable STSCL systems to dynamically adjust the operating frequency and biasing conditions with respect to the system work load, and hence save power.
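To make the comparison above concrete, the following minimal sketch evaluates the two first-order expressions quoted in the text, $E_{CMOS} \approx C V_{DD}^2$ and $E_{SCL} \approx C V_{DD} V_{SW}$. The capacitance, supply, and swing values are illustrative assumptions, not data from this work, and the sketch ignores leakage and logic depth.

```python
def cmos_energy(c_load, v_dd):
    """First-order CMOS switching energy, E ~ C * V_DD^2 (leakage ignored)."""
    return c_load * v_dd ** 2


def scl_energy(c_load, v_dd, v_sw):
    """First-order SCL energy, E ~ C * V_DD * V_SW."""
    return c_load * v_dd * v_sw


# Illustrative values only (assumptions, not measurements from this work).
C_LOAD = 10e-15  # total capacitance to be charged, in farads
V_DD = 1.2       # supply voltage, in volts
V_SW = 0.2       # SCL output voltage swing, in volts

e_cmos = cmos_energy(C_LOAD, V_DD)
e_scl = scl_energy(C_LOAD, V_DD, V_SW)

print(f"E_CMOS ~ {e_cmos * 1e15:.2f} fJ")
print(f"E_SCL  ~ {e_scl * 1e15:.2f} fJ")
# For equal C and V_DD, the ratio reduces to V_SW / V_DD.
print(f"E_SCL / E_CMOS = {e_scl / e_cmos:.3f}")
```

Under these assumptions the ratio of the two energies is simply $V_{SW}/V_{DD}$, which is why a small swing favors SCL when the logic depth is shallow.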
II. ENERGY CONSUMPTION AND FREQUENCY SCALING
This section briefly describes the main tradeoffs in optimizing the energy consumption in SCL systems.¹ As described in [4], the energy dissipation in this type of system can be approximated by
where $N$ stands for the average logic depth, $\alpha$ is the activity rate of the system, $V_{DD}$ is the supply voltage of the system, $V_{SW}$ is the voltage swing at the output of each SCL gate, and $C_L$ is the average load capacitance. Using a pipelining technique [4], one can maximize the efficiency of the static bias current in each cell and, hence, reduce the energy consumption toward
A further reduction in consumption is possible by: 1) using more advanced technology nodes to reduce $C_L$ [4], 2) reducing the supply voltage, and 3) reducing $V_{SW}$. Interestingly, operating in subthreshold provides the opportunity to reduce both the supply voltage and the voltage swing. This property is conceptually shown in Fig. 2. In the subthreshold region, the voltage swing can theoretically be reduced down to about $4nU_T$ ($n$ is the subthreshold slope factor of the NMOS differential-pair devices in Fig. 1 and $U_T$ stands for the thermal voltage). Meanwhile, the supply voltage can be reduced in proportion to $\ln I_{SS}$, where $I_{SS}$ is the bias current of a logic cell. In the extreme case, where the bias current is well below 1 nA, the supply can be reduced to about $V_{DS,MNB} + 4nU_T \approx 10U_T$ (as shown in Fig. 1, MNB is the tail bias transistor). With these assumptions, and assuming that the load capacitance is dominated by the input capacitance of the following STSCL stages (with an average fan-out of FO = 4):
where $\omega_T$ stands for the transit frequency of the device and $I_{SS}$ clearly depends on the operating frequency. Though an extreme case, (3) shows that properly managing the operating condition of STSCL systems with respect to the work load and operating frequency can save a considerable amount of energy, which is generally essential in ultra-low-power (ULP) applications. At very low speeds, the power dissipation of STSCL systems will be limited by different sources of leakage currents [4], and hence the energy dissipation starts to increase. Based on this analysis, one can make the following observations:
Observation 1: Power-frequency scaling in SCL systems does not help considerably to save energy. As the energy dissipation does not depend on operating frequency (or equivalently on gate delay), by scaling power with respect to the operating frequency, we only make sure that the system will remain on a constant energy contour, which has been optimized for the proposed SCL system.
Observation 2: Moving toward subthreshold operation and using power-frequency scaling helps to leave the constant energy contours toward contours of lower energy consumption. This property is completely contrary to that of conventional CMOS circuits, where moving to supply voltages below the optimal supply voltage increases the energy dissipation and, most of the time, the power dissipation.
Observation 3: Power-frequency scaling in SCL systems always helps to save power. Even at very low speeds, where devices can be in the deep subthreshold (weak inversion) region, the power dissipation can be reduced in linear proportion to the operating frequency. Therefore, for ULP systems where power dissipation is a critical issue, this technique can be used as a powerful tool.
¹ Detailed analysis and optimization techniques for designing SCL gates can be found in [4], [7], and [8].
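The subthreshold limits quoted in this section (a minimum swing of about $4nU_T$ and a minimum supply of about $10U_T$) can be put into numbers with a short sketch. The temperature of 300 K and the slope factor $n = 1.3$ are assumptions chosen for illustration; the source does not state them.

```python
# Physical constants (exact SI values).
K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C


def thermal_voltage(temp_kelvin=300.0):
    """Thermal voltage U_T = k*T/q, in volts."""
    return K_BOLTZMANN * temp_kelvin / Q_ELECTRON


def min_swing(n_slope, temp_kelvin=300.0):
    """Theoretical minimum STSCL output swing, about 4*n*U_T."""
    return 4.0 * n_slope * thermal_voltage(temp_kelvin)


def min_supply(temp_kelvin=300.0):
    """Extreme-case supply quoted in the text, about 10*U_T."""
    return 10.0 * thermal_voltage(temp_kelvin)


N_SLOPE = 1.3  # assumed subthreshold slope factor (not given in the source)

print(f"U_T        ~ {thermal_voltage() * 1e3:.1f} mV")
print(f"4*n*U_T    ~ {min_swing(N_SLOPE) * 1e3:.0f} mV")
print(f"10*U_T     ~ {min_supply() * 1e3:.0f} mV")
```

At room temperature the thermal voltage is about 26 mV, so under these assumptions the swing floor lies somewhat above 100 mV and the extreme-case supply in the vicinity of a quarter of a volt.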
The concept of dynamic power scaling has been widely studied for CMOS architecture, where the operating conditions and clock frequency are scaled with respect to the work load of the system [9] . Fig. 3 describes the techniques that can be used for dynamic power-frequency scaling in SCL systems. When the system is in sleep mode, as illustrated in Fig. 3(a) , the clock frequency can be reduced considerably to save power. The clock frequency will be increased only if there is any need for higher processing speed. On such occasions, the clock frequency and the operating conditions are required to be switched quickly to high-performance mode. In addition, having a scalable power-frequency controlling unit and a versatile CG can further improve the overall system power efficiency. As shown in [10] , the energy dissipation of an 8 × 8 Carry-Save multiplier can be reduced down to about 500 fJ when the data rate is in the range of 20 Mb/s. For data rates higher or lower than 20 Mb/s, the energy consumption increases considerably [10] . Based on experimental data, an STSCL-based multiplier with similar architecture and with a power management system introduced in Fig. 3 consumes about 750 fJ for a data rate ranging from 1 kHz up to 50 MHz.
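As a rough illustration of Observation 3, the sketch below converts a constant energy per operation into average power, $P = E \cdot f$. It assumes that the reported figure of about 750 fJ holds uniformly across the whole 1 kHz to 50 MHz range, which is a simplification of the measured behavior.

```python
def average_power(energy_per_op, ops_per_second):
    """Average power in watts for a fixed energy per operation, P = E * f."""
    return energy_per_op * ops_per_second


E_OP = 750e-15  # energy per operation reported for the STSCL multiplier, in joules

for rate in (1e3, 1e6, 50e6):
    power = average_power(E_OP, rate)
    print(f"rate = {rate:10.0f} Hz -> P ~ {power * 1e9:10.2f} nW")
```

With a flat energy per operation, power falls in direct proportion to the data rate, which is the property the power management scheme of Fig. 3 is meant to exploit in sleep mode.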
Based on this, the main goal of this work has been to implement a ULP CG whose output frequency can be adjusted arbitrarily over a very wide range, with power that scales with frequency.
III. CG TOPOLOGY

A. Frequency Scalability
To implement widely tunable PLL-based CGs, adaptive bandwidth and self-biased topologies have been developed [11], [12]. In this paper, we introduce a PLL with a self-adjustable loop in which the poles and the zero of the system are automatically tuned with respect to the oscillation frequency of the ring oscillator, $f_{osc}$, and the input frequency, $f_{REF}$. This approach helps to control the loop dynamics, and hence the loop bandwidth and jitter performance, under frequency scaling. Moreover, one of the main targets in this work is to implement a CG whose power scales linearly with frequency. Scaling the power with the operating frequency keeps the consumption overhead due to the CG constant over the range of operating frequencies.
B. Analyzing the Dynamics of the System
Using continuous-time approximation [13] for the PLL shown in Fig. 4 , the open loop gain can be calculated as
where $K_{OSC}$ is the oscillator sensitivity factor, defined as the variation of the output oscillation frequency divided by the variation of the input controlling signal. Here, we intentionally avoid using the conventional definition where the controlling signal is a voltage. As will be shown later, we use a current-controlled oscillator (CCO) to add more flexibility to the system, and hence make it more convenient for a widely tunable implementation. The loop filter should be designed based on the jitter and dynamic performance requirements of the system. In Fig. 4, $R_1$ and $C_1$ create a zero to make the loop stable. The noise associated with $R_1$ can degrade the phase noise at the output of the oscillator; hence, it is recommended to choose a small enough value for $R_1$ [14]. Meanwhile, $C_2$ is used to reduce the ripple on the controlling signal, $V_C$, and hence reduce the pattern jitter [15]. However, the extra phase lag associated with the extra pole created by $C_2$ can cause stability issues. The ratio $b = C_1/C_2$ needs to be selected very carefully to avoid instability [14]. To further reduce the pattern jitter, which is mainly due to the variations on the controlling signal, $V_C$, the order of the loop filter can be increased even more [14]. The design can be based on estimating the loop damping factor [11], [15]
where $\omega_C = 2\zeta/(R_1 C_1)$. After choosing a proper value for $\zeta$, the values of the other elements can be derived. To implement a scalable PLL, it is possible to change the input frequency ($f_{REF}$) or the division ratios of the frequency dividers ($N$ and $P$, which are shown in Fig. 4). Therefore, the effect of changing these three parameters on the loop dynamic behavior needs to be studied. To achieve a stable PLL, it is necessary to properly set the values of $\omega_C$ and the loop zero, $|z| = 1/(R_1 C_1) = 1/\tau$, with respect to the reference frequency. Finally, the bias current of the charge-pump circuit (CPC) needs to be selected with respect to the input frequency and also the division ratio.
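The two relations stated above, $\omega_C = 2\zeta/(R_1 C_1)$ and $|z| = 1/(R_1 C_1) = 1/\tau$, can be checked numerically with a small sketch. The component values and the damping factor below are hypothetical and serve only to show how the quantities relate.

```python
def loop_time_constant(r1, c1):
    """Time constant tau = R1 * C1 of the loop-filter zero, in seconds."""
    return r1 * c1


def loop_zero(r1, c1):
    """Magnitude of the loop zero, |z| = 1 / (R1 * C1), in rad/s."""
    return 1.0 / loop_time_constant(r1, c1)


def crossover_frequency(zeta, r1, c1):
    """omega_C = 2 * zeta / (R1 * C1), in rad/s, as given in the text."""
    return 2.0 * zeta / loop_time_constant(r1, c1)


# Hypothetical example values (not taken from the source).
R1 = 1e6      # ohms
C1 = 10e-12   # farads
ZETA = 1.0    # chosen damping factor

z = loop_zero(R1, C1)
w_c = crossover_frequency(ZETA, R1, C1)

print(f"tau     = {loop_time_constant(R1, C1) * 1e6:.1f} us")
print(f"|z|     = {z:.3e} rad/s")
print(f"omega_C = {w_c:.3e} rad/s")
print(f"omega_C / |z| = {w_c / z:.2f}")  # equals 2 * zeta
```

Because both quantities share the factor $1/(R_1 C_1)$, their ratio is fixed at $2\zeta$; scaling $R_1$ therefore moves the zero and $\omega_C$ together.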
The design process can be started by estimating the value of τ with respect to the input reference frequency [14] 
where $M_F = 2\pi f_{REF}/\omega_C$ and $b$ are two constants ($\omega_C$ needs to be smaller than $2\pi f_{REF}$ by a factor of $M_F$ for stability reasons). Therefore, $\tau$ depends only on $f_P = f_{REF}/P$, and not on $N$. The next step is to calculate the charge-pump bias current from (5)
which indicates that, for constant values of $C_1$, $f_{REF}$, and $M_F$, the charge-pump bias current needs to be changed in proportion to $N$ and in inverse proportion to the square of $P$. Therefore, a CPC with programmable or adjustable bias current is required. Designing a CPC with a bias current proportional to $N/P^2$ would be complicated and would require a complex current-switching network. A remedy that simplifies the circuit topology is to use a CCO instead of a voltage-controlled oscillator, in which [16]
Based on (8), a transconductance, $G_m$, is inserted into the loop in order to convert the controlling voltage to the controlling current. In this case, the controlling current is equal to the oscillator current
Therefore, the controlling current is always proportional to $N/P$. Based on this, if we make the value of $G_m$ proportional to its current, i.e., $G_m = I_C/V_{char}$, then using (7) and also [14]
where $N_d$ is the number of delay stages in the ring oscillator, $C_L$ is the output load capacitance that each STSCL-based delay cell in the ring oscillator observes, and $V_{SW}$ is the voltage swing at the output of the delay elements. In conclusion, it is sufficient to make the bias current of the CPC proportional to $I_C/N$, as shown in Fig. 4.
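The proportionalities derived in this subsection can be sketched as follows. The locked-loop relation $f_{osc} = N f_{REF}/P$ follows from the divider arrangement described in the text. The ring-oscillator model, however, is an assumption introduced here for illustration: it uses a common first-order STSCL delay estimate, $t_d \approx \ln 2 \cdot V_{SW} C_L / I_C$, with $f_{osc} = 1/(2 N_d t_d)$, and may differ from the expression used by the authors. The function names, the scaling constant `ratio`, and all numerical values are hypothetical.

```python
import math


def locked_osc_frequency(f_ref, n_div, p_div):
    """Oscillator frequency in lock: f_osc = N * f_REF / P."""
    return n_div * f_ref / p_div


def cco_control_current(f_osc, n_stages, c_load, v_swing):
    """Control current needed for a target f_osc.

    ASSUMED model (not from the source): each delay cell has
    t_d ~ ln(2) * V_SW * C_L / I_C and f_osc = 1 / (2 * N_d * t_d).
    """
    return 2.0 * math.log(2.0) * n_stages * c_load * v_swing * f_osc


def charge_pump_bias(i_c, n_div, ratio=1.0):
    """Charge-pump bias proportional to I_C / N; 'ratio' is a design constant."""
    return ratio * i_c / n_div


# Hypothetical example values.
F_REF = 100e3   # Hz
N_DIV = 16      # division ratio inside the loop
P_DIV = 1       # division ratio outside the loop
N_STAGES = 8    # assumed number of delay stages
C_LOAD = 10e-15 # assumed load capacitance per stage, in farads
V_SWING = 0.2   # assumed output swing, in volts

f_osc = locked_osc_frequency(F_REF, N_DIV, P_DIV)
i_c = cco_control_current(f_osc, N_STAGES, C_LOAD, V_SWING)
i_cp = charge_pump_bias(i_c, N_DIV)

print(f"f_osc = {f_osc / 1e6:.2f} MHz")
print(f"I_C   = {i_c * 1e9:.2f} nA (under the assumed delay model)")
print(f"I_CP  = {i_cp * 1e9:.3f} nA (for ratio = 1)")
```

In this model $I_C$ is linear in $f_{osc}$, so doubling $N$ at fixed $f_{REF}$ and $P$ doubles $I_C$ while leaving $I_C/N$, and hence the charge-pump bias, unchanged.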
C. Topology
Fig. 4 shows the topology of the proposed PLL. As explained, a transconductor has been added to the loop in order to have an extra degree of freedom to keep the loop damping factor constant over its tuning range. In addition, the pole and the zero of the loop are required to scale with $f_{osc}$. For this reason, the CPC is biased with a current, $I_{CPC}$, that is a fraction of $I_C$, in order to adjust the pole placement in proportion to $f_{osc}$, as shown in Fig. 4. To have a loop zero in proportion to $f_{osc}$, $R_1$ is implemented using the same type of resistance that has been used in each STSCL gate [shown in Fig. 3(b)]. Using this approach, and by selecting a proper ratio between $I_{CPC}$ and $I_C$, the system remains stable with a scalable dynamic behavior over its entire tuning range, which can be programmed by the division ratio inside the loop, $N$, and also the division ratio outside the loop, $P$.
D. Power Consumption Scalability
To make the power dissipation of the PLL proportional to the operating frequency, and hence minimize it in the sleep mode, an STSCL-based phase-frequency detector (PFD) and STSCL-based dividers [6] have been employed. An appropriate fraction of the controlling current is used to bias these circuits. As the bias current of the CPC is also proportional to $I_C$, the total power dissipation of the PLL becomes proportional to $I_C$, and hence to $f_{osc}$. This property has been particularly important in reducing the current consumption of the CG to only 350 nA in the sleep mode.
IV. CIRCUIT IMPLEMENTATION
A copy of the critical path of the digital STSCL system has been used to construct the ring oscillator in Fig. 4. This part uses the same supply as the digital STSCL block. In this way, the delay of the critical path is always properly controlled with respect to the clock frequency. The controlling current is copied to all the digital STSCL gates with an appropriate ratio. Therefore, the PLL circuit not only provides the system clock, but also controls the bias current, and hence the delay, of each gate. Unlike CMOS logic circuits, where the supply voltage needs to be adjusted using a complicated dc-dc converter, an STSCL implementation needs no such block, and hence the system is more power efficient. The absolute value of the supply voltage of the STSCL system and its variation do not affect the circuit speed or performance, as the core is based on differential topologies [4]. An SCL-to-CMOS converter is used to convert the differential low-swing signal at the output of the PFD to full CMOS level at the input of the CPC.
One of the critical blocks in the proposed CG, shown in Fig. 4, is the transconductor, which needs to have a transconductance proportional to $I_C$ with a very wide output current swing [Fig. 5(a)] [6]. Based on simulation results, this circuit can provide an output current between 40 pA and 800 nA with a transconductance proportional to its bias current.
In the current design, the supply voltage of the digital STSCL part can be reduced down to 350 mV [4] . To cover a wide range of frequencies, certain parts of the PLL-based CG (such as CPC and the transconductor block) are using higher supply voltages.
V. SIMULATION AND EXPERIMENTAL RESULTS
Fig. 5(b) shows the simulated transient response of the PLL at different operating frequencies. The time scale of the graph is normalized to the oscillation period. As can be seen, the transient response of the PLL remains invariant under frequency scaling, and hence the ratio of the settling time to the oscillation period remains almost unchanged. The CG circuit has been implemented in 0.13 μm CMOS and occupies an active area of 0.06 mm$^2$. Fig. 6 shows the measured spectrum of the output clock operating at 100 kHz at $V_{DD}$ = 1.2 V. Since here $N$ = 16, the internal clock is at 1.6 MHz. The output clock is buffered before the signal is delivered to the output pad for measurements as well as to the internal digital blocks. The digital system will have its own clock tree. As the measurement results in Fig. 7(a) show, the oscillation frequency can be adjusted from 1 kHz up to about 3 MHz at $V_{DD}$ = 1.0 V. While the tuning range is about 2.5 MHz at $V_{DD}$ = 0.9 V, it increases to 4 MHz at 1.2 V. For frequencies above 10 kHz, power dissipation scales linearly, with the supply current increasing at a rate of 7 pA/Hz at 1.0 V. Fig. 7(b) illustrates the step response of the PLL to a frequency step down by a factor of 200 followed by a step up by a factor of 200. While the controlling current is linearly proportional to $f_{osc}$, the controlling voltage changes in proportion to the logarithm of the controlling current. The standby current of the CG circuit is 350 nA. The supply voltage of the PLL can be set as low as 0.9 V, with minimal impact on the tuning range.
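The measured figures above (350 nA standby current and a slope of 7 pA/Hz at 1.0 V) suggest a simple first-order model of the CG supply current. The additive form $I(f) \approx I_{standby} + k f$ used below is an assumption made for illustration; the text only states that the scaling is linear above 10 kHz.

```python
I_STANDBY = 350e-9  # measured standby current of the CG, in amperes
SLOPE = 7e-12       # measured current-frequency slope at 1.0 V, in A/Hz


def supply_current(f_osc, i_standby=I_STANDBY, slope=SLOPE):
    """ASSUMED first-order model: I(f) ~ I_standby + slope * f."""
    return i_standby + slope * f_osc


def supply_power(f_osc, v_dd=1.0):
    """Power drawn from the supply under the same assumed model, in watts."""
    return v_dd * supply_current(f_osc)


def energy_per_cycle_slope(v_dd=1.0, slope=SLOPE):
    """Incremental energy per clock cycle, V_DD * slope, in joules."""
    return v_dd * slope


for f in (10e3, 100e3, 1e6, 3e6):
    print(f"f = {f:9.0f} Hz -> I ~ {supply_current(f) * 1e6:6.3f} uA, "
          f"P ~ {supply_power(f) * 1e6:6.3f} uW")

print(f"energy per cycle (slope) ~ {energy_per_cycle_slope() * 1e12:.1f} pJ")
```

At 1.0 V a slope of 7 pA/Hz corresponds to 7 pJ per clock cycle, which appears consistent with the 7 pJ overhead quoted in the abstract.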
One important aspect of this work has been to make sure that the delays of the STSCL gates used inside the PLL loop are closely matched with the delays of the STSCL gates inside the digital system (see Fig. 3). Some test structures have been implemented on this prototype for measuring the delay mismatch among the gates. Eight different ring oscillators were implemented, all supplied by the same replica bias circuit. The measured delay mismatch is very close to the value predicted through analysis [4] and Monte Carlo simulations. As predicted by analysis, the relative variation in delay is $\delta t_d/t_d \approx \delta V_{TH}/(nU_T)$. Using devices larger than minimum size, the measured on-die delay mismatch was $\sigma_{t_d}/t_d \approx 5\%$, tested at different bias current values down to 20 pA.
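The mismatch relation $\delta t_d/t_d \approx \delta V_{TH}/(nU_T)$ can be inverted to estimate the threshold-voltage spread that would correspond to the measured 5% delay mismatch. The values $n = 1.3$ and $U_T \approx 25.85$ mV (about 300 K) are assumptions; the resulting number is only an order-of-magnitude illustration, not a result reported by the authors.

```python
def relative_delay_variation(delta_vth, n_slope, u_t):
    """delta_t_d / t_d ~ delta_V_TH / (n * U_T), as stated in the text."""
    return delta_vth / (n_slope * u_t)


def implied_vth_spread(rel_delay_sigma, n_slope, u_t):
    """Inverse of the relation above: delta_V_TH ~ (delta_t_d / t_d) * n * U_T."""
    return rel_delay_sigma * n_slope * u_t


N_SLOPE = 1.3   # assumed subthreshold slope factor
U_T = 0.02585   # assumed thermal voltage at about 300 K, in volts
SIGMA_TD = 0.05 # measured relative delay mismatch (5%)

sigma_vth = implied_vth_spread(SIGMA_TD, N_SLOPE, U_T)
print(f"implied sigma(V_TH) ~ {sigma_vth * 1e3:.2f} mV")

# Round trip: feeding the spread back should return the 5% figure.
print(f"check: {relative_delay_variation(sigma_vth, N_SLOPE, U_T) * 100:.1f} %")
```

Because the sensitivity is set by $nU_T$, a few millivolts of threshold mismatch already translate into percent-level delay mismatch, which is why devices larger than minimum size were used.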
VI. CONCLUSION
The concept of power-frequency management in SCL systems, particularly in STSCL systems, was introduced and analyzed. It has been shown that, unlike the conventional CMOS architecture, the STSCL topology can help to reduce not only the energy consumption, but also the power dissipation. A wide-tuning-range CG circuit to control the operating condition of ULP STSCL-based digital systems has been presented. Using a current-mode PLL topology, the loop dynamics are controlled very carefully to obtain stable operation over the wide tuning range. Over this wide tuning range (×1000), the power dissipation of the circuit is linearly proportional to the output frequency. This performance makes the design well suited to ULP biomedical applications.
