Abstract-This paper describes a novel technique for implementing ultra low-voltage/low-power digital circuits. The effective threshold voltage seen from a control gate is adjusted during a UV-light activated tuning procedure. The effective threshold voltage that best matches the chosen supply voltage and speed is programmed through a UV-activated conductance between the power rails and the floating gates. Measured results are provided for gates operating down to a 0.4V power supply using a standard double-poly CMOS process.
I. Introduction
Reducing power dissipation in digital circuits is becoming increasingly important as the number of transistors on digital chips grows. Reducing the supply voltage leads to reduced performance if the threshold voltage is not scaled accordingly. Both supply and threshold voltage scaling have been proposed for low-voltage/low-power digital designs [1], [2], [3]. Due to circuit topology, the optimal operating point may vary significantly between sub-circuits depending on the activity and logic depth. Usually we are bound to the inherent and fixed threshold voltage of the selected process, and applying different supply voltages to the sub-circuits imposes severe area penalties. The inherent variation in threshold voltages and supply will normally further reduce the advantage of operating at ultra low supplies [2]. However, the advantage of operating circuits close to the optimal point is significant.
Floating-gate MOS transistors have been used as long-term non-volatile memories. Recently, several new ways of utilizing floating-gate transistors have been proposed [4], [5], [6]. Floating-gate transistors may be used to design ultra-low voltage circuits, both analog and digital [6], [7]. The challenge when using only floating-gate transistors is to tune the floating-gate voltage of all transistors without adding extra circuitry and/or control inputs.
The energy consumed by a circuit may be defined as the total power consumption multiplied by the cycle time. The power delay product (PDP) equals the energy. The energy delay product (EDP) [2] is equal to the PDP multiplied by the cycle time and is often used as a measure of digital circuit efficiency. The energy-delay product for different effective threshold voltages and ultra low supply voltages is presented in this paper.
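The two figures of merit defined above can be sketched in a few lines. This is a minimal illustration, not part of the measured or simulated results; the function names and the two operating points are hypothetical and chosen only to show that two designs with the same energy per cycle can differ widely in EDP.

```python
def pdp(power_w: float, cycle_time_s: float) -> float:
    """Power-delay product: energy per cycle, in joules."""
    return power_w * cycle_time_s


def edp(power_w: float, cycle_time_s: float) -> float:
    """Energy-delay product: PDP multiplied by the cycle time, in joule-seconds."""
    return pdp(power_w, cycle_time_s) * cycle_time_s


# Hypothetical operating points (power in watts, cycle time in seconds).
# Both consume the same energy per cycle, but the faster one has the lower EDP.
points = {
    "low_power_slow": (1e-9, 1e-5),
    "higher_power_fast": (1e-7, 1e-7),
}

for name, (p, t) in points.items():
    print(f"{name}: PDP = {pdp(p, t):.3e} J, EDP = {edp(p, t):.3e} Js")
```

Ranking by PDP alone cannot separate these two points, which is why the EDP is the more useful efficiency measure when performance matters.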
In the second section we present the floating-gate UVMOS (FGUVMOS) design technique [3], [6], [7]. The effective threshold voltage seen from the control gate is programmed during a reverse biased initialization step called programming mode. The programming mode may be used to design sub-circuits with different effective threshold and supply voltages without altering the switching points of the gates. The next section presents some basic FGUVMOS combinatorial gates: inverter, NAND, NOR and XOR gates. In Sections IV, V and VI we present an FGUVMOS D flip-flop, a single bit adder and a pad-driver respectively. The optimal operating point for the D flip-flop is discussed and simulation results are presented.
II. Floating-Gate UVMOS (FGUVMOS) Circuits
A circuit symbol for an m-input FGUVMOS transistor and its I-V characteristics are shown in Fig. 1. The inputs, or control gates, are shorted together in Fig. 1 to illustrate the difference in I-V characteristics between an m-input floating-gate transistor and a standard (non floating-gate) transistor. Due to capacitive division the slope in weak inversion is somewhat reduced. In practical circuits the coupling capacitor may be enlarged to reduce this effect, but to avoid area overhead we keep our coupling capacitors no more than three to four times the gate area of the corresponding transistor. When applying separate inputs (control gates) to each input capacitor C_1 to C_m, we have
where U_T is the thermal voltage, n is the slope factor, V_dd is the supply voltage, V_i is the voltage on the i-th control gate, k_i is the capacitive division factor of the i-th input capacitor C_i and I_bec is the balanced equilibrium current. The i-th capacitive division factor is defined as k_i = C_i/C_T, where C_T is the total capacitance seen by the floating gate. For simplicity we use a weak inversion transistor model, but the FGUVMOS transistor may be operated in strong inversion as well, at the cost of a slightly more complicated analysis. The effective threshold voltage seen from the control gate depends on the sum of the input capacitances compared to the total floating-gate capacitance, i.e. the slope of the I-V characteristics may vary among the transistors. When all control inputs are at V_dd/2 the floating-gate voltage is equal to the offset voltage V_offset.
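The symbols defined above fit a weak-inversion drain current of the following form. This is a sketch under stated assumptions, not a quotation of the original expression: it is written for an NMOS transistor, and the summation form and sign convention are assumed. It is chosen so that the current reduces to I_bec when every control gate sits at V_dd/2.

```latex
I_{ds} = I_{bec}\,\exp\!\left\{ \frac{1}{n U_T} \sum_{i=1}^{m} k_i \left( V_i - \frac{V_{dd}}{2} \right) \right\},
\qquad k_i = \frac{C_i}{C_T}
```

For a PMOS transistor the sign of the exponent would be reversed under the same assumptions.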
In this paper we discuss the performance, power consumption and efficiency in terms of supply and offset voltages. If the offset voltage of an NMOS transistor is increased, the current increases and the effective threshold voltage is reduced. Furthermore, the capacitive division factors may be exploited to improve the symmetry of PMOS and NMOS transistors. In an FGUVMOS inverter, giving the PMOS transistor a slightly larger capacitive division factor than the NMOS transistor equalizes the transconductances. The effect of differences in the nominal threshold voltages and the βs of the PMOS and NMOS transistors is removed through the offset programming.
The noise immunity of digital FGUVMOS circuits is an important issue to be considered. The noise margin (NM) of digital gates can be expressed as the current ratio I_on/I_off for an inverter operating in weak inversion. The noise margin deteriorates with reduced supply voltages. Another contribution to a reduced noise margin is the capacitive division factor k (0 < k < 1). Eq. 2 is based on weak inversion operation. A typical region of operation for FGUVMOS circuits, or any ultra low-voltage digital circuits, is moderate inversion. If the applied offset is increased, that is, the effective threshold voltage is reduced, the noise margin will be reduced due to the smaller dynamic current range in strong and moderate inversion compared to weak inversion. The relative transconductance g_m/I is at its maximum in weak inversion and decreases gradually through moderate inversion to strong inversion.
The offsets can be tuned or programmed to virtually any value for any double-poly CMOS process. All transistors are programmed simultaneously and the programming time is independent of the number of transistors on the chip (or wafer). The offsets are programmed by reverse biasing the supply rails and exposing the chip to UV-light [3], [9]. The UV-activated conductances [10] are used to alter the floating-gate voltages. The programming phase may be monitored on any output and is completed when the output converges to V_dd/2.
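The weak-inversion noise-margin discussion above can be illustrated numerically. The sketch below assumes an exponential drain-current model centred on V_dd/2 with a single control gate of division factor k; the slope factor n = 1.5, the thermal voltage value and the function names are assumptions introduced for illustration. As noted above, the ratio will be smaller when the transistors operate in moderate or strong inversion.

```python
import math

U_T = 0.02585  # thermal voltage at roughly 300 K, in volts (assumed)


def drain_current(v_in, v_dd, k, n=1.5, i_bec=1.0):
    """Assumed weak-inversion current of a single-input floating-gate NMOS.

    The current equals i_bec when the control gate is at v_dd / 2.
    """
    return i_bec * math.exp(k * (v_in - v_dd / 2) / (n * U_T))


def noise_margin(v_dd, k, n=1.5):
    """I_on / I_off for rail-to-rail inputs; equals exp(k * v_dd / (n * U_T))."""
    i_on = drain_current(v_dd, v_dd, k, n)
    i_off = drain_current(0.0, v_dd, k, n)
    return i_on / i_off


# The ratio shrinks with both the supply voltage and the division factor k.
for v_dd in (0.3, 0.5, 0.7):
    for k in (0.43, 0.75):
        print(f"V_dd = {v_dd:.1f} V, k = {k:.2f}: I_on/I_off = {noise_margin(v_dd, k):.3g}")
```

Under this model the ratio depends only on the product k V_dd, so halving k costs as much noise margin as halving the supply.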
The programming scheme is as follows:
1. Choose the supply voltage V_dd.
2. Determine the offset voltage.
3. Turn on the UV-light and apply the offset voltages through the power rails: the positive offset for the NMOS transistors on the ground pin and the negative offset for the PMOS transistors on the V_dd pin.
4. Monitor any output node and stop the programming by turning off the UV-light when the output node stabilizes at V_dd/2.
The supply rails are reverse biased in the programming mode, so the source and drain terminals are interchanged. This gives a high impedance power rail connection and a low impedance (common source) output, and hence all internal and output nodes are driven to V_dd/2. A more detailed discussion of the reverse biasing conditions is given in [9].
The output of the FGUVMOS inverter in Fig. 2 during programming is shown in Fig. 3 . The different curves show measurements of identical inverters on different chips. The initial states of the floating-gates are unknown prior to programming. During programming the logic levels and switching point of the inverter evolve from an initial random state as shown in Fig. 4 . The time constant of the programming is independent of the number of transistors on a chip. The floating-gate capacitance and the UV-activated current determine the programming time. Measured inverter characteristics for different supply voltages and current levels (offsets) are shown in Fig. 5 .
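The convergence of the monitored output towards V_dd/2 can be pictured with a toy first-order model. This is only an illustrative sketch: the exponential settling law, the single time constant `tau_s`, the settling tolerance and the function names are all assumptions, not a model taken from the measurements.

```python
import math


def output_during_programming(t_s, v_start, v_dd, tau_s):
    """Assumed first-order settling of the monitored output towards v_dd / 2."""
    target = v_dd / 2
    return target + (v_start - target) * math.exp(-t_s / tau_s)


def time_to_settle(v_start, v_dd, tau_s, tol_v=1e-3):
    """Time until the output is within tol_v of v_dd / 2 under the assumed model."""
    gap = abs(v_start - v_dd / 2)
    if gap <= tol_v:
        return 0.0
    return tau_s * math.log(gap / tol_v)


# Hypothetical chips starting from different unknown floating-gate states.
V_DD = 0.4
TAU_S = 100.0  # assumed time constant in seconds
for v_start in (0.0, 0.15, 0.4):
    t = time_to_settle(v_start, V_DD, TAU_S)
    print(f"start = {v_start:.2f} V: within 1 mV of V_dd/2 after {t:.0f} s")
```

In this picture the settling time depends on the initial state and the time constant of a single floating gate, not on how many gates are programmed in parallel.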
The digital FGUVMOS design may be compared to stan-
the static power consumption, and the worst case noise margin deteriorates as the number of inputs increases. As a matter of fact any digital gate may be implemented using FGUVMOS transistors with only two capacitive inputs.
4. Area/Simplicity. The input capacitors can be used to cancel the mobility difference between the NMOS and PMOS transistors without increasing the transistor size. The multiple input transistors can be used to reduce the area consumption, due to the reduced number of transistors required for a specific Boolean function.
5. Economy. Any double-poly CMOS process can be used to implement ultra low-voltage/low-power circuits. The typical programming time for the 0.8µm AMS process is 10 minutes. However, for a modern process the programming time can be significantly reduced. The programming time is proportional to the gate capacitance. If, or rather when, the programming time is reduced to a few seconds the manufacturing potential is greatly increased. The FGUVMOS programming may be included in the fabrication process or, more likely, in the test phase, and the cost is proportional to the programming time. A chip may be reprogrammed after packaging when the package provides a UV window (e.g. a UV-erasable EPROM package).
The inherent random mismatch in threshold voltages is not expected to be affected by the programming scheme, although some work has been reported on using floating gates to compensate for mismatch [11].
III. FGUVMOS Combinatorial Gates
The FGUVMOS inverter is shown in Fig. 2. NAND and NOR FGUVMOS gates are shown in Fig. 6. The worst case static power consumption of the NAND gate is equal to I_bec V_dd for inputs '01' or '10' and the noise margin is I_on/I_bec, where I_on is the drain current of the PMOS transistor with control input '0'. The FGUVMOS XOR gate is shown in Fig. 7. The static power consumption of the XOR gate is equal to the worst case power consumption of the NAND or NOR gate. Measured FGUVMOS NOR characteristics are shown in Fig. 8. Simulated noise margins of the FGUVMOS inverter and NAND/NOR gates are shown in Fig. 9. The noise margin of the FGUVMOS XOR gate is equal to the NAND/NOR gate noise margin divided by two.
Fig. 9. Simulated noise margins of the FGUVMOS inverter (solid lines) and worst case noise margins (AB = '01' or '10') of the NAND/NOR gates (dashed lines). Models of the FGUVMOS inverter and NAND/NOR gates implemented in MATLAB [12] using the EKV transistor model [8] are used to derive the noise margins. The capacitive division factor for the single input FGUVMOS transistor is equal to 0.75 and the capacitive division factor for each input of the two-input FGUVMOS transistor is equal to 0.43.
IV. FGUVMOS D Flip-Flop
A standard D flip-flop [13] is easily converted to an FGUVMOS D flip-flop as shown in Fig. 10. The flip-flops are combined to provide a frequency division (Fig. 11). The maximum operating speed and the corresponding power consumption for several values of offset and supply voltage are obtained using the circuit simulator Spectre [14] and the process parameters for the AMS 0.8µm double-poly process [15].
The performance as a function of offset for supply voltages of 0.3V, 0.5V and 0.7V is shown in Fig. 12. At offsets below the nominal threshold voltage (≈ 0.77V) the performance degrades more rapidly for low supply voltages. The power consumption is reduced even more significantly for low offsets and low supplies as shown in Fig. 13. Minimum energy (PDP) solutions are generally low performance solutions. A more useful approach is to optimize for minimum energy delay product (EDP). The inverse relative EDP for the frequency divider is shown in Fig. 14. An interesting observation is that the optimal operating point is at an ultra low supply and a large offset. If we reduce the offset, and hence increase the effective threshold voltage, the optimal supply voltage increases. The contours of constant inverse EDP are shown in Fig. 15.
Fig. 15. Inverse EDP contours. For large offsets and ultra low supply voltages (bottom right corner) the noise margin is too low to guarantee reliable operation. The optimal operating point is V_dd = 0.3V and an offset equal to 0.83V.
The optimal operating point, that is, the optimal supply voltage and effective threshold voltage (offset), may vary among designs. The logic depth and level of activity affect the optimal operating point [1]. Typical optimal operating points are V_dd = 0.22V and V_t = 0.16V [1], V_dd ≈ 0.3V and V_t ≈ 0.13V [2], and V_dd ≈ 0.3V and an offset equal to 0.83V (effective V_t ≈ 0.1V for a transistor with nominal V_t = 0.77V [15]) for the FGUVMOS frequency divider. A smaller offset, or increased effective threshold voltage, yields a larger noise margin.
V. FGUVMOS Single Bit Full Adder
Measurements of a single bit FGUVMOS full adder, shown in Fig. 16, are included to demonstrate the practicality of the FGUVMOS circuits. The logic depth from the inputs A and B to Sum and C_out is 5 and the logic depth from C_in to C_outb is 1. The measured response of the single bit FGUVMOS adder for a supply voltage of 0.4V, different input (DC) combinations and an offset of 0.9V is shown in Fig. 17. The Sum output is logically correct and the noise margin is comfortable. The different transition regions are expected to be close to V_dd/2. The actual switching points can be adjusted in the programming to fit a particular supply voltage and current level as shown in Fig. 5. The rather large deviation (156mV) in the transition regions seen in Fig. 17 is due to incomplete programming. More uniform switching points may be achieved by increasing the programming time. A simulation of an adder bit with the same offsets used in the measurements shown in Fig. 17 and a supply of 0.8V is shown in Fig. 18. The average delay from the input C_in to Sum is approximately 2.8ns and the average delay from C_in to C_out is approximately 0.5ns as shown in Fig. 19.
VI. FGUVMOS Pad-Driver
Fig. 20 illustrates the transistor diagram of a four-stage buffer circuit with four times current scaling per stage. The buffer acts as a pad-driver for FGUVMOS circuits and is designed for high frequency operation. This circuit must be used to interface with external components or measurement equipment. The buffer is included in this paper to show that FGUVMOS circuits are able to operate at frequencies in the MHz range. For a typical 0.8µm CMOS process with a standard package, the load capacitance associated with the bonding pads and chip package is approximately 3pF. Combined with a 300Ω series resistance, the maximum frequency measured externally without losing amplitude is given by the following equation:
Including the 2pF capacitance associated with the oscilloscope probe used, the maximum frequency for a sinusoidal signal will be:
So even though the buffer circuit itself tolerates very high frequencies, the response is limited by external parasitics. To verify the frequency response of the FGUVMOS buffer, it was adapted to operate on a supply voltage of 0.5V. The results when connecting the input of the buffer to a signal generator and measuring the response are shown in Fig. 21 and Fig. 22 for a 10MHz and a 20MHz input signal respectively. As can be seen from the results we have almost a full signal swing in both cases. Another property to notice is that the output signal is no longer a square wave. This is due to attenuation of the third and higher order harmonics by the external low-pass filtering described above.
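The external bandwidth limits discussed above can be estimated with a short calculation. The sketch below assumes that the pad and probe load forms a first-order RC low-pass with the 300Ω series resistance, so that the limit is the corner frequency 1/(2πRC). That choice of formula and the names used are assumptions; the component values are those quoted in the text.

```python
import math


def rc_corner_frequency(r_ohm, c_farad):
    """Corner (-3 dB) frequency of a first-order RC low-pass, in hertz."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


R_SERIES = 300.0   # series resistance in ohms
C_PAD = 3e-12      # bonding pad and package capacitance in farads
C_PROBE = 2e-12    # oscilloscope probe capacitance in farads

f_pad = rc_corner_frequency(R_SERIES, C_PAD)
f_probe = rc_corner_frequency(R_SERIES, C_PAD + C_PROBE)

print(f"pad and package only: {f_pad / 1e6:.1f} MHz")
print(f"with probe attached:  {f_probe / 1e6:.1f} MHz")
```

Under this assumption the two limits come out near 177MHz and 106MHz, which is consistent with the observation that the external parasitics, and not the buffer itself, bound the measurable frequency.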
VII. Conclusions
In this paper we have presented a novel tuning technique for digital logic enabling ultra low voltage operation (< 1V) in standard double-poly CMOS. The tuning of the effective threshold voltage of both NMOS and PMOS transistors is conveniently done through the power rails during exposure to UV-light. Combinatorial logic is shown to work through measured results that are in good agreement with simulations.
VIII. Acknowledgment
The authors would like to thank S. Naess for valuable comments and proofreading of the manuscript.
