Abstract: A nano-power CMOS voltage reference is proposed in this paper. By combining switched-capacitor technology with the body effect in MOSFETs, the output voltage is defined as the difference between two gate-source voltages using only a single PMOS transistor operated in the subthreshold region, giving an output with low sensitivity to temperature and supply voltage. A low output, which breaks the threshold restriction, is produced without any subdivision of the components, and flexible trimming capability is achieved with a composite transistor, which saves chip area. The chip is implemented in a standard 0.18 µm CMOS technology. Measurements show that the output voltage is approximately 123.3 mV, the temperature coefficient is 17.6 ppm/°C, and the line sensitivity is 0.15 %/V. When the supply voltage is 1 V, the supply current is less than 90 nA at room temperature. The chip area is approximately 0.03 mm².
I. INTRODUCTION
Interest in subthreshold CMOS circuits with power supply voltages below V th (the transistor threshold voltage when other effects are neglected) is growing because reducing the operating voltage can effectively achieve ultra-low-power dissipation [1] [2] [3] [4]. For these circuits, the reference voltage decreases as the supply voltage is reduced. Hence, a voltage reference (VR) that produces a low output voltage (< V th) and consumes power in the sub-µW range is required. However, few studies have addressed such requirements in VRs. Several reported VRs use only MOSFETs and allow for nano-power consumption [5] [6] [7] [8]. The outputs of these VRs are above the threshold voltage because the V th of the MOS diodes is used as the absolute voltage reference source. When process variations are considered, the temperature coefficient (T.C.) and the accuracy of the output voltage are not sufficient because trimming procedures are not implemented for these structures.
Resistive dividers are commonly used to achieve a low output (< V th) [9] [10] [11]. However, operation with nano-power dissipation is difficult because a trade-off always exists between design area and power consumption (a low current requires a large resistor), and the resistor network required for trimming occupies a large chip area.
Recently, several solutions have employed two devices with different V th values to achieve a low output [12] [13] [14]. Because a bandgap reference source is used in [12], the output voltage is larger than the threshold voltage, and the power dissipation is still high because resistors are used. In [13, 14], ultra-low-power dissipation can be achieved. However, these designs require a process with multiple V th values. The output voltage depends entirely on the V th difference between two distinct devices. Thus, the value is constrained by the chosen process technology, and the impact of process variations is increased.
A new method for the design of a low reference voltage (< V th) through the use of the body effect in MOSFETs is proposed in [15]. Fig. 1 shows the circuit, in which a VR with low T.C. is implemented by using the reverse-bias body effect in transistor M1. The output voltage (V ref), which is based on the difference between two gate-source voltages, is independent of V th. However, the typical structure has the following drawbacks: the supply current remains large (1.2 µA); the output suffers from the offset of the amplifier; the performance is degraded when the value of the resistor varies with the process, and the large resistor occupies more chip area; the output voltage and the T.C. are subject to component mismatches and deviations resulting from process variations; and the deviations and mismatches are not well discussed, and trimming procedures are not considered.
Generally, the switched-capacitor (SC) technique is popular for the design of bandgap VRs [16] [17] [18] because of the following advantages [19]: the offset of the amplifier can be canceled; instead of two transistors, only one transistor is used to generate the output voltage, thereby reducing the chip area and avoiding mismatch between transistors; the capacitors match better and occupy less area; and the low current does not require large resistors. A CMOS-based switched-capacitor voltage reference (SCVR) [20] was proposed to improve the design in [6]. Low output is produced through capacitive subdivision instead of the resistive subdivision used in [11], and nano-power dissipation can be achieved. The trimming capacitors can be adjusted for process variations to obtain a low T.C. However, this operation conventionally requires three capacitors, and the capacitive subdivision requires a large capacitance value. Thus, this approach occupies more chip area.
Therefore, a method that combines SC-based technology with body-biasing technology in MOSFETs is explored in this paper to solve the aforementioned problems. A low output (< V th) is produced and is scalable in standard CMOS technology, and the method achieves power and area savings. The remainder of this paper is organized as follows: Section II describes the operating principle of the proposed SCVR. Section III presents the implementation of the circuit. Design considerations and simulations are discussed in Section IV, and measurement results are presented in Section V. Finally, Section VI provides the conclusions.
II. OPERATING PRINCIPLES OF PROPOSED SCVR CIRCUIT
The proposed SCVR circuit shown in Fig. 2 is based on a sample-and-hold structure [19]. The circuit is composed of a bias current circuit, a core circuit, switched capacitors, and an operational transconductance amplifier (OTA).
From the equations, V off can be eliminated by storing it on the capacitors, which is the auto-zeroing technique. According to the law of conservation of charge, the charges must be equal to each other
From Eqs. (1) and (2), the output voltage can be expressed as
The load capacitor C L, which is usually off-chip [16] [17] [18] [20], ensures that the output is valid during the sampling period. The control circuit of the switches for the non-overlapping clock signals is presented in the accompanying figures. The gate-source voltage V sg of the PMOS transistor in the subthreshold region can be expressed as
where I D is the drain current, n is the subthreshold slope factor, V T (= kT/q, where k is Boltzmann's constant, T is the absolute temperature, and q is the elementary charge) is the thermal voltage, K (= W/L) is the aspect ratio of the transistor, C ox is the oxide capacitance per unit area, and µ p is the hole mobility.
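As a rough numerical illustration of how these quantities interact, the sketch below assumes the widely used subthreshold model I_D = K µ_p C_ox (n - 1) V_T² exp((V_sg - |V_th|)/(n V_T)), valid when V_sd exceeds a few V_T, and inverts it for V_sg. The model form, the function names, and every parameter value are assumptions chosen for illustration; they are not taken from this design.

```python
import math

K_B = 1.380649e-23      # Boltzmann's constant in J/K
Q_E = 1.602176634e-19   # elementary charge in C


def thermal_voltage(temp_k):
    """Return the thermal voltage V_T = kT/q in volts."""
    return K_B * temp_k / Q_E


def v_sg(i_d, k_ratio, vth_abs, n=1.3, mu_cox=50e-6, temp_k=300.0):
    """Source-gate voltage of a subthreshold PMOS under the assumed model.

    i_d: drain current (A); k_ratio: aspect ratio W/L;
    vth_abs: |V_th| (V); n: subthreshold slope factor;
    mu_cox: mu_p * C_ox (A/V^2). All default values are hypothetical.
    """
    v_t = thermal_voltage(temp_k)
    i_0 = mu_cox * (n - 1.0) * v_t ** 2   # characteristic current for W/L = 1
    return vth_abs + n * v_t * math.log(i_d / (k_ratio * i_0))


# Hypothetical operating points: the same device at two bias currents.
v_low = v_sg(i_d=10e-9, k_ratio=10.0, vth_abs=0.45)
v_high = v_sg(i_d=20e-9, k_ratio=10.0, vth_abs=0.45)

# Doubling the current shifts V_sg by n * V_T * ln(2), independent of |V_th|.
delta = v_high - v_low
print(f"V_sg(10 nA) = {v_low * 1e3:.1f} mV, shift for 2x current = {delta * 1e3:.2f} mV")
```

Under this assumed model, the difference between two gate-source voltages of the same device depends only on n, V_T, and the current ratio, which is why such a difference is a natural source of a PTAT contribution.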
With Eqs. (7) to (10) and V bs1 = V sg2, Eq. (4) can be rewritten as
In Eq. (11), the first term inside the square brackets is complementary to absolute temperature (CTAT) and has a negative T.C. [15]. The second term is proportional to absolute temperature (PTAT) and has a positive T.C. Because of the body-biasing effect from the bias voltage V sg2, the CTAT term is formed from two distinct threshold voltage levels. Therefore, the absolute value of V th0 in V ref is canceled. Under the assumption that
The proposed SCVR has the following advantages. The operation requires only two capacitors (C 1 and C 2), whereas other SCVRs require more [18, 20]. Based on Eqs. (4) and (11), the low output does not require capacitive subdivision; furthermore, the ratio C 1 /C 2 can be used to set the scale of the output. The reference voltage is implemented with only one transistor (M1) in the standard process, so component matching errors are reduced and chip area is saved. The input offset of the amplifier can be canceled. The bias currents need only be at nano-ampere levels because M1 and M2 work in the subthreshold region, and the current ratio m can be selected to obtain a zero T.C., as demonstrated in the next section. Fig. 6 shows the proposed SCVR circuit composed of the start-up circuit, the bias current circuit, the core circuit, and the OTA.
III. IMPLEMENTATION OF PROPOSED SCVR CIRCUIT

1. Nano-ampere Bias Current Circuit
The cascode current-bias circuit [21] can generate a nano-ampere-level current that is insensitive to process, temperature, and supply voltage; thus, low-power dissipation can be ensured. Transistors M5 and M6 operate in the subthreshold region, whereas M7 and M8 work in the linear region and the saturation region, respectively. Current mirrors in the cascode structure improve line sensitivity; however, the structure increases the minimum supply voltage. For subthreshold operation of the current mirrors and saturation operation of M8, the minimum supply voltage must ensure V sd > 4V T and V gs8 > V th8. Thus, the minimum supply voltage can be expressed as the sum of two drain-source voltages V ds and a gate-source voltage V gs8 in the branches of the circuit.
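The headroom budget described above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the minimum supply is two source-drain drops of at least 4V_T each plus one gate-source voltage V_gs8; the function name and the 0.5 V value used for V_gs8 are hypothetical and are not taken from this design.

```python
K_B = 1.380649e-23      # Boltzmann's constant in J/K
Q_E = 1.602176634e-19   # elementary charge in C


def min_supply(v_gs8, temp_k=300.0, margin=4.0):
    """Lower bound on the supply: two drops of margin * V_T plus V_gs8 (volts)."""
    v_t = K_B * temp_k / Q_E          # thermal voltage kT/q
    return 2.0 * margin * v_t + v_gs8


# Hypothetical gate-source voltage for M8 (must exceed V_th8 for saturation).
v_gs8_example = 0.5
v_dd_min = min_supply(v_gs8_example)
print(f"minimum supply at 300 K: {v_dd_min:.3f} V")
```

Because V_T grows with temperature, the bound rises slightly at the hot end of the operating range, so the worst case should be evaluated there.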
2. Control of T.C.
The core circuit consists of transistors M1 and M2, switches, and capacitors. The zero T.C. is satisfied by
where T 0 is room temperature. Thus, the temperature dependence of the SCVR can be obtained by differentiating Eq. (11) with respect to temperature and can be given by
In this work, the T. , the first term in the brace has a negative T.C., whereas the thermal voltage V T in the second term has a positive T.C. (≈ 0.087 mV/°C) [22]. Hence, a zero T.C. can be achieved by choosing an appropriate bias-current ratio m, which is the size ratio between M3a (M3b) and M4a (M4b). With the values of the parameters in Eq. (14), the value of m can be calculated from Eq. (15).
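The positive T.C. of the thermal voltage follows directly from V_T = kT/q, whose slope is k/q. The sketch below evaluates that slope (about 0.086 mV/°C, close to the value quoted above) and, for a hypothetical CTAT slope, computes the dimensionless weight that a V_T-proportional term would need in order to cancel it. The function name and the CTAT slope of -0.2 mV/°C are illustrative assumptions; how that weight maps onto the ratio m depends on the paper's own equations.

```python
K_B = 1.380649e-23      # Boltzmann's constant in J/K
Q_E = 1.602176634e-19   # elementary charge in C

VT_SLOPE = K_B / Q_E    # dV_T/dT in V/K (equivalently V per degree Celsius)


def required_ptat_weight(ctat_slope):
    """Weight g such that g * V_T cancels a CTAT slope given in V/K."""
    return -ctat_slope / VT_SLOPE


ctat_slope = -0.2e-3                      # hypothetical CTAT slope in V/K
g = required_ptat_weight(ctat_slope)
residual = ctat_slope + g * VT_SLOPE      # net first-order slope after cancellation

print(f"dV_T/dT = {VT_SLOPE * 1e3:.4f} mV/K, required PTAT weight = {g:.3f}")
```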
3. OTA Circuit
The OTA [23] consists of the transistors M o1 -M o7. The input offset resulting from asymmetries and process variations is canceled by the switched-capacitor network. The transistors M o1 and M o2 form the cascode amplifier, and M o3 and M o4 constitute the feedback loop that increases the output impedance and boosts the gain of the OTA, thereby enhancing the precision and speed of the switched-capacitor amplifier. The OTA can be operated with a nano-ampere supply current. From simulations, the gain of the OTA is approximately 70 dB when Vdd = 1 V at room temperature.
IV. DESIGN CONSIDERATIONS AND SIMULATIONS
1. Charge Injection Effects and Clock Feed-through
The switched-capacitor circuit suffers from charge-injection errors and clock feed-through, which degrade the precision. The preferred remedy is to use a transmission gate for each switch [22]. Charge injection and clock feed-through are effectively suppressed by the complementary structure because the opposite charge packets (holes and electrons) injected from the PMOS and NMOS devices cancel each other; in addition, the on-resistance of the switch is reduced, which allows high-speed operation.
2. Operational Clock Frequency
For SC circuits, the output is not valid in one phase because of the discrete-time nature of these circuits; the solution is to use an output filter capacitor. However, leakage current, which mainly consists of the switch leakage current and the bias current of the amplifier, causes the output voltage across the load capacitor to droop. Thus, a ripple always exists at the output. The value of the ripple (V ripple) is expressed by [24]
$$V_{ripple} = \frac{I_{leak}}{C_L f_{clk}} \qquad (16)$$
where I leak is the leakage current that flows into or out of the hold capacitor when the sample-and-hold amplifier is in hold mode, and f clk is the clock frequency. There is a trade-off between the clock frequency and power consumption: the lower the frequency, the less power is dissipated. However, a lower frequency results in larger voltage ripple and can degrade the accuracy of the output voltage according to Eq. (16). Hence, the operating frequency must not be too low. When our design is used in an analog-to-digital converter (ADC) [4], the output voltage is not allowed to droop by more than 1/2 LSB. The transient behavior is simulated at 80 °C, where the leakage current is large, to evaluate the performance, as shown in Fig. 7. The ripple is approximately 0.5 mV with a 100 pF output capacitor when the clock frequency is 1 kHz.
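The ripple relation, V_ripple = I_leak / (C_L f_clk), can be turned around to see what the reported numbers imply. The sketch below takes the 0.5 mV ripple, the 100 pF load capacitor, and the 1 kHz clock from the text and derives the corresponding leakage current; it then estimates the lowest clock frequency that keeps the droop within 1/2 LSB. The ADC resolution and full-scale range used for the LSB are hypothetical, as are the function names.

```python
def ripple_voltage(i_leak, c_load, f_clk):
    """Output droop per clock period: V_ripple = I_leak / (C_L * f_clk)."""
    return i_leak / (c_load * f_clk)


def implied_leakage(v_ripple, c_load, f_clk):
    """Leakage current that would produce the given ripple."""
    return v_ripple * c_load * f_clk


def min_clock_frequency(i_leak, c_load, v_budget):
    """Lowest clock frequency that keeps the ripple within v_budget."""
    return i_leak / (c_load * v_budget)


C_LOAD = 100e-12     # 100 pF output capacitor (from the text)
F_CLK = 1e3          # 1 kHz clock (from the text)
V_RIPPLE = 0.5e-3    # simulated ripple of about 0.5 mV (from the text)

i_leak = implied_leakage(V_RIPPLE, C_LOAD, F_CLK)

# Hypothetical ADC: 8 bits over a 1 V full-scale range.
half_lsb = 0.5 * 1.0 / 2 ** 8
f_min = min_clock_frequency(i_leak, C_LOAD, half_lsb)

print(f"implied leakage = {i_leak * 1e12:.1f} pA")
print(f"minimum clock for a 1/2 LSB budget = {f_min:.0f} Hz")
```

Under these assumptions the 1 kHz clock leaves margin against the droop limit; a finer ADC resolution or a smaller C_L would raise the minimum frequency proportionally.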
3. Process Variations and Trimming
Process variations are generally divided into within-die (WID) variations and die-to-die (D2D) variations [7]. WID variations (e.g., σ V th, σ W/L) cause mismatch between transistors on the same chip and influence the relative accuracy of the transistor parameters, whereas D2D variations (e.g., ∆ V th, ∆ W/L) influence the absolute accuracy of the transistor parameters. For the core circuit in this study, the influence of the WID variation of V th decreases because only one transistor is used for the voltage reference generator. The zero T.C. is primarily determined by the size ratio between M3a (M3b) and M4a (M4b); thus, it is affected by the W/L mismatch (σ W/L) between transistors that results from WID variations. This effect can be effectively reduced by careful layout techniques and large transistor sizes [22].
However, D2D variations cannot be avoided. According to Eqs. (11) and (14), the process variations result in spreads in the output voltage and in the T.C. In particular, the T.C. is more significantly affected. To investigate the impact, Fig. 8 shows the simulated results for the T.C. when Vdd = 1.5 V and C 1 = C 2 = 600 fF, where S, T, and F denote the slow, typical, and fast process corners. Because V th (S) > V th (T) > V th (F), V sg2 (S) > V sg2 (T) > V sg2 (F) at room temperature T 0 for M2. Thus, from Eqs. (14) and (15), if the size ratio m between M3a (M3b) and M4a (M4b) is set to achieve a zero T.C. for TT, then the absolute value of the negative T.C. is equal to that of the positive T.C. For SS, V sg2 (S) causes a decrease in the value of the negative T.C. in accordance with Eq. (14), such that the absolute value of the negative T.C. is smaller than that of the positive T.C., and the output voltage shows a positive temperature variation. Conversely, V sg2 (F) leads to an increase in the value of the negative T.C., such that the absolute value of the negative T.C. is larger than that of the positive T.C. for FF, and the output voltage exhibits a negative temperature dependence.
Therefore, current trimming (changing the T.C. of the PTAT term) is used to correct the slope (compensating for the T.C. variation in the CTAT term); thus, the size ratio m should be adjustable after chip fabrication. One solution is the use of a composite transistor [25]; Fig. 9(a) presents the trimming network. The trim range and the resolution are determined from the simulations shown in Fig. 8. The aspect ratios of M3a and M3b can be changed by digital logic control to adjust the T.C., as shown in Fig. 10. The solid lines show the temperature characteristics for the typical code of "111000" (S 1 to S 6 : 111000, where "1" indicates that the switch is on and "0" indicates that the switch is off). The trimming performance is shown by the dashed lines in Fig. 10.
For SS, the temperature characteristics have a negative variation with the lowest code of "000000," and the temperature characteristics for FF have a positive variation with the highest code of "111111." Hence, a resolution of 2% is sufficient to obtain a zero T.C. within the range of variation because SS and FF are the worst cases in this process, and the spread of V ref is also minimized. The adjustable range of m has little impact on the power dissipation and the signal-to-noise ratio because the bias circuit generates a bias current of only several nano-amperes. Based on Eq. (11), programmable outputs can be implemented by adjusting the ratio between C 1 and C 2; this method can also effectively reduce the output voltage spread. Fig. 9(b) shows such an adjustment with 2% resolution. Six digital control bits are used so that the output value can be increased or decreased. Table 1 shows the simulated programmable outputs for TT at room temperature; the value is scalable from 99.8 mV to 135.0 mV by changing the ratio between C 1 and C 2.
In our work, the purpose of the trimming process is to obtain the zero T.C. as well as the minimum output spread. The detailed trimming procedures are as follows. 
where
To obtain the zero T.C., m should be adjusted to compensate for the slope; then, Eq. (17) is expressed by
where ∆m is the required range for trimming. The slope factor n and the T.C. of V T are assumed to have minimal sensitivity to the process variation, and the capacitors are assumed to match very well. With Eqs. (17) and (18), the required trimming code can be calculated by
After obtaining the zero T.C., the output voltage spread can be reduced by changing the ratio between C 1 and C 2 .
NMOS transistors are used as switches (S 1 to S 6 , S a to S f ). In production lines, one-time-programmable (OTP) memories such as fuses can be used to control the switches with minimal power consumption [26] .
4. Effects from Noise
The noise performance should be considered in the SCVR circuit. The most critical noise sources are the non-zero on-resistances of the switches, the active transistors inside the OTA, and the noise induced by the clocks. In sampled-data networks clocked at f clk, high-frequency noise is folded back into the signal baseband because of aliasing of the sampled noise [27]. For the OTA, thermal noise can be reduced by designing a large transconductance g m(o1) in M o1 and a small transconductance g m(o5) in M o5 because the noise is proportional to g m(o5) /g m(o1) [22]. The use of transistors with large widths and lengths can decrease flicker noise.
For the switches S W1 -S W8 shown in Fig. 2, the noise analysis is as follows. During the sampling mode (Ф 1 = 1), the switches S W1, S W2, S W3 and S W4 turn on; thus, the noise is contributed by the on-resistances of these four switches. The total input-referred noise variance is then sampled by the capacitors C 1 and C 2, which can be expressed by [22]
With the closed-loop gain in OTA (A cl = -C 1 /C 2 ), during hold mode (Ф 2 = 1), the total output noise variance contributed by the switches S W1 , S W2 , S W3 and S W4 is given by
During the hold mode (Ф 2 = 1), the switches S W5, S W6, S W7 and S W8 turn on; thus, the noise comes from the on-resistances R 5, R 6, R 7 and R 8 of the switches S W5, S W6, S W7 and S W8, respectively. Fig. 11 illustrates the equivalent small-signal model for noise analysis. C in is the input parasitic capacitance, G m is the transconductance of the OTA, R out is the output resistance, and V n5, V n6, V n7 and V n8 are the noise sources that model the thermal noise of R 5, R 6, R 7 and R 8, respectively. The output noise variance due to R 8 is kT/C L; then, the total output noise variance contributed by the switches S W5, S W6, S W7 and S W8 can be calculated by [28]
where B w = G m C 2 /(C L C n + C 2 C m ), with C m = C 1 + C in and C n = C 2 + C 1 + C in, is the bandwidth of the circuit in Fig. 11 with R out >> (C m + C 2 )/(C 2 G m ), and R = R 5 R 6 /(R 5 + R 6 ). All on-resistances of the switches are assumed to have the same value. From Eqs. (21) and (22), with the folding effects, the total noise variance from the switches S W1 -S W8 can be expressed by [29]
where f on is the cutoff frequency of the switched-capacitor amplifier.
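To give a feel for the magnitudes in this noise analysis, the sketch below evaluates the kT/C noise for the capacitor values used in this design (100 pF for C L and 600 fF for C 1 and C 2) and the bandwidth expression B w = G m C 2 /(C L C n + C 2 C m ) given above. The values of G m and C in, the 300 K temperature, and the function names are hypothetical placeholders for illustration only.

```python
import math

K_B = 1.380649e-23   # Boltzmann's constant in J/K


def ktc_noise_rms(cap, temp_k=300.0):
    """RMS voltage of kT/C noise sampled on a capacitor (volts)."""
    return math.sqrt(K_B * temp_k / cap)


def hold_mode_bandwidth(g_m, c1, c2, c_load, c_in):
    """B_w = G_m*C_2 / (C_L*C_n + C_2*C_m), C_m = C_1+C_in, C_n = C_2+C_1+C_in."""
    c_m = c1 + c_in
    c_n = c2 + c1 + c_in
    return g_m * c2 / (c_load * c_n + c2 * c_m)


C1 = C2 = 600e-15    # sampling capacitors (from the text)
C_LOAD = 100e-12     # off-chip load capacitor (from the text)
G_M = 1e-6           # hypothetical OTA transconductance in siemens
C_IN = 50e-15        # hypothetical input parasitic capacitance in farads

noise_load = ktc_noise_rms(C_LOAD)
noise_sample = ktc_noise_rms(C1)
b_w = hold_mode_bandwidth(G_M, C1, C2, C_LOAD, C_IN)

print(f"kT/C_L noise: {noise_load * 1e6:.2f} uV rms")
print(f"kT/C noise on 600 fF: {noise_sample * 1e6:.1f} uV rms")
print(f"B_w (units follow from G_m in S and C in F): {b_w:.0f}")
```

The comparison shows why the large load capacitor contributes far less sampled thermal noise than the small on-chip capacitors.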
V. MEASUREMENT RESULTS
The proposed SCVR circuit is fabricated in a 0.18 µm CMOS process. Capacitors C 1 and C 2 are approximately 600 fF each. Fig. 12 shows the chip micrograph; the chip area is approximately 0.03 mm². The clock frequency is 1 kHz, and the off-chip capacitor C L used as the output filter capacitor is 100 pF. The measured output waveforms are shown in Fig. 13. Table 2 compares the proposed SCVR with other reported low-power CMOS VRs with low output (< V th).
The results show that the SCVR circuit can be implemented in standard CMOS technology and that a low output, below the threshold voltage, is achieved without any component subdivision and is scalable. The output voltage has satisfactorily low sensitivity to temperature and supply voltage. The circuit also offers nano-level power dissipation and compact silicon area.
VI. CONCLUSIONS
A CMOS switched-capacitor voltage reference with body-biasing technology at subthreshold operation is proposed in this paper. The low output voltage, which is below the threshold voltage, can be achieved with standard technology and without using any component subdivision. In addition, the value can be made scalable. The T.C. is 17.6 ppm/°C, the line sensitivity is approximately 0.15 %/V, and the circuit consumes power at the nano-level and occupies a small area. The influences resulting from process variations can be effectively suppressed, and the proposed trimming procedure with a composite transistor improves the T.C. when the circuit suffers from D2D variations. The design can ensure a precise voltage reference for applications in subthreshold integrated circuits.
