A body bias generator (BBG) for fine-grained body biasing (FGBB) is proposed. FGBB is effective in reducing variability and power consumption in a system-on-chip (SoC). Since FGBB needs a large number of BBGs, the BBG should preferably be implemented in a cell-based design flow. In cell-based design, providing an extra supply voltage for the BBGs is inefficient. We devised a BBG with a switched-capacitor configuration, which enables the BBG to operate over a wide supply voltage range from 0.6 V to 1.2 V. We fabricated the BBG in a 65 nm CMOS process to control a 0.1 mm² core circuit with an area overhead of 1.4% for the BBG.
key words: body bias generator, dynamic voltage frequency scaling, analog-assisted digital, low supply voltage
Introduction
Variability in LSIs prevents the supply voltage from being lowered to reduce the dynamic power of a system-on-chip (SoC). Compensating variability is therefore important for low-power applications. It has been reported that body biasing reduces delay and compensates die-to-die variability [1], which shows that body biasing is an important technique for variability compensation. Variability is classified into die-to-die and within-die variability. In recent scaled processes, location-correlated within-die variability has also become large [2]. The impact of location-correlated within-die variability was measured in an 80-core processor [3]. In that work, the maximum operating speed of each core was measured on a real chip, and the spread between the fastest and the slowest cores on the die was 62% at V_DD = 0.8 V. This shows the necessity of tuning the performance of each core within a chip. Figure 1(a) shows the concept of fine-grained body bias (FGBB) for compensating within-die variability. Each grain is called a "substrate island" in this paper. We assume that an SoC is partitioned into substrate islands and that each substrate island has a body bias generator (BBG), as shown in Fig. 1(b).
In [4]-[6], the BBGs require a nominal supply voltage larger than 1 V. This means that an additional supply voltage is needed for the BBGs, as shown in Fig. 1(b), if the core has a low supply voltage V_DD,core. The extra supply voltage increases the cost. Thus it is important for BBGs to work with the core supply voltage, as shown in Fig. 1(c). However, in lower supply voltage systems such as near-threshold or subthreshold designs, it is a challenge to design BBGs without an extra supply voltage.

†† The author is also with CREST, JST, Kyoto-shi, 606-8501 Japan.
a) E-mail: kamae@vlsi.kuee.kyoto-u.ac.jp
DOI: 10.1587/transfun.E97.A.734
In this paper, we propose a BBG with a wide supply voltage range, from the nominal supply down to near threshold, for compensating location-correlated within-die variability. The proposed BBG generates forward body bias, which reduces the threshold voltage and enhances operating speed at the cost of increased leakage power.
Additionally, the proposed BBG is implemented in a cell-based logic design flow, as explained in [7]. The BBG is partitioned into unit-height cells, and the BBG cells are placed and routed among digital circuits by CAD tools. In the unit-height cell structure, it is inefficient to provide an extra supply voltage for BBGs. Because it operates at a low supply voltage, the proposed BBG does not require extra voltage domains and is suitable for a cell-based design flow.
To implement a BBG that satisfies these supply voltage requirements, the limitation on the amplifier supply voltage is critical. We devised a switched-capacitor configuration that holds the common-mode level of the differential amplifier constant.
We have presented preliminary results of this work in [7]; here we discuss considerations for low supply voltage and the detailed implementation of the BBG. We also present a test circuit of the proposed BBG in a 65 nm CMOS process. This is the first BBG operating down to near-threshold voltage.
The remainder of this paper is organized as follows. In Sect. 2, we describe the requirements of the BBG for fine-grained body bias and the necessity of low-voltage BBGs. In Sect. 3, we show a strategy for low-voltage BBG design. In Sect. 4, we show an implemented instance of the BBG that satisfies the requirements. In Sect. 5, we show measurement results of the BBG, and Sect. 6 concludes the paper.

Additionally, since an extra supply increases design difficulty in a cell-based design flow, the BBG should preferably operate from supply lines shared with the core circuit. Several BBGs have been developed [1], [4], but none satisfies all of these features. In our previous work, we proposed a BBG for fine-grained body bias [8], which achieved the smallest BBG area, but it does not work with voltage scaling.
Low Supply Voltage BBG

Switched Capacitor Voltage Follower
In this section, the basic idea for lowering the supply voltage of the BBG is described. In a straightforward implementation, a DAC and a voltage follower are used [4]. Lowering the supply voltage of the operational amplifier is challenging because a wide input common-mode voltage range is required. We use a voltage follower based on switched capacitors to lower the supply voltage, as shown in Fig. 2.
[Eq. (1)]
In the next step, the φ_2 switches open and the φ_1 switches close. V_in− then satisfies Eq. (2).
Since the switching between φ_1 and φ_2 is much faster than any transition of the input voltage, V_C1,φ1 = V_C1,φ2 holds in steady state. Assuming a virtual short at the amplifier inputs, V_in− = V_in+ = V_middle, the output voltage V_out is finally given by Eq. (3).
The output voltage V_out does not depend on the bias voltage V_middle, which means that V_middle can be chosen arbitrarily.
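The independence of the output from the bias voltage can be checked with a small numerical sketch. Because the circuit equations are not reproduced here, the model below assumes one common two-phase topology, which may differ from the circuit in Fig. 2: in φ_2 the capacitor C1 samples the difference between the input and V_middle, and in φ_1 it is connected between the amplifier output and the inverting input. The function name and the ideal-amplifier behavior are illustrative assumptions.

```python
def follower_output(v_in, v_middle):
    """Ideal two-phase switched-capacitor follower (assumed topology)."""
    # Phase phi_2 (assumed): C1 is charged between the input and V_middle.
    v_c1 = v_in - v_middle
    # Phase phi_1 (assumed): C1 sits between V_out and the inverting input.
    # An ideal amplifier forces V_in- = V_in+ = V_middle (virtual short),
    # and C1 keeps its charge in steady state, so V_out = V_in- + V_C1.
    v_in_minus = v_middle
    return v_in_minus + v_c1


if __name__ == "__main__":
    # Sweep the bias voltage: the output tracks the input in every case.
    for v_middle in (0.3, 0.5, 0.7):
        print(v_middle, follower_output(0.4, v_middle))
```

Under these assumptions the V_middle terms cancel, so the amplifier inputs can sit at whatever common-mode level suits a low supply voltage.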
Low Supply Voltage Amplifier
In the previous section, we showed that the proposed BBG configuration does not require a wide input common-mode voltage range. We can therefore design the amplifier and choose its input common-mode voltage freely to ensure low supply voltage operation. To achieve supply voltage (V_DD) scaling, we consider two points: the number of stacked transistors in the first stage and the input common-mode range. Figure 3 shows the amplifier. The most critical stage at low V_DD is the first stage. To keep every MOSFET in the saturation region, each MOSFET should meet the following conditions.
where V_thx and V_Dsatx are the threshold voltage of MOSFET Mx and the drain-source voltage at which its channel pinches off, respectively. To satisfy Eqs. (4) and (5), the common-mode voltage V_in,cm must satisfy Eq. (6).
For instance, with V_th = 0.4 V and V_Dsat = 0.1 V, operation at a V_DD as low as 0.6 V is possible if V_in,cm ≈ 0.5 V.
Implementation
In this section, we present the proposed low supply voltage BBG. To utilize the elements described in the previous section, the BBG has the structure shown in Fig. 4: it consists of two pairs of a DAC and a voltage follower, a timing generator, control logic, and a bias circuit. Two switches that tie the outputs to zero body bias (ZBB) are also implemented for robust startup. Every transistor in the BBG is body-biased by the BBG itself, since it shares the N-well and P-well with the core circuit. The label V_in in the schematic indicates the input of the voltage follower. The voltage follower concurrently operates as a sampler for the DAC. The features of our implementation are as follows:
(1) The lower limit of the supply voltage is determined by the amplifier in this configuration. To obtain a wide supply voltage range, the input common-mode voltage is kept constant.
(2) Since the sensitivity of the threshold voltage to body bias (δV_th/δV_BB, where V_BB is the body bias voltage) is small, moderate accuracy of the output voltage is acceptable, so we can reduce power and area overhead while achieving the required level of accuracy. For instance, the sensitivity to body bias is around 0.3 in the process used for our demonstration, which means that a 0.6 V body bias lowers the threshold voltage by 0.18 V.
(3) In the triple-well structure, there is a large capacitance between the P/N-wells and the P-substrate. The parameterized strength of the output drivers handles the phase compensation issue caused by this load capacitance [7].
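The arithmetic behind the body bias sensitivity can be sketched as follows. The linear model and the function name are illustrative assumptions; only the sensitivity of about 0.3 and the 0.6 V body bias come from the text.

```python
def threshold_shift(v_bb, sensitivity=0.3):
    """Threshold voltage reduction (V) for a forward body bias v_bb (V).

    Assumes a linear model: delta_Vth = sensitivity * V_BB.
    """
    return sensitivity * v_bb


if __name__ == "__main__":
    # 0.3 * 0.6 V body bias, printed with two decimals.
    print(f"{threshold_shift(0.6):.2f} V")
```

Because the shift scales with a factor well below one, an error in the generated body bias is attenuated before it reaches the threshold voltage, which is why moderate output accuracy is acceptable.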
DAC
The proposed BBG has a serial charge-redistribution DAC, which has two advantages: (1) The power consumption of the DAC is easily reduced by scaling the sampling rate. (2) It consists of fewer elements than other types, including the C-2C DAC, and occupies less area.
The resolution of the DAC is 6 bits, which is determined by the timing generator. The LSB is 19 mV at V_DD = 1.2 V and 9 mV at V_DD = 0.6 V, which is equivalent to 4 mV and 2 mV in MOSFET threshold voltage, and to 1% and 3% in gate delay, respectively. This shows the possibility of suppressing location-correlated within-die variability to less than 3% at V_DD = 0.6 V.
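A behavioral sketch of a serial charge-redistribution DAC illustrates where these step sizes come from. It assumes ideal, equal capacitors, a reference equal to V_DD, and LSB-first operation, none of which is stated explicitly above; switch parasitics are ignored, and the function names are hypothetical.

```python
def serial_dac(code, n_bits=6, v_ref=1.2):
    """Ideal serial charge-redistribution DAC with two equal capacitors."""
    v_hold = 0.0  # voltage on the accumulating capacitor after reset
    for i in range(n_bits):  # bits are processed LSB first
        bit = (code >> i) & 1
        v_share = v_ref if bit else 0.0  # precharge the second capacitor
        # Closing the sharing switch averages the two equal capacitors.
        v_hold = (v_hold + v_share) / 2.0
    return v_hold  # ideally v_ref * code / 2**n_bits


def lsb(n_bits=6, v_ref=1.2):
    """Ideal step size of the DAC in volts."""
    return v_ref / 2 ** n_bits


if __name__ == "__main__":
    for v_dd in (1.2, 0.6):
        print(f"V_DD = {v_dd} V: LSB = {lsb(v_ref=v_dd) * 1e3:.3f} mV")
```

With a 1.2 V reference and 6 bits the ideal step is 1.2/64 V, about 18.75 mV, and with 0.6 V it is about 9.4 mV, in line with the rounded 19 mV and 9 mV quoted above. Only two capacitors and a few switches are involved regardless of resolution, since the bit count is set by the number of clocked steps.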
To keep the linearity error below 1 LSB, the capacitors C_DA1 = C_DA2 should satisfy Eq. (7).
where C_switch is the parasitic capacitance of a CMOS switch. We chose minimum-size CMOS switches and sufficiently small capacitors C_DA1 and C_DA2 that satisfy Eq. (7). The area of the two capacitors and three switches of the DAC is 37 μm², equivalent to 6 DFF cells in this process.
Amplifier
Figure 3 shows a schematic of the amplifier. Since the amplifier is required to operate over a wide supply voltage range down to near threshold, the number of stacked MOSFETs is limited. The first stage has three stacked MOSFETs, and the second and third stages have only two. The load capacitance C_L of the amplifier is large and depends heavily on the bias voltage; for instance, assuming a 0.1 mm² substrate island, C_L reaches up to 100 pF. The Miller effect of two stages magnifies the phase compensation capacitor C_C. The output impedance of the second stage is small enough that we can neglect the pole formed by the second-stage output impedance and the third-stage input capacitance.
The second stage comprises two differential-to-single-ended amplifiers that drive the next stage in class B so as to reduce the steady-state power consumption of the BBG. MP4, MN4, MP5, and MN5 drive MN7, and MP4, MN4, MP6, and MN6 drive MP7. The MOSFETs are sized to satisfy W_P5/W_N5 < W_P4/W_N4 < W_P6/W_N6, where W_x is the gate width of MOSFET Mx, and all MOSFETs have similar gate lengths. Since the crossover distortion caused by the class-B configuration is reduced by the voltage gain of the first stage, the input-referred crossover distortion is 3 mV in this design, which is small enough compared with the LSB of the DAC.
Using the small-signal parameters of the MOSFETs, we can analyze the amplifier to confirm stability. The detailed analysis of the amplifier is described in the Appendix. According to the analysis, two poles P_1 and P_2, one zero Z_1, and the voltage gain A_V are obtained as follows.
[Eq. (8) and following: expressions for P_1, P_2, Z_1, and A_V]
where g_mx and g_dsx are the transconductance and output conductance of MOSFET Mx, respectively. Since the load capacitance depends heavily on the PN junction voltage, the bias point at the highest body bias voltage was considered in order to secure the phase margin at any V_DD.
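For readers who want to experiment with the stability condition, the following sketch computes the phase margin of a generic two-pole, one-zero response. It assumes left-half-plane poles and zero given as positive angular frequencies in rad/s and a DC gain above unity. The function names and the numerical values in the usage example are purely illustrative and are not taken from this design.

```python
import math


def gain_mag(w, a_v, p1, p2, z1):
    """|H(jw)| for H(s) = A_V (1 + s/z1) / ((1 + s/p1)(1 + s/p2))."""
    num = math.hypot(1.0, w / z1)
    den = math.hypot(1.0, w / p1) * math.hypot(1.0, w / p2)
    return a_v * num / den


def phase_margin_deg(a_v, p1, p2, z1):
    """Phase margin in degrees; requires a_v > 1."""
    lo, hi = 1e-3, 1e15  # rad/s, bracket around the unity-gain crossing
    for _ in range(200):  # bisection on a logarithmic frequency axis
        mid = math.sqrt(lo * hi)
        if gain_mag(mid, a_v, p1, p2, z1) > 1.0:
            lo = mid
        else:
            hi = mid
    w_u = math.sqrt(lo * hi)  # unity-gain angular frequency
    # Left-half-plane zero adds phase lead; each pole adds phase lag.
    phase = math.atan(w_u / z1) - math.atan(w_u / p1) - math.atan(w_u / p2)
    return 180.0 + math.degrees(phase)


if __name__ == "__main__":
    # Hypothetical values: dominant pole, second pole, and a distant zero.
    print(phase_margin_deg(a_v=1000.0, p1=1e3, p2=1e7, z1=1e8))
```

Moving the second pole closer to the unity-gain frequency, as a larger load capacitance would, lowers the result, which is why the largest load capacitance is the case to check.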
The amplifier was designed for a maximum load capacitance of 100 pF. The phase compensation capacitor C_C is implemented with the gate-body capacitance of MOSFETs, and the phase compensation resistor R_C is implemented with the source-drain resistance of MOSFETs in the linear region, as described in Fig. 6. Sources and drains are also connected to the gates so that the MOSFET is always in the cut-off region.
Fig. 7 Voltage dependency of the phase compensation capacitor C_C and resistor R_C.
Results at both the nominal supply voltage (1.2 V) and the low supply voltage (0.6 V) are shown in Table 1. The results show that the phase margin is always larger than 45° at any supply voltage.
Bias Circuits
Figure 8 shows a schematic of the bias voltage (V_middle) generator for the voltage followers.
While the switches φ_VM1 are closed, capacitor C_M1 is charged to V_DD and capacitor C_M2 is charged to V_SS. In the next step, the switches φ_VM2 are closed and the circuit outputs the voltage V_middle = (C_M1 V_DD + C_M2 V_SS) / (C_M1 + C_M2). Capacitor C_M3 holds the output voltage while the φ_VM2 switches are open. Since the V_middle generator consists of CMOS switches, it can operate down to near-threshold voltage.
Figure 9 shows a schematic of the current bias generator for the amplifiers. The switched capacitor C_B1 operates as a resistor R_B1 = 1 / (f_s,bias · C_B1), where f_s,bias is the switching frequency and C_B1 is the capacitance. A filter consisting of a switched capacitor C_B3 and a capacitor C_B4 suppresses switching noise on the gate voltage V_bias. To generate the required bias current at any operating voltage down to near-threshold voltage, the division factor of a prescaler is programmable. The prescaler changes the frequency f_s,bias of the switched capacitors to source the required current for the amplifier.
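The bias relations can be checked numerically. The sketch below assumes ideal switches and capacitors, charge sharing among all three capacitors of the V_middle generator in the second phase, and the standard switched-capacitor equivalent resistance R = 1 / (f_s · C). The capacitance values, the switching frequency, and the function names are illustrative and not taken from the design.

```python
def v_middle_steady(c_m1, c_m2, c_m3, v_dd, v_ss=0.0, cycles=200):
    """Iterate the two-phase charge sharing of a V_middle generator."""
    v_hold = 0.0  # initial voltage on the hold capacitor C_M3
    for _ in range(cycles):
        # phi_VM1: C_M1 is charged to V_DD and C_M2 to V_SS.
        q = c_m1 * v_dd + c_m2 * v_ss
        # phi_VM2 (assumed): C_M1, C_M2, and C_M3 share their charge.
        v_hold = (q + c_m3 * v_hold) / (c_m1 + c_m2 + c_m3)
    return v_hold


def sc_resistance(f_s, c):
    """Equivalent resistance of a switched capacitor, R = 1 / (f_s * C)."""
    return 1.0 / (f_s * c)


if __name__ == "__main__":
    # Equal 100 fF sampling capacitors and a 1 pF hold capacitor.
    print(v_middle_steady(100e-15, 100e-15, 1e-12, v_dd=0.6))
    # A 100 fF capacitor switched at 1 MHz, in ohms.
    print(sc_resistance(1e6, 100e-15))
```

The iteration settles at (C_M1 V_DD + C_M2 V_SS) / (C_M1 + C_M2) regardless of the hold capacitor, which only sets how many cycles the settling takes. Lowering the switching frequency raises the equivalent resistance, which is the knob a programmable prescaler provides.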
Layout
We designed a test circuit instance of the BBG. The area of the core circuit to be controlled is 0.1 mm². We designed the BBG as cell-based blocks compatible with CAD tools. The cell-based design strategy is explained in [7].
The BBG was fabricated in a 65 nm CMOS dual-V_th process. Figure 10 shows the placement result and a chip photograph [7]. In the area of 72 μm × 36 μm, we synthesized, placed, and routed the BBG with a core circuit. We integrated a process monitor [9] to estimate the NMOS and PMOS characteristics. The remainder of the circuit contains ring oscillators to evaluate logic gates such as inverters, NAND, NOR, and XOR gates. The BBG occupies 0.0014 mm² and consists of 128 analog block cells and 239 logic gates. The process monitor and body bias control logic occupy a further 0.0012 mm². The total area overhead of body biasing is less than 2.6%.
Measurement Results
Figure 11 shows the step response of the BBG outputs at supply voltages V_DD = 1.2 V and V_DD = 0.6 V. After the control signal FB-on goes low, the output voltages of the N-well (V_Nwell) and P-well (V_Pwell) swing. The 90% response time is within 2.0 μs, including a 1.0 μs DAC conversion time, with a clock of f_ck = 50 MHz at V_DD = 1.2 V, and within 4.0 μs with a clock of f_ck = 25 MHz at V_DD = 0.6 V.
Differential nonlinearity (DNL) was also measured and is less than 8 mV for forward body bias (FBB) input voltages up to 0.6 V at V_DD = 1.2 V, as shown in Fig. 12. Figure 13 shows the speed of fan-out-4 inverters measured with 0.5 V FBB generated by the BBG and without body bias. At the scaled supply voltage V_DD = 0.6 V, 0.5 V FBB yields a 170% gate speed enhancement compared with no body bias. Table 2 shows a performance comparison of BBGs. The proposed BBG has the lowest supply voltage in the table and is suitable for fine-grained body bias and design automation. It also shows the best performance in terms of area overhead and power consumption.
Conclusion
In this paper, a forward body bias generator for fine-grained body bias has been presented. The BBG operates at a supply voltage as low as 0.6 V, which significantly improves design portability, and it is implemented with a cell-based design flow. For low supply voltage operation, a switched-capacitor voltage follower with a constantly biased operational amplifier was introduced.
We also show a test circuit of the BBG that is automatically placed and routed with the core circuit. The BBG and …

… C_gp7, g_dsp6 + g_dsn6 … much higher than the unity gain frequency (UGF), we can neglect these poles and treat the stage as a large-gain single-stage amplifier.

… g′_mp7 V_1 − g″_mp7 V_2 − I_out  (I_out > 0)  (A·7)
… g′_mn7 V_1 − g″_mn7 V_2 − I_out  (I_out < 0)  (A·8)

where g′_mp7 = g_mp7 g_mp6 / (g_dsp6 + g_dsn6), g″_mp7 = g_mp7 g_mn6 / (g_dsp6 + g_dsn6), g′_mn7 = g_mn7 g_mn5 / (g_dsp5 + g_dsn5), … . For simplicity, we choose design parameters that satisfy linearity over every I_out; … . Following this analysis and neglecting poles located much higher than the UGF, we can apply the analysis for multipath Miller zero cancellation described in [10]. Two poles P_1 and P_2, a zero Z_1, and the DC gain A_V are obtained as follows.

[Eqs. (A·9), (A·10), and following: expressions for P_1, P_2, Z_1, and A_V]
