Abstract-A low-power level shifter capable of up-converting sub-50 mV input voltages to 1 V has been implemented in a 28 nm FDSOI technology. Diode-connected transistors and a single-NWELL layout strategy have been used along with poly and back-gate biasing techniques to achieve an adequate balance between the drive strengths of the pull-up and pull-down networks. Measurements showed that the lowest input voltage level that each of the 10 chip samples could up-convert ranged from 39 mV to 52 mV. Half of the samples could up-convert from 39 mV to 1 V. The simulated energy consumption of the level shifter was 5.2 fJ for an up-conversion from 0.2 V to 1 V at an operating frequency of 1 MHz.
I. INTRODUCTION

SUPPLY voltage scaling has received intense attention as an efficient approach to meeting the growing demand for energy and power efficiency in battery-operated and energy-harvested wireless systems. Multiple voltage domains are usually employed to address issues such as speed, area and robustness of such systems. Level shifters (LS) are required in a multiple-V dd system for correct communication between different voltage domains. For example, in [1], the ultra-low-voltage sub-system comprised 14 different voltage domains, where an ARM Cortex-M0+ processor had a minimum operating voltage of 250 mV.
In the cross-coupled LS of Fig. 1(a), when the 'IN' signal switches from low to high (high to low), M1 (M2) pulls node n1 (n2) down, and then M4 (M3) turns on and tries to pull node n2 (n1) up. If V ddL is much lower than V ddH, the drive current of the NMOS devices (when their aspect ratios are comparable to those of the PMOS transistors) is very low compared to that of the PMOS devices, and n1 (n2) is pulled down slowly. Thus, M4 (M3) turns on and charges n2 (n1) to V ddH very gradually, and hence M3 (M4) turns off slowly. This, in turn, results in contention between the pull-up network (PUN) and the pull-down network (PDN). When the input transistors of a conventional cross-coupled LS are in the deep-subthreshold region, the NMOS devices must be excessively upsized [2] to overcome the drive strength of the PUN. Several solutions have been proposed to overcome this obstacle. Most of the proposed techniques are based on the cross-coupled or the current mirror structures, and multithreshold CMOS (MTCMOS) techniques [3]-[5] have been used to achieve a better balance between the drive strengths of the PUN and the PDN. The cross-coupled PUN has been replaced by a current mirror load in [6] to reduce the minimum convertible input voltage level. The main drawback of this solution is its high static power consumption [2]. A Wilson current mirror (WCM) and a modified WCM (MWCM) have been used in the PUNs of the level shifters proposed in [2] and [7], respectively, to reduce the leakage. The power consumption of the LS circuits based on the WCM [2] and the MWCM [7] rises rapidly for input voltages lower than roughly 300 mV. The WCM-LS implemented in a 28 nm FDSOI technology also shows a rapid rise in the propagation delay for deep-subthreshold voltages [8]. This is not a desirable characteristic for an LS used to up-convert the output signals of circuits operating at the minimum possible V dd [9].
A two-stage LS has been introduced in [10], where a top diode-connected device weakens the strength of the first-stage PUN. The two-stage LS in [10] suffers from a relatively large delay.
Diode-connected transistors between the pull-up PMOSs and the pull-down NMOSs have been proposed in [11] to limit the PUN current at the beginning of a transition. In [12], header off-biased PMOS devices have been used in conjunction with the current-limiter diode-connected transistors between the pull-up PMOSs and the pull-down NMOSs to decrease the minimum convertible voltage level further. Moreover, in [12], the gates of the PMOS and NMOS devices in the output inverter have been connected to different nodes to reduce the leakage current. While achieving low power consumption, the approach of [12] needs wide off-biased PMOS transistors.
In this brief, the top diode-connected devices are proposed to be used along with the current-limiter diodes. Additionally, a single-NWELL (SNW) configuration together with back-gate and poly biasing techniques creates an adequate balance between the drive strengths of the PUN and the PDN. This enables the proposed topology to up-convert extremely low input voltages to 1 V. According to the post-layout simulation results, the energy per transition of the LS was 5.2 fJ at 0.2 V and 1 MHz. The measured minimum V ddL of the LS was in the range of 39 mV to 52 mV for ten measured samples. To the best of the authors' knowledge, this is the lowest reported V ddL so far.
1549-7747 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This brief is organized as follows: Section II describes the proposed level shifter circuit. The simulation and measurement results are shown in Section III. Section IV discusses the results, and the conclusion is drawn in Section V.
II. PROPOSED LEVEL SHIFTER
A. Operation Principle
Fig. 1(b) shows the schematic of the proposed level shifter. The diode-connected transistors M7-M8 are used to decrease the source-gate voltage of the M5-M6 transistors. Additionally, the M3 and M4 transistors restrict the PUN current at the beginning of a transition because the V SG s of M3 and M4 do not change instantaneously [11]. In order to reduce the leakage current of the output inverter, as proposed in [12], the gates of M9 and M10 are not connected to the same node.
The transistor dimensions are listed in Table I. The channel lengths are written as L min + L PB to indicate the applied poly-biasing [13], where L min is the minimum drawn gate length and L PB is the applied poly bias. The poly-biasing technique allows the effective gate length to be increased by up to 16 nm without increasing the active area. In order to reduce the drive strength of the PUN, the channel lengths of the PMOS transistors in the main conversion stage (i.e., M3-M8) were increased by 8 nm using poly-biasing. A poly bias of 4 nm was also applied to transistors M1 and M2 to decrease the leakage current. In the steady state when IN is '0', the voltage at node N4 will be V ddH − |V DP | and the V SG of M10 will be |V DP |, where |V DP | is the voltage drop across the diode-connected transistor. This raises the leakage current flowing through M10, which is nominally in the off-state at that time. Therefore, to limit this leakage current, the effective gate length of M10 was increased by 16 nm by poly-biasing. To balance the driving strengths of M10 and M9, a poly bias of 8 nm was applied to M9 as well.
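The poly-bias assignments described above can be collected in a small lookup, which makes the L min + L PB bookkeeping explicit. This is only an illustrative sketch: the poly-bias values and the 16 nm upper limit come from the text, while `L_MIN_NM` is an assumed placeholder for the minimum drawn gate length (Table I is the authoritative source), and the function name is hypothetical.

```python
# Sketch of the L_min + L_PB bookkeeping described in the text.
# ASSUMPTION: L_MIN_NM is a placeholder for the minimum drawn gate length;
# consult Table I for the actual transistor dimensions.
L_MIN_NM = 30
MAX_POLY_BIAS_NM = 16  # upper limit of the poly-biasing technique (from the text)

# Poly bias per device in nm, as stated in the text.
POLY_BIAS_NM = {
    "M1": 4, "M2": 4,            # input NMOS devices: reduce leakage
    "M3": 8, "M4": 8, "M5": 8,   # PMOS devices of the main conversion stage:
    "M6": 8, "M7": 8, "M8": 8,   # weaken the pull-up network
    "M9": 8,                     # balances the drive strength against M10
    "M10": 16,                   # limits leakage caused by V_SG = |V_DP|
}


def effective_length_nm(device, l_min_nm=L_MIN_NM):
    """Return the effective gate length L_min + L_PB of a device in nm."""
    poly_bias = POLY_BIAS_NM[device]
    if not 0 <= poly_bias <= MAX_POLY_BIAS_NM:
        raise ValueError(f"poly bias of {device} is outside 0..{MAX_POLY_BIAS_NM} nm")
    return l_min_nm + poly_bias


if __name__ == "__main__":
    for name in POLY_BIAS_NM:
        print(name, effective_length_nm(name), "nm")
```

Keeping the bias values in one table makes it easy to check that no device exceeds the 16 nm limit when a sizing is changed.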
Although the top diode-connected transistors increase the leakage current flowing through M10, they reduce the leakage current of the main conversion stage. This reduction is due to the fact that the absolute values of the drain-source voltages of M1-M6 decrease when M7 and M8 are employed. Moreover, the capability of the LS to convert extremely low input voltages to higher voltages improves because the source-gate voltage of the M5-M6 devices decreases when these two transistors are added.
To investigate the above-mentioned issues further, the static power, delay and capability of the level shifter to convert low level voltages to 1 V were simulated with, and without, the M7 and M8 transistors. The extracted netlists from the layout of the circuits (considering parasitic capacitors and resistors) were used during the simulations. Fig. 2(a) shows the static power consumption of the level shifter versus V ddL with, and without, M7 and M8 transistors. As can be observed from Fig. 2(a) , a substantial reduction in the static power consumption was achieved by adding the M7 and M8 transistors. For example, the static power dissipation decreases by 45% at a supply voltage of 200 mV.
As illustrated in Fig. 2(b), the LS with the top diode-connected devices is slower, and the difference in delay grows at higher V ddL s. In order to investigate the impact of M7 and M8 on the yield of the level shifter at each V ddL, we carried out 2000 Monte Carlo mismatch simulations (to achieve 3σ accuracy) at all five process corners. To estimate the yield, we applied a pulse signal with a frequency of 1 kHz and a 50% duty cycle to the LS input while V ddH was 1 V. The yield of the level shifter at each V ddL was calculated by counting the number of outputs that were correctly converted from the V ddL to the V ddH voltage domain. The simulations were performed at room temperature. The distinct advantage of employing the M7 and M8 transistors to up-convert extremely low input voltages is evident from Fig. 3.
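The yield calculation described above amounts to counting the Monte Carlo runs whose output was converted correctly. A minimal sketch of that counting step follows; it is not the authors' simulation flow. The pass criterion in `is_converted` (output levels within a margin of the rails) and its `margin` parameter are assumptions introduced for illustration, and the binomial standard error is added only to indicate the statistical uncertainty of a finite number of runs.

```python
import math


def is_converted(v_out_high, v_out_low, v_ddh=1.0, margin=0.1):
    """Hypothetical pass criterion: the output must swing to within
    `margin` * V_ddH of both rails. The text does not state the criterion used."""
    return v_out_high >= (1.0 - margin) * v_ddh and v_out_low <= margin * v_ddh


def estimate_yield(outcomes):
    """Return (yield, standard error) from an iterable of pass/fail booleans,
    one per Monte Carlo run."""
    outcomes = list(outcomes)
    runs = len(outcomes)
    if runs == 0:
        raise ValueError("at least one Monte Carlo run is required")
    passes = sum(1 for ok in outcomes if ok)
    fraction = passes / runs
    # Standard error of a binomial proportion.
    std_err = math.sqrt(fraction * (1.0 - fraction) / runs)
    return fraction, std_err


if __name__ == "__main__":
    # Made-up example data: 2000 runs, of which 1500 pass.
    example = [True] * 1500 + [False] * 500
    y, se = estimate_yield(example)
    print(f"yield = {y:.3f} +/- {se:.3f}")
```

In practice the booleans would come from post-processing the simulated output waveform of each run at a given V ddL and process corner.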
B. Level Shifter Layout
The MTCMOS technique has been exploited to design subthreshold to above-threshold level shifters [3]-[5]. Since the input devices have a low V GS, devices with a lower threshold voltage have been used at the input. In this brief, an SNW configuration was utilized to implement the proposed level shifter. This means that all the M1-M10 transistors were implemented in the NWELL, i.e., as low-threshold (LVT) NMOS transistors and regular-threshold (RVT) PMOS transistors [13]. The NWELL ties were connected to the high supply voltage (V ddH). By doing so, a forward back-bias was provided to the NMOS devices and a reverse back-bias was provided to the PMOS devices, which, in turn, provides a better balance between the strengths of the pull-up and pull-down transistors and reverse-biases the parasitic diodes between the NWELL and the PWELL/P-substrate. In Fig. 4, vdds and gnds represent the back-gate voltages of the PMOS and NMOS devices in the subthreshold logic cells. All the polysilicon polygons run in a single direction, and a channel width of 200 nm was used for all the M1-M10 transistors to obtain a regular layout and relatively good matching properties. Fig. 4(a) and (b) depict two layout approaches to implementing the level shifter. Both are compliant with the layout of our custom logic cell library. The cell in the layout strategy shown in Fig. 4(a) has a height slightly above twice that of the custom logic cells and has two separate NWELLs for the PMOS and NMOS devices. We used this approach in the implementation of the LS in silicon. The layout strategy shown in Fig. 4(b), on the other hand, shares a single NWELL between the NMOS and PMOS devices and has the same height as the custom logic cells.
III. RESULTS
A. Simulation Results
The simulated performance of the proposed LS in terms of delay, total power and static power consumption is presented in this section. We used a netlist extracted from the layout view for all the simulations, and parasitic capacitors and resistors were taken into account during the netlist extraction. The temperature was set to 27 °C and V ddH was set to 1 V for all the simulations. The low supply voltage was increased in 50 mV steps from 50 mV up to 500 mV. The load of the LS was a 1x strength inverter (L p = L n = 30 nm, W p = 300 nm, W n = 200 nm). An input pulse with a 1 MHz frequency, a 50% duty cycle and a 10 ns rise/fall time was applied to the LS input to measure its power consumption and delay. The static power consumption reported in this brief is the average of the static power with the input set to 0 and to V ddL.
The propagation delay, the total power and the static power consumptions of the proposed LS, at five different process corners, are shown in Fig. 5 as a function of V ddL .
To verify the robustness of the LS circuit against within-die random process variations, 2000 mismatch post-layout simulations were carried out at the typical process corner (TT). The simulations were performed at 50, 100, 150 and 200 mV to investigate the variability of the circuit parameters as a function of V ddL. The yield of the up-conversion at 50 mV and 100 mV was not 100%. Therefore, at 50 and 100 mV, only the results of the MC runs in which the LS functioned correctly were included. The distributions of the propagation delay, the total power and the static power consumption at four different V ddL s are shown in Fig. 6, Fig. 7 and Fig. 8, respectively.
B. Measurement Results
The proposed LS circuit was fabricated in a 28 nm FDSOI technology. Fig. 9(a) shows the block diagram of the test circuit. The output signal of an 8-bit subthreshold multiplier was applied to the LS input. The input to the LS was provided by the 9X inverter so that excessively long rise/fall times at the LS input would not limit the circuit performance. The duty cycle of the input signal was 29%. The supply voltage of the multiplier circuit was 180 mV, and the LS input voltage level was controlled by adjusting V ddL. The LS output was buffered through an inverter chain designed to drive the capacitive load of the output pad and the oscilloscope.
We used an HP 6632A and a MASCOT 719 DC power supply to generate the supply voltages of the LS, and the input/output signals were captured by a ROHDE & SCHWARZ RTE 1022 oscilloscope.
We measured ten chips to evaluate the capability of the LS to up-convert subthreshold signals to the nominal 1 V. Fig. 9(b) illustrates the measured waveforms of the LS output (LS_out) and the buffer output (BUF_out) when V ddL was set to 39 mV and V ddH was 1 V. Five of the ten measured chips were able to up-convert a 39 mV input to 1 V, and the V ddLmin of nine samples was equal to or below 50 mV. The minimum convertible input voltage level was 52 mV for one sample.
IV. DISCUSSION
According to the simulation results of Fig. 5:
(I) In the subthreshold region (i.e., V ddL ≲ 350 mV), the propagation delay of the designed LS increases exponentially as V ddL is lowered. This is because the current drive capability of M1, M2 and M11-M16 is reduced at lower V ddL s.
(II) The slew rate of the LS input inverter decreases exponentially as V ddL is lowered. This contributes to a significant rise in the short-circuit power, and consequently in the total power consumption of the LS, for V ddL s smaller than 150 mV.
(III) The static power is relatively flat for V ddL s larger than 100 mV. However, the deviation of the low output voltage level (V OL ) of the input inverter from its ideal value of 0 causes a notable increase in the leakage current flowing through the M1 and M2 transistors, particularly in the FF (fast-NMOS, fast-PMOS) process corner.
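The exponential delay trend in item (I) can be illustrated with a first-order subthreshold model. The sketch below assumes that the delay is inversely proportional to the on-current of the input devices and that this current follows exp(V ddL / (n V t)), so constant factors cancel in a ratio. The slope factor `n = 1.3` is a hypothetical value chosen only for illustration, and none of the numbers are taken from Fig. 5.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant in J/K
Q_E = 1.602176634e-19   # elementary charge in C


def thermal_voltage(temp_k=300.15):
    """Thermal voltage kT/q in volts (300.15 K corresponds to 27 degC)."""
    return K_B * temp_k / Q_E


def relative_delay(v_ddl, n=1.3, temp_k=300.15):
    """Delay up to a constant factor, assuming t_d ~ 1 / I_on and
    I_on ~ exp(V_ddL / (n * V_t)). ASSUMPTION: n = 1.3 is illustrative only."""
    return math.exp(-v_ddl / (n * thermal_voltage(temp_k)))


def delay_ratio(v_low, v_high, n=1.3, temp_k=300.15):
    """How many times slower the model is at v_low than at v_high."""
    return relative_delay(v_low, n, temp_k) / relative_delay(v_high, n, temp_k)


if __name__ == "__main__":
    for v_low, v_high in [(0.15, 0.20), (0.10, 0.20), (0.05, 0.20)]:
        ratio = delay_ratio(v_low, v_high)
        print(f"{v_low * 1e3:.0f} mV vs {v_high * 1e3:.0f} mV: x{ratio:.1f}")
```

Under this model, every fixed decrement of V ddL multiplies the delay by the same factor, which corresponds to a straight line on a semi-logarithmic plot of delay versus V ddL.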
The proposed LS circuit operates robustly, as indicated by the variability (σ/μ) of the LS characteristics obtained from statistical simulations and by the measured minimum V ddL across ten chip samples. The changes in the variability of the parameters shown in Figs. 6, 7 and 8 for different V ddL s can be explained as follows:
(I) The propagation delay depends on the I DS of the devices operating in the subthreshold region (i.e., M1, M2 and M11-M16), where the drain current depends exponentially on the gate-source voltage, $I_{DS} \propto \exp\left((V_{GS} - V_{th})/(nV_t)\right)$ [14], where n is the inverse slope of the subthreshold current and V t is the thermal voltage. Therefore, the on-current variability can be expressed as $\sigma_{I_{on}}/\mu_{I_{on}} = \left(e^{(\sigma_{V_{th}}/(nV_t))^2} - 1\right)^{0.5}$ [15]. Moreover, the slope factor (n) decreases as V ddL is reduced. Consequently, the delay variability of the circuit increases as V ddL is lowered.
(II) At a constant operating frequency of 1 MHz, when the dynamic power is dominant, the total power consumption has a quadratic dependence on V ddL (i.e., $\alpha C f V_{ddL}^2$), and hence its variability is relatively independent of V ddL. However, the short-circuit component of the power consumption depends on the rise/fall time of the input inverter and increases significantly at 100 mV and 50 mV (Fig. 5(b)). The variation in the rise/fall time of the input inverter also increases as V ddL is lowered. Consequently, the σ/μ of the total power is notably larger at 50 and 100 mV, as shown in Fig. 7.
(III) As can be observed from Fig. 8, there is no major difference in the variability of the static power for V ddL s of 100 mV, 150 mV and 200 mV. This is due to the fact that during the leakage current measurements the gate-source voltage of the devices is 0 V. A constant gate-source voltage, in turn, results in a constant subthreshold swing factor (n), and hence the off-current variability, $\sigma_{I_{off}}/\mu_{I_{off}} = \left(e^{(\sigma_{V_{th}}/(nV_t))^2} - 1\right)^{0.5}$, does not change over V ddL. Nevertheless, for V ddL = 50 mV, the deviation of the V OL of the inverter from its ideal value of 0 V leads to a rise in the static power of the LS (Fig. 5(c)). The variations of V OL then increase the variability of the static power for extremely low V ddL s.
Table II compares this brief with several state-of-the-art designs. Apart from the minimum V ddL, the reported characteristics of the proposed level shifter in Table II are post-layout simulation results at the TT process corner and room temperature. The minimum convertible voltage of the proposed level shifter and its energy per transition (at 1 MHz) are the lowest among the circuits in Table II. A variant of the LS in [20] achieves an extremely low energy per transition of 4.2 fJ at the cost of a higher minimum V ddL (120 mV). The designed LS has the second shortest delay after [19] in Table II, while at 0.2 V it has the shortest reported delay among the listed LSs. The relatively high leakage of the designed LS was not unexpected considering that it has the shortest channel lengths among the designs in Table II and uses LVT input devices (M1, M2) with forward back-bias. From the simulation results shown in [21], it can be seen that even the static power of the LS in [17] (which has the lowest P static in Table II) can be in the range of a few nW in the same 28 nm FDSOI technology. This is the cost of being able to up-convert extremely low voltages to nominal voltages.
However, the static power consumption of the proposed LS can be reduced by adding sleep transistors as shown in [12]. The LS circuit in [19] has the smallest area among the circuits in Table II. In contrast, we avoided using minimum-sized devices in order to minimize variations, and the layout of the proposed LS is compliant with the layout of a custom cell library we developed for subthreshold applications. This results in an added height in the layout of the circuit.
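The variability arguments in items (I) to (III) of the discussion above can be checked numerically. The sketch below evaluates the log-normal current variability expression, taking the denominator as n times the thermal voltage V t, and shows that the σ/μ of the dynamic power αCfV² does not depend on V ddL when only the switched capacitance varies. All numerical values (σ of V th, slope factors, capacitance samples, activity factor) are hypothetical and are not taken from the simulations in this brief.

```python
import math
import statistics

K_B = 1.380649e-23      # Boltzmann constant in J/K
Q_E = 1.602176634e-19   # elementary charge in C


def thermal_voltage(temp_k=300.15):
    """Thermal voltage kT/q in volts (300.15 K corresponds to 27 degC)."""
    return K_B * temp_k / Q_E


def current_variability(sigma_vth, n, temp_k=300.15):
    """sigma/mu of a subthreshold current whose threshold voltage is
    normally distributed: sqrt(exp((sigma_Vth / (n * V_t))**2) - 1)."""
    x = sigma_vth / (n * thermal_voltage(temp_k))
    return math.sqrt(math.exp(x * x) - 1.0)


def dynamic_power(alpha, cap, freq, v_ddl):
    """Dynamic power alpha * C * f * V_ddL**2 in watts."""
    return alpha * cap * freq * v_ddl ** 2


def coefficient_of_variation(samples):
    """Population sigma/mu of a list of samples."""
    return statistics.pstdev(samples) / statistics.fmean(samples)


if __name__ == "__main__":
    # ASSUMPTION: sigma_Vth = 20 mV and the slope factors are illustrative.
    for n in (1.5, 1.3, 1.1):
        print(f"n = {n}: sigma/mu = {current_variability(0.020, n):.3f}")

    # ASSUMPTION: made-up capacitance samples standing in for mismatch runs.
    caps = [0.9e-15, 1.0e-15, 1.05e-15, 1.1e-15]
    for v_ddl in (0.10, 0.15, 0.20):
        powers = [dynamic_power(0.5, c, 1e6, v_ddl) for c in caps]
        print(f"V_ddL = {v_ddl} V: sigma/mu = {coefficient_of_variation(powers):.4f}")
```

A smaller slope factor gives a larger current variability for the same σ of V th, which matches the trend described for the delay, whereas the V ddL term cancels in the σ/μ of the dynamic power.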
V. CONCLUSION
A level shifter topology that uses top diode-connected devices together with current-limiter diodes was proposed. Moreover, SNW, poly-biasing and back-gate biasing techniques were used to achieve a balanced PUN and PDN at extremely low V ddL s. The proposed sub-100 mV to 1 V LS was fabricated in a 28 nm FDSOI technology. An average minimum convertible voltage level of 43.5 mV was achieved across 10 chip samples. The area of the LS was 16.6 or 8.6 μm² depending on the chosen layout strategy. At 0.2 V and 1 MHz, the energy per transition and the delay of the designed LS were 5.2 fJ and 10.1 ns, respectively.
