Abstract-
I. INTRODUCTION
Low-power and low-energy circuit designs using multiple supply voltages have been widely adopted not only in modern VLSI systems but also in emerging VLSI systems [1], [2]. A level shifter (LS) circuit is one of the most important circuits in such systems because it allows them to communicate correctly with peripheral circuits. We present here a low-power and low-energy LS circuit that converts an extremely low-voltage input into a high-voltage output.
Sub-threshold (sub-V TH) and near-threshold (near-V TH) circuits have been widely used to achieve low power and low energy dissipation [3]-[9]. These circuits are implemented together with peripheral circuits that operate at higher supply voltages. Because the supply voltage of sub-V TH and near-V TH circuits is close to or lower than the threshold voltage (V TH) of a MOS transistor (e.g., below 0.5 V) while those of the peripheral circuits are still high (e.g., 1.8 or 3.3 V), signal communication between these circuits becomes difficult when conventional LSs are used [10].
Emerging applications consisting of battery-less LSI systems have been reported [11]-[18]. They obtain the necessary energy from energy harvesters. However, because the output voltages of small harvesters are too low for VLSIs to operate (e.g., a single photovoltaic (PV) cell: about 0.5 V; a thermoelectric generator (TEG): several tens of millivolts), power management circuits, or voltage boost converters, are necessary to generate sufficiently high voltages. For energy harvesting systems to operate with high efficiency, they have to handle extremely low-voltage inputs together with the boosted voltages [15], [16], [19]. Because such systems are inherently multiple-supply systems, an LS capable of converting those low-voltage signals is strongly required.
Fig. 1 (a) depicts a conventional LS [20]. It consists of cross-coupled pMOS transistors and two nMOS transistors driven by the low-voltage inputs IN and INB. The LS operation fails when there is a large difference between the low supply voltage V DDL and the high supply voltage V DDH. This is because the drive currents of the nMOS transistors become significantly smaller than those of the pMOS transistors when V DDL becomes much lower than V DDH (e.g., V DDL < 0.5 V and V DDH = 1.8 or 3.3 V).
To realize robust communication between circuits with large supply voltage differences, several LSs and remedies have been reported [21]-[30]. Osaki et al. proposed an LS using a logic error correction circuit (LECC), as shown in Fig. 1 (b) [22], [23]. Because the LECC operates only when the input and output logic levels do not correspond, the power dissipation of the LS itself can be minimized. However, because the LS uses a two-stage amplifier, contention between the pull-up and pull-down MOS transistors (MP6 and MN8) occurs when the input changes from High to Low. In addition, because the LS does not have a latch structure, the logic level of the output is retained only by the leakage currents of MP6 and MN8, so the LS cannot ensure reliable retention. Furthermore, because the lack of a latch structure makes the slew rate of the output much worse, CMOS gates connected to the LS dissipate large amounts of energy. Hosseini et al. presented a modified LS, as shown in Fig. 1 (c). By combining a conventional cross-coupled LS (Fig. 1 (a)) and an LECC (Fig. 1 (b)), it achieves a short transition time and low power dissipation [30]. However, because the pull-down nMOS transistors (MN3, MN4) and the cross-coupled pMOS transistors (MP3, MP5) are still driven by V DDL and V DDH, respectively, the same problems as those of the conventional cross-coupled LS remain to be solved.
Fig. 1. (a) Conventional LS [20], (b) LS reported by Osaki et al. [23], and (c) LS reported by Hosseini et al. [30].
In light of this background, this paper presents a low-power LS capable of converting extremely low-voltage inputs into high-voltage outputs [31]. In contrast to Matsuzuka et al. [31], we fabricated a proof-of-concept chip in a 0.18-μm CMOS process to demonstrate the low-energy and low-voltage performance of our LS architecture. The proposed LS consists of a pre-amplifier (pre-AMP) stage with an LECC and an output latch stage. The pre-AMP amplifies input signals with extremely low current dissipation. The output latch stage then accepts the amplified voltages and converts them into full-swing output voltages. The chip's main features are a low minimum input voltage (80 mV converted into 1.8 V), low energy per transition (0.35 pJ at 0.4-V input and 1.8-V output), and low static power dissipation (0.12 nW).
This paper is organized as follows. Section II presents the operating principles of the proposed circuit. Section III describes simulation results of the circuit in a 0.18-μm CMOS process technology. Section IV shows measured results obtained from fabricated chips, in which an extremely low-voltage input of 80 mV was successfully converted into a high-voltage output of 1.8 V. Section V concludes the paper.
When the logic levels of IN and OUT are High and Low, respectively (see Fig. 2), the HLECC detects the logic error of the LS and generates the current I R. Because the LLECC accepts a Low signal of INB, it does not generate the current I F. The current I R is transferred to MN3, and the node voltage V R increases. Because V F is kept Low by MN6, complementary amplified signals V R and V F are generated (V R and V F are High and Low, respectively).
When the logic levels of IN and OUT are Low and High, respectively (i.e., INB and Q are both High in Fig. 2), the LLECC detects the logic error of the LS and generates the current I F. Because the HLECC accepts a Low signal of IN, it does not generate the current I R. The current I F is transferred to MN6, and the node voltage V F increases. Because V R is kept Low by MN3, complementary amplified signals V F and V R are generated (V R and V F are Low and High, respectively).
The latch stage accepts voltages of V R and V F and converts them into full-swing output voltages. Because the latch stage includes a positive feedback configuration, it enhances the transition speed and keeps output logic levels stable.
Note that, as discussed earlier, when IN and OUT correspond, neither the HLECC nor the LLECC generates the currents I R and I F for the pre-AMP. This means that one of the input voltages of the latch stage, V R or V F, becomes a floating node when IN and OUT correspond. For example, when IN is High, V R becomes floating (V F is Low). This raises a concern about operation stability due to the floating node. However, because the other input voltage of the latch is kept Low (i.e., at GND) by the low-voltage input signal through MN6 or MN3, the latch can determine the correct output logic of the circuit. If some unexpected noise ever changes the floating node to a level lower than GND, the latch stage toggles its internal logic incorrectly. However, the LECC then detects the logic error and supplies the operating current until IN and OUT once again correspond. Therefore, the proposed LS can correct logic errors caused by unexpected noise.
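The error-correction behavior described above can be summarized with a small behavioral model. The following Python sketch is only an illustration at the logic level, not a circuit simulation: the function names `lecc_step` and `settle` are hypothetical, node states are represented as strings, and the Low-input case of the matched state (V R Low, V F floating) is inferred by symmetry from the High-input example given in the text.

```python
from typing import Optional, Tuple


def lecc_step(in_high: bool, out_high: bool) -> Tuple[Optional[str], str, str]:
    """Return (active LECC block, V_R state, V_F state) for one IN/OUT pair."""
    if in_high and not out_high:
        # IN High, OUT Low: the HLECC supplies I_R, so V_R rises; V_F is held Low.
        return ("HLECC", "High", "Low")
    if out_high and not in_high:
        # IN Low, OUT High: the LLECC supplies I_F, so V_F rises; V_R is held Low.
        return ("LLECC", "Low", "High")
    # IN and OUT correspond: no LECC current flows. One latch input is held
    # Low by the input signal, and the other is left floating.
    if in_high:
        return (None, "floating", "Low")
    return (None, "Low", "floating")  # assumed by symmetry with the case above


def settle(in_high: bool, out_high: bool, max_steps: int = 4) -> bool:
    """Apply the latch update until IN and OUT correspond; return final OUT."""
    for _ in range(max_steps):
        active, v_r, _v_f = lecc_step(in_high, out_high)
        if active is None:
            break
        # The latch follows the amplified pair: V_R High sets OUT High,
        # V_F High (V_R Low) sets OUT Low.
        out_high = (v_r == "High")
    return out_high


if __name__ == "__main__":
    for in_high in (False, True):
        for out_high in (False, True):
            print(in_high, out_high, lecc_step(in_high, out_high),
                  settle(in_high, out_high))
```

In this model, any mismatch between IN and OUT, including one caused by noise toggling the latch, activates exactly one LECC block and is corrected in a single update, after which both blocks are idle again.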
B. Theoretical Delay Analysis
The delay of the proposed LS is determined by those of the pre-AMP, the latch stage, and the output inverter. Among them, the delay of the pre-AMP is the dominant factor because the pre-AMP is driven by V DDL. In the following, we therefore assume that the delay of the proposed LS is mainly determined by that of the pre-AMP. Fig. 3 shows a simplified schematic of the HLECC in the pre-AMP when the logic levels of IN and OUT are both Low, where C P1 and C P2 are the parasitic capacitances at the pMOS gate node and the output node V R, respectively. When IN changes from Low to High, I IN flows in MN1 and is given by
$$I_{IN} = I_0 \exp\left(\frac{V_{DDL} - V_{THN}}{\eta V_T}\right), \tag{1}$$
where
$$I_0 = \mu C_{OX} \frac{W}{L} (\eta - 1) V_T^2$$
is the pre-exponential factor of the subthreshold current, μ is the carrier mobility, C OX (= ε ox /t ox) is the gate-oxide capacitance, ε ox is the oxide permittivity, t ox is the oxide thickness, W/L is the aspect ratio, W is the channel width, L is the channel length, η is the subthreshold slope factor, V T (= k B T/q) is the thermal voltage, k B is the Boltzmann constant, T is the absolute temperature, q is the elementary charge, and V THN is the threshold voltage of the MOSFET [32]. The pMOS current mirror accepts I IN and generates the output current I R. The pMOS current mirror has a single-pole transfer function, which is given by
$$\frac{I_R(s)}{I_{IN}(s)} = \frac{g_{mp2}}{g_{mp1}} \cdot \frac{1}{1 + s\tau_p}, \tag{2}$$
where g mp1 and g mp2 are the transconductances of MP1 and MP2, respectively, and τ p is the time constant of the pMOS current mirror (τ p = C P1 /g mp1). From Eq. 2, the output current I R (t) for a step input can be derived as
$$I_R(t) = \frac{g_{mp2}}{g_{mp1}} I_{IN} \left(1 - e^{-t/\tau_p}\right). \tag{3}$$
Because I R is applied to C P2, I R is also expressed as
$$I_R(t) = C_{P2} \frac{dV_R(t)}{dt}. \tag{4}$$
From Eqs. 3 and 4, with V R (0) = 0, V R (t) is given by
$$V_R(t) = \frac{g_{mp2}}{g_{mp1}} \frac{I_{IN}}{C_{P2}} \left[t - \tau_p\left(1 - e^{-t/\tau_p}\right)\right]. \tag{5}$$
As shown in Eq. 5, the output voltage of the pre-AMP, V R, increases with time. When V R (t) reaches the threshold voltage of the latch at t = t 1, the latch stage toggles its internal logic, and thus we obtain the following equation:
$$V_{R,TH} = \frac{g_{mp2}}{g_{mp1}} \frac{I_{IN}}{C_{P2}} \left[t_1 - \tau_p\left(1 - e^{-t_1/\tau_p}\right)\right], \tag{6}$$
where V R,TH is the threshold voltage of the latch stage. Using Eq. 1, Eq. 6 can be rewritten as
$$t_1 = \frac{g_{mp1}}{g_{mp2}} \frac{C_{P2} V_{R,TH}}{I_0} \exp\left(-\frac{V_{DDL} - V_{THN}}{\eta V_T}\right) + \tau_p\left(1 - e^{-t_1/\tau_p}\right). \tag{7}$$
Although Eq. 7 cannot be solved for t 1 in closed form because t 1 appears on both sides, we find that the delay t 1 increases exponentially as V DDL decreases. More importantly, we also find that the time constant τ p should be designed to be as small as possible, because the last term of Eq. 7, $\tau_p(1 - e^{-t_1/\tau_p})$, monotonically increases as τ p increases. Therefore, the smaller τ p becomes, the faster V R increases.
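Because Eq. 7 is implicit in t 1, it is convenient to evaluate it numerically. The Python sketch below assumes Eq. 7 has the form t 1 = A + τ p (1 − exp(−t 1 /τ p)), where A lumps the first, τ p-independent term (proportional to C P2 V R,TH /I IN), and solves it by bisection. The helper names `solve_t1` and `pre_amp_delay` are hypothetical, and every default device value except the 0.44-V nMOS threshold voltage quoted in Section III is an illustrative assumption, not a value taken from the design.

```python
import math


def solve_t1(a: float, tau_p: float) -> float:
    """Solve t1 = a + tau_p * (1 - exp(-t1 / tau_p)) for t1 by bisection."""
    def g(t: float) -> float:
        # g is non-decreasing in t, with g(0) = -a < 0 and g(a + tau_p) >= 0.
        return t - tau_p * (1.0 - math.exp(-t / tau_p)) - a

    lo, hi = 0.0, a + tau_p
    for _ in range(100):  # 100 halvings reach double-precision resolution
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def pre_amp_delay(v_ddl: float, *, i0: float = 1e-7, v_thn: float = 0.44,
                  eta: float = 1.3, v_t: float = 0.0259,
                  c_p2: float = 10e-15, v_r_th: float = 0.5,
                  mirror_gain: float = 1.0, tau_p: float = 10e-9) -> float:
    """Estimate the pre-AMP delay t1 in seconds (all defaults are assumptions).

    mirror_gain stands for the ratio g_mp2 / g_mp1 of the pMOS current mirror.
    """
    # Subthreshold input current of MN1 for a gate drive of V_DDL.
    i_in = i0 * math.exp((v_ddl - v_thn) / (eta * v_t))
    # tau_p-independent part of the delay: charge C_P2 up to V_R,TH.
    a = c_p2 * v_r_th / (mirror_gain * i_in)
    return solve_t1(a, tau_p)


if __name__ == "__main__":
    for v in (0.2, 0.3, 0.4):
        print(f"V_DDL = {v:.1f} V -> t1 = {pre_amp_delay(v):.3e} s")
```

Under these assumptions, the computed delay grows as V DDL is lowered and shrinks as τ p is reduced, which matches the two qualitative conclusions drawn from Eq. 7.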
III. SIMULATION RESULTS
The proposed LS was simulated in a 0.18-μm standard CMOS technology. We designed our proposed LS using both 1.8-V and 3.3-V tolerant transistors (see Fig. 2). The threshold voltages of the 1.8-V tolerant nMOS and pMOS transistors are 0.44 and -0.46 V, respectively, and those of the 3.3-V tolerant nMOS and pMOS transistors are 0.72 and -0.72 V, respectively. Table I lists the transistor sizes of the proposed LS. For comparison, the LSs of [23], [29], [30] were also evaluated in the same technology.
The simulated waveforms are shown in Fig. 4. The V DDL, V DDH, and input pulse frequency f IN were set to 0.3 V, 1.8 V, and 100 kHz, respectively. An inverter was added as a load circuit. The IN, OUT, V R, and V F of the proposed LS are shown. The OUTs of the other LSs are also shown for comparison. When the IN changed from Low to High (Fig. 4 (a)), V R increased to around 0.8 V. Then the OUT of the proposed LS changed from Low to High with the highest slew rate and smallest delay. When the IN changed from High to Low (Fig. 4 (b)), V F increased to around 0.8 V. Then the OUT changed from High to Low. Although the proposed LS had a slightly longer delay time than the LSs of [23], [30] in this transition, it achieved the highest slew rate.
The simulated output delay and energy as a function of V DDL are shown in Figs. 5 and 6. The V DDH and f IN were set to 1.8 V and 10 kHz, respectively. An inverter was added as a load circuit. Fig. 5 shows the simulated delay. At higher V DDL (>0.5 V), the delay of the proposed LS was longer than those of the other LSs. This was because the proposed LS uses a two-stage architecture. The delays increased exponentially as V DDL decreased because the transistors operated in the sub-V TH region. However, at lower V DDL, the delay of the proposed LS became comparable to or even shorter than those of the other LSs. Fig. 6 shows the simulated energy. The energy per transition increased as V DDL decreased. However, the proposed LS achieved the lowest energy because the latch stage changed the output nodes quickly owing to the signals amplified by the pre-AMP. The proposed LS reduced the energy by 86% at V DDL = 0.3 V compared with that of [30].
We investigated the robustness of the circuit operation against process variation by performing 10k-run Monte Carlo statistical circuit simulations. The simulations assumed die-to-die (D2D) global variations and within-die (WID) random mismatch variations in all MOS transistors, using the parameters provided by the manufacturer. These parameters cover the slow-slow (SS) and fast-fast (FF) process corners by varying key parameters such as threshold voltage, gate-oxide thickness, channel length and width, and carrier mobility. The V DDL, V DDH, and input pulse frequency f IN were set to 0.3 V, 1.8 V, and 10 kHz, respectively. Fig. 7 shows the results. When the IN changed from Low to High as shown in Fig. 7 (a), the proposed LS successfully changed the OUT from Low to High in all cases. The minimum and maximum delay times were 13.7 ns and 1.79 μs, respectively. When the IN changed from High to Low as shown in Fig. 7 (b), the proposed LS successfully changed the OUT from High to Low in all cases. The minimum and maximum delay times were 18.0 ns and 4.23 μs, respectively. We compared the energy of the proposed LS with those of the other LSs [23], [29], [30]. The simulated distributions are shown in Fig. 8. The proposed LS achieved the lowest energy. Table II summarizes the yield and the energy per transition (mean: μ E, standard deviation: σ E, and coefficient of variation: σ E /μ E). The yield indicates whether the LS successfully converted the low-voltage IN into the high-voltage OUT. The proposed LS showed the lowest μ E, σ E, and σ E /μ E.
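The statistics reported in Table II can be computed from Monte Carlo samples as in the following Python sketch. It is a minimal illustration only: the function names and the sample values in the usage section are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not state which estimator was used.

```python
import statistics
from typing import Sequence, Tuple


def energy_stats(energies: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean, standard deviation, coefficient of variation)."""
    mu = statistics.fmean(energies)
    sigma = statistics.stdev(energies)  # sample standard deviation (assumed)
    return mu, sigma, sigma / mu


def conversion_yield(successes: Sequence[bool]) -> float:
    """Return the fraction of runs in which the level conversion succeeded."""
    return sum(successes) / len(successes)


if __name__ == "__main__":
    # Hypothetical energy-per-transition samples in pJ, for illustration only.
    samples = [0.21, 0.25, 0.23, 0.27, 0.24]
    mu_e, sigma_e, cv = energy_stats(samples)
    print(f"mu_E = {mu_e:.3f} pJ, sigma_E = {sigma_e:.3f} pJ, CV = {cv:.3f}")
    print(f"yield = {conversion_yield([True] * 5):.0%}")
```

The coefficient of variation normalizes the spread by the mean, which allows LSs with very different absolute energies to be compared on the same scale.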
The physical design of the proposed LS is shown in Fig. 9. The LS occupied an area of 95.6 μm². We evaluated the circuit performance by extracting parasitic elements such as resistances and capacitances from the post-layout data. The V DDL, V DDH, and f IN were set to 0.4 V, 1.8 V, and 100 kHz, respectively. The energy per transition, the static power dissipation, and the delay time were 0.24 pJ, 0.15 nW, and 21.4 ns, respectively. The simulated waveforms at V DDL = 60 mV, V DDH = 1.8 V, and f IN = 10 Hz are shown in Fig. 10. The proposed LS successfully converted the extremely low-voltage signal of 60 mV into the high-voltage signal of 1.8 V.
IV. EXPERIMENTAL RESULTS
We fabricated a proof-of-concept chip of the proposed LS using a 0.18-μm, 1-poly, 6-metal CMOS technology. Fig. 11 shows a micrograph of our chip and a partial enlarged view of the proposed LS. The area was 95.6 μm². Ten sample chips were measured.
The measured input and output waveforms of the proposed LS are shown in Fig. 12. The results at V DDL = 60 mV, V DDH = 1.8 V, and f IN = 10 Hz are shown in Fig. 12 (a). The proposed LS converted an extremely low-voltage input of 60 mV into a full-swing output, although not all chips operated correctly: five out of the 10 chips were able to convert the 60-mV input into a full-swing output. Fig. 13 shows the number of chips, out of 10, that successfully converted a low-voltage input into a 1.8-V output as a function of V DDL. All chips successfully converted an 80-mV input into a 1.8-V full-swing output. Thus, we defined the minimum V DDL as 80 mV. Fig. 12 (b) shows the corresponding results. Fig. 12 (c) shows the results at V DDL = 160 mV, V DDH = 3.3 V, and f IN = 100 Hz. All chips were able to convert the input into a 3.3-V output when V DDL was higher than 160 mV. The minimum V DDL of 160 mV was larger than that at V DDH = 1.8 V. This means that the proposed LS still exhibits a dependence on the supply voltage. Nevertheless, we confirmed that the proposed LS can convert an extremely low-voltage input into a high-voltage output.
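The criterion used above to define the minimum V DDL (the lowest input voltage at which all measured chips convert successfully) can be expressed as a short Python sketch. The function name `min_vddl_all_pass` is hypothetical; the two data points in the usage section (5 of 10 chips at 60 mV and 10 of 10 chips at 80 mV) are the only values taken from the text, and the additional requirement that all higher measured voltages also pass is an assumption made here for robustness.

```python
from typing import Dict, Optional


def min_vddl_all_pass(pass_counts: Dict[int, int], n_chips: int) -> Optional[int]:
    """Return the lowest V_DDL (in mV) at which every chip passes.

    pass_counts maps a measured V_DDL in mV to the number of passing chips.
    Voltages are scanned from high to low, and the scan stops at the first
    voltage where at least one chip fails.
    """
    v_min = None
    for v in sorted(pass_counts, reverse=True):
        if pass_counts[v] == n_chips:
            v_min = v
        else:
            break
    return v_min


if __name__ == "__main__":
    measured = {60: 5, 80: 10}  # chip counts reported in the text
    print(min_vddl_all_pass(measured, n_chips=10))
```

With the two reported data points, this criterion returns 80 mV, in agreement with the minimum V DDL defined in the text.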
The maximum operating frequency as a function of V DDL and V DDH is shown in Fig. 14. As V DDL and V DDH increased, the maximum frequency increased. Fig. 15 shows the results at V DDH = 1.8 V as a function of V DDL (10 samples). The maximum operating frequency depended exponentially on V DDL because the delays of the LSs depend exponentially on V DDL in the sub-V TH region. Although the LS can operate below 0.1 V, the maximum frequencies decreased drastically. The maximum frequencies of our 10 samples varied widely. This was because our proposed LS has a two-stage structure consisting of the pre-AMP and the latch stage and is therefore susceptible to delay variation. Fig. 16 shows the results at V DDL = 0.2 V as a function of V DDH. Although the LS can operate below V DDH = 0.5 V, the maximum operating frequency decreased drastically. This is because the delays of the LSs depend exponentially on V DDH in the sub-V TH region.
Fig. 17 plots the measured energy as a function of V DDL and V DDH at f IN = 10 kHz. The energy of the output inverter was included in the measurement. As V DDL decreased and V DDH increased, the energy increased. Fig. 18 shows the results at V DDH = 1.8 V as a function of V DDL. In the sub-V TH region, the energy increased as V DDL decreased because of the leakage current. This tendency is the same as in the results of the conventional LSs [27], [29] and in the simulation results of the proposed LS. Fig. 19 shows the results at V DDL = 0.2 V as a function of V DDH.
For a fair comparison, a conventional LS [30] was also fabricated with the same technology and sizing. In addition, although we used dual V T transistors in this work (i.e., 1.8-V and 3.3-V tolerant transistors), the proposed LS can be composed of only 1.8-V tolerant transistors. In that version, all 3.3-V transistors in Fig. 2 were replaced with 1.8-V tolerant transistors of the same technology and sizing. Fig. 20 shows the measured results.
The maximum operating frequency as a function of V DDL (V DDH = 1.8 V) is shown in Fig. 20 (a). The minimum input voltage that the proposed LS was able to convert into 1.8 V was lower than that of the conventional LS of [30]. The maximum frequency of the LS using only 1.8-V transistors was comparable to that of the proposed LS. This was because the response time of the LS is mainly determined by the pre-AMP, and the 1.8-V transistors MN1 and MN4 are the same in both cases. Fig. 20 (b) shows the measured energy at V DDH = 1.8 V and f IN = 10 kHz as a function of V DDL. The proposed LS achieved lower energy than the conventional LS of [30] because the latch stage changed the output quickly owing to the signals amplified by the pre-AMP. The energy of the LS using only 1.8-V transistors was larger than that of the proposed LS. This was because the leakage current of a 3.3-V transistor is less than that of a 1.8-V transistor. However, because this version can be composed of a single transistor type, it is beneficial for cost saving.
Table III summarizes the circuit performance of the proposed LS and others [23]-[30] for comparison. Because advanced technologies are better suited to low-voltage operation, we compare our circuit with state-of-the-art designs that use the same or more advanced technologies, except for [23]. The proposed LS converted the lowest input voltage of 80 mV. The energy of the proposed LS was 0.35 pJ at V DDL = 0.4 V and f IN = 10 kHz. The proposed LS also had the lowest static power dissipation without an input pulse (0.12 nW).
In this design, we used a 0.18-μm CMOS technology for its low-leakage characteristics and high process stability to demonstrate the concept of our proposed LS design. We consider that the proposed LS architecture can also be used in advanced CMOS technology nodes. This is because the functionality of our proposed LS is mainly determined by that of the low-voltage digital circuits themselves, as discussed in [23]. If the low-voltage digital circuits can operate correctly, the proposed LS can also operate successfully. If we choose an advanced technology, the leakage current will increase because of the lower threshold voltage. In such cases, we have to consider using high-threshold-voltage (HVT) transistors to reduce the leakage power.
In 2016, he joined Ricoh Electronic Devices Corporation, Ikeda, Japan. His current research interests are in low-voltage and low-power CMOS analog circuits.
Kyohei Shinonaga received the B.E. degree in electrical and electronic engineering from Kobe University, Kobe, Japan, in 2014, where he is currently pursuing the M.E. degree in electrical and electronic engineering.
His current research interests are in low-voltage and low-power CMOS digital circuits. 
