A new solution for an ultra-low-voltage, ultra-low-power operational transconductance amplifier (OTA) is presented in the paper. The design exploits a three-stage structure with a Reversed Miller Compensation Scheme, where the input stage is based on a non-tailed bulk-driven differential pair. Optimization of the structure for very low supply voltage is discussed. The resulting amplifier outperforms other ultra-low-voltage OTAs in terms of DC voltage gain and power efficiency, expressed by standard figures of merit. Experimental verification using a 0.18 µm CMOS technology, with a supply voltage of 0.3 V, showed a power dissipation of 13 nW, a DC voltage gain of 98 dB, a gain-bandwidth product of 3.1 kHz and an average slew-rate of 9.1 V/ms at 30 pF load capacitance. The experimental results agree well with simulations.
I. INTRODUCTION
The increasing demand for ultra-low-power electronic systems entails an increasing interest in the design of analog and mixed-signal circuits powered with very low supply voltages, often much lower than 0.5 V. These new designs include operational amplifiers [1]-[8], linear transconductors [9], [10], differential-difference amplifiers [11], hysteretic comparators [12], and many other building blocks. In order to ensure sufficient voltage swing at such extreme supply conditions, a bulk-driven (BD) technique [13], [14] is often considered. Even though BD transistors suffer from a reduced transconductance, which entails larger input noise and offset, this technique remains attractive when the supply voltage (V_DD) is comparable with the threshold voltage (V_TH) of the MOS transistors used and a large input common-mode range (ICMR) is required at the same time.
One of the most important analog building blocks in integrated systems is the operational transconductance amplifier (OTA). In recent years a number of deep sub 0.5-V OTAs have been shown in the literature [2], [4], [6]-[8]. It is worth mentioning that the presented OTAs usually show a moderate value of the DC open-loop voltage gain. In most cases this is associated with the use of low-V_TH CMOS processes, which entail reduced intrinsic voltage gains of MOS transistors as well as smaller g_mb/g_m ratios [2]. In order to overcome this issue, specific layout design techniques combined with a partial positive feedback [2], three-stage structures [4], [8] or other gain-boosting techniques can be used [6]. Nevertheless, despite special design approaches, the achieved voltage gains in most cases are rather moderate, ranging from 43 dB [4] to 70 dB [8]. Only in some cases can a slightly larger voltage gain, exceeding 70 dB, be achieved using partial positive feedback or auxiliary current-mode amplifiers [6], but at the cost of a reduced gain-bandwidth product (GBW) and slew-rate (SR).
The associate editor coordinating the review of this manuscript and approving it for publication was Cihun-Siyong Gong.
In this work a new solution for a high-gain (ca. 100 dB) deep sub 0.5-V OTA is presented. The design is based on a three-stage structure, where the input differential pair is realized using the idea of a non-tailed differential stage [15], which can operate from very low relative supply voltages (V_DD/V_TH) while offering improved DC voltage gain and SR. Consequently, an ultra-low-voltage (ULV) three-stage amplifier was designed using a standard 0.18 µm CMOS process with relatively large intrinsic voltage gains of MOS transistors. The achieved voltage gain of the considered structure approaches 100 dB while operating from V_DD as low as 0.3 V. Moreover, the achieved GBW and SR to supply current ratios are competitive with those achieved for other OTAs presented in the literature with similar V_DD. The three-stage amplifier was frequency compensated using an approach similar to that presented in [16] for a 1-V gate-driven amplifier. Nevertheless, the ULV design requires larger transistor sizes, which leads to larger parasitic capacitances affecting the circuit performance. Hence, the design constraints associated with a ULV environment, as well as the optimization of such a structure under sub-V_TH supply, are discussed in the paper.
The proposed circuit can serve as a general-purpose OTA in ULV amplifiers, active filters, analog-to-digital converters and other applications requiring high-gain OTAs.
The rest of the article is organized as follows. In section II a general description of the OTA is presented. In section III, the design constraints in a ULV environment are discussed. Experimental and simulation results are shown in section IV. Finally, the paper is concluded in section V.
II. CIRCUIT DESCRIPTION
The structure of the presented OTA is shown in Fig. 1. The circuit consists of three stages. Its simplified block diagram is depicted in Fig. 2, where g_mi, R_oi and C_oi are the i-th stage transconductance, resistance and equivalent output capacitance, respectively, while g_mFF stands for the transconductance of the feedforward path implemented by M_10. The overall structure can be considered as a Reversed Nested Miller Compensation (RNMC) topology [17]-[19]. RNMC structures are often implemented with a special nulling resistor; in this design, however, the nulling resistor was removed, since its value would be impractically large, in the range of tens of megaohms. The resistorless version of a 3-stage RNMC OTA, devoted to driving large capacitive loads, was previously discussed in [16].
The first stage of the presented structure (M_1-M_4) is based on a non-tailed bulk-driven differential amplifier [15]. Despite the non-tailed architecture, the circuit behaves as a truly differential amplifier, with good common-mode rejection (CMRR) and power supply rejection (PSRR) ratios. Compared to the BD differential pair biased with the same total current and the same sum of transistor channel areas, the circuit offers improved voltage gain (+6 dB), improved slew-rate and a lower minimum V_DD, while showing the same input-referred noise and offset [7], [15]. The use of such a circuit allows realizing amplifiers with very low V_DD (< 0.5 V) and rail-to-rail ICMR, while using CMOS processes with standard V_TH voltages of around 0.4-0.5 V, low leakage currents and relatively high intrinsic voltage gains of MOS transistors. All these features allow increasing the overall voltage gain of a ULV OTA.
The second stage of the amplifier is formed by the transistor M_6, loaded with the current source based on the transistor M_5. The output stage (M_7-M_10) [16], [20] operates in class AB, which increases its current driving capability and SR. The capacitances C_C1 and C_C2 are used for frequency compensation. Transistors M_11-M_13 form the biasing circuit.
A. SMALL-SIGNAL PERFORMANCE
The small-signal analysis of the structure in Fig. 2 has been performed in [16]. For simplicity it was assumed that C_C1,2, C_L ≫ C_oi and g_mi R_oi ≫ 1, i = 1...3 (C_L being the load capacitance of the OTA). With the above assumptions the circuit is described by a third-order transfer function with two zeros, one located in the left half-plane (LHP) and the other one in the right half-plane (RHP) [16]. The DC voltage gain of the circuit can be approximated as:
while the dominant pole p_1 and the gain-bandwidth product (ω_GBW) are respectively given by:
As was shown in [16], assuming g_mFF = g_m3, C_C2 < g_m2 g_m3 C_C1/g_m1^2 and C_L ≫ C_C1, the phase margin can be approximated as:
From (4), the compensation capacitor C_C1 can be calculated using the following formula [16]:
Assuming operation in the weak inversion region and neglecting second-order effects, the transconductance g_m1 in Fig. 2 can be expressed as:
where η = g_mb/g_m is the bulk-to-gate transconductance ratio at a given operating point for transistors M_1A,B, I_D1 is the quiescent drain current of these transistors, n_p is the subthreshold slope factor for a p-channel MOS transistor and V_T is the thermal potential. The transconductances g_m2 and g_m3 in Fig. 2 are given by:
Note that the feedforward transconductance g_mFF is equal to g_m3.
B. IMPACT OF STRAY CAPACITANCES
Ultra-low-voltage (sub-V_TH) design entails large transistor sizes to maintain both low |V_GS| voltage drops at the operating point and sufficiently high output resistances of transistors at reduced V_DS voltages. The large transistor sizes entail larger parasitic capacitances C_oi. Therefore, using the formulas developed for the ideal case could lead to large errors. The most critical is the impact of the parasitic capacitance C_o2, since the outputs of the first and the third stages are shunted with the large capacitances C_C1, C_C2 and C_L (in addition, in the case of the first stage the capacitances C_C1 and C_C2 are multiplied by the Miller effect). Non-ideal analysis with a non-zero value of C_o2 gives the same formulas for the tangent of the phase margin and for C_C1 as developed for the ideal case; however, in (4) and (5) the capacitance C_C2 should be replaced with C_C2' = C_C2 + C_o2. Note that C_o2 limits the minimum value of C_C2, which in turn limits the minimum value of C_C1 and consequently lowers the GBW product of the OTA.
C. SLEW-RATE PERFORMANCE
Slew-rate in multi-stage amplifiers is limited by the slowest stage. For the considered structure its value can be expressed by:
where I_oimax is the maximum output current of the i-th stage of the OTA. Assuming that the SR is not constrained by the |V_GS| voltage drops across the diode-connected transistors, in the considered design the negative slew-rate (SR−) is limited mainly by the first stage and can be estimated as:
where I_o1max is given by:
where V_inm is the amplitude of the input step and its maximum value is equal to V_DD. It is worth noting that the maximum value of I_o1max is achieved only in the first phase of the fall of the output signal. As the output voltage (in a voltage follower configuration) decreases, the output current of the first stage decreases as well, which slows down the falling edge. Thus, the real SR− can be lower than calculated from (10). The positive slew-rate (SR+) is limited mainly by the intermediate stage and can be estimated as:
As can be concluded from (10) and (12), in the general case one can expect asymmetrical large-signal behavior of the amplifier.
D. INPUT NOISE AND DYNAMIC RANGE
The input noise of the OTA is determined by its input stage. Given that M_1 is identical to M_2 (I_D1 = I_D2), which guarantees optimum noise performance [15], the input-referred thermal and flicker noise densities (v_t^2 and v_1/f^2, respectively) in the weak inversion region can be expressed as:
where K_fn, K_fp are the flicker noise constants for n- and p-channel transistors, respectively, and the other symbols have their usual meaning. Considering only the thermal noise (which is dominant in most cases due to the low biasing currents) and assuming the maximum amplitude of the input signal equal to V_DD/2, the biasing currents of the input stage (I_D1 = I_D2) for the assumed dynamic range (DR) in a voltage follower configuration can be calculated using the following equation [6]:
Thus, the required biasing current I_D1,2 is proportional to the GBW product of the amplifier and to the square of the DR. On the other hand, the current I_D1,2 is inversely proportional to the square of the parameter η, so maintaining the same DR and GBW using the BD approach entails worse power efficiency of the overall structure as compared with the gate-driven (GD) approach.
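As a rough illustration of this scaling (a sketch only: the gate-driven case is assumed here to correspond to η = 1 with every other parameter unchanged, and η = 0.36 is the value quoted for this design in section III), the current penalty of the bulk-driven input stage can be written as:

```latex
\frac{I_{D1,2}^{\mathrm{BD}}}{I_{D1,2}^{\mathrm{GD}}}
  \approx \frac{1}{\eta^{2}}
  = \frac{1}{0.36^{2}}
  \approx 7.7
```

Under these assumptions, the BD input stage needs roughly eight times the biasing current of a GD stage for the same DR and GBW.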
III. OTA DESIGN
Due to the ULV operation of the considered structure, where the input noise and DR are among the main concerns, the design started with calculating the required biasing currents of the input stage using (15). This also determined the value of the input transconductance g_m1 [see (6)]. Assuming DR = 60 dB (η = 0.36, n_p = 1.35), from (15) the current I_D1 was found to be 2.5 nA for V_DD = 0.3 V.
As was shown in [16], to achieve power optimization, the transconductance g_m3 should be equal to 2g_m1 + g_m2, while it is suggested to choose the g_m2/g_m1 ratio between 4 and 7.
Taking the above into account, from (6)-(8), we obtain:
Thus, with n_n/n_p ≈ 1 and assuming g_m2/g_m1 = 5.56, we set the value of I_D5,6 to 10 nA. Consequently, from (17) the value of I_D9,10 was calculated to be 13.6 nA, which was rounded up to 15 nA to make the I_D9,10/I_D1 ratio an integer.
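The arithmetic behind these bias currents can be checked with a short sketch. It assumes the usual weak-inversion form g_m = I_D/(n V_T) for the second and third stages and g_m1 = 2 η I_D1/(n_p V_T) for the non-tailed BD input stage; the factor 2 is an assumption, chosen because it reproduces the quoted ratio of 5.56 and the 13.6 nA result. The function names are hypothetical.

```python
# Sketch of the bias-current sizing; the g_m expressions are assumptions.
ETA = 0.36      # bulk-to-gate transconductance ratio (from the text)
I_D1 = 2.5e-9   # input-stage drain current in A (from the text)
I_D56 = 10e-9   # second-stage current in A (from the text)


def gm_ratio_21(i_d56, i_d1, eta, nn_over_np=1.0):
    """g_m2/g_m1, assuming g_m1 = 2*eta*I_D1/(n_p*V_T), g_m2 = I_D5,6/(n_n*V_T)."""
    return i_d56 / (2.0 * eta * i_d1 * nn_over_np)


def output_stage_current(i_d56, i_d1, eta):
    """I_D9,10 from g_m3 = 2*g_m1 + g_m2, assuming equal slope factors."""
    return 2.0 * (2.0 * eta * i_d1) + i_d56


ratio = gm_ratio_21(I_D56, I_D1, ETA)
i_d910 = output_stage_current(I_D56, I_D1, ETA)

print(f"g_m2/g_m1 = {ratio:.2f}")          # 5.56
print(f"I_D9,10   = {i_d910 * 1e9:.1f} nA")  # 13.6 nA
```

The thermal potential cancels in both expressions, so the result does not depend on the assumed temperature.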
For the given biasing currents of the OTA, the circuit was dimensioned during the simulation phase. The channel lengths of the transistors (L) were chosen relatively large to increase their output resistances, and hence their intrinsic voltage gains. Only for M_7 and M_9 were the channel lengths shorter, to decrease their C_gs capacitances and consequently increase the frequency of the parasitic pole p_p associated with the drain/gate node of M_7, which should be located well above the GBW product of the OTA. The current I_D7,8 was chosen to be I_D9,10/4. This value was also the result of a compromise between the total power dissipation and the frequency of p_p. For the assumed L, the channel widths (W) of all transistors were adjusted to achieve |V_GS| ≈ V_DD/2, which results in a maximum voltage headroom for possible changes of |V_GS| caused by process and temperature variations. It is also worth noting that the minimum channel areas of the transistors in the input stage are constrained by the required offset of the OTA and by flicker noise.
Once the circuit was dimensioned, the parasitic capacitance C_o2 was estimated to be 0.31 pF. The capacitance C_C2 was then assumed to be 0.6 pF, i.e. the capacitance C_o2 was around 1/3 of the sum C_C2 + C_o2, and the nonlinear capacitances C_bd5 + C_bd6 were around 10 % of this sum. Next, from (5), replacing C_C2 with C_C2' and assuming C_L = 20 pF and a phase margin of 68°, the capacitance C_C1 was calculated to be 2.4 pF. Thus, the anticipated GBW was 3.45 kHz.
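The capacitance bookkeeping and the anticipated GBW can be cross-checked numerically. The sketch below assumes GBW = g_m1/(2π C_C1) with g_m1 = 2 η I_D1/(n_p V_T), and a temperature of 300 K; neither the expressions nor the temperature are stated explicitly in the text, so the result is only expected to land near the quoted 3.45 kHz.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # elementary charge, C


def thermal_voltage(temp_k=300.0):
    """Thermal potential V_T = kT/q; 300 K is an assumed temperature."""
    return K_B * temp_k / Q_E


def gbw_hz(i_d1, c_c1, eta=0.36, n_p=1.35, temp_k=300.0):
    """GBW estimate, assuming g_m1 = 2*eta*I_D1/(n_p*V_T) and GBW = g_m1/(2*pi*C_C1)."""
    g_m1 = 2.0 * eta * i_d1 / (n_p * thermal_voltage(temp_k))
    return g_m1 / (2.0 * math.pi * c_c1)


C_O2 = 0.31e-12  # parasitic capacitance at the second-stage output, F
C_C2 = 0.6e-12   # inner compensation capacitor, F
C_C1 = 2.4e-12   # outer compensation capacitor, F

c_c2_eff = C_C2 + C_O2            # C_C2' used in place of C_C2
parasitic_share = C_O2 / c_c2_eff  # about one third of the sum
gbw = gbw_hz(2.5e-9, C_C1)

print(f"C_C2' = {c_c2_eff * 1e12:.2f} pF, C_o2 share = {parasitic_share:.2f}")
print(f"estimated GBW = {gbw / 1e3:.2f} kHz")
```

With these assumptions the estimate comes out at roughly 3.4 kHz, within a few percent of the anticipated value.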
IV. RESULTS AND COMPARISON
A. IMPLEMENTATION
The circuit has been implemented in a 0.18 µm CMOS process from TSMC, with threshold voltages of around ±0.5 V. The supply voltage was 0.3 V (±0.15 V during measurement) and the biasing current I_B was 2.5 nA (provided externally). The circuit performance was also tested for V_DD = 0.5 V and I_B = 40 nA. The transistor aspect ratios are shown in Table 1. The circuit was dimensioned as described in section III. The small-signal parameters of the design are summarized in Table 2, while Fig. 3 shows the microphotograph of the fabricated OTA.
B. EXPERIMENTAL RESULTS
Below, the measured performance of the test chip is presented. Since the parasitic load capacitance of our printed circuit board exceeded 20 pF, the test chip was measured for C_L = 30 pF. This capacitance was fine-tuned and measured with a precise RLC meter E498A from Agilent. Fig. 4 shows the open-loop frequency responses of the OTA, measured for two supply voltages (0.3 V and 0.5 V). The DC voltage gain was found to be 98.1 dB and 103.6 dB for the 0.3-V and 0.5-V versions, respectively. Figs. 5 and 6 show selected small-signal parameters (GBW, phase margin, A_vo) against the input common-mode voltage V_icm for V_DD = 0.3 V. The quiescent output level V_out was equal to V_icm. The measured A_vo remained larger than 85 dB for V_icm ranging from 50 mV to 250 mV, i.e. to within 50 mV of each supply rail. The measured variations of GBW and phase margin remained relatively small (±4.5 % for GBW) with V_icm ranging from 50 mV to 250 mV, which indicates a small variation of the input transconductance of the OTA. Due to the limited voltage swing and operation with V_BS < 0 in the whole input range, the relative variations of the input transconductance were lower than those observed for other BD input stages supplied with larger V_DD [21]. Fig. 7 shows the large-signal step responses of the OTA for an input step of V_DD − 50 mV peak-to-peak. The overshoots and oscillations of the responses are at an acceptable level. Both responses showed SR+ > SR−, which agrees with theoretical expectations (see (10) and (12)). In Fig. 8 the sine-wave responses for rail-to-rail input are presented, showing that the input and output common-mode ranges are both rail-to-rail. The Total Harmonic Distortion (THD) for the 0.3-V version was 0.49 % for V_in = 250 mV_pp and f = 10 Hz. Table 3 summarizes the simulated and measured performances of the OTA for two supply voltages. The circuit was simulated for C_L = 20 pF (value used in design) and C_L = 30 pF (used in measurements).
VOLUME 8, 2020
The simulated phase margin for C_L = 20 pF was 63.7°, quite close to the value predicted theoretically (68°). The small difference between the theoretical and simulated values may be attributed to second-order effects, mainly the impact of the parasitic pole p_p associated with M_7. Note that neglecting C_o2 in the design leads to a phase error exceeding 10°.
The measured values of A_vo, GBW and other parameters were in good agreement with simulations. Larger differences were observed only for CMRR and PSRR; however, both parameters remain at acceptable levels. The simulated DC voltage gains for V_DD = 0.5 V were larger than for V_DD = 0.3 V due to the larger V_DS voltages of the MOS transistors at the operating point, which entailed slightly larger output resistances of the devices.
The input-referred noise was only simulated, owing to the lack of a suitable noise meter. The corner frequency for flicker noise was relatively low (32 Hz) due to the low biasing currents (larger thermal noise) and large transistor channel areas. The total input noise integrated from 0.1 Hz to GBW was 109 µV and was dominated by thermal noise.
The measured input current of the OTA was lower than 100 pA for V_DD = 0.3 V and V_icm = 0 (i.e. −0.15 V during measurement, where a ±0.15 V supply was used). This current is that of the forward-biased bulk-source junctions of the input transistors M_1/M_2 and decreases exponentially with V_icm. Table 4 shows the results of Monte Carlo (MC) analysis for V_DD = 0.3 V. As can be concluded from the results, the design is robust against transistor mismatch.
In order to show the circuit sensitivity to process and temperature (P/T), the simulated results of corner and temperature analysis for V_DD = 0.3 V are shown in Table 5. The results of the simulations indicate that the circuit is also robust against P/T variations. Larger changes were observed only for SR, which may be attributed mainly to the variations of the |V_GS| voltages of transistors, which affected the maximum output currents of the amplifier gain stages in a ULV environment. Table 6 presents a comparison of the proposed OTA with other fabricated ultra-low-voltage OTAs (V_DD ≤ 0.7 V). As can be concluded, the proposed design offers the best DC voltage gain, much higher than achieved in other designs, and one of the lowest relative supply voltages (V_DD/V_TH), the same as achieved in the best designs reported so far [6], [7]. The noise properties of the compared circuits differ significantly, since the input noise depends strongly on the biasing currents (GBW product of the OTA) and on the type of the input stage (BD or GD). In general, the solutions based on the GD approach show lower input noise, at the cost of a lower ICMR. The design proposed here shows similar noise to other deep sub 0.5-V amplifiers with similar bandwidth and rail-to-rail input.
C. PERFORMANCE COMPARISON
In order to compare small-signal and large-signal power effectiveness of the OTAs in Table 6 , the following standard figures of merit (FOMs) have been adopted:
where SR_a = (SR+ + SR−)/2 is the average slew-rate and I_DD is the total supply current. In order to better compare the low-voltage capabilities of the OTAs in Table 6, two further FOMs have been used, which refer the small-signal bandwidth and SR to the total dissipated power P_diss rather than to I_DD:
As can be concluded from Table 6, the proposed circuit outperforms all other designs in terms of IFOM_L and FOM_L, and offers one of the highest values of IFOM_S and FOM_S. Only the design in [3] offers a better FOM_S, since it was optimized for a particular value of C_L. Nevertheless, the OTA in [3] does not provide a rail-to-rail input range.
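A minimal sketch of how such figures of merit can be evaluated from the measured 0.3-V data is given below. The definitions are assumptions inferred from the surrounding description and the quoted unit MHz · pF/mA: IFOM_S = GBW · C_L/I_DD, IFOM_L = SR_a · C_L/I_DD, and FOM_S, FOM_L with P_diss in place of I_DD. The supply current is taken as P_diss/V_DD, and the function name is hypothetical; the values are not claimed to match Table 6 digit for digit.

```python
def figures_of_merit(gbw_hz, sr_avg_v_per_s, c_load_f, v_dd, p_diss_w):
    """Assumed FOM definitions; I_DD is estimated as P_diss / V_DD."""
    i_dd_ma = p_diss_w / v_dd * 1e3       # total supply current in mA
    p_diss_mw = p_diss_w * 1e3            # dissipated power in mW
    gbw_mhz = gbw_hz * 1e-6               # gain-bandwidth product in MHz
    sr_v_per_us = sr_avg_v_per_s * 1e-6   # average slew-rate in V/us
    c_load_pf = c_load_f * 1e12           # load capacitance in pF
    return {
        "IFOM_S": gbw_mhz * c_load_pf / i_dd_ma,        # MHz*pF/mA
        "IFOM_L": sr_v_per_us * c_load_pf / i_dd_ma,    # V/us*pF/mA
        "FOM_S": gbw_mhz * c_load_pf / p_diss_mw,       # MHz*pF/mW
        "FOM_L": sr_v_per_us * c_load_pf / p_diss_mw,   # V/us*pF/mW
    }


# Measured 0.3-V values quoted in the abstract: 3.1 kHz, 9.1 V/ms, 30 pF, 13 nW.
foms = figures_of_merit(3.1e3, 9.1e3, 30e-12, 0.3, 13e-9)
for name, value in foms.items():
    print(f"{name}: {value:.0f}")
```

Referring the bandwidth and slew-rate to power rather than current rewards designs that reach the same speed at a lower V_DD, which is why both variants are reported.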
Compared with other similar OTAs published recently [6], [7], the proposed OTA offers a larger DC voltage gain and larger large-signal FOMs (FOM_L and IFOM_L). The most important advantages of the structure in [7] are its high CMRR and simple topology, while the circuit in [6] offers symmetrical SR. It is also worth noting that the 1-V OTA in [16], where a similar frequency compensation method was used, showed an IFOM_S of 20513 MHz · pF/mA at C_L = 200 pF. However, this IFOM_S was proportional to the square root of C_L. Therefore, for C_L = 30 pF one could expect an IFOM_S of around 8000 MHz · pF/mA. Nevertheless, the above OTA was realized with a larger V_DD and much smaller transistor sizes, which allowed decreasing C_C2 to 0.15 pF, thus improving the GBW product of the OTA. In the design discussed here, the GBW product was constrained by the parasitic capacitances of the transistors. The other factors affecting IFOM_S in this design were the larger biasing current of the BD input stage for the same transconductance and the additional power consumed by the biasing circuit, which was included in the total power dissipation. Nevertheless, the OTA described in this work showed the best performance among all sub 0.5-V OTAs in terms of the DC voltage gain and standard FOMs.
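The square-root scaling used for the 1-V OTA of [16] can be reproduced with a few lines; the function name is hypothetical and the only inputs are the figures quoted above.

```python
import math


def scale_ifom_sqrt(ifom_ref, c_load_ref_pf, c_load_new_pf):
    """Rescale a figure of merit that is proportional to sqrt(C_L)."""
    return ifom_ref * math.sqrt(c_load_new_pf / c_load_ref_pf)


# 20513 MHz*pF/mA at 200 pF, rescaled to the 30 pF load used in this work.
ifom_30pf = scale_ifom_sqrt(20513.0, 200.0, 30.0)
print(f"{ifom_30pf:.0f} MHz*pF/mA")
```

The result is just under 8000 MHz · pF/mA, consistent with the estimate in the text.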
V. CONCLUSION
The paper presents the design of a 0.3-V ultra-low-power OTA. The circuit is based on a three-stage structure with an RNMC compensation scheme and an input stage based on a non-tailed BD differential pair. Design restrictions under ULV supply conditions are discussed. Experimental verification showed superior performance in terms of standard FOMs and DC voltage gain, as compared with other similar OTAs with sub 0.5-V supply and rail-to-rail input common-mode range.
TOMASZ KULEJ received the M.Sc. and Ph.D. degrees (Hons.) from the Gdańsk University of Technology, Gdańsk, Poland, in 1990 and 1996, respectively. He was a Senior Design Analysis Engineer at the Polish Branch of Chipworks Inc., Ottawa, Canada. He is currently an Associate Professor with the Department of Electrical Engineering, Częstochowa University of Technology, Poland, where he conducts lectures on electronics fundamentals, analog circuits, and computer-aided design. He has authored or coauthored over 70 publications in peer-reviewed journals and conferences. He holds three patents. His current research interests include analog integrated circuits in CMOS technology, with emphasis on low-voltage and low-power solutions. He serves as an Associate Editor of Circuits, Systems, and Signal Processing and IET Circuits, Devices & Systems. He was also a Guest Editor of the special issues on Low Voltage Integrated Circuits in Circuits, Systems, and Signal Processing, in 2017, IET Circuits, Devices & Systems, in 2018, and Microelectronics Journal, in 2019.
