Performance enhancement techniques for operational amplifiers by Huang, Bin
Graduate Theses and Dissertations Iowa State University Capstones, Theses and Dissertations
2017
Recommended Citation
Huang, Bin, "Performance enhancement techniques for operational amplifiers" (2017). Graduate Theses and Dissertations. 17210.
https://lib.dr.iastate.edu/etd/17210
  
Performance enhancement techniques for operational 
amplifiers 
 
by 
 
Bin Huang 
 
A dissertation submitted to the graduate faculty 
in partial fulfillment of the requirements for the degree of 
DOCTOR OF PHILOSOPHY 
 
Major: Electrical Engineering  
Program of Study Committee: 
Degang Chen, Major Professor  
Randall Geiger 
Nathan Neihart 
Chris Chu 
Jiming Song 
 
The student author, whose presentation of the scholarship herein was approved by the 
program of study committee, is solely responsible for the content of this dissertation. The 
Graduate College will ensure this dissertation is globally accessible and will not permit 
alterations after a degree is conferred. 
 
Iowa State University 
Ames, Iowa 
2017 
Copyright © Bin Huang, 2017. All rights reserved.

DEDICATION

To my parents

TABLE OF CONTENTS 
 
 
LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGMENTS
ABSTRACT
CHAPTER 1. INTRODUCTION
1.1. Background
1.2. Dissertation Outline
1.3. References
CHAPTER 2. GAIN ENHANCEMENT FOR OPERATIONAL AMPLIFIERS
2.1. Introduction
2.2. Literature Review
2.2.1 General approaches for op amp DC gain enhancement
2.2.2 A state-of-the-art gain enhancement method via gds cancellation
2.3. Principles of Robust Gain Enhancement via Gds Cancellation
2.4. Concept of Proposed Gain Enhancement Method via Gds Cancellation
2.5. A SDC-based Gain Enhancement Technique
2.6. A FVA-based Gain Enhancement Technique
2.7. The SDC-based vs. the FVA-based Gain Enhancement Technique
2.8. A Current Mirror Input Op Amp with the FVA-based GE Technique [7]
2.8.1 Operating Principles
2.8.2 Sizing Strategies for DC Gain Boost
2.8.3 Stability of an Op Amp with a RHP Dominant Pole
2.8.4 Frequency Analysis
2.8.5 Noise Analysis
2.8.6 Simulation and Measurement Results
2.9. A Folded Cascode Amplifier with the FVA-based GE Technique [6]
2.10. Discussion
2.11. Summary
2.12. References
CHAPTER 3. SLEW RATE ENHANCEMENT FOR OPERATIONAL TRANSCONDUCTANCE AMPLIFIERS
3.1. Introduction
3.2. Literature Review
3.3. Desired Features of Slew Rate Enhancement Circuits
3.4. Proposed SRE Method via Excessive Transient Feedback
3.4.1 Concept of the slew rate enhancement via excessive transient feedback
3.4.2 Selections of sensing and driving nodes for a SRE circuit
3.5. Design Example with the Proposed SRE Technique
3.5.1 Small signal analysis
3.5.2 Large signal analysis
3.6. Simulation Results
3.7. Summary
3.8. References
CHAPTER 4. POWER EFFICIENCY ENHANCEMENT FOR OP AMPS DRIVING LARGE CAPACITIVE LOADS
4.1. Introduction
4.2. Literature Review
4.2.1 General review
4.2.2 State-of-the-art methods
4.3. Desired Features of Op Amp for Driving Large Capacitive Loads
4.4. Concept of the Proposed Power-Efficient Op Amp Design for Driving Large Capacitive Loads
4.5. Design Example
4.5.1 Design of the V-V preamp stage
4.5.2 Design of the entire op amp
4.6. Simulation Results
4.6.1 Typical corner simulation results
4.6.2 Process corner variation simulation results
4.6.3 Mismatch variation simulation results
4.6.4 Process corner plus mismatch variation simulation results
4.6.5 Post-layout simulation results
4.7. Performance Comparison of This Work with the Literature
4.8. Discussion
4.9. Summary
4.10. References
CHAPTER 5. CURRENT UTILIZATION EFFICIENCY ENHANCEMENT FOR FOLDED CASCODE AMPLIFIERS
5.1. Introduction
5.2. Literature Review
5.2.1 General review
5.2.2 A state-of-the-art FCA design for CUE enhancement
5.3. Proposed FCA Output Stage Design for Low Noise, Offset and Power
5.3.1 Desired features and conceptual design of a FCA output stage
5.3.2 Proposed FCA core amplifier design
5.3.3 Proposed FCA output stage design
5.4. Simulation Results for Proposed FCA vs. Conventional Fast FCA
5.4.1 Typical corner simulation results
5.4.2 Process corner and temperature variation simulation results
5.4.3 Mismatch variation simulation results
5.4.4 Process corner plus mismatch variation simulation results
5.5. Performance Comparison of This Work with the Literature
5.6. Discussion
5.7. Summary
5.8. References
CHAPTER 6. COMBINED PERFORMANCE ENHANCEMENT TECHNIQUES FOR FOLDED CASCODE AMPLIFIERS
6.1. Schematic Design
6.2. Frequency Response Analysis
6.3. Noise Analysis
6.4. Offset Voltage Analysis
6.5. Simulation Results
6.5.1 Typical corner simulation results
6.5.2 Process corner and temperature variation simulation results
6.5.3 Process corner plus mismatch variation simulation results
6.6. Performance comparison to the literature
6.7. Discussion
6.8. Summary
6.9. References
CHAPTER 7. CONCLUSION

LIST OF FIGURES
Figure 2.1: Gain enhancement via (a) cascoding transistors (b) cascading gain stages (c) regulated gain boost (d) conductance cancellation
Figure 2.2: Yan’s conductance cancellation method
Figure 2.3: Concept of the proposed gain enhancement method via gds cancellation
Figure 2.4: SDC-based gds cancellation a) negative gds generator b) small signal circuit of the circuit in (a) c) low gain amplifier AN
Figure 2.5: FVA-based gds cancellation a) negative gds generator b) small signal circuit of the circuit in (a) c) low gain amplifier AN2
Figure 2.6: Schematic of the designed op amps
Figure 2.7: A two-stage op amp with a RHP dominant pole in a negative feedback loop
Figure 2.8: Small signal block diagram of the proposed op amp
Figure 2.9: Noise model of the proposed op amp
Figure 2.10: Layout and microphotograph of the fabricated prototype op amp
Figure 2.11: Simulated DC gain vs. (a) P.T. variation (b) supply voltage (c) OSW
Figure 2.12: Op amps’ DC gain measurement (a) schematic (b) lab setup (c) measured DC gain vs. OSW
Figure 2.13: Op amps’ DC gain under P.Mis variation (a) proposed op amp (b) conventional op amp (c) gain enhancement
Figure 2.14: Post-layout simulated AC responses of the prop. and conv. op amps
Figure 2.15: Measured transient performance of the proposed op amp
Figure 2.16: Post-layout simulated noise performance of the two op amps
Figure 2.17: A fully differential FCA with the FVA-based technique
Figure 2.18: gD1 and gB1 under P.T variation a) gD1 b) gB1
Figure 2.19: gD and gB under P.T variation a) gD b) gB
Figure 2.20: DC gain of the proposed and conventional op amp
Figure 3.1: Conventional Class-A operational transconductance amplifier
Figure 3.2: An OTA with the adaptive biasing circuit [3]
Figure 3.3: Concept of the proposed SRE method
Figure 3.4: Different types of SRE methods
Figure 3.5: Designed one-stage OTA with the proposed SRE method
Figure 3.6: Small signal transient response of the three designed OTAs
Figure 3.7: Step responses of the three OTAs (a) output voltages (b) tail currents
Figure 5.1: Schematic of a conventional folded cascode amplifier (FCA)
Figure 5.2: Rudy’s FCA a) the FCA’s schematic b) floating battery in the FCA
Figure 5.3: Desired features of a FCA’s output stage
Figure 5.4: A conceptual design of a FCA output stage
Figure 5.5: A PMOS input FCA with differential-to-single-ended conversion on a) PMOS side b) NMOS side
Figure 5.6: Frequency responses of the conventional fast and slow FCA
Figure 5.7: Transient responses of the fast and slow FCA
Figure 5.8: Schematic of the proposed FCA with a new turn-around stage
Figure 5.9: Small signal block diagram of the proposed FCA
Figure 5.10: Poles and zeros distribution of the proposed FCA
Figure 5.11: Phase drop due to complex poles and zeros vs. k1 and k2
Figure 5.12: The proposed FCA’s PM vs. k1 and k2
Figure 5.13: Noise model for the proposed FCA
Figure 5.14: Frequency responses of the proposed and conventional FCAs
Figure 5.15: Noise performance of the proposed and conventional FCAs
Figure 5.16: Transient responses of the proposed and conventional FCAs
Figure 5.17: Frequency responses of the two FCAs a) proposed b) conventional
Figure 5.18: Noise performance of the prop. and conv. FCAs under P.T. variation
Figure 5.19: Transient responses of the prop. and conv. FCAs under P.T. variation
Figure 5.20: Transient responses of the prop. and conv. FCAs under mismatch variation
Figure 5.21: Transient responses of the prop. and conv. FCAs under P.Mis. variation
Figure 5.22: Average Ts_0.01% of the proposed FCA under P.Mis. variation
Figure 5.23: Average Ts_0.01% of the conventional FCA under P.Mis. variation
Figure 5.24: A circuit to reduce leakage current of M14 in the turn-around stage
Figure 6.1: Schematic of the proposed FCA with gain, slew rate and CUE enhancement
Figure 6.2: Schematic of the negative SRE circuit for the proposed FCA
Figure 6.3: Small signal block diagram of the proposed FCA
Figure 6.4: Distribution of the proposed FCA’s poles and zeros
Figure 6.5: Phase drop due to complex poles and zeros vs. k1 and k2
Figure 6.6: The FCA’s PM vs. k1 and k2
Figure 6.7: Noise model for the proposed op amp
Figure 6.8: Frequency responses of the proposed and conventional FCAs
Figure 6.9: Noise performance of the proposed and conventional FCAs
Figure 6.10: Transient responses of proposed and conventional FCAs
Figure 6.11: Frequency responses of the two FCAs a) proposed b) conventional
Figure 6.12: Noise performance of the prop. and conv. FCAs under P.T. variation
Figure 6.13: Transient responses of the prop. and conv. FCAs under P.T. variation
Figure 6.14: Transient responses of the prop. and conv. FCAs under P.Mis. variation
Figure 6.15: Average Ts_0.01% of the proposed FCA under P.Mis. variation
Figure 6.16: Average Ts_0.01% of the conventional FCA under P.Mis. variation

LIST OF TABLES 
Table 2.1: Expression of the conductance and capacitance in the proposed op amp
Table 2.2: Summary of measured performance of the two op amps
Table 2.3: Performance comparison to the literature
Table 2.4: Performance summary of the designed op amps
Table 3.1: Performance summary of the three designed OTAs
Table 4.1: Expressions of parasitic capacitance for the op amp’s input stage
Table 4.2: Performance summary of the designed op amp in the typical corner
Table 4.3: Process corner setups for the simulations of the designed op amp
Table 4.4: Performance summary of the designed op amp under process corner variation
Table 4.5: Performance summary of the designed op amp under mismatch variation
Table 4.6: Performance summary of the designed op amp under P.Mis variation
Table 4.7: Performance comparison of this work in schematic and post-layout view with recently reported amplifiers
Table 4.8: Performance comparison of this work with recently reported amplifiers
Table 5.1: Performance summary of the designed conventional slow and fast FCAs
Table 5.2: Expressions of the conductance and capacitance in the proposed FCA
Table 5.3: Performance summary of the proposed and conventional FCAs
Table 5.4: Simulation setup with process corner and temperature variation
Table 5.5: Performance summary of the prop. and conv. FCAs under P.T. variation
Table 5.6: Performance summary of the prop. and conv. FCA under mismatch variation
Table 5.7: Performance summary of the prop. and conv. FCA under P.Mis variation
Table 5.8: Performance comparison of the proposed FCA to the state-of-the-art method and the conventional FCA
Table 6.1: Expressions of the conductance and capacitance in the proposed FCA
Table 6.2: Performance summary of the prop. and conv. FCAs in typical corner
Table 6.3: Simulation setup with process corner and temperature variation
Table 6.4: Performance summary of the prop. and conv. FCA under P.T. variation
Table 6.5: Performance summary of the prop. and conv. FCA under P.Mis variation
Table 6.7: Performance comparison of the proposed FCA to the literature

 ACKNOWLEDGMENTS 
 
First and foremost, I would like to thank my parents for their encouragement, support and 
unselfish love. All the support and encouragement they have provided to me over the years was
my greatest gift. By the same token, I am thankful for the continuous support, trust and love 
of my wife, Manman Qian.  
My deepest appreciation goes to my advisor, Dr. Degang Chen. His guidance and advice 
have been invaluable to my research as well as my career. He has made my Ph.D. study a 
wonderful and unforgettable journey.  
I would also like to thank my committee members, in alphabetical order, Dr. Chris Chu, Dr. 
Randall Geiger, Dr. Nathan Neihart, and Dr. Jiming Song, for their advice and 
recommendations throughout my graduate program and for their service on my various
examination committees.
I would also like to express my thanks to my peers including Yongjie Jiang, Chih-Wei Chen 
and Bharath Vasan. The technical exchange of ideas with them was especially helpful and 
therefore appreciated.  
 
ABSTRACT 
Operational amplifiers (op amps) are among the most fundamental and widely used building
blocks for analog and mixed-signal circuits and systems. As transistor feature sizes scale
down in deep submicron processes, short channel effects, high leakage current and
reduced supply voltages make the design of op amps more challenging. In this dissertation, we
present several methods to improve op amps’ DC gain, slew rate, power efficiency and current
utilization efficiency (CUE).
A basic requirement for an op amp is high DC gain, especially for high-precision applications.
We introduce a method to robustly improve op amps’ DC gain with negligible power and area
overhead. The new DC gain enhancement method can be implemented based on the source
degeneration circuit (SDC) or the flipped voltage attenuator (FVA). Compared to the FVA-based
technique, the SDC-based technique is more suitable for CMOS processes whose
transistors’ threshold voltages are too low for the transistors in the FVA to work in weak or
strong inversion regions. Otherwise, the FVA-based technique is recommended, as it is
more robust to devices’ random mismatch. A prototype op amp with the FVA-based
technique is designed and fabricated in the IBM 130 nm process. The measurement and
simulation results of the prototype verify that the technique largely enhances an op amp’s DC
gain and is very robust over process, voltage and temperature variations.
Another important op amp requirement is high slew rate. In this regard, we introduce a
method that greatly improves an op amp’s slew rate while still preserving its small signal
performance by a well-defined turn-on condition. The performance of the introduced method
is discussed in comparison with an existing adaptive biasing method that has been widely used
to enhance slew rate. The introduced method excels in several aspects. First, unlike the adaptive
biasing method, which degrades an op amp’s linearity, the introduced method is able to enhance
linearity. Second, the proposed method improves an op amp’s slew rate by 2320% (vs. 780%
by the adaptive method) with power and area overheads of 2% and 1.2% (vs. 15% and 35%
by the adaptive method). In addition, the proposed method improves the op amp’s total
harmonic distortion (THD) by 6 dB, whereas the adaptive method degrades the THD by 12 dB.
The ability to drive large capacitive loads is becoming critical for op amps in emerging
applications such as liquid crystal display drivers. In this regard, we introduce a power-efficient
design of op amps that can drive large capacitive loads. The proposed method decouples the
large and small signal performance, eliminates current waste in the preamp stages’ load
circuits, and is not sensitive to devices’ random mismatches. Compared to the state-of-the-art
methods, our design prototype in a 180 nm CMOS process shows largely improved small and
large signal figures of merit, equivalent to largely improved power efficiency for given small
and large signal performance specifications.
The folded cascode amplifier (FCA) is a commonly used architecture for designing op amps, but
a significant portion of its supply current is wasted in the cascode stage. This not only reduces the
current utilization efficiency (CUE), defined as the ratio of an FCA’s tail current to its total
supply current, but also degrades the FCA’s gain, noise and offset. In this regard, we introduce
a method to dramatically reduce an FCA’s cascode stage current without degrading the FCA’s
settling performance. Compared to the existing methods, the proposed method effectively
improves not only the CUE but also the settling performance of op amps.
Lastly, a prototype FCA, with the proposed performance enhancement techniques of gain, 
slew rate and CUE, is designed to demonstrate the compatibility of these techniques. 
  
CHAPTER 1. INTRODUCTION
1.1. Background 
Operational amplifiers (op amps) are among the most fundamental and widely used building
blocks for analog and mixed-signal circuits and systems. Their applications are found in low
to high speed systems with large to small capacitive loads such as filters, data converters,
integrators, power management ICs and communication transmitters and receivers [1-5].
The design of analog circuits such as op amps is becoming more challenging in submicron
CMOS processes, mainly due to short channel effects, high leakage current and reduced
supply voltage. Short channel effects reduce a transistor’s intrinsic gain, which makes it
harder to design an effective high-gain op amp. High leakage current imposes an upper
limit on the achievable impedance at a node, which consequently limits the DC gain of an op
amp. A low supply voltage for an op amp limits the maximum achievable signal-to-noise ratio
(SNR) and slew rate. In addition to the design difficulties caused by submicron CMOS processes,
achieving desired op amp specifications such as low noise, large gain-bandwidth product
(GBW), fast slew rate (SR), short settling time, large capacitive load driving capability and
wide input common mode range often demands large power and area consumption.
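As a rough illustration of the quantities named above, the following first-order textbook relations may be helpful. They are a sketch only and are not taken from this dissertation: the symbols $g_m$, $g_{ds}$, $r_o$, $G_m$, $R_{out}$, $I_{tail}$ and $C_L$ are introduced here for illustration, and the slew rate expression assumes a conventional single-stage class-A amplifier whose load capacitor is charged by at most the tail current.

```latex
\begin{align}
  A_0    &= \frac{g_m}{g_{ds}} = g_m r_o
         && \text{(intrinsic gain of a single transistor)} \\
  A_{DC} &\approx G_m R_{out}
         && \text{(DC gain of a single-stage amplifier)} \\
  SR     &= \frac{I_{tail}}{C_L}
         && \text{(slew rate, class-A single-stage assumption)}
\end{align}
```

Under these assumptions, a larger output conductance $g_{ds}$ lowers $A_0$, and any mechanism that bounds the output resistance $R_{out}$, such as leakage, bounds $A_{DC}$ as well.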
This dissertation is concerned with op amp performance enhancement for DC gain, slew 
rate, power efficiency and current utilization efficiency (CUE). Several new techniques to 
improve an op amp’s DC gain, slew rate, power efficiency and CUE are introduced and 
discussed in this dissertation.  
1.2. Dissertation Outline 
In Chapter 2, high-precision applications that demand op amps with high DC gain
are presented first, followed by a literature review of state-of-the-art DC gain enhancement
(GE) techniques. Then the principles for robust GE techniques via conductance cancellation
are introduced. After that, two proposed GE techniques, based on a source
degeneration circuit (SDC) and a flipped voltage attenuator (FVA) respectively, are
introduced, analyzed and compared. Finally, several design examples incorporating the
FVA-based GE technique are presented and discussed, with simulation and measurement
results that confirm the robustness and effectiveness of the proposed GE method.
In Chapter 3, operational transconductance amplifier (OTA) applications whose settling time
is restricted by the OTAs’ slew rate are described. The literature on slew rate enhancement
(SRE) for OTAs is reviewed. Then a new SRE circuit is introduced, which preserves the
OTAs’ small signal performance, possesses a well-defined turn-on voltage,
dramatically improves the OTAs’ slew rate and slightly improves the OTAs’ linearity.
Next, the introduced SRE circuit is analyzed in both small and large signal operations. A design
example incorporating the SRE technique is presented with simulation results that confirm
the effectiveness of the proposed SRE method.
In Chapter 4, op amp applications that drive capacitive loads in the range of nF to µF are
reviewed first, followed by a literature review of state-of-the-art op amp designs for these
applications. Then, the desired features and conceptual design of the proposed power-efficient
op amps for these applications are introduced. After that, a design example incorporating the
proposed power-efficient design method is introduced and analyzed in detail. Comprehensive
simulation results of the design example are presented at the end of the chapter. The results
verify that the proposed design indeed has better figures of merit than the state-of-the-art
methods in both small and large signal operation.
In Chapter 5, op amp applications that need a wide or close-to-supply-rail input common mode 
range, low noise, low offset voltage, low power consumption and high gain are reviewed. For 
these applications, folded cascode amplifiers (FCAs) are a natural choice of structure. Then, 
two differential-to-single-ended conversion circuits for a conventional FCA are discussed and 
compared in terms of speed, and a literature review on the design of the FCA's cascode stage or 
turn-around stage is presented. Following that, a new turn-around stage for an FCA is introduced 
to dramatically reduce the current waste in the FCA and so improve its current utilization 
efficiency (CUE). Finally, a design example is presented with detailed analysis and extensive 
simulation results to verify the CUE improvement and to confirm that the proposed CUE 
enhancement technique does not introduce a long recovery time.  
In Chapter 6, an op amp that integrates the GE, SRE and CUE enhancement techniques 
introduced in Chapters 2, 3 and 5 is designed. The simulation results of the design example are 
presented and discussed to confirm the compatibility of the proposed performance 
enhancement techniques.   
1.3. References 
[1]. S. Koziel, R. Schaumann and H. Xiao, "Analysis and optimization of noise in 
continuous-time OTA-C filters," IEEE Transactions on Circuits and Systems I: 
Regular Papers, vol. 52, no. 6, pp. 1086-1094, June 2005. 
[2]. H. Ishii, K. Tanabe and T. Iida, "A 1.0 V 40mW 10b 100MS/s pipeline ADC in 90nm 
CMOS," Proceedings of the IEEE 2005 Custom Integrated Circuits Conference, 
San Jose, CA, 2005, pp. 395-398. 
[3]. J. Silva, U. Moon, J. Steensgaard and G. C. Temes, "Wideband low-distortion 
delta-sigma ADC topology," Electronics Letters, vol. 37, no. 12, pp. 1-2, 2001. 
[4]. Cheung Fai Lee and P. K. T. Mok, "A monolithic current-mode CMOS DC-DC 
converter with on-chip current-sensing technique," IEEE Journal of Solid-State 
Circuits, vol. 39, no. 1, pp. 3-14, Jan. 2004. 
[5]. A. A. Abidi, "Direct-conversion radio transceivers for digital communications," IEEE 
Journal of Solid-State Circuits, vol. 30, no. 12, pp. 1399-1410, Dec. 1995. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
CHAPTER 2. GAIN ENHANCEMENT FOR OPERATIONAL AMPLIFIERS
2.1. Introduction 
    Operational amplifiers (op amps) are fundamental analog building blocks for 
many analog and mixed signal systems. Realization of high gain op amps in standard digital 
CMOS is key to implementing a high precision system on a chip. However, as transistor feature 
sizes continuously scale down and supply voltages reduce, transistors' intrinsic gain becomes 
smaller, typically in the range of 20-30 dB in a deep submicron process. Cascoding two 
transistors in a stack can boost the DC gain to 40-60 dB, but this is still far from sufficient for 
high precision applications such as sigma-delta converters, switched capacitor circuits and 
optical sensor analog front ends to perform at their best. An efficient DC gain enhancement 
(GE) method is needed.  
2.2. Literature Review 
2.2.1 General approaches for op amp DC gain enhancement 
    In an effort to improve the DC gain of op amps in submicron processes, four methods, shown 
in Figure 2.1, have been reported in the literature [1-4]: a) cascoding multiple transistors in a 
stack; b) cascading multiple gain stages; c) gain boosting [1]; and d) conductance cancellation 
[2-4]. Method a) is simple but results in a loss of voltage headroom, especially when there are 
many transistors in a stack. Method b) requires complex frequency compensation, which 
seriously degrades an amplifier's frequency response and settling performance. Method c) 
usually introduces pole-zero doublets, which harm an amplifier's settling performance, in 
particular when high-accuracy settling is required. Method d) has so far seldom been used in 
commercial production because, with existing schemes such as [2-4], the yield of large DC gain 
enhancement over PVT (process, voltage and temperature) variation and output voltage swing 
is low. The main difficulty is that the negative conductance generated by [2-4] does not track 
and cancel its positive counterpart when the op amp's PVT condition and output voltage 
change. As a result, without extensive tuning, the methods of [2-4] can only provide large DC 
gain enhancement at a particular PVT condition and output voltage. Whenever the operating 
conditions or temperature of the op amps in [2-4] change, the op amps need to be calibrated 
again; otherwise the methods would fail to provide large DC gain enhancement and could even 
reduce the DC gain. Consequently, the op amps in [2-4] always need to be calibrated before 
normal operation, which makes them unsuitable for continuous-time operation. To maintain 
functionality, off-chip high-gain low-offset comparators are used in [2-3] as manual tuning 
circuits, and a micro-controller and a 16-bit DAC are used in [4] as automatic tuning circuits. 
However, because these complicated tuning circuits must operate extensively, the cost, power 
consumption and area overhead of [2-4] are high. 
 
Figure 2.1 Gain enhancement via (a) cascoding transistors, (b) cascading gain stages, (c) 
regulated gain boost and (d) conductance cancellation 
2.2.2 A state-of-the-art gain enhancement method via gds cancellation 
A half circuit of Yan's method [3] is shown in Figure 2.2. Because the drain current of M5 
is fixed and the bulk and source of M5 are connected, the gate voltage of M5 and M6 tracks the 
changes in the source voltage of M5 and M6. Consequently, the equivalent conductance seen 
looking down from the source of M6 is (1-A)*gds6, which becomes negative if A is larger than 
1. Assuming that the input impedance of gain block A is high, the DC gain of the amplifier in 
Figure 2.2 can be derived as (2-1).  
$$A_o = -\frac{g_{m1}}{g_{ds1} + g_{ds2} + g_{ds4} + g_{ds8} + (1 - A)\,g_{ds6}} \tag{2-1}$$
As can be seen from (2-1), in order to achieve large DC gain enhancement, (1-A)*gds6 must 
always approach and cancel gds1 + gds2 + gds4 + gds8, which is very challenging for two 
reasons. First, the conductance variations of PMOS and NMOS transistors differ across 
process corners. For example, the conductances of PMOS and NMOS transistors change in 
opposite directions when the process corner changes from TT (typical corner) to SF (NMOS 
slow, PMOS fast) or from TT to FS (NMOS fast, PMOS slow). Second, gds6, gds2 and gds4 
vary in a direction different from that of gds1 and gds8 when the output voltage changes. For 
instance, as the output voltage increases, the drain-source voltages of M1 and M8 increase while 
the drain-source voltages of M2, M4 and M6 decrease, assuming that A is larger than 1. As a 
result, gds1 and gds8 decrease while gds6, gds2 and gds4 increase when the output voltage 
increases; moreover, the amount of increase in gds6 differs from that in gds2 and gds4 because 
of the amplification A. Therefore, the negative conductance generated by this method cannot 
track the changes of the positive conductance under PVT variation and output voltage swing.  
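To make this sensitivity concrete, the following sketch evaluates (2-1) numerically. Only the formula is taken from (2-1); the function names and every conductance value are illustrative assumptions and are not taken from [3].

```python
import math

def dc_gain(gm1, g_pos, a, gds6):
    """DC gain from (2-1): Ao = -gm1 / (g_pos + (1 - A) * gds6).

    g_pos stands for gds1 + gds2 + gds4 + gds8 (the positive conductance).
    """
    return -gm1 / (g_pos + (1.0 - a) * gds6)

def to_db(gain):
    """Gain magnitude in dB."""
    return 20.0 * math.log10(abs(gain))

# Hypothetical values in siemens, chosen only for illustration.
GM1 = 1e-3
G_POS = 4e-5   # gds1 + gds2 + gds4 + gds8
GDS6 = 1e-5
A = 4.9        # gain block tuned so that (1 - A) * gds6 nearly cancels G_POS

baseline = dc_gain(GM1, G_POS, 1.0, GDS6)        # A = 1: no negative conductance
tuned = dc_gain(GM1, G_POS, A, GDS6)             # tuned at the nominal condition
drift_low = dc_gain(GM1, G_POS, A, 0.9 * GDS6)   # gds6 alone drifts by -10 %
drift_high = dc_gain(GM1, G_POS, A, 1.1 * GDS6)  # gds6 alone drifts by +10 %

for name, g in [("baseline", baseline), ("tuned", tuned),
                ("gds6 -10%", drift_low), ("gds6 +10%", drift_high)]:
    sign = "+" if g > 0 else "-"
    print(f"{name:10s}: {to_db(g):5.1f} dB, gain sign {sign}")
```

With these assumed numbers, a 10% drift of gds6 that is not followed by the positive conductance removes a large part of the tuned enhancement, and a drift in the other direction makes the net conductance negative, which is the tracking problem described above.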
 
Figure 2.2: Yan’s conductance cancellation method 
2.3. Principles of Robust Gain Enhancement via Gds Cancellation 
     To robustly enhance an amplifier's DC gain via the conductance cancellation method, 
the gds cancellation should satisfy the following three requirements.  
1) Type matching: only NMOS transistors can be used for NMOS gds cancellation and only 
PMOS transistors can be used for PMOS gds cancellation.  
2) Operation matching: the critical transistors in the gds cancellation circuits should have 
the same bias and operating conditions, that is, they should share the same gate, source, 
drain and bulk voltages and the same current density.  
3) Layout matching: the critical transistors may have different multipliers but should have 
the same width and length. They should be placed in a common centroid layout so that 
they experience the same temperature variation.  
2.4. Concept of Proposed Gain Enhancement Method via Gds Cancellation 
The concept of the proposed gain enhancement (GE) method is illustrated by Figure 2.3, in 
which the gds of the bottom NMOS transistor in an op amp’s cascode stage will be cancelled. 
As shown in Figure 2.3, the sensing and control block senses signals Vs+ and Vs- from the 
cascode stage first, where the Vs+ and Vs- are functions of the bottom NMOS’s gds. With the 
sensed signals, the block then generates Vfb to adjust the negative conductance, –gn, so that –
gn becomes a function of the bottom NMOS’s gds. The dependency of –gn on the bottom 
NMOS’s gds makes -gn inherently track and cancel the bottom NMOS’s gds over PVT 
variations. Similarly, the gds of the top PMOS transistor in the cascode stage can be robustly 
cancelled via this method by implementing the PMOS counterpart of the sensing and control 
block. As the NMOS and PMOS counterparts of the sensing and control block are independent, 
the output impedance of the cascode stage can be independently increased. When the gds of 
both top PMOS and bottom NMOS of the cascode stage are completely cancelled by the 
proposed method, the op amp’s DC gain will be ideally infinite. 
Regarding the implementation of the sensing and control block, two design approaches 
are introduced in the following sections. One approach is based on a source degeneration 
circuit (SDC), presented in Section 2.5, and the other on a flipped voltage attenuator (FVA), 
presented in Section 2.6.  
 
Figure 2.3: Concept of the proposed gain enhancement method via gds cancellation  
2.5. A SDC-based Gain Enhancement Technique  
Figure 2.4 shows a SDC-based gain enhancement circuit for an op amp. In Figure 2.4a), 
transistors M3 and M6 are respectively bottom and cascode NMOS transistors in the op amp’s 
cascode stage. The level shifter AN1, whose implementation is shown in Figure 2.4c), senses 
transistor M3’s drain voltage (VA) and then shifts up VA to voltage VB. The voltage VB is 
connected to the input of an SDC formed by transistors M4-M9.  In the SDC, the current mirror 
M7-M8 has mirror ratio of 1:1. Transistors M3, M4 and M9 have the same unit size (same 
width and length) with different multipliers: m, 1, and 1, respectively.  As M3, M4 and M9 are 
the same type of transistors and have the same current density, Vb4, VE and VF will be equal in 
the DC operation. The gain block of -1 can be easily implemented in fully differential circuits.   
The small signal circuit of the SDC is displayed in Figure 2.4 b). After writing KCL equations 
(2-2), (2-3) and (2-4) at nodes VC, VD and VE respectively, the DC gain from VB to VC is 
calculated as (2-5), where ε1 is given in (2-6) and η5 is gmb5/gm5 ≅ 0.15. ε1 is approximately 
equal to (gds4 + gds5)/gm5 ≈ gds5/gm5, as the cascode transistor M5 is usually sized with a 
much shorter length than M4. The magnitude of ε1 is on the order of 0.05 when a transistor's 
intrinsic gain is about 20.  
Figure 2.4: SDC-based gds cancellation: a) negative gds generator, b) small signal circuit of 
the circuit in (a), c) low gain amplifier AN1 
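$$g_{ds4}V_C - g_{m5}(V_B - V_C) + g_{ds5}(V_C - V_D) + g_{mb5}V_C = 0 \tag{2-2}$$
$$V_D(g_{m7} + g_{ds7}) + g_{ds4}V_C = 0 \tag{2-3}$$
$$g_{m8}V_D + g_{ds8}V_E + g_{m9}V_E + g_{ds9}V_E = 0 \tag{2-4}$$
$$\frac{V_C}{V_B} = \frac{g_{m5}(g_{m7} + g_{ds7})}{g_{ds4}g_{ds5} + (g_{m7} + g_{ds7})\big(g_{ds4} + g_{ds5} + g_{m5}(1 + \eta_5)\big)} = \frac{1}{(1 + \eta_5)(1 + \varepsilon_1)} \tag{2-5}$$
$$\varepsilon_1 = \frac{g_{ds5}(g_{ds7} + g_{m7}) + g_{ds4}(g_{ds5} + g_{ds7} + g_{m7})}{g_{m5}(g_{ds7} + g_{m7})} \cong 0.1 \tag{2-6}$$
$$A_{N1} = \frac{(g_{ds10} + g_{m10} + g_{mb10})(g_{ds11} + g_{ds12} + g_{m11})}{g_{ds11}g_{ds12} + (g_{ds10} + g_{m10})(g_{ds11} + g_{ds12} + g_{m11})} = (1 + \eta_{10})(1 + \varepsilon_2) \tag{2-7}$$
$$\varepsilon_2 = -\frac{g_{ds11}g_{ds12}g_{m10} + a}{b\,(g_{m10} + g_{mb10})} \cong -0.01 \tag{2-8}$$
$$\begin{aligned} a &= \big(g_{ds11}g_{ds12} + g_{ds10}(g_{ds11} + g_{ds12} + g_{m11})\big)\,g_{mb10} \\ b &= g_{ds11}g_{ds12} + (g_{ds10} + g_{m10})(g_{ds11} + g_{ds12} + g_{m11}) \end{aligned} \tag{2-9}$$
$$\frac{V_C}{V_A} = \frac{V_C}{V_B}\,A_{N1} = \frac{(1 + \eta_{10})(1 + \varepsilon_2)}{(1 + \eta_5)(1 + \varepsilon_1)} = \frac{1}{1 + \varepsilon_3} \cong 0.94 \tag{2-10}$$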
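$$\frac{V_F}{V_A} = -\frac{(1 + \eta_{10})(1 + \varepsilon_2)\,g_{ds4}g_{m5}g_{m8}}{(g_{ds8} + g_{ds9} + g_{m9})\,k} \tag{2-11}$$
$$k = g_{ds4}g_{ds5} + (g_{ds7} + g_{m7})(g_{ds5} + g_{ds4} + g_{m5} + g_{mb5}) \tag{2-12}$$
$$\frac{V_F}{V_A} = \frac{-(1 + \eta_{10})(1 + \varepsilon_2)\,g_{ds4}g_{m8}}{(1 + \eta_5)(1 + \varepsilon_4)\,g_{m7}g_{m9}} = \frac{-(1 + \varepsilon_5)\,g_{ds4}g_{m8}}{g_{m7}g_{m9}} \tag{2-13}$$
$$\varepsilon_4 = \frac{k\,(g_{ds8} + g_{ds9} + g_{m9})}{(g_{m5} + g_{mb5})\,g_{m7}g_{m9}} - 1 \approx 0.1 \tag{2-14}$$
$$\varepsilon_5 = \frac{(1 + \eta_{10})(1 + \varepsilon_2)}{(1 + \eta_5)(1 + \varepsilon_4)} - 1 \approx -0.1 \tag{2-15}$$
$$\begin{aligned} g_{N1} = -\frac{V_F}{V_A}\,g_{m3} &= \frac{(1 + \varepsilon_5)\,g_{m8}g_{ds4}g_{m3}}{g_{m7}g_{m9}} = (1 + \varepsilon_5)\,\frac{g_{m8}}{g_{m7}}\,\frac{g_{ds4}g_{m3}}{g_{ds3}g_{m4}}\,\frac{g_{m4}}{g_{m9}}\,g_{ds3} \\ &= (1 + \varepsilon_5)(1 + \varepsilon_6)(1 + \varepsilon_7)(1 + \varepsilon_8)\,g_{ds3} \end{aligned} \tag{2-16}$$
$$\varepsilon_6 = \frac{g_{m8}}{g_{m7}} - 1; \qquad \varepsilon_7 = \frac{g_{ds4}g_{m3}}{g_{ds3}g_{m4}} - 1; \qquad \varepsilon_8 = \frac{g_{m4}}{g_{m9}} - 1 \tag{2-17}$$
$$g_A = g_{ds3} - g_{N1} + \frac{g_{ds12}g_{ds11}(1 + \eta_{10})}{g_{m11} + g_{ds11} + g_{ds12}} \approx (\varepsilon_5 + \varepsilon_6 + \varepsilon_7 + \varepsilon_8)\,g_{ds3} \tag{2-18}$$
$$F_1 = \frac{g_A}{g_{ds3}} \approx \varepsilon_5 + \varepsilon_6 + \varepsilon_7 + \varepsilon_8 \tag{2-19}$$
$$\sigma_{F1}^2 \approx \sigma_{\varepsilon 5}^2 + \sigma_{\varepsilon 6}^2 + \sigma_{\varepsilon 7}^2 + \sigma_{\varepsilon 8}^2 \tag{2-20}$$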
In addition, the DC gain of the level shifter, AN1, is found as (2-7), where η10 is gmb10/gm10. 
ε2 is shown in (2-8) and is on the order of -0.01, where a and b are expressed as (2-9). Therefore, 
the DC gain from VA to VC is found as (2-10), where ε3 is approximately of the same order 
as ε1, that is, 0.05. As can be seen from (2-10), the voltage VC is approximately a replica of 
VA. As M3 and M4 are transistors of the same type with the same width, length, gate voltage, 
source voltage, drain voltage and current density, gds3/gm3 and gds4/gm4 should have the same 
variation over process, supply voltage and electrical operating point if mismatch effects are not 
considered. A common centroid layout for M3 and M4 is recommended to maximize 
the matching between gds3/gm3 and gds4/gm4 over temperature variations. Once the gain from VA 
to VC is calculated, the DC gain from VA to VF is derived as (2-11), where k is given in (2-12). 
Equation (2-11) is further simplified to (2-13), where ε4 and ε5 are given in (2-14) and (2-15) 
respectively. The values of ε4 and ε5 are on the order of 0.1 and -0.1, respectively. Thus, the 
generated negative conductance, -gN1, is derived as (2-16), where ε6, ε7 and ε8, as shown in 
(2-17), respectively represent the mismatch between M7 and M8, the mismatch between M3 
and M4, and the mismatch between M4 and M9. After combining the positive and the 
generated negative conductance, equation (2-18) gives the net conductance, gA, seen looking 
down from the source of M6. As can be seen from (2-18), the generated negative conductance 
is an intrinsic function of the positive conductance, gds3, which makes this proposed DC gain 
enhancement technique effective and robust over PVT variations.  
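The chain of expressions for the SDC can be evaluated numerically as a sanity check. The sketch below is illustrative only: every device value is a hypothetical assumption (unit devices with an intrinsic gain of 50, η10 = 0.15, ε2 = -0.01 and perfectly matched transistors), and only the formulas for k, VF/VA, ε4, ε5 and gN1 follow the derivation above.

```python
# Hypothetical small-signal values in siemens; not taken from the dissertation.
GM = 1e-3            # gm of the unit devices M4, M5, M7, M8, M9
GDS = 2e-5           # gds of the same devices (intrinsic gain of 50)
ETA5 = 0.15          # gmb5 / gm5, as quoted in the text
AN1 = 1.15 * 0.99    # (1 + eta10)(1 + eps2), assuming eta10 = 0.15 and eps2 = -0.01
M = 4                # multiplier of M3 relative to the unit device M4

gm3, gds3 = M * GM, M * GDS          # M3 is M parallel copies of M4
gds4 = gds5 = gds7 = gds8 = gds9 = GDS
gm5 = gm7 = gm8 = gm9 = GM
gmb5 = ETA5 * gm5

# k as defined in (2-12)
k = gds4 * gds5 + (gds7 + gm7) * (gds5 + gds4 + gm5 + gmb5)

# DC gain from VA to VF, (2-11)
vf_over_va = -AN1 * gds4 * gm5 * gm8 / ((gds8 + gds9 + gm9) * k)

# Error terms eps4 and eps5 used in (2-13)
eps4 = k * (gds8 + gds9 + gm9) / ((gm5 + gmb5) * gm7 * gm9) - 1.0
eps5 = AN1 / ((1.0 + ETA5) * (1.0 + eps4)) - 1.0

# Magnitude of the generated negative conductance, (2-16)
gn1 = -vf_over_va * gm3

print(f"eps4 = {eps4:.3f}, eps5 = {eps5:.3f}")
print(f"gN1 / gds3 = {gn1 / gds3:.3f}")
```

Under these assumptions ε4 and ε5 come out close to 0.1 and -0.1 in magnitude, and because the devices are perfectly matched (ε6 = ε7 = ε8 = 0) the ratio gN1/gds3 equals 1 + ε5.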
The term gds12gds11(1+η10)/gm11 in (2-18) can easily be designed to be at least 100 times 
smaller than gds3, for two reasons. First, the drain current of M12 should be designed to be 
much smaller (about 10 times) than that of M3, so that gds12 can be 10 times smaller than gds3. 
Second, the intrinsic gain of M11 can be designed to be in the neighborhood of 20. Thus, the 
variation of gds12gds11(1+η10)/(gm11+gds11+gds12) is negligible compared with the change in 
gds3 and gN1 in the presence of mismatch and PVT variations. The expected conductance 
reduction factor, F1, of this conductance cancellation method can be derived as (2-19). As 
shown in (2-19), F1 is closely related to the matching between the critical transistor pairs M7 
and M8, M4 and M9, and M3 and M4. The variance of F1 can be roughly calculated as (2-20), 
which may not be rigorously correct since ε5, ε6, ε7 and ε8 are not pairwise independent. 
Nevertheless, equation (2-20) still offers insight for designing this type of negative impedance 
generator with source degeneration circuits.  
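As a rough check of the factor of 100 quoted for the level-shifter term, the two design choices can be combined as follows. The value η10 ≈ 0.15 is an assumption borrowed from the value quoted for η5; the ratio of 10 between gds3 and gds12 and the intrinsic gain of 20 for M11 come from the paragraph above.

```latex
\frac{g_{ds12}\,g_{ds11}\,(1+\eta_{10})}{g_{m11}+g_{ds11}+g_{ds12}}
\;\approx\; g_{ds12}\,\frac{g_{ds11}}{g_{m11}}\,(1+\eta_{10})
\;\approx\; \frac{g_{ds3}}{10}\cdot\frac{1}{20}\cdot 1.15
\;\approx\; \frac{g_{ds3}}{174}
```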
2.6. A FVA-based Gain Enhancement Technique 
Figure 2.5a) shows a flipped voltage attenuator (FVA)-based gain enhancement circuit via 
gds cancellation. Transistors M3 and M2 are respectively the bottom NMOS and cascode 
NMOS transistors in a cascode stack. The drain voltage of M3 is sensed by the low gain 
amplifier AN2, the implementation of which is shown in Figure 2.5c). The output of amplifier 
AN2 is connected to the FVA, formed by M4~M7.  
 
Figure 2.5: FVA-based gds cancellation  a) negative gds generator b) small signal circuit of 
the circuit in (a) c) low gain amplifier AN2 
The small signal circuit of the flipped voltage attenuator is displayed in Figure 2.5b). By 
writing the KCL equations at nodes VF and VC, as shown in (2-21), the DC gain from VB to VC 
is derived as (2-22), where gx is gds6gds7/(gds6 + gds7 + gm6) and γ1 is given by (2-23). γ1 
is about 0.04 in this process. The DC gain of the low gain amplifier, AN2, is shown in (2-24), 
where γ2, expressed in (2-25), is about -0.01. Knowing AN2 (VB/VA) and VC/VB, the DC 
gain from VA to VC is found to be close to 1. This means that the voltage variation at the drain 
of M3 is approximately the same as that at the drain of M4. Therefore, when M3 and M4 are 
placed in a common centroid layout, the variations of gds3/gm3 and gds4/gm4 should track each other 
over PVT and output voltage swing variations because M3 and M4 are the same type of 
transistor with identical width, length, gate, source and drain voltages, and current density. 
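$$\begin{aligned} g_{m5}(V_B - V_C) - g_{mb5}V_C + g_{ds5}(V_F - V_C) + g_xV_F &= 0 \\ g_xV_F + g_{ds4}V_C + g_{m4}V_F &= 0 \end{aligned} \tag{2-21}$$
$$\frac{V_C}{V_B} = \frac{g_{m5}(g_{m4} + g_x)}{g_{ds4}(g_{ds5} + g_x) + (g_{m4} + g_x)(g_{ds5} + g_{m5} + g_{mb5})} = \frac{1}{(1 + \eta_5)(1 + \gamma_1)} \tag{2-22}$$
$$\gamma_1 = \frac{g_{ds4}(g_{ds5} + g_x) + g_{ds5}(g_{m4} + g_x)}{(g_x + g_{m4})(g_{m5} + g_{mb5})} \cong \frac{1}{(1 + \eta_5)A_{v5}} = 0.04 \tag{2-23}$$
$$A_{N2} = \frac{(g_{ds8} + g_{m8} + g_{mb8})(g_{ds9} + g_{ds10} + g_{m9})}{g_{ds9}g_{ds10} + (g_{ds8} + g_{m8})(g_{ds9} + g_{ds10} + g_{m9})} = (1 + \eta_8)(1 + \gamma_2) \tag{2-24}$$
$$\gamma_2 = -\frac{g_{ds9}g_{ds10}g_{m8} + c}{d\,(g_{m8} + g_{mb8})} \cong -\frac{\eta_8}{(1 + \eta_8)A_{v8}} \cong -0.007 \tag{2-25}$$
$$\begin{aligned} c &= \big[g_{ds9}g_{ds10} + g_{ds8}(g_{ds9} + g_{ds10} + g_{m9})\big]\,g_{mb8} \\ d &= g_{ds9}g_{ds10} + (g_{ds8} + g_{m8})(g_{ds9} + g_{ds10} + g_{m9}) \end{aligned} \tag{2-26}$$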
The DC gain from VA to VF is calculated as (2-27), where h, γ3 and γ4 are given in (2-28), 
(2-29) and (2-30). The values of γ3 and γ4 are close to 0.04 and -0.05, respectively. Using 
the expression for VF/VA, the generated negative conductance, −gN2, can be easily derived as 
(2-31), where γ5, given in (2-32), represents the transistor intrinsic gain mismatch 
between M3 and M4. After combining −gN2 with the positive conductance seen looking down 
from the source of M2, the net conductance, gA, is derived as (2-33). The expression can be 
simplified to gds3(γ4 + γ5) because gds10gds9(1+η8)/(gm9+gds9+gds10) is typically more than 
100 times smaller than gds3. As γ4 and γ5 are much smaller than 1, gA is much smaller than 
the original conductance. The conductance reduction factor brought by the technique is derived 
as (2-34). As can be seen, F2 is closely related to the matching between M3 and M4. The 
variance of F2 can be roughly derived as (2-35), which may not be rigorously correct since γ4 
and γ5 are not independent. However, this still provides design insight for the FVA-based gain 
enhancement technique. 
Though the analysis above provides design insights (for example, improving the matching 
between M3 and M4 is meaningful) for achieving the best gain enhancement, a simulation 
should be conducted to examine the gain enhancement. If gA is systematically positive in 
simulation, one can increase M8 size or decrease M5 size slightly to reduce the drain voltage 
of M4, which raises gds4/gm4 and γ5. If gA is systematically negative, one should reduce M8 
size or increase M5 size slightly. A design example with this technique will be discussed at 
length in Section 2.8.  
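$$\frac{V_F}{V_A} = -\frac{g_{ds4}g_{m5}A_{N2}}{h + g_{ds4}(g_{ds5} + g_x)} = -\frac{g_{ds4}(1 + \eta_8)(1 + \gamma_2)}{g_{m4}(1 + \eta_5)(1 + \gamma_3)} = -\frac{g_{ds4}(1 + \gamma_4)}{g_{m4}} \tag{2-27}$$
$$h = (g_x + g_{m4})(g_{ds5} + g_{m5} + g_{mb5}) \tag{2-28}$$
$$\gamma_3 = \frac{g_{ds4}g_{ds5} + g_{ds5}g_{m4} + g_x(g_{ds4} + g_{ds5} + g_{m5} + g_{mb5})}{g_{m4}(g_{m5} + g_{mb5})} \cong 0.04 \tag{2-29}$$
$$\gamma_4 = \frac{(1 + \eta_8)(1 + \gamma_2)}{(1 + \eta_5)(1 + \gamma_3)} - 1 \cong \frac{\eta_8 - \eta_5 - \eta_8/A_{v8} - 1/A_{v5}}{1 + \eta_5 + 1/A_{v5}} = -0.05 \tag{2-30}$$
$$g_{N2} = \frac{V_F\,g_{m3}}{-V_A} = \frac{g_{m3}g_{ds4}(1 + \gamma_4)}{g_{m4}} = (1 + \gamma_4)(1 + \gamma_5)\,g_{ds3} \tag{2-31}$$
$$\gamma_5 = \frac{g_{m3}g_{ds4}}{g_{m4}g_{ds3}} - 1 \tag{2-32}$$
$$g_A = g_{ds3} - g_{N2} + \frac{g_{ds10}g_{ds9}(1 + \eta_8)}{g_{m9} + g_{ds9} + g_{ds10}} \approx g_{ds3}(\gamma_4 + \gamma_5) \tag{2-33}$$
$$F_2 = \frac{g_A}{g_{ds3}} \approx \gamma_4 + \gamma_5 \tag{2-34}$$
$$\sigma_{F2}^2 \approx \sigma_{\gamma 4}^2 + \sigma_{\gamma 5}^2 \tag{2-35}$$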
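The first-order forms quoted in (2-23), (2-25) and (2-30) can be checked numerically. The sketch below is an illustration only: the function names are hypothetical, and the inputs (a body-effect ratio of 0.15 for both M5 and M8 and an intrinsic gain of 20 for both devices) are assumed values in line with those quoted earlier in the chapter.

```python
def gamma1(eta5, av5):
    """First-order form of (2-23): gamma1 ~ 1 / ((1 + eta5) * Av5)."""
    return 1.0 / ((1.0 + eta5) * av5)

def gamma2(eta8, av8):
    """First-order form of (2-25): gamma2 ~ -eta8 / ((1 + eta8) * Av8)."""
    return -eta8 / ((1.0 + eta8) * av8)

def gamma4(eta5, eta8, av5, av8):
    """First-order form of (2-30)."""
    return (eta8 - eta5 - eta8 / av8 - 1.0 / av5) / (1.0 + eta5 + 1.0 / av5)

# Assumed values: body-effect ratio 0.15 and intrinsic gain 20 for both devices.
ETA, AV = 0.15, 20.0
g1 = gamma1(ETA, AV)
g2 = gamma2(ETA, AV)
g4 = gamma4(ETA, ETA, AV, AV)
print(f"gamma1 = {g1:.3f}, gamma2 = {g2:.4f}, gamma4 = {g4:.3f}")
```

With these inputs the three terms land close to the values of 0.04, -0.007 and -0.05 quoted in the equations, which suggests those figures correspond to intrinsic gains of roughly 20.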
2.7. The SDC-based vs. the FVA-based Gain Enhancement Technique 
The discussions in Sections 2.5 and 2.6 together show that both the SDC-based and FVA-
based gain enhancement techniques obey the principles of robust gain enhancement via 
conductance cancellation. In this section, the two techniques are compared. 
Compared to the FVA-based technique, the SDC-based technique is more suitable for CMOS 
processes in which the transistors' threshold voltages are too low for the transistors to work in 
the weak or strong inversion regions in the FVA configuration. Otherwise, the FVA-based 
technique is recommended due to the following advantages. First, the FVA-based technique is 
simpler, more compact and more power efficient because it involves fewer transistors and 
circuit branches. Second, the FVA-based technique has fewer high frequency poles in the 
gain enhancement signal path. Third, the FVA-based technique is suitable for both fully 
differential and single ended op amps, whereas the SDC-based technique needs an additional 
gain block of -1 for single ended op amps. Last but not least, the FVA-based technique is 
more robust to random device mismatch, simply because the variance of its gds reduction 
factor, shown in equation (2-35), is smaller than that of the SDC-based technique in 
equation (2-20). 
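The mismatch argument can be illustrated by evaluating (2-20) and (2-35) side by side. This is a minimal sketch under strong assumptions: every error term is given the same hypothetical 2% standard deviation, and the terms are treated as independent, which the text notes is only approximately true.

```python
import math

def sigma_f(sigmas):
    """Root-sum-square of the term sigmas, following (2-20) and (2-35).

    Assumes the error terms are independent of each other.
    """
    return math.sqrt(sum(s * s for s in sigmas))

SIGMA_TERM = 0.02  # hypothetical 2 % standard deviation for every error term

sigma_f1 = sigma_f([SIGMA_TERM] * 4)  # SDC: eps5, eps6, eps7, eps8
sigma_f2 = sigma_f([SIGMA_TERM] * 2)  # FVA: gamma4, gamma5

print(f"sigma(F1) = {sigma_f1:.4f}, sigma(F2) = {sigma_f2:.4f}")
```

With equal term variances, the four-term SDC sum gives a standard deviation larger than the two-term FVA sum by a factor of the square root of two.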
In the prototype op amps designed in the IBM 130 nm CMOS process in the following sections, 
the FVA-based gain enhancement technique is implemented in favor of design simplicity, low 
power and low area consumption. 
2.8. A Current Mirror Input Op Amp with the FVA-based GE Technique [7] 
2.8.1 Operating Principles  
A current mirror input op amp with the proposed FVA-based gain enhancement technique 
is shown in Figure 2.6. The op amp core consists of a current mirror input stage, a cascode 
stage and a push-pull output stage. By reusing the wide swing cascode current mirrors in the 
op amp core, only six transistors (M7~M9 and M22~M24) are needed to implement the 
proposed gds cancellation circuits for both the NMOS and PMOS sides. Following the 
DC analysis of the FVA-based technique in Section 2.6, the equivalent conductances seen 
looking down from the source of M13 and looking up from the source of M15 can be found as 
(2-36). The expressions of δ1, δ2, δ3 and δ4 are given in (2-37) to (2-40). 
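$$g_A \approx g_{ds11}(\delta_1 + \delta_2); \qquad g_B \approx g_{ds17}(\delta_3 + \delta_4) \tag{2-36}$$
$$\delta_1 \cong \frac{\eta_7 - \eta_3 - \eta_7/A_{v7} - 1/A_{v3}}{1 + \eta_3 + 1/A_{v3}} \cong -0.05 \tag{2-37}$$
$$\delta_2 = \frac{g_{m11}g_{ds5}}{g_{m5}g_{ds11}} - 1 = \frac{A_{v11} - A_{v5}}{A_{v5}} \tag{2-38}$$
$$\delta_3 \cong \frac{\eta_{22} - \eta_{14} - \eta_{22}/A_{v22} - 1/A_{v14}}{1 + \eta_{14} + 1/A_{v14}} \cong -0.08 \tag{2-39}$$
$$\delta_4 = \frac{g_{m17}g_{ds16}}{g_{m16}g_{ds17}} - 1 = \frac{A_{v17} - A_{v16}}{A_{v16}} \tag{2-40}$$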
 
 
Figure 2.6. Schematic of the designed op amps 
2.8.2 Sizing Strategies for DC Gain Boost  
In terms of sizing, a good starting point is to set the transistors (M7 and M22) in the 
NMOS and PMOS gds cancellation circuits as small copies of the corresponding transistors in 
the op amp core. For example, M7 and M22 have the same width and length as M3 and M14 
but with fewer multipliers. M8~M9 and M23~M24 are just cascode current sources. With this 
starting point, the conductances gA and gB should be close to zero. The second step is to 
simulate gA and gB at the typical corner and check whether they are systematically positive or 
negative. If gA is systematically above zero, one should increase the size of M7 or decrease 
the size of M3 slightly to reduce the drain voltage of M5 so as to raise gds5/gm5 and reduce gA. 
Conversely, if gA is systematically below zero, the opposite sizing strategy should be used. 
The same sizing strategy can be applied to find the optimal gB as well. The third step is to 
determine the effect of transistor mismatch via Monte Carlo simulation. For example, according 
to (2-35), the mismatch between M5 and M11 and the mismatch between M16 and M17 should be within 
10% to obtain a DC gain enhancement of 20 dB. The last step is to simulate the robustness of 
the DC gain enhancement over PVT variations and to fine-tune the transistor sizes 
accordingly.  
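The link between a 10% residual and a 20 dB enhancement can be sketched as follows. The function name is hypothetical, and the sketch assumes that the DC gain is inversely proportional to the residual conductance and that the NMOS and PMOS sides are reduced by the same factor.

```python
import math

def gain_enhancement_db(reduction_factor):
    """DC gain enhancement in dB for a conductance reduction factor F = gA / gds.

    Assumes the DC gain scales as 1 / |F| and that both the NMOS and PMOS
    sides of the cascode stack see the same reduction factor.
    """
    return -20.0 * math.log10(abs(reduction_factor))

for f in (0.3, 0.1, 0.03):
    print(f"|F| = {f:4.2f} -> {gain_enhancement_db(f):4.1f} dB")
```

Under these assumptions, a residual conductance of one tenth of the original value corresponds to 20 dB of enhancement, and each further factor of ten adds another 20 dB.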
2.8.3 Stability of an Op Amp with a RHP Dominant Pole 
Under PVT variations, the negative conductance generated by the proposed method can 
be larger or smaller than the positive conductance to be cancelled. When the generated negative 
conductance is larger than the positive conductance, the output impedance of the first stage 
becomes negative and the dominant pole of the open loop op amp turns into a right half plane 
(RHP) pole. To understand the impact of an RHP pole on an op amp's stability, a generic 
two-stage op amp placed in the negative feedback loop shown in Figure 2.7 is discussed. The 
open loop transfer function of the two-stage op amp is given as (2-41), where GBW, P1 and P2 
are respectively the op amp's gain bandwidth product, dominant pole and secondary 
non-dominant pole. The closed loop transfer function of the configuration in Figure 2.7, H(s), is 
calculated as (2-42), where β is the feedback factor. According to (2-42), as long as β is larger 
than the reciprocal of the op amp's DC gain (P1/GBW), the closed loop system in Figure 2.7 
is stable. β is larger than P1/GBW in most practical applications. Therefore, a possible RHP 
dominant pole caused by overcompensation of the positive conductance should not change an 
op amp's stability in most closed loop applications. 
$$A(s) = \frac{P_2\,GBW}{(s - P_1)(s + P_2)} \tag{2-41}$$
$$H(s) = \frac{A(s)}{1 + A(s)\beta} = \frac{P_2\,GBW}{s^2 + (P_2 - P_1)s + P_2(\beta\,GBW - P_1)} \tag{2-42}$$
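The stability condition implied by (2-42) can be checked by solving the closed loop denominator directly. The sketch below uses hypothetical pole and bandwidth values in rad/s, and the function names are illustrative; only the quadratic comes from (2-42).

```python
import cmath

def closed_loop_poles(gbw, p1, p2, beta):
    """Roots of s^2 + (P2 - P1) s + P2 (beta * GBW - P1), the denominator of (2-42).

    p1 is the magnitude of the RHP dominant pole; all quantities are in rad/s.
    """
    b = p2 - p1
    c = p2 * (beta * gbw - p1)
    root = cmath.sqrt(b * b - 4.0 * c)
    return (-b + root) / 2.0, (-b - root) / 2.0

def is_stable(gbw, p1, p2, beta):
    """True when both closed loop poles lie in the left half plane."""
    return all(pole.real < 0.0 for pole in closed_loop_poles(gbw, p1, p2, beta))

# Hypothetical op amp: GBW = 1e8 rad/s, RHP pole at 1e4 rad/s, P2 = 3e8 rad/s.
GBW, P1, P2 = 1e8, 1e4, 3e8
beta_min = P1 / GBW  # reciprocal of the DC gain magnitude

print(is_stable(GBW, P1, P2, beta=1.0))             # unity gain feedback
print(is_stable(GBW, P1, P2, beta=0.5 * beta_min))  # feedback weaker than 1/|A0|
```

For these assumed values the unity gain case is reported as stable, while a feedback factor below P1/GBW leaves one pole in the right half plane, in line with the condition stated above.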
 
 
Figure 2.7. A two-stage op amp with a RHP dominant pole in a negative feedback loop 
2.8.4 Frequency Analysis 
In order to understand the effects of the proposed conductance cancellation circuit on the 
op amp's overall frequency response, the small signal block diagram of the proposed op amp, 
shown in Figure 2.8, is used for frequency analysis. Nodes ①~⑤ and ⑨ in Figure 2.6 are of 
very low impedance. The poles associated with these nodes are close to, or fractions of, the 
transistors' unity current gain frequencies (fT) and are significantly higher than the unity gain 
frequency (UGF) of the designed op amp. Since the frequency range of interest (for possible 
stability and slow time constant concerns) is below the UGF, the poles associated with nodes 
①~⑤ and ⑨ are neglected in the following calculations for simplicity. Similarly, the poles and 
zeros close to the transistors' fT in u1(s) and u2(s) are also neglected. The full expressions of 
u1(s) and u2(s) are derived as (2-43) and (2-44), in which Zx, Px, Zy and Py are given in 
(2-45). 
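$$u_1(s) \approx \frac{1 + s\,\dfrac{C_{gs22}}{g_{m22}}}{1 + s\,\dfrac{C_{gs22} + C_{gs14}}{g_{m22}}}\cdot\frac{g_{ds16}\left(1 + \dfrac{sC_{gd14}}{g_{ds16}}\right)\left(1 - \dfrac{sC_{gs16}}{g_{m16}}\right)}{1 + \dfrac{sC_{gs16}}{g_{m16}}} \approx \frac{g_{ds16}\left(1 + \dfrac{s}{Z_x}\right)}{1 + \dfrac{s}{P_x}} \tag{2-43}$$
$$u_2(s) \approx \frac{1 + s\,\dfrac{C_{gs7}}{g_{m7}}}{1 + s\,\dfrac{C_{gs7} + C_{gs3}}{g_{m7}}}\cdot\frac{g_{ds5}\left(1 + \dfrac{sC_{gd3}}{g_{ds5}}\right)\left(1 - \dfrac{sC_{gs5}}{g_{m5}}\right)}{1 + \dfrac{sC_{gs5}}{g_{m5}}} \approx \frac{g_{ds5}\left(1 + \dfrac{s}{Z_y}\right)}{1 + \dfrac{s}{P_y}} \tag{2-44}$$
$$Z_x = \frac{g_{ds16}}{C_{gd14}}; \qquad P_x = \frac{g_{m22}}{C_{gs22} + C_{gs14}}; \qquad Z_y = \frac{g_{ds5}}{C_{gd3}}; \qquad P_y = \frac{g_{m7}}{C_{gs7} + C_{gs3}} \tag{2-45}$$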
 
Figure 2.8: Small signal block diagram of the proposed op amp  
In order to find the poles and zeros created by the gain enhancement method, KCL equations 
at nodes ②~⑥ and ⑩ are written as (2-46) ~ (2-51), where gi and Ci, i ∈ [1, 2, … ,6], are 
shown in Table 2.1. To obtain design insight from the transfer function from the op amp input 
to the output (Vout/Vid), three assumptions are made to simplify the derivation without 
significant loss of accuracy: 
1) The transconductances of transistors M1~M6 and M10~M21 are much larger than their 
output conductances.  
2) CL >> CC >> C3, C5. 
3) The current mirror ratio of M6 to M10 is 1.  
$$-0.5\,V_{id}\,\frac{g_{m1}g_{m10}}{g_{m6}} + V_2(g_2 + sC_2) + V_3\,u_1(s) = 0 \tag{2-46}$$
$$g_{m17}V_2 + V_3(g_3 + sC_3) + g_{ds15}(V_3 - V_{o1}) + g_{m15}V_3 = 0 \tag{2-47}$$
$$-0.5\,g_{m1}V_{id} + V_4(g_4 + sC_4) + V_5\,u_2(s) = 0 \tag{2-48}$$
$$g_{m10}V_4 + V_5(g_5 + sC_5 + g_{ds13} + g_{m13}) - g_{ds13}V_{o1} = 0 \tag{2-49}$$
$$g_{ds13}(V_5 - V_{o1}) + g_{m13}V_5 + g_{ds15}(V_3 - V_{o1}) + g_{m15}V_3 = V_{o1}\,sC_6 + \frac{(V_{o1} - V_{out})\,sC_c}{1 + sR_cC_c} \tag{2-50}$$
$$\frac{(V_{o1} - V_{out})\,sC_c}{1 + sR_cC_c} + V_2\,\frac{g_{m18}g_{m20}}{g_{m19}} = V_{o1}\,g_{m21} + V_{out}(g_L + sC_L) \tag{2-51}$$
$$\frac{V_{out}}{V_{id}} \approx \frac{g_{m1}}{P_1C_c}\cdot\frac{\left(1 + \dfrac{sC_c\,g_{meff}}{2g_{m16}g_{m21}}\right)\left(1 + \dfrac{s\,g_{m21}\,t\,R_cC_c}{g_{meff}}\right)\left(1 + s\,\dfrac{P_x + P_y}{P_xP_y}\right)\left(1 + \dfrac{s}{P_x + P_y}\right)}{\left(1 + \dfrac{s}{P_1}\right)\left(1 + \dfrac{sC_L}{g_{m21}}\right)\left(1 + sR_cC_c\right)\left(1 + s\,\dfrac{P_x + P_y}{P_xP_y}\right)\left(1 + \dfrac{s}{P_x + P_y}\right)}$$
$$t = \frac{g_{m18}g_{m20}}{g_{m19}g_{m21}}; \qquad g_{meff} = g_{m21}\,t + 2g_{m16}(g_{m21}R_c - 1) \tag{2-52}$$
$$P_1 = \frac{g_L\left(\dfrac{g_A\,g_{ds13}}{g_{m13}} + \dfrac{g_B\,g_{ds15}}{g_{m15}}\right)}{g_{m21}C_c} \tag{2-53}$$
With the three assumptions above, the transfer function Vout/Vid is derived as (2-52), where 
P1, t and gmeff are given in (2-52) and (2-53). The expressions of gA and gB are the same as 
those shown in (2-36). Expression (2-52) shows that the high frequency poles (Px and Py) 
associated with u1(s) and u2(s) form two compressed pole-zero pairs at frequencies around 
PxPy/(Px+Py) and Px+Py. Fortunately, Px and Py are a fraction of the transistors' fT, so they 
are inherently high frequency poles. As long as both Px and Py are at frequencies several times 
higher than the UGF of the op amp shown in Figure 2.6, the FVA-based gain enhancement 
technique should not change the op amp's high frequency response.  
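The margin between the cancellation-path poles and the UGF can be estimated from the expressions for Px and Py in (2-45). The sketch below is purely illustrative: the transconductances, capacitances and the UGF are hypothetical values, and the helper name is an assumption.

```python
import math

def pole_hz(gm, c_total):
    """Pole frequency in Hz for transconductance gm (S) and capacitance c_total (F)."""
    return gm / (2.0 * math.pi * c_total)

# Hypothetical device values; only the formulas for Px and Py follow (2-45).
GM22, CGS22, CGS14 = 400e-6, 40e-15, 160e-15  # PMOS -gds generator
GM7, CGS7, CGS3 = 500e-6, 30e-15, 120e-15     # NMOS -gds generator
UGF_HZ = 20e6                                  # assumed unity gain frequency

px = pole_hz(GM22, CGS22 + CGS14)
py = pole_hz(GM7, CGS7 + CGS3)

for name, p in (("Px", px), ("Py", py)):
    print(f"{name} = {p / 1e6:.0f} MHz, about {p / UGF_HZ:.0f}x the UGF")
```

A check of this kind indicates whether the sensing transistors can be made small (fewer multipliers) without pulling Px and Py down toward the UGF, since the capacitance in the denominators is dominated by the larger core transistors.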
Table 2.1: Expressions of the conductances and capacitances in the proposed op amp 
Conductance            Capacitance 
g1 ≈ gm6 + gds2        C1 ≈ Cgs6 + Cgs10 + Cgd2 + Cgd4 
g2 ≈ gm16              C2 ≈ Cgs16 + Cgs17 + Cgs18 + Cgd14 + Cgd12 
g3 ≈ gds17             C3 ≈ Cgs15 + Cgs22 + Cgd17 
g4 ≈ gm5 + gds1        C4 ≈ Cgs5 + Cgs11 + Cgd1 + Cgd3 
g5 ≈ gds11             C5 ≈ Cgs13 + Cgs7 + Cgd11 
gL ≈ gds21 + gds20     C6 ≈ Cgs21 + Cgd15 + Cgd13 
g9 ≈ gm9               C9 ≈ Cgs20 + Cgs19 + Cgd18 
2.8.5 Noise Analysis 
The noise model of the proposed op amp is shown in Figure 2.9. The modeled voltage noise 
includes flicker and thermal noise of the transistors. As the first stage of the proposed op amp 
has a high gain, the input referred noise contributed from the output stage is negligible. Noise 
contribution from cascode transistors will also be neglected because it is much smaller than 
the noise contribution from the input pair and current sources.  
 
Figure 2.9:  Noise model of the proposed op amp 
Therefore, the input referred noise power of the proposed op amp is approximately 
calculated as (2-54). In (2-54), the noise terms 
$(4g_{m10}^2e_{n10}^2 + 2g_{m17}^2e_{n17}^2 + 2g_{m1}^2e_{n1}^2)/g_{m1}^2$ and 
$(g_{m24}^2e_{n24}^2 + g_{m9}^2e_{n9}^2)/g_{m1}^2$ are contributed by the op amp core and the gds 
cancellation circuits, respectively. The noise contribution ratio of M24 to M10 can be found as 
(2-55), where α is the size ratio of M24 to M10. As the size ratio of M9 to M16 is the same as 
that of M24 to M10, the noise contribution ratio of M9 to M16 is also α. Therefore, the 
input referred noise power of the proposed op amp can be simplified as (2-56). As α is set to 
1/6 in this design, the extra noise contributed by the conductance cancellation circuit is a 
very small portion of the total noise. 
$$ e_{eq,prop}^2 \approx \frac{4g_{m10}^2 e_{n10}^2 + g_{m24}^2 e_{n24}^2 + 2g_{m17}^2 e_{n17}^2 + g_{m9}^2 e_{n9}^2 + 2g_{m1}^2 e_{n1}^2}{g_{m1}^2} \tag{2-54} $$
 
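$$ \frac{g_{m24}^2 e_{n24}^2}{g_{m10}^2 e_{n10}^2} = \frac{g_{m24}^2 \left( \dfrac{8kT}{3g_{m24}} + \dfrac{KF_{flicker}}{W_{24}L_{24}C_{ox}f} \right) \Delta f}{g_{m10}^2 \left( \dfrac{8kT}{3g_{m10}} + \dfrac{KF_{flicker}}{W_{10}L_{10}C_{ox}f} \right) \Delta f} = \frac{g_{m24}}{g_{m10}} = \alpha \tag{2-55} $$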
$$ e_{eq,prop}^2 \approx \frac{(4+\alpha) g_{m10}^2 e_{n10}^2 + (2+\alpha) g_{m17}^2 e_{n17}^2 + 2g_{m1}^2 e_{n1}^2}{g_{m1}^2} \tag{2-56} $$
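As a rough numerical illustration of why the cancellation circuit adds little noise, the sketch below evaluates the share of the total input referred noise power that comes from the terms scaled by α, taking those terms to be α times the corresponding core terms as (2-55) suggests. The function name and the equal weighting of the three core products are illustrative assumptions, not values from this design.

```python
def extra_noise_fraction(a, b, c, alpha):
    """Share of total noise power due to the cancellation circuit.

    a = gm10^2 * en10^2, b = gm17^2 * en17^2, c = gm1^2 * en1^2
    (all in the same units). The common 1/gm1^2 factor cancels in the ratio.
    """
    core = 4.0 * a + 2.0 * b + 2.0 * c   # op amp core terms
    extra = alpha * (a + b)              # terms added by the cancellation circuit
    return extra / (core + extra)


ALPHA = 1.0 / 6.0  # size ratio quoted in the text

# Hypothetical case: the three core products are equal.
frac_equal = extra_noise_fraction(1.0, 1.0, 1.0, ALPHA)

# Worst case of this expression: only the gm17 term is non-zero.
frac_bound = ALPHA / (2.0 + ALPHA)

print(f"equal weights: {100 * frac_equal:.1f}% of total noise power")
print(f"upper bound:   {100 * frac_bound:.1f}% of total noise power")
```

Under these assumptions the extra contribution stays in the single-digit percent range for any weighting of the core terms.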
2.8.6 Simulation and Measurement Results 
    
Figure 2.10: Layout and microphotograph of the fabricated prototype op amp
In the designed prototype op amp shown in Figure 2.6, the feedback paths from the gate
of M7 to the gate of M3 and from the gate of M22 to the gate of M14 are made controllable by
a switch so that the DC gain of the op amp can be compared in two conditions: one with the gds
cancellation circuit enabled (proposed) and one with the circuit disabled (conventional). The
microphotographs and layouts of the two op amps are combined and shown in Figure 2.10.
Post-layout simulation and measurement results of the two op amps are compared under
various process corners, temperatures, supply voltages, and output voltage swings (OSW). The
comparison shows the effectiveness of the proposed method under PVT and OSW variations.
The Monte Carlo simulation with 200 runs confirms the proposed op amp's ability to provide
large DC gain enhancement under random mismatch.
2.8.6.1 Simulated Open Loop DC Gain vs. PVT Variation 
 
Figure 2.11: Simulated DC gain vs. (a) P.T. variation (b) supply voltage (c) OSW 
The two op amps are placed in a unity gain buffer structure without any resistive load in the
following post-layout simulations, from which their open loop DC gain is extracted.
Figure 2.11(a), (b) and (c) respectively show the dependency of the proposed and conventional
op amps' open loop DC gain on process and temperature, supply voltage, and OSW. The dashed and solid lines respectively
correspond to the proposed and conventional op amps' performance. Figure 2.11(a) shows that
the DC gain of the proposed and conventional op amps respectively ranges from 110.4dB to
139.8dB and from 87.2dB to 90.1dB. The gap between the solid and dashed lines represents
the amount of DC gain boost brought about solely by the proposed technique. Across all process
corners and temperatures ranging from -40°C to 80°C, the minimum DC gain boost yielded by
the proposed technique is around 23.2dB, which is comparable to the DC gain of a transistor
in this process. This amount of DC gain boost is also consistent with the calculation in (2-34).
Figure 2.11(b) shows that the minimum amount of DC gain enhancement brought by the
proposed technique is about 21.1dB when the supply voltage varies from 1.3V to 1.8V across all
process corners. Figure 2.11(c) demonstrates that under the nominal supply voltage of 1.5V
and room temperature, the proposed technique provides at least 22.5dB of DC gain boost across
all process corners when the output voltage varies from 0.1V to 1.3V.
Another noteworthy finding is that the DC gain enhancement brought by the proposed 
technique still has potential for further improvement and such potential can be easily realized. 
This is because the constraints on the current DC gain enhancement mostly originate from 
variations in process corner instead of temperature, OSW or supply voltage, as can be observed 
from Figure 2.11. These constraints can be reduced through a one-time trimming, which can 
be implemented using either registers or one-time programmable elements (OTP). 
2.8.6.2 Measured DC Gain vs. OSW 
The lab setup shown in Figure 2.12(a) is used to measure the DC gain of the two op amps,
in which Vcm1 and Vcm2 are set to half of the supply voltage using two DVC 8500 voltage
calibrators. The servo loop in Figure 2.12(a) keeps the DUT's output equal to Vforce by adjusting the
DUT's inverting input accordingly. The voltage change at the DUT's inverting input is
amplified by a resistor ratio of 1000 and then low-pass filtered before being measured by an
Agilent 34401A multimeter. The DC gain of the two op amps (DUT) is calculated as AOL =
|ΔVforce/ΔVout| × 1000. The measured DC gain of the proposed and conventional op amps is
shown in Figure 2.12(c). It can be seen that the proposed method provides more than 26.4dB of
additional DC gain compared with the conventional op amp. In addition, this DC gain boost drops
by only 1dB over an OSW of 0.1V~1.4V under a supply voltage of 1.5V.
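The gain extraction used with the servo loop can be sketched as follows. This is a minimal illustration of the formula AOL = |ΔVforce/ΔVout| × 1000 expressed in dB; the function name and the example readings are hypothetical and are not measured data from this work.

```python
import math


def open_loop_gain_db(delta_v_force, delta_v_out, resistor_ratio=1000.0):
    """Open loop DC gain in dB from a servo loop measurement.

    delta_v_force:  change applied at Vforce (V)
    delta_v_out:    resulting change read at the amplified output Vout (V)
    resistor_ratio: gain by which the DUT input change is amplified (1000 here)
    """
    if delta_v_out == 0:
        raise ValueError("delta_v_out must be non-zero")
    a_ol = abs(delta_v_force / delta_v_out) * resistor_ratio
    return 20.0 * math.log10(a_ol)


# Hypothetical readings: a 1 V step at Vforce moves Vout by 4 mV.
gain_db = open_loop_gain_db(1.0, 4e-3)
print(f"A_OL = {gain_db:.2f} dB")
```

Because the multimeter reads the amplified node, a larger open loop gain shows up as a smaller ΔVout for the same ΔVforce.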
 
Figure 2.12:  Op amps DC gain measurement (a) schematic (b) lab setup (c) measured DC 
gain vs. OSW 
2.8.6.3 Simulated Open Loop DC Gain vs. Mismatch 
 
Figure 2.13: Op amps’ DC gain under P.Mis variation (a) proposed op amp (b) 
conventional op amp (c) gain enhancement  
The dependency of the DC gain enhancement on critical transistors' mismatch has been
analyzed in Section 2.6. A Monte Carlo simulation of 200 runs is used to check the
effectiveness of the method under process corner and random mismatch (P.Mis) variations.
The Monte Carlo simulated DC gain of the proposed and conventional op amps is shown in
Figure 2.13(a) and (b). The DC gain of the conventional op amp ranges from 83.49dB to
92.77dB with a mean of 89.77dB and a sigma of 1.8dB, whereas the proposed op amp produces
a DC gain ranging from 104.2dB to 150.2dB with a mean of 123.9dB and a sigma of 9.19dB.
The Monte Carlo simulated DC gain boost brought by the proposed conductance cancellation
method is shown in Figure 2.13(c). The mean, sigma, maximum, and minimum of the DC gain
enhancement are respectively 34.13dB, 8.3dB, 59.12dB, and 18.8dB.
2.8.6.4 Simulated AC Frequency Response 
Figure 2.14 shows the AC responses of the proposed and conventional op amps when they
are placed in a unity gain buffer structure with a resistive and capacitive load of 20kΩ and 40pF. It
can be seen that the proposed DC gain enhancement method increases the op amp's low
frequency gain while preserving the conventional op amp's high frequency response. This is
consistent with the frequency analysis. The simulation results show that the two op amps have
the same GBW and PM of 13.6MHz and 55.7°.
 
 
Figure 2.14: Post-layout simulated AC responses of the prop. and conv. op amps  
 
2.8.6.5 Measured Transient Response 
The measured transient responses of the proposed and conventional op amps in a unity
gain buffer configuration are almost the same; that of the proposed op amp is shown in Figure 2.15. The blue curve is the
0.5V input step voltage and the red curve is the output of the proposed op amp. The rising and
falling slew rates of the two op amps are about 19.4V/µs and 14.38V/µs, respectively.
 
Figure 2.15: Measured transient performance of the proposed op amp  
2.8.6.6 Simulated Noise Performance 
The input referred voltage noise densities of the proposed and conventional op amps are
shown as red and blue curves in Figure 2.16. As expected, the two op amps have almost the
same noise performance. Specifically, the integrated voltage noise of the proposed and
conventional op amps from 0.1Hz to 10Hz is respectively 7.847µV and 7.815µV. Of the
integrated noise of the proposed op amp, about 95.6%, 3.1% and 1.3% come respectively from
the op amp core, M9, and M24 in Figure 2.6.
Figure 2.16: Post-layout simulated noise performance of the two op amps  
2.8.6.7 Performance Summary and Comparison  
Table 2.2 shows a performance comparison of the proposed and conventional op amps. Both
simulated and measured results of the two op amps demonstrate the effectiveness and
robustness of the proposed conductance cancellation method for DC gain enhancement.
Compared with previous work [3][5], this work provides a simpler, more robust, and more
cost-efficient solution to enhance the DC gain of an op amp. For example, the DC gain boost of the
fully differential op amp in [3] drops by 33dB with OSW between -0.24V and 0.24V under a
supply voltage of 3V, while the amount of boost in this work only drops by 1dB with OSR
(output swing) between 0.1V and 1.4V under a 1.5V supply. The normalized sensitivity of the
DC gain boost with respect to OSW, SOSR, can be calculated as ∆Aen/OSR ∗ Vsupply. The SOSR
of this work and [3] are respectively 0.5dB and 412.5dB. A detailed comparison between this
work and [3][5] is shown in Table 2.3. With all process corner variations considered, APT_min,
APS_min, and APOSW_min respectively represent the minimum DC gain enhancement under
temperatures between -40°C and 80°C, supply voltages between 1.3V and 1.8V, and OSR
between 0.1V and 1.3V. APMS_min, APMS_mean, and APMS_max respectively are the minimum, mean,
and maximum DC gain enhancement of the proposed method based on Monte Carlo simulation
results.
Table 2.2: Summary of measured performance of the two op amps
Op Amps                                         Proposed       Conv.
+DC gain (dB)                                   108            80
Gain bandwidth product (MHz)                    8.0            8.0
Phase margin (°)                                50             50.3
*Input referred noise (µVrms) (0.1Hz-1Hz)       5.719          5.694
*Input referred noise (µVrms) (0.1Hz-1MHz)      16.467         16.409
Capacitive and resistive load                   40pF//20kΩ     40pF//20kΩ
SR+/SR- (V/µs)                                  19.38/14.38    19.30/14.30
Supply voltage (V)                              1.5            1.5
Current consumption (µA)                        1128           1108
Area (µm²)                                      14836          14432
Process technology                              IBM 130nm CMOS
+ DC gain measured with the setup in Figure 2.12(a); * post-layout simulation result
Table 2.3: Performance comparison to the literature
                                          [3] Yan                [5] He                 This work
CMOS process                              0.5µm                  0.5µm                  0.13µm
Supply voltage (V)                        3                      -                      1.5
Current consumption (mA)
  (excluding tuning circuits)             15                     -                      1.128
DC gain (dB)                              >83                    >60                    108
DC gain boost (dB)                        -                      -                      26.4
OSW (V)                                   -0.24~0.24             -                      0.1~1.3
Drop in DC gain boost over OSW (dB)       33                     -                      1
SOSW (dB/V)                               68.75                  -                      0.77
APT_min (dB)                              -                      -                      23.2
APS_min (dB)                              -                      -                      21.1
APOSW_min (dB)                            -                      -                      22.5
APMS_min, APMS_mean and APMS_max (dB)     -                      -                      18.8, 34.1, 59.1
Tuning circuits                           High-gain low-offset   16b DAC, comparator,   NA
                                          comparator             MCU
Power/Area overhead (%)                   -                      -                      1.8/2.8
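The SOSW row of Table 2.3 can be reproduced as the drop in DC gain boost divided by the width of the output swing range. The short sketch below is only an illustration of that arithmetic; the function name is hypothetical, and for this work it assumes the measured 0.1V~1.4V swing range quoted in Section 2.8.6.2.

```python
def boost_sensitivity_db_per_v(drop_db, swing_low_v, swing_high_v):
    """Drop in DC gain boost per volt of output swing range (dB/V)."""
    span = swing_high_v - swing_low_v
    if span <= 0:
        raise ValueError("swing_high_v must exceed swing_low_v")
    return drop_db / span


# [3]: 33 dB drop over -0.24 V to 0.24 V
s_ref3 = boost_sensitivity_db_per_v(33.0, -0.24, 0.24)

# This work: 1 dB drop, assuming the measured 0.1 V to 1.4 V range
s_this = boost_sensitivity_db_per_v(1.0, 0.1, 1.4)

print(f"[3]:       {s_ref3:.2f} dB/V")
print(f"This work: {s_this:.2f} dB/V")
```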
2.9. A Folded Cascode Amplifier with the FVA-based GE Technique [6] 
The FVA-based gain enhancement technique is also suitable for folded cascode amplifiers 
(FCAs). Figure 2.17 shows a folded cascode amplifier (FCA) design with the FVA-based gain 
enhancement technique. In this work, three two-stage fully differential op amps are designed 
in the IBM 130nm CMOS process. The two op amps share the same core amplifier as shown 
in Figure 2.17 c), except that the first one (conventional) does not have any gain enhancement 
technique, whereas the second op amp has the aforementioned FVA-based gain enhancement 
technique. In the core amplifier, transistors M3 and M4 are cascoded with the op amp's input
pair, M1 and M2, so that the conductance looking up from the drain of M4 is much smaller than
gds25. Thus, gds25 is the main positive conductance on the NMOS side to be cancelled in order
to achieve DC gain enhancement. The second stage of the op amp is a folded mesh class-AB
output stage.
 
Figure 2.17: A fully differential FCA with the FVA-based technique 
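$$ g_D \approx g_{ds25} \left( 1 - \frac{g_{ds24}\, g_{m25}}{g_{ds25}\, g_{m24}} \cdot \frac{1+\eta_{31}}{1+\eta_{23}} \right) \approx -g_{ds25}(\varepsilon_1 + \varepsilon_2) \tag{2-57} $$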
Similar to the conductance cancellation analysis in section 2.6, gD is the net conductance 
looking down from the drain of transistor M25 and can be found as (2-57), in which η31 and 
η23 are the body effect coefficients of transistors M31 and M23 respectively. The values of η31
and η23 are both close to 0.15 in this process. Also, ε1 is the intrinsic gain
mismatch between transistors M24 and M25, and ε2 is the body effect mismatch between
transistors M23 and M31. ε1 is about 5% in this design, while ε2 is negligibly small. Therefore,
the magnitude of the expected gD is about 20 times smaller than gds25. A similar amount of conductance reduction
is expected from the PMOS side conductance cancellation circuit. Therefore, the expected gB,
the net conductance looking up from the drain of M6, is about 20 times smaller than gds6.
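The factor of 20 follows directly from the right-hand side of (2-57). The sketch below evaluates it with the mismatch values quoted in the text; the gds25 value is a hypothetical placeholder, since only the ratio matters here.

```python
import math


def net_conductance(gds25, eps1, eps2):
    """Net conductance at the drain of M25 from the approximation in (2-57)."""
    return -gds25 * (eps1 + eps2)


GDS25 = 100e-6   # hypothetical value in siemens, for illustration only
EPS1 = 0.05      # intrinsic gain mismatch quoted in the text
EPS2 = 0.0       # body effect mismatch, negligibly small per the text

g_d = net_conductance(GDS25, EPS1, EPS2)
reduction = GDS25 / abs(g_d)                 # how many times smaller |gD| is
reduction_db = 20.0 * math.log10(reduction)  # the same ratio in dB

print(f"gD = {g_d * 1e6:.1f} uS, |gD| is {reduction:.0f}x smaller ({reduction_db:.1f} dB)")
```

With these numbers the residual conductance is negative, which matches the sign of the approximation in (2-57).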
 
Figure 2.18: gD1 and gB1 under P.T variation a) gD1 b) gB1 
 
Figure 2.19: gD and gB under P.T variation a) gD b) gB 
 
Figure 2.20: DC gain of the proposed and conventional op amp  
 
As discussed in Section 2.8.6, the FVA-based gain enhancement technique is mainly affected
by process variation. The original positive conductance is gD1 looking down from M25's
drain node and gB1 looking up from M6's drain node. Figure 2.18 shows the variation of
the original positive conductance under process corner and temperature (P.T) variation. It can
be seen that gD1 varies from 35µS to 110µS, whereas gB1 varies from 22µS to 40µS. Compared
to gB1, the larger variation range of gD1 is caused by an inherently wider spread of NMOS
transistors' transconductance and conductance over process and temperature variations. The
net conductances looking down from M25's drain node and looking up from M6's drain
node are denoted as gD and gB respectively. The simulated gD and gB are shown in Figure
2.19. As can be seen, gD only varies from -9.62µS to 2.93µS and gB only changes from
-1.09µS to 2.58µS over process and temperature variations. The much smaller absolute values
of gD and gB confirm the conductance cancellation effect of the FVA-based gain
enhancement technique. The amounts of DC gain enhancement arising from the technique are
shown in Figure 2.20. It shows that the minimum amount of DC gain improvement is 28.9dB
over process and temperature variations. This verifies the effectiveness and robustness of the 
proposed FVA-based gain enhancement for folded cascode op amps. The performance 
summary of the two designed op amps is shown in Table 2.4.  
Without the aid of any tuning circuit, the proposed FVA-based gain enhancement technique 
keeps a DC gain enhancement of over 28.9dB under temperatures between -40 and 80°C, over 
27.6dB under supply voltage between 1.4V and 2V, and over 29dB under differential output 
swing between -1.1V and 1.1V. The power and area overhead of the gain enhancement circuit 
are respectively only 7% and 3% of those of the conventional op amp.  
Table 2.4: Performance summary of the designed op amps
Op Amps                        Conventional    Proposed
DC gain (dB)                   87.4            131.4
Load capacitor (pF)            20              20
GBW/UGF (MHz)                  73.1/66.51      75.5/67.53
PM (°)                         53.9            53.6
SR+/SR- (V/µs)                 51.2/51.3       51.3/51.9
1% settling time (ns)          40.8            40.0
0.01% settling time (ns)       66              66
0.0001% settling time (ns)     NA              90
Supply voltage (V)             1.5             1.5
Current (µA)                   1141            1221
Estimated area (mm²)           0.0532          0.0548
Process technology             IBM 130nm CMOS
2.10. Discussion 
The proposed SDC-based and FVA-based gain enhancement techniques are ultimately limited
by the intrinsic gains of the critical transistors, such as transistors M5, M11, M16 and M17 in
Figure 2.6. This limitation can be mitigated by replacing the critical transistors with compound
transistors or gain blocks that have much higher DC gain than a single transistor.
As for the design of the low gain amplifier in both the SDC-based and FVA-based gain
enhancement techniques, the amplifier's DC gain constancy is critical. If the DC
gain of the amplifier is not constant under PVT variations, significant design and simulation effort is needed
to achieve large DC gain enhancement. This has been analyzed and discussed in detail in [5].
Fortunately, the low gain amplifier or the level shifter in the SDC-based and FVA-based gain
enhancement techniques in Sections 2.5 and 2.6 has very good gain constancy.
2.11. Summary 
A new gds cancellation method that robustly improves op amps' DC gain with negligible power
and area overhead has been introduced. The method can be implemented based on the source
degeneration circuit (SDC) or the flipped voltage attenuator (FVA). Compared to the
FVA-based method, the SDC-based technique is more suitable for CMOS processes in which
transistors' threshold voltages are too low for the transistors to work in weak or strong
inversion regions in the FVA configuration. Otherwise, the FVA-based technique is
recommended, as it is more robust to devices' random mismatch. A prototype
current mirror input op amp with the FVA-based technique is designed and fabricated in the
IBM 130nm process. The measurement and simulation results of the prototype verify that the
technique effectively enhances an op amp's DC gain (>20dB) and is very robust over process,
voltage and temperature variations. Another prototype folded cascode amplifier design with
the FVA-based technique also shows large DC gain enhancement.
The simulation and measurement results agree well with the theoretical analysis. The 
effectiveness of the proposed gain enhancement method is supported by the measurement and 
post-layout simulation results of two prototype op amps in presence of variations in process, 
temperature, supply voltage, output voltage swing, and random mismatch. The design 
simplicity, gain enhancement effectiveness, low power and area overhead, and zero 
degradation on settling time performance make the proposed gain enhancement method 
suitable for many high precision applications such as switched-capacitor circuits and sigma-
delta converters.  
2.12. References  
[6]. K. Bult and G. J. G. M. Geelen, "A fast-settling CMOS op amp for SC circuits with 
90-dB DC gain," in IEEE Journal of Solid-State Circuits, vol. 25, no. 6, pp. 1379-1384, 
Dec 1990.  
[7]. J. Yan and R.L. Geiger, "Fast-settling CMOS operational amplifiers with negative 
conductance voltage gain enhancement, " ISCAS, 2001, Sydney, Australia, pp. 228-
231 
[8]. J. Yan, and R.L. Geiger, "A high gain CMOS operational amplifier with negative 
conductance gain enhancement, " CICC 2002, Orlando, FL, USA, pp. 337- 340, 2002 
[9]. C. He, L. Jin, D. Chen and R.L. Geiger, "Robust High-Gain Amplifier Design Using 
Dynamical Systems and Bifurcation Theory With Digital Postprocessing Techniques, 
" IEEE TCAS I, vol. 54, no. 5, pp. 964-973, May 2007  
[10]. B. Huang and D. Chen, "Power-efficient, PVT robust conductance cancellation 
method for gain enhancement," Electronics Letters , vol.49, no.16, pp.,, Aug. 1 2013 
[11]. B. Huang and D. Chen, “An Effective Conductance Cancellation Method with 
Minimal Design Effort”,   IEEE Midwest Symposium on Circuits and 
Systems  (MWSCAS), 2014, College Station, TX 
[12]. B. Huang and D. Chen, “A High Gain Operational Amplifier via an Efficient 
Conductance Cancellation Technique”, IEEE Custom Integrated Circuits Conference 
(CICC), 2014, San Jose, CA, USA 
 
 SLEW RATE ENHANCEMENT FOR 
OPERATIONAL TRANSCONDUCTANCE AMPLIFIERS  
3.1. Introduction 
In switched-capacitor circuits and other applications with large capacitive
loads, such as liquid crystal display drivers, OTAs must provide sufficient slew rate (SR) to
achieve fast settling. In a conventional Class-A OTA, as shown in Figure 3.1, the
SR and gain-bandwidth product (GBW) are given by (3-1). To maximize the gm/Itail
efficiency and optimize the OTA's noise and GBW, the input pair (M1 and M2) usually works
in the weak inversion region with an overdrive voltage typically around 70~80mV. Therefore, the
ratio of SR to GBW is derived as (3-2) and its value is about 0.1V.
When a sine wave with a frequency of GBW/2π is applied at the input of the OTA in a
noninverting unity gain buffer configuration, the ideal output voltage of the OTA, Vout,
is given by (3-3). In order to avoid slew rate induced distortion at Vout, the OTA's SR
(~0.1V*GBW) needs to be larger than the fastest voltage change rate of Vout. The fastest change
rate happens at the zero-crossing point and is equal to GBW*A. Therefore, if the peak-to-peak
Vout voltage is more than 0.2V at a frequency of GBW/2π, the OTA's limited slew rate
starts to cause distortion. In order to improve the linearity of low gain OTAs, it is very
important to decouple their gain bandwidth product (GBW) and slew rate (SR) while preserving
the OTAs' DC and small signal performance. In an effort to improve the slew rate of OTAs with
small static power consumption, several different methods have been reported in the literature
and will be reviewed in Section 3.2.
    
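$$ GBW = \frac{g_{m1}}{C_L}; \qquad SR = \frac{I_{tail}}{C_L} \tag{3-1} $$

$$ \frac{SR}{GBW} = \frac{I_{tail}/C_L}{g_{m1}/C_L} = \frac{I_{tail}}{g_{m1}} = \frac{2I_1}{g_{m1}} = 2nV_T \approx 0.1\,\mathrm{V} \tag{3-2} $$

$$ V_{out}(t) = A \sin(GBW \cdot t) \tag{3-3} $$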
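The 0.2V peak-to-peak limit can be checked with a few lines of arithmetic. The sketch below compares the maximum slope of the sine wave in (3-3), A·GBW, with the slew rate implied by (3-2). The 10MHz GBW is a hypothetical value chosen only for illustration, and the function names are not from the original work.

```python
import math


def max_sine_slope(amplitude_v, gbw_rad_per_s):
    """Largest dV/dt of A*sin(GBW*t); it occurs at the zero crossings."""
    return amplitude_v * gbw_rad_per_s


def is_slew_limited(amplitude_v, gbw_rad_per_s, slew_rate_v_per_s):
    """True when the sine wave demands more slope than the OTA can deliver."""
    return max_sine_slope(amplitude_v, gbw_rad_per_s) > slew_rate_v_per_s


GBW = 2.0 * math.pi * 10e6   # hypothetical 10 MHz GBW, in rad/s
SR = 0.1 * GBW               # SR/GBW of about 0.1 V, from (3-2)

max_amplitude = SR / GBW     # largest distortion-free peak amplitude (V)
print(f"max peak-to-peak output: {2 * max_amplitude:.2f} V")

for amplitude in (0.05, 0.15):
    print(amplitude, "V peak ->", "slewing" if is_slew_limited(amplitude, GBW, SR) else "linear")
```

The result does not depend on the GBW value, since both the slew rate and the required slope scale with it.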
 
Figure 3.1: Conventional Class-A operational transconductance amplifier
3.2. Literature Review 
In the literature, many different slew rate enhancement (SRE) methods [1-6] have been 
proposed but they all suffer from various drawbacks. For example, some SRE methods [1] [2] 
are incompatible with low supply voltage, some [3-5] degrade amplifier linearity, some [6] are 
sensitive to input common mode range (ICMR), yet others [4] require complex circuits 
producing large power and area overhead. 
One of the widely used SRE methods for OTAs is the adaptive biasing scheme [3], as shown 
in Figure 3.2. The current mirror ratios of all current mirrors in Figure 3.2 are 1:1 except the 
current mirrors of M17-M20. The current mirror ratios of M17-M18 and M20-M19 are both 
1:A. When a positive differential signal vid is applied at the inputs of the OTA, I2 becomes 
larger than I1, where I1 and I2 respectively denote the drain currents of M1 and M2. The
absolute current difference between I1 and I2 is sensed by the current subtraction circuits formed
by M16-M22 and is fed back to the tail current of the OTA. Assuming both M1 and M2 work
in the weak inversion region, I1 + I2 = A|I1 − I2| + Ip follows from the KCL equation at the
common source node of the input pair, node X, and I1 = I2 exp(Vid/nVT) follows from the weak
inversion current-voltage relation of the input pair.
Thus, I1 and I2 can be found as (3-4) and (3-5), where VT is the thermal voltage. The output
current of the OTA is the current difference between I1 and I2 and is derived as (3-6).
 
Figure 3.2: An OTA with the adaptive biasing circuit [3] 
 
$$ I_1 = \frac{I_p \exp(V_{id}/nV_T)}{(A+1) - (A-1)\exp(V_{id}/nV_T)} \tag{3-4} $$

$$ I_2 = \frac{I_p}{(A+1) - (A-1)\exp(V_{id}/nV_T)} \tag{3-5} $$

$$ I_{out} = I_1 - I_2 = I_p \left[ \frac{-1 + \exp(V_{id}/nV_T)}{(A+1) + (1-A)\exp(V_{id}/nV_T)} \right] \tag{3-6} $$
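As a quick consistency check, the sketch below evaluates (3-4) to (3-6) for Vid ≥ 0 and confirms that the result satisfies I1 + I2 = A|I1 − I2| + Ip, and that Iout saturates toward Ip/(1 − A) for large Vid when A < 1. The bias values (Ip = 10µA, A = 0.5, nVT = 40mV) are hypothetical and chosen only for illustration.

```python
import math


def adaptive_bias_currents(vid, ip, a, n_vt):
    """Return (I1, I2, Iout) from (3-4) to (3-6), valid for vid >= 0
    while the input pair stays in weak inversion."""
    if vid < 0:
        raise ValueError("this sketch assumes vid >= 0")
    x = math.exp(vid / n_vt)
    den = (a + 1.0) - (a - 1.0) * x
    if den <= 0:
        raise ValueError("denominator not positive: weak inversion model invalid")
    i2 = ip / den
    i1 = i2 * x
    return i1, i2, i1 - i2


IP, A, N_VT = 10e-6, 0.5, 0.04   # hypothetical bias point

i1, i2, iout = adaptive_bias_currents(0.1, IP, A, N_VT)
kcl_residual = (i1 + i2) - (A * abs(i1 - i2) + IP)

iout_large = adaptive_bias_currents(1.0, IP, A, N_VT)[2]
peak_limit = IP / (1.0 - A)

print(f"Iout at 100 mV: {iout * 1e6:.2f} uA, KCL residual: {kcl_residual:.1e} A")
print(f"Iout at 1 V:    {iout_large * 1e6:.2f} uA (limit {peak_limit * 1e6:.2f} uA)")
```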
For large signal operation, Vid ≫ nVT, the output peak current is obtained as (3-7). Equation (3-7)
implies that the peak current Ipk becomes very large as A approaches 1. However, the peak current
cannot be infinite: when the drain current of the input pair becomes large, the input pair leaves the weak
inversion region and equations (3-6) and (3-7) are no longer valid. For small signal operation,
(3-6) is applicable. The transconductance of the input pair, gm, is defined as ∂Iout/∂Vid and
is calculated as (3-8) accordingly. As can be seen from (3-8), gm varies with the differential input
signal when A is not equal to zero. This dependency of gm on the differential signal
degrades the linearity of the OTA compared with the conventional OTA where A=0. The
reason for the loss in linearity is that the adaptive circuit does not distinguish between small
signal and large signal operations; the adaptive biasing circuit is always on as
long as a differential signal is applied. In an effort to improve the slew rate of an OTA
without degrading the OTA's linearity, the desired features of SRE circuits are discussed in the
next section.
 
$$ I_{pk} = I_{out}\big|_{V_{id} \gg nV_T} = \begin{cases} \dfrac{I_p}{1-A}, & 0 \le A < 1 \\ \text{unpredicted current}, & A \ge 1 \end{cases} \tag{3-7} $$
 
$$ g_m \approx \frac{2I_p/(nV_T)}{(1+A)^2 \left(1 - \dfrac{V_{id}}{nV_T}\right) + (1-A)^2 \left(1 + \dfrac{V_{id}}{nV_T}\right) + 2(1-A^2)} \approx \frac{I_p/(nV_T)}{2 - 2AV_{id}/(nV_T)} \tag{3-8} $$
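The dependence of gm on Vid can be illustrated numerically. The sketch below differentiates (3-6) analytically, which gives gm = 2·Ip·x/(nVT·D²) with x = exp(Vid/nVT) and D = (A+1) + (1−A)x, and compares it with the first-order form in (3-8). The closed-form derivative is worked out here rather than quoted from the original text, and the bias values are hypothetical.

```python
import math


def iout(vid, ip, a, n_vt):
    """Output current from (3-6)."""
    x = math.exp(vid / n_vt)
    return ip * (x - 1.0) / ((a + 1.0) + (1.0 - a) * x)


def gm_exact(vid, ip, a, n_vt):
    """Analytical derivative of (3-6) with respect to vid."""
    x = math.exp(vid / n_vt)
    d = (a + 1.0) + (1.0 - a) * x
    return 2.0 * ip * x / (n_vt * d * d)


def gm_first_order(vid, ip, a, n_vt):
    """Simplified small signal form from (3-8)."""
    return (ip / n_vt) / (2.0 - 2.0 * a * vid / n_vt)


IP, N_VT = 10e-6, 0.04   # hypothetical bias point

for a in (0.0, 0.5):
    for vid in (0.0, 0.002, 0.004):
        print(f"A={a}, Vid={vid * 1e3:.0f} mV: "
              f"exact {gm_exact(vid, IP, a, N_VT) * 1e6:.2f} uS, "
              f"first order {gm_first_order(vid, IP, a, N_VT) * 1e6:.2f} uS")
```

With A = 0 the first-order gm is independent of Vid, while with A = 0.5 it rises with Vid, which is the linearity degradation described above.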
3.3. Desired Features of Slew Rate Enhancement Circuits 
In order to avoid linearity degradation, the proposed SRE method should be off for
small signal and DC operations. However, when an amplifier is at the onset of slewing, the
proposed SRE method should be activated to dynamically increase the SR of the amplifier.
The desired features of the proposed SRE method are: a) simplicity; b) low power
and area consumption for the SRE circuit; c) a predefined turn-on voltage for the SRE
circuit. For small signal operation, the sensed voltage is smaller than the turn-on voltage.
Therefore, the SRE circuit stays off in small signal operation and avoids the aforementioned
linearity degradation.
3.4. Proposed SRE Method via Excessive Transient Feedback 
3.4.1 Concept of the slew rate enhancement via excessive transient feedback 
The concept of the proposed SRE method is shown in Figure 3.3.  First, a transient signal at 
the output stage that can be a single ended or differential voltage or current signal, xs, is sensed. 
Then the feedback signal, xfb, is generated to turn on/off the SRE circuit. xfb is a nonlinear 
function of xs. The relationship between xfb and xs is given in (3-9), where α is a non-constant 
gain factor and xn is the threshold for extracting an excessive transient signal.  
 
Figure 3.3: Concept of the proposed SRE method  
 
$$ x_{fb} = f(x_s) = \begin{cases} 0, & |x_s| \le x_n \\ \alpha(|x_s| - x_n), & |x_s| > x_n \end{cases} \tag{3-9} $$
    When an amplifier is in the dc or small signal operation, that is, when |xs| ≤ xn, xfb is 
zero and the SRE feedback is turned off. However, when the amplifier’s output stage is at the 
onset of slewing, xfb, the product of α and the excessive transient signal, |xs| − xn, will be 
generated to turn on the SRE feedback. In order to ensure zero impact on the amplifier’s small 
signal operation and an effective SR boost, xn and α should be set properly.  
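The dead-zone behavior of (3-9) can be sketched in a few lines. This is a behavioral illustration only: it treats α as a constant for simplicity, whereas the text describes it as a non-constant gain factor, and the threshold and gain values are hypothetical.

```python
def excess_transient_feedback(xs, xn, alpha):
    """Feedback signal xfb = f(xs) from (3-9).

    xs:    sensed transient signal (voltage or current)
    xn:    threshold below which the SRE feedback stays off
    alpha: gain applied to the excess signal (assumed constant here)
    """
    if xn < 0:
        raise ValueError("xn must be non-negative")
    excess = abs(xs) - xn
    return alpha * excess if excess > 0 else 0.0


XN, ALPHA = 0.1, 4.0   # hypothetical threshold and gain

for xs in (0.0, 0.05, -0.05, 0.3, -0.3):
    print(f"xs = {xs:+.2f} -> xfb = {excess_transient_feedback(xs, XN, ALPHA):.2f}")
```

Signals inside the ±xn window produce no feedback, so the small signal path is left untouched, while larger excursions of either polarity switch the SRE feedback on.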
3.4.2 Selections of sensing and driving nodes for a SRE circuit 
 
 
Figure 3.4: Different types of SRE methods 
The selections of sensing and driving nodes or branches for a SRE circuit are very important. 
As shown in Figure 3.4, the selections of driving nodes or branches for a SRE circuit can be a) 
a tail current source b) an output node c) both a tail current source and output node. The benefit 
of boosting the tail current is that the large signal slew rate and gain-bandwidth product of the
OTA can be increased simultaneously. This maximizes the OTA’s large signal operation
speed. But this method of boosting tail current to improve slew rate requires that all the circuits 
in the OTA’s large signal path have sufficient dynamic range to respond to a very large tail 
current without suffering from any long recovery time after slewing phase. This is usually more 
difficult to accomplish when the large signal path is long or involves many devices. By 
boosting the current directly to the OTA’s output node, the requirement of the OTA’s dynamic 
range can be mitigated because the small and large signal operation paths of the OTA are 
separate. The core of OTA can be optimized for small signal performance, while the SRE 
circuit can be designed for large signal performance improvement. But boosting transient 
current directly to the OTA’s output node increases the SRE circuit’s design complexity and
requires additional large transistors to conduct the dramatically increased transient
current. This may significantly increase the OTA’s area. In general, boosting the tail current of an
OTA is preferred if the OTA has a large dynamic range for design simplicity and compactness. 
Otherwise, boosting transient current to the output node is recommended.  
For selections of circuit nodes or branches for slewing detection, the nodes or branches with 
the least delay from the OTA’s output are generally preferred since SRE’s turn-on and turn-
off delays ultimately limit the effectiveness and robustness of a SRE circuit. In the example 
OTA in Figure 3.4, compared to the gates of transistors M3 and M4, the gates of transistors 
M1 and M2 are faster sensing nodes. Similarly, the gates of transistors M3 and M4 are faster 
than the gate of M7 in terms of slewing detection. However, sensing the fastest nodes is not
always straightforward. In the example OTA, the input nodes are the fastest nodes but
have a very wide input common mode range (ICMR). Therefore, the SRE circuit, sensing
the input nodes, needs to be configured to accommodate this ICMR, which adds to the design 
complexity of the circuit. Because of this, one needs to make tradeoffs between design 
complexity of SRE circuit and delays of sensing nodes. 
3.5. Design Example with the Proposed SRE Technique  
Based on the discussed SRE concept via excessive transient feedback, we present an OTA 
design with a simple SRE circuit as shown in Figure 3.5. The OTA consists of an OTA core 
and a proposed SRE circuit. The SRE is implemented to boost the OTA’s tail current in this 
design because the OTA has a very wide dynamic range. Transistor M19 is designed to provide 
transient tail current. M19 is normally off in the quiescent or small signal operation to preserve 
the OTA’s small signal operation and linearity. However, when the OTA is about to slew, M19 
will be turned on heavily to provide a large dynamic tail current to effectively boost the OTA’s 
SR and improve its large signal linearity.  
Figure 3.5: Designed one-stage OTA with the proposed SRE method 
As to slewing detection nodes, the OTA’s internal nodes, A and C, are selected because they 
provide the optimal tradeoff between design complexity and speed of the sensing nodes. The 
voltages of nodes A and C, VA and VC, share the same common mode voltage, VB, but have 
opposite differential voltage. Transistors M14-M16 have the same size, whereas M18 and M17 
have a size ratio of n. In this work, n>2 is chosen so as to make sure that in the quiescent 
operation M15 and M16 work in the saturation region and M18 works in the triode region. As 
M18 works in the triode region, its drain source voltage, VDS18, and gate source voltage of 
M19, VGS19, are very small. As a result, the drain current of M19 is zero in the quiescent 
operation. Therefore, the OTA’s DC operation is untouched by the proposed SRE circuit. In 
the quiescent operation, the KCL equation at the gate node of M19 is given by (3-10), in
which β18 and β17 are respectively μpCoxW18/L18 and μpCoxW17/L17. VDS18 is M18’s drain 
source voltage and Vod18 is M18’s overdrive voltage. After solving (3-10), VDS18 is found as 
(3-11). To ensure that M19 works in the cutoff region in the quiescent operation, VDS18 should 
be less than the threshold voltage of M19. 
$$\beta_{18}\left(V_{od18} - \tfrac{1}{2}V_{DS18}\right)V_{DS18} = 2\cdot\tfrac{1}{2}\,\beta_{17}V_{od18}^{2} \tag{3-10}$$

$$V_{DS18} = \left(1 - \sqrt{1 - 2/n}\right)V_{od18} \tag{3-11}$$
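As a quick numerical check of (3-11), the following Python sketch evaluates VDS18/Vod18 for a given size ratio n and confirms that the result satisfies (3-10) with β18 = n·β17. The value n = 2.125 is the one used later in this design; the function name is illustrative only.

```python
import math


def vds18_over_vod18(n: float) -> float:
    """Return VDS18 / Vod18 from (3-11); requires n > 2 so that M18 is in triode."""
    if n <= 2:
        raise ValueError("n must exceed 2 for M18 to operate in the triode region")
    return 1.0 - math.sqrt(1.0 - 2.0 / n)


ratio = vds18_over_vod18(2.125)
# Residual of (3-10) normalized by beta17 * Vod18^2: n * (x - x^2 / 2) - 1
residual = 2.125 * (ratio - 0.5 * ratio ** 2) - 1.0
print(f"VDS18 = {ratio:.4f} * Vod18, residual of (3-10) = {residual:.2e}")
```

For n = 2.125 the quiescent VDS18 is roughly three quarters of Vod18, which is the quantity that must stay below the threshold voltage of M19.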
Upon application of a differential signal, vid, to the input of the OTA, the differential current 
of the input pair is denoted as Id. Devices R1,2, M3 and M4 form a local common mode
feedback (LCMFB) loop. This LCMFB sets node B as a virtual ground and makes the voltage
changes at nodes A and C complementary. In the presence of Id, the voltages of VA and VC 
become respectively VB+∆V and VB-∆V, where |∆V|=0.5*Id*R1,2. When |∆V| < Vod14, both 
M15 and M16 stay on. When |∆V| > Vod14, either only M15 or only M16 is on. According to 
the square law model, the total drain current of M15 and M16 is found as (3-12). From (3-12), 
it can be found that I15 + I16 monotonically increases as |∆V| increases. Therefore, the gate 
voltage of M19, Vctrl, monotonically decreases as |∆V| increases. The voltage gain from |∆V| 
to Vctrl is very high when M18 works in the saturation region, and M15-M16 work in either the 
saturation region or the cutoff region. In this operation scenario, any voltage increases of |∆V| 
will dramatically reduce Vctrl and hence turn on M19 due to the high voltage gain from |∆V| to 
Vctrl. Therefore, we define the values of |∆V| and Id at which M18 enters the saturation region as the
turn-on voltage (∆Von) and turn-on current (Id,on) of the SRE circuit. When the SRE circuit is
on, M15-M16 can work in either the saturation region or the cutoff region. According to the 
definitions of ∆Von, (3-13) is found by writing KCL equation at transistor M18’s drain node, 
where n is the size ratio of transistor M18 to transistor M17.  Equation (3-13) depends on the 
relationship between ∆Von and Vod14. If ∆Von < Vod14, transistor M15, M16 and M18 all work 
in the saturation region at the turn-on boundary of the SRE circuit. If ∆Von > Vod14, at the turn-
on boundary of the SRE circuit, M15 and M16 respectively work in the cutoff and saturation 
regions or vice versa, and M18 works in the saturation region. Therefore, ∆Von  can be 
calculated as (3-14). As can be seen, ∆Von is equal to Vod14 if n=4. If n is smaller than 4, the
first equation in (3-14) is valid; otherwise the second is valid. For large signal operation, |∆V|
is very large, making M15 or M16 work in the deep triode region and Vctrl approximate Vss. 
Thus, M19 is turned on heavily and a large transient tail current is provided to the OTA to 
effectively boost its SR. 
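$$I_{15} + I_{16} = \begin{cases} \beta_{15}\left[V_{od14}^{2} + \Delta V^{2}\right] & |\Delta V| < V_{od14} \\ \tfrac{1}{2}\beta_{15}\left(V_{od14} + |\Delta V|\right)^{2} & |\Delta V| > V_{od14} \end{cases} \tag{3-12}$$

$$\begin{cases} \beta_{15}\left[V_{od14}^{2} + \Delta V_{on}^{2}\right] = \tfrac{n}{2}\,\beta_{15}V_{od14}^{2} & \text{if } \Delta V_{on} < V_{od14} \\ \tfrac{1}{2}\beta_{15}\left(V_{od14} + \Delta V_{on}\right)^{2} = \tfrac{n}{2}\,\beta_{15}V_{od14}^{2} & \text{if } \Delta V_{on} > V_{od14} \end{cases} \tag{3-13}$$

$$\begin{cases} \Delta V_{on} = \sqrt{\tfrac{n}{2} - 1}\;V_{od14} & \text{if } \Delta V_{on} < V_{od14} \\ \Delta V_{on} = \left(\sqrt{n} - 1\right)V_{od14} & \text{if } \Delta V_{on} > V_{od14} \end{cases} \tag{3-14}$$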
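The dependence of the turn-on voltage on n in (3-14) can be tabulated with a short Python sketch. The function name is illustrative, and the branch boundary n = 4 follows from the observation above that both expressions give ∆Von = Vod14 at n = 4.

```python
import math


def turn_on_voltage_ratio(n: float) -> float:
    """Return dVon / Vod14 from (3-14) for a size ratio n > 2."""
    if n <= 2:
        raise ValueError("n must exceed 2")
    if n <= 4:
        # M15 and M16 both conduct at the turn-on boundary
        return math.sqrt(n / 2.0 - 1.0)
    # only one of M15 / M16 conducts at the turn-on boundary
    return math.sqrt(n) - 1.0


for n in (2.125, 3.0, 4.0, 9.0):
    print(f"n = {n}: dVon = {turn_on_voltage_ratio(n):.3f} * Vod14")
```

With n = 2.125 the sketch gives ∆Von = 0.25·Vod14, so a ratio only slightly above 2 places the turn-on threshold close to the quiescent point, whereas larger n pushes it further out.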
In short, the proposed SRE method has a predefined turn on voltage, ∆Von, with zero impact 
on an amplifier’s DC operating point or small signal performance or linearity. Meanwhile, it 
can provide a very large dynamic current to effectively enhance an amplifier’s SR when the 
amplifier slews, thus improving the amplifier’s large signal linearity. 
3.5.1 Small signal analysis 
As discussed earlier, the LCMFB, formed by devices R1,2, M3 and M4, sets node B as a 
virtual ground. As a result, the effects of Cgs3 and Cgs4 on nodes A and C are eliminated. 
Compared to an OTA without the LCMFB, the poles associated with nodes A and C, calculated
as (3-15), tend to be at a higher frequency, where RA=R1,2//rds3,4//rds1,2 and CA is the total
parasitic capacitance at node A. Since transistors M3~M6 have the same size, the gain bandwidth
product (GBW) of the OTA can be obtained as (3-16), where gm1 and gm5,6 are the transconductances
of M1 and M5-M6, and CL is the load capacitor. Therefore, the phase margin (PM) of the OTA is
approximately found as (3-17). In order to ensure that PA imposes little phase degradation on 
the OTA, R1,2 should be small. In this work, the value of R1,2 is close to 1/gm3,4, where gm3,4 is 
the small-signal transconductance of M3 and M4. 
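$$P_A = \frac{1}{2\pi R_A C_A} \tag{3-15}$$

$$GBW = \frac{g_{m1} R_A\, g_{m5,6}}{2\pi C_L} \tag{3-16}$$

$$PM \approx 90^{\circ} - \tan^{-1}\!\left(\frac{C_A\, g_{m1}\, g_{m5,6}\, R_A^{2}}{C_L}\right) \tag{3-17}$$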
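Since the argument of the arctangent in (3-17) is simply GBW/PA, the relation can be inverted to estimate where the non-dominant pole must lie for a given phase margin. The Python sketch below does this using the UGF of 7.33MHz and PM of 88.7° reported later in Table 3.1; treating the UGF as the GBW and assuming that PA is the only non-dominant pole are simplifications made here for illustration.

```python
import math


def nondominant_pole_from_pm(gbw_hz: float, pm_deg: float) -> float:
    """Invert PM ~ 90 deg - atan(GBW / P_A) to estimate P_A in Hz."""
    return gbw_hz / math.tan(math.radians(90.0 - pm_deg))


# UGF and PM taken from Table 3.1; single non-dominant pole assumed.
pa_hz = nondominant_pole_from_pm(7.33e6, 88.7)
print(f"Estimated P_A = {pa_hz / 1e6:.0f} MHz")
```

Under these assumptions the pole at nodes A and C would sit at a few hundred megahertz, well above the GBW, which is consistent with choosing R1,2 close to 1/gm3,4.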
3.5.2 Large signal analysis 
   In the presence of a differential current in the input pair, Id, the current flow in R1,2 can be 
found as Id/2 while VA and VC correspondingly become VB+R1Id/2 and VB-R1Id/2. The currents 
in M5 and M6 in the output stage are obtained as (3-18). Since M7 and M8 have the same size, 
the current flow in CL is equal to the current difference between I5 and I6 as given by (3-19), in 
which Vod3 is proportional to the square root of I1+I2.  This means that a boosted transient tail 
current always enhances the OTA’s SR no matter whether the transient current in M1 and M2 
is differential-mode current or common-mode current. Also, a large R1,2 in the LCMFB is 
helpful for SRE but reduces phase margin and stability of the OTA as shown in (3-17). 
Fortunately, the proposed SRE method does not require large resistors to
achieve a large SR enhancement and hence satisfies the stability requirement.
$$I_5 = \tfrac{1}{2}\beta_3\left(V_{od3} + \frac{R_{1,2} I_d}{2}\right)^{2}, \qquad I_6 = \tfrac{1}{2}\beta_3\left(V_{od3} - \frac{R_{1,2} I_d}{2}\right)^{2} \tag{3-18}$$
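$$I_L = \begin{cases} \beta_3 V_{od3} R_{1,2} I_d & \text{if } I_d < 2V_{od3}/R_{1,2} \\ \tfrac{1}{2}\beta_3\left(V_{od3} + \frac{R_{1,2}|I_d|}{2}\right)^{2} & \text{if } I_d > 2V_{od3}/R_{1,2} \end{cases} \tag{3-19}$$

$$I_{d,on} = \frac{2\,\Delta V_{on}}{R_{1,2}} = \frac{2V_{od14}}{4R_{1,2}} = \frac{\beta V_{od3}\, g_{m3,4}}{2\alpha} = \frac{\beta I_{tail,Q}}{2\alpha} \tag{3-20}$$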
In this design, the size ratio between M17 and M18 is n=2.125. To guarantee this ratio after 
fabrication, some sophisticated layout techniques or simple trimming circuits may need to be 
implemented. After plugging n into (3-14), the turn-on voltage, ∆Von, of the SRE circuit is 
found as 0.25Vod14. Assuming that R1,2 is α/gm3,4 and Vod14 is equal to βVod3, the obtained turn 
on differential current Id,on is given by (3-20), where Itail,Q is the quiescent drain current in M0. 
Equation (3-20) implies that as long as β/2α is less than 1, the SRE circuit will be turned on 
when the OTA starts to slew. In this design, β=1 and α=1. After the OTA completes slewing 
and enters the small signal settling, Id becomes less than Id,on. This turns off the SRE circuit.  
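The large-signal relations (3-19) and (3-20) can be explored with a minimal Python sketch. The device values (β3, Vod3, R1,2 and the quiescent tail current) are hypothetical and chosen only to make the arithmetic easy to follow; only α = β = 1 is taken from the design.

```python
def load_current(i_d: float, beta3: float, vod3: float, r12: float) -> float:
    """Magnitude of the load current from (3-19) for an input-pair current i_d >= 0."""
    knee = 2.0 * vod3 / r12  # current at which one output transistor cuts off
    if i_d < knee:
        return beta3 * vod3 * r12 * i_d
    return 0.5 * beta3 * (vod3 + r12 * i_d / 2.0) ** 2


def turn_on_current(itail_q: float, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Turn-on differential current of the SRE circuit from (3-20)."""
    return beta * itail_q / (2.0 * alpha)


# Hypothetical device values: beta3 = 1 mA/V^2, Vod3 = 0.2 V, R1,2 = 2 kOhm.
BETA3, VOD3, R12 = 1e-3, 0.2, 2e3
knee = 2.0 * VOD3 / R12
below = load_current(knee * (1 - 1e-9), BETA3, VOD3, R12)
above = load_current(knee, BETA3, VOD3, R12)
id_on = turn_on_current(100e-6)  # hypothetical 100 uA quiescent tail current
print(below, above, id_on)
```

The two branches of (3-19) meet at Id = 2Vod3/R1,2, and with α = β = 1 the SRE circuit turns on once the differential current reaches half of the quiescent tail current.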
3.6. Simulation Results 
To show the effectiveness of the proposed SRE method, three one-stage single-ended OTAs 
are designed in the IBM 130nm process. The first OTA (conventional) and the second 
(proposed) share the same core amplifier as shown in Figure 3.5; but the conventional OTA 
does not have any SRE circuit whereas the proposed OTA has the proposed SRE circuit. The 
third OTA (adaptive) has an adaptive SRE circuit [3]. The three designed OTAs have almost 
the same unity gain frequency (UGF) of 7.3MHz and phase margin (PM) of 88° under the 
same capacitive load of 20pF. Small signal step responses of the three OTAs are shown in 
Figure 3.6. Unlike the adaptive method, the OTA with the proposed SRE circuit preserves the 
small signal step responses of the conventional OTA. 
 
Figure 3.6: Small signal transient response of the three designed OTAs 
 
Figure 3.7: Step responses of the three OTAs (a) output voltages (b) tail currents 
Upon application of a 0.8V voltage step to the input of each OTA in the unity gain buffer
configuration, the transient responses of the three OTAs are obtained as shown in Figure 3.7(a). As shown
in Figure 3.7(a), the proposed SRE method improves the average slew rate of the conventional
OTA by a factor of about 23 (2320%) with power and area overhead of only 2% and 1.2%. Compared
with the adaptive method [3], the proposed SRE method enhances the slew rate by a factor of more than
3 (300%) while reducing power and area by 11.1% and 25%. In the slewing phases,
the corresponding transient tail currents of the three OTAs are displayed in Figure 3.7(b). The 
peak transient tail currents of the proposed OTA are 1158µA in the negative slewing phase and
871.6µA in the positive slewing phase, which are respectively about 4 and 2.4 times those of the
adaptive OTA, and 14.5 and 13.4 times those of the conventional OTA. In addition, the linearity of
the three OTAs is simulated with a 1MHz, 0.6V peak-to-peak voltage sine wave. The total 
harmonic distortion (THD) of the proposed OTA is respectively improved by 18dB and 6dB 
compared with the adaptive and conventional OTAs. The performance of the three designed 
OTAs is summarized and compared in Table 3.1. 
Table 3.1: Performance summary of the three designed OTAs 
| Parameter | Conventional | Adaptive [3] | Proposed |
|---|---|---|---|
| Load capacitor (pF) | 20 | 20 | 20 |
| DC gain (dB) | 24.9 | 24.86 | 24.9 |
| UGF (MHz) | 7.33 | 7.58 | 7.33 |
| PM (deg) | 88.7 | 89 | 88.7 |
| SR+/SR- (V/µs) | 8/5.6 | 61.8/34.8 | 138/178.4 |
| THD (dBc) @ Vpp=0.6V, fin=1MHz | -56.7 | -44.7 | -62.7 |
| Estimated area (µm²) | 8,214 | 11,065 | 8,310 |
| Current consumption (µA) | 252.9 | 290.2 | 258 |
| Supply voltage (V) | 1.5 | 1.5 | 1.5 |
| Technology | IBM 0.13µm CMOS | IBM 0.13µm CMOS | IBM 0.13µm CMOS |
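The headline ratios quoted in the text can be recomputed from Table 3.1. The Python sketch below assumes that the "average slew rate" is the arithmetic mean of SR+ and SR-, and that power scales with supply current because all three OTAs share the same 1.5V supply; both are assumptions made here rather than statements from the text.

```python
# Values transcribed from Table 3.1: SR in V/us, THD in dBc, area in um^2, current in uA.
TABLE = {
    "conventional": {"sr": (8.0, 5.6), "thd": -56.7, "area": 8214.0, "current": 252.9},
    "adaptive": {"sr": (61.8, 34.8), "thd": -44.7, "area": 11065.0, "current": 290.2},
    "proposed": {"sr": (138.0, 178.4), "thd": -62.7, "area": 8310.0, "current": 258.0},
}


def avg_sr(name: str) -> float:
    """Arithmetic mean of SR+ and SR- (assumed definition of the average slew rate)."""
    sr_pos, sr_neg = TABLE[name]["sr"]
    return (sr_pos + sr_neg) / 2.0


def ratio(key: str, ref: str) -> float:
    """Ratio of a proposed-OTA quantity to the same quantity of a reference OTA."""
    return TABLE["proposed"][key] / TABLE[ref][key]


sr_vs_conv = avg_sr("proposed") / avg_sr("conventional")
sr_vs_adapt = avg_sr("proposed") / avg_sr("adaptive")
power_overhead = ratio("current", "conventional") - 1.0
area_overhead = ratio("area", "conventional") - 1.0
power_saving_vs_adapt = 1.0 - ratio("current", "adaptive")
area_saving_vs_adapt = 1.0 - ratio("area", "adaptive")
thd_gain_vs_conv = TABLE["conventional"]["thd"] - TABLE["proposed"]["thd"]
thd_gain_vs_adapt = TABLE["adaptive"]["thd"] - TABLE["proposed"]["thd"]

print(f"SR ratio vs conventional: {sr_vs_conv:.1f}x, vs adaptive: {sr_vs_adapt:.2f}x")
print(f"Power/area overhead vs conventional: {power_overhead:.1%} / {area_overhead:.1%}")
print(f"Power/area saving vs adaptive: {power_saving_vs_adapt:.1%} / {area_saving_vs_adapt:.1%}")
print(f"THD improvement: {thd_gain_vs_conv:.1f} dB (conv.), {thd_gain_vs_adapt:.1f} dB (adaptive)")
```

Under these assumptions the table reproduces the quoted figures: a slew rate roughly 23 times that of the conventional OTA and more than 3 times that of the adaptive OTA, with about 2% power and 1.2% area overhead.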
 
3.7. Summary  
A simple yet very effective SRE method has been introduced. Compared with the
conventional OTA, the proposed OTA preserves small signal performance and improves SR
by a factor of about 23 (2320%) and THD by 6dB, with power and area overhead of only 2% and 1.2%
of those of the conventional OTA. Compared with the adaptive OTA, the SR and THD of the
proposed OTA are respectively improved by a factor of more than 3 (300%) and by 18dB. Owing to its
low power and area overhead, design simplicity and high effectiveness, the proposed
SRE method is suitable for applications that require large capacitive
driving capability with low static power dissipation.
3.8. References  
[1]. R. Castello and P. R. Gray, “A high-performance micropower switched-capacitor filter”,
IEEE J. Solid-State Circuits, vol. 20, no. 6, pp. 1122-1132, Dec. 1985
[2]. B. W. Lee and B. J. Sheu, “A high slew-rate CMOS amplifier for analog signal
processing”, IEEE J. Solid-State Circuits, vol. 25, no. 3, pp. 885-889, June 1990
[3]. M. Degrauwe, J. Rijmenants, E. A. Vittoz, and H. J. De Man, “Adaptive biasing CMOS
amplifiers”, IEEE J. Solid-State Circuits, vol. 17, no. 3, pp. 522-528, June 1982
[4]. R. Harjani, R. Heineke, and F. Wang, “An integrated low voltage class AB CMOS
OTA”, IEEE J. Solid-State Circuits, vol. 34, no. 2, pp. 134-142, Feb. 1999
[5]. A. J. Lopez-Martin, S. Baswa, J. Ramirez-Angulo, and R. G. Carvajal, “Low-Voltage
Super class AB CMOS OTA cells with very high slew rate and power efficiency”, IEEE
J. Solid-State Circuits, vol. 40, no. 5, pp. 1068-1077, May 2005
[6]. R. Klinke, B. J. Hosticka, and H. Pfleiderer, “A very-high-slew-rate CMOS operational
amplifier”, IEEE J. Solid-State Circuits, vol. 24, no. 3, pp. 744-746, June 1989
 
 
 
 
 
 
 
 
 
CHAPTER 4. POWER EFFICIENCY ENHANCEMENT FOR OP AMPS DRIVING LARGE CAPACITIVE LOADS
4.1. Introduction 
In modern high-resolution thin-film-transistor liquid-crystal displays (TFT-LCDs),
gamma correction must be performed to correct nonlinearities in the glass transmission
characteristics of the LCD panel [1]. The typical LCD source driver for 64 levels of grayscale
uses internal digital-to-analog converters (DACs) to convert the 6-bit data into analog voltages.
These generated analog voltages are buffered by gamma buffers to drive a large capacitive load
in the range of 10nF to 100nF, which is used to provide the glitch energy during DAC
conversions [2]. For these gamma buffers, the output voltage swing should be large and the
DC gain should be more than 66dB for 10-bit resolution [3]. Other very important circuit 
parameters for these op amps are gain-bandwidth product (GBW), slew rate (SR), power 
consumption, and circuit area.   
4.2. Literature Review  
4.2.1 General review 
Multistage op amps are predominant approaches for gamma buffers in LCD applications 
because of their superior gain/speed-to-power ratios [4]-[8]. But all these multistage-amplifiers 
need complicated frequency compensations which significantly increase design complexity. 
Recently, single-stage amplifiers have become popular as gamma buffers in LCD
applications. The single-stage amplifier designs in [3] and [9] have reported
favorable GBW and SR performance over their multistage counterparts [4]-[8]. The
methods in [3] and [9] are reviewed in the next section.
4.2.2 State-of-the-art methods  
4.2.2.1 Nested Current Mirror Approach [3] 
Figure 4.1 shows a basic cell working as the preamplifier (preamp) of the nested current 
mirror (NCM) amplifier [3]. In [3], multiple preamps are cascaded to improve the
amplifier’s GBW. Each preamp consists of a PMOS and an NMOS input pair. The PMOS input
pair is always tied to the input signals (V1i and V2i) of the whole NCM amplifier, whereas the 
NMOS input pair is tied to the outputs from the prior preamp stage (V4i and V6i). The outputs 
of the current preamp stage are denoted as V3i and V5i. As the poles associated with the 
preamp stages are at much higher frequencies than the entire amplifier’s GBW, cascading 
multiple preamp stages enhances the entire amplifier’s GBW and gain.  
 
Figure 4.1: Basic cell used in the nested current mirror based single stage op amp
 
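$$\Delta v_3 = (\Delta v_{th2} - \Delta v_{th1} + \Delta v_2 - \Delta v_1)(k + 1) - \Delta v_{th3} - k\,(\Delta v_4 + \Delta v_{th4}) \tag{4-1}$$

$$\Delta v_5 = -(\Delta v_{th2} - \Delta v_{th1} + \Delta v_2 - \Delta v_1)(k + 1) - \Delta v_{th5} - k\,(\Delta v_4 + \Delta v_{th6}) \tag{4-2}$$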
However, when cascading multiple preamp stages, the random offset voltages from all the 
preceding preamp stages are also amplified. This can be illustrated by looking at how the offset 
errors at the gates of the transistors M1 and M4 are amplified to the preamp’s outputs. Using
small-signal analysis at DC, the random offset voltages at the gates of M3
and M5 can be derived as (4-1) and (4-2), where ∆vi and ∆vthi are respectively the
total voltage error and threshold voltage error of transistor Mi, i=1, 2, …, 6, and k is the size ratio
of M4 to M3. Equations (4-1) and (4-2) clearly show that any voltage errors at the NMOS input 
pair (M4 and M6), including their threshold voltage errors and voltage errors from preceding 
preamp stages, are amplified by k times to the output (the gates of M3 and M5). Similarly, the 
voltage errors at the PMOS input pair are amplified by k+1 times. Because of the voltage error 
amplification, the quiescent current in transistors M3 and M5 can deviate far from their 
nominal current. The voltage errors can be divided into differential-mode and common-mode 
voltage errors. The differential-mode errors at the inputs of the NMOS pair can be partially 
corrected as the offset voltage of the NCM amplifier in a closed loop configuration, whereas 
the common-mode voltage errors directly affect the quiescent currents of the preamp’s output 
stages and the succeeding circuits. Due to the uncontrolled common-mode errors in [3], the 
quiescent currents of M3 and M5 can even become zero when more than three preamp stages 
are cascaded. The absence of well-defined quiescent currents of the NCM amplifier severely 
limits its robustness, yield and thus practical applicability.  
4.2.2.2 Signal-Current Enhancer Approach [9] 
The basic preamp circuit of another state-of-the-art op amp design [9] for driving large 
capacitive loads is shown in Figure 4.2. The preamp provides gain from its differential current 
input to its differential current output. The input current consists of a common-mode DC bias 
current, IB and a differential-mode signal current, Is. Ideally, the differential signal current gain 
from the input to the output is (2K+1) with the aspect ratios of transistors MP1~MP3 being 1:K+1:K.
By cascading N preamp stages, ideally the current gain is (2K+1)^N and the circuit’s
GBW is improved by a factor of (2K+1)^N. However, this approach suffers from severe tradeoffs among
quiescent supply current constancy, power supply rejection, small-signal performance and 
large-signal performance. The tradeoffs are discussed below.  
 
Figure 4.2: Basic cell used in [9] 
Due to channel length modulation effects, the currents flowing out from nodes n3 and n4, 
are calculated as (4-3) and (4-4), in which λP  and λN  are the channel length modulation 
coefficients of PMOS and NMOS transistors. For simplicity, we assume λP ≈ λN and the gate 
source voltages of the same type of transistors are the same, as expressed in (4-5). Therefore, 
the common-mode and differential-mode currents of In3 and In4 are expressed as (4-6) and (4-
7). As can be seen in (4-6), the current errors due to channel length modulation are amplified 
by (K+1) times for a single preamp stage. The amplifier in [9] has five preamp stages in
cascade, and thus the errors in the common-mode current are amplified by a factor of (K+1)^5.
The value of (K+1)^5 is as high as 1024. After plugging in λN=0.15, VDD=1.8V, VGS,MP4≈0.55V and
VGS,MN1≈0.45V in this 180nm CMOS process, the common-mode current error is found to be as high
as 123*IB after cascading five preamp stages, which is significantly higher than the
desired bias current, IB. The actual current error should be slightly smaller than the calculated
123*IB because VGS,MP4 and VGS,MN1 also slightly increase when the bias current increases. 
Nevertheless, there is still a huge amplification factor for the current error. Consequently, the 
op amp in [9] is extremely sensitive to its supply voltage.  
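The 123*IB estimate above can be reproduced with a few lines of Python. The sketch follows the text's own simplification that the single-stage error term of (4-6) is scaled by (K+1)^N for N cascaded stages. K = 3 is not stated explicitly and is inferred here from (K+1)^5 = 1024; the function names are illustrative.

```python
def cascaded_cm_error(k: int, stages: int, lam: float,
                      vdd: float, vgs_p: float, vgs_n: float) -> float:
    """Common-mode current error in units of IB: (K+1)^N * lambda * (VDD - VGS,MP4 - VGS,MN1)."""
    return (k + 1) ** stages * lam * (vdd - vgs_p - vgs_n)


def ideal_signal_gain(k: int, stages: int) -> int:
    """Ideal differential current gain (2K+1)^N of N cascaded preamp stages."""
    return (2 * k + 1) ** stages


K = 3  # inferred from (K+1)^5 = 1024, not stated directly in the text
cm_error = cascaded_cm_error(K, stages=5, lam=0.15, vdd=1.8, vgs_p=0.55, vgs_n=0.45)
gain = ideal_signal_gain(K, stages=5)
print(f"CM error = {cm_error:.1f} * IB, ideal signal gain = {gain}")
```

Because the error term is proportional to VDD - VGS,MP4 - VGS,MN1, it vanishes only when the supply voltage equals the sum of the two gate-source voltages, which is why the quiescent current is so sensitive to the supply.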
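$$\begin{aligned} I_{n3} &= (K + 1)(I_B - i_s)\,\frac{1 + \lambda_N (V_{DD} - V_{GS,MP4})}{1 + \lambda_N V_{GS,MN1}} - K (I_B + i_s)\,\frac{1 + \lambda_P V_{GS,MP4}}{1 + \lambda_P V_{GS,MP1}} \\ &\approx (K + 1)(I_B - i_s)\left[1 + \lambda_N (V_{DD} - V_{GS,MP4} - V_{GS,MN1})\right] - K (I_B + i_s) \end{aligned} \tag{4-3}$$

$$\begin{aligned} I_{n4} &= (K + 1)(I_B + i_s)\,\frac{1 + \lambda_P (V_{DD} - V_{GS,MN4})}{1 + \lambda_P V_{GS,MP1}} - K (I_B - i_s)\,\frac{1 + \lambda_N V_{GS,MN4}}{1 + \lambda_N V_{GS,MN1}} \\ &\approx (K + 1)(I_B + i_s)\left[1 + \lambda_P (V_{DD} - V_{GS,MP4} - V_{GS,MN1})\right] - K (I_B - i_s) \end{aligned} \tag{4-4}$$

$$V_{GS,MN4} \approx V_{GS,MN1}, \quad V_{GS,MP4} \approx V_{GS,MP1}, \quad \lambda_P \approx \lambda_N \propto \frac{1}{L} \tag{4-5}$$

$$I_{n,cm} = \frac{I_{n3} + I_{n4}}{2} = I_B + (K + 1)\, I_B\, \lambda_N (V_{DD} - V_{GS,MP4} - V_{GS,MN1}) \tag{4-6}$$

$$I_{n,dm} = \frac{I_{n4} - I_{n3}}{2} = (2K + 1)\, i_s + (K + 1)\, i_s\, \lambda_N (V_{DD} - V_{GS,MP4} - V_{GS,MN1}) \tag{4-7}$$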
In order to mitigate the quiescent current variation of [9] caused by its high sensitivity to 
supply voltage, either a fixed supply voltage source equal to VGS,MN1 + VGS,MP1 or transistors
with long channel lengths are needed. In [9], the op amp is designed in a 130nm CMOS process and is
powered by a 0.7V supply voltage. However, this actually leads to the demand of a 
sophisticated LDO design to provide a constant 0.7V voltage. This not only significantly 
increases the design complexity and area consumption but also degrades the maximum 
achievable slew rate (SR) of the op amp because the maximum SR is approximately 
proportional to the square of supply voltage. On the other hand, increasing the transistors’ 
channel lengths would severely compromise an op amp’s speed. The pole frequencies 
associated with gates of MP1 and MN1 in Figure 4.2 are found as fTP/(2K+1) and fTN/(2K+1) 
respectively, where fTP and fTN are unity current gain frequencies of MP1 and MN1. A 
transistor’s unity current gain frequency decreases as its channel length increases with a 
relationship shown in (4-8). As transistors’ length changes, fT changes faster than the channel 
length modulation coefficient, λ, which is proportional to 1/L. Therefore, to reduce λ by 10 
times through increasing the transistor’s length by 10 times, fT will drop by about 31.6 times. 
Consequently, this severely degrades the preamp’s speed.  
 
$$f_T = \frac{g_m}{C_{gs}} = \sqrt{\frac{2\mu I_d}{W L^{3} C_{ox}}} \tag{4-8}$$
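The scaling argument around (4-8) can be checked numerically. The Python sketch below uses only the proportionalities fT ∝ sqrt(Id/(W·L³)) from (4-8) and λ ∝ 1/L from (4-5); the function names are illustrative.

```python
def ft_scale(length_factor: float, width_factor: float = 1.0,
             current_factor: float = 1.0) -> float:
    """Ratio fT_new / fT_old from (4-8), with fT proportional to sqrt(Id / (W * L^3))."""
    return (current_factor / (width_factor * length_factor ** 3)) ** 0.5


def lambda_scale(length_factor: float) -> float:
    """Ratio lambda_new / lambda_old, with lambda proportional to 1 / L."""
    return 1.0 / length_factor


ft_drop = 1.0 / ft_scale(10.0)          # how many times fT falls for a 10x longer channel
lambda_drop = 1.0 / lambda_scale(10.0)  # how many times lambda falls for the same change
print(f"lambda falls {lambda_drop:.0f}x, fT falls {ft_drop:.1f}x")
```

A tenfold increase in channel length at fixed width and bias current therefore buys a tenfold reduction in λ at the cost of a roughly 31.6-fold reduction in fT, which is the tradeoff described above.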
In addition, the small-signal performance such as GBW of [9] compromises its large-signal 
performance such as slew rate. In the slewing phases, the large transient current to the output 
is the amplified current of the input pair’s differential current. Therefore, all the transistors in 
the op amp in [9] need to carry large transient currents so that the op amp’s output transient 
current can be sufficient to charge or discharge the load capacitor. There are mainly two 
disadvantages of passing large current through multiple stages. First, in order to pass large 
transient current to the output stage, the W/L ratios of all the transistors should be large. For a
given bias current and length of a transistor, a larger transistor width results in a smaller fT as 
shown by (4-8).  This leads to lower frequency of the non-dominant poles in the preamp, which 
ultimately limits the GBW of the entire op amp. Second, the transient current efficiency of the 
op amp is low. Ideally, in the slewing phases, we want all the generated large transient current 
to flow only into the load capacitor so minimal transient current is wasted at any intermediate 
stages in the op amp. But all the preamp stages in [9] waste a considerable portion of the large 
transient current passed to the load capacitor. 
Last but not least, some bias currents of the preamp stage shown in Figure 4.2 are wasted, such
as the drain currents of load transistors MN1, MN4, MP1 and MP4. Ideally, we want to have
zero current wasted in any of the load devices so as to maximize the transconductance and 
GBW of the preamp for a given supply current.  
4.3. Desired Features of Op Amp for Driving Large Capacitive Loads 
In an effort to solve the problems of [3] and [9], a desired op amp design for driving large
capacitive loads should meet the following requirements:
a) Possesses a well-defined quiescent current for each branch of circuits  
b) Decouples small-signal and large-signal operations 
c) Has robust performance under random mismatch variations 
d) Eliminates current wasted in the preamp’s load circuits 
4.4. Concept of the Proposed Power-Efficient Op Amp Design for Driving 
Large Capacitive Loads 
 
Figure 4.3: Proposed power-efficient op amp design for driving large capacitive loads 
 
$$x_{fb} = f(x_s) = \begin{cases} 0 & \text{if } |x_s| \le x_n \\ \alpha\,(|x_s| - x_n) & \text{if } |x_s| > x_n \end{cases} \tag{4-9}$$
The conceptual power-efficient op amp design for driving large capacitive loads is shown in 
Figure 4.3. Unlike [3][9], this op amp design decouples its small- and large-signal paths. The 
small-signal enhancement path, as shown by the blue arrow, consists of two voltage-to-current 
converters (V-I), one current-to-voltage converter (I-V) and multiple voltage-to-voltage 
converters (V-V). All the converters except the output stage class AB V-I converter work as 
the preamp stages of the op amp and the preamp stages do not need to carry a large transient 
current in the slewing phases. As a result, unlike [3][9], the demand of large transistor sizes in 
the preamp stages due to the impact of large-signal operation is eliminated.  Therefore, all the 
preamp stages in this work can be mainly designed for small-signal performance improvement. 
In addition, the quiescent current of all the circuits in the op amp is well defined. Furthermore, 
the design of V-V stages, generating the largest amount of gain and small-signal improvement, 
wastes zero current in the V-V stages’ load circuits. This increases the power efficiency of the 
preamp and the entire op amp compared with [3][9].  
As for the large-signal performance enhancement path, shown by the red arrow, it senses 
internal nodes of the input V-I and detects if the op amp is in the slewing phase. The large-
signal enhancement circuit is a nonlinear function of the sensed signal. The nonlinear function 
is similar to the function of the introduced slew rate enhancement (SRE) circuit in Chapter 3 
and is repeated as (4-9), where xs and xn are respectively the sensed signal and the threshold 
voltage or current for the sensed signal to activate the SRE circuit. In addition, xfb is the control 
signal to activate the SRE circuit when xfb>0 or to deactivate the SRE circuit when xfb=0. When 
the op amp is in the dc or small-signal operation, that is, when |xs| ≤ xn, xfb becomes zero, 
deactivating the SRE circuit. When the op amp’s input stage is at the onset of slewing, xfb, the 
product of α and the excessive transient signal, |xs| − xn, will be generated to turn on the SRE 
circuit so as to increase the tail current of the last preamp stage. Due to the existence of both 
large input signals and increased tail current at the last preamp stage, the preamp stage 
generates large differential output voltages. As a result, the output class AB V-I driver 
generates a large transient current to the load capacitor to boost the op amp’s slew rate.   
In summary, the benefits of the proposed power-efficient op amp design are listed
below.
1) The small-signal and large-signal paths for performance enhancement are decoupled. 
This eliminates the aforementioned trade-offs between GBW and the capability to 
convey large transient current. In addition, this improves transient current efficiency of 
op amps during the slewing phase since only the output stage conducts large current to 
charge/discharge load capacitor.  
2) All the used circuits have well-defined quiescent current. This eliminates the 
aforementioned trade-offs between GBW and quiescent current variations of op amps. 
3) Zero bias current is wasted in the V-V preamp design. This increases the power 
efficiency of the V-V preamp stage and the entire op amp.  
4.5. Design Example  
In this section, we will demonstrate a power-efficient op amp design driving a large 
capacitive load, i.e., 15nF, with the proposed preamp stage. The power efficient design strategy
for the op amp will be discussed.  
4.5.1 Design of the V-V preamp stage 
The schematic of the V-V preamp stage is shown in Figure 4.4. The inputs and outputs of 
the preamp are V1+/V1- and V2+/V2- respectively. The preamp has a well-defined quiescent 
current because the transistor M6 has a fixed bias current. All the bias currents are used to 
generate transconductance of both NMOS and PMOS transistors, i.e. M1 and M3. Zero current 
is wasted in the preamp’s load circuits, which are the two resistors, R. These two resistors also 
form a local common-mode feedback loop to define the preamp’s output common-mode 
voltage.  
 
Figure 4.4: Schematic of the designed V-V preamp stage 
4.5.1.1 Large-signal Analysis of the Preamp Stage 
 
Figure 4.5: a) Positive slewing phase of the last preamp stage b) op amp output stage  
$$V_{2,dm} = \frac{V_{2+} - V_{2-}}{2} = I_{tail} \cdot R \tag{4-10}$$
Figure 4.5 shows the designed op amp's class AB output stage and its preceding preamp 
stage. As shown in Figure 4.5(a), transistors M3-M5 and the two resistors R form a local 
common-mode feedback loop in quiescent or small-signal operation. In quiescent 
operation, M5 is biased in the saturation region to define the preamp's common-mode output 
voltage, V2,cm, which in turn defines the quiescent current of the op amp's class AB output stage 
shown in Figure 4.5(b). When the preamp's input differential voltage, V1,dm = 0.5*(V1+ - V1-), 
is larger than 0.7*Vod1, the preamp is in a positive slewing phase. In this phase, the preamp's 
differential-mode output voltage, V2,dm, is given by (4-10), and V2,cm depends on both 
V2,dm and the operating regions of M3 and M5. V2,dm and V2,cm are analyzed as follows. 
a) In the positive slewing phase, V1+ and V1- respectively increase and decrease by at least 
(√2 − 1)*Vodi from their quiescent voltages, where Vodi is the quiescent overdrive voltage 
of transistor Mi, i = 1, 2, ..., 4. The transistors' Vodi values are close to each other. If 
V2,dm is small enough that M3 and M5 remain in the saturation region in the slewing phase, V2,cm 
is given by (4-11), in which Itail is the drain current of M6 and Vth5 is M5's threshold 
voltage. In addition, µ, Cox, Wi and Li are respectively transistor Mi's mobility, gate oxide 
capacitance, width and length. As M3 and M5 work in the saturation region, V2- is larger 
than √2*Vod3 + Vod5, as expressed by (4-12). Solving (4-12) gives Itail*R < Vth5 − √2*Vod3. 
Then, based on (4-13), V2+ is smaller than Vod5 + 2*Vth5 − √2*Vod3. Therefore, in the 
positive slewing phase, V2+ is smaller than the supply voltage by at least 4.4*Vod, because 
the supply voltage is higher than Vod5 + Vod6 + Vod1 + Vod3 + Vth1 + Vth3, which is 
approximately 4*Vod + 2*Vth. We conclude that V2+ is not maximized if M3 and M5 remain 
in the saturation region in the slewing phase.  
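$$V_{2,cm} = \frac{V_{2+} + V_{2-}}{2} = \sqrt{\frac{2 I_{tail} L_5}{\mu C_{ox} W_5}} + V_{th5} = V_{od5} + V_{th5} \tag{4-11}$$
$$V_{2-} = V_{2,cm} - V_{2,dm} = V_{od5} + V_{th5} - I_{tail} R > \sqrt{2}\,V_{od3} + V_{od5} \tag{4-12}$$
$$V_{2+} = V_{2,cm} + V_{2,dm} = V_{od5} + V_{th5} + I_{tail} R < V_{od5} + 2V_{th5} - \sqrt{2}\,V_{od3} \tag{4-13}$$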
b) In the positive slewing phase, if V2,dm is large enough to push M3 into the triode region 
while M5 stays in the saturation region, the common-mode output voltage is still 
given by (4-11) because M5 still works in the saturation region. The drain-source 
voltage of M5, Vds5, is given by (4-14), where Ron3 is the on-resistance of transistor M3 
working in the triode region. To keep M5 in the saturation region, Vds5 needs 
to be larger than Vod5, which requires Vth5 > Itail*(R+Ron3). The expression for V2-, given 
as (4-15), follows from the fact that the total drain-source voltage of M3 and M5 is 
smaller than the sum of their overdrive voltages in the slewing phase because M3 works in 
the triode region. Inequality (4-16) for V2+ follows directly after substituting 
Vth5 > Itail*(R+Ron3). V2+ is smaller than the supply voltage, around 4*Vod+2*Vth, by at least 
3*Vod. Therefore, in this scenario, V2+ is still not maximized in the positive slewing phase. 
$$V_{ds5} = V_{od5} + V_{th5} - I_{tail}(R + R_{on3}) > V_{od5} \tag{4-14}$$
$$V_{2-} = V_{2,cm} - V_{2,dm} = V_{od5} + V_{th5} - I_{tail} R < \sqrt{2}\,V_{od3} + V_{od5} \tag{4-15}$$
$$V_{2+} = V_{od5} + V_{th5} + I_{tail} R < V_{od5} + 2V_{th5} - I_{tail} R_{on3} \tag{4-16}$$
c) In the positive slewing phase, if V2,dm is sufficiently large to push both M3 and M5 
into the triode region, the common-mode output voltage is given by (4-17). The 
on-resistances of M3 and M5, Ron3 and Ron5, are typically much smaller than the resistor R in 
a low-power design. Therefore, V2,cm is about Itail*R. In addition, V2- and V2+ are 
given by (4-18) and (4-19). V2+ increases linearly with Itail. Therefore, a 
large transient Itail should be generated for the last preamp stage to maximize its output 
voltage swing and the entire op amp's slew rate. As the preamp is symmetric, the 
expressions for V2+ and V2- in (4-18) and (4-19) are swapped in the negative slewing 
phase.  
$$V_{2,cm} = \frac{V_{2+} + V_{2-}}{2} = I_{tail}(R + R_{on3} + R_{on5}) \approx I_{tail} R \tag{4-17}$$
$$V_{2-} = I_{tail}(R_{on3} + R_{on5}) \tag{4-18}$$
$$V_{2+} = I_{tail}(2R + R_{on3} + R_{on5}) \tag{4-19}$$
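As a quick numerical illustration of case c), the following Python sketch evaluates (4-17) to (4-19) for a few tail currents. The component values (R, Ron3, Ron5 and the Itail sweep) are hypothetical and are not taken from this design, and the function name is only illustrative.

```python
def preamp_outputs_case_c(i_tail, r, r_on3, r_on5):
    """Return (V2cm, V2minus, V2plus) when M3 and M5 are both in triode.

    Implements (4-18) and (4-19) directly; V2cm is their average, which
    equals Itail*(R + Ron3 + Ron5) as in (4-17).
    """
    v2_minus = i_tail * (r_on3 + r_on5)            # (4-18)
    v2_plus = i_tail * (2.0 * r + r_on3 + r_on5)   # (4-19)
    v2_cm = 0.5 * (v2_plus + v2_minus)             # (4-17)
    return v2_cm, v2_minus, v2_plus


# Hypothetical values chosen only to show the trend (not from the dissertation).
R_LOAD = 500e3   # preamp load resistor R, ohms
R_ON3 = 10e3     # on-resistance of M3 in triode, ohms
R_ON5 = 10e3     # on-resistance of M5 in triode, ohms

for i_tail in (0.5e-6, 1.0e-6, 2.0e-6):
    v_cm, v_m, v_p = preamp_outputs_case_c(i_tail, R_LOAD, R_ON3, R_ON5)
    print(f"Itail={i_tail * 1e6:.1f} uA  V2cm={v_cm:.3f} V  V2-={v_m:.3f} V  V2+={v_p:.3f} V")
```

Doubling Itail doubles V2+ in this model, which is why a boosted transient tail current translates directly into a larger output swing for the last preamp stage.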
When V2- and V2+ become Vsupply and 0 respectively in the negative slewing phase, the op 
amp's output slewing current is given by (4-20), provided M9C works in the saturation 
region in this phase. Under the reasonable assumptions that β9=β10, β10C=β9C and M9 works in 
the triode region in the op amp's positive slewing phase, the gate voltage of M10, Vg10, is found 
to be about (2/3)*(Vsupply − Vth10) after solving the KCL equation at M10's gate node. If M10C 
works in the saturation region in the positive slewing phase and its threshold voltage is the 
same as that of M9C, then the positive slewing current, ISR+, simplifies to about −(4/9)*ISR−, as 
shown in (4-21). Therefore, the transistor sizes of the op amp's output stage should be 
chosen according to the op amp's slew rate specifications using (4-20) and (4-21). 
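$$I_{SR-} = -\frac{1}{2}\beta_{9C}(V_{supply} - V_{th9C})^2 \tag{4-20}$$
$$I_{SR+} = \frac{1}{2}\beta_{17}(V_{supply} - V_7 - V_{th17})^2 = \frac{1}{2} \cdot \frac{4}{9}\beta_{17}(V_{supply} - V_{th17})^2 = -\frac{4}{9} I_{SR-} \tag{4-21}$$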
4.5.1.2 Small-signal Analysis of the Preamp Stage 
The DC gain of the preamp shown in Figure 4.5(a) is denoted AV0 and given by (4-22). 
Assuming for simplicity that M1 and M3 have the same fT and that this preamp is loaded 
by another identical preamp, the preamp's pole, pnd, is given by (4-23), where CL is given 
by (4-24). Thus, the GBW of the preamp, GBWpreamp, is derived as (4-25). When an op amp 
drives a very large capacitive load and its dominant pole is located at the op amp's output node, 
the GBW enhancement and the DC gain enhancement provided by the 
added preamp stages are the same. In this regard, when N of the preamp stages in Figure 
4.5(a) are cascaded ahead of the op amp's output stage, the GBW of the op amp, GBWenh, is 
given by (4-26), in which GBWorig is the GBW of the output stage of the op amp without any 
preamp stages. The phase drop caused by the poles of the N preamp stages, ϕdrop, is calculated 
and simplified as (4-27) after substituting (4-26). To obtain a phase margin of more 
than 63 degrees for the op amp, ϕdrop needs to be less than about 27° (roughly 0.46 rad). This phase margin 
requirement imposes the bound on GBWpreamp/GBWorig shown in (4-28).   
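$$A_{V0} = (g_{m1} + g_{m3}) R \tag{4-22}$$
$$p_{nd} = \frac{1}{R(C_{gs1} + C_{gs3} + C_L)} = \frac{g_{m1} + g_{m3}}{A_{V0}(C_{gs1} + C_{gs3})(1 + m)} \approx \frac{f_T}{A_{V0}(1 + m)} \tag{4-23}$$
$$C_L = m(C_{gs1} + C_{gs3}) = C_{gd1} + C_{gd3} + C_{ds1} + C_{db1} + C_{ds3} + C_{db3} \tag{4-24}$$
$$GBW_{preamp} = \frac{g_{m1} + g_{m3}}{(C_{gs1} + C_{gs3})(1 + m)} \approx \frac{f_T}{1 + m} \tag{4-25}$$
$$GBW_{enh} = A_{V0}^{N} \cdot GBW_{orig} \tag{4-26}$$
$$\phi_{drop} = N \tan^{-1}\left[\frac{GBW_{enh}}{p_{nd}}\right] \approx \frac{N \cdot GBW_{enh}}{p_{nd}} \approx \frac{N A_{V0}^{N+1} \cdot GBW_{orig}}{GBW_{preamp}} \le 0.46 \tag{4-27}$$
$$\frac{GBW_{preamp}}{GBW_{orig}} \ge \frac{A_{V0}^{N+1} \cdot N}{0.46} \tag{4-28}$$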
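The bound in (4-28) relies on the small-angle approximation in (4-27). The Python sketch below compares the exact arctangent phase drop with that approximation and evaluates the minimum GBWpreamp/GBWorig ratio. The function names and the example operating point (AV0 = 10, N = 3) are illustrative assumptions; only the formulas come from (4-23) to (4-28).

```python
import math


def phase_drop_rad(a_v0, n_stages, gbw_preamp_over_orig, exact=True):
    """Phase drop of N identical preamp poles at the enhanced GBW, per (4-27).

    gbw_preamp_over_orig is the ratio GBWpreamp / GBWorig.
    """
    # GBWenh / pnd = AV0**(N+1) * GBWorig / GBWpreamp, using pnd = GBWpreamp / AV0.
    x = a_v0 ** (n_stages + 1) / gbw_preamp_over_orig
    per_stage = math.atan(x) if exact else x
    return n_stages * per_stage


def min_gbw_ratio(a_v0, n_stages, max_drop_rad=0.46):
    """Smallest GBWpreamp / GBWorig allowed by (4-28)."""
    return a_v0 ** (n_stages + 1) * n_stages / max_drop_rad


# Illustrative operating point (assumption): three stages with a gain of 10 each.
ratio = min_gbw_ratio(10.0, 3)
approx = phase_drop_rad(10.0, 3, ratio, exact=False)
exact = phase_drop_rad(10.0, 3, ratio, exact=True)
print(f"min GBWpreamp/GBWorig = {ratio:.0f}")
print(f"phase drop: approx {math.degrees(approx):.2f} deg, exact {math.degrees(exact):.2f} deg")
```

Because atan(x) is smaller than x for positive x, the approximation is slightly conservative: a design that meets (4-28) has a little less phase drop than the bound suggests.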
4.5.1.3 GBW Enhancement Optimization for N-stage of Preamp in Cascade 
For a given total current budget, Ibudget, shared by N identical preamp stages, the budget is 
divided equally among the N stages, where N can be any positive integer. 
We define GBWpreamp_single as the GBW of the preamp when Ibudget is entirely consumed 
by a single preamp. Assuming the transistors in the preamps work in the weak 
inversion region, scaling each stage's bias current down to Ibudget/N without changing the 
transistor sizes reduces the preamp's GBW by a factor of N, to GBWpreamp_single/N. Therefore, the 
ratio of GBWpreamp_single to GBWorig, GBWratio, is bounded as in (4-29) after substituting (4-28). 
GBWratio depends on the process feature size, bias current, load capacitor, etc. Different values of 
GBWratio may lead to different optimal numbers of preamp stages and different optimal GBW enhancement factors.  
$$GBW_{ratio} = \frac{GBW_{preamp\_single}}{GBW_{orig}} = \frac{N \cdot GBW_{preamp}}{GBW_{orig}} \ge A_{V0}^{N+1} \cdot \frac{N^2}{0.46} \tag{4-29}$$
Figure 4.6 shows the dependency of the GBW enhancement factor, AV0^N, on the number of 
preamp stages for GBWratio values from 5*10^4 to 8*10^5. The peak of AV0^N shifts toward the 
upper right of the plot as GBWratio increases. This means that more preamp stages are needed 
to achieve the optimal GBW enhancement factor as GBWratio increases. For example, the optimal 
numbers of preamp stages for GBWratio of 8*10^5 and 5*10^4 are four and three, respectively. 
Increasing the load capacitor or the preamp's bias current, or using a smaller transistor size, increases GBWratio. 
In this design in the 0.18 µm CMOS process, Ibudget for the preamp stages is about 5 µA and 
CL = 15 nF, so GBWratio is about 2*10^5. Figure 4.6 shows that the optimal number of 
preamp stages is 3~4 for this design and the largest GBW enhancement factor is about 1000. 
The DC gain of each preamp stage should be about 10 for a 3-stage preamp and 5.6 for a 
4-stage preamp. In this work, we design a 4-stage preamp with a gain of 5.8.  
 
Figure 4.6: Dependency of GBW enhancement factor on number of preamp stages 
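The trend in Figure 4.6 can be reproduced numerically by taking (4-29) at equality, solving for AV0 and evaluating AV0^N for each candidate N. The Python sketch below does this; the function names and the search limit of ten stages are illustrative assumptions, and the 0.46 rad phase-drop limit is the one used in (4-27).

```python
def max_enhancement(gbw_ratio, n_stages, max_drop_rad=0.46):
    """Return (per-stage gain AV0, enhancement factor AV0**N) at the (4-29) limit."""
    # (4-29) at equality: AV0**(N+1) = max_drop * GBWratio / N**2
    a_v0 = (max_drop_rad * gbw_ratio / n_stages ** 2) ** (1.0 / (n_stages + 1))
    return a_v0, a_v0 ** n_stages


def best_stage_count(gbw_ratio, n_max=10):
    """Number of stages (1..n_max) that maximizes the enhancement factor."""
    return max(range(1, n_max + 1), key=lambda n: max_enhancement(gbw_ratio, n)[1])


# GBW ratios matching the curves plotted in Figure 4.6.
for ratio in (5e4, 1e5, 2e5, 4e5, 8e5):
    n_best = best_stage_count(ratio)
    a_v0, factor = max_enhancement(ratio, n_best)
    print(f"GBWratio={ratio:.0e}: N={n_best}, AV0={a_v0:.2f}, enhancement={factor:.0f}")
```

For GBWratio = 2*10^5 the three-stage and four-stage results differ by well under 1%, which is consistent with the statement that the optimum for this design is 3~4 stages.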
4.5.2 Design of the entire op amp 
Figure 4.7 shows the schematic of the designed op amp. The adaptive bias circuit of the input 
pair is shown in Figure 4.8. The designed op amp consists of a class AB input stage, three 
V-V preamps and a class AB output stage. The adaptive biasing circuit for the input pairs is 
controlled by negative feedback loops formed by M1 and M20~M22. The adaptive biasing 
circuit regulates transistor M1's source voltage so that it tracks its gate voltage. Because of 
this, the input pairs M1A, M1B, M1C and M1D operate in class AB and effectively see twice 
the small-signal input. Due to the class-AB operation, when a large step input signal 
(Vid = Vip-Vim >> Vod1) is applied in the slewing phase, large transient drain currents of 
transistors M1A and M1C are provided by transistor M22A, whereas M1B, M1D and 
their current mirrors carry small currents. As a result, the transient current in M1C is much 
larger than that in M3B, and this excess transient current from M1C activates M0B. When the large 
step input signal is removed, transistor M0B returns to the off state. This happens 
automatically because the drain-source voltages of M3A and M3B are biased 
to be lower than the threshold voltages of M0A and M0B in quiescent operation. Similarly, 
when a large negative input signal is applied, transistor M0A is activated. Therefore, 
whenever there is a large transient input step, the total current in transistors M0A and 
M0B increases. The total current in M0A and M0B is mirrored and amplified by transistors 
M11A and M11B to form the last preamp stage's tail current. The greatly boosted tail current increases 
the output voltage swing of the last preamp stage, as given by (4-18) and (4-19), which 
substantially improves the slew rates. 
 
Figure 4.7: Schematic of the designed op amp for driving 15nF load capacitor 
 
[Figure 4.7 block labels, left to right: class AB input stage, 1st V-V preamp, 2nd V-V preamp, last V-V preamp, class AB output stage]
 
Figure 4.8: The adaptive bias circuit for the designed op amp’s input stage 
In the negative slewing phase, transistors M17 and M15 work in the cutoff and 
triode regions, respectively. The transient drain current of M15 is derived as 
I15 ≈ β15*Vin_avg*(Vgs15 − Vth15 − 0.5*Vin_avg) ≈ 22.7 mA, because Vgs15 ≈ Vsupply − 50 mV = 1.45 V, 
β15 = μCoxW15/L15 = 48.5 mA/V^2, Vth15 ≈ 0.46 V and Vin_avg = 0.78 V in this design. Consequently, 
the expected negative slew rate (SR-) of the designed op amp is around I15/CL = 1.5 V/µs with 
CL of 15 nF. In the positive slewing phase, transistors M14 and M15 work in the triode and 
cutoff regions, respectively, with Vgs14 ≈ Vsupply − 50 mV = 1.45 V. M16 still works in the 
saturation region because of its diode connection. Therefore, 
the KCL equation at the drains of M14 and M16 can be expressed as (4-30). After substituting 
λ16 = 0.25 V^-1, β16 = 12 mA/V^2, β14 = 13.9 mA/V^2, Vth16 = 0.53 V, Vth14 = 0.46 V and 
Vgs14 = Vgs15 = 1.45 V into (4-30), Vg16 is found to be 0.28 V and the drain current of M16 
and M14 is found to be 3.3 mA. Since the aspect ratio of transistor M17 to M16 is 14/4, the 
expected drain current of M17 is about 3.3 mA*14/4 = 11.6 mA and the expected positive slew 
rate (SR+) is about 0.77 V/μs.  
$$\frac{1}{2}\beta_{16}(V_{supply} - V_{g16} - V_{th16})^2 \left[1 + \lambda_{16}(V_{supply} - V_{g16} - V_{th16})\right] = \beta_{14} V_{g16} (V_{gs14} - V_{th14} - 0.5 V_{g16}) \tag{4-30}$$
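The slew-rate estimates above can be checked numerically. The Python sketch below solves the current balance of (4-30) for Vg16 by bisection and then evaluates SR+ and SR-. It is a sketch under stated assumptions: the triode current of M14 is written as β14*Vds*(Vgs − Vth − 0.5*Vds) with Vds = Vg16, the same form used for I15 in the text, because that form reproduces the quoted 0.28 V and 3.3 mA. The β values are taken to be in mA/V^2 and λ16 in 1/V, Vsupply = 1.5 V is the supply voltage quoted in Section 4.6.1, and the function names are illustrative.

```python
def m16_current(vg16, vsupply=1.5, beta16=12e-3, vth16=0.53, lam16=0.25):
    """Saturation current of diode-connected M16, left-hand side of (4-30)."""
    vov = vsupply - vg16 - vth16
    return 0.5 * beta16 * vov ** 2 * (1.0 + lam16 * vov)


def m14_current(vg16, vgs14=1.45, beta14=13.9e-3, vth14=0.46):
    """Triode current of M14 with Vds14 = Vg16 (assumed form, see lead-in)."""
    return beta14 * vg16 * (vgs14 - vth14 - 0.5 * vg16)


def solve_vg16(lo=0.0, hi=0.9, tol=1e-9):
    """Bisection on I16(Vg16) - I14(Vg16); the difference is >0 at lo and <0 at hi."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if m16_current(mid) - m14_current(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


C_LOAD = 15e-9                       # load capacitor, F
vg16 = solve_vg16()
i16 = m16_current(vg16)
i17 = i16 * 14.0 / 4.0               # M17:M16 aspect ratio of 14/4
sr_plus = i17 / C_LOAD               # V/s

# Negative slewing: triode current of M15 with the values quoted in the text.
i15 = 48.5e-3 * 0.78 * (1.45 - 0.46 - 0.5 * 0.78)
sr_minus = i15 / C_LOAD              # V/s

print(f"Vg16 = {vg16:.3f} V, I16 = {i16 * 1e3:.2f} mA, I17 = {i17 * 1e3:.2f} mA")
print(f"SR+ = {sr_plus / 1e6:.2f} V/us, SR- = {sr_minus / 1e6:.2f} V/us")
```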
In terms of the op amp's DC operating points, transistors M9A-M9D and M10A-M10B are 
designed to have the same operating conditions as transistors M5A-M5B and M6A-M6B, 
respectively, because they have the same gate voltages and the same current densities. Therefore, 
M9A-M9D and M5A-M5B work in the saturation region, whereas M10A-M10B and M6A-M6B 
work in the triode region. As for the last preamp stage, because the gate voltage of transistor 
M13 is used to define the output stage's current, transistor M13 is designed to work in the 
saturation region. This can be achieved by lowering the gate-source voltage of transistors M9E 
and M9F. Therefore, each circuit branch in the designed op amp has a well-defined quiescent 
current, and common-mode voltage errors from prior preamp stages do not propagate to the 
output voltages of subsequent preamp stages. 
In addition, a feedforward connection, from the 2nd preamp's outputs to the NMOS input 
pair of the last preamp stage, is used to reduce the total gain of the preamp stages. This 
feedforward connection increases the op amp's input linear range. An input linear range that 
is too narrow could cause conditional instability in large-signal operation of multi-stage op 
amps [10].  
To understand the frequency response of the entire op amp, the frequency response of the 
adaptive bias circuit in Figure 4.8 is analyzed first. With a differential input voltage applied as 
Vim = -0.5*Vin and Vip = 0.5*Vin, the KCL equations at nodes Vim2, Vx- and Vy- are given by 
(4-31) to (4-33), where gx=gds1+gds21+gds20, gy=gds21 and gz=2gds1+gds22. Solving (4-31) 
to (4-33) gives the transfer function (Vip-Vim2)/Vip in (4-34), where a, b, c and d are 
expressed in (4-35) to (4-38). To obtain more insight from the equations, the parasitic 
capacitances (i.e., Cgs and Cgd) and the fT of transistors M1 and M20-M22 are assumed to be 
close to each other. The expressions for a, b, c and d are then approximated as 2/fT, 
4/fT, 10/fT^2 and 5/fT^3, where the fT of the transistors is on the order of 100 MHz. With the 
approximated a, b, c and d, the frequencies of the three LHP poles and 
three zeros in (4-34) are much higher than the GBW of the designed op amp, which is about 
0.85 MHz. Therefore, for simplicity, the transfer functions (Vip-Vim2)/Vip and (Vim-Vip2)/Vim 
are approximated as 2 in the following frequency analysis. 
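$$V_{im2}\left[2(g_{m1} + sC_{gs1}) + sC_{gd22} + g_z\right] + V_y(g_{m22} - sC_{gd22}) = 0 \tag{4-31}$$
$$-g_{m1}\left(\frac{V_{in}}{2} + V_{im2}\right) + \frac{V_{in}}{2} sC_{gd1} + V_x(sC_{gs21} + sC_{gd1} + g_x + g_{m21}) - g_{ds21}V_y = 0 \tag{4-32}$$
$$-g_{m21}V_x + V_y(g_y + sC_{gs22} + sC_{gd22}) - V_{im2} \cdot sC_{gd22} = 0 \tag{4-33}$$
$$\frac{V_{ip} - V_{im2}}{V_{ip}} = \frac{V_{im} - V_{ip2}}{V_{im}} = \frac{2 + as + cs^2 + ds^3}{1 + bs + cs^2 + ds^3} \tag{4-34}$$
$$a = \frac{2C_{gs22}g_{m1} + C_{gd22}g_{m22} - C_{gd1}g_{m22}}{g_{m1}g_{m22}} \approx \frac{2}{f_T} \tag{4-35}$$
$$b = \frac{2C_{gs22}g_{m1} + C_{gd22}g_{m22} + C_{gd22}g_{m1}}{g_{m1}g_{m22}} \approx \frac{4}{f_T} \tag{4-36}$$
$$c = \frac{C_{gd22}\left[(2C_{gs1} + C_{gs22})g_{m21} + C_{gs21}(2g_{m1} + g_{m22})\right]}{g_{m1}g_{m21}g_{m22}} + \frac{2C_{gs22}(C_{gs21}g_{m1} + C_{gs1}g_{m21})}{g_{m1}g_{m21}g_{m22}} \approx \frac{10}{f_T^2} \tag{4-37}$$
$$d = \frac{C_{gs21}(2C_{gd22}C_{gs1} + C_{gd22}C_{gs22} + 2C_{gs1}C_{gs22})}{g_{m1}g_{m21}g_{m22}} \approx \frac{5}{f_T^3} \tag{4-38}$$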
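To see how close (4-34) stays to 2 near the op amp's GBW, the Python sketch below evaluates it with the approximated coefficients. One assumption is made loudly here: the terms written as 1/fT are normalized so that s/fT becomes j*f/fT, with fT = 100 MHz. Under the alternative reading (s in rad/s and fT in Hz) the deviation is larger but the magnitude is still close to 2. The function name is illustrative.

```python
def adaptive_bias_tf(freq_hz, f_t_hz=100e6):
    """Evaluate (4-34) with a=2/fT, b=4/fT, c=10/fT^2, d=5/fT^3.

    Assumption: s/fT is treated as the normalized variable j*f/fT.
    """
    x = 1j * freq_hz / f_t_hz
    numerator = 2 + 2 * x + 10 * x ** 2 + 5 * x ** 3
    denominator = 1 + 4 * x + 10 * x ** 2 + 5 * x ** 3
    return numerator / denominator


for f in (1e3, 0.85e6, 10e6):
    h = adaptive_bias_tf(f)
    print(f"f = {f:.3g} Hz: |H| = {abs(h):.4f}")
```

At 0.85 MHz the magnitude differs from 2 by well under 1%, which supports replacing the adaptive bias circuit by a constant gain of 2 in the subsequent analysis.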
With the transfer functions (Vip-Vim2)/Vip and (Vim-Vip2)/Vim approximated as 2, the small-signal 
block diagram of the designed op amp's input stage can be simplified as shown in Figure 4.9. The gain 
of 2 is represented by changing the input signal from 0.5*Vin to Vin. The KCL equations at 
nodes V1, V2 and V3 are written as (4-39) to (4-41). Solving these 
equations gives the transfer function from Vin to V3, TF1(s), in (4-42), in which the time 
constants are expressed in (4-43). The Ci and gi in (4-43) are respectively the capacitance and 
conductance at node i, and their expressions are shown in Table 4.1. As expected, there are 
three LHP poles and one LHP zero in TF1(s), and the magnitude of the DC gain of TF1(s) is 
A1 = k*gm1*R3 = 3.5*gm1*R3 = 6.7.  
 
Figure 4.9: The small-signal block diagram of the op amp’s input stage 
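$$-g_{m1} V_{in} + V_1(sC_1 + g_{m2}) = 0 \tag{4-39}$$
$$g_{m1} V_{in} + V_1 g_{m3} + V_2(sC_2 + g_{m4}) = 0 \tag{4-40}$$
$$V_2 g_{m4} = V_3(sC_3 + 1/R_3) \tag{4-41}$$
$$TF_1(s) = \frac{V_3}{V_{in}} \approx \frac{-g_{m1}R_3 \cdot k(1 + s\tau_{z1})}{(1 + s\tau_1)(1 + s\tau_2)(1 + s\tau_3)}; \quad k = \frac{g_{m2} + g_{m3}}{g_{m2}} = 3.5 \tag{4-42}$$
$$\tau_1 = \frac{C_1}{g_{m2}} = 11\,\text{ns}, \quad \tau_2 = \frac{C_2}{g_{m4}} = 9.6\,\text{ns}, \quad \tau_3 = \frac{C_3}{g_3} = 6.7\,\text{ns}, \quad \tau_{z1} = \frac{C_1}{k g_{m2}} = 3.2\,\text{ns} \tag{4-43}$$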
Table 4.1: Expressions of parasitic capacitance for the op amp’s input stage 
Expression 
C1≈Cgs2+Cgs3+ Cdb1+Cdb2+Cgd3+Cgd1 
C2≈Cdb3+Cdb1+Cdb4+Cgs4+Cgs0+Cgd3+Cgd1 
C3≈Cdb4+Cdb5+Cgd4+Cgd5+(Cgd8+ Cgd9)*(gm8+gm9)R2 
C4≈Cgd8+Cgd9+(Cgd8+ Cgd9)*(gm8+gm9)R3+ Cgd9*gm9R4 
C5≈Cgd8+Cgd9+ Cgd9*gm8R4 
C6p ≈Cgd8+Cgd9+Cgs14+Cgd14 
C6m ≈Cgd8+Cgd9+Cgs15+Cgd15 
C6t=C6p+C6m≈ C6p(1+gm15/gm14)=4.5*C6p 
C7≈Cgs16+Cgs17+Cgd17+Cgd14 
g3=1/R3, g4≈1/R4, g5≈1/R5, g6≈1/R6, gL=gds17+gds15 
 
Figure 4.10: Small-signal block diagram of the designed op amp from its 1st preamp stage 
to its output stage  
To analyze the complete transfer function from the input to the output of the op amp, the 
small-signal block diagram from the op amp's 1st preamp stage to its output stage is 
drawn in Figure 4.10. In the block diagram, gm8=gm9 is used for simplicity. The KCL equations 
at nodes V4, V5, V6p, V6m, V7 and Vout are written as (4-44) to (4-49). Solving these 
equations gives the transfer function from V3 to Vout, TF2(s), in (4-50), where A2 and A3 are 
respectively the three preamp stages' total DC gain and the output stage's DC gain. Therefore, 
the DC gain from Vin to V6 is A1*A2 = 3.5*gm1 * 2*gm8^2*(1/R5 + 2*gm8) * R3*R4*R5*R6 = 1072, 
in which A1=6.7 and A2=160. The transfer function from the op amp input 
to V3 is also recalled as (4-51) from (4-42). The DC gains and time constants in 
(4-50) and (4-51) are given in (4-52) to (4-56). In addition, the expressions of gi and Ci 
are listed in Table 4.1. Therefore, the op amp's transfer function from its input to its output 
is TF1(s)*TF2(s), which has four LHP zeros and nine LHP poles. 
The distribution of the designed op amp's poles and zeros within 5 times the GBW of the 
op amp is shown in Figure 4.11, in which P-3dB=gL/CL, Pi=1/τi, Zj=1/τzj, i=1~5, 6m, 6p, 7 and 
j=1~4. With these pole and zero locations, the phase margin of the designed op amp is 
62.5° with a GBW of 0.85 MHz.  
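$$V_3 \cdot 2g_{m8} + V_4(g_4 + sC_4) = 0 \tag{4-44}$$
$$V_4 \cdot 2g_{m8} + V_5(g_5 + sC_5) = 0 \tag{4-45}$$
$$-V_4 \cdot 0.5g_{m8} + V_5 \cdot 0.5g_{m8} + V_{6p}(g_6 + sC_{6p}) = 0 \tag{4-46}$$
$$V_4 \cdot 0.5g_{m8} - V_5 \cdot 0.5g_{m8} + V_{6m}(g_6 + sC_{6m}) = 0 \tag{4-47}$$
$$V_{6p} \cdot g_{m14} + V_7(g_{m16} + sC_7) = 0 \tag{4-48}$$
$$V_7 \cdot g_{m17} + V_{6m} \cdot g_{m15} + V_{out}(g_L + sC_L) = 0 \tag{4-49}$$
$$TF_2(s) = \frac{V_o}{V_3} \approx \frac{-A_2A_3(1 + s\tau_{z2})(1 + s\tau_{z3})(1 + s\tau_{z4})}{(1 + \tau_4 s)(1 + \tau_5 s)(1 + \tau_{6m}s)(1 + \tau_{6p}s)(1 + \tau_7 s)\left(1 + \frac{C_L}{g_L}s\right)} \tag{4-50}$$
$$TF_1(s) = \frac{V_3}{V_{in}} \approx \frac{-A_1(1 + s\tau_{z1})}{(1 + s\tau_1)(1 + s\tau_2)(1 + s\tau_3)} \tag{4-51}$$
$$A_1 = 3.5g_{m1}R_3, \quad A_2 = \frac{2g_{m8}^2(g_5 + 2g_{m8})}{g_4g_5g_6} = 160, \quad A_3 = \frac{g_{m15}}{g_L} \tag{4-52}$$
$$\tau_1 = \frac{C_1}{g_{m2}} = 11\,\text{ns}, \quad \tau_2 = \frac{C_2}{g_{m4}} = 9.6\,\text{ns}, \quad \tau_3 = \frac{C_3}{g_3} = 6.7\,\text{ns}, \quad \tau_4 = \frac{C_4}{g_4} = 7.8\,\text{ns} \tag{4-53}$$
$$\tau_5 = \frac{C_5}{g_5} = 5.6\,\text{ns}, \quad \tau_{6p} = \frac{C_{6p}}{g_6} = 20\,\text{ns}, \quad \tau_{6m} = \frac{C_{6m}}{g_6} = 52.4\,\text{ns} \tag{4-54}$$
$$\tau_7 = \frac{C_7}{g_{m16}} = 22.7\,\text{ns}, \quad \tau_{z1} = \frac{C_1}{kg_{m2}} = 3.2\,\text{ns}, \quad \tau_{z2} = \frac{C_5}{g_5 + 2g_{m8}} = 0.83\,\text{ns} \tag{4-55}$$
$$\tau_{z3} = \frac{1}{2}\left(\frac{C_{6t}}{g_6} + \frac{C_7}{g_{m16}}\right) = 36.2\,\text{ns}, \quad \tau_{z4} = \frac{C_{6p}C_7}{C_{6t}g_{m16} + C_7g_6} = 4.8\,\text{ns} \tag{4-56}$$
$$GBW = A_1 \cdot A_2 \cdot \frac{g_{m15}}{C_L} = 3.5g_{m1} \cdot 2g_{m8}^2\left(\frac{1}{R_5} + 2g_{m8}\right) R_3R_4R_5R_6 \cdot \frac{g_{m15}}{C_L} \tag{4-57}$$
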
 
Figure 4.11: Distribution of the op amp’s poles and zeros within 5 times of GBW  
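The quoted phase margin can be cross-checked from the time constants listed in (4-53) to (4-56). The Python sketch below sums the phase lag of the poles and the phase lead of the LHP zeros at the unity-gain frequency. It assumes that the unity-gain frequency equals the stated GBW of 0.85 MHz and that the dominant pole sits at about 10 Hz, the value reported with the typical-corner simulation; the dictionary and function names are illustrative.

```python
import math

# Time constants in ns, copied from (4-53) to (4-56).
POLE_TAUS_NS = {"tau1": 11.0, "tau2": 9.6, "tau3": 6.7, "tau4": 7.8,
                "tau5": 5.6, "tau6p": 20.0, "tau6m": 52.4, "tau7": 22.7}
ZERO_TAUS_NS = {"tauz1": 3.2, "tauz2": 0.83, "tauz3": 36.2, "tauz4": 4.8}


def phase_margin_deg(gbw_hz=0.85e6, dominant_pole_hz=10.0):
    """Phase margin = 180 deg - (pole lag - LHP zero lead) at the GBW."""
    omega = 2.0 * math.pi * gbw_hz
    lag = math.atan(gbw_hz / dominant_pole_hz)                    # dominant pole
    lag += sum(math.atan(omega * t * 1e-9) for t in POLE_TAUS_NS.values())
    lead = sum(math.atan(omega * t * 1e-9) for t in ZERO_TAUS_NS.values())
    return 180.0 - math.degrees(lag - lead)


print(f"Estimated phase margin: {phase_margin_deg():.1f} degrees")
```

The estimate lands within a fraction of a degree of the 62.5° stated in the text, with the pole at 1/τ6m contributing the largest nondominant lag and the zero at 1/τz3 recovering most of it.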
Compared with the GBW of the op amp without preamp stages, gm15/CL, the op amp's GBW 
with the preamp stages is enhanced by A1*A2 = 1072 times, as shown by (4-57). From (4-57), it 
can also be seen that the GBW of the op amp is approximately proportional to gm^5 and R^4. As 
the resistor's process variation in this 180 nm CMOS process is about -50%~+45% of its typical 
value, R^4, and hence the GBW, can vary between 6.25% and 442% of the typical value. To reduce 
this variation, a constant-gm bias circuit like that in [11] is used as this op amp's bias circuit. 
The constant-gm bias circuit makes the NMOS transistors' gm proportional to 1/R. As a result, 
A1*A2 is approximately constant, and the op amp's GBW becomes proportional only to 
the transistor's gm. Therefore, the expected GBW ranges from 70% to 200% of the 
typical GBW. This GBW variation can be further reduced by trimming the resistors in the 
preamp stages, trimming the transistor sizes in the op amp's output stage, or implementing 
the resistors with transistors. In addition, the bias current generated by the constant-gm bias 
circuit is also roughly proportional to 1/R, so the expected quiescent current of the op amp also 
ranges from 70% to 200% of the typical value under process corner variations. Because the 
GBW and the supply current, Isupply, of the op amp have similar dependencies on the resistor's 
variation, the ratio of GBW to Isupply should have less variation. As a result, the variation of 
FOMs = GBW*CL/Isupply is expected to be modest, and its value needs to be confirmed by simulation. On 
the other hand, some foundries closely monitor the doping concentrations of the poly and the 
whole wafer as part of their standard procedure. As a result, the poly resistor values are almost always 
close to their typical-corner values. In this scenario, no constant-gm bias circuit is 
needed.  
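The variation ranges quoted above follow from simple power-law scaling, as the short Python sketch below shows. It assumes that GBW scales as R^4 when gm is held fixed and as 1/R when a constant-gm bias forces gm to track 1/R; the function names are illustrative.

```python
def gbw_range_fixed_gm(r_low, r_high, exponent=4):
    """Normalized GBW range when GBW is proportional to R**4 and gm is fixed."""
    return r_low ** exponent, r_high ** exponent


def gbw_range_constant_gm_bias(r_low, r_high):
    """Normalized GBW range when gm tracks 1/R, so GBW is proportional to 1/R."""
    return 1.0 / r_high, 1.0 / r_low


# Resistor corners of -50% and +45% relative to the typical value.
R_LOW, R_HIGH = 0.5, 1.45

lo, hi = gbw_range_fixed_gm(R_LOW, R_HIGH)
print(f"Fixed gm:         GBW from {lo * 100:.2f}% to {hi * 100:.0f}% of typical")

lo, hi = gbw_range_constant_gm_bias(R_LOW, R_HIGH)
print(f"Constant-gm bias: GBW from {lo * 100:.0f}% to {hi * 100:.0f}% of typical")
```

The same 1/R scaling applies to the bias current, which is why the ratio GBW/Isupply, and therefore FOMs, is expected to vary less than either quantity alone.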
4.6. Simulation Results 
In this section, the designed op amp in a 180 nm CMOS process is simulated under three 
different conditions: 1) process corner variations only, 2) mismatch variations 
only, and 3) process corner plus mismatch variations. The purposes of the simulations 
are fourfold: a) to confirm that the quiescent current of the op amp is well controlled; 
b) to verify the theoretical analysis of the op amp's frequency and transient responses, including 
phase margin and slew rate, in the typical corner; c) to confirm the variation ranges of GBW and 
supply current under process corner variations; and d) to confirm that the designed op amp 
provides favorable small- and large-signal figures of merit, FOMs and FOML, compared with 
the state-of-the-art op amp designs [3][9] for driving large capacitive loads. FOMs and FOML 
are defined in (4-58), where GBW, CL, SR and Isupply are respectively the gain-bandwidth 
product, load capacitor, slew rate and supply current of the op amp.  
$$FOM_S = \frac{GBW \cdot C_L}{I_{supply}}, \qquad FOM_L = \frac{SR \cdot C_L}{I_{supply}} \tag{4-58}$$
4.6.1 Typical corner simulation results 
 
Figure 4.12: Frequency response of the designed op amp at typical corner 
Figure 4.12 shows the frequency response of the designed op amp with a 15 nF load capacitor. 
The simulated DC gain, GBW and phase margin (PM) are respectively 92.6 dB, 0.85 MHz and 
62.5°. The simulated phase margin agrees with the theoretical calculation in Section 4.5.2. Also, 
as expected, the op amp has a dominant pole at low frequency, about 10 Hz, and all other 
nondominant poles are located at frequencies a few times higher than the op amp's GBW.  
 
[Figure 4.12 panels: loop phase (degrees) and loop gain (dB) versus frequency (Hz)]
 
Figure 4.13: Transient response of the designed op amp at typical corner 
The designed op amp's large- and small-signal transient responses are simulated in the 
noninverting unity-gain buffer configuration with input step voltages of 400 mV and 60 mV. 
The simulated transient responses are shown in Figure 4.13. As can be seen, the positive slew 
rate (SR+) is 0.77 V/µs and the negative slew rate (SR-) is -1.4 V/µs. The simulated SR+ and 
SR- are consistent with the calculated slew rates. With a positive large step input, the op 
amp's settling times to 1% (Ts+_1%) and 0.1% (Ts+_0.1%) accuracy are 
0.85 µs and 1.06 µs, respectively. With a negative large step input, the settling times to 1% 
(Ts-_1%) and 0.1% (Ts-_0.1%) accuracy are 0.613 µs and 0.801 µs, respectively. The 
quiescent supply current and supply voltage of the op amp are 6.56 µA and 1.5 V, respectively. 
Therefore, FOMs and FOML are derived as 1940 MHz*pF/µA and 2500 (V/µs)*pF/µA. Also, the 
op amp's input-referred voltage noise density is 1.5 µV/sqrt(Hz) at 100 Hz and 
93.6 nV/sqrt(Hz) at 100 kHz. In addition, the op amp's power supply rejection ratios 
(PSRR) are -93.2 dB at 1 kHz and -91.2 dB at 100 kHz. The 
performance summary of the designed op amp in the typical corner is shown in Table 4.2. 
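The figures of merit follow directly from (4-58) and the typical-corner values. The Python sketch below evaluates them with GBW in MHz, CL in pF, SR in V/µs and Isupply in µA. One assumption is made here: FOML is computed from the average of SR+ and the magnitude of SR-, since that choice reproduces the quoted value of about 2500; the function names are illustrative.

```python
def fom_small_signal(gbw_mhz, cl_pf, isupply_ua):
    """FOMs = GBW * CL / Isupply, in MHz*pF/uA."""
    return gbw_mhz * cl_pf / isupply_ua


def fom_large_signal(sr_v_per_us, cl_pf, isupply_ua):
    """FOML = SR * CL / Isupply, in (V/us)*pF/uA."""
    return sr_v_per_us * cl_pf / isupply_ua


# Typical-corner values from Table 4.2.
GBW_MHZ = 0.846
CL_PF = 15000.0          # 15 nF
ISUPPLY_UA = 6.56
SR_PLUS, SR_MINUS = 0.778, -1.41

# Assumption: SR in FOML is the average of SR+ and |SR-|.
sr_avg = 0.5 * (SR_PLUS + abs(SR_MINUS))

foms = fom_small_signal(GBW_MHZ, CL_PF, ISUPPLY_UA)
foml = fom_large_signal(sr_avg, CL_PF, ISUPPLY_UA)
print(f"FOMs = {foms:.0f} MHz*pF/uA, FOML = {foml:.0f} (V/us)*pF/uA")
```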
[Figure 4.13 plot, titled "settling performance": output voltage and input voltage (V) versus time]
Table 4.2: Performance summary of the designed op amp in the typical corner 
Output           Unit            Typ 
Phase Margin     degree          62.5 
GBW              MHz             0.846 
DC gain          dB              92.64 
Isupply          μA              6.56 
Vos, 1σ          mV              -0.00958 
SR-              V/μs            -1.41 
SR+              V/μs            0.778 
Ts-_1%           μs              0.613 
Ts+_1%           μs              0.852 
Ts-_0.1%         μs              0.801 
Ts+_0.1%         μs              1.06 
FOMs             MHz*pF/μA       1940 
FOML             (V/μs)*pF/μA    2500 
noise_at_100Hz   nV/sqrt(Hz)     1510 
noise_at_100K    nV/sqrt(Hz)     93.6 
PSRR at 1KHz     dB              -93.2 
PSRR at 100KHz   dB              -91.23 
4.6.2 Process corner variation simulation results 
In this section, the designed op amp is simulated under process corner variations. The 
purposes of the simulations are threefold: a) to verify the functionality of the designed op amp 
under process corner variations; b) to check the variations of the op amp’s GBW, PM, DC gain, 
slew rate and settling time under process corner variations; and c) to confirm the robustness of 
the op amp’s FOMs and FOML under process corner variations. The process corner setup is 
shown in Table 4.3.  
 
Figure 4.14: Frequency responses of the designed op amp at all process corners 
Table 4.3: Process corner setups for the simulations of the designed op amp  
Parameter   typ    All0   All1   All2   All3   All4   All5   All6   All7   low    high 
Capacitor   typ    typ    typ    typ    typ    typ    typ    typ    typ    low    high 
MOSFET      tntp   hnlp   lnhp   snsp   wnwp   hnlp   lnhp   snsp   wnwp   wnwp   snsp 
Resistor    typ    high   high   high   high   low    low    low    low    low    high 
Figure 4.14 shows the designed op amp's frequency responses under process corner 
variations. The (min, typ, max) values of the simulated DC gain, phase margin 
(PM), GBW and supply current (Isupply) are respectively (89.7 dB, 92.64 dB, 93.3 dB), 
(59.7°, 62.5°, 69.9°), (0.43 MHz, 0.846 MHz, 1.49 MHz) and (4.54 µA, 6.56 µA, 12.4 μA). The ranges 
of the GBW and supply current are respectively within 51%~176% and 69%~189% of their 
typical values. These match reasonably well with the calculated ranges of 70%~200% and 
70%~200%. The (min, typ, max) values of the simulated FOMs are (1310 MHz*pF/μA, 1940 
MHz*pF/μA, 1940 MHz*pF/μA). As expected, the variation range of FOMs, 68%~100%, is 
smaller than that of GBW or Isupply because the GBW and Isupply of the op amp have similar 
dependencies on the resistors' variation. 
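The percentage ranges quoted above are the corner extremes normalized to the typical value. The Python sketch below recomputes them from the (min, typ, max) values in Table 4.4; the dictionary layout and function name are illustrative.

```python
# (min, typ, max) values from Table 4.4.
CORNER_RESULTS = {
    "GBW (MHz)": (0.434, 0.846, 1.49),
    "Isupply (uA)": (4.54, 6.56, 12.4),
    "FOMs": (1310.0, 1940.0, 1940.0),
}


def normalized_range(minimum, typical, maximum):
    """Return (min, max) as percentages of the typical value."""
    return 100.0 * minimum / typical, 100.0 * maximum / typical


for name, (lo, typ, hi) in CORNER_RESULTS.items():
    lo_pct, hi_pct = normalized_range(lo, typ, hi)
    print(f"{name}: {lo_pct:.0f}% to {hi_pct:.0f}% of typical")
```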
 
Figure 4.15: Transient step responses of the designed op amp at all process corners 
The designed op amp's large- and small-signal transient responses are simulated in the 
noninverting unity-gain buffer configuration with input step voltages of 400 mV and 60 mV 
under process corner variations. The simulated transient performance of the op amp is shown 
in Figure 4.15. The (min, typ, max) values of the simulated Ts+_1%, Ts-_1%, Ts+_0.1% and Ts-_0.1% 
are respectively (0.52 μs, 0.85 μs, 1.17 μs), (0.44 μs, 0.61 μs, 0.71 μs), (0.65 μs, 1.06 μs, 1.46 μs) 
and (0.56 μs, 0.80 μs, 1.0 μs). The (min, typ, max) values of the simulated SR+ and SR- are 
(0.65 V/μs, 0.78 V/μs, 0.95 V/μs) and (-1.23 V/μs, -1.41 V/μs, -1.64 V/μs). As a result, the 
(min, typ, max) values of FOML are (1160 (V/μs)*pF/μA, 2500 (V/μs)*pF/μA, 3860 (V/μs)*pF/μA). 
The FOML of the designed op amp exceeds those reported in [3][9]. The performance summary 
of the designed op amp under process corner variations is shown in Table 4.4.  
[Figure 4.15 plot, titled "Transient Response": voltage (V) versus time]
Table 4.4: Performance summary of the designed op amp under process corner variation 
Output           Unit            Min      Max      Typ 
Phase Margin     degree          59.73    69.92    62.5 
GBW              MHz             0.434    1.49     0.846 
DC gain          dB              89.71    93.32    92.64 
Isupply          μA              4.54     12.4     6.56 
SR-              V/μs            -1.23    -1.64    -1.41 
SR+              V/μs            0.645    0.945    0.778 
Ts-_1%           μs              0.444    0.713    0.613 
Ts+_1%           μs              0.525    1.17     0.852 
Ts-_0.1%         μs              0.563    1        0.801 
Ts+_0.1%         μs              0.654    1.46     1.06 
FOMs             MHz*pF/μA       1310     1940     1940 
FOML             (V/μs)*pF/μA    1160     3860     2500 
Noise_at_100Hz   nV/sqrt(Hz)     1070     3380     1510 
Noise_at_100K    nV/sqrt(Hz)     72.5     128      93.6 
PSRR at 1KHz     dB              -120.4   -90.42   -93.2 
PSRR at 100KHz   dB              -110.5   -86.05   -91.23 
4.6.3 Mismatch variation simulation results 
In this section, the designed op amp is simulated using a 1000-run Monte Carlo simulation 
with mismatch variations only. The purposes of the simulations are twofold: a) to verify that 
the op amp's quiescent current is well controlled; and b) to verify the tight spread of the op 
amp's performance, including GBW, PM, DC gain, slew rate and settling time. Figure 4.16 
shows the frequency responses of the designed op amp under mismatch variations. The 
1000-run Monte Carlo simulation shows that the (mean, sigma) pairs of the simulated PM, DC 
gain, GBW and supply current of the designed op amp are respectively (62.5°, 2.5°), (92.5 dB, 
0.7 dB), (0.83 MHz, 0.08 MHz) and (6.56 μA, 0.19 μA). The tight spread of the op amp's supply 
current and GBW under random mismatch variations verifies that the op amp's quiescent 
current is well defined. Therefore, unlike [3], the designed op amp's small-signal performance 
is robust under random mismatches. 
 
Figure 4.16: Frequency responses of the designed op amp under mismatch variation 
The simulated transient responses of the op amp are shown in Figure 4.17. As can be seen, 
the op amp always settles to its final steady-state voltage. The final 
steady-state voltage varies slightly due to the op amp's random offset voltage. The offset 
voltage of the op amp has a normal distribution with a mean of 0.03 mV and a sigma of 
2.8 mV. The (mean, sigma) pairs of the simulated SR- and SR+ are respectively 
(-1.4 V/μs, 0.01 V/μs) and (0.78 V/μs, 0.003 V/μs). Similarly, the (mean, sigma) pairs of Ts+_1%, 
Ts-_1%, Ts+_0.1% and Ts-_0.1% are (0.84 μs, 0.04 μs), (0.58 μs, 0.08 μs), (1.06 μs, 
0.04 μs) and (0.77 μs, 0.1 μs). The very tight spread of the slew rate and settling time of the 
designed op amp confirms the robustness of the op amp's large-signal performance under 
random mismatch variations.  
 
Figure 4.17: Transient responses of the designed op amp under mismatch variation 
 
Figure 4.18: FOMs of the designed op amp under mismatch variation  
The histograms of the FOMs and FOML of the op amp are shown in Figure 4.18 and Figure 
4.19, respectively. FOMs has a normal distribution with a mean and sigma of 1904 pF*MHz/μA 
and 140 pF*MHz/μA respectively, whereas the mean and sigma of the FOML are respectively 
2501 pF*V/μs/μA and 70 pF*V/μs/μA. The narrow variations of the op amp’s FOMs and FOML 
again confirm the robustness of the designed op amp under mismatch variations. The 
performance summary of the designed op amp under mismatch variations is shown in Table 4.5.  
[Plot for Figure 4.17: "Transient Response", voltage (V) versus time (s)]
[Plot for Figure 4.18: histogram of hits versus FOMs (MHz*pF/A)]

Figure 4.19: FOML of the designed op amp under mismatch variation 
Table 4.5: Performance summary of the designed op amp under mismatch variation 
Output | Unit | Min | Max | Mean | Median | Std Dev
Phase Margin | degree | 54.1 | 69.7 | 62.6 | 62.7 | 2.5
GBW | MHz | 0.6 | 1.1 | 0.8 | 0.8 | 0.1
DC gain | dB | 90.2 | 95.2 | 92.5 | 92.5 | 0.7
Isupply | μA | 6.0 | 7.2 | 6.6 | 6.5 | 0.2
Vos | mV | -9.7 | 10.6 | -0.1 | 0.0 | 2.8
SR- | V/μs | -1.4 | -1.4 | -1.4 | -1.4 | 0.0
SR+ | V/μs | 0.8 | 0.8 | 0.8 | 0.8 | 0.0
Ts-_1% | μs | 0.4 | 0.7 | 0.6 | 0.6 | 0.1
Ts+_1% | μs | 0.6 | 0.9 | 0.8 | 0.8 | 0.0
Ts-_0.1% | μs | 0.4 | 1.1 | 0.8 | 0.7 | 0.1
Ts+_0.1% | μs | 0.8 | 1.2 | 1.1 | 1.1 | 0.0
FOMs | pF*MHz/μA | 1497.0 | 2389.0 | 1904.0 | 1897.0 | 140.9
FOML | pF*V/μs/μA | 2297.0 | 2721.0 | 2501.0 | 2501.0 | 70.4
Noise at 100Hz | nV/sqrt(Hz) | 1448.0 | 1578.0 | 1509.0 | 1509.0 | 17.6
Noise at 100kHz | nV/sqrt(Hz) | 89.5 | 97.7 | 93.7 | 93.7 | 1.4
PSRR at 1kHz | dB | -124.8 | -79.2 | -91.4 | -90.4 | 6.6
PSRR at 10kHz | dB | -116.8 | -69.1 | -82.8 | -82.0 | 6.3
PSRR at 100kHz | dB | -104.2 | -48.4 | -63.7 | -62.1 | 8.0
[Plot for Figure 4.19: histogram of hits versus FOML (V*pF/A-s)]

4.6.4 Process corner plus mismatch variation simulation results 
In this section, the designed op amp is simulated under both process corner and mismatch 
(P.Mis) variations using a 1000-run Monte Carlo simulation. The purposes of these simulations 
are threefold: a) to verify the functionality of the designed op amp under P.Mis variations; b) 
to check the variations of the op amp’s GBW, PM, DC gain, slew rate and settling time under 
P.Mis variations; and c) to confirm the robustness of the op amp’s FOMs and FOML under 
P.Mis variations. Figure 4.20 shows the simulated frequency responses of the designed op amp 
with a 15nF load capacitor under P.Mis variations. The simulated PM, DC gain, GBW, Isupply 
and FOMs all have a normal distribution, each with its own mean and sigma. Their (mean, 
sigma) pairs are respectively (62.3°, 3.0°), (92.5dB, 1.1dB), (0.9MHz, 0.27MHz), (7.0μA, 
1.87μA) and (1910 pF*MHz/μA, 167 pF*MHz/μA). As discussed, the variation in GBW is mainly 
caused by the process corner variation of the resistors. If a more constant GBW is desired, the 
resistors in either the 1st or the 2nd preamp of the op amp can be trimmed. As both GBW and 
Isupply vary in a comparable way as the resistor value varies, the variation of GBW/Isupply is 
much smaller than that of GBW or Isupply alone. This is why the normalized variation 
(sigma/mean) of FOMs is smaller than those of GBW and Isupply. The histogram of FOMs is 
shown in Figure 4.21.  
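The normalized-variation argument can be checked directly from the (mean, sigma) pairs quoted 
above. The short Python sketch below is an illustration only: the dictionary keys and the helper 
name normalized_variation are ours, and the inputs are the rounded values reported in the text, 
not the raw Monte Carlo data. 

```python
# (mean, sigma) pairs quoted in the text for the 1000-run P.Mis Monte Carlo simulation.
# These are rounded summary values, not the underlying simulation samples.
reported = {
    "GBW (MHz)": (0.9, 0.27),
    "Isupply (uA)": (7.0, 1.87),
    "FOMs (pF*MHz/uA)": (1910.0, 167.0),
}


def normalized_variation(mean, sigma):
    """Return sigma/mean, the normalized spread discussed in the text."""
    return sigma / abs(mean)


spread = {name: normalized_variation(m, s) for name, (m, s) in reported.items()}

for name, value in spread.items():
    print(f"{name}: sigma/mean = {value:.1%}")
```

Because GBW and Isupply move together with the resistor value, their ratio (and hence FOMs) 
shows a markedly smaller normalized spread than either quantity alone. 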
The simulated transient responses of the designed op amp under P.Mis variations are shown 
in Figure 4.22. As can be seen, the op amp always settles to its final steady-state voltage after 
a certain period. The op amp’s offset voltages show a normal distribution with a mean and 
sigma of 0.1mV and 2.8mV. The simulated Ts+_1%, Ts+_0.1%, Ts-_1% and Ts-_0.1% have a 
normal distribution with (mean, sigma) of (0.82μs, 0.15μs), (1.03μs, 0.2μs), (0.56μs, 0.08μs) 
and (0.8μs, 0.2μs) respectively. The simulated SR- and SR+ have a normal distribution with 
(mean, sigma) of (-1.4V/μs, 0.06V/μs) and (0.78V/μs, 0.08V/μs). The FOML also has a normal 
distribution with (mean, sigma) of (2474 pF*V/μs/μA, 604.5 pF*V/μs/μA). The spread of 
FOML is mainly caused by variations in the supply current under process corner variations. 
The histogram of FOML is shown in Figure 4.23. The performance summary of the designed 
op amp under P.Mis variations is shown in Table 4.6.  
 
Figure 4.20: Frequency responses of the designed op amp under P.Mis variation  
 
[Plot for Figure 4.20: "Frequency response under corners and mismatch variation", loop phase (degree) and loop gain (dB) versus frequency (Hz)]

Figure 4.21: FOMs of the designed op amp under P.Mis variation 
 
Figure 4.22: Transient responses of the designed op amp under P.Mis variation  
 
Figure 4.23: FOML of the designed op amp under P.Mis variation 
 
[Plot for Figure 4.21: histogram of hits versus FOMs (MHz*pF/A)]
[Plot for Figure 4.22: "Transient Response", voltage (V) versus time (s)]
[Plot for Figure 4.23: histogram of hits versus FOML (V*pF/A-s)]

Table 4.6: Performance summary of the designed op amp under P.Mis variation 
Output | Unit | Min | Max | Mean | Median | Std Dev
Phase Margin | degree | 52.73 | 71.54 | 62.26 | 62.37 | 3.08
GBW | MHz | 0.47 | 1.74 | 0.90 | 0.84 | 0.27
DC gain | dB | 89.12 | 95.38 | 92.53 | 92.54 | 1.07
Isupply_prop | μA | 4.30 | 12.29 | 7.03 | 6.64 | 1.88
Vos | mV | -8.96 | 8.33 | 0.10 | 0.16 | 2.78
SR- | V/μs | -1.57 | -1.27 | -1.40 | -1.39 | 0.06
SR+ | V/μs | 0.63 | 0.94 | 0.78 | 0.78 | 0.08
Ts-_1% | μs | 0.36 | 0.84 | 0.56 | 0.56 | 0.09
Ts+_1% | μs | 0.46 | 1.17 | 0.82 | 0.84 | 0.15
Ts-_0.1% | μs | 0.40 | 1.72 | 0.80 | 0.75 | 0.20
Ts+_0.1% | μs | 0.54 | 1.73 | 1.03 | 1.03 | 0.20
FOMs | pF*MHz/μA | 1361 | 2465 | 1910 | 1902 | 167
FOML | pF*V/μs/μA | 1212 | 3900 | 2474 | 2486 | 605
Noise at 100Hz | nV/sqrt(Hz) | 1423 | 1846 | 1522 | 1520 | 42
Noise at 100kHz | nV/sqrt(Hz) | 75.47 | 112.10 | 93.91 | 93.98 | 8.60
PSRR at 1kHz | dB | -121.30 | -64.36 | -89.10 | -88.60 | 7.97
PSRR at 10kHz | dB | -110.70 | -62.66 | -81.50 | -81.21 | 7.05
PSRR at 100kHz | dB | -104.80 | -44.71 | -63.24 | -61.78 | 9.09
4.6.5 Post-layout simulation results 
The layout of the designed op amp is shown in Figure 4.24. Compared with the schematic 
simulation results, the post-layout simulation results of the proposed op amp show about a 
14.5% reduction in GBW and a 3.5% reduction in supply current. The reduction in GBW is 
caused by the routing parasitic capacitance at the internal nodes of the op amp. The slight 
reduction in supply current is caused by the combination of the shallow trench isolation (STI) 
effect, the well proximity effect (WPE) and the multi-finger layout of the transistors. As a 
result, the FOMs in the post-layout simulation shows a reduction of 12.5% compared with the 
schematic simulation results. But, as expected, the slew rates of the schematic and post-layout 
simulation results are the same because the voltage swings at the gates of transistors M14 and 
M15 are close to the rail-to-rail supply voltage in both schematic and post-layout simulations. 
The detailed simulation results for the designed op amp’s AC and transient responses with 
schematic and post-layout views are shown in Figure 4.25 to Figure 4.27. The schematic and 
post-layout simulation results are compared in Table 4.7 in Section 4.7.   
 
Figure 4.24: Layout view of the designed op amp 
 
Figure 4.25: Proposed op amp’s transient responses in schematic and post-layout 
simulation 
 
[Plot for Figure 4.25: "Transient Response", voltage (V) versus time (s); traces: post-layout simulation, schematic simulation, input voltage]

Figure 4.26: Magnified views of the transient overshooting  
  
Figure 4.27: Proposed op amp’s frequency responses in schematic and post-layout 
simulation b) magnified bode plots around 0dB  
4.7. Performance Comparison of This Work with the Literature 
Table 4.7 shows the performance comparison among [3], [9] and the proposed op amp in 
schematic and layout views. As can be seen, the post-layout simulation results of this work 
show favorable performance in small-signal (FOMs), large-signal (FOML), and settling-time 
(FOMTs_x%) figures of merit. FOMTs_x% is defined as CL/(Isupply*Ts_x%), where Ts_x% is the 
settling time of the op amp with x% settling accuracy. Work [9] is also redesigned in the same 
0.18μm CMOS process used for the proposed op amp. The redesigned op amp has exactly the 
same transistor sizes, bias current and total supply current as reported in [9]. The redesigned 
op amp is not stable with a 15nF capacitive load, so its performance is shown with a 100nF 
capacitive load only. Under a 100nF capacitive load, compared with the redesigned op amp, 
the proposed op amp has a similar FOMs but a much higher phase margin, FOMTs_1% and 
FOMTs_0.1%. In addition, when the supply voltage changes by +/-10%, the supply current of 
the redesigned [9] changes by +89.3% and -48.9% respectively, whereas that of the proposed 
op amp changes by only +1.7% and -1.8% respectively. As reviewed in Section 4.2.2, the 
supply current of [9], and hence of the redesigned op amp, is extremely sensitive to its supply 
voltage because its preamp stages greatly amplify current errors due to the channel length 
modulation effect.  
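As a sanity check of the figure-of-merit definitions, the sketch below recomputes the three 
FOMs from the post-layout, 15nF column of Table 4.7. FOMTs follows the definition given 
above; the expressions used for FOMs (GBW*CL/Isupply) and FOML (SR*CL/Isupply) are inferred 
from their units and should be read as an assumption of this sketch. The function names are 
hypothetical. 

```python
def fom_small_signal(gbw_mhz, cl_pf, isupply_ua):
    """FOMs in pF*MHz/uA, assumed here to be GBW*CL/Isupply."""
    return gbw_mhz * cl_pf / isupply_ua


def fom_large_signal(sr_v_per_us, cl_pf, isupply_ua):
    """FOML in pF*V/us/uA, assumed here to be SR*CL/Isupply."""
    return sr_v_per_us * cl_pf / isupply_ua


def fom_settling(cl_pf, isupply_ua, ts_us):
    """FOMTs_x% in pF/us/uA, defined in the text as CL/(Isupply*Ts_x%)."""
    return cl_pf / (isupply_ua * ts_us)


# Post-layout results with CL = 15nF, copied from Table 4.7.
CL_PF = 15_000.0      # 15nF expressed in pF
ISUPPLY_UA = 6.208
GBW_MHZ = 0.684
SR_V_PER_US = 0.95
TS_1PCT_US = 0.87

foms = fom_small_signal(GBW_MHZ, CL_PF, ISUPPLY_UA)
foml = fom_large_signal(SR_V_PER_US, CL_PF, ISUPPLY_UA)
fomts_1pct = fom_settling(CL_PF, ISUPPLY_UA, TS_1PCT_US)

print(f"FOMs = {foms:.0f} pF*MHz/uA")
print(f"FOML = {foml:.0f} pF*V/us/uA")
print(f"FOMTs_1% = {fomts_1pct:.0f} pF/us/uA")
```

Since the table entries are rounded, the recomputed values can differ slightly from the tabulated 
ones, most visibly for FOMTs, which depends on the rounded settling time. 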
Table 4.8 shows the performance comparison among [3], [9] and the proposed op amp over 
process corner and mismatch variations. In a typical corner, the proposed op amp’s post-layout 
simulated FOMs is 25 times that of [3] and 1.8 times that of [9], while its FOML is 1198 times 
that of [3] and 7.8 times that of [9]. Even in its worst-case scenario of schematic simulations, 
the FOMs of the proposed op amp is 20.6 times that of [3] and 1.5 times that of [9], while its 
FOML is 632.6 times that of [3] and 4.1 times that of [9]. If trimming bits are available to trim 
the resistor values in the designed op amp, variations in FOMs and FOML can be reduced. The 
performance improvement of this work over [3] and [9] is mainly achieved by structurally 
decoupling large- and small-signal operations and eliminating any wasted current in the 
preamp’s load circuits. In addition, unlike [3] and [9], the quiescent current of all the branches 
in the designed op amp is well defined.  
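The improvement factors quoted above follow from simple ratios of the tabulated figures of 
merit. The sketch below reproduces them from the FOMs and FOML entries of Table 4.7 
(post-layout, 15nF) and Table 4.8 (minimum over the P.Mis Monte Carlo run); the dictionary 
layout and the helper name improvement are ours. 

```python
# FOM values copied from Table 4.7 and Table 4.8 (CL = 15nF).
FOMS = {"[3]": 66.0, "[9]": 912.5, "post-layout": 1653.0, "worst case": 1361.0}
FOML = {"[3]": 1.916, "[9]": 293.8, "post-layout": 2295.0, "worst case": 1212.0}


def improvement(table, case, reference):
    """Ratio of this work's figure of merit to that of a reference design."""
    return table[case] / table[reference]


ratios = {}
for case in ("post-layout", "worst case"):
    for reference in ("[3]", "[9]"):
        ratios[(case, reference)] = (
            improvement(FOMS, case, reference),
            improvement(FOML, case, reference),
        )
        foms_ratio, foml_ratio = ratios[(case, reference)]
        print(f"{case} vs {reference}: FOMs x{foms_ratio:.1f}, FOML x{foml_ratio:.1f}")
```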
Table 4.7: Performance comparison of this work in schematic and post-layout view with 
recently reported amplifiers 
Parameter | +NCM, JSSC'15 [3] | +Hybrid, JSSC'16 [9] | *This work, schematic (CL=15nF) | *This work, schematic (CL=100nF) | *This work, post-layout (CL=15nF) | *This work, post-layout (CL=100nF) | *[9], redesigned
CMOS process (μm) | 0.18 | 0.13 | 0.18 | 0.18 | 0.18 | 0.18 | 0.18
VDD (V) | 1.2 | 0.7 | 1.5 | 1.5 | 1.5 | 1.5 | 0.9
IDD (μA) | 3 | 24 | 6.44 | 6.44 | 6.208 | 6.208 | 24
DC gain (dB) | 84 | ~100 | 93.5 | 93.5 | 92.09 | 92.09 | 80
CL (nF) | 15 | 15 | 15 | 100 | 15 | 100 | 100
GBW (MHz) | 0.396 | 1.46 | 0.811 | 0.125 | 0.684 | 0.105 | 0.446
PM (°) | 81 | 66 | 66.4 | 86.36 | 63.97 | 85.97 | 64.11
SR (V/μs) | 0.01 | 0.47 | 0.95 | 0.14 | 0.95 | 0.14 | 0.01
Avg. 1% settling (μs) | 47.00 | 1.41 | 0.80 | 4.90 | 0.87 | 4.96 | 26.15
Avg. 0.1% settling (μs) | - | - | 0.85 | 6.00 | 0.94 | 6.66 | 27.86
FOMs (pF*MHz/μA) | 66 | 912.5 | 1,889 | 1,941 | 1,653 | 1,691 | 1,858
FOML (pF*V/μs/μA) | 1.916 | 293.8 | 2,201 | 2,174 | 2,295 | 2,302 | 38
FOMTs_1% (pF/μs/μA) | 106.4 | 443.3 | 2,922 | 3,169 | 2,793 | 3,248 | 159.3
FOMTs_0.1% (pF/μs/μA) | - | - | 2,756 | 2,588 | 2,577 | 2,420 | 149.6
Area (mm2) | 0.0013 | 0.0027 | - | - | 0.0064 | 0.0064 | -
Notes: + represents measurement results and * represents simulation results 
Table 4.8: Performance comparison of this work with recently reported amplifiers 
Parameter | +NCM, JSSC'15 [3] | +Hybrid, JSSC'16 [9] | *This work, typ | *This work, min | *This work, max
CMOS process (μm) | 0.18 | 0.13 | 0.18 | 0.18 | 0.18
CL (nF) | 15 | 15 | 15 | 15 | 15
VDD (V) | 1.2 | 0.7 | 1.5 | 1.5 | 1.5
IDD (μA) | 3 | 24 | 6.56 | 4.299 | 12.29
DC gain (dB) | 84 | ~100 | 92.6 | 89.12 | 95.38
GBW (MHz) | 0.396 | 1.46 | 0.85 | 0.47 | 1.742
PM (°) | 81 | 66 | 62.5 | 52.73 | 71.54
SR (V/μs) | 0.01 | 0.47 | 1.1 | 0.95 | 1.25
Avg. 1% settling (μs) | 47 | 1.41 | 0.73 | 0.41 | 1
Avg. 0.1% settling (μs) | NA | NA | 0.93 | 0.47 | 1.72
FOMs (pF*MHz/μA) | 66 | 912.5 | 1940 | 1361 | 2465
FOML (pF*V/μs/μA) | 1.916 | 293.8 | 2500 | 1212 | 3900
*The minimum and maximum performance of this work is reported based on 1000-run Monte Carlo 
simulation with both process corner and mismatch variation enabled. 
4.8. Discussion 
If a tighter spread of the designed op amp’s GBW is needed under all process corner 
variations without the aid of any trimming circuits, a more sophisticated bias strategy is 
needed. As shown in Section 4.5.2, the GBW of the designed op amp is proportional to gm5*R4. 
With the constant gm bias circuit [11], gm becomes approximately proportional to 1/R and the 
expression of GBW is then simplified to be proportional to gm, or 1/R. To reduce the GBW 
spread further under process corner variations, the gm of one of the preamp stages needs to be 
constant instead of proportional to 1/R. This can be achieved by biasing the tail current of one 
of the preamp stages with a fixed bias current.  
4.9. Summary  
A new power-efficient design technique for op amps driving large capacitive loads has been 
introduced to substantially boost both the op amp’s small- and large-signal performance. An op 
amp designed with the new technique decouples large- and small-signal performance, has a 
well-defined quiescent current in all the preamp stages, and eliminates current waste in the 
preamp’s load circuits. Because of these features, the designed op amp is much less sensitive to 
random device mismatches and can be optimized for both large- and small-signal performance. 
The optimization between the gain bandwidth product enhancement and the number of preamp 
stages for the proposed op amp has also been discussed. The proposed op amp has been 
simulated in a 180nm CMOS process under three different conditions. The simulation results 
agree well with the theoretical analysis. Compared with the state-of-the-art methods [3][9], the 
designed op amp shows very favorable FOMs and FOML. The results show that the proposed 
power-efficient op amp design is suitable for applications such as LCD gamma buffers where a 
large capacitive load is driven.  
4.10. References  
[1]. "LM6584 TFT-LCD Quad, 13V RRIO high output current operational amplifier," Mar. 
2013 [Online]. Available: www.ti.com/cn/lit/gpn/lm6584  
[2]. "4-channel, rail-to-rail, CMOS buffer amplifier," in Rev. B Texas Instruments, Jul. 
2004 [Online]. Available: http://www.ti.com/product/buf04701  
[3]. Z. Yan, P. I. Mak, M. K. Law, R. P. Martins and F. Maloberti, "Nested-Current-Mirror 
Rail-to-Rail-Output Single-Stage Amplifier With Enhancements of DC Gain, GBW 
and Slew Rate," IEEE Journal of Solid-State Circuits, vol. 50, no. 10, pp. 2353-2366, 
Oct. 2015.  
[4]. X. Peng, W. Sansen, L. Hou, J. Wang, and W. Wu, "Impedance adapting compensation 
for low-power multistage amplifiers," IEEE Journal of Solid-State Circuits, vol. 46, 
no. 2, pp. 445-451, Feb. 2011. 
[5]. S. S. Chong and P. K. Chan, "Cross feedforward cascode compensation for low-power 
three-stage amplifier with large capacitive load," IEEE Journal of Solid-State Circuits, 
vol. 47, no. 9, pp. 2227-2234, Sep. 2012. 
[6]. Z. Yan, P.-I. Mak, M.-K. Law, and R. P. Martins, "A 0.016-mm2 144-μW three-stage 
amplifier capable of driving 1-to-15 nF capacitive load with 0.95-MHz GBW," IEEE 
Journal of Solid-State Circuits, vol. 48, no. 2, pp. 527-540, Feb. 2013. 
[7]. M. Tan and W.-H. Ki, "A cascode Miller-compensated three-stage amplifier with local 
impedance attenuation for optimized complex-pole control," IEEE Journal of 
Solid-State Circuits, vol. 50, no. 2, pp. 440-449, Feb. 2015. 
[8]. K. N. Leung and P. K. T. Mok, "Analysis of multistage amplifier-frequency 
compensation," IEEE Transactions on Circuits and Systems I: Fundamental Theory and 
Applications, vol. 48, no. 9, pp. 1041-1056, 2001. 
[9]. K. H. Mak, M. W. Lau, J. P. Guo, T. W. Mui, M. Ho, W. L. Goh, and K. N. Leung, 
"A Hybrid OTA Driving 15 nF Capacitive Load With 1.46 MHz GBW," IEEE Journal 
of Solid-State Circuits, vol. 50, no. 11, pp. 2750-2757, 2015. 
[10]. V. V. Ivanov and I. M. Filanovsky, "OpAmp Gain Structure, Frequency 
Compensation and Stability," in Operational amplifier speed and accuracy 
improvement: analog circuit design with structural methodology, Springer Science & 
Business Media, 2006, p. 76. 
[11]. N. Talebbeydokhti, P. K. Hanumolu, P. Kurahashi and Un-Ku Moon, "Constant 
transconductance bias circuit with an on-chip resistor," 2006 IEEE International 
Symposium on Circuits and Systems, Island of Kos, 2006, pp. 2857-2860. 
 
 
 
 
 
 
 
CHAPTER 5. CURRENT UTILIZATION EFFICIENCY ENHANCEMENT FOR FOLDED CASCODE AMPLIFIERS  
5.1. Introduction 
Op amps are among the most fundamental building blocks of many analog and mixed-signal 
systems. Among different op amp structures, folded cascode amplifiers (FCAs) are one of the 
most widely used architectures in single- and multi-stage op amp designs because FCAs 
have high gain, wide input common-mode range (ICMR) and reasonably large output voltage 
swing (OSW) [1]. PMOS input FCAs, due to their higher non-dominant poles, lower flicker 
noise, and lower input common-mode levels, have become the primary choice over their NMOS 
counterparts. Moreover, PMOS input FCAs allow for input switches using a single NMOS 
transistor in switched-capacitor (SC) applications [2].  
 
Figure 5.1: Schematic of a conventional folded cascode amplifier (FCA) 
Figure 5.1 shows a conventional PMOS input FCA. In this FCA design, the tail current (Itail) 
of the op amp is designed to meet the FCA’s noise and GBW specifications. The cascode stage 
current (Ib) is conventionally set larger than 0.5Itail to avoid a long recovery time caused by 
the input pair (M1-M2) working in the triode region and the cascode transistors (M7-M8) 
working in the cutoff region in slewing phases. In practice, Ib is usually designed to be about 
0.7*Itail to provide some design margin over random mismatch variations [3]. Therefore, the 
total bias current of the two cascode branches is about 1.4 times the FCA’s tail current. 
Unfortunately, this large bias current in the FCA’s cascode stage not only dramatically 
increases the power consumption of the FCA but also degrades the FCA’s noise and offset 
voltage performance. This is discussed further below. 
The input referred offset voltage of the FCA in Figure 5.1 is calculated as (5-1), in 
which ∆V12, ∆V34 and ∆V56 are the offset voltages of transistor pairs M1-M2, M3-M4 and 
M5-M6. Also, gmi is the transconductance of transistor Mi, i=1,2,...,6. To calculate noise, we 
assume for simplicity that transistors M1-M6 have the same length, current density, and flicker 
noise constant. Thus, the FCA’s input referred noise power density is calculated and simplified 
as (5-2), where Vn1², Vn3² and Vn5² are the noise power densities of transistors M1, M3 and 
M5. Kf, Cox, W1, and L1 are transistor M1’s flicker noise constant, oxide capacitance per unit 
area, width, and length. Also, k and T are respectively the Boltzmann constant and the 
temperature in kelvin. As can be seen from (5-1) and (5-2), the FCA’s offset voltage and noise 
drop as gm3/gm1 and gm5/gm1 decrease. For a given targeted GBW and capacitive load (CL), 
gm1 is designed to be GBW*CL. Therefore, a power-efficient way to reduce the FCA’s noise 
and offset voltage is to decrease the transconductance of transistors M3-M6 by reducing their 
bias currents.  
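$$V_{os} = \Delta V_{12} + \Delta V_{34}\,\frac{g_{m3}}{g_{m1}} + \Delta V_{56}\,\frac{g_{m5}}{g_{m1}} \tag{5-1}$$

$$\overline{V_{ni}^2} = 2\left[\overline{V_{n1}^2} + \overline{V_{n3}^2}\left(\frac{g_{m3}}{g_{m1}}\right)^2 + \overline{V_{n5}^2}\left(\frac{g_{m5}}{g_{m1}}\right)^2\right] = \left(\frac{16kT}{3g_{m1}} + \frac{2K_f}{W_1 L_1 C_{ox} f}\right)\left(1 + \frac{g_{m3}}{g_{m1}} + \frac{g_{m5}}{g_{m1}}\right) \tag{5-2}$$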
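To make the dependence on gm3/gm1 and gm5/gm1 concrete, the sketch below evaluates (5-1) and 
the simplified form of (5-2) in Python. The function names and all device values (gm, Kf, W1, 
L1, Cox, frequency) are hypothetical and chosen only for illustration; they are not taken from 
the designs in this chapter. 

```python
K_BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K


def input_offset(dv12, dv34, dv56, gm1, gm3, gm5):
    """Input referred offset voltage of the FCA per (5-1), in volts."""
    return dv12 + dv34 * gm3 / gm1 + dv56 * gm5 / gm1


def input_noise_psd(gm1, gm3, gm5, kf, w1, l1, cox, freq, temp_k=300.0):
    """Input referred noise power density per the simplified (5-2), in V^2/Hz."""
    # Thermal plus flicker noise contributed by the input pair alone.
    input_pair = 16.0 * K_BOLTZMANN * temp_k / (3.0 * gm1) + 2.0 * kf / (w1 * l1 * cox * freq)
    # Excess factor contributed by the cascode stage current sources M3-M6.
    return input_pair * (1.0 + gm3 / gm1 + gm5 / gm1)


# Hypothetical device parameters (SI units), for illustration only.
params = dict(kf=1e-25, w1=20e-6, l1=1e-6, cox=8e-3, freq=1e3)

noise_high = input_noise_psd(gm1=100e-6, gm3=80e-6, gm5=60e-6, **params)
noise_low = input_noise_psd(gm1=100e-6, gm3=20e-6, gm5=15e-6, **params)

print(f"noise with large gm3, gm5: {noise_high:.3e} V^2/Hz")
print(f"noise with small gm3, gm5: {noise_low:.3e} V^2/Hz")
print(f"reduction factor: {noise_low / noise_high:.3f}")
```

With gm1 fixed by the GBW target, lowering gm3 and gm5 shrinks only the excess factor in (5-2), 
which is why reducing the cascode stage’s bias current is an efficient lever. 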
5.2. Literature Review 
5.2.1 General review 
In an effort to reduce an FCA’s input referred offset voltage and noise, several techniques 
have been reported in the literature [4]-[6]. The techniques function by reducing gmc/gm1, where 
gmc is the total effective transconductance of the top PMOS and bottom NMOS transistors in 
the FCA’s cascode stage, and gm1 is the transconductance of the FCA’s input pair. Approach 
[4] reduces gmc by applying resistive degeneration to the top PMOS and bottom NMOS 
transistors. However, this approach not only reduces the FCA’s ICMR and OSW but also 
increases area consumption, since a low-power design requires a large degeneration resistor. 
Approach [5] adds a low-noise preamp stage in front of the conventional FCA, but this 
approach significantly increases the FCA’s power consumption to achieve the same slew rate 
performance. Approach [6] uses a new turn-around circuit and improves the FCA’s noise, input 
offset voltage and current utilization efficiency (CUE) by reducing the cascode stage’s bias 
current, where CUE is defined as the ratio of the FCA’s tail current to its supply current. But 
this approach can afford only a slight decrease in the cascode stage’s bias current if a long 
recovery time is to be avoided. In addition, this approach requires complicated frequency 
compensation to stabilize its new turn-around stage because of its multiple internal loops. This 
significantly increases the design complexity and area consumption of the FCA. Therefore, in 
this chapter, we propose a new output stage to enhance the FCA’s CUE. The proposed output 
stage also improves the FCA’s performance in terms of noise, offset voltage and gain.  
5.2.2 A state-of-the-art FCA design for CUE enhancement 
Figure 5.2 shows a state-of-the-art method [6] to improve the FCA’s CUE by reducing its 
cascode stage’s bias current. As byproducts, the noise and offset performance of the FCA are 
also improved. The PMOS-side circuit, formed by transistors M5-M6, M9-M10 and M13-M14, 
is symmetric to the NMOS-side circuit formed by transistors M3-M4, M7-M8 and M11-M12. 
Transistors M10 and M14 respectively share the same gate voltages as transistors M9 and M13. 
In addition, transistors M10 and M13 respectively share the same source voltages as transistors 
M14 and M9. As the total drain current of transistors M10 and M14 is also the same as that of 
transistors M9 and M13, the DC bias voltages of nodes 3 and 8 are equal, V8=V3. Consequently, 
transistor M10 has a constant bias current that is the same as in transistor M9. Therefore, the 
drain currents of transistors M12 and M7 are also equal to Ib in the NMOS-side circuits.  
 
Figure 5.2: Rudy’s FCA a) the FCA’s schematic b) floating battery in the FCA 
Upon application of a small negative differential input voltage, the differential signal 
currents in M1 and M2 are respectively -∆I and ∆I. These differential currents cause a voltage 
increase by ∆V7 at node 7, whereas the voltage at node 1 stays the same since Vgs of transistor 
M7 stays the same because of the constant bias current. Then, ∆V7 is simply shifted up to node 
6, so ∆V6=∆V7. Therefore, the gate-source voltages of transistors M8 and M11 change by +∆V7 
and -∆V7, respectively. Since transistors M7-M8 and M11-M12 are symmetrically designed 
with the same size, the differential signal currents in M8 and M11 can be respectively found 
as -∆I and ∆I. The signal current in M8 is ultimately copied to transistor M13 by the circuit 
formed by M5-M6, M9-M10 and M13-M14. Therefore, the signal currents in transistors M13 
and M11 are -∆I and ∆I respectively. This holds true only when |∆I| is less than or equal to 
2*Ib, or when |∆V7|≤Vod8, where Vod8 is the quiescent overdrive voltage of transistor M8.  
However, when ∆I exceeds 2*Ib, ∆V7>Vod8 and transistors M8 and M13-M14 work in the 
cutoff region, whereas M11 still works in the saturation region. In this case, the changes of the 
drain currents in transistors M4, M3, M8 and M11 are respectively ∆I-Ib, ∆I-Ib, -Ib and 2∆I-Ib. 
Therefore, if ∆I=0.5Itail>2Ib occurs in a negative slewing phase, the drain currents of the 
transistors become I2=I11=Itail, I1=I8=I14=I13=0, I3=I4=Itail+Ib, I5=I6=Ib and I10=Ib, where Ii is 
transistor Mi’s drain current and i=1,2,...,14. Similarly, in a positive slewing phase, I2=I11=0, 
I1=I8=I14=I13=Itail, I3=I4=Itail+Ib, I5=I6=Itail+Ib and I10=Ib. As a result, Rudy’s FCA [6] in 
Figure 5.2 has the same slew rate as a conventional FCA even when Ib is smaller than 0.5*Itail. 
But as mentioned, if Ib<0.25Itail, either M8 or M11 would work in the cutoff region during 
negative or positive slewing phases, which increases the recovery time of the FCA after slewing 
completes. This limits the lower boundary of the total bias current in this FCA’s cascode stage 
to 4*Ib>Itail. Therefore, to avoid a long recovery time, the maximum achievable CUE of this 
FCA is below 50%.  
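The CUE bounds discussed so far can be reproduced with a few lines of Python. The sketch 
below uses the definition of CUE from Section 5.2.1 (tail current divided by supply current) 
and assumes, for illustration, that the supply current is the tail current plus the total 
cascode-stage bias current, ignoring any bias-generation overhead. The function name cue and 
the normalized current values are ours. 

```python
def cue(i_tail, i_cascode_total):
    """Current utilization efficiency: tail current over total supply current.

    Assumes the supply current is the tail current plus the total bias
    current of the cascode stage (bias-generation branches are ignored).
    """
    return i_tail / (i_tail + i_cascode_total)


I_TAIL = 1.0  # normalized tail current

# Conventional FCA: two cascode branches, each biased at about 0.7*Itail.
cue_conventional = cue(I_TAIL, 2 * 0.7 * I_TAIL)

# FCA of [6]: total cascode bias 4*Ib must exceed Itail, so 4*Ib = Itail is the limit.
cue_limit_ref6 = cue(I_TAIL, 4 * 0.25 * I_TAIL)

print(f"conventional FCA CUE: {cue_conventional:.1%}")
print(f"upper limit for [6]:  {cue_limit_ref6:.1%}")
```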
In addition, this FCA needs complex frequency compensation, which significantly increases 
the FCA’s area overhead. There are two translinear loops in the FCA in Figure 5.2. One loop 
is M13-M5-M6-M14-M13 and the other is M11-M3-M4-M12-M11. The two translinear 
loops make the circuits between nodes 7 and 8 work as a floating battery. Therefore, it can be 
found that the resistance at nodes 7 and 8 is about 1/(gds6+gds4+gds2), where gds6, gds4 and gds2 
are respectively the drain-source conductances of transistors M6, M4 and M2. Because there 
are two high impedance nodes in the FCA, namely node 7 (or 8) and the output node, the 
complex frequency compensation shown in Figure 5.2 is needed in [6] to stabilize the FCA, 
which unfortunately dramatically increases the FCA’s design complexity and area overhead. In 
the design example, the area of the compensation capacitors and resistors is as large as that of 
the FCA core. In summary, method [6] improves the FCA’s CUE slightly but at the cost of 
significantly increased design complexity and area overhead.  
5.3. Proposed FCA Output Stage Design for Low Noise, Offset and Power   
In this section, we summarize the desired features of an effective FCA output stage. Based 
on the desired features, a conceptual FCA output stage design is presented. Then, actual circuits 
are designed to implement the conceptual FCA output stage.  
5.3.1 Desired features and conceptual design of a FCA output stage 
A conceptual single-stage FCA design with the desired features of its output stage is shown in 
Figure 5.3. As is widely known, the FCA’s input stage has a fixed trade-off among noise, power 
and speed. The input stage’s transconductance, gm, is set to GBW*CL to meet the GBW 
specification, where CL is the FCA’s load capacitor. In addition, the gm must be large enough 
to meet the FCA’s noise specification, as shown in (5-2). Assuming that the thermal noise 
dominates the FCA’s total noise, it can be found that the input pair’s gm needs to be larger 
than 16*k*T*m/(3*Vni,spec²), where k, T and Vni,spec are respectively the Boltzmann constant, 
the operating temperature in kelvin, and the FCA’s input referred voltage noise specification. 
Also, m represents the ratio of the FCA’s total noise power to the noise power from the input 
pair. Therefore, the input pair’s gm needs to be large enough, as shown in (5-3), to meet both 
GBW and noise specifications. With a constant gm/ID (gm,efficiency) design strategy, the input 
pair’s tail current (Itail) can be easily found as 2*gm,spec/gm,efficiency.  
$$g_{m,spec} \ge \max\left(\frac{16kT\,m}{3V_{ni,spec}^2},\; 2 \cdot GBW \cdot C_L\right) \tag{5-3}$$
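The sizing procedure implied by (5-3) can be sketched in Python as follows. The function names 
and the example specification (GBW, CL, noise target, m and gm/ID) are hypothetical and serve 
only to illustrate how the larger of the noise-driven and speed-driven requirements sets gm,spec 
and hence Itail. 

```python
K_BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K


def gm_spec(gbw_hz, cl_farad, vni_spec, m, temp_k=300.0):
    """Minimum input-pair transconductance per (5-3), in siemens.

    vni_spec is the input referred voltage noise specification in V/sqrt(Hz);
    m is the ratio of the FCA's total noise power to that of the input pair.
    """
    gm_noise = 16.0 * K_BOLTZMANN * temp_k * m / (3.0 * vni_spec ** 2)
    gm_speed = 2.0 * gbw_hz * cl_farad
    return max(gm_noise, gm_speed)


def tail_current(gm, gm_over_id):
    """Tail current for a constant gm/ID design: Itail = 2*gm,spec/(gm/ID)."""
    return 2.0 * gm / gm_over_id


# Hypothetical specification, for illustration only.
gm_required = gm_spec(gbw_hz=10e6, cl_farad=1e-12, vni_spec=20e-9, m=2.0)
i_tail = tail_current(gm_required, gm_over_id=15.0)

print(f"gm,spec = {gm_required * 1e6:.1f} uS")
print(f"Itail   = {i_tail * 1e6:.1f} uA")
```

In this hypothetical case the noise requirement dominates, so any reduction of m by the output 
stage directly reduces the required gm,spec and tail current. 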
 
Figure 5.3: Desired features of a FCA’s output stage 
 
The FCA’s output stage takes the input differential current from the input pair and conveys 
the current to the output of the FCA. The FCA’s output stage must be able to convey at least 
Itail to the output in a large signal operation. Ideally, the output stage not only passes the 
differential current from the input pair but also amplifies the current and then passes the 
amplified current to the output node. That is, we want Ai_tran to be large, where Ai_tran is the 
ratio of the output current (Iout) to the input differential current (Idm) in the large signal 
operation. In a small signal operation, Idm is converted to output voltage by the output stage’s 
output resistance, Rout. Ideally, Rout should be infinite to generate infinite DC gain. In addition, 
ideally the FCA’s output stage should contribute zero offset and noise (m=1), and consume 
zero bias current. In summary, a desired FCA’s output stage should have a large Ai_tran or a 
large current conveyance capability, a very large Rout, zero power consumption, zero noise and 
zero offset contribution. However, meeting these requirements altogether is usually very 
difficult because a large current conveyance capability typically requires a large bias current 
in the output stage, while a large bias current not only increases the FCA’s power consumption, 
noise and offset voltage but also reduces FCA’s Rout. Clearly, tradeoffs need to be made among 
a sufficiently large current conveyance capability, a sufficiently large Rout, minimal noise, 
minimal offset and minimal power consumption. 
To mitigate the tradeoffs above, a conceptual FCA design with the proposed output stage is 
shown in Figure 5.4. The basic idea is to decouple the large-signal and small-signal operations. 
The large-signal path determines the current conveyance capability. This path is normally off 
so that this path needs zero bias current and contributes zero noise and zero offset. On the other 
hand, the small-signal path is always on with minimal bias current so that power, noise and 
offset caused by this path are also minimized. The gain is also maximized. A circuit 
implementation of the conceptual design is discussed in the following sections.  
 
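[Block diagram for Figure 5.4: input stage (M1, M2, Ib; fixed noise, power, speed tradeoff) delivering 0.5Idm per side to the desired output stage, which contains a large-signal path (current conveyance capability) and a small-signal path (Rout, power, noise, offset) and drives CL at Vout with Iout]
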
Figure 5.4: A conceptual design of a FCA output stage 
5.3.2 Proposed FCA core amplifier design 
For a differential-input single-ended output FCA, a differential-to-single-ended conversion 
circuit needs to be implemented in the FCA. The two types of differential-to-single-ended 
conversion circuits for a PMOS input FCA are shown in Figure 5.5. Figure 5.5(a) implements 
the conversion with a top PMOS current mirror, whereas Figure 5.5(b) implements it with a 
bottom NMOS current mirror. The two FCAs are analyzed and their advantages and 
disadvantages are discussed.  
The FCA in Figure 5.5(a) is the most conventional design for a PMOS input FCA. In the 
signal path from Vin+ to Vo, there are three low impedance nodes: nodes ①, ②, and 
④. The impedances of these nodes are respectively 1/gm7, 1/gm5 and 1/gm10. These nodes’ 
impedances are sensitive to the cascode stage’s bias current, A*Itail. In the signal path from Vin- 
to Vo, there is only one low impedance node, node ⑤. This node’s impedance is 1/gm8, which 
is also very sensitive to the cascode stage’s bias current. Therefore, all the nondominant poles 
in Figure 5.5(a) are highly dependent on the cascode stage’s bias current.  
 
[Schematics, panels (a) and (b): transistors M0~M10, bias voltages Vb1~Vb4, supplies Vdd and Vss, inputs Vin- and Vin+, output Vo with load CL, tail current Itail, cascode bias currents A*Itail, and internal nodes ①~⑤]
Figure 5.5: A PMOS input FCA with differential-to-single-ended conversion on the a) PMOS 
side and b) NMOS side 
 
After writing and solving the KCL equations at nodes ①~⑤, the transfer function from the 
FCA’s input to the output is calculated as (5-4), where Ci is the parasitic capacitance at node i 
and gmi is the transconductance of transistor Mi. There are three nondominant poles and two 
zeros in the transfer function. One pole is always located at gm7/C1. The remaining two nondominant 
poles and the two zeros can be either complex or real, depending on whether gm10/C4<4gm5/C2 
or not. When gm10/C4<4gm5/C2, the complex pole pair’s natural frequency is 
√(gm5gm10/(C2C4)), whereas that of the complex zero pair is √(2gm5gm10/(C2C4)). If 
gm10/C4>4gm5/C2, the nondominant pole at gm10/C4 cancels out the zero at the same frequency. 
The remaining nondominant poles are at frequencies of gm5/C2 and gm7/C1, and the remaining zero’s 
frequency is 2gm5/C2. In summary, regardless of whether the zeros or the poles are complex 
or real, the frequencies of the nondominant poles and zeros are highly sensitive to the cascode 
stage’s bias current but are independent of the tail current. The smaller the bias current of the 
cascode stage is, the lower the frequencies of the nondominant poles and zeros are, and the lower 
the phase margin is. This fundamentally limits how far the bias current of the cascode stage can be reduced.  
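$$TF_{slow,FCA} = \frac{\dfrac{g_{m1}}{2 g_L}}{\left(1 + s\dfrac{C_L}{g_L}\right)\left(1 + s\dfrac{C_1}{g_{m7}}\right)} \cdot \frac{s^2 + \dfrac{g_{m10}}{C_4}s + \dfrac{2 g_{m5} g_{m10}}{C_2 C_4}}{s^2 + \dfrac{g_{m10}}{C_4}s + \dfrac{g_{m5} g_{m10}}{C_2 C_4}} \tag{5-4}$$

$$TF_{fast,FCA} = \frac{\dfrac{g_{m1}}{2 g_L}}{\left(1 + s\dfrac{C_L}{g_L}\right)\left(1 + s\dfrac{C_1}{g_{m7}}\right)} \cdot \frac{s^2 + \dfrac{g_{m8}}{C_5}s + \dfrac{2 g_{m4} g_{m8}}{C_2 C_5}}{s^2 + \dfrac{g_{m8}}{C_5}s + \dfrac{g_{m4} g_{m8}}{C_2 C_5}} \tag{5-5}$$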
The alternative FCA design in Figure 5.5(b) mitigates the dependency of the nondominant 
poles on the cascode stage’s bias current. On the signal path from Vin+ to Vo, there are still 
three low impedance nodes: nodes ①, ②, and ⑤. As the minimum drain current of 
transistor M3 is 0.5*Itail, the nondominant pole at node ②, gm3/C2, is always at a very high 
frequency even with zero cascode bias current. So, the pole is not very sensitive to a low bias 
current in the cascode stage. After writing and solving the KCL equations at nodes ①~⑤, the 
transfer function from the FCA’s inputs to its output is calculated as (5-5). One nondominant 
pole is always at gm7/C1. When gm8/C5<4gm4/C2, the remaining two nondominant poles are complex 
poles with a natural frequency of √(gm4gm8/(C2C5)), whereas the two zeros are complex zeros 
with a natural frequency of √(2gm4gm8/(C2C5)). As can be seen, the natural frequencies of the 
complex poles and zeros are proportional to √gm4, and gm4 is proportional to √(Ib + 0.5Itail) 
instead of √Ib. Therefore, when Ib is much smaller than Itail, the frequencies of the two 
nondominant poles and two zeros of the alternative FCA are considerably higher than those of the 
conventional FCA, especially considering that the mobility of NMOS transistors is also about 2~3x 
that of PMOS transistors. When gm8/C5>4gm4/C2, the two zeros and two nondominant poles of the 
alternative FCA become real zeros and poles. Consequently, the nondominant pole and zero at 
a frequency of gm8/C5 cancel each other out. The frequencies of the remaining nondominant pole 
and zero are respectively gm4/C2 and 2gm4/C2, which are much higher than those of the 
conventional FCA’s pole and zero (gm5/C2 and 2gm5/C2). 
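The scaling argument above can be illustrated numerically. The following is a minimal sketch, assuming square-law devices (gm proportional to the square root of the drain current) and assuming that the cascode transconductance, the parasitic capacitances and the carrier mobility are identical in both structures, so that only the current in M5 (conventional) or M4 (alternative) differs. The function name and the ratio Ib/Itail=1/12 are illustrative choices, not values taken from the two design examples.

```python
def natural_frequency_ratio(i_b, i_tail):
    """Return omega_n(alternative FCA) / omega_n(conventional FCA).

    Assumes gm ~ sqrt(I_D) and that all other terms under the square root
    of the natural-frequency expressions are equal in the two structures.
    """
    i_m4_alternative = i_b + 0.5 * i_tail  # M4 in Figure 5.5(b)
    i_m5_conventional = i_b                # M5 in Figure 5.5(a)
    # Square-law assumption: gm is proportional to sqrt(I_D).
    gm_ratio = (i_m4_alternative / i_m5_conventional) ** 0.5
    # The natural frequency is proportional to sqrt(gm).
    return gm_ratio ** 0.5


# Illustrative operating point: cascode bias of one twelfth of the tail current.
ratio = natural_frequency_ratio(i_b=1.0 / 12.0, i_tail=1.0)
print(f"natural-frequency ratio (alternative / conventional): {ratio:.2f}")
```

Under these assumptions the ratio grows as (1 + 0.5Itail/Ib)^(1/4), so the benefit of the alternative structure increases as the cascode bias current is reduced; the mobility difference mentioned in the text would add to it.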
The superior speed of the alternative FCA is also confirmed by the simulation results of 
two FCA design examples in the 180nm CMOS process. The first design example uses the 
conventional FCA structure, whereas the second example uses the alternative structure. 
Because of the speed difference between the two FCAs, they are referred to as the slow 
and fast FCAs respectively. The simulated frequency and transient responses of the fast and 
slow FCAs are shown in Figure 5.6 and Figure 5.7 respectively. The fast FCA has a higher phase 
margin and a slightly higher GBW. Other performance metrics of the two design examples are 
summarized in Table 5.1.  
In addition to a faster speed, the fast FCA structure also reduces the number of bias voltages 
by one because Vb4 is no longer needed and Vb1 is shared with the tail current bias in the fast 
FCA. Furthermore, the negative slew rate of the fast FCA does not depend on the cascode 
stage’s bias current, whereas that of the slow FCA does. Because it needs fewer biasing 
circuits and is faster, the fast FCA structure is chosen as the core amplifier for the 
proposed FCA.  
 
Figure 5.6: Frequency responses of the conventional fast and slow FCAs 
[Plot: loop phase (degree) and loop gain (dB) vs. frequency (Hz) from 10^-2 to 10^8 Hz; traces: fast FCA, slow FCA]
 
Table 5.1: Performance summary of the designed conventional slow and fast FCAs 
Parameter                        Conv. slow FCA   Conv. fast FCA
Gain (dB)                        80               80
GBW (MHz)                        1.9              2.1
Phase margin (degree)            71.4             82
0.1% settling time (µs)          0.778            0.719
0.01% settling time (µs)         1.31             0.811
Vno (0.01 Hz~100 kHz) (µV)       49.3             49
Vno (0.01 Hz~2 MHz) (µV)         138.4            138.4
Isupply (µA)                     5                5
CL (pF)                          1                1
 
Figure 5.7: Transient responses of the fast and slow FCA 
[Plot: voltage (V) vs. time; traces: fast FCA, slow FCA, input voltage]
5.3.3 Proposed FCA output stage design 
5.3.3.1 Operation Principle 
The conceptual design of the FCA’s output stage and the findings about the FCA core 
amplifier design motivate the proposed FCA design shown in Figure 5.8. The proposed FCA 
consists of a fast FCA core and an additional turn-around stage. The turn-around stage is 
normally off and is only activated during the FCA’s positive slewing phase. Such a design allows 
the FCA’s current conveyance capability to be greatly enhanced during the positive slewing 
phase while at the same time keeping the bias current consumption of the turn-around stage to 
a minimum and generating very low noise and offset voltage. As a result, the bias current 
of the FCA’s cascode stage can be reduced to a current much smaller than Itail. The cascode 
stage’s bias current is annotated as 𝛼*Itail, where Itail is the drain current of transistor M0. The 
smaller 𝛼 is, the lower the noise, offset voltage and power consumption of the FCA are. However, 
𝛼 cannot be made arbitrarily small because it affects the frequency of the nondominant pole 
associated with node Vx as discussed earlier. Therefore, a proper value of 𝛼 must be selected. 
In this design, 𝛼=1/12.  
 
Figure 5.8: Schematic of the proposed FCA with a new turn-around stage 
[Schematic labels: fold cascode amplifier core (M0~M10) and additional turn-around stage (M11, M13~M15, M17~M20); bias voltages Vb1~Vb3; nodes Vx, Vy, ① and ②; bias currents Itail and 𝛼*Itail; mirror ratios 1:n, m:1 and A3:A4; "small and large signal path" and "large signal path only" (I=0) markings]
In the proposed FCA, there are two signal paths from the FCA’s inputs to output. The first 
signal path, as shown by the blue lines, always conducts signal current to the output node 
whenever a differential input voltage exists. But the second signal path, as marked by the red 
lines, is activated only when Vid>Von or ∆Vx >∆Vx,on. Vid is the differential input voltage. ∆Vx 
is the voltage change at node Vx upon application of Vid at the input pair. Von and ∆Vx,on are 
respectively the threshold values of Vid and ∆Vx required to activate the turn-around stage. 
The details about the workings of the signal paths are discussed below.   
Transistor M13 is designed to be twice the size of transistor M8 but with the same bias 
current. As a result, M13 works in the triode region in quiescent operation, which leads to 
a low drain-source voltage for M13, so Vy approximates Vx. When the DC bias voltage 
of Vx is kept below transistor M14’s threshold voltage, transistor M14 works in the cutoff 
region, so the turn-around stage is also off in quiescent operation.  
However, upon application of a positive differential input signal, Vid, the source voltage of 
transistor M13 increases by ∆Vx. Transistor M13 stays in the triode region and the turn-around 
stage remains off until Vid and ∆Vx become as large as Von and ∆Vx,on respectively. 
When Vid=Von and ∆Vx =∆Vx,on, the operation region of transistor M13 moves from the triode 
region to the saturation region. Once transistor M13 works in the saturation region, any Vid>Von 
quickly raises the gate voltage of M14 and turns on the turn-around stage. Therefore, the 
boundary between the enabling and disabling of the turn-around stage can be approximately 
marked by the transition of M13’s operation region from the triode region to the saturation region. 
At the transition point, the drain currents of M8 and M13 are respectively expressed as (5-6) 
and (5-7), where β8 = μnCoxW8/L8 and β13 = μnCoxW13/L13. Also, Vod8 and ∆Id8 are 
respectively M8’s overdrive voltage and drain current change. By dividing (5-6) by (5-7) and 
substituting β13 = 2β8, it is found that ∆Id8 = −(α/2)*Itail = −(1/24)*Itail and 
∆Vx,on = (1 − √2/2)*Vod8 ≈ 0.3Vod8. At the transition point, M14 is still off and the drain current 
change of M8 comes from the input differential pair. Therefore, the input referred turn-on voltage, 
Von, for the turn-around stage is derived as (5-8) by solving the KCL equation at M13’s source node. 
In (5-8), gm1 and Vod1 are respectively the transconductance and overdrive voltage of transistor 
M1. Also, A4 and A3 are respectively the aspect ratios of transistors M4 and M3. In this design, 
A4/A3=8/7. Therefore, Von is found to be about 3mV, assuming that Vod1 is in the neighborhood 
of 70~80mV.  
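$$0.5\,\beta_8 \left(V_{od8} - \Delta V_{x,on}\right)^2 = \alpha I_{tail} + \Delta I_{d8} \tag{5-6}$$

$$0.5\,\beta_{13} \left(V_{od8} - \Delta V_{x,on}\right)^2 = \alpha I_{tail} \tag{5-7}$$

$$V_{on} = -\frac{\Delta I_{d8}}{\dfrac{g_{m1}}{2}\left(1 + \dfrac{A_4}{A_3}\right)} = \frac{\dfrac{1}{24} I_{tail}}{\dfrac{I_{tail}}{2 V_{od1}}\left(\dfrac{A_4}{A_3} + 1\right)} = \frac{V_{od1}}{12\left(\dfrac{A_4}{A_3} + 1\right)} \tag{5-8}$$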
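As a quick numerical check of (5-6) to (5-8), the turn-on thresholds can be evaluated directly. The sketch below assumes square-law devices with gm1 = Itail/Vod1 (M1 carrying Itail/2), as implied by (5-8). The function name is illustrative, and the value Vod8 = 100mV is a hypothetical placeholder because the text does not state M8's overdrive voltage; Vod1 = 75mV is taken from the middle of the 70~80mV range quoted above.

```python
import math


def turn_on_thresholds(alpha, a4_over_a3, v_od1, v_od8):
    """Evaluate the turn-on conditions of the turn-around stage.

    Returns (dId8 / Itail, dVx_on, Von) following (5-6), (5-7) and (5-8).
    """
    # Dividing (5-6) by (5-7) with beta13 = 2 * beta8 gives dId8 = -(alpha / 2) * Itail.
    d_id8_over_itail = -alpha / 2.0
    # M8's quiescent current alpha * Itail = 0.5 * beta8 * Vod8^2, combined with (5-7).
    d_vx_on = (1.0 - math.sqrt(2.0) / 2.0) * v_od8
    # (5-8) with gm1 = Itail / Vod1; Itail cancels out.
    v_on = -d_id8_over_itail * 2.0 * v_od1 / (1.0 + a4_over_a3)
    return d_id8_over_itail, d_vx_on, v_on


# alpha and A4/A3 are the design values from the text; v_od8 is hypothetical.
d_id8, d_vx_on, v_on = turn_on_thresholds(
    alpha=1.0 / 12.0, a4_over_a3=8.0 / 7.0, v_od1=0.075, v_od8=0.100
)
print(f"dId8/Itail = {d_id8:.4f}")
print(f"dVx_on     = {d_vx_on * 1e3:.1f} mV")
print(f"Von        = {v_on * 1e3:.2f} mV")
```

With Vod1 between 70mV and 80mV, the computed Von stays close to the 3mV quoted in the text.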
When Vid increases to a point that Vid>Von, transistor M14 turns on, transistor M13 works in 
the saturation region, and the negative feedback loop formed by M11 and M13-M14 is 
activated. As a result, ∆Vx stays at ∆Vx,on regardless of the differential current from M1 and 
M2, Idm, because the negative feedback loop makes M14 compensate Idm. Therefore, in the 
positive slewing phase, the drain currents of M8 and M14 respectively become 0.5α ∗ Itail and 
Itail(1 − α ∗ A4/A3 + 0.5α). The drain current of M14 is then amplified four times and passed to 
the output to charge the load capacitance. This enhances the positive slew rate of the FCA. 
Once the FCA’s output voltage rises to a point that Vid<Von, the FCA’s turn-around stage 
is deactivated and M13 returns to the triode region.  
As noted in the above operation, transistor M8 always holds half of its quiescent bias current 
in the positive slewing phase, which prevents M8 from ever turning off and keeps the voltage 
change at Vx very small. As a result, the input transistor M1 does not work in the triode 
region in the slewing phase. Therefore, although the proposed FCA has an extremely small 
cascode bias current, it does not require a long time to recover after the slewing phase 
completes, since a long recovery time is generally caused by either transistor M8 working in 
the cutoff region or transistor M1 working in the triode region.  
In the negative slewing phase, transistor M2 steers all the tail current into transistor M3, and 
then transistor M4 passes the mirrored current to discharge the load capacitor via transistor M8. 
In this slewing phase, the drain currents of transistors M8 and M10 are [A4/A3*(𝛼+1)- 𝛼]*Itail and 
𝛼*Itail respectively, which results in a net discharging current of [A4/A3*(𝛼+1)- 2𝛼]*Itail from the 
load capacitor. The discharging current is slightly larger than that of the conventional FCA. 
The conventional FCA’s discharging current is Itail when its cascode bias current is larger than 
0.5*Itail.   
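The slewing currents stated above can be tabulated for the design values. The following is a minimal sketch that only evaluates the expressions given in the text; the function name, the dictionary keys and the parameter mirror_gain (set to 4, following the statement that M14's current is amplified four times) are illustrative. It does not model the remaining small contributions at the output node, so it is not a slew-rate prediction.

```python
def slewing_currents(alpha, a4_over_a3, i_tail, mirror_gain=4.0):
    """Evaluate the slewing-phase branch currents quoted in the text (in amperes)."""
    # Positive slewing phase.
    i_m8_pos = 0.5 * alpha * i_tail
    i_m14_pos = i_tail * (1.0 - alpha * a4_over_a3 + 0.5 * alpha)
    i_mirrored_pos = mirror_gain * i_m14_pos  # current delivered by the turn-around mirror
    # Negative slewing phase.
    i_m8_neg = (a4_over_a3 * (alpha + 1.0) - alpha) * i_tail
    i_m10_neg = alpha * i_tail
    i_discharge = i_m8_neg - i_m10_neg  # equals [A4/A3*(alpha+1) - 2*alpha] * Itail
    return {
        "i_m8_pos": i_m8_pos,
        "i_m14_pos": i_m14_pos,
        "i_mirrored_pos": i_mirrored_pos,
        "i_m8_neg": i_m8_neg,
        "i_m10_neg": i_m10_neg,
        "i_discharge": i_discharge,
    }


# Design values from the text: alpha = 1/12, A4/A3 = 8/7, Itail = 1.5 uA.
currents = slewing_currents(alpha=1.0 / 12.0, a4_over_a3=8.0 / 7.0, i_tail=1.5e-6)
for name, value in currents.items():
    print(f"{name:>15s} = {value * 1e6:.3f} uA")
```

For these values the net discharging current is about 1.07*Itail, consistent with the statement that it is slightly larger than the conventional FCA's Itail.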
5.3.3.2 Frequency Response Analysis 
 
Figure 5.9: Small signal block diagram of the proposed FCA  
[Block diagram labels: inputs Vid/2 and -Vid/2; nodes V1, V2, Vx and Vo; transconductance blocks -gm1, gm7V1, -gm3, -gm4 and gm8Vx; conductances gds7 and gds8; node loads (g1, C1), (g2, C2), (gx, Cx) and (gL, CL)]
In order to understand the frequency response of the proposed FCA in Figure 5.8, its small 
signal block diagram is drawn in Figure 5.9. By writing KCL equations at nodes V1, V2, Vx 
and Vo, equations (5-9) to (5-12) can be obtained, where gmi is transistor Mi’s transconductance. 
gi and Ci are respectively the conductance and parasitic capacitance at node i. After solving these 
equations, the transfer function from the FCA’s inputs to its output is derived as (5-13), 
assuming that the transistors’ transconductances are much larger than their output conductances. Then 
(5-13) is rewritten as (5-14) and further simplified as (5-15) after substituting the expressions 
(5-16) into (5-14). The expressions of g1, g2, gx, gL, C1, C2 and Cx are shown in Table 5.2.  
Table 5.2: Expressions of the conductance and capacitance in the proposed FCA 
Conductance                                Capacitance
g1=gds2+gds3                               C1≈Cdb2+Cgd2+Cdb3+Cgd3+Cgs7
g2≈gds5gds9/gm9                            C2≈Cgs3+Cgd3+Cgs4+Cgd4
gx≈gds1+gds4+gds11                         Cx≈Cdb1+Cgd1+Cdb4+Cgd4+Cgs8+Cgs13+Cgd14+Cgd14
gL≈gds6gds10/gm10+(gds1+gds4)gds8/gm8
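$$\frac{V_{id}}{2} g_{m1} + V_1 (g_1 + sC_1) + g_{m7} V_1 + g_{ds7}(V_1 - V_2) + V_2 g_{m3} = 0 \tag{5-9}$$

$$V_1 g_{m7} + (V_1 - V_2) g_{ds7} - V_2 (g_2 + sC_2) = 0 \tag{5-10}$$

$$V_2 g_{m4} + V_x (g_{m8} + g_{ds8} + g_x + sC_x) - V_o g_{ds8} - \frac{V_{id}}{2} g_{m1} = 0 \tag{5-11}$$

$$V_o (g_L + g_{ds8} + sC_L) = V_x (g_{m8} + g_{ds8}) \tag{5-12}$$

$$\frac{V_o}{V_{id}} \approx \frac{0.5\, g_{m1}\, g_{m8}}{(g_L + sC_L)(g_{m8} + sC_x)} \cdot \frac{s^2 C_1 C_2 + g_{m7} s C_2 + (g_{m3} + g_{m4}) g_{m7}}{s^2 C_1 C_2 + g_{m7} s C_2 + g_{m3} g_{m7}} \tag{5-13}$$

$$\frac{V_o}{V_{id}} = \frac{\dfrac{g_{m1}}{2 g_L}}{\left(1 + s\dfrac{C_L}{g_L}\right)\left(1 + s\dfrac{C_x}{g_{m8}}\right)} \cdot \frac{s^2 + \dfrac{g_{m7}}{C_1}s + \dfrac{(g_{m3} + g_{m4}) g_{m7}}{C_1 C_2}}{s^2 + \dfrac{g_{m7}}{C_1}s + \dfrac{g_{m3} g_{m7}}{C_1 C_2}} \tag{5-14}$$

$$\frac{V_o}{V_{id}} = \frac{\dfrac{g_{m1}}{2 g_L}}{\left(1 + s\dfrac{C_L}{g_L}\right)\left(1 + \dfrac{s}{k_2\, GBW}\right)} \cdot \frac{s^2 + k_2\, GBW\, s + k_1 k_2 (1 + k_3)\, GBW^2}{s^2 + k_2\, GBW\, s + k_1 k_2\, GBW^2} \tag{5-15}$$

$$k_1 = \frac{g_{m3}/C_2}{GBW}, \quad k_2 = \frac{g_{m7}/C_1}{GBW} = \frac{g_{m8}/C_x}{GBW}, \quad k_3 = \frac{g_{m4}}{g_{m3}}, \quad GBW = \frac{g_{m1}}{C_L} \cdot \frac{k_3 + 1}{2} \tag{5-16}$$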
As can be seen from (5-15), there are four poles and two zeros in the system. The locations 
of all the poles and zeros in the system are given by (5-17), (5-18), (5-19), (5-20), (5-21) and 
(5-22). Because the drain current of M7 is much smaller than that of M3, k1>k2 and 
(k2/2)² < k1k2, so the poles and zeros Pnd2, Pnd3, Znd1 and Znd2 are complex poles and zeros. 
The distribution of all the poles and zeros of the system in the S-plane is shown in Figure 5.10. 
The complex poles have a lower natural frequency and a lower Q-factor compared to the 
complex zeros. But the complex poles and zeros are close to each other, so the phase drop due 
to the complex poles and zeros is small in this design. The phase drop caused by the complex 
poles and zeros is calculated as (5-23).  
 
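Figure 5.10: Poles and zeros distribution of the proposed FCA 
[S-plane sketch with axes Re and Img, showing Pd1, Pnd1, Pnd2, Pnd3, Znd1 and Znd2]

$$P_{d1} = -\frac{g_L}{C_L} \tag{5-17}$$

$$P_{nd1} = -\frac{g_{m8}}{C_x} = -k_2\, GBW \tag{5-18}$$

$$P_{nd2} = -GBW \left(\frac{k_2}{2} - \sqrt{\left(\frac{k_2}{2}\right)^2 - k_1 k_2}\right) \tag{5-19}$$

$$P_{nd3} = -GBW \left(\frac{k_2}{2} + \sqrt{\left(\frac{k_2}{2}\right)^2 - k_1 k_2}\right) \tag{5-20}$$

$$Z_{nd1} = -GBW \left(\frac{k_2}{2} - \sqrt{\left(\frac{k_2}{2}\right)^2 - k_1 k_2 (1 + k_3)}\right) \tag{5-21}$$

$$Z_{nd2} = -GBW \left(\frac{k_2}{2} + \sqrt{\left(\frac{k_2}{2}\right)^2 - k_1 k_2 (1 + k_3)}\right) \tag{5-22}$$

$$\phi = -\tan^{-1}\left\{\frac{k_2}{k_1 k_2 - 1}\right\} + \tan^{-1}\left\{\frac{k_2}{k_1 k_2 (1 + k_3) - 1}\right\} \tag{5-23}$$

$$PM = 90 - \tan^{-1}\left(\frac{1}{k_2}\right) - \tan^{-1}\left\{\frac{k_2}{k_1 k_2 - 1}\right\} + \tan^{-1}\left\{\frac{k_2}{k_1 k_2 (1 + k_3) - 1}\right\} \tag{5-24}$$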
 
Figure 5.11: Phase drop due to complex poles and zeros vs. k1 and k2  
[Plot: complex pole and zero pairs' phase drop (axis range -10 to 0) vs. k2 (2 to 8); curves for k1=2*k2, 3*k2, 4*k2, 5*k2 and 6*k2]
 
Figure 5.12: The proposed FCA’s PM vs. k1 and k2 
[Plot: op amp phase margin (axis range 50 to 85) vs. k2 (2 to 8); curves for k1=2*k2, 3*k2, 4*k2, 5*k2 and 6*k2]
The dependency of this phase drop on k2, the ratio of gm7/C1 to the FCA’s GBW, is also shown in 
Figure 5.11. As can be seen, the phase drop is less than 9° even when k2=2 and k1=2*k2=4. In 
this design, k1 and k2 are about 3.5 and 2. Therefore, the expected phase drop due to the 
complex poles and zeros is about 5°. The FCA’s phase margin is calculated as (5-24) and its 
dependency on k2 is shown in Figure 5.12. As can be seen from 
Figure 5.12, the expected phase margin of the proposed FCA is about 70° at k1=3.5 and k2=2.  
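Expressions (5-23) and (5-24) are easy to evaluate numerically, for example to reproduce one point of the k1=2*k2 curve. The sketch below is a direct transcription of the two formulas, valid when k1*k2>1 so that both arctangent arguments are positive; the function names are illustrative, and k3=8/7 is taken from the A4/A3 ratio of this design under the assumption that gm4/gm3 equals the mirror ratio.

```python
import math


def phase_drop_deg(k1, k2, k3):
    """Phase contribution (degrees) of the complex pole and zero pairs at GBW, per (5-23)."""
    pole_pair = math.degrees(math.atan(k2 / (k1 * k2 - 1.0)))
    zero_pair = math.degrees(math.atan(k2 / (k1 * k2 * (1.0 + k3) - 1.0)))
    return -pole_pair + zero_pair


def phase_margin_deg(k1, k2, k3):
    """Phase margin (degrees) per (5-24)."""
    real_pole = math.degrees(math.atan(1.0 / k2))  # nondominant pole at k2 * GBW
    return 90.0 - real_pole + phase_drop_deg(k1, k2, k3)


# Corner case quoted in the text: k2 = 2 and k1 = 2 * k2 = 4.
k3 = 8.0 / 7.0
print(f"phase drop   = {phase_drop_deg(4.0, 2.0, k3):.2f} degrees")
print(f"phase margin = {phase_margin_deg(4.0, 2.0, k3):.2f} degrees")
```

Sweeping k2 for a fixed ratio k1/k2 with these two functions gives curves of the same kind as those plotted in Figure 5.11 and Figure 5.12.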
5.3.3.3 Noise Analysis 
 
Figure 5.13: Noise model for the proposed FCA 
[Schematic labels: transistors M0~M11 and M13; bias voltages Vb1~Vb3; nodes Vx, Vy, ① and ②; bias currents Itail and 𝛼*Itail; noise voltage sources en1², en2², en3², en4², en5², en6² and en11²]
As the proposed FCA’s bias current for the cascode stage is smaller than that of the conventional 
fast FCA, it is of interest to analyze its impact on the noise of the proposed FCA. The noise model 
of the proposed FCA is shown in Figure 5.13 after neglecting the noise contributed by the 
cascode transistors and the transistors working in the cutoff region. The FCA’s output current 
noise is derived as (5-25), where a transistor’s voltage noise power is expressed as (5-26). The 
terms 8KT/(3gmi) and Kf/(WiLiCoxf) in (5-26) respectively represent thermal and flicker noise. 
The transistors in current mirrors are typically sized to have the same length and current density. 
Consequently, their widths and transconductances scale linearly with their bias currents. Therefore, 
their voltage noise power is inversely proportional to their bias currents, whereas their current 
noise power is linearly proportional to their bias currents, as shown in (5-26) and (5-27). As a result, 
the noise expression in (5-28) can be established. After plugging (5-28) into (5-25), 
equation (5-25) is simplified as (5-29). Equation (5-29) is further simplified as (5-30) by 
neglecting In5²(k3 − 1)²/(4α) because this term is much smaller than In5²(k3² + 2). Therefore, the 
input referred voltage noise power of the FCA is derived as (5-31).  
 
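$$I_{no,prop}^2 \approx \frac{g_{m0}^2 e_{n0}^2}{4}\left(\frac{g_{m4}}{g_{m3}} - 1\right)^2 + \left(g_{m3}^2 e_{n3}^2 + g_{m2}^2 e_{n2}^2 + g_{m5}^2 e_{n5}^2\right)\frac{g_{m4}^2}{g_{m3}^2} + \left(g_{m4}^2 e_{n4}^2 + g_{m1}^2 e_{n1}^2 + g_{m6}^2 e_{n6}^2 + g_{m11}^2 e_{n11}^2\right) \tag{5-25}$$

$$\frac{e_{ni}^2}{\Delta f} = \frac{8KT}{3 g_{mi}} + \frac{K_f}{W_i L_i C_{ox} f} \propto \frac{1}{I_{bias}} \tag{5-26}$$

$$I_{ni}^2 = \frac{e_{ni}^2 g_{mi}^2}{\Delta f} = \frac{g_{mi} \cdot 8KT}{3} + \frac{g_{mi}^2 K_f}{W_i L_i C_{ox} f} \propto I_{bias} \tag{5-27}$$

$$I_{n5}^2 = I_{n6}^2 = I_{n11}^2 = \alpha I_{n0}^2; \quad I_{n1}^2 = I_{n2}^2; \quad I_{n3}^2 = I_{n4}^2 / k_3 \tag{5-28}$$

$$I_{no,prop}^2 \approx \frac{I_{n5}^2}{4\alpha}(k_3 - 1)^2 + \left(I_{n3}^2 + I_{n1}^2 + I_{n5}^2\right) k_3^2 + \left(k_3 I_{n3}^2 + I_{n1}^2 + 2 I_{n5}^2\right) \tag{5-29}$$

$$I_{no,prop}^2 \approx I_{n1}^2 (1 + k_3^2) + I_{n3}^2 (k_3 + k_3^2) + I_{n5}^2 (2 + k_3^2) \tag{5-30}$$

$$V_{ni}^2 = \frac{I_{no,prop}^2}{\left[0.5\, g_{m1} (k_3 + 1)\right]^2} = \frac{I_{n1}^2 (1 + k_3^2) + I_{n3}^2 (k_3 + k_3^2) + I_{n5}^2 (2 + k_3^2)}{0.5\, g_{m1} (k_3 + 1) \cdot GBW \cdot C_L} \tag{5-31}$$

$$V_{no}^2 = V_{ni}^2 \cdot \frac{\pi\, GBW}{2 \cdot 2\pi} = \frac{I_{n1}^2 (1 + k_3^2) + I_{n3}^2 (k_3 + k_3^2) + I_{n5}^2 (2 + k_3^2)}{2 g_{m1} (k_3 + 1) C_L} \tag{5-32}$$

$$V_{no,thermal}^2 = \frac{4KT}{3} \cdot \frac{(1 + k_3^2) + a (k_3 + k_3^2) + b (2 + k_3^2)}{(k_3 + 1) C_L} \approx \frac{2KT}{C_L} \tag{5-33}$$

$$V_{no,thermal,conv}^2 = \frac{4KT}{3} \cdot \frac{2 + 2 g_{m3}'/g_{m1} + 2 g_{m5}'/g_{m1}}{2 C_L} = \frac{4KT}{3 C_L}\left[1 + a \cdot \frac{r + 0.5}{\alpha + 0.5} + b \cdot \frac{r}{\alpha}\right] = \frac{3.2KT}{C_L} \tag{5-34}$$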
When the proposed FCA is placed in a noninverting unity gain buffer configuration, the equivalent 
rectangular noise bandwidth of the FCA is π/2*GBW/(2π)=GBW/4, where 
GBW = 0.5gm1(k3 + 1)/CL. Therefore, the in-band output referred voltage noise power is calculated 
as (5-32), in which the dominant noise source for a wideband FCA is the thermal noise. The 
in-band thermal noise is calculated as (5-33). This equation suggests that a, b and k3 should be 
minimized to minimize the in-band thermal noise for a given load capacitor. In this design, 
a=gm3/gm1=0.4, b=gm5/gm1=0.07 and k3=gm4/gm3=8/7. As a result, the proposed FCA’s in-band 
thermal noise is calculated as about 2KT/CL, or 91uV at T=300K and CL=1pF, after plugging a, b and 
k3 into (5-33). 
Similarly, the thermal noise of the conventional FCA counterpart in Figure 5.5(b) is found 
as (5-34), where gm3’ and gm5’ are respectively the transconductances of transistors M3 and M5 
in the conventional FCA counterpart. With a typical bias current of r*Itail=0.67*Itail for the 
conventional FCA’s cascode stage, it can be found that gm3’/gm1 = a*(r+0.5)/(α+0.5) and 
gm5’/gm1 = b*r/α. As a 
result, the integrated thermal noise voltage of the conventional FCA is obtained as about 3.2KT/CL, 
or 115uV at T=300K and CL=1pF, after plugging in a=0.4, b=0.07, 𝛼=1/12, and r=0.67. 
Therefore, compared to the conventional FCA, the proposed FCA is expected to reduce the 
in-band noise voltage by 21% or 2.03dB.  
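The numbers in this comparison can be reproduced from (5-33) and (5-34). The following is a minimal sketch; the function names are illustrative, K is taken as the Boltzmann constant, and the design values a, b, k3, 𝛼 and r are those quoted in the text. The coefficients returned by the first function multiply KT/CL.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K


def thermal_noise_coefficients(a, b, k3, alpha, r):
    """Return the multipliers of KT/CL for the proposed (5-33) and conventional (5-34) FCAs."""
    proposed = (4.0 / 3.0) * (
        (1.0 + k3**2) + a * (k3 + k3**2) + b * (2.0 + k3**2)
    ) / (k3 + 1.0)
    conventional = (4.0 / 3.0) * (
        1.0 + a * (r + 0.5) / (alpha + 0.5) + b * r / alpha
    )
    return proposed, conventional


def rms_noise(coefficient, temperature, c_load):
    """RMS noise voltage for a noise power of coefficient * KT / CL."""
    return math.sqrt(coefficient * K_BOLTZMANN * temperature / c_load)


prop, conv = thermal_noise_coefficients(a=0.4, b=0.07, k3=8.0 / 7.0, alpha=1.0 / 12.0, r=0.67)
print(f"proposed:     {prop:.2f} KT/CL")
print(f"conventional: {conv:.2f} KT/CL")

# RMS values for the rounded coefficients used in the text (2 and 3.2), T=300K, CL=1pF.
v_prop = rms_noise(2.0, 300.0, 1e-12)
v_conv = rms_noise(3.2, 300.0, 1e-12)
print(f"proposed:     {v_prop * 1e6:.1f} uV")
print(f"conventional: {v_conv * 1e6:.1f} uV")
print(f"reduction:    {(1.0 - v_prop / v_conv) * 100.0:.0f} %")
```

Evaluated without rounding, the two coefficients come out close to the 2 and 3.2 used in the text, and the rounded values give the quoted 91uV, 115uV and 21% reduction.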
5.3.3.4 Offset Voltage Analysis 
The variances of transistor Mi’s threshold voltage and ∆βi/βi are expressed as (5-35), where 
βi=µCoxWi/Li. In addition, Athi² and Aβi² are the mismatch coefficients, fixed parameters for a given 
process, of transistor Mi’s threshold voltage and feature sizes. Transistor Mi’s drain current 
variation caused by its random mismatch is shown in (5-36), where Idi and Vodi are respectively 
transistor Mi’s quiescent current and overdrive voltage. Based on the sizing strategy of a 
fixed current density for transistor Mi, Equation (5-36) shows that Mi’s drain current variation 
is proportional to its bias current. The larger the bias current is, the larger the drain current 
variation is.  
The input referred offset voltage of a FCA can be analyzed in a similar manner to how noise 
is analyzed in Section 5.3.3.3. The proposed FCA’s output current variation caused by the 
mismatches of transistors M1-M6 and M11 is derived as (5-37). Therefore, its input 
referred offset voltage, Vos,prop, is calculated as (5-38). In (5-38), c = Ios3²/Ios1² and 
d = Ios5²/Ios1². Similarly, the input referred offset voltage for the conventional FCA, Vos,conv, in 
Figure 5.5(b) is calculated as (5-39), in which r=0.67 and 𝛼=1/12. Compared to Vos,conv, it is 
clear that Vos,prop is smaller due to the reduced offset contribution from transistors M3 and M5. 
This is also confirmed by the Monte Carlo simulation results shown below.  
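$$\sigma_{vthi}^2 = \frac{A_{thi}^2}{W_i L_i}, \qquad \sigma^2\left(\frac{\Delta\beta_i}{\beta_i}\right) = \frac{A_{\beta i}^2}{W_i L_i} \tag{5-35}$$

$$I_{osi}^2 = \sigma_{vthi}^2\, g_{mi}^2 + \sigma^2\left(\frac{\Delta\beta_i}{\beta_i}\right) I_{di}^2 = \frac{\left(A_{\beta i}^2 V_{odi}^2 + 4 A_{thi}^2\right) I_{di}^2}{W_i L_i V_{odi}^2} \propto \frac{I_{di}^2}{W_i} \propto I_{di} \tag{5-36}$$

$$I_{os,out}^2 = I_{os1}^2 (1 + k_3^2) + I_{os3}^2 (k_3 + k_3^2) + I_{os5}^2 (2 + k_3^2) \tag{5-37}$$

$$V_{os,prop}^2 = \frac{I_{os1}^2 \left[(1 + k_3^2) + c\,(k_3 + k_3^2) + d\,(2 + k_3^2)\right]}{\left[0.5\, g_{m1} (k_3 + 1)\right]^2} \approx \frac{2 I_{os1}^2}{g_{m1}^2}(1 + c + 1.5d); \quad c = \frac{I_{os3}^2}{I_{os1}^2}; \quad d = \frac{I_{os5}^2}{I_{os1}^2} \tag{5-38}$$

$$V_{os,conv}^2 = \frac{2\left(I_{os1}^2 + I_{os3,conv}^2 + I_{os5,conv}^2\right)}{g_{m1}^2} = \frac{2 I_{os1}^2}{g_{m1}^2}\left(1 + c \cdot \frac{r + 0.5}{\alpha + 0.5} + d \cdot \frac{r}{\alpha}\right) \tag{5-39}$$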
5.4. Simulation Results for Proposed FCA vs. Conventional Fast FCA 
In order to confirm the effectiveness and robustness of the improved current utilization 
efficiency brought by the proposed FCA, two design examples are implemented in the 180nm 
CMOS process. The first design example is the conventional (conv.) fast FCA shown in Figure 
5.5(b). The second design example is the proposed (prop.) FCA shown in Figure 5.8. Extensive 
simulations under various process corner variations, mismatch variations and process corner 
plus mismatch variations are conducted to compare the two design examples. The purposes of 
the simulations are twofold: a) to verify that the proposed FCA largely improves the FCA’s 
current utilization efficiency (CUE); and b) to verify that noise, offset voltage, and gain are 
also improved as byproducts from improvement in the FCA’s CUE.  
All the simulation results below are collected with the design examples placed in a 
noninverting unity gain buffer configuration with a load capacitor of 1pF and a supply voltage 
of 1.8V. The nominal supply currents of the proposed and conventional op amps are respectively 
1.88µA and 3.5µA, with the same tail current of 1.5µA. 
5.4.1 Typical corner simulation results 
5.4.1.1 Frequency Response  
The frequency responses of the proposed and conventional FCAs are shown in Figure 5.14. 
The DC gain of the proposed FCA, 89.7dB, is about 6dB higher than that of the conventional 
FCA, 83.5dB. The two FCAs have almost the same GBW of 2MHz. The phase margins of the 
conventional and proposed FCAs are respectively 74° and 70°. The simulated PM of the 
proposed FCA agrees very well with the theoretical calculation in Section 5.3.3.2. The slight 
phase margin difference is caused by a much lower bias current in the proposed FCA’s cascode 
stage. In the two design examples, the cascode stage’s bias currents in the proposed and 
conventional FCAs are respectively 0.083 times and 0.67 times Itail.  
 
Figure 5.14: Frequency responses of the proposed and conventional FCAs 
[Plot: loop phase (degree) and loop gain (dB) vs. frequency (Hz) from 10^-2 to 10^8 Hz; traces: proposed FCA, conventional FCA]
 
5.4.1.2 Noise Performance 
The simulated noise performance of the two FCAs is shown in Figure 5.15. For example, 
the noise densities of the proposed and conventional FCAs at 100kHz are respectively 
68nV/sqrt(Hz) and 88.4nV/sqrt(Hz). The noise reduction of the proposed FCA is a natural 
byproduct of the bias current reduction in the cascode stage. The total integrated noise from 
0.01Hz to 2MHz (the FCA’s GBW) for the proposed and conventional FCAs is respectively 
93.2µV and 127.4µV. That is to say, compared with the conventional FCA, the proposed FCA 
reduces noise by 27%. 
 
Figure 5.15: Noise performance of the proposed and conventional FCAs 
[Plot: voltage noise density (V/sqrt(Hz), 0 to 1x10^-4) vs. frequency (Hz) from 10^-2 to 10^8 Hz; traces: proposed FCA, conventional FCA]
5.4.1.3 Transient Response  
Figure 5.16 shows the step responses of the two FCAs with an input step voltage of 0.6V. 
As expected, the positive slew rate (SR+) of the proposed FCA is larger than that of the conventional 
FCA due to its inclusion of a turn-around stage. The positive and negative slew rates (SR+ and 
SR-) of the proposed FCA are SR+prop =+5.84V/µs and SR-prop =-1.49V/µs, whereas those of 
the conventional FCA are SR+conv =+1.1V/µs and SR-conv =-1.34V/µs. That is to say, the 
positive and negative SR improvements brought by the proposed FCA are 5.3 times and 1.1 
times. The average SR of the proposed FCA is 3.67V/µs, about 3 times that of the conventional 
FCA. The simulated SR+ improvement is slightly higher than the calculated improvement factor 
of 4, due to channel length modulation effects of the current mirror M14-M15. The simulated SR- 
improvement matches very well with the theoretical calculation.  
 
Figure 5.16: Transient responses of the proposed and conventional FCAs 
[Plot: voltage (V) vs. time; traces: Prop. FCA, Conv. FCA, input voltage]
In addition, the settling times for the proposed and conventional FCAs are respectively 0.5µs and 
0.75µs with an accuracy of 0.1% (Ts_0.1%) and 0.72µs and 1.08µs with an accuracy of 0.01% (Ts_0.01%). 
Therefore, Ts_0.1% and Ts_0.01% of the proposed FCA are both shorter than 
those of the conventional FCA by 34%. These simulation results match the theoretical 
calculation results for 0.1% (7/GBW) and 0.01% (9/GBW) accuracy on settling time. This 
confirms that a long recovery time is not needed by the proposed FCA though its cascode bias 
current is much smaller than its tail current.  
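The 7/GBW and 9/GBW estimates can be checked with a single-pole model. The sketch below assumes that the unity gain buffer settles as a first-order system with time constant 1/(2π*GBW), with GBW in Hz, so that the factors 7 and 9 correspond to ln(1000)≈6.9 and ln(10000)≈9.2 with GBW expressed in rad/s. The function name is illustrative, and slewing and nondominant poles are ignored.

```python
import math


def single_pole_settling_time(gbw_hz, accuracy):
    """Settling time of a first-order unity-gain buffer to a relative error `accuracy`."""
    tau = 1.0 / (2.0 * math.pi * gbw_hz)  # closed-loop time constant
    return tau * math.log(1.0 / accuracy)


# GBW of the proposed FCA from Table 5.3.
gbw = 2.14e6
print(f"Ts_0.1%  = {single_pole_settling_time(gbw, 1e-3) * 1e6:.2f} us")
print(f"Ts_0.01% = {single_pole_settling_time(gbw, 1e-4) * 1e6:.2f} us")
```

The results are close to the simulated 0.5µs and 0.72µs of the proposed FCA, which supports the statement that its settling is dominated by small-signal behavior rather than by a long recovery phase.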
 
5.4.1.4 Performance Summary for Typical Corner Simulation 
The performance of the two design examples is summarized in Table 5.3. Both the 
proposed and conventional FCAs have the same tail current of 1.5µA, but the total bias currents 
of their cascode stages are respectively 0.38µA and 2µA. As a result, their supply currents are 
respectively 1.88µA and 3.5µA, and their current utilization efficiencies (CUE) are respectively 
80% and 42%, where CUE is defined as the ratio of the tail current to the FCA’s supply current. 
Therefore, compared with the conventional FCA, the proposed FCA increases the CUE by about 2 
times, enhances the average slew rate by 3 times, reduces Ts_0.1% and Ts_0.01% by 34%, 
and reduces the in-band noise by 27%.  
Compared with the conventional FCA, the proposed FCA improves the small signal figure 
of merit (FOMs) and the large signal figure of merit (FOML) by 2 times and 5.5 times 
respectively. The FOMs and FOML shown in (5-40) are used to compare op amps’ GBW and 
slew rate per unit supply current and have been used as a conventional measure to compare op 
amp performance. The general idea is that an op amp with a larger FOMs and FOML tends to 
work faster for a given supply current budget and a given load capacitor. However, because 
neither FOMs nor FOML captures the transition between the fastest large signal slewing and small 
signal settling, this general idea may not be valid in some cases. For example, some op amps 
with a slew rate enhancement (SRE) circuit have three slewing phases. In the first slewing 
phase, the SRE circuit is not activated. In the second slewing phase, the SRE circuit is turned 
on to enhance the slew rate. In the third phase, the SRE circuit is deactivated, followed by small 
signal settling. In the second slewing phase, some op amps may work in highly nonlinear 
regions, where the op amp’s internal voltages and currents deviate far from their quiescent 
values. As a result, a long recovery time may be 
needed to return the internal voltages and currents to their quiescent values. However, this 
long recovery time cannot be captured by either FOMs or FOML. Therefore, we propose a new 
figure of merit to compare op amps’ normalized settling time because a normalized settling 
time is the ultimate speed requirement for a system. The proposed figure of merit is also able 
to capture slow settling behavior such as long recovery times. Equation (5-41) shows the 
expression of the settling time figure of merit (FOMTs_x%) with a settling accuracy of x%, where 
Ts_x% is the op amp’s settling time with x% settling accuracy in a noninverting buffer 
configuration. The values of x can be 1, 0.1, 0.01 and 0.001 depending on the settling accuracy 
requirement of the targeted application. The larger the FOMTs_x% is, the faster the op amp is. 
Compared with the conventional FCA, the proposed FCA improves both FOMTs_0.1% and 
FOMTs_0.01% by 2.8 times.  
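$$FOM_s = \frac{GBW \cdot C_L}{I_{supply}}; \qquad FOM_L = \frac{SR \cdot C_L}{I_{supply}} \tag{5-40}$$

$$FOM_{Ts\_x\%} = \frac{C_L}{T_{s\_x\%} \cdot I_{supply}}, \quad x = 1, 0.1, 0.01 \ldots \tag{5-41}$$

$$FOM_{noise} = \frac{V_{ni,total}^2}{V_{ni,input\ pair}^2} \tag{5-42}$$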
In order to fairly compare the noise performance of op amps, we would also like to define a 
noise figure of merit, FOMnoise, whose expression is shown in (5-42). The purpose of FOMnoise 
is to identify the percentage of the integrated noise contribution from the input pair to the total 
integrated input referred noise. If all the noise of an op amp comes from the input pair, then 
FOMnoise=1. A larger FOMnoise represents more noise coming from devices other than the input 
pair, which signals poorer noise performance of an op amp. In the two designed FCAs, the 
FOMnoise of the proposed and conventional FCAs are respectively 2.6 and 5, meaning that the 
FOMnoise of the proposed FCA is improved by a factor of about 1.9 (5/2.6) compared with the 
conventional FCA. As discussed, this noise performance improvement is a natural byproduct of 
reducing the cascode stage’s bias current.   
Table 5.3: Performance summary of the proposed and conventional FCAs 
Output Unit Prop. Conv. 
GBW MHz 2.14 2.04 
PM degree 70 74 
DC Gain dB 89.69 83.5 
Isupply µA 1.88 3.5 
Iwaste µA 0.38 2 
Itail µA 1.5 1.5 
Iwaste/Itail % 25.26 133.5 
Current utilization efficiency (Itail/Isupply) % 80 42 
SR_avg V/µs 3.67 1.2 
0.1% Settling time @Vstep=0.6V µs 0.5 0.75 
0.01% Settling time @Vstep=0.6V µs 0.72 1.08 
Vni @ 100KHz nV/sqrt(Hz) 68.04 88.4 
Vni integrated to 2MHz µV 93.12 127.4 
FOMs pF*MHz/µA 1.14 0.56 
FOML pF*V/µA-µs 1.95 0.35 
FOMTs_0.1%   pF/µA-µs 1.07 0.38 
FOMTs_0.01%   pF/µA-µs 0.74 0.26 
FOMnoise (total noise/input pair noise) (V/V)^2 2.6 5 
CL pF 1 1 
Vsupply V 1.8 1.8 
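As a rough cross-check of the figures of merit in (5-40) and (5-41), the following minimal Python sketch recomputes them from the rounded typical-corner entries of Table 5.3. The class and function names are illustrative assumptions, not part of the original work. Because the inputs are rounded table entries, the recomputed values may differ from the tabulated ones in the last digit or two.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class OpAmpResult:
    """Typical-corner numbers copied from Table 5.3 (rounded values)."""
    gbw_mhz: float              # gain-bandwidth product, MHz
    sr_v_per_us: float          # average slew rate, V/us
    ts_us: Dict[float, float]   # settling accuracy (percent) -> settling time (us)
    cl_pf: float                # load capacitance, pF
    isupply_ua: float           # supply current, uA


def fom_s(a: OpAmpResult) -> float:
    """Small-signal figure of merit, (5-40): GBW * CL / Isupply."""
    return a.gbw_mhz * a.cl_pf / a.isupply_ua


def fom_l(a: OpAmpResult) -> float:
    """Large-signal figure of merit, (5-40): SR * CL / Isupply."""
    return a.sr_v_per_us * a.cl_pf / a.isupply_ua


def fom_ts(a: OpAmpResult, accuracy_pct: float) -> float:
    """Settling-time figure of merit, (5-41): CL / (Ts_x% * Isupply)."""
    return a.cl_pf / (a.ts_us[accuracy_pct] * a.isupply_ua)


# Hypothetical variable names; the numbers are the Table 5.3 entries.
PROP = OpAmpResult(2.14, 3.67, {0.1: 0.5, 0.01: 0.72}, 1.0, 1.88)
CONV = OpAmpResult(2.04, 1.2, {0.1: 0.75, 0.01: 1.08}, 1.0, 3.5)

if __name__ == "__main__":
    for name, amp in (("proposed", PROP), ("conventional", CONV)):
        print(name,
              round(fom_s(amp), 3),
              round(fom_l(amp), 3),
              round(fom_ts(amp, 0.1), 3),
              round(fom_ts(amp, 0.01), 3))
    # Ratio of the settling-time figures of merit (proposed / conventional).
    print(round(fom_ts(PROP, 0.1) / fom_ts(CONV, 0.1), 2))
```

With these inputs, the ratio of the settling-time figures of merit comes out close to the 2.8 times quoted in the text.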
 
5.4.2 Process corner and temperature variation simulation results 
In this section, the two designed FCAs are simulated under process corner and temperature 
(P.T.) variations, with temperatures ranging from -40°C to 85°C. The purposes of the 
simulations are twofold: a) to verify the robustness of the proposed FCA under P.T. variations; 
and b) to confirm the advantages of the proposed FCA under P.T. variations. Simulations of the 
designed FCAs are set up to cover frequency response, transient response and noise 
performance, since these are the characteristics commonly affected by P.T. variations. The 
independent process corner variations and temperature variations are listed in Table 5.4. In 
total, there are 25 simulation setups: 1 typical corner and 24 P.T. combinations (4 low-Vth 
corners, 2 high-Vth corners and 3 temperatures).  
Table 5.4: Simulation setup with process corner and temperature variation  
Parameter Typical Corners 
Temperature 27°C -40°C, 27°C and 85°C 
Low Vth MOS tntp snsp, snwp, wnsp, wnwp 
High Vth MOS tntp snsp, wnwp 
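The count of 25 setups follows from combining the independent entries of Table 5.4. A minimal Python sketch of that enumeration is given below. The dictionary keys and function name are illustrative assumptions, and the corner labels are treated as opaque strings copied from the table.

```python
from itertools import product

# Entries copied from Table 5.4; the corner labels are used as opaque strings.
TEMPERATURES_C = (-40, 27, 85)
LOW_VTH_CORNERS = ("snsp", "snwp", "wnsp", "wnwp")
HIGH_VTH_CORNERS = ("snsp", "wnwp")
TYPICAL = {"temp_c": 27, "low_vth": "tntp", "high_vth": "tntp"}


def build_setups():
    """Return the typical setup followed by every P.T. combination."""
    setups = [dict(TYPICAL)]
    for temp_c, low, high in product(TEMPERATURES_C, LOW_VTH_CORNERS, HIGH_VTH_CORNERS):
        setups.append({"temp_c": temp_c, "low_vth": low, "high_vth": high})
    return setups


setups = build_setups()

if __name__ == "__main__":
    print(len(setups))  # 1 typical + 3 * 4 * 2 combinations
    for s in setups[:3]:
        print(s)
```

Each dictionary could then be handed to whatever simulator wrapper is in use; that step is outside the scope of this sketch.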
5.4.2.1 Frequency Response  
  
Figure 5.17: Frequency responses of the two FCAs a) proposed b) conventional 
[Figure 5.17 panels: loop gain (dB) and loop phase (degree) versus frequency (Hz) for a) the proposed FCA and b) the conventional FCA under P.T. variation]
Figure 5.17 shows the frequency responses of the proposed and conventional FCAs under 
P.T. variations. The (min, typ, max) values of the proposed FCA’s simulated DC gain, phase 
margin (PM) and GBW are respectively (83dB, 89.7dB, 89.7dB), (66.4°, 70°, 72.5°) and 
(1.76MHz, 2.14MHz, 2.4MHz). On the other hand, the (min, typ, max) values of the 
conventional FCA’s simulated DC gain, PM and GBW are respectively (80dB, 83.5dB, 83.5dB), 
(73.7°, 74°, 74.5°) and (1.7MHz, 2.0MHz, 2.2MHz). The variations of DC gain, PM and GBW 
of both the conventional and proposed FCAs are small. The lowest DC gain of the proposed 
FCA occurs at the fast-NMOS corner when T=85°C. In this corner, transistor M14 in Figure 
5.8 is weakly on, which reduces the output impedance of the FCA. Nevertheless, 
the DC gain of the proposed FCA is always higher than that of the conventional FCA.  
5.4.2.2 Noise Performance  
 
Figure 5.18: Noise performance of the prop. and conv. FCAs under P.T. variation 
[Figure 5.18 panels: voltage noise density (V/sqrt(Hz)) versus frequency (Hz) for the proposed FCA and the conventional FCA under P.T. variation]
The simulated noise performance of the proposed and conventional FCAs under P.T. variations 
is shown in Figure 5.18. The input referred voltage noise densities of the proposed and 
conventional FCAs are respectively 55.7~87.5nV/sqrt(Hz) and 72.3~115nV/sqrt(Hz) at a 
frequency of 100KHz. The integrated noise from 0.01Hz to 2MHz of the proposed and 
conventional FCAs is respectively 76.7~118.9µV and 104.6~161.1µV. Therefore, both the 
minimum and maximum of the proposed FCA’s integrated noise are about 26% lower than 
those of the conventional FCA.  
5.4.2.3 Transient Response  
 
Figure 5.19: Transient responses of the prop. and conv. FCAs under P.T. variation 
[Figure 5.19 plot: voltage (V) versus time under P.T. variation; traces: input voltage, conv. group, prop. group]
The transient responses of the two FCAs are simulated in the noninverting unity gain buffer 
configuration with an input step voltage of 0.6V under P.T. variations. The simulation results 
are shown in Figure 5.19. Both the positive and negative slew rates of the proposed and 
conventional FCAs show a very small spread under P.T. variations. This indicates the 
robustness of the proposed FCA in its positive slew rate enhancement. This robustness over 
P.T. variations is expected because the tail current in the positive slewing phase is amplified 
by a well-defined current gain, and the amplified current is then passed to the load capacitor 
by the turn-around stage. The mean SRs of the proposed and conventional FCAs range over 
3.0~4.4V/µs and 1.2~1.24V/µs respectively. Also, the mean Ts_0.1% of the proposed and 
conventional FCAs ranges over 0.43~0.63µs and 0.72~0.78µs respectively. This clearly shows 
the speed advantage of the proposed FCA over the conventional FCA. It also shows that the 
proposed FCA does not suffer from any long recovery time under P.T. variations.  
5.4.2.4 Performance Summary for P.T. Variation 
The performance summary of the proposed and conventional FCAs under P.T. variations is 
shown in Table 5.5. Compared with the conventional FCA, the proposed FCA shows 
advantages in terms of GBW, DC gain, settling time and supply current under P.T. variations. 
This clearly demonstrates the advantages and robustness of the proposed FCA. 
Table 5.5: Performance summary of the prop. and conv. FCAs under P.T. variation 
  Proposed FCA Conventional FCA 
Output Unit Min Max Typ Min Max Typ 
GBW MHz 1.76 2.4 2.14 1.7 2.2 2.04 
PM ◦ 66.4 72.5 70 73.7 74.5 74 
DC Gain dB 83 89.7 89.7 80 83.5 83.5 
SR_avg V/µs 3.02 4.41 3.67 1.2 1.21 1.23 
Ts_0.1% @Vstep=0.6V µs 0.43 0.63 0.5 0.72 0.78 0.75 
Ts_0.01% @Vstep=0.6V µs 0.64 0.85 0.72 0.87 1.03 0.93 
Vni@ 100KHz nV/sqrt(Hz) 55.7 87.5 68.0 72.3 113.8 88.4 
Vno integrated to 2MHz µV 76.7 118.9 93.1 104.6 161.1 127.4 
FOMs pF*MHz/µA 0.94 1.28 1.14 0.48 0.63 0.56 
FOML pF*V/µA-µs 1.6 2.34 1.95 0.345 0.354 0.352 
FOMTs_0.1%   pF/µA-µs 0.85 1.23 1.07 0.37 0.4 0.38 
FOMTs_0.01%   pF/µA-µs 0.62 0.84 0.74 0.25 0.28 0.27 
Isupply µA 1.88 3.5 
Iwaste µA 0.38 2.0 
Itail µA 1.5 1.5 
CUE (Itail/Isupply) % 80 43 
CL pF 1.0 
Vsupply V 1.8 
Process   180nm CMOS 
5.4.3 Mismatch variation simulation results 
This section details the results of the two designed FCAs simulated under mismatch 
variations via a 500-run Monte Carlo simulation. The purposes of the simulations are twofold: 
a) to verify the robustness of the proposed FCA under mismatch variations; and b) to confirm 
the advantages of the proposed FCA under mismatch variations. The simulated performance 
includes transient response, offset voltage, DC gain and supply current.  
The simulated transient responses of the proposed and conventional FCAs under mismatch 
variations are shown in Figure 5.20. The slew rates of both the proposed and conventional 
FCAs show very small variations under device mismatch. The (mean, sigma) of the 
slew rate of the proposed and conventional FCAs are respectively (3.66V/µs, 0.433V/µs) and 
(1.23V/µs, 0.024V/µs). The (mean, sigma) of Ts_0.01% of the proposed and conventional 
FCAs are respectively (0.72µs, 0.02µs) and (1.08µs, 0.02µs). In addition to a faster speed, the 
proposed FCA also has a smaller random offset voltage. The (mean, sigma) of the offset 
voltages of the proposed and conventional FCAs are respectively (-0.14mV, 2.09mV) and 
(-0.14mV, 2.95mV). Therefore, the proposed FCA’s offset voltage is decreased by about 30%. 
The random mismatch has a negligible impact on the DC gain and supply current of the two 
FCAs. Both FCAs have a tail current of 1.5µA. As discussed before, Itail is determined by the 
GBW and noise specifications. Any supply current beyond Itail is considered the FCA’s 
wasted current, Iwaste. The normalized wasted current (Iwaste/Itail) of the proposed and 
conventional FCAs is respectively 25% and 133%. The CUE of the proposed and 
conventional FCAs is respectively 80% and 43%.  
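The wasted-current and CUE percentages quoted above follow directly from Isupply and Itail. A minimal Python sketch, using the rounded mean values from this section and a hypothetical helper name, is:

```python
def current_utilization(isupply_ua: float, itail_ua: float) -> dict:
    """Derive Iwaste, Iwaste/Itail and CUE = Itail/Isupply from the two currents."""
    iwaste_ua = isupply_ua - itail_ua  # any supply current beyond Itail counts as wasted
    return {
        "iwaste_ua": iwaste_ua,
        "waste_ratio_pct": 100.0 * iwaste_ua / itail_ua,
        "cue_pct": 100.0 * itail_ua / isupply_ua,
    }


# Rounded mean values quoted in the text (uA); the variable names are illustrative.
prop_util = current_utilization(isupply_ua=1.88, itail_ua=1.5)
conv_util = current_utilization(isupply_ua=3.5, itail_ua=1.5)

if __name__ == "__main__":
    for name, u in (("proposed", prop_util), ("conventional", conv_util)):
        print(name, {k: round(v, 2) for k, v in u.items()})
```

The results agree with the quoted 25% and 133% normalized wasted current and the 80% and 43% CUE to within rounding.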
Figure 5.20: Transient responses of the prop. and conv. FCAs under mismatch variation 
[Figure 5.20 plot: voltage (V) versus time under mismatch variation; traces: input voltage, conv. group, prop. group]
5.4.3.1 Performance Summary for Mismatch Variation Simulation 
The performance summary of the proposed and conventional FCAs under mismatch 
variations is shown in Table 5.6. Compared with the conventional FCA, the proposed FCA’s 
CUE is improved from 43% to 80% by significantly reducing its bias current in the cascode 
stage. Due to the significantly reduced bias current in the proposed FCA’s cascode stage, the 
integrated noise, offset and gain performance of the proposed FCA are also respectively 
improved by 27%, 29% and 6dB. More importantly, the average Ts_0.1% and Ts_0.01% are 
also both improved by 34% in the proposed FCA. The significant supply current reduction and 
moderate improvement in settling time, noise, offset voltage and DC gain under mismatch 
variations clearly demonstrate the advantages and robustness of the proposed FCA.  
Table 5.6: Performance summary of the prop. and conv. FCA under mismatch variation 
  Proposed  Conventional 
Output Unit Mean Stdev Mean Stdev 
Vos mV -0.14 2.092 -0.14 2.94 
GBW MHz 2.14 0.044 2.04 0.033 
PM degree 69.78 0.606 74 0.6 
DC Gain dB 89.68 0.069 83.5 0.9 
SR_avg V/µs 3.66 0.434 1.23 0.024 
Ts_0.1% µs 0.50 0.013 0.75 0.02 
Ts_0.01%  µs 0.72 0.021 1.08 0.02 
Vni @ 100KHz nV/sqrt(Hz) 68.1 0.535 88.5 1.85 
Vni integrated to 2MHz µV 93.22 0.633 127.6 1.88 
FOMs pF*MHz/µA 1.14 0.015 0.57 0.026 
FOML pF*V/µA-µs 1.95 0.233 0.35 0.015 
FOMTs_0.1%  pF/µA-µs 1.08 0.025 0.38 0.013 
FOMTs_0.01% pF/µA-µs 0.74 0.019 0.27 0.011 
Isupply µA 1.88 0.032 3.5 0.19 
Iwaste µA 0.38 0.005 2.0 0.177 
Itail µA 1.50 0.032 1.50 0.032 
Iwaste/Itail % 25.26 0.37 134 11.8 
CUE (Itail/Isupply) % 80 1.7 42.8 0.16 
CL pF 1.00 NA 1.00 NA 
Vsupply V 1.8 NA 1.8 NA 
Process  180nm CMOS 
5.4.4 Process corner plus mismatch variation simulation results 
In this section, the two designed FCAs are simulated under both process corner and 
mismatch (P.Mis) variations via a 500-run Monte Carlo simulation. The purposes of the 
simulations are twofold: a) to verify the robustness of the proposed FCA under P.Mis variations; 
and b) to confirm the advantages of the proposed FCA under P.Mis variations. The simulated 
performance discussed in this section is the transient response. The FOMs, FOML, FOMTs_0.1%, 
and FOMTs_0.01% of the FCAs are also reported.  
Figure 5.21: Transient responses of the prop. and conv. FCAs under P.Mis. variation  
[Figure 5.21 plot: voltage (V) versus time under process and mismatch variation; traces: input voltage, conv. group, prop. group]
Figure 5.21 shows the simulated transient responses of the proposed and conventional FCAs 
under P.Mis variations. Figure 5.22 and Figure 5.23 respectively show the histograms of the 
average Ts_0.01% of the proposed and conventional FCAs under P.Mis variations.  
The average SRs and settling times of both FCAs show normal distributions. The (mean, 
sigma) of the proposed and conventional FCAs’ SRs are respectively (3.61V/µs, 0.49V/µs) 
and (1.23V/µs, 0.024V/µs). The (mean, sigma) of the proposed and conventional FCAs’ 
Ts_0.1% are respectively (0.5µs, 0.014µs) and (0.75µs, 0.02µs). In addition, the (mean, sigma) 
of the proposed and conventional FCAs’ Ts_0.01% are respectively (0.72µs, 0.027µs) and 
(1.08µs, 0.021µs). The (mean, sigma) of the proposed and conventional FCAs’ offset voltages 
are respectively (-0.12mV, 2.2mV) and (0.11mV, 3.1mV). Most importantly, all of the 
performance improvements of the proposed FCA are achieved with a much smaller 
power consumption, about 53.7% of that of the conventional FCA. Consequently, the (mean, 
sigma) of the proposed and conventional FCAs’ FOMs are (1.14 pF*MHz/µA, 0.02 
pF*MHz/µA) and (0.56 pF*MHz/µA, 0.026 pF*MHz/µA). The (mean, sigma) of the two 
FCAs’ FOML are (1.92 pF*V/µA-µs, 0.26 pF*V/µA-µs) and (0.35 pF*V/µA-µs, 0.015 
pF*V/µA-µs). The (mean, sigma) of the two FCAs’ FOMTs_0.01% are (0.74 pF/µA-µs, 0.026 
pF/µA-µs) and (0.266 pF/µA-µs, 0.012 pF/µA-µs). Therefore, compared with the conventional 
FCA, the average improvements of FOMs, FOML and FOMTs_0.01% are respectively 2 times, 5.5 
times and 2.8 times.  
 
Figure 5.22: Average Ts_0.01% of the proposed FCA under P.Mis. variation 
[Figure 5.22 histogram: hits versus Ts_0.01% of the prop. FCA]
 
Figure 5.23: Average Ts_0.01% of the conventional FCA under P.Mis. variation 
[Figure 5.23 histogram: hits versus Ts_0.01% of the conv. FCA]
 
5.4.4.1 Performance Summary for P.Mis Variation 
The performance summary of the proposed and conventional FCAs is shown in Table 5.7. 
Compared with the conventional FCA, the proposed FCA not only reduces its power 
consumption but also improves its settling, noise, offset and DC gain performance under P.Mis 
variations. In addition, the superiority of the proposed FCA is robust under process corner 
and random device mismatch variations.  
Table 5.7: Performance summary of the prop. and conv. FCA under P.Mis variation 
  Proposed  Conventional 
Output Unit Mean Stdev Mean Stdev 
Vos mV -0.12 2.175 0.11 3.11 
GBW MHz 2.14 0.044 1.98 0.034 
PM degree 69.77 0.639 73.8 0.64 
DC Gain dB 89.44 0.510 82 3.4 
SR_avg V/µs 3.61 0.494 1.23 0.024 
Ts_0.1% µs 0.50 0.014 0.75 0.02 
Ts_0.01% µs 0.72 0.027 1.08 0.02 
Vni @ 100KHz nV/sqrt(Hz) 68.07 0.777 88.6 1.9 
Vno integrated to 2MHz µV 93.20 0.947 128 2.0 
FOMs pF*MHz/µA 1.14 0.019 0.56 0.006 
FOML pF*V/µA-µs 1.92 0.262 0.35 0.003 
FOMTs_0.1%  pF/µA-µs 1.08 0.028 0.38 0.003 
FOMTs_0.01% pF/µA-µs 0.73 0.022 0.265 0.004 
Isupply µA 1.88 0.028 3.5 0.19 
Iwaste µA 0.38 0.005 2.0 0.177 
Itail µA 1.50 0.028 1.50 0.028 
Iwaste/Itail % 25.25 0.362 134.9 11.7 
CUE (Itail/Isupply) % 80 1.7 42.8 0.16 
CL pF 1.00 NA 1.00 NA 
Vsupply V 1.8 NA 1.8 NA 
Process  180nm CMOS 
5.5. Performance Comparison of This Work with the Literature 
Table 5.8 summarizes the performance of the proposed FCA compared with the conventional 
FCA and [6] in a typical corner at room temperature. Compared with [6] and the conventional 
FCA, the proposed FCA reduces its Iwaste by 2.89 times and 5.3 times, which consequently 
increases its CUE by 1.33 times and 1.9 times respectively. In addition, as byproducts of 
minimizing the bias current of the proposed FCA’s cascode stage, its random offset voltage is 
reduced to 82.5% of [6] and 73.8% of the conventional FCA. Similarly, the proposed FCA’s 
integrated noise from 0.01Hz to 2MHz is reduced to 82.5% of [6] and 72.9% of the 
conventional FCA.  
Table 5.8: Performance comparison of the proposed FCA to the state-of-the-art method 
and the conventional FCA 
Output Unit This work [6] Conv. FCA 
Vos mV 2.18 2.64 2.95 
GBW MHz 2.14 2.2 2.0 
PM degree 70 70 74 
DC Gain dB 89.7 92.75 83.5 
Isupply µA 1.88 2.6 3.5 
Iwaste µA 0.38 1.1 2.0 
Itail µA 1.50 1.50 1.50 
Iwaste/Itail % 25.25 73.3 133.3 
CUE (Itail/Isupply) % 80 60 43 
SR_avg V/µs 3.66 1.355 1.23 
Ts_0.1% µs 0.50 0.855 0.75 
Ts_0.01% µs 0.72 1.6 1.08 
Vni @ 100KHz nV/sqrt(Hz) 68.07 82.2 88.5 
Vno integrated to 2MHz µV 93.20 113.0 127.5 
FOMs pF*MHz/µA 1.14 0.87 0.565 
FOML pF*V/µA-µs 1.92 0.52 0.352 
FOMTs_0.1% pF/µA-µs 1.08 0.449 0.38 
FOMTs_0.01% pF/µA-µs 0.73 0.24 0.268 
FOMnoise (V/V)^2 2.6 3.6 5.0 
CL pF 1.00 1.00 1.00 
Process  180nm CMOS 
Moreover, Ts_0.1% of the proposed FCA is reduced to 58% of [6] and 66.7% of the 
conventional FCA, whereas Ts_0.01% of the proposed FCA is reduced to 45.0% of [6] and 
67.3% of the conventional FCA. The proposed FCA’s settling time is much shorter than that 
of [6] because the proposed FCA completely eliminates the long recovery time after the 
slewing phase completes, whereas [6] does not when Ib<Itail/4 in Figure 5.2. The complex 
frequency compensation of [6] also degrades its settling time performance. Unlike [6], 
the proposed FCA does not need any frequency compensation. This not only makes the FCA 
design much simpler but also saves a considerable amount of area that would otherwise be 
consumed by compensation capacitors and resistors. In terms of figures of merit, in comparison 
with [6], the proposed FCA increases FOMs, FOML, FOMTs_0.1% and FOMTs_0.01% by 1.31, 3.69, 
2.4 and 3.04 times. Compared with the conventional FCA, the proposed FCA improves FOMs, 
FOML, FOMTs_0.1% and FOMTs_0.01% by 2.0, 5.5, 2.86 and 2.7 times. In addition, the FOMnoise 
of the proposed FCA is also reduced to 71% of [6] and 52% of the conventional FCA.  
The simultaneous performance improvement in CUE and settling time by the proposed FCA 
demonstrates its clear advantages over [6] and the conventional FCA.  
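The improvement factors discussed in this section are ratios of the figure-of-merit entries in Table 5.8. The following Python sketch recomputes them from the rounded table values. The dictionary layout and the `improvement` helper are illustrative assumptions, and small differences from the quoted factors can arise from rounding in the table.

```python
# Figure-of-merit entries copied from Table 5.8 (rounded values).
TABLE_5_8 = {
    "FOMs":        {"this_work": 1.14, "ref6": 0.87,  "conv": 0.565},
    "FOML":        {"this_work": 1.92, "ref6": 0.52,  "conv": 0.352},
    "FOMTs_0.1%":  {"this_work": 1.08, "ref6": 0.449, "conv": 0.38},
    "FOMTs_0.01%": {"this_work": 0.73, "ref6": 0.24,  "conv": 0.268},
}


def improvement(metric: str, baseline: str) -> float:
    """Return how many times larger the proposed FCA's metric is than the baseline's."""
    row = TABLE_5_8[metric]
    return row["this_work"] / row[baseline]


if __name__ == "__main__":
    for metric in TABLE_5_8:
        print(metric,
              "vs [6]:", round(improvement(metric, "ref6"), 2),
              "vs conv.:", round(improvement(metric, "conv"), 2))
```

A larger ratio means a larger improvement, since all four figures of merit are defined so that bigger is better.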
5.6. Discussion 
In summary, compared to [6], the proposed FCA design has the following benefits. 
1) No long recovery time is needed even when the cascode stage’s bias current is only 
1/12 of Itail. The method in [6] starts to suffer from a long recovery time when its 
cascode stage’s bias current becomes less than 0.5*Itail.  
2) The design involves much less complexity given that the complex frequency 
compensation in [6] is omitted. 
3) Area consumption decreases significantly given that no large compensation 
capacitors are used. 
4) Power consumption for the cascode stage is lowered because the nondominant poles 
associated with the differential-to-single-ended conversion circuit in the proposed 
FCA are at higher frequencies. 
5) The proposed design is compatible with additional gain enhancement circuits such as 
those described in Chapter 2. 
Figure 5.24: A circuit to reduce leakage current of M14 in the turn-around stage 
[Figure 5.24 schematic: turn-around stage with transistors M11 to M20 and M24, 1:n current mirror, bias voltages Vb1 to Vb4, node VX and load CL; annotation: "Reduce M14 leakage"]
There is also a potential limitation to the proposed FCA design when a very high DC gain is 
needed. As mentioned in Section 5.4.2, transistor M14 in Figure 5.8 could be weakly on at the 
fast-NMOS corner when T=85°C. When M14 is on in the quiescent operation, the output 
impedance of the FCA is reduced, which ultimately limits the largest achievable DC gain. This 
can be solved by replacing M14 with a very high threshold voltage device if one is available in 
the process. Otherwise, the leakage issue can be solved by adding transistors M12 and M24 to 
the circuit as shown in Figure 5.24. In the quiescent operation, M13 works in the triode region 
and M12 works in the cutoff region, so the gate voltage of M14 is Vss. This minimizes the 
leakage current of M14 and thereby increases the largest achievable DC gain. The large-signal 
operation of the circuit in Figure 5.24 is similar to that of the turn-around stage in 
Figure 5.8. This circuit is discussed at length in Chapter 6.  
5.7. Summary 
A new and simple turn-around stage that effectively improves an FCA’s current utilization 
efficiency (CUE) has been introduced. The proposed FCA does not suffer from a long recovery 
time even though its cascode stage’s bias current is only 8.3% of its tail current. In addition, 
the settling performance of the proposed FCA is improved by the larger average slew rate 
(SR) provided by the new turn-around stage. Furthermore, as byproducts of the reduced 
bias current in the cascode stage, the noise and offset of the proposed FCA are also improved. 
Compared to [6], the proposed FCA increases CUE and SR by 1.33 and 2.7 times. The 
proposed FCA’s settling times with 0.1% and 0.01% accuracy are decreased to 58% and 45% 
of those of [6]. The theoretical calculations for the proposed FCA agree closely with its 
simulation results.  
Due to its design simplicity, high CUE, low noise, and low offset voltage, the proposed 
FCA is well suited for applications and systems where FCAs are used as single-stage 
amplifiers or as the first stage in multi-stage amplifiers. The applications include, but are not 
limited to, switched-capacitor circuits, battery monitoring circuits, load current sensing 
circuits, LDO error amplifiers, and sigma-delta ADCs. 
5.8. References 
[1]. R. S. Assaad and J. Silva-Martinez, "The Recycling Folded Cascode: A General 
Enhancement of the Folded Cascode Amplifier," IEEE Journal of Solid-State Circuits, 
vol. 44, no. 9, pp. 2535-2542, Sept. 2009. 
[2]. P. Y. Wu, V. S.-L. Cheung, and H. C. Luong, "A 1-V 100-MS/s 8-bit CMOS 
switched-opamp pipelined ADC using loading-free architecture," IEEE Journal of 
Solid-State Circuits, vol. 42, no. 4, pp. 730-738, Apr. 2007. 
[3]. P. E. Allen and D. R. Holberg, CMOS Analog Circuit Design, Second Edition, p. 307, 
Oxford Univ. Press, 2002. 
[4]. B. Razavi, Design of Analog CMOS Integrated Circuits, International Edition, p. 458, 
2001. 
[5]. W. Sansen, Analog Design Essentials, Vol. 859, p. 141, Springer Science & Business 
Media, 2007. 
[6]. R. Eschauzier and N. V. Rijn, "Apparatus and method for a compact class AB 
turn-around stage with low noise, low offset, and low power consumption," U.S. Patent 
No. 6,624,696, 23 Sep. 2003. 
[7]. C. C. Enz and G. C. Temes, "Circuit techniques for reducing the effects of op-amp 
imperfections: autozeroing, correlated double sampling, and chopper stabilization," 
Proceedings of the IEEE, vol. 84, no. 11, pp. 1584-1614, Nov. 1996. 
 
 
 
 
 
 
 
CHAPTER 6. COMBINED PERFORMANCE ENHANCEMENT TECHNIQUES FOR FOLDED CASCODE AMPLIFIERS 
Many applications such as continuous-time sigma delta ADCs require an op amp with high 
gain, high slew rate, low noise, low offset, low power and large input common mode range 
(ICMR). In this chapter, a single-stage folded cascode amplifier (FCA) is designed with these 
three techniques combined: the proposed gain enhancement (GE) technique in Chapter 2, the 
slew rate enhancement (SRE) technique in Chapter 3 and the current utilization efficiency 
(CUE) enhancement technique in Chapter 5. The purposes of combining the techniques are 
twofold: a) to confirm that these three proposed techniques are compatible; and b) to confirm 
that an FCA with the combined techniques can simultaneously have high DC gain, high slew 
rate, low noise, low offset and low power. 
6.1. Schematic Design 
Figure 6.1 shows the schematic of the proposed FCA combining the GE, SRE and CUE 
enhancement techniques. The proposed FCA consists of an FCA core formed by transistors 
M0-M10, a GE circuit formed by transistors M21-M23, an additional turn-around stage formed 
by transistors M12-M14, and a negative SRE circuit. The negative SRE circuit is shown in 
Figure 6.2.  
The additional turn-around stage is normally off and is only activated during the FCA’s 
positive slewing phase. Such a design greatly enhances the FCA’s current conveyance 
capability during the positive slewing phase while keeping the bias current consumption of the 
turn-around stage to a minimum and generating very low noise and offset voltage. As a result, 
the bias current of the FCA’s cascode stage can be reduced to a current much smaller than 
Itail. The cascode stage’s bias current is annotated as 2α*Itail, where Itail is the drain current 
of transistor M0. The smaller α is, the lower the noise, offset voltage and power consumption 
of the FCA are. However, α cannot be made arbitrarily small because it affects the frequency 
of the nondominant pole associated with node Vx, as discussed in Chapter 5. 
Therefore, a proper value of α must be selected. In this design, α=1/12.  
 
Figure 6.1: Schematic of the proposed FCA with gain, slew rate and CUE enhancement 
[Figure 6.1 schematic annotations: folded cascode amplifier core, additional turn-around stage, gain enhancement path, SRE+ large signal path, SRE- circuit, signal paths ① and ②, nodes Vx and Vy, 1:n current mirror, branch currents Itail, 2α*Itail and α*Itail]
 
Figure 6.2: Schematic of the negative SRE circuit for the proposed FCA  
[Figure 6.2 schematic annotations: negative slew rate enhancement circuit, inputs Vin- and Vin+, output Vo, load CL, branch currents Ibias, 2Ibias and 2.4Ibias]
 
In the FCA, there are three signal paths from the FCA’s inputs to its output. The first signal 
path, shown by the blue arrow line, conducts signal to the output whenever a 
differential input voltage exists. The second signal path, marked by the green arrow line, 
is activated only when Vid>Von_pos or ∆Vx>∆Vx,on_pos. Vid is the differential input voltage. 
∆Vx is the positive voltage change at node Vx upon application of a positive Vid at the input 
pair. Von_pos and ∆Vx,on_pos are respectively the positive threshold values of Vid and ∆Vx 
required to activate the turn-around stage. The third signal path, marked by the red line, is 
the negative SRE path. This path is activated only when Vid<Von_neg. Similar to the definition 
of Von_pos, Von_neg is the negative threshold voltage of Vid required to activate the negative SRE 
circuit. The operation of the signal paths in the quiescent, small-signal and 
large-signal conditions is discussed below.  
Transistor M13 has the same size as transistor M8 but is designed to carry half of M8’s bias 
current. As a result, M13 works in the triode region in the quiescent operation, which leads 
to a low drain-source voltage for M13, so that Vy approximates Vx. When the DC bias voltage 
of Vx is kept below transistor M21’s threshold voltage, transistor M21 works in the cutoff 
region. As a result, transistor M22 works in the triode region and transistor M14 works in the 
cutoff region. Therefore, Vz approximates Vss and the turn-around stage is off in the quiescent 
operation. Vz is so close to Vss that transistor M14’s leakage current is minimized, which 
consequently improves the maximum achievable DC gain of the proposed FCA. 
Upon application of a positive differential input signal, Vid, the source voltage of transistor 
M13 increases by ∆Vx. Transistor M13 stays in the triode region, transistor M21 
stays in the cutoff region, and the turn-around stage remains off until Vid and ∆Vx reach 
Von_pos and ∆Vx,on_pos respectively. When Vid=Von_pos and ∆Vx=∆Vx,on_pos, transistors M13, 
M21 and M22 move from their original operation regions (triode, cutoff, triode) to the 
saturation region. As a result, any Vid>Von_pos quickly increases the drain 
voltage of M13 and the drain current of M21, which turns on the turn-around stage. Therefore, 
the boundary between disabling and enabling the turn-around stage is approximately 
marked by the transitions of M13, M21 and M22 from their quiescent 
operation regions to the saturation region. At this transition point, the drain currents of M8 and 
M13 are respectively expressed as (6-1) and (6-2), where β8=µnCoxW8/L8 and 
β13=µnCoxW13/L13. Also, Vod8 and ∆Id8 are respectively transistor M8’s quiescent overdrive 
voltage and drain current change. The value of λ is 0.4*Ibias/(α*Itail)=0.2. By equating (6-1) 
and (6-2) with β13=β8, it is found that ∆Id8=-(1+λ)*αItail≈-0.1*Itail. Since M8’s quiescent 
drain current 2αItail corresponds to 0.5β8*Vod8^2, (6-2) gives 
∆Vx,on_pos=[1-√(0.5(1-λ))]*Vod8≈0.37*Vod8≈26mV. At the transition point, M14 is still off 
and the drain current changes of M8 and M13 come from the input differential pair. Therefore, 
the input referred turn-on voltage, Von_pos, for the turn-around stage is calculated as (6-3) by 
solving the KCL equation at M13’s source node. In (6-3), gm1 and Vod1 are respectively the 
transconductance and overdrive voltage of transistor M1. In addition, A4 and A3 are 
respectively the aspect ratios of transistors M4 and M3. In this design, A4/A3=5/4. As a result, 
Von_pos is found to be about 7.8mV, assuming that Vod1 is in the neighborhood of 70~80mV.  
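$$\left(V_{od8} - \Delta V_{x,on\_pos}\right)^{2} \cdot 0.5\beta_{8} = 2\alpha I_{tail} + \Delta I_{d8} \tag{6-1}$$

$$\left(V_{od8} - \Delta V_{x,on\_pos}\right)^{2} \cdot 0.5\beta_{13} = \left(V_{od8} - \Delta V_{x,on\_pos}\right)^{2} \cdot 0.5\beta_{8} = \alpha I_{tail}(1-\lambda) \tag{6-2}$$

$$V_{on\_pos} = -\frac{\Delta I_{d8} + \Delta I_{d13}}{\frac{g_{m1}}{2}\left(1 + \frac{A_4}{A_3}\right)} = \frac{\alpha(1+2\lambda)\, I_{tail}}{\frac{I_{tail}}{2V_{od1}}\left(\frac{A_4}{A_3} + 1\right)} = \frac{0.23\, V_{od1}}{\frac{A_4}{A_3} + 1} \approx 7.8\,\mathrm{mV} \tag{6-3}$$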
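The turn-on thresholds in (6-1) to (6-3) can be checked numerically. The Python sketch below plugs in the design values stated in the text (α=1/12, λ=0.2, A4/A3=5/4). The overdrive voltages are assumptions chosen for illustration: Vod8 of about 70mV is inferred from 0.37*Vod8≈26mV, and Vod1 of 75mV is taken as the midpoint of the stated 70~80mV range. The function names are hypothetical.

```python
import math

ALPHA = 1 / 12        # cascode bias factor used in this design
LAMBDA = 0.2          # lambda = 0.4*Ibias / (alpha*Itail)
A4_OVER_A3 = 5 / 4    # aspect ratio of M4 to M3
VOD8_MV = 70.0        # assumption: inferred from 0.37*Vod8 ~ 26 mV
VOD1_MV = 75.0        # assumption: midpoint of the stated 70~80 mV range


def delta_id8_over_itail(alpha: float, lam: float) -> float:
    """Drain current change of M8 at turn-on, normalized to Itail (from (6-1) and (6-2))."""
    return -(1 + lam) * alpha


def delta_vx_on_pos_mv(vod8_mv: float, lam: float) -> float:
    """Voltage change at Vx needed to turn on the turn-around stage."""
    return (1 - math.sqrt(0.5 * (1 - lam))) * vod8_mv


def von_pos_mv(vod1_mv: float, alpha: float, lam: float, a4_over_a3: float) -> float:
    """Input referred turn-on voltage from (6-3), using gm1 = Itail/Vod1."""
    return 2 * alpha * (1 + 2 * lam) * vod1_mv / (a4_over_a3 + 1)


did8 = delta_id8_over_itail(ALPHA, LAMBDA)
dvx = delta_vx_on_pos_mv(VOD8_MV, LAMBDA)
von = von_pos_mv(VOD1_MV, ALPHA, LAMBDA, A4_OVER_A3)

if __name__ == "__main__":
    print("dId8/Itail:", round(did8, 3))
    print("dVx,on_pos (mV):", round(dvx, 1))
    print("Von_pos (mV):", round(von, 2))
```

Under these assumed overdrive voltages, the sketch gives values close to the 26mV and 7.8mV figures in the text. Other overdrive voltages within the stated range shift Von_pos by a few tenths of a millivolt.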
When Vid increases to a point where Vid>Von_pos, transistor M14 turns on, transistor M13 
works in the saturation region, and the negative feedback loop formed by transistors M21-M22 
and M13-M14 is activated. As a result, ∆Vx stays at ∆Vx,on_pos regardless of the differential 
current from M1 and M2, Idm, because the negative feedback loop makes M14 compensate Idm. 
Therefore, in the positive slewing phase, the drain currents of transistors M8 and M14 
respectively become αItail(1-λ)=Itail/15 and Itail[1+(1-λ-A4/A3)*2α]≈Itail. This enhances the 
positive slew rate of the FCA. Once the output has slewed far enough that 
Vid<Von_pos, the FCA’s turn-around stage is deactivated and transistor M13 returns to 
the triode region. Note that transistor M8 always holds a small drain current 
of (1-λ)*αItail=Itail/15, which prevents M8 from ever turning off and keeps the voltage change 
at Vx as small as 0.37*Vod8≈26mV. As a result, input transistor M1 does not enter the triode 
region in the slewing phase even when the input common mode voltage (ICMV) is close to the 
negative supply rail. Therefore, although the proposed FCA has an extremely small cascode 
bias current, it does not require a long time to recover after the slewing phase 
completes: a long recovery time is generally caused by either transistor M8 working in 
the cutoff region or transistor M1 working in the triode region, and neither condition applies 
to the proposed FCA. As a matter of fact, the settling time is slightly improved because the 
positive SR is increased by setting the current mirror ratios of M14-to-M15 and M20-to-M18 
larger than 1.  
In the negative slewing phase, transistor M2 steers all the tail current into transistor M3, and 
then transistor M4 passes the mirrored current to discharge the load capacitor via transistor M8. 
In this slewing phase, the drain currents of transistors M8 and M10 are respectively 
[A4/A3*(2𝛼+1)-𝛼]*Itail and 2𝛼*Itail, which results in a net discharging current of 
[A4/A3*(2𝛼+1)-3𝛼]*Itail to the load capacitor. This discharge current is slightly larger than that 
of the conventional FCA. The conventional FCA’s discharging current is Itail when its cascode 
bias current is larger than 0.5*Itail. More importantly, in the negative slewing phase, the 
negative SRE circuit shown in Figure 6.2 also turns on to increase the transient discharging 
current to the load capacitor. The details about the operation principles of the negative SRE 
circuit are described next.  
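The negative slewing currents quoted above can be checked with a few lines of arithmetic. The sketch below is only illustrative: it assumes A4/A3 = 5/4 and α = 1/12, which are the design values quoted later in this chapter (k3 = 5/4 and α = 1/12), and expresses every current as a multiple of Itail. The function name is hypothetical.

```python
from fractions import Fraction

def negative_slew_currents(mirror_ratio, alpha):
    """Return (I_M8, I_M10, I_net) in units of Itail for the negative slewing phase.

    mirror_ratio is A4/A3 and alpha is the cascode bias factor; both are
    assumed design values, not measured data.
    """
    i_m8 = mirror_ratio * (2 * alpha + 1) - alpha   # drain current of M8
    i_m10 = 2 * alpha                               # drain current of M10
    return i_m8, i_m10, i_m8 - i_m10                # net discharging current

i_m8, i_m10, i_net = negative_slew_currents(Fraction(5, 4), Fraction(1, 12))
print(f"I_M8 = {float(i_m8):.3f}*Itail, I_M10 = {float(i_m10):.3f}*Itail, "
      f"I_net = {float(i_net):.3f}*Itail")
```

Under these assumptions the net discharging current evaluates to 29/24 of Itail, about 1.21*Itail, which is consistent with the statement that it is slightly larger than the conventional FCA's Itail.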
The negative SRE circuit of the FCA is shown in Figure 6.2. As can be seen, the quiescent 
bias currents of transistors M31, M32, M35, M37 and M41 are the same, Ibias. Transistor M37’s 
source voltage is designed to be less than transistor M38’s threshold voltage in the quiescent 
operation. As a result, transistors M38, M39 and M40 work in the cutoff region in the quiescent 
operation. Transistors M35 and M36 are a matched input pair, so M36’s drain current in 
the quiescent operation is the same as M35’s bias current. As a result, the total drain current of 
transistors M36 and M42 is 2* Ibias. This 2* Ibias is smaller than the intended bias current of 
transistor M46, 2.4*Ibias when M46 works in the saturation region. Therefore, transistors M46, 
M42 and M43 work in the triode, triode and cutoff regions respectively, which ensures zero 
bias current in transistor M43 to keep the negative SRE circuit off in the quiescent operation.  
However, upon application of a negative differential input signal to the input pair, Vid, 
transistor M36’s drain current increases while transistor M35’s drain current stays the same as 
Ibias. The reason is that the negative feedback loop formed by M32, M34, M35, M37 and 2*Ibias 
always adjusts transistor M34’s drain current to maintain transistor M35’s drain current as a 
constant of Ibias. When Vid decreases to the point, Von_neg, at which the drain current of M36 has 
increased by 0.4*Ibias, transistors M46 and M42 move from the triode region to 
the saturation region. Any further increase of M36’s drain current caused by a further decrease 
of Vid flows into transistor M43 and is then amplified by the aspect ratio of transistors M44 to 
M43. The amplified drain current is passed to the load capacitor CL via transistors M44 and 
M45. Therefore, the boundary between enabling and disabling the negative SRE circuit can be 
marked by M46 and M42’s operation regions transitioning from the triode region to the 
saturation region. According to this definition, the input referred turn-on threshold voltage of 
the negative SRE circuit, Von_neg, is calculated as (6-6) by solving (6-4) and (6-5), where Vod36 
is transistor M36’s quiescent overdrive voltage. Assuming Vod36 is about 70mV~80mV, the 
calculated Von_neg is about -12mV.  
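$$\tfrac{1}{2}\beta_{36}\,(V_{od36} - \Delta V_{on\_neg})^2 = 1.4\,I_{bias} \tag{6-4}$$
$$\tfrac{1}{2}\beta_{36}\,V_{od36}^2 = 1.0\,I_{bias} \tag{6-5}$$
$$\Delta V_{on\_neg} = (1 - \sqrt{1.4})\,V_{od36} \approx -12\,\mathrm{mV} \tag{6-6}$$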
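As a quick numerical illustration of (6-6), the sketch below evaluates the turn-on threshold over the quoted overdrive range of 70mV to 80mV. It assumes the square-law relation used in (6-4) and (6-5); the function name and the default current ratio argument are hypothetical conveniences.

```python
import math

def delta_von_neg(vod36, current_ratio=1.4):
    """Input-referred turn-on threshold from (6-6), in volts.

    vod36 is M36's quiescent overdrive voltage; current_ratio is the factor
    by which M36's drain current must grow (1.4 in the text).
    """
    return (1.0 - math.sqrt(current_ratio)) * vod36

for vod in (0.070, 0.080):
    print(f"Vod36 = {vod * 1e3:.0f} mV -> dVon_neg = {delta_von_neg(vod) * 1e3:.1f} mV")
```

With these inputs the threshold comes out at roughly -13mV to -15mV, the same order as the value of about -12mV quoted in the text.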
     In order to improve the DC gain of the proposed FCA, a GE circuit via conductance 
cancellation is implemented as shown in Figure 6.1. The forward path of the GE circuit is 
formed by transistors M23-M25 and the feedback path reuses transistors M7 and M3-M4 in 
the FCA core to form a flipped voltage attenuator (FVA). In the GE forward path, the voltage 
change at Vx node is sensed and shifted up to voltage Vfb via the level shifter formed by 
transistors M23-M25. In the GE feedback path, voltage Vfb is fed back to the Vx node through 
the FVA. The voltage gain from Vfb to node 2’s voltage, V2, is calculated as (6-7), with which 
the generated negative conductance is derived as (6-8). As a result, the net conductance looking 
down from the source of M8, gx, is obtained as (6-9). In order to maximize the FCA’s DC gain, 
gx should be designed to be close to zero but slightly negative.  
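$$\frac{V_2}{V_{fb}} \approx -\frac{g_{m7}(g_{ds2} + g_{ds3})}{(g_{m7} + g_{ds7})\,g_{m3}} \approx -\frac{g_{ds2} + g_{ds3}}{g_{m3}} \tag{6-7}$$
$$g_{neg} = \frac{V_2}{V_{fb}}\,g_{m4} = -\frac{g_{ds2} + g_{ds3}}{g_{m3}}\,g_{m4} \approx -g_{ds4} - g_{ds2}\,\frac{A_4}{A_3} \tag{6-8}$$
$$g_x = g_{ds4} + g_{ds1} + g_{ds12} + g_{neg} \approx g_{ds12} - g_{ds2}\left(\frac{A_4}{A_3} - 1\right) = g_{ds12} - \frac{2\alpha\,g_{ds2}}{0.5 + 2\alpha} \tag{6-9}$$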
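The last step of (6-9) can be made concrete with a small sketch. The value α = 1/12 is the design value quoted later in this chapter; the two conductances passed to `net_conductance` are purely hypothetical numbers chosen to show a gx that is close to zero but slightly negative, and are not taken from the design.

```python
from fractions import Fraction

def cancellation_factor(alpha):
    """A4/A3 - 1 written as 2*alpha / (0.5 + 2*alpha), as in (6-9)."""
    return (2 * alpha) / (Fraction(1, 2) + 2 * alpha)

def net_conductance(gds12, gds2, alpha):
    """gx from (6-9): gds12 minus the cancelled fraction of gds2 (siemens)."""
    return gds12 - gds2 * float(cancellation_factor(alpha))

factor = cancellation_factor(Fraction(1, 12))
print("A4/A3 - 1 =", factor)          # corresponds to A4/A3 = 5/4

# Hypothetical conductances, for illustration only.
gx = net_conductance(gds12=20e-9, gds2=100e-9, alpha=Fraction(1, 12))
print(f"gx = {gx * 1e9:.1f} nS")
```

With α = 1/12 the factor is exactly 1/4, so A4/A3 = 5/4, which agrees with k3 = 5/4 used in the noise analysis.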
6.2. Frequency Response Analysis 
In order to understand the frequency response of the proposed FCA in Figure 6.1, its small 
signal block diagram is drawn in Figure 6.3. In the following analysis, the following 
assumptions are made:  
1) The transconductances of transistors M1-M13 and M23-M25 are much larger than their 
output conductances, i.e. gmi>>gdsi, where gmi and gdsi are respectively 
transistor Mi’s transconductance and output conductance.  
2) The parasitic capacitances at nodes Vx and V1 are the same. 
3) The load capacitor, CL, is much larger than the parasitic capacitances at the FCA’s internal 
nodes, i.e. CL>>C1, C2 and Cx. 
 
Figure 6.3: Small signal block diagram of the proposed FCA  
 
The transfer function from Vx to Vfb, u(s), is found as (6-10). The transfer function has one 
pole and one zero, located at frequencies of 0.5*fT and fT respectively, where fT is the unity 
current gain frequency of transistor M23. Since transistor M23’s fT is about 100MHz in this 
design and is much higher than the proposed FCA’s GBW (2.4MHz), the transfer function 
u(s) can be simplified as 1 for frequencies less than the GBW. In order to derive the transfer 
function from the FCA’s inputs to output, Vo/Vid, KCL equations at nodes V1, V2, Vx and Vo are 
derived and written as (6-11) to (6-14), where gi and Ci are respectively the conductance and 
parasitic capacitance at node i. The expressions of g1, g2, gx, gL, C1, C2 and Cx are shown in 
Table 6.1. After solving the KCL equations (6-11) to (6-14), the transfer function Vo/Vid is derived 
as (6-15) and rewritten as (6-16). Equation (6-16) is further simplified as (6-17) by substituting 
(6-18) into (6-16).  
Table 6.1: Expressions of the conductance and capacitance in the proposed FCA 
Conductance | Capacitance 
g1 = gds2+gds3 | C1 ≈ Cdb2+Cgd2+Cdb3+Cgd3+Cgs7 
g2 ≈ gds5gds9/gm9 | C2 ≈ Cgs3+Cgd3+Cgs4+Cgd4 
gx ≈ gds1+gds4+gds11 | Cx ≈ Cdb1+Cgd1+Cdb4+Cgd4+Cgs8+Cgs13+Cgd14+Cgd14 
gL ≈ gds6gds10/gm10+gxgds8/gm8 | 
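$$u(s) = \frac{V_{fb}}{V_x} \approx \frac{1 + s\,\dfrac{C_{gs23}}{g_{m23}}}{1 + s\,\dfrac{C_{gs23} + C_{gs7}}{g_{m23}}} \approx \frac{1 + s\,\dfrac{C_{gs23}}{g_{m23}}}{1 + s\,\dfrac{2C_{gs23}}{g_{m23}}} = 1 \tag{6-10}$$
$$\frac{g_{m1}V_{id}}{2} + V_1(g_1 + sC_1) + g_{m7}V_1 + g_{ds7}(V_1 - V_2) + g_{m3}V_2 - g_{m7}V_x\,u(s) = 0 \tag{6-11}$$
$$V_1\,g_{m7} + (V_1 - V_2)\,g_{ds7} - V_2(g_2 + sC_2) = 0 \tag{6-12}$$
$$V_2\,g_{m4} + V_x(g_{m8} + g_{ds8} + g_x + sC_x) - V_o\,g_{ds8} - \frac{V_{id}}{2}\,g_{m1} = 0 \tag{6-13}$$
$$V_o(g_L + g_{ds8} + sC_L) = V_x(g_{m8} + g_{ds8}) \tag{6-14}$$
$$\frac{V_o}{V_{id}} \approx \frac{0.5\,g_{m1}g_{m8}\left[s^2C_1C_2 + g_{m7}sC_2 + (g_{m3} + g_{m4})g_{m7} + g_{m7}g_{m4}u(s)\right]}{(g_L + sC_L)(g_{m8} + sC_x)\left[s^2C_1C_2 + s\,g_{m7}C_2 + g_{m3}g_{m7} + g_{m7}g_{m4}u(s)\right]} \tag{6-15}$$
$$\frac{V_o}{V_{id}} \approx \frac{\dfrac{g_{m1}}{2g_L}}{\left(1 + s\dfrac{C_L}{g_L}\right)\left(1 + s\dfrac{C_x}{g_{m8}}\right)} \cdot \frac{s^2 + \dfrac{g_{m7}}{C_1}s + \dfrac{(g_{m3} + 2g_{m4})g_{m7}}{C_1C_2}}{s^2 + \dfrac{g_{m7}}{C_1}s + \dfrac{(g_{m3} + g_{m4})g_{m7}}{C_1C_2}} \tag{6-16}$$
$$\frac{V_o}{V_{id}} = \frac{\dfrac{g_{m1}}{2g_L}}{\left(1 + s\dfrac{C_L}{g_L}\right)\left(1 + \dfrac{s}{k_2\,GBW}\right)} \cdot \frac{s^2 + k_2\,GBW\,s + k_1k_2(1 + 2k_3)\,GBW^2}{s^2 + k_2\,GBW\,s + k_1k_2(1 + k_3)\,GBW^2} \tag{6-17}$$
$$k_1 = \frac{g_{m3}/C_2}{GBW}, \quad k_2 = \frac{g_{m7}/C_1}{GBW} = \frac{g_{m8}/C_x}{GBW}, \quad k_3 = \frac{g_{m4}}{g_{m3}}, \quad GBW = \frac{g_{m1}}{C_L} \cdot \frac{k_3 + 1}{2} \tag{6-18}$$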
As can be seen from (6-17), there are four poles and two zeros in the proposed FCA’s 
transfer function. The frequencies of all the poles and zeros are calculated as (6-19) to (6-24). 
Because the drain current of transistor M7 is much smaller than that of transistor 
M3, k1>k2 and (k2/2)^2 < k1*k2. As a result, Pnd2, Pnd3, Znd1 and Znd2 are 
complex poles and zeros. The natural frequencies of the complex poles and zeros are 
respectively GBW*sqrt(k1*k2*(1+k3)) and GBW*sqrt(k1*k2*(1+2*k3)), which are higher than 
those of the proposed FCA in Chapter 5 because k3>0. These higher frequencies of 
the nondominant poles and zeros are brought by the additional GE path in this proposed FCA. 
In addition, compared with the proposed FCA in Chapter 5, this proposed FCA also has a 
higher frequency of Pnd1 due to the larger bias current in its cascode stage. The distribution of 
all the poles and zeros of this proposed FCA in the s-plane is shown in Figure 6.4. As can be seen, 
the complex poles have a lower natural frequency and a lower Q-factor than the 
complex zeros. However, the complex poles and zeros are so close to each other that the phase drop 
caused by them is minimal, as calculated by (6-25). Figure 6.5 illustrates the dependency of 
the phase drop on k1 and k2, from which we can see that the phase 
drop is less than 2.5 degrees even when k1=2*k2 and k2 is as low as 2. The entire FCA’s phase 
margin is calculated as (6-26) and its dependency on k1 and k2 is 
shown in Figure 6.6. In this design, k1=2, k2=4 and k3=1.25. Therefore, the expected phase 
margin of the op amp is about 71 degrees.  
 
Figure 6.4: Distribution of the proposed FCA’s poles and zeros 
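$$P_{d1} = -\frac{g_L}{C_L} \tag{6-19}$$
$$P_{nd1} = -\frac{g_{m8}}{C_x} = -k_2\,GBW \tag{6-20}$$
$$P_{nd2} = -GBW\left(\frac{k_2}{2} - \sqrt{\left(\frac{k_2}{2}\right)^2 - k_1k_2(1 + k_3)}\right) \tag{6-21}$$
$$P_{nd3} = -GBW\left(\frac{k_2}{2} + \sqrt{\left(\frac{k_2}{2}\right)^2 - k_1k_2(1 + k_3)}\right) \tag{6-22}$$
$$Z_{nd1} = -GBW\left(\frac{k_2}{2} - \sqrt{\left(\frac{k_2}{2}\right)^2 - k_1k_2(1 + 2k_3)}\right) \tag{6-23}$$
$$Z_{nd2} = -GBW\left(\frac{k_2}{2} + \sqrt{\left(\frac{k_2}{2}\right)^2 - k_1k_2(1 + 2k_3)}\right) \tag{6-24}$$
$$\phi = -\tan^{-1}\left\{\frac{k_2}{k_1k_2(1 + k_3) - 1}\right\} + \tan^{-1}\left\{\frac{k_2}{k_1k_2(1 + 2k_3) - 1}\right\} \tag{6-25}$$
$$PM = 90 - \tan^{-1}\left(\frac{1}{k_2}\right) - \tan^{-1}\left\{\frac{k_2}{k_1k_2(1 + k_3) - 1}\right\} + \tan^{-1}\left\{\frac{k_2}{k_1k_2(1 + 2k_3) - 1}\right\} \tag{6-26}$$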
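The phase margin expression (6-26) is easy to evaluate numerically. The sketch below is a direct transcription of (6-25) and (6-26) using the design values k1=2, k2=4 and k3=1.25 quoted in the text; the function names are hypothetical, and the complex-pole check simply tests the sign of the discriminant in (6-21) and (6-22).

```python
import math

def pole_zero_phase_deg(k1, k2, k3):
    """Phase contribution of the complex pole/zero pairs at the GBW, per (6-25)."""
    pole_lag = math.atan(k2 / (k1 * k2 * (1 + k3) - 1))
    zero_lead = math.atan(k2 / (k1 * k2 * (1 + 2 * k3) - 1))
    return math.degrees(zero_lead - pole_lag)

def phase_margin_deg(k1, k2, k3):
    """Phase margin per (6-26), in degrees."""
    return 90.0 - math.degrees(math.atan(1.0 / k2)) + pole_zero_phase_deg(k1, k2, k3)

def poles_are_complex(k1, k2, k3):
    """True when the discriminant in (6-21)/(6-22) is negative."""
    return (k2 / 2) ** 2 < k1 * k2 * (1 + k3)

pm = phase_margin_deg(2, 4, 1.25)
print(f"PM = {pm:.1f} deg, complex poles: {poles_are_complex(2, 4, 1.25)}")
```

For these values the expression gives a phase margin of roughly 71 degrees, matching the expected value stated above.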
 
Figure 6.5: Phase drop due to complex poles and zeros vs. k1 and k2 
 
 
Figure 6.6: The FCA’s PM vs. k1 and k2 
6.3. Noise Analysis 
The noise of the proposed FCA is analyzed in comparison with the conventional fast FCA 
in Figure 5.5(b) so as to understand the noise reduction brought by the bias current reduction 
in the cascode stage of the proposed FCA. The noise model of the proposed FCA is shown in 
Figure 6.7 after neglecting the noise contributed by the cascode transistors and the transistors 
working in the cutoff region. The proposed FCA’s output current noise power is derived as 
(6-27), where a transistor’s voltage noise power is expressed as (6-28). The terms 8KT/(3gmi) 
and Kf/(WiLiCoxf) in (6-28) respectively represent a transistor’s thermal and flicker noise. 
The transistors in current mirrors are typically sized to have the same length and current 
density. Consequently, their widths and transconductances scale linearly with their bias currents. 
Therefore, their voltage noise power is inversely proportional to their bias currents, whereas 
their current noise power is linearly proportional to their bias currents, as shown in (6-28) and 
(6-29). As a result, the noise expression in (6-30) can be established. After plugging (6-30) into 
(6-27), equation (6-27) is simplified as (6-31). Equation (6-31) is further simplified as (6-32) by 
neglecting the term In5^2*(k3-1)^2/(8α), because this term is much smaller than In5^2*(k3^2+2). 
As a result, the input referred voltage noise power of the FCA is derived as (6-33).  
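The size of the term dropped between (6-31) and (6-32) can be checked directly. This sketch compares the coefficient of In5^2 in the neglected term with the coefficient that is kept, using the design values k3=5/4 and α=1/12 given in this section; the function name is hypothetical.

```python
def neglected_to_kept_ratio(k3, alpha):
    """Ratio of the In5^2 coefficient dropped from (6-31) to the one kept in (6-32)."""
    neglected = (k3 - 1) ** 2 / (8 * alpha)   # tail-current-source term
    kept = k3 ** 2 + 2                        # remaining In5^2 coefficient
    return neglected / kept

ratio = neglected_to_kept_ratio(k3=1.25, alpha=1 / 12)
print(f"neglected / kept = {ratio:.3f}")
```

For these values the dropped term is below 3% of the retained one, so neglecting it has little effect on the result.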
 
Figure 6.7:  Noise model for the proposed op amp 
 
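$$I_{no,prop}^2 \approx \frac{g_{m0}^2 e_{n0}^2}{4}\left(\frac{g_{m4}}{g_{m3}} - 1\right)^2 + \left(g_{m3}^2 e_{n3}^2 + g_{m2}^2 e_{n2}^2 + g_{m5}^2 e_{n5}^2\right)\frac{g_{m4}^2}{g_{m3}^2} + \left(g_{m4}^2 e_{n4}^2 + g_{m1}^2 e_{n1}^2 + g_{m6}^2 e_{n6}^2 + g_{m11}^2 e_{n11}^2 + g_{m25}^2 e_{n25}^2\right) \tag{6-27}$$
$$\frac{e_{ni}^2}{\Delta f} = \frac{8KT}{3g_{mi}} + \frac{K_f}{W_iL_iC_{ox}f} \propto \frac{1}{I_{bias}} \tag{6-28}$$
$$I_{ni}^2 = \frac{e_{ni}^2 g_{mi}^2}{\Delta f} = \frac{8KT\,g_{mi}}{3} + \frac{g_{mi}^2 K_f}{W_iL_iC_{ox}f} \propto I_{bias} \tag{6-29}$$
$$I_{n5}^2 = I_{n6}^2 = 2I_{n11}^2 = 2I_{n25}^2 = 2\alpha I_{n0}^2; \quad I_{n1}^2 = I_{n2}^2; \quad I_{n3}^2 = \frac{I_{n4}^2}{k_3} \tag{6-30}$$
$$I_{no,prop}^2 \approx \frac{I_{n5}^2}{8\alpha}(k_3 - 1)^2 + \left(I_{n3}^2 + I_{n1}^2 + I_{n5}^2\right)k_3^2 + \left(k_3I_{n3}^2 + I_{n1}^2 + 2I_{n5}^2\right) \tag{6-31}$$
$$I_{no,prop}^2 \approx I_{n1}^2(1 + k_3^2) + I_{n3}^2(k_3 + k_3^2) + I_{n5}^2(2 + k_3^2) \tag{6-32}$$
$$V_{ni}^2 = \frac{I_{no,prop}^2}{\left[0.5\,g_{m1}(k_3 + 1)\right]^2} = \frac{I_{n1}^2(1 + k_3^2) + I_{n3}^2(k_3 + k_3^2) + I_{n5}^2(2 + k_3^2)}{0.5\,g_{m1}(k_3 + 1)\,GBW\,C_L} \tag{6-33}$$
$$V_{no}^2 = V_{ni}^2 \cdot \frac{\pi\,GBW}{2 \cdot 2\pi} = \frac{I_{n1}^2(1 + k_3^2) + I_{n3}^2(k_3 + k_3^2) + I_{n5}^2(2 + k_3^2)}{2g_{m1}(k_3 + 1)\,C_L} \tag{6-34}$$
$$V_{no,thermal}^2 = \frac{4KT}{3} \cdot \frac{(1 + k_3^2) + a(k_3 + k_3^2) + b(2 + k_3^2)}{(k_3 + 1)\,C_L} \approx \frac{2.4KT}{C_L} \tag{6-35}$$
$$V_{no,thermal,conv}^2 = \frac{4KT}{3} \cdot \frac{2 + 2g_{m3}'/g_{m1} + 2g_{m5}'/g_{m1}}{2C_L} = \frac{4KT}{3C_L}\left[1 + a\,\frac{r + 0.5}{2\alpha + 0.5} + b\,\frac{r}{2\alpha}\right] = \frac{4.5KT}{C_L} \tag{6-36}$$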
When the FCA is placed in a noninverting unity gain buffer configuration, the equivalent rectangular 
noise bandwidth of the FCA is GBW/4, where GBW = 0.5gm1(k3 + 1)/CL. Therefore, the 
in-band output referred voltage noise power of the proposed FCA is calculated as (6-34), in which 
the dominant noise source for a wideband FCA is thermal noise. The in-band thermal noise of 
the FCA is calculated as (6-35). This equation suggests that a, b and k3 should be minimized 
in order to minimize the in-band thermal noise for a given load capacitor. In this design, 
a=gm3/gm1=0.4, b=gm5/gm1=0.14, k3=5/4 and α=1/12. As a result, the proposed FCA’s in-band 
thermal noise is calculated as 2.4KT/CL, or 99µV at T=300K and CL=1pF, after plugging a, b 
and k3 into (6-35).  
Similarly, the thermal noise of the conventional FCA counterpart in Figure 5.5(b) is found 
as (6-36), where gm3’ and gm5’ are the transconductances of transistors M3 and M5 in the 
conventional FCA counterpart. With a typical bias current of r*Itail=0.67*Itail for the 
conventional FCA’s cascode stage, it can be found that gm3’/gm1 = a*(r+0.5)/(2α+0.5) and 
gm5’/gm1 = b*r/(2α). As a result, the integrated thermal noise voltage of the conventional FCA 
is obtained as 3.2KT/CL, or 115µV at T=300K and CL=1pF, after plugging in a=0.4, b=0.07, 
α=1/12 and r=0.67. Therefore, compared to the conventional FCA, the proposed FCA is expected 
to reduce in-band noise voltage by 14%.  
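The conversion from the KT/CL coefficients to rms noise voltages can be reproduced as follows. This is a minimal sketch that takes the coefficients 2.4 and 3.2 from the text as given and only performs the unit conversion at T=300K and CL=1pF; the helper name is hypothetical.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K

def rms_noise_uv(coeff, c_load, temp=300.0):
    """RMS noise in microvolts for a noise power of coeff*K*T/CL."""
    return math.sqrt(coeff * K_BOLTZMANN * temp / c_load) * 1e6

v_prop = rms_noise_uv(2.4, 1e-12)   # proposed FCA, coefficient from the text
v_conv = rms_noise_uv(3.2, 1e-12)   # conventional FCA, coefficient from the text
reduction = 1.0 - v_prop / v_conv
print(f"prop = {v_prop:.1f} uV, conv = {v_conv:.1f} uV, reduction = {reduction:.1%}")
```

The results land close to the quoted 99µV and 115µV, and the relative reduction comes out at roughly 13% to 14% depending on rounding.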
6.4. Offset Voltage Analysis 
The variances of transistor Mi’s threshold voltage and of ∆βi/βi are expressed as (6-37), where 
βi=µCoxWi/Li. In addition, Athi^2 and Aβi^2 are the mismatch coefficients, fixed parameters for a 
given process, of transistor Mi’s threshold voltage and feature sizes. Transistor Mi’s drain current 
variation due to its random mismatch is shown in (6-38), where Idi and Vodi are respectively 
the transistor’s quiescent current and overdrive voltage. Based on the sizing strategy of fixed 
current density for transistor Mi, Equation (6-38) shows that the variance of transistor Mi’s 
drain current is proportional to its bias current. The larger the bias current is, the larger the drain 
current variation is.  
The input referred offset voltage of an FCA can be analyzed in a very similar manner to how 
noise is analyzed in Section 6.3. The proposed FCA’s output current variation caused by the 
mismatches of transistors M1-M6, M11 and M25 is shown as (6-39). Therefore, its input 
referred offset voltage, Vos,prop, is calculated as (6-40). In (6-40), c = Ios3^2/Ios1^2 and 
d = Ios5^2/Ios1^2. Similarly, the input referred offset voltage for the conventional FCA, Vos,conv, in 
Figure 5.5(b) is calculated as (6-41), in which r=0.67 and α=1/12. Compared to Vos,conv, it is 
clear that Vos,prop is reduced due to the reduced offset contribution from transistors M3 and M5. 
This will also be confirmed by the Monte Carlo simulation results.  
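$$\sigma_{vthi}^2 = \frac{A_{thi}^2}{W_iL_i}, \quad \sigma^2\left(\frac{\Delta\beta_i}{\beta_i}\right) = \frac{A_{\beta i}^2}{W_iL_i} \tag{6-37}$$
$$I_{osi}^2 = \sigma_{vthi}^2\,g_{mi}^2 + \sigma^2\left(\frac{\Delta\beta_i}{\beta_i}\right)I_{di}^2 = \frac{\left(A_{\beta i}^2V_{odi}^2 + 4A_{thi}^2\right)I_{di}^2}{W_iL_iV_{odi}^2} \propto \frac{I_{di}^2}{W_i} \propto I_{di} \tag{6-38}$$
$$I_{os,out}^2 = I_{os1}^2(1 + k_3^2) + I_{os3}^2(k_3 + k_3^2) + I_{os5}^2(2 + k_3^2) \tag{6-39}$$
$$V_{os,prop}^2 = \frac{I_{os1}^2\left[(1 + k_3^2) + c\,(k_3 + k_3^2) + d\,(2 + k_3^2)\right]}{\left[0.5\,g_{m1}(k_3 + 1)\right]^2}; \quad c = \frac{I_{os3}^2}{I_{os1}^2}; \quad d = \frac{I_{os5}^2}{I_{os1}^2} \tag{6-40}$$
$$V_{os,conv}^2 = \frac{2\left(I_{os1}^2 + I_{os3,conv}^2 + I_{os5,conv}^2\right)}{g_{m1}^2} = \frac{2I_{os1}^2}{g_{m1}^2}\left(1 + c\,\frac{r + 0.5}{2\alpha + 0.5} + d\,\frac{r}{2\alpha}\right) \tag{6-41}$$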
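A rough feel for the offset comparison in (6-40) and (6-41) can be obtained by normalizing both expressions to Ios1^2/gm1^2. The text does not give values for c and d, so the numbers used below (c=0.4, d=0.14) are hypothetical placeholders borrowed from the noise ratios a and b; k3, r and α are the values quoted in the text. The function names are likewise illustrative.

```python
def vos_prop_norm(c, d, k3):
    """Vos,prop^2 from (6-40), normalized to Ios1^2 / gm1^2."""
    numerator = (1 + k3 ** 2) + c * (k3 + k3 ** 2) + d * (2 + k3 ** 2)
    return numerator / (0.5 * (k3 + 1)) ** 2

def vos_conv_norm(c, d, r, alpha):
    """Vos,conv^2 from (6-41), normalized to Ios1^2 / gm1^2."""
    return 2 * (1 + c * (r + 0.5) / (2 * alpha + 0.5) + d * r / (2 * alpha))

# c and d are hypothetical; k3, r and alpha follow the text.
prop = vos_prop_norm(c=0.4, d=0.14, k3=1.25)
conv = vos_conv_norm(c=0.4, d=0.14, r=0.67, alpha=1 / 12)
variance_ratio = prop / conv
print(f"Vos,prop^2 / Vos,conv^2 = {variance_ratio:.2f}")
```

With these placeholder values the proposed FCA's offset variance is about 73% of the conventional one. The ratio depends on c and d, so this is an illustration of the trend rather than a prediction for the actual design.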
6.5. Simulation Results 
In order to confirm the effectiveness and robustness of the performance improvement 
brought by the proposed FCA, two design examples are implemented in a 180nm CMOS 
process. The first design example is the conventional (conv.) FCA shown in Figure 5.5(b). The 
second design example is the proposed (prop.) FCA shown in Figure 6.1. Extensive 
simulations under process corner variations and process corner plus mismatch variations are 
conducted to compare the two design examples. The purposes of the simulations are fourfold: 
a) to verify that the proposed FCA largely improves the FCA’s CUE; b) to verify that the 
proposed FCA largely improves the FCA’s DC gain; c) to verify that the proposed FCA largely 
improves the FCA’s SR; and d) to confirm the compatibility of the proposed gain, SR and CUE 
enhancement techniques.  
All the simulation results below are collected with the design examples placed in a 
noninverting unity gain buffer configuration with a load capacitor of 1pF and a supply voltage 
of 1.8V. The nominal supply currents of the proposed and conventional op amps are respectively 
2.58µA and 3.5µA, while the op amps’ tail currents are the same at 1.5µA. 
6.5.1 Typical corner simulation results 
6.5.1.1 Frequency response  
     The frequency responses of the proposed and conventional fast FCAs are shown in Figure 
6.8. The proposed FCA’s GBW, 2.4MHz, is slightly higher than that of the conventional FCA, 
2.0MHz, given that the size ratio of transistor M4 to transistor M3 is 1.25, slightly larger than 
1. The phase margins (PM) of the proposed and conventional FCAs are 75.5° and 70° 
respectively, which match well with the theoretical calculations. The slight PM difference is 
caused by a much lower bias current in the proposed FCA’s cascode stage. In the two design 
examples, the cascode stage’s bias currents in the proposed and conventional FCAs are 
respectively 0.167 times and 0.67 times Itail. In addition, the DC gain of the proposed FCA 
is about 20dB higher than that of the conventional FCA. The DC gains of the proposed and 
conventional op amps are about 103.8dB and 83.5dB respectively. The DC gain enhancement 
in the proposed FCA is brought by both the gain enhancement circuit on the NMOS side and the 
smaller bias current in the cascode stage.  
 
Figure 6.8: Frequency responses of the proposed and conventional FCAs 
6.5.1.2 Noise performance 
The simulated noise performance of the proposed and conventional FCAs is shown in Figure 
6.9. As expected, the proposed FCA has a lower noise floor than the conventional FCA. For 
example, the voltage noise density of the proposed FCA at 100KHz is 73.47nV/sqrt(Hz), while 
that of the conventional FCA at 100KHz is 88.4nV/sqrt(Hz). The noise 
reduction of the proposed FCA is a natural byproduct of the bias current reduction in the 
cascode stage. The total integrated noise voltages from 0.01Hz to 2MHz for the proposed and 
conventional FCAs are respectively 99.24µV and 127.4µV. That is to say, compared with the 
conventional FCA, the proposed FCA reduces the integrated noise by 22%. 
 
Figure 6.9: Noise performance of the proposed and conventional FCAs 
6.5.1.3 Transient response  
Figure 6.10 shows the step responses of the two FCAs with an input step voltage of 0.6V. 
As expected, the positive slew rate (SR+) of the proposed FCA is faster than the conventional 
FCA due to the new turn-around stage. The positive and negative slew rate (SR+ and SR-) of 
the proposed FCA are respectively SR+prop =+5.84V/µs and SR-prop =-5.1V/µs, whereas those 
of the conventional FCA are respectively SR+conv =+1.1V/µs and SR-conv =-1.34V/µs. The 
positive and negative SR improvement brought by the proposed FCA are 5.3 times and 3.8 
times. The average SR improvement of the proposed FCA is 4.6 times. The simulated SR+ 
improvement is slightly higher than the calculated improvement factor of 4, due to 
channel-length modulation effects in the current mirror M14-M15. The simulated SR- improvement 
matches very well with the theoretical calculations.  
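The improvement factors quoted above follow directly from the simulated slew rates. The short sketch below recomputes them from the four SR values in the text; the helper name is hypothetical.

```python
def improvement(prop, conv):
    """Improvement factor as the ratio of slew-rate magnitudes."""
    return abs(prop) / abs(conv)

sr_pos_gain = improvement(5.84, 1.1)     # SR+ of prop. vs conv., in V/us
sr_neg_gain = improvement(-5.1, -1.34)   # SR- of prop. vs conv., in V/us
sr_avg_gain = (sr_pos_gain + sr_neg_gain) / 2
print(f"SR+ x{sr_pos_gain:.1f}, SR- x{sr_neg_gain:.1f}, average x{sr_avg_gain:.1f}")
```

The ratios round to 5.3, 3.8 and 4.6, in line with the figures reported in this section.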
 
Figure 6.10: Transient responses of proposed and conventional FCAs 
In addition, the settling times for the proposed and conventional FCAs are respectively 0.39µs 
and 0.75µs with a settling accuracy of 0.1% (Ts_0.1%), and 0.54µs and 1.08µs with a settling 
accuracy of 0.01% (Ts_0.01%). Therefore, the average Ts_0.1% and Ts_0.01% of the proposed 
FCA are shorter than those of the conventional FCA by 48% and 50% respectively. The simulated 
settling times of the two FCAs also match the theoretical calculation for settling accuracies of 
0.1% (7/GBW) and 0.01% (9/GBW) in a first-order system. In fact, the proposed FCA’s settling 
times are slightly shorter than the calculated settling times due to the low-gain positive feedback 
loop of the GE circuit in the FCA. This confirms that a long recovery time is not needed in the 
proposed FCA even though its cascode bias current is much smaller than its tail current.  
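The first-order settling estimates can be sketched as follows. The code assumes a single-pole unity-gain buffer whose time constant is 1/(2π*GBW), so that the 7/GBW and 9/GBW rules correspond to ln(1000) ≈ 6.9 and ln(10000) ≈ 9.2 time constants with GBW taken in rad/s. It covers linear settling only and ignores slewing; the function name is hypothetical.

```python
import math

def first_order_settling_time(gbw_hz, accuracy):
    """Linear settling time (s) of a single-pole unity-gain buffer.

    accuracy is the fractional error band, e.g. 1e-3 for 0.1%.
    """
    time_constant = 1.0 / (2 * math.pi * gbw_hz)
    return math.log(1.0 / accuracy) * time_constant

ts_01 = first_order_settling_time(2.38e6, 1e-3)    # 0.1% accuracy
ts_001 = first_order_settling_time(2.38e6, 1e-4)   # 0.01% accuracy
print(f"Ts_0.1% = {ts_01 * 1e6:.2f} us, Ts_0.01% = {ts_001 * 1e6:.2f} us")
```

With the proposed FCA's GBW of 2.38MHz this estimate gives values somewhat above the simulated 0.39µs and 0.54µs, consistent with the remark that the simulated settling is slightly faster than the calculation.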
6.5.1.4 Performance summary for typical corner simulation 
The performance of the two design examples are summarized in Table 6.2. As can be seen, 
compared with the conventional FCA, the proposed FCA reduces its total current waste in its 
cascode stage by a factor of 2, which results in a 27% decrease in its total power consumption. 
The CUEs of the proposed and conventional FCAs are respectively 58% and 42%, so the 
proposed FCA increases the CUE by 1.38 times. In addition to its advantage in reducing the 
required supply current, the proposed FCA enhances the average SR by 4.6 times and reduces 
Ts_0.1% and Ts_0.01% by 48% and 50% respectively. Moreover, compared with the 
conventional FCA, the proposed FCA’s input referred voltage noise density at 100KHz and 
integrated noise from 0.01Hz to 2MHz are reduced by 13% and 22% respectively.  
Table 6.2: Performance summary of the prop. and conv. FCAs in typical corner 
Output | Unit | Prop. | Conv. 
GBW | MHz | 2.38 | 2.04 
PM | degree | 75.5 | 74 
DC Gain | dB | 103.8 | 83.5 
Isupply | µA | 2.58 | 3.5 
Iwaste | µA | 1.08 | 2 
Itail | µA | 1.5 | 1.5 
Iwaste/Itail | % | 71.77 | 133.5 
CUE (Itail/Isupply) | % | 58 | 42 
SR_avg | V/µs | 5.46 | 1.2 
Ts_0.1% @Vstep=0.6V | µs | 0.39 | 0.75 
Ts_0.01% @Vstep=0.6V | µs | 0.54 | 1.08 
Vni @ 100KHz | nV/sqrt(Hz) | 73.47 | 88.4 
Vno integrated to 2MHz | µV | 99.24 | 127.4 
FOMs | pF*MHz/µA | 0.93 | 0.56 
FOML | pF*V/µA-µs | 2.12 | 0.35 
FOMTs_0.1% | pF/µA-µs | 0.99 | 0.38 
FOMTs_0.01% | pF/µA-µs | 0.72 | 0.26 
FOMnoise (total noise/input pair noise) | (V/V)^2 | 2.78 | 5 
CL | pF | 1 | 1 
Supply Voltage | V | 1.8 | 1.8 
Process | | 180nm CMOS | 180nm CMOS 
As a result, compared with the conventional FCA, the proposed FCA improves the small 
signal figure of merit (FOMs) and the large signal figure of merit (FOML) by 66% and 6.6 
times respectively. Recalling the noise and settling time figures of merit defined in 
Section 5.4.1.4, the two figures of merit are rewritten as (6-43) and (6-44). In (6-43), Ts_x% 
is the settling time of the FCA with x% settling accuracy in a noninverting unity gain buffer 
configuration, and the value of x can be 1, 0.1, 0.01 or 0.001 depending on the targeted 
application’s settling accuracy requirement. Isupply and CL are respectively the supply current 
and load capacitor of the FCA. Compared with the conventional FCA, the proposed FCA’s 
FOMTs_0.1% and FOMTs_0.01% are improved by 2.6 times and 2.8 times respectively. In addition, 
the proposed FCA’s FOMnoise is improved by 1.8 times.  
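$$FOM_s = \frac{GBW \cdot C_L}{I_{supply}}; \quad FOM_L = \frac{SR \cdot C_L}{I_{supply}} \tag{6-42}$$
$$FOM_{Ts\_x\%} = \frac{C_L}{T_{s\_x\%} \cdot I_{supply}}, \quad x = 1, 0.1, 0.01, \ldots \tag{6-43}$$
$$FOM_{noise} = \frac{V_{ni,total}^2}{V_{ni,input\ pair}^2} \tag{6-44}$$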
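The figures of merit for the proposed FCA can be recomputed from the typical-corner entries of Table 6.2. In this sketch the settling-time figure of merit is taken as CL/(Ts*Isupply); that form is an assumption inferred from the pF/µA-µs unit and the tabulated values. The function and key names are hypothetical.

```python
def figures_of_merit(gbw_mhz, sr_v_per_us, ts_us, i_supply_ua, i_tail_ua, c_load_pf):
    """Return CUE and figures of merit in the units used by Table 6.2."""
    return {
        "CUE": i_tail_ua / i_supply_ua,                   # current utilization efficiency
        "FOMs": gbw_mhz * c_load_pf / i_supply_ua,        # pF*MHz/uA
        "FOML": sr_v_per_us * c_load_pf / i_supply_ua,    # pF*V/(uA*us)
        "FOMTs": c_load_pf / (i_supply_ua * ts_us),       # pF/(uA*us), assumed form
    }

# Proposed FCA, typical corner, Ts at 0.1% accuracy (values from Table 6.2).
prop_fom = figures_of_merit(gbw_mhz=2.38, sr_v_per_us=5.46, ts_us=0.39,
                            i_supply_ua=2.58, i_tail_ua=1.5, c_load_pf=1.0)
for name, value in prop_fom.items():
    print(f"{name} = {value:.2f}")
```

The computed values agree with the tabulated CUE of 58%, FOML of 2.12 and FOMTs_0.1% of 0.99 to within rounding.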
6.5.2 Process corner and temperature variation simulation results 
In this section, the two designed FCAs are simulated under process corner and temperature 
(P.T.) variations from -40°C to 85°C. The purposes of the simulations are twofold: a) to verify 
the robustness of the proposed FCA under P.T. variations; and b) to confirm the advantages of 
the proposed FCA under P.T. variations. The simulated performance metrics of the designed FCAs 
include frequency response, transient response and noise performance because these 
metrics are affected by P.T. variations. The independent process corner variations and 
temperature variations are listed in Table 6.3. In total, there are 25 simulation setups including 
1 typical corner and 24 combinations of P.T. variation.  
 
Table 6.3: Simulation setup with process corner and temperature variation 
 | Typical | Corners 
Temperature | 27°C | -40°C, 27°C and 85°C 
Low Vth MOS | tntp | snsp, snwp, wnsp, wnwp 
High Vth MOS | tntp | snsp, wnwp 
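The count of 25 setups can be reproduced by enumerating Table 6.3. The sketch below assumes the 24 non-typical setups are the full cross product of the low-Vth corners, the high-Vth corners and the three temperatures; that interpretation is inferred from the total and is not stated explicitly in the text.

```python
from itertools import product

low_vth_corners = ["snsp", "snwp", "wnsp", "wnwp"]
high_vth_corners = ["snsp", "wnwp"]
temperatures_c = [-40, 27, 85]

# One typical setup plus every corner/temperature combination (assumed cross product).
typical_setup = ("tntp", "tntp", 27)
corner_setups = list(product(low_vth_corners, high_vth_corners, temperatures_c))
all_setups = [typical_setup] + corner_setups

print(len(corner_setups), "corner setups,", len(all_setups), "setups in total")
```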
6.5.2.1 Frequency Response 
  
Figure 6.11: Frequency responses of the two FCAs a) proposed b) conventional 
Figure 6.11 shows the frequency responses of the proposed and conventional FCAs under 
P.T. variations. The GBW and PM of the two FCAs have very little variation. The (min, typ, 
max) of the proposed FCA’s simulated DC gain, phase margin (PM) and GBW are respectively 
(102dB, 104dB, 104dB), (73.7°, 75.8°, 77.4°) and (2.03MHz, 2.39MHz, 2.65MHz). On the other 
hand, the (min, typ, max) of the conventional FCA’s simulated DC gain, phase margin (PM) 
and GBW are respectively (80dB, 83.5dB, 83.5dB), (73.7°, 74°, 74.5°) and (1.7MHz, 2.0MHz, 
2.2MHz). The variations of DC gain, PM and GBW of both the conventional and proposed 
FCAs are small. The amount of DC gain enhancement is maintained at about 20dB under 
P.T. variations. Compared with the proposed FCA in Chapter 5, the leakage current of 
transistor M14 in this proposed FCA has been minimized. This is the reason why the proposed 
FCA can maintain large DC gain enhancement in the corner of fast NMOS when T=85oC.  
6.5.2.2 Noise Performance 
 
Figure 6.12: Noise performance of the prop. and conv. FCAs under P.T. variation 
The simulated noise performance of the proposed and conventional FCAs under P.T. 
variations is shown in Figure 6.12. The input referred voltage noise densities of the proposed 
and conventional FCAs are respectively 60.2~94.3nV/sqrt(Hz) and 72.3~115nV/sqrt(Hz) at a 
frequency of 100KHz. The integrated noise from 0.01Hz to 2MHz of the proposed and 
conventional FCAs are respectively 81.6~126.7µV and 104.6~161.1µV. Therefore, both the 
minimum and maximum of the proposed FCA’s integrated noise are 21% lower than the 
conventional FCA. The noise performance improvement in the proposed FCA is a natural 
byproduct of a reduced bias current at the cascode stage.  
6.5.2.3 Transient Response 
The transient responses of the two FCAs are simulated in the noninverting unity gain buffer 
configuration with an input step voltage of 0.6V under P.T. variations. The simulation results 
are shown in Figure 6.13. Both the positive and negative slew rates of the proposed and 
conventional FCAs show a very small spread under P.T. variations. This indicates the 
robustness of the proposed FCA in its positive slew rate enhancement. This robustness over 
P.T. variations is expected because the tail current in the positive slewing phase is amplified 
by a well-defined current gain, and then the amplified current is passed to the load capacitor 
by the turn-around stage. The mean SRs of the proposed and conventional FCAs range from 
4.03 to 6.33V/µs and from 1.2 to 1.24V/µs respectively. Also, the mean Ts_0.1% of the proposed 
and conventional FCAs range from 0.3 to 0.45µs and from 0.72 to 0.78µs respectively. This clearly 
shows the advantages of the proposed FCA over the conventional FCA in terms of operation speed. It 
also shows that unlike the conventional FCA, the proposed FCA does not suffer from any long 
recovery time under P.T. variations.  
 
Figure 6.13: Transient responses of the prop. and conv. FCAs under P.T. variation 
 
6.5.2.4 Performance Summary for P.T. Variation 
Table 6.4: Performance summary of the prop. and conv. FCA under P.T. variation 
Output | Unit | Prop. Min | Prop. Max | Prop. Typical | Conv. Min | Conv. Max | Conv. Typical 
GBW | MHz | 2.03 | 2.65 | 2.39 | 1.7 | 2.2 | 2.04 
PM | degree | 73.65 | 77.35 | 75.8 | 73.7 | 74.5 | 74 
DC Gain | dB | 101.9 | 103.8 | 103.8 | 80 | 83.5 | 83.5 
SR_avg | V/µs | 4.03 | 6.33 | 5.81 | 1.2 | 1.21 | 1.23 
Ts_0.1% | µs | 0.3 | 0.45 | 0.39 | 0.72 | 0.78 | 0.75 
Ts_0.01% | µs | 0.49 | 0.63 | 0.49 | 0.87 | 1.03 | 0.93 
Vni @ 100KHz | nV/sqrt(Hz) | 60.17 | 94.28 | 73.47 | 72.3 | 113.8 | 88.4 
Vno integrated to 2MHz | µV | 81.61 | 126.7 | 99.15 | 104.6 | 161.1 | 127.4 
FOMs | pF*MHz/µA | 0.78 | 1.03 | 0.93 | 0.48 | 0.63 | 0.56 
FOML | pF*V/µA-µs | 1.57 | 2.45 | 2.25 | 0.345 | 0.354 | 0.352 
FOMTs_0.1% | pF/µA-µs | 0.87 | 1.3 | 0.99 | 0.37 | 0.4 | 0.38 
FOMTs_0.01% | pF/µA-µs | 0.62 | 0.8 | 0.79 | 0.25 | 0.28 | 0.27 

Output | Unit | Proposed | Conventional 
Isupply | µA | 2.58 | 5 
Iwaste | µA | 1.08 | 3.5 
Itail | µA | 1.5 | 1.5 
CUE (Itail/Isupply) | % | 58.14 | 43 
CL | pF | 1 | 1 
The performance summary of the proposed and conventional FCAs under P.T. variations is 
shown in Table 6.4. Compared with the conventional FCA, the proposed FCA’s output voltage 
noise, integrated from 0.01Hz to 2MHz, has been reduced by 21%. Ts_0.1% and Ts_0.01% of 
the proposed FCA are also reduced by 48%, while the proposed FCA’s supply current is only 
about 74% of that of the conventional FCA. In addition, compared with the conventional FCA, the 
proposed FCA improves the DC gain by about 20dB and maintains this amount of gain 
enhancement well under P.T. variations. The significant supply current reduction, considerable 
settling time reduction, large DC gain enhancement and noise reduction under P.T. variations 
clearly demonstrate the advantages and robustness of the proposed FCA. This is also evidence 
that the proposed FCA has good design compatibility that allows gain, slew rate and current 
utilization efficiency to all be improved simultaneously.  
6.5.3 Process corner plus mismatch variation simulation results 
In this section, the two designed FCAs are simulated under both process corner and 
mismatch (P.Mis) variations via a 500-run Monte Carlo simulation. The purposes of the 
simulations are twofold: a) to verify the robustness of the proposed FCA under P.Mis variations; 
and b) to confirm the advantages of the proposed FCA under P.Mis variations. The 
performance metrics discussed in this section are transient response, offset voltage, frequency 
response, noise and current utilization efficiency. The FOMs, FOML, FOMTs_0.1%, and 
FOMTs_0.01% of the FCAs are also reported.  
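
As a rough illustration of how each (mean, sigma) pair reported in this section could be obtained from the per-run results of a 500-run Monte Carlo simulation, the following sketch summarizes a list of samples. The samples here are synthetic values drawn from a normal distribution purely for illustration (they are not simulator output), and the names NUM_RUNS, ts_samples_us and summarize are hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical stand-in for 500 per-run results of one metric (Ts_0.01% in us).
# These samples are synthetic placeholders, NOT simulator output.
NUM_RUNS = 500
ts_samples_us = [random.gauss(0.53, 0.032) for _ in range(NUM_RUNS)]


def summarize(samples):
    """Return (mean, sample standard deviation) of the per-run results."""
    return statistics.mean(samples), statistics.stdev(samples)


mean_ts, sigma_ts = summarize(ts_samples_us)
print(f"Ts_0.01%: mean = {mean_ts:.3f} us, sigma = {sigma_ts:.3f} us ({NUM_RUNS} runs)")
```

In an actual flow, the list would be filled with the values exported by the circuit simulator for each Monte Carlo run, and the same summary would be repeated for every metric of interest.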
 
Figure 6.14: Transient responses of the prop. and conv. FCAs under P.Mis. variation  
 
[Figure 6.14 plot: "Transient Response Under Process and Mismatch Variation"; x-axis: time (µs); y-axis: voltage (V); traces: input voltage, conv. group, prop. group]
 
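Figure 6.15: Average Ts_0.01% of the proposed FCA under P.Mis. variation (histogram; x-axis: Ts_0.01% of prop. FCA (µs); y-axis: hits) 
 
Figure 6.16: Average Ts_0.01% of the conventional FCA under P.Mis. variation (histogram; x-axis: Ts_0.01% of conv. FCA (µs); y-axis: hits) 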
Figure 6.14 shows the simulated transient responses of the proposed and conventional FCAs 
under P.Mis variations. Figure 6.15 and Figure 6.16 respectively show the histograms of the 
average Ts_0.01% of the proposed and conventional FCAs under P.Mis variations. The 
average SRs and settling times of both FCAs show normal distributions. The (mean, sigma) 
of the proposed FCA’s average SR, Ts_0.1% and Ts_0.01% are respectively (5.44V/µs, 
0.53V/µs), (0.38µs, 0.027µs) and (0.53µs, 0.032µs). On the other hand, the (mean, sigma) of 
the conventional FCA’s average SR, Ts_0.1% and Ts_0.01% are respectively (1.23V/µs, 
0.024V/µs), (0.75µs, 0.02µs) and (1.08µs, 0.021µs). Therefore, on average, the proposed FCA 
improves SR, Ts_0.1% and Ts_0.01% by about 4.4 times, 2 times and 2 times respectively 
over the conventional FCA. This clearly shows the favorable speed 
performance of the proposed FCA. Most importantly, this speed improvement 
is achieved with a smaller power consumption, which is about 73.7% of that of 
the conventional FCA. In addition, the proposed FCA’s offset voltage has a moderately 
smaller spread: its (mean, sigma) is (-0.39mV, 2.26mV), compared with (0.11mV, 3.11mV) 
for the conventional FCA.  
Accordingly, the (mean, sigma) of the proposed FCA’s FOMs, FOML and FOMTs_0.01% are 
respectively (0.92 pF*MHz/µA, 0.017 pF*MHz/µA), (2.11 pF*V/µA-µs, 0.20 pF*V/µA-µs) 
and (0.74 pF/µA-µs, 0.026 pF/µA-µs). The (mean, sigma) of the conventional FCA’s FOMs, 
FOML and FOMTs_0.01% are respectively (0.56 pF*MHz/µA, 0.003 pF*MHz/µA), (0.35 
pF*V/µA-µs, 0.003 pF*V/µA-µs) and (0.266 pF/µA-µs, 0.012 pF/µA-µs). As can be seen, on 
average the proposed FCA’s FOMs, FOML and FOMTs_0.01% are respectively 1.65 times, 
6.0 times and 2.8 times those of the conventional FCA. 
6.5.3.1 Performance Summary for P.Mis variation 
The performance summary of the proposed and conventional FCAs is shown in Table 
6.5. As can be seen, compared with the conventional FCA, the proposed FCA not only reduces 
power consumption but also improves settling performance under P.Mis variations. In addition, 
the proposed FCA’s DC gain is also largely improved.  
Compared with the conventional FCA, the proposed FCA has about 26% less supply current, 21% 
less integrated noise, 28% less offset voltage (sigma), 48% less Ts_0.1% and Ts_0.01%, and about 
20dB more DC gain. The simultaneous performance enhancement on these critical specifications 
under P.Mis variations clearly demonstrates the advantages and robustness of the proposed FCA.  
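
The relative reductions quoted above can be cross-checked against the entries of Table 6.5. The short sketch below does this arithmetic; it uses the Mean columns for all rows except the offset voltage, where the Stdev column is used because the comparison concerns the offset spread. The helper name percent_reduction is illustrative only, and the computed values land within about three percentage points of the rounded figures in the text.

```python
# (proposed, conventional) values copied from Table 6.5.
# Offset voltage uses the Stdev column; all other rows use the Mean column.
table_6_5 = {
    "Isupply (uA)": (2.58, 3.5),
    "Vno integrated to 2MHz (uV)": (99.19, 128.0),
    "Vos sigma (mV)": (2.26, 3.11),
    "Ts_0.1% (us)": (0.38, 0.75),
    "Ts_0.01% (us)": (0.53, 1.08),
}


def percent_reduction(proposed, conventional):
    """Reduction of the proposed value relative to the conventional one, in percent."""
    return 100.0 * (1.0 - proposed / conventional)


reductions = {name: percent_reduction(p, c) for name, (p, c) in table_6_5.items()}

# DC gain is already in dB, so the enhancement is a plain difference.
dc_gain_delta_db = 103.6 - 82.0

for name, value in reductions.items():
    print(f"{name}: {value:.1f}% lower than conventional")
print(f"DC gain: {dc_gain_delta_db:.1f} dB higher than conventional")
```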
Table 6.5: Performance summary of the prop. and conv. FCA under P.Mis variation 
  Proposed Conventional 
Output Unit Mean Stdev Mean Stdev 
Vos mV -0.39 2.26 0.11 3.11 
GBW MHz 2.38 0.053 1.98 0.034 
PM degree 75.47 0.345 73.8 0.64 
DC Gain dB 103.6 0.633 82 3.4 
SR_avg V/µs 5.44 0.527 1.23 0.024 
Ts_0.1% µs 0.38 0.027 0.75 0.02 
Ts_0.01% µs 0.53 0.032 1.08 0.02 
Vni @ 100KHz nV/sqrt(Hz) 73.43 0.833 88.6 1.9 
Vno integrated to 2MHz µV 99.19 0.98 128 2.0 
FOMs pF*MHz/µA 0.92 0.017 0.56 0.006 
FOML pF*V/µA-µs 2.11 0.203 0.35 0.003 
FOMTs_0.1%   pF/µA-µs 1.04 0.077 0.38 0.003 
FOMTs_0.01%   pF/µA-µs 0.74 0.045 0.265 0.004 
Isupply µA 2.58 0.041 3.5 0.19 
Iwaste µA 1.08 0 2.0 0.177 
Itail µA 1.5 0.04 1.50 0.028 
Iwaste/Itail % 71.77 0.872 134.9 11.7 
Itail/Isupply % 58.2 0.23 42.8 0.16 
CL pF 1.00 1.00 1.00 NA 
Vsupply V 1.8 NA 1.8 NA 
Process  180nm CMOS 
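
The figure-of-merit rows in Table 6.5 can be approximately reproduced from the GBW, SR_avg, settling time, supply current and load capacitance rows. The sketch below assumes definitions inferred from the units of each row (FOMs = GBW*CL/Isupply, FOML = SR*CL/Isupply, FOMTs = CL/(Isupply*Ts), CUE = Itail/Isupply); these are assumptions for illustration, and the class and function names are hypothetical. Because the table reports the mean of each per-run figure of merit, while the sketch divides the means, small differences from the tabulated values are expected.

```python
from dataclasses import dataclass


@dataclass
class FcaMeans:
    """Mean values taken from Table 6.5 (units as given in the table)."""
    gbw_mhz: float
    sr_v_per_us: float
    ts_01_us: float      # Ts_0.1%
    ts_001_us: float     # Ts_0.01%
    isupply_ua: float
    itail_ua: float
    cl_pf: float = 1.0


def figures_of_merit(a):
    """Assumed FOM definitions, inferred from the units in Table 6.5."""
    return {
        "FOMs": a.gbw_mhz * a.cl_pf / a.isupply_ua,            # pF*MHz/uA
        "FOML": a.sr_v_per_us * a.cl_pf / a.isupply_ua,        # pF*V/(uA*us)
        "FOMTs_0.1%": a.cl_pf / (a.isupply_ua * a.ts_01_us),   # pF/(uA*us)
        "FOMTs_0.01%": a.cl_pf / (a.isupply_ua * a.ts_001_us), # pF/(uA*us)
        "CUE_%": 100.0 * a.itail_ua / a.isupply_ua,            # Itail/Isupply
    }


proposed = FcaMeans(2.38, 5.44, 0.38, 0.53, 2.58, 1.5)
conventional = FcaMeans(1.98, 1.23, 0.75, 1.08, 3.5, 1.5)

fom_prop = figures_of_merit(proposed)
fom_conv = figures_of_merit(conventional)

for key in fom_prop:
    print(f"{key}: proposed = {fom_prop[key]:.3f}, conventional = {fom_conv[key]:.3f}")
```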
6.6. Performance comparison to the literature 
Table 6.6 summarizes the performance of the proposed FCA compared with Chapter 5’s 
proposed FCA, the FCA in [1] and the conventional FCA in a typical corner at room 
temperature. Compared with Chapter 5’s proposed FCA, the FCA in [1] and the conventional 
FCA, the proposed FCA enhances DC gain by 14dB, 11dB and 20dB respectively, enhances 
slew rate by 1.5 times, 4.0 times and 4.4 times respectively, reduces Ts_0.1% by 22%, 54.4% 
and 48% respectively, and decreases Ts_0.01% by 26%, 66.3% and 50% respectively. As a 
result, the FOMTs_0.01% of this proposed FCA and of Chapter 5’s proposed FCA are nearly the 
same (0.72 and 0.73 pF/µA-µs). They are about 3 times that of [1] and 2.68 times that of the 
conventional FCA. Again, this performance 
comparison clearly demonstrates the advantages of this work (Chapter 6’s proposed FCA) over 
[1] and the conventional FCA. Compared with Chapter 5’s proposed FCA, this work shows 
comparable figures of merit but much higher DC gain, which is a critical specification for high 
precision systems. This work also demonstrates that the gain enhancement, slew rate 
enhancement and current utilization efficiency enhancement techniques proposed in this 
dissertation can be combined in a single FCA design.  
Table 6.6: Performance comparison of the proposed FCA to the literature 
Output                  Unit         This work  FCA in Chapter 5  [1]     Conv. FCA 
Vos                     mV           2.26       2.18              2.64    2.95 
GBW                     MHz          2.38       2.14              2.2     2.04 
PM                      degree       75.5       70                70      74 
DC Gain                 dB           103.8      89.69             92.75   83.5 
Isupply                 µA           2.58       1.88              2.6     3.5 
Iwaste                  µA           1.08       0.38              1.1     2.0 
Itail                   µA           1.5        1.50              1.50    1.50 
Iwaste/Itail            %            71.77      25.25             73.3    133.3 
Itail/Isupply           %            58         80                60      43 
SR_avg                  V/µs         5.46       3.66              1.355   1.23 
Ts_0.1%                 µs           0.39       0.50              0.855   0.75 
Ts_0.01%                µs           0.54       0.73              1.6     1.08 
Vni @ 100KHz            nV/sqrt(Hz)  73.47      68.07             82.2    88.5 
Vno integrated to 2MHz  µV           99.24      93.20             113.0   127.5 
FOMs                    pF*MHz/µA    0.93       1.14              0.87    0.565 
FOML                    pF*V/µA-µs   2.12       1.92              0.52    0.352 
FOMTs_0.1%              pF/µA-µs     0.99       1.08              0.449   0.38 
FOMTs_0.01%             pF/µA-µs     0.72       0.73              0.24    0.268 
FOMnoise                (V/V)^2      2.78       2.6               3.6     5.0 
CL                      pF           1.00       1.00              1.00    1.00 
Process  Mean 180nm CMOS 
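
The cross-design comparisons drawn from Table 6.6 are simple differences (for DC gain, which is in dB) and ratios (for slew rate, settling time and the figures of merit). The sketch below recomputes them from the table entries; the dictionary layout and the function name compare_to are illustrative choices, not part of the original work. Ratios taken from the FOMTs_0.01% row come out near 3 against [1] and near 2.7 against the conventional FCA.

```python
# Entries copied from Table 6.6: DC gain (dB), SR_avg (V/us),
# Ts_0.1% (us), Ts_0.01% (us), FOMTs_0.01% (pF/uA-us).
this_work = {"gain_db": 103.8, "sr": 5.46, "ts_01": 0.39, "ts_001": 0.54, "fom_ts_001": 0.72}
references = {
    "FCA in Chapter 5": {"gain_db": 89.69, "sr": 3.66, "ts_01": 0.50, "ts_001": 0.73, "fom_ts_001": 0.73},
    "[1]": {"gain_db": 92.75, "sr": 1.355, "ts_01": 0.855, "ts_001": 1.6, "fom_ts_001": 0.24},
    "Conv. FCA": {"gain_db": 83.5, "sr": 1.23, "ts_01": 0.75, "ts_001": 1.08, "fom_ts_001": 0.268},
}


def compare_to(ref):
    """Gain difference in dB plus improvement ratios of this work over a reference."""
    return {
        "gain_delta_db": this_work["gain_db"] - ref["gain_db"],
        "sr_ratio": this_work["sr"] / ref["sr"],
        "ts_01_speedup": ref["ts_01"] / this_work["ts_01"],     # >1 means faster settling
        "ts_001_speedup": ref["ts_001"] / this_work["ts_001"],
        "fom_ts_001_ratio": this_work["fom_ts_001"] / ref["fom_ts_001"],
    }


comparison = {name: compare_to(ref) for name, ref in references.items()}

for name, result in comparison.items():
    summary = ", ".join(f"{key} = {value:.2f}" for key, value in result.items())
    print(f"vs {name}: {summary}")
```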
 
6.7. Discussion 
The design example, which combines the gain, slew rate and current utilization efficiency 
enhancement techniques, shows advantages over [1] in terms of gain, settling 
time, power consumption, noise, and offset voltage. The DC gain enhancement 
over [1] is 11dB, which leads to an achieved DC gain of about 104dB for a single-stage FCA. 
The gain enhancement is limited because only the conductance cancellation circuit for the 
NMOS side of the proposed FCA is implemented, as shown in Figure 6.1. If a larger DC gain 
is needed for an application, a similar gain enhancement circuit can be implemented for the 
complementary side of the proposed FCA.  
6.8. Summary 
A design example with a combination of enhancement techniques for gain, slew rate and 
current utilization efficiency has been introduced. Compared to the state-of-the-art method [1], 
the proposed FCA increases DC gain by 11dB, improves slew rate by 4.04 times, and reduces 
settling time with 0.1% and 0.01% settling accuracy by 2.2 and 2.96 times respectively. Due 
to its design simplicity, high current utilization efficiency, low noise and low offset voltage, the 
proposed FCA is suitable for applications and systems where FCAs are used as single-stage 
amplifiers or first stages in multi-stage amplifiers. The applications include, but are not limited 
to, battery monitoring circuits, load current sensing circuits, data converters and 
switched-capacitor circuits.  
6.9. References 
[1]. R. Eschauzier and N. van Rijn. "Apparatus and method for a compact class AB turn-
around stage with low noise, low offset, and low power consumption," U.S. Patent No. 
6,624,696. 23 Sep. 2003. 
 CONCLUSION 
In this research, a series of performance enhancement techniques for operational amplifiers 
has been introduced, including techniques for gain enhancement, slew rate enhancement, 
current utilization efficiency enhancement and power efficiency enhancement.  
In Chapter 2, a new method to robustly improve an op amp’s DC gain with negligible power 
and area overhead via conductance cancellation has been introduced. The uniqueness of this 
gain enhancement technique lies in its robust ability to track and cancel conductance under 
PVT variations without the aid of any tuning circuit. Because of this unique capability, the 
proposed method can deliver over 20dB of DC gain enhancement that is well sustained under 
PVT variations. Compared with the regulated gain boosting technique, the proposed method offers 
several benefits. First, it does not degrade an op amp’s settling performance, including its high 
precision settling. Second, the design and simulation effort involved is minimal, 
whereas the regulated gain boosting technique needs a significant amount of design 
and simulation effort to address instability and pole-zero doublet issues. Third, the power and 
area consumption of the proposed gain enhancement technique are very low. Due to its design 
simplicity, low power and low area cost with no degradation of an op amp’s settling time, this 
proposed technique is suitable for op amps in high precision systems such as switched 
capacitor circuits, ADC drivers and filters.  
In Chapter 3, we have introduced a new slew rate enhancement (SRE) circuit, which can 
largely improve an amplifier’s slew rate via excessive transient feedback in the slewing phases 
while preserving the amplifier’s small-signal performance through a well-defined turn-on 
condition. This nonlinear operation of the introduced SRE circuit improves the linearity of the 
entire amplifier. In addition, the transient current efficiency of the proposed SRE method is 
also high in the slewing phases because the increased transient tail current always improves 
the amplifier’s slew rate regardless of whether the transient tail current functions as common-
mode or differential-mode for the amplifier’s input pair. Due to the low power consumption, 
low area overhead, design simplicity and high effectiveness of the proposed SRE method, the 
method is suitable for applications which need to provide large capacitive driving capabilities 
with low static power dissipation, such as switched capacitor circuits.  
In Chapter 4, we have introduced a power-efficient design method for an op amp to drive 
very large capacitive loads. The proposed method has several advantages compared with the 
state-of-the-art methods for driving large capacitive loads. First, the proposed method 
decouples large- and small-signal paths so that both the small- and large-signal performance 
of the op amp can be optimized simultaneously. Second, the op amp designed with the 
proposed method has a well-defined quiescent current. As a result, the designed op amp is not 
sensitive to devices’ random mismatches. Third, the wasted current in the preamp’s 
load circuit is reduced to zero. Due to these three advantages, the designed op amp is able 
to offer favorable small-signal and large-signal figures of merit simultaneously. This is an 
important improvement compared with the state-of-the-art methods, which can only improve 
small-signal figures of merit at the cost of large-signal figures of merit or vice versa. This 
proposed power-efficient op amp design is suitable for applications where large capacitive 
loads need to be driven, such as LCD buffers and electro-chemical sensors. 
In Chapter 5, a new technique that improves the current utilization efficiency (CUE) of a 
folded cascode amplifier (FCA) has been introduced. Compared with the state-of-the-art 
techniques, the proposed method provides several benefits. First, the dependency of the FCA’s 
nondominant poles and phase margin on the bias current of the FCA’s cascode stage is largely 
relaxed. Therefore, the proposed FCA can significantly reduce current consumption in the 
cascode stage. Second, the proposed method does not suffer from a long recovery time after 
the slewing phases complete, though the cascode stage’s bias current is as low as 8.3% of the 
FCA’s tail current. Third, the proposed method does not need any frequency compensation. 
As a result, the design complexity and area consumption of the designed FCA are significantly 
reduced. In addition, compared with the conventional FCA design and the state-of-the-art 
method, which improve settling time only with increased power consumption, the designed 
FCA achieves faster settling with decreased power consumption. Therefore, the proposed 
CUE enhancement technique is suitable for applications and systems where an FCA is used as 
a single-stage amplifier or the first stage in a multi-stage amplifier. The applications include, 
but are not limited to, battery monitoring circuits, load current sensing circuits, data converters, 
and switched-capacitor circuits.  
In Chapter 6, we have presented an FCA design that combines the gain, slew rate and current 
utilization efficiency enhancement techniques. The designed FCA confirms the compatibility 
of the performance enhancement techniques proposed in Chapters 2, 3 and 5. Compared with 
the conventional FCA, the design example shows simultaneous improvements in multiple 
specifications, including power consumption, gain, slew rate, and settling time. As natural 
byproducts of power consumption reduction, the offset voltage and noise of the designed FCA 
are also decreased. Therefore, the designed FCA can be used for applications where wide input 
common mode range, high gain, fast settling, low noise and low offset are needed, such as 
pipeline ADC’s sample-and-hold circuits and sigma-delta ADCs.  
 
 
