Copyright by Wei Shi 2022

The Dissertation Committee for Wei Shi certifies that this is the approved version of the following dissertation:

## Design and Automation Techniques for High-Performance Mixed-Signal Circuits

Committee:

David Z. Pan, Supervisor

Nan Sun, Supervisor

Michael Orshansky

Jaydeep Kulkarni

Song Han

## Design and Automation Techniques for High-Performance Mixed-Signal Circuits

by

Wei Shi

#### DISSERTATION

Presented to the Faculty of the Graduate School of The University of Texas at Austin in Partial Fulfillment of the Requirements for the Degree of

#### DOCTOR OF PHILOSOPHY

THE UNIVERSITY OF TEXAS AT AUSTIN

May 2022

Dedicated to my parents.

## Acknowledgments

I want to express my sincerest gratitude to my advisor, Dr. Nan Sun, whom I met at the start of my academic journey. He has been an academic role model and a supportive mentor. His persistence, concentration, and confidence in tackling technical problems have guided me in facing research challenges. He taught me presentation and writing skills and gave me the passion to contribute to frontier research. I hope to follow in his footsteps and become the kind of person he is.

I want to express my deepest appreciation to my advisor, Dr. David Z. Pan. As I stepped into the area of machine learning for circuit optimization, I received tremendous help from Dr. Pan and the UTDA group members. I also appreciate his leadership in involving me in multi-disciplinary projects. The visions of people from different domains, including optimization, machine learning, and science, have been eye-opening and inspiring, and they set the future course I want to pursue.

I would also like to thank my committee members, Prof. Song Han, Prof. Michael Orshansky, and Prof. Jaydeep Kulkarni, for their valuable advice and discussions.

It has been a great honor to meet and work with all the Sun-lab members: Miguel Gandara, Shaolan Li, Abhishek Mukherjee, Jiaxin Liu, Linxiao Shen, Xiyuan Tang, Sungjin Hong, Chen-Kai Hsu, Wenda Zhao, Xiangxing Yang, Xing Wang, Zijie Gao, and Yi Shen. Working with them has been truly inspiring and fun. I would also like to thank the UTDA members: Keren Zhu, Jiaqi Gu, Zixuan Jiang, Hao Chen, Hanqing Zhu, and others. I would like to thank Dr. Linxiao Shen and Dr. Xiyuan Tang for numerous technical discussions and suggestions about my research. I would like to thank Dr. Abhishek Mukherjee for sharing his expertise in CTDSM with me. I would also like to thank Dr. Yi Shen and Dr. Jiaxin Liu for teaching me simulation and measurement skills in my early years of data converter research. I would like to thank Keren Zhu for his help in my CAD research and Jiaqi Gu for his generous help in writing ML papers. I would also like to thank Andrew Kieschnick for his technical support on server environments.

In addition, I would like to thank my good friend, Dr. Zhaowen Wang from Columbia University. His encouragement and help have been precious.

I would like to thank my beautiful girlfriend, Xuejun. She has been with me through many critical moments in my Ph.D. life. Over the years, my life and career have been through three huge changes. She has witnessed my emotional and physical ups and downs and has always stood by my side.

My beloved family is the one I am truly indebted to. While I am on the other end of the Earth, my parents continuously support me unconditionally. Without their support and love, I never could have reached where I am today. This dissertation is dedicated to them.

## Design and Automation Techniques for High-Performance Mixed-Signal Circuits

Wei Shi, Ph.D.

The University of Texas at Austin, 2022

Supervisors: David Z. Pan, Nan Sun

In an era of ubiquitous sensing, modern electronic systems expand our perception of the outside world. Analog/mixed-signal circuits play a critical role in bridging the physical and digital worlds. The boom of the Internet of Things (IoT), bio-sensing, and digital cameras calls for versatile high-performance mixed-signal circuits and a corresponding automated design methodology. However, high-performance analog circuits are area- or power-hungry. Moreover, their design cost is prohibitively high. To address these challenges, this dissertation explores solutions from both the design and the automation perspectives.

The analog-to-digital converter (ADC) is an important class of analog/mixed-signal circuits. The continuous-time Delta-Sigma modulator (CTDSM) is a popular choice for high-speed and high-resolution designs. CTDSMs feature a higher power efficiency than their discrete-time (DT) counterparts. The first work presents a high-speed $4^{th}$-order DSM featuring CT-DT hybridization and an efficient excess-loop-delay (ELD) compensation technique in the charge domain. Compared to prior high-order CTDSMs, the proposed hybrid DSM achieves $4^{th}$-order noise shaping with a single operational transconductance amplifier (OTA). Minimizing the number of OTAs reduces power and enhances stability. On top of that, an efficient ELD compensation technique is implemented by utilizing the inherent capacitor digital-to-analog converter (CDAC) of the SAR quantizer. Fabricated in 40 nm CMOS, the prototype ADC achieves a peak Schreier Figure-of-Merit (FoM) of 176.1 dB, a 4 dB improvement over prior art.

The second project explores techniques to reduce the area of high-resolution CTDSMs. The performance of existing high-resolution CTDSMs is limited by the feedback DAC, whose stringent linearity requirement leads to a large DAC area. To address this limitation, a low-complexity hardware-based $2^{nd}$-order dynamic-element-matching (DEM) technique is proposed. A partial sorter applied to the DEM minimizes the hardware cost. Moreover, a feedforward-path-assisted loop filter adapts the highly linear integrator design to a low supply voltage. Combined, these techniques show a feasible design pattern for compact, high-resolution designs at advanced technology nodes. A prototype fabricated in 40 nm CMOS measures 95 dB SNDR while occupying only 0.37 mm<sup>2</sup>.

After pushing the ADC performance boundary, this dissertation turns to automated design methodology. The design cost of high-performance mixed-signal circuits grows exponentially with technology scaling. Existing analog automation techniques cannot handle practical circuit design constraints (e.g., robustness against variations). The third work presents RobustAnalog, a variation-aware analog circuit optimization framework based on multi-task reinforcement learning (RL) and task-space pruning. RobustAnalog is mainly designed to tackle process-voltage-temperature (PVT) robustness in analog design. Correlations between similar variations are modeled, and conflicts between distinct variations are mitigated. With task pruning, a small proxy training task set is formed, which reduces the queries to the full task set. Compared with popular black-box optimization methods, RobustAnalog significantly reduces the simulation cost. RobustAnalog therefore marks notable progress toward analog automation techniques that can be applied under real silicon conditions.

# Table of Contents

| Section | Title |
|---------|-------|
| | Acknowledgments |
| | Abstract |
| | List of Tables |
| | List of Figures |
| Chapter 1 | Introduction |
| 1.1 | High-Performance Mixed-Signal Circuit |
| 1.1.1 | ADC Background and Applications |
| 1.1.2 | High-Speed and High-Resolution CTDSMs |
| 1.2 | Analog Design Automation |
| 1.2.1 | Automation Problem Formulation |
| 1.2.2 | Optimization Methods |
| 1.2.2.1 | Evolutionary Strategy |
| 1.2.2.2 | Bayesian Optimization |
| 1.2.2.3 | RL-Guided Optimization |
| 1.2.2.4 | Challenges in Practical Analog Optimization |
| Chapter 2 | High Speed Hybrid CT-DT DSM with Charge-domain ELDC |
| 2.1 | Introduction |
| 2.2 | Proposed $4^{th}$-Order Hybrid CT-DT DSM |
| 2.2.1 | Prior High-Order CT DSMs |
| 2.2.2 | Hybrid DSM Architecture |
| 2.2.3 | Coefficient Sensitivity |
| 2.2.4 | Finite UGB Effects |
| 2.3 | Proposed ELD Compensation Scheme |
| 2.3.1 | Brief Review of ELD Compensation |
| 2.3.2 | Proposed Charge-domain ELD Compensation |
| 2.3.3 | Feedforward Path and STF |
| 2.4 | Circuit Implementations |
| 2.4.1 | System Architecture |
| 2.4.2 | SAB Design |
| 2.4.3 | $2^{nd}$-order NS-SAR Design |
| 2.5 | Measurement Results |
| 2.6 | Conclusion |
| Chapter 3 | High Resolution CTDSM with $2^{nd}$-order MES |
| 3.1 | Introduction |
| 3.2 | Proposed CTDSM with Low-Cost $2^{nd}$-Order Vector Quantizer-Based DEM |
| 3.2.1 | Prior Mismatch Error Shaping Techniques |
| 3.2.2 | $2^{nd}$-Order DEM with Partial Sorter |
| 3.3 | Loop Filter Design |
| 3.4 | CTDSM Architecture |
| 3.4.1 | OTA Schematic |
| 3.4.2 | NS-SAR Schematic |
| 3.5 | Measurement Results |
| 3.6 | Conclusion |
| Chapter 4 | Robust Analog Design Automation Via Reinforcement Learning |
| 4.1 | Introduction |
| 4.2 | Related Work |
| 4.3 | Proposed PVT Variation-Aware Circuit Sizing |
| 4.3.1 | Problem Definition |
| 4.3.2 | Framework Overview |
| 4.3.3 | Multi-Task RL training |
| 4.3.4 | Task Space Pruning |
| 4.4 | Experiments |
| 4.4.1 | Analog/Mixed-Signal Circuits |
| 4.4.2 | Training Settings |
| 4.4.3 | Evaluation of the Circuit Optimization |
| 4.4.4 | Analysis |
| 4.5 | Conclusion |
| Chapter 5 | Conclusion |
| | Bibliography |
| | Vita |

# List of Tables

| Table | Caption |
|-------|---------|
| 2.1 | Comparison with the state-of-the-art single-loop CT DSMs |
| 3.1 | Pros and cons of different $2^{nd}$-order DEMs |
| 3.2 | Performance summary and comparison with state-of-the-art high-resolution CTDSMs |
| 4.1 | Comparison between RobustAnalog's solution and expert's solution |

# List of Figures

| Figure | Caption |
|--------|---------|
| 1.1 | ADC application universe [Robertson [2015]] |
| 1.2 | Generic model of CTDSM |
| 1.3 | CTDSM in ADC FoM surveys |
| 1.4 | Increasing design cost in today's technologies [Olofsson [2018]] |
| 1.5 | Evolutionary Computation Flow |
| 1.6 | Bayesian Optimization Flow |
| 1.7 | Reinforcement Learning Flow |
| 2.1 | Typical discrete-time and continuous-time structures |
| 2.2 | Proposed hybrid DSM with SAB filter and passive NS-SAR quantizer |
| 2.3 | (a) Conventional $4^{th}$-order CT DSM, (b) Proposed $4^{th}$-order hybrid CT-DT DSM |
| 2.4 | Coefficient sensitivity comparison between a conventional and the proposed $4^{th}$-order hybrid CT-DT DSM |
| 2.5 | Proposed hybrid CT-DT DSM SQNR with different OTA UGBs |
| 2.6 | ELDC implemented by the direct feedback path |
| 2.7 | ELDC implemented by the residual feedforward path |
| 2.8 | (a) Residual ELDC signal flow, (b) ELDC path implementation in the current domain |
| 2.9 | Residual ELDC implementation in the charge domain |
| 2.10 | (a) Signal flow and schematic of proposed charge domain embedded ELD compensation, (b) Timing diagram of ELD compensation operations on CDAC |
| 2.11 | (a) The UGB increase of the conventional $4^{th}$-order filter after ELDC, (b) The UGB increase of the SAB $2^{nd}$-order filter after ELDC |
| 2.12 | (a) Original CT DSM model, (b) Equivalent residual ELD compensation model |
| 2.13 | (a) Rearranged model to separate CT and DT domain, (b) STF Bode plot |
| 2.14 | Schematic and timing diagram of proposed hybrid DSM |
| 2.15 | The two-stage feedforward-compensated OTA design |
| 2.16 | Comparator design |
| 2.17 | Die micrograph |
| 2.18 | Measured single-tone spectrum |
| 2.19 | Measured two-tone spectrum |
| 2.20 | Measured SNDR/DNR vs. different input amplitudes |
| 3.1 | Selection pattern of DWA |
| 3.2 | SNDR measurements at different input amplitudes in [Theertham et al. [2020]] |
| 3.3 | Conventional VQ-based DEM structure |
| 3.4 | VQ-based DEM structure with partial sorter |
| 3.5 | Mismatch error shaping spectrum of VQ-based DEM with complete sorter and partial sorter |
| 3.6 | Partial sorter structure for 15 elements |
| 3.7 | Spectrum at input of 14 dBFS |
| 3.8 | Filter structure of VQ-based DEM |
| 3.9 | Spectrum comparison before and after compensation |
| 3.10 | VQ-based DEM logic |
| 3.11 | SNDR loss by mismatch |
| 3.12 | Architecture of the proposed low-complexity $2^{nd}$-order VQ-based mismatch error shaping |
| 3.13 | Large integration capacitor due to a small $R_{IN}$ |
| 3.14 | Integration capacitor area and OTA output swing are reduced with the resistive feedforward path added |
| 3.15 | The schematic of proposed CTDSM |
| 3.16 | Chopped OTA schematic: $1^{st}$ stage |
| 3.17 | Chopped OTA schematic: $2^{nd}$ stage |
| 3.18 | Simplified NS-SAR schematic |
| 3.19 | NS-SAR timing |
| 3.20 | Measured output spectra at large-amplitude input with DEM off |
| 3.21 | Measured output spectra at large-amplitude input with DEM on |
| 3.22 | Measured output spectra at small-amplitude input with DEM off |
| 3.23 | Measured output spectra at small-amplitude input with DEM on |
| 3.24 | Measured SNDR/SNR vs. different input amplitudes |
| 3.25 | Power breakdown |
| 3.26 | IRND and area plot of high-resolution CTDSMs |
| 4.1 | <i>Left:</i> Performances under variations form a distribution. <i>Right:</i> New technologies have larger process variations and greater vulnerability to environmental variations, hence a higher discard rate. |
| 4.2 | RobustAnalog Overview. (1) A pruned task subset is generated from the full task set. (2) Multi-task RL agent is trained on task subset. (3) Training continues until the produced sizing can achieve training tasks. Then the sizing is evaluated on the full set. If it passes all the tasks, RobustAnalog returns the result. |
| 4.3 | Visualization of corner task clustering and selection for strongARM Latch. k-means decision boundary is shown, dividing corners into two kinds: noise-limited corners (blue) and speed-limited corners (brown). The corner with the worst performance in each cluster is chosen as one of the training tasks. |
| 4.4 | Three analog/mixed-signal benchmarks |
| 4.5 | Simulation times for each method to take to first hit reward = 0.2 |
| 4.6 | Compare learning curves (average reward vs. # simulation) among baselines and our proposed RobustAnalog. Reward = 0.2 indicates all tasks are passed. RobustAnalog hits the reward of 0.2 significantly faster than the baseline methods on all benchmarks. |
| 4.7 | Ablation of applying multi-task and task space pruning. Using two together brings the least simulation cost. |
| 4.8 | Performance distributions of two intermediate sizings during the RobustAnalog optimization. Red and blue markers are performances on different corners at time $t_0$ and $t_1$. Selected training corners are indicated by black circles. |
| 4.9 | |

## Chapter 1

### Introduction

# 1.1 High-Performance Mixed-Signal Circuit

#### 1.1.1 ADC Background and Applications

Demand for high-performance mixed-signal circuits keeps increasing. The analog-to-digital converter (ADC), a typical mixed-signal circuit, plays an essential role in many applications, as shown in Fig. 1.1. Today's world has become a ubiquitous sensing environment, and the ADC serves as the portal connecting the physical world to computational intelligence. For example, trillions of photos are captured by digital cameras each year. A digital camera usually demands many ADCs with around 80 dB signal-to-noise-and-distortion ratio (SNDR) and a bandwidth of tens of MHz. Moreover, the Internet of Things (IoT), mobile sensing, and bio-sensing have pushed sensor data to ever finer granularity. The way we perceive the world is no longer restricted to our visual or auditory senses, and the boom of received information is reshaping our understanding of the world. Such minuscule sensing devices require ADCs with 60-100 dB SNDR and tens to hundreds of kHz of bandwidth. Unfortunately, high-speed, high-resolution ADCs are costly in area and power. The gap between demand and supply motivates us to develop energy- and area-efficient designs.



Figure 1.1: ADC application universe [Robertson [2015]]

#### 1.1.2 High-Speed and High-Resolution CTDSMs

Oversampling and noise-shaping ADCs were proposed to achieve high resolution. Their sampling frequency is much higher than the Nyquist rate. With proper signal processing, namely noise shaping, the in-band quantization noise can be largely suppressed. Noise shaping is usually achieved by Delta-Sigma modulation (DSM). The continuous-time Delta-Sigma modulator (CTDSM) features a higher efficiency than its discrete-time (DT) counterpart. A generic block diagram of a CTDSM is shown in Fig. 1.2. It generally consists of a quantizer, a feedback digital-to-analog converter (DAC), and, importantly, a loop filter that determines the noise transfer function (NTF).

Figure 1.2: Generic model of CTDSM

The continuous processing of signals relaxes the dynamic requirements on the loop filter. A CTDSM is also easier to drive and, probably most importantly, provides implicit signal filtering. The CTDSM gained its popularity in high-resolution designs with sub-MHz bandwidth, as shown in Fig. 1.3. A high oversampling ratio (OSR) and a moderate-order loop filter are usually used for high-resolution designs. The CTDSM has also made its way into the wideband domain. As evidenced by Fig. 1.3, CTDSMs show great power efficiency at bandwidths of hundreds of MHz. A low OSR and a high loop-filter order are the usual design choices for high-speed designs.

Figure 1.3: CTDSM in ADC FoM surveys.

Nevertheless, several bottlenecks prevent the power and area efficiency of CTDSMs from being further improved. For high-speed designs, the obstacles to low power are twofold. First, as mentioned above, the high loop-filter order dictates a large number of operational transconductance amplifiers (OTAs), which must consume a considerable static current to guarantee the loop filter's performance. Second, the high loop-filter order demands excess-loop-delay (ELD) compensation, which is expensive. Therefore, an efficient high-speed design should minimize the number of OTAs and the cost of ELD compensation. For high-resolution CTDSMs, there are two obstacles to compact-area designs. First, the distortion target poses stringent mismatch requirements on the feedback DAC, which has to occupy a large area to suppress the mismatch error. Second, large capacitors are required to meet the thermal noise requirement. Area reduction therefore calls for new design patterns for high-resolution CTDSMs, and these patterns should leverage CMOS scaling instead of being limited by it.
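The trade-off between OSR and loop-filter order described above can be made concrete with the textbook peak-SQNR estimate for an idealized modulator whose NTF is the pure differentiator $(1 - z^{-1})^L$ with an $N$-bit quantizer. The sketch below is an illustration under that assumption only; the function name and the two example operating points are hypothetical and are not taken from the designs in this dissertation, and practical modulators deviate from this ideal estimate.

```python
import math


def ideal_sqnr_db(order: int, osr: float, bits: int) -> float:
    """Peak SQNR in dB of an ideal modulator with NTF = (1 - z^-1)^order.

    Textbook estimate; assumes white quantization noise and no stability
    or thermal-noise limits.
    """
    # Nyquist-rate SQNR of an N-bit quantizer with a full-scale sine input.
    quantizer_db = 6.02 * bits + 1.76
    # Penalty from integrating the shaped noise over the signal band.
    shaping_db = 10 * math.log10((2 * order + 1) / math.pi ** (2 * order))
    # Each doubling of the OSR buys roughly (2 * order + 1) * 3 dB.
    oversampling_db = (2 * order + 1) * 10 * math.log10(osr)
    return quantizer_db + shaping_db + oversampling_db


# Hypothetical operating points: high OSR with moderate order versus
# low OSR with high order, both with a 4-bit quantizer.
high_resolution = ideal_sqnr_db(order=2, osr=128, bits=4)
wideband = ideal_sqnr_db(order=4, osr=16, bits=4)
print(f"2nd order, OSR 128: {high_resolution:.1f} dB")
print(f"4th order, OSR 16:  {wideband:.1f} dB")
```

Under these idealized assumptions, a high loop order recovers most of the resolution lost by lowering the OSR, which is why wideband designs lean on higher-order filters and, in turn, on more OTAs.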

A substantial part of this dissertation is devoted to researching and proposing new techniques for power-efficient high-speed CTDSMs and area-efficient high-resolution CTDSMs. In the discussion of power-efficient high-speed CTDSMs, the focus is placed on hybridizing a CTDSM with a passive DTDSM. The number of power-hungry OTAs is reduced by using a nested noise-shaping (NS) quantizer. The lower CT NS order also eases the ELD compensation. Such a combination decouples the trade-off between stability and aggressive NS in CTDSMs. In the discussion of area-efficient high-resolution CTDSMs, a high-order mismatch error shaping (MES) technique is applied to save resistor DAC (RDAC) area. The prototype design essentially adapts the high-resolution DSM to a more digital-intensive fashion, thus unveiling many new possibilities for area reduction. The research on high-speed and high-resolution CTDSMs in this dissertation is carried out in two stages, with a prototype ADC taped out in each stage as proof of concept.

The first prototype demonstrates a hybridized high-order CT-DT DSM. It combines a CT single-amplifier biquad (SAB)-based $2^{nd}$-order loop filter with a DT passive $2^{nd}$-order NS successive-approximation-register (SAR) quantizer. As a result, the prototype ADC uses only one OTA but achieves $4^{th}$-order shaping. The power and area of the NS-SAR can be made very small owing to the high gain from the SAB filter. Process-voltage-temperature (PVT) robustness is improved as the NS-SAR is immune to PVT variations. On top of the hybridization, this work implements both ELD compensation and the direct feedforward path in the charge domain, further reducing the circuit complexity and the OTA power. These techniques enable the proposed ADC to achieve 81 dB SNDR over 12.5 MHz BW with 3.7 mW power, leading to a Schreier FoM of 176 dB.

The second prototype explores area reduction techniques for the high-resolution CTDSM. This work significantly reduces the total DAC area by applying a hardware-efficient $2^{nd}$-order vector-quantizer (VQ)-based dynamic element matching (DEM) to the MSB DAC. The $2^{nd}$-order DSM not only suppresses the dominant MSB DAC mismatch error, but also alleviates the SNDR kink issue of the $1^{st}$-order data-weighted averaging (DWA). Moreover, a direct feedforward-assisted loop filter reduces the integrating capacitor's area by 10 times. Overall, this work achieves 95 dB SNDR over 250 kHz BW while consuming 4.7 mW from a 1.1 V supply and occupying a compact area of 0.37 mm<sup>2</sup>.
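The Schreier figure of merit quoted above can be reproduced from the stated SNDR, bandwidth, and power. The snippet below is a minimal sketch that assumes the common definition $FoM_S = SNDR + 10\log_{10}(BW/P)$; the function name is illustrative only.

```python
import math

def schreier_fom_db(sndr_db: float, bandwidth_hz: float, power_w: float) -> float:
    """Schreier figure of merit in dB: SNDR + 10*log10(BW / P)."""
    return sndr_db + 10.0 * math.log10(bandwidth_hz / power_w)

# Numbers quoted in the text for the two prototypes.
fom_hybrid = schreier_fom_db(81.0, 12.5e6, 3.7e-3)    # first prototype
fom_highres = schreier_fom_db(95.0, 250e3, 4.7e-3)    # second prototype
print(f"{fom_hybrid:.1f} dB, {fom_highres:.1f} dB")
```

For the first prototype this evaluates to roughly 176 dB, matching the figure stated in the text. The value for the second prototype is only what this formula gives for the quoted numbers.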

In the subsequent chapters, the details of these prototypes and their corresponding techniques will be presented and analyzed. Measurement results will also be discussed and compared to state-of-the-art statistics.

#### **1.2** Analog Design Automation

Thanks to technology scaling and circuit innovations, analog/mixed-signal circuits have achieved great advancement in the past twenty years.



Figure 1.4: Increasing design cost in today's technologies [Olofsson [2018]].

However, the design complexity grows exponentially with technology scaling, as shown in Fig. 1.4. Nowadays, circuit design is a paramount and extremely challenging task. The explosive growth of engineering cost in circuit design calls for a highly automated workflow. Analog, mixed-signal, and RF circuits are indispensable in modern electronic systems, yet implementing them is mainly a manual, time-consuming, and error-prone task. In a modern application-specific integrated circuit (ASIC), analog circuits occupy 30% of the total area but consume 70% of the engineering effort. Analog design requires a huge amount of human effort and lacks effective automation. A short turnaround time for analog, mixed-signal, and radio-frequency (RF) IC design is highly desired.

A typical analog design procedure includes the following steps. First, designers must determine the circuit topology according to the requirements. Second, devices are appropriately sized for the targeted performance. The sizing parameters include the dimensions of transistors, resistors, capacitors, and so on. After device sizing, a corresponding layout is carefully designed to avoid performance degradation. Finally, a comprehensive verification including process-voltage-temperature (PVT) and Monte-Carlo simulations is performed. There has been plenty of work on automating analog sizing and layout. For sizing problems, many simulation-based optimization methods have been explored recently [Liu et al. [2009]; Lyu et al. [2018a]; Wang et al. [2020a]; Settaluri et al. [2020]]. For automatic analog/mixed-signal layout synthesis, the currently popular methods can be categorized into two kinds: procedural-based methods and optimization-based methods. Recent works on procedural-based methods have shown the potential of producing tape-out quality designs [Wulff and Ytterdal [2017]; Chang et al. [2018]]. For optimization-based methods, there are several open-source layout generators [Kunal et al. [2019]; Xu et al. [2019]].

In this dissertation, we focus on analog circuit sizing, which is an important stage in the procedure. Analog sizing, as an engineering optimization problem, is hard for the following reasons: 1) The relationship between the circuit performances and the design variables is extremely complex and highly non-linear. In fact, designers spend most of their time deriving circuit performance metrics. For example, noise analysis is non-trivial even for highly skilled designers. Designers dive into noise sources and transfer functions, attempting to obtain insights for optimal sizing. The tremendous effort of such derivation can be avoided by relying on the simulator to provide the mapping. However, a large number of slow simulations are then needed, which makes this optimization problem computationally expensive. 2) The solution space is sparse, and multi-objective optimization is not easy. Unlike digital circuits, analog designs usually require transistors to operate in a narrow region. For example, transistors in OTAs must be in saturation to provide proper transconductance and output resistance. Moreover, very few combinations of design variables can produce the desired circuit performance. The list of things that can go wrong is long. This problem becomes worse in a more complicated design that has 20 or more design variables. 3) Many constraints are difficult to quantify. Designers develop their common sense from past experience. For example, the width-to-length ratio should be neither too large nor too small to save area, and critical transistors should not be biased into the deep sub-threshold region to ensure PVT robustness. However, in analog automation, such common sense is invisible to the machine unless one explicitly sets a quantified constraint. The machine sometimes presents solutions that are numerically satisfying but violate some hidden constraints. Debugging the machine-generated results and implementing ad hoc constraints is also a laborious and frustrating process.

#### **1.2.1** Automation Problem Formulation

Circuit sizing can be formulated as a constrained optimization problem. Given a fixed circuit topology, we search for a circuit sizing that has the optimal performance.

$$
\begin{aligned}
\text{minimize} \quad & F_0(X) \\
\text{subject to} \quad & F_i(X) < C_i, \quad i = 1, \dots, m
\end{aligned}
\tag{1.1}
$$

where

$$
X = (X_1, X_2, \dots, X_n), \qquad D = (D_1, D_2, \dots, D_n), \qquad C = (C_1, C_2, \dots, C_m)
$$

X, the sizing vector, is an n-dimensional variable corresponding to the n circuit sizing parameters. D is the domain of X. For example, $D_1 = [0, 1]$ means that the design space of $X_1$ is [0, 1]. C is the set of constraint bounds for all circuit metrics. Because we have m metrics, the number of constraints is also m. $F_i(X)$ is the $i^{th}$ performance metric of the circuit, i.e., $F_i$ is a non-linear mapping from the input X to the $i^{th}$ metric. We rely on the circuit simulator to provide this mapping. Circuit design is a multi-objective problem, so $F_0(X)$ is in general a vector. In most existing automation methods, $F_0(X)$ is scalarized. Our goal is therefore to find an optimal X that satisfies all constraints in C. A common variant is to make $F_0(X)$ a constant and put all objectives into the constraints. The constrained optimization problem then degenerates to a constraint satisfaction problem (CSP). The CSP formulation implies that sometimes our target is to satisfy design requirements rather than maximize the performance.
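The formulation can be made concrete with a small sketch. Everything below is hypothetical: `toy_simulator` stands in for a real circuit simulator, the two-variable design space and the bounds in `BOUNDS_C` are invented for illustration, and plain random search is used only as the simplest possible solver. It shows both the constrained-optimization form and the CSP variant in which any feasible X is accepted.

```python
import numpy as np

# Hypothetical design space D = [0, 1]^2 and constraint bounds C = (C_1, C_2).
LOWER = np.array([0.0, 0.0])
UPPER = np.array([1.0, 1.0])
BOUNDS_C = np.array([-0.2, 5.0])

def toy_simulator(x):
    """Stand-in for a circuit simulator: returns F_0(X) and [F_1(X), F_2(X)]."""
    power = x[0] + x[1]            # F_0: scalar objective to minimize
    neg_gain = -(x[0] * x[1])      # F_1: negated so that "gain > 0.2" reads F_1 < C_1
    noise = 1.0 / (x[0] + 0.1)     # F_2: must stay below C_2
    return power, np.array([neg_gain, noise])

def is_feasible(x):
    """True when every constraint F_i(X) < C_i holds."""
    _, metrics = toy_simulator(x)
    return bool(np.all(metrics < BOUNDS_C))

def random_search(n_samples=2000, seed=0):
    """Constrained optimization: best feasible F_0 among random samples of D."""
    rng = np.random.default_rng(seed)
    best_x, best_f0 = None, np.inf
    for _ in range(n_samples):
        x = rng.uniform(LOWER, UPPER)
        f0, metrics = toy_simulator(x)
        if np.all(metrics < BOUNDS_C) and f0 < best_f0:
            best_x, best_f0 = x, f0
    return best_x, best_f0

def first_feasible(n_samples=2000, seed=0):
    """CSP variant: F_0 is treated as constant, so any feasible X is a solution."""
    rng = np.random.default_rng(seed)
    for _ in range(n_samples):
        x = rng.uniform(LOWER, UPPER)
        if is_feasible(x):
            return x
    return None

best_x, best_f0 = random_search()
csp_x = first_feasible()
```

In a real flow each call to `toy_simulator` would be a slow simulation, which is why the sample efficiency of the search method matters.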

#### 1.2.2 Optimization Methods

Various research works have attempted to tackle the analog automation challenges with different levels of human involvement. Human designers can help to narrow down the initial sizing range, which makes the problem easier. Such simplification requires designers' expertise in circuit behavior, device technology, and past design experience. The more prior human knowledge we utilize, the easier the sizing will be. Accordingly, existing automation techniques can be categorized into equation-based and simulation-based methods. In equation-based methods, equations characterizing the relationship between sizing and circuit metrics are derived by human experts. These methods are computationally efficient as they lump the complex circuit system into multiple equations. However, the solutions are often unsatisfactory because the equations are inaccurate, a problem that becomes more severe in advanced technologies. Simulation-based methods search the design space for optimal solutions according to the circuit simulation results. The automatic design space exploration can provide solutions with performance superior to that of human designs. Nevertheless, the expensive simulations prevent such methods from handling large-scale circuits or many constraints. This dissertation discusses how to improve the efficiency and scalability through various optimization techniques.

As simulation-based methods rely solely on the simulator to provide the sizing-to-performance mapping, closed-form expressions of the objective function and constraints can hardly be obtained. Meanwhile, the simulations are computationally expensive and time-consuming. Thus, sizing resembles a black-box optimization problem. With multiple sizing variables as inputs, the simulator provides multi-dimensional metrics as outputs. Several black-box optimization methods have been applied to analog sizing, achieving different optimality and sample efficiency: Evolutionary Strategy (ES), Bayesian Optimization (BO), and Reinforcement Learning (RL)-guided optimization.

#### 1.2.2.1 Evolutionary Strategy

Evolutionary computation is a global heuristic search method that mimics the natural evolution process to find approximate solutions to optimization problems. ES is a subset of evolutionary computation. ES works with vectors of real numbers as the representation of solutions. ES solves trial-and-error problems based on survival of the fittest. Fitness is usually defined by our objective function. The evolutionary concepts are inheritance, selection, mutation, and crossover. A typical ES methodology is shown in Fig. 1.5. Initially, a set of individual solutions, called the population, is randomly generated to cover the entire range of the search space extensively. Then the fitness of each individual in the population is evaluated. The next step consists of the selection stage, where the fitness of the individuals is utilized for determining



Figure 1.5: Evolutionary Computation Flow.

the individuals that will be selected for breeding to generate solutions of the future generation. A few individuals of low fitness can also be selected to ensure solution diversity. Diversity can prevent premature convergence and strike a balance between local and global search. Next, the reproduction stage occurs, where the selected solutions from the previous step are mated through crossover and/or mutation. The algorithm continues until the fitness values satisfy our requirements.
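The loop described above (initialization, evaluation, selection, reproduction) can be sketched as follows. This is a minimal illustration under stated assumptions, not the flow of any cited work: `fitness` is a hypothetical stand-in for a simulated objective (lower is better), parents are kept in the next generation, the mutation step size decays on a fixed schedule, and the optional selection of a few low-fitness individuals for diversity is omitted for brevity.

```python
import numpy as np

def fitness(x):
    """Toy stand-in for a simulated objective (lower is better); hypothetical."""
    return float(np.sum((x - 0.6) ** 2))

def evolve(dim=5, pop_size=20, n_parents=5, generations=60, sigma=0.3, seed=0):
    rng = np.random.default_rng(seed)
    # Initialization: random population covering the whole search range [0, 1]^dim.
    population = rng.uniform(0.0, 1.0, size=(pop_size, dim))
    history = []
    for _gen in range(generations):
        # Evaluation: one "simulation" per individual.
        scores = np.array([fitness(ind) for ind in population])
        order = np.argsort(scores)
        history.append(float(scores[order[0]]))
        # Selection: the fittest individuals become parents and survive.
        parents = population[order[:n_parents]]
        # Reproduction: crossover (average of two parents) followed by Gaussian mutation.
        children = []
        for _child in range(pop_size - n_parents):
            a, b = parents[rng.integers(n_parents, size=2)]
            child = 0.5 * (a + b) + sigma * rng.normal(size=dim)
            children.append(np.clip(child, 0.0, 1.0))
        population = np.vstack([parents, np.array(children)])
        sigma *= 0.95  # assumed schedule: shrink the mutation step over time
    scores = np.array([fitness(ind) for ind in population])
    history.append(float(scores.min()))
    return population[np.argmin(scores)], history

best, history = evolve()
```

Because the parents survive each generation, the best fitness in `history` never gets worse. A practical stopping rule would check the fitness against the design requirements instead of running a fixed number of generations.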

The advantages of ES are that it is not affected by a discontinuous optimization landscape and is suitable for high-dimensional problems. ES has achieved progress in solving medium-scale problems (20-40 dimensions). In the context of analog sizing, this scale corresponds to a circuit building block that has around 20 devices. ES was applied to analog synthesis as early as 1997 in [Koza et al. [1997]]. In the work of [Liu et al. [2009]], augmented Lagrangians are incorporated to solve the constrained sizing optimization. Surrogate modeling has also been introduced to improve the sampling efficiency [Liu et al. [2013, 2014, 2016]]. Recently, deep neural networks (DNNs) have been put into the loop of ES to model the circuit behavior [Hakhamaneshi et al. [2019]].

#### 1.2.2.2 Bayesian Optimization

Bayesian optimization is another sequential design strategy for global optimization of black-box functions that does not assume any functional form. The basic flow is shown in Fig. 1.6. The objective function is evaluated by the simulator, which is an unknown and expensive process. Bayesian optimization treats it as a random function and places a prior over it. In each iteration, a set of function evaluations is collected and used as training data to update the prior, yielding a posterior distribution over the objective function. Finally, an acquisition function is constructed to determine the next query point. There are several methods for defining the prior/posterior distribution over the objective function. Models that describe such distributions are called surrogate models. They provide not only predictive means but also the corresponding uncertainty estimates. The most common surrogate model is the Gaussian process. Bayesian optimization is particularly advantageous in terms of sample efficiency. However, its computational complexity grows rapidly with the number of samples. Considering its computation cost, it is a good fit for problems of fewer than 20 dimensions.

Figure 1.6: Bayesian Optimization Flow.

BO was introduced to the analog sizing problem later than ES. In the works of [Lyu et al. [2018b,a]], the authors formulate sizing as a multi-objective optimization problem and apply a BO framework to solve it. Sizing optimization can also benefit from simulations of different accuracies and speeds [Zhang et al. [2019b, 2020]].
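The Bayesian optimization flow (fit a surrogate, build an acquisition function, query the next point) can be sketched in a few lines. The code below is an illustrative assumption rather than the method of the cited works: it uses a zero-mean Gaussian process with an RBF kernel as the surrogate, expected improvement as the acquisition function, a one-dimensional grid of candidates, and a hypothetical `objective` in place of the simulator.

```python
import math
import numpy as np

def objective(x):
    """Toy stand-in for an expensive simulation (hypothetical); minimum at x = 0.3."""
    return (x - 0.3) ** 2

def rbf_kernel(a, b, length=0.2):
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length) ** 2)

def gp_posterior(x_train, y_train, x_query, jitter=1e-6):
    """Zero-mean GP posterior mean and standard deviation at x_query."""
    k_tt = rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    k_tq = rbf_kernel(x_train, x_query)
    solved = np.linalg.solve(k_tt, k_tq)          # K^-1 k(x_train, x_query)
    mean = solved.T @ y_train
    var = 1.0 - np.sum(k_tq * solved, axis=0)     # prior variance is 1 for this kernel
    return mean, np.sqrt(np.maximum(var, 1e-12))

_erf = np.vectorize(math.erf)

def expected_improvement(mean, std, best_y):
    """Expected improvement for a minimization problem."""
    z = (best_y - mean) / std
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (best_y - mean) * cdf + std * pdf

def bayes_opt(n_iter=8):
    x_train = np.array([0.0, 0.5, 1.0])           # initial evaluations
    y_train = objective(x_train)
    grid = np.linspace(0.0, 1.0, 201)             # candidate query points
    for _ in range(n_iter):
        mean, std = gp_posterior(x_train, y_train, grid)
        ei = expected_improvement(mean, std, y_train.min())
        x_next = grid[np.argmax(ei)]              # acquisition picks the next query
        x_train = np.append(x_train, x_next)
        y_train = np.append(y_train, objective(x_next))
    best = np.argmin(y_train)
    return x_train[best], y_train[best]

best_x, best_y = bayes_opt()
```

The linear solve inside `gp_posterior` scales cubically with the number of collected samples, which is the computation cost referred to above.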

#### 1.2.2.3 RL-Guided Optimization

RL originates from dynamical systems theory, specifically the optimal control of incompletely known Markov decision processes. We train a learning agent to achieve a goal by having it interact with its environment over time. The learning agent senses the state of the environment and takes actions that affect the state. These three aspects (sensation, action, and goal) echo the sequential decision process in the optimization problem. Beyond the agent and the environment, there are four main elements of a reinforcement learning system: a policy, a reward, a value function, and, optionally, a model of the environment. A policy defines the agent's behavior at a given state. The policy can be loosely described as a mapping from perceived states of the environment to actions. Generally, policies may be stochastic and specify probabilities for each action. A reward defines the goal of an RL problem. On each iteration, the environment sends a reward, a single number, to the agent. The agent's objective is to maximize the long-term cumulative reward. Models refer to the environment modeling, which is similar to the surrogate model. They mimic the environment outputs given the same inputs. Recently, RL methods



Figure 1.7: Reinforcement Learning Flow.

have become attractive as they have achieved great success in domains including gaming, autoML, and robotics. Applied to circuit design automation, RL methods show the potential to achieve higher circuit performance given enough exploration [Wang et al. [2018a, 2020a]; Settaluri et al. [2020]]. Moreover, RL enables transfer learning across different design conditions, including different technologies and pre-/post-layout design stages. In addition, DNNs are being extensively studied and turn out to be a powerful model of the environment. They can approximate the complex relation between circuit parameters and performances. DNNs are tailored to circuit optimization in the works of [Budak et al. [2021]; Yang et al. [2021]]. Graph neural networks (GNNs) leverage the circuit connectivity information to improve the model performance [Wang et al. [2020a]; Zhang et al. [2019a]].
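The roles of the policy and the reward can be illustrated with a deliberately simplified sketch. It is a one-step (bandit-style) policy-gradient example, not the multi-step formulation used in the cited works: the policy is a Gaussian over a normalized two-dimensional sizing vector, the hypothetical `reward` function plays the part of the simulator-based environment, and the REINFORCE update with a mean baseline moves the policy toward higher reward.

```python
import numpy as np

TARGET = np.array([0.7, 0.3])   # hypothetical "ideal sizing" in a normalized design space

def reward(x):
    """Toy environment: a single number returned to the agent (higher is better)."""
    return -float(np.sum((x - TARGET) ** 2))

def train_policy(iterations=200, batch=32, sigma=0.1, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    mu = np.zeros(2)                                    # policy parameter: Gaussian mean
    for _ in range(iterations):
        # Stochastic policy: sample a batch of actions (candidate sizings).
        actions = mu + sigma * rng.normal(size=(batch, 2))
        rewards = np.array([reward(a) for a in actions])
        advantages = rewards - rewards.mean()           # baseline reduces variance
        # REINFORCE: grad of log N(a; mu, sigma^2 I) w.r.t. mu is (a - mu) / sigma^2.
        grad = (advantages[:, None] * (actions - mu)).mean(axis=0) / sigma ** 2
        mu = mu + lr * grad                             # gradient ascent on expected reward
    return mu

mu_final = train_policy()
```

A learned value function or a DNN environment model would replace the simple mean baseline and the direct reward calls in a more sample-efficient setup.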

#### 1.2.2.4 Challenges in Practical Analog Optimization

Many challenges remain despite the great progress in circuit sizing optimization. There are three directions we can explore. 1) *Scalability.* All existing works have focused on small-sized circuit blocks. To extend the current methods to larger-scale circuits, we have two potential directions. First, the single thread of the sequential trial-and-error process limits the exploration at each time step. Asynchronous parallelism is a way to best leverage modern compute power. Second, hierarchy in large circuit systems has not been studied. Human designers often start designing in a top-down fashion. Similarly, how to automate divide-and-conquer and how to communicate between high-level and low-level abstractions are essential questions we have to answer to achieve large-scale optimization. 2) *Robustness.* There are very few works on optimization under uncertainty. However, the real challenge of designing an analog circuit is to overcome numerous unpredictable uncertainties in the fabrication and end-user environment. Incorporating uncertainty into the automation flow in a brute-force way leads to an explosive increase in simulation cost. 3) *Generalization.* Unlike digital circuits, analog circuits are usually highly customized. However, there are still many properties that analog circuits share in common (e.g., the noise-bandwidth trade-off). Existing methods are applied to a small cluster of circuits and can hardly be adapted to a different kind. To overcome this challenge, a promising way is to discover a better circuit representation. The GNN, as used in [Mirhoseini et al. [2021]], has a data structure that aligns better with the circuit connectivity. With such a circuit representation, offline data could be exploited to achieve few-shot learning.

A substantial part of this dissertation addresses the robustness problem of analog design automation. The challenges of variation-aware automation are twofold. First, the simulation cost of capturing accurate variation effects is prohibitively expensive. Second, optimizing circuits under one condition may conflict with another, making the optimization landscape more complex. A multi-task RL framework with task-space pruning is presented to address the above challenges. Variations are viewed as different tasks. Their correlations are modeled and conflicts are mitigated, leading to increased sampling efficiency. A proxy training task set is selected by pruning, reducing the number of simulations compared with the full set. In the subsequent chapter, the corresponding methodology and analysis will be shown. Comparisons between existing optimization methods and the proposed method will be discussed.

## Chapter 2

# High Speed Hybrid CT-DT DSM with Charge-domain ELDC

This chapter<sup>1</sup> presents a hybrid $4^{th}$-order delta-sigma modulator (DSM). It combines a continuous-time (CT) loop filter and a discrete-time (DT) passive $2^{nd}$-order noise-shaping SAR (NS-SAR). Since the $2^{nd}$-order NS-SAR is robust against PVT, the stability of this $4^{th}$-order DSM is similar to that of a $2^{nd}$-order CT-DSM. The CT loop filter is based on a single-amplifier biquad (SAB) structure. As a result, only one OTA is used to achieve $4^{th}$-order noise shaping, leading to a high power efficiency. Moreover, this work implements both ELD compensation and an input feedforward path inside the NS-SAR in the charge domain, further reducing the circuit complexity and the OTA power. Overall, this work achieves 81 dB SNDR over 12.5 MHz with 3.7 mW power, leading to a Schreier FoM of 176 dB.

<sup>1</sup>This chapter is a partial reprint of the publication: Wei Shi, Jiaxin Liu, Abhishek Mukherjee, Xiangxing Yang, Xiyuan Tang, Linxiao Shen, Wenda Zhao and Nan Sun, "A 3.7mW 12.5MHz 81dB-SNDR 4th-Order CTDSM with Single-OTA and 2nd-Order NS-SAR," in *IEEE International Solid-State Circuits Conference (ISSCC)*, pp. C170-C172, February 2021. I am the main contributor in charge of circuit design, layout, and chip validations.

### 2.1 Introduction

The continuous-time (CT) delta-sigma modulator (DSM) is a popular choice in wide-band analog-to-digital converters (ADCs), especially for wireless transceivers. Typical DT and CT DSMs are shown in Fig. 2.1. The loop filter of the CT DSM processes the signal before sampling, achieving both noise shaping and anti-alias filtering. The settling requirement is obviated owing to the CT-domain operation. Therefore, lower dynamic requirements for the loop filter, the inherent anti-aliasing filtering, and an easy-to-drive front-end are the major advantages that the CTDSM has over its DT counterpart. A low oversampling ratio (OSR) is needed if an energy-efficient CT DSM with high bandwidth and high resolution is desired. A high-order noise transfer function (NTF) is usually applied to achieve the target resolution. However, high-order CT DSMs face several challenges due to the non-idealities of practical implementations. First, the CT loop filter is sensitive to process variation [Schreier and Zhang [1996]]. The loop-filter coefficients primarily depend on resistors and capacitors whose values vary across PVT. Consequently, extra tuning circuitry is needed. Second, conventional high-order loop filters have a large number of OTAs. Multiple cascaded OTAs cause phase shifts, leading to instability. Third, excess-loop-delay (ELD) in the CT DSM alters the NTF. The price paid to compensate for the ELD is high in high-order designs. Extra circuit complexity and power consumption deteriorate the power efficiency.

Figure 2.1: Typical discrete-time and continuous-time structures.
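The link between a low OSR and the need for a high-order NTF can be illustrated with the textbook peak-SQNR estimate for an ideal modulator whose NTF is $(1 - z^{-1})^L$. The sketch below is only a rough guide under stated assumptions: it ignores stability limits and optimized NTF zeros, and the 4-bit quantizer resolution is a hypothetical value chosen for illustration. The 500 MHz sampling frequency and 12.5 MHz bandwidth are the figures quoted for the prototype in this chapter.

```python
import math

def oversampling_ratio(fs_hz, bw_hz):
    """OSR = f_s / (2 * BW)."""
    return fs_hz / (2.0 * bw_hz)

def ideal_sqnr_db(order, osr, quantizer_bits):
    """Peak SQNR of an ideal order-L modulator with NTF = (1 - z^-1)^L."""
    return (6.02 * quantizer_bits + 1.76
            + 10.0 * math.log10((2 * order + 1) / math.pi ** (2 * order))
            + (2 * order + 1) * 10.0 * math.log10(osr))

osr = oversampling_ratio(500e6, 12.5e6)
for order in (2, 3, 4):
    # quantizer_bits=4 is an assumption, not a value taken from the text.
    print(order, round(ideal_sqnr_db(order, osr, quantizer_bits=4), 1))
```

At this OSR of 20, each additional order of shaping adds a substantial amount of ideal SQNR, which is why high-order NTFs are attractive for wideband designs.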

In this work, we present a $4^{th}$-order hybrid CT-DT DSM that combines a $2^{nd}$-order single-amplifier-biquad (SAB)-based CT filter with a $2^{nd}$-order passive noise-shaping SAR (NS-SAR). With the DT noise shaping immune to PVT variations, the hybrid DSM achieves better robustness. Compared to the conventional design, the combination of the SAB and the passive NS-SAR reduces the number of OTAs from many to one. Power is saved by removing many power-hungry OTAs and by relaxing the bandwidth requirement. Moreover, the ELD compensation (ELDC) is embedded into the charge domain. There is minimal extra power and circuitry cost for the ELD compensation since it is implemented with the small CDAC of the NS-SAR.

This chapter is organized as follows. Section 2.2 reviews the prior high-order CTDSM designs and introduces the proposed hybrid $4^{th}$-order DSM. Section 2.3 discusses the ELD compensation techniques and introduces the proposed charge-domain ELD compensation with both feedforward and feedback paths. Section 2.4 describes the design of the prototype ADC and discusses implementation details. Section 2.5 presents measurement results. Section 2.6 concludes this chapter.

# 2.2 Proposed 4<sup>th</sup>-Order Hybrid CT-DT DSM

#### 2.2.1 Prior High-Order CT DSMs

There is extensive research on tackling the challenges in high-order CTDSM designs. The NTF sensitivity to the RC time constant variation causes performance degradation and instability. Leaving design margin for variation is a feasible solution, which essentially trades noise-shaping performance for better robustness [De Vuyst et al. [2011]; Ho et al. [2015]; Berti et al. [2016]]. An alternative solution is a corner-adaptive tuning circuit or off-chip calibration [Pavan et al. [2017]]. The loop-filter implementation is a determinant of the power efficiency. Gm-C and VCO integrators are proposed as low-power alternatives to the power-hungry feedback integrators [Mukherjee et al. [2020]; Li et al. [2017]]. However, their linearity is limited. To reduce the total delay of the multiple cascaded OTAs, a higher power budget has to be assigned to meet the stringent OTA bandwidth requirement. OTA bandwidth requirements can be relaxed by ELD over-compensation [Shu et al. [2013]], but there are still four OTAs for a $4^{th}$-order DSM. One way to improve the power efficiency is to reduce the sheer number of OTAs. The SAB has been studied and used in the CT DSM [Zanbaghi et al. [2013]; Wang et al. [2018b]; Berti et al. [2016]]. However, unsatisfactory loop-filter robustness remains an issue.

Lowering the CT noise-shaping order by adding DT order can be a strategy to improve both the robustness and the power efficiency. In [Signore et al. [1990]], a CT front-end is applied to the DSM, followed by switched-capacitor-based filters. The large sampling capacitor is avoided by using a first-order RC integrator. The precise positioning of poles and zeros in the NTF is guaranteed by the following third-order DT filter. However, the DT filter still needs OTAs. Thus, the total number of OTAs in the DSM is unchanged. An effective way to incorporate the DT shaping order is to embed a noise-shaping quantizer into the DSM loop. Emerging passive NS-SARs are attractive because they are OTA-free, PVT-robust, and energy-efficient [Guo and Sun [2016]; Lo et al. [2019]]. An RC integrator and a $2^{nd}$-order passive NS-SAR are combined in [Liu et al. [2019]]. Third-order shaping is achieved with a single OTA. However, the passive NS-SAR input-referred noise cannot be sufficiently suppressed by the $1^{st}$-order CT filter. Even worse, the NS-SAR noise is amplified by four times because of the direct feedback ELD compensation, which will be discussed in Section 2.3. Therefore, the resolution is limited to 70 dB SNDR. The work of [Xing et al. [2020]] adopts a SAB-based filter as the front-end and a $1^{st}$-order NS-SAR as the back-end quantizer. However, the NS-SAR NTF is mild, and the current-domain ELD compensation is costly. A $1^{st}$-order NS-SAR is used with a $1^{st}$-order RC integrator in [Lo et al. [2019]]. The high resolution relies on the 12-cycle NS-SAR. Nevertheless, the 12-cycle NS-SAR requires a larger timing budget and area.

#### 2.2.2 Hybrid DSM Architecture

To address the high-order CT DSM design challenges mentioned above, we propose a $4^{th}$-order hybrid CT-DT DSM. Fig. 2.2 shows the architecture of the proposed hybrid DSM with a $2^{nd}$-order CT SAB filter and a passive $2^{nd}$-order DT NS-SAR. The NS-SAR NTF coefficients are set by capacitor and transistor device ratios [Liu et al. [2019]]. Therefore, the $2^{nd}$-order shaping from the passive NS-SAR is immune to PVT variations, which improves the robustness of the proposed $4^{th}$-order DSM.

The extra thermal noise from the passive NS-SAR prevents it from achieving a high resolution as a standalone quantizer. In the proposed DSM, the high gain provided by the SAB filter suppresses the noise of the passive NS-SAR. Consequently, the power and area of the NS-SAR can be made very small. In the meantime, the  $4^{th}$ -order shaping is achieved by only one OTA. The power is saved from the elimination of multiple power-hungry OTAs and the bandwidth relaxation. Moreover, the ELD compensation is implemented in the charge domain with minimal circuitry and power overhead.



Figure 2.2: Proposed hybrid DSM with SAB filter and passive NS-SAR quantizer

#### 2.2.3 Coefficient Sensitivity

Fig. 2.3 compares a conventional  $4^{th}$ -order CTDSM with the proposed hybrid architecture. In the conventional design, tunable capacitor banks are added to deal with the variation. Four OTAs are cascaded in the loop-filter with the corresponding bandwidths  $BW_{1-4}$ . The proposed DSM has a single OTA with bandwidth  $BW_0$  and a noise-shaped quantizer. The sensitivity of the conventional and proposed DSM to the RC time constant variation is shown in Fig. 2.4. Both DSMs have an NTF with two optimized zeros.

It can be clearly seen that the conventional $4^{th}$-order CTDSM is highly sensitive to RC time constant variations. It becomes unstable when the variation in the coefficients (1/RC) goes beyond +8%. Meanwhile, its SQNR drops significantly as the coefficients decrease. By contrast, the proposed DSM remains stable as long as the coefficient variation is less than



Figure 2.3: (a) Conventional  $4^{th}\text{-}\mathrm{order}$  CT DSM, (b) Proposed  $4^{th}\text{-}\mathrm{order}$  hybrid CT-DT DSM

+15%, and the slope of the SQNR variation is also milder (e.g., SQNR >80 dB with -30% coefficient variation). The improved robustness leads to the area and complexity reduction of the tuning circuits.

#### 2.2.4 Finite UGB Effects

The finite unity-gain bandwidth (UGB) of the OTA degrades the stability of the feedback loop. To investigate how the proposed DSM performs under different $BW_0$, we incorporate the OTA UGB into our DSM modeling.

Figure 2.4: Coefficient sensitivity comparison between a conventional and the proposed $4^{th}$-order hybrid CT-DT DSM

The SAB transfer function is:

$$H(s) = \frac{k_1 s + k_2}{s^2 + k_3 s + w_p^2} \tag{2.1}$$

We model the OTA as a single-pole system with a finite DC gain $A_{DC}$ and a pole frequency $\omega_0$. Correspondingly, the UGB is $A_{DC}\omega_0$. In order to achieve an ideal SAB filter response, $k_3$ is set to zero by choosing proper RC values.

Figure 2.5: Proposed hybrid CT-DT DSM SQNR with different OTA UGBs

Therefore, the SAB transfer function with the finite OTA UGB becomes

$$H'(s) = \frac{k_1 s + k_2}{s^2 + w_p^2 + \left(\frac{s}{UGB} + \frac{1}{A_{DC}}\right)\left(s^2 + k_1 s + k_2 + w_p^2\right)} \tag{2.2}$$

Fig. 2.5 shows the SQNR versus the UGB of the single OTA. From $0.5f_s$ to $1.5f_s$, the SQNR improvement is around 10 dB, while there is only a 2~3 dB improvement from $1.5f_s$ to $5f_s$. Thus, $1.5f_s$ is chosen as our OTA bandwidth. By contrast, conventional $4^{th}$-order CTDSMs require an OTA bandwidth of $3\sim 4 f_s$ to make the loop stable [Wang et al. [2020c]]. The relaxation of the OTA bandwidth and the decreased number of OTAs result in significantly reduced power.
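Equations (2.1) and (2.2) can be evaluated numerically to see how the finite UGB and DC gain perturb the SAB response. The sketch below is illustrative only: the coefficient values `k1`, `k2`, and `wp` and the evaluation frequency are hypothetical and normalized to $f_s = 1$, `ugb` is taken in rad/s as implied by the $s/UGB$ term, and the result is not meant to reproduce Fig. 2.5.

```python
import numpy as np

def sab_ideal(s, k1, k2, wp):
    """Eq. (2.1) with k3 = 0: ideal SAB transfer function."""
    return (k1 * s + k2) / (s ** 2 + wp ** 2)

def sab_finite_ugb(s, k1, k2, wp, ugb, a_dc):
    """Eq. (2.2): SAB response with a single-pole OTA (finite UGB and DC gain)."""
    error = (s / ugb + 1.0 / a_dc) * (s ** 2 + k1 * s + k2 + wp ** 2)
    return (k1 * s + k2) / (s ** 2 + wp ** 2 + error)

# Hypothetical, normalized coefficients (f_s = 1); not the prototype's values.
fs = 1.0
k1, k2, wp = 1.0, 0.5, 2 * np.pi * 0.02
s = 1j * 2 * np.pi * 0.01 * fs              # an in-band evaluation frequency

h_ideal = sab_ideal(s, k1, k2, wp)
for ugb_over_fs in (0.5, 1.5, 5.0):
    ugb = 2 * np.pi * ugb_over_fs * fs       # UGB in rad/s
    h_real = sab_finite_ugb(s, k1, k2, wp, ugb, a_dc=1000.0)
    print(ugb_over_fs, round(20 * np.log10(abs(h_real) / abs(h_ideal)), 2))
```

As the UGB and $A_{DC}$ grow, the extra term in the denominator vanishes and Eq. (2.2) reduces to Eq. (2.1), which is a quick sanity check on the model.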



Figure 2.6: ELDC implemented by the direct feedback path



Figure 2.7: ELDC implemented by the residual feedforward path

# 2.3 Proposed ELD Compensation Scheme

#### 2.3.1 Brief Review of ELD Compensation

ELD is a key problem in CT DSM design. Such delay degrades the modulator stability and changes the NTF. ELD compensation techniques can be used to restore the loop stability and the desired NTF. However, the price for ELD compensation is high in high-order and high-speed CT DSM designs. The high cost consists of two parts. First, it adds extra circuitry to the DSM. In conventional designs, a direct feedback path around the quantizer dictates an extra DAC [Pavan et al. [2017]]. Second, the filter coefficients need to be tuned. To compensate for the delay, we have to increase the coefficients of the lower-order paths in the loop filter. The required filter UGB is then increased, and the OTA bandwidths have to be increased proportionally to maintain the desired filter response. Therefore, the ELD compensation incurs a significant increase in both circuit complexity and power consumption. Prior works have dedicated significant efforts to addressing the high cost of ELD compensation. The direct feedback path can be embedded into the flash quantizer [Shu et al. [2013]]. However, the feedback path complicates the flash quantizer design. If the quantizer is a SAR, the feedback path implementation can be simplified because of the inherent CDAC [Wei et al. [2015]; Wu et al. [2016]; Liu et al. [2019]]. However, the direct feedback path around the quantizer increases the loop-filter swing, as shown in Fig. 2.6. The larger swing degrades the OTA linearity or even exceeds the power supply range. Dynamic range scaling has to be performed by changing the loop-filter and quantizer gains. With a reduced filter gain, the noise is amplified. A residual ELD compensation can avoid the large filter output swing by not feeding the signal component to the filter output [Ho et al. [2015]; Wang et al. [2020c]]. Fig. 2.7 shows that only the noise component is sent to the summing node before the quantizer. However, these schemes are implemented in the current domain: a feedforward resistor is inserted into the RC integrator, as shown in Fig. 2.8. The ELD compensation is affected by the finite UGB of the OTA. Alternatively, the compensation can be implemented in the phase domain in a VCO-based design [Huang et al. [2017]]. Nevertheless, it consumes almost 1.5 times the power of the whole loop filter.



Figure 2.8: (a) Residual ELDC signal flow (b) ELDC path implementation in the current domain

#### 2.3.2 Proposed Charge-domain ELD Compensation

Thanks to the inherent CDAC of the SAR quantizer, we can easily implement the residual ELDC in the charge domain, as shown in Fig. 2.9. We propose an efficient charge-domain ELD compensation shown in Fig. 2.10(a). The high efficiency stems from two reasons. First, the compensation is implemented by the small CDAC of the NS-SAR. The passive-capacitor-based ELD compensation



Figure 2.9: Residual ELDC implementation in the charge domain

adds no extra burden on the OTA. In the meantime, the additional circuitry on the CDAC is minimal. An extra feedback DAC is not needed. Second, it implements the residual ELD compensation by combining a feedforward and a feedback path. Therefore, no dynamic range scaling is needed. The noise performance degradation is avoided. Fig. 2.10(b) shows the timing and charge domain operations. The feedback path and feedforward path are implemented by a sub-DAC,  $C_{ELDC}$ . The feedback path output is fed into the bottom plate of  $C_{ELDC}$ . The feedforward path output is sampled onto the top plate of  $C_{ELDC}$ . Note the feedforward signal is also sampled onto the  $C_{SAR}$ . This helps to reduce the SAB output swing further, leading to the linearity enhancement. The timing of the ELD compensation is the following. During the sampling phase, the bottom plates of  $C_{ELDC}$  and  $C_{SAR}$  are connected to D[n-1] and the inverse of the SAB output  $-V_L$ , respectively. Their top plates are both connected to the DSM input,  $V_{IN}$ . At the sampling frequency of 500 MHz, all sampling operations need to finish within 200 ps under 40-nm LP technology. Fortunately, they do not need to be absolutely accurate, as any sampling error



Figure 2.10: (a) Signal flow and schematic of proposed charge domain embedded ELD compensation, (b) Timing diagram of ELD compensation operations on CDAC

is  $2^{nd}$ -order shaped by the front-end SAB. As the SAR conversion phase begins, their top plates are disconnected from  $V_{IN}$ . Shortly after that, the bottom plates of  $C_{ELDC}$  and  $C_{SAR}$  are reset to  $V_{cm}$ . Consequently, both feedback and feedforward functions are implemented. Then the SAR conversion is performed on  $C_{SAR}$ .



Figure 2.11: (a) The UGB increase of the conventional  $4^{th}$ -order filter after ELDC, (b) The UGB increase of the SAB  $2^{nd}$ -order filter after ELDC.

It is worth noting that lowering the CT noise-shaping order can significantly reduce the power cost of ELD compensation. Compared with a $4^{th}$-order loop filter, a $2^{nd}$-order filter is easier to stabilize. To compensate for the same ELD, the UGB increase of the conventional $4^{th}$-order filter is significantly larger than that of the $2^{nd}$-order SAB filter, as shown in Fig. 2.11. Assuming the loop filters are active RC filters, filter transfer functions with higher UGBs need OTAs with higher bandwidths, leading to more power consumption. Note that the frequency response plots in Fig. 2.11 are normalized to $f_s$. It is more challenging to push the UGB even further at a higher $f_s$. Consequently, DSMs with a higher $f_s$ benefit more from lower-order CT shaping because of the reduced ELD cost.

#### 2.3.3 Feedforward Path and STF

In a high-order CTDSM with multiple OTAs, the full output swing is handled by the last integrator. The resulting distortion due to the large output swing is suppressed by the gain of the prior stages. However, the SAB has inferior linearity because the single OTA has to handle the full swing and provides less loop gain to suppress the distortion. The non-linearity becomes an issue when the target resolution is high (SNDR >80 dB). In this work, the feedforward path with a gain of $k_{ELD}$ and the direct feedback DAC are shown in Fig. 2.12(a). This arrangement forms the residual ELDC in Fig. 2.12(b). Besides the feedforward signal used for ELD compensation, an extra signal is fed forward to enhance the filter linearity. This feedforward component, with a gain of $k_{FF}$, modifies the STF of the CT DSM. Since the STF of the NS-SAR is unity, we only need to study the STF of the second-order CT outer loop. We can rearrange the signal flow model to separate the CT and DT transfer functions [Pavan et al. [2017]]. Fig. 2.13(a) shows the signal models.

After the rearrangement, it can be easily shown that the STF is

$$STF(s) = (LF_1(s) + k_{FF}) \cdot NTF_2(e^{sT_s})$$
 (2.3)



Figure 2.12: (a) Original CT DSM model, (b) Equivalent residual ELD compensation model.

where $LF_1(s)$ is the loop-filter transfer function with ELDC and $NTF_2$ is the corresponding $2^{nd}$-order NTF created by the outer CT loop. STF(s) is shown in Fig. 2.13(b). $LF_1(s)$ is low-pass. Since $NTF_2$ creates notches at multiples of the sampling frequency, STF(s) maintains the anti-aliasing feature. The cost is a peaking of 6 dB due to the direct feedforward path.
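The anti-aliasing argument can be checked numerically with a minimal sketch of (2.3). The loop filter below is a generic second-order low-pass stand-in, not the SAB response, and the values of `K_FF`, `W0`, and the $(1 - z^{-1})^2$ form of `ntf2` are assumptions chosen only for illustration. The sketch evaluates the STF only at nonzero multiples of $f_s$, where the notches of $NTF_2$ appear regardless of the exact $LF_1(s)$; it does not model the in-band response or the 6-dB peaking.

```python
import cmath
import math

FS = 500e6                      # sampling frequency quoted in the text
K_FF = 1.0                      # assumed feedforward gain (illustrative only)
W0 = 2 * math.pi * 12.5e6       # assumed corner of the stand-in low-pass filter

def lf1(s):
    """Stand-in second-order low-pass filter (NOT the actual SAB transfer function)."""
    return W0**2 / (s**2 + math.sqrt(2) * W0 * s + W0**2)

def ntf2(f):
    """Assumed 2nd-order NTF, (1 - z^-1)^2, evaluated at z = exp(j*2*pi*f/FS)."""
    z_inv = cmath.exp(-2j * math.pi * f / FS)
    return (1 - z_inv) ** 2

def stf(f):
    """Equation (2.3): STF = (LF_1(s) + k_FF) * NTF_2(e^{s*Ts}) at s = j*2*pi*f."""
    s = 2j * math.pi * f
    return (lf1(s) + K_FF) * ntf2(f)

for k in (1, 2, 3):
    print(f"|STF| at {k}*fs = {abs(stf(k * FS)):.3e}")
```

Under these assumptions the magnitude at each multiple of $f_s$ collapses to numerical noise, which is the notch behavior that provides the alias rejection.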



Figure 2.13: (a) Rearranged model to separate CT and DT domain, (b) STF Bode plot.

## 2.4 Circuit Implementations

#### 2.4.1 System Architecture

Fig. 2.14 shows the schematic of the entire DSM. The  $2^{nd}$ -order CT front-end is implemented by a SAB. To realize the desired biquad response, the OTA outputs are cross-coupled to its inputs via  $R_1$  and  $C_1$ . Negative feedback paths are formed by  $R_2$  and  $C_2$ .  $C_1$  and  $C_2$  are made by adjustable capacitor banks to further enhance DSM stability against process variations. The passive NS-SAR achieves  $2^{nd}$ -order NTF by capacitor merging and the multi-input comparator ratio. The OTA output and one-cycle delayed quantizer output



Figure 2.14: Schematic and timing diagram of proposed hybrid DSM.

are sampled onto the CDAC bottom plates. ADC input is sampled onto the CDAC top plates. kT/C noise and capacitor mismatch errors in the NS-SAR CDAC are significantly attenuated by the  $2^{nd}$ -order shaping provided by the SAB. Thus, capacitors in the CDAC can be made very small. The differential CDAC is used for both sampling and conversion. Each half is only 30 fF. The small CDAC facilitates the high-speed operations in the NS-SAR, including signal sampling, SAR conversion, and capacitor-merging based

residue integration. The power that CDAC draws from the reference is only 0.04 mW, which is 1% of the total ADC power.

#### 2.4.2 SAB Design

In a conventional CT DSM design, the number of OTAs is at least equal to the loop-filter order. A number of prior works aim to reduce the number of OTAs in the loop filter because the OTAs are the main contributors to the power dissipation. The work of [Zeller et al. [2011]] introduces a cross-coupled SAB which can achieve an arbitrary second-order polynomial in the numerator of its transfer function. A cross-coupled structure with a simplified RC network is used in the CT DSM of [Ho et al. [2015]]. It only contains one resistor and two capacitors. Reference [Chae et al. [2013]] uses a structure with only one more resistor than that of [Ho et al. [2015]] to make a single-amplifier resonator. The work of [Berti et al. [2016]] uses the same structure as that of [Chae et al. [2013]] but changes the output node to achieve complex poles in the filter transfer function. Therefore, an NTF with optimized zeros can be achieved. Because of the low OSR in our design, we use the same SAB as that of [Berti et al. [2016]]. In our work, the coefficients in (2.1) are:

$$\begin{cases} k_{1} = \frac{1}{R_{IN}C_{2}} \\ k_{2} = \frac{1}{R_{IN}R_{1}C_{1}C_{2}} \\ k_{3} = \frac{1}{R_{1}C_{1}} + \frac{1}{R_{2}C_{2}} - \frac{1}{R_{1}C_{2}} \\ \omega_{p}^{2} = \frac{1}{R_{1}R_{2}C_{1}C_{2}} \end{cases}$$
(2.4)

Assuming  $R_{IN}$  is bounded by thermal noise constraints, the resulting component values are

$$\begin{cases}
R_{2} = \frac{k_{2}}{\omega_{p}^{2}}R_{IN} \\
R_{1} = \frac{R_{2}}{1 + \left(\frac{k_{2}}{k_{1}}\right)^{2}\frac{1}{\omega_{p}^{2}}} \\
C_{1} = \frac{k_{1}}{k_{2}R_{1}} \\
C_{2} = \frac{\left(\frac{k_{2}}{k_{1}}\right)^{2}}{\left(\frac{k_{2}}{k_{1}}\right)^{2} + \omega_{p}^{2}}C_{1}
\end{cases}$$
(2.5)

 $C_1$  is derived from the condition  $k_3 = 0$ . Arbitrary combinations of a second-order and a first-order path can be achieved. However, there is a lack of a zero-order path in the SAB filter transfer function. The zero-order path has to be implemented by other circuitry in the CT DSM.
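The mapping between coefficients and component values can be sanity-checked with a short round-trip sketch. The function names and the numerical values of $k_1$, $k_2$, $\omega_p$, and $R_{IN}$ below are hypothetical and chosen only for illustration. The solver uses $\omega_p$ in every expression, which is what the condition $k_3 = 0$ gives when it is worked out from (2.4).

```python
import math

def sab_components(k1, k2, wp, r_in):
    """Solve for R1, R2, C1, C2 from the coefficients, assuming k3 = 0."""
    a = k2 / k1                          # equals 1/(R1*C1) according to (2.4)
    r2 = k2 / wp**2 * r_in
    r1 = r2 / (1 + a**2 / wp**2)
    c1 = k1 / (k2 * r1)
    c2 = a**2 / (a**2 + wp**2) * c1
    return r1, r2, c1, c2

def sab_coefficients(r_in, r1, r2, c1, c2):
    """Evaluate k1, k2, k3 and wp^2 from component values using (2.4)."""
    k1 = 1 / (r_in * c2)
    k2 = 1 / (r_in * r1 * c1 * c2)
    k3 = 1 / (r1 * c1) + 1 / (r2 * c2) - 1 / (r1 * c2)
    wp2 = 1 / (r1 * r2 * c1 * c2)
    return k1, k2, k3, wp2

# Hypothetical design targets (not taken from the dissertation).
K1, K2, WP, R_IN = 2e9, 1e18, 2 * math.pi * 10e6, 1e3

r1, r2, c1, c2 = sab_components(K1, K2, WP, R_IN)
k1, k2, k3, wp2 = sab_coefficients(R_IN, r1, r2, c1, c2)
print(f"R1={r1:.4g} ohm, R2={r2:.4g} ohm, C1={c1:.4g} F, C2={c2:.4g} F")
print(f"recovered k1={k1:.4g}, k2={k2:.4g}, k3={k3:.3g}, wp={math.sqrt(wp2):.4g}")
```

The recovered $k_1$, $k_2$, and $\omega_p$ match the targets, and $k_3$ evaluates to zero within floating-point error, so the two equation sets are mutually consistent under these assumptions.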

The single OTA is a key circuit block in our DSM design. Thanks to the system design of the proposed DSM, the tight BW requirement is significantly



Figure 2.15: The two-stage feedforward-compensated OTA design

relaxed to only $1.5f_s$. Therefore, a simple two-stage feedforward-compensated OTA is adopted in the SAB filter. Fig. 2.15 shows the OTA schematic. The cascade of $M_{1-4}$ and $M_7$ provides a slow but high dc gain path. $M_8$ creates a fast feedforward path between the input and the output to stabilize the OTA. The input common-mode voltage of the second stage is determined by the output common-mode voltage of the first stage. Therefore, the current of the second stage tracks that of the first stage under PVT variations. The constant current



Figure 2.16: Comparator design

ratio leads to a constant $g_m$ ratio between the first and second stages, and hence high stability. The first stage adopts the current reuse technique in [Song et al. [2013]], which nearly doubles the $g_m/I_d$. The first-stage CMFB loop is implemented by an output resistor divider. The second-stage CMFB includes an extra error amplifier. Overall, this OTA consumes 2 mW under the 1.2-V power supply, which accounts for 54% of the total power.

#### 2.4.3 2<sup>nd</sup>-order NS-SAR Design

The passive NS-SAR is an emerging quantizer architecture. It achieves a high resolution through its noise-shaping capability while inheriting the high power efficiency of the SAR architecture. It leverages the charge sharing between capacitors to implement the integration. The passive NS-SAR variants implement the gain block in different ways. Two methods are capacitor stacking [Lin et al. [2019]; Liu et al. [2020]] and the multi-input comparator [Guo and Sun [2016]; Liu et al. [2019]]. Capacitor stacking helps

|                         | ISSCC-13<br>Shu | ISSCC-17<br>Kim | ISSCC-19<br>Lo                   | VLSI-18<br>Liu                   | VLSI-19<br>Weng                  | VLSI-20<br>Xing                  | This<br>work                     |
|-------------------------|-----------------|-----------------|----------------------------------|----------------------------------|----------------------------------|----------------------------------|----------------------------------|
| Order                   | 4               | 4               | 2                                | 3                                | 3                                | 3                                | 4                                |
| # OTA                   | 4               | 2               | 1                                | 1                                | 3                                | 1                                | 1                                |
| Quantizer               | Flash           | DNSQ            | 1 <sup>st</sup> -order<br>NS-SAR | 2 <sup>nd</sup> -order<br>NS-SAR | 1 <sup>st</sup> -order<br>NS-SAR | 1 <sup>st</sup> -order<br>NS-SAR | 2 <sup>nd</sup> -order<br>NS-SAR |
| Process [nm]            | 28              | 130             | 7                                | 40                               | 12                               | 28                               | 40                               |
| Fs [MHz]                | 640             | 640             | 400                              | 500                              | 832                              | 1500                             | 500                              |
| BW [MHz]                | 18              | 15              | 25                               | 12.5                             | 30                               | 50                               | 12.5                             |
| Area [mm <sup>2</sup> ] | 0.08            | 0.17            | 0.056                            | 0.029                            | 0.058                            | 0.024                            | 0.057                            |
| Power [mW]              | 3.9             | 11.4            | 3.8                              | 1.16                             | 3.2                              | 10.4                             | 3.7                              |
| DR [dB]                 | 78.1            | 82.9            | 79.4                             | 73                               | 74.5                             | 80.6                             | 82.2                             |
| SNDR [dB]               | 73.6            | 80.4            | 74                               | 70.4                             | 71.4                             | 74.4                             | 80.9                             |
| FoMs* [dB]              | 170.2           | 171.6           | 172.2                            | 170.7                            | 171                              | 171.2                            | 176.1                            |

Table 2.1: Comparison with the state-of-the-art single-loop CT DSMs.

 $*FoM_S = SNDR + 10 \cdot \log_{10}(BW/Power)$ 

to reduce the comparator noise [Lin et al. [2019]; Liu et al. [2020]]. However, those NS-SARs have only $1^{st}$-order noise shaping and complicate the CDAC design. Therefore, we choose a multi-input-comparator-based NS-SAR, which achieves $2^{nd}$-order noise shaping without adding circuit complexity.

Fig. 2.14 also shows the architecture of NS-SAR. In our design, 4-bit conversion is finished during  $\phi_{clkc}$ , followed by two integration phases,  $\phi_{int}\langle 0 \rangle$ and  $\phi_{int}\langle 1 \rangle$ . The 1<sup>st</sup> integration phase  $\phi_{int}\langle 0 \rangle$  is merged with the SAR LSB conversion. Therefore, only the 2<sup>nd</sup> integration phase  $\phi_{int}\langle 1 \rangle$  takes the extra



Figure 2.17: Die micrograph

time. Overall, the timing budget of this 4b NS-SAR is equivalent to that of the standard 5b SAR, but can provide an effective 8b resolution at the OSR of 20 thanks to its  $2^{nd}$ -order noise shaping.

Robust passive gains are crucial to the NS-SAR. They are determined by the relative gain ratios of the comparator input transistors [Liu et al. [2019]]. Although the ratios depend on the transistor dimensions to the first order, they are affected by the variations of the common-mode input voltage $V_{cm}$ in bi-directional DAC switching [Liu et al. [2019]]. To minimize the $V_{cm}$ variation during the SAR conversion, the $V_{cm}$ switching of [Zhu et al. [2010]] is used in our NS-SAR. The comparator schematic is shown in Fig. 2.16. Note that the next stage is triggered by the previous stage. The sequential operations



Figure 2.18: Measured single-tone spectrum

guarantee a consistent pre-amplification gain hence a robust noise performance against variations.

## 2.5 Measurement Results

A prototype of the proposed  $4^{th}$ -order hybrid CT-DT DSM is fabricated in a 40-nm LP CMOS process. Fig. 2.17 shows the die photograph. The active area is 0.057 mm<sup>2</sup>. The SAB filter occupies the largest area of 0.035 mm<sup>2</sup>. The quantizer is 0.0078 mm<sup>2</sup> and RDAC is 0.0065 mm<sup>2</sup>.

Fig. 2.18 shows the measured output spectrum with a 3-MHz input



Figure 2.19: Measured two-tone spectrum

signal, showing 4<sup>th</sup>-order shaping. With the bandwidth of 12.5 MHz, this DSM achieves SNDR, SNR, and SFDR of 80.9 dB, 81.7 dB and 89.3 dB, respectively. The RDAC mismatch is addressed by an off-chip calibration. The two-tone test in Fig. 2.19 shows the IMD3 of -81.3 dB. Fig. 2.20 shows the measured SNDR and SNR versus the input amplitude. The dynamic range (DR) is 82.2 dB.

The prototype CT-DT DSM consumes 3.7 mW of power with a 1.2-V supply. The sampling frequency is 500 MHz. The OTA consumes the largest portion of total power, which is 2 mW. NS-SAR consumes 0.85 mW power. RDAC consumes 0.52 mW. The digital circuit power is 0.33 mW. Table 2.1



Figure 2.20: Measured SNDR/DNR vs. different input amplitudes

summarizes the measurement results and compares them with the state-of-the-art single-loop CTDSMs. Compared to the prior works, our work achieves the highest SNDR of 80.9 dB. This work is the only 4<sup>th</sup>-order DSM with a single OTA. Owing to the reduced number of OTAs and the charge-domain ELD compensation, our high-order DSM also achieves the best Schreier FoM of 176.1 dB. Overall, this work presents a compact, high-resolution, and energy-efficient DSM.
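The Schreier figure of merit defined under Table 2.1 can be recomputed from the table entries with a short sketch. The SNDR, bandwidth, and power values are copied from Table 2.1; the function name and dictionary layout are illustrative choices only.

```python
import math

def schreier_fom(sndr_db, bw_hz, power_w):
    """FoM_S = SNDR + 10*log10(BW / Power), with BW in Hz and power in W."""
    return sndr_db + 10 * math.log10(bw_hz / power_w)

# (SNDR [dB], BW [Hz], Power [W]) copied from Table 2.1
TABLE_2_1 = {
    "ISSCC-13 Shu": (73.6, 18e6, 3.9e-3),
    "ISSCC-17 Kim": (80.4, 15e6, 11.4e-3),
    "ISSCC-19 Lo": (74.0, 25e6, 3.8e-3),
    "VLSI-18 Liu": (70.4, 12.5e6, 1.16e-3),
    "VLSI-19 Weng": (71.4, 30e6, 3.2e-3),
    "VLSI-20 Xing": (74.4, 50e6, 10.4e-3),
    "This work": (80.9, 12.5e6, 3.7e-3),
}

for name, (sndr, bw, power) in TABLE_2_1.items():
    print(f"{name:14s} FoM_S = {schreier_fom(sndr, bw, power):.2f} dB")
```

The recomputed values agree with the FoM row of Table 2.1 to within rounding of about 0.1 dB.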

## 2.6 Conclusion

This chapter presents a $4^{th}$-order hybrid DSM with a CT SAB-based $2^{nd}$-order loop filter and a passive $2^{nd}$-order NS-SAR. It combines the merits of a CTDSM (anti-aliasing filtering and relaxed OTA settling) and a DTDSM (PVT robustness and accurate NTF). The robustness against PVT variations is improved. The multiple cascaded OTAs are reduced to a single OTA while still achieving $4^{th}$-order shaping. Moreover, the ELD compensation and the input feedforward path are implemented in the charge domain. Therefore, the power efficiency is improved and the area is reduced. The proposed CT-DT DSM is well suited for applications demanding low power, low design complexity, and high robustness.

# Chapter 3

# High Resolution CTDSM with 2<sup>nd</sup>-order MES

In the previous chapter, we presented techniques that boost the energy efficiency and improve the PVT robustness of the high-speed CTDSM. In this chapter, we introduce techniques that reduce the power and area consumption of high-resolution CTDSMs. The mismatch error shaping order is increased by using low-complexity dynamic element matching (DEM) hardware. Moreover, a highly linear CTDSM is achieved under a low $V_{DD}$, which is friendly to advanced technology nodes. This chapter <sup>1</sup> presents a 95dB-SNDR and 250kHz-bandwidth CTDSM under a 1.1V supply, occupying a small area of 0.37mm<sup>2</sup>. This work significantly reduces the total DAC area by applying a hardware-efficient $2^{nd}$-order vector-quantizer (VQ) based DEM to the MSB DAC. The $2^{nd}$-order DEM not only suppresses the dominant MSB

<sup>1</sup>This chapter is a partial reprint of the publication: Wei Shi, Xing Wang, Xiyuan Tang, Abhishek Mukherjee, Raviteja Theertham, Shanthi Pavan, Lu Jie and Nan Sun, "A $0.37 \text{mm}^2$ 250kHz-BW 95dB-SNDR with Low-Cost $2^{nd}$-order Vector-Quantizer DEM," in *IEEE Custom Integrated Circuits Conference (CICC)*, pp. C170-C172, Apr 2022. I am the main contributor in charge of circuit design, layout, and chip validations.

DAC mismatch error, but also alleviates the SNDR kink issue of the  $1^{st}$ -order DWA. To reduce the high hardware complexity of  $2^{nd}$ -order DEM implementation, a partial-sorter algorithm is proposed. Therefore, aggressive mismatch error shaping is achieved with minimal hardware cost. The loop filter also consumes a considerable area, because the tight noise requirement typically mandates small resistors and hence large integrating capacitors. By using a direct feedforward path, this work reduces the integrating capacitor's area by 10 times. Moreover, it largely suppresses the OTA output swing, resulting in low distortion under a limited voltage supply. Overall, this work achieves 95dB SNDR and 250kHz BW while consuming 4.7mW from 1.1V supply and having a compact area of  $0.37 \text{mm}^2$ .

## 3.1 Introduction

CTDSMs with high resolution and bandwidth greater than 200kHz are needed in industrial, medical, and automotive applications. Such high performance demands very low noise and distortion. The noise and distortion have to be suppressed even further in advanced technologies due to the low voltage headroom. Such tight requirements usually result in excessive power and area consumption. There are three common ways to lower the quantization noise level. We can increase the NTF order, increase the OSR, or increase the number of quantizer bits. A CTDSM with a highly multi-bit quantizer has many advantages. First, a multi-bit quantizer can achieve a higher SQNR at moderate OSRs. Second, a multi-bit quantizer-based CTDSM has a larger maximum stable amplitude (MSA). Third, multi-bit quantization reduces the loop filter swing, relaxing the linearity requirements for the filters. The second and third advantages both contribute to superior power efficiency. Moreover, multi-bit quantization makes the CTDSM less sensitive to clock jitter.

The major distortion sources are the loop filters and the feedback DACs. Highly linear filters are power hungry. Although multi-bit quantization can improve the SQNR and relax the linearity requirements for the loop filters, it mandates a multi-bit DAC. Unlike the inherently linear single-bit DAC, a multi-bit DAC suffers from mismatch between its elements. DAC area and mismatch form a fundamental design trade-off. Mismatch errors can be suppressed by using a large DAC area. Thus, the area cost for a high-resolution multi-bit CTDSM is non-trivial. Efficient mismatch error shaping is therefore of great practical value.

In summary, the large area and power cost of the DAC and loop filters turn out to be the major obstacles in low-noise and low-distortion design. In recent high-resolution CTDSMs, 1st-order data weighted averaging (DWA) is applied to the multi-bit DAC [Gönen et al. [2020]; Wang et al. [2015]; Theertham et al. [2020]]. However, 1st-order DWA yields limited mismatch error suppression, worsens the inter-symbol interference (ISI), and suffers from the kink issue. In [Theertham et al. [2020]], the DAC occupies a considerable area. There is also a kink in the SNDR plot of [Theertham et al. [2020]] at low input amplitudes due to tones caused by DWA. To reduce the area, the works in [Wu et al. [2020]; Liu et al. [2020]] use DWA for the MSB bits and mismatch error shaping (MES) for the LSB bits. MES enables a binary-coded DAC, which saves the LSB DAC area. However, the overall DAC's mismatch-induced distortion is dominated by the MSB bits. Thus, the approach of [Wu et al. [2020]; Liu et al. [2020]] yields limited performance benefits due to the relatively mild 1st-order mismatch error shaping obtained from the DWA operation on the MSB bits. The limitations mentioned above motivated us to develop a CTDSM with a



Figure 3.1: Selection pattern of DWA.

low-complexity $2^{nd}$-order MES. It makes use of the partial sorter algorithm to achieve a hardware-efficient MES. In the meantime, the passive feedforward path helps to reduce the distortion from the RC integrators under a low $V_{DD}$. The design leverages the faster digital logic and higher bandwidth available in advanced technologies. Therefore, this highly linear CTDSM is well suited to advanced technologies.



## 3.2 Proposed CTDSM with Low-Cost 2<sup>nd</sup>-Order Vector-Quantizer-based DEM

Figure 3.2: SNDR measurements at different input amplitudes in [Theertham et al. [2020]]

#### 3.2.1 Prior Mismatch Error Shaping Techniques

Data weighted averaging (DWA) is widely adopted in many DSM designs. Many high-resolution CTDSMs use DWA to achieve  $1^{st}$ -order mismatch error shaping [Theertham et al. [2020]; Wang et al. [2015]; Wu et al. [2020]]. The element selection pattern is shown in Fig. 3.1. The logic implementation of DWA is simple since it is only rotational element selection.

However, DWA only offers  $1^{st}$ -order shaping. Considering the stringent distortion requirement in the high-resolution designs, large DACs are still needed to have a good raw matching. Moreover, the DWA selection pattern lacks randomness, leading to extra tones at low-amplitude inputs. In the work of [Theertham et al. [2020]], there is a kink in the SNDR plot, which is shown in Fig. 3.2.
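The rotational selection used by DWA can be sketched in a few lines. The function name, the 8-element DAC, and the input codes below are hypothetical; the sketch only illustrates the pointer rotation, not any particular hardware implementation.

```python
def dwa_select(codes, n_elements):
    """Rotational (DWA) selection: each code picks the next `code` elements."""
    pointer = 0
    selections = []
    for code in codes:
        # Select `code` consecutive elements, wrapping around the element array.
        chosen = [(pointer + i) % n_elements for i in range(code)]
        selections.append(chosen)
        # The next selection starts where this one ended.
        pointer = (pointer + code) % n_elements
    return selections

for selected in dwa_select([3, 5, 2, 6], n_elements=8):
    print(selected)
```

Because the pointer advances deterministically, a constant or slowly varying code produces a periodic selection pattern, which is consistent with the tonal behavior at low input amplitudes described above.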



Figure 3.3: Conventional VQ-based DEM structure.

To overcome the problems of mild mismatch error shaping and a deterministic selection pattern, $2^{nd}$-order mismatch error shaping naturally becomes a design candidate. In principle, $2^{nd}$-order mismatch error shaping provides stronger error shaping and a more random selection pattern. Existing $2^{nd}$-order dynamic element matching (DEM) falls into two categories: tree-structure-based DEM [Galton [1997]] and vector-quantizer-based DEM [Schreier and Zhang]. The tree structure has a relatively simple logic implementation, but its shaping performance is unsatisfactory. VQ, on the other hand, offers the desired shaping at the cost of high-complexity hardware. Such drawbacks of existing $2^{nd}$-order DEMs prevent them from being used in CTDSMs. Therefore, a key challenge is to reduce the hardware complexity of the DEM while maintaining the desired $2^{nd}$-order mismatch error shaping.

#### 3.2.2 2<sup>nd</sup>-Order DEM with Partial Sorter

In order to lower the hardware complexity of VQ-based DEM, we can first identify the most hardware-intensive block. As shown in Fig. 3.3, the complete sorter block contributes most to the hardware complexity.

The complete sorter compares every pair of elements, which requires a massive number of comparators. For example, 105 comparators are needed to sort 15 elements. However, complete sorting is not necessary for the DEM. In the work of [Sun and Cao [2011]], partial sorting replaces complete sorting, as shown in Fig. 3.4.



Figure 3.4: VQ-based DEM structure with partial sorter.

Although the order of elements is not completely correct, the partial sorter loses little mismatch error suppression while requiring significantly less hardware.

Figure 3.5: Mismatch error shaping spectrum of VQ-based DEM with complete sorter and partial sorter.

The number of comparators needed is reduced to 33 for 15 elements. In the meantime, the shaping spectrum is close to that of the complete-sorter-based DEM. Fig. 3.5 compares the spectra obtained with the complete sorter and the partial sorter. The one with the complete sorter indicates a simulated SNDR of 110 dB without thermal noise. The one with the partial sorter shows an SNDR of 108 dB. Therefore, partial sorting achieves a better trade-off between hardware complexity and mismatch error suppression. Fig. 3.6 shows the partial sorter structure for 15 elements. The 15 inputs are divided into 3 groups. The accurate ranks among the 5 neighboring elements in each group and among the 3 group summations are obtained first. The final element rank is the weighted sum of the individual rank and the group summation rank.
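The comparator counts and the two-level ranking can be illustrated with a minimal sketch. The weighting used here (group rank times group size plus the rank within the group) is an assumption made for illustration; the text only states that the final rank is a weighted sum. The function names and input values are hypothetical.

```python
from math import comb

def comparator_counts(n_elements=15, n_groups=3):
    """Pairwise comparators for a complete sorter versus the grouped partial sorter."""
    group_size = n_elements // n_groups
    complete = comb(n_elements, 2)
    partial = n_groups * comb(group_size, 2) + comb(n_groups, 2)
    return complete, partial

def _ranks(values):
    """Rank of each entry (0 = smallest); ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = position
    return ranks

def partial_rank(values, n_groups=3):
    """Approximate ranks from in-group ranks and group-sum ranks (assumed weighting)."""
    group_size = len(values) // n_groups
    groups = [values[g * group_size:(g + 1) * group_size] for g in range(n_groups)]
    group_rank = _ranks([sum(group) for group in groups])
    result = []
    for g, members in enumerate(groups):
        for inner in _ranks(members):
            result.append(group_rank[g] * group_size + inner)
    return result

print(comparator_counts())
print(partial_rank([4, -1, 7, 0, 2, 9, 3, -5, 6, 1, -2, 8, 5, -3, -4]))
```

With 15 elements in 3 groups, the sketch reproduces the 105 and 33 comparator counts quoted in the text, and the approximate ranks still form a valid ordering of all 15 elements.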



Figure 3.6: Partial sorter structure for 15 elements.



Figure 3.7: Spectrum at input of 14 dBFS.

However, the partial-sorter-based DEM fails at certain input amplitudes. Suppose that the sorter structure is the same as the one in Fig. 3.6 and the DAC has 15 elements. Extra distortions then appear when the DAC inputs are 3, 6, 9, and 12. Fig. 3.7 shows the spectrum with the extra distortions.

To diagnose the distortion problem, we can take a deep dive into integrators inside the filter. Fig. 3.8 shows the detailed filter structure.



Figure 3.8: Filter structure of VQ-based DEM.

Assume that we have a DAC input d[n] = 9. After sorting by the partial sorter, 3 elements in each group are selected. Thus, $mean(sv_{1-5}[n]-d[n]) = 0$. Consequently, the group average of the first integrator outputs remains the same from cycle to cycle. For example, $mean(sx_{1-5}[n]) = mean(sx_{1-5}[n-1])$ always holds for the first group. Therefore, $mean(sx_{1-5})$ remains constant. If the DAC keeps receiving the same input, d[n] = 9, the second integrator output $x_{1-5}[n]$ overflows because of this constant input. The overflow leads to malfunctions of the DEM filters. The root cause of the constant input is the lack of order information between groups. To fix the issue, we subtract mean(sx) each cycle, forcing the group mean to be zero when the inputs are 3, 6, 9, and 12. As a result, the distortions are removed and the SNDR is restored, as shown in Fig. 3.9.
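The overflow mechanism can be illustrated with a toy model of one 5-element group. This is not the DEM filter itself: the first-integrator outputs are modeled as a hypothetical zero-mean pattern that rotates every cycle plus a constant group mean, and the function name and numbers are assumptions chosen only to show the effect of a constant mean on the second integrator.

```python
def second_integrator_peak(mean_offset, n_cycles, subtract_mean):
    """Peak magnitude of the second-integrator state for one 5-element group."""
    pattern = [-2.0, -1.0, 0.0, 1.0, 2.0]   # hypothetical zero-mean deviations
    n = len(pattern)
    state = [0.0] * n                        # second-integrator outputs
    peak = 0.0
    for cycle in range(n_cycles):
        # First-integrator outputs: rotating deviations plus a constant group mean.
        sx = [mean_offset + pattern[(i + cycle) % n] for i in range(n)]
        if subtract_mean:
            group_mean = sum(sx) / n
            sx = [value - group_mean for value in sx]
        state = [acc + value for acc, value in zip(state, sx)]
        peak = max(peak, max(abs(value) for value in state))
    return peak

print("without compensation:", second_integrator_peak(0.5, 1000, subtract_mean=False))
print("with compensation:   ", second_integrator_peak(0.5, 1000, subtract_mean=True))
```

In this toy model the uncompensated state grows linearly with the number of cycles, while removing the group mean keeps it bounded, which mirrors the reasoning given for the fix.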



Figure 3.9: Spectrum comparison before and after compensation.

Table 3.1 summarizes the pros and cons of the existing and proposed $2^{nd}$-order DEMs. The proposed $2^{nd}$-order DEM achieves strong mismatch error suppression at a low hardware cost. Fig. 3.10 shows the detailed structure of the proposed DEM. As mentioned above, the partial sorter increases the vector quantization noise and hence degrades stability. To prevent the filters from overflowing or saturating, a compensating value SC is subtracted from the outputs of the 1st-order integrators, as shown in Fig. 3.10. SC is the average of

Table 3.1: Pros and cons of different $2^{nd}$-order DEMs.

|                       | Delay | Gate count | Mismatch suppression |
|-----------------------|-------|------------|----------------------|
| Complete VQ           | low   | high       | strong               |
| Tree Structure        | high  | low        | weak                 |
| Partial VQ (proposed) | low   | low        | strong               |



Figure 3.10: VQ-based DEM logic.

the 1st-order integrator outputs within one group. The 2nd-order integrator outputs are guaranteed to be bounded after SC is removed. The $2^{nd}$-order DEM is applied to unary DACs. However, since an N-bit unary DAC has $2^{N}$ elements, the resolution of unary DACs is usually lower than 6 bits. To achieve a highly multi-bit DAC, a segmented DAC is used. The LSB DAC is binary-coded and uses the MES technique [Wu et al. [2020]]. The binary LSB DAC enables a significant area reduction. Realizing that the MSB DAC produces most of the



Figure 3.11: SNDR loss by mismatch.

mismatch errors, we apply the $2^{nd}$-order DEM to the MSB DAC and $1^{st}$-order MES to the LSB DAC. Fig. 3.11 shows that a smaller MSB DAC contributes more distortion. In our simulation, the SNDR loss caused by the MSB mismatch error is 30 dB higher than that caused by the LSB mismatch error. Fig. 3.12 shows the architecture of the proposed 2nd-order VQ-based DEM. Our DAC input D[8:0] is segmented into a thermometer-coded MSB[3:0] and a binary LSB[4:0] with a redundant bit.

## 3.3 Loop Filter Design

Since the RDAC area is reduced by the 2nd-order mismatch error shaping, the filter area dominates the whole active area. In prior work [Wang et al.



Figure 3.12: Architecture of the proposed low-complexity  $2^{nd}$ -order VQ-based mismatch error shaping.

[2015]], a higher sampling frequency is chosen in 28nm to scale down the RC time constant and achieve a compact area. However, our CTDSM targets a 250kHz bandwidth, leading to a smaller $R_{IN}$ and a larger $C_1$ compared to [Wang et al. [2015]] (24kHz bandwidth). This is problematic as $C_1$ is 82 pF and occupies an area of 0.12mm<sup>2</sup>. Fig. 3.13 shows the integrator structure and the parasitic capacitance. Moreover, the resulting 2.5-pF parasitic capacitance $C_p$ heavily loads the OTA, as shown in Fig. 3.15. Furthermore, the limited voltage headroom at the OTA output increases harmonic distortion. Instead of increasing the sampling frequency, we can reduce the capacitor directly and add the corresponding



Figure 3.13: Large integration capacitor due to a small  $R_{IN}$ .

attenuation block after the integrator to maintain the same loop gain. In the meantime, we want to suppress the large output swing of the integrator. Therefore, a resistive feedforward path is added to lower the swing and provide the attenuation. Fig. 3.14 shows the structure of the loop filter with the resistive feedforward path. The dividing resistors of the feedforward path and the filter path are set to a ratio of 9:1, which attenuates the signal at the integrator output. $C_1$ is reduced by 10 times, leading to both a smaller area and a smaller parasitic capacitance. As a result, a compact and highly linear loop filter is achieved under low voltage headroom with the assistance of the resistive feedforward path.
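The scaling argument can be written as a short worked example. The 82-pF capacitor, 0.12-mm<sup>2</sup> area, 9:1 ratio, and 10-times reduction come from the text. The sketch assumes that the 9:1 divider passes one tenth of the integrator output, that the integrator gain scales as $1/C_1$, and that the capacitor area is proportional to its capacitance; the function name is hypothetical.

```python
import math

def scaled_integrator(c1_pf, area_mm2, cap_reduction, ratio=(9.0, 1.0)):
    """Capacitor, area, and net loop gain after shrinking C1 and adding the divider."""
    large, small = ratio
    attenuation = small / (large + small)   # assumed: 9:1 divider passes 1/10
    gain_boost = cap_reduction              # integrator gain scales as 1/C1
    return {
        "c1_pf": c1_pf / cap_reduction,
        "area_mm2": area_mm2 / cap_reduction,   # assumed: area proportional to C1
        "net_loop_gain": gain_boost * attenuation,
    }

result = scaled_integrator(c1_pf=82.0, area_mm2=0.12, cap_reduction=10.0)
print(result)
```

Under these assumptions the tenfold gain increase from the smaller capacitor is cancelled by the divider, so the loop gain is unchanged while the capacitor shrinks to roughly 8.2 pF.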



Figure 3.14: Integration capacitor area and OTA output swing are reduced with the resistive feedforward path added.

## 3.4 CTDSM Architecture

Fig. 3.15 shows the architecture of the proposed CTDSM. It consists of an active RC integrator with a chopped OTA, a 9-b NS-SAR, and a 9-b segmented RDAC. Since the RDAC area is reduced by the 2nd-order mismatch error shaping, the filter area dominates the whole active area. Virtual-ground switching [Theertham et al. [2020]] is used to mitigate the ISI. This section describes each circuit block in more detail.



Figure 3.15: The schematic of proposed CTDSM.

#### 3.4.1 OTA Schematic

The OTA used in the RC integrator is a feedforward-compensated two-stage OTA. The first stage is chopped at  $f_s/2$ , and its schematic is shown in Fig. 3.16. Cascodes are used to increase the DC gain of the stage. The  $G_{m2}$  and  $G_{m3}$  schematics are shown in Fig. 3.17. They are merged into one five-transistor amplifier.



Figure 3.16: Chopped OTA schematic:  $1^{st}$  stage.

#### 3.4.2 NS-SAR Schematic

The passive NS-SAR is implemented with an additional integration capacitor and a multi-input comparator. Fig. 3.18 shows the simplified schematic. Since any errors from the NS-SAR are shaped by the front-end integrator, we implement only  $1^{st}$ -order mismatch error shaping on the SAR CDAC. The MSB CDAC is handled by DWA, and the  $1^{st}$ -order MES is applied to the LSB CDAC. Fig. **??** shows the NS-SAR timing diagram. The sampling and reset phases achieve the MES [Shu et al. [2016]]. After the NS-SAR finishes the conversion phase, the CDAC is merged with the integration capacitor to conduct integration.



Figure 3.17: Chopped OTA schematic:  $2^{nd}$  stage.



Figure 3.18: Simplified NS-SAR schematic.

## 3.5 Measurement Results

The modulator achieves 95dB SNDR and 250kHz bandwidth with only 0.37mm<sup>2</sup> area. Fig. 3.20 shows the measured spectrum with a 20kHz input at a large amplitude with DEM off. Fig. 3.21 shows the spectrum at the same amplitude but with DEM on. With a bandwidth of 250kHz, the SNDR and SFDR are improved from 67dB/69dB to 95dB/103dB, respectively, at -3dBFS input. Fig. 3.22 shows a spectrum at -47 dBFS, where the SNDR kink of DWA usually happens. Fig. 3.23 shows that a tone-free spectrum at -47 dBFS can be obtained with our 2<sup>nd</sup>-order DEM. For the -47dBFS input, the SNDR and SFDR are improved from 46dB/48dB to 48dB/65dB, respectively. Fig. 3.24 shows the measured SNDR/SNR versus input amplitude. Fabricated in a 40nm CMOS process, the prototype ADC consumes 4.7mW (OTA: 2.5mW, SAR: 0.55mW, RDAC: 1.1mW, digital circuits: 0.55mW) from 1.1V at 32MS/s. The pie chart of the power breakdown is shown in Fig. 3.25. The dynamic range (DR) is 96dB.

Table 3.2 summarizes the results and compares them with the state-of-the-art high-resolution CTDSMs. The input-referred noise and distortion (IRND) quantifies the power of in-band noise and distortion normalized to the signal bandwidth. Compared to the prior CTDSMs for audio applications (<25kHz), our work has 10x larger bandwidth and better IRND. FoM<sub>S</sub> favors designs with a higher analog VDD because a higher input full scale yields a higher SNDR. Compared to the SNDR, noise and distortion (ND) better serves as a resolution metric because it excludes the signal power. Therefore, FoM<sub>N</sub> is defined to measure the power efficiency at a certain level of ND and bandwidth. With the same 250kHz bandwidth, our work has a better FoM<sub>N</sub> and 7x smaller area than [Theertham et al. [2020]]. As shown in Fig. 3.26, our work shows great potential to cut down the area cost of high-resolution CTDSMs. The IRND of our work is 1.5x smaller than that of prior works with <0.5mm<sup>2</sup> area. Therefore, our work achieves low noise and distortion at a 250kHz BW with minimal power and area cost. Moreover, this work only requires a 1.1-V supply, making it a suitable design choice in advanced technologies.

Figure 3.20: Measured output spectra at large-amplitude input with DEM off

Figure 3.21: Measured output spectra at large-amplitude input with DEM on

Figure 3.22: Measured output spectra at small-amplitude input with DEM off



Figure 3.23: Measured output spectra at small-amplitude input with DEM on



Figure 3.24: Measured SNDR/SNR vs. different input amplitudes.

## 3.6 Conclusion

This chapter presents a high-resolution CTDSM with  $2^{nd}$ -order low-complexity DEM. To obtain more aggressive mismatch error shaping and overcome the SNDR kink issue of the commonly used  $1^{st}$ -order DWA, VQ-based DEM has been implemented. The partial sorter, together with the proposed stabilization method, reduces the hardware complexity with trivial performance loss. Moreover, a resistive feedforward path lowers the filter swing and reduces the integration capacitor by 10 times. Consequently, the loop filter is highly linear, even under a low power supply. The proposed DEM logic and CTDSM loop structure show a promising high-resolution design pattern for advanced technology nodes.

Figure 3.25: Power breakdown.

|                            | ISSCC-20<br>Wu | JSSC-16<br>Berti | JSSC-20<br>Gonen | JSSC-15<br>Wang | JSSC-20<br>Theertham | This work |
|----------------------------|----------------|------------------|------------------|-----------------|----------------------|-----------|
| Process [nm]               | 55             | 160              | 160              | 28              | 180                  | 40        |
| Analog V <sub>DD</sub> [V] | 1.2            | 1.6              | 1.8              | 3.3             | 1.8                  | 1.1       |
| Power [mW]                 | 1.01           | 0.39             | 0.618            | 1.13            | 24                   | 4.7       |
| BW [Hz]                    | 4k             | 20k              | 20k              | 24k             | 250k                 | 250k      |
| DR [dB]                    | 140            | 103.1            | 108.5            | 100.6           | 107.5                | 96.0      |
| SNDR [dB]                  | 101.9          | 91.3             | 106.4            | 98.5            | 105.3                | 94.8      |
| OSR                        | 128            | 75               | 256              | 500             | 64                   | 64        |
| DEM Order                  | 1              | 1                | 1                | 1               | 1                    | 2         |
| IRND [nV/√Hz]              | 108            | 218              | 43.1             | 179             | 13.8                 | 28.3      |
| FoM <sub>S</sub> [dB]      | 167.9          | 168.4            | 181.5            | 171.8           | 175.4                | 172.1     |
| FoM <sub>N</sub> [dB]      | 169.3          | 167.3            | 179.4            | 163.9           | 173.3                | 174.3     |
| Area [mm <sup>2</sup> ]    | 0.585          | 0.21             | 0.27             | 0.022           | 2.85                 | 0.37      |

Table 3.2: PERFORMANCE SUMMARY AND COMPARISON WITH STATE-OF-THE-ART HIGH-RESOLUTION CTDSMs.

IRND(Input Referred Noise + Distortion) = (Noise + Distortion)/BW

 $FoM_S = SNDR + 10 \cdot \log_{10}(BW/Power)$ 

 $FoM_N = -10 \cdot \log_{10}(Noise + Distortion) + 10 \cdot \log_{10}(BW/Power)$ 
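As a quick sanity check of the two figures of merit, the following Python sketch evaluates them for the "This work" column of Table 3.2. It is an illustration rather than part of the measurement flow. The function names are hypothetical, and the noise-plus-distortion power is assumed to be IRND² × BW, an interpretation inferred from the nV/√Hz unit of the IRND row.

```python
import math


def fom_s(sndr_db: float, bw_hz: float, power_w: float) -> float:
    """Schreier-style FoM: SNDR + 10*log10(BW / Power)."""
    return sndr_db + 10.0 * math.log10(bw_hz / power_w)


def fom_n(irnd_v_per_rthz: float, bw_hz: float, power_w: float) -> float:
    """FoM_N with the noise+distortion power taken as IRND^2 * BW (assumption)."""
    nd_power = irnd_v_per_rthz ** 2 * bw_hz  # in V^2
    return -10.0 * math.log10(nd_power) + 10.0 * math.log10(bw_hz / power_w)


# "This work" column of Table 3.2: 94.8 dB SNDR, 250 kHz, 4.7 mW, 28.3 nV/sqrt(Hz)
print(round(fom_s(94.8, 250e3, 4.7e-3), 1))
print(round(fom_n(28.3e-9, 250e3, 4.7e-3), 1))
```

Under this assumption, both values agree with the corresponding table entries to within rounding.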



Figure 3.26: IRND and area plot of high-resolution CTDSMs.

# Chapter 4

# Robust Analog Design Automation Via Reinforcement Learning

Previous chapters discussed two high-performance ADC designs, in which a large amount of time was spent on tuning the circuit parameters. Analog design automation is therefore necessary to accelerate the design cycle. This chapter presents an automation technique for designing analog/mixed-signal circuits that are robust against process, voltage, and temperature (PVT) variations. Analog/mixed-signal circuit design is one of the most complex and time-consuming stages in the chip design process. Due to the PVT variations from chip manufacturing, analog circuits inevitably suffer from performance degradation. Although there has been plenty of work on automating analog circuit design under the typical condition, limited research has been done on exploring robust designs under real and unpredictable silicon variations. Automatic analog design against variations requires a large amount of computation and time.

To address this challenge, we present RobustAnalog, a robust circuit design framework that incorporates variation information into the optimization process. Specifically, circuit optimizations under different variations are considered as a set of tasks. Similarities among tasks are leveraged and competition among them is alleviated to realize sample-efficient multi-task training. Moreover, RobustAnalog prunes the task space according to the current performance in each iteration, leading to a further reduction in simulation cost. In this way, RobustAnalog can rapidly produce a set of circuit parameters that satisfies diverse constraints (e.g., gain, bandwidth, and noise) across variations. We compare our method with Bayesian optimization, an evolutionary algorithm, and Deep Deterministic Policy Gradient (DDPG) and demonstrate that RobustAnalog can significantly reduce the required optimization time by  $14 \times -30 \times$ . Therefore, our study provides a feasible method to handle real silicon conditions.



## 4.1 Introduction

Figure 4.1: *Left:* Performances under variations form a distribution. *Right:* New technologies have larger process variations and greater vulnerability to environmental variations, hence a higher discard rate.

Analog circuit design is a paramount but extremely challenging task. It requires a huge amount of human effort and lacks effective automation. Due to numerous chip manufacturing variations, analog circuits suffer from non-trivial performance degradation. Addressing such variation issues is considerably challenging. Large manufacturing variations make the circuit performance *unpredictable*.

In the performance distribution visualized in Figure 4.1, a considerable proportion of chips land in the red regions; these chips are defective and are discarded. As chip fabrication technology advances, variation issues become even worse, leading to a higher chip failure rate. If such severe variation issues are not carefully handled, significant economic losses of up to billions of dollars will occur [McConaghy et al. [2012]]. Hence, an effective variation-aware circuit design methodology is in high demand.

Traditional solutions to address such circuit variation issues primarily rely on laborious human expert involvement. Experts manually design the circuit based on *their expertise* and the feedback from *a large number of circuit simulations* and iterate the process until it passes all variation tests. However, the *burdensome analysis and slow simulations* make the manual design process considerably time-consuming.

Existing automated methods cannot address variation issues effectively. The black-box optimization algorithms [Cohen et al. [2015]; Lyu et al. [2018a]] and learning-based automation techniques [Wang et al. [2020a]; Settaluri et al. [2020]; Wang et al. [2018a]; Liu et al. [2021]] are used to design circuits. However, they merely focus on the optimization under *the typical condition without variations*. None of them can systematically produce a robust design under real chip variations. The variation-aware optimization is challenging in two aspects. First, the simulation cost is prohibitively expensive in order to get accurate variation effects under many test cases. Second, different variation conditions might conflict with each other which significantly complicates the circuit optimization problem. It will cost the solver much more time to find a feasible solution that meets all performance constraints.

To address the above challenges, we present RobustAnalog, an efficient variation-aware optimization framework for automatic analog circuit design. RobustAnalog largely reduces the simulation cost of designing an analog circuit that is robust against variations. The variation-aware optimization is formulated as a multi-task reinforcement learning (RL) problem, where the design for each variation condition is considered as one task. RobustAnalog includes two stages. In the first stage, we select a representative subset of tasks as the training set. Specifically, we group the tasks using a clustering algorithm and choose one task per group to form the training task set, based on their performance relative to the target performance. In the second stage, we leverage multi-task deep deterministic policy gradient (DDPG) [Lillicrap et al. [2015]] to train our RL agent with the selected tasks. During training, the critic model learns to predict the values of state-action pairs from each task and guides the actor to generate a better policy. To alleviate conflicting multi-task gradients, we apply PCGrad [Yu et al. [2020]] to optimize the actor and critic models.

The core contributions of this work are as follows:

- We propose an automated optimization framework for variation-aware analog circuit design via multi-task reinforcement learning and adaptive task space pruning.
- An efficient training with variations is achieved by multi-task RL. The different PVT corners are formulated as multiple tasks. Sampling efficiency is largely improved by leveraging the correlations among similar tasks and mitigating the competition among conflicting tasks.
- An effective task pruning technique reduces the number of training tasks. With a subset of full tasks, the trained agent can still achieve full tasks eventually. The number of queries into the full task set is minimized, leading to a significant simulation cost reduction.
- Extensive experimental results demonstrate that, on real-world circuit design benchmarks, our method outperforms Evolutionary strategy (ES), Bayesian optimization (BO), and DDPG methods with  $14 \times -30 \times$  simulation cost reduction.

## 4.2 Related Work

**PVT Variation and Corners** – The major part of variations is PVT variation. PVT variation usually refers to a combination of global process (P), power supply voltage (V), and temperature (T) variations. Process variations happen during chip manufacturing, resulting in different transistor characteristics. Five transistor models cover the process variation: {TT, SS, FF, SF, FS}.

To avoid circuit failures due to uncontrollable PVT variations, we model all these variations by a set of *PVT corners*. A PVT corner is a combination of process, voltage, and temperature values. For example, a fast-process, high-voltage, and low-temperature corner is {Process = FF,  $V_{dd} = 1.3$ V, T = 15°C}. A robust circuit should maintain the desired performance in all of the pre-set PVT corners.

**Automatic Analog Sizing** – Automatic analog sizing techniques have attracted growing research interest in recent years. Optimization methods, including Bayesian optimization [Snoek et al. [2012]; Lyu et al. [2018a]] and genetic algorithms [Cohen et al. [2015]], formulate circuit design as a black-box problem. They differ in sample efficiency and optimality. However, the critical issue is that they have to optimize the circuit from scratch every time they encounter a new design condition. The lack of transferability across different conditions prevents them from addressing the variation issue at an affordable cost. Recently, learning-based methods have been extensively applied to circuit sizing problems. Deep neural networks (DNN) [Wang et al. [2020a]; Zhang et al. [2019a]] can approximate the complex relation between circuit parameters and performances. Deep RL methods show the potential to achieve higher circuit performance given enough exploration [Wang et al. [2018a, 2020a]; Settaluri et al. [2020]]. Moreover, RL enables transfer learning across different design conditions, including different technologies and pre-/post-layout design stages. However, current methods cannot reach the design goal under different conditions simultaneously. In this chapter, our work addresses the variations of real circuits altogether.

**Multi-Task RL** – Deep reinforcement learning (DRL) is an emerging subfield of RL that can scale RL algorithms to complex and rich environments. Multi-task RL focuses on enabling a single agent to solve multiple related problems, either simultaneously or sequentially [Teh et al. [2017]]. Learning multiple related tasks together should facilitate the learning of each individual task [Bengio [2012]; Caruana [1997]]. However, it has also been found that training on multiple tasks can negatively affect the performance on each task. Different techniques have been proposed to solve this issue, including new architectures [Heess et al. [2016]; Devin et al. [2017]], auxiliary tasks [Jaderberg et al. [2016]], and new optimization schemes [Hessel et al. [2019]; Yu et al. [2020]]. Besides, choosing which task or tasks to train on at each time step is also important. Task scheduling [Sharma et al. [2017]] assigns task scheduling probabilities based on the relative performance to a target level. Optimized training task selection can significantly improve model performance [Bengio et al. [2009]]. We explore both optimal task selection and multi-task training, and integrate them into one framework to boost the sampling efficiency.

## 4.3 Proposed PVT Variation-Aware Circuit Sizing

#### 4.3.1 Problem Definition

Given a fixed circuit topology, we search for a circuit sizing vector whose performance can satisfy the constraints (design targets) across all variations. Then the problem can be formulated as a constraint satisfaction problem under different conditions.

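$$\begin{aligned} \text{minimize} \quad & 0 \\ \text{subject to} \quad & F_i(X|T_j) < C_i, \quad j = 1, \dots, k \end{aligned}$$
(4.1)

where

$$\begin{aligned} X &= (X_1, X_2, \dots, X_n) \\ D &= (D_1, D_2, \dots, D_n) \\ C &= \{C_1, C_2, \dots, C_m\} \\ T &= \{T_1, T_2, \dots, T_k\} \end{aligned}$$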

X, the sizing vector, is an *n*-dimensional variable corresponding to the *n* circuit sizing parameters. *D* is the domain of *X*. For example,  $D_1 = [0, 1]$  means that the design space of  $X_1$  is [0, 1]. *T* is the set of *k* pre-defined PVT corners that cover possible variations in the real world. *C* is the constraint set for all circuit metrics. Because we have *m* metrics, the number of constraints is also *m*.  $F_i(X|T_j)$  is the *i*<sup>th</sup> performance metric of the circuit under the *j*<sup>th</sup> corner.  $F_i$  is a non-linear mapping between *X* and the *i*<sup>th</sup> performance metric, where *X* is the input and *T* is the parameter. We rely on the circuit simulator to provide this mapping. Therefore, our goal is to find an X that satisfies all constraints in C under every corner task in T. It is worth noting that choosing which tasks to optimize is also non-trivial. Spending simulations on every task independently is wasteful and provides little additional information, since the correlation among tasks is ignored. A more interesting way is to conduct the task selection and multi-task training jointly.
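The constraint satisfaction problem in Eq. (4.1) can be illustrated with a minimal Python sketch. It assumes that all constraints are upper bounds, as written in Eq. (4.1). The function `toy_simulate` is a hypothetical stand-in for the circuit simulator, and its formulas and metric names are invented for illustration only.

```python
from typing import Callable, Dict, Sequence, Tuple

# A corner is (process model, supply voltage in V, temperature in deg C).
Corner = Tuple[str, float, float]


def is_feasible(
    x: Sequence[float],
    corners: Sequence[Corner],
    simulate: Callable[[Sequence[float], Corner], Dict[str, float]],
    upper_bounds: Dict[str, float],
) -> bool:
    """Return True if F_i(x | T_j) < C_i for every metric i and corner T_j."""
    for corner in corners:
        metrics = simulate(x, corner)  # one simulation per corner
        for name, bound in upper_bounds.items():
            if not metrics[name] < bound:
                return False
    return True


def toy_simulate(x: Sequence[float], corner: Corner) -> Dict[str, float]:
    """Hypothetical stand-in for a SPICE run; not a real circuit model."""
    _process, vdd, temp = corner
    width = x[0]
    return {"current_mA": width * vdd, "delay_ns": 10.0 / width + 0.01 * temp}


corners = [("TT", 1.0, 0.0), ("TT", 1.2, 100.0)]
bounds = {"current_mA": 5.0, "delay_ns": 4.0}
print(is_feasible([4.0], corners, toy_simulate, bounds))  # passes both corners
print(is_feasible([2.0], corners, toy_simulate, bounds))  # too slow
```

A sizing is accepted only if no corner violates any bound, which is why every additional corner adds simulation cost.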

#### 4.3.2 Framework Overview

An overview of the proposed framework is shown in Figure 4.2. We consider satisfying the constraints under one PVT corner as a single task. Each iteration proceeds as follows: (1) RobustAnalog selects a new task subset from all PVT corner tasks; in the first iteration, a pre-defined nominal corner is selected as the first task. (2) The RL agent generates actions and passes them to each environment in the task subset. (3) Each environment denormalizes the actions (in the [-1, 1] range) to actual circuit sizings and refines them; if necessary, the sizings are truncated according to the minimum precision and the lower and upper bounds of the technology node. (4) The circuit is simulated. (5) The agent receives the rewards from the corner-specific environments, and the actor and critic networks are optimized with the PCGrad technique. (6) If all tasks in the subset are passed during the agent evaluation, the sizing solution is tested on the full task set. If it passes all tasks, the loop terminates; otherwise, the procedure returns to (1). In the meantime, the actor-critic model weights and replay buffers are saved for the agent to inherit in the next iteration.
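Step (3) can be sketched as follows. This is a minimal illustration, assuming a linear mapping from [-1, 1] to the parameter range and rounding to a fixed grid. The function name `denormalize` and the step sizes are hypothetical. The bounds reuse the width and capacitance ranges quoted for the Two-stage OTA in Section 4.4.1.

```python
import numpy as np


def denormalize(action, low, high, step):
    """Map actions in [-1, 1] to sizings, snap to the grid, and clip to bounds."""
    action = np.clip(np.asarray(action, dtype=float), -1.0, 1.0)
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    step = np.asarray(step, dtype=float)
    raw = low + (action + 1.0) * 0.5 * (high - low)       # linear rescaling
    snapped = low + np.round((raw - low) / step) * step   # minimum precision
    return np.clip(snapped, low, high)                    # technology bounds


# One width in um and one capacitance in pF; the step sizes are assumptions.
LOW = [0.5, 0.1]
HIGH = [50.0, 10.0]
STEP = [0.5, 0.1]

print(denormalize([-1.0, 1.0], LOW, HIGH, STEP))
print(denormalize([2.0, -3.0], LOW, HIGH, STEP))  # out-of-range actions are clipped
```

Clipping both the action and the snapped value keeps the sizing legal even when exploration noise pushes the action outside [-1, 1].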



Figure 4.2: RobustAnalog Overview. (1) A pruned task subset is generated from the full task set (2) Multi-task RL agent is trained on task subset (3) Training continues until the produced sizing can achieve training tasks. Then the sizing is evaluated on the full set. If it passes all the tasks, RobustAnalog returns the result.

#### 4.3.3 Multi-Task RL training

Multi-task RL is a training paradigm in which the agents are trained with samples from multiple tasks simultaneously. Shared representations are learnt from a collection of related tasks. These shared representations increase sample efficiency and can potentially yield a faster learning speed for related tasks. In our setting, we create a multi-task agent whose critic can predict the value of task-conditioned action-state pairs. Since the target of the actor is to look for a sizing that passes all tasks, the actor model is set to be task agnostic. Another benefit from shared representations is its ability to generalize to unseen corner tasks, which is useful in Monte Carlo corner tests. There are more discussions in Section 4.4.

**State.** The PVT information is embedded in our states, s = (p, v, t), where p is the one-hot representation of component type, v is the normalized voltage value and t is the normalized temperature value.

**Reward.** Our reward is formulated as:

$$R = \begin{cases} r, & r < -0.02 \\ 0.2, & r \ge -0.02 \end{cases}$$
(4.2)

$$r = \sum_{i=1}^{M} \min\{\frac{m_i - m_i^*}{m_i + m_i^*}, 0\}$$
(4.3)

where  $m_i$  is the current simulated  $i^{th}$  performance metric and  $m_i^*$  is the corresponding constraint. The reward is a measure of the relative distance between the current performance metrics and the corresponding design targets. Once the requirements are met, the reward value is fixed at 0.2. This reward formulation is motivated by the design goal in the real world. Designers tend not to over-optimize the circuits. It is more important that designers can fulfill the requirements in a short period of time.
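Eqs. (4.2) and (4.3) can be written compactly in Python. The sketch below is illustrative, and its function names are hypothetical. It implements the formula exactly as stated, which rewards metrics that meet or exceed their targets; how upper-bound constraints (e.g., power) are signed is not spelled out above, so they are left out of this example.

```python
from typing import Sequence


def normalized_shortfall(value: float, target: float) -> float:
    """One term of Eq. (4.3): negative when the metric misses its target."""
    return min((value - target) / (value + target), 0.0)


def reward(
    metrics: Sequence[float],
    targets: Sequence[float],
    threshold: float = -0.02,
    pass_reward: float = 0.2,
) -> float:
    """Eq. (4.2): a fixed bonus once the summed shortfall is within the threshold."""
    r = sum(normalized_shortfall(m, t) for m, t in zip(metrics, targets))
    return pass_reward if r >= threshold else r


print(reward([20e6, 65.0], [15e6, 60.0]))  # all targets met
print(reward([10.0], [15.0]))              # target missed
```

Because the reward saturates at 0.2, the agent gains nothing from over-optimizing a metric that already meets its target.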

**Action.** The action vector is a set of values corresponding to the sizing parameters of each circuit. They include transistor sizes (width, length) and capacitor values. The settings for each benchmark are detailed in Section 4.4.

**Training.** The environment includes the circuit, simulator, and PVT information. Each time we query the environment, it simulates the circuit and returns the performance with the PVT information. After agent-environment interactions, samples  $(s, a, r, z_i)$  are stored in the replay buffers, where s is the state, a is the action, r is the reward, and  $z_i$  is the corner task ID. The critic neural network takes  $(s, a, z_i)$  as input and predicts the corresponding value for the current corner task. Relying on the insight that performances under different corners are related, most of the critic neural network parameters are shared across tasks, except for a few in the input layer. The task ID is removed from the inputs of the actor neural network. The training process is modified from DDPG [Lillicrap et al. [2015]]. Details are illustrated in Algorithm 1. M is the maximum number of optimization episodes and W is the number of warm-up episodes. N is the truncated normal noise.  $N_s$  is the training batch size. The key difference from the single-task setting is that we sample a stratified batch from the buffers every time and generate task-specific losses. Also, samples from different tasks are stored in separate buffers. For the optimization strategy, we use PCGrad [Yu et al. [2020]] to address conflicting gradients from different tasks.
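The gradient projection idea behind PCGrad can be sketched in a few lines of NumPy. This is a simplified illustration operating on flattened per-task gradient vectors, not the training code used in this work. The function name `pcgrad` is hypothetical.

```python
import numpy as np


def pcgrad(task_grads, rng=None):
    """Project away conflicting components between task gradients, then sum."""
    if rng is None:
        rng = np.random.default_rng(0)
    grads = [np.asarray(g, dtype=float) for g in task_grads]
    projected = []
    for i, g_i in enumerate(grads):
        g = g_i.copy()
        others = [j for j in range(len(grads)) if j != i]
        for j in rng.permutation(others):      # visit the other tasks in random order
            g_j = grads[int(j)]
            dot = float(g @ g_j)
            if dot < 0.0:                      # the two gradients conflict
                g -= dot / float(g_j @ g_j) * g_j  # drop the component along g_j
        projected.append(g)
    return np.sum(projected, axis=0)


# Two conflicting task gradients (negative inner product).
print(pcgrad([[1.0, 0.0], [-1.0, 1.0]]))
# Two orthogonal gradients are left untouched and simply added.
print(pcgrad([[1.0, 0.0], [0.0, 1.0]]))
```

In the multi-task setting described above, each task-specific loss would contribute one such gradient for the shared actor or critic parameters before the optimizer step.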

Algorithm 1: Multi-task RL in RobustAnalog

#### 4.3.4 Task Space Pruning

Although multi-task training has improved the efficiency of optimization on different corner tasks, we can still reduce the number of simulations further by selecting a small training task set. Since our final goal is to pass all the corner tasks, we must be able to iteratively improve our optimization results with a series of training task sets, as shown in Fig. 4.2. Therefore, we choose to incrementally train our NN-based RL agent, since it is transferable and can inherit the trained weights from the last cycle [Wang et al. [2020a]; Settaluri et al. [2020]]. Such transferability makes it compatible with the following task space pruning technique, which is a major advantage of multi-task RL methods over other optimization methods like Bayesian optimization and evolutionary strategy.

Figure 4.3: Visualization of corner task clustering and selection for strongARM Latch. The k-means decision boundary is shown, dividing corners into two kinds: noise-limited corners (blue) and speed-limited corners (brown). The corner with the worst performance in each cluster is chosen as one of the training tasks.

Choosing a small batch from a large number of tasks is non-trivial. The straightforward way to form a training task set is to sample tasks randomly from the full set [Dong et al. [2015]; Sanh et al. [2019]]. Human designers tend to guess the worst-case corners and design against those corners. Inspired by this human design methodology, the work of [Yang et al. [2021]] defines the lowest-reward corners as the worst cases and optimizes on them correspondingly. However, the reward value, scalarized from a multi-dimensional performance metric vector, lacks the information to differentiate between low-performance corners. Low rewards may result from different constraint violations. Therefore, the lowest reward does not mean that one corner's performance is dominated by the others'.

To address the problem mentioned above, we first cluster the corners based on their multi-dimensional metric vectors and rank the corners in the same cluster by their rewards. Corners in the same cluster have a similar performance pattern. Therefore, the reward values within a cluster better reflect the "goodness" of one corner's performance. Since corner performance patterns are unknown, we perform clustering in the performance space using an unsupervised learning technique, k-means [MacQueen et al. [1967]]. After including all the performance patterns, we can train our agent with a small subset of all corners with confidence that all corners will still work. We also apply the pruning technique to a larger random corner set, which is detailed in Section 4.4.

We take the strongARM benchmark as an example. From the last cycle, we have an optimized sizing based on the last batch of training corners. If this optimized sizing does not pass the full corner test set, we need to select a new batch of training corners. The proposed task space pruning contains three steps: (1) We simulate the optimized sizing on all corner tests to get the corresponding performance distribution. (2) We divide the corners into clusters using the performance metrics as input features. Fig. 4.3 shows that the corners of strongARM belong to two clusters: one is noise limited and the other is speed limited. (3) We select the corner with the lowest reward in each cluster as one of the training tasks for the next iteration. The cluster-specific worst corner sets a lower bound on the performance within the same group. With this pruning technique, the task space for multi-task RL training in each iteration is pruned to a significantly smaller scale while still being a good representation of the full task space. If no sizing is given at first, a pre-defined nominal corner is chosen as the first corner to train on. An interesting finding from the empirical results is that easier corner tasks help to accelerate the learning of harder tasks. Therefore, we always add a nominal corner as an auxiliary task to the training task set at all time steps. If all the corners are passed, the loop terminates.
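The three pruning steps can be sketched with NumPy as below. This is a minimal illustration under several assumptions: a plain Lloyd's k-means with a deterministic farthest-point initialization stands in for the clustering step, the metrics are standardized before clustering, and the function names and the toy numbers are hypothetical.

```python
import numpy as np


def kmeans_labels(points, k, iters=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        dist_to_nearest = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centers], axis=0
        )
        centers.append(points[int(np.argmax(dist_to_nearest))])
    centers = np.array(centers)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def prune_tasks(metrics, rewards, k, nominal_idx=0):
    """Pick the lowest-reward corner of each cluster, plus the nominal corner."""
    metrics = np.asarray(metrics, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    # Standardize each metric so no single unit dominates the distance (assumption).
    scaled = (metrics - metrics.mean(axis=0)) / (metrics.std(axis=0) + 1e-12)
    labels = kmeans_labels(scaled, k)
    selected = {int(nominal_idx)}
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        selected.add(int(members[np.argmin(rewards[members])]))
    return sorted(selected)


# Toy data: six corners, two metrics (e.g., noise and delay), two clear groups.
metrics = [[1.0, 10.0], [1.1, 10.5], [0.9, 9.5], [5.0, 2.0], [5.2, 2.1], [4.8, 1.9]]
rewards = [0.2, -0.1, -0.3, -0.05, -0.4, 0.2]
print(prune_tasks(metrics, rewards, k=2))
```

On this toy data, corners 2 and 4 are the worst members of their respective clusters, and corner 0 is kept as the nominal auxiliary task.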

## 4.4 Experiments

#### 4.4.1 Analog/Mixed-Signal Circuits

We experiment with three real-world analog/mixed-signal circuits: a two-stage operational transconductance amplifier (Two-stage OTA), a folded-cascode operational transconductance amplifier (Folded-Cascode OTA), and a strongARM Latch. They are chosen for three reasons. First, they are among the most important and commonly used blocks in various systems. Engineers usually spend the longest time optimizing the performance and robustness of these circuits. Second, they include two representative kinds of analog circuits, static and dynamic circuits, which are governed by different physical and engineering rules. Third, they have different levels of variation. The Two-stage OTA is designed in a 45nm technology, and the other two in an older 180nm technology. The 45nm technology has larger variations, so we can study the impact of different variation magnitudes. Each circuit is composed of a number of transistors and capacitors. Each transistor has two parameters, the gate width and length (w, l). Capacitors have one parameter (c), the capacitance value. The initial design spaces of these devices are given by human designers. To minimize the effort of designers, our design spaces are set to be very large. They have  $10^{14}$ ,  $6.4 \times 10^{64}$ , and  $10^{27}$  possible values, respectively.

The circuits are simulated with SPICE-based simulators [Nagel and Pederson [1973]]. The Two-stage OTA is simulated with Ngspice using the BSIM 45nm predictive technology model. The Folded-Cascode OTA and strongARM Latch are simulated with Cadence Spectre, a commercial simulator, in TSMC 180nm technology.

**Two-stage OTA**. The topology is shown in Figure 4.4. It has 7 parameters, including 6 transistor widths (w) and 1 capacitor value (c). The range of w is  $[0.5, 50] \times 1\,\mu\text{m}$  and the range of c is  $[0.1, 10] \times 1\,\text{pF}$ . The total design space has  $10^{14}$  possible values. The performance metrics are current (i), unity-gain bandwidth (ugb), and phase margin (phm). The corresponding constraints (C) and the PVT corner tests (T) are shown below. There are 30 corners  $(5 \times 3 \times 2)$ .

$$T = \{TT, SS, FF, FS, SF\} \times \{1.0V, 1.1V, 1.2V\} \times \{0^{\circ}C, 100^{\circ}C\}$$
$$C = \{i \le 5mA, ugb \ge 15MHz, phm \ge 60^{\circ}\}$$
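The corner set T is a Cartesian product, so it can be enumerated directly. The following Python sketch builds the 30 Two-stage OTA corners listed above and checks a set of simulated metrics against C. The names `CORNERS` and `meets_constraints` are hypothetical, and the metric values in the example call are made up.

```python
from itertools import product

PROCESSES = ("TT", "SS", "FF", "FS", "SF")
SUPPLIES_V = (1.0, 1.1, 1.2)
TEMPERATURES_C = (0, 100)

# Every (process, supply voltage, temperature) combination is one corner task.
CORNERS = list(product(PROCESSES, SUPPLIES_V, TEMPERATURES_C))


def meets_constraints(current_mA: float, ugb_MHz: float, phm_deg: float) -> bool:
    """Constraint set C of the Two-stage OTA benchmark."""
    return current_mA <= 5.0 and ugb_MHz >= 15.0 and phm_deg >= 60.0


print(len(CORNERS))                         # 5 * 3 * 2 corners
print(CORNERS[0])                           # first corner in enumeration order
print(meets_constraints(4.0, 20.0, 65.0))   # made-up metrics that satisfy C
```

A sizing is robust only if `meets_constraints` holds for the simulated metrics at every entry of `CORNERS`.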

**Folded-Cascode OTA.** The topology is shown in Figure 4.4. It has 20 parameters, including 7 transistor widths (w), 7 lengths (l), 2 capacitor values (c), and 4 transistor ratios (n). The range of w is  $[0.24, 150] \times 1\,\mu\text{m}$ , the range of l is  $[0.18, 2] \times 1\,\mu\text{m}$ , and the ranges of the two capacitors are  $[0.1, 2] \times 1\,\text{pF}$  and  $[0.1, 10] \times 1\,\text{pF}$ . The total design space has  $6.4 \times 10^{64}$  possible values. The performance metrics are power (p), gain (g), phase margin (phm), common-mode rejection ratio (CMRR), power supply rejection ratio (PSRR), noise (n), and unity-gain bandwidth (ugb). The corresponding constraints (C) and the PVT corner tests (T) are shown below. There are 20 corners ( $5 \times 2 \times 2$ ).

$$T = \{TT, SS, FF, FS, SF\} \times \{1.6V, 1.8V\} \times \{0^{\circ}C, 100^{\circ}C\}$$
$$C = \{p \le 1mW, ugb \ge 30MHz, phm \ge 60^{\circ}, n \le 30mV,$$
$$g \ge 60dB, CMRR \ge 80dB, PSRR \ge 80dB\}$$

**strongARM Latch.** The topology is shown in Figure 4.4. It has 7 parameters, including 6 transistor widths (w) and 1 capacitor value (c). The range of w is  $[0.22, 50] \times 1\,\mu\text{m}$  and the range of c is  $[0.15, 4.5] \times 1\,\text{pF}$ . The total design space has  $10^{27}$  possible values. The performance metrics are power (p), set delay (sd), reset delay (rd), set voltage (sv), reset voltage (rv), and noise (n). The corresponding constraints (C) and the PVT corner tests (T) are shown below. There are 20 corners  $(5 \times 2 \times 2)$ .

$$\begin{split} T &= \{TT, \; SS, \; FF, \; FS, \; SF\} \times \{1.1V, \; 1.2V \;\} \times \{0^{\circ}C, \; 100^{\circ}C \;\} \\ C &= \{p \leq 4.5uW, \; n \leq 50uV, \; sd \leq 14ns, \; rd \leq 9.1ns, \\ sv \geq vdd - 0.05V, \; rv \leq 0.05V \;\} \end{split}$$

### 4.4.2 Training Settings

To demonstrate the effectiveness of RobustAnalog, we apply it to the above three circuits and record the simulation time it takes to pass all the corner tests. We compare RobustAnalog with Bayesian Optimization (BO) Snoek et al. [2012], Evolutionary Strategy (ES) Hansen [2016], and Deep Deterministic Policy Gradient (DDPG). For the three baselines, variation-aware circuit optimization is treated as a single task, and the average reward over all corner tasks indicates the quality of the current sizing. BO, ES, and DDPG improve the average reward until it reaches 0.2. In ES, DDPG, and RobustAnalog, circuit simulation accounts for over 95% of the total time, whereas the computation time of BO becomes comparable to the simulation time after many iterations. We therefore compare the methods in terms of simulation time. For RL training, we use a training batch size of 64, a replay buffer size of 1000, and an exploration noise standard deviation of 0.2. The actor and the critic are both 4-layer multilayer perceptrons (MLPs). For RL methods, we evaluate the agent every 10 training steps. All experiments are conducted on a 6-core CPU. RL methods are implemented with PyTorch Paszke et al. [2019]; Stooke and Abbeel [2019].

Figure 4.4: Three analog/mixed-signal benchmarks.

Figure 4.5: Simulation time each method takes to first reach reward = 0.2.
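The training settings of Section 4.4.2 can be collected in a small configuration object. This is only a sketch: the class, the field names, and the two helper functions are hypothetical, and only the numeric values come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    batch_size: int = 64
    replay_buffer_size: int = 1000
    exploration_noise_std: float = 0.2
    mlp_layers: int = 4          # depth of both actor and critic
    eval_interval: int = 10      # training steps between evaluations
    target_reward: float = 0.2   # average reward at which the search stops


def should_evaluate(step, cfg):
    """Evaluate every eval_interval steps (skipping step 0 is an assumption)."""
    return step > 0 and step % cfg.eval_interval == 0


def reached_target(avg_reward, cfg):
    """True once the average reward over all corners hits the target."""
    return avg_reward >= cfg.target_reward


cfg = TrainConfig()
print(should_evaluate(10, cfg), reached_target(0.15, cfg))
```

Freezing the dataclass keeps the settings identical across the RL runs being compared.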

### 4.4.3 Evaluation of the Circuit Optimization

In all three circuit benchmarks, RobustAnalog achieved the smallest simulation cost to accomplish all the corner tasks. In each benchmark, it passed all the corners in every run with a different random seed, giving a 100% success rate. The comparison of simulation costs is shown in Figure 4.5. RobustAnalog consistently outperforms the baseline methods, including ES, BO, and single-task DDPG. The simulation cost reductions are 26x for the Two-Stage OTA, 30x for the strongARM Latch, and 14x for the Folded-Cascode OTA. Note that BO becomes slow once it has accumulated many samples; for a fair comparison, we ran BO for the same amount of time as the other methods. We have several findings from the experimental results. First, all methods spend more simulations on the Two-Stage OTA, which has larger variations in the 45nm technology. Second, compared to ES and BO, single-task DDPG performs better on the strongARM Latch but worse on the Two-Stage and Folded-Cascode OTAs. This is possibly because the strongARM Latch is a dynamic circuit, unlike the static OTAs. To conclude, RobustAnalog shows a significant efficiency improvement across different levels of variation and across circuit benchmarks of distinct natures. The learning curves are shown in Figure 4.6. Moreover, RobustAnalog achieves performance comparable to a state-of-the-art human design Tang et al. [2020].

Figure 4.6: Learning curves (average reward vs. number of simulations) of the baselines and the proposed RobustAnalog. Reward=0.2 indicates that all tasks are passed. RobustAnalog reaches the reward of 0.2 significantly faster than the baseline methods on all benchmarks.

### 4.4.4 Analysis

**Multi-Task and Task Space Pruning**. We conduct an ablation study on multi-task training and task space pruning. In Figure 4.7, we compare the simulation costs of the DDPG baseline, multi-task DDPG with the full task set, and RobustAnalog (multi-task DDPG with a pruned task set). DDPG took over 300,000 simulations to pass all corner tests. With multi-task training, the number of simulations was reduced to 35,000. With the pruned task space, it was further cut down to 7,000. We also visualize the corner performances and the optimization trace in the performance plane of the three circuit benchmarks in Figure 4.8.

Figure 4.7: Ablation of multi-task training and task space pruning. Using the two together yields the lowest simulation cost.

The noise (n) - set delay (sd) plane is chosen for the strongARM Latch and the bandwidth (ugb) - phase margin (phm) plane for the OTAs. Selected training tasks are denoted by black circles; the selections are located at the performance boundary. Two snapshots of the performance distribution during the optimization are also shown. They indicate that, with the pruned task space, the distributions move toward the feasible region from $t_0$ to $t_1$.
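To make the notion of corners "at the performance boundary" concrete, the following sketch picks, for each metric, the corner with the smallest normalized margin to its specification. This is a simplified worst-case selection written for illustration; it is not the task space pruning procedure of RobustAnalog, and all names and numbers are hypothetical.

```python
def normalized_margins(perf, specs):
    """Signed margin of each metric, positive when the spec is met.

    specs maps a metric name to (bound, sense), with sense ">=" or "<=".
    """
    out = {}
    for name, (bound, sense) in specs.items():
        diff = perf[name] - bound if sense == ">=" else bound - perf[name]
        out[name] = diff / abs(bound)
    return out


def boundary_corners(perf_by_corner, specs):
    """For every metric, select the corner with the smallest margin."""
    selected = set()
    for name in specs:
        worst = min(
            perf_by_corner,
            key=lambda c: normalized_margins(perf_by_corner[c], specs)[name],
        )
        selected.add(worst)
    return selected


# Two-stage OTA specs from the text; the corner performances are made up.
specs = {"ugb": (15e6, ">="), "phm": (60.0, ">=")}
perf = {
    ("TT", 1.1, 0): {"ugb": 20e6, "phm": 65.0},
    ("SS", 1.0, 100): {"ugb": 12e6, "phm": 70.0},  # worst bandwidth
    ("FF", 1.2, 0): {"ugb": 30e6, "phm": 55.0},    # worst phase margin
}
picked = boundary_corners(perf, specs)
print(sorted(picked))
```

In this toy example the typical corner is never selected, because another corner is always closer to, or further beyond, each specification limit.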

**RobustAnalog vs. Human Expert**. To examine the quality of the solutions from RobustAnalog, we compared them with a state-of-the-art human design. The performance metrics are listed in Table 4.1. Each metric is shown in the format of (min, max) across corners. RobustAnalog performs better on power, set delay, and reset delay. Its noise is slightly inferior, and its reset voltage is marginally higher but remains far below the 0.05 V limit. This benchmark, the strongARM Latch, has non-linear behavior and variation-sensitive performance, so a large tuning effort is required. It can take days for an expert to reach the design target, whereas RobustAnalog accomplishes the same task and produces high-quality solutions within an hour.

Figure 4.8: Performance distributions of two intermediate sizings during the RobustAnalog optimization. Red and blue markers are performances on different corners at time $t_0$ and $t_1$. Selected training corners are indicated by black circles.

Table 4.1: Comparison between RobustAnalog's solution and the expert's solution. Each entry is (min, max) across corners.

|                                  | Power (uW)   | Set Delay (ns) | Reset Delay (ns) | Noise (uV)   | DSV (V) | Reset Voltage (nV) |
|----------------------------------|--------------|----------------|------------------|--------------|---------|--------------------|
| Human Expert [Tang et al., 2020] | (3.78, 4.69) | (6.42, 19.7)   | (5.02, 9.40)     | (41.9, 57.3) | (0, 0)  | (8.71, 1.99k)      |
| RobustAnalog                     | (2.22, 2.88) | (4.46, 13.9)   | (1.30, 2.3)      | (45.3, 61.9) | (0, 0)  | (20.4, 2.21k)      |

**Scale to Large Corner Sets**. Here we empirically study how the simulation cost scales as the number of corner tasks grows. The previous sections used the full-factorial corner test for each benchmark. For industry-level circuits, randomly sampled corners (Monte Carlo corners) are also used, and hundreds or even thousands of them may be needed for a thorough verification. Scalability to a large corner set is therefore important. To demonstrate the scalability of RobustAnalog, we perform Monte Carlo sampling over the process variation model sets {TT, FF, SS, FS, SF}, the continuous voltage range [1.0, 1.2] V, and the continuous temperature range [0°C, 100°C], forming 5 Monte Carlo corner test sets with 20, 40, 80, 100, and 150 corners, respectively. Experiments are run on the Two-Stage OTA benchmark, and the results are shown in Figure 4.9. RobustAnalog needs only 69% more simulations when the corner task set becomes $7.5 \times$ larger. The simulation cost difference between RobustAnalog and the baseline methods becomes $4.4 \times$ larger at the scale of 150 corners.

Figure 4.9: Required simulation steps with more corners.
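The Monte Carlo corner sets described above could be generated along the following lines. The text does not state the sampling distribution, so uniform sampling over the stated ranges is an assumption here, as are the function name and the seeding scheme.

```python
import random

PROCESS = ["TT", "FF", "SS", "FS", "SF"]


def sample_corners(n, seed=0):
    """Draw n random (process, vdd, temp) corners.

    Assumes a uniform choice of model set, a uniform supply in
    [1.0, 1.2] V, and a uniform temperature in [0, 100] degrees C.
    """
    rng = random.Random(seed)
    return [
        (rng.choice(PROCESS), rng.uniform(1.0, 1.2), rng.uniform(0.0, 100.0))
        for _ in range(n)
    ]


sizes = (20, 40, 80, 100, 150)
corner_sets = {n: sample_corners(n, seed=n) for n in sizes}
print({n: len(s) for n, s in corner_sets.items()})
print(max(sizes) / min(sizes))  # 7.5
```

The last line reproduces the 7.5x growth in corner count between the smallest and the largest set quoted in the text.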

### 4.5 Conclusion

We present RobustAnalog, a variation-aware optimization framework based on multi-task RL. The key property of RobustAnalog is the ability to conduct efficient multi-task learning with pruned training task space. Therefore, it can effectively design circuits for variations. We show that RobustAnalog can reduce simulation cost by an order of magnitude compared with baselines. It can also scale to a large number of variation cases. As today's chip design becomes extremely challenging with the presence of variations, RobustAnalog shows the potential to drastically shorten the circuit design cycle and reduce the cost.

## Chapter 5

# Conclusion

This dissertation has presented a set of design and automation techniques to reduce the physical and design cost of analog/mixed-signal designs. We mainly focused on area and power reduction in the high-resolution and high-speed regimes of ADC design. The design cost of those high-performance ADCs is then greatly reduced by the proposed variation-aware analog automation. The major contributions are summarized in this chapter.

The high-speed CTDSM design trade-offs are discussed in Chapter 2. An energy-efficient 4<sup>th</sup>-order CTDSM with a single OTA and a passive NS-SAR is presented. Hybridizing the CTDSM with the emerging passive NS-SAR significantly reduces the power by minimizing the number of OTAs, and it enhances the stability. Meanwhile, an efficient ELD compensation is conducted in the charge domain. The hybrid CT-DT DSM turns out to be a cost-effective design pattern for high-order DSMs.

The high-resolution CTDSM design is studied in Chapter 3. A compact-area high-resolution CTDSM is presented with $2^{nd}$-order DEM and a feedforward-assisted loop filter. The high hardware complexity of high-order DEM is tackled by a partial-sorter algorithm. Thus, the feedback multi-bit DAC area is significantly reduced. The feedforward path keeps the loop filter highly linear under a low supply voltage. Therefore, this design shows a viable way to achieve high resolution in advanced technology nodes despite the low supply voltage and the high cost of area.

Variation-aware analog automation is studied in Chapter 4. The multi-task RL framework with task space pruning addresses the expensive simulation cost of design automation. The correlations among different conditions are modeled and their conflicts are mitigated. On top of this multi-task formulation, the full training task set is adaptively pruned in each iteration of the framework. In conclusion, the proposed framework, RobustAnalog, bridges the gap between real-world analog design and existing automation techniques.

All chip designs were validated through measurements on silicon prototypes, which provide solid evidence of advancing the state of the art. The automation algorithms were tested on real-world circuits and compared with mainstream black-box optimization algorithms. In summary, we push the boundary of analog/mixed-signal performance and make high-performance analog/mixed-signal circuits more accessible with the help of machine intelligence. These efforts will help bring about the era of ubiquitous sensing.

## Bibliography

- Chapter 1 FinFET—From device concept to standard compact model. In Yogesh Singh Chauhan, Darsen D. Lu, Sriramkumar Vanugopalan, Sourabh Khandelwal, Juan Pablo Duarte, Navid Paydavosi, Ai Niknejad, and Chenming Hu, editors, *FinFET Modeling for IC Simulation and Design*, pages 1–13. Academic Press, Oxford, 2015. ISBN 978-0-12-420031-9. doi: https://doi.org/10.1016/B978-0-12-420031-9.00001-4. URL https://www.sciencedirect.com/science/article/pii/B9780124200319000014.
- Yoshua Bengio. Deep learning of representations for unsupervised and transfer learning. In Proceedings of ICML workshop on unsupervised and transfer learning, pages 17–36. JMLR Workshop and Conference Proceedings, 2012.
- Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. In Proceedings of the 26th annual international conference on machine learning, pages 41–48, 2009.
- Claudio De Berti, Piero Malcovati, Lorenzo Crespi, and Andrea Baschirotto. A 106 dB A-weighted DR low-power continuous-time ΣΔ modulator for MEMS microphones. *IEEE Journal of Solid-State Circuits*, 51(7):1607–1618, 2016. ISSN 0018-9200. doi: 10.1109/jssc.2016.2540811.
- Ahmet F Budak, Prateek Bhansali, Bo Liu, Nan Sun, David Z Pan, and Chandramouli V Kashyap. Dnn-opt: An rl inspired optimization for analog circuit sizing using deep neural networks. In 2021 58th ACM/IEEE Design Automation Conference (DAC), pages 1219–1224. IEEE, 2021.
- Rich Caruana. Multitask learning. *Machine learning*, 28(1):41–75, 1997.

- Hyungil Chae, Jaehun Jeong, Gabriele Manganaro, and Michael P Flynn. A 12 mW low power continuous-time bandpass ΔΣ modulator with 58 dB SNDR and 24 MHz bandwidth at 200 MHz IF. *IEEE Journal of Solid-State Circuits*, 49(2):405–415, 2013.
- Eric Chang, Jaeduk Han, Woorham Bae, Zhongkai Wang, Nathan Narevsky, Borivoje Nikolić, and Elad Alon. Bag2: A process-portable framework for generator-based ams circuit design. In 2018 IEEE Custom Integrated Circuits Conference (CICC), pages 1–8. IEEE, 2018.
- Miri Weiss Cohen, Michael Aga, and Tomer Weinberg. Genetic algorithm software system for analog circuit design. *Procedia CIRP*, 36:17–22, 2015.
- Bart De Vuyst, Pieter Rombouts, and Georges Gielen. A rigorous approach to the robust design of continuous-time $\Sigma\Delta$ modulators. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 58(12):2829–2837, 2011.
- Coline Devin, Abhishek Gupta, Trevor Darrell, Pieter Abbeel, and Sergey Levine. Learning modular neural network policies for multi-task and multirobot transfer. In 2017 IEEE international conference on robotics and automation (ICRA), pages 2169–2176. IEEE, 2017.

- Daxiang Dong, Hua Wu, Wei He, Dianhai Yu, and Haifeng Wang. Multitask learning for multiple language translation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1723–1732, 2015.
- Ian Galton. Spectral shaping of circuit errors in digital-to-analog converters. IEEE Transactions on Circuits and Systems II: Analog and digital signal processing, 44(10):808–817, 1997.
- Burak Gönen, Shoubhik Karmakar, Robert van Veldhoven, and Kofi AA Makinwa. A continuous-time zoom ADC for low-power audio applications. *IEEE Journal of Solid-State Circuits*, 55(4):1023–1031, 2020.
- Wenjuan Guo and Nan Sun. A 12b-ENOB 61µW noise-shaping SAR ADC with a passive integrator. In ESSCIRC Conference 2016: 42nd European Solid-State Circuits Conference, pages 405–408. IEEE, 2016.
- Kourosh Hakhamaneshi, Nick Werblun, Pieter Abbeel, and Vladimir Stojanović. Bagnet: Berkeley analog generator with layout optimizer boosted with deep neural networks. In 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pages 1–8. IEEE, 2019.
- Nikolaus Hansen. The CMA evolution strategy: A tutorial. arXiv preprint arXiv:1604.00772, 2016.

- Yihui He et al. AMC: AutoML for Model Compression and Acceleration on Mobile Devices. In ECCV, 2018.
- Nicolas Heess, Greg Wayne, Yuval Tassa, Timothy Lillicrap, Martin Riedmiller, and David Silver. Learning and transfer of modulated locomotor controllers. arXiv preprint arXiv:1610.05182, 2016.
- Matteo Hessel, Hubert Soyer, Lasse Espeholt, Wojciech Czarnecki, Simon Schmitt, and Hado van Hasselt. Multi-task deep reinforcement learning with popart. In *Proceedings of the AAAI Conference on Artificial Intelligence*, volume 33, pages 3796–3803, 2019.
- Chen-Yen Ho, Cong Liu, Chi-Lun Lo, Hung-Chieh Tsai, Tze-Chien Wang, and Yu-Hsin Lin. A 4.5 mW CT self-coupled  $\Delta\Sigma$  modulator with 2.2 MHz BW and 90.4 dB SNDR using residual ELD compensation. *IEEE Journal of Solid-State Circuits*, 50(12):2870–2879, 2015.
- Nuno Horta. Analogue and mixed-signal systems topologies exploration using symbolic methods. Analog Integrated Circuits and Signal Processing, 31(2): 161–176, 2002.
- Sheng-Jui Huang, Nathan Egan, Divya Kesharwani, Frank Opteynde, and Michael Ashburn. 28.3 A 125MHz-BW 71.9 dB-SNDR VCO-based CT ΔΣ ADC with segmented phase-domain ELD compensation in 16nm CMOS. In 2017 IEEE International Solid-State Circuits Conference (ISSCC), pages 470–471. IEEE, 2017.

- Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki, Tom Schaul, Joel Z Leibo, David Silver, and Koray Kavukcuoglu. Reinforcement learning with unsupervised auxiliary tasks. arXiv preprint arXiv:1611.05397, 2016.
- John R. Koza, Forrest H Bennett, David Andre, Martin A. Keane, and Frank Dunlap. Automated synthesis of analog electrical circuits by means of genetic programming. *IEEE Transactions on evolutionary computation*, 1(2): 109–128, 1997.
- Kishor Kunal, Meghna Madhusudan, Arvind K Sharma, Wenbin Xu, Steven M Burns, Ramesh Harjani, Jiang Hu, Desmond A Kirkpatrick, and Sachin S Sapatnekar. Align: Open-source analog layout automation from the ground up. In *Proceedings of the 56th Annual Design Automation Conference 2019*, pages 1–4, 2019.
- Sergey Levine et al. End-to-end training of deep visuomotor policies. *JMLR*, 2016.
- Shaolan Li, Abhishek Mukherjee, and Nan Sun. A 174.3-dB FoM VCO-based CT ΔΣ modulator with a fully-digital phase extended quantizer and trilevel resistor DAC in 130-nm CMOS. *IEEE Journal of Solid-State Circuits*, 52(7):1940–1952, 2017.
- Yaping Li, Yong Wang, Yusong Li, Ranran Zhou, and Zhaojun Lin. An artificial neural network assisted optimization system for analog design space exploration. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, 39(10):2640–2653, 2019.

- Timothy P Lillicrap, Jonathan J Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015.
- Yibo Lin, Zixuan Jiang, Jiaqi Gu, Wuxi Li, Shounak Dhar, Haoxing Ren, Brucek Khailany, and David Z Pan. Dreamplace: Deep learning toolkit-enabled gpu acceleration for modern vlsi placement. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, 2020.
- Ying-Zu Lin, Chin-Yu Lin, Shan-Chih Tsou, Chih-Hou Tsai, and Chao-Hsin Lu. 20.2 a 40MHz-BW 320MS/s passive noise-shaping SAR ADC with passive signal-residue summation in 14nm FinFET. In 2019 IEEE International Solid-State Circuits Conference-(ISSCC), pages 330–332. IEEE, 2019.
- Bo Liu, Yan Wang, Zhiping Yu, Leibo Liu, Miao Li, Zheng Wang, Jing Lu, and Francisco V Fernández. Analog circuit optimization system based on hybrid evolutionary algorithms. *Integration*, 42(2):137–148, 2009.
- Bo Liu, Qingfu Zhang, and Georges GE Gielen. A gaussian process surrogate model assisted evolutionary algorithm for medium scale expensive optimization problems. *IEEE Transactions on Evolutionary Computation*, 18 (2):180–192, 2013.
- Bo Liu, Dixian Zhao, Patrick Reynaert, and Georges GE Gielen. Gaspad: A general and efficient mm-wave integrated circuit synthesis method based on surrogate model assisted evolutionary algorithm. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, 33(2):169–182, 2014.
- Bo Liu, Slawomir Koziel, and Qingfu Zhang. A multi-fidelity surrogate-model-assisted evolutionary algorithm for computationally expensive optimization problems. *Journal of computational science*, 12:28–37, 2016.
- Jiaxin Liu, Chen-Kai Hsu, Xiyuan Tang, Shaolan Li, Guangjun Wen, and Nan Sun. Error-Feedback Mismatch Error Shaping for High-Resolution Data Converters. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 66(4):1342–1354, 2018a.
- Jiaxin Liu, Shaolan Li, Wenjuan Guo, Guangjun Wen, and Nan Sun. A 0.029 mm<sup>2</sup> 17-fJ/Conv.-Step CT ΔΣ ADC with 2nd-Order Noise-Shaping SAR Quantizer. In 2018 IEEE symposium on VLSI circuits, pages 201–202. IEEE, 2018b.
- Jiaxin Liu, Shaolan Li, Wenjuan Guo, Guangjun Wen, and Nan Sun. A 0.029mm<sup>2</sup> 17-fJ/conversion-step third-order CT ΔΣ ADC with a single OTA and second-order noise-shaping SAR quantizer. *IEEE Journal of Solid-State Circuits*, 54(2):428–440, 2019. ISSN 0018-9200. doi: 10.1109/jssc.2018.2879955.
- Jiaxin Liu, Xing Wang, Zijie Gao, Mingtao Zhan, Xiyuan Tang, and Nan Sun. 9.3 A 40kHz-BW 90dB-SNDR Noise-Shaping SAR with 4× Passive Gain and 2nd-Order Mismatch Error Shaping. In 2020 IEEE International Solid-State Circuits Conference-(ISSCC), pages 158–160. IEEE, 2020.

- Mingjie Liu, Walker J. Turner, George F. Kokai, Brucek Khailany, David Z. Pan, and Haoxing Ren. Parasitic-aware analog circuit sizing with graph neural networks and bayesian optimization. In 2021 Design, Automation Test in Europe Conference Exhibition (DATE), pages 1372–1377, 2021. doi: 10.23919/DATE51398.2021.9474253.
- Tien-Yu Lo, Chan-Hsiang Weng, Hung-Yi Hsieh, Yun-Shiang Shu, and Pao-Cheng Chiu. 20.4 An 8×-OSR 25MHz-BW 79.4dB/74dB DR/SNDR CT ΔΣ modulator using 7b linearized segmented DACs with digital noise-coupling-compensation filter in 7nm FinFET CMOS. 2019 IEEE International Solid-State Circuits Conference - (ISSCC), 00:334–336, 2019. doi: 10.1109/isscc.2019.8662371.
- Wenlong Lyu, Fan Yang, Changhao Yan, Dian Zhou, and Xuan Zeng. Batch bayesian optimization via multi-objective acquisition ensemble for automated analog circuit design. In *International conference on machine learning*, pages 3306–3314. PMLR, 2018a.
- Wenlong Lyu, Fan Yang, Changhao Yan, Dian Zhou, and Xuan Zeng. Multiobjective bayesian optimization for analog/rf circuit synthesis. In Proceedings of the 55th Annual Design Automation Conference, pages 1–6, 2018b.
- James MacQueen et al. Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, volume 1, pages 281–297. Oakland, CA, USA, 1967.

- Colin C. McAndrew, Ik-Sung Lim, Brandt Braswell, and Doug Garrity. Corner models: Inaccurate at best, and it only gets worst .... In *Proceedings of the IEEE 2013 Custom Integrated Circuits Conference*, pages 1–4, 2013. doi: 10.1109/CICC.2013.6658428.
- Trent McConaghy, Kristopher Breen, Jeffrey Dyck, and Amit Gupta. Variation-aware design of custom integrated circuits: a hands-on field guide. Springer Science & Business Media, 2012.
- Azalia Mirhoseini, Anna Goldie, Mustafa Yazgan, Joe Jiang, Ebrahim Songhori, Shen Wang, Young-Joon Lee, Eric Johnson, Omkar Pathak, Sungmin Bae, et al. Chip placement with deep reinforcement learning. arXiv preprint arXiv:2004.10746, 2020.
- Azalia Mirhoseini, Anna Goldie, Mustafa Yazgan, Joe Wenjie Jiang, Ebrahim Songhori, Shen Wang, Young-Joon Lee, Eric Johnson, Omkar Pathak, Azade Nazi, et al. A graph placement methodology for fast chip design. *Nature*, 594(7862):207–212, 2021.
- Abhishek Mukherjee, Miguel Gandara, Xiangxing Yang, Linxiao Shen, Xiyuan Tang, Chen-Kai Hsu, and Nan Sun. A 74.5-dB dynamic range 10-MHz BW CT- $\Delta\Sigma$  ADC with distributed-input VCO and embedded capacitive- $\pi$  network in 40-nm CMOS. *IEEE Journal of Solid-State Circuits*, 56(2): 476–487, 2020.
- Laurence W. Nagel and D.O. Pederson. Spice (simulation program with integrated circuit emphasis). Technical Report UCB/ERL M382, EECS Department, University of California, Berkeley, Apr 1973. URL http://www2.eecs.berkeley.edu/Pubs/TechRpts/1973/22871.html.

- Andreas Olofsson. Intelligent design of electronic assets (idea) & posh open source hardware (posh). DARPA/MTO, 2018.
- Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. arXiv preprint arXiv:1912.01703, 2019.
- Shanthi Pavan, Richard Schreier, and Gabor C Temes. "Feedback DAC design," in Understanding delta-sigma data converters. John Wiley & Sons, 2017.
- David H Robertson. Problems and solutions: How applications drive data converters (and how changing data converter technology influences system architecture). *IEEE Solid-State Circuits Magazine*, 7(3):47–57, 2015.
- Andrei Sandu, Andi Buzo, Georg Pelz, and Corneliu Burileanu. Adaptive methodology for process-voltage-temperature verification. In 2020 International Semiconductor Conference (CAS), pages 51–54. IEEE, 2020.
- Victor Sanh, Thomas Wolf, and Sebastian Ruder. A hierarchical multi-task approach for learning embeddings from semantic tasks. In *Proceedings of the AAAI Conference on Artificial Intelligence*, volume 33, pages 6949–6956, 2019.

- Richard Schreier and Bo Zhang. noise-shaped multibit d/a convertor employing unit elements.
- Richard Schreier and Boming Zhang. Delta-sigma modulators employing continuous-time circuitry. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 43(4):324–332, 1996.
- Keertana Settaluri, Ameer Haj-Ali, Qijing Huang, Kourosh Hakhamaneshi, and Borivoje Nikolic. Autockt: Deep reinforcement learning of analog circuit designs. In 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 490–495. IEEE, 2020.
- Sahil Sharma, Ashutosh Jha, Parikshit Hegde, and Balaraman Ravindran. Learning to multi-task by active sampling. arXiv preprint arXiv:1702.06053, 2017.
- Yun-Shiang Shu, Jui-Yuan Tsai, Ping Chen, Tien-Yu Lo, and Pao-Cheng Chiu. A 28fJ/conv-step CT ΔΣ modulator with 78dB DR and 18MHz BW in 28nm CMOS using a highly digital multibit quantizer. 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers, pages 268–270, 2013. doi: 10.1109/isscc.2013.6487729.
- Yun-Shiang Shu, Liang-Ting Kuo, and Tien-Yu Lo. An oversampling SAR ADC with DAC mismatch error shaping achieving 105 dB SFDR and 101 dB SNDR over 1 kHz BW in 55 nm CMOS. *IEEE Journal of solid-state circuits*, 51(12):2928–2940, 2016.

- B.P. Del Signore, D.A. Kerth, N.S. Sooch, and E.J. Swanson. A monolithic 2-b delta-sigma A/D converter. *IEEE Journal of Solid-State Circuits*, 25(6):1311–1317, 1990. ISSN 0018-9200. doi: 10.1109/4.62174.
- Jasper Snoek, Hugo Larochelle, and Ryan P Adams. Practical bayesian optimization of machine learning algorithms. arXiv preprint arXiv:1206.2944, 2012.
- Shuang Song, Michiel J Rooijakkers, Pieter Harpe, Chiara Rabotti, Massimo Mischi, Arthur HM van Roermund, and Eugenio Cantatore. A 430nW 64nV/√Hz current-reuse telescopic amplifier for neural recording applications. In 2013 IEEE Biomedical Circuits and Systems Conference (BioCAS), pages 322–325. IEEE, 2013.
- Trevor Standley, Amir Zamir, Dawn Chen, Leonidas Guibas, Jitendra Malik, and Silvio Savarese. Which tasks should be learned together in multi-task learning? In *International Conference on Machine Learning*, pages 9120–9132. PMLR, 2020.
- Adam Stooke and Pieter Abbeel. rlpyt: A research code base for deep reinforcement learning in pytorch, 2019.
- Nan Sun and Peiyan Cao. Low-Complexity High-Order Vector-Based Mismatch Shaping in Multibit ΔΣ ADCs. *IEEE Transactions on Circuits and Systems II: Express Briefs*, 58(12):872–876, 2011.

- Xiyuan Tang, Linxiao Shen, Begum Kasap, Xiangxing Yang, Wei Shi, Abhishek Mukherjee, David Z Pan, and Nan Sun. An energy-efficient comparator with dynamic floating inverter amplifier. *IEEE Journal of Solid-State Circuits*, 55(4):1011–1022, 2020.
- Matthew E Taylor and Peter Stone. An introduction to intertask transfer for reinforcement learning. *Ai Magazine*, 32(1):15–15, 2011.
- Yee Whye Teh, Victor Bapst, Wojciech Marian Czarnecki, John Quan, James Kirkpatrick, Raia Hadsell, Nicolas Heess, and Razvan Pascanu. Distral: Robust multitask reinforcement learning, 2017.
- Raviteja Theertham, Prasanth Koottala, Sujith Billa, and Shanthi Pavan. Design techniques for high-resolution continuous-time delta–sigma converters with low in-band noise spectral density. *IEEE Journal of Solid-State Circuits*, 55(9):2429–2442, 2020.
- Hanrui Wang, Jiacheng Yang, Hae-Seung Lee, and Song Han. Learning to design circuits. In NeurIPS Machine Learning for Systems Workshop, 2018a.
- Hanrui Wang, Kuan Wang, Jiacheng Yang, Linxiao Shen, Nan Sun, Hae-Seung Lee, and Song Han. GCN-RL circuit designer: Transferable transistor sizing with graph neural networks and reinforcement learning. In 2020 57th ACM/IEEE Design Automation Conference (DAC), pages 1–6. IEEE, 2020a.

- Hanrui Wang, Jiacheng Yang, Hae-Seung Lee, and Song Han. Learning to design circuits, 2020b.
- Tze-Chien Wang, Yu-Hsin Lin, and Chun-Cheng Liu. A 0.022 mm<sup>2</sup> 98.5 dB SNDR Hybrid Audio ΔΣ Modulator With Digital ELD Compensation in 28 nm CMOS. *IEEE Journal of Solid-State Circuits*, 50(11):2655–2664, 2015.
- Wei Wang, Yan Zhu, Chi-Hang Chan, and Rui Paulo Martins. A 5.35-mW 10-MHz single-opamp third-order CT ΔΣ modulator with CTC amplifier and adaptive latch DAC driver in 65-nm CMOS. *IEEE Journal of Solid-State Circuits*, 53(10):2783–2794, 2018b. ISSN 0018-9200. doi: 10.1109/jssc.2018.2852326.
- Wei Wang, Chi-Hang Chan, Yan Zhu, and Rui P Martins. A 100-MHz BW 72.6-dB-SNDR CT ΔΣ modulator utilizing preliminary sampling and quantization. *IEEE Journal of Solid-State Circuits*, 55(6):1588–1598, 2020c.
- Guowen Wei, Pradeep Shettigar, Feng Su, Xinyu Yu, and Tom Kwan. A 13-ENOB, 5 MHz BW, 3.16 mW multi-bit continuous-time ΔΣ ADC in 28 nm CMOS with excess-loop-delay compensation embedded in SAR quantizer. In 2015 Symposium on VLSI Circuits (VLSI Circuits), pages C292–C293. IEEE, 2015.
- Bo Wu, Shuang Zhu, Benwei Xu, and Yun Chiu. A 24.7 mW 65 nm CMOS SAR-assisted CT  $\Delta\Sigma$  modulator with second-order noise coupling achieving 45 MHz bandwidth and 75.3 dB SNDR. *IEEE Journal of Solid-State Circuits*, 51(12):2893–2905, 2016.

- Su-Hao Wu, Yun-Shiang Shu, Albert Yen-Chih Chiou, Wei-Hsiang Huang, Zhi-Xin Chen, and Hung-Yi Hsieh. 9.1 A Current-Sensing Front-End Realized by A Continuous-Time Incremental ADC with 12b SAR Quantizer and Reset-Then-Open Resistive DAC Achieving 140dB DR and 8ppm INL at 4kS/s. In 2020 IEEE International Solid-State Circuits Conference-(ISSCC), pages 154–156. IEEE, 2020.
- Carsten Wulff and Trond Ytterdal. A compiled 9-bit 20-ms/s 3.5-fj/conv. step sar adc in 28-nm fdsoi for bluetooth low energy receivers. *IEEE Journal of Solid-State Circuits*, 52(7):1915–1926, 2017.
- Kai Xing, Wei Wang, Yan Zhu, Chi-Hang Chan, and Rui Paulo Martins. A 10.4mW 50MHz-BW 80dB-DR single-opamp third-order CTSDM with SAB-ELD-merged integrator and 3-stage opamp. 2020 IEEE Symposium on VLSI Circuits, 00:1–2, 2020. doi: 10.1109/vlsicircuits18222.2020.9162797.
- Biying Xu, Keren Zhu, Mingjie Liu, Yibo Lin, Shaolan Li, Xiyuan Tang, Nan Sun, and David Z Pan. Magical: Toward fully automated analog ic layout leveraging human and machine intelligence. In 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pages 1–8. IEEE, 2019.
- Kai-En Yang, Chia-Yu Tsai, Hung-Hao Shen, Chen-Feng Chiang, Feng-Ming Tsai, Chung-An Wang, Yiju Ting, Chia-Shun Yeh, and Chin-Tang Lai. Trust-region method with deep reinforcement learning in analog design space exploration. In 2021 58th ACM/IEEE Design Automation Conference (DAC), pages 1225–1230. IEEE, 2021.

- Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep neural networks? arXiv preprint arXiv:1411.1792, 2014.
- Tianhe Yu, Saurabh Kumar, Abhishek Gupta, Sergey Levine, Karol Hausman, and Chelsea Finn. Gradient surgery for multi-task learning. arXiv preprint arXiv:2001.06782, 2020.
- Ramin Zanbaghi, Pavan Kumar Hanumolu, and Terri S. Fiez. An 80-dB DR, 7.2-MHz bandwidth single opamp biquad based CT ΔΣ modulator dissipating 13.7-mW. *IEEE Journal of Solid-State Circuits*, 48(2):487–501, 2013. ISSN 0018-9200. doi: 10.1109/jssc.2012.2221194.
- Sebastian Zeller, Christian Muenker, and Robert Weigel. A X-coupled differential single-opamp resonator for low power continuous time ΣΔ-ADCs. In 2011 7th Conference on Ph. D. Research in Microelectronics and Electronics, pages 233–236. IEEE, 2011.
- Guo Zhang, Hao He, and Dina Katabi. Circuit-GNN: Graph neural networks for distributed circuit design. In International Conference on Machine Learning, pages 7364–7373. PMLR, 2019a.
- Shuhan Zhang, Wenlong Lyu, Fan Yang, Changhao Yan, Dian Zhou, Xuan Zeng, and Xiangdong Hu. An efficient multi-fidelity bayesian optimization approach for analog circuit synthesis. In 2019 56th ACM/IEEE Design Automation Conference (DAC), pages 1–6. IEEE, 2019b.

- Shuhan Zhang, Fan Yang, Dian Zhou, and Xuan Zeng. An efficient asynchronous batch bayesian optimization approach for analog circuit synthesis. In 2020 57th ACM/IEEE Design Automation Conference (DAC), pages 1–6. IEEE, 2020.
- Yan Zhu, Chi-Hang Chan, U-Fat Chio, Sai-Weng Sin, U Seng-Pan, Rui Paulo Martins, and Franco Maloberti. A 10-bit 100-MS/s reference-free SAR ADC in 90 nm CMOS. *IEEE Journal of Solid-State Circuits*, 45(6):1111–1121, 2010.
- A.L. Zimpeck, C. Meinhardt, and R.A.L. Reis. Impact of PVT variability on 20nm FinFET standard cells. *Microelectronics Reliability*, 55(9):1379–1383, 2015. ISSN 0026-2714. doi: https://doi.org/10.1016/j.microrel.2015.06.039. URL https://www.sciencedirect.com/science/article/pii/S0026271415300202. Proceedings of the 26th European Symposium on Reliability of Electron Devices, Failure Physics and Analysis.

## Vita

Wei Shi was born and grew up in Wuhan, Hubei, China. He received the B.Sc. degree from the College of Electrical Engineering, Zhejiang University, Hangzhou, Zhejiang, China, in 2017, and the M.S. degree in the Electrical and Computer Engineering from the University of Texas at Austin, Austin, TX, USA, where he is currently pursuing the Ph.D. degree. From May 2018 to August 2018, he interned at Cirrus Logic, Austin. From June 2021 to August 2021, he interned at Meta, Menlo Park.

His current research interests include high-performance data converters, machine learning and optimization, and open-source hardware. Mr. Shi received the Chinese National Scholarship and UCLA Cross-disciplinary Scholar Fellowship from 2013-2017. He is a recipient of Texas Instruments Outstanding Student Designer Award in 2017, IEEE ISSCC Analog Devices Inc. Outstanding Student Designer Award in 2019, and the 2020-2021 IEEE SSCS Predoctoral Achievement Award.

Email address: weishi0079@utexas.edu

<sup>†</sup>LaTeX is a document preparation system developed by Leslie Lamport as a special version of Donald Knuth's TeX program.