Modeling, Optimization and Testing for Analog/Mixed-Signal Circuits in Deeply Scaled CMOS Technologies by Yu, Guo
MODELING, OPTIMIZATION AND TESTING FOR ANALOG/MIXED-
SIGNAL CIRCUITS IN DEEPLY SCALED CMOS TECHNOLOGIES
A Dissertation
by
GUO YU
Submitted to the Oﬃce of Graduate Studies of
Texas A&M University
in partial fulﬁllment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
December 2009
Major Subject: Computer Engineering
Approved by:
Chair of Committee, Peng Li
Committee Members, Gwan Choi
Edgar Sanchez-Sinencio
Duncan M. Walker
Head of Department, Costas N. Georghiades
ABSTRACT
Modeling, Optimization and Testing for Analog/Mixed-Signal
Circuits in Deeply Scaled CMOS Technologies. (December 2009)
Guo Yu, B.S., Fudan University;
M.S., Delft University of Technology
Chair of Advisory Committee: Dr. Peng Li
As CMOS technologies move into the sub-100nm regime, the design and verification
of analog/mixed-signal circuits become increasingly difficult because of reduced
transconductance, severe gate leakage and pronounced device mismatch. Growing
manufacturing-induced process variations and their impact on circuit performance
make the already complex design task even harder in deeply scaled CMOS
technologies. Given these barriers, effort is needed to ensure that circuits are
robust and optimized with parametric variations taken into account. This research
presents innovative computer-aided design approaches that address three such
problems: (1) performance modeling of large analog/mixed-signal circuits under
process variations, (2) yield-aware optimization of complex analog/mixed-signal
systems and (3) development of on-chip test schemes to detect and compensate for
parametric failures.
The first problem focuses on efficient evaluation of circuit performance under
process variations, which serves as the baseline for robust analog circuit
design. We propose statistical performance modeling methods for two popular
types of complex analog/mixed-signal circuits: Sigma-Delta ADCs and charge-pump
PLLs. More general performance modeling is achieved with a geostatistics-motivated
performance model (the Kriging model), which is accurate and efficient for
capturing the performance of stand-alone analog circuit blocks. Based on the
generated block-level performance models, we then solve the more challenging
problem of yield-aware system optimization for large analog/mixed-signal systems.
Multi-yield Pareto fronts are used in a hierarchical optimization framework so
that statistically optimal solutions can be found efficiently. We further look
into on-chip design-for-test (DFT) circuits in analog systems and address
linearity testing in ADCs and DFT scheme optimization in charge-pump PLLs.
Finally, a design example of a digitally intensive PLL is presented to illustrate
the practical application of the modeling, optimization and testing approaches
to large analog/mixed-signal systems.
To my father, YU Chenghua, and my mother, SHEN Meiqin
ACKNOWLEDGMENTS
I would like to express my great thanks to my advisor, Dr. Peng Li, for his
kind guidance for my Ph.D. study. Dr. Peng Li shared his deep knowledge, research
experience and insight with me constantly and provided encouragement and support
to me during my research work. This dissertation would never have been completed
without his advice and help.
I am very grateful for having an exceptional doctoral committee and wish to
thank Dr. Edgar Sanchez-Sinencio, Dr. Gwan Choi and Dr. Duncan M. Walker for
their invaluable support and advice.
I really appreciate Dr. Hongzhou Liu and Dr. Hui Zhang at Cadence Design
Systems for being my mentors during my internship there. They shared a great
deal of industry experience and insight, which has been invaluable to me. In addition,
I would like to thank all my friends who made my stay at Texas A&M very enjoyable.
In particular I’d like to thank Zhuo Feng, Wei Dong and Xiaoji Ye in Dr. Li’s lab for
the collaboration and fruitful discussion. Thanks also go to Rajesh Garg, Shiyan Hu,
Zhanyuan Jiang in the Computer Engineering Group, and Heng Zhang, Xi Chen and
Yung-Chung Lo in the Analog & Mixed Signal Group for their valuable input on my
research projects.
My research work was supported by the FCRP Focus Center for Circuit & System
Solutions (C2S2), under contract 2003-CT-888. I thank the sponsor for providing
ﬁnancial support.
Last, but not least, I would like to express my deepest gratitude to my family
for their long-lasting encouragement and support.
TABLE OF CONTENTS
CHAPTER
I INTRODUCTION
  A. Capture Statistical Performances Under Process Variations
  B. Automatic Yield-aware System Synthesis
  C. Enhance Performance Using On-chip Design-for-test Function
  D. Design Case of All-digital PLL
II CIRCUIT PERFORMANCE MODELING UNDER PROCESS VARIATIONS ∗
  A. Lookup Table Based Sigma-Delta ADC Modeling
    1. Sigma-Delta ADC Background
    2. Look-up Table Modeling
      a. Details of Model Extraction Setup
      b. Controlling of Model Accuracy
    3. Parametric LUT-based Macromodeling
      a. Response Surface Modeling
    4. Circuit Examples
  B. Parameter Dimension Reduced Phase-Locked Loop Modeling
    1. PLL Background
    2. Hierarchical PLL Modeling
      a. Voltage Controlled Oscillator
      b. Charge Pump
      c. Other PLL Circuit Blocks
    3. Efficient parametric-reduction PLL Modeling
      a. RRR Based Parameter Dimension Reduction
      b. Parameterized Macromodeling Using Parameter Reduction
  C. General Block Modeling Using Kriging Models
    1. Mathematical Formulation
    2. Circuit Examples
      a. Ring Oscillator
      b. LC Oscillator
  D. Summary
III YIELD-AWARE ANALOG CIRCUIT OPTIMIZATION
  A. Yield-aware Circuit Block Optimization
    1. Pareto Front Background
    2. Iterative Search for Pareto Fronts
    3. Fast Statistical Analysis Using Partial Kriging
    4. Yield-aware Block Optimization
    5. Block Optimization Examples
  B. Yield-aware Hierarchical System Optimization
    1. Hierarchical Optimization Background
    2. Issues in Yield-aware Hierarchical Optimization
      a. Pareto Front Generation Issues
      b. System-level Optimization Issues
    3. Multi-yield Pareto Fronts
    4. System-level Optimization Formulation
      a. Bridging Block-level and System-level
      b. System-level Cost Function
      c. Optimization Algorithm
    5. System Optimization Examples
      a. Two-stage Amplifier
      b. Charge-pump PLL
  C. Summary
IV ON-CHIP TEST FOR ANALOG/MIXED-SIGNAL CIRCUITS
  A. Linearity Test for Sigma-Delta ADCs
    1. System Analysis Using Volterra Series
      a. Nonlinear System Modeling
      b. Nonlinear Transfer Function Analysis
    2. Predicting INL using HDs
      a. Relating INL with Transfer Functions
      b. Relating INL with HDs
    3. Simulation-based Model Generation
    4. Circuit Example
  B. On-chip Test Design and Optimization for PLL
    1. DFT schemes for Parametric Failure Detection
      a. Scheme 1
      b. Scheme 2
      c. Scheme 3
    2. DFT Evaluation and Optimization
      a. Identification of Key System Level Variation Sources
      b. DFT Evaluation and Optimization
    3. Optimization Example
      a. Performance Modeling
      b. Test Scheme Evaluation and Optimization
      c. DFT Scheme Verification
      d. DFT Trade-off Analysis
  C. Summary
V DESIGN CASE: ALL-DIGITAL PLL
  A. System Background
  B. System-level ADPLL Design
    1. System Performance Analysis
    2. Loop Filter
    3. Time-to-Digital Converter
    4. Digital Controlled Oscillator
  C. Block Modeling in ADPLL
    1. TDC Modeling
    2. DCO Modeling
  D. Yield-aware ADPLL Optimization
    1. Topology Selection
    2. Yield-aware Fine Tuning
  E. Adaptive Self-tuning ADPLL Design
  F. Optimization of Adaptive ADPLLs
    1. Adaptive System Performance Calculation
    2. Optimization of Adaptive ADPLLs
  G. Experimental Results
    1. Normal ADPLL Optimization
    2. Adaptive ADPLL Optimization
  H. Summary and Discussion
VI CONCLUSIONS AND FUTURE DIRECTIONS
  A. Conclusions
  B. Future Directions
REFERENCES
VITA
LIST OF TABLES
TABLE
I ITRS predicted process variabilities [1].
II Runtime and accuracy comparison for the proposed simulator.
III Comparison of Spectre and LUT based simulator.
IV Kriging model accuracy for ring oscillator.
V Kriging model accuracy for LC oscillator.
VI Hierarchical optimization results for PLL.
VII Runtime summary for PLL optimization.
VIII The accuracy of the maximum INL prediction.
IX Comparison of DFT schemes.
X PLL specifications.
XI Sensitivities of system performance to DFT schemes.
XII Comparison of DFT schemes to identify faulty chips.
XIII Optimization variable summary.
LIST OF FIGURES
FIGURE
1 Block diagram of ΣΔ ADC.
2 Clocked ΣΔ modulator behavior.
3 The proposed LUT based simulation framework.
4 Model extraction setup for integrators.
5 Modified output voltage setup.
6 Response surface modeling of parameterized LUTs.
7 Spectrum comparison of Spectre and proposed simulator.
8 SNDR distribution with random parameter sweeping.
9 SNDR distributions with mismatching of DACs.
10 Block diagram of charge-pump PLL.
11 Schematic of a ring oscillator and VCO macromodel.
12 Schematic of charge-pump.
13 PLL macromodel generation flow.
14 LC oscillator schematic.
15 Illustration of pareto front.
16 Iterative pareto front generation.
17 Speeding up Monte-Carlo sampling via partial Kriging model evaluation.
18 Iterative yield-aware pareto front optimization.
19 Yield-aware pareto fronts for the ring oscillator.
20 Verification of yield-aware pareto front.
21 Yield-aware pareto fronts for the LC oscillator.
22 Two-stage Op-Amp schematic.
23 Iterative pareto front generation for the two-stage Op-Amp.
24 Yield-aware pareto fronts for the two-stage Op-Amp.
25 Nominal hierarchical optimization flow.
26 Multi-yield pareto front generation.
27 Mapping from multi-yield pareto fronts to yield-aware system performances.
28 Hierarchical optimization using multi-yield pareto fronts.
29 Schematic of two-stage operational amplifier.
30 Comparison of results of different optimization methods.
31 PLL modeling and optimization.
32 Multi-yield pareto fronts for charge pump (left) and VCO (right).
33 Trade-offs of lockin time and power at different yield levels.
34 Trade-offs of lockin time and jitter at different yield levels.
35 Verification of performance trade-offs for lockin time and power.
36 Modeling of a second-order ΣΔ ADC.
37 Definition of integral nonlinearity.
38 Comparison of INL curves predicted by analytical model and simulated results.
39 The accuracy of INL_max prediction using the simulation-based model.
40 DFT scheme candidates.
41 Evaluation and optimization of a DFT scheme.
42 Distribution of system performances.
43 Pass/fail predictions of three DFT schemes.
44 Chip prediction distribution for DFT scheme 1.
45 Chip prediction distribution for DFT scheme 3.
46 Digital output changes due to process variation for DFT scheme 1.
47 Error vs. number of test codes.
48 All-digital PLL system block diagram.
49 Phase noise contributions of TDC and DCO.
50 s-domain linear ADPLL noise model.
51 Modeling of TDC noise.
52 Modeling of DCO noise.
53 Yield-aware optimization flow for ADPLL.
54 Adaptive PLL system diagram.
55 Logic sequence of self compensation.
56 Conventional yield-aware optimization.
57 Proposed yield-aware optimization using adaptive operation.
58 Comparison of phase noise obtained by proposed method and event-driven simulation.
59 Jitter distribution in topology selection.
60 Jitter distribution comparisons in fine tuning stage.
61 Power and jitter trade-offs in topology evaluation.
62 Area and jitter trade-offs in topology evaluation.
63 Power distribution for the reference optimization.
64 Power distribution for the adaptive optimization.
65 Yield-aware optimization with two-way adaption.
CHAPTER I
INTRODUCTION
As CMOS technologies move into the nano-scale regime, manufacturing-induced
variations have an increasingly profound impact on integrated circuit
performance. Continued feature-size scaling increases the uncertainty in device
and circuit electrical characteristics, as illustrated in Table I. Consequently,
circuit performances are no longer deterministic values, and the degraded
statistical performances result in parametric failures, which in turn reduce the
yield of fabricated chips [2]. The effects of process variations must therefore be
taken into consideration at the circuit design stage so that the circuits work
properly once they are fabricated. This requirement is nontrivial: it takes
significant effort to understand device-level characteristics and to use this
information to achieve robust circuit design.
Table I. ITRS predicted process variabilities [1].
Year         08    09    10    11    12    13
Pitch (nm)   57    50    45    40    36    32
V_th         37%   42%   42%   42%   58%   58%
Delay        46%   49%   50%   53%   54%   57%
Power        57%   57%   58%   58%   59%   59%
On the other hand, the proliferation of communication and consumer electronic
systems creates high demand for low-power, high-performance analog/mixed-signal
circuits, either as stand-alone components or as integrated IP blocks [3]. The
design difficulty and complexity of modern analog/mixed-signal circuits have
risen significantly in recent years in order to meet sophisticated demands such
as support for multiple wireless standards and ultra-low-power applications. The
design task becomes even more challenging when process variations come into play.
Statistical performance analysis and optimization, added on top of the already
lengthy analog and mixed-signal design process, lengthen the product
time-to-market, which severely hampers product profitability. Innovations are
needed to address these challenges.
This dissertation follows the style of IEEE Transactions on Automatic Control.
In this dissertation, we focus on problems related to robust analog/mixed-signal
circuit design in modern CMOS technologies using computer-aided approaches. The
most fundamental question to be answered is how the circuits will behave after
fabrication when so many process uncertainties are in play. Once we know how the
circuits behave, we can use this information to enhance circuit designs. There
are various ways to achieve such enhancements. One is to consider the possible
performance degradation early in the design stage and find design solutions that
yield the best overall system performance after fabrication. Another takes
advantage of the cheap digital processing capability available in CMOS
technologies: we can develop built-in circuitry to detect performance failures
caused by process variations and use digital control logic to compensate for the
performance loss in individual chips. In the rest of this dissertation we explain
these ideas and how to achieve these goals in detail.
A. Capture Statistical Performances Under Process Variations
Although hardware computing power has been climbing for decades, direct use of
SPICE-like simulators to evaluate statistical system performance is still
prohibitively expensive for complex analog/mixed-signal systems, especially for
circuits with oversampling (e.g. Sigma-Delta ADCs and DACs) or with coexisting
slow and fast signals (e.g. phase-locked loops) [4, 5]. In these systems, a single
performance evaluation might take a few days or even weeks using dedicated
transistor-level simulators on today's most powerful servers. Therefore, to
address robust analog/mixed-signal design under process variations, it is crucial
to find efficient ways to evaluate system performance.
There is not yet a general solution that offers both accuracy and efficiency for
complex analog/mixed-signal circuit simulation. To speed up circuit evaluation,
some noncritical information has to be discarded in exchange for simulation
speed. This treatment, however, depends strongly on circuit topology, since
different kinds of circuits operate in different ways. System performance
analysis becomes even more challenging in scaled technologies, where process
variations have to be accounted for and system performances become statistical
variables. Methods that accurately capture system performance variations in
terms of device-level uncertainties need to be developed to safeguard circuit
design.
Our first attempt at this problem is to develop efficient statistical system
performance evaluation methods for two popular types of analog/mixed-signal
circuits: Sigma-Delta ADCs and charge-pump PLLs. For Sigma-Delta ADCs, parametric
look-up tables are used to capture the performance variations of the integrators,
quantizers and feedback DACs caused by process uncertainties. Statistical
performance evaluation is achieved with a runtime speedup of four orders of
magnitude over SPICE-like simulators, which makes robust Sigma-Delta ADC design
possible. We use a different approach for PLL system simulation to account for
the distinct nature of the circuit. An efficient behavioral simulation framework
is developed to map building-block performances to the system level. For block
performance variation analysis, we propose an efficient parameter-reduction
modeling technique that encodes process variations into block performance models
without much computational effort. In this way we achieve accurate and efficient
PLL performance evaluation that takes process variations into account.
A further step toward solving the statistical performance analysis problem is to
develop a more general modeling method. It is well known that the performances of
complex analog/mixed-signal systems are highly nonlinear and difficult to model
accurately. For smaller-scale analog circuits, however, the relationship between
performances and design variables is much simpler and can be captured by a
suitable mathematical formulation. We employ Kriging models [6] to capture circuit
performances in terms of design parameters and process variables, and use these
models to evaluate circuit performances statistically. One unique characteristic
of the Kriging performance model is that it can estimate the uncertainty of its
prediction for a new set of input variables, which makes it possible to enhance
model accuracy adaptively. This modeling method can be used not only to evaluate
the performances of circuit blocks, but also to accomplish the more challenging
task of complicated system performance optimization.
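The prediction-uncertainty property can be illustrated with a minimal sketch of a simple Kriging predictor. It is not the formulation used in this dissertation, which is given in Chapter II. The Gaussian correlation function, the fixed correlation parameter `theta`, the use of the sample mean as a constant trend, and all function names are assumptions chosen for illustration.

```python
import math

def gauss_corr(a, b, theta=1.0):
    """Gaussian correlation between two points given as tuples."""
    return math.exp(-theta * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def kriging_predict(X, y, x_new, theta=1.0, nugget=1e-10):
    """Return (predicted value, relative prediction variance) at x_new.

    X is a list of sampled input points, y the simulated performances.
    The relative variance lies between 0 (at a sample) and 1 (far away).
    """
    n = len(X)
    mean = sum(y) / n  # assumption: constant trend taken as the sample mean
    R = [[gauss_corr(X[i], X[j], theta) + (nugget if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    r = [gauss_corr(xi, x_new, theta) for xi in X]
    w = solve(R, r)  # kriging weights R^-1 r (R is symmetric)
    pred = mean + sum(wi * (yi - mean) for wi, yi in zip(w, y))
    var = max(0.0, 1.0 - sum(wi * ri for wi, ri in zip(w, r)))
    return pred, var

if __name__ == "__main__":
    X = [(0.0,), (1.0,), (2.0,)]  # hypothetical design-variable samples
    y = [1.0, 3.0, 2.0]           # hypothetical simulated performances
    for q in [(1.0,), (1.5,), (5.0,)]:
        print(q, kriging_predict(X, y, q))
```

In this sketch the relative variance is close to zero at a sampled point and approaches one far from all samples. An adaptive scheme could add new simulation samples where the variance is largest, which is one plausible reading of the adaptive accuracy enhancement described above.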
B. Automatic Yield-aware System Synthesis
Research and development in automatic analog design has been under way for
decades and is drawing more attention recently as system complexity continues
to grow [3]. Analog design automation methodologies have been implemented in
various commercially available design-assist tools. However, most of these tools
are limited in the number of design variables they can handle (e.g. they can only
handle simple amplifier or oscillator designs). A more serious limitation is
their inability to handle process uncertainties in automatic circuit design so
as to safeguard yield. The gap between the demands of advanced analog designs and
the capabilities of current automation tools must be filled.
We propose to utilize the powerful performance modeling capabilities of
Kriging models for automatic analog system design. Since the mapping from the
design and process variables to the circuit performances can be accurately
captured by the Kriging modeling approach, we can search the design space
directly using the Kriging performance models. To this end, we first sample the
whole design space uniformly to find design points that are optimal in the global
sense. Starting from these initial design solutions, we then perform a local fine
search for better design solutions. The newly obtained circuit performance
trade-offs are compared with the previous ones, and the search is repeated until
they converge. Device-level uncertainties are considered during the optimization
search. We use the best achievable performances at the required yield levels as
the optimization objectives, so the design points obtained in the optimization
achieve the required yield levels. A novel simulation algorithm is proposed to
speed up the statistical performance distribution analysis, which saves about
60% to 70% of the computation time compared with brute-force Monte-Carlo
simulation.
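As a rough illustration of using the performance at a required yield level as the optimization objective, the sketch below estimates that value by plain Monte-Carlo sampling for a metric where smaller is better. The performance function `toy_delay`, the single Gaussian process variable and the sample count are hypothetical. The accelerated sampling algorithm proposed in this work is not reproduced here.

```python
import math
import random

def performance_at_yield(perf_fn, design, yield_level, n_samples=2000, seed=0):
    """Estimate the spec value met by a fraction `yield_level` of chips.

    Lower performance values are assumed to be better (e.g. delay, jitter).
    """
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    # One standard-normal draw stands in for the process variation of a chip.
    samples = sorted(perf_fn(design, rng.gauss(0.0, 1.0))
                     for _ in range(n_samples))
    # Smallest sample value that at least yield_level of the samples do not exceed.
    k = max(0, math.ceil(yield_level * n_samples) - 1)
    return samples[k]

def toy_delay(width, dv):
    """Hypothetical delay model: nominal 10/width, shifted by process variation dv."""
    return 10.0 / width + 0.5 * dv

if __name__ == "__main__":
    for level in (0.5, 0.9, 0.99):
        print(level, performance_at_yield(toy_delay, 2.0, level))
```

A tighter yield requirement returns a more pessimistic performance value for the same design, so an optimizer that minimizes this quantity trades nominal performance against robustness.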
Performance modeling becomes more complicated for large analog/mixed-signal
systems, and even a Kriging model may not work properly when the performance
mapping becomes too nonlinear. Another barrier to the direct application of
Kriging models to large analog system performance modeling is model complexity.
As the number of input variables goes up, even a linear increase in the number of
Kriging samples leads to a quadratic increase in computation cost, which prevents
us from directly employing the performance models in large system optimization.
To solve this problem, we adopt the idea of hierarchical optimization. A complex
system is first decomposed into several well-isolated building blocks of smaller
size, and optimization searches are then carried out individually for these
blocks. The most difficult problems in this framework are how to preserve process
variation information in the final system-level performance evaluation and how
to handle the interactions between different building blocks. We tackle these
challenges by introducing the concept of multi-yield Pareto fronts to capture
“near-optimal” design solutions in each building block, and by performing
statistical performance analysis at the system level using Monte-Carlo
simulation for each transistor-level device. In this way we achieve efficient and
accurate yield-aware design synthesis for large analog/mixed-signal systems.
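The idea of keeping one trade-off curve per yield level can be sketched as follows. The non-dominated filter is the standard definition of a Pareto front for objectives to be minimized. The data layout (a dictionary mapping each yield level to the yield-level performances of candidate designs) and the numbers are hypothetical, and only illustrate the bookkeeping, not the generation procedure of Chapter III.

```python
def dominates(a, b):
    """True if point a is no worse than b in every objective and better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Return the non-dominated points (all objectives are minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def multi_yield_fronts(perf_by_yield):
    """Build one Pareto front per yield level."""
    return {level: pareto_front(pts) for level, pts in perf_by_yield.items()}

if __name__ == "__main__":
    # Hypothetical (power, jitter) pairs of four candidate block designs,
    # each evaluated at two yield levels.
    data = {
        0.90: [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)],
        0.99: [(1.2, 6.0), (2.3, 3.5), (3.1, 3.4), (4.5, 1.5)],
    }
    for level, front in multi_yield_fronts(data).items():
        print(level, front)
```

Note that a design dominated at one yield level can sit on the front at another, which is one reason a single nominal front may not be enough for the system-level search.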
C. Enhance Performance Using On-chip Design-for-test Function
Besides applying the yield-aware design concept to safeguard system robustness,
it is also possible to use the powerful digital processing capabilities of
advanced CMOS technologies to fight process variations. The immediate requirement
in this framework is to detect performance failures induced by process
variations; we can then either compensate for the performance loss or screen out
the faulty chips. In this dissertation we focus mainly on circuit failure
detection, including on-chip design-for-test (DFT) circuit design and
optimization.
In commercial integrated circuit production, it is not cost-efficient to use
sophisticated external measurement instruments or stations to test individual
chips. Some circuit performances, for example the linearity of ADCs and the
lock-in time of PLLs, cannot be measured by simple on-chip logic. Self-test
therefore relies on using easy-to-measure circuit metrics to test these
inaccessible system performances. For such indirect performance measurements to
be valid, the selected metrics need to be highly correlated with the
hard-to-measure target performances, and we need to know the exact mapping
between the performance indicators and the actual performances. We start by
developing a low-cost on-chip linearity test scheme for Sigma-Delta ADCs based on
easy-to-access frequency distortion metrics. Since direct linearity tests
require a highly linear reference signal and massive data processing, it is
advantageous to replace them with frequency-domain measurements. To achieve this,
we conduct discrete-time Volterra series analysis in the frequency domain and
build a correlation model between linearity and distortion. Experimental results
show excellent test accuracy for the proposed method.
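The two requirements on an indirect test metric, strong correlation with the target performance and a known mapping to it, can be illustrated with a generic least-squares sketch. This is not the Volterra-series model developed in Chapter IV. The linear mapping, the function names and the sample data are assumptions for illustration only.

```python
import math

def fit_indicator(indicator, target):
    """Fit target ~ a * indicator + b by least squares.

    Returns (a, b, rho), where rho is the Pearson correlation coefficient
    between the easy-to-measure indicator and the target performance.
    """
    n = len(indicator)
    mx = sum(indicator) / n
    my = sum(target) / n
    sxx = sum((x - mx) ** 2 for x in indicator)
    syy = sum((y - my) ** 2 for y in target)
    sxy = sum((x - mx) * (y - my) for x, y in zip(indicator, target))
    a = sxy / sxx
    b = my - a * mx
    rho = sxy / math.sqrt(sxx * syy)
    return a, b, rho

def predict(a, b, measured_indicator):
    """Map an on-chip indicator reading to an estimate of the target performance."""
    return a * measured_indicator + b

if __name__ == "__main__":
    # Hypothetical characterization data: distortion readings and the
    # linearity values obtained for the same chips by a full (costly) test.
    distortion = [1.0, 2.0, 3.0, 4.0]
    linearity = [2.1, 3.9, 6.2, 7.8]
    a, b, rho = fit_indicator(distortion, linearity)
    print(a, b, rho, predict(a, b, 2.5))
```

A correlation coefficient close to one suggests that the indicator is a usable stand-in; a weak correlation would mean the cheap measurement cannot replace the direct one, whatever mapping is fitted.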
The cost of implementing the self-test circuit is critical to the overall
system design. The overhead needs to be kept to a minimum while achieving the
required test accuracy, so it is crucial to optimize the DFT circuit schemes so
that the system achieves the best overall performance. We employ the simulation
framework developed in Chapter II to facilitate DFT circuit optimization for
charge-pump PLLs. Several potential self-test schemes are proposed to detect the
parametric failures caused by process variations. Novel circuit-level
macromodeling and powerful statistical dimension reduction techniques are
employed to evaluate the effectiveness of each on-chip test scheme. We further
optimize the chosen scheme topology by configuring internal scheme variables to
achieve the optimal balance between implementation cost and test accuracy.
D. Design Case of All-digital PLL
We demonstrate the usefulness of the modeling, optimization and testing ideas
using a practical design example. The selected design case is a digitally
intensive PLL, also known as an all-digital PLL (ADPLL) [7]. It differs from most
conventional types of PLL in that all the control signals are in the digital
domain. An ADPLL, however, is still an analog/mixed-signal circuit, since it
relies on analog circuit blocks such as the digitally controlled oscillator (DCO)
to generate clock signals of adjustable frequency and the time-to-digital
converter (TDC) to measure frequency and phase differences. ADPLLs are more
resistant to process variations than most other mixed-signal circuits thanks to
their digitally intensive implementation, although the analog blocks are still
affected by device-level uncertainties.
The distinct operation of the ADPLL needs to be reflected in the system design
and optimization procedures. The most straightforward change is that design
values in the system implementation are discretized and need to be treated
differently from analog-type variables. Typical ADPLL implementations require
more transistors than their analog counterparts, so it is important to develop an
efficient system performance simulation approach to evaluate the design
solutions. Digital signal processing is robust to process variations, so the
yield-aware optimization framework needs to be modified to account for this
change. We develop efficient variation-aware performance models for the building
blocks and employ a two-step optimization methodology, consisting of topology
selection and block fine tuning, to achieve efficient ADPLL system optimization.
Self healing of system performances is easier for ADPLLs thanks to the digital
implementation. Many hard-to-detect circuit behaviors can now be measured efficiently
with digital signal processing. For example, PLL lock-in time can be obtained
using simple digital logic to monitor the digital frequency control signal. Other circuit
performances, however, still need elaborate measurement planning. We develop an
approach to capture the correlation between the system jitter performances and the
frequency differences so that the jitter in ADPLLs can be measured on chip. With
the system performance information measured by the DFT circuits, we can implement
performance compensation functions to achieve circuit self healing. The digital
implementation of ADPLL systems gives us the freedom to adjust filter characteristics
and TDC/DCO configurations without much hardware overhead. A prototype
of the adaptive performance compensation scheme is implemented by measuring the
frequency shift levels, which represent the ADPLL jitter performances. A thorough
search of different system configurations is triggered if the jitter exceeds the predefined
value. The configuration with the best system performances is saved after the search, and
the ADPLL is configured to this structure with enhanced system performances.
CHAPTER II
CIRCUIT PERFORMANCE MODELING UNDER PROCESS VARIATIONS ∗
In this chapter we focus on developing circuit modeling and simulation techniques
to achieve eﬃcient and accurate system performance evaluation under consideration
of process variations. First we present eﬃcient modeling techniques for two popular
mixed-signal circuits including Sigma-Delta ADCs [8] and charge-pump PLLs [9],
then we propose a more general circuit performance modeling approach employing
Kriging models [10].
A. Lookup Table Based Sigma-Delta ADC Modeling
Sigma-Delta (ΣΔ) ADCs have been widely used in data conversion applications due to
the good performances. However, oversampling and complex circuit behaviors render
the transistor-level analysis of these designs prohibitively time consuming. The ineﬃ-
ciency of the standard simulation approach also rules out the possibility of analyzing
the impacts of a multitude of environmental and process variations critical in modern
VLSI technologies. We present a lookup table (LUT) based modeling technique to
facilitate much more eﬃcient performance analysis of ΣΔ ADCs. Various transistor-
level circuit nonidealities are systematically characterized at the building block level
∗ © 2007 IEEE. Part of this chapter is reprinted, with permission, from “Efficient
Lookup Table Based Modeling for Robust Design of Sigma-Delta ADCs”, by G. Yu
and P. Li, IEEE Trans. on Circuits and Systems - I, vol. 54, No. 7, pp. 1513-1528,
July 2007.
This material is posted here with permission of the IEEE. Such permission of the
IEEE does not in any way imply IEEE endorsement of any of the Texas A&M Uni-
versity’s products or services. Internal or personal use of this material is permitted.
However, permission to reprint/republish this material for advertising or promo-
tional purposes or for creating new collective works for resale or redistribution must
be obtained from the IEEE by writing to pubs-permissions@ieee.org. By choosing to
view this material, you agree to all provisions of the copyright laws protecting it.
and the whole system is simulated much more eﬃciently using these building block
models. Our approach can provide up to four orders of magnitude runtime speedup
over SPICE-like simulators, hence signiﬁcantly shortening the CPU time required for
evaluating system performances such as SNDR (signal-to-noise-and-distortion-ratio).
The proposed modeling technique is further extended to enable scalable performance
variation analysis of complex ΣΔ ADC designs.
1. Sigma-Delta ADC Background
As illustrated in Fig. 1, the two basic components of ΣΔ ADCs are modulators and
digital ﬁlters. The analog input is sampled by a very high frequency clock in the ΣΔ
modulator, then the signal is passed through a loop-ﬁlter to perform noise-shaping.
The output of the loop-ﬁlter is quantized by an internal A/D converter, producing
a bit-stream at the same speed as the sampling clock. A low-pass digital ﬁlter then
removes the out-of-band noise and the down-sampler converts the high speed bit-stream
to the high resolution digital codes.
[Figure: block diagram of a ΣΔ ADC, a modulator followed by a digital filter; the two
integrator blocks are labeled a_1 z^{-1}/(1 − z^{-1}) and a_2 z^{-1}/(1 − z^{-1}).]
Fig. 1. Block diagram of ΣΔ ADC.
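The signal flow described above can be illustrated with a purely behavioral sketch. It assumes an ideal second-order loop with two delaying integrators, a 1-bit quantizer with output levels ±1, and hypothetical integrator gains `a1 = a2 = 0.5`; none of these values are taken from the designs in this chapter, and a plain average stands in for the decimation filter.

```python
def sigma_delta_2nd_order(x_samples, a1=0.5, a2=0.5):
    """Ideal second-order modulator with two delaying integrators
    a*z^-1/(1 - z^-1) and a 1-bit quantizer with output levels +1/-1.
    The gains a1 and a2 are assumed values, not taken from the text."""
    y1 = y2 = 0.0
    bits = []
    for x in x_samples:
        v = 1.0 if y2 >= 0.0 else -1.0     # 1-bit quantizer decision
        bits.append(v)
        y1_next = y1 + a1 * (x - v)        # first integrator with DAC feedback
        y2_next = y2 + a2 * (y1 - v)       # second integrator with DAC feedback
        y1, y2 = y1_next, y2_next
    return bits

bits = sigma_delta_2nd_order([0.3] * 20000)   # DC input at 30% of full scale
mean_out = sum(bits) / len(bits)              # crude decimation: plain average
```

The average of the bit-stream tracks the DC input, which is the property the low-pass digital filter and down-sampler exploit.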
The major components of a ΣΔ modulator are integrators, internal quantizers
and D/A converters. The whole system is clocked by an external sampling clock,
which makes it possible to model the performance of each component at sampling
intervals. The output of switched-capacitor integrators used in the ΣΔ converter is
a function of input signals and their previous states
$$ y[k+1] = F(y[k],\, x[k+1],\, d[k+1]), \tag{2.1} $$
where $y[k+1]$ is the current output of the integrator, $y[k]$ is the previous integrator
output, $F$ is a nonlinear function describing the state transfer, and $x[k+1]$ and $d[k+1]$
are the current input signal and feedback digital output, respectively. As shown in
Fig. 2, since each integrator is clocked by the sampling clock, it is possible to use
lookup tables to model the output at the end of each clock cycle, as described in the
later sections.
[Figure: clocked operation of the ΣΔ modulator; the integrator update is annotated as
y[k+1] = F(y[k], x[k+1/2], d[k+1/2]).]
Fig. 2. Clocked ΣΔ modulator behavior.
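A minimal sketch of how a tabulated state-transfer function $F$ could be evaluated is given below. It assumes one two-dimensional table per feedback code, a uniform grid, and bilinear interpolation between grid points; the function names and the ideal-integrator stand-in used to fill the table are hypothetical, since in the actual flow the table entries come from transistor-level simulation.

```python
def build_lut(f, levels):
    """Tabulate f(y_prev, x) on a uniform grid; one table per feedback code d."""
    return [[f(y, x) for x in levels] for y in levels]

def lut_lookup(table, levels, y_prev, x):
    """Bilinear interpolation of the tabulated next-state value."""
    step = levels[1] - levels[0]
    def locate(v):
        i = int((v - levels[0]) / step)
        i = max(0, min(i, len(levels) - 2))      # clamp to the last cell
        return i, (v - levels[i]) / step
    i, ty = locate(y_prev)
    j, tx = locate(x)
    top = table[i][j] * (1 - tx) + table[i][j + 1] * tx
    bot = table[i + 1][j] * (1 - tx) + table[i + 1][j + 1] * tx
    return top * (1 - ty) + bot * ty

# Stand-in for transistor-level data: an ideal integrator for one feedback code.
V_DD, N = 1.5, 16
levels = [V_DD * k / (N - 1) for k in range(N)]
ideal = lambda y, x: y + 0.5 * (x - V_DD / 2)
table = build_lut(ideal, levels)
y_next = lut_lookup(table, levels, 0.61, 0.93)
```

Because the stand-in function is linear in both arguments, the interpolated value matches the direct evaluation; with real extracted tables the number of levels controls the interpolation error.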
On the other hand, various circuit-level nonidealities, such as the finite DC gain,
bandwidth and slew rate of the operational amplifiers, charge injection of the switches,
and mismatch of the internal quantizers and D/A converters, are difficult to analyze
accurately by hand, as are their impacts on system performances.
The finite bandwidth, slew rate and saturation of the amplifier also introduce incomplete
charge transfer, which shifts the system transfer function. There is no simple way to
calculate the influence of these effects in terms of SNDR, so transistor-level simulation
is normally employed. However, using transistor-level
simulation to predict the linearity of the design can be prohibitively expensive. For
example, transient analysis needs to be performed over at least $(2^{14} - 1) \cdot 128$ clock
cycles to fully characterize a ΣΔ ADC. Process variation analysis is another
challenge for conventional simulators because a large number of long transient
simulations are needed to evaluate the performance of the circuit at different parameter
corners. In the following sections, it will be shown that these issues can be well
addressed by adopting LUT based modeling.
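As a rough sense of scale, the minimum cycle count quoted above can be evaluated directly (a worked number only, with no assumption beyond the expression itself):

```latex
(2^{14} - 1) \cdot 128 = 16383 \cdot 128 = 2{,}097{,}024 \ \text{clock cycles}
```

That is, roughly two million clock cycles for a single characterization run.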
2. Look-up Table Modeling
The proposed lookup table (LUT) based simulation framework is illustrated in Fig. 3.
In our fast ADC simulation methodology, various circuit blocks are modeled as follows.
The macromodel of each building block of the ΣΔ modulator is extracted at the
transistor-level using Cadence Spectre [11]. SNDR and THD are calculated using
Fast Fourier transform (FFT) to estimate the performance of the modulator. A two-
stage Cascaded Integrator Comb (CIC) ﬁlter is implemented as the decimator and the
decimation rate (same as OSR) is programmable to adjust for diﬀerent applications.
Since the simulator can run long transient simulations very efficiently (2 seconds for
64K cycles), Integral Nonlinearity (INL) and Differential Nonlinearity (DNL) can be easily
calculated by applying an input ramp signal and evaluating the output digital codes.
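One common way to turn ramp-test output codes into DNL and INL is a code-density (histogram) estimate, sketched below. This is an illustrative assumption about the post-processing, not necessarily the procedure used in this work; the function name and the 3-bit example data are hypothetical.

```python
from collections import Counter

def dnl_inl_from_ramp(codes, n_codes):
    """Estimate DNL and INL (in LSB) from the output codes of a slow linear ramp.

    codes   : iterable of integer output codes recorded during the ramp
    n_codes : number of distinct output codes of the converter
    The two end codes are excluded because they absorb any ramp overrange.
    """
    hist = Counter(codes)
    inner = range(1, n_codes - 1)
    counts = [hist.get(c, 0) for c in inner]
    ideal = sum(counts) / len(counts)        # average hits per code = 1 LSB
    dnl = [c / ideal - 1.0 for c in counts]  # code-width error in LSB
    inl, acc = [], 0.0
    for d in dnl:                            # INL is the running sum of DNL
        acc += d
        inl.append(acc)
    return dnl, inl

# Hypothetical 3-bit converter: code 3 is too wide, code 4 too narrow.
codes = [0]*10 + [1]*10 + [2]*10 + [3]*15 + [4]*5 + [5]*10 + [6]*10 + [7]*10
dnl, inl = dnl_inl_from_ramp(codes, 8)
```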
a. Details of Model Extraction Setup
The setup of ΣΔ ADCs for the lookup table generation can be divided into two
parts, one consists of integrators and D/A converters, and the other consists of the
quantizer. The integrator output is a function of the input signals and the initial
state of the integrator, which is discretized to generate the lookup table. The number
of levels depends on the accuracy requirement of the simulator. Since ΣΔ ADCs
are quite linear in most cases [4], linear interpolation is good enough. The internal
voltage swing is determined by the system architecture; for low-voltage designs the
internal voltages can change from 0 to $V_{dd}$. To cover the whole range of the voltage
swing, we discretize the inputs and outputs of the integrators at N levels, from 0 to
$V_{dd}$.
Fig. 3. The proposed LUT based simulation framework.
The extraction setup for an integrator with a 2-bit DAC, which is implemented in
thermometer code, is shown in Fig. 4. A large inductor $L$ together with a voltage source
$V_s$ is used to set the initial value of the integrator output. The input of the integrator
is also set by a voltage source $V_i$. The digital output of the quantizer controls the
amount of feedback charge. The digital codes 00 to 11 can be represented by
the number of the three voltage sources $V_{d1}$, $V_{d2}$ and $V_{d3}$ that are set to $V_{dd}$.
For ΣΔ modulators with a 1-bit quantizer, the modeling of the digital signal is
simpler, with only one voltage source $V_d$.
Fig. 4. Model extraction setup for integrators.
The nonidealities of the quantizer should also be taken into consideration. For a
1-bit quantizer, we can use Spectre to find the voltage levels where the digital output
switches from 0 to 1 ($V_{off+}$) and 1 to 0 ($V_{off-}$). The quantizer is then modeled as
$$ d[k+1] = \begin{cases} 1 & (V_{in}[k+1] > V_{off+}) \\ d[k] & (V_{off-} < V_{in}[k+1] < V_{off+}) \\ 0 & (V_{in}[k+1] < V_{off-}) \end{cases} \tag{2.2} $$
where $d[k+1]$ is the current output of the quantizer and $d[k]$ is the output of the quantizer
in the previous clock cycle. A 2-bit quantizer can be modeled in a similar way since
it is built from three 1-bit quantizers each of which is modeled as in Eqn. 2.2.
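The hysteresis model of Eqn. 2.2 maps directly onto a small state-holding function. The sketch below is illustrative: the function name and the switching levels are hypothetical, and the case where the input equals a switching level exactly, which Eqn. 2.2 leaves open, is assumed here to hold the previous output.

```python
def quantize_1bit(v_in, d_prev, v_off_plus, v_off_minus):
    """One clock cycle of the 1-bit quantizer model with hysteresis (Eqn. 2.2)."""
    if v_in > v_off_plus:
        return 1
    if v_in < v_off_minus:
        return 0
    return d_prev  # inside the hysteresis band: hold the previous decision

# Hypothetical switching levels around a 0.75 V mid-rail.
V_OFF_PLUS, V_OFF_MINUS = 0.76, 0.74
d = 0
bits = []
for v in [0.70, 0.75, 0.77, 0.75, 0.73, 0.75]:
    d = quantize_1bit(v, d, V_OFF_PLUS, V_OFF_MINUS)
    bits.append(d)
```

The same input level (0.75 V here) produces either 0 or 1 depending on the previous decision, which is exactly the memory effect the model is meant to capture.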
b. Controlling of Model Accuracy
To achieve good accuracy for the LUT methodology, several modeling issues
must be taken into consideration. When we perform the lookup table generation,
the inductor $L$ and the voltage source $V_s$ are used to set the initial condition
of the integrator as shown in Fig. 4; here we redraw the setup in Fig. 5.
Fig. 5. Modified output voltage setup.
As shown in the right part of Fig. 5, the initial output voltage of the integrator
was not set up correctly in [12]. To see this, suppose that the gain of the amplifier is
$A$, the input voltage is $V_{in}$ and the initial voltage of the amplifier output is $V_s$, we get
$$ A \cdot V_{in} = -(V_s + V_{in}), \tag{2.3} $$
so the voltage at the input node can be written as $V_{in} = -V_s/(1 + A)$. Suppose
$V_s = V_{dd}/2$ and $A = 60$ dB, we will get an offset voltage of $|V_{in}| = V_{dd}/2002$. This
offset voltage occurs each time the table is built and will be transferred to the
amplifier output through $V_s$, which makes the generated tables inaccurate. In our
experiments, it has been observed that such an offset voltage can introduce as much
as 5 dB error in SNDR with the input signal in full voltage swing.
Another issue to be noticed is that, if we look at the charging sequence
of the integrator in one clock cycle from $nT$ to $(n+1)T$, we can see that the
sample-and-hold circuit follows the input signal during the first half clock cycle. So we have
to use the input signal at $(n + 1/2)T$ instead of $(n + 1)T$ as the index to the LUT
during simulation. Eqn. 2.1 is therefore
rewritten as $y[k+1] = F(y[k], x[k+1/2], d[k+1/2])$, which was not handled correctly
in [12]. Since the digital output of the DAC remains the same within a clock cycle,
$d[k]$ can be used to replace $d[k + 1/2]$.
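The size of the input-node offset quoted above can be checked numerically from the stated relation $V_{in} = -V_s/(1 + A)$. The sketch below assumes a 1.5 V supply, the value used for the circuit examples later in this chapter; the function name is hypothetical.

```python
def input_offset(v_s, gain_db):
    """Magnitude of the input-node offset |V_in| = V_s / (1 + A) for a finite gain A."""
    a = 10.0 ** (gain_db / 20.0)  # convert dB to a linear voltage gain
    return v_s / (1.0 + a)

V_DD = 1.5                       # assumed supply voltage
offset = input_offset(V_DD / 2, 60.0)
ratio = V_DD / offset            # V_dd divided by the offset magnitude
```

With a 60 dB amplifier the offset is a little under 0.75 mV for this supply, small in absolute terms but repeated in every table entry.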
3. Parametric LUT-based Macromodeling
Environmental and process variations can introduce noticeable shifts in the perfor-
mance of ΣΔ ADCs. To allow a feasible variation analysis, we combine the response
surface modeling with the LUT based methodology in this section. Here the term circuit
design variable covers environmental variations, process variations, mismatch and any
other quantity that has an impact on the circuit performance.
a. Response Surface Modeling
To find the influence of different circuit parameters on the performance of a system,
parameterized lookup tables can be used to approximate the system performance
under the circuit parameter variations, as illustrated in Fig. 6. Given a set of $n$ responses
$y_1, y_2, \cdots, y_n$ and $n$ sets of $m$ input variables $x_1, x_2, \cdots, x_m$ which represent the
environmental and process variations, we can determine a set of simplified formulas
$\hat{h}_1, \hat{h}_2, \cdots, \hat{h}_n$ to relate $x$ and $y$ as [13]
$$
\begin{aligned}
\hat{y}_1 &= \hat{h}_1(x_{11}, x_{12}, \cdots, x_{1m}) \\
\hat{y}_2 &= \hat{h}_2(x_{21}, x_{22}, \cdots, x_{2m}) \\
&\;\;\vdots \\
\hat{y}_n &= \hat{h}_n(x_{n1}, x_{n2}, \cdots, x_{nm})
\end{aligned} \tag{2.4}
$$
where
  $\hat{y}_i$   $i$th approximated response,
  $\hat{h}_i$   a function relating $y$ and $x$,
  $x_i$   $i$th set of circuit variables,
  $m$   number of circuit variables,
  $n$   number of experimental runs.
[Figure: each entry of the parameterized LUT is fitted by a quadratic response surface
$\hat{y} = \hat{\beta}_0 + \sum_{i=1}^{m} \hat{\beta}_i x_i + \sum_{i=1}^{m}\sum_{j=1}^{m} \hat{\beta}_{ij} x_i x_j$.]
Fig. 6. Response surface modeling of parameterized LUTs.
To construct the parameterized LUTs, a number of simulations should be performed
to get enough sets of response values. In order to minimize the cost of LUT
generation while keeping reasonable accuracy, response surface modeling (RSM) can be
applied to generate a quadratic function relating each entry in the table with the circuit
parameters [13]
$$ \hat{y} = \hat{\beta}_0 + \sum_{i=1}^{m} \hat{\beta}_i x_i + \sum_{i=1}^{m}\sum_{j=1}^{m} \hat{\beta}_{ij} x_i x_j, \tag{2.5} $$
where
  $x_i$   $i$th circuit variable,
  $\hat{y}$   approximated response (a table entry),
  $\hat{\beta}$   estimated fitting coefficients,
  $m$   number of circuit variables.
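A minimal sketch of fitting the quadratic model of Eqn. 2.5 to one table entry follows. It is an illustration under stated assumptions: the symmetric cross terms $x_i x_j$ and $x_j x_i$ are merged into one coefficient, the least-squares problem is solved through the normal equations with Gaussian elimination purely to keep the example dependency-free, and the sample data and coefficient values are hypothetical.

```python
def quad_features(x):
    """Basis [1, x_i, x_i*x_j (i <= j)] of the quadratic model for one sample x."""
    m = len(x)
    feats = [1.0] + list(x)
    feats += [x[i] * x[j] for i in range(m) for j in range(i, m)]
    return feats

def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - s) / aug[r][r]
    return z

def fit_quadratic(samples, responses):
    """Least-squares coefficients via the normal equations (P^T P) beta = P^T y."""
    p = [quad_features(x) for x in samples]
    k = len(p[0])
    ptp = [[sum(row[i] * row[j] for row in p) for j in range(k)] for i in range(k)]
    pty = [sum(row[i] * y for row, y in zip(p, responses)) for i in range(k)]
    return solve(ptp, pty)

# Hypothetical LUT entry depending on two normalized circuit variables.
true_beta = [0.75, 0.02, -0.01, 0.005, 0.003, -0.004]
grid = [(a, b) for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)]
y = [sum(c * f for c, f in zip(true_beta, quad_features(x))) for x in grid]
beta = fit_quadratic(grid, y)
```

In the parameterized LUT, one such coefficient vector would be stored per table entry, so that a table for any parameter setting can be regenerated by evaluating the fitted polynomials.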
Since minimizing the number of simulations is a major consideration, an experimental
plan is needed that reuses all runs performed during variable screening for the
construction of the fitting coefficients [13]. Here a second-order central composite plan
consisting of a cube design plan and a star design plan is employed [14]. The cube design
plan is a two-level fractional factorial plan which can be used to estimate first-order
effects (e.g., $x_i$) and interaction effects (e.g., $x_i x_j$), but it cannot estimate
pure quadratic terms (e.g., $x_i^2$). The star design plan is used as a supplementary
training set to provide pure quadratic terms in Eqn. 2.5.
When we perform variation analysis, the range for each circuit variable is specified
at the very beginning. The discretization of the circuit variables can be arbitrary, but
in the two-level factorial experimental design plan, each factor takes on two values,
$-1$ and $+1$, to represent the minimum and the maximum value of the circuit variable,
respectively. Each factor in the star plan takes on three levels $-\alpha$, $0$, $\alpha$, where $0$
represents the circuit variable in the nominal case and $\pm\alpha$ are two standardized values of
the circuit variable between the two ends $(-1, +1)$ of the circuit variable. Once the
experimental plan has been determined, the fitting coefficients in Eqn. 2.5
can be constructed using least-squares fitting. In practice, numerically more stable
algorithms such as SVD can be used to solve the least-squares problem.
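The structure of a central composite plan can be sketched as follows. For brevity the cube part here is a full two-level factorial on a small number of factors, whereas the plan used in this work is a fractional factorial; the function name and the value of `alpha` are assumptions for illustration.

```python
from itertools import product

def central_composite(m, alpha):
    """Cube (2^m full factorial), star (2m axial points) and one center point."""
    cube = [list(p) for p in product((-1.0, 1.0), repeat=m)]
    star = []
    for i in range(m):
        for level in (-alpha, alpha):
            pt = [0.0] * m
            pt[i] = level          # move one factor at a time off the nominal point
            star.append(pt)
    center = [[0.0] * m]
    return cube, star + center

cube, star = central_composite(3, alpha=0.5)   # alpha is an assumed value
n_runs = len(cube) + len(star)
```

The star part always contains 2m + 1 runs including the nominal point, so eight circuit variables give 17 star runs regardless of how the cube part is fractionated.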
4. Circuit Examples
We use two second-order ΣΔ ADCs, designed with a 1-bit and a 2-bit quantizer respectively,
as design examples to demonstrate the effectiveness of the proposed simulation framework.
Both converters are implemented in 0.13 μm CMOS technology with a single
1.5 V supply. The oversampling ratio is set to 128, the sampling clock is
1 MHz and the stimulus is a 2 kHz, 0.8 V peak-to-peak sinusoidal signal. For each converter,
we perform a transient simulation of 65,536 ($2^{16}$) + 100 clock cycles and discard the
first 100 points. A Kaiser window is applied to the digital output in the FFT analysis. We
consider process and environmental variations for the ΣΔ ADC with the 1-bit quantizer, and
the internal DAC mismatch for the ΣΔ ADC with the 2-bit quantizer. This is a good
approximation for circuit performance evaluation: the 1-bit DAC is quite linear, so the
nonlinearities of the 1-bit ΣΔ ADC are mainly due to process variation, while the major
part of the nonlinearities in the 2-bit ΣΔ ADC comes from the mismatch of the internal
DACs.
The generation of parameterized LUTs is time consuming because traditional
simulators are used for the transient simulations. However, since the tables are reusable
once they are built, the speedup of the LUT based simulator is calculated as the ratio of
the simulation run time of Spectre to that of the LUT based simulator. A comparison of the
model extraction time, simulation time, SNDR and THD of the LUT based simulator and
Spectre is shown in Table II.
Fig. 7 shows the output spectrum of the ΣΔ ADC with 2-bit quantizer in the
nominal case. The two spectra fit very well, especially for the signal and distortions which
are of most interest in the performance analysis, indicating the good accuracy of the
LUT simulator.
Table II. Runtime and accuracy comparison for the proposed simulator.
              LUT based simulator                                   Spectre
              Extraction time     Simulation                        Simulation
Design        Nom.      Para.     time        SNDR      THD         time        SNDR      THD
SDM (1bit)    7 min     9.5 hr    2 s         73.8 dB   -63.1 dB    4.5 hr      74.1 dB   -62.6 dB
SDM (2bit)    20 min    15 hr     4 s         86.8 dB   -76.2 dB    9.5 hr      86.2 dB   -76.8 dB
Fig. 7. Spectrum comparison of Spectre and proposed simulator. [Plot: amplitude (dB) versus
frequency (Hz) from 0 to 8000 Hz; curves for the LUT based simulator and Spectre, with the
distortion components marked.]
The eight parameters selected to represent the environmental and process variation
of the ΣΔ ADC with 1-bit quantizer are as follows. Temperature
($Temp$) changes from −97 °C to 127 °C; the threshold voltages of PMOS ($V_{thp}$) and
NMOS ($V_{thn}$), the carrier mobilities in PMOS ($\mu_{op}$) and NMOS ($\mu_{on}$), the effective
channel lengths of PMOS ($L_{effp}$) and NMOS ($L_{effn}$), and the oxide thickness ($T_{ox}$)
are all swept ±40% from their nominal values. A Resolution VI $2^{8-2}$ fractional
factorial design plan that includes 64 runs for the cube design plan and 17 runs for the
star design plan is used to build the parameterized lookup table. The high speed of
the LUT simulator makes it possible to perform statistical simulation, which is too
time consuming to be done with conventional simulators. A statistical performance
analysis is carried out with 1000 runs of transient simulation. It would take 4500 hours
for Spectre to complete, but only 20 min for the LUT simulator. Four sets of environmental
and process parameters are selected randomly and transient analyses of 64K
clock cycles are performed to evaluate the accuracy of the LUT simulator, as shown
in Table III. The error of SNDR is within 1 dB, which demonstrates the effectiveness
of our method for capturing process variation.
Table III. Comparison of Spectre and LUT based simulator.
        Spectre    LUT simulator    Error     Speed up
set1    72.4 dB    72.1 dB          0.3 dB    8100X
set2    73.2 dB    72.3 dB          0.9 dB    8100X
set3    74.0 dB    73.9 dB          0.1 dB    8100X
set4    74.8 dB    74.6 dB          0.2 dB    8100X
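The speedup and the projected Spectre run time for the statistical analysis follow from the per-run times reported for the 1-bit design. The short calculation below only restates that arithmetic; the variable names are illustrative.

```python
SPECTRE_HOURS = 4.5   # Spectre run time for one 64K-cycle transient (1-bit design)
LUT_SECONDS = 2.0     # LUT simulator run time for the same transient
N_RUNS = 1000         # number of statistical runs

speedup = SPECTRE_HOURS * 3600 / LUT_SECONDS      # per-run speedup
spectre_total_hours = N_RUNS * SPECTRE_HOURS      # statistical analysis in Spectre
```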
The SNDR distribution of the 1-bit quantizer ΣΔ ADC, obtained from 1000 runs with the
eight parameters swept randomly, is shown in Fig. 8. We can see that the
ADC is most likely to have an SNDR of 73 dB and the deviation is small, which means
the design of the converter is very robust.
Fig. 8. SNDR distribution with random parameter sweeping. [Histogram: number of runs versus
SNDR (dB), horizontal axis from 67 to 77 dB.]
In today's CMOS technology, the mismatch of capacitors can be controlled
within ±1%; here we set the maximum mismatch to ±2% to cover all situations.
Statistical simulations are performed to analyze the influence of the mismatch of
the two internal DACs in the ΣΔ ADC with 2-bit quantizer by sweeping the capacitances
of the three charging capacitors within ±2% randomly for each DAC. The
distributions of SNDR for the Sigma-Delta ADC using two DACs are shown in Fig. 9.
Fig. 9. SNDR distributions with mismatching of DACs. [Two histograms: number of runs versus
SNDR (dB), horizontal axes from 65 to 90 dB.]
We can see from the two figures that the mismatch of the DAC connected to the
first stage integrator (left figure) has much more influence on the system performance
than that of the other DAC (right figure). This can be explained by the fact that
the first DAC is connected directly to the input, so the feedback error due to the
DAC mismatch is magnified by the second stage integrator, which makes the
mismatch more severe. This analysis tells the designer
that more attention should be paid to making the first stage DAC linear.
B. Parameter Dimension Reduced Phase-Locked Loop Modeling
As an essential building block, PLLs are widely used in today's communication and
digital systems for purposes such as frequency synthesis, low-jitter clock generation,
data recovery and so on. Although the input and output signals of PLLs are in the
digital domain, most PLL implementations consist of both digital and analog components,
which makes them prone to process variations. In this section we
propose an efficient parameter-reduction modeling technique to capture process variations
and further achieve low-cost system performance evaluation using hierarchical
system simulation. The proposed method can not only be used for robust PLL design
under process variation, but also paves the way for effective built-in self-test circuit
design, as discussed in Chapter IV.
1. PLL Background
As illustrated in Fig. 10, a typical charge-pump PLL system consists of a phase and
frequency detector, a charge pump, a loop filter, a voltage-controlled oscillator (VCO) and
a frequency divider. The frequency of the output clock signal $F_{out}$ is $N$ times
that of the reference clock signal $F_{ref}$, where $N$ can be an integer or a fractional
number. The PLL design options include VCO topologies and component sizes, filter
characteristics, the charge current in the charge pump and so on. The metrics of PLL
systems usually include acquisition/lock-in time, output jitter, system power, total
area, etc. Capturing these performances is of great importance in PLL design.
Fig. 10. Block diagram of charge-pump PLL.
Due to the mixed-signal nature, the modeling of PLL system is quite complex
and costly. For example, a long transient simulation (in the order of hours or days) is
needed to obtain the lock-in time behavior of PLL, which is one of the most impor-
tant performance metrics for a PLL. The diﬃculty of system performance evaluation
can be addressed by adopting a bottom-up modeling and simulation strategy. The
performances of analog building blocks can be evaluated and optimized without too
much cost. When the behaviors of analog building blocks are extracted, these building
blocks can be mapped to Verilog-A models for fast system level evaluation [15]. By
using this approach we can avoid the scalability issue associated with time consuming
transistor-level simulations.
When process variations are considered, the situation becomes more sophisti-
cated. The large number of process variables and the correlations between diﬀerent
building blocks introduce more uncertainties for PLL performance under process
variations. In order to utilize the hierarchical simulation method while taking statistical
performance distributions into consideration, we propose an efficient macromodeling
method to handle this difficulty. The key aspect of the proposed macromodeling
technique is the extraction of parameterized behavioral models that can truthfully map
the device-level variabilities to variabilities at the system level, so that the inﬂuence
of fabrication stage variations can be propagated to the PLL system performances.
Parameterization can be done for each building block model as follows. First,
multiple behavioral model extractions are conducted at multiple parameter corners,
possibly following a particular design-of-experiments (DOE) plan [14]. Then, a pa-
rameterized behavioral model is constructed by performing nonlinear regression over
the models extracted at diﬀerent corners. This detailed parametric modeling step
is advantageous since it systematically maps the device-level parametric variations
to each of the behavioral models. However, diﬃculties arise when the number of
parametric variations is large, which leads to prohibitively high parametric model
extraction cost. We address this challenge by applying design-speciﬁc parameter di-
mension reduction techniques as described in the following section.
2. Hierarchical PLL Modeling
In this section we ﬁrst describe the nominal behavioral model extraction for each PLL
building block, then we discuss how a parameterized model can be constructed in the
next section.
a. Voltage Controlled Oscillator
The voltage controlled oscillator (VCO) is the core component of a PLL. The two
mainstream types of VCOs are LC-tank oscillators and ring oscillators. A 5-stage ring
oscillator and a VCO behavioral model are shown in Fig. 11. In this VCO model, the
dynamic and static characteristics of the voltage to frequency transfer are modeled
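separately first and then combined to form the complete model.
[Figure: schematic of the ring oscillator and the VCO macromodel, in which the control voltage
Vcon passes through a delay element H(s) to give Vcon', which is then mapped to the output
frequency Freq.]
Fig. 11. Schematic of a ring oscillator and VCO macromodel.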
As shown in the figure, the static VCO characteristic can be written as
$F_{out} = f(V'_{con})$, where $F_{out}$ is the output frequency, $V'_{con}$ is the delayed
control voltage, and $f(\cdot)$ is a nonlinear mapping relating the voltage with the
frequency. $f(\cdot)$ can be further represented by an $n$-th order polynomial function
$$ F_{out} = a_0 + a_1 \cdot V'_{con} + a_2 \cdot (V'_{con})^2 + \cdots + a_n \cdot (V'_{con})^n, \tag{2.6} $$
where $a_0, a_1, \cdots, a_n$ are the coefficients of the polynomial. To generate the above
polynomial, multiple VCO steady-state simulations are conducted at different control
voltage levels and a nonlinear regression is performed using the collected simulation
data. The dynamic behavior of the VCO is modeled by adding a delay element that
produces a delayed version of the control voltage, $V'_{con}$. The delay element can be
expressed using a linear transfer function $H(s)$ (e.g. a second-order RC network
consisting of two R's and two C's).
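The two-part VCO macromodel can be sketched in a few lines. This is a simplified illustration, not the Verilog-A model used in this work: the delay element is reduced to a first-order low-pass $H(s) = 1/(1 + s\tau)$ discretized with backward Euler rather than the second-order RC network mentioned above, and the polynomial coefficients, time step and time constant are hypothetical.

```python
def vco_static_freq(v, coeffs):
    """Evaluate F_out = a0 + a1*v + ... + an*v**n with Horner's rule."""
    f = 0.0
    for a in reversed(coeffs):
        f = f * v + a
    return f

def delayed_control(v_con_samples, dt, tau):
    """First-order low-pass stand-in for the delay element H(s) = 1/(1 + s*tau),
    discretized with backward Euler (an assumed simplification)."""
    k = dt / (tau + dt)
    v_delayed, out = 0.0, []
    for v in v_con_samples:
        v_delayed += k * (v - v_delayed)
        out.append(v_delayed)
    return out

# Hypothetical fit: 100 MHz at 0 V, 400 MHz/V slope, mild curvature.
coeffs = [100e6, 400e6, -50e6]
v_ctrl = [1.0] * 200                       # control-voltage step to 1 V
v_del = delayed_control(v_ctrl, dt=1e-9, tau=10e-9)
freqs = [vco_static_freq(v, coeffs) for v in v_del]
```

The output frequency settles toward the static value for the new control voltage with the time constant of the delay element, which is the dynamic behavior the delayed control voltage is meant to reproduce.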
b. Charge Pump
The schematic of a widely-used charge pump is shown in Fig. 12. The control signals
of the two switches come from the outputs of the phase and frequency detector. The
currents through $M_1$ and $M_2$ can be turned on and off to provide the desired charge-up
or charge-down currents.
Fig. 12. Schematic of charge-pump.
The existing charge pump macromodels are very simplistic. Usually, both the
charge-up and charge-down currents are modeled as constant values [16, 17]. A constant
mismatch between the two currents may also be considered [15]. However, this
simple approach is not sufficient to model the behavior of the charge pump accurately.
As indicated in Fig. 12, the current sources are implemented using transistors, so
the actual output currents vary according to the voltages across these MOSFETs
($M_1$ and $M_2$). Therefore, the dependency of the charge-up and charge-down currents on
$V_{con}$ must be considered.
In our charge pump model, for each output current, the current vs. $V_{con}$
characteristic is divided into two regions. When the output voltage $V_{con}$ is close to the
supply voltage, $M_1$ is biased in the triode region. The charge-up current $I_{up}$ in
the triode region (shaded in Fig. 12) is given as
$$ I_{up} = \mu_p C_{ox} \frac{W}{L}\left[(V_{gs} - V_{thp})V_{ds} - 0.5 V_{ds}^2\right], \qquad V_{ds} = V_{dd} - V_{on} - V_{con}, \tag{2.7} $$
where $V_{dd}$ is the supply voltage, $V_{on}$ is the on-voltage across the switch, $V_{gs}$ is the
gate-source voltage, $\mu_p$ is the mobility, $C_{ox}$ is the oxide capacitance, $W$ is the width and
$L$ is the length of $M_1$. We can see from Eqn. 2.7 that the charge-up current is dependent
on the output voltage $V_{con}$. We use a polynomial to explicitly model such voltage
dependency
$$ I_{up} = b_0 + b_1 \cdot V_{con} + b_2 \cdot V_{con}^2 + b_3 \cdot V_{con}^3, \tag{2.8} $$
where the $b_i$'s are the polynomial coefficients. Similarly, the charge-down current has a
strong $V_{con}$ dependency when $V_{con}$ is low. This voltage dependency is modeled in a
similar fashion. When $M_1$ and $M_2$ operate in the saturation region, they act as part of
the current mirrors. In this case, constant output current values are assumed, while
the possible mismatch between the two is considered in our Verilog-A models.
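The two-region charge-up current model can be sketched as a piecewise function. The region boundary, saturation current and polynomial coefficients below are hypothetical placeholders chosen so that the two regions meet continuously; in practice they would be fitted to transistor-level data.

```python
def charge_up_current(v_con, v_triode_start, i_sat, b):
    """Charge-up current: constant in saturation, cubic polynomial (Eqn. 2.8)
    once v_con rises above v_triode_start and M1 enters the triode region."""
    if v_con < v_triode_start:
        return i_sat
    b0, b1, b2, b3 = b
    return b0 + b1 * v_con + b2 * v_con**2 + b3 * v_con**3

# Hypothetical values: 100 uA in saturation, linear roll-off to zero at a 1.5 V supply.
I_SAT = 100e-6
V_TRIODE = 1.2
B = (5e-4, -1e-3 / 3, 0.0, 0.0)
i_mid = charge_up_current(0.6, V_TRIODE, I_SAT, B)    # saturation region
i_edge = charge_up_current(1.2, V_TRIODE, I_SAT, B)   # start of the triode region
i_top = charge_up_current(1.5, V_TRIODE, I_SAT, B)    # output at the supply rail
```

The charge-down current would use a mirrored version of the same function, with the polynomial region at low $V_{con}$.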
c. Other PLL Circuit Blocks
The phase detector and the frequency divider are digital circuits so that they are more
amenable to behavioral modeling. The two key parameters of the phase detector and
the frequency divider are the output signal delay and the transition time, which are
easy to extract from transistor-level simulation. The loop ﬁlters are usually comprised
of passive RC elements, which can be directly modeled using Verilog-A. The complete
PLL macromodel generation flow is shown in Fig. 13.
Fig. 13. PLL macromodel generation flow.
3. Eﬃcient parametric-reduction PLL Modeling
When process variations are considered, parameterization can be done for each build-
ing block model as follows. First, multiple behavioral model extractions are conducted
at multiple parameter corners, possibly following a particular design-of-experiments
plan. Then, a parameterized behavioral model is constructed by performing nonlinear
regression over the models extracted at diﬀerent corners. This detailed parametric
modeling step is advantageous since it systematically maps the device-level paramet-
ric variations to each of the behavioral models. However, diﬃculties arise when the
number of parametric variations is large, which leads to a prohibitively high para-
metric model extraction cost.
The key parametric variations for a single transistor may include variations in
mobility $\mu$, gate oxide thickness $T_{ox}$, threshold voltage $V_{th}$, effective length
$L_{eff}$ and so on [18].
The consideration of all possible sources of variation in transistors and interconnects
can easily lead to an explosion of the parameter space, rendering the parametric
modeling infeasible. Although the widely used principal component analysis (PCA) [19] can
be adopted to perform parameter dimension reduction, its effectiveness may be rather
limited, since parameter reduction is achieved by considering only the statistics of the
controlling parameters while neglecting the important correspondence between these
parameters and the circuit performances of interest. As such, the extent to which
the parameter reduction can be achieved is not sufficient for our analog macromodeling
problems. To address this difficulty, a more powerful design-specific dimension
reduction technique, which is based on reduced rank regression (RRR), is developed.
This new technique considers the crucial structural information imposed by the design
and has been shown to be quite effective for parametric interconnect modeling
problems [20, 21].
a. RRR Based Parameter Dimension Reduction
Suppose we have a set of $n$ process variations, $X$, and a set of $N$ performances, $Y$.
The objective is to identify a smaller set of new variables $Z$, based on $X$, which are
statistically significant to the performances of interest, $Y$. Without loss of generality,
let us assume $Y$ depends nonlinearly on $X$ through a quadratic model
$$Y = f(X) \approx [\beta_1 \;\; \beta_2] \begin{bmatrix} X \\ X \otimes X \end{bmatrix} \tag{2.9}$$
where $\beta_1$ and $\beta_2$ are the first and second order coefficients, and $X \otimes X$ represents the
quadratic terms of $X$. The combination of the linear and quadratic terms of $X$ is
then defined as a new predictor vector $\hat{X} = [X^T \;\; (X \otimes X)^T]^T$. Now the quadratic
model in (2.9) can be cast into a linear model in $\hat{X}$ as $Y = A\hat{X} + \varepsilon$. To identify the
redundancy in $X$ and facilitate parameter reduction, we seek a reduced rank regression
model in the form
$$Y = A_R B_R \hat{X} + \varepsilon, \tag{2.10}$$
where $A_R$ and $B_R$ have a rank of $R$ ($R < n$), and $B_R$ has only $R$ rows. We denote
the covariance matrix of $\hat{X}$ as $Cov(\hat{X}) = \Sigma_{\hat{X}\hat{X}}$, and the covariance matrix between $\hat{X}$
and $Y$ as $Cov(Y, \hat{X}) = \Sigma_{Y\hat{X}}$. It can be shown that an optimal reduced rank model
(in the sense of mean square error) is given as [21]
$$A_R = U, \quad B_R = U^T \Sigma_{Y\hat{X}} \Sigma_{\hat{X}\hat{X}}^{-1}, \tag{2.11}$$
where $U$ contains the $R$ ($R < n$) normalized eigenvectors corresponding to the $R$ largest
eigenvalues of the matrix $D = \Sigma_{Y\hat{X}} \Sigma_{\hat{X}\hat{X}}^{-1} \Sigma_{\hat{X}Y}$. It is important to note that a successful
construction of the above reduced rank model indicates that only a smaller set of
$R$ new parameters $Z = B_R\hat{X}$ are critical to $Y$ in a statistical sense, hence facilitating
the desired parameter reduction.
It should be noted that the reduced rank regression is only employed as a means
for parameter reduction so as to reduce the complexity of the subsequent parameterized
macromodeling step. Hence, $Y$ in the above equations does not have to be the
true performances of interest and can be just some circuit responses that are highly
correlated to the performances. This flexibility can be exploited to collect
$\Sigma_{Y\hat{X}}$ more efficiently through Monte-Carlo sampling if $Y$ is easier to obtain than the true
performances in simulation.
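As a rough illustration of Eqns 2.9 to 2.11, the following NumPy sketch estimates the covariance matrices from Monte-Carlo samples and extracts $B_R$. The function names `quadratic_predictor` and `rrr_reduce`, the synthetic data, the use of only the unique pairwise products in place of the full $X \otimes X$, and the pseudo-inverse are assumptions made for this example and are not taken from the original flow.

```python
import numpy as np

def quadratic_predictor(X):
    """Stack linear terms and unique pairwise products (stand-in for [X; X (x) X])."""
    i, j = np.triu_indices(X.shape[1])
    return np.hstack([X, X[:, i] * X[:, j]])

def rrr_reduce(X_hat, Y, R):
    """Reduced rank regression following Eqn 2.11.

    X_hat: (samples, p) predictors, Y: (samples, N) responses.
    Returns A_R with shape (N, R) and B_R with shape (R, p).
    """
    Xc = X_hat - X_hat.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    dof = Xc.shape[0] - 1
    S_xx = Xc.T @ Xc / dof                 # sample Cov(X_hat)
    S_yx = Yc.T @ Xc / dof                 # sample Cov(Y, X_hat)
    S_xx_inv = np.linalg.pinv(S_xx)        # pseudo-inverse in case S_xx is singular
    D = S_yx @ S_xx_inv @ S_yx.T           # N x N matrix D of the text
    eigval, eigvec = np.linalg.eigh(D)     # eigenvalues in ascending order
    U = eigvec[:, np.argsort(eigval)[::-1][:R]]   # R leading eigenvectors
    return U, U.T @ S_yx @ S_xx_inv

# Synthetic example: three responses driven by one hidden combination of X.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
w = np.array([1.0, -2.0, 0.5, 0.0])
a = np.array([1.0, 0.3, -0.7])
Y = np.outer(X @ w, a)

X_hat = quadratic_predictor(X)
A_R, B_R = rrr_reduce(X_hat, Y, R=1)
Z = (X_hat - X_hat.mean(axis=0)) @ B_R.T   # reduced parameter, one per sample
```

In this toy case a single new parameter $Z$ reproduces all three responses, which is the situation in which the subsequent regression in terms of $Z$ becomes cheap.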
b. Parameterized Macromodeling Using Parameter Reduction
The parameterized PLL macromodel extraction ﬂow is shown in Fig. 13. As men-
tioned in Eqn. 2.6 and 2.8, each behavioral model parameter is expressed as a poly-
nomial in the underlying device-level variations, such as
$$\beta = f(V_{th,1}, L_{eff,1}, T_{ox,1}, \cdots, V_{th,n}, L_{eff,n}, T_{ox,n}), \tag{2.12}$$
where $V_{th,i}$, $L_{eff,i}$, $T_{ox,i}$, etc. represent the parameters of the $i$-th transistor and $f(\cdot)$ is a
nonlinear polynomial function. $f(\cdot)$ is very difficult to obtain if the number of parameters
is large. Hence, RRR-based parameter reduction is applied, which leads to a set of
$R$ new parameters $Z$ that are the most important variations for the given circuit
performances of interest. If $R$ is small, then a new parameterized model in terms of
$Z$ can be easily obtained through conventional nonlinear regression
$$\beta = \hat{f}(Z_1, Z_2, \cdots, Z_R). \tag{2.13}$$
In addition to reducing the cost of parameterized macromodeling, parameter di-
mension reduction also leads to more eﬃcient statistical simulation of the complete
PLL. This is because instead of analyzing the design performance variations over
the original high-dimensional parameter space, statistical simulation can be now per-
formed more eﬃciently in a much lower dimensional space that carries the essential
information of the design variability. The application of the parameter reduction can
also improve the efficiency of DFT schemes, as will be described in Chapter IV.
C. General Block Modeling Using Kriging Models
We have presented efficient modeling techniques for Sigma-Delta ADCs and
charge-pump phase-locked loops. The ability to accurately model arbitrary analog
circuits is important and necessary for automatic circuit performance optimization.
The complex nature of analog/mixed-signal systems, however, makes this task difficult
and costly. In this section, we adopt a geostatistics-motivated approach (i.e.,
the Kriging model) for efficient generation of performance models that consider both
design and process variables for analog circuits [6]. The Kriging model is attractive for our
yield-aware analog performance modeling problem because of two appealing features.
First, the Kriging model enables robust regression of the global trends of the complex mapping
between design parameters and resulting performances. The capability to capture
global trends of the performance space is very beneficial, as it helps avoid being trapped
in the local optima that may occur in optimization-based approaches. Second, in
addition to providing performance predictions, the Kriging model also provides an
uncertainty level for each prediction in the form of a mean-square error (MSE). The ability
to provide such an assessment of prediction accuracy immediately allows an iterative
update scheme wherein new data can be selectively added in the regions of
high uncertainty to improve the model accuracy. We further utilize the obtained
performance models of building blocks to achieve efficient hierarchical system
optimization in the next chapter.
1. Mathematical Formulation
The Kriging model was first proposed by Matheron in 1963 [6] for geostatistics problems.
Unlike physical experiments, computer simulation does not possess any random errors.
However, under the framework of Kriging, a deterministic circuit performance
is treated as a realization of a stochastic process, $Y(x)$, where $x$ is
an $n$-dimensional vector containing design parameters (and/or process variations).
This fundamental treatment provides a statistical framework for
deterministic function approximation and quantification of approximation uncertainty
[22]. The stochastic process $Y(x)$ is cast into a regression model as
$$Y(x) = Z(x) + \beta^T \mathbf{f}(x) \tag{2.14}$$
where $\mathbf{f}(x) = [f_1(x) \;\; f_2(x) \;\; \cdots \;\; f_n(x)]^T$ is a vector of predefined regression functions and
$\beta = [\beta_1 \;\; \beta_2 \;\; \cdots \;\; \beta_n]^T$ is the vector of unknown regression coefficients. The term $\beta^T \mathbf{f}(x)$
represents the global trend of $Y(x)$ across the input space, and $Z(x)$ is a random
process used to capture the systematic departure of the performance from the
global regression portion. $Z(x)$ is assumed to have zero mean, variance $\sigma^2$ and a correlation function
$$Corr(Z(x_i), Z(x_j)) = R(x_i, x_j) \tag{2.15}$$
for $Z(x_i)$ and $Z(x_j)$ at two input vectors $x_i$ and $x_j$. The correlation function in
Eqn 2.15 is often parameterized and can be chosen to be a product of stationary
one-dimensional functions
$$R(x_i, x_j) = \prod_{k=1}^{n} e^{-\theta_k |x_{i,k} - x_{j,k}|^{p_k}} \tag{2.16}$$
where $\theta = [\theta_1 \;\; \theta_2 \;\; \cdots \;\; \theta_n]^T$ and $p = [p_1 \;\; p_2 \;\; \cdots \;\; p_n]^T$ are unknown coefficients with
constraints $\theta_k \ge 0$ and $0 \le p_k \le 2$.
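In order to apply Eqn 2.14 for performance modeling, we need to estimate the
unknown parameters $\sigma^2$, $\beta$, $p$ and $\theta$. Suppose $m$ sets of simulations are performed;
we have $X_s = [x_{s,1}, x_{s,2}, \cdots, x_{s,m}]^T$ as the input vectors (which can be design variables or
process variables) and $Y_s = [y_{s,1}, y_{s,2}, \cdots, y_{s,m}]^T$ as the corresponding performances.
An $m \times m$ matrix $R$ is defined by $R_{i,j} = R(x_{s,i}, x_{s,j})$. The goal of the Kriging model is to
find a predictor of the system performance at a new point $x_{new}$.
If we denote $r(x_{new}) = [R(x_{new}, x_{s,1}) \;\; R(x_{new}, x_{s,2}) \;\; \cdots \;\; R(x_{new}, x_{s,m})]^T$ and
$F = [\mathbf{f}(x_{s,1}) \;\; \mathbf{f}(x_{s,2}) \;\; \cdots \;\; \mathbf{f}(x_{s,m})]^T$, the predictor $\hat{Y}(x_{new})$ can be written as [23]
$$\hat{Y}(x_{new}) = \mathbf{f}(x_{new})^T \hat{\beta} + r^T R^{-1} \left( Y_s - F\hat{\beta} \right) \tag{2.17}$$
where
$$\hat{\beta} = \left[ F^T R^{-1} F \right]^{-1} F^T R^{-1} Y_s. \tag{2.18}$$
The MSE between the predictor $\hat{Y}_0 = \hat{Y}(x_{new})$ and the real performance value can be
obtained as
$$MSE[\hat{Y}_0] = \hat{\sigma}^2 \left( 1 - [\mathbf{f}^T \;\; r^T] \begin{bmatrix} 0 & F^T \\ F & R \end{bmatrix}^{-1} \begin{bmatrix} \mathbf{f}(x_{new}) \\ r(x_{new}) \end{bmatrix} \right) \tag{2.19}$$
where $\hat{\sigma}^2$ is given as
$$\hat{\sigma}^2 = \frac{1}{m} \left[ Y_s - F\hat{\beta} \right]^T R^{-1} \left[ Y_s - F\hat{\beta} \right]. \tag{2.20}$$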
To obtain the performance prediction, maximum likelihood estimation (MLE) of the
correlation parameters $\theta$ and $p$ as well as $\beta$ and $\sigma^2$ is performed. The likelihood
function can be written as
$$L(\theta, p, \beta, \sigma^2) = \frac{1}{(2\pi)^{m/2}} \cdot \frac{1}{(\sigma^2)^{m/2}} \cdot \frac{1}{(\det(R))^{1/2}} \cdot \exp\left( -\frac{1}{2\sigma^2} [Y_s - F\beta]^T R^{-1} [Y_s - F\beta] \right). \tag{2.21}$$
Substituting Eqns 2.18 and 2.20 into Eqn 2.21 leads to the log of the likelihood
function
$$L(\theta, p) = -\frac{1}{2} \left[ m \log \hat{\sigma}^2 + \log(\det(R)) \right]. \tag{2.22}$$
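The step from Eqn 2.21 to Eqn 2.22 can be spelled out as follows. This is the standard concentrated-likelihood argument, written here as a sketch for the reader; the additive constant that appears is dropped in Eqn 2.22 because it does not depend on $\theta$ or $p$.

```latex
\begin{align*}
% Logarithm of Eqn 2.21
\log L(\theta, p, \beta, \sigma^2)
  &= -\frac{m}{2}\log(2\pi) - \frac{m}{2}\log\sigma^2 - \frac{1}{2}\log\det(R)
     - \frac{1}{2\sigma^2}\,[Y_s - F\beta]^T R^{-1} [Y_s - F\beta] \\
% With beta = beta-hat (Eqn 2.18) and sigma^2 = sigma-hat^2 (Eqn 2.20),
% the quadratic form equals m * sigma-hat^2, so the last term becomes -m/2
\log L(\theta, p)
  &= -\frac{1}{2}\left[\, m\log\hat{\sigma}^2 + \log\det(R) \,\right]
     - \frac{m}{2}\left(1 + \log(2\pi)\right)
\end{align*}
```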
In a practical implementation, we can first select a proper value for $p$ (e.g., 2)
and maximize Eqn 2.22 numerically to find $\hat{\theta}$. Then, Eqn 2.18 and Eqn 2.20 are used to
compute $\hat{\beta}$ and $\hat{\sigma}^2$. Finally, the estimated performance at $x_{new}$ is given by Eqn 2.17
and the MSE of $\hat{Y}(x_{new})$ is given by Eqn 2.19, which can be used to evaluate the
prediction accuracy. A relative prediction uncertainty is more meaningful:
$$Err = \left( MSE[\hat{Y}_0] \right)^{1/2} / \hat{Y}_0. \tag{2.23}$$
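The procedure just described can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in this work: the trend is a constant ($f(x) = 1$), $p$ is fixed at 2, $\theta$ is chosen by a coarse grid search instead of a numerical optimizer, a small nugget is added to $R$ for numerical conditioning, and the names `corr`, `fit`, `predict` and the test function `truth` are hypothetical.

```python
import numpy as np

def corr(A, B, theta, p=2.0):
    """Eqn 2.16 evaluated between every row of A and every row of B."""
    d = np.abs(A[:, None, :] - B[None, :, :]) ** p
    return np.exp(-(d * theta).sum(axis=2))

def fit(Xs, Ys, theta, nugget=1e-8):
    """Constant-trend Kriging for a given theta (nugget is an assumption)."""
    m = len(Xs)
    R = corr(Xs, Xs, theta) + nugget * np.eye(m)
    F = np.ones(m)
    beta = F @ np.linalg.solve(R, Ys) / (F @ np.linalg.solve(R, F))   # Eqn 2.18
    resid = Ys - F * beta
    q = np.linalg.solve(R, resid)
    sigma2 = resid @ q / m                                            # Eqn 2.20
    loglik = -0.5 * (m * np.log(sigma2) + np.linalg.slogdet(R)[1])    # Eqn 2.22
    return dict(Xs=Xs, theta=theta, R=R, F=F, beta=beta, q=q,
                sigma2=sigma2, loglik=loglik)

def predict(model, x_new):
    """Return the prediction, its MSE and the relative error at x_new."""
    r = corr(x_new[None, :], model["Xs"], model["theta"])[0]
    y_hat = model["beta"] + r @ model["q"]                            # Eqn 2.17
    m = len(r)
    M = np.zeros((m + 1, m + 1))                                      # [[0, F^T], [F, R]]
    M[0, 1:] = model["F"]
    M[1:, 0] = model["F"]
    M[1:, 1:] = model["R"]
    v = np.concatenate(([1.0], r))                                    # [f; r] with f = 1
    mse = model["sigma2"] * (1.0 - v @ np.linalg.solve(M, v))         # Eqn 2.19
    mse = max(mse, 0.0)                                               # clip round-off
    return y_hat, mse, np.sqrt(mse) / abs(y_hat)                      # Eqn 2.23

# Hypothetical 1-D "performance" standing in for a circuit simulation.
truth = lambda x: 2.0 + np.sin(3.0 * x[..., 0])
Xs = np.linspace(0.0, 1.0, 8)[:, None]
Ys = truth(Xs)

# Grid search over theta: keep the model with the largest Eqn 2.22 value.
grid = np.logspace(0, 2, 20)
best = max((fit(Xs, Ys, np.array([t])) for t in grid),
           key=lambda mdl: mdl["loglik"])
y_hat, mse, err = predict(best, np.array([0.37]))
```

The absolute value in the last line of `predict` is a small deviation from Eqn 2.23 that keeps the relative error non-negative when the predicted performance is negative.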
We summarize the application of the Kriging model for circuit performance modeling
in Algorithm 1. Here the goal is to predict the system performance $P$ at any design
point $D$ in the design space.
Algorithm 1 Generation of Kriging Performance Model
Input: Circuit netlist, target performance $P$, design variables $D = [d_1, \cdots, d_n]$, max
relative error tolerance $Err_{max}$, number of test points $n_T$.
Output: The Kriging model $P = K(D)$.
1: Evenly sample the design space using $m$ data points $D_1, \cdots, D_m$ and find the
corresponding performances $P_1, \cdots, P_m$ via SPICE simulation.
2: Construct an initial Kriging model $P = K(D)$ by maximizing Eqn 2.22.
3: for $i = 1$ to $n_T$ do
4:   Randomly generate a design sample $D_{T,i}$ and evaluate the relative error $Err_i$ using
Eqn 2.23 for $\hat{P}(D_{T,i})$;
5:   if $Err_i > Err_{max}$ then
6:     Perform a SPICE simulation at $D_{T,i}$ to compute the exact performance $P_{D_{T,i}}$.
7:     Include $\{D_{T,i}, P_{D_{T,i}}\}$ as an additional data point and update the Kriging model.
8:   end if
9: end for
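The loop of Algorithm 1 can be sketched as below. Several simplifications are assumptions of this example rather than part of the algorithm: the `Kriging` class uses a constant trend and a fixed Gaussian correlation parameter `theta` instead of maximizing Eqn 2.22, the initial samples are drawn at random rather than evenly, the MSE is computed with the algebraically expanded form of Eqn 2.19 for a constant trend, and `simulate` is a cheap analytic stand-in for a SPICE run.

```python
import numpy as np

class Kriging:
    """Constant-trend Kriging with a fixed Gaussian correlation parameter."""
    def __init__(self, theta, nugget=1e-8):
        self.theta, self.nugget = theta, nugget

    def _corr(self, A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-self.theta * d2)

    def fit(self, X, y):
        self.X = X
        m = len(X)
        self.R = self._corr(X, X) + self.nugget * np.eye(m)
        one = np.ones(m)
        self.Ri_one = np.linalg.solve(self.R, one)
        self.denom = one @ self.Ri_one
        self.beta = self.Ri_one @ y / self.denom           # Eqn 2.18 with f = 1
        resid = y - self.beta
        self.q = np.linalg.solve(self.R, resid)
        self.sigma2 = resid @ self.q / m                   # Eqn 2.20
        return self

    def predict(self, x):
        r = self._corr(x[None, :], self.X)[0]
        y_hat = self.beta + r @ self.q                     # Eqn 2.17
        u = 1.0 - self.Ri_one @ r
        mse = self.sigma2 * (1.0 - r @ np.linalg.solve(self.R, r)
                             + u ** 2 / self.denom)        # Eqn 2.19, expanded
        return y_hat, max(mse, 0.0)

def build_model(simulate, lo, hi, m_init, n_test, err_max, theta, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(lo, hi, size=(m_init, len(lo)))        # step 1 (random here)
    y = np.array([simulate(x) for x in X])
    model = Kriging(theta).fit(X, y)                       # step 2 (theta fixed)
    added = 0
    for _ in range(n_test):                                # steps 3 to 9
        x = rng.uniform(lo, hi)
        y_hat, mse = model.predict(x)
        if np.sqrt(mse) / abs(y_hat) > err_max:            # Eqn 2.23
            X = np.vstack([X, x])
            y = np.append(y, simulate(x))                  # extra "SPICE" run
            model.fit(X, y)
            added += 1
    return model, added

# Hypothetical two-variable performance used in place of a circuit simulation.
simulate = lambda x: 2.0 + np.sin(3.0 * x[0]) * np.cos(2.0 * x[1])
lo, hi = np.array([0.0, 0.0]), np.array([1.0, 1.0])
model, added = build_model(simulate, lo, hi, m_init=10, n_test=50,
                           err_max=0.01, theta=5.0)
```

Only the test points whose predicted relative error exceeds `err_max` trigger a new simulation, so `added` counts the extra simulator calls spent on refining the model.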
The above algorithm iteratively constructs a Kriging model for a single performance.
The model is updated in the areas where it does not possess
enough accuracy, which gives rise to the iterative model updating feature. If more
than one performance is of interest, multiple Kriging models can be extracted individually.
With the help of Kriging performance models, we can achieve yield-aware
hierarchical analog/mixed-signal system optimization and develop efficient
built-in self-test circuits, which will be discussed in detail in the following chapters.
2. Circuit Examples
In order to demonstrate the accuracy and efficiency of the proposed Kriging performance
modeling methodology, we use two test circuits, a ring oscillator and
an LC oscillator, both implemented in a 90-nm CMOS technology.
a. Ring Oscillator
The first circuit under analysis is a typical five-stage ring oscillator similar to that in
Fig. 11. The total number of transistors in this circuit is 23. We consider the sizes
(width $W$, with length set to the minimum) of all the transistors under symmetry
constraints, that is, the inverters should be identical across different stages. Therefore,
there are 7 design parameters in our design example. The threshold voltages $V_{th}$ of all
23 transistors are assumed to follow Gaussian distributions to model the process
variations. The circuit performances considered are power, maximum frequency and
VCO gain at the center operating frequency.
We set the following design constraints: each transistor width can be varied within
±40% of the initial design value and must not fall below the minimal transistor
width. The variation of $V_{th}$ is set to $3\sigma = 15\%$ to account for the process variations.
The initial Kriging model is built with 50 uniform samples in the design space.
After the generation of the Kriging model, we compare the predicted performance
and the corresponding predicted relative MSE with the SPICE simulation results for four
randomly selected points, as shown in Table IV. $Y_p$ is the predicted performance, $Y_a$
is the result measured by SPICE, $MSE$ is the predicted relative error and $Err$ is
the actual relative error of $Y_p$ w.r.t. $Y_a$. From Table IV, we can clearly see that the
predicted performances match well with the actual measured performance values.
Table IV. Kriging model accuracy for ring oscillator.
$Freq_{max}$ $Y_p$ (GHz) $Y_a$ (GHz) $MSE$ (%) $Err$ (%)
Point 1 2.768 2.782 0.53 -0.53
Point 2 2.462 2.440 0.36 0.90
Point 3 2.164 2.151 0.41 0.60
Point 4 1.878 1.861 0.66 0.92
Power $Y_p$ ($\mu$W) $Y_a$ ($\mu$W) $MSE$ (%) $Err$ (%)
Point 1 51.98 51.91 0.29 0.14
Point 2 42.50 42.72 0.41 -0.50
Point 3 39.85 39.89 0.23 -0.099
Point 4 36.54 36.79 0.48 -0.69
Gain $Y_p$ (G/V) $Y_a$ (G/V) $MSE$ (%) $Err$ (%)
Point 1 5.607 5.671 0.54 -1.14
Point 2 4.823 4.796 0.47 0.56
Point 3 4.489 4.517 0.70 -0.63
Point 4 4.100 4.105 1.31 -0.11
b. LC Oscillator
The second circuit example is an LC oscillator, as shown in Fig. 14. There are six
transistors in total, including two acting as a varactor. We use $R_1$ and $R_2$ to model the
parasitic resistances of the two inductors $L_1$ and $L_2$, and $C_{f1}$ and $C_{f2}$ to model the
parasitic capacitances at the two output nodes.
Fig. 14. LC oscillator schematic.
The widths $W$ of all the transistors are considered as design parameters, with symmetry
constraints between $M_3$ and $M_4$, and between $M_5$ and $M_6$. The biasing current
$I_b$ is also selected as a design parameter. In terms of process variation modeling, we
consider the threshold voltage $V_{th}$ of each transistor, the mismatch between inductors
$L_1$ and $L_2$, and the parasitic capacitances and resistances as sources of variability.
The ranges of the transistor widths and the bias current are set to be within ±30%
of the initial design, and transistor widths should also satisfy the minimal width
requirement. The variations of $V_{th}$ and the parasitic parameters are set to $3\sigma = 10\%$.
We consider the VCO gain and the power for this LC oscillator and build Kriging
models to predict the system performance. The accuracy of the Kriging model
$\vec{K}(\vec{D}, \vec{V})$ is demonstrated in Table V, in which we pick four random combinations
of the design parameters $\vec{D}$ and process variations $\vec{V}$.
Table V. Kriging model accuracy for LC oscillator.
Power $Y_p$ ($\mu$W) $Y_a$ ($\mu$W) $MSE$ (%) $Err$ (%)
Point 1 55.25 55.17 0.76 0.14
Point 2 50.42 50.44 0.07 -0.03
Point 3 56.90 56.78 0.11 0.21
Point 4 50.18 50.15 0.13 0.07
Gain $Y_p$ (G/V) $Y_a$ (G/V) $MSE$ (%) $Err$ (%)
Point 1 1.107 1.106 0.83 0.09
Point 3 1.250 1.249 0.54 0.08
Point 2 1.316 1.294 0.77 1.67
Point 4 1.071 1.071 1.24 0.02
D. Summary
In this chapter, we propose an efficient lookup table (LUT) based modeling technique for
Sigma-Delta ADC performance evaluation. By incorporating the response surface modeling
technique, parameterized LUT models in terms of important underlying circuit
parameters are constructed. Our modeling framework can be employed to perform fast
nominal and statistical simulation of various Sigma-Delta ADC designs and provide a
basis for performance and robustness trade-oﬀ analysis. We also look into the model-
ing of charge-pump PLLs by presenting a macromodeling strategy with reduced-rank
regression to perform fast statistical PLL performance evaluations. The PLL mod-
eling frameworks are used to eﬃciently evaluate the PLL performance distributions
under process variation. A geostatistics motivated yield-aware modeling approach
is presented for general analog circuit performance modeling. The MSE metric pro-
vided by the Kriging model is instrumental in controlling modeling accuracy via
well-controlled model updates. Experimental results are presented to demonstrate
the effectiveness of the proposed performance modeling approaches. The proposed
performance modeling approaches are critical for analog/mixed-signal system
optimization and performance verification, as will be discussed in the following chapters.
CHAPTER III
YIELD-AWARE ANALOG CIRCUIT OPTIMIZATION
Optimization of large analog/mixed-signal systems is diﬃcult due to the costly sys-
tem performance evaluation procedures and the large design variable space. The
optimization task becomes even more challenging when process variations come into
consideration. In this chapter we utilize Kriging performance models to ﬁrst achieve
circuit block level performance trade-oﬀs and then use these models to facilitate yield-
aware hierarchical system optimization [24].
Design centering is a widely used yield enhancement approach that moves
the nominal design to increase the overlap between the process distribution and the
feasible performance space [25]. Although the idea is simple and straightforward,
the implementations of design centering techniques vary. One variant
uses a linear model with sensitivity analysis to form
an approximation of the overall feasibility region as a polytope [26]. Other implementations
treat the feasible performance region as an ellipsoid rather than a convex
polytope, so that the optimization target becomes moving the final design to the center
of the ellipsoid [27]. The major drawback of these simplifications is accuracy:
simple linear models capture the complex process variation behaviors
in modern CMOS technologies very poorly, so improved techniques are needed for
practical yield-aware analog optimization.
One of the two most popular commercial yield enhancement tools for analog circuit
design is NeoCircuit by Cadence Design Systems [28]. It employs Monte-Carlo simulations
to find the process parameter combinations that correspond to the worst
system performances; the tool then performs design space optimization to meet the
specifications at these process corners. The underlying theory is simple: if the design can
meet the specifications in the worst performance corners, it can meet the specifications in all
other process variation conditions, hence the design is optimized in the center. The
other analog optimization tool, WiCkeD developed by MunEDA, utilizes the worst-case
distance to guide the optimization search [29]. In that optimization framework,
the major difference from NeoCircuit is that the process corners used to guide the
yield optimization are obtained not from Monte-Carlo simulations but from the numerical
calculation of the worst-case distance [30].
There exist several drawbacks in these traditional design centering techniques
for analog yield optimization. First of all, whether model-based or corner-based,
yield analysis techniques can only maintain their accuracy around the specified local
design points. Once the design points are shifted during the system optimization,
process models or corners need to be updated at the new design points, hence no
optimization convergence can be guaranteed. Second, the yield analysis is rough in
these optimization frameworks. The accuracy of the yields calculated using process
space models depends heavily on the model accuracy and the shape of the process
distributions, while the corner-based approaches cannot specify the required yield levels.
Lastly, these traditional optimization methods lack the capability to address
the problems of large-scale analog circuit optimization.
In order to address the limitations of these traditional analog optimization ap-
proaches, we propose to use the accurate and eﬃcient global modeling technique, the
Kriging modeling method, as the key element in our yield-aware analog optimization
framework. The Kriging model bridges the design and process variable spaces and the
system performance spaces. The performance models are updated iteratively in the
whole design space to enhance the accuracy, so we can use the Kriging performance
models to search for the global optimal solutions. The statistical system performances
are analyzed using the device-level Monte-Carlo simulations which are evaluated us-
ing the eﬃcient partial Kriging models, so the yields calculated are accurate. The
proposed optimization framework keeps the structure of eﬃcient hierarchical system
performance analysis so the computation time of the performance evaluation within
each optimization iteration can be reduced signiﬁcantly. In this chapter we start with
the optimization of smaller scale analog circuits with the consideration of process
variations, then present the more challenging hierarchical yield-aware analog system
optimization framework.
A. Yield-aware Circuit Block Optimization
The mathematical forms of Kriging models have been presented in the previous chapter.
Suppose we denote the Kriging performance model as $\vec{K}(\vec{D}, \vec{V})$, where $\vec{D}$ is the
design parameter set and $\vec{V}$ represents the process variation information. Performances
of small-scale analog/mixed-signal circuits can be modeled accurately using
the Kriging model, so we can perform circuit optimization using $\vec{K}(\vec{D}, \vec{V})$ by searching
in the design space $\vec{D}$ while injecting process uncertainties using $\vec{V}$.
In this section, we first evaluate circuit performance trade-offs (pareto fronts)
without considering process variations using iterative search with nominal Kriging
performance models. The generated nominal building block pareto fronts serve as
the baseline to generate the yield-aware performance trade-offs. Then we use the
nominal pareto fronts as starting points to search for the performance trade-offs at
different yield levels, which achieves better convergence. The yield-aware optimization is
obtained by optimizing system cost functions consisting of all the system performances
at the specified yield levels. The statistical system performance distribution analysis
is achieved using partial Kriging models, which save about 60% to 70% of the computation
time when compared with brute-force use of Monte-Carlo simulation. We demonstrate
the efficiency and accuracy of the proposed yield-aware optimization framework using
design examples of two oscillators and one amplifier.
1. Pareto Front Background
In most circuits, diﬀerent performance objectives compete against each other and it
is infeasible to ﬁnd a design point to reach the best value for all performances at the
same time. The design task then becomes a multi-objective optimization problem. As
shown in Fig. 15, the pareto front consists of the optimal performance trade-offs between
different performances. In multi-objective optimization, performance $\mathbf{p}_a$ dominates
performance $\mathbf{p}_b$ (supposing a smaller value is better) when [31, 32]
$$\mathbf{p}_a \prec \mathbf{p}_b : \forall i\,(p_{ai} \le p_{bi}) \wedge \exists i\,(p_{ai} < p_{bi}), \quad i = 1, \cdots, n \tag{3.1}$$
where $p_{ai}$ and $p_{bi}$ are the $i$-th performances of interest, and there are $n$ performances
in total. A set of performances is considered pareto-optimal if it is not dominated
by any other set of performances.
Fig. 15. Illustration of pareto front.
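The dominance test of Eqn 3.1 translates directly into code. The sketch below, with the hypothetical helper names `dominates` and `pareto_front`, assumes that smaller values are better for every performance and uses a simple pairwise comparison, which is adequate for the small candidate sets used for illustration here.

```python
import numpy as np

def dominates(pa, pb):
    """True if pa dominates pb in the sense of Eqn 3.1 (smaller is better)."""
    pa, pb = np.asarray(pa, dtype=float), np.asarray(pb, dtype=float)
    return bool(np.all(pa <= pb) and np.any(pa < pb))

def pareto_front(P):
    """Indices of the rows of P that are not dominated by any other row."""
    P = np.asarray(P, dtype=float)
    return [i for i, p in enumerate(P)
            if not any(dominates(q, p) for j, q in enumerate(P) if j != i)]

# Five candidate designs with two performances each (made-up numbers).
P = [[1, 5], [2, 3], [3, 4], [4, 1], [5, 5]]
front = pareto_front(P)   # [3, 4] is dominated by [2, 3]; [5, 5] by [1, 5]
```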
In practice, it is impossible to obtain the entire pareto front in the closed form
and it has to be approximated by ﬁnding a number of points on the front. So the
multi-objective optimization problem is often transformed to multiple single-objective
optimization problems by assigning a suitable weight to each performance [31, 32].
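A minimal sketch of this weighting idea for two performances is given below. The function name `weighted_sum_front` and the candidate values are made up for illustration. Each weight defines one single-objective problem, and the minimizers collected over all weights approximate the front; note that a plain weighted sum can only reach points on the convex part of a pareto front.

```python
import numpy as np

def weighted_sum_front(P, n_weights=11):
    """Indices of candidates that minimize w*p1 + (1-w)*p2 for some weight w."""
    P = np.asarray(P, dtype=float)
    picked = set()
    for w in np.linspace(0.0, 1.0, n_weights):
        cost = w * P[:, 0] + (1.0 - w) * P[:, 1]   # one single-objective problem
        picked.add(int(np.argmin(cost)))
    return sorted(picked)

# Same style of made-up candidates: two performances, smaller is better.
P = [[1, 5], [2, 3], [3, 4], [4, 1], [5, 5]]
front = weighted_sum_front(P)
```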
2. Iterative Search for Pareto Fronts
With the deﬁnition in Eqn. 3.1, we start to build pareto fronts without consider-
ing process variations by employing nominal Kriging performance models. Although
Kriging model provides a general performance modeling machinery, its practical ap-
plication in circuit modeling must be facilitated by well controlling accuracy and
complexity. The key idea is to develop an iterative model accuracy reﬁning scheme so
that the pareto fronts generated represent the actual circuit performance trade-oﬀs.
Our strategy helps control the total number of transistor-level circuit performance
evaluations (SPICE simulations) by exploiting the prediction power of the Kriging model
and adopting incremental updates.
Fig. 16. Iterative pareto front generation.
1) As shown in Fig. 16, our nominal-case iterative pareto front extraction starts
from a coarse sampling of the design space $\vec{D}$ using a limited number of SPICE
simulations. This initial set of SPICE simulation data serves as a basis for constructing
an initial Kriging model as in Eqn 2.17 for predicting circuit performances.
2) In the following step, the complete design space is sampled uniformly to cap-
ture the global trend of design parameter to performance mapping through multiple
evaluations of the Kriging model. The Kriging model accuracy is improved by adding
additional SPICE simulation data when the relative MSE in Eqn 2.19 exceeds a user-
deﬁned threshold. In our experiments, it has been observed the cost of these selective
SPICE simulations is rather mild.
3) With a conditionally updated Kriging model and the set of Kriging model
evaluations in the previous step, an initial pareto front is extracted by sorting the
evaluated performance values. The cost due to sorting is usually insigniﬁcant.
4) As the initial pareto front only captures the global trends of design parameters
to circuit performances mappings, a local design space search is conducted around
the initial pareto front to reﬁne the initial pareto front. A number of additional
design points in the neighborhood of the initial pareto front are evaluated using the
Kriging model. Note that this local design space exploration is not expensive since
performance evaluation is achieved through Kriging.
5) The pareto front is updated by examining performances of all the design
points involved in the local design space search, and this iterative process continues
until a converged pareto front is reached. It has been observed that convergence
can be usually reached in a few iterations in our experiments.
The obtained nominal pareto fronts serve as the baseline for the yield-aware
pareto front generation. Although the “optimal” performance tradeoﬀs associated
with the nominal case pareto front often represent overly optimistic performance
combinations under signiﬁcant process variations, one key observation is that the
design parameters on the nominal pareto front are often still the near-optimal design
candidates in the presence of process variations. This allows us to use the nominal
pareto front as the starting point and adopt a well-controlled iterative process to
search for pareto fronts under varying yield levels.
3. Fast Statistical Analysis Using Partial Kriging
Although Kriging model is a much more eﬃcient surrogate than circuit simulation, a
brute-force use of Kriging model to run Monte-Carlo simulations could be rather ex-
pensive. Our key idea to alleviate the cost is to facilitate computation sharing between
Monte-Carlo simulations across the design space via Partial Kriging Model Evalua-
tion. This is achieved by properly choosing a correlation function (i.e. Eqn 2.16) and
exploiting special structure of the Kriging model.
Let us examine Eqn 2.17, which provides the circuit performance prediction of the
Kriging model. Without loss of generality, in the following discussion we are concerned only
with a scalar performance. In a practical implementation, $\mathbf{f}(\cdot)$ can be simply chosen
as a constant; then the only variable in Eqn 2.17 is the vector $r$, which corresponds to the
correlations between the untried input $x_0$ and the $m$ inputs $x_{s,1}, \cdots, x_{s,m}$ on
which the Kriging model is built. Eqn 2.17 is now written as
$$\hat{Y}_0 = c + r^T q, \tag{3.2}$$
where $c$ is a constant and $q = R^{-1}(Y_s - F\hat{\beta}) \in \mathbb{R}^m$, both of which are known after
the Kriging model is constructed. The main cost of a Kriging model evaluation is due
to the computation of $r$, which involves the computation of $nm$ exponential terms and
their products as in Eqn 2.16, where $m$ is in the range of a few hundred and $n$ is the
dimension of the combined design and process parameter space, which is typically in
the range of a few tens. The computation of $r$ takes the most significant portion of
model evaluation time.
Our key observation is that due to the specific structure of our Kriging model,
the design and process variables are well separated in the evaluation of $r$. To see this
more clearly, we rewrite Eqn 2.16 as
$$R(x_0, x_i) = \prod_{k=1}^{N_D} e^{-\theta_k |x_{0,k} - x_{i,k}|^{p_k}} \cdot \prod_{k=N_D+1}^{N_D+N_V} e^{-\theta_k |x_{0,k} - x_{i,k}|^{p_k}}, \tag{3.3}$$
where $N_D$ is the number of design parameters, $N_V$ is the number of process
parameters, and $N_D + N_V = n$. Note that the second product of the above
equation, denoted $R_r(x_0, x_i)$, does not depend on the design parameters, so it can be
pre-computed and shared across different design points. Essentially, before the yield-aware
design space search starts, we first perform one run of Monte-Carlo sampling in the process
space by pre-computing a set of $R_r(x_0, x_i)$ samples. This set of samples does not complete the
Kriging model evaluations by itself; therefore this step is referred to as
Partial Kriging Model Evaluation. This set of $R_r(x_0, x_i)$ samples can be shared with
other design points to facilitate fast Monte-Carlo simulations in the design space.
For instance, as shown in Fig. 17, for any given design point, we only need
to evaluate
$$R_l(x_0, x_i) = \prod_{k=1}^{N_D} e^{-\theta_k |x_{0,k} - x_{i,k}|^{p_k}} \tag{3.4}$$
and combine it with the $R_r(x_0, x_i)$ samples to quickly generate the statistical
performance distribution at this design point. Our technique is especially attractive in
practice since in most cases $N_V > N_D$.
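The computation sharing of Eqns 3.2 to 3.4 can be illustrated with the NumPy sketch below. All numbers are placeholders: the training inputs `Xs`, the parameters `theta`, the constant `c` and the vector `q` are random stand-ins for a fitted Kriging model, and the function names are hypothetical. The process-space factor `Rr` is computed once for all Monte-Carlo samples; each design point then needs only the $N_D$-dimensional factor `Rl`.

```python
import numpy as np

def corr(A, B, theta, p=2.0):
    """Product correlation of Eqn 2.16 between rows of A and rows of B."""
    d = np.abs(A[:, None, :] - B[None, :, :]) ** p
    return np.exp(-(d * theta).sum(axis=2))

rng = np.random.default_rng(1)
N_D, N_V, m, n_mc = 3, 8, 50, 1000                 # design dims, process dims, sizes
Xs = rng.uniform(-1.0, 1.0, size=(m, N_D + N_V))   # stand-in training inputs
theta = rng.uniform(0.1, 1.0, size=N_D + N_V)      # stand-in correlation parameters
c, q = 1.5, rng.normal(size=m)                     # stand-ins for c and q in Eqn 3.2

# One Monte-Carlo run in the process space, shared by every design point.
V = rng.normal(scale=0.3, size=(n_mc, N_V))
Rr = corr(V, Xs[:, N_D:], theta[N_D:])             # (n_mc, m), second product of Eqn 3.3

def mc_performance(d):
    """Performance samples at design point d using the shared Rr factor."""
    Rl = corr(d[None, :], Xs[:, :N_D], theta[:N_D])[0]   # Eqn 3.4, shape (m,)
    return c + Rr @ (Rl * q)                              # Eqn 3.2 for all samples

def mc_performance_full(d):
    """Reference: evaluate the full correlation for every sample."""
    X0 = np.hstack([np.tile(d, (n_mc, 1)), V])
    return c + corr(X0, Xs, theta) @ q

d = np.array([0.2, -0.1, 0.5])
samples = mc_performance(d)
p80 = np.percentile(samples, 80)   # value not exceeded by 80% of the samples
```

Both routines return the same samples, but the partial version evaluates exponentials over only $N_D$ dimensions per design point, which is where the saving comes from when $N_V > N_D$.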
Fig. 17. Speeding up Monte-Carlo sampling via partial Kriging model evaluation.
4. Yield-aware Block Optimization
In the previous section, we have described how partial Kriging model evaluation
can be exploited to facilitate efficient Monte-Carlo simulations in the design space
based on Kriging models $\vec{K}(\vec{D}, \vec{V})$ that span both the design and process
spaces. However, the construction of such Kriging models can be considerably more costly
than nominal-case modeling due to the higher input space
dimensionality. We tackle this difficulty by including Kriging model construction as
part of the iterative pareto front extraction procedure, where only localized updates and
search are involved.
Although the “optimal” performance tradeoﬀs associated with the nominal case
pareto front often represent overly optimistic performance combinations under sig-
niﬁcant process variations, one key observation is that the design parameters on the
nominal pareto front are often still the near-optimal design candidates in the presence
of process variations. This allows us to use the nominal pareto front as a starting
point and adopt a well-controlled iterative process to search for pareto fronts under
varying yield levels as shown in Fig. 18.
Starting from the nominal pareto front, we perform local design space search by
evaluating additional design points in the neighborhood of the initial pareto front.
Here, each design point is not only evaluated in the nominal sense, but also statisti-
cally by performing Monte-Carlo sampling in the process space. Hence, performances
that can be achieved at varying yield levels (e.g. 60% or 80%) are obtained after the
Monte-Carlo simulation. Note that the cost of these Monte-Carlo simulations
is well controlled by the fast technique based on partial Kriging model evaluation.
The relative MSE of each full Kriging model evaluation is also checked, and
the Kriging model is again conditionally updated by performing additional SPICE
simulation if the MSE exceeds the specified threshold. Next, for any required yield
level $\alpha$, performances that can be achieved at yield $\alpha$ in the local design space search
are sorted and the pareto front is updated accordingly. This iterative process continues
until a converged pareto front is reached for each yield level $\alpha$. The achieved
yield-aware pareto fronts represent the best achievable performance trade-offs at the
required yield level.
Fig. 18. Iterative yield-aware pareto front optimization.
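One iteration of the front update described above can be sketched as follows. This is a minimal illustration under stated assumptions: every objective is expressed so that larger is better, the Monte-Carlo samples of each candidate are already available, and the function names are hypothetical.

```python
import math

def value_at_yield(samples, alpha, larger_is_better=True):
    """Performance level that at least a fraction alpha of the samples meet."""
    ordered = sorted(samples, reverse=larger_is_better)
    # Small epsilon guards against floating-point round-up in alpha * n.
    idx = max(0, math.ceil(alpha * len(ordered) - 1e-9) - 1)
    return ordered[idx]

def dominates(a, b):
    """True if point a is at least as good as b everywhere and better somewhere."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def update_front(front, candidates):
    """Merge newly evaluated neighbors into the current front and re-sort."""
    return pareto_front(front + candidates)
```

The loop would call `value_at_yield` once per performance and per yield level $\alpha$ for each new design point, then `update_front` until the front stops changing.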
5. Block Optimization Examples
In this section we first apply the yield-aware optimization method to the two design
cases used to illustrate Kriging model accuracy in Chapter II. The 50% yield and 80%
yield pareto fronts, as well as the nominal pareto front, are shown in Fig. 19 for the ring
oscillator in Fig. 11.
[Figure 19: Gain (GHz/V) versus Max Frequency (GHz) and Power (uW), showing the nominal, 50% yield and 80% yield pareto fronts.]
Fig. 19. Yield-aware pareto fronts for the ring oscillator.
To verify that the pareto fronts obtained in Fig. 19 are globally optimal,
we perform Monte-Carlo simulations in the neighborhoods of these fronts and
also in faraway regions of the design space. The pareto front with 50% yield is
selected as an example and the simulation
results are shown in Fig. 20. In the figure, we examine the performance space region
where the maximum frequency is 2.4 GHz ± 0.1 GHz. As can be seen from the figure,
the additional design points are inferior to the found pareto front, demonstrating the
effectiveness of our yield-aware pareto front generation method.
[Figure 20: Gain (GHz/V) versus Power (uW); random samples compared with the 50% yield pareto front.]
Fig. 20. Verification of yield-aware pareto front.
Similar to the ring oscillator example, we plot the pareto fronts for the LC
oscillator in Fig. 21, which clearly shows the difference between the nominal pareto
front and the pareto fronts at different yield levels. If a higher yield
is needed, the corresponding performance trade-offs become worse.
[Figure 21: Power (uW) versus Gain (GHz/V), showing the nominal, 50% yield and 80% yield pareto fronts.]
Fig. 21. Yield-aware pareto fronts for the LC oscillator.
We further perform optimization for a two-stage operational amplifier as shown
in Fig. 22. There are 8 transistors in total in the circuit; we consider 5 design
variables for the optimization and 8 transistor mismatch parameters to represent
process variations, set up with the same ranges as those of the LC oscillator.
Fig. 22. Two-stage Op-Amp schematic.
The performances we select to generate pareto fronts are the DC gain and the
3-dB bandwidth. We show an intermediate step in our pareto front search algorithm
and the converged front in Fig. 23.
[Figure 23: two panels of Gain (dB) versus Bandwidth (MHz), labeled "Sort for the pareto front", "Initial pareto front" and "Converged pareto front".]
Fig. 23. Iterative pareto front generation for the two-stage Op-Amp.
The yield-aware pareto fronts at 20%, 50% and 80% yield are plotted in Fig. 24 for
the two-stage op-amp. The pareto front curves denote the best achievable circuit
performance trade-offs at the required yield levels.
[Figure 24: Gain (dB) versus Bandwidth (MHz), showing the pareto fronts at 20%, 50% and 80% yield.]
Fig. 24. Yield-aware pareto fronts for the two-stage Op-Amp.
B. Yield-aware Hierarchical System Optimization
The yield-aware hierarchical optimization is dictated by the need for safeguarding
large analog/mixed-signal designs in scaled CMOS technologies. Although in the
previous section we successfully achieved yield-aware synthesis for small scale analog
circuits, it is of much more interest to solve the problem of automatic design for
large analog systems. Hierarchical optimization is a promising approach for large
system designs. However, existing flows lack the ability to consider process
variations in the automatic system design, which is much needed to ensure the system
yield.
We address two fundamental diﬃculties in achieving eﬃcient robust analog sys-
tem optimization: yield-aware pareto performance characterization at the building
block level and yield-aware system-level optimization problem formulation. It is
shown that the proposed approach is not only able to eﬀectively capture the block
performance trade-oﬀs at diﬀerent yield levels, but also correctly formulate the whole
system yield and efficiently perform system-level optimization in the presence of process
variations. Our approach extends the efficiency of hierarchical analog optimization,
so far enjoyed for improving nominal circuit performances, to yield-aware optimization. The
proposed methodology is demonstrated by two examples: a two-stage amplifier
and a phase-locked loop (PLL) consisting of multiple building blocks.
1. Hierarchical Optimization Background
In hierarchical optimization, a large analog system is decomposed into several build-
ing blocks. In order to get the best overall system performances, it is natural to ﬁnd
the design points which result in best performances for the building blocks. The pareto
fronts presented in the previous section are a good representation of the performance
trade-offs needed in a hierarchical optimization framework. The optimal system
performances can be achieved by searching within building block-level pareto fronts.
A general flowchart of hierarchical optimization using pareto fronts is shown in
Fig. 25. Circuit-level design parameters (sizes of transistors and passive components,
biasing, etc.) are explored within the design constraints to find the best possible
performance trade-offs. Then system-level optimization is carried out by searching in
the space restricted by the block-level pareto fronts. If the mappings from system
performances to building block parameters and from block-level variables to circuit-level
parameters are known, we can recover the circuit design parameters corresponding
to the optimal system performances. There are two key benefits of this hierarchical
optimization. First, since the number of block-level performances is
much smaller than the number of variables in the original design space, the space
exploration can be reduced significantly. An equally important benefit is that system-level
behavioral models can be used to quickly estimate system-level performances, thereby
further reducing the overall optimization cost.
Fig. 25. Nominal hierarchical optimization flow.
2. Issues in Yield-aware Hierarchical Optimization
Considering process variations in pareto front generation requires statistical simulations
for each design point, which typically increases the computation cost of
performance evaluation by two orders of magnitude. Since there are many more device-level
process variables than design variables, scalability and accuracy become crucial
to the development of performance models. A more challenging requirement comes
from how to perform hierarchical optimization. In the nominal case, circuit blocks
can be individually characterized in terms of block-level pareto performance models.
However, in the case of yield-aware optimization, it is critical to capture the impacts
of device-level variations on the system-level performances. This need makes both the
individual extraction of block-level pareto models and system-level optimization much more
complicated, as discussed in the following sections.
a. Pareto Front Generation Issues
To obtain yield-aware pareto models, block-level design points that provide robust
best block-level performance trade-offs are collected. This can be achieved by assuming
a single yield target for possibly multiple block-level performances and then generating
the pareto front. However, there exists a disconnect between the block-level
models and the system-level performances, where statistical variations are considered
at the system level. In other words, the single yield level at the block level does not
provide enough statistics from which the whole system yield can be
estimated. A way to generate statistical pareto fronts that correctly pass
yield information as well as block performance trade-offs is therefore needed.
b. System-level Optimization Issues
In [32], the authors suggested using building block pareto fronts with all the performances
at one specified yield level in the hierarchical optimization. The obtained
system-level optimization points were supposed to have the same yield level as the building
blocks. This approach, however, may not work properly for several reasons.
Firstly, the transformations from building block-level performances to system-level
performances may be complex. This dependency may allow the low yield level
of one building block to be compensated by other blocks in the same system. Similarly,
the performances within one building block may also be compensated by the block-to-system
transformation. As a result, the relationship between building block yields
and the system yields can be non-monotonic and complex. Using only the pareto fronts
at the specified yield level in the hierarchical optimization will lose many
promising block-level performance combinations which may lead to better system-level
performances in the end.
A more severe problem comes from the statistical correlations between various
circuit blocks. In reality, the device variations in different blocks may share common/global
physical origins. As a result, not only are the device variations correlated,
but so are the block-level performances across the blocks. In the prior yield-aware pareto
front modeling works [33, 32, 10], such correlations are not captured since each block
is optimized independently. This issue is especially severe if a single yield target is
assumed for all the block-level performances when the pareto models are extracted.
Such a simple yield-aware pareto model cannot provide the full statistical information
needed to determine the whole system yield. Consider a simple example, where the entire
system consists of two blocks with block-level performances $P_1$ and $P_2$, and the
system performance is simply assumed to be $P_s = P_1 + P_2$. If $P_1$ and $P_2$ are jointly
Gaussian, then the system performance $P_s$ will also be Gaussian. In [32], it is suggested
that to achieve a system-level yield target, say 84.1%, the block-level pareto models
at the same yield level should be considered. In this simple case, the system performance
that achieves the yield target is at
$$\mu_1 + \mu_2 + \sqrt{\sigma_1^2 + \sigma_2^2 + 2 \sigma_1 \sigma_2 \cdot \rho(P_1, P_2)},$$
where $\rho(P_1, P_2)$ is the correlation coefficient of $P_1$ and $P_2$.
Obviously, the value of this performance level depends on the correlation between
$P_1$ and $P_2$. Without such knowledge, the correct system-level performance cannot be
decided. The situation becomes even more complex if the block performance distributions
are non-Gaussian. In this case, knowing only the correlation factor is also
not sufficient. We address these challenges by using the techniques described in the
following sections.
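The dependence on correlation in the two-block example can be checked numerically. The sketch below assumes jointly Gaussian $P_1$ and $P_2$ with a smaller-is-better system performance; the function name and the numeric values are illustrative only.

```python
import math
from statistics import NormalDist

def system_level_at_yield(mu1, sigma1, mu2, sigma2, rho, yield_target):
    """Level that P_s = P1 + P2 stays below with probability yield_target,
    for jointly Gaussian P1, P2 with correlation coefficient rho."""
    sigma_s = math.sqrt(sigma1 ** 2 + sigma2 ** 2 + 2.0 * rho * sigma1 * sigma2)
    z = NormalDist().inv_cdf(yield_target)  # about 1.0 for an 84.1% yield
    return mu1 + mu2 + z * sigma_s

# The yield reached at one standard deviation above the mean (about 84.1%).
ONE_SIGMA_YIELD = NormalDist().cdf(1.0)
```

With $\sigma_1 = 3$ and $\sigma_2 = 4$, the offset above $\mu_1 + \mu_2$ is 5 for uncorrelated blocks, 7 for fully correlated blocks and 1 for fully anti-correlated blocks, so the block-level yield figures alone do not fix the system-level value.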
3. Multi-yield Pareto Fronts
Unlike the prior work where a single fixed yield level is used when extracting the
yield-aware pareto front for multiple performances [33, 32, 10], we introduce the notion
of multi-yield pareto fronts, where best performance trade-offs are extracted in terms
of combinations of yield level parameters individually specified for each performance.
For example, a block with two performances $P_1$ and $P_2$ will be characterized in terms
of two separate yield level parameters $Y_1$ and $Y_2$, one for each performance.
The multi-yield pareto fronts are generated by varying the yield level for each
performance individually. For the $i$-th building block our proposed multi-yield pareto
front is in the form of
$$MY(\vec{Y}_{Bi}, \vec{P}_{Bi}) = 0, \qquad \vec{Y}_{Bi,Min} \le \vec{Y}_{Bi} \le \vec{Y}_{Bi,Max} \qquad (3.5)$$
where $\vec{P}_{Bi}$ are the best block performances that can be achieved at the yield level
$\vec{Y}_{Bi}$. For practical purposes, $\vec{Y}_{Bi}$ is constrained within
$[\vec{Y}_{Bi,Min}, \vec{Y}_{Bi,Max}]$. The yield level $\vec{Y}_{Bi}$ can vary for different
performances in a building block. An example of two-performance multi-yield pareto
front generation is shown in Fig. 26. In this case,
a fixed-yield pareto model is extracted at each combination of the two performance
yield targets.
Fig. 26. Multi-yield pareto front generation.
The use of multi-yield pareto fronts allows us to identify a more complete set
of "near optimal" block-level design points for hierarchical optimization. Facing the
lack of system-level interaction during the individual pareto model extraction stage, this
choice relaxes the artificial constraints set in the fixed-yield pareto models and allows
the system-level optimization to be conducted over a larger number of promising block-level
performance trade-offs. However, the search space for the system-level optimization
is still constrained by the block-level best performance trade-offs. Therefore, the
hierarchical nature of the overall optimization is preserved.
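Extracting one fixed-yield front per combination of yield targets can be sketched as follows. This is a minimal illustration, assuming the Monte-Carlo samples of every candidate design are already available; the data layout, the yield grid and the function names are hypothetical.

```python
import itertools
import math

def value_at_yield(samples, y, larger_is_better):
    """Performance level that at least a fraction y of the samples meet."""
    ordered = sorted(samples, reverse=larger_is_better)
    return ordered[max(0, math.ceil(y * len(ordered) - 1e-9) - 1)]

def dominates(a, b, larger_is_better):
    """True if a is no worse than b in every performance and better in one."""
    no_worse = [(x >= y) if lb else (x <= y)
                for x, y, lb in zip(a, b, larger_is_better)]
    better = [(x > y) if lb else (x < y)
              for x, y, lb in zip(a, b, larger_is_better)]
    return all(no_worse) and any(better)

def multi_yield_fronts(mc_results, yield_grid, larger_is_better):
    """mc_results maps a design id to one sample list per performance.
    Returns {(Y_1, Y_2, ...): [(design id, performance tuple), ...]}."""
    fronts = {}
    for combo in itertools.product(yield_grid, repeat=len(larger_is_better)):
        pts = {d: tuple(value_at_yield(s, y, lb)
                        for s, y, lb in zip(samples, combo, larger_is_better))
               for d, samples in mc_results.items()}
        fronts[combo] = [(d, p) for d, p in pts.items()
                         if not any(dominates(q, p, larger_is_better)
                                    for q in pts.values())]
    return fronts
```

A design with a high typical gain but a wide spread can be on the front at a 50% gain yield and off it at a 100% gain yield, which is the information a single fixed-yield front discards.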
4. System-level Optimization Formulation
The multi-yield pareto fronts carry the information about device-level process
uncertainties. These multi-yield pareto front models are used to achieve efficient
system-level optimization.
a. Bridging Block-level and System-level
In order to evaluate system performance distributions correctly, we need the performance
distributions of building blocks for all possible promising design points. As
such, the yield levels in multi-yield pareto fronts are not used directly but together
with block performances to identify the design parameters. A specified yield level
and performance set $(\vec{Y}_{Bi}, \vec{P}_{Bi})$ can identify a unique design point in the
multi-yield pareto front of the $i$-th building block. When appropriate, interpolation along the
pareto front can be conducted. The mapping to the design space of the $i$-th building
block can be denoted as
$$\vec{D}_{Bi} = D_{PYi}(\vec{Y}_{Bi}, \vec{P}_{Bi}). \qquad (3.6)$$
This mapping can be achieved by using Kriging performance models to generate dense
points forming pareto fronts instead of analytical formulas, so the mapping back to
the design parameters is naturally obtained.
b. System-level Cost Function
The goal of the yield-aware circuit optimization is to find the optimal system performances
at targeted system yields. There can be more than one performance for the
whole system. Therefore, we can also specify different system yield levels for different
performances. Due to process variations, the system performances are all statistical
variables. For the $k$-th statistical system performance $P_{s,k}$ (the smaller the better),
suppose we need a yield of $Y_{s,k}$; then the yield-aware performance $P_{s,k}^{Y_{s,k}}$
satisfies the following probability condition
$$\mathrm{P}\{P_{s,k} \le P_{s,k}^{Y_{s,k}}\} = Y_{s,k}. \qquad (3.7)$$
Eqn. 3.7 implies that for $P_{s,k}$, the best achievable performance value is
$P_{s,k}^{Y_{s,k}}$ when yield level $Y_{s,k}$ is required. $P_{s,k}^{Y_{s,k}}$ is considered
the yield-aware $k$-th system performance and is used in the system-level optimization.
For multi-objective systems, the system-level cost function $F$ for $M$ system performances
with yield $Y_s = [Y_{s,1}, \cdots, Y_{s,M}]$ can be formulated as
$$F(\vec{Y}_B, \vec{P}_B) = \sum_{k=1}^{M} W_{s,k} \cdot \frac{P_{s,k}^{Y_{s,k}}(\vec{Y}_B, \vec{P}_B)}{Spec_k} \qquad (3.8)$$
where $W_{s,k}$ is the weighting coefficient for the $k$-th system performance and $Spec_k$ is
its specified performance at yield level $Y_{s,k}$. The input variables of the
cost function are the yield and performance sets $[\vec{Y}_B, \vec{P}_B]$ of all the building blocks.
With Eqn. 3.8, the objective of yield-aware optimization is to minimize the cost
function $F(\vec{Y}_B, \vec{P}_B)$ at specified system yield levels $Y_{s,k}$. By changing the weighting
coefficients $W_s$ for different system performances, the system-level optimization can
be set to trade off between different system performances.
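Eqns. 3.7 and 3.8 can be evaluated from Monte-Carlo samples as in the sketch below. It assumes smaller-is-better performances and an empirical quantile as the estimate of $P_{s,k}^{Y_{s,k}}$; the function names are hypothetical.

```python
import math

def yield_aware_value(samples, y):
    """Smallest sampled level p such that at least a fraction y of the
    samples satisfy P <= p (empirical form of Eqn. 3.7)."""
    ordered = sorted(samples)
    return ordered[max(0, math.ceil(y * len(ordered) - 1e-9) - 1)]

def system_cost(perf_samples, yields, weights, specs):
    """Eqn. 3.8: weighted sum of yield-aware performances, each normalized
    by its specification."""
    return sum(w * yield_aware_value(s, y) / spec
               for s, y, w, spec in zip(perf_samples, yields, weights, specs))
```

For example, with two performances whose yield-aware values are 8 and 4, weights 1 and 2, and specifications 4 and 2, the cost is 1·8/4 + 2·4/2 = 6.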
c. Optimization Algorithm
With the system cost function formulated, the question is how to optimize
it. As we are interested in the system performances in the statistical sense, accurate
evaluation of statistical system performances is required. Since the optimization
variables are $[\vec{Y}_B, \vec{P}_B]$, we need to know the mapping function $f_{s,k}$ that gives
$P_{s,k}^{Y_{s,k}}$ from the multi-yield pareto fronts, formulated as
$P_{s,k}^{Y_{s,k}} = f_{s,k}(\vec{Y}_B, \vec{P}_B)$.
To achieve this, we first transfer the optimization variables in each building block
back to the design space using Eqn. 3.6, then we perform Monte-Carlo simulation at
each design point to evaluate the block performance distributions. This step can be
accelerated by extracting an empirical Kriging-based regression model. If there are
correlations between device-level parameters across different building blocks, they can
be naturally captured in the block performance distributions, as it is now possible to
generate Monte-Carlo samples at the block level with such correlations considered.
The correlated block performance distributions are then mapped into the system
performance distributions using system-level behavioral simulation. Again, if needed,
this step can be sped up by extracting a Kriging regression model. The system
performances at yield $Y_s$ are obtained by finding the values meeting the yield requirement
in the system performance distributions. The mapping flow from multi-yield pareto
fronts to system performances is shown in Fig. 27.
The flowchart of the hierarchical optimization using multi-yield pareto fronts is
illustrated in Fig. 28. We start from the yields and performances of multi-yield pareto
fronts, get back to the design points, then obtain the statistical system performances
to evaluate the system cost function. The optimization goal is to reduce the cost
function in Eqn. 3.8. Since the multi-yield pareto fronts are self-constrained, the
optimizer also needs to take Eqn. 3.6 as an optimization constraint.
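The role of shared device-level variations in this mapping can be sketched as follows. The block models and the behavioral model here are hypothetical linear stand-ins, not the Kriging or Verilog-A models of this work; only the sampling structure is the point.

```python
import math
import random

def block1(g, l):
    """Hypothetical block model: global variation g plus local mismatch l."""
    return 1.0 + g + 0.1 * l

def block2(g, l):
    """Second hypothetical block model with the same structure."""
    return 2.0 + g + 0.1 * l

def system_model(p1, p2):
    """Hypothetical behavioral model mapping block to system performance."""
    return p1 + p2

def system_value_at_yield(y, n=20000, share_global=True, seed=0):
    """Sample both blocks, map to the system level, return the level that a
    fraction y of the system samples stay below (smaller is better)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        g1 = rng.gauss(0.0, 1.0)
        # A shared global variable correlates the two block performances.
        g2 = g1 if share_global else rng.gauss(0.0, 1.0)
        p1 = block1(g1, rng.gauss(0.0, 1.0))
        p2 = block2(g2, rng.gauss(0.0, 1.0))
        out.append(system_model(p1, p2))
    out.sort()
    return out[max(0, math.ceil(y * n - 1e-9) - 1)]
```

Sampling the blocks independently (`share_global=False`) gives a noticeably more optimistic system value at the same yield, which illustrates why the block samples must be generated with the cross-block correlations in place.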
[Figure 27: for each building block $i$, the pair $(\vec{Y}_{Bi}, \vec{P}_{Bi})$ taken from
$[\vec{Y}_B, \vec{P}_B]$ is mapped to the design point $\vec{D}_{B,i} = D_{PYi}(\vec{Y}_{Bi}, \vec{P}_{Bi})$;
the block design points $\vec{D}_{B,1}, \ldots, \vec{D}_{B,N}$ then yield the system performance
$P_{s,k}^{Y_{s,k}} = f_{s,k}(\vec{Y}_B, \vec{P}_B)$.]
Fig. 27. Mapping from multi-yield pareto fronts to yield-aware system performances.
For a system with $M$ performances and $N$ building blocks, the complete system-level
optimization can be formulated as
$$\min F(\vec{Y}_B, \vec{P}_B) \quad s.t. \quad
\begin{cases}
P_{s,k}^{Y_{s,k}} = f_{s,k}(\vec{Y}_B, \vec{P}_B), & k = 1, 2, \cdots, M \\
MY(\vec{Y}_{Bi}, \vec{P}_{Bi}) = 0, & i = 1, 2, \cdots, N \\
\vec{Y}_{Bi,Min} \le \vec{Y}_{Bi} \le \vec{Y}_{Bi,Max}
\end{cases} \qquad (3.9)$$
where the optimization variables are the multi-yield pareto front yields and performances
$[\vec{Y}_B, \vec{P}_B]$ for all the $N$ building blocks. The dimensions of $\vec{Y}_{Bi}$ and
$\vec{P}_{Bi}$ in each building block are the same and depend on the number of block-level
performances considered. Note that each multi-yield pareto front acts as a constraint in the
optimization and reduces the degrees of freedom of the system-level optimization
by one. Suppose the $i$-th block has $L_i$ block-level performances; the dimension of
the optimization search space is then $\sum_{i=1}^{N} (2L_i - 1)$. Hence, the proposed approach
extends the efficiency of hierarchical analog optimization from deterministic optimization to
statistical optimization.
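The search-space dimension stated above can be tallied with a short helper; the function name is hypothetical. Each block contributes $L_i$ yield levels plus $L_i$ performances, minus one degree of freedom removed by its multi-yield pareto front constraint.

```python
def search_space_dimension(perfs_per_block):
    """Sum over blocks of (2 * L_i - 1), where L_i is the number of
    block-level performances of block i."""
    return sum(2 * n_perf - 1 for n_perf in perfs_per_block)
```

For two blocks with two performances each, such as gain and power per stage, this gives 3 + 3 = 6 optimization variables.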
The system-level optimization problem can be solved by any suitable optimization
method, particularly a derivative-free method. A global optimization algorithm
based on multilevel coordinate search (MCS) [34] is adopted in the optimization flow.
[Figure 28: flowchart in which the optimizer minimizes $F(\vec{Y}_B, \vec{P}_B)$ over
$[\vec{Y}_B, \vec{P}_B]$ subject to $MY(\vec{Y}_{Bi}, \vec{P}_{Bi}) = 0$, maps to the block design points
$\vec{D}_{B,1}, \ldots, \vec{D}_{B,N}$, and evaluates $P_{s,k}^{Y_{s,k}} = f_{s,k}(\vec{Y}_B, \vec{P}_B)$
for the system yields $Y_{S,1}, \ldots, Y_{S,M}$ and performances $P_{S,1}, \ldots, P_{S,M}$.]
Fig. 28. Hierarchical optimization using multi-yield pareto fronts.
We summarize the complete algorithm consisting of Kriging model construction,
multi-yield pareto front generation and system-level optimization in Algorithm 2.
Algorithm 2 Yield-aware Hierarchical Optimization Flow
Inputs: Design variable set $\vec{D} = \{D_1, D_2, \cdots, D_n\}$, system specifications
$\{Spec_1, Spec_2, \cdots, Spec_m\}$ and weighting coefficients $\{\omega_1, \omega_2, \cdots, \omega_m\}$
Outputs: Optimized system performances $\vec{P}_{sys,opt}$ and corresponding design variables $\vec{D}_{opt}$.
1: divide system into $N$ sub-blocks
2: construct Kriging performance model $\vec{K}(\vec{D})$ in nominal case for each building block
3: generate nominal pareto front $\vec{PF}(\vec{P}_B)$ using iterative search
4: construct Kriging performance model $\vec{K}(\vec{D}, \vec{V})$ containing both design and process variables
5: build multi-yield pareto front $MY(\vec{Y}_{Bi}, \vec{P}_{Bi}) = 0$ for each building block
6: construct system cost function using Eqn. 3.8
7: $\vec{P}_{sys,opt} = \min(F(\vec{Y}_B, \vec{P}_B))$ s.t. Eqn. 3.9
8: return $\vec{P}_{sys,opt}$ and $\vec{D}_{opt}$
5. System Optimization Examples
We demonstrate the detailed applications of the proposed yield-aware hierarchical
optimization including behavioral modeling, multi-yield pareto front generation and
system-level optimization formulation by looking into two design examples in this
section.
a. Two-stage Ampliﬁer
To illustrate important aspects of yield-aware hierarchical optimization, we use a
simple but revealing two-stage operational ampliﬁer example as shown in Fig. 29.
This ampliﬁer is designed for the application of low-bandwidth pre-ampliﬁcation and
requires low-power consumption with reasonable gain. The ampliﬁer is implemented
in a 90nm CMOS technology. The design parameters consist of transistor sizes,
biasing currents and the capacitance. We consider a global process variable of gate
oxide thickness $T_{ox}$ with $3\sigma = 15\%$ for all the transistors, $3\sigma = 10\%$ mismatch of gate
length $L$ for each transistor and $3\sigma = 10\%$ variation for capacitance and resistance.
Fig. 29. Schematic of two-stage operational amplifier.
The amplifier is partitioned into two stages and we model each stage as one
building block. The global process variable $T_{ox}$ is modeled as an input variable in
the Kriging performance models of both stages. The block performances are gain
and power; the system performances selected are also gain and power. We also impose
a phase margin requirement to make sure the optimized circuits are stable.
For this circuit, the behavioral models that map block performances to system
performances are rather straightforward and can be evaluated analytically:
$$\begin{aligned}
Gain_{system} &= Gain_{stage1} \times Gain_{stage2} \\
Power_{system} &= Power_{stage1} + Power_{stage2}
\end{aligned} \qquad (3.10)$$
The cost function in Eqn. 3.8 is rewritten for the amplifier case as
$$F(\vec{Y}_B, \vec{P}_B) = W_G \cdot \frac{Gain_{amp}(Y_G)}{Gain_{Spec}} + W_P \cdot \frac{Pow_{amp}(Y_P)}{Pow_{Spec}} \qquad (3.11)$$
The optimization target is to reduce the system cost function subject to the phase
margin requirement. Here we require the phase margin to be larger than $60^\circ$, so the system
optimization is formulated as
$$\min F(\vec{Y}_B, \vec{P}_B) \quad s.t. \quad PM \ge 60^\circ \qquad (3.12)$$
In order to verify whether the multi-yield hierarchical optimization can find the actual
system optimal point, we compare the results of the proposed method with those of
the flat yield-aware optimization method. The fixed-yield hierarchical optimization
method is also evaluated for comparison. The flat optimization is performed by using
a simulation-based optimization approach similar to a commercial optimization tool
[28]. The optimizer [34] calls Spectre [11] to run Monte-Carlo simulation at each
design point it reaches and uses the simulated performances at the targeted yield levels
as the guidance for brute-force optimization search. The major computation cost for
the proposed multi-yield hierarchical optimization method comes from generating the
multi-yield pareto fronts, which needs around 20 minutes. Once the multi-yield pareto
fronts are obtained, system-level optimization requires only 1-2 minutes, since the
performance mapping from the block level to the system level is analytical as in Eqn. 3.10.
The results of the different optimization methods are compared in
Fig. 30. The system-level yields $Y_G$ and $Y_P$ are set to 70% for both gain and power.
The data in the plots are all evaluated with transistor-level Monte-Carlo simulation
using Spectre for better accuracy. From the figure, we can see that the system trade-offs
at the converged optimization points captured by the proposed method match
the flat optimization results very well, while the fixed-yield pareto front optimization
is not able to converge to the actual optimal system performance trade-off curves.
[Figure 30: Gain versus Power (uW) for the multi-yield pareto, fixed-yield pareto and flat optimization methods.]
Fig. 30. Comparison of results of different optimization methods.
From the amplifier example presented above, we can see clearly that directly
applying the pareto fronts at fixed yield levels in the hierarchical optimization is not
able to find the system optimal solutions, while the results of the proposed optimization
methodology with multi-yield pareto fronts match the flat optimization results very
well. Although the circuit by itself is simple, it demonstrates that the conventional
hierarchical yield-aware optimization method is not suitable and that our proposed
multi-yield pareto optimization approach can be a good solution for yield-aware
hierarchical optimization. Next we will discuss a more realistic and complicated case,
where flat optimization is not possible because of the huge simulation time needed
for the direct simulation.
b. Charge-pump PLL
Because fast and slow signals coexist in a PLL, the design and evaluation of
a PLL system is quite complex and costly. Brute-force optimization by searching
in the design space with transistor-level simulation is therefore infeasible for PLL designs. In
this example, we demonstrate the effectiveness of the proposed yield-aware optimization
methodology using the optimization of a charge-pump PLL as a design example.
The PLL system investigated contains a voltage-controlled oscillator (VCO), a phase
detector, a charge pump, a loop filter and a frequency divider. Among these components,
the phase detector and the frequency divider are digital components which
have few tunable parameters and are robust to process variations. So we restrict our focus
to the optimization of the VCO, charge pump and filter in this example.
The performances selected to build VCO behavioral models are jitter, power,
VCO gain $K_{VCO}$ and frequency $F_{offset}$ at a certain control voltage $V_{offset}$. The VCO
voltage-frequency curve is linearized around $V_{offset}$ with the slope of $K_{VCO}$. $V_{offset}$
is predefined as the center of the VCO linear gain region. These coefficients are used
to capture the complete voltage-frequency curve, which precisely models the PLL
acquisition procedure, especially at low/high control voltages. The model used has
an advantage over conventional models that only consider the VCO gain within a
fixed frequency region [15, 35], since the VCO behaviors in nonlinear regions are also
captured here.
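The linearization around $V_{offset}$ can be written as a one-line model. This sketch covers only the linear region; how the behavioral model treats the nonlinear regions at low/high control voltages is not specified in the text and is not reproduced here. The function name and the numeric values in the usage line are illustrative.

```python
def vco_frequency(v_ctrl, f_offset, k_vco, v_offset):
    """Linearized VCO tuning curve: F = F_offset + K_VCO * (V - V_offset)."""
    return f_offset + k_vco * (v_ctrl - v_offset)

# Example with assumed values: 2.4 GHz at 0.5 V and a gain of 1 GHz/V.
f_at_0p6 = vco_frequency(0.6, 2.4e9, 1.0e9, 0.5)
```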
For the charge pump, we include jitter, charge-up current $I_{up}$ and charge-down
current $I_{down}$ in the behavioral model. The parameters of the loop filter are the
capacitances $C_1$ and $C_2$ and the resistance $R$, which can all be handled directly in
system-level simulation. The power of the charge pump and loop filter can be calculated
from the charge pump currents [15]. The behaviors of digital building blocks in the
system, including the frequency divider and the phase detector, are also characterized using
delay and slew and included in the behavioral models. The simulation and modeling
of the PLL is shown in Fig. 31.
[Figure 31: block-level models of the charge pump (jitter, Iup, Idown), ring oscillator (jitter, power, gain, offset), loop filter (R, C1, C2) and digital blocks are mapped to Verilog-A models, which produce the system performances lockin time, jitter and power.]
Fig. 31. PLL modeling and optimization.
When the behaviors of analog building blocks are extracted, they are mapped
into Verilog-A models for the system-level simulation [15, 36]. At the system level, the
performances of the PLL considered are lockin time $T$, power $P$ (excluding digital blocks)
and jitter $J$ [37]. With the Verilog-A behavioral models, the mapping from
block-level performances to the system-level performances can be achieved efficiently.
Multi-yield pareto fronts for individual building block are extract to perform
yield-aware hierarchical optimization. For VCO, since the frequency 퐹표푓푓푠푒푡 is only
a model parameter which is not considered as performance metric, we generate the
70
pareto fronts considering power, jitter and gain as 푃푉 퐶푂(푝표푤푒푟, 푗푖푡푡푒푟, 푔푎푖푛). The
designed 퐼푢푝 and 퐼푑표푤푛 should be equal for the charge pump, but when the process
variations are considered, there will be mismatch between 퐼푢푝 and 퐼푑표푤푛, which can
introduce extra jitter and impact lockin time signiﬁcantly. So for the charge pump, we
generate pareto fronts considering jitter, average charge current 퐼푐푝 = 0.5⋅(퐼푢푝+퐼푑표푤푛)
and mismatch current 퐼푚푖푠 = 푎푏푠(퐼푢푝 − 퐼푑표푤푛) as 푃퐶푃 (푗푖푡푡푒푟, 퐼푐푝, 퐼푚푖푠). Charge pump
power is not included in the pareto fronts since it is proportional to 퐼푐푝. The cost
function in Eqn. 3.8 is rewritten for the PLL case as
\[
F(\vec{Y}_B, \vec{P}_B) = W_P \cdot \frac{P_{PLL}(Y_P)}{P_{Spec}}
  + W_J \cdot \frac{J_{PLL}(Y_J)}{J_{Spec}}
  + W_T \cdot \frac{T_{PLL}(Y_T)}{T_{Spec}} \tag{3.13}
\]
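As a minimal sketch of how a cost of the form of Eqn. 3.13 could be evaluated, the Python
snippet below computes the weighted sum of specification-normalized performances. The
function name `pll_cost`, the dictionary keys and all numeric values are hypothetical and
chosen only for illustration; they are not taken from this work.

```python
def pll_cost(perf, spec, weights):
    """Weighted sum of spec-normalized PLL performances (form of Eqn. 3.13).

    perf, spec and weights are dicts keyed by 'power', 'jitter' and 'lockin'.
    perf holds the performances evaluated at their chosen yield levels.
    """
    keys = ("power", "jitter", "lockin")
    return sum(weights[k] * perf[k] / spec[k] for k in keys)


# Hypothetical numbers, for illustration only.
spec = {"power": 50e-6, "jitter": 1e-12, "lockin": 1e-6}
perf = {"power": 45e-6, "jitter": 0.9e-12, "lockin": 0.8e-6}
weights = {"power": 1.0, "jitter": 1.0, "lockin": 2.0}

cost = pll_cost(perf, spec, weights)
print(cost)
```

Raising one weight relative to the others steers the optimizer toward a different point on
the system-level trade-off curve, which is how the trade-off plots in this section are
swept.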
The design space constraints in Eqn. 3.9 are further expressed for the PLL case as
\[
\begin{cases}
MY_{CP}(Y_{jit}, P_{jit}, Y_{Icp}, P_{Icp}, Y_{Imis}, P_{Imis}) = 0 \\
MY_{VCO}(Y_{jit}, P_{jit}, Y_{pow}, P_{pow}, Y_{gain}, P_{gain}) = 0 \\
R_{min} \le R \le R_{max};\; C_{min} \le C \le C_{max};\; Y_{min} \le Y \le Y_{max}
\end{cases} \tag{3.14}
\]
The input variables for the system-level optimization include not only the yield levels
and performances on the VCO and charge pump multi-yield pareto fronts, but also the loop
filter parameters, since they can be evaluated directly in the system-level simulation.
The PLL example is implemented in a 90nm CMOS technology. The process variations
considered are the threshold voltages $V_{th}$ of the individual transistors, each with a
variation of $3\sigma = 10\%$. At the system level, a single transient simulation of
10 µs for jitter and lockin time analysis requires about 40 seconds even with Verilog-A
models. To reduce the optimization cost, we build another set of Kriging models that map
all the block-level performances ($VCO_{power}$, $VCO_{jitter}$, etc.) and filter
parameters to the PLL system performances, and use these high-level Kriging models to
guide the optimizer toward optimal solutions. To ensure accuracy, once the optimized
design points are obtained, we use Spectre with the Verilog-A models to find the
performances at the specified yields for these points.
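To illustrate the idea of a Kriging surrogate, the following Python sketch fits an
ordinary Kriging interpolator with a Gaussian correlation function to a handful of
samples and then predicts at a new point. It is a toy under stated assumptions: the
one-dimensional input, the fixed correlation parameter `theta`, the small `nugget` and the
sine test function are all hypothetical stand-ins, and this is not the Kriging
implementation used in this work.

```python
import math


def gauss_corr(a, b, theta):
    """Gaussian correlation between two points given as lists of floats."""
    return math.exp(-theta * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_kriging(X, y, theta=1.0, nugget=1e-10):
    """Ordinary Kriging: constant mean plus a correlated residual."""
    n = len(X)
    R = [[gauss_corr(X[i], X[j], theta) + (nugget if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    # Generalized least-squares estimate of the constant mean.
    mu = sum(solve(R, y)) / sum(solve(R, [1.0] * n))
    w = solve(R, [yi - mu for yi in y])
    return {"X": X, "mu": mu, "w": w, "theta": theta}


def predict(model, x):
    """Kriging prediction: mean plus correlation-weighted residuals."""
    r = [gauss_corr(x, xi, model["theta"]) for xi in model["X"]]
    return model["mu"] + sum(ri * wi for ri, wi in zip(r, model["w"]))


# Toy data: in the PLL flow the inputs would be block-level performances
# and filter parameters, and the output a system-level performance.
X = [[0.0], [0.5], [1.0], [1.5], [2.0]]
y = [math.sin(p[0]) for p in X]
model = fit_kriging(X, y)
print(predict(model, [0.75]), math.sin(0.75))
```

The surrogate reproduces the training samples and interpolates smoothly between them, so
an optimizer can query it many times at a small fraction of the cost of a transient
simulation.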
The examples of multi-yield pareto fronts for charge pump and VCO are illus-
trated in Fig. 32.
[Figure: left panel, charge pump pareto fronts, Imis (uA) versus Icp (uA); right panel,
VCO pareto fronts, VCO gain (GHz/V) versus Power (uW). Each panel shows curves for
(Y1, Y2) = (50%, 50%), (60%, 80%), (80%, 60%) and (95%, 95%).]
Fig. 32. Multi-yield pareto fronts for charge pump (left) and VCO (right).
At the system level, the trade-offs between lockin time and power for the PLL are
analyzed. By changing the weighting coefficients for power and lockin time in Eqn. 3.13,
different optimal PLL performance combinations are obtained, as shown in Fig. 33. The two
system performances are set to the same yield level. The system performances shown in the
figure are simulated using Spectre with Verilog-A models. The performance trade-offs
without consideration of yield are obtained in a way similar to [15] and are also shown in
Fig. 33. The figure shows that the yield can be very low if the optimization is done
without the yield information, especially in the region around the point denoted as
“Opt. point”. The nominal system performances of the initial design point and of
“Opt. point”, evaluated using direct Spectre simulation, are also plotted in Fig. 33,
denoted as “Init. design” and “Opt. design”, respectively.
[Figure: Power (uW) versus Lockin time (us); curves for 50% Yield, 80% Yield and Nominal,
with markers Init. design, Opt. design and Opt. point.]
Fig. 33. Trade-oﬀs of lockin time and power at diﬀerent yield levels.
Table VI. Hierarchical optimization results for PLL.
          PLL                VCO                                      Charge pump
          Power    Lockin    Gain         Offset    Power    Jitter   Iup      Idown    Jitter
Init. D   61.9 µW  2.37 µs   2.43 GHz/V   1.54 GHz  57.2 µW  0.87 ps  2.11 µA  2.07 µA  54.2 fs
Opt. D    43.4 µW  0.95 µs   2.36 GHz/V   1.36 GHz  36.0 µW  0.92 ps  3.55 µA  3.53 µA  63.0 fs
Opt. P    45.4 µW  0.84 µs   2.39 GHz/V   1.33 GHz  36.7 µW  1.18 ps  3.64 µA  3.62 µA  68.6 fs
Table VI compares the nominal performances of the initial design and of the “Opt. point”
on the obtained pareto front. The system and block-level performances of the initial
design and of “Opt. point” are both evaluated with direct transistor-level Spectre
simulation and are listed in the first and second rows, respectively. In the third row,
the PLL system results are simulated using Spectre with Verilog-A models, and the building
block performances are obtained from the Kriging models for the design parameters of
“Opt. point”. The third row is close to the direct Spectre simulation in the second row
(for example, 45.4 µW versus 43.4 µW for power and 0.84 µs versus 0.95 µs for lockin
time), which supports the accuracy of our Verilog-A behavioral models.
Similarly, the trade-offs between jitter and lockin time at different yield levels are
plotted in Fig. 34.
[Figure: Jitter (ps) versus Lockin time (us); curves for 50% Yield and 80% Yield.]
Fig. 34. Trade-offs of lockin time and jitter at different yield levels.
As stated before, it is not possible to use direct transistor-level optimization, as was
done in the two-stage amplifier case, to verify whether the proposed method finds the
optimal system-level performance trade-offs. Instead, we randomly sample the design space
and simulate the corresponding PLL system performances using the Verilog-A models.
Fig. 35 shows an example of the trade-off between lockin time and power at the 80% yield
level together with the performances of 200 random design samples. The trade-off curve in
Fig. 35 is superior to all the verification points, which supports the claim that the
proposed method finds optimal designs.
[Figure: Power (uW) versus Lockin time (us); performance trade-off curve and verification
points.]
Fig. 35. Veriﬁcation of performance trade-oﬀs for lockin time and power.
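The verification by random sampling amounts to a dominance check: no sampled design should
be at least as good as a trade-off point in both objectives and strictly better in one.
The Python sketch below shows such a check for two minimized objectives. The point values
are hypothetical and are not read from Fig. 35.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better
    in at least one (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def front_is_undominated(front, samples):
    """Check that no verification sample dominates any trade-off point."""
    return not any(dominates(s, f) for s in samples for f in front)


# Hypothetical (lockin time in us, power in uW) pairs, for illustration only.
front = [(1.0, 80.0), (2.0, 50.0), (4.0, 35.0)]
samples = [(1.5, 90.0), (3.0, 60.0), (5.0, 40.0)]

print(front_is_undominated(front, samples))
print(front_is_undominated(front, samples + [(1.5, 45.0)]))
```

The second call adds a sample that is both faster and lower in power than one trade-off
point, so the check fails, which is the situation the verification is meant to rule out.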
The runtime information is summarized in Table VII. The computation cost for the complete
flow is only a few hours, which is acceptable for large analog circuit synthesis.
Table VII. Runtime summary for PLL optimization.
                             VCO        Charge pump
Kriging model gen.           12 min     5 min
Multi-yield pareto gen.      103 min    117 min
Cost function optimization   10 min/point
C. Summary
In this chapter, automated synthesis of complex analog systems in scaled CMOS technologies
is addressed through numerical optimization built on efficient circuit modeling and a
system-level optimization formulation. We start with yield-aware optimization for
small-scale analog circuits, employing an iterative search based optimization approach to
efficiently seek optimal performance trade-offs under yield constraints in
high-dimensional design parameter and process variation spaces. We demonstrate the
proposed approach by constructing pareto front analog performance models. Moreover,
specific techniques, including a partial Kriging evaluation approach, have been developed
to facilitate efficient design space exploration while considering process variations.
Experimental results for one oscillator and one operational amplifier confirm the accuracy
and efficiency of the presented optimization approach.
To handle large analog/mixed-signal system optimization problems, we further propose a
methodology for yield-aware hierarchical optimization that addresses two problems:
generating yield-aware pareto fronts of the building blocks, and formulating system
performances at specified yield levels. The system-level performance distributions are
naturally captured by searching in the multi-yield pareto fronts while the efficiency of
hierarchical optimization is maintained. The proposed methodology is validated through
the optimization of a two-stage amplifier and a large charge-pump PLL design, with good
efficiency achieved in both cases.
CHAPTER IV
ON-CHIP TEST FOR ANALOG/MIXED-SIGNAL CIRCUITS
Besides yield-aware system optimization, the other approach to alleviating the influence
of process variations in analog/mixed-signal systems is to use on-chip test and
performance compensation functions to self-heal chips that fail because of process
variations. On-chip test and diagnosis can be implemented in several ways. Access to
internal analog signals can be very difficult in integrated circuits, so additional
circuitry is designed to modify the original structures and make the circuit performances
more accessible. Design techniques that add such testability features are classified as
design for test (DFT); they make the testing of analog/mixed-signal circuits easier. A
more complete realization is to integrate the measurement and failure classification
function blocks into the analog/mixed-signal circuits so that the systems can perform
self-diagnosis and make go/no-go decisions. This approach is called built-in self-test
(BIST) [38].
Careful design and optimization of the on-chip test function blocks are required to
capture parametric yield efficiently and effectively. In this chapter we first propose a
cost-effective linearity test method targeting switched-capacitor Sigma-Delta ADCs [39].
The underlying system analysis supports our idea that complex system performances can be
tested indirectly through easy-to-measure alternatives. In this way we can achieve
efficient and accurate built-in self-test in analog/mixed-signal systems. We further look
into different design choices of design-for-test schemes for charge-pump PLLs and perform
optimization to enhance test circuit efficiency [9].
A. Linearity Test for Sigma-Delta ADCs
Static linearity test of Sigma-Delta ADCs imposes stringent requirements on the precision
of the test signals and leads to excessive test time. Consequently, ADC test remains a
bottleneck in product development and contributes significantly to the development cost.
In this section, a cost-effective linearity test and diagnosis methodology is presented
for Sigma-Delta ADCs with multi-bit internal DACs. Frequency-domain nonlinear circuit
analysis is employed to systematically establish the connection between the static
linearity measure, integral nonlinearity (INL), and its frequency-domain counterpart,
harmonic distortions (HDs), making it possible to predict INL using much simpler HD
measurements. The efficacy of the proposed technique is demonstrated by constructing
accurate simulation-based INL prediction models, which are also compared against the
closed-form models that result directly from our circuit analysis.
1. System Analysis Using Volterra Series
Our central approach to low-cost ADC test and diagnosis is to predict INL from
frequency-domain measurements (HDs) that are easily obtained by applying a sinusoidal
input to the circuit under test (CUT). To achieve this goal, the connection between INL
and HDs must be established. This is accomplished by a discrete-time Volterra series
analysis suited to the nonlinear circuit behaviors of the targeted switched-capacitor
ΣΔ ADCs.
In Volterra series, the output of a nonlinear system is considered as a sum of responses
of increasing orders [40]. In the frequency domain, Volterra series can be used rather
straightforwardly to calculate harmonic distortions and intermodulations under
multiple-tone excitations. If the input signal $u(k)$ is an $n$-tone input
$u(k) = e^{j\omega_1 k} + e^{j\omega_2 k} + \cdots + e^{j\omega_n k}$, the intermodulation
component at the sum frequency $\omega_1 + \omega_2 + \cdots + \omega_n$ can be written as
\[
y_n(k) = n!\, H_n(\omega_1, \omega_2, \cdots, \omega_n)\,
  e^{jk(\omega_1 + \omega_2 + \cdots + \omega_n)} \tag{4.1}
\]
where $H_n(\omega_1, \omega_2, \cdots, \omega_n)$ is the $n$-th order nonlinear transfer
function. The nonlinear transfer functions can be considered as extensions of the familiar
linear (first order) transfer function and are used as a canonical characterization of
weakly nonlinear system behavior. For ΣΔ ADCs, the transfer functions can be used to
calculate HDs and INL.
a. Nonlinear System Modeling
The switched-capacitor ΣΔ ADCs are intrinsically nonlinear discrete-time systems.
To make Volterra series applicable for ΣΔ ADCs, we need to model the internal
quantizers properly. A widely used approach in design analysis is to consider the
quantizer as a linear gain model with added quantization noise at the output [4].
Usually, this additive quantization noise can be modeled as white noise over the
entire signal bandwidth. In spite of the approximation introduced, this choice allows
us to consider the strong nonlinear behavior of quantization through the standard
means of quantization noise.
For each switched-capacitor integrator in the system, the state transfer can be modeled as
a nonlinear function of the integrator's inputs and its current state
\[
y(k+1) = F(y(k), x(k), d(k)), \tag{4.2}
\]
where $y(k+1)$ is the current output of the integrator, $y(k)$ is the previous output, and
$x(k)$ and $d(k)$ are the previous input signal and digital feedback signal, respectively.
The nonlinear function can be further written as
\[
F(y(k), x(k), d(k)) = y(k) + a_1 \cdot u(k) + a_2 \cdot u(k)^2 + \cdots + a_n \cdot u(k)^n \tag{4.3}
\]
where $u(k)$ is the difference between the input analog signal and the digital feedback
signal, $a_1$ is the linear gain of the integrator transfer characteristic, and
$a_2, \cdots, a_n$ are the second-order to $n$-th order coefficients used to model the
nonlinearities of the integrator.
The transfer curve of an internal D/A converter in the system can be written as
\[
f = b_0 + b_1 \cdot D + b_2 \cdot D^2 + \cdots + b_n \cdot D^n \tag{4.4}
\]
where $f$ is the DAC output, $D$ is the digital input, $b_0$ is the offset, and
$b_1 \cdot D, \cdots, b_n \cdot D^n$ are the first to $n$-th order components at the DAC
output.
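The polynomial models of Eqns. 4.2 to 4.4 translate directly into two small functions. The
Python sketch below is illustrative only: the function names and the coefficient values in
the usage lines are hypothetical, and the list `a` holds $a_1, a_2, \cdots$ while `b` holds
$b_0, b_1, \cdots$.

```python
def poly_dac(code, b):
    """DAC transfer curve f = b[0] + b[1]*D + b[2]*D**2 + ... (form of Eqn. 4.4)."""
    return sum(bi * code ** i for i, bi in enumerate(b))


def integrator_step(y_prev, x, d, a):
    """One integrator update y(k+1) = F(y(k), x(k), d(k)) (form of Eqns. 4.2, 4.3).

    u is the difference between the analog input x and the feedback d;
    a[0] is the linear gain a1, a[1] is a2, and so on.
    """
    u = x - d
    return y_prev + sum(ai * u ** (i + 1) for i, ai in enumerate(a))


# Hypothetical coefficients: a weak second-order term in both blocks.
feedback = poly_dac(2, [0.0, 1.0, 0.01])             # 2 + 0.01 * 2**2
ideal = integrator_step(0.2, 0.5, 0.1, [0.5])        # linear integrator
weak = integrator_step(0.2, 0.5, 0.1, [0.5, 0.01])   # adds a2 * u**2
print(feedback, ideal, weak)
```

Setting all higher-order coefficients to zero recovers the ideal linear integrator and DAC,
which makes it easy to isolate the contribution of each nonlinear term.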
b. Nonlinear Transfer Function Analysis
With the previous analysis, we start to derive transfer functions in the frequency
domain using Volterra series. As an example, we will consider a second-order ΣΔ
ADC with 2-bit internal quantizer as shown in Fig. 36.
  

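[Figure: block diagram of the second-order ΣΔ ADC with input u(k), integrator outputs
X1(k) and X2(k), output Y(k), integrator coefficients (a11, a12, ...) and (a21, a22, ...),
and feedback DAC coefficients (b10, b11, b12, b13, ...) and (b20, b21, b22, b23, ...).]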
Fig. 36. Modeling of a second-order ΣΔ ADC.
The nonlinear characteristics of the integrators and the internal DACs are modeled using
second and third order polynomials, respectively. The state equations for the whole system
can be written as
\[
\begin{aligned}
X_1(k+1) &= X_1(k) + a_{11} F_{d,1}(k) + a_{12} F_{d,1}(k)^2, \\
X_2(k+1) &= X_2(k) + a_{21} F_{d,2}(k) + a_{22} F_{d,2}(k)^2, \\
Y(k) &= \alpha \cdot X_2(k),
\end{aligned} \tag{4.5}
\]
where $X_1(k)$ is the output of the first-stage integrator, $X_2(k)$ is the output of the
second-stage integrator and $Y(k)$ is the digital output. $F_{d,1}(k)$ and $F_{d,2}(k)$
are given as
\[
\begin{aligned}
F_{d,1}(k) &= u(k) - b_{10} - b_{11} Y(k) - b_{12} Y(k)^2 - b_{13} Y(k)^3, \\
F_{d,2}(k) &= X_1(k) - b_{20} - b_{21} Y(k) - b_{22} Y(k)^2 - b_{23} Y(k)^3,
\end{aligned} \tag{4.6}
\]
where $b_{10}, b_{11}, b_{12}, b_{13}, b_{20}, b_{21}, b_{22}, b_{23}$ are the coefficients
of the transfer curves of the two internal feedback DACs, $a_{11}, a_{12}, a_{21}, a_{22}$
are the nonlinear coefficients of the two integrators and $\alpha$ is the linear gain of
the quantizer. In Volterra analysis, transfer functions and responses at different orders
are analyzed recursively [41, 42]. We start by deriving the first order (linear) transfer
functions. When a single-tone signal $e^{j\omega k}$ is applied as the input, we substitute
each output $X(k)$ by its linear response $X(k) = H_1(\omega) e^{j\omega k}$ into
Equation 4.5 and solve the resulting system equations. We obtain the first order transfer
functions as
\[
\begin{aligned}
H_1^{x1}(\omega) &= \frac{a_{11}(e^{j\omega}-1) + a_{11} a_{21} b_{21} \alpha}
  {(e^{j\omega}-1)^2 + a_{21} b_{21} \alpha (e^{j\omega}-1) + a_{11} a_{21} b_{11} \alpha}, \\
H_1^{x2}(\omega) &= \frac{a_{11} a_{21}}
  {(e^{j\omega}-1)^2 + a_{21} b_{21} \alpha (e^{j\omega}-1) + a_{11} a_{21} b_{11} \alpha}, \\
H_1^{out}(\omega) &= \frac{a_{11} a_{21} \alpha}
  {(e^{j\omega}-1)^2 + a_{21} b_{21} \alpha (e^{j\omega}-1) + a_{11} a_{21} b_{11} \alpha},
\end{aligned} \tag{4.7}
\]
where $H_1^{out}(\omega)$ is the first order transfer function at the quantizer output.
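The first order transfer functions can be checked numerically. The Python sketch below
evaluates Eqn. 4.7 at a given discrete-time frequency and, as a cross-check, iterates the
linear part of Eqn. 4.5 (with $a_{12} = a_{22} = 0$ and only the linear DAC terms) for a DC
input. The coefficient values are hypothetical; with them the loop is stable and the DC
gain equals $1/b_{11}$, as Eqn. 4.7 predicts at $\omega = 0$.

```python
import cmath
import math


def first_order_tfs(w, a11, a21, b11, b21, alpha):
    """Return (H1_x1, H1_x2, H1_out) of Eqn. 4.7 at frequency w in rad/sample."""
    s = cmath.exp(1j * w) - 1.0
    den = s * s + a21 * b21 * alpha * s + a11 * a21 * b11 * alpha
    h_x1 = (a11 * s + a11 * a21 * b21 * alpha) / den
    h_x2 = a11 * a21 / den
    return h_x1, h_x2, alpha * h_x2


def dc_response(u, a11, a21, b11, b21, alpha, steps=200):
    """Iterate the linearized state equations for a constant input u."""
    x1 = x2 = 0.0
    for _ in range(steps):
        y = alpha * x2
        # Both integrators update from the previous state, as in Eqn. 4.5.
        x1, x2 = x1 + a11 * (u - b11 * y), x2 + a21 * (x1 - b21 * y)
    return alpha * x2


# Hypothetical coefficient values, for illustration only.
coef = dict(a11=0.5, a21=0.5, b11=1.0, b21=1.0, alpha=1.0)
h_dc = first_order_tfs(0.0, **coef)[2]
h_edge = first_order_tfs(math.pi / 128, **coef)[2]
y_dc = dc_response(0.3, **coef)
print(h_dc, abs(h_edge), y_dc)
```

Evaluating the same function at $\pi/128$ shows how little the response changes over the
signal band of a heavily oversampled converter.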
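To derive the second order transfer functions, a two-tone input
$e^{j\omega_1 k} + e^{j\omega_2 k}$ is applied to the system. For any response $X(k)$, we
consider its frequency component at $\omega_1 + \omega_2$, which is given as
\[
X(k) = 2 H_2(\omega_1, \omega_2)\, e^{j(\omega_1+\omega_2)k}. \tag{4.8}
\]
We substitute each second order response into Equation 4.5 and keep only the signal
components at $\omega_1 + \omega_2$. A set of equations is obtained as
\[
\begin{aligned}
2H_2^{x1}(\omega_1,\omega_2)\, e^{j(\omega_1+\omega_2)(k+1)} ={}& 2H_2^{x1}(\omega_1,\omega_2)\, e^{j(\omega_1+\omega_2)k} \\
 &+ a_{11}\left[-2 b_{11} Y(\omega_1,\omega_2)\, e^{j(\omega_1+\omega_2)k}
    - 2 b_{12} H_1(\omega_1) H_1(\omega_2)\, e^{j(\omega_1+\omega_2)k}\right] \\
 &+ a_{12}\left[-b_{11} H_1(\omega_1) - b_{11} H_1(\omega_2)\right]^2, \\
2H_2^{x2}(\omega_1,\omega_2)\, e^{j(\omega_1+\omega_2)(k+1)} ={}& 2H_2^{x2}(\omega_1,\omega_2)\, e^{j(\omega_1+\omega_2)k} \\
 &+ a_{21}\left[2H_2^{x1}(\omega_1,\omega_2)\, e^{j(\omega_1+\omega_2)k}
    - 2 b_{21} Y(\omega_1,\omega_2)\, e^{j(\omega_1+\omega_2)k}\right. \\
 &\qquad \left. - 2 b_{22} H_1^{x2}(\omega_1) H_1^{x2}(\omega_2)\, e^{j(\omega_1+\omega_2)k}\right] \\
 &+ a_{22}\left[-b_{21} H_1(\omega_1)\, e^{j\omega_1 k} - b_{21} H_1(\omega_2)\, e^{j\omega_2 k}\right]^2, \\
Y(\omega_1,\omega_2)\, e^{j(\omega_1+\omega_2)k} ={}& \alpha H_2^{x2}(\omega_1,\omega_2)\, e^{j(\omega_1+\omega_2)k},
\end{aligned} \tag{4.9}
\]
where $H_2^{x1}(\omega_1, \omega_2)$ and $H_2^{x2}(\omega_1, \omega_2)$ are the second
order transfer functions at the first and the second stage integrator outputs. The second
order transfer function of the ADC output $H_2^{out}(\omega_1, \omega_2)$ can be shown to
be
\[
\begin{aligned}
H_2^{out}(\omega_1, \omega_2) &= \alpha \frac{F_{21}}{F_{22}} H_1(\omega_1) H_1(\omega_2), \\
F_{21} &= 2 a_{12} b_{12} - a_{11} a_{21} b_{12}
  - (2 a_{22} b_{22}^2 - a_{21} b_{22})(e^{j(\omega_1+\omega_2)} - 1), \\
F_{22} &= (e^{j(\omega_1+\omega_2)} - 1)^2 + a_{21} b_{21} \alpha (e^{j(\omega_1+\omega_2)} - 1)
  + a_{11} a_{21} b_{11} \alpha.
\end{aligned} \tag{4.10}
\]
In a similar way, the third order transfer function of the ADC output can be derived as
\[
\begin{aligned}
H_3^{out}(\omega_1, \omega_2, \omega_3) &= \alpha \frac{F_{31}}{F_{33}}
  + \alpha \frac{F_{32}}{F_{33}} (e^{j(\omega_1+\omega_2+\omega_3)} - 1), \\
F_{31} &= (2 a_{21} a_{12} b_{11}^2 - 2 a_{21} a_{11} b_{12}) H_1(\omega_1) H_2(\omega_2, \omega_3) \\
 &\quad + (2 a_{21} a_{12} b_{11} b_{12} - a_{21} a_{11} b_{13}) H_1(\omega_1) H_1(\omega_2) H_1(\omega_3), \\
F_{32} &= (2 a_{22} b_{21}^2 - 2 a_{21} b_{12}) H_1(\omega_1) H_2(\omega_2, \omega_3) \\
 &\quad + (2 a_{22} b_{11} b_{12} - 2 a_{21} b_{12}) H_1(\omega_1) H_1(\omega_2) H_1(\omega_3), \\
F_{33} &= (e^{j(\omega_1+\omega_2+\omega_3)} - 1)^2
  + a_{21} b_{21} \alpha (e^{j(\omega_1+\omega_2+\omega_3)} - 1) + a_{11} a_{21} b_{11} \alpha.
\end{aligned} \tag{4.11}
\]
We only consider transfer functions up to the third order, which are usually adequate to
describe the weakly nonlinear system behavior.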
2. Predicting INL using HDs
a. Relating INL with Transfer Functions
The INL is defined as the deviation of the actual ADC transfer curve from the ideal
transfer curve. The ideal curve can be determined either by a least-squares fit through
the acquired samples or by a straight line through the two end points of the ADC [43].
Here, the two end points method is used to define the INL [44], as shown in Fig. 37.
[Figure: actual ADC transfer curve y = β1·x + β2·x^2 + ... + βn·x^n and ideal end-point
line y = γ·x + b.]
Fig. 37. Definition of integral nonlinearity.
The ADC transfer curve after offset and gain correction can be represented by an $n$-th
order polynomial
\[
y_{actual}(x) = \beta_1 x + \beta_2 x^2 + \cdots + \beta_n x^n, \tag{4.12}
\]
where $y_{actual}(x)$ is the ADC output, $x$ is the analog input and
$\beta_1, \beta_2, \cdots, \beta_n$ are the coefficients that model the system
nonlinearity. Suppose that the input ramp signal used to measure the INL rises from $-A$
to $A$. The slope $\gamma$ of the ideal transfer curve connecting the two end points can be
determined by evaluating the two end point ADC outputs using the nonlinear transfer
functions at DC
\[
\gamma = \frac{A H_1(0) + \cdots + A^n H_n(0, \cdots, 0)}{A - (-A)}
  - \frac{(-A) H_1(0) + \cdots + (-A)^n H_n(0, \cdots, 0)}{A - (-A)} \tag{4.13}
\]
If only the first three orders of transfer functions are considered, the ideal transfer
slope becomes $H_1 + A^2 \cdot H_3$ with offset $A^2 \cdot H_2$. So the ideal transfer
curve can be written as
\[
y_{ideal}(x) = (H_1 + A^2 \cdot H_3) \cdot x + A^2 \cdot H_2 \tag{4.14}
\]
where $H_1$, $H_2$, $H_3$ are the transfer functions at DC. With Equation 4.12 and
Equation 4.14, INL can be written as
\[
\begin{aligned}
INL(x) &= V_{scale} \cdot (x^2 H_2 + x^3 H_3 - x \cdot A^2 \cdot H_3 - A^2 \cdot H_2), \\
V_{scale} &= \frac{2^n - 1}{2A \cdot (H_1 + A^2 \cdot H_3)},
\end{aligned} \tag{4.15}
\]
where $V_{scale}$ is the scaling factor used to convert the deviation from the ideal
transfer curve into LSB scale, with $n$ here denoting the number of output bits.
b. Relating INL with HDs
The relationship between INL and the transfer functions has been derived in the previous
section. Measuring harmonic distortions using automatic test equipment becomes a feasible
way to estimate INL if a relationship between INL and HDs can be derived. Consider
applying a sinusoidal input $A \cdot \cos(\omega k)$ to measure the HDs. Such an input can
be decomposed into a pair of complex exponentials $A/2 \cdot e^{j\omega k}$ and
$A/2 \cdot e^{-j\omega k}$. The corresponding response can be computed using the nonlinear
transfer functions and leads to the following expressions for the HDs
\[
\begin{aligned}
l_1(\omega) &= \tfrac{1}{2} A H_1(\omega) e^{j\omega k} + \tfrac{1}{2} A H_1(-\omega) e^{-j\omega k} \\
  &= \tfrac{1}{2} A H_1(\omega) e^{j\omega k} + \tfrac{1}{2} A H_1^*(\omega) e^{-j\omega k}, \\
l_2(2\omega) &= \tfrac{1}{4} A^2 H_2(\omega, \omega) e^{2j\omega k}
  + \tfrac{1}{4} A^2 H_2(-\omega, -\omega) e^{-2j\omega k} \\
  &= \tfrac{1}{4} A^2 H_2(\omega, \omega) e^{2j\omega k}
  + \tfrac{1}{4} A^2 H_2^*(\omega, \omega) e^{-2j\omega k}, \\
l_3(3\omega) &= \tfrac{1}{8} A^3 H_3(\omega, \omega, \omega) e^{3j\omega k}
  + \tfrac{1}{8} A^3 H_3(-\omega, -\omega, -\omega) e^{-3j\omega k} \\
  &= \tfrac{1}{8} A^3 H_3(\omega, \omega, \omega) e^{3j\omega k}
  + \tfrac{1}{8} A^3 H_3^*(\omega, \omega, \omega) e^{-3j\omega k},
\end{aligned} \tag{4.16}
\]
where $l_1(\omega)$, $l_2(2\omega)$ and $l_3(3\omega)$ are the first, second and third
order harmonic components at the output, $H_1$, $H_2$, $H_3$ are the first, second and
third order transfer functions, and $H_1^*$, $H_2^*$, $H_3^*$ are the conjugates of $H_1$,
$H_2$, $H_3$, respectively. Combining the real and imaginary parts of the response, we can
rewrite Equation 4.16 as
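\[
\begin{aligned}
l_1(\omega) &= A\,\mathrm{Re}(H_1(\omega)) \cos(\omega k) - A\,\mathrm{Im}(H_1(\omega)) \sin(\omega k), \\
l_2(2\omega) &= \tfrac{1}{2} A^2\,\mathrm{Re}(H_2(\omega, \omega)) \cos(2\omega k)
  - \tfrac{1}{2} A^2\,\mathrm{Im}(H_2(\omega, \omega)) \sin(2\omega k), \\
l_3(3\omega) &= \tfrac{1}{4} A^3\,\mathrm{Re}(H_3(\omega, \omega, \omega)) \cos(3\omega k)
  - \tfrac{1}{4} A^3\,\mathrm{Im}(H_3(\omega, \omega, \omega)) \sin(3\omega k).
\end{aligned} \tag{4.17}
\]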
Notice that one of the characteristics of ΣΔ ADCs is oversampling: the sampling frequency
is much higher than the input signal bandwidth. The maximum discrete-time input signal
frequency is given as
\[
\omega_{max} = 2\pi \cdot f_b / f_{sample} = \pi / OSR, \tag{4.18}
\]
where $f_b$ is the input signal bandwidth, $f_{sample}$ is the sampling frequency and OSR
is the oversampling ratio. For example, OSR = 128 gives
$\omega_{max} = \pi/128 \approx 0.0245$ rad/sample. Normally OSR is quite large, so
$\omega_{max}$ is very small and the imaginary parts of the transfer functions in
Equation 4.11 can be neglected. Therefore, the real parts of the transfer functions are
good approximations of the complete transfer functions. In this case, Equation 4.16 can be
rewritten as
\[
\begin{aligned}
l_1(\omega) &\approx A H_1(\omega), \\
l_2(2\omega) &\approx \tfrac{1}{2} A^2 H_2(\omega, \omega), \\
l_3(3\omega) &\approx \tfrac{1}{4} A^3 H_3(\omega, \omega, \omega),
\end{aligned} \tag{4.19}
\]
where $\omega$ is near DC.
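The amplitude factors in Eqn. 4.19 can be checked with a small numerical experiment. The
Python sketch below treats the near-DC limit as a static polynomial
$y = H_1 x + H_2 x^2 + H_3 x^3$ with real coefficients (an assumption made only for this
illustration), drives it with $A \cos(\omega k)$ over an integer number of periods, and
extracts the harmonic amplitudes with a single-bin DFT. All numeric values are
hypothetical.

```python
import cmath
import math


def harmonic_amplitude(samples, cycles, order):
    """Amplitude of the given harmonic for a record holding `cycles` full periods."""
    n = len(samples)
    k = cycles * order  # DFT bin of the requested harmonic
    acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
              for i, s in enumerate(samples))
    return 2.0 * abs(acc) / n


# Hypothetical amplitude and DC transfer-function values.
A, H1, H2, H3 = 0.5, 1.0, 0.02, 0.01
n, cycles = 1024, 8

x = [A * math.cos(2 * math.pi * cycles * i / n) for i in range(n)]
y = [H1 * v + H2 * v ** 2 + H3 * v ** 3 for v in x]

l1 = harmonic_amplitude(y, cycles, 1)
l2 = harmonic_amplitude(y, cycles, 2)
l3 = harmonic_amplitude(y, cycles, 3)
print(l1, l2, l3)
```

The second and third harmonics come out as $A^2 H_2 / 2$ and $A^3 H_3 / 4$. The
fundamental is $A H_1$ plus a small third-order contribution of $3 A^3 H_3 / 4$, which is
the term dropped in the approximation $l_1 \approx A H_1$.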
The amplitude and phase of each output frequency component can be measured using a
spectrum analyzer or an on-chip FFT function block. The phase information can be used to
determine the sign of the transfer functions. Combining Equation 4.15 and Equation 4.19,
we can compute INL from HDs analytically. If we define each harmonic distortion as the
ratio of the corresponding harmonic component to the first order response, i.e.
$HD_i = l_i / l_1$, then Equation 4.19 gives $H_2 = 2 H_1 HD_2 / A$ and
$H_3 = 4 H_1 HD_3 / A^2$. Substituting these into Equation 4.15 and neglecting $4 HD_3$
against 1 in the denominator of $V_{scale}$, Equation 4.15 can be rewritten as
\[
INL(x) \approx \frac{2^n - 1}{A^3}
  \left(x^2 A\, HD_2 + 2 x^3 HD_3 - 2 x A^2 HD_3 - A^3 HD_2\right). \tag{4.20}
\]
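A direct evaluation of Eqn. 4.20 is sketched below in Python, together with a grid search
for the maximum absolute INL over the ramp range. The function names, the grid size and
the example values of the signed harmonic distortions are assumptions made for
illustration, not measured data.

```python
def inl_from_hd(x, amp, hd2, hd3, n_bits):
    """INL in LSB at input x, using the closed form of Eqn. 4.20.

    hd2 and hd3 are the signed ratios l2/l1 and l3/l1; amp is the ramp
    amplitude A and n_bits is the resolution n.
    """
    scale = (2 ** n_bits - 1) / amp ** 3
    return scale * (x ** 2 * amp * hd2 + 2 * x ** 3 * hd3
                    - 2 * x * amp ** 2 * hd3 - amp ** 3 * hd2)


def inl_max(amp, hd2, hd3, n_bits, points=201):
    """Largest |INL| on a uniform grid over [-amp, amp]."""
    xs = [-amp + 2 * amp * i / (points - 1) for i in range(points)]
    return max(abs(inl_from_hd(x, amp, hd2, hd3, n_bits)) for x in xs)


# Hypothetical example: 11-bit output, 0.6 V ramp amplitude.
print(inl_from_hd(0.0, 0.6, -1e-3, 2e-4, 11))
print(inl_max(0.6, -1e-3, 2e-4, 11))
```

By construction the expression vanishes at both end points $x = \pm A$, consistent with the
end-point definition of INL, and at mid-scale it reduces to $-(2^n - 1) HD_2$.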
3. Simulation-based Model Generation
We have derived closed-form models for predicting INL using simple HD measurements.
However, the prediction accuracy of these models is limited by the weakly nonlinear
assumption and by the omission of other sources of nonlinearity such as the quantizers. To
generate more accurate prediction models, we adopt a simulation-based approach in which
regression models predicting INL from HDs are extracted from a large population of
simulation data. Compared to closed-form analytical models, this simulation-based approach
accounts more faithfully for the various circuit nonlinearities. The difficulty is that
generating a large set of simulation data by transistor-level simulation while injecting
various parametric variations is computationally infeasible. To address this challenge, we
adopt the fast ΣΔ ADC simulation technique presented in Chapter II.
In principle, by using the simulation data, a regression model can be generated for INL
based on the HD measurements. For example, we can express a specification of interest as a
polynomial function of the harmonic distortions and use least-squares fitting to construct
such a model. In our case, the second and third order harmonic distortions with sign
information, together with the linear system gain (to represent $b_{11}$), are used as the
input parameters to the prediction model. We can formulate such a polynomial regression
model as
\[
\begin{bmatrix}
1 & H_{1,1} & \cdots & H_{1,1} HD_{2,1} & H_{1,1} HD_{3,1} & \cdots \\
\vdots & & & & & \\
1 & H_{1,n} & \cdots & H_{1,n} HD_{2,n} & H_{1,n} HD_{3,n} & \cdots
\end{bmatrix}
\begin{bmatrix} \beta_1 \\ \vdots \\ \beta_m \end{bmatrix}
=
\begin{bmatrix} F_1 \\ \vdots \\ F_n \end{bmatrix}
\tag{4.21}
\]
where $n$ sets of training data are used, $H_{1,i}$ is the $i$-th sample of the linear
system gain (the ratio of the output signal power to the input signal power), and
$HD_{2,i}$ and $HD_{3,i}$ are the $i$-th samples of the second and third order harmonic
distortions. The polynomial model we propose is of second order, so there are 10 terms in
total (one constant, three linear and six quadratic terms in the three inputs) to
represent a single output. $F_1$ to $F_n$ are the output values corresponding to the input
sets, and the coefficient vector $\beta$ of the polynomial regression model is obtained by
least-squares fitting.
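A least-squares fit of the form of Eqn. 4.21 can be sketched in a few lines of Python. The
example below builds the full second-order basis for three inputs (10 terms) and solves
the normal equations. It is a sketch under stated assumptions: the inputs are taken to be
already scaled to a comparable range, the training data are synthetic with made-up
coefficients, and the normal-equation solver is chosen for brevity rather than numerical
robustness.

```python
import random
from itertools import combinations_with_replacement


def quad_features(v):
    """Full second-order basis: 1, v_i, and v_i * v_j for all i <= j."""
    pairs = combinations_with_replacement(range(len(v)), 2)
    return [1.0] + list(v) + [v[i] * v[j] for i, j in pairs]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_least_squares(rows, targets):
    """Least-squares coefficients beta via the normal equations."""
    phi = [quad_features(r) for r in rows]
    m = len(phi[0])
    ata = [[sum(p[i] * p[j] for p in phi) for j in range(m)] for i in range(m)]
    atb = [sum(p[i] * t for p, t in zip(phi, targets)) for i in range(m)]
    return solve(ata, atb)


# Synthetic, noise-free training data with made-up coefficients.
rng = random.Random(1)
rows = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(40)]
targets = [0.5 + 2.0 * r[0] - 1.0 * r[1] * r[2] for r in rows]
beta = fit_least_squares(rows, targets)
print(len(beta), beta[0], beta[1], beta[8])
```

With the three inputs ordered as (gain, $HD_2$, $HD_3$), index 8 of `beta` multiplies the
cross term of the second and third inputs, so the fit recovers the constant, the first
linear coefficient and that cross term of the synthetic target.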
For the linearity test, we are mainly interested in the largest nonlinearity level in the
system, so the maximum INL ($INL_{max}$) is a good representation of the overall
nonlinearity. The operation of finding the maximum INL, however, is itself quite
nonlinear. Therefore, a low-order polynomial regression model may not be sufficient to
relate the HDs to the maximum INL. To address this issue, a more suitable regression tool,
the Support Vector Machine (SVM) [45], is adopted. SVM is a powerful method for building
highly nonlinear multivariate regression models. In SVM regression, we consider a set of
training data $\{(x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n)\}$, where $x_i$ is the input
and $y_i$ is the corresponding output. The input $x$ is mapped into a high-dimensional
feature space using a nonlinear transformation, and a best-fitting function is then
constructed in this feature space as
\[
f(x) = \omega \cdot \phi(x) + b \tag{4.22}
\]
where $\phi$ is the nonlinear transformation, $b$ is the bias term, and $\omega$ is the
model parameter to be decided. The goal of SVM regression is to find the values of
$\omega$ and $b$ that minimize the regression risk, so that the output $y$ can be
predicted from the input $x$.
4. Circuit Example
We demonstrate the accurate prediction of INL using simulation-based models and compare
them against the simpler closed-form analytical models. The test circuit is a second-order
switched-capacitor ΣΔ ADC with a 2-bit internal DAC. It is implemented in a 0.13 µm CMOS
technology with a single 1.5 V supply. The oversampling ratio is set to 128, the sampling
clock is 1 MHz and the digital filters are designed to generate 11-bit digital codes. The
major source of nonlinearity considered is the internal DAC mismatch, which is bounded
within a typical value of 1%.
We use Equation 4.20 to calculate INL directly from the harmonic distortions. Fig. 38
compares the INL calculated from the HDs using the analytical expression with the INL
simulated by the LUT-based simulator. The analytical model predicts the overall trend of
the INL curve well.
[Figure: INL (LSB) versus Input (V); curves for Calculated INL and Actual INL.]
Fig. 38. Comparison of INL curves predicted by analytical model and simulated results.
The simulation-based INL prediction model is built with 1,000 simulation runs of the
LUT-based simulator. In each run, the maximum INL value $INL_{max}$ and the HDs are
collected. The simulation-based model is built as an SVM [45] regression model. Since it
is too time consuming to perform a complete INL simulation with traditional
transistor-level simulation, here we only compare the predicted $INL_{max}$ values with
those simulated by our fast LUT-based simulator, as shown in Table VIII for three circuit
samples.
Table VIII. The accuracy of the maximum INL prediction.
Actual INL (LSB)    Analytical (LSB)    Simulation-based (LSB)
3.5819              3.2573              3.5818
4.9884              4.7706              4.9307
3.0084              2.6986              3.0364
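The relative errors implied by the three samples in Table VIII can be computed as in the
short Python sketch below. The data are the table entries; the helper name
`mean_relative_error` is introduced here only for illustration.

```python
# Values from Table VIII, in LSB.
actual = [3.5819, 4.9884, 3.0084]
analytical = [3.2573, 4.7706, 2.6986]
simulated = [3.5818, 4.9307, 3.0364]


def mean_relative_error(pred, ref):
    """Average of |pred - ref| / ref over all samples."""
    return sum(abs(p - r) / r for p, r in zip(pred, ref)) / len(ref)


err_analytical = mean_relative_error(analytical, actual)
err_simulated = mean_relative_error(simulated, actual)
print(f"analytical: {100 * err_analytical:.2f}%")
print(f"simulation-based: {100 * err_simulated:.2f}%")
```

For these three samples the analytical model is off by roughly 8% on average, while the
simulation-based model stays below 1%.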
In Fig. 39, the accuracy of the simulation-based INL prediction model is verified for a
large set of circuit instances. The predictions track the simulated values closely; in
this case, the average relative error is 0.69%.
[Figure: Predicted INL (LSB) versus Actual INL (LSB).]
Fig. 39. The accuracy of $INL_{max}$ prediction using the simulation-based model.
B. On-chip Test Design and Optimization for PLL
Test of phase-locked loops has been hampered by the complex mixed-signal nature of the
system operation. While several on-chip test schemes have been proposed to reduce the cost
of PLL test, a more systematic DFT development methodology, specifically targeting the
growing parametric failures in nanometer VLSI technologies, is yet to be developed. In
this section, we utilize the PLL modeling framework of Chapter II, which can realistically
map device-level process variations to variations of the system-level performances. Our
parametric modeling techniques make it feasible to examine the correlations between the
system performances and specific DFT measurements through behavioral-level simulations. An
efficient methodology is developed to facilitate the evaluation and optimization of PLL
DFT schemes. The application of our DFT development methodology is demonstrated by
generating optimized DFT schemes that produce low misprediction levels for the detection
of parametric failures of charge-pump PLLs.
1. DFT schemes for Parametric Failure Detection
The direct measurement of internal circuit nodes in a PLL is costly and may also degrade
the PLL performance [46, 47]. A better approach is to utilize the existing digital blocks,
for example by using the frequency divider as a counter and reading out its state in order
to detect chip failures [46, 47, 48, 49]. As illustrated in this chapter, the performances
of the different blocks are intrinsically connected. The frequency divider/counter output
is expected to change significantly if there is a catastrophic fault. Parametric failures,
however, may produce smaller variations in the readout values. Hence, they are more
difficult to detect and deserve more careful treatment. The three DFT schemes shown in
Fig. 40 are under consideration.
 



	








	

	












	

 






	






	
[Figure: block diagrams of the three candidate DFT schemes.]
Fig. 40. DFT scheme candidates.
Similar in spirit to the existing DFT schemes, the main idea of the proposed DFT schemes
is to control the charge pump in such a way that the output frequency of the PLL is
altered, and to read out the state of the frequency divider at a certain time instant for
failure detection. Device-level variations and mismatch perturb the operation of the PLL
and can push the system performances out of the specification window. The same parametric
variations may be reflected in the readout values of the frequency divider. Parametric
failures may therefore be detected if the states of the frequency divider are strongly
correlated with the design performances.
The difference of the counter output $\Delta N$ during a constant time period can be
written as
\[
\Delta N = \int_{T_1}^{T_2} F_{vco}(t)\, dt = \int_{T_1}^{T_2} f(V_{con}(t))\, dt \tag{4.23}
\]
where $T_1$ and $T_2$ define the time interval and $F_{vco}$ is the VCO frequency, which is
a function $f$ of the VCO control voltage $V_{con}$. For the DFT scheme development, we
need to decide the way in which the VCO control voltage is altered and to define suitable
test times $T_1$ and $T_2$ for failure detection. Each of the DFT schemes in Fig. 40 is
discussed in detail as follows.
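Eqn. 4.23 can be evaluated numerically once a VCO model and a control-voltage trajectory
are chosen. The Python sketch below uses the trapezoidal rule and assumes a linear VCO,
$F_{vco} = F_{offset} + K_{vco} \cdot V_{con}$. The linear model, the function name and the
control-voltage waveforms are assumptions for illustration; the gain and offset values are
borrowed from the initial design in Table VI only to give realistic magnitudes.

```python
def counter_delta(v_con, t1, t2, f_offset, k_vco, steps=1000):
    """Approximate Delta N, the integral of F_vco(t) over [t1, t2] (Eqn. 4.23).

    v_con is a function of time returning the control voltage in volts.
    The VCO is assumed linear: F_vco = f_offset + k_vco * v_con.
    """
    h = (t2 - t1) / steps

    def f_vco(t):
        return f_offset + k_vco * v_con(t)

    # Trapezoidal rule: half weight on the two end points.
    total = 0.5 * (f_vco(t1) + f_vco(t2))
    total += sum(f_vco(t1 + i * h) for i in range(1, steps))
    return total * h


F_OFFSET = 1.54e9  # Hz, illustrative
K_VCO = 2.43e9     # Hz/V, illustrative

# Constant control voltage versus a linear ramp with the same average value.
dn_const = counter_delta(lambda t: 0.2, 0.0, 1e-6, F_OFFSET, K_VCO)
dn_ramp = counter_delta(lambda t: 0.4 * t / 1e-6, 0.0, 1e-6, F_OFFSET, K_VCO)
print(dn_const, dn_ramp)
```

Both trajectories give the same count because they share the same average control voltage,
which illustrates that the readout reflects the integrated VCO frequency over the test
window rather than its instantaneous value.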
a. Scheme 1
The ﬁrst DFT scheme is similar to the one adopted in [48]. In the normal operation
mode, the reference input and the output of the frequency divider are applied to
the frequency detector to form the closed loop conﬁguration. In the test mode,
the output of the frequency divider is disconnected from the input of the frequency
detector. The reference input and or its delayed versions are fed through the muxes
to the frequency divider forming a open loop conﬁguration. The ﬁrst delay element
has a larger delay value than the second one. To charge up the VCO, the reference
input and its delayed version through delay 2 are applied to the frequency detector.
To charge down the VCO, the delayed versions of the reference input by both delay
1 and delay 2 are selected. The delay values of the two delay elements determine the
92
phase error introduced at the frequency detector inputs. Hence, they also dictate the
coverage of the VCO tuning range in this DFT setup. Under typical design values,
delay values in the order of ten’s of the reference clock signal period are required,
which may cost signiﬁcant silicon area to implement.
For all three schemes, the counter read-out signals, which control the start
and end points of a single test run, are generated by passing the reference clock
signal $F_{ref}$ through a series of D flip-flops. As such, the contents of the frequency
divider within a defined time interval are read out.
b. Scheme 2
To solve the silicon overhead problem of scheme 1, we propose the second DFT
scheme, which employs an inverter to introduce the phase difference. Since this
configuration introduces a constant phase delay of $\pi$ at the inputs of the frequency
divider, the charge pump experiences the following sequence of operations: charge up →
stop → charge up → stop, until the control voltage of the VCO reaches the full voltage
swing.
c. Scheme 3
The third DFT scheme is configured as follows: first, the PLL is put in a standard
closed-loop configuration and a standard phase lock test is performed. Once the
PLL is locked, the feedback signal frequency is changed from $F_{out}$ to $2F_{out}$ by using
the mux to select the output of the second-to-last D flip-flop in the divider.
A brief comparison of the three DFT schemes is shown in Table IX.
Table IX. Comparison of DFT schemes.
DFT scheme   Area cost   Test time
Scheme 1     high        short
Scheme 2     low         short
Scheme 3     low         medium
2. DFT Evaluation and Optimization
The overall flow of the evaluation and optimization of a given DFT scheme is shown in
Fig. 41. The most straightforward way to evaluate a DFT scheme is to perform Monte-Carlo
simulations and examine the ability of using the specific DFT measurements to
predict the pass/fail status of the design. Since a set of a few hundred Monte-Carlo PLL
simulations may take tens of hours to complete, more runtime-efficient approaches
are needed, especially for optimization purposes.
[Fig. 41 flowchart:
(1) Perform Monte-Carlo PLL simulations using parametric Verilog-A models; collect
    samples of specifications.
(2) Perform parameter reduction to identify a reduced set of key Verilog parameters: Zv.
(3) For a given DFT scheme: perform a few simulations to build a nonlinear regression
    model for each measurement: Ti = fi(Zv) in Zv.
(4) Use the above regression models to quickly generate large samples of measurements
    for each Ti.
(5) Construct a regression model relating the specifications with the measurements.
(6) Compute sensitivities of specifications w.r.t. measurements.
(7) Have achieved high sensitivities? If yes, end; otherwise adjust DFT setups and
    re-evaluate.]
Fig. 41. Evaluation and optimization of a DFT scheme.
a. Identiﬁcation of Key System Level Variation Sources
As shown in Fig. 41, prior to the evaluation of any DFT scheme, Monte-Carlo simulations
utilizing pre-characterized parametric Verilog-A models are conducted to collect
a set of PLL performance samples by sampling the underlying process variations.
Although expensive, this process is a one-time cost since it is only used to
establish the correspondence between the device-level variations and the variations in
the system performances.
To facilitate efficient subsequent DFT evaluation and optimization steps, it is
desirable to identify a potentially smaller set of key system-level variation sources
that contribute most of the specification variations. If this goal is achieved, the
following steps can be conducted more efficiently over a compressed parameter space
consisting of the most critical sources of variation. This parameter dimension reduction
task can be achieved by applying the design-specific dimension reduction technique
described in Chapter II. More specifically, by utilizing the simulation data collected in
the prior step, the RRR-based parameter dimension reduction is applied to identify a small
set of new critical system-level variation sources, $Z_v$, which are linear or nonlinear
combinations of a potentially large set of behavioral-level model parameters
of all the PLL building blocks.
b. DFT Evaluation and Optimization
The optimization of a given DFT scheme with $n$ digital outputs is illustrated in the
second half of Fig. 41. Since a DFT scheme may be evaluated many times under
different setups (e.g., the time interval within which the states of the frequency divider
are read out) within the optimization loop, the correlation between the DFT scheme
and the PLL performances must be evaluated efficiently. This goal is achieved by
utilizing the critical sources of variation, $Z_v$, identified in the previous step.
Since $Z_v$ contains only a small set of variations, a nonlinear empirical
model relating each measurement $T_i$ of the given DFT scheme to $Z_v$ can be generated
rather efficiently. This is achieved by conducting a few Verilog-A based PLL simulations
at different $Z_v$ samples and performing nonlinear regression: $T_i = f_{tv}(Z_v)$.
Note that this step does not incur a high simulation cost since the regression models
are built only over the low-dimensional parameter space represented by $Z_v$. Using these
easily obtained regression models, a large set of samples for each $T_i$ can be generated
efficiently. By examining the measurement data and the pre-computed system performance
data, sensitivity measures can be computed to evaluate the effectiveness
of the DFT scheme as follows. The overall sensitivity of the $i$-th system performance
$S_i$ with respect to a given DFT scheme is defined as
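\[ M_i = \left|\frac{\partial S_i}{\partial T_1}\right| + \left|\frac{\partial S_i}{\partial T_2}\right| + \cdots + \left|\frac{\partial S_i}{\partial T_n}\right|, \tag{4.24} \]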
where $n$ is the total number of digital outputs in the DFT scheme, and
$\partial S_i / \partial T_j$ is the sensitivity of $S_i$ with respect to the $j$-th digital
output. All the sensitivities in the above equation are obtained via least-squares fitting
based on the measurement and performance data mentioned before. Note that
$\partial S_i / \partial T_j$ is normalized with respect to the variance of $T_j$.
For a given DFT scheme, efficient optimization is conducted to find an optimal setup
that leads to the highest overall sensitivity.
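The sensitivity metric of Eq. 4.24 can be sketched as follows. This is a minimal illustration, assuming a linear least-squares fit of one specification against the digital outputs and assuming that the normalization is done by standardizing each output to unit variance before fitting (one possible reading of the normalization described above). The helper names and the synthetic data are hypothetical.

```python
import random
import statistics


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def overall_sensitivity(measurements, spec):
    """M_i = sum_j |dS_i/dT_j|, fitted by least squares on standardized T_j."""
    n_out = len(measurements[0])
    cols = [[row[j] for row in measurements] for j in range(n_out)]
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    # Design matrix: intercept column plus standardized measurements.
    x = [[1.0] + [(row[j] - means[j]) / stds[j] for j in range(n_out)]
         for row in measurements]
    p = n_out + 1
    xtx = [[sum(r[u] * r[v] for r in x) for v in range(p)] for u in range(p)]
    xty = [sum(r[u] * s for r, s in zip(x, spec)) for u in range(p)]
    coeffs = solve_linear(xtx, xty)   # normal equations of the least-squares fit
    return sum(abs(c) for c in coeffs[1:])


# Synthetic data: two digital outputs and one specification (all values assumed).
random.seed(0)
measurements = [[random.gauss(100.0, 5.0), random.gauss(50.0, 2.0)] for _ in range(200)]
spec = [2.0 + 0.3 * t1 - 0.1 * t2 for t1, t2 in measurements]
m_i = overall_sensitivity(measurements, spec)
```

In an optimization loop, this value would be recomputed for each candidate DFT setup, and the setup with the largest value would be retained.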
Once an optimal DFT scheme has been identified, a more accurate correlation
model that is capable of accurate pass/fail chip prediction using the corresponding
DFT measurements is extracted. To capture the potentially nonlinear correspondence
between the design performances and the measurements, a Support Vector Machine
(SVM) [45] is adopted as the classifier. SVM is a powerful method for building highly
nonlinear multivariate regression/classification models. In SVM regression, we consider
a set of training data $\{(x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n)\}$,
where $x_i$ is the input vector and $y_i$ is the corresponding output. The input $x$ is
mapped into a high-dimensional feature space using a nonlinear transformation, and
a best-fitting function is then constructed in this feature space as
\[ y = f(x) = \omega \cdot \phi(x) + b \tag{4.25} \]
where $\phi$ is the nonlinear transformation, $b$ is the bias term, and $\omega$ represents
the model parameters to be determined. Based on this nonlinear function $f(\cdot)$, we can
classify a chip as faulty or not from the DFT circuit outputs.
3. Optimization Example
We demonstrate the application of the proposed DFT development methodology with
a PLL design example. The PLL circuit and the proposed on-chip test schemes are
implemented in a 90-nm CMOS technology. The specifications of the PLL are listed in
Table X.
Table X. PLL specifications.
Technology                90 nm CMOS
Supply Voltage            1.2 V
Power Consumption         0.98 mW
Reference Signal          10.7 MHz
Center Frequency          1.37 GHz
Frequency Divider Ratio   128
We consider three system performances. The first one is the startup time from
power-on to the locked mode, which is the lock time from the zero-frequency
condition to the operating frequency. We denote it as $\mathit{locktime1}$. The second
specification, $\mathit{locktime2}$, mainly targets the dynamic behavior of the PLL and
is measured as the time needed to go from the locked mode to a new locked mode at
half the output frequency. The third specification considered is the maximum frequency
$\mathit{maxfreq}$, which determines the highest clock frequency that the PLL can generate.
The device-level parameter variations with spatial correlations are modeled following
the work in [50, 51]. We use multivariate Gaussian distributions to model various
transistor parameter variations, such as the variations of the threshold voltages.
For each variational parameter, the variation is set to $3\sigma = 10\%$ for the 90-nm CMOS
technology [27].
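Sampling spatially correlated Gaussian parameters with $3\sigma = 10\%$ can be sketched as follows. This is only an illustration: the exponential distance-based correlation, the correlation length, the device positions, the nominal value, and the reading of 10% as relative to the nominal value are all assumptions, and the actual correlation model follows [50, 51].

```python
import math
import random
import statistics


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_parameters(positions, nominal, corr_length, rng):
    """Draw one correlated sample of a device parameter for every position."""
    sigma = 0.10 * nominal / 3.0   # 3*sigma = 10% of nominal (assumed interpretation)
    n = len(positions)
    # Assumed covariance: correlation decays exponentially with distance.
    cov = [[sigma ** 2 * math.exp(-abs(positions[i] - positions[j]) / corr_length)
            for j in range(n)] for i in range(n)]
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Correlated deviations are L z, where cov = L L^T.
    return [nominal + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]


rng = random.Random(0)
positions = [0.0, 1.0, 2.0, 3.0]   # hypothetical device coordinates in um
samples = [sample_parameters(positions, 0.3, 2.0, rng) for _ in range(4000)]
sigma_est = statistics.pstdev(s[0] for s in samples)
```

Each sample vector would then be passed to the circuit or behavioral simulation as one Monte-Carlo point.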
a. Performance Modeling
The frequency versus control voltage curve of the VCO is extracted using the efficient
modeling framework in Chapter II. First, we simulate the VCO for a few clock cycles
and gather the time-domain output response as $Y$. Then RRR is applied to obtain
a reduced parameter set $Z$ that represents the important device-level parameters. A
parametric model of the VCO in terms of $Z$ is then built. To model the statistical
characteristics of the VCO accurately, a sixth-order polynomial is used to fit
the output frequency vs. control voltage curve.
The Verilog-A models for the other building blocks are extracted in a similar fashion.
Specifically, the charge pump model is generated using a third-order polynomial in the
output voltage for each charge-up/down current. A total of 17 Verilog-A
model parameters are extracted for the complete PLL design.
b. Test Scheme Evaluation and Optimization
Prior to the DFT evaluation and optimization, we perform Monte-Carlo simulations
to collect a large set of system performance samples. This is accomplished using our
efficient parametric modeling infrastructure. The distribution of each specification is
shown in Fig. 42. The pass/fail decision is based on whether all specifications
fall within their performance windows. The targeted specification windows are set as
follows: $\mathit{locktime1} < 2.5\,\mu s$, $\mathit{locktime2} < 2.5\,\mu s$ and $\mathit{maxfreq} > 2.085$ GHz.
[Fig. 42: three histograms showing the number of samples versus Locktime1 (us),
Locktime2 (us) and Maxfreq (GHz).]
Fig. 42. Distribution of system performances.
As discussed previously, each DFT scheme is based on reading out the states
of the frequency divider, and the corresponding digital codes are used to identify
parametric failures. Since the bit length of each digital code is limited by the number
of D flip-flops in the frequency divider, overflow may occur. When overflow happens,
we add $2^m$ to compensate for it, where $m$ is the number of D flip-flops in the frequency
divider. The proposed DFT evaluation and optimization methodology is applied to
each of the three DFT schemes. At the end of the optimization, we rank the effectiveness
of each test scheme by computing the total DFT sensitivity to each specification, as
listed in Table XI. Our analysis results indicate that scheme 1 is the most effective
scheme since it has the largest sensitivities for all the specifications. Scheme 2 is
identified as the least effective scheme.
Table XI. Sensitivities of system performance to DFT schemes.
               Spec. 1   Spec. 2   Spec. 3
DFT Scheme 1   4.7439    2.457     1.8836
DFT Scheme 2   2.3752    0.6124    0.0905
DFT Scheme 3   2.6324    1.4951    1.2698
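The overflow compensation mentioned above (adding $2^m$ when the divider counter wraps) can be sketched as follows. This is a minimal illustration, assuming that the underlying count never decreases and that the counter wraps at most once between consecutive readouts; the function name and the example codes are hypothetical.

```python
def unwrap_counter(readouts, m):
    """Add 2**m every time an m-bit counter readout wraps around."""
    modulus = 1 << m
    unwrapped = []
    offset = 0
    previous = None
    for code in readouts:
        if previous is not None and code < previous:
            offset += modulus   # the counter overflowed since the last readout
        unwrapped.append(code + offset)
        previous = code
    return unwrapped


# Example with a 7-bit counter (m = 7 is an assumed value for illustration).
codes = unwrap_counter([100, 120, 12, 40, 5], 7)
```

If the count can advance by more than $2^m$ between two readouts, the wrap count is ambiguous and the readout interval would have to be shortened.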
c. DFT Scheme Veriﬁcation
The three DFT schemes have been ranked in the previous section using sensitivity
analysis. To verify this result, an SVM model is extracted for each scheme to predict
the pass/fail status of the chips based on the corresponding DFT outputs. A total of 400
Monte-Carlo samples are generated by conducting PLL system simulations
using Verilog-A macromodels. These data are used to train the SVM model. To
evaluate the effectiveness of each scheme more reliably, another 100 Monte-Carlo
simulations are carried out and used as test data for checking the accuracy of the
SVM model. The pass/fail predictions achieved through the three SVM models are
compared against the simulated chip performances, as shown in Fig. 43. Here, the
predictions made through simulation are labeled as “direct measurement”, and
+1 indicates a chip classified as “fail” while −1 indicates the opposite. An
offset has been applied to each DFT scheme wherever a failing chip is detected (at
points of +1) for easier comparison.
[Fig. 43: pass/fail indicator versus chip index (0 to 100) for direct measurement and
DFT schemes 1, 2 and 3.]
Fig. 43. Pass/fail predictions of three DFT schemes.
From Fig. 43 we can see that DFT scheme 1 has only one misclassification.
The performance of DFT scheme 2 is verified to be poor, as it can only detect two
faulty chips, which may have the largest variations. DFT scheme 3 can detect more failures
than scheme 2 but is still not as good as scheme 1. The performance of each DFT
scheme is summarized in Table XII. The Monte-Carlo simulation results confirm
the validity of our sensitivity analysis.
Table XII. Comparison of DFT schemes to identify faulty chips.
               Defect escape   Yield loss   Overall accuracy
DFT Scheme 1   0.0%            1.0%         99%
DFT Scheme 2   13.0%           0.0%         87%
DFT Scheme 3   7.0%            3.0%         90%
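The three columns of Table XII can be computed from pass/fail labels as sketched here. The sketch assumes that defect escape and yield loss are both expressed as fractions of all tested chips, which is consistent with each row of the table summing to 100%; the label convention (+1 for fail, −1 for pass) follows the text, and the ten-chip example data are hypothetical.

```python
def classification_metrics(actual, predicted):
    """Defect escape, yield loss and accuracy as fractions of all tested chips."""
    n = len(actual)
    # Defect escape: a failing chip (+1) is predicted as passing (-1).
    escapes = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == -1)
    # Yield loss: a passing chip (-1) is predicted as failing (+1).
    losses = sum(1 for a, p in zip(actual, predicted) if a == -1 and p == 1)
    return {
        "defect_escape": escapes / n,
        "yield_loss": losses / n,
        "accuracy": (n - escapes - losses) / n,
    }


# Hypothetical ten-chip example (not data from the dissertation).
actual = [1, 1, -1, -1, -1, -1, -1, -1, 1, -1]
predicted = [1, -1, -1, -1, 1, -1, -1, -1, 1, -1]
metrics = classification_metrics(actual, predicted)
```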
The faulty chips that cannot be detected by the various DFT schemes are mostly
near the system specification boundaries. This can be confirmed by examining the
locations of mispredictions in the specification space. Such an analysis is done for
schemes 1 and 3 in Fig. 44 and Fig. 45, where the dashed cube represents the acceptance
region of the specifications.
[Fig. 44: 3-D scatter plot of correct predictions and mispredictions over Locktime1 (us),
Locktime2 (us) and Maxfreq (GHz).]
Fig. 44. Chip prediction distribution for DFT scheme 1.
[Fig. 45: 3-D scatter plot of correct predictions and mispredictions over Locktime1 (us),
Locktime2 (us) and Maxfreq (GHz).]
Fig. 45. Chip prediction distribution for DFT scheme 3.
For an effective DFT scheme, changes in the process parameters should be reflected
in the digital outputs so that potential parametric failures can be detected.
To examine this, we consider two specific perturbed PLL circuits. In the first circuit,
relatively small parametric variations are introduced in the VCO, and the circuit
is verified to meet all the specifications. In the second circuit, large
perturbations are introduced into the charge pump, causing parametric failures. We
show the digital outputs produced by DFT scheme 1 for the nominal circuit and the
two perturbed circuits in Fig. 46. As can be seen, the digital outputs of the first
circuit do not vary significantly from those of the nominal circuit, and the circuit is
correctly classified as a good chip by the DFT results. In contrast, the digital outputs
of the second circuit vary significantly from the nominal values, and the circuit is
correctly classified by the DFT scheme as a failing chip.
[Fig. 46: digital output versus readout index (1 to 9) for the nominal circuit, the
charge pump variation and the VCO variation.]
Fig. 46. Digital output changes due to process variation for DFT scheme 1.
d. DFT Trade-oﬀ Analysis
We further look into the trade-off between the accuracy and the number of digital
outputs for the considered DFT schemes. This is important since fewer test codes
correspond to a shorter test time, provided that a similar accuracy can be achieved. This
trade-off analysis is conducted for every scheme in Fig. 47. It can be observed that
for a small number of digital outputs, the accuracy of scheme 2 is actually higher
than that of scheme 3. It can also be seen that the accuracy of scheme 3
quickly saturates as the number of outputs increases. In all cases, scheme 1
is the optimal choice.
[Fig. 47: error (0 to 0.2) versus number of test codes (3 to 9) for DFT schemes 1, 2
and 3.]
Fig. 47. Error vs. number of test codes.
C. Summary
In this chapter we address the problem of testing analog/mixed-signal circuit
performances with consideration of process variations. We propose to use Volterra series
to analyze the nonlinear circuit behaviors of Sigma-Delta ADCs in the frequency
domain. This analysis allows us to formally relate the linearity metrics to easily
obtained harmonic distortion measurements. Closed-form analytical models
have been derived to allow low-cost linearity tests based on simple HD measurements.
To further improve the prediction accuracy, a simulation-based model generation
approach is adopted, which is enabled by the fast LUT-based ADC simulation
technique presented in Chapter II. The good accuracy of the proposed prediction
models is verified by the close agreement between the INLs obtained by the proposed
linearity test method and the actual simulated values.
A design-for-test development methodology targeting the detection of parametric
failures in charge-pump PLLs is presented. This methodology is enabled by detailed
bottom-up macromodeling, which leads to a scalable parametric PLL simulation
infrastructure based on the Verilog-A behavioral modeling presented in Chapter II.
In conjunction with the powerful dimension reduction techniques employed
to deal with the high-dimensional process variation space, efficient PLL DFT scheme
evaluation and optimization are conducted. As a demonstration of the proposed techniques,
three DFT schemes are evaluated and optimized. Monte-Carlo simulations are performed,
which confirm the results obtained from our DFT methodology. Furthermore, trade-off
analysis concerning detection accuracy and test time for the DFT schemes is also
performed.
CHAPTER V
DESIGN CASE: ALL-DIGITAL PLL
In this chapter we apply the modeling, optimization and testing techniques discussed
in the previous chapters to a complex all-digital PLL (ADPLL) design case. We
start with understanding the operation of the ADPLL system, then implement the building
blocks at the transistor level. System optimization considering the uniqueness of
the digital-intensive implementation and system reconfigurations is performed with
efficient variation-aware block models. On-chip performance detection and performance
tuning functions are also implemented to detect parametric failures and carry out
performance compensation.
A. System Background
The concept of ADPLL was first proposed for clock generation with integer-N frequency
multiplication [52, 53]. With the digital interface, PLL frequencies can be
controlled directly by digital logic blocks or microcontrollers. Since clock generation
does not require very accurate frequency steps, most of these systems featured
ring oscillators with integer-N frequency multiplication. Recently proposed ADPLL
designs mainly target wireless communication applications, which have tight
system specifications for jitter, power and frequency resolution [54, 55, 56, 57]. Most
of these systems feature fractional-N frequency multiplication and employ LC-tank
oscillators to achieve good phase noise performance. In this chapter, we focus on the
design of fractional-N ADPLLs with LC-type oscillator cores.
An example ADPLL block diagram is shown in Fig. 48. The system output
frequency $F_{out}$ is determined by the frequency division ratio (FDR). If $N_i$ is the
integer part of the frequency division ratio and $N_f$ is the fractional part, the system
output frequency $F_{out}$ equals $FDR = N_i + N_f$ times the reference signal frequency
$F_{ref}$.
Fig. 48. All-digital PLL system block diagram.
The core of the ADPLL is the digital controlled oscillator (DCO), whose frequency
is controlled by digital codes. The control words for the DCO are derived from the phase
difference between the reference signal and the output signal. During each reference
clock period, a clock accumulator (ACU) counts the integer number of output clock
($F_{out}$) cycles in that period, and a time-to-digital converter (TDC) measures
the fractional part.
If the PLL is stabilized, the accumulated clock cycle number should be equal to the
frequency division ratio:
\[ ACU[k] - ACU[k-1] + TDC_{gain} \cdot (TDC[k] - TDC[k-1]) = N_i + N_f \tag{5.1} \]
where $ACU[k]$ and $TDC[k]$ are the ACU and TDC outputs in the $k$-th clock cycle, and
$TDC_{gain}$ is the scaling factor that converts the TDC outputs from digital measurement
results to the frequency domain. If the resolution of the TDC is $T_{res}$ and the
output clock period is $T_{out}$, the TDC gain can be calculated as [58]
\[ TDC_{gain} = \frac{T_{res}}{T_{out}} = T_{res} \cdot F_{out} \tag{5.2} \]
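Eqs. 5.1 and 5.2 can be checked with a small numerical sketch. The operating point below (reference frequency, division ratio, TDC resolution) is hypothetical, and the idealized behavioral models of the ACU and TDC (an exact cycle count and an unquantized fractional measurement) are assumptions made only to illustrate the lock condition.

```python
import math

# Hypothetical operating point (assumed for illustration only).
F_REF = 40.0e6          # reference frequency in Hz
FDR = 60.25             # frequency division ratio N_i + N_f
F_OUT = FDR * F_REF     # output frequency in Hz
T_RES = 20.0e-12        # TDC resolution in seconds

TDC_GAIN = T_RES * F_OUT   # Eq. 5.2: T_res / T_out


def acu(k):
    """Integer number of output cycles accumulated after k reference cycles."""
    return math.floor(k * FDR)


def tdc(k):
    """Fractional output cycle after k reference cycles, in units of T_res (unquantized)."""
    fraction = k * FDR - math.floor(k * FDR)
    return fraction / TDC_GAIN


def lock_lhs(k):
    """Left-hand side of Eq. 5.1 for reference cycle k."""
    return acu(k) - acu(k - 1) + TDC_GAIN * (tdc(k) - tdc(k - 1))
```

With a real TDC, `tdc(k)` would be rounded to an integer code, so the left-hand side would match the division ratio only to within roughly one TDC step scaled by the TDC gain.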
If the counted output clock cycles do not meet the required frequency division
ratio, the difference detected by the ACU and TDC is passed through the loop filter to
adjust the DCO frequency. The digital filter in the loop smooths the frequency tuning.
In ADPLL systems, the digital implementation of the loop filter eliminates the
large capacitors needed in a conventional charge-pump PLL and is robust to process
variations.
In practice, there are always nonlinearities and noise in the building blocks, such as
thermal noise in the DCO and limited time resolution in the TDC. These nonidealities are
noise-shaped by the loop filter at the system output, as shown in Fig. 49. Careful design
of the loop characteristics can reduce the total phase noise by balancing the
contributions from different noise sources, so the filter design is critical for the ADPLL
performance.
Fig. 49. Phase noise contributions of TDC and DCO.
ADPLL system design is mainly focused on constructing the key building blocks.
There are different topologies for each building block and many trade-offs to consider.
For example, we can use a high-resolution TDC for better frequency detection at the
cost of more power, but whether it is a good choice depends on the system
specifications. The trade-off analysis becomes more complicated when process variations
are taken into consideration. The varying transistor characteristics make the system
performances statistical variables, so designs tuned under nominal conditions may not
be optimal under process variations. To solve these problems, we first explore
possible architectures of the building blocks and evaluate their system performances. Then
we perform yield-aware transistor tuning to optimize the circuit design under the
specified system configurations.
B. System-level ADPLL Design
Traditionally, PLL designers choose a PLL topology based on the open-loop
transfer function and then set the gain and pole/zero locations to achieve the required
phase/gain margin [59]. The choice of building blocks and loop filter characteristics
is the first step in ADPLL design. Since the number of possible choices is limited,
it is beneficial to evaluate the system performances with these discretized topology
parameters as the starting point.
1. System Performance Analysis
It is very difficult, if not impossible, to use SPICE-like simulators to evaluate ADPLL
system performances due to the extraordinarily long computation time, which may
be weeks or months for a single transient run. The transfer-function-based simulation
approach has been shown to be accurate and efficient for evaluating phase noise
[7, 59, 60] if the block noises are modeled accurately.
Since the ADPLL operates in the digital domain, the corresponding transfer function
is expressed in terms of the $z$-operator. A Backward-Euler transformation from the
$z$-domain to the $s$-domain can be written as in Eqn. 5.3 for small $\omega$ [7]. This
approximation is valid as long as the frequencies of interest are much smaller than the
sampling rate, which is $F_{ref}$ in this case. It is widely accepted that this linear
approximation holds as long as the PLL bandwidth $F_{BW}$ is at least 10 times smaller
than the sampling rate [61]. Here the PLL bandwidth is set to a few tens of kHz while the
reference signal frequency is over ten MHz, so the transformation in Eqn. 5.3 remains
accurate.
\[ z = e^{j\theta} \approx 1 + j\theta = 1 + \frac{j\omega}{F_{ref}} = 1 + \frac{s}{F_{ref}} \tag{5.3} \]
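The accuracy of the approximation in Eqn. 5.3 can be checked numerically. The sketch below compares $e^{j\theta}$ with $1 + j\theta$ for $\theta = 2\pi f / F_{ref}$; the reference frequency of 13 MHz and the offset frequencies are assumed values chosen only to match the orders of magnitude quoted in the text.

```python
import cmath
import math


def z_exact(f, f_ref):
    """Exact z = exp(j*theta) with theta = 2*pi*f / f_ref."""
    return cmath.exp(1j * 2.0 * math.pi * f / f_ref)


def z_approx(f, f_ref):
    """First-order approximation z ~ 1 + s / f_ref with s = j*2*pi*f."""
    return 1.0 + 1j * 2.0 * math.pi * f / f_ref


def approximation_error(f, f_ref):
    """Magnitude of the difference between the exact and approximate z."""
    return abs(z_exact(f, f_ref) - z_approx(f, f_ref))


F_REF = 13.0e6                                    # assumed reference frequency in Hz
err_in_band = approximation_error(50.0e3, F_REF)  # offset of a few tens of kHz
err_tenth = approximation_error(F_REF / 10.0, F_REF)
```

The error grows roughly as $\theta^2/2$, so it is negligible inside a loop bandwidth of a few tens of kHz and becomes significant as the offset approaches a tenth of the sampling rate.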
The noise of the building blocks can be modeled and injected into the ADPLL system
as in Fig. 50 [7]. The noise undergoes high-pass or low-pass shaping depending on
where it is injected.
Fig. 50. $s$-domain linear ADPLL noise model.
Given the loop filter transfer function $H_{LF}(f)$, the TDC noise at the
system output, $N_{o,TDC}$, can be written as
\[ N_{o,TDC} = \left| \frac{H_{LF}(f)}{1 + H_{LF}(f)} \right|^2 \phi_{n,TDC} \tag{5.4} \]
where $\phi_{n,TDC}$ is the total TDC noise. The DCO noise at the system output, $N_{o,DCO}$,
can be written as a function of the total DCO noise $\phi_{n,DCO}$:
\[ N_{o,DCO} = \left| \frac{1}{1 + H_{LF}(f)} \right|^2 \phi_{n,DCO} \tag{5.5} \]
Similarly, the reference signal noise contribution $N_{o,REF}$ is
\[ N_{o,REF} = \left| \frac{N \cdot H_{LF}(f)}{1 + H_{LF}(f)} \right|^2 \phi_{n,REF} \tag{5.6} \]
where $N$ is the frequency division ratio and $\phi_{n,REF}$ is the reference signal noise.
The total system phase noise $N_{o,tot}$ can be calculated by adding these noise
contributions together:
\[ N_{o,tot} = N_{o,TDC} + N_{o,DCO} + N_{o,REF} \tag{5.7} \]
As can be seen in Eqn. 5.7, the loop characteristics and block noises determine
the system noise performance. Other system performances, such as power and area, are much
easier to include at the system level: once the corresponding numbers of the building
blocks are known, they can simply be added together. In the
following sections we therefore focus on noise performance modeling.
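Eqs. 5.4 to 5.7 can be evaluated at a single offset frequency as sketched here. The function takes the complex value of $H_{LF}(f)$ at that frequency; the function name, the returned dictionary and the example numbers are hypothetical and only illustrate how the contributions combine.

```python
def output_noise(h_lf, phi_tdc, phi_dco, phi_ref, n):
    """Noise contributions at the system output for one offset frequency.

    h_lf is the complex value of H_LF(f); phi_* are the block noise powers;
    n is the frequency division ratio.
    """
    closed_loop = h_lf / (1.0 + h_lf)      # low-pass shaping (TDC and reference)
    error_path = 1.0 / (1.0 + h_lf)        # high-pass shaping (DCO)
    n_tdc = abs(closed_loop) ** 2 * phi_tdc        # Eq. 5.4
    n_dco = abs(error_path) ** 2 * phi_dco         # Eq. 5.5
    n_ref = abs(n * closed_loop) ** 2 * phi_ref    # Eq. 5.6
    return {"tdc": n_tdc, "dco": n_dco, "ref": n_ref,
            "total": n_tdc + n_dco + n_ref}        # Eq. 5.7


# Hypothetical example: large in-band loop gain, unit noise powers.
example = output_noise(9.0 + 0.0j, 1.0, 1.0, 1.0, 2)
```

With a large $|H_{LF}|$ (inside the loop bandwidth) the DCO term is suppressed while the TDC and reference terms pass through; outside the bandwidth the situation is reversed.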
2. Loop Filter
The filters in ADPLL systems can be configured as Type-I or Type-II, which have
different phase noise shaping capabilities. The transfer function of a Type-I filter can
be considered a direct feedforward path with a tunable loop gain $\alpha$. A Type-II filter
has an extra pole, and its transfer function can be written as [7]
\[ H_{LPF}(f) = \alpha + \frac{\rho \cdot F_{REF}}{s} \tag{5.8} \]
where $\rho$ is the filter coefficient.
Higher-order filters can also be added for better phase noise suppression. IIR filters
are typically used for their simplicity [7]. To ensure loop stability, a cascade of
single-pole IIR filters can be used. For an $M$-th order IIR filter, the transfer function
can be expressed as
\[ H_{IIR}(f) = \left( \frac{1 + s/F_{REF}}{1 + s/(\lambda F_{REF})} \right)^{M} \tag{5.9} \]
where $\lambda$ is the IIR filter gain.
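The filter transfer functions of Eqs. 5.8 and 5.9 can be evaluated along the frequency axis with $s = j 2\pi f$, as sketched here. The coefficient values (powers of 2) and the reference frequency are hypothetical and chosen only for illustration.

```python
import math


def s_of(f):
    """Laplace variable on the frequency axis, s = j*2*pi*f."""
    return 2j * math.pi * f


def h_type2(f, alpha, rho, f_ref):
    """Type-II loop filter of Eq. 5.8: proportional gain plus integral path."""
    return alpha + rho * f_ref / s_of(f)


def h_iir(f, lam, order, f_ref):
    """Cascade of `order` single-pole IIR stages, Eq. 5.9."""
    s = s_of(f)
    return ((1.0 + s / f_ref) / (1.0 + s / (lam * f_ref))) ** order


F_REF = 13.0e6      # assumed reference frequency in Hz
ALPHA = 2.0 ** -5   # assumed proportional gain
RHO = 2.0 ** -12    # assumed integral coefficient
LAMBDA = 2.0 ** -3  # assumed IIR gain

gain_low = abs(h_type2(1.0e2, ALPHA, RHO, F_REF))
gain_high = abs(h_type2(1.0e6, ALPHA, RHO, F_REF))
iir_atten = abs(h_iir(1.0e6, LAMBDA, 2, F_REF))
```

The integral path dominates at low offset frequencies, while the proportional gain sets the response at high offsets; with $\lambda < 1$ the IIR stages attenuate high-frequency content.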
Digital filters are superior to traditional analog implementations since they are
robust to process variations, so we can save the effort of modeling process variations
for the loop filters. For loop coefficients/gains, powers of 2 are much easier
to implement in digital circuits, so we select these discretized values in the system
design space exploration.
3. Time-to-Digital Converter
Time-to-digital converters in an ADPLL behave like the phase detectors in charge-pump
PLLs, detecting the frequency/phase differences between input signals. The basic
principle of a TDC is that the fast input signals are passed through delay chains and
comparators are used to detect the point at which the two signals exchange the lead.
The nonlinearities of the TDC are mainly due to the limited time resolution, which equals
the minimum cell delay $T_{res}$. This behaves much like the quantization noise in
analog-to-digital converters (ADCs). For an ideal TDC, the noise power can be written as
[58]
\[ \phi_{n,TDC} = \frac{1}{12 \cdot F_{REF}} \left( \frac{2\pi \cdot T_{res}}{T_{out}} \right)^2 \tag{5.10} \]
As can be seen from Eqn. 5.10, the TDC noise power is proportional to the square
of the TDC time resolution $T_{res}$, which is typically a buffer or inverter delay. It is
therefore of interest to use a higher-resolution TDC implementation to reduce the
associated noise, at the penalty of extra power [57, 62]. The choice of a fine-resolution
TDC is also treated as a topology selection variable in the ADPLL system design exploration.
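The quadratic dependence on $T_{res}$ in Eqn. 5.10 can be made concrete with a short calculation. The numbers below (output frequency, reference frequency, cell delays) are hypothetical and serve only to show the scaling.

```python
import math


def tdc_noise_power(t_res, t_out, f_ref):
    """Ideal TDC quantization noise power, Eq. 5.10."""
    return (2.0 * math.pi * t_res / t_out) ** 2 / (12.0 * f_ref)


def to_db(x):
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(x)


F_REF = 40.0e6          # assumed reference frequency in Hz
T_OUT = 1.0 / 2.4e9     # assumed output period in seconds

coarse = tdc_noise_power(20.0e-12, T_OUT, F_REF)   # assumed 20 ps cell delay
fine = tdc_noise_power(10.0e-12, T_OUT, F_REF)     # assumed 10 ps cell delay
improvement_db = to_db(coarse / fine)
```

Halving the resolution step reduces the noise power by a factor of four (about 6 dB), which is the benefit that has to be weighed against the extra power of the finer TDC.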
Process variations cause nonlinear time-to-digital conversion, which raises
the TDC noise level and adds noise at the ADPLL system output.
We discuss this effect and introduce a numerical method to calculate its influence
in the next section.
4. Digital Controlled Oscillator
The DCO is by nature still an oscillator, so flicker noise and thermal noise inevitably
affect the DCO performance. These inherent oscillator noises can be characterized using
simulators like SpectreRF [63], and their influence at the system output can be
calculated using Eqn. 5.5.
As the frequency control of the oscillator is implemented by capacitance tuning,
the minimum switchable varactor capacitance determines the frequency resolution.
Due to the size of the varactors, the minimum frequency resolution achievable by
switching varactors is a few tens of kHz [7]. Better frequency resolution can be achieved
by using digital-to-analog converters (DACs) to control the unit varactor capacitance
continuously [56] or by applying Sigma-Delta modulation (SDM) to the varactor banks [7].
For the DAC-based approach, the extra noise power can be written as [60]
\[ \phi_{n,DAC}(f) = 4KTR_{eq} \frac{1}{1 + (f/f_p)^2} \tag{5.11} \]
where $R_{eq}$ and $f_p$ are the equivalent resistance and corner frequency of the DAC. The
noise power of the SDM approach is [7]
\[ \phi_{n,SDM}(f) = \frac{1}{12} \left( \frac{f_{res}}{f} \right)^2 \frac{1}{f_{dth}} \left( \mathrm{sinc}\,\frac{f}{f_{dth}} \right)^2 \tag{5.12} \]
where $f_{res}$ is the DCO frequency resolution and $f_{dth}$ is the dithering frequency. The
selection of the DAC or SDM approach is used as a discretized system design parameter.
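Eqs. 5.11 and 5.12 can be evaluated as sketched here. Several points are assumptions rather than statements from the text: $K$ is taken to be the Boltzmann constant and $T$ the absolute temperature, the sinc function is taken in its normalized form $\sin(\pi x)/(\pi x)$, and all numeric values are hypothetical.

```python
import math

BOLTZMANN = 1.380649e-23   # J/K; K in Eq. 5.11 is assumed to be this constant


def sinc(x):
    """Normalized sinc, sin(pi*x)/(pi*x); the normalization is an assumption."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def dac_noise(f, r_eq, f_p, temperature=300.0):
    """DAC noise power of Eq. 5.11 (thermal noise with a single corner frequency)."""
    return 4.0 * BOLTZMANN * temperature * r_eq / (1.0 + (f / f_p) ** 2)


def sdm_noise(f, f_res, f_dth):
    """Sigma-Delta dithering noise power of Eq. 5.12 (valid for f > 0)."""
    return (1.0 / 12.0) * (f_res / f) ** 2 * (1.0 / f_dth) * sinc(f / f_dth) ** 2


# Hypothetical parameter values for illustration.
R_EQ = 1.0e3       # equivalent DAC resistance in ohms
F_P = 1.0e6        # DAC corner frequency in Hz
F_RES = 1.0e3      # DCO frequency resolution in Hz
F_DTH = 1.0e8      # dithering frequency in Hz

dac_at_corner = dac_noise(F_P, R_EQ, F_P)
sdm_at_1mhz = sdm_noise(1.0e6, F_RES, F_DTH)
```

Evaluating both expressions over the offset frequencies of interest gives the extra DCO noise term for each option, which can then be inserted into Eqn. 5.5 when comparing the DAC and SDM topologies.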
C. Block Modeling in ADPLL
In this section, we model the performances of TDC and DCO with diﬀerent design
variables and process variations. The models presented in this section are used in
both transistor-level ﬁne tuning and system-level performance distribution analysis.
1. TDC Modeling
Process variations cause mismatch between TDC cells, which in turn makes
the time-to-digital transfer curve nonlinear. This process-variation-induced nonlinearity
causes extra phase noise in addition to the inherent quantization
noise in Eqn. 5.10 [64], so careful analysis is needed for TDC modeling under
process variations.
From Eqn. 5.1 we can see that the TDC output $TDC[k]$ travels through all the
digital codes with equal probabilities as long as $N_f \neq 0$. The accurate time steps
in the TDC transfer curve can be obtained using transistor-level simulation. With a
given TDC cell delay distribution, we can calculate the TDC noise numerically.
As shown in Fig. 51, we calculate the power of the statistical variable that deviates
from the ideal transfer curve and sum its power over the whole range of input time
differences to obtain the total TDC noise.
[Fig. 51: annotated with the step segments $L_i$ and $T_i - L_i$ and the per-step noise
$P_i = \int_0^{L_i} e^2\,de + \int_0^{T_i - L_i} e^2\,de$.]
Fig. 51. Modeling of TDC noise.
The detailed analysis and calculation of the TDC noise power are presented as follows.
For the $i$-th TDC step, we start from the intersection point of the ideal transfer curve
and the real transfer curve and denote the distances to the two ends of the step as $L_i$
and $T_i - L_i$. With the assumption that the input time is evenly distributed, the noise
within this time step is obtained by integrating the squared deviation $e$ from the
ideal point:
\[ P_i = \int_0^{L_i} e^2\,de + \int_0^{T_i - L_i} e^2\,de \tag{5.13} \]
The total noise of the TDC can be calculated by adding the noise of every TDC delay cell
and averaging it over the whole TDC input range, which equals the output signal
period:
\[ \phi_{n,TDC} = \frac{1}{T_{out}} \sum_{i=1}^{M} P_i \tag{5.14} \]
where $M$ is the total number of TDC cells.
There are a few constraints to note when using Eqns. 5.13 and 5.14. First,
$L_i$ and $T_i$ are related geometrically; Eqn. 5.15 shows the constraint when the ideal
transfer curve has unity slope. Second, the ideal transfer curve is the “best-fitting
curve”, the coefficients of which can be calculated by minimizing the total noise in
Eqn. 5.14. Third, since the total TDC cell number $M = T_{out}/T_{avg}$ may not be an
integer, the last digit of the TDC will contribute higher noise. Here $T_{avg}$ is the
average delay of a single TDC cell.
\[ (T_{i-1} - L_{i-1}) + L_i = T_{avg} \tag{5.15} \]
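The numerical procedure of Eqns. 5.13 to 5.15 can be sketched as follows. The sketch makes several simplifying assumptions that are not from the text: the cell delays are normalized and hypothetical, the cells are assumed to span exactly one output period (so the non-integer $M$ effect is ignored), and the best-fitting ideal curve is found by a simple grid search over the first-step offset $L_1$ within the first cell.

```python
def tdc_noise(cell_delays, l_first):
    """Total TDC noise of Eq. 5.14 for given cell delays and first-step offset L_1."""
    t_out = sum(cell_delays)            # assumed: cells span exactly one output period
    t_avg = t_out / len(cell_delays)
    total = 0.0
    l_i = l_first
    for i, t_i in enumerate(cell_delays):
        if i > 0:
            # Eq. 5.15: (T_{i-1} - L_{i-1}) + L_i = T_avg, solved for L_i.
            l_i = t_avg - (cell_delays[i - 1] - l_i)
        # Eq. 5.13: integral of e^2 over both parts of the step.
        total += (l_i ** 3 + (t_i - l_i) ** 3) / 3.0
    return total / t_out


def best_fit_noise(cell_delays, grid=2001):
    """Minimize the total noise over the offset L_1 (simple grid search)."""
    t_first = cell_delays[0]
    candidates = (t_first * k / (grid - 1) for k in range(grid))
    return min(tdc_noise(cell_delays, l) for l in candidates)


# Normalized, hypothetical cell delays: a uniform chain and a mismatched chain.
uniform = [1.0] * 8
mismatched = [0.9, 1.1, 1.0, 0.95, 1.05, 1.0, 1.1, 0.9]

noise_uniform = best_fit_noise(uniform)
noise_mismatched = best_fit_noise(mismatched)
```

For the uniform chain the result reduces to the familiar quantization value $T_{avg}^2/12$, and any mismatch between cells raises the noise above that level.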
As discussed in the previous section, using a fine-resolution TDC can cut down
the TDC noise. Another option for noise reduction is to search for transistor
designs that reduce TDC cell mismatches under process variations. We therefore
model the TDC performances in terms of design and process variables, targeting statistical
optimization. A formal TDC model is defined as
\[ \vec{P}_{TDC}(\vec{T}_{TDC}, \vec{D}_{TDC}, \vec{V}) \tag{5.16} \]
where $\vec{P}_{TDC}$ represents the TDC performances, $\vec{T}_{TDC}$ accounts for the TDC
topology selection, $\vec{D}_{TDC}$ contains the transistor sizes in the TDC design and
$\vec{V}$ stands for the process variations.
2. DCO Modeling
Inherent DCO noise contributes a significant part of the total system noise. Generally,
the phase noise/jitter of the DCO drops as the current through the LC-tank increases
(SNR increases with higher signal power), but the phase noise goes up again when the
current is too large and the output signals start clipping. The modeling of DCO phase
noise is illustrated in Fig. 52.
[Fig. 52: annotated with the DCO model $P_{DCO}(T_{DCO}, D_{DCO}, V)$.]
Fig. 52. Modeling of DCO noise.
A larger biasing current results in higher power consumption, and this trade-off
needs to be considered in the ADPLL optimization. We therefore model the DCO phase
noise and power in terms of biasing current and transistor sizes. Process variations
also affect the DCO noise performance and should be included in the performance
model. We formulate the DCO model as
$$\vec{P}_{DCO}(\vec{T}_{DCO}, \vec{D}_{DCO}, \vec{V}) \qquad (5.17)$$
where $\vec{P}_{DCO}$ represents the DCO performances, $\vec{T}_{DCO}$ accounts for
the choice of SDM or DAC to increase the ADPLL frequency resolution, $\vec{D}_{DCO}$
represents the transistor sizes and biasing current and $\vec{V}$ stands for the
process variations.
D. Yield-aware ADPLL Optimization
With the ADPLL building block models and transfer functions obtained in the previous
sections, we introduce the yield-aware optimization framework for optimal ADPLL
design. Costly system performance analyses are replaced by an efficient simulation
framework using $s$-domain circuit models. We first evaluate the performances of
different topologies for each building block, then perform refined device-level
variable tuning to achieve the optimal system performances.
1. Topology Selection
Since there are different choices of circuit blocks for building an ADPLL system, it
is beneficial to first choose suitable topology combinations. The topology selection
variables are discrete, $\vec{T} = \{T_{DCO}, T_{TDC}, T_{Filter}\}$. The primary
goals of the topology evaluation stage are to find a reduced set of candidate
topologies for faster localized design parameter tuning and to obtain a picture of
the system performance trade-offs.
There are always trade-offs in circuit design, and it is almost impossible to find
a solution that is best for every specification. We therefore construct
multi-objective system cost functions to balance the different performance
requirements, and explore the system performances in the discretized topology space
by evaluating these cost functions. We do not keep only the design combination with
the lowest cost, because some “near-optimal” discretized design points may have
better statistical performances once block tuning is performed and process
variations are taken into consideration. So a few discretized system design choices
are kept, and fine tuning of the building blocks is carried out for these points.
In practice, retaining a few tens of discretized design choices for the fine tuning
stage is sufficient.
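The shortlist step described above can be sketched as an exhaustive enumeration of the discrete topology space followed by a sort on the cost. The function name, the dictionary layout of the options and the cost callable are hypothetical placeholders; the actual multi-objective cost function of the dissertation is not reproduced here.

```python
from itertools import product


def shortlist_topologies(options, cost, keep):
    """Enumerate every topology combination and keep the `keep` cheapest.

    options: dict mapping a block name to its list of discrete choices.
    cost:    hypothetical multi-objective cost function taking one
             combination (a dict) and returning a number, lower is better.
    """
    names = list(options)
    candidates = [
        dict(zip(names, combo))
        for combo in product(*(options[name] for name in names))
    ]
    # Keep several near-optimal points, not only the single best one,
    # so that fine tuning can later compare their statistical behavior.
    return sorted(candidates, key=cost)[:keep]
```

A caller would pass, for example, the DCO, TDC and filter choices as `options` and retain a few tens of candidates for the fine tuning stage.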
2. Yield-aware Fine Tuning
After the topology selection, we have a few optimized discretized system topologies
to choose from, denoted as $\vec{T}_{opt}$. In this stage we perform the tuning of
the TDC and DCO within the reduced discretized system space.
Due to process variations, system performances become statistical variables. For
the $k$-th statistical system performance $P_k$ (the smaller the better), suppose we
need a yield of $Y_k$; then the yield-aware performance $P_k^{Y_k}$ satisfies the
following probability condition
$$\mathrm{P}\{P_k \le P_k^{Y_k}\} = Y_k \qquad (5.18)$$
Eqn. 5.18 implies that, for $P_k$, the best achievable performance value is
$P_k^{Y_k}$ when the yield level $Y_k$ is required. $P_k^{Y_k}$ is taken as the
yield-aware $k$-th system performance and is used in the overall system optimization.
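When the performance distribution is available only as Monte-Carlo samples, the condition in Eqn. 5.18 can be approximated by an empirical quantile. The sketch below is one simple way to do this; the function name and the choice of the smallest sample that covers the target fraction are assumptions for illustration, not the estimator used in the original work.

```python
import math


def yield_aware_performance(samples, target_yield):
    """Smallest sampled value p such that at least `target_yield` of the
    samples satisfy P_k <= p (smaller performance values are better)."""
    if not samples or not 0.0 < target_yield <= 1.0:
        raise ValueError("need samples and a yield in (0, 1]")
    ordered = sorted(samples)
    # Number of samples that must lie at or below the returned value.
    count = math.ceil(target_yield * len(ordered))
    count = min(max(count, 1), len(ordered))
    return ordered[count - 1]
```

With a yield of 1.0 the function returns the worst sampled value, which matches the intuition that a 100% yield target must cover every sample.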
Since we perform fine tuning for each retained discretized system configuration,
the yield-aware fine tuning can be formulated as minimizing the cost function in the
building-block space for the topologies in $\vec{T}_{opt}$
$$\min \; Cost\left(\vec{P}_{TDC}(\vec{D}_{TDC}),\; \vec{P}_{DCO}(\vec{D}_{DCO})\right)\Big|_{\vec{T}_{opt}} \qquad (5.19)$$
where the design variables of the TDC and DCO models are the input variables used to
minimize the statistical system cost function. Any suitable optimization package
can be used for this minimization problem. In this chapter, we employ the
multilevel coordinate search algorithm in [34]. The proposed two-step optimization
flow is summarized and illustrated in Fig. 53.
Fig. 53. Yield-aware optimization ﬂow for ADPLL.
E. Adaptive Self-tuning ADPLL Design
System performance monitoring needs to be addressed before any compensation block
can be designed. Thanks to the digital implementation, all control signals in
ADPLL systems are in the digital domain and are easy to process. When the
ADPLL loop has stabilized, the frequency errors measured by the TDC are found
to correlate strongly with the system phase noise [65]. The TDC itself
behaves like a jitter measurement block since it detects the differences between the
reference signal edges and the output signal edges. So we use the RMS value of the
frequency differences ($FE_{RMS}$) as the indicator of system jitter performance. For
simplicity, the RMS value is calculated over every 64 clock cycles.
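The windowed RMS indicator can be sketched in a few lines of Python. The function name and the handling of a trailing incomplete window (it is dropped) are assumptions of this sketch; the on-chip block is implemented in hardware and may differ in detail.

```python
import math


def fe_rms(freq_errors, window=64):
    """Return one RMS value per complete block of `window` clock cycles.

    freq_errors: sequence of frequency-error samples from the TDC,
                 one sample per clock cycle (units are arbitrary here).
    """
    values = []
    for start in range(0, len(freq_errors) - window + 1, window):
        block = freq_errors[start:start + window]
        mean_square = sum(x * x for x in block) / window
        values.append(math.sqrt(mean_square))
    return values
```

Each returned value would then be compared against the threshold to decide whether reconfiguration is needed.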
The goal of using $FE_{RMS}$ as the system jitter indicator is to facilitate cheap
performance monitoring. A predefined frequency error threshold ($FE_{TRE}$) is
calculated in the design stage by statistical simulation and can be further calibrated
using silicon measurement results. If the frequency error is larger than the threshold
value, the system needs to be reconfigured to enhance its performance. As mentioned
earlier, the power consumption of the fine-resolution TDC is high, so at the system
level we need to perform a trade-off analysis to decide whether the fine-resolution
TDC should be turned on or off. Loop configurations, including loop gain and loop
filter order, can be modified without power penalty, so they can be tuned freely.
The adaptive PLL system diagram is shown in Fig. 54. The DCO biasing current is
calibrated off-chip for the current design.
Fig. 54. Adaptive PLL system diagram.
The detailed logic control sequence of the performance self-compensation function
is shown in Fig. 55. When the self-healing block detects $FE_{RMS} > FE_{TRE}$,
it first reconfigures the loop gain within $[2^{-10}, 2^{-9}, 2^{-8}, 2^{-7}]$ and
the loop order within $[1, 2]$, a total of 8 different combinations. For each new
configuration, we wait for $T_1$ cycles to let the loop stabilize and calculate
$FE_{RMS}$ over the next $T_2$ cycles. If the minimum $FE_{RMS}$ among these 8
configurations is lower than $FE_{TRE}$, the compensation algorithm stops and saves
the configuration with the lowest $FE_{RMS}$ as the new loop configuration. If the
best $FE_{RMS}$ is still higher than $FE_{TRE}$, the fine-resolution TDC is powered
on and the system goes through the 8 loop configurations again. The configuration
with the best $FE_{RMS}$ is then saved to reconfigure the ADPLL system.
The performance monitoring and control function blocks are designed in Verilog
HDL and synthesized using the ARM Artisan standard cell library. The total gate
count of the block implementation is about 10,000.
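The control sequence described above can be modeled behaviorally as follows. This is only a software sketch of the described steps, not the Verilog implementation: `measure_fe_rms` is a hypothetical callback that applies a configuration, waits for the loop to settle and returns the measured $FE_{RMS}$.

```python
from itertools import product

LOOP_GAINS = [2 ** -10, 2 ** -9, 2 ** -8, 2 ** -7]
LOOP_ORDERS = [1, 2]


def self_compensate(measure_fe_rms, fe_tre):
    """Return (fe_rms, gain, order, fine_tdc_on) of the saved configuration.

    measure_fe_rms(gain, order, fine_tdc_on) is a hypothetical callback
    standing in for the wait-T1-then-measure-over-T2 hardware step.
    """
    best = None
    # First pass keeps the fine-resolution TDC off to avoid its power cost.
    for fine_tdc_on in (False, True):
        results = [
            (measure_fe_rms(gain, order, fine_tdc_on), gain, order, fine_tdc_on)
            for gain, order in product(LOOP_GAINS, LOOP_ORDERS)
        ]
        best = min(results, key=lambda r: r[0])
        if best[0] < fe_tre:
            break  # loop reconfiguration alone is sufficient
    return best
```

If neither pass reaches the threshold, the sketch still returns the best configuration found with the fine-resolution TDC on, mirroring the last step of the described sequence.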
Fig. 55. Logic sequence of self compensation.
By employing the self-compensation operation in ADPLL systems, we can detect
failing chips and compensate them so that they meet the requirements again. This
approach has the potential to achieve better overall system performance than analog
circuit synthesis that relies only on transistor sizing. Conventional analog
optimizations for yield enhancement tend to push designs toward high-performance
corners. That strategy works because chips whose performance drops due to process
variations can still meet the specifications. However, as discussed in [32] and
[24], these improvements are often achieved at the expense of other performances.
Fig. 56 illustrates one such scenario, in which power increases and the overall
yield may even drop if the power specification is tight.
Fig. 56. Conventional yield-aware optimization.
The self-healing function in the adaptive ADPLL system switches to
higher-performance configurations only when process variations cause a chip to fail
the specifications, so we do not need to design the system in a high-performance,
high-power configuration to guard against process variations. As illustrated
in Fig. 57, adaptation kicks in only when a chip fails to meet the required
performance targets, which is statistically rare. The new circuit configuration
burns more power only for the portion of chips not meeting the specifications, so
the statistical power consumption can be improved compared with conventional
yield-aware optimization results.
Fig. 57. Proposed yield-aware optimization using adaptive operation.
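As a rough illustration of this argument (the symbols $f$, $P_{nom}$ and $P_{high}$ are introduced here for the sketch and are not part of the original analysis), suppose a fraction $f$ of the fabricated chips fails in the nominal low-power configuration with power $P_{nom}$ and is switched to a higher-performance configuration with power $P_{high} > P_{nom}$. Under this two-configuration simplification the average power over all chips is

```latex
\bar{P}_{adaptive} = (1 - f)\,P_{nom} + f\,P_{high}
                   = P_{nom} + f\,(P_{high} - P_{nom}) \;\le\; P_{high}
```

so when $f$ is small the average stays close to $P_{nom}$, whereas a design fixed in the high-performance configuration spends $P_{high}$ on every chip.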
In the next section we look into the optimization of ADPLL systems by correctly
modeling the operation of the adaptive function and including this information in
the system optimization flow.
F. Optimization of Adaptive ADPLLs
1. Adaptive System Performance Calculation
The optimization of adaptive ADPLLs also depends on the variation-aware block
performance models for the TDC, DCO and filter. The performance and yield
calculations in adaptive systems differ from those in conventional systems since the
adaptation operation needs to be accounted for. The system is reconfigured when the
performance monitoring block indicates a failure to meet the specification under a
certain process variation scenario. The calculation of statistical performances in
adaptive systems is therefore essential for the system optimization.
The algorithm for finding the performance distributions under process variations for
a given design parameter set $\vec{D}$ and topology set $\vec{T}$ is given in
Algorithm 3. The goal of the algorithm is to find the jitter and power distributions
efficiently; the same procedure is also implemented in hardware to perform the ADPLL
adaptation control. The main idea is to find the system configuration with the
lowest power consumption that still meets the required jitter specification. The
jitter calculation is based on the integration of system noise [66].
As indicated in the algorithm, we first check whether the initial configuration
meets the required system specification. If it does not, we apply the
highest-performance configuration $\vec{C}_{max}$ to see whether it meets the
specification. If the chip with configuration $\vec{C}_{max}$ meets the
specification, we continue to search for and save the configuration with the lowest
power consumption; otherwise the chip is classified as a failure since it cannot
work even in the best-performance configuration.
2. Optimization of Adaptive ADPLLs
Different optimization tasks can be performed for adaptive ADPLL systems. Here we
choose to minimize the overall system power while keeping the yield at the required
level. The statistical power and jitter distributions are calculated using
Algorithm 3, and the system optimization problem is formulated as
$$\begin{aligned} \min\;\; & power(\vec{D}) \\ \text{s.t.}\;\; & Yield(\vec{J}) \ge Yield_{req},\quad \vec{T} \in \{\vec{T}'_{opt}\},\quad \vec{C} \in \{\vec{C}\} \end{aligned} \qquad (5.20)$$
Algorithm 3 Statistical Performance Calculation for Adaptive ADPLL
Input: ADPLL configuration set $\vec{C}$, Monte-Carlo process variable samples
$\vec{V} = \{V_1, V_2, \cdots, V_N\}$, initial configuration set $\vec{C}_{init}$
after topology selection, jitter specification $J_{spec}$
Output: Jitter distribution $\vec{J}$ and power distribution $\vec{P}$.
1: set $j \leftarrow 1$, $\vec{C} \leftarrow \vec{C}_{init}$.
2: for $i = 1$ to $N$ do
3:   calculate system jitter $J(\vec{C}, \vec{T}, \vec{D}, V_i)$ under process variable set $V_i$;
4:   if $J(\vec{C}, \vec{T}, \vec{D}, V_i) > J_{spec}$ then
5:     select highest performance and power configuration $\vec{C}_{max}$ for ADPLL
6:     if $J(\vec{C}_{max}, \vec{T}, \vec{D}, V_i) > J_{spec}$ then
7:       classify the chip as a failure and continue with the next sample.
8:     else
9:       for configuration $\vec{C}_k$ increasing in performance and power from $\vec{C}_{init}$ to $\vec{C}_{max}$ do
10:        if $J(\vec{C}_k, \vec{T}, \vec{D}, V_i) \le J_{spec}$ then
11:          save $J_j = J(\vec{C}_k, \vec{T}, \vec{D}, V_i)$, $P_j = P(\vec{C}_k, \vec{T}, \vec{D}, V_i)$.
12:          $j \leftarrow j + 1$
13:          break.
14:        end if
15:      end for
16:    end if
17:  else
18:    save $J_j = J(\vec{C}, \vec{T}, \vec{D}, V_i)$, $P_j = P(\vec{C}, \vec{T}, \vec{D}, V_i)$.
19:    set $j \leftarrow j + 1$.
20:  end if
21: end for
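The procedure of Algorithm 3 can be sketched in Python as follows. The `jitter` and `power` callables are hypothetical stand-ins for the block performance models with the topology and design variables held fixed, and the configuration list is assumed to be ordered from the initial (lowest-power) configuration to the highest-performance one. Chips that fail even in the highest-performance configuration are counted rather than saved.

```python
def adaptive_statistics(samples, configs, jitter, power, j_spec):
    """Return (jitters, powers, failures) over Monte-Carlo samples.

    samples: process-variation samples V_1..V_N.
    configs: configurations ordered from C_init to C_max (assumed ordering).
    jitter(config, sample), power(config, sample): hypothetical models.
    """
    jitters, powers, failures = [], [], 0
    c_init, c_max = configs[0], configs[-1]
    for v in samples:
        if jitter(c_init, v) <= j_spec:
            chosen = c_init                  # initial configuration passes
        elif jitter(c_max, v) > j_spec:
            failures += 1                    # fails even at best performance
            continue
        else:
            # Lowest-power configuration that meets the jitter specification.
            chosen = next(c for c in configs if jitter(c, v) <= j_spec)
        jitters.append(jitter(chosen, v))
        powers.append(power(chosen, v))
    return jitters, powers, failures
```

The jitter yield then follows as the number of saved samples divided by the total number of samples.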
In Eqn. 5.20, $\{\vec{C}\}$ is the adaptive tuning space, which is determined by the
implementation of the ADPLL system.
We perform the ADPLL topology selection in the same way as in the conventional
ADPLL optimization and then search within the optimized topology sets
$\{\vec{T}'_{opt}\}$ obtained from the topology evaluation. Note that the filter
order in the topology selection $\vec{T}$ is reconfigurable and is treated as part
of $\vec{C}$, so we rewrite $\{\vec{T}_{opt}\}$ as $\{\vec{T}'_{opt}\}$. The
optimization procedure is given in Algorithm 4.
Algorithm 4 Fine Tuning for Adaptive ADPLLs
Input: optimized topology sets $\{\vec{T}'_{opt}\}$, design variables $\vec{D}$,
initial design $\vec{D}_{init}$ after topology optimization, required jitter yield
$Yield_{req}$
Output: topology and design variables for minimum power.
1: for all topologies in $\{\vec{T}'_{opt}\}$ do
2:   set $\vec{D} \leftarrow \vec{D}_{init}$
3:   calculate jitter $\vec{J}(\vec{D})$ and power $\vec{P}(\vec{D})$ distributions using Algorithm 3, with 200 process variation samples.
4:   calculate jitter yield $Yield_J$ and average power $Avg(power)$.
5:   if $Yield_J < Yield_{req}$ then
6:     add penalty $p \propto \exp(Yield_{req} - Yield_J)$ to the objective function $Avg(power)$ for convergence speedup.
7:   end if
8:   while $\min(Avg(power))$ has not converged do
9:     set new $\vec{D}$ using coordinate search [34]
10:    repeat steps 3-7
11:  end while
12:  save minimum power of current topology.
13: end for
14: select the topology and design variables of minimum power.
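Steps 4 to 6 of Algorithm 4 amount to evaluating a penalized objective. A minimal sketch is given below; the function names and the proportionality constant `weight` are assumptions, since the algorithm only states that the penalty is proportional to $\exp(Yield_{req} - Yield_J)$.

```python
import math


def jitter_yield(jitters, n_samples, j_spec):
    """Fraction of all Monte-Carlo samples whose saved jitter meets the spec."""
    return sum(1 for j in jitters if j <= j_spec) / n_samples


def penalized_objective(powers, yield_j, yield_req, weight=1.0):
    """Average power, plus an exponential penalty when the yield is too low.

    weight is a hypothetical scale factor for the penalty term.
    """
    avg_power = sum(powers) / len(powers)
    if yield_j < yield_req:
        avg_power += weight * math.exp(yield_req - yield_j)
    return avg_power
```

An optimizer such as the coordinate search of [34] would call this objective once per candidate design, after recomputing the distributions for that design.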
G. Experimental Results
In this section, we demonstrate the effectiveness of the proposed yield-aware system
optimization approach for ADPLLs by going through a design example. The building
block designs are implemented in a 90 nm CMOS technology with a 1.2 V power
supply. The TDC and DCO options are designed manually and the digital circuits are
synthesized using standard cell libraries. The performance models of the TDC and DCO
are generated using Kriging models [10] with data samples collected from SpectreRF
[63] simulations. Process variation information is extracted using PDK statistical
simulation and included in the system performance models.
Table XIII. Optimization variable summary.
              T (selection)           D (variable)
DCO           2 (DAC or SDM)          4
TDC           2 (Reg. or Fine)        4
Type          2 (I or II)             N/A
Filter Gain   N/A                     1
IIR order     4 (0 or 1 or 2 or 3)    N/A
IIR gain      N/A                     3
The optimization targets are the system jitter integrated from 1 kHz to
10 MHz based on [66], the system power and the system area. The jitter specification
is set to 0.5 ps, the power specification to 15 mW and the area specification to
0.5 mm$^2$.
1. Normal ADPLL Optimization
We ﬁrst optimize an ADPLL without adaptive tuning function. The numbers of topol-
ogy choices and the dimensions of TDC and DCO design variables are summarized
in Table XIII.
Event-driven simulation is accurate and much faster than SPICE-like simulators for
ADPLL system performance evaluation [67]. We compare the phase noise spectrum
calculated using the proposed approach with that from the event-driven simulation in
Fig. 58. The close agreement of the two curves demonstrates the accuracy of our
proposed transfer-function-based simulation method.
[Plot: phase noise (dBc/Hz) versus frequency (Hz), 10^4 to 10^7 Hz; curves: calculated by performance model, simulated by event-driven simulation.]
Fig. 58. Comparison of phase noise obtained by proposed method and event-driven
simulation.
In the topology selection stage, we plot the jitter distribution over all discretized
topology choices in Fig. 59. As can be seen from the figure, most of the discretized
design points produce jitter well above the specification, so we can safely filter
out these design points after the topology selection stage and use the remaining
topology choices for the transistor-level fine tuning.
[Plot: histogram of design number versus jitter (s), 0 to 5 x 10^-12 s.]
Fig. 59. Jitter distribution in topology selection.
The proposed optimization ﬂow is quite eﬃcient and can be ﬁnished within about
one hour. In the ﬁne tuning stage, system performance distributions are optimized.
The jitter distribution of the initial design is compared with the optimized design
after ﬁne tuning in Fig. 60.
[Plots: histograms of number versus jitter (axis labelled "Jitter (ps)"); panel (a) initial, panel (b) optimized.]
Fig. 60. Jitter distribution comparisons in ﬁne tuning stage.
2. Adaptive ADPLL Optimization
We use a setup similar to the previous example for the optimization of the adaptive
ADPLL. In the topology selection stage, we evaluate different system topologies to
gain an understanding of the system performance trade-offs and the near-optimal
circuit configurations. The yield-aware topology evaluation results for power and
jitter are plotted in Fig. 61.
Similarly, the trade-offs between area and jitter performance are plotted in Fig. 62.
From the figure we can see that the area changes are small (less than 10%) and all
meet the required specification. In the ADPLL fine tuning stage, we therefore focus
on power minimization.
We set the yield target at 99% and minimize the power consumption using the
approach presented in Algorithm 4. The power distribution after conventional
yield-aware tuning is presented in Fig. 63.
[Plot: power (mW) versus jitter (ps); series: 80% yield, 95% yield.]
Fig. 61. Power and jitter trade-oﬀs in topology evaluation.
[Plot: area (mm*mm) versus jitter (ps); series: 80% yield, 95% yield.]
Fig. 62. Area and jitter trade-oﬀs in topology evaluation.
[Plot: histogram over power (mW), 16.8 to 18.6 mW.]
Fig. 63. Power distribution for the reference optimization.
For comparison, the power distribution for the adaptive ADPLL system is shown
in Fig. 64. The average power for the conventional optimization is 17.71 mW,
while that of the adaptive ADPLL is 13.76 mW, which represents a 22.3% power
reduction.
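The quoted reduction follows directly from the two reported averages:

```latex
\frac{17.71\,\text{mW} - 13.76\,\text{mW}}{17.71\,\text{mW}}
  = \frac{3.95}{17.71} \approx 0.223 = 22.3\%
```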
[Plot: histogram over power (mW), 12 to 20 mW.]
Fig. 64. Power distribution for the adaptive optimization.
H. Summary and Discussion
A mixed-signal circuit design example is presented in this chapter. We design an
all-digital PLL circuit to apply the modeling, optimization and testing ideas
presented in the previous chapters. The noise performance of the ADPLL is calculated
using the efficient $s$-domain transfer function approach. Process-variation-aware
models are developed for the building blocks of analog nature, and these models are
then used to optimize the ADPLL system performance. The yield-aware ADPLL
optimization is achieved efficiently by first performing topology selection and then
fine tuning the design variables. System performance self-healing functions are
analyzed and implemented to enhance the ADPLL performance. Experimental results
demonstrate the effectiveness of our modeling and optimization framework and also
indicate that adaptive system designs can achieve better overall system performance
than designs obtained by automatic transistor sizing alone.
We have demonstrated that system adaptation is very useful for yield enhancement
of analog/mixed-signal circuits in scaled CMOS technologies. Failing chips
can be brought back to working condition with on-chip performance detection and
system self-healing functions. In addition to performance enhancement, system
adaptation can also be used to downgrade the system performance to save power, as
illustrated in Fig. 65. Some portion of the fabricated chips may have much higher
performance than required at the cost of large power consumption, which can be
considered overdesign. It is of interest to reduce the power of these chips by
“self-downgrading” the system performance and thereby lowering the overall power
consumption. This “two-way” style of system adaptation optimizes system performance
and power whenever possible with a single on-chip logic function and hence can
achieve even better overall system performance.
Fig. 65. Yield-aware optimization with two-way adaptation.
In real system implementations, the power reduction in the system adaptation can
be achieved by reconfiguring the supply voltage [68] or body bias [69]. In the ADPLL
design case, we can also treat the DCO biasing current as a tunable variable to save
system power. In this dissertation we do not focus on using system adaptation
to reduce power, since the circuits are designed for minimum power
consumption as the starting point. However, there is still potential benefit in
having both the self-healing and self-downgrading adaptation functions to further
enhance the system performance.
CHAPTER VI
CONCLUSIONS AND FUTURE DIRECTIONS
A. Conclusions
This dissertation focuses on employing computer-aided methods to design and
optimize large analog/mixed-signal systems in highly scaled CMOS technologies. We
have addressed the problems of efficient circuit performance modeling and fast
simulation for robust Sigma-Delta ADC and phase-locked loop designs with
consideration of process variations. General performance modeling of small-scale
analog blocks is achieved using the Kriging modeling method, which is integrated
into the framework of yield-aware hierarchical analog system optimization. We have
also investigated the problem of robust circuit design by employing built-in
self-test circuitry in analog/mixed-signal systems so that individual chips can
self-heal their performance after fabrication to counter process variations. A
circuit design example of a digital-intensive PLL is presented to illustrate the
ideas of large mixed-signal system modeling, optimization and testing.
B. Future Directions
Currently in industry, most analog circuits are still designed by experienced
circuit engineers rather than by automatic design approaches. The designers use
their circuit design expertise and the process information from the foundries to
accomplish robust analog/mixed-signal designs. The immediate drawbacks of such
approaches are: 1) a rapidly increasing design closure cycle, since designers need
to wait days or weeks to verify a small design change using current transistor-level
simulation tools; 2) the large number of tunable design parameters, which makes it
almost impossible to reach global optima by handcrafted design; and 3) the
inefficient use of foundry process variation information, which may cost a few
tape-out iterations before converging to yield-stable design choices.
Despite the shortcomings of manual design procedures, automatic circuit design has
not yet been adopted by the majority of the analog design community, in contrast to
the great success of digital VLSI design methodologies. There are many reasons for
this situation, but it clearly indicates that there are missing pieces and still a
long way to go for analog EDA research and development. The barriers keeping analog
circuit designers from adopting automatic design methodologies come from various
aspects. Concerns about the accuracy of fast performance evaluation and optimization
tools are always foremost in circuit designers’ minds. Many computer-aided design
tools are specific to a circuit topology and require dedicated human involvement
with steep learning curves, which makes it difficult to build standardized design
flows around them. People are always reluctant to switch from familiar and
established procedures to something unfamiliar unless it is absolutely necessary.
Lastly, computer-generated designs cannot innovate, which some analog designers
regard as crucial.
As CMOS technologies scale further following Moore’s law (or the development
of technologies), there could be increasing opportunities for automatic analog design
methodologies to gain popularity. Several scenarios could arise in the next few
years or decades that make automatic design methods essential in industrial design
practice. For example, process variations might become so severe (refer to Table
I) that variability changes the circuit characteristics qualitatively rather than
only shifting them quantitatively. For high-performance or high-yield circuit
designs, circuit layouts would have to be designed and calibrated by the foundries
for individual manufacturing processes to achieve the best manufacturability. In
that case, the foundries would have to build and verify elementary circuits and
their layouts, and then ship the performance and characteristic models of these
circuits as fundamental building blocks to the designers/system integrators for
designing electronic systems. Under such circumstances, it would no longer be
necessary for designers to deal with transistor-level design problems. Clearly the
modeling and optimization techniques proposed in this dissertation would be of great
value for such digital-like design procedures in the future. The recent acquisition
of the world’s largest analog IP provider by the EDA giant Synopsys signals that an
analog design revolution might already be on the way [70].
REFERENCES
[1] The International Technology Roadmap for Semiconductors,
http://public.itrs.net, Accessed July 2009.
[2] S. Nassif, “Process variability at the 65nm node and beyond,” in Proc. of IEEE
Custom Integrated Circuits Conf., 2008, pp. 1–8.
[3] G. Gielen and R. Rutenbar, “Computer-aided design of analog and mixed-signal
integrated circuits,” Proc. of the IEEE, vol. 88, no. 12, pp. 1825–1852, Dec.
2000.
[4] S. R. Norsworthy, R. Schreier, and G. C. Temes, Delta-Sigma Data Converters:
Theory, Design, and Simulation, Piscataway, NJ: IEEE Press, 1997.
[5] R. Best, Phase-Locked Loops Design, Simulation, and Applications 5th edition,
New York City, NY: McGraw-Hill Professional, 2003.
[6] G. Matheron, “Principles of geostatistics,” Economic Geology, vol. 58, no. 8,
pp. 1246–1266, Dec. 1963.
[7] R. Staszewski and P. Balsara, All-Digital Frequency Synthesizer in Deep-
submicron CMOS, Hoboken, NJ: Wiley-Interscience, 2008.
[8] G. Yu and P. Li, “Lookup table based simulation and statistical modeling of
sigma-delta ADCs,” in Proc. IEEE/ACM Design Automation Conf., 2006, pp.
1035–1040.
[9] G. Yu and P. Li, “A methodology for systematic built-in self-test of phase-locked
loops targeting at parametric failures,” in Proc. of IEEE Int. Test Conf., 2007,
pp. 1–10.
[10] G. Yu and P. Li, “Yield-aware analog integrated circuit optimization using
geostatistics motivated performance modeling,” in Proc. of IEEE/ACM Int. Conf.
on CAD, 2007, pp. 464–469.
[11] Cadence Design Systems Technical Staﬀ, Aﬃrma Spectre Circuit Simulator User
Guide, Cadence Design Systems, Inc., San Jose, CA, 2000.
[12] R. J. Bishop, J.J. Paulos, M. B. Steer, and S. H. Ardalan, “Table-based modeling
of delta-sigma modulators,” IEEE Trans. Circuits Syst., vol. 37, no. 3, pp. 447–
451, March 1990.
[13] K. K. Low and S. W. Director, “An eﬃcient methodology for building macro-
models of IC fabrication processes,” IEEE Trans. Computer-Aided Design In-
tegr. Circuits Syst., vol. 8, no. 12, pp. 1299–1313, December 1989.
[14] G. E. P. Box, J. S. Hunter, and W. G. Hunter, Statistics for Experimenters:
Design, Innovation, and Discovery, Hoboken, NJ: John Wiley & Sons, 2005.
[15] J. Zou, D. Mueller, H. Graeb, and U. Schlichtmann, “A CPPLL hierarchical
optimization methodology considering jitter, power and locking time,” in Proc.
of IEEE/ACM Design Automation Conf, 2006, pp. 19–24.
[16] N. Godambe and C. J. R. Shi, “Behavioral level noise modeling and jitter sim-
ulation of phase-locked loops with faults using VHDL-AMS,” in Proc. of IEEE
VLSI Test Symposium, 1997, pp. 177–182.
[17] A. Phanse, R. Shirani, R. Rasmussen, R. Mendel and J. Yuan, “Behavioral
modeling of a phase locked loop,” in Southcon’96, 1996, pp. 400–404.
[18] S. R. Nassif, “Modeling and analysis of manufacturing variations,” in IEEE
Custom Integrated Circuits Conference, 2001, pp. 223–228.
[19] D. Morrison, Multivariate Statistical Methods, New York City, NY: McGraw-
Hill, 1976.
[20] Z. Feng and P. Li, “Performance-oriented statistical parameter reduction of
parameterized systems via reduced rank regression,” in Proc. of IEEE/ACM
Int. Conf. on CAD, 2006, pp. 868–875.
[21] G. Reinsel and R. Velu, Multivariate Reduced-Rank Regression, Theory and
Applications, New York City, NY: Springer-Verlag, 1998.
[22] J. Sacks, W. Welch, T. Mitchell, and H. Wynn, “Design and analysis of computer
experiments,” Statistical Science, vol. 4, no. 4, pp. 409–435, Nov. 1989.
[23] M. Bernardo, R. Buck, L. Liu, W. Nazaret, J. Sacks, and W. Welch, “Inte-
grated circuit design optimization using a sequential strategy,” IEEE Trans. on
Computer-Aided Design, vol. 11, no. 3, pp. 361–372, March 1992.
[24] G. Yu and P. Li, “Yield-aware hierarchical optimization of large analog inte-
grated circuits,” in Proc. of IEEE/ACM Int. Conf. on CAD, 2008, pp. 79–84.
[25] M. Driscoll, W. Daasch, and C. Sembakutti, “Eﬃcient design centering of analog
integrated circuits using binary search,” Analog Integrated Circuits and Signal
Processing, vol. 6, pp. 157 – 169, 1994.
[26] F. Schenkel, M. Pronath, S. Zizala, R. Schwencker, H. Graeb, and K. Antreich,
“Mismatch analysis and direct yield optimization by specwise linearization and
feasibility-guided search,” in Proc. of IEEE Design Automation Conference,
2001, pp. 858–863.
[27] Y. Xu, K. Hsiung, X. Li, I. Vausieda, S. Boyd and L. Pileggi, “OPERA: Opti-
mization with ellipsoidal uncertainty for robust analog IC design,” in Proc. of
IEEE/ACM Design Automation Conference, 2005, pp. 632–637.
[28] Cadence Design Systems Technical Staﬀ, Virtuoso NeoCircuit User Guide, Ca-
dence Design Systems, 2009.
[29] MunEDA Technical Staﬀ, WiCkeD User Guide, MunEDA, 2009.
[30] K. Antreich, H. Graeb, and C. Wieser, “Circuit analysis and optimization driven
by worst-case distances,” IEEE Trans. on Computer-Aided Design, vol. 13, no.
1, pp. 57–71, Jan. 1994.
[31] G. Stehr, H. Graeb, and K. Antreich, “Performance trade-off analysis of analog
circuits by normal-boundary intersection,” in Proc. of IEEE/ACM Design
Automation Conference, 2003, pp. 958–963.
[32] S. Tiwary, P. Tiwary, and R. Rutenbar, “Generation of yield-aware Pareto surfaces
for hierarchical circuit design space exploration,” in Proc. of IEEE/ACM
Design Automation Conference, 2006, pp. 31–36.
[33] B. De Smedt and G. Gielen, “HOLMES: Capturing the yield-optimized design
space boundaries of analog and RF integrated circuits,” in Proc. of IEEE/ACM
Design, Automation and Test in Europe Conference and Exhibition, 2003, pp.
19–24.
[34] W. Huyer and A. Neumaier, “Global optimization by multilevel coordinate
search,” Journal of Global Optimization, vol. 14, no. 4, pp. 331–355, June 1999.
[35] S. Tiwary, S. Velu, R. Rutenbar, and T. Mukherjee, “Pareto optimal modeling
for efficient PLL optimization,” in Nanotech 2004 Vol. 2, 2004, pp. 195–198.
[36] Open Verilog International, Verilog-A Language Reference Manual,
www.verilog.org, Accessed Sept. 2009.
[37] K. Kundert, Predicting the Phase Noise and Jitter of PLL-Based Frequency
Synthesizers, www.designers-guide.org, Accessed Aug. 2006.
[38] G. Roberts, “Metrics, techniques and recent developments in mixed-signal testing,”
in Proc. of IEEE/ACM Int. Conf. on CAD, 1996, pp. 514–521.
[39] G. Yu, P. Li, and W. Dong, “Achieving low-cost linearity test and diagnosis of
sigma delta ADCs via frequency-domain nonlinear analysis and macromodeling,”
in Proc. of IEEE Int. Symposium on Quality Electronic Design, 2007, pp. 513–
518.
[40] M. Schetzen, The Volterra and Wiener Theories of Nonlinear Systems, Hoboken,
NJ: John Wiley & Sons, 1980.
[41] P. Wambacq and W. Sansen, Distortion Analysis of Analog Integrated Circuits,
Dordrecht, The Netherlands: Kluwer Academic Publishers, 1998.
[42] W. J. Rugh, Nonlinear System Theory - The Volterra/Wiener Approach, Balti-
more, MD: The Johns Hopkins University Press, 1981.
[43] N. Csizmadia and A.J.E.M. Janssen, “Estimating the integral non-linearity of
AD-converters via the frequency domain,” in IEEE Int. Test Conf., 1999, pp.
757–762.
[44] INL/DNL Measurements for High-Speed Analog-to-Digital Converters (ADCs) - AN 283,
Maxim Integrated Products, Dallas, TX, Nov. 2001.
[45] V. Vapnik, Statistical Learning Theory, Hoboken, NJ: Wiley-Interscience Pub-
lishers, 1998.
[46] S. Sunter and A. Roy, “BIST for phase-locked loops in digital applications,” in
Proc. of IEEE International Test Conference, 1999, pp. 532–540.
[47] S. Kim and M. Soma, “An all-digital built-in self-test for high-speed phase-locked
loops,” IEEE Trans. on Circuits and Systems - II: Analog and Digital
Signal Processing, vol. 48, no. 2, pp. 141–150, Feb. 2001.
[48] C. Hsu, Y. Lai and S. Wang, “Built-in self-test for phase-locked loops,” IEEE
Trans. on Instr. and Measurement, vol. 54, no. 3, pp. 996–1002, June 2005.
[49] F. Azais, Y. Bertrand, M. Renovell, A. Ivanov and S. Tabatabaei, “An all-digital
DFT scheme for testing catastrophic faults in PLLs,” IEEE Design & Test of
Computers, vol. 20, no. 1, pp. 60–67, Jan. 2003.
[50] M. J. M. Pelgrom, A. C. J. Duinmaijer, and A. P. G. Welbers, “Statistical
modeling of device mismatch for analog integrated circuits,” IEEE J. Solid-State
Circuits, vol. 24, no. 5, pp. 1433–1440, October 1989.
[51] U. Schaper, J. Einfeld, and A. Sauerbrey, “Parameter variation on chip-level,”
in IEEE Int. Conf. on Microelectronic Test Structures, 2005, pp. 155–158.
[52] J. Dunning, G. Garcia, J. Lundberg, and E. Nuckolls, “An all-digital phase-
locked loop with 50-cycle lock time suitable for high-performance microproces-
sors,” IEEE J. Solid-State Circuits, vol. 30, no. 4, pp. 412–422, Apr. 1995.
[53] C. Chung and C. Lee, “An all-digital phase-locked loop for high-speed clock
generation,” IEEE J. Solid-State Circuits, vol. 38, no. 2, pp. 347–351, Feb. 2003.
[54] R. Staszewski, K. Muhammad, D. Leipold, C. Hung, Y. Ho, et al., “All-digital
TX frequency synthesizer and discrete-time receiver for Bluetooth radio
in 130-nm CMOS,” IEEE J. Solid-State Circuits, vol. 39, no. 12, pp. 2278–2291,
Dec. 2004.
[55] H. Chang, P. Wang, J. Zhan, and B. Hsieh, “A fractional spur-free ADPLL with
loop-gain calibration and phase-noise cancellation for GSM/GPRS/EDGE,” in
Proc. of ISSCC Dig. Tech. Papers, 2008, pp. 200–202.
[56] C. Hsu, M. Straayer, and M. Perrott, “A low-noise wide-BW 3.6GHz digital ΔΣ
fractional-N frequency synthesizer with a noise-shaping time-to-digital converter
and quantization noise cancellation,” in Proc. of ISSCC Dig. Tech. Papers, 2008,
pp. 340–342.
[57] C. Wu, E. Temporiti, D. Baldi, and F. Svelto, “A 3GHz fractional-N all-digital
PLL with precise time-to-digital converter calibration and mismatch correction,”
in Proc. of ISSCC Dig. Tech. Papers, 2008, pp. 344–346.
[58] R. Staszewski, S. Vemulapalli, P. Vallur, J. Wallberg, and P.T. Balsara, “1.3 V
20 ps time-to-digital converter for frequency synthesis in 90-nm CMOS,” IEEE
Trans. on Circuits and Systems - II, vol. 53, no. 4, pp. 769–777, Mar. 2006.
[59] C. Lau and M. Perrott, “Fractional-N frequency synthesizer design at the trans-
fer function level using a direct closed loop realization algorithm,” in Proc. of
IEEE/ACM Design Automation Conference, July 2003, pp. 526–531.
[60] C. Hsu, Techniques for High-Performance Digital Frequency Synthesis and Phase
Control, PhD dissertation, MIT, Cambridge, MA, 2008.
[61] F. Gardner, “Charge-pump phase-locked loops,” IEEE Trans. on Communications,
vol. 28, pp. 1849–1858, Nov. 1980.
[62] M. Lee and A. Abidi, “A 9b, 1.25 ps resolution coarse-fine time-to-digital converter
in 90 nm CMOS that amplifies a time residue,” IEEE J. Solid-State
Circuits, vol. 43, no. 4, pp. 769–777, Apr. 2008.
[63] Cadence Design Systems Technical Staff, SpectreRF Circuit Simulator User
Guide, Cadence Design Systems, June 2008.
[64] K. Waheed and R. Staszewski, “Digital RF processing techniques for device
mismatch tolerant transmitters in nanometer-scale CMOS,” in Proc. of IEEE
International Symposium on Circuits and Systems, May 2007, pp. 1253–1256.
[65] R. Staszewski, I. Bashir, and O. Eliezer, “RF built-in self test of a wireless
transmitter,” IEEE Trans. on Circuits and Systems - II, vol. 39, no. 12, pp.
2278–2291, Dec. 2004.
[66] Clock Jitter and Phase Noise Conversion - Maxim AN 3359, Maxim Integrated
Products, Dallas, TX, Dec. 2004.
[67] R. Staszewski, C. Fernando, and P. Balsara, “Event-driven simulation and modeling
of phase noise of an RF oscillator,” IEEE Trans. on Circuits and Systems
- I, vol. 52, no. 4, pp. 723–733, Apr. 2005.
[68] T. Chen and S. Naffziger, “Comparison of adaptive body bias (ABB) and adaptive
supply voltage (ASV) for improving delay and leakage under the presence
of process variation,” IEEE Trans. on VLSI, vol. 11, no. 5, pp. 888–899, May
2003.
[69] J. Tschanz, J. Kao, S. Narendra, R. Nair, D. Antoniadis, A. Chandrakasan, and
V. De, “Adaptive body bias for reducing impacts of die-to-die and within-die
parameter variations on microprocessor frequency and leakage,” IEEE J.
Solid-State Circuits, vol. 37, no. 11, pp. 1396–1402, Nov. 2002.
[70] Synopsys Acquires Analog Business Group of MIPS Technologies, PR Newswire,
Mountain View, CA, May 2009.
VITA
Guo Yu received the B.S. and M.S. degrees in electrical engineering from Fudan
University, Shanghai, China and Delft University of Technology (TU Delft), Delft, the
Netherlands, in 2003 and 2005, respectively. He worked at Philips Semiconductors
Netherlands from Oct. 2004 to Aug. 2005, and at Cadence Design Systems Pittsburgh
during the summer of 2007, as a research intern. His research interests include
modeling and optimization for analog/mixed-signal circuits, built-in test scheme
design, timing and yield analysis for VLSI circuits, and device modeling. He can be
reached at WERC 331A, Department of Electrical and Computer Engineering, Texas
A&M University, College Station, TX 77843-3128.
The typist for this thesis was Guo Yu.
