Rapid Co-optimization of Processing and Circuit Design to Overcome
  Carbon Nanotube Variations by Hills, Gage et al.
1082 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 34, NO. 7, JULY 2015
Rapid Co-Optimization of Processing and Circuit
Design to Overcome Carbon Nanotube Variations
Gage Hills, Jie Zhang, Member, IEEE, Max Marcel Shulaker, Hai Wei, Student Member, IEEE, Chi-Shuen Lee,
Arjun Balasingam, H.-S. Philip Wong, Fellow, IEEE, and Subhasish Mitra, Fellow, IEEE
Abstract—Carbon nanotube field-effect transistors (CNFETs)
are promising candidates for building energy-efficient digital
systems at highly scaled technology nodes. However, carbon nan-
otubes (CNTs) are inherently subject to variations that reduce
circuit yield, increase susceptibility to noise, and severely degrade
their anticipated energy and speed benefits. Joint exploration
and optimization of CNT processing options and CNFET cir-
cuit design are required to overcome this outstanding challenge.
Unfortunately, existing approaches for such exploration and
optimization are computationally expensive, and mostly rely
on trial-and-error-based ad hoc techniques. In this paper, we
present a framework that quickly evaluates the impact of CNT
variations on circuit delay and noise margin, and systemati-
cally explores the large space of CNT processing options to
derive optimized CNT processing and CNFET circuit design
guidelines. We demonstrate that our framework: 1) runs over
100× faster than existing approaches and 2) accurately iden-
tifies the most important CNT processing parameters, together
with CNFET circuit design parameters (e.g., for CNFET siz-
ing and standard cell layouts), to minimize the impact of CNT
variations on CNFET circuit speed with ≤5% energy cost,
while simultaneously meeting circuit-level noise margin and yield
constraints.
Index Terms—Carbon nanotube (CNT), CNT variations, delay
optimization, design-technology co-optimization.
I. INTRODUCTION
While physical scaling of silicon-based field-effect transistors has improved digital system performance for
decades [10], continued device scaling is becoming increas-
ingly challenging [2]. Carbon nanotube (CNT) field-effect
transistors (CNFETs) are excellent candidates for contin-
uing to improve both performance and energy efficiency
of digital systems [13]. CNFET-based very large-scale inte-
grated (VLSI) digital systems are projected to improve
energy-delay product (EDP) by an order of magnitude
versus silicon-CMOS [6], [46]. Furthermore, CNFETs pro-
vide an exciting opportunity to enable monolithic 3-D
integrated circuits [47], leading to additional EDP benefits for CNFET-based digital systems with massive integration of logic and memory [42].
Manuscript received March 1, 2014; revised June 13, 2014; accepted September 22, 2014. Date of publication March 23, 2015; date of current version June 16, 2015. This work was supported in part by the National Science Foundation through the NCN-NEEDS program, contract 1227020-EEC. This paper was recommended by Associate Editor Y. Cao.
G. Hills, J. Zhang, M. M. Shulaker, H. Wei, C.-S. Lee, A. Balasingam, and H.-S. P. Wong are with the Department of Electrical Engineering, Stanford University, Stanford, CA 94305 USA (e-mail: ghills@stanford.edu).
S. Mitra is with the Department of Electrical Engineering and the Department of Computer Science, Stanford University, Stanford, CA 94305 USA.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCAD.2015.2415492
Fig. 1. (a) CNT. (b) Typical CNFET structure. (c) Scanning electron microscopy image of the CNFET channel.
The schematic of a CNFET is shown in Fig. 1. Multiple
CNTs compose the transistor channel, whose conductance is
modulated by the gate. The gate, source, and drain are defined
using traditional photolithography, while the CNT-CNT spacing is determined by the CNT growth [31] and can therefore be smaller than the minimum lithographic pitch. For high drive current, the target CNT-CNT spacing is 4–5 nm [46].
Despite demonstrations of sub-10 nm channel
length CNFETs [13] and stand-alone CNFET circuit
elements [5], [7], [11], realization of complex CNFET-
based digital systems had been prohibited by substantial
imperfections inherent to CNTs: mis-positioned CNTs and
metallic CNTs. Mis-positioned CNTs cause stray conducting
paths that can lead to incorrect logic functionality, and
metallic CNTs (resulting from the imprecise control over
CNT properties) result in increased leakage current and can
lead to incorrect logic functionality. A unique combination
of CNT processing and CNFET circuit design techniques,
known as the imperfection-immune paradigm [54], over-
comes these challenges in a VLSI-compatible manner to
enable the realization of the first CNFET-based digital
systems [32], [33], [40], including the first programmable
microprocessor built using CNFETs [39]. Two key enablers
of these demonstrations are: 1) mis-positioned CNT-immune
layout design [30] and 2) VLSI-compatible metallic CNT
removal (VMR), which efficiently removes ≥99.99% of
metallic CNTs [32], [40].
Unfortunately, process variations specific to CNTs, such as the imprecise control over CNT properties and the nonuniform density of grown CNTs, can lead to significantly reduced circuit yield, increased susceptibility to noise, and large variations in CNFET circuit delays (details in Section II) [54].
One method to counteract these effects is to upsize all
CNFETs. However, such naïve upsizing incurs large energy and
delay costs that diminish CNFET technology benefits.
Rather, various CNT process improvement options, when combined with CNFET circuit design, provide an energy-efficient method of overcoming CNT variations.
0278-0070 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
HILLS et al.: RAPID CO-OPTIMIZATION OF PROCESSING AND CIRCUIT DESIGN TO OVERCOME CARBON NANOTUBE VARIATIONS 1083
Without such strategies, CNT variations can degrade the potential speed benefits of CNFET circuits by ≥20% at sub-10 nm nodes, even for circuits whose CNFETs are upsized to achieve ≥99.9% yield (Section II). By leveraging CNT process improvements, together with CNFET circuit design, the
overall speed degradation can be limited to ≤5% with ≤5%
energy cost while simultaneously meeting circuit-level noise
margin and yield constraints [52].
However, co-optimization of CNT technology options and
CNFET circuit design parameters using trial-and-error-based
search can be prohibitively time-consuming. In this paper, we
demonstrate a systematic and VLSI-scalable methodology that
selects effective combinations of CNT processing options and
CNFET circuit design techniques to overcome CNT variations.
Our key contributions are as follows.
1) Techniques to quickly evaluate the impact of CNT vari-
ations on circuit yield, susceptibility to noise, delay,
and energy. They run >100× faster than previous
approaches.
2) A systematic methodology to explore the large space
of CNT processing options together with CNFET cir-
cuit design parameters (e.g., CNFET sizing and standard
cell layouts leveraging CNT correlation, see Section II),
to rapidly identify designs that reduce the impact
of CNT variations on circuit yield, susceptibility to
noise, and delay variations with ≤5% energy cost. This
is in sharp contrast to previous trial-and-error-based
approaches.
3) Derivation of guidelines for CNT processing and
CNFET circuit design parameters at highly scaled tech-
nology nodes to overcome CNT variations. We provide
guidelines to limit the overall circuit speed degrada-
tion to ≤5% with ≤5% energy cost while maintaining
≥99.999% functional circuit yield and ≤0.001% prob-
ability of failing to meet circuit-level noise margin
requirements (Section IV).
In Section II, we present an overview of CNT variations
and their impact on CNFET circuits. Section III describes
a methodology to optimize circuit performance in the presence
of CNT variations, leveraging a SPICE-compatible CNFET
device model to build efficient variation-aware models for the
delay, energy, and noise margin of CNFET circuits. Using this
methodology, we provide CNT processing and CNFET circuit
design guidelines for overcoming CNT variations at the 14,
10, 7, and 5 nm technology nodes (Section IV).
An earlier version of this paper was published in [16]. Here,
we present the following additional contributions.
1) Design and analysis of CNFET digital VLSI circuits
scaled to the 5 nm node, enabled by a recently developed
SPICE-compatible CNFET device model for accurate
analysis of sub-10 nm gate length CNFETs [23].
2) A computationally efficient technique to numerically
calculate the probability that CNFET circuits fail to
meet circuit-level noise margin requirements. This technique can accurately compute such probabilities even when they are below 0.001% (as is desirable for VLSI-scale circuits, details in Section II-C).
In this paper, we make references to [17], which con-
tains additional figures and analysis details. It is available for
download at http://www.arxiv.org.
TABLE I
CNT PROCESSING PARAMETERS FOR CNT COUNT VARIATIONS. CNT
DENSITY = 250 CNTs/µm FOR ALL ANALYSES [53]
Fig. 2. CNFET ION variations due to CNT variations (x-axis) for a minimum-
width CNFET (VDD = 0.50 V, width = half-contacted gate pitch; see
[17, Table VI]). IDC = 0.50 [49] for CNT density variations, pm = 33% [37]
and pRs = 4% [40] for m-CNT-induced variations. Diameter is normally dis-
tributed with µd = 1.3 nm and σd = 0.1 nm [31]. Alignment and doping
distribution details in [54]. To analyze ION variations attributed to individual
sources of CNT variations, all other sources of CNT variations are removed.
Additional parameters in [17, Table VI].
II. CNT VARIATIONS
In addition to process variations that exist for silicon-CMOS
FETs (e.g., variations in channel length, oxide thickness,
and threshold voltage [26]), CNFETs are also subject to
CNT-specific variations, including variations in CNT type
(semiconducting: s-CNT or metallic: m-CNT) [32], CNT
density [49], diameter [34], alignment [30], and doping [9]
(details in [17, Sec. VI]). While the on-current (ION) of
a CNFET with only a single CNT as its channel is highly sen-
sitive to CNT diameter variations [34], CNFETs in practical
VLSI circuits consist of multiple CNTs to provide sufficient
ION. Thus, the impact of diameter variations is reduced due to
statistical averaging (Fig. 2) [35]. Rather, ION variations are dominated by variations in the CNT count: the number of s-CNTs in a CNFET (after m-CNT removal, e.g., using VMR)¹ [52]. CNT count variations stem from two sources.
1) CNT Density Variations: Precise positioning of CNTs
is difficult to control; resulting CNT-CNT spacing vari-
ations lead to a variable number of CNTs in each
CNFET [49].
2) m-CNT-Induced Variations: Each CNFET contains
a variable number of both s-CNTs and m-CNTs, result-
ing in CNT count variations even assuming a perfectly
selective m-CNT removal technique (i.e., pRm = 100%,
pRs = 0%: Table I). In addition, m-CNT removal
techniques may inadvertently remove a small frac-
tion of s-CNTs, further contributing to CNT count
variations [54].
CNT count variations are parameterized by the processing parameters defined in Table I: the Index of Dispersion for CNT count (IDC), pm, pRs, and pRm. We analyze the impact of CNT count variations on CNFET circuit modules synthesized from the processor core of OpenSPARC T2, a large multicore chip that closely resembles the commercial Oracle/SUN Niagara 2 system [27]. These OpenSPARC modules consist of ∼4K to >100K logic gates (Table III) and expose several effects in VLSI-scale circuits (e.g., wire parasitics) that are not visible in small circuit benchmarks.
¹Another technique for post-growth m-CNT removal is known as CNT sorting, in which s-CNTs are separated from m-CNTs in a solution [1]. However, CNT sorting techniques have not yet achieved the selectivity required for VLSI-scale digital circuits [50].
We consider the effects of CNT count variations on the
following circuit-level metrics.
1) Functional Yield: Due to CNT count variations, there is
nonzero probability that a CNFET contains no s-CNTs
in its channel, leading to functional failure of the CNFET
(i.e., CNT count failure) [51]. The count-limited yield
of a CNFET circuit is the probability that no CNFET
experiences CNT count failure [51] (Section II-A).
2) Delay Penalty: The increase in the 95th-percentile delay (T95: the minimum clock period that the circuit has a 95% probability of meeting) relative to the nominal delay (the critical path delay when there are no variations). Details in Section II-B.
3) Static Noise Margin (SNM): A measure of the noise
susceptibility of a pair of connected logic gates
(Section II-C).
4) Probability of Noise Margin Violation (PNMV): The
probability that any pair of connected logic gates in
a circuit fails to meet SNMR, a required SNM level
(Section II-C).
A. Impact on Circuit Functional Yield
For VLSI CNFET circuits with minimum-width CNFETs,
the count-limited yield can be very low (near zero) [51]. An
effective method to significantly improve the count-limited
yield (≥99.999%) is to perform minimum-width upsizing:
upsize all CNFETs that have width (W) less than a specified
minimum width (WMIN) to have W = WMIN [51]. Although
minimum-width upsizing effectively improves count-limited
yield, it can incur large energy costs if the CNT count failures
of all CNFETs are independent [51]. Rather, for CNFET cir-
cuits with highly aligned CNTs, the count-limited yield (and
the energy cost of minimum-width upsizing, details below) can
be significantly improved by leveraging the unique property
of CNT correlation: since CNTs are 1-D nanostructures with
lengths typically much longer than the CNFET contacted gate-
pitch [20], [31], the CNT counts of CNFETs can be uncorre-
lated or highly correlated depending on the relative physical
placement of the CNFET active regions (active region: area of
channel which has CNTs) [51]. Special aligned-active lay-
outs can engineer these correlations by aligning the active
regions in a library to maximize correlation [17, Fig. 15].
Aligned-active layouts incur minimal area overhead (only 4 of the 134 cells in the Nangate 45 nm Open Cell Library [25] incur an area penalty, and each such penalty is <14%), and the locations of I/O pins are mostly retained, resulting in negligible impact on intercell routing [51].
Fig. 3. Full analysis & design methodology. Steps 1–3 (highlighted) are described in this section. Additional details in Section III and [17, Sec. VII].
Fig. 4. Energy cost of minimum-width upsizing with aligned-active layouts to achieve ≥99.9% count-limited yield: OpenSPARC modules, IDC = 0.50, pm = 10%, pRs = 4%, pRm = 99.99% (count-limited yield improves to ≥99.999% with the processing guidelines in Section IV). Improving delay penalty and PNMV can require additional energy costs.
To achieve count-limited yield ≥99% for circuits today (which can consist of ∼100M logic gates), the count-limited yield for each OpenSPARC module (∼100K logic gates) should be ≥99.999%. To reach this target, we use a combination of minimum-width upsizing, aligned-active layouts, and CNT process improvements. We first use minimum-width upsizing with aligned-active layouts to achieve count-limited yield ≥99.9% (which is lower than the 99.999% requirement, details below). Then, CNT process improvements (which are required to meet delay penalty and noise margin requirements) further improve the count-limited yield, e.g., to ≥99.999% (details below in steps 1–3). We define ΔEFUNC as the energy cost (in terms of total energy per cycle) of minimum-width upsizing to reach a desired count-limited yield (i.e., functional yield). ΔEFUNC is ≤2.5% for all the OpenSPARC modules (Fig. 4). It is determined using the design flow in Fig. 3, whose steps 1–3 are described below.
1) Synthesis, Place and Route, and Parasitic Extraction:
Targeting the nominal case: no variations. Details in
[17, Sec. VII].
2) Minimum-Width Upsizing for Count-Limited Yield:
Determine WMIN to achieve count-limited yield ≥99.9%
with aligned-active layouts via the methodology in [51],
using experimentally demonstrated values for the pro-
cessing parameters (Table I, though other values may
be chosen). Then perform minimum-width upsizing (the
associated energy cost is ΔEFUNC). Note that this initial
count-limited yield target of ≥99.9% is lower than the
required ≥99.999% count-limited yield. In Section IV,
we show that CNT process improvements (which are
required to meet delay penalty and noise margin require-
ments) further improve the count-limited yield, e.g.,
to ≥99.999%. If count-limited yield ≥99.999% is not
achieved after meeting delay penalty and noise margin
requirements, we return to this step and increase WMIN
to the width of the next-largest CNFET in our standard
cell library (details in [17, Sec. VII]).
3) Circuit EDP Optimization: We use the EDP metric to quantify energy efficiency. We perform circuit sizing to minimize circuit EDP using a selective transistor/logic gate upsizing algorithm (i.e., selective upsizing) inspired by [52]: targeting the nominal case, we first sort all standard cells according to their fan-out (fan-out: the ratio of the output load capacitance to the minimum input capacitance on any input), then upsize a parameterized number kSelUpsize ≥ 0 of the standard cells with the largest fan-out (see algorithm in [17, Sec. VII-C]). We sweep kSelUpsize to generate an energy-delay tradeoff curve. We record the nominal energy (ENom) and the nominal critical path delay (TNom) for each point on this curve, and then select the point with the minimum EDP [EDPNomOpt, defined in (1)]. This point (for the nominal case) is referred to as the EDP-optimized nominal design point: (ENomOpt, TNomOpt). All delay penalties are relative to this point.
$$\mathrm{EDP}_{\mathrm{NomOpt}} = E_{\mathrm{NomOpt}}\,T_{\mathrm{NomOpt}}. \qquad (1)$$
Fig. 5. Delay penalty for the OpenSPARC modules (after steps 1–3 in Fig. 3). IDC = 0.50, pm = 10%, pRs = 4%, pRm = 99.99%. (a) Delay penalty across technology nodes. (b) Delay penalty improvement due to selective upsizing with ΔE ≤ 5%. For both (a) and (b): count-limited yield ≥99.98% in all cases (it improves to ≥99.999% with the processing guidelines in Section IV).
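The sweep in step 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the algorithm of [17, Sec. VII-C]: fan-out values are supplied directly, and `toy_evaluate` is a hypothetical stand-in for the nominal timing and energy analysis that would return (ENom, TNom) for a given set of upsized cells.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def cells_to_upsize(fanouts: Sequence[float], k_sel_upsize: int) -> List[int]:
    """Indices of the k_sel_upsize cells with the largest fan-out (ties by index)."""
    order = sorted(range(len(fanouts)), key=lambda i: (-fanouts[i], i))
    return order[:k_sel_upsize]


def edp_optimized_point(
    evaluate: Callable[[List[int]], Tuple[float, float]],
    fanouts: Sequence[float],
    k_values: Sequence[int],
) -> Optional[Tuple[int, float, float]]:
    """Sweep k_sel_upsize and return (k, E_nom, T_nom) with the minimum nominal EDP."""
    best: Optional[Tuple[int, float, float]] = None
    for k in k_values:
        e_nom, t_nom = evaluate(cells_to_upsize(fanouts, k))
        if best is None or e_nom * t_nom < best[1] * best[2]:
            best = (k, e_nom, t_nom)
    return best


# Hypothetical toy cost model (not from the paper): each upsized cell costs
# one energy unit, and delay saturates as more high-fan-out cells are upsized.
def toy_evaluate(upsized: List[int], fanouts=(4.0, 1.0, 8.0, 2.0)) -> Tuple[float, float]:
    s = sum(fanouts[i] for i in upsized)
    return 10.0 + len(upsized), 5.0 + 20.0 / (4.0 + s)


fanouts = [4.0, 1.0, 8.0, 2.0]
print(cells_to_upsize(fanouts, 2))  # [2, 0]
print(edp_optimized_point(toy_evaluate, fanouts, range(len(fanouts) + 1)))
```

With this toy model, upsizing only the highest-fan-out cell gives the lowest EDP; in a real flow, each `evaluate` call would be a full nominal timing and energy run.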
While (ENomOpt, TNomOpt) represents an attractive design
in the nominal case (since EDPNomOpt is small versus other
points on the energy-delay tradeoff curve), this design may
have a high delay penalty due to CNT variations (e.g., it can
be ≥20% at sub-10 nm nodes: Section II-B).
B. Impact on Circuit Delay Variations
To derive distributions of CNFET circuit delays resulting
from CNT count variations, we leverage the methodology
described in [52]. This is a Monte Carlo statistical static tim-
ing analysis (MC SSTA) approach with two key changes:
1) a variation-aware timing model for CNFET logic gates
(built using a CNFET device model [45]) and 2) highly effi-
cient CNT count sampling, based on the unique asymmetric
CNT correlation property (Section II-A). This allows us to
compute the delay penalty for each OpenSPARC module (after
steps 1–3 in Fig. 3) as follows: sample the delay distribution
via MC SSTA (using 2000 trials, excluding any trials that have
CNT count failure), then extract T95 from the delay distribution
to calculate the delay penalty [Fig. 5(a)]. Fig. 5(a) illustrates
that the delay penalty for the OpenSPARC modules can be
≥20% for EDP-optimized designs with aligned-active layouts
at highly scaled technology nodes.
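The following is a minimal sketch of the final step, extracting T95 and the delay penalty from the MC SSTA samples. It assumes that the critical-path delay of each trial and a per-trial CNT count failure flag are already available. The delay samples in the usage lines are hypothetical Gaussian values, not OpenSPARC results.

```python
import numpy as np


def delay_penalty(trial_delays, count_failure, t_nom_opt, percentile=95.0):
    """Return (T95, delay penalty) from Monte Carlo critical-path delays.

    trial_delays:  critical-path delay of each MC trial
    count_failure: True for trials in which some CNFET has zero s-CNTs;
                   these trials are excluded, as described in the text
    t_nom_opt:     nominal critical-path delay of the EDP-optimized design
    """
    delays = np.asarray(trial_delays, dtype=float)
    keep = ~np.asarray(count_failure, dtype=bool)
    t95 = float(np.percentile(delays[keep], percentile))
    return t95, t95 / t_nom_opt - 1.0


rng = np.random.default_rng(1)
samples = 1.0 + 0.05 * rng.standard_normal(2000)  # hypothetical delays, 2000 trials
failed = np.zeros(2000, dtype=bool)
t95, penalty = delay_penalty(samples, failed, t_nom_opt=1.0)
print(f"T95 = {t95:.3f}, delay penalty = {100 * penalty:.1f}%")
```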
Fig. 6. SNM illustration. (a) Example gate pair. (b) VTC(dr) and mirrored VTC(ld). For each of G(dr) and G(ld): VOH, VIH, VIL, and VOL are taken from the points on the VTC where the slope is −1.
To overcome CNT variations, we target delay penalty ≤5% with total energy per cycle cost ΔE ≤ 5% [relative to ENomOpt in (1)] to maintain ≥90% of the projected EDP benefits of CNFET circuits, even in the presence of CNT variations. To improve delay penalties, we leverage the selective upsizing approach described in Section II-A [52]. Fig. 5(b) shows that selective upsizing can reduce delay penalties by 1.5× (e.g., from 17% to 11% for the “gkt” OpenSPARC module). In Fig. 5(b), additional selective upsizing was performed after steps 1–3 in Fig. 3 by increasing kSelUpsize to minimize the delay penalty subject to ΔE ≤ 5%.
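This is a quick arithmetic check of the ≥90% figure. It assumes that the EDP advantage of a design scales inversely with the product of its delay growth and energy growth. Under that assumption, meeting both 5% limits retains about 90.7% of the nominal EDP benefit.

```python
def edp_benefit_retained(delay_penalty: float, energy_cost: float) -> float:
    """Fraction of a nominal EDP advantage kept when delay and energy both grow."""
    return 1.0 / ((1.0 + delay_penalty) * (1.0 + energy_cost))


print(round(edp_benefit_retained(0.05, 0.05), 3))  # 0.907
```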
C. Impact on Circuit PNMV
A common metric to quantify the noise susceptibility of
a pair of connected logic gates [i.e., a gate pair: (G(dr), G(ld)),
where G(dr) and G(ld) are the driving and loading logic gates,
respectively] is the SNM, which can be quantified as fol-
lows [using the gate pair shown in Fig. 6(a) as an example].
Let G(dr) have voltage transfer curve VTC(dr) (voltage transfer curve: VOUT versus VIN in the static case) and let G(ld) have voltage transfer curve VTC(ld). Also, let $(V_{IL}^{(dr)}, V_{OH}^{(dr)})$ and $(V_{IH}^{(dr)}, V_{OL}^{(dr)})$ be the points on VTC(dr) where the slope of VOUT versus VIN is −1 [as shown in Fig. 6(b)]. Similarly, define $(V_{IL}^{(ld)}, V_{OH}^{(ld)})$ and $(V_{IH}^{(ld)}, V_{OL}^{(ld)})$ for VTC(ld) [mirrored in Fig. 6(b)]. Then for the gate pair (G(dr), G(ld)), the high SNM (SNMH), the low SNM (SNML), and the SNM are defined in (2)–(4), respectively [48]:
$$\mathrm{SNM}_H\left(G^{(dr)}, G^{(ld)}\right) = V_{OH}^{(dr)} - V_{IH}^{(ld)} \qquad (2)$$
$$\mathrm{SNM}_L\left(G^{(dr)}, G^{(ld)}\right) = V_{IL}^{(ld)} - V_{OL}^{(dr)} \qquad (3)$$
$$\mathrm{SNM}\left(G^{(dr)}, G^{(ld)}\right) = \min\left(V_{OH}^{(dr)} - V_{IH}^{(ld)},\; V_{IL}^{(ld)} - V_{OL}^{(dr)}\right). \qquad (4)$$
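Equations (2)–(4) can be evaluated numerically from sampled VTCs. In the sketch below, the logistic VTC is a hypothetical stand-in for a SPICE-simulated CNFET VTC. The slope = −1 points are located on a uniform input grid, so their accuracy is limited by the grid spacing.

```python
import numpy as np


def vtc_parameters(v_in, v_out):
    """Return (V_IL, V_OH, V_IH, V_OL) for a monotonically decreasing VTC.

    V_IL/V_OH and V_IH/V_OL are taken at the first and last input voltages where
    dVout/dVin < -1, i.e. the edges of the high-gain region (the slope = -1
    points), resolved to the grid spacing of v_in.
    """
    slope = np.gradient(v_out, v_in)
    idx = np.flatnonzero(slope < -1.0)
    if idx.size == 0:
        raise ValueError("VTC gain never exceeds 1; slope = -1 points do not exist")
    first, last = idx[0], idx[-1]
    return v_in[first], v_out[first], v_in[last], v_out[last]


def snm(driver, load):
    """Equations (2)-(4): (SNM_H, SNM_L, SNM) for the gate pair (G_dr, G_ld)."""
    _, voh_dr, _, vol_dr = driver
    vil_ld, _, vih_ld, _ = load
    snm_h = voh_dr - vih_ld
    snm_l = vil_ld - vol_dr
    return snm_h, snm_l, min(snm_h, snm_l)


def logistic_vtc(v_in, vdd, v_m, steepness):
    """Hypothetical smooth inverter VTC with switching threshold v_m."""
    return vdd / (1.0 + np.exp(steepness * (v_in - v_m)))


VDD = 0.5
v = np.linspace(0.0, VDD, 50_001)
nominal = vtc_parameters(v, logistic_vtc(v, VDD, 0.25, 40.0))
shifted = vtc_parameters(v, logistic_vtc(v, VDD, 0.22, 40.0))  # e.g. a skewed load gate
print(snm(nominal, nominal))  # symmetric pair: SNM_H and SNM_L both ~0.151 V
print(snm(nominal, shifted))  # SNM_L shrinks, so the pair's SNM drops
```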
SNM(G(dr), G(ld)) is sensitive to ION variations [48], and so it is sensitive to CNT count variations. To quantify the impact of SNM variations on circuit noise susceptibility, we use the PNMV metric, which is the probability that any gate pair in a circuit fails to meet a required SNM level: SNMR. SNMR is a design constraint chosen by the designer, and PNMV increases with SNMR: a larger SNMR (a tighter SNM requirement) lowers the probability that every gate pair meets the requirement. Typical values of SNMR are specified as a fraction of the supply voltage VDD (e.g., SNMR = VDD/5 [48]). PNMV is defined in (5), where C is the set of all gate pairs:
$$P_{\mathrm{NMV}} = 1 - P\left\{ \bigcap_{(G^{(dr)},\, G^{(ld)}) \in C} \left( \mathrm{SNM}\left(G^{(dr)}, G^{(ld)}\right) \ge \mathrm{SNM}_R \right) \right\}. \qquad (5)$$
To solve for PNMV due to CNT count variations, we
leverage a variation-aware SNM model that can compute
SNMH(G(dr), G(ld)) and SNML(G(dr), G(ld)) for every gate pair
in a circuit, given the CNT counts of each CNFET contained
in G(dr) and G(ld) (details in Section III-B1). In Section III-B2,
we describe how to combine this variation-aware SNM model
and the distributions of CNT count for all CNFETs in the
circuit to efficiently calculate PNMV.
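For reference, definition (5) can be estimated directly by Monte Carlo when SNM samples for every gate pair are available. This naive estimator is not the technique of Section III-B2. The helper `trials_needed`, a standard binomial error estimate, illustrates why the naive approach becomes impractical at PNMV ≈ 0.001%: roughly 10^7 trials are needed for a 10% relative standard error. The SNM values in the usage lines are hypothetical.

```python
import numpy as np


def pnmv_monte_carlo(snm_samples, snm_required):
    """Naive Monte Carlo estimate of (5).

    snm_samples: (num_gate_pairs, num_trials) array; column t holds the SNM of
                 every gate pair in trial t (any CNT count correlation must
                 already be built into these samples by the sampler).
    """
    snm_arr = np.asarray(snm_samples, dtype=float)
    every_pair_ok = np.all(snm_arr >= snm_required, axis=0)  # intersection in (5)
    return 1.0 - every_pair_ok.mean()


def trials_needed(p, rel_err):
    """Trials for a binomial estimate of probability p to reach a relative std. error."""
    return (1.0 - p) / (p * rel_err ** 2)


snm_samples = [[0.20, 0.20, 0.05, 0.20],
               [0.20, 0.01, 0.20, 0.20]]
print(pnmv_monte_carlo(snm_samples, snm_required=0.10))  # 0.5 (2nd and 3rd trials fail)
print(f"{trials_needed(1e-5, 0.1):.2e}")                 # 1.00e+07
```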
Fig. 7. PNMV for the OpenSPARC modules (after steps 1–3 in Fig. 3).
IDC = 0.50, pm = 10%, pRs = 4%, pRm = 99.99%. (a) PNMV versus
node (SNMR = VDD/4). (b) PNMV versus SNMR (5 nm node). (c) PNMV
versus ΔE for additional minimum-width upsizing (in addition to minimum-width upsizing for count-limited yield, Fig. 3): “dec” OpenSPARC module.
For (a), (b) and (c): count-limited yield ≥99.98% in all cases (it improves to
≥99.999% with the processing guidelines in Section IV).
Fig. 7(a) and (b) quantify PNMV for the OpenSPARC mod-
ules (after steps 1–3 in Fig. 3), which can be nearly 100% at
the 5 nm node. To achieve PNMV ≤ 1% for circuits today
(with ∼100M logic gates), each OpenSPARC module (∼100K
logic gates) should have PNMV ≤ 0.001%.
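Both per-module targets (≥99.999% count-limited yield in Section II-A and PNMV ≤ 0.001% here) follow from splitting a chip-level 1% failure budget across ∼1000 modules. The minimal sketch below assumes, for illustration only, that modules fail independently.

```python
def per_module_budget(chip_budget: float, n_modules: int) -> float:
    """Per-module failure probability such that n independent modules together
    fail with probability at most chip_budget."""
    return 1.0 - (1.0 - chip_budget) ** (1.0 / n_modules)


n_modules = 100_000_000 // 100_000        # ~100M gates / ~100K gates per module
print(n_modules)                           # 1000
print(per_module_budget(0.01, n_modules))  # about 1.005e-05, i.e. ~0.001%
print((1.0 - 1e-5) ** n_modules >= 0.99)   # True: a 0.001% budget per module suffices
```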
Since minimum-width CNFETs are highly sensitive to CNT count variations [52], gate pairs that contain minimum-width CNFETs are highly likely to cause SNM violations. Thus, PNMV is highly sensitive to minimum-width CNFETs, so further minimum-width upsizing (in addition to minimum-width upsizing for count-limited yield) improves PNMV (via statistical averaging) at the cost of energy [Fig. 7(c)]. However, additional minimum-width upsizing may be undesirable: it can require ΔE > 5%, can increase circuit delay, and is not guaranteed to meet PNMV constraints [17, Sec. IX-A].
D. Overcoming CNT Variations
As shown above, CNFET upsizing techniques alone can be insufficient to meet design goals (e.g., delay penalty ≤5% and PNMV ≤ 0.001% with ΔE ≤ 5%) [54]. Rather, a combination of CNT processing and CNFET circuit design is required [54], but two key questions must be answered: 1) which processing parameters should be improved, and 2) by how much?
Without a systematic methodology to evaluate the circuit-level impact of CNT variations, one might blindly pursue difficult CNT processing paths with diminishing returns, while overlooking other processing parameters that enable larger performance gains. For example, much research has focused solely on improving pm [1]. However, reducing pm below 1% suffers from diminishing returns and can be insufficient to meet design goals [16], [54] (e.g., in Fig. 16 of [17, Sec. VII], pm = 0.1% does not achieve delay penalty ≤5%).
Previously, co-optimization of processing and design has
been performed via a trial-and-error-based approach [52].
However, this can be prohibitively time-consuming, potentially
requiring months of simulation time (details in Section IV).
In Section III, we present a methodology that efficiently selects
effective combinations of CNT processing options and CNFET
circuit design techniques to overcome CNT variations.
Fig. 8. (a) Subset of logic gates in an example circuit module. (b) Illustration
of two rows of standard cells that depicts the relationship between the sam-
pling region CNT counts (e.g., n1, n2, . . . , n12) and the CNT counts of each
CNFET [53]. For example, the CNT count for CNFET P3,1 in inverter U3 is
n1 + n2 + n3 = 2 + 3 + 3 = 8.
III. RAPID CO-OPTIMIZATION OF PROCESSING & DESIGN
An existing approach to overcome CNT variations is based
on brute-force trial-and-error [52]: a designer iterates over
many design points (design point: a combination of values for
the CNT processing parameters: IDC, pm, pRs, pRm, and the
CNFET design parameters: e.g., kSelUpsize), analyzing each one
until a design point that satisfies a target delay penalty and tar-
get PNMV with small energy cost is found. Furthermore, this
approach utilizes highly accurate yet computationally expensive models to calculate delay penalties and PNMV. It suffers from two significant bottlenecks.
1) The time required to calculate delay penalties and PNMV
limits the number of design points that can be explored.
2) The number of required simulations can be exponential
in the number of CNT processing and CNFET design
parameters.
Our methodology overcomes these bottlenecks as follows.
1) We estimate delay penalties >100× faster than the
previous approach and efficiently calculate PNMV
≤ 0.001%, enabling exploration of many more design
points while maintaining sufficient accuracy to make
correct design decisions (details in Section IV).
2) We use a gradient descent search algorithm, based on
delay and PNMV sensitivity information with respect to
the processing parameters, to systematically guide the
exploration of design points (details in Section III-D).
A. Rapid Quantification of Circuit Delay Penalty
To quantify CNFET circuit delay variations, we leverage
the probabilistic framework in [52], which is based on an MC
SSTA approach with two key enhancements.
1) Highly Efficient Sampling Method: It is not trivial to
analytically model the effects of CNT correlation at the
circuit level. We partition the circuit area into sampling
regions, each of which has its own independent CNT
count. The CNT count of each CNFET is then the sum of
the CNT counts of each sampling region that it overlaps
(example shown in Fig. 8) [53].
2) Variation-Aware Timing Model: The drive current and
parasitic capacitances of CNFETs are modeled as
affine functions of the CNT counts in each sampling
region [53].
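The sampling-region construction can be expressed as a 0/1 overlap matrix multiplied by the region CNT counts. In the sketch below, the first example reproduces the P3,1 calculation from Fig. 8. The 4-region layout and the Gaussian region counts (µR = 3, σR = 1) are hypothetical. They show that CNFETs overlapping the same regions, as in aligned-active layouts, receive perfectly correlated CNT counts.

```python
import numpy as np


def cnfet_cnt_counts(overlap, region_counts):
    """CNT count of each CNFET = sum of the counts of the sampling regions it overlaps.

    overlap:       (num_cnfets, num_regions) 0/1 matrix
    region_counts: (num_regions,) for one trial, or (num_regions, num_trials)
    """
    return np.asarray(overlap) @ np.asarray(region_counts)


# Fig. 8 example: CNFET P3,1 overlaps regions n1, n2, n3 with counts 2, 3, 3.
print(cnfet_cnt_counts([[1, 1, 1]], [2, 3, 3]))  # [8]

# Hypothetical layout: CNFETs A and B overlap the same regions (aligned active
# regions); CNFET C overlaps different regions.
overlap = np.array([[1, 1, 0, 0],
                    [1, 1, 0, 0],
                    [0, 0, 1, 1]])
rng = np.random.default_rng(0)
regions = rng.normal(3.0, 1.0, size=(4, 10_000))  # hypothetical mu_R = 3, sigma_R = 1
counts = cnfet_cnt_counts(overlap, regions)
print(np.corrcoef(counts)[0, 1])  # ~1.0: A and B always share the same CNT count
print(np.corrcoef(counts)[0, 2])  # ~0.0: A and C share no regions
```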
TABLE II
VARIABLES IN THE FULL-CIRCUIT DELAY MODEL
We incorporate two additional enhancements to reduce computation time and to enable sensitivity analysis of CNFET circuit delay variations with respect to each processing parameter.
1) Gaussian Approximation of CNT Count Distributions:
This allows us to factor the variation-aware timing model
into two components, one of which does not depend
on the processing parameter values and can therefore
be precomputed (details below). The CNT count variables are thus elements of the set of real numbers ($\mathbb{R}$) instead of the set of nonnegative integers ($\mathbb{Z}_{\ge 0}$) [49].
The accuracy of this approximation is validated in
[17, Sec. VI-B].
2) Linearized Timing Model for Delay Variations: We
leverage the same timing model as in [53] to compute
the maximum path delay of a circuit when no varia-
tions are present (nominal delay). Then, we linearize
this nonlinear timing model (around the nominal case),
and use the resulting linearized timing model to analyze
the impact of CNT variations on CNFET circuit delay
variations (details in [17, Sec. VIII]). Similar techniques
are often used to approximate silicon-CMOS-based cir-
cuit delays in early design stages [24]. Unlike in [53],
we fix the input slew rate of each logic gate to its nom-
inal value. This allows us to efficiently compute all of
the logic gate delays in a circuit simultaneously. These
approximations have minimal impact on our design
choices (Section IV). We refer to the model in [53] as
the nonlinear timing model, and to the model described
below as the linearized timing model.
To formulate the delay model for the full circuit, let µR and σR be the mean and standard deviation of the sampling region CNT count distribution (µR and σR are functions of the processing parameters shown in Table I). The first step to estimate the delay penalty of a design point is to sample the CNT count for each sampling region and for each MC trial. Each sample is one entry in a matrix $N \in \mathbb{R}^{r \times n}$, where r is the total number of sampling regions and n is the total number of MC trials. We then compute the total capacitive load and drive current for each of the m gates (for each trial) via an affine transformation of the region CNT counts (based on the model in [53]). We express this transformation in matrix form, where $C_{\mathrm{Tot}}, I_{\mathrm{Drive}} \in \mathbb{R}^{m \times n}$:
$$C_{\mathrm{Tot}} = A_{C_{\mathrm{Load}}} N + b_{C_{\mathrm{Load}}} \mathbf{1}^T \qquad (6)$$
$$I_{\mathrm{Drive}} = A_{I_{\mathrm{Drive}}} N + b_{I_{\mathrm{Drive}}} \mathbf{1}^T. \qquad (7)$$
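Our delay models are fully specified by $A_{C_{\mathrm{Load}}}, A_{I_{\mathrm{Drive}}} \in \mathbb{R}^{m \times r}$ and column vectors $b_{C_{\mathrm{Load}}}, b_{I_{\mathrm{Drive}}} \in \mathbb{R}^{m}$, which contain the coefficients of the affine transformations from the sampling region CNT counts to the CNFET drive currents and parasitic capacitances [53]. Next, we factor out µR and σR, a crucial step in achieving computational efficiency. We rewrite $N = \mu_R \mathbf{1}\mathbf{1}^T + \sigma_R X$, where each element of $X \in \mathbb{R}^{r \times n}$ is distributed according to a unit Gaussian distribution, allowing (6)–(7) to be written as
$$C_{\mathrm{Tot}} = \sigma_R A_{C_{\mathrm{Load}}} X + \left(\mu_R A_{C_{\mathrm{Load}}} \mathbf{1} + b_{C_{\mathrm{Load}}}\right)\mathbf{1}^T \qquad (8)$$
$$I_{\mathrm{Drive}} = \sigma_R A_{I_{\mathrm{Drive}}} X + \left(\mu_R A_{I_{\mathrm{Drive}}} \mathbf{1} + b_{I_{\mathrm{Drive}}}\right)\mathbf{1}^T. \qquad (9)$$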
Note that 1 is a column vector (of the appropriate dimension) with every element equal to 1, and multiplication of a matrix by a scalar (e.g., µR or σR) indicates that each element in the matrix is multiplied by that scalar. Any product that does not contain µR or σR is independent of the processing parameters and can therefore be precomputed. The dominant computational tasks are the matrix multiplications AX, which are O(mn) since A is sparse [12]. Precomputing such terms (and factoring in the multiplication of CTot by VDD) yields equivalent expressions for total charge and drive current that require only element-wise scalar operations (no matrix products) for each new design point (see Table II for variable definitions):
QTot = σRQMC + (µRqExp + qFix)1T (10)
IDrive = σRIMC + (µRiExp + iFix)1T . (11)
Precomputing QMC, qExp, qFix, IMC, iExp, and iFix (Table II)
subsequently allows each logic gate delay to be efficiently
computed with only two multiplications, one division, and
three additions per trial [only counting operations in (12) that
must be computed for each trial]. This includes the addition
of dFix ∈ Rm, a vector of fixed delays (e.g., input delays from
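external circuits). The matrix division in (12) is element-wise (denoted ⊘):
$$D = \big(\sigma_R Q_{\mathrm{MC}} + (\mu_R q_{\mathrm{Exp}} + q_{\mathrm{Fix}})\mathbf{1}^{T}\big) \oslash \big(\sigma_R I_{\mathrm{MC}} + (\mu_R i_{\mathrm{Exp}} + i_{\mathrm{Fix}})\mathbf{1}^{T}\big) + d_{\mathrm{Fix}}\mathbf{1}^{T}. \qquad (12)$$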
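The precomputation above can be illustrated with a minimal NumPy sketch of (6)–(12). The expressions used here for Q_MC, q_Exp, q_Fix, I_MC, i_Exp, and i_Fix are inferred by multiplying (8) by VDD and matching terms with (10)–(11); Table II remains the authoritative definition. Dense random matrices stand in for the sparse A matrices, and all sizes and values are toy assumptions rather than data from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, r, n = 4, 6, 1000                      # gates, sampling regions, MC trials (toy)
vdd = 0.5
A_c = rng.uniform(0.0, 1e-2, (m, r))      # stand-in for A_CLoad (sparse in practice)
b_c = rng.uniform(0.0, 1e-2, m)           # stand-in for b_CLoad
A_i = rng.uniform(0.5, 1.0, (m, r))       # stand-in for A_IDrive
b_i = rng.uniform(0.0, 0.1, m)            # stand-in for b_IDrive
d_fix = np.zeros(m)                       # fixed delays (e.g., external inputs)
X = rng.standard_normal((r, n))           # unit-Gaussian samples, drawn once

# Precomputed once, independent of mu_R and sigma_R (the A @ X products dominate).
Q_mc, q_exp, q_fix = vdd * (A_c @ X), vdd * A_c.sum(axis=1), vdd * b_c
I_mc, i_exp, i_fix = A_i @ X, A_i.sum(axis=1), b_i


def gate_delays(mu_r, sigma_r):
    """Eq. (12): element-wise Q_Tot / I_Drive + d_Fix 1^T, shape (m, n)."""
    # Per-gate columns: computed once per design point, not per trial.
    q_col = (mu_r * q_exp + q_fix)[:, None]
    i_col = (mu_r * i_exp + i_fix)[:, None]
    # Per gate and trial: 2 multiplications, 3 additions, 1 division.
    return (sigma_r * Q_mc + q_col) / (sigma_r * I_mc + i_col) + d_fix[:, None]


def gate_delays_direct(mu_r, sigma_r):
    """Reference: build N explicitly and apply (6)-(7) to every trial."""
    N = mu_r + sigma_r * X                # N = mu_R 1 1^T + sigma_R X
    c_tot = A_c @ N + b_c[:, None]        # eq. (6)
    i_drive = A_i @ N + b_i[:, None]      # eq. (7)
    return vdd * c_tot / i_drive + d_fix[:, None]
```

Because X and every product involving X are computed once, evaluating a new (µR, σR) pair only rescales and shifts precomputed arrays, which is what makes sweeping processing parameters cheap.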
We then perform static timing analysis (STA) for each MC
trial (and for the nominal case), and use the results to estimate
T95 and the delay penalty. The total circuit energy is computed
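using a model of the form E = (1/2)CV^2 [48], i.e., VDD/2 times
the mean (over the n MC trials) of the total charge summed over
all m gates:
$$E_{\mathrm{Tot}} = \tfrac{1}{2}V_{DD}\,\mathbf{1}^{T}\Big(\tfrac{1}{n}\sigma_R Q_{\mathrm{MC}}\mathbf{1} + \mu_R q_{\mathrm{Exp}} + q_{\mathrm{Fix}}\Big). \qquad (13)$$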
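STA can likewise be vectorized across MC trials: arrival times propagate through the netlist in topological order, with one NumPy row per gate. In this sketch, the netlist format (a topologically ordered list of fan-in lists) and the function names are hypothetical, and T95 is taken to be the 95th percentile of the per-trial critical-path delay, which is our reading of the notation (its definition is in Section II). The energy helper evaluates (13) directly.

```python
import numpy as np


def sta_t95(D, fanin, outputs, q=95.0):
    """Per-trial longest-path delay and its q-th percentile.

    D: (m, n) gate delays (m gates, n MC trials), e.g., from (12).
    fanin[g]: indices of gates driving gate g; gates are in topological order.
    outputs: indices of gates that drive primary outputs.
    """
    arrival = np.zeros_like(D)
    for g, preds in enumerate(fanin):
        start = arrival[preds].max(axis=0) if preds else 0.0
        arrival[g] = start + D[g]
    critical = arrival[outputs].max(axis=0)   # circuit delay of each trial
    return np.percentile(critical, q), critical


def total_energy(vdd, sigma_r, Q_mc, mu_r, q_exp, q_fix):
    """Eq. (13): (VDD/2) * sum over gates of the trial-averaged total charge."""
    mean_q = sigma_r * Q_mc.mean(axis=1) + mu_r * q_exp + q_fix
    return 0.5 * vdd * mean_q.sum()


# Toy usage: gate 2 is driven by gates 0 and 1 (reconvergent fan-in).
D = np.array([[1.0] * 4, [2.0] * 4, [1.0] * 4])
t95, critical = sta_t95(D, fanin=[[], [], [0, 1]], outputs=[2])
```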
B. Rapid Quantification of Circuit PNMV
Our method of analyzing circuit PNMV consists of two key
components, each of which is described in this section.
1) A variation-aware SNM model, which computes VOH,
VIH, VIL, and VOL (these terms are defined in
Section II-C) as functions of the CNT counts of the
CNFETs within a logic gate. This model can be used to
compute SNM for every gate pair in the circuit.
2) A method to numerically calculate low PNMV values
(e.g., ≤0.001%), given the variation-aware SNM model
and given a network of cascaded logic gates (e.g., a cir-
cuit module after steps 1–3 in Fig. 3). This technique
accounts for correlations in CNT count among CNFETs.
1) Variation-Aware Static Noise Margin Model: We refer
to VOH, VIH, VIL, and VOL as the VTC parameters, and we
model them for each stage of cascaded logic. We distinguish
logic stages from standard cells since a standard cell can con-
sist of multiple logic stages (e.g., the standard cell BUF_X1
consists of two cascaded inverters, each of which is one logic
stage). For standard cells with multiple logic stages, we model
the VTC parameters separately for each logic stage (e.g., we
consider the cross-coupled inverters in a D-latch as two sep-
arate logic stages). For consistency with the terminology in
Fig. 9. Variations in the VTC due to CNT count variations (shown for an
inverter at the 5 nm node with VDD = 0.35 V: e.g., inverter U3 in Fig. 8,
with $n_P = n_P^{(P_{3,1})}$ and $n_N = n_N^{(N_{3,1})}$). Example VTCs: simulated using
SPICE by sweeping VIN to obtain VOUT. (Inset) VTC parameters versus CNT
count. Markers represent extracted values of the VTC parameters; lines
represent the SNM model (14) for T(INV_X1) with TVOH0 = 0.33, TVOH1 = 0,
TVIH0 = 0.20, TVIH1 = 0.05, TVIL0 = 0.15, TVIL1 = 0.05, TVOL0 = 0.02,
TVOL1 = 0.
Section II-C (and without loss of generality), we assume that
G(dr) and G(ld) in a gate pair each represent a single logic
stage. We also define the state of a logic stage input or output
as its logic value (0 or 1). A logic stage input is sensitized if
the logic stage output depends on the state of that input (given
the logic values of all the other inputs).
For each logic stage input in our standard cell library, we
model the VTC parameters for every case in which that input is
sensitized (considering all possible combinations of the other
inputs). The VTC parameters are functions of the CNT counts
of the p- and n-type CNFETs (there is a CNT count variable
for each CNFET in the circuit) which: 1) are gated by that
input and 2) connect the logic stage output to either VDD or
ground through a series of CNFETs in the “on” state (see
[17, Sec. IX-B] for an example). We define nP as the sum
of the CNT counts of all such p-type CNFETs. We similarly
define nN for the n-type CNFETs. For example, inverter U3
in Fig. 8 consists of a p-type CNFET (labeled “P3,1”) and
an n-type CNFET (“N3,1”). The CNT counts of P3,1 and
N3,1 are $n_P^{(P_{3,1})}$ and $n_N^{(N_{3,1})}$, respectively. Then $n_P = n_P^{(P_{3,1})}$
and $n_N = n_N^{(N_{3,1})}$.
For the NAND2 gate U4 in Fig. 8 (as an example of a logic
stage with multiple inputs), we separately model the VTC
parameters for each input: in1 and in2. Since there are two
sets of VTC parameters for the NAND2 gate and only one
output, the worst-case values for the output levels VOH and
VOL (which are modeled as being independent of the CNT
count: details at the end of this section) are selected from the
two sets of VTC parameters (i.e., so that SNM is the lowest,
see [17, Sec. IX-B] for an example).
Fig. 9 illustrates SPICE simulation data showing that the
VTC of a logic stage is sensitive to CNT count variations.
For example, the VTCs in Fig. 9 are representative of inverter
U3 in Fig. 8, with CNFET CNT counts $n_P = n_P^{(P_{3,1})}$ and
$n_N = n_N^{(N_{3,1})}$; variations in nP and nN cause variations in VOH, VIH,
VIL, and VOL, resulting in SNM variations and larger PNMV.
We model the VTC parameters (VOH, VIH, VIL, and VOL) as
affine functions of log10(nP/nN) (this model is shown for an
inverter in Fig. 9), which achieves a root-mean-square (RMS)
modeling error ≤2.5 mV in all cases (details in [17, Sec.
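IX-C]). For each case, this affine function is represented by
a real-valued matrix $T \in \mathbb{R}^{4\times 2}$:
$$\begin{bmatrix} V_{OH} \\ V_{IH} \\ V_{IL} \\ V_{OL} \end{bmatrix} = T \begin{bmatrix} 1 \\ \log_{10}(n_P/n_N) \end{bmatrix}, \qquad T = \begin{bmatrix} T_{VOH0} & T_{VOH1} \\ T_{VIH0} & T_{VIH1} \\ T_{VIL0} & T_{VIL1} \\ T_{VOL0} & T_{VOL1} \end{bmatrix}. \qquad (14)$$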
To construct the full variation-aware SNM model (consist-
ing of many instances of T in our standard cell library: one
for each combination of input states that sensitizes each logic
stage input), we perform two steps for each instance of T.
1) Sample the CNT count for each CNFET in the logic
stage 2000 times [53] (using the distribution of CNT
count, given the CNFET widths and the experimentally
demonstrated processing parameter values in Table I),
and use SPICE simulations to obtain VOUT versus VIN.
For each sample, record nP and nN and extract VOH,
VIH, VIL, and VOL from the VTC in each simulation.
2) Find T via linear regression, given the recorded nP and
nN and the extracted VOH, VIH, VIL, and VOL.
We observed that, in all cases, TVOH1 ≈ 0 and
TVOL1 ≈ 0 in (14), indicating that the CNT count ratio does not
strongly affect the output levels of a logic stage.² Thus, to
simplify our model, we set TVOH1 = 0 and TVOL1 = 0 while
maintaining RMS modeling error ≤2.5 mV in all cases (details
in [17, Sec. IX-C]). In Fig. 9, the VTC parameters are plotted
versus log10(nP/nN).
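Step 2 above is an ordinary least-squares fit. A minimal sketch, assuming the extracted parameters for each sample are stacked as one row [VOH, VIH, VIL, VOL]; pinning TVOH1 = TVOL1 = 0 reduces those two rows to intercept-only fits, i.e., sample means. The synthetic, noise-free data reuse the T(INV_X1) values from the Fig. 9 caption purely as a self-check; real inputs would come from the SPICE sweeps in step 1, and the function name is hypothetical.

```python
import numpy as np


def fit_vtc_model(n_p, n_n, vtc, pin_output_slopes=True):
    """Fit T (4 x 2) in [VOH, VIH, VIL, VOL]^T = T [1, log10(nP/nN)]^T.

    n_p, n_n: CNT counts per sample; vtc: (samples, 4) extracted parameters.
    Returns T and the per-parameter RMS modeling error.
    """
    x = np.log10(np.asarray(n_p, float) / np.asarray(n_n, float))
    design = np.column_stack([np.ones_like(x), x])        # samples x 2
    T, *_ = np.linalg.lstsq(design, vtc, rcond=None)      # 2 x 4
    T = T.T                                               # 4 x 2
    if pin_output_slopes:
        for row in (0, 3):                                # VOH and VOL rows
            T[row] = [vtc[:, row].mean(), 0.0]            # intercept-only fit
    rms = np.sqrt(np.mean((design @ T.T - vtc) ** 2, axis=0))
    return T, rms


# Synthetic check using the T(INV_X1) values listed in the Fig. 9 caption.
T_true = np.array([[0.33, 0.0], [0.20, 0.05], [0.15, 0.05], [0.02, 0.0]])
rng = np.random.default_rng(2)
n_p = rng.integers(2, 12, 2000)
n_n = rng.integers(2, 12, 2000)
x = np.log10(n_p / n_n)
vtc = np.column_stack([np.ones_like(x), x]) @ T_true.T
T_fit, rms = fit_vtc_model(n_p, n_n, vtc)
```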
This variation-aware SNM model is critical for efficiently
computing PNMV due to CNT count variations, as it relates
the VTC parameters to the CNFET CNT counts for each logic
stage (14). However, solving (5) for PNMV (in Section II-C) is
not trivial due to CNT correlation (Section II-A), which causes
correlated SNM among gate pairs. In Fig. 8, for example, gate
pairs (U1, U3) and (U3, U5) have correlated SNM since the
CNFETs in U3 and U5 have correlated CNT counts (they
overlap the same sampling regions).
2) Full-Circuit PNMV Model: Here, we demonstrate how
the variation-aware SNM model is used to efficiently calculate
PNMV ≤ 0.001% (which is desirable for VLSI-scale
circuits: Section II-C) without using an MC-based technique
(which would require many trials: e.g., >10^5, since 0.001% =
1/10^5). There are two key aspects in our framework for
computing PNMV.
1) PNMV Formulation: We formulate PNMV as a func-
tion of the sampling region CNT count variables
(which are independent) to account for the effects of
CNT correlation (Section II-A) on SNM.
2) Solving the PNMV Formulation Efficiently: We pro-
vide a systematic technique to identify a small subset
of all SNM constraints in the circuit [i.e., in (5) in
Section II-C], referred to as the critical SNM constraints,
which are the only SNM constraints that are required
to compute PNMV. Due to CNT correlation, an SNM
violation in a noncritical SNM constraint implies that
there must also be an SNM violation in a critical SNM
constraint; hence, the noncritical SNM constraint is not
required to compute PNMV. For the OpenSPARC mod-
ules, <1% of all SNM constraints can be critical SNM
constraints [17, Table VIII]. Hence, the time to compute
PNMV (proportional to the number of SNM constraints)
can improve by >100×.
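The MC trial count mentioned above is only a lower bound. With N independent MC trials and true violation probability p, the number of violating trials is binomial, so the relative standard error of the estimate behaves as below; for p = 10^-5 and a target relative error ε = 10%, roughly 10^7 trials would be needed.

```latex
\hat{p} = \frac{\#\{\text{trials violating (5)}\}}{N}, \qquad
\frac{\sqrt{\operatorname{Var}(\hat{p})}}{p} = \sqrt{\frac{1-p}{N p}} \approx \frac{1}{\sqrt{N p}}
\;\Longrightarrow\;
N \approx \frac{1}{p\,\varepsilon^{2}} = \frac{1}{10^{-5}\cdot(0.1)^{2}} = 10^{7}.
```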
The first step to compute PNMV is to convert the SNM
constraints in (5) into constraints on the CNT counts of each
CNFET [using (2), (3), and (5) in Section II-C]. For each gate
²The effect of the CNT count ratio on the VTC is similar to that of the
“beta ratio” βP/βN (a measure of the relative strength of the pull-up and
pull-down networks) in silicon-CMOS-based circuits, which does not have
a strong effect on the output levels VOH and VOL [48].
pair (G(dr), G(ld)) [e.g., (U3, U4) in Fig. 8], each SNMH con-
straint has the form in (15) and each SNML constraint has
the form in (16) (there can be multiple SNMH and SNML
constraints for a single gate pair, details below)
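$$\mathrm{SNM_H}\big(G^{(dr)}, G^{(ld)}\big):\; V_{OH}^{(dr)} - V_{IH}^{(ld)} \ge \mathrm{SNM_R} \qquad (15)$$
$$\mathrm{SNM_L}\big(G^{(dr)}, G^{(ld)}\big):\; V_{IL}^{(ld)} - V_{OL}^{(dr)} \ge \mathrm{SNM_R}. \qquad (16)$$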
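We then substitute the variation-aware SNM model (14) into
these constraints, using T(dr) and T(ld) to represent the SNM
model for G(dr) and G(ld), respectively [e.g., T(dr) = T(INV_X1)
and T(ld) = T(NAND2_X1-in1) for (U3, U4) in Fig. 8]. Because
TVOH1 = TVOL1 = 0, only the CNT counts of the load stage
appear, so nP and nN below refer to G(ld):
$$T_{VOH0}^{(dr)} - T_{VIH0}^{(ld)} - T_{VIH1}^{(ld)}\log_{10}(n_P/n_N) \ge \mathrm{SNM_R} \qquad (17)$$
$$T_{VIL0}^{(ld)} + T_{VIL1}^{(ld)}\log_{10}(n_P/n_N) - T_{VOL0}^{(dr)} \ge \mathrm{SNM_R}. \qquad (18)$$
Assuming T(ld)VIH1 > 0 and T(ld)VIL1 > 0 (as for the inverter in
Fig. 9) and nP, nN > 0, isolating the logarithm and exponentiating
shows that these constraints are equivalently expressed in matrix form:
$$\begin{bmatrix} 1 & \tilde{H}_{1,2} \\ \tilde{H}_{2,1} & 1 \end{bmatrix} \begin{bmatrix} n_P \\ n_N \end{bmatrix} \preceq \begin{bmatrix} 0 \\ 0 \end{bmatrix} \qquad (19)$$
$$\tilde{H}_{1,2} = -10^{\left(T_{VOH0}^{(dr)} - T_{VIH0}^{(ld)} - \mathrm{SNM_R}\right)/T_{VIH1}^{(ld)}} \qquad (20)$$
$$\tilde{H}_{2,1} = -10^{\left(T_{VIL0}^{(ld)} - T_{VOL0}^{(dr)} - \mathrm{SNM_R}\right)/T_{VIL1}^{(ld)}}. \qquad (21)$$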
Note that the vector inequality in (19) is element-wise (as
are all vector inequalities in this section). To account for all
SNM constraints in the circuit, let c be the total number of
SNM constraints, and let t be the total number of CNFETs
(each with its own CNFET CNT count variable, e.g., $n_P^{(P_{3,1})}$
for CNFET P3,1 in Fig. 8). For every gate pair (G(dr), G(ld)),
there is an SNMH constraint and an SNML constraint for each
combination of input states that sensitizes the input of G(ld)
that is driven by G(dr). For example, if G(dr) drives an input
of G(ld) that can be sensitized by three combinations of input
states [e.g., input “A” of an “and-or-invert” logic stage with
Boolean function out = (A + B·C)′, which is sensitized whenever
B·C = 0, i.e., for (B, C) ∈ {(0, 0), (0, 1), (1, 0)}], then there are three
SNMH and three SNML constraints for that gate pair (which
may constrain different CNT count variables).
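The 2×2 construction in (19)–(21) can be checked numerically against (17)–(18). In this sketch, the rows of T are ordered [VOH, VIH, VIL, VOL] with columns [intercept, slope]; both driver and load reuse the T(INV_X1) values from the Fig. 9 caption, and SNMR = VDD/6 with VDD = 0.35 V. These choices and the helper names are illustrative assumptions; the equivalence also relies on T(ld)VIH1 > 0 and T(ld)VIL1 > 0, which holds for these values.

```python
import numpy as np


def h_tilde(T_dr, T_ld, snm_r):
    """The 2x2 matrix in (19), with off-diagonal entries from (20)-(21)."""
    h12 = -10 ** ((T_dr[0, 0] - T_ld[1, 0] - snm_r) / T_ld[1, 1])
    h21 = -10 ** ((T_ld[2, 0] - T_dr[3, 0] - snm_r) / T_ld[2, 1])
    return np.array([[1.0, h12], [h21, 1.0]])


def snm_ok_direct(T_dr, T_ld, snm_r, n_p, n_n):
    """Evaluate (17) and (18) directly for the load stage's counts nP, nN."""
    x = np.log10(n_p / n_n)
    snm_h = T_dr[0, 0] - T_ld[1, 0] - T_ld[1, 1] * x >= snm_r   # eq. (17)
    snm_l = T_ld[2, 0] + T_ld[2, 1] * x - T_dr[3, 0] >= snm_r   # eq. (18)
    return bool(snm_h and snm_l)


T_inv = np.array([[0.33, 0.0], [0.20, 0.05], [0.15, 0.05], [0.02, 0.0]])
snm_r = 0.35 / 6
H2 = h_tilde(T_inv, T_inv, snm_r)
```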
The total number of SNM constraints in the circuit is c,
and each one imposes a constraint on the CNFET CNT count
variables (e.g., $n_P^{(P_{3,1})}$, $n_N^{(N_{3,1})}$, $n_P^{(P_{4,1})}$, etc., in Fig. 8). We
can represent these c constraints with a single matrix inequality
by first defining a column vector $s \in \mathbb{R}^{t}$ that contains
the CNT count variables for all the CNFETs in the circuit
(e.g., if the entire circuit consisted of the ten CNFETs shown
in Fig. 8, then $s = [n_P^{(P_{3,1})}; n_N^{(N_{3,1})}; n_P^{(P_{4,1})}; n_N^{(N_{4,1})}; n_P^{(P_{4,2})}; n_N^{(N_{4,2})}; n_P^{(P_{5,1})}; n_N^{(N_{5,1})}; n_P^{(P_{6,1})}; n_N^{(N_{6,1})}]$).
Then, by using all instances of T in the SNM model (14), we formulate each
SNM constraint as a constraint on the vector s, using the same
procedure as above to convert (15)–(16) into constraints on the
CNT counts in (19) (example in [17, Sec. IX-D]). We express
these constraints using a matrix $H \in \mathbb{R}^{c\times t}$, such that
satisfying (22) is equivalent to satisfying all SNM constraints in the
circuit ([17, Table IX] summarizes all variables in this section):
$$Hs \preceq \mathbf{0}. \qquad (22)$$
Note that 0 is a column vector with every entry equal
to 0. See [17, Sec. IX-D] for the formulation of (22) for the
example circuit shown in Fig. 8. Since all SNM constraints
in the circuit are represented in (22) (each of the c rows in H
represents a single SNM constraint), PNMV (5) is the
probability that (22) is violated (i.e., PNMV = 1 − P{Hs ≼ 0}).
However, solving for PNMV using (22) is not trivial, since the
CNT count variables (i.e., the elements of s) can be highly
correlated due to CNT correlation (Section II-A). For example,
in Fig. 8, the active regions of CNFETs P3,1 and P5,1
are aligned, so their CNT counts ($n_P^{(P_{3,1})}$ and $n_P^{(P_{5,1})}$) are
correlated. Thus, the SNM constraints on $n_P^{(P_{3,1})}$ and the SNM
constraints on $n_P^{(P_{5,1})}$ are dependent.
We can reformulate PNMV to efficiently account for CNT
correlation by transforming the constraints in (22) (that
constrain the CNFET CNT count variables, which are depen-
dent) into constraints on the sampling region CNT count
variables (which are independent). To do so, we first define
a column vector $n \in \mathbb{R}^{r}$ that contains the CNT count variables
for all the sampling regions (e.g., in Fig. 8, n = [n1; n2; n3;
n4; n5; n6; n7; n8; n9; n10; n11; n12; . . . ]). To formulate (22) in
terms of vector n instead of vector s (the CNFET CNT count
variables), the relationship between n and s is required. We
express this relationship as a linear transformation represented
by a matrix $B \in \{0, 1\}^{t\times r}$ (details below) such that
$$s = Bn. \qquad (23)$$
There is one row in B for each CNFET in the circuit, and
one column for each sampling region. To determine B: if
CNFET i (of t total CNFETs) overlaps sampling region j (of r
total sampling regions), then the value of B in row i, column j
is 1 (i.e., Bi,j = 1); otherwise, Bi,j = 0 (as an example, B for
the circuit in Fig. 8 is shown in [17, Sec. IX-D]). Then, by
substituting (23) into (22), the SNM constraints can be expressed
in terms of the region CNT count variables (instead of the
CNFET CNT count variables), using a matrix $K \in \mathbb{R}^{c\times r}$:
$$K = HB \qquad (24)$$
$$Kn \preceq \mathbf{0}. \qquad (25)$$
All SNM constraints in the circuit are represented in (25)
[just as in (22)], so (25) can also be used to determine PNMV
(i.e., PNMV = 1 − P{Kn ≼ 0}). The advantage of using (25)
instead of (22) is that all the variables in n (the vector of
sampling region CNT counts) are independent (unlike the cor-
related variables in s: the vector of CNFET CNT counts). For
example, (25) can be used to estimate PNMV via an MC-based
approach: for each trial, sample all elements of n (from the
distribution of sampling region CNT count) and evaluate Kn.
Then estimate PNMV as the fraction of trials that violate (25).
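The construction of B in (23), K = HB in (24), and the MC estimate just described can be sketched as follows. The layout (which CNFET overlaps which sampling region) and the single-row H are hypothetical, the sampling region CNT counts are modeled as independent Gaussians N(µR, σR²) (consistent with the factorization used for the delay model), and a real implementation would use sparse matrices.

```python
import numpy as np


def overlap_matrix(overlaps, r):
    """B in (23): B[i, j] = 1 if CNFET i overlaps sampling region j."""
    B = np.zeros((len(overlaps), r))
    for i, regions in enumerate(overlaps):
        B[i, regions] = 1.0
    return B


def pnmv_monte_carlo(K, mu_r, sigma_r, n_trials, seed=0, batch=100_000):
    """MC estimate of PNMV = 1 - P{K n <= 0}; region counts i.i.d. N(mu_R, sigma_R^2)."""
    rng = np.random.default_rng(seed)
    violations, done = 0, 0
    while done < n_trials:
        size = min(batch, n_trials - done)
        n = mu_r + sigma_r * rng.standard_normal((K.shape[1], size))
        violations += np.count_nonzero((K @ n > 0).any(axis=0))  # any row violated
        done += size
    return violations / n_trials


# Hypothetical layout: 4 CNFETs over 3 sampling regions; CNFETs 0 and 2
# share region 0, so their CNT counts are correlated.
B = overlap_matrix([[0, 1], [1], [0], [2]], r=3)
# Hypothetical H with one SNM constraint of the form (19) on CNFETs 0 and 1.
H = np.array([[1.0, -27.1, 0.0, 0.0]])
K = H @ B                                  # eq. (24)
n_example = np.array([5.0, 3.0, 4.0])
s_example = B @ n_example                  # eq. (23)
```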
However, evaluating every SNM constraint in (25) is unnec-
essary since many of them are noncritical SNM constraints (as
described above, any SNM constraint that cannot be uniquely
violated without simultaneously violating another SNM con-
straint is not required to compute PNMV). See [17, Sec. IX-E]
for a detailed description, including examples, of how to
systematically identify and eliminate all noncritical SNM
constraints.
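The systematic procedure for identifying all noncritical constraints is given in [17, Sec. IX-E]. As a much simpler illustration, and explicitly not that procedure, one sufficient condition follows from the nonnegativity of CNT counts: if row j of K is elementwise greater than or equal to row i, then whenever row i is violated (k_i n > 0), k_j n ≥ k_i n > 0, so row j is violated too. The hypothetical helper below drops rows dominated in this sense (keeping one copy of duplicate rows); it may retain constraints that the full procedure would remove.

```python
import numpy as np


def prune_dominated(K):
    """Indices of rows of K kept after removing dominated rows.

    Row i is dropped if another row j has K[j] >= K[i] elementwise and
    either K[j] != K[i] or j < i (so one copy of duplicate rows survives).
    This is a valid redundancy test only when all region counts n are >= 0.
    """
    keep = []
    for i in range(K.shape[0]):
        dominated = False
        for j in range(K.shape[0]):
            if j == i or not np.all(K[j] >= K[i]):
                continue
            if np.any(K[j] != K[i]) or j < i:
                dominated = True
                break
        if not dominated:
            keep.append(i)
    return keep


# Hypothetical K: row 0 is dominated by row 1, and row 2 duplicates row 1.
K = np.array([[1.0, -2.0, 0.0], [1.0, -1.0, 0.0], [1.0, -1.0, 0.0], [0.0, 0.0, -1.0]])
keep = prune_dominated(K)
```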
Eliminating these noncritical SNM constraints is crucial
to improve computational efficiency, as they can account for
≥99% of all SNM constraints in the circuit (e.g., for the
OpenSPARC modules [17, Table VIII]). Since each row of
K in (25) represents an SNM constraint, we can remove the rows
in K that correspond to noncritical SNM constraints to form
$\tilde{K} \in \mathbb{R}^{p\times r}$, where p is the number of critical SNM constraints:
$$\tilde{K}n \preceq \mathbf{0}. \qquad (26)$$
To further improve computational efficiency, we then fac-
tor out µR and σR from the sampling region CNT count
variables n in (26) (just as we did for the full-circuit delay
model in Section III-A), allowing us to quickly recompute
PNMV after updating the processing parameter values (details
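in Section III-D). We rewrite $n = \mu_R\mathbf{1} + \sigma_R x$, where the
entries of $x \in \mathbb{R}^{r}$ are independent and each is distributed according
to a unit Gaussian distribution; then, after dividing by σR > 0, (26) becomes
$$\tilde{K}x \preceq (\mu_R/\sigma_R)\, b \qquad (27)$$
$$b = -\tilde{K}\mathbf{1}. \qquad (28)$$
In (27), b is a p-dimensional vector of constants, and
the matrix-vector product $\tilde{K}x$ is a p-dimensional vector of
Gaussian random variables with covariance matrix $C \in \mathbb{R}^{p\times p}$:
$$C = \tilde{K}\tilde{K}^{T}. \qquad (29)$$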
That is, $\tilde{K}x$ is distributed according to a zero-mean multivariate
normal (MVN) distribution with covariance matrix C; thus,
PNMV = 1 − P{K̃x ≼ (µR/σR)b} can be solved numerically
using existing software packages for computing MVN
probabilities (see [14]). In particular, consider the cumulative
distribution function (CDF) of the MVN-distributed matrix-vector
product $\tilde{K}x$ (i.e., the MVNCDF): the probability that all
SNM constraints are satisfied (i.e., 1 − PNMV) is equal to the
value of the MVNCDF at the p-dimensional point (µR/σR)b:
$$\mathrm{PNMV} = 1 - P\{\tilde{K}x \preceq (\mu_R/\sigma_R)\, b\} \qquad (30)$$
$$\mathrm{PNMV} = 1 - \mathrm{MVNCDF}\big(C, (\mu_R/\sigma_R)\, b\big). \qquad (31)$$
In [17, Sec. IX-F], we describe how to efficiently solve (31)
by leveraging the property that many terms in the covari-
ance matrix, C, are 0; e.g., for the OpenSPARC modules,
PNMV can be computed in less than 10 seconds using a sin-
gle 2.93 GHz processor core. In [17, Fig. 21], we validate
the accuracy of (31) against MC simulations. For the MC
approach, we first sample the vector of sampling region CNT
counts [n in (25)] for each trial. Then we estimate PNMV as
the fraction of samples that violate (25).
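The MVN formulation (27)–(31) maps directly onto a general-purpose MVN CDF routine. This sketch uses SciPy's `scipy.stats.multivariate_normal.cdf`, which is our choice rather than the package used in the paper, and it ignores the sparsity of C exploited in [17, Sec. IX-F]. Tight `abseps`/`releps` tolerances are passed because PNMV targets near 10^-5 are comparable to SciPy's default absolute tolerance (1e-5), and 1 − CDF cannot be more accurate than the CDF itself. The function name is hypothetical.

```python
import math

import numpy as np
from scipy.stats import multivariate_normal


def pnmv_mvn(K_crit, mu_r, sigma_r, abseps=1e-10, releps=1e-8):
    """PNMV via (27)-(31): 1 - MVNCDF(C, (mu_R / sigma_R) b)."""
    K_crit = np.atleast_2d(np.asarray(K_crit, float))
    b = -K_crit.sum(axis=1)                    # eq. (28): b = -K~ 1
    C = K_crit @ K_crit.T                      # eq. (29): covariance of K~ x
    upper = (mu_r / sigma_r) * b               # evaluation point in (30)-(31)
    cdf = multivariate_normal.cdf(upper, mean=np.zeros(b.size), cov=C,
                                  allow_singular=True, abseps=abseps, releps=releps)
    return 1.0 - float(cdf)


# Usage: K~ = -I encodes "every region count >= 0"; the two constraints are
# independent, so PNMV = 1 - Phi(mu_R / sigma_R)^2 in closed form.
phi = 0.5 * (1.0 + math.erf(2.0 / math.sqrt(2.0)))
p_exact = 1.0 - phi ** 2
p_mvn = pnmv_mvn(-np.eye(2), mu_r=2.0, sigma_r=1.0)
```

For correlated constraints (nonzero off-diagonal entries in C), the same call applies unchanged; comparing it with an MC estimate, as the text does, is a useful sanity check.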
C. Circuit Performance Sensitivity to Processing Parameters
Our goal is to achieve small delay penalties and PNMV
with small ΔE. We quantify the tradeoff between total circuit
energy [ETot in (13)] and delay penalty using EDP95, defined
in (32). We also define the energy-PNMV-product (ENP)
metric to quantify the tradeoff between ETot and PNMV:
$$\mathrm{EDP}_{95} = E_{\mathrm{Tot}}\,T_{95} \qquad (32)$$
$$\mathrm{ENP} = E_{\mathrm{Tot}}\,\mathrm{PNMV}. \qquad (33)$$
While rapid computation of circuit delay penalty and
PNMV overcomes the computation time bottleneck of ana-
lyzing a single design point, we still require a method for
intelligently exploring the large space of CNT processing
options. In general, a common measure of the sensitivity
of an objective function (e.g., EDP95 or ENP) with respect
to each of its input variables (e.g., the processing param-
eters) is its gradient. The EDP95 and ENP gradients are
defined in (34)–(36) and are used to guide the exploration
of processing options to improve delay penalties and PNMV
(Section III-D). Fig. 10 illustrates a flowchart of the steps used
to compute (32)–(36). The gradients (34)–(36) are computed as
described in [17, Sec. X-A].
Fig. 10. SDPA to calculate the delay penalty, EDP95, ∇EDP95, ENP, and ∇ENP of a single design point.
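$$\nabla E_{\mathrm{Tot}} = \left[\frac{\partial E_{\mathrm{Tot}}}{\partial \mathrm{IDC}};\ \frac{\partial E_{\mathrm{Tot}}}{\partial p_m};\ \frac{\partial E_{\mathrm{Tot}}}{\partial p_{Rs}}\right], \quad \nabla T_{95} = \left[\frac{\partial T_{95}}{\partial \mathrm{IDC}};\ \frac{\partial T_{95}}{\partial p_m};\ \frac{\partial T_{95}}{\partial p_{Rs}}\right], \quad \nabla \mathrm{PNMV} = \left[\frac{\partial \mathrm{PNMV}}{\partial \mathrm{IDC}};\ \frac{\partial \mathrm{PNMV}}{\partial p_m};\ \frac{\partial \mathrm{PNMV}}{\partial p_{Rs}}\right] \qquad (34)$$
$$\nabla \mathrm{EDP}_{95} = \nabla E_{\mathrm{Tot}}\,T_{95} + E_{\mathrm{Tot}}\,\nabla T_{95} \qquad (35)$$
$$\nabla \mathrm{ENP} = \nabla E_{\mathrm{Tot}}\,\mathrm{PNMV} + E_{\mathrm{Tot}}\,\nabla \mathrm{PNMV}. \qquad (36)$$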
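Since EDP95 and ENP are products, (35)–(36) are the product rule, which is easy to verify with central finite differences. The smooth stand-ins for ETot and T95 below are hypothetical (the paper's gradients are computed as described in [17, Sec. X-A]); the parameter vector is ordered [IDC, pm, pRs] as in (34).

```python
import numpy as np


def fd_grad(f, p, h=1e-6):
    """Central finite-difference gradient of a scalar function f at p."""
    p = np.asarray(p, float)
    grad = np.zeros_like(p)
    for k in range(p.size):
        step = np.zeros_like(p)
        step[k] = h
        grad[k] = (f(p + step) - f(p - step)) / (2.0 * h)
    return grad


# Hypothetical smooth stand-ins, p = [IDC, p_m, p_Rs] as in (34).
def e_tot(p):
    return 1.0 + 0.02 * p[0] + 0.01 * p[2]


def t_95(p):
    return 1.0 + 0.3 * p[0] ** 2 + 5.0 * p[1] + 2.0 * p[2]


def grad_e_tot(p):
    return np.array([0.02, 0.0, 0.01])


def grad_t_95(p):
    return np.array([0.6 * p[0], 5.0, 2.0])


def grad_edp95(p):
    """Eq. (35): product rule applied to EDP95 = E_Tot * T_95."""
    return grad_e_tot(p) * t_95(p) + e_tot(p) * grad_t_95(p)


p0 = np.array([0.50, 0.10, 0.04])
```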
D. Guided Exploration to Overcome CNT Variations
To overcome the bottleneck of trial-and-error-based search
(i.e., iterating over many combinations of values for the
processing parameters and design parameters defined in
Section II: IDC, pm, pRs, pRm, kSelUpsize), we use a gradient
descent-based strategy to systematically guide the improve-
ment of EDP95 and ENP in the presence of CNT variations
(while gradient descent strategies can converge to local rather
than global optima, in [17, Sec. X-C] we discuss techniques
to reduce the impact of local optima during gradient descent
in our methodology).
For any design point, we can use single design point
analysis (SDPA: Fig. 10) to determine the sensitivity of
each circuit performance metric (e.g., EDP95 or ENP) to
each processing parameter by computing its gradient (e.g.,
∇EDP95 or ∇ENP). These gradients can then be used to
identify which processing parameters should be improved
(and by how much) to efficiently improve the circuit perfor-
mance metrics. For example, consider EDP95: ∂EDP95/∂IDC,
∂EDP95/∂pm, and ∂EDP95/∂pRs indicate how sensitive EDP95
is to improvements in IDC, pm, and pRs. Thus, to effectively
improve EDP95, we can update each processing parame-
ter by an amount proportional to its corresponding value in
∇EDP95 (e.g., IDC corresponds to ∂EDP95/∂IDC). We refer
to each such update as a gradient descent step (details in
[17, Sec. X-B]).
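A minimal sketch of one such update, assuming the parameter vector [IDC, pm, pRs] (smaller is better for all three), a fixed step size `eta`, optional per-parameter weights (a weight of 0 freezes a parameter), and optional lower limits that a parameter may not be pushed past (defaulting to 0, since these parameters are nonnegative). The step rule actually used is detailed in [17, Sec. X-B]; the names and clipping here are illustrative.

```python
import numpy as np


def gradient_step(p, grad, eta, weights=None, lower=None):
    """One descent step: p_new = max(p - eta * w * grad, lower), elementwise.

    p, grad: [IDC, p_m, p_Rs]; weights: relative step sizes (0 freezes a
    parameter); lower: hard limits the update may not cross (default 0).
    """
    p = np.asarray(p, float)
    w = np.ones_like(p) if weights is None else np.asarray(weights, float)
    lo = np.zeros_like(p) if lower is None else np.asarray(lower, float)
    return np.maximum(p - eta * w * np.asarray(grad, float), lo)


p0 = [0.50, 0.10, 0.04]          # hypothetical starting values
g0 = [1.0, 0.5, 0.2]             # hypothetical gradient of EDP95
p1 = gradient_step(p0, g0, eta=0.1)
```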
Before describing the full gradient descent methodology,
we define the initial design point as the design point after
EDP optimization in the nominal case (i.e., after steps 1–3 in
Fig. 3). Also, we define the initial processing parameter values
as the processing parameter values of the initial design point
(e.g., IDC = 0.50, pm = 10%, pRs = 4%, pRm = 99.99%:
Table I). Starting from the initial design point, we first perform
selective upsizing (as described in Section II-A), incrementally
increasing kSelUpsize (the number of standard cells to upsize)
to generate a set of design points referred to as the initial
energy-delay tradeoff curve (the values of kSelUpsize are chosen
so that each increase in kSelUpsize increases ΔE by ∼1%–2%,
to identify multiple design points with ΔE ≤ 5%).
Fig. 11. Gradient descent-based methodology to meet delay penalty, PNMV,
and ΔE requirements. SDPA details in Fig. 10.
The full methodology, illustrated in Fig. 11, combines selec-
tive upsizing and gradient descent to overcome the impact
of CNT count variations on delay penalty and PNMV. After
generating the initial energy-delay tradeoff curve via selec-
tive upsizing, our goal is to identify multiple design points
that meet both a delay penalty constraint (e.g., delay penalty
≤5%) and a PNMV constraint (e.g., PNMV ≤ 0.001%) with
minimal energy cost (e.g., ΔE ≤ 5%). Such design points
that simultaneously satisfy all these design goals are referred
to as acceptable design points. Consequently, this is a fea-
sibility problem in which we search for design points that
meet two constraints, and we solve it using a variation of
an alternating projections (AP) algorithm [3]. A typical AP
algorithm iteratively projects a point onto multiple constraints
until all are satisfied. In our methodology, we use gradient
descent instead of projection; the full methodology (Fig. 11) is
described below (example in Fig. 12).
1) Analyze the Initial Design Point: Perform SDPA
(Fig. 10) on the initial design point [with initial pro-
cessing parameter values and kSelUpsize set to minimize
EDPNomOpt (1): i.e., after steps 1–3 in Fig. 3].
2) Gradient Descent: Alternate between: 1) performing
gradient descent steps using ∇EDP95 until the delay
penalty constraint is satisfied and 2) performing gradient
descent steps using ∇ENP until the PNMV constraint is
satisfied. This procedure continues until either: a) both
constraints are satisfied simultaneously (i.e., an acceptable
design point is found) or b) ΔE is too large or
the processing parameters are too constrained (e.g., a
design point with ΔE > 5% is reached, or the required
processing parameter values may be difficult to achieve
experimentally: both are design choices).
3) Selective Upsizing: Reinitialize the processing parameters
to their initial values (thus returning to the initial
energy-delay tradeoff curve) and then perform selective
upsizing (by increasing kSelUpsize) to move to the next
point on the initial energy-delay tradeoff curve. If ΔE
from selective upsizing is too large (e.g., ΔE > 5%),
then proceed to step 4 (below). Otherwise, loop back
to step 2 (gradient descent) to search for an additional
acceptable design point.
4) Design Point Selection and Validation: Select a single
design point from all acceptable design points identified
using gradient descent. For example, the designer can
select the acceptable design point with the minimum
EDP95 or with the most relaxed processing requirements
(a design choice). Finally, highly accurate models
(e.g., the nonlinear timing model) can be used to validate
the selected design point (if any constraint is not
satisfied during validation, then perform additional gradient
descent steps until all constraints are satisfied).
Fig. 12. Gradient descent methodology (Fig. 11) to achieve delay penalty
≤5%, PNMV ≤ 0.001% (for SNMR = VDD/6), ΔE ≤ 5% (5 nm “pku”
OpenSPARC module). Gradient descent paths descend from the initial energy-
delay tradeoff curve (IDC = 0.50, pm = 10%, pRs = 4%, pRm = 99.99%).
The point (delay penalty, ΔE) = (0%, 0%) represents the EDP-optimized
nominal design point (Section II-A, Fig. 3). ΔE < 0 for point A since ETot
depends on the number of s-CNTs after m-CNT removal (i.e., the CNT count
variables), as shown in (13) in Section III-A; due to CNT count variations
(e.g., resulting from pm > 0%, pRs > 0%), the total number of s-CNTs in all
CNFETs can be reduced versus the nominal case (no variations).
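The control flow of Fig. 11 can be sketched as a loop over the initial energy-delay tradeoff curve. The callables are hypothetical: `analyze(p, k)` stands in for SDPA and returns (delay penalty, PNMV, ΔE, ∇EDP95, ∇ENP) for processing parameters p and upsizing level k, and `step(p, grad)` performs one gradient descent step. The iteration cap is an assumed stand-in for the "processing parameters too constrained" exit, which the text leaves as a design choice, and the toy usage exercises only the control flow.

```python
def find_acceptable_points(p_init, k_levels, analyze, step, max_delay=0.05,
                           max_pnmv=1e-5, max_de=0.05, max_iters=200):
    """Sketch of Fig. 11: alternate EDP95 / ENP descent at each upsizing level."""
    acceptable = []
    for k in k_levels:                        # step 3: next point on the curve
        p = list(p_init)                      # reinitialize processing parameters
        delay, pnmv, de, g_edp, g_enp = analyze(p, k)
        if de > max_de:                       # upsizing alone too costly: stop
            break
        for _ in range(max_iters):            # step 2: alternating descent
            if delay <= max_delay and pnmv <= max_pnmv:
                acceptable.append((k, tuple(p), delay, pnmv, de))
                break
            if de > max_de:                   # energy budget exceeded: give up
                break
            grad = g_edp if delay > max_delay else g_enp
            p = step(p, grad)
            delay, pnmv, de, g_edp, g_enp = analyze(p, k)
    return acceptable                         # step 4 selects among these


# Toy usage: delay penalty = p[0], PNMV = 1e-4 * p[1], Delta E = 0.01 * k.
def toy_analyze(p, k):
    return p[0], 1e-4 * p[1], 0.01 * k, [1.0, 0.0], [0.0, 1.0]


def toy_step(p, grad):
    return [pi - 0.1 * gi for pi, gi in zip(p, grad)]


points = find_acceptable_points([0.5, 1.0], [1, 2, 3, 6], toy_analyze, toy_step)
```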
Fig. 12 illustrates an example of the gradient descent-based
methodology (Fig. 11) to meet delay penalty ≤5% and PNMV
≤ 0.001% with ΔE ≤ 5%. Starting from point A (the initial
design point), we perform selective upsizing to generate the
initial energy-delay tradeoff curve (as described earlier in this
section) represented by points A–F. Then, using the method-
ology in Fig. 11, we perform gradient descent (starting from
the initial design point: point A) until delay penalty ≤5% and
PNMV ≤ 0.001% (at point G: an acceptable design point).
Next, the processing parameters are reinitialized and then
selective upsizing brings us to point B on the initial energy-
delay tradeoff curve. Again, gradient descent is performed to
identify another acceptable design point (point H). This pro-
cess repeats until we reach point F on the initial energy-delay
tradeoff curve, which has ΔE > 5%, concluding the search
for acceptable design points.
In Fig. 12, gradient descent has identified multiple
acceptable design points with varying ΔE and processing
requirements. Furthermore, alternative sets of acceptable
design points can be identified by adjusting the gradient
descent step procedure: e.g., if IDC is difficult to improve
(i.e., it is difficult to control CNT density variations experimen-
tally), then the gradient descent step can be weighted toward
larger updates in pm or pRs, or can be forced never to update
IDC past a predetermined hard-limit. These constraints can be
provided as inputs, and are features of this flexible framework.
Fig. 13. Delay penalty vs. ΔE (5 nm “gkt” OpenSPARC module). Large
markers: values computed using the nonlinear model; small markers:
linearized model. Processing parameter values are in [17, Table VII] (tests 1 & 8).
Both models identify the same design point as having the minimum EDP95;
thus, EDP95,sub-opt = 0%. If the models had selected different design points
[e.g., if the linearized model had selected the labeled point at (2.3%, 8.2%),
with EDP95,sub-opt = 2.5%], then EDP95,sub-opt ≥ 0%. The point (delay
penalty, ΔE) = (0%, 0%) represents the EDP-optimized nominal design point
(Section II-A, Fig. 3). Design points with ΔE < 0 are for designs with smaller
CNFET widths.
IV. RESULTS
We present two sets of results to demonstrate that we have
overcome the bottlenecks of brute-force trial-and-error-based
approaches. The first set of results (Section IV-A) demonstrates
that we can analyze a set of design points >100× faster than
before, while maintaining sufficient accuracy to make cor-
rect design decisions. The second set of results (Section IV-B)
demonstrates the ability of this gradient descent algorithm to
identify multiple processing options to meet design goals (e.g.,
delay penalty ≤5% and PNMV ≤ 0.001% with ΔE ≤ 5%) without
exhaustive search. Using the results from gradient descent,
we provide practical processing guidelines for each node and
for multiple values of VDD such that: even in the presence of
CNT variations, CNFET circuits can maintain ≥90% of the
projected EDP benefits of nominal CNFET circuits.
A. Linearized Timing Model Validation
Here, we validate the speed and accuracy of the linearized
timing model to analyze circuit delay variations. We first
choose a set of design points that would typically be chosen
by a designer seeking to optimize EDP95 using a brute-force-
search-based approach: we use the design points in [52] as
a reference. We analyze 112 design points including all com-
binations of: eight unique sets of processing parameter values
(the same as in [52]: see [17, Table VII]), and 14 different
kSelUpsize values (each increase in kSelUpsize increases ΔE
by ∼1%–5%, e.g., in Fig. 13; higher resolution requires more
computation time).
After choosing the set of design points, we use the nonlin-
ear timing model to compute EDP95 of every design point and
then select the design point with the best (minimum) EDP95.
We also record the total required computation time. We then
perform the same procedure using the linearized timing model.
We evaluate: 1) the total computation time for each model and
2) the degradation (increase) in EDP95 due to using the lin-
earized timing model. To quantify this degradation, we use the
EDP95 sub-optimality metric defined in (37); it is computed
using only the nonlinear timing model to compare the EDP95 of
the design point selected by each model:
$$\mathrm{EDP}_{95,\text{sub-opt}} = \frac{\mathrm{EDP}_{95}^{(\text{selected by linearized model, measured by nonlinear model})}}{\mathrm{EDP}_{95}^{(\text{selected by nonlinear model, measured by nonlinear model})}} - 1. \qquad (37)$$
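Given EDP95 values for every candidate design point from both models, (37) is a short computation. A sketch, assuming the two arrays are aligned by design point and using hypothetical values; note that both selections are scored with the nonlinear model.

```python
import numpy as np


def edp95_sub_optimality(edp95_nonlinear, edp95_linearized):
    """Eq. (37); both inputs hold EDP95 for the same candidate design points."""
    edp95_nonlinear = np.asarray(edp95_nonlinear, float)
    pick_lin = int(np.argmin(edp95_linearized))   # selected by linearized model
    pick_nl = int(np.argmin(edp95_nonlinear))     # selected by nonlinear model
    # Both selections are measured with the nonlinear model.
    return edp95_nonlinear[pick_lin] / edp95_nonlinear[pick_nl] - 1.0


# Hypothetical EDP95 values (arbitrary units) for three candidate design points.
same = edp95_sub_optimality([1.00, 1.02, 1.10], [0.98, 1.01, 1.12])
diff = edp95_sub_optimality([1.00, 1.02, 1.10], [0.99, 0.97, 1.20])
```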
TABLE III
EDP95 SUB-OPTIMALITY AND COMPUTATION TIME (MEASURED ON
A SINGLE 2.93 GHZ PROCESSOR WITH NO PARALLELIZATION).
LOGIC GATE COUNT IS TAKEN FROM THE SYNTHESIZED
NETLIST AT THE 5 NM NODE (WITH VDD = 0.50 V)
TABLE IV
PROCESSING ROUTES TO MEET DELAY PENALTY CONSTRAINTS AND
PNMV ≤ 0.001% WITH ΔE ≤ 5% FOR ALL OPENSPARC MODULES
SIMULTANEOUSLY. IN ALL CASES: VDD = 0.50 V, SNMR = VDD/6,
pRm = 99.99%, COUNT-LIMITED YIELD ≥99.999%
Ideally, the same design point is selected using each of
the two models (resulting in EDP95,sub-opt = 0%, exam-
ple in Fig. 13). This is the case for five of the eight
OpenSPARC modules (5 nm node), and the other three
have EDP95,sub-opt ≤ 2% (Table III).³ The linearized model
achieves >100× speed-up in all cases.
B. CNT Processing and CNFET Circuit Design Guidelines
We now demonstrate the effectiveness of the gradient
descent methodology to identify multiple sets of guidelines
for processing parameters (i.e., processing routes) that meet
design goals for all OpenSPARC modules simultaneously.
For each OpenSPARC module, we first perform gradient
descent (Fig. 11, with initial processing parameter values:
IDC = 0.50, pm = 1%, pRs = 4%, pRm = 99.99%) to identify
multiple acceptable design points (with delay penalty ≤5%,
PNMV ≤ 0.001%, ΔE ≤ 5%), and then we select the design
point with the most relaxed processing requirements (though
other selection criteria can be used, e.g., lowest EDP95).
Then, for each processing parameter, we select its most
constrained value (i.e., the value closest to its ideal value:
Table I) over all the selected design points (one for each
OpenSPARC module). These values form a processing route,
and we then validate that design goals are met for all modules
for this processing route (e.g., using the nonlinear model to
compute delay penalty). Table IV provides processing routes
for the OpenSPARC modules at the 14, 10, 7, and 5 nm nodes
(highlighted entries in Table IV are limited by the PNMV
constraint; other entries are limited by the delay penalty
constraint). For each node, processing routes are shown for
multiple delay penalty constraints to illustrate the tradeoff
between delay penalty, PNMV, and processing requirements.
All processing routes in Table IV meet count-limited yield
≥99.999%, resulting from minimum-width upsizing (step 2 in
Fig. 3: to reach count-limited yield ≥99.9%) and CNT process
³In general, EDP95,sub-opt depends on the chosen set of design points,
since there is a finite number of possible values for EDP95,sub-opt; results
in Table III reflect a typical brute-force-based EDP optimization [52].
TABLE V
PROCESSING ROUTES TO MAINTAIN EDP BENEFIT ≥90% (VERSUS THE
NOMINAL CASE) AND PNMV ≤ 0.001% (SNMR = VDD/6) WITH
ΔE ≤ 5% (ALL OPENSPARC MODULES). COUNT-LIMITED
YIELD ≥99.999% IN ALL CASES
Fig. 14. Relative improvement of each processing parameter (versus IDC =
0.50, pm = 1%, pRs = 4%) for processing routes in Table V (VDD = 0.50 V).
pRm is not shown since pRm = 99.99% in all cases in Table V.
improvements; if count-limited yield <99.999%, then we
can return to step 2 in Fig. 3 to increase WMIN, then repeat
gradient descent (Fig. 11) to find processing routes.
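The route-selection step described above (one selected design point per module, then the most constrained value of each parameter) reduces to a per-parameter minimum or maximum. The following Python sketch illustrates it under stated assumptions: Table I is not reproduced in this section, so the ideal values are assumed to be IDC = 0, pm = 0, pRs = 0, and pRm = 100%; the function name, module names, and numbers are hypothetical. The subsequent validation of the route with the nonlinear model is not shown.

```python
from typing import Dict, List

# ASSUMPTION: ideal values (Table I) taken as IDC = 0, pm = 0, pRs = 0
# (smaller is better) and pRm = 1, i.e., 100% (larger is better).
SMALLER_IS_BETTER: Dict[str, bool] = {"IDC": True, "pm": True, "pRs": True, "pRm": False}


def processing_route(selected: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Merge one selected design point per module into a single processing route.

    For every processing parameter, keep its most constrained value over all
    modules, i.e., the value closest to the (assumed) ideal value.
    """
    route: Dict[str, float] = {}
    for name, smaller_better in SMALLER_IS_BETTER.items():
        values: List[float] = [point[name] for point in selected.values()]
        route[name] = min(values) if smaller_better else max(values)
    return route


# Hypothetical per-module design points (illustrative numbers, not results).
selected_points = {
    "module_a": {"IDC": 0.30, "pm": 0.010, "pRs": 0.030, "pRm": 0.9999},
    "module_b": {"IDC": 0.25, "pm": 0.012, "pRs": 0.035, "pRm": 0.9990},
}
route = processing_route(selected_points)
print(route)  # IDC from module_b; pm, pRs, and pRm from module_a
```

Because each parameter is tightened independently, the merged route can be stricter than any single module's design point, which is why the text validates it again for every module.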
We have so far targeted delay penalty ≤5%, PNMV ≤
0.001%, and ΔE ≤ 5%, which maintains ≥90% of the projected
EDP benefits of nominal CNFET circuits despite CNT
variations. However, achieving these design goals can impose
processing requirements that may be difficult to achieve exper-
imentally (e.g., IDC = 0.19 for delay penalty ≤5% at the 5 nm
node: Table IV). In Table V, we provide alternative processing
routes that maintain ≥90% of the projected EDP benefits of
nominal CNFET circuits; we target design points with EDP
benefit ≥90% (versus nominal) with a relaxed delay penalty
constraint (≤10%, resulting in lower ΔE to meet the EDP
benefit goal).
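As a rough consistency check on the ≥90% figure, the Python sketch below assumes that the delay and energy penalties compose multiplicatively into EDP (EDP with variations = nominal EDP × (1 + delay penalty) × (1 + ΔE)) and that the retained benefit is the ratio of nominal to varied EDP. Both definitions are assumptions for illustration; the paper's EDP95-based metric may be defined differently, and the function names are hypothetical.

```python
def edp_benefit_retained(delay_penalty: float, energy_penalty: float) -> float:
    """Fraction of the nominal EDP benefit retained under CNT variations.

    ASSUMPTION: EDP_var = EDP_nom * (1 + delay_penalty) * (1 + energy_penalty),
    and the retained benefit is EDP_nom / EDP_var.
    """
    return 1.0 / ((1.0 + delay_penalty) * (1.0 + energy_penalty))


def max_energy_penalty(delay_penalty: float, target: float = 0.90) -> float:
    """Largest energy penalty that still retains `target` of the EDP benefit."""
    return 1.0 / (target * (1.0 + delay_penalty)) - 1.0


print(edp_benefit_retained(0.05, 0.05))  # ≈ 0.907
print(max_energy_penalty(0.10))          # ≈ 0.0101
```

Under these assumptions, 5% delay and 5% energy penalties retain about 90.7% of the benefit, while relaxing the delay penalty to 10% leaves only about 1% of energy penalty at the 90% target, which matches the text's note that the relaxed delay constraint requires a lower ΔE.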
The amount by which each processing parameter is
improved is a measure of its effectiveness to improve delay
penalty and PNMV (gradient descent incurs larger updates for
processing parameters that more significantly impact these per-
formance metrics, details in [17, Sec. X-B]). Fig. 14 shows the
relative improvement [R in (40)] of IDC, pm, and pRs from
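their initial values to their final values (in Table V). R is
calculated using the percentage improvement (I) and the total
improvement (ITot) of the processing parameters:
$$(I_{\mathrm{IDC}}, I_{p_m}, I_{p_{Rs}}) = \left(1 - \frac{\mathrm{IDC}^{\mathrm{final}}}{\mathrm{IDC}^{\mathrm{init}}},\; 1 - \frac{p_m^{\mathrm{final}}}{p_m^{\mathrm{init}}},\; 1 - \frac{p_{Rs}^{\mathrm{final}}}{p_{Rs}^{\mathrm{init}}}\right) \tag{38}$$
$$I_{\mathrm{Tot}} = I_{\mathrm{IDC}} + I_{p_m} + I_{p_{Rs}} \tag{39}$$
$$(R_{\mathrm{IDC}}, R_{p_m}, R_{p_{Rs}}) = \left(\frac{I_{\mathrm{IDC}}}{I_{\mathrm{Tot}}},\; \frac{I_{p_m}}{I_{\mathrm{Tot}}},\; \frac{I_{p_{Rs}}}{I_{\mathrm{Tot}}}\right). \tag{40}$$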
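Eqs. (38)–(40) can be evaluated directly. The Python sketch below uses the initial values stated for Fig. 14 (IDC = 0.50, pm = 1%, pRs = 4%) and, purely as an example, the 5 nm values recommended in Section V (IDC = 0.25, pm = 0.9%, pRs = 2.5%) as final values; the function name is hypothetical.

```python
from typing import Dict


def relative_improvement(init: Dict[str, float], final: Dict[str, float]) -> Dict[str, float]:
    """Relative improvement R of each processing parameter, per Eqs. (38)-(40)."""
    # Eq. (38): I_x = 1 - x_final / x_init (smaller values are better here).
    improvement = {name: 1.0 - final[name] / init[name] for name in init}
    # Eq. (39): total improvement I_Tot.
    total = sum(improvement.values())
    if total <= 0.0:
        raise ValueError("no net improvement; R is undefined")
    # Eq. (40): each parameter's share of the total improvement.
    return {name: value / total for name, value in improvement.items()}


initial = {"IDC": 0.50, "pm": 0.01, "pRs": 0.04}
final_5nm = {"IDC": 0.25, "pm": 0.009, "pRs": 0.025}
R = relative_improvement(initial, final_5nm)
print({name: round(value, 3) for name, value in R.items()})
```

With these inputs, R_IDC ≈ 0.51, R_pRs ≈ 0.38, and R_pm ≈ 0.10, so IDC accounts for the largest share of the improvement, in line with the trend reported for Fig. 14.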
The relative improvement is highest for IDC for all nodes,
showing that IDC is a highly effective parameter to improve
for reducing delay penalties and PNMV in an energy-efficient
manner. From our results, we make the following conclusions.
1) The computationally efficient linearized timing model
runs over 100× faster than the nonlinear timing model,
and maintains sufficient accuracy to identify design
points with EDP95,sub-opt ≤ 2% for all test cases.
2) PNMV can be computed efficiently, so the constraint
PNMV ≤ 0.001% can be checked for every candidate design point.
3) Gradient descent is a systematic and scalable method to
meet both delay penalty and PNMV constraints.
4) Gradient descent can efficiently identify multiple pro-
cessing routes to meet design goals.
5) In contrast to traditional thinking (which focuses on
reducing pm to ultralow values), gradient descent iden-
tifies that reducing IDC is a highly effective means of
meeting delay penalty and PNMV constraints, and that
reducing pm past 1% suffers from diminishing returns.
Unlike trial-and-error approaches [52], gradient descent
establishes these facts in a highly rigorous manner.
V. CONCLUSION
We have demonstrated a systematic methodology for joint
exploration and optimization of CNT processing and CNFET
circuit design to overcome the significant challenge of CNT
variations. Our approach enables quick evaluation of delay
variations and PNMV of CNFET VLSI circuits with >100×
speed-up versus existing approaches. Our gradient descent-
based framework accurately identifies the most important
processing parameters, in conjunction with CNFET circuit
sizing, to achieve high energy efficiency while satisfying
circuit-level noise margin and yield constraints. Using this
framework, an important question regarding CNT variations
can be answered.
Question: What values of IDC, pm, pRs, and pRm should
be targeted for highly scaled VLSI CNFET circuits to main-
tain a significant portion of their projected speed and energy
efficiency benefits despite CNT variations, while also meeting
circuit-level noise margin and yield constraints?
Answer: At the 5 nm node, we recommend IDC = 0.25, pm =
0.9%, pRs = 2.5%, and pRm = 99.99% to maintain ≥90% of the
projected EDP benefits of nominal CNFET circuits, with
PNMV ≤ 0.001%, functional yield ≥99.999%, and ΔE ≤ 5%.
These processing guidelines are attractive since pm = 1%
and pRm = 99.99% have been experimentally demonstrated,
pRs = 4% has been achieved, and promising work for continued
improvement of pRs has been shown [19]. This leaves IDC, which
must be reduced by 2× relative to the experimentally demonstrated
IDC = 0.50, thus identifying CNT density variations as an important
topic of research. Additionally, processing requirements may be
further relaxed by combining various CNT processing techniques
(e.g., CNT sorting [1] followed by VMR [32]). Processing
routes for other nodes are provided in Table V.
Unlike existing trial-and-error techniques, our framework
can systematically explore the large space of CNT processing
options, and generate a variety of processing routes depending
on CNT processing technology constraints. Such systematic
exploration is essential for a successful CNFET technology to
avoid potential obstacles. Future research directions include
the following.
1) Incorporation of CNT-metal contact resistance variations
and threshold voltage variations into our framework, as
well as other CNT processing techniques (e.g., [19]).
2) Experimental validation of model parameters for high-
density CNT growth techniques and for channel lengths
closer to the ballistic regime [41], [43].
3) Examination of the applicability of our framework for
other emerging nanotechnologies, as many emerging
nanotechnologies are expected to exhibit substantial
variations. Our methodology can be adapted to overcome
challenges in those technologies as well.
ACKNOWLEDGMENT
The authors would like to thank STARnet SONIC, NSF,
the Stanford Graduate Fellowship, and the Hertz Foundation
Fellowship (for M. M. Shulaker). They would also like to
thank S. Keller, California Institute of Technology, S. Sinha,
ARM, Inc., and Prof. D. Antoniadis, Massachusetts Institute
of Technology, for valuable discussions.
REFERENCES
[1] M. S. Arnold, A. A. Green, J. F. Hulvat, S. I. Stupp, and M. C. Hersam,
“Sorting carbon nanotubes by electronic structure using density
differentiation,” Nat. Nanotechnol., vol. 1, no. 1, pp. 60–65, 2006.
[2] S. Bangsaruntip, G. M. Cohen, A. Majumdar, and J. W. Sleight,
“Universality of short-channel effects in undoped-body silicon nanowire
MOSFETs,” IEEE Electron Device Lett., vol. 31, no. 9, pp. 903–905,
Sep. 2010.
[3] H. H. Bauschke and J. M. Borwein, “Dykstra’s alternating projection
algorithm for two sets,” J. Approx. Theory, vol. 79, no. 3, pp. 418–443,
1994.
[4] G. Budiman, Y. Gao, X. Wang, S. Koswatta, and M. Lundstrom.
(2014). Cylindrical CNTMOSFET Simulator. [Online]. Available:
https://nanohub.org/resources/moscntr
[5] Q. Cao et al., “Medium-scale carbon nanotube thin-film integrated
circuits on flexible plastic substrates,” Nature, vol. 454, pp. 495–500,
Jul. 2008.
[6] L. Chang et al., “IEDM short course,” in Proc. Int. Electron Device
Meeting (IEDM), San Francisco, CA, USA, 2012.
[7] Z. Chen et al., “An integrated logic circuit assembled on a single carbon
nanotube,” Science, vol. 311, no. 5768, p. 1735, 2006.
[8] C. Chu, “FLUTE: Fast lookup table based wirelength estimation
technique,” in Proc. Int. Conf. Comput.-Aided Design (ICCAD),
San Jose, CA, USA, 2004, pp. 696–701.
[9] J. Deng et al., “Carbon nanotube transistor circuits: Circuit-level perfor-
mance benchmarking and design options for living with imperfections,”
in Proc. Int. Solid-State Circuits Conf. (ISSCC), San Francisco, CA,
USA, 2007, pp. 70–588.
[10] R. Dennard et al., “Design of ion-implanted MOSFET’s with very small
physical dimensions,” IEEE J. Solid-State Circuits, vol. SC-9, no. 5,
pp. 256–268, Oct. 1974.
[11] L. Ding et al., “CMOS-based carbon nanotube pass-transistor logic
integrated circuits,” Nat. Commun., vol. 3, Feb. 2012, Art. ID 677.
[12] (Jul. 2012). Eigen C++ Template Library for Linear Algebra. [Online].
Available: http://eigen.tuxfamily.org
[13] A. D. Franklin et al., “Sub-10 nm carbon nanotube transistor,”
Nano Lett., vol. 12, no. 2, pp. 758–762, 2012.
[14] A. Genz, “Numerical computation of multivariate normal probabilities,”
J. Comput. Graph. Stat., vol. 1, no. 2, pp. 141–149, 1992.
[15] M. C. Hansen et al., “Unveiling the ISCAS-85 benchmarks: A case
study in reverse engineering,” IEEE Des. Test, vol. 16, no. 3, pp. 72–80,
Jul. 1999.
[16] G. Hills et al., “Rapid exploration of processing and design guidelines
to overcome carbon nanotube variations,” in Proc. Design Autom. Conf.,
Austin, TX, USA, 2013, pp. 1–10.
[17] G. Hills et al. Rapid Co-Optimization of Processing and Circuit
Design to Overcome Carbon Nanotube Variations. [Online]. Available:
http://www.arxiv.org
[18] (2013). ITRS. [Online]. Available: http://www.itrs.net/Links/2013ITRS/
Home2013.htm
[19] S. H. Jin et al., “Using nanoscale thermocapillary flows to cre-
ate arrays of purely semiconducting single-walled carbon nanotubes,”
Nat. Nanotechnol., vol. 8, no. 5, pp. 347–355, 2013.
[20] S. J. Kang et al., “High-performance electronics using dense, perfectly
aligned arrays of single-walled carbon nanotubes,” Nat. Nanotechnol.,
vol. 2, no. 4, pp. 230–236, 2007.
[21] A. Khakifirooz et al., “A simple semiempirical short-channel MOSFET
current–voltage model continuous across all regions of operation and
employing only physical parameters,” IEEE Trans. Electron Devices,
vol. 56, no. 8, pp. 1674–1680, Aug. 2009.
[22] J. Lohstroh et al., “Worst-case static noise margin criteria for logic
circuits and their mathematical equivalence,” IEEE J. Solid-State Circuits,
vol. 18, no. 6, pp. 803–807, Dec. 1983.
[23] J. Luo et al., “A compact model for carbon nanotube field-effect transis-
tors including non-idealities and calibrated with experimental data down
to 9 nm gate length,” IEEE Trans. Electron Devices, vol. 60, no. 6,
pp. 1834–1843, Jun. 2013.
[24] D. Markovic, V. Stojanovic, B. Nikolic, M. A. Horowitz, and
R. W. Brodersen, “Methods for true energy-performance optimization,”
IEEE J. Solid-State Circuits, vol. 39, no. 8, pp. 1282–1293, Aug. 2004.
[25] (Aug. 2014). Nangate Open Cell Libraries. [Online]. Available:
http://www.nangate.com
[26] S. Nassif et al., “High performance CMOS variability in the
65 nm regime and beyond,” in Proc. IEEE Int. Electron Devices
Meeting (IEDM), Washington, DC, USA, 2007, pp. 569–571.
[27] (Dec. 2011). OpenSPARC. [Online]. Available: http://
www.opensparc.net/opensparc-t2
[28] H. Park et al., “High-density integration of carbon nanotubes via chem-
ical self-assembly,” Nat. Nanotechnol., vol. 7, no. 12, pp. 787–791,
2012.
[29] J. Parker, C. Beasley, A. Lin, H.-Y. Chen, and H.-S. P. Wong, “Increasing
the semiconducting fraction in ensembles of single-walled carbon
nanotubes,” Carbon, vol. 50, no. 14, pp. 5093–5098, 2012.
[30] N. Patil, J. Deng, A. Lin, H.-S. P. Wong, and S. Mitra, “Design meth-
ods for misaligned and mispositioned carbon-nanotube immune circuits,”
IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 27, no. 10,
pp. 1725–1736, Oct. 2008.
[31] N. Patil et al., “Wafer-scale growth and transfer of aligned
single-walled carbon nanotubes,” IEEE Trans. Nanotechnol., vol. 8,
no. 4, pp. 498–504, Jul. 2009.
[32] N. Patil et al., “VMR: VLSI-compatible metallic carbon nanotube
removal for imperfection-immune cascaded multi-stage digital logic cir-
cuits using carbon nanotube FETs,” in Proc. IEEE Int. Electron Devices
Meeting (IEDM), Baltimore, MD, USA, 2009, pp. 1–4.
[33] N. Patil et al., “Scalable carbon nanotube computational and stor-
age circuits immune to metallic and mispositioned carbon nanotubes,”
IEEE Trans. Nanotechnol., vol. 10, no. 4, pp. 744–750, Jul. 2011.
[34] B. C. Paul et al., “Impact of a process variation on nanowire and nan-
otube device performance,” IEEE Trans. Electron Devices, vol. 54, no. 9,
pp. 2369–2376, Sep. 2007.
[35] A. Raychowdhury et al., “Variation tolerance in a multichannel
carbon-nanotube transistor for high-speed digital circuits,” IEEE Trans.
Electron Devices, vol. 56, no. 3, pp. 383–392, Mar. 2009.
[36] J. A. Roy and I. L. Markov, “High-performance routing at the nanometer
scale,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 27,
no. 6, pp. 1066–1077, Jun. 2008.
[37] R. Saito, G. Dresselhaus, and M. S. Dresselhaus, Physical Properties of
Carbon Nanotubes. London, U.K.: Imperial College Press, 1998.
[38] N. Z. Shor, Minimization Methods for Non-Differentiable Functions.
Berlin, Germany: Springer, 1985.
[39] M. M. Shulaker et al., “Carbon nanotube computer,” Nature, vol. 501,
no. 7468, pp. 526–530, 2013.
[40] M. M. Shulaker et al., “Sensor-to-digital interface built entirely with
carbon nanotube FETs,” IEEE J. Solid-State Circuits, vol. 49, no. 1,
pp. 190–201, Jan. 2014.
[41] M. M. Shulaker et al., “Carbon nanotube circuit integration up to sub-
20 nm channel length,” ACS Nano, vol. 8, no. 4, pp. 3434–3443, 2014.
[42] M. M. Shulaker et al., “Monolithic 3D integration of logic and memory:
Carbon nanotube FETs, resistive RAM, and silicon FETs,” in Proc. Int.
Electron Devices Meeting (IEDM), San Francisco, CA, USA, 2014,
pp. 27.4.1–27.4.4.
[43] M. M. Shulaker et al., “High-performance carbon nanotube field-effect
transistors,” in Proc. Int. Electron Devices Meeting (IEDM),
San Francisco, CA, USA, 2014, pp. 33.6.1–33.6.4.
[44] S. Sinha et al., “Design benchmarking to 7 nm with FinFET predictive
technology models,” in Proc. ACM/IEEE Int. Symp. Low Power Electron.
Design (ISLPED), Redondo Beach, CA, USA, 2012, pp. 15–20.
[45] (Apr. 2015). Stanford University VSCNFET Model. [Online]. Available:
https://nano.stanford.edu/stanford-cnfet2-model
[46] L. Wei, D. J. Frank, L. Chang, and H.-S. P. Wong, “A non-iterative
compact model for carbon nanotube FETs incorporating source exhaus-
tion effects,” in Proc. Int. Electron Devices Meeting (IEDM), Baltimore,
MD, USA, 2009, pp. 1–4.
[47] H. Wei et al., “Monolithic three-dimensional integration of car-
bon nanotube FET complementary logic circuits,” in Proc. Int.
Electron Devices Meeting (IEDM), Washington, DC, USA, 2013,
pp. 19.7.1–19.7.4.
[48] N. H. Weste and D. M. Harris, CMOS VLSI Design. Boston, MA, USA:
Pearson/Addison Wesley, 2005.
[49] J. Zhang, N. Patil, A. Hazeghi, and S. Mitra, “Carbon nanotube circuits
in the presence of carbon nanotube density variations,” in Proc. 46th
ACM/IEEE Design Autom. Conf. (DAC), San Francisco, CA, USA, 2009,
pp. 71–76.
[50] J. Zhang, N. Patil, and S. Mitra, “Probabilistic analysis and
design of metallic-carbon-nanotube-tolerant digital logic circuits,”
IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 28, no. 9,
pp. 1307–1320, Sep. 2009.
[51] J. Zhang et al., “Carbon nanotube correlation: Promising opportunity
for CNFET circuit yield enhancement,” in Proc. 47th Design Autom.
Conf. (DAC), Anaheim, CA, USA, 2010, pp. 889–892.
[52] J. Zhang, N. Patil, H.-S. P. Wong, and S. Mitra, “Overcoming car-
bon nanotube variations through co-optimized technology and circuit
design,” in Proc. Int. Electron Devices Meeting (IEDM), Washington,
DC, USA, 2011, pp. 4.6.1–4.6.4.
[53] J. Zhang, “Variation-aware design of carbon nanotube digital VLSI
circuits,” Ph.D. dissertation, Dept. Electr. Eng., Stanford Univ., Stanford,
CA, USA, 2011.
[54] J. Zhang et al., “Carbon nanotube robust digital VLSI,” IEEE Trans.
Comput.-Aided Design Integr. Circuits Syst., vol. 31, no. 4, pp. 453–471,
Apr. 2012.
Gage Hills received the B.S. degree in electri-
cal engineering and computer science from Yale
University, New Haven, CT, USA, in 2007, and the
M.S. degree in electrical engineering from Stanford
University, Stanford, CA, USA, in 2012, where he
is currently pursuing the Ph.D. degree.
His current research interests include experimen-
tal demonstrations and statistical modeling of very
large-scale carbon nanotube-based digital circuits.
Jie Zhang (S’05–M’11) received the B.E. degree
in electronic engineering from Tsinghua University,
Beijing, China, in 2006, and the M.S. and Ph.D.
degrees in electrical engineering from Stanford
University, Stanford, CA, USA, in 2008 and 2011,
respectively.
Since 2011, he has been with Google, Inc.,
Mountain View, CA, USA. His current research
interests include modeling and simulation of carbon
nanotube-based devices and circuits, with a focus on
variation-aware design and optimization.
Dr. Zhang was a recipient of the Stanford Graduate Fellowship and the
Top-Class Scholarship from Tsinghua University.
Max Marcel Shulaker received the B.S. degree in
electrical engineering from Stanford University, Stanford,
CA, USA, in 2011, where he is currently pursuing
the Ph.D. degree, researching experimental
demonstrations of nanosystems built with emerging
technologies.
His current research interests include realizing
increased levels of integration for carbon nanotube-
based digital logic circuits.
Mr. Shulaker is a Stanford Graduate Fellow and
a Fannie and John Hertz Fellow.
Hai Wei (S’09) received the B.S. (Hons.) degree in
microelectronics from Tsinghua University, Beijing,
China, in 2007, and the M.S. and Ph.D. degrees
in electrical engineering from Stanford University,
Stanford, CA, USA, in 2010 and 2014, respectively.
His current research interests include design and
fabrication of carbon nanotube field-effect transistors
and circuits and monolithic 3-D integrated circuits.
Mr. Wei was a recipient of the Stanford School of
Engineering Fellowship Award.
Chi-Shuen Lee received the B.S. degree in elec-
trical engineering from National Taiwan University,
Taipei, Taiwan, in 2011, and the M.S. degree
in electrical engineering from Stanford University,
Stanford, CA, USA, in 2014, where he is currently
pursuing the Ph.D. degree.
He is currently involved in the compact model-
ing of carbon nanotube field-effect transistors and
performance benchmarking of digital systems based
on emerging CMOS technologies in the sub-10 nm
regime. His current research interests include mod-
eling and simulation of nanoscale transistors and performance benchmarking.
Arjun Balasingam received the Diploma degree
from Archbishop Mitty High School, San Jose,
CA, USA, in 2014, as the Valedictorian of his
class. He is currently pursuing the B.S. degree
with the School of Engineering, Stanford University,
Stanford, CA, USA.
His current research interests include electrical
engineering, mathematics, and computer science.
Mr. Balasingam was selected as a Siemens
Competition Regional Finalist in 2012 and a
Semifinalist in 2013 for his work on a variety of
engineering research projects. He was selected as an Intel Science Talent
Search Semifinalist in 2014. He was a recipient of the IEEE Silicon Valley
Engineering Council Education Award in 2014.
H.-S. Philip Wong (F’01) received the B.Sc.
(Hons.) degree from the University of Hong Kong,
Hong Kong, the M.S. degree from Stony Brook
University, New York, NY, USA, and the Ph.D.
degree from Lehigh University, Bethlehem, PA,
USA.
From 1988 to 2004, he was with IBM T.J. Watson
Research Center, Yorktown Heights, NY, USA,
where he was a Research Staff Member, Manager,
and Senior Manager. As Senior Manager, he was
responsible for shaping and executing IBM’s strat-
egy on nanoscale science and technology as well as exploratory silicon devices
and semiconductor technology. He joined Stanford University, Stanford, CA,
USA, as a Professor of Electrical Engineering, in 2004, where he is the Willard
R. and Inez Kerr Bell Professor with the School of Engineering. His academic
appointments include the Chair of Excellence of the French Nanosciences
Foundation, Grenoble, France, a Guest Professor with Peking University,
Beijing, China, an Honorary Professor with the Institute of Microelectronics
of Chinese Academy of Sciences, Beijing, a Visiting Chair Professor of
Nanoelectronics with Hong Kong Polytechnic University, Hong Kong. He
received an Honorary Doctorate degree from the Institut Polytechnique de
Grenoble, Grenoble. His current research interests include carbon electronics, 2-D
layered materials, wireless implantable biosensors, directed self-assembly,
nanoelectromechanical relays, device modeling, brain-inspired computing,
and nonvolatile memory devices such as phase change memory and metal
oxide resistance change memory.
Prof. Wong has served as an Elected Member of the Electron Devices
Society AdCom from 2001 to 2006, an Editor-in-Chief of the IEEE
TRANSACTIONS ON NANOTECHNOLOGY from 2005 to 2006, the Sub-
Committee Chair of the International Solid-State Circuit Conference from
2003 to 2004, and the General Chair of the International Electron Devices Meeting
in 2007. He is the IEEE ExCom Chair of the Symposia of Very Large-Scale
Integration Technology and Circuits.
Subhasish Mitra (F’13) directs the Robust Systems
Group in the Department of Electrical Engineering
and the Department of Computer Science of
Stanford University, where he is the Chambers
Faculty Scholar of Engineering. Before joining
Stanford, he was a Principal Engineer at Intel.
Prof. Mitra’s research interests include robust
systems, VLSI design, CAD, validation and test,
emerging nanotechnologies, and emerging neuro-
science applications. His X-Compact technique for
test compression has been key to cost-effective man-
ufacturing and high-quality testing of a vast majority of electronic systems,
including numerous Intel products. X-Compact and its derivatives have been
implemented in widely used commercial Electronic Design Automation tools.
His work on carbon nanotube imperfection-immune digital VLSI, jointly with
his students and collaborators, resulted in the demonstration of the first car-
bon nanotube computer, and it was featured on the cover of NATURE. The
NSF presented this work as a Research Highlight to the US Congress, and it
also was highlighted as “an important, scientific breakthrough” by the BBC,
Economist, EE Times, IEEE Spectrum, MIT Technology Review, National
Public Radio, New York Times, Scientific American, Time, Wall Street
Journal, Washington Post, and numerous other organizations worldwide.
Prof. Mitra’s honors include the Presidential Early Career Award for
Scientists and Engineers from the White House, the highest US honor for
early-career outstanding scientists and engineers, ACM SIGDA/IEEE CEDA
A. Richard Newton Technical Impact Award in Electronic Design Automation,
“a test of time honor” for an outstanding technical contribution, and the Intel
Achievement Award, Intel’s highest corporate honor. He and his students
published several award-winning papers at major venues: IEEE/ACM Design
Automation Conference, IEEE International Solid-State Circuits Conference,
IEEE International Test Conference, IEEE Transactions on CAD, IEEE VLSI
Test Symposium, Intel Design and Test Technology Conference, and the
Symposium on VLSI Technology. At Stanford, he has been honored sev-
eral times by graduating seniors “for being important to them during their
time at Stanford.”
Prof. Mitra has served on numerous conference committees and journal
editorial boards. He served on DARPA’s Information Science and Technology
Board as an invited member. He is a Fellow of the ACM and the IEEE.
Appendix 
 
TABLE VI 
RESULTS OF CNFET DEVICE PARAMETER OPTIMIZATION (SECTION VII) FOR EACH NODE AND VALUE OF VDD, USING THE VSCNFET MODEL [45]. 
“FIXED” VARIABLES ARE HELD CONSTANT DURING OPTIMIZATION; “OPTIMIZED” VARIABLES ARE SWEPT TO OPTIMIZE EDP (SECTION VII-A). 
VALUES FOR STANDARD CELL PARASITICS (RGCONT, RWPT, RSDCONT, CIN-OUT, CIN-RAILS, COUT-RAILS) ARE SHOWN FOR AN INVERTER WITH 
ONE N-TYPE CNFET AND ONE P-TYPE CNFET, EACH WITH WIDTH EQUAL TO THE MAXIMUM FINGER WIDTH (WFINGER,MAX); 
STANDARD CELL CIRCUIT SCHEMATICS ARE ANNOTATED WITH THESE PARASITICS AS ILLUSTRATED IN [44]. 
Parameter | Source | Technology parameter value
Technology node label | [18] | 14 nm | 10 nm | 7 nm | 5 nm | 5 nm | 5 nm | 5 nm
Year | [18] | 2014 | 2016 | 2018 | 2020 | 2020 | 2020 | 2020
Supply voltage, VDD (V) | Fixed | 0.50 | 0.50 | 0.50 | 0.50 | 0.45 | 0.40 | 0.35
CNFET contacted gate pitch, LPITCH (nm) | [18] | 64 | 56.6 | 45 | 35.8 | 35.8 | 35.8 | 35.8
CNFET gate length, LG (nm) | Optimized | 9.8 | 9.5 | 8.9 | 8.4 | 8.8 | 9.3 | 9.7
CNT-metal contact length, LC (nm) | Optimized | 44.3 | 38.1 | 29 | 21.5 | 20.9 | 20.3 | 19.8
CNFET extension region length, LEXT (nm) | Optimized | 4.95 | 4.5 | 3.55 | 2.95 | 3.05 | 3.1 | 3.15
CNFET nominal on-current, ION (mA/µm) | Optimized | 1.914 | 1.826 | 1.602 | 1.348 | 1.066 | 0.800 | 0.552
CNFET nominal off-current, IOFF (nA/µm) | [18] | 100 | 100 | 100 | 100 | 100 | 100 | 100
CNFET nominal on/off ratio | Optimized | 19e3 | 18e3 | 16e3 | 13e3 | 11e3 | 8e3 | 5e3
CNFET flat-band voltage, VFB (V) | Optimized | 0.007 | 0.010 | 0.018 | 0.026 | 0.018 | 0.010 | 0.005
Nominal CNT-CNT spacing, s (nm) | Fixed [46] | 4 | 4 | 4 | 4 | 4 | 4 | 4
CNT diameter, dCNT (nm) | Fixed [31] | 1.3 | 1.3 | 1.3 | 1.3 | 1.3 | 1.3 | 1.3
CNFET gate oxide thickness, TOX (nm) | [18] | 2.57 | 2.51 | 2.46 | 2.42 | 2.42 | 2.42 | 2.42
CNFET gate oxide dielectric constant, KOX | [18] | 13 | 14 | 15 | 16 | 16 | 16 | 16
CNFET gate equivalent oxide thickness, EOT (nm) | [18] | 0.771 | 0.699 | 0.640 | 0.590 | 0.590 | 0.590 | 0.590
CNFET gate height, hG (nm) | [18] | 20 | 20 | 15 | 15 | 15 | 15 | 15
CNFET contact height, hC (nm) | [23] | 20 | 20 | 15 | 15 | 15 | 15 | 15
Dielectric constant of spacer, KSPA | [18] | 2.775 | 2.59 | 2.59 | 2.31 | 2.31 | 2.31 | 2.31
M1 wire resistivity (µΩ-cm) | [18] | 4.62 | 4.77 | 5.41 | 6.35 | 6.35 | 6.35 | 6.35
M1 wire resistance (Ω/µm) | [18] | 23.7 | 29.8 | 53.4 | 99.1 | 99.1 | 99.1 | 99.1
M1 wire capacitance (fF/µm) | [18] | 0.19 | 0.19 | 0.19 | 0.17 | 0.17 | 0.17 | 0.17
M1 aspect ratio | [18] | 1.9 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0
Standard cell height (µm) | [18], [25] | 0.768 | 0.679 | 0.540 | 0.430 | 0.430 | 0.430 | 0.430
Standard cell width (relative) | [18], [25] | 1.0 | 0.89 | 0.70 | 0.56 | 0.56 | 0.56 | 0.56
Standard cell area (relative) | [18], [25] | 1.0 | 0.78 | 0.49 | 0.31 | 0.31 | 0.31 | 0.31
Standard cell maximum finger width, WFINGER,MAX (µm) | [18], [25] | 0.280 | 0.240 | 0.200 | 0.160 | 0.160 | 0.160 | 0.160
CNT sampling region width (nm) | Fixed | 20 | 20 | 20 | 20 | 20 | 20 | 20
Minimum CNFET width in library (nm) | Fixed | 20 | 20 | 20 | 20 | 20 | 20 | 20
Minimum CNFET width (1/2 LPITCH) (nm) | Fixed | 32 | 28.8 | 22.5 | 17.6 | 17.6 | 17.6 | 17.6
Standard cell input-to-gate wire resistance, RGCONT (Ω) | [18], [44] | 150 | 155 | 176 | 206 | 206 | 206 | 206
Standard cell wire resistance per track, RWPT (Ω) | [18], [44] | 1.7 | 2.0 | 2.8 | 4.7 | 4.7 | 4.7 | 4.7
Standard cell source/drain wire resistance, RSDCONT (Ω) | [18], [44] | 100 | 113 | 142 | 179 | 179 | 179 | 179
Standard cell input-to-output capacitance, CIN-OUT (aF) | [18], [44] | 37.5 | 33.2 | 26.4 | 21.0 | 21.0 | 21.0 | 21.0
Standard cell input-to-supply rail capacitance, CIN-RAILS (aF) | [18], [44] | 29.5 | 26.1 | 20.7 | 16.5 | 16.5 | 16.5 | 16.5
Standard cell output-to-supply rail capacitance, COUT-RAILS (aF) | [18], [44] | 38.0 | 33.6 | 26.7 | 21.3 | 21.3 | 21.3 | 21.3
VI. CNT VARIATIONS & CNT CORRELATION 
CNTs are subject to the following CNT-specific variations: 
1) CNT Type Variations: CNTs can be either metallic    
(m-CNT) or semiconducting (s-CNT) [50]. 
2) CNT Density Variations: described in Section II. 
3) CNT Diameter Variations: the diameter of a CNT is a 
function of its chirality, and can lead to changes in 
CNFET threshold voltage and on-current [34]. 
4) CNT Alignment Variations: mis-positioned CNTs cause 
random alignment angles with respect to the CNT 
growth direction, resulting in variations in CNFET 
channel length [30]. 
5) CNT Doping Variations: CNFETs require heavily 
doped source and drain extension regions to achieve 
small parasitic series resistance. Variations in the 
doping concentration lead to variation in series 
resistance [9]. 
A. Improving pm: Diminishing Returns 
As described in Section II-D, reducing pm past 1% suffers 
from diminishing returns and can be insufficient to meet 
design goals [16], [52]; Fig. 15 illustrates that pm = 0.1% does
not achieve delay penalty ≤5% for the OpenSPARC modules 
at the 5 nm node. 
B. Gaussian Approximation of CNT Count Distributions 
To validate the Gaussian approximation to the CNT count 
distribution (as described in Section III-A), we sample the 
circuit delay cumulative distribution function (CDF) via MC 
SSTA (Section III-A) for each of two cases: using discrete 
(non-negative integer) CNT count variables [49], and using 
the Gaussian approximation. For example, in the case of the 
5 nm “gkt” OpenSPARC module, the Gaussian approximation 
underestimates the median delay (where the CDF is equal to 
50%) by only 0.07%, and overestimates the delay spread 
(measured as the width between the points where the CDF is
equal to 5% and 95%) by only 0.8% (Fig. 16). We conclude
that the Gaussian approximation is sufficient for our 
exploration purposes. 
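To make the comparison concrete at the level of a single CNFET, the Python sketch below contrasts quantiles of a discrete CNT count with those of its Gaussian approximation. It is not the MC SSTA flow of Section III-A. It assumes a mean count of width/spacing and a variance of IDC × mean (IDC being the index of dispersion), and it uses a hypothetical Binomial(n, 1 − IDC) model as the discrete stand-in because it matches that mean and variance when n is an integer (this requires 0 < IDC < 1). The width (160 nm, WFINGER,MAX at 5 nm) and spacing (4 nm) come from Table VI; IDC = 0.50 matches the initial value used in the text. Function names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import List, Tuple


def binomial_quantile(q: float, n: int, p: float) -> int:
    """Smallest k such that P(X <= k) >= q for X ~ Binomial(n, p)."""
    cdf = 0.0
    for k in range(n + 1):
        cdf += math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
        if cdf >= q:
            return k
    return n


def count_quantiles(
    width_nm: float, spacing_nm: float, idc: float, qs=(0.05, 0.50, 0.95)
) -> List[Tuple[float, int, float]]:
    """Return (q, discrete quantile, Gaussian quantile) for each q.

    ASSUMPTIONS (illustrative only): mean count = width / spacing,
    variance = IDC * mean, and a hypothetical discrete model
    Binomial(n, 1 - IDC) with n = mean / (1 - IDC), which matches that
    mean and variance when n is an integer (requires 0 < IDC < 1).
    """
    mean = width_nm / spacing_nm
    p = 1.0 - idc
    n = round(mean / p)
    gaussian = NormalDist(mu=mean, sigma=math.sqrt(idc * mean))
    return [(q, binomial_quantile(q, n, p), gaussian.inv_cdf(q)) for q in qs]


for q, k_discrete, x_gaussian in count_quantiles(160.0, 4.0, 0.50):
    print(f"q = {q:.2f}: discrete {k_discrete}, Gaussian {x_gaussian:.2f}")
```

For a 160 nm wide device the discrete and Gaussian quantiles differ by well under one CNT; in relative terms, the discreteness matters more for narrow devices that contain only a few CNTs.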
VII. CNFET DEVICE- & CIRCUIT-LEVEL MODELING 
To efficiently evaluate delay penalties and PNMV for the 
OpenSPARC modules, we use variation-aware logic gate 
timing, energy, and SNM models that are built using a SPICE-
compatible CNFET compact device model [23], which is 
based on the virtual source model [21]. This virtual source 
CNFET (VSCNFET) device model accounts for several non-
idealities including (but not limited to) direct source-to-drain 
tunneling leakage current, parasitic gate-to-plug capacitance, 
fringing capacitance, source/drain extension region resistance, 
and CNT-metal contact resistance [45]. It has been calibrated 
with experimental data from 15 nm gate length CNFETs [23] 
and with data from NEGF-based (non-equilibrium Green’s 
function) simulations of 5 nm gate length CNFETs [4]. 
We leverage the VSCNFET model to extract timing, 
energy, and noise margin information from SPICE simulations 
to build variation-aware logic gate timing, energy, and SNM 
models for each standard cell in our standard cell library, 
which is derived from the Nangate 15 nm Open Cell Library 
(OCL) [25]. We use the standard cell height, width, area, and 
maximum finger width from the Nangate 15 nm OCL for our 
14 nm standard cell library, and then scale each of these
dimensions for the other nodes by the ratio of that node's
contacted gate pitch (LPITCH) to the 14 nm LPITCH, as given
by the 2013 edition of the International Technology
Roadmap for Semiconductors (ITRS) [18]. LPITCH (as well as
other device- and circuit-level parameters, including oxide 
thickness and wire resistivity) for the 14, 10, 7, and 5 nm 
technology nodes is taken according to the “Node Range” 
Labeling in the Process Integration, Devices, and Structures 
(PIDS) table for High Performance Logic Technology 
Requirements. All of our variation-aware models account for 
standard cell parasitics, including wire resistance to the 
source, drain, and gate of each CNFET, wire track resistance, 
and capacitance between the input, output, and supply rails 
(using experimentally measured values from [44] for the 14
nm node, which are then scaled for other nodes using
parameters from the ITRS [18]). Table VI lists the resulting
parameter values.
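The LPITCH-based scaling of the 14 nm cell dimensions can be checked against Table VI. The Python sketch below (hypothetical names) scales the cell height and relative width linearly by the LPITCH ratio and the relative area by its square, using the LPITCH and 14 nm cell-height values listed in Table VI.

```python
# LPITCH per node (nm) and the 14 nm standard cell height (um), from Table VI.
LPITCH_NM = {"14 nm": 64.0, "10 nm": 56.6, "7 nm": 45.0, "5 nm": 35.8}
BASE_NODE = "14 nm"
BASE_CELL_HEIGHT_UM = 0.768


def scaled_cell(node: str) -> dict:
    """Scale 14 nm standard cell dimensions to `node` by the LPITCH ratio.

    Linear dimensions scale by the ratio; area scales by its square.
    """
    ratio = LPITCH_NM[node] / LPITCH_NM[BASE_NODE]
    return {
        "height_um": BASE_CELL_HEIGHT_UM * ratio,
        "relative_width": ratio,
        "relative_area": ratio ** 2,
    }


for node in LPITCH_NM:
    cell = scaled_cell(node)
    print(node, {key: round(value, 3) for key, value in cell.items()})
```

The computed heights and relative areas match the Table VI entries to the precision shown, and the listed relative widths agree to within about 0.01.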
A. CNFET Device Parameter Optimization 
Before building our variation-aware models for each 
standard cell, we first optimize the CNFET device parameters 
(i.e., parameters that define the geometry of the CNFETs and 
affect their electrical characteristics, details below) to target a 
high performance CNFET technology (as opposed to a low 
power CNFET technology; e.g., high performance versus low 
power options for standard cell libraries in silicon-CMOS 
circuits are often distinguished by the transistor threshold 
voltage and off-state leakage current) [18]. For example, these 
CNFET device parameters include the CNFET channel length, CNT-metal contact length, and CNT diameter, all of which affect CNFET electrical characteristics (e.g., threshold voltage, parasitic capacitances, on-current, off-current, and subthreshold slope). We optimize the CNFET device parameters to minimize the EDP of an inverter with fan-out (FO) equal to four (i.e., the output load capacitance is four times the input gate capacitance), a common metric for performance benchmarking and technology assessment [9]. We refer to this metric as EDPFO4, where EDPFO4 = EFO4 × TFO4, TFO4 is the average of the rise delay (falling input/rising output) and the fall delay (rising input/falling output), and EFO4 is the average switching energy per transition. We perform the following CNFET device parameter optimization to minimize EDPFO4 (for each node and for each value of VDD): LPITCH is held constant, and LG (gate length), LC (CNT-metal contact length), LEXT (CNT extension length, i.e., the ungated region of the CNT between the gate and the source/drain contact, Fig. 1 [23]), and VFB (flat-band voltage, which offsets the threshold voltage) are swept using the VSCNFET model to minimize EDPFO4, subject to the constraint that the CNFET off-current IOFF ≤ 100 nA/µm in the nominal case (IOFF = 100 nA/µm is the target for high performance logic specified by the ITRS [18]). Note that LPITCH = LC + LG + 2LEXT [23], so for a fixed LPITCH, choosing LG and LEXT determines LC. For each combination of CNFET device parameters, we simulate EDPFO4 using SPICE and the VSCNFET model and then select the CNFET device parameters that minimize EDPFO4. CNFET device parameter optimization results (together with additional parameters, including gate dielectric constant, contact height, and CNT-CNT spacing) are provided in Table VI. In particular, Fig. 18 illustrates the optimized LC values (Table VI), demonstrating that EDPFO4 is highly sensitive to CNT-metal contact resistance (RC).
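The sweep above can be sketched as an exhaustive grid search. This is a minimal sketch, not the authors' tooling: `simulate` is a hypothetical stand-in for the SPICE + VSCNFET simulation of the FO4 inverter that returns (EDPFO4, IOFF in nA/µm), and because LPITCH = LC + LG + 2LEXT is held fixed, the sketch sweeps LG, LEXT, and VFB and derives LC from the pitch (one way, among others, to honor the constraint).

```python
from itertools import product


def optimize_device(l_pitch, lg_values, lext_values, vfb_values, simulate,
                    ioff_max=100.0):
    """Grid search minimizing EDP_FO4 subject to a nominal-case IOFF limit.

    simulate(lg=..., lc=..., lext=..., vfb=...) is a hypothetical callable that
    returns (edp_fo4, ioff) with ioff in nA/um.
    Returns (best_edp, best_parameters) or None if nothing is feasible.
    """
    best = None
    for lg, lext, vfb in product(lg_values, lext_values, vfb_values):
        lc = l_pitch - lg - 2 * lext          # contact length implied by the fixed pitch
        if lc <= 0:                           # geometry does not fit in the pitch
            continue
        edp, ioff = simulate(lg=lg, lc=lc, lext=lext, vfb=vfb)
        if ioff > ioff_max:                   # IOFF <= 100 nA/um target (nominal case)
            continue
        if best is None or edp < best[0]:
            best = (edp, {"lg": lg, "lc": lc, "lext": lext, "vfb": vfb})
    return best
```

Sweeping LG and LEXT and solving for LC keeps every candidate on the fixed contacted gate pitch, so no simulations are wasted on geometries that violate it.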
 
Fig. 17. Aligned-active layout (Section II-A). AOI222_X1 standard cell [25] 
before (a) and after (b) active alignment. 
 
 
 
Fig. 15. Delay penalty improvement due to improving pm (5 nm node, 
OpenSPARC modules, after steps 1-3 in Fig. 3). IDC = 0.50, pRs = 4%, 
pRm = 99.99%. Improving pm from 10% to 0.1% improves count-limited yield 
from ≥99.98% to ≥99.995%. However, despite pm = 0.1%, delay penalty can 
be >10%; thus, improving pm alone can be insufficient to meet design goals. 
 
 
Fig. 16. Estimated cumulative distribution function (CDF) of maximum path 
delay (5 nm “gkt” OpenSPARC module: 2000 MC trials). 
 
 
 
 
The optimized CNFET device parameters are then used as 
inputs to the VSCNFET model for all SPICE simulations to 
analyze ION variations (Fig. 2) and to build variation-aware 
logic gate timing, energy, and SNM models to evaluate 
circuit-level performance metrics (e.g., delay penalty, PNMV, 
and ΔE) for each node and for multiple values of VDD (VDD = 0.50 V to compare technology nodes; at the 5 nm node, VDD is also swept down to 0.35 V in 0.05 V increments to evaluate the impact of VDD scaling).
B. Physical Circuit Design for Circuit-Level Analysis 
For circuit-level analysis, the OpenSPARC modules are 
synthesized using Synopsys Design Compiler (targeting the 
nominal case), Capo [36] is used for placement, and FLUTE 
[8] is used to estimate wire lengths, with wire parasitics 
computed using parameters from the ITRS [18]. The full 
design and analysis flow is shown in Fig. 3 (Section II). 
C. Selective Upsizing 
We leverage the following selective transistor/logic gate 
upsizing algorithm (i.e., selective upsizing, inspired by [52]) 
to minimize circuit EDP (Section II-A) and to reduce circuit 
delay penalty (Section II-B). We first sort all standard cells by their fan-out in the nominal case (fan-out: the ratio of the output load capacitance to the minimum input capacitance on any input). Next, we upsize the standard cell with the largest fan-out by incrementing its drive strength (e.g., INV_X1 becomes INV_X2) and then re-sort all standard cells by fan-out; a parameterized number kSelUpsize ≥ 0 of upsizing steps is performed sequentially in this manner (note that each standard cell can potentially be upsized multiple times). If a standard cell cannot be upsized because it is already at its maximum drive strength (e.g., INV_X64 is the strongest inverter in our library), then the standard cell with the largest fan-out that can still be upsized is upsized instead.
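The selective upsizing loop can be sketched as follows. This is a minimal sketch under simplifying assumptions that are not from the paper: each cell has a single input capacitance that scales linearly with drive strength (so "minimum input capacitance on any input" reduces to that value), the drive ladder X1 to X64 is hypothetical, and the `Cell` record is an illustrative data structure.

```python
from dataclasses import dataclass, field

DRIVE_LADDER = [1, 2, 4, 8, 16, 32, 64]   # hypothetical X1 ... X64 drive strengths


@dataclass
class Cell:
    name: str
    drive: int = 1
    unit_cin: float = 1.0      # input capacitance of the X1 version (arbitrary units)
    wire_cap: float = 0.0      # wire capacitance on the output net
    fanouts: list = field(default_factory=list)   # names of the cells this cell drives


def input_cap(cell):
    # Simplification: input capacitance grows linearly with drive strength.
    return cell.unit_cin * cell.drive


def fan_out(cell, cells):
    load = cell.wire_cap + sum(input_cap(cells[n]) for n in cell.fanouts)
    return load / input_cap(cell)


def selective_upsize(cells, k_sel_upsize):
    """Perform k_sel_upsize upsizing steps, re-sorting by fan-out before each one."""
    for _ in range(k_sel_upsize):
        ranked = sorted(cells.values(), key=lambda c: fan_out(c, cells), reverse=True)
        # Largest-fan-out cell that is not yet at maximum drive strength.
        target = next((c for c in ranked if c.drive < DRIVE_LADDER[-1]), None)
        if target is None:      # every cell is already at maximum drive strength
            break
        target.drive = DRIVE_LADDER[DRIVE_LADDER.index(target.drive) + 1]
        # Fan-outs change after each step (the upsized cell's own fan-out drops and
        # its drivers' fan-outs rise), which is why the ranking is recomputed.
    return cells
```

Re-sorting after every step matters: upsizing a cell raises the load on its drivers, which can move them to the top of the ranking.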
VIII. VARIATION-AWARE TIMING/ENERGY MODEL 
A. Timing Model Generation 
The VSCNFET SPICE model and CNFET device parameter 
optimization results (Table VI) are used to build a variation-
aware logic gate timing model that computes CNFET logic 
gate delay and output slew as functions of the CNT count in 
each sampling region (see Fig. 8 in Section III-B for details on 
the CNT count variables). 
To build this model, we perform over 2000 SPICE 
simulations for each input pin of each logic stage in our 
standard cell library, varying the input slew rate (tInSlew), the 
load capacitance (CLoad), and the CNT counts (n1, n2, …) of 
each sampling region. Four values of CLoad and six values of 
tInSlew are analyzed for each logic stage to emulate typical 
operating conditions in a digital system (CLoad: fan-out = 2, 4, 
6, and 8; tInSlew: 1-64 ps). For each of the 24 combinations of 
(CLoad, tInSlew), we sample 100 random values of CNT count 
from the CNT count distribution (using processing parameters 
defined in Table I), for a total of 2400 simulations for each 
input pin of each logic stage. We then calibrate timing models 
f(tInSlew, CLoad, n1, n2, …) to the delay (d) and to the output slew 
(tOutSlew) values extracted from the SPICE simulations. 
B. Timing Model Linearization 
Solving (41) for d (using the nonlinear timing model in [53]) is not trivial, because d appears on both sides of (41); it may require a numerical method that involves significantly more computational effort than a model of the form d = CV/i [48]:

$$d = d_{\mathrm{Int}} + \frac{C_{\mathrm{Load}} V_{DD}}{i_1 \min\left(2d/t_{\mathrm{InSlew}},\ 1\right) + i_2}. \quad (41)$$
 
Additionally, tInSlew must be determined for each input pin of 
each logic stage and must propagate through the circuit (as it 
affects the delay of subsequent logic stages), which further 
increases the computation time. To obtain the linearized model 
in (42), we linearize (41) with the following procedure: 
$$d = d_{\mathrm{Int}} + \frac{C_{\mathrm{Load}} V_{DD}}{i_{\mathrm{Drive}}}. \quad (42)$$
 
1) Perform static timing analysis with the nonlinear timing model to calculate TNom (Section II). This also yields the nominal delay and input slew rate of each logic stage: dNom and tInSlewNom.
2) Define a new parameter, iDrive, which is an affine 
function of i1 and i2, and is therefore also an affine 
function of the region CNT counts: 
 
$$i_{\mathrm{Drive}} = i_1 \min\left(2 d_{\mathrm{Nom}}/t_{\mathrm{InSlewNom}},\ 1\right) + i_2. \quad (43)$$
 
3) Replace the denominator in (41) with the value of iDrive 
to create a first-order delay model of the form d = CV/i 
[as in (42)] that gives the same value of d as the 
nonlinear model in the nominal case. We choose to 
linearize the timing model around the nominal case so 
that it is independent of the processing parameter 
values. This enables delay factorization (Section III-A) 
to further improve computational efficiency. 
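The linearization steps can be illustrated numerically. This is a minimal sketch, assuming the forms of (41)-(43) above with i1 ≥ 0 and i2 > 0; `d_int` is an illustrative name for the intercept delay term of (41), and the numeric values in any usage are made up. Bisection is one possible numerical method for (41): the right-hand side is non-increasing in d, so d minus the right-hand side is strictly increasing and has a single root.

```python
def nonlinear_delay(d_int, c_load, vdd, i1, i2, t_in_slew, iters=200):
    """Solve (41), d = d_int + c_load*vdd / (i1*min(2d/t_in_slew, 1) + i2), by bisection.

    The root lies in [0, d_int + c_load*vdd/i2] because the right-hand side
    never exceeds its value at d = 0.
    """
    def g(d):
        return d - (d_int + c_load * vdd / (i1 * min(2.0 * d / t_in_slew, 1.0) + i2))

    lo, hi = 0.0, d_int + c_load * vdd / i2
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def drive_current(i1, i2, d_nom, t_in_slew_nom):
    """(43): i_Drive, with the min() term frozen at the nominal delay and slew."""
    return i1 * min(2.0 * d_nom / t_in_slew_nom, 1.0) + i2


def linear_delay(d_int, c_load, vdd, i_drive):
    """(42): first-order delay model of the form d = CV/i."""
    return d_int + c_load * vdd / i_drive
```

Evaluating `drive_current` at the nominal solution makes `linear_delay` reproduce the nonlinear delay exactly in the nominal case. Under CNT count variations, only i1 and i2 change while the min() factor stays fixed, which is what keeps i_Drive affine in the region CNT counts.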
C. Timing Model Validation 
We validate our timing models using circuit modules 
synthesized from the ISCAS-85 benchmarks [15]. For each 
circuit module, we compare the critical path delay (in the 
nominal case) computed using our timing model versus SPICE 
simulations (using the VSCNFET model), according to the 
following three-step procedure. 
1) For the EDP-optimal nominal design point [defined in Section II-A: (ENomOpt, TNomOpt) (1)], compute the nominal critical path delay using the timing model and record one critical path (chosen arbitrarily if there are several).
2) Create a SPICE netlist of the cascaded standard cells on 
the recorded critical path (using the standard cell library 
described in Section VII) and instantiate capacitors to 
account for the capacitances of branches off the 
recorded critical path. 
 
Fig. 18. Contact resistance (RC) versus contact length (LC) per CNT for the 
VSCNFET model (diameter-independent case) [23]. CNFET device parameter 
optimization results are shown for each node. RC versus LC uses a transmission 
line model that is calibrated using experimental values extracted from 
CNFETs with LC ranging from 20-300 nm [23]. 
 
 
3) Perform transient analysis in SPICE and extract the 
delay (as the time taken for the output to reach VDD/2 
from the time that the input reaches VDD/2); compare it 
to the critical path delay computed using the timing 
model in step 1. 
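The VDD/2-to-VDD/2 delay extraction in step 3 can be sketched on sampled transient waveforms. This is a minimal sketch, assuming the waveforms are available as equal-length lists of time and voltage samples; the function names are illustrative, and the output transition direction must be supplied by the caller because it depends on the number of inverting stages on the path.

```python
def crossing_time(t, v, level, rising):
    """First time at which waveform v crosses `level` in the given direction.

    The crossing instant is found by linear interpolation between samples.
    """
    for k in range(1, len(v)):
        v0, v1 = v[k - 1], v[k]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            return t[k - 1] + (level - v0) * (t[k] - t[k - 1]) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def path_delay_50(t, v_in, v_out, vdd, input_rising, output_rising):
    """Delay from the path input reaching VDD/2 to the path output reaching VDD/2."""
    half = vdd / 2.0
    return (crossing_time(t, v_out, half, output_rising)
            - crossing_time(t, v_in, half, input_rising))
```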
Results for three of the ISCAS-85 benchmarks are shown in 
Fig. 19; the timing model overestimates the SPICE-extracted 
delay by an average of 3.8%, which we conclude is sufficient 
for our exploration purposes. 
In Section IV-A, the linearized timing model is validated 
against the nonlinear timing model using the processing 
parameter values in Table VII (see Section IV-A for details). 
IX. VARIATION-AWARE SNM MODEL & PNMV 
A. Minimum-Width Upsizing to Improve PNMV 
As shown in Fig. 7(c) in Section II-C, additional minimum-width upsizing (beyond the minimum-width upsizing performed for count-limited yield in Fig. 3, Section II-A) can improve PNMV through statistical averaging, at the cost of energy. In some cases, however, target PNMV constraints (e.g., PNMV ≤ 0.001%) cannot be satisfied at any energy cost (e.g., even for an arbitrarily large minimum width, WMIN, for minimum-width upsizing): once the minimum CNFET width exceeds the maximum width of a single CNFET in the standard cell library (limited by the standard cell height, see Table VI) [25], multiple CNFETs connected in parallel (i.e., multiple fingers) are required to increase the effective CNFET width. For highly aligned CNTs with aligned-active layouts, adding fingers to increase the effective CNFET width does not achieve the full benefit of statistical averaging, because the CNT counts of the fingers are correlated. For example, a CNFET consisting of two fingers with perfectly correlated CNT counts exhibits the same magnitude of CNT count variation (measured by the standard deviation of the CNT count relative to its mean) as either finger alone, despite having twice the effective width.
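The effect of correlation on statistical averaging follows from the variance of a sum. This is a minimal sketch, assuming k identical fingers whose CNT counts share the same mean and standard deviation and have the same pairwise correlation coefficient ρ (a simplification, not a model from the paper): the total has mean kμ and variance kσ²(1 + (k − 1)ρ), so its coefficient of variation is (σ/μ)·sqrt((1 + (k − 1)ρ)/k).

```python
import math


def finger_cv(cv_single, k, rho):
    """Coefficient of variation of the total CNT count of k identical fingers.

    cv_single is sigma/mu for one finger; rho is the (assumed equal) pairwise
    correlation between the fingers' CNT counts.
    Var(sum) = k*sigma^2 + k*(k-1)*rho*sigma^2 and mean(sum) = k*mu.
    """
    return cv_single * math.sqrt((1.0 + (k - 1) * rho) / k)
```

With ρ = 1 the result equals `cv_single` for any k, matching the two-finger example above; with ρ = 0 it falls as 1/sqrt(k), which is the full averaging benefit that correlated fingers cannot reach.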
B. Variation-Aware SNM Model Construction 
As described in Section III-B1, for each logic stage input in our standard cell library, we model the VTC parameters for every case in which that input is sensitized (considering all possible combinations of the other inputs). The VTC parameters are functions of the CNT counts of the p- and n-type CNFETs that: 1) are gated by that input and 2) connect the logic stage output to either VDD or ground through a series of CNFETs in the “on” state. As an example, consider the
pull-down network of a 3-input CMOS logic stage (inputs: 
inA, inB, inC) which consists of 4 CNFETs: A1, A2, B, and C. 
A1 and A2 are both gated by inA, B is gated by inB, and C is 
gated by inC. A1 and B are connected in series between the 
logic stage output and ground, and so are A2 and C (forming 2 
parallel paths each with 2 CNFETs connected in series).  
Assume that we are obtaining the VTC of the logic stage for 
inA. Then when the state of (inB, inC) is (1, 0), A1 connects 
the logic stage output to ground through a series of CNFETs in 
the “on” state (since B is “on”), and A2 does not (since C is 
“off”). When the state of (inB, inC) is (1, 1) (i.e., another state 
that sensitizes inA), then both A1 and A2 connect the logic 
stage output to ground through a series of CNFETs in the “on” 
state (since both B and C are “on”), etc. 
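The case enumeration in this example can be sketched programmatically. This is a minimal sketch, assuming a static complementary CMOS stage, so that inA is sensitized exactly when pull-down conduction depends on inA; the `PULL_DOWN` list encoding the example network is a hypothetical data layout, and the pull-up CNFETs gated by inA would be handled analogously.

```python
from itertools import product

# Pull-down network of the example stage: two parallel paths, each with two
# series CNFETs. Each CNFET is (name, gating input); n-type is on when input = 1.
PULL_DOWN = [[("A1", "inA"), ("B", "inB")],
             [("A2", "inA"), ("C", "inC")]]


def conducts(paths, state):
    """True if at least one series path has every CNFET in the on state."""
    return any(all(state[gate] == 1 for _, gate in path) for path in paths)


def sensitized_cases(paths, target, other_inputs):
    """Map each sensitizing state of the other inputs to the target-gated CNFETs
    that connect the output to ground through on CNFETs."""
    cases = {}
    for bits in product((0, 1), repeat=len(other_inputs)):
        state = dict(zip(other_inputs, bits))
        on, off = {**state, target: 1}, {**state, target: 0}
        if conducts(paths, on) and not conducts(paths, off):   # output follows target
            cases[bits] = [name for path in paths
                           if all(on[g] == 1 for _, g in path)
                           for name, g in path if g == target]
    return cases
```

For inA with (inB, inC) as the other inputs, this yields A2 for (0, 1), A1 for (1, 0), and both A1 and A2 for (1, 1), matching the cases described above; (0, 0) does not sensitize inA.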
The VTC parameters are modeled separately for each input 
of a logic stage. For example, NAND2 gate U4 in Fig. 8 
(Section III-B) consists of 4 CNFETs: 2 are gated by in1 (P4,1 
and N4,1) and 2 are gated by in2 (P4,2 and N4,2). The VTC 
parameters for in1 are functions of nP(P4,1) and nN(N4,1) (i.e., the 
CNT counts of CNFETs P4,1 and N4,1), and the VTC 
parameters for in2 are functions of nP(P4,2) and nN(N4,2) (i.e., the 
CNT counts of CNFETs P4,2 and N4,2). Hence, two instances 
of T are required in the variation-aware SNM model (14): 
T(NAND2_X1-in1) and T(NAND2_X1-in2). 
C. SNM Model Calibration 
Section III-B1 describes how we build the variation-aware 
SNM model for each input of each logic stage in our standard 
cell library. The error of this model versus SPICE simulations 
is shown in Fig. 20; for each VTC parameter, the root-mean-
square error [RMSE, defined in (44), e.g., for VOH] quantifies 
the error in the SNM model (14) versus VTC parameters 
extracted from 2000 SPICE simulations [e.g., VOH,i(VTC) is the 
value of VOH extracted from the VTC for the ith SPICE 
simulation (of q = 2000 simulations), and VOH,i(model) is its 
modeled value (14)]. 
 
$$\mathrm{RMSE} = \sqrt{\frac{1}{q}\sum_{i=1}^{q}\left(V_{OH,i}^{(\mathrm{VTC})} - V_{OH,i}^{(\mathrm{model})}\right)^{2}} \quad (44)$$
D. Rapid Analysis of Circuit PNMV 
As described in Section III-B2, the SNM constraints on the CNFET CNT count variables are expressed using a matrix H ∈ ℝ^{c×t}, such that satisfying Hs ≼ 0 [(22) in Section III-B2] is equivalent to satisfying all SNM constraints in the circuit.
 
Fig. 19. Critical path delay comparison: timing model versus delay extracted 
from SPICE transient analysis, shown for three circuit modules synthesized 
from the ISCAS-85 benchmarks (5 nm node, after steps 1-3 in Fig. 3) [15]. 
TABLE VII 
PROCESSING PARAMETER VALUES USED TO VALIDATE THE LINEARIZED 
TIMING MODEL (SECTION IV-A). pRm = 99.99% FOR ALL TESTS 
Test 1 2 3 4 5 6 7 8 
IDC 0.50 0.50 0.50 0.50 0.35 0.10 0.25 0.25 
pm 10% 5% 1% 0.1% 5% 5% 5% 5% 
pRs 5% 5% 5% 5% 5% 5% 5% 2.5% 
 
 
Fig. 20. SNM model calibration results for a subset of minimum-sized cells 
(e.g., NAND2_X1 for a NAND2 cell, for an arbitrary input) in our standard 
cell library (5 nm node, VDD = 0.35 V). RMSE ≤ 2.5 mV in all cases. 
 
 
As an example, (45) shows the constraints Hs ≼ 0 for the circuit in Fig. 8 [for gate pairs (U1, U3), (U3, U5), and (U3, U6)].⁴ Each row in H represents a single SNM constraint; e.g.,
the first row is an SNMH constraint for the gate pair (U1, U3) 
on the CNFET CNT count variables nP(P3,1) and nN(N3,1): the 
first two elements in vector s. In (45), each element of H is 
subscripted by its row and column indices (e.g., H1,2 for the 1st 
row, 2nd column of H). These terms are determined using 
equations similar to (20)-(21) along with all instances of T in 
the variation-aware SNM model (14). For example, H1,2 and 
H2,1 are associated with the gate pair (U1, U3), hence they are 
computed using T(D-latch,Out) for the output stage of D-latch U1 
and T(INV_X1) for inverter U3. 
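$$
\begin{array}{r}
\mathrm{SNMH}(U1,U3):\\
\mathrm{SNML}(U1,U3):\\
\mathrm{SNMH}(U3,U5):\\
\mathrm{SNML}(U3,U5):\\
\mathrm{SNMH}(U3,U6):\\
\mathrm{SNML}(U3,U6):\\
\vdots
\end{array}
\begin{bmatrix}
1 & H_{1,2} & 0 & 0 & 0 & 0 & \cdots\\
H_{2,1} & 1 & 0 & 0 & 0 & 0 & \cdots\\
0 & 0 & 1 & H_{3,4} & 0 & 0 & \cdots\\
0 & 0 & H_{4,3} & 1 & 0 & 0 & \cdots\\
0 & 0 & 0 & 0 & 1 & H_{5,6} & \cdots\\
0 & 0 & 0 & 0 & H_{6,5} & 1 & \cdots\\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}
\begin{bmatrix}
n_P(P_{3,1})\\ n_N(N_{3,1})\\ n_P(P_{5,1})\\ n_N(N_{5,1})\\ n_P(P_{6,1})\\ n_N(N_{6,1})\\ \vdots
\end{bmatrix}
\preccurlyeq
\begin{bmatrix} 0\\ 0\\ 0\\ 0\\ 0\\ 0\\ \vdots \end{bmatrix}
\quad (45)
$$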
 
Also described in Section III-B2, the relationship between the CNFET CNT count variables [s ∈ ℝ^t in (23), for t total CNFETs] and the sampling region CNT count variables [n ∈ ℝ^r in (23), for r total sampling regions] is expressed using a linear transformation s = Bn (23), where B ∈ {0, 1}^{t×r} has Bi,j = 1 if CNFET i overlaps sampling region j, and Bi,j = 0 otherwise. For example, six rows of B for the circuit in Fig. 8 are shown in (46). The first row represents the transformation from the sampling region CNT counts to the CNFET CNT count nP(P3,1); since CNFET P3,1 overlaps sampling regions 1, 2, and 3, B1,1 = B1,2 = B1,3 = 1 (all other values of B in this row are 0), i.e., nP(P3,1) = n1 + n2 + n3.
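$$
\begin{bmatrix}
n_P(P_{3,1})\\ n_N(N_{3,1})\\ n_P(P_{5,1})\\ n_N(N_{5,1})\\ n_P(P_{6,1})\\ n_N(N_{6,1})\\ \vdots
\end{bmatrix}
=
\begin{bmatrix}
1&1&1&0&0&0&0&0&0&0&0&0&\cdots\\
0&0&0&1&1&1&0&0&0&0&0&0&\cdots\\
1&1&1&0&0&0&0&0&0&0&0&0&\cdots\\
0&0&0&1&1&1&0&0&0&0&0&0&\cdots\\
0&0&0&0&0&0&1&1&1&0&0&0&\cdots\\
0&0&0&0&0&0&0&0&0&1&1&1&\cdots\\
\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\ddots
\end{bmatrix}
\begin{bmatrix}
n_1\\ n_2\\ n_3\\ \vdots\\ n_{10}\\ n_{11}\\ n_{12}\\ \vdots
\end{bmatrix}
\quad (46)
$$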
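Building B from the overlap information and applying s = Bn (23) and K = HB (24) is a few lines of NumPy. This is a minimal sketch, assuming the overlaps are already known as lists of (0-based) sampling region indices per CNFET CNT count variable; the helper name `overlap_matrix` is illustrative.

```python
import numpy as np


def overlap_matrix(overlaps, num_regions):
    """Build B with B[i, j] = 1 if CNFET CNT count variable i overlaps sampling region j.

    overlaps[i] lists the (0-based) sampling regions overlapped by variable i.
    """
    B = np.zeros((len(overlaps), num_regions), dtype=int)
    for i, regions in enumerate(overlaps):
        B[i, list(regions)] = 1
    return B
```

With the six rows of (46), `overlap_matrix([[0, 1, 2], [3, 4, 5], [0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]], 12)` reproduces the matrix shown; `B @ n` then gives the CNFET CNT counts, and `H @ B` gives K. Any numeric H used with it is purely illustrative, since the entries of H come from the calibrated SNM model.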
 
By substituting (46) into (45) and computing K = HB (24), 
all SNM constraints [on the sampling region CNT count 
variables, i.e., Kn ≼ 0 (25)] for the example circuit in Fig. 8 
are expressed in (47). Although (47) can be used to determine 
PNMV (e.g., using an MC-based approach, as described in 
Section III-B2), it contains many noncritical SNM constraints, 
which can be eliminated to more efficiently compute PNMV. 
 
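$$
\begin{bmatrix}
1&1&1&H_{1,2}&H_{1,2}&H_{1,2}&0&0&0&0&0&0&\cdots\\
H_{2,1}&H_{2,1}&H_{2,1}&1&1&1&0&0&0&0&0&0&\cdots\\
1&1&1&H_{3,4}&H_{3,4}&H_{3,4}&0&0&0&0&0&0&\cdots\\
H_{4,3}&H_{4,3}&H_{4,3}&1&1&1&0&0&0&0&0&0&\cdots\\
0&0&0&0&0&0&1&1&1&H_{5,6}&H_{5,6}&H_{5,6}&\cdots\\
0&0&0&0&0&0&H_{6,5}&H_{6,5}&H_{6,5}&1&1&1&\cdots\\
\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\ddots
\end{bmatrix}
\begin{bmatrix}
n_1\\ n_2\\ n_3\\ \vdots\\ n_{10}\\ n_{11}\\ n_{12}\\ \vdots
\end{bmatrix}
\preccurlyeq
\begin{bmatrix} 0\\ 0\\ 0\\ 0\\ 0\\ 0\\ \vdots \end{bmatrix}
\quad (47)
$$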
                                                            
4 Note that a single logic gate can be a driving logic gate in multiple gate pairs. For example, U3 drives both U5 and U6, so it is a driving logic gate in gate pairs (U3, U5) and (U3, U6); there are SNMH and SNML constraints for each of these gate pairs.
E. Eliminating Noncritical SNM Constraints for PNMV 
As described in Section III-B2, identifying and eliminating noncritical SNM constraints is crucial to efficiently determining PNMV; as Table VIII shows, noncritical SNM constraints account for roughly 97% to over 99% of the total number of SNM constraints for the OpenSPARC modules. Here, we describe
how to systematically identify and eliminate noncritical SNM 
constraints. As an example, consider gate pairs (U1, U3) and 
(U3, U5) in Fig. 8 with the following SNMH constraints (15). 
$$\mathrm{SNMH}(U1, U3):\ V_{OH}^{(U1)} - V_{IH}^{(U3)} \ge \mathrm{SNM}_R \quad (48)$$
$$\mathrm{SNMH}(U3, U5):\ V_{OH}^{(U3)} - V_{IH}^{(U5)} \ge \mathrm{SNM}_R. \quad (49)$$
 
We now describe a case where SNMH(U3, U5) is always 
larger than SNMH(U1, U3), meaning that (49) is a noncritical 
SNM constraint [i.e., it can only be violated if (48) is also 
violated].⁵ First, consider the VIH terms in (48)-(49). We
replace VIH(U3) and VIH(U5) with the variation-aware SNM 
model (14), using T(INV_X1) for both VIH terms (since both U3 
and U5 are inverters): 
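$$V_{IH}^{(U3)} = T_{\mathrm{VIH0}}^{(\mathrm{INV\_X1})} + T_{\mathrm{VIH1}}^{(\mathrm{INV\_X1})} \log_{10}\left(n_P(P_{3,1})/n_N(N_{3,1})\right) \quad (50)$$
$$V_{IH}^{(U5)} = T_{\mathrm{VIH0}}^{(\mathrm{INV\_X1})} + T_{\mathrm{VIH1}}^{(\mathrm{INV\_X1})} \log_{10}\left(n_P(P_{5,1})/n_N(N_{5,1})\right). \quad (51)$$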
 
These VIH terms are equivalent, since nP(P3,1) = nP(P5,1) and 
nN(N3,1) = nN(N5,1) (as the CNFETs in inverters U3 and U5 
overlap the exact same sampling regions: Fig. 8). Thus, 
VIH(U3) = VIH(U5), so the difference between (48) and (49) comes 
from the VOH terms. Using the variation-aware SNM model 
(14) [with T(D-latch,Out) for the output stage of the D-latch], 
VOH(U1) = TVOH0(D-latch,Out) and VOH(U3) = TVOH0(INV_X1). Therefore, 
if TVOH0(INV_X1) > TVOH0(D-latch,Out) (i.e., VOH for the inverter is 
higher than VOH for the output stage of the D-latch), then 
SNMH(U3, U5) > SNMH(U1, U3), signifying that (49) is a 
noncritical SNM constraint. 
In the general case, noncritical SNM constraints can be 
identified and eliminated via the following procedure. 
1) SNM Constraint Partitioning: each row of K (25) 
(Section III-B2) corresponds to an SNM constraint on 
the sampling region CNT count variables. Partition the 
rows of K into groups such that each row in a group 
constrains the exact same sampling region CNT count 
                                                            
5 If SNMH(U3, U5) < SNMR [i.e., (49) is violated], then SNMH(U1, U3) < 
SNMH(U3, U5) implies SNMH(U1, U3) < SNMR [i.e., (48) is also violated]. 
TABLE VIII 
NUMBER OF SNM CONSTRAINTS AND CRITICAL SNM CONSTRAINTS 
FOR THE OPENSPARC MODULES (USING THE SYNTHESIZED NETLIST 
FOR EACH MODULE FOR THE 5 NM NODE WITH VDD = 0.50 V) 
OpenSPARC module | Rows of standard cells | Width (µm) | Height (µm) | SNM constraints | Critical SNM constraints | Critical SNM constraints (% of total)
dec | 58 | 22.9 | 22.8 | 11.8K | 356 | 3.3%
pmu | 94 | 37.4 | 37.2 | 29.8K | 636 | 2.1%
pku | 98 | 38.9 | 38.8 | 32.2K | 754 | 2.3%
gkt | 100 | 40.2 | 39.6 | 26.6K | 536 | 2.0%
exu | 122 | 48.4 | 48.4 | 65.0K | 1226 | 1.9%
lsu | 207 | 82.8 | 82.4 | 141.0K | 1882 | 1.3%
tlu | 243 | 96.6 | 96.8 | 218.2K | 2684 | 1.2%
fgu | 275 | 110 | 110 | 364.6K | 3058 | 0.8%
 
 
 
 
variables; i.e., given rows i and j in K, rows i and j 
should be partitioned into the same group if and only if: 
for all columns 1 ≤ k ≤ r (K has r columns since there 
are r sampling regions), either Ki,k ≠ 0 and Kj,k ≠ 0, or 
Ki,k = 0 and Kj,k = 0. For example, in (47), the first six 
rows of K are partitioned into two groups (labels are the 
row indices of K): {1, 2, 3, 4} and {5, 6}. 
2) Noncritical SNM Constraint Identification: compare 
each row i in K against each other row j in K that is 
partitioned into the same group, using the following 
comparison criterion: if Ki,k ≤ Kj,k for all columns 
1 ≤ k ≤ r, then the constraint in row i is noncritical, 
since it cannot be violated without simultaneously 
violating the constraint in row j. This property holds for 
s ≻ 0 (i.e., the vector of CNFET CNT counts is strictly 
greater than 0, meaning that there is no CNT count 
failure; CNT count failure and count-limited yield are 
accounted for separately using minimum-width 
upsizing for count-limited yield: Section II-A). 
3) Noncritical SNM Constraint Elimination: remove all 
rows of K that correspond to the noncritical SNM 
constraints identified in step 2. 
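Steps 1-3 can be sketched directly on K. This is a minimal sketch, not the authors' implementation; one detail the procedure above leaves open is how to treat two identical rows (each satisfies the criterion against the other), and the sketch assumes that exactly one of them should survive by comparing each row only against rows not yet removed.

```python
from collections import defaultdict

import numpy as np


def eliminate_noncritical(K):
    """Drop SNM constraints (rows of K) dominated by another row in the same group.

    Row i is noncritical if some other row j with the same sparsity pattern has
    K[i, k] <= K[j, k] for every column k: for n > 0, K[i] @ n > 0 then forces
    K[j] @ n > 0. Returns (K_tilde, indices_of_kept_rows).
    """
    K = np.asarray(K, dtype=float)
    # Step 1: partition rows by which sampling-region columns they touch.
    groups = defaultdict(list)
    for i, row in enumerate(K):
        groups[tuple(row != 0)].append(i)
    keep = []
    for rows in groups.values():
        removed = set()
        for i in rows:
            for j in rows:
                # Step 2: compare only against rows still present (tie-break assumption).
                if i != j and j not in removed and np.all(K[i] <= K[j]):
                    removed.add(i)
                    break
        keep.extend(i for i in rows if i not in removed)
    keep.sort()
    # Step 3: remove the noncritical rows.
    return K[keep], keep
```

Restricting comparisons to rows within a group (step 1) is what keeps this tractable: the pairwise comparison cost grows with the square of the group size, not of the total number of SNM constraints.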
As shown in Table VIII, the number of critical SNM 
constraints can be ≤1% of the total number of SNM 
constraints for the OpenSPARC modules (e.g., for “fgu”). 
To validate that (26) can be used instead of (25) to compute PNMV (i.e., that PNMV is unchanged by eliminating all noncritical SNM constraints), we first sample n two million times (the number of samples required to estimate PNMV ≤ 0.001% for the OpenSPARC modules; details in Fig. 21). Then, we evaluate both Kn and K̃n for each sample and verify that (25) is violated if and only if (26) is violated.
F. Efficiently Solving the MVNCDF Formulation for PNMV  
We make one final adjustment to (31) (in Section III-B2) to further improve the computational efficiency of solving for PNMV. In particular, the rows and columns of C [in (29) in Section III-B2] can be permuted (by reordering the rows of K̃) so that C is a block diagonal matrix (i.e., a matrix whose main-diagonal blocks are square matrices and whose off-diagonal blocks are zero matrices). For example, starting with the matrix K in (47) (in Section IX-D), consider the case in which rows 1 and 2 are redundant constraints (e.g., H1,2 < H3,4 and H2,1 < H4,3) and remove them from K to form K̃. Then the covariance matrix C = K̃K̃ᵀ is a block diagonal matrix with diagonal blocks Cu (52).
 
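$$
C = \tilde{K}\tilde{K}^{T} =
\begin{bmatrix}
\sigma_1^2 & \sigma_{1,2} & 0 & 0 & \cdots\\
\sigma_{2,1} & \sigma_2^2 & 0 & 0 & \cdots\\
0 & 0 & \sigma_3^2 & \sigma_{3,4} & \cdots\\
0 & 0 & \sigma_{4,3} & \sigma_4^2 & \cdots\\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}
=
\begin{bmatrix}
C_1 & 0 & \cdots\\
0 & C_2 & \cdots\\
\vdots & \vdots & \ddots
\end{bmatrix}.
\quad (52)
$$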
 
We now justify why C can be block diagonal. Let there be y 
rows of standard cells (e.g., row 1 and row 2 of standard cells 
are shown in Fig. 8 in Section III-B); each SNM constraint is 
associated with a single gate pair: (G(dr), G(ld)), where G(dr) is 
placed in row u of standard cells and G(ld) is placed in row v. 
Then the SNM constraints [represented by the rows of K̃ in (26) in Section III-B2] can be partitioned according to the row of standard cells in which G(ld) is placed, such that the SNM constraints in each partition are independent of the SNM constraints in every other partition. This property holds because:
1) VOH(dr) and VOL(dr) (for G(dr)) are independent of the CNT count in the variation-aware SNM model [i.e., TVOH1 = 0 and TVOL1 = 0 in (14): Section III-B1]. Thus, the SNM constraints (26) bound only the CNT counts of the CNFETs in the loading logic gate (the driving logic gate affects only the tightness of these constraints).
2) The sampling region CNT count variables are partitioned according to the boundaries between the rows of standard cells (e.g., n1, …, n6 belong to row 1 of standard cells in Fig. 8 and n7, …, n12 belong to row 2).
The consequence of these properties is as follows. Let (Gi(dr), Gi(ld)) and (Gj(dr), Gj(ld)) be the gate pairs constrained by the ith and jth SNM constraints, respectively [i.e., the ith and jth rows of K̃ in (26) in Section III-B2]. If Gi(ld) and Gj(ld) are placed in different rows of standard cells, then rows i and j of K̃ have no nonzero columns in common, so the covariance term Ci,j is 0. Thus, C = K̃K̃ᵀ can be made block diagonal, where the size of each block Cu equals the number of SNM constraints whose loading logic gate is placed in row u of standard cells. For example, in (52), C1 ∈ ℝ^{2×2} since there are two SNM constraints that constrain the sampling region CNT counts in row 1 of standard cells.
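The block structure can also be recovered directly from C, without tracking which row of standard cells each loading gate sits in. This is a minimal sketch, assuming K̃ is available as a dense NumPy array: it groups constraints into connected components of the nonzero pattern of C = K̃K̃ᵀ. Because the vector K̃x is jointly Gaussian, zero covariance between components implies independence, so these components can only be as fine as, or finer than, the per-row blocks described above.

```python
import numpy as np


def covariance_blocks(K_tilde):
    """Return C = K_tilde @ K_tilde.T and the index sets of its diagonal blocks.

    Blocks are the connected components of the graph whose edges are the
    nonzero off-diagonal entries of C.
    """
    C = K_tilde @ K_tilde.T
    unassigned = set(range(C.shape[0]))
    blocks = []
    while unassigned:
        seed = min(unassigned)
        unassigned.remove(seed)
        stack, block = [seed], []
        while stack:                      # depth-first search over nonzero covariances
            i = stack.pop()
            block.append(i)
            linked = [j for j in unassigned if C[i, j] != 0.0]
            for j in linked:
                unassigned.remove(j)
                stack.append(j)
        blocks.append(sorted(block))
    return C, blocks
```

For the example in (52), this returns the two 2 × 2 blocks corresponding to rows 1 and 2 of standard cells.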
 
Fig. 21. Validation of MVNCDF-based PNMV calculation (31), shown for the 
5 nm “dec” OpenSPARC module. The coefficient of variation is the standard 
deviation of the sampling region CNT count relative to the mean: it depends 
on the processing parameter values, and we sweep it by changing IDC with 
pm = 1%, pRs = 4%, pRm = 99.99%. 2×10⁶ MC trials are required to reject the
hypothesis that PNMV > 0.001% with 90% statistical power. The root-mean-
squared percentage error is ≤10% (for PNMV values computed with statistical 
power ≥90%). 
TABLE IX 
VARIABLES IN THE FULL-CIRCUIT PNMV FORMULATION 
Variable | ∈ | Description
t | ℤ⁺ | Total number of CNFET CNT count variables
c | ℤ⁺ | Total number of SNM constraints
s | ℝ^t | Vector of CNFET CNT counts
H | ℝ^{c×t} | Constraints on the CNFET CNT counts: Hs ≼ 0
r | ℤ⁺ | Total number of sampling regions
µR | ℝ | Mean of the sampling region CNT count distribution
σR | ℝ | Standard deviation of the sampling region CNT count distribution
n | ℝ^r | Vector of sampling region CNT counts: normally distributed with mean µR and standard deviation σR
x | ℝ^r | Vector of unit Gaussian random variables
B | ℤ^{t×r} | Transformation from the sampling region CNT counts to the CNFET CNT counts: s = Bn
K | ℝ^{c×r} | Constraints on the sampling region CNT counts: K = HB, Kn ≼ 0
p | ℤ⁺ | Total number of critical SNM constraints
K̃ | ℝ^{p×r} | Constraints on the sampling region CNT counts, after removing noncritical SNM constraints: K̃n ≼ 0
C | ℝ^{p×p} | Covariance matrix for MVNCDF formulation: C = K̃K̃ᵀ
b | ℝ^p | Upper bound for MVNCDF formulation: b = −K̃1
y | ℤ⁺ | Number of rows of standard cells
 
 
 
Since the covariance terms are zero between each block Cu 
in C, then the MVNCDF function in (31) can be separated into 
a product of smaller MVNCDF functions that are independent: 
one for each block Cu for each of y rows of standard cells, 
with corresponding vectors bCu (such that Cu and bCu are taken 
from the same rows of C and b, respectively) 
$$P_{\mathrm{NMV}} = 1 - \prod_{u=1}^{y} \mathrm{MVNCDF}\left(C_u,\ b_{C_u}\right). \quad (53)$$
 
In our analysis, each Cu is on the order of 20 × 20 (Cu ∈ ℝ^{20×20}), corresponding to ~10 gate pairs per row of standard cells (each gate pair contributes an SNMH and an SNML constraint) after eliminating noncritical SNM constraints (i.e., following steps 1-3 in Section IX-E). As mentioned in Section III-B2, PNMV in (53) can be computed efficiently, e.g., in less than 10 seconds for the OpenSPARC modules (using a single 2.93 GHz processor core). Table IX summarizes the variables used to compute PNMV. The accuracy of the MVNCDF formulation [both (31) and (53)] is validated in Fig. 21.
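Equation (53) turns one large MVNCDF evaluation into a product of small ones. This is a minimal sketch, assuming the block index sets are already known; `block_cdf` is a pluggable, hypothetical callable returning Pr(Y ≼ b_u) for Y ~ N(0, C_u). For testing, an exact evaluator for diagonal blocks (independent components) is included; it is only valid for diagonal blocks.

```python
import math


def pnmv_from_blocks(C, b, blocks, block_cdf):
    """(53): P_NMV = 1 - prod_u MVNCDF(C_u, b_{C_u}).

    C is a nested-list (or array) covariance matrix, b the upper-bound vector,
    and blocks the index sets of the diagonal blocks C_u.
    """
    p_ok = 1.0
    for rows in blocks:
        C_u = [[C[i][j] for j in rows] for i in rows]
        b_u = [b[i] for i in rows]
        p_ok *= block_cdf(C_u, b_u)      # probability that no constraint in block u is violated
    return 1.0 - p_ok


def diagonal_block_cdf(C_u, b_u):
    """Exact MVNCDF for a diagonal covariance block: a product of 1-D normal CDFs."""
    return math.prod(0.5 * (1.0 + math.erf(bk / math.sqrt(2.0 * C_u[k][k])))
                     for k, bk in enumerate(b_u))
```

For general (non-diagonal) blocks, one option is SciPy's `scipy.stats.multivariate_normal.cdf`, e.g. `lambda C_u, b_u: multivariate_normal.cdf(b_u, mean=np.zeros(len(b_u)), cov=C_u)`. For very small PNMV, computing 1 − ∏ directly can lose precision; accumulating per-block log-probabilities is a more robust alternative.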
X. GRADIENT DESCENT IMPLEMENTATION 
A. Gradient Calculation 
A critical path is any path between a circuit input and a 
circuit output with propagation delay equal to the maximum 
path delay. There can be multiple critical paths for a single 
MC trial (full circuit delay model in Section III-A). 
Immediately after STA of each MC trial (Fig. 10), we 
numerically estimate ∇T95 (34) via the following procedure: 
1) Record an arbitrary critical path for each MC trial. 
These paths are used to estimate ∇T95 using a sub-
gradient, borrowing from the sub-gradient method for 
minimization of non-differentiable functions [38] (e.g., 
“max” in STA). 
2) Decrease pm by an incremental amount (i.e., by δpm = 10⁻⁶; 10⁻⁶ is <0.1% of all the experimentally demonstrated processing parameter values in Table I), and then recompute the path delay only for the critical path recorded for each MC trial. Build the CDF of these path delays, and extract the delay at which the CDF equals 95%: this extracted delay value differs from T95 by an amount δT95(pm). Do the same for IDC and for pRs (with δIDC = 10⁻⁶ to compute δT95(IDC) and δpRs = 10⁻⁶ to compute δT95(pRs)).
3) Numerically estimate each element of ∇T95 using (54). This strategy assumes that, for each MC trial, the recorded critical path remains a critical path after each processing parameter is updated; in general, this holds only in the limit as δIDC → 0, δpm → 0, and δpRs → 0, and it is an approximation for δIDC = δpm = δpRs = 10⁻⁶.
 
 
 
 
 
 
 
 
 
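$$\frac{\partial T_{95}}{\partial \mathrm{IDC}} = \frac{\delta T_{95}(\mathrm{IDC})}{\delta \mathrm{IDC}},\quad \frac{\partial T_{95}}{\partial p_m} = \frac{\delta T_{95}(p_m)}{\delta p_m},\quad \frac{\partial T_{95}}{\partial p_{Rs}} = \frac{\delta T_{95}(p_{Rs})}{\delta p_{Rs}}. \quad (54)$$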
 
We use a similar methodology to estimate ∇PNMV (34): for 
each processing parameter, decrease that processing parameter 
by an incremental amount and recompute PNMV (to calculate 
δPNMV(IDC), δPNMV(pm), and δPNMV(pRs)). The elements of ∇PNMV are estimated using equations similar to (54). We use
the same numerical approach to compute ∇ETot (34) [using ETot 
as defined in (13)]. 
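The finite-difference estimate of ∇T95 in steps 1-3 can be sketched as follows. This is a minimal sketch under stated assumptions: `path_delay(t, path, params)` is a hypothetical function that re-times only the critical path recorded for MC trial t, the 95% point of the empirical CDF is taken with NumPy's default linear-interpolation percentile (one of several reasonable estimators), and the difference is a backward difference because each parameter is decreased by δ.

```python
import numpy as np


def t95_subgradient(params, critical_paths, path_delay, delta=1e-6):
    """Estimate dT95/dp for each processing parameter via (54).

    params: dict such as {"IDC": ..., "pm": ..., "pRs": ...}
    critical_paths[t]: critical path recorded for MC trial t during STA.
    """
    trials = range(len(critical_paths))
    # At the unperturbed parameters the recorded path is critical, so its delay equals
    # the trial's maximum path delay and this percentile is T95.
    t95 = np.percentile([path_delay(t, critical_paths[t], params) for t in trials], 95)
    grad = {}
    for name in params:
        shifted = dict(params, **{name: params[name] - delta})   # decrease one parameter
        t95_shifted = np.percentile(
            [path_delay(t, critical_paths[t], shifted) for t in trials], 95)
        grad[name] = (t95 - t95_shifted) / delta                 # backward difference
    return t95, grad
```

Only one path per trial is re-timed for each perturbation, which is what makes the estimate cheap relative to rerunning full STA; the same loop with PNMV or ETot in place of the delay percentile gives the other gradients.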
B. Gradient Descent Step 
To update the processing parameters during gradient 
descent, we first normalize the gradient vector [e.g., ∇EDP95 
(35) or ∇ENP (36)] by its ℓ1-norm (i.e., the sum of the
absolute values of its elements). This normalizes the 
magnitude of the step size. We then take a step so that the 
improvement in each processing parameter is proportional to 
its corresponding magnitude in the normalized gradient, and 
so that the total improvement in processing parameters sums 
to 10%. Small step sizes require more simulations; large step 
sizes yield coarse granularity in exploration. This strategy 
(though others may be used) assumes that it is equally difficult 
to improve each processing parameter by a fixed percentage. 
For example, if the elements of the normalized gradient have 
magnitudes 0.70, 0.10, and 0.20, then we reduce IDC, pm, and 
pRs by 7%, 1%, and 2% (versus their current values), 
respectively. 
C. Avoiding Convergence to Local Optima 
Given that our optimization methodology is based on 
gradient descent, we employ two strategies to avoid 
convergence to local optima (since the objective is not 
necessarily a convex function): 
1) Initialize Gradient Descent from Multiple Design Points on 
the Initial Energy-Delay Tradeoff Curve: each instance of 
gradient descent typically leads to a unique design point. 
Even if all instances of gradient descent converge to the 
same local optimum, we never choose a worse design point 
by starting another instance of gradient descent. 
2) Never Increment a Processing Parameter Away from its 
Ideal Value (Table I): The only case in which all CNT 
count variations are zero, by definition, is the nominal case 
(in which all processing parameters have their ideal 
values). This is consequently the global optimum in terms 
of minimizing the effect of variations. Any case in which 
the gradient vector points toward incrementing the value of 
a processing parameter away from its ideal value is 
indicative of a local optimum; in such cases, we do not update that parameter.
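The descent step of Section X-B, combined with strategy 2 above, can be sketched as follows. This is a minimal sketch under stated assumptions: the ideal value of IDC, pm, and pRs is taken to be 0, so moving toward the ideal means reducing the parameter (consistent with the 7%, 1%, 2% example); the gradient is of an objective to be minimized; and a parameter whose descent direction points away from its ideal value is left unchanged without renormalizing the remaining step (the text does not specify renormalization).

```python
def descent_step(values, gradient, total_step=0.10):
    """One normalized gradient-descent step on the processing parameters.

    Each parameter is reduced by total_step * |g| / ||g||_1 of its current value.
    """
    l1 = sum(abs(g) for g in gradient.values())   # l1-norm used for normalization
    if l1 == 0.0:
        return dict(values)
    new = dict(values)
    for name, g in gradient.items():
        if g <= 0.0:
            # Descent would move this parameter away from its ideal value:
            # treated as a sign of a local optimum, so it is not updated.
            continue
        new[name] = values[name] * (1.0 - total_step * g / l1)   # relative reduction
    return new
```

Strategy 1 (multiple starting points) then amounts to running repeated `descent_step` updates from several initial design points on the energy-delay tradeoff curve and keeping the best final design point.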
