Improvements towards Optimal Design of Reliable Subthreshold Digital CMOS with applications in Logic and Memory by Berge, Hans Kristian Otnes
Improvements towards Optimal Design of Reliable
Subthreshold Digital CMOS with applications in
Logic and Memory.
Hans Kristian Otnes Berge
July 15, 2015
© Hans Kristian Otnes Berge, 2015 
Series of dissertations submitted to the  
Faculty of Mathematics and Natural Sciences, University of Oslo 
No. 1658 
ISSN 1501-7710 
All rights reserved. No part of this publication may be  
reproduced or transmitted, in any form or by any means, without permission.  
Cover: Hanne Baadsgaard Utigard. 
Printed in Norway: AIT Oslo AS.   
Produced in co-operation with Akademika Publishing.  
The thesis is produced by Akademika Publishing merely in connection with the  
thesis defence. Kindly direct all inquiries regarding the thesis to the copyright  
holder or the unit which grants the doctorate.   
Contents
1 Introduction
1.1 Why low voltage CMOS?
1.2 Challenges for ultra low voltage CMOS
1.3 A roadmap for this thesis
2 Background
2.1 Ultra Low Power Design
2.2 CMOS Subthreshold operation
2.2.1 Short and Narrow Channel Effects
2.2.2 Random Dopant Fluctuations
2.3 Multi-objective Optimization
2.4 Ultra Low Voltage SRAM
3 Summary of paper contributions
3.1 Paper I: Benefits of Decomposing Wide CMOS Transistors into Minimum Size Gates
3.1.1 Introduction
3.1.2 Summary of results
3.1.3 Errata
3.1.4 Postscript: Measurements
3.2 Paper II: Multi-Objective Optimization of Minority-3 Functions for Ultra Low Voltage Supplies
3.2.1 Introduction
3.2.2 Summary of results
3.2.3 Postscript
3.2.4 Errata/Note
3.3 Paper III: Muller C-elements based on Minority-3 Functions for Ultra Low Voltage Supplies
3.3.1 Introduction
3.3.2 Summary of results
3.4 Paper IV: Design of 9T SRAM for Dynamic Voltage Supplies by a Multiobjective Optimization Approach
3.4.1 Introduction
3.4.2 Summary of Results
3.4.3 Supplement: Deriving equation (2) of paper IV
3.5 Paper V: A 65 nm 32 b Subthreshold Processor with 9T Multi-Vt SRAM and Adaptive Supply Voltage Control
3.5.1 Introduction
3.5.2 Summary of results
3.6 Paper VI: Yield-Oriented Energy and Performance Model for Subthreshold Circuits with Vth Variations
3.6.1 Introduction
3.6.2 Summary of results
3.6.3 Supplementary material
4 Discussion
4.1 Exploiting INWE
4.2 Multi-objective optimization of ULV circuits
4.3 ULV SRAM
4.4 Subthreshold energy modeling including RDF
5 Conclusion
5.1 Conclusion
5.2 Recommendations for Further Work
Publications included in the thesis
Paper I: Benefits of decomposing wide CMOS transistors into minimum size gates, NORCHIP, 2009
Paper II: Multi-Objective Optimization of Minority-3 Functions for Ultra Low Voltage Supplies, ISCAS, 2011
Paper III: Muller-C elements based on Minority-3 Functions for Ultra Low Voltage Supplies, DDECS, 2012
Paper IV: Design of 9T SRAM for Dynamic Voltage Supplies by a Multiobjective Optimization Approach, ICECS, 2010
Paper V: A 65 nm 32 b Subthreshold Processor with 9T Multi-VT SRAM and Adaptive Supply Voltage Control, JSSC, Jan. 2013
Paper VI: Yield-Oriented Energy and Performance Model for Subthreshold Circuits with Vth Variations, DDECS, 2013
Bibliography
Nomenclature
Notation
6T Shorthand for 6-transistor (circuit)
E[X] Estimate (arithmetic mean) of random variable X
SD[X] Standard deviation of random variable X
SD²[X] Variance of random variable X
Variables and Constants
α Activity factor
AVT Mismatch factor for determination of SD[Vth]. Typically expressed in units of mV·μm
β The ratio of width to length for the transistor (W/L)
CD Depletion (channel-bulk) capacitance
CL Load capacitance.
Cox Gate oxide capacitance
Emin The minimum achievable energy per operation
Edyn Dynamic energy
Eleak Leakage (static) energy
Eop Energy per operation
Esc Excess short-circuit energy
εox Permittivity of Oxide
εSi Permittivity of Silicon
Ids Drain to source current
Ioff Drain to source current when Vgs = 0, Vds = VDD
Ion Drain to source current when Vgs = Vds = VDD
k Boltzmann’s constant, 8.617 × 10⁻⁵ eV K⁻¹
L Transistor length
Leff Transistor effective length
M Propagation delay scaling factor
μ0 Electron mobility at low Vds bias.
n Subthreshold slope factor
Pα Switching probability
Pdyn Dynamic power consumption (switching power).
Pstat Static power consumption (leakage power).
q A quantile, q ∈ (0,1)
q Electron charge, 1.602 × 10⁻¹⁹ C (so that 1 eV = 1.602 × 10⁻¹⁹ J)
Q,QR Number of standard deviations
T Temperature, typically in Kelvin
tox Gate oxide thickness
tp Intrinsic gate propagation delay
UT The thermal voltage (kT/q)
VDD The supply voltage
Vds Drain-source voltage
Vgs Gate-source voltage
Vmin The minimum supply voltage before failure occurs
Vopt The optimum supply voltage achieving Emin
Vth The transistor threshold voltage
Vth,n Threshold voltage for NMOS device
Vth,p Threshold voltage for PMOS device
W Transistor width
wc Index signifying worst-case condition
Weff Transistor effective width
Acronyms
ABB Adaptive Body Bias.
ASIC Application Specific Integrated Circuit.
BL Bitline.
BSIM Berkeley Short-channel IGFET Model.
CMOS Complementary Metal-Oxide-Semiconductor.
CPI Clocks per Instruction.
DIBL Drain Induced Barrier Lowering.
DITS Drain Induced Threshold Shift.
DVS Dynamic Voltage Scaling.
DVFS Dynamic Voltage and Frequency Scaling.
EDA Electronic Design Automation.
EKV Enz, Krummenacher, Vittoz. A MOSFET model.
FIFO First in, First Out (memory buffer for queuing/flow control).
INWE Inverse Narrow Width Effect.
MEP Minimum Energy (operating) Point.
MOEA Multi-Objective Evolutionary Algorithm.
MOO Multi-Objective Optimization.
MOOP Multi-Objective Optimization Problem.
MOS Metal-Oxide-Semiconductor (Transistor).
MOSFET Metal-Oxide-Semiconductor Field Effect Transistor.
MST Minimum-Split Transistor. Array of parallel transistors using minimum width.
NBTI Negative Bias Temperature Instability.
NMOS N-type Metal-Oxide-Semiconductor.
NWE Narrow Width Effect.
PMOS P-type Metal-Oxide-Semiconductor.
PDP Power-Delay Product.
PVT Process, Voltage and Temperature.
RDF Random Dopant Fluctuations.
SRAM Static Random Access Memory.
SCE Short Channel Effect.
SNM Static Noise Margin.
STI Shallow Trench Isolation.
RISC Reduced Instruction Set Computing.
RMSE Root Mean Square Error.
RSCE Reverse Short Channel Effect.
RV Random Variable.
ULV Ultra-Low Voltage.
ULP Ultra-Low Power.
VLIW Very Long Instruction Word.
WL Wordline.
WT Wide transistor. Used when comparing with MST.
List of Figures
2.1 Qualitative illustration of power, cycle period, and energy per cycle as a function of VDD, in a switching circuit.
2.2 Plot of Ids for an arbitrary NMOS device, displaying regions of operation with respect to Vgs.
2.3 Plot of Ids for a LP 65 nm NMOS device that displays both RSCE (along length axis) and NWE (along width axis).
2.4 Plot of confidence bands using simple theory, for subthreshold Ids normalized to the typical case. (AVT = 4 mV·μm, n = 1.7, T = -20 °C)
2.5 An example Pareto front for two conflicting objectives, delay and power, and the tentative solutions p, q and r.
2.6 Overview of typical SRAM module organization, components and layout.
2.7 Conventional CMOS 6-transistor (6T) SRAM
2.8 Static noise margin (SNM)
2.9 8T SRAM cell with separate read and write wordlines (RWL, WWL) and separate read and write bitlines (BL, RBL)
3.1 Layout and measurements of MST and WT ring oscillator.
3.2 Minority-3 functions investigated in Paper I. Named after the number of transistors in the circuit: 22T, 12T, 10T.
3.3 Pareto sets showing width and length parameters Wdn, Wdp, L corresponding to the Pareto Front in Fig. 2 of Paper II for areas less than 40 μm² and RMSEs less than 4 mV. (Marker symbols purely for visibility).
3.4 Yield contours for equation (1) of Paper VI.

Abstract
This dissertation is organized as a collection of papers, where each paper represents original
research contributions relating to the design and analysis of ultra low power CMOS, with a
particular emphasis on ultra low voltage and subthreshold operation. The individual papers
represent advancements particularly within methods and practices related to the design of
both digital logic and memory circuits in the presence of severe process variation. At the
device level it is demonstrated how the use of multiple minimum-width gates can exploit the
inverse narrow-width subthreshold device effect to improve performance and power-delay
products. Measurement results from a 90 nm prototype confirm the effect. Multi-objective
optimization strategies are developed and applied to allow exploration of the Pareto optimal
design space for reliable logic at 150 mV. Targeting operation at 300 mV, the design of
a 9-transistor SRAM memory cell employing multi-Vt and virtual power techniques is presented.
A multi-objective optimization strategy is developed and applied to achieve an optimal
trade-off for an efficient and reliable sizing of the SRAM cell. Based on the 9-transistor
cell, measured results from an ultra low voltage 64 × 32 SRAM module operating down
to 273 mV in a 65 nm technology indicate good yield and competitive performance metrics
(17.8 fJ/access/bit at averages of 761 kHz @ 321 mV supply). Finally, the behavior of subthreshold
logic circuits under the influence of adverse fluctuations in the transistor threshold
voltages is treated analytically, with specific emphasis on minimum-energy operation and
yield constraints. The analysis can suggest optimal choices for supply voltage and device
sizing, prior to simulation.

Preface
During my studies at the University of Oslo I had the pleasure of making many acquaintances,
both local and across the world. First and foremost I would like to thank my main
thesis supervisor Professor Snorre Aunet, who is now with the Norwegian University of
Science and Technology (NTNU), for his genuineness, his friendship, and for a lot of helpful
advice and encouragement throughout the course of my Ph.D. Also thanks to my co-supervisors
Professor Tor Sverre Lande, and Professor Emeritus Oddvar Søråsen. Thank
you to the Department for Computer Science for funding my Ph.D. from 2009-2012, and to
the Nanoelectronics Research Group for funding 90 nm chip production, and thank you to
Nano Network for funding my trip to the DDECS conference in 2013.
I would also like to thank Associate Professor Sigbjørn Næss, whom I had the pleasure
of working with from 2009 to 2011, during the early development of the courses INF1410
and later INF1411. Here I got the chance to hone my skills on teaching, creating laboratory
exercises, tutorials, and solutions to hundreds of undergraduate EE problems. During the
semester 2011/12 I also had the pleasure of co-supervising Martin Haugland on his M.Sc.
thesis work, culminating in a tiny functional subthreshold standard cell library, including
sample synthesized layouts.
From the research groups Cognitronics and Sensor Systems Group at the Center of Excellence
Cognitive Interaction Technology, Bielefeld University, and Systems and Circuit
Technology, Heinz-Nixdorf Institute, University of Paderborn, I would like to especially
express my thanks to Professor Dr.-Ing. Ulrich Rückert, Dr.-Ing. Sven Lütkemeier, and
Dr.-Ing. Mathias Blesken, for our successful collaboration on two papers, as well as being
excellent hosts, and for ensuring interesting talks during our research visit in Paderborn in
2010, and 2011, as well as funding of the 65 nm chip presented in Paper V. Thanks also go to
the German Academic Exchange Service DAAD, and the Norwegian Research Council, for
funding our exchange visits at Paderborn University in the project “Robust Ultra-Low-Power
Circuits for Nano-Scale CMOS Technologies”.
To all my former office cohabitants, Amir Hasanbegovic, Dr. Kin-Keung Lee, Dr.
Jørgen Andreas Michaelsen, Ali Zaher and his family, Kristian Gjertsen Kjelgård, and Dr.
Jan Erik Ramstad, thank you all for your friendship, your openness, and the times that we
shared. Special thanks to Amir Hasanbegovic for the collaboration on Paper III as well
as taking the lead role on our tutorial on standard cell characterization. And special thanks
also to Dr. Jørgen Andreas Michaelsen for great collaborative work on our project “A Low-Voltage
Low-Power and Low-Noise Signal Amplification and Activity Detector”.
Thanks also to all my other colleagues at the Nanoelectronics Research Group, whom
I had the pleasure of meeting there, to my former M.Sc. thesis advisor Associate Professor
Philipp Häfliger for his efforts then and continued support now, to Malihe Z. Doogahbadi,
Thanh Trung Nguyen, Srinivasa Reddy Kuppi Reddi, Dr. Shanthi Sudalaiyandi, thanks.
Thanks also to Head Engineer Olav Stanly Kyrvestad, for many years of friendship and for
keeping the labs fully operational, to Dr. Øivind Næss, and to Professor Yngvar Berg, for
very interesting discussions, but perhaps mostly for being an excellent teacher whom I had the
good fortune of experiencing as an undergraduate student, to Håkon Andre Hjortland, for
sharing his love and passion for Linux-based free software tools, and to Associate Professor
Joar Martin Østby, for his dedication in arranging interesting technical meetings in Analog
Asic Forum, and everyone else if I have missed anyone.
Thanks also go to all my colleagues at Integrated Detector Electronics for accepting
me so quickly. Especially to the general manager Dr. Gunnar Mæhlum for allowing me
time to complete my thesis, but also to Cand. real. Alf Olsen, Dr. Dirk Meier, Jahanzad
Talebi, Mehmet Altan, Jörg Ackermann, Suleyman Azman, Bahram Najafiuchevler, Codin
Gheorge, Tor Magnus Johansen, Philip Påhlsson, Petter Øya, David Steenari, Aage Kalsæg,
Willy Dang, Arkadiusz Edward Dlugolecki and Lyusine Shakbazhyan.
Finally, I would like to thank my family, in particular my parents, Erling Berge and Berit
Otnes, for always being loving and supportive. And especially my beautiful soul-mate and
wife, Samantha Kay Kelly Berge, for her continuing, radiant, and unabating love and support,
never ceasing to believe in me during my work towards this Ph.D.
Chapter 1
Introduction
1.1 Why low voltage CMOS?
Historically, in commercial CMOS technologies, the core supply voltage has been scaled
down along with transistor dimensions. The main reason for doing so can be tied to device
reliability issues, as high electric fields can cause damage and reduce the lifetime of nanometer
scale transistors [1]. Additionally, the reduced supply voltage lowers power dissipation and,
very importantly, reduces self-heating, which can be a major concern in densely packed
high-performance devices.
For any electronic digital circuit technology, scaling down the power supply voltage is
beneficial in terms of reducing both the dynamic (switching) and passive (leakage) power
consumption. For circuits dominated by active (switching) power consumption these gains
are roughly proportional to the frequency reduction and the square of the voltage reduction [2].
For circuits dominated by static power (standby leakage power), gains in deep
submicron processes can have an exponential relationship to the supply voltage, due to effects
such as drain-induced barrier lowering (DIBL) [3]. When the supply voltage is reduced,
circuits dominated by switching power will also benefit from a reduction in the energy per
computation figure. However, as the maximum operating speed drops, each computation takes
longer and the leakage energy per computation increases in proportion, leading to a minimum
in the energy per computation.
When the nominal supply voltage was around 5 V, one study [4] indicated that it could be
possible to reduce power consumption by several orders of magnitude by reducing the power
supply voltage. As nominal core supply voltages will soon creep below 0.8 V [5], potential
gains from supply voltage reduction are reduced in magnitude, but one can still achieve very
significant gains.
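The trade-off between falling switching energy and rising leakage energy per computation can be illustrated with a minimal sketch. It assumes a simplified subthreshold model that is not taken from this thesis: the on/off current ratio is exp(VDD/(n·UT)), DIBL is ignored, and the cycle time is proportional to logic depth × VDD / Ion. The activity factor, logic depth, slope factor and the sweep range (0.1 V to 0.6 V) are hypothetical values chosen only for illustration.

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann's constant in eV/K (value from the nomenclature)


def thermal_voltage(temp_kelvin):
    """Thermal voltage U_T = kT/q in volts (with k in eV/K, q drops out)."""
    return K_BOLTZMANN_EV * temp_kelvin


def energy_per_op(vdd, alpha=0.1, logic_depth=20, n=1.5, temp_kelvin=300.0):
    """Energy per operation, normalized to (switched capacitance) x 1 V^2.

    Assumed model (illustrative only):
      E_dyn  = alpha * VDD^2
      E_leak = I_off * VDD * T_cycle, with T_cycle ~ logic_depth * VDD / I_on
               and I_on / I_off = exp(VDD / (n * U_T)) in subthreshold.
    """
    ut = thermal_voltage(temp_kelvin)
    e_dyn = alpha * vdd ** 2
    e_leak = logic_depth * vdd ** 2 * math.exp(-vdd / (n * ut))
    return e_dyn + e_leak


def find_minimum(v_start=0.10, v_stop=0.60, steps=501):
    """Brute-force sweep of VDD; returns (V_opt, E_min) on the grid."""
    grid = [v_start + i * (v_stop - v_start) / (steps - 1) for i in range(steps)]
    v_opt = min(grid, key=energy_per_op)
    return v_opt, energy_per_op(v_opt)


if __name__ == "__main__":
    v_opt, e_min = find_minimum()
    print(f"V_opt ~ {v_opt:.3f} V, normalized E_min ~ {e_min:.4f}")
```

With these assumed parameters the sweep finds an interior minimum: above it the quadratic switching term dominates, below it the leakage term grows because each operation takes exponentially longer. The sweep deliberately stops at 0.1 V, since the simplified model is not meaningful for supplies of only a few UT.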
It is important to note that reducing the supply voltage consequently decreases the
operating speed. For applications in the low to medium performance region, or for applications
where high performance is only required occasionally, there are however few reasons to
maintain a higher supply voltage than necessary, as this would simply lead to wasted power.
Many applications could thus take advantage of reducing the voltage supply to reduce power
consumption or the energy per computation. As reductions in power and energy consumption
are beneficial in extending the battery life of battery-operated devices, applications
within handheld and portable devices can easily be imagined. Additionally, ultra low power
consumption may be an enabler for new classes of devices, powered for instance by energy
harvesting mechanisms. Savings on energy/computation figures may also provide substantial
savings on the electricity bill of large scale computing farms. Wireless sensor nodes are
another potential application. They could be used for a wide variety of purposes, for
example [6]: humidity monitoring within agriculture, early forest fire detection
within environmental monitoring, or gas leak detection for oil and gas industries.
1.2 Challenges for ultra low voltage CMOS
While scaling down the supply voltage is very effective in reducing the overall power consumption
of a circuit, there are several factors that can limit the effectiveness and reliability of
circuits in nanometer scale CMOS.
When the supply voltage is reduced to below the transistors' inherent threshold voltage,
the transistors operate in the subthreshold region. In the subthreshold domain it is normal
to see great variation in the on-currents of devices, particularly for small transistors. The
dominant cause of this is random dopant fluctuations (RDF), i.e. fluctuations in the number
and placement of dopant atoms, which are implanted during fabrication in order to set the
transistor threshold voltage. The overall effects of RDF can however be tedious to model and
simulate, and works focused on subthreshold design can suffer a strong bias if RDF is disregarded.
For minimum size devices RDF can cause variations in the device on-current of several orders
of magnitude. For digital circuits this may lead to timing variations of similar magnitude.
For synchronous systems the worst case implication is fatal timing violations, as current
design practices for determining safe hold times may not be adequate in subthreshold design.
Although careful design can limit fatal errors, the main effect of RDF is that the maximum
clock speed is drastically reduced, leading to a significant performance drop.
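To give a feeling for the magnitude of the RDF-induced current spread, the sketch below combines two common assumptions, neither of which is quoted from the papers: the threshold voltage standard deviation follows SD[Vth] = AVT / sqrt(W·L), and the subthreshold current depends on a threshold shift ΔVth as exp(-ΔVth / (n·UT)). The values AVT = 4 mV·μm, n = 1.7 and T = -20 °C are those quoted in the caption of Fig. 2.4; the device dimensions (W = 0.12 μm, L = 0.06 μm) are hypothetical stand-ins for a minimum size 65 nm device.

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann's constant in eV/K (value from the nomenclature)


def sigma_vth(avt_mv_um, width_um, length_um):
    """SD[Vth] in volts, assuming SD[Vth] = AVT / sqrt(W * L)."""
    return avt_mv_um * 1e-3 / math.sqrt(width_um * length_um)


def current_spread(sigma_v, n, temp_kelvin, q_sigmas=3.0):
    """Subthreshold current at +/- q_sigmas of Vth shift, normalized to typical.

    Returns (slow, fast): slow has Vth shifted up, fast has Vth shifted down.
    """
    n_ut = n * K_BOLTZMANN_EV * temp_kelvin
    slow = math.exp(-q_sigmas * sigma_v / n_ut)
    fast = math.exp(+q_sigmas * sigma_v / n_ut)
    return slow, fast


if __name__ == "__main__":
    # Hypothetical minimum-size device dimensions; AVT, n, T as in Fig. 2.4.
    sd = sigma_vth(avt_mv_um=4.0, width_um=0.12, length_um=0.06)
    slow, fast = current_spread(sd, n=1.7, temp_kelvin=253.15)
    print(f"SD[Vth] = {sd * 1e3:.1f} mV")
    print(f"-3 sigma .. +3 sigma current ratio = {fast / slow:.0f}x")
```

Under these assumptions the ratio between the fastest and slowest device within ±3 standard deviations exceeds three orders of magnitude, which is consistent with the qualitative statement above. Because SD[Vth] scales with 1/sqrt(W·L), larger devices narrow the spread at the cost of area and capacitance.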
When scaling down the supply voltage we simultaneously reduce the noise margins. At
lower voltages the difference between the device on and off currents is also reduced. Combined
with the increased current variation induced by random dopant fluctuations, this can
ultimately lead to an inability of simple gates to yield the correct output voltage.
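A rough sketch of how the on/off current ratio shrinks with the supply voltage, and how threshold shifts can erase it altogether, is given below. It assumes an idealized subthreshold device where Ion/Ioff = exp(VDD / (n·UT)), ignores DIBL, and treats the on and off devices as having identical nominal characteristics. The function name, the slope factor and the ±100 mV threshold shifts are hypothetical.

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann's constant in eV/K (value from the nomenclature)


def on_off_ratio(vdd, n=1.5, temp_kelvin=300.0, dvth_on=0.0, dvth_off=0.0):
    """Idealized subthreshold I_on / I_off ratio.

    dvth_on  : Vth shift of the conducting device (positive = weaker device)
    dvth_off : Vth shift of the blocking device (negative = leakier device)
    """
    n_ut = n * K_BOLTZMANN_EV * temp_kelvin
    # I_on ~ exp((VDD - dvth_on) / n_ut), I_off ~ exp(-dvth_off / n_ut)
    return math.exp((vdd - dvth_on + dvth_off) / n_ut)


if __name__ == "__main__":
    for vdd in (0.30, 0.20, 0.15):
        nominal = on_off_ratio(vdd)
        worst = on_off_ratio(vdd, dvth_on=0.1, dvth_off=-0.1)
        print(f"VDD = {vdd:.2f} V: nominal {nominal:.1f}, worst case {worst:.3f}")
```

In this simplified picture, a weak pull-up fighting a leaky pull-down (or vice versa) can have a ratio below one at low supply voltages, which is the situation where a simple gate can no longer drive its output to the correct level.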
The current in devices operated in the subthreshold region is limited by the diffusion
of available carriers in the channel [7]. Therefore subthreshold devices show a very strong
response to changes in operating temperature. At lower temperatures there are fewer carriers
available, and the current is greatly reduced, while at higher temperatures an increase in
current is seen. This is the opposite of what is seen in superthreshold. When operating a
transistor in the superthreshold region, one typically estimates a ±20% change in the device
on-current over a typical range from -20 °C to +85 °C. In the same temperature range a subthreshold
current may exhibit variation of several orders of magnitude. While this topic is
not treated in depth in this thesis, the global variation resulting from temperature change
can be handled by several techniques, such as adaptive body biasing, or dynamic scaling of
the voltage supply.
Traditional superthreshold sizing strategy, using L = Lmin while scaling W, is not necessarily
the best approach for nanoscale subthreshold transistors. Specifically, the short channel
effect (SCE), the narrow width effect (NWE), and the inverse narrow width effect (INWE)
may yield counter-intuitive results. Near the smallest dimensions, the device Ids current may
increase with increasing L due to SCE, or even decrease with increasing W due to INWE.
If we additionally consider the impact that sizing has on RDF, optimal subthreshold device
sizing becomes a rather complex problem.
1.3 A roadmap for this thesis
This thesis is a collection of papers that all relate to ultra-low voltage and subthreshold circuits.
Perhaps the most central theme of the original contributions, specific to subthreshold logic
gates and memory, is how to mitigate or utilize certain device and processing effects that are
typically difficult to handle during design and optimization. Reprints of the individual works
are included in Part II. Paper contributions included in this thesis are listed as follows:
Paper I H. K. O. Berge and S. Aunet, “Benefits of decomposing wide CMOS transistors
into minimum-size gates,” in NORCHIP, 2009, pp. 1–4, Nov. 2009.
Paper II H. K. O. Berge and S. Aunet, “Multi-objective optimization of minority-3
functions for ultra-low voltage supplies,” in Proc. IEEE Int. Circuits and
Systems (ISCAS) Symp., pp. 2313–2316, 2011.
Paper III H. K. O. Berge, A. Hasanbegovic, and S. Aunet, “Muller C-elements based
on minority-3 functions for ultra low voltage supplies,” in Design and Diagnostics
of Electronic Circuits Systems (DDECS), 2011 IEEE 14th International
Symposium on, pp. 195–200, April 2011.
Paper IV H. K. O. Berge, M. Blesken, S. Aunet, and U. Rückert, “Design of 9T
SRAM for dynamic voltage supplies by a multiobjective optimization approach,”
in Proc. 17th IEEE Int Electronics, Circuits, and Systems (ICECS)
Conf, pp. 319–322, 2010.
Paper V S. Lütkemeier, T. Jungeblut, H. K. O. Berge, S. Aunet, M. Porrmann,
and U. Rückert, “A 65 nm 32 b subthreshold processor with 9T multi-Vt
SRAM and adaptive supply voltage control,” Solid-State Circuits, IEEE
Journal of, vol. PP, pp. 1–12, Jan 2013.
Paper VI H. K. O. Berge and S. Aunet, “Yield-oriented energy and performance
model for subthreshold circuits with Vth variations,” in Design and Diagnostics
of Electronic Circuits Systems (DDECS), 2013 IEEE 16th International
Symposium on, pp. 193–198, April 2013.
All paper contributions (I–VI) relate to ultra-low voltage and subthreshold circuits. Paper I
concerns itself with a new opportunity for device sizing that may arise for devices that
display the inverse narrow width effect (INWE), allowing two minimum-width transistors to
operate with improved characteristics compared to a single wide equivalent. Papers II, III,
and IV investigate the application of multi-objective optimization strategies to improve the
performance and reliability/yield of subthreshold circuits. Papers II and III explore the subthreshold
design space for several implementations of minority-3 gates and Muller C-elements
by using multi-objective optimization to uncover the Pareto fronts of these circuits. Papers
IV and V relate to the design and measurement results of subthreshold SRAM. Paper IV covers
design improvements and design space exploration for a 9T multi-Vt SRAM cell, making
it suitable for subthreshold applications at 300 mV. Paper V describes measurement results
from a subthreshold VLIW processor, as well as measured results from an SRAM module
based on the 9T cell of Paper IV. Paper VI approaches the problem of subthreshold device
sizing and voltage selection for minimum energy consumption analytically. This analysis is
done taking into account the desired yield, and RDF as the dominant source of variation.
The papers are organized in chronological order, with the exception of Paper IV, due to
its close thematic and introductory relation to Paper V. For Paper V, SRAM related content
has the emphasis in this thesis. To learn more about the subthreshold processor of Paper V,
please refer to section 3.5.
Additionally during my Ph.D. work I co-supervised the M.Sc. thesis work of Martin
Haugland [8]. This work can be considered related to this thesis as it employs multi-objective
optimization to size standard cells targeting a subthreshold standard cell library, including
layout, synthesis and trial place and route.
The rest of this thesis is organized as follows: In Chapter 2 a brief introduction to the
most central topics of this thesis is given. In Chapter 3 each paper contribution is introduced,
and a summary of the results is given. Chapter 4 is devoted to a discussion of the thesis
contributions, providing further perspectives. The conclusion and recommendations for
future work are presented in Chapter 5.
Chapter 2
Background
This chapter very briefly introduces basic concepts central to the paper contributions of this
thesis. It is not intended as an exhaustive review, but rather to serve as a convenience to the
reader by providing a more general scope. Short introductions are also provided in the papers.
2.1 Ultra Low Power Design
A short historical account of low power design up until 2003 has been given in [9], of
which I give an even briefer, slightly modified account in this first paragraph. In the early
days of computing, vacuum tubes were used to do calculations and power consumption was
a concern. The ENIAC used 18,000 vacuum tubes and consumed 150 kW. By comparison
transistors dissipate much less power, typically at least by a factor of 1000, and a much greater
power reduction is achieved in modern ICs. After the invention of the bipolar transistor
in 1947/1948, and later the integrated circuit (IC) in 1958/1959, power consumption in computational
circuits was for a long time rarely a concern for circuit designers. Ultra low power (ULP)
circuit design was however pioneered in the 1960s-1970s, when Swiss watchmakers decided to
make an electronic watch. While their first circuits were made in bipolar technology, they became
early adopters of CMOS in 1964. To allow the watch to operate for 1 year on a small battery it
had to consume only microwatts. Fortunately it did; their Beta wristwatch operated at 1.3 V
and drew only 13 μW of power, while the battery could supply 18 μW. Around 1990-1992,
the semiconductor industry became aware that it would be necessary to pay more attention
to the power consumption in designs and that cooling might be necessary. The power
consumption was continually increasing, along with the speed and complexity of digital processors.
Additionally, the market demand for more complex portable devices was growing.
Low Power conferences and workshops started appearing around 1993. Many concepts that
were discussed then were not really new but were in part the reuse of old techniques with the
purpose of achieving low power, e.g. pipelining, parallelism, asynchronous circuits, selection
of states for finite state machines, reduced swing and transistor sizing. The Harvard Architecture
(which is used in the now popular AVR architecture) was designed in 1939, and almost
all early computers used RISC-like instruction sets, achieving a low clocks per instruction
(CPI) figure. Pipelining and parallelism for low power were introduced in [10]. Pipelining
shortens delay paths and thus allows one to reduce the supply voltage at the same frequency.
Parallelism allows a reduction in frequency for the same throughput, thus also allowing a
reduction in the supply voltage. Asynchronous circuits promise the removal of the clock
tree, which often is responsible for a major part of the power consumption. Some new, but
perhaps obvious concepts were introduced, such as gated clocks and activity reduction. Another
new concept was dynamic voltage scaling (DVS), i.e. changing the supply voltage and
frequency dynamically to suit the required throughput. Today, many of the more dramatic
issues discussed derive from the use of deep submicron and nanometer scale processes, e.g.
leakage, delay variations, very low supply voltages, cross-talk, and soft errors.
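The power argument for parallelism can be made concrete with a small arithmetic sketch. It assumes that switching power scales as f·α·C·VDD² (full-swing logic) and that running N units at f/N permits some lower supply voltage. All numbers below (100 MHz, 1 nF of switched capacitance, a reduction from 1.0 V to 0.7 V, 15% capacitance overhead for routing and multiplexing) are hypothetical and only serve to show the bookkeeping.

```python
def dynamic_power(freq_hz, alpha, c_load_f, vdd):
    """Switching power f * alpha * C * VDD^2 for full-swing logic, in watts."""
    return freq_hz * alpha * c_load_f * vdd ** 2


def parallel_power(freq_hz, alpha, c_load_f, vdd_reduced, n_units, cap_overhead=0.0):
    """Power of n_units copies, each clocked at freq_hz / n_units at a reduced VDD.

    cap_overhead models extra switched capacitance (e.g. 0.15 for 15%).
    """
    per_unit = dynamic_power(freq_hz / n_units, alpha, c_load_f, vdd_reduced)
    return n_units * per_unit * (1.0 + cap_overhead)


if __name__ == "__main__":
    # Hypothetical example values, not taken from the thesis or its references.
    baseline = dynamic_power(100e6, 0.2, 1e-9, 1.0)
    parallel = parallel_power(100e6, 0.2, 1e-9, 0.7, n_units=2, cap_overhead=0.15)
    print(f"baseline: {baseline * 1e3:.2f} mW, 2-way parallel: {parallel * 1e3:.2f} mW")
    print(f"power ratio: {parallel / baseline:.3f}")
```

The throughput is unchanged, so the saving comes entirely from the quadratic dependence on the supply voltage, reduced by whatever capacitance overhead the duplicated hardware introduces. How far the voltage can actually be lowered at half the frequency depends on the delay-voltage characteristic of the technology, which this sketch takes as a given input.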
The Swiss watchmakers continued their work from the late 1960s into the 1970s. In [11]
it is described how in particular Dr. Eric A. Vittoz did pioneering work on operating MOSFETs
in the weak inversion region [12]. Although analytical expressions for the current
in weak inversion, or the diffusion current, had in their basic principles been derived independently
by 1966 [13], Vittoz showed how to utilize this region of operation, using what were then
unheard-of low supply voltages and achieving a remarkably low power consumption, with
applications in miniature portable devices such as hearing aids, wrist-watches and biomedical
devices [11]. His work on micropower techniques and near- and subthreshold operation has
continued, for example in [2, 14, 15], with contributions also to [16].
During the early 2000s attention was again raised when the Massachusetts Institute of
Technology’s Subthreshold Research Group produced several relatively complex digital ICs
capable of operation at very low VDD , such as a 175mV multiply-accumulate unit [17],
and a 180mV FFT processor [18]. In more recent years a few companies and startups have
appeared with either a direct or partial goal of taking advantage of the subthreshold domain.
To name a few, AmbiqMicro, founded in 2010, advertises a microcontroller drawing 30 μA/MHz,
as well as another product, a real-time clock that can operate with as little as 42 nW¹. PsiKick,
founded in 2012, is currently developing “Ultra-Low-Power Wireless Platforms”². Iridium
Technologies LLC, founded in 2006, is working on producing high-reliability and radiation-hardened
circuits capable of reliable subthreshold operation³.
As subthreshold operation allows the lowest supply voltage, it is easy to understand its
allure. This becomes clear when looking at the dynamic power in a switching circuit, which
can be expressed as [2]:
$$P_{dyn} = f \, \alpha \, C_L V_{DD} V_{swing} \qquad (2.1)$$
Here f is the frequency, CL is the total capacitive load, VDD is the power supply voltage, Vswing
is the logic voltage swing (often equal to VDD), and α is an activity factor, a number between
0 and 1 specifying the proportion of the total load that is switched on average
in a cycle. For a fixed system configuration with a low frequency, capable of operating at
200 mV, the dynamic power scales with VDD² when Vswing = VDD, so we can save
1 − (0.2/1)² = 96% of the dynamic power compared to operating
at 1 V. In addition to switching power, the total active power may also be considered to
contain a component of short-circuit power [2]. This contribution is fairly often considered
to be negligible [19, 20], although it can depend on design and application. According to
[2] the short circuit contribution should remain below 20%. For the purpose of this brief
¹ See ambiqmicro.com
² See www.iridumtec.com
³ See iridumtec.com
introduction it will also be neglected. The static leakage power Pstat can be expressed as:
$$P_{stat} = I_{leak}(V_{DD}) \, V_{DD} \qquad (2.2)$$
Here the leakage Ileak is expressed as a function of the supply voltage. Pstat scales linearly with
VDD only if Ileak(VDD) is constant with VDD. However, in deep submicron and nanometer
processes Ileak(VDD) typically increases exponentially with VDD, which will be further
explained in Section 2.2. From these two power equations it is easy to see that it is very useful
to operate at a low VDD as long as the system requirements otherwise allow this. This is
the primary driver for interest in the field of subthreshold CMOS, or ultra low voltage (ULV)
circuits in general: the promise of extremely low power consumption and energy
per computation.
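As a rough illustration of equations (2.1) and (2.2), the following Python sketch evaluates the dynamic power at two supply voltages and the resulting relative saving. The function names and the default values for frequency, activity factor and load capacitance are hypothetical and chosen only for illustration; they cancel out in the saving when Vswing = VDD.

```python
def dynamic_power(f_hz, alpha, c_load, vdd, vswing=None):
    """Equation (2.1): P_dyn = f * alpha * C_L * V_DD * V_swing."""
    if vswing is None:
        vswing = vdd  # assumption: full-swing logic, V_swing = V_DD
    return f_hz * alpha * c_load * vdd * vswing


def static_power(i_leak, vdd):
    """Equation (2.2): P_stat = I_leak(V_DD) * V_DD, with i_leak a callable."""
    return i_leak(vdd) * vdd


def dynamic_saving(vdd_low, vdd_high, f_hz=100e3, alpha=0.1, c_load=1e-12):
    """Relative dynamic power saved by running at vdd_low instead of vdd_high.

    f_hz, alpha and c_load are illustrative placeholders, held fixed for both
    supply voltages as in the fixed-configuration example in the text.
    """
    p_low = dynamic_power(f_hz, alpha, c_load, vdd_low)
    p_high = dynamic_power(f_hz, alpha, c_load, vdd_high)
    return 1.0 - p_low / p_high


saving = dynamic_saving(0.2, 1.0)
print(f"Dynamic power saved at 200 mV versus 1 V: {saving:.0%}")
```

The static power helper takes the leakage as a function of VDD, so a constant or an exponential leakage model can be supplied without changing the rest of the sketch.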
The energy per operation Eop can in sequential circuits be calculated as the energy dissipated
during the clock period tclk = 1/fclk [2]:
$$E_{op} = t_{clk} P_{dyn} + t_{clk} P_{stat} \qquad (2.3)$$
$$E_{op} = \alpha C_L V_{DD}^2 + t_{clk} I_{leak}(V_{DD}) V_{DD} \qquad (2.4)$$
Equation (2.4) follows from (2.3) by inserting (2.1) and (2.2) with Vswing = VDD and f = 1/tclk.
With respect to adjustments of the supply voltage, we can find the supply voltage where minimum
energy per operation occurs (Vopt) by solving ∂Eop/∂VDD = 0 [16]. Finding the minimum
energy operating point when taking into account RDF is also considered in Paper VI.
Figure 2.1 shows qualitatively how the components Pdyn, Pstat, cycle period, Eop, and
static and dynamic energy components typically scale. This figure is based on simulations in
a 65 nm LP technology with typical conditions and no statistical variation, of an 11-stage ring
oscillator where the dynamic power has been scaled down in post-processing to simulate an
activity factor of α = 0.05. We can see that the power consumption varies by over 6 orders
of magnitude, while the cycle period varies by a little more than 5 orders of magnitude
until leakage dominates the power consumption. For the energy consumed per cycle Eop,
the minimum energy point (MEP) occurs around 0.4 V, while for high performance duty at 1.2 V
the energy is at its maximum, approximately 7× larger. This corresponds to an
energy saving of over 85% if operation at the MEP suits the demands of the application. The activity factor α
is important when estimating how low the energy minimum will occur. For α = 0.2 the
MEP occurs at 0.25 V and saves 92.75% compared to maximum performance. However, the
maximum energy is then a factor 4 larger, so to keep any gains the increased switching
should result in increased throughput.
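The search for Vopt can be sketched numerically by sweeping equation (2.4) over VDD. The Python sketch below is not the simulation behind Figure 2.1: it assumes a toy, subthreshold-only model in which the on-current grows exponentially with VDD, the leakage is constant, and the clock period is a fixed number of gate delays. All parameter values and function names are hypothetical, so the absolute voltages are not expected to match the figure; the sketch only illustrates the trend that a higher activity factor moves the minimum energy point to a lower voltage.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant [J/K]
Q_E = 1.602176634e-19  # elementary charge [C]


def energy_per_op(vdd, alpha, n=1.5, temp_k=300.0, c_gate=1e-15,
                  n_gates=1000, logic_depth=11, i_off=1e-9):
    """Toy estimate of equation (2.4), valid only for VDD below Vth.

    Assumptions (all hypothetical): every gate has capacitance c_gate and
    off-current i_off, the on-current is i_off * exp(VDD / (n * UT)), and the
    clock period equals logic_depth gate delays of C * VDD / I_on each.
    """
    u_t = K_B * temp_k / Q_E                      # thermal voltage kT/q
    i_on = i_off * math.exp(vdd / (n * u_t))      # drive current at Vgs = VDD
    t_clk = logic_depth * c_gate * vdd / i_on     # clock period
    e_dyn = alpha * n_gates * c_gate * vdd ** 2   # first term of (2.4)
    e_stat = t_clk * (n_gates * i_off) * vdd      # second term of (2.4)
    return e_dyn + e_stat


def minimum_energy_point(alpha, v_lo=0.10, v_hi=0.45, steps=351):
    """Grid search for the supply voltage that minimizes energy_per_op."""
    grid = [v_lo + i * (v_hi - v_lo) / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda v: energy_per_op(v, alpha))


for alpha in (0.05, 0.2):
    v_opt = minimum_energy_point(alpha)
    print(f"alpha = {alpha}: V_opt is about {v_opt:.3f} V in this toy model")
```

A grid search is used instead of solving ∂Eop/∂VDD = 0 analytically because it works unchanged if the delay or leakage model is replaced by tabulated simulation data.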
According to [21] the minimum theoretical operating voltage for CMOS switching cir-
cuits for a fan-in of 3 and maximum gain larger than 4 is 83mV at room temperature, while
the practical limit due to PVT variability was estimated at 200mV. In [22] 100mV is sug-
gested as a lower practical limit for VDD. Necessarily, with higher reliability demands and
increasing system complexity, variability may lead to more adverse results than in [21]. In
[23] it is argued that variability and high yield targets may make it impossible to reach the
target Vmin or Vopt. That discussion has been given a more quantitative basis in Paper VI.
[Figure 2.1 plots: a) power [W] and cycle period [s] on a logarithmic axis versus VDD from 0 to 1.2 V, with curves tclk, Pdyn and Pstat; b) energy per operation [J] on a logarithmic axis versus VDD from 0 to 1.2 V, with curves Eop, Edyn and Estat.]
Figure 2.1: Qualitative illustration of power, cycle period, and energy per cycle as a function
of VDD , in a switching circuit.
2.2 CMOS Subthreshold operation
Several good introductions to subthreshold operation and design have already been written,
for instance [2, 14, 16, 23, 24]. I will however introduce a few concepts central to this thesis
here, as a convenience to the reader.
[Figure 2.2 plot: Ids [arb. units] on a logarithmic axis versus Vgs [V] from 0 to 1.2 V, with Vth marked and the regions labelled subthreshold and superthreshold, weak, moderate and strong inversion, and near threshold.]
Figure 2.2: Plot of Ids for an arbitrary NMOS device, displaying regions of operation with
respect to Vgs.
Figure 2.2 displays the subdivision of operating regions for an NMOS with respect to
the gate-to-source voltage Vgs. The subthreshold region can be defined as the region where
Vgs is smaller than the threshold voltage (Vgs < Vth). The weak inversion region is where
the transistor drain current grows exponentially with Vgs.
However, often in the literature, the subthreshold current actually refers to the current
in the weak inversion region (Vgs < Vth − X), where X is a suitable margin that allows the
exponential approximation to remain reasonable. In this thesis I also keep this simplification, referring
to the weak inversion current simply as the subthreshold current, and I explicitly mention
near-threshold or moderate inversion operation when necessary.
The current in weak inversion, or the diﬀusion current, was in basic principles derived
analytically by 1966 [13]. Later, several MOSFET models have added more detail. One
model that has been popular with analog circuit designers is the EKV model [15], and it is
continuous and diﬀerentiable over all regions. In the following we shall however exclusively
concern ourselves with weak inversion. A slight reformulation of the expression for the drain
to source subthreshold current, excluding moderate inversion, that perhaps is particularly
suitable for circuit design and analysis was expressed in [23] as:
$$I_{ds} = \beta I_0 \, e^{\frac{V_{gs} + \lambda_{ds} V_{ds}}{n U_T}} \left(1 - e^{-\frac{V_{ds}}{U_T}}\right) \qquad (2.5)$$
Here Vgs is the gate-to-source voltage, Vds is the drain-to-source voltage, UT is the thermal
voltage (kT/q), and λds represents the shift of the threshold voltage due to DIBL. The
slope factor n is given by (1 + Cd/Cox), where Cd is the depletion layer capacitance per unit
area. Experimental data show that n is also affected by the geometric sizing of the transistor,
particularly the length [25]. The subthreshold swing nUT ln 10 expresses the subthreshold
slope in terms of the increase in Vgs necessary to increase the current by a decade. Although the ideal
transistor would reach 60 mV/decade for n = 1, more realistic values of n result in a
larger subthreshold swing, e.g. 83.5 mV/decade for n = 1.4, or 101.4 mV/decade for n = 1.7.
Deep submicron CMOS processes usually involve a poorer subthreshold swing.
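The quoted swing values can be checked with a few lines of Python. This is a minimal sketch that assumes a temperature of 300 K, which the text does not state; with that assumption the results agree with the quoted figures to within a few tenths of a millivolt per decade. The function names are illustrative.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant [J/K]
Q_E = 1.602176634e-19  # elementary charge [C]


def thermal_voltage(temp_k=300.0):
    """Thermal voltage UT = kT/q in volts (300 K is an assumed default)."""
    return K_B * temp_k / Q_E


def subthreshold_swing(n, temp_k=300.0):
    """Subthreshold swing n * UT * ln(10) in volts per decade of current."""
    return n * thermal_voltage(temp_k) * math.log(10.0)


for n in (1.0, 1.4, 1.7):
    print(f"n = {n}: {subthreshold_swing(n) * 1e3:.1f} mV/decade")
```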
β in equation (2.5) represents the tuneable transistor strength as typically seen by the
circuit designer:
$$\beta = \frac{W}{L} \, e^{\frac{\lambda_{bs} V_{bs}}{n U_T}} \qquad (2.6)$$
Here W and L are the transistor's width and length, Vbs represents the body-to-source voltage,
and λbs represents the body effect on the transistor threshold voltage. The device characteristic
current I0 is given mainly by factors from the process, often outside the circuit designer's
direct influence:
$$I_0 = (n - 1) \, \mu_0 C_{ox} U_T^2 \, e^{-\frac{V_{th}}{n U_T}} \qquad (2.7)$$
Here μ0 is the carrier mobility, Cox the oxide capacitance, and Vth is the threshold voltage.
For long and wide gates with uniform doping, the threshold voltage Vth can be given by:
$$V_{th} = V_{th0} + \gamma_{bs} \sqrt{\Phi_s - V_{bs}} \qquad (2.8)$$
Here Vth0 is the long-channel threshold voltage for zero substrate bias, and Φs is the surface
potential. Non-uniform doping effects can be modelled using λbs and γbs. Note that λdsVds in equation (2.5)
and λbsVbs in equation (2.6) are also typically considered contributions to the threshold voltage,
and that a detailed model is much more complicated. Also obscured by equations (2.6), (2.7) and (2.8) is that
Vth0 is a function of the device W and L sizing. This is particularly relevant at small dimensions.
In the next subsection I will describe the influence of these contributions to Vth in
more detail.
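Equations (2.5) to (2.7) combine into a single expression for the weak inversion drain current, which can be sketched in Python as below. The function name and all default parameter values (slope factor, the product μ0·Cox, and the DIBL and body-effect coefficients) are hypothetical placeholders and do not describe any particular process.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant [J/K]
Q_E = 1.602176634e-19  # elementary charge [C]


def subthreshold_current(vgs, vds, w, l, vth, n=1.5, temp_k=300.0,
                         mu0_cox=200e-6, lambda_ds=0.05, lambda_bs=0.1,
                         vbs=0.0):
    """Weak inversion drain current following equations (2.5)-(2.7).

    mu0_cox is the product of mobility and oxide capacitance in A/V^2.
    All default values are illustrative assumptions only.
    """
    u_t = K_B * temp_k / Q_E
    beta = (w / l) * math.exp(lambda_bs * vbs / (n * u_t))            # (2.6)
    i0 = (n - 1.0) * mu0_cox * u_t ** 2 * math.exp(-vth / (n * u_t))  # (2.7)
    return (beta * i0
            * math.exp((vgs + lambda_ds * vds) / (n * u_t))
            * (1.0 - math.exp(-vds / u_t)))                           # (2.5)


# Raising Vgs by one subthreshold swing should give one decade more current.
u_t = K_B * 300.0 / Q_E
swing = 1.5 * u_t * math.log(10.0)
i_a = subthreshold_current(0.10, 0.2, w=120e-9, l=60e-9, vth=0.4)
i_b = subthreshold_current(0.10 + swing, 0.2, w=120e-9, l=60e-9, vth=0.4)
print(f"current ratio for one swing step: {i_b / i_a:.3f}")
```

The check at the end ties the model back to the definition of the subthreshold swing: the ratio depends only on n and UT, not on the placeholder parameters.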
2.2.1 Short and Narrow Channel Eﬀects
When the channel length becomes smaller, or the drain voltage becomes larger, the electric
field from the MOS drain terminal to the channel grows in importance. This phenomenon
is called drain-induced barrier lowering (DIBL⁴). Eventually, when increasing Vds and/or
decreasing L, DIBL will lead to punch-through, as the source and drain depletion regions merge.
To counteract short channel effects such as DIBL and its associated Vth roll-off,
one can use a higher doping near the source and drain edges of the channel. These implants
are called pocket implants (or halo implants) and are widely used in deep submicron processes [26].
⁴ In an energy band diagram DIBL could be drawn as a drag on the energy bands of the channel (barrier).
To allow a slightly deeper understanding of the various effects that modulate the threshold
voltage in deep submicron CMOS devices, we will indulge ourselves with a quick and shallow
review of the main contributions to Vth as expressed in the BSIM4.6 model.
In the BSIM4.6 model the short channel effect (SCE) on the threshold voltage is modelled
separately from DIBL and can be written as [25]:
$$\Delta V_{th,SCE} = -\frac{0.5 \, DVT0}{\cosh\left(DVT1 \, \frac{L_{eff}}{l_{t0}(1 + DVT2 \, V_{bs})}\right) - 1} \, [V_{bi} - \phi_s] \qquad (2.9)$$
Here Leff is the effective length, lt0 is the characteristic length, and Vbi is the built-in voltage
of the source and drain junctions. The model parameters DVT0, DVT1 and DVT2 are respectively
the first, second, and body-bias coefficients for the short channel effect. The effect of
DIBL on the threshold voltage is modelled as [25]:
$$\Delta V_{th,DIBL} = -\frac{0.5}{\cosh\left(DSUB \, \frac{L_{eff}}{l_{t0}}\right) - 1} \, [ETA0 + ETAB \, V_{bs}] \, V_{ds} \qquad (2.10)$$
Here the model parameter ETA0 is the DIBL coefficient in the subthreshold region, and ETAB
is the body-bias coefficient for the subthreshold DIBL effect.
We notice that both of the above effects depend strongly on the length: when
Leff approaches zero the cosh function approaches 1, so the denominator vanishes and the
threshold shift grows rapidly. We can also see that the DIBL contribution, which is
proportional to Vds, is a weaker effect in subthreshold than at nominal
supply voltage.
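The length and Vds dependence of equations (2.9) and (2.10) can be explored with the short Python sketch below. The function names are illustrative, and every parameter value (including the characteristic length, built-in voltage and surface potential) is a hypothetical placeholder rather than a value from any real model card.

```python
import math


def delta_vth_sce(l_eff, vbs=0.0, dvt0=2.2, dvt1=0.53, dvt2=-0.032,
                  l_t0=20e-9, vbi=0.9, phi_s=0.8):
    """Short channel effect threshold shift, equation (2.9).

    All default parameter values are illustrative assumptions.
    """
    l_t = l_t0 * (1.0 + dvt2 * vbs)
    return -0.5 * dvt0 / (math.cosh(dvt1 * l_eff / l_t) - 1.0) * (vbi - phi_s)


def delta_vth_dibl(l_eff, vds, vbs=0.0, dsub=0.56, eta0=0.08, etab=-0.07,
                   l_t0=20e-9):
    """DIBL threshold shift, equation (2.10), with assumed parameter values."""
    return (-0.5 / (math.cosh(dsub * l_eff / l_t0) - 1.0)
            * (eta0 + etab * vbs) * vds)


for l_nm in (60, 100, 200):
    l_eff = l_nm * 1e-9
    sce_mv = delta_vth_sce(l_eff) * 1e3
    dibl_sub_mv = delta_vth_dibl(l_eff, vds=0.2) * 1e3
    dibl_nom_mv = delta_vth_dibl(l_eff, vds=1.2) * 1e3
    print(f"L = {l_nm} nm: SCE {sce_mv:.2f} mV, "
          f"DIBL {dibl_sub_mv:.2f} mV at 0.2 V, {dibl_nom_mv:.2f} mV at 1.2 V")
```

Because the DIBL term is linear in Vds, the shift at a 0.2 V supply is one sixth of the shift at 1.2 V for the same device, which is the sense in which DIBL is weaker in subthreshold.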
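For short channels the length-dependent effect of pocket (halo) implants modulates the
effect of body bias. This is modelled as [25]:
$$\Delta V_{th,RSCE} = K1 \left(\sqrt{\phi_s - V_{bs}} - \sqrt{\phi_s}\right) \sqrt{1 + \frac{LPEB}{L_{eff}}} \qquad (2.11)$$
$$\qquad + K1 \left(\sqrt{1 + \frac{LPE0}{L_{eff}}} - 1\right) \sqrt{\phi_s} - K2 \, V_{bs} \qquad (2.12)$$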
Here the parameters K1 and K2 are called the first-order and second-order body bias coefficients
(written as γbs and λbs in the previous section), and LPE0 and LPEB are respectively
the zero body bias, and body bias dependent, lateral non-uniform doping parameters. For
long channel devices pocket implants can also cause a significant drain induced threshold
shift (DITS) [26]. In the BSIM4.6 model the effect of DITS is modelled as:
$$\Delta V_{th,DITS} = -n U_T \ln\left(\frac{L_{eff} \left(1 - e^{-\frac{V_{ds}}{U_T}}\right)}{L_{eff} + DVTP0 \left(1 + e^{-DVTP1 \, V_{ds}}\right)}\right) \qquad (2.13)$$
Here n is the slope factor and UT the thermal voltage, so that nUT sets the subthreshold swing,
and DVTP0 and DVTP1 are the first and second
coefficients for DITS due to long channels with pocket implants. This effect approaches zero
slowly for increasing Leff.
For long narrow-width devices there is a notable contribution from fringing fields along
the edge of the channel. This effect depends on the isolation technology, and causes a threshold
shift which is modelled as [25]:
$$\Delta V_{th,NWE1} = (K3 + K3B \, V_{bs}) \, \frac{TOXE}{W'_{eff} + W0} \, \phi_s \qquad (2.14)$$
Here K3 is called the narrow width coefficient and K3B is the body effect coefficient of
K3, while W0 is the narrow width parameter, and TOXE is the equivalent oxide thickness.
The main effect of equation (2.14) is an increase in Vth at low channel widths. Alternatively,
the expression can model a decrease in Vth if we allow a negative sign for K3 and
K3B. For short and narrow devices a reverse width-dependent effect is also modelled as [25]:
$$\Delta V_{th,NWE2} = -\frac{0.5 \, DVT0W}{\cosh\left(DVT1W \, \frac{L_{eff} W'_{eff}}{l_{t0}(1 + DVT2W \, V_{bs})}\right) - 1} \, [V_{bi} - \phi_s] \qquad (2.15)$$
Here DVT0W, DVT1W and DVT2W are respectively the first, second and body-bias
coefficients for narrow width effects on Vth in short channels. We see that this contribution
grows for small widths, and decreases with Vbs (for default negative values of DVT2W).
To summarize, the various short and narrow channel effects contributing to a Vth shift
in deep submicron devices are numerous and relatively complicated. Partly this is due to
non-uniform halo doping, which is introduced to counter Vth roll-off and punch-through. To
visualize what reverse and short channel effects on the Ids current of a 65 nm device may look
like, simulation results are displayed in a mesh plot in Figure 2.3. At the lower bounds for
both W and L we see the effect of Vth roll-off on the drain current. The normal and reverse
short channel effects and their impact on design in subthreshold are discussed in [27], while
utilization of the reverse (inverse) narrow width effect is a topic of Paper I in this thesis.
2.2.2 Random Dopant Fluctuations
Although variations in the effective width and length of the transistor contribute to variations
in the threshold voltage [28], the dominant source of random variations in the threshold
voltage is often the implantation process. For CMOS, the threshold
voltage of a transistor is typically set by implanting dopant atoms near or just below the
channel-oxide interface [29]. While many aspects of the implantation process are well controlled,
some parameters, such as the number of dopants and their geometrical distribution,
rely on random processes. Therefore each transistor will be slightly different from the next,
and in a macroscopic model they have slightly different threshold voltages.
Threshold voltage fluctuations are often considered to follow the Gaussian (normal) distribution,
based on experimental evidence such as in [30] and “atomistic” simulations such as in
[31]. However, one might encounter some skewness depending on the exact nature of the
device. More recent experimental results on Vth shifts induced by NBTI, such as in [32],
display comparatively strong skewness after stressing devices.
[Figure 2.3 plot: mesh plot for an NMOS in 65 nm showing RSCE and NWE in subthreshold; Ids [A] (scale ×10⁻⁹) versus width [μm] from 0 to 1 and length [μm] from 0 to 0.4.]
Figure 2.3: Plot of Ids for a LP 65 nm NMOS device that displays both RSCE (along length
axis) and NWE (along width axis).
For circuit design the standard deviation of Vth, σ(Vt), is typically given as [33]:
$$\sigma(V_t) = SD[V_{th}] = \frac{A_{VT}}{\sqrt{W_{eff} L_{eff}}} \qquad (2.16)$$
Here Weff and Leff are the transistor effective width and length. AVT is given by the technology
as [34]:
$$A_{VT} = \frac{1}{2} \sqrt[4]{4 q^3 N_d \varepsilon_{Si} \phi_B} \; \frac{t_{ox}}{\varepsilon_{ox}} \qquad (2.17)$$
Here q is the elementary charge, Nd is the number of channel dopants, εSi and εox are
the permittivity of the silicon and the oxide, φB is the work function, and tox is the oxide
thickness. Technology scaling to smaller geometry processes usually involves reducing the
oxide thickness or increasing the oxide permittivity (using a high-K dielectric), in an effort to
enhance the channel control. Therefore more modern processes usually mean a reduced
AVT, so that a same-geometry transistor will have a reduced σ(Vt) in a smaller scale
technology. However, if the geometry is scaled down as well, more modern technologies
usually display more variation in Vth as a result of RDF and downscaling.
For subthreshold operation the variation in current due to RDF can be very large in
a minimum size device. Since the subthreshold current is exponentially dependent on the
threshold voltage, the impact of Vth variations due to RDF is much more severe than in the
superthreshold domain. This can have a severe negative impact on the realization of large
synchronous digital circuits, and large delay increases may occur. To visualize the impact, a
plot of the ±3σ and ±6σ confidence bands of Ids has been made based on equations (2.5) and
(2.16), shown in Figure 2.4.
[Figure 2.4 plot: logarithmic axes, with the ±3σ and ±6σ bands marked; axis labels not recoverable.]
Figure 2.4: Plot of confidence bands using simple theory, for subthreshold Ids normalized to
the typical case. (AVT = 4 mVμm, n = 1.7, T = −20 °C)
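The confidence bands can be approximated with a few lines of Python by combining the exponential Vth dependence in equations (2.5) and (2.7) with the σ(Vt) relation of equation (2.16), taking the square-root area dependence of that relation as an assumption. The sketch uses the parameters quoted for Figure 2.4 (AVT = 4 mVμm, n = 1.7, T = −20 °C) as defaults; the function names and the example device size are hypothetical, and the code is not the script used to produce the figure.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant [J/K]
Q_E = 1.602176634e-19  # elementary charge [C]


def sigma_vth(w_um, l_um, a_vt_mv_um=4.0):
    """Standard deviation of Vth in volts, assuming sigma = A_VT / sqrt(W * L)."""
    return a_vt_mv_um * 1e-3 / math.sqrt(w_um * l_um)


def current_band(w_um, l_um, k_sigma, n=1.7, temp_c=-20.0, a_vt_mv_um=4.0):
    """Lower and upper k-sigma bounds of Ids, normalized to the typical case.

    Uses Ids proportional to exp(-Vth / (n * UT)), so a Vth shift of
    +/- k * sigma scales the current by exp(-/+ k * sigma / (n * UT)).
    """
    u_t = K_B * (temp_c + 273.15) / Q_E
    spread = math.exp(k_sigma * sigma_vth(w_um, l_um, a_vt_mv_um) / (n * u_t))
    return 1.0 / spread, spread


# Example device size (hypothetical): W = 0.12 um, L = 0.06 um.
for k in (3, 6):
    low, high = current_band(0.12, 0.06, k)
    print(f"+/-{k} sigma: Ids between {low:.2e} and {high:.2e} of typical")
```

The bands are symmetric on a logarithmic axis, which is why the upper and lower bounds are reciprocals of each other in this simple theory.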
Naturally it would be welcome if devices could offer less variation in Vth. New gate stacks
with high-K dielectrics, and metals and alloys to set the threshold voltage, offer a significant
improvement and were introduced in commercial processes in 2007 at the 45 nm node [35].
While the technique is difficult, most technology nodes below 45 nm utilize such techniques
today. At more mature technology nodes there are however few, if any, advertised processes that
offer this.
2.3 Multi-objective Optimization
The classical approach to sizing standard CMOS logic cells is to set the length at minimum,
to minimize input capacitance while maximizing drive strength, and to focus on symmetric DC
curves to simultaneously optimize noise margin and propagation delay [36]. This leaves one
free parameter. Knowing that subtle and complex effects cause the DC behaviour of subthreshold
MOS devices to vary substantially, a question that may arise is: how can a circuit
designer take into account this multitude of effects to optimize a subthreshold circuit? One possible
answer may be to exploit a multi-objective optimization (MOO) method. Papers II,
III and IV apply MOO to circuit design optimization. The papers primarily focus on the
application and results, and do not go into details of the algorithms; therefore only a brief
introduction is included here.
Multi-objective optimization can be used for problems where optimal decisions need to be
taken in the presence of conflicting objectives. This type of problem is inherent to engineering,
where one seeks to balance performance, cost, risk, and schedule [37]. Many algorithmic
approaches exist for solving multi-objective optimization problems (MOOPs), such as genetic
algorithms, simulated annealing, the complex method, random search, tabu search, and hybrid
methods [38]. In the last decade research interest has however increasingly focused on
using multi-objective evolutionary algorithms (MOEAs) [39], as they can work on populations
of solutions.
Multi-objective optimization seeks to optimize multiple objective functions under a set
of constraints. Mathematically, a multi-objective optimization problem (MOOP) can be
described as a minimization problem:
$$\min_{\{x_1, \ldots, x_n\} \in S} \{ f_1(x), \ldots, f_k(x) \} \qquad (2.18)$$
Here the parameters {x1, ..., xn} form a point in the n-dimensional search space S, also
known as the decision space. The objective functions {f1(x), ..., fk(x)} form a k-dimensional
function F, the objective space. The solution to the MOOP is called the Pareto⁵ set P, while
the image of P in F is called the Pareto front. P consists of all non-dominated solutions p in
S. A solution is called non-dominated if there exists no other solution that improves any
objective without at the same time worsening another. Conversely, a solution is called dominated
if there exists another solution that is not worse for any objective and at the same time
improves at least one objective. For more complex MOO problems, it becomes increasingly
hard to find the true Pareto front. Therefore, MOO algorithms typically only approximate
P.
To illustrate the above, an example Pareto front is shown in Fig. 2.5. Here the objectives
delay and power are to be minimized with respect to underlying design parameters.
In the figure the line represents the true Pareto front, and the points represent the evaluation
of solutions p, q and r. In the figure q is dominated by p since both delay and power are
improved for p; thus q is not part of the Pareto set or front. The solution p is not dominated
by either q or r, and r is not dominated by either p or q. A MOO algorithm cannot
necessarily see that p is part of the true Pareto front or that r is not. The solution r would
therefore also be considered non-dominated and part of the approximated Pareto front until
the algorithm can find a new solution that dominates it.
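The dominance relation and the resulting approximated Pareto set can be written down compactly. The Python sketch below is a minimal illustration for minimization problems and is not taken from any of the MOO packages discussed here; the objective values assigned to p, q and r are made up so that they reproduce the relationships described for the example.

```python
def dominates(a, b):
    """True if objective vector a dominates b (all objectives minimized).

    a dominates b when a is not worse in any objective and strictly
    better in at least one.
    """
    not_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return not_worse and strictly_better


def non_dominated(solutions):
    """Return the solutions that no other evaluated solution dominates."""
    return {
        name: objectives
        for name, objectives in solutions.items()
        if not any(dominates(other, objectives)
                   for other_name, other in solutions.items()
                   if other_name != name)
    }


# Hypothetical (power [W], delay [s]) values for the three example solutions.
candidates = {
    "p": (1.0e-6, 2.0e-9),
    "q": (2.0e-6, 3.0e-9),
    "r": (3.0e-6, 1.5e-9),
}
front = non_dominated(candidates)
print(sorted(front))
```

With these values q is removed because p improves both objectives, while r stays in the approximated front because nothing evaluated so far dominates it, even though it may not lie on the true Pareto front.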
It is fairly common to use MOEAs to solve complex MOOPs by approximating the
Pareto front. Genetic algorithms such as the Non-dominated Sorting Genetic Algorithm-II
(NSGA-II) [40] and Strength Pareto Evolutionary Algorithm 2 (SPEA-2) [41] have become
standard approaches, and many other algorithms and variants exist.
For standard cell optimization the GAIO software package was used in [42]. There, after
an initial search in a grid, the search space is iteratively subdivided into smaller and smaller
subspaces, so that only subspaces that contain Pareto points are searched. This approach seems
effective when there is an expected relationship between parameters, and will have some
obvious advantages when the number of solutions is large.
[Figure 2.5 plot: delay [s] versus power [W], with the Pareto front drawn as a line and the points p, q and r marked.]
Figure 2.5: An example Pareto front for two conflicting objectives, delay and power, and the
tentative solutions p, q and r.
⁵ Historically, Francis Y. Edgeworth and Vilfredo Pareto are credited with the introduction of the concept of
non-inferiority in the context of economy [37].
Naturally, it makes sense to employ MOO only for conflicting objectives. If two objectives
are related such that improving one objective will always improve the other, then it is
sufficient to include only one objective in the MOOP. In the context of optimizing fuzzy
control systems, [43] describes MOO evolutionary algorithms as usually being very good at
handling two or three objective functions, whereas when the number of objectives increases,
almost all solutions become non-dominated and their search capacity worsens. To handle
this one may choose to ignore some objectives, or integrate several aspects into one objective.
These methods often work well if some objectives are statistically insignificant, or if they are
related.
For the purpose of optimizing CMOS logic circuits multiple performance criteria exist.
In broad terms we often seek to optimize on criteria such as delay, active power, leakage
power, layout area, and reliability. There are many ways to express these criteria into objec-
tive functions for a MOOP, and several of these objectives have some relation. E.g. minimiz-
ing the area for a given delay will typically simultaneously minimize the active power, unless
leakage is a signiﬁcant contributor. Fairly recently MOO was used for resource eﬃcient de-
sign of standard cell libraries, both for standard and subthreshold operation [36, 42, 44]. The
approach allows for a well-informed, and balanced selection of optimal sizings for standard
cells. In [36] the objectives noise margin, dynamic energy and propagation delay are opti-
mized. However noise margin and propagation delay appear related, thus one could perhaps
get similar results with one less objective or by combining the objectives.
Naturally, the number of parameters also influences the complexity and hence the search
capacity and execution time. If there exist algebraic combinations of parameters that evaluate
to equal and Pareto optimal objective values, and the parameters are continuous, then
there are also infinitely many solutions to the MOOP. On the other hand, if such an algebraic
combination is known, then one may set this as a constraint and reduce the number of
parameters, thus simplifying the search.
To summarize, multi-objective optimization problems typically have more than one solution.
MOEAs represent an efficient method to approximate all the best available resource-efficient
tradeoffs in an engineering design. At present, MOEAs can be effective in
handling multi-objective optimization problems with up to four objectives.
2.4 Ultra Low Voltage SRAM
Static random access memory (SRAM) plays a key role in many digital systems, supporting
volatile storage in applications such as instruction memory, data memory, cache, FIFOs,
register ﬁles and scratchpad memories. The ability to reduce the supply voltage of SRAM
modules is interesting for several reasons; to reduce leakage during inactive standby modes
while retaining the contents of memory, to reduce access energy when only low throughput
is required, and/or to operate at the same supply voltage as other intra-die ULV circuits.
When scaling down the supply voltage of digital circuits, the SRAM minimum operating
voltage, Vmin, is however often considered the limiting factor [45].
The typical organization of an SRAM module is depicted in Figure 2.6. The SRAM 1-bit
memory cells (bit cells) are organized in an array with rows and columns. Typical control
signals include chip select (CS), write enable (WE), a clock (CLK), and an address (ADDR).
When a read or write access is performed the address is split up into two parts, a row address
and a word address. The row decoder decodes the row address and enables a signal along
the appropriate row; this signal is typically called the wordline (WL). For read access data
is output by each cell of the active row using the bitline (BL) signals, oriented along the
columns. The value from each cell is typically ampliﬁed by a sense ampliﬁer. A multiplexer
(before or after the sense ampliﬁers) uses the word address to select a subset of the columns
to output as a data word. During write access the bitlines are actively driven in order to
overpower the cell and write a new value, logic ’1’ or ’0’.
The energy per access in an SRAM module depends on the number of rows and columns
in the bit cell array. When a full row is accessed for read, the switching energy per access in a
cycle can be estimated as:
$$E_{read} \approx E_{ctl\&dec,r} + N_C C_{WL,bit} V_{DD}^2 + N_C \left( N_R C_{BL,bit} V_{DD} \Delta V_{BL} + E_{sense\&output} \right) \qquad (2.19)$$
Here NR and NC are the numbers of rows and columns, Ectl&dec,r is the switching energy
from the decoder and control circuitry, CWL,bit is the wordline capacitance per bit, CBL,bit
is the bitline capacitance per bit, ΔVBL is the bitline swing, and Esense&output represents the
energy from the sense amplifier and other output stages. We see in this expression that when
both NR and NC become large, the bitline capacitance term, which grows with the product of
NC and NR, will become the dominant
term. To reduce the impact of this term it is customary to use a thin cell layout, and to reduce
the bitline swing by employing sense amplifiers that can amplify a smaller differential voltage.
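To see how the terms of equation (2.19) scale with the array size, the estimate can be evaluated term by term as in the Python sketch below. The function name and all default capacitance and energy values are hypothetical placeholders chosen only to make the example run; they do not describe a particular memory.

```python
def sram_read_energy(n_rows, n_cols, vdd, dv_bl, c_wl_bit=0.2e-15,
                     c_bl_bit=0.2e-15, e_ctl_dec=5e-15, e_sense_out=2e-15):
    """Term-by-term evaluation of equation (2.19), energies in joules.

    c_wl_bit and c_bl_bit are per-bit wordline and bitline capacitances,
    e_ctl_dec is the decoder and control energy and e_sense_out the sense
    amplifier and output energy per column (all assumed values).
    """
    terms = {
        "control_decoder": e_ctl_dec,
        "wordline": n_cols * c_wl_bit * vdd ** 2,
        "bitlines": n_cols * n_rows * c_bl_bit * vdd * dv_bl,
        "sense_output": n_cols * e_sense_out,
    }
    terms["total"] = sum(terms.values())
    return terms


small = sram_read_energy(n_rows=64, n_cols=64, vdd=0.4, dv_bl=0.1)
large = sram_read_energy(n_rows=512, n_cols=512, vdd=0.4, dv_bl=0.1)
for label, result in (("64 x 64", small), ("512 x 512", large)):
    share = result["bitlines"] / result["total"]
    print(f"{label}: bitline share of read energy is {share:.0%}")
```

Returning the individual terms makes it easy to see which contribution dominates for a given array size and bitline swing.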
Figure 2.7 shows the conventional 6T SRAM cell, which uses two back-to-back inverters
to store 1 bit on the complementary retention nodes (Q and Q̄). Writing is performed by
raising the wordline (WL) while forcing complementary values on the bitlines, BL and BL̄. In
order for the write to be successful, primarily the access transistors M5,6 need to be stronger
than M3,4. A read can be performed by raising both BL and BL̄ to the supply voltage, and then
raising WL. The current through M5,6 is integrated on the bitline capacitance and produces a
small differential voltage. This differential voltage is then amplified by the sense amplifier
and latched as a digital value. Usually the bitline capacitance is large, thus the access
transistors M5,6 will for a while be able to pull up the drain of M1,2. Therefore, to avoid overwriting
the cell during read, the retention transistors M1,2 must be stronger than M5,6. The fact that
read stability depends on weak M5,6 and write stability depends on strong M5,6 represents a
conflict. When the margins are reduced, such as when reducing the cell area or reducing the
supply voltage, it eventually becomes difficult or impossible to retain reliable access to the
cell.
[Figure 2.6 diagram: block diagram with the WL decoder, the bitcell array organized in rows and columns, timing and control, sense amplifiers and BL drivers, and the word multiplexer; address and control signals are inputs and the data word is the output.]
Figure 2.6: Overview of typical SRAM module organization, components and layout.
When the DC transfer functions of the two back-to-back inverters of the 6T cell are
plotted against each other, such as in Figure 2.8, the static noise margin of an SRAM cell can
be defined as the diagonal of the maximum square that can be fitted within the two curves, on
both sides of the tripping point. If there is no such square on either side, the cell is unstable
and will not be able to retain data.
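[Figure 2.7 schematic: retention transistors M1 to M4 with nodes Q and Q̄, and access transistors M5 and M6 connecting the nodes to the bitlines BL and BL̄ under control of the wordline WL.]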
Figure 2.7: Conventional CMOS 6-transistor (6T) SRAM
[Figure 2.8 plot: butterfly curves Vo(Vi) and Vi(Vo) on axes from 0 to 1.2 V, with the SNM square marked.]
Figure 2.8: Static noise margin (SNM)
An eﬀective way to simulate SNM can be found in [46]. The same method can also be
used to simulate an SNM for the read and write operation. SNMread is found when both
wordlines and bitlines are forced to the supply voltage. SNMwrite is found when wordlines
are at the supply voltage and bitlines are forced to complementary logic values that should
cause the cell to be overwritten. In the case of SNMwrite it should not be possible to ﬁt a
square within the two DC curves, therefore SNMwrite is often represented as a negative value.
Techniques to improve writeability of the 6T cell include boosting the WL voltage [47],
or using a virtual supply rail for the retention inverters, either collapsing VDD or boosting
ground [48]. Similarly, dual-rail schemes can also allow enhanced readability if a virtual VDD
is raised prior to access, or a virtual ground is lowered; a WL underdrive scheme will
also result in improved read stability [48].
To circumvent the conflict between read and write stability it is also possible to introduce
extra transistors. As seen in Figure 2.9, the additional read transistors in the 8T SRAM cell
allow for a much improved read stability at low voltages [49, 50]. In [51] this was used
together with sizing utilizing RSCE to achieve a Vmin of 260 mV. Another consideration that
can improve margins is optimizing device threshold voltages and utilizing multiple threshold
voltages [52].
Figure 2.9: 8T SRAM cell with separate read and write wordlines (RWL, WWL) and separate
read and write bitlines (BL, RBL)
[Schematic: transistors M1 to M8, storage nodes Q.]
Chapter 3
Summary of paper contributions
3.1 Paper I : Benefits of Decomposing Wide CMOS Transistors
into Minimum Size Gates
3.1.1 Introduction
Paper I [53] (see reprint on p. 53) focuses on exploiting the inverse narrow width effect
(INWE) of MOSFETs for subthreshold operation. NWE and INWE are typically a concern
in narrow devices made using shallow trench isolation (STI). INWE is reported to cause a
field enhancement that reduces the threshold voltage of the device [54]. For superthreshold
operation this is a concern, as it can cause a considerable increase in the subthreshold leakage
current [55]. For pure subthreshold operation the effect can, however, be beneficial, as
both leakage and drive current increase while parasitic capacitances are reduced. The
reduction in parasitic capacitance allows a substantial reduction in the switching energy, or
Power-Delay Product (PDP), for logic gates drawn using minimum-width devices of a type
that displays INWE. The combination of increased currents and reduced parasitic
capacitances allows faster operation, but with an increase in leakage.
To highlight potential benefits and drawbacks of using minimum-width transistors, the
paper investigates the properties of drive strength multiples obtained through parallel coupling of
minimum-width transistors. In the paper this is called a minimum-split transistor (MST).
The MST is compared to an iso-area design where the drive strength is adjusted by increasing
the width of a single wide transistor (WT). A Monte Carlo experiment conducted in MATLAB illustrates
some statistical properties of the drive current of MSTs.
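The kind of numerical experiment described above can be sketched in a few lines. This is not the MATLAB code of Paper I; it is a minimal Python illustration under stated assumptions: subthreshold current depends exponentially on the threshold voltage shift, the shift is normally distributed with a Pelgrom-type standard deviation A_VT/sqrt(WL), current scales linearly with width, and no INWE-related shift of the nominal threshold voltage is included. All numeric values and names are hypothetical.

```python
import math
import random


def mc_drive_currents(n_split=4, samples=20000, seed=1):
    """Sample normalized on-currents of an MST and of an iso-width WT.

    Returns two sorted lists (MST, WT), in units of the nominal current
    of one minimum-width device.
    """
    # Illustrative assumptions, not values taken from Paper I.
    a_vt = 3.5e-9          # Pelgrom coefficient in V*m (3.5 mV*um)
    w_min = 0.12e-6        # minimum width in m
    length = 0.06e-6       # channel length in m
    n_ut = 1.4 * 0.02585   # slope factor times thermal voltage in V
    sigma_min = a_vt / math.sqrt(w_min * length)
    sigma_wide = a_vt / math.sqrt(n_split * w_min * length)
    rng = random.Random(seed)
    mst, wt = [], []
    for _ in range(samples):
        # MST: n_split unit devices, each with its own independent Vth shift.
        mst.append(sum(math.exp(-rng.gauss(0.0, sigma_min) / n_ut)
                       for _ in range(n_split)))
        # WT: one device carrying n_split units of current with a single shift.
        wt.append(n_split * math.exp(-rng.gauss(0.0, sigma_wide) / n_ut))
    return sorted(mst), sorted(wt)


def quantile(sorted_values, q):
    """Empirical quantile by nearest rank on an already sorted list."""
    index = min(len(sorted_values) - 1, max(0, int(q * len(sorted_values))))
    return sorted_values[index]


mst_currents, wt_currents = mc_drive_currents()
mst_low = quantile(mst_currents, 0.01)   # weak-tail current of the MST
wt_low = quantile(wt_currents, 0.01)     # weak-tail current of the WT
```

Comparing a low quantile of the two lists gives a worst-case on-current comparison of the kind reported in the paper; the outcome depends on the assumed ratio between the threshold voltage spread and the subthreshold slope.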
3.1.2 Summary of results
Main results of Paper I include an inverter design that for iso-areas exhibited reduced delays
(by 35%–40%) and reduced PDP (by 40%–43%) for the switching of an inverter in a ring
oscillator experiment. The Monte Carlo experiment shows that multiple MSTs have similar
or improved worst-case on-currents when compared to iso-width WTs.
3.1.3 Errata
1. In Paper I, Fig. 2, 0.6m should have read 0.6 μm.
2. In Paper I, in the legend of Fig. 4, the words mean and nominal were accidentally
switched.
3.1.4 Postscript : Measurements
A few months prior to the writing of Paper I, two ring oscillators were sent to manufacturing.
The transistor topologies are slightly diﬀerent to those of Paper I. However, the measurement
results still display positive results for the MST variant, at the lowest voltages.
In the experimental setup two ring oscillators were implemented, each with 11 stages. For
the MST inverter both PMOS and NMOS had 4 parallel transistors of minimum width. A
layout of the two inverter cells can be seen in Figure 3.1a. A low threshold voltage device was
used for the PMOS and a standard threshold voltage device was used for the NMOS. The
PMOS employed minimum length, while the length of the NMOS was slightly increased
to 0.12 μm to allow equal pull-up and pull-down currents for the MST inverter at VDD =
300 mV. The wide gate structure had two gates 0.54 μm wide for both PMOS and NMOS,
and a shared drain junction. In both oscillators the inverters were spaced equidistantly to make
the wiring identical.
Measurements of the oscillator frequencies as a function of VDD are shown in the top part of
Figure 3.1b. The WT design was functional down to a minimum supply voltage of 140 mV. At
this VDD the WT oscillator frequency (fWT) was 7.25 MHz, while the MST design was 50.3%
faster, operating at fMST = 10.9 MHz at the same operating voltage. The minimum VDD for
the MST design was about 80 mV, and at this voltage it was running at fMST = 2.62 MHz. The
relative speed difference between the two designs decreased when the supply voltage
was increased. At about 311 mV the two oscillators were operating at the same frequency
of about 178 MHz. At 500 mV the WT design was running at 908 MHz while the MST
design was about 26.4% slower, running at 668 MHz. Unfortunately, it was not possible to
measure the power consumption because, on this multi-project chip, the power supply
had to be connected to other circuits through the pad ring.
(a) Layout of manufactured inverters. MST (left) and WT (right).
(b) MST and WT operating frequency (top) and relative speed difference (bottom).
[Plots: oscillator frequency in Hz (logarithmic axis) versus Vdd, and speed improvement of MST in % versus Vdd, for Vdd from 0 to 0.5.]
Figure 3.1: Layout and measurements of MST and WT ring oscillator.
3.2 Paper II : Multi-Objective Optimization of Minority-3
Functions for Ultra Low Voltage Supplies
3.2.1 Introduction
When a circuit has several conflicting performance goals, finding good and appropriate
trade-offs between them can become a time-consuming task. Many
traditional circuit optimization techniques focus on one performance variable, while other
performance measures are regarded as constraints which a designer must set from a priori
knowledge of the circuit and the environment it is expected to operate in. Multi-objective
optimization takes a step back and allows a search for multiple solutions in a multidimensional
performance space. Only Pareto optimal solutions are kept; that is to say, the result of the
algorithm consists only of efficient trade-offs, where no solution is dominated by any
other potential solution. A solution is dominated if another solution is at least as good on all
performance measures and strictly better on at least one. Following a multi-objective optimization
run, a designer, or perhaps a place-and-route algorithm, may choose from the results a
candidate particularly suited to the intended application.
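The notion of Pareto optimality can be made concrete with a small filter. The sketch below is a generic illustration, not the optimization algorithm used in the papers; it assumes all objectives are to be minimized, and the example tuples of (area, leakage power, robustness error) are invented for the demonstration.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Return the non-dominated points, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates: (area, leakage power, robustness error).
candidates = [(1.0, 5.0, 3.0), (2.0, 4.0, 3.0), (2.0, 6.0, 4.0), (3.0, 3.0, 1.0)]
front = pareto_front(candidates)
```

In this example the third candidate is removed because the first is better in every objective, while the remaining three each win on at least one objective and therefore all stay on the front.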
Paper II [56] (see reprint p. 59) presents a method used to explore the performance space
of three implementations of the minority-3 logic function operating at a supply voltage of
150 mV. Three conflicting design goals (area, leakage power, and a measure of robustness) are
evaluated for each circuit. These performance goals are co-optimized using multi-objective
optimization, so the optimization result for each circuit is a Pareto front. At specific
performance values we further evaluate each circuit with a ring oscillator experiment, where we
include measures of the ring oscillator's frequency and energy consumption.
3.2.2 Summary of results
The presented multi-objective optimization method was developed to co-optimize three
conflicting design objectives: circuit area, static leakage power, and robustness. The robustness
measure is novel and was developed primarily as an optimization goal for comparing
the circuits' abilities to present the ideal logic output voltage. All performance measures are
evaluated several times for the same circuit in a Monte Carlo run. The approximated Pareto
fronts resulting from the multi-objective optimization are plotted for all circuits. We found
that for the supply voltage of 150 mV the performance space was dominated by the 10T
mirrored gate implementation [57] at small circuit areas. At larger areas a 22T standard CMOS
implementation using inverters and 2-input NAND and NOR gates was potentially more
robust. While the optimization method did not take speed into account, the 12T mirrored
gate implementation [58] was slightly faster than the 10T implementation in the iso-area ring
oscillator experiment.
3.2.3 Postscript
Following the presentation at ISCAS, further analysis of the data in this paper was performed.
Specifically, we investigated the width and length sizing parameters of the circuits that were
part of the Pareto front. For series transistors, the width parameters Wdp and Wdn and the
common length parameter L are shown in Figure 3.3. The particular optimization problem,
as stated in the paper, tries to optimize gates for maximum stability (minimizing “RMSE”)
while simultaneously minimizing power and area. This figure illustrates
that the optimization method can choose parameter values that may deviate
from those of a conventional sizing method, because it incorporates short- and narrow-channel
effects as well as variability based on results from the Monte Carlo simulation. The algorithm has preferred
to scale NMOS in primarily two regimes: first, for small-area transistors, L is kept slightly
above Lmin while W is scaled until it reaches approximately 370 nm; second, for larger-area
transistors, W is kept constant around 370 nm and L is increased. For PMOS, minimizing
RMSE, power and area results in a Wmin just under 1 μm. For larger areas both W and L are
then scaled, although the ratio decays somewhat. Ratios Wdp/Wdn are typically in the
range 5× to 20×. This is not an untypical ratio for subthreshold sizing according to [59].

Figure 3.2: Minority-3 functions investigated in Paper II. Named after the number
of transistors in the circuit: 22T, 12T, 10T.
[Schematics of the 22T, 12T and 10T gates with inputs A, B, C and output Min3.]

Figure 3.3: Pareto sets showing width and length parameters Wdn, Wdp, L corresponding
to the Pareto Front in Fig. 2 of Paper II for areas less than 40 μm2 and RMSEs less
than 4 mV. (Marker symbols purely for visibility.)
[Plots: Wdp in μm versus L in μm, and Wdn in μm versus L in μm, with Lmin marked.]
3.2.4 Errata/Note
1. In Fig. 2 the Pareto fronts are referred to as Pareto sets. This is not entirely uncommon;
however, it differs from the terminology applied elsewhere in this thesis.
3.3 Paper III : Muller C-elements based on Minority-3 Functions
for Ultra Low Voltage Supplies
3.3.1 Introduction
Paper III [60] (see reprint p. 65) expands on the results of Paper II. Results from the
multi-objective optimization of Paper II are given for two additional implementations of
the minority-3 function: a 6T 'ratioed-fight' implementation and a variation
of the 12T implementation of Paper II. From the five different minority-3 implementations,
a 10T and a 22T implementation are selected to form two-input Muller C-elements, which
the paper refers to as the 12T and the 24T Muller C-element, since the Muller C-elements are
formed by adding a two-transistor inverter and a feedback connection. The Muller C-element,
named after D. E. Muller who introduced it in [61], is a logic function that switches state
only when all its inputs hold the same logic value. Muller C-elements can therefore be used
as memory elements. One common use is to facilitate correct timing for asynchronous logic
[62]. As such, Muller C-elements are common building blocks of Null Convention Logic [63].
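The construction of a C-element from a minority-3 gate, an inverter and a feedback connection can be illustrated behaviourally. The sketch below is a logic-level model only; it assumes that the inverted minority-3 output is fed back as the third gate input, which is one natural reading of the description above, and it says nothing about the transistor-level circuits of Paper III. The function names are hypothetical.

```python
def minority3(a, b, c):
    """Minority-3: the output is 1 when at most one of the three inputs is 1."""
    return 0 if (a + b + c) >= 2 else 1


def c_element_step(a, b, q):
    """One evaluation of a two-input Muller C-element.

    The minority-3 gate sees the inputs a and b together with the
    fed-back output q; the added inverter restores the polarity.
    """
    return 1 - minority3(a, b, q)


# The output follows the inputs only when they agree, and holds otherwise.
q = 0
trace = []
for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]:
    q = c_element_step(a, b, q)
    trace.append(q)
```

When the two inputs disagree, the fed-back output casts the deciding vote and the state is retained, which is why the element can serve as a memory element.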
3.3.2 Summary of results
The two candidates (12T and 24T) for an Ultra-Low Voltage Muller C-element were resized
to obtain equal rise and fall times, and were further investigated under operation at
150 mV and 300 mV and at three different temperatures: −20 °C, 27 °C and 85 °C. The influence
of process variations was investigated using Monte Carlo simulations, and characteristics for
switching energy, power consumption and propagation delay were evaluated.
At room temperature and a supply voltage of 150 mV the 12T implementation of the
Muller C-element had a switching delay of approximately 16.22 μs, approximately 10% faster
than the 24T implementation. The 12T's static power consumption was on average 2.62 pW,
or just 35% of that of the 24T implementation. The switching energy was also approximately
44% lower in the 12T implementation. The relative comparison was fairly constant at the
other temperatures and at the increased supply voltage of 300 mV. At 300 mV the absolute
switching energy was, however, approximately quadrupled, while the propagation delay was
reduced by a factor of about 5. Process variations and temperature had a very strong
influence on the propagation delays, which ranged over 3 orders of magnitude from worst case at
−20 °C to best case at +85 °C.
Overall the 12T implementation, also known as the Sutherland implementation [64],
appeared to be the superior candidate for a ULV Muller C-element, although the results of the
multi-objective optimization indicate through the robustness measure that the 24T
implementation may be a more robust solution at large circuit areas.
3.4 Paper IV : Design of 9T SRAM for Dynamic Voltage
Supplies by a Multiobjective Optimization Approach
3.4.1 Introduction
Memory is a central component in modern microsystems and SRAM is a typical choice
for applications such as cache or FIFOs. Low voltage operation of SRAM is desirable to
reduce leakage power in standby and low power modes, or simply to allow compatibility
with other circuits operating at low voltages without the need for interface circuitry such
as level shifters. It is also possible to reduce access energy in SRAM by operating at lower
voltages; this is, however, limited to small to moderately sized SRAM blocks (cell arrays), as
the proportion of leakage energy per access grows with memory size.
For operation at very low supply voltages (below approximately 600mV for 65 nm and
90 nm technologies) the traditional 6T SRAM cell quickly runs into reliability issues as read
and write reliability are in conﬂict by design, i.e. increasing read reliability will lower write
reliability, and vice versa [65]. This conﬂict can be solved by using additional transistors to
decouple the read access from the cell, as for instance in [65, 66].
Paper IV [67] (see p. 73) focuses on the design of a 9T SRAM cell, with an emphasis
on reliable operation at 300mV. The 9T SRAM cell solves the read-write reliability con-
ﬂict by decoupling the read-out signal so current is provided externally to the back-to-back
memory-retaining inverters. Thus the 9T cell also features independent read and write ac-
cess. Compared to the 8T SRAM cell of [66] the readout signal is diﬀerential, thus allowing
traditional sensing techniques for the bitline.
In the paper a multi-objective optimization approach is used for transistor sizing. This
method provides a global picture of all Pareto optimal solutions and allows us to present a
variety of resource eﬃcient implementation choices. A sample Pareto optimal cell is investi-
gated further in the paper, using Monte Carlo simulation.
Paper IV was written in collaboration with the Schaltungstechnik research group
from the Heinz-Nixdorf Institute, University of Paderborn. Additional and comprehensive
information on the multi-objective optimization method related to this paper is available in
German in Matthias W. Blesken's Ph.D. thesis [42].
3.4.2 Summary of Results
A main result in the paper is the multi-objective optimization method that was designed to
facilitate a choice of resource-efficient sizings for the 9T SRAM cell. The method optimizes
σ(SNMhold), cell area, SNMwrite, leakage power, and read and write delay times at supply
voltages of 300 mV and 1.2 V. A sample resource-efficient sizing was investigated more
thoroughly. Here, the leakage per cell was very low at 0.26 pW. This is two orders of magnitude
lower than the design references that were used for comparison [47, 66, 68, 69], although
the references include leakage from peripheral access circuits. Stability was indicated as good,
with the typical SNMhold value located 8.56σ away from the typical simulation failure
criterion. The read and write delay optimization goals have typical values of 0.268 μs, indicating an
expected performance comparable to or better than that of [47, 66, 68, 69].
3.4.3 Supplement: Deriving equation (2) of paper IV
We assume that variability in the threshold voltage (Vth) of a transistor can be modeled as a
random variable following a normal distribution, with the standard deviation of the threshold
voltage modeled using equation (2.16):

\[ \sigma(V_{th}) = \frac{A_{VT}}{\sqrt{W_{eff} L_{eff}}} \]

We also assume that the static noise margin (SNM) can be expressed in the linear form:

\[ \mathrm{SNM} = R_p V_{th,p} + R_n V_{th,n} + M \tag{3.1} \]

where Rp, Rn and M are constants. Since the means are constants we can rewrite this as:

\[ \mathrm{SNM} = R_p \Delta V_{th,p} + R_n \Delta V_{th,n} + M' \tag{3.2} \]

where M′ is a constant that incorporates the means, and the ΔVth values are random variables.
The variance of the sum of two independent RVs taken from the normal distributions
a·N(0, σ1) and b·N(0, σ2), where a and b are constants (linearly transforming the normal
distribution), is given by:

\[ \sigma_\Sigma^2 = a^2 \sigma_1^2 + b^2 \sigma_2^2 \tag{3.3} \]

The variance of the sum of the RVs (RpΔVth,p + RnΔVth,n) can thus be expressed as:

\[ \sigma_\Sigma^2 = R_p^2 \left(\sigma(V_{th,p})\right)^2 + R_n^2 \left(\sigma(V_{th,n})\right)^2 \tag{3.4} \]

Using equation (2.16) we then insert for σ(Vth) and get:

\[ \sigma_\Sigma^2 = \frac{R_p^2 A_{VT,p}^2}{W_p L_p} + \frac{R_n^2 A_{VT,n}^2}{W_n L_n} \tag{3.5} \]

Comparing this to equation (2) we see that we should have:

\[ \sigma_n^2 = \frac{R_n^2 A_{VT,n}^2}{W_{0,n} L_{0,n}}, \qquad \sigma_p^2 = \frac{R_p^2 A_{VT,p}^2}{W_{0,p} L_{0,p}} \tag{3.6} \]

To find σn we fix W0,n, L0,n, W0,p and L0,p, and determine σn by setting AVT,p = 0, running
a Monte Carlo simulation and calculating the resulting standard deviation of SNMhold. We
determine σp in an equivalent manner by setting AVT,n = 0. Since the lengths L1 = L0,n =
L3 = L0,p are constant and equal in the application in Paper IV, we can simplify equation (3.5)
to equation (2).
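The variance expression can be checked numerically. The sketch below is an illustration only: it replaces the circuit-level Monte Carlo simulation of SNMhold with direct sampling of the assumed linear model, and all coefficient values are invented. It also mirrors the procedure for isolating one contribution by setting the other Pelgrom coefficient to zero.

```python
import math
import random


def predicted_sigma(r_p, r_n, a_vt_p, a_vt_n, wl_p, wl_n):
    """Standard deviation of the linear SNM model, following equation (3.5)."""
    return math.sqrt(r_p**2 * a_vt_p**2 / wl_p + r_n**2 * a_vt_n**2 / wl_n)


def sampled_sigma(r_p, r_n, a_vt_p, a_vt_n, wl_p, wl_n, samples=50000, seed=7):
    """Sample standard deviation of Rp*dVth,p + Rn*dVth,n by direct sampling."""
    rng = random.Random(seed)
    s_p = a_vt_p / math.sqrt(wl_p)   # sigma(Vth,p) from equation (2.16)
    s_n = a_vt_n / math.sqrt(wl_n)   # sigma(Vth,n) from equation (2.16)
    values = [r_p * rng.gauss(0.0, s_p) + r_n * rng.gauss(0.0, s_n)
              for _ in range(samples)]
    mean = sum(values) / samples
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (samples - 1))


# Hypothetical values: sensitivities (dimensionless), A_VT in mV*um, W*L in um^2.
args = (0.4, 0.6, 3.0, 4.0, 0.05, 0.02)
sigma_model = predicted_sigma(*args)
sigma_mc = sampled_sigma(*args)
# Setting A_VT,p = 0 leaves only the NMOS contribution, as when finding sigma_n.
sigma_n_only = predicted_sigma(0.4, 0.6, 0.0, 4.0, 0.05, 0.02)
```

The two estimates agree to within the sampling error, which shrinks as the number of samples grows.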
3.5 Paper V : A 65 nm 32 b Subthreshold Processor with 9T
Multi-Vt SRAM and Adaptive Supply Voltage Control
3.5.1 Introduction
Paper V [70] (see p. 79) presents overall design methods and measurement results from a
32 b very long instruction word (VLIW) 65 nm subthreshold processor [71], together with a
64×32 block of the 9T SRAM cell [67]. Paper V was written in collaboration with the
Cognitronics and Sensor System Group, University of Bielefeld, and the Schaltungstechnik
research group from the Heinz-Nixdorf Institute, University of Paderborn. While this thesis
only concerns itself with content relating to the SRAM block of this paper, additional and
comprehensive information on the subthreshold processor can be found, in German, in Sven
Lütkemeier's Ph.D. dissertation [72].
Micropower processing in the subthreshold domain can be very beneficial, as reduced
supply voltages often yield large savings in the energy per computational operation: savings
on the order of 10×–20× [73] compared to operation at a nominal
supply voltage of 1.2 V. A subthreshold processor could therefore be a natural part of many
computational systems aiming at minimal energy and/or power consumption. The processor
presented in the paper is a subthreshold implementation of the CoreVA architecture [74].
SRAM is an indispensable part of many digital systems, with important applications such
as in cache, FIFO buﬀers or register ﬁles, and it is of interest to be able to operate SRAM at
the same voltage as surrounding circuits. Design in the subthreshold domain is demanding,
as increased susceptibility to variations, particularly in the transistor threshold voltage [23],
incurs large timing variations and presents hazardous sources of error for hold and setup
timing closure. One must therefore take this increased variation into account and mitigate
the impact on failure rates and delay variations. The paper also details a performance and
power management subsystem that provides dynamic voltage and frequency scaling (DVFS)
combined with an adaptive supply voltage generation for dynamic PVT compensation.
3.5.2 Summary of results
The processor was synthesized using standard cells, with high efficiency ensured by a
multi-objective optimization strategy presented in [44]. In measurement, the CoreVA processor
achieves a minimum energy per instruction of 9.94 pJ at 325 mV. The lowest operating
voltage is 200 mV for the best samples with a clock frequency of 10 kHz, while the mean speed is
94.32 MHz at 1.2 V. The average energy per cycle is 110.22 pJ at 1.2 V, a factor of 11.1 higher than at
the minimum energy operating point.
The 2 kb 9T SRAM macro presented in the paper achieves minimum energy per opera-
tion at averages of 321mV (0.03σ/μ), 0.57 pJ (0.037σ/μ) and 730 kHz (0.184σ/μ). Maxi-
mum operating frequencies at the minimum operating voltage fell in the range from 448 kHz
to 1016 kHz. All 38 measured samples were functional from 1.2V down to 280mV. Best
samples operated error-free down to 230mV. At 171mV 1% of the bits experienced reten-
tion errors. Average leakage per cell was 17.8 pW at 0.3V.
3.6 Paper VI : Yield-Oriented Energy and Performance Model
for Subthreshold Circuits with Vth Variations
3.6.1 Introduction
Paper VI [75] (see p. 93) extends previous work [76, 77] to also model the impact of RDF on
various performance metrics for subthreshold logic circuits. Although many sources
contribute to variability in nanoscale subthreshold circuits, RDF is often the dominant source
of intra-die current and delay variation [23]. Investigating this topic is important as it helps
in understanding how overall system performance degrades when scaling up the number of
transistors in a sequential circuit.
The analysis consists of several steps. First, fairly basic subthreshold current equations [23]
are developed in order to express a worst-case current based on Vth variance and gate area.
Later metrics can thus be calculated taking into account a targeted circuit complexity and
yield, as well as gate sizing. In the analytical expressions the propagation delay is lognormally
distributed. Using an approximation for the sum of i.i.d. lognormals, we evaluate a worst-case
clock period as a function of the number of gate delays in series and the load capacitance. We then
follow steps similar to [76] in order to find analytical expressions for the optimal supply voltage
for minimum energy operation.
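The role of a lognormal-sum approximation can be illustrated with a short sketch. The particular approximation used in Paper VI is not restated in this summary, so the code below substitutes the well-known Fenton-Wilkinson moment-matching method as a stand-in; it should not be read as the paper's method. It replaces the sum of n i.i.d. lognormal gate delays with a single lognormal that has the same mean and variance, from which an upper quantile can be read as a worst-case path delay. Function names and example values are hypothetical.

```python
import math


def fenton_wilkinson(mu, sigma, n):
    """Parameters (mu_s, sigma_s) of a lognormal whose mean and variance
    match those of the sum of n i.i.d. lognormal(mu, sigma) variables."""
    sigma_s_sq = math.log(1.0 + (math.exp(sigma**2) - 1.0) / n)
    mu_s = math.log(n) + mu + 0.5 * sigma**2 - 0.5 * sigma_s_sq
    return mu_s, math.sqrt(sigma_s_sq)


def worst_case_sum(mu, sigma, n, z):
    """Approximate quantile of the sum at z standard deviations
    of the underlying normal distribution."""
    mu_s, sigma_s = fenton_wilkinson(mu, sigma, n)
    return math.exp(mu_s + z * sigma_s)


# Example with assumed values: 20 gate delays in series, each with a
# normalized median of 1 (mu = 0) and log-domain sigma of 0.5.
typical_path = worst_case_sum(0.0, 0.5, 20, 0.0)
slow_path = worst_case_sum(0.0, 0.5, 20, 3.0)
```

Moment matching of this kind is simple to evaluate but is generally less accurate in the tails when the log-domain sigma is large, so other approximations may be preferable for tight yield targets.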
3.6.2 Summary of results
The main results of Paper VI are the analytical expressions developed during the analysis.
These enable a designer to fairly quickly evaluate relevant worst-case conditions in the
subthreshold domain. The analysis derives worst-case expressions for: on-current (Ion), on/off
ratios (Ion/Ioff), propagation delay of simple gates (tp), clock period for series-connection of
identical gates (tclk), and energy per cycle (Eop). Results are presented in contour plots,
visualizing the subthreshold digital logic design space in a 90 nm process with respect to gate area
and supply voltage parameters. An analytical expression is found for the optimum supply voltage
for minimum energy operation, given that the circuit operates at the worst-case clock period
tclk:
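\[ V_{DD,opt} = n U_T \left[ 2 - W_{-1}\!\left( -\frac{P\alpha}{N M G(\cdot)} \exp\!\left( \frac{A_{VT}^2}{2 W L n^2 U_T^2} - Q \frac{A_{VT}}{\sqrt{W L}\, n U_T} - 2 \right) \right) \right] \tag{3.7} \]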
This expression takes into account threshold voltage variations, which to our knowledge has
not been done analytically in previous works, although such variations significantly affect the minimum
energy operating point. Since the analysis relies on subthreshold current expressions, near-
and super-threshold operation are not taken into account. For pure subthreshold circuits we
find that Vth fluctuations significantly increase the energy per cycle, and also that Vth
variation increases the required VDD and gate area to achieve minimum energy operation. For
the technology we evaluated, we found that gate sizes several times the minimum may be
required for minimum energy designs, primarily as this reduces the impact of RDF on delay.
Importantly, the paper may help the reader gain an understanding of and intuition for very
significant yield-related effects applicable to the design of circuits in the subthreshold domain.
Figure 3.4: Yield contours for equation (1) of Paper VI.
[Plot titled "Yield vs failure probability for identical components": failure probability Pfail versus number of components Nc on logarithmic axes, with contours at 99% and 99.99% and horizontal lines labeled 3 to 7.]
3.6.3 Supplementary material
To supplement the paper and the later discussion, yield contours for equation (1) of Paper VI are
plotted in Figure 3.4. Horizontal dashed lines are drawn for sigma values from the normal
distribution, corresponding to the failure rate Pfail. We see that a fixed 6σ failure rate for 10 k
components will result in a yield of 99.99%.
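The relation between component failure rate, component count and yield can be explored with a small calculation. Equation (1) of Paper VI is not reproduced in this summary, so the sketch below assumes the common form in which all components fail independently with the same probability, giving a yield of (1 − Pfail) raised to the power Nc. It also assumes a one-sided normal tail when converting a sigma value to a failure probability. Both assumptions and the function names are illustrative.

```python
import math


def tail_probability(n_sigma):
    """One-sided normal tail probability beyond n_sigma standard deviations."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


def yield_identical(p_fail, n_components):
    """Probability that all n_components independent components work.

    log1p keeps precision when p_fail is far smaller than 1.
    """
    return math.exp(n_components * math.log1p(-p_fail))


# Example: yield of 10 k identical components for several sigma targets.
yields = {n_sigma: yield_identical(tail_probability(n_sigma), 10_000)
          for n_sigma in (3, 4, 5, 6, 7)}
```

Sweeping the component count for a fixed sigma target reproduces the general shape of a yield contour: the tolerable failure probability falls roughly in inverse proportion to the number of components.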

Chapter 4
Discussion
4.1 Exploiting INWE
The use of minimum-width and minimum-length sized transistor arrays, rather than the
conventional scaling of width, may have applications in modern technologies, also in superthreshold
operation. A recent paper [78] reports performance and energy gains of minimum-sized
transistor arrays based on a 16 nm predictive technology model operated at VDD = 0.8 V. For
instance, a two-input NAND shows a performance (speed) gain of 61% and an energy gain of
57% at a cost of a 22% area increase compared to conventional scaling. These are substantial
gains indeed; still, one may question how trustworthy these simulation results
are. Specifically, while NWE is modeled in both BSIM3v3 and BSIM4, the inverse narrow
width effect (INWE) is not modeled by either [79], although BSIM4 has a length-dependent
reverse narrow width term for short channels. It is, however, sometimes seen that foundries
use the NWE parameters to model effects that could include INWE, utilizing negative
coefficients and/or wrapping equations around the central BSIM NWE model parameters (K3,
K3B and DVT0W-DVT2W [25]). On the other hand, other foundry models do not specify
the parameters required to include this effect at all, in effect setting these coefficients to zero.
The measurement results presented in Section 3.1.4, Figure 3.1, indicate that there could
potentially be substantial gains at low VDD if minimum-sized transistor arrays are employed,
and INWE combined with reduced gate capacitance could at least partially explain why.
However, these results do not match simulated results well. In order to fully
utilize effects such as INWE in the weak or moderate inversion regions, improved models are
required to properly quantify the effects. Needless to say, a combined speed and energy
improvement of minimum-sized transistor arrays would be welcome news for many
applications. However, these results are based only on a single sample, due to great difficulty in
extracting the data, and should therefore not be considered conclusive.
Although the topic of minimum-sized arrays of transistors was set aside, partly due to
poor transistor models in a circuit design context, others have developed the concept
further. The impact of INWE on subthreshold device sizing is investigated in [80]. In [81]
the use of INWE-aware sizing to construct a low-voltage standard cell library is investigated.
Applying this to a baseband processor, the authors indicate gains of up to 20% less delay, up to
34% less power consumption and up to 47% less area. In [82] it is advocated to use arrays
of optimally sized transistors to improve delay, power and reliability, through W/L sizing
allowing adjustment of Vth and σ(Vt) of individual unit transistors.
In subthreshold, the sum of i.i.d. lognormally distributed currents may in some respects
show superior statistical qualities compared to the single current of an iso-area transistor,
even without a baseline shift of Vth by effects such as SCE, NWE and INWE, as illustrated
in the numerical experiment of Paper I. How the statistical properties of the sum of i.i.d.
subthreshold currents behave is, however, strongly dictated by the magnitude of the ratio
σ(Vt)/(nUt); for a subthreshold current, this ratio corresponds to σ as it appears in Fig. 5
of Paper VI, which shows the right tail of a lognormal sum. Notably, skewing effects
on the sum of i.i.d. subthreshold currents increase with decreasing temperature.
4.2 Multi-objective optimization of ULV circuits
In the context of the particular MOOP of Paper II, the objectives area and power were fairly
closely related, particularly for areas between 10 μm2 and 40 μm2 for the 10T and 12T gates.
This should perhaps come as no surprise when noting that area is related to the capacitance,
and capacitance is related to switching power. However, for small areas and for the 22T gate,
the relationship with power was not quite that clear. In later analysis the area was also found
to be related to the static leakage and switching energy, with a near-linear relationship
in the 10 μm2 to 40 μm2 range. For propagation delay the area relation was slightly non-linear
but still monotonic. As a lesson learned, in the context of this MOOP, many aspects could
be co-optimized by simply calculating the circuit area or, perhaps better in some respects, the
total capacitance. The main objective in conflict with the area objective was our objective for
optimizing 'robustness' (reliability): the RMSE of the output voltage. As can be seen in
Fig. 2 of Paper II, the area objective has a very large effect on the RMSE for very small areas,
with a decreasing influence as the area grows.
In Paper II, where 10T, 12T and 22T minority-3 gates are compared, it may be of interest
to note that the lower bounds on RMSE, for a given area, seem to covary with the
number of transistors in the output function. For the metrics for active power and RMSE, the
relationship seems weak until RMSE approaches a limit where the power consumption rises
fairly sharply.
As a final note on the circuits investigated in Papers II and III, for a fully comprehensive
comparison of minority-3 and C-element circuits in the subthreshold domain, it is probably
of interest to also include the topology of the van Berkel implementation [83]. Since the van
Berkel implementation utilizes 3 transistors in series, its Vmin is likely higher than that of the
Sutherland implementation. Properties such as area, minimum energy and operating speed
may however compete with the Sutherland implementation at moderate supply voltages.
As seen in Papers II, III, and IV, multi-objective optimization (MOO) can be used to
approximate the Pareto Front, which in the context of the optimization problem represents
all resource-eﬃcient tradeoﬀ alternatives for a circuit. Thus MOO represents a powerful
method for exploration of the potential performance space of a circuit. Given a speciﬁc
optimization problem, MOO can be used to compare circuit topologies and ﬁnd the optimal
implementation for a speciﬁc desired performance range. Visualization or metrics from a
Pareto Front can quickly provide performance limits of a speciﬁc topology. For applications
where speciﬁc performance requirements are available, selection of the best suited tradeoﬀ
alternative(s) is fairly easy. For applications where requirements are not too speciﬁc, the
Pareto front can be used in a further cost-beneﬁt analysis to suggest optimal allocation of
resources in a larger context.
Thus, when the combined effects of subthreshold phenomena such as SCE, NWE,
INWE and RDF may otherwise cause design difficulties due to the complex design space,
MOO has, in principle, the potential to bypass such problems. Several challenges are,
however, involved in the successful application of any MOO algorithm in the context of circuit
design. The formulation of the optimization problem, that is, the objective functions and the
constraints on the search space, plays a critical role in achieving quality results within a
reasonable computational time.
Another obvious concern for the application of multi-objective optimization is the evaluation
time of the objective functions. For circuit optimization, macro-modeling can greatly reduce
the computational effort, but it cannot offer the same detail as circuit simulation. For large
populations and long simulations, parallelization can be used to reduce the time spent in
function evaluation. To incorporate variability in simulations one could introduce corner
simulations or Monte Carlo simulations, although both will increase the time spent in
simulation. Corner simulations for mismatch typically fix all devices at the same parameter
deviation, and are therefore likely to yield unrealistic results. Monte Carlo simulation, on the
other hand, introduces randomness into the function evaluation, which can cause convergence
difficulties for the MOO algorithm. To reduce the impact of random errors one could greatly
increase the number of simulations. An intermediate solution is to use the same seed or
sampling for every function evaluation, as in Papers II and III, so that any error is kept
constant, although one must keep in mind that this can introduce a certain bias. A challenge
related to using Monte Carlo simulation in conjunction with MOO is thus how to sample the
distribution effectively. Several approaches exist, with Latin hypercube sampling and
orthogonal sampling among the more common methods. Another way could be to tailor corners
to the specific application, such as in [84]. The problem of sampling when optimizing a circuit
is made more difficult by the influence of W/L scaling on the contributing distributions, and by
the lognormal distribution of currents in subthreshold. Circuit simulators also often provide
DC sensitivity analysis. This can be used to a certain extent, but since it evaluates sensitivity
at a fixed operating point, it could lead to errors when estimating the variance of aggregated
sources of variation. When it is possible and practical, a solution may also be to determine the
variance analytically, thus eliminating the need for Monte Carlo simulations.
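The idea of reusing the same seed or sample set for every function evaluation can be sketched
as follows. This is a toy model and not the simulation setup of Papers II and III: the delay
expression, the mismatch coefficient A_VT and the slope term N_VT are assumptions chosen only
to make the example self-contained.

```python
import math
import random
from typing import List

N_VT = 1.5 * 0.02585   # assumed slope factor n times thermal voltage [V]
A_VT = 3.5e-3          # assumed Pelgrom-style mismatch coefficient [V*um]


def unit_normal_samples(n: int, seed: int) -> List[float]:
    """Draw n standard-normal samples from a private, seeded generator."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def worst_relative_delay(area_um2: float, z: List[float]) -> float:
    """Toy objective: largest delay, relative to nominal, over the sample set.

    Each unit-normal sample is scaled to a Vth offset with sigma = A_VT / sqrt(area),
    and the delay is taken to grow as exp(dVth / (n * V_T)).
    """
    sigma_vth = A_VT / math.sqrt(area_um2)
    return max(math.exp(zi * sigma_vth / N_VT) for zi in z)


# The same samples are reused for every candidate, so the objective is deterministic
# and differences between candidates are not masked by sampling noise.
Z = unit_normal_samples(200, seed=1)
small = worst_relative_delay(0.05, Z)
large = worst_relative_delay(0.20, Z)
print(small, large)
```

Because the unit-normal samples are scaled by the area-dependent sigma inside the objective,
the same sample set remains usable while the optimizer changes the transistor area. Any bias
in the fixed sample set is, however, carried into every evaluation.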
Statistical simulations of standard cells could perhaps also be avoided during the optimization
stage if all pull-up and pull-down networks can be constrained to achieve equal or
comparable worst-case conditions in their current distribution. Final evaluation of reliability
can then be done in a second step. A constraint for the relative area of single PMOS and
single NMOS that to a large degree would equalize the variance of their subthreshold Ion
current can be found in equation (29) in Paper VI. Since the NMOS is typically smaller, a
cost-benefit analysis could also set a slightly larger area for the NMOS, improving the
distribution further. For more complex gates, further area constraints for equalizing the
variability of multiple series PMOS and series NMOS would, however, need to be developed. The single
constraint would remove one dimension from the search, and similar relationships for the
area of series transistors could provide even greater beneﬁt for more complex gates.
Compared to the approach with MATLAB's 'gamultiobj' in Papers II and III, a different
algorithmic approach was taken for the work in [8]. There the Pareto fronts were found by a
multi-dimensional bisection search. Initially the search space was divided into a coarse
multidimensional grid, and each grid point was evaluated and used to find a coarse
approximation to the Pareto front. Next the search grid was refined, enhancing the resolution
for each parameter by a factor of 2 in each iteration; however, only neighbors of points in the
current Pareto set were investigated in each new iteration. This resulted in a gradual
refinement of the Pareto front. For problems that involve several minima this may of course
miss solutions, depending on the starting grid. However, given some a priori knowledge of the
circuit and of kinks related to SCE and NWE, this can largely be avoided during the setup of
the initial grid. Simulation speed was also improved by using DC approximations and running
all simulations via RAM disk, offering an over 10× speed improvement in our environment. For
problems with 3 objectives and up to 4 parameters we found this approach preferable to the
use of 'gamultiobj'; however, there are still obvious problems with the starting grid and the
number of neighbors to search when scaling the number of parameters.
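The refinement scheme described above can be sketched as follows. This is an illustrative
reconstruction from the description in the text, not the implementation of [8]: the function
names, the integer index grid, and the two-parameter, two-objective toy problem are all
assumptions.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Index = Tuple[int, ...]
Values = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_indices(evaluated: Dict[Index, Values]) -> List[Index]:
    """Grid indices whose objective vectors are not dominated by any other."""
    return [i for i, fi in evaluated.items()
            if not any(dominates(fj, fi) for fj in evaluated.values())]


def refine_search(objective: Callable[[Values], Values],
                  lower: Values, upper: Values,
                  coarse_points: int = 5, iterations: int = 3):
    """Coarse grid first, then halve the grid spacing around the current Pareto set."""
    dims = len(lower)
    max_idx = (coarse_points - 1) * 2 ** iterations  # index range of the finest grid

    def to_params(idx: Index) -> Values:
        return tuple(lo + (hi - lo) * i / max_idx for lo, hi, i in zip(lower, upper, idx))

    stride = 2 ** iterations
    evaluated = {idx: objective(to_params(idx))
                 for idx in product(range(0, max_idx + 1, stride), repeat=dims)}
    for _ in range(iterations):
        stride //= 2  # doubles the resolution of every parameter
        for idx in pareto_indices(evaluated):
            for offset in product((-stride, 0, stride), repeat=dims):
                q = tuple(i + o for i, o in zip(idx, offset))
                if q not in evaluated and all(0 <= i <= max_idx for i in q):
                    evaluated[q] = objective(to_params(q))
    front = {to_params(i): evaluated[i] for i in pareto_indices(evaluated)}
    return front, len(evaluated)


def toy_objective(p: Values) -> Values:
    """Two conflicting objectives with optima at opposite corners of the unit square."""
    x, y = p
    return (x ** 2 + y ** 2, (x - 1) ** 2 + (y - 1) ** 2)


front, n_evaluations = refine_search(toy_objective, (0.0, 0.0), (1.0, 1.0))
print(len(front), "Pareto points from", n_evaluations, "evaluations")
```

Storing points as integer indices on the finest grid avoids re-evaluating a point that is
reached from two different neighbors. The number of neighbors per Pareto point grows as 3^d
with the number of parameters d, which reflects the scaling problem noted above.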
The algorithmic approach in [8] has some similarities to the GAIO algorithm employed
in [42]. There, after an initial search in a grid, the search space is iteratively divided into
smaller and smaller subspaces, so that only subspaces that contain Pareto points are searched.
This approach seems effective when there is an expected relationship between parameters,
and will have some obvious advantages when the number of solutions is large.
4.3 ULV SRAM
In the context of low power design, ULV memory is an interesting topic with particular
challenges. For larger SRAM arrays, such as in a large cache, power consumption is of-
ten dominated by leakage due to a low activity factor for each bitcell. Thus it can be of great
value to reduce leakage by operating the SRAM bitcell array at a reduced supply voltage. As a
counterpoint, the leakage resulting from larger SRAM arrays will push the minimum energy
operating point to a higher supply voltage, perhaps above the threshold voltage. Smaller,
more active SRAM modules may however display minimum energy operation at voltages
well below the threshold voltage of the process. This was the case with the 64×32 block
presented in Paper V, where the minimum energy occurred at VDD = 0.3 V and the threshold
voltage was approximately 450 mV. Mainstream applications for effective use of subthreshold
SRAMs are then perhaps limited to smaller SRAM modules for use as FIFOs, L1 cache,
register ﬁles and scratchpad memories. A strategy to remedy the situation for larger SRAMs
is to design peripheral circuits either with a higher voltage supply or using low-Vth devices.
This can improve access times for larger arrays by a signiﬁcant amount, thus reducing the
leakage energy. For cost and overhead reasons it may however be of interest to simply op-
erate SRAM blocks at the same VDD and/or frequency as the rest of the circuit. For special
applications, such as circuits powered by a very limited power source, e.g. in an energy
harvesting application, it may also be of benefit for the circuit to operate even with an extremely
low power supply voltage.
The first challenge that must be addressed when implementing a ULV SRAM cell is the
decrease in noise margins. This is very prominent for the traditional 6T SRAM cell as there
is a conﬂict when optimizing the cell for the noise margins during read and write access,
SNMread and SNMwrite. Similar to several other ULV SRAM solutions [47, 50, 51, 65, 66, 68,
69, 85], the 9T cell of Paper V solves this conﬂict by adding extra transistors for the read
access. Thus the write access and read access transistors can be optimized independently.
Another way of improving ULV operation is to break the feedback-loop during access [47].
Recently, in a comparative study including 4 SRAM cells, the 9T cell of Paper IV was found
to yield the best hold SNM [86].
In contrast to [87], the 9T cell of Paper V uses a virtual supply for the retention voltage,
multiple threshold voltages, and PMOS write access transistors, as these showed improved
characteristics with respect to variability in the target 65 nm LP process. The use of multiple
threshold voltages in a very similar 9T topology was further investigated in [88], and
triple-Vth versions of the 7T [89], 8T [49] and 9T [87] topologies are compared with
improvements in other topologies in [52, 90]. In particular, the layout of [88] is likely a major
improvement over the layout described in Paper V, as the thincell layout will show superior
qualities with respect to bitline capacitance, thus lowering Emin significantly. On the other
hand, the thincell layout may have some difficulty in accommodating a virtual retention supply
for the row, as was used in the cell of Paper V. In the context of [52], triple-Vth versions of
the 8T and 9T topologies are the best with regard to minimum supply voltage and data
stability, with the 9T showing somewhat less leakage. The 9T implementation incurs an area
penalty, and thus a performance penalty, over the 8T implementation. However, a major
difference from the 8T cell, which is perhaps not highlighted in that comparison, is that the 9T
can also benefit from splitting bitlines into separate read and write bitlines. This reduces
bitline capacitance and thus enhances access times and lowers dynamic power consumption.
Additionally, the 9T cell provides a differential output signal, circumventing the need for
replica bitlines or other techniques to predict leakage when evaluating the read signal. This
likely provides additional dynamic read stability to the 9T cell when compared to the 8T.
In Table 4.1 the manufactured SRAM block of Paper V is compared to seven contemporary
manufactured subthreshold SRAMs. For the minimum supply voltage metric Vmin our
SRAM is only the fifth best, although several of our best samples operated down to 230 mV.
Notably, all the SRAM cells manufactured in a 130 nm technology were able to reach a lower
Vmin than those in a 65 nm technology. The reason for this is probably a combination of well
controlled Vth and cell area, as area can be traded for enhanced stability. Body biasing is also
used in [69, 91] to reduce Vmin significantly. For the area metric our cell is the second
smallest among the publications where cell layout area was reported. In the context of Table 4.1
the 9T SRAM of Paper V was second best for the average minimum energy per operation
metric, and it had the lowest minimum energy supply voltage, still with a reasonable average
speed of 761 kHz. For operating speed our cell was mid-range at the minimum energy per
operation operating point. However, when comparing at VDD = 400 mV, our worst sample
operated at 2 MHz, with an average speed of 2.9 MHz.
The leakage per bit at 300 mV was 17.8 pW, placing the 9T SRAM of Paper V in
the middle. The effective leakage per bit depends not only on the cell topology, but also on
the technology and on peripheral circuits. For larger SRAM arrays typically only a small
fraction of the leakage derives from the peripheral circuits. However, based on simulations of
the SRAM module of Paper V, a very substantial 72% of the leakage can be attributed to
the decoder. If the number of columns in the SRAM were increased, one could expect a
significant reduction in the peripheral leakage per bit. Since the average leakage per cell for
the 64×32 block was 17.8 pW at 0.3 V, one could expect a 64×64 module to exhibit a
leakage closer to 12 pW per bit. Although extending the number of columns could incur
a speed reduction due to increased wordline capacitance, the throughput would normally
increase.
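The estimate for a 64×64 module can be reproduced with a short calculation. The sketch
assumes that the decoder leakage stays unchanged when the number of columns doubles, so
that it is shared by twice as many bits, and that the remaining 28% scales with the number of
cells. The function name is hypothetical.

```python
def leakage_per_bit_after_scaling(leak_per_bit_pw: float,
                                  peripheral_fraction: float,
                                  column_factor: float) -> float:
    """Per-bit leakage when the bit count grows by column_factor and the
    peripheral (decoder) leakage is assumed to stay unchanged."""
    peripheral_share = leak_per_bit_pw * peripheral_fraction
    cell_share = leak_per_bit_pw - peripheral_share
    return cell_share + peripheral_share / column_factor


# 64x32 -> 64x64: twice the columns, 72% of the leakage attributed to the decoder.
estimate_pw = leakage_per_bit_after_scaling(17.8, 0.72, 2)
print(f"{estimate_pw:.1f} pW per bit")
```

Under these assumptions the result is about 11.4 pW per bit, consistent with the figure of
roughly 12 pW quoted above once additional wordline drive strength is allowed for.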
At the time of making, the SRAM array was sized as 64×32. The rather limited size
is primarily due to size constraints of the die. Although SRAMs with a fairly high number
of rows have been demonstrated, such as in [66, 96] which achieve 1 k cells per bitline, here
the number of rows was limited intentionally to safely avoid read access errors due to bitline
leakage. In typical cases the worst-case maximum leakage from all non-accessed bitcells on
the column can be considered a sum of iid lognormally distributed currents, and we can
apply techniques such as that of Appendix A in Paper VI to find a worst-case value for the
acceptable leakage given a reasonable yield. The leakage on its own must be small enough,
for the duration of the bitline integration, not to affect operation of the sense amplifier. In
addition, the difference between the read current of the accessed bitcell and the leakage must
be able to produce some minimum voltage, as determined by the requirements of the sense
amplifier. Interestingly, in [97] it is shown that for operation in the ULV domain energy
efficiency can be better when the number of rows is kept low when sizing the SRAM array;
however, they argue that this is the case particularly when the SRAM array is large.
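One common way to approximate the distribution of such a sum is Fenton-Wilkinson moment
matching, sketched below. Note that this is a substitute technique shown for illustration and
not necessarily the method of Appendix A in Paper VI; it is also known to lose accuracy in the
tails when the spread is large. The cell count, median leakage and sigma are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple


def fenton_wilkinson(n: int, mu: float, sigma: float) -> Tuple[float, float]:
    """Lognormal (mu_s, sigma_s) matching the mean and variance of a sum of n iid
    lognormal(mu, sigma) terms."""
    sigma_s2 = math.log((math.exp(sigma ** 2) - 1.0) / n + 1.0)
    mu_s = math.log(n) + mu + 0.5 * (sigma ** 2 - sigma_s2)
    return mu_s, math.sqrt(sigma_s2)


def leakage_quantile(n_cells: int, median_leak: float, sigma: float,
                     yield_target: float) -> float:
    """Approximate bitline leakage that is not exceeded with probability yield_target."""
    mu_s, sigma_s = fenton_wilkinson(n_cells, math.log(median_leak), sigma)
    z = NormalDist().inv_cdf(yield_target)
    return math.exp(mu_s + z * sigma_s)


# Hypothetical numbers: 63 non-accessed cells, 1 pA median leakage, sigma of ln(I) = 0.8.
worst = leakage_quantile(63, 1e-12, 0.8, 0.999)
nominal = 63 * 1e-12
print(worst / nominal)
```

The ratio of the worst-case value to the nominal sum gives a margin that can be compared
against the read current of the accessed cell when choosing the number of rows.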
Sense amplifier design in the subthreshold domain is perhaps the most challenging task
of ULV SRAM design, as RDF causes severe mismatch in the input sensing transistors, and
speed and energy efficiency will suffer when upsizing the transistors to compensate. In the
context of Paper VI, no special circuit techniques other than W/L sizing were used to
mitigate subthreshold variability. Although multiple sense amplifier topologies were
investigated, all struggled to meet the design requirements of operation from -20 °C to +85 °C
with Vmin = 200 mV, with a sensing time allowing for an access time of 1 μs at VDD = 300 mV.
While the selected topology was the only one that came close to these requirements with
an appreciable yield, the implementation displays increased detection delays when scaling to
higher supply voltages. This is due to the read access precharge to ground and the need for
an increased bitline integration time to provide sufficient current. For this application it was,
however, acceptable, as ULV operation was the main target for investigation.
One way of handling mismatch in the sense ampliﬁers is to calibrate each sense ampliﬁer
via multiple references, as seen in [98]. However, in subthreshold a large temperature range
may cause major problems unless it is addressed. Body biasing [17, 99, 100, 101] can improve
leakage and mitigate global variation in subthreshold currents due to overall Vth bias as well
as temperature eﬀects over a signiﬁcant range [23]. Thus it could be a very sensible technique
to apply to sense ampliﬁers, to boost their eﬀectiveness when operated in the subthreshold
domain. Body biasing for sense ampliﬁers was for instance used for calibration in [102]
to achieve 2× less input oﬀset after calibration. Adaptive body biasing was also used in
the SRAM of [69, 91] to achieve a supply voltage as low as 193mV for a 6T SRAM cell.

Table 4.1: Comparison of subthreshold SRAM modules.

Source: This work | [85] | [68, 92] | [65, 93] | [47] | [51, 94] | [66, 95] | [69, 91] | [96]
Technology: 65 nm | 65 nm | 65 nm | 65 nm | 90 nm | 130 nm | 130 nm | 130 nm | 130 nm
Cell type: 9T | 9T | 8T | 10T | 10T | 8T | 10T | 6TA | 8T-RWD⁵
Block size: 64×32 | 64×72 | 64×128 | 256×128 | 128×256 | 512×64 | 1024×256 | 16×16 | 512×32
Cell area [μm²]: 2.83 | 1.77 | 3.51 | 6.36 | 7.50 | 4.79
Eff. cell area [μm²]: 4.61 | 3.04 | 9.34 | 13.96 | 9.46
Emin voltage [mV]: 321³ | 500 | 400 | 400² | 400² | 400² | 340 | 375¹
Emin frequency [kHz]: 761³ (1016⁴) | 5000¹ | 500¹ | 475 | 6700 | 1900 | 400¹ | 1900¹
Emin [fJ/access/bit]: 17.8³ | 62.5 | 82.5¹ | 14.1¹ | 25.2 | 32.9¹ | 138 | 119¹
Leakage [pW/bit] @ 300 mV: 17.8³ | 40.7¹ | 12.2¹ | 11.8¹ | 25 | 64.1¹ | 7.3¹ | 54.9 | 11.5
Vmin [mV]: 273 (230⁴) | 275 | 250 | 380 | 230 | 200 | 193 | 300 (250⁴)

Rows with fewer than nine entries give only the reported values, in column order.
¹ Approximation based on diagram and/or calculation.
² Operating point chosen due to values available in publication, but not reported as true energy minimum.
³ Average value.
⁴ Value for best sample.
⁵ Average-8T Read Write Decoupled.
† This table is an expanded version, based on a table presented in [72].

Considering multi-VDD circuits, another and more direct way of improving sense ampliﬁers
is to operate these at a supply voltage allowing super-threshold operation. Although this
would increase leakage in the sense ampliﬁers, the sensing time would be expected to drop
drastically and could thus improve overall energy eﬃciency.
Although adaptive body biasing can easily compensate for global between-die variation, local
within-die variation is still severe. The analysis of measured bitcell retention failure in Paper
V adds weight to the hypothesis that mismatch in the subthreshold domain is dominated
by RDF, and thus relatively less impacted by strain and optical eﬀects due to surrounding
geometries.
4.4 Subthreshold energy modeling including RDF
The impact of RDF-induced variations is important to investigate in order to achieve
reasonable yield in subthreshold circuits. Monte Carlo simulation can be helpful to investigate
single transistors or smaller circuits, but this approach often becomes unmanageable for
larger circuits with high yield targets. In some cases an analytical approach can, however,
replace Monte Carlo simulation. In subthreshold circuits specifically, individual gate delays
follow strongly skewed distributions. Summing a fixed number of skewed distributions with
unequal means and variances to find the distribution of the total delay is unfortunately rather
difficult to treat analytically. Paper VI instead looks at the sum of iid lognormally distributed
delays. For circuit design this is perhaps a rather special case that is only valid if all gate delays
are optimized to have the same delay distribution. Analysis of circuits where multiple
distributions are known to be lognormal but have different means and variances could perhaps
benefit from the approach in [103]. Nevertheless, several principles are well highlighted by
this study. Perhaps most importantly, the sum of relatively few iid lognormal distributions
may show a strong response in the right tail of the total, given that the variance of the
individual delays is large. That is, for long paths and large variances the worst-case delay may
not always be adequately ameliorated by 'averaging' effects, as is sometimes suggested.
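The limited benefit of averaging can be illustrated with a small Monte Carlo experiment that
compares the empirical worst-case path delay with the value expected if the sum were
normally distributed. This is only a sketch under assumed numbers (10 gates, sigma of
ln(delay) equal to 1.0, a 99% yield target) and is not the analytical method of Paper VI.

```python
import math
import random
from statistics import NormalDist


def path_delay_quantile(n_gates: int, sigma: float, yield_target: float,
                        trials: int = 20000, seed: int = 0) -> float:
    """Empirical quantile of the sum of n_gates iid lognormal(0, sigma) gate delays."""
    rng = random.Random(seed)
    totals = sorted(sum(rng.lognormvariate(0.0, sigma) for _ in range(n_gates))
                    for _ in range(trials))
    return totals[min(trials - 1, int(yield_target * trials))]


def gaussian_quantile(n_gates: int, sigma: float, yield_target: float) -> float:
    """Same quantile if the sum were normal with the exact mean and variance."""
    mean = n_gates * math.exp(0.5 * sigma ** 2)
    var = n_gates * (math.exp(sigma ** 2) - 1.0) * math.exp(sigma ** 2)
    return mean + NormalDist().inv_cdf(yield_target) * math.sqrt(var)


# Assumed example: 10 gates in series, sigma of ln(delay) = 1.0, 99 % yield.
empirical = path_delay_quantile(10, 1.0, 0.99)
normal_guess = gaussian_quantile(10, 1.0, 0.99)
print(empirical, normal_guess)
```

For these assumed values the empirical quantile lies clearly above the normal estimate, which
is the heavy right tail referred to above. The gap narrows as sigma decreases or as the number
of gates grows.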
The analysis derives worst-case expressions for: on-current (Ion), on/off ratio (Ion/Ioff),
propagation delay of simple gates (tp), clock period for a series connection of identical gates
(tclk), and energy per cycle (Eop). Analytical expressions for the optimal VDD to minimize
energy per operation were also derived. A similar analysis can be found in [76, 77], although
these do not include variability. Minimum-energy operation when simultaneously applying
DVS and ABB techniques was also treated in [104]. While [77] states that the minimum energy
per operation is theoretically found at minimum gate area, Paper VI finds that given the
influence of RDF this may not always be the case, although the absence of RDF yields the same result.
Although Paper VI successfully incorporates RDF in a subthreshold minimum energy
model, there are several effects that may be interesting to add if one were to expand the
analysis. In the analysis, several threshold voltage shifts due to geometric variation of the gate
are not taken into account. As mentioned already, short and narrow channel effects may
shift the mean threshold voltage. Additionally, DIBL may have some influence, as it depends
on both the geometry of the device and the drain voltage. Since the DIBL Vth shift
is proportional to VDD [7], it is however a weaker effect when compared to superthreshold
operation. The assumption that the threshold voltage follows the normal distribution may
be challenged, perhaps particularly if effects such as NBTI modulate the threshold voltage
during operation. Experimental results in [32] seem to indicate that Vth shifts induced by
NBTI follow a Skellam distribution after stressing devices.
A notable aspect of the contour plots of Fig. 1 in Paper VI is that the optimal reliable
sizing for minimum energy for a subthreshold circuit does not coincide with an optimal
sizing for higher supply voltages. Thus a project targeting a DVS scheme for minimum
energy will be forced to balance the choice of how to size standard cells, based on the expected
activity of the circuit.
The analysis of Paper VI is helpful to understand how circuit complexity and RDF-induced
variability affect the overall system performance when scaling up the number of
transistors in a sequential subthreshold circuit. One application for which the analysis may
be particularly suited lies in feasibility studies, where one wants early estimates of
performance given a specific circuit complexity and a target yield. Another application for the
analysis would be in standard cell optimization, e.g. to use the results to guide the choice of
optimization targets such as VDD and gate area, or to use some of the developed expressions
to constrain and speed up the optimization process.

Chapter 5
Conclusion
5.1 Conclusion
The paper contributions of this thesis relate closely to the ﬁeld of ultra low voltage or sub-
threshold circuit design, and seek to advance the ﬁeld both through an enhanced understand-
ing, and improved methods for the purpose of designing reliable and eﬃcient subthreshold
digital logic and memory.
Potential exploits of the inverse narrow-width eﬀect (INWE) are highlighted in Paper
I, where it is shown that arrays of minimum-sized transistors in certain cases can provide
enhanced performance over wide transistors of an equivalent area. In simulation iso-area
inverters in a ring oscillator achieved reduced delays (35%–40%) and reduced PDP (40%–
43%). Measurement results point to the presence of such an effect, although it may be weaker;
thus improved models may be necessary for reliably optimizing designs to take advantage of
this effect.
Multi-objective optimization (MOO) algorithms, such as those employed in Papers II, III, and
IV, represent a powerful method for exploration of resource-efficient tradeoff alternatives
when implementing a circuit. Given a specific optimization problem, MOO can be used
to compare circuit topologies and find the optimal implementation for a specific desired
performance range. After the Pareto Front has been found, the Pareto optimal solution that
best suits the application can easily be selected for further use. Visualization and metrics of
Pareto Fronts can quickly provide performance limits of a specific implementation. MOO
also has, in principle, the potential to bypass many problems related to subthreshold sizing,
where the combined effects of subthreshold phenomena such as SCE, NWE, INWE, and
RDF may otherwise cause design difficulties.
The 9T SRAM cell of Paper IV was carefully optimized for resource eﬃcient operation
at a supply voltage of 300mV. The cell uses techniques such as read-write decoupling, mul-
tiple threshold voltages, and a virtual retention supply voltage to achieve improved stability,
leakage and access speed. In Paper V the 9T cell is used within a 2 kb SRAM module to
demonstrate the feasibility of a design based on this cell. From 38 tested samples, reliable
measurements from the 9T SRAM module are provided, with Vmin between 230 mV and
271 mV, minimum energy per operation at an average supply voltage of 321 mV, and averages
of 0.57 pJ energy per access and 730 kHz for the operating frequency. These results are
evidence of reliable operation, and the performance metrics are comparable to state-of-the-art
published results for ULV-SRAM.
For digital circuit design in the subthreshold domain, the analysis of Paper VI shows a
possible path for how the combined effects of RDF variability and complexity can be estimated
very early in the design phase, even before W/L sizing of library gates. Through slightly
simplified models the analysis allows early estimates of the minimum energy operating
voltage, the expected energy per gate per cycle, the expected operating frequency, and suggested
gate areas for efficient RDF mitigation. Through contour plots, available trade-offs can be
investigated. Such an analysis could perhaps be ideal in tasks such as feasibility studies for a
specific circuit project in the subthreshold domain, or during technology selection to compare
the merits of different technologies.
5.2 Recommendations for Further Work
Improvements and use of the work in Paper I, in the context of circuits and systems design,
will perhaps rely on foundries providing accurate models for ULV phenomena such as INWE.
Since subthreshold circuit design is still considered a niche, the wait may however be long.
Of course, for projects targeting this, statistical measurements on test devices could resolve
that situation. Utilization of the benefits should be fairly straightforward: the MST
techniques can be applied to digital standard cells that require higher operating speeds and can
tolerate an increase in leakage, potentially granting savings in energy per operation.
For MOO, computational effort can be greatly reduced by finding good constraints.
Specifically, it would be interesting to investigate the usefulness of applying some of the
results from the analysis of Paper VI to provide constraints, e.g. on transistor area, for the
multi-objective optimization of standard cells for a subthreshold standard cell library. Further
investigation into which related objectives should be preferred or combined when reducing
the number of objective functions would also be interesting. After the application of a MOO
method to a circuit intended for use in a standard cell library, when multiple
resource-efficient tradeoffs are available, it seems interesting to pursue a path where electronic
design automation (EDA) tools could pick and choose from the available tradeoffs in order
to optimize each delay path individually. This may, however, require automated layouts, and
likely several modifications to EDA software algorithms. Unfortunately, state-of-the-art EDA
software is almost entirely proprietary. A more moderate step towards this would perhaps
be to realize and characterize a moderate subset of the Pareto optimal solutions to provide
evidence of the efficacy of such an approach, or perhaps to do postprocessing of the gate-level
netlist to refine the implementation. A challenge related to using Monte Carlo simulation in
conjunction with MOO is how to sample the distribution effectively. The problem is
exacerbated by the influence of W/L scaling on the distribution. Little work has been published
so far with the aim of efficiently sampling or describing lognormal performance distributions
as seen in subthreshold circuits.
To improve the 9T SRAM, a thincell implementation of the cell layout may allow a strong
improvement in the access energy. Since one of the main differences from the 8T SRAM cell is
that the 9T cell has a differential read current, it would be prudent to first show that the sense
amplifier can utilize this to improve performance, granting an overall benefit that outweighs
the effects of increased area. For applications requiring extremely low Vmin, a body biasing
scheme should be implemented in the bit cell array as well as in peripheral circuits. Device
properties in other fabrication technologies may be different, so NMOS access transistors
may be preferred there.
Further work on the analysis of Paper VI would perhaps start by invoking a circuit
simulator to test the accuracy of the model. Incorporating simulation also allows for modeling
of prominent subthreshold device effects such as SCE, NWE, INWE and DIBL, although
optimal individual gate W/L sizing and/or body bias tuning might be a complicating factor.
Plots of worst-case delay vs. yield, and of energy and operating voltage vs. yield, may be an
interesting addition for visualization. An extension of the analysis could also cover the effects
of multiple series transistors on the delay distribution, which can be expected to worsen the
performance somewhat.

Publications

Backmatter

Bibliography
[1] C. Hu, “Future CMOS scaling and reliability,” Proceedings of the IEEE, vol. 81, pp. 682–
689, May 1993.
[2] E. A. Vittoz, Low Power CMOS Circuits, ch. 16. Weak Inversion for Ultimate Low-
Power Logic, pp. 16–1–16–18. CRC Press, Taylor & Francis Group, LLC, 2006.
[3] R. Troutman, “VLSI limitations from drain-induced barrier lowering,” Solid-State Cir-
cuits, IEEE Journal of, vol. 14, pp. 383–391, Apr 1979.
[4] D. Liu and C. Svensson, “Trading speed for low power by choice of supply and threshold
voltages,” Solid-State Circuits, IEEE Journal of, vol. 28, pp. 10–17, Jan 1993.
[5] ITRS, “The international technology roadmap for semiconductors : 2013 edition :
Process integration, devices, and structures,” tech. rep., ITRS, 2013.
[6] H. Ammari, N. Gomes, W. Grosky, M. Jacques, B. Maxim, and D. Yoon, Wireless
Sensor Networks: Current status and future trends., ch. I: Review of applications of
wireless sensor networks, pp. 3–32. CRC Press, 2012.
[7] B. G. Streetman and S. Banerjee, Solid State Electronic Devices. Prentice Hall, 5 ed.,
2000.
[8] M. S. O. Haugland, “Multiobjective optimization of an ultra low voltage/low power
standard cell library for digital logic synthesis,” Master’s thesis, University of Oslo,
2012.
[9] C. Piguet et al., Low Power CMOS circuits. CRC Press, Taylor & Francis Group, LLC,
2006.
[10] A. Chandrakasan, S. Sheng, and R. Brodersen, “Low-power CMOS digital design,”
Solid-State Circuits, IEEE Journal of, vol. 27, pp. 473–484, Apr 1992.
[11] Y. Tsividis, “Eric Vittoz and the strong impact of weak inversion circuits,” Solid-State
Circuits Society Newsletter, IEEE, vol. 13, pp. 56–58, Summer 2008.
[12] E. Vittoz and J. Fellrath, “CMOS analog integrated circuits based on weak inversion
operation,” IEEE Journal of Solid-State Circuits, vol. 12, pp. 224–231, June 1977.
[13] H. Pao and C. Sah, “Eﬀects of diﬀusion current on characteristics of metal-oxide
(insulator)-semiconductor transistors,” Solid-State Electronics, vol. 9, no. 10,
pp. 927–937, 1966.
[14] E. A. Vittoz, Micropower Techniques, pp. 53–96. Prentice-Hall, Inc., 1994.
[15] C. Enz, F. Krummenacher, and E. Vittoz, “An analytical MOS transistor model valid
in all regions of operation and dedicated to low-voltage and low-current applications,”
Analog Integrated Circuits and Signal Processing journal on Low-Voltage and Low-Power
Design, vol. 8, no. Special Issue July, pp. 83–114, 1995.
[16] A. Wang, B. H. Calhoun, and A. P. Chandrakasan, Sub-threshold design for ultra low-
power systems. Springer, 2006.
[17] J. Kao, M. Miyazaki, and A. Chandrakasan, “A 175-mV multiply-accumulate unit us-
ing an adaptive supply voltage and body bias architecture,” Solid-State Circuits, IEEE
Journal of, vol. 37, pp. 1545–1554, Nov 2002.
[18] A. Wang and A. Chandrakasan, “A 180-mV subthreshold FFT processor using a minimum
energy design methodology,” Solid-State Circuits, IEEE Journal of, vol. 40,
pp. 310–319, Jan 2005.
[19] D. Blaauw and B. Zhai, “Energy eﬃcient design for subthreshold supply voltage oper-
ation,” in Circuits and Systems, 2006. ISCAS 2006. Proceedings. 2006 IEEE International
Symposium on, pp. 4 pp.–32, May 2006.
[20] O. C. Akgun, J. Rodrigues, and J. Sparsø, “Energy-minimum sub-threshold self-timed
circuits using current sensing completion detection,” IET Computers & Digital Tech-
niques, vol. 5, no. 4, pp. 342–353, 2011.
[21] G. Schrom, C. Pichler, T. Simlinger, and S. Selberherr, “On the lower bounds of
CMOS supply voltage,” Solid-State Electronics, vol. 39, no. 4, pp. 425–430, 1996.
[22] E. Nowak, “Maintaining the beneﬁts of CMOS scaling when scaling bogs down,” IBM
Journal of Research and Development, pp. 169–180, Mar./May 2002.
[23] M. Alioto, “Ultra-low power VLSI circuit design demystiﬁed and explained: A tuto-
rial,” IEEE Trans. Circuits Syst. I, vol. 59, no. 1, pp. 3–29, 2012.
[24] M. Alioto, “Understanding DC behavior of subthreshold CMOS logic through closed-
form analysis,” IEEE Trans. Circuits Syst. I, vol. 57, no. 7, pp. 1597–1607, 2010.
[25] M. V. Dunga et al., BSIM4.6.0 MOSFET Model - User’s Manual. University of
California, Berkeley, 2006.
[26] K. M. Cao, W. Liu, X. Jin, K. Vashanth, K. Green, J. Krick, T. Vrotsos, and C. Hu,
“Modeling of pocket implanted MOSFETs for anomalous analog behavior,” in Electron
Devices Meeting, 1999. IEDM ’99. Technical Digest. International, pp. 171–174, Dec 1999.
[27] T.-H. Kim, J. Keane, H. Eom, and C. H. Kim, “Utilizing reverse short-channel eﬀect
for optimal subthreshold circuit design,” IEEE Trans. VLSI Syst., vol. 15, no. 7, pp. 821–
829, 2007.
[28] A. Asenov, S. Kaya, and A. Brown, “Intrinsic parameter ﬂuctuations in decananometer
MOSFETs introduced by gate line edge roughness,” Electron Devices, IEEE Transactions
on, vol. 50, pp. 1254–1260, May 2003.
[29] S. A. Campbell, The science and engineering of microelectronic fabrication. Oxford Uni-
versity Press, 2001.
[30] T. Mizuno, J. Okamura, and A. Toriumi, “Experimental study of threshold voltage
ﬂuctuation due to statistical variation of channel dopant number in MOSFET’s,” Elec-
tron Devices, IEEE Transactions on, vol. 41, pp. 2216–2221, Nov 1994.
[31] A. Asenov, “Random dopant induced threshold voltage lowering and ﬂuctuations in
sub-0.1 µm MOSFET's: A 3-D "atomistic" simulation study,” Electron Devices, IEEE Transactions
on, vol. 45, pp. 2505–2513, Dec 1998.
[32] V. Huard, C. Parthasarathy, C. Guerin, T. Valentin, E. Pion, M. Mammasse, N. Planes,
and L. Camus, “NBTI degradation: From transistor to SRAM arrays,” in Reliability
Physics Symposium, 2008. IRPS 2008. IEEE International, pp. 289–300, April 2008.
[33] M. Pelgrom and M. Vertregt, “Component matching: best practices and fundamental
limits,” IDESA Seminar DVDs, www.idesa-training.org.
[34] P. A. Stolk, F. P. Widdershoven, and D. B. M. Klaassen, “Modeling statistical dopant
ﬂuctuations in MOS transistors,” IEEE Transactions on Electron Devices, vol. 45,
pp. 1960–1971, Sept. 1998.
[35] K. Mistry, C. Allen, C. Auth, B. Beattie, D. Bergstrom, M. Bost, M. Brazier,
M. Buehler, A. Cappellani, R. Chau, C.-H. Choi, G. Ding, K. Fischer, T. Ghani,
R. Grover, W. Han, D. Hanken, M. Hattendorf, J. He, J. Hicks, R. Huessner, D. In-
gerly, P. Jain, R. James, L. Jong, S. Joshi, C. Kenyon, K. Kuhn, K. Lee, H. Liu, J. Maiz,
B. Mcintyre, P. Moon, J. Neirynck, S. Pae, C. Parker, D. Parsons, C. Prasad, L. Pipes,
M. Prince, P. Ranade, T. Reynolds, J. Sandford, L. Shifren, J. Sebastian, J. Seiple, D. Si-
mon, S. Sivakumar, P. Smith, C. Thomas, T. Troeger, P. Vandervoorn, S. Williams, and
K. Zawadzki, “A 45nm logic technology with high-k+metal gate transistors, strained
silicon, 9 Cu interconnect layers, 193nm dry patterning, and 100% Pb-free packaging,”
in Electron Devices Meeting, 2007. IEDM 2007. IEEE International, pp. 247–250, Dec
2007.
[36] M. Blesken, U. Rückert, D. Steenken, K. Witting, and M. Dellnitz, “Multiobjective
optimization for transistor sizing of CMOS logic standard cells using set-oriented numerical
techniques,” in 27th Norchip Conference, Nov. 2009.
[37] O. L. D. Weck, “Multiobjective optimization: History and promise,” in Proc. 3rd
China-Japan-Korea Joint Symp. Optimization Structural Mech. Syst. Invited Keynote Paper
GL2-2, 2004.
[38] J. Andersson, “A survey of multiobjective optimization in engineering design,” Tech-
nical report LiTH-IKP-R-1097, Dept. Mechanical Engineering, Linköping University,
2000.
[39] A. Zhou, B.-Y. Qu, H. Li, S.-Z. Zhao, P. N. Suganthan, and Q. Zhang, “Multiobjective
evolutionary algorithms: A survey of the state of the art,” Swarm and Evolutionary
Computation, vol. 1, no. 1, pp. 32–49, 2011.
[40] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, “A fast and elitist multiobjective
genetic algorithm: NSGA-II,” IEEE Trans. Evol. Comput., vol. 6, no. 2, pp. 182–197,
2002.
[41] E. Zitzler, M. Laumanns, and L. Thiele, “SPEA2: Improving the strength Pareto evolutionary
algorithm,” tech. rep., 2001.
[42] M. W. Blesken, Ein Mehrzieloptimierungsansatz zur Dimensionierung ressourceneﬃzien-
ter integrierter Schaltungen. PhD thesis, University of Paderborn, 2012.
[43] M. Fazzolari, R. Alcala, Y. Nojima, H. Ishibuchi, and F. Herrera, “A review of the
application of multiobjective evolutionary fuzzy systems: Current status and further
directions,” Fuzzy Systems, IEEE Transactions on, vol. 21, pp. 45–65, Feb 2013.
[44] M. Blesken, S. Lütkemeier, and U. Rückert, “Multiobjective optimization for transis-
tor sizing of sub-threshold CMOS logic standard cells,” Circuits and Systems (ISCAS),
Proceedings of 2010 IEEE International Symposium on, pp. 1480–1483, May 2010.
[45] M. H. Abu-Rahma and M. Anis, Nanometer Variation-Tolerant SRAM. Springer New
York, 2013.
[46] E. Seevinck, F. List, and J. Lohstroh, “Static-noise margin analysis of MOS SRAM
cells,” IEEE Journal of Solid State Circuits, vol. 22, no. 5, pp. 748–754, 1987.
[47] I. J. Chang, J.-J. Kim, S. Park, and K. Roy, “A 32 kb 10T sub-threshold SRAM ar-
ray with bit-interleaving and diﬀerential read scheme in 90 nm CMOS,” Solid-State
Circuits, IEEE Journal of, vol. 44, pp. 650–658, Feb. 2009.
[48] B. Zimmer, S. O. Toh, H. Vo, Y. Lee, O. Thomas, K. Asanovic, and B. Nikolic, “SRAM
assist techniques for operation in a wide voltage range in 28-nm CMOS,” Circuits and
Systems II: Express Briefs, IEEE Transactions on, vol. 59, pp. 853–857, Dec 2012.
[49] L. Chang, D. Fried, J. Hergenrother, J. Sleight, R. Dennard, R. Montoye, L. Sekaric,
S. McNab, A. Topol, C. Adams, K. Guarini, and W. Haensch, “Stable SRAM cell
design for the 32 nm node and beyond,” in VLSI Technology, 2005. Digest of Technical
Papers. 2005 Symposium on, pp. 128–129, June 2005.
[50] L. Chang, Y. Nakamura, R. Montoye, J. Sawada, A. Martin, K. Kinoshita, F. Gebara,
K. Agarwal, D. Acharyya, W. Haensch, K. Hosokawa, and D. Jamsek, “A 5.3GHz 8T-
SRAM with operation down to 0.41V in 65nm CMOS,” in VLSI Circuits, 2007 IEEE
Symposium on, pp. 252–253, June 2007.
[51] T.-H. Kim, J. Liu, and C. Kim, “A voltage scalable 0.26 V, 64 kb 8T SRAM with Vmin
lowering techniques and deep sleep mode,” Solid-State Circuits, IEEE Journal of, vol. 44,
pp. 1785–1795, June 2009.
[52] H. Zhu and V. Kursun, “A comprehensive comparison of data stability enhancement
techniques with novel nanoscale SRAM cells under parameter ﬂuctuations,” Circuits
and Systems I: Regular Papers, IEEE Transactions on, vol. 61, pp. 1473–1484, May 2014.
[53] H. Berge and S. Aunet, “Beneﬁts of decomposing wide CMOS transistors into
minimum-size gates,” in NORCHIP, 2009, pp. 1–4, Nov. 2009.
[54] K. Ohe, S. Odanaka, K. Moriyama, T. Hori, and G. Fuse, “Narrow-width eﬀects of
shallow trench-isolated CMOS with n+ -polysilicon gate,” IEEE Trans. Electron De-
vices, vol. 36, pp. 1110–1116, June 1989.
[55] P. Sallagoity, M. Ada-Haniﬁ, M. Paoli, and M. Haond, “Analysis of width edge ef-
fects in advanced isolation schemes for deep submicron CMOS technologies,” IEEE
Transactions on Electron Devices, vol. 43, pp. 1900–1906, 1996.
[56] H. K. O. Berge and S. Aunet, “Multi-objective optimization of minority-3 functions
for ultra-low voltage supplies,” in Proc. IEEE Int Circuits and Systems (ISCAS) Symp,
pp. 2313–2316, 2011.
[57] D. Hampel, K. Prost, and N. Scheinberg, “Threshold logic using complementary
MOS device,” Aug. 1975.
[58] S. Aunet, “Subthreshold minority-3 gates and inverters used for 32-bit serial and parallel
adders implemented in 90 nm CMOS,” in Proceedings of NORCHIP 2009, pp. 1–6,
Nov. 2009.
[59] B. H. Calhoun, A. Wang, and A. Chandrakasan, “Device sizing for minimum energy
operation in subthreshold circuits,” in Proc. Custom Integrated Circuits Conf the IEEE
2004, pp. 95–98, 2004.
[60] H. Berge, A. Hasanbegovic, and S. Aunet, “Muller C-elements based on minority-3
functions for ultra low voltage supplies,” in Design and Diagnostics of Electronic Cir-
cuits Systems (DDECS), 2011 IEEE 14th International Symposium on, pp. 195–200, April
2011.
[61] D. Muller and W. Bartky, “A theory of asynchronous circuits,” in Proceedings of an
International Symposium on the Theory of Switching, pp. 204–243, Harvard University
Press, Apr. 1959.
[62] J. Sparsø and S. Furber, Principles of asynchronous circuit design - A systems perspective.
Kluwer Academic Publishers, 2001.
[63] K. Fant and S. Brandt, “NULL convention logic™: a complete and consistent logic
for asynchronous digital circuit synthesis,” in Application Speciﬁc Systems, Architectures
and Processors, 1996. ASAP 96. Proceedings of International Conference on, pp. 261–273,
Aug 1996.
[64] I. E. Sutherland, “Micropipelines,” Commun. ACM, vol. 32, pp. 720–738, 1989.
[65] B. Calhoun and A. Chandrakasan, “A 256-kb 65-nm sub-threshold SRAM design for
ultra-low-voltage operation,” IEEE Journal of Solid-State Circuits, vol. 42, pp. 680–688,
2007.
[66] T.-H. Kim, J. Liu, J. Keane, and C. Kim, “A 0.2 V, 480 kb subthreshold SRAM with 1
k cells per bitline for ultra-low-voltage computing,” Solid-State Circuits, IEEE Journal
of, vol. 43, pp. 518–529, Feb. 2008.
[67] H. K. O. Berge, M. Blesken, S. Aunet, and U. Rückert, “Design of 9T SRAM for
dynamic voltage supplies by a multiobjective optimization approach,” in Proc. 17th
IEEE Int Electronics, Circuits, and Systems (ICECS) Conf, pp. 319–322, 2010.
[68] M. Sinangil, N. Verma, and A. Chandrakasan, “A reconﬁgurable 8T ultra-dynamic
voltage scalable (U-DVS) SRAM in 65 nm CMOS,” Solid-State Circuits, IEEE Journal
of, vol. 44, pp. 3163–3173, Nov. 2009.
[69] B. Zhai, S. Hanson, D. Blaauw, and D. Sylvester, “A variation-tolerant sub-200 mV
6-T subthreshold SRAM,” Solid-State Circuits, IEEE Journal of, vol. 43, pp. 2338–2348,
Oct. 2008.
[70] S. Lütkemeier, T. Jungeblut, H. K. O. Berge, S. Aunet, M. Porrmann, and U. Rückert,
“A 65 nm 32 b subthreshold processor with 9T multi-Vt SRAM and adaptive supply
voltage control,” Solid-State Circuits, IEEE Journal of, vol. PP, pp. 1–12, Jan 2013.
[71] S. Lütkemeier, T. Jungeblut, M. Porrmann, and U. Rückert, “A 200 mV 32 b sub-
threshold processor with adaptive supply voltage control,” in Proc. IEEE Int. Solid-State
Circuits Conf. Digest of Technical Papers (ISSCC), pp. 484–486, 2012.
[72] S. Lütkemeier, Ressourceneﬃziente Digitalschaltungen für den Subschwellbetrieb. PhD
thesis, University of Paderborn, 2013.
[73] M. Seok, G. Chen, S. Hanson, M. Wieckowski, D. Blaauw, and D. Sylvester, “CAS-
FEST 2010: Mitigating variability in near-threshold computing,” IEEE Journal on
Emerging and Selected Topics in Circuits and Systems, vol. 1, no. 1, pp. 42–49, 2011.
[74] T. Jungeblut, G. Sievers, M. Porrmann, and U. Rückert, “Design space exploration for
memory subsystems of VLIW architectures,” in Networking, Architecture and Storage
(NAS), 2010 IEEE Fifth International Conference on, pp. 377–385, July 2010.
[75] H. Berge and S. Aunet, “Yield-oriented energy and performance model for subthreshold
circuits with Vth variations,” in Design and Diagnostics of Electronic Circuits Systems
(DDECS), 2013 IEEE 16th International Symposium on, pp. 193–198, April 2013.
[76] A. Wang, B. Calhoun, and A. P. Chandrakasan, Sub-threshold Design for Ultra Low-
Power Systems (Series on Integrated Circuits and Systems), ch. 4. Springer-Verlag New
York, Inc., 2006.
[77] B. Calhoun, A. Wang, and A. Chandrakasan, “Modeling and sizing for minimum en-
ergy operation in subthreshold circuits,” Solid-State Circuits, IEEE Journal of, vol. 40,
pp. 1778–1786, Sept. 2005.
[78] A. Beg, “Designing array-based CMOS logic gates by using a feedback control system,”
in Systems, Man and Cybernetics (SMC), 2014 IEEE International Conference on, pp. 935–
939, Oct 2014.
[79] P. T. B. Yew, Compact Modeling of Deep Submicron CMOS Transistor with Shallow
Trench Isolation Mechanical Stress Eﬀect. PhD thesis, Universiti Sains Malaysia, 2008.
[80] J. Zhou, S. Jayapal, J. Stuyt, J. Huisken, and H. de Groot, “The impact of inverse
narrow width eﬀect on sub-threshold device sizing,” in Proc. 16th Asia and South Paciﬁc
Design Automation Conf. (ASP-DAC), pp. 267–272, 2011.
[81] J. Zhou, S. Jayapal, B. Busze, L. Huang, and J. Stuyt, “A 40 nm inverse-narrow-width-
eﬀect-aware sub-threshold standard cell library,” pp. 441–446, 2011.
[82] V. Beiu, L. Iordaconiu, A. Beg, W. Ibrahim, and F. Kharbash, “Low power and highly
reliable gates using arrays of optimally sized transistors,” in Semiconductor Conference
(CAS), 2012 International, vol. 2, pp. 433–436, Oct 2012.
[83] K. van Berkel, “Beware the isochronic fork,” Integr. VLSI J., vol. 13, pp. 103–128, June
1992.
[84] M. Sengupta, S. Saxena, L. Daldoss, G. Kramer, S. Minehane, and J. Cheng,
“Application-speciﬁc worst case corners using response surfaces and statistical mod-
els,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on,
vol. 24, pp. 1372–1380, Sept 2005.
[85] M.-H. Tu, J.-Y. Lin, M.-C. Tsai, C.-Y. Lu, Y.-J. Lin, M.-H. Wang, H.-S. Huang, K.-D.
Lee, W.-C. Shih, S.-J. Jou, and C.-T. Chuang, “A single-ended disturb-free 9T sub-
threshold SRAM with cross-point data-aware write word-line structure, negative bit-
line, and adaptive read operation timing tracing,” Solid-State Circuits, IEEE Journal of,
vol. 47, pp. 1469–1482, June 2012.
[86] V. Beiu, M. Tache, and F. Kharbash, “Reliability enhanced SRAM bit-cells,” in Semi-
conductor Conference (CAS), 2014 International, pp. 229–232, Oct 2014.
[87] Z. Liu and V. Kursun, “Characterization of a novel nine-transistor SRAM cell,” Very
Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 16, pp. 488–492, April
2008.
[88] H. Zhu and V. Kursun, “Symmetrical triple-threshold-voltage nine-transistor SRAM
circuit with superior noise immunity and overall electrical quality,” in SoC Design
Conference (ISOCC), 2011 International, pp. 333–336, Nov 2011.
[89] S. Tawﬁk and V. Kursun, “Low power and robust 7T dual-Vt SRAM circuit,” in Cir-
cuits and Systems, 2008. ISCAS 2008. IEEE International Symposium on, pp. 1452–1455,
May 2008.
[90] H. Zhu and V. Kursun, “A comprehensive comparison of superior triple-threshold-
voltage 7-transistor, 8-transistor, and 9-transistor SRAM cells,” in Circuits and Systems
(ISCAS), 2014 IEEE International Symposium on, pp. 2185–2188, June 2014.
[91] B. Zhai, D. Blaauw, D. Sylvester, and S. Hanson, “A sub-200mV 6T SRAM in 0.13 um
CMOS,” in Solid-State Circuits Conference, 2007. ISSCC 2007. Digest of Technical Papers.
IEEE International, pp. 332–606, Feb 2007.
[92] M. Sinangil, N. Verma, and A. Chandrakasan, “A reconﬁgurable 65nm SRAM achiev-
ing voltage scalability from 0.25 - 1.2V and performance scalability from 20kHz
- 200MHz,” in Solid-State Circuits Conference, 2008. ESSCIRC 2008. 34th European,
pp. 282–285, Sept 2008.
[93] B. Calhoun and A. Chandrakasan, “A 256kb sub-threshold SRAM in 65nm CMOS,”
in Solid-State Circuits Conference, 2006. ISSCC 2006. Digest of Technical Papers. IEEE
International, pp. 2592–2601, Feb 2006.
[94] T.-H. Kim, J. Liu, and C. Kim, “A voltage scalable 0.26V, 64kb 8T SRAM with Vmin
lowering techniques and deep sleep mode,” in Custom Integrated Circuits Conference,
2008. CICC 2008. IEEE, pp. 407–410, Sept 2008.
[95] T.-H. Kim, J. Liu, J. Keane, and C. Kim, “A high-density subthreshold SRAM with
data-independent bitline leakage and virtual ground replica scheme,” in Solid-State
Circuits Conference, 2007. ISSCC 2007. Digest of Technical Papers. IEEE International,
pp. 330–606, Feb 2007.
[96] M. Khayatzadeh and Y. Lian, “Average-8T diﬀerential-sensing subthreshold SRAM
with bit interleaving and 1k bits per bitline,” Very Large Scale Integration (VLSI) Sys-
tems, IEEE Transactions on, vol. 22, pp. 971–982, May 2014.
[97] A. Garg and T.-H. Kim, “SRAM array structures for energy eﬃciency enhancement,”
Circuits and Systems II: Express Briefs, IEEE Transactions on, vol. 60, pp. 351–355, June
2013.
[98] S. Cosemans, W. Dehaene, and F. Catthoor, “A 3.6 pJ/access 480 MHz, 128 kb on-chip
SRAM with 850 MHz boost mode in 90 nm CMOS with tunable sense ampliﬁers,”
Solid-State Circuits, IEEE Journal of, vol. 44, pp. 2065–2077, July 2009.
[99] A. Keshavarzi, S. Narendra, S. Borkar, C. Hawkins, K. Roy, and V. De, “Technology
scaling behavior of optimum reverse body bias for standby leakage power reduction
in CMOS IC’s,” in Low Power Electronics and Design, 1999. Proceedings. 1999 Interna-
tional Symposium on, pp. 252–254, Aug 1999.
[100] J. Tschanz, J. Kao, S. Narendra, R. Nair, D. Antoniadis, A. Chandrakasan, and V. De,
“Adaptive body bias for reducing impacts of die-to-die and within-die parameter vari-
ations on microprocessor frequency and leakage,” in Solid-State Circuits Conference,
2002. Digest of Technical Papers. ISSCC. 2002 IEEE International, vol. 1, pp. 422–478
vol.1, Feb 2002.
[101] S. G. Narendra, Eﬀect of MOSFET Threshold Voltage Variation on High-Performance
Circuits. PhD thesis, Massachusetts Institute of Technology, 2002.
[102] Y. Sinangil and A. Chandrakasan, “A 128 kbit SRAM with an embedded energy mon-
itoring circuit and sense-ampliﬁer oﬀset compensation using body biasing,” Solid-State
Circuits, IEEE Journal of, vol. 49, pp. 2730–2739, Nov 2014.
[103] C. Lam and T. Le-Ngoc, “Log shifted gamma approximation to lognormal sum distri-
butions,” in Communications, 2005. ICC 2005. 2005 IEEE International Conference on,
vol. 1, pp. 495–499 Vol. 1, May 2005.
[104] D. Blaauw, S. Martin, T. Mudge, and K. Flautner, “Leakage current reduction in VLSI
systems,” Journal of Circuits, Systems and Computers, vol. 11, no. 6, 2002.

