The Integration of nearthreshold and subthreshold CMOS logic for energy minimization by Hicks, John
Rochester Institute of Technology
RIT Scholar Works
Theses Thesis/Dissertation Collections
10-1-2010
The Integration of nearthreshold and subthreshold
CMOS logic for energy minimization
John Hicks
This Thesis is brought to you for free and open access by the Thesis/Dissertation Collections at RIT Scholar Works. It has been accepted for inclusion
in Theses by an authorized administrator of RIT Scholar Works. For more information, please contact ritscholarworks@rit.edu.
Recommended Citation
Hicks, John, "The Integration of nearthreshold and subthreshold CMOS logic for energy minimization" (2010). Thesis. Rochester
Institute of Technology. Accessed from
The Integration of Nearthreshold and Subthreshold
CMOS Logic for Energy Minimization
by
John Kevin Hicks
A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of
Master of Science in Computer Engineering
Supervised by
Dr. Dhireesha Kudithipudi
Department of Computer Engineering
Kate Gleason College of Engineering
Rochester Institute of Technology
Rochester, New York
October 2010
Approved By:
Dr. Dhireesha Kudithipudi
Assistant Professor, Department of Computer Engineering
Primary Advisor
Dr. Marcin Łukowiak
Assistant Professor, Department of Computer Engineering
Dr. Mike Hewitt
Assistant Professor, Department of Industrial & Systems Engineering
Thesis Release Permission Form
Rochester Institute of Technology
Kate Gleason College of Engineering
Title: The Integration of Nearthreshold and Subthreshold CMOS Logic
for Energy Minimization
I, John Kevin Hicks, hereby grant permission to the Wallace Memorial Library to reproduce
my thesis in whole or in part.
John Kevin Hicks
Date
Dedication
To my parents, for their constant love and support.
Acknowledgments
I would like to thank my primary faculty advisor, Dr. Dhireesha Kudithipudi, for her
guidance throughout this research work. I would also like to thank my committee
members, Dr. Marcin Łukowiak and Dr. Mike Hewitt, for their time and valuable
feedback. Additionally, for acting as the point of contact for numerous Synopsys technical
support cases, I would like to thank Mr. Joe Walton.
Abstract
With the rapid growth in the use of portable electronic devices, more emphasis has recently
been placed on low-energy circuit design. Digital subthreshold complementary metal-
oxide-semiconductor (CMOS) circuit design is one area of study that offers significant
energy reduction by operating at a supply voltage substantially lower than the threshold
voltage of the transistor. However, these energy savings come at a critical cost to per-
formance, restricting its use to severely energy-constrained applications such as micro-
sensor nodes. In an effort to mitigate this performance degradation in low-energy designs,
nearthreshold circuit design has been proposed and implemented in digital circuits such as
Intel’s energy-efficient hardware accelerator.
The application spectrum of nearthreshold and subthreshold design could be broad-
ened by integrating these cells into high-performance designs. This research focuses on
the integration of characterized nearthreshold and subthreshold standard cells into high-
performance functional modules. Within these functional modules, energy minimization is
achieved while satisfying performance constraints by replacing non-critical path logic with
nearthreshold and subthreshold logic cells. Specifically, the critical path method is used to
bind the timing and energy constraints of the design. The design methodology was verified
and tested with several benchmark circuits, including a cryptographic hash function, Skein.
An average energy savings of 41.15% was observed at a circuit performance degradation
factor of 10. The energy overhead of the level shifters accounted for at least 8.5% of the en-
ergy consumption of the optimized circuit, with an average energy overhead of 26.76%. A
heuristic approach is developed for estimating the energy savings of the optimized design.
Contents
Dedication
Acknowledgments
Abstract
1 Motivation and Supporting Work
1.1 Motivation
1.2 Supporting Work
2 Ab Initio of Subthreshold/Nearthreshold Operation
2.1 Subthreshold Operation
2.2 Nearthreshold Operation
2.3 Summary
3 Standard Cell Library Characterization for Hybrid Design Methodology
3.1 Design Considerations for Standard Cell Library
3.2 Subthreshold Standard Cell Library
3.3 Nearthreshold Standard Cell Library
3.4 Standard Cell Library Characterization in Synopsys
3.5 Gate Delay and Energy Consumption Estimation
3.5.1 Gate Delay Estimation
3.5.2 Energy Estimation
3.6 Level Shifter Design
3.7 Summary
4 Critical Path Method for Cell Placement
4.1 Overview
4.2 Critical Path Method Algorithm
4.2.1 Identify Nodes and Node Arcs
4.2.2 Identify Successors and Predecessors
4.2.3 Compute Early Start Times
4.2.4 Compute Late Start Times
4.2.5 Calculate Slack of Each Node
4.3 Application to Digital Circuits for Cell Placement
4.3.1 Circuit Representation as Directed Acyclic Graph
4.3.2 Cell Placement Algorithm
4.3.3 Performance Degradation Factor
4.4 Summary
5 Hybrid NESST Design Methodology
5.1 Hybrid NESST
5.2 Heuristic for Hybrid Energy Savings Estimation
5.3 Summary
6 Results and Analysis
6.1 8-Bit Multiplier
6.1.1 Functional Performance and Energy Analysis
6.1.2 Hybrid NESST Results Based on Energy Estimations
6.2 64-Bit Adder
6.3 ISCAS’89 Benchmark Circuits
6.4 Skein: A Cryptographic Hash Function
6.4.1 Skein Key Scheduler
6.4.2 Skein-256-256
6.5 Summary
7 Conclusions and Future Work
7.1 Conclusions
7.2 Future Work
Bibliography
A Delay and Energy Measurements
B ISCAS’89 Benchmark Results
B.1 ISCAS’89 S386 Benchmark Circuit
B.2 ISCAS’89 S832 Benchmark Circuit
B.3 ISCAS’89 S1196 Benchmark Circuit
B.4 ISCAS’89 S1238 Benchmark Circuit
B.5 ISCAS’89 S1494 Benchmark Circuit
List of Figures
2.1 NMOS Transistor Diffusion Current in Subthreshold
2.2 NMOS Transistor with Logic High Input, VDD = 0.6V
2.3 NMOS Transistor Drift Current in Nearthreshold
3.1 Propagation Delay
3.2 Inverter with X1, X2, X4, X8 Drive Strengths
3.3 Nearthreshold Inverter VTC with 2:1, 3:1, 4:1 Aspect Ratios
3.4 Library Characterization Flow in Synopsys
3.5 Rise Transitions for 3-Input AND Gate
3.6 Fall Transitions for 3-Input AND Gate
3.7 Average Energy Consumption of Nearthreshold and Subthreshold Cells
3.8 Level Shifter Interface Between Subthreshold and Superthreshold Logic Cells
3.9 Conventional Level Shifter
3.10 Constant Current Mirror With Body Ties Level Shifter
3.11 Static Current in Constant Current Mirror With Body Ties Level Shifter
3.12 Contention-Mitigated Level Shifter
4.1 Directed Acyclic Graph
4.2 Directed Acyclic Graph with Start and Finish Nodes
4.3 (a) Predecessors and (b) Successors of Node E
4.4 Combinational Logic Circuit
4.5 Combinational Logic Circuit as Directed Acyclic Graph
4.6 Sequential Logic Circuit
4.7 Sequential Logic Circuit as Directed Acyclic Graph
5.1 Hybrid Super/Near/Subthreshold Design Methodology
5.2 Hybrid NESST Program Design Methodology
5.3 Heuristic Logic Sequence of N Combinational Stages
6.1 8-Bit Multiplier with Sequential Output Architecture
6.2 8-Bit Multiplier Total Energy Savings with a Subset of PDFs
6.3 8-Bit Multiplier Total Energy Savings with Varying PDFs
6.4 Estimated Total Energy Savings of 8-Bit Multiplier
6.5 Superthreshold AND Gate Chain
6.6 AND Gate Chain with Degradation Factor of 9
6.7 AND Gate Chains with Degradation Factors of (a) 684 and (b) 690
6.8 Total Energy Savings with Countermeasure Mode for 8-Bit Multiplier
6.9 Total Energy Savings with Extended Mode for 8-Bit Multiplier
6.10 64-Bit Adder with Sequential Output Architecture
6.11 Total Energy Savings with Original Mode for 64-Bit Adder
6.12 Total Energy Savings with Countermeasure/Extended Modes for 64-Bit Adder
6.13 Total Energy Savings with Original Mode for ISCAS’89 S1488
6.14 Total Energy Savings with Countermeasure Mode for ISCAS’89 S1488
6.15 Total Energy Savings with Extended Mode for ISCAS’89 S1488
6.16 Level Shifter Energy Overhead with Countermeasure Mode for ISCAS’89 S1488
6.17 Block Diagram of Skein Key Scheduler [37]
6.18 Total Energy Savings with Countermeasure Mode for Skein Key Scheduler
6.19 Block Diagram of Block Cipher Used in Skein [24]
6.20 Total Energy Savings with Countermeasure Mode for Skein-256-256
B.1 Energy Savings with Countermeasure Mode for ISCAS’89 S386 (Subset)
B.2 Energy Savings with Countermeasure Mode for ISCAS’89 S386
B.3 Energy Savings with Countermeasure Mode for ISCAS’89 S832 (Subset)
B.4 Energy Savings with Countermeasure Mode for ISCAS’89 S832
B.5 Energy Savings with Countermeasure Mode for ISCAS’89 S1196 (Subset)
B.6 Energy Savings with Countermeasure Mode for ISCAS’89 S1196
B.7 Energy Savings with Countermeasure Mode for ISCAS’89 S1238 (Subset)
B.8 Energy Savings with Countermeasure Mode for ISCAS’89 S1238
B.9 Energy Savings with Countermeasure Mode for ISCAS’89 S1494 (Subset)
B.10 Energy Savings with Countermeasure Mode for ISCAS’89 S1494
B.11 Energy Savings with Extended Mode for ISCAS’89 S1494
List of Tables
3.1 Subthreshold Standard Cell Library From [20]
3.2 Expanded Subthreshold Standard Cells
3.3 Nearthreshold Inverter Propagation Delays
3.4 Sequential Standard Cell Library Gates
3.5 2-Input Logic Gate Input Combinations
3.6 2-Input AND Logic Gate Input Combinations
3.7 Level Shifter Delays and Energy Consumption
4.1 Node Arcs and Delays
4.2 Node Predecessors and Successors
4.3 Early Start Times
4.4 Late Start Times
4.5 Slack Times
6.1 Gate Counts for 8-Bit Multiplier Functional Tests
6.2 Total Energy Per Bit Measurements for 8-Bit Multiplier Functional Simulations
6.3 Total Energy Savings for 8-Bit Multiplier Functional Simulations
6.4 Total Energy Savings for 8-Bit Multiplier, Super/Near/Sub-Vt Implementations
6.5 8-Bit Multiplier Gate Counts Demonstrating Nearthreshold Replacement
6.6 Gate Counts for 64-Bit Adder
6.7 Estimated Total Energy Savings for 64-Bit Adder, Super/Near/Sub-Vt Implementations
6.8 Gate Counts for ISCAS’89 S1488
6.9 Estimated Total Energy Savings for ISCAS’89 S1488, Super/Near/Sub-Vt Implementations
6.10 Gate Counts for Skein Key Scheduler
6.11 Estimated Total Energy Savings for Skein Key Scheduler, Super/Near/Sub-Vt Implementations
6.12 Gate Counts for Skein-256-256
6.13 Estimated Total Energy Savings for Skein-256-256, Super/Near/Sub-Vt Implementations
A.1 Delay and Energy Measurements
A.2 Delay and Energy Measurements, Continued
A.3 Delay and Energy Measurements, Continued
A.4 Delay and Energy Measurements, Level Shifters
Chapter 1
Motivation and Supporting Work
1.1 Motivation
The rapid growth of battery-operated mobile devices, including smart phones and e-readers,
has created demand for methods to reduce the energy consumption of these handheld elec-
tronic devices. Subthreshold and nearthreshold operation are two emerging low-power de-
sign techniques that could potentially provide significant energy savings for these devices.
In addition to mobile electronics, other energy-constrained applications could also benefit
from the reduction of energy consumption, including the implementation of micro-sensor
nodes, often used to study temperature, pressure, and behavior in remote locations [7].
Subthreshold operation results in a significant decrease in the energy consumption of a
circuit. By operating at a subthreshold supply voltage, the energy consumption of a circuit
can be reduced by more than 20X because dynamic energy is proportional to $V_{DD}^2$, where
$V_{DD}$ is the supply voltage of the circuit [13]. However, circuit performance suffers
because of the weak transistor drain currents driving the circuit.
When compared to transistor operation with a superthreshold supply voltage, the transient
behavior of logic in subthreshold is 3 to 4 orders of magnitude slower [5].
More recently, operating transistors at a supply voltage near the threshold voltage of
the devices has been investigated. Although the energy savings provided by nearthreshold
operation are not as significant as subthreshold, an energy savings on the order of 10X has
been observed [13]. An advantage of nearthreshold operation over subthreshold is its much
smaller performance cost: nearthreshold operation is only 1 to 2 orders of magnitude slower
than superthreshold operation [13].
Because of the heavy performance cost of subthreshold operation, its use has mainly
been restricted to niche applications that operate under a severe energy constraint [7].
Nearthreshold operation has been investigated in mainstream applications, including Intel’s
energy-efficient hardware accelerator, which performs encryption and decryption with the
Advanced Encryption Standard (AES). This 45nm chip operates over a supply range of 320mV
to 1.1V [1].
In order to expand the range of applications for subthreshold and nearthreshold opera-
tion, the performance degradation resulting from the use of these devices must be mitigated
[5]. Traditional high-performance applications cannot rely exclusively on subthreshold or
nearthreshold operation because they periodically require high-speed functionality. However,
these applications contain several sub-components that operate at much lower frequencies
or remain idle most of the time. Operating these sub-components in subthreshold or
nearthreshold can reduce the processor’s energy consumption, lowering both on-chip
hotspots and energy costs. Based on this idea, this work investigates a hybrid
design methodology that supports flexible integration of subthreshold and nearthreshold
logic in regular design fabrics. The critical and non-critical paths of a high performance
design are identified, and the non-critical path logic is analyzed to determine if the available
timing slack allows for the integration of functionally-equivalent low-energy subthreshold
or nearthreshold cells. In this design approach, for a fixed performance budget, the user
can implement a combination of subthreshold and nearthreshold designs. With this design
methodology, the application spectrum of circuits utilizing subthreshold and nearthreshold
operation can be expanded.
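The slack-driven replacement idea described above can be illustrated with a small sketch. The Python below is a simplified greedy variant written only for this explanation, not the cell placement algorithm of Chapter 4. Gates are visited in a fixed (assumed) order. Each gate is tentatively moved to a subthreshold cell and then to a nearthreshold cell, and the swap is kept only if the longest path through the circuit still meets the delay budget. The delay multipliers in the hypothetical `DELAY_SCALE` table, the gate names, and the budget rule (an assumed degradation factor times the all-superthreshold critical path delay) are illustrative assumptions.

```python
from graphlib import TopologicalSorter

# Hypothetical delay multipliers relative to a superthreshold cell. The text
# only states that nearthreshold is 1-2 and subthreshold 3-4 orders of
# magnitude slower; these exact values are assumptions.
DELAY_SCALE = {"super": 1.0, "near": 10.0, "sub": 1000.0}


def critical_path_delay(preds, delay):
    """Longest arrival time through a DAG; preds maps gate -> driving gates."""
    arrival = {}
    for gate in TopologicalSorter(preds).static_order():
        start = max((arrival[p] for p in preds[gate]), default=0.0)
        arrival[gate] = start + delay[gate]
    return max(arrival.values())


def assign_regions(preds, base_delay, budget):
    """Greedily move each gate to the lowest-energy region that meets budget."""
    region = {gate: "super" for gate in preds}

    def scaled_delays():
        return {g: base_delay[g] * DELAY_SCALE[region[g]] for g in preds}

    for gate in sorted(preds):  # visiting order is an arbitrary assumption
        for candidate in ("sub", "near"):  # try the lowest-energy cell first
            region[gate] = candidate
            if critical_path_delay(preds, scaled_delays()) <= budget:
                break
        else:  # neither low-voltage cell fits within the available slack
            region[gate] = "super"
    return region


# Hypothetical circuit: A-B-C-E is critical, A-D-E is a shorter side path.
preds = {"A": set(), "B": {"A"}, "C": {"B"}, "D": {"A"}, "E": {"C", "D"}}
base = {gate: 1.0 for gate in preds}
budget = 3 * critical_path_delay(preds, base)  # assumed degradation factor of 3
print(assign_regions(preds, base, budget))
# {'A': 'super', 'B': 'super', 'C': 'super', 'D': 'near', 'E': 'super'}
```

With this budget, only gate D, which lies off the critical path, has enough slack to accept a nearthreshold cell. Recomputing the whole critical path after every tentative swap keeps the sketch simple and guarantees the constraint holds. The cost is efficiency, compared with incremental slack bookkeeping.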
1.2 Supporting Work
Recent studies have shown that nearly 10% of the United States’ total power consumption
comes from consumer electronics, including portable handheld devices and laptop computers
[19]. Reducing the power consumption of these electronics can also extend the battery life of
portable devices. Subthreshold and nearthreshold designs are two low-power
design techniques that have been investigated in recent years.
There is a significant body of existing work on subthreshold design [6, 7, 16, 22, 32, 35].
Because of the potential energy savings offered by subthreshold operation, considerable effort
has gone into understanding transistor behavior in subthreshold designs. Alioto
derives the direct current (DC) behavior of subthreshold CMOS logic in [2], where both
simplified large-signal and small-signal models of MOS transistors in the subthreshold
region are introduced. Gupta et al. further investigate the robustness of subthreshold logic
in [15], noting an increased susceptibility to process variations.
In addition to process variations in subthreshold design, transistor sizing also presents
a design challenge. In order to size the devices for equal rise and fall times of a logic gate,
[15] analyzes the subthreshold swing difference between PMOS transistors and NMOS
transistors, as well as the threshold voltage of the device. In [22], Keane et al. develop a
sizing framework that offers design rules for complex gates with stacks of transistors.
A methodology for designing standard cells in subthreshold is developed in [20]. This
work identifies the minimum-energy point for subthreshold operation and designs a sub-
threshold standard cell library for minimum energy while maintaining performance. Addi-
tionally, the author implements cryptographic circuits in subthreshold to mitigate the threat
of side-channel power attacks.
In [3], the author extends the work in [20] to improve performance. Starting with the
subthreshold standard cells from [20], performance enhancements including substrate biasing
and charge-boosting are implemented. Three substrate biasing techniques are utilized:
drain-drain, gate-gate, and supply-ground biasing. These performance improvements
result in a 10X improvement in frequency and a 2X increase in the energy-delay product of
the circuit. Because of this higher energy consumption, a cell placement algorithm based on
the critical path method is presented. This algorithm integrates the original
subthreshold and performance-enhanced subthreshold standard cells into a single design,
yielding a 50% reduction in the energy-delay product.
Because of the performance degradation of subthreshold circuits, applications utilizing
this design technique have incorporated a hybrid approach. Henry et al. utilize a hybrid
methodology in the design of a low power Fast Fourier Transform (FFT) architecture [18].
In this FFT design, the memory elements operate in superthreshold, while the processing
elements operate in subthreshold. To compensate for the low operating frequency of the
subthreshold processing elements, identical copies are replicated and the FFT operations
are parallelized, reducing power consumption by 65% while maintaining the same through-
put as the high-performance serial design, at a significant area cost.
In the hybrid design methodology, an interface between the superthreshold memory
elements and the subthreshold processing elements is necessary; the logic high output of
a subthreshold gate cannot drive a superthreshold gate without the use of level shifters.
Standard level shifters are examined by Chavan et al. in [8], where additional level shifters
for the up-conversion of subthreshold signals to superthreshold levels are also presented.
An alternative hybrid design approach is demonstrated by Shau in [31]. In this ap-
proach, a circuit initially operates in subthreshold. Small functional modules switch to
a higher nominal voltage when they are in use; after the operation completes, the circuit
switches back to the subthreshold supply voltage. This design methodology circumvents
the performance degradation of subthreshold design at the cost of a control unit to deter-
mine when the nominal supply voltage needs to be applied.
More recently, nearthreshold designs and associated hybrid methodologies have been
explored as well. It is speculated that nearthreshold operation will emerge as a popu-
lar choice for mainstream processors [5]. Direct comparisons of the subthreshold and
nearthreshold regions are analyzed in [13], where an 8-bit processor is operated at various
supply voltages. At the minimum-energy point in subthreshold, the operating frequency is
limited to the sub-350kHz range, with a minimum energy consumption of 3.52pJ/instruction.
The nearthreshold processor operates above 1.79MHz and still offers a 6.6X reduction in
energy consumption when compared to the superthreshold implementation, which con-
sumes 33.1pJ/instruction. Additional work in the area of nearthreshold operation includes
a pass-transistor based logic style that outperforms standard CMOS [25].
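The figures quoted from [13] can be cross-checked with a few lines of arithmetic. The Python below derives two quantities. The first is the nearthreshold energy per instruction implied by the 6.6X reduction relative to the 33.1pJ/instruction superthreshold figure. The second is the ratio of subthreshold to superthreshold energy. It also estimates the speed gap, treating the 350kHz and 1.79MHz bounds as if they were exact operating points. These derived values are computed here as an illustration and are not themselves reported in [13].

```python
# Values reported for the 8-bit processor in [13].
E_SUPER = 33.1        # pJ/instruction, superthreshold
E_SUB_MIN = 3.52      # pJ/instruction, subthreshold minimum-energy point
NEAR_REDUCTION = 6.6  # nearthreshold energy reduction vs. superthreshold
F_SUB_MAX = 350e3     # Hz, upper bound of the subthreshold frequency range
F_NEAR_MIN = 1.79e6   # Hz, lower bound of the nearthreshold frequency range

e_near = E_SUPER / NEAR_REDUCTION    # implied nearthreshold energy
sub_reduction = E_SUPER / E_SUB_MIN  # subthreshold energy reduction
speedup = F_NEAR_MIN / F_SUB_MAX     # minimum nearthreshold speed advantage

print(f"nearthreshold: {e_near:.2f} pJ/instruction")      # 5.02
print(f"subthreshold reduction: {sub_reduction:.1f}X")    # 9.4
print(f"nearthreshold speedup: at least {speedup:.1f}X")  # 5.1
```

Under these assumptions, subthreshold operation spends roughly 30% less energy per instruction than nearthreshold operation (3.52 versus about 5.02 pJ). It also runs at least about 5X slower. This energy-performance trade-off between the two regions is what a hybrid methodology can exploit.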
Dreslinski et al. developed a parallel nearthreshold architecture for a chip multiproces-
sor [14]. In this design, superthreshold memory elements interact with parallel nearthresh-
old cores, where a 53% energy savings is observed in performance benchmarks. In ad-
dition, Intel announced its creation of an energy-efficient hardware accelerator in July
2010 [1]. This reconfigurable prototype 45nm chip contains nearthreshold optimized cir-
cuits, and is targeted for real-time on-die encryption and decryption of media content. The
nearthreshold operation yields an 8X energy savings in this design.
The aforementioned hybrid design methodologies integrate subthreshold or nearthreshold
elements at the block level, replacing entire functional modules with parallel processing
elements built from low-energy logic. In this work, both subthreshold and nearthreshold
combinational CMOS logic are integrated at the sub-block level. Decreasing the granularity
of the logic replacement takes advantage of the timing slack present along non-critical
path logic, while providing greater flexibility. Because timing constraints are satisfied,
functional blocks do not need to be replicated to maintain throughput, saving significant
area when compared to the parallel hybrid designs. Integrating both subthreshold and
nearthreshold elements provides two possible replacements for each logic gate, taking full
advantage of its timing slack. Because the critical path method operates on a directed
acyclic graph, the graph representation of the circuit must contain no feedback loops. In
conventional circuit design, feedback loops exist mostly around sequential elements. Hence,
only combinational cells are replaced in the proposed methodology. This restriction also
mitigates the impact of process variations on sequential elements, which remain at the
superthreshold supply.
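The following sketch illustrates the restriction above. It assumes a hypothetical netlist format: a map from each cell to the cells that drive it. Cutting the graph at the sequential cells, so that flip-flop outputs act like primary inputs, removes the feedback loops. Python's standard `graphlib` module can then confirm that the remaining combinational graph is acyclic. The cell names and helper functions are illustrative only.

```python
from graphlib import CycleError, TopologicalSorter


def combinational_graph(fanin, sequential):
    """Keep only combinational cells and drop edges leaving sequential cells.

    fanin maps each cell to the set of cells driving its inputs. Removing the
    sequential cells makes their outputs behave like primary inputs.
    """
    return {
        cell: {d for d in drivers if d not in sequential}
        for cell, drivers in fanin.items()
        if cell not in sequential
    }


def is_acyclic(preds):
    """Return True if the predecessor map describes a directed acyclic graph."""
    try:
        TopologicalSorter(preds).prepare()  # raises CycleError on a loop
    except CycleError:
        return False
    return True


# Hypothetical counter slice: flip-flop FF drives G1 -> G2, which feeds FF.
fanin = {"FF": {"G2"}, "G1": {"FF"}, "G2": {"G1"}}
print(is_acyclic(fanin))                               # False
print(is_acyclic(combinational_graph(fanin, {"FF"})))  # True
```

A loop built purely from combinational gates would still be reported as a cycle after the cut. Such circuits fall outside a critical-path-based replacement scheme.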
Chapter 2
Ab Initio of Subthreshold/Nearthreshold
Operation
Digital subthreshold operation has been garnering more attention since Vittoz et al. demon-
strated the breakthrough 0.95V subthreshold operation of a CMOS frequency divider with
2MHz performance [35]. More recently, nearthreshold operation has been considered as
a viable alternative, where the magnitude of the transistor drain current falls between the
subthreshold and superthreshold operation.
In this chapter, brief introductions to both subthreshold and nearthreshold operation
are presented. Subthreshold operation is introduced in Section 2.1, where the equation for
subthreshold current is presented. Operation in the nearthreshold region is then discussed
in Section 2.2.
2.1 Subthreshold Operation
In subthreshold (or weak inversion) operation, the flow of current is diffusion-based. In
this mode, the gate-source voltage (VGS) is less than the threshold voltage (Vt), VGS < Vt.
The transistor operates by modulating leakage current, so the drain current depends
exponentially on VGS and is far weaker than in superthreshold operation. Because of this weak
drive current, subthreshold transistors switch more slowly, resulting in inferior performance
compared to traditional superthreshold designs [18]. The drain current in the subthreshold
region is given by Equation 2.1 [16].
$$I_{on-sub} = \frac{W}{L_{eff}}\,\mu_{eff} C_{ox} (m-1) V_T^2 \exp\left(\frac{V_{gs} - V_t}{m V_T}\right)\left(1 - \exp\left(\frac{-V_{ds}}{V_T}\right)\right) \qquad (2.1)$$
In the above equation, W is the width of the transistor, Leff is the effective length of the
transistor, µeff is the effective mobility, Cox is the oxide capacitance, m is the subthreshold
slope factor, and VT is the thermal voltage, where $V_T = kT/q$. In the computation of the
thermal voltage, k is the Boltzmann constant, T is the absolute temperature, and q is the
magnitude of the charge on an electron.
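Equation 2.1 can be evaluated directly to show how strongly the subthreshold current depends on the gate-source voltage. The Python sketch below uses hypothetical device parameters (Vt, m, W/Leff, and the product µeff·Cox), chosen only for illustration. The key observation holds regardless of those values: raising Vgs by m·VT·ln 10 multiplies the current by ten.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
Q = 1.602176634e-19  # elementary charge, C


def thermal_voltage(temp_k=300.0):
    """VT = kT/q, in volts."""
    return K_B * temp_k / Q


def i_sub(vgs, vds, vt, w_over_leff, mu_cox, m, temp_k=300.0):
    """Subthreshold drain current from Equation 2.1, in amperes.

    mu_cox is the product mu_eff * C_ox in A/V^2.
    """
    v_t = thermal_voltage(temp_k)
    return (w_over_leff * mu_cox * (m - 1) * v_t ** 2
            * math.exp((vgs - vt) / (m * v_t))
            * (1 - math.exp(-vds / v_t)))


# Hypothetical device: Vt = 0.4 V, m = 1.5, W/Leff = 2, mu_eff*C_ox = 300 uA/V^2.
params = dict(vt=0.4, w_over_leff=2.0, mu_cox=300e-6, m=1.5)
i_low = i_sub(vgs=0.2, vds=0.3, **params)
i_high = i_sub(vgs=0.3, vds=0.3, **params)
swing = 1.5 * thermal_voltage() * math.log(10)  # volts per decade of current

print(f"I(0.2 V) = {i_low:.3e} A, I(0.3 V) = {i_high:.3e} A")
print(f"ratio for +100 mV: {i_high / i_low:.1f}")          # 13.2
print(f"subthreshold swing: {swing * 1e3:.1f} mV/decade")  # 89.3
```

With the same VDS in both cases, every factor except the first exponential cancels in the ratio. This is why a 100 mV change in Vgs alone produces roughly a 13X change in current for m = 1.5 at 300 K.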
In subthreshold, VGS is below the level required to invert the source-drain channel.
However, it does lower the potential barrier created by the source-channel junction, as shown
in Figure 2.1. As a result, electrons are injected from the source into the channel. These
injected electrons are minority carriers in the p-type body, and their diffusion toward the
drain forms the subthreshold current [36].
[Figure: cross-section of an NMOS transistor (n+ source and drain, p-type body at VB = 0 V, gate oxide) with VGS < Vt, showing electrons forming the subthreshold current ISUB]
Figure 2.1: NMOS Transistor Diffusion Current in Subthreshold
Although subthreshold circuits operate at a lower frequency, they offer significant benefits in
energy consumption. Energy consists of two components: dynamic energy and static energy.
Dynamic energy, the energy due to the charging and discharging of load capacitances, is
calculated using Equation 2.2,
$$E_{dynamic} = \frac{1}{2} C_L V_{dd}^2 \alpha \qquad (2.2)$$
where CL is the load capacitance, Vdd is the supply voltage, and α is the activity factor
[27]. From the dynamic energy equation, the energy savings of reducing the supply voltage
into the subthreshold regime are apparent: because dynamic energy depends quadratically on
the supply voltage, lowering it has a drastic impact on the overall dynamic energy.
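The quadratic dependence in Equation 2.2 is easy to quantify. In the Python sketch below, the load capacitance, the activity factor, and the three supply voltages are illustrative assumptions: 1.2V for superthreshold, 0.6V for nearthreshold (as in Figure 2.2), and 0.3V for subthreshold. Because CL and α cancel, the savings ratio depends only on the square of the voltage ratio.

```python
def dynamic_energy(c_load, vdd, alpha):
    """Equation 2.2: E_dynamic = (1/2) * C_L * Vdd^2 * alpha, in joules."""
    return 0.5 * c_load * vdd ** 2 * alpha


C_LOAD = 10e-15  # 10 fF load capacitance (assumed)
ALPHA = 0.1      # activity factor (assumed)

supplies = {"super": 1.2, "near": 0.6, "sub": 0.3}  # volts (assumed)
energies = {name: dynamic_energy(C_LOAD, v, ALPHA) for name, v in supplies.items()}

for name, energy in energies.items():
    saving = energies["super"] / energy
    print(f"{name}: {energy:.2e} J, {saving:.0f}X below superthreshold")
```

Halving the supply cuts dynamic energy by 4X, and quartering it cuts dynamic energy by 16X, whatever the load capacitance and activity factor are.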
Static energy, the energy due to the leakage current, is calculated using Equation 2.3,
$$E_{static} = I_{leakage} V_{dd} t_p \qquad (2.3)$$
where tp is the propagation delay [27].
In subthreshold operation, although the supply voltage is reduced, the propagation delay
increases substantially, which can offset or even outweigh the reduction in static energy.
Therefore, the benefit of the subthreshold mode of operation is mainly observed through
dynamic energy savings.
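The argument above can be checked by evaluating Equations 2.2 and 2.3 at two operating points. All numbers in the Python sketch below are hypothetical: supply voltages, leakage currents, propagation delays, load capacitance, and activity factor. They are chosen only so that the subthreshold delay is about three orders of magnitude longer, in line with the slowdown cited in Chapter 1. The supply voltage and leakage current both drop, yet the longer propagation delay makes static energy grow. The net benefit therefore comes from the dynamic term.

```python
def dynamic_energy(c_load, vdd, alpha):
    """Equation 2.2: E_dynamic = (1/2) * C_L * Vdd^2 * alpha."""
    return 0.5 * c_load * vdd ** 2 * alpha


def static_energy(i_leak, vdd, t_p):
    """Equation 2.3: E_static = I_leakage * Vdd * t_p."""
    return i_leak * vdd * t_p


C_LOAD, ALPHA = 10e-15, 0.1  # assumed load capacitance (F) and activity factor

# Hypothetical operating points: (Vdd in V, I_leakage in A, t_p in s).
points = {"super": (1.2, 10e-9, 50e-12), "sub": (0.3, 2e-9, 50e-9)}

for name, (vdd, i_leak, t_p) in points.items():
    e_dyn = dynamic_energy(C_LOAD, vdd, ALPHA)
    e_stat = static_energy(i_leak, vdd, t_p)
    print(f"{name}: dynamic {e_dyn:.2e} J, static {e_stat:.2e} J")
```

With these assumed values, static energy rises by 50X in subthreshold while dynamic energy falls by 16X. The total still drops by roughly 9.6X, but static energy grows from a negligible share to a large fraction of the total.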
2.2 Nearthreshold Operation
The exponential reduction in drive current observed in subthreshold operation is a major
drawback impeding its widespread use in practical applications. By increasing the supply
voltage to a value that is slightly above the threshold voltage of the transistor, the perfor-
mance loss is reduced while maintaining a significant portion of the energy savings [42].
While subthreshold current is dominated by diffusion current, the drain current under
nearthreshold (or moderate inversion) operation is predominantly drift-based. In nearthreshold
operation, the gate-source voltage exceeds the threshold voltage, VGS > Vt, as shown in
Figure 2.2.
Because VGS exceeds the threshold voltage of the transistor, the holes in the source-drain
channel are repelled by the positive gate voltage. Thus, the channel is inverted from a p-type
region to an n-type region, where electrons are the majority carriers [36].
[Figure: NMOS transistor with 0.6V applied to the gate and drain, so that VGS > Vt and VDS > (VGS - Vt)]
Figure 2.2: NMOS Transistor with Logic High Input, VDD = 0.6V
Electrons flow through the source-drain channel due to the electric field created by the
drain-source voltage (VDS) as shown in Figure 2.3.
[Figure: cross-section of an NMOS transistor (n+ source and drain, p-type body at VB = 0 V, gate oxide) with VGS > Vt, showing the drift current ID]
Figure 2.3: NMOS Transistor Drift Current in Nearthreshold
Because the transistor switches above its threshold voltage, the drain current follows the conventional above-threshold model used in high-performance design. With the transistor in saturation (VDS > VGS – Vt, as in Figure 2.2), the drain current can be expressed with Equation 2.4, where λ accounts for channel-length modulation [39].
$$I_D = \frac{1}{2} \mu_n C_{ox} \frac{W}{L} (V_{GS} - V_t)^2 (1 + \lambda V_{DS}) \tag{2.4}$$
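Equation 2.4 can be evaluated directly to compare drive current at nearthreshold and superthreshold supplies. The sketch below assumes hypothetical device parameters (µnCox = 300 µA/V², W/L = 2, Vt = 0.4V, λ = 0.1 V⁻¹), not values from the 65nm model package used in this work. The square-law model is also only a first-order approximation for short-channel devices. With these assumptions, the current at VGS = VDS = 0.6V is roughly 9x lower than at 1.0V.

```python
# Sketch of Equation 2.4 (saturation drain current with channel-length
# modulation). Device parameters below are hypothetical assumptions.

def drain_current_sat(mu_n_cox: float, w_over_l: float, v_gs: float,
                      v_t: float, v_ds: float, lam: float) -> float:
    """Return I_D in amperes; valid only above threshold and in saturation."""
    if v_gs <= v_t:
        raise ValueError("Equation 2.4 requires V_GS > V_t")
    if v_ds < v_gs - v_t:
        raise ValueError("Equation 2.4 assumes saturation: V_DS >= V_GS - V_t")
    return 0.5 * mu_n_cox * w_over_l * (v_gs - v_t) ** 2 * (1 + lam * v_ds)


PARAMS = dict(mu_n_cox=300e-6, w_over_l=2.0, v_t=0.4, lam=0.1)  # assumed

i_near = drain_current_sat(v_gs=0.6, v_ds=0.6, **PARAMS)
i_super = drain_current_sat(v_gs=1.0, v_ds=1.0, **PARAMS)
print(f"0.6 V: {i_near * 1e6:.2f} uA, 1.0 V: {i_super * 1e6:.2f} uA, "
      f"ratio {i_super / i_near:.1f}x")
```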
In nearthreshold operation, a performance degradation of 10X has been observed when compared to traditional superthreshold circuit design. When this degradation is compared to the 500-1000X degradation incurred when first entering subthreshold, it is apparent that nearthreshold operation is more likely to be considered in a broad range of applications [13].
2.3 Summary
This chapter provided an introduction to both subthreshold and nearthreshold operation.
The weak drive current of subthreshold devices causes an increased gate delay while pro-
viding substantial energy savings. Nearthreshold logic does not offer energy savings as
substantial as subthreshold, but the stronger drive current results in operation that is orders
of magnitude faster. This behavior in subthreshold and nearthreshold operation influences
the design of the cell-replacement algorithm for the hybrid design methodology.
Chapter 3
Standard Cell Library Characterization
for Hybrid Design Methodology
In this chapter, the characterization process of the standard cell libraries is presented. Sev-
eral design parameters for a standard cell library are described in Section 3.1. The expan-
sion of the subthreshold standard cell library from [20] is described in Section 3.2, and
the design of the nearthreshold standard cell library is presented in Section 3.3. The li-
brary characterization process in Synopsys is discussed in Section 3.4. Energy and delay
estimation methodologies are then analyzed in Section 3.5. To interface between the su-
perthreshold, nearthreshold, and subthreshold gates, level shifters are required; the design
of these level shifters is discussed in Section 3.6. The chapter concludes with a summary
of the standard cell library characterization process.
3.1 Design Considerations for Standard Cell Library
In this section, several considerations when designing a standard cell library are presented.
The analysis of these parameters determines the performance, area, and energy consump-
tion of the resulting standard cell library.
Propagation Delay: The propagation delay of a cell corresponds to the time that is re-
quired for the output to respond to a change at the inputs of the gate. Thus, it corresponds to
the delay that is encountered by a signal when passing through the gate [27]. To determine
the propagation delay of a cell, the time between the 50% transition points of the input and
output waveforms is measured, similar to that shown in Figure 3.1.
Figure 3.1: Propagation Delay
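The 50% measurement described above can be sketched on sampled waveforms. This is a minimal illustration, assuming the waveforms are available as lists of time and voltage samples (for example, exported from a transient simulation) and that crossings are located by linear interpolation between samples. The function names and the ramp waveforms in the example are hypothetical.

```python
from typing import Sequence


def crossing_time(t: Sequence[float], v: Sequence[float], level: float,
                  rising: bool, after: float = float("-inf")) -> float:
    """Interpolated time at which v crosses `level` in the given direction,
    searching only sample intervals that end at or after `after`."""
    for k in range(1, len(t)):
        if t[k] < after:
            continue
        v0, v1 = v[k - 1], v[k]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            # Linear interpolation within the interval [t[k-1], t[k]].
            return t[k - 1] + (level - v0) * (t[k] - t[k - 1]) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def propagation_delay(t, v_in, v_out, v_dd: float, input_rising: bool,
                      inverting: bool = True) -> float:
    """Time between the 50% points of the input and output waveforms."""
    half = 0.5 * v_dd
    t_in = crossing_time(t, v_in, half, rising=input_rising)
    output_rising = input_rising != inverting  # inverting gates flip direction
    t_out = crossing_time(t, v_out, half, rising=output_rising, after=t_in)
    return t_out - t_in


# Hypothetical ramp waveforms for an inverter at 0.6 V (time in ps).
t = [0, 10, 20, 30, 40, 50, 60]
v_in = [0.0, 0.0, 0.3, 0.6, 0.6, 0.6, 0.6]
v_out = [0.6, 0.6, 0.6, 0.45, 0.15, 0.0, 0.0]
tp = propagation_delay(t, v_in, v_out, v_dd=0.6, input_rising=True)
print(f"t_phl = {tp:.1f} ps")
```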
Sizing: The goal in transistor sizing of the base inverter is symmetric output and equiv-
alent low-to-high and high-to-low propagation delays [27]. Symmetric output is observed
on the voltage-transfer characteristic curve of the device; the resulting inverter sizing is
then used as the basis for the sizing of all remaining gates in the standard cell library.
Loads: Transistor operation results in the charging/discharging of the output load ca-
pacitance to the supply/ground voltage. The propagation delay of the circuit is proportional
to the load capacitance; therefore, an increase in the load capacitance will result in a lower
operating frequency [21]. Major components of load capacitance include intrinsic capaci-
tance and extrinsic load capacitance. Intrinsic capacitance consists of diffusion and overlap
capacitances, resulting from connections within the gate itself, while extrinsic load capaci-
tance results from additional cells connected to the gate [27].
Drive Strength: In cases where the output capacitance is large, the delay of the circuit increases. To rectify this situation, gates with stronger drive strength can be utilized. By increasing the widths of the transistors at the output stage of a standard cell, a larger current drives these capacitive loads [41]. These gates with larger drive strengths can replace gates that utilize the standard transistor sizing when there is a large fan-out or a long signal wire. Figure 3.2 shows an inverter with the standard drive strength X1 as well as the larger drive strengths X2, X4, and X8.
Figure 3.2: Inverter with X1, X2, X4, X8 Drive Strengths (transistor widths 2µ/1µ, 4µ/2µ, 8µ/4µ, and 16µ/8µ, respectively)
Power: Although it has historically been a secondary consideration behind speed and area, power consumption has emerged as a primary design consideration. From the equation for instantaneous power in Equation 3.1, it is apparent that a stronger current results in greater power consumption. Thus, larger transistors consume more power than transistors with smaller sizes. The instantaneous power is also directly proportional to the supply voltage, so lowering the supply voltage reduces instantaneous power.
$$P(t) = i_{DD}(t) V_{DD} \tag{3.1}$$
Energy: Given the instantaneous power, the energy consumption of a circuit can be computed directly. The energy consumption of a digital circuit is the integral of the instantaneous power over a given time period, shown in Equation 3.2.
$$E = \int_0^T i_{DD}(t) V_{DD} \, dt \tag{3.2}$$
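In practice, the supply current is only available as simulated samples, so the integral in Equation 3.2 must be approximated numerically. The sketch below uses the trapezoidal rule, which is one reasonable choice rather than the specific method used by the simulator. The triangular current pulse in the example is hypothetical.

```python
from typing import Sequence


def energy_from_supply_current(t: Sequence[float], i_dd: Sequence[float],
                               v_dd: float) -> float:
    """Approximate Equation 3.2 with the trapezoidal rule (result in joules)."""
    if len(t) != len(i_dd) or len(t) < 2:
        raise ValueError("need at least two matching time/current samples")
    charge = 0.0
    for k in range(1, len(t)):
        # Area of one trapezoid under the sampled current waveform.
        charge += 0.5 * (i_dd[k] + i_dd[k - 1]) * (t[k] - t[k - 1])
    return charge * v_dd  # V_DD is constant, so it factors out of the integral


# Hypothetical triangular pulse: 20 uA peak over 1 ns at a 0.6 V supply.
e = energy_from_supply_current([0.0, 0.5e-9, 1e-9], [0.0, 20e-6, 0.0], 0.6)
print(f"E = {e * 1e15:.2f} fJ")
```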
When analyzing digital circuits, a common metric used for comparison among different
gates is the energy-delay product, shown in Equation 3.3 [27], where Pavg is average power,
tp is propagation delay, and CL is load capacitance. This metric combines both performance
and energy into a single value.
$$EDP = P_{avg} t_p^2 = \frac{C_L V_{DD}^2}{2} t_p \tag{3.3}$$
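The two forms of Equation 3.3 agree when the average power is the switching energy C_L·V_DD²/2 delivered once per propagation delay, i.e. P_avg = C_L·V_DD²/(2·t_p). The sketch below checks this with illustrative values (1 fF, 1.0V, 50 ps) that are assumptions, not measurements from this work.

```python
# Sketch of both forms of Equation 3.3; input values are illustrative only.

def edp_from_power(p_avg: float, t_p: float) -> float:
    """First form of Equation 3.3: P_avg * t_p**2 (J*s)."""
    return p_avg * t_p ** 2


def edp_from_capacitance(c_load: float, v_dd: float, t_p: float) -> float:
    """Second form of Equation 3.3: (C_L * V_DD**2 / 2) * t_p (J*s)."""
    return 0.5 * c_load * v_dd ** 2 * t_p


c_load, v_dd, t_p = 1e-15, 1.0, 50e-12   # assumed: 1 fF, 1.0 V, 50 ps
# Average power under which the two forms coincide.
p_avg = c_load * v_dd ** 2 / (2 * t_p)

edp_a = edp_from_power(p_avg, t_p)
edp_b = edp_from_capacitance(c_load, v_dd, t_p)
print(f"EDP = {edp_a:.3e} J*s (power form), {edp_b:.3e} J*s (capacitance form)")
```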
3.2 Subthreshold Standard Cell Library
Previous work in [20] included the development of a subthreshold standard cell library
consisting of 32 standard cells. Transistor sizing in the library is optimized for minimum-
energy subthreshold operation at 0.3V with a commercial 65nm model package. The lo-
cation of the minimum-energy operation is a compromise between the dynamic and static
energies; the point of intersection of the dynamic and static energy curves is defined as the
minimum-energy point. For the design of the subthreshold standard cell library, a seven-
stage ring oscillator was used as a test circuit for an inverter to determine optimal sizing.
The ideal aspect ratio is observed when the charging and discharging currents are equal
and a symmetric output is observed. Although Kanitkar identifies an optimal aspect ratio
of 9:1, CMOS logic gates with stacked transistors would face a large area overhead. To
mitigate this area overhead, an aspect ratio of 5:1 is utilized for the subthreshold standard
cells. Table 3.1 presents the available standard cells provided by Kanitkar’s subthreshold
library [20].
To maintain proper operation under high fan-out or long signal wire conditions, the
standard cell elements are replicated with various drive strengths. By increasing the width
of a transistor in the cell’s drive stage, the drive strength of the cell increases proportion-
ately.
In order to provide greater flexibility to the standard cell library, additional combina-
tional logic gates are designed for subthreshold operation, based on the aspect ratio used
by Kanitkar in [20]. These cells include a two-to-four decoder, a half adder, a full adder,
a two-to-one multiplexer, and a four-to-one multiplexer. The netlists for these new stan-
dard cell gates are created based on the optimal transistor aspect ratio used for the original
subthreshold standard cells.
The additions to the existing subthreshold standard cell library increase the cell count
to 80 combinational elements, listed in Table 3.2. A naming convention of the cell name,
followed by the drive strength and a subthreshold identifier, is utilized in the expanded
subthreshold standard cells.
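The naming convention and the cell count can be cross-checked against Table 3.2. The sketch below transcribes the drive strengths available for each base cell in Table 3.2 and generates names of the form cell name, drive strength, subthreshold identifier. The single-space separator mirrors how the names appear in Table 3.2; the delimiter used in the actual library files is an assumption here, so it is left as a parameter.

```python
# Drive strengths available for each base cell, transcribed from Table 3.2.
DRIVES = {
    "AND2": (1, 2, 4), "AND3": (1, 2, 4), "AND4": (1, 2, 4),
    "AO21": (1, 2), "AO221": (1, 2), "AO22": (1, 2), "AO321": (1, 2), "AO32": (1, 2),
    "AOI21": (1, 2), "AOI221": (1, 2), "AOI22": (1, 2), "AOI321": (1, 2), "AOI32": (1, 2),
    "DECODER24": (1,), "FULLADDER": (1,), "HALFADDER": (1,),
    "INV": (1, 2, 4, 8),
    "MUX21": (1, 2), "MUX41": (1, 2),
    "NAND2": (1, 2, 4), "NAND3": (1, 2, 4), "NAND4": (1, 2),
    "NOR0211": (1, 2, 4), "NOR2": (1, 2, 4), "NOR3": (1, 2, 4), "NOR4": (1, 2),
    "OA21": (1, 2), "OA32": (1, 2), "OAI21": (1, 2), "OAI32": (1, 2),
    "OR2": (1, 2, 4), "OR3": (1, 2, 4), "OR4": (1, 2, 4),
    "XNOR": (1, 2), "XOR": (1, 2),
}


def cell_name(base: str, drive: int, suffix: str = "SUB", sep: str = " ") -> str:
    """Cell name, then drive strength, then library identifier (sep is assumed)."""
    return f"{base}X{drive}{sep}{suffix}"


names = [cell_name(base, d) for base, drives in DRIVES.items() for d in drives]
print(len(names), names[:3])
```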
Table 3.1: Subthreshold Standard Cell Library From [20]
Cell Name Cell Description Cell Function
AND2 2-input AND X*Y
AND3 3-input AND X*Y*Z
AND4 4-input AND W*X*Y*Z
AO21 3-input AND-OR (A0*A1)+B0
AO22 4-input AND-OR (A0*A1)+(B0*B1)
AO221 5-input AND-OR (A0*A1)+(B0*B1)+C0
AO32 5-input AND-OR (A0*A1*A2)+(B0*B1)
AO321 6-input AND-OR (A0*A1*A2)+(B0*B1)+C0
AOI21 3-input AND-OR-INVERT !((A0*A1)+B0)
AOI22 4-input AND-OR-INVERT !((A0*A1)+(B0*B1))
AOI221 5-input AND-OR-INVERT !((A0*A1)+(B0*B1)+C0)
AOI32 5-input AND-OR-INVERT !((A0*A1*A2)+(B0*B1))
AOI321 6-input AND-OR-INVERT !((A0*A1*A2)+(B0*B1)+C0)
INV 1-input inverter !X
NAND2 2-input NAND !(X*Y)
NAND3 3-input NAND !(X*Y*Z)
NAND4 4-input NAND !(W*X*Y*Z)
NOR0211 2-input NOR, 1 inverted input (!A0)*A1
NOR2 2-input NOR !(X+Y)
NOR3 3-input NOR !(X+Y+Z)
NOR4 4-input NOR !(W+X+Y+Z)
OA21 3-input OR-AND (A0+A1)*B0
OA32 5-input OR-AND (A0+A1+A2)*(B0+B1)
OAI21 3-input OR-AND-INVERT !((A0+A1)*B0)
OAI32 5-input OR-AND-INVERT !((A0+A1+A2)*(B0+B1))
OR2 2-input OR X+Y
OR3 3-input OR X+Y+Z
OR4 4-input OR W+X+Y+Z
XNOR 2-input XNOR (!X*!Y) + (X*Y)
XOR 2-input XOR (X*!Y) + (!X*Y)
Table 3.2: Expanded Subthreshold Standard Cells
AND2X1 SUB AOI21X2 SUB NAND2X1 SUB OA21X2 SUB
AND2X2 SUB AOI221X1 SUB NAND2X2 SUB OA32X1 SUB
AND2X4 SUB AOI221X2 SUB NAND2X4 SUB OA32X2 SUB
AND3X1 SUB AOI22X1 SUB NAND3X1 SUB OAI21X1 SUB
AND3X2 SUB AOI22X2 SUB NAND3X2 SUB OAI21X2 SUB
AND3X4 SUB AOI321X1 SUB NAND3X4 SUB OAI32X1 SUB
AND4X1 SUB AOI321X2 SUB NAND4X1 SUB OAI32X2 SUB
AND4X2 SUB AOI32X1 SUB NAND4X2 SUB OR2X1 SUB
AND4X4 SUB AOI32X2 SUB NOR0211X1 SUB OR2X2 SUB
AO21X1 SUB DECODER24X1 SUB NOR0211X2 SUB OR2X4 SUB
AO21X2 SUB FULLADDERX1 SUB NOR0211X4 SUB OR3X1 SUB
AO221X1 SUB HALFADDERX1 SUB NOR2X1 SUB OR3X2 SUB
AO221X2 SUB INVX1 SUB NOR2X2 SUB OR3X4 SUB
AO22X1 SUB INVX2 SUB NOR2X4 SUB OR4X1 SUB
AO22X2 SUB INVX4 SUB NOR3X1 SUB OR4X2 SUB
AO321X1 SUB INVX8 SUB NOR3X2 SUB OR4X4 SUB
AO321X2 SUB MUX21X1 SUB NOR3X4 SUB XNORX1 SUB
AO32X1 SUB MUX21X2 SUB NOR4X1 SUB XNORX2 SUB
AO32X2 SUB MUX41X1 SUB NOR4X2 SUB XORX1 SUB
AOI21X1 SUB MUX41X2 SUB OA21X1 SUB XORX2 SUB
3.3 Nearthreshold Standard Cell Library
Based on the design principles discussed in Section 3.1, the optimal aspect ratio of an inverter for nearthreshold operation at a supply voltage of 0.6V is identified. The
design goals of the inverter sizing include symmetric output and equivalent low-to-high and
high-to-low propagation delays.
With various aspect ratios for the nearthreshold inverter, the voltage-transfer character-
istic (VTC) curves of the circuits are measured utilizing a sweep of input voltage values.
The resulting voltage-transfer characteristic curves are shown in Figure 3.3.
Figure 3.3: Nearthreshold Inverter VTC with 2:1, 3:1, 4:1 Aspect Ratios
As identified on the VTC curve, the midpoint voltage with a 2:1 aspect ratio was mea-
sured at 0.30V. When the aspect ratio is increased to 3:1, the increased PMOS strength
results in a rise in midpoint voltage. Hence, the 2:1 aspect ratio results in symmetric out-
put, while increasing the aspect ratio results in less symmetric output for the characteristic
curve.
To satisfy the equivalent propagation delay constraint, both the low-to-high and high-
to-low propagation delays are measured. Because the inverter is a single input gate, there
is only one condition that triggers a rise in output and one condition that triggers a fall in
output. The propagation delays resulting from these triggers are measured for the inverters with 2:1 and 3:1 aspect ratios; the results are shown in Table 3.3.
Table 3.3: Nearthreshold Inverter Propagation Delays
Aspect Ratio 2:1 3:1
tplh 68.97ps 67.39ps
tphl 69.19ps 78.72ps
From the simulation results, the 2:1 aspect ratio offers approximately equal rise and fall propagation delays. With larger aspect ratios, such as 3:1, the results demonstrate the dominating strength of the pull-up network over the pull-down network, as the high-to-low delay increases and the low-to-high delay decreases.
A 2:1 aspect ratio satisfies both constraints for the design of the nearthreshold inverter;
thus, this aspect ratio is used as the base inverter sizing for the nearthreshold standard cell
library. The same logic gates as the subthreshold standard cell library are implemented at a
supply voltage of 0.6V, providing a set of nearthreshold replacements for the hybrid design
methodology.
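The equal-delay criterion can be expressed as choosing the aspect ratio with the smallest mismatch between tplh and tphl. The sketch below applies it to the measured values in Table 3.3, giving a mismatch of 0.22 ps for 2:1 versus 11.33 ps for 3:1. It covers only the delay constraint; the VTC symmetry check is a separate criterion that the sketch does not model.

```python
# Measured nearthreshold inverter delays from Table 3.3, in picoseconds.
DELAYS_PS = {
    "2:1": {"tplh": 68.97, "tphl": 69.19},
    "3:1": {"tplh": 67.39, "tphl": 78.72},
}


def delay_mismatch(delays: dict) -> float:
    """Absolute difference between low-to-high and high-to-low delays."""
    return abs(delays["tplh"] - delays["tphl"])


for ratio, delays in DELAYS_PS.items():
    print(f"{ratio}: |tplh - tphl| = {delay_mismatch(delays):.2f} ps")

best = min(DELAYS_PS, key=lambda ratio: delay_mismatch(DELAYS_PS[ratio]))
print("most balanced aspect ratio:", best)
```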
3.4 Standard Cell Library Characterization in Synopsys
Design flow tools utilize proprietary library formats with custom syntax for explicitly de-
scribing the propagation delays, leakage power, and other physical attributes of cell behav-
ior. A library characterization process is necessary to generate these library formats for the
Synopsys design flow tools used in the hybrid methodology.
The library characterization flow in Synopsys is shown in Figure 3.4. Starting with the
SPICE netlists of the standard cells, this process includes the use of Liberty NCX, HSPICE,
and Library Compiler to create the characterized library file. Several additional input files
are used with the Liberty NCX tool.
Figure 3.4: Library Characterization Flow in Synopsys (SPICE netlists, cell template files, NCX command file, and library template file are inputs to Synopsys Liberty NCX with the HSPICE simulator, which produces a Liberty timing file; Synopsys Library Compiler converts it into the characterized library file)
The NCX command file specifies options for library characterization, including types
of timing models to generate. For use with the hybrid design methodology, both Compos-
ite Current Source (CCS) and Non-Linear Delay/Power Modeling (NLDM/NLPM) tim-
ing/power models are used for delay/power measurements. The library template file cus-
tomizes characterization parameters, including the net transition times for signals used to
generate power and timing models for the library cells. Additional parameters such as the
net output capacitance to be used during library characterization are also specified in the
library template. Further parameters are specified in the cell template files; for each element of the standard cell library, the cell template specifies the logic function performed by the cell.
The Liberty NCX tool utilizes the information contained in the input files to perform
HSPICE simulations, generating the timing and power measurements of the standard cells;
the resulting measurements are enumerated in a Liberty timing file. However, for use by the
other Synopsys tools utilized in the proposed design methodology, a Synopsys technology
library format is required. In order to convert the Liberty timing file to the Synopsys tech-
nology library format, Library Compiler is utilized, concluding the library characterization
process.
Once the subthreshold and nearthreshold standard cell libraries have been characterized, their cells are available for integration into a high-performance design.
The development of the initial high-performance design requires the characterization of
a superthreshold standard cell library. Hence, a superthreshold library with the typical
2:1 aspect ratio, containing the same combinational logic gates as the subthreshold and
nearthreshold libraries, is characterized for operation at 1.0V. Six superthreshold sequential
cells are implemented and characterized; these cells are listed in Table 3.4. Nearthreshold
and subthreshold variants of these sequential cells are included in each respective library,
but they are not used in the hybrid design methodology due to sensitivity to variations as
well as the fast access/retrieval rates required for these units.
Table 3.4: Sequential Standard Cell Library Gates
DFFX1 DFF PRE REX1 LATCHX1
DFFX2 DFF PRE REX2 LATCHX2
3.5 Gate Delay and Energy Consumption Estimation
The gate delay and energy consumption estimations of the standard cells are vital com-
ponents of the hybrid design methodology. These estimations are used when determining
whether cell replacement is possible as well as when calculating the resulting energy sav-
ings.
3.5.1 Gate Delay Estimation
The delay of a logic gate depends on the previous inputs to the gate as well as the current inputs. Depending on the previous set of inputs, parts of the circuit may be precharged to VDD or GND, so the delay (and energy) for a given set of current inputs is not necessarily the same. For a 2-input logic gate, all sixteen combinations of previous and current inputs shown in Table 3.5 must be considered.
Table 3.5: 2-Input Logic Gate Input Combinations
Previous Inputs Current Inputs
0 0 0 0
0 0 0 1
0 0 1 0
0 0 1 1
0 1 0 0
0 1 0 1
0 1 1 0
0 1 1 1
1 0 0 0
1 0 0 1
1 0 1 0
1 0 1 1
1 1 0 0
1 1 0 1
1 1 1 0
1 1 1 1
However, not every input combination triggers a measurable gate delay, because it may not change the output of the gate. For example, consider a 2-input AND gate. The input combinations and resulting transitions are shown in Table 3.6. Note that only three input combinations trigger a rise transition and three input combinations trigger a fall transition; the remaining ten combinations yield no change in gate output.
Table 3.6: 2-Input AND Logic Gate Input Combinations
Previous Inputs Current Inputs Transition
0 0 0 0 No change
0 0 0 1 No change
0 0 1 0 No change
0 0 1 1 Rise
0 1 0 0 No change
0 1 0 1 No change
0 1 1 0 No change
0 1 1 1 Rise
1 0 0 0 No change
1 0 0 1 No change
1 0 1 0 No change
1 0 1 1 Rise
1 1 0 0 Fall
1 1 0 1 Fall
1 1 1 0 Fall
1 1 1 1 No change
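The classification in Table 3.6 can be generated mechanically for any gate by enumerating every previous/current input pair and comparing the outputs. The sketch below is a minimal illustration; the helper name is hypothetical. It reproduces the 3 rise, 3 fall, and 10 unchanged combinations for the 2-input AND gate, and the 7 rise and 7 fall combinations cited for the 3-input AND gate.

```python
from itertools import product
from typing import Callable


def classify_transitions(func: Callable[..., int], n_inputs: int) -> dict:
    """Count output rise, fall, and no-change cases over all
    (previous inputs, current inputs) pairs of an n-input gate."""
    counts = {"Rise": 0, "Fall": 0, "No change": 0}
    for prev in product((0, 1), repeat=n_inputs):
        for curr in product((0, 1), repeat=n_inputs):
            before, after = func(*prev), func(*curr)
            if before == 0 and after == 1:
                counts["Rise"] += 1
            elif before == 1 and after == 0:
                counts["Fall"] += 1
            else:
                counts["No change"] += 1
    return counts


def and_gate(*bits: int) -> int:
    """Logical AND of any number of 0/1 inputs."""
    return int(all(bits))


and2 = classify_transitions(and_gate, 2)
and3 = classify_transitions(and_gate, 3)
print("AND2:", and2)
print("AND3:", and3)
```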
To determine the maximum gate delay, the propagation delays resulting from each input
combination that triggers a rise transition or a fall transition are measured. The maximum gate delay is obtained as the sum of the maximum rise delay and the maximum fall delay, based on the relationship between these delay values and the maximum switching frequency of a gate shown in Equation 3.4.
$$f_{max} = \frac{1}{t_{phl} + t_{plh}} \tag{3.4}$$
To observe an example of this methodology, consider a 3-input AND gate. There are
seven possible previous/current input combinations that trigger a rise transition, and seven
combinations that trigger a fall transition. The simulation results of the rise and fall triggers
are shown in Figures 3.5 and 3.6, respectively.
Figure 3.5: Rise Transitions for 3-Input AND Gate
Figure 3.6: Fall Transitions for 3-Input AND Gate
From the 3-input AND gate simulation results, the longest rise transition time is mea-
sured at 63.60ps and the longest fall transition time is measured at 61.93ps. Thus, the gate
delay for the superthreshold 3-input AND gate is calculated to be 125.53ps.
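The gate delay rule and Equation 3.4 reduce to a maximum over each transition direction followed by a sum. The sketch below uses only the two maxima reported above for the superthreshold 3-input AND gate, since the individual values for the other trigger combinations are not listed here. With those values, the maximum switching frequency is roughly 7.97 GHz.

```python
from typing import Sequence


def gate_delay(rise_delays: Sequence[float], fall_delays: Sequence[float]) -> float:
    """Worst-case rise delay plus worst-case fall delay (same units as inputs)."""
    return max(rise_delays) + max(fall_delays)


def max_frequency(total_delay_s: float) -> float:
    """Equation 3.4: f_max = 1 / (t_phl + t_plh), with the delay in seconds."""
    return 1.0 / total_delay_s


# Reported maxima for the superthreshold 3-input AND gate, in picoseconds.
delay_ps = gate_delay([63.60], [61.93])
f_max = max_frequency(delay_ps * 1e-12)
print(f"gate delay = {delay_ps:.2f} ps, f_max = {f_max / 1e9:.2f} GHz")
```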
The delay measurements of all superthreshold, nearthreshold, and subthreshold logic
gates are located in Appendix A.
3.5.2 Energy Estimation
The energy consumption of each logic gate is estimated based on the input combinations
that trigger a rise or fall transition at the output node of the gate.
The current drawn from the power supply is measured during the entire simulation period of the output-triggering input combinations. This current is multiplied by the power supply voltage and integrated over the simulation period, yielding the total energy consumption during the simulation.
The total energy measurement is divided by the number of input combinations that were
utilized in the simulation, yielding the average energy consumption per transition period.
Throughout the design methodology, a switching activity factor of 0.5 is used. Thus, the
average energy consumption of each cell, per transition, is computed using Equation 3.5,
$$E_{AVG} = \frac{\alpha \int_0^T i_{DD}(t) V_{DD} \, dt}{\mathrm{TransitionCount}} \tag{3.5}$$
where α is the switching activity factor, iDD(t) is the instantaneous current, VDD is the
supply voltage, and TransitionCount is the number of transitions triggering a change in
output over time period T . The resulting energy consumption measurements for a subset
of the subthreshold and nearthreshold libraries are shown in Figure 3.7.
The energy consumption estimates for each of the superthreshold, nearthreshold, and
subthreshold standard cells are presented in Appendix A.
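Equation 3.5 combines the energy integral with the activity factor and the number of output-triggering transitions. The sketch below evaluates it with a trapezoidal approximation of the integral. The constant 2 µA supply current over 6 ns at 0.3V is a hypothetical waveform, and the count of 6 corresponds to the three rise and three fall combinations of a 2-input AND gate.

```python
from typing import Sequence


def average_energy_per_transition(t: Sequence[float], i_dd: Sequence[float],
                                  v_dd: float, transition_count: int,
                                  alpha: float = 0.5) -> float:
    """Equation 3.5: alpha * integral(i_DD * V_DD dt) / TransitionCount (joules)."""
    if transition_count <= 0:
        raise ValueError("transition_count must be positive")
    # Trapezoidal approximation of the supply-current integral (charge).
    charge = sum(0.5 * (i_dd[k] + i_dd[k - 1]) * (t[k] - t[k - 1])
                 for k in range(1, len(t)))
    return alpha * charge * v_dd / transition_count


# Hypothetical: constant 2 uA over 6 ns at 0.3 V, six output transitions.
e_avg = average_energy_per_transition([0.0, 6e-9], [2e-6, 2e-6], 0.3, 6)
print(f"E_AVG = {e_avg * 1e15:.3f} fJ")
```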
Figure 3.7: Average Energy Consumption of Nearthreshold and Subthreshold Cells (energy consumption [fJ] of each X1 standard cell, Near-Vt vs. Sub-Vt)
3.6 Level Shifter Design
In order to interface between the superthreshold, nearthreshold, and subthreshold cells,
appropriate level shifters must be placed. These level shifters are required when the output of a cell at a lower supply voltage drives the input of a cell at a higher supply voltage, as shown in Figure 3.8. The level shifter is required because the logic-high output of the subthreshold gate does not meet the minimum logic-high input voltage of the superthreshold gate.
The three combinations that are required for the hybrid methodology include interfacing (i)
subthreshold to superthreshold, (ii) subthreshold to nearthreshold, and (iii) nearthreshold
to superthreshold. The use of down level shifters, where a high voltage is shifted to a low
voltage level, is not required in this design methodology.
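The placement rule above can be sketched as a check over driver/load connections: an up level shifter is needed only when the driving cell's supply is lower than the receiving cell's supply. The domain labels and the connection-list input format below are hypothetical; only the three supply voltages come from this work.

```python
from typing import Iterable, List, Tuple

# Supply voltages of the three operating regions used in this work.
SUPPLY_V = {"sub": 0.3, "near": 0.6, "super": 1.0}


def needs_up_shifter(driver: str, load: str) -> bool:
    """True when a lower-supply cell drives a higher-supply cell."""
    return SUPPLY_V[driver] < SUPPLY_V[load]


def required_shifters(connections: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Distinct (driver, load) domain pairs that need an up level shifter."""
    pairs = {(d, l) for d, l in connections if needs_up_shifter(d, l)}
    return sorted(pairs, key=lambda p: (SUPPLY_V[p[0]], SUPPLY_V[p[1]]))


# Hypothetical connections extracted from a hybrid netlist.
edges = [("sub", "super"), ("super", "sub"), ("near", "super"),
         ("sub", "near"), ("near", "near"), ("sub", "super")]
print(required_shifters(edges))
```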
Figure 3.8: Level Shifter Interface Between Subthreshold (0.3V) and Superthreshold (1.0V) Logic Cells
A conventional level shifter design is shown in Figure 3.9. With a logic high input,
MN1 is turned on and MN2 is turned off. Due to the conducting path to GND through
MN1, MP2 turns on, and the input is shifted from a lower voltage to a higher voltage
through the output inverter. This design requires an equal pull-up/pull-down ratio; otherwise, contention at nodes A and B results in crowbar current during switching events [38].
Figure 3.9: Conventional Level Shifter
When the conventional level shifter is operated at subthreshold levels, the NMOS transistors are driven by a very weak gate voltage, and there is contention between the pull-up network and pull-down network of the circuit. The drive strength of MN1 cannot overcome the drive strength of MP1, preventing node A from being pulled to GND. Thus, the conventional level shifter is impractical for interfacing subthreshold voltages [17].
A level shifter proposed specifically for subthreshold to above-threshold operation is
shown in Figure 3.10. This circuit operates as a constant current mirror, with the body of MP2 tied to node B1 [8]. The PMOS-to-NMOS aspect ratio is
1:10; this sizing is necessary to increase the strength of the NMOS transistors being driven
by weak input voltages.
Figure 3.10: Constant Current Mirror With Body Ties Level Shifter
While the constant current mirror is operational at subthreshold levels, constant static
power consumption makes it undesirable for the hybrid design methodology; the static
power would cripple any potential energy savings that would be possible in a circuit with
this level shifter. To observe a static power path, consider an example with a logic-high
input voltage. With this scenario, as shown in Figure 3.11, the B1 node is effectively
grounded, turning MP1 on. Thus, there is a static path from VDD to GND. Hence, the
constant current mirror with body ties is not a suitable level shifter for the proposed design
methodology.
Figure 3.11: Static Current in Constant Current Mirror With Body Ties Level Shifter
Less energy overhead is observed in a variation of the conventional level shifter that was proposed to mitigate the contention between the pull-up and pull-down networks. This contention-mitigated level shifter is shown in Figure 3.12. In the contention-mitigated design, the nodes driving the gates of PMOS transistors MP1 and MP2 switch faster and with less contention, because these nodes are driven by a quasi-inverter [38].
Figure 3.12: Contention-Mitigated Level Shifter
With the contention-mitigated level shifter design, there is no direct static path as seen in the constant current mirror, and the contention is reduced when compared to the conventional circuit design. Thus, this level shifter design results in the lowest energy consumption of the designs considered and is used in the hybrid design methodology. For the subthreshold to nearthreshold level shifter, a PMOS-to-NMOS aspect ratio of 1:10 is utilized to provide additional strength to the NMOS transistors driven by the subthreshold input voltage. For the nearthreshold to superthreshold level shifter, all transistors operate above the threshold voltage, so a 1:1 aspect ratio is utilized, with all transistor widths set to 65nm. For the subthreshold to superthreshold level shifter, the subthreshold to nearthreshold and nearthreshold to superthreshold level shifters are simply cascaded together.
The delays and energy consumption of the three level shifters utilized in the hybrid
design methodology are presented in Table 3.7. A naming convention is also identified that
is used throughout the remainder of the document.
Table 3.7: Level Shifter Delays and Energy Consumption
Level Shifter Identifier Delay [ns] Energy [fJ]
0.3V → 0.6V S2N 63.1110 1.4279
0.6V → 1.0V NEAR 0.4691 1.2301
0.3V → 1.0V SUB 68.1040 3.2338
3.7 Summary
In this chapter, the design parameters of a standard cell library were discussed. These design parameters had previously been applied to the subthreshold standard cell library, which was expanded in this work with additional combinational gates and drive strengths. The nearthreshold standard cell library was designed based on a 2:1 aspect ratio, and the standard cell library characterization process was conducted for the superthreshold, nearthreshold, and subthreshold standard cell libraries. Simulations were performed to estimate the gate delay and energy consumption of each cell for use in the hybrid design methodology. A brief survey of subthreshold level shifter design was also conducted, and the contention-mitigated level shifter achieved the best performance with regard to energy consumption for the three required level shifter designs.
Chapter 4
Critical Path Method for Cell Placement
In this chapter, an optimization algorithm known as the critical path method is presented,
which allows for the placement of nearthreshold and subthreshold cells. An overview of
the goal of the critical path method and cell placement algorithm is provided in Section
4.1. The computational steps of the critical path method are discussed in Section 4.2.
An analysis of the application of the critical path method to digital circuit design is then
provided in Section 4.3. The chapter concludes with a summary of the cell placement
optimization algorithm.
4.1 Overview
After the initial synthesis of the Register-Transfer Level (RTL) model, only the superthreshold versions of the standard cells are utilized in the design. While this restriction ensures that the design is built from the fastest cells available to the designer, it also implies that the cells with the highest energy consumption are being used. For gates
that do not lie along the critical path of the circuit, the path with the longest delay, the
speed benefit provided by the superthreshold gate may be unnecessary. If the nearthreshold
or subthreshold variant of the gate still supplies the necessary signal before the critical path
signal is ready, then the gate can be converted to one of these low-power variations without
impacting the operating frequency of the circuit. The goal of the critical path method is to
identify which superthreshold gates can be replaced with the nearthreshold and subthresh-
old implementations.
4.2 Critical Path Method Algorithm
The general algorithm for the critical path method is presented in Algorithm 1 [28]. The
details of each step of the algorithm are described in its respective subsection.
Algorithm 1 Critical Path Method: Compute slack of each node in a directed acyclic graph
Require: Directed Acyclic Graph
Identify Nodes and Node Arcs
Identify Successors and Predecessors
Compute Early Start Times
Compute Late Start Times
Calculate Slack of Each Node
4.2.1 Identify Nodes and Node Arcs
The input requirement for the critical path method is a directed acyclic graph. This restric-
tion requires that there are no cycles present in the graph, and there is a definite direction
of flow. Each arc on the directed acyclic graph has an associated delay. An example of a
directed acyclic graph that may be used as the input to the critical path method is shown in
Figure 4.1.
Figure 4.1: Directed Acyclic Graph
At the beginning of the algorithm, START and FINISH nodes are placed at the begin-
ning and end of the directed acyclic graph, respectively. The arcs that connect to these
special nodes are given a delay value of zero. The updated directed acyclic graph is shown
in Figure 4.2.
Figure 4.2: Directed Acyclic Graph with Start and Finish Nodes
With the updated directed acyclic graph, a table consisting of each arc and correspond-
ing arc delay is built. The table corresponding to the example directed acyclic graph is
shown in Table 4.1.
Table 4.1: Node Arcs and Delays
Arc Delay
START→ A 0
START→ B 0
START→ C 0
A→ D 5
B→ D 3
B→ E 4
C→ E 5
D→ F 5
E→ F 2
E→ G 1
F→ FINISH 0
G→ FINISH 0
4.2.2 Identify Successors and Predecessors
In order to be able to determine the sequence in which the nodes of the graph are encoun-
tered, it is necessary to know when a node can begin processing. A node is a predecessor
of a given node X if it must be completed before node X can begin processing. Similarly, a
node is a successor of node X if node X must be completed before it can begin processing.
Examples of predecessor and successor identification for a given node are shown in Figures
4.3(a) and 4.3(b).
Figure 4.3: (a) Predecessors and (b) Successors of Node E
The successors and predecessors are stored in a table similar to Table 4.2, which iden-
tifies the predecessors and successors for the given example graph.
Table 4.2: Node Predecessors and Successors
Node Predecessors Successors
START - A, B, C
A START D
B START D, E
C START E
D A, B F
E B, C F, G
F D, E FINISH
G E FINISH
FINISH F, G -
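Table 4.2 can be derived mechanically from the arc list in Table 4.1: every arc (i, j) makes i a predecessor of j and j a successor of i. The sketch below is a minimal illustration of that bookkeeping; the function name is hypothetical.

```python
from collections import defaultdict

# Arcs of the example graph from Table 4.1: (from_node, to_node, delay).
ARCS = [
    ("START", "A", 0), ("START", "B", 0), ("START", "C", 0),
    ("A", "D", 5), ("B", "D", 3), ("B", "E", 4), ("C", "E", 5),
    ("D", "F", 5), ("E", "F", 2), ("E", "G", 1),
    ("F", "FINISH", 0), ("G", "FINISH", 0),
]


def predecessors_and_successors(arcs):
    """Build predecessor and successor lists for every node from an arc list."""
    preds = defaultdict(list)
    succs = defaultdict(list)
    for src, dst, _delay in arcs:
        succs[src].append(dst)
        preds[dst].append(src)
    return dict(preds), dict(succs)


preds, succs = predecessors_and_successors(ARCS)
for node in ["START", "A", "B", "C", "D", "E", "F", "G", "FINISH"]:
    print(node, preds.get(node, []), succs.get(node, []))
```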
4.2.3 Compute Early Start Times
In order for a node to be processed, all of the node's predecessors must have completed their operation. Therefore, the earliest time that a node can start processing is equal to the delay of the longest path from START to that node. To determine this path, the graph is traversed breadth-first in topological order, visiting a node only after all of its predecessors have been visited, while keeping track of the longest delay to reach each node. The early start times for the example graph are given in Table 4.3. The Critical Path column lists the longest-delay path that must be traveled before each node can start to process.
Table 4.3: Early Start Times
Node Early Start Time Critical Path
START 0 -
A 0 START-A
B 0 START-B
C 0 START-C
D 5 START-A-D
E 5 START-C-E
F 10 START-A-D-F
G 6 START-C-E-G
FINISH 10 START-A-D-F-FINISH
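A minimal sketch of the early start computation follows, assuming the arc list of Table 4.1. It uses Kahn's algorithm, a breadth-first scheme that dequeues a node only after all of its predecessors have been processed, and it records the predecessor on each longest path so that the Critical Path column of Table 4.3 can be reconstructed. The graph is assumed to be acyclic, as Algorithm 1 requires.

```python
from collections import deque

# Arcs of the example graph from Table 4.1: (from_node, to_node, delay).
ARCS = [
    ("START", "A", 0), ("START", "B", 0), ("START", "C", 0),
    ("A", "D", 5), ("B", "D", 3), ("B", "E", 4), ("C", "E", 5),
    ("D", "F", 5), ("E", "F", 2), ("E", "G", 1),
    ("F", "FINISH", 0), ("G", "FINISH", 0),
]


def early_start_times(arcs):
    """Longest delay from the source to each node, plus the predecessor on it."""
    succs, in_degree = {}, {}
    for src, dst, delay in arcs:
        succs.setdefault(src, []).append((dst, delay))
        in_degree.setdefault(src, 0)
        in_degree[dst] = in_degree.get(dst, 0) + 1

    early = {n: 0 for n, deg in in_degree.items() if deg == 0}  # sources at t=0
    via = {}
    queue = deque(early)
    while queue:  # Kahn's algorithm: predecessors are always processed first
        node = queue.popleft()
        for nxt, delay in succs.get(node, []):
            candidate = early[node] + delay
            if nxt not in early or candidate > early[nxt]:
                early[nxt] = candidate
                via[nxt] = node
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                queue.append(nxt)
    return early, via


def critical_path_to(node, via):
    """Follow recorded predecessors back to the source, e.g. 'START-C-E'."""
    path = [node]
    while path[-1] in via:
        path.append(via[path[-1]])
    return "-".join(reversed(path))


early, via = early_start_times(ARCS)
for node in ["START", "A", "B", "C", "D", "E", "F", "G", "FINISH"]:
    print(node, early[node], critical_path_to(node, via))
```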
4.2.4 Compute Late Start Times
The late start time is the latest time that a node can start processing and still finish before
the output is needed by its successors. Computation of the late start time is similar to the
process of identifying the early start time, except the process begins at the FINISH node and
works backwards towards the START node. Thus, the late start time of a node is computed
by subtracting the longest path from the node to the FINISH node from the longest-delay
path of the entire graph. The resulting late start times for the example graph are shown in
Table 4.4.
Table 4.4: Late Start Times
Node | Late Start Time | Critical Path
START | 0 | FINISH-F-D-A-START
A | 0 | FINISH-F-D-A
B | 2 | FINISH-F-D-B
C | 3 | FINISH-F-E-C
D | 5 | FINISH-F-D
E | 8 | FINISH-F-E
F | 10 | FINISH-F
G | 10 | FINISH-G
FINISH | 10 | -
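The backward pass can be sketched the same way. Under the same assumed arc delays, the hypothetical `late_start_times` function below walks from FINISH towards START, computes the longest delay from each node to FINISH, and subtracts it from the delay of the longest path through the entire graph.

```python
from collections import deque

# Same assumed arc delays as in the early start sketch.
DELAYS = {
    ("START", "A"): 0, ("START", "B"): 0, ("START", "C"): 0,
    ("A", "D"): 5, ("B", "D"): 3, ("B", "E"): 4, ("C", "E"): 5,
    ("D", "F"): 5, ("E", "F"): 2, ("E", "G"): 1,
    ("F", "FINISH"): 0, ("G", "FINISH"): 0,
}


def late_start_times(delays):
    """LateStartTime(n) = longest path of the graph - longest path from n to FINISH."""
    preds, outdeg = {}, {}
    for (i, j) in delays:
        preds.setdefault(j, []).append(i)
        outdeg[i] = outdeg.get(i, 0) + 1
        outdeg.setdefault(j, 0)
    to_finish = {node: 0 for node in outdeg}  # longest delay from node to FINISH
    queue = deque(n for n, d in outdeg.items() if d == 0)  # only FINISH here
    while queue:
        j = queue.popleft()
        for i in preds.get(j, []):
            to_finish[i] = max(to_finish[i], to_finish[j] + delays[(i, j)])
            outdeg[i] -= 1
            if outdeg[i] == 0:  # every successor of i has been processed
                queue.append(i)
    total = to_finish["START"]  # delay of the longest-delay (critical) path
    return {node: total - d for node, d in to_finish.items()}


print(late_start_times(DELAYS))
```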
4.2.5 Calculate Slack of Each Node
With the early start times and late start times computed, the slack of each arc (i, j) in the
graph can be calculated by Equation 4.1, where $t_{ij}$ is the delay of arc (i, j).
$$\mathrm{Slack}(i,j) = \mathrm{LateStartTime}(j) - \mathrm{EarlyStartTime}(i) - t_{ij} \qquad (4.1)$$
The slack of an arc shows the available time that the arc delay could be extended without
affecting the path of the graph with the longest delay. This means that the arc could utilize
the entire slack available to it, and it would still finish its operation at or before the time
that the critical path requires its completion. The slacks of the example graph are shown in
Table 4.5. The arcs for which the slack time is zero are the arcs that lie along the critical
path of the graph.
4.3 Application to Digital Circuits for Cell Placement
The critical path method described in the preceding section can be applied to digital circuits
to determine whether a standard superthreshold cell can be replaced with a nearthreshold
or subthreshold cell.
Table 4.5: Slack Times
Arc | Slack
START → A | 0
START → B | 2
START → C | 3
A → D | 0
B → D | 2
B → E | 4
C → E | 3
D → F | 0
E → F | 3
E → G | 4
F → FINISH | 0
G → FINISH | 4
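Equation 4.1 combines both passes. The sketch below, again assuming the reconstructed arc delays, obtains a topological order from Python's standard `graphlib` module (Python 3.9 or later), runs the forward and backward passes, and evaluates the slack of every arc. The nested loops scan all arcs for each node, which is adequate for a small example but is not meant to scale to large netlists.

```python
from graphlib import TopologicalSorter

# Same assumed arc delays as in the previous sketches.
DELAYS = {
    ("START", "A"): 0, ("START", "B"): 0, ("START", "C"): 0,
    ("A", "D"): 5, ("B", "D"): 3, ("B", "E"): 4, ("C", "E"): 5,
    ("D", "F"): 5, ("E", "F"): 2, ("E", "G"): 1,
    ("F", "FINISH"): 0, ("G", "FINISH"): 0,
}


def arc_slacks(delays):
    """Return {(i, j): slack} using Equation 4.1."""
    preds = {}
    for (i, j) in delays:
        preds.setdefault(j, set()).add(i)
        preds.setdefault(i, set())
    order = list(TopologicalSorter(preds).static_order())  # predecessors first

    est = {node: 0 for node in order}
    for node in order:  # forward pass (Table 4.3)
        for (i, j), t in delays.items():
            if i == node:
                est[j] = max(est[j], est[i] + t)

    lst = {node: est["FINISH"] for node in order}
    for node in reversed(order):  # backward pass (Table 4.4)
        for (i, j), t in delays.items():
            if j == node:
                lst[i] = min(lst[i], lst[j] - t)

    return {(i, j): lst[j] - est[i] - t for (i, j), t in delays.items()}


slacks = arc_slacks(DELAYS)
print([arc for arc, s in slacks.items() if s == 0])  # zero-slack (critical) arcs
```

The zero-slack arcs START→A, A→D, D→F, and F→FINISH form the critical path, matching Table 4.5.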
4.3.1 Circuit Representation as Directed Acyclic Graph
In order for the critical path method to be applicable to a digital circuit, the circuit must be
representable as a directed acyclic graph. For combinational logic, this task can be
achieved by representing standard cells as the arcs of the graph, with the input and output
pins as the nodes of the graph. Consider the combinational logic circuit shown in Figure
4.4.
Figure 4.4: Combinational Logic Circuit (inputs A, B, and C; internal nets N1, N2, and N3; output OUT)
This circuit can be represented as the directed acyclic graph shown in Figure 4.5.
Figure 4.5: Combinational Logic Circuit as Directed Acyclic Graph (nodes START, A, B, C, N1, N2, N3, OUT, and FINISH; gate arcs labeled AND, NAND, and INV, plus four zero-delay arcs)
Because sequential logic circuits often incorporate feedback loops into the circuit de-
sign, special design considerations must be made when building the directed acyclic graph
for sequential circuits. Consider the simple sequential circuit of Figure 4.6.
Figure 4.6: Sequential Logic Circuit (inputs A and B, a D-flip-flop clocked by CLK, nets N1, N2, and N3, and output OUT)
Upon initial examination, it appears that the feedback loop would render the critical
path method unusable; however, the sequential element, the D-flip-flop, can be represented
as two nodes, one node for the flip-flop input and one node for the flip-flop output. Thus,
in the example circuit of Figure 4.6, nodes N2 and N3 are represented as unique nodes
in the graph representation of the circuit. This design methodology is acceptable because
in sequential elements such as flip-flops, the only time that the input is read or the output
changes is on the active edge of the clock. The time between the active edges of the clock is
used by the combinational logic to perform its function and set up the next input to be read
by the flip-flop, on the next active edge of the clock. Therefore, the critical path method
can be applied to the combinational elements that perform the logic functions that set up
the input to the sequential elements. The outputs of the flip-flops are then treated as inputs
to the circuit. For the sequential circuit shown above, the resulting directed acyclic graph
is shown in Figure 4.7.
Figure 4.7: Sequential Logic Circuit as Directed Acyclic Graph (nodes START, A, B, N1, N2, N3, OUT, and FINISH; gate arcs labeled AND and INV, plus four zero-delay arcs)
Following this convention of creating directed acyclic graphs, the critical path method
can be applied to both combinational and sequential logic circuits.
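The flip-flop splitting rule can be illustrated with a small netlist-to-graph sketch. The netlist below is hypothetical (it is in the spirit of Figure 4.6 but is not a transcription of it), and the tuple-based netlist format, the `FF_D`/`FF_Q`/`PI`/`PO` labels, and the helper names are all assumptions. Each gate input net becomes an arc to the gate output net, each flip-flop D net becomes an endpoint feeding FINISH, and each Q net becomes a source driven from START, so the feedback loop through the flip-flop disappears.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical netlist: gates are (cell_type, input_nets, output_net) and
# flip-flops are (d_net, q_net). N3 feeds back into the logic that drives N2.
GATES = [("AND", ["A", "B"], "N1"), ("AND", ["N1", "N3"], "N2"), ("INV", ["N3"], "OUT")]
FLIP_FLOPS = [("N2", "N3")]
PRIMARY_INPUTS, PRIMARY_OUTPUTS = ["A", "B"], ["OUT"]


def build_dag(gates, flip_flops, inputs, outputs):
    """Return {(src, dst): label}; each flip-flop is split into two nodes."""
    arcs = {}
    for cell, ins, out in gates:
        for net in ins:
            arcs[(net, out)] = cell  # one arc per input pin -> output pin
    for d_net, q_net in flip_flops:
        arcs[(d_net, "FINISH")] = "FF_D"  # D pin: an endpoint of the logic
        arcs[("START", q_net)] = "FF_Q"   # Q pin: treated as a circuit input
    for net in inputs:
        arcs[("START", net)] = "PI"
    for net in outputs:
        arcs[(net, "FINISH")] = "PO"
    return arcs


def is_acyclic(arcs):
    preds = {}
    for src, dst in arcs:
        preds.setdefault(dst, set()).add(src)
    try:
        list(TopologicalSorter(preds).static_order())
        return True
    except CycleError:
        return False


dag = build_dag(GATES, FLIP_FLOPS, PRIMARY_INPUTS, PRIMARY_OUTPUTS)
print(is_acyclic(dag))
```

In a full flow, each gate arc would carry the cell's superthreshold propagation delay and the START/FINISH arcs would carry zero delay; here the acyclicity check is the point of the example. Adding a direct N2→N3 arc through the flip-flop instead would reintroduce the cycle.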
4.3.2 Cell Placement Algorithm
The goal of the cell placement algorithm is to minimize energy consumption by integrating
subthreshold and nearthreshold logic into a high-performance design while satisfying timing
constraints. In the directed acyclic graph that represents the circuit being analyzed, the delay
of each arc is the propagation delay of the standard cell that it represents. Because the
synthesized netlist contains only superthreshold cells, the superthreshold propagation delays
are used when the critical path method is executed. At the end of the critical path method,
the slack time of each superthreshold logic gate in the circuit has been computed. The cell
placement algorithm then determines whether there is enough slack time to replace each
superthreshold cell with a functionally equivalent nearthreshold or subthreshold cell, with the
objective of replacing each logic gate with the lowest-energy gate possible; that is, it takes a
greedy-subthreshold approach.
The algorithm starts at the FINISH node and works through the circuit in reverse-order
traversal until it eventually reaches the START node. During the processing of each gate,
the algorithm determines if a level shifter would be needed if the cell is replaced with a
nearthreshold or subthreshold cell, and takes this additional delay into consideration. If
the slack time exceeds the time required to insert a subthreshold cell and level shifter (if
needed), then the logic gate is converted to a subthreshold gate. If the slack time does not
meet the subthreshold criterion, but meets the time required for the nearthreshold cell and
level shifter (if needed), then the superthreshold gate is replaced with a nearthreshold gate.
After the gate has been updated, the new late start time for the arc is computed, taking the
additional delay into consideration. The algorithm is summarized in Algorithm 2.
Algorithm 2 Greedy-Subthreshold Cell Placement Algorithm
Require: Completion of critical path method for slack times
for all arcs in reverse-order traversal do
    RequiredSlack ← t_{p,subthreshold} − t_{p,superthreshold}
    if LevelShifterRequired then
        RequiredSlack ← RequiredSlack + t_{p,sub-to-gate level shifter}
    end if
    if AvailableSlack ≥ RequiredSlack then
        Replace superthreshold cell with subthreshold cell
    else
        RequiredSlack ← t_{p,nearthreshold} − t_{p,superthreshold}
        if LevelShifterRequired then
            RequiredSlack ← RequiredSlack + t_{p,near-to-gate level shifter}
        end if
        if AvailableSlack ≥ RequiredSlack then
            Replace superthreshold cell with nearthreshold cell
        end if
    end if
    Update late start time of arc nodes
end for
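The per-arc decision of Algorithm 2 can be written as a small function. This is a sketch under stated assumptions: the level shifter delays `t_ls_near` and `t_ls_sub` and the way a caller determines `ls_required` are placeholders, and the late start time update that follows each decision is omitted. The example delays are the AND gate delays given in Figures 6.5 to 6.7; the level shifter delay in the second call is hypothetical.

```python
def choose_cell(available_slack, t_super, t_near, t_sub,
                ls_required=False, t_ls_near=0.0, t_ls_sub=0.0):
    """Return 'sub', 'near', or 'super' for one arc, mirroring Algorithm 2."""
    # Try the subthreshold variant first (greedy-subthreshold).
    required = t_sub - t_super + (t_ls_sub if ls_required else 0.0)
    if available_slack >= required:
        return "sub"
    # Otherwise fall back to the nearthreshold variant.
    required = t_near - t_super + (t_ls_near if ls_required else 0.0)
    if available_slack >= required:
        return "near"
    return "super"  # not enough slack: keep the superthreshold cell


# AND gate propagation delays in ns (Figures 6.5 to 6.7).
T_SUPER, T_NEAR, T_SUB = 0.10176, 0.8803, 278.1
print(choose_cell(1.0, T_SUPER, T_NEAR, T_SUB))                       # near
print(choose_cell(1.0, T_SUPER, T_NEAR, T_SUB, True, t_ls_near=0.5))  # super
print(choose_cell(300.0, T_SUPER, T_NEAR, T_SUB))                     # sub
```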
4.3.3 Performance Degradation Factor
To provide greater flexibility during the design process, a methodology to reduce the oper-
ating frequency of the circuit is provided through a performance degradation factor (PDF).
When a PDF is specified, the late start time of each cell in the circuit is multiplied by
this degradation factor, providing additional time for each operation to complete. This
additional time is reflected in the slack time calculation, increasing the likelihood that
enough slack is available for cell replacement. The cell placement algorithm itself is
unchanged by the presence of a PDF.
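The effect of a PDF on Equation 4.1 can be shown in a few lines. The sketch below assumes, as the text describes, that only the late start time is scaled while early start times and arc delays keep their superthreshold values; the function name is illustrative, and the arc delay of 5 is the assumed value used in the critical path sketches above.

```python
def slack_with_pdf(late_start_j, early_start_i, t_ij, pdf=1.0):
    """Equation 4.1 with LateStartTime(j) scaled by the performance degradation factor."""
    return pdf * late_start_j - early_start_i - t_ij


# Arc D->F lies on the critical path of the example graph:
# LateStartTime(F) = 10, EarlyStartTime(D) = 5, assumed delay 5.
for pdf in (1, 2, 4):
    print(pdf, slack_with_pdf(10, 5, 5, pdf))
```

Even an arc on the critical path, which has zero slack at a PDF of 1, gains slack once the PDF exceeds 1, which is why a larger PDF allows more cells to be replaced.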
4.4 Summary
The critical path method and cell placement algorithm have been presented for the analysis
of both combinational and sequential circuits. The algorithms calculate the slack time that
is available to each logic gate in a circuit and determine if there is enough slack time to
replace the superthreshold gate with a nearthreshold or subthreshold gate. A performance
degradation factor is also discussed to allow for the insertion of more nearthreshold and
subthreshold cells at the cost of slower performance; in cases where a performance degra-
dation factor of 1 is provided by the designer, there is no change in the operating frequency
of the original circuit.
Chapter 5
Hybrid NESST Design Methodology
This chapter describes the Hybrid NESST (NEar/Super/SubThreshold) design methodol-
ogy. The hybrid design methodology is discussed in Section 5.1, and a heuristic for the
hybrid design methodology is presented in Section 5.2.
5.1 Hybrid NESST
The proposed Hybrid NESST design methodology integrates the characterized superthresh-
old, nearthreshold, and subthreshold libraries, resulting in a hybrid super/near/subthreshold
design methodology. A summary of the proposed design methodology is shown in Figure
5.1.
The Hybrid NESST user interface assists the designer in utilizing the hybrid design
methodology. Starting with an RTL model of the circuit, the design is synthesized, resulting
in a gate-level netlist that uses only superthreshold gates. The automated design methodology
prompts the user to identify the type of constraint that the design methodology should attempt
to satisfy. During the execution of Hybrid NESST, the user is also prompted to specify the
performance degradation, relative to the original operating frequency, that is acceptable for
the resulting circuit. Allowing a larger PDF can yield significantly lower energy consumption,
because slower clock speeds allow more of the slower subthreshold or nearthreshold devices
to replace the faster superthreshold devices. The user also specifies the design constraints in
terms of timing, energy, or both timing and energy. The Hybrid NESST design methodology
operates towards satisfying these constraints as shown below:
• Timing Constraint: Utilize the maximum PDF that meets the timing constraint.
• Energy Constraint: Utilize the minimum PDF that meets the energy constraint.
• Timing and Energy Constraints: Utilize the minimum PDF that meets the energy
constraint and verify that this PDF satisfies the timing constraint.
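The three rules can be sketched as a selection over candidate PDFs. The sketch assumes that the energy of each candidate PDF is already known; for illustration it reuses the 8-bit multiplier totals reported in Table 6.3 and assumes a clock period of PDF × 4ns (the 4ns baseline period of Section 6.1.1). The function name and the dictionary-based interface are hypothetical, and the actual tool's search may differ.

```python
# Illustrative data only: total energy per bit [fJ] of the 8-bit multiplier at the
# PDFs listed in Table 6.3. The clock period is assumed to be PDF x 4ns.
ENERGY_FJ = {1: 45.7216, 2: 40.7365, 3: 40.6690, 4: 37.6550, 5: 33.9177,
             10: 19.1101, 100: 31.9675, 1000: 33.8636, 2000: 25.0503,
             3000: 17.8350, 4000: 17.2803}
BASE_PERIOD_NS = 4.0


def select_pdf(max_period_ns=None, max_energy_fj=None, energy=ENERGY_FJ):
    """Pick a PDF following the three constraint rules listed above."""
    pdfs = sorted(energy)

    def meets_timing(pdf):
        return max_period_ns is None or pdf * BASE_PERIOD_NS <= max_period_ns

    if max_energy_fj is None:
        # Timing constraint: the maximum PDF that meets the timing constraint.
        feasible = [pdf for pdf in pdfs if meets_timing(pdf)]
        return max(feasible) if feasible else None
    for pdf in pdfs:
        if energy[pdf] <= max_energy_fj:
            # Minimum PDF meeting the energy constraint, then verify timing
            # (always true when no timing constraint was given).
            return pdf if meets_timing(pdf) else None
    return None


print(select_pdf(max_period_ns=20))                     # timing only: 5
print(select_pdf(max_energy_fj=35))                     # energy only: 5
print(select_pdf(max_period_ns=50, max_energy_fj=20))   # both: 10
print(select_pdf(max_period_ns=30, max_energy_fj=20))   # both, infeasible: None
```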
At the end of the design methodology, the performance degradation and energy savings
of the optimized design are reported to the user. The energy savings are computed with
the estimation models that were developed for the standard cell libraries. The design
methodology also creates a new transistor-level netlist based on the optimized design, for
use in HSPICE simulations to verify its functionality.
In addition to the hybrid approach, the Hybrid NESST design methodology also allows
the designer to investigate the fully subthreshold or fully nearthreshold implementations of
the circuit. In these implementations, all combinational and sequential logic is converted
to the respective supply voltage; thus, no level shifters are incorporated into the design and
the output of the circuit is at the respective supply voltage. The expected energy savings
and the maximum operating frequency of the circuit are provided at the end of the design
methodology.
The flowchart of the Hybrid NESST user interface is shown in Figure 5.2.
Figure 5.1: Hybrid Super/Near/Subthreshold Design Methodology (flow: RTL Model; Functional Verification (ModelSim); Synthesis (Design Compiler); Identify User Design Constraints; Critical Path Method and Cell Placement Algorithms; User Constraints Met? (Y/N); Transistor-Level Netlist Generation)
Figure 5.2: Hybrid NESST Program Design Methodology (flowchart: after program start, the user selects a Subthreshold, Nearthreshold, or Hybrid flow. The subthreshold and nearthreshold flows read the RTL model, synthesize the design in superthreshold, convert the netlist to subthreshold or nearthreshold, determine the maximum operating frequency, and estimate energy and create a log. The hybrid flow reads the RTL model, synthesizes the design in superthreshold, sets the maximum PDF to 4000, and branches on the constraint type (Timing, Energy, or Both) through steps that identify the maximum PDF, identify the smallest PDF meeting the constraints, determine if the constraints are met at the maximum PDF, optimize at the maximum PDF, and estimate energy and create a log)
5.2 Heuristic for Hybrid Energy Savings Estimation
When the hybrid design option of the Hybrid NESST methodology is utilized, the resulting
circuit will integrate nearthreshold and subthreshold CMOS logic. In order for energy
savings to be observed, the optimized logic must consume less energy than the original
superthreshold implementation, including the energy overhead of required level shifters.
A heuristic approach has been developed that identifies the required combinational logic
depth to exhibit energy savings with nearthreshold or subthreshold cell replacements.
The design of the heuristic approach assumes that a logic sequence of N combinational
stages will require one level shifter at the end of the final combinational logic gate, similar
to the logic shown in Figure 5.3.
Figure 5.3: Heuristic Logic Sequence of N Combinational Stages (a chain of N combinational stages followed by a single level shifter)
The energy of the unoptimized design is computed using Equation 5.1,
$$E_{unopt} = N \cdot E_{avg,superVt} \qquad (5.1)$$
where $N$ is the number of combinational logic stages and $E_{avg,superVt}$ is the average energy
consumption of a superthreshold logic gate in the current design.
The average subthreshold or nearthreshold energy consumption per gate can then be
used to determine the optimized energy consumption. The energy overhead of a single
level shifter is also incorporated into the calculation of the optimized energy, as shown in
Equation 5.2,
$$E_{opt} = N \cdot E_{avg,subVt/nearVt} + E_{LS} \qquad (5.2)$$
where $N$ is the number of combinational logic stages, $E_{avg,subVt/nearVt}$ is the average energy
consumption of a subthreshold/nearthreshold logic gate in the current design, and $E_{LS}$
is the energy consumption of the corresponding level shifter.
In order to determine the number of logic stages required to exhibit energy savings
with the optimized design, the values of $N$ that satisfy the inequality $E_{opt} < E_{unopt}$ must
be computed. Substituting Equations 5.1 and 5.2 gives
$N \cdot E_{avg,subVt/nearVt} + E_{LS} < N \cdot E_{avg,superVt}$, which, provided that
$E_{avg,superVt} > E_{avg,subVt/nearVt}$, rearranges to
$N > E_{LS} / (E_{avg,superVt} - E_{avg,subVt/nearVt})$. When the combinational logic depth
exceeds this value of $N$, a reduction of energy is observed. When the logic depth does not
meet this requirement, the overhead of the level shifter masks the energy savings obtained by
converting to nearthreshold or subthreshold logic. Thus, the designs that yield the greatest
energy savings are designs with deeper combinational logic.
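The break-even depth can be computed directly from the rearranged inequality. The sketch below uses the AND gate energies quoted in Section 6.1.2 as stand-ins for the average gate energies; the level shifter energy of 5.0fJ is hypothetical, chosen only to make the example concrete, and the function name is illustrative.

```python
import math


def min_stages_for_savings(e_super_fj, e_low_fj, e_ls_fj):
    """Smallest integer N with N*e_low + e_ls < N*e_super (Equations 5.1 and 5.2)."""
    saving_per_stage = e_super_fj - e_low_fj
    if saving_per_stage <= 0:
        return None  # the low-voltage gate never repays the level shifter
    # N must strictly exceed e_ls / saving_per_stage: take the next integer.
    return math.floor(e_ls_fj / saving_per_stage) + 1


# Average gate energies from the AND gates of Section 6.1.2; E_LS is hypothetical.
print(min_stages_for_savings(1.01, 0.28, 5.0))  # nearthreshold: 7
print(min_stages_for_savings(1.01, 0.11, 5.0))  # subthreshold: 6
```

With these numbers, a nearthreshold chain must be at least seven stages deep, and a subthreshold chain at least six, before the single level shifter is paid back.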
5.3 Summary
This chapter presented the Hybrid NESST design methodology, which allows for the inte-
gration of subthreshold and nearthreshold CMOS logic into a high-performance design. A
heuristic approach for estimating the required combinational logic depth to exhibit energy
savings in the hybrid approach was also presented.
Chapter 6
Results and Analysis
This chapter presents results and subsequent analysis of the Hybrid NESST design methodology
for multiple benchmark circuits. All circuits use the commercial 65nm models
on which the standard cell libraries were based. The functionality of the modified logic
circuit after the execution of the cell placement algorithm is verified with an 8-bit multi-
plier, presented in Section 6.1. The resulting output for a 64-bit adder is summarized in
Section 6.2. Additional analysis for an ISCAS’89 benchmark circuit and a cryptographic
hash function, Skein, is presented in Sections 6.3 and 6.4, respectively. The chapter con-
cludes with a summary of the result findings.
6.1 8-Bit Multiplier
In order to verify the design methodology with a simple circuit, an implementation of an 8-
bit multiplier was applied to Hybrid NESST. The multiplier accepts two 8-bit multiplicand
inputs, and the entire multiplication process is completed during a single clock cycle. At
the rising edge of the clock, the resulting value is stored utilizing D-flip-flops. The 16-bit
product is the value that is stored in these flip-flops. The block diagram of the multiplier is
shown in Figure 6.1.
Figure 6.1: 8-Bit Multiplier with Sequential Output Architecture (combinational logic with inputs IP1[7:0] and IP2[7:0] driving D-flip-flops, clocked by CLK, that hold outputs OP[15:0])
6.1.1 Functional Performance and Energy Analysis
The 8-bit multiplier was synthesized using the characterized superthreshold library. For the
synthesized 8-bit multiplier, a minimum clock period of 4ns was measured, corresponding
to an operating frequency of 250MHz. During functional testing to verify the correct exe-
cution of the original multiplier, the total current drawn from the supply voltage was mea-
sured. This current was used to calculate the energy consumption of the multiplier. Based
on this energy consumption measurement, the average energy per output bit was computed.
For the original multiplier, this value was measured at 44.54fJ/bit.
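The text does not give the exact post-processing, so the following is only a sketch under stated assumptions: the energy is taken as the supply voltage times the time integral of the sampled supply current (trapezoidal rule), and the per-bit value is assumed to be that energy divided by the number of output bits (16 for the multiplier). The waveform below is hypothetical and the function name is illustrative.

```python
def energy_per_bit_fj(times_s, currents_a, vdd_v, n_bits):
    """Supply energy VDD * integral(i dt) by the trapezoidal rule, per output bit, in fJ."""
    if len(times_s) != len(currents_a) or len(times_s) < 2:
        raise ValueError("need at least two matching time/current samples")
    charge_c = 0.0
    for t0, t1, i0, i1 in zip(times_s, times_s[1:], currents_a, currents_a[1:]):
        charge_c += (t1 - t0) * (i0 + i1) / 2.0  # charge over one sample interval
    return vdd_v * charge_c / n_bits * 1e15  # joules -> femtojoules


# Hypothetical waveform: a constant 1mA drawn from 1.0V over one 4ns clock period,
# normalized to the multiplier's 16 output bits (about 250 fJ/bit).
print(energy_per_bit_fj([0.0, 2e-9, 4e-9], [1e-3, 1e-3, 1e-3], 1.0, 16))
```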
The critical path method and cell placement algorithm were used to reduce the average
energy consumption of the multiplier. With the original multiplier circuit as the input to the
critical path method, the slack times of the circuit were calculated. Various performance
degradation factors were utilized in the hybrid design methodology.
For these functional experiments, the energy drawn from each power supply (1.0V,
0.6V, and 0.3V) was measured independently. An additional 1.0V power supply was in-
serted to isolate the energy drawn by the level shifters in the design, allowing for a direct
observation of the impact of the level shifters.
The resulting cell counts after the cell placement algorithm completed execution are
shown in Table 6.1. With a PDF of 1, the operating frequency remained unchanged, but
26 superthreshold cells that were not on the critical path of the circuit were replaced with
nearthreshold cells. As the PDF increased up to 10, the number of replaced superthreshold
cells increased as well. Once the degradation factor reached 10, all combinational logic
was replaced with nearthreshold cells. Because the hybrid approach does not affect the
sequential elements of the circuit, the 16 remaining superthreshold cells were the 16
D-flip-flops. As the degradation factor continued to increase, the slack of individual gates
became large enough to allow for the insertion of subthreshold cells. At the maximum
degradation factor (4000), all combinational logic is implemented in subthreshold,
requiring 16 level shifters from 0.3V to 1.0V to interface the subthreshold gates with the
superthreshold sequential cells.
Table 6.1: Gate Counts for 8-Bit Multiplier Functional Tests
PDF | Super-Vt Cells | Near-Vt Cells | Sub-Vt Cells | Total Level Shifters | NEAR Level Shifters | S2N Level Shifters | SUB Level Shifters
1 | 284 | 26 | 0 | 25 | 25 | 0 | 0
2 | 250 | 60 | 0 | 47 | 47 | 0 | 0
3 | 238 | 72 | 0 | 47 | 47 | 0 | 0
4 | 222 | 88 | 0 | 40 | 40 | 0 | 0
5 | 186 | 124 | 0 | 38 | 38 | 0 | 0
10 | 16 | 294 | 0 | 16 | 16 | 0 | 0
100 | 168 | 137 | 5 | 43 | 38 | 5 | 0
1000 | 127 | 132 | 51 | 44 | 25 | 3 | 16
2000 | 56 | 118 | 136 | 24 | 1 | 7 | 16
3000 | 16 | 49 | 245 | 18 | 0 | 2 | 16
4000 | 16 | 0 | 294 | 16 | 0 | 0 | 16
The energy measurements for the 8-bit multiplier simulations are presented in Table
6.2, and several observations can be made from these results. In general, a larger
degradation factor lowers the energy drawn from VDD because superthreshold cells are
being replaced, while the energy drawn from VDDN and VDDL rises as nearthreshold and
subthreshold cells are inserted into the design. The level shifter energy consumption
reveals a further cost of replacing a superthreshold cell, in addition to the extra delay it
adds to the circuit.
Table 6.2: Total Energy Per Bit Measurements for 8-Bit Multiplier Functional Simulations
PDF | E_VDD (1.0V) [fJ] | E_VDDN (0.6V) [fJ] | E_VDDL (0.3V) [fJ] | E_VDDLS (1.0V) [fJ] | E_TOTAL [fJ]
Baseline | 44.5390 | - | - | - | 44.5390
1 | 43.7600 | 0.5421 | - | 1.4195 | 45.7216
2 | 36.6979 | 1.7680 | - | 2.2706 | 40.7365
3 | 35.6188 | 2.3285 | - | 2.7217 | 40.6690
4 | 32.7125 | 2.8092 | - | 2.1333 | 37.6550
5 | 27.1833 | 4.4094 | - | 2.3250 | 33.9177
10 | 3.7973 | 13.6865 | - | 1.6264 | 19.1101
100 | 24.3188 | 5.4219 | 0.0335 | 2.1933 | 31.9675
1000 | 24.0188 | 6.2358 | 0.6994 | 2.9096 | 33.8636
2000 | 12.6350 | 7.5423 | 2.8627 | 2.0103 | 25.0503
3000 | 5.2502 | 3.5835 | 6.3867 | 2.6146 | 17.8350
4000 | 5.2567 | 1.1105 | 7.9717 | 2.9415 | 17.2803
The total energy savings, when compared to the original baseline design, are presented
in Table 6.3 and plotted in Figures 6.2 and 6.3. With a degradation factor of 1, 26
superthreshold gates are replaced with nearthreshold gates, and nearly all of these cells
require a level shifter to interface with the next logic gate in their path (25 level shifters
for the 26 replaced cells). This additional overhead eliminates any benefit that the
nearthreshold gates provide, and energy consumption actually increases over the original
implementation.
After the degradation factor exceeds the point at which only nearthreshold gates are used
for the combinational logic, subthreshold cells begin to be added to the design. Beyond
a PDF of 10, the energy savings initially decrease (28.23% at a PDF of 100 versus 57.09% at
a PDF of 10) and exceed the PDF-10 value only at PDFs of 3000 and above. This decrease can
be attributed to the additional level shifter overhead required for subthreshold cells; further
analysis is given with the estimation results for the 8-bit multiplier in Section 6.1.2, where a
larger sample set allows a more thorough examination.
Figure 6.2: 8-Bit Multiplier Total Energy Savings with a Subset of PDFs (plot of 8-bit multiplier simulation percent energy savings versus performance degradation factor, PDF 1 to 10)
Figure 6.3: 8-Bit Multiplier Total Energy Savings with Varying PDFs (plot of 8-bit multiplier simulation percent energy savings versus performance degradation factor, PDF up to 4000)
Table 6.3: Total Energy Savings for 8-Bit Multiplier Functional Simulations
PDF | Operating Frequency [MHz] | Etotal [fJ] | Percent Energy Savings
Baseline | 250.0000 | 44.5390 | 0%
1 | 250.0000 | 45.7216 | -2.66%
2 | 125.0000 | 40.7365 | 8.54%
3 | 83.3000 | 40.6690 | 8.69%
4 | 62.5000 | 37.6550 | 15.46%
5 | 50.0000 | 33.9177 | 23.85%
10 | 25.0000 | 19.1101 | 57.09%
100 | 2.5000 | 31.9675 | 28.23%
1000 | 0.2500 | 33.8636 | 23.97%
2000 | 0.1250 | 25.0503 | 43.76%
3000 | 0.0833 | 17.8350 | 59.96%
4000 | 0.0625 | 17.2803 | 61.20%
In addition to the hybrid approach, fully nearthreshold and fully subthreshold
implementations of the multiplier circuit were also produced by the design methodology. In
these circuits, all cells, including sequential logic, operated at the respective supply voltages.
Therefore, the outputs of the circuits were at 0.6V for the nearthreshold circuit and 0.3V for
the subthreshold circuit. The results of the complete super/near/subthreshold
implementations are summarized in Table 6.4.
Table 6.4: Total Energy Savings for 8-Bit Multiplier, Super/Near/Sub-Vt Implementations
Implementation | Operating Frequency | Etotal [fJ] | Percent Energy Savings
Superthreshold | 250MHz | 44.539 | -
Nearthreshold | 31.25MHz | 14.577 | 67.27%
Subthreshold | 62.5kHz | 8.216 | 81.55%
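The percent savings in Tables 6.3 and 6.4 are consistent with savings = 100 × (1 − Etotal / Ebaseline), and the operating frequencies in Table 6.3 with 250MHz divided by the PDF. The short sketch below reproduces a few table entries; the function names are illustrative.

```python
BASELINE_ENERGY_FJ = 44.5390  # superthreshold baseline (Table 6.3)
BASELINE_FREQ_MHZ = 250.0     # 4ns minimum clock period


def percent_savings(e_total_fj, baseline_fj=BASELINE_ENERGY_FJ):
    """Percent energy savings relative to the baseline; negative means an increase."""
    return 100.0 * (1.0 - e_total_fj / baseline_fj)


def operating_frequency_mhz(pdf, baseline_mhz=BASELINE_FREQ_MHZ):
    """The PDF stretches the clock period, so frequency scales as 1/PDF."""
    return baseline_mhz / pdf


for pdf, e_total in [(1, 45.7216), (10, 19.1101), (4000, 17.2803)]:
    print(pdf, round(operating_frequency_mhz(pdf), 4), round(percent_savings(e_total), 2))
```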
6.1.2 Hybrid NESST Results Based on Energy Estimations
The Hybrid NESST design methodology, which makes use of the energy estimation mod-
els rather than functional simulations for energy measurements, was used to examine all
integer degradation factors up to 4000 with the 8-bit multiplier RTL model. This design
methodology was used to generate the resulting energy savings displayed in Figure 6.4.
Figure 6.4: Estimated Total Energy Savings of 8-Bit Multiplier (plot of estimated percent energy savings versus performance degradation factor, 0 to 4000)
The plot of estimated percent energy savings shows a general increase in energy savings
as the PDF increases. However, in a significant number of instances the percent savings
drop below the savings obtained with a smaller PDF. This is especially true for PDFs
ranging from 50 to 3000, as observed in the plot.
In the range where this behavior is most evident, gates that had already been converted
from superthreshold to nearthreshold begin to be replaced by subthreshold cells. For
example, consider one of the more extreme cases of lost energy savings, with the
degradation factors shown in Table 6.5. At a PDF of 75, the estimated energy savings is
59.07% in a design consisting of 16 superthreshold and 294 nearthreshold cells. However,
after the PDF increases to 83, the estimated energy savings is only 26.08% in a design
consisting of 135 superthreshold cells, 174 nearthreshold cells, and 1 subthreshold cell.
Table 6.5: 8-Bit Multiplier Gate Counts Demonstrating Nearthreshold Replacement
PDF | Super-Vt Cells | Near-Vt Cells | Sub-Vt Cells | Total Level Shifters | NEAR Level Shifters | S2N Level Shifters | SUB Level Shifters | Percent Energy Savings
75 | 16 | 294 | 0 | 16 | 16 | 0 | 0 | 59.07%
83 | 135 | 174 | 1 | 39 | 38 | 1 | 0 | 26.08%
This phenomenon can be explained by the large delay penalty incurred by replacing a
superthreshold gate with a subthreshold gate. To observe an example of this behavior,
consider the chain of four AND gates shown in Figure 6.5, where each superthreshold
AND gate has a delay of 0.10ns.
Figure 6.5: Superthreshold AND Gate Chain (four superthreshold AND gates, inputs A to E, output OUT; each gate D = 0.10176ns; path delay = 0.40704ns; maximum path delay = 0.40704ns)
When the degradation factor reaches 9, the allowable delay of this chain of AND gates is
3.66ns (9 × 0.407ns). With each nearthreshold AND gate having a delay of 0.88ns, a fully
nearthreshold chain requires 3.52ns, so all four superthreshold gates can be converted to
nearthreshold cells, as shown in Figure 6.6.
Figure 6.6: AND Gate Chain with Degradation Factor of 9 (four nearthreshold AND gates, each D = 0.8803ns; path delay = 3.5212ns; maximum path delay = 3.6634ns)
The next possible change to this chain of AND gates occurs when the degradation factor
reaches 684. At this point, the allowable delay of the circuit is 278.42ns, and the AND gate
closest to the output can be replaced with its subthreshold variant, which has a delay of
278.1ns. The remaining three gates no longer have enough slack to operate as nearthreshold
cells (278.1ns + 3 × 0.88ns = 280.74ns exceeds 278.42ns), so all three must operate in the
superthreshold regime, as reflected in Figure 6.7(a). The remaining AND gates can be
converted back to nearthreshold gates once the degradation factor reaches 690, at which
point the allowable delay of 280.86ns provides enough slack for combined nearthreshold and
subthreshold operation, as shown in Figure 6.7(b).
Figure 6.7: AND Gate Chains with Degradation Factors of (a) 684 and (b) 690 ((a) three superthreshold AND gates, each D = 0.10176ns, followed by one subthreshold AND gate, D = 278.1ns; path delay = 278.4053ns; maximum path delay = 278.4154ns. (b) three nearthreshold AND gates, each D = 0.8803ns, followed by one subthreshold AND gate, D = 278.1ns; path delay = 280.7409ns; maximum path delay = 280.8576ns)
The energy consumption of the three types of AND gates explains the results observed
with the 8-bit multiplier. The superthreshold AND consumes 1.01fJ, the nearthreshold AND
consumes 0.28fJ, and the subthreshold AND consumes 0.11fJ. Thus, the original AND chain
consumes 4.04fJ. When all four gates operate in the nearthreshold regime, the total energy
consumption is 1.12fJ. However, after the subthreshold gate is inserted and the three
remaining AND gates return to the superthreshold regime, the total energy consumption
increases to 3.14fJ. After the three remaining AND gates enter nearthreshold operation, the
total energy consumption is only 0.95fJ.
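The AND-chain example can be reproduced with a small simulation. The sketch below is a simplification, not the Hybrid NESST implementation: it applies the greedy-subthreshold rule of Algorithm 2 to a single four-gate chain, visits the gates from the output back to the inputs, ignores level shifters (as the energy comparison above does), and assumes the allowable delay is the PDF times the original 0.40704ns path delay. Delays and energies are the values given in Figures 6.5 to 6.7 and in the paragraph above; `place_chain` is an illustrative name.

```python
# AND gate delays (ns) and energies (fJ) for each operating regime.
DELAY_NS = {"super": 0.10176, "near": 0.8803, "sub": 278.1}
ENERGY_FJ = {"super": 1.01, "near": 0.28, "sub": 0.11}


def place_chain(n_gates, pdf):
    """Greedy-subthreshold placement on a chain of identical gates (no level shifters)."""
    allowable = pdf * n_gates * DELAY_NS["super"]  # original path delay scaled by PDF
    regimes = ["super"] * n_gates
    for k in reversed(range(n_gates)):  # from the output back toward the inputs
        downstream = sum(DELAY_NS[r] for r in regimes[k + 1:])  # already decided
        upstream = k * DELAY_NS["super"]  # earlier gates are still superthreshold
        budget = allowable - downstream - upstream  # time available for gate k
        for regime in ("sub", "near"):  # lowest-energy variant first
            if DELAY_NS[regime] <= budget:
                regimes[k] = regime
                break
    return regimes, round(sum(ENERGY_FJ[r] for r in regimes), 2)


for pdf in (1, 9, 684, 690):
    print(pdf, *place_chain(4, pdf))
```

For PDFs of 9, 684, and 690 the sketch yields chain energies of 1.12fJ, 3.14fJ, and 0.95fJ, matching the values in the text and showing how the first subthreshold insertion at a PDF of 684 pushes the other three gates back to superthreshold.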
In a larger design, this phenomenon can be expected to occur in multiple locations
throughout the circuit, thus explaining the behavior observed with the 8-bit multiplier. To
mitigate this undesirable behavior, two additional versions of the Hybrid NESST design
methodology were created.
Countermeasure Mode: This mode accepts the maximum PDF from the user and per-
forms three iterations of the critical path method and cell placement algorithm with this
maximum PDF: one iteration with the hybrid design, one iteration where superthreshold
gates are replaced with nearthreshold gates only, and one iteration where the superthreshold
gates are replaced with subthreshold gates only. This modification to the design methodol-
ogy ensures that the introduction of subthreshold cells into the design does not result in a
decrease in energy savings when compared to the nearthreshold-only replacement method-
ology.
While Countermeasure Mode helps to mitigate some of the decreases in energy savings,
it does not guarantee a monotonic increase in percent energy savings. Once the hybrid
design's savings again exceed those of the nearthreshold-only replacement design, the
fluctuating pattern can still be observed.
Extended Mode: This mode performs the cell placement algorithm for every integer
PDF from 1 to the maximum allowed by the designer, sacrificing program execution time
for a monotonically increasing energy savings function. During execution, the PDF that
results in the maximum energy savings is stored. The PDF that provides the maximum
energy savings is used in the final design, ensuring that the optimal conditions are being
utilized. While this approach ensures that the best result is utilized in the design, a severe
penalty is incurred in the runtime of the design methodology.
The energy savings of the 8-bit multiplier with these two additional versions of the
design methodology are shown in Figures 6.8 and 6.9. Both versions offer a vast improve-
ment over the original version, guaranteeing at least 59.072% energy savings when the
degradation factor is greater than 10.
Figure 6.8: Total Energy Savings with Countermeasure Mode for 8-Bit Multiplier (plot of estimated percent energy savings versus performance degradation factor, 0 to 4000)
Figure 6.9: Total Energy Savings with Extended Mode for 8-Bit Multiplier (plot of estimated percent energy savings versus performance degradation factor, 0 to 4000)
6.2 64-Bit Adder
To provide another classic example circuit, the three versions of the hybrid design method-
ology were applied to a 64-bit adder. The architecture of the adder is similar to the 8-bit
multiplier, where the combinational logic is performed in a single clock cycle, and the out-
put is clocked into 64 D-flip-flops. The architecture of the 64-bit adder is shown in Figure
6.10.
Figure 6.10: 64-Bit Adder with Sequential Output Architecture (combinational logic with inputs IP1[63:0] and IP2[63:0] driving D-flip-flops, clocked by CLK, that hold outputs OP[63:0])
The total energy savings for various integer performance degradation factors in the
original version of the design methodology are shown in Figure 6.11, and the resulting plot
offers considerable insight. Unlike the 8-bit multiplier, which suffered an increase in energy
consumption when a PDF of 1 was used, the 64-bit adder demonstrated significant energy
savings. Without affecting the operating frequency of the original circuit, 117 cells were
converted to nearthreshold operation, yielding a total energy savings of 22.16%. After the
degradation factor reached 10, all of the combinational logic gates were replaced with
nearthreshold gates, as shown in Table 6.6.
Table 6.6: Gate Counts for 64-Bit Adder
PDF | Super-Vt Cells | Near-Vt Cells | Sub-Vt Cells | Total Level Shifters | NEAR Level Shifters | S2N Level Shifters | SUB Level Shifters | Total Energy Savings
1 | 199 | 117 | 0 | 64 | 64 | 0 | 0 | 22.16%
9 | 77 | 239 | 0 | 64 | 64 | 0 | 0 | 39.16%
10 | 64 | 252 | 0 | 64 | 64 | 0 | 0 | 40.79%
However, once this point was passed and the nearthreshold cells began to be replaced with subthreshold cells, the energy savings dropped below the nearthreshold-only replacement value. In fact, even when all combinational logic was implemented in the subthreshold regime, the energy savings remained below the nearthreshold-only value. This behavior is explained by the high number of sequential flip-flops relative to the number of logic gates in the circuit. Because the circuit is a 64-bit adder, it requires 64 D-flip-flops. Hence, when all of the combinational logic consists of subthreshold cells, 64 level shifters must translate signals from the subthreshold regime to the superthreshold regime. The savings gained by changing from nearthreshold to subthreshold gates therefore did not compensate for the extra energy cost of level shifting from 0.3V to 1.0V instead of from 0.6V to 1.0V. This observation suggests that the average combinational logic depth was insufficient to offset the additional level shifter overhead.
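The trade-off above can be expressed as a break-even count: moving a logic cone from nearthreshold to subthreshold saves energy in every gate of the cone, but exchanges one 0.6V-to-1.0V level shifter for a more expensive 0.3V-to-1.0V one. The sketch below uses the NAND2X1 energies from Table A.2 as a stand-in for every gate and HYPOTHETICAL level shifter energies that are placeholders, not values characterized in this work. It compares the break-even count with the adder's ratio of combinational cells to level shifters at a PDF of 10 (252 cells and 64 shifters in Table 6.6).

```python
import math

# Per-gate energy (fJ) of NAND2X1 from Appendix A, Table A.2, used as a
# stand-in for every combinational gate.
E_NEAR_GATE = 0.1214
E_SUB_GATE = 0.0497

# HYPOTHETICAL level shifter energies (fJ); placeholders, not characterized values.
E_LS_NEAR = 2.0   # assumed 0.6V -> 1.0V shifter
E_LS_SUB = 5.0    # assumed 0.3V -> 1.0V shifter


def break_even_gates(e_near, e_sub, e_ls_near, e_ls_sub):
    """Smallest number of gates per level shifter for which converting them
    from nearthreshold to subthreshold saves more than the extra shifter cost."""
    saving_per_gate = e_near - e_sub
    if saving_per_gate <= 0:
        raise ValueError("subthreshold gate must use less energy than nearthreshold")
    extra_shifter_cost = e_ls_sub - e_ls_near
    return math.floor(extra_shifter_cost / saving_per_gate) + 1


# 64-bit adder at PDF 10 (Table 6.6): 252 nearthreshold cells, 64 level shifters.
gates_per_shifter = 252 / 64
needed = break_even_gates(E_NEAR_GATE, E_SUB_GATE, E_LS_NEAR, E_LS_SUB)
print(f"average gates per shifter: {gates_per_shifter:.2f}")
print(f"break-even gates per shifter (assumed energies): {needed}")
print("subthreshold pays off" if gates_per_shifter >= needed else "stay nearthreshold")
```

Under these assumed energies the break-even point is 42 gates per shifter, while the adder averages fewer than four, so the sketch reproduces the qualitative conclusion. The real break-even point depends on the characterized level shifter energies and on switching activity, both of which this model ignores.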
Based on the observations from the original version of the hybrid design methodology, the Countermeasure and Extended Modes were expected to plateau after all of the combinational logic was replaced with nearthreshold cells. This hypothesis was verified by the results of the hybrid design methodology shown in Figure 6.12. Thus, there was no energy savings benefit in exceeding a PDF of 10. Because this savings curve is already monotonic, running the Extended Mode of the hybrid design methodology produced no change.
[Plot: 64-Bit Adder Estimated Percent Energy Savings; x-axis: Performance Degradation Factor (0 to 4000); y-axis: Percent Energy Savings (10 to 45)]
Figure 6.11: Total Energy Savings with Original Mode for 64-Bit Adder
[Plot: 64-Bit Adder Estimated Percent Energy Savings; x-axis: Performance Degradation Factor (0 to 4000); y-axis: Percent Energy Savings (22 to 42)]
Figure 6.12: Total Energy Savings with Countermeasure/Extended Modes for 64-Bit Adder
Although there was no advantage to using subthreshold gates in the hybrid mode, significant energy savings were observed when the entire circuit was implemented in nearthreshold or subthreshold. These implementations, shown in Table 6.7, represent circuits in which all logic-high voltages are 0.6V for nearthreshold and 0.3V for subthreshold operation.
Table 6.7: Estimated Total Energy Savings for 64-Bit Adder, Super/Near/Sub-Vt Implementations
Implementation | Operating Frequency | Percent Energy Savings
Superthreshold | 58.824 MHz | -
Nearthreshold | 6.068 MHz | 65.16%
Subthreshold | 15.548 kHz | 87.65%
6.3 ISCAS’89 Benchmark Circuits
To provide circuit designs more representative of practical applications, benchmark circuits from the ISCAS’89 package were used. The ISCAS’89 S1488 controller circuit was selected as a representative sample of these benchmarks; results for additional ISCAS’89 benchmark circuits can be found in Appendix B. The S1488 circuit consists of 8 primary inputs, 19 primary outputs, and a total of 659 gates. With 19 primary outputs and 6 D-flip-flops in the design, a minimum of 25 level shifters was expected when all combinational logic operates in either nearthreshold or subthreshold.
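The expected count follows from a simple rule: a low-voltage driver needs a level shifter wherever its output reaches a superthreshold element, such as a D-flip-flop, or a primary output. The sketch below applies that rule to a toy netlist. The `Gate` class, the domain labels, and the one-shifter-per-net assumption are illustrative choices, not the data structures used by Hybrid NESST. With every combinational gate in a low-voltage domain and distinct nets feeding the 6 flip-flops and the 19 outputs, it reports 25 shifters.

```python
from dataclasses import dataclass, field


@dataclass
class Gate:
    """Toy netlist node (hypothetical representation, for illustration only)."""
    name: str
    domain: str                                  # "super", "near", or "sub"
    fanout: list = field(default_factory=list)   # names of driven gates/ports


def count_level_shifters(gates, primary_outputs):
    """Count low-voltage nets that reach a superthreshold sink or a primary
    output, assuming one shifter per net shared by all of its high-voltage sinks."""
    by_name = {g.name: g for g in gates}
    count = 0
    for g in gates:
        if g.domain == "super":
            continue  # high-to-low crossings are assumed not to need a shifter
        if any(sink in primary_outputs or by_name[sink].domain == "super"
               for sink in g.fanout):
            count += 1
    return count


# S1488-like stub: 6 flip-flops and 19 primary outputs, each fed by its own
# low-voltage combinational net.
ffs = [Gate(f"FF{i}", "super") for i in range(6)]
ff_drivers = [Gate(f"d{i}", "near", [f"FF{i}"]) for i in range(6)]
pos = {f"OP{i}" for i in range(19)}
out_drivers = [Gate(f"o{i}", "near", [f"OP{i}"]) for i in range(19)]
print(count_level_shifters(ffs + ff_drivers + out_drivers, pos))  # 25
```

If one net fed both a flip-flop and a primary output, this rule would count it once, which is why the per-element total is an estimate rather than an exact figure for every netlist.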
The resulting energy savings from the original version of the Hybrid NESST design methodology are shown in Figure 6.13. The observed behavior was similar to that of the 8-bit multiplier. Once the PDF reached 10, all of the non-sequential logic was implemented in the nearthreshold regime, as shown in Table 6.8. Adding subthreshold cells initially reduced the overall energy savings, but the savings eventually rose above the level provided by nearthreshold-only replacements. Thus, there was a definite benefit in incorporating a large number of subthreshold cells into the design. Table 6.8 also includes the scenario where all combinational gates are subthreshold, which yields an additional 8.22% energy savings compared to the nearthreshold-only replacement method.
Table 6.8: Gate Counts for ISCAS’89 S1488
PDF | Super-Vt Cells | Near-Vt Cells | Sub-Vt Cells | Total Level Shifters | NEAR Level Shifters | S2N Level Shifters | SUB Level Shifters | Total Energy Savings
9 | 14 | 645 | 0 | 25 | 25 | 0 | 0 | 63.62%
10 | 6 | 653 | 0 | 25 | 25 | 0 | 0 | 64.04%
3333 | 6 | 1 | 652 | 25 | 0 | 0 | 25 | 72.25%
3334 | 6 | 0 | 653 | 25 | 0 | 0 | 25 | 72.26%
[Plot: ISCAS'89 S1488 Estimated Percent Energy Savings; x-axis: Performance Degradation Factor (0 to 4000); y-axis: Percent Energy Savings (-10 to 80)]
Figure 6.13: Total Energy Savings with Original Mode for ISCAS’89 S1488
The two additional versions of the hybrid design methodology eliminated many of the energy increases observed at higher PDFs with the original design methodology. As expected, with the Countermeasure Mode (Figure 6.14) the total energy savings did not drop below the nearthreshold-only replacement value of 64.04% once the degradation factor exceeded 10. The Extended Mode of the hybrid design methodology (Figure 6.15) produced the monotonic improvement obtained when all previous degradation factors were tested to determine the best case.
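The effect of the two additional modes on a savings curve can be approximated by post-processing Original Mode results. The sketch below is a simplified model, not the methodology's implementation: `countermeasure_floor` keeps each result at or above the nearthreshold-only savings once that point has been reached, and `extended_best_case` keeps, for each PDF, the best result found at any smaller or equal PDF, which is what makes the Extended Mode curve monotonic. The input list is hypothetical.

```python
from itertools import accumulate


def countermeasure_floor(savings, near_only_pdf):
    """Simplified model of the Countermeasure Mode curve: from the PDF at which
    all combinational logic is nearthreshold onward, never report less than
    the nearthreshold-only savings. savings[i] is the result at PDF i + 1."""
    floor = savings[near_only_pdf - 1]
    return [s if pdf < near_only_pdf else max(s, floor)
            for pdf, s in enumerate(savings, start=1)]


def extended_best_case(savings):
    """Simplified model of the Extended Mode curve: at each PDF, keep the best
    result obtained at any PDF up to and including it (a running maximum)."""
    return list(accumulate(savings, max))


# HYPOTHETICAL Original Mode results for PDFs 1..6, all logic nearthreshold at PDF 3.
original = [50.0, 60.0, 64.0, 68.0, 62.0, 70.0]
print(countermeasure_floor(original, near_only_pdf=3))  # [50.0, 60.0, 64.0, 68.0, 64.0, 70.0]
print(extended_best_case(original))                     # [50.0, 60.0, 64.0, 68.0, 68.0, 70.0]
```

For a curve that is already monotonic, as with the 64-bit adder, `extended_best_case` returns it unchanged.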
[Plot: ISCAS'89 S1488 Estimated Percent Energy Savings; x-axis: Performance Degradation Factor (0 to 4000); y-axis: Percent Energy Savings (0 to 80)]
Figure 6.14: Total Energy Savings with Countermeasure Mode for ISCAS’89 S1488
[Plot: ISCAS'89 S1488 Estimated Percent Energy Savings; x-axis: Performance Degradation Factor (0 to 4000); y-axis: Percent Energy Savings (0 to 80)]
Figure 6.15: Total Energy Savings with Extended Mode for ISCAS’89 S1488
An analysis of the level shifter energy overhead for the S1488 benchmark circuit demonstrated the cost of including these elements in the final design. The energy overhead of the level shifters for this benchmark circuit is shown in Figure 6.16, which has two distinct regions. The first region, where the PDF ranges up to 1600, shows the energy overhead with only nearthreshold replacements inserted into the design. At these degradation factors, the level shifters accounted for 15.03% of the total energy consumption of the optimized circuit. Beyond this PDF range, the level shifter overhead increased drastically as subthreshold cells were inserted into the design. This increase occurs because subthreshold level shifters consume significantly more energy than nearthreshold level shifters, while the subthreshold cells themselves consume less energy than the nearthreshold cells. Hence, the relative energy overhead of a nearthreshold level shifter is significantly smaller than that of a subthreshold level shifter. In the worst case, the level shifters accounted for 51.85% of the energy consumption of the circuit; even then, however, the energy savings provided by the subthreshold cells outweighed the cost of the level shifter overhead, and the energy savings plot still showed an increase.
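The apparent paradox of level shifter overhead rising sharply while total savings still improve follows from how the two quantities are normalized: savings are measured against the superthreshold baseline, whereas overhead is measured against the already reduced optimized total. The sketch below computes both using HYPOTHETICAL energy breakdowns chosen only for illustration; they are not measured values from the S1488 experiments.

```python
def savings_and_overhead(baseline, logic, shifters):
    """Return (percent energy savings relative to the superthreshold baseline,
    percent of the optimized circuit's energy consumed by level shifters)."""
    optimized = logic + shifters
    savings = 100.0 * (1.0 - optimized / baseline)
    overhead = 100.0 * shifters / optimized
    return savings, overhead


# HYPOTHETICAL energy breakdowns in arbitrary units, for illustration only.
BASELINE = 100.0
scenarios = {
    "nearthreshold-only": (30.0, 6.0),   # logic energy, level shifter energy
    "with subthreshold": (14.0, 14.0),   # cheaper logic, costlier shifters
}
for name, (logic, shifters) in scenarios.items():
    s, o = savings_and_overhead(BASELINE, logic, shifters)
    print(f"{name:20s} savings {s:5.1f}%   level shifter overhead {o:5.1f}%")
```

With these assumed numbers, savings rise from 64% to 72% while the level shifter share of the optimized energy rises from about 17% to 50%.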
Implementing the entire design in nearthreshold or subthreshold eliminates the overhead of the level shifters, as well as that of the superthreshold sequential elements, but at a heavy performance cost. The complete nearthreshold and subthreshold implementations of the ISCAS’89 S1488 benchmark circuit are summarized in Table 6.9.
[Plot: ISCAS'89 S1488 Level Shifter Overhead; x-axis: Performance Degradation Factor (0 to 4000); y-axis: Percent Energy Overhead (0 to 60)]
Figure 6.16: Level Shifter Energy Overhead with Countermeasure Mode for ISCAS’89
S1488
Table 6.9: Estimated Total Energy Savings for ISCAS’89 S1488, Super/Near/Sub-Vt Implementations
Implementation | Operating Frequency | Percent Energy Savings
Superthreshold | 635.845 MHz | -
Nearthreshold | 27.554 MHz | 66.38%
Subthreshold | 72.442 kHz | 87.45%
6.4 Skein: A Cryptographic Hash Function
Skein, a cryptographic hash function based on a tweakable block cipher, serves both as a larger input circuit for the Hybrid NESST design methodology and as a practical application for the integration of nearthreshold and subthreshold logic. As the use of portable electronics continues to expand across application domains, the need for secure devices will grow as well.
The key scheduler of the cryptographic hash function was isolated from the design for
analysis with the Hybrid NESST design methodology, serving as an example where shallow
combinational logic depths yield lower total energy savings. A complete implementation
of Skein-256-256 yields significantly more savings than the isolated key scheduler.
6.4.1 Skein Key Scheduler
Skein represents a family of hash functions with a customizable state size and output size.
A low-memory variant of Skein, Skein-256-256, corresponding to a 256-bit state size and
256-bit output, was used for profiling. The block diagram of the key scheduler is shown in
Figure 6.17 [37].
Figure 6.17: Block Diagram of Skein Key Scheduler [37]
Because this design is larger than the other benchmark circuits, containing 3,660 logic gates, analyzing every possible PDF would require a prohibitive amount of time for a circuit designer. Therefore, a test methodology was developed for larger circuits.
The proposed test methodology involves running the Countermeasure Mode at PDFs of 1, 10, and 4000. A PDF of 1 shows the energy savings achievable without impacting the operating frequency of the circuit, while a PDF of 10 demonstrates the savings that can be achieved with all combinational logic converted to nearthreshold replacements. To test whether there is any benefit to increasing the PDF further, a PDF of 4000 should be used. If the resulting energy savings equal the savings seen at a PDF of 10, then the combinational logic is not deep enough to mask the energy overhead of the subthreshold level shifters, and there is no benefit in increasing the PDF beyond 10.
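The three-run procedure can be written as a small driver. In this sketch, `run_countermeasure` is a HYPOTHETICAL hook standing in for one Countermeasure Mode run at a given PDF that returns the percent energy savings; the tolerance used to decide that two savings values are equal is also an assumption.

```python
from typing import Callable, Dict, Tuple


def three_point_test(run_countermeasure: Callable[[int], float],
                     tol: float = 1e-6) -> Tuple[Dict[int, float], str]:
    """Run the flow at PDFs 1, 10 and 4000 and interpret the outcome.

    run_countermeasure is a HYPOTHETICAL hook: it should run the Countermeasure
    Mode at the given PDF and return the percent energy savings.
    """
    results = {pdf: run_countermeasure(pdf) for pdf in (1, 10, 4000)}
    # The Countermeasure Mode never falls below the nearthreshold-only value,
    # so equal results at 10 and 4000 mean subthreshold cells did not help.
    if abs(results[4000] - results[10]) <= tol:
        verdict = "logic too shallow for subthreshold: stop at PDF 10"
    else:
        verdict = "subthreshold cells help: explore PDFs above 10"
    return results, verdict


# Skein key scheduler results reported in this section, used as a stub.
skein_ks = {1: 5.03, 10: 10.19, 4000: 10.19}
results, verdict = three_point_test(skein_ks.__getitem__)
print(results)
print(verdict)
```

Fed the Skein key scheduler results reported in this section, the driver recommends stopping at a PDF of 10.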
With the proposed test methodology, all combinational logic gates were implemented
in nearthreshold at a PDF of 10, verified with the gate counts presented in Table 6.10.
Table 6.10: Gate Counts for Skein Key Scheduler
PDF | Super-Vt Cells | Near-Vt Cells | Sub-Vt Cells | Total Level Shifters | NEAR Level Shifters | S2N Level Shifters | SUB Level Shifters | Total Energy Savings
1 | 812 | 2808 | 0 | 1728 | 1728 | 0 | 0 | 5.03%
9 | 538 | 3082 | 0 | 1728 | 1728 | 0 | 0 | 9.77%
10 | 512 | 3108 | 0 | 1728 | 1728 | 0 | 0 | 10.19%
With the Skein key scheduling algorithm, a maximum energy savings of 10.19% was observed. Any potential benefit of replacing these nearthreshold gates with subthreshold cells was tested using a degradation factor of 4000, but the design methodology still produced an optimal savings of 10.19%. Thus, even when all combinational logic could potentially have been implemented in the subthreshold regime, it was still optimal to use nearthreshold cells. The main operation of the key scheduler is a bit-wise XOR, followed in some cases by an addition. The combinational logic is therefore expected to be relatively shallow, so few subthreshold gates could be placed before a level shifter into a sequential flip-flop was needed. Hence, any PDF exceeding 10 slowed the circuit without any energy savings benefit.
For reference, the energy savings for the first 12 performance degradation factors are shown in Figure 6.18, demonstrating the achievable energy savings at each PDF. Fully subthreshold and nearthreshold implementations were also investigated, and the results are summarized in Table 6.11.
[Plot: Skein Key Scheduler Estimated Percent Energy Savings; x-axis: Performance Degradation Factor (0 to 12); y-axis: Percent Energy Savings (5 to 11)]
Figure 6.18: Total Energy Savings with Countermeasure Mode for Skein Key Scheduler
Table 6.11: Estimated Total Energy Savings for Skein Key Scheduler, Super/Near/Sub-Vt Implementations
Implementation | Operating Frequency | Percent Energy Savings
Superthreshold | 59.559 MHz | -
Nearthreshold | 6.068 MHz | 66.525%
Subthreshold | 15.548 kHz | 87.424%
6.4.2 Skein-256-256
The total energy savings of the Skein key scheduler were significantly lower than the savings observed with the other benchmark circuits. To determine whether this trend continued for the entire cryptographic hash function, an implementation of Skein-256-256 was analyzed with the hybrid design methodology. The synthesized model consisted of 6,675 standard cells. A block diagram of the circuit is shown in Figure 6.19 [24].
Figure 6.19: Block Diagram of Block Cipher Used in Skein [24]
The same test methodology used for the key scheduler was applied here, so an exhaustive search of all integer performance degradation factors was not performed. The Countermeasure Mode of the design methodology was run on the Skein implementation with a PDF of 10. This factor yielded a savings of 33.60% with nearthreshold combinational cell replacements, as shown in Table 6.12. To determine whether there was any benefit to using subthreshold cells, a degradation factor of 4000 was also tested; however, the savings remained at the nearthreshold-replacement value of 33.60%. This observation suggests that while the combinational logic depth had increased relative to the key scheduler, providing additional savings in the nearthreshold-only replacement regime, the depth was still not sufficient to provide additional savings through the integration of subthreshold logic.
Table 6.12: Gate Counts for Skein-256-256
PDF | Super-Vt Cells | Near-Vt Cells | Sub-Vt Cells | Total Level Shifters | NEAR Level Shifters | S2N Level Shifters | SUB Level Shifters | Total Energy Savings
1 | 3305 | 3370 | 0 | 2200 | 2200 | 0 | 0 | 2.11%
9 | 933 | 5742 | 0 | 2038 | 2038 | 0 | 0 | 31.74%
10 | 768 | 5907 | 0 | 2049 | 2049 | 0 | 0 | 33.60%
Additional performance degradation factors show how the total energy savings grow as the factor approaches the optimal PDF of 10. The resulting plot is shown in Figure 6.20. The increase is approximately linear, since the depth of logic that can be replaced with nearthreshold cells grows roughly linearly with the PDF.
[Plot: Skein-256-256 Estimated Percent Energy Savings; x-axis: Performance Degradation Factor (0 to 12); y-axis: Percent Energy Savings (0 to 35)]
Figure 6.20: Total Energy Savings with Countermeasure Mode for Skein-256-256
To avoid the high cost of the level shifters in the hybrid design of Skein, complete nearthreshold and complete subthreshold implementations were also investigated. Although these implementations yielded significant energy savings, the savings came at a critical performance cost. A summary of these implementations is shown in Table 6.13.
Table 6.13: Estimated Total Energy Savings for Skein-256-256, Super/Near/Sub-Vt Implementations
Implementation | Operating Frequency | Percent Energy Savings
Superthreshold | 53.792 MHz | -
Nearthreshold | 5.475 MHz | 66.147%
Subthreshold | 14.036 kHz | 86.696%
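The performance cost of the complete nearthreshold and subthreshold implementations can be quantified from the operating frequencies in Tables 6.7, 6.9, 6.11, and 6.13. The sketch below simply divides each superthreshold frequency by the corresponding low-voltage frequency to obtain a slowdown factor; no values beyond those tables are assumed.

```python
# Operating frequencies (Hz) of the complete implementations, from
# Tables 6.7, 6.9, 6.11 and 6.13: (superthreshold, nearthreshold, subthreshold).
FREQS_HZ = {
    "64-bit adder": (58.824e6, 6.068e6, 15.548e3),
    "ISCAS'89 S1488": (635.845e6, 27.554e6, 72.442e3),
    "Skein key scheduler": (59.559e6, 6.068e6, 15.548e3),
    "Skein-256-256": (53.792e6, 5.475e6, 14.036e3),
}


def slowdowns(freqs):
    """Return {circuit: (super/near, super/sub)} frequency ratios."""
    return {name: (sup / near, sup / sub)
            for name, (sup, near, sub) in freqs.items()}


for name, (near_x, sub_x) in slowdowns(FREQS_HZ).items():
    print(f"{name:20s} nearthreshold {near_x:6.1f}x slower, "
          f"subthreshold {sub_x:7.0f}x slower")
```

For the adder and both Skein circuits this gives roughly 10x for nearthreshold and roughly 3,800x for subthreshold; for S1488 the factors are about 23x and 8,800x.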
6.5 Summary
This chapter presented the results and analysis of five benchmark circuits: an 8-bit multiplier, a 64-bit adder, an ISCAS’89 benchmark circuit, a Skein key scheduler, and an implementation of the Skein-256-256 cryptographic hash function. All circuits exhibited energy savings when all combinational logic was converted to nearthreshold at a PDF of 10. Two of these benchmark circuits exhibited additional savings as the degradation factor was increased further so that subthreshold gates could be inserted into the design; these circuits achieved their optimal savings once all of the combinational logic had been replaced by subthreshold logic, which occurred by the time the PDF reached 4000. Thus, a larger degradation factor does not always yield better results than a degradation factor of 10. Because of this observation, a recommended design testing process for this hybrid design methodology was presented.
Chapter 7
Conclusions and Future Work
7.1 Conclusions
In this research work, a Hybrid NESST design methodology integrating subthreshold and nearthreshold CMOS logic into a superthreshold CMOS design has been presented. The subthreshold and nearthreshold standard cell libraries are characterized for minimum-energy operation. Available timing slack in the circuit is identified by the critical path method, and these slack values are used to determine where superthreshold cells can be replaced with subthreshold or nearthreshold logic. For the benchmark circuits analyzed with the Hybrid NESST design methodology, an average energy savings of 41.15% was observed at a performance degradation factor of 10. The Hybrid NESST design methodology gives the designer full control over the resulting operating frequency and energy consumption of the optimized circuit, while automatically inserting level shifters where necessary. For the benchmark circuits, the level shifters consumed at least 8.5% of the energy of the optimized circuit, with an average energy overhead of 26.76%. A recommended test methodology was also presented that allows the designer to observe the potential energy savings provided by nearthreshold and subthreshold cell replacements while performing a minimum number of iterations of the design methodology (with a PDF of 1, 10, and 4000).
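The slack-driven replacement step can be illustrated on a toy netlist. The sketch below is a simplified, hypothetical rendering in the spirit of the greedy placement described in this work, not the Hybrid NESST implementation. It runs a forward and a backward critical path pass to obtain slack, ASSUMES that the PDF scales the superthreshold critical path delay to set the allowed period, visits gates in topological order, and gives each gate the lowest-energy variant (subthreshold, then nearthreshold) whose added delay fits within its current slack. Level shifter delays, flip-flops, and energy accounting are omitted; the cell delays are the INVX1 and NAND2X1 values from Table A.2.

```python
from graphlib import TopologicalSorter


def arrival_times(delay, preds):
    """Forward critical-path pass: latest arrival time (ns) at each gate output."""
    arrival = {}
    for g in TopologicalSorter(preds).static_order():
        arrival[g] = delay[g] + max((arrival[p] for p in preds[g]), default=0.0)
    return arrival


def slacks(delay, preds, period):
    """Backward pass: slack = required time - arrival time for every gate."""
    arrival = arrival_times(delay, preds)
    succs = {g: [] for g in preds}
    for g, ps in preds.items():
        for p in ps:
            succs[p].append(g)
    required = {}
    for g in reversed(list(TopologicalSorter(preds).static_order())):
        required[g] = min((required[s] - delay[s] for s in succs[g]), default=period)
    return {g: required[g] - arrival[g] for g in preds}


def hybrid_assign(cells, preds, pdf):
    """Greedy sketch: give each gate the lowest-energy variant that fits its slack.

    cells maps gate -> {"super": ns, "near": ns, "sub": ns}; preds maps gate ->
    list of driving gates. ASSUMPTION: the allowed period is pdf times the
    superthreshold critical path delay. Level shifters and flip-flops are ignored.
    """
    level = {g: "super" for g in cells}
    delay = {g: cells[g]["super"] for g in cells}
    period = pdf * max(arrival_times(delay, preds).values())
    for g in TopologicalSorter(preds).static_order():
        slack = slacks(delay, preds, period)[g]  # recomputed after every change
        for variant in ("sub", "near"):          # lowest-energy variant first
            if cells[g][variant] - delay[g] <= slack + 1e-12:
                level[g], delay[g] = variant, cells[g][variant]
                break
    return level, period


# Toy netlist: a -> b -> c and d -> c, using delays from Appendix A, Table A.2.
INV = {"super": 0.03935, "near": 0.37100, "sub": 120.30}   # INVX1
NAND = {"super": 0.07270, "near": 0.52610, "sub": 201.67}  # NAND2X1
cells = {"a": INV, "b": INV, "c": NAND, "d": INV}
preds = {"a": [], "b": ["a"], "c": ["b", "d"], "d": []}
for pdf in (1, 10, 2000):
    print(pdf, hybrid_assign(cells, preds, pdf)[0])
```

With these numbers, a PDF of 1 keeps every gate superthreshold, a PDF of 10 moves all four gates to nearthreshold, and a PDF of 2000 places the three inverters in subthreshold while the NAND gate, whose subthreshold delay no longer fits, stays nearthreshold. Recomputing slack after every replacement keeps the sketch simple at the cost of quadratic runtime.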
The implementations of the hybrid design methodology require the use of level shifters for the sequential elements and output nodes. While further work on low-energy level shifter design could improve energy savings, the overhead of these level shifters is not limited to their energy consumption: adding level shifters also increases the area of the circuit. For maximum energy savings, the fully nearthreshold and fully subthreshold implementations should be used. These implementations use a single power supply, so no level shifters are required. However, their operating frequency is significantly lower than that of the options provided by the hybrid methodology. Additionally, the output nodes of the fully nearthreshold and fully subthreshold implementations are at the respective supply voltages, so the surrounding design must be able to accommodate these output voltages.
With the Hybrid NESST design methodology, the critical path method is used to satisfy timing and energy constraints. Because timing constraints are met, additional power optimization techniques, such as dynamic frequency scaling, could be applied to the optimized circuit to yield additional energy savings. Alternatives to the current greedy subthreshold cell placement algorithm could also be investigated to determine optimal cell placement. While the hybrid design methodology presents challenges, including multiple voltage island design and susceptibility to process variations, recent research, including Intel’s nearthreshold hardware accelerator [1], provides evidence that these design challenges can be overcome.
7.2 Future Work
Although the demonstrated energy savings illustrate the benefits of the proposed Hybrid NESST design methodology, the methodology also presents design challenges, namely area overhead and process variations. Because the application spectrum of low-energy design includes portable electronics, area is an important aspect of the circuit design. Area should be incorporated as a third constraint for Hybrid NESST once the physical layouts of the standard cell libraries are available.
The area overhead arises because the subthreshold standard cell library uses a 5:1 aspect ratio, while the nearthreshold and superthreshold standard cell libraries use a typical 2:1 aspect ratio. The integration of level shifters into the design introduces further area overhead. These level shifters are essential for the functionality of the circuit, and their area overhead cannot be disregarded. To minimize this overhead, sharing level shifters among multiple circuit elements could be investigated.
Process variation is another design aspect that must be analyzed. Subthreshold and nearthreshold systems have been observed to be extremely sensitive to process variations [23, 33]. Hence, it is essential to incorporate variation-tolerant techniques into these designs. The impact of process variation is more evident at lower supply voltages, with observed variations in subthreshold gate delay as high as 300% from the nominal case [43].
Several design approaches have been suggested to mitigate these process variations. For example, substrate biasing is a design technique that increases the robustness of a standard cell to process variations while also increasing performance [33]. Substrate biasing has been analyzed for the subthreshold standard cell library in previous work, achieving a 75.60% reduction in process variation for the base inverter [4]. Roy et al. present an alternative solution based on adaptive beta ratio modulation [29]. Similar design techniques should be included in the implementation of the optimized hybrid circuit.
After the area and process variation design challenges are considered, the next logical
progression of the circuit design methodology is to obtain the resulting physical layout.
Because the hybrid design methodology results in an optimized transistor-level netlist con-
sisting of superthreshold, nearthreshold, and subthreshold gates, the layout will present
some design challenges of its own. The placement and routing for each of the gate variants
operating at different supply voltages will require a multiple voltage island design.
Bibliography
[1] Energy-efficient hardware accelerator. Innovation@Intel, Jul. 2010.
[2] M. Alioto. Understanding DC behavior of subthreshold CMOS logic through closed-form analysis. Circuits and Systems I: Regular Papers, IEEE Transactions on, 57(7):1597–1607, Jul. 2010.
[3] S. Amarchinta. High performance subthreshold standard cell design and cell placement optimization. RIT Digital Media Library, Jun. 2009.
[4] S. Amarchinta, H. Kanitkar, and D. Kudithipudi. Robust and high performance subthreshold standard cell design. In Circuits and Systems, 2009. MWSCAS ’09. 52nd IEEE International Midwest Symposium on, pages 1183–1186, Aug. 2009.
[5] B.H. Calhoun and D. Brooks. Can subthreshold and near-threshold circuits go mainstream? Micro, IEEE, 30(4):80–85, Jul. 2010.
[6] B.H. Calhoun, A. Wang, and A. Chandrakasan. Modeling and sizing for minimum energy operation in subthreshold circuits. IEEE Journal of Solid-State Circuits, 40(9):1778–1786, Sept. 2005.
[7] B.H. Calhoun, A. Wang, and A. Chandrakasan. Sub-Threshold Design for Ultra Low-Power Systems. Springer, 2006.
[8] A. Chavan and E. MacDonald. Ultra low voltage level shifters to interface sub and super threshold reconfigurable logic cells. Aerospace Conference, 2008 IEEE, pages 1–6, Mar. 2008.
[9] B. Chen and I. Nedelchev. Power compiler: a gate-level power optimization and synthesis system. Computer Design: VLSI in Computers and Processors, 1997. ICCD ’97. Proceedings., 1997 IEEE International Conference on, pages 74–79, Oct. 1997.
[10] Chunbong Chen and M. Sarrafzadeh. An effective algorithm for gate-level power-delay tradeoff using two voltages. Computer Design, 1999. (ICCD ’99) International Conference on, pages 222–227, 1999.
[11] J.C. Chi, H.H. Lee, S.H. Tsai, and M.C. Chin. Gate level multiple supply voltage assignment algorithm for power optimization under timing constraint. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 15(6):637–648, Jun. 2007.
[12] R. Dettmer. Softly, softly [CMOS scaling advances by subthreshold technology]. IEE Review, 51(9):26–30, Sept. 2005.
[13] R.G. Dreslinski, M. Wieckowski, D. Blaauw, D. Sylvester, and T. Mudge. Near-threshold computing: Reclaiming Moore’s law through energy efficient integrated circuits. Proceedings of the IEEE, 98(2):253–266, Feb. 2010.
[14] R.G. Dreslinski, B. Zhai, T. Mudge, D. Blaauw, and D. Sylvester. An energy efficient parallel architecture using near threshold operation. Parallel Architecture and Compilation Techniques, 2007. PACT 2007. 16th International Conference on, pages 175–188, Sept. 2007.
[15] S.K. Gupta, A. Raychowdhury, and K. Roy. Digital computation in subthreshold region for ultralow-power operation: A device, circuit, architecture codesign perspective. Proceedings of the IEEE, 98(2):160–190, Feb. 2010.
[16] S. Hanson, B. Zhai, K. Bernstein, D. Blaauw, A. Bryant, L. Chang, K.K. Das, W. Haensch, E.J. Nowak, and D.M. Sylvester. Ultralow-voltage, minimum-energy CMOS. IBM Journal of Research and Development, 50(4.5):469–490, July 2006.
[17] A. Hasanbegovic and S. Aunet. Low-power subthreshold to above threshold level shifter in 90 nm process. In NORCHIP, 2009, pages 1–4, 2009.
[18] M. Henry and L. Nazhandali. Hybrid super/subthreshold design of a low power scalable-throughput FFT architecture. In Lecture Notes in Computer Science, volume 5409, pages 278–292. Springer, Dec. 2008.
[19] D.W. Kang, J.T. Doyle, M. Hartman, S. Dhar, M.B. Dermody, R.C. Woolf, R.S. Ambatipudi, and Y. Kim. A low power methodology for portable electronics. In International Symposium on Advanced Radio Technologies, pages 109–116. IEEE, March 2005.
[20] H. Kanitkar. Subthreshold circuits: Design, implementation and application. RIT Digital Media Library, Sept. 2008.
[21] S. Katrue. Power reduction techniques for memory elements. RIT Digital Media Library, Dec. 2007.
[22] J. Keane, Hanyong E., Tae-Hyoung Kim, S. Sapatnekar, and C. Kim. Subthreshold logical effort: a systematic framework for optimal subthreshold device sizing. Design Automation Conference, 2006 43rd ACM/IEEE, pages 425–428, 2006.
[23] E. Krimer, R. Pawlowski, M. Erez, and P. Chiang. Synctium: a near-threshold stream processor for energy-constrained parallel applications. Computer Architecture Letters, 9(1):21–24, Jan. 2010.
[24] M. Long. Implementing Skein hash function on Xilinx Virtex-5 FPGA platform. The Skein Hash Function Family, Feb. 2009.
[25] D. Markovic, C.C. Wang, L.P. Alarcon, T. Liu, and J.M. Rabaey. Ultralow-power design in near-threshold region. Proceedings of the IEEE, 98(2):237–252, Feb. 2010.
[26] B.G. Perumana, R. Mukhopadhyay, S. Charaborty, C. Lee, and J. Laskar. A low-power fully monolithic subthreshold CMOS receiver with integrated LO generation for 2.4 GHz wireless PAN applications. IEEE Journal of Solid-State Circuits, 43(10):2229–2238, Oct. 2008.
[27] J.M. Rabaey, A. Chandrakasan, and B. Nikolic. Digital Integrated Circuits: A Design Perspective. Prentice Hall Electronics and VLSI Series, 2003.
[28] R. Rardin. Optimization in Operations Research. Prentice Hall, 1998.
[29] K. Roy, J.P. Kulkarni, and Myeong-Eun Hwang. Process-tolerant ultralow voltage digital subthreshold design. In Silicon Monolithic Integrated Circuits in RF Systems, 2008. SiRF 2008. IEEE Topical Meeting on, pages 42–45, Jan. 2008.
[30] A. Schorr. Performance analysis of a scalable hardware FPGA Skein implementation. RIT Digital Media Library, Feb. 2010.
[31] J. Shau. Ultra-low power hybrid sub-threshold circuits. United States Patent and Trademark Office, Jan. 2010.
[32] H. Soeleman and K. Roy. Ultra-low power digital subthreshold logic circuits. Low Power Electronics and Design, 1999. Proceedings. 1999 International Symposium on, pages 94–96, 1999.
[33] H. Soeleman, K. Roy, and B. Paul. Robust ultra-low power sub-threshold DTMOS logic. In Low Power Electronics and Design, 2000. ISLPED ’00. Proceedings of the 2000 International Symposium on, 2000.
[34] H. Soeleman, K. Roy, and B. Paul. Sub-domino logic: ultra-low power dynamic sub-threshold digital logic. VLSI Design, 2001. Fourteenth International Conference on, pages 211–214, 2001.
[35] R. Swanson and J. Meindl. Ion-implanted complementary MOS transistors in low-voltage circuits. Solid-State Circuits Conference. Digest of Technical Papers. 1972 IEEE International, XV:192–193, Feb. 1972.
[36] Y. Taur and T. Ning. Fundamentals of Modern VLSI Devices. Cambridge University Press, 2009.
[37] S. Tillich. Hardware implementation of the SHA-3 candidate Skein. Cryptology ePrint Archive, Apr. 2009.
[38] C.Q. Tran, H. Kawaguchi, and T. Sakurai. Low-power high-speed level shifter design for block-level dynamic voltage scaling environment. Integrated Circuit Design and Technology, 2005. ICICDT 2005. 2005 International Conference on, pages 229–232, May 2005.
[39] N. Weste and D. Harris. CMOS VLSI Design. Pearson Education, 2005.
[40] W.L. Winston. Operations Research: Applications and Algorithms. PWS Publishers, 1987.
[41] W. Wolf. Modern VLSI Design. Prentice Hall PTR, 2002.
[42] B. Zhai, R.G. Dreslinski, D. Blaauw, T. Mudge, and D. Sylvester. Energy efficient near-threshold chip multi-processing. Low Power Electronics and Design (ISLPED), 2007 ACM/IEEE International Symposium on, pages 32–37, Aug. 2007.
[43] B. Zhai, S. Hanson, D. Blaauw, and D. Sylvester. Analysis and mitigation of variability in subthreshold design. In Low Power Electronics and Design, 2005. ISLPED ’05. Proceedings of the 2005 International Symposium on, pages 20–25, Aug. 2005.
Appendix A
Delay and Energy Measurements
Table A.1: Delay and Energy Measurements
Columns: Cell, Superthreshold Delay [ns], Superthreshold Energy [fJ], Nearthreshold Delay [ns], Nearthreshold Energy [fJ], Subthreshold Delay [ns], Subthreshold Energy [fJ]
AND2X1 0.10176 1.0060 0.88030 0.2782 278.10 0.1129
AND2X2 0.11775 1.3600 0.92650 0.4822 287.10 0.1580
AND2X4 0.14056 1.7720 1.14990 0.5195 328.50 0.2478
AND3X1 0.12553 1.0790 1.13500 0.3272 428.10 0.1403
AND3X2 0.13644 1.4390 1.16810 0.6020 444.20 0.1900
AND3X4 0.15817 2.0770 1.37690 0.7380 530.10 0.2896
AND4X1 0.14656 1.3770 1.41830 0.4194 526.70 0.1778
AND4X2 0.15740 1.7310 1.43940 0.7520 544.10 0.2298
AND4X4 0.18025 2.4260 1.65890 0.8490 622.20 0.3336
AO21X1 0.12975 1.0200 1.21590 0.3013 478.00 0.1360
AO21X2 0.14236 1.3890 1.27110 0.5035 495.20 0.1878
AO221X1 0.18536 1.2590 2.04110 0.4478 843.70 0.2109
AO221X2 0.19784 2.1880 2.08500 0.6390 851.70 0.2640
AO22X1 0.15297 1.3980 1.62550 0.4989 654.20 0.2056
AO22X2 0.16429 1.7640 1.65640 0.5280 665.30 0.2584
AO321X1 0.20690 1.5780 2.37400 0.4982 986.70 0.2088
AO321X2 0.22090 1.9590 2.42000 0.6045 1004.50 0.2620
AO32X1 0.17381 1.4950 1.91070 0.5390 774.20 0.2096
AO32X2 0.18660 1.8670 1.97000 0.6270 787.70 0.2628
AOI21X1 0.08909 0.5586 0.83640 0.1670 332.51 0.0791
AOI21X2 0.08762 1.0640 0.66760 0.3400 260.02 0.1534
AOI221X1 0.13951 1.0750 1.47790 0.3522 624.90 0.1522
AOI221X2 0.13862 2.0730 1.28700 0.7245 545.60 0.2986
Table A.2: Delay and Energy Measurements, Continued
Columns: Cell, Superthreshold Delay [ns], Superthreshold Energy [fJ], Nearthreshold Delay [ns], Nearthreshold Energy [fJ], Subthreshold Delay [ns], Subthreshold Energy [fJ]
AOI22X1 0.11263 0.9436 1.08940 0.3164 456.30 0.1472
AOI22X2 0.11220 1.7970 0.91950 0.6195 380.94 0.2878
AOI321X1 0.16219 1.1250 1.77120 0.3654 767.60 0.1500
AOI321X2 0.16100 2.1710 1.57610 0.7160 677.80 0.2940
AOI32X1 0.13208 1.0370 1.34490 0.3454 565.30 0.1509
AOI32X2 0.13121 1.9820 1.17380 0.6775 483.30 0.2948
DECODER24X1 0.15649 5.2400 1.43270 0.8175 509.70 0.4292
FULLADDERX1 0.26590 4.4840 2.75500 0.8765 1246.00 0.5465
HALFADDERX1 0.10176 0.7485 0.99320 0.5560 491.90 0.1670
INVX1 0.03935 0.1431 0.37100 0.0430 120.30 0.0190
INVX2 0.03919 0.2558 0.25670 0.0768 76.01 0.0358
INVX4 0.03904 0.4853 0.20054 0.1448 52.58 0.0694
INVX8 0.03894 0.8805 0.17009 0.2811 40.97 0.1366
MUX21X1 0.22930 2.0220 2.27300 0.7265 870.50 0.3912
MUX21X2 0.24190 2.7090 2.32200 0.9545 882.80 0.4438
MUX41X1 0.45780 1.4760 4.40000 0.4773 1727.40 0.2417
MUX41X2 0.46480 1.2760 4.41400 0.4974 1722.40 0.2544
NAND2X1 0.07270 0.3695 0.52610 0.1214 201.67 0.0497
NAND2X2 0.07259 0.6615 0.40930 0.2225 142.31 0.0950
NAND2X4 0.07254 1.2690 0.33440 0.4275 114.69 0.1860
NAND3X1 0.08367 0.6225 0.70750 0.1824 286.07 0.0840
NAND3X2 0.08401 1.1600 0.57400 0.4048 210.28 0.1617
NAND3X4 0.08431 2.2260 0.50500 0.4646 177.84 0.3168
NAND4X1 0.10765 0.9200 0.88930 0.4954 360.09 0.1196
NAND4X2 0.10847 1.7200 0.77970 0.7275 289.81 0.2310
NOR0211X1 0.10217 0.7975 0.91150 0.4156 326.00 0.1159
NOR0211X2 0.12436 1.3790 0.99070 0.4265 369.40 0.2072
NOR0211X4 0.16219 2.5020 1.35230 0.8794 511.40 0.3935
NOR2X1 0.07854 0.4162 0.76500 0.1318 168.92 0.0627
NOR2X2 0.07174 0.7670 0.61980 0.2528 130.52 0.1218
NOR2X4 0.07864 1.5380 0.51540 0.4896 111.22 0.2398
NOR3X1 0.10406 0.7815 0.93470 0.2563 326.10 0.1230
NOR3X2 0.10342 1.5060 0.75910 0.4960 269.30 0.2408
NOR3X4 0.10529 2.9780 0.67730 0.9720 239.30 0.4755
NOR4X1 0.13918 1.2860 1.29580 0.4066 469.10 0.1938
NOR4X2 0.13876 2.4560 1.11990 0.7885 408.30 0.3816
Table A.3: Delay and Energy Measurements, Continued
Cell  Superthreshold Delay [ns]  Superthreshold Energy [fJ]  Nearthreshold Delay [ns]  Nearthreshold Energy [fJ]  Subthreshold Delay [ns]  Subthreshold Energy [fJ]
OA21X1 0.12647 1.1660 1.25490 0.4040 490.30 0.1576
OA21X2 0.13757 1.5280 1.31100 0.5760 501.20 0.2094
OA32X1 0.17569 1.7560 2.02950 0.6345 846.30 0.2532
OA32X2 0.18790 2.1100 2.01230 0.7740 825.20 0.3063
OAI21X1 0.08290 0.7030 0.85000 0.2511 323.38 0.1006
OAI21X2 0.08262 1.3260 0.67290 0.4056 254.72 0.1952
OAI32X1 0.12980 1.3140 1.33630 0.4470 557.60 0.1950
OAI32X2 0.12885 2.5240 1.14660 0.8580 480.00 0.3823
OR2X1 0.11318 0.8215 0.98880 0.2495 367.40 0.1128
OR2X2 0.12759 1.1500 1.04460 0.5740 378.10 0.1577
OR2X4 0.15007 1.7460 1.28840 1.2180 458.60 0.2480
OR3X1 0.14416 1.2200 1.39050 0.4875 522.70 0.1774
OR3X2 0.15825 1.5600 1.42920 0.4900 533.70 0.2278
OR3X4 0.18293 2.3364 1.67440 1.5097 605.40 0.3272
OR4X1 0.18455 1.7100 1.87460 0.6765 714.40 0.2510
OR4X2 0.19692 2.0660 1.89750 0.8225 711.90 0.3026
OR4X4 0.22300 2.7680 2.13720 1.6292 785.10 0.4046
XNORX1 0.22190 2.7350 2.19700 0.9540 826.40 0.3432
XNORX2 0.23550 2.9580 2.24700 1.5060 839.70 0.3912
XORX1 0.23810 2.9030 2.34200 0.8935 894.90 0.3466
XORX2 0.25130 2.9930 2.38400 1.2340 909.60 0.3945
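These tables make the operating-region trade-off concrete. Relative to superthreshold operation, each cell's measured energy drops, but its delay grows by orders of magnitude. The Python sketch below computes both factors for a few cells, using values copied from Tables A.2 and A.3. The dictionary layout and the helper `region_ratios` are illustrative choices, not part of the thesis tooling.

```python
# Delay [ns] and energy [fJ] per operating region for a few cells,
# copied from Tables A.2 and A.3. Tuples are (delay_ns, energy_fj).
CELLS = {
    "INVX1":   {"super": (0.03935, 0.1431), "near": (0.37100, 0.0430), "sub": (120.30, 0.0190)},
    "NAND2X1": {"super": (0.07270, 0.3695), "near": (0.52610, 0.1214), "sub": (201.67, 0.0497)},
    "NOR2X1":  {"super": (0.07854, 0.4162), "near": (0.76500, 0.1318), "sub": (168.92, 0.0627)},
    "XORX1":   {"super": (0.23810, 2.9030), "near": (2.34200, 0.8935), "sub": (894.90, 0.3466)},
}


def region_ratios(cell: str, region: str) -> tuple:
    """Return (energy reduction factor, delay increase factor) of `region`
    relative to superthreshold operation for one cell (hypothetical helper)."""
    d_super, e_super = CELLS[cell]["super"]
    d, e = CELLS[cell][region]
    return e_super / e, d / d_super


# Print both factors for every cell in the nearthreshold and subthreshold regions.
for name in CELLS:
    for region in ("near", "sub"):
        e_red, d_inc = region_ratios(name, region)
        print(f"{name:8s} {region:4s} energy / {e_red:5.1f}   delay x {d_inc:8.1f}")
```

Take INVX1 as an example. Its subthreshold energy is about 7.5 times lower than its superthreshold energy (0.1431 fJ vs. 0.0190 fJ). The cost is roughly 3,000 times the delay (0.03935 ns vs. 120.30 ns).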
Table A.4: Delay and Energy Measurements, Level Shifters
Cell  Delay [ns]  Energy [fJ]
LEVELSHIFTERX1 68.1040 3.2338
LEVELSHIFTERX2 68.2310 3.4065
LEVELSHIFTERX4 68.4100 4.3800
LEVELSHIFTERX8 68.7600 6.9478
LEVELSHIFTER NEARX1 0.4691 1.2301
LEVELSHIFTER NEARX2 0.5412 1.6577
LEVELSHIFTER NEARX4 0.6842 2.5943
LEVELSHIFTER NEARX8 0.9650 4.8472
LEVELSHIFTER S2NX1 63.1110 1.4279
LEVELSHIFTER S2NX2 66.5540 1.5519
LEVELSHIFTER S2NX4 71.2800 1.7914
LEVELSHIFTER S2NX8 78.2900 2.3310
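Table A.4 shows that crossing a voltage domain has a noticeable energy cost. One rough way to read these numbers is to express each level-shifter event as an equivalent number of subthreshold INVX1 events. The Python sketch below does this under stated assumptions:

- The comparison is treated as meaningful even though it ignores which domain each shifter drives and any static energy.
- The helper `inverter_equivalents` is hypothetical.

Treat the results only as an order-of-magnitude guide.

```python
# Level-shifter energies [fJ] from Table A.4.
LEVEL_SHIFTER_ENERGY_FJ = {
    "LEVELSHIFTERX1": 3.2338,
    "LEVELSHIFTER NEARX1": 1.2301,
    "LEVELSHIFTER S2NX1": 1.4279,
}

# Subthreshold INVX1 energy [fJ] from Table A.2, used as the unit of comparison.
INVX1_SUB_ENERGY_FJ = 0.0190


def inverter_equivalents(shifter_energy_fj: float,
                         gate_energy_fj: float = INVX1_SUB_ENERGY_FJ) -> float:
    """Express one level-shifter event as a number of gate events of equal energy."""
    return shifter_energy_fj / gate_energy_fj


for name, energy in LEVEL_SHIFTER_ENERGY_FJ.items():
    print(f"{name:20s} ~ {inverter_equivalents(energy):6.1f} subthreshold INVX1 events")
```

Under this rough view, LEVELSHIFTERX1 costs about as much energy as 170 subthreshold INVX1 events (3.2338 fJ / 0.0190 fJ). Domain crossings would therefore need to be amortized over many low-voltage gates to pay off.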
Appendix B
ISCAS’89 Benchmark Results
This appendix presents results for additional ISCAS’89 benchmark circuits. For each circuit, the numbers of primary inputs, primary outputs, and gates in the design are listed. These are followed by plots of the estimated percent energy savings as a function of the performance degradation factor.
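The plots in this appendix report estimated percent energy savings against a performance degradation factor. The exact formulas behind the figures are not restated in this appendix. The Python sketch below therefore assumes the conventional definitions:

- Energy savings are measured relative to an all-superthreshold baseline.
- Degradation is the ratio of mixed-design delay to baseline delay.

The function names and the two-gate path are hypothetical. The path is built from Table A.2 values and deliberately ignores level-shifter delay and energy.

```python
def percent_energy_savings(e_baseline_fj: float, e_mixed_fj: float) -> float:
    """Percent reduction in energy relative to the baseline design."""
    return 100.0 * (e_baseline_fj - e_mixed_fj) / e_baseline_fj


def performance_degradation_factor(d_baseline_ns: float, d_mixed_ns: float) -> float:
    """How many times slower the mixed-voltage design is than the baseline."""
    return d_mixed_ns / d_baseline_ns


# Hypothetical two-gate path NAND2X1 -> INVX1 (values from Table A.2).
# Baseline: both gates superthreshold. Mixed: the final INVX1 runs nearthreshold.
# No level shifter is modeled.
baseline_e = 0.3695 + 0.1431    # [fJ]
baseline_d = 0.07270 + 0.03935  # [ns]
mixed_e = 0.3695 + 0.0430       # [fJ]
mixed_d = 0.07270 + 0.37100     # [ns]

savings = percent_energy_savings(baseline_e, mixed_e)
pdf = performance_degradation_factor(baseline_d, mixed_d)
print(f"savings = {savings:.1f} %, degradation factor = {pdf:.2f}")
```

With these numbers the sketch reports roughly 19.5 % energy savings at a degradation factor of about 3.96. This is the kind of point each curve in the figures traces out.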
B.1 ISCAS’89 S386 Benchmark Circuit
7 primary inputs, 7 primary outputs, 165 gates
[Plot: ISCAS’89 S386 Estimated Percent Energy Savings. x-axis: Performance Degradation Factor (1 to 10); y-axis: Percent Energy Savings (0 to 60)]
Figure B.1: Energy Savings with Countermeasure Mode for ISCAS’89 S386 (Subset)
[Plot: ISCAS’89 S386 Estimated Percent Energy Savings. x-axis: Performance Degradation Factor (0 to 4000); y-axis: Percent Energy Savings (0 to 60)]
Figure B.2: Energy Savings with Countermeasure Mode for ISCAS’89 S386
B.2 ISCAS’89 S832 Benchmark Circuit
19 primary inputs, 7 primary outputs, 292 gates
[Plot: ISCAS’89 S832 Estimated Percent Energy Savings. x-axis: Performance Degradation Factor (1 to 10); y-axis: Percent Energy Savings (0 to 60)]
Figure B.3: Energy Savings with Countermeasure Mode for ISCAS’89 S832 (Subset)
[Plot: ISCAS’89 S832 Estimated Percent Energy Savings. x-axis: Performance Degradation Factor (0 to 4000); y-axis: Percent Energy Savings (0 to 60)]
Figure B.4: Energy Savings with Countermeasure Mode for ISCAS’89 S832
B.3 ISCAS’89 S1196 Benchmark Circuit
14 primary inputs, 14 primary outputs, 547 gates
[Plot: ISCAS’89 S1196 Estimated Percent Energy Savings. x-axis: Performance Degradation Factor (1 to 10); y-axis: Percent Energy Savings (0 to 60)]
Figure B.5: Energy Savings with Countermeasure Mode for ISCAS’89 S1196 (Subset)
[Plot: ISCAS’89 S1196 Estimated Percent Energy Savings. x-axis: Performance Degradation Factor (0 to 4000); y-axis: Percent Energy Savings (0 to 60)]
Figure B.6: Energy Savings with Countermeasure Mode for ISCAS’89 S1196
B.4 ISCAS’89 S1238 Benchmark Circuit
14 primary inputs, 14 primary outputs, 526 gates
[Plot: ISCAS’89 S1238 Estimated Percent Energy Savings. x-axis: Performance Degradation Factor (1 to 10); y-axis: Percent Energy Savings (0 to 60)]
Figure B.7: Energy Savings with Countermeasure Mode for ISCAS’89 S1238 (Subset)
[Plot: ISCAS’89 S1238 Estimated Percent Energy Savings. x-axis: Performance Degradation Factor (0 to 4000); y-axis: Percent Energy Savings (0 to 60)]
Figure B.8: Energy Savings with Countermeasure Mode for ISCAS’89 S1238
B.5 ISCAS’89 S1494 Benchmark Circuit
8 primary inputs, 19 primary outputs, 653 gates
[Plot: ISCAS’89 S1494 Estimated Percent Energy Savings. x-axis: Performance Degradation Factor (1 to 10); y-axis: Percent Energy Savings (0 to 70)]
Figure B.9: Energy Savings with Countermeasure Mode for ISCAS’89 S1494 (Subset)
[Plot: ISCAS’89 S1494 Estimated Percent Energy Savings. x-axis: Performance Degradation Factor (0 to 4000); y-axis: Percent Energy Savings (0 to 80)]
Figure B.10: Energy Savings with Countermeasure Mode for ISCAS’89 S1494
[Plot: ISCAS’89 S1494 Estimated Percent Energy Savings. x-axis: Performance Degradation Factor (0 to 4000); y-axis: Percent Energy Savings (0 to 80)]
Figure B.11: Energy Savings with Extended Mode for ISCAS’89 S1494
