Low Power Techniques and Architectures for Field Programmable Gate Arrays by Kadir, Kureshi Abdul
LOW POWER TECHNIQUES AND ARCHITECTURES 
FOR FIELD PROGRAMMABLE GATE ARRAYS 
ABSTRACT 
THESIS 
SUBMITTED FOR THE AWARD OF THE DEGREE OF 
Doctor of Philosophy 
IN 
ELECTRONICS ENGINEERING 
BY 
KURESHI ABDUL KADIR 
UNDER THE SUPERVISION OF 
PROF. MOHD. HASAN 
DEPARTMENT OF ELECTRONICS ENGINEERING 
Z. H. COLLEGE OF ENGINEERING & TECHNOLOGY 
ALIGARH MUSLIM UNIVERSITY 
ALIGARH (INDIA) 
2010 
Low Power Techniques and Architectures 
for Field Programmable Gate Arrays 
ABSTRACT 
Field Programmable Gate Arrays (FPGAs) have become extremely popular due 
to their field programmability, support for rapid prototyping and cost effectiveness at 
low volumes. They achieve implementation efficiency approaching that of specialized 
logic while providing the silicon reusability of general purpose processors. The major 
problems associated with FPGAs are low throughput and high power consumption 
compared to Application Specific Integrated Circuits (ASICs). Until now, ASICs have 
been used predominantly in portable applications, because the very high power 
consumption of FPGAs rules out a reasonable battery life. However, the cost of 
fabricating an ASIC is rising exponentially, and only a few companies can still afford 
to fabricate a deep submicron ASIC. There is therefore a pressing need to reduce the power 
consumption of existing FPGAs so that they can also be employed in energy efficient 
portable systems. This would significantly reduce the number of ASICs required 
in battery operated portable systems and hence bring down the cost of these systems. 
This work explores techniques at different levels of design 
abstraction to reduce static and dynamic power consumption in different modules of 
commercial FPGAs. It investigates the main modules of the 
FPGA, namely the Configurable Logic Block (CLB), Programmable Interconnect and 
Configuration Memory, at the circuit, logic and architectural levels so as to realize a 
more energy efficient FPGA. Techniques such as the application of low leakage vectors 
and the use of dynamic threshold MOSFETs in place of conventional MOSFETs have been 
applied to reduce both dynamic and leakage power consumption in FPGAs. 
Carbon nanotube based electronics is evolving rapidly. Researchers have come 
up with a new carbon nanotube based MOSFET (CNFET) which can be used 
effectively in place of a conventional MOSFET. They have also proposed the use of 
bundles of carbon nanotubes as interconnects in place of existing copper interconnect. 
This work also explores the potential of CNFETs in place of MOSFETs in 
designing more energy efficient configurable logic blocks and static random access 
memories for FPGAs. FPGA interconnects account for almost 60-70% of total 
power consumption. This work therefore investigates the use of bundles of carbon nanotubes as 
interconnect instead of copper, and of CNFETs as interconnect drivers instead of 
conventional MOSFETs, to realize more energy efficient interconnects. The 
incorporation of carbon nanotube based electronics will go a long way towards extending the 
battery life of FPGA based portable systems. 
INTRODUCTION OF FPGAS 
FPGAs have evolved considerably from their initial usage as just glue logic. 
The spatial realization of applications and the programmability of the architecture allow 
FPGAs to approach the efficiency of ASICs with the silicon reusability of general-purpose 
processors. Different FPGA architectures are commercially available, 
but all of them are built from the same functional components: an array of 
programmable logic blocks, programmable interconnect and I/O blocks around the 
perimeter. The reconfigurable elements allow an FPGA to be programmed to 
implement virtually any digital logic function. The majority of FPGAs provide 
programmable logic using lookup tables (LUTs). An individual k-input lookup table, or 
k-LUT, is capable of implementing any k-input combinational logic function. In order 
to support sequential logic, a flip-flop is placed at the LUT output. This combination is 
referred to as a basic logic element (BLE). In most modern FPGAs, BLEs are 
grouped together in larger blocks called configurable logic blocks (CLBs) and are 
configured using SRAM memory cells. The implementation of FPGAs in silicon falls 
into three groups: SRAM-programmed, antifuse-programmed, and EPROM-
programmed. The configurable logic blocks in different implementations are very 
similar. The primary difference in various implementations is in the programmable 
routing architecture and the way it is configured. Due to the immense popularity of 
SRAM-based FPGAs, only this type is considered in this thesis. In the FPGA architecture, 
connectivity between logic blocks is achieved through programmable routing resources. 
Connection boxes (CBs) are adjacent to the CLBs on all four sides. They provide 
programmable connections between the CLBs and the routing tracks, so that each input 
pin of a CLB is able to connect to a certain percentage of tracks in the adjacent channel. 
Switch boxes (SB) exist at the intersection of every horizontal and vertical channel. 
They provide programmable connections between the horizontal and the vertical 
channels. 
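The k-LUT and BLE described above can be illustrated with a small behavioural model. The following is only a sketch under stated assumptions: the class names `LUT` and `BLE`, the MSB-first input ordering and the reset value of the flip-flop are illustrative choices, not taken from this thesis or from any vendor tool.

```python
from itertools import product


class LUT:
    """Behavioural k-input lookup table: one configuration bit per input combination."""

    def __init__(self, k, config_bits):
        if len(config_bits) != 2 ** k:
            raise ValueError("a k-LUT needs exactly 2**k configuration bits")
        self.k = k
        self.config_bits = list(config_bits)

    @classmethod
    def from_function(cls, k, fn):
        # Fill the table by evaluating fn on every input combination,
        # in the same MSB-first order that evaluate() uses for indexing.
        bits = [int(bool(fn(*inputs))) for inputs in product((0, 1), repeat=k)]
        return cls(k, bits)

    def evaluate(self, *inputs):
        if len(inputs) != self.k:
            raise ValueError("wrong number of inputs")
        index = 0
        for bit in inputs:  # first input is the most significant address bit
            index = (index << 1) | (bit & 1)
        return self.config_bits[index]


class BLE:
    """Basic logic element: a LUT followed by an optional flip-flop."""

    def __init__(self, lut, registered=True):
        self.lut = lut
        self.registered = registered
        self.state = 0  # assumption: the flip-flop resets to 0

    def clock(self, *inputs):
        # Rising clock edge: the flip-flop captures the current LUT output.
        self.state = self.lut.evaluate(*inputs)

    def output(self, *inputs):
        # Registered mode returns the stored bit, otherwise the raw LUT output.
        return self.state if self.registered else self.lut.evaluate(*inputs)


if __name__ == "__main__":
    xor3 = LUT.from_function(3, lambda a, b, c: a ^ b ^ c)
    ble = BLE(xor3, registered=True)
    ble.clock(1, 1, 1)
    print(xor3.config_bits, ble.output(0, 0, 0))
```

The same `LUT` object can hold any of the 2^(2^k) possible k-input functions simply by changing its configuration bits, which is the property that makes the LUT a universal logic primitive.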
Traditionally, FPGAs have been used in environments where energy 
consumption was not a critical issue. Present-day portable devices have become more 
complex and can take advantage of the programmability offered by FPGAs. These 
requirements place stress on the energy efficiency of FPGAs, which is lacking in the 
existing commercial architectures. Another factor is the reduction in feature size, which 
has increased the transistor density and transistor count per die. This has resulted in an 
increase of overall power dissipation per chip. This trend will continue, and has 
implications for the economics and the technology of packaging these devices. Therefore 
there is a pressing need to reduce the power consumption of existing FPGAs so that 
they can also be employed in energy efficient portable systems. This thesis presents 
different techniques and technologies to reduce the static and dynamic power 
consumption of the different building blocks of FPGAs. 
STRUCTURE OF THESIS 
The core chapters of this thesis, from Chapter 4 to Chapter 9, are a collection of 
manuscripts published in various journals and conference proceedings. The structure of 
this thesis is as follows: 
• Chapter 2 covers different sources of power consumption in nanoscale CMOS 
followed by a survey of the relevant published literature on low power design 
techniques. 
• Chapter 3 presents the detailed architecture of an FPGA. It also gives an 
overview of power dissipation in an FPGA and discusses previous work on 
power reduction in FPGAs. 
• Chapter 4 investigates ultra-low power operation of FPGA building blocks such 
as the 4-input lookup table (LUT), 1-bit full adder and SRAM cell. Different biasing 
schemes such as Subthreshold-MOS (Sub-MOS) and Subthreshold Dynamic 
Threshold MOS (Sub-DTMOS) are used and their performance is assessed 
under different process parameter variations. 
• Chapter 5 describes the basics of carbon nanotubes, different fabrication 
techniques and the modeling of the carbon nanotube field effect transistor 
(CNFET). The chapter then compares the performance of CNFET with bulk CMOS 
on benchmark circuits and explores the performance of a CNFET based basic 
logic element (BLE), a building block of FPGAs. 
• Chapter 6 presents leakage reduction of routing multiplexers through input 
vector control. A significant leakage power saving is achieved by 
manipulating the configuration memory bits along with the multiplexer 
inputs of the unused multiplexers, in contrast to earlier approaches in the literature. 
• Chapter 7 explores the performance of a low leakage CNFET based 6T SRAM 
cell and compares it with that of the conventional CMOS cell at the deep 
submicron 32nm technology node. Due to inherent characteristics of CNFET 
such as good gate controllability, drive current and immunity to short channel 
effects, the CNFET based cell outperforms the CMOS cell in terms of leakage 
power saving, write margin, speed and read SNM. The CNFET cell is also found 
to be more robust against process parameter variations than the 
CMOS cell. 
• Chapter 8 introduces carbon nanotube (CNT) bundle interconnects and 
models mixed CNT bundles as interconnects. This chapter also compares the 
performance of mixed CNT bundle interconnects with CNFET drivers against 
traditional copper interconnects with CMOS drivers. Due to the high speed of 
CNFETs and the lower delay of mixed CNT bundle interconnects, the combination 
of mixed CNT bundle interconnects with CNFET drivers can be a very good 
alternative for future low energy FPGA routing. 
• Chapter 9 explains different configurations of DTMOS transistors for realizing 
an energy efficient FPGA interconnect fabric. This chapter also proposes a new 
DTMOS based multiplexer-type interconnect. Since the FPGA interconnect fabric 
has thousands of switches (inside the multiplexers and switch boxes), the 
overall improvement in power delay product (PDP) for the whole FPGA can be 
significant. 
• Chapter 10 presents conclusions and a summary of the thesis and suggests topics 
for future research. 
CONCLUSIONS AND ACHIEVEMENTS 
Subthreshold DTCMOS based schemes for implementing FPGA building 
blocks such as the 4-input LUT, 1-bit full adder cell and 8-transistor SRAM cell are 
proposed in Chapter 4 at the 22nm node. Such FPGAs can be used even in ultra 
low power portable applications such as hearing aids and pacemakers. Because the body 
bias lowers the threshold voltage of subthreshold DTCMOS in the ON 
condition, the drain current increases, which provides higher speed operation than 
conventional subthreshold circuits. The higher operating current only slightly 
increases the power consumption, and hence the PDP of DTCMOS is much better than that of 
CMOS throughout the supply voltage range (150mV to 300mV). Chapter 4 shows 
that the PDP of the LUT and full adder cell is minimum at a supply voltage of around 
250mV. The sensitivities of the two schemes to process parameter 
variations are also explored, and DTCMOS is found to show superior 
robustness against temperature and process variations. Chapter 4 also implements the 
8-transistor SRAM cell in CMOS and DTCMOS schemes; the 
DTCMOS based cell provides up to 15% and 23% improvement in 
read SNM and speed, respectively, at a supply voltage of 200mV. This work has been published in 
[6], [13], [17] and [25]. 
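As a reminder of the quantities involved in the paragraph above, the power delay product and the textbook body-effect expression for an n-channel device can be written as follows. This is a sketch using standard notation that is assumed here rather than taken from the thesis: P_avg is the average power, t_d the propagation delay, V_th0 the zero-bias threshold voltage, gamma the body-effect coefficient, phi_F the Fermi potential and V_BS the body-to-source voltage.

```latex
\[
  \mathrm{PDP} = P_{\mathrm{avg}} \cdot t_d
\]
\[
  V_{th} = V_{th0} + \gamma \left( \sqrt{2\phi_F - V_{BS}} - \sqrt{2\phi_F} \right),
  \qquad V_{BS} < 2\phi_F
\]
% Assumption: in a basic DTMOS device the body is tied to the gate, so V_BS = V_GS.
% ON state  (V_GS > 0): the square-root term shrinks, hence V_th < V_th0.
% OFF state (V_GS = 0): V_th = V_th0, so the off-state leakage is not increased.
```

Under these assumptions, a forward body bias in the ON state lowers V_th and raises the drive current, which shortens t_d; the PDP improves as long as the accompanying rise in P_avg is proportionally smaller.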
Chapter 5 compares the performance of CNFET with bulk CMOS on benchmark 
circuits and explores the performance of a CNFET based basic logic element (BLE), a 
building block of FPGAs. Owing to the very high intrinsic carrier mobility of 
carbon nanotubes (CNTs) and the very low effective capacitance of CNFETs, 
the CNFET based BLE provides a 37% improvement in power delay product (PDP). A 
part of this work has been published in [18] and [20]. 
The ability of an FPGA to implement a variety of circuits on a single chip always 
results in the underutilization of some logic and interconnect resources. These unused 
transistors leak power in the absence of switching activity. The interconnect fabric of 
an FPGA consumes a major portion of the total leakage power. Chapter 6 focuses on 
the reduction of leakage power in the interconnect switch matrix multiplexers by 
ensuring that the minimum leakage vector is applied to all these multiplexers. The 
multiplexers have been analysed for varying sizes, topologies and 
transistor sizing at different temperatures and supply voltages at a deep submicron 
22nm technology node. The minimum leakage state depends heavily on the relative 
magnitude of the subthreshold and the gate leakage currents. Therefore, different low 
leakage vectors are selected for minimum sized and optimum sized multiplexers: 
keeping all input lines of the multiplexers at logic '1' and the inputs to the inverters at logic '0', or 
keeping all the inputs of the multiplexers and inverters at logic '1'. Applying these vectors provides a 
significant reduction in leakage for all unused interconnects without any penalty. 
This work and other leakage power reduction techniques for FPGA building blocks 
have been published in [5], [7], [14], [19] and [27]. 
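The selection of a minimum leakage vector described above can be sketched as an exhaustive search over the input states of an unused multiplexer. This is a minimal illustration, not the procedure used in the thesis: the function names are hypothetical, and `toy_leakage` is a placeholder with arbitrary cost values, standing in for leakage figures that would in practice come from circuit simulation of the actual multiplexer.

```python
from itertools import product


def min_leakage_vector(n_inputs, leakage_fn):
    """Try every binary input vector and return (vector, leakage) with the lowest leakage."""
    best_vector, best_leakage = None, float("inf")
    for vector in product((0, 1), repeat=n_inputs):
        leakage = leakage_fn(vector)
        if leakage < best_leakage:
            best_vector, best_leakage = vector, leakage
    return best_vector, best_leakage


def toy_leakage(vector):
    # HYPOTHETICAL placeholder: arbitrary per-input costs in arbitrary units.
    # A real flow would look up simulated leakage for each input state instead.
    cost = {0: 3, 1: 1}
    return sum(cost[bit] for bit in vector)


if __name__ == "__main__":
    vector, leakage = min_leakage_vector(4, toy_leakage)
    print(vector, leakage)
```

Exhaustive search is practical only for small multiplexers, since the number of candidate vectors doubles with every additional input; larger structures would need a per-stage or heuristic search.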
Due to aggressive scaling, secondary effects and process variations, the power 
consumption and performance of the CMOS SRAM cell worsen in deep submicron 
technology. It has therefore become difficult to design low power, high speed, robust 
and compact SRAM cells in deep submicron technology. Carbon nanotube 
field effect transistor (CNFET) technology, with reduced process variation, better gate 
controllability, high thermal stability and high drive current, is a promising alternative 
to bulk CMOS. Chapter 7 explores a low leakage, high speed and robust CNFET 
based 6T SRAM cell and compares its performance with that of the conventional CMOS 
based cell. All the simulations are carried out at the 32nm technology node with equal 
threshold voltages for the CNFET and CMOS transistors. Due to inherent characteristics of 
CNFET such as good gate controllability, drive current and immunity to short channel 
effects, the CNFET cell outperforms the CMOS cell in terms of leakage power saving, 
write margin, speed and read SNM. As an FPGA contains a large number of 
configuration SRAM cells, implementing low leakage SRAM cells in CNFET 
technology will greatly contribute to reducing the overall leakage consumption of 
FPGAs. This work has been published in [1]. 
The International Technology Roadmap for Semiconductors (ITRS) predicts 
that traditional copper interconnects will be a major bottleneck when feature sizes 
become smaller than 45nm. This is due to the steep rise in the parasitic resistance of copper, 
which not only increases the interconnect delay but also limits its current carrying 
capability. In order to alleviate such problems, alternative interconnect technologies 
and their architectural implications for FPGAs in future process technologies must be 
explored. 
Due to their long mean free paths (MFP), high current carrying capability and 
high thermal conductivity, CNTs are expected to be a very good alternative material for 
future FPGA interconnects. Chapter 8 proposes implementing FPGAs with 
mixed CNT bundle interconnects and CNFET drivers, a combination that outperforms 
traditional Cu interconnects in delay and energy consumption. The chapter also 
provides important guidelines for the selection of vital parameters of mixed CNT bundles 
so as to optimize the resistance, capacitance and inductance of mixed CNT bundle 
interconnects. This work has been published in [2], [4], [8]-[11] and [26]. 
The basic switching elements in most FPGA interconnects are NMOS pass 
transistors or multiplexers, which suffer from a threshold voltage drop that causes high 
DC power dissipation in the level restoring buffers. To eliminate this problem, recent 
architectures from Xilinx and Altera use tri-state buffers. However, this approach has 
significant area and power consumption overhead. Chapter 9 suggests methods 
other than replacing pass transistors with tri-state buffers for reducing power 
consumption. A new augmented DTMOS biasing scheme is proposed which provides a 
fixed body bias and raises the logic-high input level of the level restoring buffer, which 
reduces the DC power consumption. The main advantage of DTMOS over conventional 
MOS is its higher drive current at lower bias levels. Simulations have been performed on 
multiplexer based FPGA interconnect resources such as Double, Hex and Long interconnects 
driving copper wires of different lengths. It has been found that the DTMOS based 
interconnect resources outperform those based on conventional 
MOSFETs in speed and PDP. Since the FPGA interconnect fabric has thousands of switches (inside the 
multiplexers and switch boxes), the overall improvement in power delay product 
(PDP) for the whole FPGA can be significant. 
A remaining concern with DTMOS based interconnect resources is the process 
complexity and area penalty. The area overhead of the proposed interconnect will be 
very small, because the extra transistor needed for DTMOS based switches is easily 
shared among all multiplexer based interconnects and the augmenting transistor is of 
minimum size. This work has been published in [3], [12], [16] and [23]. 
LIST OF PUBLICATIONS 
JOURNALS 
1- Kureshi A. K. and Mohd. Hasan, "Performance comparison of CNFET and 
CMOS based 6T SRAM cell in deep submicron," Microelectronic Journal, 
Vol.40, No.6, pp.979-982, June-2009. 
2- Kureshi A. K. and Mohd. Hasan, "Analysis of CNT bundle and its comparison 
with copper interconnect for CMOS and CNFET drivers," Journal of 
Nanomaterials, pp. 1-6, 2009. 
3- Kureshi A. K. and Mohd. Hasan, "DTMOS based low power high speed 
interconnects for FPGA," Journal of Computers, Vol.4, No. 10, pp.921-926, 
Oct-2009. 
4- Kureshi A. K. and Mohd. Hasan, "Analysis of CNT bundle and its comparison 
with copper for FPGAs interconnects," International Journal of Applied 
Science, Engineering and Technology, Volume 5:3, pp.178-183, 2009. 
5- Kureshi A. K. and Mohd. Hasan, "Leakage analysis and optimization of CLB in 
Vertex-II FPGAs," International Journal of Systemic Cybernetic and 
Informatics, Vol.3, No. 1, pp.2718-2722, Nov-2007 
CONFERENCES 
6- Abdul Kadir Kureshi, Mohd. Hasan, and Naushad Alam, "Subthreshold deep 
submicron performance investigation of CMOS and DTCMOS Biasing 
Schemes for Reconfigurable Computing," IEEE ISCAS, Taiwan, pp. 2545-2548, 
May 2009. 
7- Mohd. Hasan and Abdul Kadir Kureshi, "Leakage reduction in FPGA routing 
multiplexers," IEEE ISCAS, Taiwan, pp. 1129-1132, May 2009. 
8- Naushad Alam, A. K. Kureshi, and Mohd. Hasan, "Carbon nanotube 
interconnects for low-power high-speed applications," IEEE ISCAS, Taiwan, 
pp. 2273-2276, May 2009. 
9- A. K. Kureshi and Mohd. Hasan, "Energy efficient high speed CNFET based 
interconnect drivers for FPGAS," International Conference on Multimedia, 
Signal Processing and Communication Technologies (IMPACT-09) pp. 48-51, 
March 2009. 
10-Naushad Alam, A. K. Kureshi, and Mohd. Hasan, "Analysis of carbon nanotube 
interconnects and their comparison with Cu interconnects," International 
Conference on Multimedia, Signal Processing and Communication 
Technologies (IMPACT-09), pp. 124-127, March 2009. 
11-Naushad Alam, A. K. Kureshi, and Mohd. Hasan, "Performance comparison 
and variability analysis of CNT bundle and Cu interconnects," International 
Conference on Multimedia, Signal Processing and Communication 
Technologies (IMPACT-09), pp. 169-172, March 2009. 
12-Kureshi A. K., Naushad Alam, and Mohd. Hasan, "A novel low power high 
speed field programmable gate array routing interconnect," in the proceedings 
of SPIT-IEEE Colloquium and International Conference, Vol. 2, pp. 145 - 149, 
Feb. 2008. 
13-Naushad Alam, Kureshi A. K., and Mohd. Hasan, "Analysis and comparison of 
subthreshold 1-bit full adder cells", in the proceedings of SPIT-IEEE 
Colloquium and International Conference, Vol. 2, pp. 127 - 131, Feb. 2008. 
14- Kureshi A. K. and Mohd. Hasan, "Leakage power estimation and minimization 
in CLB of FPGA," Proceedings of the IEEE-International Conference on 
Computer and Communication Engineering, Kuala Lumpur, Malaysia, pp. 270-
274, May 13-15, 2008. 
15-Tarun Kumar Agarwal, Anurag Sawhney, Kureshi A. K. and Mohd. Hasan, 
"Performance comparison of static CMOS and MCML gates in sub-threshold 
region of operation for 32nm CMOS technology," Proceedings of the IEEE-
International Conference on Computer and Communication Engineering, Kuala 
Lumpur, Malaysia, pp. 284-287, May 13-15, 2008. 
16-Kureshi A. K. and Mohd. Hasan, "Low power field programmable gate array 
interconnects," in the proceedings of the 5th International Conference on 
Systemics, Cybernetics and Informatics (ICSCI 2008), Vol. 1, pp. 52 - 55. 
17-Naushad Alam, Kureshi A. K., and Mohd. Hasan, "Subthreshold CMOS full 
adder for ultra-low power operation," in the proceedings of the 5th International 
Conference on Systemics, Cybernetics and Informatics (ICSCI 2008), Vol. 1, 
pp. 48-51. 
18-Kureshi A. K. and Mohd. Hasan, "Leakage analysis of CNFET based basic 
digital building blocks," in the proceedings of International Conference on 
Embedded System and VLSI Design (ICEVD 2008), pp. 246 - 249. 
19-Kureshi A. K. and Mohd. Hasan, "Leakage power and delay optimization of 
FPGA interconnects," IEEE-International Conference on Signal Processing, 
Communications and Networking (ICSCN 2008), pp. 568-572, Jan 2008. 
20-Kureshi A. K. and Mohd. Hasan, "Low leakage high speed CNFET based look 
up table," IEEE-International Conference on Emerging Trends in Electronics 
Technology, pp. 432-436, July 2008. 
21-Fahad Ali Usmani, A. K. Kureshi, Mohd. Hasan, and M.J.R. Khan, "Performance 
analysis of bulk CMOS, strained silicon and CNFET based operational 
amplifiers in VDSM technology," IEEE International Advance Computing 
Conference (IACC), pp. 2567-2579, March 2009. 
22-Kureshi A. K., and Mohd. Hasan, "Performance comparison of CNFET and 
CMOS based 8T SRAM Cell in deep submicron," 12th IEEE VLSI Design and 
Test Symposium (VDAT-08), pp. 270-274, July 2008. 
23-Naushad Alam, Kureshi A. K., and Mohd. Hasan, "Dynamic threshold PMOS 
switch for power gating," 12th IEEE VLSI Design and Test Symposium 
(VDAT-08), pp. 317-320, July 2008. 
24- Anurag Agarwal, Kureshi A. K., and Mohd. Hasan, "Performance comparison 
of CNFET and CMOS based full adder at 32 nm technology node," 12th IEEE 
VLSI Design and Test Symposium (VDAT-08), pp. 426-430, July 2008. 
25-Kureshi A. K. and Mohd. Hasan, "Subthreshold operation of field 
programmable gate array," In the proceedings of MTECS, pp. 244 - 247, March 
2008. 
26-Kureshi A. K. and Mohd. Hasan, "Interconnect performance comparison of 
FPGA at 32nm technology," in the proceedings of NSC-08 (IIT Roorkee), pp. 
766-769, 2008. 
27-Kureshi A. K. and Mohd. Hasan, "Leakage analysis and optimization of SRAM 
cell at 32nm," in the proceedings of NSC-07, pp. 123 - 126, 2007. 
28- Kureshi A. K. and Mohd. Hasan, "A study of different circuit level techniques 
for low leakage SRAM cells," In the proceedings of NCACCN, pp. 438 - 411, 
Feb 2007. 
29-Kureshi A. K. and Mohd. Hasan, "Energy efficient FPGA interconnects," 
Submitted to Mediterranean Nanotechnology letters, 7th Jan-2010. 
30- Kureshi A. K. and Mohd. Hasan, "DTMOS based low power FPGAs building 
blocks," Submitted to Mediterranean journal of Electronics and 
Communication, 2nd Feb-2010. 
31-Kureshi A. K. and Mohd. Hasan, "Ultra-low power FPGAs design in 
subthreshold regime," Submitted to Microelectronic Journal, 10th April 2010. 
LOW POWER TECHNIQUES AND ARCHITECTURES 
FOR FIELD PROGRAMMABLE GATE ARRAYS 
THESIS 
SUBMITTED FOR THE AWARD OF THE DEGREE OF 
Doctor of Philosophy 
IN 
ELECTRONICS ENGINEERING 
BY 
KURESHI ABDUL KADIR 
UNDER THE SUPERVISION OF 
PROF. MOHD. HASAN 
DEPARTMENT OF ELECTRONICS ENGINEERING 
Z. H. COLLEGE OF ENGINEERING & TECHNOLOGY 
ALIGARH MUSLIM UNIVERSITY 
ALIGARH (INDIA) 
2010 
Professor Mohd. Hasan 
M.Tech (IIT-D), PhD (U.K.) 
Tel (O): 0571-2721148 
(R): 09897790043 
e-mail: m_hasan@rediffmail.com 
DEPARTMENT OF ELECTRONICS ENGINEERING 
Z. H. College of Engineering and Technology 
Aligarh Muslim University, Aligarh-India 202002 
Date: 13th April 2010 
Certificate 
Certified that the work entitled "LOW POWER TECHNIQUES AND 
ARCHITECTURES FOR FIELD PROGRAMMABLE GATE ARRAYS", which is 
being submitted by Mr. KURESHI ABDUL KADIR in partial fulfillment of the 
requirements for the award of the degree of Doctor of Philosophy in Electronics 
Engineering from Aligarh Muslim University, Aligarh-India, is a record of the 
candidate's own work carried out under my supervision and guidance. The matter embodied in this 
thesis has not been submitted for the award of any other degree or diploma. 
Prof. Mohd. Hasan 
Low Power Techniques and Architectures for 
Field Programmable Gate Arrays 
Abstract: - Field Programmable Gate Arrays (FPGAs) have become extremely popular due to 
their field programmability, rapid prototyping of systems and cost effectiveness for low 
volume. They achieve implementation efficiency approaching that of specialized logic while 
providing the silicon reusability of general purpose processors. The major problems 
associated with an FPGA are low throughput and high power consumption compared to 
Application Specific Integrated Circuits (ASICs). Till now, ASICs are predominantly used in 
all portable applications for a reasonable battery life contrary to FPGA on account of its very 
high power consumption. The cost of fabricating an ASIC is exponentially rising. There are 
only few companies left that can afford the cost of fabricating a deep submicron ASIC. There 
is a pressing need to reduce the power consumption of existing FPGAs so that they can also 
be employed in energy efficient portable systems. This will help significantly in reducing the 
number of ASICs required in battery operated portable systems and hence the cost of these 
systems will come down. 
This work is aimed at exploring techniques at different levels of the design 
abstraction to reduce static and dynamic power consumption in different modules of 
commercial FPGAs. The work will involve investigation of different modules of the FPGA 
namely Configurable Logic Block (CLB), Programmable Interconnect and Configuration 
Memory at the circuit, logic and architectural levels so as to realize a more energy efficient 
FPGA. Several techniques like application of low leakage vectors and use of dynamic 
threshold MOSFET in place of conventional MOSFET have been applied to reduce both the 
dynamic and leakage power consumption in FPGAs. 
Carbon nanotube based electronics is evolving rapidly. Researchers have come up 
with a new carbon nanotube based MOSFET (CNFET) which can be used effectively in place 
of a conventional MOSFET. They have also proposed the use of bundles of carbon nanotubes 
as interconnects in place of existing copper interconnect. This work also explores the potential 
of CNFET in place of MOSFET in designing more energy efficient configurable logic blocks 
and static random access memories for FPGA. The FPGA interconnects consume almost 60-
70% of its total power consumption. This work investigates the use of bundles of carbon 
nanotubes as interconnect instead of copper and also CNFET as an interconnect driver instead 
of conventional MOSFET to realize more energy efficient interconnects. The incorporation of 
carbon nanotubes based electronics will go a long way to extend the battery life of FPGA 
based portable systems. 
Acknowledgements 
I am very much indebted and have great pleasure in expressing my deep sense of 
gratitude to my supervisor, Prof. Mohd. Hasan, Department of Electronics Engineering, 
Aligarh Muslim University, Aligarh, for his creative ideas, instructive guidance, expert 
advice and for sparing his time. His advice and research attitude have provided me with a 
model for my entire future career. Without him this work would not have been completed. 
He was always at hand with valuable suggestions and critical evaluation of theoretical 
and simulated results. He provided the necessary facilities for conducting the 
experimental work. For this and much more I am deeply grateful to him. 
Appreciation is expressed to Naushad Alam, who gave me great help during the simulation 
of different FPGA blocks using HSPICE tools. 
Special thanks go to my friends Israil, Sachin and Amin for their help in proofreading 
this dissertation. 
Finally, I would like to thank, although this is too weak a word, my late father and all the 
other family members for their continual encouragement and support throughout this 
work. 
CONTENTS 
Abstract 
Acknowledgements 
CONTENTS 
LIST OF TABLES 
LIST OF FIGURES 
LIST OF SYMBOLS 
LIST OF ABBREVIATIONS 
CHAPTER 1 INTRODUCTION 
1.1 Field Programmable Gate Arrays 
1.2 Motivation 
1.3 Contribution 
1.4 Structure 
1.5 Summary 
CHAPTER 2 LOW POWER TECHNIQUES 
2.1 Introduction 
2.2 Sources of Power Consumption in CMOS Circuits 
2.2.1 Dynamic / Switching Power 
2.2.2 Short-Circuit Power 
2.2.3 Static Power 
2.3 Dynamic Power Reduction Techniques 
2.3.1 Dual Power Supply 
2.3.2 Gate Sizing 
2.3.3 Transistor Sizing 
2.4 Leakage Power Reduction Techniques 
2.4.1 Power Gating and Multi-threshold 
2.4.2 Adaptive Body Bias 
2.4.3 Transistor Stacks 
2.4.4 Forced Transistor Stacking 
2.4.5 Sleepy Stack 
2.4.6 Long Channel Devices 
2.4.7 Optimal Standby Vector 
2.5 Summary 
CHAPTER 3 POWER REDUCTION TECHNIQUES FOR FPGAs 
3.1 Overview of FPGA Architecture 
3.2 Interconnect Architecture 
3.2.1 Island-style Architecture 
3.2.2 Row-Based Architecture 
3.2.3 Sea-of-Gates Architecture 
3.2.4 Hierarchical Architecture 
3.3 Virtex-II FPGA Architecture 
3.4 Power Estimation Methodology 
3.4.1 Dynamic Power Estimation Methodology 
3.4.2 Leakage Power Estimation Methodology 
3.5 Different Power Reduction Techniques of FPGAs 
3.5.1 Low Power Routing 
3.5.2 Multi-threshold CMOS 
3.5.3 Power Gating 
3.5.4 Activity Packing 
3.5.5 Dual-VDD Assignment 
3.5.6 Super Cut-off Technique 
3.5.7 Heterogeneous Routing 
3.6 Summary 
CHAPTER 4 ULTRA LOW POWER FPGAs 
4.1 Introduction 
4.2 DTCMOS 
4.3 Analysis of Subthreshold FPGA Building Blocks 
4.3.1 Lookup Table 
4.4 Effect of Process Variation 
4.4.1 Temperature 
4.4.2 Threshold Voltage 
4.4.3 Channel Length 
4.5 SRAM Cell 
4.5.1 Standby Leakage 
4.5.2 Read SNM and Write Delay 
4.6 Full Adder 
4.7 Summary 
CHAPTER 5 CARBON NANOTUBE FIELD EFFECT TRANSISTOR TECHNOLOGY 
5.1 Introduction 
5.2 Carbon Nanotubes 
5.2.1 Properties of Single-Wall Carbon Nanotubes 
5.3 Fabrication of Carbon Nanotubes 
5.3.1 Arc Discharge 
5.3.2 Laser Ablation 
5.3.3 Chemical Vapor Deposition 
5.3.4 Flame Synthesis 
5.4 CNFET Device Structure 
5.5 Model Overview 
5.5.1 CNFET Model Level 1 (L1) 
5.5.2 CNFET Model Level 2 (L2) 
5.5.3 CNFET Model Level 3 (L3) 
5.6 CNFET I-V Characteristics 
5.7 Performance Evaluation of Benchmark Circuits 
5.8 Performance Evaluation of a Basic Logic Element of an FPGA 
5.9 Summary 
CHAPTER 6 FPGA INTERCONNECT POWER REDUCTION BY INPUT VECTOR CONTROL 
6.1 Introduction 
6.2 Background Work 
6.3 Architecture of Target FPGA 
6.4 Input-State Dependence of Leakage Power in FPGAs 
6.4.1 Subthreshold Leakage 
6.4.2 Gate Leakage 
6.5 Leakage Power in FPGA Multiplexers 
6.5.1 4:1 Decoded Multiplexer 
6.5.2 4:1 Encoded Multiplexer 
6.5.3 Hybrid Large Size Multiplexers 
6.6 Summary 86 
CHAPTER 7 ROBUST LOW POWER CNFET BASED 6T SRAM CELL 87 
7.1 Introduction 87 
7.2 Carbon Nanotube Field Effect Transistor 88 
7.3 Standby Leakage Power 89 
7.4 Different Performance Parameters 90 
7.4.1 Stability 90 
7.4.2 Temperature 91 
7.4.3 Driving Capability 92 
7.4.4 Write Stability of the Cell 93 
7.5 Effect of Process Parameter Variations 94 
7.6 Summary 95 
CHAPTER 8 ENERGY EFFICIENT DRIVERS AND CNT BUNDLES AS 96 
INTERCONNECT 
8.1 Introduction 96 
8.2 Classification of carbon nanotubes 97 
8.2.1 SWCNT 97 
8.2.2 MWCNT 99 
8.2.3 Mixed CNT 101 
8.3 Dependence of Conductance on Different Parameters of CNT Bundle 102 
8.4 Dependence of Inductance and Capacitance on Different Parameters of CNT Bundle 103 
8.4.1 CNT Bundle Inductance 103 
8.4.2 CNT Bundle Capacitance 105 
8.5 RLC Parameter Comparison of CNT and Copper Interconnects 106 
8.6 Comparison of CNFET and CMOS Driver with CNT and Cu Interconnects 108 
8.7 Performance Comparison of FPGA Routing Fabric 111 
8.8 Summary 115 
CHAPTER 9 DTMOS BASED LOW POWER HIGH SPEED FPGA INTERCONNECTS 116 
9.1 Introduction 116 
9.2 Different Configuration of DTMOS Transistor / Switch 117 
9.2.1 Basic DTMOS 118 
9.2.2 DTMOS with Augmenting Transistor 118 
9.2.3 DTMOS with Limiting Transistor 118 
9.2.4 DTMOS with Augmenting Fixed Reference Voltage Transistor 118 
9.2.5 DTMOS with Augmenting Fixed Reference Standard Threshold Voltage Transistor 119 
9.2.6 DTMOS with Augmenting Fixed Reference High Threshold Voltage Transistor 119 
9.3 Performance Comparison 119 
9.3.1 Selection of Vref for SVT-DTMOS 119 
9.3.2 I-V Characteristics 120 
9.3.3 Effect of Temperature 121 
9.4 Target Interconnect Resources 123 
9.4.1 Simulation Strategies (Leakage) 123 
9.4.2 Driving Capability and Power Consumption 125 
9.5 Performance Comparison of Realistic Interconnects 127 
9.5.1 Double Interconnect 127 
9.5.2 Hex Interconnect 130 
9.5.3 Long Interconnect 132 
9.6 Summary 134 
CHAPTER 10 SUMMARY AND CONCLUSIONS 136 
10.1 Introduction 136 
10.2 Summary 136 
10.3 Conclusions 141 
10.4 Achievements 143 
10.5 Areas of Future Research 144 
APPENDIX A LIST OF PUBLICATIONS 147 
REFERENCES 151 
LIST OF TABLES 
Table No. Illustration Page No. 
3.1 Major interconnect present in the switch matrix 
4.1 Standby leakage (pA) of 8T CMOS and DTCMOS cell with respect to temperature and VDD 46 
4.2 Comparison of write delay of 8T CMOS and DTCMOS cell with respect to VDD 49 
5.1 Leakage power (nW) comparison of CNFET and bulk FO4 
6.1 Leakage power of 4:1 decoded multiplexer 
6.2 Leakage power of 4:1 encoded multiplexer 84 
6.3 Leakage power of 16:1 hybrid multiplexer 
7.1 Leakage power (nW) at different temperatures 
7.2 Write margin (mV) at different N of CNFET cell 
8.1 Local CNT interconnects parameters 
8.2 Local Cu interconnects parameters 
8.3 Intermediate Cu interconnects parameters 
8.4 Intermediate CNT interconnects parameters 
8.5 Global Cu interconnects parameters 
8.6 Global CNT interconnects parameters 
8.7 Intermediate CNT interconnects optimized parameters 
8.8 Global CNT interconnects optimized parameters 
9.1 Target interconnect resources present in the switch matrix 
9.2 Leakage of different 2-input multiplexer schemes 
9.3 Power consumption of different 2-input multiplexer schemes 
9.4 Delay (ps) of different Double interconnect schemes 
9.5 Delay (ps) of different Hex interconnect schemes 
9.6 Delay (ps) of different Long interconnect schemes 
LIST OF FIGURES 
Figure No. Illustration Page No. 
2.1 Schematic of inverter with voltage and current waveforms 7 
2.2 Leakage mechanisms in an NMOS transistor 8 
2.3 A low voltage inverter driving a high voltage inverter 11 
2.4 Schematic of multi-threshold CMOS inverter 13 
2.5 Schematic of common sleep transistor 13 
2.6 Adaptive body biasing 15 
2.7 Stacking effect in two-input NAND gate 15 
2.8 Forced stacking concept 16 
2.9 Sleepy stack concept 16 
3.1 (a) Abstract view of FPGA (b) Basic logic element (BLE) 20 
3.2 (a) FPGA Architecture (b) Logic inside the SB 20 
3.3 Row-based FPGA architecture 22 
3.4 Sea-of-gate FPGA architecture 22 
3.5 Hierarchical FPGA architecture 23 
3.6 Virtex-II FPGA architecture 25 
3.7 CLB and logic slice 25 
3.8 Dynamic power breakdown in Xilinx Virtex-II 27 
3.9 Contribution of dynamic power by different Interconnect out of 60% 27 
3.10 Leakage power breakdown based on circuit types 29 
3.11 New programmable low-power FPGA routing switch 30 
3.12 Coarse-gated SRAM array with 16 SRAM cells 31 
3.13 Dual-VDD assignment 33 
4.1 Schematic of subthreshold DTNMOS and DTPMOS 38 
4.2 Transistor level circuit of 4-input LUT 38 
4.3 Delay variation with supply voltage for a 4-input LUT 39 
4.4 Power variation with supply voltage for a 4-input LUT 39 
4.5 PDP variation with supply voltage for a 4-input LUT 40 
4.6 Variation of delay with temperature for a 4-input LUT 42 
4.7 Variation of delay with threshold voltage for a 4-input LUT 42 
4.8 Delay variation with channel length for a 4-input LUT 43 
4.9 Variation of power dissipation with channel length for a 4-input LUT 43 
4.10 Variation of PDP with channel length for a 4-input LUT 44 
4.11 Schematic of 6T SRAM cell 45 
4.12 Schematic of 8T SRAM cell 45 
4.13 Standby leakage of 8T cell with respect to VDD at T=75°C 47 
4.14 Standby leakage of 8T cell with respect to T at VDD=200mV 47 
4.15 Read SNM of 8T CMOS cell vs. VDD at different temperatures 48 
4.16 Read SNM of 8T DTCMOS cell vs. VDD at different temperatures 48 
4.17 Transistor level schematic of CMOS static full adder 49 
4.18 Delay variation with supply voltage for a static CMOS adder 50 
4.19 Power variation with supply voltage for a static CMOS adder 50 
4.20 PDP variation with supply voltage for a static CMOS adder 51 
4.21 Delay variation with temperature for a static CMOS adder 51 
4.22 Delay with threshold voltage variation for a static CMOS adder 52 
4.23 PDP variation with channel length for a static CMOS adder 53 
5.1 (a) Layers of sp2 bonded graphene and (b) C-60 molecule 55 
5.2 Schematic representation of chiral vector in the crystal lattice of CNTs 57 
5.3 Schematic of MWNT production using arc discharge 59 
5.4 Schematic of SWNT production using arc discharge 59 
5.5 Schematic of SWNT production using laser ablation 60 
5.6 Schematic of chemical vapor deposition process 61 
5.7 Schematic of CNFET 62 
5.8 Schematic of SB-CNFET 63 
5.9 Schematic of MOS-CNFET 63 
5.10 Three-level hierarchy of CNFET (a) Level 1 model, (b) Level 2 model and (c) Level 3 model 65 
5.11 Equivalent circuit for the level 1 model (L1) 66 
5.12 Electrostatic capacitor model to calculate the channel surface potential change ΔΦB 67 
5.13 Equation solver circuit used to implicitly solve for ΔΦB 68 
5.14 Schematic of N-channel CNFET 70 
5.15 Ids vs. Vds of N-CNFET for (13, 0) chirality 70 
5.16 Ids vs. Vds of N-CNFET for (19, 0) chirality 71 
5.17 Ids vs. Vds of N-CNFET for (26, 0) chirality 71 
5.18 Ids vs. Vgs of N-CNFET for different chiralities 72 
5.19 Frequency of 5-stage ring oscillator 73 
5.20 Power consumption of 5-stage ring oscillator 73 
5.21 Delay vs. supply voltage of FO4 inverter 74 
5.22 Power vs. supply voltage of FO4 inverter 74 
5.23 PDP vs. supply voltage of FO4 inverter 75 
5.24 Delay vs. supply voltage of BLE 76 
5.25 Power vs. supply voltage of BLE 76 
5.26 PDP vs. supply voltage of BLE 76 
6.1 4:1 Decoded multiplexer 
6.2 4:1 Encoded multiplexer 
6.3 16:1 Hybrid multiplexer 
7.1 CNFET cross-section and related parameters 88 
7.2 Ids vs. S for different (t / r) ratios for N-CNFET 89 
7.3 Schematic of a 6T SRAM cell 90 
7.4 Read SNM vs. supply voltage 91 
7.5 Temperature vs. read SNM 92 
7.6 Write delay vs. number of tubes / CNFET 93 
7.7 Write power vs. number of tubes / CNFET 93 
7.8 Write delay vs. supply voltage 93 
7.9 Write power vs. supply voltage 94 
7.10 Read SNM vs. channel length of MDl transistor 95 
8.1 (a) Single-wall carbon nanotubes (b) Multi-wall carbon nanotubes 97 
8.2 Isolated CNT with diameter 'd' over a ground plane at a distance 'y' 98 
8.3 Equivalent circuit model of ideally contacted SWCNT 99 
8.4 Bundle conductance vs. process parameters (D, Ji? & r) 102 
8.5 Inductance vs. bundle width (W) 103 
8.6 Kinetic inductance vs. process parameters (D, t/i? & r) 104 
8.7 Kinetic inductance vs. average diameters 105 
8.8 Quantum capacitance (CQ) of bundle 105 
8.9 CNT bundle circuit model for local interconnects 108 
8.10 CNT bundle circuit model for intermediate and global interconnects 109 
8.11 Cu wire circuit model for local, intermediate and global interconnects 109 
8.12 Test setup for simulation 109 
8.13 Delay of different driver-interconnect 110 
8.14 Power of different driver-interconnect 111 
8.15 Power delay product of different driver-interconnect 111 
8.16 Island-style FPGA architecture 112 
8.17 Baseline test platform. 112 
8.18 Energy vs. VDD for Hex interconnect 114 
8.19 Delay vs. VDD for Hex interconnect 114 
8.20 EDP vs. VDD for Hex interconnect 114 
8.21 EDP vs. VDD for Long interconnect 115 
9.1 Different configurations of DTMOS transistor 117 
(a) Basic DTMOS (b) DTMOS with augmenting transistor 
(c) DTMOS with limiting transistor and 
(d) DTMOS with augmenting fixed reference voltage transistor 
9.2 Ids vs. Vds of SVT- DTMOS transistor 120 
9.3 Ids vs. Vds of SVT- DTMOS transistor at T=75°C 120 
9.4 Ids vs. Vds for different transistor schemes 121 
9.5 Ids vs. Vgs for different transistor schemes 121 
9.6 Ids vs. Vgs for different transistor schemes at T=75°C 122 
9.7 Ids vs. Vds (at Vgs=0V) for different transistor schemes at T=75°C 122 
9.8 Transistor level view of 2-input multiplexer 124 
(a) Conventional and (b) DTMOS based 
9.9 Average leakage of different 2-input multiplexer schemes 125 
9.10 Delay of different 2-input multiplexer schemes 126 
9.11 PDP of different 2-input multiplexer schemes 126 
9.12 Simulation set-up for different interconnects 127 
9.13 Delay of different Double interconnect schemes at VDD=0.9V and 0.6 V 128 
9.14 PDP of different Double interconnect schemes at VDD=0.9V 129 
9.15 PDP of different Double interconnect schemes at VDD=0.6V 129 
9.16 Leakage power of Double interconnect schemes at different VDD 130 
9.17 Leakage power of Hex interconnect schemes at different VDD 131 
9.18 PDP of different Hex interconnect schemes at VDD=0.9V 132 
9.19 PDP of different Hex interconnect schemes at VDD=0.6V 132 
9.20 Leakage power of Long interconnect schemes at different VDD 133 
9.21 PDP of different Long interconnect schemes at VDD=0.9V 133 
9.22 PDP of different Long interconnect schemes at VDD=0.6V 134 
LIST OF SYMBOLS 
Symbol    Illustration 
(n, m)    Chiral vector 
a    Lattice constant 
a1, a2    Unit vectors for the graphene hexagonal structure 
ac-c    Nearest neighbor distance between C-C bonds 
Cdm    Depletion layer capacitance 
CE    Electrostatic capacitance 
Cox    Gate oxide capacitance 
CQ    Quantum capacitance 
Csub    Capacitance between the channel and substrate 
d    Diameter of a CNT 
D    Tube density 
Eg    Energy gap 
gm    Transconductance 
h    Distance between the gate and the CNT center 
Hox    Gate dielectric thickness between the CNT center and gate 
IOFF    Transistor off current 
ION    Transistor on current 
kB    Boltzmann's constant 
km    Wavenumber of the mth subband 
L    Number of substates 
M    Number of subbands 
q    Elementary charge 
r    Probability of CNT metallic 
S    Distance between the centers of two adjacent parallel CNTs 
T    Temperature 
Tox    Oxide thickness 
tr    Rise time 
VDD    Supply voltage 
Vox    Potential drop across thin oxide layer 
Vppπ    Carbon-carbon (C-C) tight binding overlap energy 
Vth    Threshold voltage 
βCc    Coupling capacitances from the channel to the source and drain 
γ    Body effect coefficient 
ΔΦB    Change in the channel surface potential 
θ    Chirality angle 
ρs    Source resistance per unit length of doped CNT 
Φf    Substrate Fermi potential 
Φox    Barrier height of tunneling particles 
Vth0    Zero bias threshold voltage 
η    DIBL effect coefficient 
VT    Thermal voltage 
μ0    Zero bias mobility 
μeff    Effective carrier mobility 
εs    Relative permittivity of silicon 
α    Activity factor 
Ci    Capacitance of node i on the critical path 
CL    Load capacitance 
Cs    Shell to shell capacitance 
gCNT    Transconductance per CNT 
k1    Dielectric constant of the gate oxide (HfO2) 
k2    Dielectric constant of insulating bulk oxide 
LIST OF ABBREVIATIONS 
ASICs    Application Specific Integrated Circuits 
AT    Augmenting Transistor 
BL    Bitline 
BLB    Bitline Bar 
BLE    Basic Logic Element 
BPTM    Berkeley Predictive Technology Model 
CR    Cell Ratio 
CAD    Computer Aided Design 
CB    Connection Boxes 
Cins    Insulator Capacitance 
CLB    Configurable Logic Block 
CNFET    Carbon Nanotube Field Effect Transistor 
CNIA    Carbon Nanotubes Interconnect Analyzer 
CNT    Carbon Nanotube 
Cu    Copper 
CVD    Chemical Vapor Deposition 
Din    Minimum (inner) Shell Diameter 
Dout    Maximum (outer) Shell Diameter 
DTNMOS    Dynamic Threshold NMOS 
DTPMOS    Dynamic Threshold PMOS 
DT-Sub    Dynamic Threshold-Subthreshold 
EDT    Edge Direct-Tunneling 
EDP    Energy Delay Product 
FBB    Forward Body Bias 
FPGAs    Field Programmable Gate Arrays 
GIDL    Gate-Induced Drain Leakage 
HVT    High Threshold Voltage 
I/O    Input / Output 
Ipeak    Peak Current 
Isub    Subthreshold Leakage 
ITRS    International Technology Roadmap for Semiconductors 
IXbar    Input Crossbar 
LBundle    Bundle Inductance 
Lk    Kinetic Inductance 
Lm    Magnetic Inductance 
LT    Limiting Transistor 
LUT    Lookup Table 
MFP    Mean Free Path 
MOS-CNFET    MOSFET-Like CNFET 
MOSFET    Metal Oxide Semiconductor Field Effect Transistor 
MTCMOS    Multi-Threshold CMOS 
MWCNT    Multi-Wall Carbon Nanotube 
nCNT    Number of CNTs in Bundle 
Nsh    Number of Shells 
OXbar    Output Crossbar 
PR    Pull-Up Ratio 
PDP    Power Delay Product 
RBB    Reverse Body Bias 
RBL    Read Bitline 
RQ    Quantum Resistance 
RSWCNT    Resistance of CNT 
RWL    Read Wordline 
SB    Switch Blocks 
SB-CNFET    Schottky Barrier CNFET 
SNM    Static Noise Margin 
SRAM    Static Random Access Memory 
Sub    Subthreshold 
Sub-DTMOS    Subthreshold Dynamic Threshold MOS 
Sub-MOS    Subthreshold-MOS 
SVT    Standard Threshold Voltage 
SWCNT    Single Wall Carbon Nanotube 
VLS    Vapor-Liquid-Solid 
Vref    Reference Voltage 
VVD    Virtual VDD 
WM    Write Margin 

Chapter 1 
INTRODUCTION 
1.1 Field Programmable Gate Arrays 
The continuous improvement in the speed and density of field-programmable gate 
arrays (FPGAs), together with their lower non-recurring engineering (NRE) cost, makes 
them a viable alternative to custom application specific integrated circuits (ASICs) for 
digital system implementation [1]-[2]. The cost of fabricating an ASIC is rising 
exponentially in deep submicron technologies, and it will become difficult to afford the 
cost of ASIC based portable products in the future. At present, however, FPGAs cannot be 
used in place of ASICs in portable applications because of their high power consumption 
[3]-[5]. This work investigates different techniques for reducing the power consumption 
of FPGAs so that they can also be employed in portable systems. FPGAs are programmable 
logic devices (PLDs) that can be configured by the end user to implement virtually any 
digital system. Commercial FPGAs, such as Stratix from Altera and Virtex from Xilinx, 
have on-chip memory blocks, DSP resources, and tiles for I/O and clock management apart 
from the programmable logic, making them even more attractive for implementing a 
complete system on chip [6]-[7]. 
An FPGA consists of an array of programmable logic blocks that are connected 
through a programmable interconnection network. The majority of FPGAs provide 
programmable logic using lookup tables (LUTs). An individual k-input lookup table 
(k-LUT) is capable of implementing any k-input combinational logic function. In order to 
support sequential logic, flip flops are placed at the LUT output. This combination is 
referred to as a basic logic element (BLE) [8]. In most modern FPGAs, BLEs are 
grouped together in larger blocks called configurable logic blocks (CLBs) and are 
configured using static random access memory (SRAM) elements. 
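The LUT and BLE described above can be sketched in a few lines of Python. This is only a behavioral illustration, not a model of any commercial device: the class names LUT and BLE, the helper program_lut, and the choice of the first input as the most significant address bit are all assumptions made for the example. The 2^k configuration bits stand in for the SRAM cells that hold the truth table.

```python
from itertools import product


class LUT:
    """Behavioral k-input lookup table: one configuration bit per input combination."""

    def __init__(self, k, config_bits):
        if len(config_bits) != 2 ** k:
            raise ValueError("a k-input LUT needs exactly 2**k configuration bits")
        self.k = k
        self.config_bits = list(config_bits)  # stands in for the SRAM cells

    def evaluate(self, inputs):
        # Assumption: inputs[0] is the most significant address bit.
        address = 0
        for bit in inputs:
            address = (address << 1) | (bit & 1)
        return self.config_bits[address]


def program_lut(k, func):
    """Fill the configuration bits from any k-input Boolean function."""
    bits = [func(*combo) & 1 for combo in product((0, 1), repeat=k)]
    return LUT(k, bits)


class BLE:
    """LUT followed by an optional flip flop (hypothetical minimal model)."""

    def __init__(self, lut, use_flip_flop=False):
        self.lut = lut
        self.use_flip_flop = use_flip_flop
        self.state = 0  # flip flop content, reset to 0

    def clock(self, inputs):
        """Apply inputs for one clock cycle and return the BLE output."""
        lut_out = self.lut.evaluate(inputs)
        if not self.use_flip_flop:
            return lut_out          # combinational path
        registered = self.state     # value visible during this cycle
        self.state = lut_out        # captured at the clock edge
        return registered


# Usage: a 4-input XOR needs 16 configuration bits.
xor4 = program_lut(4, lambda a, b, c, d: a ^ b ^ c ^ d)
print(len(xor4.config_bits), xor4.evaluate([1, 0, 1, 1]))
```

The sketch also shows why programmability costs transistors: a 4-input function always occupies 16 storage cells plus the selection logic, whatever the function is.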
Connectivity between logic blocks is achieved through programmable routing 
resources. These resources are made up of metal tracks arranged in channels running 
vertically and horizontally across the FPGA. A channel is made up of a number of 
tracks; a track is made up of wire segments of fixed length. Wires are connected to each 
other using switch blocks, and the logic block inputs and outputs are connected to these 
wires using connection blocks. 
1.2 Motivation 
FPGAs are a popular choice for digital system implementation because of their 
growing density, speed, short design cycle, and steadily decreasing cost. Power 
optimization has attracted increased attention due to the rapid growth of personal 
wireless communication, battery powered devices and portable digital applications. 
Compared to ASIC chips, FPGA chips are generally perceived as power inefficient, 
because they use a larger number of transistors to provide programmability [2]. In 
addition, a large percentage of the FPGA area remains unutilized in most 
configurations. As a result, the power dissipation of an FPGA device is significantly 
larger than that of its ASIC counterpart. Thus, the large power consumption of FPGAs has 
become a constraining factor for FPGA designs to enter mainstream low power 
portable applications [9]. No commercial FPGA vendor offers hardware or software 
specially targeted to low-power applications. The extent to which FPGA power can be 
optimized through CAD, architecture, or circuit level techniques has been an open 
question. Despite the relative weakness of FPGAs from the leakage power angle, not 
much research has been done on leakage power reduction in FPGAs. 
Prior research work has been mainly concerned with dynamic power 
consumption [10] and assumes leakage power to be a small component of the total 
power. However, these analyses are based on technologies at the 0.15 µm node or above, 
making them somewhat out of step with today's state-of-the-art FPGAs, which are 
fabricated in sub-100 nm technology. The ITRS roadmap indicates that as we move to 
smaller nodes, leakage will dominate the total power distribution [11]. Unlike an ASIC, an 
FPGA circuit implementation uses only a fraction of the FPGA resources. Leakage power 
is dissipated in both the used and the unused parts of the FPGA [12]. Therefore, the 
leakage power consumption is higher than the dynamic power consumption. Further, 
the programmability of an FPGA implies that more transistors are needed to implement a 
given logic circuit, in comparison with ASICs. Leakage power is proportional to the total 
transistor count, and consequently, leakage optimization will likely be a key design 
objective in future FPGA technologies. Reducing the power consumption of FPGAs is 
beneficial as it lowers packaging/cooling cost, improves reliability and enables FPGA 
usage in low-power portable applications. 
1.3 Contribution 
This work is aimed at exploring techniques at different levels of the design 
abstraction to reduce static and dynamic power consumption in different modules of 
commercial FPGAs. The work involves investigation of different modules of FPGAs 
namely Configurable logic block (CLB), Programmable interconnect, and Configuration 
memory at the circuit, logic and architectural levels so as to realize a more energy 
efficient FPGA. Several techniques like application of low leakage vectors and use of 
dynamic threshold MOSFET in place of conventional MOSFET have been applied to 
reduce the leakage and dynamic power consumption in an FPGA. 
Carbon nanotube based electronics is evolving rapidly. Researchers have come 
up with a new carbon nanotube based MOSFET (CNFET) which can be used 
effectively in place of a conventional MOSFET. Similarly, carbon nanotube (CNT) 
bundles have been proposed as a possible replacement for on-chip copper interconnects 
due to their large conductivity and current carrying capabilities. CNT bundles can 
provide substantially lower resistance than copper wires, especially for intermediate 
and global interconnect applications. The FPGA interconnects consume almost 60-70% 
of the total power consumption [13]-[16]. This work therefore investigates the potential 
of CNFETs together with CNT bundles as wires in the FPGA interconnect fabric and 
compares their performance to standard copper interconnect with MOSFETs to realize 
more energy efficient interconnects. 
This work also explores the potential of CNFET in place of MOSFET in 
designing more energy efficient configurable logic blocks and static random access 
memories for FPGA. The incorporation of carbon nanotube based electronics will go a 
long way to extend the battery life of FPGA based portable systems. 
1.4 Structure 
The core chapters of this thesis, from Chapter 4 to Chapter 9, are a collection of 
manuscripts published in various journals and conference proceedings. This thesis 
focuses on the reduction of static and dynamic power consumption of FPGA building 
blocks with optimization in propagation delay. The structure of this thesis is as follows: 
• Chapter 2 covers different sources of power consumption in nanoscale CMOS 
followed by a survey of the relevant published literature on low power design 
techniques with an emphasis on techniques for static power reduction. 
• Chapter 3 presents the detailed architecture of an FPGA. It also gives an 
overview of power dissipation in an FPGA and discusses previous work on 
power reduction in FPGAs. 
• Chapter 4 investigates ultra-low power operation of FPGA building blocks such 
as the 4-input lookup table (LUT), 1-bit full adder and SRAM cell. Different biasing 
schemes such as Subthreshold-MOS (Sub-MOS) and Subthreshold Dynamic 
Threshold MOS (Sub-DTMOS) are used and their performance is assessed 
under different process parameter variations. It was found that Sub-DTMOS has 
a lower PDP and lower sensitivity to process variations than Sub-MOS. 
Moreover, the PDP of the LUT and adder blocks can be further improved by using 
longer channel lengths. 
• Chapter 5 describes the basics of carbon nanotubes, different fabrication 
techniques and the modeling of the carbon nanotube field effect transistor 
(CNFET). The chapter further compares the performance of CNFET with bulk 
CMOS on benchmark circuits and explores the performance of a CNFET based 
basic logic element (BLE), a building block of FPGAs. 
• Chapter 6 presents leakage reduction of routing multiplexers through input 
vector control. A significant leakage power saving is achieved through 
manipulation of the configuration memory bits along with the multiplexer 
inputs for the unused multiplexers, contrary to earlier approaches in the 
literature. 
• Chapter 7 explores the performance of a low leakage CNFET based 6T SRAM 
cell and compares it with that of the conventional CMOS cell at a deep 
submicron 32 nm technology node. Due to inherent characteristics of CNFET, 
such as good gate controllability, drive current and immunity to short channel 
effects, the CNFET based cell outperforms the CMOS cell in terms of leakage 
power saving, write margin, speed and read SNM. The CNFET based SRAM 
cell also has a more stable SNM against temperature variation than the 
CMOS cell, and it is more robust against process parameter 
variations. Hence, CNFET holds a lot of promise as an alternative to the MOS 
transistor and offers significant improvements in performance over CMOS. 
• Chapter 8 introduces carbon nanotube (CNT) bundle interconnects and also 
models mixed CNT bundles as interconnects. This chapter also compares the 
performance of mixed CNT bundle interconnects with a CNFET driver against 
traditional copper interconnects with a CMOS driver. Due to the high speed of 
CNFET and the lower delay of mixed CNT bundle interconnects, the combination 
of mixed CNT bundle interconnects with a CNFET driver can be a very good 
alternative for future low energy FPGA routing. 
• Chapter 9 explains different configurations of the DTMOS transistor for realizing 
an energy efficient FPGA interconnect fabric. In DTMOS, the gate is tied to the 
body of the MOSFET. Hence, any variation of the gate potential induces the 
same variation on the body, thereby dynamically changing the threshold 
voltage. This chapter also proposes new DTMOS based multiplexer-type 
interconnects. Since the FPGA interconnect fabric has thousands of switches (inside 
the multiplexers and switch boxes), the overall improvement in power 
delay product (PDP) for the whole FPGA can be significant. The area overhead 
of the proposed switches and interconnects will be very small if the extra 
transistors needed by the DTMOS based switches are judiciously shared, which is easily 
possible in all multiplexer-based interconnects. 
• Chapter 10 presents conclusions and a summary of the thesis and suggests topics 
for future research. 
1.5 Summary 
Field programmable gate arrays are widely used to implement a variety of 
digital systems. FPGAs are cost effective and flexible, because the functions and 
interconnections of logic resources can be directly programmed by end users. Despite 
their design cost advantage, FPGAs impose a large power consumption overhead 
compared with custom silicon alternatives. The overhead increases packaging costs and 
limits the usage of FPGAs in portable systems. This thesis presents techniques and the use 
of alternative technology (CNFET transistors and bundles of CNTs as interconnects) to 
reduce the power consumption in different blocks of FPGAs such that they can also be 
employed in portable systems in place of expensive ASICs in deep submicron 
technology. The next chapter reviews the main sources of power consumption in CMOS 
technologies and the low power techniques available in the literature. 

Chapter 2 
LOW POWER TECHNIQUES 
2.1 Introduction 
This chapter presents sources of power dissipation in CMOS transistors with a 
special focus on those contributing to the static power consumption. Section 2.2 
reviews power dissipation in CMOS circuits, covering both dynamic and static 
power. Sections 2.3 and 2.4 give overviews of different power reduction techniques for 
the dynamic and leakage power of CMOS circuits respectively. 
2.2 Sources of Power Consumption in CMOS Circuits 
The sources of power consumption for CMOS integrated circuits are classified 
as dynamic/ switching power, short circuit power and static power. 
2.2.1 Dynamic/Switching Power 
The dynamic power consumption is the power consumed during charging and 
discharging of the capacitances associated with each circuit node [17]-[18]. The 
dynamic power (Pdy) is given by equation (2.1) 
$P_{dy} = \alpha F C V_{DD}^{2}$ (2.1) 
Where 'VDD' is the power supply voltage, 'F' is the clock frequency, 'C' is the 
switched capacitance, and 'α' is the switching factor (or activity factor). Equation (2.1) 
shows that 'Pdy' is a function of the square of the supply voltage and hence the most 
significant reduction in power can be achieved by reducing VDD. 
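As a quick numerical illustration of equation (2.1), the sketch below evaluates the dynamic power at a few supply voltages. The activity factor, frequency and capacitance are arbitrary example values chosen for the sketch, not data from this work; only the quadratic dependence on VDD is the point.

```python
def dynamic_power(alpha, freq_hz, cap_farads, vdd_volts):
    """Equation (2.1): P_dy = alpha * F * C * VDD^2, in watts."""
    return alpha * freq_hz * cap_farads * vdd_volts ** 2


# Illustrative values only (assumptions, not measurements).
ALPHA = 0.15      # switching activity factor
FREQ_HZ = 100e6   # clock frequency
CAP_F = 50e-12    # total switched capacitance

for vdd in (1.2, 0.9, 0.6):
    p_dy = dynamic_power(ALPHA, FREQ_HZ, CAP_F, vdd)
    print(f"VDD = {vdd:.1f} V -> P_dy = {p_dy * 1e3:.3f} mW")
```

Halving VDD cuts the dynamic power to one quarter when the other terms are held fixed, which is why supply scaling is usually the first lever considered.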
2.2.2 Short-Circuit Power 
The second component of power consumption is the short-circuit power. This 
power is consumed when both the pull-up and pull-down transistors are 
simultaneously conducting. Consider the simplest static complementary MOS (CMOS) 
inverter shown in Figure 2.1. When the NMOS transistor turns ON due to a rising 
waveform at the input, the PMOS transistor continues to conduct current until 
the input voltage becomes greater than VDD-|Vtp| (hence both transistors are ON 
simultaneously). Therefore, a direct current flows from supply to ground, which is called 
the short-circuit current [19]. The short-circuit current waveform can be approximated as a 
triangular wave. The total charge that flows in this period can be found by calculating 
the area of this triangle [20]. Let 'tr' denote the time for the input voltage to rise from 
Vtn to VDD-|Vtp|, where Vtn and Vtp are the threshold voltages of the NMOS and PMOS 
transistors respectively. Assuming symmetric high-to-low and low-to-high transitions 
for both the input and the output of the inverter gate, the total short-circuit power (Psc) for this 
gate is defined as 
$P_{sc} = \alpha\, t_r\, I_{peak}\, F\, V_{DD}$ (2.2) 
Where 'α' is the switching activity factor and Ipeak is the peak current per transistor 
width. 
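Equation (2.2) can be explored numerically with the sketch below. It assumes a linear input ramp from 0 to VDD, so that the time spent between Vtn and VDD-|Vtp| is a fixed fraction of the full ramp time; that ramp model, the function names and the example values are assumptions made for illustration only.

```python
def conduction_time(t_ramp, vdd, vtn, vtp_abs):
    """Time the input spends between Vtn and VDD - |Vtp| on a linear 0..VDD ramp.

    Returns 0 when VDD <= Vtn + |Vtp|, since the two transistors are then
    never ON at the same time.
    """
    window = max(0.0, vdd - vtp_abs - vtn)
    return t_ramp * window / vdd


def short_circuit_power(alpha, t_r, i_peak, freq_hz, vdd):
    """Equation (2.2): P_sc = alpha * t_r * I_peak * F * VDD, in watts."""
    return alpha * t_r * i_peak * freq_hz * vdd


# Illustrative values only (assumptions, not measurements).
t_r = conduction_time(t_ramp=100e-12, vdd=1.0, vtn=0.3, vtp_abs=0.3)
p_sc = short_circuit_power(alpha=0.15, t_r=t_r, i_peak=20e-6, freq_hz=100e6, vdd=1.0)
print(f"t_r = {t_r * 1e12:.1f} ps, P_sc = {p_sc * 1e9:.3f} nW")
```

Under this ramp model the short-circuit component vanishes once the supply drops below Vtn + |Vtp|, because the input never passes through a region where both devices conduct.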
Figure 2.1: Schematic of inverter with voltage and current waveforms 
2.2.3 Static Power 
The static power is defined as the power consumption originating from currents 
constantly flowing from VDD to ground. This means that even when the circuit is in 
idle mode (no activity), power continues to be dissipated. For long channel transistors 
with high threshold voltage, this type of dissipation was negligible. Unfortunately, 
present and future technologies will suffer from high static power, which could even 
exceed the dynamic power contribution in active mode. 
The shrinking geometries have led to different sources of leakage current. 
Figure 2.2 shows the different leakage current mechanisms in a short channel 
NMOS transistor [21]-[23], namely 
(a) Reverse bias p-n junction current and band to band tunneling (I1) 
(b) Subthreshold current (I2) 
(c) Gate leakage current (I3) 
(d) Gate current due to hot-carrier injection (I4) 
(e) Gate-Induced Drain Leakage (I5) 
(f) Punchthrough (I6) 
Figure 2.2: Leakage mechanisms in an NMOS transistor 
In the normal mode of MOS transistor operation, the drain/source to well junctions are 
reverse biased, causing pn junction leakage current (I1). This current arises 
from minority carrier diffusion/drift near the edge of the depletion region and 
from electron-hole pair generation in the depletion region of the reverse biased 
junction [24]-[25]. The magnitude of the diode's leakage current depends on the area of 
the drain diffusion and the leakage current density, which is in turn determined by the 
doping concentration. If both the n and p regions are heavily doped, band-to-band 
tunneling (BTBT) dominates the pn junction leakage [26]. 
The BTBT leakage current flows under a high electric field (>10^6 V/cm) across the 
reverse-biased pn junction: a significant current flows through the junction due to 
tunneling of electrons from the valence band of the p region to the conduction band of 
the n region. 
The subthreshold current (I2) originates from the diffusion of minority carriers 
in a non-conducting transistor, i.e. when Vgs < Vth. Under this condition, the MOS transistor 
is operating in weak inversion [27]-[28]. The potential difference between drain and 
source creates a flow of minority carriers on the surface of the channel. The 
subthreshold current has an exponential dependency on the threshold voltage (Vth). 
This is the reason why the low Vth characterizing recent technologies leads to a large 
subthreshold current. The value of Vth is fixed for a given technology but can be 
modulated through the body effect. The body effect appears when a potential difference is 
present between body (bulk) and source. This happens because bulk and source operate 
as a reverse biased p-n junction. By increasing the body potential in an NMOS or by 
decreasing it in a PMOS (forward biasing), the junction depletion reduces and so does the 
threshold voltage, leading to an increase in subthreshold leakage current. Similarly, a 
reduction of the body potential (lower than VSS for NMOS and higher than VDD for 
PMOS, called reverse biasing) increases the depletion charge and hence the threshold 
voltage, leading to reduced subthreshold leakage [29]-[30]. 
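The effect of body biasing on the threshold voltage can be illustrated with the standard textbook body-effect expression, Vth = Vth0 + γ(√(2Φf + Vsb) − √(2Φf)). This expression is not taken from the text above; it is assumed here as the usual first-order model for an NMOS device, and the parameter values are arbitrary examples.

```python
import math


def threshold_voltage(vth0, gamma, phi_f, v_sb):
    """First-order NMOS body effect (textbook model, assumed for illustration).

    vth0  : zero bias threshold voltage (V)
    gamma : body effect coefficient (V^0.5)
    phi_f : substrate Fermi potential (V)
    v_sb  : source-to-body voltage (V); negative means forward body bias
    """
    if 2 * phi_f + v_sb < 0:
        raise ValueError("model is not valid for such a strong forward bias")
    return vth0 + gamma * (math.sqrt(2 * phi_f + v_sb) - math.sqrt(2 * phi_f))


# Illustrative parameters only (assumptions).
VTH0, GAMMA, PHI_F = 0.30, 0.20, 0.35

for label, v_sb in (("forward bias", -0.3), ("zero bias", 0.0), ("reverse bias", 0.3)):
    vth = threshold_voltage(VTH0, GAMMA, PHI_F, v_sb)
    print(f"{label:12s} Vsb = {v_sb:+.1f} V -> Vth = {vth:.3f} V")
```

Raising the body above the source (negative Vsb) lowers Vth and therefore raises subthreshold leakage, while reverse bias does the opposite, matching the qualitative description above.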
The subthreshold leakage ' /sub' given by [31] is 
hub=juoCox (Ky.) e e{- ) ( l -e - ) (2.3) 
Where ' //o 'is the zero bias mobility, ' Vth' is the threshold voltage, ' F?-' is the thermal 
voltage, '«' is the subthreshold swing coefficient, 'Cox' is the gate oxide capacitance, 
and ' ^ //-g/r' is the width/ effective length of transistor respectively. 
It is clear from equation (2.3), that the reduction of threshold voltage 
exponentially increases the subthreshold leakage current. Similarly, decreasing Leff 
also increases the subthreshold leakage. 
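The exponential dependence on the threshold voltage can be illustrated numerically. The following is a minimal sketch, assuming the commonly cited form I_sub = I0 exp((Vgs - Vth)/(n V_T)) (1 - exp(-Vds/V_T)), in which the prefactor I0 lumps together the mobility, oxide capacitance and geometry terms. The values of I0, n and the bias points are hypothetical and chosen only for illustration.

```python
import math

V_T = 0.0259  # thermal voltage near room temperature (V)

def subthreshold_leakage(vgs, vds, vth, n=1.5, i0=1e-6):
    """Simplified subthreshold current (A).

    i0 is a hypothetical lumped prefactor standing in for
    mu0 * Cox * (W / Leff) * V_T**2 * e**1.8.
    """
    gate_term = math.exp((vgs - vth) / (n * V_T))
    drain_term = 1.0 - math.exp(-vds / V_T)
    return i0 * gate_term * drain_term

# OFF transistor (Vgs = 0) with the full supply across it (Vds = 1.0 V).
for vth in (0.4, 0.3, 0.2):
    print(f"Vth = {vth:.1f} V -> Isub = {subthreshold_leakage(0.0, 1.0, vth):.3e} A")
```

Under these assumptions, each 100 mV reduction in Vth multiplies the leakage by exp(0.1 / (n V_T)), which is more than a factor of ten for n = 1.5.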
The gate leakage (I3) is due to direct tunneling current that penetrates the gate 
insulator. Unlike subthreshold leakage, the gate leakage is present in both the OFF state 
and the ON state of MOS transistor, which makes gate leakage more difficult to control 
than subthreshold leakage [32]-[33]. In the ON state, the gate leakage is the sum of two 
components namely the gate to channel and the gate to source/drain extension overlap 
current, while in the OFF state, it is due to edge direct-tunneling current (EDT). Hence, 
the gate leakage strongly depends on the potential of transistor's gate, the gate oxide 
thickness (Tox), the gate oxide material (K) and the effective width of the transistor 
(Weff). The gate leakage expressed in [34] is given by equation 2.4. 
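$I_{gate} = W_{eff} L_{eff} A \left(\frac{V_{ox}}{T_{ox}}\right)^2 \exp\left(\frac{-B\left[1-\left(1-\frac{V_{ox}}{\phi_{ox}}\right)^{3/2}\right]}{V_{ox}/T_{ox}}\right)$ (2.4) 
$A = \frac{q^3}{16\pi^2 \hbar \phi_{ox}}, \quad B = \frac{4\sqrt{2m}\,\phi_{ox}^{3/2}}{3\hbar q}$ 
Where 'Vox' is the potential drop across the thin oxide layer, 'φox' is the barrier 
height for the tunneling particles, 'Tox' is the oxide thickness, and 'A' and 'B' are physical 
parameters. 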
In the overlapping zone between gate and drain, a high electric field exists, 
leading to the generation of current from drain to substrate. Consider an NMOS 
transistor; when a low gate potential is applied (Vg near zero volts), the holes 
accumulate at the surface and create a region which is more heavily p-doped than the 
substrate. If this happens while the drain is connected to a high potential (say VDD), 
the depletion layer near the drain becomes narrower. As a result, minority carriers are 
emitted in the drain region underneath the gate and pushed to the substrate due to the 
vertical electric field. Thinner oxide thickness and higher potential between gate and 
drain enhance the electric field and, therefore, increase Gate-Induced Drain Leakage 
(GIDL) [35]-[36]. 
In short-channel devices, due to the proximity of the drain and the source, the 
depletion regions at the drain-substrate and source-substrate junctions extend into the 
channel. As the channel length is reduced for a fixed doping level, the separation 
between the depletion region boundaries decreases. An increase in the reverse bias 
across the junctions (with increase in Vds) also pushes the junctions nearer to each 
other. The combination of channel length reduction and high reverse bias leads to the 
merging of the depletion regions, and then punchthrough occurs. 
2.3 Dynamic Power Reduction Techniques 
The following techniques can be used to reduce the dynamic power. 
2.3.1 Dual Power Supply 
Reducing the supply voltage, or voltage scaling, is the most effective technique 
for dynamic power reduction because dynamic power is proportional to the square of 
the supply voltage. This technique can significantly reduce dissipated power without 
degrading speed, by selectively applying a lower supply voltage along non-critical delay 
paths or for light workloads and a higher supply voltage for heavy workloads [37]-[38]. The 
main problem of using dual supply voltages in CMOS circuits is the increased 
leakage current in the high voltage gates when a low voltage gate drives them [39]-[40]. 
Figure 2.3 shows the case where a low supply voltage (VDDL) inverter drives a 
high supply voltage (VDDH) inverter. To overcome the problem of increased leakage 
current, an additional level converter is required, which incurs an area and power penalty. 
Figure 2.3: A low voltage inverter driving a high voltage inverter (labels: VDDL, VDDH, Vin, Vout, static current) 
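The quadratic dependence of dynamic power on supply voltage can be made concrete with a short calculation. The sketch below assumes the usual switching power expression P = a C VDD^2 f, where the activity factor, capacitance and frequency values are hypothetical; only the ratio between the two supply voltages matters for the saving.

```python
def dynamic_power(activity, capacitance_f, vdd, frequency_hz):
    """Switching power (W) of a node: activity * C * VDD^2 * f."""
    return activity * capacitance_f * vdd ** 2 * frequency_hz

def saving_from_voltage_scaling(vdd_high, vdd_low):
    """Fraction of dynamic power saved when a gate moves from vdd_high to vdd_low."""
    return 1.0 - (vdd_low / vdd_high) ** 2

# Hypothetical gate: 10 fF load, 20% activity, 100 MHz clock.
p_high = dynamic_power(0.2, 10e-15, 1.5, 100e6)
p_low = dynamic_power(0.2, 10e-15, 1.0, 100e6)
print(f"P at 1.5 V: {p_high:.3e} W, P at 1.0 V: {p_low:.3e} W")
print(f"Saving: {saving_from_voltage_scaling(1.5, 1.0):.1%}")
```

Under these assumptions, moving a non-critical gate from 1.5 V to 1.0 V removes more than half of its switching power, which is why the level converter overhead is often accepted.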
2.3.2 Gate Sizing 
Non-critical paths have timing slack and the delays of some gates on these paths 
can be increased without affecting the performance. Since the lengths of devices 
(transistors) in a gate are usually minimal for a high speed application, the gate delay 
can be increased by reducing the device width; as a result, the dynamic power 
decreases accordingly (this is due to the smaller loading capacitance 'CL', which is proportional to 
the device size). Gate sizing is a technique that determines device widths for gates [41]-[44]. 
Traditional gate sizing approaches use Elmore delay models in a polynomial 
formulation. Heuristic-based greedy approaches can be used to solve such 
polynomials. 
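A greedy sizing loop of the kind mentioned above can be sketched as follows. This is only an illustration under strong simplifying assumptions and is not the Elmore-based formulation of [41]-[44]: a single path is considered, the delay of gate i is taken as base_delay_i / width_i, and the sum of widths is used as a proxy for switched capacitance. The function name and all numbers are hypothetical.

```python
def greedy_downsize(base_delays, widths, max_delay, step=0.5, min_width=1.0):
    """Shrink gate widths on one path while the path delay stays within max_delay.

    Assumed delay model: delay_i = base_delays[i] / widths[i].
    Every step removes the same width, so the gate whose shrinking adds
    the least delay is chosen each time.
    """
    widths = list(widths)

    def path_delay(ws):
        return sum(d / w for d, w in zip(base_delays, ws))

    while True:
        current = path_delay(widths)
        best = None  # (delay penalty, gate index, new width)
        for i, w in enumerate(widths):
            new_w = w - step
            if new_w < min_width - 1e-12:
                continue  # gate already at minimum width
            trial = widths[:i] + [new_w] + widths[i + 1:]
            delay = path_delay(trial)
            if delay <= max_delay:
                penalty = delay - current
                if best is None or penalty < best[0]:
                    best = (penalty, i, new_w)
        if best is None:
            return widths  # no further move fits in the timing slack
        widths[best[1]] = best[2]

sized = greedy_downsize(base_delays=[1.0, 4.0], widths=[2.0, 2.0], max_delay=3.5)
print(sized)
```

The loop stops when no single width reduction fits in the remaining slack, so the result is a local optimum rather than a guaranteed global one.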
2.3.3 Transistor Sizing 
The basic concept of transistor sizing is exactly the same as that of gate sizing, 
except that in gate sizing all the transistors in one gate are sized together by the same 
factor, whereas in transistor sizing each transistor can be sized independently. Transistor 
sizing [45]-[47] therefore explores the maximum possible optimization space for power 
reduction without much performance degradation. 
2.4 Leakage Power Reduction Techniques 
The following techniques can be used to reduce the leakage power. 
2.4.1 Power Gating and Multi-Threshold Voltage 
The most natural way of lowering the leakage power dissipation of a CMOS 
circuit in the standby mode is to turn off its supply voltage. This can be done by using 
one PMOS and one NMOS transistor in series with the transistors of the logic block [48]; 
these transistors are also called sleep transistors. The NMOS insertion scheme is 
preferable, since the NMOS on-resistance is smaller for the same width; therefore, it 
can be sized smaller than the corresponding PMOS. The insertion of these transistors 
creates a virtual ground and a virtual power supply rail as depicted in Figure 2.4. In the 
ACTIVE mode, the sleep transistor is ON, therefore, the circuit functions as usual. In 
the STANDBY mode, the sleep transistor is turned OFF, which disconnects the circuit 
from the ground and therefore helps in leakage reduction. Note that to lower the 
leakage, the threshold voltage of the sleep transistor must be large, otherwise, the sleep 
transistor will have a high leakage current which makes power gating less effective. In 
practice, dual Vth CMOS or Multi-Threshold CMOS (MTCMOS) is used for power 
gating [49]-[50]. In these technologies, there are several types of transistors with 
different Vth values. Transistors with a low Vth are used to implement the logic, while 
high Vth devices are used as sleep transistors. 
To guarantee proper functionality of the circuit, the sleep transistor has to be 
carefully sized to decrease its voltage drop when it is ON. The voltage drop across the 
sleep transistor decreases the effective supply voltage of the logic gate. Also, it 
increases the threshold voltage of the pull down transistors due to body effect. This 
increases high-to-low transition delay of the circuit. This problem can be solved by 
using a large sleep transistor. On the other hand, using a large sleep transistor increases 
the area overhead and consumes more dynamic power. Since using one transistor for 
each logic gate results in a large area and power overhead, one transistor may be used 
for a group of gates as depicted in Figure 2.5. Notice that the size of the sleep transistor 
in this case ought to be larger than the one used in Figure 2.4. To find the optimum size 
of the sleep transistor, it is necessary to search for the vector that causes the worst case 
delay in the circuit. This requires simulating the circuit under all possible input values, 
a task that is not possible for large circuits. Power gating is a very effective method for 
decreasing the leakage power. However, it suffers from the following demerits: 
1. It requires modification in the CMOS process technology to support both a 
high Vth device (for sleep transistor) and a low Vth device (for logic gates). 
2. It decreases the voltage swing and hence the DC noise margin. 
3. Sleep transistor sizing is a non-trivial task and requires much effort. 
Figure 2.4: Schematic of multi-threshold CMOS inverter (labels: Sleep, In, Out, Virtual VDD, Virtual Ground) 
Figure 2.5: Schematic of common sleep transistor (labels: VDD, Gate 1, Gate 2, Sleep, Virtual Ground) 
2.4.2 Adaptive Body Bias 
One of the methods for decreasing the leakage current is using reverse body 
bias (RBB), to increase the threshold voltage of transistors in the STANDBY state. The 
threshold voltage of a transistor is given by the following standard expression, 
$V_{th} = V_{th0} + \gamma\left(\sqrt{|2\phi_F - V_{BS}|} - \sqrt{|2\phi_F|}\right)$ (2.5) 
Where 'Vth0' is the threshold voltage for body to source voltage 'VBS' = 0, 
'φF' is the substrate Fermi potential and the parameter 'γ' is the body-effect 
coefficient. As can be seen from equation (2.5), reverse biasing a transistor increases its 
threshold voltage. 
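The effect of body bias on the threshold voltage can be checked with a few sample values. The sketch below assumes the standard NMOS body-effect expression written in terms of the body-to-source voltage, Vth = Vth0 + γ(sqrt(|2φF - VBS|) - sqrt(|2φF|)); the values of Vth0, γ and φF are hypothetical and not taken from any particular process.

```python
import math

def threshold_voltage(v_bs, vth0=0.40, gamma=0.40, phi_f=0.30):
    """NMOS threshold voltage (V) under body bias.

    v_bs < 0 is reverse body bias (RBB), v_bs > 0 is forward body bias (FBB).
    vth0, gamma and phi_f are hypothetical illustrative values.
    """
    return vth0 + gamma * (math.sqrt(abs(2 * phi_f - v_bs)) - math.sqrt(abs(2 * phi_f)))

for v_bs in (-0.5, 0.0, 0.3):
    print(f"VBS = {v_bs:+.1f} V -> Vth = {threshold_voltage(v_bs):.3f} V")
```

With these assumed parameters, a reverse bias raises Vth above Vth0 and a forward bias lowers it, which is the behaviour that adaptive body biasing exploits in the standby and active states respectively.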
An effective approach to reduce subthreshold leakage involves dynamically 
changing the body bias of transistors [51]-[53]. This technique can be either applied at 
the chip level or at fine granularity. Typically, a block-level approach is preferred as it 
provides leakage power reduction whenever the functional block becomes idle, 
regardless of the operation of the rest of the chip. 
In this approach, a control loop is required to provide appropriate substrate bias 
based on the operational state of the functional block. A block schematic of this 
approach is shown in Figure 2.6. When the block enters the standby state, reverse body 
bias (RBB) is applied to increase the Vth of transistors, which decreases subthreshold 
leakage current. When the block returns to the active state, RBB is removed to 
decrease the Vth of transistors, thus restoring the nominal performance of 
transistors [54]. The key issue with this approach is that the range of Vth adjustment is 
limited, which in turn limits the amount of subthreshold leakage reduction. Thus, in 
general, this approach is less effective than utilizing sleep transistors. However, the 
advantage of this approach over the sleep transistors is that it can be implemented 
without incurring any delay penalty. This can be done by applying forward body bias 
(FBB) when the block is in the active state. Under FBB, the Vth of devices is lowered, 
which improves the performance. As a tradeoff, FBB increases the subthreshold 
leakage. It is also crucial to limit FBB to ensure that the source-bulk pn-junction 
remains in cut-off when FBB is applied. 
Some additional overheads of this approach include the area cost of the control 
loop and the energy cost of charging and discharging the large substrate capacitance (each 
time the block enters or leaves the standby mode). Similar to the case of sleep 
transistors, idle periods should be kept long enough to justify the power savings of this 
approach. 
Figure 2.6: Adaptive body biasing (labels: Control Loop; body bias above VDD in standby and below VDD in active mode; body bias below Gnd in standby and above Gnd in active mode) 
2.4.3 Transistor Stacks 
Subthreshold leakage current flowing through a stack of series-connected 
transistors reduces when more than one transistor in the stack is turned OFF. This effect 
is known as the stacking effect [55]-[57]. The stacking effect in a two-input NAND gate 
is shown in Figure 2.7. When both transistors M1 and M2 are turned OFF, the voltage at 
the intermediate node (VM) is positive. This is due to the small drain current flowing 
through M1 and M2. Due to the positive potential of VM, the gate-to-source voltage 
(Vgs) of M1 becomes negative; hence, the subthreshold current reduces substantially. 
Similarly, due to VM, the body-to-source potential (VBS) of M1 becomes negative, 
resulting in an increase in the threshold voltage of M1. This causes a further reduction in 
subthreshold leakage. By replacing a single OFF transistor 
with a stack of serially-connected OFF transistors, leakage can be significantly reduced. 
The disadvantage of this technique is that a stack of 
transistors causes either performance degradation or more dynamic power 
consumption. 
Figure 2.7: Stacking effect in two-input NAND gate (labels: VDD, Vout, M1, VM, M2) 
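The self-reverse-bias mechanism can be reproduced with a small numerical experiment. The sketch below assumes a simplified subthreshold model in which the threshold voltage falls with Vds (DIBL) and rises with source-to-body voltage; the intermediate node voltage VM is found by bisection as the point where the currents through M1 (top) and M2 (bottom) are equal. The model form and every parameter value are hypothetical and serve only to show the trend.

```python
import math

V_T = 0.0259  # thermal voltage near room temperature (V)

def subthreshold_current(vgs, vds, vsb, i0=1e-6, vth0=0.3, n=1.5,
                         dibl=0.1, body=0.2):
    """Simplified NMOS subthreshold current (A); all parameters hypothetical.

    Assumed threshold model: Vth = vth0 - dibl * vds + body * vsb.
    """
    vth = vth0 - dibl * vds + body * vsb
    return i0 * math.exp((vgs - vth) / (n * V_T)) * (1.0 - math.exp(-vds / V_T))

def stack_leakage(vdd, iterations=60):
    """Return (VM, current) for two OFF NMOS transistors in series."""
    lo, hi = 0.0, vdd
    for _ in range(iterations):
        vm = 0.5 * (lo + hi)
        i_top = subthreshold_current(-vm, vdd - vm, vm)  # M1: source sits at VM
        i_bottom = subthreshold_current(0.0, vm, 0.0)    # M2: source at ground
        if i_top > i_bottom:
            lo = vm  # more current enters the node than leaves: VM rises
        else:
            hi = vm
    vm = 0.5 * (lo + hi)
    return vm, subthreshold_current(0.0, vm, 0.0)

vdd = 1.0
single = subthreshold_current(0.0, vdd, 0.0)  # one OFF transistor across VDD
vm, stacked = stack_leakage(vdd)
print(f"VM = {vm * 1000:.1f} mV, single = {single:.3e} A, stacked = {stacked:.3e} A")
```

In this toy model VM settles at a few tens of millivolts, and the stacked leakage is well below that of the single OFF transistor because of the negative Vgs, the body effect and the reduced DIBL.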
2.4.4 Forced Transistor Stacking 
Stack effect or self-reverse bias effect is the phenomenon where leakage current 
decreases due to two or more series connected transistors turning OFF. Figure 2.7 
illustrates this concept. 
Figure 2.8: Forced Stacking Concept (a transistor of width W replaced by two series transistors of width W/2) 
A NAND gate is an example where a natural stack of transistors exists. As the 
depth of the stack is increased, a higher leakage power saving is observed. However, in 
certain circuits the natural stacking of transistors does not exist. To utilize the stacking 
effect in such a situation, a single transistor of width 'W' is replaced by two series-connected 
transistors, each of width 'W/2'; this is called forced stacking. Figure 2.8 illustrates 
the concept of forced stacking. Since the two transistors turn OFF at the same time, the 
stacking effect reduces subthreshold leakage current. As DIBL worsens with smaller 
process technology, the stack effect is expected to be even more effective with shrinking 
technology nodes. 
2.4.5 Sleepy Stack 
Figure 2.9: Sleepy Stack Concept (labels: W/2 stacked transistors, sleep transistor in parallel with one W/2 transistor) 
The second transistor stacking based approach is the sleepy stack [58] 
technique. In this technique, forced stacking is first implemented. Then to one of the 
stacked transistors, a sleep transistor is connected in parallel as shown in Figure 2.9. 
Thus, during active mode, sleep transistors are ON, thereby, reducing the effective 
resistance of the path (due to two parallel transistors). This leads to reduced 
propagation delay during active mode as compared to the forced stacking method. 
During standby mode, the sleep transistor is turned OFF and the stacked transistor 
suppresses the leakage. 
2.4.6 Long Channel Devices 
The active leakage of CMOS circuits can be reduced by increasing their 
transistor channel lengths [59]. This is because of the Vth roll-off caused by the Short 
Channel Effect (SCE): different threshold voltages can be achieved by using different 
channel lengths. A longer channel length gives a higher threshold voltage but also 
increases the gate capacitance; it therefore has a negative impact on 
performance and dynamic power dissipation. Long channel insertion has similar or 
lower process cost, since the cost is incurred as increased size rather than as additional masks [59]. In 
addition, different channel lengths track each other over process variation. The 
technique can be applied in a greedy manner to an existing design to limit the leakage 
current. A potential penalty is that the dynamic power dissipation of the up-sized gate 
increases proportionally with the effective channel length. Little net power is saved 
unless the activity factor of the affected gates is low. Therefore, the activity 
factor must be taken into account when choosing transistors whose channel lengths are 
to be increased. 
2.4.7 Optimal Standby Vectors 
Subthreshold leakage current depends on the vectors applied to the gate inputs 
because different vectors cause different transistors to be turned OFF. For example, a 2-
input NAND gate has the smallest subthreshold leakage due to the stacking effect when 
the input vector is '00'. When a circuit is in the standby mode, one should carefully 
choose an input vector so that the total leakage in the whole circuit gets minimized. 
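For a small circuit, the standby vector can be found by exhaustive enumeration. The sketch below uses a hypothetical two-gate circuit, g1 = NAND(a, b) followed by g2 = NAND(g1, c), and a hypothetical per-gate leakage table in which the '00' input has the lowest leakage because of the stacking effect. Neither the circuit nor the leakage values come from the cited works.

```python
from itertools import product

# Hypothetical leakage (nA) of a 2-input NAND gate for each input pair.
NAND2_LEAKAGE = {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 2.5, (1, 1): 6.0}

def nand(a, b):
    return 0 if (a and b) else 1

def circuit_leakage(a, b, c):
    """Total leakage of g1 = NAND(a, b) feeding g2 = NAND(g1, c)."""
    g1 = nand(a, b)
    return NAND2_LEAKAGE[(a, b)] + NAND2_LEAKAGE[(g1, c)]

def best_standby_vector():
    """Try all 2^3 primary input vectors and return the lowest-leakage one."""
    return min(product((0, 1), repeat=3), key=lambda v: circuit_leakage(*v))

vector = best_standby_vector()
print(vector, circuit_leakage(*vector))
```

The search space doubles with every primary input, so for large circuits heuristic or probabilistic searches are needed instead of full enumeration.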
2.5 Summary 
This chapter has introduced the field of low power design. Various techniques 
to reduce power consumption at the transistor level are described. Power dissipation in 
CMOS circuits can be either dynamic or static (leakage). Unlike dynamic power 
consumption, the leakage current does not depend on the switching activity. It only 
depends on the number of transistors on the chip. The dynamic power has dominated 
power consumption in CMOS circuits; however, technology scaling trends have 
resulted in leakage becoming a dominant component of total power. The next chapter 
covers the architecture of FPGA and also reviews the main sources of power 
consumption in FPGA building blocks. 

Chapter 3 
POWER REDUCTION TECHNIQUES FOR FPGAS 
This chapter presents background information on FPGAs along with 
power reduction techniques for FPGAs. The first half of the chapter presents an 
overview of FPGA architecture with a breakdown of dynamic and leakage power, and 
the second half focuses on different power reduction techniques for the basic 
building blocks of an FPGA. 
3.1 Overview of FPGA Architecture 
Although there are many different FPGA architectures commercially available, 
they are all built from the same functional components: an array of 
programmable logic blocks, programmable interconnect and I/O blocks around the 
perimeter, as shown in Figure 3.1(a) [60]-[62]. The reconfigurable elements allow an 
FPGA to be programmed to implement virtually any digital logic function. The 
majority of FPGAs provide programmable logic using lookup tables (LUTs) [63]-[65]. 
An individual k-input lookup table, or k-LUT, is capable of implementing any k-input 
combinational logic function. In order to support sequential logic, a flip-flop is placed at 
the LUT output. This combination is referred to as a basic logic element (BLE), shown 
in Figure 3.1(b). In most modern FPGAs, BLEs are grouped together in larger blocks 
called configurable logic blocks (CLBs) and are configured using SRAM memory cells. 
The implementation of FPGAs in silicon falls into three groups: SRAM-programmed, 
antifuse-programmed, and EPROM-programmed. The configurable logic blocks in 
different implementations are very similar. The primary difference in various 
implementations is in the programmable routing architecture and the way it is 
configured. Due to the immense popularity of SRAM-based FPGAs, only this type is 
considered in this thesis. Figure 3.2 presents a typical island-style FPGA architecture in 
which the connectivity between logic blocks is achieved through programmable routing 
resources. Connection boxes, shown in Figure 3.2, are adjacent to the CLBs on all four 
sides. They provide programmable connections between CLBs and the routing tracks; 
therefore, each input pin of CLB is able to connect to a certain percentage of tracks in 
the adjacent channel. Switch boxes (SB) exist at the intersection of every horizontal and 
vertical channel. They provide programmable connections between the horizontal and 
the vertical channels, as well as horizontal-to-horizontal (and vertical-to-vertical) 
connectivity at the end of a segment. An SB is implemented using a multiplexer, a buffer and 
configurable SRAM cells, as shown in Figure 3.2(b). 
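The behaviour of a k-LUT and a BLE can be captured in a short functional model. The sketch below is a behavioural illustration only, not a description of any vendor's circuit: the LUT is a table of 2^k configuration bits (standing in for the SRAM cells) indexed by the input values, and the BLE optionally registers the LUT output in a flip-flop. The class and method names are hypothetical.

```python
class LUT:
    """k-input lookup table holding 2**k configuration bits."""

    def __init__(self, k, config_bits):
        if len(config_bits) != 2 ** k:
            raise ValueError("a k-LUT needs exactly 2**k configuration bits")
        self.k = k
        self.bits = list(config_bits)

    def evaluate(self, inputs):
        if len(inputs) != self.k:
            raise ValueError("wrong number of inputs")
        index = 0
        for bit in inputs:  # inputs[0] is the most significant address bit
            index = (index << 1) | (1 if bit else 0)
        return self.bits[index]


class BLE:
    """Basic logic element: a LUT followed by an optional D flip-flop."""

    def __init__(self, lut, registered):
        self.lut = lut
        self.registered = registered  # one more configuration bit
        self.ff = 0

    def clock(self, inputs):
        """Rising clock edge: capture the LUT output in the flip-flop."""
        self.ff = self.lut.evaluate(inputs)

    def output(self, inputs):
        return self.ff if self.registered else self.lut.evaluate(inputs)


xor2 = LUT(2, [0, 1, 1, 0])        # truth table of a 2-input XOR
and4 = LUT(4, [0] * 15 + [1])      # 4-input AND in a 4-LUT
ble = BLE(xor2, registered=True)
ble.clock([1, 0])
print(xor2.evaluate([1, 1]), and4.evaluate([1, 1, 1, 1]), ble.output([0, 0]))
```

Any k-input function is obtained simply by loading a different truth table, which is why the same LUT hardware can be reused across designs.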
Figure 3.1: (a) Abstract view of FPGA (labels: Routing Tracks, Logic Block, I/O) (b) Basic Logic Element (BLE) (labels: 4-Input LUT, D Flip-flop, Clk, SRAM, inputs I1 to I4) 
Figure 3.2: (a) FPGA Architecture (array of CLBs with connection boxes (CB) and switch boxes (SB)) (b) Logic inside the SB 
3.2 Interconnect Architecture 
The arrangement of the logic blocks and interconnect resources can be 
broadly classified into four groups: island style, row based, sea of gates and 
hierarchical [2]. 
3.2.1 Island Style Architecture 
This architecture consists of an array of programmable logic blocks with 
vertical and horizontal programmable routing channels as shown in Figure 3.1 (a) and 
3.2 (a) [65]. The number of segments in the channel determines the resources available 
for routing. The pin of the logic block/CLB can access the routing channel through the 
connection box. The connectivity of each pin to the segments in the channel is 
determined by the flexibility 'Fc' of the connection box. The vertical and horizontal 
routing channels are connected at the switch box. 
The flexibility 'Fs' of the switch box determines the connection available from 
each track in a routing channel to the track in the other routing channels. 
3.2.2 Row Based Architecture 
In this architecture, logic blocks are arranged in rows with horizontal routing 
channels running between successive rows as shown in Figure 3.3. The routing tracks 
within the channel are divided into one or more segments. The segments can be 
connected to each other using programmable switches to increase their length. Other 
tracks run vertically through logic blocks. They provide connections between 
horizontal routing channels. The pin of the logic block can connect to horizontal 
routing channels and vertical routing segments. 
The length of wiring segments in the channel is determined by the tradeoffs 
involving the number of tracks, the resistance of the routing switches and the 
capacitance of the segments. 
Actel FPGAs are an example of row based architecture [66]. 
Figure 3.3: Row-Based FPGA Architecture (labels: Logic block, Vertical Track, Segmented Tracks, Horizontal Routing Channel) 
3.2.3 Sea -of- Gate Architecture 
Unlike the previous architecture, it does not contain an array of logic blocks 
embedded in the routing structure. It consists of fine grain logic blocks covering the 
entire floor of the device as shown in Figure 3.4. Connectivity is realized using 
dedicated neighbor-to-neighbor routes that are faster than the general routing 
resources. The architecture also uses some general routes to realize longer connections. 
The XS family from Actel FPGAs is an example of this architecture [67]. 
Figure 3.4: Sea-of-Gate FPGA Architecture (labels: Sea of Logic Blocks, Logic Block, Local Interconnect) 
3.2.4 Hierarchical Architecture 
This architecture is created by connecting logic blocks into clusters [68]. These 
clusters are recursively connected to form a hierarchical structure as shown in Figure 3.5. 
The speed of a net is determined by the number of routing switches in its path. The 
hierarchical structure reduces the number of switches in series for long connections 
and hence can potentially run at higher speed. 
Figure 3.5: Hierarchical FPGA Architecture (labels: Logic Block, Local Track, Global Track) 
3.3 Virtex-II FPGA Architecture 
The Virtex-II family of FPGAs was introduced to the market by Xilinx. 
Virtex-II is fabricated in a 0.15 µm process technology with eight layers of 
metal and a 1.5 V power supply [10]. Figure 3.6 shows the smallest member of the Virtex-II 
FPGA family. It consists of a number of hard cores, including memory blocks, digital 
clock managers, I/O blocks, encryption circuitry, and custom multipliers. However, 
most of the silicon area in the larger members of the family is consumed by 
programmable fabric. Power and performance of FPGAs are often compared with their 
standard cell ASIC counterparts that use less silicon area for realizing the same 
functionality. 
The power inefficiency of FPGAs is often attributed to their programmable fabric, 
which trades additional silicon area for flexibility. The Virtex-II fabric consists of 
Configurable Logic Blocks (CLBs), which are connected using a large set of routing 
resources. Each Virtex-II CLB contains four slices as shown in Figure 3.7. Each slice 
consists of two 4-input Lookup Tables (LUTs), two Flip-flops (FFs), and a variety of 
dedicated circuitry to accommodate more efficient implementation of some specific 
logic. Virtex-II uses a segmented routing structure to minimize the number of 
transistors and wires that a signal needs to traverse to reach its destination. The 
segmented routing architecture includes wires that span two CLBs (Doubles), six 
CLBs (Hexes), and the full length of the chip in both vertical and horizontal directions 
(Longs) [10]. There are also two sets of switches to connect the wire 
segments to the inputs and outputs of each CLB, which are referred to as the Input Crossbar 
(IXbar) and Output Crossbar (OXbar). The CLB slices are also referred to as logic, and 
the above five sets of switches comprise the interconnect switch matrix as presented in 
Table 3.1. More detailed information regarding Virtex-II architecture can be found in 
[69]. 
Table 3.1: Major Interconnect Present in the Switch Matrix 
Interconnect Resources    Details 
Double                    16-to-1 multiplexer and buffer 
Hex                       12-to-1 multiplexer and buffer 
Long                      32-to-1 multiplexer and buffer 
IXbar                     30-to-1 multiplexer and buffer 
OXbar                     24-to-1 multiplexer and buffer 
Figure 3.6: Virtex-II FPGA Architecture (labels: Block RAM, Multiplier, Fabric, I/O, Digital Clock Manager) 
Figure 3.7: CLB and Logic Slice (labels: Interconnect Switch Matrix, CLB, four Logic Slices, 4-i/p LUT, FF/Latch) 
3.4 Power Estimation Methodology 
In deep-submicron technology, the power dissipated by the FPGA is the sum of 
the dynamic and static (leakage) power which are computed separately. The estimation 
methodology followed for the two components is discussed in the next sections. 
3.4.1 Dynamic Power Estimation Methodology 
Dynamic power dissipation is caused by signal transitions in the circuit. Higher 
operating frequency leads to more frequent signal transitions and results in increased 
power dissipation. The most significant source of dynamic power consumption in 
CMOS circuits is the charging and discharging of capacitance, which, following [10], can be 
modeled as 
$P = \sum_i C_i V_i^2 F_i$ (3.1) 
Where C_i, V_i and F_i are the capacitance, voltage swing, and operating frequency of 
resource i, respectively. 
To calculate the total power dissipation, the authors in [10] consider three 
factors namely the effective capacitance, the resource utilization factor and the 
switching activity. The effective capacitance is due to sum of capacitances of 
interconnection wires and transistors. It can be measured by transistor-level simulation. 
The second important factor is the resource utilization. In typical FPGA designs, the 
majority of resources are not used after the configuration and thus they will not 
consume any dynamic power. Since the resource utilization varies with the design, 
the authors in [10] used a number of Perl programs to obtain the 
utilization for each resource in the routed designs. The third factor in determining 
power dissipation is the switching activity, which is defined as the number of signal 
transitions in a clock period. The switching activity for each resource also requires a 
statistical representation because it not only depends on the type of design, but also on 
the input stimuli. To measure the switching activity, Modelsim is used (for real-delay 
timing simulation). Finally, the results of this simulation are read into Perl script along 
with the routed design in Xilinx Design Language (XDL) format to obtain the statistical 
representation of the switching activity for each resource. 
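The way the three factors combine can be sketched in a few lines. The example below assumes that the total fabric power is VDD^2 times the clock frequency times the sum, over resource types, of effective capacitance, utilization and switching activity. This combination is an assumption made here for illustration; the exact expression, and the capacitance, utilization and activity values of [10], are not reproduced. All numbers and resource entries below are hypothetical.

```python
def fabric_dynamic_power(resources, vdd, freq_hz):
    """Estimate dynamic power (W) as VDD^2 * f * sum(C_i * U_i * S_i).

    Each resource is a dict with:
      cap_f        effective capacitance in farads (hypothetical values)
      utilization  fraction of instances used by the design (0..1)
      activity     average transitions per clock period
    """
    weighted_cap = sum(r["cap_f"] * r["utilization"] * r["activity"]
                       for r in resources)
    return vdd ** 2 * freq_hz * weighted_cap

# Hypothetical per-CLB resource list, for illustration only.
resources = [
    {"name": "Double", "cap_f": 0.3e-12, "utilization": 0.40, "activity": 0.15},
    {"name": "Hex",    "cap_f": 0.9e-12, "utilization": 0.20, "activity": 0.15},
    {"name": "LUT",    "cap_f": 0.1e-12, "utilization": 0.50, "activity": 0.20},
]
power = fabric_dynamic_power(resources, vdd=1.5, freq_hz=100e6)
print(f"Estimated dynamic power: {power * 1e6:.2f} uW")
```

Unused resources contribute nothing in this model because their utilization is zero, which mirrors the observation that unconfigured resources consume no dynamic power.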
As shown in Figure 3.8, most of the power dissipation in FPGA fabrics occurs 
in the interconnect resources, whose contributions are broken down in Figure 3.9. The high wire capacitance in Long 
and Hex contributes most to their power dissipation, while high utilization of Double is 
the main cause of its high share in the pie charts. The total power dissipation of all 
circuits at 100 MHz with supply voltage of 1.5V is also estimated. It was found that the 
average power consumption of a Virtex-II CLB is 5.9 µW per MHz but high switching 
activity can significantly raise the CLB power dissipation. For example, a switching 
activity of 50% would raise CLB power consumption to as high as 23 uW per MHz. On 
the other hand, a switching activity of 5% would reduce the CLB power dissipation to 
3.1 uW per MHz. 
Figure 3.8: Dynamic power breakdown in Xilinx Virtex-II (pie chart; legend: Interconnects, Clocking, Logic, IOBs; slice values: 60%, 16%, 14%, 10%, with Interconnects at 60%) 
Figure 3.9: Contribution of dynamic power by different Interconnect out of 60% (pie chart; legend: Double, Hex, Long, IXbar, OXbar; recoverable slice values: 12%, 18%, 19%) 
3.4.2 Leakage Power Estimation Methodology 
FPGAs are promising solutions for managing increasing design complexity 
while achieving both performance and flexibility. To support reconfiguration, FPGAs 
use more transistors per function than fixed-logic solutions, resulting in higher leakage 
power consumption. Therefore, FPGAs are generally not found in mobile applications. 
The authors in [9] and [70] analyze the leakage power of an FPGA device using detailed 
circuit-level simulations. The simulation methodology accounts for design-dependent 
variations and provides detailed leakage power breakdowns. The leakage analysis is 
based on a family of 1.2V, SRAM-based FPGAs built in a 90nm CMOS process [9]. 
All simulations are performed using dc-operating point analyses in SPICE using 
BSIM4 device models. Since all CLBs in an FPGA are identical, simulations 
are carried out for a single CLB only. The CLB design is divided into smaller circuit 
blocks. Each block is simulated individually to identify its leakage power consumption. 
The total leakage power of the FPGA is obtained by summing the leakage power of all 
blocks and then multiplying by the number of CLBs in the array of that FPGA 
[9]. 
As the leakage power of a circuit depends on the values of its inputs, 
each circuit block is simulated under all possible input states to model the effects of 
input data variation, and its minimum, maximum, and average leakage values are recorded. The minimum, 
maximum, and average leakage values for all circuit blocks are summed up to compute 
the total CLB leakage power for best-case, worst-case and average-case design data 
respectively. 
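The aggregation step described above can be sketched as follows. The block names and leakage numbers are hypothetical placeholders for per-state SPICE results and are not taken from [9]; only the summation scheme follows the text.

```python
from statistics import mean

def clb_leakage_cases(block_leakage_by_state):
    """Return (best, average, worst) CLB leakage from per-block, per-state values.

    block_leakage_by_state maps a block name to the list of leakage values
    obtained for every input state of that block.
    """
    best = sum(min(values) for values in block_leakage_by_state.values())
    average = sum(mean(values) for values in block_leakage_by_state.values())
    worst = sum(max(values) for values in block_leakage_by_state.values())
    return best, average, worst

def fpga_leakage(per_clb_leakage, num_clbs):
    """All CLBs are identical, so the array leakage scales with the CLB count."""
    return per_clb_leakage * num_clbs

# Hypothetical leakage values (uW) for two circuit blocks.
blocks = {"lut": [1.0, 2.0, 3.0], "mux": [0.5, 1.5]}
best, average, worst = clb_leakage_cases(blocks)
print(best, average, worst, fpga_leakage(average, 1000))
```

Summing per-block extremes gives bounds on the CLB leakage; the actual leakage of a configured design lies between the best-case and worst-case totals.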
In an FPGA, the three most common circuit types are configuration SRAM 
cells, interconnect multiplexers, and LUTs. Figure 3.10 shows the circuit-based leakage 
breakdown for an average design at 25°C. Combined, these three circuit types consume 
88% of the total leakage power. 
Normally, designs implemented in FPGA never use all available on-chip 
resources. Resource utilization affects total leakage power because most blocks are 
configured differently when unused. To account for the effects of resource utilization 
variation, the authors in [9] simulated each block in used and unused configurations. 
Total FPGA leakage is computed by summing the leakage power of both used and 
unused resources. For each resource, the number of used and unused instances is 
determined based on its respective utilization factor. 
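The utilization-weighted sum can be written out explicitly. In the sketch below, the number of used instances of a resource is its count multiplied by its utilization factor, and the remaining instances are charged with the unused-configuration leakage. The resource names, counts and leakage values are hypothetical.

```python
def resource_leakage(count, utilization, used_leak, unused_leak):
    """Leakage of one resource type split into used and unused instances."""
    used = count * utilization
    unused = count - used
    return used * used_leak + unused * unused_leak

def total_leakage(resources):
    """Sum the leakage of all resource types (units follow the inputs)."""
    return sum(resource_leakage(r["count"], r["utilization"],
                                r["used_leak"], r["unused_leak"])
               for r in resources)

# Hypothetical resource data (leakage per instance in uW).
resources = [
    {"count": 100, "utilization": 0.25, "used_leak": 2.0, "unused_leak": 1.0},
    {"count": 40,  "utilization": 0.50, "used_leak": 0.5, "unused_leak": 0.25},
]
print(total_leakage(resources))
```

If the unused configuration leaks less than the used one, low utilization reduces total leakage, which is one reason unused resources are attractive targets for shutdown.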
Finally, simulation results in [9] show that the FPGA consumes 4.2 µW per 
CLB nominally and up to 26 µW per CLB for worst-case data (100% CLB 
utilization, at 85°C). A substantial leakage reduction is possible by applying low-leakage 
techniques to SRAM cells. In addition, there is also scope for FPGA CAD tools 
to help in power saving by clustering unused resources such that they can be 
collectively shut down. 
Figure 3.10: Leakage power breakdown based on circuit types (Xilinx Spartan-3; categories: interconnect, LUTs, configuration SRAMs, other)
3.5 Different Power Reduction Techniques of FPGAs 
3.5.1 Low Power Routing 
The authors in [13] proposed a routing switch for FPGAs that can be programmed
to operate in three modes: high-speed, low-power and sleep. High-speed mode
provides power and performance similar to those of a traditional routing switch.
In low-power mode, speed is traded for reduced power consumption.
To support the different operating modes, the switch includes NMOS (MNX) and
PMOS (MPX) sleep transistors in parallel, as shown in Figure 3.11. In high-speed mode,
MPX and MNX are turned ON, so the virtual VDD (VVD) is equal to VDD and the
output swings are full rail-to-rail. In low-power mode, MPX is turned OFF and
MNX is turned ON. The buffer is then powered by the reduced voltage VVD = VDD - Vth.
Since VVD < VDD, the delay increases (speed is reduced). As the output swing is
reduced by Vth, the switching energy and leakage power are reduced substantially.
In sleep mode, both MPX and MNX are turned OFF, which is similar to supply gating.
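The three modes can be summarised as a small lookup, sketched below in Python. The transistor states follow the description above; the names `Mode`, `SLEEP_TRANSISTORS` and `virtual_vdd` are hypothetical, and the model ignores all second-order effects.

```python
from enum import Enum


class Mode(Enum):
    HIGH_SPEED = "high-speed"
    LOW_POWER = "low-power"
    SLEEP = "sleep"


# (MPX on, MNX on) for each mode, as described in the text.
SLEEP_TRANSISTORS = {
    Mode.HIGH_SPEED: (True, True),
    Mode.LOW_POWER: (False, True),
    Mode.SLEEP: (False, False),
}


def virtual_vdd(mode, vdd, vth):
    """Return the idealised virtual supply VVD, or None when gated off."""
    mpx_on, mnx_on = SLEEP_TRANSISTORS[mode]
    if mpx_on:
        return vdd          # PMOS passes the full rail
    if mnx_on:
        return vdd - vth    # NMOS alone drops one threshold voltage
    return None             # both OFF: buffer is cut off from the supply
```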
Results show that, compared with high-speed mode, the proposed switch reduces
leakage power by up to 40% and dynamic power by up to 28% in low-power mode,
and reduces leakage power by 61% in sleep mode.
Figure 3.11: New programmable low-power FPGA routing switch
3.5.2 Multi-Threshold CMOS (MTCMOS) 
Multi-Threshold CMOS (MTCMOS) refers to a circuit technique that uses a
high-Vth footer and/or header transistor to cut off a low-Vth circuit from the power rails
during standby mode. The authors in [71] proposed a design approach using gate-level
sleep devices and provided a thorough analysis of sneak leakage paths in FPGAs. They
also proposed a set of design guidelines for preventing the most common sneak
leakage paths and for inserting sleep devices to mitigate the leakage current in an FPGA
test chip. A 0.13 µm, dual-Vth test chip (with a threshold voltage difference of 100 mV)
that uses sleep devices at the gate level is presented in [71]. The total area penalty
of the sleep devices on the test chip is less than 5%. The test chip implements a
low-power FPGA architecture with 12 Configurable Logic Blocks in 3 slices. The memories
that hold LUT values and configuration bits use high-Vth devices, and the CLBs use
MTCMOS circuits for the critical path. With the test chip placed entirely in sleep
mode, leakage is reduced by 7.0X to 8.6X.
3.5.3 Power Gating 
The authors in [72] analyze the leakage power dissipation in the SRAM array of
an FPGA Lookup Table (LUT) for a 65 nm CMOS process. Power gating of the LUT
SRAM array is implemented at both coarse-grain and fine-grain levels.
First, the leakage power dissipation in a single SRAM cell was estimated
with and without power gating. Then, the leakage power dissipation in a 16-by-1
SRAM array was analyzed for three cases: no power gating, fine-grain power gating,
and coarse-grain power gating. In fine-grain power gating, every SRAM cell is
power-gated individually, whereas in coarse-grain power gating the entire array is
power-gated using one global gating transistor, as shown in Figure 3.12. This array is the
source of the data inputs for the 16:1 multiplexer used in an FPGA LUT. More leakage
savings were obtained with coarse-grain power gating than with fine-grain power
gating: the coarse-grain and fine-grain techniques yielded approximately 99% and 81%
leakage savings, respectively, over the case where no power gating was applied.
Figure 3.12: Coarse-gated SRAM array with 16 SRAM cells
Power gating can also be applied at different granularities in the programmable
logic and interconnection resources of an FPGA, as proposed in [73]. Every
programmable logic and interconnection resource, such as a LUT or a routing multiplexer,
can be power gated individually, or any unused programmable resource can be set in sleep
mode, leading to lower active leakage power consumption. Fine-grain power gating
provides more controllability but incurs the highest area overhead. Coarse-grain
power gating can therefore be used instead, sharing the sleep transistors among multiple
similar programmable logic and interconnection resources such as CLBs, LUTs and
interconnection multiplexers (muxes). In coarse-grain power gating, the relative
design overhead associated with the static or dynamic control of the sleep transistor is
small compared with that of fine-grain power gating. In addition, the sleep transistor's
size (W/L ratio) can be reduced based on the utilization statistics of the power-gated
routing muxes or configurable logic. The authors in [73] also describe more detailed
design considerations regarding the size and granularity of the power gating transistor
for the FPGA routing fabric (such as Double, Hex and Long lines). Furthermore, they use a
mid-oxide sleep transistor and the utilization statistics of the programmable interconnect
resources of the Spartan-3 FPGA to optimize the size of the sleep transistor.
Finally, it was found that fine-grain power gating is more effective in reducing
active leakage power but has a large area penalty, whereas coarse-grain power
gating is less effective in reducing active leakage but has the minimum area penalty. As
suggested by [73], a combination of coarse- and fine-grain power gating provides the
best design trade-off between area overhead and active leakage power reduction.
3.5.4 Activity Packing 
Two packing algorithms for the detection of activity profiles in MTCMOS-based
FPGA structures are proposed in [74]. The first is a connection-based packing
technique that accounts for the proximity of the logic blocks, and the second is a
logic-based packing approach that considers the weighted Hamming distance between
the blocks' activities. Once the activity profiles are computed, sleep transistors are
carefully positioned at the clustered blocks that share similar activity profiles. The two
algorithms achieve a 15% standby leakage reduction for the tested FPGA benchmarks.
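As an illustration of the second idea, the sketch below computes a weighted Hamming distance between two activity profiles and picks the most similar pair of blocks. It is not the algorithm of [74]: the encoding of a profile as a list of 0/1 flags per time interval, the per-interval weights and both function names are assumptions made only for this example.

```python
from itertools import combinations


def weighted_hamming(a, b, weights):
    """Sum the weights of the intervals in which two profiles differ."""
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("profiles and weights must have equal length")
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


def most_similar_pair(profiles, weights):
    """Return the pair of block names with the smallest distance."""
    return min(
        combinations(sorted(profiles), 2),
        key=lambda pair: weighted_hamming(
            profiles[pair[0]], profiles[pair[1]], weights
        ),
    )
```

Blocks whose profiles are close in this metric tend to be idle at the same time, which is what makes them candidates for sharing a sleep transistor.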
3.5.5 Dual-VDD Assignment 
A programmable dual-VDD architecture for the logic block (CLB) and the
routing switch is proposed in [75], [14], in which the supply voltages of the logic and
routing blocks are programmed using configurable transistors, as shown in Figure
3.13. Significant power reduction can be achieved by assigning low VDD (VDDL) to
non-critical paths and high VDD (VDDH) to the timing-critical paths in the design so
that timing constraints are still met. A dual-VDD design needs level conversion when a
VDDL block drives a block operating at VDDH [75]. Therefore, a level converter (LC) is
placed at the CLB output, and a multiplexer is used to bypass the level converter if it is
not needed at that pin. MCNC benchmarks were used to evaluate the dual-VDD
architecture and the VDD assignment algorithms. Using the High-to-Low algorithm
and placing the LC at the CLB inputs saves 24% dynamic power and 76% leakage power
but has a significant area penalty. Placing the LC at the CLB output pins reduces the area
penalty by about 2% and still saves about 57% of total power. Due to the area
overhead of the LCs and configurable supply transistors, the dual-VDD FPGA takes
approximately 21% more area than a conventional single-VDD FPGA.
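The assignment rule and the level-conversion condition can be illustrated with a deliberately simplified Python sketch. It is not the High-to-Low algorithm of [75]: the slack-threshold rule, the function names and the graph encoding are assumptions chosen only to show where level converters become necessary.

```python
def assign_vdd(slack_ns, vddl_penalty_ns):
    """Give VDDL to blocks whose timing slack covers the VDDL slowdown."""
    return {
        block: ("VDDL" if slack >= vddl_penalty_ns else "VDDH")
        for block, slack in slack_ns.items()
    }


def needs_level_converter(edges, assignment):
    """List the connections in which a VDDL block drives a VDDH block."""
    return [
        (src, dst)
        for src, dst in edges
        if assignment[src] == "VDDL" and assignment[dst] == "VDDH"
    ]
```

Connections that do not appear in the returned list correspond to pins where the bypass multiplexer would skip the level converter.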
Figure 3.13: Dual-VDD Assignment (supply transistors select VDDL or VDDH for a CLB of eight BLEs, with a level converter (LC) at each CLB output)
3.5.6 Super Cut-off Techniques 
A popular technique to reduce leakage is to apply a negative gate-to-source
voltage (Vgs) to the NMOS transistor, since the subthreshold current depends
exponentially on Vgs. The authors in [76] apply a reverse voltage of magnitude equal to
the threshold voltage Vth to the NMOS-transistor-based multiplexer routing switch and
report a leakage reduction of two orders of magnitude compared with a traditional
routing switch. The advantage of this approach is that there is no area or delay penalty.
3.5.7 Heterogeneous Routing 
The authors in [77] performed timing-driven placement and routing
experiments, along with power modeling, to identify the type and percentage of
routing resources (such as Double, Hex and Long) that can be slowed down. The
proposed methodology for incorporating a heterogeneous (HT) routing fabric requires
only minimal changes to the existing circuit design and CAD tools, such as:
1. Speed File Based Modification 
Here SPICE simulations are performed to determine the performance of slower 
routing resources and speed files are updated accordingly. 
2. Post Pass
Here the routed design is analyzed to identify non-critical nets.
3. New Cost Function
This approach leads to higher utilization of low-power resources in non-critical
paths. Finally, place and route are performed with benchmark circuits.
SPICE simulations are used to estimate the delay and the average active and standby
power of routing and input/output multiplexers at the 90 nm node. Different approaches
are used to size the transistors in the low-power resources. In one case, all device
widths in a programmable pass-transistor-based multiplexer are scaled down by the
same ratio; in another, the width scaling ratio is kept constant and the channel
length is increased by ~25%. Increasing the channel length by ~25% leads to a
substantial reduction in standby power at the expense of a performance penalty.
A similar approach to implementing an HT routing architecture for low leakage is
to introduce transistors with higher threshold voltages in the slow routing resources. The
HT routing architecture reduces the standby power dissipation of FPGA routing fabrics
by 33% without any area penalty and at the cost of less than 5% performance
degradation.
3.6 Summary 
Advances in process technology will aid the quest for improved speed and
higher logic capacity. As FPGAs move into deep-submicron technologies, the
associated voltage reduction will reduce the power dissipation per logic element, but
the higher transistor densities and operating frequencies will actually increase the
overall FPGA power consumption. The breakdown of power consumption in FPGAs is
well studied, and it has been observed that the interconnect accounts for the major
portion of both static and dynamic power. Various approaches for reducing FPGA power,
covering both leakage and dynamic power, have been proposed in the literature. A
significant improvement in power efficiency has to be achieved to make FPGAs viable in
portable applications. This thesis investigates further low-power techniques for FPGAs
so that they can be employed even in portable systems. The next chapter presents the
subthreshold operation of different building blocks of FPGAs for ultra-low-power
applications.
Chapter 4 
ULTRA LOW POWER FPGAS 
4.1 Introduction 
The increasing demand for portable applications such as wireless and medical
devices (hearing aids and pacemakers) has caused significant growth in low-power design
from the system level to the device level [78]-[83]. Operating the transistors of digital
logic in the subthreshold region has recently been proposed to achieve ultra-low power
consumption [84]-[86]. However, the performance of digital subthreshold circuits is
several orders of magnitude lower than that of their strong-inversion counterparts.
Subthreshold digital circuits are therefore suitable only for specific applications that do
not need high performance but require extremely low power consumption [87].
Subthreshold designs have been shown to consume orders of magnitude less power than
regular strong-inversion circuits at the same operating frequency. The subthreshold
region is ideal for ultra-low-power, low-throughput applications in the high-kHz or
low-MHz frequency range. In the subthreshold region the leakage current is used as the
switching current, which in turn reduces the throughput. Leakage currents
depend exponentially on the supply voltage, threshold voltage and temperature in the
subthreshold region, contrary to the superthreshold region. Hence, the most important
concern in the subthreshold region is variability, because a small change in threshold
voltage or temperature changes the leakage currents significantly [88]-[90].
Reconfigurable devices such as field programmable gate arrays (FPGAs) are growing in
popularity because of the spiraling increase in the Non-Recurring Engineering (NRE)
cost of manufacturing an application specific integrated circuit (ASIC) [1]-[2]. This
trend will become even more pronounced in deep submicron technologies with further
increases in NRE cost. It is envisaged that reconfigurable devices will replace most ASICs
even in battery-operated ultra-low-power portable devices in the near future. This chapter
investigates the use of dynamic threshold CMOS (DTCMOS) in place of conventional
CMOS for realizing more robust and higher-speed ultra-low-power FPGA building
blocks.
4.2 DTCMOS 
Subthreshold CMOS is conventional CMOS logic operated in the
subthreshold region, whereas dynamic threshold CMOS (DTCMOS) logic is obtained
by shorting the gate to the body [91]-[93]. Since the threshold voltage Vth under
DTCMOS operation is reduced by the forward biasing of the substrate, the current
drive in the ON state can be significantly improved [94].
Figure 4.1 shows DTNMOS and DTPMOS transistors, in which the substrate
voltage changes dynamically with the gate voltage. The threshold voltage therefore
changes dynamically as well, and is lowered when Vin=VDD for DTNMOS and when
Vin=0 for DTPMOS. The reduction in the threshold voltage of DTCMOS indirectly increases
the carrier mobility due to the lower normal effective field. The higher ION
(on-current) and lower IOFF (off-current) lead to a steeper subthreshold slope, which
results in higher gain in the subthreshold region. The power consumed by DTCMOS is
slightly higher, but its delay is much smaller than that of CMOS, leading to a lower
power-delay product (PDP).
4.3 Analysis of Subthreshold FPGA Building Blocks 
This section investigates the effectiveness of both CMOS and DTCMOS biasing
schemes on key building blocks of reconfigurable devices, namely a 4-input look-up
table (LUT), a 16:1 multiplexer switch, an SRAM cell and a CMOS adder. A 16:1
multiplexer switch is part of an LUT, so the simulation results for the
multiplexer switch are identical to those of an LUT. Hence, the multiplexer is not
analyzed separately. All simulations here are carried out using the Berkeley Predictive
Technology Model (BPTM) at the 22 nm technology node [95].
4.3.1 Look-up-Table 
A 4-input LUT, shown in Figure 4.2, is a key building block of reconfigurable
devices for realizing logic functions [96]. It comprises 16 SRAM cells along with a
16:1 multiplexer for reading the contents of the SRAM cells. The buffer at the output of
the multiplexer is realized as a level-restoring buffer to reduce its power
consumption. A subthreshold 4-input LUT is realized using both the conventional
CMOS (Sub-4 I/P LUT) and DTCMOS (DT-Sub-4 I/P LUT) biasing schemes. All HSPICE
simulations are carried out at a clock frequency of 500 kHz with a load capacitance of
100 fF at 50°C. The delay, power dissipation and PDP are plotted for both biasing
schemes in Figures 4.3, 4.4 and 4.5 respectively.
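Functionally, the LUT is a 16-entry memory read through a 16:1 multiplexer. The Python sketch below models only that behaviour, not the transistor-level circuit of Figure 4.2; the function names and the ordering of the select bits (input `a` as the least significant bit) are assumptions.

```python
def lut4(config_bits, a, b, c, d):
    """Read one of 16 configuration bits, selected by the four inputs."""
    if len(config_bits) != 16:
        raise ValueError("a 4-input LUT needs exactly 16 configuration bits")
    index = (d << 3) | (c << 2) | (b << 1) | a  # the 16:1 multiplexer select
    return config_bits[index]


def program_lut4(func):
    """Build the 16 SRAM bits that make the LUT implement func(a, b, c, d)."""
    return [
        func(i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1)
        for i in range(16)
    ]
```

For example, `program_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)` yields the configuration for a 4-input parity function.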
Figure 4.1: Schematic of subthreshold DTNMOS and DTPMOS
Figure 4.2: Transistor level circuit of 4-input LUT
Figure 4.3: Delay variation with supply voltage for a 4-input LUT (plot of delay vs. VDD from 150 mV to 300 mV for Sub-4 I/P LUT and DT-Sub-4 I/P LUT)
Figure 4.4: Power variation with supply voltage for a 4-input LUT (plot of power vs. VDD from 150 mV to 300 mV for Sub-4 I/P LUT and DT-Sub-4 I/P LUT)
Figure 4.5: PDP variation with supply voltage for a 4-input LUT (plot of PDP vs. VDD from 150 mV to 300 mV for Sub-4 I/P LUT and DT-Sub-4 I/P LUT)
It is clear from the above plots that the DTCMOS scheme is much faster than the
CMOS scheme for the LUT, even in deep submicron technology, due to the lower
threshold voltage in the ON condition. The lower threshold voltage increases the drain
current, thereby reducing the charging and discharging times. The higher drain current
only slightly increases the power consumption, and hence the PDP of DTCMOS is much
better than that of CMOS throughout the supply voltage range (from 150 mV to 300 mV).
It is evident from Figure 4.5 that the PDP of the LUT is minimum at a supply voltage of
around 250 mV for the chosen technology.
4.4 Effect of Process Variation 
It has been seen that subthreshold DTCMOS logic blocks have a lower PDP than
regular subthreshold CMOS logic blocks. It is also important to study the stability of
subthreshold DTCMOS logic under temperature, Vth and channel length variations in
order to compare its robustness with that of regular subthreshold CMOS circuits. To
compare the stability of the different logic blocks, we analyze the effects of temperature
and process variations on the power, delay and PDP of both subthreshold DTCMOS and
regular subthreshold CMOS logic.
4.4.1 Temperature 
Equation (4.1) gives the subthreshold current in the weak inversion region. $V_T$ in
this equation is the thermal voltage, which is proportional to the absolute temperature:
$V_T = k_B T/q$, where $k_B$ is Boltzmann's constant and $q$ is the elementary charge.
$V_T$ occurs twice in the exponents of equation (4.1). As $V_T$ at typical temperatures is
only a few tens of millivolts (about 26 mV at room temperature), the second factor is
typically very close to 1.

$$I_{sub} = k\,V_T^{2}\,e^{\frac{V_{gs}-V_{th}}{n V_T}}\left(1-e^{-V_{ds}/V_T}\right) \qquad (4.1)$$

where $n = 1 + \frac{C_{d}}{C_{ox}}$ and $k = \mu_0 \sqrt{\frac{q\,\varepsilon_{si}\,N_{ch}}{2\phi_s}}$.

Thus, the temperature mainly contributes through the first exponential factor. As can
be seen, raising the temperature by some factor 'f' has the same exponential influence on
the subthreshold current as reducing the threshold voltage by the same factor.
The subthreshold slope factor $n$ depends only on the ratio between the depletion layer
capacitance ($C_d$) and the oxide layer capacitance ($C_{ox}$). $k$ is the absolute
proportionality factor, and $\mu_0$, $\varepsilon_{si}$, $N_{ch}$ and $\phi_s$ are the
zero-bias mobility, the relative permittivity of silicon, the effective channel doping and
the surface potential respectively.
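A direct numerical evaluation of this expression can make the temperature dependence concrete. The sketch below assumes the standard weak-inversion form with a drain factor of (1 - exp(-Vds/VT)); the function names and the convention of passing `k` as a plain number are choices made for the example, and no device parameters from the thesis are implied.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
Q = 1.602176634e-19  # elementary charge, C


def thermal_voltage(temp_c):
    """Thermal voltage kT/q in volts for a temperature in degrees Celsius."""
    return K_B * (temp_c + 273.15) / Q


def subthreshold_current(k, vgs, vth, vds, n, temp_c):
    """Weak-inversion drain current; k is the proportionality factor."""
    vt = thermal_voltage(temp_c)
    gate_term = math.exp((vgs - vth) / (n * vt))   # first exponential factor
    drain_term = 1.0 - math.exp(-vds / vt)         # second factor, near 1
    return k * vt ** 2 * gate_term * drain_term
```

With Vds = 250 mV the drain factor differs from 1 by less than 0.1% at room temperature, which is why the first exponential dominates the temperature behaviour.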
Similarly, [87] shows that an increase in temperature reduces the threshold
voltage at a rate of around 349 µV/°C. This reduction in Vth increases the IOFF
as well as the ION of a regular MOS transistor. However, the shift in the operating region
causes the ION of DTCMOS to decrease with temperature [87]. The effect of
temperature variations is analyzed by simulating the two structures from 50°C to 125°C
at a frequency of 100 kHz with a supply voltage (VDD) of 250 mV. The variation of
delay with temperature is plotted in Figure 4.6. It is clear from this plot that the delay
sensitivity of DTCMOS is lower than that of CMOS over the whole temperature
range.
Figure 4.6: Variation of delay with temperature for a 4-input LUT (plot of delay vs. temperature from 25°C to 125°C for Sub-4 I/P LUT and DT-Sub-4 I/P LUT)
4.4.2 Threshold Voltage 
The effect of process variations on the CMOS and DTCMOS LUTs is analyzed by
varying the threshold voltage by 10% in both directions from its customized
value. The variation of delay with threshold voltage for the 4-input LUT is shown in
Figure 4.7. The results show that the delay of the DTCMOS LUT is less sensitive to
process variation. Hence, the DTCMOS LUT maintains stable operation over a wide range
of threshold voltages.
Figure 4.7: Variation of delay with threshold voltage for a 4-input LUT (plot of delay vs. percentage Vth0 variation from -10% to +10% for Sub-4 I/P LUT and DT-Sub-4 I/P LUT)
4.4.3 Channel Length
The delay, power and PDP variation with channel length for a LUT are shown
in Figures 4.8, 4.9 and 4.10 for a supply voltage of 150 mV. The PDP is minimum at
channel lengths of 26 nm and 30 nm for the CMOS and DTCMOS schemes respectively,
rather than at the minimum channel length of 22 nm. This shows that the best
performance in terms of PDP is achieved by choosing a channel length slightly longer
than the minimum.
Figure 4.8: Delay variation with channel length for a 4-input LUT (plot of delay vs. channel length from 22 nm to 30 nm at 150 mV for Sub-4 I/P LUT and DT-Sub-4 I/P LUT)
Figure 4.9: Variation of power dissipation with channel length for a 4-input LUT (plot of power vs. channel length from 22 nm to 30 nm at 150 mV for Sub-4 I/P LUT and DT-Sub-4 I/P LUT)
Figure 4.10: Variation of PDP with channel length for a 4-input LUT (plot of PDP vs. channel length from 22 nm to 30 nm at 150 mV for Sub-4 I/P LUT and DT-Sub-4 I/P LUT)
This is because the delay increases almost linearly with channel length, whereas the
power consumption falls exponentially within a short window of channel length,
as shown in Figures 4.8 and 4.9 respectively. The PDP starts increasing again for
channel lengths larger than 26 nm for CMOS and 30 nm for DTCMOS in a 22 nm
technology, as depicted in Figure 4.10.
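The interplay of a linearly rising delay and an exponentially falling power can be reproduced with a toy model. The sketch below is purely illustrative: the coefficients are invented for the example and are not fitted to Figures 4.8 to 4.10, so only the qualitative shape (a PDP minimum above the minimum channel length) carries over.

```python
import math

L_MIN_NM = 22.0  # minimum channel length of the technology node


def delay(length_nm):
    """Toy delay model: rises linearly with channel length (arbitrary units)."""
    return 40.0 + 10.0 * (length_nm - L_MIN_NM)


def power(length_nm):
    """Toy power model: decays exponentially towards a floor (arbitrary units)."""
    return 10.0 + 40.0 * math.exp(-(length_nm - L_MIN_NM) / 2.0)


def pdp(length_nm):
    """Power-delay product of the toy model."""
    return delay(length_nm) * power(length_nm)


def best_length(lengths_nm):
    """Channel length in the sweep with the smallest PDP."""
    return min(lengths_nm, key=pdp)


SWEEP_NM = [22, 24, 26, 28, 30]
```

With these made-up coefficients the minimum falls inside the sweep rather than at 22 nm; once the exponential power term has flattened out, the linear delay term takes over and the PDP rises again.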
4.5 SRAM Cell 
The 4-input LUT consists of a 16:1 multiplexer whose inputs are stored in
16 SRAM cells. It is therefore very important to evaluate the functionality of SRAM
cells in the subthreshold region. Although the static noise margin (SNM) is certainly
important during hold, the cell stability during active operation represents a more
significant limitation to SRAM operation. Specifically, at the onset of a read access,
the wordline WL = "1" and the bitlines (BL and BLB) are still precharged to "1", as
Figure 4.11 illustrates. The internal node (V1 or V2) of the cell that stores a zero gets
pulled upward through the access transistor due to the voltage-dividing effect across
the access transistors (M2, M5) and driver transistors (M1, M4). This increase in
voltage severely degrades the SNM during the read operation (called the read SNM).
For the standard 6T SRAM cell shown in Figure 4.11, the read SNM obtained at
VDD=300 mV is only 28 mV, which is very small. The 6T SRAM cell is therefore not
suitable for subthreshold applications. To overcome this low read SNM, an 8T SRAM
cell, depicted in Figure 4.12, is explored.
To address the reduced read SNM, the read and write operations are
separated by adding read access transistors (M7 and M8) to the standard 6T cell. This
increases the transistor count to 8, so the cell is referred to here as the 8T cell; it is
shown in Figure 4.12. Because the read current does not significantly affect the cell
value (owing to the separate read access), the read stability of the 8T cell [97] is
dramatically increased compared with the standard 6T cell.
The transistor configuration (M1 through M6) is identical to that of a standard 6T cell.
Write access to the cell occurs through the write access transistors and from the write
bitlines, 'BL' and 'BLB'. Read access to the cell is through the read access transistor
and is controlled by the read wordline, 'RWL'. The read bitline, 'RBL', is precharged
prior to the read access. The wordline for read is also distinct from the write wordline.
By using this cell, the worst-case stability condition encountered in the standard 6T
SRAM cell is avoided and a high read SNM is retained.
Hereafter, only the 8T cell is considered for analysis in the subthreshold regime.
The width/length (W/L) ratios of the transistors of the 8T cell are:
M1=M4 = 6/1, M2=M5 = 4.5/1, M3=M6 = 1/1 and M7=M8 = 4.5/1. The following key
performance parameters of the 8T cell are evaluated in CMOS and DTCMOS configurations.
Figure 4.11: Schematic of 6T SRAM cell
Figure 4.12: Schematic of 8T SRAM cell
4.5.1 Standby Leakage 
In an FPGA, the inputs of the LUTs, the routing switches and the connection boxes are
configured with SRAM cells. It is therefore important to estimate the standby leakage
of the SRAM cell for ultra-low-power operation.
Figure 4.12 shows the schematic of the 8T SRAM cell. In standby mode,
WL and RWL are deactivated. Assume that the left node (V1) stores a '0'
and the right node (V2) a '1'. The major sources of leakage in the various
transistors of the 6T cell are explained in Chapter 7; for more detail, please refer to
Section 7.3. Table 4.1 lists the standby leakage of the 8T cell for different temperatures
and VDD values in CMOS and DTCMOS configurations. As the temperature increases,
Vth decreases and the leakage increases accordingly. Compared with the CMOS cell, the
leakage penalty of the DTCMOS cell is high. This is due to body biasing and the
higher-VDD operation of the cell. As VDD is decreased, however, the leakage penalty
decreases.
Figure 4.13 shows that, at a temperature of 75°C, the DTCMOS cell consumes 21%,
7% and 5% more leakage than the CMOS cell at VDD of 300 mV, 250 mV and 200 mV
respectively. Hence, a lower value of VDD is recommended for the DTCMOS cell to
achieve leakage power comparable to that of the CMOS cell. The simulation carried out
at VDD=200 mV and a temperature of 100°C shows that the DTCMOS cell
consumes only 8.3% more leakage than the CMOS cell, as depicted in Figure 4.14.
Table 4.1 Standby leakage (pA) of 8T CMOS and DTCMOS cell with respect to temperature
and VDD

VDD      25°C            50°C            75°C            100°C
(mV)     MOS    DTMOS    MOS    DTMOS    MOS    DTMOS    MOS    DTMOS
300      746    2280     1390   2740     2414   3082     3934   6504
275      580    1204     1088   1388     1904   2178     3118   4224
250      446    664      842    958      1482   1604     2440   2920
225      338    414      642    692      1136   1202     1876   2102
200      250    280      480    508      852    900      1412   1540
175      182    196      348    370      620    662      1028   1122
150      126    136      242    262      434    472      724    796
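The leakage penalties quoted in the text can be checked against Table 4.1 with a few lines of Python. The values below are transcribed from the table; the two helper names are hypothetical. The quoted figures (21%, 7%, 5% and 8.3%) appear to be reproduced when the difference is normalised to the DTCMOS leakage rather than to the CMOS leakage, which is an inference from the numbers and not something the text states.

```python
# Standby leakage in pA from Table 4.1: (temperature in C, VDD in mV) -> (MOS, DTMOS)
LEAKAGE_PA = {
    (75, 300): (2414, 3082),
    (75, 250): (1482, 1604),
    (75, 200): (852, 900),
    (100, 200): (1412, 1540),
}


def penalty_vs_cmos(mos, dtmos):
    """Extra DTCMOS leakage as a percentage of the CMOS leakage."""
    return 100.0 * (dtmos - mos) / mos


def penalty_vs_dtcmos(mos, dtmos):
    """Extra DTCMOS leakage as a percentage of the DTCMOS leakage."""
    return 100.0 * (dtmos - mos) / dtmos


for (temp_c, vdd_mv), (mos, dtmos) in LEAKAGE_PA.items():
    print(temp_c, vdd_mv,
          round(penalty_vs_cmos(mos, dtmos), 1),
          round(penalty_vs_dtcmos(mos, dtmos), 1))
```

Normalised to the CMOS leakage instead, the same entries give somewhat larger penalties, so the choice of reference matters when comparing these numbers with other work.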
Figure 4.13: Standby leakage of 8T cell with respect to VDD at T=75°C (CMOS and DTCMOS, VDD from 200 mV to 300 mV)
Figure 4.14: Standby leakage of 8T cell with respect to 'T' at VDD=200mV (CMOS and DTCMOS, 25°C to 100°C)
4.5.2 Read SNM and Write Delay 
Owing to the fast switching of the DTCMOS transistor (as the gate is
connected to the body), the DTCMOS cell shows a slightly better noise margin than
the CMOS cell. Figure 4.16 shows that the DTCMOS cell has around 10% to 18% higher
read SNM than the CMOS cell (Figure 4.15) at all operating voltages (from 150 mV to
300 mV). This shows that the DTCMOS cell has improved stability compared with the
CMOS cell. As the temperature increases, the threshold voltage of both CMOS and
DTCMOS transistors decreases, and therefore the read SNM of both cells also decreases,
as depicted in Figures 4.15 and 4.16 respectively.
Figure 4.15: Read SNM of 8T CMOS cell vs. VDD at different Temperatures (VDD from 150 mV to 300 mV)
Figure 4.16: Read SNM of 8T DTCMOS cell vs. VDD at different Temperatures (VDD from 150 mV to 300 mV)
Table 4.2 shows the write delay for the two cells. The DTCMOS cell has a lower write
delay, owing to the lowering of Vth for transistors in the ON state on account of
forward body bias. This body bias reduces the body charge and increases the carrier
mobility, which results in a higher on-current drive. For the CMOS cell, in contrast, the
fixed Vth of the access transistors brings a significant write delay overhead, which is up
to about 1.34X (at VDD=175 mV) longer than that of DTCMOS, as shown in Table 4.2.
This shows that a proper selection of VDD may provide a greater speed advantage for
DTCMOS over the CMOS cell.
Table 4.2 Comparison of write delay of 8T CMOS and DTCMOS cell with respect to VDD

VDD (mV)   Write delay (ns)          Write delay of MOS cell
           DTMOS cell   MOS cell     relative to DTMOS cell
300        740          863          1.16X
275        822          954          1.16X
250        885          995          1.12X
225        933          1060         1.13X
200        998          1199         1.20X
175        1182         1591         1.34X
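The last column of Table 4.2 is simply the ratio of the two delay columns. The short Python check below recomputes it from the tabulated delays; the dictionary and function names are hypothetical, and the observation that the tabulated ratios match when truncated (not rounded) to two decimals is an inference from the numbers.

```python
import math

# Write delay in ns from Table 4.2: VDD in mV -> (DTMOS cell, MOS cell)
WRITE_DELAY_NS = {
    300: (740, 863),
    275: (822, 954),
    250: (885, 995),
    225: (933, 1060),
    200: (998, 1199),
    175: (1182, 1591),
}


def mos_to_dtmos_ratio(vdd_mv):
    """Write delay of the MOS cell relative to the DTMOS cell."""
    dtmos_ns, mos_ns = WRITE_DELAY_NS[vdd_mv]
    return mos_ns / dtmos_ns


def truncated_ratio_percent(vdd_mv):
    """Ratio truncated to two decimals, expressed as an integer percentage."""
    return math.floor(mos_to_dtmos_ratio(vdd_mv) * 100)
```

The ratio is largest at the lowest supply voltage in the table, consistent with the remark that the choice of VDD affects how much speed advantage DTCMOS offers.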
4.6 Full Adder 
Figure 4.17: Transistor level schematic of static CMOS full adder
The standard 28-transistor static CMOS full adder cell [98]-[100] shown in
Figure 4.17 has been simulated with a PMOS-to-NMOS width ratio (Wp/Wn) of unity
for all transistors except those in the inverters, where the (Wp/Wn) ratio is 1.5.
A buffer has also been added before each input to make the test more realistic. The
adder is simulated at a frequency of 100 kHz, a temperature of 50°C and VDD ranging
from 150 mV to 350 mV with a load capacitance of 100 fF for both schemes. The delay,
power and PDP variations with supply voltage are plotted in Figures 4.18, 4.19 and 4.20
respectively. In the deep subthreshold region, the delay of the DTCMOS full adder is much
smaller than that of the CMOS full adder, while its power dissipation is comparable.
Hence, the DTCMOS full adder maintains a lower PDP in the subthreshold region.
Figure 4.18: Delay variation with supply voltage for a static CMOS adder (plot of delay vs. VDD from 150 mV to 350 mV for CMOS FA and DTCMOS FA)
Figure 4.19: Power variation with supply voltage for a static CMOS adder (plot of power vs. VDD from 150 mV to 350 mV for CMOS FA and DTCMOS FA)
Figure 4.20: PDP variation with supply voltage for a static CMOS adder (plot of PDP vs. VDD from 150 mV to 350 mV for CMOS FA and DTCMOS FA)
Figure 4.21: Delay variation with temperature for a static CMOS adder (plot of delay vs. temperature from 25°C to 125°C for CMOS FA and DTCMOS FA)
The effect of temperature variations is analyzed by simulating the two structures
from 50°C to 125°C at a frequency of 100 kHz with a VDD of 250 mV. The variation in
delay with temperature for the adder is plotted in Figure 4.21. It is clear from Figure
4.21 that the sensitivity of delay to temperature is lower for DTCMOS
than for CMOS for this block as well.
The effect of process variations on the CMOS and DTCMOS full adders is
analyzed by varying the threshold voltage by 10% in both directions from its
customized value.
The responses of the CMOS and DTCMOS full adders are shown in Figure
4.22. The results show that the delay of the DTCMOS full adder is less sensitive to
process variation than that of the CMOS full adder. Hence, the DTCMOS full adder
maintains stable operation over a wide range of threshold voltages.
The effect of channel length on the adder is studied by simulating the DTCMOS
adder at a high frequency of 2 MHz without any load, for supply voltages down to 150 mV.
Similar results are obtained at other frequencies and with load capacitances, and also for
the CMOS adder. The PDP of the adder is minimum at a channel length of 24 nm, except
for a supply voltage of 250 mV, as shown in Figure 4.23. The channel length for minimum
PDP at a 250 mV supply voltage is 26 nm.
[Plot: Delay vs. percentage Vth0 variation; series: CMOS FA, DTCMOS FA]
Figure 4.22: Delay with threshold voltage variation for a static CMOS adder 
[Plot: PDP vs. channel length (nm) for VDD = 0.15 V, 0.2 V, 0.25 V and 0.3 V]
Figure 4.23: PDP variation with channel length for a static CMOS adder 
The common trend between the two blocks (full adder and 4-input LUT) is that
the channel length for minimum PDP is around 2 to 8nm longer than the minimum
channel length of 22nm for both schemes. It has been observed that the channel length
for minimum PDP varies with the scheme, supply voltage and frequency of operation.
4.7 Summary 
Subthreshold logic can be easily implemented and derived from traditional
existing circuits by lowering the supply voltage below the threshold voltage of the
transistor. The building blocks of reconfigurable hardware, such as the LUT and the one-bit full
adder cell, are implemented and their performance in terms of delay, power and PDP is
estimated in the subthreshold region. The sensitivities of the two schemes to
process parameter variations are also explored, and it has been found that DTCMOS
shows superior robustness and tolerance to temperature and process variations, besides
having lower delay than CMOS for the above blocks at the 22nm node.
Similarly, the DTCMOS based 8T cell also shows improved performance in read
SNM and write delay with a minimal penalty in leakage consumption. However,
DTCMOS can only be implemented in a triple well process technology. The additional
increase in area and process complexity for DTCMOS is compensated by its higher
operating frequency while maintaining a lower PDP than CMOS in the deep subthreshold
region. The performance of the LUT and one-bit full adder blocks can be improved by
choosing a channel length 2 to 8nm longer than the minimum channel length of 22nm. The
advantages of subthreshold operation include improved gain and noise margin,
and better energy efficiency than standard CMOS at low frequencies of operation.
However, due to their slow performance, subthreshold circuits are limited to
applications where ultra-low power is the main requirement and performance is of
secondary importance. This investigation will go a long way in the design of
subthreshold FPGAs catering to the ultra low power market segment. The next
chapter gives an overview of Carbon Nanotube Field Effect
Transistor (CNFET) technology and its modeling.
Chapter 5 
CARBON NANOTUBE FIELD EFFECT
TRANSISTOR TECHNOLOGY 
5.1 Introduction 
Carbon is a Group IV element with four valence electrons. This gives it the 
unique property to combine with a large variety of substances. Common forms in 
nature start with carbon polymers, which are the backbone of organic molecules and 
give rise to the term carbon-based life. Planar hexagonal rings bonded with sp2 
hybridization form graphene, with π-bonded layers of graphene stacking to form
graphite. Given enough pressure and time, these loosely bonded graphene sheets can 
be forced to form sp3 hybridizations, which yields diamond. These substances have 
been known for millennia because they can form in regular, macroscopic quantities. 
Recently, the underlying physics, imaging technologies and fabrication methods have 
advanced to where even smaller forms of carbon can be studied [101]. The progenitor 
of exotic molecular carbon forms is the carbon-60 (C-60) or Buckminsterfullerene 
[102]. Named after an architect who popularized the geodesic dome, a C-60 molecule, 
which is formed from 60 carbon atoms in a regular hexagon/pentagon pattern (much 
like a soccer ball) is shown in Figure 5.1. Researchers are finding uses for C-60 in areas 
ranging from drug delivery to mechanical lubricants. So important was this 1985 
discovery that it merited the 1996 Nobel Prize in Chemistry [103]. 
Figure 5.1: (a) Layers of sp2 bonded graphene and (b) C-60 molecule 
5.2 Carbon Nanotubes 
While experimenting with an arc discharge process expected to yield C-60
molecules, Sumio Iijima noticed high aspect ratio carbon filaments in the resulting soot
[104]. Ranging from 4 to 30nm in diameter, they were too small to be regular carbon
fibers, and later tests would show that they were formed as if sheets of graphene had been
rolled into tubes. They were initially dubbed helical microtubules of graphitic carbon, but the name
carbon nanotube (CNT) was soon adopted and an entirely new branch of nanoscience
was formed.
The initially discovered carbon nanotubes were multi-walled, with smaller 
carbon nanotubes nested inside larger ones at a separation about the same as between 
the sheets of graphite (0.34nm) with as many as 50 nested carbon nanotubes. These 
multi-walled nanotubes had interesting properties but were difficult to characterize 
because there was no method for analyzing the individual shells. 
Guided by theoretical predictions that a single wall carbon nanotube might be a
molecular one dimensional quantum wire, Iijima refined his fabrication method until
enough single wall carbon nanotubes could be produced. These single walled tubes are
the focus of much of the current investigation in carbon nanotubes, which have 
proposed applications in field emission, current transmission, genetic probes and nano-
electro-mechanical systems to name a few. 
The carbon nanotube's unique structure gives it some remarkable electrical
properties. The carbon atoms are each bonded to three other carbon atoms with sp2
bonds, which are even stronger than the sp3 bonds in diamond. This makes atomic
electromigration extremely difficult and means that carbon nanotubes are extremely
stable structures that are resistant to damage from high currents [105].
A carbon nanotube is very close to a one dimensional system of electrons, which
gives rise to many unique electrical and thermal properties. Since electrons can move in
one dimension only, the phase space for scattering in a nanotube is very limited
(electrons can scatter only backward). The mean free path in a nanotube, therefore, is in
the micron range. This long mean free path might also mean that both metallic and
semiconducting nanotubes can support ballistic transport.
5.2.1 Properties of Single-Walled Carbon Nanotubes
A single-walled carbon nanotube is best characterized by its chirality or chiral
vector. The chirality of a nanotube determines its properties and diameter [106]-[108].
The chirality is represented by a pair of indices (n, m) called the chiral vector. The
chiral vector is a line that traces the CNT around its circumference from one carbon
atom, called the reference point (O), back to itself (point C) as shown in Figure 5.2
[109]-[110]. The circumference described by the line OC is represented by the
following mathematical expression,
$\mathbf{C} = n\mathbf{a}_1 + m\mathbf{a}_2$ (5.1)
where $\mathbf{a}_1$ and $\mathbf{a}_2$ are the unit vectors of the graphene hexagonal structure, and
(n, m) are integers that represent the number of hexagons from the reference point (O)
to point (C) in the $\mathbf{a}_1$ and $\mathbf{a}_2$ directions, respectively.
Figure 5.2: Schematic representation of a chiral vector in the crystal lattice of carbon nanotubes 
The diameter of a CNT given by [111]-[112] is
$d = \dfrac{a\sqrt{n^2 + m^2 + nm}}{\pi}$ (5.2)
where 'a' is the lattice constant (2.49 Å).
In Figure 5.2, two wrapping angles have been defined, namely 'θ' and 'φ'. The
angle θ is the chiral angle, defined as the angle between the zigzag axis and the chiral
vector. The chiral angle is calculated using the formula,
$\theta = \sin^{-1}\left(\dfrac{\sqrt{3}\,m}{2\sqrt{n^2 + m^2 + nm}}\right)$ (5.3)
The angle φ is defined as the angle between the armchair axis and the chiral
vector. Using the (n, m) indices and the chiral angle, carbon nanotubes can be classified
into three groups [113], namely armchair nanotubes for n = m, with a chiral angle θ = 30°,
zigzag nanotubes for n = 0 or m = 0 with θ = 0°, and chiral nanotubes for any other
combination with 0° < θ < 30°. Furthermore, the n, m integers also determine whether a
CNT is metallic or semiconducting: when n − m = 3l ('l' being an integer), the nanotube
is metallic, and when n − m ≠ 3l, the nanotube is semiconducting with an energy gap
depending on its diameter. The energy gap of a CNT given by [114] is
$E_g = 2V_{pp\pi}\,a_{C-C}/d$ (5.4)
where '$V_{pp\pi}$' is the carbon-carbon (C-C) tight binding overlap energy (3.033
eV), '$a_{C-C}$' is the nearest neighbor distance between C-C bonds (0.144 nm) and 'd' is the
diameter of the carbon nanotube.
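As a worked illustration of equations (5.2) to (5.4), the following minimal Python sketch computes the diameter, chiral angle, type and energy gap of a nanotube from its (n, m) indices. The function name `cnt_properties` and its return layout are assumptions made here for illustration. The chiral angle is taken in the form θ = sin⁻¹(√3·m / (2√(n² + m² + nm))), which gives 0° for zigzag (m = 0) and 30° for armchair (n = m) tubes. The constants are those quoted in the text.

```python
import math

A_LATTICE_NM = 0.249  # lattice constant a = 2.49 Angstrom (from the text)
V_PP_PI_EV = 3.033    # C-C tight binding overlap energy in eV (from the text)
A_CC_NM = 0.144       # nearest neighbor C-C distance in nm (from the text)


def cnt_properties(n, m):
    """Hypothetical helper: properties of an (n, m) tube, assuming n >= m >= 0, n > 0.

    Returns (diameter_nm, chiral_angle_deg, kind, is_metallic, gap_ev).
    """
    root = math.sqrt(n * n + m * m + n * m)
    diameter = A_LATTICE_NM * root / math.pi                       # eq. (5.2)
    theta = math.degrees(math.asin(math.sqrt(3) * m / (2 * root)))  # eq. (5.3)
    if n == m:
        kind = "armchair"
    elif m == 0:
        kind = "zigzag"
    else:
        kind = "chiral"
    is_metallic = (n - m) % 3 == 0                                  # n - m = 3l
    gap = 0.0 if is_metallic else 2 * V_PP_PI_EV * A_CC_NM / diameter  # eq. (5.4)
    return diameter, theta, kind, is_metallic, gap


if __name__ == "__main__":
    for indices in [(13, 0), (19, 0), (26, 0), (9, 0), (5, 5)]:
        d, theta, kind, metallic, gap = cnt_properties(*indices)
        print(indices, f"d={d:.2f} nm", f"theta={theta:.1f} deg", kind,
              "metallic" if metallic else f"Eg={gap:.2f} eV")
```

For the zigzag tubes (13, 0), (19, 0) and (26, 0), the sketch gives diameters of about 1.0, 1.5 and 2.1 nm, close to the values of 1 nm, 1.5 nm and 2 nm quoted for these chiralities in Section 5.6.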
5.3 Fabrication of Carbon Nanotubes 
Carbon nanotubes are monolayer sheets of graphite, with atoms arranged in a
hexagonal grid pattern, which have been rolled into a tube. They can be single-walled
(single cylinder) nanotubes (SWNT) [115] or can have tubes of varying diameter nested within
one another, as in the case of multi-walled nanotubes (MWNT). Although most methods of
fabrication produce multi-walled tubes, single-walled tubes can be preferentially
produced through the addition of a catalyst in the fabrication process. The graphene
grid structure can either align itself with the tube axis or twist with respect to it.
This variation in alignment of the grid, in addition to the tube diameter, determines the
electrical properties, such as whether the tube will behave as a metal or as a
semiconductor. There are four ways to fabricate CNTs: arc discharge, laser
ablation, chemical vapor deposition and flame synthesis.
5.3.1 Arc Discharge 
Arc discharge is the earliest method implemented to create nanotubes [116]-[117].
It involves vaporizing carbon, which re-forms as tubes called nanotubes. Figure 5.3
shows a schematic of the inside of the chamber where the tubes are made. The apparatus is kept
in a helium atmosphere at 50-760 torr. A bias voltage of about 10-30V is applied. The
anode, which is composed of amorphous carbon, is brought in contact with the cathode
and then removed. This results in a discharge of about 50-300 amps. This discharge
vaporizes part of the anode and the helium quenches the vapor to form nanotubes, which
are collected at the end of the anode.
[Schematic labels: graphite rod, MWNT, cathode, anode]
Figure 5.3: Schematic of MWNT production using Arc Discharge 
[Schematic labels: transition metals (Ni, Co, Fe, Y), SWCNT]
Figure 5.4: Schematic of SWNT production using Arc Discharge 
The growth rate is directly proportional to the helium pressure. This technique
alone creates only multi-wall carbon nanotubes (MWNTs). A catalyst is required to create
single-wall carbon nanotubes (SWNTs). The most common is Ni:Y; other catalysts are
Fe, Co, and Ni. The main requirement for a catalyst is that it should be a poor carbide
former. Figure 5.4 shows the setup used to create SWNTs. When the discharge occurs,
both the catalyst and the carbon vaporize. The carbon vapor is absorbed by the catalyst,
which forms a liquid alloy with the carbon. A layer of carbon is formed
around the catalyst particles. Eventually, a saturation point is reached and carbon is
precipitated out in the form of SWNTs. During the entire time, carbon is continuously
being absorbed, allowing the nanotube to grow. This process is called vapor-liquid-solid
growth. This method is the cheapest; however, it results in contaminated nanotubes that
require cleaning. The soot that is produced with the nanotubes needs to be removed
before use.
5.3.2 Laser Ablation 
[Schematic labels: furnace at 1200 °C; argon gas; water-cooled copper collector; neodymium-yttrium-aluminium-garnet laser; graphite target; nanotubes growing along tip of collector]
Figure 5.5: Schematic of SWNT production using Laser Ablation 
As shown in the setup of Figure 5.5, a 10 Hz pulsed laser is focused onto a
graphite-catalyst target in a quartz tube, which is located in a tube furnace. The laser
vaporizes part of the target and produces nanotubes by vapor-liquid-solid (VLS)
growth. In order for VLS growth [118]-[119] to occur, the oven temperature must be
chosen such that the catalyst-carbon mixture is in a liquid alloy state while the pure
carbon can remain solid and precipitate out. The nanotubes are then swept along by flowing
inert gas (argon) and collected on a cold finger. Laser ablation grows only SWNTs.
The SWNTs produced are highly pure and therefore minimal cleaning is required. This
method is very expensive due to the requirement of the laser.
5.3.3 Chemical Vapor Deposition 
Chemical vapor deposition (CVD) [120]-[121] is the easiest method; it
involves layering a catalyst and then having a hydrocarbon gas react with it to
grow nanotubes. Figure 5.6 shows the setup used. The sample, in the form of a
substrate layered with catalyst particles, is placed at the center of the tube and heated to
around 720°C. The hydrocarbon gas is then allowed to flow through the tube. The gas
reacts catalytically and grows the nanotubes. CVD produces the cleanest nanotubes and
hence no cleaning is required. In addition, the process is very familiar to current IC
foundries, so it would be the easiest to scale up to industrial production.
[Schematic labels: quartz tube, sample, oven at 720 °C]
Figure 5.6: Schematic of Chemical vapor deposition Process 
5.3.4 Flame Synthesis 
The work in [122] has shown that flame synthesis is an inexpensive, large scale
method to produce single-walled carbon nanotubes. In a flame synthesis process, the
combustion of hydrocarbon fuel produces enough heat to establish the
required temperature environment for the process and to form small aerosol metal
catalyst islands. Single-walled carbon nanotubes are grown on these catalyst islands in
the same way as in the laser ablation and arc discharge processes [123]-[124].
5.4 CNFET Device Structure 
[Schematic labels: nanotube, gate oxide, silicon dioxide, silicon wafer]
Figure 5.7: Schematic of a CNFET 
Sections 5.4 and 5.5 review the existing literature on carbon nanotube field
effect transistors (CNFETs). Recent CNFETs have metal carbide source/drain
contacts and a top gated structure with thin gate dielectrics [125], as shown in Figure 5.7.
The working principle of the CNFET is similar to that of the traditional silicon
device (MOSFET). This three/four terminal device consists of a semiconducting
nanotube, acting as a conducting channel, bridging the source and drain contacts. The
device is turned on or off electrostatically via the gate. Based on its operation, a
CNFET can be categorized as either Schottky Barrier (SB-CNFET) or MOSFET-like
(MOS-CNFET) [126]-[127]. The SB-CNFET is a tunneling device that works on the
principle of direct tunneling through a Schottky barrier (SB) at the source-channel
junction. This transistor is shown in Figure 5.8. The barrier width is modulated by the
application of gate voltage and therefore the transconductance of the device is
dependent on the gate voltage. These devices are fabricated using direct contact of the
metal with the semiconducting nanotube, and consequently they have an SB at the
metal-nanotube junction [126]-[127]. Two important aspects of these nanotube
transistors are:
I) The energy barrier at the SB severely limits the transconductance of the
nanotubes in the ON-state and therefore reduces the current delivering capability
of the transistor.
II) The SB-CNFETs exhibit ambipolar characteristics in their current-voltage
behavior, and this constrains the use of these transistors in conventional CMOS
logic families.
[Schematic labels: gate, source (metal), drain (metal), CNT channel (intrinsic or p-type)]
Figure 5.8: Schematic of SB-CNFET 
[Schematic labels: gate, source (n+), drain (n+), CNT channel (intrinsic or p-type)]
Figure 5.9: Schematic of MOS-CNFET 
In contrast, the source and drain terminals of the MOS-CNFET are heavily doped,
like those of a MOSFET, as shown in Figure 5.9. This device operates on the principle of
modulating the barrier height by application of the gate voltage [126]. The drain current is
controlled by the magnitude of the charge that is induced in the channel by the gate terminal.
The authors in [126] have presented numerical studies on MOS-CNFETs, which show that:
I) MOS-CNFETs have unipolar characteristics, unlike the ambipolar SB-CNFETs.
II) The absence of the SB reduces the OFF state leakage current.
III) They are more scalable than their SB counterparts.
IV) In the ON-state, the source-to-channel junction has no SB, and hence the device
demonstrates significantly higher ON-state current.
The ac performance of the SB-CNFET is poor due to the proximity of the gate electrode to
the source/drain metal. The ambipolar behavior of the SB-CNFET also makes it
undesirable for complementary logic design. Considering both the fabrication
feasibility and the superior device performance of the MOS-CNFET as compared to the
SB-CNFET, only MOS-CNFETs are used in this work.
5.5 Model Overview 
A circuit-compatible and compact model of CNFET from Stanford University 
[112], [128]-[129] is used here for simulation. It provides improved accuracy by 
accounting for several practical nonidealities, such as scattering, effects of the doped
source/drain extension region, and inter-CNT charge screening effects. In addition, by 
including a full transcapacitance network, it produces better predictions of the dynamic 
performance and transient response. Figure 5.10 shows the modeling of this device. 
The semiconducting intrinsic CNT region under the gate acts as the channel and 
the heavily doped CNT regions outside the gate form the source/drain extension 
regions. 
The model is organized hierarchically into three levels [128], each dealing with
a different section of the CNFET, as described here.
The first level (L1) describes the intrinsic CNT region that forms the channel
under the gate. Several non-idealities, such as near ballistic transport and parasitics, are
taken into account here. The second level (L2) includes the source/drain extension
regions with their parasitic resistances and capacitances. The third level (L3) describes
the modeling of multiple CNTs under the same gate and accounts for CNT-to-CNT
charge screening effects.
The following sections describe the three levels of the model. 
[Schematic panels: (a) Level 1 model (L1), intrinsic CNT channel on substrate; (b) Level 2 model (L2), doped CNT on substrate; (c) Level 3 model (L3), multiple CNTs]
Figure 5.10: Three-level hierarchy of CNFET model 
5.5.1 CNFET Model Level 1 (L1)
L1 has two main parts: it includes 3 dependent current sources to model the
intrinsic channel current, and a 5-capacitor transcapacitance network to
model the AC response. The equivalent circuit of the Level 1 model is shown in Figure
5.11 [112].
CNFET L1 models the intrinsic CNT channel current by considering three contributing
sources: (1) thermionic current contributed by the semiconducting subbands (Isemi),
(2) band-to-band tunneling current through the semiconducting subbands (Ibtbt) and
(3) current contributed by the metallic subbands (Imetallic).
Each of these contributions is discussed below.
[Schematic labels: gate, drain, source, substrate (Sub), Cbs]
Figure 5.11: Equivalent circuit for the level 1 model (L1)
For a MOS-like n-type CNFET, the hole current is usually negligible compared
to the electron current for the semiconducting subbands. This is because in the heavily
doped source/drain extension regions (n-type), the hole carrier density is negligible
compared to the electron carrier density. The opposite is true for p-type CNFETs. Thus,
to simplify computation, the model only includes the electron current for calculating Isemi.
Here the model equations and derivations for an n-type CNFET are considered. The total
Isemi is the sum of the current components flowing from the drain to the source (+k
components) minus the current components flowing from the source to the drain (−k
components), summed over all subbands and substates, as given by equation (5.5).
$I_{semi}(V_{ch,DS}, V_{ch,GS}) = 2\sum_{k_m}^{M}\sum_{k_l}^{L}\left[T_{LR}\,J_{m,l}(0, \Delta\Phi_B)\big|_{+k} - T_{RL}\,J_{m,l}(V_{ch,DS}, \Delta\Phi_B)\big|_{-k}\right]$ (5.5)
where Vch,DS and Vch,GS denote the Fermi potential differences near the source
side within the channel. TLR and TRL are the transmission probabilities of the carriers at
the substate (m, l) in the +k branch and −k branch, respectively. The factor of 2 is due to the
double-degeneracy of the subband. M is the number of subbands and L is the number of
substates. km denotes the wavenumber of the m-th subband in the circumferential
direction and kl denotes the wavenumber of the l-th substate in the axial direction.
Jm,l is the current contribution from the (m, l) substate and ΔΦB is the change in the
channel surface potential due to an applied bias.
[Schematic labels: Vch,S'S, βCc, Vch,D'S, Vch,BS]
Figure 5.12: Electrostatic capacitor model to calculate the channel surface potential change ΔΦB
The Stanford CNFET model finds ΔΦB by applying the law of charge
conservation to the electrostatic capacitances. Figure 5.12 shows the electrostatic
capacitance model superimposed on the energy band diagram for a CNFET. Vch,GS
(Vch,BS) is the potential difference from the gate (bulk/substrate) to the source-side
channel region, and Vch,S'S (Vch,D'S) is the potential difference from the external source
(drain) outside the channel region to the source-side channel region. Cox is the
electrostatic coupling capacitance from the gate to the channel; Csub is the capacitance
between the channel and the substrate.
(1 − β)Cc and βCc are the coupling capacitances from the channel to the source
and drain respectively. β and Cc are fitting parameters in the model. Charge
conservation then dictates that the charge induced by the electrodes (Qcap) is equal to
the charge induced on the CNT surface (QCNT):
$Q_{cap} = Q_{CNT}$ (5.6)
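$Q_{cap} = C_{ox}(V_{ch,GS} - V_{FB}) + C_{sub}V_{ch,BS} + \beta C_c V_{ch,D'S} + (1-\beta)C_c V_{ch,S'S} - (C_{ox} + C_{sub} + C_c)\dfrac{\Delta\Phi_B}{q}$ (5.7)
$Q_{CNT} = \dfrac{4q}{L_g}\sum_{m=m_0}^{M}\sum_{l=0}^{L}\left[\dfrac{1}{1 + e^{(E_{m,l} - \Delta\Phi_B)/kT}} + \dfrac{1}{1 + e^{(E_{m,l} - \Delta\Phi_B + qV_{DS})/kT}}\right]$ (5.8)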
where m0 is 1 for semiconducting CNTs, and 0 for metallic CNTs (i.e., the metallic
subband is not included for semiconducting CNTs and is included for metallic CNTs). 
[Schematic labels: Vdrain, ΔΦB]
Figure 5.13: Equation solver circuit used to implicitly solve for ΔΦB
ΔΦB is computed iteratively from equations (5.6) to (5.8), using constructs available
in circuit simulation tools such as Verilog-A (and HSPICE). The technique is illustrated
in Figure 5.13, which uses an "Equation Solver Circuit" to solve for ΔΦB. Two voltage-dependent
current sources, with current expressions equal to the expressions for Qcap
and QCNT above, are placed in series. They are both dependent on a dummy voltage
which plays the role of ΔΦB in the equations. Rdummy, shown in Figure 5.13, is a very
large resistor used to aid convergence. The two current sources are forced to have equal
currents (and thus Qcap = QCNT), and Verilog-A (and HSPICE) will automatically and
iteratively calculate the value of the dependent voltage such that this condition is met.
The value of this voltage is precisely the value of ΔΦB.
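The implicit solution described above can be mimicked outside a circuit simulator. The following Python sketch is only an illustration under stated assumptions, not the Stanford model's implementation. It takes Qcap to be linear in ΔΦB with the capacitance terms of equation (5.7), takes QCNT to be a sum of Fermi occupations with an assumed prefactor 4q/Lg, and finds the ΔΦB that satisfies Qcap = QCNT by bisection instead of the equation solver circuit. All parameter values, the substate energies and the function names are hypothetical. Energies are expressed in eV, so ΔΦB/q in volts is numerically equal to ΔΦB in eV.

```python
import math

KT_EV = 0.02585      # thermal energy kT in eV near 300 K
Q = 1.602176634e-19  # elementary charge in coulombs

# Hypothetical parameter set (illustrative values, not taken from the thesis).
PARAMS = {
    "cox": 1.0e-10,   # gate-to-channel capacitance per unit length, F/m
    "csub": 2.0e-11,  # channel-to-substrate capacitance per unit length, F/m
    "cc": 5.0e-12,    # channel-to-source/drain coupling capacitance, F/m
    "beta": 0.5,      # fitting parameter splitting cc between drain and source
    "vfb": 0.0,       # flat-band voltage, V
    "lg": 32e-9,      # channel length, m
    "energies": [0.42, 0.45, 0.52, 0.84, 0.88],  # assumed substate energies, eV
}


def fermi(x):
    """Occupation 1 / (1 + exp(x)), guarded against overflow."""
    return 0.0 if x > 700.0 else 1.0 / (1.0 + math.exp(x))


def q_cap(dphi, vgs, vds, p, vbs=0.0, vss=0.0):
    """Charge induced by the electrodes; decreases linearly with dphi (eV)."""
    ctot = p["cox"] + p["csub"] + p["cc"]
    return (p["cox"] * (vgs - p["vfb"]) + p["csub"] * vbs
            + p["beta"] * p["cc"] * vds
            + (1.0 - p["beta"]) * p["cc"] * vss
            - ctot * dphi)


def q_cnt(dphi, vds, p):
    """Charge on the CNT; increases with dphi as more substates fill."""
    total = sum(fermi((e - dphi) / KT_EV) + fermi((e - dphi + vds) / KT_EV)
                for e in p["energies"])
    return 4.0 * Q / p["lg"] * total


def solve_dphi(vgs, vds, p, lo=-1.0, hi=2.0, iterations=100):
    """Bisection on q_cap - q_cnt, which is strictly decreasing in dphi."""
    def residual(x):
        return q_cap(x, vgs, vds, p) - q_cnt(x, vds, p)
    if residual(lo) <= 0.0 or residual(hi) >= 0.0:
        raise ValueError("root is not bracketed by [lo, hi]")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for vgs in (0.3, 0.6, 0.9):
        print(vgs, solve_dphi(vgs, 0.9, PARAMS))
```

Bisection is used here because the residual is monotonic in ΔΦB, so a bracketed root is unique; a simulator would instead reach the same fixed point through its own nonlinear solver.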
5.5.2 CNFET Model Level 2 (L2) 
The second level, L2, models the heavily doped source/drain extension region and
takes into account effects such as elastic scattering, the parasitic resistances and
parasitic capacitances of the source/drain extension
region, and the Schottky Barrier (SB) resistance of the source/drain metal contacts.
Elastic scattering is primarily due to scattering sites caused by imperfect CNT
fabrication.
5.5.3 CNFET Model Level 3 (L3) 
L3 models effects such as CNT-to-CNT charge screening and the gate-to-neighboring-contacts
parasitic capacitances; the model is shown in Figure 5.10(c).
Practical CNFETs must contain multiple CNTs per device in order to achieve sufficient
drive currents for reasonable speed. Thus the model allows multiple nanotubes and
accounts for the CNT-to-CNT charge screening effects present in these devices. In
the model, CNTs only experience charge screening from their immediate neighbors.
5.6 CNFET I-V Characteristics 
A typical layout of a MOS-CNFET device is illustrated in Figure 5.10 (c). One
or multiple devices can be fabricated along a single CNT, and multiple CNTs may be
placed under the same gate in order to improve the drive current. The CNT channel
region is undoped, and the other regions are heavily doped, acting as the
source/drain extension regions and/or as interconnects between two adjacent devices. In
this work, we consider a planar gate structure with multiple cylindrical conducting
channels and a high-κ gate dielectric material on a substrate with a different dielectric
constant (κ1 = 16 and κ2 = 3.9), as shown in Figure 5.10 (c). κ1 and κ2 are the
dielectric constants of the gate oxide and the insulating bulk oxide respectively. The
diameter of the cylinder (tube) is 'd'. The normal distance between the CNT center and
the gate (i.e. the thickness of the gate dielectric oxide) is denoted by 'h', and the distance between
the centers of two adjacent parallel CNTs is denoted by the pitch 'S'.
Due to the ultra long (~1 µm) scattering mean-free-path (MFP), ballistic or near-ballistic
transport can be obtained, and the quasi-1D structure provides better electrostatic control over the
channel region than 3D devices (e.g. bulk CMOS) and 2D devices (e.g. fully depleted
SOI). These properties make the CNFET one of the promising new devices in traditional
silicon technology [126-127]. To compare the performance of CNFET technology
with that of CMOS technology, the CNFET on-current (ION) can be derived as
$I_{ON} = n\,g_{CNT}\,(V_{DD} - V_{SS'} - V_{th})$ (5.9)
where $V_{SS'} = I_{ON} L_s \rho_s / n$.
The parameter 'n' is the number of CNTs per device, 'Vth' is the threshold
voltage of the semiconducting CNT, 'gCNT' is the transconductance per CNT, Ls is the
source length (doped CNT region), and ρs is the source resistance per unit length of
doped CNT. VSS' is the voltage drop between the inner source node S' and
the external source node S, as shown in Figure 5.14.
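Because VSS' itself depends on ION, equation (5.9) is implicit in ION. Substituting VSS' = ION·Ls·ρs/n and collecting the ION terms gives the closed form ION = n·gCNT·(VDD − Vth) / (1 + gCNT·Ls·ρs). The short Python sketch below evaluates this expression and checks it against the implicit form. The function names and all numerical values are hypothetical placeholders chosen for illustration, not values from this work.

```python
def on_current(n, g_cnt, vdd, vth, ls, rho_s):
    """Closed-form solution of eq. (5.9) after eliminating VSS'."""
    return n * g_cnt * (vdd - vth) / (1.0 + g_cnt * ls * rho_s)


def source_drop(i_on, n, ls, rho_s):
    """Voltage drop VSS' across the doped source extension."""
    return i_on * ls * rho_s / n


if __name__ == "__main__":
    # Hypothetical values: 3 tubes, 30 uS per tube, 32 nm source extension,
    # 3.3 kOhm/um source resistance per unit length.
    n, g_cnt, vdd, vth = 3, 30e-6, 0.9, 0.3
    ls, rho_s = 32e-9, 3.3e9
    i_on = on_current(n, g_cnt, vdd, vth, ls, rho_s)
    v_ss = source_drop(i_on, n, ls, rho_s)
    # The closed form must reproduce the implicit equation (5.9).
    implicit = n * g_cnt * (vdd - v_ss - vth)
    print(i_on, v_ss, abs(i_on - implicit))
```

The closed form makes the trade-off visible: the source extension resistance reduces the ideal on-current n·gCNT·(VDD − Vth) by the factor 1/(1 + gCNT·Ls·ρs), independent of the number of tubes.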
Figure 5.14: Schematic of N-channel CNFET
The simulated plots of Ids vs. Vds for the N-channel semiconducting
CNFET (N-CNFET) for chiralities (13, 0), (19, 0) and (26, 0), having diameters of 1nm,
1.5nm and 2nm respectively, are shown in Figures 5.15 to 5.17. The parameters used are
h=4nm, κ1=16, κ2=3.9, S=20nm, channel length Lg=32nm, and three tubes per device.
From Figures 5.15 to 5.17 it is clear that as the diameter of the CNT increases, the
drive current increases.
[Plot: Ids vs. Vds (V) for Vgs = 0.3 V, 0.6 V and 0.9 V]
Figure 5.15: Ids vs. Vds of N-CNFET for (13, 0) chirality
[Plot: Ids vs. Vds (V) for Vgs = 0.3 V, 0.6 V and 0.9 V]
Figure 5.16: Ids vs. Vds of N-CNFET for (19, 0) chirality
[Plot: Ids vs. Vds (V) for Vgs = 0.3 V, 0.6 V and 0.9 V]
Figure 5.17: Ids vs. Vds of N-CNFET for (26, 0) chirality
[Plot: Ids (log scale) vs. Vgs (V) for chiralities (19, 0), (16, 0) and (13, 0)]
Figure 5.18: Ids vs. Vgs of N-CNFET for different chiralities
Similarly, Figure 5.18 shows the Ids vs. Vgs (at Vds=0.9V) plot of the N-CNFET
for different chiralities, namely (13, 0), (16, 0) and (19, 0), having diameters of 1nm,
1.25nm and 1.5nm respectively. The plot shows that at Vgs=0V, as the diameter
is increased, the leakage (off-current) also increases. It is, therefore, very important to
select the optimum diameter for low power and moderate speed performance.
5.7 Performance Evaluation of Benchmark Circuits 
For performance comparison between bulk (CMOS) and CNFET transistors, a
benchmark circuit, a 5-stage ring oscillator [130], is implemented at the 32nm technology node.
The selected (W/L) ratios for the NMOS and PMOS transistors are 2 and 4 respectively.
Similarly, CNFETs of (19, 0) chirality, each with 3 nanotubes for both N and P CNFETs, are
considered here for comparison. The frequency and power consumption of the 5-stage
ring oscillator are depicted in Figures 5.19 and 5.20 respectively. It is found that at
VDD=0.9V, the CNFET based 5-stage ring oscillator provides 2.2X higher frequency
and consumes 47% less power than the bulk ring oscillator.
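The two figures quoted above can be combined into a relative energy per oscillation cycle, since energy per cycle is power divided by frequency. The small Python sketch below does this arithmetic using only the ratios stated in the text. The function name is a hypothetical one introduced for illustration, and the result is a derived estimate rather than a value reported in this work.

```python
def relative_energy_per_cycle(frequency_gain, power_saving):
    """Energy per cycle of the CNFET oscillator relative to bulk.

    frequency_gain: CNFET frequency divided by bulk frequency (2.2 in the text).
    power_saving: fractional power reduction of CNFET vs. bulk (0.47 in the text).
    """
    return (1.0 - power_saving) / frequency_gain


if __name__ == "__main__":
    ratio = relative_energy_per_cycle(2.2, 0.47)
    print(f"{ratio:.2f}")  # prints 0.24
```

Under these figures, the CNFET ring oscillator would spend roughly a quarter of the energy per cycle of the bulk oscillator at VDD=0.9V.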
[Plot: oscillation frequency vs. VDD (V); series: Bulk, CNFET]
Figure 5.19: Frequency of 5-Stage Ring Oscillator 
[Plot: power (log scale) vs. VDD (V), from 0.9 V down to 0.1 V; series: Bulk, CNFET]
Figure 5.20: Power Consumption of 5-Stage Ring Oscillator 
Similarly, to observe the driving capability and power consumption, an inverter
driving a load of four inverters, i.e. a fan-out of four (FO4) benchmark, is implemented in bulk and
CNFET transistors. Here each CNFET consists of 5 tubes, whereas the (W/L) ratios of the
bulk FO4 are kept as described above. The simulation is carried out at a
frequency of 500MHz with the supply voltage ranging from 0.9V to 0.4V.
Due to the higher mobility and ballistic transport of the CNFET, the FO4 implemented
with CNFETs has a higher speed, as shown in Figure 5.21. Similarly, the effective width of the
CNFET is very small, and therefore the switching power consumption of the CNFET FO4 is lower
than that of the bulk FO4, as shown in Figure 5.22. Due to the lower delay and power consumption, the
PDP of the CNFET FO4 is lower than that of the bulk implementation. The simulation results of
Figure 5.23 show that at VDD=0.9V and 0.4V, the PDP of the CNFET FO4 is 72% and
86% less than that of the bulk FO4. The leakage power of the CNFET FO4 is very small, as shown
in Table 5.1. This is due to the absence of dangling bonds in the CNFET; in addition, the use of
aqueous dielectrics gives the CNFET the opportunity for high-κ (HfO2) and electrolyte
gating.
[Plot: delay vs. VDD (V), 0.4 V to 0.9 V; series: Bulk, CNFET]
Figure 5.21: Delay vs. supply voltage of FO4 inverter
[Plot: power vs. VDD (V), 0.4 V to 0.9 V; series: Bulk, CNFET]
Figure 5.22: Power vs. supply voltage of FO4 inverter
Table 5.1: Leakage power (nW) comparison of CNFET and Bulk FO4
VDD (V)    Bulk (nW)    CNFET (nW)
0.9        54.9         0.92
0.8        35.7         0.81
0.7        24.2         0.71
0.6        15.6         0.60
0.5        10.3         0.50
0.4        6.5          0.40
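To make the leakage comparison of Table 5.1 concrete, the following Python sketch computes the bulk-to-CNFET leakage ratio at each supply voltage. The data are the values of Table 5.1; the dictionary and function names are hypothetical and used only for this illustration.

```python
# Leakage power in nW from Table 5.1: VDD (V) -> (bulk, CNFET)
LEAKAGE_NW = {
    0.9: (54.9, 0.92),
    0.8: (35.7, 0.81),
    0.7: (24.2, 0.71),
    0.6: (15.6, 0.60),
    0.5: (10.3, 0.50),
    0.4: (6.5, 0.40),
}


def leakage_ratio(vdd):
    """Bulk leakage divided by CNFET leakage at the given supply voltage."""
    bulk, cnfet = LEAKAGE_NW[vdd]
    return bulk / cnfet


if __name__ == "__main__":
    for vdd in sorted(LEAKAGE_NW, reverse=True):
        print(f"VDD={vdd} V: bulk/CNFET = {leakage_ratio(vdd):.1f}")
```

With these values the CNFET FO4 leaks roughly 16 times less than the bulk FO4 at 0.4V and roughly 60 times less at 0.9V, so the advantage widens as the supply voltage increases.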
[Plot: PDP vs. VDD (V), 0.4 V to 0.9 V; series: Bulk, CNFET]
Figure 5.23: PDP vs. supply voltage of FO4 inverter
5.8 Performance Evaluation of Basic Logic Element of FPGAs 
The reconfigurable element of an FPGA can implement virtually any digital logic 
function. The majority of FPGAs provide programmable logic using 4-input lookup 
tables (LUTs). A LUT is capable of implementing a combinational logic function. In 
order to support sequential logic, a flip-flop is placed at the LUT output. This 
combination is referred to as a basic logic element (BLE). For more details of the BLE, 
please refer to Figure 3.1 (b) of Chapter 3. To evaluate the performance benefit of 
CNFET for FPGAs, a BLE is implemented in bulk and in CNFET transistors. The (W/L) 
ratio of the NMOS pass transistors in the bulk LUT is taken as 2, whereas the 
NCNFET pass transistor is implemented using 15 CNTs of diameter 1.5nm. 
Similarly, the level restoring buffer of the respective BLE is sized up in terms of 
width and number of CNTs. A load capacitance of 1fF is connected at the output of the 
BLE and simulation is carried out at a frequency of 500MHz with the supply voltage 
ranging from 0.9V to 0.5V. As in the FO4 discussion above, similar trends in the delay 
and power of both BLEs are observed, as shown in Figures 5.24 and 5.25. 
As the parasitic load of the BLE circuit is high, the PDP decreases over the higher 
VDD range (from 0.9V to 0.6V); thereafter, at lower VDD (i.e. at 0.5V), the delay 
increases abruptly and the PDP also increases. This is due to the longer critical path of 
the 4-input LUT (i.e. the 16:1 multiplexer which is a sub-block of the LUT). 
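The structure of the BLE described in this section can be illustrated with a small behavioural sketch. It only shows how a 4-input LUT acts as a 16:1 multiplexer over its configuration bits, followed by an optional flip-flop; it is not the transistor-level circuit simulated in this work. The class name, the `use_ff` flag and the input bit ordering are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BLE:
    """Behavioural sketch of a basic logic element: 4-input LUT plus D flip-flop."""
    config: List[int]        # 16 configuration bits (the LUT truth table)
    use_ff: bool = False     # hypothetical flag: registered or combinational output
    q: int = 0               # current flip-flop state

    def lut(self, a: int, b: int, c: int, d: int) -> int:
        # The four inputs act as select lines of a 16:1 multiplexer whose
        # data inputs are the configuration bits (bit order is an assumption).
        index = (d << 3) | (c << 2) | (b << 1) | a
        return self.config[index]

    def clock(self, a: int, b: int, c: int, d: int) -> int:
        # On a clock edge the flip-flop captures the LUT output.
        self.q = self.lut(a, b, c, d)
        return self.q

    def output(self, a: int, b: int, c: int, d: int) -> int:
        return self.q if self.use_ff else self.lut(a, b, c, d)


def truth_table(fn: Callable[[int, int, int, int], int]) -> List[int]:
    """Build the 16 configuration bits for any 4-input Boolean function."""
    return [
        int(bool(fn(i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1)))
        for i in range(16)
    ]


# Example: a combinational 4-input XOR and a registered 4-input AND.
xor_ble = BLE(truth_table(lambda a, b, c, d: a ^ b ^ c ^ d))
and_ble = BLE(truth_table(lambda a, b, c, d: a and b and c and d), use_ff=True)
```

Because every input combination passes through the same 16:1 multiplexer tree, the LUT delay is set by that tree, which is consistent with the critical-path remark above.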
Figure 5.24: Delay vs. supply voltage of BLE (Bulk and CNFET) 
Figure 5.25: Power vs. supply voltage of BLE 
Figure 5.26: PDP vs. supply voltage of BLE (Bulk and CNFET; VDD = 0.5V to 0.9V) 
The optimum PDP for both implementations is obtained at VDD=0.6V. Owing to the 
lower power consumption and higher speed of the CNFET, the CNFET based BLE 
provides a 37% improvement in optimum PDP compared to the bulk BLE, as depicted in 
Figure 5.26. This shows that the CNFET holds a lot of promise as an alternative to the 
MOS transistor for implementing future low power FPGA logic blocks. 
5.9 Summary 
This chapter gives an overview of CNT technology along with the different 
fabrication techniques. The chapter also explains the modeling methodology of the 
MOS-CNFET. The MOS-CNFET model provides improved accuracy by accounting for several 
practical nonidealities, such as scattering, the effects of the doped source/drain 
extension region, and inter-CNT charge screening effects. The IV characteristics of 
CNFETs with different chiralities (diameters) are demonstrated and it is found that an 
appropriate selection of chirality is essential for low power and high speed 
performance. The performance analysis of benchmark circuits such as a 5-stage ring 
oscillator and an inverter driving an FO4 inverter is presented, and it has been found 
that the CNFET has a very high operating frequency and consumes very low power compared 
to bulk. Further, a low power basic logic element (BLE) block of FPGAs is explored and 
it is found that the CNFET based BLE provides a significant improvement in PDP over the 
bulk BLE. The next chapter is about the reduction of leakage in FPGA multiplexers by 
input vector control techniques. 
Chapter 6 
FPGA INTERCONNECT POWER REDUCTION 
BY INPUT VECTOR CONTROL 
6.1 Introduction 
The ability of FPGAs to implement a variety of circuits on a single chip results 
in significant spatial and temporal underutilization of logic and interconnect resources. 
This underutilization causes transistors to leak power in the absence of switching 
activity. The interconnect contributes nearly 70% of the total power, of which 35% is 
attributed to standby leakage [14]. Further, as we move to smaller node sizes, leakage 
will ultimately dominate the total power distribution [11]. Thus, the problem of leakage 
power management is particularly acute in the FPGA routing fabric. This chapter 
focuses on the reduction of leakage power in the interconnect switch matrix 
multiplexers. The analysis of multiplexers is carried out with varying sizes, topologies 
and transistor sizing at different temperatures and supply voltages at a deep submicron 
22nm technology node. 
Section 6.2 presents the background work on the leakage reduction of routing 
multiplexers. Section 6.3 presents the architecture of the target FPGAs. Section 6.4 
explains the dependence of leakage power in FPGAs on the input state and Section 6.5 
presents the leakage optimization of decoded and encoded 4:1 multiplexer switches 
and of a larger (16:1) hybrid multiplexer switch. 
6.2 Background Work 
The majority of the leakage reduction techniques for FPGAs in the literature, such as 
power gating, use of dual supply voltages, gate length biasing, body biasing, multiple 
threshold voltage transistors and redundancy in circuit design, have area and 
performance penalties [131]-[134]. These techniques can be employed only in critical 
battery operated applications where power is a major issue, at the cost of performance. 
The authors in [135] have managed to reduce leakage power of the utilized parts of an 
FPGA through exploiting a look-up table property that allows a signal to be 
interchanged with its complement. 
Input signal forcing is an effective leakage power reduction technique in FPGAs 
due to the use of pass-transistor logic in their design. The power dissipation is strongly 
state dependent in pass transistor logic. The authors in [136] proposed a completely new 
methodology, based on input pin reordering, to reduce the total leakage power 
dissipation in all components of the FPGA without any physical or performance penalties. 
Input reordering is used to place the logic and routing resources in their lowest leakage 
state. 
Input signal forcing techniques have also been used in [12] to reduce leakage power in 
FPGAs. Since the leakage current is heavily state dependent, by manipulating the 
inputs of some logic blocks, some FPGA parts can be placed in a low-leakage state. 
The authors in [12] demonstrated substantial leakage power saving without much area 
and performance penalty by applying the minimum leakage vector to the unused and 
partially used FPGA multiplexers. Moreover, it has also been observed that almost 90% 
of the routing multiplexers are unused in the targeted FPGA for benchmark designs. 
However, the authors have computed the minimum leakage vector by considering only 
the multiplexer instead of the combination of the multiplexer and the buffer circuits. It 
has been found that the leakage power of a stage depends on the output stage loading 
[137]; therefore, the power consumed by the multiplexer depends upon the power 
consumed in the output buffer stage, and hence the combination should be simulated 
to arrive at the minimum leakage vector. Moreover, the authors in [12] have also 
assumed that all the bits in the configuration memory corresponding to the unused 
portions of the FPGA are equal to '0'. It is possible to save leakage power considerably 
by manipulating not only the inputs to the routing multiplexers but also the select inputs 
fed from the configuration memory, as is evident from this work. 
6.3 Architecture of Target FPGAs 
The target FPGA architecture considered here is already presented in chapter 3 
(Figure 3.7). This chapter focuses on the leakage power reduction in the interconnect 
switch matrix multiplexers. The multiplexers in FPGA interconnects are implemented 
using NMOS transistor trees [135] as shown in Figure 6.1 and Figure 6.2. 
Figure 6.1 shows a 4:1 decoded multiplexer, which requires four configuration SRAM 
cells (R1-R4) for its operation. The input to output path consists of only one pass 
transistor. Figure 6.2 shows an encoded 4:1 multiplexer that requires only two 
configuration SRAM cells (R5-R6) but has a longer delay path of two pass transistors 
from input to output. A hybrid design comprising a combination of decoded and 
encoded multiplexers, shown in Figure 6.3 for a 16:1 multiplexer, is commonly used 
for multiplexers larger than 4:1 to achieve the best area-delay tradeoff [12]. The size of 
the multiplexer in an FPGA varies from 4:1 to 32:1 depending upon the length of the 
wire segment to be driven [135]. In this work, the larger size multiplexers are realized 
as a combination of decoded and encoded multiplexers. The maximum number of pass 
transistors in a chain from the input of the multiplexer to the buffer input is assumed to 
be three. When a logic '1' is passed through an NMOS-based multiplexer, a 
weak '1' (VDD-Vth) appears on the multiplexer's output. The weak '1' can cause 
excessive leakage power in the buffer attached to the multiplexer output and, therefore, 
a level restoring buffer is used at the output of every multiplexer. This work 
concentrates on reducing leakage power in the unused multiplexers as well as in the level 
restoring buffers of the interconnect switch matrix solely by the application of the 
minimum leakage vector. The unused multiplexer inputs can be set to the desired values by 
setting the output of the multiplexer or logic slice that is driving them. Most of the 
multiplexers in an FPGA are unused. The output of these multiplexers is pulled up in 
an FPGA mainly due to the presence of a feedback PMOS transistor [12]. 
Figure 6.1: 4:1 Decoded multiplexer (select bits R1-R4) 
Figure 6.2: 4:1 Encoded multiplexer (select bits R5, R6 and their complements R5B, R6B) 
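The difference between the two 4:1 structures can be sketched behaviourally. The following is a minimal illustration, not the simulated netlist: the function names, the assignment of R5 to the first level and R6 to the second level, and the example threshold voltage are assumptions. The SRAM cell counts and the number of series pass transistors are taken from the description above.

```python
def decoded_mux4(inputs, r):
    """4:1 decoded multiplexer: one-hot SRAM bits R1-R4, one pass transistor per path."""
    if len(inputs) != 4 or len(r) != 4:
        raise ValueError("expected four data inputs and four configuration bits")
    if sum(r) > 1:
        raise ValueError("two enabled pass transistors would short two inputs together")
    for value, enabled in zip(inputs, r):
        if enabled:
            return value
    return None  # no path enabled: the output node is not driven by any input


def encoded_mux4(inputs, r5, r6):
    """4:1 encoded multiplexer: two SRAM bits, two pass transistors in series per path."""
    if len(inputs) != 4:
        raise ValueError("expected four data inputs")
    # Assumed bit order: R5 picks within each input pair, R6 picks between the pairs.
    first_level = (inputs[1] if r5 else inputs[0], inputs[3] if r5 else inputs[2])
    return first_level[1] if r6 else first_level[0]


# Figures taken from the text: SRAM cells needed and pass transistors in the signal path.
MUX_COST = {
    "decoded": {"sram_cells": 4, "series_pass_transistors": 1},
    "encoded": {"sram_cells": 2, "series_pass_transistors": 2},
}


def weak_one_level(vdd, vth):
    """Voltage reached when an NMOS pass transistor passes a logic '1' (VDD - Vth)."""
    return vdd - vth
```

For instance, `weak_one_level(0.8, 0.3)` gives roughly 0.5V for a hypothetical 0.3V threshold, which is why a level restoring buffer is placed at the multiplexer output.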
6.4 Input-State Dependence of Leakage Power in FPGAs 
The leakage current has two major components namely subthreshold and gate 
leakage currents. Both of these currents strongly depend upon the values of the input 
signals. 
6.4.1 Subthreshold leakage 
The subthreshold leakage current is given by the following expression: 
$$I_{sub} = I_0 \, e^{\frac{V_{gs}-V_{th}}{nV_T}} \left(1 - e^{-\frac{V_{ds}}{V_T}}\right)\left(1 + \lambda V_{ds}\right) \qquad (6.1)$$ 
where I0 is a constant, VT is the thermal voltage, n is the subthreshold slope factor 
and λ is the channel length modulation parameter. It is clear that the subthreshold 
current depends on both the gate to source voltage (Vgs) and the drain to source voltage 
(Vds). Both these voltages should be kept small to achieve the minimum value of the 
subthreshold current. It is also evident from the expression that the subthreshold current 
increases exponentially with the reduction in the threshold voltage (Vth) in deep submicron. 
Hence, the subthreshold current can be reduced by increasing the threshold voltage. 
This can be achieved by increasing the bulk to source voltage, which indirectly 
increases the threshold voltage through the well known body effect. It 
means that the transfer of a logic '1' through a pass transistor results in lower leakage 
than that of a logic '0', because the transfer of '0' has a lower value of bulk to source 
voltage (Vbs). The subthreshold current also increases exponentially with the increase 
in temperature. All the multiplexers are simulated at two temperatures, 40°C and 
85°C, to investigate this effect. 
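The dependences discussed in this subsection can be checked numerically with a direct evaluation of Expression (6.1). This is a minimal sketch: the default values of `i0`, `n` and `lam` are hypothetical placeholders, not parameters of the 22nm model used in this work, and the sketch holds I0 and Vth fixed over temperature, so it captures only the thermal voltage part of the temperature dependence.

```python
import math

BOLTZMANN = 1.380649e-23            # J/K
ELECTRON_CHARGE = 1.602176634e-19   # C


def thermal_voltage(temp_c):
    """Thermal voltage V_T = kT/q in volts for a temperature in degrees Celsius."""
    return BOLTZMANN * (temp_c + 273.15) / ELECTRON_CHARGE


def subthreshold_current(vgs, vds, vth, temp_c, i0=1e-7, n=1.5, lam=0.05):
    """Evaluate Expression (6.1); i0, n and lam are illustrative values only."""
    vt = thermal_voltage(temp_c)
    return (
        i0
        * math.exp((vgs - vth) / (n * vt))    # exponential dependence on Vgs - Vth
        * (1.0 - math.exp(-vds / vt))         # vanishes when Vds = 0
        * (1.0 + lam * vds)                   # channel length modulation term
    )


# An OFF pass transistor (Vgs = 0) with full VDD across it, at both temperatures.
i_40 = subthreshold_current(vgs=0.0, vds=0.8, vth=0.3, temp_c=40)
i_85 = subthreshold_current(vgs=0.0, vds=0.8, vth=0.3, temp_c=85)
```

With these placeholder values, raising Vth or lowering Vds reduces the computed current, and the 85°C value exceeds the 40°C value, in line with the trends stated above.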
6.4.2 Gate leakage 
The continuous scaling of channel length requires corresponding scaling of the 
oxide thickness to counteract short channel effects. The reduction in oxide thickness 
results in a flow of tunneling currents between gate and source terminals and also 
between gate and drain terminals. The tunneling current depends exponentially upon 
the oxide thickness and the magnitude of Vgs and Vds. Hence, the gate leakage current 
is maximum when the gate voltage is different from the drain and source voltages of 
the pass transistor network. Gate leakage can be minimised by keeping identical values 
of all the three terminal voltages. Gate leakage current exists in both the ON and OFF 
states of the MOSFET contrary to subthreshold leakage which is only present in the 
OFF states. The minimum leakage state in a pass transistor network depends upon the 
relative strengths of subthreshold and gate leakage currents. The minimum leakage 
state is quite different at different technology nodes from 90nm to 22nm depending 
upon the relative strengths of the two leakage components and temperature. In [138], it 
is reported that the gate leakage power dissipation is less than 1/20 of the subthreshold 
leakage power dissipation in 90nm FPGAs at room temperature. Subthreshold leakage 
dominates at 90nm but gate leakage is increasing at a steeper rate than subthreshold 
leakage despite the use of high k material in deep submicron. Moreover, gate leakage is 
almost independent of temperature contrary to subthreshold leakage. Hence, there is a 
need to investigate the minimum leakage vector in the FPGA multiplexer exhaustively 
with respect to transistor sizing, voltage scaling and at different temperatures at a deep 
submicron technology node. 
6.5 Leakage Power in FPGA Multiplexer 
The leakage of an unused multiplexer is a strong function of its state. This 
chapter carries out a leakage analysis of several multiplexers with varying sizes, 
topologies and transistor sizing at different temperatures and supply voltages at a deep 
submicron 22nm technology node. The following sections cover the different 
multiplexers used in the analysis. 
6.5.1 4:1 Decoded Multiplexer 
Table 6.1 shows the different low leakage vectors obtained for minimum sized and 
optimum sized (Opt) 4:1 decoded multiplexers. The leakage power values reflect the 
power consumed in both the multiplexer and the level restoring buffer. The second 
stage of the level restoring buffer is sized three times larger than the first stage [135]. 
The optimum sized multiplexers are ten times wider than the minimum sized ones, for the 
best area-delay tradeoff in FPGA multiplexers [140]. This work compares three different 
low leakage vectors for a 4:1 decoded multiplexer. The first vector, referred to as 'D1', 
sets all the select lines and input lines of the multiplexer to '1'. The 
second vector, referred to as 'D2', applies '0' to the select lines and '1' to the input 
lines. The last vector resets to '0' all the select and input 
lines of the multiplexer along with its output by using a pull-down transistor at the 
output of the multiplexer along with one more level restoring transistor across the first 
inverter; this is referred to as the 'Tuan' approach [12]. It is clear from Table 6.1 that 
the 'D1' vector consumes the minimum leakage power for all sizes of multiplexers at both 
temperatures and both supply voltages. The leakage power saving of 'D1' with respect to 
'Tuan' is even larger at the lower supply voltage of 0.6V. 'D1' gives the best results because 
gate leakage in the multiplexer reduces considerably if all the terminals of the pass 
transistors are at the same potential. The threshold voltage also goes up due to the higher 
value of Vbs, contrary to 'Tuan', resulting in a further decrease of the drain to source 
current. Moreover, the power consumed in the more complex level restoring buffer of the 
'Tuan' approach is much higher than that of 'D1'. 
This work shows that a leakage power saving of more than 20% compared to the 
existing 'Tuan' approach is possible by setting all the inputs of the multiplexers to '1' 
rather than '0'. The proposed approach can be easily applied to the inputs of 
multiplexers by setting the output of the previous stage multiplexer or logic slice to '1', 
because of the presence of the level restoring buffer, contrary to the 'Tuan' approach, which 
requires additional hardware to reset the output to the '0' state. The configuration 
memory leakage power is negligible [70], [139] and, therefore, the memory can be set to the 
logic '1' state for the decoded multiplexer without leakage overhead. It is even 
possible to apply the 'D2' leakage vector to the minimum size multiplexer instead of 
'Tuan', achieving a slightly smaller leakage reduction than 'D1', but in this case the 
configuration memory content can remain equal to '0'. It is also clear from Table 6.1 that 
the 'D2' vector results in higher leakage current in the optimum sized multiplexers because 
of the high gate leakage and subthreshold currents due to the large transistor size in the 
presence of high Vds and Vgs, contrary to 'Tuan' and 'D1'. 
It should be noted that all the other leakage vectors consume more leakage power than 
these vectors. 
Table 6.1: Leakage Power of 4:1 Decoded Multiplexer (total leakage power in nW) 
          40°C                            85°C 
          0.8V           0.6V             0.8V           0.6V 
Vector    Min.   Opt.    Min.   Opt.      Min.   Opt.    Min.   Opt. 
D1        205    205     66     66        326    326     114    114 
D2        235    578     76     190       357    720     125    239 
Tuan      246    246     81     81        394    394     146    146 
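The relative saving of 'D1' over 'Tuan' can be recomputed from the entries of Table 6.1. The sketch below evaluates the difference under two normalisations, since the text does not state which one is used; that the quoted "more than 20%" corresponds to normalising by the 'D1' value is an inference from the numbers, not a statement from the source. The helper names are illustrative.

```python
# Table 6.1, total leakage power in nW. Column order:
# (40C, 0.8V, min), (40C, 0.8V, opt), (40C, 0.6V, min), (40C, 0.6V, opt),
# (85C, 0.8V, min), (85C, 0.8V, opt), (85C, 0.6V, min), (85C, 0.6V, opt)
TABLE_6_1 = {
    "D1":   [205, 205, 66, 66, 326, 326, 114, 114],
    "D2":   [235, 578, 76, 190, 357, 720, 125, 239],
    "Tuan": [246, 246, 81, 81, 394, 394, 146, 146],
}


def saving_percent(proposed, reference):
    """Reduction expressed as a percentage of the reference value."""
    return 100.0 * (reference - proposed) / reference


def excess_percent(proposed, reference):
    """Extra power of the reference, as a percentage of the proposed value."""
    return 100.0 * (reference - proposed) / proposed


pairs = list(zip(TABLE_6_1["D1"], TABLE_6_1["Tuan"]))
savings = [saving_percent(d1, tuan) for d1, tuan in pairs]
excess = [excess_percent(d1, tuan) for d1, tuan in pairs]

for s, e in zip(savings, excess):
    print(f"saving vs Tuan: {s:5.1f}%   Tuan excess over D1: {e:5.1f}%")
```

Normalised by the 'Tuan' value, the saving ranges from about 16.7% to 21.9%; normalised by the 'D1' value, it ranges from 20% to about 28.1%. In both cases the largest values occur at 0.6V, as noted in the text.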
6.5.2 4:1 Encoded Multiplexer 
Table 6.2 shows the different low leakage vectors obtained for a minimum sized and an 
optimum sized 4:1 encoded multiplexer. The first leakage vector ('E1') is obtained by keeping 
all input lines of the multiplexer at logic '1' and the inputs to the inverters at logic '0'. The 
second vector ('E2') is obtained by keeping all the inputs to the multiplexer and inverters at 
logic '1', and the third leakage vector ('Tuan') corresponds to all '0' inputs and a '0' 
multiplexer output. It is clear from Table 6.2 that the proposed leakage vector 
'E1' results in a leakage power saving of 11% to 43% compared to the existing 'Tuan' vector, 
depending upon the transistor size, temperature and supply voltage. The leakage power 
saving is higher for the larger transistor size because the gate leakage depends upon the 
transistor size and goes up substantially in the presence of higher Vgs and Vgd in the 
pass transistors of the multiplexer for 'Tuan', contrary to 'E1' and 'E2'. 
Table 6.2: Leakage Power of 4:1 Encoded Multiplexer (total leakage power in nW) 
          40°C                            85°C 
          0.8V           0.6V             0.8V           0.6V 
Vector    Min.   Opt.    Min.   Opt.      Min.   Opt.    Min.   Opt. 
E1        304    511     98     169       472    698     164    240 
E2        330    537     109    180       493    753     192    267 
Tuan      338    730     110    213       530    931     192    300 
6.5.3 Hybrid Large Size Multiplexers 
Table 6.3 shows the different low leakage vectors obtained for a minimum sized and an 
optimum sized 16:1 hybrid multiplexer, shown in Figure 6.3. Since the number of 
transistor levels is assumed to be three for these multiplexers, the decoded multiplexer 
stage varies in size from 2:1 to 8:1, whereas the encoded multiplexer stage has a fixed 
size of 4:1, for the realization of 8:1 to 32:1 large size multiplexers. This work compares 
four different low leakage vectors for the 16:1 multiplexer. Similar results are obtained 
for 8:1 and 32:1 multiplexers. The first vector ('H1') is obtained by keeping all inputs and 
select lines of the decoded multiplexers at logic '1' and the inputs to the inverters of the 
encoded multiplexer at logic '0'. The second vector ('H2') is obtained by keeping all the 
inputs to the multiplexers and inverters at logic '1'. The third leakage vector ('H3') is 
obtained by keeping all the multiplexer inputs at logic '1' except the select inputs of the 
decoded stage and the inverter inputs, which are kept at logic '0'. The last vector ('Tuan') 
corresponds to all inputs along with the multiplexer output equal to '0'. It is 
clear from Table 6.3 that the leakage power is minimum for our leakage vector 
'H1' for the 16:1 multiplexer as well, at the different supply voltages and temperatures. The 
leakage power saving is more than 18% for 'H1' over 'Tuan' for different transistor 
sizes, temperatures and supply voltages. This is due both to higher gate leakage power 
consumption in the encoded stage and to more leakage in the first inverter of the 
more complex level restoring buffer in the case of 'Tuan'. It is also evident from Table 6.3 
that the leakage vector 'H2' also consumes lower leakage power than 'Tuan'. Only the 
'H3' vector consumes more leakage power than 'Tuan', and only for optimum sized 
multiplexers, because of the large gate leakage current in the large size pass transistors of 
the multiplexer. It means that, to achieve the least leakage power in large multiplexers, all 
the inputs of the multiplexers should be at logic '1', including the select lines of the 
decoded stage, whereas the inverter inputs of the encoded stage must be kept at logic 
'0'. 
Figure 6.3: 16:1 Hybrid multiplexer (four 4:1 decoded multiplexers with select bits R1-R4) 
The authors in [12] have implemented a large number of benchmark circuits in 
an FPGA and have found that almost 92% of the multiplexers are dead multiplexers, i.e. 
neither the inputs nor the output of these multiplexers are used for these designs in the 
targeted Virtex-II FPGA. The authors have applied their minimum leakage vector 
technique, namely 'Tuan', to achieve a 70% leakage power reduction in these multiplexers 
with an overall FPGA leakage power saving of 38%. The leakage power in the dead 
multiplexers can be further reduced by more than 20% by applying the minimum 
leakage vectors investigated here instead of the 'Tuan' vector. Moreover, there is no area 
overhead, contrary to the 'Tuan' work. 
Table 6.3: Leakage Power of 16:1 Hybrid Multiplexer (total leakage power in nW) 
          40°C                            85°C 
          0.8V           0.6V             0.8V           0.6V 
Vector    Min.   Opt.    Min.   Opt.      Min.   Opt.    Min.   Opt. 
H1        322    515     98     170       496    727     164    250 
H2        347    542     109    182       552    783     192    278 
H3        382    1297    118    430       558    1577    186    525 
Tuan      381    666     121    204       583    942     204    298 
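Selecting the minimum leakage vector amounts to taking, for each operating condition, the candidate with the lowest total leakage. A minimal sketch using the entries of Table 6.3 follows; the dictionary layout and function names are illustrative choices, and the data are simply the table values.

```python
# Table 6.3, total leakage power in nW, keyed by (temperature C, VDD V, sizing).
TABLE_6_3 = {
    (40, 0.8, "min"): {"H1": 322, "H2": 347, "H3": 382, "Tuan": 381},
    (40, 0.8, "opt"): {"H1": 515, "H2": 542, "H3": 1297, "Tuan": 666},
    (40, 0.6, "min"): {"H1": 98, "H2": 109, "H3": 118, "Tuan": 121},
    (40, 0.6, "opt"): {"H1": 170, "H2": 182, "H3": 430, "Tuan": 204},
    (85, 0.8, "min"): {"H1": 496, "H2": 552, "H3": 558, "Tuan": 583},
    (85, 0.8, "opt"): {"H1": 727, "H2": 783, "H3": 1577, "Tuan": 942},
    (85, 0.6, "min"): {"H1": 164, "H2": 192, "H3": 186, "Tuan": 204},
    (85, 0.6, "opt"): {"H1": 250, "H2": 278, "H3": 525, "Tuan": 298},
}


def minimum_leakage_vector(row):
    """Return the name of the vector with the lowest leakage in one table row."""
    return min(row, key=row.get)


def conditions_worse_than(table, vector, reference="Tuan"):
    """List the operating conditions in which `vector` leaks more than `reference`."""
    return [cond for cond, row in table.items() if row[vector] > row[reference]]


best = {cond: minimum_leakage_vector(row) for cond, row in TABLE_6_3.items()}
h2_worse = conditions_worse_than(TABLE_6_3, "H2")
h3_worse = conditions_worse_than(TABLE_6_3, "H3")
```

For the tabulated data, 'H1' is selected in every condition, 'H2' never exceeds 'Tuan', and 'H3' exceeds 'Tuan' in all optimum sized cases. The same selection would have to be repeated at another technology node, because the minimum leakage state depends on the relative strength of gate and subthreshold leakage.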
6.6 Summary 
Leakage current also depends on the length and width parameters of the 
transistor. If the transistor becomes longer it will have a smaller leakage current, and if 
it becomes wider it will have a higher leakage current. Unlike total dynamic power 
consumption, total leakage current does not depend on the switching activity; it 
depends on the number of transistors that exist on the FPGA chip. The minimum 
leakage state depends heavily on the relative magnitudes of the subthreshold and 
gate leakage currents. Consequently, it is expected that the minimum leakage states will 
change for each technology. Finding the minimum leakage input pattern is therefore very 
important in minimizing the leakage current. 
Chapter 7 
ROBUST LOW POWER CNFET BASED 
6T SRAM CELL 
7.1 Introduction 
To achieve high integration density with every technology generation, CMOS 
semiconductor devices are scaled aggressively. This scaling requires a reduction in 
supply voltage and gate oxide thickness. Scaling of the supply voltage needs a proportional 
reduction in the threshold voltage of the transistor to maintain speed. The reduction in 
threshold voltage exponentially increases the subthreshold leakage. Similarly, scaling 
of the gate oxide thickness exponentially increases the direct tunneling current as well as 
the source-substrate and drain-substrate junction band-to-band tunneling current. Due to 
aggressive scaling, secondary effects and process variations, the power consumption 
and performance of the CMOS SRAM cell worsen in deep submicron [141]-[142]. It has, 
therefore, become difficult to design a low power, robust and compact SRAM cell in deep 
submicron. The carbon nanotube field effect transistor (CNFET) technology, with 
reduced process variation, better gate controllability, high thermal stability and high 
drive current, is a promising alternative to conventional bulk CMOS [143]. 
This chapter explores a low leakage and robust CNFET based 6T-SRAM cell 
and compares its performance with that of a conventional CMOS based cell. 
Simulations are performed for equal threshold voltage (Vth) and for cell ratio = pull-up 
ratio = 1 using the Berkeley predictive technology model [144] at the 32nm technology 
node. An experimentally validated Verilog-A model of the CNFET is used instead of 
an HSPICE macro model to speed up simulation [145], [128]-[129]. Considerable 
improvement in write margin (WM), write delay, standby leakage power consumption 
and speed has been achieved for the same cell sizing in the CNFET cell compared with the 
CMOS cell. Section 7.2 describes the CNFET. Section 7.3 compares the standby 
leakage power of the CNFET and CMOS cells. Section 7.4 demonstrates and compares 
different key performance parameters of the cells. Section 7.5 compares the effects of 
process parameter variations. 
7.2 Carbon Nanotube Field Effect Transistor 
The single walled carbon nanotube (SWNT) is a one-dimensional conductor that 
can be either metallic or semiconducting depending upon the arrangement of the carbon 
atoms. Both metallic and semiconducting nanotubes are attractive for a variety of 
applications. On one hand, the conductivity of metallic nanotubes and their robustness are 
attracting interest for their use as future interconnects. On the other hand, semiconducting 
nanotubes exhibit the desired properties for implementing field effect transistors and logic 
switches. The carbon nanotube FET (CNFET) has been covered in Chapter 5. 
The MOSFET-like CNFET (MOS-CNFET) [129], shown in Figure 7.1, is the 
device used in simulation. In the CNFET, the gate electrode is separated from the CNT by 
a high-K dielectric (HfO2). 
Figure 7.1: CNFET cross-section and related parameters (Lg, Wgate, LDD, LSS, high-K gate dielectric, CNTs, Csub) 
Despite the fact that CNT material enjoys very high carrier mobility, a single 
intrinsic CNT appears to have low current drive. Hence, an array of CNTs is used to 
provide a large current drive. The individual CNTs in an array are separated by a distance 
known as the pitch 'S' (center to center CNT distance). The drive current strongly depends 
upon 'S' and the ratio (t/r) [146], where 't' is the oxide thickness and 'r' is the radius of 
the CNT. Figure 7.2 shows the Ids vs. S plot, with Vds=Vgs=0.9V, for different (t/r) 
ratios of an N-channel CNFET. It is found that choosing a pitch larger than 2(t + r) with a 
small (t/r) ratio provides a larger gate to nanotube capacitance per unit area, i.e. a larger 
drive current. This is due to two reasons: 
1) As the oxide thickness is reduced, the charge distribution becomes tighter and the 
charge tends to accumulate just over the CNT channel. 
2) As the CNTs are brought closer and closer, the capacitance from the gate to each 
CNT channel decreases because each CNT can mirror a smaller amount of 
charge from the gate, which leads to a reduction in current per tube [147]-[148]. 
As depicted in Figure 7.2, the maximum current drive is obtained for the selected 
parameters (t=3nm, r=0.75nm, S=20nm, number of tubes=3 and LDD=LSS=32nm), 
i.e. t/r=4, where LDD/LSS is the length of the doped CNT drain/source side extension 
region. 
Figure 7.2: Ids vs. S for different (t/r) ratios for N-CNFET (inter-CNT pitch S = 5, 10, 15 and 20nm) 
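The pitch condition S > 2(t + r) and the (t/r) ratio quoted for the selected parameters can be checked with a few lines of arithmetic. This is only an illustration of the stated rule; the function name is hypothetical and the pitch values are those on the axis of Figure 7.2.

```python
def cnt_array_check(t_nm, r_nm, pitch_nm):
    """Return (t/r ratio, minimum pitch 2(t + r) in nm, whether the pitch exceeds it)."""
    ratio = t_nm / r_nm
    min_pitch_nm = 2.0 * (t_nm + r_nm)
    return ratio, min_pitch_nm, pitch_nm > min_pitch_nm


# Selected parameters from the text: t = 3nm, r = 0.75nm, S = 20nm.
ratio, min_pitch, ok = cnt_array_check(3.0, 0.75, 20.0)

# Pitch values on the axis of Figure 7.2, checked against the same rule.
pitch_ok = {s: cnt_array_check(3.0, 0.75, s)[2] for s in (5, 10, 15, 20)}
```

For t=3nm and r=0.75nm the ratio is 4 and the limit 2(t + r) is 7.5nm, so the selected 20nm pitch satisfies the condition, whereas a 5nm pitch would not.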
7.3 Standby Leakage Power 
Figure 7.3 shows the schematic of a 6T SRAM cell. In the standby mode, the 
word line is deactivated. Assume that the left side node stores a '0' and the right 
side node a '1'. The major sources of leakage in the various transistors in this mode are as 
follows: 
The access transistor 'MA1' and the pull down transistor 'MD2' contribute 
towards subthreshold leakage, whereas transistor 'MD1', pull up transistor 'MU2' and 
access transistor 'MA2' are responsible for gate tunneling leakage [149]-[150]. 
Table 7.1 shows that the leakage power of the CNFET cell is very small. This is 
due to the absence of dangling bonds in the CNFET; in addition, the use of aqueous 
dielectrics provides an opportunity for high-K (HfO2) electrolyte gating. This helps in 
achieving a very high insulator capacitance Cins, which in turn improves the gate control 
and also lowers the gate leakage. At operating temperatures of 27°C and 80°C, the standby 
leakage power 
consumption of CNFET cell is 84% and 40% less than that of CMOS cell respectively. 
Hence, CNFET will be the best choice for low leakage memory design. 
Figure 7.3: Schematic of a 6T SRAM cell (pull up MU1, MU2; pull down MD1, MD2; access MA1, MA2; bit lines BL, BLB; word line VWL) 
Table 7.1: Leakage power (nW) at different temperatures 
Temperature (°C)    CMOS Cell    CNFET Cell 
27                  2.60         0.42 
80                  3.96         2.36 
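The percentage reductions quoted for the standby leakage follow directly from Table 7.1 when the difference is normalised by the CMOS value. A short sketch of that arithmetic is given below; the dictionary and function names are illustrative.

```python
# Table 7.1: standby leakage power in nW, keyed by temperature in degrees Celsius.
LEAKAGE_NW = {
    27: {"CMOS": 2.60, "CNFET": 0.42},
    80: {"CMOS": 3.96, "CNFET": 2.36},
}


def reduction_percent(cmos_nw, cnfet_nw):
    """Leakage reduction of the CNFET cell as a percentage of the CMOS value."""
    return 100.0 * (cmos_nw - cnfet_nw) / cmos_nw


reductions = {
    temp: reduction_percent(row["CMOS"], row["CNFET"])
    for temp, row in LEAKAGE_NW.items()
}

for temp, value in reductions.items():
    print(f"{temp} C: CNFET cell leaks {value:.1f}% less than the CMOS cell")
```

The results round to 84% at 27°C and 40% at 80°C, matching the figures in the text; the advantage of the CNFET cell narrows considerably at the higher temperature.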
7.4 Different Performance Parameters 
Three aspects are important for SRAM cell design, namely the cell area, 
stability and standby leakage. These are often conflicting requirements and tradeoffs 
must be made to realize the optimum design. For example, it is difficult to meet the 
requirements of both stability and small cell size, or of low power and high performance. 
Designing the cell for improved stability invariably requires a larger cell area, 
which degrades the write margin (WM). A memory cell has to be designed very 
carefully for reliable operation and minimum area. The key parameters for performance 
evaluation are as follows: 
7.4.1 Stability 
The stability of the cell is characterized by the static noise margin (SNM) 
criterion when the word line is activated for the read margin and inactivated for the 
hold margin. The SNM is defined as the minimum DC noise voltage necessary to flip 
the state of an SRAM cell [151]-[152]. The simulations of both the cells are carried out 
for Cell Ratio (CR) = Pull-up Ratio (PR) = 1, where [CR = (W/L) of pull down 
transistor/(W/L) of access transistor] and [PR = (W/L) of pull up transistor/(W/L) of 
access transistor]. As the threshold voltage selected for CMOS and CNFET is equal, 
the read SNM of both the cells is ~165mV. Figure 7.4 shows the rate of 
fall of the read SNM of both cells with respect to VDD. At VDD=0.4V, the read SNM of the 
CMOS and CNFET cells is 70mV and 89mV respectively. The 21% increase in the read 
SNM of the CNFET cell is due to the peak rise in transconductance (gm) at the 
gate voltage at which the first subband crosses the source Fermi level. Further, 
Figure 7.4 shows that, compared to the CMOS cell, the CNFET cell provides a higher read SNM 
as VDD scales down below 0.4V. This is due to the high insulator capacitance Cins and 
thus improved gate controllability. Hence, CNFETs are suitable for ultra-low power 
memory design. 
Figure 7.4: Read SNM vs. supply voltage (CMOS and CNFET; read SNM in mV, VDD = 0 to 1V) 
7.4.2 Temperature 
With continuous technology scaling, the transistor density on a chip is 
approximately doubled for each new technology generation, resulting in an increase of 
the average chip operating temperature [153]. The simulation results for the read SNM of 
the CMOS and CNFET based SRAM cells are shown in Figure 7.5. As the temperature increases, 
the threshold voltage (Vth) of the CMOS transistor decreases and, therefore, the read SNM 
of the CMOS SRAM cell also decreases. As the temperature changes from 25°C to 100°C, 
the read SNM of the CMOS cell varies by up to 10.5%, whereas this variation is only 6% 
for the CNFET cell. Due to the high thermal stability of the CNFET [154]-[156], there is a 
very minor fall in its transconductance (gm) and, therefore, the variation in its read SNM 
is minimal. 
Figure 7.5: Temperature vs. Read SNM (CMOS and CNFET; read SNM in mV, temperature = 25°C to 100°C) 
7.4.3 Driving Capability 
The driving performance of the CNFET cell is investigated by varying the number 
of tubes. Increasing the number of tubes in the CNFET can reduce the write delay of the 
cell, but it has a power penalty. Figure 7.6 and Figure 7.7 show that as the number of 
tubes per CNFET is increased from 3 to 7, the write delay decreases from 269ps to 
182ps, whereas the write power increases from 4.81µW to 6.75µW. For our design, we 
have kept the number of carbon nanotubes equal to 3 for all transistors. At VDD=0.9V 
and for a bit line capacitance of 100fF, the write delay of the CNFET cell is 269ps whereas 
it is 496ps for the CMOS cell. This shows that the speed of the CNFET cell is 1.84X that of a 
CMOS cell. This is due to 1) high gate capacitance 2) improved channel transport and 
3) higher carrier velocity of CNT (7 x 10^  cm/s) compared to PMOS (3.5 x 10^  cm/s). 
The improved channel velocity for the CNFET arises from the increased mobility and 
the band structure of CNT. 
The write power of the CNFET cell is 4.81µW, whereas it is 3.59µW for the CMOS
cell. Hence, the write power of the CNFET cell is about 1.3X that of the CMOS cell. This is
due to the very high gate insulator capacitance Cins. As the supply voltage is scaled down
from 0.9V to 0.4V, the write delay increases and the write power decreases, as shown in
Figure 7.8 and Figure 7.9 respectively. Due to the high mobility, the delay of the
CNFET cell rises slowly as the supply voltage is reduced, whereas it rises very quickly for the
CMOS cell. Taking the speed advantage of the CNFET cell into account, at VDD=0.9V and
VDD=0.4V the CNFET cell provides an improvement of
29.3% and 16.7% in power delay product, respectively, compared to the CMOS cell.
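The ratios quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the rounded delay and power values reported in the text at VDD=0.9V, and the function and variable names are not from the original work. Because the inputs are rounded, the power delay product improvement it prints (about 27%) is close to, but not identical with, the 29.3% obtained from simulation.

```python
# Cross-check of the write-delay, write-power and PDP figures quoted in the text.
# Inputs are the rounded values reported at VDD = 0.9 V; all names are illustrative.

def pdp(delay_s: float, power_w: float) -> float:
    """Power delay product in joules."""
    return delay_s * power_w

cnfet_delay, cnfet_power = 269e-12, 4.81e-6   # seconds, watts
cmos_delay, cmos_power = 496e-12, 3.59e-6     # seconds, watts

speedup = cmos_delay / cnfet_delay            # how much faster the CNFET cell writes
power_ratio = cnfet_power / cmos_power        # how much more write power it draws
pdp_gain = 1.0 - pdp(cnfet_delay, cnfet_power) / pdp(cmos_delay, cmos_power)

print(f"speed-up    : {speedup:.2f}x")
print(f"power ratio : {power_ratio:.2f}x")
print(f"PDP gain    : {100 * pdp_gain:.1f} %")
```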
[Plot: write delay (ps) versus number of tubes per CNFET, from 3 to 7]
Figure 7.6: Write delay vs. Number of tubes / CNFET 
[Plot: write power (µW) versus number of tubes per CNFET, from 3 to 7]
Figure 7.7: Write power vs. Number of tubes / CNFET 
[Plot: write delay (ps) versus supply voltage VDD (V), for the CMOS and CNFET cells]
Figure 7.8: Write delay vs. supply voltage
7.4.4 Write Stability of the Cell 
Write stability is measured using the write margin (WM), which depends on the voltage divider
between the supply voltage of the cell and the bit line (BLB), formed by the load
(MU2) and the access (MA2) transistors connected to node '1'. The WM is the
maximum voltage on the bit line that allows the cell to be written while the other bit line is
at VDD [157]. The higher the write margin, the greater the cell stability. To improve the
write margin, the current through the pull-up transistor must be lower than that through the access
transistor. For PR=CR='1', the write margin of the CMOS cell is 130mV,
whereas the CNFET cell provides a WM of 116mV, which is 10.7% lower than that of the CMOS
cell. CMOS gets this advantage from the mobility difference between PMOS and NMOS,
whereas the mobilities of the N-CNFET and P-CNFET are equal and, therefore, the currents
through the pull-up and access transistors are the same, leading to the degradation in WM.
One of the interesting features of the CNFET is that its conduction can be increased
by increasing the number of tubes.
The WM of the CNFET cell can be improved if more tubes are used in the
access transistor than in the pull-up transistor. If the number of tubes in the pull-up
transistor is kept constant (N=3) and the number of tubes in the access transistor is
increased, the WM increases. Table 7.2 shows the improvement in the WM of the CNFET
cell obtained by varying the number of carbon nanotubes in the access and pull-up transistors.
Table 7.2: Write margin (mV) at different N of CNFET cell
Number of tubes (N)
Pull-up transistor   Access transistor   WM (mV)
3                    3                   116
3                    4                   130
3                    5                   300
[Plot: write power (µW) versus supply voltage VDD (V)]
Figure 7.9: Write power vs. supply voltage 
7.5 Effect of Process Parameter Variations 
To verify the effect of process parameter variation on the read SNM, the channel
length (Lg) of transistor 'MD1' is varied [158]. Our examination shows that for the CMOS
cell, the SNM increases when the Lg of 'MD1' decreases. This is due to the increase in
cell ratio, whereas, owing to the inherent device structure, geometric properties and
ballistic transport, the variation in the SNM of the CNFET cell is very small.
Further, a variation of Lg in CMOS changes the Vth, which affects the read SNM, whereas
the Vth of a CNFET does not depend on Lg and is a function of the tube diameter.
Therefore, the SNM of the CNFET cell remains nearly unchanged as Lg varies [158]-[159].
[Plot: read SNM (mV) versus channel length L (nm) from 28 to 38 nm, for the CMOS and CNFET cells]
Figure 7.10: Read SNM vs. Channel length of MD1 transistor
As shown in Figure 7.10, the SNM variation of the CMOS cell is up to 10.6%, whereas
it is just 1.2% for the CNFET cell. This shows that, compared to CMOS, the CNFET cell
is more robust against process parameter variations.
7.6 Summary 
This chapter has explored the performance of the CNFET based
6T-SRAM cell and compared it with that of the conventional CMOS cell at the deep
submicron 32nm technology node. Due to inherent characteristics of the CNFET such as
good gate controllability, drive current and immunity to short channel effects, the
CNFET cell outperforms the CMOS cell in leakage power consumption, write margin, speed and read
SNM. The CNFET based SRAM cell has also
demonstrated a much more stable SNM against temperature variation. Compared to
CMOS, the CNFET cell is more robust against process parameter variations. The CNFET
holds a lot of promise as an alternative to the MOS transistor.
Chapter 8
ENERGY EFFICIENT DRIVERS AND CNT
BUNDLES AS INTERCONNECT
8.1 Introduction 
While the reconfigurable nature of FPGAs makes them an attractive hardware
solution, they suffer from large overheads (area, delay and power) because of their
programmable interconnect fabrics [160]. Further, the interconnect fabric of an FPGA
typically consists of multiple routing segments and switches, which can result in larger
delay compared to ASICs. The impact of interconnect performance will continue to
increase as process technology scales. The International Technology Roadmap for
Semiconductors (ITRS) predicts that traditional copper interconnects will be a major
bottleneck when feature sizes become smaller than 45nm [161]. This is due to the steep rise
in the parasitic resistance of copper caused by the combined effects of grain boundary
scattering, surface scattering and the presence of a highly resistive diffusion barrier
layer. This rise in the resistance of copper not only increases the interconnect delay but
also limits its current carrying capability [162]. In order to alleviate these problems,
alternative interconnect technologies and their architectural implications for FPGAs in
future process technologies must be explored.
Carbon nanotubes (CNTs) have recently been proposed as a possible
replacement for metal interconnects in future technologies [120], [163]. Due to their
long mean free paths (MFP), high current carrying capability and high thermal
conductivity [164]-[165], CNTs are expected to be a very good alternative material for
future FPGA interconnects. Further, due to their covalently bonded structure, they are
highly resistant to electromigration and other sources of physical breakdown. However,
an isolated CNT has a high intrinsic resistance (~6.45kΩ) [106]. This
necessitates the use of a bundle of CNTs as interconnect. The resistances of the CNTs in a
bundle appear in parallel and, therefore, CNT bundle interconnects carry a higher
current and provide lower delay than Cu interconnects. The delay of an
interconnect depends on the values of its resistance, capacitance and inductance. This
chapter describes in detail the computation of resistances, capacitances and inductances
for different types of CNT interconnects. This computation is necessary to compare the
performance of CNT based interconnects with that of copper interconnects. Section 8.2 of
this chapter presents the classification of CNTs. Section 8.3 evaluates the dependence of
conductance on different parameters of a CNT bundle. Section 8.4 evaluates the dependence
of inductance and capacitance on different parameters of a CNT bundle. Section 8.5
compares the RLC parameters of CNT and copper interconnects. Section 8.6 compares
the performance of CNFET and CMOS drivers with CNT and copper as interconnects,
and Section 8.7 presents the performance comparison of copper and CNT bundles for
future FPGA interconnect.
8.2 Classification of Carbon Nanotubes 
CNTs can be broadly classified as single-walled carbon nanotubes (SWCNTs)
[166]-[167] and multi-walled carbon nanotubes (MWCNTs) [168]-[169]. SWCNTs
consist of a single sheet of graphene rolled into a cylindrical tube with a diameter in the
nanometer range, while MWCNTs consist of two or more SWCNTs concentrically
wrapped around each other, with diameters ranging from a few to several hundred
nanometers, as shown in Figure 8.1. A SWCNT bundle consists of several SWCNTs
packed together in parallel, whereas a mixed CNT bundle consists of a combination of
SWCNTs and MWCNTs packed together in parallel [170]. SWCNTs can be
either metallic or semiconducting depending on their chirality (the direction in which
they are rolled up), giving rise to zigzag (mostly semiconducting), armchair (metallic)
or chiral (mostly semiconducting) nanotubes, whereas MWCNTs are always metallic.
Figure 8.1: (a) Single-wall carbon nanotubes (b) Multi-wall carbon nanotubes 
8.2.1 SWCNT 
Unlike copper, the resistance of a SWCNT is not a linear function of its length.
Instead, the SWCNT resistance is constant up to a certain length called the mean free path
($\lambda$) and increases linearly thereafter. Due to the spin and sub-lattice degeneracy of
electrons in graphene, each nanotube has four conducting channels in parallel. Hence,
the conductance of a ballistic single-walled CNT, assuming perfect contacts, is
given by $4e^2/h = 155\,\mu S$, which yields a resistance of 6.45kΩ [106]. This is the
fundamental resistance associated with a SWCNT that cannot be avoided.
For longer lengths, the SWCNT resistance has been shown to depend on its length and bias
voltage. For a bias voltage less than the critical bias (< 0.16V for global wires [171]), the
SWCNT resistance ($R_{SWCNT}$) is determined by [172] and is given as:
$$R_{SWCNT} = \frac{h}{4e^2}, \qquad l < \lambda \qquad (8.1)$$
$$R_{SWCNT} = \frac{h}{4e^2}\,\frac{l}{\lambda}, \qquad l > \lambda \qquad (8.2)$$
where $h$ is Planck's constant, $e$ is the charge of an electron, $l$ is the length of the
SWCNT and $\lambda$ is the mean free path (MFP) length. The MFP length is given by
$\lambda = v_F d / (\alpha T)$, where $\alpha$ is the total scattering rate, $v_F$ is the Fermi velocity of
graphene, $d$ is the SWCNT diameter and $T$ is the temperature.
The resistance of a SWCNT bundle is simply the parallel combination of the
metallic SWCNTs present in that bundle.
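The piecewise behaviour described above (a constant quantum resistance below the mean free path, linear growth above it, and parallel combination within a bundle) can be sketched as follows. This is a minimal illustration rather than the extraction flow used in this work; the 1 µm mean free path and the tube count in the usage lines are assumed example values, and the function names are hypothetical.

```python
# Piecewise SWCNT resistance: h/(4e^2) below the mean free path, scaled by l/lambda above it.
# Physical constants are the exact SI values; example inputs at the bottom are assumptions.

PLANCK_H = 6.62607015e-34      # J*s
ELECTRON_Q = 1.602176634e-19   # C

R_QUANTUM = PLANCK_H / (4 * ELECTRON_Q**2)   # about 6.45 kOhm per ideally contacted SWCNT

def swcnt_resistance(length_m: float, mfp_m: float) -> float:
    """Resistance of one metallic SWCNT at low bias, in ohms."""
    if length_m <= mfp_m:
        return R_QUANTUM
    return R_QUANTUM * length_m / mfp_m

def bundle_resistance(length_m: float, mfp_m: float, n_metallic: int) -> float:
    """Parallel combination of n_metallic identical metallic SWCNTs."""
    return swcnt_resistance(length_m, mfp_m) / n_metallic

print(f"quantum conductance: {1e6 / R_QUANTUM:.0f} uS")
print(f"10 um tube, 1 um MFP: {swcnt_resistance(10e-6, 1e-6) / 1e3:.1f} kOhm")
print(f"bundle of 100 such tubes: {bundle_resistance(10e-6, 1e-6, 100):.1f} Ohm")
```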
The capacitance of a CNT arises from two sources: the electrostatic
capacitance ($C_E$) and the quantum capacitance ($C_Q$) [173]. The capacitance $C_E$ is
calculated by treating the CNT as a thin wire with diameter $d$ placed at a distance $y$
away from a ground plane, as shown in Figure 8.2.
$$C_E = \frac{2\pi\varepsilon}{\ln(y/d)} \qquad (8.3)$$
[Diagram: CNT of diameter d at a height y above a ground plane]
Figure 8.2: Isolated CNT with diameter 'd' over a ground plane at a distance 'y'
The quantum capacitance $C_Q$, on the other hand, arises from the quantum electrostatic energy stored in the nanotube when
it carries current. Due to Pauli's exclusion principle, it is only possible to add
electrons into the nanotube at an available quantum state above the Fermi energy level.
The SWCNT quantum capacitance per unit length is given by:
$$C_Q = \frac{2e^2}{h\,v_F} \qquad (8.4)$$
A SWCNT has been shown to consist of four co-propagating quantum
channels; therefore, the effective SWCNT quantum capacitance is $4C_Q$. The same
effective charge resides on both these capacitances ($C_E$ and $4C_Q$) when the CNT carries
current, as is true for any two capacitances in series. Hence, these capacitances appear
in series in the effective circuit model [173].
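As a rough numerical illustration of the series combination described above, the sketch below evaluates the thin-wire electrostatic capacitance 2πε/ln(y/d), the per-channel quantum capacitance 2e²/(h·v_F) and their series combination with four channels. The Fermi velocity of 8×10^5 m/s, the vacuum permittivity and the geometry (d = 1 nm, y = 100 nm) are assumptions chosen for the example, not values taken from this work.

```python
import math

# Per-unit-length capacitances of an isolated SWCNT over a ground plane.
# FERMI_VELOCITY and the example geometry below are assumed, illustrative values.
PLANCK_H = 6.62607015e-34      # J*s
ELECTRON_Q = 1.602176634e-19   # C
EPS0 = 8.8541878128e-12        # F/m
FERMI_VELOCITY = 8.0e5         # m/s (assumption)

def electrostatic_capacitance(y_m: float, d_m: float, eps_r: float = 1.0) -> float:
    """C_E in F/m for a thin wire of diameter d at distance y from the plane (y > d)."""
    return 2 * math.pi * eps_r * EPS0 / math.log(y_m / d_m)

def quantum_capacitance(v_f: float = FERMI_VELOCITY) -> float:
    """C_Q in F/m for a single conducting channel."""
    return 2 * ELECTRON_Q**2 / (PLANCK_H * v_f)

def total_capacitance(y_m: float, d_m: float, eps_r: float = 1.0) -> float:
    """Series combination of C_E with the four-channel quantum capacitance 4*C_Q."""
    c_e = electrostatic_capacitance(y_m, d_m, eps_r)
    c_q4 = 4 * quantum_capacitance()
    return c_e * c_q4 / (c_e + c_q4)

c_e = electrostatic_capacitance(100e-9, 1e-9)
c_q = quantum_capacitance()
c_tot = total_capacitance(100e-9, 1e-9)
# 1 F/m equals 1e12 aF/um
print(f"C_E   = {c_e * 1e12:.1f} aF/um")
print(f"C_Q   = {c_q * 1e12:.1f} aF/um per channel")
print(f"C_tot = {c_tot * 1e12:.1f} aF/um")
```

With these assumed inputs the electrostatic term is much smaller than 4C_Q, so the series combination is dominated by C_E.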
The CNT has two types of inductance, namely magnetic and kinetic [174]. The
magnetic inductance ($L_M$) depends on the magnetic field inside and between the tubes,
whereas the kinetic inductance ($L_K$) is calculated in [175] (following [176]) by equating
the kinetic energy stored in each conducting channel of the CNT to an effective
inductance. These inductances are given by
$$L_M = \frac{\mu}{2\pi}\cosh^{-1}\!\left(\frac{2y}{d}\right) \qquad (8.5)$$
$$L_K = \frac{h}{2e^2 v_F} \qquad (8.6)$$
Since a nanotube has four co-propagating quantum channels, the effective value of
kinetic inductance in the equivalent circuit is $L_K/4$. The total SWCNT bundle
inductance is then given by
$$L_{Bundle} = \left(\frac{L_K}{4\,n_{CNT}} + L_M\right) l \qquad (8.7)$$
where $n_{CNT}$ is the number of CNTs in that bundle. The equivalent circuit model
of an ideally contacted SWCNT is shown in Figure 8.3, where 'Ri' indicates the resistance of the
CNT.
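To give a feel for the relative size of the two inductance terms, the following sketch evaluates the per-channel kinetic inductance h/(2e²·v_F) and the magnetic inductance (µ/2π)·cosh⁻¹(2y/d). The bundle function assumes that the kinetic inductances of the n tubes, four channels each, simply combine in parallel; that form, the Fermi velocity of 8×10^5 m/s and the geometry are assumptions made for illustration.

```python
import math

# Kinetic versus magnetic inductance of a SWCNT, per unit length.
# FERMI_VELOCITY, the geometry and the parallel-combination rule are assumptions.
PLANCK_H = 6.62607015e-34      # J*s
ELECTRON_Q = 1.602176634e-19   # C
MU0 = 4e-7 * math.pi           # H/m
FERMI_VELOCITY = 8.0e5         # m/s (assumption)

def kinetic_inductance(v_f: float = FERMI_VELOCITY) -> float:
    """L_K in H/m for one conducting channel."""
    return PLANCK_H / (2 * ELECTRON_Q**2 * v_f)

def magnetic_inductance(y_m: float, d_m: float, mu: float = MU0) -> float:
    """L_M in H/m for a wire of diameter d at distance y from the ground plane."""
    return mu / (2 * math.pi) * math.acosh(2 * y_m / d_m)

def bundle_inductance(length_m: float, n_cnt: int, y_m: float, d_m: float) -> float:
    """Total inductance in henry, assuming n_cnt tubes with four channels each in parallel."""
    return (kinetic_inductance() / (4 * n_cnt) + magnetic_inductance(y_m, d_m)) * length_m

l_k = kinetic_inductance()
l_m = magnetic_inductance(100e-9, 1e-9)
# 1 H/m equals 1e3 nH/um and 1e6 pH/um
print(f"L_K = {l_k * 1e3:.2f} nH/um per channel")
print(f"L_M = {l_m * 1e6:.2f} pH/um")
print(f"10 um bundle of 100 tubes: {bundle_inductance(10e-6, 100, 100e-9, 1e-9) * 1e12:.1f} pH")
```

Under these assumptions the kinetic term of a single channel exceeds the magnetic term by several orders of magnitude, which is why a large number of parallel channels is needed before the total inductance approaches the magnetic value.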
[Circuit diagram: resistances Ri/2 at both ends of the line, with shunt branches formed by 4CQ in series with CE]
Figure 8.3: Equivalent circuit model of ideally contacted SWCNT 
8.2.2 MWCNT 
A MWCNT consists of two or more concentric SWCNTs; it is
essentially a set of SWCNTs of varying diameters. Many properties of the SWCNT are valid for
the MWCNT. The number of shells ($N_{sh}$) in a MWCNT is diameter dependent, as given by
[170]
$$N_{sh} = 1 + \left\lfloor \frac{D_{out} - D_{in}}{2\delta} \right\rfloor \qquad (8.8)$$
where $D_{out}$ and $D_{in}$ are the maximum and minimum shell diameters and $\delta$ is the
van der Waals distance between graphene layers in graphite (which is 0.34nm). The
approximate number of conduction channels ($N_{ch}$) per shell for a MWCNT, given by
[169], is
$$N_{ch/sh} = (a\,d_{sh} + b)\,r, \qquad d_{sh} > 6\,\mathrm{nm} \qquad (8.9)$$
$$N_{ch/sh} = 2r, \qquad d_{sh} < 6\,\mathrm{nm} \qquad (8.10)$$
where $a = 0.1836\,\mathrm{nm}^{-1}$, $b = 1.275$, $d_{sh}$ is the diameter of the shell and $r$ is the
probability of the tube being metallic.
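The shell and channel counts can be illustrated with a short sketch. It assumes the commonly used shell-count relation N_sh = 1 + floor((D_out − D_in)/(2δ)) with δ = 0.34 nm, shells spaced 2δ apart in diameter starting from the outermost one, and the per-shell channel approximation quoted above; the function names and the 10 nm example tube are hypothetical.

```python
import math

# Shell count and conduction channels of a MWCNT under the stated assumptions.
VDW_SPACING_NM = 0.34    # van der Waals distance between graphene layers
A_PER_NM = 0.1836        # fitting constant a, in 1/nm
B_CONST = 1.275          # fitting constant b

def number_of_shells(d_out_nm: float, d_in_nm: float) -> int:
    """Assumed relation: 1 + floor((D_out - D_in) / (2 * delta))."""
    return 1 + math.floor((d_out_nm - d_in_nm) / (2 * VDW_SPACING_NM))

def channels_per_shell(d_sh_nm: float, r: float) -> float:
    """Approximate channels for one shell; r is the probability of being metallic."""
    if d_sh_nm > 6.0:
        return (A_PER_NM * d_sh_nm + B_CONST) * r
    return 2 * r

def mwcnt_channels(d_out_nm: float, d_ratio: float, r: float) -> float:
    """Total channels of a MWCNT with D_in = d_ratio * D_out."""
    n_sh = number_of_shells(d_out_nm, d_ratio * d_out_nm)
    diameters = [d_out_nm - 2 * VDW_SPACING_NM * i for i in range(n_sh)]
    return sum(channels_per_shell(d, r) for d in diameters)

for d_ratio in (0.6, 0.5, 0.3):
    shells = number_of_shells(10.0, d_ratio * 10.0)
    print(f"dR={d_ratio}: {shells} shells, {mwcnt_channels(10.0, d_ratio, 2 / 3):.1f} channels")
```

For a fixed outer diameter, lowering the D_in/D_out ratio adds inner shells and therefore conduction channels.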
The resistance of the $i$-th SWCNT shell with diameter $d_i$ is given by:
$$R_{SWCNT}(d_i, l) = \frac{R_{SWCNT}}{N_{ch/sh}(d_i)} \qquad (8.11)$$
Each shell has its own $d_i$ and number of conduction channels $N_{ch}$, which can
be derived from $D_{out}$. Therefore, the total MWCNT resistance ($R_{MWCNT}$) is the parallel
combination of the resistances of all the concentric SWCNTs [177]:
$$R_{MWCNT} = \left[\sum_{i=1}^{N_{sh}} \frac{1}{R_{SWCNT_i}}\right]^{-1} \qquad (8.12)$$
where $R_{SWCNT_i}$ is the resistance of the $i$-th concentric SWCNT and $N_{sh}$ is the number of
shells. Finally, the total MWCNT inductance is given by a relation similar to that for a
SWCNT bundle [177]:
$$L_{MWCNT} = \left(\frac{L_K}{N_{ch}} + L_M\right) l \qquad (8.13)$$
Since the circuit parameters of the multiple shells vary in MWCNTs, the potentials of
different shells cannot be assumed to be equal as in the case of SWCNT bundles, which
induces shell-to-shell capacitive coupling. The shell-to-shell capacitance ($C_S$) per unit
length can be obtained by using the coaxial capacitance formula [178]:
$$C_S = \frac{2\pi\varepsilon}{\ln(D_{out}/D_{in})} = \frac{2\pi\varepsilon}{\ln\left(D_{out}/(D_{out} - 2\delta)\right)} \qquad (8.14)$$
The quantum capacitance $C_Q$ for a MWCNT, given by [179], is
$$C_Q = \frac{4e^2 N_{ch}\, l}{h\,v_F} \qquad (8.15)$$
Thus, $C_Q$ is in series with the electrostatic capacitance, which includes the shell-to-shell
capacitance $C_S$ and the ground capacitance $C_E$.
8.2.3 Mixed CNT 
A mixed (SWCNT and MWCNT) bundle consists of SWCNTs with a diameter
$d$ and MWCNTs with various diameters $D_{in} < d_i < D_{out}$. It has been shown that the
outer diameters follow a normal (Gaussian) distribution [170]. The total resistance of a
mixed CNT bundle, given by [177], is
$$R_{Mixed} = \left[\sum_{D_{out}} \frac{N(D_{out})}{R_{MWCNT}(D_{out}, l)}\right]^{-1} \qquad (8.16)$$
where $R_{MWCNT}(D_{out}, l)$ is obtained from equation (8.12). The tube
count $N(D_{out})$ is obtained according to $D_{out}$ with a normal (Gaussian) distribution with
mean diameter $\bar{D}_{out}$ and standard deviation $\sigma_{D_{out}}$.
For the number of CNTs in the bundle ($N_{bundle}$), the tube count $N(D_{out})$ for a given
$D_{out}$, estimated by [170], is
$$N(D_{out}) = \frac{N_{bundle}}{\sigma_{D_{out}}\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{D_{out} - \bar{D}_{out}}{\sigma_{D_{out}}}\right)^{2}\right] \qquad (8.17)$$
This relation can be used to derive a distribution curve for the tube count. The
resulting curve and the corresponding MWCNT resistance curve (equation 8.16) can be
used to determine the total resistance of a mixed CNT bundle.
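The Gaussian tube-count distribution and the parallel combination over diameters can be sketched as below. This is only an illustration: the distribution is discretized into diameter bins, the standard deviation of 0.5 nm is an assumed value, and `r_mwcnt` is a hypothetical callable standing in for whatever per-tube resistance model is used.

```python
import math

def tube_count_density(d_out: float, n_bundle: float, mean_d: float, sigma_d: float) -> float:
    """Gaussian tube-count density (tubes per nm of outer diameter)."""
    z = (d_out - mean_d) / sigma_d
    return n_bundle / (sigma_d * math.sqrt(2 * math.pi)) * math.exp(-0.5 * z * z)

def tubes_per_bin(n_bundle: float, mean_d: float, sigma_d: float,
                  step: float = 0.1, span: float = 4.0):
    """Discretize the density into (diameter, tube count) bins over +/- span sigma."""
    steps = int(round(2 * span * sigma_d / step))
    bins = []
    for k in range(steps + 1):
        d = mean_d - span * sigma_d + k * step
        bins.append((d, tube_count_density(d, n_bundle, mean_d, sigma_d) * step))
    return bins

def mixed_bundle_resistance(bins, r_mwcnt) -> float:
    """Parallel combination: 1 / sum(N(D) / R(D)); r_mwcnt is a hypothetical resistance model."""
    return 1.0 / sum(count / r_mwcnt(d) for d, count in bins)

# Example: 120 tubes, 4.2 nm mean outer diameter, 0.5 nm standard deviation (assumed)
bins = tubes_per_bin(n_bundle=120, mean_d=4.2, sigma_d=0.5)
total_tubes = sum(count for _, count in bins)
r_example = mixed_bundle_resistance(bins, lambda d: 1000.0)  # placeholder: 1 kOhm per tube
print(f"tubes recovered from the bins: {total_tubes:.1f}")
print(f"bundle resistance with 1 kOhm tubes: {r_example:.2f} Ohm")
```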
The capacitive characteristics of a mixed CNT bundle are determined by the
cross sectional dimensions of the bundle. Therefore, as in the case of the SWCNT
bundle, we can assume that the mixed bundle has an electrostatic capacitance to ground,
coupling capacitances and a quantum capacitance. Finally, the total kinetic inductance of
a mixed bundle is the parallel inductance value of all the conduction channels in the
bundle.
Almost all experimental results [170], [180]-[182] have demonstrated that a
realistic nanotube bundle contains a mixture of SWCNTs and MWCNTs.
Therefore, only mixed CNT bundle interconnects are
considered in all simulations hereafter.
Geometric parameters (bundle width (W), bundle
height (H) and bundle length (l)) and process parameters (average diameter of tube, tube density (D), the
ratio [Din/Dout = (dR)] and the probability of metallic CNTs (r)) of a bundle have a
significant effect on the conductance, inductance and capacitance of a CNT bundle. The
next section, therefore, evaluates the performance of a CNT bundle with respect to the
above parameters.
8.3 Dependence of Conductance on Different Parameters of a CNT Bundle 
Figure 8.4 shows the simulated bundle conductance for different process parameters, namely tube
density (D), [Din/Dout = (dR)] and probability of metallic CNTs (r) in a bundle.
[Bar chart: bundle conductance for the parameter sets (D, dR, r) = (1E12, 0.5, 0.333), (5E12, 0.6, 0.667), (5E12, 0.5, 0.333), (5E12, 0.5, 0.667), (5E12, 0.4, 0.667) and (5E12, 0.3, 0.667)]
Figure 8.4: Bundle conductance vs. process parameters (D, dR & r) 
For the same aspect ratio of a CNT bundle, if D varies from 1E+12 to 5E+12
tubes/cm², the number of tubes in the bundle increases from 21 to 90. Similarly,
varying r from 1/3 to 2/3 increases the number of conduction channels from 256
to 312.
The dR ratio determines the number of shells in the MWCNTs. A
smaller dR leads to more shells and hence higher conductance. The simulation results
depicted in Figure 8.4 show that, compared to (D=1E+12, dR=0.5 and r=0.33), the
process parameters (D=5E+12, dR=0.3 and r=0.667) improve the bundle conductance
by 10X. Hence, proper selection of the above process parameters determines the improvement
in bundle conductance.
8.4 Dependence of Inductance and Capacitance on Different Parameters of 
CNT Bundle 
The dependence of inductance and capacitance on the geometric/process 
parameters of a CNT bundle is described as follows: 
8.4.1 CNT Bundle Inductance 
The CNT has two types of inductance, namely magnetic and kinetic.
To analyze the contribution of both inductances, a simulation has been
carried out for a bundle geometry of width W = height H and interconnect length L
= 10µm, with the other process parameters kept constant (D=5E+12, dR=0.5 and
r=0.667). It was found that as W increases, the magnetic inductance falls
gradually (due to the magnetic field generated by the larger number of tubes), whereas, because the
number of conduction channels is constant (as dR is fixed), the kinetic inductance
remains constant. Thus, the total inductance (kinetic + magnetic) falls gradually with
W, as shown in Figure 8.5. Hence, for a significant reduction in total inductance, the
contribution of the kinetic inductance should also be reduced.
Equation (8.13) shows that the kinetic inductance of a bundle is inversely related
to the number of conduction channels. Hence, by lowering dR, we can create more
conduction channels, thereby lowering the kinetic inductance.
[Plot: magnetic inductance (×1E-11 H) and total inductance (×1E-10 H) versus bundle width (nm), from 100 to 1000 nm]
Figure 8.5: Inductance vs. bundle width (W) 
The simulation plots of Figure 8.6 for a bundle geometry of (W=H=50nm and
L=10µm) show that, compared to (D=5E+12, dR=0.6 and r=0.33), the process
parameters (D=5E+12, dR=0.3 and r=0.667) reduce the kinetic inductance by 67%.
Similarly, Figure 8.7 shows the simulation results for the above bundle geometry
with respect to the average diameter of the tubes. As the average diameter increases from
2.5 to 4nm, the bundle holds around 120 tubes and the number of conduction channels
Nch increases from 271 to 421. This decreases the kinetic inductance from 2.96E-10
to 1.91E-10 Henry (a reduction of 35%).
As the average diameter reaches 4.5nm, the number of tubes in the
bundle reduces from 120 to 105 and, therefore, Nch falls from 421 to 312. This results in
an increase of the kinetic inductance from 1.91E-10 to 2.57E-10 Henry. Beyond an
average diameter of 4.5nm, the density of tubes crosses the limit of 5E+12 tubes/cm²
and, therefore, the simulations are restricted to average diameters up to 4.5nm.
Hence, it is important to choose the average diameter carefully so as to reduce the
kinetic inductance of the given mixed CNT bundle for the selected tube density.
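The reported kinetic inductances are consistent with a simple inverse scaling in the number of conduction channels. The sketch below applies that scaling to the channel counts quoted above; the helper name is illustrative, and the inverse proportionality is used here as a working assumption rather than as the extraction method of this work.

```python
# Inverse scaling of bundle kinetic inductance with the number of conduction channels,
# checked against the values quoted in the text (W = H = 50 nm, L = 10 um).

def scale_kinetic_inductance(l_ref_h: float, n_ref: int, n_new: int) -> float:
    """Kinetic inductance after the channel count changes from n_ref to n_new."""
    return l_ref_h * n_ref / n_new

l_at_4nm = scale_kinetic_inductance(2.96e-10, 271, 421)      # average diameter 2.5 -> 4 nm
l_at_4p5nm = scale_kinetic_inductance(1.91e-10, 421, 312)    # average diameter 4 -> 4.5 nm
reduction = 1 - 1.91e-10 / 2.96e-10

print(f"predicted at 4 nm  : {l_at_4nm:.3e} H (reported 1.91E-10)")
print(f"predicted at 4.5 nm: {l_at_4p5nm:.3e} H (reported 2.57E-10)")
print(f"reduction 2.5 -> 4 nm: {100 * reduction:.0f} %")
```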
[Bar chart: kinetic inductance for the parameter sets (D, dR, r) = (5E12, 0.6, 0.333), (5E12, 0.6, 0.667), (5E12, 0.5, 0.333), (5E12, 0.5, 0.667), (5E12, 0.4, 0.667) and (5E12, 0.3, 0.667)]
Figure 8.6: Kinetic inductance vs. process parameters (D, dR & r)
[Bar chart: average diameter (nm), number of conduction channels (×1E+2) and kinetic inductance (×1E-10 H)]
Figure 8.7: Kinetic inductance vs. Average diameters 
8.4.2 CNT Bundle Capacitance 
The capacitance of a CNT arises from two sources. The electrostatic
capacitance (CE) can be calculated using equation (8.3), whereas the quantum
capacitance (CQ) arises from the quantum electrostatic energy stored in the nanotubes.
The CQ of a MWCNT is given by equation (8.15). Figure 8.8 shows the plot for the bundle
geometry (W=H=50nm and L=10µm). As the average diameter increases from 2.5 to
4nm, the number of tubes remains 120 and, due to the increasing number of subbands,
Nch increases from 236 to 372; therefore, CQ increases by 37%. As the average
diameter approaches 4.5nm, the bundle accommodates only 105
tubes and Nch reduces from 372 to 256 (i.e. CQ decreases by 31%). Hence, the proper
selection of the average diameter is important because it determines the magnitude of CQ for a
given CNT bundle.
[Plot: quantum capacitance CQ of the bundle versus average diameter (nm)]
Figure 8.8: Quantum capacitance (CQ) of bundle 
8.5 RLC Parameter Comparison of CNT and Copper Interconnects 
The resistance, inductance and capacitance (R, L and C) of the mixed
CNT bundle and copper (Cu) interconnects were extracted using the Carbon
Nanotubes Interconnect Analyzer (CNIA) [183] and BPTM tools [95], with the
interconnect geometry suggested in [184] for the 32nm node. The process parameters
considered for RLC extraction of the mixed CNT bundle are {D=5E+12 tubes/cm², Average
Diameter=4.2nm, [Din/Dout=(dR)]=0.5 and r=2/3} [170], [174]. It is found that the
resistance of the CNT bundle at the local interconnect level remains independent of length, as
shown in Table 8.1. This is because the maximum length of 1µm considered for local
interconnects in the analysis is smaller than the mean free path (MFP) of the CNTs
in the bundle. For a CNT length less than its MFP, electron transport is essentially
ballistic within the nanotube and the resistance is independent of length [185]; this is
called its intrinsic resistance (Ri). In a Cu wire, by contrast, the electrons can be
backscattered by a series of small-angle scattering events and the MFP is in the range of a few
tens of nanometers [185]. Therefore, the resistance of a Cu wire increases linearly with
its length, as shown in Table 8.2. Similarly, Table 8.3 to Table 8.6 show the extracted
R, L and C parameters for the intermediate and global interconnects for Cu and CNT
respectively. For the larger geometries, i.e. intermediate and global interconnects, the
CNT bundle accommodates more tubes and, therefore, the resistance of the
CNT bundle interconnects decreases, as depicted in Table 8.4 and Table 8.6.
Table 8.1: Local CNT Interconnects Parameters
Length (µm)   Resistance (Ω)   Inductance (pH)   Capacitance (×10^-17 F)
0.2           7.936            5.03              2.5
0.4           7.936            10.1              5.0
0.6           7.936            15.2              7.5
0.8           7.936            20.3              10.0
1             7.936            25.5              12.5
Table 8.2: Local Cu Interconnects Parameters
Length (µm)   Resistance (Ω)   Inductance (pH)   Capacitance (×10^-17 F)
0.2           1.6037           1.0               2.84
0.4           3.2074           3.0               5.68
0.6           4.811            6.0               8.52
0.8           6.4149           8.0               11.36
1.0           8.0186           11.0              14.20
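The local-interconnect resistances of Tables 8.1 and 8.2 can be compared directly. The sketch below estimates the Cu resistance per unit length from the tabulated points and the length at which a linearly growing Cu resistance would equal the constant CNT bundle resistance; the variable names are illustrative and the linear extrapolation is an assumption.

```python
# Local interconnects: constant CNT bundle resistance versus linearly growing Cu resistance.
# Data are the tabulated (length in um, resistance in ohm) pairs for Cu and the CNT value.

CU_LOCAL = [(0.2, 1.6037), (0.4, 3.2074), (0.6, 4.811), (0.8, 6.4149), (1.0, 8.0186)]
CNT_LOCAL_R = 7.936   # ohm, independent of length up to 1 um

def resistance_per_um(rows) -> float:
    """Average resistance per micrometre over the tabulated points."""
    return sum(r / length for length, r in rows) / len(rows)

cu_slope = resistance_per_um(CU_LOCAL)
crossover_um = CNT_LOCAL_R / cu_slope

print(f"Cu resistance per unit length: {cu_slope:.3f} Ohm/um")
print(f"Cu equals the CNT bundle resistance at about {crossover_um:.2f} um")
```

On these numbers the two resistances become comparable only near the 1µm end of the local range; below that the Cu wire has the lower resistance.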
Table 8.3: Intermediate Cu Interconnects Parameters
Length (µm)   Resistance (Ω)   Inductance (nH)   Capacitance (×10^-15 F)
100           513.194          0.155             14.2858
200           1026.388         0.338             28.5716
300           1539.583         0.532             42.8574
400           2052.777         0.733             57.1432
500           2565.971         0.939             71.4129
Table 8.4: Intermediate CNT Interconnects Parameters
Length (µm)   Resistance (Ω)   Inductance (nH)   Capacitance (×10^-15 F)
100           152.67           2.63              12.2
200           305.12           5.32              25.1
300           458.71           7.97              37.5
400           609.75           10.60             50.0
500           763.35           13.31             62.5
Table 8.5: Global Cu Interconnects Parameters
Length (µm)   Resistance (Ω)   Inductance (nH)   Capacitance (×10^-15 F)
200           374.149          0.317             36.2834
400           748.299          0.689             72.5668
600           1122.448         1.083             108.8502
800           1496.598         1.490             145.1336
1000          1870.748         1.907             181.4171
Table 8.6: Global CNT Interconnects Parameters
Length (µm)   Resistance (Ω)   Inductance (nH)   Capacitance (×10^-15 F)
200           109.05           5.27              28.3
400           218.34           10.60             56.7
600           326.79           16.00             85.0
800           436.68           21.31             113.0
1000          546.44           26.72             142.0
As per the discussion and simulation results of Sections 8.3 and 8.4, selecting
appropriate process parameters can lower the R and L
components of the mixed CNT bundle interconnect. To extract the optimized values of R
and L, the process parameters considered here are {D=5E+12 tubes/cm²,
Average Diameter=3.8nm, [Din/Dout=(dR)]=0.3 and r=2/3}. Tables 8.7 and 8.8
give the optimized values of R, L and C for intermediate and global interconnects.
As the bundle geometry is the same, there is no change in the values of capacitance.
Table 8.7: Intermediate CNT Interconnects Optimized Parameters
Length (µm)   Resistance (Ω)   Inductance (nH)   Capacitance (×10^-15 F)
100           98               1.52              12.2
200           198              3.05              25.1
300           294              4.66              37.5
400           392              6.1               50.0
500           490              7.6               62.5
Table 8.8: Global CNT Interconnects Optimized Parameters
Length (µm)   Resistance (Ω)   Inductance (nH)   Capacitance (×10^-15 F)
200           70               3.13              28.3
400           141              6.28              56.7
600           210              9.45              85.0
800           281              12.95             113.0
1000          350              15.68             142.0
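The effect of the optimized process parameters can be quantified from the tabulated values. As an illustration, the sketch below compares the intermediate CNT interconnect resistance and inductance of Table 8.4 with the optimized values of Table 8.7, row by row; the list and function names are not from the original work.

```python
# Relative reduction in R and L of intermediate CNT interconnects after optimization.
# Rows correspond to lengths of 100, 200, 300, 400 and 500 um.

BASE_R = [152.67, 305.12, 458.71, 609.75, 763.35]   # ohm, Table 8.4
OPT_R = [98, 198, 294, 392, 490]                     # ohm, Table 8.7
BASE_L = [2.63, 5.32, 7.97, 10.60, 13.31]            # nH, Table 8.4
OPT_L = [1.52, 3.05, 4.66, 6.1, 7.6]                 # nH, Table 8.7

def reductions(base, optimized):
    """Fractional reduction for each row."""
    return [1 - o / b for b, o in zip(base, optimized)]

r_red = reductions(BASE_R, OPT_R)
l_red = reductions(BASE_L, OPT_L)

for length, rr, lr in zip((100, 200, 300, 400, 500), r_red, l_red):
    print(f"{length} um: R lower by {100 * rr:.1f} %, L lower by {100 * lr:.1f} %")
```

Across the tabulated lengths the resistance falls by roughly 35 to 36% and the inductance by roughly 42 to 43%.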
8.6 Comparison of CNFET and CMOS Driver with CNT and Cu Interconnects 
For performance analysis and comparison of CNT bundle and Cu interconnects
in terms of power dissipation and delay, we use the standard test benches [186]-[187]
shown schematically in Figures 8.9, 8.10 and 8.11 in place of the interconnect of Figure
8.12. In Figures 8.9 to 8.11, 'Ro' represents the ohmic resistance, 'L' represents the
total inductance and 'C' represents the total capacitance of the interconnect. Figure 8.12
shows the HSPICE test setup used for the performance evaluation of CNFET and CMOS
drivers with mixed CNT bundle and Cu as interconnect for the 32nm technology node at a
frequency of 1GHz. The length of interconnect considered for simulation is 100µm.
The ratio between parallel nanotubes in a CNFET inverter is 3:2 (i.e. 3 and 2
parallel CNTs for the N and P type CNFET respectively) to effectively balance the
on-current of the inverter, whereas for the CMOS inverter it is 1:1.6 (i.e. the W/L of the PMOS is
1.6X the W/L of the NMOS).
[Circuit diagram: Ri/2, L and Ri/2 in series, with shunt capacitances C]
Figure 8.9: CNT bundle circuit model for Local interconnects 
[Circuit diagram: resistances Ri/2 at each end with distributed Ro and L sections between them]
Figure 8.10: CNT bundle circuit model for Intermediate and Global interconnects 
[Circuit diagram: distributed Ro and L sections]
Figure 8.11: Cu wire circuit model for Local, Intermediate and Global interconnects 
[Schematic: input, driver, interconnect and a 1fF load at the output voltage node]
Figure 8.12: Test setup for simulation 
For performance comparison with the CNFET, a high performance (HP) predictive
model of the MOSFET [95] is used. Similarly, a MOSFET-like CNFET (MOS_CNFET)
model from [129] with the following specification is used:
diameter of CNT (d=1.5nm), pitch (S=2d=2×1.5nm=3nm), Tox=2nm, channel
length Lg=32nm and source/drain underlap LSS=LSD=32nm.
To observe the effect of the number of CNTs used in a CNFET, three drivers with
different CNT ratios, (15:10), (30:20) and (45:30), are used. It is found that
as the number of CNTs in a buffer increases, it drives the
load more effectively, which results in a shorter delay. As shown in Figure 8.13, the delay of the CMOS
driver with Cu is 1.3X, 2X and 2.7X at VDD=0.9V, and 1.6X, 2.9X and 3.8X at VDD=0.6V,
that of the CNFET drivers of (15:10), (30:20) and (45:30) with CNT as interconnect,
respectively. This high speed of the CNFET driver arises because the channel length of the
CNFET is less than the mean free path for acoustic phonon scattering; hence the CNFET buffer
operates in the ballistic mode and therefore provides a higher on-current at relatively
low bias voltages [188]. For the CMOS buffer, in contrast, the delay increases as VDD is
decreased since the operating voltage approaches the threshold voltage of the MOSFETs.
As the number of CNTs in a CNFET driver increases, the gate parasitic capacitance
increases and the power consumption also increases, but it remains comparable to that of traditional
CMOS, as shown in Figure 8.14. Therefore, the power delay
product (PDP) is used here for performance evaluation.
[Plot: delay versus VDD for CNFET_CNT_N:P(15:10), CNFET_CNT_N:P(30:20), CNFET_CNT_N:P(45:30) and CMOS_Cu_N:P(1:1.6)]
Figure 8.13: Delay of different driver-interconnect
Because of the ballistic conduction of CNFETs and the higher conductance of
CNT interconnects, the resultant delay of a CNFET driver with CNT interconnect is very
small. Therefore, the (45:30) CNFET driver with CNT
interconnect is 67% and 78% more energy efficient than the traditional CMOS driver with Cu interconnect
at VDD of 0.9V and 0.6V respectively, as depicted in Figure 8.15.
[Plot: power versus VDD (V) from 0.4 to 0.9 V for CNFET_CNT_N:P(15:10), CNFET_CNT_N:P(30:20), CNFET_CNT_N:P(45:30) and CMOS_Cu_N:P(1:1.6)]
Figure 8.14: Power of different driver-interconnect 
[Plot: power delay product versus VDD (V) for CNFET_CNT_N:P(15:10), CNFET_CNT_N:P(30:20), CNFET_CNT_N:P(45:30) and CMOS_Cu_N:P(1:1.6)]
Figure 8.15: Power delay product of different driver-interconnect 
8.7 Performance Comparison of FPGAs Routing Fabric 
For performance comparison between Cu and mixed bundle interconnects, an
island-style FPGA architecture by Xilinx [63], shown in Figure 8.16, is considered. The
CLBs access the interconnect fabric through connection blocks (CBs) and the inter-CLB
wires are interconnected through switch blocks (SBs). The SBs consist of variable
length wire segments that connect to one another through programmable buffered
switches, called interconnect resources. These interconnect resources are classified as
Double, Hex and Long, as discussed in Chapter 3 (Section 3.3). These interconnect
resources typically consume most of the FPGA area and constitute the major portion
of the critical path delay and power consumption in most FPGA based systems. As
technology scales, the mixed CNT bundle has potentially less resistance than copper
wires. Therefore, this section explores CNT as interconnect and CNFETs as
programmable buffered switches (CNFET-CNT) in comparison with traditional FPGA
interconnect resources (i.e. programmable buffered switches implemented in CMOS with Cu
as interconnect).
Figure 8.16: Island-style FPGA architecture 
[Diagram labels: CLB, Connection Box, Switch Box] 
Figure 8.17: Baseline test platform 
[Diagram labels: Pre-switch, Interconnect, Test switch, Load switch] 
To measure the performance of both resources, the conditions of used switches 
in an actual FPGA were simulated using the test platform shown in Figure 8.17. The 
test platform corresponds to a contiguous path of three switches through an FPGA 
routing fabric; the multiplexers in all three switches are configured to pass input 'In' to 
their outputs. Power and delay measurements are made for the second switch, labeled 
"test switch", including interconnects. 
The equivalent RLC model of Section 8.6 is used for measuring the delay and 
energy. The segment length (between two CLB tiles) considered for the 32nm technology 
node is 17.7um [189]. All multiplexers in the SBs are made with minimum sized 
transistors (for CNFET, 15 CNTs are used). The interconnect lengths used in 
simulation are 35.4um for Double (2 x 17.7um), 106.2um for Hex (6 x 17.7um) and 
424.8um for Long (24 x 17.7um). 
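The interconnect lengths follow directly from the number of CLB tiles that each resource spans. A minimal sketch in Python, assuming only the 17.7um segment length and the 2/6/24 tile spans stated in the text (the names `SEGMENT_UM`, `SPAN_TILES` and `interconnect_length_um` are illustrative and not part of the thesis):

```python
# Segment length between two CLB tiles at the 32nm node, in micrometres (from the text).
SEGMENT_UM = 17.7

# Number of CLB tiles spanned by each interconnect resource (from the text).
SPAN_TILES = {"Double": 2, "Hex": 6, "Long": 24}


def interconnect_length_um(resource: str) -> float:
    """Return the wire length of an interconnect resource in micrometres."""
    return round(SPAN_TILES[resource] * SEGMENT_UM, 1)


if __name__ == "__main__":
    for name in SPAN_TILES:
        print(f"{name}: {interconnect_length_um(name)} um")
```

The same helper can be reused for another technology node by changing `SEGMENT_UM`, provided the tile spans stay the same.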
To take advantage of longer CNT interconnects, simulation is carried out 
only for Hex and Long interconnects. The Hex and Long interconnects are 
implemented as intermediate and global interconnects respectively. Because 15 CNTs 
are used in the pass transistors of the CNFET multiplexer, the gate parasitic capacitance 
increases and therefore the energy consumption of the CNFET-CNT interconnect resource 
also increases, but it remains comparable to that of the traditional interconnect resource 
as shown in Figure 8.18. Compared to Cu, the CNT bundle interconnect has lower values 
of the extracted R and C components. Further, the CNFET operates in ballistic mode; 
therefore the delay (Figure 8.19) and energy-delay-product (EDP) of the CNFET-CNT 
interconnect resource are lower than those of the traditional interconnect resource. 
At VDD = 0.9V, 0.8V and 0.7V, the Hex interconnect resource implemented by 
CNFET-CNT has 41%, 50% and 63% less EDP than the traditional Hex as shown in 
Figure 8.20. Due to the larger length of the Long interconnect, the advantage in delay, 
energy and EDP of CNT over Cu interconnects is greater. Simulation results depicted 
in Figure 8.21 show that at VDD=0.9V, the Long interconnect resource implemented 
by CNFET-CNT has 44% less EDP than the traditional Long interconnect resource. 
Figure 8.18: Energy vs. VDD for Hex interconnect 
[Plot: series CNFET_CNT_Hex and CMOS_Cu_Hex; x-axis VDD (V)] 
Figure 8.19: Delay vs. VDD for Hex interconnect 
[Plot: series CNFET_CNT_Hex and CMOS_Cu_Hex] 
Figure 8.20: EDP vs. VDD for Hex interconnect 
[Plot: series CNFET_CNT_Hex and CMOS_Cu_Hex; x-axis VDD (V)] 
Figure 8.21: EDP vs. VDD for Long interconnect 
[Plot: series CNFET_CNT_Long and CMOS_Cu_Long; x-axis VDD (V)] 
8.8 Summary 
This chapter presents an analysis of mixed CNT bundle interconnect and 
compares its performance with Cu interconnect for future FPGA routing architecture. 
Because of the very low resistance of the CNT bundle, FPGAs that utilize mixed CNT 
bundles as interconnect and CNFETs in the SBs, instead of the traditional combination 
(Cu as interconnect and CMOS in the SBs), can provide an improvement of more than 40% in EDP. 
The analysis of the simulation results also shows that the tube density, tube distribution, 
metallic tube ratio, the ratio of D,„ / Dg,^, and bundle dimension are crucial factors in 
determining the inductance, capacitance and conductance of the mixed 
CNT bundle. The discussion on the selection of these CNT parameters can provide an 
important guideline for the design of mixed CNT bundles for future FPGA 
interconnects. 
Chapter 9 
DTMOS BASED LOW POWER HIGH SPEED 
FPGA INTERCONNECTS 
9.1 Introduction 
In modern Field Programmable Gate Arrays (FPGAs), power consumption has 
become an important design consideration. Increasing performance and complexity 
have raised the dynamic power consumption per chip, while at deep submicron nodes, 
shrinking transistor channel length, reduced oxide thickness and lower threshold voltage 
have contributed to the rapid increase in gate and subthreshold leakage. High 
power consumption requires expensive packaging and cooling solutions. In battery 
powered applications, high power consumption may prohibit the use of FPGAs 
altogether. Consequently, solutions for reducing FPGA power are needed. As dynamic 
power is related quadratically to the supply voltage, reducing the voltage to an ultra low 
level results in a significant reduction in both power and energy consumption. However, 
reducing the supply voltage also negatively affects circuit performance; therefore, a 
trade-off should be taken into consideration. A study of the state of the art Virtex-II 
FPGA family from Xilinx shows that the interconnect is the dominant power consumer, 
with Hex lines, Double lines and Long lines consuming almost 60% of the total chip 
power [10]. The major role of interconnects in FPGA power consumption makes them a 
high leverage target for power optimization. If the power consumed by the interconnect 
is reduced, it would contribute greatly to the reduction in overall power consumed by 
the FPGA. 
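The quadratic dependence of switching power on supply voltage can be illustrated with the standard first-order expression P = a * C * VDD^2 * f. The sketch below is only an illustration under stated assumptions: the 300MHz frequency and the 0.9V and 0.6V supply points appear in this chapter, while the 20fF effective capacitance and the activity factor of 1 are hypothetical values chosen for the example.

```python
def dynamic_power(c_eff_farads: float, vdd_volts: float, freq_hz: float,
                  activity: float = 1.0) -> float:
    """First-order switching power P = activity * C * VDD^2 * f, in watts."""
    return activity * c_eff_farads * vdd_volts ** 2 * freq_hz


def power_ratio(vdd_new: float, vdd_ref: float) -> float:
    """Ratio of switching power at vdd_new to that at vdd_ref (same C, f, activity)."""
    return (vdd_new / vdd_ref) ** 2


if __name__ == "__main__":
    C_EFF = 20e-15   # farads, ASSUMED effective switched capacitance (illustrative)
    FREQ = 300e6     # hertz, operating frequency used in this chapter
    for vdd in (0.9, 0.8, 0.7, 0.6):
        p_uw = dynamic_power(C_EFF, vdd, FREQ) * 1e6
        print(f"VDD={vdd:.1f}V  P={p_uw:.2f} uW  ratio vs 0.9V={power_ratio(vdd, 0.9):.2f}")
```

Under this model, lowering VDD from 0.9V to 0.6V scales switching power by (0.6/0.9)^2, i.e. to a little under half, before any change in delay is considered.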
This chapter proposes a novel technique for reducing the dynamic power of 
multiplexer based interconnects. Here, a Dynamic Threshold MOS (DTMOS) transistor 
is used and the augmented transistor is judiciously shared in the multiplexer based 
interconnect with a minimal area penalty. All simulations are performed using 65nm 
BPTM [95] technology at 300MHz. The rest of this chapter is organized as follows: 
Section 9.2 describes different schemes of DTMOS transistors. Section 9.3 compares 
the performance of different DTMOS pass transistors with conventional pass 
transistors. Section 9.4 describes the simulation strategies for target interconnect 
resources. Section 9.5 compares the performance of realistic Virtex-II FPGA 
interconnects and Section 9.6 presents the summary of the chapter. 
9.2 Different Configuration of DTMOS Transistor / Switch 
The basic switching element in most FPGA interconnects is the NMOS pass 
transistor/multiplexer, but it suffers from a threshold voltage drop and causes high DC 
power dissipation in level restoring buffers. To eliminate the problem of static power 
dissipation in level restoring buffers, recent FPGA architectures from Xilinx and Altera 
(Stratix) use buffers as suggested in [65], [190]. However, this approach yields a 
significant increase in area and power consumption [191]. This chapter suggests 
methods other than replacing pass transistors by buffers for reducing power 
consumption. The proposed method uses novel configurations of dynamic threshold 
MOS (DTMOS) based switches, which overcome the above disadvantage at a minimal 
increase in area [94]. In DTMOS, any variation in the gate potential induces the same 
variation in the body, thereby dynamically changing the threshold voltage. The main 
advantage of DTMOS over conventional MOS is its higher drive current at lower bias 
levels [192]-[195]. 
This section illustrates the types of switch configurations that can be chosen 
to meet different performance constraints. For example, assigning a high Vth to the main 
transistor (MT) of a DTMOS switch reduces its leakage current in the OFF state. 
Similarly, during the ON state, forward body bias reduces the Vth of MT and hence 
performance (speed) is boosted. Using this variation, one can trade off standby leakage 
against active power delay product. Figure 9.1 shows different configurations of 
DTMOS transistors. 
Figure 9.1: Different configurations of DTMOS transistor (a) Basic DTMOS (b) DTMOS with 
Augmenting Transistor (c) DTMOS with Limiting Transistor and (d) DTMOS with Augmenting 
Fixed Reference Voltage Transistor 
9.2.1 Basic DTMOS 
It consists of a single NMOS transistor in which the body is connected to the 
gate terminal. The gate voltage swing cannot exceed the cut-in voltage of the diode; 
otherwise, a large current would flow through the forward-biased body-source and 
body-drain junction diodes [192], [195]. To overcome this limitation of DTMOS, 
some variations of the basic circuit have been proposed [196]-[198], as shown in 
Figure 9.1, and can be described as follows: 
9.2.2 DTMOS with Augmenting Transistor 
It consists of a main transistor (MT) and an augmenting transistor (AT). The 
drain and gate terminals of both transistors are shorted to each other and hence AT 
turns ON only when MT is ON; otherwise both remain OFF. Because of the shorted drain 
and gate, it is impossible to share the AT among other main transistors (Figure 9.1(b)). 
9.2.3 DTMOS with Limiting Transistor 
It consists of a main transistor (MT) and a limiting transistor (LT). The gate of 
LT is connected to the reference voltage (Vref) as shown in Figure 9.1(c). The selected 
Vref should be higher than the threshold voltage of the transistor. The disadvantage of 
this scheme is that LT remains ON due to Vref at its gate terminal (even when MT is 
OFF). This results in a small gate leakage penalty when the switch is inactive. 
9.2.4 DTMOS with Augmenting Fixed Reference Voltage Transistor 
It consists of a main transistor (MT) and an augmented transistor (AT). A fixed 
reference voltage (Vref) is connected to the drain of AT as shown in Figure 9.1(d). 
When MT is ON, AT also turns ON and a fixed body bias of approximately Vref is 
applied to the MT. Since only the gate terminals of MT and AT are shorted, it is possible 
to share a single AT among many MT transistors, which is useful in the multiplexer based 
routing switches of an FPGA, in which the same select line drives a large NMOS pass 
transistor tree. This scheme reduces the area penalty by a large margin. Throughout this 
chapter, this DTMOS scheme is used; it can be further configured as described 
below. 
9.2.5 DTMOS with Augmenting Fixed Reference Standard Threshold Voltage Transistor 
Here both the MT and AT transistors have standard threshold voltage (SVT); 
this configuration is termed SVT-DTMOS. The Vth0 for the 65nm technology node is 
0.423V (Vth0 is the threshold voltage at zero body bias). In the ON state, the threshold 
voltage of MT decreases and therefore the drive current increases. Hence, this switch 
has less delay compared to conventional MOS (Conv-MOS), in which the gate and body 
terminals are isolated. Due to the SVT of both transistors, the leakage consumption of 
this switch is the highest among all the switches. 
9.2.6 DTMOS with Augmenting Fixed Reference High Threshold Voltage Transistor 
Here both the MT and AT transistors have high threshold voltage (HVT); 
this configuration is termed HVT-DTMOS. The selected Vth0 is 0.473V, i.e. the original 
Vth0 is boosted by 50mV. Due to the HVT of the MT and AT transistors, the drive current 
of this configuration is lower than that of SVT-DTMOS but higher than that of Conv-MOS. 
Similarly, the leakage consumption of this scheme is lower than that of the SVT-DTMOS switch. 
9.3 Performance Comparison 
This section compares the I-V characteristics of Conv-MOS, SVT-DTMOS and 
HVT-DTMOS Transistors. 
9.3.1 Selection of Vref for SVT-DTMOS 
The SVT-DTMOS scheme is more prone to leakage power consumption due to the 
standard threshold voltage of its MT and AT transistors. Hereafter, for all configurations, 
the selected W/L for MT is 2/1 whereas AT is of minimum size (to limit leakage, 
switching power and area penalty). When MT is ON, AT also turns ON. As 
Vref is connected to the drain terminal of AT, a body bias of magnitude 
approximately equal to Vref is applied to MT and its threshold voltage decreases. This 
decrease in threshold voltage increases its drive current. A simulation of Ids vs. Vds (at 
Vgs=0.9V) was carried out for the SVT-DTMOS scheme at 27°C and is depicted in 
Figure 9.2. The plot shows that as the magnitude of Vref increases, the body bias 
goes up, which results in an increase in Ids (on-current). Compared to a Vref of 
75mV, a bias of 300mV provides a 12% improvement in current drive. Similarly, 
compared to a Vref of 75mV, a bias of 300mV consumes 1.5x more leakage 
(OFF-current) as depicted in Figure 9.3 (plot of Ids vs. Vds at Vgs=0V). Hence, to limit 
the magnitude of leakage, the Vref chosen throughout this chapter is 300mV unless 
otherwise mentioned. 
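The way a forward body bias of roughly Vref lowers the threshold voltage of MT can be sketched with the textbook first-order body-effect expression Vth = Vth0 + gamma * (sqrt(2*phiF - Vbs) - sqrt(2*phiF)). This is not the BPTM model used for the simulations in this chapter. Only Vth0 = 0.423V and the four Vref values come from the text; the body-effect coefficient `GAMMA` and surface potential `TWO_PHI_F` below are assumed, illustrative values.

```python
import math

VTH0 = 0.423      # V, zero-bias threshold voltage at 65nm (from the text)
GAMMA = 0.2       # V^0.5, body-effect coefficient (ASSUMED for illustration)
TWO_PHI_F = 0.8   # V, surface potential 2*phi_F (ASSUMED for illustration)


def vth_with_body_bias(v_bs: float) -> float:
    """Threshold voltage of an NMOS for body-to-source bias v_bs (forward bias > 0)."""
    if v_bs >= TWO_PHI_F:
        raise ValueError("first-order model only valid for v_bs < 2*phi_F")
    return VTH0 + GAMMA * (math.sqrt(TWO_PHI_F - v_bs) - math.sqrt(TWO_PHI_F))


if __name__ == "__main__":
    # Body bias is taken as approximately equal to Vref, as stated in the text.
    for vref_mv in (75, 150, 225, 300):
        vth = vth_with_body_bias(vref_mv / 1000.0)
        print(f"Vref={vref_mv}mV  Vth~{vth:.3f}V")
```

With any positive `GAMMA`, the model reproduces the qualitative trend described above: a larger Vref gives a lower Vth and hence a higher drive current, at the cost of higher leakage.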
Figure 9.2: Ids vs. Vds of SVT-DTMOS transistor 
[Plot: Ids vs. Vds (V); series: Vref=75mV, 150mV, 225mV, 300mV] 
Figure 9.3: Ids vs. Vds of SVT-DTMOS transistor at T=75°C 
[Plot: Ids vs. Vds (V); series: Vref=75mV, 150mV, 225mV, 300mV] 
9.3.2 I-V Characteristics 
Figures 9.4 and 9.5 show plots of Ids vs. Vds (at Vgs=0.9V) and Ids vs. Vgs 
(at Vds=0.9V). In both cases, SVT-DTMOS has the highest current drive; this is 
due to the SVT of the MT and AT transistors. In HVT-DTMOS, both transistors 
have HVT and hence the effective body bias is less than in SVT-DTMOS. In the case 
of Conv-MOS, there is no body bias, resulting in the least current drive. 
Figure 9.4: Ids vs. Vds for different transistor schemes 
[Plot: Ids vs. Vds (V); series: Conv-MOS, HVT-DTMOS, SVT-DTMOS] 
Figure 9.5: Ids vs. Vgs for different transistor schemes 
[Plot: Ids vs. Vgs (V); series: Conv-MOS, HVT-DTMOS, SVT-DTMOS] 
9.3.3 Effect of Temperature 
Environmental temperature fluctuations can cause significant variations in die 
temperature, and electronic systems in many applications operate at high 
temperature. Temperature variations affect the device characteristics of MOSFETs, 
such as threshold voltage, carrier mobility and saturation velocity [199]. 
Therefore, to verify the effect of temperature variation on the proposed DTMOS 
schemes, a simulation of Ids vs. Vgs (at Vds=0.9V) is carried out at a temperature of 
75°C, as depicted in Figure 9.6. 
Compared to the same simulation at 27°C (Figure 9.5), the Ids current degrades 
by approximately 10% for all MOS configurations. Similarly, at high temperature the 
OFF-state current increases (due to the fall in threshold voltage). The leakage penalty is 
highest for SVT-DTMOS (due to the SVT of both the MT and AT transistors), whereas 
HVT-DTMOS has the least leakage due to the HVT of both MT and AT, as depicted in 
Figure 9.7. 
Figure 9.6: Ids vs. Vgs for different transistor schemes at T=75°C 
[Plot: Ids vs. Vgs (V); series: Conv-MOS, HVT-DTMOS, SVT-DTMOS] 
Figure 9.7: Ids vs. Vds (at Vgs=0V) for different transistor schemes at T=75°C 
9.4 Target Interconnect Resources 
Here the target FPGA considered for the study is an island-style SRAM-based 
FPGA architecture; Xilinx and Altera FPGAs fall into this category. All 
experiments were performed on the Xilinx Virtex-II FPGA. The details of the Virtex-II 
device are presented in Chapter 3 (section 3.3). The interconnect switch matrix 
consists of variable length wire segments that connect to one another through 
programmable buffered switches such as Double, Hex and Long. Table 9.1 lists the 
target interconnect resources for the Virtex-II FPGA family. 
Table 9.1: Target Interconnect Resources Present in the Switch Matrix 
Interconnect Resource    Details 
Hex                      12-to-1 multiplexer and level restoring buffer 
Double                   16-to-1 multiplexer and level restoring buffer 
Long                     32-to-1 multiplexer and level restoring buffer 
9.4.1 Simulation Strategies (Leakage) 
All interconnect resources discussed above consist of wide input NMOS pass 
transistor based multiplexers and a level restoring buffer. Figure 9.8 shows the 
transistor level view of a 2-input multiplexer switch with a level restoring buffer. All 
higher order interconnect (multiplexer) resources present in the switch matrix are 
implemented similarly to the 2-input multiplexer. 
When logic 1 is driven through an NMOS pass transistor, the signal voltage 
level drops by Vth. This causes both the PMOS and the NMOS of the driven buffer to 
be partially turned ON (because the pull-up is not fully OFF), leading to a large static 
current. Since the DTMOS circuit lowers the ON-state threshold voltage well below 
its zero-bias value, DTMOS based interconnects benefit from this type of design. 
Figure 9.8(a) shows a conventional multiplexer (Conv-Mux) whereas Figure 9.8(b) 
shows a DTMOS based multiplexer, which is further configured as: 
SVT-DTMOS-Mux: both NMOS pass transistors M1 and M2 and 
augmented transistors A1 and A2 have standard threshold voltage. 
HVT-DTMOS-Mux: both NMOS pass transistors M1 and M2 and 
augmented transistors A1 and A2 have high threshold voltage. 
Hereafter, for all DTMOS configurations, the threshold variation is restricted 
to the M1, M2 and A1, A2 transistors (i.e. the multiplexer tree only); all other 
transistors (i.e. those in the select line (S) inverter and the level restoring buffer) 
are always of SVT type. 
Figure 9.8: Transistor level view of 2-input multiplexer (a) Conventional and (b) DTMOS based 
[Diagram labels: inputs V1, V2; select line S; pass transistors M1, M2; augmented transistors A1, A2; Vref; VDD; Vout] 
To determine the performance benefit of the DTMOS schemes over 
conventional MOS, this section estimates the total leakage (sum of gate and 
subthreshold leakage), dynamic/switching power and propagation delay. To measure 
the leakage, all possible combinations of the inputs of the 2-input multiplexer and the 
state of the select line (S) are considered, as listed in Table 9.2. To estimate the actual 
leakage penalty of DTMOS with respect to the conventional multiplexer, the average 
over all input combinations is taken and depicted in Figure 9.9, which shows that the 
leakage of SVT-DTMOS-Mux is 1.23X that of Conv-Mux. This is due to the SVT of its 
transistors, whereas the leakage penalty of HVT-DTMOS-Mux is negligible due to the 
HVT of all the said transistors. 
Table 9.2: Leakage of different 2-input multiplexer schemes 
Inputs    Select line    Total Leakage (nA) 
V1  V2    S              Conv-Mux    SVT-DTMOS-Mux    HVT-DTMOS-Mux 
0   0     0              26.98       28.36            28.22 
0   1     0              31.57       38.90            32.89 
1   0     0              34.21       52.70            38.60 
1   1     0              24.59       26.10            25.10 
0   0     1              29.59       31.40            31.10 
0   1     1              36.82       54.40            42.10 
1   0     1              34.18       42.20            35.20 
1   1     1              27.20       28.70            28.20 
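The averaging step behind the quoted 1.23X leakage ratio can be checked with a short sketch. The values are those of Table 9.2, with one assumption: the SVT-DTMOS-Mux entry for (V1, V2, S) = (0, 0, 1) is read as 31.40 nA, a reading of the scanned table chosen because it is in line with the neighbouring entries and reproduces the ratio stated in the text. The function names are illustrative.

```python
# Total leakage in nA for the eight (V1, V2, S) combinations of Table 9.2,
# in the order 000, 010, 100, 110, 001, 011, 101, 111.
LEAKAGE_NA = {
    "Conv-Mux":      [26.98, 31.57, 34.21, 24.59, 29.59, 36.82, 34.18, 27.20],
    # Fifth value read as 31.40 nA (ASSUMED reading of the scanned table).
    "SVT-DTMOS-Mux": [28.36, 38.90, 52.70, 26.10, 31.40, 54.40, 42.20, 28.70],
    "HVT-DTMOS-Mux": [28.22, 32.89, 38.60, 25.10, 31.10, 42.10, 35.20, 28.20],
}


def average_leakage(scheme: str) -> float:
    """Mean leakage over all input/select combinations, in nA."""
    values = LEAKAGE_NA[scheme]
    return sum(values) / len(values)


def leakage_ratio(scheme: str, baseline: str = "Conv-Mux") -> float:
    """Average leakage of a scheme relative to the baseline scheme."""
    return average_leakage(scheme) / average_leakage(baseline)


if __name__ == "__main__":
    for name in LEAKAGE_NA:
        print(f"{name}: avg={average_leakage(name):.2f} nA, "
              f"ratio vs Conv-Mux={leakage_ratio(name):.2f}")
```

Averaging over all eight states weights each input pattern equally; a workload with a different distribution of input states would shift the ratios.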
Figure 9.9: Average leakage of different 2-input multiplexer schemes 
[Bar chart: average leakage (nA) for Conv-Mux, HVT-DTMOS-Mux and SVT-DTMOS-Mux] 
9.4.2 Driving Capability and Power Consumption 
To determine the driving capability and power consumption of the target circuit, 
a load capacitor CL of 5 to 25fF is connected at the output of the level restoring buffer 
(at the Vout terminal) of Figure 9.8, and simulation is carried out for varying values of 
CL at an operating frequency of 300MHz and a supply voltage of 0.9V. The 
SVT-DTMOS-Mux scheme has SVT transistors; therefore, the Vth of the pass transistor 
reduces by a large magnitude, which results in high speed (less delay). Due to the HVT 
transistors of HVT-DTMOS-Mux, the Vth of its pass transistor does not reduce by the 
same magnitude, which results in higher delay as depicted in Figure 9.10. 
At CL=25fF, compared to Conv-Mux, the SVT-DTMOS-Mux and 
HVT-DTMOS-Mux schemes provide 15.31% and 11.71% less delay, whereas the power 
consumption of both schemes is approximately the same. In the case of Conv-Mux, the 
weak '1' at the input of the level restoring buffer causes high DC power dissipation; 
therefore, the power consumption of Conv-Mux is higher, as depicted in Table 9.3. To 
make a trade-off between power and delay, an important performance metric of a 
circuit, the power delay product (PDP), is considered. 
Figure 9.10: Delay of different 2-input multiplexer schemes 
[Plot: delay vs. load capacitor CL (fF), 5 to 25; series: Conv-Mux, HVT-DTMOS-Mux, SVT-DTMOS-Mux] 
Table 9.3: Power (uW) consumption of different 2-input multiplexer schemes 
CL (fF)    Conv-Mux    SVT-DTMOS-Mux    HVT-DTMOS-Mux 
5          1.05        0.96             0.97 
10         1.62        1.53             1.54 
15         2.19        2.11             2.12 
20         2.77        2.69             2.70 
25         3.35        3.27             3.28 
A lower PDP indicates a more energy efficient design. Figure 9.11 shows the PDP 
of all schemes. As predicted by delay and power, the PDP of SVT-DTMOS-Mux is less 
than that of the HVT-DTMOS-Mux and Conv-Mux schemes. At CL=25fF, the 
SVT-DTMOS-Mux and the HVT-DTMOS-Mux are 17.33% and 13.57% more energy 
efficient than the Conv-Mux scheme. 
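Since PDP is the product of power and delay, the relative PDP of a scheme is the product of its power ratio and its delay ratio with respect to Conv-Mux. The sketch below combines the CL=25fF power values of Table 9.3 with the quoted delay reductions; it is an illustrative cross-check, not the procedure used in the thesis, and the names are hypothetical.

```python
# Power at CL = 25fF in microwatts (Table 9.3).
POWER_UW_AT_25FF = {"Conv-Mux": 3.35, "SVT-DTMOS-Mux": 3.27, "HVT-DTMOS-Mux": 3.28}

# Fractional delay reduction relative to Conv-Mux at CL = 25fF (from the text).
DELAY_REDUCTION = {"SVT-DTMOS-Mux": 0.1531, "HVT-DTMOS-Mux": 0.1171}


def pdp_saving(scheme: str) -> float:
    """Fractional PDP saving of a scheme relative to Conv-Mux."""
    power_ratio = POWER_UW_AT_25FF[scheme] / POWER_UW_AT_25FF["Conv-Mux"]
    delay_ratio = 1.0 - DELAY_REDUCTION[scheme]
    return 1.0 - power_ratio * delay_ratio


if __name__ == "__main__":
    for name in DELAY_REDUCTION:
        print(f"{name}: PDP saving ~ {100 * pdp_saving(name):.2f}%")
```

The savings obtained this way are close to the figures quoted above; small differences are to be expected from rounding in the tabulated power and delay values.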
Figure 9.11: PDP of different 2-input multiplexer schemes 
[Plot: PDP vs. load capacitor CL (fF); series: Conv-Mux, HVT-DTMOS-Mux, SVT-DTMOS-Mux] 
9.5 Performance Comparison of Realistic Interconnects 
In an FPGA architecture, the interconnect resources drive different lengths of 
Copper (Cu) interconnect. Therefore, to have a realistic view of the interconnect 
resources, the Cu interconnects are modeled as equivalent transmission lines and the 
equivalent circuit parameters, resistance, inductance and capacitance (R, L and 
C), were extracted using BPTM [95] tools for the 65nm technology node. For simulation, 
only the Double, Hex and Long interconnect resources listed in Table 9.1 were 
considered. 
Figure 9.12 shows the simulation setup, which models the environment under 
which the interconnect resources have to operate in a typical FPGA [200]. The circuit 
consists of an interconnect resource followed by a wire based on the 
distributed RLC model. An inverter is placed after a certain length of wire and drives a 
20fF capacitive load. The length of the wire between the interconnect resource and the 
inverter is varied from 20um to 100um. An input is applied to the interconnect resource, 
and power and delay figures are measured at transistor level using the HSPICE simulator. 
Figure 9.12: Simulation set-up for different interconnects 
[Diagram labels: Input, Output] 
9.5.1 Double Interconnect 
Table 9.4 shows the delay of the different Double interconnect resource schemes: 
conventional (Conv-Double), DTMOS based standard threshold voltage (SVT-Double) 
and DTMOS based high threshold voltage (HVT-Double). As discussed in the above 
section, at VDD of 0.9V and 0.6V the SVT-Double and HVT-Double schemes provide 
less delay than Conv-Double, as depicted in Figure 9.13. 
Table 9.4: Delay (ps) of different Double interconnect schemes 
VDD=0.9V 
L (um)    Conv-Double    SVT-Double    HVT-Double 
20        234            193           199 
40        248            214           219 
60        266            227           232 
80        294            243           248 
100       312            263           268 
VDD=0.8V 
L (um)    Conv-Double    SVT-Double    HVT-Double 
20        267            229           240 
40        289            250           260 
60        303            272           278 
80        328            296           301 
100       342            312           318 
VDD=0.7V 
L (um)    Conv-Double    SVT-Double    HVT-Double 
20        351            310           316 
40        369            324           335 
60        408            357           368 
80        427            372           387 
100       443            421           430 
VDD=0.6V 
L (um)    Conv-Double    SVT-Double    HVT-Double 
20        519            448           473 
40        556            468           527 
60        579            500           547 
80        627            531           591 
100       656            560           600 
Figure 9.13: Delay of different Double interconnects schemes at VDD=0.9V and 0.6V 
[Plot: delay vs. L (um), 20 to 100; series: Conv-Double, SVT-Double and HVT-Double at 0.9V and 0.6V] 
Figure 9.14: PDP of different Double interconnect schemes at VDD=0.9V 
[Plot: PDP vs. L (um), 20 to 100; series: Conv-Double, SVT-Double, HVT-Double] 
Figure 9.15: PDP of different Double interconnect schemes at VDD=0.6V 
As the difference between the power consumption of the SVT-Double and 
HVT-Double schemes is negligible, and to take advantage of the speed of the DTMOS 
schemes, PDP is plotted at supply voltages of 0.9V and 0.6V as depicted in Figures 9.14 
and 9.15 respectively. 
It is found that even at the lower operating voltage of 0.6V and an interconnect 
length of 100um, the SVT-Double and HVT-Double schemes are 18% and 8.5% more 
energy efficient than the Conv-Double scheme. 
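The delay part of this comparison can be read off the Double interconnect delay table at L=100um. The sketch below computes only the percentage delay reduction, not the PDP saving, since the corresponding power values are not tabulated. The numbers assume that the flattened delay table is read column-wise as Conv-Double, SVT-Double, HVT-Double for each supply voltage; that reading and the names used are assumptions.

```python
# Delay in ps at L = 100um for the Double interconnect, keyed by supply voltage.
# Column assignment (Conv/SVT/HVT) is an ASSUMED reading of the scanned table.
DELAY_PS_L100 = {
    "0.9V": {"Conv-Double": 312, "SVT-Double": 263, "HVT-Double": 268},
    "0.6V": {"Conv-Double": 656, "SVT-Double": 560, "HVT-Double": 600},
}


def delay_reduction_pct(vdd: str, scheme: str) -> float:
    """Percentage delay reduction of a scheme relative to Conv-Double at a given VDD."""
    row = DELAY_PS_L100[vdd]
    baseline = row["Conv-Double"]
    return 100.0 * (baseline - row[scheme]) / baseline


if __name__ == "__main__":
    for vdd in DELAY_PS_L100:
        for scheme in ("SVT-Double", "HVT-Double"):
            print(f"VDD={vdd} {scheme}: {delay_reduction_pct(vdd, scheme):.1f}% less delay")
```

Because the power of the two DTMOS schemes is reported as nearly equal, most of the difference between their PDP savings would be expected to come from these delay differences.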
To observe the leakage power penalty, the strategy of Table 9.2 is 
used and the average leakage power is depicted in Figure 9.16. It has been found that at 
all supply voltages from 0.9V to 0.6V, the SVT-Double consumes approximately 2X 
more leakage than the Conv-Double scheme. This is due to two reasons: first, the 
SVT-Double is implemented with SVT transistors (more subthreshold leakage); 
secondly, the SVT-Double scheme requires 8 extra augmented SVT transistors for the 
four select lines. These extra transistors contribute significantly to leakage (because 
total leakage depends on the number of transistors in a circuit). The HVT-Double 
consumes leakage comparable to that of the Conv-Double scheme due to its HVT 
transistors (less subthreshold leakage). 
Figure 9.16: Leakage power of Double interconnect schemes at different VDD 
[Bar chart: average leakage power vs. VDD (V) at 0.9, 0.8, 0.7, 0.6; series: Conv-Double, SVT-Double, HVT-Double] 
9.5.2 Hex Interconnect 
The Hex interconnect is implemented by mixing decoded and encoded 
multiplexers. As its transistor count is less than that of the Double interconnect, its 
leakage (shown in Figure 9.17) and PDP (Figure 9.18) are less than those of the Double 
interconnect. Similarly, the delay of the Hex interconnect is less than that of the Double 
interconnect because there are only three pass transistors in the critical path of the Hex 
interconnect compared to four in the Double interconnect. The delay of the Hex 
interconnect is given in Table 9.5. 
Table 9.5: Delay (ps) of different Hex interconnect schemes 
VDD=0.9V 
L (um)    Conv-Hex    SVT-Hex    HVT-Hex 
20        204         175        181 
40        225         200        207 
60        237         215        222 
80        255         230        238 
100       270         250        259 
VDD=0.8V 
L (um)    Conv-Hex    SVT-Hex    HVT-Hex 
20        241         208        228 
40        271         235        247 
60        286         250        262 
80        305         273        285 
100       322         295        309 
VDD=0.7V 
L (um)    Conv-Hex    SVT-Hex    HVT-Hex 
20        314         270        290 
40        347         300        310 
60        363         330        350 
80        385         355        370 
100       408         387        400 
VDD=0.6V 
L (um)    Conv-Hex    SVT-Hex    HVT-Hex 
20        461         401        430 
40        502         440        475 
60        526         470        505 
80        560         502        540 
100       594         539        570 
Figure 9.17: Leakage power of Hex interconnect schemes at different VDD 
[Bar chart: leakage power vs. VDD (V) at 0.9, 0.8, 0.7, 0.6; series: Conv-Hex, SVT-Hex, HVT-Hex] 
The PDP of the Hex interconnect is shown in Figures 9.18 and 9.19. Similar to the 
discussion in section 9.5.1, it is found that even at the lower operating voltage of 0.6V 
and L=100um, the SVT-Hex and HVT-Hex schemes are 17% and 12% more energy 
efficient than the Conv-Hex scheme. 
Figure 9.18: PDP of different Hex interconnect schemes at VDD=0.9V 
[Plot: series: Conv-Hex, SVT-Hex, HVT-Hex] 
Figure 9.19: PDP of different Hex interconnect schemes at VDD=0.6V 
[Plot: PDP vs. L (um), 20 to 100; series: Conv-Hex, SVT-Hex, HVT-Hex] 
9.5.3 Long Interconnect 
The Long interconnect is implemented by encoded multiplexers. As its total 
transistor count is more than that of the Double interconnect, the leakage 
(shown in Figure 9.20) and PDP of the Long interconnect are more than those of the 
Double interconnect. Similarly, the delay of the Long interconnect is more than that of 
the Double interconnect because it has five pass transistors in its critical path compared 
to four in the Double interconnect. The delay of the Long interconnect is given 
in Table 9.6. 
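The pass transistor counts quoted for the Double and Long interconnects are consistent with a fully encoded binary multiplexer tree, in which an N-to-1 multiplexer has log2(N) levels and one select line per level. The sketch below assumes such a tree; treating the 16-to-1 Double multiplexer as fully encoded is an inference from its four select lines, and the 12-to-1 Hex multiplexer, which mixes decoded and encoded stages, is outside the model (its depth of three is taken from the text). The function name is hypothetical.

```python
def encoded_mux_depth(n_inputs: int) -> int:
    """Pass transistors in series (one per select line) in a fully encoded n:1 mux tree.

    Only power-of-two input counts are supported, since the binary-tree model
    does not describe mixed decoded/encoded multiplexers.
    """
    if n_inputs < 2 or n_inputs & (n_inputs - 1):
        raise ValueError("n_inputs must be a power of two and at least 2")
    return n_inputs.bit_length() - 1


if __name__ == "__main__":
    for name, n in (("Double", 16), ("Long", 32)):
        print(f"{name}: {n}-to-1 mux, {encoded_mux_depth(n)} pass transistors in the critical path")
```

Each additional level adds one series pass transistor to the critical path, which matches the ordering of delays reported for the Hex, Double and Long interconnects.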
Table 9.6: Delay (ps) of different Long interconnect schemes 
VDD=0.9V 
L (um)    Conv-Long    SVT-Long    HVT-Long 
20        264          224         230 
40        276          242         248 
60        298          257         264 
80        336          272         279 
100       360          296         304 
VDD=0.8V 
L (um)    Conv-Long    SVT-Long    HVT-Long 
20        293          252         260 
40        320          285         294 
60        347          297         307 
80        370          315         324 
100       390          359         368 
VDD=0.7V 
L (um)    Conv-Long    SVT-Long    HVT-Long 
20        380          339         350 
40        401          372         381 
60        448          409         420 
80        492          441         450 
100       530          501         511 
VDD=0.6V 
L (um)    Conv-Long    SVT-Long    HVT-Long 
20        539          501         526 
40        597          538         568 
60        646          583         601 
80        685          640         670 
100       715          680         701 
Figure 9.20: Leakage power of Long interconnect schemes at different VDD 
[Bar chart: leakage power vs. VDD (V); series: Conv-Long, SVT-Long, HVT-Long] 
Figure 9.21: PDP of different Long interconnect schemes at VDD=0.9V 
[Plot: PDP vs. L (um), 20 to 100] 
The PDP of the Long interconnect is shown in Figures 9.21 and 9.22 for supply 
voltages of 0.9V and 0.6V respectively. Similar to the discussion in section 9.5.1, it has 
been found that even at the lower operating voltage of 0.6V and L=100um, the SVT-Long 
and HVT-Long schemes are 10% and 7.8% more energy efficient than the Conv-Long 
scheme. 
Figure 9.22: PDP of different Long interconnect schemes at VDD=0.6V 
[Plot: PDP vs. L (um), 20 to 100; series: Conv-Long, SVT-Long, HVT-Long] 
Similarly, at VDD=0.9V, the proposed SVT-DTMOS and HVT-DTMOS 
schemes for the Hex, Double and Long interconnects provide 16 to 22% and 
12 to 20% improvement in energy saving respectively over the conventional (Conv) 
MOS scheme. 
9.6 Summary 
This chapter demonstrates various techniques for designing low power DTMOS 
based interconnects for FPGAs that can be used over a broad range of supply voltages. 
DTMOS delay and efficiency become superior to the traditional design as the voltage is 
reduced and the loading is increased. The proposed DTMOS based interconnect 
resources result in an energy-efficient FPGA architecture. 
Since the interconnect fabric of an FPGA has a large number of multiplexer based 
interconnects, the overall improvement in PDP for the whole FPGA can be 
significant. The area overhead of the proposed interconnects will be very small because 
the extra transistors needed for the DTMOS based switches are easily shared across all 
multiplexer based interconnects, and the augmented transistor is of minimum size. 
A final complication with DTMOS based switches and interconnects is the 
process complexity. The isolation of the body contact requires an additional masking 
step. In bulk silicon, DTMOS can only be implemented in a triple-well process 
technology. Isolation comes naturally for DTMOS when implemented on SOI wafers 
but is difficult with bulk silicon wafers. The additional area and process complexity of 
DTMOS are compensated by its higher operating frequency and higher driving 
capability compared to the conventional CMOS circuit topology. 
Chapter 10 
SUMMARY AND CONCLUSION 
10.1 Introduction 
Field programmable gate arrays require considerable hardware overhead to
offer programmability, thereby making them less power efficient than custom ASICs
for implementing the same logic circuit. The large number of transistors in FPGA chips
suggests that the power trends associated with scaling may impact FPGAs more
severely than custom ASICs. Power management in FPGAs will be mandatory at very
deep submicron technology nodes to ensure correct functionality and reliability and to
reduce packaging costs. Furthermore, low power consumption is needed if FPGAs are
to be a viable alternative to ASICs in low-power portable applications.
The aim of this thesis is to investigate the low power architecture for the basic building 
blocks of FPGA and techniques that can be used for reducing the power consumption 
of these blocks. 
This chapter is organized into four sections. The first section presents the
summary of the work presented in each chapter. The second section provides the
conclusions drawn from the results obtained in each chapter. The third section
summarizes the achievements and the last section outlines areas for future research.
10.2 Summary 
With the scaling of technology, digital systems have grown immensely
complex. However, with increasing circuit complexity, the cost and design cycles of
custom VLSI designs have increased significantly. FPGAs offer an efficient and cost
effective option for implementing digital systems for medium to low volume
production. Digital system designers therefore now get the advantage of the short
time-to-market of programmable logic in addition to almost ASIC-like logic density.
Commercial FPGAs have on-chip memory blocks and DSP resources apart from the
programmable logic, making them more attractive for implementing complete systems
on chip.
In very deep submicron technology, leakage power has emerged as a key design
challenge because leakage increases as device geometries shrink. For FPGAs to
retain their share of the semiconductor market and their competitive advantages over
high performance custom VLSI designs, the FPGA industry must adopt new techniques for
leakage power reduction.
This thesis has proposed the operation of FPGA blocks in the subthreshold regime.
Such FPGAs can be used even in ultra low power portable applications such as
hearing aids and pacemakers.
This thesis also proposed a low power, high speed carbon nanotube field effect
transistor (CNFET) based SRAM cell. Due to the high-K dielectric and the high intrinsic
carrier mobility of carbon nanotubes, the CNFET has lower leakage and higher speed
than a CMOS transistor. Since an FPGA contains a large number of SRAM
cells, the total leakage power consumption will be considerably reduced and the speed
significantly improved if all CMOS based SRAM cells are replaced
by CNFET cells.
Power dissipation in CMOS circuits can be classified as either dynamic or
static. Dynamic power consumption is due to the logic transitions at circuit nodes,
whereas static power is dissipated even when a circuit is in the idle state. Historically,
dynamic power has dominated power consumption in CMOS circuits; however,
technology scaling trends have resulted in leakage power becoming an increasing
component of total power. Chapter 2 presents general and block specific low power
techniques for reducing leakage and switching power in CMOS circuits.
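The two components can be illustrated with the standard first-order expressions, switching power P = alpha*C*VDD^2*f and leakage power P = VDD*I_leak. The following Python sketch is only an illustration: the function names and every numerical value in it are assumptions chosen for the example, not figures taken from this thesis.

```python
def dynamic_power(activity, capacitance_f, vdd_v, freq_hz):
    """First-order switching power in watts: alpha * C * VDD^2 * f."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


def static_power(vdd_v, leakage_a):
    """First-order leakage power in watts: VDD * I_leak."""
    return vdd_v * leakage_a


if __name__ == "__main__":
    # Hypothetical node: 10% activity, 1 pF switched capacitance, 1 GHz clock.
    for vdd in (0.9, 0.6):
        p_dyn = dynamic_power(0.1, 1e-12, vdd, 1e9)
        p_stat = static_power(vdd, 1e-6)  # hypothetical 1 uA leakage
        print(f"VDD={vdd} V  dynamic={p_dyn:.3e} W  static={p_stat:.3e} W")
```

Because the switching term is quadratic in VDD, lowering the supply from 0.9V to 0.6V scales it by (0.6/0.9)^2, roughly 0.44, in this simple model, while the leakage term scales only linearly if the leakage current is held fixed.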
The architecture of recent FPGAs and the breakdown of their power consumption are
described in detail in Chapter 3, and it has been observed that interconnect accounts for
the bulk of an FPGA's static and dynamic power consumption. Various approaches for
reducing FPGA power have been proposed in the literature, including approaches for
leakage reduction. A significant improvement in power efficiency has to be achieved to
make FPGAs viable in portable applications.
Chapter 4 proposed the subthreshold regime operation of FPGA building
blocks. Subthreshold logic can be easily derived from traditional
existing circuits by lowering the supply voltage to less than the threshold voltage. The
building blocks of reconfigurable hardware, such as a 4-input LUT and a one bit full adder
cell, are implemented in MOS and DTCMOS schemes and their delay, power and PDP are
estimated in the subthreshold region. The sensitivities of the two schemes to process
parameter variations are also explored. It has been found that DTCMOS shows superior
robustness against temperature and process variations. Further, an 8T subthreshold
SRAM cell is implemented in MOS and DTCMOS schemes and it has been found that
the DTCMOS cell outperforms the MOS cell in read SNM and write delay (speed) with a
minimal penalty in leakage consumption for the selected supply voltage.
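To make the subthreshold regime concrete, the sketch below evaluates the usual first-order textbook model of subthreshold drain current, I = I0 * exp((VGS - Vth)/(n*vT)) * (1 - exp(-VDS/vT)), where vT = kT/q is the thermal voltage. This is not the device model used for the simulations in Chapter 4; the parameters `i0`, `n` and `vth` are hypothetical values chosen only to show the exponential dependence on gate voltage.

```python
import math

K_BOLTZMANN = 1.380649e-23     # J/K
Q_ELECTRON = 1.602176634e-19   # C


def thermal_voltage(temp_k):
    """Thermal voltage kT/q in volts."""
    return K_BOLTZMANN * temp_k / Q_ELECTRON


def subthreshold_current(vgs, vds, vth, i0, n, temp_k=300.0):
    """First-order subthreshold drain current in amperes.

    i0 (current at VGS = Vth) and n (slope factor) are assumed,
    technology dependent fitting parameters.
    """
    vt = thermal_voltage(temp_k)
    return i0 * math.exp((vgs - vth) / (n * vt)) * (1.0 - math.exp(-vds / vt))


if __name__ == "__main__":
    # Hypothetical device: Vth = 0.3 V, I0 = 100 nA, n = 1.5, VDD = VDS = 0.2 V.
    for vgs in (0.10, 0.15, 0.20):
        i_d = subthreshold_current(vgs, 0.2, 0.3, 1e-7, 1.5)
        print(f"VGS={vgs:.2f} V  ID={i_d:.3e} A")
```

In this model the current changes by a factor of ten for every n*vT*ln(10) of gate voltage, which is why both delay and sensitivity to threshold voltage variation grow quickly once the supply drops below the threshold voltage.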
Technology scaling of the bulk silicon transistor over the last three decades has
not only produced ultra high performance digital circuits but has also sustained
Moore's Law. However, ramifications of "short channel effects" such as exponential
increase in leakage current and large parameter variations have created challenges in
the design and testing of bulk integrated circuits. Due to the very high intrinsic carrier
mobility of carbon nanotubes (CNTs), carbon nanotube field effect transistors (CNFETs)
have caught the attention of device, circuit and system engineers worldwide. Chapter 5
reviews the literature on the architecture and modeling of the CNFET. This chapter also
compares the performance of bulk and CNFET transistor based 5-stage ring oscillator
and FO4 benchmark circuits. It further explores the performance of a CNFET
based FPGA 'Basic Logic Element (BLE)' block and finds that the CNFET based
BLE outperforms the same block implemented in bulk CMOS technology in switching
power and speed.
The ability of an FPGA to implement a variety of circuits on a single chip always
results in the under-utilization of some logic and interconnect resources. These unused
transistors leak power in the absence of switching activity. The interconnect fabric of
an FPGA consumes a major portion of the total leakage power. Further, as we move to
smaller nodes, leakage will ultimately dominate the total power consumption. Chapter
6 focuses on the reduction of leakage power in the interconnect switch matrix
multiplexers by ensuring that the minimum leakage vector is applied to all these
multiplexers. The multiplexers have been analyzed for varying sizes,
topologies and transistor sizing at different temperatures and supply voltages at a deep
submicron 22nm technology node. The minimum leakage state depends heavily on the
relative magnitude of the subthreshold and the gate leakage currents. Therefore,
different low leakage vectors are selected for minimum sized and optimum sized
multiplexers: either keeping all input lines of the multiplexers at logic '1' and the inputs
to the inverters at logic '0', or keeping all the inputs of the multiplexers and inverters at
logic '1'. Applying these vectors provides a significant reduction in leakage for all
unused interconnects without any kind of penalty.
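The idea of input vector control can be sketched as an exhaustive search over the input vectors of an unused multiplexer, keeping the vector with the lowest leakage. In the thesis the leakage of each vector comes from circuit simulation; in the sketch below it is replaced by `toy_leakage`, a deliberately crude and entirely hypothetical cost function in which every input held at '0' costs one unit of subthreshold leakage and every input held at '1' costs one unit of gate leakage. It is meant only to show how the best vector flips with the relative size of the two currents.

```python
from itertools import product


def min_leakage_vector(n_inputs, leakage_fn):
    """Try all 2^n input vectors and return (vector, leakage) with the lowest leakage."""
    candidates = ((vec, leakage_fn(vec)) for vec in product((0, 1), repeat=n_inputs))
    return min(candidates, key=lambda pair: pair[1])


def toy_leakage(vector, i_sub=1.0, i_gate=0.4):
    """Hypothetical leakage model (arbitrary units), NOT the thesis model.

    Each input at 0 is charged i_sub, each input at 1 is charged i_gate.
    """
    ones = sum(vector)
    zeros = len(vector) - ones
    return zeros * i_sub + ones * i_gate


if __name__ == "__main__":
    # Subthreshold leakage dominant: holding inputs high is cheapest in this toy model.
    print(min_leakage_vector(4, toy_leakage))
    # Gate leakage dominant: the preferred vector changes.
    print(min_leakage_vector(4, lambda v: toy_leakage(v, i_sub=0.2, i_gate=0.4)))
```

Exhaustive search is practical only for small multiplexers, since the number of vectors doubles with every input; a simulated leakage table for each multiplexer type could be plugged in as `leakage_fn` without changing the search.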
Due to aggressive scaling, secondary effects and process variations, the power
consumption and performance of the CMOS SRAM cell worsen in deep submicron
technology. It has therefore become difficult to design low power, high speed, robust
and compact SRAM cells in deep submicron technology. The carbon nanotube based
field effect transistor (CNFET) technology, with reduced process variation, better gate
controllability, high thermal stability and high drive current, is a promising alternative
to bulk CMOS. Chapter 7 explores a low leakage, high speed and robust CNFET
based 6T-SRAM cell and compares its performance with that of a conventional CMOS
based cell. All the simulations are carried out at the 32nm technology node with equal
threshold voltage for CNFET and CMOS transistors. Due to inherent characteristics of
the CNFET such as good gate controllability, drive current and immunity to short channel
effect, the CNFET cell outperforms the CMOS cell in terms of leakage power saving,
write margin, speed and read SNM. As an FPGA consists of a large number of
configurable SRAM cells, the implementation of a low leakage SRAM cell in CNFET
technology will greatly contribute to the reduction of the overall leakage consumption of
FPGAs.
The International Technology Roadmap for Semiconductors (ITRS) predicts
that traditional copper interconnects will be a major bottleneck when feature sizes
become smaller than 45nm. This is due to the steep rise in the parasitic resistance of
copper, which not only increases the interconnect delay but also limits its current
carrying capability. In order to alleviate such problems, alternative interconnect
technologies and their architectural implications for FPGAs in future process
technologies must be explored.
CNTs have recently been proposed as a possible replacement for metal
interconnects in future technologies. Due to their long mean free paths (MFP), high
current carrying capability and high thermal conductivity, CNTs are expected to be a
very good alternative material for future FPGA interconnects. Chapter 8 describes the
different categories of CNTs such as Single-Wall Carbon Nanotubes (SWCNTs),
Multi-Wall Carbon Nanotubes (MWCNTs) and Mixed CNTs. Because of their
extremely desirable properties such as high mechanical and thermal stability, high
thermal conductivity and large current carrying capacity, CNT bundle based
interconnects promise to be a good alternative for future FPGA interconnect. However,
the high resistance (of the order of 6.45 kΩ) associated with an isolated CNT
necessitates the use of a bundle of CNTs. Moreover, due to the lack of control over
chirality, any bundle of CNTs consists of metallic as well as semiconducting nanotubes.
Almost all experimental results have demonstrated that a realistic nanotube bundle
contains a mixture of single-walled and multi-walled CNTs (SWCNTs and
MWCNTs). Therefore, more emphasis is given to mixed CNTs and it has been found
that FPGAs implemented with mixed CNT interconnects outperform those with
traditional Cu interconnects in delay and energy consumption. Chapter
8 also provides important guidelines for the selection of vital parameters of mixed CNT
bundles so as to optimize the resistance, capacitance and inductance of mixed CNT
bundle interconnects. Moreover, exhaustive simulations have been carried out for
different interconnect lengths with both CMOS and CNFET drivers for copper and
CNT interconnects. It has been found that most FPGA interconnect resources
implemented with CNFET drivers and CNT interconnects provide better performance
than those with traditional CMOS drivers and copper interconnects.
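The 6.45 kΩ figure quoted above matches the fundamental quantum resistance of a metallic single-wall nanotube, h/(4e^2), and the benefit of bundling follows from placing many such tubes in parallel. The sketch below reproduces both numbers. It is a simplified illustration only: it assumes identical, ideally contacted metallic tubes no longer than the mean free path, and ignores contact resistance, scattering and the semiconducting fraction of a real bundle. The function names are hypothetical.

```python
PLANCK_H = 6.62607015e-34      # J*s
Q_ELECTRON = 1.602176634e-19   # C


def cnt_quantum_resistance():
    """Quantum resistance h / (4 e^2) of one metallic SWCNT, in ohms."""
    return PLANCK_H / (4.0 * Q_ELECTRON ** 2)


def bundle_resistance(n_conducting):
    """Resistance of n identical conducting tubes in parallel (idealized), in ohms."""
    if n_conducting < 1:
        raise ValueError("need at least one conducting tube")
    return cnt_quantum_resistance() / n_conducting


if __name__ == "__main__":
    print(f"single tube: {cnt_quantum_resistance() / 1e3:.2f} kOhm")
    for n in (10, 100, 1000):
        print(f"{n:5d} tubes in parallel: {bundle_resistance(n):8.2f} Ohm")
```

Since only the metallic tubes in a bundle conduct, `n_conducting` would in practice be a fraction of the total tube count, which is one reason the chapter treats the bundle parameters as design variables.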
The basic switching elements in most FPGA interconnects are NMOS pass
transistors or multiplexers, which suffer from a threshold voltage drop that causes high
DC power dissipation in level restoring buffers. To eliminate this problem, recent
architectures from Xilinx and Altera use tri-state buffers. However, this approach has
significant area and power consumption overhead. Chapter 9 suggests methods
other than replacing pass transistors by tri-state buffers for reducing power
consumption. The proposed method uses a novel configuration of the dynamic threshold
MOS (DTMOS) transistor. DTMOS based switches overcome the above disadvantage
at a minimal increase in area. A new augmented DTMOS biasing scheme is proposed
which provides a fixed body bias and improves the high input level of the level restoring
buffer, which reduces the DC power consumption. The main advantage of DTMOS over
conventional MOS is its higher drive current at lower bias levels. Transistor level
simulations are performed on realistic multiplexer based interconnect resources of an
FPGA, such as Double, Hex and Long, driving a copper wire of length varying
from 20um to 100um. It has been found that the DTMOS based interconnect resources
outperform the conventional MOSFET based ones in speed and PDP.
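The link between body bias and the degraded logic '1' can be illustrated with the first-order body effect equation, Vth = Vth0 + gamma*(sqrt(2*phiF + VSB) - sqrt(2*phiF)), together with the rule of thumb that an NMOS pass transistor with its gate at VDD passes a '1' only up to about VDD - Vth. The sketch below is a textbook approximation, not the augmented biasing circuit of Chapter 9; `vth0`, `gamma` and `two_phi_f` are hypothetical parameter values, and the source-voltage dependence of the body bias is ignored for simplicity.

```python
import math


def threshold_voltage(vsb, vth0=0.30, gamma=0.20, two_phi_f=0.80):
    """First-order body effect model for an NMOS transistor, in volts.

    vsb is the source-to-body voltage; a negative value means a forward
    body bias, which lowers the threshold. All defaults are assumed values.
    Valid only while two_phi_f + vsb >= 0.
    """
    return vth0 + gamma * (math.sqrt(two_phi_f + vsb) - math.sqrt(two_phi_f))


def passed_high_level(vdd, vth):
    """Approximate level of a logic '1' after an NMOS pass transistor."""
    return vdd - vth


if __name__ == "__main__":
    vdd = 0.9
    for vsb in (0.0, -0.3):  # no body bias, then a 0.3 V forward body bias
        vth = threshold_voltage(vsb)
        print(f"VSB={vsb:+.1f} V  Vth={vth:.3f} V  high level={passed_high_level(vdd, vth):.3f} V")
```

A higher '1' at the input of the level restoring buffer turns its PMOS device off more completely, which is the mechanism by which a lower threshold reduces the DC power described above.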
A final complication with DTMOS based interconnect resources is the process
complexity and area penalty. The area overhead of the proposed interconnect will be
very small because the extra transistor needed for DTMOS based switches is easily
shared among all multiplexer based interconnects and the augmented transistor is of
minimum size. These disadvantages are compensated by the higher driving
capability, higher operating frequency and lower energy consumption of the DTMOS
circuit compared with conventional MOS.
10.3 Conclusions 
This thesis investigates low power architectures for the basic building blocks of
FPGAs and techniques for reducing both switching and leakage power consumption for
these blocks. The breakdown of power consumption in FPGAs is studied in
Chapter 3 and it can be concluded from that chapter that the interconnect accounts
for the major portion of static and dynamic power consumption.
Chapter 4 implements the building blocks of reconfigurable hardware, such as a 4-input
LUT and a one bit full adder cell, and compares their delay, power
and PDP in the subthreshold region for MOS and DTCMOS schemes. It can be
concluded from Chapter 4 that DTCMOS has superior robustness against temperature
and process variations besides having lower delay than CMOS for the above blocks
at the 22nm node. Chapter 4 also implements an 8 transistor SRAM cell in CMOS and
DTCMOS schemes and it has been found that the DTCMOS based 8 transistor SRAM cell
provides up to 15% and 23% improvement in read SNM and speed, respectively, at a
supply voltage of 200mV. In standby mode, however, the different voltages at the cell
nodes mean that body biasing results in more leakage. Hence, the DTCMOS based cell
consumes 8% more leakage than the CMOS cell.
Chapter 5 reviews the literature on the architecture and modeling of the CNFET.
This chapter also proposed a CNFET based FPGA building block, the BLE. Due to the
very high intrinsic carrier mobility of carbon nanotubes, it is found that the CNFET
based BLE outperforms bulk CMOS technology in speed. Similarly, due
to the very small capacitance of the CNT, the CNFET based BLE consumes very little
switching power compared to the bulk CMOS based LUT.
Chapter 6 proposed an input vector control technique for leakage power reduction
in the interconnect switch matrix multiplexers. It has been observed that most of the
routing multiplexers remain unused in a typical design implemented on an FPGA. These
multiplexers contribute significantly to the leakage power consumption. There is
tremendous scope for reducing leakage power in these unused multiplexers provided
they are fed with the least leakage input vector at all times. Chapter 6
explores minimum leakage vectors for different multiplexer types and sizes at different
temperatures and supply voltages at a deep submicron 22nm technology node. It can be
concluded from Chapter 6 that the proposed selection of input vectors for the Hybrid 16:1
multiplexer with optimum size transistors reduces leakage by up to 23% and 16% at 85°C
for supply voltages of 0.8V and 0.6V respectively.
Chapter 7 proposed a CNFET based 6 transistor SRAM cell. Due to inherent
characteristics of the CNFET such as good gate controllability, drive current, immunity to
short channel effect and very high intrinsic carrier mobility, the CNFET cell
outperforms a CMOS cell in leakage power consumption, write margin, speed and read
SNM. It can be concluded from Chapter 7 that the CNFET cell is
1.84X faster and provides 21% improvement in read SNM. Furthermore, due
to the absence of dangling bonds in the CNFET and because of high-K (HfO2) electrolyte
gating, the leakage power consumption of the CNFET cell is 84% and 40% less than that
of the CMOS cell at operating temperatures of 27°C and 80°C respectively.
The steep rise in the parasitic resistance of copper in deep submicron technology not
only increases the interconnect delay but also limits its current carrying capability. Due to
their long mean free paths (MFP), high current carrying capability and high thermal
conductivity, CNTs are expected to be a very good alternative material for future FPGA
interconnects. Chapter 8 proposed mixed CNT bundles as interconnects for FPGAs.
Double, Hex and Long interconnect resources of an FPGA are implemented by CNT and
Cu wires with CNFET and CMOS drivers respectively. Due to the lower values of the
extracted R and C components of CNT, the CNT bundle interconnect based FPGA
resources outperform the traditional interconnects in terms of speed, power, energy and
energy-delay-product (EDP). It can be concluded from Chapter 8 that the Hex
interconnect resource implemented by CNFET-CNT has 41% less EDP than the
traditional Hex. Similarly, the greater length of the Long interconnects gives CNT
interconnects a larger advantage in speed, energy and EDP. The Long interconnect
resource implemented by CNFET-CNT therefore has 44% less EDP than the traditional
Long interconnect resource.
Most FPGA interconnects use an NMOS pass transistor based multiplexer as the
switching element, which suffers from a weak '1' and causes high DC power dissipation
in level restoring buffers. Chapter 9 proposed the use of novel configurations of dynamic
threshold MOS (DTMOS) which overcome this disadvantage at a minimal increase in
area. Using this novel DTMOS, realistic multiplexer based interconnect resources
of an FPGA such as Double, Hex and Long are implemented. These interconnect
resources drive a copper wire of length 20um to 100um. Due to the high driving current
of DTMOS, it has been found that the DTMOS based interconnect resources outperform
the conventional MOS interconnects in speed and PDP. It is concluded
from Chapter 9 that at VDD=0.6V, compared to conventional MOS (Conv-MOS), the
proposed SVT schemes SVT-Double, SVT-Hex and SVT-Long provide an 18%,
17% and 10% improvement in PDP respectively. The area overhead of the proposed
DTMOS based interconnect will be minimal because the extra augmented
transistor is shared among many multiplexer trees and is of minimum size.
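The figures of merit used throughout these conclusions are simple products: the power-delay product (PDP = power x delay, an energy per operation) and the energy-delay product (EDP = energy x delay). The sketch below shows how a percentage improvement of the kind quoted above is obtained from a baseline and a proposed value. The helper names and the sample numbers are illustrative assumptions, not simulation data from the thesis.

```python
def pdp(power_w, delay_s):
    """Power-delay product in joules."""
    return power_w * delay_s


def edp(energy_j, delay_s):
    """Energy-delay product in joule-seconds."""
    return energy_j * delay_s


def improvement_percent(baseline, proposed):
    """Percentage reduction of a lower-is-better metric relative to the baseline."""
    return 100.0 * (baseline - proposed) / baseline


if __name__ == "__main__":
    # Hypothetical measurements for a conventional and a proposed interconnect.
    conv = pdp(power_w=2.0e-6, delay_s=1.0e-9)
    prop = pdp(power_w=1.8e-6, delay_s=0.95e-9)
    print(f"PDP improvement: {improvement_percent(conv, prop):.1f} %")
```

Because EDP weights delay twice (once through energy per operation and once directly), a scheme that is both faster and lower in energy shows a larger percentage gain in EDP than in PDP.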
10.4 Achievements 
• Subthreshold DTCMOS based schemes for implementing FPGA building
blocks such as 4-input LUT, 1-Bit full adder cell and 8 transistor SRAM cell are
proposed which have better power performance and higher speed than blocks
implemented with conventional MOS.
• A CNFET based FPGA building block, the BLE, is proposed. Due to the very
high carrier mobility and low capacitance of carbon nanotubes, the CNFET
based BLE outperforms a BLE implemented in bulk CMOS technology in speed
and power consumption.
• An input vector control scheme for reducing the leakage of unused FPGA
multiplexer based interconnect is proposed, which significantly reduces the leakage
power of interconnects without any kind of penalty.
• A CNFET based SRAM cell is proposed. Due to inherent characteristics of the
CNFET such as good gate controllability, drive current and immunity to short
channel effect, the CNFET cell outperforms the CMOS cell in terms of leakage
power saving, write margin, speed and read SNM.
• Mixed CNT bundle based interconnects for FPGAs are proposed and their
performance is compared with conventional Cu interconnects. Due to the lower
values of the extracted R and C components of mixed CNT, the mixed CNT bundle
interconnect based FPGA resources perform better in delay, power, energy and
energy-delay-product (EDP). A combination of CNFET driver and CNT
interconnect gives the best performance in terms of key parameters.
• A novel configuration of dynamic threshold MOS (DTMOS) scheme for FPGA
multiplexer based interconnect is proposed. In DTMOS, the gate potential
changes the threshold voltage and produces a strong '1', which reduces the DC
power dissipation in the level restoring buffer. Using this novel DTMOS,
realistic multiplexer based interconnect resources of an FPGA such as Double, Hex
and Long are implemented. Due to the high driving current of DTMOS, the
DTMOS based interconnect resources outperform conventional MOS interconnects
in speed and PDP.
10.5 Areas for Future Research 
• Subthreshold FPGA 
A subthreshold FPGA design faces a combination of subthreshold circuit
challenges and problems inherent in FPGA architectures. Three major challenges stand
out for subthreshold FPGA design: process variation, long interconnect
lengths and memory. Variation threatens to disrupt any subthreshold design.
FPGAs typically dissipate 60% - 70% of their power in the interconnect network (e.g.,
wires, buffers, connection boxes and routing switches), 10% - 20% in the clock
network and 5% - 20% in logic. This breakdown indicates that a focus on clocking and
interconnect is necessary. These are the two areas on which very little previous
subthreshold FPGA work has focused. The clock network of an FPGA extends
across the entire fabric and drives all of the registers in the design. This large
distributed network can consume a significant amount of power. Furthermore, driving
the large capacitive load of the clock network with buffers operating in the
subthreshold region presents a significant problem. Variation can lead to
substantial differences in the drive strength of the buffers, potentially leading to large
clock skew across the FPGA fabric. Hence an efficient method to reduce skew is
important for a viable subthreshold FPGA design. One extension of this work may be the
implementation of interconnects by mixed CNT bundles instead of copper, and of all the
logic, switch boxes and connection boxes by CNFETs instead of CMOS.
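The breakdown quoted above explains why interconnect and clocking deserve the most attention: a saving in one block reduces the total only in proportion to that block's share. The short sketch below makes this arithmetic explicit. The shares are hypothetical values picked from within the quoted ranges so that they sum to one, and the 20% interconnect saving is an arbitrary example, not a result from this thesis.

```python
def remaining_power(shares, savings):
    """Fraction of the original total power left after per-block savings.

    shares maps block name to its fraction of total power (should sum to 1);
    savings maps block name to the fractional reduction achieved in that block.
    """
    return sum(share * (1.0 - savings.get(block, 0.0)) for block, share in shares.items())


if __name__ == "__main__":
    # Assumed split, chosen inside the ranges quoted in the text.
    shares = {"interconnect": 0.65, "clock": 0.15, "logic": 0.20}
    for block in shares:
        left = remaining_power(shares, {block: 0.20})
        print(f"20% saving in {block:12s} -> total power reduced by {100 * (1 - left):.1f} %")
```

With these assumed shares, the same relative saving is worth more than three times as much when applied to the interconnect as when applied to the logic.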
• Nanowire Based FPGAs 
The ITRS [ITRS-07] has stated that it will be a difficult challenge to progress CMOS
technology beyond the 22 nm technology generation. This challenge has stimulated
work on next-generation devices, most likely based on non-planar structures such as
double-gate FETs and fin-FETs. However, these technologies rely on an enhancement of
individual device performance (such as increased mobility, lower leakage current or
higher drive current) and do not solve the issues of size and density limitations.
To directly address these challenges, novel nanoscale transistor channel materials, such
as semiconducting nanowires (NWs), have good scope to replace
conventional CMOS. These materials are attractive because they have very narrow
diameters and have no density limitations since they are not fabricated using
conventional lithography techniques.
Due to their crystalline structure, smooth surfaces and the ability to produce
radial and axial nanowires, they can reduce scattering, which results in higher carrier
mobility. Semiconducting nanowires of a variety of materials can be grown with
controlled diameters down to 3nm. By material selection or doping, one can engineer
the electrical properties of the nanowires (e.g., P-type, N-type). With nano-imprint, Dip
Pen Nano-lithography and self-assembly technologies it is possible to get sets of
parallel wires which can be used in FPGA routing. These wires could be made using a
single crystal of metal-silicide (NiSi nano-wires). At the nanoscale, one can use
single-molecule switches that exhibit reversible switching behavior. These molecules
self-assemble at the cross-points of nanowires, and can be switched between ON and OFF
states by the application of a voltage bias.
The future scope of this work is to implement the FPGA routing interconnects with
nanowires and the C-Box, S-Box and logic blocks with single-molecule switches. It is
expected that such a nanoscale FPGA architecture will provide the best performance
with the least area. Building successful nanoscale devices requires synergy between
Design and Process Engineers and Chemists.
• Sublithography FPGAs 
One of the most important challenges for scaling feature sizes is the cost of the
fabrication process. Sublithographic techniques may offer an economical alternative to
costly lithographic feature size scaling. These new sublithographic technologies with
10nm full pitch semiconductor and metal nanowires may enable tera-scale system
integration. In addition, nanowires also provide very high interconnect density.
Sub-lithographic electronic devices may also work as reconfigurable molecules
(switching elements).
Switchable molecules can be assembled and placed, one or a few molecules
under each junction, in a crossed array to provide programmable junctions. With these
building blocks, one can build a diode junction by crossing P-doped and N-doped
nanowires. Field effects are then used to control conduction in semiconducting
nanowires to implement switchable crosspoints or memory bits. Using axial
variation of doping or material combination, a single nanowire can have regions which
are gateable and other regions which are not gateable. These devices are sufficient to
build programmable memory points and field-effect based inverting and restoring logic
gates which can be incorporated in future nanoscale FPGAs.
The CNFET Technology is assumed to be perfect in this work for simulation 
purposes. The work can be refined further by assuming limitations of existing CNT 
technology like imperfect Chirality control, misaligned CNTs and presence of metallic 
CNTs etc. 
Appendix A 
LIST OF PUBLICATION 
Journals 
1- Kureshi A. K. and Mohd. Hasan, "Performance comparison of CNFET and
CMOS based 6T SRAM cell in deep submicron," Microelectronic Journal, 
Vol.40, No.6, pp.979-982, June-2009. 
2- Kureshi A. K. and Mohd. Hasan, "Analysis of CNT bundle and its comparison 
with copper interconnect for CMOS and CNFET drivers," Journal of 
Nanomaterials, pp. 1-6, 2009. 
3- Kureshi A. K. and Mohd. Hasan, "DTMOS based low power high speed 
interconnects for FPGA," Journal of Computers, Vol.4, No. 10, pp.921-926, 
Oct-2009. 
4- Kureshi A. K. and Mohd. Hasan, "Analysis of CNT bundle and its comparison 
with copper for FPGAs interconnects," International Journal of Applied Science, 
Engineering and Technology, Volume 5:3, pp.178-183, 2009. 
5- Kureshi A. K. and Mohd. Hasan, "Leakage analysis and optimization of CLB in
Virtex-II FPGAs," International Journal of Systemic Cybernetic and
Informatics, Vol.3, No.1, pp.2718-2722, Nov-2007.
Conferences 
6- Abdul Kadir Kureshi, Mohd. Hasan, and Naushad Alam, "Subthreshold deep 
submicron performance investigation of CMOS and DTCMOS Biasing 
Schemes for Reconfigurable Computing," IEEE ISCAS, Taiwan, pp. 2545-
2548, May 2009. 
7- Mohd. Hasan and Abdul Kadir Kureshi, "Leakage reduction in FPGA routing 
multiplexers," IEEE ISCAS, Taiwan, pp. 1129-1132, May 2009. 
8- Naushad Alam, A. K. Kureshi, and Mohd. Hasan, "Carbon nanotube 
interconnects for low-power high-speed applications," IEEE ISCAS, Taiwan pp. 
2273-2276, May 2009. 
9- A. K. Kureshi and Mohd. Hasan, "Energy efficient high speed CNFET based 
interconnect drivers for FPGAS," International Conference on Multimedia, 
Signal Processing and Communication Technologies (IMPACT-09) pp. 48-51, 
March 2009. 
10-Naushad Alam, A. K. Kureshi, and Mohd. Hasan, "Analysis of carbon nanotube 
interconnects and their comparison with Cu interconnects," International 
Conference on Multimedia, Signal Processing and Communication 
Technologies (IMPACT-09), pp. 124-127, March 2009. 
11-Naushad Alam, A. K. Kureshi, and Mohd. Hasan, "Performance comparison 
and variability analysis of CNT bundle and Cu interconnects," International 
Conference on Multimedia, Signal Processing and Communication 
Technologies (IMPACT-09), pp. 169-172, March 2009. 
12-Kureshi A. K., Naushad Alam, and Mohd. Hasan, "A novel low power high 
speed field programmable gate array routing interconnect," in the proceedings 
of SPIT-IEEE Colloquium and International Conference, Vol. 2, pp. 145 - 149, 
Feb. 2008. 
13-Naushad Alam, Kureshi A. K., and Mohd. Hasan, "Analysis and comparison of 
subthreshold 1-bit full adder cells", in the proceedings of SPIT-IEEE 
Colloquium and International Conference, Vol. 2, pp. 127 - 131, Feb. 2008. 
14-Kureshi A. K. and Mohd. Hasan, "Leakage power estimation and minimization 
in CLB of FPGA," Proceedings of the IEEE-International Conference on 
Computer and Communication Engineering, Kuala Lumpur, Malaysia, pp. 270-
274, May 13-15, 2008. 
15-Tarun Kumar Agarwal, Anurag Sawhney, Kureshi A. K. and Mohd. Hasan, 
"Performance comparison of static CMOS and MCML gates in sub-threshold 
region of operation for 32nm CMOS technology," Proceedings of the IEEE-
International Conference on Computer and Communication Engineering, Kuala 
Lumpur, Malaysia, pp. 284-287, May 13-15, 2008.
16-Kureshi A. K. and Mohd. Hasan, "Low power field programmable gate array
interconnects," in the proceedings of the 5th International Conference on
Systemics, Cybernetics and Informatics (ICSCI 2008), Vol. 1, pp. 52 - 55.
17-Naushad Alam, Kureshi A. K., and Mohd. Hasan, "Subthreshold CMOS full
adder for ultra-low power operation," in the proceedings of the 5th International
Conference on Systemics, Cybernetics and Informatics (ICSCI 2008), Vol. 1, pp.
48-51.
18- Kureshi A. K. and Mohd. Hasan, "Leakage analysis of CNFET based basic 
digital building blocks," in the proceedings of International Conference on 
Embedded System and VLSI Design (ICEVD 2008), pp. 246 - 249 
19-Kureshi A. K. and Mohd. Hasan, "Leakage power and delay optimization of
FPGA interconnects," IEEE-International Conference on Signal Processing,
Communications and Networking (ICSCN 2008), pp. 568-572, Jan 2008.
20- Kureshi A. K. and Mohd. Hasan, "Low leakage high speed CNFET based look 
up table," IEEE-International Conference on Emerging Trends in Electronics 
Technology), pp. 432-436, July 2008. 
21- Fahad Ali Usmani, A. K. Kureshi, Mohd. Hasan, and M.J.R. Khan, "Performance
analysis of bulk CMOS, strained silicon and CNFET based operational
amplifiers in VDSM technology," IEEE International Advance Computing
Conference (IACC), pp. 2567-2579, March 2009.
22- Kureshi A. K., and Mohd. Hasan, "Performance comparison of CNFET and 
CMOS based 8T SRAM Cell in deep submicron," 12th IEEE VLSI Design and 
Test Symposium (VDAT-08), pp. 270-274, July 2008. 
23-Naushad Alam, Kureshi A. K., and Mohd. Hasan, "Dynamic threshold PMOS 
switch for power gating," 12th IEEE VLSI Design and Test Symposium 
(VDAT-08), pp. 317-320, July 2008. 
24-Anurag Agarwal, Kureshi A. K., and Mohd. Hasan, "Dynamic threshold PMOS 
switch for power gating," 12th IEEE VLSI Design and Test Symposium 
(VDAT-08), pp. 426-430, July 2008. 
25-Kureshi A. K. and Mohd. Hasan, "Subthreshold operation of field 
programmable gate array," In the proceedings of MTECS, pp. 244 - 247, March 
2008. 
26-Kureshi A. K. and Mohd. Hasan, "Interconnect performance comparison of 
FPGA at 32nm technology," in the proceedings of NSC-08 (IIT Roorkee), pp. 
766 - 769, 2008. 
27- Kureshi A. K. and Mohd. Hasan, "Leakage analysis and optimization of SRAM 
cell at 32nm," in the proceedings of NSC-07, pp. 123 - 126, 2007. 
28- Kureshi A. K. and Mohd. Hasan, "A study of different circuit level techniques
for low leakage SRAM cells," In the proceedings of NCACCN, pp. 438 - 411, 
Feb 2007. 
29-Kureshi A. K. and Mohd. Hasan, "Energy efficient FPGA interconnects,"
Submitted to Mediterranean Nanotechnology letters, 7th Jan-2010.
30- Kureshi A. K. and Mohd. Hasan, "DTMOS based low power FPGAs building
blocks," Submitted to Mediterranean journal of Electronics and Communication,
2nd Feb-2010.
31-Kureshi A. K. and Mohd. Hasan, "Ultra-low power FPGAs design in
subthreshold regime," Submitted to International Journal of Electronics, 10
April 2010.

REFERENCES 
[1] E. Kusse and J. Rabaey, "Low-energy embedded FPGA structures," in Proc. 
Int. Symp. Low Power Electronics and Design, pp. 155-160, Aug. 1998. 
[2] V. George and J. Rabaey. Low-Energy FPGAs: Architecture and Design. 
Kluwer Academic Publishers, Boston, MA, 2001. 
[3] P. Zuchowski, C. Reynolds, R. Grupp, S. Davis, B. Cremen, and B. Troxel, 
"A Hybrid ASIC and FPGA Architecture," IEEE International Conference 
on Computer-Aided Design, pp. 187-194, 2002. 
[4] F. Li and L. He, "Power modeling and characteristics of field programmable 
gate arrays," IEEE Trans. Computer-Aided Design of Integrated Circuits 
and Systems, Vol. 24, No. 11, pp. 1712-1724, Oct. 2005. 
[5] I. Kuon and J. Rose, "Measuring the gap between FPGAs and ASICs," 
IEEE Trans. On CAD of Integrated Circuits and Systems, Vol. 26, No. 2, 
pp.203-215, February 2007. 
[6] Stratix FPGA Device Handbook. Altera Corp., San Jose, CA, 2003. 
[7] Xilinx, Inc., San Jose, CA. Virtex-5 FPGA Data Sheet, 2007. 
[8] Ahmed, E.; Rose, J., "The effect of LUT and cluster size on deep-submicron 
FPGA performance and density," IEEE Transactions on Very Large Scale 
Integration (VLSI) Systems, vol. 12, no. 3, pp. 288-298, March 2004. 
[9] T. Tuan and B. Lai, "Leakage power analysis of a 90nm FPGA," in Proc. 
IEEE Custom Integrated Circuits Conf., pp. 57-60, 2003. 
[10] L. Shang, A. S. Kaviani, and K. Bathala, "Dynamic power consumption in 
Virtex-II FPGA Family," in Proc. ACM/SIGDA 10th Int. Symp. Field 
Programmable Gate Arrays, pp. 157-164, 2002. 
[11] http://www.itrs.net/Links/2007ITRS/2007 Chapters/2007 Design.pdf. 
[12] S. Srinivasan, A. Gayasen, N. Vijaykrishnan, T. Tuan, "Leakage control in 
FPGA routing fabric," ASP-DAC, pp. 661-664, 2005. 
[13] J. H. Anderson and F. N. Najm, "Low-Power Programmable Routing 
Circuitry for FPGAs," ICCAD, pp. 602-609, 2004. 
[14] F. Li, Y. Lin, and L. He, "Vdd Programmability to reduce FPGA 
Interconnect Power," in Proc. Intl. Conference on Computer Aided Design, 
Nov 2004. 
[15] F. Li, D. Chen, L. He, and J. Cong, "Architecture evaluation for power-
efficient FPGAs," in ACM International Symposium on Field Programmable 
Gate Arrays, pp. 175-184, 2003. 
[16] K.W. Poon, A. Yan, and S. J. E. Wilton, "A flexible power model for 
FPGAs," in International Conference on Field-Programmable Logic and 
Applications, pp. 312-321, 2002. 
[17] S.M. Kang and Y. Leblebici. CMOS Digital Integrated Circuits: Analysis 
and Design. Third Edition. McGraw-Hill, 2003. 
[18] G. Yeap. Practical Low Power Digital VLSI Design. Kluwer Academic. 
[19] Man Lung Mui, Kaustav Banerjee, and Amit Mehrotra, "Supply and Power 
Optimization in Leakage-Dominant Technologies," IEEE Trans. on 
Computer-Aided Design of Integrated Circuits and Systems, Vol. 24, No. 9, 
pp. 1362-1371, Sept. 2005. 
[20] H. J. M. Veendrick, "Short-circuit dissipation of static CMOS circuitry and 
its impact on the design of buffer circuits," IEEE J. Solid-State Circuits, 
Vol. 19, No. 4, pp. 468-473, Aug. 1984. 
[21] A. Agarwal, C. Kim, S. Mukhopadhyay, and K. Roy, "Leakage in Nano-
Scale Technologies: Mechanisms, Impact and Design Considerations," In: 
ACM/IEEE Design Automation Conference, pp. 6-11, San Diego, CA, 
2004. 
[22] W.K. Henson et al., "Analysis of leakage currents and impact on off-state 
power consumption for CMOS technology in the 100-nm regime," IEEE 
Transactions on Electron Devices, Vol. 47, No. 2, pp. 440-447, February 
2000. 
[23] K. Roy, S. Mukhopadhyay, H. M. Meimand, "Leakage current mechanisms 
and leakage reduction techniques in deep-submicrometer CMOS circuits," 
Proc. IEEE, Vol. 91, No. 2, pp. 305-327, Feb. 2003. 
[24] V. De et al., "Techniques for leakage power reduction," in Design of High-
Performance Microprocessor Circuits, A. Chandrakasan, W. J. 
Bowhill, and F. Fox, Eds. Piscataway, NJ: IEEE, 2001, pp. 285-308. 
[25] B. Van Zeghbroeck, Principles of Semiconductor Devices, 
http://ece-www.colorado.edu/~bart/book/book/title.htm, ch. 4. 
[26] Y. Taur and T. H. Ning, Fundamentals of Modern VLSI Devices. New 
York: Cambridge Univ. Press, 1998, ch. 2, pp. 94-95. 
[27] S. Borkar, "Circuit techniques for subthreshold leakage avoidance, control, 
and tolerance," in IEDM Tech. Dig., pp. 421-424, 2004. 
[28] D. Lee, W. Kwong, D. Blaauw, and D. Sylvester, "Simultaneous 
subthreshold and gate-oxide tunneling leakage current analysis in 
nanometer CMOS design," ISQED, pp. 287-292, 2003. 
[29] H. Koura, M. Takamiya, and T. Hiramoto, "Optimum conditions of body 
effect factor and substrate bias in various threshold voltage MOSFETs," 
Jpn. J. Appl. Phys., vol. 39, no. 4B, pp. 2312-2317, Apr. 2000. 
[30] S.-F. Huang, et al., "Scalability and biasing strategy for CMOS with active 
well bias," in VLSI Symp. Tech. Dig., pp. 107-108, 2001. 
[31] J. Kao, A. P. Chandrakasan, "Dual-threshold voltage techniques for low-
power digital circuits," IEEE Journal of Solid-State Circuits, Vol. 35, pp. 
1009-1018, July 2000. 
[32] D. Lee, D. Blaauw, and D. Sylvester, "Gate oxide leakage current analysis 
and reduction for VLSI circuits," IEEE Trans. on Very Large Scale 
Integration (VLSI) Systems, Vol. 12, No. 2, pp. 155-166, Feb. 2004. 
[33] F. Hamzaoglu and M. Stan, "Circuit-level techniques to control gate leakage 
for sub-100nm CMOS," in Proc. Int. Symp. Low Power Electronics and 
Design, pp. 60-63, Aug. 2002. 
[34] S. Mukhopadhyay, C. Neau, R. T. Cakici, A. Agarwal, C. H. Kim, and K. 
Roy, "Gate Leakage Reduction for Scaled Devices Using Transistor 
Stacking," IEEE Transactions on VLSI Systems, Vol. 11, No. 4, pp. 716-
730, 2003. 
[35] K. Roy and S. C. Prasad, Low-Power CMOS VLSI Circuit Design. New 
York: Wiley, 2000, ch. 2, pp. 28-29. 
[36] O. Semenov, A. Pradzynski, M. Sachdev, "Impact of Gate Induced Drain 
Leakage on Overall Leakage of Submicrometer CMOS VLSI Circuits," 
IEEE Trans. on Semicond. Manufacturing, Vol. 15, No. 1, 2002. 
[37] J.-M. Chang and M. Pedram, "Energy minimization using multiple supply 
voltages," IEEE Transactions on Very Large Scale Integration (VLSI) 
Systems, vol. 5, pp. 436-443, 1997. 
[38] V. Sundararajan and K. K. Parhi, "Synthesis of Low-Power CMOS VLSI 
Circuits using Dual Supply Voltages," ACM Design Automation 
Conference, New Orleans, pp. 72-75, 1999. 
[39] C. Chen, A. Srivastava, and M. Sarrafzadeh, "On gate level power 
optimization using dual-supply voltages," IEEE Transactions on Very 
Large Scale Integration (VLSI) Systems, Vol. 9, pp. 616-29, 2001. 
[40] Y. Cao, T. Sato, M. Orshansky, D. Sylvester, and C. Hu, "New paradigm of 
Predictive MOSFET and Interconnect Modeling for Early Circuit 
Simulation," IEEE Custom Integrated Circuits Conference Proceedings 
(CICC'2000), pp. 201-204, 2000. 
[41] V. Stojanovic, D. Markovic, B. Nikolic, M. Horowitz, and R. Brodersen, 
"Energy-Delay Tradeoffs in Combinational Logic Using Gate Sizing and 
Supply Voltage Optimization," Proc. European Solid-State Circuits Conf., 
Italy, Sept. 2002. 
[42] D. Nguyen, A. Davare, M. Orshansky, D. Chinnery, B. Thompson, K. 
Keutzer, "Minimization of Dynamic and Static Power Through Joint 
Assignment of Threshold Voltages and Sizing Optimization," ISLPED 
2003. 
[43] S. Augsburger, B. Nikolic, "Reducing Power with Dual Supply, Dual 
Threshold and Transistor Sizing," International Conference on Computer 
Design, ICCD'02, pp. 316-321, Sept. 2002. 
[44] W. Chuang, S. S. Sapatnekar, and I. N. Hajj, "A Unified Algorithm for Gate 
Sizing and Clock Skew Optimization to Minimize Sequential Circuit Area," 
in Proc. of the International Conference on Computer Aided Design, pp. 
220-223, Nov. 1993. 
[45] Borah M., Owens R. M., and Irwin M. J., "Transistor Sizing for low power 
CMOS circuits," IEEE Transaction on Computer Aided Design, pp. 665-
677, June 1996. 
[46] C. V. Schimpe, A. Wroblewski, and J. A. Nassek, "Transistor Sizing for 
Switching Activity Reduction in Digital Circuits," in Proc. of the European 
Conference on Theory and Design, Vol.1, pp. 114-117, Aug. 1999. 
[47] M. Borah, M. J. Irwin, and R. M. Owens, "Minimizing Power Consumption 
of Static CMOS Circuits by Transistor Sizing and Input Reordering," in 
Proc. Of the International Conference on VLSI Design, pp. 294-298, Jan. 
1995. 
[48] J. Frenkil and S. Venkatraman, "Power Gating Design Automation," in Closing 
the Power Gap Between ASIC and Custom. Springer, 2007, ch. 10. 
[49] S. Mutoh, T. Douseki, Y. Matsuya, T. Aoki, S. Shigematsu, and J. Yamada, 
"1-V Power Supply High-Speed Digital Circuit Technology with 
Multithreshold CMOS," IEEE Journal of Solid-State Circuits, Vol. 30, No. 8, 
pp. 847-854, Aug. 1995. 
[50] K. Usami, N. Kawabe, and M. Koizumi, "Automated selective multi-
threshold design for ultra-low standby applications," ISLPED'02, pp.202-
206, Aug. 2002. 
[51] H. Ananthan, C. H. Kim, and K. Roy, "Larger-than-Vdd Forward Body Bias 
in Sub-0.5V Nanoscale CMOS," in Proc. of the International Symposium 
on Low Power Electronics and Design, pp. 8-13, 2004. 
[52] V. K. Arnim, E. Borinski, P. Seegebrecht, H. Fiedler, R. Brederlow, R. 
Thewes, J. Berthold, and C. Pacha, "Efficiency of Body Biasing in 90-nm 
CMOS for Low-Power Digital Circuits," IEEE Journal of Solid-State 
Circuits, Vol. 40, No. 7, pp. 1549-1556, 2005. 
[53] X. Liu and S. Mourad, "Performance of Submicron CMOS Devices and 
Gates with Substrate Biasing," in Proc. of the IEEE International 
Symposium on Circuits and Systems, pp. 9-12, 2000. 
[54] L. Wei, K. Roy, and V. K. De, "Low Voltage Low Power CMOS Design 
Techniques for Deep Submicron ICs," in Proc. of the 13th International 
Conference on VLSI Design, pp. 24-29, 2000. 
[55] S. Narendra, S. Borkar, V. De, D. Antoniadis and A. Chandrakasan, 
"Scaling of Stack Effect and its Application for Leakage Reduction," IEEE 
Proc. of Low Power Electronics and Design, pp. 195-200, 2001. 
[56] M. C. Johnson, D. Somasekhar, C. Lih-Yih, and K. Roy, "Leakage Control 
With Efficient Use of Transistor Stacks in Single Threshold CMOS," IEEE 
Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 10, No. 
1, pp. 1-5, 2002. 
[57] S. Mukhopadhyay and K. Roy, "Accurate Modeling of Transistor Stacks to 
Effectively Reduce Total Standby Leakage in Nano-Scale CMOS Circuits," 
in Proc. Symposium on VLSI Circuits, pp. 53-56, 2003. 
[58] J. Park, V. J. Mooney, and P. Pfeiffenberger, "Sleepy Stack Reduction in 
Leakage Power," Proceedings of the International Workshop on Power and 
Timing Modeling, Optimization and Simulation, pp. 148-158, September 
2004. 
[59] L.T. Clark, R. Patel, and T.S. Beaty, "Managing standby and active mode 
leakage power in deep submicron design," Proc. International Symposium 
on Low Power Electronics and Design, pp. 274-279, Aug. 2004. 
[60] Xilinx Corporation, "XC4000 Field Programmable Gate Arrays: 
Programmable Logic Databook", 1996. 
[61] Xilinx Corporation, "XC6200 Field Programmable Gate Arrays: Advance 
Product Specification", June 1996. 
[62] Srinivas, R., et al., "A High Density Embedded Array Programmable Logic 
Architecture", Altera Corporation, IEEE 1996 Custom Integrated Circuits 
Conference. 
[63] Xilinx, Inc., Datasheet: Virtex II Pro Platform FPGAs: Functional 
Description (DS083-2), January 2002. 
[64] Altera Corporation, Datasheet: Stratix Programmable Logic Family Device, 
February 2002. 
[65] Xilinx, Inc., Datasheet: Virtex-E 1.8V Field Programmable Gate Arrays, 
September 2000. 
[66] Accelerator Series FPGAs- ACT family, Actel corporation, 1997. 
[67] XS family of high performance FPGAs, Actel corporation, 2001. 
[68] Y. Lia, C. Kao, T. Chang and K. Chen, "A FPGA chip with Hierarchical 
Interconnection Structure," Proc. of IEEE International Symposium on 
Circuits and Systems, pp. 402-405, 1998. 
[69] "Virtex-II Platform FPGAs: Complete Data Sheet," www.xilinx.com. 
[70] T. Tuan, S. Kao, A. Rahman, S. Das, and S. Trimberger, "A 90nm low-
power FPGA for battery-powered applications," IEEE Trans. On CAD of 
Integrated Circuits and Systems, Vol. 26, No. 2, pp. 296-300, February 
2007. 
[71] Benton H. Calhoun, Frank A. Honore, and Anantha Chandrakasan, "Design 
Methodology for Fine-Grained Leakage Control in MTCMOS," ISLPED-
06, pp. 104-109. 
[72] Pradeep S. Nair, Santosh Koppa, Eugene B. John, "A comparative analysis 
of coarse-grain and fine-grain power gating for FPGA lookup tables," 52nd 
IEEE International Midwest Symposium on Circuits and Systems, pp. 507-
510, 2009. 
[73] Arifur Rahman, Satyaki Das, Tim Tuan, and Steve Trimberger, 
"Determination of Power Gating Granularity for FPGA Fabric," IEEE 
Custom Integrated Circuits Conference, pp. 9-12, 2006. 
[74] Hassan Hassan, Mohab Anis, Antoine El Daher, Mohamed Elmasry, 
"Activity Packing in FPGAs for Leakage Power Reduction," Proceedings of 
the Design, Automation and Test in Europe Conference and Exhibition, Vol. 
1, pp. 212-217, (DATE-2005). 
[75] F. Li, Y. Lin, L. He, and J. Cong, "Low-Power FPGA Using Pre-Defined 
Dual-Vdd/Dual-Vt Fabrics," Proceedings of International Symposium on 
Field Programmable Gate Arrays, pp. 42-50, 2004. 
[76] L. Ciccarelli, A. Lodi, and R. Canegallo, "Low leakage circuit design for 
FPGAs," IEEE Custom Integrated Circuits Conference, pp. 715-718, 2004. 
[77] Arifur Rahman, Satyaki Das, Tim Tuan, and Anirban Rahut, 
"Heterogeneous Routing Architecture for Low-Power FPGA Fabric," IEEE 
Custom Integrated Circuits Conference, pp.9-3(I)-5, 2005. 
[78] L. A. Geddes, "Historical Highlights in Cardiac Pacing", IEEE Engineering 
in Medicine and Biology Magazine, pp. 12-18, June 1990. 
[79] A. P. Pentland, et al., "Digital Doctor: An Experiment in Wearable 
Telemedicine," International Symposium on Wearable Computers, pp. 173-
174, 1997. 
[80] C. Hyung Kim, H. Soeleman and K. Roy, "Ultra-Low-Power DLMS 
Adaptive Filter for Hearing Aid Applications," IEEE Transactions on Very 
Large Scale Integration (VLSI) Systems, Vol. 11, No. 6, pp. 1058-1067, 
Dec. 2003. 
[81] E. Vittoz, "Micropower techniques," in Design of VLSI Circuits for 
Telecommunication and Signal Processing, J. E. Franca and Y. P. Tsividis, 
Eds. Englewood Cliffs, NJ: Prentice-Hall, 1994, ch. 5. 
[82] G. Asada, M. Dong, T. S. Lin, F. Newberg, G. Pottie, and W. J. Kaiser, 
"Wireless integrated network sensors: Low power systems on a chip," in 
Proc. ESSCIRC'98, 1998, pp. 9-16. 
[83] D. Estrin, R. Govindan, J. Heidemann, and S. Kumar, "Next century 
challenges: Scalable coordination in sensor networks," in Proc. ACM 
MobiCom'99, Aug. 1999, pp. 263-270. 
[84] J. Nyathi and B. Bero, "Logic circuits operating in subthreshold voltages," 
Int. Symp. on Low Power Electronics and Design, pp. 131-134, Oct. 2006. 
[85] B.H. Calhoun, A. Wang and A. Chandrakasan, "Modeling and sizing for 
minimum energy operation in subthreshold circuits," IEEE Journal of solid 
state circuits, Vol. 40, No. 9, pp. 1778-1786, Sept. 2005. 
[86] A. Wang, A.P. Chandrakasan, "A 180-mV subthreshold FFT processor 
using a minimum energy design methodology," IEEE JSSC, Vol. 40, No. 1, 
pp. 310-319, Jan. 2005. 
[87] H. Soeleman, K. Roy and B.C. Paul, "Robust subthreshold logic for ultra-
low power operation," IEEE Trans. On VLSI Systems, Vol. 9, No. 1, pp. 90-
99, Feb. 2001. 
[88] S. Hanson et al., "Exploring variability and performance in a sub-200mV 
processor," IEEE Journal of solid state circuits, Vol. 43, No. 4, pp. 881-891, 
April 2008. 
[89] S. Hanson et al., "Energy optimality and variability in subthreshold design," 
Int. Symp. On low power Electronics and Design, pp. 363-365, Oct. 2006. 
[90] R.J. Ramirez, J. Jaffar and M. Anis, "Variability-aware design of 
subthreshold devices," Int. symp. On Circuits and Systems, pp. 1196- 1199, 
2008. 
[91] Fariborz Assaderaghi et al., "Dynamic Threshold-Voltage MOSFET 
(DTMOS) for Ultra-Low Voltage VLSI," IEEE Trans. On Electron Devices, 
Vol. 44, No. 3, pp. 414 -422, Mar. 1997. 
[92] Fariborz Assaderaghi et al., "A Dynamic Threshold-Voltage MOSFET 
(DTMOS) for Very Low Voltage Operation," IEEE Electron Devices Lett., 
Vol. 15, No. 12, pp. 510 - 512, Dec. 1994. 
[93] N. G. Tarr, R. Soreefan, T. W. MacElwee, W. M. Snelgrove, and S. 
Bazarjani, "Simple backgated MOSFET structure for dynamic threshold 
control in fully depleted SOI CMOS," Electron. Lett., vol. 32, pp. 1093-
1095, June 1996. 
[94] N. Lindert, T. Sugii, S. Tang and C. Hu, "Dynamic threshold pass-transistor 
logic for improved delay at lower power supply voltages," IEEE J. Solid-
State Circuits, vol. 34, no. 1, pp. 85-90, Jan. 1999. 
[95] http://www.eas.asu.edu/~ptm/ 
[96] Navid Azizi and Farid N. Najm, "Look-Up Table Leakage Reduction for 
FPGAs," IEEE Custom Integrated Circuits Conference, pp. 1-4, 2005. 
[97] L. Chang et al., "Stable SRAM Cell Design for the 32nm Node and 
Beyond," Symp. VLSI Tech. Dig., pp. 292-293, Jun., 2005. 
[98] Massimo Alioto and Gaetano Palumbo, "Analysis and Comparison on Full 
Adder Block in Submicron Technology," IEEE Trans. Very Large Scale 
Integration (VLSI) Systems, Vol. 10, No. 6, pp. 806-823, Dec. 2002. 
[99] A.M. Shams, T.K. Darwish, M.A. Bayoumi, "Performance analysis of low-
power 1-bit CMOS full adder cells," IEEE Transactions on VLSI Systems, 
Vol. 10, pp. 20-29, Jan. 2002. 
[100] Massimo Alioto and Gaetano Palumbo, "Impact of Supply Voltage 
Variations on Full Adder Delay: Analysis and Comparison," IEEE Trans. 
Very Large Scale Integration (VLSI) Systems, Vol. 14, No. 12, pp. 1322-
1335, Dec. 2006. 
[101] S. Iijima, "Carbon Nanotubes: past, present, and future," Physica B, Vol. 323, 
pp. 1-5, 2002. 
[102] M. S. Dresselhaus, G. Dresselhaus, and R. Saito, "Carbon fibers based on 
C60 and their symmetry," Phys. Rev. B, Vol. 45, pp. 6234-6242, 1992. 
[103] "Nobelprize.org", http://www.nobelprize.org (2007). 
[104] Sumio Iijima, "Helical microtubules of graphitic carbon," Nature, Vol. 354, 
pp. 56-58, 1991. 
[105] J.W. Mintmire, B. I. Dunlap, and C. T. White, "Are fullerene tubules 
metallic?," Phys. Rev. Lett., vol. 68, pp. 631-634, 1992. 
[106] P. L. McEuen, M. S. Fuhrer, and H. Park, "Single-walled carbon nanotube 
electronics," IEEE Trans. Nanotechnol., Vol. 1, No. 1, pp. 78-85, Mar. 
2002. 
[107] P. Avouris, "Supertubes [carbon nanotubes]," IEEE Spectrum, Vol. 41, No. 
8, pp. 40-45, Aug. 2004. 
[108] P. Avouris, J. Appenzeller, V. Derycke, R. Martel, and S. Wind, "Carbon 
nanotube electronics," in IEDM Tech. Dig., Dec. 2002, pp. 281-284. 
[109] A. Loiseau, P. Launois, P. Petit, S. Roche, and J. P. Salvetat, Understanding 
Carbon Nanotubes. Berlin Heidelberg: Springer, 2006. 
[110] J. Wildoer, L. Venema, A. Rinzler, R. Smalley, and C. Dekker, "Electronic 
structure of atomically resolved carbon nanotubes," Nature, Vol. 391, pp. 
59-62, January 1998. 
[111] J. W. Mintmire and C. T. White, "Universal density of states for carbon 
nanotubes," Phys. Rev. Lett, Vol. 81, No. 12, pp. 2506-2509, Sep. 1998. 
[112] Jie Deng and H.-S. Philip Wong, "A Compact SPICE Model for Carbon-
Nanotube Field-Effect Transistors Including Nonidealities and Its 
Application—Part I: Model of the Intrinsic Channel Region," IEEE Trans. 
On Electronics devices, Vol. 54, No. 12, pp. 3186-3194, Dec. 2007. 
[113] M. S. Dresselhaus, G. Dresselhaus, and P. Avouris, Carbon Nanotubes: 
Synthesis, Properties, Structure, and Applications. Springer Verlag, 2001. 
[114] R. Saito, G. Dresselhaus, and M. S. Dresselhaus, "Physical Properties of 
Carbon Nanotubes," Imperial College Press, London, 1998. 
[115] S. Iijima and T. Ichihashi, "Single-shell carbon nanotubes of 1 nm 
diameter," Nature, Vol. 363, pp. 603-605, 1993. 
[116] D. S. Bethune, C. H. Kiang, M. S. Devries, G. Gorman, R. Savoy, J. 
Vazquez, and R. Beyers, "Cobalt-catalyzed growth of carbon nanotubes 
with single atomic layer walls," Nature, Vol. 363, pp. 605-607, 1993. 
[117] H. Takikawa, M. Yatsuki, T. Sakakibara, and S. Itoh, "Carbon nanotubes in 
cathodic vacuum arc discharge," Journal of Physics: Applied Physics, Vol. 
33, pp. 826-830, 2000. 
[118] C. D. Scott, S. Arepalli, P. Nikolaev, and R. E. Smalley, "Growth mechanism 
for singlewall carbon nanotubes in a laser-ablation process," Applied 
Physics Letters, Vol. 72, pp. 573-580, 2001. 
[119] A. Thess, R. Lee, P. Nikolaev, H. Dai, P. Petit, J. Robert, X. Chunhui, L. 
Young Hee, K. Seong Gon, A. G. Rinzler, and D. T. Colbert et al., 
"Crystalline ropes of metallic carbon nanotubes," Science, Vol. 273, pp. 
483-487, 1996. 
[120] F. Kreupl, et al., "Carbon Nanotubes in Interconnect Applications," 
Microelectronic Engineering, 64 (2002), pp. 399-408. 
[121] N. Franklin and H. Dai, "An enhanced CVD approach to extensive 
nanotubes networks with directionality," Adv. Mater., Vol. 12, pp. 890-894, 
2000. 
[122] R. L. V. Wal, L. J. Hall, and G. M. Berger, "Optimization of flame synthesis 
for carbon nanotubes using supported catalyst," Journal of Physical 
Chemistry B, Vol. 106, pp. 13122-13132, 2002. 
[123] R. L. V. Wal, G. M. Berger, and L. J. Hall, "Single-walled carbon nanotube 
synthesis via a multi stage flame configuration," Journal of Physical 
Chemistry B, Vol. 106, pp. 3564-3567, 2002. 
[124] R. L. Vander-Wal and T. M. Ticich, "Flame and furnace synthesis of single-
walled and multi-walled carbon nanotubes and nanofibers," Journal of 
Physical Chemistry B, Vol. 105, pp. 10249-10256, 2001. 
[125] H.-S. P. Wong, J. Appenzeller, V. Derycke, R. Martel, S. Wind, Ph. 
Avouris, "Carbon Nanotube Field Effect Transistors - Fabrication, Device 
Physics, and Circuit Implications," IEEE International Solid-State Circuits 
Conference, pp. 370-371, 2003. 
[126] A. Raychowdhury, A. Keshavarzi, J. Kurtin, V. De, and K. Roy, "Carbon 
nanotube field-effect transistors for high-performance digital circuits—DC 
analysis and modeling toward optimum transistor structure," IEEE Trans. 
Electron Devices, Vol. 53, No. 11, pp. 2711-2717, 2006. 
[127] A. Raychowdhury, A. Keshavarzi, J. Kurtin, V. De, and K. Roy, "Carbon 
Nanotube Field-Effect Transistors for High-Performance Digital Circuits— 
Transient Analysis, Parasitics, and Scalability," IEEE Trans. Electron 
Devices, Vol. 53, No. 11, pp. 2718-2726, 2006. 
[128] Jie Deng, and H.-S. Philip Wong, "A Compact SPICE Model for Carbon-
Nanotube Field-Effect Transistors Including Nonidealities and Its 
Application—Part II: Full Device Model and Circuit Performance 
Benchmarking," IEEE Trans. On Electronics devices, Vol. 54, No. 12, 
pp. 3195-3205, Dec. 2007. 
[129] Stanford University CNFET Model Web site. (2008). [Online]. Available: 
http://nano.stanford.edu/model.php?id=23 
[130] A. Javey, Q. Wang, A. Ural, Y. Li, H. Dai, "Carbon Nanotube Transistor 
Arrays for Multistage Complementary Logic and Ring Oscillators," Nano 
Lett., vol. 2, p. 929, 2002. 
[131] A. Rahman and V. Polavarapuv, "Evaluation of Low-Power Design 
Techniques for Field Programmable Gate Arrays", 12th Int. FPGA Symp., 
pp. 23-30, 2004. 
[132] F. Li, Y. Lin, and L. He, "FPGA power reduction using configurable dual-
vdd," Proc. of Design Automation Conf., pp. 735-740, June 2004. 
[133] K. K. Poon, "Power Estimation for Field Programmable Gate Arrays," MS 
Thesis in Dept. of Electrical and Computer Engg.: University of British 
Columbia, 1999. 
[134] B. H. Calhoun, F. A. Honore, and A. Chandrakasan, "A Leakage Reduction 
Methodology for Distributed MTCMOS," IEEE Journal of Solid State 
Circuits, Vol. 39, No. 5, pp. 818-826, 2004. 
[135] J.H. Anderson and F.N. Najm, "Active Leakage Power Optimization for 
FPGAs," IEEE Trans. CAD of ICs and Sys., Vol. 25, No. 3, pp. 423- 437, 
March, 2006. 
[136] Hassan Hassan, Mohab Anis and Mohamed Elmasry, "Input vector 
reordering for leakage power reduction in FPGAs," IEEE Trans. CAD of 
ICs and Sys., Vol. 27, No. 9, pp. 1555-1564, Sept, 2008. 
[137] S. Mukhopadhyay, S. Bhunia, K. Roy, "Modeling and Analysis of loading 
effect on leakage of nanoscaled bulk-CMOS logic circuits", IEEE Trans. 
CAD of ICs and Sys. Vol. 25, No. 8, August, 2006. 
[138] A. Kumar and M. Anis, "An analytical state dependent leakage power 
model for FPGAs," in Proc. DATE, 2006, pp. 612-617. 
[139] R. Bharadwaj, R. Konar, P. Balsara, and D. Bhatia, "Exploiting Temporal 
Idleness to Reduce Leakage Power in Programmable Architectures," in 
Proc. Conference on Asia South Pacific design automation, Jan 2005. 
[140] V. Betz and J. Rose, "Circuit design, transistor sizing and wire layout of 
FPGA interconnect," in Cust. ICs Conference, pp. 171-174, 1999. 
[141] Zheng Guo et al., "Large-Scale SRAM Variability Characterization in 
45nm CMOS," IEEE Journal of Solid-State Circuits, Vol. 44, No. 11, pp. 3174-
3192, Nov. 2009. 
[142] N. Kim, et al., "Circuit and microarchitectural techniques for reducing cache 
leakage power," IEEE Trans. Very Large Scale Integrated (VLSI) Systems, 
Vol. 12, No. 2, pp. 167 - 184, Feb.2004. 
[143] R. Martel et al., "Carbon nanotube field effect transistor for logic 
applications," International Electron Devices Meeting (IEDM) Technical Digest, 
pp. 7.5.1-7.5.4, 2001. 
[144] http://www-device.eecs.berkeley.edu/bsim3/bism4.html. 
[145] J. Deng, A. Lin, G. C. Wan, H. S. Philip Wong, "Carbon nanotube transistor 
compact model for circuit design and performance optimization," ACM 
Journal on Emerging Technologies in Computing Systems, Vol. 4, No.2, pp. 
7-19, 2008. 
[146] A. Raychowdhury, S. Mukhopadhyay, K. Roy, "A circuit-compatible model 
of ballistic carbon nanotube field-effect transistors," IEEE Trans. on 
Computer-Aided Design of Integrated Circuits and Systems 23 (2004), pp. 
1411-1420. 
[147] J. Deng, H.-S.P. Wong, "Modeling and analysis of planar gate capacitance 
for 1-D FET with multiple cylindrical conducting channels," IEEE 
Transactions on Electron Devices 54 (2007), pp. 2377-2385. 
[148] X. Wang, H.-S.P. Wong, P. Oldiges, R.J. Miller, "Electrostatic analysis of 
carbon nanotube arrays," International Conference on Simulation of 
Semiconductor Processes and Devices, pp. 163-166, Sept. 2003. 
[149] E. Morifuji, D. Patil, M. Horowitz, Y. Nishi, "Power optimization for 
SRAM and its scaling," IEEE Trans. on Electron Devices, Vol. 54, No. 4, 
pp. 715-722, 2007. 
[150] O. Thomas, M. Reyboz, M. Belleville, "Sub-1V, Robust and Compact 6T 
SRAM cell in Double Gate MOS technology", Proc. of IEEE International 
Symposium on Circuits and Systems (ISCAS), pp. 2778-2781, May 2007. 
[151] E. Seevinck et al., "Static-Noise Margin Analysis of MOS SRAM Cells," 
IEEE JSSC, pp.748-754, Oct., 1987. 
[152] R. Aly, M. Faisal, A. Bayoumi, "Novel 7T SRAM cell for low power cache 
design," Proceedings of the IEEE SOC Conference, pp. 171-174, 2005. 
[153] S.H. Gunther, "Managing the impact of increasing microprocessor power 
consumption," Intel Technology Journal, Q1, pp. 1-9, 2001. 
[154] S. Tans, A. Verschueren, C. Dekker, "Room-temperature transistor based on 
a single carbon nanotube," Nature 393, pp. 49-52, 1998. 
[155] K. Teo et al., "Carbon nanotube technology for solid state and vacuum 
electronics," IEE Proceedings - Circuits, Devices and Systems, 151(5), pp. 
443-451, 2004. 
[156] R. Kotlyar, B. Obradovic, P. Matagne, M. Stettler, M.D. Giles, "Assessment 
of room-temperature phonon-limited mobility in gated silicon nanowires," 
Applied Physics Letters, pp. 5270-5272, 2004. 
[157] E. Grossar, M. Stucchi, K. Maex, W. Dehaene, "Read stability and write-
ability analysis of SRAM cells for nanometer technologies," IEEE Journal 
of Solid-state Circuits, Vol. 41, pp. 2577-2588, 2006. 
[158] Y. Li, Chien-Sung Lu, "Characteristic Comparison of SRAM Cells with 
20nm Planar MOSFET, Omega FinFET and Nanowire FinFET," Sixth 
IEEE Conference on Nanotechnology, pp. 339-342, June 2006. 
[159] C. Bipul, et al., "Impact of a process variation on nanowire and nanotube 
devices performance," IEEE Transactions on Electron Devices Vol. 54, No. 
9, pp. 2369-2376, Sept. 2007. 
[160] J. Rose and D. Hill, "Architectural and Physical Design Challenges for One-
Million Gate FPGAs and Beyond," ACM Int. Symp. on FPGAs, 1997, pp. 
129-132. 
[161] ITRS: International Technology Roadmap for Semiconductor 2005. 
[162] W. Steinhogl, et al., "Size-dependent Resistivity of Metallic Wires in the 
Mesoscopic Range," Physical Review B, 66, 075414, 2002. 
[163] J. Li, et al., "Bottom-up Approach for Carbon Nanotube Interconnects," 
Applied Physics Letters, Vol. 82, No. 15, pp. 2491-2493, April 2003. 
[164] B. Q. Wei, et al., "Reliability and Current Carrying Capacity of Carbon 
Nanotubes," Applied Physics Letters, Vol. 79, No. 8, pp. 1172-1174, 2001. 
[165] S. Berber, et al., "Unusually High Thermal Conductivity of Carbon 
Nanotubes," Physical Review Letters, Vol. 84, No. 20, pp. 4613-4616, 
2000. 
[166] A. Naeemi and J. D. Meindl, "Design and performance modeling for single-
walled carbon nanotubes as local, semiglobal, and global interconnects in 
gigascale integrated systems," IEEE Trans. Electron Devices, Vol. 54, No. 
1, pp. 26-37, Jan. 2007. 
[167] A. Naeemi, et al., "Performance Comparison between Carbon Nanotube and 
Copper Interconnects for Gigascale Integration (GSI)," IEEE Electron 
Device Letters, Vol. 26, No. 2, pp. 84-86, 2005. 
[168] H. Li, W. Y. Yin, K. Banerjee, and J. F. Mao, "Circuit modeling and 
performance analysis of multi-walled carbon nanotube interconnects," IEEE 
Trans. Electron Devices, vol. 55, no. 6, pp. 1328-1337, June 2008. 
[169] A. Naeemi and J. D. Meindl, "Compact physical models for multiwall 
carbon-nanotube interconnects," IEEE Electron Device Lett, vol. 27, no. 5, 
pp. 338-340, May 2006. 
[170] S. Haruehanroengra and W. Wang, "Analyzing conductance of mixed 
carbon-nanotube bundles for interconnect applications," IEEE Electron 
Device Letters, Vol. 28, No. 8, pp. 756-759, 2007. 
[171] H. Cho, et. al., "Modeling of the performance of carbon nanotubes bundle 
Cu/ low k and optical global on-chip interconnects," Proc. SLIP, pp.81-88, 
2007. 
[172] S. Datta, "Electrical Resistance: An Atomistic View," IOP Publishing 
Nanotechnology, Vol. 15, pp. S433-S451, 2004. 
[173] N. Srivastava and K. Banerjee, "Performance Analysis of Carbon Nanotube 
Interconnects for VLSI Applications," ICCAD, 2005, pp. 383- 390. 
[174] S. Wang, S. Haruehanroengra and M. Liu, "Inductance of mixed carbon 
nanotube bundles", IET Micro and Nano Letters, Vol. 2, No. 2, pp. 35-39, 
2007. 
[175] P. J. Burke, "Luttinger Liquid Theory as a Model of the Gigahertz Electrical 
Properties of Carbon Nanotubes," IEEE Trans. Nanotechnology, Vol. 1, 
No. 3, pp. 129-144, 2002. 
[176] M. W. Bockrath, "Carbon Nanotubes: Electrons in One Dimension," Ph.D. 
Dissertation, Univ. of California, Berkeley, 1999. 
[177] Sudeep Pasricha, Fadi Kurdhai, Nikil Dutta, "System level performance 
analysis of carbon nanotube global interconnects for emerging chip 
microprocessors," IEEE/ACM International symposium on nano-scale 
architectures, pp. 1-7, 2008. 
[178] W. H. Hayt and J. A. Buck, Engineering Electromagnetics. 7th ed. New 
York: McGraw-Hill, 2005. 
[179] A. Nieuwoudt and Y. Massoud, "On the optimal design, performance and 
reliability of future carbon nanotube-based interconnect solutions," IEEE 
Transactions on Electron Devices, vol. 55, no. 8, pp. 2097-2110, 2008. 
[180] C. L. Cheung, A. Kurtz, H. Park, and C. M. Lieber, "Diameter-controlled 
synthesis of carbon nanotubes," J. Phys. Chem., vol. 106, no. 10, pp. 2429-
2433, 2002. 
[181] L. Zhu, J. Xu, Y. Xiu, Y. Dennis, W. Hess, and C. P. Wong, "Growth and 
electrical characterization of high-aspect-ratio carbon nanotube arrays," 
Carbon, vol. 44, no. 2, pp. 253-258, 2006. 
[182] H. J. Li,W. G. Lu, J. J. Li, X. D. Bai, and C. Z. Gu, "Multichannel ballistic 
transport in multiwall carbon nanotubes," Phys. Rev. Lett, Vol. 95, No. 8, 
pp. 601-604, Aug. 2005. 
[183] http://www.nanohub.org/tools/cnia. 
[184] H. Li, et al., "Modeling of carbon nanotube interconnects and comparative 
analysis with Cu interconnects," in Proceedings of the Asia-Pacific 
Microwave Conference (APMC '06), pp. 1361-1364, 2006. 
[185] Azad Naeemi, James D. Meindl, "Carbon Nanotube Interconnects,"
International Symposium on Physical Design,
ISPD-2007, pp. 77-84, 2007.
[186] S. Heo et al., "Next Generation On-Chip Communication Networks,"
www.cag.lcs.mit.edu/6.893-f2000/project/heo_check1.pdf
[187] Fred Chen, Ajay Joshi, Vladimir Stojanovic, Anantha Chandrakasan, 
"Scaling and Evaluation of Carbon Nanotube Interconnects for VLSI 
Application," Proceedings of the 2nd International Conference on
Nano-Networks, Nano-Net 2007.
[188] J. Guo, A. Javey, H. Dai, and M. Lundstrom, "Performance analysis and 
design optimization of near ballistic carbon nanotube field-effect 
transistors," in IEDM Tech. Dig., Dec. 2004, pp. 703-706.
[189] S. Eachempati, N. Vijaykrishnan, A. Nieuwoudt, Y. Massoud, "Predicting 
the performance and reliability of future field programmable gate arrays 
routing architecture with carbon nanotube bundle interconnect," IET
Circuits, Devices and Systems, Vol. 3, No. 2, pp. 64-75, April 2009.
[190] Altera, Datasheet: Configuration Devices for SRAM-Based LUT Devices, 
Feb. 2002. 
[191] M. Sheng and J. Rose, "Mixing buffers and pass transistors in FPGA routing
architectures," in ACM/SIGDA Int. Symp. on FPGA, pp. 75-84, Feb.
2001.
[192] A. Shibata et al., "Ultra low power supply voltage (0.3 V) operation with 
extreme high speed using bulk dynamic threshold voltage MOSFET (B-
DTMOS) with advanced fast-signal-transmission shallow well," in Symp. 
VLSI Tech. Dig. 1998. 
[193] A. Drake, K. Nowka, R. Brown, "Evaluation of Dynamic-Threshold Logic 
for Low-Power VLSI Design in 0.13um PD-SOI," VLSI-SOC 2003, pp. 
263-266. 
[194] M. Horiuchi, "A dynamic threshold SOI device with a J-FET embedded 
structure and a merged body-bias-control transistor," IEEE Trans. Electron 
Devices, vol. 47, pp. 1587-1598, Aug. 2000. 
[195] H. Kotaki et al., "Novel bulk dynamic threshold MOSFET (B-DTMOS)
with advanced isolation (SITOS) and gate to shallow-well contact (SSS-C)
processes for ultra low power dual gate CMOS," in IEDM Tech. Dig., 1996.
[196] K. Bernstein and N. J. Rohrer, SOI Circuit Design Concepts, Boston: 
Kluwer Academic Publishers, 2001. 
[197] F. Assaderaghi, "DTMOS: its derivatives and variations and their potential
applications," Proceedings of the 12th International Conference on 
Microelectronics, Tehran, Iran, pp.9-10, 2000. 
[198] X. Zenglang et al., "The effect of self-heating on hot-carrier effects in deep
submicron SOI/NMOS," in Proceedings of the 22nd International
Conference on Microelectronics, 2000, Vol. 1, pp. 221-223.
[199] A. Bellaouar, A. Fridi, M. I. Elmasry, and K. Itoh, "Supply Voltage Scaling
for Temperature Insensitive CMOS Circuit Operation," IEEE Transactions 
on Circuits and Systems II, Vol. 45, No. 3, pp. 415-417, March 1998. 
[200] Rohini Krishnan, Jose Pineda de Gyvez, "Low Energy Switch Block for 
FPGAs," Proceedings of the 17th International Conference on VLSI Design 
(VLSID-2004). 
