TUNNEL FIELD EFFECT TRANSISTORS: FROM THEORY TO APPLICATIONS by Li, Mingda
TUNNEL FIELD EFFECT TRANSISTORS: FROM
THEORY TO APPLICATIONS
A Dissertation
Presented to the Faculty of the Graduate School
of Cornell University
in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
by
Mingda Li
May 2018
© 2018 Mingda Li
ALL RIGHTS RESERVED
TUNNEL FIELD EFFECT TRANSISTORS: FROM THEORY TO
APPLICATIONS
Mingda Li, Ph.D.
Cornell University 2018
The performance of computing systems has been increasingly choked by power
consumption and memory access time within and between system components.
Meanwhile, the explosion of artificial intelligence requires massive data-heavy
computation. Therefore, it is crucial to develop energy efficient computing
from devices to architectures. This work is developed along three streams: a
steep device with low operation voltage, a novel device enabling complex logic
operation, and an efficient modeling algorithm to quickly incorporate emerg-
ing devices into circuit designs. On the first front, tunnel field effect transis-
tors (TFETs), which switch by modulating quantum tunneling, promise sub-60
mV/dec subthreshold swing and operate at low power consumption. Based on
the unique properties of atomically thin 2D layered materials, the two-dimensional
heterojunction interlayer tunneling field effect transistor (Thin-TFET) was proposed
as an ultra-scaled steep transistor. On the second front, we converted the
“undesirable” ambipolar behavior in TFETs into XNOR logic operation, and
proposed a one-transistor XNOR design: TransiXNOR. On the third front, we
structured artificial neural networks with awareness of device physics, and de-
veloped an accurate, efficient, and generic device compact modeling algorithm:
physics-inspired neural network (Pi-NN).
Mingda Li
Contact
Information
2250 N. Triphammer Rd., Apt. H2F 574-339-1802
Ithaca, NY 14850 ml888@cornell.edu
Research Focus Developing algorithms for physical and empirical modeling of electronic devices
Education Cornell University, Ithaca, NY
Ph.D., Electrical and Computer Engineering, Expected: May 2018
• Thesis Topic: Tunneling Field Effect Transistor: From Theory to Applications
• Committee: Prof. Huili Grace Xing, Prof. Claire Cardie and Prof. Debdeep Jena
University of Notre Dame, Notre Dame, IN
M.S., Electrical Engineering, Jan 2015
Fudan University, Shanghai, P. R. China
B.S., Microelectronics, July 2012
Publications 1. Qin Zhang, Mingda Li, Edward B. Lochocki, Suresh Vishwanath, Xinyu Liu,
Rusen Yan, Huai-Hsun Lien, Malgorzata Dobrowolska, Jacek Furdyna, Kyle M.
Shen, Guangjun Cheng, Angela R. Hight Walker, David J. Gundlach, Huili G.
Xing, and N. V. Nguyen, “Band offset and electron affinity of MBE-grown SnSe2”,
Applied Physics Letters, 112, 042108, 2018
2. Mingda Li, Ozan İrsoy, Claire Cardie, and Huili Grace Xing, “Physics-Inspired
Neural Networks for Efficient Device Compact Modeling”, IEEE Journal on Exploratory
Solid-State Computational Devices and Circuits, vol. 2, pp. 44-49, 2016.
3. Mingda Li, Rusen Yan, Debdeep Jena, and Huili Grace Xing, “Two-dimensional
Heterojunction Interlayer Tunnel FET (Thin-TFET): From Theory to Applications”,
IEEE International Electron Devices Meeting (IEDM), pp. 19.2.1-19.2.4, 2016.
4. Mingda Li, Shudong Xiao, Rusen Yan, Suresh Vishwanath, Susan Fullerton-
Shirey, Debdeep Jena, and Huili Grace Xing, “Fermi Level Tunability of a Novel
2D Crystal: Tin Diselenide (SnSe2)”, Device Research Conference (DRC), pp.
1-2, 2016.
5. Nhan Nguyen, Mingda Li, Suresh Vishwanath, Rusen Yan, Shudong Xiao, Huili
Xing, Guangjun Cheng, Angela Hight Walker, Qin Zhang, “Internal Photoemission
Spectroscopy of 2-D Materials”, APS March Meeting, abstract R46.001, 2016
6. Mingda Li, David Esseni, Gregory Snider, Debdeep Jena, and Huili Grace Xing,
“Two-dimensional Heterojunction Interlayer Tunneling Field Effect Transistors
(Thin-TFETs)”, IEEE Journal of the Electron Devices Society, vol. 3, no. 3, pp.
200-207, 2015.
7. Rusen Yan, Sara Fathipour, Yimo Han, Bo Song, Shudong Xiao, Mingda Li,
Nan Ma, Vladimir Protasenko, David A. Muller, Debdeep Jena, and Huili Grace
Xing, “Esaki Diodes in van der Waals Heterojunctions with Broken-Gap Energy
Band Alignment”, Nano Letters 15(9), 5791-5798, 2015
8. Mingda Li, David Esseni, Debdeep Jena, and Huili Grace Xing, “Lateral Transport
in Two-dimensional Heterojunction Interlayer Tunneling Field Effect Transistor
(Thin-TFET)”, Device Research Conference, pp. 17-18, 2014.
9. Shudong Xiao, Mingda Li, Alan Seabaugh, Debdeep Jena and Huili Grace Xing,
“Vertical heterojunction of MoS2 and WSe2”, Device Research Conference, pp.
169-170, 2014
10. Mingda Li, David Esseni, Gregory Snider, Debdeep Jena, and Huili Grace
Xing, “Single Particle Transport in Two-dimensional Heterojunction Interlayer
Tunneling Field Effect Transistor”, Journal of Applied Physics vol.115, pp. 074508,
2014.
11. Debdeep Jena, Mingda Li, Nan Ma, Wan Sik Hwang, David Esseni, Alan Seabaugh,
Huili Grace Xing, “Electron transport in 2D crystal semiconductors and their
device applications”, 2014 Silicon Nanoelectronics Workshop (SNW), Honolulu,
HI, pp. 1-2, 2014
12. Esseni, David, Marco G. Pala, Alberto Revelant, Pierpaolo Palestri, Luca Selmi,
Mingda Li, Gregory Snider, Debdeep Jena, and Huili Grace Xing. “Challenges
and Opportunities in the Design of Tunnel FETs: Materials, Device Architectures,
and Defects.” ECS Transactions 64, no. 6: 581-595, 2014
13. Huili Grace Xing, Guangle Zhou, Mingda Li, Yiqing Lu, Rui Li, Mark Wistey,
Patrick Fay, Debdeep Jena, Alan Seabaugh, “Tunnel FETs with tunneling normal
to the gate”, Third Berkeley Symposium on Energy Efficient Electronic Systems
(E3S), pp. 1-1, 2013
14. Guo, Jiaojiao, Mingda Li, Qingqing Sun, Wen Yang, Peng Zhou, Shijin Ding,
and David Wei Zhang. “A Water-free Low Temperature Process for Atomic Layer
Deposition of Al2O3 Films.” Chemical Vapor Deposition 19, no. 46: 156-160.
2013
15. Yongjun Li, Mingda Li, Jianshuang Liu, Qingqing Sun, Peng Zhou, Pengfei
Wang, Shijin Ding, David Wei Zhang, “Atomic scale investigation of the abnormal
transport properties in bilayer graphene nanoribbon”, Applied Physics Letters 100
(1), 013110, 2012
Awards Best poster award at The 2nd International Symposium on Devices and Application of
Two-dimensional Materials 2016
Patent
Application
Two-dimensional Heterojunction Interlayer Tunneling Field Effect Transistors (Thin-
TFET)
U.S. Patent Application No. 14/629,222 2015
Work
Experience
Research Scientist, Facebook 2018
• Responsibility: Developing algorithms and models for personalized ranking systems.
This dissertation is dedicated to my parents and Jiayan.
ACKNOWLEDGEMENTS
I look back on the day I first met Dr. Huili Grace Xing and Dr. Debdeep
Jena at Fudan University, Shanghai, in 2012. It is their passion for scientific
research, relentlessness in the pursuit of new knowledge and rigorous attitude
toward science and engineering that I remember and have been trying to follow
for the last five and a half years.
I would like to first express my thanks to Dr. Huili Grace Xing for her
invaluable guidance in research and life. Whenever I found myself doubting
how far I could reach during my Ph.D., I was always lucky enough to have Dr.
Xing’s encouragement to keep me moving on. I appreciate her constructive
advice, which greatly improved the quality and impact of my
research. I am also thankful for the countless hours she has spent revising
my manuscripts, abstracts and dissertation, and mentoring me on my writing and
presentation skills. It is only through these productive hours that I could have
become a better researcher.
Dr. Debdeep Jena’s device physics classes are the most enjoyable and fruitful
physics courses I have ever taken. The skills and knowledge I learned from
him, through both the classes and discussions, have been an important part of
my research. His expertise in semiconductor device physics has greatly widened
the scope of my research.
I am also fortunate to have Dr. Claire Cardie on my committee to make
my studies more comprehensive and rigorous. Her Introduction to Natural Language
Processing class sparked my interest in machine learning. Her input helped
me complete my first machine learning related paper, which in turn opened the
way to a career in machine learning after graduation.
I would also like to thank Dr. David Esseni for generous collaboration on
my first paper, Dr. Gregory Snider for teaching me the discipline of cleanroom
work, Dr. Alan Seabaugh and Dr. Susan Fullerton for insightful suggestions,
Dr. Mert Sabuncu for mentoring me in the lung cancer detection competition
and introducing me to several opportunities in medical research, and Dr. Lorenzo
Alvisi for lovely coffee-time talks about Italian cars and motorcycles.
I would also like to thank my fellow group members, who have generously
offered much appreciated help on my research. These include: Mingda Zhu, Bo
Song, Shudong Xiao, Xiang Li, Nan Ma, Guangle Zhou, Rusen Yan, Vladimir
Protasenko, Zongyang Hu, Jashan Singhal, Hyunjea Lee, Malavika Attaluri,
Ozan Irsoy, Wenshen Li, Suresh Vishwanath, Kazuki Nomoto, Kevin Lee, Jia
Guo, Yuanzheng Yue, Guowang Li, Pei Zhao, Amit Verma, Meng Qi, Satyaki
Ganguly, Moudud SM Islam, Wenjun Li, Wansik Hwang, Jimy Joe Encomendero
Risco, Alexander Chaney, Brian Schutter, Sam Bader, Henryk Turski, Zexuan
Zhang, Nicholas Tanen, Reet Chaudhuri, Shyam Bharadwaj, Joseph Casamento,
John Wright.
I am thankful to the Center for Low Energy Systems Technology (LEAST)
sponsored by the Semiconductor Research Corporation (SRC) and the Defense
Advanced Research Projects Agency (DARPA), and the EFRI 2-DARE program
funded by the National Science Foundation. Sincere thanks go to them for creating
these great research programs, which allowed me not only to carry out the
research but also to interact with experts in different fields.
I would also like to thank all the staff members in CNF at Cornell and NDNF
at Notre Dame. Without their professional work that keeps the facilities in good
operating condition, I would not have been able to complete the experiments presented
in this work.
Last but not least, I would like to express my most sincere gratitude to my
parents and Jiayan for their unconditional support and encouragement. I thank
them for making this work possible and a whole lot more.
TABLE OF CONTENTS
Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1 Introduction 1
1.1 Scalability of Power Consumption . . . . . . . . . . . . . . . . . . 1
1.2 Steep Slope and Tunneling . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Previous Works on TFETs . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Devices and Machine Learning . . . . . . . . . . . . . . . . . . . . 8
1.5 Brief Outline of This Work . . . . . . . . . . . . . . . . . . . . . . . 10
Bibliography 11
2 Physical Modeling of Two-dimensional Heterojunction Interlayer
Tunneling FETs (Thin-TFETs) 19
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2 Modeling of the Tunneling Transistor . . . . . . . . . . . . . . . . 22
2.2.1 Device Concept and Electrostatics . . . . . . . . . . . . . . 22
2.2.2 Transport Model . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2.3 Effects of Energy Broadening . . . . . . . . . . . . . . . . . 30
2.2.4 Rotational Misalignment and Tunneling Between In-
equivalent Extrema . . . . . . . . . . . . . . . . . . . . . . . 33
2.2.5 An Analytical Approximation for the Tunneling Current . 37
2.3 Numerical Results for the Tunneling Current . . . . . . . . . . . . 40
2.3.1 Parabolic Band Approximation . . . . . . . . . . . . . . . . 40
2.3.2 Effects of Correlation Lengths, Interlayer Thicknesses
and Energy Broadening . . . . . . . . . . . . . . . . . . . . 41
2.4 N-type and P-type Thin-TFETs . . . . . . . . . . . . . . . . . . . . 46
2.4.1 Effects of Non-uniform van der Waals Gap Thickness and
Access Resistance . . . . . . . . . . . . . . . . . . . . . . . . 51
2.4.2 Capacitance Evaluation . . . . . . . . . . . . . . . . . . . . 53
2.4.3 Benchmarking . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.5 Discussion and Conclusions . . . . . . . . . . . . . . . . . . . . . . 58
2.5.1 Experimental Insights . . . . . . . . . . . . . . . . . . . . . 58
2.5.2 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Bibliography 62
3 Comparative Study of Intrinsic Capacitances of Thin-TFETs and pin-
TFET 70
3.1 Enhanced Miller Effect of TFETs . . . . . . . . . . . . . . . . . . . . 70
3.2 Effects of TFET Geometries: “lateral” TFETs vs. “vertical” TFETs . 70
3.3 Numerical Simulations of C-V Curves . . . . . . . . . . . . . . . . 74
3.3.1 Simulation Methods . . . . . . . . . . . . . . . . . . . . . . 74
3.3.2 Simulation Results with Different Undercut/Underlap
Lengths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.4 Complementary TFET Inverters . . . . . . . . . . . . . . . . . . . . 80
3.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Bibliography 85
4 XNOR-enabled Transistor (TransiXNOR) for Binarized Neural Net-
work Accelerator 86
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.2 Dual-gated XNOR-enabled Transistor: TransiXNOR . . . . . . . . 90
4.2.1 Device Working Principle . . . . . . . . . . . . . . . . . . . 90
4.2.2 Simulation Approach . . . . . . . . . . . . . . . . . . . . . 92
4.2.3 Results and Discussion . . . . . . . . . . . . . . . . . . . . . 93
4.3 TransiXNOR Crossbar Architecture for Binary Matrix-Vector
Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
4.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
Bibliography 101
5 Artificial Neural Networks (ANNs) for Device Compact Modeling 105
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.2 Previous Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.2.1 Low Current Regime Challenge . . . . . . . . . . . . . . . 110
5.3 The Idea of Pi-NN: Structured Physical System . . . . . . . . . . . 113
5.4 Adjoint Sensitivity Network . . . . . . . . . . . . . . . . . . . . . . 116
5.5 Weighted L1 Loss Function . . . . . . . . . . . . . . . . . . . . . . 122
5.6 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
5.6.1 Modeling of GaN HEMT . . . . . . . . . . . . . . . . . . . 124
5.6.2 Modeling of Thin-TFET . . . . . . . . . . . . . . . . . . . . 132
5.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
Bibliography 135
6 Future Works 138
6.1 Non-ideal effects in Thin-TFETs . . . . . . . . . . . . . . . . . . . . 138
6.2 Experimental Demonstration of TransiXNOR . . . . . . . . . . . . 140
6.3 Adjoint Network as Regularization in Pi-NN . . . . . . . . . . . . 141
Bibliography 142
LIST OF TABLES
2.1 The band gaps, electron affinities and effective masses used for
MoS2 and WTe2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.2 Benchmarking Parameters . . . . . . . . . . . . . . . . . . . . . . 56
3.1 The material and device parameters of pin-TFETs and Thin-
TFETs in the simulation. . . . . . . . . . . . . . . . . . . . . . . . 76
LIST OF FIGURES
1.1 (a) Strong scaling of power consumption: using the same number
of transistors, compute a fixed number of operations per second
with N times less power consumption; (b) Weak scaling of power
consumption: using N times more transistors, compute N times
more operations per second with the same power consumption. . . . 3
1.2 (a.0) The schematic structure of an n-channel MOSFET; the band
diagrams of source and channel when the MOSFET is (a.1) OFF
and (a.2) ON. The orange shapes represent free electron distributions
in the source conduction band. (b.0) The schematic structure
of an n-type TFET; the band diagrams of source and channel:
(b.1) the TFET has no tunnel window, and the leakage
current is due to the band tail states; (b.2) the TFET has a tunnel
window, but the tunnel current is still small because of the long
tunneling distance; (b.3) the TFET is ON. The orange shades
represent the band tail states. . . . . . . . . . . . . . . . . . . . . 5
1.3 The interactions between physics, devices, systems and machine
learning. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1 (a) Schematic device structure for the Thin-TFET, where VTG, VBG
and VDS are the top gate, bottom gate and drain to source volt-
ages; (b) sketch of the band diagram, where ΦM,T , ΦM,B are the
work-functions and EF,MT , EF,MB the Fermi levels of the metal
gates, while χ2D,T , χ2D,B are the electron affinities, EFT , EFB the
Fermi levels, ECT , ECB the conduction band edges and EVT , EVB
the valence band edges respectively in the top and bottom 2D
layer. VTOX, VIOX and VBOX are the potential drops respectively
across the top oxide, interlayer and bottom oxide. . . . . . . . . 23
2.2 Sketch of the band alignments in a Thin-TFET between the top
and bottom 2D layer in: (a) OFF state and (b) ON state. . . . . . 24
2.3 Sketch of a possible rotational misalignment between the top and
bottom 2D layer, x-y is the reference coordinate for the bottom
2D layer and x’-y’ is the reference coordinate for the top 2D layer.
θ is the rotational misalignment angle. We assume the top layer
and the bottom layer have the same lattice constant a0. . . . . . . 34
2.4 (a) Band structure for hexagonal monolayer MoS2 and (b) hexagonal
monolayer WTe2, as obtained using the DFT method described
in the paper of C. Gong et al.18 The dashed lines represent the
analytical approximation obtained with a parabolic effective mass
model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.5 Numerical results of (a) band alignment versus the top gate voltage
VTG and (b) tunnel current density versus the top gate voltage
VTG for different values of the correlation length LC. The
parameters used in (b) are: matrix element MB0 = 0.01 eV; decay
constant of the wave-function in the interlayer κ = 3.8 nm⁻¹; energy
broadening σ = 10 meV; and interlayer thickness TIL = 0.6 nm
(e.g. 2 atomic layers of BN). VBG = 0 and VDS = 0.3 V in both (a)
and (b). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.6 Numerical calculations for: (a) current density versus VTG with
several interlayer thicknesses; (b) current density versus VTG
with different values of energy broadening σ. The inset shows
that SS increases with σ, and an SS value of 60 mV/dec corresponds
to an energy broadening as high as 40 meV. The matrix
element is MB0 = 0.01 eV; the decay constant of the wave-function
in the interlayer is κ = 3.8 nm⁻¹. In (a) the energy broadening is
σ = 10 meV. In (b) the interlayer thickness is TIL = 0.6 nm (e.g. 2
atomic layers of BN). VBG = 0 and VDS = 0.3 V in both (a) and (b). 44
2.7 An example of realizing both n-type and p-type Thin-TFETs using
one pair of 2D semiconductors (2H-WSe2 and 1T-SnSe2) with
near broken gap band alignment. For the n-type Thin-TFET,
SnSe2 is the top (i.e. drain) 2D layer and WSe2 is the bottom
(i.e. source) 2D layer, along with the top and back gate labeled
as n-type in blue. For the p-type Thin-TFET, WSe2 is the
top (i.e. drain) 2D layer and SnSe2 is the bottom (i.e. source)
2D layer, along with the top and back gate labeled as p-type in
red. Band gaps, electron affinities and effective masses are shown for
WSe2 and SnSe2. The n-type and p-type metal work functions are
tuned to give symmetric threshold voltages for the n-type and
p-type Thin-TFETs. . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.8 For the n-type and p-type Thin-TFETs shown in Fig. 2.7: (a) the
band alignment versus VTG; (b) the current density versus VTG, the
average SS is calculated from 10⁻³ µA/µm to 10 µA/µm; (c) the
current density versus VDS at various VTG; (d) the transconductance
versus VTG; (e) the carrier concentration in the top and bottom
2D layers versus VTG at various VDS; (f) the quantum capacitances
of the top and bottom 2D layers versus VTG at various
VDS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.9 Effect of van der Waals gap thickness variation on a p-type Thin-
TFET: (a) tunnel current density versus VTG for different van der
Waals gap thicknesses TvdW ; (b) differential SS versus current
density assuming an evenly distributed van der Waals gap thick-
ness TvdW in the specified range. . . . . . . . . . . . . . . . . . . . 51
2.10 Effect of total access resistance on a p-type Thin-TFET: (a) ID ver-
sus VTG and (b) ID versus VDS with various total access resistance
RC values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.11 Capacitance network model of the Thin-TFET . . . . . . . . . . . 53
2.12 For the p-type Thin-TFET, (a) CGD and CGS versus VDS at
VTG=−0.2, −0.3, −0.4 V; (b) CGD and CGS versus VTG at VDS=−0.2,
−0.3, −0.4 V. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.13 The intrinsic switching energy and delay for HP CMOS, LP
CMOS, HetJTFET, HomJTFET and Thin-TFETs with VDD=0.2,
0.3, 0.4 V and RC=52, 320 Ωµm. . . . . . . . . . . . . . . . . . . . . 57
3.1 Schematic structure of (a) n-type pin-TFET and (b) n-type
Thin-TFET. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.2 Comparison between pin-TFET and Thin-TFET: (Column 1) Device
schematics with the tunneling area highlighted. Thin-TFET
has a larger tunneling area, which can potentially render a higher
tunnel current; (Column 2) 1D intrinsic capacitance networks,
where CG is the gate capacitance of the pin-TFET, CQ is the quantum
capacitance of the channel material in the pin-TFET, CT(B)G is
the top (bottom) gate capacitance of the Thin-TFET, CQ,T(B) is the
quantum capacitance of the top (bottom) material, and CInterlayer
is the interlayer capacitance between the top and bottom materials
in the Thin-TFET; (Column 3 & 4) analytical expressions for
CGD and gate efficiency. . . . . . . . . . . . . . . . . . . . . . . . . 74
3.3 (a) CGD versus VG at different VDS for both pin-TFETs (black lines)
and Thin-TFETs with different underlap and undercut lengths
respectively; (b) CGS versus VG at different VDS for both pin-TFETs
(black lines) and Thin-TFETs with different underlap and undercut
lengths respectively; (c) CGG versus VG at different VDS for
both pin-TFETs (black lines) and Thin-TFETs with different underlap
and undercut lengths respectively; (d) CGD versus underlap/undercut
length at different VDS and VG = 0.4 V. . . . . . . . 78
3.4 (a) Gate efficiencies versus the drain voltages VDS for both pin-TFETs
and Thin-TFETs, the solid lines are the gate efficiency at
the threshold while the dashed lines are the average gate efficiency
when sweeping VG from 0 to 0.4 V; (b) the threshold voltages versus
the drain voltages VDS for both pin-TFETs and Thin-TFETs,
the increasing threshold voltages at smaller VDS lead to the nonlinear
onset in the output characteristics. . . . . . . . . . . . . . . 80
3.5 (a) The threshold voltages versus the drain voltages for Thin-TFETs
with different undercut lengths. The red solid line is the
threshold voltages computed at the center of the channel (shown
in (b)), the dashed lines are the threshold voltages computed at the
drain-side edge (shown in (b)). The differences between the dashed
lines and the red solid line indicate the non-uniformity of the
threshold voltages along the channel of Thin-TFETs. . . . . . . . 81
3.6 The schematic layout of the complementary TFET (CTFET) inverter.
CGD,n and CGD,p are the gate-to-drain capacitances of the
n-type and p-type TFETs. CL is the load capacitance. . . . . . . . 81
3.7 (a)-(d) The input and output voltages of pin-TFET and Thin-TFET
based CTFET inverters versus time with different ON current
densities and load capacitances; (e)-(h) the current densities of
pin-TFETs and Thin-TFETs in the inverters versus time with different
ON current densities and load capacitances. . . . . . . . . . . 83
4.1 The RRAM crossbar architecture. X is the input voltage signals,
W is the weight matrix whose elements are the RRAM conductivities,
and Y is the output current signals. The relationship of
X, W, Y is shown in Eq. 4.1 . . . . . . . . . . . . . . . . . . . . . . 88
4.2 (a) The schematic structure of TransiXNOR; (b) the band dia-
grams at different gate bias conditions of TransiXNOR when VDS
equals VDD: (left) the channel/drain tunnel junction is ON when
both VTG and VBG are 0; (right) the source/channel tunnel junc-
tion is ON when both VTG and VBG are VDD; (middle) both the
channel/drain tunnel junction and source/channel tunnel junc-
tion are OFF at the bias conditions such as both VTG and VBG are
VDD/2, or one gate is VDD and the other is 0; (c) The schematic
mapping of TransiXNOR ON/OFF states at different VTG and VBG
when VDS is VDD, which resembles XNOR logic. . . . . . . . . . . 91
4.3 (a.1) The I-V curve of the drain current IDS versus VTG when VBG =
0 V and VDS = 0.2 V; (a.2) The band diagram and (a.3) the current
spectrum when VDS = 0.2 V and both VTG = VBG = 0 V. (b.1) The
I-V curve of the drain current IDS versus VTG when VBG = 0.1 V
and VDS = 0.2 V; (b.2) The band diagram and (b.3) the current
spectrum when VDS = 0.2 V and both VTG = VBG = 0.1 V. (c.1) The
I-V curve of the drain current IDS versus VTG when VBG = 0.2 V
and VDS = 0.2 V; (c.2) The band diagram and (c.3) the current
spectrum when VDS = 0.2 V and both VTG = VBG = 0.2 V. . . . . . 94
4.4 (a.1) The family characteristic of TransiXNOR with various VTG
at VBG = 0.2 V; (a.2) The band diagram and (a.3) the current spec-
trum when VDS = 0.1 V and both VTG = VBG = 0.2 V. (b.1) The
family characteristic of TransiXNOR with various VTG at VBG = 0
V; (b.2) The band diagram and (b.3) the current spectrum when
VDS = 0.1 V and both VTG = VBG = 0 V. . . . . . . . . . . . . . . . . 96
4.5 A grid of 2D mappings of IDS along both VTG and VBG axes at different
VDS. The coloring represents the current density on a logarithmic
scale. When VDS is larger than 0.1 V (half VDD), the TransiXNOR
resembles XNOR logic; and when VDS is smaller than 0.1 V, the
TransiXNOR resembles AND logic. . . . . . . . . . . . . . . . . . 97
4.6 The XNOR cell built with TransiXNOR and RRAM. The bit line
and word line are used to write to the RRAM. The RRAM is in
series with a regular resistor. After writing each element Wk,n
of the 2D binary weight matrix W to each RRAM, the word
line is set floating, and the bit line is set to ground. During
computing, the source line is set to VDD, and each element Xk
of the input vector X is set through each input line in parallel.
The current entering the output line represents the XNOR result
of Wk,n and Xk. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.7 The XNOR array to compute Y=W×X in constant time. . . . . . . 98
5.1 The Multilayer Perceptron (MLP) neural network model . . . . . 111
5.2 A training procedure for Artificial Neural Network (ANN) de-
vice compact modeling. . . . . . . . . . . . . . . . . . . . . . . . . 112
5.3 The compact model of the n-type Thin-TFET derived based
on the MLP neural network widely used in previous works,3–5
(a) the training errors and test errors for a variety of hyper-
parameters; (b) the MLP neural network with 7 tanh neurons
in the first and second hidden layers. From (c) to (f), the I-V
curves generated by the MLP neural network shown in (b) are
plotted along with the training data and the test data: (c) IDS
versus VDS at different VTG; (d) IDS versus VTG at different VDS
in linear scale; (e) IDS versus VDS at different VTG around VDS =
0 V, the embedded plot shows unphysical IDS-VDS relationships
around VDS equals 0; (f) IDS versus VTG at different VDS in semi-
log scale, unphysical oscillation of IDS around zero appears in the
sub-threshold region and when VDS = 0 V. . . . . . . . . . . . . . 114
5.4 The architecture of Pi-NN. The shaded area indicates a Pi-NN
block, which is the building block of Pi-NN network. . . . . . . 115
5.5 The Physics-Inspired Neural Network (Pi-NN) model. . . . . . . 117
5.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
5.7 (a) The adjoint network of a fully connected (FC) layer with sig-
moid activation functions, where β=∇γy, γ is the output of FC
layer, and y is the outputs of the neural network; (b) The adjoint
network of a Pi-NN block, where β=∇γy and α=∇δy, γ is the out-
put of FC layer in sig subnet, δ is the output of FC in tanh subnet,
and y is the outputs of the neural network. . . . . . . . . . . . . . 121
5.8 Construction of the weighted L1 loss with max scale loss limit. . 123
5.9 The I-V characteristics of a GaN HEMT: (a) IDS versus VDS at dif-
ferent VGS in the linear scale and (b) in the log scale; (c) IDS versus
VGS at different VDS in the linear scale and (d) in the log scale. . . 125
5.10 (a) The weighted L1 metric versus max loss scale. Each red circle
represents the training weighted L1 metric, and each blue circle
represents the evaluation weighted L1 metric. The blue and red
lines are the average values over the runs. (b) The L1 metric versus
max loss scale. Each red circle represents the training L1 metric,
and each blue circle represents the evaluation L1 metric. The
blue and red lines are the average values over the runs. . . . . . . 126
5.11 Each blue circle represents one run, and the dashed line is the average
value of multiple runs. (a) The training weighted L1 losses
(with max loss scale limit) versus different base learning rates;
(b) The evaluation weighted L1 losses (with max loss scale limit)
versus different base learning rates; (c) The training L1 metric versus
different base learning rates; (d) The evaluation L1 metric
versus different base learning rates; (e) The training weighted L1
metric (without max loss scale limit) versus different base learning
rates; (f) The evaluation weighted L1 metric (without max
loss scale limit) versus different base learning rates. . . . . . . . 127
5.12 (a) The training/evaluation loss (weighted L1 loss with max loss
scale limit) versus epochs; (b) the training/evaluation weighted
L1 metric (without max loss scale limit) versus epochs. . . . . . 128
5.13 The I-V curves generated by the Pi-NN model are plotted along
with the training data (blue circles) and the evaluation data (red
circles): (a) IDS versus VDS at different VGS in the linear scale and
(b) in the log scale; (c) IDS versus VGS at different VDS in the linear
scale and (d) in the log scale. . . . . . . . . . . . . . . . . . . . . . 130
5.14 The I-V curves generated by the Pi-NN model are plotted along
with the training data (blue circles) and the evaluation data (red
circles) for IDS versus VDS at different VGS in the linear scale. VDS
for the model is extended to 80% beyond the training VDS range
and VTG is extended to +/- 40% beyond the training VGS range. . . 131
5.15 (a) The partial derivatives of the drain current with respect to the
gate voltage (transconductance) versus gate voltage at different
drain voltages; (b) the partial derivatives of the drain current
with respect to the drain voltage (output conductance) versus
the drain voltage at different gate voltages. The red arrow line in
(a) indicates the peak transconductance voltage shifts at different
drain voltages, which can be explained by the combination of
self-heating effect and DIBL effect. . . . . . . . . . . . . . . . . . 132
5.16 For the Pi-NN developed in this work, (a) the training errors
and test errors for a variety of hyper-parameters. (b) the Pi-NN
model with 2 tanh neurons and 3 sigmoid neurons in the hidden
layer. From (c) to (f), the I-V curves generated by the Pi-NN
model shown in (b) are plotted along with the training data and
the test data: (c) IDS versus VDS at different VTG; (d) IDS vs. VTG
at different VDS in linear scale; (e) IDS vs. VDS at different VTG
around VDS = 0, the embedded plot shows a well-behaved IDS-VDS
relationship around VDS = 0; (f) IDS vs. VTG at different VDS in
semi-log scale, good fitting is achieved in the sub-threshold region.
All the unphysical behaviors of the MLP neural network
(shown in Fig.5.3) are eliminated, and the size of the neural network
is largely reduced.
6.1 (a) IDVG curves of the measured WSe2 parasitic MOSFET, the
WSe2/SnSe2 Thin-TFET (TFET + MOSFET), and the intrinsic
WSe2/SnSe2 TFET, the insets show the optical image of the
device and the equivalent circuit with the parasitic MOSFET;
(b) the corresponding SS curves for the parasitic MOSFET, the
WSe2/SnSe2 Thin-TFET (TFET + MOSFET), and the intrinsic
TFET.
CHAPTER 1
INTRODUCTION
1.1 Scalability of Power Consumption
What is a device? Different researchers may have their own answers. I would
argue that devices are the implementation of physics. A device utilizes a set of
physical effects to control one set of physical state variables with another set
of physical state variables. The first-generation computers used vacuum tubes
for circuitry and magnetic drums for memory. Vacuum tubes utilized thermionic
emission of electrons to control current with voltage (as in tetrode tubes1) or
with direction of electrons (as in diodes2). Memory drums utilized
ferromagnetism to control magnetic orientation with electric fields. In the more
than 70 years since then, an enormous number of devices have been proposed,
implemented, and become ubiquitous. In this work, we introduce an electronic
device, the tunnel field effect transistor (TFET), which utilizes quantum
tunneling and thermal equilibrium statistics to control current with voltage.
TFETs are proposed to address the high power consumption challenge in
very-large-scale integration (VLSI) systems. Power consumption in VLSI consists
of both static and dynamic power consumption. Dynamic power is proportional to
$CfV^2$, where $C$ is the load capacitance, $f$ is the clock frequency, and $V$
is the supply voltage. Static power consumption is due to the off-state leakage,
which mainly consists of two parts: the subthreshold leakage and the gate
leakage. High-k gate dielectrics3 have been developed to mitigate the gate
leakage problem, which leaves the subthreshold leakage as the most important
static power problem. Therefore, solving the power consumption problem comes
down to lowering the operation voltage while keeping the subthreshold leakage
very low and the ON/OFF ratio high. Inevitably, the operation voltage is limited
by the subthreshold slope, namely the sharpness of the ON/OFF switching. In
practice, there are two types of power consumption scaling: strong scaling and
weak scaling (shown in Fig.1.1). When two transistors have the same leakage
current and ON current, a steeper subthreshold slope means that a smaller
operation voltage is required. A smaller operation voltage means less power
consumption, and the same ON current means the same performance (operations per
second). Therefore, we can compute a fixed number of operations per second more
energy efficiently using the transistor with the steeper subthreshold slope.
This is called strong scaling of power consumption. However, sometimes the
transistor with the steeper subthreshold slope has a smaller ON current. If we
operate at the lower ON current, the steeper subthreshold slope can still lead
to a smaller operation voltage. Although we won't be able to achieve the same
performance per transistor due to the lower ON current, the reduced energy
consumption per transistor allows us to compute a larger number of operations
per second with more transistors within the same energy budget. This is called
weak scaling of power consumption. Currently, TFETs have only achieved weak
scaling.
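The scaling argument above can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: it assumes a constant subthreshold swing over the whole switching range, so that the supply voltage is roughly the swing times log10 of the ON/OFF ratio, and it ignores the activity factor in the dynamic power. The capacitance, frequency, ON/OFF ratio, and the 30 mV/dec swing are hypothetical numbers chosen for illustration, not data from this work.

```python
import math

def supply_voltage(ss_mv_per_dec, on_off_ratio):
    """Voltage (V) needed to span the ON/OFF ratio, assuming a constant swing."""
    return ss_mv_per_dec * 1e-3 * math.log10(on_off_ratio)

def dynamic_power(c_load, freq, v_supply):
    """Dynamic power (W), proportional to C * f * V^2 (activity factor omitted)."""
    return c_load * freq * v_supply ** 2

C_LOAD = 1e-15   # hypothetical load capacitance: 1 fF
FREQ = 1e9       # hypothetical clock frequency: 1 GHz
ON_OFF = 1e5     # hypothetical ON/OFF current ratio

v_mosfet = supply_voltage(60.0, ON_OFF)  # swing at the 60 mV/dec thermal limit
v_steep = supply_voltage(30.0, ON_OFF)   # hypothetical steep-slope device

p_mosfet = dynamic_power(C_LOAD, FREQ, v_mosfet)
p_steep = dynamic_power(C_LOAD, FREQ, v_steep)

print(f"Supply voltage: {v_mosfet:.2f} V vs {v_steep:.2f} V")
print(f"Dynamic power ratio: {p_mosfet / p_steep:.1f}x")
```

Under these assumptions, halving the swing halves the supply voltage and reduces the dynamic power by about a factor of four at equal ON current, which corresponds to the strong-scaling case. In the weak-scaling case, the same saving per transistor would instead be spent on operating more, slower transistors.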
1.2 Steep Slope and Tunneling
Current VLSI systems use metal-oxide-semiconductor field-effect transistors
(MOSFETs) as the fundamental building blocks. Consider the basics of MOSFET
operation. Figure 1.2(a.0) shows a schematic structure of an n-channel MOSFET.
When the device is OFF (shown in Fig.1.2(a.1)), a high energy barrier exists
between the source and drain. The electron distribution is a product of the
Fermi-Dirac distribution and the electron density of states (DOS). Only the electrons
[Figure 1.1 plots: output current versus control voltage; panel (a) Strong Scaling, panel (b) Weak Scaling.]
Figure 1.1: (a) Strong scaling of power consumption: using the same number
of transistors, compute a fixed number of operations per second with N times less
power consumption; (b) Weak scaling of power consumption: using N times
more transistors, compute an N times larger number of operations per second with the
same power consumption.
above the energy barrier in the source can flow to the channel. This leakage
(diffusion) current flows from source to drain. When a positive gate voltage is
applied (shown in Fig.1.2(a.2)), the barrier for majority carrier diffusion from
source to channel is reduced. The lower the barrier, the more electrons can flow
from source to channel, and the larger the diffusion current. The device is
therefore ON. Since the DOS is a slowly increasing function in the conduction
band, the rate of increase of the electron density above the barrier is
dominated by the Fermi-Dirac distribution. At room temperature (i.e. 300 K), the
cumulative probability of the Fermi-Dirac distribution increases at a rate of
about 60 mV/dec, which gives the famous 60 mV/dec subthreshold swing limit in
MOSFETs. On the other hand, the subthreshold slope of TFETs is no longer limited
by the Fermi-Dirac distribution, because TFETs use tunneling instead of
thermionic emission as the transport mechanism. Figure 1.2(b.0) shows a
schematic structure of an n-type TFET. When the device is OFF (shown in
Fig.1.2(b.1)), the source valence band edge is below the channel conduction band
edge; therefore, ideally there is no empty state that electrons in the source
can tunnel into. However, in real materials, the band edge has a finite
broadening, which we refer to as the band tail. Electrons in the band tail of
the source valence band can tunnel into the empty states in the band tail of the
channel conduction band, and the resulting current in the OFF state is the
leakage current of the TFET. The subthreshold slope of a TFET is therefore
fundamentally limited by the sharpness of the band tails. When a positive gate
voltage is applied (shown in Fig.1.2(b.2-3)), the channel conduction band edge
moves below the source valence band edge, so electrons in the source valence
band can tunnel into the empty states in the channel conduction band. We refer
to the energy difference between the source valence band edge and the channel
conduction band edge as the tunneling window. In a MOSFET, the transport
probability of electrons above the energy barrier is approximately one.
Unfortunately, for a TFET, there is always a finite energy barrier in the
transport direction. Therefore, the transport probability is much smaller than
one and depends exponentially on the height and the thickness of the tunneling
barrier. As a result, the tunneling barrier modulation also affects the
subthreshold slope of TFETs. From Fig.1.2(b.2) to Fig.1.2(b.3), the widening
tunneling window and the thinning tunneling barrier both contribute to the
increasing tunneling current. Ideally, we prefer the ON/OFF switching of a TFET
to be controlled purely by the tunneling window. In Chapter 2, we will introduce
a novel TFET in which the tunneling current is purely controlled by the
tunneling window.
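Two of the quantitative statements above can be checked numerically. The following minimal sketch first evaluates the thermionic swing limit (kT/q) ln 10, which comes out slightly below 60 mV/dec at 300 K, and then illustrates the exponential dependence of the tunneling probability on barrier height and thickness. For the second part it assumes a rectangular barrier in the WKB approximation, which is a textbook simplification rather than the model used in this work; the barrier height, thicknesses, and effective mass are hypothetical values chosen for illustration.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant, J/K
Q = 1.602176634e-19      # elementary charge, C
HBAR = 1.054571817e-34   # reduced Planck constant, J*s
M0 = 9.1093837015e-31    # free electron mass, kg

def thermal_swing_limit(temp_k):
    """Thermionic subthreshold swing limit (kT/q) * ln(10), in mV/dec."""
    return K_B * temp_k / Q * math.log(10.0) * 1e3

def wkb_transmission(barrier_ev, thickness_nm, m_eff=0.1):
    """WKB transmission through a rectangular barrier (simplifying assumption).

    barrier_ev, thickness_nm and m_eff (in units of M0) are illustrative inputs.
    """
    kappa = math.sqrt(2.0 * m_eff * M0 * barrier_ev * Q) / HBAR  # decay constant, 1/m
    return math.exp(-2.0 * kappa * thickness_nm * 1e-9)

print(f"Swing limit at 300 K: {thermal_swing_limit(300.0):.1f} mV/dec")
for d_nm in (1.0, 2.0, 4.0):
    t = wkb_transmission(barrier_ev=0.5, thickness_nm=d_nm)
    print(f"d = {d_nm:.0f} nm -> T ~ {t:.2e}")
```

In this simplified picture, doubling the barrier thickness squares the transmission, so a gate voltage that thins the barrier changes the current exponentially even when the tunneling window is already open. This is why barrier modulation contributes to the subthreshold slope alongside the tunneling window.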
[Figure 1.2 schematics: (a.0) MOSFET structure (gate, insulator, source, channel, drain) with source/channel band diagrams (a.1) and (a.2), titled "MOSFET: SS Limited by Thermal Tail"; (b.0) TFET structure (gate, insulator, source, channel, drain) with source/channel band diagrams (b.1), (b.2), and (b.3), titled "TFET: SS Limited by Band Tail". The Fermi level is marked in each band diagram.]
Figure 1.2: (a.0) The schematic structure of an n-channel MOSFET; the band
diagrams of source and channel when the MOSFET is (a.1) OFF and (a.2) ON.
The orange shapes represent free electron distributions in the source conduction
band. (b.0) The schematic structure of an n-type TFET; the band diagrams of
source and channel: (b.1) the TFET has no tunneling window, but a leakage
current flows due to the band tail states; (b.2) the TFET has a tunneling window, but
due to the long tunneling distance, the tunnel current is still small; (b.3) the
TFET is ON. The orange shades represent the band tail states.
1.3 Previous Works on TFETs
In terms of material systems, Group IV materials such as Si4 and Ge5 exhibit
smaller ON currents resulting from their indirect band gaps and lower tunneling
probability. Group III-V materials like InGaAs,6 InAs,7 and InSb8 have
higher ON currents due to their narrower and direct band gaps. The use of
staggered and broken-gap heterojunctions, such as AlGaSb/InAs,9, 10 InAs/GaSb11
and InP/GaAs,12 boosts the ON current by reducing the tunneling barrier.
Following the successful exfoliation of graphene,13 other two-dimensional layered
materials, such as metal chalcogenides,14 hexagonal boron nitride (hBN),15 and
black phosphorus (bP),16 soon attracted a lot of attention in the device community.
Among their distinguishing and fascinating properties, their layer-dependent
physical behaviors, atomically thin bodies, and freedom from dangling bonds make
them suitable materials for building TFETs. Band-to-band tunneling (BTBT) was
demonstrated in a dual-gated MoS2/WSe2 van der Waals junction by Roy et al.,17 in
a black phosphorus/SnSe2 junction by Yan et al.,18 and in a graphene nano-ribbon by
Hamam et al.19 Sarkar et al.20 first demonstrated sub-60 mV/dec in a TFET
with a Ge/MoS2 tunnel junction, although the sub-60 mV/dec swing only occurs when
the drain current density is below $10^{-4}$ µA/µm. Various TFETs based on van
der Waals heterojunctions of 2D layered materials have been experimentally
demonstrated, such as black phosphorus/MoS2 by Xu et al.,21 WSe2/SnSe2 by Roy et
al.,22 and SnSe2/WSe2 by Yan et al.23 However, only Yan et al. reported sub-60
mV/dec at room temperature. Recently, Li et al.24 exploited the polarization
in III-nitride heterojunctions such as GaN/InGaN/GaN to design TFETs.
Besides optimizing the material system, numerous device structures have
been proposed recently to boost TFET performance. In general, TFET device
structures can be divided into two categories: “lateral” TFETs and “vertical”
TFETs. In lateral TFETs, the tunneling direction is perpendicular to the gate
electric field, while in vertical TFETs, the tunneling direction and the gate
electric field are aligned. It is worth noting that many researchers use the
word “vertical” to refer to the orientation of the channel, but we suggest using
the tunneling direction relative to the gate electric field, instead of the
spatial orientation, to classify TFET structures. Among the lateral TFETs, a
high ON current of 10 µA/µm and a subthreshold slope of 48 mV/decade have been
achieved in an InAs/GaAsSb/GaSb nanowire TFET by Memisevic et al.25 In vertical
TFETs, the tunneling occurs across an area instead of along a line as in lateral
TFETs, so vertical TFETs can achieve a high ON current, such as the 180 µA/µm
reported by Zhou et al.,10 although
the vertical TFET takes a much larger footprint than Memisevic’s lateral TFET.
Based on those two canonical structures, many variations have been proposed.
Imenabadi et al.26 proposed Z-shaped TFETs to take advantage of the high ON
current in vertical TFETs without increasing the device footprint. Wang et al.27
and Shih et al.28 developed U-shaped TFETs to enhance the ON current while
suppressing the leakage current. On top of the different tunneling directions,
researchers often engineer the band diagram in the channel to further optimize
TFET performance. Inserting a “pocket” between the source and channel has
been proposed by Huang et al.29 and Long et al.30 to steepen the subthreshold
slope and enhance the ON current, and inserting a “pocket” between the channel
and the drain has been proposed by Kwon et al.31 to reduce the gate-drain
capacitance. Specially engineered band alignments in the tunneling direction by
Long et al.32 and perpendicular to the tunneling direction by Zhao et al.33 have
been proposed to enhance the ON current, steepen the subthreshold slope, and
suppress the leakage current. In thin body channel materials such as 2D layered
materials, doping can be achieved by charge transfer or electrostatic doping
from adjacent layers or plates, as in the dielectric engineered TFET34 and the
junctionless TFET.35
The performance of TFETs is often affected by several non-ideal effects.
Huang et al.36 looked into the reliability issue due to dielectric breakdown in
TFETs. As the fundamental limit on the subthreshold steepness, the effects of
band tails37–40 have been studied. Besides the band tails, trap-assisted
tunneling,41 recombination,41 and interface roughness42 also have significant
impacts on TFET performance. In order to mitigate those non-ideal effects, it is
essential to have high-quality materials and advanced fabrication techniques, as
well as suitable device designs.43
[Figure 1.3 diagram. Nodes: Physics, Device, System, ML. Labels on the links: "Quantum mechanics, crystallography, electromagnetism, thermodynamics, ..."; "Process experimental data, ... (e.g. LHC data, gravitational-wave data)"; "Quantum machine learning, energy-based model"; "CPU, GPU, FPGA, ASIC (e.g. TPU, DaDianNao)"; "Design and verification of circuits and systems"; "Energy-efficient transistor, non-volatile memory, silicon photonics, ..."; "?".]
Figure 1.3: The interactions between physics, device, system, and machine
learning.
TFETs have been explored and evaluated in many circuit/system applica-
tions, including TCAM,44 analog/mixed-signal circuits,45–47 spiking neural net-
work,48 active pixel sensors,49 and GPU register file.50
1.4 Devices and Machine Learning
Increasing computing power, algorithmic advances, and the explosion of digital
data have fueled machine learning techniques as powerful methods to find patterns
in data. The interactions between physics, device, system, and machine
learning are illustrated in Fig.1.3.
As discussed above, devices are the implementation of physics, especially
condensed matter physics. On the other hand, devices are important platforms
for physics research. Recently, physics and machine learning algorithms have
become more and more connected. Quantum mechanics has been proposed to enhance
classical machine learning algorithms.51, 52 Moreover, physical concepts such as
entropy and energy have been widely used to design and reason about machine
learning algorithms.53, 54 Conversely, machine learning has been successfully
applied to analyze experimental data, such as searching for new particles at the
Large Hadron Collider (LHC),55 discovering gravitational waves,56 and various
topics in condensed matter physics.57–63
As one of the driving forces behind the popularity of machine learning, device
innovations keep influencing system designs, which ultimately enable efficient
hardware for machine learning applications. For instance, FinFETs64 have
become the building blocks of the latest generation of CPUs and GPUs. Non-volatile
memories promise to greatly reduce the memory access energy and
delay,65 and therefore open the possibility of novel architectures for machine
learning accelerators.66 The use of photonics in networking has been considered one of
many possible solutions for handling the growing demands on data centers.67
Various architectures have been developed to accelerate machine learning
algorithms, particularly neural networks.68–70 On the other hand, machine learning
algorithms have become a popular method to enable fast, accurate design and
verification of electronic systems.71, 72
Evidently, device and machine learning research interact indirectly
with each other through physics and systems. However, how to apply machine
learning algorithms directly to help accelerate device research, and
how to design devices that directly solve machine learning problems,
remain open questions. In this work, we propose a novel device for binary
neural network accelerators, and we develop a deep learning framework for
efficient and accurate device modeling.
1.5 Brief Outline of This Work
In Chapter 2, the two-dimensional heterojunction interlayer tunnel FET
(Thin-TFET) is introduced. The Thin-TFET is positioned as an ultra-scaled steep
transistor that offers high ON current and steep subthreshold slope. In Chapter 3,
we investigate the Miller effect in vertical and lateral TFETs and find
that vertical TFETs intrinsically have a smaller Miller effect than lateral TFETs.
In Chapter 4, we propose the TransiXNOR, which utilizes the tunneling at both
the source/channel junction and the channel/drain junction to enable the
exclusive NOR (XNOR) logic operation in a single transistor. In Chapter 5, the
physics-inspired neural network (Pi-NN) is developed to learn efficient and
accurate device models from experimental or simulation data.
BIBLIOGRAPHY
1 J. Linvill and L. Schimpf, “The design of tetrode transistor amplifiers,” Bell
Labs Technical Journal, vol. 35, no. 4, pp. 813–840, 1956.
2 J. A. Fleming, “Instrument for converting alternating electric currents into
continuous currents,” US Patent 803,684, Nov. 7, 1905.
3 S. Natarajan, M. Armstrong, M. Bost, R. Brain, M. Brazier, C.-H. Chang,
V. Chikarmane, M. Childs, H. Deshpande, K. Dev et al., “A 32nm logic technology
featuring 2nd-generation high-k + metal-gate transistors, enhanced
channel strain and 0.171 µm² SRAM cell size in a 291Mb array,” in Electron
Devices Meeting, 2008. IEDM 2008. IEEE International. IEEE, 2008, pp. 1–3.
4 L. Knoll, Q.-T. Zhao, A. Nichau, S. Trellenkamp, S. Richter, A. Schäfer,
D. Esseni, L. Selmi, K. K. Bourdelle, and S. Mantl, “Inverters with strained Si
nanowire complementary tunnel field-effect transistors,” IEEE Electron Device
Letters, vol. 34, no. 6, pp. 813–815, 2013.
5 S. H. Kim, H. Kam, C. Hu, and T.-J. K. Liu, “Germanium-source tunnel field
effect transistors with record high i on/i off,” in VLSI Technology, 2009 Sympo-
sium on. IEEE, 2009, pp. 178–179.
6 U. E. Avci, S. Hasan, D. E. Nikonov, R. Rios, K. Kuhn, and I. A. Young, “Un-
derstanding the feasibility of scaled iii–v tfet for logic by bridging atomistic
simulations and experimental results,” in VLSI technology (VLSIT), 2012 sym-
posium on. IEEE, 2012, pp. 183–184.
7 S. Agarwal, G. Klimeck, and M. Luisier, “Leakage-reduction design concepts
for low-power vertical tunneling field-effect transistors,” IEEE Electron Device
Letters, vol. 31, no. 6, pp. 621–623, 2010.
8 S. S. Sylvia, M. A. Khayer, K. Alam, and R. K. Lake, “Doping, tunnel barriers,
and cold carriers in inas and insb nanowire tunnel transistors,” IEEE transac-
tions on electron devices, vol. 59, no. 11, pp. 2996–3001, 2012.
9 Y. Lu, G. Zhou, R. Li, Q. Liu, Q. Zhang, T. Vasen, S. Doo Chae, T. Kosel,
M. Wistey, H. Xing, A. Seabaugh, and P. Fay, “Performance of AlGaSb/InAs
TFETs with gate electric field and tunneling direction aligned,” Electron Device
Letters, IEEE, vol. 33, no. 5, pp. 655–657, May 2012.
10 G. Zhou, R. Li, T. Vasen, M. Qi, S. Chae, Y. Lu, Q. Zhang, H. Zhu, J.-M. Kuo,
T. Kosel et al., “Novel gate-recessed vertical inas/gasb tfets with record high i
on of 180 µa/µm at v ds= 0.5 v,” in Electron Devices Meeting (IEDM), 2012 IEEE
International. IEEE, 2012, pp. 32–6.
11 U. E. Avci and I. A. Young, “Heterojunction tfet scaling and resonant-tfet for
steep subthreshold slope at sub-9nm gate-length,” in Electron Devices Meeting
(IEDM), 2013 IEEE International. IEEE, 2013, pp. 4–3.
12 B. Ganjipour, J. Wallentin, M. T. Borgstrom, L. Samuelson, and C. Thelander,
“Tunnel field-effect transistors based on inp-gaas heterostructure nanowires,”
ACS nano, vol. 6, no. 4, pp. 3109–3113, 2012.
13 A. C. Neto, F. Guinea, N. M. Peres, K. S. Novoselov, and A. K. Geim, “The
electronic properties of graphene,” Reviews of modern physics, vol. 81, no. 1, p.
109, 2009.
14 Q. H. Wang, K. Kalantar-Zadeh, A. Kis, J. N. Coleman, and M. S. Strano, “Elec-
tronics and optoelectronics of two-dimensional transition metal dichalco-
genides,” Nature nanotechnology, vol. 7, no. 11, pp. 699–712, 2012.
15 C. R. Dean, A. F. Young, I. Meric, C. Lee, L. Wang, S. Sorgenfrei, K. Watanabe,
T. Taniguchi, P. Kim, K. L. Shepard et al., “Boron nitride substrates for high-
quality graphene electronics,” Nature nanotechnology, vol. 5, no. 10, pp. 722–
726, 2010.
16 L. Li, Y. Yu, G. J. Ye, Q. Ge, X. Ou, H. Wu, D. Feng, X. H. Chen, and Y. Zhang,
“Black phosphorus field-effect transistors,” Nature nanotechnology, vol. 9, no. 5,
pp. 372–377, 2014.
17 T. Roy, M. Tosun, X. Cao, H. Fang, D.-H. Lien, P. Zhao, Y.-Z. Chen, Y.-L. Chueh,
J. Guo, and A. Javey, “Dual-gated mos2/wse2 van der waals tunnel diodes
and transistors,” Acs Nano, vol. 9, no. 2, pp. 2071–2079, 2015.
18 R. Yan, S. Fathipour, Y. Han, B. Song, S. Xiao, M. Li, N. Ma, V. Protasenko,
D. A. Muller, D. Jena et al., “Esaki diodes in van der waals heterojunctions
with broken-gap energy band alignment,” Nano letters, vol. 15, no. 9, pp. 5791–
5798, 2015.
19 A. M. Hamam, M. E. Schmidt, M. Muruganathan, S. Suzuki, and H. Mizuta,
“Sub-10 nm graphene nano-ribbon tunnel field-effect transistor,” Carbon, vol.
126, pp. 588–593, 2018.
20 D. Sarkar, X. Xie, W. Liu, W. Cao, J. Kang, Y. Gong, S. Kraemer, P. M. Ajayan,
and K. Banerjee, “A subthermionic tunnel field-effect transistor with an atom-
ically thin channel,” Nature, vol. 526, no. 7571, pp. 91–95, 2015.
21 J. Xu, J. Jia, S. Lai, J. Ju, and S. Lee, “Tunneling field effect transistor integrated
with black phosphorus-mos2 junction and ion gel dielectric,” Applied Physics
Letters, vol. 110, no. 3, p. 033103, 2017.
22 T. Roy, M. Tosun, M. Hettick, G. H. Ahn, C. Hu, and A. Javey, “2d-2d tunnel-
ing field-effect transistors using wse2/snse2 heterostructures,” Applied Physics
Letters, vol. 108, no. 8, p. 083111, 2016.
23 X. Yan, C. Liu, C. Li, W. Bao, S. Ding, D. W. Zhang, and P. Zhou, “Tunable
snse2/wse2 heterostructure tunneling field effect transistor,” Small, vol. 13,
no. 34, 2017.
24 W. Li, S. Sharmin, H. Ilatikhameneh, R. Rahman, Y. Lu, J. Wang, X. Yan,
A. Seabaugh, G. Klimeck, D. Jena et al., “Polarization-engineered iii-nitride
heterojunction tunnel field-effect transistors,” IEEE Journal on Exploratory
Solid-State Computational Devices and Circuits, vol. 1, pp. 28–34, 2015.
25 E. Memisevic, J. Svensson, M. Hellenbrand, E. Lind, and L.-E. Wernersson,
“Vertical inas/gaassb/gasb tunneling field-effect transistor on si with s= 48
mv/decade and i on= 10 µa/µm for i off= 1 na/µm at v ds= 0.3 v,” in Electron
Devices Meeting (IEDM), 2016 IEEE International. IEEE, 2016, pp. 19–1.
26 R. M. Imenabadi, M. Saremi, and W. G. Vandenberghe, “A novel pnpn-like z-
shaped tunnel field-effect transistor with improved ambipolar behavior and rf
performance,” IEEE Transactions on Electron Devices, vol. 64, no. 11, pp. 4752–
4758, 2017.
27 W. Wang, P.-F. Wang, C.-M. Zhang, X. Lin, X.-Y. Liu, Q.-Q. Sun, P. Zhou, and
D. W. Zhang, “Design of u-shape channel tunnel fets with sige source re-
gions,” IEEE Transactions on Electron Devices, vol. 61, no. 1, pp. 193–197, 2014.
28 P.-C. Shih, W.-C. Hou, and J.-Y. Li, “A u-gate ingaas/gaassb heterojunction
tfet of tunneling normal to the gate with separate control over on-and off-state
current,” IEEE Electron Device Letters, 2017.
29 Q. Huang, R. Huang, Z. Zhan, Y. Qiu, W. Jiang, C. Wu, and Y. Wang, “A novel
si tunnel fet with 36mv/dec subthreshold slope based on junction depleted-
modulation through striped gate configuration,” in Electron Devices Meeting
(IEDM), 2012 IEEE International. IEEE, 2012, pp. 8–5.
30 P. Long, J. Z. Huang, M. Povolotskyi, G. Klimeck, and M. J. Rodwell,
“High-current tunneling FETs with $(1\bar{1}0)$ orientation and a channel
heterojunction,” IEEE Electron Device Letters, vol. 37, no. 3, pp. 345–348, 2016.
31 D. W. Kwon, H. W. Kim, J. H. Kim, E. Park, J. Lee, W. Kim, S. Kim, J.-H. Lee,
and B.-G. Park, “Effects of localized body doping on switching characteristics
of tunnel fet inverters with vertical structures,” IEEE Transactions on Electron
Devices, vol. 64, no. 4, pp. 1799–1805, 2017.
32 P. Long, J. Huang, M. Povolotskyi, D. Verreck, J. Charles, T. Kubis, G. Klimeck,
M. Rodwell, and B. Calhoun, “A tunnel fet design for high-current, 120 mv
operation,” in Electron Devices Meeting (IEDM), 2016 IEEE International. IEEE,
2016, pp. 30–2.
33 Y. Zhao, C. Wu, Q. Huang, C. Chen, J. Zhu, L. Guo, R. Jia, Z. Lv, Y. Yang,
M. Li et al., “A novel tunnel fet design through adaptive bandgap engineering
with constant sub-threshold slope over 5 decades of current and high ion/ioff
ratio,” IEEE Electron Device Letters, vol. 38, no. 5, pp. 540–543, 2017.
34 H. Ilatikhameneh, T. A. Ameen, G. Klimeck, J. Appenzeller, and R. Rahman,
“Dielectric engineered tunnel field-effect transistor,” IEEE Electron Device Let-
ters, vol. 36, no. 10, pp. 1097–1100, 2015.
35 B. Ghosh and M. W. Akram, “Junctionless tunnel field effect transistor,” IEEE
electron device letters, vol. 34, no. 5, pp. 584–586, 2013.
36 Q. Huang, R. Jia, J. Zhu, Z. Lv, J. Wang, C. Chen, Y. Zhao, R. Wang, W. Bu,
W. Wang et al., “Deep insights into dielectric breakdown in tunnel fets with
awareness of reliability and performance co-optimization,” in Electron Devices
Meeting (IEDM), 2016 IEEE International. IEEE, 2016, pp. 31–5.
37 M. A. Khayer and R. K. Lake, “Effects of band-tails on the subthreshold
characteristics of nanowire band-to-band tunneling transistors,” Journal
of Applied Physics, vol. 110, no. 7, p. 074508, 2011. [Online]. Available:
http://dx.doi.org/10.1063/1.3642954
38 H. Zhang, W. Cao, J. Kang, and K. Banerjee, “Effect of band-tails on the sub-
threshold performance of 2d tunnel-fets,” in Electron Devices Meeting (IEDM),
2016 IEEE International. IEEE, 2016, pp. 30–3.
39 E. Memisevic, E. Lind, M. Hellenbrand, J. Svensson, and L.-E. Wernersson,
“Impact of band-tails on the subthreshold swing of iii-v tunnel field-effect
transistor,” IEEE Electron Device Letters, vol. 38, no. 12, pp. 1661–1664, 2017.
40 S. Sant and A. Schenk, “The effect of density-of-state tails on band-to-band
tunneling: Theory and application to tunnel field effect transistors,” Journal of
Applied Physics, vol. 122, no. 13, p. 135702, 2017.
41 Q. Smets, A. S. Verhulst, E. Simoen, D. Gundlach, C. Richter, N. Collaert,
and M. M. Heyns, “Calibration of bulk trap-assisted tunneling and shockley–
read–hall currents and impact on ingaas tunnel-fets,” IEEE transactions on elec-
tron devices, vol. 64, no. 9, pp. 3622–3626, 2017.
42 S. Sant and A. Schenk, “Modeling the effect of interface roughness on the per-
formance of tunnel fets,” IEEE Electron Device Letters, vol. 38, no. 2, pp. 258–
261, 2017.
43 ——, “Trap-tolerant device geometry for inas/si ptfets,” IEEE Electron Device
Letters, vol. 38, no. 10, pp. 1363–1366, 2017.
44 M.-H. Tu, Y.-N. Chen, P. Su, and C.-T. Chuang, “Exploration and evaluation
of tcam with hybrid tunneling fet and finfet devices for ultra-low-voltage ap-
plications,” in VLSI Technology, Systems and Application (VLSI-TSA), 2017 Inter-
national Symposium on. IEEE, 2017, pp. 1–2.
45 F. Settino, M. Lanuzza, S. Strangio, F. Crupi, P. Palestri, D. Esseni, and L. Selmi,
“Understanding the potential and limitations of tunnel fets for low-voltage
analog/mixed-signal circuits,” IEEE Transactions on Electron Devices, 2017.
46 A. Acharya, A. B. Solanki, S. Dasgupta, and B. Anand, “Drain current sat-
uration in line tunneling-based tfets: An analog design perspective,” IEEE
Transactions on Electron Devices, 2017.
47 J. Min and P. M. Asbeck, “Compact modeling of distributed effects in 2-d ver-
tical tunnel fets and their impact on dc and rf performances,” IEEE Journal
on Exploratory Solid-State Computational Devices and Circuits, vol. 3, pp. 18–26,
2017.
48 D. Rajasekharan, T. Dutta, A. R. Trivedi, and Y. S. Chauhan, “Energy-efficient
spiking neural networks based on tunnel fet,” in Emerging Electronics (ICEE),
2016 3rd International Conference on. IEEE, 2016, pp. 1–4.
49 J. Fernández-Berni, M. Niemier, X. Hu, H. Lu, W. Li, P. Fay, R. Carmona-Galán,
and Á. Rodríguez-Vázquez, “TFET-based well capacity adjustment in active
pixel sensor for enhanced high dynamic range,” Electronics Letters, vol. 53,
no. 9, pp. 622–624, 2017.
50 C. Xie, J. Tan, M. Chen, Y. Yi, L. Peng, and X. Fu, “Emerging technology en-
abled energy-efficient gpgpus register file,” Microprocessors and Microsystems,
vol. 50, pp. 175–188, 2017.
51 J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd,
“Quantum machine learning,” Nature, vol. 549, p. 195, 2017.
[Online]. Available: http://dx.doi.org/10.1038/nature23474
52 T. Yoder, G. H. Low, and I. Chuang, “Quantum inference on bayesian net-
works,” in APS Meeting Abstracts, 2014.
53 C. Poultney, S. Chopra, Y. L. Cun et al., “Efficient learning of sparse representa-
tions with an energy-based model,” in Advances in neural information processing
systems, 2007, pp. 1137–1144.
54 D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, “A learning algorithm for
boltzmann machines,” Cognitive science, vol. 9, no. 1, pp. 147–169, 1985.
55 P. Baldi, P. Sadowski, and D. Whiteson, “Searching for exotic particles in high-
energy physics with deep learning,” Nature communications, vol. 5, 2014.
56 B. P. Abbott, R. Abbott, T. Abbott, M. Abernathy, F. Acernese, K. Ackley,
C. Adams, T. Adams, P. Addesso, R. Adhikari et al., “Observation of gravi-
tational waves from a binary black hole merger,” Physical review letters, vol.
116, no. 6, p. 061102, 2016.
57 L.-F. Arsenault, A. Lopez-Bezanilla, O. A. von Lilienfeld, and A. J. Millis,
“Machine learning for many-body physics: the case of the anderson impu-
rity model,” Physical Review B, vol. 90, no. 15, p. 155136, 2014.
58 A. G. Kusne, T. Gao, A. Mehta, L. Ke, M. C. Nguyen, K.-M. Ho, V. Antropov,
C.-Z. Wang, M. J. Kramer, C. Long et al., “On-the-fly machine-learning for
high-throughput experiments: search for rare-earth-free permanent mag-
nets,” Scientific reports, vol. 4, 2014.
59 S. V. Kalinin, B. G. Sumpter, and R. K. Archibald, “Big-deep-smart data in
imaging for guiding materials design,” Nature materials, vol. 14, no. 10, pp.
973–980, 2015.
60 L. M. Ghiringhelli, J. Vybiral, S. V. Levchenko, C. Draxl, and M. Scheffler, “Big
data of materials science: Critical role of the descriptor,” Physical review letters,
vol. 114, no. 10, p. 105503, 2015.
61 S. S. Schoenholz, E. D. Cubuk, D. M. Sussman, E. Kaxiras, and A. J. Liu, “A
structural approach to relaxation in glassy liquids,” Nature Physics, vol. 12,
p. 469, 2016. [Online]. Available: http://dx.doi.org/10.1038/nphys3644
62 P. Mehta and D. J. Schwab, “An exact mapping between the variational renor-
malization group and deep learning,” arXiv preprint arXiv:1410.3831, 2014.
63 J. Carrasquilla and R. G. Melko, “Machine learning phases of matter,”
Nature Physics, vol. 13, p. 431, 2017. [Online]. Available:
http://dx.doi.org/10.1038/nphys4035
64 D. Bhattacharya and N. K. Jha, “Finfets: From devices to architectures,” Ad-
vances in Electronics, vol. 2014, 2014.
65 C. J. Xue, G. Sun, Y. Zhang, J. J. Yang, Y. Chen, and H. Li, “Emerging non-
volatile memories: opportunities and challenges,” in Hardware/Software Code-
sign and System Synthesis (CODES+ ISSS), 2011 Proceedings of the 9th Interna-
tional Conference on. IEEE, 2011, pp. 325–334.
66 P. Chi, S. Li, C. Xu, T. Zhang, J. Zhao, Y. Liu, Y. Wang, and Y. Xie, “Prime: A
novel processing-in-memory architecture for neural network computation in
reram-based main memory,” in Proceedings of the 43rd International Symposium
on Computer Architecture. IEEE Press, 2016, pp. 27–39.
67 Y. A. Vlasov, “Silicon cmos-integrated nano-photonics for computer and data
communications beyond 100g,” IEEE Communications Magazine, vol. 50, no. 2,
2012.
68 G. Lacey, G. W. Taylor, and S. Areibi, “Deep learning on fpgas: Past, present,
and future,” arXiv preprint arXiv:1602.04283, 2016.
69 N. P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates,
S. Bhatia, N. Boden, A. Borchers et al., “In-datacenter performance analysis of
a tensor processing unit,” in Proceedings of the 44th Annual International Sympo-
sium on Computer Architecture. ACM, 2017, pp. 1–12.
70 Y. Chen, T. Luo, S. Liu, S. Zhang, L. He, J. Wang, L. Li, T. Chen, Z. Xu, N. Sun
et al., “Dadiannao: A machine-learning supercomputer,” in Proceedings of the
47th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE
Computer Society, 2014, pp. 609–622.
71 W.-T. J. Chan, P.-H. Ho, A. B. Kahng, and P. Saxena, “Routability optimization
for industrial designs at sub-14nm process nodes using machine learning.” in
ISPD, 2017, pp. 15–21.
72 W.-T. J. Chan, K. Y. Chung, A. B. Kahng, N. D. MacDonald, and S. Nath,
“Learning-based prediction of embedded memory timing failures during ini-
tial floorplan design,” in Design Automation Conference (ASP-DAC), 2016 21st
Asia and South Pacific. IEEE, 2016, pp. 178–185.
CHAPTER 2
PHYSICAL MODELING OF TWO-DIMENSIONAL HETEROJUNCTION
INTERLAYER TUNNELING FETS (THIN-TFETS)
2.1 Introduction
Electronic integrated circuits are the hardware backbone of today's information
society, and power dissipation has recently become their greatest challenge,
affecting the lifetime of existing portable equipment, the sustainability of
the large and growing number of data centers, and the feasibility of
energy-autonomous systems for ambient intelligence,1, 2 and of sensor networks
for implanted monitoring and actuation medical devices.3 While scaling the
supply voltage, VDD, is recognized as the most effective measure to reduce
switching power in digital circuits, the performance loss and the increased
device-to-device variability are a serious hindrance to scaling VDD down to
0.5 V or below.
The voltage scalability of VLSI systems may be significantly improved by
resorting to innovations in transistor technology and, in this regard, the ITRS
has singled out tunnel field effect transistors (Tunnel FETs) as the most
promising transistors to reduce the subthreshold swing, SS, below the 60 mV/dec
limit of MOSFETs (at room temperature), and thus to enable further VDD scaling.4, 5
Several device architectures and materials are being investigated to develop
Tunnel FETs offering both an attractive on current and a small SS, including
III-V based transistors possibly employing staggered or broken bandgap hetero-
junctions,6–9 or strain engineering.10 Even if encouraging experimental results
have been reported for the on-current in III-V Tunnel FETs, to achieve a sub 60
mV/dec subthreshold swing is still a real challenge in these devices, probably
due to the detrimental effects of interface states.6, 11, 12 Therefore, as of today
the investigation of new material systems and innovative device architectures
for high performance Tunnel FETs is a timely research field in both the applied
physics and the electron device community.
In this context, two-dimensional (2D) crystals attract increasingly more
attention, primarily due to their scalability, step-like density of states and
absence of broken bonds at the interface. They can be stacked to form a new class
of tunneling transistors based on an interlayer tunneling occurring in the di-
rection normal to the plane of the 2D materials. In fact tunneling and reso-
nant tunneling devices have been recently proposed,13 as well as experimen-
tally demonstrated for graphene-based transistors.14, 15 Furthermore, monolay-
ers of group-VIB transition metal dichalcogenides MX2 (M = Mo, W; X = S, Se,
Te) have recently attracted remarkable attention for their electronic and optical
properties.16, 17 Monolayers of transition-metal dichalcogenides (TMDs) have a
bandgap varying from almost zero to 2 eV with a sub-nanometer thickness such
that these materials can be considered approximately as two-dimensional crys-
tals.18 The sub-nanometer thickness of TMDs can provide excellent electrostatic
control in a vertically stacked heterojunction. Furthermore, the 2D nature of
such materials makes them essentially immune to the energy bandgap increase
produced by the vertical quantization when conventional 3D semiconductors
are thinned to a nanoscale thickness, and thus immune to the corresponding
degradation of the tunneling current density.19 Moreover, the lack of dangling
bonds at the surface of TMDs may allow for the fabrication of material stacks
with low densities of interface defects,19 which is another potential advantage
of TMD materials for Tunnel FET applications.
In this chapter we propose a two-dimensional heterojunction interlayer
tunneling field effect transistor (Thin-TFET) based on 2D semiconductors and
develop a transport model based on the transfer-Hamiltonian method to describe
the current voltage characteristics and discuss, in particular, the subthreshold
swing. In Section 2.2 we first present the device concept and illustrate examples
of the vertical electrostatic control, then we develop a formalism to calculate the
tunneling current. Upon realizing that the subthreshold swing of the Thin-TFET
is ultimately determined by the energy broadening, in Sec.2.2.3 we show how
this important physical factor has been included in our calculations. In Sec.2.2.4
we address the effect of a possible misalignment between the two 2D
semiconductor layers, while in Sec.2.2.5 we derive some approximate analytical
expressions for the tunneling current density, which are useful to gain insight
into the transistor operation and to guide the device design. In Sec.2.3.2 we
present the results of numerically calculated transfer characteristics for the
Thin-TFET based on MoS2 and WSe2, and the effects of correlation lengths,
interlayer thicknesses, and energy broadening. In Sec.2.4 we discuss both n-type
and p-type Thin-TFETs employing a promising material system of 2H-WSe2 and
1T-SnSe2, along with their capacitance model in Sec.2.4.2. The effects of a
non-uniform van der Waals gap thickness and of the external source and drain
total access resistance are also discussed in Sec.2.4.1. Using the simulated
results, we present the benchmarking results in Sec.2.4.3, and finally in
Sec.2.5 we draw some concluding remarks about the modeling approach developed
in this chapter and about the design perspectives for the Thin-TFET.
2.2 Modeling of the Tunneling Transistor
2.2.1 Device Concept and Electrostatics
The device structure and the corresponding band diagram are sketched in
Fig.2.1, where the 2D materials are assumed to be semiconductors with a sizable
energy bandgap, for example, transition-metal dichalcogenide (TMD)
semiconductors, without loss of generality.17, 20 Both the top and the bottom
2D materials are monolayers, and the thickness of the 2D layers is neglected in
the modeling of the electrostatics.
The working principle of the tunneling transistor sketched in Fig.2.1(a) can
be explained as follows. When the conduction band edge ECT of the top 2D layer
is higher than the valence band edge EVB of the bottom 2D layer (see Fig.2.2(a)),
there are no states in the top layer into which the electrons of the bottom
layer can tunnel. This corresponds to the off state of the device. When ECT is
pulled below EVB (see Fig.2.2(b)), a tunneling window is formed and consequently
an interlayer tunneling current can flow from the bottom to the top 2D material.
The crossing and uncrossing between the top layer conduction band and the bottom layer
valence band is governed by the gate voltages and it is described by the electro-
statics of the device.
To calculate the band alignment along the vertical direction of the intrinsic
device in Fig.2.1 we write the Gauss law linking the sheet charge in the 2D ma-
terials to the electric fields in the surrounding insulating layers, which leads to
\[
\begin{aligned}
C_{TOX} V_{TOX} - C_{IOX} V_{IOX} &= e\,(p_T - n_T + N_D) \\
C_{BOX} V_{BOX} + C_{IOX} V_{IOX} &= e\,(p_B - n_B + N_A)
\end{aligned}
\tag{2.1}
\]
Figure 2.1: (a) Schematic device structure for the Thin-TFET, where VTG, VBG
and VDS are the top gate, bottom gate and drain to source voltages; (b) sketch
of the band diagram, where ΦM,T , ΦM,B are the work-functions and EF,MT , EF,MB
the Fermi levels of the metal gates, while χ2D,T , χ2D,B are the electron affinities,
EFT , EFB the Fermi levels, ECT , ECB the conduction band edges and EVT , EVB the
valence band edges respectively in the top and bottom 2D layer. VTOX, VIOX and
VBOX are the potential drops respectively across the top oxide, interlayer and
bottom oxide.
where CT(I,B)OX is the capacitance per unit area of the top oxide (interlayer,
bottom oxide) and VT(I,B)OX is the potential drop across the top oxide
(interlayer, bottom oxide). The potential drop across the oxides can be written
in terms of the external voltages VTG, VBG, VDS and of the energies
eφn,T = ECT − EFT and eφp,B = EFB − EVB
Figure 2.2: Sketch of the band alignments in a Thin-TFET between the top and
bottom 2D layer in: (a) OFF state and (b) ON state.
defined in Fig.2.1(b) as
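\[
\begin{aligned}
eV_{TOX} &= eV_{TG} + e\phi_{n,T} - eV_{DS} + \chi_{2D,T} - \Phi_{M,T} \\
eV_{BOX} &= eV_{BG} - e\phi_{p,B} + E_{GB} + \chi_{2D,B} - \Phi_{M,B} \\
eV_{IOX} &= eV_{DS} - e\phi_{p,B} - e\phi_{n,T} + E_{GB} + \chi_{2D,B} - \chi_{2D,T}
\end{aligned}
\tag{2.2}
\]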
where EFT, EFB are the Fermi levels of majority carriers in the top and bottom
layer, nT, pT are the electron and hole concentrations in the top layer, nB, pB
the concentrations in the bottom layer, χ2D,T, χ2D,B are the electron affinities
of the 2D materials, ΦM,T, ΦM,B the work-functions of the top and back gate and
EGB is the energy gap in
the bottom layer. Eq. 2.2 implicitly assumes that the majority carriers of the two
2D materials are at thermodynamic equilibrium with their Fermi levels, with
the split of the Fermi levels set by the external voltages (i.e. EFB−EFT=eVDS ),
and the electrostatic potential essentially constant in the 2D layers.
Since in our numerical calculations we shall employ a parabolic effective
mass approximation for the energy dispersion of the 2D materials, as discussed
more thoroughly in Sec.2.3, the carrier densities can be readily expressed as an
analytic function of eφn,T and eφp,B:21
\[
n(p) = \frac{g_v\, m_c(m_v)\, k_B T}{\pi\hbar^2}
\ln\left[\exp\left(-\frac{e\,\phi_{n,T}(\phi_{p,B})}{k_B T}\right) + 1\right]
\tag{2.3}
\]
where gv is the valley degeneracy.
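As a rough illustration of Eq.2.3, the sketch below evaluates the sheet carrier density for a given band-edge offset. It assumes SI units, takes the charge in the exponent to be the elementary charge, and uses hypothetical values for the effective mass (0.5 free-electron masses) and the valley degeneracy (2); the function name `sheet_density` is introduced here only for illustration.

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant (J s)
KB = 1.380649e-23           # Boltzmann constant (J/K)
E_CHARGE = 1.602176634e-19  # elementary charge (C)
M0 = 9.1093837015e-31       # free-electron mass (kg)


def sheet_density(phi_volts, m_rel, g_v=2, temp=300.0):
    """Eq. 2.3: sheet carrier density in m^-2.

    phi_volts is phi_n,T (electrons) or phi_p,B (holes) in volts;
    m_rel is the effective mass in units of M0 (hypothetical value).
    """
    kt = KB * temp
    x = -E_CHARGE * phi_volts / kt
    # ln(exp(x) + 1), rearranged so that exp() never overflows
    if x > 0:
        log_term = x + math.log1p(math.exp(-x))
    else:
        log_term = math.log1p(math.exp(x))
    return g_v * m_rel * M0 * kt / (math.pi * HBAR ** 2) * log_term


if __name__ == "__main__":
    for phi in (0.2, 0.0, -0.2):
        print(f"phi = {phi:+.1f} V -> {sheet_density(phi, 0.5):.3e} m^-2")
```

For a strongly negative offset (Fermi level well inside the band) the logarithm becomes linear in the offset, which is the degenerate limit of a 2D band with an energy-independent density of states.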
When Eq.2.2 and Eq.2.3 are inserted in Eq.2.1, we obtain two algebraic equa-
tions for φn,T and φp,B that can be solved numerically and describe the electro-
statics in a one dimensional section of the device.
2.2.2 Transport Model
In this section we develop a formalism to calculate the tunneling current based
on the transfer-Hamiltonian method,22–24 as also revisited recently for resonant
tunneling in graphene transistors.13, 14, 25 We start by writing the single particle
elastic tunneling current as
\[
I = g_v \frac{4\pi e}{\hbar} \sum_{\mathbf{k}_T,\mathbf{k}_B}
|M(\mathbf{k}_T,\mathbf{k}_B)|^2\,
\delta(E_B(\mathbf{k}_B) - E_T(\mathbf{k}_T))\,(f_B - f_T)
\tag{2.4}
\]
where e is the elementary charge, kB, kT are the wave-vectors respectively in
the bottom and top 2D material, EB(kB) and ET(kT) denote the corresponding
energies, fB and fT are the Fermi occupation functions in the bottom and top
layer (depending respectively on EFB and EFT, see Fig.2.1), and gv is the valley
degeneracy. The matrix element M(kT,kB), which expresses the transfer of
electrons between the two 2D layers, is given by14
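\[
M(\mathbf{k}_T,\mathbf{k}_B) = \int_A d\mathbf{r} \int dz\;
\psi^{\dagger}_{T,\mathbf{k}_T}(\mathbf{r},z)\, U_{sc}(\mathbf{r},z)\,
\psi_{B,\mathbf{k}_B}(\mathbf{r},z)
\tag{2.5}
\]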
where ψB,kB (ψT,kT ) is the electron wave-function in the bottom (top) 2D layer and
Usc(r, z) is the perturbation potential in the interlayer region.
Eq.2.5 acknowledges the fact that in real devices several physical mecha-
nisms occurring in the interlayer region can result in a relaxed conservation of
the in-plane wave-vector k in the tunneling process. We will return to the
discussion of Usc(r, z) later in this section.
To proceed in the calculation of M(kT ,kB) we write the electron wave-
function in the Bloch function form as
\[
\psi_{\mathbf{k}}(\mathbf{r},z) = \frac{1}{\sqrt{N_C}}\,
e^{i\mathbf{k}\cdot\mathbf{r}}\, u_{\mathbf{k}}(\mathbf{r},z)
\tag{2.6}
\]
where uk(r, z) is a periodic function of r and NC is the number of unit cells in
the overlapping area A of the two 2D materials. Eq.2.6 assumes the following
normalization condition:
\[
\int_{\Omega_C} d\boldsymbol{\rho} \int_z dz\,
|u_{\mathbf{k}}(\boldsymbol{\rho},z)|^2 = 1
\tag{2.7}
\]
where ρ is the in-plane abscissa in the unit cell area ΩC and A=NCΩC.
The wave-function ψk(r, z) is assumed to decay exponentially in the inter-
layer region with a decay constant κ;13, 14 such a z dependence is absorbed in
uk(r, z) and we do not need to make it explicit in our derivations. It should be
noticed that absorbing the exponential decay in uk(r, z) recognizes the fact that
in the interlayer region the r dependence of the wave-function changes with z.
In fact, as already discussed,13 while the uk(r, z) are localized around the basis
atoms in the two 2D layers, these functions are expected to spread out while
they decay in the interlayer region, so that the r dependence becomes weaker
when moving farther from the 2D layers.
To continue in the calculation of M(kT ,kB) we let the scattering potential in
the interlayer region be separable in the form14
\[
U_{sc}(\mathbf{r},z) = V_B(z)\, F_L(\mathbf{r})
\tag{2.8}
\]
where FL(r) is the in-plane fluctuation of the scattering potential, which is essen-
tially responsible for the relaxation of momentum conservation in the tunneling
process.
By substituting Eqs.2.6 and 2.8 in Eq.2.5 and writing r=r j+ρ, where r j is a
direct lattice vector and ρ is the in-plane position inside each unit cell, we obtain
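\[
\begin{aligned}
M(\mathbf{k}_T,\mathbf{k}_B) = \frac{1}{N_C} \sum_{j=1}^{N_C}
e^{i(\mathbf{k}_B-\mathbf{k}_T)\cdot\mathbf{r}_j}
&\int_{\Omega_C} d\boldsymbol{\rho} \int dz\;
e^{i(\mathbf{k}_B-\mathbf{k}_T)\cdot\boldsymbol{\rho}} \\
&\times u^{\dagger}_{T,\mathbf{k}_T}(\mathbf{r}_j+\boldsymbol{\rho},z)\,
F_L(\mathbf{r}_j+\boldsymbol{\rho})\, V_B(z)\,
u_{B,\mathbf{k}_B}(\mathbf{r}_j+\boldsymbol{\rho},z)
\end{aligned}
\tag{2.9}
\]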
We now assume that FL(r) corresponds to relatively long range fluctuations so
that it can be taken as approximately constant inside a unit cell, and that, fur-
thermore, the top and bottom 2D layer have the same lattice constant, hence the
Bloch functions uT,kT and uB,kB have the same periodicity in the r plane. More-
over, for the time being we consider that the conduction band minimum in the
top layer and the valence band maximum in the bottom layer are at the same
point of the 2D Brillouin zone, so that q=kB−kT is small compared to the size of
the Brillouin zone and exp(iq·ρ) is approximately 1 inside a unit cell. These
considerations and approximations allow us to rewrite Eq.2.9 as
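\[
M(\mathbf{k}_T,\mathbf{k}_B) \simeq \frac{1}{N_C} \sum_{j=1}^{N_C}
e^{i\mathbf{q}\cdot\mathbf{r}_j}\, F_L(\mathbf{r}_j)
\int_{\Omega_C} d\boldsymbol{\rho} \int dz\;
u^{\dagger}_{T,\mathbf{k}_T}(\boldsymbol{\rho},z)\, V_B(z)\,
u_{B,\mathbf{k}_B}(\boldsymbol{\rho},z)
\tag{2.10}
\]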
where the integral in the unit cell has been written for r j=0 because it is inde-
pendent of the unit cell.
Consistently with the assumption that kB and kT are small compared to the
size of the Brillouin zone, in Eq.2.10 we neglect the kB (kT ) dependence of uB,kB
(uT,kT ) and simply set uT,kT (ρ, z)≈u0T (ρ, z), uB,kB(ρ, z)≈u0B(ρ, z), where u0T (ρ, z) and
u0B(ρ, z) are the periodic parts of the Bloch function at the band edges, which is
the simplification typically employed in the effective mass approximation ap-
proach.21 By recalling that the u0B and u0T retain the exponential decay of the
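wave-functions in the interlayer region with a decay constant κ, we now write
\[
\int_{\Omega_C} d\boldsymbol{\rho} \int dz\;
u^{\dagger}_{0T}(\boldsymbol{\rho},z)\, V_B(z)\, u_{0B}(\boldsymbol{\rho},z)
\simeq M_{B0}\, e^{-\kappa T_{IL}}
\tag{2.11}
\]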
where TIL is the interlayer thickness and MB0 is a k independent matrix element
that will remain a prefactor in the final expression for the tunneling current.
Since FL(r) has been assumed a slowly varying function over a unit cell, then
the sum over the unit cells in Eq.2.10 can be rewritten as a normalized integral
over the tunneling area
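\[
\frac{1}{\Omega_C N_C} \sum_{j=1}^{N_C} \Omega_C\,
e^{i\mathbf{q}\cdot\mathbf{r}_j}\, F_L(\mathbf{r}_j)
\simeq \frac{1}{A} \int_A e^{i\mathbf{q}\cdot\mathbf{r}}\, F_L(\mathbf{r})\, d\mathbf{r}
\tag{2.12}
\]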
By introducing Eq.2.11 and 2.12 in Eq.2.10 we can finally express the squared
matrix element as
\[
|M(\mathbf{k}_T,\mathbf{k}_B)|^2 \simeq
\frac{|M_{B0}|^2\, S_F(q)}{A}\, e^{-2\kappa T_{IL}}
\tag{2.13}
\]
where q=kB−kT and SF(q) is the power spectrum of the random fluctuation de-
scribed by FL(r), which is defined as21
\[
S_F(q) = \frac{1}{A} \left| \int_A e^{i\mathbf{q}\cdot\mathbf{r}}\,
F_L(\mathbf{r})\, d\mathbf{r} \right|^2
\tag{2.14}
\]
By substituting Eq.2.13 in Eq.2.4 and then converting the sums over kB and kT
to integrals we obtain
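\[
I = \frac{g_v\, e\, |M_{B0}|^2 A}{4\pi^3\hbar}\, e^{-2\kappa T_{IL}}
\int_{\mathbf{k}_T} \int_{\mathbf{k}_B} d\mathbf{k}_T\, d\mathbf{k}_B\;
S_F(q)\, \delta(E_B(\mathbf{k}_B) - E_T(\mathbf{k}_T))\,(f_B - f_T)
\tag{2.15}
\]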
Before we proceed with some important extensions of the basic model that will
be discussed in Secs.2.2.3 and 2.2.4, a few comments about the results obtained
so far are in order.
According to Eq.2.15 the current is proportional to the squared matrix element
|MB0|² defined in Eq.2.11 and decreases exponentially with the interlayer
thickness TIL according to the decay constant κ of the wave-functions. Deriving
a quantitative expression for MB0 is admittedly very difficult: it is difficult
to determine how the periodic functions u0T(ρ, z) and u0B(ρ, z) spread out when
they decay in the barrier region and, furthermore, it is not even perfectly
clear what potential energy or Hamiltonian should be used to describe the
barrier region itself, an issue that has long been recognized and thoroughly
discussed in the literature.24 Our model essentially
circumvents these difficulties by resorting to the semi-empirical formulation of
the matrix element given by Eq.2.11, where MB0 is left as a parameter to be de-
termined and discussed by comparing to experiments.
It is also worth noting that in our calculations we have not explicitly dis-
cussed the effect of spin-orbit interaction in the bandstructure of 2D materials,
even if giant spin-orbit couplings have been reported in 2D transition-metal
dichalcogenides.26 If the energy separations between the spin-up and spin-
down bands are large, then the spin degeneracy in current calculations should
be one instead of two, which would affect the current magnitude but not its
dependence on the gate bias. Our calculations neglected also the possible mod-
ifications of band structure in the TMD materials produced by the vertical elec-
trical field, in fact we believe that in our device the electrical field in the 2D
layers is not strong enough to make such effects significant.27
The decay constant κ in the interlayer region may be estimated from the
electron affinity difference between the 2D layers and the interlayer material.13
Moreover, according to Eq.2.15 the constant κ determines the dependence of the
current on TIL, so that κ may be extracted by comparing to experiments dis-
cussing such a dependence, which, for example, have been recently reported
for the interlayer tunneling current in a graphene-hBN system.15
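The extraction of κ mentioned above can be sketched as follows, assuming that two currents are available that differ only in the interlayer thickness, so that all prefactors of Eq.2.15 cancel in their ratio. The function name `decay_constant` and the numbers in the example are hypothetical and purely synthetic; they are not experimental data.

```python
import math


def decay_constant(t_il_1, i_1, t_il_2, i_2):
    """Estimate kappa from two currents that differ only in T_IL.

    Uses I proportional to exp(-2 * kappa * T_IL) (Eq. 2.15), hence
    kappa = ln(i_1 / i_2) / (2 * (t_il_2 - t_il_1)).
    The result has the inverse unit of the thickness arguments.
    """
    return math.log(i_1 / i_2) / (2.0 * (t_il_2 - t_il_1))


if __name__ == "__main__":
    # Synthetic, purely illustrative data generated with kappa = 4 nm^-1
    kappa_true = 4.0
    thicknesses = (0.6, 0.9)  # nm
    currents = [1e-3 * math.exp(-2.0 * kappa_true * t) for t in thicknesses]
    print(decay_constant(thicknesses[0], currents[0],
                         thicknesses[1], currents[1]))
```

With more than two thicknesses, a linear fit of ln(I) versus TIL would be the natural generalization; its slope is −2κ.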
As for the spectrum SF(q) of the scattering potential, in our calculations we
utilize
\[
S_F(q) = \frac{\pi L_C^2}{(1 + q^2 L_C^2/2)^{3/2}}
\tag{2.16}
\]
where LC is the correlation length, which in our derivations has been assumed
large compared to the size of a unit cell. Eq.2.16 is consistent with an exponential
form for the autocorrelation function of FL(r),21 and a similar q dependence has
been recently employed to reproduce the experimentally observed line-width
of the resonance region in graphene interlayer tunneling transistors.14 Such a
functional form can be representative of phonon assisted tunneling, short-range
disorder,28 charged impurities29 or Moiré patterns that have been observed, for
instance, at the graphene-hBN interface.30–32 We will see in Sec.2.2.5 that the
LC has an influence on the gate voltage dependent current, which has a neat
physical interpretation, hence a comparison to experimental data will be very
informative for an estimate of LC.
2.2.3 Effects of Energy Broadening
According to Eq.2.4 and Eq.2.15 the tunneling current is simply zero when there
is no energy overlap between the conduction band in the top layer and the va-
lence band in the bottom layer, that is for ECT>EVB. In a real device, however, the
2D materials will inevitably have phonons, disorder, host impurities in the 2D
layer and be affected by the background impurities in the surrounding materi-
als, so that a finite broadening of the energy levels is expected to occur because
of the statistical potential fluctuations superimposed to the ideal crystal struc-
ture.33 The energy broadening in 3D semiconductors is known to lead to a tail of
the density of states (DoS) in the gap region, that has been also observed in op-
tical absorption measurements and denoted Urbach tail.34, 35 It is thus expected
that the finite energy broadening will be a fundamental limit to the abruptness
of the turn on characteristic attainable with the devices of this work, hence it is
important to include this effect in our model.
Energy broadening in the 2D systems can stem from the interaction with ran-
domly distributed impurities and disorder in the 2D layer or in the surrounding
materials,33, 36, 37 by scattering events induced by the interfaces,38 as well as by
other scattering sources. We recognize the fact that a detailed description of the
energy broadening is exceedingly complicated due to the many-body and sta-
tistical fluctuation effects,39 and thus resort to a relatively simple semi-classical
treatment.33, 36 We start by recalling that the density of states ρ0(E) for a 2D layer
with no energy broadening is
\[
\rho_0(E) = \frac{g_s g_v}{4\pi^2} \int_{\mathbf{k}} d\mathbf{k}\;
\delta[E - E(\mathbf{k})]
\tag{2.17}
\]
where E(k) denotes the energy relation with no broadening and gs, gv are spin
and valley degeneracy. In the presence of a randomly fluctuating potential V(r),
instead, the DoS can be written as33, 36
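\[
\begin{aligned}
\rho(E) &= \int_0^{\infty} dv\, \rho_0(v)\, P_v(E - v) \\
&= \frac{g_s g_v}{4\pi^2} \int_{\mathbf{k}} d\mathbf{k}
\left[ \int_0^{\infty} dv\, \delta[v - E(\mathbf{k})]\, P_v(E - v) \right] \\
&= \frac{g_s g_v}{4\pi^2} \int_{\mathbf{k}} d\mathbf{k}\; P_v[E - E(\mathbf{k})]
\end{aligned}
\tag{2.18}
\]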
where Pv(v) is the distribution function for V(r) (to be further discussed below),
and we have used the ρ0(E) definition in Eq.2.17 to go from the first to the second
equality.
Comparing Eq.2.18 to Eq.2.17, we see that the ρ(E) of the system in the
presence of broadening can be calculated by substituting the Dirac function in
Eq.2.17 with a finite width function Pv(v), which is the distribution function of
V(r) and it is thus normalized to one.
In order to include the energy broadening in our current calculations, we
rewrite the tunneling rate in Eq.2.4 as
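\[
\begin{aligned}
\frac{1}{\tau_{\mathbf{k}_T,\mathbf{k}_B}}
&= \frac{2\pi}{\hbar}\, |M(\mathbf{k}_T,\mathbf{k}_B)|^2\,
\delta[E_T(\mathbf{k}_T) - E_B(\mathbf{k}_B)] \\
&= \frac{2\pi}{\hbar}\, |M(\mathbf{k}_T,\mathbf{k}_B)|^2
\int_{-\infty}^{\infty} dE\; \delta[E - E_T(\mathbf{k}_T)]\,
\delta[E - E_B(\mathbf{k}_B)]
\end{aligned}
\tag{2.19}
\]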
and note that, consistently with Eq.2.18, the energy broadening can be included
in the current calculation by substituting δ[E −E(k)] with Pv[E −E(k)]. By doing
so the tunneling rate becomes
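\[
\frac{1}{\tau_{\mathbf{k}_T,\mathbf{k}_B}} \simeq \frac{2\pi}{\hbar}\,
|M(\mathbf{k}_T,\mathbf{k}_B)|^2\, S_E(E_T(\mathbf{k}_T) - E_B(\mathbf{k}_B))
\tag{2.20}
\]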
where we have introduced an energy broadening spectrum S E that is defined as
\[
S_E(E_T(\mathbf{k}_T) - E_B(\mathbf{k}_B)) = \int_{-\infty}^{\infty} dE\;
P_{vT}[E - E_T(\mathbf{k}_T)]\, P_{vB}[E - E_B(\mathbf{k}_B)]
\tag{2.21}
\]
where PvT and PvB are the potential distribution functions due to the presence
of the randomly fluctuating potential V(r) in the top and the bottom layer,
respectively.
On the basis of Eq.2.20, in our model for the tunneling current we accounted
for the energy broadening by using in all numerical calculations the broaden-
ing spectrum S E(ET (kT )−EB(kB)) defined in Eq.2.21 in place of δ[ET (kT )−EB(kB)].
More precisely we used a Gaussian potential distribution for both the top and
the bottom layer
\[
P_v(E - E_{k0}) = \frac{1}{\sqrt{\pi}\,\sigma}\, e^{-(E - E_{k0})^2/\sigma^2}
\tag{2.22}
\]
which has been derived by Evan O. Kane for a broadening induced by randomly
distributed impurities,36 in which case σ can be expressed in terms of the aver-
age impurity concentration.
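To make the effect of Eq.2.18 concrete, the sketch below applies the Gaussian distribution of Eq.2.22 to a step-like density of states, which is what a parabolic 2D band gives. It assumes a band edge `e_c` and a constant density `d0` above it (both hypothetical names and values), integrates Eq.2.18 with the trapezoidal rule, and compares the result with the error-function expression that the same integral yields in closed form.

```python
import math


def gaussian(x, sigma):
    """Distribution of Eq. 2.22, normalized to one."""
    return math.exp(-(x / sigma) ** 2) / (math.sqrt(math.pi) * sigma)


def broadened_dos(energy, e_c, sigma, d0=1.0, span=10.0, n=20000):
    """Eq. 2.18 by the trapezoidal rule for rho_0(v) = d0 when v >= e_c.

    rho_0 vanishes below e_c, so the integral over v starts at e_c; the
    upper limit is truncated span * sigma above max(e_c, energy).
    """
    lo = e_c
    hi = max(e_c, energy) + span * sigma
    h = (hi - lo) / n
    total = 0.5 * (gaussian(energy - lo, sigma) + gaussian(energy - hi, sigma))
    for i in range(1, n):
        total += gaussian(energy - (lo + i * h), sigma)
    return d0 * h * total


def broadened_dos_closed_form(energy, e_c, sigma, d0=1.0):
    """Same integral evaluated analytically: an error-function band tail."""
    return 0.5 * d0 * (1.0 + math.erf((energy - e_c) / sigma))


if __name__ == "__main__":
    e_c, sigma = 1.0, 0.02  # eV, hypothetical values
    for de in (-0.04, -0.02, 0.0, 0.02, 0.04):
        e = e_c + de
        print(f"{de:+.2f}  {broadened_dos(e, e_c, sigma):.6f}  "
              f"{broadened_dos_closed_form(e, e_c, sigma):.6f}")
```

The broadened density of states is non-zero below the band edge and decays there on the energy scale σ, which is the kind of gap tail that limits the abruptness of the turn-on.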
Quite interestingly, for the Gaussian spectrum in Eq.2.22 the overall broad-
ening spectrum S E defined in Eq.2.21 can be calculated analytically and reads
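\[
S_E(E_T(\mathbf{k}_T) - E_B(\mathbf{k}_B)) =
\frac{1}{\sqrt{\pi(\sigma_T^2 + \sigma_B^2)}}\,
e^{-(E_T(\mathbf{k}_T) - E_B(\mathbf{k}_B))^2/\sigma^2}
\tag{2.23}
\]
Hence SE also has a Gaussian spectrum, where σ² = σT² + σB² and σT, σB are the
broadening energies for the top and bottom 2D layer, respectively.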
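The closed form of Eq.2.23 can be checked numerically. The sketch below evaluates the overlap integral of Eq.2.21 with the trapezoidal rule for two Gaussians of the form of Eq.2.22 and compares it with Eq.2.23, taking σ² = σT² + σB² in the exponent, which is what the convolution of the two Gaussians gives. The function names and the broadening values are hypothetical.

```python
import math


def gaussian(x, sigma):
    """Distribution of Eq. 2.22, normalized to one."""
    return math.exp(-(x / sigma) ** 2) / (math.sqrt(math.pi) * sigma)


def s_e_numeric(delta_e, sigma_t, sigma_b, n=40001, span=12.0):
    """Eq. 2.21 by the trapezoidal rule, with E_B = 0 and E_T = delta_e."""
    half = span * max(sigma_t, sigma_b) + abs(delta_e)
    h = 2.0 * half / (n - 1)
    total = 0.0
    for i in range(n):
        e = -half + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * gaussian(e - delta_e, sigma_t) * gaussian(e, sigma_b)
    return h * total


def s_e_closed_form(delta_e, sigma_t, sigma_b):
    """Eq. 2.23 with sigma^2 = sigma_T^2 + sigma_B^2."""
    s2 = sigma_t ** 2 + sigma_b ** 2
    return math.exp(-delta_e ** 2 / s2) / math.sqrt(math.pi * s2)


if __name__ == "__main__":
    sigma_t, sigma_b = 0.01, 0.02  # eV, hypothetical broadening energies
    for delta in (0.0, 0.015, 0.05):
        print(delta, s_e_numeric(delta, sigma_t, sigma_b),
              s_e_closed_form(delta, sigma_t, sigma_b))
```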
2.2.4 Rotational Misalignment and Tunneling Between Inequivalent Extrema
The derivations in Sec.2.2.2 assumed that there is a perfect rotational alignment
between the top and the bottom layer and that the tunneling occurs between
equivalent extrema in the Brillouin zone, that is tunneling from a K to a K ex-
tremum (or from K′ to K′ extremum). We now denote by θ the angle expressing
a possible rotational misalignment between the two 2D layers (see Fig.2.3), and
still assume that the top 2D crystal has the same lattice constant a0 as the bottom
2D crystal. The principal coordinate system is taken as the crystal coordinate
system in the bottom layer, and we denote with r′, k′ the position and wave
vectors in the crystal coordinate system of the top layer (with r, k being the
vectors in the principal coordinate system). The wave-function in the top layer
has the form given in Eq.2.6 in terms of r′, k′, hence in order to calculate the
matrix element in the principal coordinate system we start by writing
r′ = R̂B→T r, k′ = R̂B→T k, where R̂B→T is the rotation matrix from the bottom to
the top coordinate system, with R̂T→B = [R̂B→T]ᵀ being the matrix going from the
top to the bottom coordinate system and Mᵀ denoting the transpose of the matrix
M. The rotation matrix can be written as
\[
\hat{R}_{T \to B} =
\begin{pmatrix}
\cos\theta & -\sin\theta \\
\sin\theta & \cos\theta
\end{pmatrix}
\tag{2.24}
\]
in terms of the rotational misalignment angle θ.
Figure 2.3: Sketch of a possible rotational misalignment between the top and
bottom 2D layer, x-y is the reference coordinate for the bottom 2D layer and x’-y’
is the reference coordinate for the top 2D layer. θ is the rotational misalignment
angle. We assume the top layer and the bottom layer have the same lattice
constant a0.
Consistently with Sec.2.2.2 we set uT,kT(r′, z)≈u0T(r′, z), uB,kB(r, z)≈u0B(r, z),
where u0T(r′, z), u0B(r, z) are the periodic parts of the Bloch function
respectively at the band edge in the top and bottom layer. We then denote with
K0T the wave-vector at the conduction band edge in the top layer (expressed in
the top layer coordinate system), and with K0B the wave-vector at the valence
band edge in the bottom layer (expressed in the principal coordinate system);
the derivations in this section account for the fact that K0T and K0B may be
inequivalent extrema (i.e. K0T ≠ K0B).
By expressing r′ and k′ in the principal coordinate system we can essentially
follow the derivations in Sec.2.2.2 and write the matrix element as
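\[
\begin{aligned}
M(\mathbf{k}_T,\mathbf{k}_B) \simeq \frac{1}{N_C} \sum_{j=1}^{N_C}
& e^{i(\mathbf{q}+\mathbf{Q}_D)\cdot\mathbf{r}_j}\, F_L(\mathbf{r}_j) \\
&\times \int_{\Omega_C} d\boldsymbol{\rho} \int dz\;
u^{\dagger}_{0T}(\hat{R}_{B \to T}(\mathbf{r}_j+\boldsymbol{\rho}),z)\,
V_B(z)\, u_{0B}(\mathbf{r}_j+\boldsymbol{\rho},z)
\end{aligned}
\tag{2.25}
\]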
where q=(kB−kT ) and we have introduced the vector
\[
\mathbf{Q}_D = \mathbf{K}_{0B} - \hat{R}_{T \to B}\, \mathbf{K}_{0T}
\tag{2.26}
\]
Eq.2.25 is an extension of Eq.2.10 that accounts for a possible rotational
misalignment between the 2D layers and also describes the tunneling between
inequivalent extrema. The vector QD is zero only for tunneling between
equivalent extrema (i.e. K0B=K0T) and for a perfect rotational alignment (i.e.
θ=0). Considering a case where all extrema are at the K point, we have
|K0B|=|K0T|=4π/3a0, then for K0B=K0T the magnitude of QD is simply given by
QD=(8π/3a0) sin(θ/2).13
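The expression for the magnitude of QD can be verified directly from Eqs.2.24 and 2.26. The sketch below places the K point along the kx axis (an arbitrary choice that does not affect the magnitude), rotates it with the matrix of Eq.2.24 and returns the difference vector; the lattice constant used in the example is a hypothetical value and the function names are introduced only for illustration.

```python
import math


def rotation_t_to_b(theta):
    """Rotation matrix of Eq. 2.24 as nested tuples."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))


def q_d(theta, a0):
    """Eq. 2.26 for equivalent extrema, K_0B = K_0T = (4*pi/(3*a0), 0)."""
    k0 = (4.0 * math.pi / (3.0 * a0), 0.0)
    r = rotation_t_to_b(theta)
    rotated = (r[0][0] * k0[0] + r[0][1] * k0[1],
               r[1][0] * k0[0] + r[1][1] * k0[1])
    return (k0[0] - rotated[0], k0[1] - rotated[1])


if __name__ == "__main__":
    a0 = 0.33  # nm, hypothetical lattice constant
    for degrees in (0.0, 1.0, 5.0, 30.0):
        theta = math.radians(degrees)
        qx, qy = q_d(theta, a0)
        analytic = 8.0 * math.pi / (3.0 * a0) * math.sin(theta / 2.0)
        print(degrees, math.hypot(qx, qy), analytic)
```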
One significant difference in Eq.2.25 compared to Eq.2.10 is that, in the pres-
ence of rotational misalignment, the top layer Bloch function u0T (RˆB→Tr, z) has
a different periodicity in the principal coordinate system from the bottom layer
u0B(r, z). Consequently the integral over the unit cells of the bottom 2D layer
is not the same in all unit cells, so that the derivations going from Eq.2.10 to
Eq.2.15 should be rewritten accounting for a matrix element MB0, j depending on
the unit cell j. Such an MB0, j could be formally included in the calculations by
defining a new scattering spectrum that includes not only the inherently ran-
dom fluctuations of the potential FL(r), but also the cell to cell variations of the
matrix element MB0, j. A second important difference of Eq.2.25 compared to
Eq.2.10 lies in the presence of QD in the exponential term multiplying FL(r j).
For the case of tunneling between inequivalent extrema and with a negligible
rotational misalignment (i.e. θ≃0), Eq.2.26 gives QD=K0B−K0T and the current
can be expressed as in Eq.2.15 but with the scattering spectrum evaluated at
|q+QD|. Since in this case the magnitude of QD is comparable to the size of the
Brillouin zone, the tunneling between inequivalent extrema is expected to be
substantially suppressed if the correlation length LC of the scattering spectrum
SF(q) is much larger than the lattice constant, as has been assumed in all the
derivations.
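The size of this suppression can be estimated from Eq.2.16 alone. The sketch below compares the spectrum at a wave-vector of Brillouin-zone size with its value at q=0, neglecting q against QD. As an assumption made only for this illustration, the magnitude of QD is set equal to |K|=4π/3a0 with a hypothetical lattice constant of 0.33 nm; the function names are likewise hypothetical.

```python
import math


def scattering_spectrum(q, l_c):
    """Eq. 2.16: S_F(q) = pi L_C^2 / (1 + q^2 L_C^2 / 2)^(3/2)."""
    return math.pi * l_c ** 2 / (1.0 + 0.5 * (q * l_c) ** 2) ** 1.5


def suppression(q_d, l_c):
    """S_F(Q_D) / S_F(0), i.e. the reduction of the spectrum at |q + Q_D|
    relative to q = 0 when q is negligible compared to Q_D."""
    return scattering_spectrum(q_d, l_c) / scattering_spectrum(0.0, l_c)


if __name__ == "__main__":
    a0 = 0.33                               # nm, hypothetical lattice constant
    q_d = 4.0 * math.pi / (3.0 * a0)        # nm^-1, assumed magnitude of Q_D
    for l_c in (0.33, 1.0, 3.0, 10.0):      # nm, hypothetical correlation lengths
        print(f"L_C = {l_c:5.2f} nm -> S_F(Q_D)/S_F(0) = {suppression(q_d, l_c):.3e}")
```

For large QD·LC the ratio falls off as (QD·LC)⁻³, so a correlation length of a few nanometers already reduces the spectrum by several orders of magnitude in this estimate.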
Quite interestingly, the derivations in this section suggest that a possible ro-
tational misalignment is expected to affect the absolute value of the tunneling
current but not to change significantly its dependence on the terminal voltages.
From a technological viewpoint, if the stack of the 2D materials is obtained
using a dry transfer method the rotational misalignment appears inevitable.14, 40
Experimental results have shown that, when the stack of 2D materials is
obtained by growing one material on top of the other, the top 2D and bottom
2D layer can have a fairly good angular alignment.41, 42
2.2.5 An Analytical Approximation for the Tunneling Current
The numerical calculations for the tunneling current obtained with the model
derived in Secs.2.2.2 and 2.2.3 will be presented in Sec.2.3, while in this section
we discuss an analytical, approximated expression for the tunneling current
which is mainly useful to gain an insight about the main physical and mate-
rial parameters affecting the current versus voltage characteristic of the Thin-
TFET. In order to derive an analytical current expression we start by assuming
a parabolic energy relation and write
\[
E_{VB}(\mathbf{k}_B) = E_{VB} - \frac{\hbar^2 k_B^2}{2 m_v}, \qquad
E_{CT}(\mathbf{k}_T) = E_{CT} + \frac{\hbar^2 k_T^2}{2 m_c}
\tag{2.27}
\]
where EVB(kB), ECT (kT ) are the energy relation respectively in the bottom layer
valence band and top layer conduction band and mv, mc the corresponding ef-
fective masses.
In the analytical derivations we neglect the energy broadening and start from
Eq.2.15, so that the model is essentially valid only in the on-state of the device,
that is for ECT<EVB.
We now focus on the integral over kB and kT in Eq.2.15 and first introduce
the polar coordinates kB=(kB,θB), kT=( kT ,θT ), and then use Eq.2.27 to convert
the integrals over kB, kT to integrals over respectively EB, ET , which leads to
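\[
\begin{aligned}
I &\propto \int_{\mathbf{k}_T} \int_{\mathbf{k}_B} d\mathbf{k}_T\, d\mathbf{k}_B\;
S_F(q)\, \delta(E_B(\mathbf{k}_B) - E_T(\mathbf{k}_T))\,(f_B - f_T) \\
&= \frac{m_c m_v}{\hbar^4} \int_0^{2\pi} d\theta_B \int_0^{2\pi} d\theta_T
\int_{E_{CT}}^{\infty} dE_T \int_{-\infty}^{E_{VB}} dE_B\;
S_F(q)\, \delta(E_B - E_T)\,(f_B - f_T)
\end{aligned}
\tag{2.28}
\]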
where the spectrum S F(q) is given by Eq.2.16 and thus depends only on the
magnitude q of q=kB−kT . Assuming ECT<EVB, the Dirac function reduces one of
the integrals over the energies and sets E=EB=ET , furthermore the magnitude
of q=kB−kT depends only on the angle θ=θB−θT , so that Eq.2.28 simplifies to
\[
I \propto \frac{m_c m_v (2\pi)}{\hbar^4} \int_0^{2\pi} d\theta
\int_{E_{CT}}^{E_{VB}} dE\; S_F(q)\,(f_B - f_T)
\tag{2.29}
\]
In the on-state condition (i.e. for ECT<EVB), the zero Kelvin approximation for
the Fermi-Dirac occupation functions fB, fT can be introduced to further simplify
Eq.2.29 to
\[
I \propto \frac{m_c m_v (2\pi)}{\hbar^4} \int_0^{2\pi} d\theta
\int_{E_{min}}^{E_{max}} dE\; S_F(q)
\tag{2.30}
\]
where Emin=max{ECT , EFT}, Emax=min{EVB, EFB} define the tunneling window
[Emax − Emin].
The evaluation of Eq.2.30 requires expressing q as a function of the energy E
inside the tunneling window and of the angle θ between kB and kT. By recalling
q² = kB² + kT² − 2kBkT cos(θ), we can use Eq.2.27 to write
\[
q^2 = \frac{2 m_v}{\hbar^2}(E_{VB} - E) + \frac{2 m_c}{\hbar^2}(E - E_{CT})
- \frac{4\sqrt{m_c m_v}}{\hbar^2} \sqrt{(E_{VB} - E)(E - E_{CT})}\, \cos(\theta)
\tag{2.31}
\]
with E=EB=ET . When Eq.2.31 is substituted in the spectrum SF(q) the resulting
integrals over E and θ in Eq.2.30 cannot be evaluated analytically. Therefore to
proceed further we now examine the maximum value taken by q2. The θ value
leading to the largest q2 is θ=pi, and the resulting q2 expression can be further
maximized with respect to the energy E varying in the tunneling window. The
energy leading to maximum q2 is
\[
E_M = \frac{E_{CT} + (m_c/m_v)\, E_{VB}}{1 + (m_c/m_v)}
\tag{2.32}
\]
and the corresponding $q_M^2$ is
$$
q_M^2 = \frac{2 (m_c + m_v)(E_{VB} - E_{CT})}{\hbar^2}
\tag{2.33}
$$
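The maximization leading to Eqs.2.32 and 2.33 can be checked numerically. The following is a minimal sketch, not part of the original derivation: it evaluates Eq.2.31 at θ=π on an energy grid and compares the grid maximum with the closed-form expressions. The 0.2 eV band overlap is a hypothetical value chosen only for illustration, the masses are the MoS2 conduction band and WTe2 valence band values of Table 2.1, and the function names are our own.

```python
import numpy as np

# hbar^2 / (2 m0) in eV nm^2, so that k^2 [nm^-2] = (m/m0) * E [eV] / HBAR2_OVER_2M0
HBAR2_OVER_2M0 = 0.0381

def q_squared(energy, theta, e_ct, e_vb, m_c, m_v):
    """Eq. 2.31: q^2 in nm^-2 for energies in eV and masses in units of m0."""
    k_b2 = m_v * (e_vb - energy) / HBAR2_OVER_2M0  # k_B^2 in the bottom (valence) layer
    k_t2 = m_c * (energy - e_ct) / HBAR2_OVER_2M0  # k_T^2 in the top (conduction) layer
    return k_b2 + k_t2 - 2.0 * np.sqrt(k_b2 * k_t2) * np.cos(theta)

def energy_of_max_q(e_ct, e_vb, m_c, m_v):
    """Eq. 2.32: energy at which q^2 is largest (for theta = pi)."""
    ratio = m_c / m_v
    return (e_ct + ratio * e_vb) / (1.0 + ratio)

def max_q_squared(e_ct, e_vb, m_c, m_v):
    """Eq. 2.33: largest q^2 in nm^-2."""
    return (m_c + m_v) * (e_vb - e_ct) / HBAR2_OVER_2M0

# Illustrative inputs: hypothetical 0.2 eV overlap, masses from Table 2.1.
E_CT, E_VB, M_C, M_V = 0.0, 0.2, 0.378, 0.319
grid = np.linspace(E_CT, E_VB, 2001)
q2_on_grid = q_squared(grid, np.pi, E_CT, E_VB, M_C, M_V)
print("grid argmax E (eV):", grid[np.argmax(q2_on_grid)],
      " Eq. 2.32:", energy_of_max_q(E_CT, E_VB, M_C, M_V))
print("grid max q^2 (nm^-2):", q2_on_grid.max(),
      " Eq. 2.33:", max_q_squared(E_CT, E_VB, M_C, M_V))
```

The two printed pairs should agree to within the grid resolution, which is a quick way to catch sign or mass-index slips when re-deriving Eq.2.31.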
When neither the top nor the bottom layer is degenerately doped, the tunneling
window is given by Emin=ECT and Emax=EVB, in which case the EM defined
in Eq.2.32 belongs to the tunneling window and the maximum value of $q^2$ is
given by Eq.2.33. If either the top or the bottom layer is degenerately doped,
the Fermi levels become the edges of the tunneling window and the maximum
value of $q^2$ may be smaller than in Eq.2.33.
A drastic simplification in the evaluation of Eq.2.30 is obtained for $q_M^2 \ll 1/L_c^2$,
in which case Eq.2.16 reduces to $S_F(q) \approx \pi L_c^2$, so that by substituting SF(q) in
Eq.2.29 and then in Eq.2.15 the expression for the current simplifies to
$$
I \simeq \frac{e\, g_v A\, (m_c m_v)}{\hbar^5}\, |M_{B0}|^2\, e^{-2\kappa T_{IL}}\, L_c^2\, (E_{max} - E_{min})
\tag{2.34}
$$
where we recall that Emin=max{ECT , EFT}, Emax=min{EVB, EFB} define the tun-
neling window.
It should be noted that Eq.2.34 is consistent with a complete loss of momentum
conservation, so that the current is simply proportional to the integral
over the tunneling window of the product of the density of states in the two 2D
layers. Since for a parabolic effective mass approximation the density of states
is energy independent, the current turns out to be simply proportional to the
width of the tunneling window. In physical terms, Eq.2.34 corresponds to a
situation where the scattering produces a complete momentum randomization
during the tunneling process.
As can be seen, as long as the top layer is not degenerate we have Emin=ECT
and the tunneling window widens with the increase of the top gate voltage
VTG; hence, according to Eq.2.34, the current is expected to increase linearly with
VTG. However, when the tunneling window increases to such an extent that
$q_M^2$ becomes comparable to or larger than $1/L_c^2$, part of the q values in the
integration of Eq.2.30 belong to the tail of the spectrum SF(q) defined in Eq.2.16,
and so their contribution to the current becomes progressively vanishing. The
corresponding physical picture is that, while the tunneling window increases,
the magnitude of the wave-vectors in the two 2D layers also increases, and con-
sequently the scattering can no longer provide momentum randomization for
all the possible wave-vectors involved in the tunneling process. Under these
circumstances the current is expected to first increase sub-linearly with VTG and
eventually saturate for large enough VTG values.
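The transition from linear to sub-linear growth described above can be illustrated by evaluating the double integral of Eq.2.30 numerically. The sketch below is only an illustration under stated assumptions: Eq.2.16 is not reproduced in this section, so the spectrum used here, πLc²/(1+q²Lc²/2)^(3/2), is an assumed form chosen because it reduces to the limit SF(q)≈πLc² quoted above. The correlation length of 1 nm, the window widths and the non-degenerate condition (Emin=ECT, Emax=EVB) are likewise hypothetical choices.

```python
import numpy as np

HBAR2_OVER_2M0 = 0.0381  # hbar^2 / (2 m0) in eV nm^2

def scattering_spectrum(q2, l_c):
    """ASSUMED form of S_F(q) in nm^2; it tends to pi*Lc^2 for q^2 << 1/Lc^2."""
    return np.pi * l_c**2 / (1.0 + 0.5 * q2 * l_c**2) ** 1.5

def window_integral(e_min, e_max, e_ct, e_vb, m_c, m_v, l_c, n_e=300, n_theta=300):
    """Double integral of Eq. 2.30 over theta and E (eV nm^2), midpoint rule."""
    if e_max <= e_min:
        return 0.0
    d_e = (e_max - e_min) / n_e
    d_th = 2.0 * np.pi / n_theta
    energy = e_min + (np.arange(n_e) + 0.5) * d_e
    theta = (np.arange(n_theta) + 0.5) * d_th
    ee, tt = np.meshgrid(energy, theta, indexing="ij")
    k_b2 = m_v * (e_vb - ee) / HBAR2_OVER_2M0
    k_t2 = m_c * (ee - e_ct) / HBAR2_OVER_2M0
    q2 = k_b2 + k_t2 - 2.0 * np.sqrt(k_b2 * k_t2) * np.cos(tt)  # Eq. 2.31
    return float(np.sum(scattering_spectrum(q2, l_c)) * d_e * d_th)

def randomized_limit(e_min, e_max, l_c):
    """Same integral with S_F = pi*Lc^2 everywhere (the limit behind Eq. 2.34)."""
    return 2.0 * np.pi * np.pi * l_c**2 * max(e_max - e_min, 0.0)

# Non-degenerate layers: the tunneling window is [E_CT, E_VB].
for window in (0.001, 0.05, 0.1, 0.3):  # eV, hypothetical values
    full = window_integral(0.0, window, 0.0, window, 0.378, 0.319, l_c=1.0)
    print(f"window = {window:5.3f} eV, ratio to Eq. 2.34 limit = "
          f"{full / randomized_limit(0.0, window, 1.0):.3f}")
```

Under these assumptions the printed ratio stays close to one for narrow windows and drops as the window widens, which is the sub-linear trend discussed in the text.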
2.3 Numerical Results for the Tunneling Current
2.3.1 Parabolic Band Approximation
The 2D materials used for the tunneling current calculations reported in this
paper are hexagonal monolayer MoS2 and WTe2. The band structures of
MoS2 and WTe2 have been calculated by using a density functional theory (DFT)
approach,18, 43 showing that these materials have a direct bandgap with the band
edges for both the valence and the conduction band residing at the K point in
the 2D Brillouin zone. Fig.2.4 shows that in a range of about 0.4 eV from the
band edges the DFT results can be fitted fairly well by using an energy relation
based on a simple parabolic effective mass approximation (dashed lines). Hence
the parabolic effective mass approximation appears adequate for the purposes
of this work, which is focused on a device concept for extremely small supply
voltages (< 0.5 V). The values for the effective masses inferred from the fitting of
the DFT calculations are tabulated in Tab.2.1 together with some other material
parameters relevant for the tunneling current calculations.
Figure 2.4: (a) Band structure for hexagonal monolayer MoS2 and (b) hexagonal
monolayer WTe2 as obtained using the DFT method described in the paper of C.
Gong et al.18 The dashed lines represent the analytical approximation obtained
with a parabolic effective mass model.
Material   Bandgap (eV)   Electron affinity χ (eV)   Conduction band effective mass (m0)   Valence band effective mass (m0)
MoS2       1.8            4.30                       0.378                                 0.461
WTe2       0.9            3.65                       0.235                                 0.319

Table 2.1: The band gaps, electron affinities and effective masses used for MoS2
and WTe2
2.3.2 Effects of Correlation Lengths, Interlayer Thicknesses
and Energy Broadening
In all current calculations we assume a top gate work function of 4.17 eV
(Aluminium) and back gate work function of 5.17 eV (p++ Silicon) and the top
and bottom oxide have an effective oxide thickness (EOT) of 1 nm (see Fig.2.1).
The top 2D layer consists of hexagonal monolayer MoS2 while the bottom 2D
layer is hexagonal monolayer WTe2. An n-type doping density of $10^{12}$ cm$^{-2}$ is
assumed in the top 2D layer and a p-type doping density of the same value in the
bottom 2D layer, both from fully ionized impurities, and the relative dielectric constant of the interlayer
material is set to 4.2 (e.g. boron nitride). The voltage VDS between the drain and the
source is set to 0.3 V and the back gate is grounded for all calculations, unless
otherwise stated.
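As a rough orientation on the starting band alignment of this material pair, the sketch below places the band edges of Table 2.1 on a common vacuum level. It assumes the simple electron-affinity rule with no doping, no gate bias and no interface dipole, which is a simplification of the self-consistent electrostatics used in the calculations. The function names are our own.

```python
# Table 2.1 values (eV)
MATERIALS = {
    "MoS2": {"chi": 4.30, "gap": 1.8},
    "WTe2": {"chi": 3.65, "gap": 0.9},
}

def band_edges(name):
    """Conduction and valence band edges (eV) relative to the vacuum level."""
    e_c = -MATERIALS[name]["chi"]
    e_v = e_c - MATERIALS[name]["gap"]
    return e_c, e_v

def flat_band_offset(top, bottom):
    """E_CT - E_VB (eV) for aligned vacuum levels; positive means no band overlap."""
    e_ct, _ = band_edges(top)
    _, e_vb = band_edges(bottom)
    return e_ct - e_vb

offset = flat_band_offset("MoS2", "WTe2")
print(f"E_CT - E_VB with aligned vacuum levels: {offset:.2f} eV")
```

Under this assumption ECT lies about 0.25 eV above EVB, so the bands are uncrossed until the gate lowers the top-layer band edge.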
As already pointed out in Sec.2.2.2, it is very difficult to derive a quantitative
expression for the tunneling matrix element MB0. However, the value of MB0
could be inferred from experimental data. In the absence of experimental data
for a vertical transistor consisting of transition-metal dichalcogenides, we have
set the value of MB0 to 0.01 eV in our calculations so that the resultant current
density is of the same order of magnitude as the experimental value reported
for the graphene/hBN system.44
In Fig. 2.5, the results of numerical calculations are shown for the band
alignment and the current density versus the top gate voltage VTG. Figure 2.5(a)
shows that the top gate voltage can effectively govern the band alignment in the
device and, in particular, the crossing and uncrossing between the conduction
band minimum ECT in the top layer and the valence band maximum EVB in the
bottom layer, which discriminates between the on and off state of the transistor.
Figure 2.5: Numerical results of (a) band alignment versus the top gate voltage
VTG and (b) tunnel current density versus the top gate voltage VTG for different
values of the correlation length LC. The parameters used in (b) are: matrix
element MB0 = 0.01 eV; decay constant of the wave-function in the interlayer
κ = 3.8 nm−1; energy broadening σ = 10 meV; and interlayer thickness
TIL = 0.6 nm (e.g. 2 atomic layers of BN). VBG = 0 and VDS = 0.3 V in both (a) and
(b).
The IDS versus VTG characteristic in Fig.2.5(b) can be roughly divided into
three different regions: the sub-threshold region, the linear region and the saturation
region. The sub-threshold region corresponds to the condition ECT>EVB (see also
Fig.2.5(a)), where the very steep current dependence on VTG is illustrated better
in Fig.2.6 and will be discussed below.
In the second region IDS exhibits an approximately linear dependence on
VTG; in fact, the current is roughly proportional to the energy tunneling window,
as discussed in Sec.2.2.5 and predicted by Eq.2.34, because the tunneling window
is small enough that the condition $q_M^2 \ll 1/L_c^2$ is fulfilled. In this region IDS
is proportional to the long-wavelength part of the scattering spectrum (i.e. small
q values), hence the current increases with Lc, as expected from Eq.2.34. The
super-linear behavior of IDS at small VTG values observed in Fig.2.5(b) is due to
the tail of the Fermi occupation function in the top layer. When VTG is increased
above approximately 0.5 V, the current in Fig.2.5(b) enters the saturation region,
where the increase of IDS with VTG slows down because of the decay of the scattering
spectrum SF(q) for q values larger than 1/Lc (see Eq.2.16).
Figure 2.6: Numerical calculations for: (a) current density versus VTG with several
interlayer thicknesses; (b) current density versus VTG with different values
of energy broadening σ. The inset shows that SS increases with σ, and an SS
value of 60 mV/dec corresponds to an energy broadening as high as 40 meV. The
matrix element is MB0 = 0.01 eV; the decay constant of the wave-function in the
interlayer is κ = 3.8 nm−1. In (a) the energy broadening is σ = 10 meV. In (b) the
interlayer thickness is TIL = 0.6 nm (e.g. 2 atomic layers of BN). VBG = 0 and
VDS = 0.3 V in both (a) and (b).
In Fig.2.6 we analyze the I-V curves for different interlayer thicknesses TIL
and broadening energies σ; in all cases an average inverse sub-threshold slope
(SS) is extracted in the IDS range from 10−5 to 10−2 µA/µm2. Figure 2.6(a) shows
that the tunneling current increases exponentially with decreasing TIL, and the decay
constant κ=3.8 nm−1 employed in our calculations results in a dependence
on TIL that is roughly consistent with the dependence experimentally reported
in graphene based interlayer tunneling devices.15 The threshold voltages are
also shifted to lower values by increasing TIL. It can be seen that the impact of TIL
on SS is overall weak and a very steep sub-threshold region is obtained
for all the TIL values examined in Fig.2.6(a). This is because, in order for the
Thin-TFET to obtain a small SS, it is absolutely necessary that VTG has a tight
control on the electrostatic potential in the top semiconductor layer, but has a
negligible influence on the potential of the bottom semiconductor layer. The SS
is thus insensitive to TIL as long as TIL does not change the control of VTG on
such potentials. In short, for Thin-TFETs, a larger interlayer thickness reduces
substantially the current density, but does not deteriorate SS.
Figure 2.6(b) shows that according to the model employed in our calcu-
lations SS is mainly governed by the parameter σ of the energy broadening
(Eq.2.22). This result is expected, as already mentioned in Sec.2.2.3, since in our
model the energy broadening is the physical factor setting the minimum value
for SS and the IDS versus VTG approaches a step-like curve when σ is zero due
to the step-like DoS of these 2D semiconductors.45 More specifically, Fig.2.6(b)
shows that according to our calculations the Thin-TFET may be able to provide
an SS below the 60mV/dec (i.e. the limit of conventional MOSFETs at room
temperature), even for fairly large broadening energies up to about 40 meV. It
is worth noting here that energy broadening and band tails have already been
recognized as a fundamental limit to the SS of band-to-band tunneling transistors,46
and are not a specific concern of the Thin-TFET. As already mentioned
in Sec.2.2.3, the band tails in 3D semiconductors have been investigated by using
thermal measurements and are described in terms of the so-called Urbach
parameter E0.34, 35 Values for E0 comparable to the room temperature thermal
energy, kBT ≃ 26 meV, have been reported for GaAs and InP.47, 48 Our results suggest
that energy broadening and band tails in 2D materials play a critical role in
the minimum SS attainable by Thin-TFETs; at the time of writing we are not
aware of experimental data reported for band tails in monolayers of transition-metal
dichalcogenides.
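For reference, the 60 mV/dec figure used above as the limit of conventional MOSFETs follows from the thermionic expression SS = ln(10) kBT/q. The short sketch below evaluates it, together with the thermal energy kBT that the Urbach parameter is compared against. The function names are our own and the expression is the standard textbook limit, not a result of the model in this chapter.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K

def thermal_energy_mev(temperature_k):
    """k_B T in meV."""
    return 1e3 * K_B_EV * temperature_k

def thermionic_ss_limit(temperature_k):
    """ln(10) k_B T / q in mV/dec, the limit for thermionic injection."""
    return 1e3 * math.log(10.0) * K_B_EV * temperature_k

print(f"k_B T at 300 K: {thermal_energy_mev(300.0):.1f} meV")
print(f"Thermionic SS limit at 300 K: {thermionic_ss_limit(300.0):.1f} mV/dec")
```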
2.4 N-type and P-type Thin-TFETs
Out of the various 2D semiconductors studied by density functional theory calculations18
and experimental efforts, we chose the trigonal prismatic coordination
monolayer (2H) WSe2 and the octahedral coordination (CdI2 crystal structure)
monolayer (1T) SnSe2 (see Fig.2.7). The WSe2/SnSe2 stacked-monolayer heterojunction
can potentially form a near broken band alignment, which reduces the voltage
drop in the van der Waals gap in the on-state condition.6 Since there is no
experimental band alignment reported for monolayer WSe2 and SnSe2, the band
alignment of the WSe2/SnSe2 system used in this work is based on the existing
experimental results for multilayer WSe2 and SnSe2,49–51 while their approximate
effective masses are based on the DFT results for monolayer WSe2 and SnSe2 18
(see Fig.2.7).
Following the complex band method,52 we assume the effective barrier
height EB of the van der Waals gap is 1 eV and the electron mass in the van der
Waals gap is the free electron mass m0; thus the decay constant is
$\kappa = \sqrt{2 m_0 E_B}/\hbar = 5.12$ nm$^{-1}$. In our model, we set the scattering correlation length LC in SF(q)
to LC=10 nm, which is also consistent with the value employed in Ref.14; the energy
broadening σ is set to be 10 meV.
Figure 2.7: An example to realize both n-type and p-type Thin-TFETs using one
pair of 2D semiconductors (2H-WSe2 and 1T-SnSe2) with near broken gap band
alignment. For the n-type Thin-TFET, SnSe2 is the top (i.e. drain) 2D layer and
WSe2 is the bottom (i.e. source) 2D layer, with the top and back gates labeled
as n-type in blue. For the p-type Thin-TFET, WSe2 is the top (i.e.
drain) 2D layer and SnSe2 is the bottom (i.e. source) 2D layer, with the top
and back gates labeled as p-type in red. Band gaps, electron affinities and effective
masses are shown for WSe2 and SnSe2. The n-type and p-type metal work functions
are tuned to give symmetric threshold voltages for the n-type and p-type
Thin-TFETs.
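The decay constant quoted for the van der Waals gap can be reproduced directly from κ = √(2 m0 EB)/ħ. The sketch below does so with CODATA constants and also evaluates the attenuation factor exp(−2κT) that multiplies the current. The 0.35 nm thickness used in the printout is the van der Waals gap value adopted later in this section; the function names are our own.

```python
import math

HBAR = 1.054571817e-34      # J s
M0 = 9.1093837015e-31       # free electron mass, kg
E_CHARGE = 1.602176634e-19  # C (also J per eV)

def decay_constant_per_nm(barrier_ev, mass_ratio=1.0):
    """kappa = sqrt(2 m E_B) / hbar in nm^-1, with m = mass_ratio * m0."""
    kappa_per_m = math.sqrt(2.0 * mass_ratio * M0 * barrier_ev * E_CHARGE) / HBAR
    return kappa_per_m * 1e-9

def tunneling_attenuation(kappa_per_nm, thickness_nm):
    """exp(-2 kappa T), the thickness-dependent factor in the tunneling current."""
    return math.exp(-2.0 * kappa_per_nm * thickness_nm)

kappa = decay_constant_per_nm(1.0)
print(f"kappa for a 1 eV barrier and m = m0: {kappa:.2f} nm^-1")
print(f"exp(-2 kappa T) at T = 0.35 nm: {tunneling_attenuation(kappa, 0.35):.4f}")
```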
MB0 in Eq.2.15 is directly related to the interlayer charge transfer time τ across
the van der Waals gap, which can be written as53
$$
\tau^{-1} = \frac{2\pi}{\hbar}\, \rho\, |M_{B0}|^2\, e^{-2\kappa T_{vdW}}\, S_F(q)
\tag{2.35}
$$
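where $\rho = g_v m^*/(\pi\hbar^2)$ is the density of states (DOS). Recall that the tunneling current
can be written as:
$$
J_T = \frac{g_v\, e\, |M_{B0}|^2 A}{4\pi^3 \hbar}\, e^{-2\kappa T_{vdW}} \times \int_{\mathbf{k}_T}\int_{\mathbf{k}_B} d\mathbf{k}_T\, d\mathbf{k}_B\; S_F(q)\, S_E(E_B - E_T)\, (f_B - f_T)
\tag{2.36}
$$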
As can be seen from Eq.2.35 and the expression of the scattering potential
spectrum SF(q) (given after Eq.2.36), due to scattering in our model, τ increases
with increasing q, which is the magnitude of the wave-vector difference across
the van der Waals gap defined as q=|kT−kB|. In a recent experiment, a charge
transfer time of 25 fs has been observed across the van der Waals gap in
a stacked-monolayer MoS2/WS2 heterostructure, which, according to Eq.2.35,
gives MB0 ∼0.02 eV when q=0. We recognize that the charge transfer time
might be different for different 2D heterojunctions; nevertheless, this experimentally
determined charge transfer time is a reasonable value to use for a
first-pass estimate. Thus, we choose MB0=0.02 eV in all following simulations.
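The order of magnitude of MB0 implied by Eq.2.35 can be checked with a few lines of arithmetic. The sketch below inverts Eq.2.35 at q=0 using SF(0)=πLc², the small-q limit quoted in Sec.2.2.5. The inputs are assumptions for illustration only: the text does not state which effective mass, valley degeneracy or gap thickness were used for the MoS2/WS2 estimate, so m* = 0.3 m0 and gv = 1 are hypothetical, while κ, TvdW and Lc are borrowed from the WSe2/SnSe2 parameter set of this section.

```python
import math

HBAR_EV_FS = 0.6582119569  # hbar in eV fs
HBAR2_OVER_2M0 = 0.0381    # hbar^2 / (2 m0) in eV nm^2

def dos_2d(m_eff, g_v):
    """rho = g_v m* / (pi hbar^2) in eV^-1 nm^-2, with m_eff in units of m0."""
    return g_v * m_eff / (2.0 * math.pi * HBAR2_OVER_2M0)

def matrix_element_from_tau(tau_fs, m_eff, g_v, kappa, t_vdw, l_c):
    """Invert Eq. 2.35 at q = 0, taking S_F(0) = pi Lc^2; returns M_B0 in eV."""
    s_f0 = math.pi * l_c**2                      # nm^2
    attenuation = math.exp(-2.0 * kappa * t_vdw)
    m_sq = (HBAR_EV_FS / tau_fs) / (2.0 * math.pi * dos_2d(m_eff, g_v) * attenuation * s_f0)
    return math.sqrt(m_sq)

def transfer_time_fs(m_b0, m_eff, g_v, kappa, t_vdw, l_c):
    """Eq. 2.35 at q = 0, solved for tau in fs."""
    s_f0 = math.pi * l_c**2
    rate = 2.0 * math.pi * dos_2d(m_eff, g_v) * m_b0**2 * math.exp(-2.0 * kappa * t_vdw) * s_f0
    return HBAR_EV_FS / rate

# Hypothetical inputs: m* = 0.3 m0, g_v = 1; kappa, T_vdW, Lc as used in this section.
m_b0 = matrix_element_from_tau(25.0, m_eff=0.3, g_v=1, kappa=5.12, t_vdw=0.35, l_c=10.0)
print(f"M_B0 implied by tau = 25 fs: {m_b0:.3f} eV")
```

With these illustrative inputs the result falls in the 10^-2 eV range, the same order as the value adopted in the text; different choices of m*, gv or TvdW shift it by factors of order one.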
Throughout this work, the gate length is set to 15 nm, and the back gate and
source are grounded. An effective oxide thickness (EOT) of 1 nm is used for
both the top and back oxide, which gives a top (back) oxide capacitance CTG
(CBG) of 0.518 fF/µm. The thickness of the van der Waals gap is set to 0.35 nm,
unless specified otherwise. We assume the relative dielectric constant of the van
der Waals gap is 1.0, therefore the van der Waals gap capacitance CvdW is 0.38
fF/µm. The external total access resistances are considered after the intrinsic
device performance is discussed first (Figs.2.8 and 2.9).
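The capacitance values quoted above follow from the parallel-plate expression C = ε0 εr LG / t per unit gate width. The sketch below reproduces them, assuming that the EOT is referred to SiO2 with a relative permittivity of 3.9 (the usual convention, not stated explicitly in the text). The function name is our own.

```python
EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m

def capacitance_per_width(rel_permittivity, thickness_nm, gate_length_nm):
    """Parallel-plate capacitance per unit gate width, in fF/um."""
    per_area_f_m2 = rel_permittivity * EPS0 / (thickness_nm * 1e-9)
    per_area_ff_um2 = per_area_f_m2 * 1e3          # 1 F/m^2 = 1e3 fF/um^2
    return per_area_ff_um2 * gate_length_nm * 1e-3  # gate length in um

# EOT of 1 nm referred to SiO2 (eps_r = 3.9, assumed convention), L_G = 15 nm
c_tg = capacitance_per_width(3.9, 1.0, 15.0)
# van der Waals gap: eps_r = 1.0, thickness 0.35 nm
c_vdw = capacitance_per_width(1.0, 0.35, 15.0)
print(f"C_TG = C_BG = {c_tg:.3f} fF/um, C_vdW = {c_vdw:.2f} fF/um")
```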
[Figure 2.8 plot area: six panels for the pTFET (VDS = −0.4 V) and nTFET (VDS = 0.4 V) showing energy (eV, with EFB = 0 as reference) and the tunnel window, current density (µA/µm, average SS = 14 mV/dec), transconductance gm (mS/µm), charge concentration (cm−2) and quantum capacitance (fF/µm) versus VTG (V), and current density (µA/µm) versus VDS (V).]
Figure 2.8: For the n-type and p-type Thin-TFETs shown in Fig. 2.7: (a) the
band alignment versus VTG; (b) the current density versus VTG, where the average SS is
calculated from 10−3 µA/µm to 10 µA/µm; (c) the current density versus VDS at
various VTG; (d) the transconductance versus VTG; (e) the carrier concentration
in the top and bottom 2D layers versus VTG at various VDS; (f) the quantum
capacitances of the top and bottom 2D layers versus VTG at various VDS.
The example material systems for n-type and p-type Thin-TFETs based on
the stacked-monolayer WSe2 and SnSe2 are shown in Fig.2.7. The metal work
functions are tuned to obtain a symmetric threshold voltage for the n-type and
the p-type Thin-TFET. Figure 2.8(a) shows the band alignment versus VTG. VTG
can effectively control the vertical band alignment in the device by controlling
primarily the band edge of the top (i.e. drain) layer while having a weak effect
on the band edge of the bottom (i.e. source) layer, so that a tunneling window
is modulated. Fig.2.8(b) shows ID versus VTG transfer curves with a very compelling
SS of ∼14 mV/dec, averaged from 10−3 µA/µm to 10 µA/µm.
The family of ID versus VDS curves is shown in Fig.2.8(c). ID saturates
for VDS>∼0.2 V. A superlinear onset is also observed, and the so-called VDS
threshold voltage increases at lower VTG.54 A peak transconductance of ∼4
mS/µm is observed around VTG=0.12 V (Fig.2.8(d)), which is much larger than
the peak transconductance of ∼0.8 mS/µm reported for a 10 nm FinFET.55 In Fig.2.8(e),
the top gate changes the carrier concentration of the top 2D semiconductor
much faster than that of the bottom 2D semiconductor under different VDS. The
ability to efficiently change a hole (electron) concentration in the top 2D semi-
conductor while keeping a high electron (hole) concentration in the bottom
2D semiconductor is vital to achieve good electrostatics control of these Thin-
TFETs. The quantum capacitance associated with the top and bottom semicon-
ductor layers can be expressed as Eq.2.37:
$$
C_{Q,T(B)} = -\left[ \frac{e\,\partial p_{T(B)}}{\partial \phi_{p,T(B)}} + \frac{e\,\partial n_{T(B)}}{\partial \phi_{n,T(B)}} \right]
\tag{2.37}
$$
The quantum capacitances are plotted in Fig.2.8(f) under various bias conditions.
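To give a feeling for the magnitudes involved, the sketch below evaluates the electron contribution to the quantum capacitance for a single parabolic 2D band with Fermi-Dirac statistics. This is a textbook simplification rather than the full expression of Eq.2.37: only the magnitude of the electron term is computed, and the effective mass of 0.3 m0, gv = 1 and T = 300 K are hypothetical inputs. The function names are our own.

```python
import numpy as np

K_B_EV = 8.617333262e-5      # Boltzmann constant, eV/K
HBAR2_OVER_2M0 = 0.0381      # hbar^2 / (2 m0) in eV nm^2
E_CHARGE = 1.602176634e-19   # C

def dos_2d(m_eff, g_v=1):
    """g_v m* / (pi hbar^2) in eV^-1 nm^-2, with m_eff in units of m0."""
    return g_v * m_eff / (2.0 * np.pi * HBAR2_OVER_2M0)

def electron_density(ef_minus_ec, m_eff, g_v=1, temperature=300.0):
    """2D electron density in nm^-2 for a parabolic band (Fermi-Dirac)."""
    kt = K_B_EV * temperature
    # log(1 + exp(eta)) evaluated in a numerically stable way
    return dos_2d(m_eff, g_v) * kt * np.logaddexp(0.0, ef_minus_ec / kt)

def quantum_capacitance(ef_minus_ec, m_eff, g_v=1, temperature=300.0):
    """Magnitude of e * dn/d(E_F/e) in fF/um^2."""
    kt = K_B_EV * temperature
    occupancy = 1.0 / (1.0 + np.exp(-ef_minus_ec / kt))
    return E_CHARGE * dos_2d(m_eff, g_v) * occupancy * 1e21  # F/nm^2 -> fF/um^2

for delta in (-0.2, 0.0, 0.2):  # E_F - E_C in eV
    print(f"E_F - E_C = {delta:+.1f} eV: C_Q = {quantum_capacitance(delta, 0.3):.3g} fF/um^2")
```

Multiplying by the 15 nm gate length converts these values to the per-width units (fF/µm) used in Fig.2.8(f).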
[Figure 2.9 plot area: (a) current density (µA/µm) versus VTG (V) at VDS = −0.4 V for TvdW = 3.0 Å to 6.0 Å in steps of 0.5 Å; (b) subthreshold swing SS (mV/dec) versus current density (µA/µm) at VDS = −0.4 V for a uniform TvdW = 3.5 Å and for TvdW evenly distributed over 3.0 to 4.0 Å, 3.0 to 5.0 Å and 3.0 to 6.0 Å, with the SS = 60 mV/dec level marked.]
Figure 2.9: Effect of van der Waals gap thickness variation on a p-type Thin-
TFET: (a) tunnel current density versus VTG for different van der Waals gap
thicknesses TvdW ; (b) differential SS versus current density assuming an evenly
distributed van der Waals gap thickness TvdW in the specified range.
2.4.1 Effects of Non-uniform van der Waals Gap Thickness and
Access Resistance
Due to the nature of van der Waals bonds, the van der Waals gap thickness
is subject to intercalation of atoms/ions, interlayer rotational misalignment between
2D layers, etc. For instance, in mechanically stacked bilayer molybdenum
disulfide (MoS2) with an interlayer twist, a maximum variation of 0.059 nm56
was experimentally verified in the van der Waals gap thickness [22]. Surface
roughening due to ripples in 2D crystals or roughness of the underlying substrates
can also introduce van der Waals gap variations.57 Meanwhile, the tunneling
probability is very sensitive to the tunneling distance, namely the van der
Waals gap thickness in a Thin-TFET, which makes it important to investigate the
effects of a non-uniform van der Waals gap thickness. First, the Thin-TFET I-V curves
are calculated by varying the van der Waals gap thickness TvdW from 0.3 nm to
0.6 nm in steps of 0.05 nm (which is roughly half of the Se covalent radius58).
The results are shown in Fig.2.9(a) for a p-type Thin-TFET: the on current den-
sity decreases and the threshold voltage moves towards 0 when increasing the
TvdW . We note that, as long as the TvdW is uniform, the SS remains as steep as
∼14 mV/dec. However, for a non-uniform TvdW , SS will degrade. To estimate its
impact, an evenly distributed TvdW over several ranges is used in the calculated
differential SS shown in Fig.2.9(b). For example, for a 2D heterojunction with an
evenly distributed TvdW from 0.3 nm to 0.5 nm and a step of 0.05 nm, we take the
corresponding ID-VTG curve for each TvdW (i.e. 0.3 nm, 0.35 nm, 0.4 nm, 0.45 nm,
and 0.5 nm) shown in Fig.2.9(a) and average them over the TvdW range to obtain
the overall ID-VTG curve for the calculation of SS. Fig.2.9(b) shows that up to 0.1
nm variation in TvdW is tolerable, resulting in a sub-60 mV/dec SS over a decent
current window (up to 50 µA/µm). Depending on how Thin-TFETs are fabricated,
the TvdW non-uniformity may have different distributions. Our first look
at its impact in this work highlights the importance of precisely controlling TvdW.
A finite total access resistance has a critical impact on ultrascaled transistors.
To date, how to minimize the total access resistance in 2D crystal based devices
remains an open question. In Fig. 2.10, we show its effects on the Thin-TFET
by assuming several values for the total access resistance RC. At a sufficiently
high |VDS | of 0.4 V, maximum ID is almost the same for a RC of up to 320 Ωµm;
a higher RC decreases maximum ID appreciably. Understandably, a lower RC is
necessary for a lower VDD. In an ideal 2D conductor, the quantum limit of the
total access resistance is inversely proportional to the square root of the carrier
concentration; e.g. ∼52 Ωµm for a carrier concentration of 1013 cm−2.59 Thus the
access region of 2D semiconductors can be degenerately doped to minimize RC.
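The two statements above (inverse square-root scaling of the quantum-limited access resistance, and the need for a lower RC at lower VDD) can be made concrete with a short calculation. The sketch below scales the ∼52 Ωµm value quoted for 10^13 cm−2 to other carrier concentrations and estimates the ohmic drop RC·ID. The 300 µA/µm current is a hypothetical round number of the order of the on-currents in Fig.2.10, and the function names are our own.

```python
import math

def quantum_limit_rc(carrier_cm2, r_ref=52.0, n_ref=1e13):
    """Access resistance (ohm um) scaled as 1/sqrt(n) from the quoted reference point."""
    return r_ref * math.sqrt(n_ref / carrier_cm2)

def access_voltage_drop_mv(current_ua_per_um, rc_ohm_um):
    """Ohmic drop R_C * I_D in mV (uA/um times ohm um gives uV)."""
    return current_ua_per_um * rc_ohm_um * 1e-3

for n in (2.5e12, 1e13, 4e13):
    print(f"n = {n:.1e} cm^-2: quantum-limit R_C = {quantum_limit_rc(n):.0f} ohm um")
for rc in (52.0, 320.0, 640.0):
    print(f"R_C = {rc:.0f} ohm um: drop at 300 uA/um = "
          f"{access_voltage_drop_mv(300.0, rc):.0f} mV")
```

At a few hundred µA/µm, an RC of 320 Ωµm already drops on the order of 0.1 V, which is a large fraction of a 0.2 V supply.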
[Figure 2.10 plot area: current density (µA/µm) from 0 to 400 for RC = 0, 52, 320 and 640 Ωµm, (a) versus VTG (V) at VDS = −0.4 V and (b) versus VDS (V) at VTG = −0.4 V.]
Figure 2.10: Effect of total access resistance on a p-type Thin-TFET: (a) ID versus
VTG and (b) ID versus VDS with various total access resistance RC values.
Figure 2.11: Capacitance network model of the Thin-TFET
2.4.2 Capacitance Evaluation
The gate-to-drain and gate-to-source capacitances (i.e. CGD, CGS ) can be readily
calculated from the capacitance network shown in Fig.2.11.
The quantum capacitances CQ,T (B) of the top (bottom) 2D semiconductor are
defined in Eq.2.37 and indicated as the red non-linear capacitances in Fig.2.11.
First we define CS as:
$$
\frac{1}{C_S} \equiv \frac{1}{C_{vdW}} + \frac{1}{C_{Q,B} + C_{BG}}
\tag{2.38}
$$
Then, CGD and CGS can be written as Eqs.2.39:
$$
C_{GS} = \frac{C_{TG}\, C_S}{C_{TG} + C_{Q,T} + C_S}, \qquad
C_{GD} = \frac{C_{TG}\, C_{Q,T}}{C_{TG} + C_{Q,T} + C_S}
\tag{2.39}
$$
Due to the symmetry in these p-type and n-type Thin-TFETs as well as the
similar hole and electron effective mass in these 2D crystals, we expect similar
C-V characteristics for the p-type and n-type Thin-TFETs. In Fig.2.12 we plot
the calculated C-V curves for the p-type Thin-TFETs shown in Fig.2.7. In the
linear region of the ID-VDS family of curves, CGD is significant, where the drain
is coupled with the top gate to modulate the tunnel current. From the linear
region to the saturation region, CGD drops to be near zero while CGS increases to
its maximum. It is worth noting that, for a given gate oxide EOT and thus gate
oxide capacitance, the Thin-TFET capacitance is smaller than that of the CMOS
and III-V TFETs benchmarked in Sec. 2.4.3, which stems from the serially connected
capacitance components shown in Fig.2.11. The capacitance model is useful
for implementing the Thin-TFET into circuit simulations.
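Eqs.2.38 and 2.39 translate directly into a small helper of the kind a circuit-level model would call. The sketch below is a minimal illustration with a function name of our own choosing: the fixed capacitances are the values quoted in Sec.2.4, while the quantum capacitance values passed in the example (in fF/µm) are hypothetical, since in the device they depend on bias.

```python
def gate_capacitances(c_tg, c_bg, c_vdw, c_qt, c_qb):
    """Return (C_GS, C_GD) from Eqs. 2.38 and 2.39; all inputs in the same units."""
    c_s = 1.0 / (1.0 / c_vdw + 1.0 / (c_qb + c_bg))  # Eq. 2.38
    denom = c_tg + c_qt + c_s
    c_gs = c_tg * c_s / denom                          # Eq. 2.39
    c_gd = c_tg * c_qt / denom                         # Eq. 2.39
    return c_gs, c_gd

# Fixed capacitances from Sec. 2.4 (fF/um); quantum capacitances are hypothetical.
C_TG, C_BG, C_VDW = 0.518, 0.518, 0.38
for c_qt in (0.0, 0.1, 1.0):
    c_gs, c_gd = gate_capacitances(C_TG, C_BG, C_VDW, c_qt, c_qb=3.0)
    print(f"C_Q,T = {c_qt:.1f}: C_GS = {c_gs:.3f} fF/um, C_GD = {c_gd:.3f} fF/um")
```

Both capacitances stay below CTG because CTG always appears in series with the rest of the network, which is the origin of the low Thin-TFET capacitance noted in the text.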
[Figure 2.12 plot area: capacitance (fF/µm) from 0 to 0.4, with CGS in red and CGD in blue, (a) versus VDS (V) at VTG = −0.2, −0.3, −0.4 V and (b) versus VTG (V) at VDS = −0.2, −0.3, −0.4 V.]
Figure 2.12: For the p-type Thin-TFET, (a) CGD and CGS versus VDS at VTG=−0.2,
−0.3, −0.4 V; (b) CGD and CGS versus VTG at VDS=−0.2, −0.3, −0.4 V.
2.4.3 Benchmarking
The Semiconductor Research Corporation (SRC) Nanoelectronic Research Initiative
(NRI) has supported research on beyond-CMOS devices as reported by
Bernstein et al.60 As part of the initiative, the projected performance of the
beyond-CMOS devices and the CMOS of the same technology node was compared,
i.e. benchmarked. The benchmarking activity has been continued by Nikonov
and Young.61, 62 Since the Thin-TFET was proposed by us primarily under the support of
SRC STARnet, we participated in the recent benchmarking using the Nikonov
and Young (N&Y) methodology.
The N&Y methodology uses basic device performance parameters such as
operating voltage (VDD = |VDS |), saturation current (IDsat), and average gate ca-
pacitance (CG,avg), to project logic switching energy and delay. The change of
the net charge under the gate (∆Q=q∆ns) when VTG switches from 0 to VDD is
the sum of the change of the net charge in the top 2D semiconductor and the
bottom 2D semiconductor. The average gate capacitance (CG,avg) is defined as
∆Q/VDD. Here we take the p-type Thin-TFET as an example: IDsat and CG,avg are
provided in Table 2.2 for VDD values of 0.2, 0.3, and 0.4 V and total
access resistance RC values of 52 and 320 Ωµm. The device parameters for High
Performance (HP) CMOS, Low Power (LP) CMOS, InAs Homojunction TFET
(HomJTFET) and InAs/GaSb Heterojunction TFET (HetJTFET) are taken from
Ref.62, and we use the same geometrical parameters for all the devices as shown
in Table 2.2, while neglecting the contact capacitance.

Table 2.2: Benchmarking Parameters

Parameters for Thin-TFETs with various VDD and RC
VDD (V)              0.2     0.2     0.3     0.3     0.4     0.4
RC (Ωµm)             52      320     52      320     52      320
IDsat (µA/µm)        263     233     325     317     349     348
∆Q (fC/µm2)          2.34    2.80    3.33    3.72    4.30    4.47
∆ns (10^12 cm−2)     1.46    1.75    2.08    2.32    2.69    2.79
CG,avg (fF/µm)       0.175   0.210   0.167   0.186   0.161   0.168

Parameters for HP/LP CMOS and HetJ/HomJ TFET62
Device       VDD (V)   IDsat (µA/µm)   CG,avg (fF/µm)
HP CMOS      0.73      1805            1.29
LP CMOS      0.3       2               1.29
HetJTFET     0.4       500             1.04
HomJTFET     0.2       25              1.04

Geometrical Parameters for Benchmarking
Half-pitch F (nm)   EOT (nm)   Gate length L (nm)   Gate width W (nm)
15                  1          15                   60
The intrinsic switching delay tint and the intrinsic switching energy Eint are
calculated by:62
$$
t_{int} = \frac{C_{G,avg}\, V_{DD}}{I_{Dsat}}, \qquad
E_{int} = C_{G,avg}\, W\, V_{DD}^2
\tag{2.40}
$$
In Fig.2.13, we plot the projected values of tint and Eint of the devices listed in
Table 2.2.
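The projections can be reproduced from Table 2.2 with Eq.2.40 alone. The sketch below does this for the Thin-TFET entries with RC = 52 Ωµm and for the reference devices. The dictionary layout and function names are our own, and the unit conversions follow from fF·V/µA = ns and fF·V² = fJ.

```python
GATE_WIDTH_UM = 0.060  # W = 60 nm from Table 2.2

# (VDD in V, IDsat in uA/um, CG,avg in fF/um), taken from Table 2.2
DEVICES = {
    "Thin-TFET 0.2 V, RC=52": (0.2, 263.0, 0.175),
    "Thin-TFET 0.3 V, RC=52": (0.3, 325.0, 0.167),
    "Thin-TFET 0.4 V, RC=52": (0.4, 349.0, 0.161),
    "HP CMOS": (0.73, 1805.0, 1.29),
    "LP CMOS": (0.3, 2.0, 1.29),
    "HetJTFET": (0.4, 500.0, 1.04),
    "HomJTFET": (0.2, 25.0, 1.04),
}

def intrinsic_delay_ps(vdd, i_dsat, c_g_avg):
    """t_int = CG,avg * VDD / IDsat, Eq. 2.40, in ps (fF*V/uA = ns)."""
    return c_g_avg * vdd / i_dsat * 1e3

def intrinsic_energy_aj(vdd, c_g_avg, width_um=GATE_WIDTH_UM):
    """E_int = CG,avg * W * VDD^2, Eq. 2.40, in aJ (fF*V^2 = fJ)."""
    return c_g_avg * width_um * vdd**2 * 1e3

for name, (vdd, i_dsat, c_g) in DEVICES.items():
    print(f"{name:24s} t_int = {intrinsic_delay_ps(vdd, i_dsat, c_g):7.3f} ps, "
          f"E_int = {intrinsic_energy_aj(vdd, c_g):6.2f} aJ")
```

For the Thin-TFET at VDD = 0.2 V and RC = 52 Ωµm this gives roughly 0.13 ps and 0.42 aJ, compared with roughly 0.52 ps and 41 aJ for HP CMOS.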
[Figure 2.13 plot area: intrinsic switching energy (aJ) versus intrinsic switching delay (ps) on logarithmic axes for HP CMOS, LP CMOS, HetJTFET, HomJTFET and Thin-TFETs (VDD = 0.4, 0.3, 0.2 V; RC = 52 and 320 Ωµm), with the "Desired Corner" marked.]
Figure 2.13: The intrinsic switching energy and delay for HP CMOS, LP CMOS,
HetJTFET, HomJTFET and Thin-TFETs with VDD=0.2, 0.3, 0.4 V and RC=52, 320
Ωµm.
As far as the intrinsic switching energy-delay product is concerned, the Thin-TFET
shows distinct energy consumption and performance advantages. For
instance, Thin-TFET operation at a VDD as low as 0.2 V is fast because its current
is still significantly high. The most distinguishing feature of a Thin-TFET is its
low intrinsic capacitance in comparison to the other devices. This advantage
will be less significant when device parasitics become dominant in completed
circuits.
It is observed that the Thin-TFET intrinsic switching energy-delay product
moves toward the desired corner when decreasing VDD from 0.4 V to 0.2 V. This
is an unusual but favorable behavior for ultrascaled switches. In the case of 15
nm CMOS, ID is roughly proportional to VDD. In the ON state of the Thin-TFET,
however, ID has a much weaker dependence on VTG (see Fig. 2.10(a)) than in CMOS;
thus the VDD to ID ratio actually decreases when scaling down VDD from 0.4 V to
0.2 V. Therefore, given that CG,avg stays roughly the same (increasing slightly
with decreasing VDD), the intrinsic switching time tint slightly decreases when
decreasing VDD.
2.5 Discussion and Conclusions
2.5.1 Experimental Insights
Since our proposal of Thin-TFET in 201263 that is derived from our III-V TFET
design,6 several key challenges have been identified along our pursuit in ex-
perimental demonstration of Thin-TFETs.64 The foremost is the scarcity of
electronic-grade layered materials and knowledge of their properties, in partic-
ular, the semiconductor heterojunctions with near broken gap alignment. The
reasonably well-characterized material properties in the literature are largely
based on bulk layered materials. The exponentially growing number of publications
in recent years on monolayer and few-layer materials are mainly
theoretical calculations, or are based on materials exfoliated from naturally occurring crystals or
synthesized by chemical vapor transport, which typically contain a few atomic
percent of defects (impurities, vacancies, etc.). Both chemical vapor deposition
and molecular beam epitaxy65 are actively pursued by the community to grow
electronic grade layered materials.
Besides the lack of high quality layered materials and heterojunctions, the fabrication
development of the Thin-TFET is also challenging. It inherits all the fundamental
fabrication challenges of a TFET, including doping profile, alignment
(especially gate registry), gate dielectrics and ohmic contacts. Atomic layer deposition
has been improved over the years to achieve good quality gate dielectrics
on 2D crystals.66 Using 2D dielectrics such as hexagonal boron nitride as the
gate dielectric has also been pursued.67 In addition, low resistance ohmic contacts
to 2D crystals are vital to device performance. Various techniques, such as external
chemical doping,68 internal chemical doping,69 electrostatic doping such
as ion doping70 and phase-engineering from the semiconductor phase to the
metallic phase of a 2D crystal,71 have been implemented to reduce the contact
resistances. Furthermore, Thin-TFETs demand true precision in layer number control,
since the properties of nearly all layered materials critically depend on the
layer number when the thickness is in the range of 1-3 nm.
2.5.2 Conclusions
This paper proposed a new steep slope transistor based on the interlayer tunnel-
ing between two 2D semiconductor materials and presented a detailed model
to discuss the physical mechanisms governing the device operation and to gain
an insight about the tradeoffs implied in the design of the transistor.
The tunnel transistor based on 2D semiconductors has the potential for a
very steep subthreshold region and the subthreshold swing is ultimately limited
by the energy broadening in the two 2D materials. The energy broadening
can have different physical origins such as disorder, charged impurities in
the 2D layers or in the surrounding materials,37, 39 phonon scattering72 and microscopic
roughness at interfaces.38 In our calculations we accounted for the
energy broadening by assuming a simple Gaussian energy spectrum with no explicit
reference to a specific physical mechanism. However, a more detailed and
quantitative description of the energy broadening is instrumental in physical
modeling of the device and its design.
Quite interestingly, our analysis suggests that, while a possible rotational
misalignment between the two 2D layers can affect the absolute value of the
tunneling current, the misalignment is not expected to significantly degrade the
steep subthreshold slope, which is the crucial figure of merit for a steep slope
transistor.
An optimal operation of the device demands a good electrostatic control of
the top gate voltage VTG on the band alignments in the material stack, as shown
for example in Fig.2.5(a), which may become problematic if the electric filed in
the interlayer is effectively screened by the high electron concentration in the
top 2D layer. Consequently, since high carrier concentrations in the 2D layers
are essential to reduce the layer resistivities, a tradeoff exists between the gate
control and layer resistivities; as a result, doping concentrations in these 2D lay-
ers are important design parameters in addition to tuning the threshold voltage.
In this respect, chemical doping of TMD materials has been recently demon-
strated;68, 73 however, these doping technologies are still far less mature than they
are for 3D semiconductors, and improvements in in-situ doping will be very im-
portant for optimization of the device performance. Since our model does not
include the lateral transport in the 2D materials, an exploration of the above
design tradeoffs goes beyond the scope of the present paper and demands the
development of more complete transport models.
The transport model proposed in this work does not account for possible
traps or defects assisted tunneling, which have been recently recognized as a
serious hindrance to the experimental realization of Tunnel-FETs exhibiting a
sub-threshold swing better than 60 mV/dec.11, 12 A large density of states in
the gap of the 2D materials may even lead to a Fermi level pinning that would
drastically degrade the gate control on the band alignment and undermine the
overall device operation. In this respect, from a fundamental viewpoint the
2D crystals may offer advantages over their 3D counterparts because they are
inherently free of broken/dangling bonds at the interfaces.19 However, the fab-
rication technologies for 2D crystals are still in an embryonal stage compared
to technologies for conventional semiconductors, hence the control of defects
in the 2D materials will be a challenge for the development of the proposed
tunneling transistor.
The simulation results reported in this paper indicate that the newly pro-
posed transistor based on interlayer tunneling between two 2D materials has
the potential for a very steep turn-on characteristic, because the vertical stack
of 2D materials having an energy gap is probably the device structure that al-
lows for the most effective, gate controlled crossing and uncrossing between the
edges of the bands involved in the tunneling process. A uniform van der Waals
gap thickness and low total access resistance are vital to optimize the Thin-TFET
performance. The benchmark study shows Thin-TFETs may have distinct ad-
vantages over CMOS and III-V TFETs in terms of both performance and energy
consumption at low supply voltages. Our modeling approach based on the
Bardeen’s transfer Hamiltonian is by no means a complete device model but
instead a starting point to gain insight about its working principle and its de-
sign. At the present time an experimental demonstration of the device appears
of crucial importance, first of all to validate the device concept, and then to help
estimate the numerical value of a few parameters in the transport model that
can be determined only by comparing to experiments.
BIBLIOGRAPHY
1 J. M. Rabaey, J. Ammer, T. Karalar, S. Li, B. Otis, M. Sheets, and T. Tuan, “Pico-
radios for wireless sensor networks: the next challenge in ultra-low power de-
sign,” in Solid-State Circuits Conference, 2002. Digest of Technical Papers. ISSCC.
2002 IEEE International, vol. 1. IEEE, 2002, pp. 200–201.
2 R. Amirtharajah and A. P. Chandrakasan, “Self-powered signal processing us-
ing vibration-based power generation,” Solid-State Circuits, IEEE Journal of,
vol. 33, no. 5, pp. 687–695, 1998.
3 R. G. Dreslinski, M. Wieckowski, D. Blaauw, D. Sylvester, and T. Mudge,
“Near-threshold computing: Reclaiming moore’s law through energy efficient
integrated circuits,” Proceedings of the IEEE, vol. 98, no. 2, pp. 253–266, 2010.
4 I. W. Group et al., “International technology roadmap for semiconductors,
2011,” URL http://www.itrs.net, 2011.
5 A. C. Seabaugh and Q. Zhang, “Low-voltage tunnel transistors for beyond
cmos logic,” Proceedings of the IEEE, vol. 98, no. 12, pp. 2095–2110, 2010.
6 G. Zhou, R. Li, T. Vasen, M. Qi, S. Chae, Y. Lu, Q. Zhang, H. Zhu, J.-M. Kuo,
T. Kosel et al., “Novel gate-recessed vertical inas/gasb tfets with record high Ion
of 180 µA/µm at VDS = 0.5 V,” in Electron Devices Meeting (IEDM), 2012 IEEE
International. IEEE, 2012, pp. 32–6.
7 K. Tomioka, M. Yoshimura, and T. Fukui, “Sub 60 mv/decade switch using
an inas nanowire–si heterojunction and turn-on voltage shift with a pulsed
doping technique,” Nano letters, 2013.
8 L. Knoll, Q.-T. Zhao, A. Nichau, S. Trellenkamp, S. Richter, A. Schafer, D. Es-
seni, L. Selmi, K. K. Bourdelle, and S. Mantl, “Inverters with strained si
nanowire complementary tunnel field-effect transistors,” Electron Device Let-
ters, IEEE, vol. 34, no. 6, pp. 813–815, 2013.
9 D. Mohata, R. Bijesh, S. Mujumdar, C. Eaton, R. Engel-Herbert, T. Mayer,
V. Narayanan, J. Fastenau, D. Loubychev, A. Liu et al., “Demonstration of
mosfet-like on-current performance in arsenide/antimonide tunnel fets with
staggered hetero-junctions for 300mv logic applications,” in Electron Devices
Meeting (IEDM), 2011 IEEE International. IEEE, 2011, pp. 33–5.
10 F. Conzatti, M. Pala, D. Esseni, E. Bano, and L. Selmi, “Strain-induced per-
formance improvements in inas nanowire tunnel fets,” Electron Devices, IEEE
Transactions on, vol. 59, no. 8, pp. 2085–2092, 2012.
11 M. Pala and D. Esseni, “Interface traps in inas nanowire tunnel-fets and mos-
fets—part i: Model description and single trap analysis in tunnel-fets,”
Electron Devices, IEEE Transactions on, vol. 60, no. 9, pp. 2795–2801, 2013.
12 D. Esseni and M. G. Pala, “Interface traps in inas nanowire tunnel fets and
mosfets—part ii: Comparative analysis and trap-induced variability,” Electron
Devices, IEEE Transactions on, vol. 60, no. 9, pp. 2802–2807, 2013.
13 R. M. Feenstra, D. Jena, and G. Gu, “Single-particle tunneling in doped
graphene-insulator-graphene junctions,” Journal of Applied Physics, vol. 111,
no. 4, pp. 043 711–043 711, 2012.
14 L. Britnell, R. Gorbachev, A. Geim, L. Ponomarenko, A. Mishchenko,
M. Greenaway, T. Fromhold, K. Novoselov, and L. Eaves, “Resonant tun-
nelling and negative differential conductance in graphene transistors,” Nature
communications, vol. 4, p. 1794, 2013.
15 L. Britnell, R. V. Gorbachev, R. Jalil, B. D. Belle, F. Schedin, M. I. Katsnelson,
L. Eaves, S. V. Morozov, A. S. Mayorov, N. M. R. Peres, A. H. Castro Neto,
J. Leist, A. K. Geim, L. A. Ponomarenko, and K. S. Novoselov, “Electron
tunneling through ultrathin boron nitride crystalline barriers,” Nano Letters,
vol. 12, no. 3, pp. 1707–1710, 2012, pMID: 22380756. [Online]. Available:
http://dx.doi.org/10.1021/nl3002205
16 B. Radisavljevic, A. Radenovic, J. Brivio, V. Giacometti, and A. Kis, “Single-
layer mos2 transistors,” Nature nanotechnology, vol. 6, no. 3, pp. 147–150, 2011.
17 Q. H. Wang, K. Kalantar-Zadeh, A. Kis, J. N. Coleman, and M. S. Strano, “Elec-
tronics and optoelectronics of two-dimensional transition metal dichalco-
genides,” Nature nanotechnology, vol. 7, no. 11, pp. 699–712, 2012.
18 C. Gong, H. Zhang, W. Wang, L. Colombo, R. M. Wallace, and K. Cho, “Band
alignment of two-dimensional transition metal dichalcogenides: Application
in tunnel field effect transistors,” Applied Physics Letters, vol. 103, no. 5, pp.
053 513–053 513, 2013.
19 D. Jena, “Tunneling transistors based on graphene and 2-d crystals,” Proceed-
ings of the IEEE, vol. 101, no. 7, pp. 1585–1602, 2013.
20 K. F. Mak, C. Lee, J. Hone, J. Shan, and T. F. Heinz, “Atomically thin mos2:
A new direct-gap semiconductor,” Physical Review Letters, vol. 105, no. 13, p.
136805, 2010.
21 D. Esseni, P. Palestri, and L. Selmi, Nanoscale MOS transistors: Semi-classical
transport and applications. Cambridge University Press, 2011.
22 J. Bardeen, “Tunnelling from a many-particle point of view,” Phys. Rev. Letters,
vol. 6, 1961.
23 W. A. Harrison, “Tunneling from an independent-particle point of view,”
Physical Review, vol. 123, no. 1, p. 85, 1961.
24 C. B. Duke, Tunneling in solids. Academic Press New York, 1969, vol. 1999.
25 P. Zhao, R. Feenstra, G. Gu, and D. Jena, “Symfet: A proposed symmetric
graphene tunneling field-effect transistor,” Electron Devices, IEEE Transactions
on, vol. 60, no. 3, pp. 951–957, 2013.
26 Z. Zhu, Y. Cheng, and U. Schwingenschlögl, “Giant spin-orbit-induced spin
splitting in two-dimensional transition-metal dichalcogenide semiconduc-
tors,” Physical Review B, vol. 84, no. 15, p. 153402, 2011.
27 A. Ramasubramaniam, D. Naveh, and E. Towe, “Tunable band gaps in bilayer
transition-metal dichalcogenides,” Physical Review B, vol. 84, no. 20, p. 205325,
2011.
28 Q. Li, E. Hwang, E. Rossi, and S. D. Sarma, “Theory of 2d transport in
graphene for correlated disorder,” Physical review letters, vol. 107, no. 15, p.
156601, 2011.
29 J. Yan and M. S. Fuhrer, “Correlated charged impurity scattering in graphene,”
Physical Review Letters, vol. 107, no. 20, p. 206601, 2011.
30 M. Yankowitz, J. Xue, D. Cormode, J. D. Sanchez-Yamagishi, K. Watanabe,
T. Taniguchi, P. Jarillo-Herrero, P. Jacquod, and B. J. LeRoy, “Emergence of
superlattice dirac points in graphene on hexagonal boron nitride,” Nature
Physics, vol. 8, no. 5, pp. 382–386, 2012.
31 J. Xue, J. Sanchez-Yamagishi, D. Bulmash, P. Jacquod, A. Deshpande,
K. Watanabe, T. Taniguchi, P. Jarillo-Herrero, and B. J. LeRoy, “Scanning
tunnelling microscopy and spectroscopy of ultra-flat graphene on hexagonal
boron nitride,” Nature materials, vol. 10, no. 4, pp. 282–285, 2011.
32 R. Decker, Y. Wang, V. W. Brar, W. Regan, H.-Z. Tsai, Q. Wu, W. Gannett,
A. Zettl, and M. F. Crommie, “Local electronic properties of graphene on a
bn substrate via scanning tunneling microscopy,” Nano letters, vol. 11, no. 6,
pp. 2291–2295, 2011.
33 P. Van Mieghem, G. Borghs, and R. Mertens, “Generalized semiclassical
model for the density of states in heavily doped semiconductors,” Physical
Review B, vol. 44, no. 23, p. 12822, 1991.
34 F. Urbach, “The long-wavelength edge of photographic sensitivity and of the
electronic absorption of solids,” Physical Review, vol. 92, pp. 1324–1324, 1953.
35 G. Cody, “Urbach edge of crystalline and amorphous silicon: a personal re-
view,” Journal of non-crystalline solids, vol. 141, pp. 3–15, 1992.
36 E. O. Kane, “Thomas-fermi approach to impure semiconductor band struc-
ture,” Physical Review, vol. 131, no. 1, p. 79, 1963.
37 S. D. Sarma and B. Vinter, “Thomas-fermi screening and level broadening in
interacting two-dimensional electron-impurity systems,” Surface Science, vol.
113, no. 1, pp. 176–181, 1982.
38 A. Knabchen, “Self-consistent level broadening in thin films with volume
and surface roughness scattering,” Journal of Physics: Condensed Matter, vol. 7,
no. 27, p. 5209, 1995.
39 A. Ghazali and J. Serre, “Disorder, fluctuations and electron interactions in
doped semiconductors: A multiple-scattering approach,” Solid-State Electron-
ics, vol. 28, no. 1, pp. 145–149, 1985.
40 L. Britnell, R. Gorbachev, R. Jalil, B. Belle, F. Schedin, A. Mishchenko, T. Geor-
giou, M. Katsnelson, L. Eaves, S. Morozov et al., “Field-effect tunneling tran-
sistor based on vertical graphene heterostructures,” Science, vol. 335, no. 6071,
pp. 947–950, 2012.
41 S. Tiefenbacher, C. Pettenkofer, and W. Jaegermann, “Moiré pattern in leed
obtained by van der waals epitaxy of lattice mismatched ws2/mote2 heteroin-
terfaces,” Surface science, vol. 450, no. 3, pp. 181–190, 2000.
42 A. Koma, “Van der waals epitaxy for highly lattice-mismatched systems,”
Journal of crystal growth, vol. 201, pp. 236–241, 1999.
43 G.-B. Liu, W.-Y. Shan, Y. Yao, W. Yao, and D. Xiao, “A three-band tight-binding
model for monolayers of group-vib transition metal dichalcogenides,” arXiv
preprint arXiv:1305.6089, 2013.
44 L. Britnell, R. V. Gorbachev, R. Jalil, B. D. Belle, F. Schedin, M. I. Katsnel-
son, L. Eaves, S. V. Morozov, A. S. Mayorov, N. M. Peres et al., “Atomically
thin boron nitride: a tunnelling barrier for graphene devices,” arXiv preprint
arXiv:1202.0735, 2012.
45 S. Agarwal and E. Yablonovitch, “Pronounced effect of pn-junction dimen-
sionality on tunnel switch sharpness,” arXiv preprint arXiv:1109.0096, 2011.
46 M. A. Khayer and R. K. Lake, “Effects of band-tails on the subthreshold
characteristics of nanowire band-to-band tunneling transistors,” Journal
of Applied Physics, vol. 110, no. 7, p. 074508, 2011. [Online]. Available:
http://dx.doi.org/10.1063/1.3642954
47 J. I. Pankove, “Absorption edge of impure gallium arsenide,” Phys.
Rev., vol. 140, pp. A2059–A2065, Dec 1965. [Online]. Available: https:
//link.aps.org/doi/10.1103/PhysRev.140.A2059
48 A. V. Subashiev, O. Semyonov, Z. Chen, and S. Luryi, “Urbach tail
studies by luminescence filtering in moderately doped bulk inp,” Applied
Physics Letters, vol. 97, no. 18, p. 181914, 2010. [Online]. Available:
http://dx.doi.org/10.1063/1.3510470
49 R. Schlaf, C. Pettenkofer, and W. Jaegermann, “Band lineup of a
SnS2/SnSe2/SnS2 semiconductor quantum well structure prepared by van der
waals epitaxy,” Journal of applied physics, vol. 85, no. 9, pp. 6550–6556, 1999.
50 R. Schlaf, O. Lang, C. Pettenkofer, and W. Jaegermann, “Band lineup of
layered semiconductor heterointerfaces prepared by van der waals epitaxy:
Charge transfer correction term for the electron affinity rule,” Journal of ap-
plied physics, vol. 85, no. 5, pp. 2732–2753, 1999.
51 L. Upadhyayula, J. Loferski, A. Wold, W. Giriat, and R. Kershaw, “Semi-
conducting properties of single crystals of n- and p-type tungsten diselenide
(WSe2),” Journal of Applied Physics, vol. 39, no. 10, pp. 4736–4740, 1968.
52 C. Sergio, Q. Gao, and R. M. Feenstra, “Theory of graphene–insulator–
graphene tunnel junctions,” Journal of Vacuum Science & Technology B, vol. 32,
no. 4, p. 04E101, 2014.
53 K. T. Lam, G. Seol, and J. Guo, “Operating principles of vertical transistors
based on monolayer two-dimensional semiconductor heterojunctions,” Ap-
plied Physics Letters, vol. 105, no. 1, p. 013112, 2014.
54 L. De Michielis, L. Lattanzio, and A. M. Ionescu, “Understanding the super-
linear onset of tunnel-fet output characteristic,” 2012.
55 B. Yu, L. Chang, S. Ahmed, H. Wang, S. Bell, C.-Y. Yang, C. Tabery, C. Ho,
Q. Xiang, T.-J. King, J. Bokor, C. Hu, M.-R. Lin, and D. Kyser, “FinFET scaling
to 10 nm gate length,” in Electron Devices Meeting, 2002. IEDM ’02. Interna-
tional, Dec 2002, pp. 251–254.
56 A. M. van der Zande, J. Kunstmann, A. Chernikov, D. A. Chenet, Y. You,
X. Zhang, P. Y. Huang, T. C. Berkelbach, L. Wang, F. Zhang et al., “Tailoring
the electronic structure in bilayer molybdenum disulfide via interlayer twist,”
Nano letters, vol. 14, no. 7, pp. 3869–3875, 2014.
57 J. Brivio, D. T. L. Alexander, and A. Kis, “Ripples and layers in ultrathin
mos2 membranes,” Nano Letters, vol. 11, no. 12, pp. 5148–5153, 2011, pMID:
22010987. [Online]. Available: http://dx.doi.org/10.1021/nl2022288
58 W. M. Haynes, CRC handbook of chemistry and physics. CRC press, 2012.
59 D. Jena, K. Banerjee, and G. H. Xing, “2D crystal semiconductors: Intimate
contacts,” Nat Mater, vol. 13, no. 12, pp. 1076–1078, 12 2014. [Online].
Available: http://dx.doi.org/10.1038/nmat4121
60 K. Bernstein, R. Cavin, W. Porod, A. Seabaugh, and J. Welser, “Device and ar-
chitecture outlook for beyond cmos switches,” Proceedings of the IEEE, vol. 98,
no. 12, pp. 2169–2184, Dec 2010.
61 D. Nikonov and I. Young, “Uniform methodology for benchmarking beyond-
cmos logic devices,” in Electron Devices Meeting (IEDM), 2012 IEEE Interna-
tional, Dec 2012, pp. 25.4.1–25.4.4.
62 ——, “Overview of beyond-cmos devices and a uniform methodology for
their benchmarking,” Proceedings of the IEEE, vol. 101, no. 12, pp. 2498–2533,
Dec 2013.
63 “The Center for Low Energy Systems Technology (LEAST) proposal,” led by
the University of Notre Dame, submitted to SRC, 2012.
64 S. Xiao, M. Li, A. Seabaugh, D. Jena, and H. Xing, “Vertical heterojunction of
mos2 and wse2,” in Device Research Conference (DRC), 2014 72nd Annual, June
2014, pp. 169–170.
65 S. Vishwanath, X. Liu, S. Rouvimov, J. K. Furdyna, D. Jena, and H. G. Xing,
“Molecular beam epitaxy of layered material superlattices and heterostruc-
tures,” Bulletin of the American Physical Society, 2014.
66 L. Cheng, X. Qin, A. T. Lucero, A. Azcatl, J. Huang, R. M. Wallace, K. Cho,
and J. Kim, “Atomic layer deposition of a high-k dielectric on MoS2 using
trimethylaluminum and ozone,” ACS applied materials & interfaces, vol. 6,
no. 15, pp. 11 834–11 838, 2014.
67 T. Roy, M. Tosun, J. S. Kang, A. B. Sachid, S. Desai, M. Hettick, C. C. Hu,
and A. Javey, “Field-effect transistors built from all two-dimensional material
components,” ACS nano, 2014.
68 H. Fang, S. Chuang, T. C. Chang, K. Takei, T. Takahashi, and A. Javey, “High-
performance single layered wse2 p-fets with chemically doped contacts,”
Nano letters, vol. 12, no. 7, pp. 3788–3792, 2012.
69 L. Yang, K. Majumdar, Y. Du, H. Liu, H. Wu, M. Hatzistergos, P. Hung,
R. Tieckelmann, W. Tsai, C. Hobbs et al., “High-performance MoS2 field-
effect transistors enabled by chloride doping: Record low contact resistance
(0.5 kΩ·µm) and record high drain current (460 µA/µm),” in VLSI Technology
(VLSI-Technology): Digest of Technical Papers, 2014 Symposium on. IEEE, 2014,
pp. 1–2.
70 H. Xu, E. Kinder, S. Fathipour, A. Seabaugh, and S. Fullerton-Shirey, “Recon-
figurable ion doping in 2h-mote2 field-effect transistors using peo:csclo4 elec-
trolyte.” The 41st International Symposium on Compound Semiconductor,
May 2014.
71 R. Kappera, D. Voiry, S. E. Yalcin, B. Branch, G. Gupta, A. D. Mohite, and
M. Chhowalla, “Phase-engineered low-resistance contacts for ultrathin MoS2
transistors,” Nat Mater, vol. 13, no. 12, pp. 1128–1134, 12 2014. [Online].
Available: http://dx.doi.org/10.1038/nmat4080
72 U. Bockelmann and G. Bastard, “Phonon scattering and energy relaxation in
two-, one-, and zero-dimensional electron gases,” Physical Review B, vol. 42,
no. 14, p. 8947, 1990.
73 H. Fang, M. Tosun, G. Seol, T. C. Chang, K. Takei, J. Guo, and A. Javey, “De-
generate n-doping of few-layer transition metal dichalcogenides by potas-
sium,” Nano letters, vol. 13, no. 5, pp. 1991–1995, 2013.
CHAPTER 3
COMPARATIVE STUDY OF INTRINSIC CAPACITANCES OF
THIN-TFETS AND PIN-TFET
3.1 Enhanced Miller Effect of TFETs
Over the years, most TFET designs have fallen into two categories in terms of device
structures: lateral TFETs with tunneling direction perpendicular to gate elec-
tric field and vertical TFETs with tunneling direction and gate electric field
aligned. For ultrascaled devices, pin-TFET based on layered 2D materials1 is
the latest breed of the lateral TFET structure, and its vertical counterpart is Two-
Dimensional Heterojunction Interlayer Tunneling Field Effect Transistor (Thin-
TFET).2 It has been reported that pin-TFET has a much larger gate-drain capaci-
tance (CGD) compared to MOSFET, therefore exhibiting enhanced Miller effect.3
A large Miller effect leads to large voltage overshoot/undershoot and longer
switching time in the transient response of a circuit. In this chapter, we show
that a Thin-TFET offers a smaller CGD than a pin-TFET. Furthermore, this obser-
vation can be generalized: all vertical TFETs offer a smaller CGD than their lateral
TFET counterparts.
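As a rough illustration of why a large CGD matters, the textbook Miller approximation for an inverting stage with voltage gain -A refers the gate-drain capacitance to the input as CGD(1 + A). The sketch below is not taken from this work: the function name, the gain, and the normalized capacitance values are hypothetical and only show the scaling.

```python
def miller_input_capacitance(c_gs, c_gd, gain_magnitude):
    """Input capacitance of an inverting stage under the Miller approximation.

    c_gs, c_gd: gate-source and gate-drain capacitances (same units).
    gain_magnitude: |A| of the inverting voltage gain (hypothetical value).
    """
    return c_gs + c_gd * (1.0 + gain_magnitude)


# Hypothetical, normalized capacitances: halving C_GD at fixed gain
# halves the Miller-multiplied contribution to the input capacitance.
large_cgd = miller_input_capacitance(c_gs=0.0, c_gd=1.0, gain_magnitude=5.0)
small_cgd = miller_input_capacitance(c_gs=0.0, c_gd=0.5, gain_magnitude=5.0)
print(large_cgd, small_cgd)
```

In a real inverter the gain varies during the transition, so this constant-gain picture is only a first-order guide.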
3.2 Effects of TFET Geometries: “lateral” TFETs vs. “vertical”
TFETs
In an n-type pin-TFET structure (see Fig.3.1), a positive gate voltage induces the
electron inversion in the channel. Because the source terminal charge is entirely
Figure 3.1: Schematic structure of (a) n-type pin-TFET and (b) n-type Thin-TFET.
Labels recovered from the panels: VG and VDS; VTG, VBG, and VDS; legend: p+ doped,
intrinsic, n+ doped.
composed of fixed charges in the depletion region of the tunneling junction, al-
most all the channel inversion charge is attributed to the drain terminal. This
so-called 100/0 drain/source charge partition in a pin-TFET gives a much larger
CGD than in a MOSFET.3 In contrast, in an n-type Thin-TFET structure, the
heavily doped source is situated under the gate. In order to have efficient top
gate control of the band alignment between the top layer (i region) and bot-
tom layer (p+ region) in a Thin-TFET, the top layer has to have a low carrier
concentration to avoid screening out the electric field from the top gate, while
the bottom layer has to have a very high carrier concentration to terminate the
electric field. Because most of the electric field from top gate terminates at the
source beneath the gate, the change of gate terminal charge is mainly reflected in
the source instead of the drain. Therefore, a Thin-TFET intrinsically has smaller
CGD than a pin-TFET. Moreover, this trend also applies to any TFET with a
vertical structure, such as the recently proposed U-shaped TFET.4
To better understand the effect of TFET geometries, we present an abstract
comparison between the pin-TFET and the Thin-TFET (see Fig.3.2). The device
schematics with highlighted tunneling areas are shown in the first column. The
Thin-TFET has a notably larger tunneling area than the pin-TFET, which is one
of the original motivations for vertical TFETs. A larger tunneling area can con-
tribute to a larger ON-current density.5 The different device geometries lead to
distinct capacitance networks, shown in the second column of Fig.3.2. These
1D capacitance networks neglect parasitic capacitances and lateral
junction capacitances. Because the tunneling junction between the source and
channel has a much smaller conductance than the forward-biased junction between
the drain and channel, the channel inversion charge is mostly attributed to the
drain. Therefore, in the pin-TFET capacitance network model, the source and
channel are approximately disconnected. Since we neglect the lateral junc-
tion capacitance between the channel and the drain, the channel capacitance is
the quantum capacitance of the channel material. The Thin-TFET capacitance
model was introduced in Li et al.2 It is worth noting that in the pin-TFET,
both the top gate and bottom gate are biased at the same voltage, whereas in the
Thin-TFET, the bottom gate is grounded and only the top gate is biased. Because
of this, the pin-TFET has better “gate control” than the Thin-TFET. To quantify
the “gate control”, we define the gate efficiency (GE) of a TFET as the change of
energy overlap (∆EOL) divided by the change of gate voltage (∆VG) times the unit
charge. The gate efficiency tells how much the tunneling window changes for a
given change of gate voltage. The gate efficiencies of the pin-TFET and Thin-TFET
are shown in the fourth column of Fig.3.2. From the equations, the gate effi-
ciency of the pin-TFET is roughly twice that of the Thin-TFET, given the following
assumptions: 1) the bottom 2D layer in the Thin-TFET is heavily doped, namely CQ,B is
much larger than CInterlayer; 2) CInterlayer is almost the same as CTG in the Thin-TFET; 3)
CTG in the Thin-TFET is the same as CG in the pin-TFET. The pin-TFET has the
advantage in terms of gate efficiency because its lateral geometry allows the top
and bottom gates to control the channel potential together. In the Thin-TFET, by
contrast, a potential difference is required between the top and bottom gates, so
the top 2D layer is essentially controlled by only one gate. The expressions for the
gate-drain capacitance (CGD) of the pin-TFET and Thin-TFET are listed in the third
column of Fig.3.2. Under the same assumptions, CGD of the Thin-TFET is roughly
half of CGD of the pin-TFET.
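The factor-of-two statements above can be checked numerically from the Fig.3.2 expressions. The following is a minimal sketch in normalized units; the function names and all capacitance values are hypothetical, chosen only to satisfy the three stated assumptions (CQ,B much larger than CInterlayer, CInterlayer equal to CTG, and CTG equal to CG).

```python
def pin_tfet(c_g, c_q):
    """C_GD and gate efficiency of the pin-TFET 1D network (as read from Fig. 3.2)."""
    c_gd = 2.0 * c_g * c_q / (2.0 * c_g + c_q)
    ge = 2.0 * c_g / (2.0 * c_g + c_q)
    return c_gd, ge


def thin_tfet(c_tg, c_qt, c_int, c_bg, c_qb):
    """C_GD and gate efficiency of the Thin-TFET 1D network (as read from Fig. 3.2)."""
    # C_S: interlayer capacitance in series with (C_BG + C_Q,B).
    c_s = 1.0 / (1.0 / c_int + 1.0 / (c_bg + c_qb))
    c_gd = c_tg * c_qt / (c_tg + c_qt + c_s)
    ge = c_tg / (c_tg + c_s + c_qt) * (1.0 - c_int / (c_int + c_bg + c_qb))
    return c_gd, ge


# Hypothetical normalized values: C_G = C_TG = C_Interlayer = C_BG = 1,
# a small channel quantum capacitance, and a heavily doped bottom layer.
c_gd_pin, ge_pin = pin_tfet(c_g=1.0, c_q=0.1)
c_gd_thin, ge_thin = thin_tfet(c_tg=1.0, c_qt=0.1, c_int=1.0, c_bg=1.0, c_qb=1e4)

print(f"C_GD ratio (Thin/pin): {c_gd_thin / c_gd_pin:.3f}")
print(f"GE ratio (pin/Thin):   {ge_pin / ge_thin:.3f}")
```

Under these assumptions the two ratios approach 1/2 and 2 regardless of the channel quantum capacitance, as long as CQ and CQ,T are equal.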
In a Thin-TFET, an undercut (labeled in Fig.3.1) is necessary to achieve a
small SS.6 Because the p+ source is absent in the undercut region, the top
gate in the undercut region is strongly coupled to the drain. Therefore, the
undercut region in a Thin-TFET increases CGD. On the other hand, an underlap
(labeled in Fig.3.1) in a pin-TFET is used to address the ambipolar problem
in TFETs.7 For an n-type TFET, when a more negative gate voltage is applied, the
valence band in the channel moves above the conduction band in the drain,
forming a tunneling junction between the channel and the drain. This is
the so-called ambipolar effect of TFETs, which is undesirable in regular
circuit design (we will discuss how to use this ambipolar effect to create more
interesting device behavior in Chapter 4). By leaving a certain channel region
close to the drain (the underlap) not controlled by the gate, the possible tunneling
current through the channel-drain junction is suppressed, and the ambipolar
effect is mitigated. In terms of CGD, the underlap in a pin-TFET can be viewed as a
“decoupling” between the channel and the drain. Therefore, the underlap in a pin-
TFET helps to decrease CGD. We will show how the undercut and underlap affect
the capacitances in the Thin-TFET and pin-TFET, respectively, in the later sections.
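[Figure 3.2 contents, recovered from the figure panels]
Column 1, device schematic: pin-TFET (gates G, gate oxides, source S, drain D,
p+/i/n+ regions) and Thin-TFET (top gate TG, bottom gate BG, gate oxides,
interlayer, source S, drain D). Tunneling area: larger for the Thin-TFET,
smaller for the pin-TFET.
Column 2, 1D intrinsic capacitance network: pin-TFET with nodes G, D, S and
elements CG, CQ; Thin-TFET with nodes TG, BG, S, D and elements CTG, CQ,T,
CInterlayer, CQ,B, CBG.
Column 3, gate-drain capacitance:
  pin-TFET:  $C_{GD} = \dfrac{2 C_G C_Q}{2 C_G + C_Q}$
  Thin-TFET: $C_{GD} = \dfrac{C_{TG}\, C_{Q,T}}{C_{TG} + C_{Q,T} + C_S}$, where $\dfrac{1}{C_S} = \dfrac{1}{C_{Interlayer}} + \dfrac{1}{C_{BG} + C_{Q,B}}$
Column 4, gate efficiency (higher for the pin-TFET, lower for the Thin-TFET):
  pin-TFET:  $\dfrac{\Delta E_{OL}}{q\,\Delta V_G} = \dfrac{2 C_G}{2 C_G + C_Q}$
  Thin-TFET: $\dfrac{\Delta E_{OL}}{q\,\Delta V_{TG}} = \dfrac{C_{TG}}{C_{TG} + C_S + C_{Q,T}} \left(1 - \dfrac{C_{Interlayer}}{C_{Interlayer} + C_{BG} + C_{Q,B}}\right)$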
Figure 3.2: Comparison between pin-TFET and Thin-TFET: (Column 1) Device
schematics with the tunneling area highlighted. Thin-TFET has a larger tunnel-
ing area, which can potentially render a higher tunnel current; (Column 2) 1D
intrinsic capacitance networks, where CG is the gate capacitance of the pin-TFET,
CQ is the quantum capacitance of the channel material in the pin-TFET, CT(B)G is
the top (bottom) gate capacitance of the Thin-TFET, CQ,T (B) is the quantum ca-
pacitance of the top (bottom) material, and CInterlayer is the interlayer capacitance
between the top and bottom materials in the Thin-TFET; (Column 3 & 4) analyt-
ical expressions for CGD and gate efficiency.
3.3 Numerical Simulations of C-V Curves
3.3.1 Simulation Methods
The numerical simulations of the C-V curves of the pin-TFET and Thin-TFET are based
on solving the 2D Poisson equation using LENO.8 In TFETs, the tunnel junction
has the highest resistance under normal operating conditions. Therefore, it is
reasonable to assume that the entire potential difference between the source and
drain drops across the tunnel junction, and that the quasi-Fermi levels are flat
elsewhere. The capacitances are computed via the following steps:
1. By solving the 2D Poisson equation, obtain the electric fields in the gate oxides
at different biases;
2. Using the electric fields in the gate oxides, obtain the area charge densities
in the gate terminals (QG) at different biases;
3. Calculate the gate capacitance (CGG), gate-to-source capacitance (CGS), and
gate-to-drain capacitance (CGD) using Equation 3.1.

$$C_{GG} = \frac{\Delta Q_G}{\Delta V_G}, \qquad C_{GD} = \frac{\Delta Q_G}{\Delta V_D}, \qquad C_{GS} = \frac{\Delta Q_G}{\Delta V_S} \tag{3.1}$$
It is worth noting that for pin-TFET, QG is the total charge on both the top
and bottom gate terminals, while for Thin-TFET, QG is just the charge on the top
gate terminal and the corresponding VG is the top gate voltage since Thin-TFET
is gated by the top gate only.
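Steps 2 and 3 can be sketched as a short post-processing routine. This is only an illustration under stated assumptions: the function names are hypothetical, the field-to-charge conversion uses Gauss's law for a uniform oxide with sign conventions ignored, and the bias sweep below is synthetic placeholder data rather than output of the Poisson solver.

```python
import numpy as np

EPS0 = 8.8541878128e-12  # vacuum permittivity (F/m)


def gate_charge_density(e_ox, eps_r):
    """Step 2: gate charge per unit area (C/m^2) from the oxide field (V/m)."""
    return EPS0 * eps_r * np.asarray(e_ox, dtype=float)


def capacitance(q_g, v):
    """Step 3: finite-difference estimate of dQ_G/dV along a voltage sweep."""
    return np.gradient(np.asarray(q_g, dtype=float), np.asarray(v, dtype=float))


# Synthetic placeholder sweep: an oxide field proportional to V_G gives a
# gate charge that is linear in V_G, hence a bias-independent C_GG.
v_g = np.linspace(0.0, 0.4, 9)        # gate voltage sweep (V)
e_ox = v_g / 1e-9                     # hypothetical field across 1 nm (V/m)
q_g = gate_charge_density(e_ox, eps_r=3.9)
c_gg = capacitance(q_g, v_g)          # F/m^2
print(c_gg)
```

The same `capacitance` call applied to sweeps of the drain or source voltage would give CGD and CGS in the sense of Equation 3.1.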
This capacitance model is known as the quasi-static quasi-equilibrium
model. When deriving the capacitances only from the Poisson equation, we
made two approximations:
1. Quasi-static approximation, also known as the low frequency approxima-
tion, where the finite charging time for the inversion layer is ignored.
2. Quasi-equilibrium approximation, also known as the low current approx-
imation, where the change of charges in the channel due to the injected
tunnel current from the source is ignored.
The material and device parameters of the pin-TFET and the Thin-TFET used
in the simulations are listed in Table 3.1.
Material system for n-type pin-TFETs
  Material   Bandgap (eV)   Electron affinity χ (eV)   m∗c (m0)   m∗v (m0)
  WTe2       0.75           4.05                       0.37       0.3
  ΦM (eV): 4.13
  Lead region doping level: ND(A) = 2 × 10^12 cm^−2

Material system for n-type Thin-TFETs
  Top 2D layer: SnSe2; bottom 2D layer: WSe2
  Material   Bandgap (eV)   Electron affinity χ (eV)   m∗c (m0)   m∗v (m0)
  WSe2       1.3            4.0                        0.3        0.4
  SnSe2      0.9            5.1                        0.3        0.4
                ΦM,T (eV)   ΦM,B (eV)   ND,B (cm^−2)   NA,B (cm^−2)
  n-Thin-TFET   5.20        5.95        0              7 × 10^13
  Top layer lead region doping level: ND(A) = 2 × 10^12 cm^−2

Valley degeneracy for WTe2, WSe2, SnSe2: 2

Simulated device parameters for Thin-TFETs and pin-TFETs
  Gate length (nm)   S/D length (nm)   Interlayer thickness (Thin-TFET) (nm)   Gate EOT (nm)
  10                 5                 0.35                                    1

Table 3.1: The material and device parameters of pin-TFETs and Thin-TFETs in
the simulation.
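To give a feel for the magnitudes entering the capacitance networks, the sketch below converts two Table 3.1 entries into capacitances per unit area. It is not part of the simulation flow: the function names are hypothetical, the EOT is referred to SiO2 with a relative permittivity of 3.9, and the quantum capacitance is the degenerate (high carrier density) limit for a parabolic 2D band with an assumed spin degeneracy of 2, which overestimates CQ near and below threshold.

```python
import math

Q = 1.602176634e-19       # elementary charge (C)
HBAR = 1.054571817e-34    # reduced Planck constant (J s)
M0 = 9.1093837015e-31     # electron rest mass (kg)
EPS0 = 8.8541878128e-12   # vacuum permittivity (F/m)
EPS_SIO2 = 3.9            # relative permittivity used in the EOT definition


def oxide_capacitance(eot_m):
    """Gate capacitance per unit area (F/m^2) for a given EOT in meters."""
    return EPS0 * EPS_SIO2 / eot_m


def quantum_capacitance_limit(m_eff, g_valley, g_spin=2):
    """Degenerate-limit C_Q (F/m^2): q^2 times the 2D parabolic-band DOS.

    g_spin=2 is an assumption; m_eff is in units of the electron rest mass.
    """
    dos = g_spin * g_valley * m_eff * M0 / (2.0 * math.pi * HBAR**2)  # 1/(J m^2)
    return Q**2 * dos


c_g = oxide_capacitance(1e-9)                    # gate EOT = 1 nm (Table 3.1)
c_q_max = quantum_capacitance_limit(0.37, 2)     # WTe2 conduction band, g_v = 2

# 1 F/m^2 = 100 uF/cm^2
print(f"C_G     = {c_g * 100:.2f} uF/cm^2")
print(f"C_Q,max = {c_q_max * 100:.1f} uF/cm^2")
```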
3.3.2 Simulation Results with Different Undercut/Underlap
Lengths
The simulation results of bias dependent CGG, CGD and CGS of pin-TFETs and
Thin-TFETs with different undercut/underlap lengths are shown in Fig.3.3. In
Fig.3.3(a), the black curves are CGD versus the gate voltage VG (for Thin-TFET,
VG=VTG) at different VDS for pin-TFETs, and the red curves are for Thin-TFETs.
For both pin-TFETs and Thin-TFETs, CGD increases with increasing VG. This
trend can be understood using the simple capacitance network in Fig.3.2. When
a more positive gate voltage is applied, the quantum capacitance (CQ for pin-TFET,
CQ,T for Thin-TFET) increases due to the increasing free carrier concentration in
the channel, which increases CGD. Moreover, CGD decreases with increasing
VDS. Similarly, a more positive VDS reduces the potential difference between the
channel and the drain, which leads to a lower free carrier concentration in the
channel and consequently a smaller quantum capacitance (CQ for pin-TFET, CQ,T
for Thin-TFET).
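That CGD rises monotonically with the quantum capacitance follows directly from the Fig.3.2 expressions. A short check, assuming all other capacitances in each network are held fixed:

```latex
\begin{align}
\frac{\partial C_{GD}^{\text{pin}}}{\partial C_Q}
  &= \frac{\partial}{\partial C_Q}\!\left(\frac{2 C_G C_Q}{2 C_G + C_Q}\right)
   = \frac{(2 C_G)^2}{(2 C_G + C_Q)^2} > 0, \\
\frac{\partial C_{GD}^{\text{Thin}}}{\partial C_{Q,T}}
  &= \frac{\partial}{\partial C_{Q,T}}\!\left(\frac{C_{TG}\, C_{Q,T}}{C_{TG} + C_{Q,T} + C_S}\right)
   = \frac{C_{TG}\,(C_{TG} + C_S)}{(C_{TG} + C_{Q,T} + C_S)^2} > 0 .
\end{align}
```

Both derivatives are positive for any positive capacitances, so any bias change that raises the channel carrier concentration raises CGD in this 1D picture.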
The CGD values of Thin-TFETs are roughly half of those of pin-TFETs, as
discussed in Section 3.2. A smaller CGD means a less severe Miller effect. With a
longer underlap region, CGD of pin-TFETs decreases, while with a longer undercut
region, CGD of Thin-TFETs slightly increases. Fig.3.3(d) shows this trend of
CGD. As for CGS, shown in Fig.3.3(b), because the gate in a Thin-TFET is more
strongly “coupled” to the source, the CGS of Thin-TFETs is significantly larger than the
CGS of pin-TFETs. Due to the so-called 100/0 drain/source charge partition in
the pin-TFET, the CGS of pin-TFETs is almost zero. The same reason gives
Thin-TFETs a smaller CGD and a larger CGS than pin-TFETs: the
gate has stronger “coupling” with the source.
During the switching of the devices in an inverter, the CGG of the devices is the
input capacitance, which affects the energy dissipation. A smaller CGG means
less charge needs to be moved in order to switch an inverter. Therefore, we also
compared the CGG of pin-TFETs and Thin-TFETs. As shown in Fig.3.3(c), Thin-TFETs
have a smaller CGG than pin-TFETs, which indicates that Thin-TFETs are more
Figure 3.3: (a) CGD versus VG at different VDS for both pin-TFETs (black lines) and
Thin-TFETs with different underlap and undercut lengths, respectively; (b) CGS
versus VG at different VDS for both pin-TFETs (black lines) and Thin-TFETs with
different underlap and undercut lengths, respectively; (c) CGG versus VG at differ-
ent VDS for both pin-TFETs (black lines) and Thin-TFETs with different underlap
and undercut lengths, respectively; (d) CGD versus underlap/undercut length at
different VDS and VG = 0.4 V.
energy efficient than pin-TFETs. We will discuss this in detail in the next sec-
tion.
Besides the capacitances, we also compared the gate efficiencies of pin-TFETs
and Thin-TFETs, and their non-linear onset effect in the output characteristics.
As discussed in Section 3.2, the gate-efficiency (GE) of pin-TFETs is roughly
twice that of Thin-TFETs. In Fig.3.4(a), the black line is the GE of the pin-
TFET and the red line is the GE of the Thin-TFET. The GE of the Thin-TFET is
measured at the center of the channel and is therefore independent of the under-
cut effect. The solid lines in Fig.3.4(a) are the GE right at the threshold voltage,
namely when the valence band edge of the source is aligned with the conduc-
tion band edge of the channel. The dashed lines are the average GE when VG changes
from 0 to 0.4 V. The simulation results of GE are in agreement with the anal-
ysis in Section 3.2. Another common phenomenon in TFETs is the so-called
non-linear or super-linear onset of the tunnel-FET output characteristic.9 The non-
linear onset occurs when VDS is small, where ID depends exponentially, rather
than linearly, on VDS. Equivalently, the non-linear onset can be viewed as a
higher threshold voltage at smaller VDS. In Fig.3.4(b), we show the threshold
voltage versus the drain voltage (VDS) for both the pin-TFET and the Thin-TFET.
The threshold voltages stay almost constant at higher VDS for both devices. At
lower VDS, the pin-TFET’s threshold voltage does not increase as much as the
Thin-TFET’s, which indicates that pin-TFETs have a less pronounced non-linear
onset in the output characteristics than Thin-TFETs.
Since the undercut in Thin-TFETs increases CGD, we investigated the minimal
necessary undercut length. Ideally, the whole channel in a Thin-TFET has the
same threshold voltage. However, because of the influence of the drain, the
threshold voltage is lower at the drain-side edge (see Fig.3.5(a)). This
non-uniformity of the threshold voltage in Thin-TFETs degrades the
subthreshold slope, as discussed in Section 2.4.1. In Fig.3.5(a), the red
line is the threshold voltage at the center of the channel (indicated in
Fig.3.5(b)). The dashed lines are the threshold voltage at the drain-side edge
(indicated in
Figure 3.4: (a) Gate efficiency versus the drain voltage VDS for both pin-TFETs
and Thin-TFETs; the solid lines are the gate efficiency at the threshold, while
the dashed lines are the average gate efficiency when sweeping VG from 0 to
0.4 V; (b) the threshold voltage versus the drain voltage VDS for both pin-TFETs
and Thin-TFETs; the increasing threshold voltage at smaller VDS leads to the
non-linear onset in the output characteristics.
Fig.3.5(b)). When there is no undercut, the threshold voltage at the drain-side
edge is significantly smaller than that at the center of the channel, which
would degrade the subthreshold slope. To balance the subthreshold slope
against CGD, we use an undercut of 1 nm in the following simulations.
3.4 Complementary TFET Inverters
In order to evaluate the impact of CGD in Thin-TFET- and pin-TFET-based
circuits, material parameters have been chosen to render symmetric behavior in
both the p-type and n-type devices. The Thin-TFET used here has a 1 nm undercut
and the pin-TFET has a 1 nm underlap. The complementary TFET (CTFET)
inverter is shown in Fig.3.6. We can write its charge conservation equation (see
Equation 3.2):
[Figure 3.5: (a) threshold voltage versus drain voltage; (b) device schematic
with terminals VTG, VBG, and VDS, marking the center of the channel and the
drain-side edge.]
Figure 3.5: (a) The threshold voltage versus the drain voltage for Thin-TFETs
with different undercut lengths. The red solid line is the threshold voltage
computed at the center of the channel (shown in (b)); the dashed lines are the
threshold voltages computed at the drain-side edge (shown in (b)). The
differences between the dashed lines and the red solid line indicate the
non-uniformity of the threshold voltage along the channel of Thin-TFETs.
[Figure 3.6 schematic labels: VIN, VOUT, VDD, CGD,p + CGD,n, CL, IL, IGD, Ip, In.]
Figure 3.6: The schematic layout of the complementary TFET (CTFET) inverter.
CGD,n and CGD,p are the gate-to-drain capacitances of the n-type and p-type
TFETs. CL is the load capacitance.
$$C_L \frac{dV_{OUT}}{dt} = (C_{GD,n} + C_{GD,p}) \frac{d(V_{IN} - V_{OUT})}{dt} + I_p - I_n \tag{3.2}$$
For a given sequence of VIN(t), we can solve Equation 3.2 for dVOUT/dt and
denote the result as f(t, VOUT(t), VIN(t)). Then we can solve for VOUT(t) using
the backward Euler method:
Result: The output voltage VOUT at each time step tk
for each time step tk do
    Initialization: i = 1 and V^i_OUT(tk) = VOUT(tk−1);
    while δ doesn’t meet convergence condition do
        V^{i+1}_OUT(tk) = VOUT(tk−1) + ∆t × f(tk, V^i_OUT(tk), VIN(tk));
        δ = V^{i+1}_OUT(tk) − (VOUT(tk−1) + ∆t × f(tk, V^{i+1}_OUT(tk), VIN(tk)));
        i = i + 1;
    end
    VOUT(tk) = V^i_OUT(tk);
end
Algorithm 1: The backward Euler method used to compute the transient
response of CTFET inverters
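
The iteration in Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the simulator used in this work: the function name, the linear current models `toy_i_n` and `toy_i_p`, and the values of `VDD` and `G_ON` are hypothetical placeholders for the real TFET characteristics. The function f is obtained by rearranging Equation 3.2 into (CL + CGD) dVOUT/dt = CGD dVIN/dt + Ip − In, with CGD = CGD,n + CGD,p.

```python
def simulate_inverter(v_in, dt, c_load, c_gd, i_p, i_n, v_out0,
                      tol=1e-12, max_iter=100):
    """Backward Euler transient of Eq. 3.2 with fixed-point inner iterations.

    v_in     : input voltage samples on a uniform time grid (V)
    dt       : time step (s)
    c_load   : load capacitance C_L (F per unit width)
    c_gd     : total Miller capacitance C_GD,n + C_GD,p (F per unit width)
    i_p, i_n : callables (v_in, v_out) -> current (A per unit width);
               placeholders for the device models (assumption)
    """
    v_out = [v_out0]
    for k in range(1, len(v_in)):
        dvin_dt = (v_in[k] - v_in[k - 1]) / dt
        v_prev = v_out[-1]

        def f(v):
            # Eq. 3.2 solved for dV_OUT/dt.
            return (c_gd * dvin_dt + i_p(v_in[k], v) - i_n(v_in[k], v)) / (c_load + c_gd)

        v = v_prev  # initial guess: value at the previous time step
        for _ in range(max_iter):
            v = v_prev + dt * f(v)
            delta = v - (v_prev + dt * f(v))  # residual, as in Algorithm 1
            if abs(delta) < tol:
                break
        v_out.append(v)
    return v_out


# Hypothetical linear device models, for illustration only.
VDD = 0.4     # V
G_ON = 1e-4   # S per um of width

def toy_i_n(v_in, v_out):
    return G_ON * (v_in / VDD) * v_out

def toy_i_p(v_in, v_out):
    return G_ON * (1.0 - v_in / VDD) * (VDD - v_out)


# Step input from 0 to VDD after 5 ps, simulated for 100 ps.
v_in = [0.0] * 50 + [VDD] * 950
v_out = simulate_inverter(v_in, dt=1e-13, c_load=0.4e-15, c_gd=0.2e-15,
                          i_p=toy_i_p, i_n=toy_i_n, v_out0=VDD)
```

The fixed-point inner loop only converges when ∆t times the derivative of f with respect to VOUT is below one in magnitude, so the time step has to be small compared with the RC time constant of the output node; a Newton iteration would relax that restriction.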
Ignoring the static energy dissipation of CTFET inverters due to the leakage
current, the dynamic energy dissipation can be written as:
$$E_{Dynamic} = V_{DD} \int_{V_{OUT}=0 \to V_{DD}} I_n \, dt + V_{DD} \int_{V_{OUT}=V_{DD} \to 0} I_p \, dt + E_{shortcircuit} \tag{3.3}$$
where Eshortcircuit is the energy dissipated by the short-circuit current that
flows if the n-type and p-type TFETs are ON simultaneously. Since we designed
the threshold voltage Vth,n of the n-type TFET and Vth,p of the p-type TFET
such that |Vth,n| + |Vth,p| > VDD, the short circuit
[Figure 3.7 panel annotations. Panels (a)-(d), voltage (V) versus time (ps),
with input voltage, pin-TFET, and Thin-TFET traces:
ON current density 250 µA/µm, load capacitance 0.1 fF/µm: Thin-TFET dissipates
0.08 fJ/µm, pin-TFET dissipates 0.13 fJ/µm;
ON current density 250 µA/µm, load capacitance 0.4 fF/µm: Thin-TFET dissipates
0.14 fJ/µm, pin-TFET dissipates 0.20 fJ/µm;
ON current density 25 µA/µm, load capacitance 0.1 fF/µm: Thin-TFET dissipates
0.10 fJ/µm, pin-TFET dissipates 0.15 fJ/µm;
ON current density 25 µA/µm, load capacitance 0.4 fF/µm: Thin-TFET dissipates
0.17 fJ/µm, pin-TFET dissipates 0.23 fJ/µm.
Panels (e)-(h), current density (µA/µm) versus time (ps), for pin-TFET and
Thin-TFET inverters: In (solid line), Ip (dash line).]
Figure 3.7: (a)-(d) the input and output voltages of pin-TFET- and
Thin-TFET-based CTFET inverters versus time with different ON current densities
and load capacitances; (e)-(h) the current densities of pin-TFETs and Thin-TFETs
in the inverters versus time with different ON current densities and load
capacitances.
current is eliminated. Therefore we can ignore Eshortcircuit when computing the
energy dissipation.
The transient response of the CTFET inverters is shown in Fig.3.7. The green
lines in Fig.3.7(a)-(d) are the input voltages. In order to clearly illustrate
the output voltage behavior, we use a square wave as the input voltage. In
practice, the input voltage will be the output voltage of the previous stage.
The output voltages of pin-TFETs and Thin-TFETs are shown in red. First, both of
them have significant overshoot/undershoot. The overshoot/undershoot happens
when there is a substantially large capacitance directly connecting the input
and output, namely CGD. When the input voltage ramps up from low to high, the
capacitance between the input and output attempts to keep the potential
difference between the input and the output, which leads to a displacement
current that flows from the input to the output via CGD and starts to charge
CL. Consequently, the output voltage is pushed above VDD before the ON current
of the n-type TFET pulls the output voltage down. The undershoot can be
explained in a similar way. The overshoot/undershoot also increases the
propagation delay, which measures the time between the signal edges (i.e. the
VDD/2 crossings) of the input and output voltages. In Fig.3.7(a)-(d), we show
that Thin-TFET-based CTFET inverters have smaller overshoot/undershoot voltages
than pin-TFET-based CTFET inverters, since the Thin-TFET has a smaller CGD.
Besides CGD, a smaller CL and a higher ON current density can also help to
reduce the overshoot/undershoot voltage. Since the energy dissipation is
determined by the integral of current over time, the areas under the curves in
Fig.3.7(e)-(h) are proportional to the energy dissipation. Comparing Thin-TFETs
and pin-TFETs, the Thin-TFET-based CTFET inverter dissipates around 30% less
energy and has a shorter propagation delay due to its smaller CGD.
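
As a quick arithmetic check of the roughly 30% figure, the energies annotated in the panels of Fig.3.7(a)-(d) can be turned into relative savings. The sketch below only restates those annotated values; the variable and function names are illustrative.

```python
# Energies per unit width (fJ/um) as annotated in Fig. 3.7(a)-(d), keyed by
# (ON current density in uA/um, load capacitance in fF/um).
energy = {
    (250, 0.1): {"thin": 0.08, "pin": 0.13},
    (250, 0.4): {"thin": 0.14, "pin": 0.20},
    (25, 0.1): {"thin": 0.10, "pin": 0.15},
    (25, 0.4): {"thin": 0.17, "pin": 0.23},
}

def relative_saving(e_thin, e_pin):
    """Fraction of energy saved by the Thin-TFET inverter."""
    return 1.0 - e_thin / e_pin

savings = {cond: relative_saving(e["thin"], e["pin"])
           for cond, e in energy.items()}
mean_saving = sum(savings.values()) / len(savings)

for (i_on, c_l), s in savings.items():
    print(f"I_ON = {i_on} uA/um, C_L = {c_l} fF/um: {100 * s:.0f}% less energy")
print(f"mean saving: {100 * mean_saving:.0f}%")
```

The four cases give savings between roughly 26% and 38%, with a mean close to 32%, which is consistent with the "around 30%" quoted above.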
3.5 Conclusion
Due to its vertical stacking structure, a Thin-TFET intrinsically has a much
smaller CGD than a pin-TFET. With the mitigated Miller effect, Thin-TFET
inverters dissipate around 30% less energy than pin-TFET inverters. This
finding will help to guide future designs of TFET structures.
BIBLIOGRAPHY
1 H. Ilatikhameneh, Y. Tan, B. Novakovic, G. Klimeck, R. Rahman, and
J. Appenzeller, “Tunnel field-effect transistors in 2-D transition metal
dichalcogenide materials,” IEEE Journal on Exploratory Solid-State
Computational Devices and Circuits, vol. 1, pp. 12–18, Dec 2015.
2 M. O. Li, D. Esseni, J. J. Nahas, D. Jena, and H. G. Xing, “Two-dimensional
heterojunction interlayer tunneling field effect transistors (Thin-TFETs),” IEEE
Journal of the Electron Devices Society, vol. 3, no. 3, pp. 200–207, 2015.
3 S. Mookerjea, R. Krishnan, S. Datta, and V. Narayanan, “On enhanced Miller
capacitance effect in interband tunnel transistors,” IEEE Electron Device Letters,
vol. 30, no. 10, pp. 1102–1104, 2009.
4 W. Li, H. Liu, S. Wang, and S. Chen, “Reduced Miller capacitance in U-shaped
channel tunneling FET by introducing heterogeneous gate dielectric,” IEEE
Electron Device Letters, vol. 38, no. 3, pp. 403–406, 2017.
5 G. Zhou, R. Li, T. Vasen, M. Qi, S. Chae, Y. Lu, Q. Zhang, H. Zhu, J.-M. Kuo,
T. Kosel et al., “Novel gate-recessed vertical InAs/GaSb TFETs with record high
ION of 180 µA/µm at VDS = 0.5 V,” in Electron Devices Meeting (IEDM), 2012 IEEE
International. IEEE, 2012, pp. 32–6.
6 Y. Lu, G. Zhou, R. Li, Q. Liu, Q. Zhang, T. Vasen, S. Doo Chae, T. Kosel,
M. Wistey, H. Xing, A. Seabaugh, and P. Fay, “Performance of AlGaSb/InAs
TFETs with gate electric field and tunneling direction aligned,” Electron Device
Letters, IEEE, vol. 33, no. 5, pp. 655–657, May 2012.
7 T. Krishnamohan, D. Kim, S. Raghunathan, and K. Saraswat, “Double-gate
strained-Ge heterostructure tunneling FET (TFET) with record high drive currents
and 60mV/dec subthreshold slope,” in 2008 IEEE International Electron Devices
Meeting, Dec 2008, pp. 1–3.
8 M. Li, “Leno device simulator,” https://github.com/Oscarlight/leno/tree/
master/Leno beta1.5, 2015.
9 L. De Michielis, L. Lattanzio, and A. M. Ionescu, “Understanding the
superlinear onset of tunnel-FET output characteristic,” 2012.
CHAPTER 4
XNOR-ENABLED TRANSISTOR (TRANSIXNOR) FOR BINARIZED
NEURAL NETWORK ACCELERATOR
4.1 Introduction
In recent years, deep neural networks (DNNs) have become an important class
of machine learning algorithms and have achieved substantial improvements in a
wide range of tasks, including object recognition in images,1, 2 speech
recognition,3 machine translation,4 image generation,5 and game playing.6, 7
However, state-of-the-art DNNs have many parameters and a high computational
cost. For mobile and embedded systems in particular, the size of the model and
the energy consumption during inference are crucial. Numerous studies have been
conducted to provide efficient hardware designs for DNNs.8 Because deep
learning is data intensive, data movement becomes the speed bottleneck and
dominates the energy consumption. The concept known as processing-in-memory
aims at bringing the memory closer to the computation. Among various
non-volatile memory (NVM) based architectures, the resistive RAM (RRAM)
crossbar array allows the analog matrix-vector multiplication to be computed in
constant time O(1), and therefore provides massive acceleration of the forward
and backward passes of DNNs with reduced power consumption and increased
integration density.9 In order to accelerate the weight update in RRAM
crossbars, Gokmen et al.10 proposed to significantly simplify the
multiplication operation itself by using a stochastic computing technique, thus
achieving O(1) time complexity for the weight update cycle of the training
algorithm. However, processing fixed-point numbers with RRAMs has several
drawbacks, including resistance variations, stuck-at faults, and AD/DA
overheads.11 Chen et al.11 developed a weight-memristor mapping algorithm based
on bipartite matching and the self-healing capability of neural networks to
improve the precision.
On the other hand, the energy and area costs of computation are reduced
rapidly by decreasing the number of bits used to represent the weights and the
activations.12 The recently introduced binarized neural networks (BNNs) with
binary weights and activations13–17 turn the most computationally expensive
convolutions into bitwise operations and dramatically shrink the model
size. Many efforts have been made to design specific hardware to accelerate
BNNs. Zhong et al.18 and Umuroglu et al.19 contributed to building fast and
flexible FPGA accelerators for BNNs. Tutu et al.20 built the specialization tier
for BNNs in the Celerity chip. The performance of BNNs on different hardware
platforms such as FPGA, CPU, GPU, and ASIC was compared by Eriko et al.21
The RRAM crossbar architectures mentioned above can further support
BNNs.22–24 On the one hand, the processing-in-memory capability of RRAM
crossbars enables faster and more energy-efficient implementations of BNNs
along with smaller chip areas; on the other hand, BNNs use single-bit RRAM
devices, which can tolerate more variation and are more reliable than multi-bit
RRAM devices. Moreover, compared to the multi-bit RRAM crossbar, the single-bit
RRAM crossbar is more energy efficient during computation and requires no
AD/DA overhead.
At the heart of deep neural networks is general matrix-matrix multiplication
(GEMM) or general matrix-vector multiplication (GEMV). Both the forward and
backward passes of the fully-connected layer and the convolution layer can be
[Figure 4.1 labels: input lines Xm,k-1, Xm,k, Xm,k+1; weights Wk-1,n-1 through
Wk+1,n+1; output lines Ym,n-1, Ym,n, Ym,n+1.]
Figure 4.1: The RRAM crossbar architecture. X is the input voltage signals, W
is the weight matrix whose elements are the RRAM conductances, and Y is the
output current signals. The relationship among X, W, and Y is given in Eq.4.1.
built on GEMM or GEMV. In an RRAM crossbar architecture,23 parallel GEMV
is performed by using the conductances of the RRAM devices as the weight
matrix W, the input voltage signals as the input vector X, and the output
current signals as the output vector Y (shown in Fig.4.1).
The relationship between the input vector X and the output vector Y can be
expressed as in Eq.4.1:
$$Y_{m,n} = \sum_k X_{m,k} \cdot W_{k,n} \tag{4.1}$$
When the input vector X and the weight matrix W are binary (i.e. Xm,k, Wk,n ∈
{0, 1}), the multiplication in Equation 4.1 is replaced by XNOR (denoted as ⊗).
However, XNOR cannot be implemented directly with RRAM. Therefore XNOR is
implemented as in Eq.4.2:
$$Y_{m,n} = \sum_k X_{m,k} \otimes W_{k,n} = \sum_k \left( X_{m,k} \cdot W_{k,n} + \overline{X_{m,k}} \cdot \overline{W_{k,n}} \right) \tag{4.2}$$
Curious readers may wonder why binary multiplication is equivalent to
XNOR. In BNNs,13–15 the binarization is done by constraining the variables
to +1 and -1 (instead of 1 and 0). If we define +1 as true and -1 as false, it
is clear that binary multiplication with +1 and -1 is equivalent to XNOR, as
shown in Eq.4.3.
$$\begin{aligned}
1 \cdot 1 &= 1 \otimes 1 = 1 \\
-1 \cdot 1 &= -1 \otimes 1 = -1 \\
1 \cdot -1 &= 1 \otimes -1 = -1 \\
-1 \cdot -1 &= -1 \otimes -1 = 1
\end{aligned} \tag{4.3}$$
In the circuit, we normally use 1 and 0 instead of 1 and -1. Fortunately, these
two representations are interchangeable through Eq.4.4.
$$2 \sum_{i=1}^{N} a_i \otimes b_i - N = \sum_{i=1}^{N} c_i \otimes d_i, \qquad a_i, b_i \in \{0, 1\},\; c_i, d_i \in \{-1, 1\} \tag{4.4}$$
We will keep the convention in circuit design by using 1 and 0, but follow
the binary multiplication rule of -1 and 1, which is the XNOR operation.
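
The identities in Eq.4.2 to Eq.4.4 can be checked exhaustively for short bit vectors. The sketch below is illustrative: the helper names are hypothetical, and the mapping 0 → -1, 1 → +1 is assumed for relating the two representations (it follows the true/false convention described above).

```python
import itertools

def xnor(a, b):
    """XNOR of two bits in {0, 1}."""
    return 1 if a == b else 0

def xnor_via_products(a, b):
    """Eq. 4.2 form: a*b + (1 - a)*(1 - b), using the bit complements."""
    return a * b + (1 - a) * (1 - b)

def to_pm1(bit):
    """Map {0, 1} to {-1, +1} (assumed mapping: 0 -> -1, 1 -> +1)."""
    return 2 * bit - 1

def eq_4_4_holds(a_bits, b_bits):
    """Check 2 * sum(a XNOR b) - N == sum(c * d) for one pair of vectors."""
    n = len(a_bits)
    lhs = 2 * sum(xnor(a, b) for a, b in zip(a_bits, b_bits)) - n
    rhs = sum(to_pm1(a) * to_pm1(b) for a, b in zip(a_bits, b_bits))
    return lhs == rhs

# Eq. 4.2: two multiplications, one addition, and two bit complements.
products_ok = all(xnor(a, b) == xnor_via_products(a, b)
                  for a, b in itertools.product((0, 1), repeat=2))

# Eq. 4.4: checked for every pair of bit vectors of length 4.
popcount_ok = all(eq_4_4_holds(a, b)
                  for a in itertools.product((0, 1), repeat=4)
                  for b in itertools.product((0, 1), repeat=4))
```

Both flags evaluate to true, which is why a {0, 1} XNOR array can stand in for the ±1 multiplications of a BNN up to the fixed affine correction 2(·) − N.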
Since the RRAM crossbar architecture doesn’t natively support XNOR, we
need in total 2 multiplications, 1 addition, and 2 bit complements. In an RRAM
crossbar, the multiplications and additions can be done in parallel, so
there is no extra time consumption. From Eq.4.2, it may seem that implementing
XNOR requires twice as many RRAMs in the crossbar as its multi-bit counterpart.
Note, however, that in order to represent negative weights in the multi-bit
RRAM, each weight is implemented by a pair of RRAM devices.25 Therefore there
is no extra area penalty. Taking the bit complement, however, may become a
potential overhead.
What if we could use a single device to compute XNOR? It would lead to around
50% area saving and a potentially faster and more energy-efficient
implementation. Due to channel-to-drain tunneling, TFETs are known to have
ambipolar behavior, which is considered undesirable in logic circuits.26–31
However, the ambipolar behavior can enable interesting logic operations, such as
exclusive or (XOR) and its complement XNOR. In this chapter, we utilize the
ambipolar behavior in TFETs to propose a novel device, TransiXNOR, a dual-gate
XNOR-enabled transistor based on Zener tunneling. We then propose to integrate
it with a non-volatile memory, for instance RRAM, and use it as the building
block of a new crossbar architecture that computes binary GEMV by utilizing the
unique XNOR functionality of TransiXNORs.
4.2 Dual-gated XNOR-enabled Transistor: TransiXNOR
4.2.1 Device Working Principle
The schematic structure of TransiXNOR is shown in Fig.4.2(a). The structure
resembles a double-gated tunnel field effect transistor (DG-TFET). But there are
[Figure 4.2 panel labels: (a) Top Gate VTG, Gate Oxide, Source, Channel, Drain
VDS, Gate Oxide, Bottom Gate VBG; (b) band diagrams (EC, EV, qVDS, S, D) at
VTG = VBG = 0, at VTG = VBG = VDD/2 or VT(B)G = 0, VB(T)G = VDD, and at
VTG = VBG = VDD; (c) ON/OFF map with Top Gate and Bottom Gate axes from 0 to
VDD.]
Figure 4.2: (a) The schematic structure of TransiXNOR; (b) the band diagrams
at different gate bias conditions of TransiXNOR when VDS equals VDD: (left) the
channel/drain tunnel junction is ON when both VTG and VBG are 0; (right) the
source/channel tunnel junction is ON when both VTG and VBG are VDD; (middle)
both the channel/drain tunnel junction and source/channel tunnel junction are
OFF at the bias conditions such as both VTG and VBG are VDD/2, or one gate
is VDD and the other is 0; (c) The schematic mapping of transiXNOR ON/OFF
states at different VTG and VBG when VDS is VDD, which resembles XNOR logic.
three major differences:
1. The top gate and bottom gate are biased independently;
2. The channel has to be thin enough such that the top gate and bottom gate
control the same conducting channel;
3. The tunneling current plane is alternated between the source/channel
junction and channel/drain junction.
The working principle of TransiXNOR is shown in Fig.4.2(b): when both VTG
and VBG are biased at zero and VDS is biased at VDD, the channel is
electrostatically p-doped such that the valence band edge of the channel is
above the conduction band edge of the drain. Therefore, the channel/drain
tunnel junction is ON. On the other hand, when both VTG and VBG are biased at
VDD and VDS is biased at VDD, the channel is gated to be n-type such that the
conduction band edge of the channel is below the valence band edge of the
source. Therefore, the source/channel tunnel junction is ON. However, when both
VTG and VBG are biased at VDD/2, or one gate is at VDD and the other gate at 0,
both the source/channel junction and the channel/drain junction are OFF.
Therefore, if we map the TransiXNOR ON/OFF states with respect to VTG
and VBG at VDS = VDD (shown in Fig.4.2(c)), TransiXNOR is ON only when VTG
and VBG are either both low or both high, and OFF otherwise. This behavior is
precisely XNOR logic.
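
The ON/OFF map described above can be illustrated with a toy model. This is a sketch under strong assumptions, not a device simulation: it assumes the channel potential follows the average of the two gate voltages, the turn-on levels `low` and `high` are hypothetical, and VDD = 0.2 V is taken from the bias used in the simulations of Section 4.2.3.

```python
VDD = 0.2  # V, supply voltage used in the simulations of this chapter

def transixnor_state(v_tg, v_bg, low=0.25 * VDD, high=0.75 * VDD):
    """Toy ON/OFF map of TransiXNOR at VDS = VDD.

    Assumption: the thin channel responds to the average gate voltage.
    `low` and `high` are hypothetical turn-on levels of the channel/drain
    and source/channel tunnel junctions, respectively.
    """
    v_avg = 0.5 * (v_tg + v_bg)
    if v_avg <= low:
        return "ON"   # channel/drain junction conducts
    if v_avg >= high:
        return "ON"   # source/channel junction conducts
    return "OFF"      # neither junction has a tunneling window

# Logic inputs 0/1 map to gate voltages 0/VDD.
truth_table = {(a, b): transixnor_state(a * VDD, b * VDD)
               for a in (0, 1) for b in (0, 1)}
```

In this toy model the device is ON for inputs (0, 0) and (1, 1) and OFF for (0, 1), (1, 0), and for both gates at VDD/2, which reproduces the XNOR map of Fig.4.2(c).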
4.2.2 Simulation Approach
To demonstrate the concept of TransiXNOR, we choose 2 quintuple-layer (2QL)
Bi2Se3 (Ref. 32) as an example channel material. The simulated device structure
is shown in Fig.4.2(a). Following the Bi2Se3 TFET simulation by Zhang et al.,28
the 2QL Bi2Se3 channel has a thickness of 1.4 nm and a relative static
dielectric constant εr of 100 (from bulk Bi2Se3, Ref. 33). The bandgap of 2QL
Bi2Se3 is 0.252 eV and its electron/hole effective masses are 0.124/2.23 m0
(Refs. 28, 32). The source is p-doped with a fixed negative charge concentration
of 5.5×10^13 cm^-2 and the drain is n-doped with a fixed positive charge
concentration of 3.8×10^12 cm^-2. We define the source degeneracy
∆ES = EV - EFS and the drain degeneracy ∆ED = EFD - EC. The gate length is 18 nm
and the source/drain regions are 10 nm long. The top and bottom gate oxides are
both 1.1 nm HfO2 with εr of 25. The workfunction of the gate metals is 0.215 eV
above the conduction band edge EC of Bi2Se3. Ballistic transport is solved
self-consistently with the 2D Poisson equation, within the non-equilibrium
Green’s function (NEGF) formalism, using the NanoTCAD ViDES simulation
environment.34
4.2.3 Results and discussion
The transfer characteristics and the corresponding band diagrams and current
spectra are shown in Fig.4.3. As discussed in Section 4.2.1, when VDS = 0.2
V and VBG = 0.2 V, the IDS vs. VTG curve resembles that of an n-type TFET
(shown in Fig.4.3(a.1)). When both VTG and VBG are biased at 0.2 V, the valence
band edge EV in the source is above the conduction band edge EC in the channel
and tunneling happens at the source/channel junction (shown in Fig.4.3(a.2)).
On the other hand, when VDS = 0.2 V and VBG = 0 V, the IDS vs. VTG curve
resembles that of a p-type TFET (see Fig.4.3(c.1)). When both VTG and VBG are
biased at 0 V, EV in the channel is above EC in the drain and tunneling happens
at the channel/drain junction (shown in Fig.4.3(c.2)). Therefore, the TransiXNOR
is ON when both VTG and VBG are biased at 0 V or at 0.2 V. When VDS = 0.2 V and
VBG = 0.1 V, the IDS vs. VTG curve shows ambipolar operation (shown in
Fig.4.3(b.1)). The asymmetry between the n and p branches is due to the
difference between the conduction and valence band effective masses of 2QL
Bi2Se3. The drain current is minimal when both VTG and VBG are biased at 0.1 V,
or when one gate is at 0.2 V and the other at 0 V, since there is no tunneling
window at either the source/channel junction or the channel/drain junction
(shown in Fig.4.3(b.1)).
The key to TransiXNOR design is to manage the three major components
[Figure 4.3 panels, at VBG = 0 V, 0.1 V, and 0.2 V with VDS = 0.2 V: (x.1) IDS
(µA/µm) versus VTG (V); (x.2) energy (eV) versus y (nm), showing EC, EV, EFS,
EFD, source, and drain; (x.3) energy (eV) versus JE (A/m/eV), with current
components A, B, and C.]
Figure 4.3: (a.1) The I-V curve of the drain current IDS versus VTG when VBG = 0
V and VDS = 0.2 V; (a.2) The band diagram and (a.3) the current spectrum when
VDS = 0.2 V and both VTG = VBG = 0 V. (b.1) The I-V curve of the drain current
IDS versus VTG when VBG = 0.1 V and VDS = 0.2 V; (b.2) The band diagram and
(b.3) the current spectrum when VDS = 0.2 V and both VTG = VBG = 0.1 V. (c.1)
The I-V curve of the drain current IDS versus VTG when VBG = 0.2 V and VDS =
0.2 V; (c.2) The band diagram and (c.3) the current spectrum when VDS = 0.2 V
and both VTG = VBG = 0.2 V.
of the drain current: A) electron thermionic current, B) inter-band tunneling
current, and C) hole thermionic current. In the OFF state (shown in
Fig.4.3(b.3)), sufficiently high doping levels in both source and drain are
required in order to reduce the thermionic currents (A and C). However, if the
doping levels are too high, the tunneling energy window (∆ES + ∆ED + qVDS)
becomes larger than the channel bandgap EG,Channel and the TransiXNOR cannot be
turned off. A sufficiently long gate length is necessary to reduce the direct
inter-band tunneling from source to drain. The workfunction of the gate metals
is designed to minimize the drain current when both VTG and VBG are biased at
VDD/2, or when one gate is at VDD and the other at 0. In the ON state, the
inter-band tunneling current (B) peaks at either the source/channel junction or
the channel/drain junction (shown in Fig.4.3(a.3, c.3)). VDD has to be large
enough that both the source/channel and the channel/drain junctions can be
turned ON, while qVDD has to be smaller than the channel bandgap so that the
tunneling energy window (∆ES + ∆ED + qVDS) is smaller than EG,Channel when
VDS = VDD. Therefore, qVDD should be chosen to be slightly smaller than
EG,Channel - ∆ES - ∆ED.
The output characteristics and the corresponding band diagrams and current
spectra are shown in Fig.4.4. When VBG = 0.2 V, the output characteristics of
TransiXNOR resemble those of an ordinary n-type TFET, as shown in Fig.4.4(a.1).
Comparing the band diagrams at the same gate biases but different VDS in
Fig.4.4(a.2) and Fig.4.3(a.2), the tunneling energy window of the
source/channel junction remains virtually unchanged, so its inter-band
tunneling current stays almost the same when VDS changes from 0.2 V to 0.1 V
(shown in Fig.4.4(a.3)). On the other hand, when VBG = 0 V, the channel/drain
junction is used as the tunnel junction, and VDS inevitably affects its
tunneling energy window. Therefore, when VBG = 0 V, the output characteristics
of TransiXNOR resemble those of a p-type tunnel diode. Comparing the band
diagrams at the same gate biases but different VDS in Fig.4.4(b.2) and
Fig.4.3(c.2), the tunneling energy window of the channel/drain junction
decreases when VDS changes from 0.2 V to 0.1 V, and so does its inter-band
tunneling current (shown in Fig.4.4(b.3)).
[Figure 4.4 panels: (a.1), (b.1) IDS (µA/µm) versus VDS (V) for VTG = 0, 0.05,
0.10, 0.15, and 0.20 V; (a.2), (b.2) energy (eV) versus y (nm), showing EC, EV,
EFS, EFD, source, and drain; (a.3), (b.3) energy (eV) versus JE (A/m/eV), with
current components A, B, and C.]
Figure 4.4: (a.1) The family characteristic of TransiXNOR with various VTG at
VBG = 0.2 V; (a.2) The band diagram and (a.3) the current spectrum when VDS =
0.1 V and both VTG = VBG = 0.2 V. (b.1) The family characteristic of TransiXNOR
with various VTG at VBG = 0 V; (b.2) The band diagram and (b.3) the current
spectrum when VDS = 0.1 V and both VTG = VBG = 0 V.
A grid of drain current maps with different VTG and VBG at different VDS is
shown in Fig.4.5. Because of the diode-like output characteristics stemming
from the channel/drain tunnel junction, TransiXNOR exhibits XNOR behavior
when VDS is larger than VDD/2 but AND behavior when VDS is smaller than
VDD/2.
4.3 TransiXNOR Crossbar Architecture for Binary Matrix-Vector Multiplication
We utilize both the TransiXNOR and the resistive RAM (RRAM) to build an
XNOR cell, shown in Fig.4.6.
Figure 4.5: A grid of 2D maps of IDS along both the VTG and VBG axes at different
VDS. The color represents the current density on a logarithmic scale. When VDS
is larger than 0.1 V (half of VDD), the TransiXNOR resembles XNOR logic; when
VDS is smaller than 0.1 V, the TransiXNOR resembles AND logic.
In the cell, the bit line and word line are used to write to the RRAM. The
RRAM is in series with a regular resistor R. The RRAM is set to either the low
resistance state RL or the high resistance state RH. After mapping each element
Wk,n of the 2D binary weight matrix W to either the RL or RH state of each RRAM,
the word line is set floating and the bit line is set to ground. During
computation, the source line is set to VDD. The resistance R of the regular
resistor is designed such that the bottom gate voltage of the TransiXNOR is
close to zero when the RRAM is in the low resistance state RL and close to VDD
when the RRAM is in the high resistance state RH. Given the RRAM ON/OFF
resistance ratio k (i.e. RH/RL), the optimal R to maximize the margin between
the high and low bottom gate voltages is √k·RL,
[Figure 4.6 labels: Xk, Wk,n, Input Line, Output Line, Bit Line, Word Line,
Source Line, RRAM, R, TransiXNOR.]
Figure 4.6: The XNOR cell built with TransiXNOR and RRAM. The bit line and
word line are used to write to the RRAM. The RRAM is in series with a regular
resistor. After writing each element Wk,n of the 2D binary weight matrix W to
each RRAM, the word line is set floating and the bit line is set to ground.
During computation, the source line is set to VDD, and each element Xk of the
input vector X is set through each input line in parallel. The current entering
the output line represents the XNOR result of Wk,n and Xk.
[Figure 4.7 labels: input lines Xk-1, Xk, Xk+1; cells Wk-1,n-1 through Wk+1,n+1,
each with an RRAM and a resistor R; output lines Yn-1, Yn, Yn+1.]
Figure 4.7: The XNOR array to compute Y=W×X in constant time.
namely √k times the low resistance RL. When the ratio k is 10^2, 10^3, and
10^4, the high bottom gate voltage is 0.91VDD, 0.97VDD, and 0.99VDD,
respectively; the low bottom gate voltage is 0.09VDD, 0.03VDD, and 0.01VDD,
respectively. Since the ON/OFF resistance ratio of RRAM can be larger than 10^4
(Ref. 35), the bottom gate voltage has a large enough margin between the high
and low levels to correctly perform the XNOR logic. Each element Xk of the input
vector X is set as the voltage signal on each input line in parallel. The
current entering the output line represents the XNOR result of Wk,n and Xk.
Putting the XNOR cell shown in Fig.4.6 into an array, we get the XNOR array
(shown in Fig.4.7), which computes binary GEMV in constant time. The current
emerging from Yn is the sum of the currents through each XNOR cell along the
output line, by Kirchhoff’s current law. Therefore, Y, which is W×X, can be
read out in parallel from the output lines.
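
The choice R = √k·RL and the resulting gate levels can be reproduced with a few lines of arithmetic. The sketch below assumes a plain resistive divider in which the bottom gate sees the voltage drop across the RRAM, VBG/VDD = R_RRAM/(R_RRAM + R); this topology and the function names are assumptions made for illustration. Setting the derivative of the margin k/(k + r) − 1/(1 + r) with respect to r = R/RL to zero gives r = √k.

```python
import math

def gate_levels(k, r=None):
    """Return (high, low) bottom gate voltages as fractions of VDD.

    Resistances are normalized to R_L, so R_L = 1 and R_H = k.
    r is the series resistance R / R_L; it defaults to sqrt(k).
    Assumed divider: V_BG / VDD = R_RRAM / (R_RRAM + R).
    """
    if r is None:
        r = math.sqrt(k)
    high = k / (k + r)      # RRAM in the high resistance state R_H
    low = 1.0 / (1.0 + r)   # RRAM in the low resistance state R_L
    return high, low

def margin(k, r):
    """Difference between the high and low gate levels, in units of VDD."""
    high, low = gate_levels(k, r)
    return high - low

for k in (1e2, 1e3, 1e4):
    high, low = gate_levels(k)
    print(f"k = {k:.0e}: high = {high:.2f} VDD, low = {low:.2f} VDD")
```

Under this divider the high and low levels always sum to VDD at the optimum, so a larger ON/OFF ratio k widens the margin symmetrically around VDD/2.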
4.4 Conclusion
In this chapter, we proposed an XNOR-enabled transistor: TransiXNOR.
TransiXNOR is based on a double-gate lateral TFET structure, but it uses not
only the source/channel junction as a tunnel junction but also the
channel/drain junction. These dual junctions enable TransiXNOR to be ON if and
only if the top and bottom gate voltages are both high or both low.
Since binary multiplication with -1 and 1 is equivalent to the XNOR operation,
we proposed a TransiXNOR cell with integrated RRAM to compute the binary
product between the memory state in the RRAM and the input voltage signal. By
integrating the TransiXNOR cells into a crossbar architecture, we can compute
the binary GEMV in constant time, which can be used to greatly accelerate
binarized neural networks.
BIBLIOGRAPHY
1 Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no.
7553, pp. 436–444, 2015.
2 K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recog-
nition,” in Proceedings of the IEEE conference on computer vision and pattern recog-
nition, 2016, pp. 770–778.
3 D. Amodei, S. Ananthanarayanan, R. Anubhai, J. Bai, E. Battenberg, C. Case,
J. Casper, B. Catanzaro, Q. Cheng, G. Chen et al., “Deep speech 2: End-to-
end speech recognition in english and mandarin,” in International Conference
on Machine Learning, 2016, pp. 173–182.
4 D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly
learning to align and translate,” arXiv preprint arXiv:1409.0473, 2014.
5 I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair,
A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in neu-
ral information processing systems, 2014, pp. 2672–2680.
6 V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra,
and M. Riedmiller, “Playing atari with deep reinforcement learning,” arXiv
preprint arXiv:1312.5602, 2013.
7 S. Singh, A. Okun, and A. Jackson, “Artificial intelligence: Learning to play
go from scratch,” Nature, vol. 550, no. 7676, p. 550336a, 2017.
8 V. Sze, Y.-H. Chen, T.-J. Yang, and J. Emer, “Efficient processing of deep neural
networks: A tutorial and survey,” arXiv preprint arXiv:1703.09039, 2017.
9 Y. Wang, L. Xia, M. Cheng, T. Tang, B. Li, and H. Yang, “Rram based learning
acceleration,” in Compilers, Architectures, and Synthesis of Embedded Systems
(CASES), 2016 International Conference on. IEEE, 2016, pp. 1–2.
10 T. Gokmen and Y. Vlasov, “Acceleration of deep neural network training with
resistive cross-point devices,” arXiv preprint arXiv:1603.07341, 2016.
11 L. Chen, J. Li, Y. Chen, Q. Deng, J. Shen, X. Liang, and L. Jiang, “Accelerator-
friendly neural-network training: Learning variations and defects in rram
crossbar,” in 2017 Design, Automation & Test in Europe Conference & Exhibition
(DATE), 2017, pp. 19–24.
12 M. Horowitz, “1.1 computing’s energy problem (and what we can do about
it),” in Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014
IEEE International. IEEE, 2014, pp. 10–14.
13 M. Courbariaux, Y. Bengio, and J.-P. David, “Binaryconnect: Training deep
neural networks with binary weights during propagations,” in Advances in
Neural Information Processing Systems, 2015, pp. 3123–3131.
14 I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, “Binarized
neural networks,” in Advances in neural information processing systems, 2016,
pp. 4107–4115.
15 M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, “Xnor-net: Imagenet
classification using binary convolutional neural networks,” in European Con-
ference on Computer Vision. Springer, 2016, pp. 525–542.
16 S. Zhou, Y. Wu, Z. Ni, X. Zhou, H. Wen, and Y. Zou, “Dorefa-net: Training low
bitwidth convolutional neural networks with low bitwidth gradients,” arXiv
preprint arXiv:1606.06160, 2016.
17 W. Tang, G. Hua, and L. Wang, “How to train a compact binary neural net-
work with high accuracy?” in AAAI, 2017, pp. 2625–2631.
18 R. Zhao, W. Song, W. Zhang, T. Xing, J.-H. Lin, M. Srivastava, R. Gupta,
and Z. Zhang, “Accelerating Binarized Convolutional Neural Networks with
Software-Programmable FPGAs,” Int’l Symp. on Field-Programmable Gate Ar-
rays (FPGA), Feb 2017.
19 Y. Umuroglu, N. J. Fraser, G. Gambardella, M. Blott, P. Leong, M. Jahre,
and K. Vissers, “Finn: A framework for fast, scalable binarized neural
network inference,” in Proceedings of the 2017 ACM/SIGDA International
Symposium on Field-Programmable Gate Arrays, ser. FPGA ’17. New
York, NY, USA: ACM, 2017, pp. 65–74. [Online]. Available: http:
//doi.acm.org/10.1145/3020078.3021744
20 T. Ajayi, K. Al-Hawaj, A. Amarnath, S. Dai, S. Davidson, P. Gao, G. Liu,
A. Lotfi, J. Puscar, A. Rao et al., “Celerity: An open-source risc-v tiered ac-
celerator fabric,” in Symp. on High Performance Chips (Hot Chips), 2017.
21 E. Nurvitadhi, D. Sheffield, J. Sim, A. Mishra, G. Venkatesh, and D. Marr,
“Accelerating binarized neural networks: Comparison of fpga, cpu, gpu, and
asic,” in Field-Programmable Technology (FPT), 2016 International Conference on.
IEEE, 2016, pp. 77–84.
22 L. Ni, Z. Liu, H. Yu, and R. Joshi, “An energy-efficient digital reram-crossbar
based cnn with bitwise parallelism,” IEEE Journal on Exploratory Solid-State
Computational Devices and Circuits, 2017.
23 T. Tang, L. Xia, B. Li, Y. Wang, and H. Yang, “Binary convolutional neural
network on rram,” in Design Automation Conference (ASP-DAC), 2017 22nd Asia
and South Pacific. IEEE, 2017, pp. 782–787.
24 S. Yu, Z. Li, P. Y. Chen, H. Wu, B. Gao, D. Wang, W. Wu, and H. Qian, “Binary
neural network with 16 mb rram macro chip for classification and online train-
ing,” in 2016 IEEE International Electron Devices Meeting (IEDM), Dec 2016, pp.
16.2.1–16.2.4.
25 M. Prezioso, F. Merrikh-Bayat, B. Chakrabarti, and D. Strukov, “Rram-based
hardware implementations of artificial neural networks: progress update and
challenges ahead,” in Proc. of SPIE Vol, vol. 9749, 2016, pp. 974 918–1.
26 U. E. Avci, R. Rios, K. Kuhn, and I. A. Young, “Comparison of performance,
switching energy and process variations for the tfet and mosfet in logic,” in
VLSI Technology (VLSIT), 2011 Symposium on. IEEE, 2011, pp. 124–125.
27 T. Krishnamohan, D. Kim, S. Raghunathan, and K. Saraswat, “Double-gate
strained-ge heterostructure tunneling fet (tfet) with record high drive currents
and 60mv/dec subthreshold slope,” in 2008 IEEE International Electron Devices
Meeting, Dec 2008, pp. 1–3.
28 Q. Zhang, G. Iannaccone, and G. Fiori, “Two-dimensional tunnel transistors
based on bi2se3 thin film,” IEEE Electron Device Letters, vol. 35, no. 1, pp. 129–
131, 2014.
29 C. Anghel, A. Gupta, A. Amara, A. Vladimirescu et al., “30-nm tunnel fet with
improved performance and reduced ambipolar current,” IEEE Transactions on
Electron Devices, vol. 58, no. 6, pp. 1649–1654, 2011.
30 S. Sahay and M. J. Kumar, “Controlling the drain side tunneling width to re-
duce ambipolar current in tunnel fets using heterodielectric box,” IEEE Trans-
actions on Electron Devices, vol. 62, no. 11, pp. 3882–3886, 2015.
31 J. Wu and Y. Taur, “Reduction of tfet off-current and subthreshold swing by
lightly doped drain,” IEEE Transactions on Electron Devices, vol. 63, no. 8, pp.
3342–3345, 2016.
32 Y. Zhang, K. He, C.-Z. Chang, C.-L. Song, L.-L. Wang, X. Chen, J.-F. Jia,
Z. Fang, X. Dai, W.-Y. Shan et al., “Crossover of the three-dimensional topo-
logical insulator bi2se3 to the two-dimensional limit,” Nature Physics, vol. 6,
no. 8, pp. 584–588, 2010.
33 H. Köhler and C. R. Becker, “Optically active lattice vibrations in bi2se3,”
physica status solidi (b), vol. 61, no. 2, pp. 533–537, 1974.
34 G. Fiori and G. Iannaccone, “Nanotcad vides,” May 2016 [Online].
35 S. H. Jo, T. Kumar, C. Zitlaw, and H. Nazarian, “Self-limited rram with on/off
resistance ratio amplification,” in VLSI Technology (VLSI Technology), 2015 Sym-
posium on. IEEE, 2015, pp. T128–T129.
CHAPTER 5
ARTIFICIAL NEURAL NETWORKS (ANNS) FOR DEVICE COMPACT
MODELING
5.1 Introduction
Device compact modeling bridges device research and its applications, allowing
circuit-level simulation before the hardware is production-ready. The
predominant compact models are physics-based,1, 2 where fundamental device
physics provides the building blocks, and hand-crafted empirical equations
modify and merge the physical expressions into smooth analytical functions.
However, developing a high-quality physics-based compact model is very
expensive and time-consuming. To quickly incorporate new generations of devices
into circuit simulations, data-oriented modeling methods have been developed to
circumvent the detailed physics, focusing on delivering numerically stable and
computationally efficient models directly from the device data.
Table look-up models are currently the widely used data-oriented models.
Artificial neural networks (also rebranded as “deep learning”) have also
attracted a lot of interest.3–6 Comparing table look-up models with artificial
neural networks (ANNs), the neural network model in theory performs better in
the following three aspects.
1. Scalability: To achieve a given level of accuracy, the table lookup model
needs a large amount of data, and its space complexity increases
exponentially with the number of input dimensions. In contrast, the neural
network model is lightweight and scalable.
2. Generalization: The table lookup model generalizes poorly. The polynomial
fitting used in the table lookup model often has high out-of-sample errors.
In contrast, with the correct learning algorithms, a neural network model
can generalize well, which makes it more robust against noise.
3. Smoothness: An ideal compact model needs to be infinitely differentiable.
The table lookup model is not infinitely differentiable because of the
nature of polynomial fitting. Using higher-order polynomial fitting improves
the smoothness, but at the expense of computational efficiency. Therefore,
the table lookup model cannot be both smooth and computationally efficient.
In contrast, the neural network model is guaranteed to be infinitely
differentiable.
Despite all the theoretical benefits of a neural network, a fundamental
question arises: Can a neural network model the very small current in the deep
sub-threshold region or around VDS equal to zero? Gradient-based learning in a
neural network relies on the gradients of the loss function. A common loss
function is the mean squared error (MSE):
$\frac{1}{N}\sum_i (\mathrm{pred}_i - \mathrm{label}_i)^2$, where N is the
number of examples, and $\mathrm{label}_i$ and $\mathrm{pred}_i$ are the true
value and the neural network output of example i. Its partial gradient with
respect to $\mathrm{pred}_i$ is $2(\mathrm{pred}_i - \mathrm{label}_i)/N$,
whose magnitude, at a given relative error, scales linearly with the value of
$\mathrm{pred}_i$. In the context of device modeling, the value of
$\mathrm{pred}_i$ (i.e. current density) varies over 8 orders of magnitude.
Assume the maximum value of $\mathrm{pred}_i$ has been normalized to 1. For a
very small current value ($< 10^{-6}$), the partial gradient is then too small
to have a significant impact on the training. Even worse, most ANN compact
modeling frameworks3, 4, 6 use the vanilla form of feed-forward neural networks
known as multi-layer perceptrons (MLPs) with hyperbolic tangent (tanh)
activation functions. In this neural network architecture, nothing stops the
predicted value from oscillating around zero when the label value is very
small. This leads to unphysical behaviors in both the ID-VDS and ID-VGS
curves.7 This is a fundamental limitation of MLPs with tanh activation
functions and the MSE loss function.
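The scaling of the MSE gradient with the current level can be made concrete
with a small numerical sketch. The label values and the uniform 10% relative
error below are made-up illustrative numbers, not data from this work.

```python
import numpy as np

# Hypothetical normalized currents spanning 8 orders of magnitude.
labels = np.array([1.0, 1e-2, 1e-4, 1e-6, 1e-8])
# Assume the model is off by the same 10% relative error at every point.
preds = 1.1 * labels

# d(MSE)/d(pred_i) = 2 * (pred_i - label_i) / N
mse_grad = 2.0 * (preds - labels) / labels.size

for label, grad in zip(labels, mse_grad):
    print(f"label = {label:.0e}   dMSE/dpred = {grad:.1e}")
```

Although every point has the same relative error, the gradient at the smallest
current is eight orders of magnitude weaker than at the largest one, so the
optimizer effectively ignores the deep sub-threshold points.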
Learning from the groundbreaking successes of deep learning in image
classification8 and speech recognition,9 it is important to leverage invariants
in the problem to design structured neural network architectures. In the
previous work,7 we encoded the fundamental device physics into a new neural
network architecture, the physics-inspired neural network (Pi-NN), to overcome
the fundamental limitation of MLPs. As far as we are aware, Pi-NN is the only
neural-network-based compact modeling framework that can accurately model the
deep sub-threshold region. The major criticism of the previous work is that the
current needs to be multiplied by a scaling function of the form
exp(−a(VG + b)) for better deep sub-threshold modeling. Besides complicating
the modeling process by introducing more hyper-parameters, this pre-processing
method fails to improve deep sub-threshold modeling when the drain voltage
affects the threshold voltage (e.g. the drain-induced barrier lowering (DIBL)
effect1).
In this work, we redesigned the loss function of Pi-NN to eliminate the need
for this tricky pre-processing step in the original work,7 laying the
groundwork for the new Pi-NN compact modeling framework. We discuss how Pi-NN
utilizes the invariants of device physics in Section 5.3. In Section 5.5, we
propose the new reweighted L1 loss function for efficient training. The Pi-NN
framework has been used to generate compact models from experimental data of a
Gallium Nitride (GaN) High Electron Mobility Transistor (HEMT),10 and from
theoretically simulated data of Two-dimensional Heterostructure Interlayer
Tunneling Field Effect Transistors (Thin-TFETs).11 The proposed Pi-NN framework
is 1) the only framework that can generate an accurate and smooth transistor
compact model in all operation regimes; and 2) a generic framework that can be
used to model very different devices (e.g. Tunnel FETs, HEMTs, and other exotic
transistors) without any device-specific modification or preprocessing.
5.2 Previous Works
There have been several recent advances aimed at developing neural network
based compact models. Wang and Zhang et al.5, 12, 13 proposed using neural
networks for RF and microwave design. In the proposed knowledge-based neural
models,5 the neural network structure embeds empirical or semi-analytical
functions as the activation functions, and problem-dependent boundary functions
as the “boundary layer”. This neural network structure combines empirical
functions that are usually valid only in a certain region of the parameter
space. However, these knowledge-based neural models rely on the selected
empirical functions, which may differ from device to device. Therefore, they
may not constitute a generic framework. Moreover, the quality of these models
is largely dependent on the quality of the selected empirical functions, and
such practical empirical functions may not be available for emerging devices.
Xu and Root et al.14–16 developed the commercialized framework NeuroFET inside
Agilent IC-CAP. However, this framework works poorly in the very small current
region (e.g. the deep sub-threshold region), as discussed in Section 5.1. This
limitation was also recognized by Zhang et al.,6 who enhanced the ANN model
accuracy with data preprocessing that transforms (VGS, VDS, IDS) to
(VGS, log(VDS), log(IDS)). However, this data preprocessing method has three
limitations: 1) the method assumes a linear IDS−VDS dependence, but an
exponential IDS−VDS dependence is also very common due to short channel effects
such as DIBL; 2) since the model has to exchange source and drain when a
negative VDS is given, it cannot guarantee smooth derivatives across VDS equal
to zero; 3) for VDS equal to zero, either an exact zero current needs to be
assigned or a smooth function needs to be implemented to guarantee an exact
zero current. Also, one additional training data point close to zero is
required to improve the model accuracy in the linear region.
All of these limitations stem from ignoring the device physics in the MLP
network structure. Regardless of the preprocessing, MLPs still treat the VDS
and VGS inputs as interchangeable even though they are responsible for two very
different physical effects in the device. A more graceful solution is to
incorporate these intrinsic structures of the input space into the neural
network architecture. The two key contributions of this work are 1) the Pi-NN
architecture, which is structured according to the invariants in the
fundamental device physics, and 2) the reweighted L1 loss function, which
ensures accurate modeling in all device operation regions without the need for
tricky preprocessing steps. Combining the Pi-NN architecture and the reweighted
L1 loss function, the new Pi-NN framework can 1) eliminate the need for the
preprocessing in Zhang et al.;6 2) generate accurate and smooth compact models
in all operation regions; 3) easily adapt to different devices.
5.2.1 Low Current Regime Challenge
When VDS is close to zero or the device is in the subthreshold region, the
current is very small compared to the ON current. Modeling an output with a
very large range (around 8 orders of magnitude) is challenging for neural
networks. Moreover, the I-V relationships in the low current regime are very
different for the gate voltage and the drain voltage: IDS depends exponentially
on VGS but linearly on VDS. Therefore, the following challenges arise when
approximating with a neural network:
1. For common loss functions, such as the L2 loss, the gradient decreases with
smaller outputs. Therefore, neural networks fail to accurately model the
low current regime;
2. It is hard to model an exact zero with an MLP. However, when VDS equals
zero, the output should be exactly zero;
3. Along the different input features VGS and VDS, the relationships are vastly
different (one is exponential and the other is linear).
We further illustrate these challenges by using the MLP neural network to
generate a compact model for the DC I-V curves of the Thin-TFET.11 The
Thin-TFET structure is shown in Fig.3.1(b). The training data are simulated for
the top gate voltage (VTG) from 0 to 0.4 V and the drain-source voltage (VDS)
from -0.1 to 0.4 V with a uniform step of 0.01 V, while the test data are for
VTG from 0.005 to 0.405 V and VDS from -0.095 to 0.405 V with a uniform step of
0.01 V. The MLP neural network architecture and its well-established learning
algorithms are shown in Fig.5.1.
[Figure 5.1 content. Previous works: Multilayer Perceptron (MLP) Neural
Network. The network runs from layer 0 (input layer, input vector $x_0$)
through hidden layers 1 to L−1 to layer L (output layer, output vector y).]
Symbols:
• l : Layer index
• L : No. of layers in the network
• $N_l$ : No. of neurons in the lth layer
• $x_0$ : The input vector of the network
• $x_l$ : The output vector of the lth layer
• $z_l$ : The intermediate vector of the lth layer
• y : The output vector of the network
• $\circ$ : The entrywise product of two vectors
Parameters:
• $W_l$ : the weight matrix, whose element $w_{ji,l}$ connects the ith neuron
of the (l−1)th layer to the jth neuron of the lth layer
• $b_l$ : the bias vector, whose element $b_{i,l}$ is the bias of the ith
neuron of the lth layer
Feed-forward, for each layer l in [1, 2, ..., L]:
$$z_l = W_l x_{l-1} + b_l, \qquad x_l = \sigma(z_l)$$
and finally $y = x_L$.
Error back-propagation, with $E = \frac{1}{2}\lVert y - t \rVert^2$, where t is
the desired output vector of y. For the output layer:
$$\frac{\partial E}{\partial x_L} = \frac{\partial E}{\partial y}$$
For the hidden layers, for each layer l in [L, L−1, ..., 1]:
$$\frac{\partial E}{\partial z_l} = \frac{\partial x_l}{\partial z_l} \circ \frac{\partial E}{\partial x_l}, \qquad \frac{\partial E}{\partial x_{l-1}} = W_l^{\top} \frac{\partial E}{\partial z_l}$$
Figure 5.1: The Multilayer Perceptron (MLP) neural network model
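The feed-forward and back-propagation steps of an MLP can be sketched in NumPy
as follows. This is a minimal illustration rather than the training code used
in this work: the function names are hypothetical, tanh is assumed for every
layer including the output layer, the error is $E = \frac{1}{2}\lVert y - t
\rVert^2$, and the initialization range of ±1/fan-in is an assumed reading of a
uniform initialization.

```python
import numpy as np


def init_mlp(sizes, seed=0):
    """Weights W_l and biases b_l for layer sizes [N_0, N_1, ..., N_L]."""
    rng = np.random.default_rng(seed)
    weights, biases = [], []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        bound = 1.0 / fan_in  # assumed initialization range
        weights.append(rng.uniform(-bound, bound, size=(fan_out, fan_in)))
        biases.append(rng.uniform(-bound, bound, size=fan_out))
    return weights, biases


def mlp_forward(weights, biases, x0):
    """Return [x_0, x_1, ..., x_L] with x_l = tanh(W_l x_{l-1} + b_l)."""
    xs = [np.asarray(x0, dtype=float)]
    for W, b in zip(weights, biases):
        xs.append(np.tanh(W @ xs[-1] + b))
    return xs


def mlp_backward(weights, xs, t):
    """Gradients of E = 0.5 * ||y - t||^2 with respect to each W_l and b_l."""
    grad_x = xs[-1] - t                           # dE/dx_L = dE/dy = y - t
    grads_w, grads_b = [], []
    for l in range(len(weights) - 1, -1, -1):
        grad_z = (1.0 - xs[l + 1] ** 2) * grad_x  # tanh'(z) = 1 - tanh(z)^2
        grads_w.insert(0, np.outer(grad_z, xs[l]))
        grads_b.insert(0, grad_z)
        grad_x = weights[l].T @ grad_z            # propagate to previous layer
    return grads_w, grads_b


# Two inputs, two hidden layers of 7 tanh neurons, one output.
weights, biases = init_mlp([2, 7, 7, 1])
n_params = sum(W.size for W in weights) + sum(b.size for b in biases)
print(n_params)  # 85
```

With two inputs, two hidden layers of 7 neurons, and one output, the network
has 15 neurons and (2·7+7) + (7·7+7) + (7+1) = 85 parameters.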
[Figure 5.2 flowchart, listed box by box:]
• Raw data. For example, features: (VGS, VDS); labels: ID.
• Preprocessing: 1. Scale the output to fall within the range from -0.85 to
0.85; 2. Scale the inputs to fall within the range from -1 to 1.
• Split: training data and evaluation data.
• Pick hyperparameters: 1. numbers of neurons in each hidden layer; 2. max loss
scale value for the scaled loss function; 3. optimization parameters.
• Initialization: initialize weights and biases with a uniform distribution
from -1/dimension to 1/dimension.
• Training: 1. Compute the gradients using back-propagation; 2. Update the
model using AdaGrad.
• Evaluation: compute loss and metrics on the evaluation data.
• Decision: Is the model good enough? If yes, end; if no, the procedure loops
back.
Figure 5.2: A training procedure for Artificial Neural Network (ANN) device
compact modeling.
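Two steps of this procedure, the range scaling and the AdaGrad update, can be
sketched as below. The sketch assumes plain min-max (linear) scaling, which the
figure does not specify; the function names, learning rate, and sample current
values are hypothetical.

```python
import numpy as np


def scale_to_range(values, lo, hi):
    """Linearly map values so that their minimum goes to lo and maximum to hi."""
    values = np.asarray(values, dtype=float)
    v_min, v_max = values.min(), values.max()
    return lo + (hi - lo) * (values - v_min) / (v_max - v_min)


def adagrad_step(param, grad, accum, lr=0.01, eps=1e-8):
    """One AdaGrad update; returns the new parameter and the new accumulator."""
    accum = accum + grad ** 2                        # running sum of squared gradients
    param = param - lr * grad / (np.sqrt(accum) + eps)
    return param, accum


# Inputs scaled to [-1, 1]; outputs scaled to [-0.85, 0.85].
v_gs = scale_to_range(np.linspace(0.0, 0.4, 41), -1.0, 1.0)
i_d = scale_to_range(np.array([1e-4, 0.5, 120.0, 300.0]), -0.85, 0.85)  # made-up currents

# A single AdaGrad step on a toy parameter.
p, a = adagrad_step(np.array([1.0]), np.array([2.0]), np.zeros(1), lr=0.1)
```

Keeping the scaled output inside ±0.85 leaves headroom below the ±1 saturation
limits of a tanh output neuron.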
We follow the training procedure shown in Fig.5.2. After some initial training,
we chose to use MLP neural networks with two hidden layers and defined the
hyper-parameter as (i, j), where i is the number of neurons in the first hidden
layer and j is the number of neurons in the second hidden layer. Each neuron
uses the hyperbolic tangent function
$\tanh(x) = (e^{x} - e^{-x})/(e^{x} + e^{-x})$ as the activation function. With
the hyper-parameter (i, j) set to (5, 5), (7, 7), and (9, 9), these three MLP
neural networks were trained for 5 million epochs. Using the loss function
defined in Fig.5.1, the root-mean-squared (R.M.S.) deviations for the training
data and the test data are plotted in Fig.5.3(a). The test errors are used to
evaluate the generalization ability of the model, namely how well the model
fits unseen data. As shown in Fig.5.3(a), the test errors stay close to the
training errors, which indicates good generalization. We choose to plot the I-V
curves modeled by the MLP neural network with 7 tanh neurons in each of the
first and second hidden layers, as shown in Fig.5.3(b), which gives a neural
network with 15 neurons and 85 parameters in total. Figure 5.3(c-f) shows the
I-V curves generated by the MLP neural network compact model along with the
training data and the test data. Good fitting on the linear scale is achieved
for both the IDS-VDS and the IDS-VTG curves. However, if we zoom into the
region near VDS = 0, IDS is not zero when VDS is zero, indicating that the
IDS-VDS relationship is unphysical around VDS = 0 (see Fig.5.3(e) and the
inset). Moreover, the IDS-VTG relationship is also unphysical in the
sub-threshold region (shown in Fig.5.3(f)). The fundamental reason for these
unphysical behaviors is that the MLP neural network has no knowledge of the
device physics; therefore the fitting is no longer physical when ID is very
small. To eliminate these unphysical behaviors, we have to design a neural
network with a priori knowledge of the fundamental device physics.
5.3 The Idea of Pi-NN: Structured Physical System
Structured models make independent assumptions to limit the size of the con-
figuration set. Structures introduce invariance in the system. Invariance serves
as the prior knowledge. Prior knowledge profoundly influences the effective-
ness of learning. For example, the convolutional neural network (CNN)17 incor-
porates spatial invariance, and the recurrent neural network (RNN)18 incorpo-
rates natural ordering.
When it comes to device modeling, we first note that the inputs VDS and VTG
are related to two different physical effects: VDS drives the current through
the device, while VTG controls the channel potential profile to change the
magnitude of the current. Therefore, VDS and VTG should be fed to two different
neural networks, as shown in Fig.5.4. According to the fundamental device
physics, we know that the IDS-VDS curves have a linear region at small VDS and
a saturation region at large VDS. This behavior is similar to a tanh function,
which indicates that VDS should be fed into a neural network with tanh
activation functions (tanh subnet). To ensure that IDS equals zero when VDS
equals zero, all the tanh neurons in the tanh
[Figure 5.3 plots. Panel (a): R.M.S. deviation (µA/µm) versus number of epochs,
showing the training error and test error for hyperparameters (i, j) = (5, 5),
(7, 7), and (9, 9), where i (j) is the number of neurons in the first (second)
hidden layer. Panel (b): MLP with inputs VTG and VDS, two hidden layers of
7 tanh neurons each, output ID, and 85 parameters in total. Panels (c)-(f):
drain current (µA/µm) versus VDS (V) or VTG (V), each showing training data,
test data, and the compact model; regions of unphysical behavior are marked
"Unphysical".]
Figure 5.3: The compact model of the n-type Thin-TFET derived based on the
MLP neural network widely used in previous works,3–5 (a) the training errors
and test errors for a variety of hyper-parameters; (b) the MLP neural network
with 7 tanh neurons in the first and second hidden layers. From (c) to (f), the
I-V curves generated by the MLP neural network shown in (b) are plotted along
with the training data and the test data: (c) IDS versus VDS at different VTG; (d)
IDS versus VTG at different VDS in linear scale; (e) IDS versus VDS at different VTG
around VDS = 0 V, the embedded plot shows unphysical IDS-VDS relationships
around VDS equals 0; (f) IDS versus VTG at different VDS in semi-log scale, un-
physical oscillation of IDS around zero appears in the sub-threshold region and
when VDS = 0 V.
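[Figure 5.4 diagram: the sig input and the tanh input pass through stacked
Pi-NN blocks to the output. Each Pi-NN block combines FC layers, including an
FC layer without bias, with sig and tanh activations.]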
Figure 5.4: The architecture of Pi-NN. The shaded area indicates a Pi-NN block,
which is the building block of Pi-NN network.
subnet must have no bias terms. On the other hand, the IDS-VTG curves have an
exponential turn-on in the sub-threshold region and then become polynomial in
the ON region. This is best modeled by a sigmoid function
$\mathrm{sig}(x) = 1/(1 + e^{-x})$. Therefore, VTG is fed into a neural network
with sigmoid activation functions (sig subnet). It should be noted that we
assume the gate leakage current is negligible, so VTG does not change the sign
of IDS. The final drain current is the entrywise product of the outputs of the
tanh subnet and the sig subnet. This entrywise product reflects the control of
VTG over the drain current driven by VDS. In addition, VDS can affect the
channel potential profile controlled by VTG through various non-ideal effects
such as the short channel effects. Therefore, weighted connections are added
between each layer in the tanh subnet and its corresponding layer in the sig
subnet. By embedding the above device physics in a neural network structure, we
arrive at the Physics-Inspired Neural Network (Pi-NN). The Pi-NN architecture
and the pseudo-code for its feed-forward and error back-propagation algorithms
are shown in Fig.5.5.
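The structure described above can be sketched as a forward pass in NumPy. This
is an illustrative sketch, not the implementation used in this work: the
function names, layer sizes, and random weights are hypothetical, and the
weighted connection from the tanh subnet is assumed to feed the sig subnet at
the same layer index.

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def init_pinn(tanh_sizes, sig_sizes, seed=0):
    """Random parameters for each Pi-NN block (sizes include the input layer)."""
    rng = np.random.default_rng(seed)
    blocks = []
    for l in range(1, len(tanh_sizes)):
        blocks.append({
            "W_T": rng.normal(size=(tanh_sizes[l], tanh_sizes[l - 1])),  # tanh subnet, no bias
            "W_S": rng.normal(size=(sig_sizes[l], sig_sizes[l - 1])),    # sig subnet
            "W_P": rng.normal(size=(sig_sizes[l], tanh_sizes[l])),       # tanh -> sig connection
            "b": rng.normal(size=sig_sizes[l]),                          # sig subnet bias
        })
    return blocks


def pinn_forward(blocks, v_ds, v_tg):
    """Drain current as the entrywise product of the two subnet outputs."""
    d = np.atleast_1d(float(v_ds))   # tanh subnet input
    g = np.atleast_1d(float(v_tg))   # sig subnet input
    for p in blocks:
        d = np.tanh(p["W_T"] @ d)                               # bias-free, so d = 0 at VDS = 0
        g = sigmoid(p["W_S"] @ g + p["W_P"] @ d + p["b"])       # always positive
    return d * g


blocks = init_pinn(tanh_sizes=[1, 3, 1], sig_sizes=[1, 2, 1])
print(pinn_forward(blocks, 0.0, 0.3))   # exactly zero at VDS = 0
print(pinn_forward(blocks, 0.2, 0.3), pinn_forward(blocks, -0.2, 0.3))
```

Because the tanh subnet has no bias, its output is an odd function of VDS and
vanishes exactly at VDS = 0, while the strictly positive sig subnet output only
modulates the magnitude. The sign of the current therefore follows VDS alone,
regardless of the trained weights.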
5.4 Adjoint Sensitivity Network
Neural network sensitivity analysis finds the sensitivities of the network
output with respect to variations in the inputs of multilayer feedforward
neural networks with differentiable activation functions.19, 20 This analysis
has been used to understand variable contributions in the neural network21 and
to visualize the importance of inputs with respect to the classification
decision.22, 23 Recently, sensitivity analysis has also provided the gradient
information used to fool deep neural networks into producing high-confidence
classifications of unrecognizable images.24
In the context of device modeling, unlike the applications of sensitivity
analysis mentioned above, we would like to train the original network on the
output sensitivity with respect to the inputs. Therefore, we create a new
network called the adjoint sensitivity neural network (adjoint network).25 As
shown in Fig.5.6, the adjoint network shares its parameters (weights) with the
original network, and takes the activations (i.e. the outputs of the activation
functions) from each layer of the original network as inputs. The adjoint
network has the same input dimension as the output of the original network, and
the same output dimension as the input of the original network. The input to
the adjoint network is a vector with only one non-zero element (the selector).
If the ith element of the selector is non-zero, the output of the adjoint
network is the gradient of the ith element of the output with respect to the
input vector. Here the adjoint sensitivity network mainly
[Figure 5.5 content. This work: Physics-Inspired Neural Network (Pi-NN). The
tanh subnet and the sig subnet run side by side from layer 0 (input layer,
inputs $d_0$ and $g_0$) through the hidden layers to layer L (output layer),
and their outputs are multiplied entrywise to give y.]
Symbols:
• l : Layer index
• L : No. of layers in the network
• $N^{T(S)}_l$ : No. of neurons in the lth layer of the tanh(sig) subnet
• $d_0$ ($g_0$) : The input vector of the tanh(sig) subnet
• $d_l$ ($g_l$) : The output vector of the lth layer in the tanh(sig) subnet
• $z^{T(S)}_l$ : The intermediate vector of the lth layer in the tanh(sig)
subnet
• y : The output vector of the network
• $\circ$ : The entrywise product of two vectors
Constraint: $N^{T}_L = N^{S}_L$ (the output layers of the two subnets have the
same no. of neurons)
Parameters:
• $W^{T(S)}_l$ : the weight matrix for the tanh(sig) subnet, whose element
$w^{T(S)}_{ji,l}$ connects the ith neuron of the (l−1)th layer to the jth
neuron of the lth layer
• $W^{P}_l$ : the weight matrix for the peephole connections from the tanh
subnet to the sig subnet, whose element $w^{P}_{ji,l}$ connects the ith neuron
of the lth layer in the tanh subnet to the jth neuron of the lth layer in the
sig subnet
• $b_l$ : the bias vector, whose element $b_{i,l}$ is the bias of the ith
neuron of the lth layer in the sig subnet
(Note: There is no bias in the tanh neurons of the tanh subnet)
Feed-forward, for each layer l in [1, 2, ..., L]:
$$z^{T}_l = W^{T}_l d_{l-1}, \qquad d_l = \tanh(z^{T}_l) \qquad \text{(tanh subnet)}$$
$$z^{S}_l = W^{P}_l d_l + W^{S}_l g_{l-1} + b_l, \qquad g_l = \mathrm{sig}(z^{S}_l) \qquad \text{(sig subnet and peephole connections)}$$
and finally $y = d_L \circ g_L$.
Error back-propagation, with $E = \frac{1}{2}\lVert y - t \rVert^2$, where t is
the desired output vector of y. For the output layer:
$$\frac{\partial E}{\partial g_L} = d_L \circ \frac{\partial E}{\partial y}, \qquad \frac{\partial E}{\partial d_L} = g_L \circ \frac{\partial E}{\partial y}$$
For the hidden layers, for each layer l in [L, L−1, ..., 1]:
$$\frac{\partial E}{\partial z^{S}_l} = g_l \circ (1 - g_l) \circ \frac{\partial E}{\partial g_l}, \qquad \frac{\partial E}{\partial g_{l-1}} = (W^{S}_l)^{\top} \frac{\partial E}{\partial z^{S}_l} \qquad \text{(sig subnet)}$$
$$\frac{\partial E}{\partial z^{T}_l} = (1 - d_l \circ d_l) \circ \left( \frac{\partial E}{\partial d_l} + (W^{P}_l)^{\top} \frac{\partial E}{\partial z^{S}_l} \right) \qquad \text{(tanh subnet and peephole connections)}$$
$$\frac{\partial E}{\partial d_{l-1}} = (W^{T}_l)^{\top} \frac{\partial E}{\partial z^{T}_l} \qquad \text{(tanh subnet)}$$
Figure 5.5: The Physics-Inspired Neural Network (Pi-NN) model.
[Figure 5.6 diagram: the original neural network maps the input vector
[x1 x2 x3] to the output vector [y1 y2]; the adjoint network takes the selector
[f'(y1) 0] as its input and outputs [∆y1/∆x1 ∆y1/∆x2 ∆y1/∆x3].]
Figure 5.6:
serves two applications:
1. Use the adjoint sensitivity network to train the parameters of the original
neural network: In circuit network analyzer measurements, we are able to obtain
the gradient responses of the outputs with respect to the inputs. For example,
the capacitance between two terminals is the partial derivative of the charge
on one terminal with respect to the voltage on the other terminal (shown in
Fig.3.1). Training an adjoint sensitivity network on the capacitance data will,
at the same time, generate the original neural network, which outputs the
terminal charges at different terminal voltages. More generally, the adjoint
sensitivity network trains on the Jacobian matrix of a vector function, and its
trained parameters are shared with the original neural network, which
approximates the vector function.
2. Train the parameters of the original neural network and use the adjoint
sensitivity network to output the first-order partial derivatives of the
outputs with respect to the inputs (i.e. the Jacobian matrix): In I-V modeling,
the first-order partial derivatives of the drain current with respect to the
gate and drain voltages are the transconductance and output conductance,
respectively. Also, monotonicity of the model in interpolation and
extrapolation is an important model verification metric for some devices. The
adjoint sensitivity network can be used to verify the monotonicity in
interpolation and extrapolation.
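The adjoint sensitivity network can be constructed from the original neural
network layer-by-layer (shown in Fig.5.7). First, we define a sensitivity
vector β:
$$\beta^{l}_{qi} = \frac{\partial y_q}{\partial \gamma^{l}_{i}} = \sum_{k=1}^{N_{l+1}} \frac{\partial y_q}{\partial \gamma^{l+1}_{k}} \frac{\partial \gamma^{l+1}_{k}}{\partial \gamma^{l}_{i}} = \sum_{k=1}^{N_{l+1}} \frac{\partial y_q}{\partial \gamma^{l+1}_{k}} \frac{\partial \gamma^{l+1}_{k}}{\partial x^{l}_{i}} \frac{\partial x^{l}_{i}}{\partial \gamma^{l}_{i}} = x^{l}_{i}(1 - x^{l}_{i}) \sum_{k=1}^{N_{l+1}} \beta^{l+1}_{qk} W^{l+1}_{ki} \tag{5.1}$$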
For an MLP with sigmoid activation functions in Fig.5.7(a), $y_q$ is the $q$th element
of the output vector; $\gamma^{l}_{i}$ is the $i$th element of the output vector of the FC
layer in the $l$th layer; $N_l$ is the number of neurons in the $l$th layer; $x^{l}_{i}$ is the
$i$th element of the output vector of the sigmoid functions (i.e. the activation) in
the $l$th layer; and $W^{l}_{ki}$ is the weight connecting the $i$th element of the input
vector to the $k$th element of the output vector in the $l$th layer. We can turn Eq.5.1
into a computation graph, resulting in Fig.5.7(a).
For the last layer $L$ of the original network (i.e. the first layer of the adjoint
network):
$$
\beta^{L}_{qi} = \begin{cases} y_q (1 - y_q) & \text{if } q = i \\ 0 & \text{otherwise} \end{cases}
\tag{5.2}
$$
Therefore the input vector of the adjoint network has only one non-zero element,
and if the $q$th element of the input vector is non-zero, then the output
vector is the gradient of the $q$th element of the output with respect to the input vector.
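The recursion in Eq.5.1 and the starting condition in Eq.5.2 can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the implementation used in this work: the 3-4-2 network size, the parameter values, and all function names are assumptions made for the example, and the finite-difference routine is included only as an independent check.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(weights, biases, x):
    """Run the MLP and return the activations of every layer (input first)."""
    activations = [list(x)]
    for W, b in zip(weights, biases):
        prev = activations[-1]
        gamma = [sum(w * p for w, p in zip(row, prev)) + b_k
                 for row, b_k in zip(W, b)]
        activations.append([sigmoid(g) for g in gamma])
    return activations


def adjoint_gradient(weights, activations, q):
    """Gradient of output y_q with respect to the input vector."""
    y = activations[-1]
    # Eq. 5.2: only the q-th entry of the last-layer sensitivity is non-zero.
    beta = [y[i] * (1.0 - y[i]) if i == q else 0.0 for i in range(len(y))]
    # Eq. 5.1: propagate backwards, reusing the forward weights (no biases).
    for layer in range(len(weights) - 1, 0, -1):
        W_next = weights[layer]  # maps activations[layer] to activations[layer + 1]
        x_l = activations[layer]
        beta = [x_l[i] * (1.0 - x_l[i])
                * sum(beta[k] * W_next[k][i] for k in range(len(W_next)))
                for i in range(len(x_l))]
    # Map the first-layer sensitivities onto the inputs through the first weights.
    W_first = weights[0]
    return [sum(beta[k] * W_first[k][i] for k in range(len(W_first)))
            for i in range(len(activations[0]))]


def finite_difference(weights, biases, x, q, i, h=1e-6):
    """Central-difference estimate of dy_q/dx_i, used only as a check."""
    x_plus, x_minus = list(x), list(x)
    x_plus[i] += h
    x_minus[i] -= h
    y_plus = forward(weights, biases, x_plus)[-1][q]
    y_minus = forward(weights, biases, x_minus)[-1][q]
    return (y_plus - y_minus) / (2.0 * h)


# Hypothetical 3-4-2 network with arbitrary fixed parameters.
weights = [
    [[0.5, -0.3, 0.8], [-0.6, 0.1, 0.4], [0.2, 0.7, -0.5], [0.9, -0.2, 0.3]],
    [[0.3, -0.7, 0.5, 0.1], [-0.4, 0.6, 0.2, -0.8]],
]
biases = [[0.1, -0.2, 0.0, 0.3], [0.05, -0.1]]
x = [0.2, -0.4, 0.7]

activations = forward(weights, biases, x)
for q in range(2):
    adjoint = adjoint_gradient(weights, activations, q)
    numeric = [finite_difference(weights, biases, x, q, i) for i in range(3)]
    print(q, adjoint, numeric)
```

The adjoint pass touches only the stored activations and the forward weights, which is why the two networks can share their parameters.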
As for the Pi-NN block and its adjoint network shown in Fig.5.7(b), there
are two input vectors and two output vectors in the adjoint block. The output
vector of Pi-NN is the entrywise product of the output vectors $S^L$ and $T^L$ from
the last Pi-NN block. Therefore, we define two sensitivity vectors, β and α,
in Eq.5.3 and Eq.5.4.
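$$
\beta^{l}_{qi} = \frac{\partial y_q}{\partial \gamma^{l}_{i}}
= S^{l}_{i} (1 - S^{l}_{i}) \sum_{k=1}^{N^{S}_{l+1}} \beta^{l+1}_{qk} [W_S]^{l+1}_{ki}
\tag{5.3}
$$
$$
\alpha^{l}_{qj} = \frac{\partial y_q}{\partial \delta^{l}_{j}}
= \bigl(1 - (T^{l}_{j})^{2}\bigr) \sum_{k=1}^{N^{T}_{l+1}} \alpha^{l+1}_{qk} [W_T]^{l+1}_{kj}
+ \sum_{i=1}^{N^{I}_{l+1}} \beta^{l}_{qi} [W_I]^{l+1}_{ij}
\tag{5.4}
$$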
As shown in Fig.5.7(b), $y_q$ is the $q$th element of the output vector; $\gamma^{l}_{i}$ is the
$i$th element of the output vector of the FC layer in the $l$th layer of the sig subnet;
$\delta^{l}_{j}$ is the $j$th element of the output vector of the FC layer in the $l$th layer of
the tanh subnet; $N^{S(T)}_{l}$ is the number of neurons in the $l$th layer of the sig (tanh)
subnet; $S(T)^{l}_{i(j)}$ is the $i(j)$th element of the output vector of the sigmoid (tanh)
functions (i.e. the activation) in the $l$th layer of the sig (tanh) subnet; and
$[W_S]^{l}_{ki}$ is the weight connecting the $i$th element of the input vector to the
$k$th element of the output vector in the $l$th layer of the sig subnet, and similarly
$[W_T]^{l}_{kj}$ in the $l$th layer of the tanh subnet and $[W_I]^{l}_{ij}$ in the $l$th layer between
the sig subnet and the tanh subnet. We can turn Eq.5.3 and Eq.5.4 into a computation
graph as well, resulting in Fig.5.7(b).

[Figure 5.7 panels: (a) MLP Block and Adjoint MLP Block; (b) Pi-NN Block and Adjoint Pi-NN Block. The adjoint FC layers have no bias and use the transposed weights of the original block as shared variables.]
Figure 5.7: (a) The adjoint network of a fully connected (FC) layer with sigmoid
activation functions, where $\beta = \nabla_{\gamma} y$, $\gamma$ is the output of the FC layer, and $y$ is the
output of the neural network; (b) the adjoint network of a Pi-NN block, where
$\beta = \nabla_{\gamma} y$ and $\alpha = \nabla_{\delta} y$, $\gamma$ is the output of the FC layer in the sig subnet, $\delta$ is
the output of the FC layer in the tanh subnet, and $y$ is the output of the neural network.

For the last layer $L$ of the original network (i.e. the first layer of the adjoint
network):
$$
\beta^{L}_{qi} = \begin{cases} T^{L}_{i} (1 - S^{L}_{i}) S^{L}_{i} & \text{if } q = i \\ 0 & \text{otherwise} \end{cases}
\qquad
\alpha^{L}_{qj} = \begin{cases} S^{L}_{j} \bigl(1 - (T^{L}_{j})^{2}\bigr) & \text{if } q = j \\ 0 & \text{otherwise} \end{cases}
\tag{5.5}
$$
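Eq.5.5 can be checked directly from the statement that the Pi-NN output is the entrywise product of $S^L$ and $T^L$. The short derivation below is a sketch under that assumption, taking $S^{L}_{i} = \sigma(\gamma^{L}_{i})$ and $T^{L}_{j} = \tanh(\delta^{L}_{j})$; the indicator $\mathbb{1}[\cdot]$ is notation introduced here.

```latex
\begin{align*}
y_q &= S^{L}_{q}\, T^{L}_{q}, \qquad
\frac{d\sigma(\gamma)}{d\gamma} = \sigma(\gamma)\bigl(1 - \sigma(\gamma)\bigr), \qquad
\frac{d\tanh(\delta)}{d\delta} = 1 - \tanh^{2}(\delta) \\
\beta^{L}_{qi} &= \frac{\partial y_q}{\partial \gamma^{L}_{i}}
  = T^{L}_{q}\, \frac{\partial S^{L}_{q}}{\partial \gamma^{L}_{i}}
  = T^{L}_{i}\, S^{L}_{i}\,(1 - S^{L}_{i})\; \mathbb{1}[q = i] \\
\alpha^{L}_{qj} &= \frac{\partial y_q}{\partial \delta^{L}_{j}}
  = S^{L}_{q}\, \frac{\partial T^{L}_{q}}{\partial \delta^{L}_{j}}
  = S^{L}_{j}\,\bigl(1 - (T^{L}_{j})^{2}\bigr)\; \mathbb{1}[q = j]
\end{align*}
```

Each output element depends only on the matching elements of the two subnets, which is why both sensitivities vanish unless the indices coincide.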
After we construct the computation graph for both the original and the adjoint
network, we can utilize the automatic differentiation function in modern
deep learning libraries such as TensorFlow and Caffe2 to generate the
computation graphs for back-propagation and updating.
5.5 Weighted L1 Loss Function
Even though Pi-NN has been structured for device modeling, a suitable
optimization algorithm is still needed to train Pi-NN properly. Unlike in other
machine learning tasks, the output of the model, namely the current density, varies
over 8 orders of magnitude, and precise modeling over the entire output range is
required. To meet this demand, instead of the usual mean square error loss
function (L2 loss), we propose to use an L1 loss function and to re-weight the
element-wise loss so that it gives significantly larger gradient signals when the
output value is extremely small. This new loss function is named the “weighted L1
loss function”. Figure 5.8 illustrates how the weighted L1 loss function is computed.
The target value spans a large range of over 8 orders of magnitude, as illustrated
in Fig.5.8(a). We would like to assign a higher weight to the loss at small target
values (shown in Fig.5.8(b)). The weight is computed as in Eq.5.6:
$$
\text{weight}_i = \frac{\mathrm{Max}(|\text{target}|)}{|\text{target}_i|}
\tag{5.6}
$$
If the absolute value of $\text{target}_i$ is $10^9$ times smaller than the maximum
of the absolute values in target, its weight will be $10^9$. Such a weight for an
extremely small value could be so large that it traps the optimizer in a bad
local minimum. Therefore, we apply a hand-tuned “max loss scale” to limit
the maximum scale of the weight (shown in Fig.5.8(c)). The scaled weight is
computed using Eq.5.7:
[Figure 5.8 panels: (a) target with a large range (log scale, 10^0 down to 10^-9, including the sub-threshold region); (b) weights computed for the loss function (log scale, 10^0 to 10^9); (c) scaled weights after applying the “max loss scale” limit. The model output and the target give the L1 loss, which is reweighted by the scaled weights.]
Figure 5.8: Construction of the weighted L1 loss with the max loss scale limit.
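$$
\text{scaled weight}_i = (\text{weight}_i - 1)\,\frac{\text{max loss scale} - 1}{\mathrm{Max}(\text{weight}) - 1} + 1
\tag{5.7}
$$
where $\mathrm{Max}(\text{weight}) = \mathrm{Max}(|\text{target}|)/\mathrm{Min}(|\text{target}|)$. Finally, we compute the
entrywise product of the L1 loss and the scaled weight to get the weighted L1
loss function:
$$
\text{weighted L1 loss}_i = \text{scaled weight}_i \cdot |\text{target}_i - \text{output}_i|
\tag{5.8}
$$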
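Eq.5.6 to Eq.5.8 can be sketched as follows in plain Python. The function names, the example targets and outputs, and the handling of the degenerate case where all targets share one magnitude are assumptions made for illustration; the sketch also assumes that no target is exactly zero, and it returns the element-wise loss without choosing how to reduce it to a scalar.

```python
def scaled_weights(targets, max_loss_scale):
    """Per-element weights of Eq. 5.6, rescaled by Eq. 5.7."""
    magnitudes = [abs(t) for t in targets]  # assumes no target is exactly zero
    largest, smallest = max(magnitudes), min(magnitudes)
    max_weight = largest / smallest         # Max(weight) in Eq. 5.7
    if max_weight == 1.0:                   # all magnitudes equal: nothing to rescale
        return [1.0 for _ in targets]
    slope = (max_loss_scale - 1.0) / (max_weight - 1.0)
    return [(largest / m - 1.0) * slope + 1.0 for m in magnitudes]


def weighted_l1_loss(targets, outputs, max_loss_scale):
    """Element-wise weighted L1 loss of Eq. 5.8."""
    weights = scaled_weights(targets, max_loss_scale)
    return [w * abs(t - o) for w, t, o in zip(weights, targets, outputs)]


# Hypothetical targets spanning nine orders of magnitude.
targets = [1e-9, 1e-5, 1e-2, 1.0]
outputs = [2e-9, 1.1e-5, 1.05e-2, 0.98]
example_weights = scaled_weights(targets, max_loss_scale=5e4)
example_loss = weighted_l1_loss(targets, outputs, max_loss_scale=5e4)
print(example_weights)
print(example_loss)
```

The largest target keeps a weight of 1 and the smallest target receives the max loss scale, so the rescaling compresses the weight range linearly rather than clipping it.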
Since the L1 loss is a non-smooth function, it is considered “unstable” because
it tends to “jump around” the solution. This “instability” arises because
the gradient magnitude of the L1 loss stays constant no matter how close the
output is to the solution. In practice, we find that this instability actually
helps to prevent the optimizer from getting stuck in sub-optimal local minima.
On the other hand, in order to achieve a stable solution, we use the AdaGrad
algorithm,26 whose effective learning rate decreases as the squared gradients
accumulate. Therefore the final solution will be stable due to the decreasing
learning rate.
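The interplay between the constant-magnitude L1 gradient and the shrinking AdaGrad step can be seen on a one-parameter toy problem. The sketch below minimises |w - target| with a textbook AdaGrad update; the scalar problem, the function name, and the placement of ε outside the square root are assumptions for illustration (library implementations differ in where ε enters), while the base learning rate of 0.1 and ε = 0.0001 follow the values quoted in the experiments.

```python
import math


def adagrad_l1(target, w0=0.0, base_lr=0.1, eps=1e-4, steps=2000):
    """Minimise |w - target| with AdaGrad; return the final w and all step sizes."""
    w, accumulated = w0, 0.0
    step_sizes = []
    for _ in range(steps):
        # Subgradient of the L1 loss: constant magnitude, only the sign changes.
        grad = (w > target) - (w < target)
        accumulated += grad * grad
        # Effective learning rate shrinks as squared gradients accumulate.
        step = base_lr / (math.sqrt(accumulated) + eps)
        step_sizes.append(step)
        w -= step * grad
    return w, step_sizes


final_w, step_sizes = adagrad_l1(target=3.0)
print(final_w, step_sizes[0], step_sizes[-1])
```

With plain gradient descent the iterate would keep overshooting by a fixed amount; here the overshoot is bounded by the current step size, which keeps decreasing.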
In order to evaluate the performance with different max loss scale values, we use
two metrics. The first one is the “weighted L1 metric”. Unlike the weighted
L1 loss used for training, no “max loss scale” is applied to the weighted L1
metric, which is defined in Eq.5.9:
$$
\text{weighted L1 metric}_i = \text{weight}_i \cdot |\text{target}_i - \text{output}_i|
\tag{5.9}
$$
The weighted L1 metric provides a consistent measure of model performance
across different max loss scale values. Since the very small target values in the
training data receive very large weights, the weighted L1 metric is usually
dominated by the errors coming from the targets with small values. The second one
is the “L1 metric”, i.e. the L1 error without weighting, which instead reflects the
accuracy in the large-value region.
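One way to read the weighted L1 metric is to substitute the weight of Eq.5.6 into Eq.5.9. The rearrangement below is a small worked step added for illustration; the numerical example uses made-up values.

```latex
\text{weighted L1 metric}_i
  = \frac{\mathrm{Max}(|\text{target}|)}{|\text{target}_i|}\,|\text{target}_i - \text{output}_i|
  = \mathrm{Max}(|\text{target}|) \cdot \frac{|\text{target}_i - \text{output}_i|}{|\text{target}_i|}
```

Under this reading the metric is the relative error scaled by a constant. For example, with $\mathrm{Max}(|\text{target}|) = 1$, a target of $10^{-9}$ predicted as $1.1 \times 10^{-9}$ and a target of $1$ predicted as $1.1$ both contribute $0.1$, whereas the unweighted L1 errors are $10^{-10}$ and $0.1$ respectively.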
5.6 Experiments
5.6.1 Modeling of GaN HEMT
To demonstrate the ability of Pi-NN to accurately model I-V characteristics, we
trained Pi-NN on the experimental measurement data of a promising high-power,
high-frequency device: a Gallium Nitride (GaN) HEMT. The details of
the device are discussed by Schuette et al.10 Its I-V characteristics are plotted in
Fig.5.9.

[Figure 5.9 annotations: markers show the train + eval data; VGS = -1 V to 2.5 V (∆VGS = 0.35 V); VDS = 0 V to 6 V (∆VDS = 0.12 V).]
Figure 5.9: The I-V characteristics of a GaN HEMT: (a) IDS versus VDS at different
VGS in the linear scale and (b) in the log scale; (c) IDS versus VGS at different VDS
in the linear scale and (d) in the log scale.
Throughout this experiment, we randomly held out 20% of the data as the evaluation
set. We also fixed the Pi-NN structure to have two layers of Pi-NN blocks, as
shown in Fig.5.4. Both FC layers in the first Pi-NN block have 16 neurons
and both FC layers in the second Pi-NN block have 1 neuron. To select the
max loss scale and base learning rate for optimization, we first fixed the base
learning rate of AdaGrad to 0.1 (with ε = 0.0001), and varied the max loss scale
from $10^2$ to $5\times10^5$. Because of the stochastic nature of the optimization, for each
max loss scale value we repeated the training 5 times, and each model was trained
for $10^6$ epochs.

[Figure 5.10 legend: train weighted L1 metric, eval weighted L1 metric, train L1 metric, eval L1 metric; horizontal axis: max loss scale.]
Figure 5.10: (a) The weighted L1 metric versus max loss scale. Each red circle
represents the training weighted L1 metric, and each blue circle represents the
evaluation weighted L1 metric. The blue and red lines are the averages over the
runs. (b) The L1 metric versus max loss scale. Each red circle represents
the training L1 metric, and each blue circle represents the evaluation L1 metric.
The blue and red lines are the averages over the runs.
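The random 20% hold-out described above can be sketched as follows. The function name, the fixed seed, and the bias grid are assumptions for illustration: the grid takes the ∆VGS and ∆VDS annotations of Fig.5.9 and assumes uniform spacing, and the actual measured data set may be organised differently.

```python
import random


def holdout_split(samples, eval_fraction=0.2, seed=0):
    """Randomly hold out a fraction of the samples as the evaluation set."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_eval = round(eval_fraction * len(samples))
    eval_indices = set(indices[:n_eval])
    train = [s for i, s in enumerate(samples) if i not in eval_indices]
    evaluation = [s for i, s in enumerate(samples) if i in eval_indices]
    return train, evaluation


# Hypothetical grid of (VGS, VDS) bias points, assuming uniform spacing.
bias_points = [(-1.0 + 0.35 * i, 0.12 * j) for i in range(11) for j in range(51)]
train_set, eval_set = holdout_split(bias_points)
print(len(bias_points), len(train_set), len(eval_set))
```

Because the evaluation points are drawn at random from the same grid, they mainly test interpolation; extrapolation has to be checked separately on bias points outside the measured range.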
When the max loss scale increases, the weighted L1 metric decreases, as
shown in Fig.5.10(a), while the L1 metric increases, as shown in Fig.5.10(b).
As discussed at the end of Section 5.5, a lower weighted L1 metric indicates
better accuracy in the small-value region, and a lower L1 metric indicates better
accuracy in the large-value region. To balance the two, we picked $5\times10^4$ as our
max loss scale. After determining the max loss scale, we varied the base learning
rate, as shown in Fig.5.11.

[Figure 5.11 panel titles: “Loss function: Weighted L1 loss limited by max loss scale = 5×10^4” (train and evaluation); “L1 metric: L1 error w/o weighting” (train and evaluation); “Weighted L1 loss w/o max loss scale limit” (train and evaluation); horizontal axis: base learning rate.]
Figure 5.11: Each blue circle represents one run, and the dashed line is the average
value of multiple runs. (a) The train weighted L1 losses (with max loss scale
limit) versus different base learning rates; (b) the evaluation weighted L1 losses
(with max loss scale limit) versus different base learning rates; (c) the train L1
metric versus different base learning rates; (d) the evaluation L1 metric versus
different base learning rates; (e) the train weighted L1 metric (without max loss
scale limit) versus different base learning rates; (f) the evaluation weighted L1
metric (without max loss scale limit) versus different base learning rates.
[Figure 5.12 panel titles: “Loss function: Weighted L1 loss limited by max loss scale = 5×10^4” with train loss and eval loss; “Weighted L1 metric w/o max loss scale limit” with train and eval weighted L1 metric; horizontal axis: epochs.]
Figure 5.12: (a) The training/evaluation loss (weighted L1 loss with max loss
scale limit) versus epochs; (b) the training/evaluation weighted L1 metric
(without max loss scale limit) versus epochs.
Judging from Fig.5.11, if the base learning rate is either too large or too small,
the model performance becomes poor and unstable. We chose a base learning rate of
0.1 since it achieved good and stable optimization results in both the
weighted L1 metric and the L1 metric.
With the max loss scale equal to $5\times10^4$ and the base learning rate equal to 0.1,
we plotted the weighted L1 loss (with max loss scale limit) and the weighted L1
metric (without max loss scale limit) at different epochs in Fig.5.12.
As shown in Fig.5.12(a), the optimizer first got stuck in a local minimum before
finding its way to further optimize the model. The evaluation loss stays close
to the training loss, which indicates good generalization. Figure 5.12(b) plots the
weighted L1 metric without the max loss scale limit. It is worth noting that the
training weighted L1 metric had a huge spike before decreasing. This means that if
we trained directly on the weighted L1 metric, without the max loss scale limit,
the optimizer would not be able to get out of this local minimum. This observation
demonstrates the necessity of having the max loss scale limit on the weighted L1 loss.
The model output, along with the training and evaluation data, is shown in
Fig.5.13. As shown, the model is in excellent agreement with both the training
and evaluation data. Note that this GaN HEMT has a relatively severe DIBL
effect: the threshold voltage is also controlled by the drain voltage. So at small
VGS, IDS depends exponentially not only on VGS but also on VDS. This DIBL
effect is challenging to model with a traditional hand-crafted compact model.
Pi-NN, on the other hand, is able to model this complicated dependence very well.
The interpolation and extrapolation abilities of a device model are also crucial.
In Fig.5.14, when VDS is extended to 80% beyond the training VDS range
and VGS to +/- 40% beyond the training VGS range, the model shows no
abnormal behavior.
Another important metric of a GaN HEMT model is the monotonicity of its output,
which is reflected in the first-order derivatives. An abnormal oscillation in the
device model will lead to artificial high-frequency oscillations in the circuit
simulation. We used the adjoint network introduced in Section 5.4 to plot both the
transconductance and the output conductance (shown in Fig.5.15).
Both the transconductance and output conductance are positive and smooth
throughout the whole voltage range. The red arrow line in Fig.5.15(a) indicates
that the peak transconductance voltage shifts at different drain voltages, which can
be explained by the combination of the self-heating effect and the DIBL effect.
Overall, Pi-NN was able to generate an accurate, smooth, and computationally
efficient device compact model.

[Figure 5.13 legend: train data, eval data, model output.]
Figure 5.13: The I-V curves generated by the Pi-NN model are plotted along
with the training data (blue circles) and the evaluation data (red circles): (a) IDS
versus VDS at different VGS in the linear scale and (b) in the log scale; (c) IDS
versus VGS at different VDS in the linear scale and (d) in the log scale.

[Figure 5.14 legend: train data, eval data, model output; curve annotations: VGS = -2.4 V, VGS = -1 V, VGS = 2.5 V, VGS = 4.9 V.]
Figure 5.14: The I-V curves generated by the Pi-NN model are plotted along
with the training data (blue circles) and the evaluation data (red circles) for IDS
versus VDS at different VGS in the linear scale. VDS for the model is extended
to 80% beyond the training VDS range and VGS is extended to +/- 40% beyond the
training VGS range.
[Figure 5.15 annotations: (a) VDS = 0, 0.03, ..., 5.7, 6.0 V, darker blue represents higher VDS; (b) VGS = -1, -0.82, ..., 2.32, 2.5 V, darker blue represents higher VGS.]
Figure 5.15: (a) The partial derivatives of the drain current with respect to the
gate voltage (transconductance) versus gate voltage at different drain voltages;
(b) the partial derivatives of the drain current with respect to the drain voltage
(output conductance) versus the drain voltage at different gate voltages. The red
arrow line in (a) indicates that the peak transconductance voltage shifts at different
drain voltages, which can be explained by the combination of the self-heating effect
and the DIBL effect.
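The monotonicity check discussed in this subsection amounts to verifying that the transconductance and output conductance stay non-negative over a bias grid, including points outside the training range. The sketch below uses central finite differences in place of the adjoint network, and both model functions are hypothetical stand-ins rather than the trained Pi-NN; the grid limits are likewise arbitrary.

```python
import math


def toy_drain_current(v_gs, v_ds):
    """Hypothetical smooth, monotonic stand-in for a trained I-V model."""
    return math.log1p(math.exp(4.0 * (v_gs - 0.5))) * math.tanh(v_ds)


def rippled_drain_current(v_gs, v_ds):
    """Same stand-in with an artificial ripple that breaks monotonicity."""
    return toy_drain_current(v_gs, v_ds) + 0.05 * math.sin(40.0 * v_ds)


def grid(start, stop, count):
    step = (stop - start) / (count - 1)
    return [start + step * k for k in range(count)]


def is_monotonic_increasing(model, v_gs_values, v_ds_values, h=1e-4):
    """True if both first-order derivatives are non-negative on the grid."""
    for v_gs in v_gs_values:
        for v_ds in v_ds_values:
            # Transconductance and output conductance by central differences.
            g_m = (model(v_gs + h, v_ds) - model(v_gs - h, v_ds)) / (2.0 * h)
            g_ds = (model(v_gs, v_ds + h) - model(v_gs, v_ds - h)) / (2.0 * h)
            if g_m < 0.0 or g_ds < 0.0:
                return False
    return True


v_gs_values = grid(-2.0, 3.0, 26)
v_ds_values = grid(0.0, 10.0, 41)
print(is_monotonic_increasing(toy_drain_current, v_gs_values, v_ds_values))
print(is_monotonic_increasing(rippled_drain_current, v_gs_values, v_ds_values))
```

A grid check of this kind only samples the derivative, so a finer grid or the exact adjoint output is needed to rule out narrow oscillations between grid points.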
5.6.2 Modeling of Thin-TFET
We also tested the Pi-NN model on Thin-TFET simulation data. After initial
training, we chose to use Pi-NNs with one hidden layer and defined the
hyper-parameter as (m, n), where m is the number of tanh neurons in the hidden
layer and n is the number of sigmoid neurons in the same hidden layer. The
test errors stay close to the training errors, as shown in Fig.5.16(a), which indicates
good generalization. Balancing model complexity against accuracy, we
chose the model with the hyper-parameter (2, 3), as shown in Fig.5.16(b), which
gives a small Pi-NN model with only 7 neurons and 20 parameters in total.
Excellent modeling is demonstrated in both the ON region (shown in Fig.5.16(c,
d)) and the sub-threshold region (shown in Fig.5.16(f)). The IDS-VDS relationship
around VDS = 0 is shown in Fig.5.16(e). All the unphysical behaviors
that appeared in the MLP neural network model (shown in Fig.5.3) have been
eliminated. Moreover, thanks to the embedded device physics, the Pi-NN
requires far fewer parameters than the MLP neural network, which results in a
smaller, more efficient compact model.
5.7 Conclusion
Motivated by the need for high-quality compact models for emerging devices,
we have proposed a novel neural network, Pi-NN, along with a weighted L1 loss
function for device compact modeling. With fundamental device physics
incorporated, the Pi-NN method can produce accurate, smooth, and computationally
efficient transistor models with good generalization ability. A GaN HEMT and
a Thin-TFET are presented as examples to illustrate the capabilities of Pi-NN. The
adjoint network of Pi-NN has also been developed to model the differential
information in the device measurements. Finally, the Pi-NN framework has been
implemented in Caffe2, which can be readily integrated into commercial
measurement and modeling systems.
[Figure 5.16 panels: (a) R.M.S. deviation (µA/µm) versus number of epochs (10^4 to 10^7), training error and test error, for hyperparameters (m, n) = (2, 2), (2, 3), (3, 2), (3, 3), (3, 4), (4, 3), where m (n) is the number of tanh (sigmoid) neurons in the hidden layer; (b) network diagram with inputs VDS and VTG, tanh and sig neurons, output ID, 20 parameters in total; (c)-(f) drain current (µA/µm) versus VDS (V) or VTG (V), showing training data, test data, and compact model for VTG or VDS = 0.4 V, 0.385 V, 0.37 V, ..., 0.01 V, 0.005 V, 0.]
Figure 5.16: For the Pi-NN developed in this work, (a) the training errors and
test errors for a variety of hyper-parameters. (b) the Pi-NN model with 2 tanh
neurons and 3 sigmoid neurons in the hidden layer. From (c) to (f), the I-V curves
generated by the Pi-NN model shown in (b) are plotted along with the training
data and the test data: (c) IDS versus VDS at different VTG; (d) IDS vs. VTG at
different VDS in linear scale; (e) IDS vs. VDS at different VTG around VDS = 0,
the embedded plot shows a well-behaved IDS-VDS relationship around VDS = 0; (f)
IDS vs. VTG at different VDS in semi-log scale, good fitting is achieved in the
sub-threshold region. All the unphysical behaviors of the MLP neural network
(shown in Fig.5.3) are eliminated, and the size of the neural network is largely
reduced.
BIBLIOGRAPHY
1 Y. Tsividis and C. McAndrew, Operation and Modeling of the MOS Transistor.
Oxford Univ. Press, 2011.
2 Y. S. Chauhan, S. Venugopalan, M. A. Karim, S. Khandelwal, N. Paydavosi,
P. Thakur, A. M. Niknejad, and C. C. Hu, “BSIM - industry standard compact
MOSFET models,” in ESSCIRC (ESSCIRC), 2012 Proceedings of the. IEEE, 2012,
pp. 30–33.
3 J. Xu and D. E. Root, “Advances in artificial neural network models of active
devices,” in Numerical Electromagnetic and Multiphysics Modeling and Optimiza-
tion (NEMO), 2015 IEEE MTT-S International Conference on. IEEE, 2015, pp.
1–3.
4 H. B. Hammouda, M. Mhiri, Z. Gafsi, and K. Besbes, “Neural-based models
of semiconductor devices for spice simulator,” American Journal of Applied Sci-
ences, vol. 5, no. 4, pp. 385–391, 2008.
5 F. Wang and Q.-J. Zhang, “Knowledge-based neural models for microwave
design,” IEEE Transactions on Microwave Theory and Techniques, vol. 45, no. 12,
pp. 2333–2343, 1997.
6 L. Zhang and M. Chan, “Artificial neural network design for compact model-
ing of generic transistors,” Journal of Computational Electronics, pp. 1–8, 2017.
7 M. Li, O. İrsoy, C. Cardie, and H. G. Xing, “Physics-inspired neural networks
for efficient device compact modeling,” IEEE Journal on Exploratory Solid-State
Computational Devices and Circuits, vol. 2, pp. 44–49, 2016.
8 A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with
deep convolutional neural networks,” in Advances in neural information pro-
cessing systems, 2012, pp. 1097–1105.
9 G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior,
V. Vanhoucke, P. Nguyen, T. N. Sainath et al., “Deep neural networks for
acoustic modeling in speech recognition: The shared views of four research
groups,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82–97, 2012.
10 M. L. Schuette, A. Ketterson, B. Song, E. Beam, T.-M. Chou, M. Pilla, H.-Q.
Tserng, X. Gao, S. Guo, P. J. Fay et al., “Gate-recessed integrated E/D GaN HEMT
technology with fT/fmax > 300 GHz,” IEEE Electron Device Letters, vol. 34, no. 6,
pp. 741–743, 2013.
11 M. O. Li, D. Esseni, J. J. Nahas, D. Jena, and H. G. Xing, “Two-dimensional
heterojunction interlayer tunneling field effect transistors (thin-tfets),” IEEE
Journal of the Electron Devices Society, vol. 3, no. 3, pp. 200–207, 2015.
12 Q.-j. Zhang and K. C. Gupta, Neural networks for RF and microwave design (Book+
Neuromodeler Disk). Artech House, Inc., 2000.
13 Q.-J. Zhang, K. C. Gupta, and V. K. Devabhaktuni, “Artificial neural networks
for rf and microwave design-from theory to practice,” IEEE transactions on
microwave theory and techniques, vol. 51, no. 4, pp. 1339–1350, 2003.
14 J. Xu, D. Gunyan, M. Iwamoto, A. Cognata, and D. E. Root, “Measurement-
based non-quasi-static large-signal fet model using artificial neural net-
works,” in Microwave Symposium Digest, 2006. IEEE MTT-S International.
IEEE, 2006, pp. 469–472.
15 D. E. Root, J. Xu, J. Horn, and M. Iwamoto, “The large-signal model: the-
oretical foundations, practical considerations, and recent trends,” Nonlinear
Transistor Model Parameter Extraction Technique, pp. 123–170, 2011.
16 D. E. Root, “Future device modeling trends,” IEEE Microwave Magazine,
vol. 13, no. 7, pp. 45–59, 2012.
17 Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning ap-
plied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp.
2278–2324, 1998.
18 J. L. Elman, “Finding structure in time,” Cognitive science, vol. 14, no. 2, pp.
179–211, 1990.
19 S. Hashem, “Sensitivity analysis for feedforward artificial neural networks
with differentiable activation functions,” in Neural Networks, 1992. IJCNN., In-
ternational Joint Conference on, vol. 1. IEEE, 1992, pp. 419–424.
20 L. Fu and T. Chen, “Sensitivity analysis for input vector in multilayer feedfor-
ward neural networks,” in Neural Networks, 1993., IEEE International Confer-
ence on. IEEE, 1993, pp. 215–218.
21 J. D. Olden and D. A. Jackson, “Illuminating the black box: a randomiza-
tion approach for understanding variable contributions in artificial neural net-
works,” Ecological modelling, vol. 154, no. 1, pp. 135–150, 2002.
22 K. Simonyan, A. Vedaldi, and A. Zisserman, “Deep inside convolutional net-
works: Visualising image classification models and saliency maps,” arXiv
preprint arXiv:1312.6034, 2013.
23 W. Samek, A. Binder, G. Montavon, S. Lapuschkin, and K.-R. Müller, “Evaluating
the visualization of what a deep neural network has learned,” IEEE
transactions on neural networks and learning systems, 2017.
24 A. Nguyen, J. Yosinski, and J. Clune, “Deep neural networks are easily fooled:
High confidence predictions for unrecognizable images,” in Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 427–436.
25 J. Xu, M. C. Yagoub, R. Ding, and Q. J. Zhang, “Exact adjoint sensitivity anal-
ysis for neural-based microwave modeling and design,” IEEE Transactions on
Microwave Theory and Techniques, vol. 51, no. 1, pp. 226–237, 2003.
26 J. Duchi, E. Hazan, and Y. Singer, “Adaptive subgradient methods for online
learning and stochastic optimization,” Journal of Machine Learning Research,
vol. 12, no. Jul, pp. 2121–2159, 2011.
CHAPTER 6
FUTURE WORKS
6.1 Non-ideal effects in Thin-TFETs
Many non-ideal effects in Thin-TFETs need to be studied. In experiments, it
is usually hard to achieve monolayer top and bottom 2D materials. Therefore
it is important to study the effect of the top and bottom 2D layer thickness.
We found that if neither the top nor the bottom 2D layer is chemically doped,
increasing the thickness of either layer results in a less steep subthreshold
slope. Chemical doping in the bottom layer can prevent subthreshold slope
degradation even with increasing bottom layer thickness.
The permittivity of the van der Waals gap directly affects the gate efficiency
of Thin-TFETs. Therefore higher permittivity leads to a less steep subthreshold
slope and lower ON current. The interfacial trap density (Dit) also has a
significant impact on Thin-TFETs. Since a Thin-TFET is made of layers of 2D
materials, it is also interesting to study which location of Dit has the most
profound influence on the device performance. When Dit is unavoidable, we could
design a Dit-tolerant device structure that moves critical regions away from Dit.
Moreover, trap-assisted tunneling (TAT) and Shockley-Read-Hall (SRH) recombination
are known to limit the performance of TFETs. In-depth studies of TAT and
SRH recombination in Thin-TFETs are still ongoing.
In experiments, the access region between the channel and the contacts
often plays an important role in Thin-TFET performance. As shown in the
optical image in Fig.6.1,1 the access region between the overlapping SnSe2/WSe2
junction and the WSe2 contacts becomes a lateral parasitic WSe2 MOSFET. This
parasitic MOSFET limits the subthreshold steepness of the Thin-TFET (as shown
in Fig.6.1(a-b)). To eliminate this parasitic MOSFET, the access region has to be
heavily doped, either chemically or electrostatically.

[Figure 6.1 panels: (a) drain current (A) versus VG (V) at VDS = -0.2 V; (b) SS (mV/dec) versus drain current (A) at VDS = -0.2 V. Legend: WSe2 parasitic MOSFET, WSe2/SnSe2 TFET (simulated), TFET + MOSFET, experimental data. Insets: optical image with WSe2 and SnSe2 regions and terminals 1, 2, 3; equivalent circuit of the WSe2/SnSe2 TFET with the WSe2 parasitic MOSFET.]
Figure 6.1: (a) ID-VG curves of the measured WSe2 parasitic MOSFET, the
WSe2/SnSe2 Thin-TFET (TFET + MOSFET), and the intrinsic WSe2/SnSe2 TFET;
the insets show the optical image of the device and the equivalent circuit with
the parasitic MOSFET; (b) the corresponding SS curves for the parasitic MOSFET,
the WSe2/SnSe2 Thin-TFET (TFET + MOSFET), and the intrinsic TFET.

Which material systems are the best for realizing Thin-TFETs also remains an
open question. Black phosphorus, due to its small electron affinity, can be an
excellent p-type layer in Thin-TFETs. Incorporating black phosphorus in
Thin-TFETs is under active research in our group.
6.2 Experimental Demonstration of TransiXNOR
In order to realize TransiXNOR, the channel material is the key. Unlike normal
TFETs, where the tunneling junction is only between the source and channel,
in TransiXNOR both the source/channel junction and the channel/drain junction
serve as tunnel junctions. When the tunnel junction is at the source and channel
interface, the electrons tunnel from the source valence band to the channel
conduction band; when the tunnel junction is at the channel and drain interface,
the electrons tunnel from the channel valence band to the drain conduction
band. Therefore, both the conduction and valence bands of the channel material
are involved in the tunneling process in TransiXNOR. The first requirement of the
channel material is the bandgap. The bandgap of the channel material has to be
larger than VDD in order to have low leakage in the OFF state, and the bandgap
has to be close to VDD in order to have a high ON current.
The second requirement of the channel material is the thickness. For TransiXNOR
to work, the two gates of TransiXNOR have to control the same channel.
If the two gates controlled their individual channels under the gates, the
device would behave like two TFETs in parallel, and thus the XNOR behavior could
not be achieved. On the other hand, for 3D materials, scaling down to an ultra-thin
body will increase the bandgap due to the quantization effect. Using 2D layered
materials can achieve atomically thin bodies with reasonable bandgaps. However,
finding the right 2D layered materials with a suitable bandgap, doping
the source/drain region in 2D layered materials, and integrating 2D layered
materials in a CMOS-compatible system are all challenging. One possibility is to
use 2D layered materials in the channel region, and use 3D materials in the
source/drain region. In that case, not only is doping the source/drain region no
longer a problem, but we can also design staggered/broken band alignments in both
the source/channel and channel/drain junctions to boost the ON current. However,
how to build an edge-contact 3D/2D junction is still under research.2
6.3 Adjoint Network as Regularization in Pi-NN
Due to its structured architecture and embedded device physics, Pi-NN has
shown very good generalization capability without explicit regularization.
However, for device modeling, we sometimes want to enforce a certain set of
rules beyond the region where training data are available. For example, we would
like to force the model to be monotonically increasing in regions where
experimental measurements are either unavailable or impossible. Co-training with
the adjoint network gives us access to the output sensitivities with respect to
the inputs. Therefore, we can add a regularization term using the output
sensitivities to penalize negative first derivatives. Moreover, the adjoint network
also gives us access to the sensitivity of each activation in the model. Since
activations contributing little to the final output may cause small oscillations,3 we
could prune out the corresponding neurons during training.
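One possible form of such a regularization term is a hinge-style penalty on the negative part of each output sensitivity. The sketch below is only an assumption about how the idea could be written down: the penalty form, the function names, the `strength` parameter, and the example sensitivities are all hypothetical, and in practice the sensitivities would come from the adjoint network evaluated at bias points outside the measured range.

```python
def monotonicity_penalty(sensitivities, strength=1.0):
    """Hinge-style penalty: sum of the negative parts of dy/dx, scaled by strength."""
    return strength * sum(max(0.0, -s) for s in sensitivities)


def regularized_loss(data_loss, sensitivities, strength=1.0):
    """Data-fitting loss plus the monotonicity penalty."""
    return data_loss + monotonicity_penalty(sensitivities, strength)


# Hypothetical sensitivities at four bias points; two of them are negative.
example_sensitivities = [0.3, -0.1, 0.0, -0.4]
example_penalty = monotonicity_penalty(example_sensitivities, strength=2.0)
example_total = regularized_loss(0.25, example_sensitivities, strength=2.0)
print(example_penalty, example_total)
```

The penalty is zero wherever the model is already non-decreasing, so it leaves the fit untouched in regions that satisfy the rule.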
The Pi-NN framework is readily applicable to TransiXNOR and other emerging
devices. Initial integration of the Pi-NN model with a circuit simulator has
started. Input from circuit and system designers will help Pi-NN become a potential
new paradigm of device compact modeling.
BIBLIOGRAPHY
1 M. O. Li, R. Yan, D. Jena, and H. G. Xing, “Two-dimensional heterojunction in-
terlayer tunnel fet (thin-tfet): From theory to applications,” in Electron Devices
Meeting (IEDM), 2016 IEEE International. IEEE, 2016, pp. 19–2.
2 A. Allain, J. Kang, K. Banerjee, and A. Kis, “Electrical contacts to two-
dimensional semiconductors,” Nature materials, vol. 14, no. 12, pp. 1195–1205,
2015.
3 F. Girosi, M. Jones, and T. Poggio, “Regularization theory and neural networks
architectures,” Neural computation, vol. 7, no. 2, pp. 219–269, 1995.
