HOLLISTIC APPROACHES TO DESIGN HIGH SPEED ELECTRONIC CIRCUITS by Aghasi, Hamidreza
HOLLISTIC APPROACHES TO DESIGN HIGH
SPEED ELECTRONIC CIRCUITS
A Dissertation
Presented to the Faculty of the Graduate School
of Cornell University
in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
by
Hamidreza Aghasi
May 2017
© 2017 Hamidreza Aghasi
ALL RIGHTS RESERVED
HOLLISTIC APPROACHES TO DESIGN HIGH SPEED ELECTRONIC
CIRCUITS
Hamidreza Aghasi, Ph.D.
Cornell University 2017
The most valuable asset we are given is time. This is perhaps the main
motivation behind the human desire to minimize the time that it takes for a
certain task to be completed. Starting from the middle of the 20th century,
electronic components brought the hope of performing certain tasks faster than
the human brain or other existing techniques. Today, we live in an era in which
billions of computations are performed in less than a second and enormous
amounts of data can be transmitted between people, thanks to electronic circuits.
Operation frequency and computation time are the measures of speed in
modern electronics. Therefore, we seek new approaches to push the limits
of operation with respect to these metrics. In this thesis, we introduce new
design approaches for high speed electronic circuits. The systematic design
theory in each chapter is verified by measurement results and compared with
simulations. Chapter 1 gives an overview of the advances in each domain and
highlights the design challenges that stand in the way of speed enhancement.
In Chapters 2 and 3, a new harmonic power maximization theory is proposed,
which leads to the design of high power active frequency multipliers with
record performance. It is shown that by characterizing the nonlinear behavior of
a transistor or any nonlinear element, the circuit embedding can be selected to
maximize the power at any desired harmonic. By exploiting this nonlinear model,
the design of millimeter wave and sub-millimeter wave circuits becomes more
power efficient, and higher operation frequencies can be reached compared to
linear design approaches.
In Chapter 4, it is shown how emerging technologies such as spin-based
devices can outperform the existing technologies in terms of computation time.
Specifically, a systematic design of pattern recognition circuits using spin-based
devices is provided, which is scalable and area efficient. It is shown that by
combining circuit design techniques with applied physics principles, these
emerging devices improve on the existing technology in terms of operation speed,
area, and computational burden.
Chapters 5 and 6 highlight two collaborative projects of the author, which
demonstrate the first terahertz phase-locked transmitter and the first integrated
negative inductance circuits. The implementation of these systems bridges the
gap between other research areas, such as optics, and integrated circuit
technology. The findings of the thesis are summarized in Chapter 7.
BIOGRAPHICAL SKETCH
Hamidreza Aghasi was born in Isfahan, Iran, in 1989. He received the B.Sc.
degree in electrical engineering (communication systems) from Sharif University
of Technology, Tehran, Iran, in 2011. His undergraduate thesis title was “Source
localization based on signal attenuation and delay estimation in sensor
networks.” He joined the Ultra high-speed Nonlinear Integrated Circuit Lab,
Cornell University, Ithaca, NY, USA, in 2011. In 2014, he joined Samsung Research
America, Mountain View, CA, USA, and the Display Lab, San Jose, CA, USA, as
an Intern.
He is currently a Visiting Scholar with the University of Michigan, Ann Arbor,
MI, USA. His current research interests include signal processing, spin-based
devices, terahertz power generation, and RF and millimeter-wave design. Mr.
Aghasi was a recipient of the Cornell Fellowship in 2011 and the Jacobs Fellow-
ship in 2012. He was also a recipient of the Cornell ECE Innovation Award for
the “Teratooth” system in 2013.
To my family for their endless and selfless support and love.
ACKNOWLEDGEMENTS
Aside from Him and my biological family, there are few individuals with “enor-
mous impact” on my life. My dear friend, brother, and advisor, Ehsan Afshari,
is certainly one of these people. Ehsan is a remarkable intellectual and more
importantly a truly caring and supportive friend. Not only an excellent teacher
and a leading scientist in analog circuit design, he is also one of the moral role
models for me and my friends. Ehsan keeps the balance between the profes-
sional and personal relationship with his students very well. This is why he is
both a great friend, and a reliable and charismatic advisor to his students. In
2011, when I joined his research group, one of my fellow senior lab-mates said
that “those who join Ehsan’s research group are given a special opportunity”.
Today, I could not agree more: joining Ehsan Afshari’s research
group has been the most special opportunity of my entire life. During the time
that I worked under Ehsan’s supervision, I was never hesitant to explore new
ideas and research areas, as I could always rely on his genuine support. This
was the main reason that I could implement my ideas fearlessly and work on
my desired research problems. He encouraged me more than anyone else to
explore new ideas and paths in my technical and non-technical efforts. To better
explain how influential Ehsan has been in my life, I would like to make
two claims: 1) Ehsan Afshari has taught me more professional and technical
lessons than anyone else in my life, and 2) starting from August 2011, any single
academic contribution that I have made or will make in the future would not
have been achieved if I had not had the chance to get to know Ehsan Afshari.
I am also grateful to Prof. Alyssa Apsel and Prof. Kevin Tang for serving on
my committee and giving feedback on my research and teaching accomplishments.
I would also like to thank Prof. Farhan Rana from Cornell University,
Prof. Azad Naeemi from Georgia Institute of Technology, Prof. Omeed Momeni
from University of California Davis, and Prof. Kamal Sarabandi from University
of Michigan for their technical and professional support. I also appreciate
the great support from the department staff, especially Scott Coldren and Sue
Bulkley from Cornell and Jennifer Fenneley from Michigan.
I would also like to thank my colleagues and lab-mates at the Ultra High
Speed Nonlinear Integrated Circuit Design Laboratory (UNIC): Yahya Tousi,
Guansheng Li, Wooram Lee, Muhammad Adnan, Ruonan Han, Vahnood
Pourahmad, Hamid Khatibi, Somayeh Khiyabani, Mohammad Emadi, Amirah-
mad Tarkeshdouz, Ali Mostajeran, Chen Jiang, Sajjad Ohadi, Saghar Seyedab-
baszadeh and Hossein Naghavi.
I have incredible friends. I have known Farhad Shirani since the age of 5, and
over the last 23 years he has consistently been a great friend to me. Navid
Naderializadeh and Mohsen Heidari are great examples of caring and supportive
friends who happen to be genius intellectuals. Over the last ten years of
friendship and brotherhood with them, I have never felt alone in the darkest
nights of my life. Mohammad Hassan Lotfi, Morteza Hashemi, Shervin Minaee, and
Ali Ebrahimi have also been very helpful to me throughout the school years.
My friends in Ithaca and Ann Arbor pumped a lot of positive energy and
motivation into my life and made the long PhD journey a very enjoyable one.
They are: Hadi and Baran, Mina and Tina, Abolhassan and Mahya, Sepehr and
Mahsa, Sina Lashgari, Rad Niazadeh, Amirhossein Tajdini, Salman Mirzaei,
Phil Gordon, and Byesah Gantsog from Ithaca and Noyan Akbar, Amin Hassanzadeh,
Behzad Yektakhah, Mina Jafari, Parisa Ghaderi, Mehrzad Samadi,
Armin Jam, Avish Kosari, Mahmood Barangi, and Omar Abdellaty from Ann
Arbor.
Last but not least, I would love to thank my family members. Without my
lovely family, who always pumped love and support into my veins, I would
not have been able to take one step forward. It is even scary to imagine
how different my life could have been without these heavenly creatures. I am
always grateful to my parents, who are still inspiring me and leading me towards
the right path. Words cannot describe how much I respect my father for being a
true scientist and a great role model. He remains the only role model in my
life who instilled the meaning of stamina and patience in my mind. He remains
the most respected scientist for me and the only example of the kind of father
that I would like to be. My mother, a gifted heavenly angel with true love and
support, has taught me more lessons in humanity than anyone else and remains
a genuine source of love to me. I am very certain that even if I dedicated the
rest of my life to her, I would not be able to repay her genuine and selfless
love. My brother, Alireza, knows very well that he is the most influential
individual in my life. After many years, he is still the most caring friend that
I could ever wish for. From a very early age, I have tried to follow in his
footsteps, and I still cannot come close to what he is. My lovely sister,
Hengaame, means the whole world to me. I cannot even imagine how different my
life would have been without her selfless love.
TABLE OF CONTENTS
Biographical Sketch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
1 Challenges in High Speed Circuit Design 1
1.1 High Frequency Operation Limits . . . . . . . . . . . . . . . . . . 1
1.2 Challenges of high speed computation . . . . . . . . . . . . . . . . 3
2 A 0.92 THz SiGe Power Radiator Based on a Nonlinear Harmonic Gen-
eration Theory 5
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Nonlinear Model of the two-port device . . . . . . . . . . . . . . . 7
2.2.1 Review of Volterra-Wiener Theory of Nonlinear Devices . 8
2.2.2 Nonlinear Two-port Device Modeling . . . . . . . . . . . . 10
2.2.3 Combination of linear passive network and nonlinear device 14
2.3 Nonlinear Harmonic Power Optimization . . . . . . . . . . . . . . 16
2.3.1 Impact of Circuit Topology and the Ratio Functions . . . . 17
2.4 Design of a High Power Frequency Quadrupler . . . . . . . . . . 21
2.4.1 Transistor Selection . . . . . . . . . . . . . . . . . . . . . . . 22
2.4.2 Circuit Topology and Optimum Embedding Network . . . 23
2.4.3 Device characteristic manipulation . . . . . . . . . . . . . . 24
2.4.4 Matching Conditions . . . . . . . . . . . . . . . . . . . . . . 27
2.4.5 Single to Differential Power Conversion . . . . . . . . . . . 28
2.4.6 Output Power Extraction . . . . . . . . . . . . . . . . . . . 29
2.5 Measurement Results . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.7 Analytical study of ratio functions . . . . . . . . . . . . . . . . . . 38
2.7.1 No reflection from source . . . . . . . . . . . . . . . . . . . 39
2.7.2 Reflection from the source . . . . . . . . . . . . . . . . . . . 41
3 A 0.43-0.51 THz SiGe Frequency Doubler Based on the Nonlinear Har-
monic Generation Theory 44
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.2 Nonlinear Model of Harmonic Generation . . . . . . . . . . . . . . 45
3.3 Circuit Design and Optimization . . . . . . . . . . . . . . . . . . . 47
3.3.1 Wideband Operation . . . . . . . . . . . . . . . . . . . . . . 48
3.3.2 Differential Signal Generation . . . . . . . . . . . . . . . . . 49
3.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4 Smart Detector Cell: A Scalable All-Spin Circuit for Low-Power Non-
Boolean Pattern Recognition 54
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.2 All-Spin Logic Devices . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.2.1 Majority Gate Operation . . . . . . . . . . . . . . . . . . . . 60
4.2.2 Switching Delay Variation . . . . . . . . . . . . . . . . . . . 63
4.2.3 Impact of Thermal Noise . . . . . . . . . . . . . . . . . . . 63
4.3 Pattern Recognition Scheme . . . . . . . . . . . . . . . . . . . . . . 64
4.3.1 Mainly Similar Images . . . . . . . . . . . . . . . . . . . . . 65
4.3.2 Majority Training and Decision Making . . . . . . . . . . . 66
4.4 Proposed Structure and Design Considerations . . . . . . . . . . . 67
4.4.1 Memory+Logic Comparator . . . . . . . . . . . . . . . . . 68
4.4.2 Construction of the mean pixel . . . . . . . . . . . . . . . . 70
4.4.3 Single Pixel Comparator . . . . . . . . . . . . . . . . . . . . 71
4.4.4 Non-Boolean Row Decision-Maker . . . . . . . . . . . . . . 75
4.5 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.5.1 Non-Boolean Hamming Distance Identifier of 3×3 Pixel
Pattern and Input Image . . . . . . . . . . . . . . . . . . . . 78
4.5.2 Non-Boolean Similarity Comparison of a 9×9 Pixel Image
and a Set of 3 Pattern Images . . . . . . . . . . . . . . . . . 79
4.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.7 Proof of proposition 1 . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.8 Simulation Parameters . . . . . . . . . . . . . . . . . . . . . . . . . 88
5 A SiGe terahertz heterodyne imaging transmitter 89
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5.2 Design of a 320-GHz Transmitter . . . . . . . . . . . . . . . . . . . 91
5.2.1 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . 91
5.2.2 Coupled Radiator Array . . . . . . . . . . . . . . . . . . . . 96
5.2.3 On-Chip Phase-Locked Loop . . . . . . . . . . . . . . . . . 98
5.3 Prototype And Experimental Results . . . . . . . . . . . . . . . . . 99
5.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6 Low Power Negative Inductance Integrated Circuit for GHz Applica-
tions 106
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
6.2 Negative Impedance Converter Design . . . . . . . . . . . . . . . 107
6.3 Negative Inductor Implementation . . . . . . . . . . . . . . . . . . 108
6.3.1 Biasing Circuits . . . . . . . . . . . . . . . . . . . . . . . . . 108
6.3.2 circuit simulation . . . . . . . . . . . . . . . . . . . . . . . . 111
6.4 Fabrication and Measurement Result . . . . . . . . . . . . . . . . . 112
6.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
7 Conclusion and Future Directions 114
Bibliography 118
LIST OF TABLES
2.1 Comparison with state-of-the-art . . . . . . . . . . . . . . . . . . 38
3.1 Performance Comparison of Solid State Sources . . . . . . . . . . 52
4.1 Performance Comparison with existing CMOS systems . . . . . . 85
4.2 Possibilities of x, y1, · · · , yP . . . . . . . . . . . . . . . . . . . . . . . 86
LIST OF FIGURES
1.1 the state-of-the-art terahertz and sub-terahertz integrated elec-
tronic sources. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Energy consumption and delay of different technologies in im-
plementation of a 32bit adder . . . . . . . . . . . . . . . . . . . . . 4
2.1 Simplified circuit model of (a) MOS transistor and (b) BJT
transistor. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2 (a) A three-terminal linear two-port device, (b) three-terminal
nonlinear two-port device, (c) a linear device embedded in a 4-
port network and (d) a nonlinear device embedded in a 4-port
network. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3 (a) Nonlinear two-port representation of device and (b) embed-
ded with a general passive network. . . . . . . . . . . . . . . . . . 12
2.4 (a) the variations of R2 and S2 with respect to the illustrated
changes of TL1 and TL2 and, (b) the similar variations on the
transmission line values in a different topology. For an identi-
cal transistor, the topology determines the behavior of the ratio
functions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.5 (a) The selected transistor size and the corresponding G and H
coefficients, (b) topology of the frequency quadrupler and, (c)
the variations of the topology Ri and Si functions by changing
TL1, TL2 and TL3. The mean value of the functions and their
relative location on the real-imaginary axis are also shown. . . . 15
2.6 (a) the harmonic power prediction using the approximate model
(2.21), compared with the circuit simulations, (b) the harmonic
power prediction using the moderate model (2.19), compared
with the circuit simulations, (c) the harmonic power prediction
using the accurate model (2.15) compared with circuit simula-
tions and (d) the circuit schematic used in these simulations. . . 18
2.7 According to the approximate model, we can find the location
of optimum harmonic conditions in the [A,Φ] plane. In order
to maximize the harmonic power, the transmission lines are se-
lected to reach these optimum conditions. . . . . . . . . . . . . . 22
2.8 (a) the nonlinearity manipulation by adding transistor Q2 and,
(b) the layout of the Q1 and Q2 transistor combination. . . . . . . 24
2.9 The model prediction of the harmonic power in the vicinity
of optimum point for the selected topology when (a) Q1 is the
only nonlinear component and (b) the Q1 and Q2 combination is
the nonlinear component. . . . . . . . . . . . . . . . . . . . . . . 25
2.10 (a) the circuit simulations of the fourth harmonic output power
and (b) the simulated fundamental power. It is clear that the
optimum conditions of the fundamental and fourth harmonic,
Opt1 and Opt4, are different, (c) the simulation testbench. . . . . 26
2.11 The summary of the proposed design flow for THz frequency
multipliers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.12 (a) frequency quadrupler schematic and the matching conditions
at different harmonics, and (b) the distribution of the current in-
tensity of the harmonics. . . . . . . . . . . . . . . . . . . . . . . . 28
2.13 (a) the layout of the input balun and (b) the performance sum-
mary of the microwave short-open structure. . . . . . . . . . . . 29
2.14 (a) the layout structure of the patch antenna array and (b) the
simulated performance summary of the radiator. . . . . . . . . . 30
2.15 The die photograph . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.16 The implemented power measurement set up. . . . . . . . . . . 32
2.17 (a) the implemented near-field power measurement and (b) the
diagonal horn antenna near field coupling (d=2mm), simulated
by an accurate FDTD solver and (c) the simulated total radiated
power at different harmonics for an input at 232 GHz. . . . . . . 33
2.18 (a) Simulation results of the wideband power generation and (b)
the circuit simulated operation in “radiator on” and “radiator
off” modes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.19 Measurement results. The power difference technique is utilized
to measure all data points. . . . . . . . . . . . . . . . . . . . . . . 36
2.20 The performance summary of state-of-the-art THz sources. . . . 37
2.21 The simulated variation of amplitude and phase of input and
output voltage components by variations of length of TL1. . . . 42
2.22 The simulated variation of amplitude and phase of input and
output voltage components by variations of length of TL2. . . . 42
2.23 The simulated variation of amplitude and phase of input and
output voltage components by variations of length of TL1 when
there is source reflection. . . . . . . . . . . . . . . . . . . . . . . . 43
3.1 (a) The transistor operators of harmonic generation and (b) the
impact of passive network. . . . . . . . . . . . . . . . . . . . . . . 45
3.2 Schematic of the active doubler and the input/output matching
scenarios on smith chart . . . . . . . . . . . . . . . . . . . . . . . . 47
3.3 Wideband matching at (a) input and (b) output . . . . . . . . . . 49
3.4 Using bias manipulation technique for (a) wideband power gen-
eration and (b) maximum power generation at a fixed frequency. 49
3.5 (a) 3D illustration of the input pad and balun and (b) the inser-
tion loss and phase/amplitude imbalance simulated by HFSS . . 50
3.6 (a) the die photograph, (b) and (c) the measurement set up. . . . 51
3.7 Simulation and measurement results for an input power of 5dBm. 53
3.8 Simulation and measurement results of operation at 240 GHz for
different power levels. . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.1 (a) Configuration of single ASL device (b) Applied voltage on
the magnet, creates an electric field and enforces electron move-
ments. (c) Spin-polarized electrons at the input side exhibit a
higher density compared to the output side. (d) The diffusion
of spin-polarized electrons towards the output magnet, changes
the output magnetization direction. (e) An ASL Majority Gate
with 3 inputs [101]. The 3 input magnets, M1, M2 and M3 are
connected to the output magnet MO using 3 metallic intercon-
nects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.2 (a) Switching transient response for different scenarios of input
magnetization in a majority gate with 5 inputs. (b) Switching
transition comparison of majority gates with 3 and 5 inputs. In
this comparison, the input magnetization of magnets to the 3 in-
put gate are all similar. For the gate with 5 inputs, 4 inputs have
similar magnetization and the net spin current is equal to the
other gate. The applied voltage on the magnets in these
simulations is −5 mV. . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.3 Switching delay variation versus the supply voltage. Each point
is simulated 3 times to verify the results[104] . . . . . . . . . . . . 64
4.4 The two images are mainly similar (along the rows), however,
the Hamming distance between the third columns is 3 which
does not imply a similarity along the columns . . . . . . . . . . . 66
4.5 1-bit full adder used as XNOR[104]. In the 2D implementation
of this work, X and Y wires are in-plane metal wires and connec-
tions along the Z axis are vias. . . . . . . . . . . . . . . . . . . . . 69
4.6 Simulated output waveforms of XNOR gate . . . . . . . . . . . . 70
4.7 (a) Standard single pixel detector schematic. (b) The truth table
with the detailed operation of the circuit. . . . . . . . . . . . . . . 73
4.8 (a) Comparator-first pixel detector schematic. (b) The truth table
with the detailed operation of the circuit. . . . . . . . . . . . . . . 74
4.9 Structure of the unit smart detector cell . . . . . . . . . . . . . . . 78
4.10 Using a single smart detector cell, we can compare these 3 × 3
pixel images. The waveforms of the comparators and majority
gates (bottom). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.11 Training set for the 9 × 9 pixel images . . . . . . . . . . . . . . . . 81
4.12 The input image (left) and the representation of the mean image
(right). The mean image is not a direct output of the circuit. . . . 81
4.13 Due to fan-in considerations, the circuit consists of 9 smart
detector cells. The corresponding breakdowns of the mean im-
age and the input image are shown here . . . . . . . . . . . . . . . 82
4.14 The switching delay of output magnetization in last stage repre-
sents the similarity of input data and pattern data. . . . . . . . . 83
4.15 Since these clusters represent mismatch, they cannot switch and
the initial magnetization does not change. Note that the y-axis
is showing from -1.002 to -0.998 in contrast with Figure 4.14 in
which the y axis is from -1 to 1. . . . . . . . . . . . . . . . . . . . . 84
5.1 The performance of the state-of-the-art THz radiator sources in
silicon: (a) the total radiated power at varying frequencies and
(b) the achieved DC to THz radiation efficiency over the past few
years. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.2 Full-wave electromagnetic simulation of the THz radiator: (a)
odd-mode excitation/loading ports and the intensity distribu-
tion of the electrical field (b) S-parameters near the fundamental
oscillation frequency of 160 GHz. . . . . . . . . . . . . . . . . . . 92
5.3 The two-port active network including a SiGe HBT and a series
half-RPGC structure at the transistor base. Also shown is the
simulated optimum phase of the complex voltage gain of such
active network at 160 GHz. . . . . . . . . . . . . . . . . . . . . . . 93
5.4 Full-wave electromagnetic simulation of the THz radiator: (a)
even-mode excitation/loading ports and the intensity distribu-
tion of the electrical field (b) S-parameters near the 2nd-harmonic
frequency of 320 GHz. . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.5 The simulated radiation pattern of the proposed 320-GHz radia-
tor unit. A backside hemispheric silicon lens is assumed. . . . . . 95
5.6 The architecture of the 320-GHz transmitter with a fully-
integrated phase-locking loop. . . . . . . . . . . . . . . . . . . . . 96
5.7 The mutual coupling between adjacent radiators: (a) in-phase
coupling mode (supported) (b) out-of-phase coupling mode (un-
supported). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.8 The simulated radiation pattern of the 320-GHz 4×4 radiator ar-
ray. The pitch between the elements is 220 µm, and a backside
hemispheric silicon lens is assumed. . . . . . . . . . . . . . . . . . 98
5.9 The schematic of the 160-GHz VCO inside the on-chip phase-
locked loop. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.10 (a) The microphotograph of the 320-GHz transmitter using 130-
nm SiGe BiCMOS process. The THz radiator based on the
return-path gap coupler is also shown. (b) The chip packaging
with the backside attachment of a silicon lens. . . . . . . . . . . . 100
5.11 The measurement setup for the 320-GHz transmitter and a photo
of the packaged chip. . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.12 The measured down-converted spectrum of the transmitter ra-
diation: (a) on-chip PLL is OFF and (b) on-chip PLL is ON. . . . . 102
5.13 The received radiated power of the power meter at varying dis-
tance, d, from the 320-GHz transmitter chip. . . . . . . . . . . . . 103
5.14 The performance of the state-of-the-art THz radiator sources in
silicon (a) the total radiated power (b) DC to THz radiation effi-
ciency. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5.15 The total radiated power of the 320-GHz transmitter, as well as
the associated DC-to-THz radiation efficiency, at different DC
power supply voltage and dissipation power. . . . . . . . . . . . 104
6.1 A Basic non-Foster element, (a) Negative Impedance Converter
circuit implementation based on Linvill’s OCS model, (b) the
equivalent small-signal model of the proposed NIC based on
CMOS technology. . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
6.2 Schematic for the self-biased current source using an on-chip re-
sistor Rs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
6.3 Schematic diagram of the Negative Impedance Converter (NIC)
integrated circuit which has been implemented in a 65 nm pro-
cess and it produces a negative inductance of L = −1nH. . . . . . 109
6.4 Reactance vs. frequency for both positive inductor (solid red
curve), and negative inductor (dotted blue curve). . . . . . . . . . 110
6.5 Negative inductance simulation (L = −1nH) with a real part of
approximately 45 Ω (dashed blue), imaginary part equal to L =
−1nH (solid red), and ideal imaginary part for L = −1nH (dotted
green). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
6.6 Microphotograph of the proposed negative inductor design in-
cluding GND, Power and RF pads (left), and measurement setup
(right). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
6.7 Comparison of the measured and simulated (Spectre-RF)
impedance of the negative inductance circuit (L = −1nH). . . . . 111
6.8 Measured and simulated results for the negative inductance. . . 112
CHAPTER 1
CHALLENGES IN HIGH SPEED CIRCUIT DESIGN
1.1 High Frequency Operation Limits
Emerging applications of the THz range (0.3-3 THz) span several areas of
interest, e.g., molecular spectroscopy, high resolution imaging, and ultra-fast
communication. A practical THz transmitter must generate at least milliwatt-level
power in order to overcome the high propagation loss in this
frequency range.
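As a rough illustration of why propagation loss becomes so demanding at these frequencies, the sketch below evaluates the standard free-space path loss formula, (4πdf/c)^2, for a few carrier frequencies. It is a simplified estimate only: it assumes isotropic antennas unless a gain is supplied, it ignores atmospheric absorption (which adds further loss in the THz range), and the 1 m distance and 0 dBm source power are hypothetical values chosen for the example.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def fspl_db(freq_hz: float, dist_m: float) -> float:
    """Free-space path loss in dB between two isotropic antennas."""
    return 20.0 * math.log10(4.0 * math.pi * dist_m * freq_hz / C)


def received_dbm(tx_dbm: float, freq_hz: float, dist_m: float,
                 antenna_gains_db: float = 0.0) -> float:
    """Received power for a given transmit power (absorption ignored)."""
    return tx_dbm + antenna_gains_db - fspl_db(freq_hz, dist_m)


if __name__ == "__main__":
    # Hypothetical link: 0 dBm (1 mW) source, 1 m range, no antenna gain.
    for f in (3e9, 0.3e12, 1e12):
        print(f"{f / 1e9:8.0f} GHz: loss {fspl_db(f, 1.0):5.1f} dB, "
              f"received {received_dbm(0.0, f, 1.0):6.1f} dBm")
```

Because the loss grows as 20 log10(f), moving from 3 GHz to 0.3 THz at the same distance costs 40 dB of additional free-space loss before any absorption is counted.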
The reliable implementation of terahertz systems on silicon chips provides
a low-cost and scalable solution for the desired applications. Device scaling
within the last decade has provided a mainstream path for circuit designers to
design standard circuit topologies at higher operational frequencies. However,
the scaling trend described by Moore’s law [6] is reaching its limits, and a
more thorough design approach is required. Besides, the faster scaled devices
normally come with a lower breakdown voltage, which
limits the power handling capability of the transistor.
In addition, there are specific high frequency design challenges that have to
be addressed. First and most importantly, at high frequencies where the operation
frequency f0 is close to the fmax of the transistor, the active device exhibits a
low power gain. Moreover, the skin effect and substrate coupling deteriorate
the power efficiency. These factors are the main obstacles to high power signal
generation in the THz range. Fig. 1.1 shows the state-of-the-art terahertz
electronic sources.
Figure 1.1: The state-of-the-art terahertz and sub-terahertz integrated
electronic sources. [Plot: generated/radiated power (dBm) versus frequency (THz);
legend: Si/SiGe technology, InP/GaAs technology, and the author's designs.]
Secondly, when the operation frequency is increased beyond fmax, the power has
to be extracted from the harmonic components of the fundamental signal. Since
the harmonic power generation is determined by the nonlinear profile of the
device, an accurate nonlinear model is required to maximize the output power. It
is noteworthy that extracting the signal from the harmonic components further
lowers the power efficiency. This harmonic power modeling and extraction
is the most fundamental distinction between THz design and lower frequency
circuit design.
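To make the link between device nonlinearity and harmonic content concrete, the following toy sketch drives a memoryless polynomial nonlinearity with a single cosine and extracts the harmonic amplitudes by a discrete Fourier projection. This is only an illustration and not the model developed in this thesis: a real transistor has memory and is treated with the Volterra-based analysis of Chapter 2, and the polynomial coefficients used here are hypothetical.

```python
import math


def harmonic_amplitudes(coeffs, drive_amp, n_harm=4, n_samples=1024):
    """Amplitudes of harmonics 1..n_harm of y = sum_k coeffs[k] * x**(k+1)
    for the drive x = drive_amp * cos(theta), theta swept over one period."""
    amps = []
    for h in range(1, n_harm + 1):
        re = 0.0
        im = 0.0
        for n in range(n_samples):
            theta = 2.0 * math.pi * n / n_samples
            x = drive_amp * math.cos(theta)
            y = sum(c * x ** (k + 1) for k, c in enumerate(coeffs))
            # Project the output onto the h-th harmonic.
            re += y * math.cos(h * theta)
            im += y * math.sin(h * theta)
        amps.append(2.0 * math.hypot(re, im) / n_samples)
    return amps


if __name__ == "__main__":
    # Hypothetical coefficients of x, x**2, x**3 (illustration only).
    for h, a in enumerate(harmonic_amplitudes([1.0, 0.5, 0.2], 1.0), start=1):
        print(f"harmonic {h}: amplitude {a:.4f}")
```

In this toy model the n-th harmonic appears only if the polynomial contains terms of order n or higher, and its amplitude is much smaller than the fundamental, which is consistent with the efficiency penalty of harmonic extraction described above.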
Thirdly, the operation bandwidth of THz sources is usually narrow. In the
particular case of harmonic/fundamental oscillators, the insufficient variation of
tuning components such as varactors limits the signal bandwidth. This is why
frequency multipliers are a good choice for power generation, where conventional
matching techniques can cover a large operation bandwidth.
Lastly, the power distribution/combination of single blocks can add to the
loss of the network. This is particularly important in the traditional design
technique where multiple sources are combined to increase the output power
level. To mitigate this loss, novel low-loss microwave structures such as
baluns [7] and scalable blocks [8] are required.
Due to all these issues, the generated power levels of the state-of-the-art
THz sources in Fig. 1.1 have been increasing slowly within the last decade.
Moreover, it is clear from Fig. 1.1(a) that the generated output power drops
with a 1/f^3 to 1/f^4 trend as the frequency increases. Therefore, generating
high power in the terahertz range remains a challenge.
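To put the 1/f^3 to 1/f^4 trend in perspective, the short sketch below converts
a power-law roll-off into decibels. It only restates the trend quoted above; the
function name rolloff_db and the example frequency ratio of 2 are illustrative
assumptions, not values read from Fig. 1.1.

```python
import math


def rolloff_db(freq_ratio: float, exponent: float) -> float:
    """Power drop in dB when frequency rises by freq_ratio, assuming P ~ 1/f**exponent."""
    return 10.0 * exponent * math.log10(freq_ratio)


# Doubling the frequency under the two bounding trends (assumed example ratio).
for n in (3, 4):
    print(f"1/f^{n}: {rolloff_db(2.0, n):.1f} dB less output power per octave")
```

Under these assumptions, each doubling of the frequency costs roughly 9 to 12 dB
of output power.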
1.2 Challenges of high speed computation
Along with Moore’s prediction, Dennard scaling had brought the hope of
increasing the clock frequency by reducing the transistor dimensions. However,
the underestimated quantum mechanical effects, in particular the “leakage
current”, have become significant since 2006. In addition, it turned out that
shrinking the transistor size causes short-channel effects such as
drain-induced barrier lowering and mobility degradation. Therefore, fundamental
limits for Moore’s law were identified, and a minimum effective gate length of
a few nanometers was found for the transistors.
In the “integrated digital” domain, clock frequencies have not changed within
the last few years, and emerging technologies such as “spin-based logic” gates
with a lower switching energy have been introduced. In addition, “non-Boolean
computation”, “approximate computation” and “machine learning” techniques have
been introduced to overcome the limits of “Boolean computation” for recognition
applications. Spintronic devices exhibit unique features such as high density,
non-volatility, low switching energy and instant wake-up. However, according to
Fig. 1.2, the energy-delay product of spin-based logic gates is larger than
that of CMOS, and the implementation of “Boolean logic gates” is not a good
benchmark for these devices.
[Figure 1.2 plot, titled “Switching Time and Energy” (32bit adder): energy (fJ,
10^-2 to 10^3) versus delay (ps, 10^1 to 10^6) on logarithmic axes. Plotted
devices: CMOS HP, CMOS LV, HJTFET, IIIvTFET, gnrTFET, GpnJ, SpinFET, SWD, SMG,
NML, STTtraid, ASLD, STOlogic, STT/DW. Annotations: “Limited by capacitor
charging”, “Limited by spin dynamics”, “Steep turn-on/off (TFETs)”,
“Magneto-electric”, “Potentially nonvolatile”, “Fast”/“Slow”, “Better”/“Worse”.]
Figure 1.2: Energy consumption and delay of different technologies in the
implementation of a 32-bit adder.
CHAPTER 2
A 0.92 THZ SIGE POWER RADIATOR BASED ON A NONLINEAR
HARMONIC GENERATION THEORY
2.1 Introduction
A growing interest in sub-mm wave (0.3 to 3 THz) circuits and systems has
been witnessed within the last decade. This is mainly due to the plethora of
opportunities within this previously unexplored electromagnetic spectrum
[9]-[12]. The characteristic absorption profiles and the proliferation of
molecular resonances make this region of the spectrum a unique platform for
material spectroscopy. Higher spatial resolution compared to lower frequency
ranges and the non-ionizing nature compared to higher-frequency radiation
(e.g., X-rays) add to the exclusive features of the THz frequency window for
imaging applications [12]-[14].
At the early stages of investigation, compound semiconductors [15]-[23],
gifted with a high-speed operation potential (fmax > 300 GHz), and quantum
cascade lasers [24] brought the hope of THz power generation. However, they
presented challenges such as stringent operation requirements (e.g., low
temperature) and high cost. CMOS circuits were investigated as the next
platform due to their lower cost and integration capability. The first CMOS
mm-wave circuits were high-bandwidth transceivers and imaging/sensing circuits
[25]-[41].
After the successful demonstration of CMOS mm-wave circuits, fundamental and
harmonic oscillators [42]-[54] were designed to reach a higher operation
frequency. Due to the insufficient DC-to-RF efficiency and the limited
bandwidth of the oscillators, frequency multipliers were explored to boost the
performance [55]-[63]. Despite the high output power of passive and active CMOS
frequency multipliers, frequency tuning [59], phase-locked operation [60], and
operation closer to fmax [61], [62] were the pending milestones before a sub-mm
wave transceiver could be demonstrated.
In a parallel route, the demonstrated feasibility of on-chip antennas opened
the path for THz radiators [58], [64] and detectors [65]-[68]. The novel
radiation techniques introduced in some of these works [65]-[69] were later
used in phased arrays [70].
At the THz range, where CMOS has limited capabilities, BiCMOS technologies
took over and opened the path towards higher power generation [?], [69]. The
higher breakdown voltage, higher operational frequency (fmax > 300 GHz) and the
more nonlinear I-V characteristic were the assets that CMOS lacked. Taking
advantage of these unique features, 3.3 mW of power was achieved at 320 GHz in
a process with fT/fmax = 220/280 GHz. Similarly, in [69], 1 mW of power was
generated at 0.53 THz inside a reconfigurable array.
In all the mentioned beyond-fmax demonstrations, the power is extracted from
the harmonics of the fundamental frequency. Previous work on maximizing the
fundamental power [62], [58] could enhance the harmonic power significantly due
to operation in a more nonlinear region. However, power generation at higher
harmonics still requires thorough modeling of the harmonic signal. A systematic
design methodology to generate maximum harmonic power based on a nonlinear
model remains the missing piece before circuit designers can overcome the
obstacle of beyond-1 THz operation using the existing transistor technologies.
In order to push the operation limits of electronic circuits, a systematic
design approach is required. The proposed design methodology should provide an
accurate model of the nonlinear device. Moreover, the effect of the circuit
topology and the embedding network on the harmonic power should be
characterized. This is the main intention behind the design approach proposed
in this paper. We introduce a nonlinear characterization of the active device
based on the Volterra theory [72], which can be exploited to model any
arbitrary nonlinear component. Based on the proposed model, the optimum
embedding network and the nonlinearity enhancement techniques are proposed,
which yield a high harmonic power.
Using a 130 nm SiGe:C BiCMOS process (fT/fmax = 220/280 GHz, Vceo = 1.6 V
[71]), a frequency quadrupler radiator at 0.92-0.944 THz is designed. The
circuit achieves a generated output power of -17.3 dBm at 0.928 THz, which
results in -10 dBm of EIRP and places it among the highest-frequency radiators
in Si/SiGe processes. The nonlinear model of the active device is presented in
Section II. The harmonic power optimization is presented in Section III.
Section IV covers the design of a high power frequency quadrupler. The
measurement results and comparison with the state-of-the-art are presented in
Section V and the paper is concluded in Section VI.
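As a quick reading aid for the power figures quoted above, the sketch below
converts the two dBm values to linear units and takes their difference. The
helper name dbm_to_mw is an assumption made for this illustration; only the
values -17.3 dBm and -10 dBm come from the text.

```python
def dbm_to_mw(p_dbm: float) -> float:
    """Convert a power level in dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)


generated_dbm = -17.3   # generated output power quoted in the text
eirp_dbm = -10.0        # EIRP quoted in the text

generated_uw = dbm_to_mw(generated_dbm) * 1e3   # microwatts
eirp_uw = dbm_to_mw(eirp_dbm) * 1e3             # microwatts
difference_db = eirp_dbm - generated_dbm        # dB between the two figures

print(f"generated power: {generated_uw:.1f} uW")
print(f"EIRP:            {eirp_uw:.1f} uW")
print(f"difference:      {difference_db:.1f} dB")
```

The two figures differ by 7.3 dB. If the generated power is taken as the power
delivered to the radiating structure, this difference would correspond to its
net gain; that interpretation is an assumption of this note.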
2.2 Nonlinear Model of the two-port device
An accurate model of the transistor or any nonlinear component is essential to
determine the mechanisms of harmonic signal generation. The proposed model
should easily cope with arbitrary embedding networks and also encapsulate
the variations of device behavior, e.g., the frequency dependence of the
nonlinear profile. In this section, we propose a model based on the
Volterra-Wiener theory [72], which meets the desired criteria.
2.2.1 Review of Volterra-Wiener Theory of Nonlinear Devices
In 1958, Wiener re-arranged the nonlinear series that Volterra had found in
1887, which is why engineers mostly refer to the Volterra-Wiener theory [72].
These series capture the different linear and nonlinear effects that contribute
to the output components. Both time-domain and frequency-domain representations
of these series exist; however, we are interested in the frequency-domain
equations, as we would like to capture the dynamics of harmonic variations.
Utilizing the multi-dimensional Fourier transform, the frequency-domain
counterpart of each kernel ki [72] is defined as,
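\[
K_i(j\omega_1, \cdots, j\omega_i) = \int_{\mathbb{R}} \cdots \int_{\mathbb{R}} k_i(\tau_1, \cdots, \tau_i)\, e^{-j(\omega_1\tau_1 + \cdots + \omega_i\tau_i)}\, d\tau_1 \cdots d\tau_i. \tag{2.1}
\]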
As an example, for i = 1, the Fourier transform of the linear system K1(jω) is
obtained, i.e.,
\[
Y_1(j\omega) = K_1(j\omega) X(j\omega). \tag{2.2}
\]
Similarly, for a Pth-order operator, utilizing the P-dimensional Fourier
transform,
\[
Y_P(j\omega_1 + \cdots + j\omega_P) = K_P(j\omega_1, \cdots, j\omega_P) X(j\omega_1) \cdots X(j\omega_P). \tag{2.3}
\]
The equation in (2.3) is the general form of an intermodulation. As an example,
for a 2nd-order operator and an input signal containing ω0 and 2ω0 components,
K2(jω0, j2ω0) represents the intermodulation of the fundamental and the second
harmonic that generates the 3rd harmonic component of the output. For identical
ωi’s, KP represents the Pth-order nonlinear transfer of ωi at the input to Pωi
at the output. For P = 1, (2.3) is a linear operator.
We are mostly interested in the response of the system to a periodic time
domain signal, e.g.,
\[
x(t) = \sum_{i \le L} c_i \cos(i\omega_0 t), \tag{2.4}
\]
which is a real periodic signal that contains the harmonics up to the Lth-order.
For this input, the output time domain signal y(t), is the combination of linear,
nonlinear and intermodulation operators. For example, the response of a second
order system to the input Acos(ω0t) is,
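\[
y(t) = 2\left(\frac{A}{2}\right)^{2} \mathrm{Re}\{K_2(j\omega_0, j\omega_0)\, e^{j2\omega_0 t}\} + A\, \mathrm{Re}\{K_1(j\omega_0)\, e^{j\omega_0 t}\} + 2\left(\frac{A}{2}\right)^{2} \mathrm{Re}\{K_2(j\omega_0, -j\omega_0)\}, \tag{2.5}
\]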
where the first term represents the 2nd-order operator from ω0 to 2ω0, the
second term is the linear response of the system at ω0, and the last term is
the intermodulation product, which falls at DC. Without showing the complex
general expression for the case of a multi-harmonic input signal applied to a
higher-order nonlinear system [72], we introduce a two-port nonlinear model in
the next subsection.
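The second-order response in (2.5) can be checked numerically on a toy system.
The sketch below assumes a first-order low-pass filter followed by the
polynomial a1*u + a2*u^2, for which K1(jω) = a1 H(jω) and K2(jω1, jω2) =
a2 H(jω1) H(jω2). This system, its time constant and its coefficients are
hypothetical choices for illustration and are not the transistor model of this
chapter.

```python
import cmath
import math

# Hypothetical toy system: low-pass filter H followed by a1*u + a2*u**2.
TAU = 0.5e-12      # assumed filter time constant, seconds
A1_COEF = 1.0      # assumed linear coefficient
A2_COEF = 0.3      # assumed quadratic coefficient


def h(jw: complex) -> complex:
    """Frequency response of the assumed low-pass filter."""
    return 1.0 / (1.0 + jw * TAU)


def k1(jw: complex) -> complex:
    """First-order kernel of the toy system."""
    return A1_COEF * h(jw)


def k2(jw1: complex, jw2: complex) -> complex:
    """Second-order kernel of the toy system."""
    return A2_COEF * h(jw1) * h(jw2)


def y_direct(t: float, amp: float, w0: float) -> float:
    """Steady-state output computed directly in the time domain."""
    u = amp * (h(1j * w0) * cmath.exp(1j * w0 * t)).real  # filtered cosine
    return A1_COEF * u + A2_COEF * u * u


def y_volterra(t: float, amp: float, w0: float) -> float:
    """Steady-state output assembled term by term as in (2.5)."""
    jw = 1j * w0
    second = 2.0 * (amp / 2.0) ** 2 * (k2(jw, jw) * cmath.exp(2j * w0 * t)).real
    linear = amp * (k1(jw) * cmath.exp(1j * w0 * t)).real
    dc = 2.0 * (amp / 2.0) ** 2 * k2(jw, -jw).real
    return second + linear + dc


W0 = 2.0 * math.pi * 240e9   # example fundamental, rad/s
worst = max(abs(y_direct(n * 1e-13, 0.2, W0) - y_volterra(n * 1e-13, 0.2, W0))
            for n in range(50))
print(f"largest mismatch over 50 time samples: {worst:.2e}")
```

For this toy case the two computations agree to within floating-point rounding,
which illustrates how the three terms of (2.5) account for the second harmonic,
the fundamental and the DC shift.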
2.2.2 Nonlinear Two-port Device Modeling
To propose a general nonlinear model, we should capture the important
nonlinear mechanisms in different types of transistors, e.g., MOS and BJT. The
standard simplified circuit models of these devices are shown in Fig. 2.1.
According to these models, the three-port device can be modeled with two ports
if the voltage variations at the third port (the emitter in a BJT or the source
in a MOS transistor) are negligible. The linear two-port model of transistors
is shown in Fig. 2.2(a). Based on what Mason introduced in [73], the activity
condition of this three-terminal (two-port) device is defined by an invariant
function U. In this terminology, the three-terminal device is embedded in a
4-port, linear, lossless, reciprocal network, as shown in Fig. 2.2(c). Based on
the method proposed in [62], the net power flowing out of the device determines
the device activity. Moreover, optimum conditions to generate maximum power at
the oscillation frequency are calculated in [62]. The elements of the
admittance (Y) matrix determine the values of these optimum conditions.
[Figure 2.1 schematics: (a) MOS model with nodes Vg, Vd, Vs and elements Cgs,
Cgd, gmVgs, 1/go; (b) BJT model with nodes Vb, Vc, Ve and elements gmVbe, 1/go,
two capacitors and a resistor.]
Figure 2.1: Simplified circuit model of (a) a MOS transistor and (b) a BJT
transistor.
[Figure 2.2 diagrams: two-port blocks labeled “Linear Device” and “Nonlinear
Device” with port quantities Iin, Iout, Vin, Vout (I1, I2, V1, V2 and a [Y]
block for the linear device), each also shown inside a “Linear, Lossless,
Reciprocal Embedding” with outer port quantities I'in, I'out, V'in, V'out
(I'1, I'2, V'1, V'2).]
Figure 2.2: (a) A three-terminal linear two-port device, (b) a three-terminal
nonlinear two-port device, (c) a linear device embedded in a 4-port network
and (d) a nonlinear device embedded in a 4-port network.
The linear two-port model fails to address the harmonic generation due to
the device nonlinearity. Therefore, we would like to find a two-port model
which can capture the major nonlinear mechanisms in the transistor. By
selecting invariant DC conditions across the transistor, the variations in the
device capacitors become almost negligible [74]. However, for a fixed DC
condition, the transistor still exhibits nonlinear behavior due to the current
distortion of the output channel. The output channel distortion is mainly due
to the transconductance gm and the output conductance go. For the two-port
model in this work, we consider a nonlinear profile for any element of the Y
matrix which contains gm or go and leave the rest as linear parameters. The
reader can verify that Y21 and Y22 of the active devices in Fig. 2.1 contain
terms from gm and go, while Y11 and Y12 do not.
Therefore, similar to the terminology of [73], in this paper the transistor is
modeled as a two-port network, as shown in Fig. 2.2(b). Consequently, the
transistor can be embedded in a linear, reciprocal network, as shown in
Fig. 2.2(d). The multi-harmonic arrays of current and voltage, $\vec{I}$ and
$\vec{V}$, are considered in this model to reflect the fact that the current
and voltage components at any harmonic depend on the components at other
harmonics.
[Figure 2.3 diagrams: (a) two-port with Iin, Iout, Vin, Vout, split into a
“Linear Part of model” (Y11, Y12) and a “Nonlinear Part of model” (G, H);
(b) the same device with Vin, Vout, Iin, Iout embedded with the passive
elements ZM, Y1 and Y3.]
Figure 2.3: (a) Nonlinear two-port representation of the device and (b) the
device embedded with a general passive network.
The current generation mechanisms in transistors dictate a nonlinear
relationship between the output current $\vec{I}_{out}$ and the input/output
voltages ($\vec{V}_{in}$ and $\vec{V}_{out}$). However, the input admittance
(Y11) and the feedback admittance (Y12) of the transistor are still considered
linear functions. In the proposed model of Fig. 2.3(a), the G and H functions
represent the different-order transconductance functions from $\vec{V}_{in}$ to
$\vec{I}_{out}$ and the conductances from $\vec{V}_{out}$ to $\vec{I}_{out}$,
respectively. The coefficient Gij is defined as the gain from the ith harmonic
of the input voltage to the jth harmonic of the output current. Similarly, Hij
is defined as the gain from the ith harmonic of the output voltage to the jth
harmonic of the output current. It follows that for identical i and j the
operator is linear, and for different i and j it is a nonlinear or
intermodulation operator. If j is a multiple of i, the Gij and Hij
coefficients represent a j/i-order nonlinear operator, i.e.,
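\[
G_{ij}(i\omega_0, j\omega_0) = \frac{\Delta I_{out,j\omega_0}}{\Delta V_{in,i\omega_0}^{j/i}}, \tag{2.6}
\]
\[
H_{ij}(i\omega_0, j\omega_0) = \frac{\Delta I_{out,j\omega_0}}{\Delta V_{out,i\omega_0}^{j/i}}. \tag{2.7}
\]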
It is noteworthy that each coefficient is calculated while the other voltage
terms are held constant. The proposed model differs in a major way from the
broadband polyharmonic model [75]. The assumption of a dominant fundamental
signal in [75] is essential to use the harmonic superposition principle [76].
Therefore, the coefficients in [75] depend on the amplitude of the fundamental
voltage. However, in the proposed model, the voltage variations are captured in
the calculation scheme of the coefficients; hence, the G and H coefficients are
assumed amplitude-independent. The values of the nonlinear operators are
calculated for a fundamental frequency of ω0 and we simply denote them as Gij
and Hij coefficients. In case j is not a multiple of i, G and H are
intermodulation operators. Based on (2.3), there are many intermodulations
between any two harmonics. As an example, H2(2ω0, ω0), H3(2ω0, 3ω0, −2ω0) and
H4(2ω0, 3ω0, −ω0, −ω0) all relate the second harmonic of the output voltage to
the 3rd harmonic of the output current in a transistor with 4th-order
nonlinearity. However, in our simulations, the numerical values of the
intermodulation components at higher frequencies are small; hence, we neglect
the effect of the intermodulation terms. In contrast, for low-frequency highly
nonlinear circuits such as mixers, these coefficients should be considered for
the analysis of mixing products. The calculation of these intermodulation
coefficients is shown in [72].
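To make the ratio in (2.6) concrete, the sketch below estimates a G14-like
coefficient for a toy memoryless device. It assumes a hypothetical 4th-order
polynomial i = a1 v + a2 v^2 + a3 v^3 + a4 v^4 driven by a single tone and uses
a single-bin DFT over one period to read the 4th-harmonic current phasor. The
polynomial, its coefficients and the helper names are illustrative assumptions;
this is not the extraction procedure used for the transistor in this work.

```python
import cmath
import math


def harmonic_phasor(samples, k):
    """Single-bin DFT: complex amplitude of the k-th harmonic over one period."""
    n = len(samples)
    acc = sum(s * cmath.exp(-2j * math.pi * k * m / n) for m, s in enumerate(samples))
    return 2.0 * acc / n


def toy_current(v, coeffs):
    """Memoryless polynomial i = a1*v + a2*v**2 + ... (hypothetical device)."""
    return sum(a * v ** (p + 1) for p, a in enumerate(coeffs))


def estimate_g14(v_amp, coeffs, n=64):
    """4th-harmonic current phasor divided by the 4th power of the input amplitude."""
    wave = [toy_current(v_amp * math.cos(2.0 * math.pi * m / n), coeffs)
            for m in range(n)]
    return harmonic_phasor(wave, 4) / v_amp ** 4


COEFFS = (1e-2, 4e-3, 2e-3, 8e-4)   # hypothetical a1..a4, not fitted to any transistor

for amp in (0.1, 0.3):
    print(f"V = {amp} V: estimated ratio = {abs(estimate_g14(amp, COEFFS)):.3e}")
```

In this toy case the ratio equals a4/8 for every drive amplitude, because
cos^4 contributes a cos(4θ) term with weight 1/8 and the polynomial stops at
4th order. A device with higher-order terms or memory would not behave this
simply.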
[Figure 2.4 graphics: two circuit schematics, (a) and (b), each with VCC, TL1,
TL2 and the annotations LE = 7 µm, IE = 1.3 mA, Fin = 120 GHz, alongside
real-imaginary plane plots of R2 and S2 traced under the variations ΔTL1 and
ΔTL2.]
Figure 2.4: (a) The variations of R2 and S2 with respect to the illustrated
changes of TL1 and TL2 and (b) the corresponding variations for the same
transmission line changes in a different topology. For an identical
transistor, the topology determines the behavior of the ratio functions.
2.2.3 Combination of linear passive network and nonlinear device
As shown in [72], the interconnection of two nonlinear systems is described by
the sum of the corresponding kernels in the time domain, which translates to
the summation of transfer functions in the frequency domain. Based on (2.2), a
linear system can be considered a first-order nonlinear system. Therefore, by
placing the transistor in a linear embedding network, the linear operators of
the device (Y11, Y12, Gii and Hii) are updated accordingly; however, the
nonlinear coefficients remain unchanged. For example, by placing the transistor
in the general embedding network of Fig. 2.3(b), the new linear operators of
the transistor are updated accordingly, i.e.,
In these equations, Yij,M are the elements of the Y matrix of ZM which are
calculated to be
The proposed nonlinear model is utilized inside any arbitrary circuit
topology. Using this result, the nonlinear harmonic power optimization is
performed in the next section.
[Figure 2.5 graphics: (a) two-port of the selected transistor (LE = 2 µm, with
Iin, Iout, Vin, Vout) and a table of G and H coefficients; (b) quadrupler
schematic (VCC, TL1, TL2, TL3, C2; LE = 2 µm, IE = 1.5 mA, Fin = 240 GHz);
(c) real-imaginary plane plots of R2, R4, S2 and S4 with a table of values.
Value pairs in the order they appear in the figure: labels G24, G14, H24, H14
with 0.0021 -105; 0.000031 -29; 0.00137 241; 0.00006 223. G44: 0.031 -100.
H44: 0.056 82. Labels S2, S4, R2, R4 with 0.0064 75; 0.0032 193; 0.1227 -43;
0.0063 -74.]
Figure 2.5: (a) The selected transistor size and the corresponding G and H
coefficients, (b) the topology of the frequency quadrupler and (c) the
variations of the topology Ri and Si functions by changing TL1, TL2 and TL3.
The mean value of the functions and their relative location on the
real-imaginary axis are also shown.
2.3 Nonlinear Harmonic Power Optimization
When the transistor operates at a fundamental frequency of ω0, the nonlinear
operators generate components at different harmonics, i.e., nω0. For a more
nonlinear device, higher-order operators appear and contribute to power
generation at higher harmonic indices. The frequency dependence of the G and
H operators takes into account the variation of the device characteristic with
frequency. In order to evaluate the device performance at a certain harmonic,
the real power flowing out of the device at that harmonic is defined as,
\[
P_{R,i} = \mathrm{Re}\{V_{in,i}^{*} I_{in,i} + V_{out,i}^{*} I_{out,i}\}. \tag{2.8}
\]
The expression in (2.8) consists of two power terms, at the input and output
ports of the device. This definition is more comprehensive for oscillators,
where the total power determines the DC-to-RF efficiency. However, in circuits
with an input, such as frequency multipliers, the transistor is fed with RF
power at ω0 and the output power is extracted at a certain harmonic of the
fundamental frequency. Therefore, we solely consider the real power at the
output port,
\[
P_{R,out,i} = \mathrm{Re}\{V_{out,i}^{*} I_{out,i}\}. \tag{2.9}
\]
Based on the harmonic index i, particular nonlinear operators that generate
Iout,i are considered. The general expression of (2.9) is written as,
\[
P_{R,out,i} = \mathrm{Re}\{V_{out,i}^{*}[G(\vec{V}_{in}) + H(\vec{V}_{out})]\}, \tag{2.10}
\]
where $G(\vec{V}_{in})$ and $H(\vec{V}_{out})$ represent the contribution of
the input voltage array and the output voltage array to Iout,i, respectively.
As an example, if the harmonic content up to the 4th order is considered in a
nonlinear transistor operating at the fundamental frequency of ω0,
\[
I_{out,4} = G_{14}V_{in,1}^{4} + G_{24}V_{in,2}^{2} + G_{44}^{N}V_{in,4} + H_{14}V_{out,1}^{4} + H_{24}V_{out,2}^{2} + H_{44}^{N}V_{out,4}, \tag{2.11}
\]
represents the major terms that generate the harmonic current Iout,4. It is
noteworthy that the third harmonic and higher harmonics impact the output
current at the fourth harmonic only through intermodulation terms. As mentioned
before, based on the relative value of these terms compared to those in (2.11),
the intermodulation terms are neglected. The expression in (2.11) contains
several different voltage components, so the power optimization is not trivial.
Moreover, the superscript N on the linear Volterra coefficients indicates that
these coefficients vary with the linear passive network. Therefore, we need to
reduce the number of voltage components and simplify the harmonic current
expression. Simultaneously, the impact of the selected circuit topology on the
nonlinear device should be characterized.
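The bookkeeping in (2.9) and (2.11) can be sketched in a few lines. The example
below evaluates the six current terms and the resulting real output power for a
set of phasors. All numerical values, the dictionary layout and the function
names are hypothetical placeholders chosen to exercise the expressions; they
are not the coefficients of the transistor used in this work.

```python
import cmath
import math


def polar(mag: float, deg: float) -> complex:
    """Build a complex phasor from a magnitude and a phase in degrees."""
    return cmath.rect(mag, math.radians(deg))


def i_out4(g: dict, h: dict, v_in: dict, v_out: dict) -> complex:
    """Fourth-harmonic output current following the six terms of (2.11).

    g and h map the source harmonic index (1, 2, 4) to G_i4 and H_i4;
    v_in and v_out map the harmonic index to the voltage phasor.
    """
    return (g[1] * v_in[1] ** 4 + g[2] * v_in[2] ** 2 + g[4] * v_in[4]
            + h[1] * v_out[1] ** 4 + h[2] * v_out[2] ** 2 + h[4] * v_out[4])


def p_out4(g: dict, h: dict, v_in: dict, v_out: dict) -> float:
    """Real power at the output port for i = 4, as defined in (2.9)."""
    return (v_out[4].conjugate() * i_out4(g, h, v_in, v_out)).real


# Placeholder numbers (hypothetical), used only to run the functions.
G = {1: polar(1e-4, -30.0), 2: polar(2e-3, -100.0), 4: polar(3e-2, -100.0)}
H = {1: polar(1e-4, 220.0), 2: polar(1e-3, 240.0), 4: polar(5e-2, 80.0)}
V_IN = {1: polar(0.30, 0.0), 2: polar(0.03, -40.0), 4: polar(0.002, -70.0)}
V_OUT = {1: polar(0.80, 150.0), 2: polar(0.05, 75.0), 4: polar(0.003, 190.0)}

current = i_out4(G, H, V_IN, V_OUT)
print(f"|I_out,4| = {abs(current):.3e} A, phase = {math.degrees(cmath.phase(current)):.1f} deg")
print(f"P_R,out,4 = {p_out4(G, H, V_IN, V_OUT):.3e} W")
```

Writing the expression this way makes the difficulty noted above visible: the
power depends on six complex voltages at once, which motivates reducing them to
the fundamental components.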
2.3.1 Impact of Circuit Topology and the Ratio Functions
The number of independent parameters in (2.11) can be reduced by relating
the harmonic components to the fundamental components. However, the ratios of
the harmonic voltage components to the fundamental components are determined
by the circuit topology. In other words, the structure of the passive network,
the initial DC conditions and the transistor size determine the ratio of the
harmonic and the fundamental signals at the input and output ports of the
device.
[Figure 2.6 plots: panels (a), (b) and (c) each show the 4th harmonic output
power (µW) versus φ (degree) and A (V/V), comparing “Circuit simulation” with
“Model Prediction”; panel (d) is the simulated circuit (VCC, TL1, TL2, TL3,
C2; LE = 2 µm, IE = 1.5 mA, Fin = 240 GHz, Pin = 8 dBm, output Pout).]
Figure 2.6: (a) The harmonic power prediction using the approximate model
(2.21), compared with the circuit simulations, (b) the harmonic power
prediction using the moderate model (2.19), compared with the circuit
simulations, (c) the harmonic power prediction using the accurate model
(2.15), compared with the circuit simulations and (d) the circuit schematic
used in these simulations.
Accurate Model
For any fixed circuit topology (invariant passive network with variable values,
constant bias conditions and invariant transistor), the “ratio functions” are
defined, i.e.,
\[
V_{in,i} = R_i(V_{in,1}), \tag{2.12}
\]
\[
V_{out,i} = S_i(V_{out,1}), \tag{2.13}
\]
and for the 4th harmonic current,
\[
I_{out,4} = G_{14}V_{in,1}^{4} + G_{24}R_{2}^{2}(V_{in,1}) + G_{44}^{N}R_{4}(V_{in,1}) + H_{14}V_{out,1}^{4} + H_{24}S_{2}^{2}(V_{out,1}) + H_{44}^{N}S_{4}(V_{out,1}), \tag{2.14}
\]
\[
P_{R,out,4} = \mathrm{Re}\{S_{4}^{*}(V_{out,1}^{*})\, I_{out,4}\}. \tag{2.15}
\]
By changing the values of the passive components around the transistor in a
fixed topology, the values of the ratio functions change. More importantly, as
shown in Fig. 2.4, for a given transistor, the ratio functions behave
completely differently when the topology is changed. This is why the topology
invariance constraint is imposed in this paper to simplify the power
optimization.
In Appendix.1, an analytical study of the ratio functions is presented for the
circuit of Fig. 2.4(a). The amplitude and phase of each harmonic voltage
component inside this simple circuit depend on many parameters, e.g., the
width of the transmission lines, the input source impedance and the transistor
linear/nonlinear operators. The trends and closed-form expressions can become
very complicated in circuits with more passive components. Therefore, we
would like to find a simpler way to perform the power optimization. In the next
two subsections, we introduce two simpler approximations, which can be used to
estimate the optimum conditions of harmonic power.
Moderate Model
As the values of the passive components change, the ratio values also change
slightly for a fixed topology (Fig. 2.3). In addition, the variation of the
passive components changes Gii and Hii. To simplify the power optimization, we
substitute the ratio functions with a fixed value (the ratio constant) for the
selected topology. Without loss of generality, the mean value of the ratio
values ($\bar{R}_i$ and $\bar{S}_i$) is selected as the ratio constant, as
shown in Fig. 2.5(c), i.e.,
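\[
R_{i,m} = \bar{R}_i, \tag{2.16}
\]
\[
S_{i,m} = \bar{S}_i, \tag{2.17}
\]
\[
I'_{out,4} = G_{14}V_{in,1}^{4} + G_{24}R_{2,m}^{2}V_{in,1}^{2} + G_{44}^{N}R_{4,m}V_{in,1} + H_{14}V_{out,1}^{4} + H_{24}S_{2,m}^{2}V_{out,1}^{2} + H_{44}^{N}S_{4,m}V_{out,1}, \tag{2.18}
\]
\[
P_{R,out,4} \simeq \mathrm{Re}\{S_{4,m}^{*} V_{out,1}^{*} I'_{out,4}\}. \tag{2.19}
\]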
Approximate Model
Taking into account the variations of the linear coefficients with the passive
components in (2.15) and (2.19) adds to the complexity of the power expression.
To achieve a faster power estimation, we sacrifice some accuracy by neglecting
this effect. As shown later, this approximate version still guides us toward
the maximum power location. Therefore, we neglect the impact of passive
component variations on the linear coefficients, i.e.,
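\[
I''_{out,4} = G_{14}V_{in,1}^{4} + G_{24}R_{2,m}^{2}V_{in,1}^{2} + G_{44}R_{4,m}V_{in,1} + H_{14}V_{out,1}^{4} + H_{24}S_{2,m}^{2}V_{out,1}^{2} + H_{44}S_{4,m}V_{out,1}, \tag{2.20}
\]
\[
P_{R,out,4} \simeq \mathrm{Re}\{S_{4,m}^{*} V_{out,1}^{*} I''_{out,4}\}. \tag{2.21}
\]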
The final simplified harmonic power expression is obtained by substituting
Vout,1 = A1Vin,1, where A1 = A e^{jφ} is a complex gain. This enables us to
find the optimum conditions of harmonic power generation in terms of A and φ.
Therefore, the harmonic power is related to the circuit topology (ratio
constants), the transistor nonlinearity (Gij and Hij coefficients) and the
fundamental frequency gain, A1. Moreover, the dependency on the amplitude of
Vin,1 captures the variation of the harmonic power with the input power at the
fundamental frequency.
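The search over A and φ described above can be sketched as a simple grid sweep
of the approximate model (2.20)-(2.21). The coefficient values, ratio
constants, input amplitude and grid resolution below are hypothetical
placeholders, and the brute-force search is only one possible way to locate the
maximum; it is not claimed to be the procedure used to produce the figures in
this chapter.

```python
import cmath
import math


def p4_approx(a, phi_deg, v_in1, g, h, r_m, s_m):
    """Approximate 4th-harmonic output power, (2.20)-(2.21), with V_out,1 = A e^{j phi} V_in,1.

    g and h map the source harmonic index (1, 2, 4) to G_i4 and H_i4;
    r_m and s_m map the harmonic index (2, 4) to the ratio constants.
    """
    v_out1 = a * cmath.exp(1j * math.radians(phi_deg)) * v_in1
    i4 = (g[1] * v_in1 ** 4 + g[2] * r_m[2] ** 2 * v_in1 ** 2 + g[4] * r_m[4] * v_in1
          + h[1] * v_out1 ** 4 + h[2] * s_m[2] ** 2 * v_out1 ** 2 + h[4] * s_m[4] * v_out1)
    return ((s_m[4] * v_out1).conjugate() * i4).real


def sweep(v_in1, g, h, r_m, s_m, a_grid, phi_grid):
    """Grid search returning (power, A, phi_deg) at the largest predicted power."""
    return max((p4_approx(a, phi, v_in1, g, h, r_m, s_m), a, phi)
               for a in a_grid for phi in phi_grid)


def polar(mag, deg):
    """Complex number from a magnitude and a phase in degrees."""
    return cmath.rect(mag, math.radians(deg))


# Hypothetical placeholder values, not the coefficients of the designed circuit.
G = {1: polar(1e-4, -30.0), 2: polar(2e-3, -100.0), 4: polar(3e-2, -100.0)}
H = {1: polar(1e-4, 220.0), 2: polar(1e-3, 240.0), 4: polar(5e-2, 80.0)}
R_M = {2: polar(0.12, -40.0), 4: polar(0.006, -70.0)}
S_M = {2: polar(0.006, 75.0), 4: polar(0.003, 190.0)}
A_GRID = [1.0 + 0.1 * k for k in range(21)]      # A from 1 to 3 V/V
PHI_GRID = list(range(0, 360, 5))                # phi in degrees

best_power, best_a, best_phi = sweep(0.3, G, H, R_M, S_M, A_GRID, PHI_GRID)
print(f"largest predicted power on the grid at A = {best_a:.1f}, phi = {best_phi} deg")
```

Because the expression is cheap to evaluate, a coarse grid like this is usually
enough to see where the maximum lies before refining the passive network.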
2.4 Design of a High Power Frequency Quadrupler
Based on the model developed in the previous section, we design a high power
THz frequency quadrupler. The utilized technology (BiCMOS 130 nm SiGe:C)
has an fmax of 280 GHz [77]; hence, the input frequency is selected below fmax
to sustain a high swing at the fundamental frequency. For a target output
frequency of 0.92-0.96 THz, obtained by extracting the 4th harmonic, an input
frequency of 230-240 GHz is selected.
2.4.1 Transistor Selection
In selecting the transistor, the major considerations are the parasitics and
the current drive. For THz operation, the transistor size is kept small in
order to reduce the parasitic resistance and capacitance. On the other hand,
the smallest possible width for a MOS transistor or emitter length for a BJT
does not provide sufficient transconductance to generate high output power.
This is the main reason that a bipolar transistor (Q1) with an emitter length
of 2 µm is selected, as shown in Fig. 2.5(a). The Volterra coefficients of the
selected transistor are also listed in Fig. 2.5(a).
[Figure 2.7 plot, titled “4th harmonic power in terms of fundamental A and φ
using the approximate model”: 4th harmonic output power (µW) versus φ (degree,
0 to 350) and A (V/V, 1 to 3).]
Figure 2.7: According to the approximate model, we can find the location of
the optimum harmonic conditions in the [A,Φ] plane. In order to maximize the
harmonic power, the transmission lines are selected to reach these optimum
conditions.
2.4.2 Circuit Topology and Optimum Embedding Network
Among the different circuit configurations for a bipolar transistor, the
common-emitter (CE) topology of Fig. 2.5(b) is selected. Because the output
current is extracted from the collector node, the CE topology conserves the
nonlinear profile of the device and generates high-power harmonics. The second
reason concerns the matching feasibility and the power extraction. The base
transmission line TL1 and the junction capacitor of Q1 are used for the input
power matching. On the other hand, adding the TL1 transmission line at the base
determines the value of the ratio constants (i.e., R and S) at the input. By
adding TL2 at the emitter, the values of the Ri and Si constants are changed
again. Consequently, the addition of the TL2 transmission line transforms the
impedance seen by the current source and the bypass capacitor (C2) to a smaller
value. For the selected transistor and topology, the ratio functions obtained
by changing the characteristics of TL1, TL2 and TL3, together with the
corresponding mean values (ratio constants), are shown in Fig. 2.5(b) and
Fig. 2.5(c). The power expressions in (2.15), (2.19) and (2.21) are used to
predict the harmonic power for the circuit topology of Fig. 2.5. As the results
in Fig. 2.6 illustrate, all three models closely match the circuit simulations.
This is the main reason that the simplified expression in (2.21) is used in
Fig. 2.7 to find the location of the maximum 4th harmonic power. In order to
reach these optimum conditions, the transmission lines can be selected in a
manner similar to [62].
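[Figure 2.8 graphics: (a) schematic of transistors Q1 and Q2 with port
quantities Iin, Iout, Vin, Vout; (b) layout showing the B, E and C terminals of
Q1 and Q2, GND and the substrate contact.]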
Figure 2.8: (a) The nonlinearity manipulation by adding transistor Q2 and
(b) the layout of the Q1 and Q2 transistor combination.
2.4.3 Device characteristic manipulation
Besides the passive component effects, the nonlinear device can be manipulated
to further increase the harmonic power. As shown in Fig. 2.3(b), by adding a
component in parallel with the transistor, the linear operators, i.e., Gii and
Hii, are manipulated. This is particularly important for operation at higher
harmonics, where multiple harmonics are involved in the power generation and
more adjustments are required before reaching the optimum power. Based on the
values of the G and H functions and the S and R coefficients of the selected
topology in Fig. 2.5, to achieve a higher 4th harmonic power: 1) the magnitude
of G44 should be increased and 2) the values of S2 and R2 at the output and
input ports of the device should be preserved. The first condition is satisfied
by adding a passive component which exhibits a low impedance (high admittance)
at the 4th harmonic frequency. However, the selected component has to provide a
high impedance (ideally zero admittance) at the fundamental and second harmonic
frequencies in order to preserve the R2 and S2 constants. Therefore, we have to
add a small capacitor in parallel with the transistor. Since it is not
straightforward to design small capacitors in the THz frequency range, a
bipolar transistor in the cut-off region is utilized instead. The selected
0.6 µm transistor exhibits a high impedance at the lower harmonics and a low
impedance at the fourth harmonic. In order to minimize the parasitics of the
combination of the two transistors, the substrate contact of the confined
transistors is redrawn inside a compact structure, as shown in Fig. 2.8. As
shown in Fig. 2.9, by combining Q1 and Q2, the predicted harmonic power at the
design point increases by 2.2 dB compared to the case where Q2 is not added. In
the circuit simulations, a 2 dB power increase is achieved.
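The frequency selectivity that motivates a small shunt capacitance can be
illustrated with an ideal capacitor. The sketch below evaluates |Z| = 1/(2πfC)
at the fundamental, second and fourth harmonics. The 1 fF value is a
hypothetical placeholder, and an ideal capacitor is only a first-order stand-in
for the cut-off transistor used in the actual design.

```python
import math


def cap_impedance_mag(c_farad: float, f_hz: float) -> float:
    """Magnitude of the impedance of an ideal capacitor, |Z| = 1 / (2*pi*f*C)."""
    return 1.0 / (2.0 * math.pi * f_hz * c_farad)


C_SHUNT = 1.0e-15   # hypothetical 1 fF shunt capacitance, not a value from the design
F_FUND = 240e9      # fundamental input frequency used in this chapter, Hz

for n in (1, 2, 4):
    z = cap_impedance_mag(C_SHUNT, n * F_FUND)
    print(f"harmonic {n}: |Z| = {z:.0f} ohm")
```

An ideal capacitor gives only a 4:1 impedance ratio between the fundamental and
the fourth harmonic, so the value has to be small enough that its admittance at
the lower harmonics stays negligible next to the transistor.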
[Figure 2.9 plots: panels (a) and (b) show the 4th harmonic output power (µW)
versus φ (degree, 120 to 130) and A (V/V, 2.2 to 3), with
Vout,1 = A Vin,1 e^{jφ}; the “Design Point” is marked in both panels.]
Figure 2.9: The model prediction of the harmonic power in the vicinity of the
optimum point for the selected topology when (a) Q1 is the only nonlinear
component and (b) the Q1 and Q2 combination is the nonlinear component.
In order to design close to the optimum point, no extra passive components are
allowed, since they would change the circuit topology and the ratio constants
would have to be updated. However, as verified by the circuit simulations in
Fig. 2.5, there exists a combination of TL1, TL2 and TL3 which operates close
to the optimum point of Fig. 2.7. This point is indicated as Opt4 in Fig. 2.10.
On the other hand, due to the nonlinear profile of transistors, the harmonic
power is not maximized by providing the optimum fundamental conditions. This
phenomenon is illustrated in Fig. 2.10, where the fundamental and fourth
harmonic output powers reach their maxima at different points. This means that
a linear assumption fails to guide us towards the optimum point for the
harmonic power. Finally, it is noteworthy that, based on the proposed theory,
the maximum harmonic power is found for a particular topology. This is the main
difference between this model and previous linear modeling techniques. The
proposed systematic design flow of THz frequency multipliers is shown in
Fig. 2.11.
[Figure 2.10 plots: (a) 4th harmonic output power (µW) and (b) fundamental
output power (mW), each versus φ (degree) and A (V/V), with the points Opt1 and
Opt4 marked; (c) testbench schematic (VCC, TL1, TL2, TL3, C2; LE = 2 µm,
IE = 1.5 mA, Fin = 240 GHz, Pin = 8 dBm, output Pout).]
Figure 2.10: (a) The circuit simulations of the fourth harmonic output power
and (b) the simulated fundamental power. It is clear that the optimum
conditions of the fundamental and fourth harmonic, Opt1 and Opt4, are
different. (c) The simulation testbench.
[Figure 2.11 flowchart, "Design Flow of Beyond fmax Frequency Multipliers".
Steps: select the output frequency (fout); set the fundamental frequency
(fout/N); choose a transistor size; select the circuit topology; find the
G/H coefficients; find the ratio functions, either analytically or using the
approximate model; calculate the harmonic power.
Decisions: Is the power sufficient? Does the embedding network provide the
required amplitude and phase conditions? Does the embedding network provide
the power matching conditions?
Corrective actions: add passive matching circuit; change the topology;
change the transistor. End: complete the design.]
Figure 2.11: The summary of the proposed design flow for THz frequency
multipliers
2.4.4 Matching Conditions
The output embedding at the collector side is selected such that a large fun-
damental voltage swing is achieved for the selected transistor size. It should
also provide the matching conditions for the output signal and block the cur-
rent flow of unwanted harmonic components to the output. For this particular
frequency quadrupler, the output network is expected to provide the conjugate
matching condition for the 4th harmonic signal. In addition, the harmonic com-
ponents which are crucial for the power generation at the output frequency are
kept confined at the transistor.
As illustrated in Fig. 2.12, by differential operation, the powerful funda-
mental current does not flow toward the common node and remains inside the
[Figure 2.12 schematic: (a) differential quadrupler with Q1, Q2, CT1, TL1 to
TL6, tail devices M1 and M2 (W/L = 5/0.13 and 0.5/0.13), Vbase and bias
nodes, a λ/4 line at 4f, and the impedances Zf, Z2f, Z4f and Zm,f, Zm,2f,
Zm,4f with the conjugate match marked; (b) current intensity J at f0, 2f0 and
4f0 over the ground plane, with node A.]
Figure 2.12: (a) frequency quadrupler schematic and the matching condi-
tions at different harmonics, and (b) the distribution of the
current intensity of the harmonics.
transistor. However, the second harmonic currents of the two transistors in a
differential circuit are in phase and can potentially flow towards the output
antenna feed point (node A). To block the flow of the second harmonic, a
matching network is utilized which presents a high impedance to the second
harmonic current and blocks it before it reaches the output node. This
condition is provided by adding the stub transmission lines (TL5). On the other
hand, the stub presents a high impedance at the fourth harmonic, so the
corresponding current component flows through TL6 and reaches node A.
2.4.5 Single to Differential Power Conversion
Based on the advantage of differential operation mentioned above, the input
power coming from the external source is converted into differential form using
a balun [78]. There are three main considerations for the selected balun
structure: 1) it has to be wideband, 2) it should exhibit a low conversion loss
and 3) the selected architecture should fit the layout pattern imposed by
the rest of the circuit. Therefore, a microwave passive short-open balun [79] is
designed as shown in Fig. 2.13. The selected balun has a low conversion loss
(< 0.8 dB) and preserves the phase/amplitude match at the input.
[Figure 2.13 panels: (a) balun layout with Port 1 (input), Port 2, Port 3,
ground pads and open/ground terminations, with dimensions S1 = 1.8 µm,
S2 = 14 µm, W = 4.5 µm and L = 171 µm; (b) insertion loss (dB), amplitude
imbalance (dB) and phase imbalance (degree) versus frequency (GHz) from 220
to 250 GHz.]
Figure 2.13: (a) the layout of the input balun and (b) the performance sum-
mary of the microwave short-open structure.
2.4.6 Output Power Extraction
The generated harmonic power of the frequency multiplier has to be extracted
using an efficient scheme. In practice, there are two major techniques for THz
wave measurement: 1) designing low-capacitance pads and probing the output
signal, and 2) radiating the THz wave with an on-chip antenna. The probing
solution is challenged by the design of low-loss pads (due to the substrate
capacitance) and by the sensitivity of the probe performance. However, probing
allows a wideband measurement thanks to the broad transmission (S21) profile
of THz probes. The alternative to probing is wave radiation. As the frequency
approaches the THz range, the substrate thickness and wavelength
[Figure 2.14 panels: (a) layout of the two-element patch antenna array with
λ/2 spacing; (b) simulated S11 (dB) versus frequency (THz) from 0.91 to 0.96
THz and directivity patterns at Φ = 0 and Φ = π/2.]
Figure 2.14: (a) the layout structure of the patch antenna array and (b) the
simulated performance summary of the radiator.
become comparable and energy is lost in substrate modes. Extra wafer thinning
and a matched silicon lens are commonly used to cancel the resulting substrate
loss [46], [58], [64]. In this design, to avoid this issue, a two-element
patch antenna array with broadside radiation from the top of the chip is
utilized, as shown in Fig. 2.14. Due to the limited distance between the top
metal layer and the ground shield layer (< 5 µm), the antenna bandwidth is
limited. The simulated performance of the antenna is shown in Fig. 2.14; a
radiation efficiency of 72% at 960 GHz and a directivity of 7.3 dBi are
achieved. Since the characteristic impedance of a patch antenna is high
(100∼200 Ω), an array of two antennas is designed for better matching.
2.5 Measurement Results
The chip prototype is fabricated in a 0.13 µm SiGe:C BiCMOS technology from
STMicroelectronics. The circuit contains the microwave balun, the input
matching network, the frequency quadrupler and the patch antenna array. The chip
occupies a small area (0.37 mm²) as shown in Fig. 2.15 and draws 3 mA of
current from a 1.9 V supply. The measurement setup is shown in Fig. 2.16. A
[Figure 2.15 die photograph labels: input pad and balun, λ/4 open stub,
decoupling capacitors, output matching, patch antenna radiator; die
dimensions 560 µm × 670 µm.]
Figure 2.15: The die photograph
conventional measurement of THz radiators places the receiver antenna
at a far-field distance and approximates the transmitted power using the Friis
equation. In addition, the design of diagonal horn antennas is optimized for the
coupling of plane waves [80]. However, the drawback of the far-field
measurement is the propagation loss, which lowers the received power level
significantly
[Figure 2.16 block diagram, "Power Measurement Setup": signal source (Agilent
E8257D), VDI AMC (x16) with attenuation control, WR-3.4 bend, Cascade i325
GSG probe (WR-3.4), DUT with DC bias, WR-1 diagonal horn antenna at 2 mm,
WR10 waveguide, sensor head and Erickson PM4 power meter.]
Figure 2.16: The implemented power measurement setup.
below the transmitted power (∝ 1/d²).
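The 1/d² penalty of a far-field measurement can be made concrete with the Friis equation. The following is a minimal sketch, not part of the original measurement flow: the function names and the distances in the demo loop are hypothetical, and a lossless, polarization-matched free-space link is assumed.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def free_space_path_loss_db(freq_hz: float, distance_m: float) -> float:
    """Friis free-space path loss, 20*log10(4*pi*d/lambda), in dB."""
    wavelength_m = C0 / freq_hz
    return 20.0 * math.log10(4.0 * math.pi * distance_m / wavelength_m)


def friis_received_dbm(pt_dbm: float, gt_dbi: float, gr_dbi: float,
                       freq_hz: float, distance_m: float) -> float:
    """Received power predicted by the Friis equation (far field only)."""
    return pt_dbm + gt_dbi + gr_dbi - free_space_path_loss_db(freq_hz, distance_m)


if __name__ == "__main__":
    f_out = 0.928e12  # output frequency quoted in the text
    for d in (0.05, 0.10, 0.20):  # hypothetical far-field distances, metres
        loss = free_space_path_loss_db(f_out, d)
        print(f"d = {d * 100:.0f} cm: path loss = {loss:.1f} dB")
```

Each doubling of the distance adds about 6 dB of loss, which is why a weak source quickly drops below the sensitivity of the power meter in a far-field setup.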
For this chip, a near-field power measurement is performed because of the
limited radiated power and the insufficient sensitivity of the power meter. As
shown in Fig. 2.17(a), the WR-1 horn antenna, placed 2 mm above the radiator,
collects the radiation. Based on FDTD simulations with a precise EM solver, a
power coupling efficiency of 9.8% (10 dB of power loss) is achieved for the 4th
harmonic radiation from the chip to the input of the WR-10 waveguide in the
near-field measurement scheme. In addition, the WR-10 waveguide after the
horn antenna adds a propagation loss of 1 dB, measured using a VNA. Combined
with the horn antenna coupling loss, a total power loss of 11 dB is considered.
The rectangular waveguide of the horn antenna enforces an exponential
decay of the lower harmonics. On the other hand, the WR-10 waveguide
passes the smaller wavelengths and in particular the THz wave. Moreover, as
illustrated in Fig. 2.17(c), the simulated total radiated power of the 4th
harmonic is significantly higher than that of the other harmonics. This further
guarantees that
[Figure 2.17 panels: (a) near-field measurement setup with the WR-1 horn
antenna 2 mm above the chip, the WR-10 waveguide and the Cascade i325 GSG
probe (WR-3.4); (b) simulated Ex field coupling map (Nλ = 100, CPML = -60 dB,
Drude copper model); (c) total radiated power (dBm) versus frequency (GHz) at
the harmonics of 232 GHz.]
Figure 2.17: (a) the implemented near-field power measurement and (b)
the diagonal horn antenna near field coupling (d=2mm), sim-
ulated by an accurate FDTD solver and (c) the simulated to-
tal radiated power at different harmonics for an input at 232
GHz.
the unwanted radiations are not coupled to the power meter.
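The rejection of the lower harmonics by the horn's rectangular waveguide follows from the TE10 cutoff frequency, fc = c/(2a). The sketch below assumes that the "WR-1" horn uses the standard WR-1.0 broad-wall width of 0.010 inch (taken from the WR naming convention, not stated in the text) and an ideal lossless air-filled guide; the function names are hypothetical.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s
INCH_M = 0.0254     # metres per inch


def te10_cutoff_hz(broad_wall_m: float) -> float:
    """Cutoff frequency of the dominant TE10 mode of an air-filled guide."""
    return C0 / (2.0 * broad_wall_m)


def evanescent_loss_db_per_mm(freq_hz: float, broad_wall_m: float) -> float:
    """TE10 attenuation below cutoff in dB/mm (0 at or above cutoff)."""
    k0 = 2.0 * math.pi * freq_hz / C0  # free-space wavenumber, rad/m
    kc = math.pi / broad_wall_m        # TE10 cutoff wavenumber, rad/m
    if k0 >= kc:
        return 0.0
    alpha_np_per_m = math.sqrt(kc ** 2 - k0 ** 2)
    return alpha_np_per_m * (20.0 / math.log(10.0)) / 1000.0


if __name__ == "__main__":
    a_wr1 = 0.010 * INCH_M  # assumed WR-1.0 broad wall (0.254 mm)
    print(f"TE10 cutoff: {te10_cutoff_hz(a_wr1) / 1e9:.0f} GHz")
    for n in range(1, 5):  # harmonics of the 232 GHz input
        f = n * 232e9
        loss = evanescent_loss_db_per_mm(f, a_wr1)
        print(f"{f / 1e9:.0f} GHz: {loss:.1f} dB/mm below-cutoff attenuation")
```

Under these assumptions the cutoff lies near 590 GHz, so the fundamental and second harmonic are evanescent in the guide, while the third harmonic is not; its suppression relies on the lower radiated power shown in Fig. 2.17(c).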
One of the important considerations in the measurement of THz waves is the
characterization of the blackbody radiation. To perform an accurate power
measurement, the THz radiated power and the thermal emission should
be distinguished. The blackbody emission is particularly important when the
silicon die is large and consumes a high DC power. In this measurement, we rely
on an operational feature of the chip which helps the calibration of the
blackbody power. In principle, the quadrupler operation is divided into two
different modes, “radiator on” and “radiator off”, as shown in Fig. 2.18.
Except for the base voltage of Q1 (VBase in Fig. 2.12), all the voltages are
kept constant in the two cases. By changing the base voltage, the junction
capacitors of Q1 and the G and H coefficients change accordingly. However, due
to the constant bias of the tail current source and the constant value of Vdd,
the DC power consumption of the circuit does not change. On the other hand, the
variation of the G and H coefficients changes the radiated power at the 4th
harmonic by around 15 dB (Fig. 2.18(b)). Based on this phenomenon, the power
coupled to the power meter in the “radiator off” mode is effectively the
thermal radiation. In the “radiator on” mode, the power coupled to the power
meter corresponds to the sum of the directly coupled THz power and the
same blackbody emission. The power difference between the two cases
is not related to thermal variations, as the power consumption is the same. As
seen in the measurement results of Fig. 2.19, the coupled power exhibits
a jump within the lower frequency band of the measurement when the radiator is
turned on and remains unchanged outside this band. This validates that
temperature variation is not the cause of these changes. Moreover, based on the
bandwidth of the designed antenna, we expect to observe the THz power
radiation in a limited bandwidth, which is also the case in our measurement. The
antenna simulated with HFSS has a slightly wider bandwidth, which is due to
the solution of the antenna based on the surface current components.
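The power difference technique described above reduces to a subtraction in linear units followed by a de-embedding of the coupling loss. The sketch below is an illustration only: the function names are hypothetical and the two power-meter readings in the demo are made-up values, not measured data.

```python
import math


def efficiency_to_loss_db(efficiency: float) -> float:
    """Convert a power coupling efficiency (0..1) to a loss in dB."""
    return -10.0 * math.log10(efficiency)


def uw_to_dbm(power_uw: float) -> float:
    """Convert microwatts to dBm."""
    return 10.0 * math.log10(power_uw * 1e-3)


def radiated_power_dbm(p_on_uw: float, p_off_uw: float,
                       total_loss_db: float) -> float:
    """THz radiated power from 'radiator on' and 'radiator off' readings.

    The 'off' reading is treated as blackbody emission only, so the
    difference is the coupled THz power; the loss is then added back.
    """
    p_thz_uw = p_on_uw - p_off_uw
    if p_thz_uw <= 0.0:
        raise ValueError("no THz power above the blackbody level")
    return uw_to_dbm(p_thz_uw) + total_loss_db


if __name__ == "__main__":
    coupling_loss_db = efficiency_to_loss_db(0.098)  # 9.8% from the text
    print(f"coupling loss = {coupling_loss_db:.2f} dB")
    # Hypothetical readings in microwatts, with the 11 dB total loss of the text.
    print(f"radiated power = {radiated_power_dbm(5.4, 4.0, 11.0):.1f} dBm")
```

Subtracting in microwatts rather than in dBm matters here, because the blackbody floor is comparable to the coupled THz power.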
[Figure 2.18 plots: total radiated power (dBm) versus frequency (GHz) from
920 to 1000 GHz at Pin = 8 dBm; input matching (dB) versus frequency from 230
to 240 GHz and total radiated power (dBm) from 920 to 960 GHz for
VBase = 1 V (radiator off) and VBase = 1.7 V (radiator on); simulated
collector voltage (V), collector current (mA) and current delivered to the
antenna (mA) versus time (ps) for Vdd = 1.8 V.]
Figure 2.18: (a) Simulation results of the wideband power generation and
(b) the simulated circuit operation in “radiator on” and
“radiator off” modes.
Taking into account the power coupling loss, the radiated power at different
frequencies is calculated. As shown in Fig. 2.19, the circuit radiates a peak
power of -17.3 dBm at 0.928 THz, and the limited antenna bandwidth blocks the
radiation above 0.944 THz. Since the input power is supplied from an external
source through a waveguide probe and the power meter sensitivity is
insufficient, an E/H-plane radiation pattern measurement is not possible for
this radiator. Therefore, based on the simulated antenna directivity (7.3 dBi),
the circuit exhibits a peak equivalent isotropic radiated power of -10 dBm at
0.928 THz.
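The reported EIRP and the conversion loss listed in Table 2.1 follow directly from the quoted numbers. The helper names below are hypothetical; the inputs (8 dBm input power, -17.3 dBm radiated power, 7.3 dBi simulated directivity) are the values stated in the text.

```python
def eirp_dbm(radiated_dbm: float, directivity_dbi: float) -> float:
    """EIRP from the total radiated power and the antenna directivity."""
    return radiated_dbm + directivity_dbi


def conversion_loss_db(p_in_dbm: float, p_out_dbm: float) -> float:
    """Conversion loss of a multiplier: input power minus output power."""
    return p_in_dbm - p_out_dbm


def dbm_to_uw(p_dbm: float) -> float:
    """Convert dBm to microwatts."""
    return 10.0 ** (p_dbm / 10.0) * 1e3


if __name__ == "__main__":
    print(f"EIRP            = {eirp_dbm(-17.3, 7.3):.1f} dBm")
    print(f"conversion loss = {conversion_loss_db(8.0, -17.3):.1f} dB")
    print(f"radiated power  = {dbm_to_uw(-17.3):.1f} uW")
```

Because the directivity is applied to the total radiated power, the antenna radiation efficiency is already contained in the -17.3 dBm figure and is not applied a second time.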
The output power radiation of this source spans 0.92-0.944 THz due to the
limited bandwidth of the antenna. However, the quadrupler is in principle very
wideband and exhibits only 8 dB of power variation within an 80 GHz bandwidth
(920 GHz to 1000 GHz), as the simulation results in Fig. 2.18(a) verify. According
[Figure 2.19 plots: received power (µW) versus frequency (GHz) from 920 to
956 GHz with the radiator on and off (Pin = 8 dBm, d = 2 mm); conversion loss
(dB) and Pout at 928 GHz (dBm) versus Pin at 232 GHz (dBm); total radiated
power (dBm) versus frequency from 920 to 944 GHz for Pin = 5, 6, 7 and 8 dBm.]
Figure 2.19: Measurement results. The power difference technique is uti-
lized to measure all data points.
to Fig. 2.20, the design methodology introduced here can boost the generated
power and the operation frequency of electronic circuits. To the best of our
knowledge, this circuit demonstrates the highest frequency radiator among all
Si/SiGe sources. Table 2.1 compares the performance of this THz power radiator
with state-of-the-art Si/SiGe sources.
[Figure 2.20 plot: generated/radiated power (dBm) versus frequency (THz) for
Si/SiGe and InP/GaAs technology sources, with data points labeled by
publication venue and year and this work marked.]
Figure 2.20: The performance summary of state-of-the-art THz sources.
2.6 Conclusion
To generate power at the higher end of the THz frequency range, there are
fundamental limits imposed by the existing technology. To extract the
full potential of devices, an accurate model of power generation at the
harmonics of the operation frequency (f0 < fmax) is required. In this work, we
propose a novel nonlinear model of harmonic power generation in electronic
circuits which can be used in THz circuits. Based on the introduced model, a
0.92-0.944 THz frequency quadrupler is designed which demonstrates a radiated
power level that is beyond the reach of conventional circuits. The proposed
design methodology paves the path towards higher power generation as well as
higher operational frequencies.
Table 2.1: Comparison with state-of-the-art
Technology Source type
Output Freq 
(GHz)
Generated 
Power 
(dBm)
Conversion 
Loss (dB) Pdc (mW) EIRP (dBm)
130 nm SiGe Quadrupler
920-944/ 
920-1000*
-17.3 25.3 5.7 -10
45 nm CMOS Quadrupler 390-440 -10 19.5 700 3
Oscillator/
doubler
270-280 -7.2 NA 810 9.445 nm CMOS
Oscillator 482 -7.9 NA 61 NA65 nm CMOS
Reference
This work
MTT 2013 [48]
ISSCC 2012[58]
JSSC 2011[57]
JSSC 2011[56] 
* The first frequency range is from measurement, which is limited by the input
frequency range. The second frequency range is based on simulation.
ISSCC 2011 [38] 250 nm SiGe Multiplier -17 3700NA NA820-845
90 nm BiCMOS 480-510Oscillator -16.6 NA 400 NA
Remarks
Radiator/
no lens
Radiator/
Quartz sub.
Radiator/
no lens
Radiator/no 
lens
Probed
Probed
65 nm CMOS Multiplier 1290-1440 -22.7 40 0 -13
Radiator/
no lens
ISSCC 2016[73] 
Oscillator 320 5.18 NA 610 22.5130 nm SiGeISSCC 2015[55]
Radiator/
with lens
65 nm CMOS Quintupler 650-730 -21.3 33.8 0 -22
Radiator/
no lens
VLSI 2015[72] 
MTT 2010 [12]
250 nm InP HBT Oscillator 573 15.35-19.2 115 NA NA
Planar GaAs 
Schottky diode
Multiplier 840-900 15.351.4 NA NA NA
JSSC 2011 [11]
** Based on simulated far-field antenna directivity
2.7 Analytical study of ratio functions
In this section, we show how the variations of device embedding in the simple
circuit of Fig. 2.4(a) can impact the ratio functions. Based on the results of
these analyses, the reader may conclude that the analytic derivation of ratio
functions in some circuits might not be straightforward and that simpler
approaches similar to Section II can be used. For simplicity, we consider the
case of lossless transmission lines, since they can be designed to exhibit a
high quality factor.
We assume an input power source with delivered power of P0 and a
corresponding voltage of VS across a source impedance of Z0S. There are two
possibilities for the base transmission line, TL1.
2.7.1 No reflection from source
This case happens when the input line impedance is identical to the source
internal impedance, i.e.,
$$Z_{0L1} = Z_{0S}, \tag{2.22}$$
which leads to ΓS = 0. The amplitude of the fundamental voltage at the base of
the transistor (Vin,1) is expressed in terms of the load reflection
coefficient, i.e.,
$$V_{in,1} = V_{inL1} = V_0^{+} (1 + \Gamma_L), \tag{2.23}$$
where
$$\Gamma_L = \frac{Z_{inL1} - Z_{0L1}}{Z_{inL1} + Z_{0L1}}, \tag{2.24}$$
$$V_0^{+} = V_S \frac{Z_{inS}}{Z_{inS} + Z_{0S}}. \tag{2.25}$$
This voltage generates the output fundamental and second harmonic voltages
Vout,1 and Vout,2, where
$$V_{out,1} \simeq j Z_{0L2} V_{in,1} G_{11} \tan(\beta_1 L_2), \tag{2.26}$$
$$V_{out,2} \simeq j Z_{0L2} \left( V_{in,1}^{2} G_{12} + H_{12} V_{out,1}^{2} \right) \tan(\beta_2 L_2). \tag{2.27}$$
It is noteworthy that β1 and β2 are the propagation constants at the 1st and 2nd
harmonic, respectively. The output harmonic voltage generates an input
current at 2f0 through the linear feedback admittance, which develops the
input voltage
$$V_{in,2} = Z_{inL1}\big|_{2f_0} \, Y_{12}\big|_{2f_0} \, V_{out,2}. \tag{2.28}$$
According to (2.23)-(2.27), the ratio functions can be found in terms of the
device properties and the embedding network parameters. By changing the length
of TL1, the phase of Vin,1 changes according to (2.25). This phase variation is
linear to first order and is equal to β1∆L1, which results in the same phase
variation at Vout,1. However, the second harmonic voltages are determined by
the second power of Vin,1 and Vout,1; hence, their phases change by 2β1∆L1.
Similarly, for any higher harmonic n, the phases of the harmonic voltages
change by nβ1∆L1. In addition, for a limited range of variation in the length
of the transmission lines, the amplitude of (2.25) does not change
significantly and the voltage amplitudes can be approximated as constants.
Therefore, by changing the length of TL1, the amplitudes of the second harmonic
ratio functions R2 and S2 are almost preserved; however, their phases change by
β1∆L1. Fig. 2.21 (b) illustrates the simulated amplitude and phase of the input
and output voltages, which match the theory.
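The phase bookkeeping above can be checked numerically with a toy model. The sketch assumes that lengthening TL1 only delays Vin,1 by β1∆L1 (a phase lag, with constant amplitude) and that the second harmonic voltage is proportional to the square of Vin,1; the proportionality constant k2 and the function names are hypothetical and stand in for the device operators.

```python
import cmath


def input_voltages(beta1_delta_l: float, v_in1_mag: float = 1.0,
                   k2: complex = 0.1 + 0.05j):
    """Toy model of Vin,1 and Vin,2 after lengthening TL1 by delta L.

    beta1_delta_l is the electrical length change beta1 * delta_L in radians.
    """
    v_in1 = cmath.rect(v_in1_mag, -beta1_delta_l)  # fundamental, delayed
    v_in2 = k2 * v_in1 ** 2                        # second-order term
    return v_in1, v_in2


def ratio_r2(beta1_delta_l: float) -> complex:
    """Second harmonic ratio function R2 = Vin,2 / Vin,1."""
    v_in1, v_in2 = input_voltages(beta1_delta_l)
    return v_in2 / v_in1


if __name__ == "__main__":
    r_ref = ratio_r2(0.0)
    for shift in (0.1, 0.2, 0.4):  # beta1 * delta_L in radians
        r = ratio_r2(shift)
        print(f"shift {shift:.1f} rad: |R2| = {abs(r):.4f}, "
              f"phase change = {cmath.phase(r / r_ref):+.3f} rad")
```

The magnitude of R2 stays fixed while its phase moves by the same amount as the fundamental, consistent with the 2β1∆L1 shift of the second harmonic voltage itself.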
On the output side, by changing the length or Z0 of TL2, the output voltages
Vout,1 and Vout,2 change according to (2.26) and (2.27). In addition, the
second harmonic voltage at the input, Vin,2, changes according to (2.28).
Finally, the fundamental voltage at the input, Vin,1, changes according to
(2.24), where the input impedance is impacted by the variations of the load
impedance, i.e.,
$$Z_{inL1} \simeq \frac{1}{Y_{11}} \,\Big\|\, \left\{ \frac{-G_{11}}{Y_{11} H_{11} - Y_{12} G_{11}} + Z_L \,\Big\|\, \frac{1}{H_{11}} \right\}, \tag{2.29}$$
where the Y parameters of the transistor and the load impedance
ZL = jZ0L2 tan(β1L2) appear. Since the trigonometric functions behave
differently at f0 and 2f0 in (2.26) and (2.27), both the amplitude and phase of
the output ratio functions change this time. Similarly, according to (2.26) and
(2.27), the amplitude and phase of the ratio functions at the input port also
change. Fig. 2.22 illustrates the amplitude and phase variations of the
fundamental and harmonic voltages as well as the ratio functions.
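The different behavior of the tangent terms at f0 and 2f0 can be seen by evaluating (2.26) and (2.27) directly. This is a sketch under stated assumptions: the line is lossless and non-dispersive so that β2 = 2β1, Vin,1 is held fixed, and the operator values in the demo are hypothetical numbers chosen only for illustration.

```python
import math


def output_voltages(beta1_l2: float, v_in1: complex, z0_l2: float,
                    g11: complex, g12: complex, h12: complex):
    """Evaluate (2.26) and (2.27) for an electrical length beta1*L2 (radians)."""
    beta2_l2 = 2.0 * beta1_l2  # assumes a non-dispersive line
    v_out1 = 1j * z0_l2 * v_in1 * g11 * math.tan(beta1_l2)
    v_out2 = 1j * z0_l2 * (v_in1 ** 2 * g12 + h12 * v_out1 ** 2) * math.tan(beta2_l2)
    return v_out1, v_out2


if __name__ == "__main__":
    # Hypothetical operator values (siemens and siemens per volt).
    params = dict(v_in1=0.3, z0_l2=50.0, g11=0.02 - 0.01j,
                  g12=0.01 + 0.004j, h12=-0.002 + 0.001j)
    for deg in (10, 20, 30, 40):
        v1, v2 = output_voltages(math.radians(deg), **params)
        s2 = v2 / v1  # output ratio function S2
        print(f"beta1*L2 = {deg} deg: |S2| = {abs(s2):.3f}, "
              f"phase(S2) = {math.degrees(math.atan2(s2.imag, s2.real)):.1f} deg")
```

Unlike the TL1 case, the ratio S2 changes in amplitude as the length of TL2 varies, because tan(2β1L2)/tan(β1L2) is not constant.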
2.7.2 Reflection from the source
This condition happens when the internal impedance of the source, Z0S, and
the characteristic impedance of the base transmission line TL1 are different.
This condition is undesired if the source reflection leads to poor power
delivery at the fundamental frequency. However, we analytically consider
slight deviations from the matched case Z0L1 = Z0S. More importantly, the
relative phase and amplitude of the fundamental and harmonic voltages can
change differently in this case, which could lead to a higher harmonic power.
As shown in [78], for this case V0+ is impacted by
ΓS = (Z0S − Z0L1)/(Z0S + Z0L1), i.e.,
$$V_0^{+} = V_S \, \frac{Z_{0L1}}{Z_{0S} + Z_{0L1}} \, \frac{e^{-j\beta_1 l}}{1 - \Gamma_L \Gamma_S e^{-2j\beta_1 l}}. \tag{2.30}$$
By combining (2.23) and (2.30), the fundamental input voltage is determined.
In this case, the length variation of TL1 impacts the amplitude and phase
of the fundamental voltage Vin,1 significantly. By applying (2.26)-(2.28), the
amplitude and phase variations of the other voltage components are also
determined. The results in Fig. 2.23 illustrate the simulated ratio functions
for this case.
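A direct evaluation of (2.30) shows why the TL1 length matters once the source is mismatched. In the sketch below the function name, the impedance values and the load reflection coefficient are hypothetical; only the formula itself is taken from the text.

```python
import cmath
import math


def forward_wave(v_s: float, z0_s: float, z0_l1: float,
                 gamma_l: complex, beta1_l: float) -> complex:
    """Forward voltage wave V0+ of (2.30) for an electrical length beta1*l."""
    gamma_s = (z0_s - z0_l1) / (z0_s + z0_l1)
    divider = z0_l1 / (z0_s + z0_l1)
    return (v_s * divider * cmath.exp(-1j * beta1_l)
            / (1.0 - gamma_l * gamma_s * cmath.exp(-2j * beta1_l)))


if __name__ == "__main__":
    gamma_l = 0.5  # hypothetical load reflection coefficient
    for z0_l1 in (50.0, 70.0):  # matched and mismatched line, 50-ohm source
        mags = [abs(forward_wave(1.0, 50.0, z0_l1, gamma_l, math.radians(d)))
                for d in range(0, 181, 30)]
        print(f"Z0L1 = {z0_l1:.0f} ohm:",
              ", ".join(f"{m:.3f}" for m in mags))
```

With a matched line the amplitude stays at VS/2 for every length and only the phase rotates, while the mismatched line shows the length-dependent amplitude that the text associates with source reflection.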
[Figure 2.21 plots: amplitude (mV) and phase (degree) of Vin1, Vin2, Vout1
and Vout2 versus the length of TL1 (µm) from 50 to 350 µm.]
Figure 2.21: The simulated variation of amplitude and phase of input and
output voltage components by variations of length of TL1.
[Figure 2.22 plots: amplitude (V) and phase (degree) of Vin1, Vin2, Vout1
and Vout2 versus the length of TL2 (µm) from 100 to 1100 µm.]
Figure 2.22: The simulated variation of amplitude and phase of input and
output voltage components by variations of length of TL2.
[Figure 2.23 plots: amplitude and phase (degree) of Vin1, Vin2, Vout1 and
Vout2 versus the length of TL1 (µm) from 50 to 350 µm, with the nonlinear
phase variations annotated.]
Figure 2.23: The simulated variation of amplitude and phase of input and
output voltage components by variations of length of TL1
when there is source reflection.
CHAPTER 3
A 0.43-0.51 THZ SIGE FREQUENCY DOUBLER BASED ON THE
NONLINEAR HARMONIC GENERATION THEORY
3.1 Introduction
Emerging applications of the THz range (0.3-3 THz) include different areas of
interest, e.g., molecular spectroscopy, high resolution imaging and ultra-fast
communication. To demonstrate THz transmitters, power on the order of
milliwatts should be generated to overcome the high propagation loss in this
frequency range. However, there are different challenges in reaching the
required power, e.g., the lower power gain of the active devices and the lower
quality factor of passive components.
The two conventional schemes of power generation at the THz range are: 1)
extraction of power from a harmonic VCO [62] and 2) conversion of high power
fundamental signals to the desired harmonic through a chain of frequency mul-
tipliers. Although the former technique exhibits a simpler design, the limited
tuning of oscillators makes the second scheme more favorable.
Previously, frequency multipliers have been demonstrated at the THz range.
However, the bandwidth of operation and saturated output power in most cases
are not sufficient. In this work, a nonlinear model to characterize the active
device and maximize the output harmonic power is introduced. Using a 130 nm
SiGe technology, this work demonstrates the highest frequency among active
multipliers and the highest bandwidth among all sources above 300 GHz.
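[Figure 3.1 panels: (a) transistor two-port with Iin, Iout, Vin and Vout, the
operators Gij (relating Iout,j to Vin,i^(j/i)) and Hij (relating Iout,j to
Vout,i^(j/i)), and complex-plane sketches of G12, H12, G22, H22 and of
Vin1, Vout1 = A·Vin1, Vin2 = R2·Vin1 and Vout2 = S2·Vout1; (b) embedding
network with Y1, Y3 and ZM around the transistor.]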
Figure 3.1: (a) The transistor operators of harmonic generation and (b) the
impact of passive network.
3.2 Nonlinear Model of Harmonic Generation
In order to maximize the harmonic power, all the mechanisms that generate
power at the harmonic of interest should be considered. Based on the
Volterra-Wiener theory of nonlinear devices [72], the active device is
characterized by nonlinear and linear operators, as depicted in Fig. 3.1. The
Gij and Hij operators represent the gain from the ith harmonic of the voltage
at the input and output ports, respectively, to the jth harmonic of the output
current. For identical i and j the operators are linear; otherwise they are
nonlinear. By neglecting the small intermodulation operators at the operation
frequency, the 2nd harmonic current can be written as
$$I_{out,2} = G_{12} V_{in,1}^{2} + H_{12} V_{out,1}^{2} + G_{22} V_{in,2} + H_{22} V_{out,2}. \tag{3.1}$$
Moreover, the real power generated at the 2nd harmonic can be written as
$$P_{out,2} = \mathrm{Re}\{ I_{out,2} V_{out,2}^{*} \}. \tag{3.2}$$
As illustrated in Fig. 3.1(b), by placing the transistor in a linear embedding
network, the linear operators change accordingly and the nonlinear operators
remain unchanged [72]. For any selected circuit topology, the amplitude (A) and
phase (θ) of the fundamental voltage gain, as well as the ratios of the second
harmonic to fundamental voltage components (R2e^{jΦR} and S2e^{jΦS}), are
determined. For an invariant circuit topology (invariant passive network with
variable values, constant bias conditions and invariant transistor), A and θ
change readily with the values of the passive components. However, the harmonic
ratio values exhibit very small variations and are considered constants.
Therefore, for a particular circuit topology, the current expression in (3.1)
can be approximated as
$$I_{out,2} \simeq G_{12} V_{in,1}^{2} + H_{12} V_{out,1}^{2} + G_{22} \vec{R}_2 V_{in,1} + H_{22} \vec{S}_2 V_{out,1}, \tag{3.3}$$
where $\vec{R}_2 = R_2 e^{j\Phi_R}$ and $\vec{S}_2 = S_2 e^{j\Phi_S}$.
By substituting $V_{out,2} = \vec{S}_2 V_{out,1}$, the 2nd harmonic power
expression in (3.2) is represented as a nonlinear function of the fundamental
voltage components. Therefore, by changing the values of the passive
components, the optimum conditions of the harmonic power generation are
determined in terms of A and θ. It is noteworthy that this nonlinear power
expression is different from the linear power optimization in [62].
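The optimization over A and θ can be illustrated with a brute-force search over (3.2) and (3.3). This is only a sketch of the idea, not the design procedure used for the chip: every operator value below is a hypothetical number, the function names are invented for the example, and a coarse grid replaces whatever optimizer the designer would actually use.

```python
import cmath
import math


def second_harmonic_power(a: float, theta: float, v_in1: complex,
                          g12: complex, h12: complex, g22: complex,
                          h22: complex, r2: complex, s2: complex) -> float:
    """Evaluate (3.2) with the current of (3.3) for a gain A*exp(j*theta)."""
    v_out1 = a * cmath.exp(1j * theta) * v_in1
    i_out2 = (g12 * v_in1 ** 2 + h12 * v_out1 ** 2
              + g22 * r2 * v_in1 + h22 * s2 * v_out1)
    v_out2 = s2 * v_out1
    return (i_out2 * v_out2.conjugate()).real


def best_gain(params: dict, a_values, theta_values):
    """Grid search for the (A, theta) pair that maximizes the 2nd harmonic power."""
    return max(((second_harmonic_power(a, th, **params), a, th)
                for a in a_values for th in theta_values),
               key=lambda item: item[0])


if __name__ == "__main__":
    # Hypothetical operators and ratio constants for one fixed topology.
    params = dict(v_in1=0.3, g12=0.02 + 0.01j, h12=-0.005 + 0.002j,
                  g22=0.01j, h22=0.002 - 0.004j,
                  r2=0.1 * cmath.exp(0.5j), s2=0.2 * cmath.exp(-1.0j))
    a_values = [0.5 * k for k in range(1, 9)]
    theta_values = [math.radians(d) for d in range(0, 360, 5)]
    p_max, a_opt, th_opt = best_gain(params, a_values, theta_values)
    print(f"A = {a_opt:.1f} V/V, theta = {math.degrees(th_opt):.0f} deg, "
          f"P = {p_max * 1e6:.1f} uW")
```

Because the power expression is quadratic and cubic in the fundamental voltages, the optimum (A, θ) generally differs from the pair that maximizes the fundamental output power, which is the point made in the comparison with the linear optimization.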
3.3 Circuit Design and Optimization
The schematic of the multiplier circuit is shown in Figure 3.2. To take
maximum benefit from the nonlinear profile, a differential common-emitter
circuit is used in this design. For stable operation, MOS tail current sources
are used. However, the impact of degeneration on the nonlinear profile should
be minimized. Therefore, the tail current source impedance is bypassed with
custom finger capacitors, C1 ≃ 40 fF. Furthermore, TL3 with Z0 = 40 Ω shrinks
the capacitive impedance looking from the base, and effectively a small real
impedance is seen at the emitter node.
[Figure 3.2 schematic: differential doubler with Q1, TL1 to TL5, C1, balun
outputs, base voltage nodes, Vdd, Cpad, the output probe bias-tee, nodes "a"
and "b", and the impedances Zin, Zina, Za, Zb and Z2f; Smith charts of the
input matching (TL1, TL2) and the output matching (TL4, TL5, Cpad).]
Figure 3.2: Schematic of the active doubler and the input/output matching
scenarios on the Smith chart
The transmission lines TL1, TL2, TL4 and TL5 should satisfy the optimum
power matching conditions. As shown in Figure 3.2, the combination of TL1
and TL2 transforms the impedance seen at node “a” to a real part of 100 Ω. The
low-loss (S21 = -0.6 dB) finger capacitors are designed to reach zero imaginary
impedance around the input operation frequency. Therefore, they act as almost
ideal decoupling capacitors within a sufficiently large bandwidth. Since the
two balun output nodes are matched to the 50 Ω input impedance, a wideband
fundamental matching at the input is performed, as shown in Figure 3.3(a). In
addition, the inductance of TL1 and Cπ of Q1 form a local resonator which
increases the amplitude of Vin,1.
3.3.1 Wideband Operation
Due to the differential operation of the circuit, the fundamental components
reaching node “b” are out of phase and form a virtual ground which reflects
them back to the transistors. However, the in-phase 2nd harmonic components
have to be matched to the output port. As shown in Figure 3.2, TL4 and TL5
translate the impedance Z2f to an impedance with a real part of 50 Ω and an
inductive imaginary part. Using the output pad capacitance (Cpad), the output
is matched to 50 Ω within a wide range, as shown in Figure 3.3(b). Among
the different combinations of TL4 and TL5, the one that reaches the maximum
fundamental swing and provides the optimum fundamental Φ is selected, i.e.,
TL4 = 60 µm and TL5 = 90 µm.
One of the major features of this active circuit is the bias manipulation to in-
crease the power and bandwidth. In particular, by changing the base voltage of
Q1, the junction capacitors and subsequently the G operators change. As illus-
trated in Figure 3.4(a), the bias manipulation provides a high 3-dB bandwidth of
80 GHz. Moreover, for a fixed frequency, by varying the power level, the maxi-
mum power generation is achieved by the base bias manipulation, as shown in
[Figure 3.3 plots: simulated |S11| (dB) versus frequency (GHz) from 200 to
280 GHz and simulated |S22| (dB) versus frequency from 400 to 560 GHz, each
with the -10 dB bandwidth marked.]
Figure 3.3: Wideband matching at (a) input and (b) output
Figure 3.4(b).
[Figure 3.4 plots: output power (dBm) versus frequency (GHz) from 450 to 530
GHz at Pin = 5 dBm, annotated with bias settings (VB = 2.4 V and 2.1 V,
IC = 0.5 mA and 1.3 mA, Vdd = 2.3 V and 1.8 V); output power (dBm) versus base
voltage VB (V) and input power (dBm) at Vdd = 1.9 V.]
Figure 3.4: Using bias manipulation technique for (a) wideband power
generation and (b) maximum power generation at a fixed fre-
quency.
3.3.2 Differential Signal Generation
To generate a differential signal from the single-ended external source, a
microwave balun is required. The selected balun should exhibit wideband
operation. Moreover, the conversion loss should be minimized to deliver the
maximum power to the doubler. Finally, the selected balun should match the
layout pattern imposed by the rest of the circuit. Therefore, a microwave
passive short-open balun is designed as shown in Figure 3.5. The capacitive
coupling of the adjacent transmission lines with proper terminations leads to
out-of-phase signal coupling at ports 2 and 3. As depicted in Figure 3.5(b),
this balun preserves the phase/amplitude balance and exhibits a low conversion
loss (< 0.8 dB) within a large bandwidth.
[Figure 3.5 panels: (a) layout with Input Port 1, Port 2, Port 3, and Met 1-Met 6
ground; (b) insertion loss and amplitude imbalance (dB) and phase imbalance
(degree) versus frequency (220-250 GHz).]
Figure 3.5: (a) 3D illustration of the input pad and balun and (b) the inser-
tion loss and phase/amplitude imbalance simulated by HFSS
3.4 Experimental Results
The doubler is fabricated in a 130 nm SiGe BiCMOS technology from
STMicroelectronics. The chip photograph is shown in Figure 3.6(a). The circuit
occupies a small area of 0.36 mm^2 and consumes a maximum DC power of 5.7 mW.
The measurement setup is shown in Figure 3.6(b,c). The VDI AMC (x16)
amplifier/multiplier chain generates the input power within the limited
bandwidth of 230-240 GHz. The measured losses of the WR-3.4 waveguide bend and
the WR-3.4 probe attenuate this power; hence, a maximum power of 8 dBm is
delivered to the circuit. On the output side, the WR-2.2 probe is used both for
power extraction and for biasing the chip through its internal bias-tee.
The waveguide in this probe fully rejects the components below the cut-off
frequency, including the fundamental signal. Moreover, the simulated power of
the 3rd harmonic is 25 dB lower than that of the second harmonic. Therefore, the
output power in this measurement can be attributed to the second harmonic
component.
[Figure 3.6 labels: (a) die photograph, 560 µm × 650 µm, with input, output, and
balun marked; (b, c) measurement setup: signal source (Agilent E8257D), VDI AMC
(x16) with attenuation control, WR-3.4 bend, Cascade i325 GSG probe (WR-3.4),
DUT, Cascade i500 GSG probe (WR-2.2) with bias-tee and DC supply, WR-2.2 bend,
WR-2.2 to WR10 taper waveguide, sensor head, and Erickson PM4 power meter.]
Figure 3.6: (a) The die photograph; (b) and (c) the measurement setup.
In Figure 3.7, the measurement and simulation results for an input power
level of 5 dBm are compared. The input power level is set by the attenuation
control of the VDI source. Using the bias manipulation technique, the output
power changes by only 0.8 dB within the 20 GHz measured output bandwidth, which
demonstrates the wideband characteristic of the circuit.
Table 3.1: Performance Comparison of Solid State Sources
Reference     [62]         [55]         [?]          [?]          [8]          [?]          This Work
fout (GHz)    482          325          480          180          317          390          480
Type          oscillator   active x2    passive x2   active x2    oscillator   oscillator   active x2
Pout (dBm)    -7.9         -3           -6.3         0            5.2          -26.6        -8.2
CL (dB)       N/A          6 (gain)     14.3         6.4          N/A          N/A          16.2
3-dB BW       0            6.3%         14.6%        11.1%        N/A          2.2%         †16.6% (sim)
Pdc (mW)      61           24           0            39           610          21           5.7
Technology    65 nm CMOS   130 nm SiGe  65 nm CMOS   45 nm CMOS   130 nm SiGe  40 nm CMOS   130 nm SiGe
† Due to the limited bandwidth of the input source, the full 3-dB bandwidth
could not be measured.
In Figure 3.8, the simulation and measurement results of the doubler
response to a power sweep at 240 GHz are illustrated. Based on the simulation
results, this active circuit exhibits a high saturated output power of 4 dBm and
a conversion loss variation of 5.5 dB over a 25 dB variation of input power. In
measurements, due to the limited power of the input source, the maximum
output power of -8.2 dBm is achieved for an input power of 8 dBm.
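As a quick consistency check on the measured operating point, the conversion
loss follows directly from the input and output power levels expressed in dBm.
The short sketch below is only illustrative; the function name is introduced
here and is not part of the original work, and the two power levels are the
measured values quoted above.

```python
def conversion_loss_db(p_in_dbm: float, p_out_dbm: float) -> float:
    """Conversion loss in dB: input power minus output power, both in dBm."""
    return p_in_dbm - p_out_dbm


# Measured point quoted in the text: 8 dBm in at 240 GHz, -8.2 dBm out at 480 GHz.
measured_cl = conversion_loss_db(8.0, -8.2)
print(f"Measured conversion loss: {measured_cl:.1f} dB")
```

The result, 16.2 dB, agrees with the conversion loss listed for this work in
Table 3.1.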
[Figure 3.7 plot: output power (dBm) versus frequency (450-530 GHz), measurement
and simulation, Pin = 5 dBm; annotations VB = 2.4 V, VB = 2.2 V, and
ΔP = 0.8 dB.]
Figure 3.7: Simulation and measurement results for an input power of
5dBm.
[Figure 3.8 plot: output power at 480 GHz (dBm) and conversion gain (dB) versus
input power at 240 GHz (dBm), measurement and simulation; annotations
VB = 2.3 V and VB = 1.3 V.]
Figure 3.8: Simulation and measurement results of operation at 240 GHz
for different power levels.
CHAPTER 4
SMART DETECTOR CELL: A SCALABLE ALL-SPIN CIRCUIT FOR
LOW-POWER NON-BOOLEAN PATTERN RECOGNITION
Abstract: We present a new circuit for non-Boolean recognition of binary
images. Employing all-spin logic (ASL) devices, we design logic comparators and
non-Boolean decision blocks for compact and efficient computation. By
manipulating the fan-in number in different stages of the circuit, the structure
can be extended to larger training sets or larger images. Operating on the idea
of mainly similar images, the system is capable of constructing a mean image and
comparing it with a separate input image within a short decision time. Taking
advantage of the non-volatility of ASL devices, the proposed circuit is capable
of hybrid memory/logic operation. Compared with existing CMOS pattern
recognition circuits, this work achieves a smaller footprint, lower power
consumption, faster decision time and a lower operational voltage.
4.1 Introduction
Pattern recognition and, in particular, image recognition techniques have been
widely studied in machine learning and image processing [84, 85, 86]. Hardware
demonstration of computation units for pattern recognition, however, has
consistently been a challenging problem in terms of chip size, power
consumption, computation complexity and decision speed.
Among different solid state technologies, CMOS offers low cost, highly
integrated, low power implementation of pattern recognition [88, 89, 87, 90]
and processing [91, 92] systems. For Boolean logic systems, CMOS gates exhibit
processing speeds up to a few GHz and can be designed to have a low static
power. However, the dynamic power consumption of a large system with a GHz
clock frequency can still limit the scalability. Fan-in and fan-out
considerations for CMOS devices also impact the speed, power consumption and
size of the devices. Besides Boolean systems, some novel non-Boolean techniques
have been developed to overcome these issues. In non-Boolean systems, logic
gates are no longer the key block, and analog/mixed-signal circuits are used
instead. In [89], the authors propose a technique for non-Boolean training and
detection of image pixels using a network of coupled oscillators. This structure
can detect any scaled or rotated version of a desired image. On the other hand,
this method suffers from high computational complexity, large area, high power
consumption and a long convergence time, which limit its application to large
image arrays. Other proposed CMOS systems have demonstrated artificial neural
networks (ANN) by designing circuits that emulate neurons and synapses [88, 87].
In these systems, the larger computation burden leaves the search open for new
solutions.
To overcome the limitations of CMOS devices, other technologies are being
investigated for pattern recognition applications. Spintronic devices, in
particular, have received a lot of attention recently because of some unique
properties, e.g., low voltage operation and non-volatility. In [93], a
non-volatile logic-in-memory full adder is fabricated using magnetic tunnel
junctions (MTJ). The proposed architecture is compared with a 0.18 µm CMOS
process counterpart and exhibits major advantages. The dynamic power
consumption is reduced by 23% compared to a conventional CMOS circuit, due to
the reduction in the number of paths from VDD to GND. In addition, the static
power consumption is eliminated due to the non-volatility, and the chip area is
smaller.
As shown in [94], ASL devices are also non-volatile, and the computational
state is preserved when the power to the circuit is turned off. In [95], a
spin-based artificial neural network (ANN) is proposed using lateral spin valves
to achieve a low power consumption and a low operational voltage. In [96], spin
switches are proposed to develop compact neurons and synapses. In [97], all-spin
logic (ASL) and charge-spin logic (CSL) devices are shown to be capable of
Boolean and non-Boolean operations, which makes them an attractive choice for
building fundamental blocks such as ring oscillators. In addition, the design
of ASL gates with graphene channels has been proposed recently [?]. Due to the
unique features of graphene in terms of spin transport, the design of Boolean
and non-Boolean computation units with these new devices can be investigated as
a future direction.
The majority gate operation of ASL devices has been previously introduced
in some Boolean logic systems [100, 102]. This unique feature of these devices
can overcome the fan-in and fan-out limitations of large integrated systems.
Besides, the inverting and non-inverting operation modes of ASL devices can be
the key to designing many logic circuits, e.g., full adder circuits and
multipliers [104]. The time domain transient behavior of the magnetization in
these devices also provides another degree of freedom to demonstrate
non-Boolean operations. Combined, these features enable us to design a compact
all-spin logic non-Boolean structure with low power consumption and low
computational complexity.
In this paper we propose a novel pattern recognition circuit that takes ad-
vantage of novel features of spintronic devices such as non-volatility, efficient
implementation of majority gates and XOR functions and the ability to distin-
guish strong and weak majorities. Non-volatility of the devices enables storing
large sets of training images within the logic with no standby power dissipa-
tion. This feature also enables instant-on operation and saves on energy and
delay penalties imposed by loading training images from a main memory.
The rest of this paper is organized as follows. Section II describes the
operation of ASL devices. The proposed approach and the basics of the
computation are given in Section III. In Section IV, the proposed architecture
and a comprehensive discussion of design considerations are presented.
Simulation results and a summary are given in Sections V and VI, respectively.
4.2 All-Spin Logic Devices
The electron spin is introduced as a new state variable in spintronic devices to
process and store information. This alternative to charge-based systems makes
it possible to achieve ultra-low voltage operation and, owing to the bistable
nature of spin, an easier demonstration of digital systems [98].
In all-spin logic devices, input and output data are represented by the
magnetization of two ferromagnets [101] which communicate through a
spin-coherent metallic channel. The physical view of these devices is shown in
Figure 4.1(a). As shown in Figure 4.1 and discussed in [99], the voltage applied
to the input ferromagnet creates a flow of electrons from the supply voltage to
the ground. This flow of electrons becomes spin-polarized when passing through
the input ferromagnet. Since the concentration of spin-polarized electrons is
different at the input side and the output side of the channel, the electrons
diffuse to the output side. The spin-polarized electrons accumulated under the
output ferromagnet can switch the magnetization orientation of the magnet by
applying a torque through the spin-transfer torque effect.
As shown in [100], these devices are concatenable, exhibit nonlinear
characteristics and support all Boolean operations. In all-spin logic operation,
the direct spin signal switches the nanomagnet, and this signal can be
transferred to the next stage. Because the information is stored in the
magnetization of the magnets, the input and output magnets can effectively be
considered as digital capacitors linked by a spin-coherent channel. The sign and
magnitude of the control voltages applied to the magnets determine the polarity
of the majority spin electrons and the device speed, respectively. Any change of
magnetization in the bistable input magnet can exert a spin current through the
channel, and this current can determine the magnetization of the output magnet
[101]. The channel between the two magnets can be either a metal or a
semiconductor [106, 107, 108]. In our modeling and simulations we assume a
copper interconnect.
[Figure 4.1 panels (a)-(e): labels include input magnet, output magnet, metallic
interconnect, applied voltages (−V, V > 0), electron flow (e), and the
majority-gate magnets M1, M2, M3 and MO.]
Figure 4.1: (a) Configuration of a single ASL device. (b) The voltage applied
to the magnet creates an electric field and forces electron movement.
(c) Spin-polarized electrons at the input side exhibit a higher density
compared to the output side. (d) The diffusion of spin-polarized electrons
towards the output magnet changes the output magnetization direction. (e) An
ASL majority gate with 3 inputs [101]. The 3 input magnets, M1, M2 and M3, are
connected to the output magnet MO using 3 metallic interconnects.
The models utilized in this work are based on [101], where the different
physical effects are captured. The parameters of the channel, magnets and
interfaces that determine the performance characteristics, e.g., the spin
injection, detection and transport efficiency, are taken into account. The most
important size effect parameters for the purpose of this work are the side wall
specularity, the grain boundary reflectivity and the average grain size [99].
The average grain size is assumed to be equal to the width of thickness of the
metals [99]. The complete list of parameters is in Appendix B.
4.2.1 Majority Gate Operation
As mentioned earlier, the ASL device supports a majority operation, as shown
in Figure 4.1(e). This feature arises because the net spin current to the
output magnet is the sum of the spin currents from all the input devices. In
principle, this system can be designed for a large number of inputs. As a
trade-off, increasing the number of input devices in a majority gate adds up
their uncorrelated thermal noise, which impacts the transient magnetization of
the output magnet; thus, we need to make sure that the design uses a proper
fan-in. As will be discussed later, if we only want to monitor the final steady
value of the output magnetization, we can keep increasing the number of input
devices as long as the output magnetization remains predictable. Depending on
the device properties, this sets a practical limit on the number of input
devices of a majority gate. On the other hand, if we care about the transient
behavior of the output magnetization, fewer inputs should be connected to the
output magnet to avoid the noise accumulation of the input devices. In our
simulations, for the 3 and 5 input cases, the transient output magnetization is
less impacted by the thermal noise than for higher fan-in numbers. Note that the
steady state value of the majority gate depends on the sign of the voltage
applied to the magnets. If a negative voltage is applied to the magnets, the
final magnetization orientation is the exact majority of the input
magnetizations. However, if the applied voltage is positive, the steady state
value of the output magnet is the complementary majority of the input
magnetizations.
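The sign convention described above can be summarized with a small behavioral
sketch. It is an idealized, noise-free model written for illustration only: the
function name and the 0/1 encoding of the magnetization are assumptions made
here, and the transient dynamics and thermal noise discussed in the text are
deliberately ignored.

```python
def asl_majority_steady_state(inputs, control_voltage):
    """Idealized steady state of an ASL majority gate.

    inputs: list of 0/1 input magnetizations (odd length, so no ties occur).
    control_voltage: negative gives the majority, positive gives its complement.
    """
    if len(inputs) % 2 == 0:
        raise ValueError("an odd number of inputs is assumed")
    if control_voltage == 0:
        raise ValueError("a nonzero control voltage is required")
    majority = 1 if 2 * sum(inputs) > len(inputs) else 0
    return majority if control_voltage < 0 else 1 - majority


print(asl_majority_steady_state([1, 1, 0], -5e-3))  # negative voltage: majority
print(asl_majority_steady_state([1, 1, 0], +5e-3))  # positive voltage: complement
```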
The interesting phenomenon in ASL majority gates arises from the dependence of
the transient behavior of the output magnetization on the number of similar
input magnetizations. This effect is explained by the fact that the transferred
spin torque increases when more magnets are magnetized in the same direction.
Figure 4.2 shows different scenarios of transient output magnetization in
majority gates with 3 and 5 inputs. As shown in Figure 4.2(a), in a majority
gate with 5 inputs, the switching of the output magnetization becomes faster
when more inputs have similar magnetization directions. As the number of magnets
with similar magnetization decreases, the switching becomes slower and the
effect of thermal noise is more pronounced. In Figure 4.2(b), the switching
transitions of two majority gates with 3 inputs and 5 inputs are compared.
Because the thermal noise accumulation in the gate with 3 inputs is lower than
in the gate with 5 inputs, for equal net spin currents to both gates, the gate
with 3 inputs exhibits a more deterministic transition.
[Figure 4.2 plots: magnetization (-1 to 1) versus time (0-1500 ps). (a) Curves
for all spin up, 3 spin-up currents, 1 spin-up current, 1 spin-down current,
3 spin-down currents, and all spin down. (b) Curves for the 3-input and 5-input
majority gates.]
Figure 4.2: (a) Switching transient response for different scenarios of input
magnetization in a majority gate with 5 inputs. (b) Switching
transition comparison of majority gates with 3 and 5 inputs. In
this comparison, the input magnetization of magnets to the 3
input gate are all similar. For the gate with 5 inputs, 4 inputs
have similar magnetization and the net spin current is equal
to the other gate. The applied voltage on the magnets in these
simulations is −5 mV.
4.2.2 Switching Delay Variation
The switching time of a ferromagnet is calculated in [111] using the small
cone-angle approximation
\[
\tau_{s,\omega} = \tau_0 \frac{\ln(\pi/\theta_0)}{\chi - 1}, \tag{4.1}
\]
where τ0 is a fitting parameter, θ0 is the initial angle of the magnet, and χ is
the ratio of the magnitude of the injected spin current to the critical spin
current required to switch the magnetization of the output ferromagnet. Based
on this equation, if the injected spin current increases, the switching delay
decreases. Moreover, as shown in [101], the channel in this device can be
approximated as an RC network; hence, the injected spin current and the value
of the supply voltage are directly correlated. Therefore, the switching delay
is inversely proportional to the value of the supply voltage. This result is
shown in Figure 4.3. The device parameters used for these simulations are
listed in Appendix B.
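To make the trend in Eq. (4.1) concrete, the following sketch evaluates the
switching delay for a few supply voltages. All numerical values are hypothetical
placeholders, not the device parameters of Appendix B, and the mapping
χ = V / V_critical is an assumption introduced here to represent a spin current
proportional to the supply voltage. Under that assumption the delay scales as
1/(χ − 1), which approaches an inverse proportionality to the voltage when χ is
much larger than 1.

```python
import math


def switching_delay(tau0, theta0, chi):
    """Eq. (4.1): tau = tau0 * ln(pi / theta0) / (chi - 1)."""
    if chi <= 1.0:
        raise ValueError("chi must exceed 1 for the magnet to switch")
    if not 0.0 < theta0 < math.pi:
        raise ValueError("theta0 must lie in (0, pi)")
    return tau0 * math.log(math.pi / theta0) / (chi - 1.0)


# Hypothetical values, for illustration only.
TAU0_NS = 0.05        # fitting parameter tau0 (assumed)
THETA0_RAD = 0.1      # initial angle theta0 (assumed)
V_CRITICAL_MV = 1.0   # supply voltage at which chi = 1 (assumed)

for v_mv in (5.0, 10.0, 20.0, 50.0):
    chi = v_mv / V_CRITICAL_MV  # assumes spin current proportional to voltage
    delay_ns = switching_delay(TAU0_NS, THETA0_RAD, chi)
    print(f"{v_mv:5.1f} mV -> {delay_ns:.4f} ns")
```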
[Figure 4.3 plot: delay (ns) versus supply voltage (mV, 5 to 55); equation and
simulation curves.]
Figure 4.3: Switching delay variation versus the supply voltage. Each
point is simulated 3 times to verify the results [104].
4.2.3 Impact of Thermal Noise
The thermal motion of electrons inside the ferromagnets is the main cause of
thermal noise. The amount of noise depends on the temperature of the magnets
and directly affects the steady-state precession angle θ0. Based on the
derivations in [110],
\[
\langle \theta_0^2 \rangle = \frac{k_B T}{E_b}. \tag{4.2}
\]
In this equation, k_B is the Boltzmann constant, E_b is the barrier energy and T
represents the temperature. This thermal effect acts as the main source of noise
which can impact the output magnetization. In our simulations, θ0 can differ by
5% to 10% from the analytic solution, depending on the parameters.
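As a numerical illustration of Eq. (4.2), the sketch below computes the
root-mean-square initial angle for a given temperature and barrier energy. The
barrier of 40 k_B T at 300 K is a hypothetical value chosen only for the
example; it is not taken from Appendix B.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def rms_initial_angle(temperature_k, barrier_energy_j):
    """Square root of Eq. (4.2): sqrt(k_B * T / E_b), in radians."""
    if temperature_k <= 0 or barrier_energy_j <= 0:
        raise ValueError("temperature and barrier energy must be positive")
    return math.sqrt(K_B * temperature_k / barrier_energy_j)


# Hypothetical barrier energy of 40 k_B T at room temperature (assumed).
T_ROOM_K = 300.0
barrier_j = 40.0 * K_B * T_ROOM_K
theta_rms = rms_initial_angle(T_ROOM_K, barrier_j)
print(f"rms theta0 = {theta_rms:.3f} rad")
```

For this assumed barrier, the rms angle reduces to sqrt(1/40), roughly 0.16 rad,
independent of the temperature used to define the barrier.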
4.3 Pattern Recognition Scheme
Similar to any recognition system, in this work we consider two major phases
of operation. The first phase is the “learning” phase, where the desired
pattern is stored in memory. In the “detection” phase, the circuit evaluates
the similarity between an input and the stored pattern with respect to the
decision-making criteria. In the learning phase, the circuit can receive a
single image or a training set. The training set includes multiple training
images from different users.
In this section, we propose a new technique using all-spin logic devices and
establish a fully spin-based operation. By illustrating several examples, we ver-
ify the performance for various image sizes.
4.3.1 Mainly Similar Images
We first provide the mathematical definition of mainly similarity and then show
how this can help the training of the circuit. In our simulations, all the images
are binary-valued matrices with 0 and 1 representing white and black pixels,
respectively. In our circuit, we assume that binary “0” logic corresponds to
magnetization orientation in −X direction and binary “1” logic corresponds to
magnetization orientation in +X direction.
For a given pair of binary vectors x and y with equal length L, the Hamming
distance [103] is defined as
\[
d(x, y) = \sum_{i=1}^{L} \left(1 - \delta_{x_i y_i}\right),
\]
where x_i and y_i denote the ith components of x and y, respectively, and δ is
the Kronecker delta. Subsequently, we can exploit this quantity as a measure of
similarity between two images.
Definition 1 Two binary images B, B′ ∈ {0, 1}^{m×n} are called mainly similar if
the majority of pixels across every two corresponding rows are identical. More
specifically,
\[
\forall k \in \{1, \cdots, m\} : \; d(B_{k,:}, B'_{k,:}) < \left\lfloor \frac{n}{2} \right\rfloor, \tag{4.3}
\]
where B_{k,:} denotes the kth row of B and ⌊a⌋ represents the floor operation on
a (i.e., the largest integer not greater than a).
By this comparison, we ensure that the two images have almost similar pix-
els along the corresponding rows. For the purpose of this paper, we considered
the comparison along the rows, although a column-wise comparison could be
established with no loss of generality. As illustrated in Figure 4.4, being mainly
similar along the rows, does not imply being similar along the columns.
Figure 4.4: The two images are mainly similar (along the rows), however,
the Hamming distance between the third columns is 3 which
does not imply a similarity along the columns
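The Hamming distance and the row-wise criterion of Definition 1 can be sketched
in a few lines. The two 3×5 images below are hypothetical examples constructed
for this illustration (they are not the images of Figure 4.4): they differ only
in the third column, so every row pair has a Hamming distance of 1, while the
third columns have a Hamming distance of 3.

```python
def hamming_distance(x, y):
    """Number of positions at which two equal-length binary vectors differ."""
    if len(x) != len(y):
        raise ValueError("vectors must have equal length")
    return sum(1 for a, b in zip(x, y) if a != b)


def mainly_similar(image_a, image_b):
    """Row-wise test of Eq. (4.3): d(row_a, row_b) < floor(n / 2) for every row."""
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same number of rows")
    return all(
        hamming_distance(row_a, row_b) < len(row_a) // 2
        for row_a, row_b in zip(image_a, image_b)
    )


# Hypothetical 3x5 binary images that differ only in the third column.
IMAGE_A = [[0, 1, 0, 1, 1],
           [1, 0, 1, 0, 0],
           [0, 0, 1, 1, 0]]
IMAGE_B = [[0, 1, 1, 1, 1],
           [1, 0, 0, 0, 0],
           [0, 0, 0, 1, 0]]

col_a = [row[2] for row in IMAGE_A]
col_b = [row[2] for row in IMAGE_B]
print(mainly_similar(IMAGE_A, IMAGE_B))   # rows: each distance is 1 < 2
print(hamming_distance(col_a, col_b))     # third columns differ everywhere
```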
4.3.2 Majority Training and Decision Making
In the learning phase, we train the circuit by providing a number of mainly sim-
ilar images. In reality, these images could be different representations of a target
image (say a character or a certain binary pattern). We build up a representative
of the given similar images by constructing a so-called mean image.
Definition 2 For a set of P binary images B_1, B_2, · · · , B_P ∈ {0, 1}^{m×n},
the corresponding mean image, denoted as \bar{B}, is a binary image with entries
\[
\bar{B}(i, j) = \mathrm{nint}\left(\frac{1}{P} \sum_{k=1}^{P} B_k(i, j)\right). \tag{4.4}
\]
In this equation, nint denotes the nearest integer function. In our circuit, the
mean image represents the desired pattern by the users and is utilized as a ref-
erence. Since this matrix is constructed using all-spin majority gates, the num-
ber of training images, P, is considered to be odd and upper bounded by the
maximum number of inputs to a majority gate as discussed in subsection 4.2.1.
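For an odd number of training images, the nearest-integer rounding in Eq. (4.4)
is equivalent to a pixel-wise majority vote, since the average can never equal
exactly one half. The sketch below uses that equivalence with integer
arithmetic; the function name and the three 2×2 training images are
hypothetical and serve only as an illustration.

```python
def mean_image(training_images):
    """Pixel-wise majority vote over an odd number of binary images, Eq. (4.4)."""
    p = len(training_images)
    if p % 2 == 0:
        raise ValueError("the number of training images P must be odd")
    rows = len(training_images[0])
    cols = len(training_images[0][0])
    return [
        [1 if 2 * sum(img[i][j] for img in training_images) > p else 0
         for j in range(cols)]
        for i in range(rows)
    ]


# Hypothetical training set with P = 3 images of size 2x2.
B1 = [[1, 0], [1, 1]]
B2 = [[1, 1], [0, 1]]
B3 = [[0, 0], [0, 1]]
print(mean_image([B1, B2, B3]))
```

Using the comparison 2·sum > P instead of a floating-point rounding call avoids
any dependence on a language's tie-breaking rule for rounding.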
After the training data is stored and the mean image is constructed, we make
a row-wise comparison between the input and the mean image. As we will
see in the next section, depending on the initial value of output magnetization,
the non-Boolean row decision maker can return the total count of matches or
mismatches between the compared rows of input image and the mean image.
4.4 Proposed Structure and Design Considerations
Based on the pattern recognition scheme shown in Section III, we study two
different implementations of the circuit. By comparing the performance of the
two versions of the single pixel comparator unit, we choose the one with more
capabilities, at the expense of slightly higher power consumption and occupied
area. In the single pixel comparator, the circuit receives the training pixels
from P different users, and the mean image is subsequently constructed. The
value of the mean pixel is then compared with the corresponding value in the
input image, and the steady state magnetization of the Pixel magnet stores the
result. Both versions of this unit operate on the idea of training the circuit
with a set of mainly similar images and comparing the single pixels of the input
image with their counterparts in the mean image. With respect to the required
operations, the single pixel comparator needs a memory to store the training
data, a logic comparator and a circuit to construct the mean pixel. As
previously mentioned, the mean pixel can be constructed by an all-spin majority
gate; for the memory and the comparator, however, we propose a new circuit in
the following subsection.
4.4.1 Memory+Logic Comparator
1-bit full adder structures with a total of 5 nanomagnets have been designed in
[104] and [105]. By properly configuring the circuit in [104], we use it as an
area and power efficient comparator (XNOR) block, as shown in Figure 4.5. The
two inputs to this block (A and B) come from distinct sources. One input comes
from the input image, synchronized with the control voltage, and the other input
is given to the circuit during the learning phase. Compared to a CMOS
counterpart, this structure exhibits important advantages. First, it requires 5
magnets, whereas the CMOS version requires at least 8 transistors for an XNOR
implementation. Second, this circuit can store the training information without
extra static power consumption, whereas in CMOS, excess power is consumed to
store this data [93]. Taking advantage of the non-volatile operation of ASL
devices, the input magnets of this circuit can store the binary data, and the
stored information is later used to determine the magnetization direction of
the next stages. Figure 4.6 shows the simulated output waveform (\overline{Sum}
magnet) of the XNOR block for different scenarios of input magnetization.
Because breakdown current effects must be considered, we choose a 5 mV supply
voltage in our simulations. This ensures that the current density is safely
below the breakdown value. It is noteworthy that for channels with higher
breakdown current densities, higher voltages can be applied to increase the
operation speed. The control voltage is applied on the magnets at t = 0.
[Figure 4.5 labels: Predefined, Next Stage, A, B, Cin, Cout, Sum; axes X, Y, Z.
Truth table:]
A  B  Cin  Cout  Sum
0  0  0    1     1
0  1  0    1     0
1  0  0    1     0
1  1  0    0     1
Figure 4.5: 1-bit full adder used as XNOR [104]. In the 2D implementation
of this work, X and Y wires are in-plane metal wires and connections along the
Z axis are vias.
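The truth table of Figure 4.5 can be checked against the XNOR function with a
short script. The table rows below are copied from the figure. The gate-level
model that follows is only an assumption made for illustration: it uses a
widely known majority-based full-adder identity built from inverting majority
gates, with the carry magnet counted twice in a five-input gate, and the text
does not confirm that this is the exact wiring of the circuit in [104].

```python
# Rows of the Figure 4.5 truth table: (A, B, Cin, Cout, Sum).
FIGURE_4_5_ROWS = [
    (0, 0, 0, 1, 1),
    (0, 1, 0, 1, 0),
    (1, 0, 0, 1, 0),
    (1, 1, 0, 0, 1),
]


def xnor(a, b):
    """1 when the two bits are equal, 0 otherwise."""
    return 1 if a == b else 0


def inverting_majority(bits):
    """Complement of the majority of an odd number of bits (assumed gate model)."""
    return 0 if 2 * sum(bits) > len(bits) else 1


def assumed_adder_outputs(a, b, cin):
    """Hypothetical wiring: carry from a 3-input gate, sum from a 5-input gate."""
    cout = inverting_majority([a, b, cin])
    total = inverting_majority([a, b, cin, cout, cout])
    return cout, total


for a, b, cin, cout, total in FIGURE_4_5_ROWS:
    assert total == xnor(a, b)                           # Sum column is XNOR(A, B)
    assert (cout, total) == assumed_adder_outputs(a, b, cin)
print("Figure 4.5 table matches XNOR and the assumed majority model")
```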
The total power consumption of the XNOR gate is 11 µW and the estimated
area is less than 0.3 µm^2. When the control voltage is applied to the XNOR
gate, the output magnetization remains in the −X orientation (the initial
condition of the magnetization in this simulation) if the pixel values are
different. If the inputs to this gate are similar, the output magnetization
switches to the +X direction, as shown in Figure 4.6. Note that the initial
condition of the output magnet does not change the final magnetization
orientation.
[Figure 4.6 plot: output magnetization (-1 to 1) versus time (0-1 ns) for four
cases: Input=1/Pattern=1, Input=0/Pattern=0, Input=1/Pattern=0, and
Input=0/Pattern=1.]
Figure 4.6: Simulated output waveforms of XNOR gate
4.4.2 Construction of the mean pixel
As a reliable and simple way to extract the information from the training set,
we construct the mean image as discussed in Section III. The ASL majority gate,
with the schematic shown in Figure 4.1(e), provides a low power and efficient
implementation of the mean image. The inputs to this majority gate come from
P different users. In addition, the images that the system receives during the
learning phase are constrained to be mainly similar along the rows. When the
control voltage is applied to the magnets of this gate, the output magnetization
either switches to the other value or remains in the same orientation. If the
applied control voltage is negative, the final output magnetization orientation
is the majority of the input magnetizations. In the case of a positive control
voltage, the output magnetization settles to the complementary majority value of
the input magnetizations. In this system, since we apply uniform positive
voltages, the majority gates settle to the complementary majority value. To
extract more information from the operation of the majority gates in this
circuit, we assume a uniform initial magnetization orientation on the output
magnets of each stage of majority gates. This enables us to recognize the total
count of matches or mismatches between the input magnetizations of each majority
gate, as we will discuss later. The total power consumption of each majority
gate in this circuit is 3.75 µW and the corresponding estimated area is less
than 0.2 µm^2.
4.4.3 Single Pixel Comparator
With the required blocks in place, we propose two different versions of the
single pixel comparator.
Standard implementation
The schematic of this implementation and the table with the detailed operation
are shown in Figure 4.7. This circuit operates in the same order discussed in
Section III. The first stage of the circuit is a majority gate with inputs
coming from the P users in the learning phase. The output of this majority gate
settles to the corresponding mean pixel value. This output is connected to a
comparator circuit whose other input comes from the input image. The connection
is through a short metallic interconnect to minimize the delay. When the
learning phase is over and the detection phase starts, applying the control
voltage across the magnets of the comparator circuit makes the “Pixel” magnet
settle to the comparison value of the mean pixel and the input pixel. It is
noteworthy that the input pixel can be applied to magnet Q_{ij} after the
magnetization of \bar{P}_{ij} settles to the mean pixel; hence, no extra memory
circuit is required to store the value of \bar{P}_{ij}.
Comparator-First implementation
In this version, there are as many comparator circuits at the input side as
training images. The comparators have the input image pixel, Q_{ij}, in common
and differ in their other input, which comes from their corresponding training
image. The output magnets of the comparators are connected to the “Pixel”
magnet through metallic interconnects in a majority gate configuration. During
the learning phase, the pattern pixels are stored in the corresponding input
magnets. Applying the control voltage to the magnets of the circuit starts the
detection phase, and the “Pixel” magnetization settles to the comparison value
of the mean pixel and the input pixel. The schematic of the circuit and the
detailed operation table are shown in Figure 4.8.
[Figure 4.7 (a) labels: Training Image 1, 2 and 3 (P11-1, P11-2, P11-3), Input
Pixel (Q11), Q1, Cin, Cout, Pixel, To Row; layout dimensions 200 nm and 550 nm.
(b) Truth table:]
Input  Pattern  Pattern  Pattern  Majority       Majority   Mean   Output
Pixel  1        2        3        Initial Value  Switching  Pixel  Final Value
0      0        0        0        1              NO         0      0
0      0        0        1        1              NO         0      0
0      0        1        0        1              NO         0      0
0      0        1        1        1              YES        1      1
0      1        0        0        1              NO         0      0
0      1        0        1        1              YES        1      1
0      1        1        0        1              YES        1      1
0      1        1        1        1              YES        1      1
1      0        0        0        1              NO         0      1
1      0        0        1        1              NO         0      1
1      0        1        0        1              NO         0      1
1      0        1        1        1              YES        1      0
1      1        0        0        1              NO         0      1
1      1        0        1        1              YES        1      0
1      1        1        0        1              YES        1      0
1      1        1        1        1              YES        1      0
Figure 4.7: (a) Standard single pixel detector schematic. (b) The truth table
with the detailed operation of the circuit.
As can be verified by comparing the last columns of Figure 4.7(b) and
Figure 4.8(b), the “Pixel” steady state value is identical in the two versions.
To verify that the two implementations give identical outputs in a more general
case, we have to prove that the majority operation and the comparison
(XOR/XNOR) operation are interchangeable, i.e.,
Proposition 1 Given x, y_1, y_2, · · · , y_P as binary variables and P as an odd
integer,
\[
x \oplus \mathrm{nint}\left(\frac{1}{P} \sum_{k=1}^{P} y_k\right) = \mathrm{nint}\left(\frac{1}{P} \sum_{k=1}^{P} (x \oplus y_k)\right), \tag{4.5}
\]
where ⊕ denotes the XOR operation. The mathematical proof of this proposition
is given in Appendix A.
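Independently of the proof in Appendix A, Proposition 1 can be checked by
exhaustive enumeration for small odd P. The sketch below is such a numerical
check, not a proof; the function names are introduced here for illustration.
For P = 3, the enumeration visits the input combinations in the same order as
the rows of the truth tables in Figures 4.7(b) and 4.8(b).

```python
from itertools import product


def majority(bits):
    """nint of the mean of an odd number of bits, i.e. a majority vote."""
    return 1 if 2 * sum(bits) > len(bits) else 0


def standard_pixel(x, patterns):
    """Left side of Eq. (4.5): XOR of the input with the mean pixel."""
    return x ^ majority(patterns)


def comparator_first_pixel(x, patterns):
    """Right side of Eq. (4.5): majority of the individual XOR results."""
    return majority([x ^ y for y in patterns])


def proposition_holds(p):
    """Exhaustively compare both sides for every input combination."""
    return all(
        standard_pixel(x, ys) == comparator_first_pixel(x, ys)
        for x in (0, 1)
        for ys in product((0, 1), repeat=p)
    )


for p in (1, 3, 5, 7):
    print(p, proposition_holds(p))

# Final output values for P = 3, in truth-table row order.
outputs_p3 = [standard_pixel(x, ys) for x in (0, 1) for ys in product((0, 1), repeat=3)]
print(outputs_p3)
```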
[Figure 4.8 (a) labels: Training Image 1, 2 and 3 (P11-1, P11-2, P11-3), Input
Image (Q11), Q1, Cin, Cout, XOR1, XOR2, XOR3, Pixel; layout dimensions 450 nm
and 800 nm. (b) Truth table:]
Input  Pattern  Pattern  Pattern  Output         Output     Output
Pixel  1        2        3        Initial Value  Switching  Final Value
0      0        0        0        1              YES        0
0      0        0        1        1              YES        0
0      0        1        0        1              YES        0
0      0        1        1        1              NO         1
0      1        0        0        1              YES        0
0      1        0        1        1              NO         1
0      1        1        0        1              NO         1
0      1        1        1        1              NO         1
1      0        0        0        1              NO         1
1      0        0        1        1              NO         1
1      0        1        0        1              NO         1
1      0        1        1        1              YES        0
1      1        0        0        1              NO         1
1      1        0        1        1              YES        0
1      1        1        0        1              YES        0
1      1        1        1        1              YES        0
Figure 4.8: (a) Comparator-first pixel detector schematic. (b) The truth table
with the detailed operation of the circuit.
Although the standard implementation has slightly lower power consumption
(fewer devices) and a smaller area, we select the comparator-first design as
the unit cell of this circuit, because the output magnetization transient of
this circuit provides more information on the similarity of the training pixels
and the input pixel. Based on Figure 4.7(b) and Figure 4.8(b), the final values
of the output magnetization are identical in the two cases. However, the
comparator-first output magnetization comes from a majority gate and switches
when the majority of pattern pixels have the same value as the input pixel. If
the majority gate at the output of the comparator-first circuit has a low
fan-in (e.g., ≤ 5), the switching transient is less sensitive to the accumulated
thermal noise and conveys the number of training pixels with identical values.
In the standard implementation, on the other hand, the output magnetization
comes from the XNOR circuit and, as Figure 4.7(b) shows, its transient adds no
information on the number of similar pattern pixels. This is particularly
important when the user, in the detection phase, tracks the total count of
pattern pixels with the same value.
4.4.4 Non-Boolean Row Decision-Maker
The last stage of the proposed circuit uses the ASL majority gate to quickly
decide on the mainly similarity of the input image and the mean image along the
rows. The inputs to this majority gate come from the “Pixel” magnets of the
pixels along the same row of the image. The connection is through short
interconnects to minimize the delay. As mentioned before, the spin torque
transferred from the input magnets to the output magnet in the ASL majority
gate is determined by the magnetization of the input devices. As the number of
devices with similar magnetization orientations increases, the transferred spin
torque increases; hence, the output magnetization switches faster according to
(3). By proper selection of the control voltage timing and of the dimensions of
the nanomagnets and metallic interconnects in this gate, reliable
decision-making based on the transient behavior of the output magnetization is
achieved. This final majority gate is sensitive to the uncorrelated thermal
noise of the input magnets; hence, a deliberately low fan-in (≤ 5) has to be
selected. In our simulations, 3 magnets from the previous pixel stages are
connected to this gate and, as the simulation results will show, reliable
decision-making is achieved.
The complete circuit for the full image comparison consists of two stages: the
unit pixel comparator and the row majority gate. The structure consisting of
the comparator-first circuits and the Row majority gate is called the “Smart
Detector Cell”. This naming convention simplifies the discussion of operation
in the next section. We call these detector cells smart because they can
perform the multiple tasks of “storage”, “Boolean computation” and “non-Boolean
decision-making” in a time-efficient manner. The schematic of this circuit is
shown in Figure 4.9. The total power consumption of this circuit is 115 µW and
the occupied area is less than 0.5 µm².
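The two-stage structure described above can be illustrated with a purely logical model. The sketch below is an abstraction and not the simulated circuit: it ignores magnetization dynamics, thermal noise and timing, it encodes "match" as 1 regardless of the magnetization polarity used in the device, and the function names are hypothetical.

```python
def majority(bits):
    """Majority vote over an odd number of bits."""
    return 1 if 2 * sum(bits) > len(bits) else 0


def comparator_first_pixel(x, training_pixels):
    """XNOR the input pixel with each training pixel, then take the majority.

    Returns 1 when the input pixel agrees with most training pixels.
    """
    return majority([1 - (x ^ y) for y in training_pixels])


def row_decision(input_row, training_rows):
    """Return (row majority, number of matching pixels) for one row.

    training_rows[k][c] is pixel c of the row taken from training image k.
    """
    matches = [
        comparator_first_pixel(x, [row[c] for row in training_rows])
        for c, x in enumerate(input_row)
    ]
    return majority(matches), sum(matches)
```

In the circuit, the second value (the number of matching pixels) is not available as a digital count; according to the text it is reflected in the switching delay of the row majority gate. The model only makes explicit which quantity that delay is meant to encode.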
We note that our simulations take into account the effect of magnetization
switching. The ASL device acts like a resistive network, so the power
consumption does not change with time. In these devices, the current passes
through only one magnet and therefore does not change with time, while the
magnet switching on the input side influences the switching delay of the last
magnet. In addition, we consider the worst-case delay, which takes into account
the switching delay from the input magnet of the first device to the output
magnet of the last device as well as the transport delay within the metallic
interconnects. For the DC power consumption estimates, we performed both DC and
transient simulations and the results are consistent. It is noteworthy that the
low operating voltage of the circuit leads to low power consumption.
In a real implementation of this work, read/write circuits are added to fully
realize the circuit. However, this chapter focuses on the processing circuit,
without concern for the feeding and extraction of the input and output data. To
feed the input data, spin-polarized currents are used to initialize the
magnetization of the input magnets based on the training images, similar to
[95]. The number of write units is equal to the number of pixels, while there
is only one output, which translates to a small overhead. The decision data is
in the form of a time delay and can be stored on a capacitor, where the delay
determines the amount of stored charge. Another possibility for extracting the
output data is to use MTJ devices, as mentioned in [114].
4.5 Simulation Results
In this section, we provide two different examples to show the reliable perfor-
mance of smart detector cells.
[Figure 4.9: schematic of the smart detector cell. Three comparator-first
circuits (Pixel 1, Pixel 2, Pixel 3), each built from three comparators
(XOR1 to XOR9 in total) that combine the Input Image magnet with the magnets of
Training Images 1 to 3, feed one row majority gate; dimension labels 950 nm and
1000 nm.]
Figure 4.9: Structure of the unit smart detector cell
4.5.1 Non-Boolean Hamming Distance Identifier of 3×3 Pixel
Pattern and Input Image
In this example, we have only one training image and one input image. To
compare these two images, we need 9 XNOR gates to identify the similarity of
corresponding pixels in the two images and 3 majority gates with a fan-in of 3
to decide on the similarity of the corresponding rows. Clearly, the mean image
in this case is the pattern image itself. The smart detector cell in Figure 4.9
has 3 comparator-first circuits and a Row majority gate. The mainly similarity
of the rows can be determined by the Pixel majority gates. The last majority
gate in this case settles to +X magnetization if at least 2 rows are mainly
similar. The initial magnetizations of the comparators and the majority gate
outputs are set to 1. Figure 4.10 shows the two images as well as the transient
magnetization of various magnets. The Pixel waveforms overlap in some cases,
which is why only 3 pixels are shown in this figure. As expected, the
comparator outputs switch for pixels P21 and P22, since their values in the
input image and the pattern image are different. For the rest of the pixels,
the comparator output is +X magnetization and does not switch. Subsequently,
row 1 and row 3 both exhibit perfect similarity and the outputs of the
corresponding majority gates switch within the shortest time. Row 2, on the
other hand, exhibits a mismatch and therefore cannot switch to the −X
magnetization orientation. The control voltage of 5 mV is applied to all the
magnets at t = 0 and the circuit compares the two images in less than 0.6 ns.
Compared to CMOS circuits, this represents a much lower operating voltage and
decision time.
[Figure 4.10: the 3×3 trained image and the 3×3 input image (top);
magnetization (−1 to 1) versus time (0 to 0.6 ns) for Pixel 11, Pixel 21,
Pixel 22, Row 1, Row 2 and Row 3 (bottom).]
Figure 4.10: Using a single smart detector cell, we can compare these 3 × 3
pixel images. The waveforms of the comparators and majority
gates (bottom).

4.5.2 Non-Boolean Similarity Comparison of a 9×9 Pixel Image
and a Set of 3 Pattern Images
In order to incorporate the smart detector cells for larger images, we need an
accurate design of the cells. Here, we develop a circuit for training with
9 × 9 pixel images and perform a non-Boolean comparison between the constructed
mean image and the 9 × 9 pixel input image. In this simulation, 3 different
users write the word “Spin” with their own choice of pixels. The 3 pattern
images are shown in Figure 4.11.
In the detection phase, a new user of the circuit chooses an arbitrary image of
interest as the input. As an example, in this simulation the user chooses the
word “swim”, as shown in Figure 4.12 (left). The circuit should compare this
image with the mean image constructed from the training set.
[Figure 4.11: three 9×9 binary training images, each a rendering of the word
"Spin" by a different user.]
Figure 4.11: Training set for the 9 × 9 pixel images
[Figure 4.12: 9×9 binary images with panel titles "Input image" (left) and
"Mean training image" (right).]
Figure 4.12: The input image (left) and the representation of the mean image
(right). The mean image is not a direct output of the circuit.
The mean image of the training set is also shown in Figure 4.12 (right). One
particular advantage of constructing the mean image can be seen here. As
Figure 4.12 shows, pixels that are mistakenly valued by a single user in the
learning phase (e.g., P26 and P49) are automatically corrected when the mean
image is constructed. This is especially useful when users train the system
with multiple versions of an image during the learning phase, to make sure that
the mean image represents their desired pattern. The mistaken values could be
due to any source of error or distortion. In an ideal case where the thermal
noise can be ignored, the circuit could compare these two large images simply
by changing the fan-in of the different stages in the smart detector cell.
However, since our simulations model the thermal noise accurately, the fan-in
considerations mentioned before are particularly important. Based on these
considerations, we break the 9×9 images into smaller 3×3 sub-images, each of
which can be compared by a single smart detector cell. The 9 smart detector
cells can operate in parallel, and the circuit configuration can be determined
by the user. This breakdown also provides more information on the pixels, as we
can check the mainly similarity for smaller blocks of the original image. The
breakdowns of the mean image (squares on the right) and the input image
(squares on the left) into 3 × 3 partitions are shown in Figure 4.13.
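The error-correcting effect of the mean image and the 3×3 breakdown can be sketched in a few lines. This is a logical illustration only, under the assumption that the mean image is the pixel-wise majority of an odd number of binary training images; the function names are hypothetical and the images used to exercise it are not the ones in Figure 4.11.

```python
def mean_image(training_images):
    """Pixel-wise majority over an odd number of equally sized binary images."""
    n = len(training_images)
    rows = len(training_images[0])
    cols = len(training_images[0][0])
    return [
        [1 if 2 * sum(img[r][c] for img in training_images) > n else 0
         for c in range(cols)]
        for r in range(rows)
    ]


def split_blocks(image, size=3):
    """Split a square image into size x size sub-images, in row-major block order."""
    n = len(image)
    return [
        [row[c:c + size] for row in image[r:r + size]]
        for r in range(0, n, size)
        for c in range(0, n, size)
    ]
```

If one of three training images carries a wrong value at some pixel, the other two outvote it, so `mean_image` returns the intended pattern at that pixel. For a 9×9 image, `split_blocks` returns nine 3×3 sub-images, one per smart detector cell.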
[Figure 4.13: the input image and the mean image, each partitioned into nine
3×3 sub-images shown as Input/Mean pairs; clusters C11, C22, C32, C41, C42,
C43, C52 and C72 are marked.]
Figure 4.13: Due to fan-in considerations, the circuit consists of 9 smart
detector cells. The corresponding breakdowns of the mean
image and the input image are shown here.
In order to distinguish the different rows of the smaller blocks, we use the
notation of Cij clusters, where Cij represents the elements of the ith row from
column 3j − 2 to column 3j. The magnetization waveforms in Figure 4.14 and
Figure 4.15 separately show the output magnetizations of the smart detector
cells for various clusters. The common initial condition of the output magnets
in this simulation is the −X magnetization orientation. In Figure 4.14, the
switching delays of the output magnetization for the clusters with a perfect
match (C11, C22 and C41) and those with 1 mismatch (C52, C42 and C32) can be
easily distinguished. This phenomenon was previously described as the unique
feature of ASL majority gates and helps users identify the number of mismatches
along different rows. At the same time, the output magnetizations of clusters
with the same level of similarity are very close in the time domain, which
makes this non-Boolean decision-making a reliable metric.

[Figure 4.14: output magnetization (−1 to 1) and control voltage (supply,
−0.005 V to 0.005 V) versus time (0.2 to 1.2 ns); traces C11, C22, C41, C32 and
C42, grouped under the labels "Strong Match" and "Weak Match".]
Figure 4.14: The switching delay of output magnetization in last stage
represents the similarity of input data and pattern data.

On the other hand, in Figure 4.15, the output magnetization cannot switch for
the clusters with mismatches (C43 and C72) and, as can be seen, the level of
precession differs between mismatch levels. This is due to the different
amounts of spin torque provided in the two cases. If the user can observe the
output magnetization at very high resolution, this can help to identify the
number of mismatches; however, the switching transient is a more reliable
metric, and the same information can be extracted by repeating the simulation
with the output magnet initial condition set to +X magnetization.
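The Cij cluster notation used above maps directly onto array slicing. The following sketch assumes images stored as lists of rows with 0-based Python indexing, while i and j follow the 1-based convention of the text; the function names are hypothetical.

```python
def cluster(image, i, j):
    """C_ij: pixels of row i from column 3j-2 to column 3j (1-based, inclusive)."""
    return image[i - 1][3 * j - 3: 3 * j]


def cluster_mismatches(input_image, mean_img, i, j):
    """Number of pixels that differ between the input and mean clusters C_ij."""
    a = cluster(input_image, i, j)
    b = cluster(mean_img, i, j)
    return sum(p != q for p, q in zip(a, b))
```

For a 9×9 image, i runs from 1 to 9 and j from 1 to 3, giving 27 clusters of 3 pixels each. The mismatch count returned by `cluster_mismatches` is the quantity that the text associates with the switching delay (or the absence of switching) of the corresponding output magnet.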
[Figure 4.15: output magnetization and control voltage (supply, −0.005 V to
0.005 V) versus time (0.2 to 1.2 ns); traces C72 ("Strong Mismatch") and C43
("Weak Mismatch").]
Figure 4.15: Since these clusters represent mismatch, they can not switch
and the initial magnetization does not change. Note that the
y-axis is showing from -1.002 to -0.998 in contrast with Figure
4.14 in which the y axis is from -1 to 1.
As can be seen in all the simulation results, this circuit can make a decision
in approximately 1 ns for a 9 × 9 pixel image, whereas in CMOS this decision
time cannot be less than a few nanoseconds. For a detailed comparison between
the two technologies, Table 4.1 compares the performance of this circuit with
that of two existing circuits.

Table 4.1: Performance Comparison with existing CMOS systems
Reference       [90]      [95]         This Work
Decision time   30 ns     N.A.         1 ns
Image Size      32 × 32   86 neurons   9 × 9
DC Power        N.A.      2.2 mW       990 µW
Area            N.A.      0.018 mm²    < 1 µm²
Technology      CMOS      Spin-CMOS    All-spin

4.6 Conclusion
We have presented a novel non-Boolean image recognition circuit based on
all-spin logic devices. The introduced circuit can perform all the phases of
non-Boolean pattern recognition for binary images. Taking advantage of the
non-volatility of ASL devices, the learning phase is performed without any
additional memory devices. By introducing the mainly similarity scheme, two
different implementations of the circuit are proposed. As verified by
simulation results, this circuit can recognize binary image patterns of various
sizes faster than existing CMOS counterparts and consumes less power, with an
operating voltage of 5 mV. Since the comparisons in this circuit are based on
ASL majority gates, the computational complexity of the operation is also lower
than that of existing circuits. The proposed circuit has applications in fast
and low-power image recognition for security, medical imaging, and sensing.
4.7 Proof of Proposition 1
In this appendix, we mathematically verify (4.5). Since all the variables are
binary-valued,
$$0 \le \frac{1}{P}\sum_{k=1}^{P} y_k \le 1.$$
The nint operation results in 0 when
$$\sum_{k=1}^{P} y_k < \frac{P}{2}, \qquad (4.6)$$
and otherwise it results in 1. Therefore, there are 4 different possibilities
for the variables, as shown in Table 4.2. To simplify the notation, we also
define
$$z_k = x \oplus y_k \quad \forall k \in \{1, \cdots, P\}. \qquad (4.7)$$

Table 4.2: Possibilities of $x, y_1, \cdots, y_P$
x    $\sum_{k=1}^{P} y_k$    $\operatorname{nint}(\frac{1}{P}\sum_{k=1}^{P} y_k)$    $x \oplus \operatorname{nint}(\frac{1}{P}\sum_{k=1}^{P} y_k)$
0    < P/2                   0                                                       0
0    > P/2                   1                                                       1
1    < P/2                   0                                                       1
1    > P/2                   1                                                       0

Here, we verify (4.5) for the first row of Table 4.2. Using the same method for
the other 3 rows, the proposition can be completely proved.
If $\sum_{k=1}^{P} y_k < \frac{P}{2}$, fewer than $\frac{P}{2}$ of the $y_k$'s
are 1. Given x = 0, we have $z_k = y_k$, which means that fewer than
$\frac{P}{2}$ of the $z_k$'s are 1 and the rest are zero, i.e.,
$$\sum_{k=1}^{P} z_k < \frac{P}{2}.$$
Similar to (4.6), by applying the nearest integer function,
$$\operatorname{nint}\left(\frac{1}{P}\sum_{k=1}^{P} z_k\right) = 0,$$
which equals the left-hand side value in the first row of Table 4.2.
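As an illustration of how the remaining rows follow "using the same method", the third row of the table (x = 1 with fewer than P/2 of the y_k equal to 1) can be worked out as below. This derivation is a sketch added for clarity and is not taken from the original proof; it uses only the definition of z_k and the fact that P is odd.

```latex
% Third row: x = 1 and \sum_k y_k < P/2, with P odd.
\begin{align*}
z_k &= 1 \oplus y_k = 1 - y_k, \\
\sum_{k=1}^{P} z_k &= P - \sum_{k=1}^{P} y_k > P - \frac{P}{2} = \frac{P}{2}, \\
\operatorname{nint}\!\Big(\frac{1}{P}\sum_{k=1}^{P} z_k\Big) &= 1
  = 1 \oplus 0
  = x \oplus \operatorname{nint}\!\Big(\frac{1}{P}\sum_{k=1}^{P} y_k\Big).
\end{align*}
```

The second and fourth rows follow in the same way with the inequality reversed.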
4.8 Simulation Parameters
Parameter                             Symbol    Value

Size Effect Parameters [99]
Side-wall specularity                 P         0
Grain-boundary reflectivity           R         0.2

Interface Parameters (Co/Cu) [109]
Majority spin conductance             G↑        0.375 1/Ω
Minority spin conductance             G↓        0.125 1/Ω
Real spin-mixing conductance          Re G↑↓    3.4375 1/Ω
Imaginary spin-mixing conductance     Im G↑↓    9.37×10^-3 1/Ω

Ferromagnet (Co) [112]
Ferromagnet length                    Lx        75.00 nm
Ferromagnet width                     Ly        25.00 nm
Ferromagnet height                    Lz        3.00 nm
Gilbert damping coefficient           α         0.0021
Gyromagnetic ratio                    γ         1.76×10^11 1/sT
Saturation magnetization              Ms        1.45×10^6 A/m
Number of spins in magnet             Ns        1.34×10^6 1/V
Energy density                        Ku        0.5×10^5 J/m²

Channel (Cu) [113]
Channel length                        Lint      212.5 nm
Channel width                         Wint      50 nm
Thickness/width aspect ratio          AR        2.0
Channel thickness                     Hint      100.0 nm
Cross-section area                    A         5000 nm²
Finite difference spacing             ∆x        10.0 nm
Conductivity                          σ         41.549 1/µΩm²
Diffusion coefficient                 D         0.014 m²/s
Permeability                          µ         0.003 m²/Vs
Spin relaxation time                  τs        10.939 ps
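Some of the channel parameters are related to one another, and a short consistency check may help when reusing them. The sketch below copies the values from the channel (Cu) rows; the relation λs = sqrt(D·τs) for the spin diffusion length is the standard diffusive expression and is an assumption here, since the table does not list λs. The function names are hypothetical.

```python
import math

# Values copied from the channel (Cu) rows of the parameter table.
W_INT_NM = 50.0          # channel width
H_INT_NM = 100.0         # channel thickness
L_INT_NM = 212.5         # channel length
D_M2_PER_S = 0.014       # diffusion coefficient
TAU_S_SEC = 10.939e-12   # spin relaxation time


def cross_section_nm2(width_nm, thickness_nm):
    """Rectangular cross-section area of the channel."""
    return width_nm * thickness_nm


def spin_diffusion_length_nm(d_m2_per_s, tau_s_sec):
    """lambda_s = sqrt(D * tau_s), converted from metres to nanometres."""
    return math.sqrt(d_m2_per_s * tau_s_sec) * 1e9


area = cross_section_nm2(W_INT_NM, H_INT_NM)
aspect_ratio = H_INT_NM / W_INT_NM
lambda_s = spin_diffusion_length_nm(D_M2_PER_S, TAU_S_SEC)
```

With these numbers the area and aspect ratio reproduce the tabulated 5000 nm² and 2.0, and the estimated spin diffusion length (roughly 390 nm) is longer than the 212.5 nm channel.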
CHAPTER 5
A SIGE TERAHERTZ HETERODYNE IMAGING TRANSMITTER
5.1 Introduction
Electromagnetic radiation in the terahertz range has demonstrated great
potential in imaging applications for biomedicine, security and industrial
quality control [115], due to its high spatial resolution (compared to
millimeter waves) and non-ionizing nature (compared to X-rays). At present, the
main barrier to the wide application of this emerging sensing technology is the
difficulty of high-power signal generation. Conventional THz sources include
the quantum-cascade laser (QCL) [116], the photoconductive emitter [117], and
vacuum electronics. However, these solutions have significant drawbacks, such
as high cost, large form factor and stringent operating conditions (e.g.,
cryogenic cooling for the QCL). Because of these, active THz imaging
microsystems using integrated circuit technology are drawing increasing
attention. In particular, imagers based on CMOS and BiCMOS processes are
expected not only to resolve the above problems, but also to achieve a high
level of system integration [119][120]. This makes affordable and portable THz
imaging equipment possible.
However, there are several technical barriers on the path towards this goal.
First, the radiated power of today's THz transmitters in silicon is still
insufficient, mainly due to the limited speed and breakdown voltage of silicon
transistors. The first THz CMOS radiator source, reported in 2008, generated
only 20 nW of power [121]. Since then, significant progress has been made
through synergistic efforts in devices, circuits and electromagnetics. In
[122], 390 µW of power is obtained from a 288-GHz radiator based on a
triple-push oscillator topology. In [123], a 338-GHz phased array achieves
810 µW. In [124], a self-feeding oscillator array generates 1.1 mW of radiated
power at 260 GHz. Besides these works in CMOS, radiation sources in BiCMOS
processes also demonstrate great potential, thanks to the superior speed and
breakdown voltage of the SiGe heterojunction bipolar transistor (HBT) [125].
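Because Figure 5.1(a) plots radiated power in dBm while the text quotes it in watts, a small conversion helper can be useful when comparing the two. The sketch below uses only the power levels quoted in the paragraph above; the function name and the dictionary labels are illustrative.

```python
import math


def watts_to_dbm(p_watts):
    """Convert a power in watts to dBm (decibels relative to 1 mW)."""
    return 10.0 * math.log10(p_watts / 1e-3)


# Radiated powers quoted in the text, in watts.
reported = {
    "[121] first THz CMOS radiator": 20e-9,
    "[122] 288-GHz triple-push radiator": 390e-6,
    "[123] 338-GHz phased array": 810e-6,
    "[124] 260-GHz self-feeding array": 1.1e-3,
}

for name, power in reported.items():
    print(f"{name}: {watts_to_dbm(power):.1f} dBm")
```

On this scale the quoted results span from about −47 dBm for the 2008 source to slightly above 0 dBm for the 1.1-mW array.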
To some extent, a larger total radiated power can be obtained by combining an
increasing number of array elements. By comparison, the DC-to-THz radiation
efficiency is more indicative of the performance of the devices and basic
circuit blocks. This figure of merit is also very important for energy- and
thermally-limited portable systems. As Figure 5.1(b) shows, within less than a
decade, the DC-to-THz radiation efficiency of silicon sources has increased by
over three orders of magnitude. However, because these sources rely on harmonic
generation, the absolute efficiency is still low. The highest previously
reported DC-to-THz radiation efficiencies are 0.14% in CMOS [122][124] and
0.33% in SiGe BiCMOS [126].
The challenge of the on-chip active THz imaging system also resides on the
receiver side. Due to the lack of power amplification for THz signals
(fin > fmax), focal-plane arrays in silicon rely on direct passive detection
using nonlinear devices, such as Schottky diodes [119] and MOSFETs [120]. This
leads to poor sensitivity and in turn requires high-power generation from the
transmitter. In addition, due to the Rayleigh diffraction limit [127] and the
use of resonant antenna coupling, the size of an imaging pixel at THz,
especially at low-THz frequencies (∼300 GHz), is large. It is therefore
difficult to accommodate a large number of pixels on a single silicon die, and
mechanical scanning is commonly used.
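To put the pixel-size argument in numbers, the wavelength at 300 GHz can be estimated as follows. This is a rough sketch, not a pixel design rule: the relative permittivity of 11.7 for silicon is a nominal value assumed here (it is not given in the text), and the function name is hypothetical.

```python
C0 = 299_792_458.0  # speed of light in vacuum, m/s


def wavelength_mm(freq_hz, rel_permittivity=1.0):
    """Wavelength in a lossless, non-magnetic medium, in millimetres."""
    return C0 / (freq_hz * rel_permittivity ** 0.5) * 1e3


lambda_air_mm = wavelength_mm(300e9)
lambda_si_mm = wavelength_mm(300e9, rel_permittivity=11.7)  # assumed value for Si
```

The free-space wavelength at 300 GHz is close to 1 mm, and roughly 0.3 mm inside silicon under the assumed permittivity. Since a resonant antenna occupies a sizeable fraction of a wavelength, this gives a sense of why only a limited number of pixels fit on a die a few millimetres wide.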
[Figure 5.1(a): radiated power (dBm, −15 to 10) versus frequency (0.2 to
0.5 THz); legend: [IMS 2013], [ISSCC 2012], [ISSCC 2013], [ESSIRC 2013],
[ISSCC 2014], [T-MTT 2014], [ISSCC 2014] (Incoherent), This Work.
Figure 5.1(b): DC to THz radiation efficiency (%, logarithmic scale, 1E-5 to
10) versus year (2006 to 2016); legend: [ISSCC 2008], [ISSCC 2011],
[VLSI 2011], [ISSCC 2012], [ESSIRC 2012], [ISSCC 2013], [ISSCC 2014],
[IMS 2013], This Work.]
Figure 5.1: The performance of the state-of-the-art THz radiator sources in
silicon: (a) the total radiated power at varying frequencies and
(b) the achieved DC to THz radiation efficiency over the past
few years.
5.2 Design of a 320-GHz Transmitter
5.2.1 Simulation Results
The operations described above are verified in full-wave electromagnetic
simulations using HFSS [133]. First, the return-path gap structure is excited
by a differential (odd-mode) signal. The two-port simulation setup is shown in
Figure 5.2(a).¹ Figure 5.2(a) also presents the intensity distribution of the
electric field inside the slots of the structure at the fundamental oscillation
frequency of 160 GHz. It can be seen that the odd-mode signal is able to
propagate along the return-path gap and transfer from differential Port 1 to
Port 2. Meanwhile, standing waves are formed inside the four folded RF-choke
slots. The results of the S-parameter simulation are plotted in Figure 5.2(b).
At 160 GHz, the insertion loss (S21) of the structure is only 0.6 dB, which
indicates that the return-path gap is transparent to the differential
oscillation signal.

¹ The self-feeding lines and the two transistors (in white) in Figure 5.2(a)
and Figure 5.4(a) are only for the purpose of illustration. They are not
included in the actual EM simulation structure.

[Figure 5.2(a): E-field intensity maps (a.u., color scale 0.1 to 1) with a
70 µm scale bar. Figure 5.2(b): reflection (ΓA, ΓB) and transmission (A→B)
curves.]
Figure 5.2: Full-wave electromagnetic simulation of the THz radiator: (a)
odd-mode excitation/loading ports and the intensity distribution
of the electrical field (b) S-parameters near the fundamental
oscillation frequency of 160 GHz.
[Figure 5.3: two-port network [Y] with port voltages V1 and V2 = A·V1 and port
currents I1 and I2, including the half RPGC (in odd mode); plot of the
oscillation power of the HBT (mW, 0 to 5) versus ∠A (degree, 0 to 100).
Annotations: oscillation power at f0, Posc = −Re(V1 I1*) − Re(V2 I2*);
∠Aopt = 52°.]
Figure 5.3: The two-port active network including a SiGe HBT and a series
half-RPGC structure at the transistor base. Also shown is the
simulated optimum phase of the complex voltage gain of such
active network at 160 GHz.
Although the RPGC inserted in series with the transmission lines of the
self-feeding oscillator pair does not add much loss, the induced phase shift
cannot be ignored. Therefore, the HBT and one half of the RPGC (in odd-mode
operation) can be considered a new equivalent “transistor” (Figure 5.3). By
simulating the Y-parameters of this combined network, we see that the optimum
phase of the voltage gain is 52° (or −308°).
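The search for the optimum phase of the voltage gain can be sketched numerically from the expression annotated in Figure 5.3, Posc = −Re(V1 I1*) − Re(V2 I2*) with V2 = A·V1. The code below is a minimal sketch under stated assumptions: the Y-matrix is supplied by the caller, the magnitude of A is held fixed during the sweep, and the function names are hypothetical. It does not reproduce the simulated HBT plus half-RPGC network.

```python
import cmath
import math


def oscillation_power(y, a):
    """P_osc = -Re(V1 I1*) - Re(V2 I2*) for V1 = 1, V2 = a and I = Y V.

    y is a 2x2 nested list of (complex) admittances, a is the complex gain.
    """
    v1, v2 = 1.0 + 0j, complex(a)
    i1 = y[0][0] * v1 + y[0][1] * v2
    i2 = y[1][0] * v1 + y[1][1] * v2
    return -(v1 * i1.conjugate()).real - (v2 * i2.conjugate()).real


def optimum_phase_deg(y, magnitude, step_deg=1):
    """Phase of A (degrees, 0 to 359) that maximises P_osc at fixed |A|."""
    return max(
        range(0, 360, step_deg),
        key=lambda d: oscillation_power(y, cmath.rect(magnitude, math.radians(d))),
    )
```

As a sanity check, an idealised unilateral transconductor (hypothetical values Y11 = Y22 = 1 mS, Y12 = 0, Y21 = 50 mS) yields an optimum phase of 180°, the familiar inverting case. The 52° reported in the text arises from the Y-parameters of the actual device and coupler at 160 GHz, which are not listed here.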
For the common- (even-) mode operation, the simulation setup is presented in
Figure 5.4(a). The stimulus from Port B represents the in-phase 2nd-harmonic
signal generated at the two bases of the SiGe HBTs. The intensity distribution
of the electric field inside the slots at 320 GHz is also shown in
Figure 5.4(a). It is evident that the injected signal is fully blocked by the
return-path gap. Meanwhile, four standing waves are formed inside the folded
slots on the right. As indicated in Figure 5.4(b), the simulated isolation
between the two sides of the return-path gap is better than 30 dB (transmission
below −30 dB), meaning that the structure is opaque to the even-mode signal.

[Figure 5.4(a): E-field intensity maps (a.u., color scale 0.1 to 1) with a
70 µm scale bar. Figure 5.4(b): reflection (ΓA, ΓB), transmission (B→A) and
radiation curves.]
Figure 5.4: Full-wave electromagnetic simulation of the THz radiator: (a)
even-mode excitation/loading ports and the intensity distribution
of the electrical field (b) S-parameters near the 2nd-harmonic
frequency of 320 GHz.

Note also that the small reflection coefficient at Port B (ΓB) means that the
2nd-harmonic signal is fully absorbed by the structure. In fact, the signal is
turned into a downward-propagating radiation wave inside the silicon; the
simulated radiation pattern is shown in Figure 5.5. The directivity in the
perpendicular direction is 5.6 dBi. To reduce the excitation of the
substrate-mode wave inside the silicon substrate (250 µm thick), a backside
hemispheric silicon lens is assumed in the simulation (modeled as a
semi-infinite silicon boundary condition beneath the chip substrate) [136]. The
simulated radiation efficiency, including ∼30% power reflection at the
silicon-to-air interface [124], is as high as ∼50%. The additional loss is due
to the finite substrate resistivity (∼10 Ω·cm).
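The dB figures and the ∼30% interface reflection quoted above can be related to linear power fractions with two short helpers. The refractive index of 3.42 for high-resistivity silicon is a nominal value assumed here (it is not given in the text), normal incidence is assumed, and the function names are hypothetical.

```python
def db_to_power_ratio(db):
    """Convert a gain in dB (negative for loss) to a linear power ratio."""
    return 10.0 ** (db / 10.0)


def normal_incidence_reflectance(n1, n2):
    """Fresnel power reflectance at normal incidence between lossless media."""
    return ((n1 - n2) / (n1 + n2)) ** 2


odd_mode_transmission = db_to_power_ratio(-0.6)    # 0.6 dB insertion loss
even_mode_leakage = db_to_power_ratio(-30.0)       # 30 dB isolation
si_air_reflection = normal_incidence_reflectance(3.42, 1.0)  # assumed n for Si
```

Under these assumptions, a 0.6-dB insertion loss corresponds to about 87% of the odd-mode power being transmitted, 30 dB of isolation to about 0.1% leakage of the even-mode power, and the silicon-to-air interface reflects roughly 30% of the incident power, in line with the figure cited from [124].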
[Figure 5.5: simulated directivity (dBi) versus theta (degree, 90 to 270) in
the E-plane and H-plane, with directivity color-scale insets (Dir, dB).]
Figure 5.5: The simulated radiation pattern of the proposed 320-GHz radiator
unit. A backside hemispheric silicon lens is assumed.
Lastly, from Figure 5.2(b) and Figure 5.4(b), it can be seen that the
orthogonal behaviors of the proposed structure for odd- and even-mode signals
have a very broad bandwidth. This makes the RPGC structure suitable for future
implementations of widely tunable sources and broadband data transmitters.
The proposed RPGC-based THz radiator is integrated into a 4×4 array in a
320-GHz transmitter for a heterodyne imaging system (shown in Figure 5.6). In
contrast to incoherent, purely intensity-based detection (e.g., [119][120]),
where the incident THz wave undergoes self-mixing, in heterodyne detection it
is mixed with an LO signal of much larger power. The output response, as well
as the imaging sensitivity, is therefore greatly enhanced [134]. Meanwhile, the
sinusoidal output of the heterodyne detector also preserves the phase of the
incident THz wave at each pixel. This enables electronic beam scanning in a
multi-pixel configuration, which could potentially eliminate the need for the
mechanical scanning used in conventional THz imaging systems (discussed in
Section 5.1).

[Figure 5.6: block diagram. 160-GHz PLL: 160-GHz VCO (f ≈ 80 GHz), ILFD (÷8),
CML (÷32/32.5), PFD, CP, reference fREF, control voltage Vctrl; the PLL drives
the 320-GHz radiator array.]
Figure 5.6: The architecture of the 320-GHz transmitter with a
fully-integrated phase-locking loop.

[Figure 5.7: two adjacent radiators at f0 connected through CPW/GCPW coupling
lines. (a) Phases 0°/180° and 0°/180°, boundary impedance Z = ∞. (b) Phases
0°/180° and 180°/0°, boundary impedance Z = 0.]
Figure 5.7: The mutual coupling between adjacent radiators: (a) in-phase
coupling mode (supported) (b) out-of-phase coupling mode
(unsupported).

To perform heterodyne detection, it is critical to lock the phase of the RF
signal in the transmitter to that of the LO signal in the receiver. To achieve
this goal, the 16-element radiator array in Figure 5.6 is phase-locked by a
fully-integrated PLL through injection locking at 160 GHz. The phase of the
radiated wave at 320 GHz is then locked to an externally applied reference
clock at ∼310 MHz. Next, some critical design details of the transmitter are
given.
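The ∼310 MHz reference can be related to the divider labels in Figure 5.6 with simple arithmetic. The sketch below assumes that the divider chain is driven by the 80-GHz VCO output and that the label "÷32/32.5" denotes two selectable division ratios; both readings are assumptions, and the names are hypothetical.

```python
def divide_chain(f_in_hz, ratios):
    """Frequency at the output of cascaded dividers with the given ratios."""
    f = f_in_hz
    for ratio in ratios:
        f /= ratio
    return f


F_VCO_HZ = 80e9  # divider-chain input, taken from the 80-GHz label in Figure 5.6

f_ref_32 = divide_chain(F_VCO_HZ, (8, 32))      # ILFD /8, then CML /32
f_ref_32p5 = divide_chain(F_VCO_HZ, (8, 32.5))  # ILFD /8, then CML /32.5
multiplication_to_320ghz = 320e9 / f_ref_32     # overall ratio to the radiated wave
```

With these assumptions the reference falls at 312.5 MHz for ÷32 and near 307.7 MHz for ÷32.5, bracketing the ∼310 MHz quoted in the text, and the radiated 320-GHz wave sits at 1024 times the ÷32 reference.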
5.2.2 Coupled Radiator Array
Although the proposed return-path gap structure optimizes the generation
efficiency of THz radiation, the absolute power level generated by each
radiator unit is still limited by the HBT size. Therefore, a 16-element array
is implemented to increase the total radiated power. The power is combined
through the constructive superposition of the radiated waves in the far field.
Such quasi-optical power combining [135] is efficient, broadband, and highly
scalable. The array is partitioned into four rows, within which the elements
are passively coupled. The mutual coupling between radiators is through a CPW
transmission line that taps the self-feeding lines of the radiators (shown in
Figure 5.7). By symmetry, there are in-phase and out-of-phase coupling modes in
the steady state.² In the in-phase coupling mode (Figure 5.7(a)), the boundary
between two units is equivalent to an open termination; hence, the added CPW
lines behave as shunt capacitors. To minimize the impact of these capacitors,
the signal trace of the CPW lines is designed to be very narrow (W = 3 µm) and
far from the ground (D = 6 µm). On the other hand, the out-of-phase mode (shown
in Figure 5.7(b)) leads to a virtual ground at the connection point of the
coupling lines. It presents a highly inductive susceptance in shunt with the
self-feeding lines, which greatly reduces the oscillation power; this undesired
mode is therefore naturally suppressed.
Due to the compactness of the proposed radiator design, the entire 16-radiator
array, equipped with the functions of fundamental oscillation, harmonic
generation and on-chip radiation, occupies an area of only 0.9×0.9 mm². As a
result, the achieved radiator density is ∼4× higher than in the prior art
[123][129]. The small radiator pitch also helps suppress the side lobes of the
combined beam. The simulated radiation pattern of the array is shown in
Figure 5.8. The simulated peak directivity is 17.6 dBi.
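The simulated array directivity can be compared with a first-order estimate. For N identical, uniformly excited, in-phase elements with negligible mutual coupling, the directivity is often approximated as the element directivity plus 10·log10(N). This approximation is an assumption introduced here for illustration and not a result from the simulation; the function name is hypothetical.

```python
import math


def ideal_array_directivity_dbi(element_dbi, n_elements):
    """First-order estimate: element directivity plus 10*log10(N)."""
    return element_dbi + 10.0 * math.log10(n_elements)


# 5.6 dBi single-radiator directivity and 16 elements, both from the text.
estimate_dbi = ideal_array_directivity_dbi(5.6, 16)
```

For the 5.6-dBi radiator unit and 16 elements the estimate comes out within a few hundredths of a dB of the simulated 17.6 dBi, which suggests that the 4×4 array combines close to ideally in the broadside direction.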
² Any coupling phase between 0° and 180° leads to net power flow between units
and is therefore unstable. The mutual dragging and pulling eventually converges
back to either the in-phase or the out-of-phase mode.
[Figure 5.8: simulated directivity (dBi) versus theta (degree, 90 to 270) in
the E-plane and H-plane, with directivity color-scale insets (Dir, dB).]
Figure 5.8: The simulated radiation pattern of the 320-GHz 4×4 radiator
array. The pitch between the elements is 220 µm, and a backside
hemispheric silicon lens is assumed.
5.2.3 On-Chip Phase-Locked Loop
As shown in Figure 5.6, the on-chip PLL consists of 4 coupled VCOs, which provide
a 160-GHz injection-locking signal to each radiator row. A divider chain
samples the phase/frequency of the linear VCO array, and a global
phase/frequency control signal Vctrl is then generated by a phase detector
followed by a charge pump. Figure 5.9 presents the schematic of the VCO,
including two output buffers at 80 GHz and 160 GHz. The VCO is based on a
differential Colpitts oscillator topology, in which the resonance tank on one side
is mainly formed by the transmission line stub TL1, the MOS varactor C1 and the Cπ
of the HBT Q1. Compared to the cross-coupled topology, the advantage
of the Colpitts oscillator is that the large, untunable Cπ is in series with C1;
therefore, for the same HBT size and tuning range, a smaller varactor can be
used. Since MOS varactors are very lossy at millimeter-wave frequencies, this
design strategy leads to higher oscillation power at 80 GHz. In addition, the
oscillator has two common-base cascode stages in parallel (Q3∼Q6). Q3 and Q4
increase the generation of the 2nd-harmonic signal at 160 GHz, which is further
amplified by a cascode buffer. For the VCO at the bottom in Figure 5.6, Q5 and
[Figure 5.9 schematic labels: VCOi core (Q1, Q2, TL1, TL2, C1) with connections to VCOi-1 and VCOi+1; cascode stages Q3 to Q6; 80-GHz buffer to the divider (or a dummy load); 160-GHz buffer to the radiator; bias nodes VB1 to VB6, Vctrl and VDD.]
Figure 5.9: The schematic of the 160-GHz VCO inside the on-chip phase-
locked loop.
Q6 are used as a differential buffer to provide the 80-GHz output to the divider
chain inside the PLL.
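The varactor-sizing argument for the Colpitts core can be illustrated numerically. The following Python sketch compares the frequency tuning ratio of a tank in which the fixed HBT capacitance appears in parallel with the varactor (as in a cross-coupled core) against one in which it appears in series (as in the Colpitts core). All capacitance values are hypothetical and chosen only for illustration; they are not taken from the design.

```python
import math

def tuning_ratio(c_min, c_max):
    """f_max / f_min of an LC tank whose capacitance swings from c_min to c_max.

    Since f = 1 / (2*pi*sqrt(L*C)), the ratio does not depend on L.
    """
    return math.sqrt(c_max / c_min)

def parallel(c_fixed, c_var):
    """Fixed capacitance in parallel with the varactor (cross-coupled case)."""
    return c_fixed + c_var

def series(c_fixed, c_var):
    """Fixed capacitance in series with the varactor (Colpitts case)."""
    return c_fixed * c_var / (c_fixed + c_var)

# Hypothetical values (assumptions, not from the thesis), in femtofarads.
C_PI = 40.0                        # large, untunable HBT capacitance
C_VAR_MIN, C_VAR_MAX = 10.0, 20.0  # small MOS varactor range

ratio_parallel = tuning_ratio(parallel(C_PI, C_VAR_MIN), parallel(C_PI, C_VAR_MAX))
ratio_series = tuning_ratio(series(C_PI, C_VAR_MIN), series(C_PI, C_VAR_MAX))

print(f"parallel (cross-coupled): f_max/f_min = {ratio_parallel:.3f}")
print(f"series (Colpitts):        f_max/f_min = {ratio_series:.3f}")
```

Under these assumed values, the same small varactor tunes the series-connected tank over a noticeably wider range, which is another way of saying that a given tuning range can be reached with a smaller varactor.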
Between the adjacent VCOs, tight coupling is obtained by directly connecting
the intermediate nodes of their transmission lines TL2 at the emitters of
the core HBTs (Figure 5.9). Similar to the radiator coupling described in
Section 5.2.2, the VCOs are coupled in the in-phase mode, and the coupling
boundaries present an open circuit to the VCO circuit (and hence have no impact
other than phase alignment). Since each VCO is coupled to its neighbors, and the
only global signal routing is the low-frequency varactor bias control Vctrl, the
proposed PLL architecture is highly scalable and can accommodate an even larger
radiator array.
5.3 Prototype And Experimental Results
The proposed 320-GHz transmitter is implemented using a 130-nm SiGe:C BiCMOS
process (fT/fmax = 220 GHz/280 GHz [125]). The microphotographs of the
[Figure 5.10 labels: (a) 1.6 mm × 1.3 mm chip with radiators, VCOs + buffers and other PLL blocks; radiator unit dimensions marked 200 µm and 90 µm. (b) SiGe chip, high-resistivity silicon wafer, hemispheric high-resistivity silicon lens, and PCB.]
Figure 5.10: (a) The microphotograph of the 320-GHz transmitter using
130-nm SiGe BiCMOS process. The THz radiator based on
the return-path gap coupler is also shown. (b) The chip pack-
aging with the backside attachment of a silicon lens.
chip, as well as a THz radiator unit, are shown in Figure 5.10(a). The entire chip,
including the 4×4 radiator array and the PLL, occupies an area of 1.6×1.3 mm².
The chip packaging is shown in Figure 5.10(b). First, the chip is mounted onto a
high-resistivity silicon wafer (∼1 cm²). The wafer is then glued to a PCB with a
hole, so that the exposed front side of the chip is wire-bonded to the metal leads
on the PCB for the connections of DC power supply, bias, and the PLL reference
clock signal. Finally, a hemispheric, high-resistivity silicon lens (with a diame-
ter of 1 cm) is fixed on the other side of the wafer after alignment. Compared
to the radius of the lens (5 mm), the distance between the THz radiator and the
spherical center of the lens is small (0.4 mm). Therefore, the beam collimation
effect due to the lens is not significant.
[Figure 5.11 diagram labels: packaged chip on a PCB with DC supply/bias control board and reference input fref; WR-3 horn antenna at distance d ≥ 6 cm, with rotation angles φ and θ. Frequency/spectrum testing: VDI WR-3 EHM, diplexer, LNA, signal source (~20 GHz) as LO, and spectrum analyzer on the IF. Power measurement: WR-3-10 taper, WR-10 waveguide (loss 0.7 dB), sensor head and Erickson PM4 power meter.]
Figure 5.11: The measurement setup for the 320-GHz transmitter and a
photo of the packaged chip.
The measurement setup is shown in Figure 5.11. The output THz beam of
the chip is received by a diagonal horn antenna. To test the frequency and
spectrum of the radiation, a VDI WR-3.4 even-harmonic mixer (EHM) is used
to mix the input THz signal with the 16th harmonic of an externally applied LO
signal (∼20 GHz). The measured spectrum of the down-converted IF output is
shown in Figure 5.12. When the on-chip PLL is turned off, the radiator units are
not synchronized and oscillate at their own free-running frequencies.
This is indicated by the multiple spurs in Figure 5.12(a). The number of spurs
does not equal the number of radiators (N=16); this may be due to mutual
pulling between some of the radiators through the silicon substrate. When the
on-chip PLL is turned on, a single, coherent radiation is measured, as shown in
the spectrum in Figure 5.12(b). Due to the constructive power combining, the
output power is 7 dB higher than that measured in the former case.
The measured phase noise of the radiation is -79 dBc/Hz at 1-MHz offset.
Next, an Erickson power meter (with a WR-10 interface) is used to measure
the absolute power level of the radiation. As shown in Figure 5.11, an additional
[Figure 5.12 panel labels: PLL OFF, PLL ON; Δ = 7 dB.]
Figure 5.12: The measured down-converted spectrum of the transmitter
radiation: (a) on-chip PLL is OFF and (b) on-chip PLL is ON.
1” WR-10 waveguide is used to protect the metal flange of the sensor head, and
another 1” WR-10 to WR-3.4 taper is used to connect to the WR-3.4 horn antenna
with a smooth transition. The total loss of these additional connections is
0.7 dB. To begin with, the distance d between the 320-GHz transmitter and the
horn antenna is varied from 4 cm to 9 cm. The associated power received
by the horn antenna, Pr, is plotted in Figure 5.13. It can be seen that when the
distance is larger than 6 cm, the roll-off of the received power follows the
Friis transmission equation (Pr ∝ d⁻² [137]). Because of this, all subsequent
measurements are performed at the far-field distance limit of 6 cm. At this
distance, the received power is 61 µW, resulting in a transmitter effective isotropic
radiated power (EIRP) of 22.5 dBm.
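The link-budget arithmetic behind the EIRP figure can be sketched with the Friis equation in decibel form. The Python sketch below computes the free-space path loss at 320 GHz over 6 cm and backs out the net receive-side gain implied by the reported received power and EIRP. The horn gain is not quoted in this section, so the implied gain is an inferred quantity, and whether it already includes the 0.7-dB connection loss is an assumption left open here.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s

def free_space_path_loss_db(freq_hz, distance_m):
    """Friis free-space path loss, 20*log10(4*pi*d/lambda), in dB."""
    wavelength = C0 / freq_hz
    return 20.0 * math.log10(4.0 * math.pi * distance_m / wavelength)

def watts_to_dbm(power_w):
    """Convert a power in watts to dBm."""
    return 10.0 * math.log10(power_w / 1e-3)

FREQ_HZ = 320e9       # transmitter frequency (from the text)
DISTANCE_M = 0.06     # far-field measurement distance (from the text)
P_RECEIVED_W = 61e-6  # received power (from the text)
EIRP_DBM = 22.5       # reported EIRP (from the text)

path_loss_db = free_space_path_loss_db(FREQ_HZ, DISTANCE_M)
p_received_dbm = watts_to_dbm(P_RECEIVED_W)

# Friis in dB: P_r = EIRP + G_r - path_loss. Solve for the net receive gain
# that reconciles the reported numbers (inferred, not quoted in the text).
implied_rx_gain_db = p_received_dbm - EIRP_DBM + path_loss_db

# The d^-2 roll-off means that doubling the distance costs about 6 dB.
extra_loss_db = free_space_path_loss_db(FREQ_HZ, 2 * DISTANCE_M) - path_loss_db

print(f"path loss at 6 cm:        {path_loss_db:.2f} dB")
print(f"received power:           {p_received_dbm:.2f} dBm")
print(f"implied net receive gain: {implied_rx_gain_db:.2f} dB")
print(f"extra loss at 12 cm:      {extra_loss_db:.2f} dB")
```

The last quantity is a quick way to check measured data against the far-field assumption: points that fall off more slowly than about 6 dB per doubling of distance are likely still in the near field.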
Using the harmonic mixer, the radiation pattern of the transmitter is mea-
sured by rotating the chip in both azimuth ϕ and elevation θ directions (Fig-
ure 5.11). When the silicon lens is attached on the back side of the chip, the ra-
diation pattern is shown in Figure 5.14(a). The measured directivity is 17.3 dBi
and the 3-dB beamwidth is ±10°. Such high directivity is due to the coherent
16-element array configuration, and is consistent with the simulation in Sec-
tion 5.2. In addition, the radiation performance without the backside silicon
lens is also characterized. It is expected that the output beam will undergo more
[Figure 5.13 plot: received power (µW) versus distance (cm, 3 to 10), at VDD = 2.15 V.]
Figure 5.13: The received radiated power of the power meter at varying
distance, d, from the 320-GHz transmitter chip.
[Figure 5.14 panels (a) and (b): polar plots on a 0 to -30 dB scale over 0° to 180°, E-plane and H-plane; structure labels: air, high-resistivity hemispheric Si lens, chip, high-resistivity Si wafer.]
Figure 5.14: The measured radiation pattern of the 320-GHz transmitter
(a) with the backside silicon lens and (b) without the backside
silicon lens.
divergence at the silicon-to-air interface, due to the more significant refraction
near the critical angle (θc = 16° [124]). Nevertheless, the measured pattern shown
in Figure 5.14(b) still has a high directivity of 13 dBi, with an associated 3-dB
beamwidth of ±27° (H-plane) and ±13° (E-plane). The asymmetric profile is due
to the different reflection coefficients of the s-polarized and p-polarized waves [127].
Finally, the supply voltage of the transmitter, VDD, is swept. The associated
[Figure 5.15 plot: radiated power (mW) and DC-to-THz radiation efficiency (%) versus DC power.]
Figure 5.15: The total radiated power of the 320-GHz transmitter, as well
as the associated DC-to-THz radiation efficiency, at different
DC power supply voltage and dissipation power.
total radiated power, DC power consumption, and DC-to-THz radiation
efficiency are plotted in Figure 5.15. Here, the total radiated power of
the chip (in dBm) equals the measured EIRP (in dBm) minus the measured
directivity (in dBi). At a VDD of 2.15 V, the total radiated power reaches its
maximum of 5.2 dBm (3.3 mW), and the associated DC power is 610 mW. This leads
to a DC-to-THz radiation efficiency of 0.54%. It is noteworthy that even when the
backside silicon lens is removed, the measured total radiated power and the
DC-to-THz radiation efficiency are still as high as 0.9 dBm (1.2 mW) and 0.2%,
respectively. Such a small degradation indicates that when a highly directive
beam (perpendicular to the silicon-air interface) is generated from a large-array
configuration, the undesired impact of refraction and internal reflection (i.e.,
substrate-mode waves) is less significant. This should eventually lead to
efficient on-chip radiation without the need for a silicon lens.
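The conversion from EIRP and directivity to total radiated power and efficiency can be reproduced in a few lines. The Python sketch below uses only the numbers quoted above; applying the same 610-mW DC power to the no-lens case is an assumption made here for illustration.

```python
def dbm_to_mw(p_dbm):
    """Convert a power in dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)

EIRP_DBM = 22.5         # measured EIRP with the lens (from the text)
DIRECTIVITY_DBI = 17.3  # measured directivity with the lens (from the text)
DC_POWER_MW = 610.0     # DC power at VDD = 2.15 V (from the text)

# Total radiated power in dBm is EIRP (dBm) minus directivity (dBi).
total_radiated_dbm = EIRP_DBM - DIRECTIVITY_DBI
total_radiated_mw = dbm_to_mw(total_radiated_dbm)
efficiency_percent = 100.0 * total_radiated_mw / DC_POWER_MW

# Without the lens, the text reports 0.9 dBm; the same DC power is assumed.
no_lens_mw = dbm_to_mw(0.9)
no_lens_efficiency_percent = 100.0 * no_lens_mw / DC_POWER_MW

print(f"with lens:    {total_radiated_dbm:.1f} dBm = {total_radiated_mw:.2f} mW, "
      f"efficiency {efficiency_percent:.2f}%")
print(f"without lens: {no_lens_mw:.2f} mW, efficiency {no_lens_efficiency_percent:.2f}%")
```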
5.4 Conclusions
This work achieves one of the highest radiated powers and DC to THz radiation
efficiency numbers among all Si/SiGe technologies. Without the silicon lens,
the output power is higher than that of most other radiators (only 0.1 dB lower
than [126], which operates at a lower frequency and uses a faster SiGe process).
Therefore, the proposed radiator design based on the return-path gap coupler
has fully optimized the THz generation potential of the silicon transistors. It is
also noteworthy that the proposed device analysis approach and radiator design
can be applied to other integrated circuit technologies.
CHAPTER 6
LOW POWER NEGATIVE INDUCTANCE INTEGRATED CIRCUIT FOR
GHZ APPLICATIONS
6.1 Introduction
In recent years, non-Foster circuits have attracted a great deal of attention due
to their capability to overcome the gain-bandwidth limitation. It is well-known
that all passive networks are bound by this limitation, and therefore any in-
crease in the bandwidth results in a lower gain, and vice versa. However, with
the use of non-Foster circuits, it is possible to simultaneously improve both gain
and bandwidth. Non-Foster elements are capable of neutralizing the induc-
tive or capacitive properties of any passive networks, such as matching net-
works, antennas, and impedance surfaces, over a wide range of frequencies by
providing negative inductors or negative capacitors. These active negative ele-
ments can be synthesized using Negative Impedance Converters (NIC). In 1954,
Linvill designed the first working transistor-based NIC circuit, which was capable
of providing negative resistance for low-frequency operation (below 1 MHz)
[139]. The design was based on Open Circuit Stable (OCS) operation, implying
that the circuit is stable only when there is a relatively large impedance across the
port. Recently, Sussman-Fort and his colleague have shown that the bandwidth
characteristic of an electrically small antenna can be significantly enhanced us-
ing non-Foster circuits [140]. In their paper, a non-Foster circuit which is capa-
ble of providing a negative capacitor in the frequency range of 20 MHz to 100
MHz is designed and implemented using discrete components. However, with
the advancement in silicon technology, more practical integrated non-Foster circuits
can be designed and implemented for a wide range of applications. For
instance, in high-frequency applications (above 1 GHz) where the impedance of
the load is not purely resistive, a non-Foster matching circuit can be used to
maximize the power transfer between the source and the load. Non-Foster cir-
cuits can also be used to achieve electrically small broadband antennas, as well
as wideband metamaterials and metasurfaces [141],[142].
6.2 Negative Impedance Converter Design
To date, researchers have theoretically analyzed many different implementations
of NIC circuits. However, only a few have built and tested their circuits
to verify that they can provide a low-loss and stable output at high frequencies
[140], [143], [144]. In this chapter, an NIC circuit based on Linvill’s open circuit
stable topology is proposed. The design is fully CMOS compatible, which
allows lower power consumption than its BJT and
BiCMOS counterparts [145]. The core of the NIC is a cross-coupled differential
pair of n-MOS transistors, as shown in Figure 6.1(a). The NIC converts the load
impedance seen by the drains of M1 and M2 to its negative equivalent at the input
terminal. To investigate this analytically, a small-signal model is developed,
as shown in Figure 6.1(b), and the input impedance of the NIC (Zin) is obtained
in terms of the load impedance (ZL) [141].
\[ Z_{in} = \frac{2 r_0}{1 + g_m r_0} + \frac{Z_L}{1 + g_m r_0} - \frac{Z_L g_m r_0}{1 + g_m r_0} \tag{6.1} \]
where r0 and gm are the transistor’s output resistance and transconductance,
respectively. Assuming gm r0 ≫ 1, equation (6.1) can be approximated as
\[ Z_{in} \simeq \frac{2}{g_m} - Z_L \;\Longrightarrow\; Z_{in} \simeq -Z_L \tag{6.2} \]
From (6.2) it can be seen that the obtained negative impedance comes with a series
resistance of 2/gm, which can be minimized by increasing gm.
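As a numerical check of the approximation, the Python sketch below evaluates the exact expression of equation (6.1) and the large-gm·r0 limit, Zin ≈ 2/gm − ZL, for a 1-nH inductive load at 1 GHz. The transconductance is the 30 mS quoted in Section 6.3.1; the output resistance r0 is a hypothetical value picked so that gm·r0 = 60. This is a first-order small-signal model only, so it is not expected to reproduce the parasitic-dependent values of the fabricated circuit.

```python
import math

def nic_input_impedance(z_load, gm, r0):
    """Exact small-signal input impedance, equation (6.1) over a common denominator."""
    return (2.0 * r0 + z_load * (1.0 - gm * r0)) / (1.0 + gm * r0)

def nic_input_impedance_approx(z_load, gm):
    """Approximation for gm*r0 >> 1: Zin ~ 2/gm - ZL."""
    return 2.0 / gm - z_load

GM = 30e-3     # transconductance in siemens (value quoted in Section 6.3.1)
R0 = 2e3       # output resistance in ohms (hypothetical, gives gm*r0 = 60)
L_LOAD = 1e-9  # 1 nH load inductor
FREQ = 1e9     # evaluation frequency in Hz

omega = 2.0 * math.pi * FREQ
z_load = 1j * omega * L_LOAD

z_exact = nic_input_impedance(z_load, GM, R0)
z_approx = nic_input_impedance_approx(z_load, GM)
l_equiv = z_exact.imag / omega  # equivalent series inductance in henries

print(f"exact:  {z_exact.real:.2f} {z_exact.imag:+.2f}j ohm")
print(f"approx: {z_approx.real:.2f} {z_approx.imag:+.2f}j ohm")
print(f"equivalent inductance: {l_equiv * 1e9:.3f} nH")
```

With these assumed values the reactance is negative and close to that of a −1 nH element, and the residual real part is close to 2/gm, which is the loss term the text proposes to reduce by raising gm.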
6.3 Negative Inductor Implementation
A practical implementation requires a number of design considerations, such as
the biasing circuits, transistor type, and transistor aspect ratio, to name a few. All
these parameters should be chosen carefully in order to achieve a linear, stable
output with minimal noise.
Figure 6.1: A basic non-Foster element: (a) Negative Impedance Converter
circuit implementation based on Linvill’s OCS model, (b) the
equivalent small-signal model of the proposed NIC based on
CMOS technology.
6.3.1 Biasing Circuits
First, a self-biased current source, which is typically less sensitive to supply
voltage fluctuations, is chosen to bias the circuit [146]. Figure 6.2 illustrates the
schematic of the self-biased current source. In addition, analysis shows that the
Figure 6.2: Schematic for the self-biased current source using an on-chip
resistor Rs.
Figure 6.3: Schematic diagram of the Negative Impedance Converter
(NIC) integrated circuit, which has been implemented in a 65-nm
process and produces a negative inductance of L = −1 nH.
output current, Iout, can be obtained as follows:
\[ I_{out} = \frac{2}{\mu_n C_{ox} (W/L)} \cdot \frac{1}{R_s^2} \left( 1 - \frac{1}{\sqrt{k}} \right)^2 \tag{6.3} \]
where µn, Cox and W/L are the electron mobility, oxide capacitance and transistor
aspect ratio, respectively.
Figure 6.4: Reactance vs. frequency for both positive inductor (solid red
curve), and negative inductor (dotted blue curve).
Figure 6.5: Negative inductance simulation (L = −1 nH) with a real part
of approximately 45 Ω (dashed blue), imaginary part equal to
L = −1 nH (solid red), and ideal imaginary part for L = −1 nH
(dotted green).
Then, high impedance active loads are placed
between the cross-coupled transistors and the power supply in order to avoid
short-circuiting the RF signal. Note that these active loads must present very
high impedances in order to reduce loading effects at the NIC output. Current
mirrors are then attached to the bottom of the cross-coupled transistors to provide
the bias current that sets the gm of the cross-coupled transistors.
The complete schematic of the proposed NIC circuit is illustrated in Figure 6.3,
which includes a self-biased current source, active loads, a current mirror and a
Figure 6.6: Microphotograph of the proposed negative inductor design in-
cluding GND, Power and RF pads (left), and measurement
setup (right).
Figure 6.7: Comparison of the measured and simulated (Spectre-RF)
impedance of the negative inductance circuit (L = −1nH).
general load of Z. In this design, the self-biased current source provides a current
of 1.7 mA, resulting in a transconductance of gm = 30 mS for the cross-coupled
transistors.
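Equation (6.3) can be turned around to size the on-chip resistor for a target bias current. The Python sketch below does this for the 1.7-mA current quoted above. The process constant, the aspect ratio and k are hypothetical values chosen for illustration; k is assumed here to be the size ratio between the two NMOS devices of the source, as in the usual textbook form of this circuit, since the text does not define it.

```python
import math

def self_biased_current(mu_cox, aspect_ratio, r_s, k):
    """Output current of the self-biased source, equation (6.3)."""
    return 2.0 / (mu_cox * aspect_ratio * r_s ** 2) * (1.0 - 1.0 / math.sqrt(k)) ** 2

def resistor_for_current(mu_cox, aspect_ratio, i_out, k):
    """Invert equation (6.3) for the resistor R_s that yields i_out (k > 1)."""
    return math.sqrt(2.0 / (mu_cox * aspect_ratio * i_out)) * (1.0 - 1.0 / math.sqrt(k))

MU_COX = 200e-6    # mu_n * C_ox in A/V^2 (hypothetical)
ASPECT = 20.0      # W/L (hypothetical)
K = 4.0            # device size ratio (hypothetical)
I_TARGET = 1.7e-3  # bias current in amperes (from the text)

r_s = resistor_for_current(MU_COX, ASPECT, I_TARGET, K)
i_check = self_biased_current(MU_COX, ASPECT, r_s, K)

print(f"R_s for 1.7 mA: {r_s:.1f} ohm")
print(f"current check:  {i_check * 1e3:.3f} mA")
```

Note that the supply voltage does not appear anywhere in the expression, which is the first-order reason this source is described as insensitive to supply fluctuations.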
6.3.2 Circuit Simulation
In this section, the implementation of a negative inductor is demonstrated in
detail. A negative inductor has the same reactance magnitude as
Figure 6.8: Measured and simulated results for the negative inductance.
a positive inductor, but with the opposite sign. Figure 6.4 demonstrates the be-
havior of the positive and negative reactance for an inductor. The configuration
shown in Figure 6.3 is used to obtain the negative inductor by replacing the
general load (Z) with an inductive load of L = 1nH. As expected, the NIC will
produce a negative load equivalent to L = −1nH at its output, as marked by RF+
and RF-. Figure 6.5 illustrates the simulated negative inductance obtained by
Spectre-RF [147]. The simulation also shows that the NIC consumes 5.1 mW of
power when it is connected to a 1.2 V power supply, and that the output noise
voltage is 0.75, 0.70, and 0.65 nV/√Hz at 0.1, 1, and 3 GHz, respectively, at 27 °C.
6.4 Fabrication and Measurement Result
The negative inductance design shown in Figure 6.3 has been fabricated in a
65-nm CMOS process on a 0.3 × 0.3 mm² die. Figure 6.6 shows the microphotograph
of the fabricated negative inductance design, along with its measurement
setup.
The prototype was measured using an Agilent E8364B Vector Network Ana-
lyzer (VNA), and a 1.2 V DC supply was used to power up the chip. The dif-
ferential nature of the NIC output port requires the use of two Ground-Signal-
Ground (GSG) probes. These probes are attached to the VNA ports with an
impedance of 50 Ω. Probe tips are then calibrated prior to the measurement by
performing the short-open-load calibration. Figure 6.7 compares the measured
and simulated (Spectre-RF) impedance of the negative inductance circuit oper-
ating from 0.1 GHz to 6 GHz. Furthermore, the equivalent inductance is ex-
tracted as a function of frequency, and a relatively constant negative inductance
is observed over the same range of frequency as shown in Figure 6.8.
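The inductance extraction step can be sketched as follows: the equivalent series inductance at each frequency is the imaginary part of the impedance divided by the angular frequency. The Python sketch below applies this to a synthetic series R-L impedance with L = −1 nH and a 45-Ω real part (the approximate value from the simulation), and contrasts it with an ordinary capacitor, whose reactance is also negative but does not map to a constant inductance. The capacitor value and the frequency points are hypothetical, and the data are synthetic rather than measured.

```python
import math

def equivalent_inductance(z, freq_hz):
    """Series-equivalent inductance L = Im(Z) / (2*pi*f), in henries."""
    return z.imag / (2.0 * math.pi * freq_hz)

def series_rl(r_ohm, l_henry, freq_hz):
    """Impedance of a resistor in series with a (possibly negative) inductor."""
    return complex(r_ohm, 2.0 * math.pi * freq_hz * l_henry)

def capacitor(c_farad, freq_hz):
    """Impedance of an ideal capacitor."""
    return complex(0.0, -1.0 / (2.0 * math.pi * freq_hz * c_farad))

R_SERIES = 45.0  # ohms, approximate simulated real part
L_NEG = -1e-9    # target negative inductance in henries
C_REF = 10e-12   # hypothetical capacitor used only for contrast

freqs = [0.1e9, 1e9, 3e9, 6e9]
neg_l = [equivalent_inductance(series_rl(R_SERIES, L_NEG, f), f) for f in freqs]
cap_l = [equivalent_inductance(capacitor(C_REF, f), f) for f in freqs]

for f, ln, lc in zip(freqs, neg_l, cap_l):
    print(f"{f / 1e9:4.1f} GHz: negative inductor {ln * 1e9:+.3f} nH, "
          f"capacitor {lc * 1e9:+.3f} nH")
```

A flat extracted inductance across frequency is therefore the signature that distinguishes a true negative inductor from a capacitive response, whose extracted value scales as 1/f².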
6.5 Discussion
The simulated and measured results are in excellent agreement over a wide range
of frequencies, from 100 MHz to 6 GHz. At higher frequencies, a slight variation
is observed, which is primarily due to parasitic effects and possibly the test
setup. The Open Circuit Stable (OCS) condition implies that the NIC is stable when
the load connected to the NIC is large compared with the NIC impedance. In
other words, the NIC output impedance should be relatively small compared
to the load in order for the design to remain stable. The resistance associated
with the negative L, which is inversely proportional to the transconductance
of the cross-coupled transistors as formulated in (6.2), varies from 40 Ω to 50 Ω
for the proposed design. In order to reduce this resistance, one can increase
gm by injecting more current into the cross-coupled transistors, at the cost of
increased power consumption. Therefore, there is a trade-off between power
consumption, chip size, and loss.
CHAPTER 7
CONCLUSION AND FUTURE DIRECTIONS
In this thesis, we have introduced new approaches to the design of high speed
electronic circuits. We have shown how accurate characterization of electronic
devices can improve the performance of high speed circuits. The proposed
approaches blend multiple disciplines, e.g., analog circuit design, device
physics, electromagnetics and applied mathematics.
For the design of high frequency and high power electronic circuits, we propose
a systematic design approach which employs a new nonlinear device model.
The proposed nonlinear power modeling facilitates the design of blocks and
systems where the harmonic content of the signal is important. Harmonic
enhancement techniques can be performed more precisely in different classes of
power amplifiers to improve the efficiency. Harmonic suppression in fundamental
oscillators based on the Manley-Rowe principle enhances the fundamental
power efficiency. In other words, harmonic power suppression based on a
nonlinear model leads to the design of more efficient local oscillator circuits. In
summary, by capturing the harmonic behavior of nonlinear circuit blocks, the
performance of transceivers, communication links and imaging circuits in the
mm-wave and sub-mm-wave range is enhanced.
The proposed nonlinear design approach can be applied to emerging tech-
nologies. III-V compound semiconductor devices, e.g., GaN and InP, exhibit
a larger bandgap compared to the existing Silicon technologies. The intrinsic
superior electron mobility in N-type devices provides a high cut-off frequency
(fmax) and makes them a good candidate for Terahertz applications. Additionally,
equipped with a higher breakdown voltage, these devices are well-suited
as the future platform for high power Terahertz implementations. Therefore,
devices such as high-electron-mobility transistors (HEMTs) have been developed
to pave the path towards high frequency applications. GaN transistors,
with a Johnson’s figure of merit 27.5 times larger than that of Silicon, enhance the
power levels and operation frequencies of high speed electronic circuits
significantly. Despite the mentioned advantages of the compound devices, their
operation beyond fmax requires a nonlinear model to capture the harmonic dynamics.
By taking advantage of the proposed technique to characterize the nonlinear
profile of the transistor, future Terahertz electronic circuits can reach beyond
1 THz of operation frequency, and the unwanted gap between optics and
electronics, the Terahertz Gap, will further shrink.
Terahertz waves offer superior resolution compared to lower-frequency
waves and, in contrast to X-ray imagers, leave the material non-ionized.
Due to the plethora of water absorption bands from 300 GHz to 3 THz,
sub-mm-wave imaging is suitable for biomedical applications. In particular, “cancer
tumor monitoring”, “in-vivo tooth cavity detection” and “spectroscopy” can
be implemented, should there be sufficient generated power in this frequency
range. The proposed nonlinear Terahertz design techniques pave the path
towards the implementation of high power integrated circuits with sufficient
output power. It is noteworthy that the antenna elements in this frequency range
can easily fit on the silicon chip, so a fully integrated design is possible. Last
but not least, these integrated circuits can be packaged to be utilized in-vivo for
sensitive monitoring applications and simplify the tracking of different
biomedical mechanisms and organs.
In the domain of high speed computation, we have proposed another benchmark
application of spin devices which exploits their intrinsic features. We selected
“non-Boolean computation” since the non-volatile profile of all-spin devices
simplifies the implementation of majority gates and comparators. Called the
“Smart Detector Cell”, the system performs pattern recognition on matrices of
binary images within a few nanoseconds and returns the number of mismatches
in a non-Boolean scheme. This circuit outperforms its CMOS counterparts in terms
of DC power consumption, computational complexity, processing time and
effective area. To the best of our knowledge, this system is the first all-spin circuit
to perform binary image recognition.
With the introduction of “approximate computers”, certain tasks such as monitoring
and tracking can be performed 100 times faster than on conventional processors.
Search engines, likewise, may not find the exact match of an input query;
however, they can find many acceptable answers using non-exact techniques.
All-spin circuits such as adders, multipliers and non-volatile storage units are
the key building blocks of an approximate computer, which is in principle
capable of implementing most machine learning algorithms. Operating at lower
voltages, these units are ultra-low power and exhibit a high integration density.
Our proposed all-spin pattern recognition circuit also performs a non-Boolean
approximate computation and can be utilized in the design of future artificial
intelligence units such as neural networks. In addition, the non-volatility of
spin devices facilitates the implementation of exact and approximate storage
units, which are the key to training complex models such as deep neural
networks. Ultimately, these systems are expected to mimic the operation of the human
brain for pattern recognition applications and pave the path towards the more
efficient implementation of “Human-Machine Interface”.
BIBLIOGRAPHY
[1] H. R. Aghasi, A. Cathelin, and E. Afshari, “A 0.92 THz SiGe Power Radiator
Based on a Nonlinear Harmonic Generation Theory,” IEEE Journal of Solid-State
Circuits, 2016.
[2] H.R. Aghasi, E. Afshari, “Design of Broadband mm-Wave and Terahertz Fre-
quency Doublers” Invited Paper to ESSCIRC 2016.
[3] H. R. Aghasi, R.M. Iraei, A. Naeemi, E. Afshari “Smart Detector Cell: A
Scalable All-Spin Circuit for Low Power Non-Boolean Pattern Recognition,”
IEEE Transactions on Nanotechnology, vol. 15, no. 3, pp. 356-366, 2016.
[4] S. Saadat, H. R. Aghasi, E. Afshari, H. Mosallaei, “Low Power Negative
Inductance Integrated Circuit for GHz applications,” IEEE Microwave and
Wireless Components Letters, vol. 25, no. 2, pp. 118-120.
[5] R. Han, C. Jiang, A. Mostajeran, M. Emadi, H. R. Aghasi, H. Sherry, A. Cathelin,
E. Afshari, “A SiGe Terahertz Heterodyne Imaging Transmitter with 3.3-mW
Radiated Power and Fully-Integrated Phase-Locked Loop,” IEEE Journal
of Solid-State Circuits, vol. 50, no. 12, pp. 2935-2947, Dec. 2015.
[6] R. R. Schaller, “Moore’s law: past, present and future,” in IEEE Spectrum,
vol. 34, no. 6, pp. 52-59, Jun 1997.
[7] R. Han and E. Afshari, “A High-Power Broadband Passive Terahertz Fre-
quency Doubler in CMOS,” in IEEE Transactions on Microwave Theory and
Techniques, vol. 61, no. 3, pp. 1150-1160, March 2013.
[8] R. Han, C. Jiang, A. Mostajeran, M. Emadi, H. Aghasi, H. Sherry, A. Cathe-
lin, E. Afshari,“A 320GHz phase-locked transmitter with 3.3mW radiated
power and 22.5dBm EIRP for heterodyne THz imaging systems,” in Solid-
State Circuits Conference - (ISSCC), 2015 IEEE International , vol., no., pp.1-3,
22-26 Feb. 2015
[9] P. H. Siegel, “Terahertz technology in biology and medicine,” IEEE Trans.
Microw. Theory Tech., vol. 52, no. 10, pp. 2438-2447, Oct. 2004.
[10] M. Tonouchi, “Cutting-edge terahertz technology,” Nature Photonics, vol. 1,
pp. 97-105, Feb. 2007.
[11] T. W. Crowe, W. L. Bishop, D. W. Porterfield, J. L. Hesler, and R. M. Weikle,
“Opening the terahertz window with integrated diode circuits,” IEEE J.
Solid-State Circuits, vol. 40, no. 10, pp. 2104-2110, Oct. 2005.
[12] K. B. Cooper, R. J. Dengler, G. Chattopadhyay, E. Schlecht, J. Gill, A.
Skalare, I. Mehdi, and P. H. Siegel, “A high-resolution imaging radar at 580
GHz,” IEEE Microw. Wireless Compon. Lett., vol. 18, no. 1, pp. 64-66, Jan. 2008.
[13] F. C. De Lucia, D. T. Petkie, and H. O. Everitt, “A double resonance ap-
proach to submillimeter/terahertz remote sensing at atmospheric pressure,”
IEEE J. Quantum Electron., vol. 45, no. 2, pp. 163-170, Feb. 2009.
[14] K. B. Cooper, R. J. Dengler, N. Llombart, T. Bryllert, G. Chattopadhyay, E.
Schlecht, J. Gill, C. Lee, A. Skalare, I. Mehdi, and P. H. Siegel, “Penetrating
3-D imaging at 4-and 25-m range using a submillimeter-wave radar,” IEEE
Trans. Microw. Theory Tech., vol. 56,
[15] L. A. Samoska, “An overview of solid-state integrated circuit amplifiers in
the submillimeter-wave and THz regime,” IEEE Trans. Terahertz Sci. Technol.,
vol. 1, no. 1, pp. 9-24, 2011.
[16] S. P. Voinigescu et al., “A study of SiGe HBT signal sources in the 220-330-
GHz range,” IEEE J. Solid-State Circuits, vol. 48, no. 9, pp. 2011-2021, Sep.
2013.
[17] A. Tessmann, “220-GHz metamorphic HEMT amplifier MMICs for high-
resolution imaging applications,” IEEE J. Solid-State Circuits, vol. 40, no. 10,
pp. 2070-2076, Oct. 2005.
[18] V. Radisic, D. Sawdai, D. Scott, W. Deal, L. Dang, D. Li, J. Chen, A. Fung, L.
Samoska, T. Gaier, and R. Lai, “Demonstration of a 311-GHz fundamental
oscillator using InP HBT technology,” IEEE Trans. Microw. Theory Tech., vol.
55, no. 11, pp. 2329-2335, Nov. 2007.
[19] M. Seo et al., “InP HBT IC Technology for Terahertz Frequencies: Funda-
mental Oscillators Up to 0.57 THz,” in IEEE Journal of Solid-State Circuits,
vol. 46, no. 10, pp. 2203-2214, Oct. 2011
[20] A. Maestrini et al., “A Frequency-Multiplied Source With More Than 1 mW
of Power Across the 840-900 GHz Band,” in IEEE Transactions on Microwave
Theory and Techniques, vol. 58, no. 7, pp. 1925-1932, July 2010
[21] A. Maestrini et al., “Local oscillator chain for 1.55 to 1.75 THz with 100-µW
peak power,” in IEEE Microwave and Wireless Components Letters, vol. 15, no.
12, pp. 871-873, Dec. 2005
[22] A. Maestrini et al., “A 1.7-1.9 THz local oscillator source,” in IEEE Mi-
crowave and Wireless Components Letters, vol. 14, no. 6, pp. 253-255, June 2004
[23] Z. Lao, J. Jensen, K. Guinn, and M. Sokolich, “80-GHz differential VCO in
InP SHBTs,” IEEE Microw. Wireless Compon. Lett., vol. 14, no. 9, pp. 407-409,
Sep. 2004.
[24] B. S. Williams, “Terahertz quantum-cascade lasers,” Nature Photonics, vol.
1, pp. 517-525, 2007.
[25] E. Afshari, H. Bhat, Li Xiaofeng, A. Hajimiri, “Electrical funnel: A broad-
band signal combining method,” in Solid-State Circuits Conference, 2006.
ISSCC 2006. Digest of Technical Papers. IEEE International , vol., no., pp.751-
760, 6-9 Feb. 2006
[26] N. Saito et al., “A fully integrated 60-GHz CMOS transceiver chipset based
on WiGig/IEEE 802.11ad with built-in self calibration for mobile usage,”
IEEE J. Solid-State Circuits, vol. 48, no. 12, pp. 3146-3159, Dec. 2013.
[27] S. Shahramian, Y. Baeyens, and Y.-K. Chen, “A 70-100 GHz direct-conversion
transmitter and receiver phased array chipset demonstrating 10 Gb/s
wireless link,” IEEE J. Solid-State Circuits, vol. 48, no. 5, pp. 1113-1125, May
2013.
[28] N. Pohl, T. Klein, K. Aufinger, and H. Rein, “A low-power wideband trans-
mitter front-end chip for 80 GHz FMCW radar systems with integrated 23
GHz downconverter VCO,” IEEE J. Solid-State Circuits, vol. 47, no. 9, pp.
1974-1980, Sep. 2012.
[29] J. Chen et al., “A digitally modulated mm-Wave cartesian beamforming
transmitter with quadrature spatial combining,” in IEEE ISSCC Dig. Tech.
Papers, 2013, pp. 232-233.
[30] E. Cohen, M. Ruberto, M. Cohen, O. Degani, S. Ravid, and D. Ritter, “A
CMOS bidirectional 32-element phased-array transceiver at 60 GHz with
LTCC antenna,” in IEEE Radio Frequency Integrated Circuits Symp. Dig., 2012,
pp. 439-442.
[31] L. Kong, D. Seo, and E. Alon, “A 50 mW-TX 65 mW-RX 60 GHz 4-element
phased-array transceiver with integrated antennas in 65 nm CMOS,” in
IEEE ISSCC Dig. Tech. Papers, 2013, pp. 234-236.
[32] A. Moroni, R. Genesi, and D. Manstretta, “Analysis and design of a 54 GHz
distributed Hybrid wave oscillator array with quadrature outputs,” IEEE J.
Solid-State Circuits, vol. 49, no. 5, pp. 1158-1172, May 2014.
[33] W. Shin, B. Ku, O. Inac, Y.-C. Ou, and G. M. Rebeiz, “A 108-114 GHz
4×4 wafer-scale phased array transmitter with high-efficiency on-chip antennas,”
IEEE J. Solid-State Circuits, vol. 48, no. 9, pp. 2041-2055, Sep. 2013.
[34] A. Valdes-Garcia et al., “A fully integrated 16-element phased-array
transmitter in SiGe BiCMOS for 60-GHz communications,” IEEE J. Solid-State
Circuits, vol. 45, no. 12, pp. 2757-2773, Dec. 2010.
[35] K. Kawasaki et al., “A millimeter-wave intra-connect solution,” IEEE J.
Solid-State Circuits, vol. 45, no. 12, pp. 2655-2666, Dec. 2010.
[36] B. P. Ginsburg, S. M. Ramaswamy, V. Rentala, E. Seok, S. Sankaran, and
B. Haroun, “A 160 GHz pulsed radar transceiver in 65 nm CMOS,” IEEE J.
Solid-State Circuits, vol. 49, no. 4, pp. 984-995, Apr. 2014.
[37] V. Giannini et al., “A 79 GHz phase-modulated 4 GHz-BW CW radar TX in
28 nm CMOS,” in IEEE ISSCC Dig. Tech. Papers, 2014, pp. 250-251.
[38] J. Lee, Y.-A. Li, M.-H. Hung, and S.-J. Huang, “A fully-integrated 77-GHz
FMCW radar transceiver in 65-nm CMOS technology,” IEEE J. Solid-State
Circuits, vol. 45, no. 12, pp. 2746-2756, Dec. 2010.
[39] A. Arbabian, S. Callender, S. Kang, M. Rangwala, and A. M. Niknejad, “A
94 GHz mm-wave-to-baseband pulsed-radar transceiver with applications
in imaging and gesture recognition,” IEEE J. Solid-State Circuits, vol. 48, no.
4, pp. 1055-1071, Apr. 2013.
[40] P. Chen, P. Peng, C. Kao, Y. Chen, and J. Lee, “A 94 GHz 3D-image radar
engine with 4TX/4RX beamforming scan technique in 65 nm CMOS,” in
IEEE ISSCC Dig. Tech. Papers, 2013, pp. 146-148.
[41] F. C. Ii, L. Gilreath, S. Pan, Z. Wang, F. Capolino, and P. Heydari, “Design
and analysis of a W-band 9-element imaging array receiver using spatial-
overlapping super-pixels in silicon,” IEEE J. Solid-State Circuits, vol. 49, no.
6, pp. 1317-1332, Jun. 2014.
[42] Q. J. Gu, Z. Xu, H.-Y. Jian, X. Xu, M. F. Chang, W. Liu, and H. Fetterman,
“Generating terahertz signals in 65nm CMOS with negative-resistance res-
onator boosting and selective harmonic suppression,” in IEEE Symp. VLSI
Circuits Dig., Jun. 2010, pp. 109-110.
[43] B. Razavi, “A 300-GHz fundamental oscillator in 65-nm CMOS technology,”
in IEEE Symp. VLSI Circuits Dig., Jun. 2010, pp. 113-114.
[44] D. Huang, T. R. LaRocca, M. C. F. Chang, L. Samoska, A. Fung, R. L.
Campbell, and M. Andrews, “Terahertz CMOS frequency generator using
linear superposition technique,” IEEE J. Solid-State Circuits, vol. 43, no. 12,
pp. 2730-2738, Dec. 2008.
[45] M. Adnan and E. Afshari, “A 247-to-263.5 GHz VCO with 2.6 mW peak
output power and 1.14% DC-to-RF efficiency in 65 nm bulk CMOS,” in IEEE
ISSCC Dig. Tech. Papers, 2014, pp. 262-263.
[46] J. Grzyb, Y. Zhao, and U. R. Pfeiffer, “A 288-GHz lens-integrated balanced
triple-push source in a 65-nm CMOS technology,” IEEE J. Solid-State Circuits,
vol. 48, no. 7, pp. 1751-1761, Jul. 2013.
[47] P. Y. Chiang, O. Momeni, P. Heydari, “A 200-GHz Inductively Tuned VCO
With -7-dBm Output Power in 130nm SiGe BiCMOS,” IEEE Transactions on
Microwave Theory and Techniques, Oct. 2013.
[48] E. Seok and K. K. O, “A 410 GHz CMOS push-push oscillator with an on-chip
patch antenna,” in 2008 IEEE Int. Solid-State Circuits Conf. Dig. Tech.
Papers, Feb. 2008, pp. 472-473.
[49] B. Razavi, “A millimeter-wave circuit technique,” IEEE J. Solid-State Cir-
cuits, vol. 43, no. 9, pp. 2090-2098, Sep. 2008.
[50] B. Heydari, M. Bohsali, E. Adabi, and A. Niknejad, “Millimeter-wave de-
vices and circuit blocks up to 104 GHz in 90 nm CMOS,” IEEE J. Solid-State
Circuits, vol. 42, no. 12, pp. 2893-2903, Dec. 2007.
[51] E. Laskin, P. Chevalier, A. Chantre, B. Sautreuil, and S. Voinigescu, “165-
GHz transceiver in SiGe technology,” IEEE J. Solid-State Circuits, vol. 43, no.
5, pp. 1087-1100, May 2008.
[52] S. Nicolson, K. Yau, P. Chevalier, A. Chantre, B. Sautreuil, K. Tang, and S.
Voinigescu, “Design and scaling of W-band SiGe BiCMOS VCOs,” IEEE J.
Solid-State Circuits, vol. 42, no. 9, pp. 1821-1833, Sep. 2007.
[53] R. Wanner, R. Lachner, and G. Olbrich, “A monolithically integrated 190-GHz
SiGe push-push oscillator,” IEEE Microw. Wireless Compon. Lett., vol. 15,
no. 12, pp. 862-864, Dec. 2005.
[54] S. Trotta, H. Knapp, K. Aufinger, T. Meister, J. Bock, W. Simburger, and
A. Scholtz, “A fundamental VCO with integrated output buffer beyond 120
GHz in SiGe bipolar technology,” in IEEE MTT-S Int. Microwave Symp. Dig.,
Jun. 2007, pp. 645-648.
[55] E. Öjefors, B. Heinemann, and U. R. Pfeiffer, “Active 220- and 325-GHz
Frequency Multiplier Chains in an SiGe HBT Technology,” IEEE Transactions on
Microwave Theory and Techniques, May 2011.
[56] F. Golcuk, O. D. Gurbuz, G. M. Rebeiz, “A 0.39-0.44 THz 2x4 Amplifier-
Quadrupler Array with Peak EIRP of 3-4 dBm,” IEEE Transactions on
Microwave Theory and Techniques, Dec. 2013.
[57] O. Momeni and E. Afshari, “A 220-to-275GHz traveling-wave frequency
doubler with 6.6dBm Power at 244GHz in 65nm CMOS,” ISSCC Dig.
Tech. Papers, pp. 286-287, Feb. 2011.
[58] R. Han and E. Afshari, “A CMOS high-power broadband 260-GHz radi-
ator array for spectroscopy,” IEEE J. Solid-State Circuits, vol. 48, no. 12, pp.
3090-3104, Dec. 2013.
[59] Y. M. Tousi, O. Momeni, and E. Afshari, “A novel CMOS high-power tera-
hertz VCO based on coupled oscillators: Theory and implementation,” IEEE
J. Solid-State Circuits, vol. 47, no. 12, pp. 3032-3042, Dec. 2012.
[60] P.-Y. Chiang, Z. Wang, O. Momeni, and P. Heydari, “A 300 GHz frequency
synthesizer with 7.9% locking range in 90 nm SiGe BiCMOS,” in IEEE ISSCC
Dig. Tech. Papers, 2014, pp. 260-261.
[61] T. Chi, J. Luo, S. Hu, and H. Wang, “A Multi-Phase Sub-Harmonic
Injection Locking Technique for Bandwidth Extension in Silicon-Based THz
Signal Generation,” IEEE J. Solid-State Circuits, vol. 50, no. 8,
pp. 1861-1873, Aug. 2015.
[62] O. Momeni, E. Afshari, “High Power Terahertz and Millimeter-Wave Os-
cillator Design: A Systematic Approach,” IEEE Journal of Solid-State Circuits,
March 2011.
[63] R. Han, E. Afshari, “A High-Power Broadband Passive Terahertz Frequency
Doubler in CMOS,” IEEE Transactions on Microwave Theory and Techniques,
vol. 61, no. 3, pp. 1150-1160, March 2013.
[64] K. Sengupta and A. Hajimiri, “A 0.28 THz power-generation and beam-steering
array in CMOS based on distributed active radiators,” IEEE J. Solid-State
Circuits, vol. 47, no. 12, pp. 3013-3031, Dec. 2012.
[65] R. Han et al., “Active terahertz imaging using Schottky diodes in CMOS:
Array and 860-GHz pixel,” IEEE J. Solid-State Circuits, vol. 48, no. 10, pp.
2296-2308, Oct. 2013.
[66] R. Al Hadi et al., “A 1 k-pixel video camera for 0.7-1.1 terahertz imaging
applications in 65nm CMOS,” IEEE J. Solid-State Circuits, vol. 47, no. 12, pp.
2999-3012, Dec. 2012.
[67] M. Uzunkol, O. D. Gurbuz, F. Golcuk, and G. M. Rebeiz, “A 0.32 THz SiGe
4x4 imaging array using high-efficiency on-chip antennas,” IEEE J.
Solid-State Circuits, vol. 48, no. 9, pp. 2056-2066, Sep. 2013.
[68] L. Gilreath, V. Jain, and P. Heydari, “Design and analysis of a W-band SiGe
direct-detection-based passive imaging receiver,” IEEE J. Solid-State Circuits,
vol. 46, no. 10, pp. 2240-2252, Oct. 2011.
[69] U. R. Pfeiffer et al., “A 0.53 THz reconfigurable source array with up to
1 mW radiated power for terahertz imaging applications in 0.13 µm SiGe
BiCMOS,” in IEEE ISSCC Dig. Tech. Papers, 2014, pp. 256-258.
[70] Y. Tousi and E. Afshari, “A scalable THz 2D phased array with +17 dBm
of EIRP at 338 GHz in 65 nm bulk CMOS,” in IEEE Int. Solid-State Circuits
Conf. (ISSCC) Dig. Tech. Papers, 2014, pp. 258-259.
[71] P. Chevalier, T. Lacave, E. Canderie, A. Pottrain, Y. Carminati, J. Rosa, F.
Pourchon, N. Derrier, G. Avenier, A. Montagne, A. Balteanu, E. Dacquay, I.
Sarkas, D. Celi, D. Gloria, C. Gaquiere, S. P. Voinigescu, A. Chantre, “Scal-
ing of SiGe BiCMOS technologies for applications above 100 GHz,” IEEE
Compound Semiconductor Integrated Circuit Symp., La Jolla, CA, Oct. 2012.
[72] M. Schetzen, “The Volterra and Wiener theories of nonlinear systems.”
(1980).
[73] S. Mason, “Power gain in feedback amplifier,” IRE Trans. Circuit Theory,
vol. 1, no. 2, pp. 20-25, Jun. 1954.
[74] B. Razavi, “Design of analog CMOS integrated circuits,” McGraw-Hill,
New York, 2001.
[75] J. Verspecht and D. E. Root, “Polyharmonic distortion modeling,”
IEEE Microwave Magazine, vol. 7, no. 3, pp. 44-57, 2006.
[76] J. Verspecht and P. Van Esch, “Accurately characterizing hard nonlinear be-
havior of microwave components with the nonlinear network measurement
system: Introducing nonlinear scattering functions,” in Proc. 5th Int. Work-
shop Integrated Nonlinear Microwave Millimeterwave Circuits, Germany,
Oct. 1998, pp. 17-26.
[77] P. Chevalier et al., “Scaling of SiGe BiCMOS technologies for applications
above 100 GHz,” in Proc. IEEE Compound Semiconductor Integrated Circuit
Symp., La Jolla, CA, USA, Oct. 2012.
[78] D. M. Pozar, “Microwave engineering,” John Wiley and Sons, 2009.
[79] R. C. Li, “RF circuit design,” vol. 90, John Wiley and Sons, 2008.
[80] J. F. Johansson and N. D. Whyborn, “The diagonal horn as a sub-millimeter
wave antenna,” IEEE Transactions on Microwave Theory and Techniques.
[81] E. Öjefors, J. Grzyb, Y. Zhao, B. Heinemann, B. Tillack and U. R. Pfeiffer, “A
820GHz SiGe chipset for terahertz active imaging applications,” 2011 IEEE
International Solid-State Circuits Conference, San Francisco, CA, 2011,
pp. 224-226.
[82] Z. Ahmad and K. K. O, “0.65-0.73 THz quintupler with an on-chip
antenna in 65-nm CMOS,” 2015 Symposium on VLSI Circuits (VLSI Circuits),
Kyoto, 2015, pp. C310-C311.
[83] Z. Ahmad, M. Lee and K. K. O, “1.4THz, -13dBm-EIRP frequency mul-
tiplier chain using symmetric- and asymmetric-CV varactors in 65nm
CMOS,” 2016 IEEE International Solid-State Circuits Conference (ISSCC), San
Francisco, CA, 2016, pp. 350-351
[84] S. Wold, “Pattern recognition by means of disjoint principal components
models,” Pattern Recognition, vol. 8, no. 3, pp. 127-139, 1976.
[85] J. Flusser and T. Suk, “Pattern recognition by affine moment invariants,”
Pattern Recognition, vol. 26, no. 1, pp. 167-174, 1993.
[86] D. H. Ballard, “Generalizing the Hough transform to detect arbitrary
shapes,” Pattern Recognition, vol. 13, no. 2, pp. 111-122, 1981.
[87] J. S. Seo, B. Brezzo, Y. Liu, B. D. Parker, S. K. Esser, R. K. Montoye ... and D. J.
Friedman “A 45nm CMOS neuromorphic chip with a scalable architecture
for learning in networks of spiking neurons”. In Custom Integrated Circuits
Conference (CICC), IEEE (pp. 1-4). IEEE, 2011.
[88] M. Koyanagi, Y. Nakagawa, K. W. Lee, T. Nakamura, Y. Yamada, K. Ina-
mura, ... and H. Kurino “Neuromorphic vision chip fabricated using three-
dimensional integration technology” In Solid-State Circuits Conference, 2001.
Digest of Technical Papers. IEEE (pp. 270-271),2001.
[89] R. W. Hölzel and K. Krischer, “Pattern recognition with simple oscillating
circuits,” New Journal of Physics, vol. 13, no. 7, p. 073031, 2011.
[90] S. P. Levitan, Y. Fang, D. H. Dash, T. Shibata, D. E. Nikonov, and G. I. Bouri-
anoff “Non-Boolean associative architectures based on nano-oscillators”. In
Cellular Nanoscale Networks and Their Applications (CNNA), 13th International
Workshop on (pp. 1-6). IEEE, 2012
[91] M. Ishikawa, K. Ogawa, T. Komuro and I. Ishii “A CMOS vision chip with
SIMD processing element array for 1 ms image processing”. In Solid-State
Circuits Conference, 1999. Digest of Technical Papers, (pp. 206-207). IEEE,
1999.
[92] J. Liu and M. Brooke, “Fully parallel on-chip learning hardware neural net-
work for real-time control” in Proc. IEEE Int. Symp. Circuits Syst., vol. 5, pp.
371-374, 1999.
[93] S. Matsunaga, J. Hayakawa, S. Ikeda, K. Miura, H. Hasegawa, T. Endoh,
H. Ohno, and T. Hanyu, “Fabrication of a nonvolatile full adder based on
logic-in-memory architecture using magnetic tunnel junctions,” Applied
Physics Express, vol. 1, no. 9, p. 091301, 2008.
[94] D. E. Nikonov and I. A. Young, “Overview of beyond-CMOS devices
and a uniform methodology for their benchmarking,” Proceedings of the
IEEE, vol. 101, no. 12, pp. 2498-2533, 2013.
[95] M. Sharad, C. Augustine, G. Panagopoulos, and K. Roy, “Spin-based neuron
model with domain-wall magnets as synapse,” IEEE Transactions on
Nanotechnology, vol. 11, no. 4, pp. 843-853.
[96] V. Q. Diep, B. Sutton, B. Behin-Aein, and S. Datta, “Spin switches for
compact implementation of neuron and synapse,” Applied Physics Letters,
vol. 104, no. 22, p. 222405, 2014.
[97] S. Datta, S. Salahuddin and B. Behin-Aein, “Non-volatile spin switch for
Boolean and non-Boolean logic,” Applied Physics Letters, vol. 101, no. 25, p. 252411, 2012.
[98] C. Augustine et al., “Ultra-low power nanomagnet-based computing:
a system-level perspective,” IEEE Transactions on Nanotechnology, vol. 10,
no. 4, pp. 778-788, 2011.
[99] R. M. Iraei, P. Bonhomme, N. Kani, S. Manipatruni, D. E. Nikonov, I.
A. Young and A. Naeemi, “Impact of dimensional scaling and size effects
on beyond CMOS All-Spin Logic interconnects”. In Interconnect Technology
Conference/Advanced Metallization Conference (IITC/AMC), 2014 IEEE Interna-
tional (pp. 353-356). IEEE, 2014.
[100] B. Behin-Aein, D. Datta, S. Salahuddin, and S. Datta, “Proposal for an
all-spin logic device with built-in memory,” Nature Nanotechnol., vol. 5,
no. 4, pp. 266-270, 2010.
[101] P. Bonhomme, S. Manipatruni, R. M. Iraei, S. Rakheja, S.-C. Chang, D. E.
Nikonov, I. A. Young, and A. Naeemi, “Circuit Simulation of Magnetization
Dynamics and Spin Transport,” IEEE Transactions on Electron Devices, vol. 61,
no. 5, pp. 1553-1560, May 2014.
[102] C. Augustine, G. Panagopoulos, B. Behin-Aein, S. Srinivasan, A. Sarkar,
and K. Roy, “Low-power functionality enhanced computation architecture
using spin-based devices,” in Proc. IEEE/ACM Int. Symp. Nanoscale Arch., pp.
129-136, 2011.
[103] D. J. Robinson “An introduction to abstract algebra”. Walter de Gruyter,
2003.
[104] R. Mousavi Iraei, S. Manipatruni, D. E. Nikonov, I. A. Young and Azad
Naeemi “Device and Interconnect Co-Optimization for All Spin Logic” To
be submitted to IEEE Transactions on Electron Devices.
[105] C. Augustine, G. Panagopoulos, B. Behin-Aein, S. Srinivasan, A. Sarkar
and K. Roy, “Low-power functionality enhanced computation architecture
using spin-based devices,” in 2011 IEEE/ACM International Symposium on
Nanoscale Architectures (NANOARCH), Jun. 2011, pp. 129-136.
[106] T. Yang, K. Kimura and Y. Otani, “Giant spin-accumulation signal and
pure spin-current-induced reversible magnetization switching,” Nature Phys.,
vol. 4, pp. 851-854, 2008.
[107] A. Brataas, G. E. Bauer, and P. J. Kelly, “Non-collinear magnetoelectronics,”
Phys. Rep., vol. 427, no. 4, pp. 157-255, 2006.
[108] S.-F. Lee, W. P. Holody, Q. Yang, P. Holody, R. Loloee, P. Schroeder, et al.,
“Two-channel analysis of CPP-MR data for Ag/Co and AgSn/Co multilayers,”
J. Magn. Magn. Mater., vol. 118, nos. 1-2, pp. L1-L5, 1993.
[109] A. Brataas, G. E. Bauer and P. J. Kelly, “Non-collinear magnetoelectronics,”
Physics Reports, vol. 427, no. 4, pp. 157-255, 2006.
[110] W. Brown, “Thermal fluctuation of fine ferromagnetic particles,” IEEE
Trans. Magn., vol. 15, no. 5, pp. 1196-1208, Sep. 1979.
[111] J. Z. Sun, “Spin-current interaction with a monodomain magnetic body:
A model study,” Phys. Rev. B, vol. 62, pp. 570-578, Jul. 2000.
[112] M. Beleggia, M. De Graef, and Y. T. Millev, “The equivalent ellipsoid of a
magnetized body,” J. Phys. D, Appl. Phys., vol. 39, no. 5, p. 891, 2006.
[113] S. Rakheja, S.-C. Chang, and A. Naeemi, “Impact of dimensional scaling
and size effects on spin transport in copper and aluminum interconnects,”
IEEE Trans. Electron Devices, vol. 60, no. 11, pp. 3913-3919, Nov. 2013.
[114] J. Wang, H. Meng and J. P. Wang, “Programmable spintronics logic device
based on a magnetic tunnel junction element,” Journal of Applied Physics,
vol. 97, no. 10, p. 10D509, 2005.
[115] Z. Taylor, R. S. Singh, D. B. Bennett, P. Tewari, C. P. Kealey, N. Bajwa, M.
O. Culjat, A. Stojaddinovic, H. Lee, J. Hubschman, E. R. Brown and W. S.
Grundfest, “THz medical imaging: in vivo hydration sensing,” IEEE Trans.
THz Science and Tech., vol. 1, no. 1, pp. 201-219, Sep. 2011.
[116] B. S. Williams, “Terahertz quantum-cascade lasers,” Nature Photonics, vol.
1, pp. 517-525, 2007.
[117] N. T. Yardimci, S-H. Yang, C. W. Berry, and M. Jarrahi, “High-power ter-
ahertz generation using large-area plasmonic photoconductive emitters,”
IEEE Trans. THz Science and Tech., vol. 5, no. 2, pp. 223-229, Mar. 2015.
[118] J. H. Booske, R. J. Dobbs, C. D. Joye, C. L. Kory, G. R. Neil, G-S. Park, J.
Park, and R. J. Temkin, “Vacuum electronics high power terahertz sources,”
IEEE Trans. THz Science and Tech., vol. 1, no. 1, pp. 54-75, Sep. 2011.
[119] R. Han, Y. Zhang, Y. Kim, D. Kim, H. Shichijo, E. Afshari, and K. K. O,
“Active terahertz imaging using Schottky diodes in CMOS: Array and 860-
GHz pixel”, IEEE J. Solid-State Circuits, vol. 48, no. 10, pp. 2296-2308, Oct.
2013.
[120] R. Hadi, H. Sherry, J. Grzyb, Y. Zhao, W. Forster, H. M. Keller, A. Cathelin,
A. Kaiser, and U. Pfeiffer, “A 1k-pixel video camera chip for 0.7-1.1 terahertz
imaging applications in 65-nm CMOS,” IEEE J. Solid-State Circuits, vol. 47,
no. 12, pp. 2999-3012, Dec. 2012.
[121] E. Seok, C. Cao, D. Shim, D. J. Arenas, D. B. Tanner, C.-M. Hung, and
K. K. O, “410-GHz CMOS push-push oscillator with a patch antenna,” Intl.
Solid-State Circuits Conf., San Francisco, CA, Feb. 2008.
[122] J. Grzyb, Y. Zhao, and U. Pfeiffer, “A 288-GHz lens-integrated balanced
triple-push source in a 65-nm CMOS technology,” IEEE J. Solid-State Circuits,
vol. 48, no. 7, Jul. 2013.
[123] Y. Tousi and E. Afshari, “A scalable THz 2D phased array with +17dBm
of EIRP at 338GHz in 65nm bulk CMOS”, Intl. Solid-State Circuits Conf., San
Francisco, CA, Feb. 2014.
[124] R. Han and E. Afshari, “A CMOS high-power broadband 260-GHz radi-
ator array for spectroscopy,” IEEE J. Solid-State Circuits, vol. 48, no. 12, Dec.
2013.
[125] P. Chevalier, T. Lacave, E. Canderie, A. Pottrain, Y. Carminati, J. Rosa, F.
Pourchon, N. Derrier, G. Avenier, A. Montagne, A. Balteanu, E. Dacquay, I.
Sarkas, D. Celi, D. Gloria, C. Gaquiere, S. P. Voinigescu, A. Chantre, “Scal-
ing of SiGe BiCMOS technologies for applications above 100 GHz,” IEEE
Compound Semiconductor Integrated Circuit Symp., La Jolla, CA, Oct. 2012.
[126] K. Schmalz, R. Wang, J. Borngraber, W. Debski, W. Winkler, and C. Meliani,
“245 GHz SiGe transmitter with integrated antenna and external PLL,” IEEE
Intl. Microwave Symp., Seattle, WA, Jun. 2013.
[127] C. A. Brau, Modern Problems in Classical Electrodynamics, New York, NY:
Oxford University Press, 2004.
[128] D. M. Pozar, Microwave Engineering, Third Edition, New York: John Wiley
& Sons, Inc., 2005.
[129] K. Sengupta and A. Hajimiri, “A 0.28 THz power-generation and beam-steering
array in CMOS based on distributed active radiators,” IEEE J. Solid-State
Circuits, vol. 47, no. 12, pp. 3013-3031, Dec. 2012.
[130] C. Mao, C. Nallani, S. Sankaran, E. Seok, and K. K. O, “125-GHz diode
frequency doubler in 0.13-µm CMOS”, IEEE J. Solid-State Circuits, vol. 44,
no. 5, pp. 1531-1538, May 2009.
[131] D. Shim, C. Mao, S. Sankaran, and K. K. O, “150 GHz complementary
anti-parallel diode frequency tripler in 130 nm CMOS”, IEEE Microwave and
Wireless Components Letters, vol. 21, no. 1, pp. 43-45, Jan. 2011.
[132] G. P. Gauthier, S. Raman, and G. M. Rebeiz, “A 90-100 GHz double-folded
slot antenna”, IEEE Trans. Antennas and Propagation, vol. 47, no. 6,
pp. 1120-1122, Jun. 1999.
[133] High Frequency Structure Simulator (HFSS) User Guide, ANSYS Inc. [On-
line]. Available: http://www.ansys.com/.
[134] F. Friederich, W. von Spiegel, M. Bauer, F. Meng, M. D. Thomson, S. Bop-
pel, A. Lisauskas, B. Hils, V. Krozer, A. Keil, T. Loffler, R. Henneberger, A. K.
Huhn, G. Spickermann, P. H. Bolivar, and H. G. Roskos, “THz active imag-
ing systems with real-time capabilities”, IEEE Trans. THz Sci. and Tech., vol. 1,
no. 1, pp. 183-200, Sep. 2011.
[135] J. W. Mink, “Quasi-optical power combining of solid-state millimeter-
wave sources,” IEEE Trans. Microw. Theory Tech., vol. 34, no. 2, pp. 273-279,
Feb. 1986.
[136] D. F. Filipovic, S. S. Gearhart, G. M. Rebeiz, “Double-slot antennas on
extended hemispherical and elliptical silicon dielectric lenses,” IEEE Trans.
Microw. Theory Tech., vol. 41, no. 10, pp. 1738-1749, Oct. 1993.
[137] H. T. Friis, “A note on a simple transmission formula,” Proc. IRE, vol. 34, no. 5, pp. 254-256, May 1946.
[138] P. Y. Chiang, Z. Wang, O. Momeni, and P. Heydari, “A 300GHz Frequency
Synthesizer with 7.9% Locking Range in 90nm SiGe BiCMOS”, Intl. Solid-
State Circuits Conf., San Francisco, CA, Feb. 2014.
[139] J. G. Linvill, “Transistor Negative-Impedance Converters,” Proc. IRE, Vol.
41, No. 6, pp. 725-729, Jun 1953.
[140] S. E. Sussman-Fort, R. Rudish, “Non-Foster Impedance Matching of
Electrically-Small Antennas,” IEEE Trans. Antennas Propag., Vol. 57, No.
8, pp. 2230-2241, 2009.
[141] S. Saadat, M. Adnan, H. Mosallaei, and E. Afshari, “Composite Metama-
terial and Metasurface Integrated With Non-Foster Active Circuit Elements:
A Bandwidth-Enhancement Investigation,” IEEE Trans. Antennas Propag.,
Vol. 61, No. 3, pp. 1210-1218, 2013.
[142] N. Zhu, R. W. Ziolkowski, “Active Metamaterial-Inspired Broad-Bandwidth,
Efficient, Electrically Small Antennas,” IEEE Antennas Wireless
Propag. Lett., Vol. 10, pp. 1582-1585, 2011.
[143] C. R. White, J. W. May, J. S. Colburn, “A Variable Negative-Inductance
Integrated Circuit at UHF Frequencies,” IEEE Microw. Wireless Compon.
Lett. Vol. 22, No. 1, pp. 35-37, 2012.
[144] S. D. Stearns, “Non-Foster Circuits and Stability Theory,” in Proc. IEEE
Ant. Prop. Int. Symp., 2011, pp. 1942-1945.
[145] D. J. Gregoire, C. R. White, and J. S. Colburn, “Wideband Artificial
Magnetic Conductors Loaded With Non-Foster Negative Inductors,” IEEE
Antennas Wireless Propag. Lett., Vol. 10, No. 1, pp. 1586-1589, 2011.
[146] B. Razavi, Design of Analog CMOS Integrated Circuits, McGraw-Hill Ed-
ucation, 2000.
[147] Spectre RF Simulator, Ver. 12.1.1., San Jose, CA, 2013.
[148] T. P. Weldon, K. Miehle, and R. S. Adams, “A Wideband Microwave
Double-Negative Metamaterial With Non-Foster Loading,” in Proc. IEEE
Southeastcon, 15-18 March 2012.
