Design and demonstration of integrated micro-electro-mechanical relay circuits for VLSI applications by Fariborzi, Hossein
Design and Demonstration of Integrated Micro-Electro-
Mechanical Relay Circuits for VLSI Applications
by
Hossein Fariborzi
B.S. in Electrical Engineering, Sharif University of Technology, Iran,
2006
M.S. in Electrical Engineering, University of Malaya, Malaysia, 2008
Submitted to the Department of Electrical Engineering and Computer
Science
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2013
© Massachusetts Institute of Technology 2013. All rights reserved.
Signature of Author:
Department of Electrical Engineering and Computer Science
May 22, 2013
Certified by:
Vladimir M. Stojanović
Emanuel E. Landsman Associate Professor of Electrical Engineering
Thesis Supervisor
Accepted by:
Leslie A. Kolodziejski
Chair, Department Committee on Graduate Students

Design and Demonstration of Integrated Micro-Electro-Mechanical
Relay Circuits for VLSI Applications
by
Hossein Fariborzi
Submitted to the Department of Electrical Engineering and Computer
Science on May 22, 2013, in partial fulfillment of the requirements for
the degree of Doctor of Philosophy
Abstract
Complementary-Metal-Oxide-Semiconductor (CMOS) feature size scaling has resulted in
significant improvements in the performance and energy efficiency of integrated circuits in
the past four decades. However, in the last decade and for technology nodes below 90 nm, the
scaling of threshold and supply voltages has slowed as a result of subthreshold leakage, and
power density has increased with each new technology node. This has forced a move toward
multi-core architectures, but the energy efficiency benefits of parallelism are limited by the
subthreshold leakage and the minimum energy point for a given function. Avoiding this
roadblock requires an alternative device with more ideal switching characteristics. One
promising class of such devices is the electrostatically actuated micro-electro-mechanical
(MEM) relay, which offers zero leakage current and abrupt turn-on behavior. Although a
MEM relay is inherently slower than a CMOS transistor due to the mechanical movement, we
have developed circuit design methodologies to mitigate this problem at the system level.
This thesis explores such design optimization techniques and investigates the viability of
MEM relays as an alternative switching technology for very-large scale integration (VLSI)
applications.
In the first part of this thesis, the feasibility of MEM relays for power management
applications is discussed. Due to their negligibly low leakage, in certain applications, chips
utilizing power gates built with MEM relays can achieve lower total energy than those built
with CMOS transistors. A simple comparative analysis is presented and provides design
guidelines and energy savings estimates as a function of technology parameters, and
quantifies the further benefits of scaled relay designs. We also demonstrate a relay chip
successfully power-gating a CMOS chip, and show a relay-based pulse generator suitable for
self-timed operation.
Going beyond power-gating applications, this work also describes circuit techniques and
trade-offs for logic design with MEM relays, focusing on multipliers, which are commonly
regarded as the most complex arithmetic units in a digital system. These techniques leverage
the large disparity between mechanical and electrical time-constants of a relay, partitioning
the logic into large, complex gates to minimize the effect of mechanical delay and improve
circuit performance. At the component design level, innovations in compressor unit design
minimize the required number of relays for each block and facilitate component cascading
with no delay penalty. We analyze the area/energy/delay trade-offs vs. CMOS designs, for
typical bit-widths, and show that scaled relays offer 10-20x lower energy per operation for
moderate throughputs (<10-100 MOPS). In addition to this analysis, we demonstrate the
functionality of some of the most complex MEM relay circuits reported to date.
Finally, considering the importance of signal generation and transmission in VLSI systems,
this thesis presents MEM relay-based I/O units, focusing on the design and demonstration of
digital-to-analog converters (DACs). It also explores the concept of
faster-than-mechanical-delay signal transmission.
Thesis Supervisor: Vladimir M. Stojanović
Title: Emanuel E. Landsman Associate Professor of Electrical Engineering
Acknowledgments
One of the most exciting and fruitful chapters of my life is coming to an end with the
submission of this thesis. The memories of my five years at MIT will always stay with me and
make me smile, whenever and wherever I recall them. Here, I would like to thank the people
who created those beautiful memories.
First and foremost, I would like to express my deepest gratitude to my advisor, Professor
Vladimir Stojanović, for teaching me the true meaning of MIT's motto, Mens et Manus.
Luck has followed me since the day I stopped by your office on the second floor of building 38
and asked you to hire me. Thank you for being such an exceptionally helpful and supportive
teacher and friend during all these years. I simply cannot thank you enough!
I would also like to thank Professor Anantha Chandrakasan and Professor Dana Weinstein for
serving on my thesis committee. Professor Chandrakasan has been a great inspiration for me
and shared with me his wealth of knowledge on different aspects of circuit design. His
insightful advice during my research qualifying examination, and in every meeting we have had
afterwards, has helped me find the right path to educational success. Professor Weinstein has
helped me get a better grasp of device physics and opportunities and challenges in the MEMS
world, especially in the field of device scaling.
I would also like to thank all of our collaborators on the MEM relay project, especially
Professor Tsu-Jae King Liu, Professor Elad Alon and Professor Dejan Marković, who have
been instrumental in the success of this project. Chengcheng and Matt, your sense of humor
and great taste in music made those sleepless and frantic testing nights a memorable
experience. Louis, thank you for helping me get access to DCL and patiently setting up the
test station for me during the summer of 2012. As a member of the MEM relay project, I
would also like to acknowledge the support of our sponsors.
I would also like to thank all of my friends in the Integrated Systems Group (ISG)
for creating such a fun work environment. I'm especially grateful to Fred Chen, who has been
a great friend and mentor from day one. The best thing I have learned from you is not to say
"It doesn't work," ever. I would like to thank the people behind the scenes as well, especially Nora
Luongo, the administrative assistant of our group.
I would like to take the opportunity to thank my Master's thesis supervisor at University of
Malaya, Malaysia, for helping me set foot in the world of academic research, and for literally
"forcing" me to apply to MIT graduate school a few hours before the application deadline.
I'm grateful for the friends and family I have, in the US and in my birthplace, Iran. Amirreza
and Morteza, you have no idea how much our heated and never-ending debates about almost
everything, such as Esteghlal vs. Perspolis, or iPhone vs. Galaxy, have helped relieve
the inevitable stress of school. I'm grateful to both of you for that. Sima, Faezeh,
Amir, Mahsa, Marzieh and Hamed: your friendship has been a great blessing for me. I would
like to thank my family, especially my parents, whose enduring love and sacrifices have
helped me become the person I am today. I am forever indebted to you. Last but definitely not
least, I'd like to express my gratitude to my wife, Mahjubeh, for her unconditional love and
support during our seven years of marriage. You have been there for me every step of the way
and we both know that none of this would have been possible without you. This thesis is
dedicated to you.
Contents
1 Introduction
1.1 Crisis in the Chip Industry
1.2 The Relay Reborn
1.3 Thesis Contributions
1.3.1 Energy Management in Integrated Circuits with MEM Relays
1.3.2 Complex Digital Circuit and System Design with MEM Relays
1.3.3 Comparison of Energy/Performance/Area trade-offs for MEM relay and CMOS designs
1.4 Thesis Overview
2 MEM Relay Background
2.1 MEM Relay Structure and Fabrication
2.2 MEM Relay Operation
2.3 MEM Relay Modeling
2.3.1 Pull-in and Pull-out Voltages
2.3.2 Mechanical Delay
2.3.3 Electrical Modeling of the MEM Relay
2.4 Verification of the Feasibility of MEM Relay VLSI Systems
2.4.1 Inverter and Voltage Transfer Curve
2.4.2 Basic Logic
2.4.3 Memory and Latching Units
2.4.4 Clocking Units
2.5 The Evolution of MEM Relays
2.6 Summary
3 MEM Relay Power Gating
3.1 Power Gating Background
3.2 Energy-efficiency Analysis and Comparisons
3.3 Experimental Demonstration
3.4 Summary
4 Complex Logic Design with MEM Relays
4.1 Basic MEM Relay Logic Design
4.1.1 MEM Relay as a Logic Element
4.1.2 MEM Relay Logic Design Paradigm
4.2 MEM Relay Multiplier Design
4.2.1 Microarchitecture of MEM Relay Multiplier
4.2.2 MEM Relay Multiplier Components Design
4.2.3 Multiplier Design Trade-offs
4.3 Energy/delay Estimates of MEM Relay vs. CMOS Multipliers
4.4 Experimental Demonstration and Practical Issues
4.4.1 Reliability, Contact Oxidation and Oxide Breaking Procedure
4.4.2 The (7:3) Compressor Test Results
4.5 Summary
5 MEM Relay I/O Design
5.1 Digital to Analog Converter
5.2 Sub-mechanical Delay Signal Transmission
5.3 Summary
6 Device and Circuit Design Challenges and Future Prospects
6.1 Device Engineering Challenges
6.1.1 Limits of MEM Relay Scaling
6.1.2 Contact Engineering and Device Reliability
6.2 Future Prospect of MEM Relay Circuit Design
6.3 Summary
7 Conclusions
Appendix A: Microarchitecture of 16-bit Multipliers
Bibliography
List of Figures
Figure 1-1: The trends of threshold and supply voltage scaling [7] (a) and the predicted
increase in power density in Intel microprocessors [8] (b)
Figure 1-2: Minimum energy per operation problem and the limits of parallelism for
CMOS. Normalized dynamic, leakage and total energy/op versus supply voltage for a
given CMOS functional unit (a). Parallelism enables lower energy per operation at the
same throughput or more throughput for the same energy/op (b)
Figure 1-3: Zuse Z3, the first general-purpose, electrically powered digital computer [15]
Figure 2-1: 3D and cross-section views of a 4-terminal folded flexure (crab leg) MEM relay
Figure 2-2: Major MEM relay fabrication steps, using the processes that were originally
developed to build CMOS chips [16]
Figure 2-4: A lumped-parameter electro-mechanical model of a MEM relay as a
dynamic parallel-plate capacitor and mass-spring-damper system
Figure 2-5: Electrical model of the folded spring (crab leg) MEM relay [9]
Figure 2-6: Simplified electrical model of the "pulled-in" MEM relay
Figure 2-7: Die photo of the first MEM relay test chip (CLICKR1) [11]
Figure 2-8: An ideal relay inverter (a), the modified inverter with grounded body
terminals (b), an XOR gate (c), and the VTC of the modified inverter
Figure 2-9: MEM relay based carry generation (PGK) circuit and measured waveform
demonstrating operation in propagate, generate and kill modes.
Figure 2-10: Design and demonstration of MEM relay latch; The optimal MEM relay
latch with one mechanical delay (a), the reconfigured pseudo-NMOS style latch used
for functionality measurement (b), measurement results, illustrating the functionality of
the latch in both transparent and opaque states.
Figure 2-11: The schematic (a) and operation (b) of a 10-bit MEM relay-based NAND-
style DRAM column. Simultaneous read and write of a single bit in the last memory cell.
Figure 2-12: The schematic and measured waveforms of a pseudo N-relay oscillator.
These waveforms were used to estimate load capacitance (55 pF), device on-resistance
(1 kΩ), and "weak overdrive" mechanical delay (34 µs)
Figure 2-13: MEM relay device evolution: layout and SEM
Figure 2-14: Evaluating the ambipolar operation of the 1st generation MEM relay. (a)
Bias configuration for both experiments, (b) I-V curves show that both Vpi and Vpo are
much higher for the case of SiGe gate and tungsten body. Data courtesy of Rhesa
Nathanael [22]
Figure 2-15: Parasitic electrostatic force between the gate and the source/drain results
in the dependence of the pull-in and pull-out voltages on the drain voltage
Figure 2-16: The new layout for the 4T MEM relay enables improved ambipolar
actuation (a) and Vd-independent pull-in and pull-out voltages.
Measured data is indicated with an asterisk
Figure 3-1: MOSFET (a) and MEM relay (b) power gates. α and αp are the average and
peak activity factors of the CMOS logic, f is the operation frequency and CL the CMOS
logic load capacitance. β is the ratio of supply capacitance (at VDD) and CMOS logic
load capacitance, while γ and ε are the ratios of drain and gate capacitances of
MOSFET (CM) and MEM relay (CR) headers, respectively [33].
Figure 3-2: Energy ratio vs. Toff, for various Ton, for designs power gated with (a) 90nm
MOSFETs and 2nd generation 4T MEM relays, and (b) 90nm MOSFETs and CLICKR3
scaled MEM Relays. Here, α and αp are 0.1 and 0.5, respectively, f = 1 GHz, β = 5,
γ = 1 and ε = 0.1 [33].
Figure 3-3: Required Toff for specific energy gains of relay power gating over
MOSFET power gating for different CMOS processes (Ton = 10 µs) [33]
Figure 3-4: Energy gain vs. Toff/Ton for designs with MEM relay vs. MOSFET gating,
for MOSFET power gates implemented in different CMOS processes. (a) V10 = VEXT,M,
and (b) V10 = VNOM
Figure 3-5: Maximum logic current density as a function of MOSFET power-gate
technology node and MEM relay device pitch. For MOSFET power gates, cases with
power-gate device area overhead of 1% and 10% of the design area are shown. MEM
relay power gates are assumed to be fabricated in the backend metallization layers, and
thus incur no active area penalty [33]
Figure 3-6: Measured waveforms for MEM relay-gated CMOS chip [4]: MEM
Relay gate voltage (top), supply voltage of the CMOS chip (middle), and
synchronization signal from the CMOS chip (bottom).
Figure 3-7: Test setup for self-driven and external power gate pulse generation for
the relay gating switch illustrated in Figure 3-1
Figure 3-8: Measured waveforms for MEM relay timer-gated CMOS chip [35]
Figure 3-9: Packaged MEM relay die photo and relay device micrograph
Figure 4-1: MEM Relay as a logic element
Figure 4-2: Comparison of CMOS (a) and MEM relay (b) logic styles
Figure 4-3: Delay of cascaded 2-input AND gates in series, implemented using relays
in both static and pass gate logic styles
Figure 4-4: Mechanical vs. stack electrical delay for second generation 4T, third
generation 6T and predictive 90nm MEM relays
Figure 4-5: A 6-bit Multiplication operation
Figure 4-6: 6-bit relay multiplier built with (a) half- and full-adders, and (b) (N:3)
compressors
Figure 4-7: Partial products matrix generation: (a) MEM relay AND network, (b) Booth
encoding algorithm table, (c) MEM relay Booth encoded partial products generation
Figure 4-8: Implementation of MEM relay (a) half adder and (b) full adder [8]
Figure 4-9: Implementation of (a) a two input XOR/XNOR gate and (b) the Y0 sub-
circuit (c) optimized Y0 sub-circuit with signals on body terminals
Figure 4-10: Steps toward implementation of the Y1 sub-circuit of a complementary
(7:3) compressor: (a) propagation paths for Y1 of the (5:3) compressor, (b) Y1 of the
(5:3) compressor, (c) Y1 of the single output (7:3) compressor and (d) Y1 of the full
complementary (7:3) compressor
Figure 4-11: Steps toward implementation of the Y2 sub-circuit of a (7:3) compressor:
(a) propagation paths for Y2 of the (5:3) compressor, (b) Y2 of the (5:3) compressor, (c)
Y2 of the (7:3) compressor, (d) Y2 of the (7:3) CMOS compressor [42].
Figure 4-12: (a) half-adder and (b) full-adder built with the 6T MEM relay
Figure 4-13: The (7:3) compressor built with the 6T MEM relay
Figure 4-14: MEM relay multiplier design trade-offs
Figure 4-15: Energy/throughput comparison of CMOS and predictive (4T and 6T)
MEM relay 16-bit multipliers
Figure 4-16: Comparison of energy/throughput figures for various MEM relay
multiplier implementations. The circuits are parallelized to achieve the same area.
Figure 4-17: Native oxide breaking procedure for a single MEM relay
Figure 4-18: An example of oxide breaking procedure for Y2 of the 6T relay (7:3)
compressor. (a) Observing the output to find an active path (base), and (b) selecting a
stack with only two unshared relays with the base. By increasing the source-drain
voltage of this ...
Figure 4-19: Die photo and experimental results of the 2nd generation 4T (7:3)
compressor
Figure 4-20: Die photo and experimental results of the scaled 6T (7:3) compressor
Figure 5-1: Sample MEM relay DAC topology, schematic and equivalent circuit
Figure 5-2: MEM Relay 2-bit, thermometer coded DAC design and test results
Figure 5-3: Proposed serializer/deserializer diagrams and working principles
Figure 6-1: Average FA (with standard deviation indicated) versus the contact dimple
area [13, 56]
Figure 6-2: Measured and predicted endurance of MEM relay vs. 1/VDD
Figure 6-3: RON evolution with the number of hot-switching cycles [59] (Vds = 4 V,
f = 5 kHz)
Figure 6-4: MEM Relay design infrastructure
Figure A-1: Microarchitecture of a 16-bit multiplier built with Half- and Full-adders
Figure A-2: Microarchitecture of a 16-bit multiplier built with (N:3) compressors
List of Tables
Table 2-1: Scaled and current MEM relay device model parameters
Table 3-1: CMOS technology parameters from standard and predictive models [33-34]
Table 4-1: Logic function and truth table of an (N:3) compressor
Table 4-2: Different implementations of the 16-bit MEM relay multiplier
Table 6-1: Constant Field Scaling for MEM relays
Acronyms
ADC analog-to-digital converter
ALU arithmetic logic unit
DAC digital-to-analog converter
DRC design rule checker
IC integrated circuit
LSB least significant bit
LVS layout versus schematic
MEM micro-electro-mechanical
MEMS micro-electro-mechanical system
MOSFET metal-oxide-semiconductor field-effect-transistor
MSB most significant bit
MUX multiplexer
NMOS N-type metal-oxide-semiconductor field-effect-transistor
PMOS P-type metal-oxide-semiconductor field-effect-transistor
poly-SiGe Polycrystalline silicon-germanium
RAM random access memory
RF radio frequency
Ru Ruthenium
VTC voltage transfer characteristic
VLSI very-large-scale integration
CHAPTER 1
Introduction
1.1 Crisis in the Chip Industry
In the past few decades, CMOS technology scaling has resulted in drastic improvements in
energy efficiency, cost per function and performance of integrated circuits. But after enjoying
years of progress, the improvement in energy efficiency has begun to slow down. The
problem is that transistors are not perfect switching devices and leak current, even when they
are supposedly off. Especially in the sub-100-nm regime, the increasing proportion of
sub-threshold leakage current, attributed to its dependence on the non-scalable thermal voltage,
has slowed the scaling of the threshold voltage. In order to retain performance, scaling of the
supply voltage has slowed as well (Figure 1-1a). As the energy per operation has scaled
slowly with each new technology node, power density has significantly increased, and this
increase has overshadowed the performance benefits of transistor scaling (Figure 1-1b).
Figure 1-2a illustrates the energy per operation versus supply voltage for a given CMOS
functional unit and technology. This block, like every other CMOS functional unit, has a well-
defined minimum energy point [1] at which an incremental decrease in leakage current is
exactly offset by a corresponding loss of performance and vice versa. For applications
requiring throughputs below the throughput of this point, the threshold voltage can be scaled
to maintain the minimum energy. For higher throughputs, the only available solution is
moving towards multi-core processing and parallel circuit design. The idea behind
parallelism, illustrated in Figure 1-2b, is to allow each functional block to operate at a lower
energy point, i.e. lower supply voltage (VDD), and parallelize multiple functional units to
regain performance. The limit to which parallelism can improve energy efficiency is
determined by CMOS subthreshold leakage and the minimum energy point for a given
function. As a result, there is a lower bound to the energy consumption per operation,
regardless of how slow the circuits are allowed to run. Correspondingly, for high performance
applications, there exists an upper bound on throughput for a given power budget, regardless
of the amount of parallelism.
Figure 1-1: The trends of threshold and supply voltage scaling [7] (a) and the predicted
increase in power density in Intel microprocessors [8] (b)
Figure 1-2: Minimum energy per operation problem and the limits of parallelism for CMOS.
Normalized dynamic, leakage and total energy/op versus supply voltage for a given CMOS
functional unit (a). Parallelism enables lower energy per operation at the same throughput or
more throughput for the same energy/op (b).
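The minimum energy point discussed in this section can be illustrated with a short numerical sketch. The model below is a common first-order sub-threshold approximation and is not taken from this thesis: dynamic energy is taken as C·VDD², and leakage energy as off-current times VDD times a gate delay that grows exponentially as VDD drops. The constants `N_PHI_T` and `LEAK_WEIGHT`, the normalization to C = 1, and the 0.1 V lower search bound are all hypothetical values chosen only for illustration.

```python
import math

# Hypothetical normalized parameters (assumptions, not values from the thesis).
N_PHI_T = 0.039       # sub-threshold slope factor times thermal voltage, in volts
LEAK_WEIGHT = 1000.0  # lumps logic depth, activity factor and leaking device width


def energy_per_op(vdd, leak_weight=LEAK_WEIGHT):
    """Return (dynamic, leakage, total) normalized energy per operation."""
    dynamic = vdd ** 2  # C * VDD^2 with C normalized to 1
    # Leakage energy = I_off * VDD * delay, with delay ~ C * VDD / I_on.
    # Assuming I_on / I_off = exp(VDD / (n * phi_t)), this becomes:
    leakage = leak_weight * vdd ** 2 * math.exp(-vdd / N_PHI_T)
    return dynamic, leakage, dynamic + leakage


def minimum_energy_point(leak_weight=LEAK_WEIGHT, v_lo=0.10, v_hi=1.00, steps=901):
    """Grid-search the supply voltage that minimizes total energy per operation."""
    grid = [v_lo + i * (v_hi - v_lo) / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda v: energy_per_op(v, leak_weight)[2])


if __name__ == "__main__":
    v_leaky = minimum_energy_point()
    v_ideal = minimum_energy_point(leak_weight=0.0)
    print(f"leaky switch: minimum-energy VDD is about {v_leaky:.3f} V")
    print(f"zero-leakage switch: minimum sits at the search bound, {v_ideal:.3f} V")
```

Under these assumptions the leaky switch shows an interior minimum: lowering VDD further raises total energy because leakage is integrated over an exponentially longer delay. Setting `leak_weight` to zero, as a stand-in for an idealized zero-leakage switch, removes that interior minimum, so energy keeps falling until the lower bound of the search range.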
1.2 The Relay Reborn
A switch with significantly improved leakage characteristics (i.e., steeper sub-threshold slope
and lower off-state current) may achieve significant improvements in energy efficiency over
CMOS. Many researchers have therefore been exploring new switching device concepts to
achieve sub-threshold slopes steeper than the limit set by kBT/q in field-effect or bipolar-
junction transistors [2-5]. However, many of these devices achieve sharp sub-threshold slope
over only a limited range of supply voltage, leading to relatively poor on-to-off current ratios
and/or very low on-state current at low supply voltages. Also, any CMOS-like technology will
have a lower limit on energy per operation due to non-zero IOFF. One important observation in
this context is that leakage can be eliminated altogether with a switch that simply lacks a
physical pathway for electricity to flow through when it is off. One of the earliest examples of
such a switch, the electro-mechanical relay, was invented in the 19th century by Joseph Henry. In
1941, about 2000 such relays were used to build Germany's Zuse Z3, the very first
electrically powered fully automatic computing machine (Figure 1-3) [6]. The relay era in the
world of digital computers was cut short by the invention of more efficient and faster vacuum
tube and transistor computers [15], but decades of advances in CMOS micro-fabrication
techniques have paved the way for the rebirth of relays in a smaller, faster and more
energy-efficient form.
Figure 1-3: Zuse Z3, the first general-purpose, electrically powered digital computer [15]
Micro-electro-mechanical (MEM) relays have recently been proposed as a promising
alternative class of devices for digital applications and a solution to overcome the minimum
energy limitations of CMOS circuits, because they offer virtually zero off-state current, on-to-off
current ratios of ten orders of magnitude and a very abrupt switching slope [9-13]. These
relays, fabricated at UC Berkeley and Sematech, are 4- or 6-terminal devices that are
electrostatically actuated, but functionally similar to CMOS transistor switches. Although the
mechanical movement makes relays slower than CMOS transistors, we have developed a
circuit design methodology tailored to MEM relays to narrow the performance gap at the circuit
and system level, rather than the device level.
1.3 Thesis Contributions
This thesis focuses on addressing the energy efficiency crisis of CMOS designs by
examining the potential of MEM relays as an attractive alternative switching device, mostly in
digital design applications, and developing a circuit and system design methodology for
implementation of complex circuits with MEM relays. It also explores the design space in
which MEM relay implementations can compete with or outperform mainstream CMOS
counterparts.
It should be noted that the development of MEM relay technology, its circuit design,
simulation and testing infrastructure is a multi-university collaboration of many students,
researchers and faculty members. The following are specific contributions of this thesis
towards the goals of this project:
1.3.1 Energy Management in Integrated Circuits with MEM Relays
This thesis investigates the use of MEM relays for energy management applications such as
power gating. Due to their inherently negligible leakage, in certain applications, chips
utilizing power gates built with MEM relays can achieve lower total energy than those built
with CMOS transistors. In addition to experimental demonstration of MEM relay power
gating, this thesis offers a comprehensive analysis that provides design guidelines and energy
savings estimates in various scenarios as a function of technology parameters, and quantifies
the further benefits of MEM relay scaling.
1.3.2 Complex Digital Circuit and System Design with MEM Relays
A major contribution of this thesis is developing circuit design techniques to mitigate the
relatively slow operation of MEM relays at the system level and promote relay-based complex
digital systems as an energy-efficient alternative to CMOS-based systems. To reach that goal,
we combine various optimization techniques, at the micro-architecture and building-block levels,
to achieve minimum cumulative mechanical delay and hence maximum performance. These
techniques are used to design multipliers, which are commonly regarded as the most complex
arithmetic blocks in a VLSI system. In addition to design and analysis, this thesis
experimentally demonstrates the functionality of the largest MEM relay based circuits
reported to date.
1.3.3 Comparison of Energy/Performance/Area trade-offs for MEM relay
and CMOS designs
For each circuit/system explored in this thesis, different trade-offs are investigated, MEM
relay and CMOS implementations are compared, and the design spaces in which
MEM relay implementations surpass CMOS designs are highlighted.
1.4 Thesis Overview
Chapter 2: MEM Relay background
This chapter describes the general structure and operation of MEM-relays. It also provides the
electromechanical model of MEM relays and the corresponding technology parameters that
are used for simulation and analysis in this thesis. It also discusses our early steps toward
verification of the feasibility of the relay technology and includes experimental demonstration
of several basic circuit components in our first test chip.
Chapter 3: MEM Relay Power Gating
As mentioned earlier, the zero off-state current of MEM-relays, resulting from the physical
separation of the channel from source and drain, makes them appealing for low power VLSI
design and also power gating applications, considering the fact that the finite ION/IOFF ratio of
MOSFET power gates limits their ability to reduce off-state leakage. However, some non-
idealities of MEM-relay power gates, such as increased switching energy and voltage droop
due to relatively large device dimensions and/or operating voltages and on-state resistance,
justify the need for an analysis to predict the conditions under which MEM relays can achieve
energy savings over MOSFETs for power gates. Chapter 3 is dedicated to this analysis, and
also experimental demonstration of MEM-relay pulse generation and power gating.
Chapter 4: Complex Logic Design with MEM Relays
In Chapter 4, principles of complex logic design with MEM-relays are described. Despite the
nearly ideal I-V characteristics, a relay built in a 90 nm process node can take anywhere from 1
to 10 ns to switch, which is 100 to 1000 times as long as a CMOS gate built in the same process
node [16]. Although the large mechanical delay of MEM relays suggests that MEM relay
circuits would have very poor performance, we have proposed circuit architectures that
significantly mitigate this by implementing logic as large, complex gates that minimize the
number of mechanical delays on the critical path. In this chapter, MEM-relay circuit design
principles and some area-optimization techniques at the micro-architecture level are utilized
to build multipliers which are commonly known as the most complex arithmetic blocks of a
VLSI system. MEM-relay specific circuit design techniques developed in this thesis
significantly improve the critical path (e.g. 5x for 32-bit multipliers), compared to CMOS-like
pass-gate logic.
To quantify the impact of device topology and dimensions on MEM circuits, a scalable model
of the MEM relay behavior described in [13] and [14] is used to drive a comparison between
the predicted capabilities of scaled MEM relays and CMOS in a modern technology node. A
comparison of optimized multipliers built with CMOS and MEM relays indicates that despite
their large mechanical delay, MEM relays can achieve energy-delay characteristics that are
nearly an order of magnitude better than CMOS over a wide range of frequencies, assuming
area can be traded for improved energy-efficiency. This further validates the notion that MEM
relays are viable candidates for complex energy-efficient digital integrated circuits.
Chapter 5: MEM Relay I/O Design
In addition to memory, power management, logic and processing units, a complete system
requires a way to interface to analog inputs and outputs, or I/O. Although the abrupt switching
behavior of the relays is a key advantage for energy efficiency in digital applications, it makes
the implementation of energy-efficient, high speed I/O, which consists of analog and mixed
signal blocks, challenging. In this chapter we focus on the implementation of a digital-to-analog
converter (DAC) for the purposes of signal transmission, and sub-mechanical delay transmission
circuits as a potential solution for improving the transmission speed beyond the switching
speed of MEM-relays.
Chapter 6: Device and Circuit Design Challenges and Future Prospects
Scaling of MEM relays is crucial for improvement of device density and speed and reduction
of operating voltage and energy levels. This chapter discusses some challenges in the way of
MEM relay scaling, reliability enhancement and also some practical considerations for
successful design and experimental demonstration of the relay-based circuits.
Chapter 7: Conclusions
The last chapter reviews the analytic and experimental results of MEM relay circuits explored
in this work and, through comparison with CMOS implementations, highlights applications
for which MEM relay implementations can significantly improve energy efficiency. This
chapter also discusses the prospect of MEM relays in the IC design world, the current
roadblocks and challenges of both device technology and circuit design, and the future work
in this field.
CHAPTER 2
MEM Relay Background
In general terms, a Micro-electro-mechanical (MEM) relay can be described as a miniature
switch in which the electrical connection is completed by an electrically actuated mechanical
displacement. The advances in microfabrication and MEMS technology provide plenty of
ways to go about designing such a switch [10-12, 17-21]. In 2001, in one of the early
efforts, researchers demonstrated a latching micro-magnetic relay based on preferential
magnetization of a permalloy cantilever in a permanent external magnetic field. They showed
that a cantilever made of an iron-nickel alloy could be drawn down to complete a circuit by
running current through a nearby coil [17]. This device was not compact, energy-efficient or
fast enough to be practical for VLSI applications. Some of the proposed micro
relays derive their mechanical motion from thermal expansion or from piezoelectricity
(expansion or contraction under the influence of an electric field).
These designs have shown great promise, but are not fully compatible with existing chip
production processes and cannot be scaled down to sizes comparable to those of today's
transistors [22]. In contrast, MEM relays utilizing electrostatic actuation are relatively easy to
manufacture using conventional planar processing techniques and materials, operate with
lower voltage levels, and are more scalable and hence more suitable for complex logic design
[23]. An electrostatic switch usually consists of a movable and a fixed electrode which form a
capacitor. When a voltage is applied to this capacitor, these electrodes gain opposite charges
and the resulting electrostatic force accelerates the movable electrode toward the fixed
electrode. The structure and general operation of such an electrostatically actuated micro-relay
for digital logic applications are similar to those of the well-known RF MEM switches [18-21].
However, the relay contact resistance requirements for these two applications are drastically
different. For RF relays, achieving ultra-low on-state resistance (RON < 1 Ω) is
the primary target, whereas for logic relays RON can be much higher because the switching delay of a
relay-based circuit is dominated by the mechanical pull-in time rather than the electrical RC
delay. For IC design applications, high endurance, fast switching speed, high energy
efficiency and high layout density are the main priorities, while relatively
high RON can be tolerated to improve the MEM relay's reliability and expected lifetime.
The rest of this chapter describes the structure, fabrication process, operation and model of the
MEM relays designed and fabricated at the University of California, Berkeley and used in this
thesis for circuit design, analysis and experimental demonstration.
2.1 MEM Relay Structure and Fabrication
Figure 2.1 shows the schematic 3-D and cross-sectional views of our original 4-terminal
folded spring (a.k.a "crab leg") relay. This relay resembles a CMOS transistor, as it contains a
Figure 2-1: 3D and A-A' cross-section views of a 4-terminal folded flexure (crab leg) MEM relay, in the OFF state (|Vgb| < Vpo) and the ON state (|Vgb| > Vpi)
source, a drain, and a channel through which current flows, as well as gate and body
electrodes that control the device's state. The main structural difference is that the gate and
channel are not in the same plane as the source/drain electrodes. In the MEM relay, springy
suspension flexures are used to anchor the gate structure. The metal channel is suspended
beneath the gate's center plate by an insulating gate dielectric. The key incentive for moving
from an earlier cantilever relay design [9] to this folded spring design was the robustness of
the latter to residual stress which may cause structural warping (out of plane deflection), and
was the main failure mechanism of the cantilever design [10].
Polycrystalline silicon-germanium (poly-Si0.4Ge0.6) is used for the gate's structural material
because it is robust against fracture and fatigue, offers low residual stress and most
importantly, it is thermally compatible with CMOS backend processes [24]. The metal used
for channel, source, drain and body electrodes is tungsten, mainly because of its resistance to
wear and plastic deformation (high hardness). In addition to physical durability, from a
process integration point of view, tungsten is an excellent candidate as it is highly resistant to
HF vapor and can be deposited using Physical Vapor Deposition (PVD), within thermal
budget constraints, with good uniformity and conformality [22]. For the insulating dielectric
(one layer between the electrodes and silicon substrate, and another between the channel and
the gate), Al2O3 has been chosen, as it offers high electrical breakdown voltage, low leakage
current, low stress and resistance to HF vapor [25]. Figure 2-2 illustrates the steps toward building
the original 4-terminal (4T) MEM relay.
2.2 MEM Relay Operation
The operating states of the MEM relay are shown in Figure 2.3. When the gate-to-body
voltage exceeds a certain threshold voltage called the pull-in voltage (Vpi), or in other words
there is enough electrostatic force to overcome the mechanical spring force of the relay, the
entire gate and channel structure will be "pulled-in", and the channel completes the contact
Figure 2-2: Major MEM relay fabrication steps, using the processes that were originally
developed to build CMOS chips [16].
(a) The source, drain, and body electrodes are built by depositing tungsten on top of an
electrically insulating layer.
(b) A layer of removable, sacrificial silicon dioxide is added to form the foundation of the
suspended part of the switch. The contact points of the channel and the source/drain
electrodes are etched away.
(c) A second layer of sacrificial material is deposited to create a contoured surface. The
tungsten channel is then added, its lowest points aligned with the source and drain electrodes.
(d) Another insulating layer is added to electrically isolate the channel from the gate. The gate
and the springlike structures are built from a thick poly-SiGe layer. Finally, an insulating layer
on top protects the gate and the springy coils as unneeded material is etched away.
(e) HF vapor clears away the sacrificial material, leaving the gate and channel structure
suspended over an air gap. To improve device reliability, a thin titanium oxide (TiO2) coating
is applied to the device to reduce current flow at the contacts and slow the formation of
tungsten native oxides.
between the source/drain electrodes to allow the current to flow. When the gate-to-body
voltage is below the release (pull-out) voltage (Vpo), the restoring force of the spring exceeds
the electrostatic force and the moving structure will be "pulled-out", causing the relay to enter
the off-state in which an air gap separates the channel from the tungsten source/drain
electrodes so that no current flows. Based on these basic operation principles, the relays are
expected to have steep subthreshold slope and virtually zero leakage in the off-state. Figure 2-3
shows both of these attractive aspects of MEM relays measured for the first generation of
4T relays: the leakage is beneath the noise floor of the measurement instrumentation and the
current drops by many orders of magnitude over a 1 mV change in input voltage. An important note
about the operation of the MEM relay is its ambipolar actuation, as the electrostatic force is
Figure 2-3: Illustration of the operation and ideal switching characteristics of MEM relays.
The gap distance (g) is plotted against |Vgb|, showing the ambipolarity of pull-in and pull-out
for relays [10, 26]. gd is the gap between the dimple contacts and the drain/source electrodes.
[Panels: measured Turn-On and Turn-Off current against VG (V), with VS = VB = 0 V and
VD = 1 V and the noise floor marked; gap g against |Vgb|, showing pull-in at (2/3)g0, pull-out,
the on-state gap g0 - gd, and the hysteresis window between Vpo and Vpi.]
created based on the absolute value of gate-to-body voltage. As a result, both PMOS- and
NMOS-equivalent relays (also called P-relays and N-relays) can be constructed with the same
relay by biasing the body terminal.
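The switching behavior described in this section can be captured by a very small behavioral model. The following Python sketch is only an illustration under stated assumptions: the class name `Relay`, the supply voltage and the pull-in/pull-out values are hypothetical and are not measured device parameters. It shows how the same relay acts as an N-relay or a P-relay depending on the body bias, and how the state is held inside the hysteresis window.

```python
from dataclasses import dataclass


@dataclass
class Relay:
    """Ideal relay with hysteresis: closes above v_pi, opens below v_po."""
    v_pi: float            # pull-in voltage magnitude (V), hypothetical value
    v_po: float            # pull-out voltage magnitude (V), v_po < v_pi
    closed: bool = False   # mechanical state: True when the channel is in contact

    def update(self, v_gate: float, v_body: float) -> bool:
        """Apply gate and body voltages and return the resulting state."""
        v_gb = abs(v_gate - v_body)  # ambipolar actuation: only |Vgb| matters
        if v_gb > self.v_pi:
            self.closed = True
        elif v_gb < self.v_po:
            self.closed = False
        # For v_po <= |Vgb| <= v_pi the previous state is held (hysteresis).
        return self.closed


VDD = 10.0                             # hypothetical supply voltage (V)
n_relay = Relay(v_pi=6.0, v_po=5.0)    # body tied to ground
p_relay = Relay(v_pi=6.0, v_po=5.0)    # body tied to VDD

for v_gate in (0.0, VDD):
    n_on = n_relay.update(v_gate, v_body=0.0)
    p_on = p_relay.update(v_gate, v_body=VDD)
    print(f"Vg = {v_gate:4.1f} V  N-relay closed: {n_on}  P-relay closed: {p_on}")
```

With the body at ground the relay closes for a high gate voltage, and with the body at the supply it closes for a low gate voltage, which is the complementary behavior exploited in the circuits of this thesis.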
2.3 MEM Relay Modeling
2.3.1 Pull-in and Pull-out Voltages
The MEM relay can be modeled as a parallel plate capacitor on a mass-spring-damper system
as shown in Figure 2-4. Based on Newton's second law of motion, the motion dynamics of
the gate structure can be described by the following second-order system equation:
$$ m\ddot{z} + b\dot{z} + kz = F_{elec}(z) \qquad (2.1) $$
where z is the displacement of the gate plane, k is the effective spring constant, b is the
damping coefficient and m is the effective inertial mass. Felec is the electrostatic force when a
voltage, Vgb, is applied between the two electrodes, and is a function of the displacement, z.
While this non-linear differential equation needs to be solved numerically, the structure can
be modeled as a simple parallel plate capacitor [27]. The electrostatic force is
$$ F_{elec}(z) = \frac{\varepsilon_0 A_{ov} V_{gb}^2}{2(g_0 - z)^2} = \frac{\varepsilon_0 A_{ov} V_{gb}^2}{2g^2} \qquad (2.2) $$
where ε0 is the permittivity of air, g is the varying gap thickness, g0 is the free-suspension
(as-fabricated) gap thickness and Aov is the effective actuation area which is the overlap of the
moving and fixed electrodes. The spring restoring force acts as an opposing force to Felec:
$$ F_{spring}(z) = kz = k(g_0 - g) \qquad (2.3) $$
where k can be analytically expressed as follows [13,27]:
Figure 2-4: A lumped-parameter electro-mechanical model of a MEM relay as a dynamic
parallel-plate capacitor and mass-spring-damper system.
$$ k = \frac{2 E W_f H^3}{L_f^3} \qquad (2.4) $$
where, as shown in Figure 2.1, Lf and Wf denote the length and width of the spring coils,
respectively, H is the thickness of the gate structure and E is the Young's modulus.
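Since the equation of motion has to be solved numerically, a minimal integration sketch may help make the model concrete. The Python code below integrates the mass-spring-damper equation with the parallel-plate electrostatic force, starting from rest, until the displacement reaches the contact gap. All parameter values (`K`, `M`, `B`, `A_OV`, `G0`, `GD`) are hypothetical round numbers chosen for illustration, and the function name `pull_in_time` and the simple semi-implicit Euler scheme are our assumptions here; they do not represent the simulation infrastructure used in this work.

```python
EPS0 = 8.854e-12  # permittivity of free space (F/m), used for the air gap


def pull_in_time(v_gb, k, m, b, a_ov, g0, gd, dt=1e-9, t_max=50e-6):
    """Integrate m*z'' + b*z' + k*z = F_elec(z) from rest.

    Returns the time (s) at which the displacement z first reaches the
    contact gap gd, or None if contact is not made within t_max.
    """
    z, v, t = 0.0, 0.0, 0.0
    while t < t_max:
        f_elec = EPS0 * a_ov * v_gb**2 / (2.0 * (g0 - z) ** 2)  # parallel-plate force
        accel = (f_elec - b * v - k * z) / m                    # Newton's second law
        v += accel * dt   # semi-implicit Euler: update the velocity first,
        z += v * dt       # then the position with the new velocity
        t += dt
        if z >= gd:
            return t
    return None


# Hypothetical parameters (not taken from the fabricated devices).
K = 50.0        # spring constant (N/m)
M = 5e-12       # effective mass (kg)
B = 5e-6        # damping coefficient (N*s/m)
A_OV = 900e-12  # actuation area (m^2), e.g. 30 um x 30 um
G0 = 200e-9     # as-fabricated gap (m)
GD = 100e-9     # contact (dimple) gap (m)

for v_gb in (2.0, 6.0, 8.0):
    print(f"Vgb = {v_gb} V -> contact time: {pull_in_time(v_gb, K, M, B, A_OV, G0, GD)}")
```

For these assumed values the lowest voltage never closes the gap, while the two higher voltages do, and the larger voltage closes it sooner.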
At equilibrium, the electrostatic force and the spring restoring force balance each other:
$$ \frac{\varepsilon_0 A_{ov} V_{gb}^2}{2(g_0 - z)^2} - kz = 0 \qquad (2.5) $$
Based on (2.2) and (2.3), as the displacement increases, the electrostatic force increases faster
than the spring force (inverse-quadratic in the gap vs. linear in the displacement), so there exists
a critical displacement beyond which Felec is always greater than Fspring, and as a result the gap
closes abruptly, or "pull-in" occurs, even with a small increase in Vgb. For the equilibrium to be
stable just before pull-in, the derivative of the net force with respect to the displacement must
be negative, which results in the following condition:
$$ \frac{\varepsilon_0 A_{ov} V_{gb}^2}{(g_0 - z)^3} - k < 0 \qquad (2.6) $$
By merging (2.5) and (2.6) and eliminating k, we can solve for the critical displacement, zpi:
$$ z_{pi} = \frac{1}{3} g_0 \qquad (2.7) $$
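For reference, the elimination step can be written out explicitly. The following is a sketch of the algebra, assuming the parallel-plate force and linear spring force defined above and a displacement in the range 0 < z < g0, so that the common positive factor can be divided out without changing the inequality:

```latex
\begin{align*}
k &= \frac{\varepsilon_0 A_{ov} V_{gb}^2}{2 z (g_0 - z)^2}
  && \text{force balance at equilibrium} \\
\frac{\varepsilon_0 A_{ov} V_{gb}^2}{(g_0 - z)^3}
  - \frac{\varepsilon_0 A_{ov} V_{gb}^2}{2 z (g_0 - z)^2} &< 0
  && \text{substituted into the stability condition} \\
\frac{1}{g_0 - z} &< \frac{1}{2z}
  && \text{dividing by } \varepsilon_0 A_{ov} V_{gb}^2 / (g_0 - z)^2 > 0 \\
z &< \frac{g_0}{3}
\end{align*}
```

The equilibrium is therefore stable only for displacements smaller than one third of the as-fabricated gap, and the boundary of this range is the critical displacement.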
The pull-in voltage, the voltage which creates this critical displacement, can be found by
substituting zpi back into (2.5):
$$ V_{pi} = \sqrt{\frac{8 k g_0^3}{27 \varepsilon_0 A_{ov}}} \qquad (2.8) $$
The abrupt pull-in at g = (2/3)g0 is illustrated in Figure 2-3. When the device is on, the gap is
effectively reduced to g0 - gd, where gd is the gap between the dimple contacts and the
drain/source electrodes (Figure 2.1). Turning off the device requires the gate-to-body voltage
to be lowered below Vpo. The spring restoring force must overcome both the electrostatic force
and surface adhesive forces (FA) that exist when contact is made:
$$ \frac{\varepsilon_0 A_{ov} V_{gb}^2}{2(g_0 - g_d)^2} + F_A < k g_d \qquad (2.9) $$
So the pull-out voltage can be found by solving (2.9) for Vgb:
$$ V_{po} = \sqrt{\frac{2 (k g_d - F_A)(g_0 - g_d)^2}{\varepsilon_0 A_{ov}}} \qquad (2.10) $$
An important observation is that Vpo is always smaller than Vpi, because once the device is
pulled-in, the actuation gap is smaller and hence the same electrostatic force is achieved with
smaller gate-to-body voltage. This hysteresis effect is shown in Figure 2-3.
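As a numerical illustration of the pull-in and pull-out expressions, the short Python sketch below evaluates both voltages for one set of parameters. The parameter values and the function names `pull_in_voltage` and `pull_out_voltage` are hypothetical and chosen only for illustration; the formulas assume the parallel-plate model of this section, with the pull-in voltage given by sqrt(8 k g0^3 / (27 ε0 Aov)) and the pull-out voltage by sqrt(2 (k gd - FA)(g0 - gd)^2 / (ε0 Aov)).

```python
import math

EPS0 = 8.854e-12  # permittivity of free space (F/m), used for the air gap


def pull_in_voltage(k, g0, a_ov):
    """Pull-in voltage of a parallel-plate actuator with a linear spring."""
    return math.sqrt(8.0 * k * g0**3 / (27.0 * EPS0 * a_ov))


def pull_out_voltage(k, g0, gd, a_ov, f_adh=0.0):
    """Pull-out voltage with contact gap gd and surface adhesion force f_adh."""
    if k * gd <= f_adh:
        raise ValueError("spring force cannot overcome adhesion: relay stays closed")
    return math.sqrt(2.0 * (k * gd - f_adh) * (g0 - gd) ** 2 / (EPS0 * a_ov))


# Hypothetical parameters (not taken from the fabricated devices).
K = 50.0        # spring constant (N/m)
A_OV = 900e-12  # actuation area (m^2)
G0 = 200e-9     # as-fabricated gap (m)
GD = 100e-9     # contact (dimple) gap (m)

v_pi = pull_in_voltage(K, G0, A_OV)
v_po = pull_out_voltage(K, G0, GD, A_OV)
v_po_adh = pull_out_voltage(K, G0, GD, A_OV, f_adh=1e-6)
print(f"Vpi = {v_pi:.2f} V, Vpo = {v_po:.2f} V, Vpo with adhesion = {v_po_adh:.2f} V")
```

In this model the quantity 2 k z (g0 - z)^2 reaches its maximum, 8 k g0^3 / 27, at z = g0/3, so the pull-out voltage cannot exceed the pull-in voltage for any contact gap, and adhesion lowers it further.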
2.3.2 Mechanical Delay
The mechanical delay is an important property of MEM relays, as it determines the circuit
performance. A rigorous delay model has been developed by Hei Kam [23], which derives an
accurate analytical formulation to find the effective mass and spring constant in (2.1):
The simplified effective mass, verified by ANSYS simulation, is expressed by:
$$ m_{eff} = 1.8 \rho A H + 2.74 \rho W_f L_f H \qquad (2.11) $$
where ρ is the density, A is the plate area, H is its thickness and Wf and Lf are the width and
length of the flexure. Assuming that non-ideal effects such as contact bounce and settling time
are negligible, tmech is approximated in closed form by:
$$ t_{mech} \approx \alpha \sqrt{\frac{m_{eff}}{k_{eff}}} \left(\frac{g_d}{g_0}\right)^{\gamma} \left(\frac{V_{dd}}{V_{pi}} - \chi\right)^{-\beta} \qquad (2.12) $$
This approximation is valid for 5Vpi > Vdd > 1.1Vpi. As expected, the mechanical time
constant is inversely proportional to the resonance frequency (√(keff/meff)) and decreases with
increasing gate overdrive (Vdd/Vpi), meaning that the delay can be tuned either in the
device design stage, by changing the mass or the spring constant, or during device operation,
by applying higher actuation voltage. In (2.12), χ ≈ 0.8 and the other parameters (α, β and γ)
depend on the quality factor (Q) of the relay.
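To illustrate how the delay responds to the design and operating knobs mentioned above, the following Python sketch evaluates the closed-form estimate. It assumes the delay takes the form α·sqrt(meff/keff)·(gd/g0)^γ·(Vdd/Vpi − χ)^(−β) with χ ≈ 0.8; the default values of `alpha`, `beta` and `gamma` are placeholders only (the fitted, Q-dependent values are not given here), and the material density and dimensions in the example are hypothetical.

```python
import math


def effective_mass(rho, area, thickness, w_f, l_f):
    """Effective mass: 1.8*rho*A*H + 2.74*rho*Wf*Lf*H (SI units)."""
    return 1.8 * rho * area * thickness + 2.74 * rho * w_f * l_f * thickness


def mech_delay(m_eff, k_eff, g0, gd, v_dd, v_pi,
               alpha=1.0, beta=1.0, gamma=1.0, chi=0.8):
    """Closed-form turn-on delay estimate (assumed functional form).

    alpha, beta and gamma depend on the quality factor Q; the defaults
    are placeholders, not fitted values.
    """
    overdrive = v_dd / v_pi
    if not 1.1 < overdrive < 5.0:
        raise ValueError("approximation only valid for 1.1*Vpi < Vdd < 5*Vpi")
    return (alpha * math.sqrt(m_eff / k_eff)
            * (gd / g0) ** gamma
            * (overdrive - chi) ** (-beta))


# Hypothetical device: 30 um x 30 um plate, 1 um thick, 4 um x 20 um flexures.
m_eff = effective_mass(rho=4000.0, area=900e-12, thickness=1e-6,
                       w_f=4e-6, l_f=20e-6)
for v_dd in (5.0, 8.0, 12.0):
    t = mech_delay(m_eff, k_eff=50.0, g0=200e-9, gd=100e-9, v_dd=v_dd, v_pi=4.0)
    print(f"Vdd = {v_dd:4.1f} V -> t_mech ~ {t:.3e} s (placeholder coefficients)")
```

The printed numbers are only meaningful in a relative sense: they show the delay shrinking as the overdrive grows, and they scale with the square root of the mass-to-stiffness ratio.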
In spite of the hysteresis effect and the surface forces that attract the channel to the
source/drain electrodes in the contact regions, the mechanical turn-off delay is typically much
smaller than the turn-on delay because electrical contact is broken as soon as the channel
dimple moves slightly away from the surface when the gate-to-body voltage is reduced below
VPO. This difference between the turn-on and turn-off delays enables tree-like logic design
with exclusive long stacks of relays.
2.3.3 Electrical Modeling of the MEM Relay
Although the performance of MEM relay circuits is mainly determined by the mechanical
turn-on time constant, another factor that affects the general circuit switching delay is the
Figure 2-5: Electrical model of the folded spring (crab leg) MEM relay [9]
electrical delay (telec), which is defined as the amount of time required for a relay in the on-state
to charge/discharge a load. Calculation of the electrical time constant requires accurate
modeling of the on-state resistance and capacitances of the relay. Developing such a
comprehensive model is also important for energy efficiency analysis of complex MEM relay
circuits discussed in the following chapters of this thesis.
The electrical model of the 4T crab leg relay is shown in Figure 2-5. Based on this model, a
VerilogA model has been developed by Hei Kam and Fred Chen [9] and significantly
improved by Matthew Spencer, and is used in simulations presented in this work.
The most prominent capacitance in this model is the actuation capacitance, Cgb, and its value,
calculated based on the parallel-plate capacitor model, is:
$$ C_{gb} = \frac{\varepsilon_0 A_{ov}}{g}, \quad g = g_0 - z \qquad (2.13) $$
The other capacitances are all parasitics. Cgs and Cgd can be calculated similarly to Cgb. The
largest of these parasitic capacitances is the gate-to-channel capacitance, Cgc:
$$ C_{gc} = \frac{\kappa_{ox} \varepsilon_0 W_c L_c}{t_{ox}} \qquad (2.14) $$
where tox and κox are the gate oxide thickness and relative permittivity and Wc and Lc are the
dimensions of the channel. Cgc is relatively large because the channel is separated from the
gate by a thin layer of aluminum oxide which has a dielectric constant of approximately 9.
The gate-to-source/drain capacitances can be expressed with similar equations, although the
effective overlap area for them is much smaller. It should be noted that when the switch is in
the on-state, the effective gap will be g0 - gd. The channel-to-body capacitance (Ccb) is not
significant because the overlap area is small compared to the gate-to-body area and it is formed
over the same air gap.
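The capacitance expressions above are all instances of the parallel-plate formula, which the following Python sketch evaluates for the actuation capacitance in the off- and on-state and for the gate-to-channel capacitance. All dimensions are hypothetical values picked for illustration; only the dielectric constant of about 9 for aluminum oxide is taken from the text.

```python
EPS0 = 8.854e-12  # permittivity of free space (F/m)


def parallel_plate(area, gap, k_rel=1.0):
    """Parallel-plate capacitance k_rel * eps0 * area / gap (SI units)."""
    return k_rel * EPS0 * area / gap


# Hypothetical dimensions (not taken from the fabricated devices).
A_OV = 900e-12           # gate-to-body overlap area (m^2)
G0 = 200e-9              # as-fabricated air gap (m)
GD = 100e-9              # contact (dimple) gap (m)
W_C, L_C = 2e-6, 10e-6   # channel width and length (m)
T_OX = 50e-9             # gate oxide thickness (m)
K_OX = 9.0               # relative permittivity of aluminum oxide (from the text)

c_gb_off = parallel_plate(A_OV, G0)              # off-state: full air gap
c_gb_on = parallel_plate(A_OV, G0 - GD)          # on-state: gap reduced to g0 - gd
c_gc = parallel_plate(W_C * L_C, T_OX, K_OX)     # gate-to-channel through the oxide

print(f"Cgb (off) = {c_gb_off * 1e15:.1f} fF")
print(f"Cgb (on)  = {c_gb_on * 1e15:.1f} fF")
print(f"Cgc       = {c_gc * 1e15:.1f} fF")
```

Even with a much smaller overlap area, the thin high-permittivity oxide makes the gate-to-channel capacitance comparable to the actuation capacitance in this example, which is why it is the largest of the parasitics.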
The total on-state resistance of the relay includes the metal trace resistance (Rtr), the channel
resistance (Rch), the channel-to-source/drain contact resistance (Rcon) and the resistance of the
passivating TiO2 oxide used to improve the endurance of the device (Rox). The first two
components are negligible as the path is metallic. According to [28], Rcon at each side of the
channel depends on the properties of the material and is limited by asperities on the contacting
metallic surfaces, and can be estimated by the following equation:
$$ R_{con} = \frac{4 \rho \lambda}{3 A_r} \qquad (2.15) $$
where Ar is the area of the contact asperities, and ρ and λ are the resistivity and electron mean free
path of the contact material, respectively. For tungsten, the values for ρ and λ are 55 nΩ·m and
33 nm respectively. Ar is a function of the material hardness (H), the deformation coefficient
at the contact (ξ) and also the effective loading force, which can be approximated by the
electrostatic force in the on-state:
$$ A_r = \frac{F_{elec}\big|_{g = g_0 - g_d}}{\xi H} \qquad (2.16) $$
For tungsten, ξ = 0.3. These equations show that soft contacting electrode materials (esp.
gold) and a large applied load can minimize asperities and achieve low contact resistance.
Although this approach is used in RF relay applications, it significantly reduces the
endurance. Because logic relays require far more switching cycles than RF MEMS, hard
contacting materials, e.g. tungsten, and operation with lower contact forces are preferred
[29-31]; an on-state resistance of up to 10 kΩ (for load
capacitance of 10-100 fF) can be tolerated.
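The trade-off between contact force and contact resistance can be illustrated numerically. The Python sketch below assumes the asperity-limited form Rcon = 4ρλ/(3Ar) with Ar = F/(ξH); the tungsten resistivity, mean free path and deformation coefficient are the values quoted in the text, whereas the hardness value and the contact forces are assumptions introduced only for this example. The last lines evaluate the electrical time constant for the tolerable on-resistance and load range mentioned above.

```python
RHO_W = 55e-9      # tungsten resistivity (ohm*m), from the text
LAMBDA_W = 33e-9   # electron mean free path in tungsten (m), from the text
XI_W = 0.3         # deformation coefficient for tungsten, from the text


def contact_area(force, hardness, xi=XI_W):
    """Effective asperity contact area (m^2) for a given loading force (N)."""
    return force / (xi * hardness)


def contact_resistance(force, hardness, rho=RHO_W, mfp=LAMBDA_W, xi=XI_W):
    """Asperity-limited contact resistance (ohm) at one side of the channel."""
    return 4.0 * rho * mfp / (3.0 * contact_area(force, hardness, xi))


HARDNESS = 10e9  # assumed hardness (Pa); an illustrative value, not a measured one

for force in (1e-6, 2e-6, 10e-6):  # assumed contact forces (N)
    r_con = contact_resistance(force, HARDNESS)
    print(f"F = {force * 1e6:4.1f} uN -> Rcon ~ {r_con:.2f} ohm")

# Electrical time constant for the tolerable on-resistance and load range.
for c_load in (10e-15, 100e-15):
    print(f"Ron = 10 kohm, C = {c_load * 1e15:.0f} fF -> RC = {10e3 * c_load * 1e9:.2f} ns")
```

In this model the resistance is inversely proportional to the contact force, so halving the force doubles the resistance; the resulting RC products remain short compared with the mechanical switching time, which is why a relatively high on-state resistance is acceptable for logic.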
In the current generations of MEM relays, the resistance of the ultra-thin passivating oxide
(TiO2) is around 1 kΩ, much larger than the contact resistance which is smaller than 1 Ω [26].
However, for the scaled MEM relay modeled in [14], the contact resistance becomes
significant and reaches the kΩ range.
The electrical model of the MEM relay (Figure 2-5) in the on-state can be simplified by
eliminating the negligible resistances and capacitances and using circuit reduction techniques,
as shown in Figure 2-6. In this model, Ron = 2(Rox + Rcon) and for all capacitances, g = g0 - gd.
Also, the gate-to-channel capacitance has been eliminated because, given the symmetry between
the source and drain contacts and the gap uniformity, Cgc forms a balanced Wheatstone bridge
with Rox + Rcon, Cgs and Cgd.
Figure 2-6: Simplified electrical model of the "pulled-in" MEM relay
2.4 Verification of the Feasibility of MEM Relay VLSI Systems
This section describes our early steps toward verification of the feasibility of the relay
technology and includes experimental demonstration of several basic circuit components in
our first test chip, named CLICKR1 [11]. The fabrication was done in a 1 µm lithographic
process and the chip was built with the 1st-generation 4T relays. The design, layout and testing
of the circuits was a collaborative effort between students from MIT, UC-Berkeley and
UCLA. The students besides myself that were involved in circuit design and testing were:
Fred Chen (MIT), Matt Spencer (UC-Berkeley), and Chengcheng Wang (UCLA). The chip
fabrication was done at the UC-Berkeley Microlab by Rhesa Nathanael.
The die photo of the test chip is shown in Figure 2-7. It contains logic (adder and multiplier
blocks), sequential memory elements, memory arrays, I/O and clocking circuits.
Figure 2-7: Die photo of the first MEM relay test chip (CLICKR1) [11]
2.4.1 Inverter and Voltage Transfer Curve
Figure 2-8 shows the simplest MEM relay circuit, an inverter, and the measured VTC for it.
As we will discuss later in this chapter, the 1st-generation relay was not capable of ambipolar
operation because of layout-related parasitic effects, so while the original inverter design is
shown in Figure 2-8a, we had to avoid body biasing and use the structure of Figure 2-8b
instead, where Ā = 10 V - A. The measured VTC is illustrated in Figure 2-8d, showing full-rail
swing at the output and digital gain. The VTC and voltage levels of drain, source and gate
terminals show the "composability" of the relay, meaning that it can handle necessary
voltages to actuate another relay.
Figure 2-8: An ideal relay inverter (a), the modified inverter with grounded body terminals
(b), an XOR gate (c), and the measured VTC of the modified inverter at Vdd = 10 V, plotted
against A (V) (d)
It should be noted that the same structure can be used as an XOR gate by replacing the Vdd and
GND on source/drain terminals with B and its complement B̄ (Figure 2-8c).
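At the logic level, the reuse of the inverter structure as an XOR gate can be sketched in a few lines of Python. This is a purely functional illustration, not a circuit simulation: the helper name `relay_pair_output` is hypothetical, and the assignment of B to the upper terminal and its complement to the lower terminal is an assumption made for the example.

```python
def relay_pair_output(a: bool, top: bool, bottom: bool) -> bool:
    """Two-relay structure: the relay driven by NOT a connects `top` to the
    output, and the relay driven by a connects `bottom` to the output."""
    return bottom if a else top


def inverter(a: bool) -> bool:
    """Inverter: top terminal at Vdd (logic 1), bottom terminal at GND (logic 0)."""
    return relay_pair_output(a, top=True, bottom=False)


def xor_gate(a: bool, b: bool) -> bool:
    """XOR: same structure with Vdd and GND replaced by B and its complement."""
    return relay_pair_output(a, top=b, bottom=not b)


for a in (False, True):
    for b in (False, True):
        print(f"A={int(a)} B={int(b)}  NOT A={int(inverter(a))}  A XOR B={int(xor_gate(a, b))}")
```

Because the signal B travels through the source/drain path while A drives the gates, the XOR function is obtained without adding relays beyond those of the inverter.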
2.4.2 Basic Logic
Figure 2-9 shows the carry generation (PGK) subcircuit of a full adder which has been
modified such that all relays are configured as N-relays (i.e. all body terminals are tied to
ground). The measured results illustrate the operation of this circuit in propagate, generate,
and kill modes, showing the feasibility of MEM relays for building complex gates. An
important observation is that the circuit incurs no additional mechanical delay between Cin
and Cout, so it can be used in a Manchester carry chain configuration for implementation of a
fast adder [9].
Figure 2-9: MEM relay based carry generation (PGK) circuit and measured waveform
demonstrating operation in propagate, generate and kill modes.
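The propagate/generate/kill behavior of the carry circuit can be illustrated with a small functional model. In the Python sketch below, the function names `pgk_carry` and `ripple_add` are hypothetical, and the model captures only the logic: in propagate mode the carry output simply follows the carry input through a closed path, which is why chaining such stages adds no relay actuation between carry-in and carry-out once the A and B inputs have set the relays.

```python
def pgk_carry(a: int, b: int, c_in: int) -> int:
    """Carry output of one bit slice in generate, kill or propagate mode."""
    if a and b:            # generate: output tied to the supply
        return 1
    if not a and not b:    # kill: output tied to ground
        return 0
    return c_in            # propagate: closed path from C_in to C_out


def ripple_add(x: int, y: int, n_bits: int) -> int:
    """Add two n-bit numbers by chaining PGK stages (Manchester-style chain)."""
    carry, total = 0, 0
    for i in range(n_bits):
        a = (x >> i) & 1
        b = (y >> i) & 1
        total |= (a ^ b ^ carry) << i      # sum bit of this slice
        carry = pgk_carry(a, b, carry)     # carry passed to the next slice
    return total | (carry << n_bits)       # final carry becomes the top bit


print(ripple_add(0b1011, 0b0110, 4))  # 11 + 6
```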
2.4.3 Memory and Latching Units
Figure 2-10 illustrates the schematic and functionality of a simple MEM relay latch in both
transparent and opaque states. While Figure 2-10a shows the optimal design for the latch, in
this chip we used a reconfigured topology similar to pseudo-NMOS structures (Figure 2-10b)
because we were limited to N-relays. As in CMOS, a cascade of two latches could be used
to create a flip-flop.
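The latch behavior and the two-latch flip-flop mentioned above can be sketched functionally as follows. The class names `Latch` and `FlipFlop` are hypothetical, and the clock polarity chosen for the master and slave stages is an assumption made for this example; the model ignores mechanical delay and only shows the transparent and opaque states.

```python
class Latch:
    """Level-sensitive latch: transparent when enabled, opaque otherwise."""

    def __init__(self) -> None:
        self.q = 0

    def step(self, d: int, enable: bool) -> int:
        if enable:          # transparent: output follows the input
            self.q = d
        return self.q       # opaque: output holds the stored value


class FlipFlop:
    """Master-slave cascade: master transparent while clk is low,
    slave transparent while clk is high (rising-edge behavior)."""

    def __init__(self) -> None:
        self.master = Latch()
        self.slave = Latch()

    def step(self, d: int, clk: int) -> int:
        self.master.step(d, enable=not clk)
        return self.slave.step(self.master.q, enable=bool(clk))


ff = FlipFlop()
for d, clk in [(1, 0), (0, 1), (0, 0), (1, 1)]:
    print(f"D={d} Clk={clk} -> Q={ff.step(d, clk)}")
```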
The composability of MEM-relay circuits is verified in Figure 2-10c, where each MEM relay
provides actuation voltage for the following relay. Although the circuit is functional, it incurs
an extra mechanical delay during operation compared to the original design. The advance of
technology and the availability of viable P-relays have addressed that problem, as we will
describe in Section 2-5.
Figure 2-10: Design and demonstration of MEM relay latch: the optimal MEM relay latch
with one mechanical delay (a), the reconfigured pseudo-NMOS style latch used for
functionality measurement (b), and measurement results against time, illustrating the
functionality of the latch in both transparent and opaque states (c).
In addition to the latch, a 10-bit DRAM was also demonstrated in [11] and illustrated in
Figure 2-11. The DRAM is configured like a NAND Flash, built exclusively with N-relays.
Each DRAM cell consists of a storage device which can either short or open the read bit-line,
an access device that passes the signal between the write bit-line (BLWR) and the gate of the
storage device, and a bypass device in parallel with the storage device.
During a write operation the write word-line (WLWR) is raised high to let BLWR charge or
discharge the gate of the selected storage device(s).
During a read operation, the bypass path in all of the inactive rows of the memory is enabled,
shunting all storage devices except that of the active cell. If the gate capacitance of the storage
device in the active cell holds a "1", the storage device will be turned on and the conductive
path to the supply, highlighted in Figure 2-11a for the last memory cell, pulls the BLRD high.
If the stored value is "0", the BLRD will remain low.
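The read operation of the NAND-style column can be summarized by a simple functional model. The Python sketch below is an illustration only: the function name `read_bit` is hypothetical, each cell is reduced to a storage relay in parallel with a bypass relay, and the cells are assumed to form one series path from the supply to the read bit-line.

```python
def read_bit(stored_bits, active_row):
    """Return the value seen on the read bit-line for the selected row.

    Every cell conducts if its bypass relay or its storage relay is closed.
    The bypass relay is closed in all inactive rows and open in the active
    row, so the series path conducts only if the active cell stores a "1".
    """
    if not 0 <= active_row < len(stored_bits):
        raise IndexError("active_row is outside the column")
    for row, bit in enumerate(stored_bits):
        bypass_on = row != active_row   # bypass enabled in every inactive row
        storage_on = bool(bit)          # storage relay closed when its gate holds "1"
        if not (bypass_on or storage_on):
            return 0                    # series path is open: bit-line stays low
    return 1                            # path to the supply pulls the bit-line high


column = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]   # hypothetical contents of a 10-bit column
print([read_bit(column, row) for row in range(len(column))])
```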
While a DRAM design similar to CMOS implementations would require 2 mechanical delays,
the proposed configuration allows the memory to perform a read operation in a single
mechanical turn-on delay (for decoding the address) plus a mechanical turn-off delay (to turn
off the bypass device of the cell being read; the bypass devices are normally on). As we
Figure 2-11: The schematic (a) and operation (b) of a 10-bit MEM relay-based NAND-style
DRAM column. Simultaneous read and write of a single bit in the last memory cell.
mentioned earlier, the turn-off delay is much smaller than the turn-on delay, and
hence the read latency of this DRAM design is about half that of the CMOS-style implementation. The
experimental results show a simultaneous read and write operation; this was enabled by
replacing the pre-charge relay with a large external pull-down resistor.
2.4.4 Clocking Units
Figure 2-12 shows the schematic and waveforms for a single relay pseudo-NMOS style
Figure 2-12: The schematic and measured waveforms of a pseudo N-relay oscillator. These
waveforms were used to estimate load capacitance (55pF), device on-resistance (1kΩ), and
"weak overdrive" mechanical delay (34µs).
oscillator. When Vout < Vpi, the relay is off and the output rises at a rate that is set by the RC
time constant of the 74kΩ external load resistor and the total capacitance of the test
infrastructure. Based on the measured rise time, this capacitance is estimated to be 55pF.
When the output voltage reaches Vpi, the relay actuates. The mechanical delay can be extracted
by finding the time difference between when the output voltage exceeds Vpi (previously
characterized with a DC analyzer and found to be 7V for this device), and when the relay
switches on. An important observation here is the low gate overdrive (Vout/Vpi), which
increases the mechanical delay (Equation 2.12) and also makes it sensitive to small changes in
the overdrive voltage. Therefore, the measured mechanical delay varies between 25µs and 35µs
on different cycles.
As soon as the relay turns on, the output voltage drops abruptly, because the total fall time is
set by the on-resistance of the relay (which is much smaller than the external load resistor)
and the same load capacitance as the rising edge. Based on the measured electrical delay,
which is approximately 300ns, Ron is estimated as 1kΩ, which is consistent with our DC
characterizations. If, instead of the load capacitance of the test infrastructure, the oscillator
only drives another relay, the electrical time constant would be ~300ps.
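As a rough numerical companion to this discussion, the sketch below evaluates the two RC time constants implied by the values quoted above (74kΩ load, 55pF test capacitance, 1kΩ on-resistance). It is a first-order illustration only; the variable names are ours.

```python
R_LOAD = 74e3      # external load resistor (ohm), from the text
C_LOAD = 55e-12    # estimated test-infrastructure capacitance (F), from the text
R_ON = 1e3         # estimated relay on-resistance (ohm), from the text

tau_rise = R_LOAD * C_LOAD   # output charging through the load resistor (relay off)
tau_fall = R_ON * C_LOAD     # output discharging through the closed relay

print(f"tau_rise = {tau_rise * 1e6:.2f} us")
print(f"tau_fall = {tau_fall * 1e9:.1f} ns")
print(f"rise/fall ratio = {tau_rise / tau_fall:.0f}")
```

The rising time constant is about 4µs and the falling one about 55ns, which illustrates why the falling edge looks abrupt on the time scale of the oscillation. These are single time constants; the delays quoted in the text also depend on the voltage thresholds used in the measurement, which this sketch does not model.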
2.5 The Evolution of MEM Relays
Since its introduction, the MEM relay technology has gone through various revisions to
improve the functionality, performance and reliability. Many critical decisions for upgrading
the device layout and parameters have been made based on analytic or experimental circuit
insights. Figure 2-13 summarizes the evolution of the MEM relays in recent years. Although
we successfully demonstrated the functionality of the original 4T relay (Figure 2-13a) and
implemented some basic logic and timing units [11], implementation of larger circuits with
that relay proved to be impossible due to layout-related parasitic effects and biasing
limitations which made the implementation of a PMOS-like relay (P-relay) impractical.
The first problem, observed during device testing, was that the pull-in voltage when the SiGe
terminal was used as gate and the body terminal was grounded turned out to be significantly
lower than when the signal was applied to the body and the SiGe gate was grounded (Figure
2-14). This can be attributed to the discrepancy between the electrostatic force (Felec)
generated by the gate structure plus spring arms to the body terminal and the substrate, and
the Felec generated by just the body terminal to the gate. The reason for the different Felec is
that in the first case, the actuation capacitance consists of Cgb, Cgd and Cgs, while in the
second case it only includes Cgb.
Figure 2-13: MEM relay device evolution: layout and SEM. (a) Original 4T relay, (b) improved
4T relay, (c) scaled 6T relay (scale labels in the figure: 90µm, 85µm and 15µm, respectively).
Figure 2-14: Evaluating the ambipolar operation of the 1st generation MEM relay. (a) Bias
configuration for both experiments, (b) I-V curves (versus VG) show that both Vpi and Vpo are
much higher for the case of SiGe gate and tungsten body. Data courtesy of Rhesa Nathanael [22].
The significant gate-to-drain and gate-to-source overlap area in the 1st generation 4T relay
resulted in a parasitic actuation which amounted to 40% of total actuation. As illustrated in
Figure 2-15, while the body voltage and drain-to-source voltages are fixed (VB = 0, VDS = 1V),
Figure 2-15: Parasitic electrostatic force between the gate and the source/drain results in the
dependence of the pull-in and pull-out voltages on the drain voltage (plotted versus VD; the
ideal slope is 0).
both pull-in and pull-out are highly influenced by the drain voltage, a phenomenon similar to
drain-induced barrier lowering (DIBL) in CMOS. Ideally, the source and drain should have
minimal effect on the switching voltages (i.e., slope = 0). Another key problem was the large
area of the channel-to-body overlap and the resulting significant channel-to-body coupling,
which contributed an undesired electrostatic attractive force between them and made pull-out
impossible, even when the gate-to-body voltage was zero.
All of these issues were addressed in the second generation of the 4T relay, shown in Figure
2-13b, by updating the device layout to minimize the gate-to-drain, gate-to-source and
channel-to-body overlap areas. The improved 4T relay design shows promising ambipolar
operation (Figure 2-16a). The share of parasitic source and drain actuation has been reduced
from 40% to 2.3%, resulting in nearly ideal characteristics with slopes close to zero
(Figure 2-16b). The reduced channel footprint also has energy benefits, as the gate
capacitance is dominated by gate-to-channel capacitance.
In a later generation of MEM relays (Figure 2-13c), fabricated by SEMATECH, the channel area
has been further reduced by moving the channel, source and drain electrodes to the corners of the
Figure 2-16: The new layout for the 4T MEM relay enables (a) improved ambipolar actuation
and (b) VD-independent pull-in and pull-out voltages (annotated slopes: 0.0034 and 0.0055).
device. The area of this relay is ~30 times smaller than the second generation 4T, and it has 6
terminals: two independent source-drain pairs and a common gate-body pair. Such
enhancement approximately halves the number of relays required to implement most logic
functions [26, 32]. As a result, in addition to significant reduction of parasitic effects, the 6T
MEM relay enables considerable reduction in area cost and gate switching energy, compared
to the 4T MEM relays.
The measured and derived model parameters of the relays used for design, analysis and
experimental demonstration in this thesis, the improved 4T and scaled 6T MEM relays, are
summarized in Table 2-1. Also included in that table are the parameters of the predictive
90nm relay model which will be used for analysis of MEM relay energy efficiency benefits
later in this thesis.
2.6 Summary
The structure, fabrication process, operation and electro-mechanical model of the MEM relays
were reviewed in this chapter. In addition, our groundwork for verifying the feasibility of
MEM relays for VLSI applications was discussed by summarizing the results from our early
tests. These early results helped us identify the non-idealities of the first generation 4T relays,
which were mostly layout related. Those issues have been addressed in the design of more
recent generations of relays, the improved 4T and later the scaled 6T relays, and as a result the
device attributes have drastically improved. This motivated us to look at larger-scale
demonstrations of MEM relay applications for VLSI systems, which will be described in the
following chapters.
Table 2-1: Scaled and current MEM relay device model parameters

Parameter               2nd Generation   Scaled 6T     90nm equivalent   90nm equivalent
                        4T [22]          Relay [12]    4T Model [14]     6T Model
Aov (µm²)               754              45            1.54              1.54
g0 (nm)                 130              100           10                10
gd (nm)                 50               50            5                 5
Ron (kΩ)                1*               2-3.5*        2-3               2-3
Cgb (fF) (on, off)      78, 49           7.2, 3.8      2.3, 1.46         2.3, 1.46
Cgc (fF)                92               16            0.9               0.4
Cgd,s (fF) (on, off)    0.93, 0.59       0.32, 0.17    1, 0.6            0.3, 0.18
Vpi, Vpo (V)            10, 7*           8, 6*         0.04, 0.03        0.04, 0.03
tmech (µs) (2)          ~10*             0.2-1*        0.02-0.08         0.02-0.08

(1) Measured data is indicated with an asterisk.
(2) Mechanical delay in this table roughly corresponds to 1.2 Vpi ≤ Vgb ≤ 2 Vpi.
CHAPTER 3
MEM Relay Power Gating
3.1 Power Gating Background
Power gating is one of the most prominent circuit design techniques for reducing leakage
current. The concept of power gating is straightforward: the flow of current to a part of the
circuit that is not currently active is cut off by means of a current switch. Although this
technique has become ubiquitous in integrated circuits to reduce the power consumed by
inactive CMOS logic circuits, the finite Ion/Ioff ratio of MOSFET power gates limits their
ability to reduce off-state leakage. In contrast, as described earlier, MEM relay-based power
gates that mechanically make or break electrical contact can completely eliminate off-state
leakage. However, the leakage benefits of MEMS-based power gates may be outweighed by
increased switching energy and voltage droop due to relatively large device dimensions
and/or operating voltages and on-state resistance. In this chapter we present a comprehensive
analysis that predicts the conditions under which electrostatically-actuated MEM relays can
achieve energy savings over MOSFETs for power gates. Furthermore, we utilize the second
generation of 4T MEM relays to experimentally demonstrate that these switches can
successfully power-gate a functional CMOS chip [33].
3.2 Energy-efficiency Analysis and Comparisons
Figure 3-1 shows the basic structures and relevant parameters for a generic CMOS logic
power-gated by MOSFET or MEM relay headers. The total energy consists of active,
switching and leakage energy, although for MEM relay power-gated circuits the last
component is negligible.
Figure 3-1: MOSFET (a) and MEM relay (b) power gates. α and αp are the average and peak
activity factors of the CMOS logic, f is the operation frequency and CL the CMOS logic load
capacitance. β is the ratio of supply capacitance (at VDD) and CMOS logic load capacitance,
while γ and ε are the ratios of drain and gate capacitances of MOSFET (CM) and MEM relay
(CR) headers, respectively [33].
For a given amount of time that the CMOS logic is in sleep (Toff) or active (Ton) mode, the
energy per power gate switching cycle for MOSFET (EM) and relay (ER) gating can be
expressed as follows [33]:
\[
\begin{aligned}
E_M &= E_{active} + E_{leakage} + E_{switching} \\
    &= (I_{on} T_{on} + U_M I_{off} T_{off})\, V_{EXT,M} + (\gamma U_M C_M + \beta C_L)\, V_{DD} V_{EXT,M} + U_M C_M V_{IO}^2
\end{aligned}
\tag{3.1}
\]
\[
\begin{aligned}
E_R &= E_{active} + E_{switching} \\
    &= I_{on} T_{on} V_{EXT,R} + (\varepsilon U_R C_R + \beta C_L)\, V_{DD} V_{EXT,R} + U_R C_R V_{R,eff}^2
\end{aligned}
\tag{3.2}
\]
where the value of the external supply VEXT,M(R) is set by the desired on-die supply VDD and the
IR drop through the power gate, i.e.:
\[
V_{EXT,M(R)} = V_{DD} + I_{on,p} R_{on,M(R)} / U_{M(R)}
\tag{3.3}
\]
Ron,M(R) is the on-resistance of a unit MOSFET (relay) switch, CM(R) is the gate capacitance of
the unit switch, and UM(R) is the upsizing factor. We define the upsizing factor as the width
for the MOSFET and the number of parallel switches for the MEM relay power gate. VIO is
the I/O voltage driving the gate capacitance CM, and VR,eff is the effective relay gate voltage
obtained from the pull-in (Vpi) and pull-out (Vpo) voltages:
\[
V_{R,eff}^2 = V_{pi}\,(V_{pi} - V_{po})
\tag{3.4}
\]
The average and peak load currents, Ion and Ion,p, are equal to αfCLVDD and αpfCLVDD,
respectively. Ioff is the leakage current of a unit MOSFET switch. Tables 2-1 and 3-1 provide
parameters for a variety of relay generations and a range of CMOS technologies for a target
VDD of 1V.
Table 3-1: CMOS technology parameters from standard and predictive models [33-34]

CMOS process   CM        Ron,M     Ioff (A/µm)   Ioff (A/µm)     VNOM
(nm)           (fF/µm)   (kΩ·µm)   VIO = VNOM    VIO = VEXT,M    (V)
32             0.9       0.55      336n          336n            1
45             1.2       1         115n          115n            1
65             1.7       1.1       1.3n          37.1n           1.1
90             2.2       1.6       54p           3.15n           1.2
130            2.5       2.4       22p           378p            1.2
180            1.6       2.8       3p            48p             1.8
250            1.3       4.5       1.5p          9.6p            2.5

The principal variable that a designer can adjust to minimize the energy consumed in each
power-gating cycle is the upsizing factor, UM(R), whose optimum value can be obtained as
follows:
\[
\frac{dE_M}{dU_M} = 0 \;\Rightarrow\; U_{M,opt}^2 \approx \frac{k R_{on,M} I_{on}^2 T_{on}}{I_{off} T_{off} V_{DD} + C_M (V_{IO}^2 + \gamma V_{DD}^2)}
\tag{3.5}
\]
\[
\frac{dE_R}{dU_R} = 0 \;\Rightarrow\; U_{R,opt}^2 \approx \frac{k R_{on,R} I_{on}^2 T_{on}}{C_R V_{R,eff}^2 + \varepsilon C_R V_{DD}^2}
\tag{3.6}
\]
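where k = Ion,p/Ion and it was assumed that αfTon >> β. In order to clearly compare the
performance of the two power gating schemes, it is useful to examine the normalized amount
of energy each one loses relative to the on-state energy with ideal power gating
(Eon = VDD Ion Ton):
\[
\delta E_M = 2\sqrt{\frac{k R_{on,M}\,(P_{leak,M} T_{off} + E_{G,M})}{T_{on} V_{DD}^2}}\left(1 + \frac{\beta}{2\alpha f T_{on}}\right) + \frac{\beta}{\alpha f T_{on}} + \frac{k R_{on,M}\,(I_{off} T_{off} + \gamma C_M V_{DD})}{T_{on} V_{DD}}
\tag{3.7}
\]
\[
\delta E_R = 2\sqrt{\frac{k R_{on,R}\, E_{G,R}}{T_{on} V_{DD}^2}}\left(1 + \frac{\beta}{2\alpha f T_{on}}\right) + \frac{\beta}{\alpha f T_{on}} + \frac{\varepsilon k R_{on,R} C_R}{T_{on}}
\tag{3.8}
\]
where δEM(R) = (EM(R) - Eon)/Eon, EG,M = CM(VIO² + γVDD²), EG,R = CR(VR,eff² + εVDD²), and
Pleak,M = VDD Ioff.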
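To make the sizing trade-off concrete, the following sketch evaluates the per-cycle energy model numerically and compares the closed-form optimum upsizing factor against neighboring values. It is a minimal illustration, not the analysis code behind [33]. The assumptions are: energies follow the structure of (3.1)-(3.3); the optimum is the square root of k·Ron·Ion²·Ton divided by the per-unit leakage-plus-gate energy; VR,eff² is taken as Vpi(Vpi - Vpo); CR is taken as Cgb(on) + Cgc; VIO is treated as a fixed 1.2V rail; and CL, Ton and Toff are hypothetical values chosen for illustration.

```python
import math

# alpha, alpha_p, f, beta, gamma, eps follow the Figure 3-2 caption; device values
# follow Tables 2-1 and 3-1 (90nm MOSFET, 2nd-generation 4T relay).
# C_L, T_ON, T_OFF and C_R = Cgb(on) + Cgc are assumptions for illustration only.
ALPHA, ALPHA_P, F, BETA, GAMMA, EPS = 0.1, 0.5, 1e9, 5.0, 1.0, 0.1
VDD, C_L = 1.0, 100e-12
T_ON, T_OFF = 10e-6, 1e-3
I_ON = ALPHA * F * C_L * VDD      # average load current
K = ALPHA_P / ALPHA               # peak-to-average current ratio


def energy(u, r_on, c_gate, cap_ratio, v_gate_sq, i_off):
    """Energy per power-gating cycle for upsizing factor u."""
    v_ext = VDD + K * I_ON * r_on / u                       # supply plus IR drop
    active_and_leak = (I_ON * T_ON + u * i_off * T_OFF) * v_ext
    supply_switching = (cap_ratio * u * c_gate + BETA * C_L) * VDD * v_ext
    gate_switching = u * c_gate * v_gate_sq
    return active_and_leak + supply_switching + gate_switching


def optimum_upsizing(r_on, c_gate, cap_ratio, v_gate_sq, i_off):
    """Closed-form optimum, valid when alpha*f*T_on >> beta."""
    denom = i_off * T_OFF * VDD + c_gate * (v_gate_sq + cap_ratio * VDD**2)
    return math.sqrt(K * r_on * I_ON**2 * T_ON / denom)


# MOSFET quantities are per micrometer of width; relay quantities are per relay.
mosfet = dict(r_on=1.6e3, c_gate=2.2e-15, cap_ratio=GAMMA,
              v_gate_sq=1.2**2, i_off=54e-12)
relay = dict(r_on=1e3, c_gate=(78 + 92) * 1e-15, cap_ratio=EPS,
             v_gate_sq=10 * (10 - 7), i_off=0.0)

e_on = VDD * I_ON * T_ON          # on-state energy with ideal power gating
for name, dev in (("MOSFET", mosfet), ("relay", relay)):
    u_opt = optimum_upsizing(**dev)
    overhead = (energy(u_opt, **dev) - e_on) / e_on
    print(f"{name}: U_opt = {u_opt:.3g}, normalized overhead = {overhead:.3g}")
```

The printed overhead is the normalized energy loss relative to Eon for each power gate at its own optimum size. Changing `T_OFF` or the leakage value shows how the balance between leakage and switching energy shifts.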
Using equations (3.7) and (3.8) with parameters from a standard 90nm CMOS process and
our current 4T relays (Tables 2-1 and 3-1), Figure 3-2a shows the energy ratio of designs with
MOSFET and MEM relay power gates versus Toff for fixed Ton. For short Toff (< 1ms), the
increased switching energy and the energy lost due to IR drop of the relay-based power gate
outweigh its leakage reduction benefit. However, even with current relays, for Toff > 1ms and
Ton > 100ns, the relay's negligible leakage continuously reduces the total energy as off-time is
increased.
As shown in Figure 3-2b, scaling the relays to dimensions comparable to current mass-
produced MEMS, like our scaled 6T relays which are 30x smaller than the 4T design, reduces
their capacitance and operating voltages and enables the relay power gating solution to begin
Figure 3-2: Energy ratio vs. Toff, for various Ton, for designs power gated with (a) 90nm
MOSFETs and 2nd generation 4T MEM relays, and (b) 90nm MOSFETs and CLICKR3
scaled MEM relays. Here, α and αp are 0.1 and 0.5, respectively, f = 1GHz, β = 5, γ = 1 and
ε = 0.1 [33].
accruing energy savings at a substantially lower Toff of 10µs (a 100x improvement). In fact,
the minimum Toff at which relay power gating provides savings over MOSFET power gating
(i.e., Toff* in Figure 3-2) is well predicted by:
\[
T_{off}^{*} \approx \frac{E_{G,M}}{P_{leak,M}} \left( \frac{R_{on,R}\, E_{G,R}}{R_{on,M}\, E_{G,M}} - 1 \right)
\tag{3.9}
\]
which states that the cross-over time constant is set by the ratio of switching energy overheads
and MOSFET leakage power. The value of Toff for achieving a required energy improvement
can also be obtained by revisiting (3.7) and (3.8) to calculate the energy gain EM/ER:
\[
\frac{E_M}{E_R} = \frac{1 + \delta E_M}{1 + \delta E_R} \approx
\frac{2\sqrt{\dfrac{k R_{on,M} I_{off} T_{off}}{T_{on} V_{DD}}}\left(1 + \dfrac{\beta}{2\alpha f T_{on}}\right) + \dfrac{\beta}{\alpha f T_{on}} + \dfrac{k R_{on,M} I_{off} T_{off}}{V_{DD} T_{on}} + 1}
{2\sqrt{\dfrac{k R_{on,R} E_{G,R}}{T_{on} V_{DD}^2}}\left(1 + \dfrac{\beta}{2\alpha f T_{on}}\right) + \dfrac{\beta}{\alpha f T_{on}} + 1}
\tag{3.10}
\]
where we assume that Toff > 100µs, Ton > 100ns, and parameters α, k, f and β equal the values
in Figure 3-2.
Equation (3.10) makes it clear that, as should be expected, in applications with large off-on
ratios, the switching energy overhead of even today's relays becomes negligible. Therefore,
the energy reduction from relay power gates is set entirely by the removed leakage energy,
and this energy savings grows linearly with Toff/Ton.
It is interesting to note that in this regime, the energy improvement for relay power gates over
MOSFETs is also a function of the peak-to-average ratio k of the on-state current. This stems
from the fact that both relay and MOSFET power gates must be sized to achieve a certain IR
drop under worst-case load conditions. In the case of MOSFET power gates, the increased
transistor width (vs. a power gate sized for average load) leads to linearly increased leakage.
In contrast, the number of parallel relays used to implement the power gate can be increased
without impacting the off-state leakage.
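As a rough numerical check of the cross-over off-time discussed above, the sketch below uses a leading-order break-even condition, Ron,M(Pleak,M·Toff + EG,M) = Ron,R·EG,R, which follows from equating the dominant square-root terms of the two normalized overheads. This simplification, the choice CR = Cgb(on) + Cgc, VR,eff² = Vpi(Vpi - Vpo), and VIO ≈ 1V are assumptions made here for illustration; device values come from Tables 2-1 and 3-1.

```python
def crossover_off_time(r_on_r, e_g_r, r_on_m, e_g_m, p_leak_m):
    """Leading-order off-time at which relay and MOSFET gating break even."""
    return (r_on_r * e_g_r / r_on_m - e_g_m) / p_leak_m


VDD, GAMMA, EPS = 1.0, 1.0, 0.1
V_IO = 1.0                                      # assumes V_IO ~ V_EXT,M ~ 1V
C_M, R_ON_M, I_OFF = 2.2e-15, 1.6e3, 3.15e-9    # 90nm row, V_IO = V_EXT,M column (per um)
C_R, R_ON_R = (78 + 92) * 1e-15, 1e3            # 2nd-generation 4T relay (C_R is an assumption)
V_PI, V_PO = 10.0, 7.0

e_g_m = C_M * (V_IO**2 + GAMMA * VDD**2)               # MOSFET gate switching energy per um
e_g_r = C_R * (V_PI * (V_PI - V_PO) + EPS * VDD**2)    # relay gate switching energy per relay
t_cross = crossover_off_time(R_ON_R, e_g_r, R_ON_M, e_g_m, VDD * I_OFF)
print(f"cross-over T_off ~ {t_cross * 1e3:.2f} ms")
```

Under these assumptions the cross-over lands at roughly 1ms, in line with the off-time quoted in the text for the current 4T relays against 90nm MOSFET power gates.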
Figure 3-3: Required Toff for specific energy gains of relay power gating over MOSFET
power gating for different CMOS processes (technology nodes from 32nm to 250nm, for
VIO = VNOM and VIO = VEXT,M; Ton = 10µs) [33].
We next examine the off-times across different technology nodes, assuming that even for the
most advanced CMOS designs, I/O transistors with the properties of an older technology node
can be utilized as power-gates in order to exploit their potentially lower leakage properties. In
Figure 3-3, parameters from standard and predictive CMOS models [34] and (3.9) and (3.10)
are used to find the Toff* and required Toff for energy gains between 1 and 10 for different
MOSFET technology nodes, and for current relay technology.
If a separate rail (other than VEXT,M) is used for VIO and matched to the available CMOS
power-gate devices, lower Ioff and hence higher Toff are expected for a given energy gain, as in
Figure 3-4. This is especially true for 0.25µm and 0.18µm power gates, as their nominal
voltages (2.5V and 1.8V) are substantially higher than the ~1V VEXT,M, resulting in substantial
gate under-drive for the PMOS headers. However, the leakage suppression with these
long-channel/thick-oxide I/O devices is limited by junction leakage.
Figure 3-4a illustrates the energy gain of designs with MEM relay gating over designs with
MOSFET gating as a function of Toff/Ton, for different power gate technologies. In Figure 3-4b,
Figure 3-4: Energy gain vs. Toff/Ton for designs with MEM relay vs. MOSFET gating, for
MOSFET power gates implemented in different CMOS processes (250nm to 32nm), with
current and scaled relays. (a) VIO = VEXT,M, and (b) VIO = VNOM.
a separate, higher voltage rail VIO = VNOM is assumed to be available, resulting in lower energy
gains for relay-gated vs. MOSFET-gated designs. Interestingly, as pointed out in Figure 3-2b,
the scaled 6T relay technology does not significantly increase the energy gain (since the gain
is mostly determined by the MOSFET leakage characteristics and off-on ratio), but does affect
the off-on ratio at which the relay-gated designs begin to show savings over MOSFET-gated
designs.
As mentioned previously, relay reliability is improved by the use of hard metals, which results
in relatively higher contact resistance. For a given relay size, this resistance limits the current
density that an array of relay power gates can deliver while maintaining the optimal voltage
drop. As shown in Figure 3-5, using 2nd generation 4T relays with ~100µm device pitch,
MEM power gating can be applied to CMOS chips with current densities on the order of mA/mm².
Figure 3-5: Maximum logic current density as a function of MOSFET power-gate technology node
and MEM relay device pitch. For MOSFET power gates, cases with power-gate device area
overhead of 1% and 10% of the design area are shown. MEM relay power gates are assumed to be
fabricated in the backend metallization layers, and thus incur no active area penalty [33].
However, power gates built from moderately scaled relays with a device pitch of 20µm, the
scaled 6T devices, would support > 10-100mA/mm² and would still fit into the same area as
the CMOS chip they are driving. The relays could therefore be post-fabricated on top of the
chip or integrated into the backend metallization layers with no penalty in the overall die area.
For comparison, MOSFET power gates in older technology nodes (0.18µm and 0.25µm) that
are most competitive with relays in terms of leakage suppression would provide similar current
density, while consuming ~10% of the active area of the design. It is interesting to note that
relays built with a 90nm lithography node could potentially far surpass the current-density
handling ability of the CMOS power-gates built in the same node.
3.3 Experimental Demonstration
To experimentally demonstrate the feasibility of power-gating with current relay technology,
we applied MEM relay power gating to a 90nm CMOS chip operating at VDD = 0.6-1V
(Ion = 10-25µA). Figure 3-6 illustrates the waveforms of the MEM relay power-gating this
chip with gate-to-body voltage Vgb swinging between 5 and 7V. In this figure the inset
Figure 3-6: Measured waveforms for MEM relay-gated CMOS chip [4]: MEM relay
gate voltage (top), supply voltage of the CMOS chip (middle), and synchronization signal
from the CMOS chip (bottom).
Figure 3-7: Test setup for self-driven and external power gate pulse generation for the relay
gating switch illustrated in Figure 3-1.
indicates the chip's correct I/O activity during Ton. As this CMOS chip was not originally
designed to support MEM relay power gating, in the off-state the supply is limited by I/O
ESD diode clamps to ~ 300mV.
Beyond driving the power switches with an externally generated signal, Figure 3-7 shows a
simple MEM relay-based timer circuit. The timer is based on a single-relay oscillator that
enables autonomous operation, and was used to gate a CMOS chip as shown in Figure 3-8.
Although explicit capacitors would be removed in an optimized timer, the current design uses
adjustable RC elements to allow for tuning of period and duty cycle.
Figure 3-9 illustrates the packaged MEM relay chip with devices configured to implement
relay power gate and timer circuits. The inset shows a zoomed-in view of one of the 4T relay
devices from this chip [33].
Figure 3-8: Measured waveforms for MEM relay timer-gated CMOS chip [35]: power-gating
input (Vosc) and gated CMOS VDD versus time, at 3% duty cycle.
Figure 3-9: Packaged MEM relay die photo (power gates and timer devices labeled) and relay
device micrograph (approximately 85µm).
3.4 Summary
This chapter showed that due to their negligibly low leakage, in certain applications, chips
utilizing power gates built even with today's relatively large, high-voltage
micro-electro-mechanical (MEM) relays can achieve lower total energy than those built with CMOS
transistor sleep devices. A simple analysis provides design guidelines for off-time and savings
estimates as a function of technology parameters, and quantifies the further benefits of scaled
6T MEM relays. The comparative analysis illustrated that the 2nd generation
(improved 4T) MEM relays, even with their modest state of technology (~7V voltage and
100µm pitch), can provide energy-reduction benefits over MOSFET power gates for
off-periods > 1ms. With relays scaled to current mass-produced MEMS device dimensions
(~15µm), the minimum off-period for which MEM relays provide energy-reduction benefit
reduces to 1-10µs and current densities greater than 10-100mA/mm² can be supported.
Finally, in this chapter we demonstrated a relay chip successfully power-gating a CMOS chip,
and showed the first relay-based timer suitable for self-timed operation.
CHAPTER 4
Complex Logic Design with MEM Relays
In the previous chapters the potential of Micro-electro-mechanical (MEM) relays as a
promising alternative class of switching devices for digital applications was discussed. MEM
relays offer a solution to overcome the minimum energy limitations of CMOS circuits,
because they have virtually zero off-state current, on-to-off current ratios of ten orders of
magnitude and very abrupt switching slope.
Although the 4T and 6T relays are functionally similar to CMOS, the mechanical movement
renders them slower than CMOS transistors. This attribute makes the implementation of
relay-based VLSI circuits with competing performance challenging, and motivates us to
explore alternative circuit design techniques compared to conventional CMOS styles.
This chapter discusses a circuit design methodology tailored to MEM relays for narrowing the
performance gap at the system level, rather than the device level, by implementing the functional
units as large complex logic gates instead of staged logic, and hence minimizing the mechanical
delay on the critical path [9]. The main challenge in utilizing this approach on large logic
functions is the appropriate logic partitioning to minimize the number of mechanical delays
on the critical path, while maintaining reasonable design area. In this chapter we describe the
optimal logic partitioning techniques and circuit design strategies for MEM relay multipliers
along with the analysis of various design trade-offs.
4.1 Basic MEM Relay Logic Design
4.1.1 MEM Relay as a Logic Element
As we discussed in the previous chapters, the pull-in voltage (Vpi) in MEM relays is similar to
the threshold voltage of a CMOS transistor. The MEM relay switch turns on when the magnitude
of the gate-to-body voltage becomes larger than Vpi. In that regard, the operations of MEM relays
and CMOS switches are similar. However, there are two main differences between their logic
styles. First, the MEM relay is an ambipolar switch, meaning that either negative or positive Vgb
can pull the relay in. As a result, with proper biasing, a single relay can operate as both an
N-relay (when the body is grounded) and a P-relay (when the body is at Vdd). Second, the pull-in
voltage is independent of source and drain voltages, so both N-relay and P-relay can be used as
pull-up (to Vdd) and pull-down (to ground) devices and, unlike CMOS, we will be able to implement
inverting and non-inverting logic using relays, as illustrated in Figure 4-1. It should be noted
that only MEM relays of the 2nd generation and later benefit from these two important
characteristics (Section 2.5).
4.1.2 MEM Relay Logic Design Paradigm
In Chapter 2 we mentioned that MEM relays and CMOS have similar functionality for logic
design, but their switching characteristics are radically different. As a result, for designing an
Figure 4-1: MEM relay as a logic element (N-relay and P-relay symbols, with buffer and
inverter implementations).
optimized relay-based circuit we cannot simply pick the CMOS equivalent and replace each
transistor with a relay. The reason is that in CMOS circuits the delay of devices stacked in
series grows quadratically with the stack length (Elmore delay). Hence circuit designers tend to
avoid long stacks of transistors, and instead buffer and distribute the logical and electrical
effort over many stages of simpler logic gates [36-37]. This is not an optimal solution for relay
circuits, because the significant disparity between the mechanical and electrical time constants
favors circuits in which all mechanical movements happen simultaneously, even if this
requires stacking many relays. In other words, the main idea behind designing fast
relay-based logic is to avoid partitioning and buffering the logic as much as possible, because
those require driving gates, and each time we drive a gate, we pay an additional mechanical
delay penalty.
Figure 4-2 clarifies the difference between CMOS and MEM-relay logic design styles. A
Figure 4-2: Comparison of CMOS (a) and MEM relay (b) logic styles for a 4-input AND function:
(a) 30 transistors, 4 gate delays; (b) 12 MEM relays, 1 mechanical delay.
simple substitution of CMOS transistors with relays in a standard CMOS 4-input AND logic
circuit (Figure 4-2a) would result in 4 mechanical delays as each signal driving a gate
introduces an additional mechanical delay. In the optimized relay design of the same function,
shown in Figure 4-2b, all the actuation activities happen at the same time, resulting in only
one mechanical delay. Thus, given a logic function, the preferred design strategy is a pass-
transistor style. Another interesting observation here is that the number of relays required for
implementation of a specific function is smaller than the number of transistors for the same
function, necessitating area comparison at the function block, rather than device level.
Figure 4-3 illustrates the difference of two approaches, static-gate and pass-gate logic styles,
Figure 4-3: Delay of cascaded 2-input AND gates in series (Gate 1 to Gate N), implemented using
relays in both static and pass gate logic styles, plotted versus the number of stages.
for a cascaded series of AND gates implemented with the predictive 4T model (Table 2-1). In
the static-gate example (Figure 4-3b) the output of each AND gate drives the gate terminal of
the next stage, and the mechanical delay accumulates along the path. In contrast, in the
example of Figure 4-3c the output of each AND gate propagates through the
source-channel-drain path of the following stage. Therefore we only see the quadratically
growing (Elmore) electrical delay, and this approach is radically faster than the static-gate
logic style. In addition to superior performance, the pass-gate style implementation requires
only half the number of relays as the static-gate style design³.
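The contrast between the two styles can be illustrated with a first-order delay model. The sketch below is an assumption-laden approximation, not the simulation behind Figure 4-3: static-gate delay is modeled as one mechanical delay plus one RC per stage, pass-gate delay as a single mechanical delay plus the Elmore delay of a uniform RC ladder. The mechanical delay and on-resistance are taken from the predictive range in Table 2-1, while the capacitance values are hypothetical.

```python
def static_gate_delay(stages, t_mech, r_on, c_gate):
    """Each stage drives the next gate: one mechanical delay plus one RC per stage."""
    return stages * (t_mech + r_on * c_gate)


def pass_gate_delay(stages, t_mech, r_on, c_node):
    """All relays actuate together; the signal then crosses an RC ladder (Elmore delay)."""
    elmore = r_on * c_node * stages * (stages + 1) / 2
    return t_mech + elmore


T_MECH = 20e-9    # lower end of the predictive t_mech range in Table 2-1
R_ON = 2.5e3      # mid-range predictive on-resistance
C_GATE = 3.2e-15  # hypothetical gate load per stage
C_NODE = 1e-15    # hypothetical capacitance per internal stack node

for n in (1, 5, 10, 20):
    s = static_gate_delay(n, T_MECH, R_ON, C_GATE)
    p = pass_gate_delay(n, T_MECH, R_ON, C_NODE)
    print(f"{n:2d} stages: static {s * 1e9:7.1f} ns, pass-gate {p * 1e9:7.2f} ns")
```

In this model the static-gate delay grows linearly with the mechanical delay per stage, whereas the pass-gate delay stays close to a single mechanical delay until the quadratic electrical term becomes comparable to it.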
The upper bound on the number of MEM-relay devices in a stack is reached when the
electrical and mechanical delays are equal. Figure 4-4 compares the electrical delay of a stack
of relays with the mechanical delay, for our current 4T, 6T and predictive 90nm MEM-relays.
The mechanical delay is obtained for a reasonable range of Vgb overdrives. Figure 4-4 shows
Figure 4-4: Mechanical vs. stack electrical delay for second generation 4T, third generation
6T and predictive 90nm MEM relays (delay versus stack length; mechanical delay curves are
labeled VGB = Vpi and VGB = 3Vpi).
³ Data courtesy of Fred Chen [26], with adjustments made according to Table 2-1.
that the proposed design approach is extendable to stacks of hundreds of pass-gate style relays
and consequently encompasses most practical logic functions [32,38]. An important
consideration for designing tree-like logic circuits with long stacks of devices, like the
multipliers and compressors discussed later in this chapter, is to guarantee that the adjacent
paths are mutually exclusive in order to avoid crowbar (short circuit) current. As a result, the
effective timing margin is equal to $t_{mech,on} - t_{mech,off} - t_{elec,total}$. It should be noted that process or
temperature variations affect both electrical and mechanical delays of relays. For example, the
pull-in voltage of the original 4T relay fluctuates between 8V and 9.2V within a temperature
range of 30° to 200°C, which results in 10-15% variation in mechanical delay. For the same
range, Ron can reach a maximum of 15 kΩ [22]. So, in order to have a safe timing margin,
the length of stacks should be 30-40% smaller than the values suggested in Figure 4-4, which
is still larger than the length of stacks in the logic units discussed in the rest of this chapter.
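As a rough illustration of how the stack-length bound and the 30-40% derating interact, the sketch below treats a stack of N identical relays as a uniform RC ladder and uses the Elmore approximation for its electrical delay. The model, the function names and every numerical value (`T_MECH`, `R_ON`, `C_NODE`) are illustrative assumptions; they are not the device parameters or the model behind Figure 4-4.

```python
def elmore_stack_delay(n, r_on, c_node):
    """Elmore delay (s) of n series relays, each with on-resistance r_on (ohm)
    and a lumped capacitance c_node (F) at its output node."""
    # Node k (k = 1..n) sees k * r_on of upstream resistance.
    return r_on * c_node * n * (n + 1) / 2


def max_stack_length(t_mech, r_on, c_node, derating=0.0):
    """Largest n whose electrical delay does not exceed the mechanical delay,
    optionally shortened by a safety derating (for example 0.3 to 0.4)."""
    n = 0
    while elmore_stack_delay(n + 1, r_on, c_node) <= t_mech:
        n += 1
    return int(n * (1.0 - derating))


# Hypothetical values, chosen only to exercise the functions.
T_MECH = 100e-9   # mechanical delay, s
R_ON = 1e3        # relay on-resistance, ohm
C_NODE = 20e-15   # capacitance per internal node, F

n_ideal = max_stack_length(T_MECH, R_ON, C_NODE)
n_safe = max_stack_length(T_MECH, R_ON, C_NODE, derating=0.4)
print(n_ideal, n_safe)
```

With these assumed numbers the bound lands near one hundred relays before derating, which is the same order of magnitude as the stack lengths discussed above; a different delay model would shift the exact value.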
4.2 MEM Relay Multiplier Design
Multiplication is among the most complex arithmetic operations in a digital system and the
performance of many computational units is limited by the delay of this operation. In order to
satisfy the ever-growing need for performance in VLSI applications, today's microprocessors
integrate many dedicated multiplier units. As mentioned earlier, due to the disparity between
the electrical and mechanical delays, the performance of a MEM relay based circuit is
essentially bounded by the relay's mechanical delay, and relay circuits pay very little penalty in
total delay for added complexity, provided that this additional complexity does not introduce
mechanical delay overhead. As a result, for a complex arithmetic block such as a multiplier
the performance gap between the relay and CMOS implementations can be narrowed, while
the potential energy benefits of the relay implementation promote it as a promising
alternative. However, unlike with adders, where the Manchester carry chain topology directly
benefits from the relay device properties to enable single mechanical delay operation [9], the
multiplier arithmetic requires careful logic partitioning and delay-area trade-offs to minimize
the number of mechanical delays.
4.2.1 Microarchitecture of MEM Relay Multiplier
A multiplication function is normally implemented by a sequence of conditional addition and
shift operations. The first stage of multiplication is to generate the matrix of partial products
of the multiplicand and multiplier bits. We have implemented this with two different
approaches: a simple AND network, and radix-4 Booth encoding enabled partial products
generation. The detailed description of both methods and the corresponding circuit blocks will
be discussed in the next sub-section. In the next stage, the partial products need to be shifted
and added together until the final result emerges, i.e. the depth of the matrix is reduced from
N (for an N-bit multiplier) to 1. Figure 4-5 shows the operation for a 6-bit multiplication.
The main opportunity for innovation in relay multiplier design comes from the logic for the
partial products matrix reduction. The most straightforward solution for this operation is
"logarithmic compression," which puts the partial products in groups of two, adds them up,
groups the results in the same manner and repeats this sequence until the final result is achieved.
Despite its simplicity, this method requires large adders, and always has an accumulated
mechanical delay of 1 + log N for compressing a partial product matrix with a depth of N, even
when using the single mechanical delay ripple-carry adders introduced in [39].
              1 0 1 0 1 1   Multiplicand
              1 1 1 0 1 0   Multiplier
              -----------
              0 0 0 0 0 0
            1 0 1 0 1 1
          0 0 0 0 0 0       Partial Products
        1 0 1 0 1 1
      1 0 1 0 1 1
+   1 0 1 0 1 1
  -----------------------
  1 0 0 1 1 0 1 1 1 1 1 0   Product
Figure 4-5: A 6-bit multiplication operation
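The two stages described above, partial product generation with an AND network followed by shifting and adding, can be sketched behaviourally as follows. This is only a software model of the arithmetic with hypothetical function names; it says nothing about relay counts or mechanical delays.

```python
def partial_products(multiplicand, multiplier, n_bits):
    """Return the n_bits rows of the partial product matrix.

    Row i is the multiplicand ANDed with bit i of the multiplier,
    already shifted left by i positions."""
    rows = []
    for i in range(n_bits):
        bit = (multiplier >> i) & 1
        rows.append((multiplicand if bit else 0) << i)
    return rows


def multiply(multiplicand, multiplier, n_bits):
    """Add up the shifted partial products (the matrix reduction stage)."""
    return sum(partial_products(multiplicand, multiplier, n_bits))


a, b = 0b101011, 0b111010            # the operands of Figure 4-5
rows = partial_products(a, b, 6)
product = multiply(a, b, 6)
print(format(product, "012b"))
```

For the operands of Figure 4-5 the printed bit string is 100110111110, the product shown in the figure.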
Although ideally we would like to implement the partial products reduction function such that
it only incurs one mechanical delay, even the most advanced commercial CMOS or MEM
relay optimized custom synthesis tools [40] cannot achieve this goal without using hundreds
of thousands of MEM relays. In order to avoid such unrealistic designs, the logic needs to be
deliberately partitioned, meaning that extra mechanical delay should be inserted into the
system to enable simplification of the logic. Figure 4-6 shows the microarchitectures of a 6-bit
multiplier implemented with two optimized partial products reduction techniques explored in
this work. The first approach, illustrated in Figure 4-6a, uses small blocks such as half- and
full-adders to achieve simplicity and lower relay counts. For such an implementation, each of
the small blocks needs to operate in one mechanical delay, and also have an electrical path
(source-drain pass-through) from input to output. The electrical propagation paths allow for
stacking of half- and full-adders without additional mechanical delays inside each reduction
step.
[Figure 4-6 diagrams: in each panel the partial product generation matrix is reduced to the multiplication result, with mechanical and electrical propagation marked; panel (a) uses FA and HA blocks, panel (b) uses (N:3) compressor blocks.]
Figure 4-6: 6-bit relay multiplier built with (a) half- and full-adders, and (b) (N:3)
compressors
Figure 4-6b shows the second microarchitecture, which utilizes larger compressor blocks to
decrease the total number of reduction steps and hence minimizes the total mechanical delay.
In each reduction step, (N:3) compressors are stacked in such a way as to avoid paths with
mechanical delays. An (N:3) compressor, described in Table 4-1, simply sums N input bits
(4 ≤ N ≤ 7) and generates 3 output bits, one in the same position (j), and two in higher positions
(j+1 and j+2). Similar to the building blocks of the previous approach, the compressors need
to be able to perform their functions in one mechanical delay time, and also have an electrical
pass-through from one of their inputs to the outputs to facilitate zero-mechanical-delay
stacking. The impact of using higher ratio compressors is highlighted in Figure 4-6, where the
example 6-bit multiplier built with (N:3) compressors compromises area to reduce the number
of mechanical delays from 4 to 3 (including one mechanical delay for partial product
generation). This can easily be generalized to larger multipliers as we will discuss later in this
chapter. The microarchitectures of 16-bit multipliers implemented with both approaches are
illustrated in Appendix A.
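The behaviour of an (N:3) compressor can be summarized in a few lines. The sketch below is a purely behavioural model with a hypothetical function name; it assumes 4 to 7 input bits and ignores the relay-level requirements (single mechanical delay, electrical pass-through) discussed above.

```python
def compress_n3(bits):
    """Behavioural (N:3) compressor: count the ones among 4 to 7 input bits
    of column j and return the count as (Y2, Y1, Y0).

    Y0 stays in column j, Y1 moves to column j+1 and Y2 to column j+2."""
    if not 4 <= len(bits) <= 7:
        raise ValueError("this sketch assumes 4 to 7 input bits")
    total = sum(bits)
    return (total >> 2) & 1, (total >> 1) & 1, total & 1


y2, y1, y0 = compress_n3([1, 0, 1, 1, 0, 1, 1])   # five ones in the column
print(y2, y1, y0)
```

Because the three outputs carry weights 4, 2 and 1, the value `4*y2 + 2*y1 + y0` always equals the number of ones at the input, which is what allows up to seven rows of the matrix to be replaced by three in one reduction step.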
4.2.2 MEM Relay Multiplier Components Design
The most important design criterion for components of the MEM relay multiplier is their
"stackability," meaning that at each stage of multiplication (i.e. between intentional
mechanical delay insertions), the components should be cascaded through source-channel-
drain paths (similar to Figures 4-2b and 4-3c). In other words, an electrical path must exist
between the input and output of each stage.
4.2.2.1 Partial Product Generation
The easiest solution for creating the partial products matrix is to find all bit products of the
multiplier and multiplicand with an AND network. The implementation of an AND gate with
MEM relays is shown in Figure 4-7a. For an N-bit multiplier, N² of these blocks are required.
As we mentioned earlier, using more complex compressors with a larger radix introduces an
area/delay trade-off. A technique that can potentially benefit both area and delay is a modified
radix-4 Booth encoded partial products generation, which reduces the total number of partial
products by half. For Booth encoded partial product generation, instead of shifting and adding
for every column of the multiplier term (N) and multiplying by 1 or 0, every other column is
chosen and according to its adjacent bits and the logic table in Figure 4-7b, the multiplicand
(M) is multiplied by ±1, ±2, or 0, and every new entry in the matrix is shifted by two bits to
the right instead of one [41]. The corresponding relay-based partial product generation circuit
is shown in Figure 4-7c. Although it adds one mechanical delay to the partial product
generation step and uses 10x more relays compared to the AND network, we will see later in
this chapter that the Booth encoded implementation proves to be effective in reducing the overall
complexity and delay for larger multipliers.
[Figure 4-7 diagrams (a) and (c): relay circuits; panel (c) marks the extra mechanical delay.]

(b) Booth encoding table

Multiplier Bits Block          Partial Product
(N2i+1, N2i, N2i-1)
000                             0
001                            +1 × Multiplicand
010                            +1 × Multiplicand
011                            +2 × Multiplicand
100                            -2 × Multiplicand
101                            -1 × Multiplicand
110                            -1 × Multiplicand
111                             0

Figure 4-7: Partial products matrix generation: (a) MEM relay AND network, (b) Booth
encoding algorithm table, (c) MEM relay Booth encoded partial products generation
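The encoding table of Figure 4-7b can be exercised with a short behavioural sketch. It assumes the usual radix-4 Booth convention, in which the multiplier is read as an even-width two's-complement number with an implicit 0 below the least significant bit; the function names are hypothetical and the sign handling of the actual relay circuit is not modelled.

```python
def booth_radix4_digits(multiplier, n_bits):
    """Recode an n_bits two's-complement multiplier into radix-4 Booth digits.

    Digit i is selected by the overlapping triplet (b[2i+1], b[2i], b[2i-1]),
    with b[-1] = 0, and takes a value in {-2, -1, 0, +1, +2}."""
    if n_bits % 2:
        raise ValueError("n_bits must be even for this sketch")
    table = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
             0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}
    padded = (multiplier & ((1 << n_bits) - 1)) << 1   # append b[-1] = 0
    return [table[(padded >> (2 * i)) & 0b111] for i in range(n_bits // 2)]


def booth_multiply(multiplicand, multiplier, n_bits):
    """Sum the Booth partial products; digit i carries a weight of 4**i."""
    digits = booth_radix4_digits(multiplier, n_bits)
    return sum(d * multiplicand * 4 ** i for i, d in enumerate(digits))


digits = booth_radix4_digits(26, 6)   # 26 = 011010 in 6 bits
print(digits, booth_multiply(43, 26, 6))
```

A 6-bit multiplier yields three digits instead of six rows, which is the halving of the partial product count mentioned above.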
4.2.2.2 Half- and Full-Adder
A MEM relay-based half adder can be designed with simple MEM relay AND/XOR gates, as
shown in Figure 4-8a [32]. The electrical paths start from inputs A0 and $\overline{A_0}$ and reach the Sum and
Carry outputs. The implementation of a full-adder with 12 relays is described in [9]. The entire
circuit, including the propagate, generate and kill functions, operates in one mechanical
delay and has multiple electrical paths from input to output, one of which is highlighted in
Figure 4-8b.
Figure 4-8: Implementation of MEM relay (a) half adder and (b) full adder [8]
4.2.2.3 Large Compressors
The logic function and truth table of an (N:3) compressor are shown in Table 4-1. The circuit
comprises 3 sub-circuits for generating the Y0, Y1 and Y2 bits. Each of these sub-circuits has 7
input bits, of which we have assigned 6 to drive the gates, and one to pass through and create
the electrical path. These electrical paths are provided through A0 / $\overline{A_0}$ source/drain
connections to Yj / $\overline{Y_j}$, where j = 0, 1, 2.
According to Table 4-1, the LSB (Y0) is a 7-input XOR gate. As shown in Figure 4-9b, this
sub-circuit can be built by cascading 6 two-input relay XOR gates shown in Figure 4-9a. An
alternative to this implementation is shown in Figure 4-9c, where the body terminal is used to
reduce the total number of relays from 24 to 12. However, since driving the body terminal
with active signals was not possible due to practical concerns related to the layout of the early
generations of MEM relay [11], and also we wanted to avoid using complements of signals as
much as possible, we decided to use the circuit in Figure 4-9b for the first chip
implementation.
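That a chain of six two-input XOR gates produces the LSB of the 7-input sum can be checked exhaustively. The sketch below is a logic-level check only, with a hypothetical function name; it does not model the relay topology of Figure 4-9.

```python
from functools import reduce
from itertools import product
from operator import xor


def y0_cascade(bits):
    """Y0 as a chain of two-input XORs: ((((((A0^A1)^A2)^A3)^A4)^A5)^A6)."""
    return reduce(xor, bits)


# Exhaustive check over all 128 input vectors of a (7:3) compressor:
# the XOR chain must equal the least significant bit of the bit count.
all_match = all(y0_cascade(v) == (sum(v) & 1)
                for v in product((0, 1), repeat=7))
print(all_match)
```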
The steps toward implementation of the Y1 and Y2 sub-circuits and the final designs are
illustrated in Figure 4-10 and Figure 4-11, respectively. In both cases, a (5:3) compressor is
used for illustration of the different steps of the implementation. Figure 4-10a shows the
propagation path logic for Y1, where A0 is passed to the output when $\sum_{i=1}^{4} A_i = 1$ and $\overline{A_0}$ is
passed when $\sum_{i=1}^{4} A_i = 3$. Figure 4-10b shows the integration of generate and kill paths,
which happen for the cases of $\sum_{i=1}^{4} A_i = 2$ and $\sum_{i=1}^{4} A_i = 0$ or 4, respectively. Using a similar
method, the Y1 sub-circuit of a full (7:3) compressor can be built, as shown in Figure 4-10c.
Symmetry in the Y1 sub-circuit and fully complementary paths enable sharing of intermediate
nodes to implement $\overline{Y_1}$ by adding only 8 relays to the Y1 design (Figure 4-10d).

Table 4-1: Logic function and truth table of an (N:3) compressor

Logic function: the output bits Y2 Y1 Y0 encode $\sum_{i=0}^{N-1} A_i$ in binary.

$\sum_{i=0}^{N-1} A_i$    Y2   Y1   Y0
0                          0    0    0
1                          0    0    1
2                          0    1    0
3                          0    1    1
4                          1    0    0
5                          1    0    1
6                          1    1    0
7                          1    1    1
Figure 4-9: Implementation of (a) a two-input XOR/XNOR gate, (b) the Y0 sub-circuit and (c) the
optimized Y0 sub-circuit with signals on body terminals
Figure 4-10: Steps toward implementation of the Y1 sub-circuit of a complementary (7:3)
compressor: (a) propagation paths for Y1 of the (5:3) compressor, (b) Y1 of the (5:3)
compressor, (c) Y1 of the single output (7:3) compressor and (d) Y1 of the full complementary
(7:3) compressor
Figures 4-11a,b show the implementation of propagate and kill/generate sections of the Y2
sub-circuit for a (5:3) compressor using similar methods. Here $\sum_{i=1}^{4} A_i = 3$ propagates A0,
$\sum_{i=1}^{4} A_i > 3$ generates and $\sum_{i=1}^{4} A_i < 3$ kills the output. Figure 4-11c shows the full Y2 circuit
for a (7:3) compressor.
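The propagate, generate and kill conditions for Y1 and Y2 can be checked against the truth table of Table 4-1 with a small exhaustive test. In this sketch A0 is the pass-through input and the remaining bits drive the gates. The conditions for the (5:3) case follow the text; the use of the gate-input count modulo 4 to extend the Y1 conditions to seven inputs is an assumption made here for illustration, and the function names are hypothetical.

```python
from itertools import product


def y1_pgk(a0, gate_bits):
    """Y1 from propagate / generate / kill conditions on the gate inputs."""
    s = sum(gate_bits) % 4        # assumed: the conditions repeat every 4 counts
    if s == 1:
        return a0                 # propagate A0
    if s == 3:
        return 1 - a0             # propagate the complement of A0
    return 1 if s == 2 else 0     # generate (s == 2) or kill (s == 0)


def y2_pgk(a0, gate_bits):
    """Y2 from propagate / generate / kill conditions on the gate inputs."""
    s = sum(gate_bits)
    if s == 3:
        return a0                 # propagate A0
    return 1 if s > 3 else 0      # generate above 3, kill below 3


def check(n_inputs):
    """Compare both sub-circuits with the binary count of all n_inputs bits."""
    for a0, *gates in product((0, 1), repeat=n_inputs):
        total = a0 + sum(gates)
        if y1_pgk(a0, gates) != (total >> 1) & 1:
            return False
        if y2_pgk(a0, gates) != (total >> 2) & 1:
            return False
    return True


print(check(5), check(7))
```

Both checks pass, so routing a single input through the source/drain path while the other inputs only drive gates is sufficient to compute the two upper output bits.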
Figure 4-11: Steps toward implementation of the Y2 sub-circuit of a (7:3) compressor: (a)
propagation paths for Y2 of the (5:3) compressor, (b) Y2 of the (5:3) compressor, (c) Y2 of the
(7:3) compressor, (d) Y2 of the (7:3) CMOS compressor [42].
As can be seen in the compressor circuits, $\overline{A_0}$ is the only complementary signal required for
the operation of the multiplier, and even the more area-efficient Y0 of Figure 4-9c requires the
complements of only half of its inputs. In the proposed MEM relay compressor, those bits are
generated by $\overline{Y_0}$ and $\overline{Y_1}$ of previous or same stages, and $\overline{Y_2}$ is not implemented. The reason is
that the added area and energy overhead is relatively small for building $\overline{Y_0}$ and $\overline{Y_1}$ due to their
symmetric designs. On the other hand, Y2 lacks that symmetry, and hence implementation of
$\overline{Y_2}$ requires twice the number of relays. As a result, the omission of the $\overline{Y_2}$ sub-circuit reduces the
MEM relay count by 30, bringing the total relay count for a (7:3) compressor to 98.
If the proposed (N:3) compressor family is used to build a multi-stage 32-bit multiplier based
on the microarchitecture shown in Figure 4-6b the total delay would be 5 mechanical delays.
By comparison, a direct MEM-relay translation of the CMOS (7:3) compressor's MSB (Y2),
proposed in [42] and shown in Figure 4-11d, effectively requires two mechanical delays, and
would result in 19 accumulated mechanical delays in the critical path of a 32-bit multiplier,
even when utilized in an optimized microarchitecture, like the one shown in Figure 4-6b. In
the proposed design, the mechanical delay corresponding to the top PMOS gate in Figure 4-
11d is eliminated by direct implementation of kill/generate paths, and stacking has been
enabled by guaranteeing an electrical path from input to output of each stage of partial
products compression.
4.2.2.4 Redesigning the Multiplier Components for 6T MEM Relay
In addition to 30x smaller area and improved energy figures and yield, the 6T MEM relay
benefits from having two independent drain/source terminals, which helps in cutting the total
number of relays for symmetric topologies to approximately half. Figure 4-12 shows how the
6T relay implementation of the half- and full-adders reduces the relay count compared to the
4T implementations shown in Figure 4-8.
Figure 4-12: (a) Half-adder and (b) full-adder built with the 6T MEM relay
The partial product generation does not benefit much from this transition, as neither the AND
network nor the Booth encoding solutions have many common gate-body controls (except for
the top two gates driven by N2, and the one driven by M in Figure 4-7c, which will reduce the
count from 20 to 16).
For the compressors, the designs of Figures 4-9, 4-10 and 4-11 can be updated by finding the
common gate-body controls and sharing the paths. The full (7:3) compressor built with the 6T
MEM relay is shown in Figure 4-13. By using 6T relays and sharing the devices, the total
relay count has been reduced to 46, compared to 98 used in the original 4T implementation.
4.2.3 Multiplier Design Trade-offs
Figure 4-14 summarizes the area/delay trade-offs described in this work for a broad range of
multiplier designs. The results show that the applications targeting performance may benefit
from utilizing larger compressors. These benefits are even more significant for larger
multipliers. However, for the applications with area constraints, the best approach would be
utilization of half- and full-adders. That is also the case for smaller multipliers, for which
larger compressors offer no significant delay reduction and add unnecessary complexity.
Figure 4-13: The (7:3) compressor built with the 6T MEM relay
For smaller multipliers (N ≤ 16), the relative overhead of the Booth encoded partial products
generation circuit, compared to the AND network solution, is considerable and the
performance benefit of a smaller initial matrix is diminished by the additional mechanical delay
of the Booth encoding process. So, the AND network is the right choice for those multipliers
in terms of both area and performance. The expected relay count reduction of Booth enabled
design can be realized only in larger multipliers. As an example, the area of a 32-bit multiplier
which uses large compressor microarchitecture and 6T relays can be reduced by 40% if Booth
encoding is utilized instead of the AND bit product network, without paying any additional
mechanical delay penalty.
Regardless of the microarchitecture and partial product generation technique, implementation
of multipliers with 6T MEM relays enables ~40% reduction in the total relay count.
[Figure 4-14 plots: results versus multiplier size (8, 16, 32 and 64 bit) for AND network PP generation and Booth enabled PP generation. Legend: Logarithmic Compression, Half- and Full-adders, Large Compressors, Optimized CMOS Style.]
4.3 Energy/delay Estimates of MEM Relay vs. CMOS Multipliers
In order to benchmark the 16-bit multipliers built with 90nm equivalent 4T and 6T MEM
relays introduced in Chapter 2, they are compared to two optimized 16-bit CMOS multipliers
in 90nm and 45nm technology nodes.
The first CMOS multiplier employs optimally tiled compressor tree architecture (OTCT) with
radix-4 Booth encoding and an arrival-profile aware completion adder [46]. This multiplier is
built in a 90nm CMOS technology and its total area is 0.03 mm². The energy/throughput
values have been extracted for various reported operation voltages and frequencies.
The second CMOS multiplier is designed using the Dadda tree algorithm [47] with Han-Carlson
adders [48], and placed and routed using the Nangate 45nm Standard Cell Library
[49], resulting in a total area⁴ of 0.014 mm². The energy/delay curves are obtained by scaling
the supply voltage between 0.7V and 1.4V.
A few variations of MEM relay-based 16-bit multipliers are selected for this analysis. The
microarchitectures of these multipliers are shown in Appendix A and Table 4-2 summarizes
the specifications of each implementation. In all implementations the AND network is
preferred to Booth encoding, as the latter offers no area or speed advantages. The area of the
predictive relay devices (both 4T and 6T) is 11 µm².
The energy/delay curves of all implementations are shown for the operating voltage in the
range of 2Vpi to 6Vpi. The simulations are based on the parameters of Table 2-1, which is a
modified version of the MEM-relay model described in [14], and the mechanical delay derivation
in [13]. Since the VerilogA model is not yet optimized for simulation of complex logic with
standard CAD tools, an analytic model of all multiplier blocks has been developed in
MATLAB by calculating the activity factor of all nodes in the circuit, taking into account all
parasitic and interconnect capacitances to improve the accuracy of analysis.
⁴ Corrected value of the area presented in [32].
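The idea behind an activity-factor based energy estimate can be sketched as follows. This is not the MATLAB model used in this work: it assumes the textbook switching-energy expression (activity factor times capacitance times supply voltage squared, summed over nodes), it is written in Python for convenience, and the node list is entirely hypothetical.

```python
def energy_per_op(nodes, v_dd):
    """Switching energy per operation: sum of alpha * C * V^2 over all nodes.

    nodes is an iterable of (activity_factor, capacitance_in_farads) pairs."""
    return sum(alpha * cap * v_dd ** 2 for alpha, cap in nodes)


# Hypothetical node list: (activity factor, capacitance in F).
NODES = [
    (0.50, 10e-15),   # e.g. a gate node with its interconnect
    (0.25, 20e-15),   # e.g. a longer internal wire
    (0.10, 50e-15),   # e.g. an output node
]

for v_dd in (1.0, 2.0):
    print(v_dd, energy_per_op(NODES, v_dd))
```

Under this assumption the energy per operation scales with the square of the supply voltage, which is why the curves are reported over a range of operating voltages.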
The energy-delay trade-offs (with supply-scaling) of both CMOS and MEM-relay 16-bit
multipliers are shown in Figure 4-15. The CMOS multipliers reach their minimum energy
point for delays greater than 50ns. As a result, the scaled MEM relay multipliers on average
offer ~10x better energy-efficiency over CMOS multipliers for sub-10 MOPS operation. As
predicted, the 6T implementation is able to achieve even better energy/op figures due to lower
switching energy and smaller overall area. As shown in Figure 4-15, a further trade-off of
increasing area to parallelize multipliers enables operation in the GOPS region, while
preserving the energy-efficiency. It should be noted that for relays of today, for example the
scaled 6T device, the minimum energy would be ~20 pJ/op and ~16 pJ/op for the large compressor
and half-/full-adder-based microarchitectures, respectively.
In order to gain more insight about the merits of the relay based implementations of Table 4-2 and
make a fair comparison between microarchitecture and device choices, it is useful to study the
energy/throughput curves for equal implementation area. Figure 4-16 illustrates such
comparisons between a few pairs of implementations, parallelized accordingly to have the
same area. Figure 4-16a shows that for the same area of ~0.260 mm², obtained by 3-way
parallelization of 4T/large compressor (N:3) design and 4-way parallelization of 4T/HA-FA
design, the latter approach is preferred in terms of energy-efficiency and throughput. A
similar comparison for implementation with 6T devices is shown in Figure 4-16b, but in this
case the large compressors are more energy efficient for applications with throughputs over
50 MOPS. Finally, Figure 4-16c compares two similar architectures (small counters: HA/FA)
implemented with two different versions of the 6T device, one with standard spring flexure
(L = 3.5 µm) and the other with extended spring flexure (L = 4 µm). The longer flexures offer
a smaller spring constant and hence lower Vpi and energy/op. But for a similar voltage overdrive
relative to Vpi, the long flexure device has longer mechanical delay (equations 2.4, 2.8 and 2.12) and also the total
implementation area of the circuit is larger, due to the 15% increase in device area. Considering
all these trade-offs, Figure 4-16c suggests that for the same area of 0.19 mm², the standard
flexure 6T implementation is more energy-efficient for throughputs over ~30 MOPS (with 4-way
parallelization), while the long flexure implementation is preferred for ultra-low power
applications with sub-20 MOPS throughput figures.

Table 4-2: Different implementations of the 16-bit MEM relay multiplier

Microarchitecture                   Relay Type          Total Relay Count   Total Area (mm²)
Small Counters (Half/Full adders)   4T                  3449                0.052
Small Counters (Half/Full adders)   6T                  2009                0.039
Small Counters (Half/Full adders)   6T, Long Flexure    2009                0.047
Large (N:3) Compressors             4T                  5610                0.087
Large (N:3) Compressors             6T                  3211                0.046

[Figure 4-15 plot: energy versus 1/throughput (ns). Curves: Predictive 4-T Relay, (N:3) compressors; Predictive 6-T Relay, (N:3) compressors; Predictive 6-T Relay, HA/FA; Predictive 6-T Relay, HA/FA, Long Flexure; CMOS OTCT (90 nm); CMOS Dadda/HC Nangate (45 nm).]
Figure 4-15: Energy/throughput comparison of CMOS and predictive (4T and 6T) MEM
relay 16-bit multipliers
[Figure 4-16 plots: energy versus 1/throughput (ns) for (a) Predictive 4-T Relay, (N:3) compressors and HA/FA (90 nm); (b) Predictive 6-T Relay, (N:3) compressors and HA/FA (90 nm); (c) Predictive 6-T Relay, HA/FA with standard and long flexures (90 nm).]
Figure 4-16: Comparison of energy/throughput figures for various MEM relay multiplier
implementations. The circuits are parallelized to achieve the same area.
4.4 Experimental Demonstration and Practical Issues
4.4.1 Reliability, Contact Oxidation and Oxide Breaking Procedure
A practical reliability challenge, since the first generation of MEM relays, has been the
growth of native oxide on the tungsten contacting electrode surfaces of the relays. The
oxidation, which can be aggravated by Joule heating when the relay circuit is active, leads to
increased RON and, eventually, circuit stuck-open failure. In [44] it has been shown that
depending on the operation frequency and the duty cycle, the relays can operate for 20-200
million cycles before the increased RON makes the circuit inoperable.
Although ongoing efforts toward contact material engineering, device encapsulation and
hermetic packaging of relay chips [45] will eventually remedy this problem, our experimental
demonstrations done in open ambient were challenged by this contact oxidation issue. Before
any current can flow through the relay, the oxide layer needs to be "broken". This process,
depicted in Figure 4-17 for a single 4T relay, requires a sufficiently high source-drain voltage
(3-8V, depending on the technology and logic topology) while the relay is actuated.
For more complex circuit blocks the contact oxide-break procedure is not as straightforward.
Figure 4-17: Native oxide breaking procedure for a single MEM relay
Although during our early experiments we realized that by increasing the source-drain
voltage/current, the oxide of two to three relays in a stack can be removed, for longer stacks
the required voltage/current is so high that it can weld the contact and permanently short the
drain and/or source to the channel. Also, any high Vgs or Vgd ( > 7-12V, depending on the
relay technology) can result in breaking the gate oxide and permanent gate-drain or gate-
source shorts.
In order to avoid all these operational failures, a fully-controlled oxide-breaking procedure
with maximum access to all MEM relays of the circuit under test should be employed. In the
4T test chip, which was designed for functional validation of second generation 4T MEM
relay circuits with no area constraints, we placed many observation and control access points
throughout our sample (7:3) compressor. As a result, we were able to group the relays into
stacks of 2 or 3 and this facilitated the initial oxide-breaking, without a need to apply
prolonged high voltages to source/drain terminals. With that strategy we were able to
demonstrate the functionality of the compressor which is the largest functional 4T MEM relay
circuit reported to date [32].
For the scaled 6T test chip, fabricated in SEMATECH's 250nm process, our goal was to
optimize the area and implement a compressor that can be used in a full multiplier. This
limited the number of access points for the oxide-breaking to the input, output and supply
terminals. According to the implementation shown in Figure 4-13, the (7:3) compressor
consists of many long stacks of relays whose contact oxide cannot readily be removed, even
with source-drain voltages as high as 4 times the operational voltage. We have addressed that
problem by developing a fully automated oxide-breaking procedure, which finds the shortest
inactive stacks (the stacks with oxidized relay contacts) and tries to break their contact oxide
and activate them. An example of this method for the Y2 subcircuit is shown in Figure 4-18.
At the first step, a moderate voltage (2-4V) is applied to source/drain inputs and supplies, all
the A0-6 vectors are fed to the gates of relays and the output(s) are observed for any sign of
existing active paths. Normally, the first active paths are the ones with shorter length, in this
case A = 0000xxx, which has a length of 4 (Figure 4-18a). The existing active paths are used as
a base. In the next step, the other paths that share the largest number of relays with the base
paths, such as the one highlighted in Figure 4-18b, are temporarily supplied with a higher
source-drain voltage, increasing the chance of breaking the oxide of their inactive relays. The
reason is that an active relay, when actuated, can be considered a wire, and this significantly
reduces the effective length of the inactive stack. For this example, the effective length of the
stack in Figure 4-18b is reduced from 5 to 2. The change in the output and the appearance of the
A = 00001xx vector stack show that both of the new devices in the stack have become active.
In order to avoid welding and other side effects of high voltage/current in the circuit, the
source-drain voltage is reduced to normal at the end of each step and as soon as a path is
active. The break procedure keeps on searching for inactive relays, activating them and
expanding the base directory until all relays are active and the circuit is ready for actual
testing.
Figure 4-18: An example of oxide breaking procedure for Y2 of the 6T relay (7:3)
compressor. (a) Observing the output to find an active path (base), and (b) selecting a stack
with only two unshared relays with the base. By increasing the source-drain voltage of this ...
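The search order of the automated procedure can be illustrated with a small scheduling sketch. It is not the test software used in this work: the relay names, the path table and the rule that one high-voltage step can break at most `max_breakable` oxidized relays are all assumptions introduced here to make the example runnable.

```python
def oxide_break_schedule(paths, base, max_breakable=2):
    """Order in which inactive paths are activated.

    paths: dict mapping an input-vector label to the list of relays in its stack.
    base: set of relays already known to conduct (the observed active paths).
    max_breakable: assumed number of oxidized relays that one high-voltage
        step can break; a stand-in for the real, measured behaviour.
    Returns (schedule, stuck): the labels in activation order and the labels
    that could not be activated under this assumption."""
    active = set(base)
    pending = dict(paths)
    schedule = []
    while pending:
        # Effective length = relays of the stack that are still oxidized.
        label = min(pending, key=lambda k: len(set(pending[k]) - active))
        oxidized = set(pending[label]) - active
        if len(oxidized) > max_breakable:
            break                      # even the best candidate is too long
        active |= oxidized             # raise V_SD briefly, then back to normal
        schedule.append(label)
        del pending[label]
    return schedule, sorted(pending)


# Hypothetical stacks; relay names r1..r12 are placeholders.
PATHS = {
    "0000xxx": ["r1", "r2", "r3", "r4"],
    "00001xx": ["r1", "r2", "r3", "r5", "r6"],
    "000011x": ["r1", "r2", "r5", "r6", "r7", "r8"],
    "1111xxx": ["r9", "r10", "r11", "r12"],
}
BASE = {"r1", "r2", "r3", "r4"}

schedule, stuck = oxide_break_schedule(PATHS, BASE)
print(schedule, stuck)
```

In this toy example the five-relay stack shares three relays with the base, so its effective length is 2, as in Figure 4-18b. A stack that shares nothing with the base stays inactive until another route to its relays is found or a higher break voltage is accepted.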
4.4.2 The (7:3) Compressor Test Results
The die photo and the operation of two generations of (7:3) compressor, built with 4T and
scaled 6T devices, are shown in Figures 4-19 and 4-20, respectively. The small size of the 6T
MEM relay, availability of two metal routing layers, and the fact that the 6T compressor
needs only 46 MEM relays, compared to the 98 relays for the 4T compressor, lead to 40x
smaller circuit area, even though the area utilization for the new compressor is only 50%.
Following the oxide breaking procedure discussed earlier, a full set of random A0-6 input
vectors ranging from 0 to 127 is applied to the compressor, actuating the relay gates and
activating different propagate, generate and kill paths in the sub-circuits. The required Vgb is
12V for the 4T and 9V for the 6T implementation. The measured output codes in both cases
perfectly match the expected values, demonstrating the correct functionality. The decline of
output voltage over time is attributed to the formation of native oxide on the contacting
surfaces of the MEM relays and the gradual increase in their on-resistance, which is
exacerbated by prolonged operation and increased temperature of contacts. Usually after a
few cycles, as low as 3 and as high as 41 for different dies, the on-resistance of branches
becomes so high that the output voltages drop to zero, so another oxide breaking session
needs to be performed for correct operation of the circuit. In future relay generations and for
larger circuits like a full multiplier, this problem needs to be tackled at the manufacturing
level, either by using a different material for the contacts (a metal that does not grow oxide
easily, or has a conductive oxide), or by developing reliable packaging techniques.
[Figure 4-19 plots: output voltage versus time (s), and correct result versus test result for input codes 0 to 127.]
Figure 4-19: Die photo and experimental results of the 2nd generation 4T (7:3) compressor
Both the 4T and 6T compressors are the largest functional MEM relay circuits demonstrated
to date. In addition, the 6T implementation is also the largest operational circuit implemented
with the smallest logic relays to date.
[Figure 4-20 plots: output voltage versus time (s), and correct result versus test result for input codes 0 to 127.]
Figure 4-20: Die photo and experimental results of the scaled 6T (7:3) compressor
4.5 Summary
In this chapter we discussed the use of MEM relays as a logic element for VLSI circuit
applications. In spite of their nearly ideal switching behavior, with immeasurably low off-
state leakage and abrupt turn-on and turn-off, the switching speeds are significantly slower
than in CMOS. However, in this chapter we investigated design strategies to address this issue
at circuit and system level, focusing on the design of multipliers which are commonly known
as the most complex logic units.
The microarchitecture, circuit design optimization techniques and operation of two
generations of MEM relay-based multipliers and their components over an area-delay trade-
off space have been presented. Design analysis shows the performance benefits of higher ratio
compressors and Booth encoding enabled partial product generation for large multipliers,
while suggesting the use of simple half- and full-adders for smaller multipliers, and also
where area is constrained. The operation of the main building block of the multiplier, the (7:3)
compressor, is experimentally demonstrated for both 4T and 6T implementations.
Simulation results of 16-bit relay multipliers built in scaled relay processes predict a 5-20x
improvement in energy efficiency over CMOS designs in the sub-100 MOPS performance
range and suggest that parallelism can be employed to extend these benefits to the GOPS
operating region. The relative performance of the multiplier enhancements confirms that the
energy-gains previously predicted for a MEM relay 32-bit adder [14] can be extended to
larger arithmetic blocks, suggesting that complete VLSI systems such as microprocessors
would expect to see similar energy/performance improvements from adopting MEM relay
technology.
CHAPTER 5
MEM Relay I/O Design
In previous chapters it was shown that MEM relays are promising alternative switching
devices for digital computation. In addition to memory, power management, logic and
processing units, a complete VLSI system requires a way to interface to analog inputs and
outputs, or I/O. In this chapter we focus on the implementation of MEM relay-based
digital-to-analog converters (DAC) for the purposes of signal transmission.
In most of the CMOS DAC/ADC architectures the transistors operate in saturation and
provide high output impedance and linear voltage gain with limited drain-source voltage. The
challenge in designing such mixed-mode circuits with MEM-relays is that the relays are pure
switching elements and without transconductance gain cannot effectively perform traditional
analog tasks. However, recently digitally-assisted and/or digital-like converters have gained
popularity in the CMOS world, mainly because of their scalability [26, 51-52]. We have
adapted some of these concepts to implement MEM relay I/O in this work. Moreover, this
chapter proposes sub-mechanical delay transmission circuits as a potential solution for
improving the transmission speed beyond the switching speed of MEM-relays.
5.1 Digital to Analog Converter
Figure 5-1 shows our implementation of a MEM relay DAC [9,11], inspired by existing
CMOS designs, such as the one proposed in [50]. This implementation is suitable for use as
an I/O transmitter. In this example, each buffer is driven by one of the thermometer encoded
inputs, where $N = 2^k - 1$ and k is the bit resolution of the DAC. Each driver is composed of a
MEM relay-based buffer followed by a resistor; the resistor is necessary to provide both a
constant controlled termination (R/N) and a means for intermediate voltage generation.

Figure 5-1: Sample MEM relay DAC topology, schematic and equivalent circuit
[Panel titles: DAC Architecture, Voltage Output, AC Impedance]

Such a configuration creates a programmable resistor divider between the I/O voltage rail and
ground. For energy efficiency, it is important that the DAC input voltages and the I/O voltage
be independent of each other, especially for the generations of MEM relays in which
Vgb VIo [9].
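
The programmable divider described above can be illustrated with a small numerical model. The Python sketch below assumes N identical drivers, each an ideal switch in series with a resistor R that connects the shared output either to the I/O rail or to ground; relay on-resistance and the load capacitance are ignored. The function names and the example values (R = 1 kOhm, 3 V I/O rail) are illustrative assumptions.

```python
def thermometer_encode(value: int, k: int) -> list[int]:
    """Convert a k-bit binary value into N = 2**k - 1 thermometer bits."""
    n = 2 ** k - 1
    if not 0 <= value <= n:
        raise ValueError("value out of range for a k-bit DAC")
    return [1] * value + [0] * (n - value)


def dac_output(bits: list[int], v_io: float, r: float) -> tuple[float, float]:
    """Return (output voltage, output resistance) of the resistor divider.

    The m drivers whose bit is 1 pull up to v_io through r/m in parallel;
    the remaining n - m drivers pull down to ground through r/(n - m).
    """
    n = len(bits)
    m = sum(bits)
    v_out = v_io * m / n   # divider ratio depends only on m/n
    r_out = r / n          # all n resistors appear in parallel at the output
    return v_out, r_out


# Example: 2-bit DAC, four output levels between 0 V and the 3 V rail.
levels = [dac_output(thermometer_encode(code, 2), v_io=3.0, r=1000.0)[0]
          for code in range(4)]
```

In this idealized model the output resistance stays at R/N for every code, so the termination is constant while the code changes only the divider ratio.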
Figure 5-2 demonstrates the operation of a 2-bit thermometer-coded DAC and the 4 output
states corresponding to the thermometer encoded levels. Since body biasing was not possible
in the CLICKR1 chip [11], the buffers were configured similar to the inverter of Figure 2-8b,
with A and Ā swapped. As mentioned earlier, since the relay actuation ideally depends only
on the gate-to-body voltage and not on the drain or source voltages, it is relatively
straightforward to incorporate a level shift into the output stage. Although in the 1st
generation MEM relay almost one third of the actuation is attributed to gate-to-drain/source,
rather than gate-to-body, area overlap, this level-shifting function is still observed in Figure 5-2,
where the output full-scale voltage is 3 V while the relay actuation voltage is 10 V. With
scaled relays, this design style would also support relay actuation in the 100s of mV range
while producing output swings in the V range as required for different signaling applications
and I/O standards.
96
1 'g " ~'"111" Vin1 [-VV-
,,I, Ivr'R0.8 vi;---in
0.6 
"o11*-
M 0.4
0.2 V Vi
( 0 ~ 0 0.05 0.1 0.15 0.2 0.25 0.3
0 time (ms) Vout
1V Code = "Vin" 1 1
-0.8
0.6 V
Z0.4-
0.2 10"001" 
0 Vout0 0.1 0.2 0 0.4 Vi
time (ms) Vianntx; Code = 0 0 "Vin
Figure 5-2: MEM Relay 2-bit, thermometer coded DAC design and test results
5.2 Sub-mechanical Delay Signal Transmission
In the previous chapters we discussed the fact that the performance of MEM relay-based
systems is predominantly determined by the mechanical delay of the devices. In that context
and in order to leverage faster-than-mechanical-movement operation, it would be interesting
to examine the sub-mechanical delay operation of relay circuits. One such scenario is
proposed in Figure 5-3 which shows an on-chip data serializer/deserializer. The serializer
(transmitter) part consists of N+1 parallel branches. Each branch is made of a relay
multiplexer and a pair of overlap-capture relays. The parallel data inputs ($D_0 \ldots D_N$) select
whether a zero or a one is stored in the intermediate nodes. These stored values are
transmitted in a serial fashion during the overlap of $CLK_M$ and $CLK_{M+1}$, when the
corresponding overlap-capture relays are both on ($t_{bit}$). In other words, each overlap-capture
pair is active for $t_{D,on}/(N+1)$ and only one pair is on at a time. As a result, the serial output
contains a stream of data bits that is N+1 times faster than the mechanical delay of a single
relay.
On the receiver (deserializer) side, the reverse process is performed: the fast input (SI) is
stored in the intermediate nodes of the consecutive receiving branches during the
corresponding overlap periods and enables the output MUXes, which select whether zeros or
ones should be sent to the outputs.

Figure 5-3: Proposed serializer/deserializer diagrams and working principles
[Diagram labels: Transmitter, Receiver, SO, SI, CLK_0, CLK_1, CLK_M, CLK_M+1, CLK_2N, CLK_2N+1; timing relations: t_ov = t_D,on + t_bit - t_D,off, with t_D,off < t_D,on]

It should be noted that the proposed concept of sub-mechanical delay operation relies on the
fact that the turn-off delay is much shorter than the pull-in delay, as the gate structure only
needs to travel a short distance (~1 nm) to break the contacts rather than the entire gap
distance.
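
The timing relation annotated in Figure 5-3 can be checked with a few lines of arithmetic. The Python sketch below assumes the bit time equals the turn-on delay divided by the number of branches, as described above, and computes the clock overlap each capture pair then needs. The function name and the delay values in the example are hypothetical and do not come from measurements.

```python
def serializer_timing(t_d_on: float, t_d_off: float, n_branches: int) -> dict:
    """Compute bit time, required clock overlap and speed-up.

    t_d_on     -- mechanical turn-on (pull-in) delay of a relay, in seconds
    t_d_off    -- turn-off delay of a relay, in seconds (must be < t_d_on)
    n_branches -- number of parallel branches (N + 1 in the text)
    """
    if n_branches < 1:
        raise ValueError("at least one branch is required")
    if not 0 <= t_d_off < t_d_on:
        raise ValueError("sub-mechanical operation needs t_d_off < t_d_on")
    t_bit = t_d_on / n_branches        # one bit per branch per turn-on delay
    t_ov = t_d_on + t_bit - t_d_off    # overlap of consecutive clocks
    return {"t_bit": t_bit, "t_ov": t_ov, "speedup": t_d_on / t_bit}


# Hypothetical example: 100 ns turn-on delay, 10 ns turn-off delay, 8 branches.
timing = serializer_timing(t_d_on=100e-9, t_d_off=10e-9, n_branches=8)
```

The shorter the turn-off delay is relative to the turn-on delay, the closer the required overlap comes to a full mechanical delay plus one bit time.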
5.3 Summary
In this chapter we described solutions for on-chip signal generation and reception. The energy
efficiency of our proposed digital to analog converter, a CMOS inspired implementation, has
been verified and the operation of a simple DAC has been demonstrated. Finally, the concept
of sub-mechanical signal transmission has been introduced.
CHAPTER 6
Device and Circuit Design Challenges and Future
Prospects
6.1 Device Engineering Challenges
6.1.1 Limits of MEM Relay Scaling
In the previous chapters we have demonstrated the functionality of a variety of MEM relay-
based circuits. However, the size and relatively high switching voltages of the current devices
mean that the benefits of MEM relays over CMOS implementations are not yet apparent.
As with CMOS transistors, though, reliable scaling of the physical dimensions enables MEM relays
to achieve substantially lower switching energy and improved speed.
The relay energy is primarily spent driving the parasitic capacitances, and thus the energy per
operation is improved by lowering the relay operating voltage and decreasing the load
capacitance, which is possible by shrinking the actuation area. Such modifications reduce the
electrostatic force available for actuation, so the opposing spring restoring force must be
reduced accordingly. This can be achieved by reducing the thickness of the flexures and the
gap, which results in reduced pull-in voltage. An interesting observation is that in this
scenario the mass of the device scales more quickly than the spring constant, which results in
faster actuation at smaller dimensions.
Similar to the classic scaling theory developed for CMOS, constant-field scaling for MEM
devices has been studied [53-54]. Based on this scaling methodology, the electric field across
the actuation gap is fixed at a constant value, while all of the dimensions of the device are
scaled by the scaling factor S. Although there are more advanced scaling strategies that
optimize switching speed, energy, and layout area [13], this linear scaling methodology
provides useful insight into the benefits of scaling and the path towards closing the gap
between CMOS and MEM relays at the device level. Table 6-1 shows that with this
constant-field scaling the pull-in voltage and gate capacitances scale linearly. As a result, the
energy scales cubically ($\sim C_g V_{pi}^2$), while the mechanical delay scales linearly (Equation 2.12).
There are, however, challenges and fundamental limits to the relay scaling. One of the
greatest obstacles to scaling may be reducing the gap thickness. Although sacrificial layers
can be deposited at sub-20 nm dimensions, reliably releasing the devices at these dimensions
is not as straightforward. The thinnest gap that has been successfully realized for MEMS is 10
nm [55]. Another challenge is the strain gradient in the structural layer, which causes out-of-plane
deflection. This strain gradient is worse for poly-SiGe layers thinner than 1 µm, so optimization of
the low-temperature poly-SiGe deposition process for such thin layers needs to be studied. Multi-
layer structural materials can also be used to lower the overall strain gradient of the structure [12].
Table 6-1: Constant Field Scaling for MEM relays

MEM Relay Parameter                                 Constant-Field Scaling Factor
Physical dimensions: W, L, W_f, L_f, H, g, g_d      S
Actuation area: A                                   S^2
Gate capacitance: C_gc, C_gb, C_gd/gs               S
Surface forces: F_A                                 S^2
Pull-in/out voltages: V_pi, V_po                    S
Mechanical delay: t_mech                            S
Energy                                              S^3
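
The scaling factors in Table 6-1 can be sanity-checked with first-order parallel-plate formulas. The Python sketch below is a simplified illustration, not the device model used in this work: it assumes a parallel-plate actuator, a flexure spring constant proportional to E·W·H³/L³, a mechanical delay proportional to sqrt(m/k), and hypothetical baseline dimensions and material constants.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity, F/m


def relay_metrics(scale: float) -> dict:
    """First-order metrics of a relay whose dimensions are all scaled by `scale`."""
    # Hypothetical baseline geometry (metres) and material constants.
    side = 30e-6 * scale      # side of the square actuation plate
    gap = 200e-9 * scale      # actuation gap
    flex_w = 5e-6 * scale     # flexure width
    flex_l = 20e-6 * scale    # flexure length
    thick = 1e-6 * scale      # structural layer thickness
    youngs, density = 140e9, 4000.0  # assumed modulus (Pa) and density (kg/m^3)

    area = side * side
    cap = EPS0 * area / gap                                   # parallel-plate capacitance
    k = youngs * flex_w * thick ** 3 / flex_l ** 3            # spring constant, up to a constant
    mass = density * area * thick
    v_pi = math.sqrt(8 * k * gap ** 3 / (27 * EPS0 * area))   # parallel-plate pull-in voltage
    t_mech = math.sqrt(mass / k)                              # delay proxy, about 1/omega_0
    return {"C": cap, "V_pi": v_pi, "t_mech": t_mech, "E": cap * v_pi ** 2}


base, half = relay_metrics(1.0), relay_metrics(0.5)
ratios = {name: half[name] / base[name] for name in base}
# With S = 0.5 the C, V_pi and t_mech ratios come out near 0.5 and the
# energy ratio near 0.125, matching the S and S^3 entries of the table.
```

Because every quantity is a product of powers of the dimensions, the ratios do not depend on the hypothetical baseline values, only on the scaling factor.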
The energy-efficiency limit of scaling relays is mainly determined by the surface forces at the
contacts (primarily hydrogen bonds, capillary forces, and van der Waals forces). The spring
restoring force must be able to overcome the surface forces and deactuate the relay when the
gate voltage and electrostatic force are removed. This sets a lower bound on the spring stiffness
constant and hence the switching energy of the device.
Figure 6-1 shows that the surface forces are proportional to the area of the contact dimple [13,
56]. So if the dimple area scales with other dimensions, the surface forces are reduced and the
spring restoring force will still be able to overcome them. However, for a very small contact
dimple area, the surface forces (and contact resistance) will be determined by a few
metal-metal bonds at the contact asperities. The minimum stored spring energy must be large
enough to overcome the energy of these bonds. For example, with five bonds, each
having an energy of 0.2 aJ [57, 58], a minimum switching energy of 4 aJ per device
switching cycle, which is 10x lower than CMOS, would be achievable [13].

Figure 6-1: Average F_A (with standard deviation indicated) versus the contact dimple area [13, 56].
[Horizontal axis: Dimple Area (µm²)]
6.1.2 Contact Engineering and Device Reliability
The endurance of MEM relay logic can be limited by Joule heating at the contact points
which may result in welding and the failure of the device. As illustrated in Figure 6-2, an
experimentally validated contact endurance model projects that a scaled 90 nm node relay with
tungsten electrodes operating at 1 V can achieve up to $10^{15}$ switching cycles before welding-
induced failure [43]. The mean number of cycles to failure (MCTF) increases exponentially
with decreasing VDD and linearly with decreasing CL.
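
The stated trend can be turned into a rough relative estimate. The Python sketch below assumes that log10(MCTF) rises linearly with 1/VDD, which is how Figure 6-2 is plotted, and that MCTF is inversely proportional to CL. It only compares two operating points; the function name and the slope value in the example are hypothetical and are not fitted to the measured data.

```python
def mctf_ratio(vdd_new: float, vdd_ref: float,
               cl_new: float, cl_ref: float,
               slope_decades_per_inv_volt: float) -> float:
    """Return MCTF(new) / MCTF(ref) under the assumed trend.

    Assumes log10(MCTF) rises linearly with 1/VDD (slope in decades per
    1/V) and that MCTF is inversely proportional to the load capacitance.
    """
    if min(vdd_new, vdd_ref, cl_new, cl_ref) <= 0:
        raise ValueError("voltages and capacitances must be positive")
    decades = slope_decades_per_inv_volt * (1.0 / vdd_new - 1.0 / vdd_ref)
    return 10.0 ** decades * (cl_ref / cl_new)


# Hypothetical slope of 2 decades per 1/V: lowering VDD from 2 V to 1 V at
# constant load adds 2 * (1/1 - 1/2) = 1 decade of endurance.
gain = mctf_ratio(vdd_new=1.0, vdd_ref=2.0, cl_new=1e-9, cl_ref=1e-9,
                  slope_decades_per_inv_volt=2.0)
```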
MEM relay contact engineering is an immediate challenge for improving the reliability of
devices. Tungsten (W) is an excellent material to prevent contact failure from wear, plastic
deformation, and operational stiction. However, as we described in Chapter 4, tungsten readily
forms an insulating native oxide, so the thin layer of TiO2 and native tungsten oxide needs to be
"broken" to let current flow through the relay. This process requires a sufficiently high
voltage and current to break down the oxide, which introduces the risks of either breaking the
gate oxide, which may cause a gate-to-drain or gate-to-source short circuit, or welding the
metal contact, which can permanently stick the channel and source/drain terminals together.
Figure 6-2: Measured and predicted endurance of MEM relay vs. 1/VDD
[Horizontal axis: 1/VDD; curve labels: CL = 440 pF, CL = 4 nF]

Figure 6-3: RON evolution with the number of hot-switching cycles [59] (Vd = 4 V, f = 5 kHz).
[Horizontal axis: # of Cycles]
In a recent development in the field of contact engineering, Ruthenium (Ru) contact relays
with good switching behavior and more stable on-state resistance than W-contact relays have been
demonstrated. The main characteristic that makes Ru an attractive candidate is that both Ru
and its native oxide (RuO2) are hard and electrically conductive, so that RON does not increase
significantly with exposure to air, as illustrated in the experimental measurements of Figure 6-3
[59]. Another potential way to tackle the contact oxidation issue is a reliable solution for
hermetic sealing of devices and/or circuits [45, 60].
6.2 Future Prospects of MEM Relay Circuit Design
With the advances in MEM relay technology, the importance of developing a robust
design infrastructure becomes more significant. We have leveraged the existing CMOS
infrastructure and commercial tools as a starting point for developing a custom infrastructure
tailored to MEM relay circuit design styles (Figure 6-4). For example, we developed a simple
Verilog-A model for simulation of relay circuits, created the symbol/schematic-level design
environment, automated the placement and routing procedure, and implemented some
rudimentary design rule checker (DRC) and layout versus schematic (LVS) tools. Kevin
Dwan and Chengcheng Wang have already developed a preliminary version of a MEM relay
synthesis tool [40]. This tool can synthesize small circuits (<200 relays) in one mechanical
delay, but needs to be enhanced to enable optimized logic partitioning and mechanical delay
insertion for more complex logic designs, like the ones we explored in Chapter 4 for the
implementation of MEM relay multipliers. The relay and circuit models need to be improved
to account for second-order effects, such as mismatch, noise, and device non-linearities.

Figure 6-4: MEM Relay design infrastructure
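
To give a flavor of what a simple behavioral relay model captures, the Python sketch below treats a relay as a hysteretic switch: it closes when the magnitude of the gate-to-body voltage reaches a pull-in voltage and opens when it falls to a pull-out voltage. This is an illustrative stand-in, not the Verilog-A model mentioned above; the class name, the threshold values, and the omission of mechanical delay and contact resistance are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class RelaySwitch:
    """Hysteretic on/off model of an electrostatic relay (illustrative only)."""
    v_pull_in: float = 10.0   # hypothetical pull-in voltage, volts
    v_pull_out: float = 8.0   # hypothetical pull-out voltage, volts
    closed: bool = False      # contact state

    def update(self, v_gate: float, v_body: float) -> bool:
        """Apply a new gate/body bias and return the contact state."""
        # The electrostatic force depends on the magnitude of V_gb only.
        v_gb = abs(v_gate - v_body)
        if v_gb >= self.v_pull_in:
            self.closed = True
        elif v_gb <= self.v_pull_out:
            self.closed = False
        # Between the two thresholds the previous state is held (hysteresis).
        return self.closed


relay = RelaySwitch()
trace = [relay.update(v, 0.0) for v in (0.0, 9.0, 10.5, 9.0, 7.0)]
# trace == [False, False, True, True, False]
```

The 9 V sample appears twice in the sweep with different results, which is the hysteresis between pull-in and pull-out that a switch-level model needs to reproduce.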
With the support of this design infrastructure, and with MEM relays being scaled down with
the advances of fabrication and relay design technologies, implementation of a complete
MEM relay-based system, such as a microcontroller, becomes more feasible. We have
adapted the instruction set architecture of Xilinx's PicoBlaze 8-bit microcontroller [61] and
designed a MEM relay-optimized microcontroller. The first-generation microcontroller
includes all the essential building blocks of a traditional architecture, such as a reduced-size
instruction/program memory (64x18b), instruction decode logic, custom-designed RAMs
(8x10b call return stack, program counter), 16 byte-wide registers, interrupt handling, a custom
ALU, etc. The microcontroller is currently under fabrication and features ~12k new-
generation Ruthenium-based 6T devices with oxidation-stable contacts. With these new
generations of devices, we expect to be able to design a variety of large-scale MEM relay-
based VLSI systems.
6.3 Summary
The MEM relay technology has advanced at a fast pace in the last few years. Although
moderate scaling has been achieved in the new iterations of the device, more aggressive
scaling is required to realize the predicted energy efficiency benefits over CMOS circuits. In
this chapter we discussed the implications and limitations of device scaling and the challenges
of contact engineering. We also described the current status of our design infrastructure and
required enhancements for optimal MEM relay VLSI system design.
CHAPTER 7
Conclusions
In this thesis we investigated the feasibility of MEM relays as an alternative switching
technology for VLSI applications, with a promising potential to address the energy-efficiency
crisis faced by the CMOS industry. While trying to leverage the ideal switching
characteristics of MEM relays, i.e. zero leakage current and abrupt switching behavior, to
maximize power savings, we have developed circuit design techniques and optimization
methodologies to minimize the impact of slow mechanical turn-on at the functional unit level
and implement MEM relay-based systems that offer competitive performance.
An analytical comparative analysis of MEM relay and CMOS power gating solutions has
highlighted the design space in which MEM relay power gating enables energy savings and
provided a set of design guidelines regarding the required off-time and duty cycle for various
relay and CMOS technologies. In addition to energy savings, MEM relay power gating offers
the advantage of reduced active area overhead as the relays could be post-fabricated on top of
the chip or integrated into the backend metallization layers with no penalty to the active die
area. The analysis is followed by an experimental demonstration of MEM relays power gating
a CMOS analog chip, and also an integrated autonomous pulse generation unit which is used
in conjunction with the power gates.
This thesis has also described the microarchitecture, circuit design optimization techniques
and operation of two generations of MEM relay-based multipliers and their components over
an area/delay/energy trade-off space. Just like with CMOS-based synthesis tools, emerging
relay-based synthesis is unable to implement multipliers efficiently. We have introduced the
concept of optimal logic partitioning to make such complex arithmetic units implementable.
In addition to that, we have optimized the building blocks of both the proposed
microarchitectures (adders and compressors), to enable zero-mechanical delay cascading. It is
interesting to note that if we use CMOS style compressors, even in our optimized
microarchitectures, we will end up with multipliers 3-10x slower than the proposed
implementation. Design analysis shows the performance benefits of higher ratio compressors
and Booth encoding enabled partial product generation for large multipliers, while suggesting
the use of simple half- and full-adders for smaller multipliers, and also where area is
constrained. The operation of the main building block of the multiplier, the (7:3) compressor,
is experimentally demonstrated for both 4T and 6T implementations.
Simulation results of 16-bit relay multipliers implemented with a 90 nm-equivalent relay model
show a 5-20x improvement in energy efficiency over CMOS designs in the sub-100 MOPS
performance range and suggest that parallelism can be employed to extend these benefits to
the GOPS operating region. The relative performance of the multiplier enhancements confirms
that the energy-gains previously predicted for a MEM relay 32-bit adder can be extended to
larger arithmetic blocks, suggesting that complete VLSI systems such as microprocessors
would expect to see similar energy/performance improvements from adopting MEM relay
technology.
In spite of the promising prospect of MEM relay as the "next transistor," there are challenges
that need to be addressed before the relays become relevant in VLSI systems. A clear
challenge is to scale the devices reliably, in order to realize the predicted energy efficiency.
Although contact welding was mentioned as one of the main device failure mechanisms, the
good news is that with scaling of relay features and operating voltage, Joule heating can be
minimized and a lifetime of more than $10^{15}$ cycles can be achieved for MEM relays. The main
obstacles to reliable scaling, namely the practical limitations of gap thickness scaling and the
strain gradient for thin poly-SiGe layers, were discussed in the last chapter.
As the MEM relay technology advances, the development of a robust design infrastructure
becomes more important. The MEM relay design infrastructure developed by our team
includes many essential components, such as custom designed tools for schematic entry and
simulation, LVS and DRC verification, automated placement and routing, etc. Since the
infrastructure was initially optimized for smaller VLSI blocks, some components such as the
relay model and the synthesis tool need major enhancements to support the implementation of
complex MEM relay logic.
APPENDIX A
Microarchitecture of 16-bit Multipliers
Figure A-1: Microarchitecture of a 16-bit multiplier built with Half- and Full-adders
[Figure: 16-bit multiplier microarchitecture. Diagram labels: 16-bit multiplicand, 16-bit multiplier, Partial Product Generation matrix, (N:3) Compressor, 32-bit Product]
Bibliography
[1] B. H. Calhoun, A. Wang, and A. Chandrakasan, "Modeling and sizing for minimum
energy operation in subthreshold circuits," IEEE Journal of Solid-State Circuits, vol.
40, no. 9, pp. 1778-1786, Sep. 2005.
[2] T. Baba, "Proposal for surface tunnel transistor," Japanese Journal of Applied Physics,
vol. 31, no. 4B, pp. L455-L457, Apr. 1992.
[3] A. M. Ionescu, V. Pott, R. Fritschi, K. Banerjee, M. J. Declerq, P. Renaud, C. Hibert,
P. Fluckiger, and G. A. Racine, "Modeling and design of a low-voltage SOI
suspended-gate MOSFET (SG-MOSFET) with a metal over-gate architecture," in
Proceedings of International Symposium on Quality Electronic Design, 2002, pp.
496-501.
[4] K. Gopalakrishnan, P. B. Griffin, and J. D. Plummer, "I-MOS: A novel semiconductor
device with a subthreshold slope lower than kT/q," International Electron Devices
Meeting. IEDM Technical Digest, Dec. 2002, pp. 289-292.
[5] S. Salahuddin and S. Datta, "Use of negative capacitance to provide a sub-threshold
slope lower than 60 mV/decade," Nano Letters, vol. 8, no. 2, pp. 405-410, 2008.
[6] Zuse, Konrad (1993). Der Computer. Mein Lebenswerk. (in German) (3rd ed.). Berlin:
Springer-Verlag. p. 55. ISBN 978-3-540-56292-4.
[7] E. Nowak, "CMOS devices below 0.1 µm: how high will performance go?"
International Electron Devices Meeting. IEDM Technical Digest, pp. 215-218, 1997.
[8] Intel, "Microprocessor Quick Reference Guide," 2008. [Online]. Available:
http://www.intel.com/pressroom/kits/quickreffam.htm.
[9] F. Chen, H. Kam, D. Marković, T.-J. K. Liu, V. Stojanović, and E. Alon, "Integrated
circuit design with NEM relays," in 2008 IEEE/ACM International Conference on
Computer-Aided Design, Nov. 2008, pp. 750-757.
[10] R. Nathanael, V. Pott, H. Kam, J. Jeon, and T.-J. K. Liu, "4-Terminal Relay
Technology for Complementary Logic," in 2009 IEEE International Electron Devices
Meeting (IEDM), Dec. 2009, pp. 1-4.
[11] F. Chen, M. Spencer, R. Nathanael, C. Wang, H. Fariborzi, A. Gupta, H. Kam, V. Pott,
J. Jeon, T.-J. K. Liu, D. Marković, V. Stojanović, and E. Alon, "Demonstration of
integrated micro-electro-mechanical switch circuits for VLSI applications," in IEEE
International Solid-State Circuits Conference, Feb. 2010, pp. 150-151.
[12] I.R. Chen, L. Hutin, C. Park, R. Lee, R. Nathanael, J. Yaung, J. Jeon, and T.-J. King
Liu, "Scaled Micro-Relay Structure with Low Strain Gradient for Reduced Operating
Voltage," ECS Trans., vol. 45, no. 6, pp. 101-106, 2012.
[13] H. Kam, T.-J. K. Liu, V. Stojanović, D. Marković, and E. Alon, "Design,
Optimization, and Scaling of MEM Relays for Ultra-Low-Power Digital Logic," IEEE
Transactions on Electron Devices, vol. 58, no. 1, pp. 236-250, 2011.
[14] M. Spencer, F. Chen, C. Wang, R. Nathanael, H. Fariborzi, A. Gupta, H. Kam, V. Pott,
J. Jeon, T.-J. K. Liu, D. Marković, E. Alon, and V. Stojanović, "Demonstration of
integrated micro-electro-mechanical switch circuits for VLSI applications," IEEE
Journal of Solid-State Circuits, vol. 46, no. 1, pp. 308-320, Feb. 2011.
[15] H. Goldstine, The Computer: from Pascal to von Neumann. Princeton, New Jersey:
Princeton University Press.
[16] T.-J. K. Liu, D. Marković, V. Stojanović, and E. Alon, "The relay reborn," IEEE Spectrum,
vol. 49, no. 4, pp. 40-43, Apr. 2012.
[17] M. Ruan, J. Shen, and C. B. Wheeler, "Latching micro magnetic relays with multistrip
permalloy cantilevers," in Proc. IEEE International Conference on Micro Electro
Mechanical Systems (MEMS), Interlaken, Switzerland, Jan. 2001, pp. 224-227.
[18] P. Zavracky, S. Majumder, and N. McGruer, "Micromechanical switches fabricated
using nickel surface micromachining," Journal of Microelectromechanical Systems,
vol. 6, no. 1, pp. 3-9, Mar. 1997.
[19] G.-L. Tan and G. M. Rebeiz, "A DC-contact MEMS shunt switch," IEEE Microwave
and Wireless Components Letters, vol. 12, no. 6, pp. 212-214, June 2002.
[20] C. Goldsmith, J. Randall, S. Eshelman, T. H. Lin, D. Denniston, S. Chen, and B.
Norvell, "Characteristics of micromachined switches at microwave frequencies," in
Proc. IEEE MTT-S Int. Microwave Symposium Dig., San Francisco, CA, Jun. 1996,
pp. 1141-1144.
[21] A. Q. Liu, M. Tang, A. Agarwal, and A. Alphones, "Low-loss lateral micromachined
switches for high frequency applications," Journal of Micromechanics and
Microengineering, vol. 15, pp. 157-167, 2005.
[22] R. Nathanael, "Nano-Electro-Mechanical (NEM) Relay Devices and Technology for
Ultra-Low Energy Digital Integrated Circuits," Ph.D. dissertation, University of
California - Berkeley, 2012.
[23] H. Kam, "MOSFET Replacement Devices for Energy-Efficient Digital Integrated
Circuits," Ph.D. dissertation, University of California - Berkeley, 2009.
[24] C. W. Low, T.-J. King Liu, and R. T. Howe, "Characterization of polycrystalline
silicon-germanium film deposition for modularly integrated MEMS applications,"
Journal of Microelectromechanical Systems, vol. 16, no. 1, pp. 68-77, Feb. 2007.
[25] R. L. Puurunen, J. Saarilahti, and H. Kattelus, "Implementing ALD Layers in MEMS
processing," Electrochemical Society Transactions, vol. 11, no. 7, pp. 3-14, Oct. 2007.
[26] F. Chen, "Energy-efficient Wireless Sensors: Fewer Bits, Moore MEMS," Ph.D.
dissertation, Massachusetts Institute of Technology, 2011.
[27] S. D. Senturia, Microsystem Design. Boston, MA: Kluwer Academic, 2001.
[28] R. Holm, Electric Contacts. Springer-Verlag, 1967.
[29] B. D. Jensen, K. Huang, L. L. W. Chow, and K. Kurabayashi, "Adhesion effects on
contact opening dynamics in micromachined switches," Journal of Applied Physics,
vol. 97, no. 10, p. 103535, May 2005.
[30] S. P. Sharma, "Adhesion coefficients of plated contact materials," Journal of Applied
Physics, vol. 47, no. 8, pp. 3573-3576, Aug. 1976.
[31] M. E. Sikorski, "The adhesion of metals and factors that influence it," Wear, vol. 7, p. 144,
1964.
[32] H. Fariborzi, F. Chen, R. Nathanael, J. Jeon, T.-J. K. Liu, and V. Stojanović, "Design
and demonstration of micro-electro-mechanical relay multipliers," in IEEE Asian
Solid-State Circuits Conference (Jeju, Korea), November 2011.
[33] H. Fariborzi, M. Spencer, V. Karkare, J. Jeon, R. Nathanael, C. Wang, F. Chen, H.
Kam, V. Pott, T.-J. K. Liu, E. Alon, V. Stojanović, and D. Marković, "Analysis and
Demonstration of MEM-Relay Power Gating," in IEEE Custom Integrated Circuits
Conference, vol. 1, 2010.
[34] Predictive Technology Models, 2011. [Online]. Available: http://ptm.asu.edu.
[35] F. Chen, A. P. Chandrakasan, and V. Stojanović, "A Signal-agnostic Compressed
Sensing Acquisition System for Wireless and Implantable Sensors," in IEEE Custom
Integrated Circuits Conference, 2010, pp. 1-4.
[36] I. Sutherland, B. Sproull, and D. Harris, Logical Effort: Designing Fast CMOS
Circuits, 1st ed. Morgan Kaufmann, 1999.
[37] R. Gupta, B. Tutuianu and L.T. Pileggi, "The Elmore delay as a bound for RC trees
with generalized input signals," IEEE Transactions on Computer-Aided Design of
Integrated Circuits and Systems, vol. 16, no. 1, pp. 95-104, Jan. 1997.
[38] H. Fariborzi, F. Chen, R. Nathanael, I. Chen, L. Hutin, R. Lee, T.-J. K. Liu, and V.
Stojanović, "Relays Do Not Leak - CMOS Does," to be presented in IEEE Design
Automation Conference (DAC 2013), June 2013.
[39] F. Chen, H. Kam, D. Marković, T.-J. K. Liu, V. Stojanović, and E. Alon, "Integrated
circuit design with NEM relays," in Proc. IEEE/ACM International Conference on
Computer-Aided Design, Nov. 2008, pp. 750-757.
[40] K. Dwan, "Logic Synthesis of MEM Relay Circuits," M.Sc. Thesis, University of
California - Los Angeles, 2011.
[41] O. L. MacSorley, "High speed arithmetic in binary computers," Proc. IRE, vol. 49,
no. 1, pp. 67-91, Jan. 1961.
[42] P.J. Song and G. De Micheli, "Circuit and architecture trade-offs for high speed
multiplication," IEEE Journal of Solid-State Circuits, vol. 26, no. 9, pp. 1184-1198,
Sep 1991.
[43] H. Kam, E. Alon, and T.-J.K. Liu, "A predictive contact reliability model for MEM
logic switches," in Proc. IEEE International Electron Device Meeting Technical
Digest, pp. 16.4.1-16.4.4., Dec. 2010.
[44] Y. Chen, R. Nathanael, J. Jeon, J. Yaung, L. Hutin, and T.-J. K. Liu, "Characterization
of Contact Resistance Stability in MEM Relays With Tungsten Electrodes," IEEE
Journal of Microelectromechanical Systems, vol. 21, no. 3, pp. 511-513, June 2012.
[45] E. S. Park, J. Jeon, V. Subramanian, and T.-J. K. Liu, "Inkjet-printed microshell
encapsulation: A new zero-level packaging technology," in IEEE
International Micro Electro Mechanical Systems Conference, pp. 357-360, Jan. 2012.
[46] S.K. Hsu, S.K. Mathew, M.A. Anders, B.R. Zeydel, V.G. Oklobdzija, R.K.
Krishnamurthy, and S.Y. Borkar, "A 110 GOPS/W 16-bit multiplier and
reconfigurable PLA loop in 90-nm CMOS," IEEE Journal of Solid-State Circuits, vol.
41, no. 1, pp. 256-264, Jan. 2006.
[47] L. Dadda, "Some schemes for parallel multipliers," Alta Frequenza, vol. 34, no. 5, pp.
349-356, March 1965.
[48] T. D. Han and D. A. Carlson, "Fast area-efficient VLSI adders," 8th Symposium on
Computer Arithmetic, May 1987.
[49] Nangate, "45nm Open Cell Library," 2011. [Online]. Available:
www.nangate.com/openlibrary.
[50] K.-L. J. Wong, H. Hatamkhani, M. Mansuri, and C.-K. K. Yang, "A 27-mW 3.6Gb/s
I/O Transceiver," IEEE Journal of Solid-State Circuits, vol. 39, no. 4, pp. 602-612, Apr. 2004.
[51] L. Brooks and H.-S. Lee, "A Zero-Crossing-Based 8-bit 200 MS/s Pipelined ADC,"
IEEE Journal of Solid-State Circuits, vol. 42, no. 12, pp. 2677-2687, Dec. 2007.
[52] M. Z. Straayer and M. H. Perrott, "An efficient high-resolution 11-bit noise-shaping
multipath gated ring oscillator TDC," in IEEE Symposium on VLSI Circuits, 2008, pp.
82-83.
[53] R. H. Dennard, F. H. Gaensslen, H. N. Yu, V. L. Rideout, E. Bassous, and A. R.
LeBlanc, "Design of ion-implanted MOSFET's with very small physical dimensions,"
IEEE Journal of Solid-State Circuits, vol. SC-9, p. 256, 1974.
[54] M. L. Roukes, "Nanoelectromechanical systems," in Technical Digest of 2000 Solid-
State Sensor and Actuator Workshop, Hilton Head Island, SC, June 4-8, 2000, pp.
367-376.
[55] T. J. Cheng and S. A. Bhave, "High-Q, low impedance polysilicon resonators with 10
nm air gaps," in Proceedings of International Conference on Micro Electro
Mechanical Systems (MEMS), 2010, pp. 695-698.
[56] H. Kam, V. Pott, R. Nathanael, J. Jeon, E. Alon, and T.-J. K. Liu, "Design and
reliability of a MEM relay technology for zero-standby-power digital logic
applications," in International Electron Devices Meeting (IEDM) Technical Digest,
Dec. 2009, pp. 809-812.
[57] R. Holm and E. Holm, Electric Contacts; Theory and Application, 4th ed. Berlin,
Germany: Springer-Verlag, 1967.
[58] G. Rubio-Bollinger, S. R. Bahn, N. Agrait, K. W. Jacobsen, and S. Vieira,
"Mechanical properties and formation mechanisms of a wire of single gold atoms,"
Physical Review Letters, vol. 87, no. 2, p. 026101, Jul. 2001.
[59] I-R. Chen, Y. Chen, L. Hutin, V. Pott, R. Nathanael, and T.-J. K. Liu, "Stable
Ruthenium-Contact Relay Technology for Low-Power Logic," to be presented at
IEEE Transducers 2013.
[60] R. Candler, W. Park, H. Li, G. Yama, A. Partridge, M. Lutz, and T. Kenny, "Single
wafer encapsulation of MEMS devices," IEEE Transactions on Advanced Packaging,
vol. 26, no. 3, pp. 227-232, Aug. 2003.
[61] Xilinx, "PicoBlaze 8-bit Microcontroller," 2012. [Online]. Available:
http://www.xilinx.com/products/intellectual-property/picoblaze.htm.
[62] R. Palmer, J. Poulton, W. J. Dally, J. Eyles, A. M. Fuller, T. Greer, M. Horowitz, M.
Kellam, F. Quan, and F. Zarkeshvari, "A 14mW 6.25Gb/s Transceiver in 90nm
CMOS for Serial Chip-to-Chip Communications," 2007 IEEE International Solid-
State Circuits Conference. Digest of Technical Papers, pp. 440-614, Feb. 2007.
