Ontwerp van ingebedde STT-MRAM cellen vanaf de 10 nm finFET generatie by Appeltans, Raf
ARENBERG DOCTORAL SCHOOL
Faculty of Engineering Science
Embedded STT-MRAM cell
design in and beyond 10 nm
finFET nodes
Making the link to the physical
implementation
Raf Appeltans
Dissertation presented in partial
fulfillment of the requirements for the
degree of Doctor of Engineering
Science (PhD): Electrical Engineering
August 2017
Supervisors:
Prof. dr. ir. W. Dehaene
Prof. dr. ir. L. Van der Perre

Examination committee:
Prof. dr. Adhemar Bultheel, chair
Prof. dr. ir. W. Dehaene, supervisor
Prof. dr. ir. L. Van der Perre, supervisor
Prof. dr. ir. M. Heyns
Prof. dr. ir. J. Van Houdt
Prof. dr. ir. R. Lauwereins
Dr. ir. P. Raghavan
(imec-Leuven)
Prof. dr. ir. D. Wouters
(RWTH-Aachen)
© 2017 KU Leuven – Faculty of Engineering Science
Self-published by Raf Appeltans, Celestijnenlaan 200A box 2402, B-3001 Leuven (Belgium)
All rights reserved. No part of the publication may be reproduced in any form by print, photoprint, microfilm,
electronic or any other means without written permission from the publisher.
Preface
No one who achieves success does so without acknowledging the
help of others. The wise and confident acknowledge this help with
gratitude - Alfred North Whitehead
I would like to thank the members of the jury, my supervisors, my daily
supervisor and the many colleagues that contributed to this work. Doing a PhD
can sometimes feel lonely, but would surely be impossible without the support
of so many people willing to spend time to help you with your project.
The true sign of intelligence is not knowledge but imagination -
Albert Einstein
I would like to thank the PhD community at imec for the wonderful PhD Days
that I was fortunate enough to help organize. The interaction with students
from different fields with different backgrounds really lifted my spirits and fueled
my creativity.
A people without the knowledge of their past history, origin and
culture is like a tree without roots - Marcus Garvey
I would like to thank my friends and family, mom, dad and brother for not
calling me crazy for quitting my job and starting a PhD, ... more than five times.
I pride myself on coming from a small country town and a hard-working, caring
family. With love and dedication, there is nothing you can’t achieve.
The greatest thing you’ll ever learn is just to love and be loved in
return - Ewan McGregor - Nat King Cole
I would like to thank my wife for supporting me throughout my PhD and for
every moment we have spent together before and after. She often keeps my
focus where it should be, when I drift off with another crazy idea that will
change the world as we know it.
Samen delen, samen spelen (Sharing together, playing together) - A., L. and E. Appeltans
I would like to thank my sons for never asking me about work. There really
is no better antidote for getting caught up in work than biking home with a
bike full of kids, just being kids. I would like to thank my daughter for the one
month break she gave me during my PhD, so I could take care of her before she
went to daycare. Not only is this a once in a lifetime experience, but it clears
your mind and helps you prolong your PhD when the timing becomes tight.
Real knowledge is to know the extent of one’s ignorance - Confucius
As you probably have noticed by now, I am a big fan of quotes. This last one is
one of my favorites. It keeps me modest and helps to put things in perspective,
but it also challenges me and keeps me going. There is so much more to learn
and do, so let’s get to it ...
Abstract
Memories embedded on the same physical chip as the processor are becoming
dominant in chip area compared to the processor itself. Spin-Transfer
Torque Magnetic Random Access Memory or STT-MRAM is being proposed
as an area-efficient alternative to the common Static Random Access Memory
or SRAM. In the technology nodes at which STT-MRAM should be introduced,
in and beyond the 10 nm node, the manufacturing process of semiconductor
chips has undergone some significant changes. The most important changes are
the use of fixed-size "fin"-based transistors and the use of multiple patterning
techniques to allow the creation of small and dense physical structures. The
manufacturing of the embedded memory and the design of the memory cells
need to be fully compatible with this process in all three dimensions in order
to guarantee successful integration.
A thorough analysis of the physical layout of embedded STT-MRAM cells
in the 10 nm and 7 nm nodes shows the importance of secondary design rules
impacted by the different multiple patterning techniques. Process techniques
that enhance size scaling, such as multi-level vias, can effectively reduce the
size of STT-MRAM cells and are imperative for future scaling. Two novel
cell designs, targeting area density and high performance respectively, show
the importance of making the link to the physical implementation. They are
optimized to counter the ever-increasing parasitic resistance of the interconnect
lines and show how, through inclusive design, more is actually less!
Brief summary
A processor chip consists both of computational units that perform
calculations and of memories that temporarily store data. These
embedded memories, typically Static Random Access Memory or SRAM,
nowadays take up more space on the chip than the computational units themselves.
Spin-Transfer Torque Magnetic Random Access Memory or STT-MRAM is a new
memory technology and is being considered as an alternative to SRAM because it
takes up much less space on the chip. Because it is still a new technology,
STT-MRAM will only be used from the 10 nm technology onwards. By then, the
manufacturing process of semiconductor chips will have undergone some important
changes compared to previous generations. The most important changes are
the use of "fin"-based transistors that have a fixed size and
the use of techniques that create small and compact structures by
printing several less compact patterns one after another.
The fabrication and design of the embedded memory cells must be fully
compatible with this process, in every dimension, to guarantee successful
integration.
A thorough analysis of the physical design of embedded STT-MRAM
cells in the 10 nm and 7 nm technologies has been carried out. This analysis shows
the importance of new or extra-strict design rules that are affected by the
different techniques that use multiple patterns. Fabrication techniques
that improve the scaling of these cells have been investigated, such as vertical
connections that bridge multiple levels. They can effectively reduce the size of
STT-MRAM cells and are of the utmost importance for
further scaling. Two new cell designs demonstrate how important it is
to take the physical implementation into account. The first is designed
to take up as little space as possible on the chip and the second is designed
to operate as fast as possible. They are moreover optimized to counter the
ever-increasing, and unwanted, resistance of the interconnect lines.
These memory cells show how, through design that
takes all aspects into account, more is actually less!
Abbreviations
193i: 193 nm immersion.
1T 1MTJ: 1 Transistor 1 Magnetic Tunnel Junction.
1T 2MTJ: 1 Transistor 2 Magnetic Tunnel Junctions.
2D: two-dimensional.
2T 2MTJ: 2 Transistors 2 Magnetic Tunnel Junctions.
3D: three-dimensional.
3T 2MTJ: 3 Transistors 2 Magnetic Tunnel Junctions.
3TGG: 3T 2MTJ cell with Ground Grid.
3T sSL: 3T 2MTJ cell with shared Source Line.
4T 2MTJ: 4 Transistors 2 Magnetic Tunnel Junctions.
ALD: Atomic Layer Deposition.
AP: anti-parallel.
AP2P: anti-parallel to parallel.
BEOL: Back-End-Of-Line.
BL: Bit Line.
BTI: Bias-Temperature Instability.
CD: Critical Dimension.
CMOS-compatible: compatible with standard Complementary Metal-Oxide-Semiconductor process flow.
CMP: Chemical Mechanical Polishing.
DC: Direct Current.
DRAM: Dynamic Random Access Memory.
DRC: Design Rule Checking.
DTCO: Design Technology Co-Optimization.
EM: Electro-Migration.
EUV: extreme ultra-violet.
FEOL: Front-End-Of-Line.
finFET: fin Field Effect Transistor.
HRS: high resistive state.
iN10: imec 10 nm technology.
iN7: imec 7 nm technology.
IR: Current-Resistance.
LE: litho(graphy)-etch.
LE2: double litho-etch.
LE3: triple litho-etch.
LEx: multiple litho-etch.
LLG: Landau-Lifshitz-Gilbert.
LRS: low resistive state.
LVT: Low Threshold Voltage.
MOL: Middle-Of-Line.
MP: metal pitch.
MTJ: Magnetic Tunnel Junction.
NMOS: Negative-channel Metal-Oxide-Semiconductor.
P: parallel.
P2AP: parallel to anti-parallel.
PDK: Process Development Kit.
PEX: Parasitics EXtraction.
PMOS: Positive-channel Metal-Oxide-Semiconductor.
PP: poly(-silicon) pitch.
PSLP: Partial Source Line Plane.
RA: Resistance-Area product.
RWL: Read Word Line.
SADP: self-aligned double patterning.
SAQP: self-aligned quadruple patterning.
SAxP: self-aligned multiple patterning.
SL: Source Line.
SOT-MRAM: Spin Orbit Torque Magnetic Random Access Memory.
SRAM: Static Random Access Memory.
STT: Spin-Transfer Torque.
STT-MRAM: Spin-Transfer Torque Magnetic Random Access Memory.
SVT: Standard Threshold Voltage.
T2T: tip-to-tip.
TCAD: Technology Computer Aided Design.
TMR: Tunnel Magneto-resistance Ratio.
WL: Word Line.
WWL: Write Word Line.
Contents
Abstract
Contents
List of Figures
List of Tables
1 Introduction
1.0.1 Embedded memory overload
1.0.2 Advanced technology complexity
1.0.3 Missing link between architecture, circuit and device research
1.0.4 Embedded versus standalone
1.1 Research aim of the PhD
1.2 General approach and research methods
2 STT-MRAM cell basics
2.1 Magnetic Tunnel Junction
2.1.1 Free layer physics
2.1.2 Processing
2.1.3 Modeling
2.2 Creating memory arrays
2.2.1 Access transistors
2.2.2 Biasing lines
2.3 Framework
3 Design-Technology Co-Optimization
3.1 Cell variants
3.1.1 Two finger cell
3.1.2 Dummy poly cell
3.1.3 DRAM-style cell
3.1.4 Pitch based cell sizes
3.1.5 111 SRAM cell
3.2 Multiple patterning in iN10 and iN7
3.2.1 Multiple patterning schemes
3.2.2 Design rules
3.2.3 Technologies
3.3 Critical rules analysis of physical layouts
3.3.1 iN10
3.3.2 iN7
3.4 Scaling boosters
3.4.1 SAQP mandrel and spacer engineering
3.4.2 Multi-level via
3.5 The third dimension
3.5.1 MTJ integration
3.5.2 Multi-level via to the rescue
3.6 Conclusion
4 Cell design for high density caches
4.1 Baseline STT-MRAM cell
4.1.1 Line resistance
4.1.2 Cell layout for reduced SL resistance
4.2 Cell with partial source line plane
4.2.1 Cell design and operation
4.2.2 Cell layout
4.3 Electrical assessment
4.3.1 Source line resistance versus area
4.3.2 Write performance and energy consumption
4.3.3 Read performance and voltage difference
4.4 Conclusion
5 Cell design for high performance caches
5.1 Complementary cell designs
5.1.1 What’s wrong with just one?
5.1.2 Adding a second MTJ and transistor
5.1.3 Adding a third transistor
5.1.4 Adding even more transistors
5.2 Improved 3T2MTJ cell design with ground grid
5.2.1 Adding the ground grid: more is less
5.2.2 Improving the write operation: using what is already there
5.2.3 Improving the read operation: using what is already there
5.3 Comparison with state-of-the-art
5.3.1 Layouts and area and resistance comparison
5.3.2 Sense margin comparison
5.3.3 Read performance comparison
5.3.4 Write performance comparison
5.3.5 Write energy consumption comparison
5.4 Conclusion
6 Write performance under time-dependent variability
6.1 The relevant questions
6.1.1 What is the optimal MTJ diameter for write performance?
6.1.2 Should PMOS or NMOS access transistors be used?
6.2 Main sources of variability
6.2.1 MTJ variability
6.2.2 FET variability
6.3 The answers
6.3.1 Optimal MTJ target CD
6.3.2 PMOS access transistors should be preferred
6.4 Conclusion
7 Conclusion
7.1 Beyond pitch based scaling
7.2 The importance of resistance
7.3 Beware of variability
7.4 The future of STT-MRAM
Bibliography
List of publications
List of Figures
2.1 MTJ with its functional layers in a) parallel low resistive state and b) anti-parallel high resistive state.
2.2 Schematic of the P2AP and AP2P write operation for the four MTJ-transistor combinations.
2.3 The baseline dual-BL cell with its a) circuit and b) array configuration.
2.4 Write operation of the baseline dual-BL cell with bias voltages for switching a) P2AP and b) AP2P.
2.5 Read operation of the baseline dual-BL cell with bias voltages for a) low voltage read and b) discharge based read.
2.6 Delay to a BL voltage difference of 100 mV between an LRS and HRS cell.
2.7 Maximum BL voltage difference between an LRS and HRS cell.
2.8 The common SL cell with its a) circuit and b) array configuration.
2.9 Schematic of the simulation framework.
3.1 Circuit of the basic cell highlighting the internal node (IN).
3.2 Circuit of the two-finger cell design.
3.3 Circuit of the dummy-poly cell design.
3.4 Circuit of the DRAM-style cell design.
3.5 Circuit of the 6 transistor SRAM cell with pull-up (PU), pass gate (PG) and pull-down (PD) transistors indicated.
3.6 Pitch density multiplying by multiple (i.e. triple) litho-etch scheme.
3.7 Multiple litho-etch scheme alignment problem.
3.8 Multiple litho-etch scheme pull-back problem.
3.9 Pitch doubling and quadrupling by self-aligned multiple patterning.
3.10 Pattern generation with SAxP and cut masks.
3.11 Pattern generation with SAxP and block masks.
3.12 Different populations of SAQP patterned metal lines.
3.13 T2T rule for both a) LEx and b) SAxP.
3.14 Extension rule for both a) M0 extension over fin and b) M0 extension V0/M1.
3.15 3D view of the two-finger cell design.
3.16 Layout of the two-finger cell in iN10 with critical rules highlighted.
3.17 3D view of the dummy-poly cell design.
3.18 Illustration of the contacting problem for the dummy-poly cell in iN10.
3.19 Layout of the dummy-poly cell in iN10 with critical rules highlighted.
3.20 3D view of the DRAM-style cell design.
3.21 Layout of the DRAM-style cell in iN10 with critical rules highlighted.
3.22 Layout of the 111 SRAM cell in iN10 on the left with critical FEOL rules highlighted on the right.
3.23 Layout of the two-finger cell in iN7 at 2 MP high.
3.24 Illustration of the problem with the vertical M1 strips in iN7.
3.25 Layout of the dummy-poly cell in iN7 at 3 MP high.
3.26 Layout of the DRAM-style cell in iN7 at 4 MP high.
3.27 Layout of the 111 SRAM cell in iN7 of 6 MP high.
3.28 Illustration of possible methods to tune the SAQP grid for Mint lines.
3.29 Illustration of the problem with the horizontal Mint strips for the dummy-poly cell in iN7.
3.30 Layout of the dummy-poly cell in iN7 with SAQP tuning, with critical rules highlighted.
3.31 SAQP mandrel width and pitch tuning to align the fin and Mint layers to the minimum cell height of the dummy-poly cell in iN7.
3.32 Layout of the DRAM-style cell in iN7 with SAQP tuning, with critical rules highlighted.
3.33 SAQP mandrel width and pitch tuning and spacer merging to align the fin and Mint layers to the minimum cell height of the DRAM-style cell in iN7.
3.34 Layout of the 111 SRAM cell in iN7 with SAQP engineering on the left with critical FEOL rules highlighted on the right.
3.35 Illustration of two potential implementations of a multi-level via.
3.36 Layout of the two-finger cell in iN7 with multi-level vias at 2 MP high.
3.37 Layout of the dummy-poly cell in iN7 when using one level of multi-level vias, with critical rules highlighted.
3.38 Layout of the dummy-poly cell in iN7 when using two levels of multi-level vias, with critical rules highlighted.
3.39 Typical MTJ integration within the via height.
3.40 MTJ integration problems with MTJ stacks higher than the via height.
3.41 Layout of the two-finger cell in iN7 with high MTJ stack using M1 lines to protect the MTJs from the M1 CMP step.
3.42 MTJ integration with high MTJ stack and multi-level vias.
3.43 Layout of the dummy-poly cell in iN7 with high MTJ stack using multi-level vias to protect the MTJs from the M1 CMP step.
3.44 3D view of the dummy-poly cell with high MTJ stack and multi-level vias.
4.1 Copper resistivity as reproduced from [1]. The increase of resistivity at small widths leads to high resistance increase.
4.2 Layout of the two-finger cell in iN10 with increased SL width highlighted.
4.3 Circuit of the cell design with a PSLP in bold.
4.4 Write operation of a PSLP-based cell with bias voltages for switching a) P2AP and b) AP2P.
4.5 Discharge based read operation of a PSLP-based cell.
4.6 3D view of the PSLP-based two finger cell.
4.7 Layout of the PSLP-based two finger cell in iN10 with critical rules highlighted.
4.8 Minimum cell area for various SL topologies.
4.9 SL resistance versus cell area.
4.10 Write delay for PSLP-based cells for different array sizes.
4.11 Average write energy for PSLP-based cells for different array sizes.
4.12 Delay to BL voltage difference of 100 mV for PSLP-based cells for different array sizes.
4.13 Maximum read voltage difference for PSLP-based cells for different array sizes.
5.1 Normal distributions of RP and RAP with relevant parameters for assessing sense margin.
5.2 Upper limit on resistance variation allowed for readability for 1 MTJ per cell.
5.3 Upper limit on resistance variation allowed for readability for 2 MTJs per cell.
5.4 a) Circuit of a 2T2MTJ cell and b) its independent write operation.
5.5 a) Circuit of a 3T sSL cell and b) its serial write operation.
5.6 Circuits of two 4T2MTJ cells as in a) [9] and b) [18].
5.7 Array configuration of a) the 3T sSL cell as in [9] and b) the novel 3TGG cell as in this work.
5.8 Write operation of the 3TGG cell with boosted P2AP switch.
5.9 P2AP write currents over time of the different complementary cells.
5.10 AP2P write currents over time of the different complementary cells.
5.11 Total write currents over time of the different complementary cells.
5.12 Circuit of a) 2T 2MTJ and b) 3TGG cell with ideal current input.
5.13 Circuit of a) 2T 2MTJ and b) 3TGG cell with ideal voltage input.
5.14 Layout of 2T 2MTJ cell (black box) in iN10 of 2x 102 nm high and 128 nm wide.
5.15 Layout of 3T sSL cell (black box) in iN10 of 144 nm high and 192 nm wide.
5.16 3D view of the 3TGG cell.
5.17 Layout of 3TGG cell (black box) in iN10 of 106 nm high and 192 nm wide.
5.18 Minimum cell area for the different complementary cells.
5.19 Read current difference in function of RP for the three complementary cells for the best case column with low BEOL impact.
5.20 Read current difference in function of RP for the three complementary cells for the worst case column with high BEOL impact (256 columns).
5.21 Delay to a BL voltage difference of 25 mV for the different complementary cells.
5.22 Write delay of the different cells with MTJ area variation.
5.23 Write energy of the different cells with MTJ area variation.
6.1 Threshold voltage shift for both PMOS and NMOS access transistors.
6.2 Write delay under variability.
6.3 Write delay versus size.
6.4 Optimal MTJ CD target for given MTJ CD control.
6.5 Relative delay for optimal MTJ CD target at t0 mean as compared to the optimal delay for given MTJ CD control.
6.6 Delay for optimal MTJ CD target for given MTJ CD control.
List of Tables
2.1 Operating voltages for the four MTJ-transistor combinations
2.2 Operating voltages for the read and write operations of the baseline dual-BL cell
2.3 MTJ simulation parameters
3.1 iN10 technology sizes, pitches and patterning schemes
3.2 iN10 design rules
3.3 iN7 technology sizes, pitches and patterning schemes
3.4 iN7 design rules
3.5 Minimum area [nm²] of the different cell variants for the different technology options
4.1 Operating voltages for the read and write operations of the PSLP-based cell.
5.1 Operating voltages for the read and write operations of the 2T 2MTJ cell.
5.2 Operating voltages for the read and write operations of the 3T sSL cell.
5.3 Operating voltages for the read and write operations of the 3TGG cell.
5.4 MTJ simulation parameters
5.5 Parasitic array resistances of the different complementary cells
5.6 Operating voltages of the low voltage read operation for the three complementary cells.
5.7 Operating and pre-charge voltages of the discharge based read operation for the three complementary cells.
5.8 Operating voltages of the logic 1 write operation for the three complementary cells.
6.1 iN10 technology and time-dependent variability parameters
Chapter 1
Introduction
Embedded memories are becoming dominant in chip area as opposed to the
processor itself. The poor scaling of Static Random Access Memory or SRAM
is at the heart of this problem. Spin-Transfer Torque Magnetic Random Access
Memory or STT-MRAM is being proposed as an area efficient alternative,
but little research has been done on how it scales to the complex advanced
technology nodes at which STT-MRAM would be introduced. The ultimate
goal of this research is to design novel embedded STT-MRAM cells that deal
with the challenges in and beyond 10 nm technology nodes. This will be done
by setting up a simulation framework that links to the physical implementation
and is used to analyze existing solutions and verify the novel designs.
1.0.1 Embedded memory overload
Computer memories which are embedded on the same physical chip as the
processor are at the origin and the heart of this work. They were devised to
counter the well-known Von Neumann bottleneck, where the speed of the off-
chip data transfer to the main memory is limiting the system performance. The
solution to this problem was to integrate a fast memory on the processor chip, the
so-called embedded memory. Because this memory is fabricated together with
the processor, it needs to follow the same processing steps.
Besides being fast and CMOS-compatible, it also
needs to be very durable, since it is used continuously.
SRAM has been the embedded memory of choice, since it combines the highest
speed with the best endurance and is fully CMOS-compatible. The biggest downside
of SRAM is its size and with that its cost. This is the reason why embedded
memories typically act as a smaller cache to a larger main memory, instead of
just replacing it. Using a fast cache to store the data that the processor will
need or is using, increases the memory bandwidth and maintains the overall
performance increase of the system.
Throughout the years of scaling down the transistor technology, it has become
more difficult to create an SRAM cell which meets all criteria for performance,
power and area. This has led to a diversification of SRAM cells in what are
typically called a high performance cell which is optimized for speed and
a low voltage or high density cell which is optimized for power and area.
Together with this diversification in embedded memories goes the creation of
multiple cache levels. There is no longer a single cache connecting the processor
with the main memory, but an entire memory hierarchy both on-chip and
off-chip.
The increase of the number of caches and the ever growing need for bigger
memories have caused embedded memories to dominate chip area. The large area
of the SRAM cell, its biggest downside, is causing this. Therefore the integration
of other memory technologies on chip or on package has begun. Embedded
Dynamic Random Access Memory or DRAM, for instance, is being used for
providing high density memories close to the processor. By eliminating the
need to go off-chip and/or off-package, the speed of these memories has also
increased as compared to their standalone variants.
This trend has also sparked a massive amount of research and interest in new
embedded memory technologies. Many technologies exist, but the leading
candidate for potential SRAM replacement is STT-MRAM. Its main benefits
over other emerging technologies are its high endurance and low voltage
operation [32]. High endurance is imperative for an embedded memory as
mentioned before, as it will be used continuously at a high frequency. The
low voltage operation is important to be CMOS-compatible, so that the logic
transistors can safely operate and also to limit energy consumption. The
high speed requirement of embedded memories has also been shown to be attainable
with STT-MRAM up to level-2 caches [16]. Therefore the focus of this PhD
research is on embedded STT-MRAM.
1.0.2 Advanced technology complexity
As mentioned in the title of this thesis, the research is done in and beyond
10 nm technology. This brings with it two main differences as compared to
older technology nodes: the replacement of planar transistors by "fin"-based
Field Effect Transistors (finFET) and the introduction of multiple patterning
techniques.
The finFET technology has a vertical semiconductor "fin" sticking up from the
bulk, which is covered on three sides with the controlling gate. The fin width is
a fixed technology parameter and extra drive can only be reached by creating
either parallel fins or parallel fingers. This limits transistor width tuning to
integer numbers of the minimum sized transistor. The three-sided control over the
channel region reduces the transistor leakage current in the off-state and
indirectly allows the transistor to be optimized for higher drive current. Due to the higher
drive current, the minimal area of STT-MRAM cells is mostly dominated by
Back-End-Of-Line (BEOL) processing instead of Front-End-Of-Line (FEOL)
processing.
Another challenge at these advanced nodes is the use of multiple patterning
techniques. The patterns that are being used in these nodes have crossed the
boundaries of single pattern pitches. Therefore a single dense pattern needs
to be printed in multiple less dense patterns. This approach brings with it
many new or stricter constraints on the possible patterns that can be created.
More margins need to be taken into account for misalignment between these
patterns. As a result, many secondary design rules dominate the minimum
achievable area of the designs. It has become impossible to predict the size of
future designs by simple pitch scaling. Estimates of memory sizes that were
previously done in F², where F would be the minimum feature size of the
technology, have become obsolete.
By switching to finFET technology and using multiple patterning techniques,
the entire design, layout and integration of STT-MRAM cells will change as
compared to planar technology with single patterning techniques. The cell
level work that has been done [3][33][7][8][24], however, focuses mostly on
existing nodes. Since STT-MRAM is still a novel technology that will only get
integrated on the processor chip in future technologies, it is imperative that
the design, layout and integration of STT-MRAM cells are investigated in these
nodes. Therefore the focus of this PhD research is on 10 nm technology and
beyond.
1.0.3 Missing link between architecture, circuit and device
research
For designing and experimenting with embedded cache memories, simulation
tools have been created [29] and extended to include new memory technologies
[2]. These tools are using the basic planar scaling rules in order to extrapolate
to future technologies. As discussed before, these future technologies are not
following the basic scaling trends anymore.
On the flip side of these higher level simulations for embedded memory design is
the device research that has been done on STT-MRAM. The Magnetic Tunnel
Junction or MTJ is the main component of STT-MRAM cells and is used for
implementing the memory function. A lot of research has been performed on
implementing and improving this device [10][16][4]. However, there has been
little work done in studying the device in its cell context, which is crucial for
the operation of the memory, because the highly non-linear access transistor
also has a large impact on the cell behavior. This holds especially in an embedded
memory, where the transistor cannot be tuned for the memory needs.
The link between architecture, circuit and device research in advanced
technologies needs to be established in order for STT-MRAM to get introduced
as an embedded memory. Therefore the focus of this PhD research is on providing
realistic area, performance and energy numbers for comparing the different cell
options and guiding device scaling.
1.0.4 Embedded versus standalone
STT-MRAM is also being researched as a standalone memory technology.
This is the most common approach for introducing a new technology, as it is
already very difficult to create it on its own, let alone integrate it with the
logic technology. As a standalone memory, it is targeting DRAM replacement.
DRAM is used as a main memory to the processing chip, ensuring it has enough
memory capacity to run its processes. The requirements for this main memory
are very different compared to the ones for the embedded cache memories.
The density of the standalone main memory needs to be higher than that of
the embedded cache memory. This requires scaling the dimensions of the MTJ
to the limit of the technology. The standalone memory can use a dedicated
transistor technology optimized for density, as is done in DRAM, to complement
the scaled MTJ. In an embedded memory the logic transistors and BEOL are
used, which are optimized for the performance of the logic. The embedded
STT-MRAM cells will therefore not be limited by the dimensions of the MTJ,
which reduces the need to scale it to the limit of the technology.
The performance of the embedded cache memory needs to be higher than that
of the standalone main memory, whose performance is already bottlenecked
by its off-chip nature. Combined with the lower capacity of the embedded
memory, this results in different electrical requirements for the MTJ. The large
capacity of the main memory will require higher read margins to cover the
larger population of devices under variability. The higher speed of the cache
memory will require lower resistances of the MTJ and faster switching.
This PhD research is focused on embedded STT-MRAM cells and will therefore
not focus on the scaling limits of the MTJ due to, e.g., etch damage. All MTJ
sizes used in this PhD research have been demonstrated in silicon in literature.
1.1 Research aim of the PhD
The main goal of the PhD research is to design STT-MRAM cells for embedded
memories in and beyond 10 nm technologies. Embedded memories have been
targeted, because there lies the biggest promise for STT-MRAM, since it is
CMOS-compatible, has high endurance and uses a low voltage to operate. Here,
it is also important to note that there is not a single cell solution for all embedded
memories, as some target higher performance and others target higher density.
The subtitle of this thesis "Making the link to the physical implementation",
also clarifies the secondary research aim of the PhD research. In these advanced
technology nodes, it has become ever more important to create the link with
the physical implementation of the devices in order to reach valid results and
conclusions. Secondary effects are becoming dominant for cell area and parasitics
are influencing the cells ever more.
1.2 General approach and research methods
The main approach and research methods to reach the main goals of this thesis
are summarized as follows. First, it is vital to build a simulation framework that
is capable of both exploring different cells at a circuit level, while creating a link
with the physical implementation. The link with the physical implementation
is created by using an existing Process Development Kit or PDK that has been
developed at imec for the 10 nm and 7 nm technology nodes. This framework is
then adapted and extended to include the MTJ in order to design STT-MRAM
cells. The circuit level exploration is done by adapting and updating an existing
in-house Matlab/Spice framework to include STT-MRAM.
With this framework in place, it is then possible to analyze the existing solutions.
Physical cell designs can be made for achieving accurate area numbers and
for showing where the bottlenecks for further scaling are. They also quantify
the parasitic resistances and capacitances present in the design in advanced
technology nodes. Circuit simulations can then be used for investigating the
electrical impact on the operation of the cells.
Finally, after this investigation of the bottlenecks for scaling and the effects of
parasitics on the cell operation, novel cell designs can be created to address
these issues. They can then be put through the same process of physical
design, parasitic extraction and circuit simulation to compare them to existing
solutions. In this way, there is a fair comparison done between the different
cell designs while making the link with the physical implementation in these
advanced technology nodes in and beyond 10 nm.
Chapter 2
STT-MRAM cell basics
In this first chapter on the basics of STT-MRAM cells, some necessary
prerequisites for the later chapters will be explained. The main components of
STT-MRAM cells are discussed: the MTJ, the access transistor and the biasing
lines. All these components together are used to form a simulation framework
which is used for all simulation experiments throughout this PhD research.
In this PhD research, a framework for the design, physical layout and electrical
evaluation of STT-MRAM cell designs has been created. The adaptation and
integration of existing frameworks and device models has enabled experiments
with existing cell designs and novel cell concepts.
Section 2.1 introduces the Magnetic Tunnel Junction or MTJ as the main device
and storage element of STT-MRAM cells. It highlights its physical behavior,
processing and modeling. Section 2.2 presents the other components needed to
create memory arrays, the access transistors and biasing lines. Extra attention
is paid to the different MTJ-transistor combinations and to the different array
configurations. Section 2.3 presents the framework that is used throughout this
PhD research for designing, simulating and analyzing the STT-MRAM cells.
2.1 Magnetic Tunnel Junction
The MTJ is the main component of STT-MRAM cells, as this device gives the
cells their memory functionality. The device is composed of three main layers
as illustrated in Fig. 2.1: the free magnetic layer, a barrier layer and a pinned
magnetic layer. The physical orientation of these layers can be inverted, but
the order remains the same. In this way, top-pinned MTJ’s, as in Fig. 2.1, have
the pinned layer at the top of the device, whereas bottom-pinned MTJ’s have
the pinned layer at the bottom.
Figure 2.1: MTJ with its functional layers (free layer, barrier layer and pinned
layer) in a) parallel low resistive state and b) anti-parallel high resistive state.
The free layer of an MTJ has bi-stable magnetization states. When the
magnetization of the free layer aligns in parallel (P) to the pinned layer, electrons
will tunnel through the barrier layer more easily, creating a low resistive state
(LRS). When the magnetization of the free layer aligns in anti-parallel (AP)
to the pinned layer, electrons will tunnel through the barrier layer less easily,
creating a high resistive state (HRS). In this way, the pinned layer is used as a
reference for determining the state of the free layer. This effect is called tunnel
magneto-resistance and the higher the ratio of this low and high resistive state,
the easier it is to distinguish between them. This is characterized by the Tunnel
Magneto-resistance Ratio or TMR, as expressed in Eq. 2.1:
$$\mathrm{TMR} = \frac{R_{AP} - R_{P}}{R_{P}} = \frac{R_{HRS} - R_{LRS}}{R_{LRS}} \tag{2.1}$$
In the rest of this thesis, the more common RAP and RP notations will be used.
This lower or higher resistance of the device can then be interpreted as a logic 0
or 1 in order to create the bitwise memory functionality. Note that the bi-stable
nature of the device limits it to store one bit per MTJ. In this section, the free
layer physics, processing and modeling of the MTJ are discussed.
Another important parameter of the MTJ is the Resistance-Area product or
RA. This parameter is used to characterize the absolute resistance of the layer
stack independent of the device area. Eq. 2.2 and Eq. 2.3 describe both MTJ
resistances in function of RA, TMR and the device area.
$$R_{P} = \frac{\mathrm{RA}}{A} \tag{2.2}$$
$$R_{AP} = \frac{\mathrm{RA}}{A}\,(\mathrm{TMR} + 1) = R_{P}\,(\mathrm{TMR} + 1) \tag{2.3}$$
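As a small worked example of Eq. 2.1 to Eq. 2.3, the sketch below converts an RA product, a TMR and a pillar diameter into both MTJ resistances. The circular pillar shape, the function name `mtj_resistances` and the chosen values (30 nm, 5 Ω·µm², 150 %) are assumptions for illustration only; they merely lie inside the parameter ranges of Table 2.3.

```python
import math


def mtj_resistances(diameter_nm, ra_ohm_um2, tmr):
    """Return (R_P, R_AP) in ohm for an MTJ pillar.

    diameter_nm: pillar diameter in nm (a circular pillar is assumed)
    ra_ohm_um2:  resistance-area product RA in ohm*um^2
    tmr:         TMR as a fraction (1.0 corresponds to 100 %)
    """
    radius_um = diameter_nm * 1e-3 / 2.0
    area_um2 = math.pi * radius_um ** 2
    r_p = ra_ohm_um2 / area_um2      # Eq. 2.2
    r_ap = r_p * (tmr + 1.0)         # Eq. 2.3
    return r_p, r_ap


# Hypothetical parameter set, not a measured device.
r_p, r_ap = mtj_resistances(diameter_nm=30.0, ra_ohm_um2=5.0, tmr=1.5)
print(f"R_P  = {r_p / 1e3:.2f} kOhm")
print(f"R_AP = {r_ap / 1e3:.2f} kOhm")
# Eq. 2.1 applied to the two resistances returns the TMR that was put in.
print(f"TMR  = {(r_ap - r_p) / r_p:.2f}")
```

Because the resistances scale with 1/A, halving the diameter at a fixed RA multiplies both resistances by four, while their ratio stays fixed by the TMR.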
2.1.1 Free layer physics
The free magnetization layer is the part of the MTJ that should change its
properties and is therefore the main focus of this explanation.
Magnetization dynamics
The physical functioning of the device is determined by the magnetization
dynamics of the free layer. The physical effects impacting the operation of the
device can be described by magnetic field contributions on the free layer. The
Landau-Lifshitz-Gilbert (LLG) equation (Eq. 2.4 as in [26]) formulates these
magnetization dynamics in function of the effective magnetic field Heff:
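$$\frac{\partial \mathbf{M}}{\partial t} = -\gamma_0 \left(\mathbf{M} \times \mathbf{H}_{\mathrm{eff}}\right) + \frac{\alpha}{M_S} \left(\mathbf{M} \times \frac{\partial \mathbf{M}}{\partial t}\right) \tag{2.4}$$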
where the magnetization M is a function of space and time M(r,t), α is the
damping factor, γ0 is the product of the electron gyro-magnetic ratio γ and
the permeability of vacuum µ0 and MS is the saturation magnetization of the
free layer, which is assumed to be time-independent. The first term describes
the precession behavior of the magnetization and the second term describes the
damping effect.
The effective magnetic field is composed of an exchange field, an anisotropy
field (magneto-crystalline, shape or interfacial) and a demagnetizing field and
can also have an externally applied field as a contribution. The origin of these
fields is beyond the scope of this thesis, but the impact they have on the device
is worth highlighting here.
The anisotropy field is the contribution that gives the MTJ its bi-stable
states. It will cause the device to have two preferred alignments, in parallel or
anti-parallel with this field. This field is created by the crystal structure of the
material, the shape of the free layer or the interfacial effects at the boundary
with the barrier layer. In this way it is possible to create so-called in-plane
or perpendicular MTJ’s. In-plane MTJ’s have their preferred magnetization
direction in the plane of the free layer. This is commonly achieved by making
elliptic MTJ’s, where the magnetization preferably aligns to the long axis of the
device. These elliptic devices are however not scalable to very small sizes, so
there has been a switch to the perpendicular MTJ’s. Perpendicular MTJ’s have
their preferred magnetization direction perpendicular to the plane of the free
layer. This is commonly achieved by making very thin free layers of which the
magnetization is dominated by interfacial effects. This has also led to the use
of so-called double MgO based MTJ’s which have these interfacial effects twice.
The exchange field and the demagnetizing field are always present and
counteract each other. The exchange field tries to align the magnetic moments
whereas the demagnetizing field is opposite to the magnetization. This
counteraction will cause the formation of magnetic domains in the free layer.
The externally applied field can be used to flip the magnetization of the
free layer. This way of controlling the state of the MTJ is however very energy
consuming. The breakthrough in developing this device has come with the
discovery of Spin-Transfer Torque (STT).
Spin-Transfer Torque
STT is vital for controlling the state of an MTJ in an energy efficient way.
As described by Slonczewski in [26], electron spins exert a torque on the
magnetization of the free layer. In this way, the state of the MTJ can be
changed by pushing a current through it. When electrons first go through the
pinned layer, the majority of them that tunnel through the barrier will have
their spin aligned in parallel to the pinned layer. The pinned layer now acts
as a polarizer. The polarized electrons will exert a torque on the free layer
magnetization, causing it to align in parallel to the pinned layer. When electrons
first go through the free layer, again the majority of them that tunnel through
the barrier will have their spin aligned in parallel to the pinned layer. This will
cause an increase in anti-parallel spins to build up in the free layer, causing it
to align in anti-parallel to the pinned layer.
It is important to note that this slightly different writing mechanism causes the
MTJ write operation to be asymmetrical. It is easier to change the state from
anti-parallel to parallel (AP2P), since this is a majority pass-through spin effect,
whereas the change from parallel to anti-parallel (P2AP) is more difficult, since
this is a minority accumulating spin effect.
Since STT is a torque effect, it is vital for this system to be non-ideal. When
starting from an ideal parallel or anti-parallel state, the angle between the spin
and the magnetization will be zero or π. In these cases, there will be zero
torque and no change will happen. In practice, due to thermal fluctuations and
non-ideal fabrication, there will be an initial angle between the spin and the
magnetization when applying a current. This initial angle will influence the write
operation of STT-MRAM cells greatly, as was shown in [14]. Throughout this
PhD research, simulations have been done with various initial angles which have
shown that the relative performance and energy consumption of the different
cell designs do not depend on this initial angle.
The thermal fluctuations in an MTJ are also important when considering data
retention. Although the MTJ has two stable states, it can switch randomly
from one state to the other if enough energy is given by thermal fluctuations.
This phenomenon is characterized by the thermal stability factor ∆, which is
described in Eq. 2.5:
$$\Delta = \frac{E}{k_B T} \tag{2.5}$$
where E is the energy barrier between both stable states, kB is the Boltzmann
constant and T is the absolute temperature. The higher this ∆ the longer it will
take on average for an MTJ to randomly switch states and lose its data. The
energy barrier depends on the volume of the free layer, making it easier
to randomly switch smaller devices made with the same process and materials.
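To give a feeling for the numbers behind Eq. 2.5, the sketch below evaluates ∆ for a hypothetical energy barrier at two temperatures. The conversion from ∆ to an average retention time uses the Néel-Arrhenius relation τ = τ0·exp(∆) with an attempt time τ0 of 1 ns; this relation and the value of τ0 are not taken from this text and are added here as commonly used assumptions, as are the function names and the barrier of 60 kBT at 300 K.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def thermal_stability(energy_barrier_j, temperature_k):
    """Thermal stability factor of Eq. 2.5."""
    return energy_barrier_j / (K_B * temperature_k)


def mean_retention_time_s(delta, attempt_time_s=1e-9):
    """Average time before a random flip (Neel-Arrhenius, assumed model)."""
    return attempt_time_s * math.exp(delta)


# Hypothetical barrier: 60 k_B*T at 300 K, kept fixed while T changes.
barrier_j = 60.0 * K_B * 300.0
for temperature_k in (300.0, 358.15):  # 358.15 K corresponds to 85 degrees C
    delta = thermal_stability(barrier_j, temperature_k)
    years = mean_retention_time_s(delta) / (365.25 * 24 * 3600)
    print(f"T = {temperature_k:6.2f} K  delta = {delta:5.1f}  retention ~ {years:.2e} years")
```

The exponential dependence is the point of the example: a modest drop in ∆, whether caused by a higher temperature or by a smaller free layer volume, shortens the average retention time by orders of magnitude.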
2.1.2 Processing
The processing of an MTJ is done in three steps: the thin film deposition, the
printing of the pillars and the etching of the pillars. It is important that these
steps can be integrated in a regular CMOS-compatible BEOL stack in order to
use STT-MRAM as an embedded memory.
The thin film deposition creates a multi-layered stack across the wafer that
forms the functional layers as discussed in section 2.1. The TMR and RA
parameters are determined by the thin film deposition as they are independent
of the device area. The thin film layers are however more complex than the
functional layers suggest. Especially the pinned layer is composed of many
layers to achieve the desired effect on the free layer. The barrier layer is the
most straightforward and is usually made of a single MgO layer. The free layer
also used to be straightforward and was usually made of a single CoFeB layer.
In order to improve the free layer properties, devices with two MgO layers
have also been fabricated to improve the interfacial anisotropy of the free layer.
The printing of the pillars is done by lithography. Depending on the size, a
few different techniques have been applied. The standard 193 nm immersion
(193i) single exposure printing can reach the required density of the MTJ’s in
advanced nodes, but the variability is too large. Double exposure printing has
been applied to reduce this and also a switch to Extreme Ultra-Violet (EUV)
printing is being considered. Also the crossing of two spacer-defined lines is
under investigation.
The etching of the pillars is the last step in the processing together with the
encapsulation and Chemical Mechanical Polishing (CMP). After the printing is
done, this pattern is etched into the multi-layered stack to create the individual
MTJ devices. The encapsulation ensures that they remain isolated from their
surroundings and the CMP allows further processing to start from a flat surface.
The device area and with it the thermal stability factor are dependent on both
the printing and etching when forming the MTJ pillars.
2.1.3 Modeling
A main split that can be made in modeling approaches is whether they are
physics based or empirical.
Physics based models
This first kind of models is based on the LLG equation described above in
Eq. 2.4. In addition to modeling the field contributions described above, the
STT is mathematically converted into another contribution to the effective
field and the thermal fluctuations are modeled as a random field depending
on the temperature. There are two approaches to solve this equation: the
micro-magnetic model or the macro-spin approximation.
In the micro-magnetic model, the LLG equation is numerically solved for
many points in the free layer. This requires a lot of computation as integrals over
the entire volume need to be calculated for every point in the simulation. This
approach gives the highest accuracy and is very suited for physical understanding
of the magnetization dynamics.
The macro-spin approximation reduces the complexity of the micro-
magnetic model by removing the position dependence of the magnetization.
The assumption is that at very small scale, there will only be a single magnetic
domain that will behave as a single big spin, hence the macro-spin approximation.
With this assumption, the exchange field gets removed from the equation and
the computation intensive demagnetizing field gets simplified. This approach
is less accurate, but far more suited to do circuit simulations thanks to the
improved simulation time. The single domain assumption is also reasonable for
the dimensions used in this PhD research.
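The macro-spin idea can be illustrated with a few lines of code. The sketch below is not the device model used in this research: it integrates Eq. 2.4 for a single unit vector m = M/MS in a constant effective field along z, and it leaves out the STT contribution, the thermal field and the anisotropy and demagnetizing fields. Eq. 2.4 is first rewritten in its explicit form, dm/dt = −γ0/(1+α²)·[m×Heff + α·m×(m×Heff)], which follows from substituting the equation into itself for |m| = 1. The forward Euler integrator, the field strength, the damping factor of 0.1 and the time step are all assumptions chosen to keep the example short.

```python
import math

# gamma_0 = gamma * mu_0 in m/(A*s), with gamma the electron gyro-magnetic ratio
GAMMA0 = 1.760859e11 * 4.0e-7 * math.pi


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def llg_rhs(m, h_eff, alpha):
    """Explicit form of Eq. 2.4 for the unit magnetization m."""
    m_x_h = cross(m, h_eff)          # precession term
    m_x_m_x_h = cross(m, m_x_h)      # damping term
    prefactor = -GAMMA0 / (1.0 + alpha ** 2)
    return tuple(prefactor * (m_x_h[i] + alpha * m_x_m_x_h[i]) for i in range(3))


def simulate(theta0, h_z, alpha, dt, steps):
    """Forward Euler integration with renormalization of |m| after each step."""
    m = (math.sin(theta0), 0.0, math.cos(theta0))
    h_eff = (0.0, 0.0, h_z)          # constant field along z, in A/m
    for _ in range(steps):
        dm = llg_rhs(m, h_eff, alpha)
        m = tuple(m[i] + dt * dm[i] for i in range(3))
        norm = math.sqrt(sum(c * c for c in m))
        m = tuple(c / norm for c in m)
    return m


# Hypothetical values: start 2 rad away from the field and simulate 2 ns.
m_final = simulate(theta0=2.0, h_z=1.0e5, alpha=0.1, dt=1.0e-13, steps=20000)
print("final m_z:", round(m_final[2], 4))
```

The magnetization precesses around the field while the damping term slowly pulls it into alignment, which is the behavior described for the two terms of Eq. 2.4. A circuit-level model adds the remaining field contributions on top of this same structure.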
Empirical models
The second kind of models is based on measurement data which is fitted to
curves with a minimum amount of fitting parameters. The biggest benefit of
this model is that it can exactly mimic the behavior of the processed devices.
This makes it ideal to do simulations for designs with a known technology. The
downside of this approach is that this can only be done for a known technology
of which there is measurement data available. This was an issue during this
PhD research, since there was no measurement data available for the dimensions
that were being investigated. Moreover, in order to explore different parameters
of the MTJ device, as was done during this PhD research, a lot of measurement
data should be available to be able to interpolate between different parameter
sets.
2.2 Creating memory arrays
Two other cell components are needed to create STT-MRAM arrays: access
transistors and biasing lines.
2.2.1 Access transistors
The access transistor is the second device present in STT-MRAM cells. It is
needed to select or access the proper MTJ from which information needs to be
retrieved or in which it needs to be stored. The transistor is placed in series
with the two-terminal MTJ and connected by its drain terminal.
Logic finFET technologies
Logic technologies are used for the cells, since this PhD research is performed
in the context of embedded memories. The designs fully comply with the logic
process in order to guarantee flawless integration [6]. The nominal supply
voltages and regular logic transistors are used in the cells.
In this PhD research, the advanced nodes of imec 10 nm and 7 nm technology
(iN10 and iN7) have been used, which are both finFET technologies. These technologies
have replaced planar technology due to their superior electrostatic channel
control, which in turn leads to higher drive currents for the same leakage
specifications. One of the biggest consequences is that there is no more transistor
width tuning allowed and the only way to increase the transistor size from the
minimum one is to arrange multiples of them in parallel.
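The consequence of this quantized transistor sizing can be made concrete with a short calculation. The sketch below assumes, as a simplification, that the drive current scales linearly with the number of parallel fins; the per-fin current and the target current are hypothetical numbers, not values from the iN10 or iN7 technology, and `fins_required` is an illustrative helper.

```python
import math


def fins_required(target_current_ua, current_per_fin_ua):
    """Smallest integer number of parallel fins that reaches the target current.

    Assumes the drive current adds up linearly over parallel fins, which
    ignores the series resistance of the MTJ and of the wiring.
    """
    if target_current_ua <= 0 or current_per_fin_ua <= 0:
        raise ValueError("currents must be positive")
    return max(1, math.ceil(target_current_ua / current_per_fin_ua))


# Hypothetical numbers: 40 uA per fin, 100 uA needed for the write operation.
n_fins = fins_required(target_current_ua=100.0, current_per_fin_ua=40.0)
delivered_ua = n_fins * 40.0
print(f"{n_fins} fins deliver {delivered_ua:.0f} uA for a 100 uA target")
```

With a continuous transistor width the designer could size for exactly the target; with fins the drive is over-provisioned up to the next integer, and each extra fin adds area in discrete steps.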
MTJ-transistor combinations
When combining both possible MTJ devices, being top-pinned and bottom-
pinned, with both possible access transistors, being Negative-channel and
Positive-channel Metal-Oxide-Semiconductor (NMOS and PMOS), there are
four combinations that can be created. Due to the asymmetry of the STT effect
in MTJ’s and the complementary nature of NMOS and PMOS transistors, not
all combinations are equally interesting.
Fig. 2.2 shows both write operations of the four combinations: a) top-pinned
MTJ with NMOS transistor, b) bottom-pinned MTJ with NMOS transistor,
c) top-pinned MTJ with PMOS transistor and d) bottom-pinned MTJ with
PMOS transistor. The P2AP switch is the most difficult as discussed in
section 2.1.1 and is therefore highlighted in red in the figure. The voltage biasing
of the cell is highlighted in blue. Remember that the current through the MTJ
flows opposite to the electron flow used in the explanation of the STT effect.
Table 2.1 summarizes the operating voltages from Fig. 2.2. VMTJ is the voltage
at the MTJ-terminal, Vs/d is the voltage at the source/drain-terminal of the
transistor, Vg is the voltage at the gate-terminal of the transistor and Vgs is the
gate-to-source voltage of the transistor.
Combinations a) and d) have the full gate-to-source bias for the more difficult
P2AP switch and a degenerated gate-to-source bias for the easier AP2P switch.
Combinations b) and c) have a degenerated gate-to-source bias for the more
difficult P2AP switch and the full gate-to-source bias for the easier AP2P
switch. Therefore combination a) top-pinned with NMOS and combination
d) bottom-pinned with PMOS are the preferred ones. Throughout this text,
the top-pinned NMOS combination will be used by default, unless specified
otherwise.
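The reasoning behind Table 2.1 can be written down as a few rules, which the sketch below encodes. It assumes the cell arrangement used in this text, with the MTJ terminal at the top of the pillar and the transistor below it. The P2AP switch needs electrons to enter through the free layer, so the conventional current flows from the pinned layer to the free layer; the source of an NMOS is its lower-potential terminal and the source of a PMOS its higher-potential terminal. The function name `write_bias` and the string labels are illustrative choices.

```python
def write_bias(pinned, transistor, switch):
    """Return (V_MTJ, V_s/d, V_g, V_gs) labels for one write operation.

    pinned:     'top' or 'bottom' (position of the pinned layer)
    transistor: 'NMOS' or 'PMOS'
    switch:     'P2AP' or 'AP2P'
    """
    if pinned not in ("top", "bottom"):
        raise ValueError("pinned must be 'top' or 'bottom'")
    if transistor not in ("NMOS", "PMOS"):
        raise ValueError("transistor must be 'NMOS' or 'PMOS'")
    if switch not in ("P2AP", "AP2P"):
        raise ValueError("switch must be 'P2AP' or 'AP2P'")

    # P2AP: current flows from the pinned layer to the free layer.
    current_leaves_pinned = switch == "P2AP"
    # The MTJ terminal sits at the top, so a top-pinned P2AP write needs it high.
    mtj_high = current_leaves_pinned if pinned == "top" else not current_leaves_pinned

    v_mtj = "High" if mtj_high else "Low"
    v_sd = "Low" if mtj_high else "High"
    v_g = "High" if transistor == "NMOS" else "Low"

    # Full V_gs when the transistor source is tied directly to the biased line:
    # the low line for an NMOS, the high line for a PMOS. Otherwise the source
    # is the internal node next to the MTJ and V_gs is degenerated.
    if transistor == "NMOS":
        full = v_sd == "Low"
    else:
        full = v_sd == "High"
    return v_mtj, v_sd, v_g, "Full" if full else "Degenerated"


for pinned in ("top", "bottom"):
    for transistor in ("NMOS", "PMOS"):
        for switch in ("P2AP", "AP2P"):
            print(transistor, pinned, switch, write_bias(pinned, transistor, switch))
```

Running the loop reproduces the eight rows of Table 2.1 and shows directly that only the top-pinned NMOS and bottom-pinned PMOS combinations pair the full gate-to-source bias with the difficult P2AP switch.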
2.2.2 Biasing lines
The third component is the set of biasing lines that connect the different cells in an
array configuration. As seen in Fig. 2.2, an STT-MRAM cell has three terminals
which need to be biased. The connection to the gate of the access transistor is
the Word Line (WL) and is used to select a column of cells. The connection to
the top of the MTJ is the Bit Line (BL) and the connection to the source of the
access transistor is the Source Line (SL). These lines are used to bias the MTJ
for changing or reading its state. In particular, the BL is the line which contains
the data and should always run perpendicular to the WL which selects the
column.
Figure 2.2: Schematic of the P2AP and AP2P write operation for the four
MTJ-transistor combinations.
Note that, throughout this text, the convention of WL columns and BL rows
will be used. Contrary to standalone memories, embedded memories like SRAM
follow this convention because the gates in logic cells are always drawn vertical.
With the need for the WL and BL to run perpendicular, there are two array
configurations possible: the SL running in parallel with either the BL or the
WL.
Table 2.1: Operating voltages for the four MTJ-transistor combinations
Combination      Switch  VMTJ  Vs/d  Vg    Vgs
a) NMOS top      P2AP    High  Low   High  Full
a) NMOS top      AP2P    Low   High  High  Degenerated
b) NMOS bottom   P2AP    Low   High  High  Degenerated
b) NMOS bottom   AP2P    High  Low   High  Full
c) PMOS top      P2AP    High  Low   Low   Degenerated
c) PMOS top      AP2P    Low   High  Low   Full
d) PMOS bottom   P2AP    Low   High  Low   Full
d) PMOS bottom   AP2P    High  Low   Low   Degenerated
Baseline dual-BL cell
The baseline version is the dual-BL cell (as in [16][3]), where the SL runs in
parallel with the BL. Fig. 2.3 shows this cell with its biasing lines and how it
is configured in an array. For all the cells in the same column, their access
transistors are controlled by the same WL and they are biased by different BL’s
and SL’s. This array configuration allows for a fully independent control of all
selected cells.
Figure 2.3: The baseline dual-BL cell with its a) circuit and b) array
configuration.
Fig. 2.4 illustrates the write operation of the baseline dual-BL cell. The BL and
SL are biased in order to control the direction of the current flow through the
MTJ depending on which state must be written to the cell. All cells controlled
by the same WL are biased independently by their own SL and BL, enabling a
fully parallel write operation.
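The parallel write of the dual-BL array can be sketched as a simple mapping from the states to be written to the line biases. The sketch assumes the default top-pinned NMOS cell of this text, for which a P2AP write needs a high BL and a low SL and an AP2P write the opposite. The function name and the dictionary layout are illustrative only.

```python
def dual_bl_write_bias(target_states):
    """Line biases for writing all cells on one selected WL in parallel.

    target_states: list with 'AP' or 'P' for each cell on the selected WL.
    Assumes the top-pinned NMOS cell used by default in this text.
    """
    cell_bias = []
    for state in target_states:
        if state == "AP":        # P2AP write
            cell_bias.append({"BL": "High", "SL": "Low"})
        elif state == "P":       # AP2P write
            cell_bias.append({"BL": "Low", "SL": "High"})
        else:
            raise ValueError("state must be 'AP' or 'P'")
    return {"WL": "High", "cells": cell_bias}


# Four cells on the same WL, each with its own BL/SL pair.
bias = dual_bl_write_bias(["AP", "P", "P", "AP"])
for index, lines in enumerate(bias["cells"], start=1):
    print(f"BL{index} = {lines['BL']:4s}  SL{index} = {lines['SL']}")
```

Since every cell has a private BL and SL, both write directions can be applied on the same WL at the same time without any constraint between neighboring cells.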
Figure 2.4: Write operation of the baseline dual-BL cell with bias voltages
for switching a) P2AP and b) AP2P.
Because the MTJ is a two terminal device, the read operation will also pass
a current through it that will also create the STT effect. Therefore there is a
risk of disturbing the state of the MTJ when reading the cell, known as read
disturb. As shown in [21], this leaves two options for the read operation: a
slower low voltage mode or a faster high voltage mode. A pre-charge/discharge
scheme will also be considered, since this limits the amount of electrons
flowing through the MTJ, preventing read disturb. This scheme also makes it
possible to characterize the read delay impact of different cell versions.
Fig. 2.5 illustrates the low voltage read operation and the discharge based read
operation of the baseline dual-BL cell. For the low voltage read operation,
the BL’s are biased to a low read voltage and the SL’s are grounded. For the
discharge based read operation, the BL’s are pre-charged to the supply voltage
and the SL’s are grounded. All cells controlled by the same WL can sense their
own BL, enabling a fully parallel read operation.
Table 2.2 summarizes the operating voltages from Fig. 2.4 and Fig. 2.5. Note
that the initial biasing and flow of current for the read operations is the same
as for the P2AP switch. This biasing is most robust against read disturb, since
this is the most difficult switch.
Two metrics from the discharge based read operation will be used to characterize
the cells: the delay to a specific BL voltage difference between an LRS and HRS
cell and the maximum BL voltage difference at any time during the discharging.
Fig. 2.6 illustrates the delay and Fig. 2.7 illustrates the maximum difference.
Both figures show the BL voltages during pre-charging and discharging of an
LRS cell in orange and a HRS cell in blue (purple when they overlap).
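Both metrics can be illustrated with a strongly simplified model. The sketch below treats the BL as a single capacitor that discharges through a constant cell resistance, so the BL voltage follows V(t) = Vpre·exp(−t/(R·C)). This linear RC model is an assumption made for the example: it ignores the non-linear access transistor and the bias dependence of the MTJ resistance, which the circuit simulations in this research do take into account. All component values and the function names are hypothetical.

```python
import math


def bl_voltage(t, v_pre, r_cell, c_bl):
    """BL voltage of a pre-charged line discharging through a constant resistance."""
    return v_pre * math.exp(-t / (r_cell * c_bl))


def read_metrics(v_pre, r_lrs, r_hrs, c_bl, dv_target, t_end, n_steps=20000):
    """Return (delay to dv_target, maximum BL difference, time of the maximum)."""
    delay = None
    dv_max, t_max = 0.0, 0.0
    for i in range(n_steps + 1):
        t = t_end * i / n_steps
        # The HRS cell discharges more slowly, so its BL stays higher.
        dv = bl_voltage(t, v_pre, r_hrs, c_bl) - bl_voltage(t, v_pre, r_lrs, c_bl)
        if delay is None and dv >= dv_target:
            delay = t
        if dv > dv_max:
            dv_max, t_max = dv, t
    return delay, dv_max, t_max


# Hypothetical values: 5 kOhm transistor in series with a 7 kOhm / 17.5 kOhm MTJ,
# a 20 fF bit line and a 0.7 V pre-charge level.
R_LRS, R_HRS, C_BL, V_PRE = 12.0e3, 22.5e3, 20.0e-15, 0.7
delay, dv_max, t_max = read_metrics(V_PRE, R_LRS, R_HRS, C_BL,
                                    dv_target=0.1, t_end=2.0e-9)
print(f"delay to 100 mV: {delay * 1e12:.0f} ps")
print(f"maximum difference: {dv_max * 1e3:.0f} mV at {t_max * 1e12:.0f} ps")
```

In this model the difference first grows because the LRS line discharges faster and then shrinks again as both lines approach ground, which is why a maximum exists and why the sensing moment matters.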
Figure 2.5: Read operation of the baseline dual-BL cell with bias voltages for
a) low voltage read and b) discharge based read.
Figure 2.6: Delay to a BL voltage difference of 100 mV between an LRS and
HRS cell.
Table 2.2: Operating voltages for the read and write operations of the baseline
dual-BL cell
Operation              VBL         VSL   VWL
Low voltage read       Vread       Low   High
Discharge based read   PRE = High  Low   High
P2AP write             High        Low   High
AP2P write             Low         High  High
Figure 2.7: Maximum BL voltage difference between an LRS and HRS cell.
Common SL cell
The other version is the common SL cell (as in [33]), where the SL runs in
parallel with the WL. Fig. 2.8 shows this cell with its biasing lines and how
it is configured in an array. For all the cells in the same column, their access
transistors are controlled by the same WL and they are biased by different BL’s,
but all selected cells have the same SL bias. When writing the selected cells,
this equal SL bias causes a need for either a negative BL voltage or a two-phase
write operation. There is no such issue when reading the selected cells, since
the SL’s have the same read bias.
Figure 2.8: The common SL cell with its a) circuit and b) array configuration.
2.3 Framework
In this PhD research, a simulation framework has been built to conduct the
circuit simulation experiments. This framework has been used for all work
presented in this thesis. Fig. 2.9 gives a high level overview of the different
components while highlighting the contributions made in this PhD research.
Figure 2.9: Schematic of the simulation framework.
The core of the framework is a Matlab/Spice based framework which was
developed in-house for circuit simulations of e.g. SRAM cells. It has been
developed to flexibly generate circuit netlists and execute them with a simulator
such as Cadence Spectre. The framework also offers software modules for
post-processing all signals in the circuits. It needs input scripts for the netlist
generation and device models for simulation execution. Post-processing scripts
have been created in the framework to analyze the performance and energy
consumption of the STT-MRAM cells.
Table 2.3: MTJ simulation parameters
Parameter | Range
Diameter | 20 - 50 nm
RA | 2 - 5 Ω·µm²
TMR | 50 - 200 %
STT-MRAM specific scripts have been added in order to use the framework
for this PhD research. These scripts contain all the cell designs which will
be discussed in this thesis. They make it possible to simulate the different array
configurations and operation modes that have been introduced, as well as
the baseline versions. The introduction of variability in the simulations has
also been included in these scripts.
A Verilog-A based MTJ model has been created, based on an in-house
C-code based model which uses the macro-spin approximation in 3 dimensions
discussed in section 2.1.3. Verilog-A was used for easy integration with the
circuit simulators. The macro-spin approximation is used in order to reduce
simulation time. It is assumed to be accurate enough in the MTJ diameter
range used in this PhD research to see the relative effects between the different
cells that were investigated. The MTJ properties that were used in this model
are based on [10]. Table 2.3 shows the properties that were swept in experiments
with their ranges.
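To give a feeling for the resistance values spanned by the ranges in Table 2.3, the sketch below converts a diameter, RA product and TMR into a parallel and anti-parallel resistance. It is not the Verilog-A model of this work. It assumes a circular MTJ pillar, the usual definition TMR = (R_AP - R_P) / R_P and no bias dependence, and the function name is hypothetical.

```python
import math

def mtj_resistances(diameter_nm, ra_ohm_um2, tmr_percent):
    """Return (R_P, R_AP) in ohm for a circular MTJ.

    diameter_nm : MTJ diameter in nm
    ra_ohm_um2  : resistance-area product in ohm * um^2
    tmr_percent : TMR in percent, assumed to be (R_AP - R_P) / R_P
    """
    radius_um = diameter_nm * 1e-3 / 2.0
    area_um2 = math.pi * radius_um ** 2
    r_p = ra_ohm_um2 / area_um2
    r_ap = r_p * (1.0 + tmr_percent / 100.0)
    return r_p, r_ap

# Corners of the swept ranges in Table 2.3
r_p_small, r_ap_small = mtj_resistances(20, 5, 100)   # small MTJ, high RA
r_p_large, r_ap_large = mtj_resistances(50, 2, 100)   # large MTJ, low RA
```

Because the resistance scales with the inverse of the area, the smallest diameter combined with the highest RA gives a parallel resistance more than an order of magnitude above that of the largest diameter with the lowest RA.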
The final component of the framework is the in-house created Process
Development Kit (PDK) as in [22]. It is used to develop the imec 10 nm
technology with bulk devices for logic designs. It contains a physical layout
environment with Design Rule Checking (DRC) and Parasitics EXtraction
(PEX) for both Back-End-Of-Line (BEOL) and Front-End-Of-Line (FEOL)
parasitics. Netlists can be generated which are compatible with in-house device
models. These transistor compact models are another very important part
of the PDK and are used for the circuit simulations. They are based on
silicon measurements, Technology Computer Aided Design (TCAD) studies
and references [31][30]. An MTJ layer has been added to this PDK in order to
create the STT-MRAM cell designs. By using this PDK, accurate area numbers
for the cell designs can be obtained, as is also shown in chapter 3. It also
provides the simulation framework with the correct transistor parameters and
line resistance and capacitance.
The use of the total framework can be summarized as follows. After an initial
cell concept design, a physical layout of the cell array is created in the PDK
and validated for manufacturability with the built-in DRC. This physical layout
already gives the area of the new design. The PEX of this physical layout gives
the transistor parameters and the line resistances and capacitances. The new
cell design and/or operation mode is added to the scripts in order to generate
a trimmed memory array for the simulations. A trimmed array consists of
only the active column and the row(s) under investigation and is used to
save simulation time. The PEX is done on a full array in order to include the
coupling capacitances to nodes that are not included in the trimmed deck. These
coupling capacitances are then added to the intrinsic capacitance to ground for
the simulation. For every simulation, only one cell is considered and the other
cells on the BL’s and WL’s are lumped together in small groups, forming a
distributed chain. This lumping is parameterized to trade off accuracy against
simulation time. Then simulations are run with multiple MTJ parameters, array
sizes and cell designs. After these simulations, the performance and energy
consumption of the different designs can be analyzed and compared.
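The lumping of cells into a distributed chain can be sketched as follows. This is an illustration only and not the Matlab scripts of the framework: it assumes a uniform line where every cell adds the same resistance and capacitance, uses the Elmore delay as a simple accuracy indicator, and all names and values are hypothetical.

```python
def lump_line(n_cells, cells_per_group, r_per_cell, c_per_cell):
    """Split a line loaded by n_cells cells into lumped RC segments.
    Returns a list of (R, C) tuples, one per group of cells."""
    segments = []
    remaining = n_cells
    while remaining > 0:
        size = min(cells_per_group, remaining)
        segments.append((size * r_per_cell, size * c_per_cell))
        remaining -= size
    return segments

def elmore_delay(segments):
    """Elmore delay to the far end of an RC ladder in which the
    capacitance of each segment sits after its resistance."""
    delay = 0.0
    r_upstream = 0.0
    for r, c in segments:
        r_upstream += r
        delay += r_upstream * c
    return delay

# Hypothetical line: 256 cells, 2 ohm and 0.1 fF per cell
for group in (256, 32, 4, 1):
    chain = lump_line(256, group, 2.0, 0.1e-15)
    print(group, len(chain), elmore_delay(chain))
```

The total resistance and capacitance are the same for every group size. A single lump overestimates the delay of the distributed line, and smaller groups converge to the distributed value at the cost of more nodes in the netlist.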
With the framework created during this PhD research, it is now straightforward
to evaluate different STT-MRAM cell designs in advanced technology nodes. It
can also be used for newer technologies, such as the imec 7 nm technology (iN7),
by adding the MTJ layer to the PDK. The MTJ model can likewise be replaced,
provided the replacement is compatible with circuit simulators, or updated by
extending the Verilog-A code or simply changing the parameters.
Chapter 3
Design-Technology
Co-Optimization
As STT-MRAM will first target bigger, denser cache memories, the minimum
cell area will have a significant impact on total die size. In order to achieve
successful integration of STT-MRAM as area efficient higher level embedded
cache, it needs to be included as a benchmark to develop the process flow for
new logic technologies. SRAM cells and logic standard cells are already used as
benchmarks [22][23][25][12], but the simple cell structure of STT-MRAM brings
extra patterning challenges to achieve high density.
Design-Technology Co-Optimization (DTCO) is pursued in this PhD research
to investigate the scaling bottlenecks of STT-MRAM cells and to achieve high
density cells by introducing scaling boosters. The results presented in this
chapter show the large impact of secondary design rules on the minimum cell
area and the large area reduction achieved when introducing scaling boosters
that target these critical rules.
In this chapter, section 3.1 first introduces the 111 SRAM cell and the three
variants of the standard dual bit line 1 Transistor 1 MTJ (1T1MTJ) cell: the
two-finger cell, the dummy-poly cell and the DRAM-style cell. Next, section 3.2
explains the patterning techniques and design rules of technology nodes iN10
and iN7. Section 3.3 shows the analysis of the cell variants in iN10 and iN7 and
highlights the critical rules for scaling the cell size. Section 3.4 shows the effect
of SAQP mandrel and spacer engineering on the cell size and introduces the
use of multi-level via’s as a scaling booster in STT-MRAM cells. Section 3.5
turns the attention to the three-dimensional (3D) integration of MTJ’s in deeply
scaled metal layers and shows how the multi-level via’s can also solve this issue.
3.1 Cell variants
When going from the 1T1MTJ cell circuit to a physical layout, there is a
fundamental layout problem that needs to be addressed. The 1T1MTJ circuit
has an internal node, which cannot be shared with any neighboring cell (Fig. 3.1).
Three cell variants are investigated in this PhD research, each giving different
solutions for this problem.
Figure 3.1: Circuit of the basic cell highlighting the internal node (IN).
3.1.1 Two finger cell
The two-finger cell, which is commonly used [16][33][3], solves the problem of
the internal node by adding a second transistor finger to the cell. This way the
internal node gets removed from the cell boundary and placed in the middle of
the cell. Fig. 3.2 shows the two-finger cell and how it can be easily connected to
its neighbors by a shared SL connection.
The benefit of this cell variant is that it uses all available transistors and
therefore has no unnecessary parasitic front-end capacitance. The downsides are
that it always has a multiple of two fins, further reducing the flexibility in
transistor sizing of finFETs, which are already limited to an integer number of
fins, and that it widens the cell to 2 Poly(-silicon) Pitch (PP), making the
biasing lines longer.
Figure 3.2: Circuit of the two-finger cell design.
3.1.2 Dummy poly cell
The other common solution is the cell where two transistor drains are abutted
with the minimum active spacing in between [3]. However, this cannot be
implemented in finFET technologies. The fin will be epitaxially grown in the
drain region to get a good contact resistance with the contact metal. The end of
the fin needs to be covered by a gate in order to properly control this epitaxial
silicon growth.
The dummy-poly cell (devised after discussions with Stefan Cosemans) solves
the problem of the internal node and the abutting of drains by sacrificing a
transistor in between two cells that is used to separate both internal nodes.
This transistor is always in the OFF-state, so the gate needs to be biased for
correct use of the cells. Fig. 3.3 shows the dummy-poly cell which is connected
to its neighbors by a shared SL on one side and by the dummy-poly, a transistor
in the OFF-state, on the other side.
The benefit of this cell variant is that it can increment the transistor sizing per
single fin. The downsides are that it has "useless" transistors which increase
unwanted parasitic front-end capacitance and that it still has a somewhat wider
cell of 1.5 PP.
3.1.3 DRAM-style cell
The DRAM-style cell is investigated as it provides a solution to the problem
of the internal node by leaving a gap in between two cells to separate both
Figure 3.3: Circuit of the dummy-poly cell design
internal nodes. Because the transistor fins need to stop under a gate to prevent
uncontrolled epitaxial silicon growth when creating the source-drain contacts,
this gap takes up two adjacent gates. To avoid losing these gates altogether,
they are used with another fin. This results in a staggered configuration of
active access transistors as is customary in DRAM, hence the DRAM-style
cell. Fig. 3.4 shows the DRAM-style cell with its staggered pattern of access
transistors.
Figure 3.4: Circuit of the DRAM-style cell design
The benefits of this cell variant are that it is only 1 PP wide and that it can
increment the transistor sizing per single fin. The downside is that it also has
"useless" transistors (or at least half ones) which increase unwanted parasitic
front-end capacitance.
3.1.4 Pitch based cell sizes
Cell sizes for memory technologies have traditionally been expressed in the
square of the feature size F. Originally this feature size was the smallest printable
size of the technology, which would typically also be the minimum transistor
size. For memory technologies this was often the half pitch of the metal layers,
since these were often determining the cell size. In advanced logic technologies
and especially finFET technologies there is no clear feature size present. For
this reason estimates of the cell sizes are based on pitches.
For the three cell variants, these pitch based estimates are simple and
straightforward. They all use the dual bit line scheme, so they must
be at least 2 Metal Pitch (MP) high. The width of the cell is 2 PP, 1.5 PP and
1 PP for the two-finger, dummy-poly and DRAM-style cell respectively.
This would suggest that the two-finger cell is the largest and the DRAM-style
cell is the smallest. As will be shown in the next sections, this estimate is too
simplistic and turns out to be incorrect due to the multiple secondary design
rules present in advanced technology nodes.
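The pitch based estimate amounts to a simple product of cell width and cell height. The sketch below evaluates it for the three variants. The pitch values are assumptions taken from the iN10 numbers in Table 3.1 (gate pitch as PP, M1 pitch as MP), and the function and dictionary names are hypothetical.

```python
# Cell width in poly pitches (PP) for the three 1T1MTJ variants
CELL_WIDTH_PP = {"two-finger": 2.0, "dummy-poly": 1.5, "DRAM-style": 1.0}
# Dual bit line scheme: at least two metal pitches (MP) high
CELL_HEIGHT_MP = 2.0

def pitch_based_area(variant, poly_pitch_nm, metal_pitch_nm):
    """Pitch based area estimate in nm^2 (ignores all secondary design rules)."""
    width_nm = CELL_WIDTH_PP[variant] * poly_pitch_nm
    height_nm = CELL_HEIGHT_MP * metal_pitch_nm
    return width_nm * height_nm

# Assumed iN10 pitches: PP = 64 nm (gate), MP = 48 nm (M1)
estimates = {v: pitch_based_area(v, 64, 48) for v in CELL_WIDTH_PP}
```

With these pitches the estimate ranks the two-finger cell as the largest and the DRAM-style cell as the smallest, which is exactly the ordering that the physical layouts later contradict.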
3.1.5 111 SRAM cell
For SRAM, the area optimized 111 cell variant is investigated. These numbers
describe the number of fins for the pull-up, pass gate and pull-down transistors.
Fig. 3.5 shows the circuit of the standard 6 transistor SRAM cell. The 111 cell
variant is the minimum sized version and has a single fin for the pull-up (PU),
pass gate (PG) and pull-down (PD) transistors.
3.2 Multiple patterning in iN10 and iN7
193i lithography has a single patterning pitch limit of about 80 nm. The desired
pitches for iN10 and iN7 (shown in Table 3.1 and Table 3.3) are however beyond
this limit, necessitating the use of multiple patterning schemes. This section
discusses these schemes and the design rules that go with them, and concludes
with the specifics of the technologies used in this work.
3.2.1 Multiple patterning schemes
There are two main options for multiple patterning schemes used, the multiple
litho-etch (LEx) schemes and the self-aligned multiple patterning (SAxP)
Figure 3.5: Circuit of the 6 transistor SRAM cell with pull-up (PU), pass gate
(PG) and pull-down (PD) transistors indicated.
schemes. Common flavors include single, double and triple litho-etch (LE,
LE2 and LE3) and self-aligned double and quadruple patterning (SADP and
SAQP).
Multiple Litho-Etch
LEx uses multiple lithographic printing and etching steps at relaxed pitch to
make patterns at denser pitch. This is a direct print technique where the masks
contain the desired patterns which are printed directly on the die. Fig. 3.6 shows
how a dense pattern is decomposed in multiple patterns which are less dense so
they can be printed with 193i lithography. This scheme however comes with
some problems to ensure that the desired patterns are printed reliably.
First, there is the alignment between the different LE steps. Since the multiple
patterns are in the same layer, but printed in different steps, they are not
automatically aligned. Therefore there is a need to keep sufficient distance
between the patterns in the different LE steps. Fig. 3.7 illustrates this problem:
a) shows the desired pattern, b) shows the overlap in case of mis-alignment of
steps 2 and 3, c) shows the desired pattern with extra margin and d) shows no
overlap in case of mis-alignment of steps 2 and 3.
Secondly, there is the effect of rounding and pull-back. Although the problem of
the dense pitch can be resolved by multiple patterning, the problem of printing
narrow lines remains. When printing rectangular patterns with sharp corners,
the light cannot get all the way into these corners, causing rounding to happen.
Figure 3.6: Pitch density multiplying by multiple (i.e. triple) litho-etch scheme.
Figure 3.7: Multiple litho-etch scheme alignment problem.
When two corners are close together, as is the case with narrow lines, the
rounding of these corners can cause the tip of the line to pull back entirely,
making the line shorter than intended. Therefore it is important to extend
the tips of the lines beyond the lines on other layers to which they need to
connect, so that the via will make a reliable connection between them. Fig. 3.8
illustrates this problem: a) shows the desired pattern for a wide and a narrow
line, b) shows limited rounding and c) shows severe rounding with pull-back for
the narrow line.
Figure 3.8: Multiple litho-etch scheme pull-back problem.
Self-Aligned Multiple Patterning
SAxP uses spacer growth on both sides of a printed mandrel to double the pitch
density. When extending this with a second spacer growth step, this pattern
gets doubled again, creating a pattern with quadruple density of the original.
Fig. 3.9 shows how the pattern gets doubled and quadrupled by spacer growth
steps.
Figure 3.9: Pitch doubling and quadrupling by self-aligned multiple patterning.
SAxP is an indirect print technique where the combination of multiple masks
and process steps form the desired patterns. SAxP is very efficient in printing
dense lines, but needs to be complemented with cut or block masks. These extra
masks cut the lines when the desired pattern is generated with the spacer itself,
e.g. transistor fins, or block the gaps from extending when the desired pattern
is generated from the gaps, e.g. metal lines. Fig. 3.10 and Fig. 3.11 illustrate
the use of these extra masks.
Figure 3.10: Pattern generation with SAxP and cut masks.
The downside of SAxP is that, in practice, it limits the layer to be one-
dimensional. This means that all the lines in a given layer need to run either
horizontal or vertical. This limits the routing flexibility within any given layer,
since a jump to another layer needs to be made to connect points which are not
on a single line.
Finally, it is important to note that SAQP for metal lines, where the desired
pattern is generated from the gaps, has patterns which originate from different
Figure 3.11: Pattern generation with SAxP and block masks.
processes. The final metal line width gets determined by either the thickness
of the spacer 1 deposition or a combination of the mandrel width and pitch
and the thickness of the spacers. This results in the lines having a different
variability profile and a systematic variability component between the different
populations.
Fig. 3.12 illustrates the different populations of SAQP patterned metal lines.
The dash-dotted lines highlight a line which is fully determined by the thickness
of the spacer 1 deposition. This population consists of every other line and has
the best control of the width of the lines. The dotted lines highlight a line which
is determined by the width of the mandrel and by the thickness of the spacer
2 deposition. This population consists of one out of four lines. The dashed
lines highlight a line which is determined by the pitch and width of the mandrel
and by the thickness of both spacer 1 and 2 depositions. This population also
consists of one out of four lines and has the poorest control of the width of the
lines.
Figure 3.12: Different populations of SAQP patterned metal lines.
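The origin of the three line populations can be made explicit with a small geometric sketch. It assumes ideal, vertical spacers and derives the width of each gap (the future metal line) from the mandrel pitch, the mandrel width and the two spacer thicknesses. The population labels, the function name and the numeric values are hypothetical and only serve as illustration.

```python
def saqp_gap_widths(mandrel_pitch, mandrel_width, spacer1, spacer2):
    """Widths (nm) of the three gap populations after SAQP, assuming the
    final lines are formed in the gaps between the spacer 2 features."""
    return {
        # gap left where spacer 1 is removed: every other line
        "spacer-1": spacer1,
        # gap inside the former mandrel, narrowed by spacer 2 on both sides
        "mandrel": mandrel_width - 2 * spacer2,
        # gap between two mandrels, narrowed by both spacers on both sides
        "mandrel-space": mandrel_pitch - mandrel_width - 2 * spacer1 - 2 * spacer2,
    }

# Illustrative target: 32 nm pitch with 16 nm lines and 16 nm spaces
nominal = saqp_gap_widths(mandrel_pitch=128, mandrel_width=48, spacer1=16, spacer2=16)
# Same process with a mandrel that prints 2 nm too wide
shifted = saqp_gap_widths(mandrel_pitch=128, mandrel_width=50, spacer1=16, spacer2=16)
```

In this sketch a mandrel width error leaves the spacer-1 defined lines untouched, while it widens one population and narrows the other by the same amount, which is the systematic component between populations described above.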
3.2.2 Design rules
Due to the extra problems caused by multiple patterning and the inherent
nature of extremely scaled dimensions, there are new and/or stricter design
rules to ensure reliable construction of the desired structures. In this section,
the most important ones for this work are explained.
Tip-to-Tip Spacing
Fig. 3.13 shows the Tip-to-Tip (T2T) spacing rule as the minimum distance
between two line ends or tips on the same track. This rule is very different for
the different multiple patterning schemes. With LEx this minimum distance
is mostly determined by the alignment of two LE steps, while with SAxP this
distance is determined by the minimum size of the cut or block mask pattern.
Figure 3.13: T2T rule for both a) LEx and b) SAxP.
Extension
Fig. 3.14 shows a few examples of extension rules. These rules are in place
to make sure that structures from different layers overlap properly despite
alignment mismatch. This overlap is important to make sufficient contact, e.g.
M0 contacting the fin, or to protect lower layers from processing damage, e.g.
M0 used as etch stop for V0.
Figure 3.14: Extension rule for both a) M0 extension over fin and b) M0 extension
V0/M1.
Table 3.1: iN10 technology sizes, pitches and patterning schemes
Layer | Pitch (nm) | Width (nm) | Patterning
Fin | 36 | 10 | SADP
Gate | 64 | 20 | LE2
Local interconnect (M0) | 64 | 32 | LE3
Metal M1 | 48 | 24 | LE3
Metal M2 | 48 | 24 | SADP
Minimum area
The minimum area rule is the final one to highlight. This rule is in place to
make sure that a metal strip with a via underneath gets properly filled with
metal. This is especially important for copper interconnect, since it uses barrier
and liner layers in order to be able to fill narrow trenches with copper. For
copper, an aggressive number of 1386 nm² is used and for tungsten, 700 nm² is
used.
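The consequence of the minimum area rule for a layout is a minimum strip length for a given line width. The sketch below assumes a simple rectangular strip. The function name is hypothetical and the example widths are assumptions chosen from the metal widths that appear in the technology tables of this chapter.

```python
# Minimum area values quoted in the text, in nm^2
MIN_AREA_NM2 = {"copper": 1386.0, "tungsten": 700.0}

def min_strip_length(metal, width_nm):
    """Minimum length (nm) of a rectangular strip of the given width
    that satisfies the minimum area rule for the given metal."""
    return MIN_AREA_NM2[metal] / width_nm

length_cu_24 = min_strip_length("copper", 24)   # 24 nm wide copper strip
length_cu_21 = min_strip_length("copper", 21)   # 21 nm wide copper strip
```

For a 24 nm wide copper line this gives a strip that is considerably longer than one tight metal pitch, so short vertical strips can end up setting the cell height.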
3.2.3 Technologies
The two technologies used for the analysis of the three cell variants are iN10
and iN7. Their sizes, pitches and patterning schemes are shown with their
differences highlighted.
iN10
The 10 nm bulk finFET technology presented in [22] is used. Table 3.1 shows
the sizes, pitches and patterning schemes of the most important FEOL and
BEOL layers of iN10. Table 3.2 shows the most important design rules for
STT-MRAM in iN10.
iN7
The 7 nm bulk finFET technology presented in [23][25] is used. Table 3.3 shows
the sizes, pitches and patterning schemes of the most important FEOL and
BEOL layers of iN7. Note that, compared to iN10, there is an extra metal
layer inserted between M0 and M1 which is called Mint. This is an extra
Middle-Of-Line (MOL) layer that can be in tungsten, cobalt or copper.
Table 3.4 shows the most important design rules for STT-MRAM in iN7.
Table 3.2: iN10 design rules
Rule | Minimum value (nm)
Gate extension over fin | 20
Gate T2T | 40
Gate contact height | 40
Gate contact spacing to M0 | 20
M0 T2T | 30
M0 extension over fin | 15
M0 extension V0/M1 | 16
M1 extension over V0 | 15
M1 T2T | 35
Table 3.3: iN7 technology sizes, pitches and patterning schemes
Layer | Pitch (nm) | Width (nm) | Patterning
Fin | 24 | 5 | SAQP
Gate | 42 | 18 | SADP
Local interconnect (M0) | 42 | 20 | self-aligned
Metal Mint | 32 | 21 | SAQP
Metal M1 | 42 | 24 | SADP
Metal M2 | 32 | 16 | SAQP
Table 3.4: iN7 design rules
Rule | Minimum value (nm)
Gate extension over fin | 15
Gate extension over dummy fin | 5
Gate T2T | 21
Gate contact height | 21
Gate contact spacing to M0 | 9
M0 T2T | 18
M0 extension over fin | 5
M0 extension V0/Mint | 0
Mint T2T | 18
Mint extension over V0 | 2
M1 T2T | 18
There are some major differences with iN10 in the design rules that will impact
the cell sizes. There is a big drop in minimum M0 T2T distance from 30 nm
to 18 nm. This is because of the switch from LE3 patterning to self-aligned
contacts with block patterns. Whereas in iN10 this rule was dominated by
alignment between the different LE steps, it is now determined by the minimum
width of the block pattern. There is also a big drop in M0 extension V0/Mint
from 16 nm to zero and in Mint (M1 in iN10) extension over V0 from 15 nm to
2 nm. This is partly due to improved alignment from layer to layer and partly
because partially landing via’s are allowed for density. The V0 etch has also
become a timed etch and does not use M0 as stopping layer. This way the
lower layers are protected from damage, even if there is only partial V0-landing
on M0. Also the gate and gate contact have significant improvements: a new
extension rule for dummy fins, a big drop in gate T2T from 40 nm to 21 nm,
a smaller gate contact height from 40 nm to 21 nm, a smaller spacing between
gate and M0 from 20 nm to 9 nm and the possibility to contact the gate from
one side only.
3.3 Critical rules analysis of physical layouts
In this PhD research, the circuits of the cell variants have been translated to
optimized physical layouts taking into account the multiple patterning schemes
and their design rules. After a thorough analysis of the critical rules, scaling
boosters have been proposed in this PhD research which are presented in the
next section 3.4, in order to optimize the cell area density.
3.3.1 iN10
In iN10 the MTJ will be integrated on top of the vertical M2. All the 3D views
and layouts that are shown in the figures of section 3.3.1 use the same color
scheme:
• Green: fin,
• Red: gate,
• Black box: gate contact,
• Blue: M0,
• Purple: horizontal M1,
• Yellow: vertical M2,
• Orange: MTJ and
• Black: horizontal M3.
Two finger cell
Fig. 3.15 shows the 3D view of the two-finger cell. Six gate lines are shown of
which two are active and highlighted. Also two BL/SL combinations are shown.
The cell shares SL contacts with its neighbors and has a continuous active fin
pattern.
Figure 3.15: 3D view of the two-finger cell design
Fig. 3.16 shows the layout of the two-finger cell in iN10 with its critical rules
that determine the cell height. A single cell is highlighted by the dashed black
box. The width of the cell is 2PP by construction. The critical rules that
determine the height of the cell are highlighted by the orange arrows. They are
• (A) M0 T2T,
• (B) M0 extension over fin,
• (C) half of the fin width,
• (D) half of M1 minimum spacing,
• (E) M1 minimum width and
• (F) M0 extension V0/M1.
In this way, a minimal cell height of 102 nm is reached.
Figure 3.16: Layout of the two-finger cell in iN10 with critical rules highlighted.
Dummy poly cell
Fig. 3.17 shows the 3D view of the dummy-poly cell. Six gate lines are shown
of which one is active and highlighted. Half of the shared dummy-poly is also
highlighted. Again two BL/SL combinations are shown. The fin pattern is
continuous and the isolation of the internal nodes is made by grounding the
dummy-poly. As the cell shrinks in width as compared to the two-finger cell,
more cells can be placed within the six gate lines.
Figure 3.17: 3D view of the dummy-poly cell design
In order to physically lay out the dummy-poly cell, two adjacent M0 lines need
to be contacted to different horizontal M1 strips. It is however not possible to
connect both M0 lines at the transistor drains without violating the M1 T2T
and the M1 extension over V0 rules. Fig. 3.18 illustrates this issue. Therefore
it is necessary to use 3 M1 tracks to make all the contacts to the cells.
Fig. 3.19 shows the layout of the dummy-poly cell in iN10 with its critical rules
that determine the cell height. A single cell is highlighted by the dashed black
box. The width of the cell is 1.5PP by construction. The height of the cell is
3MP due to the problem mentioned above, but there is also a second path of
Figure 3.18: Illustration of the contacting problem for the dummy-poly cell in
iN10.
rules which are critical for the height of the cell. They are again highlighted by
the orange arrows. They are
• (A) M0 T2T,
• (B) M0 extension over fin,
• (C) half of the fin width,
• (G) half of the fin pitch,
• (E) half of M1 minimum width,
• (D) M1 minimum spacing,
• (E) M1 minimum width and
• (F) M0 extension V0/M1.
In this way, a minimal cell height of 144 nm is reached.
Figure 3.19: Layout of the dummy-poly cell in iN10 with critical rules
highlighted.
DRAM-style cell
Fig. 3.20 shows the 3D view of the DRAM-style cell. Six gate lines are shown of
which one is active and highlighted. Also two BL/SL combinations are shown.
The BL is very wide as consecutive cells are staggered. Two rows of MTJ’s are
connected to the same BL. There is a gap in the fin pattern to create isolated
internal nodes.
Fig. 3.21 shows the layout of the DRAM-style cell in iN10 with its critical rules
that determine the cell height. A single cell is highlighted by the dashed black
Figure 3.20: 3D view of the DRAM-style cell design
box. The width of the cell is 1PP by construction. The critical rules that
determine the height of the cell are highlighted by the orange arrows. They are
two times
• (A) M0 T2T,
• (B) M0 extension over fin,
• (C) fin width,
• (G) fin pitch and
• (B) M0 extension over fin.
In this way, a minimal cell height of 212 nm is reached.
Figure 3.21: Layout of the DRAM-style cell in iN10 with critical rules
highlighted.
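The minimal cell heights of the three STT-MRAM variants in iN10 follow from adding up the rules along each critical path. The sketch below repeats this bookkeeping with the values of Table 3.1 and Table 3.2. It assumes that the M1 minimum spacing equals the M1 pitch minus the M1 width, and the variable names are hypothetical.

```python
# iN10 values in nm, taken from Table 3.1 and Table 3.2
FIN_WIDTH, FIN_PITCH = 10, 36
M1_WIDTH, M1_PITCH = 24, 48
M1_SPACING = M1_PITCH - M1_WIDTH  # assumption: spacing = pitch - width
M0_T2T = 30
M0_EXT_FIN = 15
M0_EXT_V0_M1 = 16

# Critical rule paths in the order (A), (B), (C), ... used in the text
two_finger = [M0_T2T, M0_EXT_FIN, FIN_WIDTH / 2, M1_SPACING / 2,
              M1_WIDTH, M0_EXT_V0_M1]
dummy_poly = [M0_T2T, M0_EXT_FIN, FIN_WIDTH / 2, FIN_PITCH / 2,
              M1_WIDTH / 2, M1_SPACING, M1_WIDTH, M0_EXT_V0_M1]
# The DRAM-style path is traversed two times
dram_style = 2 * [M0_T2T, M0_EXT_FIN, FIN_WIDTH, FIN_PITCH, M0_EXT_FIN]

heights_nm = {
    "two-finger": sum(two_finger),
    "dummy-poly": sum(dummy_poly),
    "DRAM-style": sum(dram_style),
}
```

The sums reproduce the cell heights of 102 nm, 144 nm and 212 nm quoted above, and the individual terms show how much of each height is spent on M0 tip-to-tip spacing and extension rules.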
111 SRAM cell
Fig. 3.22 shows the layout of the 111 SRAM cell variant in iN10. A single
cell is highlighted by the dashed black box. The width of the cell is 2PP
by construction. The critical rules that determine the height of the cell are
highlighted by the orange arrows. They are
• (L) gate contact height,
• (M) gate contact spacing to M0,
• (B) M0 extension over fin,
• (C) fin width,
• (I) gate extension over fin,
• (H) gate T2T,
• (I) gate extension over fin,
• (C) fin width,
• (B) M0 extension over fin,
• (A) M0 T2T,
• (B) M0 extension over fin,
• (C) fin width,
• (I) gate extension over fin,
• (H) gate T2T,
• (I) gate extension over fin,
• (C) fin width,
• (B) M0 extension over fin and
• (M) gate contact spacing to M0.
In this way, a minimal cell height of 370 nm is reached. Note that for the
SRAM cell, it is more efficient to run M1 vertical and not horizontal as with
STT-MRAM. The cell is finished by the horizontal BL’s and supply lines on
M2 (not shown).
Assessment
Table 3.5 lists the area of the different cell variants for the different technologies
considered.
For STT-MRAM in iN10, the dummy-poly cell is the biggest, because the
inability to create the contacts of its desired layout results in a big area hit.
The large overhead of FEOL and MOL layers of the DRAM-style cell causes
it to be larger than the two-finger cell, which is the smallest. This is in stark
contrast with the pitch based estimate where the two-finger cell is the largest.
It is clear from the critical rules in all cells that the strict extension and T2T
rules have a big impact on the cell size.
For SRAM in iN10, the 111 cell is dominated by FEOL and MOL rules. The
gate contacts specifically are very area consuming. The 111 SRAM cell is 3.6x
larger than the smallest STT-MRAM cell, the two-finger cell.
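The iN10 areas in this assessment can be recomputed from the cell widths and the minimal cell heights derived above. The sketch below assumes that PP is the iN10 gate pitch of 64 nm from Table 3.1, and the names are hypothetical.

```python
GATE_PITCH_NM = 64  # assumed PP for iN10 (gate pitch in Table 3.1)

# (width in PP, minimal height in nm) as stated for the iN10 layouts
IN10_CELLS = {
    "two-finger": (2.0, 102),
    "dummy-poly": (1.5, 144),
    "DRAM-style": (1.0, 212),
    "SRAM 111": (2.0, 370),
}

areas_nm2 = {name: width_pp * GATE_PITCH_NM * height_nm
             for name, (width_pp, height_nm) in IN10_CELLS.items()}

stt_cells = ("two-finger", "dummy-poly", "DRAM-style")
smallest_stt = min(stt_cells, key=lambda name: areas_nm2[name])
sram_ratio = areas_nm2["SRAM 111"] / areas_nm2[smallest_stt]
```

The resulting areas match the iN10 row of Table 3.5, and the ratio between the 111 SRAM cell and the smallest STT-MRAM cell rounds to the 3.6x mentioned above.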
Figure 3.22: Layout of the 111 SRAM cell in iN10 on the left with critical
FEOL rules highlighted on the right.
Table 3.5: Minimum area [nm²] of the different cell variants for the different
technology options
Technology | STT-MRAM two-finger | STT-MRAM dummy-poly | STT-MRAM DRAM-style | SRAM 111
iN10 | 13 056 | 13 824 | 13 568 | 47 360
iN7 | 5 376 | 6 048 | 5 376 | 16 128
iN7 SAQP eng. | 5 376 | 4 788 | 4 788 | 15 204
iN7 m-level via | 5 376 | 4 032 | 5 376 | 16 128
3.3.2 iN7
In iN7 the MTJ will be integrated on top of the vertical M1. All the layouts
that are shown in Fig. 3.23, Fig. 3.24, Fig. 3.25, Fig. 3.26 and Fig. 3.27 use the
same color scheme:
• Green: fin,
• Red: gate,
• Black box: gate contact,
• Blue: M0,
• Purple: horizontal Mint and
• Yellow: vertical M1.
This coloring scheme is also applied in the next sections.
As discussed in section 3.2.3 the most critical secondary design rules on extensions
and T2T spacing are very different in iN7 as compared to iN10 due to the
difference in patterning schemes. As a baseline assumption for iN7, the Mint
layer needs to be composed of a regular grid of evenly spaced lines with the
same width.
Two finger cell
Thanks to the tighter design rules mentioned above, both the FEOL and MOL
layers fit in the theoretical minimum height of 2xMP.
Fig. 3.23 shows the layout of the two-finger cell in iN7. A single cell is highlighted
by the dashed black box. The width of the cell is 2PP by construction. The
height of the cell is 2MP, so the theoretical minimal cell height of 64 nm is
reached.
Figure 3.23: Layout of the two-finger cell in iN7 at 2 MP high.
Since every other line in SAQP is fully spacer defined, all SL’s of this two finger
cell design can be spacer defined. This is beneficial for the variability of the
different BL rows, since this spacer deposition is well controlled. The spread
on the width of the short Mint strips has limited influence on the cell, so their
larger variability is of little concern.
Dummy poly cell
Thanks to the tighter design rules mentioned above, both the FEOL and MOL
layers again fit in the theoretical minimum height of 2xMP. The minimum area
problem however pops up for the vertical copper M1 strips, which cannot
be staggered as in the two-finger cell, since the dummy-poly cell is less wide.
Combined with the Mint SAQP grid, this makes the cell grow to 3xMP
in height, so the same layout design as in iN10 is used. Fig. 3.24
illustrates the issue of the vertical M1 strips.
Figure 3.24: Illustration of the problem with the vertical M1 strips in iN7.
Fig. 3.25 shows the layout of the dummy-poly cell in iN7. A single cell is
highlighted by the dashed black box. The width of the cell is 1.5PP by
construction. The height of the cell is 3MP due to the Mint SAQP grid.
Therefore, a minimal cell height of 96 nm is reached.
An important side effect of this 3MP high cell is the grid walking that occurs for
different BL rows. This is important for the SL variability, which will now have
a systematic component between BL rows of different SAQP line populations.
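The grid walking can be made concrete with a small sketch. It assumes, as a simplification that is not spelled out in the text, that a regular SAQP grid repeats every four Mint tracks in the order spacer, mandrel, spacer, gap, and that every cell row places its SL on the first track of the cell. The names `SAQP_PERIOD` and `sl_populations` are illustrative only.

```python
# Assumed order of line populations within one SAQP period of four tracks.
# Every other line is spacer defined; the remaining lines alternate between
# mandrel-defined and gap-defined.
SAQP_PERIOD = ("spacer", "mandrel", "spacer", "gap")


def sl_populations(cell_height_tracks, rows):
    """Return the SAQP population of the SL track in consecutive cell rows.

    cell_height_tracks: cell height expressed in Mint tracks (multiples of MP).
    rows: number of consecutive cell rows to inspect.
    """
    return [
        SAQP_PERIOD[(row * cell_height_tracks) % len(SAQP_PERIOD)]
        for row in range(rows)
    ]


for name, height in (("two-finger", 2), ("dummy-poly", 3), ("DRAM-style", 4)):
    populations = sl_populations(height, rows=4)
    print(f"{name:>10} ({height} MP): {populations}")
```

Under these assumptions the 2 MP and 4 MP cells keep every SL on a spacer-defined line, while the 3 MP cell cycles through all three populations.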
Figure 3.25: Layout of the dummy-poly cell in iN7 at 3 MP high.
DRAM-style cell
Thanks to the tighter design rules mentioned above, the DRAM-style cell, whose
height was dominated by FEOL and MOL rules, shrinks a lot in height. The
Mint SAQP grid however forces the cell to grow to 4xMP in height.
Fig. 3.26 shows the layout of the DRAM-style cell in iN7. A single cell is
highlighted by the dashed black box. The width of the cell is 1PP by
construction. The height of the cell is 4MP due to the Mint SAQP grid.
Therefore, a minimal cell height of 128 nm is reached. The unused Mint track is
also shown.
With a 4xMP cell height, all SL’s can again be spacer defined. This also means
that the unused Mint track is spacer defined, but it is still most beneficial to do
this for the SL variability.
Figure 3.26: Layout of the DRAM-style cell in iN7 at 4 MP high.
111 SRAM cell
Fig. 3.27 shows the layout of the 111 SRAM cell variant in the iN7 baseline.
The unit cells are indicated by the dashed black boxes. The 111 SRAM cell
is 2PP wide by construction. With a regular Mint grid, its height is 6MP or
192 nm. Two of the 6 Mint tracks can not be used as they can not be contacted
from the top nor the bottom. Note that the Mint pattern is repeating every
two cells and that there are some limited differences in SAQP line population.
It is however possible to have spacer defined lines for both BL connections, for
the VSS supply line connections and for the VDD supply line. Only the WL
connections would alternately belong to the "mandrel-defined" or "gap-defined"
lines.
Figure 3.27: Layout of the 111 SRAM cell in iN7 of 6MP high.
Assessment
Table 3.5 lists the area of the different cell variants for the different technologies
considered.
For STT-MRAM in iN7 with regular Mint grid, the two-finger cell is still the
smallest, but the DRAM-style cell is now an equally small alternative. These
designs can be used to trade off WL resistance for SL and BL resistance. The
dummy-poly cell suffers most from the SAQP grid snapping and is the largest.
In terms of SL variability, the two-finger cell and the DRAM-style cell can both
be constructed with all SL’s fully spacer defined. The dummy-poly cell has the
issue of grid walking, which will generate SL’s with widths of all three SAQP
line populations.
For SRAM in iN7 with regular Mint grid, the 111 cell is now 2.9x smaller than
the iN10 version. This is due to the improved FEOL and MOL design rules.
The 111 SRAM cell is now exactly 3x larger than the smallest two-finger and
DRAM-style STT-MRAM cells.
Despite many of the critical rules from iN10 being addressed in iN7 by using
different patterning schemes, these schemes bring new problems with them. A
solution is needed for SAQP grid snapping, as well as for the minimum area
problem of short metal strips.
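The iN7 baseline entries of Table 3.5 follow directly from the cell widths and heights quoted above, and the sketch below recomputes them. The metal pitch of 32 nm follows from the 64 nm height of the 2 MP cell. The poly pitch of 42 nm is an inference from the tabulated two-finger area (5 376 nm² = 2 PP x 64 nm) and is not a value stated in this section.

```python
MP_NM = 32  # Mint pitch: 2 MP = 64 nm in the text
PP_NM = 42  # poly pitch: inferred from Table 3.5, not stated in this section

# (width in PP, height in MP) for the iN7 baseline with a regular Mint grid
IN7_BASELINE = {
    "two-finger": (2.0, 2),
    "dummy-poly": (1.5, 3),
    "DRAM-style": (1.0, 4),
    "111 SRAM": (2.0, 6),
}


def cell_area_nm2(width_pp, height_mp):
    """Cell area in nm2 from the width in poly pitches and height in metal pitches."""
    return round(width_pp * PP_NM * height_mp * MP_NM)


areas = {name: cell_area_nm2(*dims) for name, dims in IN7_BASELINE.items()}
smallest_mram = min(area for name, area in areas.items() if name != "111 SRAM")

for name, area in areas.items():
    print(f"{name:>10}: {area} nm2")
print(f"SRAM / smallest STT-MRAM: {areas['111 SRAM'] / smallest_mram:.1f}x")
```

The computed areas match the iN7 row of the table, and the SRAM to STT-MRAM ratio comes out at exactly 3.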
3.4 Scaling boosters
For iN7, so-called scaling boosters are being investigated to create area scaling
without scaling the basic sizes and pitches. These scaling boosters tackle
problems in physical layouts by enabling special processing and integration
techniques. The minimum size of short vertical metal strips limits the scaling of
the STT-MRAM cells, which is further exacerbated due to SAQP grid snapping.
By allowing flexibility in SAQP, the grid can adapt to the size of these strips
and by introducing multi-level via’s, the strips can be avoided altogether.
3.4.1 SAQP mandrel and spacer engineering
SAQP is typically used to create a regular pattern of lines, equal in width and
equal in spacing. It consists of three steps that determine this pattern, as illustrated
in Fig. 3.28: mandrel printing and two spacer deposition steps. When the spacer
deposition is tuned, two spacers can be made to merge, which
eliminates one of the lines and ends up with a repeating pattern of 3 lines.
When changing the width and/or the pitch of the mandrels, wider lines can be
created, resulting for instance in a repeating pattern of 2 lines. Combining both
approaches also gives the possibility to create a repeating pattern of 3 lines
where one is wider. All this allows the SAQP grids to be adapted to the STT-MRAM
cell height and therefore avoids the overhead of snapping to the grid.
Figure 3.28: Illustration of possible methods to tune the SAQP grid for Mint
lines.
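The overhead of snapping to a regular grid can be estimated with a short sketch. It takes the rule-limited cell heights quoted in the remainder of this section (76 nm, 114 nm and 181 nm) and rounds them up to a whole number of 32 nm Mint tracks, which is what a regular SAQP grid imposes. The function name is illustrative.

```python
import math

MP_NM = 32  # regular Mint pitch in iN7 (2 MP = 64 nm)


def snapped_height_nm(rule_limited_height_nm, pitch_nm=MP_NM):
    """Round a rule-limited cell height up to a whole number of grid tracks."""
    return math.ceil(rule_limited_height_nm / pitch_nm) * pitch_nm


# Rule-limited heights reached with SAQP mandrel and spacer engineering
for name, height in (("dummy-poly", 76), ("DRAM-style", 114), ("111 SRAM", 181)):
    snapped = snapped_height_nm(height)
    gain = 100 * (snapped - height) / snapped
    print(f"{name:>10}: {snapped} nm on a regular grid, {height} nm tuned, "
          f"{gain:.0f}% lower")
```

The snapped heights reproduce the 3 MP, 4 MP and 6 MP baseline cells, and the differences correspond to the reductions of 21%, 11% and 6% reported for the tuned grids.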
Two finger cell
The two finger cell was already at its theoretical minimum with a regular SAQP
grid. Its layout therefore remains the same as in Fig. 3.23.
Dummy poly cell
Thanks to the tighter design rules of iN7, the dummy-poly cell can now connect
both M0 lines at the transistor drains within the same track. The minimum
area constraint for the horizontal Mint strips however forces the use of tungsten.
Fig. 3.29 illustrates this issue. This in turn results in the need for SL straps
from Mint to a copper M2 to have sufficiently low SL resistance.
The dummy-poly cell is also limited by the minimum area of the vertical M1
strips. By adapting the SAQP mandrel width and pitch, a pattern of narrow
and wide lines is formed that matches the minimum cell height.
Fig. 3.30 shows the layout of the dummy-poly cell with its critical rules that
determine the cell height. A single cell is highlighted by the dashed black box.
The width of the cell is 1.5 PP by construction. The critical rules that determine
the height of the cell are highlighted by the orange arrows. They are
• M1 T2T and
• M1 copper minimum area.
Figure 3.29: Illustration of the problem with the horizontal Mint strips for the
dummy-poly cell in iN7.
In this way, a minimal cell height of 76 nm is again reached, which is a reduction
of 21% compared to the 96 nm of the iN7 baseline.
It is also important to note that the fin grid, which is patterned with SAQP as
well, needs to be adapted to this height. This actually has been the case for most cells and
is also the common baseline. Since all fins are spacer 2 defined and they are
grouped per two at a pitch defined by spacer 1, it is only the mandrel width and
pitch which need to be relaxed for these cells. Fig. 3.31 shows the fin and Mint
layers with their SAQP mandrel and spacer patterns. They are both aligned to
the cell height of 76 nm, which is highlighted by the dashed black box.
It is important to note that the SL’s are not spacer defined and belong to two
different populations of SAQP lines. The SL’s are however increased in nominal
width from 21 nm to 33 nm, which will compensate for the increased absolute
variability.
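The compensation can be made explicit with a first-order estimate. It assumes that the SL resistance scales inversely with the line width w and ignores barrier and resistivity effects; the symbols are introduced here for illustration only.

```latex
R_{SL} \propto \frac{1}{w}
\quad\Rightarrow\quad
\frac{\sigma_{R}}{R_{SL}} \approx \frac{\sigma_{w}}{w},
\qquad
\frac{w_{\mathrm{tuned}}}{w_{\mathrm{regular}}}
  = \frac{33\,\mathrm{nm}}{21\,\mathrm{nm}} \approx 1.57
```

Under this assumption the absolute width spread may grow by up to roughly 1.57x before the relative resistance spread exceeds that of the 21 nm line.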
Figure 3.30: Layout of the dummy-poly cell in iN7 with SAQP tuning, with
critical rules highlighted.
DRAM-style cell
The DRAM-style cell is limited by the FEOL and MOL rules. It has however
the added difficulty that it needs a repeating pattern of 3 lines on the Mint
layer. Adapting the SAQP mandrel width and pitch is not enough to achieve
this, so spacer merging is required to further shrink the cell size.
Fig. 3.32 shows the layout of the DRAM-style cell with its critical rules that
determine the cell height. A single cell is highlighted by the dashed black box.
The width of the cell is 1 PP by construction. The critical rules that determine
the height of the cell are highlighted by the orange arrows. They are two times
• (A) M0 T2T,
• (B) M0 extension over fin,
• (C) fin width,
• (G) fin pitch and
• (B) M0 extension over fin.
Figure 3.31: SAQP mandrel width and pitch tuning to align the fin and Mint
layers to the minimum cell height of the dummy-poly cell in iN7.
This is the same as it was in iN10, but thanks to the tighter design rules as
mentioned above, a minimal cell height of 114 nm is reached, which is a reduction
of 11%.
Fig. 3.33 shows the fin and Mint layers with their SAQP mandrel and spacer
patterns. They are both aligned to the cell height of 114 nm, which is highlighted
by the dashed black box. Note that the Mint layer is also using the spacer
merge technique in order to get a repeating pattern of 3 lines.
Also here the SL’s are not spacer defined, but they do all belong to the same
population of SAQP lines. By using the "mandrel"-defined lines, the variability
of these lines is still fairly well controlled. The SL’s are also increased in nominal
width from 21 nm to 28 nm, which will also compensate for the slightly increased
absolute variability.
Figure 3.32: Layout of the DRAM-style cell in iN7 with SAQP tuning, with
critical rules highlighted.
111 SRAM cell
Fig. 3.34 shows the layout of the 111 SRAM cell variant in iN7 with SAQP
engineering. A single cell is highlighted by the dashed black box. The width of
the cell is 2 PP by construction. The critical rules that determine the height of
the cell are highlighted by the orange arrows. They are
• (H) gate T2T,
• (I) gate extension over fin,
• (C) fin width,
• (I) gate extension over fin,
• (H) gate T2T,
• (K) gate extension over dummy fin,
• (C) fin width,
• (B) M0 extension over fin,
• (A) M0 T2T,
• (B) M0 extension over fin,
• (C) fin width,
• (K) gate extension over dummy fin,
• (H) gate T2T,
• (I) gate extension over fin,
• (C) fin width and
• (I) gate extension over fin.
Figure 3.33: SAQP mandrel width and pitch tuning and spacer merging to
align the fin and Mint layers to the minimum cell height of the DRAM-style
cell in iN7.
In this way, a minimal cell height of 181 nm is reached, which is a reduction of
6% as compared to the iN7 baseline. By adapting the SAQP mandrel width
and pitch, a pattern of narrow and wide lines is formed that matches this limit.
The single wide line is used for the VDD supply line. Note that the Mint pattern
now repeats every cell and that every complementary line belongs to
the same SAQP population, which improves the variability.
Figure 3.34: Layout of the 111 SRAM cell in iN7 with SAQP engineering on
the left with critical FEOL rules highlighted on the right.
Assessment
Table 3.5 lists the area of the different cell variants for the different technologies
considered.
For STT-MRAM in iN7 with SAQP mandrel and spacer engineering, the dummy-poly
cell is now the smallest together with the DRAM-style cell. These designs
can again be used to trade off WL resistance for SL and BL resistance. The
two-finger cell, which was already at its theoretical limit and therefore does not
benefit from the scaling booster, is now the largest. One important thing to note is that
the DRAM-style cell needs the extra scaling booster of spacer merging in order
to become the smallest. If spacer merging is not allowed, the DRAM-style cell
would grow to 4MP in height. In terms of SL variability, the two-finger cell still
has all the SL’s constructed with fully spacer defined lines. The dummy-poly
cell has the widest SL with the most variability and the DRAM-style cell is in
between, both in width and variability.
For SRAM in iN7 with SAQP mandrel and spacer engineering, the 111 cell is now
again at its FEOL limit. Also the variability of the Mint layer is optimized by
creating a pattern that is symmetric and repeats every cell. The 111 SRAM cell
is now 3.2x larger than the smallest dummy-poly and DRAM-style STT-MRAM
cells.
The dummy-poly cell and the DRAM-style cell have gotten significantly smaller,
thanks to the improved extension and T2T rules due to the switch of patterning
scheme and the flexibility in SAQP. The biggest remaining issues now are the
short metal strips which are limiting the cells due to minimum area constraints.
3.4.2 Multi-level via
The core problem in the layouts of the STT-MRAM cells is the need to make
connections up to higher metal layers for integrating the MTJ. This causes
short metal strips on the Mint and M1 layers, that serve no other purpose than
connecting the access transistor straight up to the MTJ.
Multi-level via’s are created exactly to solve this issue. They are high aspect
ratio via’s that serve to connect a Mx layer with a Mx+2 layer. Fig. 3.35
illustrates this concept for two potential implementations. A "three-level" via
can be etched while processing the dual-damascene M2-V1 layers or a "two-level"
via can be etched while processing the dual-damascene M1-Vint layers. The
"two-level" via approach will be used for integrating the MTJ since the top of
this multi-level via is at the same layer as the M1 strips it will replace.
Figure 3.35: Illustration of two potential implementations of a multi-level via.
Two finger cell
The two finger cell was already at its theoretical minimum. By replacing the
vertical M1 strips by a multi-level via, an alternative implementation appears.
This implementation has the advantage that a copper WL can now be integrated
in the vertical M1 layer.
Fig. 3.36 shows the layout of the two-finger cell with multi-level via’s. The
multi-level via is highlighted in a separate layer. A single cell is highlighted
by the dashed black box. The width of the cell is 2PP by construction. The
height of the cell is 2MP, so the theoretical minimal cell height of 64 nm is
again reached.
Also in this alternative implementation, all SL’s are spacer defined.
Dummy poly cell
The dummy-poly cell is limited by both the minimum area of the vertical M1
strips and the minimum area of the horizontal Mint strips. There are two
possible solutions to reach the theoretical minimum area.
Fig. 3.37 shows the layout of the dummy-poly cell in iN7 when replacing only
the vertical M1 strips with multi-level via’s. This leaves the SL on both the
copper M2 layer and the tungsten Mint layer. The multi-level via is highlighted
Figure 3.36: Layout of the two-finger cell in iN7 with multi-level via’s at 2 MP
high.
in a separate layer. A single cell is highlighted by the dashed black box. The
width of the cell is 1.5 PP by construction. The height of the cell is 2MP, so the
theoretical minimal cell height of 64 nm is reached, which is a reduction of 33%
compared to the baseline iN7 and 16% compared to the flexible SAQP grid.
There is also a second path of rules which are critical for the height of the cell.
They are again highlighted by the orange arrows. They are
• (A) M0 T2T,
• (B) M0 extension over fin,
• (C) half of the fin width,
• (G) half of the fin pitch,
• (D) half of M1 minimum spacing,
• (E) M1 minimum width and
• (F) M0 extension V0/Mint (not shown, equal to zero).
Fig. 3.38 shows the layout of the dummy-poly cell when replacing both the
vertical M1 and horizontal Mint strips with multi-level via’s. This allows for
Figure 3.37: Layout of the dummy-poly cell in iN7 when using one level of
multi-level via’s, with critical rules highlighted.
the use of a copper SL on the Mint layer at the cost of an extra multi-level via
layer. The two multi-level via’s are on top of each other and are highlighted in
a separate layer. Note that the horizontal Mint strips are gone from the layout
as they are replaced with the multi-level via. A single cell is highlighted by
the dashed black box. The width of the cell is 1.5PP by construction. The
height of the cell is 2MP, so the theoretical minimal cell height of 64 nm is
again reached. There is again the same second path of rules which are critical
for the height of the cell. They are highlighted by the orange arrows.
In both these implementations, all SL’s can be spacer defined as with the
two-finger cell. Note for the first version with the tungsten SL that it is important
to make the BL on M2 spacer defined and not the SL. The effect of the higher
variability of the copper M2 SL will be reduced by the stitched tungsten SL on
Mint.
Figure 3.38: Layout of the dummy-poly cell in iN7 when using two levels of
multi-level via’s, with critical rules highlighted.
DRAM-style cell
The DRAM-style cell was not limited by the vertical M1 strips, but by the
FEOL and MOL rules. It has no benefit from the multi-level via, so its layout
and size remain the same. It also can not benefit from the multi-level via’s to
route copper WL’s in the vertical M1 layer.
111 SRAM cell
The 111 SRAM cell was not limited by metal strips, but by the FEOL and
MOL rules. It has no benefit from the multi-level via, so its layout and size
remain the same.
Assessment
Table 3.5 lists the area of the different cell variants for the different technologies
considered.
For STT-MRAM in iN7 with multi-level via’s, the dummy-poly cell is now the
single smallest cell. The two-finger cell is the biggest and the DRAM-style cell is
either equally big or in between, depending on whether the multi-level via’s are
combined with a regular or flexible SAQP grid. In terms of SL variability, the two-finger
cell and the dummy-poly cell have all the SL’s constructed with fully spacer
defined lines. The DRAM-style cell will also have fully spacer defined lines
when using a regular SAQP grid and a wider line with increased variability
when using a flexible SAQP grid.
For SRAM in iN7 the multi-level via’s have no benefit for the area. Depending
on the combination with SAQP engineering or not, the 111 SRAM cell is now
respectively 3.8x or exactly 4x larger than the smallest dummy-poly STT-MRAM
cell.
The dummy-poly cell with multi-level via’s is also the smallest overall cell for
the iN7 variations and is more than 3x smaller than the smallest iN10 cell.
Thanks to the SAQP mandrel and spacer engineering and especially the novel
introduction of multi-level via’s in the process flow, large area gains can be
reached in STT-MRAM cells.
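The ratios quoted in this assessment can be checked against Table 3.5 with a few lines. The areas below are copied from the table; only the dictionary layout and the helper name `smallest_mram` are illustrative.

```python
# Minimum cell areas in nm2, copied from Table 3.5
TABLE_3_5 = {
    "iN10": {"two-finger": 13056, "dummy-poly": 13824,
             "DRAM-style": 13568, "111 SRAM": 47360},
    "iN7": {"two-finger": 5376, "dummy-poly": 6048,
            "DRAM-style": 5376, "111 SRAM": 16128},
    "iN7 SAQP eng.": {"two-finger": 5376, "dummy-poly": 4788,
                      "DRAM-style": 4788, "111 SRAM": 15204},
    "iN7 m-level via": {"two-finger": 5376, "dummy-poly": 4032,
                        "DRAM-style": 5376, "111 SRAM": 16128},
}
MRAM_CELLS = ("two-finger", "dummy-poly", "DRAM-style")


def smallest_mram(option):
    """Return (cell name, area) of the smallest STT-MRAM cell for an option."""
    areas = TABLE_3_5[option]
    name = min(MRAM_CELLS, key=lambda cell: areas[cell])
    return name, areas[name]


via_name, via_area = smallest_mram("iN7 m-level via")
_, in10_area = smallest_mram("iN10")

for sram_option in ("iN7 SAQP eng.", "iN7"):
    ratio = TABLE_3_5[sram_option]["111 SRAM"] / via_area
    print(f"111 SRAM ({sram_option}) vs {via_name}: {ratio:.1f}x")
print(f"smallest iN10 cell vs {via_name}: {in10_area / via_area:.2f}x")
```

The first two ratios correspond to the 3.8x and 4x quoted for SRAM, and the last one to the statement that the smallest iN7 cell is more than 3x smaller than the smallest iN10 cell.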
3.5 The third dimension
This entire chapter so far has been under the assumption that the MTJ stack
can be integrated in the height of a via. Whereas for iN10, with a via height
of 48 nm, this should be straightforward to attain, it will be far more difficult
for iN7, with a via height of only 24 nm. A further investigation in this PhD
research considered the challenges of integrating the MTJ in these small height
metal layers and the proposed multi-level via’s are shown to provide the solution
in this section.
3.5.1 MTJ integration
MTJ’s are typically integrated in the BEOL stack in a few steps. Starting from
a metal layer, a bottom electrode is patterned and smoothened. This smooth
surface is important for the quality of the complex thin film stack which will be
deposited next. After printing and etching the MTJ pillars, they need to be
connected from the top as well. Depending on the height of the total MTJ stack
(bottom electrode, thin film stack and remaining hard mask or top electrode)
this connection is made by a partial via or directly by a metal.
Fig. 3.39 illustrates this for iN10 integration. The MTJ is integrated on top of
the vertical M2 and contacted from the top by the BL in the horizontal M3
layer.
Figure 3.39: Typical MTJ integration within the via height.
If the total MTJ stack does not fit in the via height, integration problems arise.
Again starting from a metal layer, the MTJ’s need to be processed first, since
they require a smooth surface without topography for the thin film deposition.
After the MTJ’s are constructed, the via and metal layer at the same level
are processed. This process ends with a copper fill and Chemical Mechanical
Polishing (CMP) step. This is a wafer wide step, so it will also affect the region
where the MTJ’s are located. This already prevents the MTJ stack from
being higher than the via and metal height combined, since it would be polished
away. Moreover, since copper is hard to polish, this CMP step is very strong
and will cause a dishing effect in "softer" areas where there are no copper lines.
Fig. 3.40 illustrates these effects for iN7 integration. Note that the MTJ is
integrated one level lower as compared to before, in order to be contacted from
the top by a horizontal M2. The MTJ is integrated on top of the horizontal
Mint and contacted from the top by the BL in the horizontal M2 layer, thereby
skipping the M1 layer.
Figure 3.40: MTJ integration problems with MTJ stacks higher than the via
height.
The solution is to provide copper lines in between the MTJ’s that will locally
stop the CMP step from destroying the MTJ’s. In some cell designs, this is not
an issue, as illustrated by the two-finger cell design in Fig. 3.41. There is room
to insert a vertical WL in every cell without increasing the area. Note again
that the MTJ is now integrated on top of the Mint layer.
Denser cells, like the dummy-poly cell, will not have enough room to insert
extra lines without increasing the area. These lines would limit the room for
the MTJ which is now at the same layer.
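The three integration situations described in this section can be summarized in a small decision sketch. The via heights of 48 nm (iN10) and 24 nm (iN7) are taken from the text; the metal height and the total MTJ stack height used in the example are placeholders chosen for illustration and are not values from this work.

```python
def mtj_integration_case(stack_nm, via_nm, metal_nm):
    """Classify how an MTJ stack of a given total height can be integrated."""
    if stack_nm <= via_nm:
        return "fits within the via height"
    if stack_nm <= via_nm + metal_nm:
        return "taller than the via: needs CMP protection between the MTJ's"
    return "taller than via plus metal: would be polished away"


VIA_HEIGHT_NM = {"iN10": 48, "iN7": 24}  # via heights quoted in section 3.5
METAL_HEIGHT_NM = 30  # placeholder value, not from the text
STACK_HEIGHT_NM = 40  # placeholder total MTJ stack height, not from the text

for node, via_nm in VIA_HEIGHT_NM.items():
    case = mtj_integration_case(STACK_HEIGHT_NM, via_nm, METAL_HEIGHT_NM)
    print(f"{node}: {case}")
```

With these placeholder numbers the same stack fits in the iN10 via but needs protecting copper structures in iN7.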
3.5.2 Multi-level via to the rescue
Multi-level via’s can solve the integration problem. After processing the MTJ’s,
the multi-level via printing and etch steps are added before or after the via and
metal printing and etch steps. The multi-level via has the same height as the
via and metal layers combined. Then the copper fill and CMP steps are done,
Figure 3.41: Layout of the two-finger cell in iN7 with high MTJ stack using M1
lines to protect the MTJ’s from the M1 CMP step.
where the multi-level via’s in between the MTJ’s protect them from the copper
CMP.
Fig. 3.42 shows the combination of multi-level via’s and MTJ integration for
iN7. The multi-level via’s are drawn "behind" the MTJ’s since they are not in
the same metal track. Fig. 3.43 also illustrates this.
Figure 3.42: MTJ integration with high MTJ stack and multi-level via’s.
The multi-level via’s bring the added advantage of increased density. They
occupy less room at the same layer than the alternative metal strips. The
dummy-poly cell can then again be as small as shown before in Fig. 3.37 and
Fig. 3.38.
Fig. 3.43 shows the layout of the dummy-poly cell with a high MTJ stack and
the multi-level via’s. Fig. 3.44 shows the 3D view. Six gate lines are shown
of which one is active and highlighted. Half of the shared dummy-poly is also
highlighted. Again two BL/SL combinations are shown. The MTJ is now
integrated on top of Mint and protected from the M1 CMP by the multi-level
via’s.
Figure 3.43: Layout of the dummy-poly cell in iN7 with high MTJ stack using
multi-level via’s to protect the MTJ’s from the M1 CMP step.
3.6 Conclusion
In this chapter, it has been shown that in technologies from iN10 and beyond,
it is imperative to take into account all embedded memories, including
STT-MRAM, as a benchmark for designing the process flow. The detailed analysis
of different cell variants performed in this PhD research has shown that
STT-MRAM cells will be impacted significantly by the patterning options and
scaling boosters. Gains of up to 25% can be reached by introducing multi-level
via’s and/or SAQP mandrel and spacer engineering, whereas this has little or no
impact on SRAM cells. Especially the novel introduction of multi-level via’s
into the process flow will provide a significant cell size reduction and will allow
the integration of MTJ’s in low height metal layers which is equally imperative
for continued scaling of STT-MRAM cells.
Figure 3.44: 3D view of the dummy-poly cell with high MTJ stack and
multi-level via’s.

Chapter 4
Cell design for high density
caches
This first chapter of two on new cell designs targets higher density caches. The
focus on density results in minimum width biasing lines which show increased
resistance in advanced nodes.
During this PhD research, the line resistance in high density caches is
investigated together with its effect on the cell operation. A novel cell design
with a Partial Source Line Plane (PSLP) is introduced in this PhD research.
The improved resistance-area trade-off and the positive effects of the lower line
resistance on the cell operation are presented in this chapter.
Section 4.1 shows the baseline STT-MRAM cell and the issue of increased line
resistance in iN10. Section 4.2 introduces the novel cell with a Partial Source
Line Plane, its operation and layout. Section 4.3 shows the improved electrical
performance of the cell with PSLP thanks to the improved trade-off between
area and SL resistance.
4.1 Baseline STT-MRAM cell
For embedded SRAM replacement in iN10, the two-finger cell design is
considered as the baseline cell. It was shown in section 3.3 to be the smallest cell
variant and is also used in other work in literature [16][33][3]. For conciseness,
the schematics will represent this two finger access transistor as a single device.
All layouts and simulations are however done with this two finger cell design.
Fig. 2.4 illustrated the write operation of the baseline cell. It is important to
note that all cells controlled by the same WL are biased independently by their
own SL and BL, enabling a fully parallel write operation.
Fig. 2.5 illustrated the discharging read operation of the baseline cell. Again
it is important to note that all cells controlled by the same WL can discharge
their own BL, enabling a fully parallel read operation.
Fig. 3.16 illustrated the physical layout of the baseline cell. The size was 128 nm
wide by 102 nm high.
4.1.1 Line resistance
In STT-MRAM cells, the resistance of the lines is important for multiple reasons.
The first obvious reason is the direct increase in series resistance, which affects
both the read and the write operation. For writing, this reduces the overall
current the cell can drive. For reading, this causes the overall resistance ratio
to drop, reducing the voltage signal. The second reason is the decrease of the
gate-to-source voltage due to the line resistance, which limits the transistor
performance and further increases the total undesired resistance. The final
reason is the address dependency which needs to be taken into account when
designing the peripheral circuits.
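The first two effects can be summarized in a first-order form. The symbols are introduced here for illustration and do not appear in the text: R_P and R_AP are the MTJ resistances in the parallel and anti-parallel state, R_T is the access transistor resistance, R_line is the total BL and SL resistance in series with the cell, and I is the cell current. The second expression assumes that the SL sits on the source side of the transistor and is driven to ground at the array edge.

```latex
\frac{R_{AP} + R_{T} + R_{line}}{R_{P} + R_{T} + R_{line}}
  \;<\; \frac{R_{AP}}{R_{P}}
  \quad \text{for } R_{T} + R_{line} > 0,
\qquad
V_{GS} = V_{WL} - I \, R_{SL}
```

Any added series resistance pulls the overall resistance ratio towards one, and the voltage drop over the SL is subtracted directly from the gate overdrive.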
For assessing the impact of the series resistance on STT-MRAM operation
in deeply scaled nodes, the iN10 PDK is used. The basic sizes and pitches
of the technology were summarized in Table 3.1 in chapter 3. The minimum
metal width of 24 nm is important for the line resistance. Next to the typical
scaling of width and thickness of the copper layers, barriers further reduce the
cross-section of the actual copper. Moreover, at these small sizes, the resistivity
of copper degrades compared to bulk copper as shown in Fig. 4.1 (reproduced
from [1]). This results in high parasitic resistances as also shown in [22].
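A rough sketch of how these contributions combine is given below. It assumes a rectangular line with a barrier of uniform thickness on both sidewalls and on the bottom, which is a simplification introduced here. All numerical values in the example are placeholders and are not taken from the iN10 PDK or from Fig. 4.1.

```python
UOHM_CM_TO_OHM_NM = 10.0  # 1 uOhm.cm = 1e-8 Ohm.m = 10 Ohm.nm


def line_resistance_ohm(length_nm, width_nm, thickness_nm, barrier_nm,
                        resistivity_uohm_cm):
    """Resistance of a copper line, R = rho * L / A, after subtracting the barrier.

    The barrier is assumed to cover both sidewalls and the bottom of the line.
    """
    copper_width = width_nm - 2 * barrier_nm
    copper_thickness = thickness_nm - barrier_nm
    if copper_width <= 0 or copper_thickness <= 0:
        raise ValueError("barrier leaves no copper cross-section")
    rho_ohm_nm = resistivity_uohm_cm * UOHM_CM_TO_OHM_NM
    return rho_ohm_nm * length_nm / (copper_width * copper_thickness)


# Placeholder geometry and resistivity, chosen only to exercise the function.
# In reality the resistivity itself also depends on the line width.
for width_nm in (24, 30):
    r = line_resistance_ohm(length_nm=1000, width_nm=width_nm, thickness_nm=40,
                            barrier_nm=2, resistivity_uohm_cm=5.0)
    print(f"width {width_nm} nm: {r:.1f} Ohm per um")
```

Because the barrier thickness does not scale with the line, a narrow line loses a larger fraction of its cross-section than a wide one.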
4.1.2 Cell layout for reduced SL resistance
In the minimal cell height of the baseline cell of 102 nm there is room to widen
the SL from 24 nm to 30 nm without increasing the area. Fig. 4.2 shows this
adapted physical layout where the added width of the SL is indicated by the
narrow light purple line.
[Plot: copper resistivity (µΩ cm) versus copper line width (nm), for widths from 0 to 40 nm.]
Figure 4.1: Copper resistivity as reproduced from [1]. The increase of resistivity
at small widths leads to high resistance increase.
Figure 4.2: Layout of the two-finger cell in iN10 with increased SL width
highlighted.
The critical rules that now determine the height of the cell are highlighted by
the orange arrows. They are
• (D) M1 minimum spacing,
• (X) SL width,
• (D) M1 minimum spacing and
• (E) M1 minimum width.
Each additional 1 nm of SL width beyond 30 nm will increase the cell height
by 1 nm.
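The relation between SL width and cell size stated above can be written down directly. The constants are the baseline values from the text (102 nm cell height, 128 nm cell width and a maximum SL width of 30 nm at no area cost); the function names are illustrative.

```python
BASE_HEIGHT_NM = 102    # minimal height of the baseline two-finger cell in iN10
CELL_WIDTH_NM = 128     # 2 PP
FREE_SL_WIDTH_NM = 30   # widest SL that still fits in the minimal cell height


def cell_height_nm(sl_width_nm):
    """Cell height for a given SL width: 1 nm taller per nm beyond 30 nm."""
    return BASE_HEIGHT_NM + max(0, sl_width_nm - FREE_SL_WIDTH_NM)


def cell_area_nm2(sl_width_nm):
    """Cell area in nm2 for a given SL width."""
    return CELL_WIDTH_NM * cell_height_nm(sl_width_nm)


for sl_width in (24, 30, 36, 48):
    print(f"SL {sl_width} nm: {cell_height_nm(sl_width)} nm high, "
          f"{cell_area_nm2(sl_width)} nm2")
```

Every nanometre of SL width beyond 30 nm therefore costs 128 nm² of cell area in this layout.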
The BL on the other hand can be increased in width to cover the entire height
of the cell minus the metal spacing. The BL resistance is therefore not a big
issue in these designs.
4.2 Cell with partial source line plane
In order to decrease the SL resistance further, without increasing the area of
the array, a novel cell design with a PSLP is proposed in this PhD research.
4.2.1 Cell design and operation
Fig. 4.3 shows the cell design with a PSLP connecting the transistor sources
of M BL rows together. By sharing a single SL among M BL rows, the average
height of the cell could ideally be reduced down to (M+1)/M times the metal pitch, as
compared to twice the metal pitch for the baseline cell. It is then possible to
reduce the SL resistance without increasing the overall area as compared to the
baseline cell.
Unlike in the baseline architecture, where each cell can be individually controlled
for both read and write, only one out of every M cells connected to a WL is operated
on at the same time. This means that there will always be cells of M different
words on the same WL and the correct memory word needs to be selected by
both the WL and the correct BL. The multiplexing of local BL’s onto a shared
global BL is however common practice in sub-array design [16]. The 1-out-of-M
operation does limit the number of BL rows that can be grouped together with
a PSLP, but most of the potential gains are already reached for small M (see
section 4.3.1).
[Circuit diagram with labels WL, PSLP, BL 1, BL 2, ..., BL M.]
Figure 4.3: Circuit of the cell design with a PSLP in bold.
Table 4.1: Operating voltages for the read and write operations of the
PSLP-based cell.

Operation              VBL,sel      VBL,non   VPSLP   VWL
Discharge based read   PRE = High   Low       Low     High
P2AP write             High         Low       Low     High
AP2P write             Low          High      High    High
Fig. 4.4 illustrates the 1-out-of-M operation for write, Fig. 4.5 for read. The
voltage on the PSLP and on the BL of the selected cell are set, based on the
operation to be performed on the selected cell. The non-selected cells on the
same WL are deselected by setting their BL to the same potential as that of
the PSLP.
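The selection scheme can be sketched in a few lines using the bias conditions of Table 4.1. The sketch takes "High" as the 0.7 V logic supply of iN10 and "Low" as 0.0 V, and it models the read only by its initial precharged state. The dictionary and function names are illustrative.

```python
HIGH, LOW = 0.7, 0.0  # iN10 logic supply and ground, in volts

# Bias conditions per operation, following Table 4.1
BIAS = {
    "read":       {"bl_sel": HIGH, "bl_non": LOW,  "pslp": LOW,  "wl": HIGH},
    "P2AP write": {"bl_sel": HIGH, "bl_non": LOW,  "pslp": LOW,  "wl": HIGH},
    "AP2P write": {"bl_sel": LOW,  "bl_non": HIGH, "pslp": HIGH, "wl": HIGH},
}


def bl_voltages(operation, selected, m):
    """BL voltages for the M cells that share one WL and one PSLP."""
    bias = BIAS[operation]
    return [bias["bl_sel"] if i == selected else bias["bl_non"] for i in range(m)]


def cell_voltages(operation, selected, m):
    """Voltage from BL to PSLP across each of the M cells."""
    pslp = BIAS[operation]["pslp"]
    return [v - pslp for v in bl_voltages(operation, selected, m)]


for operation in BIAS:
    print(f"{operation:>10}: {cell_voltages(operation, selected=1, m=4)}")
```

Only the selected cell sees a non-zero voltage, and its sign reverses between the P2AP and AP2P write.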
Table 4.1 summarizes the operating voltages from Fig. 4.4 and Fig. 4.5. VBL,sel is
the voltage for the selected BL, while VBL,non is the voltage for the non-selected
BL’s. For iN10, the "High" value is the logic supply voltage of 0.7V and the
"Low" value is the ground potential of 0.0V.
Figure 4.4: Write operation of a PSLP-based cell with bias voltages for switching
a) P2AP and b) AP2P.
The extreme versions of this approach, either a full SL plane or a SL parallel
to the WL, cannot operate in this manner. The different bias voltages for
the P2AP and AP2P switch will cause either a two phase write scheme or the
need for a negative bias voltage [33]. The dual write scheme causes a write
performance penalty and the negative bias voltage causes the need for unwanted
charge pumps [6]. The SL parallel to the WL topology with the dual write
scheme has the added problem that it carries all current of all selected cells,
causing problems with high IR-drop and electro-migration (EM).
4.2.2 Cell layout
The cell design with the PSLP does not change the width of the cells. The
width is still fixed to twice the poly pitch (128 nm), because the two-finger
transistor cell design is still used. A first lower bound for the average cell
height is given by (M+1)/M times the metal pitch. For M=4, this bound is
60 nm (5/4 times 48 nm), 37.5% smaller than the 96 nm (2 times 48 nm) of the
ideal baseline layout. This minimal area however cannot be achieved due to
processing constraints.
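The pitch-based lower bound quoted above is easy to tabulate for other values of M. The sketch below only evaluates (M+1)/M times the metal pitch with the 48 nm pitch from the text; the function name `pitch_based_height` is illustrative, and the result deliberately ignores the design rules that prevent this bound from being reached.

```python
METAL_PITCH_NM = 48.0  # iN10 metal pitch as used in the text


def pitch_based_height(m, metal_pitch=METAL_PITCH_NM):
    """Lower bound on the average cell height: M bit lines plus one shared SL."""
    return (m + 1) / m * metal_pitch


if __name__ == "__main__":
    baseline = pitch_based_height(1)  # one SL per BL: 2 x metal pitch
    for m in (1, 2, 4, 8, 16):
        height = pitch_based_height(m)
        saving = 1.0 - height / baseline
        print(f"M={m:2d}: {height:5.1f} nm ({saving:.1%} below baseline)")
```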
Figure 4.5: Discharge based read operation of a PSLP-based cell.
Fig. 4.6 illustrates the 3D view of the new cell design. Six gate lines are shown
of which two are active and highlighted. Also two BL/PSLP combinations are
shown. All the cells sharing one PSLP are highlighted.
Fig. 4.7 illustrates the layout of the new cell design with its critical rules that
determine the cell height. Two cells are highlighted by the dashed black boxes.
The bottom one is a cell that is bordering a SL, while the top one is not. The
PSLP with its SL and the connection to the MTJ and the BL are labeled in the
figure. The sources of the access transistors are connected with the long vertical
local interconnect metal layer 0 (M0) lines. The SL itself runs horizontally over
the middle of this local interconnect on metal layer 1 (M1). This gives rise to a
fishbone-shaped connection scheme. The BL runs horizontally on M3 on top of
the MTJ’s, filling the height of the cell minus the metal spacing.
Figure 4.6: 3D view of the PSLP-based two finger cell.
Figure 4.7: Layout of the PSLP-based two finger cell in iN10 with critical rules
highlighted.
The width of the cell is 2 PP by construction. The critical rules that determine
the height of the cell are highlighted by the orange arrows. For the top cell not
bordering the SL, they are
• (A) half of M0 T2T,
• (F) M0 extension V0/M1,
• (E) M1 minimum width,
• (F) M0 extension V0/M1 and
• (A) half of M0 T2T.
In this way, a minimal cell height equal to 86 nm is reached. For the bottom
cell bordering the SL, they are
• (A) half of M0 T2T,
• (F) M0 extension V0/M1,
• (E) M1 minimum width,
• (D) M1 minimum spacing and
• (E) half of M1 minimum width.
In this way, a minimal cell height equal to 91 nm is reached.
This gives an average cell height equal to 88.5 nm for M=4. Even when a SL of
minimal width is used, this SL width affects the cell height. Widening it by
1 nm will however only increase the average cell height by 0.25 nm, since it is
shared among 4 cells.
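The averaging behind these numbers can be reproduced with a few lines. The sketch assumes that exactly two of the M rows border the SL (one on each side of it) and that each of those rows absorbs half of any extra SL width; both assumptions are inferred from the 88.5 nm and 0.25 nm figures above rather than stated explicitly, and the function name is illustrative.

```python
H_INNER_NM = 86.0   # cell not bordering the SL (from the text)
H_BORDER_NM = 91.0  # cell bordering the SL (from the text)


def average_cell_height(m, sl_widening=0.0, rows_at_sl=2):
    """Average cell height over the M rows that share one SL.

    Assumption: `rows_at_sl` rows border the SL and each takes half of
    the extra SL width `sl_widening` (in nm).
    """
    if m < rows_at_sl:
        raise ValueError("M must be at least the number of rows at the SL")
    border = H_BORDER_NM + sl_widening / 2.0
    return ((m - rows_at_sl) * H_INNER_NM + rows_at_sl * border) / m


if __name__ == "__main__":
    print(average_cell_height(4))                   # minimal-width SL
    print(average_cell_height(4, sl_widening=1.0))  # SL widened by 1 nm
```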
4.3 Electrical assessment
The biggest advantage of the novel cell with PSLP is the improved trade-off
between area and SL resistance. The area can be reduced at the same SL
resistance or the SL resistance can be reduced at the same area. Section 4.3.1
shows this trade-off. Sections 4.3.2 and 4.3.3 show the gains in performance and
energy consumption for PSLP-based cells with the same area footprint as the
baseline cell. They highlight the benefits of a decreased SL resistance.
Simulations were performed with a variety of MTJ parameters. The relative
gains that are presented in these sections are consistent for the parameter ranges
mentioned in section 2.3. The graphs that are shown are for a TMR of 150%
and an RA product of 5 Ω·µm². Three PSLP-based cells have been simulated: M
equal to 1 (baseline cell), 2 and 4. This is done for arrays with 32, 64, 128, 256
and 512 cells per BL.
4.3.1 Source line resistance versus area
Fig. 4.8 summarizes the minimal cell area for various SL topologies with multiple
options, ranging from SL parallel to BL, over PSLP with M = 2, 4, 8 and 16, to SL
parallel to WL. The first group is based on layouts which take into account the
design rules and the second group is the estimation based solely on the pitches.
There are two important observations to be made from this figure. First of
all, it is clear that simple pitch-based estimates are insufficient at these small
nodes. Design rules that take into account the process constraints mentioned
in chapter 3 dominate the size. The second observation is that most of
the area gain is already obtained by sharing a SL among 2 or 4 BL rows. This
further justifies the 1-out-of-M operation mode, which is best suited for small M.
Figure 4.8: Minimum cell area for various SL topologies.
Fig. 4.9 shows the SL resistance for the cell located at the end of the SL, taking
into account the end piece of the fishbone. This is done for an array of 256
cells per BL and plotted as a function of the cell area, relative to the smallest
baseline cell (M=1). The markers indicate the cells that have been used for
further simulations.
The figure illustrates two of the key benefits of the PSLP. First, comparing cells
with the same source line resistance, an area gain is reached of 7% (M=2),
11% (M=4) and 14% (M=8) out of a maximum of 16% for SL parallel to WL.
Second, for the same area as the smallest baseline cell, the SL resistance is reduced
by a factor 2.4 (M=2), 4.6 (M=4) and 5 (M=8). The absolute resistance gain
by going from M=4 to M=8 is very small (as the end piece of the fishbone
becomes longer), so the design with M=8 is not used in further simulations.
The energy and performance gains that go with this resistance reduction are
shown in the next sections.
The BL width only depends on the cell height, not on the cell design. The
smaller cells will have a higher BL resistance, but this resistance is still far
smaller than the SL resistance. For the same area, all cell designs will have the
same BL resistance.
Figure 4.9: SL resistance versus cell area. (Axes: cell area in % versus source line resistance in Ω; curves for M = 1 (baseline), M = 2, M = 4 and M = 8.)
4.3.2 Write performance and energy consumption
Fig. 4.10 shows the simulated write delay. The reported delay is the slowest of
the P2AP and AP2P switches, reaching 90% of the final desired state. The
write delay is determined by the slowest cell at the end of the SL’s and BL’s,
where the parasitic BEOL resistance has the biggest effect limiting the current
through the cell. For a PSLP with M=4, the delay is decreased by 12%, 23%
and 47% for array sizes of 128, 256 and 512 respectively. This is due to the
higher current that the cell can drive when its series resistance is lower.
Fig. 4.11 shows the simulated average write energy. The reported energy
consumption is the combination of a P2AP and AP2P switch. It is averaged
over all the cells on a BL, while using the same WL pulse duration for all
addresses. For a PSLP with M=4, the average write energy is decreased by
7%, 15% and 35% for array sizes of 128, 256 and 512 respectively. This is
due to the faster switching of the MTJ, which is more energy efficient in these
operating ranges.
Figure 4.10: Write delay for PSLP-based cells for different array sizes. (Axes: number of cells per BL, 32 to 512, versus write delay in a.u.; curves for M = 1 (baseline), M = 2 and M = 4; annotated reductions of 12%, 23% and 47%.)
Figure 4.11: Average write energy for PSLP-based cells for different array sizes. (Axes: number of cells per BL, 32 to 512, versus average write energy in a.u.; curves for M = 1 (baseline), M = 2 and M = 4; annotated reductions of 7%, 15% and 35%.)
4.3.3 Read performance and voltage difference
A read scheme is used where the BL is first pre-charged and then discharged
by the selected cell. Two metrics are investigated in order to analyze read
performance and voltage difference.
In order to assess read performance, the time needed for cells in the two
possible states to reach a BL voltage difference of 100 mV is considered. Fig. 4.12
shows the simulated delay. The delay increases with BL capacitance, which
scales linearly with the number of cells per BL. For a PSLP with M=4, the
delay is decreased by 10%, 18% and 33% for array sizes of 128, 256 and 512
respectively. This is due to the faster discharging and larger overall resistance
ratio with lower series resistance.
Figure 4.12: Delay to BL voltage difference of 100mV for PSLP-based cells for
different array sizes. (Axes: number of cells per BL, 32 to 512, versus delay in a.u.; curves for M = 1 (baseline), M = 2 and M = 4; annotated reductions of 10%, 18% and 33%.)
In order to assess the read voltage difference, the maximum difference in BL
voltage during discharging is considered, between cells with both possible states.
Fig. 4.13 shows the simulated read voltage difference. For a PSLP with M=4,
the maximum voltage difference is increased by 4%, 7% and 13% for array
sizes of 128, 256 and 512 respectively. This is due to the larger overall resistance
ratio with lower series resistance.
Figure 4.13: Maximum read voltage difference for PSLP-based cells for different
array sizes. (Axes: number of cells per BL, 32 to 512, versus read voltage difference in mV; curves for M = 1 (baseline), M = 2 and M = 4; annotated increases of 4%, 7% and 13%.)
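Both read metrics can be illustrated with a first-order model. The sketch below is not the simulation setup used for the figures: it assumes an ideal RC discharge of the pre-charged BL through a constant cell resistance, with the access transistor and interconnect lumped into one series resistance. All numerical values (MTJ resistance, series resistances, BL capacitance) are hypothetical and chosen only to show the trend.

```python
import math


def read_metrics(v_pre, r_p, tmr, r_series, c_bl, v_target=0.1):
    """Return (delay to reach v_target, maximum BL voltage difference).

    Assumed model: V(t) = v_pre * exp(-t / (R * c_bl)) for each cell state.
    The delay is None when the target difference is never reached.
    """
    tau_p = (r_p + r_series) * c_bl                 # cell in the P state
    tau_ap = (r_p * (1.0 + tmr) + r_series) * c_bl  # cell in the AP state

    def diff(t):
        return v_pre * (math.exp(-t / tau_ap) - math.exp(-t / tau_p))

    # The difference peaks where its time derivative is zero.
    t_peak = math.log(tau_ap / tau_p) / (1.0 / tau_p - 1.0 / tau_ap)
    dv_max = diff(t_peak)
    if dv_max < v_target:
        return None, dv_max

    # diff(t) rises monotonically on [0, t_peak], so bisection is safe.
    lo, hi = 0.0, t_peak
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if diff(mid) < v_target:
            lo = mid
        else:
            hi = mid
    return hi, dv_max


if __name__ == "__main__":
    # Hypothetical values: 5 kOhm MTJ, TMR of 150%, 20 fF bit line.
    for r_series in (3000.0, 1000.0):
        delay, dv = read_metrics(0.7, 5000.0, 1.5, r_series, 20e-15)
        print(f"Rseries={r_series:.0f} Ohm: delay={delay * 1e12:.1f} ps, "
              f"max difference={dv * 1e3:.0f} mV")
```

In this simplified model a lower series resistance both shortens the delay and raises the maximum voltage difference, which is the same direction as the simulated trends reported for the PSLP-based cells.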
4.4 Conclusion
The series resistance of STT-MRAM cells has become very important in deeply
scaled nodes. In particular the SL resistance has been uncovered in this PhD
research as a major factor, because its width is critical for cell area, which
is especially important when targeting high density caches. The novel PSLP-
based cell design provides a better trade-off between series resistance and area
by sharing a single SL among multiple BL rows. A detailed layout in iN10
performed in this PhD research has shown that most of the area gain can be
reached by using a partial source line plane shared among just 4 BL rows. In
this way, an area reduction of 11% is reached at the same SL resistance as
the baseline cell. Alternatively, at the same area of the baseline cell, the SL
resistance is reduced by more than a factor of 4, resulting in a performance gain
and a reduction of energy consumption. Simulations performed in this PhD
research with arrays of 128, 256 and 512 cells per bit line have shown a 7 to
35% reduction of write energy consumption, a 12 to 47% reduction of write
delay, a 10 to 33% reduction of delay for a read voltage difference of 100mV
and a 4 to 13% increase of maximum read voltage difference.

Chapter 5
Cell design for high performance caches
This second chapter of two on new cell designs targets higher performance
caches. The focus on performance results in the use of complementary cells at
the cost of larger area.
During this PhD research, complementary cells for high performance caches
were investigated. After an evaluation of their strengths and weaknesses, a novel
3 Transistors 2 MTJ’s (3T 2MTJ) cell design with ground grid, the 3TGG cell
design, has been conceived during this PhD research. The introduction of the
ground grid and a novel cell operation, which make the 3T2MTJ cell smaller,
faster and more energy efficient than the state-of-the-art complementary cells,
are presented in this chapter.
Section 5.1 presents the state-of-the-art complementary cell designs in STT-
MRAM and highlights their remaining challenges. Section 5.2 introduces the
3TGG cell design, together with its benefits over other complementary cell
designs. Section 5.3 compares the improved cell with the state-of-the-art in area,
performance and energy-efficiency.
5.1 Complementary cell designs
STT-MRAM suffers from the low resistance difference between the bi-stable
states of the MTJ. Variability on MTJ resistance and access transistors makes
reliable read-out even more challenging. The approach in state-of-the-art to
improve STT-MRAM for lower level embedded caches at the cell level has been
straightforward: add more devices to overcome the shortcomings of the most
basic 1T 1MTJ cell.
5.1.1 What’s wrong with just one?
For STT-MRAM, the main source of variation is the resistance of the MTJ.
Figure 5.1: Normal distributions of RP and RAP with relevant parameters for
assessing sense margin.
Analyzing Fig. 5.1 gives a way to characterize the required process control
needed for readability of the cell. In order to have sufficient sense margin across
an entire memory, the largest RP value needs to be smaller than the reference
value and the smallest RAP value needs to be bigger than the reference value.
Assuming a yield target of 6σ, this means that Eq. (5.1) needs to hold for cells
with a single MTJ storing the data.
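$$\Delta R_P + \Delta R_{AP} > 6\sigma_{R_P} + 6\sigma_{R_{AP}} \tag{5.1}$$
Assuming that $\sigma_{R_P}/\mu_{R_P}$ is equal to $\sigma_{R_{AP}}/\mu_{R_{AP}}$ gives Eq. (5.2).
$$\frac{\sigma_{R_P}}{\mu_{R_P}} < \frac{\mathrm{TMR}}{6(\mathrm{TMR}+2)} \tag{5.2}$$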
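The step from Eq. (5.1) to Eq. (5.2) can be written out as follows. The derivation assumes that ∆RP and ∆RAP are the distances from the reference value to the means of the two distributions, and that TMR is defined as the relative difference between the mean AP and P resistances; neither definition is spelled out in this section, so both should be read as assumptions.

```latex
\begin{align*}
\Delta R_P + \Delta R_{AP}
  &= \mu_{R_{AP}} - \mu_{R_P} = \mathrm{TMR}\,\mu_{R_P}
  && \text{(assumed definitions of } \Delta R \text{ and TMR)} \\
\sigma_{R_{AP}}
  &= \frac{\mu_{R_{AP}}}{\mu_{R_P}}\,\sigma_{R_P}
   = (\mathrm{TMR}+1)\,\sigma_{R_P}
  && \text{(equal relative variation)} \\
\mathrm{TMR}\,\mu_{R_P}
  &> 6\sigma_{R_P} + 6(\mathrm{TMR}+1)\,\sigma_{R_P}
   = 6(\mathrm{TMR}+2)\,\sigma_{R_P} \\
\frac{\sigma_{R_P}}{\mu_{R_P}}
  &< \frac{\mathrm{TMR}}{6(\mathrm{TMR}+2)}
\end{align*}
```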
Fig. 5.2 shows Eq. 5.2 for a typical TMR range. It shows the upper
limit on resistance variation allowed for readability for 1MTJ per cell with or
without series resistance variation taken into account. Series resistance variation
is specified in percentage for the ratio $\sigma_{R_{series}}/\sigma_{R_P}$.
Figure 5.2: Upper limit on resistance variation allowed for readability for 1
MTJ per cell
It is also important to take into account the variation of the series resistance,
caused by the access transistor and the biasing lines. This changes Eq. 5.1 into
Eq. 5.3 and Eq. 5.2 into Eq. 5.4.
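$$\Delta R_P + \Delta R_{AP} > 6\sqrt{\sigma_{R_P}^2 + \sigma_{R_{series}}^2} + 6\sqrt{\sigma_{R_{AP}}^2 + \sigma_{R_{series}}^2} \tag{5.3}$$
$$\frac{\sigma_{R_P}}{\mu_{R_P}} < \frac{\mathrm{TMR}}{6\sqrt{1 + \dfrac{\sigma_{R_{series}}^2}{\sigma_{R_P}^2}} + 6\sqrt{(\mathrm{TMR}+1)^2 + \dfrac{\sigma_{R_{series}}^2}{\sigma_{R_P}^2}}} \tag{5.4}$$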
Fig. 5.2 also shows Eq. 5.4 for three ratios of $\sigma_{R_{series}}/\sigma_{R_P}$. When the variation of
the series resistance is of the same order as the RP variation, there is a clear
need for extra process control of the MTJ. Reducing the series resistance, or
its impact on the cell operation, will shift the performance closer to the ideal
curve.
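The limits of Eq. 5.2 and Eq. 5.4 are simple to evaluate numerically. The sketch below is only an illustration: the symbol `rho` for the ratio of series resistance variation to RP variation is introduced here, and the three ratios printed are hypothetical, since the ratios used in Fig. 5.2 are not listed in the text.

```python
import math


def sigma_limit_1mtj(tmr, rho=0.0):
    """Upper limit on sigma_RP / mu_RP for one MTJ per cell, Eq. (5.4).

    rho is sigma_Rseries / sigma_RP; rho = 0 reduces to Eq. (5.2).
    """
    denominator = (6.0 * math.sqrt(1.0 + rho ** 2)
                   + 6.0 * math.sqrt((tmr + 1.0) ** 2 + rho ** 2))
    return tmr / denominator


if __name__ == "__main__":
    for tmr in (1.0, 1.5, 2.0):
        limits = ", ".join(f"rho={rho:.1f}: {sigma_limit_1mtj(tmr, rho):.2%}"
                           for rho in (0.0, 0.5, 1.0))
        print(f"TMR={tmr:.0%} -> {limits}")
```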
5.1.2 Adding a second MTJ and transistor
Innovations at the cell level to alleviate the readability problem in embedded
caches mostly consist of using complementary cells [16][9][18]. The resistance
difference for reading is increased by comparing two MTJ’s, one in the P-state
and one in the AP-state, instead of comparing a single MTJ with a reference
in between both states. For cells with two independent complementary MTJ’s
storing the data, Eq. (5.5) applies.
$$\Delta R > 6\sqrt{\sigma_{R_P}^2 + \sigma_{R_{AP}}^2} \tag{5.5}$$
Again assuming that $\sigma_{R_P}/\mu_{R_P}$ is equal to $\sigma_{R_{AP}}/\mu_{R_{AP}}$ gives Eq. (5.6).
$$\frac{\sigma_{R_P}}{\mu_{R_P}} < \frac{\mathrm{TMR}}{6\sqrt{\mathrm{TMR}^2 + 2\,\mathrm{TMR} + 2}} \tag{5.6}$$
Fig. 5.3 shows the above equation for a typical TMR range. It shows the upper
limit on resistance variation allowed for readability for 2MTJ’s per cell with or
without series resistance variation taken into account. Series resistance variation
is specified in percentage for the ratio $\sigma_{R_{series}}/\sigma_{R_P}$.
The use of complementary cells clearly alleviates the readability requirements
on process control and/or TMR value. For a given process control, a lower
TMR value is required. Alternatively for a given TMR value, a lower amount of
process control is required. The importance of taking into account the variation
of the series resistance still remains.
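The trade-off between process control and TMR can be made concrete by inverting Eq. (5.2) and Eq. (5.6) for the minimum TMR at a given relative variation. The closed forms below are derived here from those two equations, without series resistance variation, and are a sketch rather than a result from the original work; the function names are illustrative. Both limits tend to 1/6 for very large TMR, so no finite TMR suffices beyond that variation.

```python
import math


def min_tmr_1mtj(s):
    """Minimum TMR for one MTJ per cell, from s < TMR / (6 (TMR + 2))."""
    if not 0.0 < s < 1.0 / 6.0:
        raise ValueError("relative variation must lie in (0, 1/6)")
    return 12.0 * s / (1.0 - 6.0 * s)


def min_tmr_2mtj(s):
    """Minimum TMR for two complementary MTJs, from Eq. (5.6).

    Squaring Eq. (5.6) gives (1 - 36 s^2) TMR^2 - 72 s^2 TMR - 72 s^2 > 0;
    the positive root of that quadratic is the minimum TMR.
    """
    if not 0.0 < s < 1.0 / 6.0:
        raise ValueError("relative variation must lie in (0, 1/6)")
    a = 1.0 - 36.0 * s * s
    b = -72.0 * s * s
    c = -72.0 * s * s
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)


if __name__ == "__main__":
    for s in (0.03, 0.05, 0.08):
        print(f"sigma/mu={s:.0%}: 1 MTJ needs TMR > {min_tmr_1mtj(s):.0%}, "
              f"2 MTJs need TMR > {min_tmr_2mtj(s):.0%}")
```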
The simplest and most straightforward implementation of a complementary cell
is the 2 Transistors 2 MTJ’s (2T 2MTJ) cell and has been used in [16]. The cell
is composed of two separate dual Bit Line cells operated in a complementary
manner. This dual BL cell is the baseline 1T 1MTJ cell and has a theoretical
area of 16FF’, with F being half the metal pitch and F’ being half the gate
pitch. Fig. 5.4 shows the circuit of this cell design and its independent write
operation of both complementary 1T 1MTJ cells.
Table 5.1 summarizes the operating voltages from Fig. 5.4 for both the read and
write operations. For iN10, the "High" value is the logic supply voltage of 0.7V
and the "Low" value is the ground potential of 0.0V. The read voltage VR for
the low voltage read is set at 0.1V.
Figure 5.3: Upper limit on resistance variation allowed for readability for 2
MTJ’s per cell.
Figure 5.4: a) Circuit of a 2T 2MTJ cell and b) its independent write operation.
Table 5.1: Operating voltages for the read and write operations of the 2T 2MTJ cell.
Operation              VBL          VBL,bar      VSL    VSL,bar   VWL
Low voltage read       VR           VR           Low    Low       High
Discharge based read   PRE = High   PRE = High   Low    Low       High
Logic 1 write          High         Low          Low    High      High
Logic 0 write          Low          High         High   Low       High
When writing, the 2T 2MTJ cell does not differentiate between the AP2P and
P2AP switch. Assuming the cell is biased with the logic supply voltage for both
switches, the AP2P switch, which has a lower required switching current, will
be over-dimensioned and will consume more energy than needed for its proper
function.
When reading, the 2T 2MTJ cell is benefiting from local referencing, significantly
reducing the impact of global variations. Local variations and transistor
mismatch are not addressed.
The downsides of this cell type are that:
• it doubles the area from 16FF’ to 32FF’,
• it doubles the write energy consumption,
• and it does not solve the problem of local variation and transistor
mismatch.
In this way the benefits over SRAM in reduced area and energy consumption
are fading. It is however only a first "brute force" design that does not take into
account the nature of the complementary operation.
Other cells which add only extra MTJ’s have been suggested in [7] and [8].
The 1 Transistor 2 MTJ’s (1T 2MTJ) cell from [7] is interesting as a single cell
concept, but suffers from sneak current paths when put in an array configuration
as shown also in [19]. The multi-level cell from [8] is not a complementary cell,
but is also proposed to average out the variation by putting multiple MTJ’s in
series. In advanced nodes, this series connection will cause a significant area
penalty or a costly extra MTJ layer. Moreover, the switching of these serially
connected MTJ’s becomes more difficult with voltage headroom getting smaller
in advanced nodes. Finally, a special 2T 2MTJ static gain cell is proposed in
[24]. With its two phase read operation, it is not that interesting for fast caches.
In advanced nodes, this cell also suffers from a larger area due to the gate
contact in the cell and the many biasing lines. Moreover, it will suffer from
voltage drop on the biasing lines which are shared by all active cells. These
cells are not explored further due to the operation issues in advanced nodes.
5.1.3 Adding a third transistor
In a second design as proposed in [9], the authors do take into account the
complementary operation of the cell. A third transistor is added that creates a
serial write path through both MTJ’s. The SL is shared, since it is only used
for reading and thus biased at the same potential. Fig. 5.5 shows the circuit of
this cell design and its write operation.
Figure 5.5: a) Circuit of a 3T sSL cell and b) its serial write operation.
Table 5.2 summarizes the operating voltages from Fig. 5.5 for both the read and
write operations. For iN10, the "High" value is the logic supply voltage of 0.7V
and the "Low" value is the ground potential of 0.0V. The read voltage VR for
the low voltage read is set at 0.1V.
Table 5.2: Operating voltages for the read and write operations of the 3T sSL cell.
Operation              VBL          VBL,bar      VSL   VRWL   VWWL
Low voltage read       VR           VR           Low   High   Low
Discharge based read   PRE = High   PRE = High   Low   High   Low
Logic 1 write          High         Low          Low   Low    High
Logic 0 write          Low          High         Low   Low    High
The writing of the 3T 2MTJ cell with a shared SL (3T sSL) takes advantage of
the complementary nature of the cell by performing a serial write through both
MTJ’s, in an attempt to lower the write energy consumption. Although
an interesting concept, this serial write causes problems with voltage
headroom, as mentioned in [9]. The serial write also does not differentiate
between both switches, sending an equal amount of current through both
MTJ’s.
For reading, the design takes advantage of a shared SL, which removes the
impact of SL resistance variation. However, this cell does still suffer from local
transistor variation and mismatch.
The downsides of this cell type are that:
• the area is even bigger at 36FF’ (3x gate pitch by 3x metal pitch),
• the serial write causes problems with voltage headroom,
• and it does not solve the problem of local transistor variation and
mismatch.
5.1.4 Adding even more transistors
There have been even bigger 4 Transistors 2 MTJ’s (4T 2MTJ) cells proposed
in [9] and [18].
Fig. 5.6 shows the 4T2MTJ cell design as proposed in [9] for lower voltage
operation. By putting both MTJ’s in parallel between the BL’s, they will
compete for write current. When the easier AP2P switch occurs first, this
parallel path transitions to the low resistive state. This will divert much
needed current from the more difficult P2AP switch, causing the switching time
and energy consumption to rise.
Fig. 5.6 also shows the 4T2MTJ SRAM-"like" cell design as proposed in [18].
This cell suffers from even more area increase due to the multiple gate contacts
needed for every cell. It also uses a two-phase write operation, causing extra
write delay.
These bigger cells with four transistors are not explored further due to the
large area penalty and operation issues. Remember that the main advantage of
STT-MRAM over SRAM for embedded caches is the area reduction it brings
to this ever growing part of state-of-the-art chips.
Figure 5.6: Circuits of two 4T2MTJ cells as in a) [9] and b) [18].
5.2 Improved 3T 2MTJ cell design with ground grid
The 3T sSL discussed in the previous section showed some interesting properties
that are exploited further.
5.2.1 Adding the ground grid: more is less
In the operation of the 3T sSL, the shared SL is only used for reading and
always biased at the same voltage as shown in Table 5.2. Therefore this shared
SL is replaced by a ground grid.
Fig. 5.7 shows the improved array configuration of the 3T 2MTJ cell design with
ground grid (3TGG). All the shared SL’s can be connected together, forming a
low resistive grid. This low resistive grid has multiple advantages.
Most importantly, the grid structure enables the removal of the local SL running
in parallel with the local BL’s. Since the entire array can be connected in a grid
arrangement, the local SL’s can run orthogonal to the local BL’s, effectively
eliminating the area impact of the SL’s. The area of the 3TGG cell is therefore
reduced to 24FF’ (3x gate pitch by 2x metal pitch), which is 1.5x smaller than
the one with a shared SL. Thereby the cell mimics the layout of two 1T 1MTJ
common SL cells as in [33], without the need for two-phased write or a negative
write voltage.
Figure 5.7: Array configuration of a) the 3T sSL cell as in [9] and b) the novel
3TGG cell as in this work.
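The theoretical cell areas quoted in this chapter can be compared side by side. The sketch uses F as half the metal pitch and F' as half the gate pitch, as defined for the 2T 2MTJ cell, and converts to nm² with the iN10 pitches from chapter 4 (48 nm metal pitch, 64 nm poly pitch, assuming the gate pitch equals the poly pitch). These are pitch-based estimates only and do not include design-rule effects; the helper name `area_ff` is illustrative.

```python
F_NM = 48.0 / 2.0        # F: half the metal pitch
F_PRIME_NM = 64.0 / 2.0  # F': half the gate pitch (assumed equal to poly pitch)


def area_ff(gate_pitches, metal_pitches):
    """Cell area in units of F*F' for a footprint expressed in pitches."""
    return (2 * gate_pitches) * (2 * metal_pitches)


# Areas in FF' as quoted in the text.
CELL_AREAS_FF = {
    "1T 1MTJ (baseline)": 16,
    "2T 2MTJ": 2 * 16,         # two baseline cells
    "3T sSL": area_ff(3, 3),   # 3x gate pitch by 3x metal pitch
    "3TGG": area_ff(3, 2),     # 3x gate pitch by 2x metal pitch
}

if __name__ == "__main__":
    for name, ff in CELL_AREAS_FF.items():
        print(f"{name:20s} {ff:3d} FF' = {ff * F_NM * F_PRIME_NM:7.0f} nm^2")
    ratio = CELL_AREAS_FF["3T sSL"] / CELL_AREAS_FF["3TGG"]
    print(f"3T sSL / 3TGG area ratio: {ratio:.2f}")
```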
Secondly, this grid can be permanently biased at ground potential eliminating
the need for SL drivers, resulting in an extra area and resistance decrease at
architecture level.
Finally, the access transistors will always have their sources biased at ground
potential through a very low resistive grid. This eliminates the source
degeneration effect that occurs in STT-MRAM, both due to the biasing lines
and the reversed bias operation for writing.
Note also in Fig. 5.7 that the 3TGG cell only needs a single WL signal compared
to a separate read WL (RWL) and write WL (WWL). The read and write
operations that will be discussed next, will clarify this further.
5.2.2 Improving the write operation: using what is already there
The other major problem with the operation of the 3T sSL cell is the serial
write operation. Although it is very beneficial for very low resistive MTJ’s, it
causes problems with voltage headroom for typical resistance values [9]. In this
PhD research, the write operation has been improved dramatically by using all
three transistors of the cell.
Table 5.3: Operating voltages for the read and write operations of the 3TGG cell.
Operation              VBL          VBL,bar      VGG   VWL
Low voltage read       VR           VR           Low   High
Discharge based read   PRE = High   PRE = High   Low   High
Logic 1 write          High         Low          Low   High
Logic 0 write          Low          High         Low   High
Fig. 5.8 shows the boosted write operation. The P2AP switch is boosted by
using all three transistors for the write operation. All current through the cell
flows through this MTJ. As compared to Fig. 5.4 b) and Fig. 5.5 b), this current
will always be higher than in both other cells, since it has the most parallel
current paths with the lowest overall resistance. The energy-efficient serial write
mechanism is reused for the AP2P switch.
Table 5.3 summarizes the operating voltages from Fig. 5.8 and Fig. 5.13 for both
the read and write operations. For iN10, the "High" value is the logic supply
voltage of 0.7V and the "Low" value is the ground potential of 0.0V. The read
voltage VR for the low voltage read is set at 0.1V.
The different techniques and effects in the boosted write operation are explained
in the following sections and illustrated in Fig. 5.9, 5.10 and 5.11. These figures
are put on the same time scale in order to compare the effects properly.
P2AP switch and boosting effect
Fig. 5.9 illustrates the boosting effect. It shows the P2AP write currents of the
three cells from Fig. 5.4 b), Fig. 5.5 b) and Fig. 5.8 during a switching event.
The dotted lines indicate the time at which the P2AP switch is 90% complete.
The P2AP write current for the boosted write of the 3TGG cell is clearly far
higher than that of the other cells.
The 3TGG cell with boosted write in dashed green has the highest P2AP write
current. This results in a faster P2AP write operation, as can be seen from the
dotted lines which indicate the P2AP switch. Note that there is a small increase
in P2AP write current as the other MTJ starts to switch to the P-state. Also
note the bigger increase in P2AP write current in the 3T sSL cell with serial
write in dash-dotted red as the other MTJ switches to the P-state. This also
aids in the P2AP switching, but the current is still far lower and thus the P2AP
switching is far slower. The 2T 2MTJ cell in solid black shows no such change
in the P2AP write current, as it is independent of the other MTJ switching and
behaves the same as a 1T 1MTJ cell.
Figure 5.8: Write operation of the 3TGG cell with boosted P2AP switch.
AP2P switch and low energy
The AP2P switch is using the advantage of the serial write by reusing part of
the current of the P2AP switch, so that it does not cause extra energy usage.
This however makes it important to balance the resistances of the different
branches. The ratio of AP2P write current to the total current should be greater
than or, optimally, equal to the ratio of the switching currents IAP2P/IP2AP. Since
transistor width tuning of finFET’s is restricted to integer numbers of fixed-size
fins, it is important to target the MTJ parameters such as RA and TMR to
work well with the logic transistors in order to guarantee this proper balance.
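The balance condition stated above amounts to a single inequality, which can be checked for any candidate operating point. The helper below is a minimal sketch: the function name and the example currents are hypothetical, and the branch currents would in practice come from circuit simulation rather than from this check.

```python
def is_balanced(i_ap2p_branch, i_total, ic_ap2p, ic_p2ap):
    """Check the balance condition of the boosted write.

    The share of the total current that flows through the AP2P branch must
    be at least the ratio of the switching currents IAP2P / IP2AP.
    """
    return i_ap2p_branch / i_total >= ic_ap2p / ic_p2ap


if __name__ == "__main__":
    # Hypothetical currents in microampere, for illustration only.
    print(is_balanced(i_ap2p_branch=30.0, i_total=60.0,
                      ic_ap2p=20.0, ic_p2ap=50.0))  # share 0.5, needs 0.4
    print(is_balanced(i_ap2p_branch=18.0, i_total=60.0,
                      ic_ap2p=20.0, ic_p2ap=50.0))  # share 0.3, needs 0.4
```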
Figure 5.9: P2AP write currents over time of the different complementary cells.
Fig. 5.10 illustrates the energy savings of the serial write. It shows the AP2P
write currents of the three cells during a switching event. The dotted lines
indicate the time at which the AP2P switch is 90% complete. The AP2P write
current for the boosted write of the 3TGG cell is the lowest as it reuses part of
the current of the P2AP switch.
The 3TGG cell in dashed green has the lowest AP2P write current, which is
in fact part of the P2AP write current. The 3T sSL cell in dash-dotted red
has a higher AP2P write current, which is in fact the same as the P2AP write
current. This also saves energy consumption, but does not take into account
the asymmetry of the switching. The 2T 2MTJ cell with independent write in
solid black has the highest AP2P write current, which is not reused from the
P2AP write current. It is again the same as that of the 1T 1MTJ cell.
Note that the AP2P write current of the 3TGG cell in dashed green first
increases as the MTJ starts to switch to the P-state. It however starts to
drop again as the other MTJ switches to the AP-state. This drop illustrates
why the balancing of the currents is important, as this drop could prevent the
AP2P switch if the current drops too low. The dotted lines, that indicate the
switch, illustrate that here the 3TGG cell is slower than the others. This slower
AP2P switch is however still faster than the other P2AP switches, so the overall
performance remains better.
Figure 5.10: AP2P write currents over time of the different complementary
cells.
Total current and automatic "shutdown"
The final advantage is that the end state of the 3TGG cell with boosted write
has an overall higher cell resistance than the begin state. This results in an
automatic "shutdown" behavior where the power consumption reduces when
the cell is switched. Both of the other cells have a lower resistance in the end
state, due to different transistor biasing. This automatic "shutdown" behavior
is important for reducing the energy consumption of fast cells and cells that are
being overwritten with the same value.
Fig. 5.11 illustrates the automatic "shutdown" behavior. It shows the total write
currents of the three cells during a switching event. The total write current
for the boosted write of the 3TGG cell is lower after switching than before or
during. For the 1T 1MTJ, the average current of an AP2P and P2AP switch is
shown, which is half of that of the 2T 2MTJ cell.
The 1T1MTJ, 2T2MTJ and the 3T sSL cells have an increased total current
after the switching has occurred. Therefore the faster cells and the overwritten
cells will consume even more energy than the worst case cells. For the 3TGG cell
with boosted write this is the opposite, resulting in the automatic "shutdown"
behavior that will reduce energy consumption after switching.
Figure 5.11: Total write currents over time of the different complementary cells.
5.2.3 Improving the read operation: using what is already there
A 3T2MTJ cell is well suited to alleviate the remaining issues of local transistor
variation and mismatch. The 3T sSL cell as in [9] only uses both access
transistors for reading. In this PhD research, the read operation has been
improved by using all three transistors for the read operation.
In the 3TGG cell, the third transistor will compensate for the mismatch in
the access transistors by opening another current path. The ground grid in
the 3TGG cell has the added advantage that it increases the sense margin
by lowering the series resistance of the cell, both directly and indirectly by
improved transistor biasing. The improvements will be explained in this section
and illustrated with simulations in section 5.3.2.
There are two effects at play, which improve the readability of the cell. By
comparing the voltage and current differences for the 2T 2MTJ (same for 3T sSL)
and 3TGG cell driven by an ideal current source and by an ideal voltage source
respectively, both effects can be distinguished. Please note that this explanation
uses a DC approximation and is meant to distinguish the different effects.
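Figure 5.12: Circuit of a) 2T 2MTJ and b) 3TGG cell with ideal current input.
[Circuit labels: read current IR into each branch; MTJ resistances RP and
RP(TMR+1); access transistor resistances R1 and R2; in b) additionally R3 and
Imismatch; output voltage differences Vdiff2T and Vdiff3T.]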
Fig. 5.12 shows the circuits with an ideal read current IR, which is used
for illustrating the reduction of transistor mismatch. Analyzing the DC
characteristics of these circuits and comparing the corresponding BL voltages,
gives Eq. (5.7) and Eq. (5.8) for 2T 2MTJ and 3TGG cells respectively.
$$V_{diff2T} = I_R R_P \, TMR + I_R (R_2 - R_1) \tag{5.7}$$
$$V_{diff3T} = I_R R_P \, TMR + I_R (R_2 - R_1) \frac{R_3}{R_1 + R_2 + R_3} \tag{5.8}$$
Both equations have a desired term and an unwanted term, which are
proportional to the TMR value and the access transistor mismatch (R2 −R1)
respectively. The 3TGG cell however has a decreased impact of the mismatch
portion, which ideally goes to zero when R3 goes to zero.
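The scaling of the mismatch term can be checked numerically. The following is a minimal sketch of Eq. (5.7) and Eq. (5.8); the read current and the resistor values are illustrative assumptions, not values taken from the simulations in this chapter.

```python
def vdiff_2t(i_r, r_p, tmr, r1, r2):
    """Eq. (5.7): BL voltage difference of the 2T2MTJ cell."""
    return i_r * r_p * tmr + i_r * (r2 - r1)


def vdiff_3t(i_r, r_p, tmr, r1, r2, r3):
    """Eq. (5.8): the mismatch term is scaled by R3 / (R1 + R2 + R3)."""
    return i_r * r_p * tmr + i_r * (r2 - r1) * r3 / (r1 + r2 + r3)


# Hypothetical operating point (assumed for illustration only).
I_R, R_P, TMR = 20e-6, 1.6e3, 1.0
R1, R2 = 2.6e3, 3.0e3  # deliberately mismatched access transistors

print(f"2T2MTJ:          {vdiff_2t(I_R, R_P, TMR, R1, R2) * 1e3:.2f} mV")
for r3 in (0.0, 2.8e3, 1e9):
    v = vdiff_3t(I_R, R_P, TMR, R1, R2, r3)
    print(f"3TGG, R3={r3:8.3g}: {v * 1e3:.2f} mV")
```

With R3 = 0 only the desired TMR term remains, while a very large R3 recovers the 2T2MTJ result.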
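Figure 5.13: Circuit of a) 2T 2MTJ and b) 3TGG cell with ideal voltage input.
[Circuit labels: read voltage VR applied to each branch; branch currents IP and
IAP; MTJ resistances RP and RP(TMR+1); access transistor resistances R1 and R2;
in b) additionally R3 and Imismatch.]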
Fig. 5.13 shows the circuits with an ideal reading voltage VR, which is used
for illustrating the increase in sense margin. Table 5.3 also summarizes the
operating voltages from Fig. 5.13. Again analyzing the DC characteristics of
these circuits and comparing the corresponding BL currents, gives Eq. (5.9) and
Eq. (5.10) for 2T 2MTJ and 3TGG cells respectively.
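$$I_{diff2T} = I_P - I_{AP} = \frac{V_R}{denom_{2T}} R_P \, TMR + \frac{V_R}{denom_{2T}} (R_2 - R_1) \tag{5.9}$$
$$I_{diff3T} = I_P - I_{AP} = \frac{V_R}{denom_{3T}} R_P \, TMR + \frac{V_R}{denom_{3T}} (R_2 - R_1) \frac{R_3}{R_1 + R_2 + R_3} \tag{5.10}$$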
Again, the 3TGG cell shows the same decreased impact of the mismatch portion
as before. Expanding the denominators further, gives Eq. (5.11) and Eq. (5.12).
$$denom_{2T} = R_1 R_2 + R_2 R_P + R_1 R_P (TMR + 1) + R_P^2 (TMR + 1) \tag{5.11}$$
$$denom_{3T} = R_1 R_2 \frac{R_3}{R_1 + R_2 + R_3} + R_2 R_P \frac{R_1 + R_3}{R_1 + R_2 + R_3} + R_1 R_P (TMR + 1) \frac{R_2 + R_3}{R_1 + R_2 + R_3} + R_P^2 (TMR + 1) \tag{5.12}$$
These equations are both built from four terms. In the denominator of the
3TGG cell, the first three terms are scaled with a factor which is always smaller
than or equal to 1. Therefore the denominator of the 3TGG cell is never larger than
that of the 2T 2MTJ cell. In other words, the sense margin for the 3TGG cell
is always increased as compared to the 2T 2MTJ (and 3T sSL) cell for a given
read voltage.
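As a cross-check of the closed-form expressions, the sketch below evaluates Eq. (5.10) to Eq. (5.12) and compares the result with an independent nodal solution of the circuit in Fig. 5.13 b). The read voltage of 0.1 V follows Table 5.6; the resistor values are assumptions chosen only for illustration.

```python
def denom_2t(r_p, tmr, r1, r2):
    """Eq. (5.11)."""
    return r1 * r2 + r2 * r_p + r1 * r_p * (tmr + 1) + r_p**2 * (tmr + 1)


def denom_3t(r_p, tmr, r1, r2, r3):
    """Eq. (5.12): the first three terms carry a factor <= 1."""
    s = r1 + r2 + r3
    return (r1 * r2 * r3 / s + r2 * r_p * (r1 + r3) / s
            + r1 * r_p * (tmr + 1) * (r2 + r3) / s + r_p**2 * (tmr + 1))


def idiff_3t(v_r, r_p, tmr, r1, r2, r3):
    """Eq. (5.10)."""
    mismatch = (r2 - r1) * r3 / (r1 + r2 + r3)
    return v_r / denom_3t(r_p, tmr, r1, r2, r3) * (r_p * tmr + mismatch)


def idiff_3t_nodal(v_r, r_p, tmr, r1, r2, r3):
    """Independent check: solve both internal node voltages (Cramer's rule)."""
    r_ap = r_p * (tmr + 1)
    a11, a12 = 1 / r_p + 1 / r1 + 1 / r3, -1 / r3
    a21, a22 = -1 / r3, 1 / r_ap + 1 / r2 + 1 / r3
    b1, b2 = v_r / r_p, v_r / r_ap
    det = a11 * a22 - a12 * a21
    v_a = (b1 * a22 - a12 * b2) / det
    v_b = (a11 * b2 - a21 * b1) / det
    return (v_r - v_a) / r_p - (v_r - v_b) / r_ap


# Hypothetical values (assumed): V_R from Table 5.6, resistances illustrative.
V_R, R_P, TMR, R1, R2, R3 = 0.1, 1.6e3, 1.0, 2.6e3, 3.0e3, 2.8e3
closed = idiff_3t(V_R, R_P, TMR, R1, R2, R3)
nodal = idiff_3t_nodal(V_R, R_P, TMR, R1, R2, R3)
print(f"closed form: {closed * 1e6:.3f} uA, nodal: {nodal * 1e6:.3f} uA")
print("denom3T / denom2T =", denom_3t(R_P, TMR, R1, R2, R3) / denom_2t(R_P, TMR, R1, R2))
```

Both routes should give the same current difference, and the ratio of the denominators stays below 1 for any finite R3.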
5.3 Comparison with state-of-the-art
In this section the improved cell design and operation will be compared with
the state-of-the-art complementary cells discussed in section 5.1 in terms of area,
performance and energy consumption. This comparison is done in iN10.
Exploratory simulations were done for TMR ranging from 50% to 200%,
resistance area product (RA) from 2 to 5 Ωµm² and MTJ diameters of
30, 40 and 50 nm. The results presented in this section use the MTJ simulation
parameters presented in Table 5.4.
Table 5.4: MTJ simulation parameters
Parameter   Value
Diameter    30 / 40 nm
RA          2 / 5 Ωµm²
TMR         100%
RP          1.6 / 2.8 / 4.0 / 7.1 kΩ
RAP         3.2 / 5.7 / 8.0 / 14.1 kΩ
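The resistance values in Table 5.4 follow from the RA product and the pillar area. The sketch below reproduces them, assuming a circular MTJ pillar and RAP = RP(TMR + 1) as used in Fig. 5.12; the function name is hypothetical.

```python
import math


def mtj_resistances(diameter_nm, ra_ohm_um2, tmr):
    """Return (RP, RAP) in ohm for a circular pillar (assumed geometry)."""
    area_um2 = math.pi * (diameter_nm * 1e-3 / 2) ** 2
    r_p = ra_ohm_um2 / area_um2
    return r_p, r_p * (1 + tmr)


for ra in (2.0, 5.0):
    for d in (40, 30):
        r_p, r_ap = mtj_resistances(d, ra, tmr=1.0)
        print(f"d = {d} nm, RA = {ra} ohm um^2: "
              f"RP = {r_p / 1e3:.1f} kOhm, RAP = {r_ap / 1e3:.1f} kOhm")
```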
5.3.1 Layouts and area and resistance comparison
These layouts have been made with the iN10 design rules as described in
chapter 3 in Table 3.1 and Table 3.2. Each of the cells has two active fins per
device, either by two parallel fins or by two parallel fingers.
The layout of the 1T 1MTJ cell is shown in chapter 3 in Fig. 3.16. Fig. 5.14 shows
the layout of the 2T 2MTJ cell, which is composed of two independent 1T 1MTJ
cells on the same WL. Fig. 5.15 shows the layout of the 3T sSL cell, which
is essentially the same layout as two adjacent dummy-poly cells as explained
in chapter 3 and shown in Fig. 3.19. The “dummy" poly gate is however not
grounded, but used as the WWL and both adjacent WL’s are used as the RWL.
Fig. 5.16 illustrates the 3D view of the 3TGG cell design. Six gate lines are
shown of which three are active and highlighted. Also two BL/Ground Grid
combinations are shown. Fig. 5.17 shows the layout of the improved 3TGG cell.
As explained in section 5.2.1, the use of a ground grid allows the orientation of
the local SL to change from parallel to the BL’s to parallel to the WL’s, enabling
an area reduction. The local SL’s are stitched together to form a grid at the
positions where the WL’s are stitched to the gates, causing minimal overhead for
the grid configuration.
Fig. 5.18 shows the minimal cell area of the different cells. The first group is
based on layouts which take into account the design rules and the second group
is the ideal pitch-based estimate. The 1T1MTJ cell, the 2T2MTJ cell and
the 3TGG cell cannot reach their optimal pitch-based area estimate due to
processing constraints, as discussed in chapter 3. The 3TGG cell is clearly the
smallest complementary cell with an area decrease of 22% as compared to the
baseline 2T2MTJ cell (25% with pitch-based estimate). It is only 55% larger than
the 1T1MTJ cell, but comes with clear advantages in terms of performance and
energy as shown in the next sections. All cells are large enough to accommodate
MTJ’s of 50 nm or smaller.
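The layout-based area reduction can be reproduced from the cell dimensions given in the captions of Fig. 5.14, Fig. 5.15 and Fig. 5.17. This is a small sketch of that arithmetic; the dictionary name is hypothetical.

```python
# Cell dimensions (height, width) in nm, taken from the figure captions.
cells_nm = {
    "2T2MTJ": (2 * 102, 128),  # Fig. 5.14: 2x 102 nm high, 128 nm wide
    "3T sSL": (144, 192),      # Fig. 5.15
    "3TGG": (106, 192),        # Fig. 5.17
}

area_nm2 = {name: h * w for name, (h, w) in cells_nm.items()}
reduction = 1 - area_nm2["3TGG"] / area_nm2["2T2MTJ"]

for name, a in area_nm2.items():
    print(f"{name}: {a} nm^2")
print(f"3TGG versus 2T2MTJ: {reduction * 100:.0f}% smaller")
```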
Table 5.5 summarizes the resistance values obtained from PEX of a 256x256
cell array. The transistor resistances are the same for all cells, since all cells
use 2 fins per device. It is important to note though, that the 3TGG cell uses all
three transistors for both operations, effectively lowering the parasitic resistance
caused by the access transistors. The SL or ground grid resistance is very
important for the operation of the cells as shown in chapter 4. This resistance is
far lower for the 3TGG cell thanks to the low resistive ground grid. The BL
resistance of the 3TGG cell is the highest, since this cell is the smallest of the
three. The 2T2MTJ cell benefits from being only 2PP wide, resulting in the lowest
BL resistance.
Figure 5.14: Layout of 2T 2MTJ cell (black box) in iN10 of 2x 102 nm high and
128 nm wide.
Table 5.5: Parasitic array resistances of the different complementary cells
Parameter        2T2MTJ   3T sSL   3TGG
RFET (kΩ)        2.8      2.8      2.8
RSL / RGG (Ω)    1283     2936     373
RBL (Ω)          325      912      1034
Figure 5.15: Layout of 3T sSL cell (black box) in iN10 of 144 nm high and
192 nm wide.
Table 5.6: Operating voltages of the low voltage read operation for the three
complementary cells.
Cell      VBL     VBL,bar   VSL/GG   VSL,bar   VRWL/WL   VWWL
2T2MTJ    0.1V    0.1V      0.0V     0.0V      0.7V      n/a
3T sSL    0.1V    0.1V      0.0V     n/a       0.7V      0.0V
3TGG      0.1V    0.1V      0.0V     n/a       0.7V      n/a
5.3.2 Sense margin comparison
Complementary cells have the inherent advantage of an increased sense margin
as explained in section 5.1.2. In this section, the added sense margin of the
mismatch tolerant read operation and the added benefit of the low
resistive ground grid are shown. Simulations are done with a fixed reading
voltage, since this illustrates both the increased sense margin and the decreased
effect of access transistor mismatch. Table 5.6 summarizes the operating voltages
for the low voltage read operation simulations for the 2T2MTJ, 3T sSL and 3TGG
cell. For modeling mismatch, a σVt of just under 20mV is used, which corresponds
to data from [15].
Fig. 5.19 and Fig. 5.20 show the read current difference as a function of RP of the
2T2MTJ cell, the 3T sSL cell and the 3TGG cell. The lines as in the legend
represent the mean transistor threshold voltages. The dotted lines of the same
color represent the 4.5σ values for worst and best case Vt shift. The arrows
show relative gains or losses of cells with the worst case mismatch, as compared
to the 2T2MTJ cell. These gains or losses are bigger for smaller RP, since the
effects of transistors and BEOL resistance are relatively bigger for smaller RP.
Figure 5.16: 3D view of the 3TGG cell.
Figure 5.17: Layout of 3TGG cell (black box) in iN10 of 106 nm high and
192 nm wide.
Figure 5.18: Minimum cell area for the different complementary cells.
Fig. 5.19 illustrates the effect of the mismatch tolerant three transistor read
operation with the dashed green arrows. The absolute values of the read current
difference are increased significantly by the three transistor read operation and
the spread with transistor mismatch is decreased. The 3T sSL cell shows a small
increase in read current difference as compared to the 2T2MTJ cell. This is
due to the lower ON-resistance of a two fin transistor as compared to a two
fingered transistor.
Fig. 5.20 illustrates the effect of the low resistive ground grid. There are three
cells that use a 2T read operation: the 2T2MTJ cell, the 3T sSL cell and the
3TGG cell when only using both access transistors. The BEOL resistance of
the 3T sSL cell is increased as compared to the 2T2MTJ cell and therefore has
a smaller read current difference (red arrows). By introducing the ground grid
in the 3TGG cell, the BEOL resistance drops significantly which gives a bigger
read current difference (magenta arrows). Simply introducing the ground grid
improves the sense margin by up to 12%. The green arrows show the combined
effect of the low resistive ground grid and the mismatch tolerant three transistor
read operation, which leads to up to 88% improvement.
Figure 5.19: Read current difference in function of RP for the three
complementary cells for the best case column with low BEOL impact.
The increased sense margin makes it possible to make the read operation more
robust to read disturb: a given sense margin that is needed for reliable operation
of the sensing circuitry can be attained with a lower reading current.
5.3.3 Read performance comparison
The read performance is mostly determined by the sensing circuitry and the
sense margin provided by the cell. The evaluation of the sensing circuitry is
outside the scope of this PhD research, but would be interesting future work.
The sense margin provided by the cell was discussed in the previous section 5.3.2.
The portion of the read performance determined by the cell is how fast it reaches
its desired sense margin.
Simulations are done by pre-charging the BL’s to 0.7V and then discharging
them while measuring the BL voltage difference. Table 5.7 summarizes the
operating voltages for the discharge based read operation simulations for the
2T2MTJ, 3T sSL and 3TGG cell.
Figure 5.20: Read current difference in function of RP for the three
complementary cells for the worst case column with high BEOL impact (256
columns).
Table 5.7: Operating and pre-charge voltages of the discharge based read
operation for the three complementary cells.
Cell      VBL     VBL,bar   VSL/GG   VSL,bar   VRWL/WL   VWWL
2T2MTJ    0.7V    0.7V      0.0V     0.0V      0.7V      n/a
3T sSL    0.7V    0.7V      0.0V     n/a       0.7V      0.0V
3TGG      0.7V    0.7V      0.0V     n/a       0.7V      n/a
Fig. 5.21 shows the delay to a BL voltage difference of 25mV as a function of the
number of cells on a BL for the 2T2MTJ cell, the 3T sSL cell and the 3TGG
cell (both with a two transistor read and a three transistor read). The delay
increases with BL capacitance, which scales linearly with the number of bits
per BL. It increases further with higher series resistance.
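The two trends named above (delay growing with BL capacitance and with series resistance) can be illustrated with a first-order model. The sketch below is an assumption-laden simplification, not the simulation setup of this chapter: each BL is treated as a single capacitor discharging through one lumped resistance, the capacitance per cell is a hypothetical value, and the SL and ground grid resistances of Table 5.5 are simply added in series.

```python
import math


def delay_to_dv(r_low, r_high, c_bl, v0=0.7, dv=0.025):
    """Time until two discharging BLs differ by dv (single-pole RC model)."""
    tau_l, tau_h = r_low * c_bl, r_high * c_bl

    def diff(t):
        return v0 * (math.exp(-t / tau_h) - math.exp(-t / tau_l))

    # The difference rises from 0 to a single peak and then decays again.
    t_peak = math.log(tau_h / tau_l) * tau_l * tau_h / (tau_h - tau_l)
    if diff(t_peak) < dv:
        raise ValueError("BL voltage difference never reaches dv")
    lo, hi = 0.0, t_peak
    for _ in range(60):  # bisection on the rising part of diff()
        mid = 0.5 * (lo + hi)
        if diff(mid) < dv:
            lo = mid
        else:
            hi = mid
    return hi


R_P, R_AP, R_FET = 1.6e3, 3.2e3, 2.8e3  # from Tables 5.4 and 5.5
C_PER_CELL = 0.1e-15                    # hypothetical BL capacitance per cell [F]


def cell_delay(n_cells, r_series):
    """Delay for n_cells on the BL with a lumped series resistance (assumed)."""
    c_bl = n_cells * C_PER_CELL
    return delay_to_dv(R_P + R_FET + r_series, R_AP + R_FET + r_series, c_bl)


for n in (64, 128, 256):
    print(n, f"{cell_delay(n, 2936.0) * 1e12:.1f} ps (R = 2936 ohm)",
          f"{cell_delay(n, 373.0) * 1e12:.1f} ps (R = 373 ohm)")
```

In this simplified model the delay is exactly proportional to the BL capacitance, and the lower series resistance gives the shorter delay.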
The magenta arrows show the performance gain of introducing the ground grid.
The 3T sSL cell and the 3TGG cell with a two transistor read differ only in
their series resistance thanks to the introduction of the low resistive ground
grid. The performance gains shown in this comparison are consistent with the
results shown in chapter 4. The green arrows show the performance gain of the
3TGG cell with respect to the 2T2MTJ cell. Note again that although the
gains seem very big, this delay is only a small portion of the total delay for
reading.
Figure 5.21: Delay to a BL voltage difference of 25mV for the different
complementary cells.
5.3.4 Write performance comparison
Write performance is mostly dominated by how much current the cell can
source, especially when performing a P2AP switch. As illustrated in Fig. 5.9,
the improved cell can source the highest current for a given voltage. What is
equally important, is the tolerance for variation. In an embedded cache, all
the cells will be operated synchronously and will be given the same duration of
voltage pulse. Therefore it is the performance of the slowest cell which determines
the speed of the memory. Especially MTJ area variation is important, since it
influences the switching in two ways. Variation on the area will cause variation
of the MTJ resistance and therefore change the amount of current the cell
can source. Variation on the area will also cause variation of the volume of
the free layer and therefore alter the energy barrier for switching. For MTJ
area variation, a σ/µ of 5% as in [13] is used. The write simulations use an
MTJ diameter of 40 nm, an RA of 2 Ωµm² and a TMR of 100%.
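The two-fold influence of MTJ area variation can be made concrete with a short sketch. It assumes that the resistance scales inversely with the pillar area and that the energy barrier scales linearly with the area (fixed free layer thickness); the nominal barrier value is a hypothetical number used only for illustration.

```python
import math

SIGMA_REL = 0.05     # sigma/mu of the MTJ area, as used in the text
D_NOMINAL_NM = 40.0  # targeted pillar diameter
R_P_NOMINAL = 1.6e3  # RP at 40 nm and RA = 2 ohm um^2 (Table 5.4)
DELTA_NOMINAL = 60.0 # hypothetical nominal energy barrier (in units of kT)


def scaled_mtj(area_factor):
    """Return (diameter, RP, barrier) for an area scaled by area_factor."""
    diameter = D_NOMINAL_NM * math.sqrt(area_factor)
    return diameter, R_P_NOMINAL / area_factor, DELTA_NOMINAL * area_factor


for k in (-3, 0, 3):
    d, r_p, delta = scaled_mtj(1 + k * SIGMA_REL)
    print(f"{k:+d} sigma: d = {d:.1f} nm, RP = {r_p:.0f} ohm, barrier = {delta:.1f}")
```

A larger pillar has a lower resistance but a higher barrier, which is why the larger MTJ’s become the more difficult ones to switch.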
Table 5.8: Operating voltages of the logic 1 write operation for the three
complementary cells.
Cell      VBL     VBL,bar   VSL/GG   VSL,bar   VRWL/WL   VWWL
2T2MTJ    0.7V    0.0V      0.0V     0.7V      0.7V      n/a
3T sSL    0.7V    0.0V      0.0V     n/a       0.0V      0.7V
3TGG      0.7V    0.0V      0.0V     n/a       0.7V      n/a
Table 5.8 summarizes the operating voltages for the logic 1 write operation
simulations for the 2T 2MTJ, 3T sSL and 3TGG cell.
Fig. 5.22 shows the write delays of the different cells with MTJ area variation.
The lines as in the legend represent the mean transistor threshold voltages. The
dotted lines of the same color represent the 4.5σ values for worst and best case
Vt shift. Both switches are simulated simultaneously and the write delay is
the maximum of both. For the 3TGG cell, it is often the AP2P switch which
becomes the slowest, since the cell is boosting the P2AP switch. The horizontal
axis shows the range of MTJ diameters when targeting a 40 nm pillar. The
crosses indicate the targeted design with 40 nm pillars and typical transistors.
The 3TGG cell has the lowest delay, but there is little difference in this mean
value as compared to the 1T 1MTJ and 2T2MTJ cell.
The difference is far bigger when taking into account the area variation. The
1T 1MTJ, 2T 2MTJ and the 3T sSL cell with serial write suffer a lot from MTJ
area variation as the diameter grows bigger and the MTJ’s become more difficult
to switch. The 3TGG cell on the other hand proves to be very robust against
MTJ area variation.
Finally, when also taking into account the transistor variation as illustrated by
the dotted lines, there is an even bigger difference in delay. Since the delay of the
array is determined by its slowest cell, the 3TGG cell with boosted write is
clearly the fastest cell design and can be as much as 2x faster.
5.3.5 Write energy consumption comparison
When comparing write energy, the complementary cells inherently have the
disadvantage that they need to switch two MTJ’s. This makes the 2T2MTJ
cell consume double the energy of the 1T1MTJ cell. Write energy is further
determined by both the write current and the delay. As seen from the previous
section 5.3.4, the 3TGG cell with boosted write is the fastest of the cells. It
also applies most current where it is needed most and “shuts down” the cell
once it is written, as seen in Fig. 5.11. All this contributes to a reduction of
energy consumption.
Figure 5.22: Write delay of the different cells with MTJ area variation.
Fig. 5.23 shows the write energy of the different cells. The mean values without
transistor variation are shown with the crosses. The dotted lines are the energies
for the worst case transistor variation, the lines as in the legend take into account
the same worst case delay for all cells. The horizontal axis shows the range
of MTJ diameters when targeting a 40 nm pillar. The crosses indicate the
targeted design with 40 nm pillars and typical transistors. These crosses show
the potential energy benefit for a serial write, which is more energy-efficient
than the 2T2MTJ despite being slower as shown in Fig. 5.22. The 3TGG
cell already shows some interesting energy reduction in the mean case. Again
despite only being a bit faster, it is inherently more energy-efficient due to the
reusing of current for the easier AP2P switch. In the mean case, the 1T 1MTJ
cell is still the most energy-efficient.
The dotted lines representing the worst case transistor variation for the range
of MTJ diameters show again the tolerance against variation of the 3TGG cell.
There is now even an energy reduction gain for larger sizes as compared to the
1T 1MTJ cell, but there still is an overhead for smaller sizes.
Figure 5.23: Write energy of the different cells with MTJ area variation.
Finally the lines as in the legend show the energy consumption when taking
into account the synchronous operation of the memory. Since all cells would be
given the same duration of writing pulse, the faster cells would still consume
needless energy after being switched. So the difference between the lines as
in the legend and the dotted lines, which is about half the energy consumption of
the 1T1MTJ cell, is actually useless energy spent on faster cells. This makes
it very clear how important the variation tolerance of the 3TGG cell with
boosted write and automatic “shutdown” behavior is. Everything combined, the
3TGG cell is by far the most energy-efficient complementary cell. It can even
give up to 33% in energy reduction as compared to a 1T1MTJ cell, despite
writing two MTJ’s instead of one.
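The effect of a fixed pulse length on the energy of fast cells can be sketched with a piecewise-constant current model. This is only an illustration under stated assumptions: the current is taken as constant before and after switching, and all current and timing values are hypothetical; only the 0.7 V write voltage follows Table 5.8.

```python
def write_energy(v_write, i_before, i_after, t_switch, t_pulse):
    """Energy drawn during a fixed-length write pulse.

    The cell draws i_before until it switches at t_switch and i_after
    for the remainder of the pulse (piecewise-constant assumption).
    """
    if t_switch > t_pulse:
        raise ValueError("cell does not switch within the pulse")
    return v_write * (i_before * t_switch + i_after * (t_pulse - t_switch))


V_WRITE = 0.7    # write voltage, Table 5.8
T_PULSE = 3e-9   # hypothetical pulse length set by the slowest cell
T_FAST = 1e-9    # hypothetical switching time of a fast cell

# Hypothetical fast cell whose current rises after switching ...
e_rising = write_energy(V_WRITE, 100e-6, 150e-6, T_FAST, T_PULSE)
# ... versus one whose current drops after switching ("shutdown").
e_shutdown = write_energy(V_WRITE, 100e-6, 50e-6, T_FAST, T_PULSE)
print(f"current rises after switching: {e_rising * 1e15:.0f} fJ")
print(f"current drops after switching: {e_shutdown * 1e15:.0f} fJ")
```

The longer the pulse is compared to the switching time of a cell, the more the current after switching dominates its energy.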
5.4 Conclusion
In this chapter, it has been shown that the 3TGG cell with ground grid,
mismatch tolerant read operation and boosted write operation is the new
state-of-the-art STT-MRAM cell for high performance embedded caches. By
introducing the ground grid, this novel design is the smallest of all complementary
cells, reducing the area by 22% as compared to the 2T2MTJ cell. With the low
resistive ground grid and the mismatch tolerant read operation, the effect of
transistor variation and mismatch is reduced and the sense margin is improved by
up to 88% as compared to the 2T2MTJ cell. By using the boosted write operation
with high variation tolerance and automatic “shutdown” behavior, the 3TGG cell
has become the fastest and most energy-efficient cell, improving on the 2T2MTJ
cell by up to 2x in write speed and 3x in write energy consumption.
Chapter 6
Write performance under time-dependent variability
As for any technology and especially memory technologies, it is important to
assess STT-MRAM cells under variability. The write performance of this
non-volatile memory is of great concern when targeting embedded cache memories.
In this PhD research, the write performance under time-dependent variability
is investigated. The results presented in this chapter show the importance of
targeting the optimal MTJ diameter and the preference for using PMOS access
transistors in finFET nodes.
Section 6.1 introduces the important questions about optimal size and access
transistors that need to be answered when considering the write performance.
Section 6.2 discusses the main sources of variability in the cell for the write
performance. Section 6.3 gives the answers to the questions posed in section 6.1.
6.1 The relevant questions
This research focuses on advanced finFET nodes, which are quite different from
planar technology. The relevant questions to ask are therefore not the traditional
ones.
6.1.1 What is the optimal MTJ diameter for write performance?
Transistor sizing was used in planar technology to optimize the STT-MRAM
cell [11]. In finFET technologies, transistor sizing is however limited to an
integer number of fins and/or fingers. Therefore it is investigated in this PhD
research what the optimal MTJ diameter target is under variability for a given
access transistor sizing.
Importantly, the diameter of the MTJ is determined upfront, because it will
influence the target values of the thin film properties. Most importantly, it
influences the thermal stability factor ∆, which is dependent on the volume
of the free layer. The ∆ needs to be high enough for all MTJ’s within the
distribution in order to avoid random switching of the MTJ with data loss as a
consequence.
6.1.2 Should PMOS or NMOS access transistors be used?
This used to be a trivial question in the past. In planar technology, NMOS
access transistors were always preferred for STT-MRAM cells, because of their
superior drive current, time-zero and time-dependent variability as compared
to PMOS transistors. This drive current is especially important for writing
STT-MRAM cells, since this is a current based technique.
There is however a problem when combining this NMOS access transistor with
an MTJ. Bottom-pinned MTJ’s are preferred in order to process the complex
thin film stack of the pinning layers first. As explained in section 2.2.1, the
write asymmetry exhibited by MTJ’s however favors the combination of NMOS
access transistors with top-pinned MTJ’s or PMOS access transistors with
bottom-pinned MTJ’s.
With the introduction of finFET devices in scaled technologies, the drive
current of PMOS transistors becomes better than that of NMOS transistors
[31]. Variability however still favors NMOS transistors. So the question that
should get answered, is whether the improved drive of the PMOS transistors
can overcome the higher variability.
The focus of the industry has been to create top-pinned MTJ’s which reach the
performance of bottom-pinned MTJ’s. In this PhD research, it is investigated
whether or not to change the access transistor from NMOS to PMOS instead
of changing the MTJ from bottom-pinned to top-pinned.
6.2 Main sources of variability
To answer the questions above, the main sources of variability in STT-MRAM
cells were first analyzed in this PhD research.
6.2.1 MTJ variability
The MTJ itself is of course the first source of variability. The two sources of
MTJ variability come from the thin film layer deposition and the MTJ pillar
formation.
Thin film deposition
The thin film layer deposition is a well-controlled process down to the atomic
level. It uses Atomic Layer Deposition (ALD) and annealing steps to create a
stack of uniform layers. Especially the formation of the MgO layer is important
for the MTJ properties.
The thin film properties are important for the write performance, but show very
good uniformity. As such, they have limited impact on overall variability.
Moreover, for addressing the questions asked above, the inclusion of the
variability on the thin film properties adds little insight and is therefore omitted.
MTJ pillar formation
The MTJ pillar formation by a lithographic printing and etch process has the
biggest impact on the overall variability [17]. Both the printing and the etching
give rise to variation in the Critical Dimension (CD) of the MTJ.
The CD of the MTJ pillar is very important for the write performance. It will
have an impact on the overall cell resistance and thus on the current through
the cell. When the MTJ is put in series with an access transistor, it will also
impact the current density through the MTJ. The CD will also have a large
impact on the thermal stability of the MTJ.
For simulations at iN10, CD standard deviations ranging from 0 to 1.5 nm are
used. As shown in [13], there are printing techniques available to get within this
range, such as the crossing of two spacer defined lines which promises standard
deviations as low as 0.5 nm. The etching step brings further variation on top of
the printing step.
Possible etching problems, such as side-wall redeposition of metals and damaged
outer zone of the MTJ pillar, are not taken into account for the simulations.
Side-wall redeposition can be prevented by appropriate encapsulation and
damage zones should be minimized. Both have the effect on the circuit level of
a resistance in parallel with the MTJ. For both effects this resistance should be
high enough to have limited impact on the overall cell.
6.2.2 FET variability
The variability of the access transistor is the other major contribution to the
overall cell variability. This is especially the case when targeting small cell
sizes, since the minimum area of the standard dual-bitline cell in iN10 can be
achieved with an access transistor composed of two fingers of a single fin, as
shown in chapter 3 in Table 3.5. The variability is modeled with both time-zero
and time-dependent threshold voltage shift.
Time-zero variability
For including time-zero variability in the simulation, a normal distribution
model based on Pelgrom’s mismatch parameter A∆VTH is used, where the mean is
zero and the standard deviation is expressed in Eq. 6.1:
$$\sigma_{V_{TH}} = \frac{A_{\Delta V_{TH}}}{\sqrt{2 W_{eff} L_G}}, \tag{6.1}$$
where LG is the gate length of 24 nm and Weff is the effective transistor width
as expressed in Eq. 6.2:
$$W_{eff} = (2 H_{FIN} + T_{FIN}) N_F N_{FIN}, \tag{6.2}$$
where HFIN and TFIN are the height (30 nm) and thickness (7 nm) of a fin and
NF and NFIN are the number of fingers and fins.
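As a worked example of Eq. 6.1 and Eq. 6.2, the sketch below evaluates the time-zero standard deviation for an access transistor of two fingers of a single fin, using the A∆VTH values of Table 6.1. Treating A∆VTH in mVµm together with lengths in µm, so that the result is in mV, is an assumption about the unit convention.

```python
import math

H_FIN_NM, T_FIN_NM, L_G_NM = 30.0, 7.0, 24.0  # values given in the text


def w_eff_um(n_fingers, n_fins):
    """Eq. 6.2, returned in micrometres."""
    return (2 * H_FIN_NM + T_FIN_NM) * n_fingers * n_fins * 1e-3


def sigma_vth_mv(a_dvth_mv_um, n_fingers, n_fins):
    """Eq. 6.1 with A in mV*um and lengths in um, giving sigma in mV."""
    return a_dvth_mv_um / math.sqrt(2 * w_eff_um(n_fingers, n_fins) * L_G_NM * 1e-3)


# A values from Table 6.1; two fingers of a single fin as in the text.
for device, a in (("PMOS", 1.47), ("NMOS", 1.19)):
    print(f"{device}: Weff = {w_eff_um(2, 1) * 1e3:.0f} nm, "
          f"sigma_VTH = {sigma_vth_mv(a, 2, 1):.1f} mV")
```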
Time-dependent variability
For including time-dependent variability in the simulation, an empirical
model is used which fits the Bias-Temperature Instability (BTI) induced mean
∆VTH by a simple power law as expressed in Eq. 6.3:
$$\Delta V_{TH} = A E_{OX}^{\gamma} t^{n}, \tag{6.3}$$
where t is the stress time, A, γ and n are fitting parameters and EOX equals
the electric field over the gate oxide.
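A useful property of the power law in Eq. 6.3 is that the growth of the shift with stress time depends only on the exponent n. The sketch below illustrates this with the n values of Table 6.1. The units of EOX and t are not specified here, so the field and time values passed to the function are placeholders in arbitrary units.

```python
def bti_shift(a, e_ox, gamma, n, t):
    """Eq. 6.3: mean BTI-induced threshold voltage shift."""
    return a * e_ox**gamma * t**n


def shift_ratio(n, t1, t2):
    """Ratio of the shifts at stress times t2 and t1 (A and E_OX cancel)."""
    return (t2 / t1) ** n


# Exponents n from Table 6.1.
for device, n in (("PMOS", 0.1548), ("NMOS", 0.1551)):
    print(f"{device}: 10x longer stress -> shift x {shift_ratio(n, 1.0, 10.0):.3f}")

# Consistency check with placeholder field and time values (arbitrary units).
a, gamma, n, e_ox = 0.0269, 2.9, 0.1548, 5.0
ratio = bti_shift(a, e_ox, gamma, n, 10.0) / bti_shift(a, e_ox, gamma, n, 1.0)
print(f"ratio from Eq. 6.3: {ratio:.3f}")
```

Because of the small exponent, a tenfold increase in stress time raises the mean shift by less than half.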
The values of these parameters are presented in Table 6.1 and are extracted
from previous work done in-house [27] and references on stable technologies
[5][20][31][30]. These are scaled to iN10 according to physical sizes.
Fig. 6.1 shows the threshold voltage shifts attained with these parameters. It
shows the shift for both PMOS and NMOS access transistors at time zero and
after 3 years of BTI degradation assuming 0.1% duty cycle for both Standard
and Low Threshold Voltage (SVT and LVT) devices at nominal supply voltage
of 0.7V. The polarity for PMOS and NMOS is inverted for comparison.
The PMOS transistors show bigger shifts and spreads than the NMOS transistors.
The LVT devices show bigger shifts and spreads due to BTI degradation caused
by a higher overdrive voltage. The spread on the LVT devices is still small
enough to ensure that the non-selected cells remain in the sub-threshold regime
and don’t cause too much array leakage during operation. Combining this with
the non-volatility of the MTJ, which enables powering down the unused sections
of the memory to avoid leakage energy consumption, the memory design can
be done with LVT devices. The remaining simulations are performed with LVT
devices.
Table 6.1: iN10 technology and time-dependent variability parameters
Device type      PMOS     NMOS
A∆VTH [mVµm]     1.47     1.19
η [mV]           3.274    0.655
A                0.0269   0.0459
γ                2.9      4.745
n                0.1548   0.1551
LVT [V]          0.148    0.153
SVT [V]          0.269    0.277
Figure 6.1: Threshold voltage shift for both PMOS and NMOS access
transistors.
[Plot: probit of the CDF (-6 to 6) versus ∆VTH (-0.15 V to 0.15 V); curves:
PMOS - t0, PMOS - t3y - LVT, PMOS - t3y - SVT, NMOS - t0, NMOS - t3y - LVT,
NMOS - t3y - SVT.]
6.3 The answers
To find the answers to both questions relating to write performance, a
simulation-based assessment is performed with the framework discussed in
section 2.3 with the following parameters:
• CD range of 20 to 50 nm
• TMR of 150%
• RA of 3 Ωµm²
• ∆ of 50 for the 6σ CD point.
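The last parameter in the list above ties the thermal stability target to the CD distribution. The sketch below shows how the required mean ∆ could grow with CD variability, under two assumptions that are not stated in the list itself: ∆ scales with the pillar area (fixed free layer thickness), and the 6σ CD point is taken as the smallest pillar in the distribution. The function name is hypothetical.

```python
def mean_delta_required(cd_target_nm, sigma_cd_nm, delta_min=50.0, n_sigma=6.0):
    """Mean thermal stability needed so the smallest pillar keeps delta_min.

    Assumes delta is proportional to CD squared (area scaling).
    """
    cd_small = cd_target_nm - n_sigma * sigma_cd_nm
    if cd_small <= 0:
        raise ValueError("n_sigma point lies at or below zero CD")
    return delta_min * (cd_target_nm / cd_small) ** 2


for sigma_cd in (0.0, 0.5, 1.0, 1.5):
    print(f"sigma_CD = {sigma_cd} nm -> mean delta = "
          f"{mean_delta_required(30.0, sigma_cd):.1f}")
```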
For simulating the variability, the sampling method as introduced in [28] is used
in order to reach results at high sigma values.
For a fair comparison of NMOS versus PMOS access transistors, it is important
to first find the answer to the question on optimal CD for write performance.
Fig. 6.2 shows the write delay of two target CD’s under time-zero and
time-dependent variability. The target CD with the lowest write delay for the
time-zero mean case is 29 nm. The target CD of 27 nm however shows a lower
write delay at the 6σ crossing both at time-zero and after 3 years of aging. A
crossover of the write delays can be seen around the 4.5σ point at time-zero and
already around the 3σ point after 3 years of aging. This shows that targeting
the MTJ CD for the mean case at time-zero will result in suboptimal designs.
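To put the σ points mentioned above in perspective, the sketch below converts a position on the probit axis into the fraction of cells that lies beyond it. It assumes a standard normal distribution, which is what a probit axis implies.

```python
import math


def upper_tail(k_sigma):
    """Fraction of a normal distribution above mean + k_sigma * sigma."""
    return 0.5 * math.erfc(k_sigma / math.sqrt(2))


for k in (0.0, 3.0, 4.5, 6.0):
    print(f"{k:3.1f} sigma -> fraction of cells beyond: {upper_tail(k):.3e}")
```

The 6σ crossing therefore corresponds to roughly one cell in a billion, which is why the mean case is a poor guide for array-level write delay.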
Figure 6.2: Write delay under variability.
[Plot: probit of the CDF (-6 to 6) versus write delay (0 to 3 ns); curves:
27nm - t0, 27nm - t3y, 29nm - t0, 29nm - t3y.]
Fig. 6.3 shows the write delay under variability when sweeping the target CD.
The lines in this figure are composed of the crossing points with the mean
and 6σ lines from Fig. 6.2 for many target CD’s. Again the figure shows that
the optimal CD to target is not the same when considering variability. If
the time-zero and time-dependent variability is not taken into account when
targeting the MTJ CD, this will lead to a suboptimal design.
The results of this research have shown that there is an optimal target CD
which changes when taking into account time-zero and/or time-dependent
variability. It is also important to consider a possible difference between NMOS
and PMOS to make a fair comparison between both. Also the amount of MTJ
CD variability is important, especially when the effects of etch damage are not
yet fully characterized.
Fig. 6.4 shows the optimal target CD versus the CD control for both PMOS
and NMOS access transistors with and without variability. The black lines of
the mean cases show an increase in optimal target CD, because the ∆ increases
with increasing CD variation in order to maintain data retention robustness.
Figure 6.3: Write delay versus size.
[Plot: write delay (0 to 3 ns) versus target CD (23 to 32 nm); curves:
t0 - mean, t0 - +6σ, t3y - +6σ.]
Figure 6.4: Optimal MTJ CD target for given MTJ CD control.
[Plot: optimal MTJ µCD (20 to 50 nm) versus MTJ σCD (0 to 1.5 nm); curves:
NMOS - t0 - 0σ, NMOS - t0 - 6σ, NMOS - t3y - 6σ, PMOS - t0 - 0σ,
PMOS - t0 - 6σ, PMOS - t3y - 6σ.]
The blue and red lines of the 6σ crossings for time-zero and after 3 years of
aging all show a similar trend. When the MTJ CD variability is large, larger
target CD’s are preferable. This is expected, because the relative impact of
the variation is smaller for larger target CD’s. When the MTJ CD variability
is below a certain threshold however, the optimal target CD under variability
will be smaller than that in the mean case. At small MTJ CD variability it is
the transistor variability that is the dominant factor. The current through the
cell will then mostly be determined by the saturation current of the access
transistor, so smaller MTJ’s will have a higher current density.
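The last point can be quantified with a short sketch: if the cell current is fixed by the access transistor, the current density scales inversely with the pillar area. A circular pillar is assumed, and the cell current is a hypothetical value; only the ratio between the two target CD’s is independent of that choice.

```python
import math


def current_density_ma_cm2(i_cell_a, cd_nm):
    """Current density in MA/cm^2 for a circular pillar (assumed geometry)."""
    area_cm2 = math.pi * (cd_nm * 1e-7 / 2) ** 2
    return i_cell_a / area_cm2 / 1e6


I_SAT = 50e-6  # hypothetical transistor-limited cell current [A]

j27 = current_density_ma_cm2(I_SAT, 27.0)
j29 = current_density_ma_cm2(I_SAT, 29.0)
print(f"27 nm: {j27:.2f} MA/cm^2, 29 nm: {j29:.2f} MA/cm^2, ratio {j27 / j29:.3f}")
```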
Fig. 6.5 shows the relative delay between the optimal target CD considering
variability and the optimal target CD of the mean under variability. This
concept is also illustrated with the horizontal and vertical lines in Fig. 6.3. It
shows how much loss in write performance can be expected when not taking
into account the variability when targeting the MTJ CD. Especially when the
MTJ CD variability exceeds the threshold this loss can be severe.
Figure 6.5: Relative delay for optimal MTJ CD target at t0 mean as compared
to the optimal delay for given MTJ CD control.
[Plot: relative delay (100 to 125%) versus MTJ σCD (0 to 1.5 nm); curves:
NMOS - t0 - 6σ, NMOS - t3y - 6σ, PMOS - t0 - 6σ, PMOS - t3y - 6σ.]
It is also important to note that the loss in performance is worse for PMOS than
it is for NMOS transistors. This is due to the larger variability of the PMOS
transistors. It is therefore especially important to take all this into account when
comparing PMOS and NMOS access transistors.
6.3.2 PMOS access transistors should be preferred
Fig. 6.6 shows the write delays for the optimal MTJ target CD comparing PMOS
bottom-pinned and NMOS top-pinned combinations. As expected from the
higher drive current of the PMOS transistor in finFET technologies, the PMOS
cells show lower write delay in the black lines of the mean case. The increased
delay of the mean case with higher MTJ CD variability is again the effect of
the ∆ targeting, which needs to be higher with higher CD variability.
[Figure 6.6 plot. x-axis: MTJ σCD [nm], 0 to 1.5; y-axis: delay for optimal MTJ µCD [ns], 0 to 10. Curves: NMOS and PMOS, each at t0 0σ, t0 6σ and t3y 6σ.]
Figure 6.6: Delay for optimal MTJ CD target for given MTJ CD control.
The PMOS also shows better results for all cases with both the predicted
time-zero and time-dependent variability. Note that for this comparison, the
properties of the bottom-pinned MTJ have not changed as compared to the top-
pinned MTJ, where in practice the bottom-pinned MTJ has better properties.
6.4 Conclusion
In this chapter, the importance of targeting the MTJ diameter under consideration
of variability has been shown. Specifically, MTJ CD variability and access
transistor time-zero and time-dependent variability have been shown in this
PhD research to have a big impact on the optimal target CD for
write performance. The results in this chapter have shown that only considering
the mean values will result in suboptimal or even poor performance. With the
currently predicted time-dependent variability of PMOS transistors in scaled
finFET nodes, this research has shown that the combination of PMOS access
transistors with bottom-pinned MTJ’s is preferred for write performance over
the traditional use of NMOS transistors.

Chapter 7
Conclusion
While reaching the main goal to design STT-MRAM cells for embedded memories
in and beyond 10 nm technologies, many important aspects for the future of
STT-MRAM and scaling in general surfaced. STT-MRAM has great promise to
replace SRAM as a high density embedded memory but, like any technology, it has
limits that will be very difficult to overcome. Semiconductor scaling in general
is moving beyond simple pitch based scaling by introducing scaling boosters
and co-integration techniques for heterogeneous scaling. The introduction of
novel, specifically resistance based, devices changes the importance of parasitic
resistance effects. Variability is increasingly important, even when the devices
are not at the edge of the technology. Combined, these trends create a
need to change the way we develop new technologies and design new circuits
and systems.
7.1 Beyond pitch based scaling
As shown in chapter 3, the scaling of STT-MRAM cells is no longer only driven
by scaling the patterning pitches. Ever more secondary constraints are surfacing
and are limiting the potential to scale down the cell. The solution was to
introduce so-called scaling boosters, such as the multi-level via, that tackle the
specific challenges in scaling down the circuit as a whole, not just the basic
devices.
Scaling boosters are processing techniques that boost the miniaturization of
circuits without scaling down the pitches of the lithographic patterning. They
typically resolve issues related to connectivity in order to increase the density of
the circuits. The example mentioned before of the multi-level via illustrates this
perfectly. It does not require scaling down the pitches of the via, but it enables
a denser connectivity by bypassing a BEOL layer and removing an unwanted
metal strip.
The same trend is happening in logic technology scaling. The scaling of
the circuits by scaling the patterning pitches is slowing down as it becomes
increasingly hard to connect to the dense patterns of devices. Moreover the
limits of making these devices smaller, while improving performance and energy,
are being reached. This is why many scaling boosters are also being investigated
in logic in order to further scale down the circuits. An example in logic is the
self-aligned gate contact. These contacts do not have denser patterns, but
increase connectivity. Due to the self-alignment, they can be placed anywhere
on the gate, even between the source/drain connections. This allows for extra
flexibility when contacting logic standard cells, which in turn improves the
density.
This new scaling approach comes with two main dangers that we should be
wary of. First of all, these new processing techniques come at a high cost.
Extra processing steps, especially new techniques, are expensive to implement
and to bring to yield in mass production. Providing sufficient gains for the overall chip
has become ever more important for these techniques. As shown in chapter 3,
scaling boosters often don’t improve every circuit or don’t improve them to the
same extent. In order to ensure the value return, it is of the utmost importance
to validate these scaling boosters for all parts of the chip, for logic as well as
for all memories. Secondly, these techniques should provide sustainable scaling.
If a new technique only provides a one-time improvement, it will be very
costly compared to the value return. It therefore becomes ever more important
to keep a close connection between design and technology development. In this
way, the crucial issues for continued scaling can be identified and solved to
improve the next generations of technology.
Recently, interest in heterogeneous scaling and co-integration of logic and
memory has picked up. Both parts of the processor chip have such
different requirements that it becomes increasingly hard to scale them together.
Therefore, investigations have been ongoing to co-integrate finFET’s and vertical
transistors in a single process. In this research, it is important to also take
into account STT-MRAM, as it would greatly benefit from this heterogeneous
integration. The importance of scaling boosters however remains. An obvious
example for STT-MRAM is the need for buried interconnect when introducing
vertical transistors. Despite the smaller footprint of a vertical transistor, the
access to the bottom terminal reduces the area gain at the cell level. When
enabling an interconnect layer underneath the vertical transistors, this layer
can be used to route the SL and greatly reduce the footprint of the cells.
Going beyond pitch based scaling will require innovative scaling boosters that bring
sufficient gains for the overall chip. These gains will need to exceed the cost
of integrating and yielding these new techniques, while providing sustainable
scaling.
7.2 The importance of resistance
Throughout this PhD research, parasitic resistance has been a main enemy. The
two cell designs in chapter 4 and chapter 5 have shown the need for
resistance-aware design in advanced technology nodes. In STT-MRAM cells, two major
contributions cause the problem with resistance. First, the resistivity increase of the
advanced interconnect is causing an increase in parasitic resistance even though
interconnect length is getting smaller due to scaling. Second, due to
the resistive nature of the technology, STT-MRAM is hit extra hard by this
resistance increase, as it degrades crucial aspects such as sense margin and write
current.
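Why resistance rises although the wires get shorter follows from the basic relation R = ρ·L/(W·H). When length, width and height all shrink by the same factor s, the resistance already grows by 1/s at constant resistivity, and any resistivity increase multiplies on top of that. The sketch below works this out as an illustration only: the wire dimensions, the scaling factor and the 1.5x resistivity increase are assumptions chosen for the example, not values from this research.

```python
def wire_resistance(rho_ohm_m, length_m, width_m, height_m):
    """Resistance of a rectangular wire: R = rho * L / (W * H)."""
    return rho_ohm_m * length_m / (width_m * height_m)


def resistance_ratio_after_scaling(s, rho_ratio=1.0):
    """R_new / R_old when L, W and H all shrink by the factor s (0 < s <= 1)
    and the effective resistivity grows by rho_ratio."""
    return rho_ratio / s


# Illustrative numbers only (assumptions, not thesis data).
RHO_CU_BULK = 1.68e-8   # ohm*m, bulk copper at room temperature
S = 0.7                 # assumed linear scaling factor per node
RHO_RATIO = 1.5         # assumed resistivity increase in the narrower wire

old = wire_resistance(RHO_CU_BULK, 10e-6, 40e-9, 80e-9)
new = wire_resistance(RHO_RATIO * RHO_CU_BULK, S * 10e-6, S * 40e-9, S * 80e-9)

print(f"old wire: {old:.1f} ohm, scaled wire: {new:.1f} ohm")
print(f"ratio from dimensions: {new / old:.2f}")
print(f"ratio from closed form: {resistance_ratio_after_scaling(S, RHO_RATIO):.2f}")
```

In a resistive memory this extra series resistance adds directly to the cell resistance in the read and write paths.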
The resistivity increase of the advanced interconnect is a problem for all memory
arrays. As they have long interconnect patterns at their densest layers, the
resistance increase manifests itself here first. In logic interconnect, this is less
of an issue as these densest layers are mainly used for local interconnect, which
limits their overall resistance increase. One exception to this in logic is the
power delivery network. Even at the densest levels, this network is composed
of long lines that require low resistance in order to prevent IR-drops that will
degrade circuit performance. Resistance-aware design has therefore become important
for all parts of the chip.
Many other novel devices have a resistive nature like MTJ’s. All of these devices
will be influenced heavily by the increase in parasitic resistance. Properly taking
into account the parasitic resistance is important when evaluating these new
devices in bigger circuits and architectures, especially since scalability is
very important to justify the cost of integrating a new device.
On the flip side of the line resistance problem lies the opportunity for novel
designs that tackle this issue. The novel cell designs in this PhD research were
greatly inspired by the resistance problem and would likely not have been conceived
without it. Taking to heart the increased importance of parasitic resistance will
no doubt lead to many more novel designs.
7.3 Beware of variability
Chapter 5 and chapter 6 have shown the importance of variability. Although
the dimensions of the MTJ in embedded memories are not at the edge of
technology, the variation on especially its diameter greatly influences the
operation of the cells. Making the cell robust against this variation has been shown
to greatly improve its overall performance.
The logic transistor, which is at the edge of scaling, also suffers a lot from
variability. It is however important to also realize that some things that were
true for planar technology don’t necessarily hold for advanced finFET nodes.
As shown in chapter 6, the increased performance of PMOS transistors due to
technology innovation can overcome the higher variability they bring.
These types of insights show that it is important to take variability into account
from the very first steps of semiconductor design. When the basic block, such
as the memory cell, is already more robust against variation, large performance
gains can be made. For logic design, this will prove to be more complex as
there is no single basic block as in a memory. Variation-aware standard cell
design and PMOS-NMOS balancing at variability corners could be interesting
concepts to investigate.
7.4 The future of STT-MRAM
STT-MRAM is most likely to be introduced as a high density memory, since the
area advantage over SRAM is the clearest. It has been shown already that it can
reach the required performance and it greatly reduces the area of these memories.
Most gains from replacing SRAM can be made with these high density memories.
They are the ones dominating chip area and more importantly the ones pushing
the scaling limits. Replacing them should alleviate scaling efforts with respect
to the SRAM specific constraints.
A possible future extension of STT-MRAM technology is to double down on
the high density advantage. The 3D integration with BEOL transistors gives
the opportunity to stack multiple layers of memory cells on top of the FEOL
layer. This would also further alleviate FEOL transistor scaling with respect to
STT-MRAM access transistor requirements.
Replacing the high performance SRAM memories is a different story altogether.
Because there are many small capacity level-1 caches on a modern multi-core
chip, the high performance memories have relaxed area constraints for the cells,
making the peripheral circuits more important. Moreover, the speed of the
fastest level-1 caches in high performance systems is currently still beyond the
reach of STT-MRAM and will require advances at the device level.
A potential solution could be the integration of the three-terminal Spin Orbit
Torque Magnetic Random Access Memory, or SOT-MRAM. This new technology
promises higher performance and could be co-integrated with STT-MRAM.
It can be optimized separately for both reading and writing thanks to its three
terminals and separate read and write mechanisms. The reading of this cell
however remains resistive and this will require extra innovation at the peripheral
circuit level to fully get it up to speed.
Finally, there is a growing trend of System-Technology Co-Optimization. This
is the next step beyond DTCO and will also take into account the specifics of
systems, algorithms and applications to optimize technology. Herein lies great
potential to fully utilize the non-volatility of STT-MRAM. In embedded cache
memories, the nature of the stored data is most often volatile, i.e. temporary
results or data being processed. This requires data to be written quite often,
which is fundamentally more difficult and energy consuming in a non-volatile
memory technology. An application such as machine learning inference has a
large amount of fixed filter data that needs to be used often but rarely changes,
making it an ideal candidate to fully utilize an embedded non-volatile memory
like STT-MRAM. System-Technology Co-Optimization brings with it many
new and interesting research opportunities where inclusive design will become
ever more important.

Bibliography
[1] International technology roadmap for semiconductors 2013, back end of
line topic, interconnect tables.
[2] Dong, X., Xu, C., Xie, Y., and Jouppi, N. P. Nvsim: A circuit-level
performance, energy, and area model for emerging nonvolatile memory.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and
Systems 31, 7 (July 2012), 994–1007.
[3] Gupta, S., Park, S. P., Mojumder, N., and Roy, K. Layout-
aware optimization of stt mrams. In Design, Automation Test in Europe
Conference Exhibition (DATE), 2012 (March 2012), pp. 1455–1458.
[4] Ikeda, S., Sato, H., Yamanouchi, M., Gan, H., Miura, K.,
Mizunuma, K., Kanai, S., Fukami, S., Matsukura, F., Kasai,
N., and Ohno, H. Recent progress of perpendicular anisotropy magnetic
tunnel junctions for nonvolatile vlsi. SPIN 02, 03 (2012), 1240003.
[5] Kaczer, B., Chen, C., Weckx, P., Roussel, P. J., Toledano-Luque,
M., Franco, J., Cho, M., Watt, J., Chanda, K., Groeseneken,
G., and Grasser, T. Maximizing reliable performance of advanced cmos
circuits - a case study. In 2014 IEEE International Reliability Physics
Symposium (June 2014), pp. 2D.4.1–2D.4.6.
[6] Kang, S. Embedded stt-mram for energy-efficient and cost-effective mobile
systems. In VLSI Technology (VLSI-Technology): Digest of Technical
Papers, 2014 Symposium on (June 2014), pp. 1–2.
[7] Kang, W., Zhang, L., Klein, J. O., Zhang, Y., Ravelosona,
D., and Zhao, W. Reconfigurable codesign of stt-mram under process
variations in deeply scaled technology. IEEE Transactions on Electron
Devices 62, 6 (June 2015), 1769–1777.
137
138 BIBLIOGRAPHY
[8] Kang, W., Zhao, W., Wang, Z., Zhang, Y., Klein, J. O., Chappert,
C., Zhang, Y., and Ravelosona, D. Dfstt-mram: Dual functional stt-
mram cell structure for reliability enhancement and 3-d mlc functionality.
IEEE Transactions on Magnetics 50, 6 (June 2014), 1–7.
[9] Kawasumi, A., Kushida, K., Hara, H., Unekawa, Y., Abe, K.,
Ikegami, K., Noguchi, H., Kitagawa, E., Kamata, C., Kashiwada,
S., Kato, Y., Saida, D., Shimomura, N., Ito, J., and Fujita, S.
Circuit techniques in realizing voltage-generator-less stt mram suitable
for normally-off-type non-volatile l2 cache memory. In Memory Workshop
(IMW), 2013 5th IEEE International (May 2013), pp. 76–79.
[10] Kitagawa, E., Fujita, S., Nomura, K., Noguchi, H., Abe, K.,
Ikegami, K., Daibou, T., Kato, Y., Kamata, C., Kashiwada, S.,
Shimomura, N., Ito, J., and Yoda, H. Impact of ultra low power and
fast write operation of advanced perpendicular mtj on power reduction for
high-performance mobile cpu. In Electron Devices Meeting (IEDM), 2012
IEEE International (Dec 2012), pp. 29.4.1–29.4.4.
[11] Li, J., Ndai, P., Goel, A., Salahuddin, S., and Roy, K. Design
paradigm for robust spin-torque transfer magnetic ram (stt mram) from
circuit/architecture perspective. IEEE Transactions on Very Large Scale
Integration (VLSI) Systems 18, 12 (Dec 2010), 1710–1723.
[12] Liebmann, L., Zeng, J., Zhu, X., Yuan, L., Bouche, G., and Kye,
J. Overcoming scaling barriers through design technology cooptimization.
In 2016 IEEE Symposium on VLSI Technology (June 2016), pp. 1–2.
[13] Min, T., Tokei, Z., Kar, G., Coseman, S., Bekaert, J., Raghavan,
P., Cornelissen, S., Xu, K., Souriau, L., Radisic, D., Swerts, J.,
Tahmasebi, T., and Mertens, S. Interconnects scaling challenge for sub-
20nm spin torque transfer magnetic random access memory technology. In
Interconnect Technology Conference / Advanced Metallization Conference
(IITC/AMC), 2014 IEEE International (May 2014), pp. 341–344.
[14] Mojumder, N., and Roy, K. Proposal for switching current reduction
using reference layer with tilted magnetic anisotropy in magnetic tunnel
junctions for spin-transfer torque (stt) mram. Electron Devices, IEEE
Transactions on 59, 11 (Nov 2012), 3054–3060.
[15] Natarajan, S., Agostinelli, M., Akbar, S., Bost, M., Bowonder,
A., Chikarmane, V., Chouksey, S., Dasgupta, A., Fischer, K., Fu,
Q., Ghani, T., Giles, M., Govindaraju, S., Grover, R., Han, W.,
Hanken, D., Haralson, E., Haran, M., Heckscher, M., Heussner,
R., Jain, P., James, R., Jhaveri, R., Jin, I., Kam, H., Karl,
BIBLIOGRAPHY 139
E., Kenyon, C., Liu, M., Luo, Y., Mehandru, R., Morarka, S.,
Neiberg, L., Packan, P., Paliwal, A., Parker, C., Patel, P., Patel,
R., Pelto, C., Pipes, L., Plekhanov, P., Prince, M., Rajamani, S.,
Sandford, J., Sell, B., Sivakumar, S., Smith, P., Song, B., Tone,
K., Troeger, T., Wiedemer, J., Yang, M., and Zhang, K. A 14nm
logic technology featuring 2nd-generation finfet transistors, air-gapped
interconnects, self-aligned double patterning and a 0.0588 µm2 sram cell
size. In Electron Devices Meeting (IEDM), 2014 IEEE International (Dec
2014), pp. 3.7.1–3.7.3.
[16] Noguchi, H., Kushida, K., Ikegami, K., Abe, K., Kitagawa, E.,
Kashiwada, S., Kamata, C., Kawasumi, A., Hara, H., and Fujita,
S. A 250-mhz 256b-i/o 1-mb stt-mram with advanced perpendicular mtj
based dual cell for nonvolatile magnetic caches to reduce active power of
processors. In VLSI Circuits (VLSIC), 2013 Symposium on (June 2013),
pp. C108–C109.
[17] Ohashi, T., Yamaguchi, A., Hasumi, K., Inoue, O., Ikota, M.,
Lorusso, G., Donadio, G. L., Yasin, F., Rao, S., and Kar, G. S.
Variability study with cd-sem metrology for stt-mram: correlation analysis
between physical dimensions and electrical property of the memory element,
2017.
[18] Ohsawa, T., Koike, H., Miura, S., Honjo, H., Kinoshita, K., Ikeda,
S., Hanyu, T., Ohno, H., and Endoh, T. A 1 mb nonvolatile embedded
memory using 4t2mtj cell with 32 b fine-grained power gating scheme. Solid-
State Circuits, IEEE Journal of 48, 6 (June 2013), 1511–1520.
[19] Park, H., Dorrance, R., Amin, A., Ren, F., Markovic, D., and
Yang, C. K. K. Analysis of stt-ram cell design with multiple mtjs
per access. In 2011 IEEE/ACM International Symposium on Nanoscale
Architectures (June 2011), pp. 53–58.
[20] Prasad, C., Park, K. W., Chahal, M., Meric, I., Novak, S. R.,
Ramey, S., Bai, P., Chang, H. Y., Dias, N. L., Hafez, W. M., Jan,
C. H., Nidhi, N., Olac-vaw, R. W., Ramaswamy, R., and Tsai, C.
Transistor reliability characterization and comparisons for a 14 nm tri-gate
technology optimized for system-on-chip and foundry platforms. In 2016
IEEE International Reliability Physics Symposium (IRPS) (April 2016),
pp. 4B–5–1–4B–5–8.
[21] Ren, F., Park, H., Dorrance, R., Toriyama, Y., Yang, C. K. K.,
and Markovi0˘107, D. A body-voltage-sensing-based short pulse reading
circuit for spin-torque transfer rams (stt-rams). In Thirteenth International
140 BIBLIOGRAPHY
Symposium on Quality Electronic Design (ISQED) (March 2012), pp. 275–
282.
[22] Ryckaert, J., Raghavan, P., Baert, R., Bardon, M., Dusa,
M., Mallik, A., Sakhare, S., Vandewalle, B., Wambacq, P.,
Chava, B., Croes, K., Dehan, M., Jang, D., Leray, P., Liu, T.-T.,
Miyaguchi, K., Parvais, B., Schuddinck, P., Weemaes, P., Mercha,
A., Bömmels, J., Horiguchi, N., McIntyre, G., Thean, A., Tökei,
Z., Cheng, S., Verkest, D., and Steegen, A. Design technology
co-optimization for n10. In Custom Integrated Circuits Conference (CICC),
2014 IEEE Proceedings of the (Sept 2014), pp. 1–8.
[23] Ryckaert, J., Raghavan, P., Schuddinck, P., Trong, H. B.,
Mallik, A., Sakhare, S. S., Chava, B., Sherazi, Y., Leray, P.,
Mercha, A., Bömmels, J., McIntyre, G. R., Ronse, K. G., Thean,
A., Tökei, Z., Steegen, A., and Verkest, D. Dtco at n7 and beyond:
patterning and electrical compromises and opportunities. Proc. SPIE 9427
(2015), 94270C–94270C–8.
[24] Ryu, J. W., and Kwon, K. W. A reliable 2t2mtj nonvolatile static gain
cell stt-mram with self-referencing sensing circuits for embedded memory
application. IEEE Transactions on Magnetics 52, 4 (April 2016), 1–10.
[25] Sherazi, S. M. Y., Chava, B., Debacker, P., Bardon, M. G.,
Schuddinck, P., Firouzi, F., Raghavan, P., Mercha, A., Verkest,
D., and Ryckaert, J. Architectural strategies in standard-cell design for
the 7 nm and beyond technology node. Journal of Micro/Nanolithography,
MEMS, and MOEMS 15, 1 (2016), 013507.
[26] Slonczewski, J. Current-driven excitation of magnetic multilayers.
Journal of Magnetism and Magnetic Materials 159, 1-2 (jun 1996), L1–L7.
[27] Weckx, P., Kaczer, B., Chen, C., Franco, J., Bury, E., Chanda,
K., Watt, J., Roussel, P. J., Catthoor, F., and Groeseneken, G.
Characterization of time-dependent variability using 32k transistor arrays
in an advanced hk/mg technology. In 2015 IEEE International Reliability
Physics Symposium (April 2015), pp. 3B.1.1–3B.1.6.
[28] Weckx, P., Kaczer, B., Kukner, H., Roussel, J., Raghavan, P.,
Catthoor, F., and Groeseneken, G. Non-monte-carlo methodology
for high-sigma simulations of circuits under workload-dependent bti
degradation - application to 6t sram. In 2014 IEEE International Reliability
Physics Symposium (June 2014), pp. 5D.2.1–5D.2.6.
BIBLIOGRAPHY 141
[29] Wilton, S. J. E., and Jouppi, N. P. Cacti: an enhanced cache access
and cycle time model. IEEE Journal of Solid-State Circuits 31, 5 (May
1996), 677–688.
[30] Wu, S. Y., Lin, C. Y., Chiang, M. C., Liaw, J. J., Cheng, J. Y.,
Yang, S. H., Chang, S. Z., Liang, M., Miyashita, T., Tsai, C. H.,
Chang, C. H., Chang, V. S., Wu, Y. K., Chen, J. H., Chen, H. F.,
Chang, S. Y., Pan, K. H., Tsui, R. F., Yao, C. H., Ting, K. C.,
Yamamoto, T., Huang, H. T., Lee, T. L., Lee, C. H., Chang, W.,
Lee, H. M., Chen, C. C., Chang, T., Chen, R., Chiu, Y. H., Tsai,
M. H., Jang, S. M., Chen, K. S., and Ku, Y. An enhanced 16nm cmos
technology featuring 2nd generation finfet transistors and advanced cu/low-
k interconnect for low power and high performance applications. In 2014
IEEE International Electron Devices Meeting (Dec 2014), pp. 3.1.1–3.1.4.
[31] Wu, S. Y., Lin, C. Y., Chiang, M. C., Liaw, J. J., Cheng, J. Y.,
Yang, S. H., Liang, M., Miyashita, T., Tsai, C. H., Hsu, B. C.,
Chen, H. Y., Yamamoto, T., Chang, S. Y., Chang, V. S., Chang,
C. H., Chen, J. H., Chen, H. F., Ting, K. C., Wu, Y. K., Pan,
K. H., Tsui, R. F., Yao, C. H., Chang, P. R., Lien, H. M., Lee,
T. L., Lee, H. M., Chang, W., Chang, T., Chen, R., Yeh, M.,
Chen, C. C., Chiu, Y. H., Chen, Y. H., Huang, H. C., Lu, Y. C.,
Chang, C. W., Tsai, M. H., Liu, C. C., Chen, K. S., Kuo, C. C.,
Lin, H. T., Jang, S. M., and Ku, Y. A 16nm finfet cmos technology
for mobile soc and computing applications. In 2013 IEEE International
Electron Devices Meeting (Dec 2013), pp. 9.1.1–9.1.4.
[32] Yu, S., and Chen, P. Y. Emerging memory technologies: Recent trends
and prospects. IEEE Solid-State Circuits Magazine 8, 2 (Spring 2016),
43–56.
[33] Zhao, B., Yang, J., Zhang, Y., Chen, Y., and Li, H. Architecting
a common-source-line array for bipolar non-volatile memory devices. In
Design, Automation Test in Europe Conference Exhibition (DATE), 2012
(March 2012), pp. 1451–1454.

List of publications
Journal papers
R. Appeltans, P. Raghavan, G. S. Kar, A. Furnémont, L. Van der Perre and
W. Dehaene, "A Smaller, Faster, and More Energy-Efficient Complementary
STT-MRAM Cell Uses Three Transistors and a Ground Grid: More Is Actually
Less," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems,
vol. 25, no. 4, pp. 1204-1214, April 2017.
Peer-reviewed conference papers
R. Appeltans, S. Cosemans, P. Raghavan, D. Verkest, L. Van der Perre and W.
Dehaene, "STT-MRAM cell design with partial source line planes: improving the
trade-off between area and series resistance," 2015 IEEE Non-Volatile Memory
System and Applications Symposium (NVMSA), Hong Kong, 2015, pp. 1-6.
Raf Appeltans, Pieter Weckx, Praveen Raghavan, Ryoung-Han Kim, Gouri
Sankar Kar, Arnaud Furnémont, Liesbet Van der Perre and Wim Dehaene, "The effect
of patterning options on embedded memory cells in logic technologies at iN10
and iN7," Proc. SPIE 10148, Design-Process-Technology Co-optimization for
Manufacturability XI, 101480G (March 17, 2017).
Gian Francesco Lorusso, Takeyoshi Ohashi, Atsuko Yamaguchi, Osamu Inoue,
Takumichi Sutani, Naoto Horiguchi, Jürgen Bömmels, Christopher J. Wilson,
Basoene Briggs, Chi Lim Tan, Tom Raymaekers, Romain Delhougne, Geert
Van den Bosch, Luca Di Piazza, Gouri Sankar Kar, Arnaud Furnémont, Andrea
Fantini, Gabriele Luca Donadio, Laurent Souriau, Davide Crotti, Farrukh
Yasin, Raf Appeltans, Siddharth Rao, Danilo De Simone, Paulina Rincon
Delgadillo, Philippe Leray, Anne-Laure Charley, Daisy Zhou, Anabela Veloso,
Nadine Collaert, Kazuhisa Hasumi, Shunsuke Koshihara, Masami Ikota,
Yutaka Okagawa and Toru Ishimoto, "Enabling CD SEM metrology for 5nm
technology node and beyond," Proc. SPIE 10145, Metrology, Inspection, and
Process Control for Microlithography XXXI, 1014512 (March 28, 2017).
Patents filed
Raf Appeltans, Praveen Raghavan, "Three transistor two junction MRAM
bit cell", application numbers EP 15198573.6, JP 2016 238559, US 15367293
Raf Appeltans, Praveen Raghavan, Davide Francesco Crotti, "A device and a
method", application number EP 17157919.6
Previous work
Ubaid Ahmad, Min Li, Raf Appeltans, Hoang Duy Nguyen, Amir Amin,
Antoine Dejonghe, Liesbet Van der Perre, Rudy Lauwereins and Sofie Pollin,
"Exploration of Lattice Reduction Aided Soft-Output MIMO Detection on a
DLP/ILP Baseband Processor," in IEEE Transactions on Signal Processing, vol.
61, no. 23, pp. 5878-5892, Dec.1, 2013.
Min Li, Amir Amin, Raf Appeltans, Andy Folens, Ubaid Ahmad, Hans
Cappelle, Peter Debacker, Lieven Hollevoet, Andre Bourdoux, Praveen
Raghavan, Antoine Dejonghe and Liesbet Van Der Perre, "A C-programmable
baseband processor with inner modem implementations for LTE Cat-4/5/7 and
Gbps 80MHz 4 802.11ac (invited)," 2013 IEEE Global Conference on Signal
and Information Processing, Austin, TX, 2013, pp. 1222-1225.
Andre Bourdoux, Min Li, Hans Cappelle, Amir Amin, Raf Appeltans,
Andy Folens and Antoine Dejonghe, "A unified receiver signal processing
architecture for all modes of the DTMB broadcasting system," 2013 IEEE
24th Annual International Symposium on Personal, Indoor, and Mobile Radio
Communications (PIMRC), London, 2013, pp. 651-656.
Min Li, Amir Amin, Rodolfo Torrea, Ubaid Ahmad, Raf Appeltans, Antoine
Dejonghe and Liesbet Van Der Perre, "Processor based 20Mhz 4 Cat-5 LTE
MIMO receiver with advanced detectors," 2013 IEEE International Conference
on Acoustics, Speech and Signal Processing, Vancouver, BC, 2013, pp. 2669-
2673.
Kiyotaka Kobayashi, Hidekuni Yomo, Min Li, Raf Appeltans, Hans
Cappelle, Amir Amin, Aissa Couvreur, Matthias Hartmann, Andre Bourdoux,
Praveen Raghavan, Antoine Dejonghe and Liesbet Van der Perre, "Algorithm-
Architecture Co-Optimization of Area-Efficient SDR Baseband for Highly
Diversified Digital TV Standards," 2012 IEEE 75th Vehicular Technology
Conference (VTC Spring), Yokohama, 2012, pp. 1-5.
Min Li, Raf Appeltans, Amir Amin, Rodolfo Torrea-Duran, Hans Cappelle,
Matthias Hartmann, Hidekuni Yomo, Kiyotaka Kobayashi, Antoine Dejonghe
and Liesbet Van Der Perre "Overview of a Software Defined Downlink
Inner Receiver for Category-E LTE-Advanced UE," 2011 IEEE International
Conference on Communications (ICC), Kyoto, 2011, pp. 1-5.
Min Li, Amir Amin, Raf Appeltans, Rodolfo Torrea, Hans Cappelle, Robert
Fasthuber, Antoine Dejonghe and Liesbet Van Der Perre "Instruction set
support and algorithm-architecture for fully parallel multi-standard soft-output
demapping on baseband processors," 2010 IEEE Workshop On Signal Processing
Systems, San Francisco, CA, 2010, pp. 140-145.


FACULTY OF ENGINEERING SCIENCE
DEPARTMENT OF ELECTRICAL ENGINEERING
MICAS-IMEC
Celestijnenlaan 200A box 2402
B-3001 Leuven
raf.appeltans@imec.be
