Extending the life of NOR flash memory using a genetic algorithm by Sullivan, Joe
Extending The Life Of NOR Flash




Supervisor: Dr. Conor Ryan
External Examiner: Prof. Robert J. Howlett
Internal Examiner: Dr. Hussain Mahdi
A thesis for the PhD Degree
Submitted to the University of Limerick
May 31, 2013
Dedicated to my wonderful wife Irene
To my family and friends
To Barry
Declaration
I hereby declare that the work presented in this thesis is original except where
an acknowledgement is made or a reference is given to other work and I have




Date: May 31, 2013
Supervisor: Dr. Conor Ryan
Signed:
Date: May 31, 2013
Abstract
We present methods, observations and insights into the application of
Evolutionary Algorithms (EA) to the problem of flash memory wear-out. In doing
so we examine the union of two distinct cutting-edge technologies: that of
Non-Volatile Memory (NVM) and that of EA, specifically the class of EAs
known as Genetic Algorithms (GA).
The complete adoption of flash memory for those applications that require
non-volatile storage is inhibited by a small number of negative characteristics
of flash, most notably wear-out and the data retention/endurance trade-off.
This thesis describes how to build and validate an automated system that
uses evolutionary search techniques to perform embodied evolution on hard
silicon in order to find programming parameters that will reduce wear.
We use the system to optimise the read, write and erase conditions of the
device to enhance reliability. Since the exploration is done on actual silicon
in real time, it is costly in both silicon and time. However, it provides a level of
accuracy that could barely be approximated in simulation due to the com-
plexity of the devices, the variance between storage elements and the sheer
number of unknowns. We mitigate this cost with the use of small population
methods and the structured inclusion of some acquired domain knowledge.
Results are calculated on a per device basis, with derived solutions com-
pared to baseline results for that device. They demonstrate an increase in
endurance of up to 300% per device. A blueprint for future experimentation
with sequential access, or NAND flash memory, is presented. Although there
has been an embargo placed on this thesis, with the result that very little
can be published from it, this work has had considerable impact, getting
coverage in international popular science publications, as well as leading to
research funding involving several institutions and companies.
Acknowledgements
May I first and foremost thank my advisor Dr Conor Ryan, to whom I owe
an enormous debt of gratitude, in particular for his early belief, middle
forbearance and latter dedication and, always throughout, enormous support.
I would also like to thank the many souls from Analog Devices Inc. for
their patience in answering my endless questions particularly Dr Brian Moss,
Kieran Heffernan, Tom Lynch, Alan Clohessy, Neville Craig, Tim Larkin,
K.D. Yu and many more. Several engineers from Rosemount Analytical were
helpful, in particular Tim McCarty.
For inspiration I have to thank Dr Paul Cullen of the University of Not-
tingham as well as Nick Jacob and Robert Hoare. Without their skills with
a teapot and boundless intellects none of this would have been possible.
I would also like to thank Alan Sheahan from Limerick Institute of
Technology (LIT) for helping with, and talking through, some of the most
inaccessible elements of this work. Dáithi Sims, head of the Department of
Electrical and Electronic Engineering at LIT, was extremely supportive, and
his flexible approach at critical times eased the burden considerably. I am
also grateful to Pascal Meehan, head of the School of Science, Engineering,
and IT, who was always supportive and who took the time to acknowledge our
successes. I would also like to acknowledge the help and support of several of
my colleagues at LIT: Mr Anthony McMahon, Cieran O’Loughlan, Gerard
Moynihan and Dr Niall Enright.
Furthermore I would like to thank a number of staff members at the
Department of Computer Science and Information Systems (CSIS), University
of Limerick: J.J. Collins, Jim Buckley and of course head of department
Annette McElligott, who were all extremely supportive.
I’m very grateful to John Koza for his kind words after GECCO 2007, which
changed everything for us. Thanks also to Dr Anthony Ginty of EMC.
I would also like to sincerely thank Jim McGee and Triona Marren-
O’Grady who made sure this thesis could be read.
I have been lucky in having the unfailing support of my wife, who gave
up countless week-ends, Christmas and summer holidays for years and years
so this could happen. Thanks to my family and friends, particularly Peter
my brother and Darragh O’Connor and Brian McGee who are always ready
to help. I would also like to thank my mum Kathleen who always insisted
that I would attend college and has been waiting for it to end for a very long
time.
Finally I would like to thank my friend Conor for being there during every
step of the way and whose now many doors were always open.







List of Figures
List of Tables
1 Introduction
1.1 Introduction
1.2 Central Hypothesis
1.3 Motivation and Rationale
1.4 Core Research Questions
1.5 Contribution of this work
1.6 Organisation of the Thesis
2 Flash Memory
2.1 Introduction to Non-Volatile Memory
2.2 The Erasable PROM Family
2.2.1 E2PROM Derivatives
2.2.2 Program and Erase Cycle Management
2.3 NVM Floating Gate Basics
2.3.1 MOSFET Operation
2.3.2 The MOSFET as a Memory Storage Element
2.3.3 The Electrically Erasable Cell
2.3.4 Writing Using Hot Electron Injection
2.3.5 Erasing Using Fowler-Nordheim Tunneling
2.3.6 Reading the Cell
2.4 Failure Mechanisms
2.4.1 Retention Problems
2.4.2 Endurance Failure Mechanism
2.4.3 Disturb Failures
2.4.4 Programming Parameter Calculation
2.5 Flash Cell Architectures
2.5.1 Two Transistor
2.5.2 Stacked Gate
2.5.3 Split Gate
2.6 Summary
3 Evolutionary Algorithms
3.1 Evolutionary Computation
3.2 EA Operation and Operators
3.2.1 Selection
3.2.2 Crossover
3.2.3 Mutation
3.2.4 Representation
3.3 Canonical EAs
3.3.1 Genetic Programming
3.3.2 Evolutionary Strategies
3.3.3 Genetic Algorithms
3.4 Other EAs and Adaptive Systems
3.4.1 Grammatical Evolution
3.4.2 Evolutionary Programming
3.4.3 Learning Classifier
3.4.4 Other Algorithms
3.5 Literature Survey
3.5.1 Evolutionary Algorithms
3.5.2 EA Application
3.5.3 Flash Memory
3.6 Summary
4 Methodology and Equipment
4.1 Outline of Problem Space
4.1.1 Restatement of the Contentions
4.1.2 Contentions Cn.1 - Cn.5 in the Context of the Platform Requirements
4.2 Hardware Design
4.2.1 Platform Options
4.2.2 Experiment Duration Estimates
4.2.3 Evaluating the Remaining Option
4.3 Test Platform Design
4.3.1 DUT General Description
4.3.2 Memory Map Coding Consideration
4.3.3 The Control Board
4.3.4 The DUT Board
4.3.5 Signal Conditioning Board
4.3.6 Building and Testing
4.4 Hardware Summary
4.5 NOR Revision Two and NAND
4.6 Summary
5 Software
5.1 Software Requirements
5.2 Design Considerations
5.3 The Control Board
5.3.1 The Serial Communications Module
5.3.2 The Event Handler
5.3.3 Other Functions
5.4 The Device Under Test
5.4.1 First Generation DUT Code
5.4.2 Second generation DUT code
5.5 The PC code
5.5.1 The Cross Compiler
5.5.2 The GA form
5.5.3 The Hardware Form
5.5.4 Summary
6 Platform Calibration and Initial Results
6.1 Experiment Design
6.2 Representation
6.2.1 ETIM1 Register
6.2.2 ETIM2 Register
6.2.3 EETEST0 Register
6.2.4 EETEST1 Register
6.3 Fitness
6.3.1 Flash Cell Recovery
6.3.2 Endurance Variation
6.3.3 Manufacturing History
6.3.4 Longevity as a Fitness Function
6.3.5 Cell Current as a Fitness Function
6.4 Proving runs
6.4.1 Verification Testing
6.4.2 Visual Monitoring
6.5 Timing Runs
6.5.1 Initial Calibrations
6.5.2 Scope of Endurances
6.5.3 Timing Verification
6.6 Coupling the Hardware to the GA
6.7 Calibration Runs
6.8 Summary
7 Experiments and Results
7.1 Introduction
7.2 Initial Results
7.2.1 Device A
7.2.2 Device B
7.2.3 Device C
7.2.4 Device D
7.2.5 Summary of the First Four Devices
7.3 Primary Data Collection
7.3.1 Devices E to O
7.3.2 Device GB1 to GB6
7.4 Next Steps
7.5 Summary
8 Conclusions
8.1 Research Achievement
8.2 Motivation and Central Hypothesis
8.3 Core Research Questions
8.4 Addressing the Research Contentions
8.4.1 Contention Cn.1
8.4.2 Contention Cn.2
8.4.3 Contention Cn.3
8.4.4 Contention Cn.4
8.4.5 Contention Cn.5
8.4.6 Contention Cn.6
8.5 Recommendations
8.5.1 A blueprint for future experimentation on NAND Flash memory
8.6 Summary
9 Appendix
9.1 The Analog Devices ADu812/ADu824
9.2 The Memory Map
9.2.1 Memory Map Coding Consideration
9.3 Code Listings
9.3.1 SBC Header File
9.4 Flash Memory: Growth
9.5 Erase Time Register Values
9.6 Block Diagrams
9.7 Detailed Schematics and Pin Designation
9.7.1 IFT board interface
9.7.2 DUT board Schematic
9.7.3 PIO Port Schematic and Modes
List of Figures
2.1 Principal memory types
2.2 Flash memory growth in millions of US dollars
2.3 Flash memory growth forecast to 2016
2.4 SSD growth forecast in light of cheap NAND Flash
2.5 Idealised MOSFET memory element
2.6 MOSFET transistor and schematic symbol
2.7 Drain to source current response over drain to source voltage for various gate voltages, i.e. the behaviour of MOS field effect transistors
2.8 EPROM structure
2.9 Flash structure versus EPROM structure
2.10 Channel Hot Electron (CHE) injection mechanism
2.11 Fowler-Nordheim tunneling mechanism
2.12 NOR cell memory array
2.13 Bathtub curve showing the flash memory life cycle
2.14 Stacked gate EPROM structure
2.15 Split gate memory cell
2.16 Split gate memory cell die photograph
3.1 An overview of a standard evolutionary algorithm
3.2 Crossover in genetic algorithms
3.3 Mutation in genetic algorithms
3.4 A genetic programming tree structure example
3.5 GP Tree structure is spliced
3.6 The resultant crossed individual
3.7 Individuals on a fitness landscape. Left, generation 1. Right, after some evolution
3.8 A deceptive fitness landscape
3.9 Rastrigin function
3.10 GA Driven Innovative X-band Antennae Design
4.1 Block diagram of the Analog Devices ADu812 Micro-converter chip
4.2 ADu824 PLCC pin assignment
4.3 ADu824 memory map
4.4 Generalised block diagram
4.5 Block diagram of the DUT board
4.6 The 74LS373 transparent latch
4.7 DUT board schematic
4.8 DUT board prototype
4.9 Entire tester block diagram
4.10 NAND tester
5.1 Functional block diagram of the test platform's software
5.2 VB cross compiler
5.3 Running the platform
6.1 The special function register ETIM1
6.2 The special function register ETIM2 controls erase time
6.3 The special function register EETEST1 controls programming current and high voltage set up time
6.4 Progress of the value of the register TERASE over the life of an individual. This register controls the erase time. On the right, when the GA sets the slope close to 1, and on the left, when the slope is set close to 6
6.5 Variations of representation employed, binary string and floating point numbers
6.6 A sample of generation one, showing starting point in the first half of the erase time and slopes from straight line to deeply concave
7.1 Generation 1 fitness proportionate roulette wheel and generation 2 individual replication distribution. Both generations contain 20 individuals
7.2 The occurrence of viable solutions in generation 1 and their re-occurrence in generation 2
7.3 Average life gain over the default solution per generation
7.4 Max and average life gain over the default solution per generation
9.1 Block diagram of the Analog Devices ADu812 Micro-converter chip
9.2 ADU824 memory map
9.3 ADU824 Memory Map
9.4 Special function register map
9.5 Flash memory growth forecast to 2016
9.6 SSD growth forecast in light of cheap NAND Flash
9.7 Erase Control Register
9.8 Erase Register Value Meanings
9.9 Generalised block diagram
9.10 Complete block diagram
9.11 Connections from the SBC
9.12 DUT board schematic diagram
9.13 Operating modes of the 8255
9.14 Functional diagram of the PIO
List of Tables
2.1 Comparison of programming methods
4.1 Estimated single experiment duration
5.1 Control board code modules
5.2 Serial communications protocol
5.3 The PC application forms
6.1 The non-volatile memory registers
6.2 The variables TPROG and THVSU
6.3 The variables IPROG and HV
6.4 Theoretical minimum experiment duration
6.5 First run endurance results for blocks 4 to 17
6.6 Endurance results for first new device
7.1 Endurance results device A, generation 1
7.2 Endurance results device A, generation 2
7.3 Starting values, closing values and fitness for generation 1 and 2
7.4 Endurance results for device A
7.5 Endurance results for device B
7.6 Endurance results for device C including recovery figures
7.7 Endurance results for device D
7.8 Average improvement and endurance for fittest generation
7.9 Average improvement and endurance for fittest generation de-




J. Sullivan and C. Ryan. A destructive evolutionary algorithm process.
Soft Computing - A Fusion of Foundations, Methodologies and Applications,
15(1):95-102, 2011.
Conferences:
J. Sullivan and C. Ryan. A destructive evolutionary process: a pilot
implementation. In Genetic and Evolutionary Computation Conference, volume
2, pages 2167-2174. Association for Computing Machinery (ACM), 2007.
J. Sullivan and C. Ryan. A destructive evolutionary algorithm process. In
Proceedings of the 2007 Frontiers in the Convergence of Bioscience and
Information Technologies. IEEE Computer Society, 2007.
Glossary of terms
ADC - Analog to Digital Converter
ADI - Analog Devices Incorporated
AI - Artificial Intelligence
ALE - Address Latch Enable
ANN - Artificial Neural Network
ATE - Automatic Test Equipment
CHE - Channel Hot Electron
DAC - Digital to Analog Converter
DMA - Direct Memory Access
DRAM - Dynamic Random Access Memory
DUT - Device Under Test
EA - Evolutionary Algorithm
EC - Evolutionary Computation
ECC - Error Correcting Codes
EEPROM - Electrically Erasable Programmable Read Only Memory
EPROM - Erasable Programmable Read Only Memory
ETOX - Erase Through Oxide
FET - Field Effect Transistor
FLOTOX - Floating Gate Tunnel Oxide
FN - Fowler Nordheim (Tunneling)
GA - Genetic Algorithm
GP - Genetic Programming
GPIO - General Purpose Input Output
I2C - Inter-Integrated Circuit, or two-wire interface
IC - Integrated Circuit
ICE - In-circuit Emulator
IDE - Integrated Development Environment
IFT - Intelligent Field Transmitter
JEDEC - Joint Electron Devices Engineering Council
JPEG - Joint Photographic Experts Group. An electronic picture format
LCD - Liquid Crystal Display
LIT - Limerick Institute of Technology
MCU - Micro Controller Unit
MOSFET - Metal Oxide Semiconductor Field Effect Transistor
MP3 - MPEG-1 Audio Layer III (Moving Picture Experts Group electronic audio format)
NAND - Not AND, sequential access flash
NOR - Not OR, DMA flash
NV - Non Volatile
NVM - Non Volatile Memory
PIO - Programmable Input Output
PLCC - Plastic Leaded Chip Carrier
PROM - Programmable Read Only Memory
PSEN - Program Store Enable (MCU pin)
RAM - Random Access Memory
RBER - Raw Bit Error Rate
ROM - Read Only Memory
RS232 - Serial communications protocol
SBC - Single Board Computer
SBUF - Serial Buffer (MCU internal register)
SFR - Special Function Register
SPI - Serial Peripheral Interface
SRAM - Static Random Access Memory
UART - Universal Asynchronous Receiver/Transmitter
UL - University of Limerick
USB - Universal Serial Bus
UV - Ultra Violet
VT - Voltage Threshold





1.1 Introduction
The research detailed in this thesis seeks to exploit the union of two distinct
cutting-edge technologies: Non-Volatile Memory (NVM) and Evolutionary
Algorithms (EA), more specifically a class of EAs called Genetic Algorithms
(GA).
Over recent years, floating gate memory has become overwhelmingly the
technology of choice for those applications that require non-volatile
semiconductor memory [110, 41, 5]. The growth of hand-held and portable devices
has seen demand for these products grow enormously since their introduction
in the mid-eighties [75, 131].
However, the complete adoption of flash memory for all applications
requiring non-volatility is inhibited by a small number of significant negative
characteristics of flash memory silicon [110, 36, 18]. These characteristics
include wear-out and data retention. While silicon designers seek the holy grail
of eliminating these issues entirely, they have had to content themselves with
optimising the manufacturing scheme, and with trading one characteristic
for another. Designers seek solutions that will achieve a balance between the
competing goals of reliability and cost per bit. (Reliability usually includes
mean time between failures (MTBF) as well as endurance and retention; however,
MTBF is generally unaffected by variation in the operational parameters.) In
practice, optimisation of the manufacturing scheme is achieved by modifying
the manufacturing process and by manipulating a multiplicity of interrelated
control parameters in the form of a series of registers inside each memory chip.
Discovering the ideal register settings is difficult, dogged as it is by
various production variances. In order to obtain a specific outcome, such as
the qualification of a flash part or adherence to a specification sheet, it is
often necessary that individual flash parts be specified, or even perform,
below their achievable rate. This approach is taken in order to guarantee
the compliance of all devices that carry the same part number. The problem
is further compounded by the unending drive to achieve higher manufacturing
yields by scaling lithographic process geometry and increasing packing
density [6, 83].
Yet flash memory reliability, expressed as endurance and retention
figures, is one of the most important items in NOR memory specification
sheets [110]. (NOR flash is so called because its behaviour closely resembles
a NOR gate. This type of flash is direct access memory, i.e. each byte is
accessible independently via an address bus.) These specifications define the
applications that are suitable for each device and those that are not [102].
Since this research began, these metrics have become even more important
in NAND memory, and they are key to opening up vast new markets
for flash [7, 132], such as mechanical hard disk replacement and tier one
enterprise storage [75]. (NAND flash is so named because its behaviour closely
resembles a NAND gate. This type of flash is sequential access memory [118],
i.e. each byte is accessible only by accessing all the data in the page that
contains it.)
Current methods of calculating operating parameter values involve a number
of significant steps, as follows [71, 45]:
• Design evaluation, in which a team of engineers will arrange for the
manufacture of several batches of sample devices. These devices will
be used to ascertain the operating limitations of the current design.
Here, changes may be ordered to correct errors or to provide more
control points;
• At this point, sets of control register variables are tried to establish whether
the design can meet the endurance and retention figures that are defined
in the preliminary specification documents. Iterative silicon changes
may now occur;
• New finalised pre-production batches are manufactured (provided that
no further design evaluation iterations are required). This batch may
be of a greater scale to facilitate the qualification process. The
qualification process involves taking a significant sample of devices and testing
them to destruction with a known set of operational parameters.
Modifications to the operational parameters may take place at this point
and the process will iterate until a satisfactory outcome is achieved.
Sometime during this iterative process a production testing software
routine is written for a specific hardware test platform;
• The device will go into production and ongoing production sampling
as well as 100% testing will take place over the life of the part. Further
modifications to the operational parameters may take place in further
revisions of the device to achieve better specification sheet values or to
fix production problems such as yield or field failures.
This process of finding suitable operational parameters is manual and
requires much engineering input and iteration [56]. The equipment used to
undertake it is costly and requires significant software development effort [71].
Finding ideal operational values is therefore expensive and time consuming,
in a search space that is enormously large and that changes with every silicon
foundry production run, and even between wafers in a run [101]. In this
context, the process above is effectively a heuristic algorithm: it locates
good-enough solutions to the problem without concern for whether a solution
can be proven to be correct or optimal [15]. Heuristic methods trade off
precision, quality and accuracy in exchange for reduced computational effort.
In practice, there is no way of manually searching the entire solution space
to find the ideal set of control register values.
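To give a feel for the scale involved, the following back-of-envelope sketch in Python counts the settings of a set of control registers and the time an exhaustive search would need. The register count, the register width and the time per evaluation are illustrative assumptions only; they are not figures taken from this work.

```python
# Back-of-envelope cost of exhaustively searching the register space.
# Every number used here is an illustrative assumption, not a measurement.

HOURS_PER_YEAR = 24 * 365


def exhaustive_search_years(num_registers: int,
                            bits_per_register: int,
                            hours_per_evaluation: float,
                            parallel_testers: int = 1) -> float:
    """Years needed to evaluate every register combination exactly once."""
    combinations = 2 ** (num_registers * bits_per_register)
    total_hours = combinations * hours_per_evaluation / parallel_testers
    return total_hours / HOURS_PER_YEAR


# Assumed: four 8-bit control registers, and one hour to wear out one block.
combos = 2 ** (4 * 8)
years = exhaustive_search_years(num_registers=4,
                                bits_per_register=8,
                                hours_per_evaluation=1.0)
print(f"{combos:,} combinations, roughly {years:,.0f} years on one tester")
```

Even with many testers running in parallel the total remains impractical under these assumptions, and every evaluation also consumes silicon, which is why a guided search is attractive.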
Instead, a high degree of domain knowledge must be maintained by en-
gineers within silicon foundry design and test departments. This knowledge
is used to find values that will get all devices over the minimum endurance
and retention line. This current approach has several disadvantages:
• Finding solutions is a labour intensive engineering operation;
• The solution must be conservative enough to ensure the compliance of
the weakest manufactured device;
• The solution takes no account of the age related performance changes
of the memory array;
• The domain knowledge of foundry test center engineers is often not
held in any formal way;
• Minor changes to the process, such as geometry scaling, require the
rediscovery of variable values and new additions to the knowledge base;
• Major changes to the process, such as moving from single-level to multi-
level cells, or from four levels to eight, present major disruption and
development challenges [6].
Finding ideal solutions is a difficult challenge in a high-dimensional,
combinatorial and dynamically changing problem landscape. Current methods
are engineering-intensive and repetitive. It is in this context that this
research contributes significantly to knowledge and practice.
Here we propose to build and validate a system for finding and optimising
flash memory control parameters that is automated, quick and requires little
engineering iteration. Advanced evolutionary search techniques are used to
perform destructive experimentation on silicon memory in real time, with
the objective of finding a better set of parameter values in return for the
destruction of the silicon itself. We use this information to set the read,
8
write and erase conditions of the device in order to enhance the reliability
of the memory chips. In essence, a GA is charged with exploring how NOR
flash silicon degrades with usage over time. Since the exploration is done on
real silicon in real time, it is costly in both silicon and time when compared
to conventional GA applications, in which the rate of computation is normally
the choke point. This cost is mitigated with the use of small population
methods [93] and the inclusion of some acquired domain knowledge
that is formalised into the GA design [130, 11, 70, 88].
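The search loop described above can be sketched as a small-population GA.
This is an illustrative sketch only, not the platform's implementation: the
register names, their ranges and the simulated_endurance function are
hypothetical stand-ins, since on the real platform fitness is measured by
cycling a sacrificial portion of the silicon to destruction.

```python
import random

# Hypothetical control registers and their integer ranges (assumptions).
REGISTER_RANGES = {
    "erase_voltage": (0, 15),
    "erase_pulse_width": (0, 15),
    "write_voltage": (0, 15),
    "write_pulse_width": (0, 15),
}
NAMES = list(REGISTER_RANGES)


def random_individual(rng):
    """Draw one candidate set of register values."""
    return [rng.randint(*REGISTER_RANGES[name]) for name in NAMES]


def simulated_endurance(individual):
    """Stand-in fitness: pretend cycles-to-failure, peaking at an arbitrary point.

    On real hardware this number would come from cycling a small region of
    flash to destruction with the candidate register values.
    """
    optimum = [6, 9, 7, 4]  # arbitrary, for illustration only
    distance = sum((a - b) ** 2 for a, b in zip(individual, optimum))
    return 100_000 / (1 + distance)


def evolve(fitness, generations=30, pop_size=6, mutation_rate=0.2, seed=0):
    """Small-population GA with elitism, one-point crossover and mutation."""
    rng = random.Random(seed)
    cache = {}  # each evaluation destroys silicon, so never repeat one

    def score(individual):
        key = tuple(individual)
        if key not in cache:
            cache[key] = fitness(individual)
        return cache[key]

    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        parents = ranked[: pop_size // 2]
        children = [ranked[0][:]]  # elitism: carry the best forward unchanged
        while len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randint(1, len(NAMES) - 1)  # one-point crossover
            child = mum[:cut] + dad[cut:]
            for i, name in enumerate(NAMES):
                if rng.random() < mutation_rate:
                    child[i] = rng.randint(*REGISTER_RANGES[name])
            children.append(child)
        population = children
    return max(population, key=score)
```

Calling evolve(simulated_endurance) returns the best register set found.
The cache reflects the cost argument made in the text: when every fitness
evaluation consumes silicon and wall-clock time, repeated evaluation of the
same candidate is worth avoiding.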
We routinely find NOR flash parameter sets that are more than an order
of magnitude better than the specification sheet claims, and up to 300%
better than the factory-calculated set for any given device. This promise of
enhanced reliability opens up applications and markets for NOR as well as
NAND flash, which until now have been inaccessible due to the unreliability
of the silicon.
Since the start of this work, flash has become a key technology and flash
based data retrieval now exposes all other bottlenecks in computation and
in data center infrastructure [24]. Furthermore, as data packing density has
increased, memory applications [77] have become critically sensitive to flash
reliability.
Although an embargo has been placed on this thesis, severely limiting the
material that can be published from it, the work has had considerable impact
and a follow-on programme has secured research funding involving several
institutions and companies.
1.2 Central Hypothesis
We put forward the hypothesis that an artificial evolutionary algorithm, such
as a GA, may be used in the process of finding operating parameters for NOR
flash memory in a way that will increase reliability and extend the useful life
of these devices. We will prove this by specifying and building a test platform
incorporating a GA capable of evaluating NOR flash endurance in real time
from within the evolutionary process.
1.3 Motivation and Rationale
The discovery of control parameters for flash memory has evolved into some-
thing of a black art. It requires a high degree of domain knowledge on the
part of engineering staff to specify, and then test, a solution that will
maximise reliability. The testing is normally done to an industry standard
such as a JEDEC (Joint Electron Device Engineering Council) specification [56]
and is time consuming and repetitive [110]. The whole process needs to be
repeated for any modification to either the process or the product specifica-
tion.
The motivation of this research is to test the hypothesis that there is a
better way, and that a GA may be employed to search the solution space in a
more intelligent, automated and altogether better manner. To that end, we
postulate that it is possible to integrate a GA into a hardware test platform
and that the GA can operate directly on the silicon without using a model
or other such form of approximation. The rewards, if successful, will be
better, longer-lived, and more reliable devices in return for reduced effort.
We also hope to achieve cost reduction in the qualification process and define
a structured approach to parameter discovery for flash memory.
1.4 Core Research Questions
To explore the central hypothesis discussed above, this research poses a series
of related core questions:
CQ.1 Can a test platform be specified, built and tested that will test, and
serially retest multiple NOR flash devices in a way that enables a GA
to operate on a population of such devices for the purposes of artificial
evolution?
CQ.2 Can such a system deliver an improvement in reliability for the
stablemates of the devices under scrutiny?
CQ.3 Are there any additional advantages to considering such an automated
system, such as a binning or grading solution to separate good devices
from better devices?
These core questions in turn go on to pose a series of more specific research
questions, which are presented here as a list of contentions (Cn.1 to Cn.6).
We contend that it is possible to:
Cn.1 Build a test platform incorporating a GA to perform destructive testing
in real time on hard silicon in order to find values for programming
parameters that will improve the endurance of that device;
Cn.2 Find values for programming parameters such that failure of the device
is estimated in a short period of time by the rapid destruction of small
portions of the NOR flash device, effecting a binning solution that
separates good devices from bad;
Cn.3 Find values, in this way, for programming parameters such that a gen-
eral improvement in endurance can be achieved that applies to all NOR
devices of this type;
Cn.4 Find values for a batch of NOR devices such that an improvement in
endurance can be achieved for that batch;
Cn.5 Find values for a specific device such that an improvement in endurance
can be achieved for that device;
Cn.6 Reduce search expense in terms of time and destruction of flash real
estate by using small population methods and by directing certain as-
pects of the search using domain knowledge and the knowledge gained
in previous experiments.
1.5 Contribution of this work
The contributions of the work are manifold. A new approach to finding
suitable NOR flash programming parameters is introduced that makes use
of an evolutionary algorithm. Specifically, it has been proven that a genetic
algorithm can be used to automate the process of finding superior parameters
enhancing longevity of NOR flash memory. It has been shown how to achieve
cost reduction in the qualification of flash parts and a new method for grading
devices and mitigating the wasteful process of guard banding was reported.
Many stakeholders were consulted, including multi-national NOR and NAND
players as well as educational and funding institutions and other research
communities. Finally, a great deal was learned and documented about the
application of genetic algorithms to flash memory programming and it was
clearly demonstrated that these devices can be made to be more reliable by
utilising the novel approach introduced here. A comprehensive treatment of
the contributions is presented in the final chapter (Chapter 8, Section 8.4),
which also details how the work has met each of the contentions Cn.1 to Cn.6
listed above.
Since the start of this work, flash memory has moved to become a central
technology in many modern devices and at the same time has become criti-
cally sensitive to reliability issues with the result that this work becomes all
the more salient.
1.6 Organisation of the Thesis
This research was underway for a considerable number of years. In broad
terms, there were four phases:
• Platform Development: Background research and concept, includ-
ing the specification, building and testing of an evolutionary search
hardware platform;
• Calibration: Initial proving runs and a first tranche of experiments;
• Data Capture: A second, main tranche of experiments incorporating
what was learned in the first;
• Interpretation: A consolidation phase that included publishing and
consultation with interested industrial parties. This culminated in
the formation of an innovation partnership collaboration, between two
third level institutions and an industrial partner.
Chapters 2 and 3 introduce the background technologies. Flash memory
technologies are discussed from first principles in Chapter 2 while machine
learning and evolutionary computation are reported in Chapter 3.
The first phase, Platform Development, is described across several of
the early chapters. Chapter 4, Methodologies and Equipment, charts the
possible solutions to the hardware requirements and outlines the rationale for
choosing specific paths. At the end of Chapter 4 a functional test platform
is realised and some initial test results are reported. In Chapter 5 the soft-
ware requirements are tabulated and the developed software solutions are
explained.
The second phase, Calibration, is documented in Chapter 6. The fitness
function and the representation are defined and coded to the hardware. Early
results are presented and this information is used to calibrate the platform
and inform later data collection.
Data Capture is described in Chapter 7 and many test platform runs are
tabulated and interpreted. The first tranche of data is presented in some detail
while later sections present modifications and data sets aimed at meeting the
research goals as set out above in Section 1.4.
The final phase, Interpretation, set out in Chapter 8, uses the results of
previous chapters to discuss the merits of the novel approaches already pre-
sented. The research goals are compared to the outcomes and a rationale for
further work is explored. A blueprint for future experimentation on NAND
flash memory is also presented here.
Chapter 9 contains the appendix and presents details such as schematic




The problem of flash memory wear-out is central to the work presented in this
thesis. This chapter presents a general introduction to NVM in Sections 2.1
and 2.2. Section 2.3 gives a classical explanation of the MOSFET, which
serves as the memory sense element, while memory array failure mechanisms
are discussed in Section 2.4. Finally, Section 2.5 details several popular
cell architectures.
2.1 Introduction to Non-Volatile Memory
The primary function of a memory device is to retain settings. There are
broadly three types of memory: RAM, ROM and EPROM. The features of
each type are shown in Figure 2.1.
Figure 2.1: Principal memory types
RAM is volatile (settings are lost on power down), but one may read
and write to it. ROM is non-volatile (settings are retained on power down),
but one cannot write to it. EPROM (erasable programmable read-only
memory) is a family of device types offering a mixture of these primary
features.
Memory devices achieve their function in a variety of ways depending
on the application’s requirements. For example, DRAM (dynamic random
access memory), used as main memory in most computers, is required to be
cheap to manufacture in large arrays, so it is simple, has few silicon
components and thus a small footprint. SRAM (static random access memory),
often used as cache memory, is optimised to be fast (less than 2 nanoseconds
access time). As a result, it is complex, with each memory element comprising
six or seven transistors [7], so it has a large footprint. This makes it
difficult to manufacture cost-effectively in large arrays. Both these
technologies are volatile RAM. Volatility is yet another feature that must
be balanced against cost per bit.
Tradeoffs of this type are evident in all memory design. What has so
far proven difficult to achieve for silicon designers is the combination of such
competing goals within the same device. The ideal memory component for
many applications is one that is cheap, large and fast, that is writable and
erasable, and that retains settings when power is switched off. This may seem
like a long list of requirements, but one class of memory, the floating gate
memory device¹, can come close to satisfying all of them.
¹ So called because the control gate’s connection to the outside world has been severed
In recent years, floating gate memory devices have become the overwhelm-
ing technology of choice for those applications that require non-volatile semi-
conductor memory [110, 41, 5, 7]. NVM (non-volatile memory) schemes are
currently the only storage option available for small pieces of equipment that,
because of their size or portability, do not contain a mechanical storage device
for program code and data, such as a hard disk. The consumer goods market
is seeing the proliferation of such devices. Smart phones, tablet computers
and in-car GPS are commonplace examples of new, yet well-established,
markets for NVM technologies. In the recent past this type of memory product
was most often NOR flash, referred to as XIP or Execute In Place flash
memory, and was limited to code storage. NVM has also seen unparalleled
growth in non-program code storage, i.e. data storage, in such devices as
memory sticks, digital cameras and media players.
The growth of NVM has been staggering and is shown in figure 2.2. In each
of the five years up to 2008, flash memory market growth has either outpaced
or equaled that of the total integrated circuit (IC) market [131, 7]. Flash
memory is currently a 25 billion dollar industry, with NAND comprising 20
billion of the total. All commentators forecast that it will continue
growing, as shown in figure 9.5, and the growth rate shows no sign of abating
as new markets are opened by the availability of cheap NAND flash. “The NAND
market has grown faster than any technology in the history of semiconductors,
exceeding 11 billion in 2006, only a decade after its introduction” [Jim
Handy, Objective Analysis, Flash Memory Summit 2008].
Figure 2.2: Flash memory growth in millions of US dollars
Figure 2.3: Flash memory growth forecast to 2016
More recently, the solid state drive (SSD) has grown out of the NAND space
and is exposing all other bottlenecks in both PC design and big data storage
and warehousing. NAND flash drives have become a ‘must have’ in all data
warehousing sites due to the breakthrough speeds of SSDs [123]. In stark
contrast to hard disks, SSDs have no moving parts and have multiple data
paths. This makes them far faster than hard disks, which in turn is driving
the cloud computing revolution, since near-instant access is a prerequisite
for moving storage from the local desktop to a remote warehouse. Meanwhile,
new markets for flash continue to grow both in the enterprise class, aimed
at high-end and corporate users, and at the consumer end in such things as
PCs and tablets.
Figure 2.4: SSD growth forecast in light of cheap NAND Flash
In general, the XIP program code segment of the market is serviced by
the so-called ‘NOR’ NVM [9], which allows direct random access to each byte
of memory via an address bus scheme [19], while the data storage segment is
serviced by ‘NAND’ NVM, which allows only sequential access to the data but
can be made in larger arrays and has a much reduced cost per bit [123, 7, 19].
Both of these technologies are electrically erasable.
Sequential access is usually not a problem for data such as music or video
since its natural form is sequential. NAND is also less reliable than NOR [9],
but again, single bit errors in sequential data such as MP3 or JPEGs are less
serious than bit errors in program code data.
NAND is not limited to this type of data, however, and has huge potential
in the nascent PC and enterprise permanent storage market, where many
believe that it will, in the future, replace mechanical disks as the permanent
storage element [5]. A key attribute of semiconductor memory is that it is
very fast. This is because it has no moving parts and can be arranged
in parallel arrays for data access, while mechanical hard disks rely on a
single moving arm to position the heads for data access [23]. NAND also has
relatively low power consumption. These factors make flash very attractive
for enterprise class data storage, and for critical mobile applications.
2.2 The Erasable PROM Family
All current erasable NVMs use some sort of floating gate technology. The
traditional erasable NVM, or EPROM, is writable only after it has been erased
for a significant number of minutes under ultraviolet (UV) light. The UV light
knocks charge off the floating gate to bring it to the erased state (there
is a quartz window on the device to facilitate this), while electrical methods
are used to bring the cells to the written state. However, once programmed,
it will retain data without power for many years and it can be made in sizes
of up to several megabytes.
This type of memory is well suited to embedded code sets and other appli-
cations in which the data may change before the equipment in which it serves
leaves the manufacturing environment. If, on the other hand, the application
requires non-volatile memory that is modifiable in the field during the life of
the host device, then an approach other than UV erasure is required. This
market niche is filled by electrically erasable programmable read-only memory
(EEPROM, also written E²PROM or E2PROM). The key feature here is that these memory types
are re-writable at run time under program control.
2.2.1 E2PROM Derivatives
To electrically erase PROM memory, extra circuitry is required [105]. This
circuitry, if it is to be repeated on all cells (bit or byte erasable E2PROM)
adds such a significant circuitry overhead to the device that the packing
density (the number of memory cells per unit area of silicon), is much reduced.
This means that it is hard to make large arrays of byte erasable E2PROM.
To make a large array, this overhead must somehow be cut down. This
can be achieved by applying a common erase circuit to all the cells. Doing
this has the effect of erasing all the cells in the entire device when an erase is
required. However, each byte is still available to read randomly by manipu-
lating an address bus. This is called a bulk erasable E2PROM. Bulk erasable
memories are unsuited to many applications due to the requirement to erase
all locations in order to alter the data in a single memory byte or bit.
An intermediate step is to erase blocks of data together and use a common
erase circuit for each block. Flash memory, so named because segments or
blocks of memory are erased or flashed together [52], which reminded the
inventor of the action of a camera flash [54], is, for many, the ideal
compromise of space and usability.
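The cost of block erasure can be made concrete with a toy model. The
following is a minimal sketch under stated assumptions, not a description of
any real device: the class and method names are hypothetical, the erased
value of a byte is taken to be 0xFF, and programming is assumed to be able
only to clear bits (1 to 0), so that changing a single byte requires the
whole block to be erased and rewritten.

```python
ERASED = 0xFF  # assumed erased value of a byte


class BlockErasableFlash:
    """Toy model: programming only clears bits; erasure works on whole blocks."""

    def __init__(self, num_blocks=4, block_size=8):
        self.block_size = block_size
        self.cells = [ERASED] * (num_blocks * block_size)
        self.erase_counts = [0] * num_blocks  # wear accumulates per block

    def read(self, address):
        return self.cells[address]

    def program(self, address, value):
        # Assumption: a write can move bits from 1 to 0 but never back.
        self.cells[address] &= value

    def erase_block(self, block):
        start = block * self.block_size
        self.cells[start:start + self.block_size] = [ERASED] * self.block_size
        self.erase_counts[block] += 1

    def update_byte(self, address, value):
        """Change one byte: copy the block, erase it, then write it all back."""
        block = address // self.block_size
        start = block * self.block_size
        image = self.cells[start:start + self.block_size]
        image[address - start] = value
        self.erase_block(block)
        for offset, byte in enumerate(image):
            self.program(start + offset, byte)
```

In this model a one-byte update costs one erase of the containing block,
which is why erase granularity matters for both usability and wear.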
Further space saving can be achieved with the shared use of circuitry
associated with reading the device. These are called NAND devices and are
characterised by portions of the device sharing a single read circuit (known
as a page), while a multiplicity of pages share a single erase circuit (known
as a block) [123]. Pages are read serially rather than randomly, as they are
in NOR, so if a single byte or bit is required from memory, the entire page
must be read at once.
A further innovation to increase packing density is the use of MLC
(multi-level cells) [123]. This is where more than one bit of information is
stored in a single memory cell by detecting how much charge is stored on the
memory element, rather than whether there is charge stored on it [25].
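The difference between single-level and multi-level sensing can be sketched
as follows. This is an illustration only: the normalised charge scale, the
reference levels and the two-bit symbol mapping (a Gray code, so that
adjacent levels differ by one bit) are all assumptions chosen for the
example.

```python
from bisect import bisect_right

# Hypothetical reference levels on a normalised charge scale (0 = no charge).
MLC_REFERENCES = [0.25, 0.50, 0.75]
# Assumed Gray-coded symbols, from least charge (erased) to most charge.
MLC_SYMBOLS = [0b11, 0b10, 0b00, 0b01]


def read_slc(charge, reference=0.5):
    """Single-level cell: one comparison, is there charge or not?"""
    return 1 if charge < reference else 0


def read_mlc(charge):
    """Multi-level cell: locate the charge between references, two bits out."""
    return MLC_SYMBOLS[bisect_right(MLC_REFERENCES, charge)]
```

The same cell yields one bit under read_slc and two bits under read_mlc, at
the price of narrower margins between adjacent charge levels.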
These approaches to the packing density constraint give rise to three broad
categories [110, 5, 104] of E2PROM:
• Byte Erasable: every byte is erasable separately. This is generally
referred to as E2PROM;
• Flash Memory, of which there are a number of subtypes designed to
optimise data packing density:
– Block Erasable Flash: cells are grouped together for erasure.
Available in NOR and NAND architectures;
– Block Erasable MLC Flash: cells are grouped together for erasure and
for reading, and a cell carries more than one bit of data;
• Bulk Erasable: all cells in the device are in the same group and are
erased together.
Flash memory development is a rapidly moving field [110, 5] and a num-
ber of differing approaches and cell architectures are currently in use [89].
Implementation differences further complicate the landscape giving rise to
many different flash memory variants.
2.2.2 Program and Erase Cycle Management
First generation flash and EPROM typically require the host system or pro-
grammer to execute complex algorithms to achieve the write and erase pro-
cesses. Current generation flash devices have an on-chip state machine or mi-
croprocessor that automates write and erase cycles [105, 3, 4]. This frees up
the host system to carry out other tasks while the state machine is working,
and simplifies design-in of flash by reducing the software overhead necessary
to manage the device.
During a write, the state machine controls all parameters of the write,
including pulse timing, voltage level and rise-time. It also tracks the number
of pulses issued to the cell and verifies that the data was written correctly.
When executing an erase, the state machine typically first writes all
locations within the block to the written state “0H00” so that each cell
contains uniform charge. The state machine then issues the erase pulses to the
cells within the block and monitors the erase for completion, the erased state
in this case being “0HFF”. Again, the state machine has control over all
parameters of the erase pulse.
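The erase sequence described above (pre-program, pulse, verify) can be
sketched as a short loop. This is a simplified illustration, not the
algorithm of any particular device: SimulatedBlock, the per-byte pulse
requirement and the max_pulses budget are hypothetical.

```python
WRITTEN, ERASED = 0x00, 0xFF


class SimulatedBlock:
    """Hypothetical block: byte i reads as erased after pulses_needed[i] pulses."""

    def __init__(self, pulses_needed):
        self.pulses_needed = list(pulses_needed)
        self.pulses_seen = [0] * len(self.pulses_needed)
        self.data = [ERASED] * len(self.pulses_needed)

    def program_all(self, value):
        self.data = [value] * len(self.data)
        self.pulses_seen = [0] * len(self.data)

    def erase_pulse(self):
        for i, needed in enumerate(self.pulses_needed):
            self.pulses_seen[i] += 1
            if self.pulses_seen[i] >= needed:
                self.data[i] = ERASED


def erase_block(block, max_pulses=10):
    """Return the number of pulses used, or raise if the block will not erase."""
    block.program_all(WRITTEN)  # step 1: bring every cell to uniform charge
    for pulse in range(1, max_pulses + 1):
        block.erase_pulse()     # step 2: issue one erase pulse
        if all(byte == ERASED for byte in block.data):  # step 3: verify
            return pulse
    raise RuntimeError("block failed to erase within the pulse budget")
```

The slowest byte sets the pulse count for the whole block, so every other
cell receives more erase stress than it strictly needs.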
Significantly, these algorithms are not session aware, in that no account is
taken of how many times a particular cell has been through the erase/write
cycle. This will have a bearing on device life due to the destructive nature
of the erase/write cycles. A single bit failing to erase or write will condemn
as failed the entire code memory array, as well as any equipment containing
that device. Thus endurance, the ability to withstand multiple erase/write
cycles, is a critical specification sheet metric.
2.3 NVM Floating Gate Basics
There are many different flash memory architectures and variants. How-
ever, all flash memory utilises floating gate technology using some specialised
means to get electrons on and off the floating gate. The floating gate memory
element consists of an electrode that is completely encased in an insulator,
which acts as the gate electrode to a MOSFET (metal-oxide-semiconductor
field effect transistor). Simply put, when there is charge on the floating
gate, it causes a constriction in the conduction channel of the nearby tran-
sistor, thereby reducing the current that can flow. The higher the charge,
the greater the constriction, until no current can flow and the device is said
to be at pinch-off. The next section contains a more classical approach to
MOSFET operation.
Figure 2.5: Idealised MOSFET memory element
2.3.1 MOSFET Operation
The operation of a MOSFET can be separated into three different operat-
ing regions, depending on the voltages at each of the three terminals: source,
drain and control gate [134, 121]. For an enhancement-mode n-channel MOS-
FET, the three operational states are cut-off, linear region and saturation.
Figure 2.6: MOSFET transistor and schematic symbol
• During cut-off, when VGS, the gate-source bias, is less than VT, the
threshold voltage², no current flows according to the simple threshold
model. In reality, a subthreshold leakage current flows from source to
drain, which varies exponentially with gate-to-source bias.
² The value of the gate-source voltage at which the conducting channel is said to be
conducting, that is, exceeding a chosen value IT, the threshold current.
• During linear mode, the transistor is turned on, and a channel is present
which allows current to flow between the drain and the source. In this
mode, the MOSFET operates like a resistor that is controlled by the
gate voltage, where the voltage across the resistance is the voltage from
drain to source, VDS:
$I_D = \mu_n C_{ox} \frac{W}{L} \left[ (V_{GS} - V_t) V_{DS} - \frac{V_{DS}^2}{2} \right]$ (2.1)
where W and L are the gate width and length, and $\mu_n C_{ox}$
equates to the charge-carrier efficiency over the area of
the oxide gate.
• The third region is saturation, also referred to as pinch-off. The transistor
is on, with the current controlled principally by the gate-source voltage:
$I_D = \frac{\mu_n C_{ox}}{2} \frac{W}{L} (V_{GS} - V_t)^2$ (2.2)
Figure 2.7 shows the drain-to-source current as a function of
drain-to-source voltage at a number of gate voltages for a typical small
signal FET. The value of the passed current shown will vary depending
on FET design. The dotted parabola depicts the boundary between the linear
and the saturated regions. We can summarise the conduction in the drain-
source channel as:
$V_t = K - \frac{Q}{C_{ox}}$
where K is a constant that depends on the gate and substrate material,
doping, and gate oxide thickness, Q is the charge weighted with respect
to its position in the gate oxide, and Cox is the gate oxide capacitance [89]. As
can be seen, the threshold voltage of the MOSFET can be altered by changing
the amount of charge present between the gate and the channel [53]. In NVM,
we use a second control gate, the floating gate, to position charge close to
the conduction channel and modify the position of Vt. We can detect this
change by carefully measuring the current from source to drain. Conduction
above and below a chosen point, the threshold current, indicates whether or
not the device is holding charge.
Figure 2.7: Drain to source current response over drain to source voltage for
various gate voltages, i.e. the behaviour of MOS field effect transistors
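The three operating regions and the charge-dependent threshold can be tied
together in a few lines. This is a sketch under stated assumptions: it uses
the standard textbook square-law MOSFET model, lumps the device constants
into a single gain k, ignores subthreshold leakage, and takes the threshold
shift as K minus Q over Cox. All numerical values (read voltage, threshold
current, gain) are arbitrary choices for illustration.

```python
def drain_current(v_gs, v_ds, v_t, k=1e-3):
    """Square-law drain current in amps for an n-channel enhancement MOSFET.

    k stands in for mu_n * C_ox * W / L (A/V^2); its value here is arbitrary.
    """
    overdrive = v_gs - v_t
    if overdrive <= 0:
        return 0.0  # cut-off (subthreshold leakage ignored)
    if v_ds < overdrive:
        return k * (overdrive * v_ds - v_ds ** 2 / 2)  # linear region
    return 0.5 * k * overdrive ** 2  # saturation


def threshold_voltage(k_const, charge, c_ox):
    """Threshold shift due to stored charge: electrons (negative Q) raise Vt."""
    return k_const - charge / c_ox


def read_bit(v_t, v_read=5.0, v_ds=1.0, i_threshold=1e-4):
    """Sense the cell: conduction above the threshold current reads as 1."""
    return 1 if drain_current(v_read, v_ds, v_t) > i_threshold else 0
```

With these assumed values, a cell with a low threshold voltage conducts
under the read bias and reads as 1, while a cell whose threshold has been
raised above the read voltage stays in cut-off and reads as 0.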
2.3.2 The MOSFET as a Memory Storage Element
The electric charge on the floating gate constricts the flow of electrons in
the conduction channel, which is the memory sense element. Conduction in
the sense element above and below a predefined threshold yields logic One
or Zero. Although, as is suggested by its name, the floating gate is entirely
surrounded by an insulating material (that is, its connection to the outside
world is severed), by applying atypical voltage conditions it is possible to
induce Channel Hot Electron (CHE) injection [50, 22] and/or Fowler-Nordheim
(FN) tunneling [34] to control the charge on the floating gate and thus alter
the conduction in the channel and so the state of the memory element.
In essence, we use high voltages to accelerate and punch electrons through
the very thin tunnel oxide insulator to reside on the floating gate, where they
influence the conduction of the MOSFET transistor. This insulator can be as
little as 7 nm (70 angstrom) thick. In the absence of these special methods,
the device will retain its charge even when the power is switched off.
2.3.3 The Electrically Erasable Cell
Most electrically erasable or E2PROM devices share the same cell structure as
the UV erasable EPROM cell [17]. Both the E2 and EPROM cells are dual
polysilicon (poly), floating-gate field effect transistors [76]. The first poly
layer is isolated from the control gate by an interpoly dielectric layer and
from the substrate by a thin oxide layer, as can be seen in Figure 2.8. This
isolation allows the first polysilicon layer (floating gate) to store charge.
The second poly layer is connected to the wordline and functions as the
control gate.
Figure 2.8: EPROM structure
However, there are two main differences between an E2 cell and an EPROM
cell, as shown in Figure 2.9, that allow for electrical erasure of the cell.
Firstly, E2 cells have a thinner oxide layer of approximately 70 angstroms to
enable movement of electrons to and from the floating gate during programming
operation. In addition, flash has a deeper source diffusion to further enhance
programming performance. The bitline and the wordline in Figure 2.9 are
arranged in a matrix in order to select a single bit or byte for writing,
reading or erasing [17].
Figure 2.9: Flash structure versus EPROM structure
2.3.4 Writing Using Hot Electron Injection
Figure 2.10: Channel Hot Electron (CHE) injection mechanism
Flash, E2 and EPROM implement Channel Hot Electron (CHE) injec-
tion [22, 50] during a write, to place charge on the floating gate. A high
programming voltage (VPP = 12V) is placed on the control gate. This
forces an inversion region to form in the p-type substrate. The drain voltage
is increased to approximately half the control gate voltage (6 volts) while the
source is grounded (0 volts), increasing the voltage drop between the drain
and source. With the inversion region formed, the current between drain
and source increases. The resulting high electron flow from source to drain
increases the kinetic energy of the electrons. This causes the electrons to
gain enough energy to overcome the oxide barrier and collect on the floating
gate. After the write is completed, the negative charge on the floating gate
raises the cell’s threshold voltage (Vt) above the logic ‘one’ voltage and into
the logic ‘zero’ area. A sense amp detects the cell current and outputs a ‘0’
for a written cell [76].
2.3.5 Erasing Using Fowler-Nordheim Tunneling
Figure 2.11: Fowler-Nordheim tunneling mechanism
Fowler-Nordheim tunneling [34, 51] or F-N tunneling is employed to re-
move electrons from the floating gate. In order to bring the memory cell
to the erased state using high-voltage source erase, the source is brought
to a high voltage (VPP = 12V), the control gate is grounded (0 volts) and
the drain is left unconnected. The large positive voltage on the source, as
compared to the floating gate, attracts the negatively charged electrons to
the source through the thin oxide. The silicon is sometimes shaped here
to form an injector [121], which improves the efficiency of the erase by the
concentration of the electric field.
Because the drain is not connected, the erase function is a much lower
current-per-cell operation than a write. This fact means this erase function
is better suited to bulk erase type operations than CHE injection [41, 121].
However, the erase process is more destructive to cell integrity than the write
method.
After the erase is completed, the lack of charge on the floating gate lowers
the cell’s Vt (Threshold Voltage) below the logic ‘one’ voltage. When an
erased cell’s wordline is brought to logic ‘one’ during a read, the transistor
will turn on and conduct more current than a written cell. Some flash devices
use Fowler-Nordheim tunneling for writes as well as erases.
2.3.6 Reading the Cell
During a read of a byte or word of data, the addressed row (wordline) is
brought to a logic ‘1’ level (≥ Vt of an erased cell). This condition turns on
erased cells which allows current to flow from drain to source, while written
cells remain in the off state with little current flow from drain to source. The
cell current is detected by a sense amp and amplified to the appropriate logic
level for the outputs [76]. All other wordlines within the array remain low.
Because only one wordline needs to be controlled at a time during a read,
the decode overhead is minimized.
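The write, erase and read conditions described in this section can be
collected into a small sketch. The write and erase voltages are those quoted
in the text; the wordline read voltage, the threshold values in the example
array and the function name read_word are assumptions made for illustration.

```python
# Bias conditions quoted in the text, in volts (None = left unconnected).
BIAS = {
    "write (CHE injection)": {"gate": 12.0, "drain": 6.0, "source": 0.0},
    "erase (FN tunneling)": {"gate": 0.0, "drain": None, "source": 12.0},
}


def read_word(vt_array, row, v_wordline=5.0):
    """Read one row of a NOR array given each cell's threshold voltage.

    Only the addressed wordline is raised. Erased cells (low Vt) turn on and
    read as 1; written cells (Vt above the wordline voltage) stay off and
    read as 0. The 5.0 V wordline level is an assumed value.
    """
    return [1 if vt < v_wordline else 0 for vt in vt_array[row]]


# Hypothetical 2 x 3 array: 2.0 V for erased cells, 7.0 V for written cells.
example_array = [
    [2.0, 7.0, 2.0],
    [7.0, 7.0, 2.0],
]
```

Reading row 0 of example_array gives the bits 1, 0, 1, since only the middle
cell of that row has been written.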
2.4 Failure Mechanisms
The failure rate of floating gate arrays is determined in part by dielectric de-
fects, metallisation defects, surface defects, faulty diffusions and other MOS-
type failure mechanisms [55]. The failure rate will be a function of the process
and circuit complexity [110].
In addition to the failure mechanisms in common with MOS semiconductors,
floating gate circuits exhibit additional failure mechanisms that are related
to the non-volatile characteristics of retention and endurance [110, 41, 5, 18].
Figure 2.12: NOR cell memory array
Failure mechanisms in flash and E2PROM are concentrated around two
measurable quantities: data retention, the ability to retain the value present
on the floating gate for extended periods (up to 10 years), and endurance,
the ability to cycle each memory cell through its complement a vast number of
times (up to 100,000 depending on application [94]). All failures together
give rise to the classic bathtub curve of flash memory life, shown in figure
2.13 [110, 100]. We see high early, or infant, mortality; these devices can
be screened out at manufacture. This high mortality is followed by a stable
useful-life stage with low random failure rates, and finally we see window
closure, in
which the devices reach the end of their useful life. In flash, this end stage is
usually associated with endurance problems or endurance related retention
failure [56]. The increase in total failure rate that results from endurance and
retention failures will vary widely, depending on the cell type, process and
manufacturer’s screening and quality control techniques [11]. In general, as
the channel and gate oxide have been scaled to deep sub-micron dimensions,
the ultra-thin gate oxide layer tends to suffer from leakage and implantation
problems. Read error rate can also cause window closure, especially in NAND
devices where, in recent years, geometries have shrunk to sub-20 nanometre
levels [77], meaning fewer and fewer electrons represent each logic state [81].
Many read errors, such as those related to disturb and charge loss, can be
remedied with an erase. However, when read errors become so common that
the page or block can no longer be reliably read using the error correction
employed by the system, we must stop using that block. In most cases, the
system can withstand a number of block failures, but as the failed block
count increases, the device is effectively at end of life.
Figure 2.13: Bathtub curve showing the flash memory life cycle
The flash memory life cycle is characterised by having a high infant mortality rate, with
these devices being screened out by 100% testing during manufacture, a low random failure
rate during useful life and finally end of life associated with endurance related failures.
2.4.1 Retention Problems
While a data retention requirement of ten years will mean a charge loss of
as little as 5 electrons a day [89], many retention failures in NOR are, in
fact, failures that are associated with the end of endurance, not as a result of
intrinsic retention [102]. Measurement of intrinsic retention is quite difficult
to achieve in a reasonable time frame, and most screens involve a bake at
high temperatures. At high temperatures, apparent activation energies of
1.7 eV (electron volts) can be achieved [110], and the
Arrhenius relationship [55] is then applied to yield a retention figure. Intrinsic
retention problems occur primarily for three reasons:
• Weakness in the oxide, which causes charge on the floating gate to
dissipate
• Thermo mechanical stress that occurs during the packaging stage of
manufacture
• Sodium contamination during the fabrication doping process
Since the most excessive dissipation will occur during the deepest erase,
all of the above can be screened out during manufacture [58], prior to usage in
the field, by the application of screening erase charges that are stronger than
anything the cell will normally encounter. Once removed, any remaining
retention problems are due to tired silicon brought on by program/erase
cycling.
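As a rough illustration of how a high-temperature bake is extrapolated to a retention figure, the following Python sketch computes an Arrhenius acceleration factor. The 1.7 eV activation energy is the figure quoted above; the bake and use temperatures, the bake duration and the function names are assumptions chosen purely for illustration and are not values used in this work.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def acceleration_factor(ea_ev, t_use_c, t_bake_c):
    """Arrhenius acceleration factor between a use and a bake temperature.

    ea_ev    -- apparent activation energy in eV
    t_use_c  -- assumed field (use) temperature in degrees Celsius
    t_bake_c -- assumed bake (stress) temperature in degrees Celsius
    """
    t_use_k = t_use_c + 273.15
    t_bake_k = t_bake_c + 273.15
    exponent = (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_bake_k)
    return math.exp(exponent)


def equivalent_use_hours(bake_hours, ea_ev, t_use_c, t_bake_c):
    """Hours at the use temperature represented by a bake of bake_hours."""
    return bake_hours * acceleration_factor(ea_ev, t_use_c, t_bake_c)


if __name__ == "__main__":
    # Hypothetical screen: 48 hours at 250 C, extrapolated to 55 C.
    hours = equivalent_use_hours(48.0, ea_ev=1.7, t_use_c=55.0, t_bake_c=250.0)
    print(f"equivalent use time: {hours / 8760.0:.3g} years")
```

The result is very sensitive to the activation energy assumed, which is one reason why intrinsic retention figures derived from a bake should be read as estimates.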
2.4.2 Endurance Failure Mechanism
The effect of endurance cycling on failure rate is a function of cell and circuit
design, as well as process quality [119, 111]. Neither the write nor the erase
cycles are benign processes and both exact a toll on a memory cell’s ability
to be programmed. This is caused by hot electron trapping in the charge
transport oxide [51] showing up as erratic erase bits [22, 18], or dielectric
breakdown.
In the event, it is the F-N Tunneling during the erase that dominates the
destruction of the floating gate array. As the electrons are punched through
the insulating oxide, some charge gets trapped and obstructs the path of
subsequent electrons [41, 89]. The accumulated trapped charge in the oxide
eventually makes the cell un-programmable. That is, the cell’s state can no
longer be altered. Traps and oxide damage occur in proportion to the time
integral of the current density that flows through the oxide; in other words, the
strength and duration of the erase.
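The proportionality described above can be written compactly. The symbols below are introduced here for illustration only and do not appear in the cited sources: J(t) is the tunnelling current density through the oxide during an erase of duration t_erase, Q_inj is the charge injected per unit area, and N_trap is the resulting density of trapped charge.

```latex
\[
  Q_{\mathrm{inj}} = \int_{0}^{t_{\mathrm{erase}}} J(t)\,\mathrm{d}t ,
  \qquad
  N_{\mathrm{trap}} \propto Q_{\mathrm{inj}}
\]
```

Under this simple view, either a lower current density (a weaker erase) or a shorter erase reduces the damage accumulated per cycle.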
Once a single bit is stuck in a single state in a NOR device, the entire
array is useless. The specified endurance value of the memory matrix is a
measure of how many write and erase cycles the device can sustain before
it becomes unprogrammable. Typically, this is between 10k and 100k [89].
This is a critical metric of the device. If wear-out can be minimized, then
the life of the part may be more favourably specified.
Detrapping of electrons back to the floating gate occurs during the intervals
between erases. Failures can be recovered by additional or longer erase
cycles, or by waiting for the natural recovery time, the interval between
erase/program cycles, before reusing the affected cell. Recovery time is an important
metric in establishing the reliability of flash memory devices [56]. It is
especially important in NAND flash where the usage model dominates the
life expectancy of the device in any application [57, 128].
2.4.3 Disturb Failures
A disturb is the inadvertent change of state of a floating gate as a result of
voltages applied to other nodes in the array during the course of read, write
or erase operations.
Disturbs have been observed for as long as floating gate memory elements
have been in use [80, 79]. In order to obtain smaller cell sizes, select transis-
tors that have served to minimize the likelihood of disturbs in byte-alterable
E2PROMs have been omitted in many flash technologies [110].
Disturbs result from the same processes that are used for programming
and erase operations (CHE and F-N). Fortunately, the probability of inadvertent
injections of charge to or from the floating gate falls off exponentially
as the applied voltage is reduced. Many mechanisms can supply the conditions required
to produce CHE and F-N based disturb, which may be characterised in a
number of ways.
Disturbs can be characterised by the physical mechanism that caused
them, such as CHE disturb, by the mode in which they occur, such as read
disturb, or by whether they increase or decrease the net number of electrons
on the floating gate [80]. Furthermore, each major architecture is susceptible
to some disturbs and immune to others. For more information, there is a
comprehensive discussion on disturb in the IEEE document 1005-1998 [110].
For the purposes of this document, disturb issues will be handled in the
architecture description of the pertinent cells.
2.4.4 Programming Parameter Calculation
There are parameters which affect the wear-out rate of the memory device,
such as erase duration, rate of change of potential and programming cur-
rent [42]. These parameters are controlled by registers inside the flash chip’s
state machine. The values for these registers are ascertained by exhaustive
testing during design, testing and qualification of the part [101]. Optimum
values are often hard to determine since the variables are often interdepen-
dent and must cover all devices and all stages of life, including end-of-life
conditions. Once values have been established, they are embedded into each
device for use during the entire lifetime of the memory chip. This process
is a one-time effort and covers all manufactured devices of this part num-
ber. It is a very expensive undertaking in terms of engineering manpower
and equipment, but takes no account of manufacturing variations or the con-
dition of the device as it progresses through its life in the field. This is
reflected in specification sheet endurance values, which are degraded in order
to guarantee the worst-case scenario.
2.5 Flash Cell Architectures
Non-volatile memory architecture is highly varied and depends heavily on
the application area. Space and cost per bit are two of the most important
competing factors giving rise to a rich diversity of NVM cell body plans [7].
NAND and MLC structures are the most efficient space savers, so their costs
per bit are the lowest on the market. These memory types are restricted to
serial access and are not suited to XIP operations [118, 116]. There are many
NOR architectures to consider, but the three most important - those that




For a treatment of other less mainstream devices, see the IEEE document
1005-1998 [110], ICE corporation Flash Memory Technology [52] or Pavan et
al. [89]. Table 2.1 shows the programming methods in use for the cells listed.
Table 2.1: Comparison of programming methods
          Stacked Gate      Two Transistor    Split Gate
Erase     F-N tunneling     F-N tunneling     F-N tunneling
Write     CHE, drain side   F-N tunneling     CHE, source side
Current   1 mA              10 pA             1 µA
F-N: Fowler-Nordheim; CHE: Channel Hot Electron
2.5.1 Two Transistor
Although this cell has a number of desirable attributes, it has become ap-
parent over time that its cost is a major drawback. The reason for the high
cost per bit is the inclusion of at least two transistors per bit. This extra
select transistor makes the cell popular for use in byte erasable E2PROMs
but it cannot compete with other single transistor technologies for use in
large arrays. While the select transistor affords some protection from pro-
gram and erase disturb problems [36], the passage of electrons under high
potential through the relatively thin oxide causes other problems, such as
damage to the oxide lattice structure and electron entrapment. This charge
can be trapped throughout the entire volume of the thin oxide and not just
at the injectors, as in split gate cells. Electrons can move from site to site
from the floating gate to the substrate, manifesting themselves as leakage of
charge from the floating gate. These leaky bits cause data retention failures
related to program/erase (p/e) cycling, eventually causing an oxide rupture
and an immediate discharge of floating gate charge.
2.5.2 Stacked Gate
Figure 2.14: Stacked gate EPROM structure
The stacked gate is shown in Figure 2.14 and is, in many ways, the classic
NVM cell as discussed earlier in this chapter. The select transistor is omitted,
allowing greater cell density and thus larger arrays within the same die
size and cost point. The isolation of the first polysilicon layer (the floating
gate) allows the cell to store charge. The second poly layer is connected to
the wordline and functions as the control gate. However, there is no select
transistor as there is in the two-transistor cell. This omission leads to a tendency
for the cell to suffer from erase disturb issues [92, 122]. Also, trapped
charge can create dispersion in the erase threshold due to uncontrollable
quantum-mechanical effects, leading to erratic erase problems. This in turn
makes the cell prone to the over erase condition [21]. Over erase manifests
itself as single columns of memory cells being stuck at ‘1’ during Read. Once
stuck, these columns cannot be reprogrammed. Hence, over erase cannot be
recovered by using standard operating conditions. A further weakness in the
stacked gate approach is the high source-drain current during programming,
which can cause electromigration, interface trapping [1], and contact spiking
problems at the source and drain, resulting in single bit, row, or column
failures. The high current requirement for erasing large blocks also creates
challenges in terms of the on-board charge pump, where sourcing the current
is a significant problem [110, 104].
2.5.3 Split Gate
The split gate cell can be considered a hybrid of the stacked gate cell and a
two-transistor cell. The split gate configuration is formed by using a single
poly segment to form both the control gate and a select transistor on the
source side of a floating gate, as shown in Figure 2.15. This configuration is
sometimes referred to as a one-and-a-half-transistor cell, or 1.5T cell.
Figure 2.15: Split gate memory cell
The split gate memory cell configuration allows the memory cell threshold
voltage to be determined by the floating gate transistor when the memory
cell is programmed, and by the split-gate transistor when the memory cell is
erased. Programming of the cells on the unselected rows is therefore inhibited
by the source-side split gate or select transistor. This means that program
disturb issues are less severe in this cell than in many others, and they are,
in the main, a result of manufacturing defects which can be screened out by
100% testing [103, 110] during fabrication. Read disturbs are not an issue
because the cells are usually not read long enough to reach a significant
probability of failure [13].
Figure 2.16: Split gate memory cell die photograph
The split gate configuration also allows the erased threshold distribution
to be tightly controlled since all the floating gate transistors are over-erased
to depletion. This means that the cell is effectively immune to the over erase
problem [10]. The erase current is low at 100 µA, so bulk erase can easily be
achieved using an on-chip charge pump.
The strong localised fields in the immediate neighbourhood of the edge
injector will create electron trap sites [1, 53]. Eventually, enough electrons
are trapped in this localised region to prevent single bits from erasing. The
amount of charge trapped is dependent on the total charge transferred [1, 53],
i.e. the strength of the erase and its duration. This is moderated somewhat
by detrapping. Detrapping of electrons back to the floating gate occurs dur-
ing the intervals between erases. Some split gate device failures can be re-
covered by waiting the natural interval between erase/program cycles before
reusing the affected cell, or by additional erase cycles.
There are few interface trap sites generated, and few charges trapped in
the oxide other than at the injector [1]. Electromigration, contact spiking
and junction ruptures are less of a problem with the split gate device since
it has a low source-drain current during programming.
This split gate cell is not the most space efficient option available, so it
is less well suited to vast memory arrays than NAND
MLCs are. However, it is ideally suited to embedded XIP solutions since
code spaces do not tend to be vast, but do require random access with very
low read errors and excellent retention figures. The split gate device is suited
to this research for a number of reasons:
• It is the most popular NOR architecture and therefore represents the
most significant portion of the NOR market
• The technical features outlined above, that is, the lack of an over erase
or a significant disturb mechanism [92], mean we may experiment without
further screening
• Due to the support of Analog Devices Inc. (ADI) in the early part of this
research, we have access to large quantities of unused pre-production
NOR arrays of this type
2.6 Summary
In this chapter we review the basics of silicon memory. The main sub-categories
of non-volatile silicon storage are discussed and floating gate principles
are explored in some depth. The importance of cost per bit and structure
size is highlighted with some discussion on the various approaches used
to achieve cost reduction for the market. Failure mechanisms are considered
for each of the major flash memory storage cell architectures and the classic
bathtub curve of memory life is presented. Finally we explain why the split




The tenet of the work presented in this thesis is the application of a bio-
logically inspired computation method known as a genetic algorithm to the
problem of NOR flash wear-out. GAs are part of a broader class of methods
named evolutionary algorithms (EA). This chapter presents a general
introduction to EA in Sections 3.1 and 3.2, dealing with operators, represen-
tations and the terminology of EAs. In Section 3.3, the most important EA
paradigms are discussed with an outline of their operation. GAs are explored,
in that section, from several different perspectives. Finally in Section 3.4,
some other pertinent algorithms are mentioned and we go on in Section 3.5
to survey some of the more salient EA and NOR flash literature, to provide a
context to the choices made in attempting to solve the problem at hand.
3.1 Evolutionary Computation
Historically, the principles of evolution have helped scientists to explain the
characteristics of biological and ecological systems. Darwin’s seminal work
‘On the Origin of Species’ [26] provides us with a well-proven explanation
for the mechanisms of adaptation and phenotypic variations in organisms
under environmental selection pressures. The application of these principles
to computation, or to evolutionary computation (EC), enables us to trace
the roots of some of the key ideas back to the 1930s and Sewall Wright’s
influential work on evolutionary theory [129]. However, it wasn’t until the
advent of relatively cheap computing cycles in the 1960s that evolutionary
computation became a viable proposition.
This was a catalyst for the field, and around this time, one sees important
work on evolving finite state machines by Fogel at UCLA [32] and, in 1966,
by Holland at the University of Michigan on evolving robust adaptive systems [47],
and by Rechenberg and Schwefel at the Technical University of Berlin on
parameter optimisation [98, 99].
Evolutionary search methods have had tremendous success in solving
smooth unimodal and noisy multi-modal problems [38, 28, 120] as well as combinatorial
optimisation problems [91]. This is explored further in Section 3.5.
This success has extended to previously unsolved or difficult-to-solve prob-
lems [33] by virtue of the iterative strength of the computer coupled with the
flair of natural selection in a structured yet randomised information exchange
between generations of gene carriers.
An evolutionary system is a Darwinian process that, given initial condi-
tions, follows a trajectory over time through a complex state space, under
the guidance of evolutionary pressure.
For true Darwinian evolution to exist, it must embody the following attributes [28]:
• One or more populations of individuals competing for a limited resource
• The population must be dynamic in that it involves the birth and death
of individuals
• The concept of some individuals being fitter and thus more able to
survive and reproduce
• Variation, in that offspring may resemble parents but are not identical
to them
EC is a weak artificial intelligence (AI) method, as opposed to the strong
AI methods of earlier work [15]. The fundamental difference is that strong
methods rely on a programmer to explicitly represent domain knowledge and
describe to the system how this knowledge is to be used [78], for example in an
Expert System [68, 112]. Weak methods, on the other hand, generally do
not contain any explicitly programmed knowledge, and instead attempt to
learn and organise domain specific knowledge themselves. EC methods are stochastic.
That is to say, success is not guaranteed each time they are run.
For this reason, there are typically multiple independent runs carried out for
each experiment.
Evolutionary algorithm systems are viewed by many as an optimisation
technique, a way of automating tedious repetitive optimisation problems. An
alternative perspective is to view an EA as a complex adaptive system that, if
applied to a dynamically changing landscape, will adapt and produce optimal
or near optimal solutions time after time. The truth is that it is both [28]. The
power of EC is that it can repetitively solve tedious optimisation problems
in a dynamically changing environment in which the solution landscape may
be multi-modal (many peaked) and may not be continuous or linear.
EAs have been applied to many such tasks [97, 86] giving rise to several
sub-areas and a multitude of hybrid areas. However, the best known algo-
rithms include evolutionary programming, genetic programming, grammati-
cal evolution, evolutionary strategies and genetic algorithms [120, 97, 28].
All evolutionary systems have parameters that influence the Darwinian
process, such as encoding, population size, reproduction methods
and mutation rates. The nuances of each of these materially affect the
performance of the EA, and in many cases, define the branch of EA to which
the method belongs. For a full treatment of all the major EA subareas and
parameters, see De Jong [28].
EAs are biologically inspired [108]. They include mechanisms such as
heredity, gene frequencies, recombination and mutation. Much of the sur-
rounding terminology has been borrowed from the field of genetics as well
as the Mendelian understanding of biological structures. The genotype is
defined as the abstract collection of genes possessed by an individual, and
so the genotype is a string of genes known as a chromosome. Each gene resides
at a position, the locus, and has a specific value, the allele. The genes
are expressed as properties of the individual, the phenotype. In nature, a single
gene can map to multiple qualities of the phenotype. This is referred to as
pleiotropy. The opposite has also been observed, where a single quality of
the phenotype is affected by multiple genes (polygeny). It is unusual for simulated
evolution to use polygeny and pleiotropy, and this can be considered
a simplification of the natural process by evolutionary computation [95].
3.2 EA Operation and Operators
To use an evolutionary approach to solving a particular problem, one must
know at least the following two pieces of information:
1. How can a candidate solution be represented?
2. How can a candidate solution be evaluated to generate a fitness or a
value that represents the quality of the solution?
When these questions have been answered, the evolutionary process may
begin. The basic steps are:
1. Initialise the population;
2. Test the individuals;
3. If finished run, go to step 6;
4. Generate a new population;
5. Go to 2;
6. If not finished all runs, go to step 1.
Figure 3.1: An overview of a standard evolutionary algorithm
Step 1: Initialise Population
The initial population is created randomly. Although it may sound
counter-intuitive that a randomly initialised population will evolve into some-
thing useful (if not optimal), it is important for the EA to be given a good
spread of individuals at the start. This equates to the diversity of the gene
pool.
Step 2: Test Individuals
Test all the individuals and generate a fitness value for each.
Step 3: Finished run?
A run is typically terminated after one of the following conditions occurs:
1. The problem is solved;
2. A certain amount of time has passed;
3. Evolution has stagnated.
Step 4: Generate a New Population
Using information gained from the fitness function, the EA first selects
individuals that will contribute their genetic material to the new generation
of the population. Individuals selected are then subjected to genetic
operators, such as crossover and mutation, discussed in Sections 3.2.2 and 3.2.3.
Step 6: Finished Experiment?
EA experiments almost always involve multiple independent runs due to their
stochastic nature. The number of runs required varies according to the problem.
For experiments requiring rigorous statistical analysis, a minimum of thirty
runs is recommended (thirty is usually considered to be the minimum number of
samples required for statistical analysis). However, for real-world
applications, considerably fewer can be
used [93], and the stopping criterion is often whether or not a solution has
been encountered that is good enough.
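The numbered steps above can be sketched in Python as follows. This is a minimal illustration and not the system built in this work: the function names, the default parameter values, the use of binary tournaments for selection, and the choice of one-point crossover with bit-flip mutation are all assumptions made for the example. Only two of the three termination conditions are shown (problem solved, and a generation budget standing in for elapsed time).

```python
import random


def evolve(fitness, genome_bits=16, pop_size=20, max_generations=50,
           target=None, p_mutation=0.02, rng=None):
    """One run of a minimal generational EA over fixed-length bit strings."""
    rng = rng or random.Random()
    # Step 1: initialise the population randomly.
    population = [[rng.randint(0, 1) for _ in range(genome_bits)]
                  for _ in range(pop_size)]
    best_score, best = None, None
    for _generation in range(max_generations):
        # Step 2: test all individuals.
        scores = [fitness(ind) for ind in population]
        for ind, score in zip(population, scores):
            if best_score is None or score > best_score:
                best_score, best = score, list(ind)
        # Step 3: finished run? (problem solved, or generation budget spent)
        if target is not None and best_score >= target:
            break

        # Step 4: generate a new population.
        def pick():
            # Binary tournament between two randomly chosen individuals.
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return population[a] if scores[a] >= scores[b] else population[b]

        offspring = []
        while len(offspring) < pop_size:
            mother, father = pick(), pick()
            cut = rng.randrange(1, genome_bits)          # one-point crossover
            child = mother[:cut] + father[cut:]
            offspring.append([bit ^ 1 if rng.random() < p_mutation else bit
                              for bit in child])         # bit-flip mutation
        population = offspring                           # Step 5: go to step 2.
    return best_score, best


def experiment(fitness, runs=5, seed=0, **kwargs):
    """Step 6: repeat independent runs, each with its own random stream."""
    return [evolve(fitness, rng=random.Random(seed + run), **kwargs)
            for run in range(runs)]


if __name__ == "__main__":
    # Toy fitness: count of 1 bits (OneMax); the target is the all-ones string.
    for score, individual in experiment(sum, runs=3, target=16):
        print(score, individual)
```

Seeding each run separately keeps the runs independent yet repeatable, which is convenient when results from several runs are to be compared statistically.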
3.2.1 Selection
Selection is a powerful mechanism to ensure that a breeding pool is made
up of the best genetic material available in the current population. Selection
mechanisms vary from one evolutionary algorithm to another; two popular
mechanisms are tournament and fitness-proportionate selection. In tourna-
ment selection, arbitrarily chosen individuals from the population are com-
pared to one another with the winner going on to be represented in the next
generation.
Fitness-proportionate selection operates by assigning a bias figure to each
individual according to its fitness proportion (that is, its fitness compared
to the fitness of the whole population). This mechanism is often referred to
as roulette wheel selection since it is similar to assigning each individual a
slice of a roulette wheel and spinning the wheel. Since the slice of the wheel
assigned to each individual is proportional to the individual’s fitness, the
wheel is more likely to stop on fitter individuals.
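The two selection mechanisms described above can be sketched as follows. The function names and the tournament size parameter k are illustrative assumptions; the roulette wheel version also assumes that all fitness values are non-negative and that at least one is greater than zero.

```python
import random


def tournament_select(population, fitnesses, k=2, rng=random):
    """Pick k distinct individuals at random and return the fittest of them."""
    contenders = rng.sample(range(len(population)), k)
    winner = max(contenders, key=lambda index: fitnesses[index])
    return population[winner]


def roulette_select(population, fitnesses, rng=random):
    """Fitness-proportionate selection: spin a wheel whose slices are
    proportional to each individual's share of the total fitness."""
    total = sum(fitnesses)
    spin = rng.uniform(0.0, total)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if spin < running:
            return individual
    return population[-1]  # guard against floating-point round-off


if __name__ == "__main__":
    pop = ["a", "b", "c"]
    fits = [1.0, 3.0, 2.0]
    print(tournament_select(pop, fits), roulette_select(pop, fits))
```

Tournament selection depends only on the ranking of the contenders, whereas the roulette wheel depends on the actual fitness values, so the two can behave quite differently when one individual is much fitter than the rest.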
This mechanism is probabilistic in manner, wherein the fitter individuals
are more likely to contribute, and the less fit individuals are less likely to
contribute but still have a small chance to do so, in case they may have some
latent useful genetic material. For example, the probability that individual
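p_i would be selected to contribute is calculated as in Equation 3.1. This gives
individual p_i a probability of being selected that is equal to its proportion of
the total fitness of the population, where f(p_j) is the fitness of individual
p_j and N is the population size.
\[ P(p_i) = \frac{f(p_i)}{\sum_{j=1}^{N} f(p_j)} \tag{3.1} \]
3.2.2 Crossover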
The purpose of crossover is to exchange portions of genetic material from
parent individuals in the hope that their offspring exhibit the best traits of
both parents and are therefore fitter [47, 29]. Crossover may take the form
of simply splitting a gene randomly and splicing in a section from another
parent, or it may be a more complex crossing of tree sections, all the while
ensuring the integrity of the new tree structure. In any event, crossover is a
speculative process in which new children are produced with new features,
which equate to new search points. Some of these features may be beneficial
and some may not. The rate of crossover can influence the intensity of search
but will also influence the rate of destruction of good genetic code.
Crossover involves two individuals (both chosen from the mating pool)
whose gene expressions are split with the first part of one and the second
part of another going to make a child offspring. There are other schemes in
use in which the genes are cut twice (2-point crossover) or even several times
(N point crossover) [27].
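A minimal sketch of one-point and N-point crossover on list-based chromosomes is given below. The function names are illustrative, and both parents are assumed to have the same length, with more loci than cut points.

```python
import random


def one_point_crossover(parent_a, parent_b, rng=random):
    """Child takes the first part of parent_a and the second part of parent_b."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]


def n_point_crossover(parent_a, parent_b, n_points=2, rng=random):
    """Child alternates between the parents at n_points distinct cut positions."""
    cuts = sorted(rng.sample(range(1, len(parent_a)), n_points))
    child, source, other, start = [], parent_a, parent_b, 0
    for cut in cuts + [len(parent_a)]:
        child.extend(source[start:cut])   # copy the current segment
        source, other = other, source     # switch parent at each cut
        start = cut
    return child


if __name__ == "__main__":
    zeros, ones = [0] * 8, [1] * 8
    print(one_point_crossover(zeros, ones))
    print(n_point_crossover(zeros, ones, n_points=2))
```

Using all-zero and all-one parents, as in the usage lines, makes the cut positions directly visible in the child.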
Figure 3.2: Crossover in genetic algorithms
Evolutionary algorithms excel by balancing exploration and exploitation.
Exploitation is generally achieved through selection as described above, which
biases the search towards certain kinds of individuals. Exploration is achieved
through crossover and mutation as they introduce speculation to the current
solutions. Crossover generally makes larger steps by exchanging feature material
between two different individuals. For example, in a genetic algorithm,
crossover results in the exchange of schemata or building blocks between
individuals. Good schemata or similarity templates (first discussed by Holland
in 1968 [47]) are described by Goldberg [38] as “matching string subsets
that occur in defined places in highly fit individuals”. The crossover process
sometimes destroys feature material such as good schemata, particularly if
they are of long defining length. Schemata of short defining length and
above-average fitness, however, are sampled at an exponentially increasing
rate from generation to generation.
This fundamental implicit parallelism is a bonus for the EA approach and is
described in more detail in Section 3.3.3. Mutation tends to make smaller
steps by exploring small changes of allele and is less likely to cause disruption
of useful schemata.
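The trade-off described above is usually summarised by the schema theorem. The form below is the standard textbook statement for a GA with fitness-proportionate selection, one-point crossover and bit-flip mutation; the notation is an assumption made here for illustration. m(H, t) is the number of instances of schema H in generation t, f(H) is the mean fitness of those instances, \bar{f} is the mean fitness of the population, \delta(H) is the defining length of H, o(H) is its order, l is the string length, and p_c and p_m are the crossover and mutation probabilities.

```latex
\[
  E\bigl[m(H, t+1)\bigr] \;\ge\;
  m(H, t)\,\frac{f(H)}{\bar{f}}
  \left[\, 1 - p_c\,\frac{\delta(H)}{l - 1} - o(H)\,p_m \right]
\]
```

The bracketed term shows why long defining lengths are penalised by crossover, while the ratio f(H)/\bar{f} is the source of the growth in short, above-average schemata.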
3.2.3 Mutation
The mutation operator is performed to introduce minor changes to the genetic
material of individuals and can simply involve randomly flipping one
or more bits before copying the individual into the next generation. It is
often applied directly after crossover.
Mutation can take the form of simple bit flipping or it can be a more com-
plex ‘step and number’ arrangement where number is the number of genes
modified and step is the range over which they can be modified. The rate of
mutation is important since it does not always bring positive results. Keep-
ing the number of genes modified and the step large will bias the exploration
along the axes of the search space but can cause problems when the genes
are polygenic [29]. For single parent reproduction, mutation provides all
the variation that differentiates a child from its parent [28] and therefore is
the only thing driving adaptation. In other arrangements such as GA, mu-
tation plays a secondary role in adaptation and may be regarded more as
an insurance policy against the loss of useful genetic code by over-zealous
selection and crossover [38]. Mutation, despite its random, undirected im-
plementation, acts as an effective exploration mechanism by making minor
local adjustments to individuals produced via crossover.
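Both forms of mutation mentioned above can be sketched briefly. The function names, and the decision to clamp integer genes to a fixed range, are assumptions made for this example; the range 0 to 255 in the usage lines simply mirrors an 8-bit value.

```python
import random


def bit_flip_mutation(genome, rate, rng=random):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in genome]


def step_and_number_mutation(genome, number, step, low, high, rng=random):
    """'Step and number' mutation on integer genes.

    `number` genes are chosen at random and each is moved by at most
    +/- `step`, then clamped to the range [low, high].
    """
    child = list(genome)
    for locus in rng.sample(range(len(child)), number):
        moved = child[locus] + rng.randint(-step, step)
        child[locus] = min(high, max(low, moved))
    return child


if __name__ == "__main__":
    print(bit_flip_mutation([0, 1, 1, 0, 1, 0, 0, 1], rate=0.1))
    print(step_and_number_mutation([10, 200, 128, 0], number=2, step=4,
                                   low=0, high=255))
```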
Figure 3.3: Mutation in genetic algorithms.
3.2.4 Representation
The representation used for a particular problem, together with the set of
genetic operators, constitute the most essential components of any evolution-
ary algorithm [120]. These are the key elements that distinguish between the
various evolutionary paradigms. It is the representation of the candidate
solutions that gives rise to these subdivisions.
If a solution can be represented using a binary string and all solutions fit
the same schema, then the process may be better described as a genetic algo-
rithm [47, 38]. In contrast, some other problems need a variable length solu-
tion structure, therefore a tree-based representation can be more appropriate.
In this case, the process is often called genetic programming (GP) [64, 65].
A well known example of this kind of problem is the search for math-
ematical functions that map a set of input values to a set of outputs with
minimum error (symbolic regression). Still another paradigm is evolutionary
strategies. This paradigm was developed to solve parameter optimisation
problems [98, 99], and represents an individual as a pair of float-valued vec-
tors V=(x,σ).
3.3 Canonical EAs
While today there is a diverse range and a staggering number of evolutionary
algorithms, in essence, there are only four major EA paradigms.
3.3.1 Genetic Programming
Genetic Programming [65, 66] can be considered as a specialisation of genetic
algorithms. Genetic algorithms work with a coding of binary strings which
may cause problems with the representation of certain types of problems.
This is especially true where the desired solution is hierarchical and where
the size and shape of the solution is unknown in advance [64, 90].
GP starts with a population of thousands of randomly created computer
programs, or primitive functions, for each branch of the to-be-evolved pro-
gram. This population of programs is progressively evolved over a series of
generations. GP traditionally represents programs in memory as tree struc-
tures since trees can be easily evaluated in a recursive manner.
Every internal tree node has an operator function and every terminal node
has an operand, making mathematical expressions easy to evolve and evaluate.
Figures 3.4 to 3.6 show an example of a tree structure representation
and how it might be manipulated by an evolutionary process.
Figure 3.4: A genetic programming tree structure example
Figure 3.5: GP Tree structure is spliced
GP is well suited to evolving the tree-like constructions of mathematical
formulae. Traditionally, GP favours the use of programming languages that
naturally embody tree-like structures, such as Lisp or Prolog, although there
are implementations in C and other programming languages. There are 5
major steps in using GP [120]:
1. Selection of terminals;
2. Selection of functions;
3. Identification of the evaluation function;
4. Selection of the parameters of the system;
5. Selection of the termination condition.
Figure 3.6: The resultant crossed individual
The set of all functions and terminals is selected in such a way that all of the
trees form a solution. The evaluation function assigns a fitness value that
evaluates the performance of the tree at solving the problem at hand, which
is based on a preselected series of test cases. The selection of contributors to
the next generation is proportional to their fitness.
Crossover is the main operator and works by exchanging sub trees be-
tween the selected parents. Mutation takes the form of new random sub
trees appearing at a selected node. In general, GP is more successful at
finding solutions if it is given enough training cases, and if the functions and
terminals supplied are appropriate for the given problem.
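The tree representation used by GP, its recursive evaluation, and the splicing of a sub tree from another parent can be illustrated with nested tuples. The encoding chosen here (a tuple of operator, left branch and right branch, with strings for variables and numbers for constants), the small function set and the function names are assumptions made for the example.

```python
import operator

# Assumed function set: binary arithmetic operators only.
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree, variables):
    """Recursively evaluate an expression tree.

    Internal nodes are (symbol, left, right) tuples; terminals are either
    variable names (strings) or numeric constants.
    """
    if isinstance(tree, tuple):
        symbol, left, right = tree
        return FUNCTIONS[symbol](evaluate(left, variables),
                                 evaluate(right, variables))
    if isinstance(tree, str):
        return variables[tree]
    return tree


def replace_subtree(tree, path, new_subtree):
    """Return a copy of `tree` with the node at `path` replaced.

    `path` is a sequence of 0 (left branch) and 1 (right branch) steps from
    the root. Sub tree crossover uses this to splice in a donor's sub tree.
    """
    if not path:
        return new_subtree
    symbol, left, right = tree
    if path[0] == 0:
        return (symbol, replace_subtree(left, path[1:], new_subtree), right)
    return (symbol, left, replace_subtree(right, path[1:], new_subtree))


if __name__ == "__main__":
    parent = ("+", ("*", "x", "x"), 1)        # x * x + 1
    donor_subtree = ("-", "x", 2)             # x - 2
    child = replace_subtree(parent, (1,), donor_subtree)
    print(evaluate(parent, {"x": 4}), evaluate(child, {"x": 4}))
```

Because every sub tree is itself a valid expression, any splice of this kind yields a tree that can still be evaluated, which is the integrity property that GP crossover relies on.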
3.3.2 Evolutionary Strategies
This technique is inspired by a macro-level or the species-level process of evo-
lution (phenotype, heredity, variation) and is not concerned with the genetic
mechanisms of evolution (genome, chromosomes, genes, alleles) [15].
ES is an optimisation technique with a representation that is problem
dependent. ES algorithms have their basis in the (µ + λ) format, where
µ is the size of the parent population and λ is the size of the offspring
population [98]. These algorithms were developed as a method of solving
parameter optimisation problems and as such, an individual is represented
as a pair of float-valued vectors V=(x,σ). The simplest implementation of
this form is actually (1 + 1)-ES [98] and is called two-membered ES.
With this technique, there are no more than two rules:
1. Change all variables at a time, mostly slightly and at random;
2. If the new set of variables does not diminish the goodness of the device,
then continue.
This simple algorithm could get stuck at certain positions [98] if nearest-
neighbour positions were all worse than the current position. This prompted
the refinements of the multi-member forms of the (µ+ λ)-ES and (µ, λ)-ES.
In (µ + λ)-ES, more than one offspring is created at a time or in a
generation and, to keep the population size constant, the worst λ of all
(µ + λ) individuals are discarded. In (µ, λ)-ES, selection takes place
among the offspring only, and their parents are ‘forgotten’ no matter how
well or how badly their fitness compares to that of the new generation.
ES is useful for continuous function optimisation, maximising the
suitability of a collection of candidate solutions to a problem, and it will
often achieve good results with a modest population size.
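The difference between the two multi-member forms can be sketched in a few lines. The example below is an illustration under stated assumptions: the objective is a toy sphere function to be minimised, step sizes σ are held fixed (self-adaptation of σ is omitted), and all function names are hypothetical.

```python
import random

def sphere(x):
    """Toy objective to minimise; an assumption made for illustration."""
    return sum(v * v for v in x)

def make_offspring(rng, parent):
    """Perturb every variable slightly and at random (Gaussian mutation)."""
    x, sigma = parent
    child_x = [xi + rng.gauss(0.0, si) for xi, si in zip(x, sigma)]
    return (child_x, list(sigma))

def es_generation(rng, parents, lam, plus=True):
    """One generation of (mu + lambda)-ES if plus is True, else (mu, lambda)-ES."""
    mu = len(parents)
    if not plus and lam < mu:
        raise ValueError("(mu, lambda)-ES needs at least mu offspring")
    offspring = [make_offspring(rng, rng.choice(parents)) for _ in range(lam)]
    # Plus form: parents compete with offspring. Comma form: parents are forgotten.
    pool = parents + offspring if plus else offspring
    pool.sort(key=lambda ind: sphere(ind[0]))
    return pool[:mu]
```

In the plus form the best individual can never be lost, so the best objective value does not get worse from one generation to the next; the comma form gives up that guarantee in exchange for being able to leave a position where every nearest neighbour is worse.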
3.3.3 Genetic Algorithms
Genetic algorithms (GAs) can trace their roots back to the work done in the
late 1960s at the University of Michigan led by John Holland [47]. Tra-
ditional calculus-based enumerative optimisation techniques, such as hill-
climbing and steepest descent, require that the function is continuous, uni-
modal and has ever-present derivatives. However, this does not accurately
represent many real-world applications as discussed later here, which are
multi-modal, may be noisy and/or discontinuous. Genetic algorithms make
no such assumptions about the solution to a problem and are inspired by
evolution at the population level.
Specifically, a GA was chosen here because of its ability to succeed in noisy
environments [31], where the search space may not be well understood [33, 30]
and because the typical bit string representation maps easily to the control
parameters of a NOR flash memory array, which are held in 8-bit binary registers.
These registers control every free variable at play in the operation of
NOR flash, i.e. controlling the free variables represents a complete solution
to the problem. The non-free or immovable variables represent variations and
modifications which occur that are beyond the control of the GA, such as
variation due to manufacturing differences, and variation due to geographical
location on the silicon wafer and die. These variations effectively make the
solution space noisy and are discussed in more detail in Chapter 4.
There are four ways in which GAs differ from traditional search methods:
• GAs work with a coding of the parameter set rather than the parame-
ters themselves;
• GAs search from a population of points rather than a single point;
• GAs use a payoff (objective or fitness function) to direct the search
rather than derivatives or other auxiliary knowledge;
• GAs use probabilistic transition rules, not deterministic rules.
There are two issues that need to be resolved before any GA can run.
These are the representation and the fitness function. There are two key
criteria of representation [95]:
• First, individuals must be complete. This means that they should
contain enough information for an entire solution, or at least enough
to enable the deterministic creation of an entire solution;
• Second, the representation should be such that the GA can use it to
recombine two solutions with each other.
Binary encoding is considered an excellent choice for problems in which the
solution points map to a string of zeros and ones [95]. The problem at hand
is such a case, in that the entire operation and thus the reliability of the
memory is controlled by the register set.
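As a minimal sketch of this mapping, a set of 8-bit registers can be concatenated into a single chromosome and recovered again. The register values used in the usage note are hypothetical, as are the function names; the actual register set is described elsewhere in this work.

```python
def encode(registers):
    """Concatenate 8-bit register values into one bit string (chromosome)."""
    for value in registers:
        if not 0 <= value <= 255:
            raise ValueError("register value out of 8-bit range")
    return "".join(format(value, "08b") for value in registers)

def decode(chromosome):
    """Split a chromosome back into a list of 8-bit register values."""
    if len(chromosome) % 8 != 0:
        raise ValueError("chromosome length must be a multiple of 8")
    return [int(chromosome[i:i + 8], 2) for i in range(0, len(chromosome), 8)]

def one_point_crossover(a, b, point):
    """Recombine two chromosomes of equal length at a single cut point."""
    return a[:point] + b[point:], b[:point] + a[point:]
```

For example, `encode([0x3C, 0xA5, 0x01])` gives `"001111001010010100000001"`, and `decode` returns the original three values. Both criteria of representation are met: the string holds every register, so it is complete, and any cut point yields two strings that still decode to valid register sets.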
The fitness function is a measure of how well the GA-devised solution
solves the problem. The objective function maps that measure to the indi-
vidual’s fitness, taking into account such things as scaling and normalisation.
In nature, these things are one and the same [38], but with simulated evo-
lution, there is the opportunity (and according to Goldberg, the duty [38])
to regulate competition amongst members of the population. This allows us
to tailor the operation of the search to meet our requirements. The GA uses
the fitness array of a given generation to create the mating pool which will
in turn generate the next.
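One common way to build a mating pool from a fitness array is fitness-proportionate (roulette wheel) selection. The sketch below assumes that scheme purely for illustration; the text above does not state which selection scheme is used, and the function name is hypothetical.

```python
import random

def mating_pool(rng, population, fitnesses, size=None):
    """Fill a mating pool by spinning a roulette wheel weighted by fitness."""
    size = len(population) if size is None else size
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("at least one individual needs positive fitness")
    pool = []
    for _ in range(size):
        spin = rng.uniform(0.0, total)
        running = 0.0
        for individual, fitness in zip(population, fitnesses):
            running += fitness
            if spin < running:
                pool.append(individual)
                break
        else:
            # Only reached if spin lands exactly on the upper bound.
            pool.append(population[-1])
    return pool
```

An individual with nine times the fitness of another is expected to receive roughly nine times as many places in the pool, which is where scaling and normalisation of the objective function become a tool for regulating competition.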
Why GAs work
The combined effect of selection, crossover and mutation gives us the so-called
Reproductive Schema Growth Equation [47] shown below in Equation 3.2.
\[ n(S, t+1) \geq n(S, t) \cdot \frac{eval(S, t)}{F(t)} \left[ 1 - P_c \cdot \frac{\delta(S)}{m - 1} - o(S) \cdot P_m \right] \tag{3.2} \]
The schema idea is that the chromosome has an expression (or expressions)
within, denoted by S, where S is fit and defined over an alphabet of 0, 1
and don’t care (0, 1, *). Holland noticed that there were similarities in the bit
string of fit genes and he called these ‘similarity templates’ or ‘schemata’.
Another expression used to describe them is ‘building blocks’, in that fit
genes may contain similar building blocks that contribute to the fitness of
individuals.
The expected number of strings matching a schema in the next generation is
bounded by this equation, where n(S, t) denotes the number of strings in the
population at time t matched by the schema S; δ(S) is the defining length of
the schema (the distance between its first and last fixed positions); o(S) is
the order of the schema (the number of 0 and 1 positions); and m is the
length of the string. Eval(S, t) is the average fitness of all strings in the
population matched by the schema, while F(t) is the average fitness of the
whole population at time t. The properties Pc and Pm denote the
probabilities of crossover and mutation.
In essence, the above Equation 3.2 tells us about the expected number of
strings matching the schema S in the next generation, as a function of the
number of matching schema, the fitness of the schema, and its length and
order.
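The bound in Equation 3.2 can be evaluated directly for a small population. The sketch below assumes the standard reading of the theorem, in which eval(S, t) and F(t) are both average fitness values, δ(S) is the distance between the first and last fixed positions and m is the string length. The function names and the fitness function in the usage note are hypothetical.

```python
def schema_order(schema):
    """o(S): the number of fixed (0 or 1) positions in the schema."""
    return sum(1 for c in schema if c in "01")

def defining_length(schema):
    """delta(S): distance between the first and last fixed positions."""
    fixed = [i for i, c in enumerate(schema) if c in "01"]
    return fixed[-1] - fixed[0] if fixed else 0

def matches(schema, string):
    """True if the string agrees with the schema at every fixed position."""
    return all(s == "*" or s == c for s, c in zip(schema, string))

def schema_growth_bound(schema, population, fitness, pc, pm):
    """Lower bound on the expected number of matching strings at t + 1."""
    m = len(schema)
    matched = [s for s in population if matches(schema, s)]
    if not matched:
        return 0.0
    n = len(matched)
    eval_s = sum(fitness(s) for s in matched) / n
    f_avg = sum(fitness(s) for s in population) / len(population)
    survival = 1 - pc * defining_length(schema) / (m - 1) - schema_order(schema) * pm
    return n * eval_s / f_avg * survival
```

With the population `["1100", "1010", "0110", "0001"]`, fitness taken as the number of ones, Pc = 0.7 and Pm = 0.01, the short schema `1***` and the long schema `1**0` match the same two strings, yet the short one receives a much larger bound because crossover cannot cut between its fixed positions.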
It is clear from the above that short schemata of low order with high
fitness will re-occur with exponential frequency in subsequent generations.
This does not add anything new to the gene pool but rather fills the gene
pool full of the existing fit schemata. Crossover and mutation will speculate
on new schemata or search points, but Equation 3.2 tells us that their com-
bined disruptiveness is not significant on short, low order schemata. There
are several more ways of expressing this relationship.
Holland said
Short, low order, above-average schema receive increasing trials
in subsequent generations of a genetic algorithm.
GA in Operation
Consider Figure 3.7. The y-axis denotes the fitness awarded for the corre-
sponding position on the x-axis, while the x-axis represents the individuals.
The relationship between the fitness and the representation is the fitness
landscape, and its shape and characteristics dictate how difficult a problem
will be for the search algorithm to solve.
Figure 3.7: Individuals on a fitness landscape. - Left, generation 1. Right,
after some evolution
Figure 3.7 shows the problem after the first generation (left) and shows it
again (right) after some evolution. The GA, guided by the fitness function,
will drive the population up the curve in the landscape. This is a particularly
easy problem as there is a gentle gradient and no local optima in which the
search could become trapped.
Figure 3.8: A deceptive fitness landscape.
Consider the fitness landscape in Figure 3.8. This is a deceptive landscape
in that although the highest peak occurs on the right, the search algorithm
is more likely to tend towards the local optimum on the near left.
Once the search begins moving towards the local optimum, it becomes
increasingly difficult to move towards the global optimum. Most search tech-
niques will fail on this type of problem. A GA, on the other hand, will not
be fooled so easily. Figure 3.9 shows an even more extreme example. Here
we have the well-known Rastrigin mathematical function, which is often used
for benchmarking GA performance. In this problem, in which the goal is
to minimise the function, the difficulty can be augmented by increasing the
number of local optima. Thus, the function in the top left is the easiest func-
tion and can typically be solved easily by a GA, while the one in the bottom
right is virtually unsolvable by all search methods. Rastrigin’s function can
be expressed as:
\[ f(x) = A n + \sum_{i=1}^{n} \left( x_i^2 - A \cos(2 \pi x_i) \right), \quad \forall i \in [1..n],\; x_i \in [-5.12, 5.12] \tag{3.3} \]
The problem in Figure 3.9 can be a very difficult one. It is only a two
dimensional problem; that is, there are only two parameters that can be
varied. It is not uncommon for GAs to tackle problems of significantly higher
dimensionality.
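For reference, the function is short to state in code. The sketch below assumes the standard form f(x) = A·n + Σ(x_i² − A·cos(2πx_i)) and the conventional value A = 10, which is an assumption since no value of A is given here.

```python
import math

def rastrigin(x, a=10.0):
    """Rastrigin function for a point x given as a sequence of n coordinates."""
    return a * len(x) + sum(xi * xi - a * math.cos(2 * math.pi * xi) for xi in x)
```

The global minimum is at the origin, where the function is zero. Every point with integer coordinates lies at or close to a local minimum, so the number of traps grows rapidly with the dimension n.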
To summarise, a genetic algorithm is a robust search algorithm modeled
on the principles of natural evolution. Genetic algorithms work from a pop-
ulation, which reduces the probability of reaching a false peak. GAs work so
well because they ignore all information except that relating to payoff, while
other methods rely heavily on such secondary information. Their binary
string representation suits the other artifacts of this research precisely.
3.4 Other EAs and Adaptive Systems
3.4.1 Grammatical Evolution
The Grammatical Evolution algorithm [97] is inspired by the biological pro-
cess used for generating proteins from genetic material, as well as the broader
Figure 3.9: Rastrigin function
Rastrigin function is a GA benchmark where the goal is to minimise the function. The
difficulty can be varied by introducing more local optima.
genetic evolutionary process. The phenotype is a computer program that is
created from a binary string based genome. A population of programs is
evolved in a sub-symbolic form as variable length binary strings using a
standard GA. They are then mapped to a symbolic, structured form for
execution, using a context-free grammar written in Backus-Naur Form (BNF,
also known as Backus Normal Form).
Candidate solutions are evaluated by comparing the output against the
target function and taking the sum of the absolute errors over a number of
trials. Programs that contain a single term or those that return an invalid
or infinite result are penalized with an enormous error value. Grammatical
Evolution was first proposed by Ryan and O’Neill at the University of
Limerick; the same authors provide a high-level introduction to grammatical
evolution with demonstration applications [87].
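The genotype-to-phenotype mapping can be sketched as follows. This is a simplified illustration, assuming a tiny hypothetical grammar, 8-bit codons, the usual rule of choosing a production by taking the codon value modulo the number of available productions, and a fixed limit on wrapping around the genome.

```python
# Hypothetical BNF-style grammar: each non-terminal maps to its productions.
GRAMMAR = {
    "<expr>": [["<expr>", "<op>", "<expr>"], ["<var>"]],
    "<op>": [["+"], ["-"], ["*"]],
    "<var>": [["x"], ["1"]],
}

def codons(bits, size=8):
    """Split a binary genome into integer codons of the given bit width."""
    return [int(bits[i:i + size], 2) for i in range(0, len(bits) - size + 1, size)]

def map_genome(codon_values, start="<expr>", max_wraps=2):
    """Expand the leftmost non-terminal until only terminals remain.

    Returns the program text, or None if the genome (with wrapping) is
    exhausted first, which corresponds to an invalid individual.
    """
    symbols = [start]
    output = []
    used = 0
    limit = len(codon_values) * (max_wraps + 1)
    while symbols:
        symbol = symbols.pop(0)
        if symbol not in GRAMMAR:
            output.append(symbol)
            continue
        if used >= limit:
            return None
        choices = GRAMMAR[symbol]
        codon = codon_values[used % len(codon_values)]
        used += 1
        symbols = list(choices[codon % len(choices)]) + symbols
    return " ".join(output)
```

For example, the codon sequence `[0, 1, 0, 2, 1, 1]` maps to the program `x * 1`, while `[0]` keeps expanding `<expr>` and returns `None`, the kind of individual that would be assigned the enormous error value mentioned above.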
3.4.2 Evolutionary Programming
This branch of EA was inspired by the desire to automate the process of
computer program writing, which it was hoped would ultimately give rise to
an autonomous learning machine. It was originally conceived by Lawrence J.
Fogel in 1960 as an evolutionary learning process with the aim of generating
artificial intelligence [32], initially using finite state machines as intelligent
agents.
Evolutionary programming is a methodology that concentrates on models
of a fixed size population of parents, each of which produces a single offspring.
Subsequent generations are determined by combining parents and offspring
into a single population of the same size as the first, with the worst performers
being dropped from the population to achieve this. The size of the population
and the intensity of mutation can be varied, depending on the application.
3.4.3 Learning Classifier
A learning classifier system, or LCS, is a machine learning system closely
linked to reinforcement learning and genetic algorithms. First described by
Holland at the University of Michigan [48], an LCS is an adaptive system that
learns to perform the best action given an input or set of inputs. Here, ‘Best’
generally means the action that will receive the most reward, highest fitness
or reinforcement from the system and ‘Input’ means a vector of numerical
values.
The objective of the learning classifier system algorithm is to optimise
payoff based on exposure to stimuli from a problem-specific environment [67].
It achieves this objective by managing credit assignment for those rules that
prove useful, and searching for new rules and new variations on existing rules
using the evolutionary process.
Inside the LCS, there is a population of ‘Condition-Action Rules’ called
classifiers. When a particular input is detected, the LCS forms a ‘Match Set’
of classifiers whose conditions are satisfied by that input. So a classifier may
be a set of rules relating to a set of inputs. If the classifier input is satisfied,
then the rule joins the population of matches and influences the system’s
action, along with all other matches. Each classifier advocates an action and
also contains a prediction of payoff, if that action is taken.
The qualifying classifiers are grouped by action and each group’s action
payoff is averaged. The average prediction is then weighed by fitness. After
the action is completed, the fitness adjusts the payoffs of the classifiers to
make them more accurate. The fitness is generally regarded as the accuracy
of the classifier, or the inverse of the error.
So while payoff is used to calculate which action to use, fitness is used to
modify the population of classifiers so that they get better and change over
the learning period of the system. Population size remains fixed and children
compete with parents for position.
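The match set and action selection steps described above can be sketched as below. The example makes several assumptions for the sake of brevity: inputs are strings of binary digits, conditions use `#` as a wildcard, and the class and function names are hypothetical. The credit assignment and rule discovery steps are omitted.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Classifier:
    condition: str     # e.g. "1#0", where "#" matches either input value
    action: str
    prediction: float  # predicted payoff if the action is taken
    fitness: float     # accuracy of that prediction

def matches(condition, inputs):
    """True if every non-wildcard position of the condition equals the input."""
    return all(c == "#" or c == i for c, i in zip(condition, inputs))

def match_set(population, inputs):
    """Classifiers whose conditions are satisfied by the current input."""
    return [cl for cl in population if matches(cl.condition, inputs)]

def prediction_array(matched):
    """Fitness-weighted average payoff prediction for each advocated action."""
    weighted = defaultdict(float)
    weights = defaultdict(float)
    for cl in matched:
        weighted[cl.action] += cl.prediction * cl.fitness
        weights[cl.action] += cl.fitness
    return {a: weighted[a] / weights[a] for a in weighted if weights[a] > 0}

def choose_action(population, inputs):
    """Pick the action with the highest predicted payoff, or None if no match."""
    predictions = prediction_array(match_set(population, inputs))
    return max(predictions, key=predictions.get) if predictions else None
```

Weighting by fitness means that an accurate classifier has more say in the predicted payoff of its action than an inaccurate one advocating the same action.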
LCSs can be split into two types, depending on where the genetic algo-
rithm acts [15]. A Pittsburgh-type LCS [107] has a population of separate
rule sets, where the genetic algorithm recombines and reproduces the best of
these rule sets.
In a Michigan-style LCS [16] there is only a single population and the
algorithm’s action focuses on selecting the best classifiers within that rule
set.
Many important aspects of LCS are omitted in the above brief, including,
among others: use in sequential (multi-step) tasks, modifications for
non-Markov (locally ambiguous) environments, learning in the presence of
noise, incorporation of continuous-valued actions, learning of relational con-
cepts, learning of hyper-heuristics, and use for online function approximation
and clustering. For more information, Booker, Goldberg, and Holland [12]
provide a classical introduction to learning classifier systems including an
overview of the state of the field and the algorithm in detail.
3.4.4 Other Algorithms
While there are several other evolutionary and adaptive methods that could
be mentioned in this context, as well as a host of non-evolutionary methods
such as hill climbing, random and tabu search to list but a few, it goes beyond
the scope of this thesis to be all-inclusive. For further reading on search
and optimisation, Brownlee provides a comprehensive list and description of
evolutionary and adaptive methods using a standardised approach [15].
3.5 Literature Survey
3.5.1 Evolutionary Algorithms
Many real-world applications are multi-modal, may be discontinuous or may
be noisy. Many may have all three characteristics to some degree and are
generally more suited to stochastic or probabilistic methods, rather than de-
terministic optimization [96, 38]. Brownlee [15] asserts that algorithms from
the fields of Computational Intelligence, Biologically Inspired Computing,
and Metaheuristics are applied to difficult problems, to which more tradi-
tional approaches may not be suited. J. C. Spall [109] said that ‘Algorithms
that exploit randomness are not random in behaviour, rather they sample a
problem space in a biased manner, focusing on areas of interest and neglecting
less interesting areas’. In this context, Michalewicz and Fogel [74] propose
five reasons why problems may be generally difficult to solve, namely:
1. The number of possible solutions in the search space is so large as to
forbid an exhaustive search for the best answer;
2. The problem is so complicated that just to facilitate any answer at all,
we have to use such simplified models of the problem that any result is
essentially useless;
3. The evaluation function that describes the quality of any proposed so-
lution is noisy or varies with time, thereby requiring not just a single
solution but an entire set of solutions;
4. The possible solutions are so heavily constrained that constructing even
one feasible answer is difficult, let alone searching for an optimal solu-
tion;
5. The persons solving the problem are inadequately prepared, or there is
a barrier that prevents them from discovering a solution.
All of the above reasons apply to the problem of reducing silicon wear by
optimising operational parameters.
Evolutionary search methods [38, 28, 120] have repeatedly been shown to
provide innovative and robust solutions to smooth unimodal and noisy multi
modal, as well as combinatorial optimisation problems [91, 33]. Research has
shown time and again the central role that evolutionary systems can play
in finding solutions to difficult optimisation and adaptation problems [63,
120]. Countless examples are to be found in academic literature reporting
the success of EAs at solving a multitude of problems in engineering and
science [69, 59, 115, 97, 120], including previously unsolved or difficult to
solve problems.
Mitchell [78] lists nine broad areas where GAs have specifically been shown
to be effective in real world applications, namely, optimisation, automatic
programming, machine learning, economics, immune system research, ecology,
population genetics, evolution and learning, and social systems. Mitchell
further asserts that a GA is a good method to use in the context of large
problem spaces, which may be noisy and that are known to be not perfectly
smooth, and in which a fast found, sufficiently ‘good’ solution will suffice,
rather than a global optimum.
3.5.2 EA Application
GAs have also been shown to be innovative, considering solutions that may
not normally be considered by a human practitioner. Hornby et al [49] used a
GA to design spacecraft X-band antennae, shown in Figure 3.10, in a fraction
of the time it would take for human designers to complete the same task. In
each case, the solutions performed to the specifications and were developed
rapidly, with minimal human effort.
Figure 3.10: GA Driven Innovative X-band Antennae Design
King et al [59] used a GA to solve the problem of river and reservoir sys-
tem management for maximum economic return. King found that standard
approaches represented a trade-off between model accuracy and optimisation
capability. The GA approach allowed the use of an accurate system model,
while retaining a powerful search capability.
Baluja et al [8] used Evolutionary Algorithms to automatically set pa-
rameters that controlled the interaction of software based reasoning entities.
Without carefully setting such parameters, the interaction of the reasoning
algorithms could not be achieved. This task is described by Baluja as being
manual, tedious and prone to error. The use of EA in this context allowed
the rapid integration and deployment of new reasoning modes into a system,
in a time frame unachievable while depending on human interaction with the
optimisation problem.
Keong et al [126] used a GA to achieve optimal compression paradigms
with respect to resource constraints. This is a difficult NP-complete problem,
in which the GA consistently chooses the correct lossless or lossy compression,
with respect to compression resources.
Montana and Davis [82] used a GA to evolve weights in a fixed neural
network. In this instance, back propagation was replaced by a GA, to fix
good values for weights that define the training or learning process in a
neural network. In this case the neural network was ultimately charged with
detecting interesting signals in the midst of a wide variety of noise in an
aquatic environment.
Lucasius and Kateman [127] used a GA to interpret nuclear magnetic
resonance data to determine the structure of DNA and so predict protein
structures.
Recently, a new field of applying bio-inspired evolutionary techniques to
hardware design synthesis has emerged [14], a confluence of Biology, Computer
Science and Electronic Engineering. Referred to as hardware evolution,
or evolvable hardware, it uses simulated evolution to design electronic cir-
cuits [39]. The process of designing hardware in this way involves producing
solution candidates by using evolutionary computation and evaluating those
candidates either by simulation (extrinsic evaluation) or by instantiation into
hardware (intrinsic evaluation) [39, 14, 37].
While on the face of it, the research documented here is a form of evolv-
able hardware, it differs in that it results in the destruction of the silicon
used for evaluation. Furthermore, evolution always occurs in situ, directly
in hardware; no model or approximation is ever used. This is embodied evo-
lution [125] and so individuals are evaluated, live, die and contribute to the
gene pool, within the silicon in which they will perform their primary func-
tion. While biological evolution favours individuals that can survive long
enough to ensure that their genes are robustly represented in subsequent
generations, this evolution selects for inclusion genes (sets of registers) that
live long lives in the present generation.
The strength of evolution is the implicit parallelism of sampling from a
diverse set of evolved or semi-evolved phenotypes and recombining them in
novel ways [91, 108, 73, 15, 96, 109, 74, 38, 78]. EAs operate at the population
level which reduces the probability of reaching a false peak. They ignore all
information except that relating to payoff, while other methods rely heavily
on secondary information.
Specifically, a genetic algorithm was chosen here because of its ability
to succeed in noisy environments [117, 31, 78], and where the search space
may not be well understood [33, 30]. Furthermore, the registers that control
the operation of NOR memory chips, being 8 bit binary strings, are well
suited to being mapped to a genetic algorithm. It is the representation of
the candidate solutions that gives rise to the EA subdivisions and if a solution
can be represented using a binary string and all solutions fit the same schema,
then the process is described as a genetic algorithm [47, 38].
3.5.3 Flash Memory
The problem of flash reliability is discussed in many quarters in terms of
defect detection during manufacturing, for example Verma and Mielke [122]
and Tsai et al [119]. Several examples of novel algorithms for efficiently
testing memory array are present in the literature such as Mohammad and
Saluja [79, 80]. There is also much discussion of the many reliability issues
affecting flash memory, like over erase and disturb mechanisms, for example,
Ginez et al (2006) [36], Brand et al [13], Quan et al [92] and Airtome et al [5].
However, surprisingly little literature is present pertaining to endurance as a
reliability issue in NOR memory, and what is present tends to deal with
the subject in terms of lithography scaling and sustaining voltages during
programming and erase in large arrays, as well as charge pump design for
erasure of large blocks, such as Atwood [6] and Molta et al [85].
Cappelletti et al [18] do deal with the subject of program and erase
cycling on flash memory, considering both performance degradation of the
typical bit, and the evolution of the erase threshold voltage distribution of
the whole memory array. Cappelletti asserts that the variations of program
and erase threshold levels give a measure of oxide aging. This in turn implies
that operational parameters do not remain ideal throughout the life of the
device, although no reference is made directly to them. Emphasis is given
to the failure mechanisms which affect flash memory endurance: the erratic
erase phenomenon is discussed and a degradation mechanism, induced by
parasitic drain stress conditions in program/erase cycling is also defined.
Chimenton [20] discusses the problem of reliability and performance,
including endurance and retention issues as a function of FN tunnelling,
asserting that both NOR and NAND types are strongly sensitive to variation in
technological parameters [20]. Over erase, erratic phenomena, and tunnel
oxide degradation (TOD) are discussed in the context of the long-term con-
sequences on data retention, associated with FN related issues. Chimenton
shows that the use of high voltage ultra-short pulses, separated by a recovery
time, can improve reliability by reducing Tunnel Oxide Degradation.
Chung et al [22] restates the issue of hot-carrier injection (CHE), and
Fowler Nordheim (FN) induced reliability issues of flash memory cells after
long-term program/erase cycles, and asserts that both will generate oxide
damage, which includes the interface state and oxide trapped charge. Chung
states that this type of oxide damage will cause serious reliability problems,
such as programming time delay, operation window closure, and gate/read
disturb. Chung found that the interface state will dominate the device degra-
dation during programming, while the oxide trap charge will dominate the
cell performance during source-side FN erase operation. Moreover, source
bias should be kept as low as possible since the larger the applied source
erasing bias, the more the oxide trap sites will be generated in the chan-
nel near the source-side, causing larger threshold voltage shift that leads to
poorer cell reliability after long term cycling.
While the link between the operational parameters during p/e cycling
and level of silicon degradation is clearly made by this, no work on optimis-
ing or matching operational conditions to the state of the silicon could be
found. However, we know from industrial interviews [45] that choosing such
operational parameters is a large undertaking for silicon designers, and one
which is critical to long term reliability of the devices in which flash memory
serves.
In Chapter 2 we examined the internal construction of such devices and
laid out how the flash operational parameters are controlled. This data allows
us to envisage a solution to the problem of wear out in which a GA is mapped
to the control registers of the NOR flash array.
3.6 Summary
This chapter has provided a general introduction to evolutionary algorithms
by giving an outline of their overall operation, together with an explanation
of some major operators, such as reproduction, crossover and mutation. Some
of the most important EAs were discussed and simple examples of implementation
furnished. GAs were covered in some depth with explanations offered
from several viewpoints. As the discussion progressed some examples were
presented from literature that represents the clues inferring that evolutionary





In this chapter, we define the research goals in the context of designing the
hardware platform and extracting hardware criteria and requirements. Next,
we outline the arrangements within the silicon, showing how the analysis will
be performed. Finally, we include a brief on how the design was finalised
and implemented, discussing various options that were available. Detailed
design notes, showing exact pin outs and schematic descriptions of the final
implementation, are left to Appendix 9.7 for clarity.
4.1 Outline of Problem Space
Flash memory devices are arranged such that a set of register values con-
trols the routine operations and housekeeping of the device. These variables
control the operation of all subsystems including the control of actions such
as NV memory read and write and erase by fixing parameters such as the
erase time, programming voltage and so on. All of these parameter variables
require evaluation.
The current method used to calculate the variables is to carry out ex-
perimentation at design time on a sample population of relevant devices in
order to establish a safe set of register variable values [101]. These values will
guarantee that all manufactured parts of this type will meet the specifica-
tion sheet claims. The values are then hardwired into the silicon and remain
unchanged throughout the lifetime of that product.
This methodology carries with it a number of problems. Firstly, current
methods of calculating operating parameter variables are manual, require
much engineering input and iteration [56], and involve a number of
significant steps, as detailed in Chapter 1, Section 1.1. Secondly, it is a
‘one hat fits all’ solution and ignores two important facts about flash memory:
1. Not all flash cells that go through the same manufacturing process
are exactly the same. Variations in operational behaviour, such as intrinsic
endurance or retention capabilities, occur based on the silicon foundry used:
differences between clean room methods, the age of equipment, lithography
settings and the exact process utilized within that plant to produce
the silicon all contribute to flash device diversity. Variations also occur
between batches of wafers, between wafers within a single batch, and from chip
to chip within a single wafer, based upon nano structure size and integrity
within the geography of the silicon chip.
2. Flash memory cells do not remain immutable throughout their working
lives; there is variation depending on how much use a single cell has seen.
This is brought about, as discussed in Chapter 2, by degradation associated
with each p/e cycle.
These two facts mean that any set of values, however well chosen, will not be
optimum over the vast majority of the cell’s lifetime. The ramifications are
premature aging and failure, sub-optimal operational response times and
de-rated specification sheet claims.
The problem is that the best solution for a specific device is not necessarily
a safe solution for all other devices and so the inferior factory model does not
bring out the best in each device, but will at least get all devices over the
line, in terms of the claims in the specification sheet. If a set of parameters
could be calculated for each subset of devices, this guard banding could be
eliminated, yielding a better specification sheet claim.
4.1.1 Restatement of the Contentions
In the introduction, Chapter 1, a number of contentions were developed
arising from the core research questions and the central hypothesis of this
work. Those contentions are restated here, before further analysis, for the
convenience of the reader. In Chapter 1 we contend that it will be possible
to:
Cn.1 Build a test platform incorporating a GA to perform destructive testing
in real time on hard silicon in order to find values for programming
parameters that will improve the endurance of that device;
Cn.2 Find values for programming parameters such that failure of the device
is estimated in a short period of time by the rapid destruction of small
portions of the NOR flash device, effecting a binning solution that
separates good devices from bad;
Cn.3 Find values, in this way, for programming parameters such that a general
improvement in endurance can be achieved that applies to all NOR
devices of this type;
Cn.4 Find values for a batch of NOR devices such that an improvement in
endurance can be achieved for that batch;
Cn.5 Find values for a specific device such that an improvement in endurance
can be achieved for a specific device;
Cn.6 Reduce search expense in terms of time and destruction of flash real
estate by using small population methods and by directing certain as-
pects of the search using domain knowledge and the knowledge gained
in previous experiments.
4.1.2 Contentions Cn.1 - Cn.5 in the Context of the
Platform Requirements
In contentions Cn.1 through Cn.5 above, we suggest that, by applying a GA
to perform destructive testing on silicon, we will find values for internal
parameters such that an improvement in endurance can be achieved.
Since the overhead in recording the cycle history of a cell is minimal in
bulk and block erasable flash, this analysis could be done on a per wafer basis,
thus optimising parameters for each single device cut from that wafer. The
calculation will involve testing to destruction a number of storage elements
on a wafer, the sacrifice allowing us to generate a better set of control register
variables. We may now encode these recipes for long life into each device.
This analysis could be done during the probe stage of manufacture, before
the wafer is broken up into chips for housing. It is envisaged that there may
be a set of parameters that will yield an endurance improvement on all flash
devices of a particular design, and also that there may be optimal sets of
parameters on a per production run level and so on.
In order to prove these contentions, it will be necessary to do destructive
testing on real silicon, using a test platform coupled to a computer running
a genetic algorithm program. It will also be necessary to have a quantity of
silicon and the information pertaining to the operation of the state machine
controlling the read, write and erase function of the device.
ADI agreed to partially support this project by providing access to a
large number of pre-production silicon devices that were used for product
qualification trials. These devices have extensive unused NOR flash memory
on board, so they are ideal for this evaluation.
4.2 Hardware Design
Hardware design is a logical progression of information gathering followed by
informed decision making. From previous discussions a number of important
hardware design criteria emerge. We are required to:
• Run experiments on a large number of silicon devices; swapping devices
in and out should therefore not be unduly difficult or cause any great
stress to the hardware;
• Program and erase groups of flash cells within the device a large number
of times;
• Know the conditions in place during such p/e cycling;
• Measure the degradation associated with the above p/e cycling.
The points above are the basic pillars of the hardware. However, closer
examination of each problem revealed that its solution would often require
many steps.
For example, programming a cell a number of times under known conditions
required the silicon chip’s on-board micro controller to be running.
This in turn required the development of not only embedded communications
software for use within the device, but also a hardware programmer to load
this embedded code to the code space of the micro (this is a new device - no
off-the-shelf programmer is available). A re-compiler was required to trans-
late assembly language output files into a form that was usable by the micro
controller. High-level software and low-level embedded code were required
to run the platform and communicate with a PC. There proved to be many
steps in programming a memory cell.
Furthermore, there were uncertainties regarding the devices since they
were unqualified and untested pre-production models with documentation
often in flux. Further major issues arose in relation to the silicon because
of undocumented behaviour attributed to its newness. These issues are de-
scribed later in the thesis.
To sum up, the hardware was a large part of the research challenge and
took a great deal of care and time to complete, due primarily to the complex-
ity of the challenge, the host of new methods required to fulfill the hardware
requirements, and the newness of the part.
4.2.1 Platform Options
From previous discussions in Chapter 2 Subsection 2.5.3 we know we will use
the Analog Devices ADu812/ADu824. After study of the part’s specification,
and with outline knowledge of the problem, a number of possible solutions
emerge for consideration.
In general terms, all solutions must include an external hardware platform
that will stimulate the pins of the device under test (DUT), under the control
of software. This software is in turn directed by the GA. There are a number
of ways in which this problem can be solved. We could:
1. Use a production component tester such as a Teradyne A565 or a
CTS5040, both of which are available at ADI’s Limerick plant;
2. Design and build dedicated hardware which would do much the same
thing;
3. Take an intermediate route and build a temporary test environment
using discrete, off-the-shelf equipment tied together with a piece of
software such as Labview.
Here are the arguments for and against the items listed above.
Advantages of using an off-the-shelf component tester:
1. The hardware is in place and is stable;
2. ADI agreed to allow us access to a tester during low activity times,
such as at the beginning of the quarter and during some weekends;
3. Any solution involving a tester would be on a platform which could
then be easily integrated into the manufacturing process, as both wafer
probe and final test use these platforms;
4. There may be some software routines in place from the current ADu824
test program that we could reuse.
Disadvantages of using an off-the-shelf component tester:
1. The tester, costing several millions of dollars, could not be taken out
of productive service to facilitate this project. This put expediting the
research at the mercy of production schedules;
2. It is unknown if any of ADI’s software for the ADu824/812 will be of
a type that could be used by this project;
3. An interface board is required. This board is of a standard multi-layered
printed circuit board type and must be built to a strict standard
to comply with the tester hardware. Such boards are costly, running to
approximately 2,200 Euros each. It is to be expected that a project
will, for one reason or another, require at least two iterations. This
cost overhead will limit the hardware’s flexibility later;
4. The GA software would have to be ported to the flavour of UNIX in
use on the tester (SunOS or Solaris).
The second option, designing and building dedicated hardware,
has a number of attractive advantages:
1. Build cost will be negligible in comparison with the generic tester route;
2. There would be no production schedule conflicts as the hardware will
belong solely to the project;
3. Flexibility - modifying the design or adding features will be a straight-
forward matter of electronics, without the need to stay within the
boundaries of the tester specification or the re-issue of a printed circuit
board.
There are also a few major disadvantages to the second option:
1. First and foremost, there is a lot of work involved in building a test
platform from scratch. The hardware and software solutions are likely
to be extensive. There are a lot of uncertainties in taking on a job
of this size from a skills base viewpoint alone, as well as testing and
debugging concerns;
2. The sheer scale of the undertaking cannot be overstated;
3. Any solution would not be in a form easily transferred to a production
environment.
The third option, that of discrete test equipment integrated with an off-
the-shelf software platform, falls before the first hurdle as it carries all the
scheduling problems of a production tester with almost all the development
headaches of building a platform.
Investigation showed that the pieces of equipment required were in al-
most constant use by the design evaluation group in ADI, and furthermore,
this route would also require substantial dedicated hardware. Two possible
routes remained. In order to finalise the design, critical pieces of information
were required. For one, how much tester time would be required to run the
experiments and how much tester time would be available?
4.2.2 Experiment Duration Estimates
From the specification sheet [4] we find that the erase time dominates the list
of tasks to be completed during an experiment. The other task times, such
as write, read and system latency, may be ignored since all the other tasks
put together are an order of magnitude shorter than the erase time. Thus it
is possible to approximate a total cell cycle of erase and write plus latency
by the erase time alone, which has allowable durations from 1 millisecond to
31 milliseconds.
From discussions with ADI staff [133] we find that an erase time of 20
milliseconds is considered to be average. The device currently has an expected
lifetime of 10,000 [3] erase cycles. A great result for this research would be to
achieve an order of magnitude increase in useful life. From this, we expected
to see an experiment time for any one group of cells (or population individual)
of 100,000 cycles multiplied by 20 milliseconds, or roughly 33 minutes.
According to Reeves [93] and Goldberg [38] useful results cannot be ex-
pected with a population of less than 150 to 400 individuals. From this
argument, it is possible to construct the following table of likely experiment
durations.
Table 4.1: Estimated single experiment duration
Case      Conditions                      Duration
Minimum   150 individuals, 10k cycles     8 hours
Median    300 individuals, 100k cycles    4.5 days
Maximum   400 individuals, 100k cycles    9 days
The figures are calculated with erase time assumed to be 20 milliseconds.
The table shows that the worst case scenario is a duration of nine days
for a single evaluation. Given that it is hoped to do numerous evaluations,
this casts a serious shadow of doubt over the viability of the tester option,
as it is unlikely that a production tester would be idle for weeks at a time.
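The arithmetic behind these estimates can be sketched as follows. This is a minimal illustration only, assuming that individuals are evaluated one after another on a single site and that each p/e cycle costs one 20 millisecond erase; the function and variable names are ours and are not part of the platform software. Under these same assumptions the minimum and maximum rows of Table 4.1 are reproduced, while the median row evaluates to roughly 6.9 days, so the tabulated 4.5 days may rest on a slightly different assumption.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def experiment_seconds(individuals, cycles, erase_ms=20.0):
    """Serial evaluation time when each p/e cycle costs about one erase time."""
    return individuals * cycles * erase_ms / 1000.0


# Population sizes and cycle counts taken from Table 4.1.
scenarios = {
    "minimum": (150, 10_000),
    "median": (300, 100_000),
    "maximum": (400, 100_000),
}

for name, (individuals, cycles) in scenarios.items():
    seconds = experiment_seconds(individuals, cycles)
    hours = seconds / SECONDS_PER_HOUR
    days = seconds / SECONDS_PER_DAY
    print(f"{name}: {hours:.1f} hours ({days:.2f} days)")
```

A single individual cycled 100,000 times corresponds to experiment_seconds(1, 100_000), which is 2,000 seconds.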
Another unknown was the availability of existing software routines for the
812/824 which might shorten development time. Several test programs were
reviewed and it became apparent that all the code that might be reused was
not test code but evaluation code and while there were some useful items, no
compelling reason could be found for using the production tester route.
The question of route was settled at this time when the silicon industry
in general became extremely busy just prior to the ‘Dot Com Bubble’ burst.
Bookings or future orders were at their highest levels in many years, resulting
in an effort to maximise through-put on all testers. It became obvious that
any tester time available would be very little indeed.
4.2.3 Evaluating the Remaining Option
The remaining option was to build a dedicated test platform of some kind
where experiments could be performed under the direction of a GA. These
experiments involved setting control registers and programming and erasing
specific groups of cells. There were a huge number of options to be considered
and a great deal of time was spent evaluating approaches.
The whole process of design evaluation took more than two years’ part-
time work and is too long to document here. However, we do discuss a
number of design features since they are integral to the operation of the
platform. We document other design features because in hindsight, they
may have been achieved in different ways. Still more features are notable
because in the event, they were not used for data collection. We include the
latter to clarify elements of the final schematic diagram presented here. There
follows a list of considered approaches. We considered a platform based on
the following options:
1. An Analog Devices ADu824 evaluation board already in construction
with hardware ready to use;
2. An initial test site, followed by production of multi-site boards;
3. Cell current as an objective function;
4. Longevity as an objective function;
5. A tool chain and test equipment supplied by Rosemount Analytical.
Option 1 This was discounted after some time due to lack of flexibility
in the completed design, including such problems as accessing cell current,
accessing proprietary section of register memory and boot code memory. A
further problem with this approach was that testing to destruction would
render the entire board useless as the DUT is soldered to the PCB. Furthermore,
it was not clear whether sufficient quantities of boards could be obtained to
complete the research. A surface mount device socket is not available and a
through hole version cannot be fitted to this board type.
Option 2 This was eventually discounted due to the complexity and time
involved in proving the prototype and going on to design and build a multi-
site version. Furthermore, tests would show that most of the functionality
required would be present on the evaluation site. Although it takes much
longer to run experiments on a single site board, this drawback is more than
outweighed by the extra development time.
Option 3 This option was discarded late in the evaluation process be-
cause the extraction of the objective function was technically challenging
and it was possible to argue that this fitness function (cell current) did not
necessarily equate to life extension.
Options 4 and 5 These options were combined to produce the final test
platform which is detailed in the next section.
4.3 Test Platform Design
In this section we examine the broad strokes of the hardware design while
attempting to explain it and tie it into the programming model. Where
possible, detail has been removed to the appendix in Chapter 9 for clarity,
however a certain degree of technical description must be retained to explain
the operation and limitations of the system.
The Analog Devices ADu812/824 microcontroller has been chosen for
the investigation, an outline of which is shown in Figure 9.3. The internal
arrangements are detailed in the preliminary product description. This is
a proprietary document and is not reproduced here, but it is available for
review on request.
4.3.1 DUT General Description
The ADu812/824 is a complete smart transducer front-end, integrating many
subsystems including ADCs (Analogue to Digital Converters), an 8-bit MCU
(Micro Controller Unit) with program and data memory and much more. On-chip
factory firmware would normally support in-circuit serial download and
debug modes via the UART (Universal Asynchronous Receiver/Transmitter), but
in this case no such firmware is included, nor is any tested version available
for upload. This means that, on the one hand, we may use the 2K non-volatile
NOR flash area reserved for factory firmware during experimentation, while
on the other, our embedded code must be uploaded via parallel programming
since the serial port boot loader code is not present.
Figure 4.1: Block diagram of the Analog Devices ADu812 Micro-converter chip
In the diagram it can be seen that the DUT has four 8-bit general purpose
ports through which many of the functions of the DUT are controlled. There
are many subsystems which are not used during this research and can be
ignored, and still more are not used but nevertheless must be configured,
such as the watchdog timer.
Figure 4.2: ADu824 PLCC pin assignment
The memory map shown in Figure 4.3 is of particular interest. The device
has, all told, 10 kilobytes of non-volatile program memory and 640 bytes of
non-volatile data memory.
The code memory is grouped into 64-byte erasable chunks while the data
memory is grouped into 4-byte chunks. This means that any analysis
involving erase must be conducted on a 64-byte chunk of code memory or a
4-byte chunk of data memory. This reduces our experiment space consider-
ably, although it may be possible to do read and write analysis with greater
resolution while also running it in parallel with any erase type analysis.
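The effect of this erase granularity on the experiment space can be illustrated with a short sketch. It assumes, purely for illustration, that each memory is addressed as a flat range of byte offsets starting at zero; the actual address layout of the device (including where the 2K factory area sits) is not reproduced here, and the helper name page_of is hypothetical.

```python
CODE_BYTES = 10 * 1024  # non-volatile program memory, from the text
DATA_BYTES = 640        # non-volatile data memory, from the text
CODE_PAGE = 64          # smallest erasable unit of code memory, in bytes
DATA_PAGE = 4           # smallest erasable unit of data memory, in bytes


def page_of(offset, page_size):
    """Return (page index, first offset, last offset) of the erase unit."""
    index = offset // page_size
    start = index * page_size
    return index, start, start + page_size - 1


code_pages = CODE_BYTES // CODE_PAGE
data_pages = DATA_BYTES // DATA_PAGE
print("erasable code chunks:", code_pages)
print("erasable data chunks:", data_pages)
print("code offset 200 lies in chunk", page_of(200, CODE_PAGE))
```

Under these assumptions both memories offer the same number of independently erasable chunks, which bounds the number of erase experiments that a single device can host.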
4.3.2 Memory Map Coding Consideration
As with all 8051-compatible devices, the ADuC824 has separate spaces for
program and data memory. The program code may be executed internally
or externally by manipulating the EA pin. The data memory space consists
of four physically separate blocks:
• The lower 128 bytes of RAM
• The upper 128 bytes of RAM
• 128 bytes of special function register (SFR) area
• 640-byte NV data memory
Figure 4.3: ADu824 memory map
MS (Memory Select) is mapped to port 3.5, pin 23 - when high, it selects code
memory and while low, it selects data memory
Many other functions of the NV memory areas are mapped into SFRs, such
as control over the extra 2-Kilobyte bootstrap code space, parallel program-
ming, data code programming and read/write/erase parameter variables.
Armed with this information, we may now consider distilling a generalised
block diagram of the proposed system, shown in Figure 9.10, to a schematic
shown later in this chapter in Figure 4.7. The control board is used to exercise
the pins of the DUT and to allow devolution of complex algorithms to here
rather than have them consume limited resources on the DUT. Since we must
use parallel programming to load program code to the chip, it makes sense to
use parallel communication between the DUT and the control board during
run time. From the control board to the PC, serial communication is easiest
to implement. While Ethernet would be faster and more flexible, neither the
DUT nor the IFT is fitted with Ethernet ports.
Figure 4.4: Generalised block diagram
4.3.3 The Control Board
Rosemount Analytical also supported this project and provided test equip-
ment and a tool chain at nominal cost. This consisted of an Ashling CTS51
Emulator and Pathfinder integrated development environment, a Prom Pro-
grammer, several Intelligent Field Transmitter (IFT) type 8051-based SBC
(single board computer) and an IFT power supply.
This equipment list was chosen because together they form a complete
tool chain for platform development - the SBC is perfectly suited to the task
of control board and is well documented and understood. It is also rich in
features such as ADCs, DACs, GPIO (General Purpose Input Output) and
many others, all under the control of an 8051 core. Keeping the cores
common reduces the learning curve for the programming tasks. The processor
is fitted to the board via a Plastic Leaded Chip Carrier (PLCC) socket. This
means it may be replaced with the Ashling CTS51 PLCC emulator probe.
This set-up allows direct control over all the functions of the SBC from
the Pathfinder Integrated Development Environment (IDE) and enables us
to generate code that can be tested in real time on the control board.
Furthermore, this code can be run in emulation with breakpoints and watch
variables, which allows tremendous visibility into the workings of the
control board features.
The DUT, on the other hand, has no such development tool available and
must be treated very much as a black box. This proved challenging because,
as was later discovered, the DUT exhibits a number of serious undocumented
behaviours.
The code to run all features on the SBC is bespoke to this project, and
includes many subsystems including a serial port driver, relay controllers,
GPIO and LCD port drivers, to mention just a few.
4.3.4 The DUT Board
Next, we consider the design of the DUT board. The SBC control board
does not have sufficient GPIO to directly control all four 8-bit ports of the
DUT as well as several other pins, such as ALE (Address Latch Enable) and
Reset; therefore a piecemeal approach is adopted. A bus is run to all four
8-bit ports through 8-bit transparent latches (74LS373).
We use other GPIO pins to select each latch and yet more GPIOs to
control ancillary DUT control pins. In this way we may write and latch 8
bits of logic to each port in turn, thereby using 12 GPIOs to express 32 logical
bits. In the software, it will be necessary to select a port prior to writing
data to it. This may cause some latency; however there is no alternative to
port expansion at this point.
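The port expansion scheme can be modelled behaviourally as below. This is a sketch for illustration and not the control board code, which is written for the 8051: the class and method names are hypothetical, and the model assumes one shared 8-bit bus with one enable line per latch, which matches the count of 12 GPIOs given above.

```python
class TransparentLatch:
    """Behavioural model of one 8-bit transparent latch (74LS373 style)."""

    def __init__(self):
        self.q = 0x00  # value currently presented to the DUT port

    def update(self, bus, enable):
        # While enable is high the output follows the bus;
        # when enable is low the last value is held.
        if enable:
            self.q = bus & 0xFF


class DutPortExpander:
    """Four latches sharing one 8-bit bus, one enable line per latch."""

    BUS_LINES = 8

    def __init__(self, ports=4):
        self.latches = [TransparentLatch() for _ in range(ports)]

    def gpio_lines_used(self):
        # Eight shared data lines plus one select line per port.
        return self.BUS_LINES + len(self.latches)

    def write_port(self, port, value):
        # Select the port, then release it so that the byte is held.
        for enable in (True, False):
            for index, latch in enumerate(self.latches):
                latch.update(value, enable and index == port)

    def read_back(self):
        return [latch.q for latch in self.latches]


expander = DutPortExpander()
expander.write_port(0, 0x3C)
expander.write_port(2, 0xA5)
print(expander.gpio_lines_used(), [hex(v) for v in expander.read_back()])
```

The select-then-write sequence in write_port is the source of the latency mentioned above: every byte presented to the DUT costs at least one extra select operation.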
The SBC has 17 GPIOs memory mapped through an 8255 programmable
input output (PIO) controller. These design criteria are now informing the
software requirements, and as we progress, a generalised list of tasks emerges
as well as the rudiments of some of the software routines we will use.
Figure 4.5: Block diagram of the DUT board
Figure 4.6: The 74LS373 transparent latch
Since we will program the device using parallel programming we will also
use the parallel ports to control the device and communicate with the DUT
embedded code. In this way there is no need to operate a separate UART
on the DUT or implement a second UART on the SBC. In a later hardware
incarnation, this decision will be reversed.
Figure 4.7: DUT board schematic
4.3.5 Signal Conditioning Board
The signal conditioning board contains mostly analogue circuits for preparing
cell current for conversion to digital and translating various RS232 levels.
Figure 4.8: DUT board prototype
DUT board prototype showing the data bus at the rear and the transparent latches
connected to the bus. The port relays can be seen on the left
The relays in the DUT board circuit diagram shown in Figure 4.7 support
a special mode of the device called Cell I mode. In this mode, the chip will
output the cell currents of the selected memory elements to the GPIO port
instead of to the data sense circuit. Under normal operating conditions the
currents flow through the sense circuits where their values are determined.
A current above or below a certain threshold indicates whether each
bit is a one or a zero.
These currents are diverted to port 1 in cell I mode. This makes execution
impossible during cell I mode as the port is essentially shorted to the selected
memory byte. It takes a reset of the device to restore normal operation.
The precise value of the currents can indirectly indicate the condition of
the cells and may be useful as part of any objective function. We can sum
the currents provided that the bits are all set or cleared. This gives us a
relative measure of the health of the selected byte, since each bit will have
undergone precisely the same aging. The current is small, of the order of
microamps (1 × 10^-6 A), and it was considered important that no other device
be present on the measurement path to either sink or source current
while we were attempting to measure it.
The relays are energised in cell I mode, thus disconnecting the DUT
pins from the latch and connecting them to the signals board, where they
are directed to a precision resistor. There, they are converted to a voltage
for translation to digital by the control board’s ADC. Before conversion,
level and offset adjustments are required to present the A/D converter with
a useful signal for conversion. The signals board also handles serial port
translation from RS232 standard to digital and from digital to RS232.
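The measurement chain described in this subsection (summed cell current, precision resistor, level and offset adjustment, A/D conversion) can be sketched numerically as follows. Every component value here is an assumption chosen only to make the example concrete: the 10 kilohm sense resistor, unity gain, zero offset, 2.5 V reference and 12-bit converter are not taken from the actual design, and the function names are hypothetical.

```python
def byte_is_uniform(value):
    """Summing is only meaningful when all eight bits are set or all cleared."""
    return value in (0x00, 0xFF)


def adc_code_for_byte(bit_currents_ua, sense_ohms=10_000.0, gain=1.0,
                      offset_v=0.0, vref=2.5, adc_bits=12):
    """Convert eight per-bit cell currents (in microamps) to an ADC code."""
    total_amps = sum(bit_currents_ua) * 1e-6
    # The precision resistor turns the summed current into a voltage.
    volts = total_amps * sense_ohms * gain + offset_v
    # Clamp to the converter's input range before quantising.
    volts = min(max(volts, 0.0), vref)
    full_scale = (1 << adc_bits) - 1
    return round(volts / vref * full_scale)


# Hypothetical per-bit currents for a fresh byte and a worn byte.
fresh = adc_code_for_byte([20.0] * 8)
worn = adc_code_for_byte([15.0] * 8)
print("fresh byte code:", fresh, "worn byte code:", worn)
```

The point of the sketch is only that a drop in summed cell current appears as a proportionally lower converter reading, giving a relative rather than absolute measure of the health of the byte.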
4.3.6 Building and Testing
There are many other subsystems that required attention. On the SBC, for
example, the watchdog timer and the operating modes of the PIO (which are
extensive) required preparation. For the sake of brevity, all descriptions of
these subsystems have been omitted but they can be found in the specification
sheets for the memory mapped components and in the software comments in
Appendix 9.3.
The build was approached in a piecemeal fashion and developed in parallel
with the software routines that were required to run each subsection. Exten-
sive use was made of the ICE (in-circuit emulator) throughout development.
Many obstacles were encountered and overcome during the extended period
of the build. In summary, the SBC was tackled first with each subsystem
configured as it was required by the project. Most debugging was undertaken
using the ICE, a multimeter and an oscilloscope. The DUT raised its own
problems, as mentioned earlier. These problems are discussed in Chapter 5
Section 5.4. Once the hardware was stable, a great deal of testing was un-




At this point, it is appropriate to summarise the hardware arrangements.
The DUT is an 8051-compatible, 8-bit, microprocessor-based smart transducer
with 10 kilobytes of on-board NOR flash memory. We use the four 8-bit ports,
bonded out from the die to port pins, to control most aspects of the device’s
operation. From the ports, we can load program code for use during exper-
iment, communicate with the DUT during operations, and extract the cell
current via Port1. Other pins on the micro are controlled to put the device
into special modes, reset the device and so on. The code set on the DUT will
be as small as possible in order to preserve memory for experimentation, and
to make testing easier, since the DUT’s micro is black box, pre-production
and untested.
Figure 4.9: Entire tester block diagram
The intention is to allow each discrete p/e cycling event, referred to from
here on as a Fragment, to run autonomously on the DUT. A Fragment is
defined as a subdivision of a GA-generated Individual that is complete in
itself. That is, it is a specific block, running for a specific number
of p/e cycles, under known conditions, instantiated by a set of values
written to the control registers. An Individual consists of no more than
256 Fragments.
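The relationship between a Fragment and an Individual can be expressed as a pair of simple data structures. This is an illustrative sketch only: the field names, the use of an integer block index and the tuple of register values are assumptions, and the real encoding used by the platform is described with the software.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

MAX_FRAGMENTS = 256  # upper bound on Fragments per Individual, from the text


@dataclass(frozen=True)
class Fragment:
    block: int                  # erase block to be cycled (assumed indexing)
    cycles: int                 # number of p/e cycles to run
    registers: Tuple[int, ...]  # control register values (layout assumed)


@dataclass
class Individual:
    fragments: List[Fragment] = field(default_factory=list)

    def add(self, fragment):
        # Enforce the limit of 256 Fragments per Individual.
        if len(self.fragments) >= MAX_FRAGMENTS:
            raise ValueError("an Individual holds at most 256 Fragments")
        self.fragments.append(fragment)

    def total_cycles(self):
        return sum(fragment.cycles for fragment in self.fragments)


individual = Individual()
individual.add(Fragment(block=0, cycles=10_000, registers=(0x12, 0x34)))
individual.add(Fragment(block=1, cycles=5_000, registers=(0x12, 0x35)))
print(len(individual.fragments), individual.total_cycles())
```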
The control board is an 8051-based single board computer with many
extra resources, such as high current outputs, GPIO and serial ports, to
name a few. It will be the control board’s task to operate and program
the DUT, communicate with it, schedule Fragments for p/e operations, and
so on. The SBC controls the DUT board via GPIO organised around an
8255 programmable input output device. The SBC will perform the tasks
that are too complex for the DUT firmware and will also implement serial
communications with the PC.
The PC will run an application that will enable the user to control the
functions of the DUT and to load Fragments to it via the control board. The
GA will run on the PC and will constitute input to the PC application. The
PC application will centralise control of all aspects of hardware operation
and GA evolution. The PC will also take the DUT executable code in Intel
HEX format and construct code packets from it. The control board will, in
turn, upload the packets to the DUT’s reset vector.
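The step from Intel HEX to code packets can be sketched as below. The record layout and checksum rule are those of the standard Intel HEX format; everything else is an assumption made for illustration, in particular the 16-byte payload size, the (address, payload) packet shape and the function names, none of which are taken from the PC application.

```python
def parse_record(line):
    """Decode one Intel HEX record into (address, record type, data)."""
    line = line.strip()
    if not line.startswith(":"):
        raise ValueError("record must start with ':'")
    raw = bytes.fromhex(line[1:])
    # All bytes of a valid record, checksum included, sum to zero modulo 256.
    if sum(raw) & 0xFF != 0:
        raise ValueError("checksum mismatch")
    length, address, record_type = raw[0], (raw[1] << 8) | raw[2], raw[3]
    data = raw[4:-1]
    if len(data) != length:
        raise ValueError("length field does not match data")
    return address, record_type, data


def hex_to_packets(lines, payload_size=16):
    """Group the data bytes of a HEX file into contiguous code packets."""
    image = {}
    for line in lines:
        if not line.strip():
            continue
        address, record_type, data = parse_record(line)
        if record_type == 0x01:   # end-of-file record
            break
        if record_type != 0x00:   # other record types are ignored here
            continue
        for offset, byte in enumerate(data):
            image[address + offset] = byte
    packets = []
    for address in sorted(image):
        if packets:
            start, payload = packets[-1]
            contiguous = address == start + len(payload)
            if contiguous and len(payload) < payload_size:
                payload.append(image[address])
                continue
        packets.append((address, bytearray([image[address]])))
    return [(start, bytes(payload)) for start, payload in packets]


sample = [
    ":03000000020030CB",    # three bytes at the reset vector (address 0)
    ":0400300075810722AD",  # four bytes at address 0x0030
    ":00000001FF",          # end of file
]
for start, payload in hex_to_packets(sample):
    print(hex(start), payload.hex())
```

Grouping by contiguous address keeps each packet self-describing, so the control board needs only the start address and the payload to place the code.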
4.5 NOR Revision Two and NAND
After several years of use, the hardware described above was subjected to a
major hardware revision. The objective of this undertaking was to speed up
the process of evolution. It became apparent that communications latencies
arising from the convoluted communications steps described earlier were a
significant factor in the length of experiments.
The process of serial communication to the SBC followed by processing
and parallel communication to the DUT was simply not efficient. Furthermore,
processing a significant number of individuals in a shorter timeframe
would require multiple DUTs operating together. The platform
described above could not cope with that complexity.
In the new design, the SBC was eliminated, except for the purpose of
uploading the DUT with run-time code. Once loaded, the DUT was au-
tonomous and it used its own serial port to communicate directly with the
PC application. This would allow us to run a DUT in a socket simply by
powering it up and adding a serial transceiver.
The serial port speed was increased to 119 Kilo Baud. A new PC ap-
plication to cope with multiple serial connections was devised. This new
system did speed up operations and was used to collect some data. However,
a multi-site version was never completed because at this time, the focus of
the research began to move headlong towards NAND memory and therefore
beyond the scope of this thesis.
The subsequently designed NAND hardware is much more scalable and
was developed in a shorter timescale, largely due to what was learned through
the NOR revision 1 and 2 design processes. The NAND hardware is au-
tonomous and may be run in parallel to any dimension. A single computer
application can schedule all testers and a single entity (such as a GA) may
be used as input to this program. Hardware such as this is in use by all
members of the expanded research group.
4.6 Summary
In this chapter we have envisaged solutions to the problems set out by the
research goals and critically examined them. We describe the remaining
solution path and specify hardware and software requirements arising from
it. We describe the resultant tool chain and outline the major elements of
the hardware design. Much of the fine detail has been omitted to facilitate
clarity and brevity, including testing, debugging and commissioning.
Figure 4.10: NAND tester
The hardware design process informs the software engineering tasks and these




In this section, we outline how the software solution was developed. The tool
chain and other design decisions are explained, and the three required code
sets are outlined. The functionality of each code set is examined in
order to expose the operation of the platform as a whole. We also mention
some modifications that were made later in the experimentation phase as a
result of what we learned along the way. These improved the efficiency of the
system and defined the shape of future test platforms used for NAND testing
in spin-out research programs.
5.1 Software Requirements
Once the shape of the hardware platform is finalised, it becomes possible
to develop the programmer’s model. In this, we envisage solutions to the
communications issues, decide on the location of processes and so on. There
is a minimum of three code sets arising from the system design thus far:
1. There is the PC-based application incorporating a GA to produce the
populations of solutions which are the basis of all experiments. It will
contain a processing and communication module which will communi-
cate with the control board;
2. There is the control board software that communicates with both the
PC and the DUT, and controls many of this board’s sub-systems such as
the LCD port, the programmable input output (PIO) chip and many
more;
3. Finally there is the DUT software. This must communicate with the
control board, set up the programming conditions and perform the p/e
cycling, and measure the fitness.
5.2 Design Considerations
The DUT has four 8-bit multi-function ports that are used for many func-
tions of the chip, such as address bus, data bus, and Programmable Read
Only Memory (PROM) programming. While an algorithm could probably
be found in a commercial programmer that would program the device, there
are a number of other considerations at play which mean that this option
cannot be availed of:
• A commercial PROM programmer will be unable to run the proprietary
algorithms to put the device into factory mode and manipulate the
register that controls the programming operations;
• A PROM programmer will not be able to put the device into cell current
read mode, nor will it be able to disconnect itself such that an accurate
cell current reading could be made;
• We will wish to use the chip’s own programming functions as they are
the subject of the experiments;
• We wish to devolve as much functionality to the control board as pos-
sible;
• There is no second serial port on the control board and so substantial
modification would ensue if we were to use the DUT’s serial modes for
communicating with the control board.
For these reasons we must develop our own PROM programmer which can
manipulate all the modes of the device. This means that the control board
needs to be able to control all the pins on the four ports of the DUT. We
will need to be able to disconnect the port used for measuring cell current
under program control and also be able to load experiments to the DUT
and read fitness values afterwards. Since the control board has an 8-bit port
in common with the DUT, we can expand this to control all ports. Extra
discrete pins on the DUT such as ALE and PSEN and RESET are controlled
via a secondary control bus. In this way, every pin on the device is under
program control. We can now generate a generalised functional block diagram
of the software shown in Figure 5.1.
Figure 5.1: Functional block diagram of the test platform’s software
For all the reasons outlined, the control board software may be challenging,
as it will contain all of the following not insubstantial modules:
• A communications module for effecting serial communications with the
PC
• An initialisation module to set up the SBC resources
• A parallel communication module for communicating with the DUT
board
• Data and experiment-handling capabilities
• A DUT programming module
In terms of the project as a whole, this software work package is a gating
factor because of its central role in communicating with the DUT hardware
and the PC modules and so, it needs to come first. The Ashling CTS51 and
associated tool chain is ideal for writing embedded assembler and embedded
C. However, to retain precise control over the real time environment, it was
considered advisable to write as much of the control and DUT code as possible
in assembly.
5.3 The Control Board
In general, the assembly files for both the DUT embedded code and the
control board embedded code come in two parts - a declaration file or header
file containing all memory reservations and constants, and then a “main”
section along with sub routine functions. The control board code modules
are outlined in Table 5.1.
Table 5.1: Control board code modules
Module                Description
Initialisation        Set up the SBC resources
Serial module         Communicates with the PC and handles data downloads
Protocol interpreter  A switch that will identify the command issued by
                      the PC
Command               A suite of command functions to implement the
                      commands received from the PC
Parallel module       Routines to pass data to the DUT board latches
Experiment handler    Routines to run experiments and measure fitness
5.3.1 The Serial Communications Module
Serial communication is used, since both the PC and the control board are
fitted with RS232 serial ports (UARTs). The serial monitor loops endlessly,
polling the serial buffer flag. The UART sets this flag when the
serial buffer is full of incoming data. When the flag is set, the code
immediately moves the contents of the serial buffer (Sbuf) to the accumulator
and resets the serial flag. The accumulator content is then moved into RAM.
This move happens while the serial buffer is filling with the next byte, so
no data can be lost by overwriting the accumulator or Sbuf.
The process happens a prescribed number of times according to the pro-
tocol. The number of bytes transferred in each packet has changed over time
to suit the needs of the application. Typically it is 19 bytes. The packet is
now stored in memory with the first byte considered the command byte. This
is now processed by the command handler. While the command handler is
active, no further serial communications are possible. Thus, a packet is sent
and the PC must wait for a satisfactory response from the SBC before issuing
further commands. Table 5.2 details the serial communications protocol.
Protocol        Implements
CheckBusy       Returns 5 if the DUT is not busy. Returns 4 if it is
SanityCheck     Port P0.6 on the DUT will be low if the cell is alive
ADRead          Reads the A to D converter
ResetDUT        Resets the DUT
EndRead         Takes the DUT out of cell current mode
Boot            Programs a packet to the DUT
FacMode         Puts the device in factory mode
CellI           Selects the cell and connects the relays
ClState         Indicates that an erase or write error has occurred
LoadM           Loads a Fragment to the DUT
Plug            Set the control registers and verify them
InitLatches     Set the latches around the DUT to benign values
ProgramByte     Implement the timing diagram for programming
EraseBlock      Implement the timing diagram for erasing a block
ReadZero        Read the A/D with relays disconnected
SetLatch        Selects a latch and writes a value to it
WriteCtrl       Write to the control reg of the PIO
Write a b or c  Writes to the PIO ports
Table 5.2: Serial communications protocol
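The polling receive loop described in Section 5.3.1 can be sketched as
follows. This is a minimal Python simulation under stated assumptions, not
the control board's assembly code: FakeUart and receive_packet are
hypothetical names, and the 19-byte packet length is the typical value
quoted in the text.

```python
PACKET_LEN = 19  # typical packet length quoted in the text


class FakeUart:
    """Hypothetical stand-in for the UART: a receive flag plus a one-byte buffer."""

    def __init__(self, incoming):
        self._incoming = list(incoming)
        self.rx_flag = False
        self.sbuf = 0

    def tick(self):
        # Simulate the hardware latching the next byte and raising the flag.
        if not self.rx_flag and self._incoming:
            self.sbuf = self._incoming.pop(0)
            self.rx_flag = True


def receive_packet(uart, length=PACKET_LEN):
    """Poll the receive flag, copying each byte to RAM until a packet is complete.

    Like the loop it models, this waits forever if fewer than `length` bytes arrive.
    """
    ram = []
    while len(ram) < length:
        uart.tick()
        if uart.rx_flag:
            acc = uart.sbuf       # move the serial buffer to the accumulator
            uart.rx_flag = False  # reset the flag so the next byte can arrive
            ram.append(acc)       # store the accumulator in RAM
    return ram[0], ram[1:]        # command byte, remaining packet bytes
```

The first byte is returned separately because the text treats it as the
command byte that is handed to the command handler.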
5.3.2 The Event Handler
The event handler is called from the end of the serial routine, so this is
where execution will return to before returning data to the PC. We write to
memory address 2400 which has the effect of energizing the SBC relays. This
is a debug line to let us know audibly that the packet has been received and
is being handled. This feature was found to be useful and was never taken
out.
A switch statement is then executed and the sub routine that handles each
action is called. Each return statement in the switch returns execution to
the bottom of the serial handler, where the data to be returned to the PC is
processed through the serial port.
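The switch on the command byte can be sketched as a dispatch table. This is
an illustrative Python sketch only: the command-byte values, the handler
names and the "unknown command" reply are all hypothetical, since the real
encodings are not given in the text. Only the CheckBusy reply values (5 and
4) come from Table 5.2.

```python
def check_busy(payload):
    # Table 5.2: returns 5 if the DUT is not busy, 4 if it is.
    # Hypothetical: the busy state is passed in the payload for this demo.
    dut_busy = bool(payload and payload[0])
    return [4] if dut_busy else [5]


def reset_dut(payload):
    return [0]  # hypothetical acknowledgement value


# Hypothetical command-byte values; the real encodings are not given in the text.
HANDLERS = {0x01: check_busy, 0x02: reset_dut}


def handle_packet(packet):
    """Switch on the first byte of the packet and call the matching routine."""
    command, payload = packet[0], packet[1:]
    handler = HANDLERS.get(command)
    if handler is None:
        return [0xFF]            # hypothetical "unknown command" reply
    return handler(payload)      # reply is handed back to the serial routine
```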
5.3.3 Other Functions
There are routines to run the PROM programming functions that implement
the timing diagram found in the device specification, and there are routines
to put the device in cell current read mode and capture the A/D output.
All other functions are either to implement the communications protocol
between the PC and the DUT, or to handle low-level functions such as wait
times or write values to the programmable input output device (PIO).
5.4 The Device Under Test
The DUT code is designed to be small, implementing only the minimum
functionality. There are 3 reasons for this:
1. To preserve as much memory as possible for experimentation.
2. The DUT is a black box.
3. When performing p/e cycling, there are few ways to verify task perfor-
mance, so this code should be short, well understood and well tested.
Point two above means that there is no way to see inside, single step
or otherwise trace program execution. This compels us to use an external
simulator to test code, and set port pins to signal code points, when running
in real time. This sort of work is best kept to a minimum, especially since
this chip is unreleased and as such, may have undocumented behaviour. As
it happened, there were several undocumented behaviours, one of which was
serious.
As code was developed for the DUT, it became apparent that something
was very wrong. Code segments that simulated perfectly well would not
run at all on the DUT. An enormous amount of time and effort went into
making sure that the code was being programmed to the device correctly.
Given that this process contained numerous steps involving a compiler, a
cross compiler and a newly-developed PROM programmer, there was plenty
of scope for errors, any one of which might cause the observed behaviour. It
is impossible here to overstate the time it took to verify all of these steps.
Finally we resorted to having the DUT single step the code space contents
out through a port pin, and thus we were able to establish that the machine
code we expected was in fact in the code space and at the relevant location.
Many other possible problems were ruled out, such as incorrect reset vectors
and other internal configuration possibilities.
It is not normally useful to consider that the core is not performing as
described, but finally, since this is a pre-production part we had to address the
possibility. It was discovered that very trivial code snippets would execute
normally, but anything with any complexity would not. Various expansions
of trivial working code revealed that the core would not perform any 16-bit
branch instructions. Only 8-bit vectors were working normally.
Several attempts were made to establish whether this was some special
mode of operation we could exit, but no one in Analog Devices recognised
this as a feature. In the end we wrote the code without function calls or
long-jumps, and with that, the core performed flawlessly. This problem is
further documented in the design notebooks.
5.4.1 First Generation DUT Code
The DUT code has 2 modules, a communications implementation and a p/e
cycling module. Communications data is received through ports 2 and 3.
Port 0 bit 0 (P0.0) is cleared at reset and is the busy/idle flag. P0.3 is
toggled with every state change denoting that the device is alive. This is
viewable on an oscilloscope. The SBC uses these signals to inform the PC
software.
All zeros on port 2 is the Do Nothing state and will result in another
polling of the port. Any other value on the port will indicate that there is
now data available on port 3. This data is transferred to RAM at the location
identified by port 2. Now we set P0.0, the busy/idle flag (B/I flag). Next, the
value of the Iter (iterations) memory reservation is tested. This corresponds
to ‘the number of iterations we are to do with these register settings’. If it
is zero, then there is nothing to do, so we clear the busy/idle flag and return
to polling.
In this way we can push any values we choose into the DUT’s memory.
As long as we don’t set Iter, we can do this as often as we like. Setting Iter
from the protocol will cause the registers to be instantiated and p/e cycling
to begin. It also sets the number of p/e cycles performed per iteration, so
we may control the resolution of the test, that is, of the total number of
p/e cycles we expect to see before the block of cells fails. It also causes
the cell ok flag to be set. This is cleared when the block fails.
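The first generation polling protocol can be sketched as a small state
machine. This is a Python simulation under stated assumptions, not the DUT
assembly code: the RAM address chosen for Iter and the names new_state and
dut_step are hypothetical, and p/e cycling itself is reduced to a flag.

```python
ITER_ADDR = 0x10  # hypothetical RAM location of the Iter reservation


def new_state():
    # The busy/idle flag (P0.0) is cleared at reset.
    return {"ram": {}, "busy": False, "cycling": False}


def dut_step(state, port2, port3):
    """One pass of the polling loop: port 2 carries an address, port 3 the data."""
    if port2 == 0:                       # all zeros on port 2: Do Nothing
        return state
    state["ram"][port2] = port3          # store the data at the location given by port 2
    state["busy"] = True                 # set the busy/idle flag
    if state["ram"].get(ITER_ADDR, 0) == 0:
        state["busy"] = False            # nothing to do: clear the flag, resume polling
    else:
        state["cycling"] = True          # Iter is non-zero: p/e cycling would begin here
    return state
```

Writing any location other than Iter leaves the device idle, which mirrors
the point that values can be pushed into memory as often as we like until
Iter is set.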
5.4.2 Second generation DUT code
In Chapter 6 Section 6.5.3 we outline a number of issues with the duration
of testing. It was apparent that the time spent erasing was only a small part
of the overall experiment time. This was in no small part due to communi-
cations latencies involving the various code sets and the slow nature of the
parallel communication between the SBC and the DUT. This is confirmed
later when timing analysis reveals that on average 11 fitness quanta (ticks
of the fitness counter) can be cycled to destruction per minute using default
parameter settings. This equates to 11*15*255, or approximately 42000, cycles
of 1 millisecond each, a total of about 42 seconds. This means that about
18 seconds of every minute are spent on communications latency, or 30% of
overall experiment time.
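The latency estimate above can be reproduced with a few lines of
arithmetic. The sketch below simply restates the figures from the text; the
factors 15 and 255 are used as given, and the variable names are ours.

```python
QUANTA_PER_MINUTE = 11   # fitness quanta cycled to destruction per minute
FACTOR_A = 15            # factors 15 and 255 are taken as given in the text
FACTOR_B = 255
CYCLE_TIME_MS = 1        # each cycle takes 1 millisecond

cycles = QUANTA_PER_MINUTE * FACTOR_A * FACTOR_B
cycling_ms = cycles * CYCLE_TIME_MS          # time spent on p/e cycling per minute
latency_ms = 60_000 - cycling_ms             # remainder of the minute
latency_percent = 100 * latency_ms / 60_000

print(cycles)           # 42075
print(latency_ms)       # 17925
print(latency_percent)  # 29.875
```

The exact product is 42075 cycles, which the text rounds to 42000, giving
roughly 42 seconds of cycling and 18 seconds (about 30%) of latency per
minute.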
This new code is designed to improve the arrangements. Both the DUT
code and the PC code required complete revision. The SBC code remained
unchanged. This new arrangement sees the DUT running in standalone
mode, that is, without using the control board for serial and parallel com-
munication and storage of the individuals. Because of this, the lines of com-
munication are much shorter. There is no storing of commands, polling the
parallel ports, moving the data again from SBC memory over to the DUT
and so on. Also the serial port works at the higher baud rate of 119200.
Now we upload an entire generation directly to the DUT and return
upwards of 210 bytes of data, which includes fitness values for all individuals
as well as other data such as checksums, all of which is stored for later
analysis. The DUT is much faster and more autonomous. Communication
latency is now a small portion of the time spent erasing. The DUT code is a
larger code set and so some experiment space is lost. However, this is a small
price to pay for the improvement in speed and the resulting shorter experiment
time. Experiment time is now entirely dominated by the actual experiments.
5.5 The PC code
Both a Windows and a Linux GA were trialed during the evaluation of the
hardware platform. Both worked well. However, GAlib required that
each individual return its fitness value before the next individual
was issued, and would have required modification to change this behaviour.
Xgenetic did not. Furthermore, the tool chain for embedded code
development (Pathfinder emulation software, WinIdeas and Winedit) was
entirely Windows based. If the newly developed PC application was also
Windows based, it would be easier to integrate the software and hardware
development. Xgenetic supports Visual Basic, and this programming
environment was found to be well suited to serial communications handling,
as well as being a good IDE (Integrated Development Environment) for
developing automations on the fly, and it was already within the skill set
of the researchers. This GUI code was designed from the outset to be
flexible, quick to change and disposable. The low-level functions are
robust, modular and well tested, while the higher level automations and
data collection routines are often in flux to satisfy the needs of the
changing experimentation. The PC code therefore should not be
thought of as code written for many users.
VB is similar to Visual C++ and Delphi in that all are event-driven and
are forms based. From version 6 on, VB is fully object-orientated. The PC
code is broken into several forms with each form handling a specific set of
tasks. In all, there are 5 forms, all of which are listed in Table 5.3.
Form          Implements
MDI Form      Multiple document parent and drop down menus
Mine          Converts test data into a mineable form
CrossCompile  Converts compiler output to programmable packets
GA            Generates the experiments and processes the fitness
Hardware      Controls every function of the hardware
Table 5.3: The PC application forms
5.5.1 The Cross Compiler
A cross compiler is necessary because the output from the compiler used to
create the DUT embedded code is unsuitable for programming directly to
the DUT. Outputs from the compiler are available in Intel hex and list file
formats. Not all the information required to create the packets is present in
the Intel format file, so we elected to use the list file as the basis for cross
compiling.
The cross compile window opens the compiler output list file and searches
for code segments. It adds all the code segments together into a large string
representing the entire code listing. This string is then packetised into a
format that can be easily transported to the control board where it is sub-
sequently programmed to the chip via parallel programming through the
latches on the DUT board. Leftover portions of the string are fitted to the
last packet and unused portions of this last packet are padded with FFs (the
erased state), hence they are ignored by the programmer.
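The packetising step can be sketched as follows. This is a minimal Python
sketch, not the VB cross compiler itself: the 16-byte payload size, the
(address, data) packet shape and the function name packetise are
assumptions made for illustration, since the actual packet layout is not
given here. Only the padding of the final packet with FF comes from the
text.

```python
PACKET_DATA_LEN = 16  # hypothetical payload size; the real packet layout is not given
ERASED = 0xFF         # the erased state of a flash byte


def packetise(code, start_address=0, size=PACKET_DATA_LEN):
    """Split machine code into (address, data) packets, padding the last with 0xFF."""
    packets = []
    for offset in range(0, len(code), size):
        chunk = bytes(code[offset:offset + size])
        chunk += bytes([ERASED]) * (size - len(chunk))  # pad the leftover portion
        packets.append((start_address + offset, chunk))
    return packets
```

For example, 20 bytes of code with a 16-byte payload would yield two
packets, the second holding 4 code bytes followed by 12 bytes of FF.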
Figure 5.2: VB cross compiler
VB cross compiler form showing code fragments found at line 178. We can see the raw
machine code in the top window and the final programmable packet with address added.
The FF padding (no-op) for the final packet is visible in the center window.
5.5.2 The GA form
The GA form has two sections, left and right. The left section is for tradi-
tional single site operation where most of the data was collected and the right
section is for the newer hardware that allows the chip to run autonomously.
In each section there is a button for creating and evolving a population, used
for setting up and debugging. Then there are larger buttons for running
automated complete experiments.
Figure 5.3: Running the platform
Run Experiment Single: This sub routine runs experiments in which there
are 8 generations with 20 individuals in each generation, the maximum
number that a single chip can support. The fitness is a number from
0 to 255 indicating how long a cell remained working, that is to say,
successfully writing and erasing.
The value of Iter was one of several introduced in an effort to use
human knowledge to guide the evolution. It dictates how many cycles
are done for each element of an experiment; thus we can control the
absolute number of p/e cycles done during a run. The “Range” variable
similarly contains the highest initial value for the registers. Thus, if set
to, say, 140, it limits the starting point of all variables to below 140 out
of 256. This ensures that outrageous values, values that are sure to
result in a failed individual, are not chosen.
The “Powerscale” variable is used to set the rate of increase of the
variables. This effectively ensures that no variable will decrease over
the life of the cell, again reflecting the absolute fact that (in the absence
of recovery time) it will become increasingly more difficult to persuade
the floating gate to change state, never less difficult.
These measures do affect the freedom of the GA to experiment with the
search space but are justified in terms of the cost of experimentation
in time and silicon real estate. We may also set the hardware to run
the default factory parameter set which is used to collect the control
group in each chip. The GA, in essence, must beat the control group.
The Create Sub: This sub routine may be called by clicking the create
button, but more usually it is called from an automated experiment.
The create sub routine randomly creates the first generation. XGEN
generates 20 individuals, all of which are saved to a file called
members.lst. This file contains the dismembered part of each individual,
that is, the values of the variables at each of 256 points of the memory
cell’s life. The value 256 is chosen since it is the maximum resolution of
the largest register. The GA defines as floats the starting point and the
rate of increase (or the slope) of the variable values over time. Later,
XGEN will be reconfigured to generate two binary strings. This has a
bearing on the process of evolution and is described in the next chapter.
Next, lower-level functions are called from the “Hardware” form, from
which a global array filled with the fitness values is returned. This array
is processed to extract the valuable fitness information which is then
passed to XGEN. This is a once-off arrangement to gather the results
of the first generation before running the XGEN function “evolve”.
The Evolve Sub: The “evolve” function is much the same as the “create”
function except that it uses fitness and not randomness to create the
next generation. This sub routine is run any number of times, depend-
ing on the number of generations in the experiment. Under normal
operating conditions, we might extract identical or near-identical indi-
viduals from later populations to avoid repeating identical trials, but in
this case, because of the potential variation within the chips, we want
to verify the evolved recipe with a significant number of tries so we
have allowed converged populations to continue to run.
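The expansion of an individual into a per-point schedule, as described for
the Create sub routine, can be sketched as follows. This is a Python
sketch under stated assumptions and not the XGEN implementation: the
function names are hypothetical, a straight line is assumed for the rate
of increase, and max_slope is only a stand-in for the role the text gives
to “Powerscale”.

```python
import random

POINTS = 256    # points in the memory cell's life, per the text
REG_MAX = 255   # largest value an 8-bit register can hold


def random_individual(value_range=140, max_slope=0.5, rng=random):
    """Pick a starting point no higher than `value_range` and a non-negative slope."""
    start = rng.uniform(0, value_range)  # "Range" caps the initial value
    slope = rng.uniform(0, max_slope)    # hypothetical stand-in for "Powerscale"
    return start, slope


def expand(start, slope):
    """Expand (start, slope) into one register value per point of the cell's life."""
    return [min(REG_MAX, int(start + slope * i)) for i in range(POINTS)]
```

Because the slope is never negative, the expanded schedule never decreases
over the life of the cell, which is the constraint the text motivates.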
5.5.3 The Hardware Form
This form implements the communications protocol. There is a sub routine
for each hardware function, such as factory mode, cell current reading and
code segment programming, to mention but a few. One sub routine sets the
DUT in motion and monitors it for errors. It communicates with both the
control board and with the DUT. Once a Fragment is loaded, it goes into a
loop that polls the device to see if it has finished. Messages from the control
board and DUT, including error states, are handled by this routine.
All the sub routines in this form may be run independently for develop-
ment and debugging. This feature was used extensively.
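The load-then-poll pattern used by the Hardware form can be sketched as
below. This is a Python sketch with a simulated board, not the VB code:
FakeBoard, run_fragment and the max_polls limit are hypothetical. The reply
values 5 (not busy) and 4 (busy) are those listed for CheckBusy in Table
5.2.

```python
NOT_BUSY, BUSY = 5, 4  # CheckBusy replies listed in Table 5.2


class FakeBoard:
    """Hypothetical stand-in for the control board link; busy for a fixed number of polls."""

    def __init__(self, busy_polls):
        self.busy_polls = busy_polls
        self.loaded = None

    def load_fragment(self, fragment):
        self.loaded = fragment

    def check_busy(self):
        if self.busy_polls > 0:
            self.busy_polls -= 1
            return BUSY
        return NOT_BUSY


def run_fragment(board, fragment, max_polls=1000):
    """Load a Fragment, then poll CheckBusy until the DUT reports it has finished."""
    board.load_fragment(fragment)
    for polls in range(1, max_polls + 1):
        if board.check_busy() == NOT_BUSY:
            return polls   # number of polls it took to finish
    raise TimeoutError("DUT still busy after %d polls" % max_polls)
```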
5.5.4 Summary
Here we have described the operation of the code sets used on each of the PC,
the control board and the DUT. In doing so, we have provided a mechanism
for understanding the operation of the flash memory test platform as a whole.
We describe modifications made on foot of our experience using the platform
that may inform later tester design. We now move on in the next chapter to





The next two chapters document experimentation that was underway for a
considerable number of years. This chapter starts by describing the internal
arrangements that are concerned with NV memory management. The search
space is defined and then reduced in scope by careful examination of the
minutiae of the device’s control registers. Initial experimentation with the
working platform helps to define the GA representation and fitness function.
Further proving and timing runs are used to set the overall experiment length
and calibrate many other process variables. In Section 6.7 it is shown that
even at this early stage the process is uncovering important aspects of flash
behaviour.
6.1 Experiment Design
The first task in Experiment Design is to determine the shape of the first
round of experiments. This includes the definition of the representation and
fitness function. Considering resource limitations, it is desirable to reduce
the search space as much as possible, but to do so without impacting the
ability of the GA to find innovative solutions. The approach is to progress
piecemeal using active design [61, 124] moving step by step closer to answer-
ing the research questions and in so doing, define a practical method for the
automated discovery of programming parameters using a genetic algorithm.
6.2 Representation
From Chapter 3 we know that representing the problem space to the GA is
an important step. The register map of the DUT is shown in the product
specification document [2]. From this we can construct Table 6.1 which shows
the registers concerned with controlling the NOR NV memory. We will pass
control of the NV memory to the GA.
The ADu812 flash memory registers
Name          Register description                   Default value
ECON          The memory command interpreter         0xB9
ETIM1         Several programming variables          0xBA
ETIM2         The erase time                         0xB1
EETEST0       Various options including read points  0xC5
EETEST1       Program current and high voltages      0xC1
FME/SEC BITS  ROM security bits                      0x280
Table 6.1: The non-volatile memory registers
From the table, the search space is 2^48 in size. This is an astronomical
number. However, not all registers are directly at play in reliability, which
is the matter of interest. ECON, for example, may be eliminated since it
is a command interpreter and does not control any aspect of reliability.
Similarly, the FME (factory mode enable) and security bits, while they must
be handled by the embedded code, do not have any bearing on programming
conditions. Further study of the precise meaning of the register mappings is
required in order to eliminate further bits of the remaining 32.
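To give a sense of scale, the size of the search space for a given number
of genome bits can be computed directly. The short sketch below uses only
the bit counts mentioned in the text; the function name is ours.

```python
def search_space(bits):
    """Number of distinct settings reachable with the given number of bits."""
    return 2 ** bits


print(search_space(48))  # 281474976710656
print(search_space(32))  # 4294967296
print(search_space(48) // search_space(32))  # 65536
```

Removing 16 bits from consideration therefore shrinks the space by a
factor of 65536, which is why each eliminated register matters.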
6.2.1 ETIM1 Register
The register ETIM1 (EEPROM Timing 1) internals as described in the prod-
uct specification are shown in Figure 6.1. Variables instantiated in ETIM1
are TPROG and THVSU (described below). TPROG and THVSU values are
shown in Table 6.2 with their corresponding bit expressions. The remaining
bits are not mentioned in the specification. Consultation with ADI engi-
neers [133] revealed that if they are not mentioned, they are not in control of
any aspect of the design. Nevertheless, their reset values were ascertained by
experiment and were ‘masked in’ whenever this register was changed, thus
the unmentioned bits are never changed from their default values.
Figure 6.1: The special function register ETIM1
TPROG stands for Programming Time, and is the amount of time the
programming voltage will remain on the selected memory cells. THVSU is an
acronym meaning Time High Voltage Set Up or High Voltage Set Up Time.
This can be thought of as the amount of time over which the voltages used
for programming will ramp up. Both of these variables may have reliability
dependencies. The total number of bits at play here is five.
The Variables THVSU and TPROG
Bits 6,5,4  Time     Bits 0,1  Time
000         0.0 µs   00        30.5 µs
001         30.5 µs  01        61.0 µs
010         61.5 µs  10        91.5 µs





Table 6.2: The variables TPROG and THVSU
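The practice of masking in the unmentioned ETIM1 bits can be sketched as a
read-modify-write operation. This is a Python sketch under stated
assumptions: following the column order of Table 6.2, THVSU is assumed to
occupy bits 6, 5 and 4 and TPROG bits 1 and 0, and the function name
set_etim1 is hypothetical.

```python
THVSU_MASK = 0b0111_0000  # bits 6,5,4 (assumed to hold THVSU)
TPROG_MASK = 0b0000_0011  # bits 1,0 (assumed to hold TPROG)


def set_etim1(current, thvsu, tprog):
    """Return a new ETIM1 value, changing only the THVSU and TPROG fields."""
    if not 0 <= thvsu <= 7 or not 0 <= tprog <= 3:
        raise ValueError("field value out of range")
    # Keep every bit outside the two fields exactly as it was.
    preserved = current & ~(THVSU_MASK | TPROG_MASK) & 0xFF
    return preserved | (thvsu << 4) | tprog
```

Whatever values the GA chooses for the two fields, the remaining three
bits of the register keep their reset values.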
6.2.2 ETIM2 Register
All of the bits in the ETIM2 (EEPROM Timing 2) register are concerned
with erase time, the length of time each block of cells is exposed to the erase
voltage. The default value is shown in Figure 6.2. The corresponding erase
times can be found in appendix 9.5.
Figure 6.2: The special function register ETIM2 controls erase time
6.2.3 EETEST0 Register
This register controls functionality often used for testing the NVM during
manufacture, and allows the user to reroute certain values such as word line
voltage and cell currents to outside pins on the chip. It also allows us to
route high voltages into the chip for use in mass programming in the event
that the on-board charge pump is unable to provide sufficient current. As
well as this, there is a feature to allow us to change the sense voltage. This is
the threshold under which the cell is interpreted as a one and over which it
is interpreted as a zero. It is distributed over two bits and may come under
the control of the GA later.
6.2.4 EETEST1 Register
The EETEST1 register is shown in Figure 6.3. The variables instantiated by
this register and their values are shown in Table 6.3. IPROG represents
Programming Current and has 3-bit resolution, from 0.72 microamps up to
2 microamps. HV stands for High Voltage, and we may set the value of
the high voltage applied to the cells for programming over two bits, from
12 volts up to 16.3 volts. There are 5 bits of this register which may affect
reliability. The remaining bits, MASS-PRGEV and MASS-PRGOD, are for
mass programming the even and odd bits in the memory array in order to
set up a checker board pattern quickly.
Figure 6.3: The special function register EETEST1 controls programming
current and high voltage set up time
The Variables IPROG and HV
Bits 7,6,5  Current  Bits 1,0  HV Program  HV Erase
000         0.72 µA  00        12 V        15.1 V
001         0.57 µA  01        13.5 V      15.9 V
010         0.43 µA  10        14.1 V      16.3 V





Table 6.3: The variables IPROG and HV
Notice that some of the variables do not follow values in order, and we
may have to adjust for this in the representation.
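One way to adjust for register codes that do not follow their physical
values in order is a lookup table sorted by physical value. The Python
sketch below is an illustration only, not the representation adopted in
the thesis: the function names are hypothetical, and only the rows that
survive in Table 6.3 are listed, with the remaining codes left out rather
than guessed.

```python
# Only the rows that survive in Table 6.3 are listed; other codes are omitted.
IPROG_UA = {0b000: 0.72, 0b001: 0.57, 0b010: 0.43}    # code -> microamps
HV_PROGRAM_V = {0b00: 12.0, 0b01: 13.5, 0b10: 14.1}   # code -> volts


def ordered_codes(table):
    """Return register codes sorted by the physical value they select."""
    return sorted(table, key=table.get)


def gene_to_code(table, gene):
    """Map a gene index (0 = smallest physical value) to its register code."""
    return ordered_codes(table)[gene]
```

With this mapping, increasing the gene value always increases the physical
quantity, even where the raw register codes run in the opposite direction.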
Information gathered so far can be summarised as follows:
• The entire search space has been reduced from 48 to 20 bits by careful
study of the control registers;
• Erase time dominates the programming activity time and has the high-
est resolution of all control parameters;
• From basic initial testing we find that erase time has a dramatic effect
on reliability. Endurance shows no other initial parameter dependen-
cies.
The information above is sufficient to determine a representation for the
GA in which each bit associated with flash reliability is assigned directly to
the GA’s binary genome. Running further basic tests without a GA hooked
up would tell us some more about the search space, resulting in a represen-
tation comprising an erase time starting point and slope (rate of increase) as
detailed in Section 6.5.1 below.
6.3 Fitness
Quite a lot was learned while defining the search space, and the same will be
true of examining the fitness function. This will be summarised at the end
of the section.
The fitness value, as discussed in Chapter 3, must represent the suitability
of the solution for solving the problem. The problem to be solved in this case
is longevity, not just of one cell, but of all the cells in a byte or all the bytes
in a memory block whose parameters are set together.
6.3.1 Flash Cell Recovery
Recovery is the process of natural detrapment of electrons from the silicon
over time allowing the device to be programmed more easily. It is important
to eliminate any recovery time before taking the fitness value for reasons
outlined in Chapter 2 Section 2.4.2. The simplest solution is to run p/e cycles
back to back until failure. In this way, any platform introduced latency is
evenly applied to all blocks and devices. This means that any control group
of cells will be exposed to the same recovery time as any group using a
GA-derived programming plan. Running p/e back to back is significantly more
aggressive than real-world operating conditions. However, this is not a factor
since it is the relative performance of the memory that is of concern to the
GA fitness function. Any comparisons made to the factory default will be
made under the same conditions.
6.3.2 Endurance Variation
Variation in intrinsic endurance of flash cells is effectively seen as noise in
the fitness function by the GA. Another early lesson learned is that while
endurance variation within a chip is modest and randomly distributed, there
is significant variation of intrinsic endurance (noise) between devices. That
is to say that there is a distribution of endurance amongst any population of
chips even though they may be using the same programming conditions. This
can be accounted for by silicon manufacturing process variations. It means
that evolution of a single population cannot take place across multiple chips
as the intrinsic endurance will cause the fitness to bias the search to the best
performing devices rather than best performing solutions. This conclusion
limits the size of the experiment space to a single chip - a total of 160 blocks.
This space must contain all generations as well as a control group.
6.3.3 Manufacturing History
Another uncertainty addressed at this point is the manufacturing history.
Chip markings are not sufficient to determine which wafer each chip comes
from, which makes it impossible to do any of the wafer-specific evaluation
alluded to in the research questions. We cannot be sure that any two devices
are from the same wafer, or for that matter, from different ones. All we can
be sure of is that if we acquire more devices, those additional devices will be
from a different wafer.
6.3.4 Longevity as a Fitness Function
NOR XIP memory cannot tolerate even a single bit error, so a single un-
correctable bit error will condemn the entire group as a fail. The fitness
function must represent this fact, and the fitness function must reward indi-
viduals that lead long lives.
As detailed in Chapter 2 in subsection 2.2.1, the NOR chip is divided
into groups or blocks in which all cells are erased together and are therefore
subject to the same erase conditions at any given time. Thus, we may use a
single bit error per block as an indication that an individual has failed, since
it is not possible to set the read, write and erase conditions separately within
each block. During p/e cycling, a single bit error indicates the fail point of
a single block. This accurately represents the longevity in p/e cycles that a
large sample of cells (the entire block) may undergo for any given individual.
It is difficult to argue with this measure given that there are no extraneous
conditions or influences that may alter or change the measure of fitness.
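The longevity measure can be sketched as a count of completed groups of
p/e cycles before the first bit error in a block. This is a Python sketch
under stated assumptions, not the DUT code: cycle_ok is a hypothetical
callback standing in for one p/e cycle plus verification, and the 0 to 255
range and the grouping of cycles by Iter follow the description of the
fitness value given in Chapter 5.

```python
MAX_FITNESS = 255  # fitness is a number from 0 to 255


def block_fitness(cycle_ok, iter_cycles):
    """Count fitness ticks until the block shows its first bit error.

    `cycle_ok(n)` is a hypothetical callback that runs p/e cycle number n on
    the block and returns False as soon as any single bit fails to verify.
    """
    fitness = 0
    cycle = 0
    while fitness < MAX_FITNESS:
        for _ in range(iter_cycles):
            cycle += 1
            if not cycle_ok(cycle):
                return fitness   # one bit error condemns the whole block
        fitness += 1             # the block survived another group of cycles
    return fitness
```

For instance, a block that first fails on cycle 100 with 15 cycles per
tick would score 6, since six full groups (90 cycles) completed before the
failure.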
6.3.5 Cell Current as a Fitness Function
Cell current measurement was also trialed as a fitness function. This method
involves putting the chip into a special mode of operation under which the
cell current for a byte of cells is diverted to appear on a GPIO port of the
DUT. Inside the DUT, the circuitry that implements the normal operation
of the port is disconnected from the port pins and in our test head, we
constructed relays to isolate the circuitry that handles the port from the
flow of current. Instead an instrumentation class measurement circuit was
switched in to capture the data.
Cell current has the advantage of returning a continuous measure of fitness
over the life of the block rather than the binary pass/fail of the longevity
method. However, it was unclear what the cell current meant in terms of
endurance at life-cycle points other than end of life. Furthermore, the cell
current is very small, so small that the instruments required to measure it
reliably were of a highly specialist nature, and the circuit design challenge
of extracting the signal from the background noise was large. This factor
would introduce an unacceptable level of uncertainty into any fitness based
on this measure.
Another problem with this method is that it only applies to devices in
which the cell currents are bonded out to ports, and such a configuration is
not typical.
It is now possible to summarise what was learned from the determination
of the fitness function:
• The fitness will reward gene expressions that promote long-lived mem-
ory cells;
• Fitness will be based on how long a large sample of memory cells lasts
and, as such, is quite a robust measure;
• There will be 160 individuals trialed within each device since this is the
total number of blocks in any one device;
• Cell current measurement is technically difficult and is of dubious merit;
• No recovery time will be allowed so all fitness results will be like for
like;
• There are large variations in intrinsic endurance between devices and
so no cross-device evolution is possible;
• No manufacturing history is available, that is to say we cannot tell from
which wafer the memory chips originated.
With the representation and fitness function defined, we move on to execute
the first GA-based experiments.
6.4 Proving runs
The purpose of the proving runs was to ensure that the platform was per-
forming as expected, that register values were as set, and that p/e was, in
fact, progressing as anticipated.
6.4.1 Verification Testing
As mentioned in Chapter 4, the DUT was a black box; there was no way to gain
visibility over execution threads. The general approach to testing DUT code
was to toggle GPIO port pins to indicate execution points and to read back
critical register value changes through the ports, sometimes serially. We made
use of error detection codes and parity bytes to verify the communications
packets. Execution time changes were measured to verify the modification
of critical parameters under program control, such as erase time.
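A parity byte check on a communications packet can be sketched as follows.
This is a Python illustration of one common scheme, an XOR over all bytes,
and is an assumption on our part: the text does not specify which error
detection code the platform used, and the function names are hypothetical.

```python
from functools import reduce


def parity_byte(data):
    """XOR of all bytes: one simple form of parity byte (assumed, not the thesis scheme)."""
    return reduce(lambda a, b: a ^ b, data, 0)


def seal(payload):
    """Append the parity byte so that the XOR of the whole packet is zero."""
    return bytes(payload) + bytes([parity_byte(payload)])


def verify(packet):
    """A packet is accepted only if the XOR over all its bytes is zero."""
    return parity_byte(packet) == 0
```

A single corrupted byte changes the XOR and is therefore detected, although
a scheme this simple cannot catch every combination of errors.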
A GA is a perfect ‘garbage in, garbage out’ paradigm: if there is any
error, such as matching a fitness value to the wrong individual,
the entire evolution is less than useless. Hence, a lot of time was spent on
verification. Several minor programming language version changes as well
as operating system version changes throughout the process meant that this
verification activity was not a one-time effort.
6.4.2 Visual Monitoring
While the DUT core is interacting with the NOR memory, no execution
cycles are possible. The core hibernates during this interaction for up to
31 milliseconds. The DUT code asserts a port pin before any programming
activity and de-asserts it afterwards. In this way, it is possible to monitor
the programming cycles of the DUT externally.
This monitoring ability proved very useful. For each experiment, it is
possible to see the progression of the programming variable as p/e cycling
progresses. Since single evaluations can take anything from several days to
several weeks to complete, this proved to be a very efficient way of checking
on the progress of an experiment. It also served to verify the robustness of
the platform, which would often be required to work flawlessly for weeks on
end without a reboot.
Both the control board and the DUT had watchdog timers and were
stateless within any evaluation. The PC application also proved to be reliable
over evaluation periods of any length. The platform also allows us to measure
the overall time that each iteration of p/e was taking as well as the actual
time spent hibernating. This method was used throughout experimentation
to give a graphical indication on an oscilloscope that the experiment was still
running as expected.
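As a rough sketch of how the two timings mentioned above could be derived
from the monitored pin, assume the pin transitions have been captured as a
list of (time in milliseconds, level) pairs. The capture format, the function
name and the sample values are hypothetical; on the platform itself the
readings were taken visually from an oscilloscope.

```python
def busy_and_period(edges):
    """Derive timings from pin transitions.

    edges: list of (time_ms, level) pairs, level 1 meaning the pin is
    asserted (programming in progress, core hibernating).
    Returns (busy_durations, cycle_periods), both in milliseconds.
    """
    busy, periods = [], []
    last_rise = None
    for t, level in edges:
        if level == 1:
            if last_rise is not None:
                periods.append(t - last_rise)  # rise-to-rise: one p/e iteration
            last_rise = t
        elif last_rise is not None:
            busy.append(t - last_rise)         # rise-to-fall: time hibernating
    return busy, periods


# Illustrative capture: three iterations, each hibernating for 31 ms.
edges = [(0.0, 1), (31.0, 0), (35.0, 1), (66.0, 0), (70.0, 1), (101.0, 0)]
busy, periods = busy_and_period(edges)
```

A stalled experiment would show up as a missing or very long period, which is
the condition the oscilloscope trace made visible at a glance.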
At this point we can say with assurance that the system is capable of
running p/e under the control of the PC. We have proven that the DUT is
responding to register modification correctly and we have much confidence
in the robustness of the platform.
6.5 Timing Runs
Once the hardware and software were proven to be stable and robust, it was
time to establish by experiment the time each evaluation might take. It is
possible to calculate potential evaluation time from the device specification
sheet. However, this would only get us so far as it would tell us neither the
intrinsic endurance of the device nor the actual values of the communica-
tions and software latencies. In fact, it would only tell us the theoretical
evaluation duration minimums for any given endurance value. There were a
considerable number of unknowns here given that the communication paths
were so convoluted with the control board communicating via serial port
with the PC, which in turn scheduled the tasks to the DUT using parallel
communications through the DUT’s GPIO port and the PIO and LCD port
of the control board.
Table 6.4: Theoretical minimum experiment duration
Conditions   Individuals   Cycles   Duration
Minimum      160           10k      9 hours
Median       160           40k      1.5 days
Maximum      160           100k     3.75 days
These figures are calculated with an assumed erase time of 20 milliseconds.
Table 6.4 shows the theoretical minimums for the endurance values shown,
giving evaluation durations of between 9 hours and roughly 4 days. These figures,
however, do not take into account the following:
• Software and communications latencies;
• The effect of variation in intrinsic endurance between devices;
• The effect of any guard banding of the specification sheet values;
• The possible deployment of erase values from 20 milliseconds to 31
milliseconds by the GA.
For these reasons, the evaluations will be longer than the values set out in
the table. The objective of the timing runs is to evaluate this increase.
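The theoretical minimums in Table 6.4 can be reproduced, to within rounding,
with a short calculation. The sketch below assumes that erase time is the
only cost and that every one of the 160 blocks is cycled in sequence; the
function name is illustrative only.

```python
def min_duration_hours(individuals, cycles, erase_ms=20.0):
    """Lower bound on experiment duration, counting erase time only."""
    total_ms = individuals * cycles * erase_ms
    return total_ms / 1000.0 / 3600.0


for label, cycles in [("Minimum", 10_000), ("Median", 40_000), ("Maximum", 100_000)]:
    hours = min_duration_hours(160, cycles)
    print(f"{label}: {hours:.1f} hours ({hours / 24:.2f} days)")
```

Passing a longer erase time, for example `erase_ms=31.0`, shows how strongly
the GA's choice of erase value can stretch an evaluation even before any
latency is added.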
6.5.1 Initial Calibrations
After holding meetings with ADI engineers [43], we can expect that most
devices will exceed the stated specification sheet claim by up to four times,
and that this will vary from device to device. This means an average life of
40k p/e cycles. This average will have an important bearing on experiment
duration. Variation between devices may further increase it. The first runs
will need to establish a maximum practical figure in order to calibrate the
maximum running time range. We also need to establish the minimum erase
time that will result in an erasure, such that the GA does not need to find
this out.
From early experimentation (detailed on page 26 of Research Notebook
2), we see that erase value 0 (Etim2 =0X00h) will not at any time erase the
device. Erase value 1 will erase initially, but after approximately 5000 p/e
cycles, it will fail.
Furthermore, a block that fails will subsequently work flawlessly for a
normal life span when its erase value is reset to default after having failed at
erase value 1. Also, we notice that not only does it continue to work for a
normal life span, but it does so without reference to the first 5000 cycles.
The DUT code used for this process checks for both successful write and
successful erase in each cycle, so we know we have not induced any write or
disturb issues as they would show as read errors.
We go further and increase erase time values modestly at three, six and
nine thousand p/e cycle intervals, and again we find that a normal life span
is endured before failure. We note from literature review [5, 110, 56] and
discussion with industrial contributors [45] that these early findings are in
line with expectations in that erase time will dominate both the overall
programming time (20 milliseconds for an erase and 30 microseconds for a write)
and crucially, the endurance-related failure rate. In other words, it will be-
come more difficult to force the cells to change state as life progresses - and
it is erase that will drive this change. Again, this is logical in that erase, as
discussed in Chapter 4, is destructive and causes implantation of electrons
as well as injector erosion. And so we find that the cell’s ability to change
state is a function of the absolute time spent erasing and the time since it
was last erased (the recovery time).
The above information has been gathered by review of specification, ex-
perimentation, and discussion with industrial stakeholders. It can be sum-
marised as follows:
• Erase dominates the programming time and the destruction of the cells;
• The intrinsic endurance of the memory arrays is much greater than the
specification sheets claim;
• Low value erase time figures don’t cause much damage and therefore
have little effect on endurance;
• In the absence of recovery time, it will never get easier to alter the state
of a memory cell.
From these findings, it is possible to infer the attributes that a successful
solution might contain. The information will allow us to restrict the search
space further, at least initially, in order to find solutions that represent ‘the
low hanging fruit’. This will help preserve silicon and speed up the process.
Later, the GA can be allowed the freedom to search more widely.
We can include the above acquired domain knowledge in the following
way:
• Initial focus should be solely on erase time;
• A solution may start at some low value above the minimum where an
erase can be achieved and increased at some rate over the life of the
part. The precise starting point and rate of change of erase time may
be determined by the GA;
• The function of starting point and curve should extend over the de-
sired endurance of the part, which in this case will be the maximum
endurance achievable;
• The maximum resolution (the total number of changes over the NVM
life) need not exceed the maximum resolution of the largest control
variable;
• The number of p/e cycles per change of values is (total expected p/e)/(max
resolution).
6.5.2 Scope of Endurances
In order to determine the scope of possible endurance, we must set a number
of software static values that determine the total number of p/e cycles that
any device will undergo.
‘Iter’, short for iterations, is a DUT code variable instantiated at run
time from the PC application to mean the ‘Number of Iterations’ and has a
value of 0 to 255. This is further multiplied by a static value ‘poundsize’ in
the DUT code to give a total number of iterations, that is, the number of
p/e cycles that should be done for any given register set state. This single
register state, for a given number of p/e cycles, is in fact a fragment of a
GA-defined individual and is referred to here as a ‘Fragment’.
The maximum number of register states or Fragments in a given indi-
vidual is fixed by the maximum resolution of the largest variables controlled
by the registers. This happens to be 256 since all experiments will include
‘erase’, and ‘erase’ has a maximum resolution of eight bits. Thus, the number
of possible register states during the life of any memory block is 256.
‘Iter’ is pivotal in the total number of p/e cycles that any memory block
may be made to endure, in that each erase value or register state is executed
at least ‘Iter’ times. The ‘Iter’ value is further multiplied by the DUT code
static reservation ‘poundsize’.
This static is initially set to five so that we may have a sufficiently large
number of p/e cycles per Fragment to yield the maximum possible endurance,
while maintaining the variable Iter within eight bits.
With Iter set to 225, this gives 225 x 5 p/e cycles per Fragment. This
equates to 1,125 p/e cycles per register state value. There will be 256
Fragments, thus the maximum possible endurance achievable is 288,000. This
would be a very good result for a device specified to 10,000.
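The arithmetic behind these limits can be written out as a short sketch. It
assumes Iter = 225 and poundsize = 5, the combination that yields the quoted
1,125 p/e cycles per Fragment; the function names are illustrative and do not
come from the DUT code.

```python
FRAGMENTS = 256  # one Fragment per value of the 8-bit erase register


def cycles_per_fragment(iter_value, poundsize):
    """p/e cycles executed for a single register state (one Fragment)."""
    return iter_value * poundsize


def max_endurance(iter_value, poundsize, fragments=FRAGMENTS):
    """Largest number of p/e cycles a block can be asked to endure."""
    return cycles_per_fragment(iter_value, poundsize) * fragments


per_fragment = cycles_per_fragment(225, 5)   # 1,125 p/e cycles
ceiling = max_endurance(225, 5)              # 288,000 p/e cycles
```

Raising poundsize raises the ceiling in direct proportion, which is the lever
available when blocks outlive the maximum that an experiment can measure.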
6.5.3 Timing Verification
We run the test bed to verify that the erase time is changing under software
control as defined by the specification sheet. Pin P0.4 toggles during p/e
cycling and effectively measures how long it takes the DUT to erase, write
and verify a single time. When the default value is used, this toggling takes
1.6 milliseconds. The default register setting for erase is 0X08H, giving an
erase time of 0.977 milliseconds, meaning that there is approximately 600
microseconds of latency in the write and verify portion.
When Etim1 is set to 255 (the longest erase), then we measure 3.1 gratic-
ules on the scope when set to 10 milliseconds per division. This is 31 mil-
liseconds and agrees exactly with the specification.
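These measurements can be cross-checked with a small calculation. The sketch
below assumes that erase time scales linearly with the register value and
calibrates the scale from the 31 milliseconds measured at a register value of
255; the linear relationship and the names used are assumptions, not taken
from the specification sheet.

```python
FULL_SCALE_MS = 31.0    # measured erase time at the maximum register value
FULL_SCALE_REG = 255    # maximum value of the 8-bit erase register


def erase_time_ms(reg):
    """Estimated erase time for a register value, assuming linear scaling."""
    if not 0 <= reg <= FULL_SCALE_REG:
        raise ValueError("register value must be in the range 0 to 255")
    return FULL_SCALE_MS * reg / FULL_SCALE_REG


default_erase = erase_time_ms(0x08)        # just under 1 ms
write_verify_latency = 1.6 - default_erase  # remainder of the 1.6 ms cycle
```

Under this assumption the default setting gives an erase of a little under
one millisecond, leaving roughly 600 microseconds of the measured 1.6
millisecond cycle for write and verify.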
We also note that while the DUT latency is modest, the overall system
latency is high, being in the order of several hundred milliseconds. That
is the time it takes the PC to download a packet, have it processed by the
SBC, communicate that to the DUT via the parallel port and have the DUT
respond. While in an overall context this is not a great problem, eliminating
it will reduce experiment duration significantly.
6.6 Coupling the Hardware to the GA
In this first run the GA will choose a starting point and the slope of the
increase, where the increase may be exponential and where the exponent
may be any float value from 0 (a straight line) to no more than 6 (the value
at which the line becomes deeply concave). This represents all models where
the erase time and the change in erase time, ∆ erase time, is increasing from
a constant to some accelerating rate.
These numbers are generated using two methods. One, as expected, is two
8-bit binary strings. The other is to generate two floats in a shortened
genome in which no crossover occurs within each number, using mutation and
crossing of the entire number as the sole means of exploration.
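The difference between the two representations can be sketched as follows.
This is an illustration under stated assumptions rather than the GA code used
on the platform: the scaling of the second byte onto the range 0 to 6, the
replacement-style mutation and all function names are hypothetical.

```python
import random


def crossover_binary(a, b, rng):
    """Single-point crossover on 16-bit strings; the cut may fall inside a number."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def crossover_floats(a, b):
    """Two-float genome (start, slope): the only cut is between the two numbers."""
    return (a[0], b[1]), (b[0], a[1])


def mutate_floats(genome, rate, rng, max_start=255.0, max_slope=6.0):
    """With probability `rate` per gene, replace the whole number (assumed operator)."""
    start, slope = genome
    if rng.random() < rate:
        start = rng.uniform(0.0, max_start)
    if rng.random() < rate:
        slope = rng.uniform(0.0, max_slope)
    return start, slope


def decode_binary(genome, max_slope=6.0):
    """Assumed decoding: first byte is the start, second byte scaled onto 0..max_slope."""
    return int(genome[:8], 2), int(genome[8:], 2) * max_slope / 255


rng = random.Random(1)  # arbitrary seed, for repeatability only
child_a, child_b = crossover_binary("0010000011111111", "1000000000000000", rng)
float_a, float_b = crossover_floats((32.0, 6.0), (128.0, 0.0))
```

In the binary form a cut inside a byte creates new numeric values, whereas in
the two-float form crossover can only recombine existing values, so new
values enter the population through mutation alone.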
Figure 6.4: Progress of the value of the register TERASE over the life of an
individual. This register controls the erase time. On the right, the GA sets
the slope close to 1, and on the left, the slope is set close to 6
Figure 6.5: Variations of representation employed, binary string and floating
point numbers
This is effectively a variant of ES. We do this for two reasons. First, ever
since the ‘starting point and slope’ approach was first mooted, instead of
directly mapping a binary number to a register, ES was considered as a possible fit
for this revised method. ES works effectively with small populations and is
well suited to the representation [93, 30]. Secondly, it is very easy to modify
the GA to facilitate this idea.
A randomly generated sample generation is shown in Figure 6.6. Here we
can see starting points chosen by the GA but limited to the first half of the
range and slopes from almost a straight line, such as individual 4 to deeply
concave such as individual 8.
Figure 6.6: A sample of generation one, showing starting point in the first
half of the erase time and slopes from straight line to deeply concave
The first generation run in actual silicon is detailed in Table 6.5. This chip
has been used for debugging the system and so had seen substantial use in
the low address blocks, which is evident from the erratic data. Nevertheless,
some good results hint at the endurance values to come.
Fitness  P/E     Fitness  P/E     Fitness  P/E     Fitness  P/E
3        3375    80       90k     115      129k    78       88k
119      133k    92       104k    133      150k    116      131k
91       102k    112      126k    138      155k    122      137k
83       93k     96       108k
Table 6.5: First run endurance results for blocks 4 to 17
The first run is made on a used device, so the results are erratic. Nevertheless, some
blocks show high endurance.
Several other informal proving runs are undertaken to enhance the data
collection procedure. When this device has had all blocks cycled, it will be
retired as it is of no further use.
6.7 Calibration Runs
A new device was fitted to the tester in order to progress the proving runs.
A static model is run using default factory settings as a control group. Re-
membering that Iter is multiplied by 5 (thus a total of 1125 p/e) in the
DUT code, the results of the static model ranged from 40xI(Iters) to 67xI
with an average of 52xI while the dynamic (median starting point and slope)
model, with one exception, didn’t fail at any point up to 255xI. This data is
summarised in Table 6.6.
Group         Fail Point   x Iterations Value   = Endurance
Minimum       40           1125                 45k
Average       54           1125                 60.1k
Maximum       67           1125                 73.1k
Dynamic 1-7   255          1125                 >286k
Table 6.6: Endurance results for first new device
The first three figures are calculated with erase set to defaults. The final figure used a
dynamic model.
These are very surprising results for a number of reasons. First, the static
or default factory results are up to seven times the spec sheet value. This
is higher than the four mentioned in industrial meetings [43] and implies
that the device is de-rated by a factor of up to 7 times. It may also suggest
that there is a lot of manufacturing variation which may be leveraged by an
automated system. Second, almost all of the dynamic group of individuals
passed at p/e cycles of greater than 286k. This group is a single generation (so
no evolution) with a GA-chosen increase rate and a starting point randomly
chosen in the first quarter of erase time.
This has a number of implications. First, the number of p/e cycles per
Iter needs to increase so that we can get clarity on when cells will actually
fail. Second, our efforts to reduce the search space have probably paid
off in that all individuals are significantly better than the factory default.
This may be a happy coincidence or it may be that the search space is rich
in solutions better than the default solution. Third, using domain
knowledge in this way may impact positively on the GA’s ability to solve the
problem efficiently.
In the next round of experiments we change ‘Poundsize’ from 5 to 15,
increasing by a factor of 3 the maximum number of p/e cycles that a single
block can do. We use a new device and the GA will select the starting points
and slopes of the individual experiments.
6.8 Summary
In this chapter the internal control registers at play in NOR flash reliability
were identified, the search space, a representation and the fitness or objective
function were defined. We described the proving runs and noted what was
learned from them. Timing analysis was undertaken, the results of which
were shared and discussed with industrial stakeholders. Several important
facts are discovered about the characteristics of the memory under test. This





This chapter first describes the behaviour of the test platform over the first
four devices in some detail. There is a lot of information in these early runs
that will later help to guide statistically relevant data collections. Often a
single device offers many chances to improve the process and this information
is vital given the cost of the searches. While the same level of detail is
collected for all devices, the first four are representative of all, and so the
later devices are presented in a less forensic manner.
Later it is shown how experimentation was sometimes guided by industrial
interviews, and industrial interests subsequently helped influence the overall
research direction. Next, the main tranche of experimentation is described





The data from device A is tabulated in Table 7.1 and Table 7.2. It shows
enormous average endurance of 160 in the first generation and a max of 253.
Remembering that the fitness is returned from the devices as 8-bit values,
253 equates to 854k erase cycles before failure. We get this value because
the DUT code variables ‘Iter’ and ‘Pound’ are set to 225 and 15 respectively,
and these two variables multiplied together define how many p/e cycles are
done for every increment of the fitness counter. The average values of 540000
(160 x 3375) p/e cycles against a static model of 47 equating to 158625 (47
x 3375) represents a gain of life expectancy of 3.4 on the static model and a
gain of 54 on the spec sheet. Device A is a new device and it shows the scale
of the de-rating. Discussions with ADI engineers [43] indicated that on a
single block basis, some blocks are capable of reaching over 700k endurance.
If current results are typical, we will again have to rethink the maximum
length of experiments.
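The conversion from the 8-bit fitness counter to p/e cycles, and the gains
quoted above, can be checked with a few lines. The function name is
illustrative; the defaults of 225 and 15 are the ‘Iter’ and ‘Pound’ values
stated for this device, and 10,000 is the specification sheet endurance.

```python
def pe_cycles(fitness, iter_value=225, pound=15):
    """p/e cycles represented by a fitness count (one count = iter x pound cycles)."""
    return fitness * iter_value * pound


SPEC_SHEET_ENDURANCE = 10_000

dynamic_avg = pe_cycles(160)   # average of the dynamic model, generation 1
static_avg = pe_cycles(47)     # static (factory default) model
best_block = pe_cycles(253)    # best single block

gain_on_static = dynamic_avg / static_avg           # about 3.4
gain_on_spec = dynamic_avg / SPEC_SHEET_ENDURANCE   # 54
```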
Fitness  P/E     Fitness  P/E     Fitness  P/E     Fitness  P/E
153      516k    162      547k    189      638k    164      554k
127      429k    118      398k    118      398k    149      503k
121      408k    119      402k    233      786k    98       331k
136      458k    141      476k    253      854k    200      675k
138      467k    139      469k    77       259k    224      756k
Table 7.1: Endurance results device A, generation 1
The results for device A, generation 1 show very large endurances in the dynamic model
This experiment was stopped after the second generation. In short, it
was thought better to verify the results rather than wait further protracted
lengths of time to find that there was simply a fault on the machine. In the
event, the data was verified, there was no fault, and this sort of result
would prove fairly typical, including an occasional reduction in fitness in
generation 2.
Fitness  P/E     Fitness  P/E     Fitness  P/E     Fitness  P/E
102      344k    86       290k    206      685k    112      378k
113      381k    90       304k    111      375k    114      385k
117      395k    103      348k    114      385k    157      530k
67       226k    146      493k    81       273k    193      651k
123      415k    92       311k    133      449k    144      486k
Table 7.2: Endurance results device A, generation 2
The results for device A, generation 2 show a reduction in average fitness
There are a number of possible reasons for this recurring event. Firstly,
as a result of using acquired domain knowledge to constrain the GA, most
results in the first generation are viable solutions. Figure 7.1 shows these
results as a pie chart (or roulette wheel).
Figure 7.1: Generation 1 fitness proportionate roulette wheel and generation
2 individual replication distribution. Both generations contain 20 individuals
Figure 7.1 shows a slice for each unique individual in generation 1 on the
left, and on the right, in wheel two, it shows how these individuals are
replicated in the next generation. Both generations, one and two, contain 20
individuals, but diversity is reduced in generation two as some individuals
are represented several times. As one would expect individuals with a higher
fitness are selected more often, and thus they appear more often in the fol-
lowing generations, while some with lower fitness values are deselected from
the population. However, modest viable solutions are the largest category in
generation 1 and are well represented in generation 2 as shown by Figure 7.2.
Figure 7.2: The occurrence of viable solutions in generation 1 and their re-
occurrence in generation 2
Several very fit individuals are not represented at all in the second generation.
This may be thought of as a quantisation error related to the small population
size. Not being trialed in the second generation, they do not add to the
average fitness as they did in the first. In summary this decrease in generation
2 is not because generation 1 is poor in solutions, but because it is rich in
them.
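The quantisation effect described above can be illustrated with a minimal
sketch of fitness-proportionate (roulette wheel) selection, using the
generation 1 fitness values from Table 7.1. The selection routine in the GA
used here may differ in detail; the function name and the seed are arbitrary
choices made for illustration.

```python
import random
from collections import Counter


def roulette_select(fitnesses, n, rng):
    """Draw n parent indices with replacement, weighted by fitness."""
    return rng.choices(range(len(fitnesses)), weights=fitnesses, k=n)


# Generation 1 fitness values for device A (Table 7.1).
gen1 = [153, 162, 189, 164, 127, 118, 118, 149, 121, 119,
        233, 98, 136, 141, 253, 200, 138, 139, 77, 224]

rng = random.Random(0)  # arbitrary seed, for repeatability only
picks = roulette_select(gen1, len(gen1), rng)
copies = Counter(picks)  # how many times each individual was selected
missing = [i for i in range(len(gen1)) if i not in copies]  # never selected
```

Because the weights are all of a similar magnitude and only 20 draws are
made, it is likely that some individuals receive no copies at all, and these
can include very fit ones. Their absence in the next generation lowers its
average fitness without any solution having become worse.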
Secondly, we see that erase times that are gently sloped do not do well
when coupled to low starting points. This is logical since the destruction
of the oxide, and so the difficulty in changing state, is progressing faster
than the increase in the pressure to change (the erase time). Later we find
that gently sloping solutions require a high starting point to achieve modest
results. Remembering that flash endurance is a noisy domain, we see gener-
ation 1 with 11 shallow sloped solutions achieving modest results. These are
subsequently represented 14 times in generation 2, of which many fare badly.
In general the modestly fit survivors from generation 1 do not do as well in
generation 2. This is a result of evolutionary exploration causing the slopes
for these low starters to become critically low as shown in Table 7.3.
Generation 1                     Generation 2
Start   End Value   Fitness      Start   End Value   Fitness
47      54          127          47      47          81
48      53          119          42      43          67
43      51          98           43      44          90
38      38          77           43      48          92
Table 7.3: Starting values, closing values and fitness for generation 1 and 2
The results for device A, generation 2 show a reduction in average fitness
We restart experiments with this device with the remaining space from
locations 100 to 160. The results are detailed below in Table 7.4. There are
also a handful of cells re-run immediately after failure. No recovery is seen
in the short tens of minutes between these runs.
Blocks    Generation   Average Fitness   Gain   p/e
0-19      1            160               3.4    540k
20-39     2            125               2.65   421k
40-90     Static       47                1.0    158k
100-119   1            98                2.08   331k
120-139   2            173               3.68   584k
140-159   3            171               3.63   577k
Table 7.4: Endurance results for device A
The improvement over the default settings is shown as Gain in column 4.
7.2.2 Device B
With device B, we set out again with the intent of collecting data in order
to try to direct future experimentation. This time, the starting points are
limited to below 40 and the slope to a power below 4. The intent is to force
the use of lower starting points in an attempt to exploit all of the erase time
range. The idea is that a less forceful start causing minimum implantation
and damage, with a gentle increase to counter this, might show benefits in
oxide health.
The results of device B, set out in Table 7.5 show this is not the case
and that it does not yield better results than device A. In contrast, device B
shows poor results in all generations, although one can see a marked increase
as the second and third generations tend towards longer starting erase time.
The GA converges by the second generation so only three generations
are run. This is the result of a very high fitness of 118 on a single block
causing generation 2 to converge on this value. This indicates that there
is not enough exploration at play. This is not surprising since mutation is
set at a low .05 and crossover occurs only between the values of Slope and
Starting Point as discussed in Chapter 6.
Blocks    Generation   Average Fitness   Gain   p/e
0-19      1            64                1.8    216k
20-39     2            77                2.16   260k
40-59     3            92                2.59   310k
60-99     Static       35.5              1      119k
100-119   1            60                1.92   229k
120-139   2            85                2.73   325k
140-159   3            93                2.98   355k
Table 7.5: Endurance results for device B
The last blocks in device B use an Iter value of 255. This value is used from here on
instead of 225
This is acceptable in the context of it being the second device to run. All
information gleaned here is of high value allowing us to shape later evalua-
tions. The mutation rate will be substantially increased in later runs. The
next 40 locations were processed using the static model to collect a good size
control group. The average fitnesses for the control group are 34 and 37, while
the average fitnesses for generations 1 to 3 are 64, 77 and 92.
There are 60 locations left in this device, so two things are done. First,
locations 40 to 60 are re-run immediately. Since none of these blocks managed
to erase, it is reasonable to conclude that no recovery has taken place during
the tens of minutes since they last ran. The software latency periods are much
shorter than tens of minutes, so it is clear that no recovery will take place
during ordinary platform runs.
Next, we increase Iter to 255, allowing larger steps and a greater final total
of p/e, and free the GA to choose any starting value below 100 while keeping
the slope below 4. Average fitnesses in locations 100 to 160 are 60, 85 and 93,
with the GA again biasing towards the higher starting points.
7.2.3 Device C
The aim with device C is to consolidate the data from device B and try to
find the types of evaluations that can be effectively run. A starting point of
140 and below (more than half the search space), and a slope of 0 to 6 is
used. The results for device C are shown in Table 7.6. A dynamic model is
run from 0 to 60. This is three generations resulting in average fitnesses of
128, 123 and 122.
In an effort to deepen understanding of variation in the static model,
blocks 80 to 160 are given over to four static models. We see an average of 38
with a maximum of 66, a minimum of 27 and a standard deviation of 6.37. One
would expect to see larger variations in the dynamic model since most blocks
undergo radically different erase conditions throughout their lives while the
static models are all identical. Later trials using multiple blocks with a
single evolved solution compare well with this deviation figure, meaning that
long-lived devices are no more likely to generate outliers than standard devices.
Blocks    Generation        Average Fitness   Gain   p/e
Device C
0-19      1                 128               3.26   489k
20-39     2                 123               3.13   470k
40-59     3                 122               3.11   466k
60-79     Static Increase   101               2.57   386k
80-99     Static            43                1      164k
100-119   Static            38                1      145k
120-139   Static            38                1      145k
140-159   Static            38                1      145k
Rerun 7 months later
80-99     Static            25                .63    95k
100-119   1                 101               2.57   386k
120-139   2                 117               2.98   447k
140-159   3                 136               3.47   520k
Table 7.6: Endurance results for device C including recovery figures
All gain figures use original static values as a baseline
Blocks 60 to 80 in this device are in a continuously incrementing arrange-
ment in which each individual uses a different, single erase time throughout
its life. In previous experimentation, a clear pattern emerges of higher start-
ing point individuals doing well. The intention is to test for a simple static
solution, other than the factory default that may solve the problem. This
group may be considered the test bed for a forthcoming method to test for
this possibility.
Recovery after Extended Periods
Blocks 100 to 145 were re-run on this device seven months after first running.
This is an opportunistic run to ascertain what kind of recovery may be seen at
this remove. In the event, recovery is very substantial and we see endurance
averages, for the most part, within 90% of the original average fitness for
these same memory cells and in some cases in excess of 100%. If the new
static value of 25 (95k p/e) is taken as the baseline then these figures are
even better.
This may indicate promise for the evolutionary approach to a diversity
of flash problems. It also clearly underlines the value of applying a wear
leveling algorithm to block usage in the high-endurance applications typical
of ‘NAND type’ data-intensive use [46]. Wear leveling algorithms allow all
blocks to avail of the maximum recovery time available within the application,
while also keeping the entire device at the same notional age. The last part of
this test cycle from 146 to 159 was interrupted by a power outage.
Other work completed during the above intervening seven-month period
included running further devices, outlined next. We also engaged with Mi-
cron Technologies Inc [60] in an attempt to help them with their NOR cost
reduction programme.
7.2.4 Device D
On Device D, a standard evolution is run using 4 generations with Iter set
to 255, which results in fitnesses of 136, 128, 150 and 161. Next, an
incremental series of static models is run on both the expired blocks 1 to 39
and on fresh blocks 100 to 119 to see if there is a simple static model better
than the default, as mentioned in the previous paragraph. We run the best of
these models (which is erase value 216 in all cases) over twenty locations from
120 to 140 and get an average fitness of 116. Next, a static model is run on 140
to 160 and results in an average fitness of 41, proving that this device falls
within the normal range of intrinsic endurance.
Blocks    Generation          Average Fitness   Gain   p/e
Device D
0-19      1                   136               3.3    522k
20-39     2                   128               3.11   491k
40-59     3                   150               3.64   575k
60-79     4                   161               3.9    616k
80-99     Static Increments   N/A               N/A    N/A
100-119   Static Increments   N/A               N/A    N/A
120-139   Static 216          116               2.8    444k
140-159   Static              41                1      157.6k
Table 7.7: Endurance results for device D
Due to the limitations of available space, this is not an exhaustive test,
but it is clear that the high starting point ‘best of’ static model does well, as
inferred by previous data. However, the dynamic model easily beats this in
generation 3 despite the fact that this generation is not converged. Remember
that a static solution is 20 samples of the same erase time while an evolved
generation trials a diversity of phenotypes (six in this case).
Furthermore, a major consideration is timing. The static model has a
constant, very long erase time, which is undesirable from a system response
time point of view. 20 to 30 milliseconds is considered a long time in
embedded programming, in which a single instruction may take nanoseconds to
complete. The dynamic model has a very short erase time in the early stages
of life, though it increments very often to high values. It is, however, a simple
matter to curtail p/e cycles and thereby improve response time. This will
not unduly affect endurance since the later erase values do not significantly
extend device life (the high hanging fruit). As a minimum, poor memory array
response time at end of life may be acceptable, but poor response at the start
and in mid-life is not.
The dynamic model would seem to be a more natural type of model
in that the harsh treatment that is successful in the static model is, by
and large, avoided. This harsh treatment of very high erase times from the
outset will cause the device to draw more current to satisfy the on-board
charge pump [85] and may induce disturb or retention issues. It may even
result in a lower mean time between catastrophic failure [133]. The dynamic
model attempts to use only the force that is required to allow the device to
change state.
7.2.5 Summary of the First Four Devices
It is appropriate to summarise here what has been learned so far:
• There is substantial guardbanding of the endurance specification with
typical intrinsic endurance up to seven times greater than the spec
sheet claim;
• There is substantial variation between devices which could be exploited
by an automated system. However, it also means that it is not possi-
ble to do cross-chip evolutions without significant correlation of fitness
values;
• There is less variation within a device but it is not negligible;
• Erase values of one and above will erase the device but endurance is
enhanced by using larger starting values;
• Erasing accounts for the vast majority of programming time as well as
dominating cell destruction;
• Recovery is insignificant over short time periods of tens of minutes but
is dominant over longer periods of months;
• It will always get harder to erase the device within the time frame of
an experiment;
• We have successfully tested and adapted the test platform to the prob-
lem with adjustments to the embedded software and to the performance
of the GA;
• The endurance environment is noisy and the low population sizes may
introduce quantisation errors in some evolutions;
• A simple static model, although at first glance it does better than the
factory default, will not suffice as a solution due to its harshness and
poor response time throughout the life of the part;
• The evolutionary process will probably benefit from more exploration.
Evaluation periods are considerable and collecting a meaningful amount
of data will take substantial time. At this point, it has been proven only that
there are the means and the justification to commit to this effort.
7.3 Primary Data Collection
On a working test platform with functionality only found on industrial class
component testers, there is a myriad of things that one may wish to do at
this point. However, the length of time the evolutions take forces us to be
conservative in terms of scope. The first objective is to collect data to support
the thesis contentions. Patience is important in collecting sufficient data to
underpin each paradigm.
We first trial the two floats genome. Further on, this is changed to the bi-
nary string model, which introduces more variation with crossover occurring
within the number values as well as mutation. During this testing period,
thought was given to how fundamental improvements may be made to the
tester. This thought process culminates in a major redesign to accommo-
date multiple simultaneous sites and streamlined communication. Due to
the scope of this research, the new design is not utilised heavily here but its
importance is in demonstrating a viable parallel template for future research,
including current on-going NAND work.
7.3.1 Devices E to O
The platform is tweaked to improve performance on the two floats genome.
With Device G the GA is freed somewhat to see the effect of other variables
on endurance. From Device H on, the mutation rate is increased from 0.05
to 0.35.
Device E
With Device E, six generations are collected as well as two default statics
groups used as a control. We see average fitnesses of 134, 175, 138, 129, 119
and 115 against control group fitness of 54 and 51. The maximum gain is
3.43 while the standard deviation for the static groups is 7.1 and normalised
standard deviation over average is 7.34. Timing analysis is also facilitated
by this run. On average, 11 fitness quanta (ticks of the fitness counter) can
be cycled to destruction per minute using default register settings. This
equates to 11*15*255 = 42,075 cycles, or roughly 42,000 cycles at 1 millisecond
each, which is 42 seconds. This means that 18 seconds of every minute are
spent on latency, or 30% of overall time. With the evolved solution these
averages do not tell us much, since the erase time changes are so diverse,
but we see from 0.07 to 4.7 quanta completed per minute. This fits with the
previous cycle data if the longer erase time values are taken into account.
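The timing arithmetic above can be reproduced with a short calculation. This is only a sketch of the figures quoted in the text; the variable names are illustrative and not taken from the test platform software.

```python
# Figures quoted in the text for Device E under default register settings.
QUANTA_PER_MINUTE = 11          # fitness-counter ticks completed per minute
CYCLES_PER_QUANTUM = 15 * 255   # the 15*255 factor used in the text
CYCLE_TIME_S = 0.001            # 1 millisecond per cycle

cycles_per_minute = QUANTA_PER_MINUTE * CYCLES_PER_QUANTUM
busy_s = cycles_per_minute * CYCLE_TIME_S   # time spent actually cycling
latency_s = 60.0 - busy_s                   # remainder of each minute
latency_fraction = latency_s / 60.0

print(f"cycles per minute: {cycles_per_minute}")
print(f"time spent cycling: {busy_s:.1f} s")
print(f"latency: {latency_s:.1f} s ({latency_fraction:.0%} of each minute)")
```

The rounded results agree with the 42 seconds, 18 seconds and 30% given above.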
Device F
Device F is similar to Device E, except that the two static models are executed
afterwards. Average fitnesses per generation are 125, 139, 145, 166 and 167
with the static control yielding 47 and 48. This is a gain of 3.55 over the
static factory default.
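The text does not spell out how the gain figure is derived. One reading that reproduces the quoted 3.43 for Device E and 3.55 for Device F is the best generation average divided by the weaker of the two control results. The sketch below assumes that reading; the function name is hypothetical.

```python
def max_gain(generation_averages, control_fitnesses):
    """ASSUMED reading: best generation average over the weaker control result."""
    return max(generation_averages) / min(control_fitnesses)

# Average fitness per generation and control fitness, as quoted in the text.
device_e = max_gain([134, 175, 138, 129, 119, 115], [54, 51])
device_f = max_gain([125, 139, 145, 166, 167], [47, 48])

print(f"Device E gain: {device_e:.2f}")
print(f"Device F gain: {device_f:.2f}")
```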
A general trend of success emerges for high starting erase times with gen-
tle slopes, and moderate success for high starting erase time with moderate
slopes and poor results for individuals with low starting erase times. This is
supported by the final generation with all top ten results exhibiting one or
other of these features with seven of the ten showing both features.
The poor showing for erase plans with low starting erase values can be
thought of in terms of the deterioration of the cells progressing faster than the
advances in erase time, which might have compensated for this degradation.
This doesn’t mean that the cell is destroyed, merely that the erase plan did
not supply a curve capable of maintaining an effective erase throughout the
life of the cell.
Device G: Applying the GA to Other Variables
We now seek to broaden the search space to include other variables at play in
NOR flash reliability. The starting point and slope data is copied to program
time. That is, the program time will have the same pro-rata starting location
and rate of increase as will the erase. This approach is logical in that one
function is placing electrons on the floating gate while the other operation is
taking them off. It would seem sensible to track one to the other. While the
resolution of the program variables is a lot lower than that of the erase time,
there is no evidence of program time being effectual or of any improvement
in endurance. Furthermore, all failures remain erase mode failures. This
is positive in that any endurance gain is accounted for by erase operations
alone, meaning read and write times may be static, which suits embedded
programming timing analysis. Erase time elasticity is more manageable in
code sets than program or write times since erase is typically isolated from
critical functions as a matter of course due to its length. Average fitnesses
are 138, 119, 124, 116, 130, 125, 138 against a control group of 59. This is a
gain of 2.4.
Devices H to O
With Devices H to O, the mutation rate is increased from 0.05 to 0.35 as
discussed earlier. This value is in line with ES norms [15] and has the effect
of increasing exploration, thus introducing more diversity into the genome.
Figure 7.3: Average life gain over the default solution per generation
Device      Average Improvement    P/E Endurance
            (Best Generation)
Device E    3.33                   522k
Device F    3.49                   634k
Device G    2.33                   528k
Device H    3.48                   654k
Device I    3.76                   570k
Device K    3.63                   708k
Device L    3.42                   604k
Device M    3.34                   532k
Device N    3.70                   608k
Device O    3.56                   643k
Table 7.8: Average improvement and endurance for fittest generation
We do see more diversity in later generations and convergence is slower
but it does still occur within the six generations. The gains for all devices
are detailed in Table 7.8 and a sample is graphed in Figure 7.3 with average
and maximum gain figures graphed in Figure 7.4.
Figure 7.4: Max and average life gain over the default solution per generation
7.3.2 Devices GB1 to GB6
The following devices were run using a binary string representation. The
main difference here is that the numbers controlling the slope and starting
point are generated using a 16-bit binary string rather than two floating point
numbers. Using this option means that crossover may occur anywhere in the
string rather than simply being the exchange of one float for another. This
is more disruptive but ultimately generates more exploration.
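The difference between the two crossover styles can be illustrated with a minimal sketch. It assumes, purely for illustration, that a single 16-bit genome holds the starting point in its upper 8 bits and the slope in its lower 8 bits; the actual encoding used on the platform is not specified here, and all function names are hypothetical.

```python
import random

def float_crossover(parent_a, parent_b):
    """Two floats genome: crossover can only exchange whole genes."""
    start_a, slope_a = parent_a
    start_b, slope_b = parent_b
    return (start_a, slope_b), (start_b, slope_a)

def binary_crossover(parent_a, parent_b, rng, bits=16):
    """Single-point crossover on an integer genome; the cut may fall inside a gene."""
    cut = rng.randrange(1, bits)              # cut position in 1..bits-1
    low_mask = (1 << cut) - 1                 # bits below the cut
    high_mask = ((1 << bits) - 1) ^ low_mask  # bits at or above the cut
    child_a = (parent_a & high_mask) | (parent_b & low_mask)
    child_b = (parent_b & high_mask) | (parent_a & low_mask)
    return child_a, child_b

def decode(genome, bits=16):
    """ASSUMPTION: upper half encodes the starting point, lower half the slope."""
    half = bits // 2
    return genome >> half, genome & ((1 << half) - 1)

rng = random.Random(1)
a, b = 0b1111000000001111, 0b0000111111110000
child_a, child_b = binary_crossover(a, b, rng)
print(decode(a), decode(b), "->", decode(child_a), decode(child_b))
```

With the float genome a child can only inherit values that already exist in its parents. With the binary genome a cut inside a gene produces a starting point or slope that neither parent held, which is the extra exploration described above.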
Convex Curves
Also in this case, the rate of change co-efficient is allowed to be a fraction,
and thus the slope of the rate of change may be convex. An example of a
convex erase time curve might start at a low erase time and rise up very
quickly with the rate of change later in life being quite modest.
Convex curves were not permitted with the two floats representation on
the basis that the rate of change would minimise the impact of low erase times
on the overall result, hence this type of plan did not seem very promising.
In the event, this proved indeed to be the case and no convex curves ended
up in final populations.
Convergence
In the two float representation there were some evolutions that did not con-
verge. Examining the data, one can see that because of the noisy nature of
the environment, the selection pressure is low. This does not happen with
binary numbers because there is more exploration, yet not so much that it
cannot begin to converge within six generations. One may still be seeing
sub-optimal results, but they are always better than the factory default or
any other static solutions, as well as better than any selection of random first
generation plans.
Device        Average Improvement    P/E Endurance
              (Best Generation)
Device GB1    3.60                   665k
Device GB2    3.74                   662k
Device GB3    2.52                   505k
Device GB4    4.12                   604k
Device GB5    2.73                   504k
Device GB6    3.61                   662k
Table 7.9: Average improvement and endurance for fittest generation, devices
GB1 to GB6
Finally, for completeness, several single individual evaluations involving
tens of trials were analysed, showing gains of 3.6, 3.58 and 4.1.
7.4 Next Steps
During the data collection phase detailed above, discussions were held with
industrial parties. As mentioned, Micron expressed an interest in exploring
the possibility of using some of the methods detailed here to effect cost
reduction in their NOR process [60]. They also expressed interest in a NAND
variant. In the event, a basis for continuing down this avenue could not be
found. It was apparent that at that point during the project we had neither
enough data nor enough finesse on the system for an industrial partner.
Further challenges in the silicon industry at this time, following the ‘Dot Com’
downturn, meant that funding for experimental cost reduction programmes
within Micron was limited.
In a later phase several papers were published, including a journal paper [115]
and several conference papers [114, 113] (nominated for best paper). These
papers were also referenced and described in some popular press magazines
[35, 72], as a result of which there was some attention from a number
of significant industrial players in the NAND and SSD space (EMC2, Hynix,
Samsung, STEC).
NAND devices were the main focus of the aforementioned group. A substantial
research program, beyond the scope of this thesis, is ongoing under
Enterprise Ireland’s Innovation Partnership scheme between Limerick Institute
of Technology, University of Limerick and EMC in this area, in which
the researchers in this work are the principal investigators.
7.5 Summary
A substantial amount of data from more than 20 devices was presented here,
some showing extraordinary results that underpin the core research questions
and contentions set out in Chapter 1. The best use is made of costly resources
such as time and silicon and we see early results informing and guiding later
runs. Every effort is made to curtail ambition in order to achieve the
research goals but, nevertheless, both program and erase control parameters
are trialed and three different search methods are used, including two GA
paradigms. Finally we flag the rationale for departure from NOR operations




Chapters 6 and 7 detail the majority of the results of this research. Here
those results are discussed in the context of the research goals and the thesis
contentions and thus this discussion is used to explain the contribution of
the work as a whole.
Firstly in this chapter, the basic tenets of the research are revisited. The
central hypothesis is stated, leading directly to a list of core research questions
that are set out in Section 8.3. A set of more detailed contentions was given
in Chapter 1, Section 1.4, and how the work has met these contentions is
set out in Section 8.4. We suggest further work, and a
series of recommendations are made as to the topics that may prove fruitful
to pursue in future research.
8.1 Research Achievement
This research set out to prove that a genetic algorithm could be used to auto-
mate the process of finding suitable (or superior) parameters for controlling
NOR flash memory programming and thus enhance reliability. This has been
achieved and the results are tabulated in Chapters 6 and 7.
Chapter 4 describes how to build and validate an automated system that
uses evolutionary search techniques to perform embodied evolution on silicon
memory. We have used small populations and shown how learning in one
evolution can be passed to another. The research has demonstrated excellent
results and exposed aspects of the trialed devices not documented before.
We have discovered that a large guardband is built into the specification
sheet endurance value and we have proven that it may be tuned out using
an automated system. Furthermore, it has been shown that a static set of
parameters,1 however well chosen, cannot ideally suit a device throughout
its product life.
It has been shown how to achieve cost reduction in the qualification pro-
cess and a repeatable, structured approach to parameter discovery in flash
memory has been defined. The rewards are better, longer-lived devices in
return for reduced manufacturing effort.
We have published a number of peer reviewed papers (listed at the beginning
of this document), including a journal paper, and the work has been
mentioned several times in the international popular science press.
8.2 Motivation and Central Hypothesis
In this work we put forward the central hypothesis that an artificial evolu-
tionary algorithm, such as a GA, may be used in the process of efficiently
finding optimal operating parameters for NOR flash memory.
The current method typically used by silicon foundries is to use manual
iterative testing to find safe values followed by a qualification process for
all devices of a particular type. This approach has a number of significant
weaknesses as noted in Chapter 1.
The motivation of this research was to test the idea that there is a better
way, and that a GA may be employed to search the solution space in a
more intelligent, automated and altogether better manner. To that end, we
postulated that it is possible to integrate a GA into a hardware test platform
and that the GA can operate directly on the silicon without using a model
or other such approximation.
1 Where parameters do not change during the life of the part.
Before examining the contribution of this research in detail the core re-
search questions and contentions as set out in Chapter 1 are reviewed for the
convenience of the reader. While examining how each contention has been
met both the positive and the negative propositions are discussed where ap-
propriate.
8.3 Core Research Questions
To explore the central hypothesis, this research posed a series of related
questions:
CQ.1 Can a test platform be specified, built and tested that will test, and
serially retest multiple NOR flash devices in a way that enables a GA
to operate on a population of such devices for the purposes of artificial
evolution?
CQ.2 Can such a system deliver an improvement in reliability for the
stablemates of the devices under scrutiny?
CQ.3 Are there any additional advantages to considering such an automated
system, such as a binning or grading solution to separate good devices
from better devices?
These central questions in turn go on to pose a series of more specific re-
search questions, which are presented as a list of contentions in Chapter 1
Section 1.4. In the next section, how the research has addressed each of these
contentions is discussed.
8.4 Addressing the Research Contentions
In Chapter 1 Section 1.4 the research questions are represented as a list
of contentions labeled Cn.1 to Cn.6. This section discusses each of these
contentions and explains how the research has met each one.
8.4.1 Contention Cn.1
We contend that it is possible to build a test platform incorpo-
rating a GA, to perform destructive testing in real-time on hard
silicon in order to find values for programming parameters that
will improve the endurance of that device.
This has been proven. We have designed and built a platform with ca-
pabilities only found on industrial class component testers that can place
the NOR memory programming conditions totally under the control of an
evolutionary algorithm.
No model has been constructed and no assumptions are made in relation
to the performance of the devices. The GA operates on real silicon in real
time, just as the flash controller will in the field. The results show that using
a GA discovered solution will yield an enormous increase in endurance over
the specification sheet value.
Detailed requirements and specifications were gathered and valuable ex-
perience charted in building test platforms to do this type of analysis. Fur-
thermore, the research has shown the prowess of GA at finding solutions
using small populations in conjunction with employing elements of domain
knowledge.
It has also shown the general value of optimising those processes that are
destructive to silicon devices at the nanometer level. This last point in turn
holds the promise that NAND structures will benefit from a similar effort.
This is further explored below in Section 8.5.1.
In summary, the process as described works, the electronics and software
are viable and there is a high degree of potential, as testified by the level of
industrial interest.
On the negative side, the business case for the dynamic model (where
parameters change during the life of the part) in NOR is questionable for
the following reasons:
• NOR code memory does not normally require high endurances and
it has become clear over the years that NOR is not the forerunner
for domination of the data storage and mass storage markets where
endurance is critical. This means that NOR will continue only as a
code set or XIP storage medium for the foreseeable future [9];
• Some NOR devices are not ideally suited to a dynamic solution be-
cause it requires that counts be maintained of the number of cycles
each block has endured. This information is used to track when GA
prescribed parameter changes should take place. This overhead may
be a significant cost in NOR designs given the relatively small size of
code space memory;
• Real time or embedded programmers do not like uncertainty in relation
to the time taken to perform flash programming tasks, as it makes it
difficult to calculate real time responses for end user products [84]. This
is especially important in mission critical applications. This may force
NOR manufacturers to provide opt-out modes, again adversely affecting
cost competitiveness.
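The bookkeeping overhead referred to in the second point can be made concrete with a minimal sketch of a dynamic model. A count of p/e cycles is kept for each block and is used to select which GA prescribed erase value applies. The class name, the thresholds and the erase values below are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_right

class DynamicEraseSchedule:
    """Selects an erase value for a block from its recorded p/e cycle count."""

    def __init__(self, thresholds, erase_values):
        # thresholds[i] is the cycle count at which erase_values[i + 1] takes over.
        if len(erase_values) != len(thresholds) + 1:
            raise ValueError("need exactly one more erase value than thresholds")
        if list(thresholds) != sorted(thresholds):
            raise ValueError("thresholds must be in ascending order")
        self.thresholds = list(thresholds)
        self.erase_values = list(erase_values)
        self.cycle_counts = {}   # per-block p/e age: the overhead in question

    def erase_value_for(self, block):
        age = self.cycle_counts.get(block, 0)
        return self.erase_values[bisect_right(self.thresholds, age)]

    def record_cycle(self, block):
        self.cycle_counts[block] = self.cycle_counts.get(block, 0) + 1

# Hypothetical schedule: three erase values over the life of a block.
schedule = DynamicEraseSchedule(thresholds=[100_000, 300_000], erase_values=[2, 5, 9])
print(schedule.erase_value_for(block=0))   # a fresh block uses the first value
```

The `cycle_counts` table is the state that must survive power cycles. In a small code space memory, with no controller to hold it, that state is the cost identified above.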
8.4.2 Contention Cn.2
We contend that it is possible to find values for programming parameters
such that failure of the device is estimated in a short
period of time by the rapid destruction of small portions of the
NOR flash device, effecting a binning solution that separates good
devices from bad.
Very early in the experimental phase as detailed in Chapter 6 and Chap-
ter 7 it was proven that this contention was true. It was found that upon
cycling a block to destruction in any given part, the rest of the blocks in this
part would degrade to destruction in a similar time frame when compared to
the time frame for block destruction in other devices.
Another way of saying this is to say that the spread of endurance was
much greater over a group of devices than over a single device. This means
that selecting a relevant block or a relevant sample of blocks and cycling them
to destruction is an effective predictor of the general intrinsic endurance of
the device.
It is common in silicon manufacturing to use information like this to sepa-
rate devices that have different operational behaviours into different physical
bins for marking and packaging. For example, the Intel 486SX and 486DX
are the same dies, but in the case of the SX the maths co-processor is faulty.
The SX is then sold as an entry level processor. The same silicon is later
sold as a discrete maths co-processor (the Intel 487) to those who bought
the SX.
The problem with doing this for memory devices based on endurance is
that one must first destroy the device in order to prove where it will fail. If
the destruction of a small sample of memory blocks can be characterised in
a suitable time frame, then it will be possible to separate good devices from
poor ones.
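A binning step of this kind might be sketched as follows. Sample blocks are cycled to destruction, their cycles to failure are summarised, and the device is assigned to the highest bin whose floor it meets. The use of the median, the bin labels and the bin floors are assumptions made for illustration and are not results of this work.

```python
from statistics import median

def estimate_endurance(sample_block_cycles):
    """Summarise cycles to failure of the sacrificed sample blocks (median assumed)."""
    return median(sample_block_cycles)

def assign_bin(estimated_cycles, bin_floors):
    """Return the label of the highest bin whose floor the estimate meets."""
    best = None
    for label, floor in sorted(bin_floors.items(), key=lambda item: item[1]):
        if estimated_cycles >= floor:
            best = label
    return best if best is not None else "reject"

# Hypothetical bin floors in p/e cycles.
BIN_FLOORS = {"standard": 100_000, "extended": 400_000, "premium": 600_000}

estimate = estimate_endurance([520_000, 500_000, 540_000])
print(assign_bin(estimate, BIN_FLOORS))
```

Because variation within a device was found to be smaller than variation between devices, a small number of sample blocks may be enough for the estimate, which keeps the destroyed area and the test time low.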
The length of any endurance test is critical. If this is too long then it will
increase the testing time and thus the ATE (Automated Test Equipment)
resources associated with each device, with the corresponding knock-on effect
on cost. From calculations in Chapter 4 we can expect such a test to take
anywhere from 40 seconds to 23 minutes. It is possible to reverse the fitness
function of the GA to evolve a more destructive parameter set. Beyond the
scope of this work, but nonetheless interesting, would be to show that this
new endurance fail point is still highly predictive of the intrinsic endurance
for the device as a whole.
The general principle can also be applied to a complete wafer. Sample
sites could be provided that could be tested at the probe stage of manufacture
to facilitate this.
To sum up, this research has proven that it is technically possible to effect
a binning solution for flash memory endurance and we have shown that with
further work there is reason to believe that it can be done economically.
8.4.3 Contention Cn.3
We contend that it is possible to find values, in this way, for
programming parameters such that a general improvement in endurance
can be achieved that applies to all NOR devices of this type.
This point is proven. Chapter 6 sets out the results of endurance testing
performed with GA derived parameter sets. All comfortably outperform the
standard factory settings, some by as much as 400%, which represents more
than 70 times the specification sheet claims.
The data shows clearly that the results for the evolved models are much
better than those of the control group. In the case of the dynamic model,
at this remove, it is unsurprising since the model uses only the electrical
pressure required for that portion of the device's life and no more, resulting
in the movement of smaller numbers of electrons on and off the floating gate
and so minimising implantation and structural decay. Since less degradation
has occurred for the number of memory state changes or ‘p/e age’ of the
block, less pressure is required for the next memory state change than would
have been required for the commensurate factory model p/e age.
There is also a system response time advantage for this method since it
uses shorter erase times at early and median life stages. This is significant
since erase dominates the programming time and is an order of magnitude
greater than the nearest programming action duration such as read or write.
In the case of the static model, many plans such as the factory default
plan quickly become incapable of erasing a block, which is not to say that
the block is un-erasable. We have shown that imposing such a restriction
will mask the inherent ability of a memory cell to change state.
However, many of the discovered static plans performed strongly and the
method has the advantage that there is no need to capture and manage
the p/e age for each block. In NAND this is not an issue since there will
always be a memory controller charged with managing wear leveling, ECC
and garbage collection. It is a trivial matter to further charge it with p/e age
management. In microcontrollers such as the ADu812 the problem is also
easily managed by the on-board microcontroller. In XIP memory, however, no
such controller is available and the host system would be required to manage
the data for a dynamic model.
8.4.4 Contention Cn.4
We contend that it is possible to find values for a batch of NOR
devices such that an improvement in endurance can be achieved
for that batch.
This contention is not proven but is highly supported. It has been shown
that there is a large spread in intrinsic endurance between devices. What is
unknown is if this spread is related to the wafer that produced the device.
From discussions with industrial stakeholders [44] it is believed to be at least
partly the case. In general it is difficult to prove this contention since to
do so would require a large sample of devices of known wafer history being
trialed.
Since we do not have wafer specific information for the devices trialed
here, that places it firmly beyond the scope of this research.
However in Chapter 6 results from a large sample of devices were seen
that had previously been cycled to destruction and as such, had dissimilar
characteristics to new devices. These pre-degraded devices exceeded the gain
figures they achieved in the first evaluation when measured against a similarly
pre-degraded control group. While this does not prove the contention, it is
highly supportive of it, if variation is wafer dependent.
8.4.5 Contention Cn.5
We contend that it is possible to find values for a specific device
such that an improvement in endurance can be achieved for a specific
device.
As in contention Cn.4 this is proven. However it is not a practical propo-
sition since the process of evolution will destroy almost all of the available
experiment space within a device and so what’s left would be long lived but
uselessly small.
To make this a practical proposition would require the provision of sam-
ple blocks of very small proportions, that are indicative of the rest of the
memory array. This is entirely possible but would require the cooperation of
a fabrication company and so is a significant project in its own right.
8.4.6 Contention Cn.6
We contend that it is possible to reduce search expense in terms
of time and destruction of flash real estate by using small popula-
tion methods and by directing certain aspects of the search using
domain knowledge and the knowledge gained in previous experi-
ments.
This contention is proven. GAs are normally deployed in areas where the
search costs are low. For example, by using software models rather than real
world data, a GA can iterate through as many individuals and generations
as the speed of the computation platform can viably process. Using current
technology this is very considerable. The problem with this approach is that
all models are wrong. The only difference between a good model and a bad
model is the scale and significance of the errors [62].
In some domains it is not possible to construct a model. In others, the
effort of constructing a model is greater than the effort in iteratively solving
the problem. Real world search spaces are frequently costly to search and so
are sometimes not well suited to the iterative, stochastic approaches requiring
many generations and copious individuals to cover all areas of the search
space.
This research has shown that with some acquired domain knowledge, a
population size of 20 individuals per generation, with as few as 5 generations
is enough to yield solutions that show large improvements in life expectancy
for NOR flash memory. It has also shown that this quantity is sufficient to
calculate up to 256 solutions for use over the life of the silicon in the field.
The domain knowledge is in some cases common sense, such as the fact that a
device will never become easier to program with increases in implantation and
structural decay. Other knowledge is acquired during early GA runs, for
example, that degradation is dominated by the erase function.
This acquired domain knowledge is used to guide the GA towards potentially
more productive space while avoiding search space that is considered
to be of little use or wasteful. In effect, we see the transfer of search
space knowledge not only from generation to generation of the GA but from
run to run of the entire experiment, as the results of the early experiments
influence the design of the later ones. This is a sensible approach in that
the GA is not required to re-invent the wheel at each run.
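The idea of a small population search bounded by domain knowledge can be sketched as a toy GA. The population size, number of generations and mutation rate are those quoted in this work. Everything else is assumed for illustration: the bounds, the selection and crossover scheme, and above all the fitness function, which is an invented noisy stand-in for destructive testing on silicon.

```python
import random

POP_SIZE = 20          # individuals per generation, as quoted in the text
GENERATIONS = 5        # as few as five generations, as quoted in the text
MUTATION_RATE = 0.35   # rate used from Device H onwards

# ASSUMED bounds standing in for acquired domain knowledge.
START_BOUNDS = (1.0, 50.0)   # erase values below one do not erase the device
SLOPE_BOUNDS = (0.0, 5.0)    # erasing never gets easier, so no negative slopes

def clamp(value, bounds):
    low, high = bounds
    return max(low, min(high, value))

def random_plan(rng):
    return (rng.uniform(*START_BOUNDS), rng.uniform(*SLOPE_BOUNDS))

def crossover(parent_a, parent_b):
    # Two floats genome: take the start from one parent, the slope from the other.
    return (parent_a[0], parent_b[1])

def mutate(plan, rng):
    start, slope = plan
    if rng.random() < MUTATION_RATE:
        start = clamp(start + rng.gauss(0, 5.0), START_BOUNDS)
    if rng.random() < MUTATION_RATE:
        slope = clamp(slope + rng.gauss(0, 0.5), SLOPE_BOUNDS)
    return (start, slope)

def toy_fitness(plan, rng):
    """INVENTED noisy stand-in for cycling a block to destruction."""
    start, slope = plan
    return 100 + 2.0 * start - 10.0 * abs(slope - 1.0) + rng.gauss(0, 5.0)

def evolve(fitness, rng):
    population = [random_plan(rng) for _ in range(POP_SIZE)]
    evaluations = 0
    best = None
    for _ in range(GENERATIONS):
        scored = [(fitness(plan, rng), plan) for plan in population]
        evaluations += len(scored)   # each evaluation would cost real silicon
        scored.sort(key=lambda pair: pair[0], reverse=True)
        if best is None or scored[0][0] > best[0]:
            best = scored[0]
        parents = [plan for _, plan in scored[:POP_SIZE // 2]]
        population = [
            mutate(crossover(rng.choice(parents), rng.choice(parents)), rng)
            for _ in range(POP_SIZE)
        ]
    return best, evaluations

best, evaluations = evolve(toy_fitness, random.Random(0))
print(f"evaluations used: {evaluations}, best plan: {best[1]}")
```

The search budget is fixed at 20 x 5 = 100 evaluations, and every individual stays inside the bounded region, so no evaluation is spent on plans that earlier runs have already shown to be unproductive.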
Furthermore, the space is searched not only in real time but also in reality,
with associated destruction of silicon. Without the principle of reduction of
search expense this destruction would have been wholesale, and it is doubtful
that the research would have proven possible. The corollary of this is that,
using small population methods and domain knowledge, it is possible to use
a GA effectively in expensive, real world and real time applications.
8.5 Recommendations
In the previous section several references were made to work that is beyond
the scope of this thesis. They are not discussed again in detail, but are
listed here together with some that are not explicitly discussed above. It is
recommended that some or all of the items listed be pursued.
Possible Implementation and Further Work
1. Static Model in NOR: a single set of register values, automatically
discovered that are tailored to a subset of flash memory parts;
2. Dynamic Model in NOR: discover a number of suitable register values
for use at various times during the life of the part in the field;
3. Binning Application in Final Test: a rapid test to grade devices based
on intrinsic longevity;
4. Binning Application in Probe: the provision of sample sites to either
grade the entire multiplicity of devices on the wafer and/or calculate a
more suitable set of register values for all wafer chips;
5. Automation of the Characterisation and Qualification Process: reduce
the cost of the qualification process by automating parameter discovery;
6. NAND Binning: as in NOR binning, ascertaining a more suitable reg-
ister value set and/or grade the devices based on intrinsic endurance;
7. NAND Endurance Enhancement for SSDs and Enterprise Storage: ap-
plying the process of automated experimentation using a genetic algo-
rithm to the NAND memory architecture with the aim of improving
standard grade NAND devices such that they can be used in enterprise
class SSDs.
Implementation
The barrier to doing many of the items listed above for NOR flash (the first
five) has, up to now, been the manual nature of calculating a good set of
parameters for each and every batch.
This research offers a solution to that problem, in demonstrating an automated,
painless way of calculating an optimum set of parameters. This
can be done over and over again using equipment already used for the
manufacturing process. There is substantial potential for a manufacturing process
improvement here as described, but several steps are required.
1. Scale up operations and run the process on a large number of devices
in which the donor wafer is known;
2. Prove the concept by evolving safe, better parameter sets for each wafer
group;
3. Qualify each group under appropriate (JEDEC) procedures to prove
that each device group still complies with the revised specification
sheet;
4. Qualify the entire process as a single manufacturing operation.
Clearly this is a lot of work, best done on a commercial footing within a
silicon foundry due to the requirement to trace a large quantity of wafer
batches. However the concept has been proven to be sound by this work.
8.5.1 A blueprint for future experimentation on NAND Flash memory
While the methods described may be transferable to NAND, most of the
outcomes and findings are not, as the device internals are radically different
to NOR. Furthermore, the operation of the NAND memory’s state machines
and memory controllers is fundamentally different to that of NOR. They
track and verify erase and programming cycles as a matter of course and
repeat operations based on success or failure of each cycle [9][106]. This will
make any NOR-like analysis as currently set up impossible, as the target will
continually move as far as any GA derived solution is concerned.
However it may be possible to apply the methods used here to NAND in
controlling such things as programming voltages and currents, the number
of retries or in the placement of thresholds and other such physics.
This work has shown the general value of optimising those processes that
are destructive to silicon devices at the nanometer level and it is hoped
that NAND structures will benefit from a similar effort. This is a tantalising
prospect since NAND is now the storage medium of choice for data streams and
mass storage [23][9]. Furthermore, it is known to be critically sensitive to
wear-out reliability issues [40][25][9] in a manner in which NOR simply is
not [80][116].
These problems caused by destructive processes are set to dominate memory
manufacturing [128] as lithographic processes continue to scale the silicon
geometry [83][128] and fewer and fewer electrons represent the logic states [81].
This brings retention to the fore as an independent reliability issue rather
than an End-Of-Life endurance related issue as it is in NOR. Any work on
NAND will have to focus on retention as well as endurance.
With retention issues come read errors, since retention affects only the
read operation. It is clear from the specification sheets[106][46] that read
is the predominant error state in NAND and that NAND memory is designed to
function in the presence of read errors. The number of read errors tolerable is
a function of the host system’s ability to correct them as well as the ability of
the NAND device to store the redundancy data required to do the corrections.
Any evolution in NAND will have to account for ECC levels in the fitness
function.
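As a rough illustration of how ECC capability might enter a fitness function,
the sketch below scores an individual by the fraction of codewords whose raw
bit errors remain within the correctable limit. The function name, the
codeword granularity and the limit of 8 correctable bits per codeword are
assumptions made purely for illustration; they are not taken from the platform
described in this thesis or from any particular data sheet.

```python
def ecc_aware_fitness(errors_per_codeword, correctable_bits=8):
    """Hypothetical fitness: fraction of codewords the host ECC could still correct.

    errors_per_codeword -- raw bit error count observed in each codeword read back
    correctable_bits    -- assumed ECC strength per codeword (illustrative value)
    """
    if not errors_per_codeword:
        raise ValueError("at least one codeword is required")
    correctable = sum(1 for e in errors_per_codeword if e <= correctable_bits)
    return correctable / len(errors_per_codeword)


# Raw bit errors counted in four codewords after a read: three are correctable.
print(ecc_aware_fitness([0, 3, 8, 12]))
```

A score of this kind rewards parameter sets that keep every codeword inside the
correction budget rather than those that merely lower the average error count.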
Retention is a function of time and it will not be possible for any GA to
wait the retention period (up to one year) to calculate the fitness. It will
be possible to make use of the Arrhenius relationship[13] to accelerate this
process and it may also be possible to relate Raw Bit Error Rate(RBER) to
notional retention. This will have to be proven by experiment first, before
any evolution can begin.
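The scale of the acceleration available can be sketched with the standard
Arrhenius acceleration factor, AF = exp((Ea/k)(1/T_use - 1/T_stress)), with
temperatures in kelvin. The activation energy of 1.1 eV and the temperatures
of 55 C and 125 C used below are placeholder assumptions for illustration
only; the values appropriate to a given NAND device would have to be
established by experiment.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(ea_ev, t_use_c, t_stress_c):
    """Acceleration factor of a bake at t_stress_c relative to use at t_use_c."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k))


def bake_hours(retention_hours, ea_ev, t_use_c, t_stress_c):
    """Bake time that stands in for retention_hours at the use temperature."""
    return retention_hours / arrhenius_acceleration(ea_ev, t_use_c, t_stress_c)


# Assumed values: Ea = 1.1 eV, use at 55 C, bake at 125 C, one year = 8760 hours.
af = arrhenius_acceleration(1.1, 55.0, 125.0)
print(f"acceleration factor: {af:.0f}")
print(f"bake time for one year of retention: {bake_hours(8760, 1.1, 55.0, 125.0):.1f} h")
```

Even with acceleration, each fitness evaluation would still cost hours of bake
time, which is why relating RBER to notional retention is attractive.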
Multiple sites will be desirable, as is the case with the second generation
NOR platform. Communications should be Ethernet or USB, with Ethernet
providing more scope for parallel communications. To implement either will
require each site to maintain an embedded processor to effect the
communications protocol. The same processor should be able to create the
waveforms required to operate the NAND flash interface and implement the
control grammar. Embedded systems have moved on since the start of this
research project, and an off-the-shelf single board computer preloaded with an
operating system such as Linux or Windows CE is many times more powerful
than the bespoke system used for this work and would provide many of the
features required, such as USB or Ethernet.
NAND is in some ways unrecognisable as the subject for the platform
described here; the general approach, however, is relevant. Since the inception
of this research, NAND reliability has moved center stage, and the effort to
address the implementation difficulties is very worthwhile in the opinion of
the current researchers and in the opinions of those stakeholders that have
been consulted.
8.6 Summary
The contributions of the work are manifold and are mainly expressed in
Section 8.4. As the speed of silicon data retrieval exposes all other
bottlenecks in computation and in data center infrastructure, flash has moved
from a performance option to a must-have technology[24]. Furthermore, as work
progresses on data packing densities, silicon memory[77] has become
critically sensitive to reliability issues and this work has become all the
more salient. Due to the proprietary nature of much of the input data (including
pre-release data sheets) as well as the sensitivity of the foreground IP, an
embargo is in place on this thesis, limiting publications. Nevertheless, it has
gathered considerable interest, as stated in Chapter 1. Research funding has
been secured and work is now on-going in conjunction with several research
institutions and major multi-national companies in the silicon memory field
as a result of this study. One of the challenges in completing this thesis was




[1] Drain-avalanche induced hole injection and generation of interface traps
in thin oxide mos devices. 28th Annual Proceedings of the International
Reliability Physics Symposium, pages 150 – 153, 1990.
[2] Analog Devices BV Limerick, Microconverter group. Adu812 product
specification. Proprietary Information, April 1998(preliminary).
[3] Analog Devices BV Limerick, Microconverter group. Aduc812 user
specification, 2002.
[4] Analog Devices BV Limerick, Microconverter group. Aduc824 user
specification, 2002.
[5] S. Aritome, R. Shirota, G. Hemink, T. Endoh, and F. Masuoka. Relia-
bility issues of flash memory cells. Proceedings of the IEEE, 81(5):776–
788, 1993.
[6] G. Atwood. Future directions and challenges for etox flash memory scal-
ing. Transactions on Device and Materials Reliability, IEEE, 4(3):301–
305, 2004.
[7] G. Atwood, A. Fazio, D. Mills, and B. Reaves. Intel strataflash(tm)
memory technology overview(white paper). Technical report, Intel,
1998.
[8] S. Baluja, R. Sukhankar, and J. Hancock. Prototyping intelligent Vehi-
cle modules using Evolutionary Algorithms, chapter Part III Computer
Science and Engineering, pages 241–257. Springer, 1997.
[9] R. Bez, E. Camerlenghi, A. Modelli, and A. Visconti. Introduction to
flash memory. Proceedings of the IEEE, 91(4):489–502, April 2003.
[10] S. Bhattacharya, K. Lai, K. Fox, P. Chan, E. Worley, and U. Sharma.
Improved performance and reliability of split gate source-side injected
flash memory cells. The IEEE’s International Electron Devices Meeting
(IEDM), pages 339–342, August 2002.
[11] S. Billings, B. Kenneth, and M. Sambridge. Hypocenter location: Ge-
netic algorithms incorporating problem specific information. Geophys-
ical Journal International, 118:693–706, 1994.
[12] L. B. Booker, D. E. Goldberg, and J. H. Holland. Classifier systems and
genetic algorithms. Artificial Intelligence, 40(1-3):235–282, September
1989.
[13] A. Brand, K. Wu, S. Pan, and D. Chin. Novel read disturb failure
mechanism induced by flash cycling. The IEEE’s 31st International
Reliability Physics Symposium (IRPS), pages 127–132, 1993.
[14] M. S. Bright and T. Arslan. Synthesis of low-power DSP systems
using a genetic algorithm. The IEEE’s Transactions on Evolutionary
Computation, 5(1):27–40, 2001.
[15] J. Brownlee. Clever Algorithms, Nature-Inspired Programming Recipes.
Lulu.com, 1st edition, 2011.
[16] M. V. Butz and S. W. Wilson. An algorithmic description of xcs. Soft
Computing - A Fusion of Foundations, Methodologies and Applications,
6:144–153, 2002. 10.1007/s005000100111.
[17] G. Campardo, M. Scotti, S. Scommegna, S. Pollara, and A. Silvagni.
An overview of flash architectural developments. Proceedings of the
IEEE, 91(4):523–536, April 2003.
[18] D. Cappelletti, R. Bez, D. Cantarelli, and L. Fratin. Failure mecha-
nisms of flash cell in program/erase cycling. ieee in edm, Central R&D,
SGS-Thomson Micro-Electronics, Via Olivetti 2, 20041 Agrate, Italy,
1994.
[19] E. Chen and T. Yen. Comparing SLC and MLC flash technologies and
structure. Technical report, Advantech, September 2009.
[20] A. Chimenton. Ultra-short pulses improving performance and reliabil-
ity in flash memories. Non-Volatile Semiconductor Memory Workshop,
2006., 21st:46 – 47, 2006.
[21] A. Chimenton, P. Pellati, and P. Olivo. Overerase phenomena: An
insight into flash memory reliability. Proceedings Of The IEEE,
91(4):617–626, April 2003.
[22] S. S. Chung, C.-M. Yih, S.-M. Cheng, and M.-S. Liang. A new tech-
nique for hot carrier reliability evaluations of flash memory cell after
long-term program/erase cycles. IEEE Transactions On Electron De-
vices, 46(9):1183–1189, September 1999.
[23] J. Cooke. Flash memory technology direction. Technical report, Micron
Technology, Inc., May 2007.
[24] G. Crump. Flash everywhere - flash memory summit day 1. Business
report, Storage Switzerland, Aug 2012.
[25] R. Dan and R. Singer. Implementing MLC nand flash for cost-effective,
high-capacity memory. White paper 91-SR-014-02-8L, M-systems,
September 2003.
[26] C. Darwin. On the Origin of the Species by Means of Natural Selection,
or the Preservation of Favoured Races in the Struggle for Life. Royal
Geographical Society, 1859.
[27] K. A. DeJong. An analysis of the behavior of a class of genetic adaptive
systems. Doctoral thesis, University of Michigan, 1975.
[28] K. A. DeJong. Evolutionary computation: A unified approach. Number
0-262-04194-4. MIT Press , Cambridge , Massachusetts, 1st edition,
2006.
[29] K. A. DeJong and W. M. Spears. An analysis of multi-point crossover.
Report, Operations Research Cybernetics, Naval Research Lab Wash-
ington Dc, 1990.
[30] K. A. DeJong and W. M. Spears. An analysis of the interacting roles
of population size and crossover in genetic algorithms. Proceedings of
the 1st Workshop on Parallel Problem Solving from Nature(PPSN),
1:38–47, 1991.
[31] J. M. Fitzpatrick and J. J. Grefenstette. Genetic algorithms in noisy
environments. Machine Learning, 3:101–120, 1988.
[32] L. J. Fogel, A. J. Owens, and M. Walsh. Artificial Intelligence Through
Simulated Evolution. Number 0-471-33250-X. John Wiley & Sons Ltd,
1966.
[33] S. Forrest and M. Mitchell. What makes a problem hard for a genetic
algorithm? some anomalous results and their explanation. Machine
Learning, 13:285–319, 1993.
[34] R. H. Fowler and L. Nordheim. Electron emission in intense electric
fields. Proceedings of the Royal Society of London., 119(781):173–181,
1928.
[35] J. Gautam. Dont invent, evolve:the inventors trial-and-error approach
can be automated by software that mimics natural selection. The
Economist, 9896323, 2007.
[36] O. Ginez, J.-M. Daga, M. Combe, P. Girard, C. Landrault, S. Pravos-
soudovitch, and A. Viraze. An overview of failure mechanisms in em-
bedded flash memories. In Proceedings of the 24th IEEE VLSI Test
Symposium (VTS06), 2006.
[37] N. Gockel, R. Drechsler, and B. Becker. A multi-layer detailed routing
approach based on evolutionary algorithms. IEEE International Con-
ference on Evolutionary Computation. ICE97, pages 557–562, 1997.
[38] D. E. Goldberg. Genetic Algorithms in Search, Optimization , and
Machine Learning. Number 0-201-15767-5. Addison-Wesley, 1999.
[39] T. G. W. Gordon and P. J. Bentley. On evolvable hardware. In Soft
Computing in Industrial Electronics, pages 279–323. Physica-Verlag,
2002.
[40] S. Gregor, A. Cabrini, O. Khouri, and A. Torelli. On-chip error cor-
recting techniques for new-generation flash memories. Proceedings Of
The IEEE, 91(4), April 2003.
[41] S. Haddad, C. Chang, B. Swaminathan, and J. Lien. Degradations due
to hole trapping in flash memory cells. IEEE Electron Device Letters,
10(3):117–119, March 1989.
[42] S. Haddad, C. Chang, A. Wang, J. Bustillo, J. Lien, T. Montalvo,
and M. V. Buskirk. An investigation of erase-mode dependent hole
trapping in flash eeprom memory cell. IEEE Electron Device Letters,
11(11):514–516, November 1990.
[43] K. Heffernan. Analog devices BV, Limerick Microconverter group, in-
terviewed by author. August 1999.
[44] K. Heffernan. Analog devices BV, Limerick Microconverter group, in-
terviewed by author. July 1999.
[45] K. Heffernan and F. Liamy. Analog devices BV, Limerick Microcon-
verter group, interviewed by author. June 1999.
[46] K. Hirsch. Programming nand devices. Technical report, Data IO
Corporation, 10525 Willows Road NE, Redmond WA 98052.
[47] J. H. Holland. Adaptation in Natural and Artificial Systems. Number
978-0-262-08213-6. MIT Press, 1992 edition, 1975.
[48] J. H. Holland and J. S. Reitman. Cognitive systems based on adaptive
algorithms. Special Interest Group on Artificial Intelligence SIGART
Bulletin, 63:49–55, 1977.
[49] G. S. Hornby, A. Globus, D. S. Linden, and J. D. Lohn. Automated
antenna design with evolutionary algorithms. In AIAA Space, pages
19–21, 2006.
[50] C. Hu. Lucky electron model of channel hot electron emission. Electron
Devices Meeting, 1979 International, IEEE, 25:22–25, December 1979.
[51] C. Huang, T. Wang, T. Chen, N. Peng, A. Chang, and F. Shone.
Characterization and simulation of hot carrier effect on erasing gate
current in flash EEPROM. 33rd Annual Proceedings of the Reliability
Physics Symposium, IEEE International, pages 61 – 64, April 1995.
[52] IC Engineering Corporation. Flash memory technology. Technical re-
port, IC Engineering Corporation, Portside Dia Building 2F 10-35,
Sakae-cho, Kanagawa-ku, Yokohama, 221-0052 JAPAN.
[53] H. T. E. Igura, Y.; Matsuoka. New device degradation due to ’cold’
carriers created by band-to-band tunneling. ELECTRON DEVICE
LETTERS, 10(5):227 – 229, May 1989.
[54] A. Inoue and D. Wong. Nand flash applications design guide. Technical
report, Toshiba America Electronic Components, Inc., 2003.
[55] Joint electron devices engineering council JEDEC. Failure mechanisms
and models for semiconductor devices, August 2003.
[56] Joint electron devices engineering council JEDEC. Electrically erasable
programmable rom (eeprom) program/erase endurance and data reten-
tion stress test, March 2006.
[57] Joint electron devices engineering council JEDEC. Solid-state drive
(ssd) endurance workloads, September 2010.
[58] Joint electron devices engineering council JEDEC, Subcommittee on
Reliability Test Method and Packaged Devices. Electrically erasable
programmable rom (eeprom) program/erase endurance and data re-
tention stress test, March 2009.
[59] J. King, H. Fahmy, and M. Wentzel. A genetic Algorithm Approach
to river management in Evolutionary Algorithms in Engineering Ap-
plications, chapter 2, pages 117–134. Number 3-540-62021-4. springer,
1998.
[60] D. Kline. Micron technology, inc, 8000 s. federal way boise, id inter-
viewed by author. May 2004.
[61] Kotanchek. The Meta-Model Approach for Simulation-Based Design
Optimization. Phd thesis, Universiteit van Tilburg, 2006.
[62] M. Kotanchek and K. Vladislavleva. Genetic Programming Theory
and Practice, volume V of Genetic and Evolutionary Computation Se-
ries, chapter Trustable symbolic regression models: using ensembles,
interval arithmetic and pareto fronts to develop robust and trust-aware
models, pages 201–220. March 2008.
[63] J. R. Koza. Genetic evolution and co-evolution of computer programs.
Proceedings of Second Conference on Artificial Life, 1990.
[64] J. R. Koza. Genetic programming: A paradigm for genetically breeding
populations of computer programs to solve problems. Technical report,
Stanford University Stanford, CA, 1990.
[65] J. R. Koza. Genetically breeding populations of computer programs
to solve problems in artificial intelligence. Computer Society Press
Proceedings of the Second International Conference on Tools for AI
IEEE, pages 819–827, 1990.
[66] J. R. Koza and R. Poli. In Search Methodologies: Introductory Tutorials
in Optimization and Decision Support Techniques, chapter 5 Genetic
Programming. Springer, 2005.
[67] P. L. Lanzi. Learning classifier systems from a reinforcement learning
perspective. Soft Computing - A Fusion of Foundations, Methodologies
and Applications, 6:162–170, 2002. 10.1007/s005000100113.
[68] J. Liebowitz. Expert systems: A short introduction. Engineering Frac-
ture Mechanics, 50(56):601 – 607, 1995.
[69] J. Lienig. Physical Design of VLSI Circuits and the Application of
Genetic Algorithms, chapter III Science and Engineering, pages 277–
292. Number 3-540-62021-4. Springer, 1997.
[70] S. J. Louis and F. Zhao. Incorporating problem specific information
in genetic algorithms. Scientific Literature Digital Library, 2007.
[71] T. Lynch. Analog devices BV, Limerick Microconverter group, inter-
viewed by author. July 1999.
[72] P. Marks. Evolutionary algorithms now surpass human designers. New
Scientist, Issue 2614, 2007.
[73] C. R. Marshall and J. W. Valentine. The importance of preadapted
genomes in the origin of the animal bodyplans and the cambrian ex-
plosion. Evolution, 64(5):1189 – 1201, 2010.
[74] Z. Michalewicz and D. Fogel. How to Solve It: Modern Heuristics.
Springer, 2004.
[75] R. Micheloni, L. Crippa, and A.Marelli. Inside Flash Memory. Number
9-048-19430-X. Springer, August 2010.
[76] Micron Technology Inc. Boot block flash memory technology. Technical
note TN-28-01, Micron Technology, Inc., 8000 S. Federal Way Boise,
ID, 1999.
[77] Micron Technology, Inc. Tlc flash memory devices. Technical note,
Micron Technology, Inc, 8000 S. Federal Way Boise, ID, 2011.
[78] H. Mitchell. Introduction. Springer, 2010.
[79] M. G. Mohammad and K. K. Saluja. Defect based functional test for
non-volatile memory disturb faults. In 3rd Workshop on RTL and High
Level Testing, 2002.
[80] M. G. Mohammad, K. K. Saluja, and A. Yap. Testing flash memories.
In 13th International Conference on VLSI Design, pages 406–411, 2000.
[81] G. Molas, D. Deleruyelle, B. D. Salvo, G. Ghibaudo, M. Gely,
L. Perniola, D. Lafond, and S. Deleonibus. Degradation of floating-gate
memory reliability by few electron phenomena. IEEE Transactions on
Electron Devices, 53(10):2610 – 2619, October 2006.
[82] D. J. Montana and L. Davis. Training feedforward neural networks
using genetic algorithms. In Proceedings of the 11th international joint
conference on Artificial intelligence - Volume 1, IJCAI’89, pages 762–
767, San Francisco, CA, USA, 1989. Morgan Kaufmann Publishers Inc.
[83] G. E. Moore. Lithography and the future of moore’s law. Solid-State
Circuits Newsletter, IEEE, 20(1):37–42, 2006.
[84] D. B. Moss. Analog Devices BV Limerick, Microconverter group, in-
terviewed by author. July 2010.
[85] I. Motta, G. Ragone, O. Khouri, G. Torelli, and R. Micheloni. High-
voltage management in single-supply CHE Nor-type flash memories.
Proceedings of the IEEE, 91(4):554 – 568, 2003.
[86] W. K. Ng, S. Choi, and C. V. Ravishankar. Lossless and Lossy Data
Compression in Evolutionary Algorithms in Engineering Applications,
chapter 2, pages 174–188. Number 3-540-62021-4. Springer, June 1998.
[87] M. ONeill and C. Ryan. Under the hood of grammatical evolution.
Proceedings of the Genetic and Evolutionary Computation Conference,,
1999.
[88] S. Passone, P. Chung, and V. Nassehi. Incorporating domain-specific
knowledge into a genetic algorithm to implement case-based reasoning
adaptation. Knowledge-Based Systems, 19(19), 2005.
[89] P. Pavan, R. Bez, P. Olivo, and E. Zanoni. Flash memory cells, an
overview. In Proceedings Of The IEEE, volume 85, pages 1248–1271,
August 1997.
[90] R. Poli, W. B. Langdon, N. F. McPhee, and J. R. Koza. Genetic
programming an introductory tutorial and a survey of techniques and
applications. Technical Report 1744-8050, School of Computer Science
- University of Birmingham, October 2007.
[91] A. Prugel-Bennett. Benefits of a population: Five mechanisms that
advantage population-based algorithms. IEEE Transactions On Evo-
lutionary Computation, 14(4), 2010.
[92] W. Quan, M. K. Cho, and D. M. Kim. Dynamic snap-back induced
programming failure in stacked gate flash eeprom cells and efficient
remedying technique. IEEE Transactions on Electron Devices, 46(12),
December 1999.
[93] C. R. Reeves. Using genetic algorithms with small populations. In Pro-
ceedings of the Fifth International Conference on Genetic Algorithms,
pages 92–99, 1993.
[94] R. Richey. Flash memory technology: Considerations for application
design. Technical report, Microchip Technology Inc, 2355 West Chan-
dler Blvd. Chandler, Arizona, 2003.
[95] S. Roland. Robust Encoding in Genetic Algorithms in Evolutionary Al-
gorithms in Engineering Applications, chapter 1, pages 29–44. Number
3-540-62021-4. Springer, June.
[96] F. Rothlauf. Optimization methods. In Design of Modern Heuristics,
Natural Computing Series, pages 45–102. Springer Berlin Heidelberg,
2011.
[97] C. Ryan, J. J. Collins, and M. O’Neill. Grammatical evolution, evolving
programs for an arbitrary language. In First European WorkShop on
Genetic Programming, pages 83 – 95, 1998.
[98] H. P. Schwefel. On the evolution of evolutionary computation. Univer-
sity of Dortmund D44221 Dortmund, Germany, (2).
[99] H. P. Schwefel and T. Back. Evolutionary computation: an overview.
Proceedings of IEEE International Conference on Evolutionary Com-
putation, (0-7803-2902-3):20 – 29, May 1996.
[100] Silicon Storage Technology, Inc. Endurance testing of eeproms. Tech-
nical Paper 705, 450 Holger Way San Jose CA, 95134, November 2001.
[101] Silicon Storage Technology, Inc. Product reliability. Technical Paper
706, 450 Holger Way San Jose CA, 95134, November 2001.
[102] Silicon Storage Technology, Inc. Reliability considerations for repro-
grammable nonvolatile memories. Technical Paper 704, 450 Holger
Way San Jose CA, 95134, 2001.
[103] Silicon Storage Technology, Inc. Superflash eeprom technology. Tech-
nical Paper 701, 450 Holger Way San Jose CA, 95134, November 2001.
all about the cell used.
[104] Silicon Storage Technology, Inc. Technical comparison of floating gate
reprogrammable nonvolatile memories. Technical Paper 702, 450 Hol-
ger Way San Jose CA, 95134, November 2001.
[105] A. Silvagni, G. Fusillo, R. Ravasio, M. Picca, and S. Zanardi. An
overview of logic architectures inside flashmemory devices. Proceedings
of The IEEE, 91(4), April 2003.
[106] SK Hynix Corporation. 32 gbit (4 g x 8 bit) mlc nand flash
memory specification. User specification, Hynix Corporation, 2091,
Gyeongchung-daero, Bubal-eub, Icheon-si, Gyeonggi-do, KOREA,
2010.
[107] S. F. Smith. Flexible learning of problem solving heuristics through
adaptive search. In Proceedings of the Eighth international joint con-
ference on Artificial intelligence - Volume 1, IJCAI’83, pages 422–425,
San Francisco, CA, USA, 1983. Morgan Kaufmann Publishers Inc.
[108] R. V. Sole, P. Fernandez, and S. A. Kauffman. Adaptive walks in a gene
network model of morphogenesis: insights into the cambrian explosion.
International Journal of Developmental Biology, 47:685–693, 2003.
[109] J. Spall. Introduction to Stochastic Search and Optimization: Estima-
tion, Simulation, and Control. Wiley Series in Discrete Mathematics
and Optimization. Wiley, 2005.
[110] Standards Committee of the IEEE Electron Devices Society. Standard
definitions and characterization of floating gate semiconductor arrays,
2002.
[111] Standards Committee of the IEEE Electron Devices Society. Standard
definitions and characterization of floating gate semiconductor arrays.
(1005-1998):77–78, 2002.
[112] G. Stanley. Experiences using knowledge-based reasoning in online
control systems. International Federation of Automatic Control (IFAC)
Symposium on Computer Aided Design in Control Systems, 1991.
[113] J. Sullivan and C. Ryan. A destructive evolutionary algorithm process.
In Proceedings of the 2007 Frontiers in the Convergence of Bioscience
and Information Technologies. IEEE Computer Society, 2007.
[114] J. Sullivan and C. Ryan. A destructive evolutionary process a pilot
implementation. In Genetic and Evolutionary Computation Conference,
volume 2, page 2167 to 2174. Association for computing machinery,
ACM, 2007.
[115] J. Sullivan and C. Ryan. A destructive evolutionary algorithm pro-
cess. Soft Computing- A Fusion of Foundations, Methodologies and
Applications, 15(1):95 –102, 2011.
[116] A. Tal. Two technologies compared: Nor vs. nand. White Paper 91-SR-
012-04-8L, M-Systems(now SanDisk), 601 McCarthy Boulevard Milpi-
tas, CA 95035, July 2003. Revision 1.1.
[117] T. Then and E. K. Chong. Genetic algorithms in noisy environments.
ECL technical reports 245 (School of Electrical Engineering, Purdue
University, Lafayette, Indiana), September 1993.
[118] Toshiba Electronic Components America Ltd. Nand vs. nor flash mem-
ory technology overview. Technical report, 19900 MacArthur Boule-
vard, Suite 400 Irvine, CA 92612.
[119] S.-H. Tsai, J.-S. Hung, N.-F. Wang, J.-H. Horng, M.-P. Houng, and Y.-
H. Wang. Oxide degradation mechanism in stacked-gate flash memory
using the cell array stress test. Institute of Physics Publishing, Semi-
conductor Science and Technology, 18:857 – 863, August 2003.
[120] Various authors. Evolutionary Algorithms in Engineering Applications.
Number 3-540-62021-4. Edited by D. Dasgupta and Z.Michalewicz,
Springer, 1997.
[121] Various authors. Nonvolatile Memory Technologies with emphasis on
flash. Edited by Joe E. Brewer and Manzur Gill, IEEE press series on
Microelectronic systems, 2008.
[122] G. Verma and N. Mielke. Reliability performance of etox based flash
memories. 26th Annual Proceedings of the International Reliability
Physics Symposium, pages 158 – 166, 1988.
[123] J. Wai. Advantages of solid-state drives for design computing. White
paper, Intel Information Technology, Computer Manufacturing, 2200
Mission College Blvd. Santa Clara, California, September 2009.
[124] H. Wang. Sequential Optimization Through Adaptive Design of Exper-
iments. Phd thesis, Massachusetts Institute of Technology, 77 Mas-
sachusetts Avenue Cambridge, MA, March 2007.
[125] R. A. Watson, S. G. Ficici, and J. B. Pollack. Embodied evolution:
Embodying an evolutionary algorithm in a population of robots. In
Congress on Evolutionary Computation, pages 335 – 342. IEEE, 1999.
[126] S. C. Wee Keong Ng and C. Ravishankar. Lossless and lossy data com-
pression. Evolutionary Algorithms in Engineering Applications, 1:173–
188, 1997.
[127] R. Wehrens, C. Lucasius, L. Buydens, and G. Kateman. Sequential as-
signment of 2d-nmr spectra of proteins using genetic algorithms. Jour-
nal of Chemical Information and Computer Sciences, 33(2):245–251,
1993.
[128] Western Digital Corporation. Nand evolution and its effects on solid
state drive (ssd) useable life. white paper WP-001-01R, Western Digital
Corporation, 2009.
[129] S. Wright. The role of mutation, inbreeding, crossbreeding and
selection in evolution. Proceedings of the 6th International conference on
genetics, pages 356 – 366, 1932.
[130] J. Xie and W. Xing. Incorporating domain specific knowledge into
evolutionary algorithms. Beijing Mathematics Dept. of applied maths,
Tsinghua University, Beijing 100084, P.R. China, 4(2):131–139., 1998.
[131] C. F. Yinug. The rise of the flash memory market: Its impact on firm
behavior and global semiconductor trade patterns. journal of inter-
national commerce and economics, US international trade commission,
2007.
[132] T. Yoshii, C. Black, and S. Chahal. Solid-state drives in the enter-
prise: A proof of concept. White paper, Intel Information Technology,
Computer Manufacturing, March 2009.
[133] K. Yu and F. Liamy. Analog devices BV, Limerick Microconverter
group, interviewed by author. September 1999.
[134] B. V. Zeghbroeck. The Principles of Semiconductor Devices. University




9.1 The Analog Devices ADu812/ADu824
The Analog Devices ADu812/ADu824 micro controller has been chosen for
the investigation; it is both a smart transducer and a micro controller. The
internal arrangements are detailed in the user specifications[3][4], an outline
of which is shown here in Figure 9.1.
The ADuC824 is intended to be a complete smart transducer front-end,
integrating two high-resolution sigma delta ADCs(Analog to Digital Con-
verter), an 8-bit MCU(Micro Controller Unit) with program and data NOR
Flash Memory arrays on a single chip. The device operates from a 32 kHz
crystal with an on-chip PLL(Phase Locked Loop) generating a high-frequency
operating clock of 12.58 MHz. This clock is, in turn, routed through a pro-
grammable clock divider from which the MCU core clock operating frequency
is generated.
The micro controller core is an 8052 and therefore 8051 instruction set
compatible. The micro controller core machine cycle consists of 12 core clock
periods of the selected core operating frequency.
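The timing relationship just described can be sketched numerically. The PLL
frequency and the 12 clocks per machine cycle come from the text; the divider
values tried below are illustrative assumptions only, and the settings
actually available should be taken from the user specification[4].

```python
PLL_CLOCK_HZ = 12.58e6          # PLL output frequency quoted in the text
CLOCKS_PER_MACHINE_CYCLE = 12   # 8052 core: 12 core clock periods per machine cycle


def machine_cycle_seconds(divider):
    """Machine cycle period for a given clock divider (divider values are assumed)."""
    if divider < 1:
        raise ValueError("divider must be at least 1")
    core_clock_hz = PLL_CLOCK_HZ / divider
    return CLOCKS_PER_MACHINE_CYCLE / core_clock_hz


# Illustrative divider settings, not a list taken from the data sheet.
for divider in (1, 2, 4, 8):
    period_us = machine_cycle_seconds(divider) * 1e6
    print(f"divider {divider}: machine cycle = {period_us:.3f} us")
```

Knowing the machine cycle period matters here because any pulse width
generated in embedded code is a whole number of machine cycles.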
Eight Kilobytes of nonvolatile NOR Flash program memory are provided
on-chip. 640 bytes of nonvolatile NOR Flash data memory and 256
bytes of RAM are also integrated on-chip. The ADuC824 also incorporates
additional analog functionality such as a 12-bit DAC, current sources,
power supply monitor, and a bandgap reference. On-chip digital peripherals
include a watchdog timer, time interval counter, three general purpose
timers/counters, and three serial I/O ports (SPI, UART, and I2C-compatible).
Figure 9.1: Block diagram of the Analog Devices ADu812 Micro-converter
chip
On-chip factory firmware would normally support in-circuit serial download
and debug modes (via UART), but in the case of these pre-production
evaluation parts no such firmware is included, nor is any tested version
available for upload. This means that, on the one hand, we may use the 2K
non-volatile NOR flash area reserved for this firmware during experimentation
while, on the other hand, any embedded code we wish to run on the part must be
loaded using parallel programming mode.
The part operates from a single 3 V or 5 V supply. When operating from
3 V supplies, the power dissipation is below 10 milliwatts. The chip is housed
in a 52-lead MQFP (Metric Quad Flat Pack) package. An adapter for this
device is available from Emulation Technology as a through-hole programming
adapter suitable for inclusion on any DUT board.
From the diagram it can be seen that the DUT has four 8-bit general
purpose ports through which many of its functions are controlled and which
also serve as outputs under the control of the MCU.
There are also many other pins supporting a variety of functions including
power supply and reset. Many of the port pins are dual purpose, such as the
serial port pins, which may be configured as general purpose input/output
when not being used for communication.
There are many subsystems which are not used during the research and
may be ignored. There are some subsystems which are not used but which
nevertheless must be configured. The ADu812 is an identical device except
that its A/D converter is limited to 12 bits.
9.2 The Memory Map
Of particular interest is the memory map shown here in Figure 9.3. The
device has, all told, 10 kilobytes of non-volatile program memory and 640
bytes of similar non-volatile data memory. Code memory from 0x1FFF to
0xF800 is not present internally and so must be provided externally if
required, with the EA pin set high to access it. This NOR NV (Non Volatile)
memory is, as is typical of flash, erasable only in groups. The code memory is
grouped into 64 byte erasable chunks while the data memory is grouped into
4 byte chunks. This means any analysis involving erase must be conducted
on either the 64 byte blocks of code memory or the 4 byte blocks of data
memory, but not both, since they will have different characteristics. This
reduces our experiment space considerably, although it might be possible to do
read and write analysis with greater resolution and also in parallel with
erase type analysis.
Figure 9.2: ADU824 memory map
MS (Memory Select) is mapped to port 3.5 pin 23; when high it selects code
memory and when low it selects data memory
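The erase granularity can be made concrete with a little address arithmetic.
The sketch below maps a byte address to the erase group that contains it,
using the 64 byte and 4 byte group sizes given above. The function name and
the assumption of simple linear byte addressing within each memory are
illustrative and are not taken from the user specification.

```python
CODE_GROUP_BYTES = 64   # erase group size for code memory (from the text)
DATA_GROUP_BYTES = 4    # erase group size for data memory (from the text)


def erase_group(address, group_bytes):
    """Return (group index, first address, last address) of the group holding address."""
    if address < 0:
        raise ValueError("address must be non-negative")
    index = address // group_bytes
    first = index * group_bytes
    return index, first, first + group_bytes - 1


# The same offset falls into very different erase groups in the two memories.
print(erase_group(0x0105, CODE_GROUP_BYTES))
print(erase_group(0x0105, DATA_GROUP_BYTES))
print("data memory groups:", 640 // DATA_GROUP_BYTES)
```

Any erase experiment therefore disturbs every byte between the first and last
address returned, which is why erase analysis is tied to one group size.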
9.2.1 Memory Map Coding Consideration
As with all 8051-compatible devices, the ADuC824 has separate address
spaces for Program and Data memory. If the user applies power or resets the
device while the EA pin is pulled low, the device will execute code from the
external program space; otherwise the part will execute from internal NV
program memory. Figure 9.4
The data memory space consists of four physically separate blocks: the
lower 128 bytes of RAM, the upper 128 bytes of RAM, 128 bytes of special
function register (SFR) area, and a 640-byte NV Data memory. This Data
Memory is available to the user indirectly via a group of control registers
mapped into the Special Function Register (SFR) area. Many other functions
of NV memory are mapped into the SFRs, such as control over the extra 2
Kilobyte bootstrap code space, parallel programming, data code programming
and read/write/erase parameter variables. The complete SFR map is shown
in Figure 9.4.
Figure 9.3: ADU824 Memory Map
The lowest 32 bytes of internal RAM are grouped into four banks of eight
registers addressed as R0 through R7. The next 16 bytes (128 bits), locations
20 Hex through 2F Hex above the register banks, form a block of directly
addressable bit locations at bit addresses 00 Hex through 7F Hex. The stack
can be located anywhere in the internal memory address space, and the stack
depth can be expanded up to 256 bytes.
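The relationship between a bit address and the RAM byte that holds it can be
sketched as follows. The mapping shown is the usual 8051 convention, in which
bit addresses run in order through bytes 20 Hex to 2F Hex, eight bits per
byte; the function name is hypothetical.

```python
BIT_AREA_BASE = 0x20  # first byte of the bit-addressable RAM block


def bit_to_byte(bit_address):
    """Map a bit address (00 Hex to 7F Hex) to (RAM byte address, bit position)."""
    if not 0x00 <= bit_address <= 0x7F:
        raise ValueError("bit address must lie between 0x00 and 0x7F")
    return BIT_AREA_BASE + bit_address // 8, bit_address % 8


for bit in (0x00, 0x0A, 0x7F):
    byte_address, position = bit_to_byte(bit)
    print(f"bit {bit:02X}H -> byte {byte_address:02X}H, bit {position}")
```

This is useful when reserving handshaking flags in a header file, since each
named bit also occupies part of an ordinary RAM byte.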
Reset initialises the stack pointer to location 07 Hex and increments it
once to start from location 08 Hex, which is also the first register (R0) of
register bank 1. Thus, if one is going to use more than one register bank,
the stack pointer should be initialised to an area of RAM not used for data
storage.
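The overlap between the default stack and register bank 1 can be checked with
a short sketch. Bank n is taken to occupy the eight bytes starting at 8n, as
described above; the helper names are hypothetical, and the check deliberately
looks only at the first byte pushed rather than at the full stack depth.

```python
def bank_range(bank):
    """Return (first, last) RAM address of register bank 0 to 3."""
    if bank not in range(4):
        raise ValueError("bank must be 0, 1, 2 or 3")
    return bank * 8, bank * 8 + 7


def first_push_collides(stack_pointer, banks_in_use):
    """True if the first pushed byte lands inside a register bank that is in use."""
    first_push = stack_pointer + 1  # the 8051 increments SP before storing
    return any(lo <= first_push <= hi for lo, hi in map(bank_range, banks_in_use))


print(first_push_collides(0x07, [0]))           # reset value, bank 0 only
print(first_push_collides(0x07, [0, 1]))        # reset value, banks 0 and 1
print(first_push_collides(0x2F, [0, 1, 2, 3]))  # stack moved above the bit area
```

With the reset value of 07 Hex the first push lands at 08 Hex, so a program
that also uses bank 1 would need to relocate the stack pointer first.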
Figure 9.4: Special function register map
9.3 Code Listings
A large number of lines of code were written across three platforms to
complete this work and they cannot all be listed here; however, they are
available on request.
9.3.1 SBC Header File
The following code example is from the header file of an early version
of the SBC embedded code. It gives a flavour of the embedded source code
structure. In this file we define statics and address map reservations for use
in the main assembler file. Assembly is useful in that it allows direct control
over the micro processors without the interpretation of a compiler, which
was an important feature during this work. The SBC memory map can be










;//** Date 15 / JAN / 1997 A COPY OF SERX9126.ASM
;//** Updated 16/JAN /1997 TO REFLECT THE NEW INTERFACE CARD
;//** Updated 25/JAN /2000 TO ACT AS HEADER FOR THE PROT PROGRAM
;//** Updated 21/Dec /2000 TO ACT AS HEADER FOR THE Serial Monitor
;//** Updated 6/JAN /2001 TO REFLECT use in test hardware for masters work
;//** Updated 2/ June/ 2001 additional reservations





;7 0 T0 GATING OFF
;6 0 T0 TIMER/COUNTER SELECTION. TIMER SELECTED
;5 1 MODE
;4 0 MODE
;3 0 T1 GATING OFF





;7 1 T1 OVERFLOW FLAG . SET BY HARDWARE
;6 1 T1 RUN BIT SET TO RUN
;5 0 T0 OVERFLOW FLAG . SET BY HARDWARE
;4 1 T0 RUN BIT . SET TO RUN
;3 0 INT 1 EDGE FLAG . SET BY HARDWARE
;2 0 EDGE/ LEVEL TRIGGERED INT
;1 0 INT 0 EDGE FLAG . SET BY HARDWARE






;4 1 SERIAL ENABLE
;3 0 TB8 THE 9TH DATA BIT TO BE XMITTED
;2 0 STOP BIT OR 9TH BIT TO BE RECEIVED
;1 1 XMIT INT FLAG . SET BY HARDWARE




;7 0 DISABLE ALL INTS
;6 X RESERVED
;5 X RESERVED
;4 1 SERIAL PORT INT . ENABLED .
;3 1 TIMER 1 INT ENABLE
;2 0 EXT INT 1 ENABLE . DISABLED
;1 1 TIMER 0 INT ENABLE . ENABLED






;4 1 SERIAL PRIORITY
;3 0 T1 PRIORITY
;2 0 EXT1 PRIORITY
;1 0 T0 PRIORITY . SET TO HIGH AS THIS IS THE WATCHDOG HIT
;0 0 EXT 0 PRIORITY
; 10HEX
;***************************************************************************
;** THE FOLLOWING RESERVES THE BIT AREA FOR HANDSHAKING SIGNALS FOR THE //
;** PORT AND XDATA LOCATIONS WHICH ADDRESS THE 8255 . ALSO RAM LOCATION
;** WHICH WILL CONTAIN THE DATA FROM THE PC AND DATA RETURNED FROM THE
;** BOARD UNDER TEST . FINALLY SOME ERRORS AND TESTS ARE GIVEN NAMES AND
;** NUMBERS .
;***************************************************************************
;BIT ADDRESSABLE RUNS FROM 20 TO 2F HEX IN ORDER FROM 0 TO 7F BITS
ACK BIT 7CH ;P1.3;ROW1 PIN 13 INIT LOW
BYTE_RDY BIT 7DH ;P1.1;ROW0 PIN 12 INIT LOW
BYTE2TX BIT 7EH ;COLSEL1 PIN 10 INIT LOW
BUSYFLG BIT 7FH ;COLSEL 0 PIN 9 INIT HIGH
STATUSFLG BIT 7BH ;THE STATUS FLAG ADDED 16/JAN/1997
REP_BIT BIT 7AH ;THE LATCH IS IN USE 17/FEB/1996
ALTN_BIT BIT 79H ;DATA ORIGINATES LOCAL USE ALTERN BYTES
;THE 8255 LOCATION
PORTA55 EQU 2300H ;THE DATA BYTE
PORTB EQU 2301H ;THE ROW ADDRESS
PORTC EQU 2302H ;THE COLSEL ADDRESS
CTRL8255 EQU 2303H ;THE CONTROL BYTE
;VALUES RETURNED FROM THE TARGET BOARD AND SENT FROM PC
HIGHBYTE DATA 69H ;Contains the eeprom address high byte
LOWBYTE DATA 6AH ;Contains the eeprom address low byte
PROG_DATA DATA 60H ;Contains the eeprom Data byte
DATA_PNTR DATA 61H ;Points to the next data byte
ACTION DATA 68H ;Carries the instruction type number
CONVERT_H DATA 58H ;Storage for the A/D converter
CONVERT_L DATA 59H ;Storage for the A/D converter
;THE ERROR NUMBERS AND COMMS VALUES
ATT_VB EQU 0FFH ; ATTENTION PLEASE VB
ALL_BUSY EQU 31H ;TARGET REPORTS CONSTANTLY BUSY
TIMEOUT EQU 01H ;THE V_INSTRUMENT TIMED OUT
S_DEFAULT EQU 02H ;SWITCH DEFAULT . FALL OUT THE END
PA_OUTPUT EQU 89H ;A AND B ARE OUTPUTS
PA_INPUT EQU 99H
ALL_OUTS EQU 80H ;ALL PORTS AS OUTPUTS 1000,0000B
MUXAD590 EQU 00H ;VALUE THAT SELECTS AD590 IN MUX
MUXSTACK EQU 10H ;SELECT STACK, CONTROL LINE 00
BOOTLOAD EQU 31H ;ACTION NUMBER 1
LOAD_MEMBER EQU 32H ;ACTION NUMBER 2
CHECK_BUSY EQU 33H ;ACTION NUMBER 3
;MUX SELECT FOR VOLTAGE MEASUREMENT
VOLTS_1 EQU 11111110B ;BIT 0 ENABLES THE 573 LATCH
VOLTS_2 EQU 11111100B ;THIS SHOULD ALWAYS BE DISABLED
VOLTS_3 EQU 11111010B ;= A HIGH . THIS BINARY IS
VOLTS_4 EQU 11111000B ;INVERTED BY THE LATCH U24
VOLTS_5 EQU 11110110B ;HENCE A ZERO GENERATES A HIGH
VOLTS_6 EQU 11110100B ;WHICH DISABLES THE LATCH
VOLTS_7 EQU 11110010B ;NOTE THE RELAYS ACTUATE ON A
VOLTS_8 EQU 11110000B ;HIGH
;DECLARES TO SWITCH ON THE LATCHES ON THE DUT BOARD BY SETTING






TIME EQU 0FFH ; THE SHORT WAIT DURATION
;DECLARATIONS SPECIFIC TO THE FILE DAC_FACE
IDAT BIT P1.6 ; Inverted data
IDCLK BIT P1.5 ; Inverted clock (clock on rising edge)
NICS BIT P1.4 ; DAC chip select (Active High)




;** END OF DECLARATION AND START OF CODE PROPER
;***************************************************************************
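The 8255 control bytes defined in the header (PA_OUTPUT, PA_INPUT and ALL_OUTS) can be checked against the mode-definition control word format of the 8255. The following Python sketch decodes such a byte. It is not part of the SBC code, and the function name and dictionary keys are hypothetical.

```python
def decode_8255_mode_word(word: int) -> dict:
    """Decode an 8255 mode-definition control word (bit 7 must be set)."""
    if not 0 <= word <= 0xFF:
        raise ValueError("control word must fit in one byte")
    if not word & 0x80:
        raise ValueError("bit 7 clear: bit set/reset word, not a mode word")

    def direction(bit: int) -> str:
        # In the mode word a 1 selects input and a 0 selects output.
        return "input" if word & (1 << bit) else "output"

    group_a_bits = (word >> 5) & 0b11  # 00 = mode 0, 01 = mode 1, 1x = mode 2
    return {
        "group_a_mode": 2 if group_a_bits >= 2 else group_a_bits,
        "port_a": direction(4),
        "port_c_upper": direction(3),
        "group_b_mode": (word >> 2) & 1,
        "port_b": direction(1),
        "port_c_lower": direction(0),
    }
```

Decoding 89H gives ports A and B as outputs with both halves of port C as inputs, which agrees with the comment on PA_OUTPUT. 99H differs only in making port A an input, and 80H sets every port to output.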
9.4 Flash Memory: Growth
Flash memory is currently a 25 billion dollar industry, with NAND comprising
20 billion of the total. All commentators forecast that it will continue to grow.
Figure 9.5: Flash memory growth forecast to 2016
The growth rate has been staggering and NAND flash has grown faster
than any technology in the history of semiconductors[131]. The growth rate
shows no sign of abating as new markets are opened by the availability of
cheap NAND flash.
More recently, solid state drives (SSDs) have grown out of the NAND space
and are exposing all other bottlenecks in both PC design and big data storage
and warehousing. NAND flash drives have become a ‘must have’ in all
data warehousing sites due to the breakthrough speeds of SSDs[123]. In stark
contrast to hard disks, SSDs have no moving parts and have multiple data
paths, which makes them dramatically faster than hard disks.
This is driving the cloud computing revolution, since near-instant access is a
prerequisite for moving storage from the local desktop to a remote warehouse.
Meanwhile, new markets for NAND flash continue to grow both in the
enterprise class, aimed at high-end and corporate users, and at the consumer
end, in products such as PCs and tablets.
Figure 9.6: SSD growth forecast in light of cheap NAND Flash
NAND has outstripped NOR in sales by a large margin since 2005. NOR
is now only considered as a code storage medium since it supports direct
memory access via an address bus.
NOR capacity has risen dramatically in recent years and so code space
is now well served with NOR flash of ample size. Endurance and retention
figures are now considered sufficient for code memory since it does not require
frequent rewrites.
Code space is the only market for NOR and so, with the exception of exotic
applications such as remote sensing, there is no compelling reason for
creating NOR with exceptional endurance, although there is still a cost
reduction and binning rationale. NAND is now used extensively for data
storage applications and as such must endure frequent rewrites.
9.5 Erase Time Register Values
The erase time is great.
Figure 9.7: Erase Control Register
Figure 9.8: Erase Register Value Meanings
9.6 Block Diagrams
Figure 9.9: Generalised block diagram
Figure 9.10: Complete block diagram
9.7 Detailed Schematics and Pin Designation
9.7.1 IFT board interface
IFT board interface pin definitions showing all ports and their connection to
the DUT board. Some port pins are duplicated on multiple headers. The
complete IFT board schematic is available on request.
Figure 9.11: Connections from the SBC
9.7.2 DUT board Schematic
Figure 9.12: DUT board schematic diagram
9.7.3 PIO Port Schematic and Modes
Figure 9.13: Operating modes of the 8255
Figure 9.14: Functional diagram of the PIO
