Fault tolerance in digital controllers using software techniques by Halse, Robert G.
Durham E-Theses
How to cite:
Halse, Robert G. (1984) Fault tolerance in digital controllers using software techniques, Durham theses,
Durham University. Available at Durham E-Theses Online: http://etheses.dur.ac.uk/7474/
FAULT TOLERANCE IN DIGITAL CONTROLLERS USING 
SOFTWARE TECHNIQUES 
by 
Robert G. Halse 
ABSTRACT 
Microprocessor based systems for controlling gas supplies require very
high levels of reliability for safety reasons. Non-redundant systems are 
considered to be inadequate, and an alternative approach is necessary. In 
digital systems, transient faults are as much as fifty times more common 
than permanent faults. Therefore mechanisms which allow for recovery from 
transients will provide large improvements in reliability. However, to 
enable effective design of recovery mechanisms it is necessary to
understand failure modes. 
The results from practical interference tests, designed to simulate 
transient faults, are presented. They show that corruption to the correct 
flow of program execution is a common failure, and that subsequent 
instruction fetches can be performed from any of the memory locations. 
Under these conditions any value of operation code can be interpreted as an 
instruction, including those undeclared by the manufacturers. Four
commonly used microprocessors are investigated to establish the functions 
of the undeclared codes, and other undeclared operations are revealed. 
Analyses on the sequence of events following a random jump into the 
four main memory types of data, program, unused and input areas, are 
presented. Recovery from this type of execution can be achieved by the 
addition of restart codes into the areas, so that execution can transfer to 
a recovery routine. The effect of this mechanism on the recovery process 
is investigated. 
Finally, some methods of testing systems, to check the levels of 
reliability improvement obtained by these techniques, are considered. 
ACKNOWLEDGEMENTS 
I would like to express my gratitude to the people and organisations 
that have contributed to the work presented in this thesis. In particular,
to the Science and Engineering Research Council, and the British Gas
Corporation's Engineering Research Station at Killingworth, for providing
financial support. To my supervisor Dr. Clive Preece for his
encouragement, guidance and general advice throughout the research. To Dr.
Ken Jenkins of the British Gas Corporation for providing much useful
information and equipment. To Dr. Mansour Sahardi for his interest and
discussion on the project. To Mandy for translation and typing work. To
the electrical technicians (Jack, Trevor, Michael, Colin, Steve, Ian and
Ian) for their co-operation and assistance while I have been at the
university. Finally, I would like to thank the Fleetham family for 
allowing me to practice my building skills on their house during my spare 
time. 
The copyright of this thesis rests with the author. 
No quotation from it should be published without 
his prior written consent and information derived 
from it should be acknowledged. 
Thesis Submitted for the Degree of 
Doctor of Philosophy 
in the Faculty of Science 
University of Durham 
November 1984 
List of Contents 
Section Page No. 
List of Figures viii 
List of Tables x 
List of Symbols and Abbreviations xi 
CHAPTER 1 
Introduction and Review of System Reliability 
1.1 The Need for a Reliable Controller 1 
1.1.1 Present Mechanical Control 2 
1.1.2 Future Micro-Electronic Control 4 
1.2 Source of Failures 7 
1.3 Methods of Increasing Reliability 9 
1.3.1 Reducing Failures due to Design Errors 10 
1.3.2 Reducing Failures due to Component Malfunctions 12 
1.3.3 Reducing Failures due to Environmental Effects 15 
1.4 Reliability Improvements Obtained 19 
1.5 Importance of Error Detection 21 
1.6 Possible Dangers of Adding Redundancy 24 
1.7 Requirements for Different Applications 26 
1.8 Contents of the Thesis 28 
CHAPTER 2 
Practical Tests to Determine Transient Failure Mechanisms 
2.1 Introduction 31 
2.2 Test System 32 
2.2.1 Processor Board 32 
2.2.2 Decoding Circuitry 33 
2.2.3 Power Supply Unit 34 
2.2.4 Software 34 
2.2.4.1 SYSTEST 34 
2.2.4.2 RAMTEST 35 
2.3 Practical Tests Performed 36 
2.4 Test Results 38 
2.4.1 Interference to the RAM 39
2.4.2 Interference to the EPROM 41
2.4.3 Interference to the Processor 43 
2.4.4 Interference to the Complete System 44
2.5 Significance of the Results 46 
2.6 Observations of Permanent Failures 46 
2.6.1 Processor Failures 47 
2.6.2 RAM Failure 49 
2.6.3 Crystal Failure 49 
2.7 Summary 50 
CHAPTER 3 
Undeclared Operations of Microprocessors 
3.1 Introduction 52 
3.2 Undeclared Operation Codes 52 
3.3 Operations of the 8085 54 
3.4 Operations of the 6800 56 
3.4.1 Determination of the Undeclared Instructions 56 
3.4.2 Functions of the Undeclared Codes 58 
3.4.3 Cycling Through Memory 59 
3.4.4 Comparison with Published Data 61 
3.5 Operations of the 48-series Microprocessors 62 
3.5.1 Undeclared Memory in the 8035 63 
3.5.2 Determining the Undeclared Instructions 64 
3.5.3 The Effects of Executing the Undeclared Codes 65 
3.5.3.1 Intel 8035/8048 65 
3.5.3.2 NEC 8035/8048 66 
3.5.4 Other Devices in the Series 67 
3.6 Operations of the 68000 67 
3.7 Operations of the 6809 and Z80 69 
3.8 Implications of the Undeclared Operations on Reliability 70 
3.8.1 Significance for Watchdog Design 70 
3.8.2 Powering down to Enable Recovery 71 
3.8.3 Use of Non-Maskable Interrupts 72 
3.8.4 The Most Important Undeclared Operations 72 
3.9 Summary 72 
CHAPTER 4 
Erroneous Execution in Data Areas 
4.1 Introduction 74 
4.1.1 Random Jump Within the Memory Map 75
4.2 Analysis of Execution 75 
4.2.1 Response of Different Processors 77 
4.2.2 Results from the Analysis 79 
4.3 Transfer from the Data Area 79 
4.3.1 Halt Instructions 79 
4.3.2 Restart Instructions 80 
4.3.3 Return Instructions 81 
4.3.4 Unspecified Jumps 81 
4.4 Modification to the Analysis 82 
4.5 Improvements in Recovery 83 
4.6 Simulation of Execution in Data Areas 83 
4.7 Optimum Seeding of Data 84 
4.7.1 Data Structures for the 8085 84 
4.7.2 Data Structures for the 6800 85 
4.7.3 Data Structures for the 8048 85 
4.7.4 Data Structures for the 68000 87 
4.8 The Effect of Data Block Size on Recovery 87 
4.9 Summary 88 
CHAPTER 5 
Erroneous Execution in Program Areas 
5.1 Introduction 89
5.2 Detailed Analysis 89 
5.2.1 Comparison Between Instruction Sets 92 
5.2.2 Comparison Between Actual Programs 93 
5.3 Simplified Analysis 94 
5.4 Comparison Between the Detailed and Simplified Analyses 96 
5.5 Verification of Results 97 
5.6 Improvements in Recovery 97 
5.6.1 Low Level Detection 98 
5.6.2 High Level Detection 99 
5.7 Summary 100 
CHAPTER 6 
Erroneous Execution in Unused and Input/Output Areas 
6.1 Introduction 101 
6.2 Execution in Unused Areas 101 
6.2.1 Unpopulated Memory Areas 102 
6.2.2 Unpopulated Areas of the 8085 103 
6.2.3 Unpopulated Areas of the 8048 105 
6.2.4 Unpopulated Areas of the 6800 and 68000 108 
6.3 Execution in Memory Mapped I/O 108 
6.3.1 Execution of Input Data by the 8048 110 
6.4 Summary 111 
CHAPTER 7 
Flow of Execution Between Different Memory Areas 
7.1 Introduction 112 
7.2 Method of Analysis 112 
7.3 Initial Error 113 
7.4 Transfer from Different Memory Areas 113 
7.5 Execution of an Infinite Loop 115 
7.5.1 Loops in Data Areas 115 
7.5.2 Loops in Unused Areas 116 
7.5.3 Loops in Input Areas 117 
7.6 The Expected Number of Instructions Executed 118 
7.7 The Effects of Memory Map Usage on Erroneous Execution 119 
7.7.1 Memory Maps of the 8085 120 
7.7.1.1 Fault Tolerant Program Area 121 
7.7.1.2 Fault Tolerant Data Area 122 
7.7.1.3 Fault Tolerant Unused Areas 123 
7.7.2 Memory Maps of the 6800 123 
7.7.3 Memory Maps of the 68000 125 
7.7.4 Memory Maps of the 8048 126 
7.8 Number of Erroneous Instructions Executed 127 
7.9 Probability of Data Corruption 128 
7.10 Summary 128 
CHAPTER 8 
Selection of Error Detection Mechanisms 
8.1 Introduction 131 
8.2 Specific System Considered 131 
8.3 The Effects of Adding Error Detection Mechanisms 132 
8.3.1 The Non-Fault Tolerant System 132 
8.3.2 Removal of Input Areas from the Memory Map 133 
8.3.3 Addition of a Recovery Routine 134 
8.3.4 Forcing Restart Instructions into the Unused Areas 135 
8.3.5 Modifying the Program and Data Areas 135 
8.3.6 Detection Within the Software 136 
8.4 Watchdog Timers 137 
8.5 Other Hardware Implemented Detection Mechanisms 139 
8.5.1 Wait State Recognition 139 
8.5.2 Illegal Instruction Fetches 139
8.5.3 Detection of a Write Outside RAM Areas 140 
8.5.4 Detection of Undeclared or Unused Instructions 141
8.5.5 Voltage Level Detection 141 
8.6 Choice of Mechanisms for General Systems 143 
8.7 Summary 144 
CHAPTER 9 
Development of a Facility to Test Redundant Systems 
9.1 Introduction 145 
9.2 Fault Injection 146 
9.3 Generation of Interrupts 148 
9.4 Memory Boundary on Test Program 149 
9.5 Software Design 152 
9.6 Initial Results 154 
9.7 Possible Developments 156 
9.8 Summary 157 
CHAPTER 10 
Conclusions 
10.1 Introduction 158 
10.2 Practical Tests to Determine Failure Mechanisms 158 
10.3 Undeclared Operations in Microprocessors 159 
10.4 Execution Following an Erroneous Jump 160 
10.5 Recovery from Erroneous Execution 162 
10.6 Choice of Recovery Mechanisms 163 
10.7 Summary 164 
References 166 
Figures 179 
Tables 211 
APPENDICES 
A1 Software to Test the Effects of Executing Undeclared 222 
Operation Codes 
A2 The Effects of Executing the Undeclared Operation Codes of 226 
the 8035/8048 
A3 Instruction Set Parameters 236 
A4 Equations for Transfers within a Program Area 244 
A5 Results of Execution in Unpopulated Memory Areas 248 
A6 Software for the Fault Simulation Test Facility 250 
List of Figures 
Figure Page No. 
1.1 Typical Diaphragm Operated Regulator 179 
1.2 Simple Microprocessor Control Arrangement 180 
2.1 Block Diagram of the 8085 Test System 181 
2.2 Layout of the Components 182 
2.3 Logic Diagram of the Memory Decoding Circuitry 183 
2.4 Circuit Diagram of the Test Power Supply Unit 184 
3.1 Block Diagram of the 8035/8048 Test System 185 
3.2 Full Instruction Set for the 8048 Manufactured by Intel 186 
3.3 Full Instruction Set for the 8048 Manufactured by NEC 187 
4.1 Erroneous Execution in Data Areas for Various Processors 188 
4.2 Flow of Execution in Random Data 190 
4.3 Recovery Improvements Obtained by Seeding Data Areas 191
4.4 Average Number of Instructions Executed with Seeded Data 191 
Areas 
5.1 Erroneous Jump into a Program Area 192 
5.2 Flow of Erroneous Execution in Program Areas 192 
5.3 Erroneous Execution in Program Areas for Various Processors 193 
5.4 Simplified Flow of Execution in Program Areas 194 
5.5 Erroneous Execution in Program Areas of the 68000 195 
6.1 Common Memory Arrangements for the 8048 195 
7.1 Flow of Execution Between Different Memory Areas 196 
7.2 The Effects of Adding Fault Tolerance to the Program Areas 197 
of the 8085 
7.3 The Effects of Adding Fault Tolerance to the Data Areas 198 
of the 8085 
7.4 The Effects of Adding Fault Tolerance to the Unused Memory 199 
Areas of the 8085 
7.5 The Effects on the Average Number of Instructions Executed 200 
by Adding Fault Tolerance to the 8085 
7.6 The Effects of Adding Fault Tolerance to the Program Areas 201 
of the 6800 
7.7 The Effects of Adding Fault Tolerance to the Data Areas 202 
of the 6800 
7.8 The Effects of Adding Fault Tolerance to the Unused Memory 203 
Areas of the 6800 
7.9 Probability of Data Corruptions in the 8085 204 
8.1 Memory Map of the Specific System Studied 205 
8.2 Wait State Recognition Circuit 206 
8.3 Circuit to Detect an Illegal Instruction Fetch 206 
8.4 Circuit to Detect a Write into ROM 207 
8.5 Circuit to Detect a Write Outside RAM Areas 207 
9.1 Logic Required to Detect an Operation Code Fetch 208 
9.2 Implementation of Logic on Test System 208 
9.3 Circuit to Restrict Execution to 16 K of Memory 209 
9.4 Software Flow Diagram for the Fault Injecting Test Facility 210 
List of Tables 
Table Page No.
1.1 Reliability Requirements for Different Applications 211 
2.1 Voltage Level at which Errors Occurred in 8155 RAM Chips 212 
2.2 Location and Value of the First Errors Observed 212 
2.3 First Data Corruptions in RAM chip R5 213 
2.4 Length of Interruptions to the Test Supply, in Cycles, 214
Necessary to Cause Corruptions 
3.1 Internal Memory of the 48-Series Microprocessors 214 
4.1 Results of Execution in Random Data 215 
4.2 Comparison Between Different Data Structures 215 
5.1 Comparison Between Processors for Erroneous Execution in 216 
Program Areas 
5.2 Comparison Between Actual Programs 216 
5.3 Results from the Simplified Analysis of Erroneous Execution 217 
in Program Areas 
5.4 Detailed Analysis of Modified Programs 217 
6.1 Probability of Different Outcomes After a Random Jump into 218 
an Unused Memory Area of an 8085 
6.2 Outcomes After a Random Jump into an Unused Memory Area of 218 
an 8085, Assuming Address Range C000 to FFFF is Unused 
6.3 Transfer from Unpopulated Areas of an 8048 219 
6.4 Transfer from Partially Decoded Memory Mapped Input Ports 219 
7.1 Data Corruptions in the 8085 Caused by Erroneous Execution 220 
8.1 Erroneous Execution Under Different System Arrangements 221 
List of Symbols and Abbreviations 
A/C Alternating Current 
ACM Association for Computing Machinery
AFIPS American Federation of Information Processing Societies 
AIAA American Institute of Aeronautics and Astronautics 
CCD Charge-Coupled Device 
CE Chip Enable 
CMOS Complementary Metal Oxide Semiconductor 
D/C Direct Current 
DAG Demand Activated Governing 
DX Op-Code Fetch from the Second Byte of a Double Byte Instruction 
e The Exponential Function 
EA External Access 
EMI Electromagnetic Interference 
EMP Electromagnetic Pulse 
ENSIMAG Ecole Nationale Superieure D'Informatique et de Mathematiques
Appliquees Grenoble 
EPROM Erasable Programmable Read Only Memory 
FMEA Fault Mode Effect Analysis 
FTCS Fault Tolerant Computing Symposium 
HLT Halt Instruction 
I Number of Instruction Cycles or Transfers 
i.c. Integrated Circuit
I/O Input/Output 
IEE Institution of Electrical Engineers 
IEEE Institute of Electrical and Electronics Engineers
in. w.g. Inches Water Gauge 
IRQ Interrupt Request 
J Joules 
JMP Jump Instruction to a Non-Specific Location
K Number of Instructions Executed 
K Kilo-Bytes 
kV Kilo-Volts 
L Length of Instructions in Bytes 
ln Natural Logarithm
LSI Large Scale Integration 
LSTTL Low Power Schottky Transistor Transistor Logic 
mA Milli-Amperes
mbar Milli-bars
mJ Milli-Joules
ms Milli-seconds
MHz Mega-Hertz 
mm Millimetres 
mV Milli-Volts 
MTBF Mean Time Between Failures 
N Number of Instructions Executed 
N_D Total Number of Bytes in the Program Area
N_DC Expected Number of Data Bytes Read
N_CJ Number of Conditional Jump Instructions in the Instruction Set
N_DA Actual Number of Data Bytes in the Memory Map
N_DI Number of Double Byte Instructions in the Program Area
N_I Total Number of Instructions in the Program Area
N_J Number of Bytes Interpreted as Jumping Instructions
N_K Number of Execution Sequences of K Instructions
N_LJ Number of Bytes Interpreted as Jumping Instructions of Length L
N_LNJ Number of Bytes Interpreted as Non-Jumping Instructions of Length L
N_NJ Number of Bytes Interpreted as Non-Jumping Instructions
N_PB Number of Program Bytes which Appear in the Memory Map
N_RST Effective Number of Restart Instructions
N_ Total Number of Execution Sequences
N_ Total Number of Possible Op-Codes
N_TB Total Number of Bytes in the Memory Map
N_TI Number of Triple Byte Instructions in the Program Area
NASA National Aeronautics and Space Administration
NATO North Atlantic Treaty Organisation
N_BAV Average or Expected Number of Bytes Read Before a Jump
NEC Nippon Electric Company
N_IAV Average or Expected Number of Instructions Executed Before a Jump
N_IE Expected Total Number of Instructions Executed
N_IL Upper Limit on the Number of Instructions Executed
NMI Non-Maskable Interrupt
NMOS N-Channel Metal Oxide Semiconductor
NMR N-Modular Redundancy
NOP No Operation
N_RAV Average Number of Instructions Executed Before Resuming Valid Instruction Fetches
ns Nano-Second
op-code Operation Code
P_CJ Probability that a Conditional Instruction will cause a Jump
P_D Probability of Entering a Data Area
P_DX Probability of Entering the Operand Field of a Double Byte Instruction
P_ A Proportion of the Total Errors
P_I Probability of Entering an Area of Input Data
P_II Probability of Entering the Input Area Twice
P_IIL Probability of a Loop after Entering the Input Area Twice
P_J Probability of Interpreting a Jump Instruction
P_LD Probability of Forming a Loop in the Data Area
P_LI Probability of Forming a Loop in the Input Area
P_LU Probability of Forming a Loop in the Unused Area
P_NC Probability that Particular Data is Not Corrupted
P_NI Probability of Executing a Given Number of Instructions or More
P_NJ Probability of Interpreting a Non-Jumping Instruction
P_P Probability of Entering a Program Area
P_R Probability of Resuming Valid Instruction Fetches
P_RST Probability of Interpreting a Restart Instruction
P_TXX Probability of Entering the Second Byte of a Triple Byte Instruction
P_TXX Probability of Entering the Third Byte of a Triple Byte Instruction
P_U Probability of Entering an Unused Memory Area
P_UU Probability of Entering the Unused Area Twice
P_UUL Probability of a Loop after Entering the Unused Area Twice
P_UXi Probability of a Transfer from an Unused Area to Memory Area Xi
P_X Probability of Entering an Operand Field in the Program Area
P_Xf Probability of Reaching a Particular Final State
P_XiU Probability of Transferring from Memory Area Xi to an Unused Area
P_XjXi Probability of Transferring from Memory Area Xj to Memory Area Xi
P_XRST Probability of Interpreting a Restart in an Operand Field
pF Pico-Farads
PROM Programmable Read Only Memory
RAM Random Access Memory 
RC Resistor and Capacitor 
RET Return Instruction 
RFI Radio Frequency Interference 
ROM Read Only Memory 
RST Restart Instruction 
SDK System Design Kit 
SEC/DED Single Error Correction/Double Error Detection 
SERC Science and Engineering Research Council 
SPC Jump Instruction to a Specific Location 
SSI Small Scale Integration 
TMR Triple Modular Redundancy 
TXX Op-Code Fetch from the Second Byte of a Triple Byte Instruction 
TXX Op-Code Fetch from the Third Byte of a Triple Byte Instruction 
uF Micro-Farad 
UPS Uninterruptible Power Supply 
us Micro-Second 
US United States 
V Volts
Xf Represents a Particular Final State 
Xi Represents a Particular Memory Area 
Xj Represents Each of the Four Different Memory Areas 
No material contained in this thesis has previously been submitted
for a degree in this or any other university. 
CHAPTER 1 
Introduction and Review of System Reliability 
1.1 The Need for a Reliable Controller 
With the conversion from town gas to natural gas, the supply to the
consumer has changed from a large number of isolated networks to one fully 
integrated distribution system. This system is connected to the supplies 
of natural gas in the North and Irish Seas, and transports it around the
country in large diameter pipes (up to 1050mm), at high pressures (up to 70 
bar). The pressure is reduced in stages into smaller diameter pipes until 
it is at a safe level to supply to the consumer. 
An analogy can be drawn with the National Grid for electricity supply,
where voltage corresponds to pressure, and current corresponds to flow 
rate. However, unlike the electricity system, gas can and must be stored 
within the network. This is necessary because the supply is obtained at a 
constant rate from the gas fields, whereas the demand by the consumer 
varies both throughout the day and throughout the year. Also any peak 
demand within a particular area must be supplied locally due to the time 
delay in transporting gas through the system. Therefore the network 
presents a complex arrangement requiring sophisticated control. 
At the high pressure end of the system large volumes of gas are being 
handled, and small increases in efficiency result in significant financial
savings. Also failure at this level is likely to affect a large number of 
consumers, and this justifies high expenditure on control and safety 
equipment. As the pressures reduce, the quantities and hence the value of 
gas being handled becomes less, and high expenditure on control equipment 
is not justified. A low cost controller is therefore required and this is 
the aim of the current research. 
At the low pressure end of the system, accurate pressure control is 
required for several reasons. Much of this part of the network is 
constructed from short sections of cast iron pipes laid many years ago, and
it is estimated that there are almost 60 million joints. Small leaks can 
occur but rarely create a safety problem. However, taken collectively they 
represent a substantial loss of revenue. Therefore methods which reduce 
this leakage can have both financial and safety benefits. An extensive 
programme to replace old sections of the network has been in progress for 
several years and has cost hundreds of millions of pounds. Meanwhile a 
reduction of pressure within the system provides significant benefits by 
reducing leakage, and also repair and maintenance costs. It can also 
postpone eventual reinforcement of the network necessary to cater for 
increased demand, thus saving revenue equivalent to borrowed capital 
interest. 
Obviously the pressure cannot be reduced below a certain level or the 
gas would not reach the consumer. This would lead to the possibility of 
air entering the pipework producing a potentially hazardous condition. Low 
pressures can also affect the efficiency of some appliances. For these 
reasons a statutory minimum pressure has been set. This is 5 in. w.g.
(inches water gauge) which is equivalent to the height of a column of water 
that the pressure can support, and is approximately 12.3 mbar. Clearly the 
aim is to supply the consumer with the minimum acceptable pressure 
throughout the daily load cycle.
1.1.1 Present Mechanical Control 
Traditionally, an entirely mechanical approach has been adopted for 
the control of the low pressure distribution system. The use of diaphragm 
operated gas regulators, as shown in figure 1.1, is widespread, with
approximately 17,000 installed throughout the country. They aim to reduce
the pressure to a steady value independent of the flow rate. This is 
achieved by feeding back the down-stream pressure into a chamber under the 
diaphragm. The force on the diaphragm is balanced by a spring or a series 
of weights. Any imbalance causes the valve to open or close and has the 
effect of increasing or reducing the down-stream pressure. By adjusting 
the loading, different output pressures can be maintained. However, this 
arrangement does not give perfect pressure control. The pressure tends to 
fall as the flow rate increases, and this is known as the 'droop' 
characteristic of the regulator. 
So far only the pressure at the outlet of the regulator has been 
considered. However, the consumer may be over a mile away from the outlet 
and therefore, by simple fluid mechanics theory, a pressure drop will exist 
along the pipework and will be proportional to the square of the flow rate. 
Consequently, with the simple regulator described above, it is necessary to 
set the output pressure at a higher level to guarantee that the consumer 
will be supplied with at least the minimum statutory pressure, at times of 
peak demand. Clearly this will result in a pressure well above the 
statutory minimum at other times. This is referred to as the 'over
pressure' of the system, and has a maximum value of the sum of the 
regulator droop, the pipework losses and a safety margin, as the flow rate 
reduces. The safety margin is included to allow back-up equipment to
intervene if an abnormally low pressure is detected. Obviously the aim is 
to reduce the 'over pressure' to a minimum. 
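The over-pressure argument can be illustrated numerically. The sketch below is an illustration only: the loss coefficient, droop coefficient, safety margin and peak flow are hypothetical values rather than figures from this work, the flow units are arbitrary, and the droop is assumed to be linear in flow purely for simplicity.

```python
# Illustrative only: all coefficients used with these functions are hypothetical.
STATUTORY_MIN_MBAR = 12.3  # statutory minimum pressure quoted in the text


def pipe_loss(flow, k_pipe):
    """Pressure drop along the pipework, proportional to the square of the flow."""
    return k_pipe * flow ** 2


def droop(flow, k_droop):
    """Fall in regulator outlet pressure with flow (assumed linear for simplicity)."""
    return k_droop * flow


def fixed_setting(peak_flow, k_pipe, k_droop, margin):
    """Zero-flow outlet setting that still gives minimum plus margin at peak demand."""
    return (STATUTORY_MIN_MBAR + margin
            + droop(peak_flow, k_droop) + pipe_loss(peak_flow, k_pipe))


def over_pressure(flow, setting, k_pipe, k_droop):
    """Consumer pressure in excess of the statutory minimum at a given flow."""
    consumer = setting - droop(flow, k_droop) - pipe_loss(flow, k_pipe)
    return consumer - STATUTORY_MIN_MBAR


if __name__ == "__main__":
    k_pipe, k_droop, margin, peak = 0.002, 0.02, 2.0, 100.0
    setting = fixed_setting(peak, k_pipe, k_droop, margin)
    for flow in (0.0, 50.0, 100.0):
        print(flow, round(over_pressure(flow, setting, k_pipe, k_droop), 2))
```

With a fixed setting, the over-pressure at peak flow is just the safety margin, while at zero flow it rises to the sum of the droop, the pipework loss and the margin, which is the maximum value described above.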
The above example considers only one regulator and one consumer. In
reality the situation is far more complicated. Low pressure
networks can be fed by more than one regulator and supply several thousands 
of consumers. Due to varying demands from the system, the low pressure 
point may not always be at the same physical location. This makes 
effective control even more difficult. 
In the past, methods have been devised to provide automatic changes in 
the set point of a regulator to try and follow the pattern of demand. 
However, these do not operate directly on the district pressure, but on 
other parameters which are associated with demand, such as the time of day 
or the ambient temperature. Neither of these parameters is strongly
linked with demand, but control based on them has provided some savings. A
third approach has used the flow through the regulator to adjust the set 
point, and has proved most successful. This is commonly known as demand
activated governing (DAG). 
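The principle of flow-based set point adjustment can be sketched as follows. The compensation law used here (minimum pressure plus a margin plus an estimate of the pipework loss) is an assumption made for illustration, as are the coefficient values; the regulator is also assumed to hold its set point exactly, so droop is ignored.

```python
# Hypothetical compensation law for illustration; coefficients are assumptions.
STATUTORY_MIN_MBAR = 12.3  # statutory minimum pressure quoted in the text


def dag_set_point(flow, k_pipe, margin):
    """Outlet set point raised with measured flow to offset the estimated pipework loss."""
    return STATUTORY_MIN_MBAR + margin + k_pipe * flow ** 2


def remote_pressure(flow, set_point, k_pipe):
    """Pressure at the remote point, assuming a loss proportional to flow squared."""
    return set_point - k_pipe * flow ** 2


if __name__ == "__main__":
    k_pipe, margin = 0.002, 2.0
    for flow in (0.0, 50.0, 100.0):
        sp = dag_set_point(flow, k_pipe, margin)
        print(flow, round(sp, 2), round(remote_pressure(flow, sp, k_pipe), 2))
```

Under these assumptions the set point is low when demand is low and rises with flow, so the remote pressure stays at the minimum plus the margin instead of climbing as the flow reduces.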
Spearman (98) has reported a number of DAG schemes which have all 
shown significant savings in repair and maintenance costs. They have 
provided DAG by mechanical means but have several disadvantages. At least 
three additional valves and a substantial amount of extra pipework are
required. A complex setting up and commissioning procedure is necessary to 
ensure optimum performance, and this has to be repeated periodically to 
allow for changes in the network or demand. Therefore there is scope for 
further improvements. 
1.1.2 Future Micro-Electronic Control 
To overcome the problems with mechanically implemented DAG mentioned 
above, and to allow for other developments, it has been proposed that 
micro-electronic techniques could be applied to the control of gas 
pressure. A simple arrangement for such a system is shown in figure 1.2. 
It contains a microprocessor which reads the remote low pressure point from 
a transducer, and activates the valve to maintain a steady supply. To 
ensure overall system safety and availability, all three parts must be both
reliable and fail-safe.
The valve could be operated by a simple solenoid providing only open 
and closed positions. The pressure would then be controlled by pulse width 
modulation on the supply to the solenoid. Although this is a simple 
solution, it tends to be very unreliable due to the large number of 
operations needed to maintain a steady pressure. Another disadvantage is 
that failures in either of the normal operating positions produce dangerous 
conditions. 
A better solution would be to use a motorised valve. This will be 
more reliable as actuations are only required when the pressure changes, 
resulting in less mechanical wear. However, the response under fault 
conditions will be poor. 
The arrangement which has been used in initial trials with digital 
control utilises an indirect approach. The main pressure reduction
regulator is retained in the traditional configuration, but with the set 
point controlled by the microprocessor. This provides a much better 
solution as failure of the microprocessor system causes control to revert 
to mechanical pressure regulation. 
To adjust the set point a method of increasing and decreasing the 
spring loading within the regulator is required. Two prototype 
arrangements have been built. The first uses a stepper motor to adjust the 
length of the spring and hence the loading. The second uses two solenoid 
valves to feed up-stream or down-stream pressure under a second diaphragm 
which acts on the spring to adjust the loading. The solenoid system is 
preferred as it can be arranged to 'fail safe' on power failure, by setting
the regulator to its maximum set point. Burrow (20) states that generally 
the 'fail safe' approach has been neglected. It is much cheaper to 
implement than 'fail operational' designs, and is clearly acceptable in 
this application as failure will only result in a reversion to a high 
pressure setting within the network. The 'fail safe' approach still 
ensures that no area drops below the statutory minimum pressure. The 
stepper motor, however, will stay at its current position during a power 
failure. As mentioned above, this reverts to mechanical control, but, if
demand increases, the low pressure point will fall below the statutory 
minimum. 
Initial trials have been carried out with both arrangements. As 
mentioned previously, the ideal solution is to monitor the remote low 
pressure point and relay the information back to the controller, and this
requires some sort of telemetry link. The use of hard-wired links is 
expensive, and therefore other methods of transmitting the data are being 
investigated. However, a system operating in the United States, described 
by Reese (84), uses telemetry and has shown that the cost of the equipment 
can be recovered within the first year, due to the reduction in lost gas 
alone. 
These initial trials have shown that micro-electronic control of the 
gas network is both feasible and economic. Another area to which it could 
be applied is the control of storage facilities. As indicated previously, 
it is necessary to store gas within the network and a number of 
arrangements have been developed such as gas holders, liquefaction plants 
and underground caverns. Recent interest has been directed towards the use 
of the medium pressure part of the network as a means of storage, which can
be achieved by increasing the pressure and thus compressing the gas. This
is known as 'line-pack' and is possible in this particular part of the
network because the pipework is relatively new and does not suffer from 
leakage. 
Due to the very stringent safety requirements, it was felt that 
further work should be carried out to investigate methods of increasing the 
reliability of these control systems. British Gas has had long term 
experience with mechanical regulators, and, as a result, has in-depth
knowledge and expertise on their operation. This has led to the 
development of very reliable equipment. With regard to the control of the 
low pressure network, micro-electronics has only recently been used by the 
Corporation. Therefore, this work is aimed at investigating methods of 
increasing the reliability of the micro-electronic parts of the systems. 
1.2 Source of Failures 
All equipment can fail, and usually does so in a variety of different 
ways. In a complex electronic system the cause of failure can be due to 
design errors, component failures or to environmental effects. In
microprocessor based systems, design errors can occur in both the hardware and
software, and can be introduced at the specification, implementation or 
construction phases of a project. Shooman (92) gives an example of a data 
acquisition system where, over a nine month period, nearly half the 
failures were due to software errors. At the specification stage errors 
can be made due to an insufficient knowledge of the system to be 
controlled, or by an incomplete description of the required response under 
all operating conditions. The importance of these errors is emphasised by 
Soi and Gopal (97) who suggest that nearly 60% occur at this stage in the 
software. At the implementation stage, the choice of the wrong type of 
components in hardware, or the wrong algorithm in software, can lead to 
failure. Finally, errors can be made during the construction of hardware 
or the coding of software. 
Components can fail due to a number of different failure mechanisms, 
and generally they follow the familiar 'bath-tub' curve. It shows a high 
failure rate at the beginning of their life due to manufacturing defects. 
This is followed by a period of constant failure rate, due to random 
effects, which is normally considered to be the useful life of the 
component. After this period the failure rate increases again due to wear 
out. A description of the types of failures observed in electronic 
components is given by Doyle (31), and a study of microprocessor devices is
presented by Hnatek (47) who describes a number of physical failure 
mechanisms and how they can be detected. 
The correct operation of electrical and electronic systems can be 
disturbed by environmental conditions. In analogue devices it can result 
in noisy signals, but in digital equipment severe disruption of the 
processing sequence can occur. Sources of disruption include radio 
frequency interference (RFI), electromagnetic interference (EMI), radiation
effects, static discharges and power supply variations. 
Whallen et al (113) have shown that RFI can disrupt digital circuits
by changing their state. Sources of EMI in high voltage substations are
listed by Pellegrini et al (79), and most are due to various forms of
switching. The effects of lightning are also considered. May and Woods 
(64) highlight the problem of alpha particle interaction originating from 
packaging material. This has become a problem with the development of 
higher density chips, and affects most devices. General radiation effects 
on semiconductors have been investigated by Sexton et al (90), and they 
have shown that device parameters drift with dosage. 
Faults can be either permanent or temporary. Permanent faults occur 
as a result of catastrophic failure of a component or subsystem, and also 
from inherent design errors. Some sources of temporary faults are 
described by Ng and Avizienis (70) and include component drifts around the 
limits of their specifications, and environmental factors. However, the 
errors produced by permanent faults may appear temporary. For example, a 
single node stuck at zero can only produce an error when it should be set 
at one, and this is illustrated by Gunther and Carter (39). Also a part of
the circuit which is infrequently used may not cause any errors until it is 
exercised. Goldberg (37) indicates that some design faults, such as timing 
problems, can appear to be induced environmentally, and may be difficult to 
distinguish. For these reasons faults can remain undetected for a 
considerable length of time.
McConnel et al (61) draw a distinction between intermittent errors and 
transient errors. Intermittents occur as a result of an underlying 
permanent fault and will periodically reappear, whereas a particular 
transient will occur only once. Ball and Hardie (5) indicate from
practical experience that 90% of field failures are intermittent and are 
particularly difficult to isolate. 
Most reliability work in the past has considered only stuck-at faults.
More recently bridging faults have been considered where electrical contact 
is made between adjacent tracks, and these are described by Kodandapani and 
Pradham (52). Toschi and Watanbe (103) state that soft fails in memories 
can also be due to data patterns, timing and read/write sequencing. All 
these produce intermittent errors and are particularly difficult to 
identify. 
1.3 Methods of Increasing Reliability 
There are two complementary approaches available to increase
reliability and these are described by Avizienis (4). The first attempts 
to eliminate all sources of failure and is known as the fault intolerance 
approach. The second recognises that failures will occur, and attempts to 
mask their effects by the use of redundancy; this is known as fault
tolerance. To achieve very high reliability a combination of both these 
approaches is necessary, and can be applied to each of the three sources of 
failure described above. 
1.3.1 Reducing Failures due to Design Errors 
Errors in hardware design have been reduced to a very low level by the 
implementation of rigorous procedures at all stages. Complex computer 
programs are used to analyse and simulate the hardware to check for a 
number of faults. Hazard and race conditions in logic circuits can be 
detected, interconnections can be checked for the correct routing, and 
loading on each node can be analysed to ensure, for example, that maximum 
fan-out is not exceeded. Once the hardware is constructed, thorough 
testing is carried out to verify correct operation. 
Fault free design is more easily achieved due to recently developed 
integrated circuits which have themselves been designed for simple 
interconnection. This reduces the amount of work necessary by the system 
designer, but increases the effort required by the chip designer. Design 
errors within large scale integrated circuits (LSI) are more likely to 
occur due to the increased complexity of these devices. This problem is 
highlighted by Sequin (89). 
An advantage with microprocessor based hardware is that the basic 
circuit can be used for many applications. This reduces the possibility of 
introducing errors into new projects. Software, however, has been treated 
in a different manner in the past, and remains a serious source of failure. 
This is due mainly to the unlimited way in which software can be arranged
and to the fact that, in almost all cases, new code is written for each
application.
Recently much more emphasis has been placed on software reliability. 
This is due to the increased proportional cost of the software within 
systems, which results from increased complexity and reduced hardware 
costs. Greenspan and McGowan (38) state that 70% of US Air Force computing 
expenditure was for software in 1972, and this is expected to rise to 90% 
by 1985. Both fault intolerant and fault tolerant approaches have been 
investigated to alleviate this problem. The advantages of structured 
programming are widely recognised, making programs easier to read and 
understand, and thus simplifying the process of identifying errors. It 
tends to force the programmer to divide the problem into a series of 
modules. Nelson (66) reports on analyses which have shown that the error 
rate increases with the routine size. This is because smaller modules are 
far more easy to understand and test, and therefore methods which enforce 
the use of smaller modules will increase reliability. 
As well as the language itself, the environment under which programs 
are developed is also important in enabling efficient testing and isolation 
of errors. For these reasons the United States Department of Defense has 
sponsored an extensive project to design a new language (Ada), and its 
associated development environment. Programming in Ada is more difficult
than in other languages due to tight restrictions on syntax and variable
types. But it facilitates the early detection of errors at both compile 
and run time, reducing the overall development time. It also makes the 
code easier to understand and modify. This is particularly important as 
Dunn and Ullman (32) have shown that, in badly written packages, more errors
can be introduced than are removed at the debugging stage, making the whole
system less reliable. 
The fault tolerant approach recognises that bugs will remain in the 
software, and two methods of counteracting their effects have been 
proposed. Randell (83) suggests the use of recovery blocks. In this 
method an acceptance test is executed after each program module, and, if 
the results fail the test, an alternate algorithm is used. This process 
can be repeated until an acceptable set of results is obtained, or until 
all the alternate algorithms have been tried. In the latter case a 
different form of recovery must then be used. A practical example of 
recovery blocks in action is given by Anderson and Kerr (1). 
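The recovery block structure can be sketched as follows. This is only a
minimal illustration in Python, not the scheme of Randell (83) in full: the
function names, the integer square root example and the deliberately faulty
primary routine are all assumptions introduced here. Because the alternates
are written as pure functions, no system state needs to be restored between
attempts, which a practical implementation would have to do.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class RecoveryBlockFailed(Exception):
    """Raised when every alternate has been tried and rejected."""


def recovery_block(
    alternates: Sequence[Callable[[T], R]],
    acceptance_test: Callable[[T, R], bool],
    argument: T,
) -> R:
    """Try each alternate in turn until one passes the acceptance test."""
    for alternate in alternates:
        try:
            result = alternate(argument)
        except Exception:
            # A crash inside an alternate is treated like a failed test.
            continue
        if acceptance_test(argument, result):
            return result
    # All alternates exhausted: a different form of recovery is needed.
    raise RecoveryBlockFailed("no alternate produced an acceptable result")


# Hypothetical module: integer square root with a faulty primary routine.
def faulty_isqrt(n: int) -> int:
    return n // 2  # wrong for most inputs


def simple_isqrt(n: int) -> int:
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    return r


def isqrt_acceptable(n: int, r: int) -> bool:
    # Acceptance test: r is the largest integer whose square is <= n.
    return r >= 0 and r * r <= n < (r + 1) * (r + 1)


if __name__ == "__main__":
    print(recovery_block([faulty_isqrt, simple_isqrt], isqrt_acceptable, 50))
```

Note that the acceptance test here is as complex as the computation it
guards, which illustrates why such tests can themselves be a source of
error.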
The other approach is called N-version programming as described by 
Chudleigh (23). In this case all versions of a particular program module 
are executed and a majority vote is taken on all the results. 
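A minimal sketch of N-version execution with a software voter is given
below. The three versions and their names are hypothetical, and a strict
majority of all versions is assumed to be required; other voting rules are
possible.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def n_version_vote(
    versions: Sequence[Callable[[int], Hashable]], argument: int
) -> Hashable:
    """Run every version and return the strict-majority result."""
    results = []
    for version in versions:
        try:
            results.append(version(argument))
        except Exception:
            # A version that crashes simply casts no vote.
            pass
    if results:
        value, votes = Counter(results).most_common(1)[0]
        if votes > len(versions) // 2:
            return value
    raise ValueError("no majority among versions")


# Three hypothetical, independently written versions of "sum of 1..n".
def version_loop(n: int) -> int:
    return sum(range(1, n + 1))


def version_formula(n: int) -> int:
    return n * (n + 1) // 2


def version_buggy(n: int) -> int:
    return n * n // 2  # design error, outvoted by the other two
```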
Both methods have their own advantages and disadvantages. For 
example, the recovery block procedure operates much faster in the absence
of errors, but, to enable accurate error detection, complex acceptance
tests are sometimes necessary. These can themselves be a source of error,
as can the voting software in N-version programming. However, in the
latter case the critical software is much smaller and will be less
susceptible to errors. A comparison between the two techniques is given by 
Wei (109). He concludes that N-version programming is better than the use 
of recovery blocks because of problems with acceptance tests. 
1.3.2 Reducing Failures due to Component Malfunctions 
Unlike design errors which can, theoretically, be eliminated from a
system, component failures are always possible. In the past both fault 
tolerant and fault intolerant approaches have been favoured at different 
times. In the early days of digital computers thousands of valves were 
used in each machine, and reliability was poor due to the high failure
rates of the components. Redundancy was necessary to improve performance.
With the advent of the transistor and the subsequent development of the
integrated circuit, less emphasis has been placed on redundancy due to the
vast increase in the reliability of the components. Examples of fault
tolerance in early computers are given by Carter and Bouricius (21).
In more recent years some computer applications have required even
higher levels of reliability. These include cases where human life is
involved or large financial losses are incurred on failure, such as manned
and unmanned space flight, hospital life support equipment and aircraft
control. Attempts have been made to increase still further the reliability
of components used in these applications. Significant improvements can be
achieved by screening out weak devices, and Pappu et al (75) describe
methods of detecting them. Burn-in is a popular technique whereby
equipment is operated at elevated temperatures before actual use. Even
with these improvements it has again been necessary to use the fault
tolerant approach.
A popular arrangement has been the use of triple modular redundancy 
(TMR) which was first proposed by Von Neumann in 1956 (105). TMR consists
of three identical modules, each performing the same function, which are
connected to a majority voting circuit. If one module fails, the voting 
circuit masks any errors by outputting the values from the other two. 
Clearly this requires at least three times as much hardware as a simplex 
system. 
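The action of the voting circuit can be imitated in software by a bitwise
two-out-of-three majority function. The sketch below is illustrative only;
the function names are assumptions, and the module outputs are taken to be
unsigned integers of equal width.

```python
from typing import List


def majority_vote_bits(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority of three module outputs."""
    return (a & b) | (a & c) | (b & c)


def disagreeing_modules(a: int, b: int, c: int) -> List[int]:
    """Indices (0, 1, 2) of modules whose output differs from the vote."""
    voted = majority_vote_bits(a, b, c)
    return [i for i, out in enumerate((a, b, c)) if out != voted]


if __name__ == "__main__":
    # Module 2 is faulty; its errors are masked by the other two.
    print(bin(majority_vote_bits(0b1010, 0b1010, 0b0110)))
    print(disagreeing_modules(0b1010, 0b1010, 0b0110))
```

Because each bit is voted independently, two modules failing in different
bit positions can still yield the correct word, whereas two modules failing
in the same bit position defeat the voter.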
In many cases the extra cost could not be justified, and, in these
cases, dual systems have been used. They can be configured in a number of
ways. In a cold standby arrangement, a second module is maintained in an
inactive state and requires initialisation before use. Lonn et al (58)
describe a hot standby system where the second module continually monitors
the process, ready to take immediate control. In many control applications
the switch over to the standby system is performed manually after the
activation of an alarm. In a duplex arrangement both modules perform
identical operations and comparisons are made between their outputs. This
provides simple error detection but does not readily indicate which module
is in error.
As the cost of hardware has fallen and the requirements of reliability 
have increased, more complex arrangements have been developed. N-modular
redundancy (NMR), where N represents the number of modules, has been
proposed in cases where the reliability of TMR is considered insufficient.
Examples using four channels have been constructed for the F-8 fighter
aircraft described by Bumby (19), and for NASA's space shuttle described by
Gelderloos and Wilson (36). In both cases the requirement is for safe
operation in the presence of two failures.
An important property of these systems is that reliability is 
drastically reduced after each failure. For example, TMR is at least twice
as unreliable as a simplex system after a single failure, and therefore it
is important to repair failed modules quickly. In closed systems, such as
unmanned spacecraft, manual repair is not possible. NMR can be used to
survive several failures by increasing the number of modules.
Alternatively a number of standby spares can be provided so that the system
can reconfigure itself in order to replace a failed component or
subsystem with a good one. This effectively provides automatic repair.
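The loss of reliability after a first failure can be illustrated with a
simple model. The sketch below assumes a constant module failure rate,
independent module failures and a perfect voter; none of these assumptions
is taken from the references above. Under them, a TMR system that has lost
one module survives only while both remaining modules work, so its failure
rate is twice that of a single module.

```python
import math


def simplex_reliability(rate: float, t: float) -> float:
    """Reliability of one module with constant failure rate."""
    return math.exp(-rate * t)


def tmr_reliability(rate: float, t: float) -> float:
    """All three modules good, or exactly two good (perfect voter assumed)."""
    r = simplex_reliability(rate, t)
    return 3 * r**2 - 2 * r**3


def degraded_tmr_reliability(rate: float, t: float) -> float:
    """TMR after one module has failed: both survivors must keep working."""
    r = simplex_reliability(rate, t)
    return r**2


if __name__ == "__main__":
    rate, t = 1e-4, 1000.0  # hypothetical failures per hour, mission hours
    print(simplex_reliability(rate, t))
    print(tmr_reliability(rate, t))
    print(degraded_tmr_reliability(rate, t))
```

Any unreliability in the voter, which is ignored here, makes the degraded
system worse still.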
Wensley (110) proposes the use of a number of loosely connected units, 
with the fault tolerance implemented by software. In this way critical 
tasks can be executed on several units with voting carried out in the
program. This arrangement allows dynamic reconfiguration to eliminate 
faulty units after they have been identified. This sort of arrangement is 
commonly used in telephone switching equipment, and has also been proposed 
for aircraft applications by Hamill and Phillips (40). 
Hybrid systems utilising a combination of the above architectures to 
exploit their individual advantages are becoming more popular. For 
example, Hopkins (48) describes a processing concept for space vehicles
which uses duplex, TMR and standby sparing.
1.3.3 Reducing Failures due to Environmental Effects
Significant improvements in reliability can be obtained by reducing 
the effects of environmental phenomena. The fault intolerant approach does 
this by providing a stable local environment for the equipment. Basu (9) 
and Williamson (115) give comprehensive details of possible steps for 
reducing the effects of noise, and indicate that the design of the system 
enclosure is of great importance. In the United States the level of EMI 
emitted from digital equipment is restricted. To meet these requirements 
good shielding is necessary, which not only reduces emissions, but also
reduces the susceptibility of the equipment to external EMI.
Boothman (14) describes methods of designing cabinets for optimum
shielding, and suggests the use of metals, metalised coatings on plastics
or conductive plastics. Ideally a continuous unbroken metal enclosure
forming a Faraday cage is preferred in order to eliminate most electrical
interference. However, all systems need to communicate with the outside 
world and most require an external power supply. Therefore apertures in 
the enclosure are inevitable and Boothman shows the importance of both 
their size and location relative to the internal components. Rostek (86) 
suggests a 'Rule of Thumb' of restricting maximum openings to 25mm for each
nanosecond rise time of the digital circuits.
He also emphasises the importance of conducted interference on power
supply and signal lines and suggests the use of comprehensive filtering.
Routing of power and signal cables and the quality of their shielding is
also important, and is discussed by Dick (30). With the development of
fibre optics, data transmission can be made far more secure. Dyer (33)
recommends their use, especially in military equipment, for immunity to
both EMI and the more damaging EMP generated by nuclear explosions.
As well as EMI superimposed on the power supply, brown-outs and black-outs
can occur, where the voltage is reduced or lost completely over a
period of time. In these cases filtering alone is not sufficient. These
problems occur frequently and have led to the development of
uninterruptible power supplies (UPS). A number of arrangements have been
developed for large installations, with the standby power source provided
by batteries or diesel generators. These are described by Sulway (99), and
in these cases an A/C supply is maintained.
In smaller systems batteries alone can directly provide the necessary
D/C levels. This has led to the development of higher capacity miniature
batteries, such as the zinc/air type described by Pytches (82).
Rechargeable batteries can be trickle charged when the external source is
available, ensuring that they are in good condition when required. This
sort of arrangement has been used in the NATO III communication satellites
described by McKinney and Briggs (62). Solar cells provide the external
source to charge several sets of batteries, which are required for peak
demand and to ensure continuous operation during solar eclipses.
Other environmental factors such as mechanical shock and vibration
must also be taken into account, and can usually be suppressed by suitable
damping. Thermal effects are also important. It is widely recognised that
high component temperature leads to an increased failure rate; it can also
cause a general drift in component properties. Therefore, methods which
restrict temperature such as cooling fins or convective fans will produce
benefits. However, the use of fans drawing air in from outside the cabinet
can have detrimental effects, since openings are required to allow for the
passage of air, and these introduce the possibility of increasing the
susceptibility to EMI as described above. It also allows moisture, solid
particles and corrosive substances to enter the enclosure. Filtering can
be used to reduce the possibility of contamination, but in certain cases a
totally sealed unit is preferred.
Shielding is effective in many cases but does not prevent all external
interaction. Ziegler and Lanford (117) have shown that even half a metre
of concrete has little effect on reducing interference to charge-coupled
devices (CCD) from certain types of cosmic rays. However, they do suggest
that the orientation of the devices can be used to reduce the problem.
Shielding is also ineffective against internally generated interference and
alpha particle interaction originating from package material.
In these cases the components themselves can be designed to be less
susceptible to certain disturbances. Brodsky (16) suggests methods of
improving RAMs against alpha particle attack, and Kim et al (51) describe
methods of hardening devices against general forms of radiation. Certain
device technologies are inherently less susceptible to radiation than 
others. Barton et al (8) show that bipolar devices are superior to 
complementary metal oxide semiconductors (CMOS), which in turn are better 
than N-channel devices (NMOS). 
As described above, fault intolerance can be used to reduce the
influence of environmental phenomena. In general, great improvements can
be obtained for a small cost if careful consideration is taken at the
design stage. Additional improvements can be made but usually involve ever
increasing costs. In such cases the fault tolerant approach is
worthwhile.
Environmental disturbances can cause either permanent or temporary
damage to systems. With the precautions taken above, damage will be
reduced and transient effects will predominate. The redundancy techniques
mentioned in the previous section will be effective provided that
simultaneous faults in different channels do not occur. Much work has been
aimed at developing techniques to detect and correct errors in memory
systems. Most techniques rely on error detection and correction codes,
such as those proposed by Hamming (41). Extra bits of information are
added to each of the data words and these can indicate which particular bit
is in error if a fault occurs. Levine and Meyers (57) indicate the number
of check bits required for single error correction and double error
detection (SEC/DED). However, if more than two bits fail they may not be
detected or an erroneous correction may be made. In these cases Walker et
al (108) describe a memory system which is capable of masking off failed
bits to survive multiple faults.
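The principle of SEC/DED coding can be illustrated with the smallest
practical case, a Hamming code on four data bits extended by one overall
parity bit. The sketch below is an assumption-laden illustration rather
than the scheme of any of the references above: the bit layout, function
names and status strings are chosen here for clarity.

```python
from typing import List, Tuple


def encode(data: List[int]) -> List[int]:
    """Encode 4 data bits as the 8-bit word [p0, p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    word = [0] * 8
    word[3], word[5], word[6], word[7] = d1, d2, d3, d4
    word[1] = word[3] ^ word[5] ^ word[7]  # checks positions 1, 3, 5, 7
    word[2] = word[3] ^ word[6] ^ word[7]  # checks positions 2, 3, 6, 7
    word[4] = word[5] ^ word[6] ^ word[7]  # checks positions 4, 5, 6, 7
    word[0] = sum(word[1:]) % 2            # overall parity of positions 1..7
    return word


def decode(word: List[int]) -> Tuple[List[int], str]:
    """Return (data bits, status); status is 'ok', 'corrected' or 'double'."""
    w = list(word)
    syndrome = 0
    for position in range(1, 8):
        if w[position]:
            syndrome ^= position  # XOR of the positions holding a 1
    overall_odd = sum(w) % 2 == 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd:
        # Odd number of errors, assumed to be one: the syndrome gives its
        # position (0 means the overall parity bit itself).
        w[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits in error.
        status = "double"
    return [w[3], w[5], w[6], w[7]], status
```

With three bits in error the overall parity is odd again, so the decoder
treats the word as a single error and may return wrong data marked as
'corrected'. This is the erroneous correction referred to above.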
Time redundancy is a useful means of counteracting transient faults.
This method uses the re-execution of a program segment at a later time, in
the anticipation that transient disturbances will have subsided. A number
of different strategies can be adopted. For example, a particular segment
could be executed repeatedly until two or three consecutive results are the
same. Alternatively the segment could be executed a fixed number of times
and a majority vote taken. This is similar to N-version programming with
all versions identical.
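The first of these strategies, repeating a segment until consecutive
results agree, might be sketched as below. The function name, the default
of two agreeing results and the limit on the number of runs are assumptions
made for illustration.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_until_consistent(
    segment: Callable[[], T], agree: int = 2, max_runs: int = 10
) -> Optional[T]:
    """Re-execute `segment` until `agree` consecutive results are equal.

    Returns the agreed result, or None if `max_runs` executions pass
    without the required run of identical results.
    """
    last = None
    streak = 0
    for _ in range(max_runs):
        result = segment()
        if streak and result == last:
            streak += 1
        else:
            # First run, or a disagreement: start counting again.
            last, streak = result, 1
        if streak >= agree:
            return last
    return None
```

A result of None indicates that the disturbance has not subsided within the
allowed number of runs, and some other form of recovery is then required.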
Rollback techniques are another effective defence against transient 
faults. In this case the program periodically saves information about its
current state. This is known as a checkpoint. When an error is detected,
execution can then be restarted at one of these points. O'Brien (72)
studies several checkpointing strategies, and he recognises that in control
applications the speed of recovery is usually critical, requiring the
frequent insertion of rollback points. He shows that a large overhead is
necessary with regard to both execution time and memory space. To limit
overheads, a checkpoint should be saved when the critical data is at a
minimum, and this will normally occur at the end of a calculation. It is
recommended that a checkpoint should be saved at least once during each
control loop. 
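A checkpoint saved once per control loop, with rollback on a detected
error, could take the following form. This is a sketch only: the class and
method names are hypothetical, the controller state is assumed to fit in a
small dictionary, and error detection is represented by a caller-supplied
check on that state.

```python
import copy
from typing import Callable, Dict


class CheckpointedLoop:
    """Control loop state with one checkpoint and rollback on error."""

    def __init__(self, initial_state: Dict[str, float]) -> None:
        self.state = dict(initial_state)
        self._checkpoint = copy.deepcopy(self.state)

    def save_checkpoint(self) -> None:
        self._checkpoint = copy.deepcopy(self.state)

    def rollback(self) -> None:
        self.state = copy.deepcopy(self._checkpoint)

    def run_cycle(
        self,
        step: Callable[[Dict[str, float]], None],
        state_ok: Callable[[Dict[str, float]], bool],
        retries: int = 3,
    ) -> bool:
        """Run one control cycle, retrying from the checkpoint on error."""
        for _ in range(retries):
            try:
                step(self.state)
                if state_ok(self.state):
                    # End of the calculation: critical data is at a minimum.
                    self.save_checkpoint()
                    return True
            except Exception:
                pass  # treated the same as a failed state check
            self.rollback()
        return False
```

If every retry fails the state is left at the last good checkpoint, and the
caller must invoke a different recovery mechanism.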
A disadvantage of this technique is that added complexity in the 
software is both costly and prone to error. Barigazzi and Strigini (6)
suggest that the setting of recovery points should be transparent to the
programmer to overcome these problems. This has been implemented on the
Cm computer by Siewiorek (93). Checkpointing and rollback are similar to
the use of recovery blocks, except that instead of using alternate
algorithms the same segment is repeated until an acceptable set of results
is obtained.
Lee et al (56) have proposed a method of reducing the programming
requirement in the use of recovery blocks by using a recovery cache. This
automatically saves critical data as the program executes and could be used 
in simple rollback recovery. 
1.4 Reliability Improvements Obtained 
To determine the improvements obtained by adding one of the features 
described above, it is necessary to determine the failure rate of the
system with and without the modification. Modern microprocessor based
systems have high reliability with a mean time between failures (MTBF) of
several thousand hours. Therefore practical testing under normal operating
conditions is both time consuming and costly. For individual component
failure rates a number of data bases exist, such as MIL-HDBK-217D (121)
compiled by the US Military, and HRD3 (122) compiled by British Telecom.
HRD3 is based mainly on field data, whereas MIL-217D is based both on field
data and accelerated life testing. A comparison between various failure
rate data bases is given by Siewiorek et al (94).
Accelerated life tests have become very popular. They aim to speed up 
the failure process by subjecting the device to a more severe environment 
than normal, such as increased humidity, vibration or temperature. 
However, great care must be taken with the results. Siewiorek et al (94)
show how the Arrhenius equation can be used to translate accelerated test
data to ambient conditions and indicate that a factor of 62 difference in
predicted failure rate can be obtained by the choice of activation energy.
Another problem with accelerated tests is that if the conditions are varied
too much, then failures due to other mechanisms can occur which will not be
present under normal conditions, and this is illustrated by Hart et al
(42).
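The sensitivity to activation energy can be seen from the usual form of the
Arrhenius acceleration factor, AF = exp[(Ea/k)(1/T_use - 1/T_stress)], with
temperatures in kelvin. The sketch below evaluates it; the temperatures and
activation energies in the example are hypothetical and are not those used
by Siewiorek et al (94).

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def acceleration_factor(
    activation_energy_ev: float, use_temp_c: float, stress_temp_c: float
) -> float:
    """Arrhenius acceleration factor between stress and use temperatures."""
    t_use = use_temp_c + 273.15
    t_stress = stress_temp_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / t_use - 1.0 / t_stress
    )
    return math.exp(exponent)


if __name__ == "__main__":
    # Hypothetical test at 125 C translated to 25 C ambient.
    for ea in (0.3, 0.5, 0.7):
        print(ea, acceleration_factor(ea, 25.0, 125.0))
```

Because the activation energy appears in the exponent, a modest change in
the assumed value alters the predicted ambient failure rate by a large
factor.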
However, these tests do provide useful results if care is taken, but,
unfortunately, are normally carried out only at the component level. Full
system testing can be achieved but requires bulky equipment and is time
consuming. For these reasons a great deal of research has been aimed at
modelling systems and predicting overall failure rates from the components.
To assist in the calculations several computer programs have been written,
such as ARIES, described by Ng and Avizienis (71), and PREDICTION,
described by Bell et al (10).
Improvements obtained by techniques to counteract software design
errors are difficult to quantify. They are dependent on the knowledge of 
the failure rate before and after implementation, and this information is 
not readily available. Musa (65) states that assembly language programs 
have an average of between 3 and 8 errors per 1000 lines before testing. He
proposes that the number remaining in a system is proportional to the time 
between error detection during testing, and suggests that this can be used 
to predict the failure rate of the final version. Hecht (44) proposes a 
model for the reliability of software systems using recovery blocks, and 
evaluates their effectiveness by trying 'what if' numbers in the model. He
concludes that for a given level of reliability, the goal can be reached
more cheaply by using the fault tolerant approach. 
An alternative approach to determine improvements is to simulate the 
hardware on another computer. A variety of faults can then be injected 
into the simulator and the response of the system observed. This method 
was adopted for the Saturn V launch vehicle digital computer, and is 
described by Ball and Hardie (5).
1.5 Importance of Error Detection 
From the types of investigations mentioned above, a large number of 
predictions have been made for the improvements obtained by each of the
redundancy techniques. In many cases a large variation in the results
exists, and this is due mainly to the assumptions made about failure
mechanisms and recovery response. For most arrangements, error detection
and fault location are of prime importance, for both recovery and
maintenance. Triplex systems provide simple identification of single
failed units, whereas with duplex systems fault location is more difficult.
Significant benefits can be obtained with the addition of fault detection
mechanisms, especially in systems relying on software implemented recovery.
A number of techniques have been developed and generally they fall 
into two main categories of continuous monitoring and periodic checking.
Continuous monitoring can be provided by self checking circuits or
arithmetic codes. Self checking circuits are designed to fail in a secure
manner, and are described by Williamson (114). One approach is to
duplicate all signals using complementary logic, and this is described by
Sedmak and Liebergot (88). In this way all single point faults and most
multiple faults are easily detected. Arithmetic codes are discussed by
Avizienis (3); they are an extension of error correcting codes in memories,
but their properties are maintained during arithmetic and some logical
operations. They can therefore be used to detect errors in memory, on the
bus and in the processor, but require special processing units.
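One well-known family of arithmetic codes is the AN code, in which each
operand is multiplied by a fixed constant A and any result not divisible by
A is rejected. The sketch below uses A = 3 as an assumed example value and
is not necessarily the scheme discussed in (3). It shows that the code is
preserved by addition and that a single bit error is detected.

```python
A = 3  # assumed check constant; 2**k is never a multiple of 3


def an_encode(x: int) -> int:
    """Encode an integer as a multiple of A."""
    return A * x


def an_is_valid(codeword: int) -> bool:
    """A codeword is valid only if it is divisible by A."""
    return codeword % A == 0


def an_decode(codeword: int) -> int:
    """Recover the integer, raising ValueError on a detected error."""
    if not an_is_valid(codeword):
        raise ValueError("arithmetic code check failed")
    return codeword // A


if __name__ == "__main__":
    total = an_encode(5) + an_encode(7)   # addition works on codewords
    print(an_decode(total))
    print(an_is_valid(total ^ (1 << 4)))  # one flipped bit is detected
```

A single flipped bit changes the codeword by plus or minus a power of two,
which is never divisible by 3, so the check fails wherever the error arose,
whether in memory, on the bus or in the adder.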
Periodic checks can be initiated by software to exercise all elements
of the system, in order to test for correct operation. Barraclough et al
(7) state that it is impossible to test for all faults, and therefore
partial testing of each functional block is recommended. This approach is
adopted in an aircraft application, using duplex redundancy, described by
Johnson and Shaw (50), and is used in conjunction with other techniques
such as rollback and reconfiguration. 
Processor testability is discussed by Robach et al (85), who suggest
that a systematic approach should be adopted where blocks are tested by 
elements which have already been verified. Clearly, some blocks must be 
assumed fault free initially, and the aim is to reduce this hard core to a 
minimum. Smith (96) investigates four different methods of testing 
processors and concludes that the systematic approach is the best. Example 
programs for functional testing of the 8080 are given by Peckett (78) and 
Nichols (69). The 6805 has an in-built test program which is described by
Boney (13). Unfortunately it requires a specific external configuration
and therefore cannot be used as a built in test feature. 
Random access memory (RAM) tests have been studied extensively. It is
recognised that exhaustive testing for all possible pattern sensitive
faults is not realistic. This has led to the development of a number of
selective tests which are designed to reveal certain expected faults.
Thatte and Abraham (102) describe a number of failure mechanisms and the
tests necessary to detect them. Read only memory (ROM) can be tested by
the evaluation of a checksum, and this method is explained by Jack et al
(49). Input and output lines can be cross connected for testing, or in a
closed loop control situation, the response to a small disturbance by the
controller can be monitored to reveal faults in all the interfacing 
circuits. This latter procedure is suggested by Kurzhals and Deloach (55) 
for an aircraft application. 
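A ROM checksum test of the kind mentioned above can be sketched as follows.
The 8-bit additive checksum and the convention of storing a final byte that
makes the total zero are assumptions chosen for simplicity, and are not
necessarily the method of Jack et al (49).

```python
def append_checksum(image: bytes) -> bytes:
    """Append one byte so that all bytes sum to zero modulo 256."""
    return image + bytes([(-sum(image)) & 0xFF])


def rom_ok(image_with_checksum: bytes) -> bool:
    """Periodic test: the modulo-256 sum of the whole ROM must be zero."""
    return (sum(image_with_checksum) & 0xFF) == 0


if __name__ == "__main__":
    rom = append_checksum(bytes([0x3E, 0x01, 0xD3, 0x10]))  # made-up contents
    print(rom_ok(rom))
    corrupted = bytes([rom[0] ^ 0x04]) + rom[1:]
    print(rom_ok(corrupted))
```

A simple sum of this kind detects any single bit error but can miss
compensating errors in different bytes, so stronger checks may be preferred
where multiple faults are expected.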
These checking routines can be executed in a background mode similar
to that proposed by Preece and Stewart (80). In all cases the aim is to
detect errors quickly so that they cannot propagate and prevent recovery.
Using these methods it is possible to detect some faults before they have
disrupted program execution, and this is due to the error latency of
digital circuits. This is the time taken for a fault to generate an error
on the output of the device. Shedletsky and McClusky (91) show that even in
a simple four state sequential circuit the error latency can be several
tens of cycles, and will be far more in complex circuits.
These types of self checking procedures are particularly useful in
duplex arrangements and those using stand-by spares, to locate failed units
during operation. Another important use is in applications having short
mission times. For these, the importance of a fault free system prior to
use is illustrated by Tasar (100) in connection with aircraft control. He
suggests that 90% of faults can be detected in this way, with only a basic
knowledge of the hardware. 
Error detection is an important aspect of fault tolerance, but without
error correction Kopetz (53) has shown that availability is reduced. 
1.6 Possible Dangers of Adding Redundancy 
Careful consideration must be taken when adding redundancy to a 
system, as increased complexity can lead to design errors. Even correct 
designs can be less reliable than non-redundant systems. For example, if a 
single voting arrangement is adopted in a TMR system, then the voters must 
be more reliable than a single channel to achieve an overall improvement, 
and this is shown by Wakerly (107). In equipment containing standby 
spares, Losq (59) has shown that a system with a large number of spares is
less reliable than the corresponding simplex arrangement, due to the 
complexity of the switch. Elkland and Siewiorek (34) show that memory 
error detection and correction systems can also be less reliable, due to 
failure of the additional memory and correction circuits. 
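The voter requirement can be illustrated with the standard reliability
expression for TMR with a single voter, R_sys = R_v(3R^2 - 2R^3), which
assumes that the three channels fail independently. The Python sketch below
is an illustration under that assumption, not a reproduction of the analysis
by Wakerly (107); the function names are hypothetical.

```python
def tmr_reliability(r_channel: float, r_voter: float) -> float:
    """Reliability of three channels with majority voting through one voter."""
    majority = 3 * r_channel**2 - 2 * r_channel**3  # at least two channels work
    return r_voter * majority


def min_voter_reliability(r_channel: float) -> float:
    """Voter reliability at which TMR only just equals a single channel."""
    return 1.0 / (3 * r_channel - 2 * r_channel**2)


r = 0.9
print(tmr_reliability(r, 1.0))    # perfect voter
print(tmr_reliability(r, r))      # voter no better than one channel
print(min_voter_reliability(r))   # break-even voter reliability
```

Under these assumptions the break-even voter reliability is never lower than
the channel reliability, so a voter that is no better than one channel gives
a system that is worse than the simplex arrangement.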
Another important factor is the concept of coverage which was first 
introduced by Bouricius et al (15). It is the probability that a system 
will recover from a fault without any loss of essential information.
Clearly the aim is for a high level of coverage, and Arnold (2) has shown 
that even a small percentage of uncovered faults has a severe effect on the 
reliability of redundant systems. These faults are called common mode
failures, and can be the major source of system unreliability. Westermeier
(112) shows that adding redundancy with low coverage actually reduces
overall reliability. 
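The effect of coverage can be seen in a deliberately simplified static model
of one active unit with one spare: the system survives if the active unit
works, or if it fails, the fault is covered, and the spare works. The model
and the name standby_reliability are assumptions made here for illustration
only; the cited analyses use time dependent models.

```python
def standby_reliability(r_unit: float, coverage: float) -> float:
    """Static reliability of one active unit backed by one spare."""
    # Either the active unit works, or it fails, recovery succeeds
    # (probability = coverage) and the spare works.
    return r_unit + (1.0 - r_unit) * coverage * r_unit


for c in (1.0, 0.99, 0.9):
    unreliability = 1.0 - standby_reliability(0.99, c)
    print(f"coverage {c:.2f}: unreliability {unreliability:.6f}")
```

In this model, with a unit reliability of 0.99, reducing the coverage from
1.0 to 0.9 raises the probability of system failure about elevenfold.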
Most of the techniques mentioned so far are designed to counteract 
particular classes of faults. If these fault types are not common in the
final system then the methods will be ineffective and may even reduce 
overall reliability. For example, most fault tolerant memory systems are 
designed to detect and correct single bit failures, where multiple bit 
failures may be more common due to simultaneous disturbances in several 
chips. Exhaustive memory tests to detect faults are not possible due to 
restrictions of time. For this reason tests have been developed for 
certain types of fault such as interactions between adjacent cells.
However, Heftman (45) describes modern devices with extra rows of cells
which can be substituted for faulty ones. This severely reduces the 
effectiveness of the tests. 
Wulf (116) states that increasing the reliability of individual 
components has little effect on the mission time, but increasing the 
coverage of the most probable fault produces significant improvements. It 
is therefore of great importance to know what type of failures will occur 
in the real system, so that only methods suitable to counteract those 
particular faults are adopted. 
Previous sections have indicated that a particular error detection and
correction mechanism is not effective against all faults. It is therefore
necessary to use a number of techniques. Pearson et al (77) describe a
hierarchical approach with different levels of fault recovery. At a low
level modular redundancy and memory protection are transparent to the
program, and are independent of the application. At higher levels the
mechanisms become more application dependent, with the use of software
techniques. The highest level must cover all other undetected faults, and
is usually provided by a watchdog timer. This device is periodically
updated by the control program, and is normally configured to generate a
master reset if it fails to receive correct signals. The benefits obtained
by watchdogs are recognised, but the following chapters show that careful
consideration of their design is necessary.
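The watchdog behaviour described above can be sketched as a countdown that
the control program must reload before it expires. The Python model below is
a minimal illustration; the class name, the tick based timing and the reset
counter are assumptions, and a real watchdog is a hardware timer.

```python
class Watchdog:
    """Countdown timer that forces a reset unless it is reloaded in time."""

    def __init__(self, timeout_ticks: int):
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks
        self.resets = 0  # number of master resets generated so far

    def kick(self) -> None:
        """Called by the control program to show that it is still running."""
        self.remaining = self.timeout_ticks

    def tick(self) -> bool:
        """Advance one time step; return True if a master reset was generated."""
        self.remaining -= 1
        if self.remaining <= 0:
            self.resets += 1
            self.remaining = self.timeout_ticks
            return True
        return False


wd = Watchdog(timeout_ticks=3)

# Healthy program: the timer is reloaded on every pass of the control loop.
for _ in range(10):
    wd.kick()
    wd.tick()
print(wd.resets)

# Crashed program: the timer is no longer reloaded and eventually expires.
wd.kick()
fired = [wd.tick() for _ in range(3)]
print(fired, wd.resets)
```

A simple reload of this kind can still be performed by a program that is
executing erroneously, which is one reason the design needs care.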
1.7 Requirements for Different Applications 
Different applications have varying operating requirements, and in 
each case a particular technique is sometimes necessary. A number of
applications and their specifications are given in table 1.1. For systems
such as aircraft control very short program loops are necessary to maintain
stability. Therefore detection and correction of errors must occur very 
rapidly, and requires the use of TMR or NMR. This usually prevents any 
interruption of program execution. 
In telephone switching systems, short interruptions are permissible, 
but repair must be quick and effective. Emphasis is placed more on the
detection and isolation of faulty modules, and this is achieved by a large
number of processing units each adopting a duplex arrangement. In this way
faults within a particular unit are easily identified, while recovery is
performed by reallocation of processing tasks.
In many industrial control situations, such as coal fired power 
stations described by Bland et al (11), it is only necessary to detect an
error and to switch safely to mechanical or manual control. In these
cases, processing power can be lost for several seconds or minutes without
severe damage to the plant. 
For the British Gas application of micro-electronic implementation of
DAG, the latter case is acceptable for most networks. This is because only
the set point of the regulator requires adjustment, with local mechanical
control of the pressure. If the set point is changed too often then
instability between the two control mechanisms can occur. Spearman (98)
suggests a time interval between adjustments in the range of 5 to 120
seconds. Therefore a loss in processing time of a similar duration will
not be detrimental, provided that the regulator is not driven to its lowest
setting during failure. However, in networks containing large industrial
loads, rapid changes in demand can occur and this requires a faster
response with a correspondingly shorter control loop. 
An architecture for a small digital controller suitable for this 
application has been proposed by Pearson (76). It consists of a
triplicated processor arrangement with voting, connected to a single block
of RAM which contains single bit error correction and double bit error
detection. The control program is stored in two different EPROMs so that
if one fails the other one can be used. This architecture does have a high
coverage for a number of fault conditions, but is susceptible to several
possible common mode failures, which can remain undetected by the hardware.
In these cases detection and correction methods within the software are
required. 
An alternative approach, which could be used, is described by Obac-Roda
and Davies (73). They suggest using three independent microprocessor
systems connected in a ring structure. Each system operates in loose
synchronism with the other two, and voting on results is achieved in
software. A similar arrangement could be used but with all the channels
working in complete isolation. The outputs could then be brought together
at the actuators, and even these could be isolated by using separate ones
for each channel. With such isolation, Dellacorna et al (29) have
indicated that it would not be necessary to use the same processor in each
unit, and therefore each one could be designed and programmed by a
different development group to eliminate the possibility of nearly all 
common mode failures. 
By reducing the interaction between modules, a great deal of physical 
and electrical isolation can be achieved, especially with the use of fibre
optics. Emfinger and Flannigan (35) describe how physical isolation is
used to improve the survivability of a fighter aircraft from attack. The
use of these methods in the British Gas application could reduce the risks
from rare events such as direct lightning strikes and vehicle impacts, which
have occurred in the past. 
1.8 Contents of the Thesis
The aim of the work described in this thesis is to investigate methods
of increasing system reliability with particular attention given to
software techniques. It has been indicated in the foregoing discussion
that both transient and intermittent failures are common, and therefore
improvements in this area are most likely to give significant benefits. To
prevent failure, both error detection and error correction must be
effective, and detection mechanisms receive particular attention.
It has been shown that the actual failure mechanisms are important in
the development of redundancy techniques. Most researchers have adopted a
policy of considering only single point failures. This is a legacy from
early reliability studies on systems containing discrete components and
small scale integration (SSI). With the development of large scale
integration (LSI), it is an increasingly complex process to analyse
systems at the transistor and gate levels. Also, faults are less likely to
be limited to single nodes due to their physical size and very close
proximity to each other. Despite this there is very little information
available about failure mechanisms observed at the subsystem level.
Chapter 2 contains a description of a number of practical tests which
were carried out on a small microprocessor based system. These were
primarily concerned with electrical interference on the power supply rails.
Errors observed at the chip level are presented. The tests revealed that a
number of mechanisms exist which cause the corruption of the program
counter, resulting in the possible resumption of execution at any location
in the memory map. This demonstrates the importance of the undeclared
operation codes in microprocessors which may be read under these
conditions. Chapter 3 investigates the undeclared codes of several
processors and reveals other undeclared properties.
Chapters 4, 5 and 6 look at the response of different processors to
erroneous execution in specific parts of the memory maps. Analysis is
performed by a series of mathematical models derived from Markov diagrams.
In some cases they have been verified by computer simulations. Chapter 7
studies the flow of erroneous execution between different memory areas, and
represents the response to a random jump within the memory map. Comparisons
between processors and different memory arrangements are made,
and the effects of adding error detection are presented.
Chapter 8 shows how the reliability of specific systems can be
improved by the addition of the techniques developed in the previous
sections, and also suggests some hardware detection mechanisms. Chapter 9
describes a testing facility which has been constructed to physically check
error detection and correction mechanisms. It allows the injection of a
large variety of faults, and permits rapid testing.
Finally, the conclusions drawn from the research and the suggestions
for future development are presented in the final chapter.
CHAPTER 2
Practical Tests to Determine Transient Failure Mechanisms
2.1 Introduction 
It has been shown in the previous chapter that methods of increasing 
the reliability of a system are generally designed to counteract a 
particular fault type, and are only effective if these faults are common. 
Ball and Hardie (5) have indicated that over 90% of field failures are due 
to intermittent or transient faults. Therefore techniques which enable 
recovery from the errors resulting from these faults will have a 
significant effect on reliability. The causes of these events have been
discussed, but details of their effects, especially at the time of failure,
are not fully understood. This is due to the random nature of their
occurrence, which means that analysis of failure is usually possible only
after the event when little data is available. The only indication that a
transient has occurred may be that the system has crashed or an erroneous
output has been made. 
To enable the development of effective detection and recovery 
techniques, it is necessary to have a more detailed understanding of the
mechanisms of failure. There are three methods available for
investigation, and these are theoretical evaluation, computer simulation
and practical tests. Theoretical evaluation relies on the assumption of
certain fault conditions, such as single nodes stuck at 0 or 1, and the
evaluation of their effects on the rest of the system. This is known as
fault mode effect analysis (FMEA), and provides information about possible
failure mechanisms. However, without a knowledge of the occurrence rate of
the assumed faults, it is not possible to determine the most common 
failures. 
With computer simulation, a model of the system at the transistor or
gate level is produced, and this was the approach adopted for the Saturn V
guidance computer described by Ball and Hardie (5). Faults can then be
simulated and the effects observed, but this suffers from the same
disadvantages as FMEA. Practical tests are the only way of determining
which faults will occur in real systems. Once these have been established,
FMEA and simulation can then be used more effectively. 
Little information is available on practical testing of systems under 
transient disturbances. Those which are reported have focused their 
attention on methods of eliminating disruption by shielding or filtering. 
For example, Teets (101) states that short interruptions, of a few
milliseconds, can cause corruption to the contents of memory and also
non-programmed jumps. He suggests that these problems can be overcome by the
use of uninterruptible power supplies. Although vast improvements can be
made in this way, it is not 100% effective in all cases, especially for
unanticipated phenomena. For these cases it is necessary to adopt the
fault tolerant approach. 
This chapter describes work carried out to identify possible failure
mechanisms, in small digital controllers, by the use of practical tests.
2.2 Test System 
The test system which has been constructed for the purpose of 
identifying fault modes and their frequencies, is described in detail in
the following sections. The hardware consists of a single processing board
powered by a purpose built power supply unit. 
2.2.1 Processor Board 
The processor board is based on the design of a small single board 
computer given in the 8085 User's Manual (119). However, a few
modifications have been made for this application. A block diagram of the
system is shown in figure 2.1, and the layout of the components is given in
figure 2.2. Two main modifications have been added. Extra circuitry has
been included to fully decode the on board memory, and an RS232 interface
provides serial communications with a terminal.
The main components of the system are:
    8085        8 bit microprocessor
    8155        256 byte RAM + 22 parallel I/O lines + timer
    8755        2 K byte EPROM + 16 parallel I/O lines
    6.144 MHz   crystal
The power supplies to the three main integrated circuits, to the 
decoding circuits and to the RS232 interface are not permanently connected 
together, but are joined by removable links. This allows the connection
of an alternative supply to different parts of the board, so that the
effects of interference on individual components can be observed. It
should then be possible to identify levels of interference that affect
different components before trying to analyse the whole system. 
Resistors are connected to the data lines so that they can be pulled 
high or low. 
2.2.2 Decoding Circuitry 
The decoding circuitry consists of three LSTTL integrated circuits,
and a logic diagram is given in figure 2.3. The inputs are taken from
address bits 8-15 on the system bus and the outputs are connected to the
chip select pins on the memory devices. The 8755 EPROM chip is mapped to
the address range 0000 to 07FF (hexadecimal), and the 8155 RAM chip is
mapped to the range FF00 to FFFF.
By fully decoding the memory and applying a suitable combination of
pull up and pull down resistors to the bus, a fixed data byte is forced
onto the data lines when an attempt is made to access a non-populated
memory address. By setting the data byte equal to a restart instruction, a
software interrupt is generated when an instruction is fetched from a
non-existent memory location. This can be used to detect some transient
errors.
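The trapping mechanism can be sketched in Python as follows. The address
ranges are those given above; on the 8085 the byte FF is the Restart 7
instruction, which transfers control to address 0038. The function names
and the memory contents are hypothetical, and the model ignores the timing
and electrical behaviour of the real bus.

```python
EPROM_START, EPROM_END = 0x0000, 0x07FF  # 8755 EPROM range, from the text
RAM_START, RAM_END = 0xFF00, 0xFFFF      # 8155 RAM range, from the text
RST7_OPCODE = 0xFF                       # 8085 Restart 7 instruction
RST7_VECTOR = 0x0038                     # address entered by Restart 7


def select_device(address: int):
    """Full decoding: return the device selected, or None if unpopulated."""
    if EPROM_START <= address <= EPROM_END:
        return "EPROM"
    if RAM_START <= address <= RAM_END:
        return "RAM"
    return None


def fetch_opcode(address: int, eprom: bytes, ram: bytearray) -> int:
    """Return the byte the processor sees when fetching from this address."""
    device = select_device(address)
    if device == "EPROM":
        return eprom[address - EPROM_START]
    if device == "RAM":
        return ram[address - RAM_START]
    return RST7_OPCODE  # no device drives the bus, so the pull-ups give FF


eprom = bytes(2048)    # hypothetical EPROM contents (all zero)
ram = bytearray(256)   # hypothetical RAM contents

opcode = fetch_opcode(0x1234, eprom, ram)  # erroneous jump to empty space
if opcode == RST7_OPCODE:
    print(f"trap: execution resumes at {RST7_VECTOR:04X}")
```

With only partial decoding, an unpopulated address would instead alias onto
one of the memory devices and no trap would occur.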
The chip selects are connected via wire-wrap or soldered links so that
the behaviour of the system with full or partial decoding can be observed. 
2.2.3 Power Supply Unit 
For a computer to function correctly it is essential for the
integrated circuits to be supplied with a good steady voltage. If the
power supply can filter out mains borne transients then fewer errors will
occur. The power supply therefore plays an important role in the overall
system reliability. A circuit diagram of the test supply unit is shown in
figure 2.4. The unit contains two transformers (one laminar and one
toroidal), and three smoothing capacitors of different values. Two
switches allow the selection of any combination of transformer and smoothing
capacitor. This allows testing of the processor board to determine levels
of interference that cause errors for different arrangements of the power
supply. 
2.2.4 Software 
Two software packages have been written to run on the test equipment. 
One is designed to test the whole system and to display messages if an
error is detected. The other is for identifying data errors in the RAM 
chip. 
2.2.4.1 SYSTEST
SYSTEST is a software package designed to test the whole system. The
main part of the program writes a data byte into memory and then reads it
back again. It then compares the value with a reference byte stored in
memory and with another stored in the C register. If either value
disagrees an error code is sent to the terminal. This process continues by
using the same byte in successive memory locations until the whole memory
block FF10 to FFFF has been tested. The data byte is incremented and the
process repeats until all values have been tried. If no errors are
detected, a character is sent to the terminal to indicate that the system 
is functioning correctly, and the program restarts at the beginning. 
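The main test loop can be sketched in Python as below. This is a model of
the procedure described, not the original 8085 code. The location of the
reference byte (taken here as FF00), the Ram class and its stuck bit fault
injection are assumptions introduced for illustration.

```python
TEST_START, TEST_END = 0xFF10, 0xFFFF  # block tested, from the text
REFERENCE_ADDR = 0xFF00                # hypothetical home of the reference byte


class Ram:
    """RAM model with an optional stuck-at-1 bit for fault injection."""

    def __init__(self, stuck_addr=None, stuck_mask=0x00):
        self.cells = {}
        self.stuck_addr = stuck_addr
        self.stuck_mask = stuck_mask

    def write(self, addr: int, value: int) -> None:
        if addr == self.stuck_addr:
            value |= self.stuck_mask  # the faulty bit always stores 1
        self.cells[addr] = value & 0xFF

    def read(self, addr: int) -> int:
        return self.cells.get(addr, 0xFF)


def systest_pass(ram: Ram):
    """Run one full pass; return a list of (address, expected, actual)."""
    errors = []
    for value in range(256):
        ram.write(REFERENCE_ADDR, value)  # reference copy held in memory
        c_register = value                # second copy held in a register
        for addr in range(TEST_START, TEST_END + 1):
            ram.write(addr, value)
            read_back = ram.read(addr)
            if read_back != ram.read(REFERENCE_ADDR) or read_back != c_register:
                errors.append((addr, value, read_back))
    return errors


print(len(systest_pass(Ram())))  # fault free memory
faulty = Ram(stuck_addr=0xFFBF, stuck_mask=0x04)
print(len(systest_pass(faulty)))  # bit 2 stuck at 1 in one location
```

Holding the reference in two places means that a corrupted reference byte
in memory is also reported, rather than being mistaken for correct data.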
Recovery software is included at the low order addresses of the
memory, so that if a hardware interrupt, a software interrupt or a total 
reset is erroneously executed, then an error code is sent to the terminal 
and testing is restarted. This will occur if the program jumps into any of 
the unpopulated memory, provided that full decoding is used and the data 
lines are pulled high to force the execution of a Restart 7 instruction. 
A number of different codes are included to indicate different errors
so that the type of failure can be easily recognised. The test system has
no monitor program, so the software package includes a subroutine to
generate the software controlled serial output. To output a character the
ASCII code is passed to the routine in the C register, which then generates
the serial data together with start and stop bits. 
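The framing performed by such a routine can be sketched in Python as
follows. The sketch assumes the usual asynchronous format of one start bit
(0), eight data bits sent least significant bit first, and one stop bit (1).
The text does not state the number of data or stop bits used, and the name
frame_byte is hypothetical. The timing loop that sets the bit period is
omitted.

```python
def frame_byte(code: int, stop_bits: int = 1) -> list:
    """Return the line levels, in transmission order, for one character."""
    bits = [0]                                        # start bit
    bits += [(code >> i) & 1 for i in range(8)]       # data bits, LSB first
    bits += [1] * stop_bits                           # stop bit(s)
    return bits


print(frame_byte(ord("A")))
```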
2.2.4.2 RAMTEST 
RAMTEST is a software package designed to test for data errors in the 
RAM chip. The program requests a byte of data to be used in the test. It 
then writes that value into all the RAM locations FF00 to FFFF. When
complete, a prompt is sent to the terminal and the program waits for an
input before continuing. The data is then read back and displayed at the
terminal before starting again with a new byte of data. By including the
wait between writing and reading, interference can be applied to the memory 
device during writing, during reading, between writing and reading, or 
during any combination of these. 
This software is designed to test for corruption of the memory, and 
therefore the program cannot use the RAM for its own operation. The 
software includes a number of subroutines which deal with the serial 
communications. Subroutine calls are not used in the normal way, as the
tests would corrupt the system stack. Instead, the return address, at
which execution must resume, is loaded into the HL register pair. The
routine is then entered by a normal jump instruction. At completion the
PCHL instruction is used to load the program counter with the address
stored in the HL register pair, and execution continues at that address.
In this way all information for the correct operation of the program is
stored in the internal registers of the processor rather than in memory. 
2.3 Practical Tests Performed 
Faults in digital circuits occur very infrequently. For example, a
system similar to that described above has been operating continuously for
over 4 months. Occasional interruptions to the power supply have caused
full resets, but apart from these, no other errors have been detected. In
order to observe the effects of faults, as they happen, it is necessary to
induce failure. 
In the British Gas application, the digital controllers will be
situated in remote areas and will generally receive their electrical power
from street lighting circuits. These are not particularly clean supplies,
due to noise picked up from a number of sources. Bull (18) suggests that
interference on the supply from devices such as thyristors, motors and gas
discharge lamps can cause disruption, or even permanent damage, to digital
circuits. Therefore conducted interference on the power supply is expected
to be a possible source of failure, and the tests have been aimed at this
area.
Initial tests involved variations in the 5 volt supply rail. The 
levels at which errors occurred were recorded during manual reductions of a 
variable output supply. Other disturbances were created on the AC mains
input to the experimental power supply unit, using a Schaffner interference
simulator. This consists of a main frame into which a number of plug-in
units can be fitted. Three such units were available, and these cause
short interruptions to the supply, or superimpose high or low energy spikes
onto the mains. 
The equipment generates interruptions of between 1.5ms and 500ms to 
simulate the change over of generators or breaks in the line. The low 
energy pulses of 2mJ have a rise time of 5ns or 10ns and an amplitude from 
50 to 2,500 volts, to simulate interference from electromechanical switches
and relays in close proximity. The high energy pulses of 2J have a rise
time of approximately 0.3us and an amplitude of up to 5,000 volts to
simulate the effects of thyristors, atmospheric discharges, high voltage
current breakers and electrical machinery.
All tests with the high energy pulses showed no observable disruption 
to normal program execution. Using a digital storage scope, the effects of
the spikes on the 5 volt rail were examined. With symmetric interference
applied between live and neutral, no fluctuations were seen on the rail.
However, with asymmetric interference between the two supply lines and
ground, a 0.2 MHz oscillation of 0.6v amplitude, damped out after four
cycles, was observed. This produced a minimum of 4.4 volts on the supply,
which is shown later to be insufficient to cause corruption. No variation
in the response occurred with different values of smoothing capacitors.
To observe the effects of the other forms of interference, a Dolch 
logic analyser with a personality pod for the 8085 was used. This not only 
provides an indication of the states of each of the pins on the processor,
during each clock cycle, but also provides a disassembly of the
instructions executed. Unfortunately, with fast spikes the interference is
sufficiently harsh to affect the operation of both the system under test
and the logic analyser, so the analyser did not provide much useful data.
Information about program execution during voltage reductions and short
interruptions was readily obtained. However, during some testing, incorrect
disassemblies were generated. This appeared to be due to the generation of
additional clock pulses within the pod, causing the analyser to take extra
erroneous samples. But by reverting to the display of binary states, it
was possible to evaluate the actual processor response.
2.4 Test Results 
As mentioned above, the test system was designed so that separate 
power supplies could be connected to each major device. Therefore
interference tests were carried out on the individual chips, before being
repeated on the whole board. This approach was adopted to try and identify
the most likely sources of failure in a complete system. The results of
these tests are given in the following sections.
Investigations on the effects of adding pull-up or pull-down resistors 
to the data lines, revealed only minor variations in susceptibility to
interference. In all subsequent tests pull-up resistors were connected at 
all times. 
2.4.1 Interference to the RAM
A random access memory (RAM) device has three main functions. These
are to accept data from another device, to store the information, and to
pass it back when required. Errors can occur during each of these states,
and are termed write, data and read errors respectively. Voltage level
tests were carried out, using the RAMTEST software, to determine the
sensitivity of each of these operations.
The voltage levels at which the first errors occurred for different 
devices, are given in table 2.1. It shows that the read and write 
operations are the most susceptible to this sort of disturbance, while the
data remains valid internally until at least another 1.4 volt drop in the
supply. There is also a significant variation between devices. R3 and R4
were manufactured by Intel and are corrupted more easily than R5 and R6
which were manufactured by NEC. Slight variations in the level of first
corruptions were observed for different data values, but these were all 
less than 80 mV. 
An interesting observation was the variation in the location and value 
of the first error for different data bytes. These are summarised in table
2.2. All initial write errors gave a value of FF when read back; this was
the value to which all locations were initialised before disruption.
However, this was observed at various locations with the Intel devices,
whereas the NEC devices always showed the first failure at location FF00.
Similar observations were made with read errors, except one Intel device
showed single bit errors at various locations, while the other consistently
failed to FF at address FF00. For data errors the first events observed
were single bit changes, and these occurred at various locations. However,
a further reduction of only 50 mV resulted in multiple bit changes.
Although errors occurred at various locations for different data 
bytes, the results were always consistent for a particular device. For 
example, the first data errors for RAM chip R5 are given in table 2.3. 
This shows that bit changes in the device are more likely in certain bit
positions. The table shows that, for device R5, bit 2 at address FFBF will
always be the first to change if it is set to zero. Similar results were
obtained for the other chips, but the errors occurred at different
locations. This information could be used to test for general corruptions
of data. The most susceptible bit could be checked periodically, and if
correct would indicate that other corruptions were unlikely. However, this
would create major problems in construction and maintenance, as each chip
would have to be tested and the software modified accordingly. 
For short interruption testing, a variable resistor was connected in 
parallel to the device under test, and adjusted to maintain a constant load 
of 500 mA on the power supply unit. This arrangement was adopted to allow 
comparisons to be made between different parts of the circuit. Table 2.4
shows the length of the interruption, in cycles, which caused the first
errors for each part. As expected, a larger smoothing capacitor needed a
longer interruption before errors occurred.
During this testing, RAM chip R4 suffered a permanent failure, and 
this is discussed further in section 2.6.2. Table 2.4 shows the results
for device R3, and in each case the 5 volt rail dropped to a minimum of
about 3.8 volts, before the first errors occurred. This is over 1 volt
higher than expected from the previous results. However, in this case the
software package SYSTEST was being used, indicating that the susceptibility
to errors is dependent on the program being executed. This was confirmed
by repeating the voltage reduction test while running SYSTEST, which showed
initial errors at around 3.8 volts. 
Detailed investigations into the effects of applying low energy fast 
spikes to individual devices were not carried out. This was because no
useful information could be obtained, from the logic analyser, due to
corruptions caused by the interference. Limited results for full board
testing under this type of interference are given in section 2.4.4. 
Tests on early 4K RAMs have been carried out by Hnatek et al (46).
The device studied required three different voltage levels of +5v, -5v and
12v. Supply reductions to the 5v rail showed initial data errors at around
1.2v, which is similar to those observed for the 8155. They also discovered
devices which lost bits of data after 3 seconds if they were not accessed.
Investigations showed that leakage currents, as a result of faulty
manufacture, caused the bits to change state. This failure mechanism is
particularly serious as in-circuit tests are designed to operate in the
shortest possible time and would not detect it. A preventive solution is
to refresh the memory as often as possible. The importance of this type of
refreshing in counteracting the effects of soft errors due to alpha
particle hits has been shown by Smith (95). However, in this case
refreshing does not need to be carried out as often. An attempt to
reproduce delayed errors on modern devices was unsuccessful. Two 8155s and
sixteen 2114s were left unaccessed for ten days while filled with the value
AA. This was repeated with complementary data, but in both cases no errors
occurred. 
2.4.2 Interference to the EPROM 
Testing the erasable programmable read only memory (EPROM) was a much 
simpler process, as voltage variations can only cause read errors.
Initially the supply was gradually reduced until the program started
sending error codes to the terminal. Repeating the test with the logic
analyser connected showed no alteration in the voltage level at which first
errors occurred, indicating that it did not affect the results.
Single bit errors were observed at a level of 3.43 volts. These were
all changes from 0 to 1 in bit location 5, and occurred at a number of
addresses. This resulted in the misinterpretation of instructions or the
incorrect reading of operands. Further reductions in the supply caused
more bits to change from 0 to 1, until FF was read during each instruction
fetch at a level of 3.25 volts. In this condition the restart 7
instruction is executed repeatedly, pushing a return address onto the stack
each time. This results in the stack extending through the entire memory
map, destroying all volatile data.
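The rate at which this failure destroys the RAM can be estimated with a
short Python model. Each Restart 7 pushes a two byte return address, so the
stack pointer falls by two per instruction and wraps round at the bottom of
the 64K memory map. The initial stack pointer value of 0000 and the name
rst7_storm are assumptions made for illustration.

```python
RAM_START, RAM_END = 0xFF00, 0xFFFF  # 8155 RAM range, from the text


def rst7_storm(initial_sp: int, count: int):
    """Model `count` consecutive Restart 7 instructions.

    Returns the final stack pointer and the set of RAM addresses overwritten.
    """
    sp = initial_sp
    overwritten = set()
    for _ in range(count):
        # Each restart pushes the high byte, then the low byte, of the
        # return address, decrementing the stack pointer before each write.
        for _byte in range(2):
            sp = (sp - 1) & 0xFFFF
            if RAM_START <= sp <= RAM_END:
                overwritten.add(sp)
    return sp, overwritten


sp, hit = rst7_storm(initial_sp=0x0000, count=128)
print(f"SP = {sp:04X}, RAM bytes overwritten = {len(hit)}")
```

Under these assumptions, 128 restarts are enough to overwrite every byte of
the 256 byte RAM, and 32768 restarts carry the stack pointer through the
whole memory map and back to its starting value.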
Only one device was tested in this way. However, in previous tests on 
2716 EPROMs, in a different system, similar results were observed. A
common failure for one device was the misreading of a jump address,
resulting in execution passing to an unpopulated area of memory. Another
was the misreading of the operand in a compare instruction. In both cases
the same bit showed a transition from 0 to 1. A similar device programmed
with identical data also showed these types of transitions but in different
bit locations. A similar response is therefore expected with other 8755s. 
The lengths of interruptions necessary to cause corruptions are given
in table 2.4. The minimum supply level reached for each capacitor was
about 3.4 volts, which agrees with the previous results. One failure mode
encountered during interruptions was the repetitive execution of interrupt
routines. This was only observed with the logic analyser connected, and
was due to oscillations on the interrupt lines. The problem was cured by
removing the analyser, or by tying the lines low. Other failures were
similar to those for gradual reductions of the supply. Bit 5 showed the
initial failures, with other bits corrupted during longer interruptions.
2.4.3 Interference to the Processor
Voltage reductions on the processor revealed initial errors at a level
of 2.74 volts. These consisted of bit 1 being incorrectly read as 1 instead
of 0 at several addresses. At a level of 2.72 volts the program counter
showed signs of incorrect operation. This resulted in execution skipping
over single and multiple bytes in the program. For example, the third and
fourth bytes following a jump instruction were read as the jump address.
This sort of execution was observed in several parts of the program.
Continuous servicing of interrupts, in the same way as in the previous
section, was also observed. Again this was eliminated by grounding the
interrupt lines. Another failure mode encountered was the cyclic reading
of data through memory. In this case the processor would read successive
locations to the end of the memory map, and then repeat from the beginning.
This mode was always entered if the supply was reduced to below 2.45 volts
and then raised slowly. The processor would not leave this state with the
application of a TRAP, which is supposed to be a non-maskable interrupt. A
full reset is necessary to exit from this mode. A similar sequence of
operation is encountered when certain op-codes are executed on the 6800.
The fact that no further useful processing is performed under these
conditions is particularly important from a reliability point of view, and
this is discussed further in section 3.4.3.
The lengths of interruptions required to cause errors in the processor
are given in table 2.4. First errors occurred when the supply reached a
minimum of about 2.8 volts. Incorrect read and write operations were
observed under these conditions. Slightly longer interruptions, causing a
dip down to 2.5 volts, revealed program counter malfunctions, as described
above. Further reductions caused the processor to execute a sequence of
restart 7 instructions (FF), but as the supply recovered the cyclic read
mode was entered. This occurred for all interruptions which resulted in
the supply rail falling to a value between 2.5 and 0.3 volts. If the
supply dropped below this range, the power-on reset circuit would generate
a correct reset.
2.4.4 Interference to the Complete System
Raising the power supply slowly from 0 to 5 volts, for the whole
board, caused the processor to enter the cyclic read mode. This indicates
that care must be exercised in starting up a system, and is to be expected
as the power-on reset circuit will not operate correctly unless the supply
is restored quickly.
Reductions in the power supply revealed initial memory read errors at 
3.73 volts. This is a similar level to that observed with a reduction to 
the RAM supply. At 3.66 volts, memory read errors occurred at the stack 
locations, resulting in the incorrect execution of return instructions. At 
3.46 volts, the system could not send error codes to the terminal. This 
was due to the incorrect reading of the EPROM, and prevented normal
execution. 
Interruption testing revealed comparable failures. The lengths of 
interruptions required to c a u s e initial failures are given in table 2.4. 
As expected, they are similar to those for the RAM, which is the most
susceptible part of the system to this sort of disturbance. 
Low energy fast spikes were applied to the whole system, but even with 
2.5 kV pulses having a 5 ns rise time, no observed failures were produced, 
provided that correct earthing and shielding of the equipment was used. 
Without such an arrangement, errors could be induced. As mentioned above, 
the logic analyser could not be used effectively, to observe the point of 
failure, as it suffered from the interference. However, it could be used 
after the event to identify the final outcome of the fault. 
Without grounding the chassis of the interference simulator,
corruption of the stack pointer, so that it pointed to an address in the
EPROM, was observed. On returning from a subroutine, an arbitrary address
was retrieved, and execution continued from that point. Subsequent calls
attempted to overwrite the current stack position without success, and the
following returns passed execution back to the same location as before.
Corruption of the stack pointer was also observed during interruption
testing on another system. This shows the importance of checking the stack
pointer, or the return address, before leaving a subroutine.
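A check of this kind might take the following form. It is a minimal
sketch only: the memory map ranges and the function name are hypothetical
and are not taken from the test system. In a real controller the check
would be written in assembly language at the end of each subroutine.

```python
# Hypothetical memory map, chosen for illustration only.
RAM_STACK_RANGE = range(0x2000, 0x2100)    # where the stack is allowed to lie
EPROM_CODE_RANGE = range(0x0000, 0x0800)   # where valid return addresses lie


def return_is_plausible(stack_pointer, memory):
    """Check the stack pointer and the return address it selects."""
    # Both bytes of the return address must lie inside the stack area.
    if stack_pointer not in RAM_STACK_RANGE:
        return False
    if stack_pointer + 1 not in RAM_STACK_RANGE:
        return False
    # The 8085 holds the low byte at the lower of the two addresses.
    low = memory[stack_pointer]
    high = memory[stack_pointer + 1]
    return_address = (high << 8) | low
    return return_address in EPROM_CODE_RANGE


memory = bytearray(0x10000)
memory[0x20C0] = 0x34      # low byte of return address 0234H
memory[0x20C1] = 0x02      # high byte
print(return_is_plausible(0x20C0, memory))   # stack pointer in RAM
print(return_is_plausible(0x0100, memory))   # stack pointer corrupted into EPROM
```

A failed check would pass control to a recovery routine rather than
executing the return.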
The cyclic read mode could also be entered as a result of this type of
interference. On another occasion the wait state was entered, and by
applying a TRAP and observing the location to which execution returned, it
was possible to establish the last byte executed before the wait. The
processor had in fact read the operand of a conditional jump, which was
equivalent to the code for a HALT instruction. Again, the repetitive
servicing of interrupts was also observed when the interrupt lines were
allowed to float.
Finally, a few investigations were carried out with the analyser
connected. Although the output was corrupted, a few conditions could be
interpreted. These revealed occasions where the processor misread
instructions. For example, a triple byte instruction was interpreted as
three single byte instructions.
2.5 Significance of the Results
The test programme described above relied on the assumption that the
errors produced under the various forms of interference were
representative of those which do occur in real systems. As with
accelerated life testing, described in chapter 1, the experiments may
reveal mechanisms which do not occur under normal operating conditions.
However, the types of interference used were chosen to be similar to those
expected in the particular British Gas application being considered. The
aim was to simulate naturally occurring events, rather than to induce
failure by altering the environmental conditions.
Another factor which suggests that the failure mechanisms observed
will occur under normal operating conditions is that in many cases a
particular mechanism was observed as a result of different disturbances.
This happened not only with similar interference on different parts of the
circuit, but also with different types of interference. The results for
the low energy fast spikes and the short interruptions are particularly
important. Gradual reductions are less significant because they appear the
same as short interruptions at the instruction level. Although sharp dips
seem to occur as a result of an interruption, the minimum voltage is
maintained for over a millisecond. During this time approximately one
thousand instructions will be executed, and therefore individual
instructions will see the interference as a steady low voltage level.
2.6 Observations of Permanent Failures
Although this work is aimed mainly at transient events, the detection
and recovery processes should not be developed without consideration for
permanent failures. Over the past three years several permanent component
failures have been observed, and these are described in the following
sections.
2.6.1 Processor Failures
Two processor chips have experienced permanent failure, but despite 
this they have not failed completely. Certain parts of the integrated 
circuits still function correctly. Both failures occurred while the 
processors were operating on an Intel 8085 system design kit (SDK) board. 
The first processor appeared to fail for no particular reason and may have
been a random failure. When connected to a system it seems to successively
read through every memory location from 0000 to FFFF and then repeats
continuously, in the same way as the cyclic read mode encountered during
interference testing. All the control signals are correct for a successful
read and the logic analyser confirms that the correct data for each address
goes onto the data bus in the normal way.
This failure mechanism is particularly important when designing a
watchdog timer for a system. It would not seem unreasonable to retrigger
the timer on a certain address in the control program. Then if the system
crashed and execution no longer continued around that address, the watchdog
would reset the system and control would be restored. This arrangement is
proposed by Oppenheimer (74) to recover from transient disturbances to the
power supply. However, if the cyclic read mode is entered, the trigger
address still appears at regular intervals and no reset or alarm would be
set off if the timing of the watchdog is not critical. This state could
continue unnoticed for a considerable time. A complete memory cycle lasts
for approximately 65 ms with a 6.144 MHz crystal, and therefore the
watchdog must be set to a shorter time interval if address triggering is
used. Oppenheimer also suggests that the watchdog could be designed to
generate a non-maskable interrupt. It has been established from the tests
that such an interrupt is not recognised during the cyclic read mode and
therefore cannot enable recovery from this state.
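The figure for the memory cycle time can be checked with a short
calculation. The sketch below assumes that each location is read in one
three-state read cycle, which is an assumption made here and not a
measured value. The function names are illustrative only. The 8085 divides
the crystal frequency by two to obtain its internal clock.

```python
CRYSTAL_HZ = 6.144e6
CLOCK_HZ = CRYSTAL_HZ / 2      # the 8085 runs at half the crystal frequency
STATES_PER_READ = 3            # assumption: one three-state read per location
LOCATIONS = 0x10000            # 64K address space


def memory_cycle_seconds():
    """Time for the cyclic read mode to pass once through the memory map."""
    return LOCATIONS * STATES_PER_READ / CLOCK_HZ


def watchdog_detects_cyclic_read(timeout_seconds):
    """An address-triggered watchdog is retriggered once per memory cycle,
    so it only times out if its period is shorter than that cycle."""
    return timeout_seconds < memory_cycle_seconds()


print(f"memory cycle: {memory_cycle_seconds() * 1000:.1f} ms")
print(watchdog_detects_cyclic_read(0.050))   # 50 ms timeout
print(watchdog_detects_cyclic_read(0.100))   # 100 ms timeout
```

Under this assumption the cycle lasts about 64 ms, which is consistent
with the approximate figure of 65 ms quoted above.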
The second processor was damaged when the power supply failed during
interference testing. Interruptions which result in restoration of the
supply during a peak in the mains cycle cause a sharp spike in the current
drawn. For the power supply used, the spike had a peak amplitude of up to
16 Amps, compared with a normal demand of approximately 300 mA, and was
often sufficient to blow the input fuse. The power supply failed during
interruption testing, and this was probably due to these surge currents.
At the same time an LSTTL dual D flip-flop (74LS74) failed. All functions
of the chip were lost and it took a current of approximately 1 Amp when
attached to a 5 volt supply.
The only damage that appeared to occur to the processor was that one
of the multiplexed address and data lines (AD5) stuck at '1'. This meant
that for each instruction fetched, that particular bit would be read as a
'1'. Therefore only half of the instructions could be read successfully,
but it seemed that for the instruction that the processor had read, the
correct execution followed. The program counter incremented internally in
the normal way, with only the single bit corrupted externally on the
address bus.
Again this failure is important when considering watchdog designs.
One method of resetting the timer is to connect it to a port, and to use
the OUT command. If a single bit stuck at '1' fault occurred which did not
affect the OUT instruction or the port address, then it is possible for
execution to continue in such a way that the watchdog would not generate a
reset or alarm. This processor also entered the cyclic read mode
occasionally, but with the failed bit stuck at '1'.
2.6.2 RAM Failure 
During interruption testing a permanent failure in an 8155 RAM chip 
occurred. Subsequent reading of the device gave a value of 3F at all 
locations. The chip was removed from the circuit and later replaced, at 
which time all locations appeared to be stuck at 00. At this stage the 
device drew a current of over 0.5 amps, compared with a normal consumption 
of about 40 mA. 
The execution of the processor was affected. It operated with bit 7 
stuck at 0, and read the value 3F from unpopulated memory. In the 
resulting execution a HALT instruction was incorrectly interpreted, causing
the processor to enter the WAIT state. During subsequent tests, the 
processor would not respond in any way with the failed device in the 
circuit. 
2.6.3 Crystal Failure
During initial trials of the single board test system regular problems
were encountered in initiating correct operation of the processor. This
problem was particularly evident when the 5 volt supply was instantaneously
applied to the board. By slowing down the rise time of the 5 volt rail the
problem ceased. However, the timing constraints for power on reset were
satisfied in the original state.
Further investigations revealed that the problem was caused by the
crystal. Under certain conditions it would oscillate at 18 MHz, three
times its rated frequency of 6.144 MHz. Once it had started at that
frequency it was necessary to apply a capacitance to the crystal to force
it into its correct operation. A hardware reset had no effect, so a
watchdog timer connected to the reset line on the processor would not
restore correct operation from this failed state.
This problem has been cured by permanently connecting a 20 pF
capacitor from the crystal to ground. The 8085 User's Manual (119)
suggests that this should be done for crystal frequencies below 4 MHz.
Their design for a single board computer does not include the capacitor,
therefore they must consider it unnecessary at 6 MHz. This suggests that
the crystal may have been faulty. However, this fault did not appear when
four other processors were tested. Three of these were manufactured by
NEC, whereas the other one, and the original, were manufactured by Intel.
This seems to indicate an isolated fault in the internal oscillator circuit
of the suspect device. All other functions of the chip operated normally.
2.7 Summary 
The aim of the tests described in this chapter was to identify
failure mechanisms which are likely to occur in digital controllers. The
mechanisms observed fit into two main categories: corruption of data and
disruption in the sequence of program execution. They both occurred under
different types of interference applied to various parts of the circuit,
which suggests that they will occur in real systems.
Corruption of data results from interference to each of the main
elements of a digital system. Disturbances to the RAM allow data to be
destroyed within the device, or during read and write transfers. In the
case of the EPROM and processor, incorrect interpretation of instructions
can result in the wrong data being accessed or the wrong operations being
performed.
Disruption of the sequence of program execution can also originate
from all three devices. Corruption of the stack data in the RAM results in
incorrect returns from subroutines. Misinterpretation of instructions, due
to interference in the EPROM or processor, can result in the execution of
erroneous jump, halt or stack operations. Disruption can also result from
the direct corruption of the stack pointer and the program counter within
the processor. Finally, the cyclic read mode and repetitive servicing of
interrupts both prevent any further meaningful execution.
Both these groups of failure are of great importance in control
systems. The effects of data corruptions have been studied by a number of
researchers, and methods have been developed to detect and correct them.
These consist of the use of recovery blocks, N-version programming,
rollback, time redundancy and reasonableness checks, and are discussed in
chapter 1.
However, the sequence of events following disruption in the flow of
program execution has received less attention, and is studied further in
chapters 4, 5, 6 and 7. It is of particular significance as, without a
resumption of valid execution, the data correction methods mentioned above
cannot function.
The results of the tests have shown that any value of op-code can be
executed by the processor, either as a result of misreading instructions,
or by accessing erroneous addresses. It is therefore necessary to know the
effects of every op-code. This is discussed further in the following
chapter.
It has been indicated that some failure mechanisms can have serious 
implications for the effective operation of watchdog timers. Further 
design considerations for these devices are also presented in the following 
chapter. 
CHAPTER 3
Undeclared Operations of Microprocessors 
3.1 Introduction 
From a reliability point of view it is extremely important to know all
the possible operations of a microprocessor. Without a full knowledge of
their operation it may not be possible to design effective methods to
counteract the results of transient or permanent faults. The manufacturers
provide information about their devices, but this is not comprehensive. An
obvious area of omission is in the declaration of the effects of all
possible operation codes. This is important because the execution of a
program can depart from its normal route, either due to a programming error
or to some external interference. This has been demonstrated in the
practical tests described in the previous chapter.
Other undeclared operations are difficult to reveal. For example, 
the memory cycling mode on the 8085 was found by testing, and could not 
otherwise have been foreseen.
Three manufacturers (Intel, NEC and Motorola) were contacted to see
if they would release any further information other than that which is
readily available, but they were not prepared to do so. Therefore the
information required was only available from independent sources, or had to
be established by experimentation.
3.2 Undeclared Operation Codes 
The full instruction map for most 8-bit microprocessors has a total of
256 possible instruction codes. These take the values 00 to FF in
hexadecimal. For a particular device a certain number of these codes will
be defined by the manufacturer to perform specific tasks, but usually this
does not cover the entire instruction map. The remaining codes are left
undeclared but inherently must operate in some way. An initial reaction
might be to assume that they perform in the same manner as the instruction
called a 'no-operation' (NOP). This is a slightly misleading name because
although no data is altered, the program counter is incremented by one.
Therefore even a NOP causes a change in the overall state of the processor.
Alternatively, if these undeclared codes cause a halt in the execution of
instructions, this also is a change in the overall state.
As the codes are undeclared by the manufacturers there is a
possibility that they may not perform in a logical fashion, or may not be
repeatable even under similar conditions. Also, there is no guarantee that
a particular response on one processor will be observed on another. This
is particularly important where a specific processor is manufactured by
several different companies. In this case it is possible that the chips
may be fabricated using different masks, and it is then highly probable
that the undeclared codes will function differently. For example, it has
been suggested in (118) that Intel and National Semiconductor use the same
masks for the 8080, whereas NEC and AMD have developed independent designs.
This has been established from the operation of the auxiliary carry flag,
which does not always function correctly on the first two manufacturers'
devices.
However, it is believed with the 8085 that Intel have specified, to
other manufacturers, exactly what each code should do, and the codes which
they say are undefined are in fact only undeclared to the final user of the
device. Similar cases may also exist with other processors, but should be
treated with extreme caution as new modifications may depart from previous
arrangements. This has been demonstrated by Nemmour (67) who reports on
differences between 6800 microprocessors manufactured before and after 1977
by the same companies. He suggests that some of the changes were to
correct design errors in the original masks.
The undeclared op-codes of various microprocessors are discussed in
the following sections, along with some other undisclosed functions.
3.3 Operations of the 8085 
The 8085 is a typical 8 bit microprocessor with a 16 bit address bus.
It interprets all operation types from a single byte, and therefore 256
different op-codes exist. Intel only define 246 of these codes, leaving 10
undeclared. The functions performed by the undeclared codes have been
investigated by Dehnhardt and Sorensen (28). Not only do they perform in a
logical way, but they also provide some very useful operations, such as 16
bit additions, subtractions and rotations. The same results can be
achieved using sequences of other instructions, but this involves extra
execution time and memory space.
Also revealed in (28) is that two of the bits in the condition code 
register, which are supposedly undefined, also perform in a logical 
fashion. They state that bit 1 indicates a two's complement overflow, 
whereas bit 5 indicates an unsigned overflow for data changes between 0000
and FFFF, when executing 16 bit increment and decrement instructions.
These flags are used by some of the undeclared instructions. 
This leads to the question of why the codes and flags are not declared
by the manufacturers. The 8085 has close links with both the 8080 and the
Z80, with most of the op-codes performing in the same way. Therefore the
extra codes may have been left undeclared to maintain a high level of
software compatibility between the devices. When asked about the codes,
the manufacturers stated that they could not be guaranteed to work under
all conditions, suggesting that pattern sensitive faults, introduced at the
design or manufacturing stages, may be present.
Dehnhardt and Sorensen (28) suggest that the op-codes and flags can be
used to enhance programming, and it is known that they have been used in
some applications. Clearly this is a dangerous situation if pattern
sensitive faults do exist. Investigations on an Intel 8085 by Buchholz (17)
revealed pattern sensitivity in the overflow flag. During addition and
subtraction, 25 particular operations resulted in the incorrect setting of
the flag. Similar errors were observed with the compare instruction.
As indicated above, processors from different manufacturers, or
different batches, may vary in their response, and for this reason modern
devices were tested to compare with the published results. The undeclared
op-codes were executed on an Intel SDK board. The monitor program,
provided with the kit, allowed the setting of registers and flags prior to
the test, and also the interrogation of their values afterwards. A Dolch
logic analyser, with an 8085 personality pod, was connected to the
processor to enable all external pins to be monitored. Most of the
instructions can be checked without the analyser, especially with a prior
knowledge of their operation. However, it does provide verification of
data transfers, and is particularly useful in monitoring the flow of
execution after conditional jump instructions. These operations are
difficult to monitor with software alone.
Full testing of all the instructions for every possible combination of
data values would take a considerable length of time. For this reason,
tests were carried out both with random data and with data selected to
check specific responses. All ten undeclared codes were executed on an NEC
8085 and responded in the same way as that described by Dehnhardt and
Sorensen. NEC have developed independent designs for the 8080 (118) and the
8035/8048 (see section 3.5), which suggests that Intel may have specified
to other manufacturers how all codes of the 8085 must perform. A nuclear
hardened version of the 8085, described by Kim et al (51), was developed
from information provided by Intel and has all the op-codes defined. This
suggests that all 8085s should operate in the same way.
However, tests were carried out to attempt to reproduce the apparent
malfunctions observed by Buchholz (17). Both NEC and Intel 8085s were
subjected to the same operations which were reported to have incorrectly 
set the proposed two's complement overflow flag. At all times during 
testing the flag was set correctly. This indicates that the errors were 
observed on an isolated faulty component, or that a fault existed in the 
masks of a particular batch which has been corrected on other devices. 
Another undeclared operation, which was discovered during interference
testing, is the continuous cyclic reading of memory. This is described
further in chapter 2, and its implications for reliability are discussed in
section 3.8.1. Due to the complex structure of a microprocessor, other
modes of operation, which have not been discovered, may exist.
3.4 Operations of the 6800 
The 6800 is also an 8 bit microprocessor with a 16 bit address bus.
Again there are 256 different possible operation codes, but only 197 are
defined, leaving 59 undeclared. The functions performed by all the codes
have been studied by Nemmour (67). However, practical tests were carried
out to determine the operation of the undeclared codes, without a prior
knowledge of the published results. The methods used are described in
detail as they can be used in the study of other processors.
3.4.1 Determination of the Undeclared Instructions 
Studying the positions of the undeclared codes, in relation to the
defined instructions in the instruction map, provides a useful starting
point. A number of the undeclared codes are situated in adjacent
locations, suggesting that they may have similar operations but use
different addressing modes. By considering the defined codes alongside
the one under investigation, it is possible to suggest the likely
addressing mode. These suggestions proved correct in the majority of cases
and assisted greatly in the determination of many of the operations.
To check the expected operations provided by the codes, they were
executed on a small 6800 based system. A short assembly program was
written to assist in the investigations, and a full listing is given in
appendix 1. It effectively uses the MIKBUG routines to read in values from
the terminal and to set the registers accordingly, before the execution of
the required op-code. The data read in is stored in successive locations
in memory and then the stack pointer is set to the location above the
block. A return from interrupt instruction is then executed to load the
correct values into the corresponding registers. This ensures that all the
registers and the condition codes can be set to any value. A series of
software interrupt instructions are placed after the op-code to use the
MIKBUG routine to print out the contents of the registers and condition
codes.
This provides a clear indication of any changes that have occurred
within the processor due to the specific op-code. However, it does not
give any indication of external events such as reading and writing to
memory, and is of little use in cases where a jump or branch is generated.
In these cases a logic analyser was used to monitor the states of the
address and data buses, and the read/write and valid memory address lines.
This enabled all external data transfers to be monitored, and clearly
indicated the flow of execution after jump instructions. Without
monitoring the external pins of the processor, it would not have been
possible to establish all the operations performed.
3.4.2 Functions of the Undeclared Codes 
The functions performed by the undeclared codes fit into two main
groups: those which perform totally new operations, and those which perform
identical or similar operations to the instructions already defined. Most
of the codes are similar to the ones specifically defined by Motorola.
They perform roughly the same operation but will manipulate the flags
differently or not change the contents of a register. For example, there is
an add accumulators instruction identical to the defined instruction
except that the half carry flag is not affected.
Some of the codes are identical to defined ones and appear to be due
to the instruction map not being fully decoded in some places. Examples of
this are the four addressing modes for the compare X register instructions.
These are normally codes 8C, 9C, AC and BC, but also appear at CC, DC, EC
and FC. This suggests that bit 6 is ignored when the instructions are
decoded.
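The relationship between the two sets of compare X register codes can be
shown by masking bit 6 of the op-code. The sketch below models only this
one observation and is not a description of the 6800 decoder as a whole.
Other pairs of codes which differ in bit 6, such as the add instructions
for the A and B accumulators, are decoded as distinct instructions. The
names used are illustrative.

```python
# Defined compare X register (CPX) codes and their addressing modes.
DEFINED_CPX = {
    0x8C: "immediate",
    0x9C: "direct",
    0xAC: "indexed",
    0xBC: "extended",
}
BIT_6 = 0x40


def decode_cpx(opcode):
    """Return the CPX addressing mode if the code decodes as CPX once
    bit 6 is ignored, otherwise None."""
    return DEFINED_CPX.get(opcode & ~BIT_6 & 0xFF)


for undeclared in (0xCC, 0xDC, 0xEC, 0xFC):
    alias = undeclared & ~BIT_6 & 0xFF
    print(f"{undeclared:02X} decodes as {alias:02X} ({decode_cpx(undeclared)})")
```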
Some of the codes are substantially different and appear to perform
useful tasks; however, these functions can also be performed by two or more
of the defined instructions. For example, there is an add accumulator to
the complement of memory instruction. It works for both accumulators, and
for all four addressing modes. All the flags except for the half carry and
the interrupt mask are affected by the result of the operation. Another
useful instruction performs a logical AND on the two accumulators and puts
the result in the A register; three of the flags are affected. A similar
instruction affects the flags but does not change the contents of the
accumulators.
Store immediate operations exist for the A, B and X registers and for
the stack pointer. To be consistent with the load immediate instructions
they should store the data in the memory locations immediately following
the instruction, but this does not occur. Instead, the first byte is
skipped and the data is written to the following locations. However, the
program counter is adjusted accordingly so that the next instruction is
read from the location immediately after the one into which the last data
byte is written. This effectively makes the store A and B registers into
triple byte instructions, and the store X register and stack pointer into
instructions with four bytes. But in all cases only one byte is read.
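The resulting layout in memory can be set out as a small calculation.
This is a sketch of the behaviour as described above, with a hypothetical
function name and example address. It assumes that one data byte is
written for the A and B registers and two for the X register and stack
pointer.

```python
# Number of data bytes written by each store immediate operation.
DATA_BYTES = {"A": 1, "B": 1, "X": 2, "SP": 2}


def store_immediate_layout(opcode_address, register):
    """Return (skipped address, addresses written, next program counter,
    effective instruction length in bytes)."""
    skipped = opcode_address + 1                 # first byte after the op-code
    written = [opcode_address + 2 + i for i in range(DATA_BYTES[register])]
    next_pc = written[-1] + 1                    # location after last data byte
    return skipped, written, next_pc, next_pc - opcode_address


for register in ("A", "X"):
    skipped, written, next_pc, length = store_immediate_layout(0x0100, register)
    print(register, f"{skipped:04X}", [f"{a:04X}" for a in written],
          f"{next_pc:04X}", length)
```

The effective lengths of three and four bytes follow directly from the
skipped byte and the number of data bytes written.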
3.4.3 Cycling Through Memory
Four of the undeclared op-codes cause the processor to cycle through
memory indefinitely. This state is of particular importance when
considering reliability. It means that if one of these op-codes is
inadvertently executed, either due to an error in programming or to some
external interference, then the processor will 'lock-up' and will not
execute any further instructions until some external intervention is
initiated.
Operation codes 9D and DD cause the processor to read through memory
starting at the direct address following the code. Once in this state it
will not respond to either a non-maskable interrupt (NMI) or an interrupt
request (IRQ), even if the interrupt mask is cleared beforehand. The only
way of leaving this state is to exert a full reset on the processor. The
contents of the A, B and X registers are not altered from the state that
they were in before the op-code was executed. This was determined by
generating an interrupt immediately after the reset. Unfortunately the
interrupt will not occur until after the first instruction has been
executed. In the system used the first instruction loads the stack 
pointer, and therefore its contents at the time of the reset could not be 
determined. 
Those condition codes which were not affected by the first instruction
or the reset remained in the same state that they were in originally.
This suggests that no change occurs in any of the internal registers while
the processor is cycling through memory. Therefore the only data lost are
the contents of the program counter and the state of the interrupt mask,
which are both set by the reset sequence. The contents of the registers
will not be of great use after the reset, as some unforeseen sequence of
instructions will have been executed before the undeclared op-code was
reached. However they may give some sort of indication of how that
particular state was entered.
The result of executing operation codes 3C and 3D is similar to that
obtained by the codes 9D and DD, in that the processor ends up cycling
through memory reading successive locations. It differs in that, after
executing the code, it saves the address of the next byte onto the stack
before reading the next location on the stack. It then rereads the previous
location and starts cycling through memory from the top of the stack.
While in this state the processor will not respond to NMI or IRQ, as
before. Again the only means of leaving this state is by a reset. Nemmour
(67) suggests that this is due to the way in which interrupts function.
They do not respond until the completion of an instruction, and therefore,
because these operations never finish, no interrupts can be initiated.
However, the B and X registers are not changed from the state that
they were in before the undeclared op-code was executed, but the A register
is changed. Bits 1-7 are cleared while bit 0 remains unaffected. Again,
it was not possible to determine the value of the stack pointer after the 
reset. If it remains unaltered then the address at the top of the stack 
will point to the byte immediately after the illegal op-code that was 
executed. This is a very important point when attempting to diagnose the 
original fault, and could prove very useful. 
The reason for these modes of operation is unclear, but it is 
believed that they may be for testing purposes. Hayes and McCluskey (43) 
propose a test sequence for the 8080 which starts by executing NOPs 
repeatedly. This is designed to reveal faults on the address bus. 
However, the cyclic read mode is not only suitable for revealing address 
bus faults, but can also indicate data bus and memory failures. 
3.4.4 Comparison with Published Data 
The investigations by Nemmour (67) were carried out in a similar 
manner, but in addition he studied the masks to enable cross checking with 
practical tests. Devices from different manufacturers (SESCOSEM and
Motorola) were used; however, these are constructed from identical masks.
In all cases the instructions operated in the same manner as the
independent investigations described above. This shows consistency between
devices from the same manufacturer, but variations may be obtained if
different masks have been developed. Again, it is unclear why these
operations are not disclosed. Design or manufacturing difficulties could
have caused problems, and these may have been corrected subsequently.
Nemmour reveals several changes that were made to the masks in 1977;
some of these were to correct initial errors. For example, on original
devices, the application of a non-maskable interrupt, during certain cycles 
of the execution of a software interrupt, caused the servicing of the 
maskable interrupt routine. This sort of fault is particularly difficult 
to locate, and others of a similar nature may exist. 
3.5 Operations of the 48-series Microprocessors
The 48-series microprocessors are also 8 bit devices but have a very
different architecture from the 8085 and 6800. They consist of a central
processing unit, 27 I/O lines, a single interrupt and an internal
timer/counter. In addition to this a quantity of internal read only and random
access memory is provided, the size of which depends on the particular 
device, and is given in table 3.1. The processors are designed for small 
scale control applications where the final program would reside in one of 
the ROM based chips. The other devices are primarily for use in the 
development and debugging stages. 
The address bus is 12 bits wide allowing a maximum possible address 
range of 4K bytes. The program counter is however only 11 bits long, and 
effectively splits the memory map into two separate blocks. Access to each 
area is controlled by software which can alter the most significant bit of 
the address bus. The internal RAM is not accessed by the main bus, and its
contents can only be treated as data; no instruction fetches can be made
from it. Therefore the normal arrangement is to locate the program within 
the 4K address range, and to use the internal RAM for data storage. 
However, fixed data values can be stored in the main memory map, but they 
are less easily accessed. 
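
The split of the 4K memory map into two blocks can be illustrated with a minimal sketch. It assumes only what is stated above: an 11-bit program counter and a software-controlled most significant address bit. The function name and constants are hypothetical and are not taken from any manufacturer's documentation.

```python
PC_BITS = 11                 # program counter width given in the text
BLOCK_SIZE = 1 << PC_BITS    # 2048 bytes in each block of the memory map


def effective_address(bank_bit: int, pc: int) -> int:
    """Combine the software-controlled top address bit with the 11-bit PC.

    Hypothetical helper: it models the 12-bit address described in the text,
    not the internal logic of any particular device.
    """
    return ((bank_bit & 1) << PC_BITS) | (pc & (BLOCK_SIZE - 1))


# The same program counter value maps to a different location in each block.
lower = effective_address(0, 0x123)
upper = effective_address(1, 0x123)
```

Here the two results differ only in bit 11, which is the bit that the program counter alone cannot change.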
External memory devices can be attached to the processors to 
supplement the internal memory. Alternatively, devices can be mapped to 
the same locations as the internal ROM, and the processor forced to access
them instead. This can be used in the development stage, or to provide an 
alternative program, such as for testing purposes. 
The processors interpret the instruction type from 8 bits, and 
therefore 256 possible op-codes exist. Only 230 are defined, leaving 26 
undeclared. No published work on the undeclared operations of these 
devices has been found, and therefore investigations were carried out to 
determine the effects of executing the undeclared codes, and to discover 
other undisclosed functions. Full details of these studies are given in 
the following sections. 
3.5.1 Undeclared Memory in the 8035 
In all published literature, the major manufacturers state that the 
8035, 8039 and 8040 have no internal ROM. However, it was suspected that
this might not be the case, and attempts were made to read internal memory
of 8035s as if they were 8048s. For nine devices from three different
manufacturers (Intel, NEC and National Semiconductor) a logical program of
up to 1K was revealed. The Intel 8035 contained a games program which read
9 bits of parallel data from port 1 and test input T1, and used bit 7 of
port 2 and test input T0 for transmitting and receiving serial data. It is
therefore clear that the 8035 is in fact an 8048 but sold under a different
name. When approached on this matter, Intel did admit that they are the
same device, and that 8048s which do not operate at the required speed, or
have faults in the ROM, are sold as 8035s.
This fact raises two important points. Firstly, any details of the 
undeclared codes of the 8035 will relate directly to the 8048. Secondly, 
the existence of an internal program might have serious consequences with 
respect to reliability. The internal program is disabled by holding the 
external access (EA) pin high, but an internal chip failure could cause the 
pin to be disabled resulting in bus conflict or the correct execution of 
the internal program. This could result in a dangerous sequence of signals 
appearing at the ports and could mislead any external hardware monitoring 
the state of the system. 
The external access pin does not operate in the same way for all 
8035s. If allowed to float, the Intel chip accesses the external memory, 
whereas the NEC chip accesses the internal memory. For the National
Semiconductor device access to both memories appears to occur. Normally the
pin would be tied high or low, but an internal wire bond failure, due to
thermal stress or vibration, could cause it to float. For this type of
failure, whether a particular device continues without error depends on which
memory contains the main control program.
3.5.2 Determining the Undeclared Instructions 
In order to determine the operation of the undeclared codes, a small 
8035 based system was constructed. A block diagram of the system is shown 
in figure 3.1. It consists of the processor, an 8-bit latch and a 2K 
EPROM. The latch is necessary in order to demultiplex the address and data 
bus. An EPROM emulator was used to enable quick and easy modifications to 
the program being run. 
The software used to investigate each code is given in appendix 1. 
The program outputs the contents of the accumulator onto port 1, executes 
the undeclared op-code and then re-outputs the accumulator to port 1,
before incrementing the accumulator and restarting. All unused memory is
set to 04; this causes a jump to address 004 if an attempt is made to
execute outside the program. This method is also used to recover execution 
after the undeclared code. A subroutine call during each cycle is included 
to monitor the state of the stack. 
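
The effect of filling unused memory with 04 can be checked with a short sketch. It assumes the standard 8048 encoding of the JMP instruction, in which the upper three bits of the op-code supply address bits 8-10 and the following byte supplies bits 0-7. The ROM image here contains only the fill value; the actual test program of appendix 1 is not reproduced, and the function name is hypothetical.

```python
FILL = 0x04        # value written to all unused memory, as described above
ROM_SIZE = 2048    # 2K EPROM used in the test system


def jmp_target(rom: bytes, addr: int) -> int:
    """Decode an 8048 JMP at addr and return its target within the 2K block."""
    opcode = rom[addr]
    if opcode & 0x1F != 0x04:
        raise ValueError("byte at this address is not a JMP op-code")
    operand = rom[(addr + 1) % len(rom)]
    # Op-code bits 7-5 hold address bits 10-8; the operand holds bits 7-0.
    return ((opcode >> 5) << 8) | operand


rom = bytes([FILL]) * ROM_SIZE
targets = {jmp_target(rom, addr) for addr in range(ROM_SIZE)}
```

Because every byte of the fill is identical, the decoded target is the same regardless of where in the unused area execution begins, which is what makes the method suitable for recovering execution.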
A logic analyser was used to monitor the state of the ports and bus. 
The clock output on the T0 pin was used as the clock input to the logic
analyser, causing one sample to be taken during each T state. This is 
equivalent to five samples during each program cycle. In this way it was 
possible to determine the number of bytes associated with each code and the 
number of cycles it took to execute. Any effects on the ports, bus or 
accumulator could also be seen. 
As the processor is designed to be used as a single chip controller, 
many of the instructions result in only internal actions, and cannot be
observed externally. In order to establish internal operations, further 
investigations were carried out using a Prompt 48 microcomputer design aid. 
This allows programs to be executed from RAM and enables access to all of 
the internal registers and flags. Using this system it was possible to 
reveal any internal effects of the undeclared codes. 
3.5.3 The Effects of Executing the Undeclared Codes 
A detailed list of the effects of executing each of the undeclared op-
codes is given in appendix 2. Unlike the 8085 and 6800, in this case, 
processors from different manufacturers give different results. Devices 
from the three manufacturers of Intel, NEC and National Semiconductor, were 
studied. The results from the National Semiconductor 8035/8048 were 
identical to those from Intel, and therefore have not been included in the 
detailed descriptions in the appendix. It seems to be the case that 
National Semiconductor do not produce independent designs for their 
devices, and this is supported in (118). 
The full instruction maps, including the undeclared codes, for both 
the Intel and NEC devices are given in figures 3.2 and 3.3. Descriptions 
of each of the operations of the undeclared codes are given below. 
3.5.3.1 Intel 8035/8048 
Figure 3.2 shows that, for the Intel chip, out of 26 undeclared codes,
17 perform a No-Operation, 4 cause a jump in execution and the remaining 5
affect the input/output lines. Three of the jump instructions are logical 
extensions to the standard instruction set. They are conditional on a 
particular flag being clear and, in the instruction map, they are adjacent
to their corresponding jump, conditional on the flag being set. The fourth 
additional jump instruction is unconditional and branches to an address 
within the current page. This is not provided for directly in the standard 
instruction set, and enables program modules to be relocated on a different 
page without modification. 
Four of the additional I/O instructions are identical to codes in the 
standard instruction set. They are copies of the four operations involving 
port 2, and each one is adjacent to its copy in the instruction map,
suggesting that bit 0 is not used in decoding these instructions. The
fifth code involving the I/O lines has the value 38. By considering the
adjacent locations in the map, an OUTL BUS,A instruction would be expected,
which outputs the contents of the accumulator to the Bus. This undeclared 
code does take two machine cycles to execute, which is necessary for an I/O 
function, but no read or write signal is generated to perform a correct bus 
operation. The value 00 does appear on the Bus during T4 of the second 
machine cycle, but this does not seem to perform a useful task. No other 
part of the processor appears to be affected. 
3.5.3.2 NEC 8035/8048 
Most of the undeclared codes for the NEC device are the same as those 
already described above. All the jump and I/O operations are the same, but 
six of the No-Operations have been replaced by useful instructions. Four 
of these fit logically into the instruction map and perform functions not 
previously provided. They fill the gaps for the indirect addressing modes 
of the decrement, and the decrement and jump if not zero instructions,
which are omitted from the standard instruction set. There does not seem
to be any logical reason why these instructions should be omitted. Errors 
during initial development of the processor may have caused problems which 
have now been corrected by NEC. 
Two of the instructions perform functions totally unrelated to those 
already defined. One has the effect of clearing the upper nibble of the 
accumulator (bits 4-7). The other loads the accumulator with the lower 8 
address bits of the next sequential instruction to be executed. The first 
instruction is useful when manipulating nibbles, whereas the second could 
be useful when debugging a program. In the latter case this code could be 
placed in several locations in a program followed by an output to a port. 
Then by monitoring the port it would be possible to trace execution past 
these points. 
3.5.4 Other Devices in the Series 
All investigations were carried out on the 8035/8048. The only 
declared difference, with the other devices in the series, is the size of 
the internal memory. These chips are therefore likely to have similar 
properties. For example, the undeclared op-codes are expected to function 
in the same way as those in the 8035/8048, and the devices which are
defined as having no internal ROM are expected to have internal program 
memory. 
3.6 Operations of the 68000 
The discussion up to now has been directed towards 8 bit microprocessors,
but it is now worth mentioning the Motorola 68000, which has a
16 bit internal architecture, and a 24 bit address bus. The type of 
operation performed is determined from a full 16 bit data word, and 
therefore the total number of possible op-codes is much greater than for an 
8 bit machine, and is in fact 65,536. Obviously with such a large number
of possible codes, there will be a substantial quantity which are not
defined. The 68000 has 56 basic functions, but with all the addressing
modes and register references approximately 45,800 op-codes perform defined
operations leaving over 19,700 unused. However, the processor has been
designed to signal an exception if it detects the illegal execution of any 
of these codes. This effectively means that each one of them acts as if it 
were a software interrupt. 
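
The figures quoted above can be checked with a few lines of arithmetic. The defined op-code count is the approximate value given in the text, so the results are equally approximate; the variable names are illustrative only.

```python
TOTAL_OPCODES = 1 << 16      # a full 16 bit op-code word
DEFINED_OPCODES = 45_800     # approximate figure quoted in the text

unused = TOTAL_OPCODES - DEFINED_OPCODES
unused_fraction = unused / TOTAL_OPCODES

print(unused, round(unused_fraction, 3))   # prints: 19736 0.301
```

On these figures roughly three in every ten randomly chosen 16 bit words would be reported as illegal, which gives an indication of how quickly erroneous execution is likely to be detected.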
A study of the full instruction map reveals that the unused codes 
appear in isolated locations as well as large groups, some up to 4K. The 
manufacturers state that codes in the large groups may be used in later 
designs. For this reason they cause the execution of a different 
exception handling routine from the other codes, if an attempt is made to 
execute them. This allows the emulation of new instructions on the 
original devices. It was felt that if any of the unused codes were going 
to perform undeclared operations, then the isolated ones would be the most 
likely to do so. For this reason, a number of the codes were executed on a 
small 68000 based single board computer. In all the cases that were tried, 
a correct response from the exception handling logic was observed. No 
unusual operations were revealed. 
As well as the detection of unused op-codes, internal logic is 
provided to detect other erroneous states, such as an attempt to perform an 
instruction fetch from an odd address. The processor is also specifically 
designed to have external logic to detect unsuccessful memory transfers. 
All these checking modes are important from a reliability point of view. 
They reduce the probability of executing a large number of erroneous 
instructions before detection. 
Although this is an advantage in the detection of errors due to 
transient faults, the permanent failure rate will be higher than that for 8 
bit processors due to the increased complexity of the chip. This may also
increase the susceptibility to transients. 
3.7 Operations of the 6809 and Z80 
The 6809 and Z80 are both 8 bit microprocessors, but they differ from 
those described above. In some cases the instruction type is not 
established from 8 bits alone. This allows the possibility of undeclared 
op-codes at different levels. The 6809 is a modified version of the 6800,
and has a similar instruction set with the majority of the instructions, at
the first level, still having the same operation codes. This is
particularly evident in the ranges 20-2F and 40-FF where nearly all the
codes are the same.
An interesting point is that two of the previously undeclared codes
are replaced with instructions which would logically be expected from
looking at the memory map. Code 21 has been programmed to execute a branch 
never instruction, which is the logical opposite of code 20, the branch 
always instruction. Code 9D executes a jump to subroutine using direct 
addressing. This was a previously omitted form of addressing in calling 
subroutines, and fits in between the other addressing modes. Nemmour (68) 
has identified 498 undeclared op-codes, at different levels, in the 6809;
four of these cause the cyclic reading of memory in the same way as those 
described for the 6800 and 8085. 
No further studies were carried out on the 6809 and Z80. They have 
been mentioned here to indicate possible problem areas for other processor 
architectures. 
3.8 Implications of the Undeclared Operations on Reliability 
Undeclared operations of microprocessors can be divided into two broad 
categories: those which occur as a result of executing an undeclared
op-code, and those produced by other mechanisms. Most of the undeclared codes
in microprocessors operate in a similar way to the declared instructions, 
and therefore their importance does not differ significantly from the 
erroneous execution of defined instructions. However, there are some codes 
which operate very differently, and are of great significance. These 
result in the processor cycling through memory and prevent the execution of 
further instructions until a reset. Therefore some sort of watchdog timer 
must be included in a system to recover from these states. 
3.8.1 Significance for Watchdog Design 
When considering the design of a watchdog timer the following two 
points should be noted. Firstly, the highest level of fault recovery must 
initiate at least a full reset, and secondly the address lines alone should 
not be used to trigger the timer. The second point is particularly 
important in the 6800, which uses the address lines for calculating branch
and indexed addresses. The triggering must incorporate the write signal 
which is never present unless a valid write operation is being performed. 
The existence of memory cycling can be considered as an advantage or a 
disadvantage depending on the application. If the accuracy of a system is 
more important than its timing, then this mode of operation would be an 
advantage as it has the effect of suspending execution, preventing any 
further output. On the other hand if timing is more important, the 
considerable amount of time which could elapse between the occurrence of 
the fault and the detection of the error by the watchdog, would be a 
disadvantage. Further time could be lost resetting the system and 
reinitialising variables. 
However, even in the first case, the major drawback is that recovery 
has to be initiated by some hardware; if the timer fails the whole system
fails. A better solution would be to attempt to detect and correct errors 
under program control and only rely on external hardware when this approach 
fails. This cannot be achieved with these particular codes, therefore the 
only method of providing a back-up procedure in the case of a watchdog 
failure is to include further hardware to monitor its operation. 
The existence of internal memory in the 8035, when being used for
control purposes, is also important for watchdog design. A chip failure,
such as a wire bond fracture or an internal short, can result in the
execution of the internal program. It may then operate in such a way that
the errors go undetected. This is possible, as the devices are used in I/O
intensive situations and therefore the ports, to which a watchdog would be
connected, will probably be highly active. If a simple triggering sequence
is used with non-critical timing the internal program could generate
signals which would satisfy the timer. However, a complex triggering 
sequence will reduce the likelihood of non-detection of this type of 
failure. 
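
The idea of a complex triggering sequence with critical timing can be modelled in software. The following is a minimal behavioural sketch and not a description of any circuit used in this work: the class, its parameters and the trigger values are all hypothetical. The watchdog trips if a trigger carries the wrong value, arrives too early, or arrives too late.

```python
class SequenceWatchdog:
    """Behavioural model of a watchdog needing ordered, timed triggers."""

    def __init__(self, sequence, min_gap, max_gap):
        self.sequence = tuple(sequence)   # expected trigger values, in order
        self.min_gap = min_gap            # earliest allowed spacing in ticks
        self.max_gap = max_gap            # latest allowed spacing in ticks
        self.index = 0
        self.last_tick = 0
        self.tripped = False              # latched until an external reset

    def trigger(self, value, tick):
        """Present a trigger value at the given time."""
        if self.tripped:
            return
        gap = tick - self.last_tick
        wrong_value = value != self.sequence[self.index]
        wrong_time = not (self.min_gap <= gap <= self.max_gap)
        if wrong_value or wrong_time:
            self.tripped = True
            return
        self.index = (self.index + 1) % len(self.sequence)
        self.last_tick = tick

    def poll(self, tick):
        """Return True if recovery should be initiated at this time."""
        if tick - self.last_tick > self.max_gap:
            self.tripped = True
        return self.tripped
```

A program which merely toggles a port at arbitrary times, as an unintended internal program might, would fail either the value check or the timing window in this model, whereas a single-value trigger with a generous timeout would be satisfied far more easily.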
3.8.2 Powering down to Enable Recovery 
It has been shown by the crystal failure that it may be necessary to 
provide a level of recovery which goes further than a reset and actually 
powers down the system before powering up in a controlled manner. This is 
because with the crystal oscillating at three times its natural frequency 
the application of the reset has no effect and the power has to be removed 
before correct operation will resume. This situation has been cured by the 
addition of a small capacitor, but does at least demonstrate that it is not 
always sufficient just to apply a reset. 
3.8.3 Use of Non-Maskable Interrupts 
The memory cycling mode has shown that non-maskable interrupts should 
not be used to initiate recovery. However, if they are used for other 
purposes, great care must be exercised in their handling. A noisy signal 
on the input can cause multiple interrupts and result in a large quantity 
of data being stored on the stack, which may result in overflow and the 
overwriting of critical data areas. It is therefore advisable to reset the 
stack at the beginning of the routine, if the return address is not 
required, or to at least check that the stack pointer is within certain 
limits. 
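
The stack pointer check suggested above can be sketched as follows. This is an illustrative model only, assuming a stack which grows downwards from a known top address towards a lower limit; the function name and the addresses in the example are hypothetical. In a real system the check would be made in the first few instructions of the interrupt routine.

```python
def check_stack_pointer(sp, stack_top, stack_floor):
    """Return (stack pointer to use, whether the original value was valid).

    If sp lies outside the region reserved for the stack it is replaced by
    stack_top, which discards the return address.
    """
    if stack_floor <= sp <= stack_top:
        return sp, True
    return stack_top, False


# Example with an assumed stack region of 00C0 to 00FF (hexadecimal).
sp_ok, valid_ok = check_stack_pointer(0x00F0, 0x00FF, 0x00C0)
sp_bad, valid_bad = check_stack_pointer(0x00B0, 0x00FF, 0x00C0)
```

When the check fails the return address is lost, so the routine would have to continue at a defined recovery point instead of returning to the interrupted program.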
3.8.4 The Most Important Undeclared Operations 
The failure modes which present the major threat to the integrity of a 
system are those which have not been discovered and cannot be foreseen. Any
amount of time can be spent designing against the effects of known or
expected failure modes, but inevitably it is the unknown modes which cannot
be designed against fully. It is hoped that high level detection 
mechanisms, such as watchdogs, will allow recovery from these types of 
failure. 
3.9 Summary 
This chapter has shown that microprocessors perform a number of 
operations which are not declared by the manufacturers. Some of these can 
have serious consequences in the design of error detection and correction 
techniques, and therefore a knowledge of these modes of operation is 
necessary in order to achieve high reliability. 
A common failure mode observed during the interference testing, 
described in chapter 2, was a transfer of program execution to a
non-specific memory location. This can result in instruction fetches from data
areas or operand fields, and any value of op-code can be read. The 
functions performed by executing each op-code have been determined for the 
8085, 6800, 8048 and 68000, either from published data or from practical
tests. With a knowledge of all the op-codes it is possible to predict the
flow of execution in different memory areas, after an erroneous jump, and
this is discussed in detail in chapters 4, 5, 6 and 7. From these studies
it is possible to design more effective error detection and recovery 
processes. 
It has been established that the undeclared operations do not always 
function in the same way in devices from different manufacturers, and 
changes can occur between different revisions of the masks. Therefore the 
results may not always be consistent between any two devices. It has been 
suggested by Nemmour (67), and by Dehnhardt and Sorensen (28), that the
undeclared codes can be used to enhance programming, but this would seem to 
be a very dangerous practice. 
CHAPTER 4 
Erroneous Execution in Data Areas 
4.1 Introduction 
During the practical tests on the small single board system, described 
in chapter 2, it was shown that corruption to the normal flow of execution
could be generated by applying different types of interference to
particular parts of the circuit. The three main elements on the board,
consisting of the processor, EPROM and RAM, could each cause such a
failure. Although the particular failure mechanism is different in each
case, they can be divided into the two main categories of the
misinterpretation of instructions, and the incorrect return from
subroutines. 
The misinterpretation of instructions occurs either due to the 
incorrect transmission of data from the EPROM, or to the corruption of the
program counter within the processor resulting in the wrong bytes being 
read. Incorrect returns from subroutines occur by the corruption of 
either, the stack pointer within the processor, or the stack data stored in 
the RAM. 
The tests therefore show that this class of failure is likely to occur 
in real systems under certain types of interference. It is particularly 
important because without knowledge of the behaviour of a system after such 
a failure, it is not possible to effectively design hardware or software
methods to detect and correct system operation. For example, a common
solution to this problem is to attach a hardware watchdog timer but, as
will be shown later, without careful consideration to the design, certain 
failures will not be detected by the circuit. 
4.1.1 Random Jump Within the Memory Map
When program execution departs from its predefined sequence it must 
continue at some other location. For the purpose of the following analysis 
it is assumed that the location is random within the full memory map. 
Therefore the failure mode being investigated is equivalent to the 
erroneous execution of a jump instruction to a random address. For a 
typical 8-bit microprocessor with a 16-bit address bus this gives a 
possible 65,536 different locations at which execution could resume after
the fault. 
However, certain parts of the memory map have different properties 
dependent on the type and sequence of values which are read when various 
locations are accessed. In the following sections three main categories 
are studied: program areas, data areas and unused areas. The
effect of memory mapped input and output is also considered. This chapter
studies the flow of execution after a random jump into a data area. 
4.2 Analysis of Execution 
If as a result of an error execution resumes in a data area, the 
processor will interpret the data as instructions and perform the
corresponding operations. Obviously the type of data and its arrangement 
in a particular block will depend very much on the application and method 
of programming, and in the case of random access memory, will change during 
execution of the program. Therefore to analyse this type of execution for 
the general case it is necessary to assume that the sequence of bytes is 
totally random. 
From this it follows that when a data byte is interpreted as an 
instruction one of two possible outcomes will be performed. Either, a jump 
will be generated causing control to pass to another part of the memory 
map, or a non-jumping instruction will be interpreted passing control to
the next logical byte. For the latter case the whole process repeats
again. If $P_J$, the probability of interpreting a jump instruction, is zero
then execution would continue to the end of the data block. However,
assuming random data, $P_J$ will be dependent on the particular instruction
set of the processor, and is given by:-
\[ P_J = \frac{N_J}{N_T} \]    Eqn. 4.1
Where:- $N_J$ is the number of bytes which cause a jump or branch.
$N_T$ is the total number of possible op-codes (256 for a normal 8-bit
processor).
Clearly, $P_{NJ}$, the probability of interpreting a non-jumping instruction, is
given by:-
\[ P_{NJ} = 1 - P_J \]    Eqn. 4.2
It follows that $P_J(K)$, the probability that K instructions will be
executed before control passes to another part of the memory map, can be
obtained from:-
\[ P_J(K) = P_{NJ}^{(K-1)} \cdot P_J \]    Eqn. 4.3
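
Eqn. 4.3 describes a geometric distribution, and a minimal sketch of it is given here for illustration. The function names are hypothetical and the value of the jump probability used in the example is arbitrary; it is not a result for any of the processors studied. The second function gives the cumulative form, that is the probability that K instructions or fewer are executed before the jump.

```python
def p_exactly(k: int, p_j: float) -> float:
    """Eqn. 4.3: k-1 non-jumping instructions followed by one jump."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return (1.0 - p_j) ** (k - 1) * p_j


def p_at_most(k: int, p_j: float) -> float:
    """Sum of Eqn. 4.3 for 1..k, which reduces to 1 - (1 - p_j)**k."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return 1.0 - (1.0 - p_j) ** k


# Arbitrary illustrative value, not taken from appendix 3.
P_J_EXAMPLE = 0.25
first_two = [p_exactly(k, P_J_EXAMPLE) for k in (1, 2)]
```

With this example value the first two terms are 0.25 and 0.1875, and their sum agrees with the cumulative form for K = 2.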
An important quantity, which will be used later in chapter 8, is the
average or expected number of instructions which will be executed before
the jump, $N_{IAV}$, and is given by:-
\[ N_{IAV} = \sum_{K=1}^{\infty} K \cdot P_J(K) \]    Eqn. 4.4
It is useful for determining both the time taken to initiate recovery, and
the probability of corruption of specific data. The average number of
bytes read, $N_{BAV}$, will be greater than $N_{IAV}$ because each instruction
interpreted can consist of one or more bytes; therefore $N_{BAV}$ will be given
by:-
\[ N_{BAV} = (N_{IAV} - 1) \cdot \sum_{L=1}^{\infty} \frac{L \cdot N_{LNJ}}{N_{NJ}} + \sum_{L=1}^{\infty} \frac{L \cdot N_{LJ}}{N_J} \]    Eqn. 4.5
Where:- $N_{LNJ}$ and $N_{LJ}$ are the numbers of bytes interpreted as non-jumping
and jumping instructions of length L.
$N_{NJ}$ and $N_J$ are the total number of bytes interpreted as non-jumping
and jumping instructions.
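
Equations 4.4 and 4.5 can be evaluated with a short sketch such as the one below. It is not the FORTRAN program used in this work. Because Eqn. 4.3 is a geometric distribution, the sum in Eqn. 4.4 reduces to 1/P_J, and the sketch checks this numerically. The op-code counts in the example are invented purely for illustration and do not correspond to any real instruction set or to appendix 3.

```python
def n_i_av_by_sum(p_j: float, terms: int = 10_000) -> float:
    """Eqn. 4.4 evaluated as a truncated sum over K."""
    return sum(k * (1.0 - p_j) ** (k - 1) * p_j for k in range(1, terms + 1))


def mean_length(counts: dict) -> float:
    """Average instruction length; counts maps length L to number of op-codes."""
    total = sum(counts.values())
    return sum(length * n for length, n in counts.items()) / total


def n_b_av(p_j: float, non_jump_counts: dict, jump_counts: dict) -> float:
    """Eqn. 4.5: (N_IAV - 1) non-jumping instructions, then one jump."""
    n_i_av = 1.0 / p_j   # closed form of Eqn. 4.4
    return (n_i_av - 1.0) * mean_length(non_jump_counts) + mean_length(jump_counts)


# Invented instruction set: 224 non-jumping and 32 jumping op-codes.
NON_JUMP = {1: 150, 2: 50, 3: 24}
JUMP = {1: 8, 3: 24}
p_j = sum(JUMP.values()) / 256          # 0.125
example_n_i_av = 1.0 / p_j              # 8 instructions
example_n_b_av = n_b_av(p_j, NON_JUMP, JUMP)
```

For these invented counts the expected number of instructions is 8 and the expected number of bytes read is 12.5625, the latter being seven non-jumping instructions of average length 1.4375 bytes plus one jump of average length 2.5 bytes.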
To assist in the calculation of both $N_{IAV}$ and $N_{BAV}$, a short FORTRAN
program was written. It requests a number of details about the particular
instruction set and then calculates the values using equations 4.4 and 4.5.
$N_{BAV}$ is used later in chapter 7 when considering the flow of execution
as it passes between different parts of the memory map. The following 
section looks at a few microprocessors and goes through the necessary steps 
to calculate the above quantities. 
4.2.1 Response of Different Processors 
To determine the expected response for a particular processor it is 
necessary to make a detailed study of the instruction set. For execution 
in the data area the important instructions are those which cause a jump or 
transfer of program execution. Appendix 3 lists a number of parameters for 
the 8085, 6800, 8048 and 68000 microprocessors; it includes the effects of
the undeclared codes. The importance of chapter 3 in determining the 
undeclared op-codes is now clear, as without the knowledge of them 
inaccurate results would be obtained. 
The instructions are divided into two groups, those which always cause 
a jump and those which are conditional on some internal state of 
the processor. To include the properties of the conditional jump 
instructions, equation 4.1 is modified to:-
\[ P_J = \frac{N_J + \sum_{i=1}^{N_{CJ}} P_{CJ}(i)}{N_T} \]    Eqn. 4.6
Where:- $P_{CJ}(i)$ is the probability that the ith op-code of $N_{CJ}$ conditional
instructions causes a jump.
For ease of calculation it is assumed that there is a 50% chance that 
a jump will occur. Although this is not strictly true in individual cases, 
overall the assumption is valid. This is because in most cases the 
instructions have a logical pair which tests the inverse state of a 
particular condition, and therefore any variations in the probabilities 
will be cancelled out. In this case equation 4.6 simplifies to:-
\[ P_J = \frac{N_J + 0.5 \cdot N_{CJ}}{N_T} \]
However in a few cases a different approach was adopted. For the 8048 
decrement and jump if not zero instructions, it is assumed that they always 
cause a jump. Provided that the contents of the particular register 
concerned is random, then there is only a 1 in 256 chance that the jump 
will not occur. They are therefore grouped together with the other jump 
instructions. 
Special treatment has been given to the 68000 instruction set. This 
is due to the fact that most of the instructions which can cause a transfer 
of control have several possible outcomes. For example, the branch on 
condition code instruction first makes a test and if not true, no branch 
occurs. This is assumed to have a probability of 0.5 for the same reasons 
as above. If a jump does occur it is assumed to be random, in which case 
there is a 50% chance that the address will be odd. The processor can only 
read instructions from even addresses and generates an exception if an 
attempt is made to access an odd address. Therefore if a branch on 
condition code instruction is executed, the probability of no jump is 0.5,
the probability of generating an exception is 0.25 and that of a successful
jump is also 0.25. To simplify the calculations the op-codes for this
instruction are split in the same proportions to give an effective number
of op-codes for each outcome. A similar treatment has been adopted with
the other instructions and the proportions in which they are divided are
given in appendix 3. 
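The splitting of op-codes into an effective number for each outcome can be sketched as follows. This is an illustrative Python sketch only, not the FORTRAN program used in this work. The function name `effective_counts` and the op-code counts in the usage lines are hypothetical; only the probabilities (0.5 for a conditional jump, and 0.5, 0.25 and 0.25 for the 68000 branch on condition code) are taken from the text.

```python
def effective_counts(groups):
    """Return the effective number of op-codes for each outcome.

    groups is a list of (number_of_opcodes, {outcome: probability}) pairs.
    The probabilities within each group must sum to one.
    """
    totals = {}
    for n_opcodes, split in groups:
        if abs(sum(split.values()) - 1.0) > 1e-9:
            raise ValueError("outcome probabilities must sum to one")
        for outcome, probability in split.items():
            totals[outcome] = totals.get(outcome, 0.0) + n_opcodes * probability
    return totals


# Hypothetical counts, for illustration only (not taken from appendix 3).
eight_bit = effective_counts([
    (10, {"jump": 1.0}),                 # op-codes which always jump
    (8, {"jump": 0.5, "no jump": 0.5}),  # conditional jumps at 50%
])

# 68000 branch on condition code: no branch 0.5, then an odd target
# (exception) or an even target (successful jump) with equal chance.
branch_68000 = effective_counts([
    (16, {"no jump": 0.5, "exception": 0.25, "jump": 0.25}),
])
```

With these assumed counts the eight-bit case gives an effective 14 jump op-codes, which is the simplified form of equation 4.6 applied to 10 unconditional and 8 conditional op-codes.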
4.2.2 Results from the Analysis 
Using the data in appendix 3, together with the FORTRAN program
mentioned in section 4.2, values of $P_J$, $NI_{AV}$ and $NB_{AV}$ have been evaluated
for the 8085, 6800, 8048 and 68000 processors. The values of these
quantities are given in table 4.1. The upper curve on each of the graphs
in figures 4.1 (a)-(e) shows the probability that a certain number of
instructions, or fewer, will be executed before a jump. The other curves
will be explained in the following section.
4.3 Transfer from the Data Area 
The analysis so far has only considered the number of instructions or 
bytes read before a jump. It would also be useful to know where execution 
will continue so that methods can be developed to generate an ordered 
recovery to the correct program. Consideration of the jump instructions
reveals four distinct types: halts, restarts, returns and unspecified
jumps. These are shown in figure 4.2 and are described in detail below.
4.3.1 Halt Instructions 
Halt instructions are those which prevent further execution of any 
instructions until an interrupt is applied to the processor. If no 
provision is made to exit from this state, then no recovery is possible. 
4.3.2 Restart Instructions
Restart instructions cause the processor to jump to a specified
location in the memory map. The particular address varies between
processors and can either be generated internally or be read from another
location. In the case of the 8085, restarts jump to the low end of memory
and continue to execute from that point. If no consideration for erroneous
restarts has been made then values read from those locations will be
interpreted as instructions.
Turner (104), in an example of a program for a security system, states
that it is all right to place the code over the restart vectors if they are
not being used. This would be acceptable as long as the system functions 
without errors and is not susceptible to external interference, however 
this is difficult to guarantee. If restarts do occur then execution will 
resume at some location within the program, but will not necessarily pick 
up correct instructions immediately, as shown in chapter 5. 
The program in the example is short enough to finish before the end of 
the restart table, in particular, it does not occupy the restart 7 
location. This is of special importance because the op-code for the 
restart 7 instruction is FF, and it is usually the case that unused locations
of ROM or EPROM are also left at FF. Therefore if such a code is 
erroneously executed the processor will jump to the restart location, 
immediately read another restart instruction and continue to loop 
indefinitely. This condition is similar to the execution of a halt in that 
no other instructions will be executed, except that a restart saves the 
return address on the stack. If multiple restarts occur, the stack will 
grow through the entire memory map destroying all the data. 
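The looping behaviour described above can be illustrated with a minimal Python sketch. It models only the 8085 restart 7 instruction (op-code FF, transfer to address 0038H, two-byte return address pushed onto a downward-growing stack). The memory layout, the starting addresses and the function name `run_restart_loop` are hypothetical choices made for the illustration.

```python
RST7_OPCODE = 0xFF    # 8085 restart 7
RST7_VECTOR = 0x0038  # address to which restart 7 transfers control


def run_restart_loop(memory, ram, sp, pc, max_restarts):
    """Follow consecutive RST 7 op-codes, pushing return addresses.

    memory is a 64K bytearray and ram is a range of writable addresses
    (writes elsewhere are ignored, as for ROM or EPROM).
    Returns the final stack pointer and the number of restarts executed.
    """
    restarts = 0
    while restarts < max_restarts and memory[pc] == RST7_OPCODE:
        return_address = (pc + 1) & 0xFFFF
        # High byte is pushed first, then the low byte.
        for byte in (return_address >> 8, return_address & 0xFF):
            sp = (sp - 1) & 0xFFFF  # the stack grows downwards
            if sp in ram:
                memory[sp] = byte
        pc = RST7_VECTOR
        restarts += 1
    return sp, restarts


# Hypothetical layout: unprogrammed EPROM (FF) everywhere except 256 bytes
# of RAM at 2000H-20FFH holding data (AA), with the stack at the top of RAM.
memory = bytearray([0xFF]) * 0x10000
ram = range(0x2000, 0x2100)
memory[0x2000:0x2100] = bytes([0xAA]) * 0x100
sp, restarts = run_restart_loop(memory, ram, sp=0x2100, pc=0x1000,
                                max_restarts=128)
```

In this assumed layout, 128 consecutive restarts are enough to overwrite every byte of the 256-byte RAM with return addresses.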
On the 6800 a restart is generated by the software interrupt
instruction. It differs from the 8085 in that the address at which
execution resumes is read from the top end of memory. Therefore if those
particular locations have been used for some other purpose an unspecified
address will be read.
The restart instructions are of great importance in returning program 
control to a recovery routine. In the following analysis it will be 
assumed that the restart vectors have been set, and that full recovery is
achieved if any of the restart instructions are executed. 
4.3.3 Return Instructions 
If a return instruction is read, then execution will resume at the 
address obtained from the top of the stack. This will result in control 
passing back to the program provided that two conditions are met. Firstly, 
the last information pushed onto the stack before the fault must have been
a valid program address, and secondly, both the stack pointer and the stack 
data must not have been corrupted by the fault or subsequent processing. 
In the following sections it will be assumed that a valid program 
address is not read from the stack, and therefore execution continues at 
some undefined location, which is considered to be random in nature. This 
is a reasonable approach if the programming technique has been adopted 
where data is stored on the stack immediately after entering a subroutine. 
In this case the return address from the subroutine only occupies the last 
position on the stack for a very short time. 
4.3.4 Unspecified Jumps 
The last of the instructions are those which jump to a location 
dependent either on the contents of the bytes following the instruction or 
the contents of a register. In this case it is assumed that a random jump 
occurs. 
4.4 Modification to the Analysis 
Having divided the jump instructions into the four groups mentioned 
above, it is now possible to split the probability function of equation
4.3 into its constituent parts corresponding to each group. The new
functions will be proportional to the original probability and will depend
on the relative number of each instruction type. For instance the
probability of a restart, $P_{RST}(K)$, is given by:-
$$P_{RST}(K) = \frac{N_{RST}}{N_J} \cdot P_J(K) \qquad \text{Eqn. 4.8}$$
Where:- $N_{RST}$ is the effective number of restart instructions.
Similar equations can be obtained for the other three groups. 
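As a short worked consequence of equation 4.8 (a sketch, assuming that $P_J(K)$ is the cumulative probability shown as the upper curve in figures 4.1, so that it tends to one as K becomes large):

```latex
\lim_{K \to \infty} P_{RST}(K)
  = \frac{N_{RST}}{N_J} \cdot \lim_{K \to \infty} P_J(K)
  = \frac{N_{RST}}{N_J}
```

Under that assumption, the proportion of erroneous entries into the data area which eventually end in a restart is simply the proportion of restart op-codes among the effective jump op-codes.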
Figures 4.1 (a)-(e) show graphs for the probability function for each 
of the processors under investigation. Two graphs, (c) and (d), are given
for the 8048, one for each of the manufacturers. This is due to the
dissimilar instruction sets. However, no noticeable variation can be seen 
in the results despite the differences. 
The graphs show that the proportions of the different types of jump 
vary enormously between processors. Assuming that recovery is only 
obtained from restarts, as mentioned in section 4.3.2, the 68000 has the
best response by recovering on 95% of the occasions of a random jump into a 
data area. This is due to the large number of undeclared op-codes which 
effectively generate restarts by initiating exception handling. The 8085 is 
the next best at 32%, followed by the 6800 at 4%. The 8048 has no restart 
instructions and therefore cannot recover in this manner. These figures 
represent the worst case, as recovery can be initiated after jumps to other 
parts of the memory map, and these will be considered later in chapter 7. 
4.5 Improvements in Recovery
In order to increase the chances of successfully completing recovery 
it is necessary to initiate the recovery process as quickly as possible, so 
that the corruption of data is kept to a minimum. The easiest method of 
initiating the process is via the restarts; therefore the aim is to
increase the number of jumps caused by restarts and to reduce the number of 
instructions executed prior to the jump. 
The obvious solution is to seed the area with restart instructions, 
additional to those found randomly within the data. The problem is to 
establish the optimum number and position of the extra codes. An initial 
reaction could be to split the data into separate blocks so that execution 
cannot transfer from one to another. This requires a string of adjacent
single byte restart instructions equal to the length of the longest 
instruction. It will be shown later that this solution does not represent 
the best use of resources in most cases. 
4.6 Simulation of Execution in Data Areas 
When considering the execution in non-random data, the derivation of 
accurate equations to represent the response of the processor becomes very 
complex. An alternative approach, which was adopted, is to simulate the 
process on a computer. The program developed generates a block of random 
data which can then be modified to include certain types of instructions,
such as restarts. Then, starting at a particular location, it translates 
the data into a sequence of instruction types, and calculates both the 
number of instructions and the number of bytes encountered before a jump. 
The data structures considered have consisted of a certain number of 
random bytes separated by a given number of a particular instruction type. 
Execution begins randomly between the start of the first block and the 
start of the second block. In each complete run the response is evaluated 
for a number of sequences, each one starting with a new set of data. 
For a particular run, the probability, $P'(K)$, that K instructions
are executed from $N_S$ sequences, is given by:-
$$P'(K) = \frac{N_K}{N_S} \qquad \text{Eqn. 4.9}$$
Where:- $N_K$ is the number of sequences where K instructions are executed.
This will give a representative result provided that $N_S$ is large.
Initial runs were carried out with totally random data to provide a 
means of determining a reasonable number of sequences for each run. The 
value chosen was 5000, which consistently gave results within 2% of the
results obtained from the original analysis, proving that both methods are 
consistent. 
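The simulation procedure described in this section can be sketched in Python as below. This is not the original program. The instruction table returned by `toy_table` is a hypothetical instruction set invented for the illustration (it is not the 8085, 6800 or 8048), and the function names, the number of blocks and the random seeds are likewise assumptions. Only the overall method follows the text: random blocks separated by seeded strings, a random start within the first block and its string, and a count of instructions up to the first jump.

```python
import random


def toy_table():
    """Hypothetical instruction set: op-code -> (length in bytes, kind)."""
    table = {op: (1 + op % 3, "none") for op in range(256)}
    for op in range(0xC0, 0xD0):
        table[op] = (3, "jump")      # unspecified jumps
    table[0x76] = (1, "halt")
    table[0xE9] = (1, "return")
    table[0xFF] = (1, "restart")
    return table


def run_sequence(data, start, table):
    """Decode from start; return (kind of first jump, instructions executed).

    The count includes the jump instruction itself.
    """
    pc, executed = start, 0
    while pc < len(data):
        length, kind = table[data[pc]]
        executed += 1
        if kind != "none":
            return kind, executed
        pc += length
    return "overrun", executed       # ran off the end of the simulated area


def simulate(block_len, seed_len, n_sequences, table, rng,
             seed_byte=0xFF, n_blocks=8):
    """Estimate outcome probabilities and the average instruction count."""
    tallies, total = {}, 0
    for _ in range(n_sequences):
        data = []                    # a new set of data for each sequence
        for _ in range(n_blocks):
            data.extend(rng.randrange(256) for _ in range(block_len))
            data.extend([seed_byte] * seed_len)
        # Start between the start of the first and the second block.
        start = rng.randrange(block_len + seed_len)
        kind, executed = run_sequence(data, start, table)
        tallies[kind] = tallies.get(kind, 0) + 1
        total += executed
    probabilities = {k: n / n_sequences for k, n in tallies.items()}
    return probabilities, total / n_sequences
```

For example, `simulate(5, 1, 5000, toy_table(), random.Random(1))` models five random bytes followed by one restart byte, a 20% overhead, while `simulate(20, 0, 5000, toy_table(), random.Random(1))` models unseeded random data.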
4.7 Optimum Seeding of Data 
The optimum seeding of data was established by completing a number of 
runs on the simulator with different data structures. A selection of the 
results is shown in table 4.2. The percentage overhead signifies the
additional memory requirement, for a particular arrangement. However, for 
a given overhead there are a number of ways in which the data can be 
seeded. 
4.7.1 Data Structures for the 8085 
With the 8085 and a 20% overhead the following structures were 
considered: 20 bytes of random data followed by 4 adjacent single byte 
restarts, 15 followed by 3, 10 followed by 2 and finally, 5 followed by 1.
Assuming that execution of a restart generates a successful recovery, table 
4.2 shows that the original suggestion of totally separating the data 
blocks does not give the best chance of recovery. It also shows that no 
advantage is achieved by separating the blocks by more than the length of 
the longest instruction. 
The best solution for the 8085 is to spread the seeded data, such as
the Restart 7 instruction (op-code FF), evenly throughout the data area.
Not only does this provide the greatest chance of recovery, but it also 
gives the lowest average for the number of instructions executed before a 
jump. One disadvantage of this arrangement is that execution is not 
restrained within a block. It can skip over the restart instructions and 
therefore there is no limit to the number of instructions which could be 
read. 
However, the probability of execution continuing for a long time is 
small, and in this case a higher level of fault detection, such as a 
hardware watchdog timer, should provide the necessary coverage. 
4.7.2 Data Structures for the 6800 
The 6800 gives a totally different set of results. The optimum 
solution is to spread the restarts (software interrupt instruction code 3F)
within the data area, but rather than placing them individually, they 
should be positioned in groups of two. The reason for this is the high 
number of double and triple byte instructions in the instruction set, which
increases the probability of skipping over individual bytes. 
4.7.3 Data Structures for the 8048 
A different approach is necessary for the 8048, because the 
instruction set does not contain any restart type instructions. To 
initiate recovery it is necessary to jump to a given location which 
contains a recovery routine. This can be achieved using straightforward
jump instructions, but requires a greater overhead, as more than one byte
is needed for a given jump. The problem is to ensure that the instruction
is executed correctly, so that the address is not interpreted as an
instruction.
One possible solution is to make the address equal to the op-code of 
the instruction. For example, the op-code 04 causes a jump to page 0 of 
the address map, with the low order address being read from the second
byte. Therefore if execution enters a string of 04's at any point, control 
will always transfer to address 004. Similar effects can be obtained with 
the other jump instructions. An alternative method is to place one or more 
no-operation (NOP) instructions before the jump. 
However, in both cases it is important to consider the last byte in 
the string. If just two bytes, such as 04, are used to separate the data 
blocks then the second byte can be interpreted as an instruction. This 
happens if either a double byte instruction is read immediately before it,
or if a direct jump to that byte occurs. This would result in a jump to an 
unspecified location dependent on the first byte of the next data block. 
By replacing the last byte with a NOP (00), execution in this case
will continue in the next data block, giving the opportunity of recovery
if it reaches the end of the block. Test results have shown that this does
in fact improve the probability of recovery. 
The seeded data used for the results shown for the 8048 in table 4.2
were 04, 04, 00 for the triple byte strings, and 04, 00 for the double
byte strings. In the first case control can pass to address 004 or 000, 
and in the latter case only to 000. For a single recovery address the 
first sequence could be changed to 00, 04, 00. Different recovery 
addresses can be obtained using different jump instruction codes. 
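The behaviour of these recovery strings can be checked with a small Python sketch, which enters a string at each possible byte and reports where control goes. It models only the two op-codes discussed in the text (04 as a two-byte jump to page 0, and 00 as NOP). The function name `entry_outcomes` and the wording of the returned outcomes are hypothetical.

```python
JMP_PAGE0 = 0x04  # 8048 jump to page 0; the low order address byte follows
NOP = 0x00


def entry_outcomes(string):
    """For each entry offset into a seeded string, report where control goes."""
    outcomes = []
    for offset in range(len(string)):
        pc = offset
        outcome = "continues into next data block"
        while pc < len(string):
            opcode = string[pc]
            if opcode == NOP:
                pc += 1
            elif opcode == JMP_PAGE0:
                if pc + 1 < len(string):
                    outcome = "jump to %03X" % string[pc + 1]
                else:
                    # The address byte is the first byte of the next block.
                    outcome = "jump to unspecified address"
                break
            else:
                raise ValueError("only op-codes 04 and 00 are modelled")
        outcomes.append(outcome)
    return outcomes


triple = entry_outcomes([0x04, 0x04, 0x00])
double = entry_outcomes([0x04, 0x00])
single_address = entry_outcomes([0x00, 0x04, 0x00])
unterminated = entry_outcomes([0x04, 0x04])
```

Here `triple` reaches address 004 or 000, `double` and `single_address` reach only 000, and `unterminated` shows the unspecified jump that arises when the last byte of the string is itself a jump op-code.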
Table 4.2 shows that the optimum response is obtained with the double 
byte strings. This is due to the large proportion of single byte
non-jumping instructions in the instruction set. Separate runs for 8048's from
different manufacturers were not carried out due to the close agreement
obtained from previous analyses. Instead, the data used contained the 
average number of particular instruction types. 
4.7.4 Data Structures for the 68000 
For the 68000 the level of recovery from execution in the data area is 
95% without any modification to the system, apart from the addition of a 
recovery routine. It is unlikely that any appreciable improvement will be
obtained by altering the structure of the data area. Therefore no further 
analysis was carried out on this processor. 
4.8 The Effect of Data Block Size on Recovery 
Having obtained the optimum recovery string length for each of the 
processors, a number of further simulations were carried out. These were 
designed to determine the effects on recovery, of altering the data block 
size. Obviously, a reduction in block size results in a greater 
requirement for memory, to store the extra recovery strings, and therefore 
has a greater overhead. 
The results from these runs are given in figure 4.3. The graph shows 
that a large improvement in recovery is obtained with only a small increase 
in the data area. Further increases continue to make an improvement, but 
with a reduced effect. 
For all three processors the greatest benefits are obtained with an 
increase in data area of around 20%. However, in most systems it is rare 
that the whole data area is used, in which case the data should be seeded 
with sufficient restarts to fill all the unused locations. This provides 
an immediate improvement without the need for any alterations to the 
hardware. If further improvements are required, additional memory is 
necessary. 
Figure 4.4 shows how the average number of instructions executed
reduces as the amount of seeded data increases. The effects on the 8048
are less than those for the other two processors because the original
average is lower and the seeded data generates proportionally fewer 
recoveries. 
4.9 Summary 
This chapter has shown how erroneous execution in data areas can be 
detected and can then lead to recovery. All that is required is to force 
the processor to jump to a specific location where a recovery routine is 
initiated. 
The 68000 microprocessor is particularly good in this respect, due to 
the large number of illegal and unassigned instructions which invoke 
exception handling. For the 8085, 6800 and 8048 it is necessary to seed
the data area with certain values to improve the probability of recovery. 
The particular values required for each processor have been discussed, 
together with their optimum grouping and positioning. 
The results from this analysis are used in chapter 7 where the flow of 
erroneous execution between different memory areas is considered. 
CHAPTER 5 
Erroneous Execution in Program Areas 
5.1 Introduction 
This chapter looks at the sequence of events following a random jump 
into a program area, and derives equations for the probabilities of 
different outcomes. Unlike the data area, the program area contains a 
logical sequence of instructions and therefore a different approach is 
necessary. Again the sequence of bytes will be dependent on the 
application and the method of programming. In order to analyse the general 
case, bytes in the program area are divided into different instruction 
types, and then the probabilities of different sequences of these types are 
studied. 
The first analysis adopts a more detailed approach than the second by 
allowing a greater number of byte types. It therefore gives better results 
but has only been developed to cater for processors having single, double 
or triple byte instructions. However, it could be extended to include four 
byte instructions, such as those found on the Z80. The second analysis is 
less accurate but can be applied to any processor regardless of instruction 
length. 
5.2 Detailed Analysis 
When execution jumps randomly into a program area the first byte read 
can either be a valid op-code from the program, or it can be an operand 
from a multi-byte instruction. In both cases the processor will interpret 
the byte as an instruction and perform the corresponding operation. 
Figure 5.1 shows the type of byte which can be read. Clearly, the 
probability of reaching each of the particular states is dependent on the 
type of instructions in the program. $P_R(0)$, the probability of resuming
valid instructions at the first cycle after the erroneous jump, is given
by:-
$$P_R(0) = \frac{N_I}{N_B} \qquad \text{Eqn. 5.1}$$
Where:- $N_I$ is the total number of instructions in the program.
$N_B$ is the total number of bytes in the program area.
$P_{DX}(0)$ and the two $P_{TXX}(0)$ terms (one for each operand byte of a
triple byte instruction), the probabilities of entering the operand
fields of double and triple byte instructions immediately after the
erroneous jump, are given by:-
$$P_{DX}(0) = \frac{N_{DI}}{N_B} \qquad \text{Eqn. 5.2}$$
$$P_{TXX}(0) = \frac{N_{TI}}{N_B} \quad \text{(for each of the two operand bytes)} \qquad \text{Eqn. 5.3}$$
Where:- $N_{DI}$ is the number of double byte instructions.
$N_{TI}$ is the number of triple byte instructions.
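Equations 5.1 to 5.3 can be evaluated directly from instruction counts, as in the following Python sketch. The function name, the dictionary keys and the counts in the usage line are hypothetical. The sketch also assumes that the program area contains only single, double and triple byte instructions, with no embedded data, so that the byte total follows from the instruction counts.

```python
def initial_probabilities(n_single, n_double, n_triple):
    """Evaluate equations 5.1 to 5.3 from counts of each instruction length."""
    n_i = n_single + n_double + n_triple          # total instructions
    n_b = n_single + 2 * n_double + 3 * n_triple  # total bytes
    return {
        "P_R(0)": n_i / n_b,         # first byte read is a valid op-code
        "P_DX(0)": n_double / n_b,   # operand of a double byte instruction
        "P_TXX(0)": n_triple / n_b,  # each operand of a triple byte instruction
    }


# Hypothetical program: 50 single, 30 double and 20 triple byte instructions.
p = initial_probabilities(50, 30, 20)
total = p["P_R(0)"] + p["P_DX(0)"] + 2 * p["P_TXX(0)"]
```

Since every byte is either an op-code or an operand, the four initial states account for all possible entry points and `total` comes to one.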
It is now necessary to consider the flow of execution after each of 
the above states has been reached. For the case where a valid instruction 
has been read, the processor will continue to fetch and execute valid 
instructions, as it will have resynchronised instruction fetches with the
program. However, this situation may not continue indefinitely if certain 
instructions are encountered. For example a return from subroutine 
instruction will cause an undefined jump if the stack pointer has been
corrupted, or if the last information pushed onto the stack was data rather 
than a return address. 
Where an operand byte is read, it could be interpreted in such a way 
that control is passed to another part of the memory map. If the operand
byte is interpreted as a non-jumping instruction, then another byte would
be read, which again could either be a valid instruction or another operand
byte. As with the analysis of execution in data areas, it is useful to 
know where execution continues if a jump occurs. Therefore the same 
approach has been adopted where the jump instructions are divided into four 
separate groups of halts, restarts, random jumps and returns. 
The possible sequence of events after the initial jump is shown in 
figure 5.2. Provided that the probability of entering the operand field is 
less than one, execution will eventually perform a jump to another part of
the memory map or resynchronise instruction fetches with the program. In 
order to calculate the likelihood of each of these two outcomes it is 
necessary to determine all the possible ways of transferring from one state 
to another. 
This is achieved by considering all the possible sequences of bytes
which allow transfer between the states. The probability that a particular 
sequence will occur is obtained by multiplying together the probabilities 
that certain types of bytes will appear in specified locations in the 
sequence. The overall probability of a particular transfer, from one state 
to another, is then obtained by adding together the probabilities that each 
sequence for that transfer will occur. 
A list of all the possible sequences for each of the transfers, 
together with the derivation of the probability equations, is given in 
appendix 4. It shows that each value can be determined provided that the 
probability of certain bytes appearing in given locations is known. These 
can be evaluated by assuming equal use of each instruction for a particular 
processor and random data in the operand fields, or by analysing the 
occurrence of certain types of bytes in programs under investigation. 
It is then possible to find expressions for the probabilities that the
processor will have resumed execution of the program or will have
transferred to another part of the memory map at I instruction cycles
after the initial erroneous jump. These expressions are also given in 
appendix 4. The final outcome of resuming or transferring is therefore 
given by the probability equations when I is equal to infinity. In most 
cases the probability that operand bytes are still being read after about 
five cycles is small, and therefore it is only necessary to consider the 
first ten cycles. 
Again it is important to calculate the average number of instructions
executed, but in this case it is necessary to calculate a value for both of
the possible outcomes. $NR_{AV}$, the average number of instructions executed
before resuming, is given by:-
$$NR_{AV} = \frac{\sum_{I=1}^{\infty} I \, (P_R(I) - P_R(I-1))}{P_R(\infty)} \qquad \text{Eqn. 5.4}$$
Similar expressions can be obtained for the average number of instructions
executed before the other outcomes.
Clearly the type of instructions in a particular instruction set, and
the way in which the instructions are used, will affect the overall
results. A comparison of different instruction sets and programs is given
in the following section.
5.2.1 Comparison Between Instruction Sets
The response of execution in program areas will obviously be dependent
on the arrangement and frequency of use of different instruction types.
The analysis in the previous section requires a total of 24 different
parameters to enable a solution. These can be obtained directly from the
instruction set by assuming that each op-code is used the same number of
times, and that the data in the operand field is random.
Table 5.1 contains results obtained, using the previous assumptions,
for the 8085, 6800 and 8048 microprocessors. For all three processors it
shows that if execution enters a program area, then there is over a 90%
probability that instruction fetches will resynchronise with the program.
It also indicates that the number of instructions executed before reaching
one of the final states is small. The average value in all cases is less
than two, implying that very few erroneous instructions will be executed
and consequently little corruption of data will occur.
Only one figure has been given for the average number of instructions
executed before a transfer to another part of the memory map, because each
individual transfer gave almost identical results. Similarly, for the
8048, one set of results is shown, as only slight variations were observed
between processors from different manufacturers.
Figures 5.3 (a), (b) and (c) show graphs of the relationship between
the probability of reaching a particular outcome and the number of
instructions executed, for each of the processors. They indicate the short
transition period between the initial erroneous jump into the program area
and the transfer to the next state. In all cases the probability of still
reading an operand byte after five instruction cycles is less than 0.5%.
5.2.2 Comparison Between Actual Programs 
The results from the previous section give an indication of the 
inherent properties of a particular instruction set. However, there are 
many instructions, such as the logical operators, which are rarely used, and
others, such as the jump instructions, which are frequently used. Therefore
the previous results are unlikely to be representative of actual programs
using a particular instruction set.
In order to evaluate the effects of different instruction code usage,
a number of actual programs were analysed. Results from these analyses are
given in table 5.2. Programs A and C are monitor programs for small scale
8085 and 6800 based systems respectively. They were chosen to give a 
comparison between software designed to perform similar operations but 
using a different instruction set. The table shows that the probability of 
resuming valid instructions, and the average number of instructions 
executed before reaching the final outcomes, are almost identical. There 
is a slight variation in the probabilities where control transfers to 
another part of the memory map, but these values are small anyway.
Programs B, D, E and F are taken from industrial control and data
transmission systems. Again, close agreement is obtained for the
probability of resuming valid instructions and for the average number of 
instructions executed. These values are also similar to those given by the 
monitor programs. 
Therefore it seems that for erroneous execution in program areas, the
probability of resynchronising instruction fetches with the program is
approximately 95%, regardless of the processor. This suggests that,
despite differences in the instruction sets, particular instruction types
tend to be used in the same proportions.
5.3 Simplified Analysis 
The previous analysis is suitable for processors having single, double
and triple byte instructions, and could be extended to include four byte
instructions. However, to enable comparisons to be made with the 68000,
which has instructions up to five words long, a simpler approach is
necessary. This is achieved by considering fewer execution states and less
complex transfers. Figure 5.4 shows the different states for this analysis
and the transfers between them. It shows that attempting an instruction
fetch from any of the operand fields is represented by the same state.
The probability of resuming valid instructions immediately after the
erroneous jump, $P_R(0)$, remains unchanged, and clearly the probability of
entering the operand field, $P_X(0)$, is given by:-
$$P_X(0) = 1 - P_R(0) \qquad \text{Eqn. 5.5}$$
Again, if a valid instruction is read, the processor will continue in 
step with the program. However, an instruction fetch from the operand 
field will either pass control to another part of the memory map or the 
next logical byte will be read. The probability of interpreting a jump 
instruction is dependent on the proportion of bytes in the operand field 
which will cause a jump. If the data within the field is considered to be
random, then the probabilities of executing different jump instruction
types can be obtained from the proportion of each particular type within
the instruction set. Alternatively they can be evaluated from the analysis 
of particular programs under investigation. 
If the next logical byte is read, this analysis assumes that the
probability of reading a valid instruction is dependent on the ratio of the
number of instructions in the program to the total number of bytes in the
program area, which is equal to $P_R(0)$. This is effectively equivalent to
random fetches within the program area until either a valid instruction is 
read or a jump is generated. 
It follows that the probabilities that the processor has resumed or
jumped at the end of I instruction cycles after the erroneous jump, are
given by:-
$$P_R(I) = P_R(I-1) + P_R(0) \cdot (1 - P_J) \cdot P_X(I-1) \qquad \text{Eqn. 5.6}$$
$$P_{RST}(I) = P_{RST}(I-1) + P_{XRST} \cdot P_X(I-1) \qquad \text{Eqn. 5.7}$$
Where:- $P_J$ is the probability of reading any jump instruction type.
$P_{RST}(I)$ is the probability of reaching the restart state within I
instruction cycles of the erroneous jump.
$P_{XRST}$ is the probability of reading a restart type instruction in
the operand field.
Similar expressions can be obtained for the other jump instruction types. 
The probabilities of the final outcomes of resuming or transferring are
given by the above equations when I is equal to infinity. In practice only
the first ten cycles are important.
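The recurrences of the simplified analysis can be iterated numerically, as in the Python sketch below. Two points are assumptions rather than statements from the text: the update for the operand-field probability $P_X(I)$, which is taken to be the share of $P_X(I-1)$ that neither jumps nor resumes, and the form used for the average number of cycles before resuming. The function names and the numerical values in the usage lines are hypothetical.

```python
def simplified_analysis(p_r0, p_x_types, cycles=10):
    """Iterate equations 5.6 and 5.7 over a number of instruction cycles.

    p_r0      -- P_R(0), probability of resuming immediately
    p_x_types -- probability of reading each jump type in the operand
                 field, e.g. {"restart": P_XRST}; their sum is P_J
    Returns (history of P_R(I), final probability per jump type, final P_X).
    """
    p_j = sum(p_x_types.values())
    p_x = 1.0 - p_r0                       # equation 5.5
    p_r = p_r0
    p_out = {kind: 0.0 for kind in p_x_types}
    history = [p_r]
    for _ in range(cycles):
        for kind, p_kind in p_x_types.items():
            p_out[kind] += p_kind * p_x    # equation 5.7 and its analogues
        p_r += p_r0 * (1.0 - p_j) * p_x    # equation 5.6
        # Assumed update: execution remains in the operand field when no
        # jump is read and the next byte is not a valid instruction.
        p_x *= (1.0 - p_j) * (1.0 - p_r0)
        history.append(p_r)
    return history, p_out, p_x


def average_before_resuming(history):
    """Average cycles before resuming, weighting each cycle by P_R(I) - P_R(I-1)."""
    weighted = sum(i * (history[i] - history[i - 1])
                   for i in range(1, len(history)))
    return weighted / history[-1]


# Hypothetical parameter values, for illustration only.
history, p_out, p_x = simplified_analysis(
    0.6, {"restart": 0.02, "halt": 0.005, "return": 0.01, "jump": 0.05})
```

After ten cycles the residual `p_x` is very small for these values, which is in keeping with the remark that only the first ten cycles are important.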
A comparison between the results from this analysis and the previous
more detailed analysis is given in the following section. It shows that
despite the different approach the results are in fairly close agreement.
5.4 Comparison Between the Detailed and Simplified Analyses
The simplified analysis described above was carried out on each of the
programs studied in section 5.2.2, and the results from these programs are
shown in table 5.3. By making a comparison with the previous values in
table 5.2 it can be seen that both approaches give similar results.
Therefore the simplified analysis is an acceptable approximation to
erroneous execution in program areas.
The main reason for developing this approach was to enable a
comparison to be made between the 8-bit processors and the 68000, which has
a 16-bit architecture. To obtain a set of results for the 68000, a monitor
program for a small single board system was investigated. Values for $P_R(0)$
and $P_X(0)$ were obtained by counting instructions within the software. The
other parameters were estimated by assuming that the operand fields
contained random data. This was necessary due to the large instruction map
of 65,536 codes, which makes the determination of the effect of particular
values extremely difficult.
The results obtained for the 68000 are given in table 5.3. It shows
that the probability of resuming valid instruction fetches is around 20%
lower than for the 8-bit processors. This difference is made up by the
increase in the number of restarts, in the form of exception handling. As
will be shown in section 5.6, this results in a better chance of detecting
errors quickly, and improves the prospects of recovery.
The relationship between the probability of reaching a particular 
outcome, and the number of instructions executed, is given in figure 5.5. 
The graph shows that the 68000 r e a c h e s the final outcome, of execution in 
the program area , in approximately the s a m e time as the other p r o c e s s o r s . 
This Is further supported by the average number of Instructions executed, 
which also shows c lose agreement. 
5.5 Verification of Results 
In order to check the accuracy of the results, tests were carried out
on the monitor program for the 8085. From a set of random numbers, 200
addresses were selected which fell within the program area. Then, starting
at each of these addresses, the bytes were translated into instructions and
the flow of execution which the processor would follow was determined.
Only two possible outcomes were considered: that of resuming valid
instructions and that of a transfer to another part of the memory map. The
probability of resuming came to 94.1%, and that of a jump to 5.9%.
Comparison between these values and those in table 5.2, obtained from the
detailed analysis, shows direct agreement, showing that the process gives
accurate results.
5.6 Improvements in Recovery 
To improve the chances of recovery the processor must be able to
detect that an error has occurred. This can be achieved by software in one
of two ways. Firstly, at a low level, by increasing the probability that a
restart will be generated, or secondly, at a higher level, by encouraging
execution to resynchronise with correct instructions and to detect the
error from within the program. The first solution will give the quickest
recovery but, as will be shown in the following sections, it is not easy to
attain.
5.6.1 Low Level Detection
This can be achieved, in the same way as in the data area, by
increasing the probability that a restart instruction will be interpreted.
It is therefore necessary to force the restart op-codes into the operand
fields within the program. The most commonly used operands are those which
contain program or data addresses. These can be forced to contain
particular values by the suitable positioning of memory blocks.
For example, many 8085 based systems contain RAM starting at address
2000 hexadecimal, and as a result a significant proportion of the third
bytes in triple byte instructions contain the value 20. By moving the data
area to the address range FF00 to FFFF these values are replaced by FF, the
restart 7 instruction. A similar arrangement is possible for the 6800 by
moving the data area to the address range 3F00 to 3FFF, so that more bytes
of the value 3F (op-code for a software interrupt) appear in the operand
fields.
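The effect of relocating the data area on the operand bytes can be
illustrated with a minimal sketch. It assumes the standard 8085 encoding of
the STA instruction (op-code 32 hexadecimal followed by the low and then the
high address byte); the function names are hypothetical and are used only
for illustration.

```python
RST7 = 0xFF  # 8085 restart 7 op-code


def encode_sta(address):
    """Encode the 8085 instruction STA address: op-code, low byte, high byte."""
    return bytes([0x32, address & 0xFF, (address >> 8) & 0xFF])


def third_byte_is_restart(address):
    """True if the third byte of the instruction would be read as restart 7."""
    return encode_sta(address)[2] == RST7


# Data area at 2000H: the third byte is 20H, which is not a restart.
print(encode_sta(0x2010).hex(), third_byte_is_restart(0x2010))
# Data area moved to FF00H: the third byte becomes FFH, the restart 7 code.
print(encode_sta(0xFF10).hex(), third_byte_is_restart(0xFF10))
```

Any instruction which carries a 16-bit data address in its operand field
behaves in the same way, so the sketch uses STA only as a representative
case.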
This type of procedure could also be employed with the 68000. For
this processor, address ranges A000 to AFFF and F000 to FFFF can be used.
These are all values of unassigned op-codes which initiate exception
handling if an attempt is made to execute them. This provides a much
larger data area of up to 8192 bytes if both blocks are used. This method
cannot be used for the 8048 because it does not have any restart type
instructions in the instruction set.
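The size of the data area quoted for the 68000 can be checked with a short
sketch. It assumes that the unassigned op-codes referred to are those 16-bit
words whose top four bits are 1010 or 1111, which is consistent with the two
address ranges given; the function name is hypothetical.

```python
def is_unassigned_68000(word):
    """True for 16-bit op-codes whose top four bits are 1010 or 1111."""
    return (word >> 12) in (0xA, 0xF)


block_a = range(0xA000, 0xB000)    # addresses A000 to AFFF
block_f = range(0xF000, 0x10000)   # addresses F000 to FFFF

# Every address in either block, read as an op-code, is unassigned.
assert all(is_unassigned_68000(a) for a in block_a)
assert all(is_unassigned_68000(a) for a in block_f)

usable_bytes = len(block_a) + len(block_f)
print(usable_bytes)  # total size of the two blocks in bytes
```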
Table 5.4 shows the effect of increasing the number of restart type
instructions in the operand fields of the 8085 and 6800. It contains
results given by the detailed analysis for three programs, A^X, B^X and C^X,
where the modifications have been made. These correspond to the original
programs A, B and C in table 5.2. By comparing the values it can be seen
that the number of restarts is increased quite substantially, but restarts
still do not form the major outcome from execution in this area.
A further means of increasing the number of restarts would be to move
the program area as well. However, in most small systems containing only
one program area, this is not possible because the memory block must be
positioned to coincide with the reset, restart and interrupt vectors.
Also, in the case of the 6800, because it only has one restart type
instruction, both the data area and the program area could not be moved to
utilise this effect.
For both processors this is only suitable for data blocks up to 256
bytes long. Any larger areas would increase the number of op-codes,
adjacent to the restarts, within the operand field. For both the 8085 and
the 6800 this would reduce the chances of recovery by introducing more
undesirable jump instruction types. Therefore, unless the data blocks can
be split up into 256 byte lengths, this does not provide a means of
increasing error detection which would improve the chances of recovery.
5.6.2 High Level Detection 
Another way of detecting that an error has occurred is to encourage
execution to resume valid instruction fetches from the program. It is then
possible to test certain conditions from within the software. This would
seem to be the better solution in the case of the 8085, 6800 and 8048
because there is already such a high probability that execution will
resynchronise with the program.
This can be further increased by the same methods described in the
previous section. However, the positions to which the blocks of memory
should be moved are those which increase the number of non-jumping
instruction types within the operand fields. Ideally, single non-jumping
instructions should appear in the second byte of double byte instructions
and also in the third byte of triple byte instructions. Double non-jumping
types should appear in the second location of triple instructions.
It would be possible to write programs such that the above conditions
were met at all times, but this would impose tight restrictions on the
software, by eliminating the use of certain addresses and data.
5.7 Summary 
This chapter has shown that after an erroneous jump into a program
area for the 8085, 6800 and 8048, there is a probability of about 95%
that execution will resynchronise instruction fetches with the program.
Slight variations in this figure can be obtained by suitable hardware design
and programming, but the most efficient method of detecting errors is from
within the software. A number of these software mechanisms are described in
chapter 8.
For the 68000 processor the probability of resynchronisation is much
lower at 72%, and the probability of a restart or exception is around 26%.
Therefore it is necessary to have a recovery routine at the restart
addresses and to have fault detection within the software.
The results from these analyses, together with those from the previous
chapter, are used in chapter 7 where the flow of execution between
different memory areas is considered.
CHAPTER 6
Erroneous Execution in Unused and Input/Output Areas 
6.1 Introduction 
This chapter looks at the response of different processors to an
erroneous jump into unused areas of the memory map, and those parts which
are used for input and output devices. It then goes on to consider ways in
which recovery can be initiated from these types of execution.
6.2 Execution in Unused Areas 
There are two distinct types of unused locations: those parts of the
memory map which are not populated by memory devices, and those which do
reference particular devices but the locations within them are not used.
In the latter case it has already been demonstrated, in chapter 4, that
with data areas any spare locations should be used to seed the information
with restart instructions. However, with program areas, no improvement in
recovery is obtained by dispersing the unused locations within the
software. Therefore they appear as a single block at the end of the
program area, or as smaller groups separating program modules.
In most control systems the software is written into read only memory
in the form of PROM or EPROM, and consequently any unused locations are
left unprogrammed, usually taking the value FF in hexadecimal. In the case
of the 8085 and 68000, instruction fetches from these locations generate
restarts. For the 8085 the restart 7 instruction is interpreted, and for
the 68000 an unassigned instruction is encountered which initiates
exception handling. Therefore recovery can be performed by a suitable
error handling routine.
For the 6800 and 8048, the op-code FF is interpreted as a non-jumping
instruction and therefore successive locations will be accessed until
another memory area is reached. This can be prevented by using the
locations to pass control to a recovery routine, and can be achieved by
adding restart or jump instructions. For the 6800 all the spare locations
should be set to 3F, the code for a software interrupt. For the
8048, the value 04 can be used to generate a jump to location 004, in the
same way as that described in section 4.7.3. However, the success of this
method for the 8048 depends on the amount of the memory map which is used,
and the state of the memory bank select flip-flop after the error. This is
discussed further in section 6.2.3.
6.2.1 Unpopulated Memory Areas 
Erroneous execution in unpopulated areas of the memory map is
dependent on both the processor being used and the hardware attached to it.
In particular it is determined by the state of the data bus when no memory
devices are driving it high or low. When this state is known it is
possible to establish the instructions interpreted and how execution will
proceed.
This normally forces the processor to jump immediately to another
location or to repeatedly read a fixed value and to continue executing the
same instruction until a used block of memory is encountered. However, in
the case of some processors which have a multiplexed address and data bus,
the address at which an instruction fetch is attempted remains on the bus
during the read cycle if there are no other external influences. This
results in a series of consecutive numbers being interpreted as
instructions, with appropriate adjustments made where multibyte functions
are encountered. For this case it is possible to trace through the
sequence of instructions which will be executed, for a particular
instruction set, starting at each possible address.
It is very simple to influence the response of a processor when
reading unused memory locations. In this state the data lines will tend to
float, and by attaching resistors between them and the power supply rails,
any value can be forced onto the bus. This can be used to generate a jump
to a specific location and then to execute a recovery routine.
The following sections discuss the response of each of the processors
being studied, and propose methods of improving the chances of recovery in
each case.
6.2.2 Unpopulated Areas of the 8085 
The 8085 has a multiplexed address and data bus, and as a result,
values read from unpopulated memory areas are dependent on the capacitance
and loading of the bus. Under normal conditions of the bus being connected
directly to buffers, the capacitance and loading are such that the low order
byte of the address always remains valid during the subsequent read if it is
not driven to any specific value.
As mentioned in the previous section, this results in consecutive
values being interpreted as instructions. These values can fall anywhere
in the range 00 to FF, and therefore there are 256 different positions
within the sequence where execution can commence. The outcome from entering
at each of these locations has been determined by tracing through the
sequence of instructions which the processor would interpret. For
sequences where conditional jump instructions are encountered, it was
assumed that the probability of a jump would be 50%. Such sequences were
divided proportionally into the different outcomes which could be
generated. Then the effective number of locations within the 256 byte page
which cause each of the possible outcomes was calculated. These results
are given in appendix 5, together with the probability of each of the
transfers.
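The tracing procedure can be sketched in a few lines of Python. This is an
illustration under stated assumptions and not the program used to produce
appendix 5: only the documented 8085 op-codes are modelled (undocumented
codes are treated as single byte, non-jumping instructions), conditional
transfers are split 50% taken and 50% not taken, and the trace stops at the
first unconditional transfer. All names are hypothetical.

```python
# Instruction lengths for the documented 8085 op-codes (assumption: any
# undocumented code is treated as a single-byte, non-jumping instruction).
THREE_BYTE = ({0x01, 0x11, 0x21, 0x31, 0x22, 0x2A, 0x32, 0x3A, 0xC3, 0xCD}
              | {0xC2 + 8 * c for c in range(8)}    # conditional jumps
              | {0xC4 + 8 * c for c in range(8)})   # conditional calls
TWO_BYTE = ({0x06 + 8 * r for r in range(8)}        # MVI
            | {0xC6 + 8 * n for n in range(8)}      # immediate arithmetic
            | {0xD3, 0xDB})                         # OUT, IN


def length(op):
    return 3 if op in THREE_BYTE else 2 if op in TWO_BYTE else 1


def classify(op):
    """Return (outcome, conditional) for a transfer, else (None, False)."""
    if op == 0x76:
        return "halt", False
    if op & 0xC7 == 0xC7:
        return "restart", False
    if op == 0xC9:
        return "return", False
    if op & 0xC7 == 0xC0:
        return "return", True
    if op in (0xC3, 0xCD):
        return "specific jump", False
    if op & 0xC7 in (0xC2, 0xC4):
        return "specific jump", True
    if op == 0xE9:
        return "random jump", False   # PCHL: target depends on HL
    return None, False


def outcome_distribution(start):
    """Probability of each outcome when execution enters at low byte start."""
    dist, weight, pos = {}, 1.0, start & 0xFF
    for _ in range(4096):             # safety bound on the trace length
        op = pos                      # byte read equals the low address byte
        outcome, conditional = classify(op)
        if outcome is not None:
            share = weight * 0.5 if conditional else weight
            dist[outcome] = dist.get(outcome, 0.0) + share
            weight -= share
            if not conditional:
                return dist
        pos = (pos + length(op)) & 0xFF
    return dist


print(outcome_distribution(0x40))     # run of single-byte moves, then halt
print(outcome_distribution(0xC0))     # mixture of returns and a jump
```

Summing the distributions over all 256 entry points and dividing by 256
would give figures of the kind tabulated in appendix 5, subject to the
simplifications listed above.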
As in the previous chapters, divisions into halts, restarts, random
jumps and returns can be made, but in this case a number of specific jumps
are also possible. The probabilities of each of these groups, and the
average number of instructions executed before them, are given in table
6.1.
However, the specific jumps occur to locations in the range C000 to
FFFF, and in most small scale control applications it is unlikely that many
of these locations will be populated with memory. In this case execution
will continue as before until another transfer is reached, and the effects
of this arrangement are shown in table 6.2.
The results indicate that on half of the occasions of a random jump
into unpopulated areas of the 8085, a halt will be executed. If no
mechanism is built into the system to recover from this situation then
total failure will occur. The other outcome which has a high probability
is that a return instruction will be executed. As mentioned in section
4.3.3, the address to which control passes in this case depends on the
contents of the stack, and can either be a valid program address or a
random location.
The results also show that it is highly probable that a large number
of erroneous instructions will be executed before leaving the area, with
the average for all transfers at around 40.
This type of execution can be totally eliminated by suitable loading
of the bus, as indicated in section 6.2.1. By applying pull-up resistors
between the data lines and the power supply rail the value FF will always
be read when unpopulated memory areas are accessed. This will result in
the interpretation of the restart 7 instruction, and recovery can then be
performed by a suitable routine.
6.2.3 Unpopulated Areas of the 8048 
The 8048 uses a multiplexed address and data bus when accessing
external memory, and consequently a similar response to the 8085 is
observed when unpopulated areas of the memory map are accessed. Again the
low-order byte of the address remains on the bus during the read cycle
under normal buffering arrangements.
The effect of a jump into each of the 256 different possible locations
in the observed sequence has been studied, and the results are given in
appendix 5. They show that on 88% of the occasions of a random jump into
unpopulated areas, a transfer to specific locations occurs. On 9% of the
occasions, a return instruction will be executed, passing control to the
address stored at the top of the stack. The remaining 3% will cause a
relative jump, dependent on the contents of the accumulator, to a location
within the 256 byte page being accessed. This passes control back into the
unpopulated area where execution will continue until another transfer is
reached. When considering all transfers, the average number of
instructions executed is approximately 14.
For the 8048 the response of the processor after a transfer from
unpopulated memory is very dependent on the particular hardware
arrangements. This is due to its architecture, which is very different from
normal 8-bit processors and is described in more detail in section 3.5.
Instruction fetches are limited to a 4K address space and are
referenced by a 12-bit bus. However, only 11 bits of the program counter
operate in the normal way. The 12th bit is set by the state of the memory
bank select flip-flop when a call to a subroutine or an absolute jump
occurs, or is loaded from the stack when a return is executed. The state
of the flip-flop is only affected by the two instructions select memory
bank 0 and select memory bank 1.
This effectively splits the address range into two separate 2K blocks.
Therefore the execution following a transfer from an unpopulated area is
dependent on the state of the memory bank select flip-flop and the amount
of memory which is used. Figure 6.1 shows four common memory arrangements
for the 8048 which leave part of the memory map unused. In each of these
arrangements a transfer out of the unpopulated area can result in execution
re-entering the same area. Due to the layout of the instruction map, this
can occur a number of times, and in some cases results in the possibility
of executing several hundred instructions before reaching the final state.
The final states reached after a random jump into the unpopulated area for
each of the four arrangements, and for both conditions of the memory bank
select flip-flop, are given in table 6.3.
It shows that, under certain conditions, there is a high probability
that a return instruction will be reached, in which case the address at the
top of the stack, after the error, determines the location at which
execution will continue. If the address is within the unpopulated area,
then the process will repeat, and execution of another return instruction
will probably occur. In this way the processor tends to search through the
stack looking for a valid program address. However, during the erroneous
execution, a number of stack locations are corrupted, and if a valid
address is not found within the first few positions on the stack then an
infinite loop will be formed. If a valid program address is found then
execution will return to the program, but the memory bank select flip-flop
may be left in the wrong state. In this case, if it is not reset before a
call or an absolute jump is executed, then control will pass to the wrong
memory block.
Execution in the unpopulated areas can be controlled in some cases by
suitable loading of the bus. The effect of jumping into a string of 04's
has been discussed in previous sections, and this method can be employed
here by forcing the value onto the bus with suitable resistors connected to
the power supply rails. Unfortunately, this is only effective all of the
time for memory arrangements C and D. For the other two arrangements
execution will loop continuously within the upper 2K block if the memory
bank select flip-flop is set to one.
An alternative solution is available for memory arrangement B, by
loading bits 1 to 7 on the bus with the values 1110010. This forces the
values E4 and E5 alternately into the unpopulated area. The corresponding
instructions interpreted by the processor are jump to address in page 7 and
select memory bank zero. In this way control will always transfer to
location 7E5 regardless of the position of the erroneous jump into the
unpopulated area.
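The address arithmetic behind these two bus loading schemes can be checked
with a small sketch. It assumes the standard 8048 encoding of the absolute
jump (op-code pattern aaa00100, where aaa supplies address bits 8 to 10, the
following byte supplies bits 0 to 7, and the memory bank select flip-flop
supplies bit 11), and that bit 0 of the bus follows the low bit of the
address being read. The function names are hypothetical.

```python
def forced_bus_value(address):
    """Byte read when bits 1 to 7 are loaded with 1110010 and bit 0 floats."""
    return (0b1110010 << 1) | (address & 1)


def jmp_target(opcode, operand, bank):
    """Target of an 8048 absolute jump (op-code pattern aaa00100)."""
    if opcode & 0x1F != 0x04:
        raise ValueError("not an absolute jump op-code")
    return (bank << 11) | ((opcode >> 5) << 8) | operand


# Even addresses read E4 (jump, page 7); odd addresses read E5 (select bank 0).
print(hex(forced_bus_value(0x900)), hex(forced_bus_value(0x901)))
# The jump op-code E4 followed by the operand byte E5 transfers to 7E5.
print(hex(jmp_target(0xE4, 0xE5, bank=0)))
# A string of 04's is read as a jump to location 004 in bank 0.
print(hex(jmp_target(0x04, 0x04, bank=0)))
```

With the flip-flop set to one the same jump lands at FE5 in the upper bank,
which is an odd address; under the loading scheme above this reads E5, so
the bank is reset before the next jump is interpreted.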
For memory arrangement A, there is no simple method of ensuring that
control passes back to the program area. If the memory block is external
to the processor, then partial decoding can be used to create another image
of the program in the upper memory bank, and the solution for memory
arrangements C and D will then work. Otherwise, it is necessary to ensure
that a program address is always left on the stack and that the memory bank
select flip-flop is reset before each call or absolute jump instruction in
the program. This will ensure that on most occasions control will pass
back to the program, provided that the stack and stack pointer are not
corrupted by the fault. For the occasions when an infinite loop is formed
it is necessary to rely on a higher level of recovery provided by external
hardware.
6.2.4 Unpopulated Areas of the 6800 and 68000 
Both the 6800 and 68000 microprocessors have separate address and data
buses. When accessing unpopulated areas the data bus floats high and
therefore the value FF is read. For the 68000 this is interpreted as an
unassigned instruction and causes the immediate initiation of exception
handling, which provides a good method of recovery without any alteration
to the hardware.
In the case of the 6800, the value FF is interpreted as a triple byte
instruction to store the X register using extended addressing. This means
that instruction fetches will occur at every third successive byte until
another memory block is encountered. One method of recovery from this
situation is to trap execution when it reaches the next block. However,
this could result in a substantial delay if a large unpopulated area
exists, as the average number of instructions interpreted will be
proportional to the size of the block. Also the contents of the location
FFFF will be destroyed.
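The proportionality between the delay and the block size can be illustrated
with a minimal sketch. It assumes that the erroneous jump is equally likely
to land on any byte of the unpopulated block and that execution then steps
upwards three bytes at a time until it leaves the block; the function names
are hypothetical.

```python
import math

STX_EXTENDED_LENGTH = 3  # FF is read as a three-byte instruction


def instructions_until_exit(block_size, entry_offset):
    """Instructions interpreted before execution leaves the block."""
    remaining = block_size - entry_offset
    return math.ceil(remaining / STX_EXTENDED_LENGTH)


def average_instructions(block_size):
    """Average over all entry points, assumed to be equally likely."""
    total = sum(instructions_until_exit(block_size, offset)
                for offset in range(block_size))
    return total / block_size


for size in (3, 6, 6000):
    print(size, average_instructions(size))
```

For large blocks the average approaches one sixth of the block size, which
is the proportionality referred to above.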
A better solution is to load the data bus so that the value 3F appears
in all the unpopulated areas. This will immediately generate a software
interrupt and enable rapid recovery without any further corruption of data.
6.3 Execution in Memory Mapped I/O 
When input and output devices are mapped into the normal memory area,
it is possible that an erroneous jump may occur into these locations. In
the case of output lines the response will be the same as that for unused
locations, as they will have no active effect on the bus. Therefore the
same approach can be adopted as for the unpopulated areas, in the form of
bus loading to force certain values to be read.
For input lines the external data will be interpreted as an
instruction and the corresponding function will be performed. If a number
of different ports appear in consecutive locations they will appear to have
the same effect as a data area. However, it is common practice to
partially decode port addresses so that the same data will appear in a
number of consecutive locations, sometimes as many as 4K. As in the case
of unused locations, this will result in an immediate jump or the
repetitive execution of the same instruction. In the latter case, the
average number of instructions executed will depend on the length of the
instruction interpreted and the size of the input area. The particular
response will change according to the state of the input lines.
It has been assumed that the state of the lines is random in nature,
and from this the probabilities for the different outcomes have been
calculated for particular instruction sets. The results for the 8085, 6800
and 68000 are given in table 6.4.
For the 8085, execution will on most occasions exit from the block of
input data, but may take a substantial amount of time to do so if large
blocks exist. Alternatively, a number of specific jumps are possible. In
the previous analyses for execution in program and data areas, these codes
produced random jumps because the operand fields were not dependent on the
particular code. In this case the operand bytes are the same as the code,
and therefore jumps to specific addresses are generated. Due to the layout
of the instruction map, these cause control to transfer to particular
locations in the range C000 to FFFF.
For the 6800, again the majority of cases will result in execution
leaving the area. However, some of these are due to relative branches
backwards out of the beginning of the block. If the preceding area is
unused and the data lines have been left floating, then execution will pass
back to the input area and form a continuous loop until the data on the
port changes.
Execution for the 68000 will tend to continue through to the end of
the area or will generate a restart.
This type of execution can be eliminated in all three processors by
fully decoding the ports so that the input data only appears at single
locations. In the case of the 8085 there is no need to use memory mapped
input unless more than 2048 lines are required, and it should therefore be
avoided if possible.
6.3.1 Execution of Input Data by the 8048 
If no external memory is connected to an 8048 the bus can be used as a
port, and could be used for input data. In this case the data will appear
at all memory locations which were previously left unpopulated in memory
arrangement A shown in figure 6.1. An erroneous jump into this area with
the memory bank select flip-flop set at one will provide no means of
escape, as execution will be restricted within the upper 2K memory block.
With the flip-flop set at zero, the probability of forming an infinite loop
will depend on the half of the memory map in which the erroneous jump
occurs. For the upper half the value is 96.7%, whereas for the lower half
it is only 12.0%.
The formation of infinite loops should be prevented if at all possible
so that it is not necessary to rely on external hardware to initiate
recovery. This can be achieved by avoiding the use of the bus as a port.
However, if it is required it should be used for output and the same
precautions taken as those discussed for unpopulated areas.
6.4 Summary 
This chapter has shown that an erroneous jump into both unused and
input/output areas can result in a complex sequence of execution, which can
last several hundred instruction cycles or even form infinite loops. In
the latter case, recovery can only be initiated by the intervention of some
additional hardware.
Methods of controlling execution within these areas have been
discussed, and a simple solution for most processors, that of loading the
data bus, has been described.
The results obtained are used in the following chapter where the flow
of execution between different memory areas is considered.
CHAPTER 7
Flow of Execution Between Different Memory Areas 
7.1 Introduction 
The previous analyses have produced methods of determining the flow of
execution within certain types of memory areas. This chapter considers the
transfer of execution between these areas, to evaluate the overall response
of a processor after an erroneous jump to any location within the memory
map. Figure 7.1 shows the various states and transitions which will be
studied. Areas of memory mapped output are included with the unused areas,
as they have the same effect. Four final states are present in the model.
These indicate that the processor is expected to halt operation, to enter
an infinite loop, to resume executing valid instructions or to recover from
the error.
7.2 Method of Analysis 
This analysis uses a similar approach to that used for the erroneous
execution in program areas and the equations derived are of the same form.
For example, the probability of execution being in a particular memory area
after a given number of transfers between the different areas is given
by:-
$$P_{X_i}(I) = \sum_{j} P_{X_j}(I-1) \cdot P_{X_j X_i} \qquad \text{Eqn. 7.1}$$
Where:- X_i represents a particular memory area.
X_j represents each of the four different areas.
P_{X_j X_i} is the probability of the transfer from X_j to X_i.
I is the number of transfers after the initial error.
The probability equations for reaching each of the final states are
given by:-
$$P_{X_f}(I) = P_{X_f}(I-1) + \sum_{j} P_{X_j}(I-1) \cdot P_{X_j X_f} \qquad \text{Eqn. 7.2}$$
Where:- X_f represents a particular final state.
Solutions to these equations can be found for all positive integer
values of I, provided that the initial conditions and the probabilities for
each of the transfers are known. The methods of evaluating these quantities
are given in the following three sections.
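The iteration can be sketched as follows. At each transfer the probability
of being in an area is the sum, over all areas, of the probability of having
been in that area one transfer earlier multiplied by the corresponding
transfer probability, and each final state accumulates the probability
transferred into it. The transfer and initial probabilities below are
invented placeholders chosen only so that each row sums to one; they are not
values from this work, and all names are hypothetical.

```python
AREAS = ["program", "data", "unused", "input"]
FINALS = ["halt", "loop", "resume", "recover"]

# Placeholder transfer probabilities (illustrative only, not measured data).
TO_AREA = {xj: {xi: 0.1 for xi in AREAS} for xj in AREAS}
TO_FINAL = {xj: {"halt": 0.1, "loop": 0.05, "resume": 0.3, "recover": 0.15}
            for xj in AREAS}


def step(area_prob, final_prob):
    """Advance the area and final-state probabilities by one transfer."""
    new_area = {xi: sum(area_prob[xj] * TO_AREA[xj][xi] for xj in AREAS)
                for xi in AREAS}
    new_final = {xf: final_prob[xf]
                 + sum(area_prob[xj] * TO_FINAL[xj][xf] for xj in AREAS)
                 for xf in FINALS}
    return new_area, new_final


def iterate(initial_area_prob, transfers):
    """Apply the recurrence for the given number of further transfers."""
    area = dict(initial_area_prob)
    final = {xf: 0.0 for xf in FINALS}
    for _ in range(transfers):
        area, final = step(area, final)
    return area, final


# Placeholder initial conditions for I = 1 (illustrative only).
initial = {"program": 0.25, "data": 0.25, "unused": 0.4, "input": 0.1}
area, final = iterate(initial, 50)
print(final)
```

Because every transfer either stays within the four areas or reaches a
final state, the total probability is conserved at each step, which gives a
simple check on any set of transfer values.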
7.3 Initial Error 
By assuming that the initial error causes a jump to a random location
within the memory map, it follows that the probability of entering each of
the different areas is proportional to the relative size of the block. In
this case the size of the block includes all areas where that particular
memory type appears. If certain memory devices are not fully decoded then
multiple copies of the data will appear in the map and therefore it will be
more likely that execution will enter that area.
The probability of entering the program area immediately after the
erroneous jump, P_P(1), is given by:-
$$P_P(1) = \frac{N_{PB}}{N_{TB}} \qquad \text{Eqn. 7.3}$$
Where:- N_PB is the total number of program bytes which appear in the
memory map.
N_TB is the total number of bytes in the memory map.
P_D(1), P_U(1) and P_I(1), the probabilities of entering the data areas, the
unused areas and the input areas, are found in the same way.
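As a worked illustration of these initial conditions, the sketch below
divides the number of bytes at which each memory type appears by the total
size of the map. The memory map is a made-up example (a 64K map with 8K of
program, 1K of data which appears four times through partial decoding, and
256 input locations); none of these sizes is taken from this work.

```python
def initial_probabilities(bytes_in_map):
    """Probability of the initial jump entering each area of the memory map."""
    total = sum(bytes_in_map.values())
    return {area: count / total for area, count in bytes_in_map.items()}


TOTAL_BYTES = 64 * 1024
example_map = {
    "program": 8 * 1024,
    "data": 4 * 1024,      # 1K of RAM appearing four times in the map
    "input": 256,
}
# Everything not claimed by another area is treated as unused.
example_map["unused"] = TOTAL_BYTES - sum(example_map.values())

probabilities = initial_probabilities(example_map)
print(probabilities)
```

Counting the data area at its decoded size of 4K rather than its physical
size of 1K is what makes entry into a partially decoded device more likely.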
7.4 Transfer from Different Memory Areas 
Erroneous execution in different memory areas has been considered in
the previous three chapters. They have shown that transfer out of each of
the areas is generated in up to five particular ways. These are halts,
restarts, unspecified jumps, returns and specific jumps. Also, in the case
of program areas, execution can resynchronise with the program and resume
valid instruction fetches.
These transfers can be easily converted into those given in figure
7.1. Clearly, the execution of halts and the resumption of valid
instruction fetches correspond directly and need no alteration. The
restarts can have a number of different effects, dependent on the processor
and the contents of particular locations.
In systems such as the 8085, where a restart causes execution to
commence at a given location, it is normally the case that read only memory
will be mapped to these locations. If no consideration for erroneous
restarts has been included it is not uncommon for part of the program to
reside in this area. Under these conditions a restart will cause a
transfer into a program area and execution will continue in the manner
described in chapter 5.
For other systems such as the 6800, the address at which execution
continues after a restart is read from a particular location. Again, read
only memory will normally be mapped to this area, but in this case,
regardless of whether program or data appears at these locations, execution
will transfer to some arbitrary location within the memory map. If the
particular location is considered to be random, then a transfer similar to
the initial jump will occur. This is acceptable when analysing a single
restart for the general case. However, any number of occurrences of a
particular restart in a specific system will always give the same result.
This is considered further in section 7.5.
For an erroneous restart in both types of system, recovery from the
error can be achieved by the addition of a suitable recovery routine which
is always executed when the restart occurs.
The unspecified jumps cause execution to transfer to arbitrary
locations within the memory map, similar to the condition described above.
These transfers are considered to be random in nature and therefore will
have the same effect as the initial jump.
The returns cause transfers dependent on the contents of the top of
the stack. In the following analysis it is assumed to be random and the
corresponding transfers are the same as the initial jump. This is a
reasonable assumption if the stack is used to store data as well as return
addresses, or if it is corrupted by the fault.
Finally, the specific jumps always cause a fixed transfer to a
particular memory area.
7.5 Execution of an Infinite Loop 
The type of execution which has not been determined in the previous
sections is the formation of an infinite loop. In this case the processor
continually executes a fixed sequence of instructions, and no recovery is
possible without some external intervention. The formation of loops in
three different areas has been considered. In all cases the analysis
estimates the probability of executing the same bytes twice, and if this
happens it is assumed that a loop has formed. In real systems this
situation will not necessarily result in a loop, because data may change in
such a way that returns and conditional jump instructions will act in a
different way the second time that they are executed. Therefore the
analysis will tend to overestimate the true value.
7.5.1 Loops in Data Areas 
The first area in which a loop has been considered is the data area.
For this case, at the end of each transfer after the initial error, a
calculation is made to determine the expected number of data bytes which
will have been read. This is obtained from the following equation:-
$$N_{BE}(I) = \sum_{k=1}^{I} P_D(k) \cdot N_{BAV}$$   Eqn. 7.4
Where:- $N_{BE}(I)$ is the expected number of data bytes read after I
transfers.
$N_{BAV}$ is the average number of bytes read during erroneous
execution in data areas.
Assuming that transfers into the data area are random, the probability of
entering a loop in the data area, $P_{LD}(I)$, is given by:-
$$P_{LD}(I) = P_D(I) \cdot \frac{N_{BE}(I)}{N_{DA}}$$   Eqn. 7.5
$N_{DA}$ is the actual number of data bytes in the memory map; it does not
include the extra bytes which appear if partial decoding is used. This is
important because without full decoding identical strings appear in more
than one location, and therefore it is more likely that a loop will form
with a particular string.
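
As an illustration, equations 7.4 and 7.5 can be evaluated with a short
Python sketch. The per-transfer probabilities P_D(k), the value of N_BAV and
the size of the data area used below are hypothetical and are not taken from
the analysis; the function names are introduced here for illustration only.

```python
def expected_data_bytes_read(p_data, n_bav):
    """Eqn. 7.4: N_BE(I) = sum over k = 1..I of P_D(k) * N_BAV, for each I."""
    totals = []
    running = 0.0
    for p in p_data:
        running += p * n_bav
        totals.append(running)
    return totals


def data_loop_probability(p_data, n_bav, n_da):
    """Eqn. 7.5: P_LD(I) = P_D(I) * N_BE(I) / N_DA, for each I."""
    n_be = expected_data_bytes_read(p_data, n_bav)
    return [p * n / n_da for p, n in zip(p_data, n_be)]


# Hypothetical inputs (assumptions, not results from the thesis):
p_data = [0.25, 0.05, 0.01]   # P_D(k) for transfers k = 1, 2, 3
n_bav = 20.0                  # average bytes read per visit to the data area
n_da = 2048                   # actual number of data bytes in the memory map

for i, p in enumerate(data_loop_probability(p_data, n_bav, n_da), start=1):
    print(f"P_LD({i}) = {p:.6f}")
```

Because N_DA counts only the actual data bytes, a partially decoded data
area should be entered here with its true size and not with the size of the
address range it occupies.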
7.5.2 Loops in Unused Areas
Execution in unused areas follows a number of fixed sequences for a
given processor and hardware arrangement. If a particular sequence is
executed twice it is assumed that a loop has formed. Again this will tend
to give an overestimate, for the same reasons as before. For this
analysis it is necessary to evaluate the probability that execution has
been in the unused area. For each state the following expression is used:-
$$P_{UXi}(I) = \sum_{j} P_{UXj}(I-1) \cdot P_{XjXi}$$   Eqn. 7.6
Where:- $P_{UXi}(I)$ is the probability of execution in a given area after
execution in an unused area.
Xj corresponds to each of the memory areas.
XjXi represents the transfer from area Xj to Xi.
From this it follows that the probability of entering the unused area
twice, $P_{UU}(I)$, is given by:-
$$P_{UU}(I) = \sum_{i=1}^{4} P_{UXi}(I) \cdot P_{XiU}$$   Eqn. 7.7
Where:- $P_{XiU}$ is the probability of a transfer from memory area Xi to the
unused area.
However, not all double entries into the unused areas will cause a loop,
because in some cases a number of different sequences appear in the area.
Therefore $P_{LU}(I)$, the probability of forming a loop in the unused area, is
given by:-
$$P_{LU}(I) = P_{UUL} \cdot P_{UU}(I)$$   Eqn. 7.8
$P_{UUL}$ is the probability of forming a loop after entering the unused area
twice. In the following examples it is given a value equal to the
proportion of sequences which cause specific transfers. This will also
give an overestimate for the probability of forming a loop, as the second
transfer may not be the same as the first. However the figures from the
overall analysis in the following sections, using the previous assumptions,
indicate that the probability of forming a loop is small. Therefore the
inaccuracies in the model cannot have much of an effect on the final
results.
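
The three expressions of this section can be followed numerically with the
Python sketch below. It is a literal transcription of equations 7.6 to 7.8
only: the ordering of the four areas, the transfer probabilities, the
starting vector and the value of P_UUL are all hypothetical, and the initial
conditions used in the thesis are not reproduced here.

```python
def propagate(p_prev, transfer):
    """Eqn. 7.6: P_UXi(I) = sum over j of P_UXj(I-1) * P_XjXi."""
    n_areas = len(transfer[0])
    return [
        sum(p_prev[j] * transfer[j][i] for j in range(len(p_prev)))
        for i in range(n_areas)
    ]


def unused_reentry(p_now, to_unused):
    """Eqn. 7.7: P_UU(I) = sum over i of P_UXi(I) * P_XiU."""
    return sum(p * t for p, t in zip(p_now, to_unused))


def unused_loop(p_uul, p_uu):
    """Eqn. 7.8: P_LU(I) = P_UUL * P_UU(I)."""
    return p_uul * p_uu


# Hypothetical area order: program, data, unused, input.
# transfer[j][i] is the assumed probability of a transfer from area j to
# area i; rows sum to less than 1 because some transfers end in a final
# outcome (halt, recovery, and so on).
transfer = [
    [0.10, 0.05, 0.20, 0.05],
    [0.30, 0.10, 0.20, 0.10],
    [0.40, 0.10, 0.00, 0.10],
    [0.20, 0.10, 0.30, 0.10],
]
p_prev = [0.4, 0.1, 0.0, 0.1]                 # assumed P_UXj(I-1)
to_unused = [row[2] for row in transfer]      # P_XiU for each area

p_now = propagate(p_prev, transfer)
p_uu = unused_reentry(p_now, to_unused)
p_lu = unused_loop(0.5, p_uu)                 # P_UUL = 0.5 is an assumption
print(p_now, p_uu, p_lu)
```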
7.5.3 Loops in Input Areas 
The formation of loops in the input areas is treated in the same way
as those for the unused areas, and similar expressions to equations 7.6 and
7.7 are obtained. Therefore the probability of forming a loop in the input
area, $P_{LI}(I)$, is given by:-
$$P_{LI}(I) = P_{IIL} \cdot P_{II}(I)$$   Eqn. 7.9
Where:- $P_{II}(I)$ is the probability of entering the input area twice.
$P_{IIL}$ is the probability of forming a loop after entering the
input area twice.
$P_{IIL}$ is evaluated by considering individual arrangements of the memory map.
For a single block it will take the value of 1. For multiple blocks which
are separated by other memory types, a value equal to the reciprocal of the
number of different blocks will give accurate results for the first reentry
to the area, but will be less accurate on subsequent entries. For adjacent
blocks execution can pass between them and the formation of a loop is more
likely. If the probability that execution passes through the area is high,
which is true for most processors, it will tend to reach the end of the
last block regardless of the starting point. In this case the probability
$P_{IIL}$ tends to the value of 1.
7.6 The Expected Number of Instructions Executed 
In the previous chapters the average number of instructions
interpreted during erroneous execution in each of the memory areas has
been established. Now by combining these values with the probabilities of
passing through the different areas, it is possible to estimate the
expected number of instructions executed, $N_{IE}$, between the original error
and reaching the final outcome. In the following examples it has been
obtained from:-
$$N_{IE} = \sum_{I=1}^{\infty} \left( \sum_{i=1}^{4} P_{Xi}(I) \cdot N_{IAVi} \right)$$   Eqn. 7.10
Where:- $P_{Xi}(I)$ is the probability that execution is in the 'i'th memory
area at I states after the error.
$N_{IAVi}$ is the average number of instructions executed for the
corresponding memory area.
A more accurate result could be obtained by considering the average
number of instructions executed before each of the possible transfers.
These could then be combined with the probabilities of the corresponding
transfers, and would give individual values for each of the outcomes.
However, in most cases the averages do not vary significantly, and
therefore the overall value will be a reasonable approximation. The cases
where a large variation does exist are for input and unused areas where no
fault tolerance has been considered.
Once the expected number of instructions has been established for a
particular system, the average length of time of erroneous execution can be
determined from the clock frequency. This is a very important quantity
when considering watchdog designs, and is discussed further in section 7.8.
It is also useful in determining the probable damage to the data within
the system that will be caused by the execution of erroneous instructions;
this is studied in section 7.9.
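
The calculation described in this section can be sketched in Python as
follows. The sum of equation 7.10 is taken over whatever states are
supplied, so it is truncated to a finite number of states. The area
probabilities, the averages for each area, the clock frequency and the
average number of clock cycles per instruction are all hypothetical; the
last of these is an extra assumption needed to convert an instruction count
into a time.

```python
def expected_instructions(p_area_by_state, n_iav):
    """Eqn. 7.10, truncated to the states supplied.

    p_area_by_state[I-1][i] is P_Xi(I); n_iav[i] is N_IAVi.
    """
    return sum(
        p * n
        for state in p_area_by_state
        for p, n in zip(state, n_iav)
    )


def erroneous_execution_time(n_ie, clock_hz, cycles_per_instruction):
    """Average duration of erroneous execution, in seconds."""
    return n_ie * cycles_per_instruction / clock_hz


# Hypothetical values (assumptions, not thesis results).
# Area order: program, data, unused, input.
p_area_by_state = [
    [0.25, 0.25, 0.50, 0.00],   # P_Xi(1)
    [0.10, 0.05, 0.05, 0.00],   # P_Xi(2)
]
n_iav = [4.0, 10.0, 20.0, 100.0]

n_ie = expected_instructions(p_area_by_state, n_iav)
duration = erroneous_execution_time(n_ie, clock_hz=3.0e6,
                                    cycles_per_instruction=7)
print(f"N_IE = {n_ie:.2f} instructions, about {duration * 1e6:.1f} us")
```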
7.7 The Effects of Memory Map Usage on Erroneous Execution 
The previous sections have built up a model for the flow of execution
following an erroneous jump to a random location in the memory map. From
this model a series of investigations has been carried out to study the
effects of varying the amounts of different memory types. The improvements
achieved by adding the fault tolerant features, described in the previous
chapters, have also been studied. Clearly the results vary between
processors, and they are discussed individually in the following sections.
7.7.1 Memory Maps of the 8085
The values used to obtain results for this section are taken from the
analyses in the previous chapters. For each of the memory areas both fault
tolerant and non-fault tolerant structures are considered. For the data
area the fault tolerant case consists of single restarts separating 5 byte
blocks, which is the optimum seeding with a 20% overhead. For the program
area the results from the standard and modified versions of program B are
used. Both the unloaded and loaded conditions of the bus are studied for
the unused areas. To simplify the results, no areas of input data are
considered in this section.
From this information the effects of varying the amounts of each
memory area, and the addition of the fault tolerant features, have been
established. When varying the size of one memory type it is inevitable
that at least one other must alter in size. To overcome this problem, the
size of each particular memory type was varied between 2 and 62 K bytes,
while the other two filled the remainder of the space in equal proportions.
The effects of adding the fault tolerant features can then be seen by
comparing the results between the unmodified and the modified arrangements.
These are shown in graphical form in figures 7.2, 7.3, 7.4 and 7.5.
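
The way in which the memory map is divided for each point on these graphs
can be illustrated with a minimal Python sketch. It assumes the full 64 K
byte address space of the 8085 and the three memory types considered in this
section; the 2 K byte step between points is an assumption, as the step
actually used is not stated.

```python
TOTAL_K = 64                                  # 8085 address space in K bytes
MEMORY_TYPES = ("program", "data", "unused")


def sweep(varied, step_k=2):
    """Return one memory arrangement per size of the varied memory type.

    The varied type runs from 2 K to 62 K bytes and the other two types
    share the remaining space equally.
    """
    others = [name for name in MEMORY_TYPES if name != varied]
    arrangements = []
    for size_k in range(2, 63, step_k):
        share = (TOTAL_K - size_k) / 2
        arrangement = {varied: float(size_k)}
        for name in others:
            arrangement[name] = share
        arrangements.append(arrangement)
    return arrangements


for arrangement in sweep("unused")[:3]:
    print(arrangement)
```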
In each case the results for the non-fault tolerant memory area
include a recovery routine, so that the execution of any restart generates
an ordered recovery from the error. Without the routine, the restarts in
the 8085 cause execution to transfer to the low order addresses. In most
cases without any fault tolerance, the program will reside in this area.
Previous results have shown that around 95% of these transfers will cause a
resumption of program execution. Therefore the removal of the recovery
routine forces nearly all of the outcomes, which previously generated
recovery, to resume program execution. A series of tests was carried out
to check this arrangement, and they showed almost identical results for the
formation of loops and the execution of halts.
7.7.1.1 Fault Tolerant Program Area 
Figure 7.2 shows the effects of adding fault tolerance to the program
area, by forcing restart instructions into the operand fields. It shows
that the probability of recovery increases with the program size, but even
with a large program most of the errors will result in a resumption of
program execution. Therefore it indicates that the most significant
improvements can be obtained by detecting the error after execution has
reentered the program.
However, this does not mean that no consideration should be given to
the positioning of memory types. In most systems memory map decoding is
arbitrary, and a number of different arrangements can be obtained with the
same hardware, and only minor modifications to the interconnections between
decoders and memory devices. Therefore, if this concept is considered at
the design phase, no added cost in hardware or software design will be
incurred. Also the hardware reliability will not be reduced, as there are
no additional components.
The added advantage of detecting the error by the erroneous execution
of a restart is the speed of recovery, which will be initiated within a
few instruction cycles. If detection is carried out within the program, a
long delay is possible before reaching the checking routines, and even then
they may fail to detect the error. It would then remain uncorrected until
detected at a higher level, and would result in a further delay. This is
particularly important in critical high speed applications where errors
must be detected and corrected quickly.
7.7.1.2 Fault Tolerant Data Area
The effects of seeding the data area are shown in figure 7.3. As
expected the improvements obtained increase with the size of the data area,
but in all cases they are only moderate. This has to be offset against the
increase in the amount of hardware necessary. In the example 20% extra
memory is required which will produce a corresponding decrease in the
hardware reliability of the system.
This gives a clear demonstration that adding fault tolerance for a
certain class of fault can reduce the reliability in connection with
another fault type, and therefore can result in an overall degradation of
the full system performance. It has been suggested by Castillo et al (22)
that transient failures are up to 50 times more frequent than permanent
failures. This figure was obtained for medium sized computers which would
normally be subjected to stable electrical and environmental conditions.
For industrial control conditions it is expected that the transient error
rate is much higher, and therefore the seeding of the data area may produce
an overall improvement. However, other methods of recovery can be employed
which are more likely to give a greater improvement. These are described in
chapter 8, and require little extra hardware.
A disadvantage with these methods is the delay between the fault and
the detection of the subsequent errors, as mentioned in the previous
section. This is further illustrated in figure 7.5 (a), where the effect
on the average number of instructions executed before reaching the final
outcome is shown for data areas with and without fault tolerance. The
seeding of the data area results in fewer instructions being executed, and
will give a more rapid recovery. It is therefore useful in time critical
systems, particularly with large data areas.
In systems which have extra capacity within the data area, an
improvement will always be made by using the spare locations to seed the
area with restarts, as no additional hardware is required.
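
The seeding arrangement described for the 8085 can be sketched in Python as
follows. The function names are hypothetical, and the use of RST 7 (opcode
FF hexadecimal) as the seed is an assumption made for the example; any
restart opcode could be passed in its place. A seed is also placed after the
last block so that the overhead is exactly one seed per block.

```python
RST7 = 0xFF  # 8085 RST 7 opcode; the choice of restart is an assumption


def seed_data(data, block_size=5, seed=bytes([RST7])):
    """Return the data with the seed bytes placed after every block."""
    seeded = bytearray()
    for start in range(0, len(data), block_size):
        seeded += data[start:start + block_size]
        seeded += seed
    return bytes(seeded)


def seeding_overhead(block_size, seed_length):
    """Extra memory required, as a fraction of the original data size."""
    return seed_length / block_size


table = bytes(range(10))          # hypothetical 10 byte data table
seeded = seed_data(table)
print(seeded.hex(" "))
print(f"overhead = {seeding_overhead(5, 1):.0%}")
```

With block_size=10 and a two byte seed the same functions describe the
arrangement used for the 6800 in section 7.7.2, which also gives a 20%
overhead.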
7.7.1.3 Fault Tolerant Unused Areas 
The effects of loading the bus, so that restart instructions are
interpreted when execution enters an unused area, are shown in figure 7.4.
Once again the improvements achieved increase with the size of the area,
but in this case they are quite substantial even for a small area. The
only extra hardware required is 8 pull-up resistors. These
components are highly reliable when compared with integrated circuits, and
will have a negligible effect on the overall hardware reliability. Also,
a failure to open circuit will not by itself cause total system failure; it
will only result in the response to an error reverting to the non-fault
tolerant condition.
An additional advantage of this arrangement is the reduction in the
number of erroneous instructions executed following a fault. This is shown
in figure 7.5 (b). Not only does it reduce the time taken to initiate
recovery, but it also reduces the probability of destroying data within the
system. The advantages of adding fault tolerance to the unused areas are
very significant, and therefore it should be incorporated in all 8085
systems.
7.7.2 Memory Maps of the 6800 
The effects of adding fault tolerant features to the 6800 are shown in
figures 7.6, 7.7 and 7.8. As with the 8085, the non-fault tolerant memory
areas are shown with a recovery routine. The restart on the 6800 reads the
address at which execution resumes from the high order memory area. If
the vector has not been set an arbitrary jump will occur, which is assumed
to be random. The memory map is considered to be arranged with the data
area at the low order addresses, and the program area at the high order
addresses. This is the normal arrangement so that non-volatile memory is
resident at the restart and interrupt vectors, and so that direct
addressing can be used for frequently accessed data in the zero page. With
this situation a jump into the non-fault tolerant unused area results in a
transfer back into the program area. Therefore most restarts, without the
vector set, will cause a resumption of program execution.
Figure 7.6 shows the effects of adding fault tolerance to the program
area. A similar result is obtained to the 8085 and the same conclusions
can be drawn.
For the data area, fault tolerance is added in the form of two restart
bytes separating 10 byte blocks of data, representing the optimum
arrangement for a 20% overhead. A very different set of results is
obtained, and these are shown in figure 7.7. This is due to the ratio
between the number of halt and restart instructions in the 6800 instruction
map. In this case the seeding of the area with restarts does have a
significant effect, especially for systems with large data areas.
Therefore it is more likely to produce an overall improvement in system
performance despite the additional hardware required. In any case, spare
bytes should be used in pairs to separate blocks of data.
For the unused area, the results are shown in figure 7.8. Again a
very different response is obtained from that given by the 8085. Because
of the memory layout, erroneous execution in the non-fault tolerant unused
area leads to a resumption of program execution. Therefore it might be
suggested that fault detection could be carried out within the program.
But as execution continues sequentially through the unused area, a very
long delay could be generated. For example, the average number of
instructions executed in a 6K block will be 1024, because triple byte
non-jumping instructions are interpreted. Therefore fault tolerance in the
form of bus loading should be included in all 6800 systems to enable rapid
recovery from erroneous execution in unused areas. As with the 8085, this
has a negligible effect on the hardware reliability.
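
The figure of 1024 instructions quoted above can be checked with a short
Python sketch. It assumes that the point of entry is uniformly distributed
over the block, so that on average half of the block remains to be
traversed, and that execution then runs sequentially to the end of the
block; these assumptions are an interpretation of the example and are not
stated explicitly.

```python
def average_instructions_to_end(block_bytes, instruction_bytes):
    """Average instructions executed from a random entry point to block end."""
    average_bytes_remaining = block_bytes / 2
    return average_bytes_remaining / instruction_bytes


# 6K block of unused memory read as three byte non-jumping instructions.
print(average_instructions_to_end(6 * 1024, 3))
```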
7.7.3 Memory Maps of the 68000 
For the 68000 significant improvements cannot be obtained by forcing
restart instructions into the operand fields, as very few erroneous
instructions are interpreted during execution in program areas. As with
the other processors, on most occasions it is necessary to detect erroneous
execution in the program area from within the software. Very little effect
is possible on the execution in data areas as very few instructions will be
executed. Also, approximately 95% of the transfers out of this area will
be restarts in the form of exception handling.
For the unused area it was shown, in section 6.2.4, that instruction
fetches will generate restarts by reading the code for an unassigned
operation. This occurs without any modifications. However in later
versions of the device, the code may be assigned a function. Therefore in
view of future developments, a better solution would be to force a valid
restart instruction onto the bus.
The necessity of setting the restart vectors and providing a recovery
routine is obvious from the discussions for the other processors. For the
68000 it is even more important because of the generation of restarts at
each unused location. In the same way as the 6800 an arbitrary jump will
occur if the vector is not set. If the address to which execution
transfers is also unused another restart will be generated. This will
repeat in an infinite loop with no means of escape, except by external
intervention from hardware. Due to the large addressing range of 16 M
bytes, it is likely that only a small proportion will be used, especially
for industrial control, and therefore the setting of the vectors is even
more critical.
7.7.4 Memory Maps of the 8048 
Memory map variations for the 8048 are fairly limited. With 12
address lines, instruction fetches are restricted to only 4 K of memory.
Random access memory is mapped to separate locations and cannot be executed
as instructions. However fixed data can appear in the 4 K map and may
therefore be read as instructions under fault conditions.
Due to these tight constraints, very little can be done to the program
area to improve error detection. Again it is necessary to carry out the
checking process from within the program. Seeding of the data area was
investigated in section 4.7.3, and showed that improvements were very
slight due to the absence of a restart instruction in the processor.
However, as the data is always known before execution, it is possible to
check for any sequences which would result in undesirable execution, such
as an infinite loop. If such sequences are found, the data could be
rearranged to eliminate them.
The type of execution expected for different unused blocks was
discussed in section 6.2.3. Bus loading was shown to be particularly
important, as without it there is a high probability of forming an infinite
loop for certain arrangements.
Because there is less scope for the detection and correction of errors
by the processor, it is necessary to rely more heavily on an external
hardware monitor, such as a watchdog timer. However, this can result in
long delays before correct execution is restored, due to the time-out
period of such devices.
7.8 Number of Erroneous Instructions Executed 
In the previous sections the analyses have led to a figure for the
expected number of erroneous instructions executed between the initial
error and reaching the final state. This gives an indication of the
probable length of erroneous execution, but does not produce limits for the
most likely events.
These can be achieved by assuming that the distribution of the
probability, $P_{NI}(N)$, that N instructions or more will be executed, follows
an exponential curve. $P_{NI}(N)$ is then given by:-
$$P_{NI}(N) = e^{-A \cdot N}$$   Eqn. 7.11
Where:- A is a constant.
It can be shown that the expected value for this function is equal to
the reciprocal of A. From this information it is possible to determine
$N_{IL}$, the limit of the number of instructions executed for a given
proportion, $P_E$, of the errors. These quantities are related by:-
$$N_{IL} = -N_{IE} \cdot \ln P_E$$   Eqn. 7.12
Where:- $N_{IE}$ is the expected number of instructions executed, determined
from the previous sections.
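
The step from equation 7.11 to equation 7.12 is not written out in full. A
sketch of the intermediate working is given below, assuming that N is
treated as a continuous quantity and that the constant A is fixed by setting
the expected value of the distribution equal to the expected number of
instructions.

```latex
% Expected value of N when P_{NI}(N) is the probability of N or more:
E[N] = \int_{0}^{\infty} P_{NI}(N)\,dN
     = \int_{0}^{\infty} e^{-A N}\,dN
     = \frac{1}{A}
% Setting E[N] = N_{IE} gives A = 1 / N_{IE}.
% P_E is the proportion of errors that reach the limit N_{IL}:
P_E = P_{NI}(N_{IL}) = e^{-N_{IL} / N_{IE}}
\quad\Longrightarrow\quad
N_{IL} = -N_{IE} \cdot \ln P_E
```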
For example from equation 7.12, $N_{IL}$ takes the value 22.2 when $N_{IE}$ is equal
to 9.65 and $P_E$ is equal to 0.1. This means that where the expected number
of instructions executed is 9.65, 90% of the errors will result in fewer
than 23 instructions being executed before reaching the final states. This
gives close agreement with figure 4.1 (a) for execution in the data area of
the 8085, which has an average number of instructions executed of 9.65. It
therefore suggests that this is likely to be a reasonable approximation for
the overall execution.
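
The worked example can be reproduced with the following Python sketch, which
evaluates equations 7.11 and 7.12 with the constant A taken as the
reciprocal of the expected number of instructions. The function names are
introduced here for illustration only.

```python
import math


def prob_at_least(n, n_ie):
    """Eqn. 7.11 with A = 1 / N_IE: probability of N or more instructions."""
    return math.exp(-n / n_ie)


def instruction_limit(n_ie, p_e):
    """Eqn. 7.12: limit N_IL reached or exceeded by a proportion P_E of errors."""
    return -n_ie * math.log(p_e)


limit = instruction_limit(9.65, 0.1)
print(round(limit, 1))                  # 22.2
print(prob_at_least(limit, 9.65))       # approximately 0.1
```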
These limits are useful in estimating the proportion of errors which
will be detected by a watchdog for various time-out periods. Another use
for these values is in estimating the damage to data within the system.
This is discussed further in the following section.
7.9 Probability of Data Corruption 
Having established a method of estimating the number of erroneous
instructions executed, it is possible to determine the probable effects
that this will have on the data within the system. Every instruction has
the effect of changing at least one quantity, as they all alter the
contents of the program counter. For the 8085, the effective number of
instructions which change other quantities is shown in table 7.1, and the
probability that a single instruction will not cause a corruption is also
given. This assumes that the instructions interpreted are totally random
in nature. For N instructions the probability, $P_{NC}(N)$, that no corruption
occurs, is given by:-
$$P_{NC}(N) = P_{NC}(1)^{N}$$   Eqn. 7.13
Figure 7.9 shows how the probability that no corruption will occur to
the accumulator and the B register decreases as more instructions are
executed.
From values obtained using the previous section, this leads to the
estimation of the lower bounds on the probability that no corruption to a
particular data element will occur. Using the previous example, of fewer
than 23 erroneous instructions being executed, the probability of no
corruption occurring to the B register in an 8085 is 33.0%, whereas the
probability of no corruption to the accumulator is only 0.01%.
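
Equation 7.13 can be explored with the Python sketch below. The values of
table 7.1 are not reproduced here, so the single instruction probability for
the B register is back-calculated from the quoted figure of 33.0% on the
assumption that it corresponds to exactly 23 instructions; it is therefore
an illustrative estimate and not a value from the table.

```python
def prob_no_corruption(p_single, n):
    """Eqn. 7.13: P_NC(N) = P_NC(1) ** N."""
    return p_single ** n


def implied_single_instruction_probability(p_after_n, n):
    """Invert eqn. 7.13 to recover P_NC(1) from P_NC(N)."""
    return p_after_n ** (1.0 / n)


# Assumption: the quoted 33.0% for the B register corresponds to N = 23.
p_b_single = implied_single_instruction_probability(0.33, 23)

for n in (1, 5, 10, 23):
    print(n, round(prob_no_corruption(p_b_single, n), 3))
```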
7.10 Summary 
This chapter has used the information derived from the previous three
chapters to determine the flow of erroneous execution following a jump to
a random location in the memory map. The effects of varying the amounts of
different memory types have been studied for a variety of processors, and
the relative merits of the different methods of introducing fault tolerance
to each of the areas have been established.
It has been shown for all processors that bus loading, to cause
restarts in unused locations, is a very effective way of initiating rapid
recovery. For example, with the 8085 the proportion of errors resulting in
recovery can be increased from around 20% to over 90%. This level of
improvement is obtained when only a small proportion of the memory map is
used, which is the case in most small scale industrial controllers.
The positioning of memory areas, to introduce particular values into
the operand fields of programs, provides improvements of less than 10% and
also increases the speed of recovery. Although the benefits are small the
method should be considered when designing systems, as these improvements
are obtained without involving any additional costs.
Seeding data areas with restart instructions requires a substantial
increase in hardware if spare capacity is not available. Not only does
this increase costs but it also reduces the overall hardware reliability.
Only the 6800 showed the capability of a significant improvement in
recovery, and therefore it is the only processor for which it is worth
considering the use of this method. However, in order to provide an
overall improvement the increase in reliability due to the recovery from
transient faults must be greater than the reduction in reliability due to
permanent hardware failures. Therefore significant improvements can only
be obtained, by this method, in systems which suffer from a high proportion
of transient failures.
Finally, methods of determining the limits of the number of erroneous
instructions executed have been presented. These are used in the
following chapter, where examples of adding fault tolerance to a specific
system will be studied.
CHAPTER 8
Selection of Error Detection Mechanisms
8.1 Introduction 
In the previous chapters it has been shown that corruption to the flow
of program execution can occur in a number of ways. For this reason,
methods have been proposed for the detection of erroneous execution so as
to enable the early initiation of recovery processes. So far the
individual methods have been considered in isolation when applied to
general systems. This chapter looks at a specific system and investigates
the effects of adding each of the mechanisms, to establish which ones
should be adopted. In addition, some hardware mechanisms to detect
erroneous execution are also discussed.
8.2 Specific System Considered
The specific system considered is a general purpose single board
computer based on the 8085 microprocessor. It has been used for a number
of applications within the British Gas Corporation. The system contains 4K
EPROM, 2K RAM, four 8 bit input ports and four 8 bit output ports. The
memory locations at which these devices can be accessed are shown in figure
8.1. The EPROM, RAM and each input port are selected as 4K blocks,
therefore the RAM is mapped into two adjacent 2K blocks and each individual
port can be accessed from 4096 different locations. All the output ports
appear within a 4K block and are individually selected by the states of
four address lines. Therefore if all four lines are active, within the 4K
block, all the output ports will be selected together. This means that
individual output ports can be selected from 2048 different addresses which
appear in blocks of 256 locations. Pull-up resistors are connected to all
the data lines so that the value FF is read from all unused locations.
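
The address counts quoted for the output ports can be checked with a short
Python sketch. It assumes that the four select lines are the address lines
A8 to A11 and that a port is selected when its line is high. Neither detail
is given in the description above; they are chosen because they are
consistent with the figures of 2048 addresses and blocks of 256 locations.

```python
BLOCK_SIZE = 4096               # the 4K block containing the output ports
SELECT_BITS = (8, 9, 10, 11)    # assumed select lines A8..A11, active high


def selected_ports(offset):
    """Return the output ports selected at this offset within the block."""
    return [
        port for port, bit in enumerate(SELECT_BITS)
        if (offset >> bit) & 1
    ]


def addresses_for_port(port):
    """Return every offset in the block at which the given port is selected."""
    return [
        offset for offset in range(BLOCK_SIZE)
        if port in selected_ports(offset)
    ]


print(len(addresses_for_port(0)))        # addresses selecting port 0
print(selected_ports(0xF00))             # all four select lines active
```

Under these assumptions the selection is constant over each aligned run of
256 addresses, because the low eight address lines take no part in the
decoding.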
8.3 The Effects of Adding Error Detection Mechanisms 
A number of analyses, on the effects of adding error detection
mechanisms, have been carried out based on the layout of the system
described above. Results from these investigations are given in table 8.1.
Some fault tolerance was considered in the design of the system and has
already been incorporated. Therefore, to show the advantages of including
those features, additional studies have been performed on the corresponding
design without the features.
8.3.1 The Non-Fault Tolerant System
The results for the entirely non-fault tolerant arrangement are
labelled 'A' in table 8.1. They show that a large number of jumps into
random locations within the memory map terminate with the entering of the
wait state, by the execution of a halt instruction. This is due to the
property of the unused locations which tend to lead execution towards the
halt instructions. Another observation made is the large number of
erroneous instructions executed before reaching one of the final states,
and this is due to the large portion of the memory which is mapped to very
loosely decoded input ports. In most cases execution passes straight
through these areas repeatedly executing the same instruction several
hundred times.
From this set of basic results, the aim is to select error detection
mechanisms to improve the response of the system under fault conditions.
It has been shown previously that some methods can produce an overall
degradation, despite an improvement with regard to an individual fault
type. This is usually as a result of increased complexity which is
inevitable when adding extra features. Therefore, it is clear that any
additions must be both simple and effective against the considered fault.
For the error detection mechanisms studied in the previous chapters it
has been shown that their effectiveness is related to the size of the
particular memory block. Therefore the greatest improvements are obtained
by implementing the features associated with the largest blocks. For the
system considered, these consist of unused areas and input ports.
8.3.2 Removal of Input Areas from the Memory Map 
The input ports take up one quarter of the memory map, and therefore
will have a significant effect on the response to a random jump.
Arrangement 'B' considers the effect of removing the input ports from the
memory map. Table 8.1 shows that little change occurs in the probability
of reaching each of the final outcomes. However, a vast decrease, of
nearly 95%, in the average number of erroneous instructions executed is
produced.
A similar response is observed in section 8.3.4 when the ports are
removed while other features are present. A reduction in erroneous
execution is important to limit the amount of damage which might be done
during that time. It also enables rapid recovery, which is required in
control situations where time is critical. It has been indicated so far
that the aim is to initiate a recovery process. However, the recovery
process may not be successful if too much damage is done to the data within
the system. A discussion of the effects of delays in initiating a recovery
routine on the success of recovery is presented by Preece et al (81).
Therefore the improvements obtained by removing the ports are highly
desirable, and can be easily implemented in this case. The 8085 allows for
separately mapped I/O by the use of the IO/M line from the processor. This
can be connected directly to the enable pins on the input buffers, and does
not require any other logic. Therefore, no detrimental effect on the
response to other fault types is expected, and clearly this modification
should be included.
8.3.3 Addition of a Recovery Routine 
In the previous arrangements discussed, no specific recovery is
possible, as no provision has been made for it to occur. On approximately
one quarter of the occasions, execution does resume with the interpretation
of valid instructions. In a control application where all data is read in
at the beginning of each cycle, the resumption of program execution will
give full recovery at the start of the next cycle. However, it is normally
the case that information is passed from previous calculations, and
therefore a resumption of program execution will not provide acceptable
recovery. This mechanism is also unsuitable in cases where a single wrong
output can be harmful to the system.
In these cases it is necessary to include recovery software to
generate an ordered return to correct execution. This is written most
effectively as a restart routine, to enable easy access and to
automatically initiate recovery when a restart instruction is erroneously
executed. The effect of adding a recovery routine, which is entered by any
restart instruction, is given by arrangement 'C' in table 8.1. It shows
that over 15% of the final outcomes transfer from a resumption of program
execution to a complete recovery. However, the full benefits are not
realised until efforts are made to force erroneous execution to interpret
more restarts.
The addition of a recovery routine does add to the complexity of the
system. If spare capacity is not available, extra memory will be required,
which will result in a reduction in overall hardware reliability. However,
provided that the routine is small, failures resulting from its
implementation will be negligible in relation to the benefits obtained, and
therefore it should be included.
8.3.4 Forcing Restart Instructions into the Unused Areas 
The majority of the memory map is unused, and it was shown in chapter
6 that these locations could easily be made to appear as restart 7
instructions by the addition of pull-up resistors to the data lines. This
modification is given in 'D', and represents the system as it was designed.
It demonstrates the vast improvements which can be obtained by this method,
lifting the proportion of errors leading to recovery to well over 90%.
However, the number of instructions executed before recovery can still be
very high, and this is due to the input areas in the memory map.
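
The effect of the pull-up resistors can be illustrated with a minimal
sketch. It assumes the standard 8085 encoding, in which restart 7 is the
single byte FFH and transfers control to address 0038H; the memory
contents and the function names are hypothetical and are not taken from
the system studied.

```python
RST7_OPCODE = 0xFF    # 8085 RST 7 is the single-byte op-code FFH
RST7_VECTOR = 0x0038  # RST n transfers control to address 8 * n


def read_byte(populated, address):
    """Return the byte seen on the data bus for a 16-bit address.

    `populated` maps addresses backed by a device to their contents.
    Any other address leaves the bus undriven, so the pull-up
    resistors make it read as all ones (FFH).
    """
    return populated.get(address & 0xFFFF, RST7_OPCODE)


def restart_target(populated, address):
    """Return the RST 7 vector if an op-code fetch here reads RST 7."""
    if read_byte(populated, address) == RST7_OPCODE:
        return RST7_VECTOR
    return None


# Hypothetical three-byte program: JMP 0100H stored at address 0000H.
rom = {0x0000: 0xC3, 0x0001: 0x00, 0x0002: 0x01}
```

With this model, an erroneous fetch from any unpopulated address (for
example 8000H) reads FFH and is vectored to the recovery routine, which
is the behaviour attributed to arrangement 'D'.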
Arrangement 'E' shows the effect of removing the input ports from the
map while retaining the other features. As before, very little change
occurs in the proportion of the final outcomes, because, as indicated
previously, most execution passes straight through and reaches the unused
areas following these blocks. However, the number of erroneous
instructions executed is reduced to single figures. In 90% of the cases
fewer than 4 will be executed.
As indicated above, implementation of this feature is straightforward
and has a negligible effect on hardware reliability, and should therefore
be included in the system.
8.3.5 Modifying the Program and Data Areas 
It was shown, in chapters 4 and 5, that modification to the data and
program areas does not produce large improvements in the error detection
process. Both these areas are relatively small, in the system being
studied, and therefore little improvement is to be expected. The effects
of adding methods of encouraging recovery during erroneous execution in the
program and data areas are shown in arrangements F, G, H and I. For the
program area, this consists of organising the software so that more restart
instructions appear in the operand fields. For the data areas, restart
instructions are interspersed within the memory to limit erroneous
execution. Both methods do provide further improvements, but only of the
order of 1%.
The implementation of these techniques is complex, requiring either
tight restrictions on the software or the addition of extra hardware.
These both require significant development resources, and can themselves
lead to design errors. The costs involved in implementation are not
justified for the level of improvements that can be obtained, and therefore
these techniques should not be included.
8.3.6 Detection Within the Software 
The previous sections have shown that the preferred arrangement, for
the specific system considered, is labelled 'E' in table 8.1. In this case
recovery from erroneous execution is expected on 93% of the occasions.
However, 6% of the time execution will resume with the valid interpretation
of instructions. Some of these can be detected by a watchdog timer, and
this is discussed below. For the other cases it is necessary to detect the
errors from within the software.
If these errors are not detected, software fault tolerance against
other failures, such as memory errors, may operate incorrectly. For
example, the errors could cause a jump into a reasonableness test without
the preceding code being executed. If the test failed the processor would
retry that particular block of code and reapply the test. It could then
interpret the error as a transient and continue execution assuming that
full recovery had been achieved, when in fact a higher level of recovery
was required.
The flow of execution can be monitored in a number of ways.
Chudleigh (23) suggests the use of a 'relay-runner' in which a 'baton' or
password is carried along with execution. This can be implemented with a
single register which is incremented periodically during execution. Then
at various points in the control loop the contents of the register are
checked against the expected value. A discrepancy indicates that execution
has not followed the correct path. This technique does not require a
substantial amount of extra code. All that is required are single byte
increment instructions dispersed throughout the program and a few
comparisons to check the register contents.
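
The relay-runner idea can be sketched as follows. This is only an
illustration of the principle, not code from the system studied: the
class and method names are hypothetical, the stages stand in for blocks
of the control loop, and an erroneous jump is modelled by entering the
loop part way through without the baton being cleared.

```python
class BatonError(Exception):
    """Raised when the baton does not hold the expected value."""


class RelayRunner:
    def __init__(self, stages):
        self.stages = stages  # callables making up one pass of the control loop
        self.baton = 0        # the single register carried along with execution

    def run(self, entry=0):
        # entry == 0 is the legitimate start of the loop, the only place
        # where the baton is cleared; entry > 0 models an erroneous jump.
        if entry == 0:
            self.baton = 0
        for stage in self.stages[entry:]:
            stage()
            self.baton = (self.baton + 1) & 0xFF  # single-byte increment
        if self.baton != len(self.stages):        # check at the end of the loop
            raise BatonError(
                f"baton={self.baton}, expected {len(self.stages)}")
        return self.baton


runner = RelayRunner([lambda: None] * 3)
```

A normal pass leaves the baton equal to the number of stages. A pass
entered at a later stage inherits a stale baton value, so the check at
the end of the loop fails and recovery can be initiated.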
Alternatively, the flow of execution can be monitored by checking the
return addresses before leaving subroutines, or by periodically checking
the current stack level. However, the use of the stack has been shown to
be a possible source of errors, and can be eliminated completely while
still retaining subroutines. The return address can be loaded into the HL
register pair and then the PCHL instruction causes the required transfer of
control. An advantage of this arrangement is that the address can be
stored in multiple locations and comparisons made between the values before
a transfer of control occurs.
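
The comparison of redundantly stored return addresses can be sketched
as below. The number of copies (three) and all names are assumptions
made for illustration; on the 8085 the agreed value would be loaded into
HL and followed by PCHL.

```python
class ReturnAddressMismatch(Exception):
    """Raised when the stored copies of a return address disagree."""


def store_return(copies, address, count=3):
    """Overwrite `copies` with `count` identical copies of the address."""
    copies[:] = [address & 0xFFFF] * count


def checked_return(copies):
    """Return the address only if every stored copy agrees."""
    if len(set(copies)) != 1:
        raise ReturnAddressMismatch(copies)
    return copies[0]


saved = []
store_return(saved, 0x0123)
```

If any copy is corrupted between the call and the return, the transfer
of control is refused and the recovery routine can be entered instead.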
By using the techniques proposed above, together with those from the
previous sections, erroneous execution will result in the initiation of the
recovery process on around 99% of the occasions.
8.4 Watchdog Timers 
Watchdog timers can be used to detect a proportion of the errors
resulting from erroneous execution. Some of the factors which must be
considered when designing them have been indicated in previous chapters.
Their importance can now be seen from the results obtained for the system
described above. Control systems are usually configured so that the timer
is updated periodically, typically once during each control loop. However,
tight constraints are not normally used. For example, Debelle et al (26)
describe a control system for a power station boiler where the watchdog is
updated once every second. If a fault occurs immediately after an update
then a full second of erroneous execution could follow, and this
corresponds to the execution of approximately one million instructions.
Clearly a great deal of damage could occur in that time. More seriously,
if a simple updating mechanism is used, such as an access to a single
address, then this could occur erroneously, allowing further incorrect
execution.
However, the previous results have shown that, for a non-fault
tolerant system, erroneous execution will only last for a few thousand
instructions before a final state is reached. A watchdog will detect most
cases where a loop is formed or where a wait state is entered, as the
trigger is unlikely to occur at the correct interval. A watchdog is less
likely to detect an error when execution resumes the interpretation of
valid instructions, as the trigger sequence will reappear.
With the addition of the error detection mechanisms, the watchdog is
less effective as halts and loops are virtually eliminated. If the
time-out period is longer than twice the time interval between updates then
no errors which resume program execution will be detected. This is because
the worst case is where a fault causes execution to jump from a point
immediately before an update, to a point immediately after. By reducing
the time-out period to the same length as the update time, half the errors
will be detected. For the other half execution effectively jumps forward
and generates an update before the normal time. This situation can be
detected by setting a minimum time. Taken further, the watchdog could be
arranged to detect the update at a specific clock cycle, and could then
detect any sequence of erroneous execution.
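
A watchdog with both a maximum and a minimum update interval can be
sketched as follows. The class, its method names and the tick values
are hypothetical; time is represented as an integer count of clock
ticks supplied by the caller.

```python
class WindowWatchdog:
    def __init__(self, min_interval, max_interval):
        self.min_interval = min_interval  # updates sooner than this are faults
        self.max_interval = max_interval  # the conventional time-out period
        self.last_update = 0

    def update(self, now):
        """Record an update at time `now` and classify its timing."""
        elapsed = now - self.last_update
        self.last_update = now
        if elapsed < self.min_interval:
            return "early"  # execution jumped forward to the update
        if elapsed > self.max_interval:
            return "late"   # execution skipped or delayed the update
        return "ok"

    def timed_out(self, now):
        """True if no update has arrived within the time-out period."""
        return now - self.last_update > self.max_interval


wd = WindowWatchdog(min_interval=90, max_interval=110)
```

Narrowing the window towards a single clock cycle gives the limiting
case described above, at the cost of the tight restrictions on hardware
and software that such a window implies.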
This places tight restrictions on both the hardware and the software,
and would probably lead to more failures due to other failure mechanisms.
Therefore the use of watchdogs for the detection of erroneous execution is
ineffective if other mechanisms have been incorporated. However, it is
recognised that they must be built into systems requiring high reliability
to provide a level of recovery to cater for unanticipated faults.
8.5 Other Hardware Implemented Detection Mechanisms 
A number of other hardware mechanisms to detect erroneous states can
be used, and a selection of these, applicable to the 8085, are described
below.
8.5.1 Wait State Recognition
The wait state can be detected, from the status lines, by the simple
circuit shown in figure 8.2. A rising edge appears on the output as the
wait state is entered. This can be connected directly to the TRAP pin on
the processor, so that the interrupt routine is initiated immediately after
the halt instruction has been executed. Recovery from this state would
also occur with a watchdog timer, but a long delay could result.
8.5.2 Illegal Instruction Fetches
The status lines also indicate when an operation code fetch is being
performed. Therefore, the circuit shown in figure 8.3 can be used to
detect illegal instruction fetches outside the program area. The chip
enable (CE) signals, from all devices containing instructions, are ANDed
together at gate 1, which produces a high output when none of the devices
are selected. This signal, together with the status lines, produces a
positive going pulse on the output of gate 2 if an instruction fetch is
attempted from an invalid area. This could also be connected directly to
the TRAP pin on the processor.
Illegal instruction fetches from the operand fields within the program
areas could be detected by the addition of an extra bit associated with
each location. The bit corresponding to a valid instruction could be
programmed to 0 while operands or data would be labelled with a 1.
Detection of an illegal instruction fetch within the program area could
then be achieved by replacing the output from gate 1 in figure 8.3 with the
extra data line. By adding a pull-up resistor to the line, all illegal
instruction fetches from any memory location would be detected.
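
The extra tag bit can be modelled with a short sketch. The tag table
and the function name are hypothetical; the only behaviour taken from
the text is that valid op-code locations carry a 0, operands and data
carry a 1, and an undriven line is pulled up to 1.

```python
def is_illegal_fetch(tags, address):
    """Return True if an op-code fetch at `address` should be trapped.

    `tags` holds the ninth bit for each populated location. Locations
    with no device behind them leave the line undriven, so the pull-up
    resistor makes them read as 1.
    """
    return tags.get(address & 0xFFFF, 1) == 1


# Hypothetical program: JMP 0100H at 0000H (one op-code, two operand bytes).
tags = {0x0000: 0, 0x0001: 1, 0x0002: 1}
```

A fetch from 0000H is accepted, while a fetch from either operand byte
or from an unpopulated address would raise the trap.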
This arrangement requires a substantial amount of extra hardware and
would not be worthwhile in the system being studied. The hardware could be
reduced by the development of 9 bit wide read only memories, as this would
limit the extra logic to only a few gates.
8.5.3 Detection of a Write Outside RAM Areas 
A simple development from the circuit shown in figure 8.3 allows the
detection of a write into a program area, and a suitable circuit is shown
in figure 8.4. It is strongly recommended that programs should be stored
in read only memory for control applications, and in these cases the above
circuit will be applicable. However, if it is necessary for the program to
be altered during normal operation, the circuit must be modified to disable
the output during loading of the program. At other times, while enabled,
it will provide some protection against corruption of the code.
This concept can be extended to the detection of any writes to
locations outside random access memory areas. A suitable circuit is shown
in figure 8.5, where the chip enables (CE) are from all the RAM devices.
8.5.4 Detection of Undeclared or Unused Instructions 
Another illegal state which can be detected is the execution of an
undeclared operation code. This has been investigated by Marchal and
Courtois (63) in connection with permanent stuck-at failures on the data
lines. They suggest that after failure the average detection time is 11
instruction cycles for both the 6800 and 68000. This is a useful mechanism
in the 68000 because it is already built into the device. However, with
other processors a substantial amount of extra hardware is required, and
therefore it is not worthwhile. The effectiveness of this mechanism, in
detecting erroneous execution after a transient fault, will be very low for
the specific system studied with the fault tolerant features added. This
is because very few erroneous instructions are executed.
The detection process is dependent on the number of undeclared
instructions in a processor. Clearly, it will be more effective for the
6800, which has 59 undeclared codes, than for the 8085 which has only 10.
However, this concept could be extended further to detect all unused
operation codes within a particular program, but would require changes in
the hardware when different instructions are used. Investigations into
instruction usage by Lunde (60) revealed that only 75% of the codes were
used, and that half of these accounted for 99% of the execution time.
Therefore programming with a reduced instruction set would not be severely
restrictive, and could be imposed for all programs. But even with this
arrangement detection of erroneous execution will still be limited.
8.5.5 Voltage Level Detection 
The hardware mechanisms described above have all been designed to
detect errors after they have been produced. The voltage level detection
mechanism attempts to prevent errors occurring by suspending execution while
the output from the power supply is insufficient to drive the system. This
mechanism can be implemented with a single 8 pin integrated circuit. The
Texas 7705 monitors the power supply rail and holds the reset line, to the
processor, low while the voltage is less than 4.75 volts. When the supply
rises above this value the reset is held low for an additional time
interval, which is set by an RC network. This allows the internal state of
the processor to stabilise before correct execution can commence.
Interruption testing, similar to that described in chapter 2, was
repeated with the voltage level detection circuit added. For short
interruptions, which did not cause the supply to drop below 4.66 volts, no
errors were detected. For all other interruptions a full reset occurred
when the supply was restored.
A disadvantage of this arrangement is that the delay in restoring
execution can be relatively long. For example, a delay of 10 ms is
recommended for the 8085 (119), and 50 ms is recommended for the 8035/8048
(120). Tests on the processor, described in chapter 2, showed that it
could recover from an interruption which caused the supply to drop to as
low as 2.5 volts. However, disruption to program execution will occur
while the supply is between 2.5 and 3.8 volts, but once it has been
restored, the recovery mechanisms described above can initiate the recovery
process within microseconds. In cases where rapid recovery is required, a
voltage level detection circuit should be set to activate at around 2.5
volts, and the other mechanisms can be used to recover from smaller dips in
the supply. Alternatively, a second level detection circuit could be set
at a higher level to initiate an interrupt routine as soon as the supply
reaches the level above which no errors will occur.
8.6 Choice of Mechanisms for General Systems
The previous sections have shown that hardware mechanisms to detect
erroneous execution are not effective for the system described in section
8.2, with fault tolerance added. This is because it has a good response to
erroneous execution, which is due to the large proportion of the memory map
which is unused. In systems where more of the map is populated the
response will be different. However, the unused areas and input areas
should be looked at first before considering other parts of the map.
For systems containing a large data area, the hardware mechanism to
detect instruction fetches outside the program area will be effective. For
large program areas the mechanism to detect instruction fetches from the
operand and data fields should be considered, but it can only produce small
improvements. This is because most erroneous jumps into program areas
result in an immediate resumption of the interpretation of valid
instructions. Therefore detection within the software will give greater
improvements than the hardware method.
This type of procedure to select detection mechanisms can generally be
followed for other systems. However, the ease of implementation of some
mechanisms will depend on the particular processor. For example, the
detection of an illegal instruction is built into the 68000, and in order
to generate recovery a suitable routine is all that is required.
Conversely, the indication of an operation code fetch in the 6800 is not
readily available, and therefore instruction fetches from illegal locations
are difficult to detect. For single chip processors, such as the
8035/8048, there is less scope for the implementation of detection
mechanisms as few signals are available externally. For these reasons
mechanisms must be chosen with consideration for both the memory map usage
and the ease of implementation.
8.7 Summary 
This chapter has investigated the implementation, on a specific
system, of the detection mechanisms for erroneous execution, which were
studied earlier. It has shown that a very high level of detection can be
achieved by minor hardware changes, and with the addition of some extra
software.
Other detection mechanisms have been studied, and their effectiveness
for different systems has been indicated. It has been established that the
choice of mechanisms, to achieve the greatest improvements in reliability,
depends on both the memory map usage and the processor within the system.
CHAPTER 9
Development of a Facility to Test Redundant Systems 
9.1 Introduction 
This chapter presents the development of a facility to test the
response of digital control systems which are subjected to a variety of
transient disturbances. Testing is necessary to check the correct
functioning of the error detection and recovery mechanisms. With redundant
software, parts of the code will not be executed under normal operating
conditions. The test facility aims to simulate faults to enable all paths
in the program to be executed.
Several other methods of testing were considered. For example, field
trials provide accurate results but, due to the infrequent rate of failures
in digital systems, they require a considerable length of time before any
improvements can be established. Another important factor is that failure
of the system in the field could have serious consequences, although this
can be avoided by testing the system in a monitoring mode without any
direct control.
To reduce the period of testing, methods can be used to increase the
failure rate by subjecting the system to a hostile environment. This
approach was adopted for the tests, described in chapter 2, to investigate
failure mechanisms. During those tests it was established that different
hardware did not always react in exactly the same way. Therefore, to
obtain a representative set of results for all hardware it is necessary to
test a large number of components.
A solution, which speeds up the whole procedure, is to use simulation.
This approach was adopted for the Saturn V guidance computer, and is
described by Ball and Hardie (5). In this case all internal functions of
the computer were simulated at the gate level, and the effects of single
node stuck at 0 or 1 faults were investigated. A less detailed approach
was adopted by Courtois (24) for a 6800 system. Instead of considering the
gate level of the processor, a functional simulation was developed.
Clearly, this reduces the amount of work required in the development of the
model.
An alternative solution is to simulate faults on an actual processor.
This eliminates the need for a detailed knowledge of the internal workings
of the device, and prevents the introduction of errors into the simulation
at this stage. A small 8085 based system was developed using this
approach, and it is described in detail in the following sections. It was
designed around an Intel 8085 system design kit (SDK) board, which has an
additional memory card containing up to 6K RAM and 8K EPROM. Hardware
modifications to the printed circuit boards were kept to a minimum to allow
the system to be used for other purposes.
9.2 Fault Injection 
To enable full testing it is necessary to simulate faults so that the
recovery process can be observed. A number of methods of fault injection
were considered. A simple solution would be to corrupt the data, address
and control buses by deliberately holding individual lines high or low. A
more sophisticated version could involve some logic circuitry to monitor
the lines and inject faults when a certain pattern appears, or at defined
time intervals.
This sort of approach has been adopted by Decouty et al (27). Their
system intercepts signals before reaching individual chips in a similar way
to the memory masking circuit described below. However, they are careful
not to generate any short circuits, which clearly can occur in real systems.
Therefore, only a limited number of different faults are allowed, and these
consist of stuck-at-0 and stuck-at-1 conditions.
An alternative arrangement is to use a second microprocessor which
shares part of the memory with the main processor. It would then be able
to monitor the execution of the test routines and, when predefined
conditions occur, inject faults into the system. These could involve
corruption of the system buses, or data stored in the shared memory. A
wide range of faults could be simulated in this way, with the exception of
corruption of the internal registers of the microprocessor. For these to
be changed to specific values it is necessary for the processor to execute
valid load instructions, and therefore this cannot be achieved externally.
The dual processor approach has been used by Kuczynski and Price (54),
but was limited to investigating the specific fault condition of single bit
corruptions in the program code. To achieve this, the second processor
copies the corrupted program into a shared memory block which emulates the
EPROM of the system under test. The test system is then started and the
following execution observed. Although this has given some useful results
for that particular fault condition, it cannot be used to simulate other
faults.
The solution which was finally adopted is much more flexible and only
uses a single microprocessor. External logic circuitry generates an
interrupt during execution of the test program. The interrupt routine can
be written to simulate a large number of faults, and corruption of the test
program, stored data and internal registers can be implemented. The timing
of the interrupt is set by the control software, so both the type of fault
and the position in the program at which it occurs can be easily altered.
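
The principle of injecting a chosen fault after a chosen number of
instructions can be illustrated with a toy model. Everything here is an
assumption made for illustration: the three-instruction program, the
single accumulator and the bit-flip fault are hypothetical and bear no
relation to the actual test routines.

```python
def run(program, inject_after=None, fault=None):
    """Execute `program`, applying `fault` once `inject_after`
    instructions have completed (interrupts are recognised only at
    the end of an instruction)."""
    state = {"acc": 0}
    for count, instruction in enumerate(program, start=1):
        instruction(state)
        if count == inject_after:
            fault(state)
    return state["acc"]


def campaign(program, fault):
    """Inject the fault after each instruction in turn and report
    whether the final result differs from the fault-free run."""
    golden = run(program)
    return {k: run(program, k, fault) != golden
            for k in range(1, len(program) + 1)}


def inc(s):  s["acc"] = (s["acc"] + 1) & 0xFF   # add one
def dbl(s):  s["acc"] = (s["acc"] * 2) & 0xFF   # shift left, 8 bits kept
def mask(s): s["acc"] &= 0x0F                   # keep the low nibble

def flip_bit7(s): s["acc"] ^= 0x80              # simulated register corruption

results = campaign([inc, dbl, mask], flip_bit7)
```

In this toy case the corruption is masked by later instructions unless
it is injected after the final one, which shows why the position of the
fault needs to be varied across the whole program.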
9.3 Generation of Interrupts
To provide thorough testing, it is desirable to inject faults in as
many places as possible. Interrupts are only recognised at the completion
of execution of an instruction. Therefore to inject the greatest number of
faults, by this method, it is necessary to cause an interrupt during the
execution of each instruction.
In order to generate interrupts at successive locations in a program,
the expansion 8155 (Memory-I/O-Timer) i.c. on the SDK board is used. The
timer section is designed to give an output after a certain number of
pulses have been applied to its input. The number of pulses needed before
triggering is programmable, and can be set by the system software. In
order to be able to cause an interrupt during successive locations in the
program it is necessary to generate one pulse for each instruction. This
is achieved by detecting an operation code fetch, which can be determined by
the condition of the status lines S0, S1 and IO/M. For an op-code fetch
they are 1, 1, 0 respectively. Combining these together with logic is
insufficient for the input to the timer, as this condition remains steady
throughout certain single byte instructions. For example, a string of no
operations (NOPs) will produce a single pulse. By including the status of
the read (RD) line, which is low for only a short period of the op-code
fetch, it is possible to generate a single pulse for each individual
instruction.
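The logic requires that the output is high when S0 and S1 are high
together with IO/M and RD being low. In boolean algebra:

\begin{align}
F &= A \cdot B \cdot \overline{C} \cdot \overline{D} \tag{Eqn. 9.1} \\
  &= \overline{\overline{A \cdot B} + \overline{\overline{C} \cdot \overline{D}}} \tag{Eqn. 9.2} \\
  &= \overline{\overline{A \cdot B} + C + D} \tag{Eqn. 9.3}
\end{align}

Where: F is the output.
A, B, C and D are the inputs.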
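
The equivalence between the stated requirement and a NAND, OR and NOR
arrangement can be checked exhaustively. The sketch below assumes that
A, B, C and D stand for S0, S1, IO/M and RD respectively, and that the
final form is a NOR of NAND(A, B) with OR(C, D); both are readings of the
text rather than details confirmed by it.

```python
from itertools import product


def f_direct(a, b, c, d):
    """Output high when A and B are high and C and D are low."""
    return a and b and (not c) and (not d)


def f_gates(a, b, c, d):
    """The same function built from one NAND, one OR and one NOR."""
    nand_ab = not (a and b)
    or_cd = c or d
    return not (nand_ab or or_cd)


# Compare the two forms over all sixteen input combinations.
equivalent = all(
    f_direct(*bits) == f_gates(*bits)
    for bits in product([False, True], repeat=4)
)
```

The two forms agree for every input combination, which is simply De
Morgan's theorem applied twice.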
The circuit shown in figure 9.1 satisfies the logic given by equation
9.3 by using OR, NOR and NAND gates. However, to reduce the number of
devices necessary, only NOR and NAND gates were used. Figure 9.2 shows the
final layout that is wired onto the SDK board. S0, S1, IO/M and RD signals
are all taken from the expansion bus, and the TIMER IN signal is connected
to the input of the 8155. Output from the Timer (TIMER OUT) is connected
to the interrupt 7.5 (RST 7.5) pin on the 8085. This pin is also used for
the Vector Interrupt (VECT INTR) key on the SDK keypad, which incorporates
an RC network to prevent multiple interrupts. Therefore, to ensure a quick
sharp response to the timer out signal, the RC network has to be
disconnected.
9.4 Memory Boundary on Test Programs 
The test facility described so far is capable of providing useful
results for faults involving the data of a test routine. However, if
faults are injected into the program itself, causing corruption of the
program counter, then control could be passed to the SDK monitor. To
prevent this from occurring, additional hardware was designed to restrict
the test routine to a section of memory away from the monitor. However,
during execution of the control program, and during the interrupt routines,
it is necessary to allow the processor to have access to all locations.
Due to the layout of the system it was not possible to restrict the
test routine to half of the memory map, as this would prevent the control
program from using the expansion memory board. It was therefore decided to
allocate the top quarter (16K) of the map for use by the test routine, and
this requires that the top two address lines (A14, A15) are held high
during execution of the routine. To satisfy the buffers and address
decoders, it is necessary to control the two address lines before they
reach the SDK board.
The solution adopted was to construct a small circuit raised above the
SDK board. A 40 pin wire-wrap socket, plugged into the normal processor
location, provides the electrical connections, and the mechanical support,
for the extra circuit board. All of the lines make direct contact between
the 8085 and the SDK board, except the pins associated with A14 and A15
which are diverted through the extra logic to enable some memory accesses
to be restricted.
Careful consideration was needed of the relative timing of the control
software and the masking of the address lines to ensure the correct
transition between the control and test programs. This is achieved by
writing to certain ports, which the extra logic circuitry detects and
latches. However, the masking is not altered until the processor has read
the following jump instruction.
Four transitions to and from the test routine occur for each run.
Three of these (from the control program to test routine, fault routine to
test routine, and test routine back to the control program) are each
catered for by the above solution. The fourth transition, caused by the
fault injecting interrupt, is treated in a slightly different manner. The
logic detects the interrupt acknowledge on the status lines, and waits
until after the return address has been pushed onto the stack, before
releasing the address lines.
Figure 9.3 shows the circuit diagram for the address masking logic.
In addition to the details shown, 1 kilo-ohm pull-up resistors have been
connected to all the data and control signals taken from the
microprocessor, and 0.1 uF capacitors have been connected across the power
supplies to most of the devices.
The circuit operates in the following manner. Writing any value to
one of the ports FC, FD, FE and FF will cause a short low level pulse at
the output of i.c. 3, and presetting of flip-flop 4a will occur, forcing
the inverted Q output low. The S0 status line remains high until the end
of the op-code fetch for the jump instruction, and then remains low until
both address bytes have been read. During this time the output from the OR
gate (5a) will have changed from a high level to a low level. The rising
level of the S0 line will produce a similar rise on the output of 5a, and
triggering of both flip-flops 4a and 6b will occur, clearing 4a. Flip-flop
6b has its inverted output fed back into its input, so that the output is
toggled each time the device is triggered. The Q output is connected to
two OR gates, 5c and 5d, which form the link between the processor and the
SDK board for the two address lines A14 and A15. When the Q output from 6b
is high, A14 and A15 on the SDK board remain fixed high, whereas with a low
output they follow the normal outputs from the processor.
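
The action of the two OR gates on the address seen by the SDK board can
be sketched as follows. The function name is hypothetical; the only
behaviour modelled is that A14 and A15 are forced high while the Q
output of flip-flop 6b is high, and pass through unchanged otherwise.

```python
A14_A15_MASK = 0xC000  # bits 14 and 15 of the 16-bit address


def board_address(cpu_address, q_high):
    """Return the address presented to the SDK board.

    Each of the two address lines is ORed with Q, so a high Q pins
    both lines high and confines accesses to the top 16K of the map.
    """
    mask = A14_A15_MASK if q_high else 0
    return (cpu_address | mask) & 0xFFFF
```

With masking active, an erroneous jump to 0038H is seen by the board as
C038H, so the test routine cannot reach the monitor in low memory.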
For the transition caused by the interrupt, the bottom half of the
circuit is activated. After the interrupt has occurred, the status lines
indicate that it has been acknowledged. A short low level pulse is
generated at the output of 2a which presets the flip-flop 4b. The S1
status line goes low during the writing of the return address onto the
stack. At the end of this operation a rising edge occurs at the output of
5b, triggering the flip-flop 6a and setting its inverted Q output low. This
clears 6b, allowing normal addressing, and also clears both flip-flops 4b
and 6a.
Provided that the logic is triggered in the correct s e q u e n c e of. an 
output to port, an Interrupt, and two more outputs to port, then the 
151 
desired masking will occur . To ensure the correct initialisation of the 
logic, the reset line is connected to flip-flop 4a and. through the NAND 
gates 7a and 7b. to flip-flop 6b. Therefore when a reset is activated on 
the SDK board. 4a and 6b are c leared which in turn c lear both 4b and 6a. 
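The behaviour described above can be summarised, at a purely functional
level, by the following sketch. It is not a gate-level model of Figure
9.3: timing, the individual flip-flops 4a, 4b and 6a, and the status lines
are all abstracted away, and the class and method names are assumptions
introduced only for illustration.

```python
class AddressMask:
    """Behavioural model of the masking logic; gate-level timing is ignored."""

    MASK_BITS = 0xC000  # address lines A15 and A14

    def __init__(self):
        self.masked = False  # stands for the Q output of flip-flop 6b

    def reset(self):
        # A reset on the SDK board clears 6b, so addressing is normal.
        self.masked = False

    def port_write(self):
        # An output to one of the ports FC..FF, followed by the jump
        # instruction, toggles 6b.
        self.masked = not self.masked

    def interrupt(self):
        # Once the return address has been stacked, 6b is cleared.
        self.masked = False

    def apply(self, address):
        # With the mask set, A14 and A15 are held high on the SDK board.
        address &= 0xFFFF
        return address | self.MASK_BITS if self.masked else address


mask = AddressMask()
trace = []
for event in ("port_write", "interrupt", "port_write", "port_write"):
    getattr(mask, event)()
    trace.append((event, hex(mask.apply(0x0123))))
print(trace)
```

The event order used here is the sequence quoted in the text: an output to
port, an interrupt, and two more outputs to port. The mask is set, released
by the interrupt, set again, and finally released.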
9.5 Software Design 
When designing the control software three main criteria were
considered: speed of operation, ease of reprogramming and flexibility. The
time taken to complete each individual test is of great importance, as
injecting faults during the execution of each instruction can lead to a
very large number of runs. Therefore the control software needs to be
short and efficient. However, to enable quick changeover to injecting a
different fault, or testing another routine, it was desirable to make
reprogramming as simple as possible. These two criteria have conflicting
requirements, so a compromise solution was adopted. In addition to this
the overall flexibility of the system had to be considered. The aim was to
avoid the necessity of rewriting most of the basic control software when
new test routines or fault types are developed.
Figure 9.4 shows the final structure of the program, and fully
commented listings appear in Appendix 6. Basically, the test routine is
executed a number of times, injecting a fault at successive points of the
program, until each location has been tested. It is reloaded into the test
area before each run so that corruption of the code does not affect later
tests. However, this is only representative of systems which execute
programs stored in RAM. For high reliability applications the software
must not be held in volatile storage, to ensure that the program cannot be
corrupted during erroneous execution or other disturbances. To simulate
both arrangements of volatile and non-volatile program memory, an EPROM
emulator can be mapped into the test area and the write line can be
connected, or not, accordingly.
In more detail, the program performs the following operations. It
starts by storing the initial value of the timer trigger into memory for
future use. An opening message is displayed on the terminal requesting the
end address of the test routine, and the routine is then copied into the
test area. The 8155 timer i.c. is set so that it will generate the
interrupt at the correct moment, and the initialisation subroutine is
called to set initial values in the system. The timer is started, and the
masking hardware, described above, is enabled to restrict execution to the
upper 16K memory block. Control is passed to the test routine and
continues until the interrupt is generated, releasing the masking
circuitry. The interrupt routine sets the upper two address bits on the
stack pointer, before retrieving the return address, to ensure that it is
read from within the upper memory area. The address is then saved as part
of a jump instruction at the end of the interrupt routine. The software
has been arranged so that the last two bytes are mapped into RAM, to enable
the return address to be written into them, whereas the rest of the program
is in EPROM.
All the internal registers are then saved, so that the fault injection
routine does not affect the internal status of the processor unless this is
intended. The 'fault' is then injected by calling a subroutine which
changes the required data. The stack pointer and internal registers are
reloaded with their original or modified values, the address mask is set,
and execution returns to the test program at the point at which the
interrupt occurred.
At the end of the test routine the address mask is reset and a jump is
made back into the control program. A check is made that the interrupt
has occurred; this indicates whether there are still more locations to be
tested. If all locations have been tried, then a closing message is sent
to the terminal and execution returns to the SDK monitor. Otherwise, a
subroutine is called to check the results. For a correct solution an 'S'
is sent to the terminal to indicate success; alternatively, in the case of
a failure, the value of the timer trigger is printed. The program
continues by jumping to the start, where the trigger is incremented and the
whole process is repeated.
9.6 Initial Results 
Initial testing was carried out by simulating data corruptions only.
The effects of these are reasonably straightforward to predict, and
therefore the results from the test facility were easily verified. For
example, a non-fault tolerant 8 bit addition routine was investigated. It
read two numbers, from separate locations, into the internal registers,
added them together and stored the answer back in the memory. As expected,
corruptions to the input data in memory only caused errors if they occurred
before reading the information into the registers. Conversely, corruption
of the output location in memory only caused errors after the result had
been stored.
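The control loop of Section 9.5, applied to this addition example, can be
sketched in a few lines. This is only an illustrative model, not the 8085
control program listed in Appendix 6: the routine is represented as four
Python steps rather than machine code, the timer interrupt is replaced by
an injection point before each step (plus one after the last), and the
operand values, cell names and fault functions are assumptions.

```python
def read_x(mem, reg):
    reg["A"] = mem["X"]


def read_y(mem, reg):
    reg["B"] = mem["Y"]


def add(mem, reg):
    reg["A"] = (reg["A"] + reg["B"]) & 0xFF


def store(mem, reg):
    mem["Z"] = reg["A"]


ROUTINE = (read_x, read_y, add, store)   # toy routine under test


def run_campaign(inject, expected=0x46):
    """Inject one fault per run at successive points; return the report."""
    report = []
    trigger = 0
    while trigger <= len(ROUTINE):       # one point before each step, one after
        routine = list(ROUTINE)          # reload a clean copy of the routine
        mem = {"X": 0x12, "Y": 0x34, "Z": 0x00}
        reg = {"A": 0, "B": 0}
        for point in range(len(routine) + 1):
            if point == trigger:         # the timer interrupt fires here
                inject(mem, reg)
            if point < len(routine):
                routine[point](mem, reg)
        # 'S' marks success; a failure is reported as the trigger value.
        report.append("S" if mem["Z"] == expected else str(trigger))
        trigger += 1
    return report


def corrupt_input(mem, reg):
    mem["X"] ^= 0xFF                     # invert every bit of input X


def corrupt_output(mem, reg):
    mem["Z"] ^= 0xFF                     # invert every bit of the result cell


print(run_campaign(corrupt_input))       # ['0', 'S', 'S', 'S', 'S']
print(run_campaign(corrupt_output))      # ['S', 'S', 'S', 'S', '4']
```

In this model the input corruption is only harmful at the first injection
point, before X has been read, and the output corruption is only harmful at
the last point, after the result has been stored.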
This trivial case shows that the susceptibility of systems to
transient memory faults can be reduced by holding critical data within the
processor for as long as possible. But clearly, this will increase the
susceptibility to register faults. This demonstrates the necessity to know
which fault types are most common. The practical tests described in
chapter 2 indicated that the memory was less resistant to interference than
the processor, and therefore the registers provide a safer storage area.
Obviously, all the data cannot be stored in the registers, and
consequently, an alternative approach is necessary. Hardware methods such
as protective coding have been discussed, but these can fail due to multiple
bit faults or transients affecting the correction mechanisms. To overcome
these problems, or in the case where no memory protection is available,
individual data can be stored in several locations. Clearly, this requires
a large amount of extra memory space, and can only be justified for
critical data.
A simple 8 bit addition routine incorporating triple storage was
investigated. Even such a basic operation can be organised in several
different ways. For example, the data could be compared as it is read in,
and a single set chosen for manipulation by a majority vote or selection of
a mid-value. The result would then be stored in memory, either in a single
location or in three separate locations. Alternatively, calculations could
be carried out on all three sets, and a selection made before storage.
Taken one stage further, separation could be maintained throughout, and
comparisons made after a number of other operations.
When considering corruptions of single locations, multiple storage of
data gives large improvements in reliability. However, this must not be
considered in isolation. It is possible for a large number of locations to
become corrupted. This can occur as a result of an extensive memory
disturbance, or by erroneous execution overwriting data. In the latter
case an erroneous loop containing a call, without a return, will overwrite
all volatile memory with the same 16 bit word. It is therefore suggested
that if multiple copies are used, then they should not all be stored in an
identical way. For example, one or more copies could be complemented.
This will increase the complexity of the checking routines, but will be
more effective against extensive errors.
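A minimal sketch of this idea is given below, assuming three copies of an
8 bit value with the third held in complemented form. The function names,
the memory layout and the decision to treat three identical cells as a
failure are assumptions made for illustration; they are not taken from the
routines that were tested.

```python
def store_triple(memory, base, value):
    """Store three copies of an 8 bit value; the third is complemented."""
    value &= 0xFF
    memory[base] = value
    memory[base + 1] = value
    memory[base + 2] = value ^ 0xFF


def load_triple(memory, base):
    """Return the majority value, or None if the data cannot be trusted."""
    raw = [memory[base], memory[base + 1], memory[base + 2]]
    if raw[0] == raw[1] == raw[2]:
        # A correctly stored triple never has three identical cells, so this
        # pattern is treated as an extensive overwrite. The rare single
        # fault that produces the same pattern is rejected as well.
        return None
    copies = [raw[0], raw[1], raw[2] ^ 0xFF]
    for candidate in copies:
        if copies.count(candidate) >= 2:
            return candidate
    return None


memory = [0] * 16
store_triple(memory, 4, 0x5A)
memory[5] = 0x00                      # single corrupted copy
print(hex(load_triple(memory, 4)))    # 0x5a

memory = [0x38] * 16                  # every location overwritten alike
print(load_triple(memory, 4))         # None
```

With three identical copies the second case would pass a majority vote and
return the overwritten value; the complemented copy is what allows the
checking routine to reject it.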
So far only data corruptions have been considered. Disruption to the
flow of execution is possible, and this can also be tested on this system.
However, a few problems are envisaged with this type of error, and
suggestions for modifications are given in the following section.
9.7 Possible Developments 
For data corruptions alone, execution will follow a logical sequence
of instructions, provided that the software does not contain any errors.
However, with the disruption in the sequence of execution, an arbitrary
combination of instructions will be interpreted and as a result the test
facility can fail in two ways. Firstly, the erroneous execution of an
output to one of the ports FC, FD, FE and FF will cause the premature
activation of the masking circuit, and an unpredictable response will
follow. Secondly, the formation of a continuous loop within the test
routine will prevent the return to the control program, and thus suspend
any further runs.
The former case occurs infrequently as the probability of picking such
an instruction at random is approximately 1 in 16,000. But if a large
number of runs are attempted the failure rate may be unacceptable. It can
be improved by tightening the conditions required to activate the masking
circuit, which could be achieved by testing for a particular value at the
port. The formation of loops is more likely; however, the resulting
problems can be reduced by adding another hardware timer. This would be
set at the beginning of each run, and if it 'timed out' before execution
re-entered the control software a failure would be indicated and the next
run initiated. Alternatively, the same timer as that used for fault
injection could be reset before leaving the fault routine, so as to allow a
maximum time for further execution.
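One way of arriving at a figure of this order is shown below. It assumes
that the op-code and its operand are independent, uniformly distributed
bytes, and that only the 8085 OUT instruction (op-code D3) addressed to
one of the four ports activates the circuit. The run length of 10,000
random instructions is an arbitrary value chosen for illustration.

```python
from fractions import Fraction

OUT_OPCODE = 0xD3                      # 8085 OUT instruction
MASK_PORTS = (0xFC, 0xFD, 0xFE, 0xFF)  # ports decoded by the masking logic

# Probability that a random op-code byte is OUT, and that a random operand
# byte then selects one of the four ports.
p_opcode = Fraction(1, 256)
p_port = Fraction(len(MASK_PORTS), 256)
p_trigger = p_opcode * p_port
print(p_trigger)                       # 1/16384


def p_at_least_once(instructions):
    """Chance of at least one premature activation in a run of that length."""
    return 1 - float(1 - p_trigger) ** instructions


print(round(p_at_least_once(10_000), 3))
```

Although each individual instruction is very unlikely to activate the
mask, the second figure shows how quickly the chance accumulates over a
long campaign, which is why tighter activation conditions are suggested.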
Finally, to obtain meaningful results, it is necessary to perform a
very large number of runs. In order to simplify the analysis, it is
suggested that the output from the test facility is captured by an
intelligent device which can perform data reduction operations. This would
enable the rapid evaluation of both the number and type of the failed runs.
9.8 Summary 
This chapter has presented some ideas on how testing can be performed
on fault tolerant software, and a particular facility has been described in
detail. In this type of software, execution will pass through different
segments depending on the number and type of errors in the system. Under
normal operating conditions errors will be rare, and testing of all
segments is not possible without fault injection. The test facility
therefore provides an aid to the full functional testing of fault tolerant
routines.
CHAPTER 10
Conclusions
10.1 Introduction 
It is generally accepted that transient and intermittent faults are
far more common, in digital circuits, than permanent faults. It has been
suggested that they are as much as 50 times more likely. Therefore, in
order to obtain high reliability, the greatest improvements will be
achieved by designing in mechanisms to counteract the effects of
transients. However, recovery cannot be initiated until errors have been
detected and therefore the detection mechanisms play a very important role
in the recovery process.
Investigations have been carried out into detection mechanisms with
particular emphasis on software techniques. However, they cannot be
evaluated until the modes of failure are understood. For this reason
practical tests were performed to study actual failure modes. These
attempted to reproduce the type of transient disturbances which are
expected in industrial control applications.
10.2 Practical Tests to Determine Failure Mechanisms
The results of the tests showed that two broad types of failures can
occur: corruption to the data within the system, and disruption to the
correct flow of program execution. Both of these groups of failures
occurred under different types of interference to each of the main elements
of the system. The fact that similar failures occur under different
operating conditions indicates that they will appear in real systems. This
is true even if the types of interference, used during testing, were not
representative of those which do occur in industrial controllers.
Data errors can be detected and corrected either by external hardware,
or internally by the software. Hardware mechanisms have been investigated
thoroughly in the past and the majority of current systems are designed to
detect and correct single bit errors. The tests did show that single bit
errors do occur, but are restricted to a narrow band of interference level.
In the majority of cases multiple bit errors occurred and therefore single
bit correction mechanisms would not be effective.
Errors in the flow of program execution are more serious, as these
result in the interpretation of an unspecified sequence of instructions.
While in this state the processor cannot perform any useful tasks, and the
data error correction mechanisms cannot work. Therefore it is of paramount
importance to be able to detect this type of failure so that it is possible
to re-establish useful execution.
In order to be able to develop suitable detection mechanisms, it is
necessary to determine the sequence of events following corruption of
execution. The tests indicated that a fault can cause an erroneous jump to
any location in the memory map and that, subsequently, the values read
would be interpreted as instructions. This revealed the importance of
knowing the exact function of every possible operation code in a
microprocessor.
10.3 Undeclared Operations in Microprocessors 
Investigations were carried out to discover the effects of executing
the codes which are undeclared by the manufacturers. In most cases useful
operations were revealed, which leads to the question of why these
instructions are not declared. The manufacturers were not willing to
reveal information on this subject, but it is believed that some of the
codes are left undeclared to retain compatibility between different
devices, whereas others are not disclosed because original design errors
mean that they do not function correctly under all operating conditions.
Some of the codes are particularly undesirable from a reliability
point of view. These are the ones which cause the processor to cycle
continually through memory reading successive locations indefinitely. The
only means of recovery from this state is a full reset which has to be
generated by some external hardware. This has revealed that not only is it
necessary to have external hardware to enable recovery from some errors,
but also the way in which it is designed is important. For example,
watchdog timers which generate interrupts, or are updated by the access to
a single address, will not be effective.
Other undeclared operations of microprocessors have also been
discovered, such as the cycling through memory in the 8085 as a result of
power supply disturbances. These operations are particularly important
because they cannot be foreseen readily, unlike the functions of the
undeclared codes which, clearly, must exist. Without a full knowledge of
all possible operations in microprocessors it is more difficult to design
effective error detection and correction mechanisms. This demonstrates the
need for a much more co-operative attitude from the manufacturers in
revealing full information about their devices.
10.4 Execution Following an Erroneous Jump 
Having determined the functions of all the operation codes of the
8085, 6800, 8035/8048 and 68000, analyses were performed to establish the
sequence of events following an erroneous jump to a random location. The
execution which follows depends on the particular type of memory into which
the jump occurs. Four different memory types were considered: data areas,
program areas, unused areas and input areas.
Data areas were assumed to contain random values, and therefore each
operation code was equally likely to be read. It was found that execution
would interpret a number of instructions before encountering a jump. The
average ranged between 2 and 10, depending on the processor.
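A simplified model shows how such averages arise. If every op-code fetch
from a random data area is an independent, uniformly distributed byte, and
a fraction p of the op-codes transfer control, then the number of
instructions interpreted before the first jump follows a geometric
distribution with mean (1 - p)/p. The op-code counts used below are
hypothetical and are not the figures obtained for any of the processors
analysed; conditional jumps, which depend on the flags, are also ignored.

```python
from fractions import Fraction


def mean_instructions_before_jump(jump_opcodes, total_opcodes=256):
    """Mean number of non-jump instructions read before the first jump."""
    if not 0 < jump_opcodes <= total_opcodes:
        raise ValueError("jump_opcodes must be in 1..total_opcodes")
    p = Fraction(jump_opcodes, total_opcodes)
    return (1 - p) / p


# Hypothetical instruction sets with 64 and 32 control-transfer op-codes.
print(mean_instructions_before_jump(64))   # 3
print(mean_instructions_before_jump(32))   # 7
```

Under this model the quoted range of 2 to 10 would correspond to between
roughly one third and one eleventh of the op-codes being jumps.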
Program areas contain a logical sequence of instructions, but an
erroneous jump will not necessarily pass control directly to a valid
instruction, as an operand field can be read. However, the analysis
revealed that there is a high probability that a valid instruction will be
read immediately, in which case the processor will continue to read valid
instructions in step with the program. If an operand field is entered
initially, the probability of reading a valid instruction at the next fetch
is very high, and it has been shown that resynchronisation with the program
tends to occur very rapidly, usually in less than three or four instruction
cycles.
For unused areas the response depends on the state of the data bus
when no active devices are connected to it, and this is determined by the
processor and associated hardware. If the bus floats high the value FF
will be read. Depending on the instruction set, this may be interpreted as
a jump instruction, in which case control will pass elsewhere; otherwise the
next location will be accessed and the process will repeat until another
memory block is encountered. For processors with a multiplexed address and
data bus the address can remain valid during the subsequent read cycle.
This results in the execution of a predefined sequence of instructions
dependent on the instruction set and the location of the first read. For
the 8085 this type of execution terminates with a halt for about one half
of the initial starting points.
The data from input ports can be read as instructions if the ports are
memory mapped. For a number of ports which are fully decoded into adjacent
locations they will appear to have the same properties as data areas, and
can be treated in the same way. However, it is common practice to use only
partial decoding, and therefore the same value will appear in adjacent
locations, sometimes in as many as 4K. With rapidly changing data on the
ports a sequence of different instructions will be read, but in the
analysis presented it was assumed that data remains stable for several
milliseconds. In this case a jump instruction will be interpreted
immediately, or the same instruction will be executed repeatedly until the
end of the block is reached.
10.5 Recovery from Erroneous Execution 
Having established the possible sequences of execution following an
erroneous jump for a non-fault tolerant system, methods were considered
which would allow recovery from erroneous execution. Clearly, the aim is
to force the processor to execute a recovery routine and this can be
achieved by encouraging the execution of a restart instruction.
For the data area, restart codes can be placed at regular intervals so
that they may be read as instructions if execution enters the area.
However, if multibyte instructions appear before them, they are less
likely to be executed. This can be overcome by grouping the restart codes
together. Investigations were carried out to determine both the optimum
spacing and optimum grouping to give the greatest benefits. It was
established that around a 20% content of restart codes provides the best
solution, but that the optimum grouping depends on the particular
instruction set. In cases where there are a large number of multibyte
instructions the restart codes should be grouped together in twos or
threes. Although this method increases the probability of recovery from
erroneous execution in this area, it is not considered worthwhile. This is
due to the large amount of extra hardware that is required, which will
itself be prone to failure. A hardware mechanism to detect operation code
fetches from data areas has been presented. It provides immediate
detection using only a few simple logic gates and will therefore be much
more effective.
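The reason that grouping helps can be seen from a toy model of execution
in a data area. The instruction length rule below is hypothetical and does
not correspond to any of the instruction sets analysed, and FF is assumed
as the restart code purely for illustration.

```python
RST = 0xFF  # restart op-code assumed for illustration


def toy_length(opcode):
    """Hypothetical instruction length in bytes (1 to 3); not a real table."""
    return 1 if opcode == RST else 1 + opcode % 3


def reaches_restart(memory, start=0):
    """Follow instruction boundaries from start; True if RST is executed."""
    pc = start
    while pc < len(memory):
        opcode = memory[pc]
        if opcode == RST:
            return True
        pc += toy_length(opcode)   # any operand bytes are skipped, not executed
    return False


# 0x01 is a two byte and 0x02 a three byte instruction in this toy set.
print(reaches_restart([0x01, RST, 0x00, 0x00]))   # False: RST read as operand
print(reaches_restart([0x01, RST, RST, 0x00]))    # True
print(reaches_restart([0x02, RST, RST, 0x00]))    # False: both read as operands
print(reaches_restart([0x02, RST, RST, RST]))     # True
```

A single restart code is lost whenever it falls in the operand field of
the preceding instruction, whereas a group at least as long as the longest
instruction always leaves one code on an instruction boundary.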
The contents of the program areas can be influenced by the positioning
of various memory blocks. For example, the addresses in a heavily accessed
data area will appear in many locations in the program. Therefore by
certain positioning of blocks particular op-codes can be made to appear
more often. This concept was investigated, but revealed that only marginal
improvements in recovery could be obtained. As before, in most cases
execution resynchronises with the program. Therefore mechanisms
incorporated in the software, to detect that execution has not followed
the correct path, are more effective.
The unused areas can be modified very simply and effectively by the
addition of resistors between the data and power supply lines. This forces
a single value into all locations and can be selected to be equivalent to a
restart instruction, so that recovery is initiated immediately. This
should be incorporated in all systems.
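For the 8085, whose restart instruction RST n is encoded as 11nnn111 in
binary and transfers control to address 8n, the choice of resistors can be
illustrated as follows. The helper names are assumptions, and the sketch
only shows which lines would be pulled high or low to make an idle bus
read a chosen op-code; it says nothing about resistor values or bus
loading.

```python
def rst_opcode(n):
    """Op-code of the 8085 instruction RST n (0 <= n <= 7)."""
    if not 0 <= n <= 7:
        raise ValueError("n must be in 0..7")
    return 0xC7 | (n << 3)


def rst_vector(n):
    """Address to which RST n transfers control on the 8085."""
    return 8 * n


def resistor_plan(opcode):
    """Map data lines D7..D0 to 'up' or 'down' so the idle bus reads opcode."""
    return {f"D{bit}": "up" if (opcode >> bit) & 1 else "down"
            for bit in range(7, -1, -1)}


print(hex(rst_opcode(7)), hex(rst_vector(7)))   # 0xff 0x38
print(resistor_plan(rst_opcode(7)))             # every line pulled up
print(resistor_plan(rst_opcode(0)))             # D5, D4 and D3 pulled down
```

With pull-up resistors on every data line the 8085 therefore reads RST 7
from any unused location, and the recovery routine would be entered at
address 0038.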
For the input areas the ports should be removed from the memory map,
if possible; otherwise a high level of decoding should be used. This is
particularly important if rapid recovery is required, as large blocks of
input data can lead to very long sequences of erroneous execution.
10.6 Choice of Recovery Mechanisms 
The choice of a particular combination of mechanisms depends on the
size of each type of memory. Generally, modification to the unused areas,
and detection within the program, should be included. The addition of a
combination of these techniques ensures that erroneous execution will be
detected quickly in most cases, but there will be occasions when they will
fail. It is therefore necessary to provide a higher level of detection in
the form of a hardware watchdog timer. It has been shown that the design
of such a timer is important. For example, simple updating methods should
be avoided as these may be erroneously generated under fault conditions.
Interrupts must not be used to initiate recovery, as they may not function.
At least a full reset must be used, and in some cases it may be necessary
to power-down the system before recovery is possible.
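One way of meeting these requirements is sketched below as a behavioural
model. It is an assumption made for illustration rather than the design
proposed in this thesis: the timer is only refreshed by two different key
values written in the correct order, and on expiry it asserts a reset
rather than an interrupt. The key values and names are hypothetical.

```python
class Watchdog:
    """Toy watchdog that must be refreshed with an alternating key sequence."""

    KEYS = (0x55, 0xAA)  # hypothetical refresh values

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks
        self.next_key = 0
        self.reset_asserted = False

    def write(self, value):
        # Only the expected key advances the sequence. Anything else is
        # ignored, so a runaway loop repeating one write cannot refresh it.
        if value == self.KEYS[self.next_key]:
            self.next_key += 1
            if self.next_key == len(self.KEYS):
                self.next_key = 0
                self.remaining = self.timeout_ticks

    def tick(self):
        # On expiry the output drives the processor reset line, not an
        # interrupt input.
        if self.remaining > 0:
            self.remaining -= 1
        if self.remaining == 0:
            self.reset_asserted = True


healthy = Watchdog(timeout_ticks=3)
for _ in range(5):
    healthy.write(0x55)
    healthy.write(0xAA)
    healthy.tick()
print(healthy.reset_asserted)   # False

stuck = Watchdog(timeout_ticks=3)
for _ in range(3):
    stuck.write(0x55)           # erroneous loop repeating a single access
    stuck.tick()
print(stuck.reset_asserted)     # True
```

A timer refreshed by any access to a single address would have been kept
alive by the second loop, which is the weakness identified above.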
10.7 Summary 
This thesis has concentrated on error detection mechanisms; however,
the recovery process is equally important and requires careful
consideration. It may vary from a simple reset to a thorough check-out of
the entire system followed by an attempt to reconstruct all critical data
that was lost.
The techniques studied provide the greatest improvements to
non-redundant systems. They can also be used in redundant systems to enable
the recovery of a failed unit or to recover from common mode failures. For
the British Gas application of digital control, a simplex system
incorporating these techniques and, perhaps, containing some additional
fail-safe mechanisms, may be considered to give high enough reliability.
If higher standards are required it will be necessary to adopt a redundant
arrangement in the hardware. This can be achieved in a number of ways,
from a tightly coupled system with voting at each clock cycle, to a very
loosely coupled system maintaining separate channels from the transducers
to the actuators.
The latter arrangement is preferred because it essentially consists of
several simplex channels, which will all receive the full benefits from the
techniques described above. Taken individually, each channel will be easy
to design and maintain, and will therefore be more readily accepted into an
industry which has been concerned traditionally with mechanical
controllers. The arrangement is highly immune to common mode failures, and
is also very adaptable for other applications requiring different levels of
reliability. It is simply a case of adding or removing modules as
required.
Finally, for any application requiring high reliability, full testing
of the system is essential before it undertakes active control. Some
methods of testing have been presented, but these should be followed by
comprehensive field trials to establish whether specified levels of
reliability have been reached.
No. 10. October 1976. pp. 43 -50 . 
171 
58 Lonn. W.M.. Moore. G.H. and Speckman . B.M., 'Operating Exper ience 
with Dual DDC Computer System Pittsburg Power Plant Unit 
No.7'. Proc. 16th Int. I.S.A. Power Instrumentation Symposium. 
Vol 16 A73. 1973. pp. 75 -85 . 
59 Losq. J . . ' Influence of Fault-Detection and Switching Mechanisms on 
the Reliability of Stand-by Sys tems ' . 5th Annual International 
F T C S . 1975. pp. 81 -86 . 
60 Lunde. A., 'Emper ica l Evaluation of Some Features of Instruction Set 
P r o c e s s o r Archi tectures ' . Communicat ions of the ACM. Vol 20. 
No. 3. March 1977, pp. 143-153. 
61 McConnel , S.R., Siewlorek, D.P. and T s a o . M.M., 'The Measurment and 
Analysis of Transient Errors in Digital Computer Sys tems ' . I E E E 
C h l 3 9 6 - l / 7 9 / 0 0 0 0 - 0 0 6 7 $ 0 0 . 7 5 . 1979. pp. 67 -70 . 
62 McKinney, H.N. and Briggs. D . C . 'E lectr ica l Power Subsystem for the 
NATO III Communicat ions Satall ite' , 11th Int. Energy Convers ion 
Conference , 1976. paper 769242. pp. 1408-1413. 
63 Marchal . P. and Courtois, B., 'On Detecting the Hardware Fa i lures 
Disrupting Programs in Microprocessors ' . 12th Annual 
International F T C S . 1982. pp. 249-256 . 
64 May. T .C. and Woods. M.H., 'A New Physical Mechanism for Soft Errors 
in Dynamic Memories' . Proc. Int. Reliability Phys ics Symposium. 
April 1978. pp. 33 -40 . 
65 Musa, J .D. . 'Measuring and Managing Software Reliability'. I E E E 
Proceeding of the 2nd Annual Conference on Computers and 
Communicat ions. March 1983. pp. 105-109. 
66 Nelson. E . C . , 'Software Reliability'. 5th Annual International F T C S . 
1975. pp. 24 -28 . 
172 
6 7 N e m m o u r , M. . ' E t u d e d u F o n c t i o n n e m e n t I n t e r n e d e s M l c r o p r o c e s s e u r s 
6 8 0 0 ' . E N S I M A Q , F i n a l R e p o r t ( N M 4 ) , G r e n o b l e . M a y 1 9 7 9 . 
6 8 N e m m o u r , M. , ' E t u d e d u F o n c t i o n n e m e n t d u M i c r o p r o c e s s e u r M C 6 8 0 9 ' . 
E N S I M A G . C o n t r a c t E D F I 2 5 I A 2 7 3 9 . G r e n o b l e . 
6 9 N i c h o l s . N. , ' 8 0 8 0 / 8 0 8 0 A M i c r o c o m p u t e r ' . I n t e l R e l i a b i l i t y R e p o r t . 
R R - 1 0 . M a r c h 1 9 7 6 . p p . 1 - 1 0 . 
7 0 N g . Y.W. a n d A v i z i e n i s . A . . ' A M o d e l f o r T r a n s i e n t a n d P e r m a n e n t 
F a u l t R e c o v e r y in C l o s e d F a u l t T o l e r a n t S y s t e m s ' . 6 t h A n n u a l 
I n t e r n a t i o n a l F T C S . 1 9 7 6 . p p . 1 8 2 - 1 8 8 . 
71 N g . Y.W. a n d A v i z i e n i s . A . . ' A R I E S - A n A u t o m a t e d R e l i a b i l i t y 
E s t i m a t i o n S y s t e m f o r R e d u n d a n t D i g i t a l S t r u c t u r e s ' . P r o c . 1 9 7 7 
A n n u a l R e l i a b i l i t y a n d M a i n t a i n a b i l i t y S y m p o s i u m . U S A . J a n u a r y 
1 9 7 7 . p p . 1 0 8 - 1 1 3 . 
7 2 O ' B r i e n . F .J . . ' R o l l b a c k P o i n t I n s e r t i o n S t r a t e g i e s ' . 6 t h A n n u a l 
I n t e r n a t i o n a l F T C S . 1 9 7 6 . p p . 1 3 8 - 1 4 2 . 
73 O b a c - R o d a , V. a n d D a v i e s . O . J . , ' A s p e c t s o f F a u l t T o l e r a n t R i n g 
S t r u c t u r e s ' , IEE C o l l o q u i m . L o n d o n . D i g e s t N o . 1 9 8 2 / 6 7 , O c t o b e r 
1 9 8 2 , p p . 3 / 1 - 9 . 
7 4 O p p e n h e i m e r , C P . , ' R e l i a b l e D e s i g n s B e g i n w i t h t h e B a s i c s ' , C o m p u t e r 
D e s i g n . A u g u s t 1 9 8 3 . p p . 9 3 - 9 9 . 
75 P a p p u , R.V., H a r r i s . E. a n d Y a t e s . M. . ' S c r e e n i n g M e t h o d s a n d 
E x p e r i e n c e w i t h M O S M e m o r y ' . M i c r o e l e c t r o n i c s a n d R e l i a b i l i t y . 
V o l 17 . N o . 1 . 1 9 7 8 . p p . 1 9 3 - 2 0 0 . 
7 6 P e a r s o n . J . C . . ' R e l i a b i l i t y o f S m a l l D i g i t a l C o n t r o l l e r s ' , P h D 
T h e s i s , U n i v e r s i t y o f D u r h a m , 1 9 8 3 . 
1 7 3 
77 P e a r s o n , J . C . , H a l s e . R.G. a n d P r e e c e , C . ' R e l i a b l e D i g i t a l 
C o n t r o l l e r A r c h i t e c t u r e f o r G a s D i s t r i b u t i o n R e g u l a t o r s ' . 
R e l i a b i l i t y E n g i n e e r i n g , Vo l 8 , 1 9 8 4 , p p . 1 7 9 - 1 8 9 . 
78 P e c k e t t , D., ' F a u l t - F i n d i n g w i t h A i d o f S e l f - T e s t P r o g r a m s ' , 
P r a c t i c a l C o m p u t i n g , D e c e m b e r 1 9 7 9 , p p . 1 0 2 - 1 0 7 . 
7 9 P e l l e g r i n i , G . , R a i m o , A . a n d R e y n a u d . C , ' E M C P r o b l e m s in H.V. S u b -
S t a t i o n s ' , I E E E I n t e r n a t i o n a l S y m p o s i u m o n E l e c t r o m a g n e t i c 
C o m p a t a b i l i t y , 1 9 7 6 , p p . 1 0 6 - 1 0 9 . 
8 0 P r e e c e , C. a n d S t e w a r t T ,R. , ' M u l t i l e v e l F a u l t R e c o v e r y i n R e a l - T i m e 
D i g i t a l C o n t r o l l e r s ' , IEE C o l l o q u i m , L o n d o n , M a y 1 9 7 9 . 
8 1 P r e e c e , C , P e a r s o n , J .C . a n d H a l s e , R .G. , ' T h e I n t r o d u c t i o n o f F a u l t 
T o l e r a n c e i n t o D i g i t a l l y C o n t r o l l e d G a s R e g u l a t o r s ' . In t . G a s 
R e s e a r c h C o n f e r e n c e , L o n d o n . J u n e 1 9 8 3 . 
8 2 P y t c h e s , D. , ' Z i n c - A i r C e l l s : P o w e r S o u r c e o f t h e F u t u r e ' . 
E l e c t r o n i c s a n d P o w e r , V o l 2 9 , N o . 7 / 8 , J u l y 1 9 8 3 , p p . 5 7 7 - 5 8 0 . 
8 3 R a n d e l l , B., ' S y s t e m S t r u c t u r e f o r S o f w a r e F a u l t T o l e r a n c e ' , I E E E 
T r a n s . S o f t w a r e E n g i n e e r i n g , V o l S E - 1 , N o . 2 , J u n e 1 9 7 5 , p p . 
2 2 0 - 2 3 2 . 
8 4 R e e s e , S .E . , ' L o w P o i n t C o n t r o l S y s t e m R e d u c e s G a s L o s s e s ' , P i p e l i n e 
a n d G a s J o u r n a l , J u l y 1 9 7 5 , p p . 4 2 - 4 6 . 
8 5 R o b a c h , C S a u c i e r , G , a n d L e b r u n , J . , ' P r o c e s s o r T e s t a b i l i t y a n d 
D e s i g n C o n s e q u e n c e s ' , I E E E T r a n s . C o m p u t e r s , V o l C - 2 5 , N o . 
6 , J u n e 1 9 7 6 , p p . 6 4 5 - 6 5 2 . 
8 6 R o s t e k , P . M . , ' T e c h n i q u e s o f S h i e l d i n g a n d F i l t e r i n g D i g i t a l 
C o m p u t e r s f o r E M I S u s c e p t i b i l i t y ' , I E E E E l e c t r o m a g n e t i c 
C o m p a t i b i l i t y S y m p o s i u m R e c o r d , S a n A n t o n i o , U S A , O c t o b e r 1 9 7 5 , 
S e s s i o n 4 B , p p . e l - 7 . 
1 7 4 
8 7 R u s s e l l . P .J . . ' N o n - C o m m e r c i a l N o n - S t o p P r o c e s s i n g ' . C o m p u t e r 
S y s t e m s . D e c e m b e r 1 9 8 2 . p p . 4 7 - 4 9 . 
8 8 S e d m a k , R .M. a n d L i e b e r g o t . H . L . ' F a u l t - T o l e r a n c e o f a G e n e r a l 
P u r p o s e C o m p u t e r I m p l e m e n t e d by V e r y L a r g e S c a l e I n t e g r a t i o n ' . 
8 t h A n n u a l I n t e r n a t i o n a l F T C S . J u n e 1 9 7 8 , p p . 1 3 7 - 1 4 3 . 
8 9 S e q u i n . C . H . . ' I n s t r u c t i o n in M O S LSI S y s t e m s D e s i g n ' , I E E E C o m p u t e r . 
M a r c h 1 9 8 0 . p p . 6 7 - 7 3 . 
9 0 S e x t o n . F.W. e t a l . ' R a d i a t i o n T e s t i n g o f t h e C M O S 8 0 8 5 M i c r o -
p r o c e s s o r F a m i l y ' . I E E E T r a n s . N u c l e a r S c i e n c e . V o l N S - 3 0 . N o . 
6 . D e c e m b e r 1 9 8 3 . p p . 4 2 3 5 - 4 2 3 9 . 
9 1 S h e d l e t s k y . J . J . a n d M c C l u s k y , E .J . . ' T h e E r r o r L a t e n c y of a F a u l t i n 
a S e q u e n t i a l D i g i t a l C i r c u i t ' . I E E E T r a n s . C o m p u t e r s . V o l C - 2 5 . 
N o . 6 . J u n e 1 9 7 6 . p p . 6 5 5 - 6 5 9 . 
9 2 S h o o m a n . M.L. . ' T h e S p e c t r e o f S o f t w a r e R e l i a b i l i t y a n d i t s 
E x o r c i s m ' , P r o c e e d i n g s o f t h e J o i n t A u t o m a t i c C o n t r o l 
C o n f e r e n c e . 1 9 7 7 , p p . 2 2 5 - 2 3 1 . 
9 3 S i e w i o r e k . D.P. . ' T r a n s p a r e n c y in D i s t r i b u t e d , F a u l t T o l e r a n t 
C o m p u t i n g S y s t e m s ' , 14 th I n t e r n a t i o n a l C o m p u t e r S o c i e t y 
C o n f e r e n c e , p p . 2 7 6 - 2 7 8 . 
9 4 S i e w i o r e k , D .P . , K i n i , V. , J o o b b a n i , R. a n d B e l l i s . H. , ' A C a s e S t u d y 
o f C . m m p . C m * a n d C . v m p : P a r t l l - P r e d i c t i n g a n d C a l i b r a t i n g 
R e l i a b i l i t y o f M u l t i p r o c e s s o r S y s t e m s ' . P r o c . o f I E E E , V o l 6 6 . 
N o . 1 0 . O c t o b e r 1 9 7 8 , p p . 1 2 0 0 - 1 2 2 0 . 
9 5 S m i t h , A . L . , ' H a r d a n d S o f t F a i l u r e s in D y n a m i c R A M F a u l t T o l e r a n t 
M e m o r i e s ' , I E E E T r a n s . R e l i a b i l i t y , Vo l R - 3 0 , N o . 1 , A p r i l 
1 9 8 1 , p p . 5 8 - 6 0 . 
175 
9 6 S m i t h , D .H . . ' M i c r o p r o c e s s o r T e s t i n g - M e t h o d o r M a d n e s s ' . I E E E S e m i -
c o n d u c t o r T e s t S y m p o s i u m . 1 9 7 6 . p p . 2 7 - 2 9 . 
9 7 S o i . l .M . a n d Q o p a l , K.. ' S o m e A s p e c t s of R e l i a b l e S o f t w a r e 
P a c k a g e s ' , M i c r o e l e c t r o n i c R e l i a b i l i t y . Vo l 1 9 , 1 9 7 9 . p p . 3 7 9 -
3 8 6 . 
9 8 S p e a r m a n , C.A. , ' I m p r o v e d D i s t r i c t P r e s s u r e C o n t r o l ' , T h e I n s t i t u t i o n 
of G a s E n g i n e e r s . 8 t h M a r c h . 1 9 7 7 . 
9 9 S u l w a y , B.. ' A C E m e r g e n c y a n d U n i n t e r r u p t i b l e P o w e r S u p p l i e s ' . 
C o m m u n i c a t i o n s I n t e r n a t i o n a l , V o l 3 , D e c e m b e r 1 9 7 6 , p p . 6 2 - 6 5 . 
1 0 0 T a s a r . V., ' A n a l y s i s o f F a u l t D e t e c t i o n C o v e r a g e o f a S e l f - T e s t 
S o f t w a r e P r o g r a m ' , 8 t h A n n u a l In t . F T C S . J u n e 1 9 7 8 . p p . 6 5 - 7 1 . 
1 0 1 T e e t s . R .M. , ' P r o t e c t i n g M i n i c o m p u t e r s f r o m P o w e r L i n e 
P e r t u r b a t i o n s ' , C o m p u t e r D e s i g n , V o l 1 5 . J u n e 1 9 7 6 , p p . 9 9 - 1 0 4 . 
1 0 2 T h a t t e , S . M . a n d A b r a h a m , J .A . , ' T e s t i n g o f S e m i c o n d u c t o r R a n d o m 
A c c e s s M e m o r i e s ' . 7 t h A n n u a l In t . F T C S . 1 9 7 7 , p p . 8 1 - 8 7 . 
1 0 3 T o s c h i . E.A. a n d W a t a n a b e . T . , ' A n A l l - S e m i c o n d u c t o r M e m o r y w i t h 
F a u l t D e t e c t i o n , C o r r e c t i o n , a n d L o g g i n g ' , H P J o u r n a l , p p . 8 -
13 . 
1 0 4 T u r n e r , R.C., ' R e a l - T i m e P r o g r a m m i n g w i t h M i c r o c o m p u t e r s ' , L e x i n g t o n 
B o o k s , T o r o n t o , 1 9 8 0 . 
1 0 5 V o n N e u m a n n , J . , ' P r o b a b i l i s t i c s , L o g i s t i c s a n d t h e S y n t h e s i s o f 
R e l i a b l e O r g a n i s m s f r o m U n r e l i a b l e C o m p o n e n t s ' , A u t o m a t a 
S t u d i e s f r o m A n n a l s o f M a t h e m a t i c a l S t u d i e s N o . 3 4 , P r i n c e t o n 
U n i v e r s i t y P r e s s . 1 9 5 6 . p p . 4 3 - 9 9 . 
106 W a c h t e r . W . J . . ' S y s t e m M a l f u n c t i o n D e t e c t i o n a n d C o r r e c t i o n ' , 5 t h 
A n n u a l I n t e r n a t i o n a l F T C S , 1 9 7 5 , p p . 1 9 6 - 2 0 1 . 
176 
107 W a k e r l y , J . F . . ' M i c r o c o m p u t e r R e l i a b i l i t y I m p r o v e m e n t U s i n g T r i p l e -
M o d u l a r R e d u n d a n c y ' , P r o c . I E E E , V o l 6 4 , N o . 6 , J u n e 1 9 7 6 . p p . 
8 8 9 - 8 9 5 . 
108 W a l k e r . W . K . S . . S u n d b e r g , C.W. a n d B l a c k , C . J . , ' A R e l i a b l e 
S p a c e b o r n e M e m o r y w i t h a S i n g l e E r r o r a n d E r a s u r e C o r r e c t i o n 
S c h e m e ' , I E E E T r a n s . C o m p u t e r s , V o l C - 2 8 , N o . 7, J u l y 1 9 7 9 , p p . 
4 9 3 - 5 0 0 . 
109 W e i . A .Y . , ' R e a l T i m e P r o g r a m m i n g w i t h F a u l t T o l e r a n c e ' , P h D T h e s i s . 
U n i v e r s i t y o f i l i i n o i s , U S A , 1 9 8 1 . 
110 W e n s l e y , J . H . , ' S I F T - S o f t w a r e I m p l e m e n t e d F a u l t T o l e r a n c e ' , A F I P S 
C o n f e r e n c e P r o c e e d i n g s , Vo l 4 1 , P a r t 1 , 1 9 7 2 , p p . 2 4 3 - 2 5 3 . 
111 W e n s l e y . J . H . a n d Lev i t t . K .N. . ' A C o m p a r a t i v e S t u d y o f A r c h i t e c t u r e s 
f o r F a u l t - T o l e r a n c e ' , 4 t h A n n u a l I n t e r n a t i o n a l F T C S , J u n e 1 9 7 4 , 
p p . 4 - ( 1 6 - 2 1 ) . 
1 1 2 W e s t e r m e l e r . T . F . , ' R e d u n d a n c y M a n a g e m e n t o f D i g i t a l F l y - b y - W i r e 
S y s t e m s ' , P r o c . J o i n t A u t o m a t i c C o n t r o l C o n f e r e n c e , V o l 1 . 
1 9 7 7 , p p . 2 7 2 - 2 7 7 . 
113 W h a l i e n . J . J . , T r o n t , J . , L a r s o n , C .E . a n d R o e , J . M . , ' C o m p t e r - A i d e d 
A n a l y s i s o f RFI E f f e c t s in I n t e g r a t e d C i r c u i t s ' , I E E E E l e c t r o -
m a g n e t i c C o m p a t i b i l i t y S y m p o s i u m , J u n e 1 9 7 8 , p p . 6 4 - 7 0 . 
1 1 4 W i l l i a m s o n , I., ' D e s i g n o f S e l f - C h e c k i n g a n d F a u l t - T o l e r a n t M i c r o -
p r o g r a m m e d C o n t r o l l e r s ' , I E E E C o n f . C o m p u t e r S y s t e m s a n d 
T e c h n o l o g y , 1 9 7 7 , p p . 1 9 3 - 2 0 4 . 
115 W i l l i a m s o n , T. , ' D e s i g n i n g M i c r o c o n t r o l l e r S y s t e m s f o r E l e c t r i c a l l y 
N o i s y E n v i r o n m e n t s ' , I n t e l C o r p o r a t i o n , A p p l i c a t i o n N o t e A P -
1 2 5 , F e b r u a r y 1 9 8 2 . 
177 
116 W u l f . W.A . . ' R e l i a b l e H a r d w a r e / S o f t w a r e A r c h i t e c t u r e ' . I E E E T r a n s . 
S o f t w a r e E n g i n e e r i n g . V o l S E - 1 , N o . 2 , J u n e 1 9 7 5 . p p . 2 3 3 - 2 4 0 . 
1 1 7 Z i e g l e r . J . F . a n d L a n f o r d . J . F . . ' E f f e c t of C o s m i c R a y s o n C o m p u t e r 
M e m o r i e s ' . S c i e n c e . V o l 2 0 6 . N o v e m b e r 1 9 7 9 . p p . 7 7 6 - 7 8 8 . 
118 . ' T h e 8 0 8 0 A s A r e N o t A l l A l i k e ; Y o u S h o u l d K n o w t h e 
D i f f e r e n c e s ' . E l e c t r o n i c D e s i g n , 1 8 t h J a n u a r y 1 9 7 7 . p p . 4 1 - 4 2 . 
1 1 9 , ' M C S - 8 0 / 8 5 F a m i l y U s e r ' s M a n u a l ' , I n t e l C o r p o r a t i o n , 1 9 7 9 . 
120 , ' 4 8 - S e r i e s M i c r o c o m p u t e r s H a n d b o o k ' , N a t i o n a l S e m i c i n d u c t o r 
C o r p o r a t i o n , 1 9 8 0 . 
121 , ' M I L - H D B K - 2 1 7 D R e l i a b i l i t y P r e d i c t i o n o f E l e c t r o n i c 
E q u i p m e n t ' , U.S. D e p a r t m e n t o f D e f e n s e . J a n u a r y 1 9 8 2 . 
1 2 2 . ' H R D 3 H a n d b o o k o f R e l i a b i l i t y D a t a ' , B r i t i s h T e l e c o m . 
M a t e r i a l s a n d C o m p o n e n t s C e n t r e . B i r m i n g h a m , J a n u a r y 1 9 8 4 . 
178 
[Diagram. Labels: inlet pressure, outlet pressure, inlet, outlet.]
Figure 1.1 Typical Diaphragm Operated Regulator
[Block diagram. Components: 8085, 74LS04, 74LS30 (two), 8755A, 8155. Labelled signals and pins: Vcc, GND, MANUAL RESET, RESET IN, A 11-15, A 8-10, AD 0-7, PROG/CE, CE, IOR, Vdd, CLK, RESET, IO/M, WR, ALE.]
Figure 2.1 Block Diagram of the 8085 Test System
[Board layout. Labelled items: 25 way D-connector, 74LS30, 74LS04, 74LS30, 8085, 6.144, 8755A, 8155.]
Figure 2.2 Layout of the Components
[Logic diagram. Address-line inputs (numbered 4, 5, 8, 10, 11, 12, 13 and 15 in the drawing), a 5 V connection, and two outputs: RAM SELECT and EPROM SELECT.]
Figure 2.3 Logic Diagram of the Memory Decoding Circuitry
Rows give the high hexadecimal digit of the opcode; within each row the entries run from low digit 0 to F. Each entry is followed by (bytes, cycles).

Row 0: NOP (1B,1C); *NOP* (1B,1C); OUTL BUS,A (1B,2C); ADD A,#d (2B,2C); JMP 0XX (2B,2C); EN I (1B,1C); *JNTF* add (2B,2C); DEC A (1B,1C); INS A,BUS (1B,2C); IN A,P1 (1B,2C); IN A,P2 (1B,2C); *IN* A,P2 (1B,2C); MOVD A,P4 (1B,2C); MOVD A,P5 (1B,2C); MOVD A,P6 (1B,2C); MOVD A,P7 (1B,2C)
Row 1: INC @R0 (1B,1C); INC @R1 (1B,1C); JB0 add (2B,2C); ADDC A,#d (2B,2C); CALL 0XX (2B,2C); DIS I (1B,1C); JTF add (2B,2C); INC A (1B,1C); INC R0 (1B,1C); INC R1 (1B,1C); INC R2 (1B,1C); INC R3 (1B,1C); INC R4 (1B,1C); INC R5 (1B,1C); INC R6 (1B,1C); INC R7 (1B,1C)
Row 2: XCH A,@R0 (1B,1C); XCH A,@R1 (1B,1C); *NOP* (1B,1C); MOV A,#d (2B,2C); JMP 1XX (2B,2C); EN TCNTI (1B,1C); JNT0 add (2B,2C); CLR A (1B,1C); XCH A,R0 (1B,1C); XCH A,R1 (1B,1C); XCH A,R2 (1B,1C); XCH A,R3 (1B,1C); XCH A,R4 (1B,1C); XCH A,R5 (1B,1C); XCH A,R6 (1B,1C); XCH A,R7 (1B,1C)
Row 3: XCHD A,@R0 (1B,1C); XCHD A,@R1 (1B,1C); JB1 add (2B,2C); *NOP* (1B,1C); CALL 1XX (2B,2C); DIS TCNTI (1B,1C); JT0 add (2B,2C); CPL A (1B,1C); *BUS IDLE* (1B,2C); OUTL P1,A (1B,2C); OUTL P2,A (1B,2C); *OUTL* P2,A (1B,2C); MOVD P4,A (1B,2C); MOVD P5,A (1B,2C); MOVD P6,A (1B,2C); MOVD P7,A (1B,2C)
Row 4: ORL A,@R0 (1B,1C); ORL A,@R1 (1B,1C); MOV A,T (1B,1C); ORL A,#d (2B,2C); JMP 2XX (2B,2C); STRT CNT (1B,1C); JNT1 add (2B,2C); SWAP A (1B,1C); ORL A,R0 (1B,1C); ORL A,R1 (1B,1C); ORL A,R2 (1B,1C); ORL A,R3 (1B,1C); ORL A,R4 (1B,1C); ORL A,R5 (1B,1C); ORL A,R6 (1B,1C); ORL A,R7 (1B,1C)
Row 5: ANL A,@R0 (1B,1C); ANL A,@R1 (1B,1C); JB2 add (2B,2C); ANL A,#d (2B,2C); CALL 2XX (2B,2C); STRT T (1B,1C); JT1 add (2B,2C); DA A (1B,1C); ANL A,R0 (1B,1C); ANL A,R1 (1B,1C); ANL A,R2 (1B,1C); ANL A,R3 (1B,1C); ANL A,R4 (1B,1C); ANL A,R5 (1B,1C); ANL A,R6 (1B,1C); ANL A,R7 (1B,1C)
Row 6: ADD A,@R0 (1B,1C); ADD A,@R1 (1B,1C); MOV T,A (1B,1C); *NOP* (1B,1C); JMP 3XX (2B,2C); STOP TCNT (1B,1C); *JNF1* add (2B,2C); RRC A (1B,1C); ADD A,R0 (1B,1C); ADD A,R1 (1B,1C); ADD A,R2 (1B,1C); ADD A,R3 (1B,1C); ADD A,R4 (1B,1C); ADD A,R5 (1B,1C); ADD A,R6 (1B,1C); ADD A,R7 (1B,1C)
Row 7: ADDC A,@R0 (1B,1C); ADDC A,@R1 (1B,1C); JB3 add (2B,2C); *NOP* (1B,1C); CALL 3XX (2B,2C); ENT0 CLK (1B,1C); JF1 add (2B,2C); RR A (1B,1C); ADDC A,R0 (1B,1C); ADDC A,R1 (1B,1C); ADDC A,R2 (1B,1C); ADDC A,R3 (1B,1C); ADDC A,R4 (1B,1C); ADDC A,R5 (1B,1C); ADDC A,R6 (1B,1C); ADDC A,R7 (1B,1C)
Row 8: MOVX A,@R0 (1B,2C); MOVX A,@R1 (1B,2C); *NOP* (1B,1C); RET (1B,2C); JMP 4XX (2B,2C); CLR F0 (1B,1C); JNI add (2B,2C); *NOP* (1B,1C); ORL BUS,#d (2B,2C); ORL P1,#d (2B,2C); ORL P2,#d (2B,2C); *ORL* P2,#d (2B,2C); ORLD P4,A (1B,2C); ORLD P5,A (1B,2C); ORLD P6,A (1B,2C); ORLD P7,A (1B,2C)
Row 9: MOVX @R0,A (1B,2C); MOVX @R1,A (1B,2C); JB4 add (2B,2C); RETR (1B,2C); CALL 4XX (2B,2C); CPL F0 (1B,1C); JNZ add (2B,2C); CLR C (1B,1C); ANL BUS,#d (2B,2C); ANL P1,#d (2B,2C); ANL P2,#d (2B,2C); *ANL* P2,#d (2B,2C); ANLD P4,A (1B,2C); ANLD P5,A (1B,2C); ANLD P6,A (1B,2C); ANLD P7,A (1B,2C)
Row A: MOV @R0,A (1B,1C); MOV @R1,A (1B,1C); *NOP* (1B,1C); MOVP A,@A (1B,2C); JMP 5XX (2B,2C); CLR F1 (1B,1C); *JNF0* add (2B,2C); CPL C (1B,1C); MOV R0,A (1B,1C); MOV R1,A (1B,1C); MOV R2,A (1B,1C); MOV R3,A (1B,1C); MOV R4,A (1B,1C); MOV R5,A (1B,1C); MOV R6,A (1B,1C); MOV R7,A (1B,1C)
Row B: MOV @R0,#d (2B,2C); MOV @R1,#d (2B,2C); JB5 add (2B,2C); JMPP @A (1B,2C); CALL 5XX (2B,2C); CPL F1 (1B,1C); JF0 add (2B,2C); *NOP* (1B,1C); MOV R0,#d (2B,2C); MOV R1,#d (2B,2C); MOV R2,#d (2B,2C); MOV R3,#d (2B,2C); MOV R4,#d (2B,2C); MOV R5,#d (2B,2C); MOV R6,#d (2B,2C); MOV R7,#d (2B,2C)
Row C: *NOP* (1B,1C); *NOP* (1B,1C); *NOP* (1B,1C); *NOP* (1B,1C); JMP 6XX (2B,2C); SEL RB0 (1B,1C); JZ add (2B,2C); MOV A,PSW (1B,1C); DEC R0 (1B,1C); DEC R1 (1B,1C); DEC R2 (1B,1C); DEC R3 (1B,1C); DEC R4 (1B,1C); DEC R5 (1B,1C); DEC R6 (1B,1C); DEC R7 (1B,1C)
Row D: XRL A,@R0 (1B,1C); XRL A,@R1 (1B,1C); JB6 add (2B,2C); XRL A,#d (2B,2C); CALL 6XX (2B,2C); SEL RB1 (1B,1C); *JMPP* add (2B,2C); MOV PSW,A (1B,1C); XRL A,R0 (1B,1C); XRL A,R1 (1B,1C); XRL A,R2 (1B,1C); XRL A,R3 (1B,1C); XRL A,R4 (1B,1C); XRL A,R5 (1B,1C); XRL A,R6 (1B,1C); XRL A,R7 (1B,1C)
Row E: *NOP* (1B,1C); *NOP* (1B,1C); *NOP* (1B,1C); MOVP3 A,@A (1B,2C); JMP 7XX (2B,2C); SEL MB0 (1B,1C); JNC add (2B,2C); RL A (1B,1C); DJNZ R0,add (2B,2C); DJNZ R1,add (2B,2C); DJNZ R2,add (2B,2C); DJNZ R3,add (2B,2C); DJNZ R4,add (2B,2C); DJNZ R5,add (2B,2C); DJNZ R6,add (2B,2C); DJNZ R7,add (2B,2C)
Row F: MOV A,@R0 (1B,1C); MOV A,@R1 (1B,1C); JB7 add (2B,2C); *NOP* (1B,1C); CALL 7XX (2B,2C); SEL MB1 (1B,1C); JC add (2B,2C); RLC A (1B,1C); MOV A,R0 (1B,1C); MOV A,R1 (1B,1C); MOV A,R2 (1B,1C); MOV A,R3 (1B,1C); MOV A,R4 (1B,1C); MOV A,R5 (1B,1C); MOV A,R6 (1B,1C); MOV A,R7 (1B,1C)

@ - INDIRECT ADDRESSING
# - IMMEDIATE ADDRESSING
B - BYTES
C - CYCLES
add - ADDRESS
d - DATA
* * - UNDECLARED INSTRUCTION
Figure 3.2 Full Instruction Set for the 8048 Manufactured by Intel
Rows give the high hexadecimal digit of the opcode; within each row the entries run from low digit 0 to F. Each entry is followed by (bytes, cycles).

Row 0: NOP (1B,1C); *NOP* (1B,1C); OUTL BUS,A (1B,2C); ADD A,#d (2B,2C); JMP 0XX (2B,2C); EN I (1B,1C); *JNTF* add (2B,2C); DEC A (1B,1C); INS A,BUS (1B,2C); IN A,P1 (1B,2C); IN A,P2 (1B,2C); *IN* A,P2 (1B,2C); MOVD A,P4 (1B,2C); MOVD A,P5 (1B,2C); MOVD A,P6 (1B,2C); MOVD A,P7 (1B,2C)
Row 1: INC @R0 (1B,1C); INC @R1 (1B,1C); JB0 add (2B,2C); ADDC A,#d (2B,2C); CALL 0XX (2B,2C); DIS I (1B,1C); JTF add (2B,2C); INC A (1B,1C); INC R0 (1B,1C); INC R1 (1B,1C); INC R2 (1B,1C); INC R3 (1B,1C); INC R4 (1B,1C); INC R5 (1B,1C); INC R6 (1B,1C); INC R7 (1B,1C)
Row 2: XCH A,@R0 (1B,1C); XCH A,@R1 (1B,1C); *MOV* A,PC+1 (1B,1C); MOV A,#d (2B,2C); JMP 1XX (2B,2C); EN TCNTI (1B,1C); JNT0 add (2B,2C); CLR A (1B,1C); XCH A,R0 (1B,1C); XCH A,R1 (1B,1C); XCH A,R2 (1B,1C); XCH A,R3 (1B,1C); XCH A,R4 (1B,1C); XCH A,R5 (1B,1C); XCH A,R6 (1B,1C); XCH A,R7 (1B,1C)
Row 3: XCHD A,@R0 (1B,1C); XCHD A,@R1 (1B,1C); JB1 add (2B,2C); *NOP* (1B,1C); CALL 1XX (2B,2C); DIS TCNTI (1B,1C); JT0 add (2B,2C); CPL A (1B,1C); *BUS IDLE* (1B,2C); OUTL P1,A (1B,2C); OUTL P2,A (1B,2C); *OUTL* P2,A (1B,2C); MOVD P4,A (1B,2C); MOVD P5,A (1B,2C); MOVD P6,A (1B,2C); MOVD P7,A (1B,2C)
Row 4: ORL A,@R0 (1B,1C); ORL A,@R1 (1B,1C); MOV A,T (1B,1C); ORL A,#d (2B,2C); JMP 2XX (2B,2C); STRT CNT (1B,1C); JNT1 add (2B,2C); SWAP A (1B,1C); ORL A,R0 (1B,1C); ORL A,R1 (1B,1C); ORL A,R2 (1B,1C); ORL A,R3 (1B,1C); ORL A,R4 (1B,1C); ORL A,R5 (1B,1C); ORL A,R6 (1B,1C); ORL A,R7 (1B,1C)
Row 5: ANL A,@R0 (1B,1C); ANL A,@R1 (1B,1C); JB2 add (2B,2C); ANL A,#d (2B,2C); CALL 2XX (2B,2C); STRT T (1B,1C); JT1 add (2B,2C); DA A (1B,1C); ANL A,R0 (1B,1C); ANL A,R1 (1B,1C); ANL A,R2 (1B,1C); ANL A,R3 (1B,1C); ANL A,R4 (1B,1C); ANL A,R5 (1B,1C); ANL A,R6 (1B,1C); ANL A,R7 (1B,1C)
Row 6: ADD A,@R0 (1B,1C); ADD A,@R1 (1B,1C); MOV T,A (1B,1C); *NOP* (1B,1C); JMP 3XX (2B,2C); STOP TCNT (1B,1C); *JNF1* add (2B,2C); RRC A (1B,1C); ADD A,R0 (1B,1C); ADD A,R1 (1B,1C); ADD A,R2 (1B,1C); ADD A,R3 (1B,1C); ADD A,R4 (1B,1C); ADD A,R5 (1B,1C); ADD A,R6 (1B,1C); ADD A,R7 (1B,1C)
Row 7: ADDC A,@R0 (1B,1C); ADDC A,@R1 (1B,1C); JB3 add (2B,2C); *NOP* (1B,1C); CALL 3XX (2B,2C); ENT0 CLK (1B,1C); JF1 add (2B,2C); RR A (1B,1C); ADDC A,R0 (1B,1C); ADDC A,R1 (1B,1C); ADDC A,R2 (1B,1C); ADDC A,R3 (1B,1C); ADDC A,R4 (1B,1C); ADDC A,R5 (1B,1C); ADDC A,R6 (1B,1C); ADDC A,R7 (1B,1C)
Row 8: MOVX A,@R0 (1B,2C); MOVX A,@R1 (1B,2C); *NOP* (1B,1C); RET (1B,2C); JMP 4XX (2B,2C); CLR F0 (1B,1C); JNI add (2B,2C); *CLR* A4-A7 (1B,1C); ORL BUS,#d (2B,2C); ORL P1,#d (2B,2C); ORL P2,#d (2B,2C); *ORL* P2,#d (2B,2C); ORLD P4,A (1B,2C); ORLD P5,A (1B,2C); ORLD P6,A (1B,2C); ORLD P7,A (1B,2C)
Row 9: MOVX @R0,A (1B,2C); MOVX @R1,A (1B,2C); JB4 add (2B,2C); RETR (1B,2C); CALL 4XX (2B,2C); CPL F0 (1B,1C); JNZ add (2B,2C); CLR C (1B,1C); ANL BUS,#d (2B,2C); ANL P1,#d (2B,2C); ANL P2,#d (2B,2C); *ANL* P2,#d (2B,2C); ANLD P4,A (1B,2C); ANLD P5,A (1B,2C); ANLD P6,A (1B,2C); ANLD P7,A (1B,2C)
Row A: MOV @R0,A (1B,1C); MOV @R1,A (1B,1C); *NOP* (1B,1C); MOVP A,@A (1B,2C); JMP 5XX (2B,2C); CLR F1 (1B,1C); *JNF0* add (2B,2C); CPL C (1B,1C); MOV R0,A (1B,1C); MOV R1,A (1B,1C); MOV R2,A (1B,1C); MOV R3,A (1B,1C); MOV R4,A (1B,1C); MOV R5,A (1B,1C); MOV R6,A (1B,1C); MOV R7,A (1B,1C)
Row B: MOV @R0,#d (2B,2C); MOV @R1,#d (2B,2C); JB5 add (2B,2C); JMPP @A (1B,2C); CALL 5XX (2B,2C); CPL F1 (1B,1C); JF0 add (2B,2C); *NOP* (1B,1C); MOV R0,#d (2B,2C); MOV R1,#d (2B,2C); MOV R2,#d (2B,2C); MOV R3,#d (2B,2C); MOV R4,#d (2B,2C); MOV R5,#d (2B,2C); MOV R6,#d (2B,2C); MOV R7,#d (2B,2C)
Row C: *DEC* @R0 (1B,1C); *DEC* @R1 (1B,1C); *NOP* (1B,1C); *NOP* (1B,1C); JMP 6XX (2B,2C); SEL RB0 (1B,1C); JZ add (2B,2C); MOV A,PSW (1B,1C); DEC R0 (1B,1C); DEC R1 (1B,1C); DEC R2 (1B,1C); DEC R3 (1B,1C); DEC R4 (1B,1C); DEC R5 (1B,1C); DEC R6 (1B,1C); DEC R7 (1B,1C)
Row D: XRL A,@R0 (1B,1C); XRL A,@R1 (1B,1C); JB6 add (2B,2C); XRL A,#d (2B,2C); CALL 6XX (2B,2C); SEL RB1 (1B,1C); *JMPP* add (2B,2C); MOV PSW,A (1B,1C); XRL A,R0 (1B,1C); XRL A,R1 (1B,1C); XRL A,R2 (1B,1C); XRL A,R3 (1B,1C); XRL A,R4 (1B,1C); XRL A,R5 (1B,1C); XRL A,R6 (1B,1C); XRL A,R7 (1B,1C)
Row E: *DJNZ* @R0,add (2B,2C); *DJNZ* @R1,add (2B,2C); *NOP* (1B,1C); MOVP3 A,@A (1B,2C); JMP 7XX (2B,2C); SEL MB0 (1B,1C); JNC add (2B,2C); RL A (1B,1C); DJNZ R0,add (2B,2C); DJNZ R1,add (2B,2C); DJNZ R2,add (2B,2C); DJNZ R3,add (2B,2C); DJNZ R4,add (2B,2C); DJNZ R5,add (2B,2C); DJNZ R6,add (2B,2C); DJNZ R7,add (2B,2C)
Row F: MOV A,@R0 (1B,1C); MOV A,@R1 (1B,1C); JB7 add (2B,2C); *NOP* (1B,1C); CALL 7XX (2B,2C); SEL MB1 (1B,1C); JC add (2B,2C); RLC A (1B,1C); MOV A,R0 (1B,1C); MOV A,R1 (1B,1C); MOV A,R2 (1B,1C); MOV A,R3 (1B,1C); MOV A,R4 (1B,1C); MOV A,R5 (1B,1C); MOV A,R6 (1B,1C); MOV A,R7 (1B,1C)

@ - INDIRECT ADDRESSING
# - IMMEDIATE ADDRESSING
B - BYTES
C - CYCLES
add - ADDRESS
d - DATA
* * - UNDECLARED INSTRUCTION
Figure 3.3 Full Instruction Set for the 8048 Manufactured by NEC
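
The undeclared opcodes marked in Figures 3.2 and 3.3 can be gathered into a small lookup table to show where the two manufacturers' devices differ. The Python sketch below is illustrative only. The dictionary and function names are assumptions introduced here, and the behaviours are transcribed from the two figures as read, with opcode values inferred from each entry's row and column position.

```python
# Observed behaviour of undeclared 8048 opcodes, transcribed from
# Figure 3.2 (Intel). Keys are opcode bytes; values are the behaviour
# shown in the figure. All names here are illustrative assumptions.
INTEL_UNDECLARED = {
    0x01: "NOP", 0x06: "JNTF add", 0x0B: "IN A,P2", 0x22: "NOP",
    0x33: "NOP", 0x38: "BUS IDLE", 0x3B: "OUTL P2,A", 0x63: "NOP",
    0x66: "JNF1 add", 0x73: "NOP", 0x82: "NOP", 0x87: "NOP",
    0x8B: "ORL P2,#d", 0x9B: "ANL P2,#d", 0xA2: "NOP", 0xA6: "JNF0 add",
    0xB7: "NOP", 0xC0: "NOP", 0xC1: "NOP", 0xC2: "NOP", 0xC3: "NOP",
    0xD6: "JMPP add", 0xE0: "NOP", 0xE1: "NOP", 0xE2: "NOP", 0xF3: "NOP",
}

# Entries of Figure 3.3 (NEC) that differ from the Intel figure.
NEC_OVERRIDES = {
    0x22: "MOV A,PC+1", 0x87: "CLR A4-A7",
    0xC0: "DEC @R0", 0xC1: "DEC @R1",
    0xE0: "DJNZ @R0,add", 0xE1: "DJNZ @R1,add",
}
NEC_UNDECLARED = {**INTEL_UNDECLARED, **NEC_OVERRIDES}


def manufacturer_differences(first, second):
    """Return {opcode: (first_behaviour, second_behaviour)} where they differ."""
    return {
        opcode: (first[opcode], second[opcode])
        for opcode in sorted(first)
        if first[opcode] != second[opcode]
    }


if __name__ == "__main__":
    for opcode, (intel, nec) in manufacturer_differences(
        INTEL_UNDECLARED, NEC_UNDECLARED
    ).items():
        print(f"{opcode:02X}: Intel {intel!r}, NEC {nec!r}")
```

Keeping the NEC entries as overrides on the Intel table makes the point of the comparison explicit: the same 26 opcodes are undeclared on both devices, and only six of them behave differently.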
[Plot. Horizontal axis: instructions executed after error (scale to 40). Vertical axis: 0 to 100 (axis label not recoverable). Regions labelled: Return, Execution in data area, Jump, Restart.]
Figure 4.1 (a) Erroneous Execution in Data Areas of the 8085
[Plot. Horizontal axis: instructions executed after error (0 to 40). Vertical axis: 0 to 100 (axis label not recoverable). Regions labelled: Return, Execution in data area, Jump, Restart, Halt.]
Figure 4.1 (b) Erroneous Execution in the Data Areas of the 6800
[Plot. Horizontal axis: instructions executed after error (scale to 40). Vertical axis: 0 to 100 (axis label not recoverable). Regions labelled: Return, Execution in data area, Jump.]
Figure 4.1 (c) Erroneous Execution in the Data Areas of the NEC 8048
[Plot. Horizontal axis: instructions executed after error (0 to 40). Vertical axis: 0 to 100 (axis label not recoverable). Regions labelled: Return, Execution in data area, Jump.]
Figure 4.1 (d) Erroneous Execution in the Data Areas of the Intel 8048
[Plot. Horizontal axis: instructions executed after error (0 to 20). Vertical axis: 0 to 100 (axis label not recoverable). Regions labelled: Jump, Execution in data area, Restart.]
Figure 4.1 (e) Erroneous Execution in the Data Areas of the 68000
[Flow diagram. Nodes labelled: Erroneous jump, Read data byte, Halt, Return, Unspecified, Restart, Jump.]
Figure 4.2 Flow of Execution in Random Data
[Plot. Horizontal axis: % overhead of extra memory (10 to 100). Vertical axis not recoverable.]
Figure 4.3 Recovery Improvements Obtained by Seeding the Data Areas
[Plot. Horizontal axis: % overhead of extra memory (10 to 100). Vertical axis: 0 to 10 (axis label not recoverable). Curves labelled: 8048, 6800, 8085.]
Figure 4.4 Average Number of Instructions Executed with Seeded Data Areas
[Flow diagram. Labels: Erroneous jump, Resume, Valid instructions.]
Figure 5.1 Erroneous Jump into a Program Area
[Flow diagram. Labels: Erroneous jump, Halt, Restart, Resume, Valid instructions, Random jump, Return.]
Figure 5.2 Flow of Erroneous Execution in Program Areas
[Plot. Horizontal axis: instructions executed after error. Vertical axis: 0 to 100 (axis label not recoverable). Regions labelled: Return, Random jump, Operand field, Resume.]
Figure 5.3 (a) Erroneous Execution in Program Areas of the 8085
RETURN 
100 
Rflwjon junp Li 
f= 30 QPERHWJ RESTART in 
FIELD 
80 
70 
° 60 
SO 
RESUT1E 40 
30 
20 
10 
INSTRUCTIONS EXECUTED AFTER ERROR 
Figure 5.3 (b) Erroneous Execution in the Program Areas of the 6800 
193 
100 
£ 30 
10 
80 + 
70 + 
S GO + 
50 •• 
40 •• 
30 •• 
20 •• 
10 •• 
0 
OPERAND 
FIELD 
RANDOM jOnT" 
RESUME 
-t- -» 
3 4 S 
INSTRUCT I OriS EXECUTED AFTER ERROR 
Figure 5.3 (c) Erroneous Execution in the Program Areas of the 8048 
RESUME 
VALID 
NSTRUCTIONS 
ERRONEOUS J 
HALT 
OPERAND 
FIELD 
RETURN 
Figure 5.4 Simplified Flow of Execution in Program Areas 
194 
100 
.RflfiDon junr 
OPERAND 
F I E U ) 
3 4 5 
INSTRUCTIONS EXECUTED AFTER ERROR 
Figure 5.5 Erroneous Execution in Program Areas of the 68000 
23 u n JPOPULATED AREAS 
FFF 
COO 
BFF 
800 
7FF 
400 
3FF 
000 
D 
Figure 6.1 Common Memory Arrangements for the 8048 
195 
Figure 7.1 Flow of Execution Between Different Memory Areas
(Flow diagram. Nodes: ERRONEOUS JUMP, UNUSED AREAS, DATA AREAS, INPUT AREAS, PROGRAM AREAS, HALT, LOOP, RESUME, RECOVER.)
Figure 7.2 The Effects of Adding Fault Tolerance to the Program Areas of the 8085
(Two panels: 8085 WITH RECOVERY ROUTINE ONLY and 8085 WITH FAULT TOLERANT PROGRAM AREA. Horizontal axis: SIZE OF PROGRAM AREA IN KILOBYTES, 8 to 64; vertical scale 0 to 100. Regions labelled: LOOP, HALT, RESUME, RECOVER.)

Figure 7.3 The Effects of Adding Fault Tolerance to the Data Areas of the 8085
(Two panels: 8085 WITH RECOVERY ROUTINE ONLY and 8085 WITH FAULT TOLERANT DATA AREA. Horizontal axis: SIZE OF DATA AREA IN KILOBYTES, 0 to 64; vertical scale 0 to 100. Regions labelled: LOOP, HALT, RESUME, RECOVER.)

Figure 7.4 The Effects of Adding Fault Tolerance to the Unused Memory Areas of the 8085
(Two panels: 8085 WITH RECOVERY ROUTINE ONLY and 8085 WITH FAULT TOLERANT UNUSED AREA. Horizontal axis: SIZE OF UNUSED AREA IN KILOBYTES, 8 to 64; vertical scale 0 to 100. Regions labelled: HALT, RESUME, RECOVER.)

Figure 7.5 The Effects on the Average Number of Instructions Executed by Adding Fault Tolerance to the 8085
(Two panels: (a) Results for Different Sizes of Data Areas, horizontal axis SIZE OF DATA AREA IN KILOBYTES; (b) Results for Different Sizes of Unused Areas, horizontal axis SIZE OF UNUSED AREA IN KILOBYTES. Curves in each panel: NON-FAULT TOLERANT, FAULT TOLERANT.)
Figure 7.6 The Effects of Adding Fault Tolerance to the Program Areas of the 6800
(Two panels: 6800 WITH RECOVERY ROUTINE ONLY and 6800 WITH FAULT TOLERANT PROGRAM AREA. Horizontal axis: SIZE OF PROGRAM AREA IN KILOBYTES, 16 to 64; vertical scale 0 to 100. Regions labelled: LOOP, HALT, RESUME, RECOVER.)

Figure 7.7 The Effects of Adding Fault Tolerance to the Data Areas of the 6800
(Two panels: 6800 WITH RECOVERY ROUTINE ONLY and 6800 WITH FAULT TOLERANT DATA AREA. Horizontal axis: SIZE OF DATA AREA IN KILOBYTES, 0 to 64; vertical scale 0 to 100. Regions labelled: LOOP, HALT, RESUME, RECOVER.)

Figure 7.8 The Effects of Adding Fault Tolerance to the Unused Memory Areas of the 6800
(Two panels: 6800 WITH RECOVERY ROUTINE ONLY and 6800 WITH FAULT TOLERANT UNUSED AREA. Horizontal axis: SIZE OF UNUSED AREA IN KILOBYTES, 8 to 64; vertical scale 0 to 100. Regions labelled: LOOP, HALT, RESUME, RECOVER.)

Figure 7.9 Probability of Data Corruptions in the 8085
(Plot against NUMBER OF INSTRUCTIONS EXECUTED, up to 30, on a 0 to 100 vertical scale. Curves: B REGISTER, ACCUMULATOR.)
ADDRESS RANGE   SIZE    CONTENTS
8000-FFFF       (32K)   UNUSED
7000-7FFF       (4K)    SINGLE 8 BIT INPUT PORT
6000-6FFF       (4K)    SINGLE 8 BIT INPUT PORT
5000-5FFF       (4K)    SINGLE 8 BIT INPUT PORT
4000-4FFF       (4K)    SINGLE 8 BIT INPUT PORT
3000-3FFF       (4K)    UNUSED
2000-2FFF       (4K)    FOUR 8 BIT OUTPUT PORTS (256 BYTE BLOCKS)
1000-1FFF       (4K)    2K RAM (APPEARS TWICE)
0000-0FFF       (4K)    4K EPROM
Figure 8.1 Memory Map of the Specific System Studied
Figure 8.2 Wait State Recognition Circuit
(Circuit diagram. Signals: ALE, INT, S0, S1.)

Figure 8.3 Circuit to Detect an Illegal Instruction Fetch
(Circuit diagram. Signals: CE (three chip enables), ALE, INT, IO/M, S0, S1.)

Figure 8.4 Circuit to Detect a Write into ROM
(Circuit diagram. Signals: WR, CE (three chip enables), INT.)

Figure 8.5 Circuit to Detect a Write Outside the RAM Areas
(Circuit diagram. Signals: WR, INT.)

Figure 9.1 Logic Required to Detect Operation Code Fetches
(Logic diagram.)

Figure 9.2 Implementation of Logic on Test System
(Circuit diagram. Signals: S0, S1, IO/M, RD, TIMER IN.)
Figure 9.4 Software Flow Diagram for the Fault Injecting Test Facility
(Flow diagram. Boxes, in the order printed:
  START
  SEND MESSAGE, READ IN LIMIT OF TEST PROGRAM
  TRANSFER TEST PROGRAM FROM INSTANT ROM INTO RAM
  SET TIMER TO CAUSE INTERRUPT
  INITIALISE REG'S IN PROCESSOR
  START TIMER AND JUMP INTO TEST PROGRAM
  HAS INTERRUPT OCCURRED? (YES)
  CHECK RESULT PROGRAM
  INJECT FAULT BY INTERRUPT ROUTINE
  SEND MESSAGE AND FINAL VALUE OF TIMER
  OUTPUT 'S' FOR SUCCESS, OR TIMER COUNT FOR FAILURE
  STOP)
Type of Application    Requirements                                      Reference
Batch Processing       Recovery time of between 10 minutes and 2 hours   (illegible)
Communications         Recovery time of 1-15 minutes                     111
Telephone Switching    Less than 2 hours down-time in 40 years;          25
                       less than 2 calls lost in 10,000
Typical Industrial     Recovery within 250 milliseconds                  87
Aerospace              Recovery within 10 milliseconds                   (illegible)
Space                  98% survivability over 5 years                    106
Nuclear Reactor        10^-6 probability of failure on demand            12
Safety System
Aircraft               10^-(exponent illegible) probability of failure   110
                       during a 10 hour flight
Table 1.1 Reliability Requirements for Different Applications
          ERROR TYPE
DEVICE    WRITE    DATA    READ
R3        3.33     1.13    3.31
R4        2.99     1.16    3.26
R5        2.61     1.10    2.61
R6        2.37     0.91    2.38
Table 2.1 Voltage Levels at which First Errors Occurred in 8155 RAM Chips

          ERROR TYPE
          WRITE               DATA                          READ
DEVICE    LOCATION   VALUE    LOCATION   VALUE              LOCATION   VALUE
R3        VARIOUS    FF       VARIOUS    SINGLE BIT ERROR   FF00       FF
R4        VARIOUS    FF       VARIOUS    SINGLE BIT ERROR   VARIOUS    SINGLE BIT ERROR
R5        FF00       FF       VARIOUS    SINGLE BIT ERROR   FF00       FF
R6        FF00       FF       VARIOUS    SINGLE BIT ERROR   FF00       FF
Table 2.2 Location and Value of the First Errors Observed
DATA                               ADDRESS
HEX   BINARY                       HEX
      D7 D6 D5 D4 D3 D2 D1 D0
00    0  0  0  0  0  0  0  0       FFBF
11    0  0  0  1  0  0  0  1       FFBF
22    0  0  1  0  0  0  1  0       FFBF
33    0  0  1  1  0  0  1  1       FFBF
88    1  0  0  0  1  0  0  0       FFBF
99    1  0  0  1  1  0  0  1       FFBF
AA    1  0  1  0  1  0  1  0       FFBF
BB    1  0  1  1  1  0  1  1       FFBF
44    0  1  0  0  0  1  0  0       FF3F
66    0  1  1  0  0  1  1  0       FF3F
CC    1  1  0  0  1  1  0  0       FF3F
EE    1  1  1  0  1  1  1  0       FF3F
55    0  1  0  1  0  1  0  1       FFFA
DD    1  1  0  1  1  1  0  1       FFFA
77    0  1  1  1  0  1  1  1       FFEC
FF    1  1  1  1  1  1  1  1       FF4B
0 1 - FIRST BITS CORRUPTED
Table 2.3 First Data Corruptions in RAM Chip R5
             SIZE OF CAPACITOR IN TEST SUPPLY        MINIMUM VOLTAGE
DEVICE       2,200 uF     4,700 uF     10,000 uF     REACHED
             (Cycles)     (Cycles)     (Cycles)      (Volts)
RAM          1.50         3.25         7.25          3.8
EPROM        1.75         3.25         7.75          3.4
PROCESSOR    2.00         4.25         9.25          2.8
COMPLETE     1.50         3.50         7.25          3.8
SYSTEM
Table 2.4 Length of Interruptions to the Test Supply (in Cycles) Necessary to Cause Corruptions

DEVICE    RAM      ROM     EPROM
8035      64x8     NONE
8039      128x8    NONE
8040      256x8    NONE
8048      64x8     1Kx8
8049      128x8    2Kx8
8050      256x8    4Kx8
8748      64x8             1Kx8
8749      128x8            2Kx8
Table 3.1 Internal Memory of the 48-Series Microprocessors
               PROBABILITY    AVERAGE NUMBER       AVERAGE NUMBER
PROCESSOR      OF A JUMP      OF INSTRUCTIONS      OF BYTES
                              EXECUTED (NI_AV)     EXECUTED (NB_AV)
8085           0.1035         9.65                 12.5
6800           0.1035         9.65                 18.5
8048 (INTEL)   0.1543         6.48                 8.3
8048 (NEC)     0.1621         6.16                 7.9
68000          0.3436         2.91
Table 4.1 Results of Execution in Random Data
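
The tabulated averages in Table 4.1 are numerically close to the reciprocal of the jump probability, which is what a geometric model would give if each byte fetched from random data were an independent trial. The Python sketch below checks that relationship against the table values. The geometric model and the names `expected_instructions` and `relative_gap` are assumptions introduced for illustration, not the thesis's stated derivation.

```python
# Values copied from Table 4.1: (probability of a jump, average instructions executed).
TABLE_4_1 = {
    "8085": (0.1035, 9.65),
    "6800": (0.1035, 9.65),
    "8048 (Intel)": (0.1543, 6.48),
    "8048 (NEC)": (0.1621, 6.16),
    "68000": (0.3436, 2.91),
}


def expected_instructions(p_jump: float) -> float:
    """Mean number of instructions executed up to and including the first jump,
    assuming (hypothetically) that every instruction is an independent trial."""
    return 1.0 / p_jump


def relative_gap(p_jump: float, tabulated: float) -> float:
    """Relative difference between the geometric estimate and the tabulated mean."""
    return abs(expected_instructions(p_jump) - tabulated) / tabulated


if __name__ == "__main__":
    for name, (p_jump, tabulated) in TABLE_4_1.items():
        estimate = expected_instructions(p_jump)
        print(f"{name:13s} 1/p = {estimate:5.2f}  tabulated = {tabulated:5.2f}")
```

Under this assumption every processor in the table agrees with 1/p to well within one percent, so the jump probability alone largely fixes how long erroneous execution persists in random data.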
                                                          % PROBABILITY OF OUTCOME
           BLOCK     LENGTH OF              AVERAGE NUMBER             RESTART      RANDOM
PROCESSOR  SIZE OF   RECOVERY   %           OF INSTRUCTIONS   HALT     (RECOVERY)   JUMP     RETURN
           DATA      STRING     OVERHEAD    EXECUTED
8085       20        4          20          5.0               1.9      68.4         21.3     8.4
8085       15        3          20          4.5               1.5      73.8         17.8     6.9
8085       10        2          20          3.9               1.3      77.9         14.6     6.2
8085       5         1          20          3.2               0.9      84.1         10.6     4.4
6800       15        3          20          3.7               6.0      70.7         19.5     3.8
6800       10        2          20          3.3               4.7      74.1         17.1     4.1
6800       5         1          20          3.5               5.4      72.1         18.6     3.9
8048       15        3          20          4.1               0.0      42.7         54.4     2.9
8048       10        2          20          4.1               0.0      46.3         51.0     2.7
Table 4.2 Comparison Between Different Data Structures
                                                           AV. No. INSTRUCTIONS   AV. No. INSTRUCTIONS
           %      %         % RANDOM   %        %          EXECUTED BEFORE        EXECUTED BEFORE
PROCESSOR  HALT   RESTART   JUMP       RETURN   RESUME     ANY TRANSFER           RESUMING PROGRAM
8085       0.1    1.0       1.4        0.6      96.9       1.2                    0.3
6800       1.6    0.3       5.2        1.3      91.6       1.7                    0.7
8048       0.0    0.0       3.5        0.2      96.3       1.0                    0.2
Table 5.1 Comparison Between Processors for Erroneous Execution in Program Areas

                                                                    AV. No. INSTRUCTIONS   AV. No. INSTRUCTIONS
                    %      %         % RANDOM   %        %          EXECUTED BEFORE        EXECUTED BEFORE
PROCESSOR  PROGRAM  HALT   RESTART   JUMP       RETURN   RESUME     ANY TRANSFER           RESUMING PROGRAM
8085       A        0.0    2.8       2.2        0.9      94.1       1.2                    0.7
8085       B        0.1    1.5       1.8        1.0      95.6       1.2                    0.7
6800       C        0.0    0.6       2.6        2.4      94.4       1.5                    0.7
8048       D        0.0    0.0       4.1        0.3      95.6       1.1                    0.3
8048       E        0.0    0.0       3.7        0.2      96.1       1.1                    0.3
8048       F        0.0    0.0       4.0        0.3      95.7       1.0                    0.3
Table 5.2 Comparison Between Actual Programs
                                                                    AV. No. INSTRUCTIONS   AV. No. INSTRUCTIONS
                    %      %         % RANDOM   %        %          EXECUTED BEFORE        EXECUTED BEFORE
PROCESSOR  PROGRAM  HALT   RESTART   JUMP       RETURN   RESUME     ANY TRANSFER           RESUMING PROGRAM
8085       A        0.0    4.1       3.2        1.3      91.4       1.8                    0.8
8085       B        0.2    2.4       2.8        1.6      93.0       1.8                    0.8
6800       C        0.0    0.7       3.2        2.8      93.3       1.8                    0.8
8048       D        0.0    0.0       5.5        0.4      94.1       1.4                    0.4
8048       E        0.0    0.0       4.7        0.3      95.0       1.4                    0.4
8048       F        0.0    0.0       5.0        0.4      94.6       1.3                    0.3
68000      G        0.0    26.1      1.5        0.0      72.4       1.5                    0.5
Table 5.3 Results from the Simplified Analysis of Erroneous Execution in Program Areas

                                                                    AV. No. INSTRUCTIONS   AV. No. INSTRUCTIONS
                    %      %         % RANDOM   %        %          EXECUTED BEFORE        EXECUTED BEFORE
PROCESSOR  PROGRAM  HALT   RESTART   JUMP       RETURN   RESUME     ANY TRANSFER           RESUMING PROGRAM
8085       X A      0.0    8.6       2.2        0.9      88.3       1.3                    0.6
8085       X B      0.1    19.1      1.8        1.0      78.0       1.3                    0.5
6800       X C      0.0    5.9       2.4        2.2      89.5       1.4                    0.6
Table 5.4 Detailed Analysis of Modified Programs
TYPE OF          % PROBABILITY    AVERAGE NUMBER
TRANSFER                          OF INSTRUCTIONS
                                  EXECUTED
HALT             47.7             54.6
RESTART          5.8              1.6
RANDOM JUMP      3.0              3.75
RETURN           35.0             33.6
SPECIFIC JUMP    8.5              2.2
ALL              100.0            38.2
Table 6.1 Probability of Different Outcomes after a Random Jump into an Unused Memory Area of an 8085

TYPE OF          % PROBABILITY    AVERAGE NUMBER
TRANSFER                          OF INSTRUCTIONS
                                  EXECUTED
HALT             49.7             55.7
RESTART          7.2              2.3
RANDOM JUMP      5.0              5.0
RETURN           38.1             31.3
ALL              100.0            40.0
Table 6.2 Outcomes after a Random Jump into an Unused Memory Area
of an 8085, Assuming Address Range C000 to FFFF is Unused
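
The ALL rows of Tables 6.1 and 6.2 can be reproduced as the probability-weighted mean of the per-outcome instruction counts. The Python sketch below performs that arithmetic on the tabulated values; the function name `overall_average` is introduced here for illustration only.

```python
# Rows copied from Tables 6.1 and 6.2: (type of transfer, % probability,
# average number of instructions executed before that transfer).
TABLE_6_1 = [
    ("HALT", 47.7, 54.6),
    ("RESTART", 5.8, 1.6),
    ("RANDOM JUMP", 3.0, 3.75),
    ("RETURN", 35.0, 33.6),
    ("SPECIFIC JUMP", 8.5, 2.2),
]
TABLE_6_2 = [
    ("HALT", 49.7, 55.7),
    ("RESTART", 7.2, 2.3),
    ("RANDOM JUMP", 5.0, 5.0),
    ("RETURN", 38.1, 31.3),
]


def overall_average(rows):
    """Probability-weighted mean of the instruction counts over all outcomes."""
    return sum(percent / 100.0 * count for _, percent, count in rows)


if __name__ == "__main__":
    print(f"Table 6.1 ALL row: {overall_average(TABLE_6_1):.1f}")
    print(f"Table 6.2 ALL row: {overall_average(TABLE_6_2):.1f}")
```

The weighted means agree with the tabulated 38.2 and 40.0 to one decimal place. HALT dominates both averages because it combines a high probability with a long run of erroneous instructions.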
MEMORY           STATE OF MEMORY      % PROBABILITY OF TRANSFER
ARRANGEMENT      BANK SELECT          JUMP OUT OF
(see fig. 6.1)   FLIP-FLOP            UNUSED AREA    RETURN    LOOP
                 AFTER ERROR
A                0                    49.8           49.2      1.0
A                1                    0.0            99.0      1.0
B                0                    90.5           9.4       0.1
B                1                    29.2           69.8      1.0
C                0                    90.9           9.0       0.1
C                1                    89.8           9.2       1.0
D                0                    89.8           9.2       1.0
D                1                    90.9           9.0       0.1
Table 6.3 Transfer from Unpopulated Memory Areas of an 8048

           %      %         % RANDOM   %        % SPECIFIC   % EXIT
PROCESSOR  HALT   RESTART   JUMP       RETURN   JUMP         FROM BLOCK
8085       0.4    3.3       0.4        2.0      4.3          89.6
6800       2.0    0.4       1.2        1.6      1.2          93.6
68000      0.0    32.5      1.9        0.0      0.0          65.6
Table 6.4 Transfer from Partially Decoded Memory Mapped Input Ports
LOCATION           NUMBER OF       PROBABILITY THAT
OF DATA            INSTRUCTIONS    A SINGLE
                   WHICH CAUSE     INSTRUCTION WILL
                   CORRUPTION      NOT CORRUPT DATA
ACCUMULATOR        84              0.672
B REGISTER         12              0.953
C REGISTER         14              0.945
D REGISTER         16              0.938
E REGISTER         18              0.930
H REGISTER         22              0.914
L REGISTER         24              0.906
STACK POINTER      30.5            0.881
MEMORY             34.5            0.865
ALL FLAGS          105             0.590
SIGN FLAG          45.5            0.822
ZERO FLAG          45.5            0.822
AUXILIARY          45.5            0.822
CARRY FLAG
PARITY FLAG        45.5            0.822
CARRY FLAG         44.5            0.826
Table 7.1 Data Corruptions in the 8085 Caused by Erroneous Execution
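
The probabilities in Table 7.1 are numerically consistent with 1 - n/256, where n is the tabulated number of corrupting instructions and 256 is the number of possible operation codes. The Python sketch below checks this and extends it to a run of k erroneous instructions, which is the kind of curve plotted in Figure 7.9. Treating each instruction as an independent, equally likely opcode is an assumption made for this illustration, and the function names are hypothetical.

```python
# Rows copied from Table 7.1: location -> (corrupting instructions, tabulated probability).
TABLE_7_1 = {
    "ACCUMULATOR": (84, 0.672),
    "B REGISTER": (12, 0.953),
    "C REGISTER": (14, 0.945),
    "D REGISTER": (16, 0.938),
    "E REGISTER": (18, 0.930),
    "H REGISTER": (22, 0.914),
    "L REGISTER": (24, 0.906),
    "STACK POINTER": (30.5, 0.881),
    "MEMORY": (34.5, 0.865),
    "ALL FLAGS": (105, 0.590),
    "SIGN FLAG": (45.5, 0.822),
    "CARRY FLAG": (44.5, 0.826),
}

OPCODE_SPACE = 256  # all possible values of a single opcode byte


def prob_not_corrupt(n_corrupting: float) -> float:
    """Probability that one randomly chosen opcode leaves the location intact."""
    return 1.0 - n_corrupting / OPCODE_SPACE


def prob_corrupted_within(n_corrupting: float, k: int) -> float:
    """Probability of at least one corruption in k instructions,
    assuming (hypothetically) independent, uniformly random opcodes."""
    return 1.0 - prob_not_corrupt(n_corrupting) ** k


if __name__ == "__main__":
    for k in (1, 5, 10, 20, 30):
        acc = prob_corrupted_within(84, k)
        b_reg = prob_corrupted_within(12, k)
        print(f"k={k:2d}  accumulator {acc:.3f}  B register {b_reg:.3f}")
```

On this model the accumulator is almost certainly corrupted within a few tens of erroneous instructions, while a general register such as B survives much longer.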
Feature columns of the table, in the order printed: NON-MEMORY MAPPED PORTS, RECOVERY ROUTINE, PULL-UPS ON DATA LINES, MODIFIED PROGRAM AREA, FULL RAM DECODING, SEEDED DATA AREA. Each "y" marks one feature present in the arrangement.

                              FINAL OUTCOME REACHED            EXPECTED NUMBER   90% CONFIDENCE
SYSTEM        FEATURES        %      %      %        %         OF ERRONEOUS      LIMIT ON THE
ARRANGEMENT   PRESENT         HALT   LOOP   RESUME   RECOVER   INSTRUCTIONS      NUMBER OF
                                                               EXECUTED          INSTRUCTIONS
                                                                                 EXECUTED
A                             65.2   10.5   24.3     0.0       1045.3            2406.9
B             y               69.2   7.8    23.0     0.0       56.6              130.3
C             y               67.2   7.7    9.9      15.2      1055.6            2430.6
D             y y             0.4    0.4    6.4      93.0      682.2             1570.8
E             y y y           0.3    0.0    6.2      93.5      1.6               3.7
F             y y y y         0.3    0.0    5.1      94.7      1.6               3.7
G             y y y y         0.1    0.0    6.1      93.8      1.3               3.0
H             y y y y y       0.0    0.0    6.0      94.0      1.1               2.5
I             y y y y y y     0.0    0.0    4.9      95.1      1.1               2.5
Table 8.1 Erroneous Execution Under Different System Arrangements
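
In every row of Table 8.1 the 90% confidence limit is close to the expected number of erroneous instructions multiplied by ln 10 (about 2.303). That is the 90th percentile of an exponential distribution with the given mean. The Python sketch below checks this against the tabulated pairs. The exponential model is an inference from the numbers, not a derivation quoted from the thesis, and `limit_90` is a name introduced here.

```python
import math

# (expected number of erroneous instructions, tabulated 90% confidence limit)
# for arrangements A to I of Table 8.1.
TABLE_8_1 = {
    "A": (1045.3, 2406.9),
    "B": (56.6, 130.3),
    "C": (1055.6, 2430.6),
    "D": (682.2, 1570.8),
    "E": (1.6, 3.7),
    "F": (1.6, 3.7),
    "G": (1.3, 3.0),
    "H": (1.1, 2.5),
    "I": (1.1, 2.5),
}


def limit_90(mean: float) -> float:
    """90th percentile of an exponential distribution with the given mean:
    solve 1 - exp(-x / mean) = 0.9 for x, giving x = mean * ln(10)."""
    return mean * math.log(10.0)


if __name__ == "__main__":
    for name, (mean, tabulated) in TABLE_8_1.items():
        print(f"{name}: mean {mean:7.1f}  limit {limit_90(mean):7.1f}  tabulated {tabulated:7.1f}")
```

The agreement to within rounding suggests the limit was obtained from the mean alone, so comparing arrangements by either column leads to the same ranking.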
Appendix 1. Software to Test the Effects of Executing Undeclared
Operation Codes
This appendix contains fully commented listings of the programs used to
identify the effects of executing the undeclared operation codes of the
6800 and 8035/8048. Techniques similar to those illustrated can be
employed on other microprocessors. However, the software alone is not
usually sufficient to identify all functions, and it is necessary to use
additional techniques, such as monitoring all external signals with a
logic analyser.
A1.1 Listing of the 6800 Test Program 
                    NAM   M6800
*************************************************************
*
*       ***** M6800.ASM *****
*
*************************************************************
*
* THIS PROGRAM IS DESIGNED TO TEST THE UNDECLARED
* OP-CODES IN THE MOTOROLA 6800 MICROPROCESSOR
*
* IT ALSO TESTS IF THE INTERRUPTS ARE DISABLED BY THEM
*
E1D1        OUTEEE  EQU   $E1D1    ROUTINE TO OUTPUT A CHARACTER
E055        BYTE    EQU   $E055    READS IN A BYTE OF DATA IN HEX
E0E3        CONTRO  EQU   $E0E3    ENTRY POINT INTO MIKBUG
E07E        PDATA1  EQU   $E07E    ROUTINE TO OUTPUT A STRING
*
1FF0                ORG   $1FF0    SET STACK LOCATIONS
1FF0 0001   STACK   RMB   1        POSITION OF TOP OF STACK
1FF1 0001   CTEMP   RMB   1        SPACE FOR CONDITION CODES
1FF2 0001   BTEMP   RMB   1        SPACE FOR ACCUMULATOR B
1FF3 0001   ATEMP   RMB   1        SPACE FOR ACCUMULATOR A
1FF4 0001   XTEMPH  RMB   1        HIGH BYTE OF X REGISTER
1FF5 0001   XTEMPL  RMB   1        LOW BYTE OF X REGISTER
1FF6 0002   PTEMP   RMB   2        SPACE FOR RETURN ADDRESS
*
A048                ORG   $A048    SET START ADDRESS FOR MIKBUG G
A048 0100   GOADD   FDB   $0100    COMMAND
A000                ORG   $A000
A000 0200           FDB   IRQVEC   SET VECTOR FOR IRQ
A006                ORG   $A006
A006 0210           FDB   NMIVEC   SET VECTOR FOR NMI
                    ORG   $0100    START OF TEST PROGRAM
* (LABEL TEST1 ALSO LIES IN THE NOP/SWI BLOCK AFTER RESU; ITS ADDRESS IS NOT LEGIBLE)
0100 CE 0137        LDX   #RESU    LOAD ADDRESS TO GO TO AFTER RTI
0103 FF 1FF6        STX   PTEMP    STORE VALUE ON STACK
0106 BD 0142        JSR   CRLF     SET TERMINAL ON NEW LINE
0109 BD E055        JSR   BYTE     READ IN BYTE FOR CONDITION CODES
010C B7 1FF1        STAA  CTEMP    STORE ONTO STACK
010F BD 014D        JSR   SPACE
0112 BD E055        JSR   BYTE     READ IN BYTE FOR ACCUMULATOR B
0115 B7 1FF2        STAA  BTEMP    STORE ONTO STACK
0118 BD 014D        JSR   SPACE
011B BD E055        JSR   BYTE     READ IN BYTE FOR ACCUMULATOR A
011E B7 1FF3        STAA  ATEMP    STORE ONTO STACK
0121 BD 014D        JSR   SPACE
0124 BD E055        JSR   BYTE     READ IN HIGH BYTE OF X REGISTER
0127 B7 1FF4        STAA  XTEMPH   STORE ONTO STACK
012A BD E055        JSR   BYTE     READ IN LOW BYTE OF X REGISTER
012D B7 1FF5        STAA  XTEMPL   STORE ONTO STACK
0130 8E 1FF0        LDS   #STACK   LOAD STACK TO POINT TO DATA BLOCK
0133 3B             RTI            LOAD REGS AND JUMP TO TEST LOC
0134 01     START   NOP            NOPS IN LOOP TO WAIT FOR INTERRUPT
0135 01             NOP            TEST BYTE CAN BE INSERTED BY HAND
0136 01             NOP            IN ONE OF THESE LOCATIONS
0137 01     RESU    NOP
0138 01             NOP
0139 01             NOP
013A 01             NOP
013B 20 F7          BRA   START    LOOP UNTIL INTERRUPT
013D 3F             SWI            STRING OF SOFTWARE INTERRUPTS TO
013E 3F             SWI            CAPTURE EXECUTION AFTER TEST CODE
013F 3F             SWI            NECESSARY FOR SINGLE, DOUBLE
0140 3F             SWI            OR TRIPLE BYTE INSTRUCTIONS
0141 3F             SWI
*
* SUBROUTINES
*
0142 86 0D  CRLF    LDAA  #$0D     SUBROUTINE TO OUTPUT A CARRIAGE
0144 BD E1D1        JSR   OUTEEE   RETURN AND LINE FEED TO THE
0147 86 0A          LDAA  #$0A     TERMINAL
0149 BD E1D1        JSR   OUTEEE
014C 39             RTS
*
014D 86 20  SPACE   LDAA  #$20     SUBROUTINE TO OUTPUT A SPACE
014F BD E1D1        JSR   OUTEEE   TO THE TERMINAL
0152 86 20          LDAA  #$20
0154 BD E1D1        JSR   OUTEEE
0157 39             RTS
*
* INTERRUPT SERVICE ROUTINES
*
0200                ORG   $0200
0200 CE 0217 IRQVEC LDX   #IRQSTR  LOAD START ADDRESS OF STRING
0203 BD E07E        JSR   PDATA1   PRINT STRING TO INDICATE IRQ
0206 3B             RTI
*
0210                ORG   $0210
0210 CE 021F NMIVEC LDX   #NMISTR  LOAD START ADDRESS OF STRING
0213 BD E07E        JSR   PDATA1   PRINT STRING TO INDICATE NMI
0216 3B             RTI
*
0217 20 20 49 52 51 20 20
            IRQSTR  FCC   /  IRQ  /  STRING PRINTED BY IRQ
021E 04             FCB   $04      DELIMITER
021F 20 20 4E 4D 49 20 20
            NMISTR  FCC   /  NMI  /  STRING PRINTED BY NMI
0226 04             FCB   $04      DELIMITER
                    END
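
The 6800 listing loads the processor registers by building a seven-byte frame in RAM, pointing the stack pointer just below it and executing RTI. The Python sketch below models only that step. It assumes the standard 6800 RTI pull order (condition codes, B, A, X high, X low, PC high, PC low, with the stack pointer incremented before each pull). The function `rti` and the example register values are hypothetical and are not part of the original program.

```python
# Order in which the 6800 RTI instruction pulls registers from the stack.
RTI_PULL_ORDER = ["CC", "B", "A", "XH", "XL", "PCH", "PCL"]


def rti(memory: dict, sp: int) -> dict:
    """Return the register state after RTI, given memory contents and the
    stack pointer before the instruction (SP is incremented before each pull)."""
    pulled = {}
    for name in RTI_PULL_ORDER:
        sp = (sp + 1) & 0xFFFF
        pulled[name] = memory[sp]
    return {
        "CC": pulled["CC"],
        "B": pulled["B"],
        "A": pulled["A"],
        "X": (pulled["XH"] << 8) | pulled["XL"],
        "PC": (pulled["PCH"] << 8) | pulled["PCL"],
        "SP": sp,
    }


if __name__ == "__main__":
    # Frame laid out as in the listing: STACK=$1FF0, CTEMP=$1FF1 ... PTEMP=$1FF6/7.
    frame = {
        0x1FF1: 0xC0,  # CTEMP: condition codes (example value)
        0x1FF2: 0x11,  # BTEMP: accumulator B (example value)
        0x1FF3: 0x22,  # ATEMP: accumulator A (example value)
        0x1FF4: 0x12,  # XTEMPH
        0x1FF5: 0x34,  # XTEMPL
        0x1FF6: 0x01,  # PTEMP high byte: address loaded by LDX #RESU
        0x1FF7: 0x37,  # PTEMP low byte
    }
    state = rti(frame, 0x1FF0)
    print({k: hex(v) for k, v in state.items()})
```

Using RTI in this way sets every register, including the condition codes, in a single instruction before the test byte is executed.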
A1.2 Listing of the 8035/8048 Test Program
; ****************************************************************
; PROGRAM TO TEST THE UNDECLARED OP CODES OF THE 8035/8048
; ALL UNUSED LOCATIONS ARE SET TO 04. THIS FORCES A JUMP TO
; ADDRESS 004 IF PROGRAM EXECUTION IS ATTEMPTED OUTSIDE THE
; NORMAL PROGRAM AREA
;
000 64    JMP 0300       ; JUMP TO INITIALISATION BLOCK
001 00
002 04                   ; UNUSED LOCATIONS SET TO 04
003 04
004 39    OUTL P1,A      ; OUTPUT CONTENTS OF ACCUMULATOR TO PORT
005 83    RET            ; RETURN TO MAIN LOOP
006 04
007 04                   ; UNUSED LOCATIONS SET TO 04. CAUSES JUMP
008 04                   ; TO ADDRESS 004 IF EXECUTED
;
; MAIN PROGRAM LOOP
;
100 75    ENT0 CLK       ; SET T0 AS A CLOCK OUTPUT FOR LOGIC ANALYSER
101 17    INC A          ; INCREMENT TEST BYTE IN ACCUMULATOR
102 54    CALL 0200      ; CALL ROUTINE TO EXECUTE UNDECLARED CODE
103 00
104 24    JMP 0101       ; JUMP BACK TO BEGINNING OF LOOP
105 01
106 04
107 04                   ; UNUSED LOCATIONS
;
; SUBROUTINE TO EXECUTE UNDECLARED CODE
;
200 39    OUTL P1,A      ; OUTPUT CONTENTS OF ACCUMULATOR TO PORT
201 XX                   ; SPACE FOR UNDECLARED CODE
202 04                   ; SEQUENCE OF LOCATIONS SET TO 04, THIS ENSURES
203 04                   ; THAT EXECUTION WILL TRANSFER TO LOCATION 004
204 04                   ; REGARDLESS OF WHETHER THE UNDECLARED CODE IS
205 04                   ; A SINGLE, DOUBLE OR TRIPLE BYTE INSTRUCTION
;
; CODE FOR INITIALISATION OF PROCESSOR ON RESET
;
300 23    MOV A,#0AAH    ; SETS ACCUMULATOR TO THE VALUE AA
301 AA
302 00    NOP            ; SPACE FOR SETTING OTHER REGISTERS OR FLAGS
303 00    NOP
304 24    JMP 0100       ; JUMP TO BEGINNING OF MAIN LOOP
305 00
306 04                   ; UNUSED LOCATIONS SET TO 04
307 04
          END
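
The 8035/8048 listing relies on the byte 04 acting as a trap: as an opcode it is a JMP within the first 256-byte page, and a following 04 supplies the low address byte, so the pair 04 04 transfers control to location 004. The Python sketch below decodes the 8048 JMP encoding (opcode bits 7-5 hold address bits 10-8) and follows the jumps through a hypothetical 1K ROM image filled with 04. The ROM image, the helper names and the omission of the memory bank flip-flop are assumptions made for illustration.

```python
def decode_jmp(opcode: int, operand: int) -> int:
    """Return the target of an 8048 JMP (opcode pattern aaa00100)."""
    if opcode & 0x1F != 0x04:
        raise ValueError("not a JMP opcode")
    return ((opcode >> 5) << 8) | operand


def trace(rom: bytes, pc: int, max_steps: int = 8) -> int:
    """Follow JMP instructions from pc and return the first address whose
    opcode is not a JMP (or the address reached after max_steps jumps)."""
    for _ in range(max_steps):
        opcode = rom[pc]
        if opcode & 0x1F != 0x04:
            return pc
        operand = rom[(pc + 1) % len(rom)]
        pc = decode_jmp(opcode, operand)
    return pc


def make_filled_rom(size: int = 0x400) -> bytearray:
    """Hypothetical ROM image: every location 04 except the handler at 004/005."""
    rom = bytearray([0x04] * size)
    rom[0x004] = 0x39  # OUTL P1,A
    rom[0x005] = 0x83  # RET
    return rom


if __name__ == "__main__":
    rom = make_filled_rom()
    print(hex(decode_jmp(0x64, 0x00)))  # JMP 0300 from the listing
    print(hex(trace(rom, 0x2A7)))       # arbitrary filler location
```

Because every filler byte is the same, the trap works whether execution lands on what would have been an opcode or an operand. The listing uses this property to catch single, double and triple byte undeclared instructions.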
Appendix 2. The Effects of Executing the Undeclared Operation Codes of the
8035/8048
Appendix 2 contains a detailed description of the operations performed
by all the instruction codes which are not declared for the 8035/8048.
They appear in numerical order and are referenced by their hexadecimal
value. In cases where the code performs a different function for different
manufacturers, this is clearly marked and both operations are described.
Symbols Used
The symbols used and the layout of the definitions are very similar to
those used in the National Semiconductor 48-Series Microcomputers Handbook
(120). Reference should be made to the handbook for descriptions of the
standard instruction set.
Symbol    Description
A         The Accumulator
AC        The Auxiliary Carry Flag
addr      Program Memory Address
Bb        Bit Designator (b = 0-7)
BS        The Bank Switch
BUS       The Bus Port
C         Carry Flag
CLK       Clock Signal
CNT       Event Counter
D         Nibble Designator (4 bits)
data      Number or Expression (8 bits)
DBF       Memory Bank Flip-Flop
F0,F1     Flags 0,1
I         Interrupt
P         "In-Page" Operation Designator
Pp        Port Designator (p = 1,2 or 4-7)
PSW       Program Status Word
Rr        Register Designator (r = 0,1 or 0-7)
SP        Stack Pointer
T         Timer
TF        Timer Flag
T0,T1     Testable Inputs 0,1
X         External RAM
#         Prefix for Immediate data
@         Prefix for Indirect Address
(A)       Contents of Accumulator
((A))     Contents of Location Addressed by A
<-        Replaced By
Operation Code:       01
Mnemonic:             NOP
Operation:            No operation performed.
Description:          No operation is performed; execution continues with the
                      next sequential instruction.
Note:                 Same operation as the defined instruction (code 00).
Special Conditions:   Cycles: 1   Bytes: 1

Operation Code:       06
Mnemonic:             JNTF addr
Operation:            Jump to specified address if timer flag is clear.
Symbolic
Representation:       (PC 0-7) <- addr       if TF=0
                      (PC) <- (PC) + 2       if TF=1
Description:          If the internal timer/counter flag is set to a logic
                      zero, the contents of the program counter are replaced
                      by the address bits from byte 2. If the timer/counter
                      flag is a logic one, the next sequential instruction is
                      executed.
Note:                 This instruction is the logical inverse of the JTF
                      instruction, except that the timer/counter flag is not
                      affected.
Special Conditions:   Cycles: 2   Bytes: 2

Operation Code:       0B
Mnemonic:             IN A,P2
Operation:            Input data to accumulator from port 2.
Symbolic
Representation:       (A) <- (P2)
Description:          Data present at port 2 is input into the accumulator.
Note:                 Same operation as the defined instruction (code 0A).
Special Conditions:   Cycles: 2   Bytes: 1
Operation Code:       22   ** NEC 8048 ONLY **
Mnemonic:             MOV A,PC+1
Operation:            Move contents of the program counter into the
                      accumulator and increment.
Symbolic
Representation:       (A) <- (PC) + 1
Description:          The contents of the program counter are moved to the
                      accumulator and then the accumulator is incremented by
                      one. After executing this instruction the accumulator
                      contains the address of the next sequential instruction.
Note:                 This function is only performed by the processor
                      manufactured by NEC.
Special Conditions:   Cycles: 1   Bytes: 1

Operation Code:       22   ** INTEL 8048 ONLY **
Mnemonic:             NOP
Operation:            No operation performed.
Description:          No operation is performed; execution continues with the
                      next sequential instruction.
Note:                 This code performs a no operation (code 00) on the 8048
                      manufactured by Intel. It performs a different function
                      on the processor made by NEC.
Special Conditions:   Cycles: 1   Bytes: 1

Operation Code:       33
Mnemonic:             NOP
Operation:            No operation performed.
Description:          No operation is performed; execution continues with the
                      next sequential instruction.
Note:                 Same operation as the defined instruction (code 00).
Special Conditions:   Cycles: 1   Bytes: 1
Operation Code:       38
Mnemonic:             BUS IDLE
Operation:            No specific operation on the Bus.
Symbolic
Representation:       (BUS) <- 00
Description:          The value 00 appears on the Bus during T4 of the second
                      cycle of the instruction, but no read or write signal is
                      generated. Therefore a valid Bus operation is not
                      performed.
Note:                 This code does not appear to perform any useful
                      function.
Special Conditions:   Cycles: 2   Bytes: 1

Operation Code:       3B
Mnemonic:             OUTL P2,A
Operation:            Output contents of accumulator to port 2.
Symbolic
Representation:       (P2) <- (A)
Description:          The contents of the accumulator are placed, and latched,
                      at the output port 2.
Note:                 Same operation as the defined instruction (code 3A).
Special Conditions:   Cycles: 2   Bytes: 1

Operation Code:       63
Mnemonic:             NOP
Operation:            No operation performed.
Description:          No operation is performed; execution continues with the
                      next sequential instruction.
Note:                 Same operation as the defined instruction (code 00).
Special Conditions:   Cycles: 1   Bytes: 1
Operation Code:       66
Mnemonic:             JNF1 addr
Operation:            Jump to specified address if flag 1 is clear.
Symbolic
Representation:       (PC 0-7) <- addr       if F1=0
                      (PC) <- (PC) + 2       if F1=1
Description:          If flag 1 is at a logic zero, the contents of the
                      program counter are replaced by the address bits from
                      byte 2. If flag 1 is a logic one, the next sequential
                      instruction is executed.
Note:                 This instruction is the logical inverse of the JF1
                      instruction.
Special Conditions:   Cycles: 2   Bytes: 2

Operation Codes:      73,82
Mnemonic:             NOP
Operation:            No operation performed.
Description:          No operation is performed; execution continues with the
                      next sequential instruction.
Note:                 Same operation as the defined instruction (code 00).
Special Conditions:   Cycles: 1   Bytes: 1

Operation Code:       87   ** NEC 8048 ONLY **
Mnemonic:             CLR A4-7
Operation:            Clear accumulator high nibble.
Symbolic
Representation:       (A4-7) <- 0
Description:          Accumulator bits 4 through 7 are cleared to zero.
Note:                 This function is only performed by the processor
                      manufactured by NEC.
Special Conditions:   Cycles:     Bytes:
Operation Code:       87   ** INTEL 8048 ONLY **
Mnemonic:             NOP
Operation:            No operation performed.
Description:          No operation is performed; execution continues with the
                      next sequential instruction.
Note:                 This code performs a no operation (code 00) on the 8048
                      manufactured by Intel. It performs a different function
                      on the processor made by NEC.
Special Conditions:   Cycles: 1   Bytes: 1

Operation Code:       8B
Mnemonic:             ORL P2,#data
Operation:            Logical-OR-immediate specified data with contents of
                      port 2.
Symbolic
Representation:       (P2) <- (P2) OR data
Description:          The data contained in byte 2 is logically ORed with the
                      data on port 2, and the results are sent back to the
                      port.
Note:                 Same operation as the defined instruction (code 8A).
Special Conditions:   Cycles: 2   Bytes: 2

Operation Code:       9B
Mnemonic:             ANL P2,#data
Operation:            Logical-AND-immediate specified data with port 2.
Symbolic
Representation:       (P2) <- (P2) AND data
Description:          The data contained in byte 2 are logically ANDed
                      immediately with the data on port 2, and the results are
                      sent back to the port.
Note:                 Same operation as the defined instruction (code 9A).
Special Conditions:   Cycles: 2   Bytes: 2
Operation Code:       A2
Mnemonic:             NOP
Operation:            No operation performed.
Description:          No operation is performed; execution continues with the
                      next sequential instruction.
Note:                 Same operation as the defined instruction (code 00).
Special Conditions:   Cycles: 1   Bytes: 1

Operation Code:       A6
Mnemonic:             JNF0 addr
Operation:            Jump to specified address if flag 0 is clear.
Symbolic
Representation:       (PC 0-7) <- addr       if F0=0
                      (PC) <- (PC) + 2       if F0=1
Description:          If flag 0 is at a logic zero, the contents of the
                      program counter are replaced by the address bits from
                      byte 2. If flag 0 is at a logic one, the next
                      sequential instruction is executed.
Note:                 This instruction is the logical inverse of the JF0
                      instruction.
Special Conditions:   Cycles: 2   Bytes: 2

Operation Code:       B7
Mnemonic:             NOP
Operation:            No operation performed.
Description:          No operation is performed; execution continues with the
                      next sequential instruction.
Note:                 Same operation as the defined instruction (code 00).
Special Conditions:   Cycles: 1   Bytes: 1
Operation Codes:      C0,C1   ** NEC 8048 ONLY **
Mnemonic:             DEC @Rr
Operation:            Decrement-indirect contents of RAM by one.
Symbolic
Representation:       ((Rr)) <- ((Rr)) - 1, where r = 0 or 1
Description:          The contents of the internal RAM location, as addressed
                      by bits 0 through 5 of register 'r', are decremented by
                      one.
Note:                 This function is only performed by the processor
                      manufactured by NEC.
Special Conditions:   Cycles:     Bytes:

Operation Codes:      C0,C1   ** INTEL 8048 ONLY **
Mnemonic:             NOP
Operation:            No operation performed.
Description:          No operation is performed; execution continues with the
                      next sequential instruction.
Note:                 These codes perform a no operation (code 00) on the 8048
                      manufactured by Intel. They perform a different
                      function on the processor made by NEC.
Special Conditions:   Cycles: 1   Bytes: 1

Operation Codes:      C2,C3
Mnemonic:             NOP
Operation:            No operation performed.
Description:          No operation is performed; execution continues with the
                      next sequential instruction.
Note:                 Same operation as the defined instruction (code 00).
Special Conditions:   Cycles: 1   Bytes: 1
Operation Code:       D6
Mnemonic:             JMPP addr
Operation:            Jump to specified address within address page.
Symbolic
Representation:       (PC 0-7) <- addr
Description:          The contents of the program counter are replaced by the
                      address bits from byte 2.
Note:                 Performs an unconditional jump within the current
                      address page. This operation is not provided in the
                      standard instruction set.
Special Conditions:   Cycles: 2   Bytes: 2
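
The in-page jumps described in this appendix (JMPP, and the conditional forms JNTF, JNF1 and JNF0) replace only bits 0-7 of the program counter. The Python sketch below illustrates that address arithmetic for a 12-bit program counter. It assumes that the page (bits 8-11) is taken from the address of the operand byte, which is how in-page jumps on the 8048 are commonly modelled. That assumption, and the function names, are illustrative and are not taken from the thesis.

```python
PC_MASK = 0xFFF  # 12-bit program counter


def in_page_target(instr_addr: int, operand: int) -> int:
    """Target of an in-page jump: keep the page bits of the operand byte's
    address (assumed) and replace bits 0-7 with the operand."""
    operand_addr = (instr_addr + 1) & PC_MASK
    return (operand_addr & 0xF00) | (operand & 0xFF)


def jmpp(instr_addr: int, operand: int) -> int:
    """Code D6 as described: unconditional jump within the address page."""
    return in_page_target(instr_addr, operand)


def jntf(instr_addr: int, operand: int, timer_flag: int) -> int:
    """Code 06 as described: jump if TF=0, otherwise continue with the next
    sequential instruction, two bytes on. The flag itself is not modified."""
    if timer_flag == 0:
        return in_page_target(instr_addr, operand)
    return (instr_addr + 2) & PC_MASK


if __name__ == "__main__":
    print(hex(jmpp(0x120, 0x45)))
    print(hex(jntf(0x120, 0x45, timer_flag=1)))
```

JNF1 (code 66) and JNF0 (code A6) follow the same pattern as `jntf`, with flag 1 or flag 0 in place of the timer flag.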
Operation Codes:      E0,E1   ** NEC 8048 ONLY **
Mnemonic:             DJNZ @Rr,addr
Operation:            Decrement-indirect contents of RAM, test contents, jump
                      if not zero.
Symbolic
Representation:       ((Rr)) <- ((Rr)) - 1, where r = 0 or 1
                      (PC 0-7) <- addr       if ((Rr)) is not 0
                      (PC) <- (PC) + 2       if ((Rr)) = 0
Description:          The contents of the internal RAM location, as addressed
                      by bits 0 through 5 of register r, are decremented by
                      one, and then tested to see if the contents equal zero.
                      If the contents of the location equal zero, the next
                      sequential instruction is executed. If the location is
                      not zero, control passes to the instruction at the
                      address designated in byte 2.
Note:                 This function is only performed by the processor
                      manufactured by NEC, and provides an operation which is
                      not available from the standard instruction set.
Special Conditions:   Cycles: 2   Bytes: 2
234 
Operation Codes: E0, E1 **INTEL 8048 ONLY**
Mnemonic: NOP
Operation: No operation performed.
Description: No operation is performed; execution continues with the
next sequential instruction.
Note: These codes perform a no operation (code 00) on the 8048
manufactured by Intel. They perform a different
function on the processor made by NEC.
Special
Conditions:
Cycles: 1
Bytes: 1
Operation Codes: E2, F3
Mnemonic: NOP
Operation: No operation performed.
Description: No operation is performed; execution continues with the
next sequential instruction.
Note: Same operation as the defined instruction (code 00).
Special
Conditions:
Cycles: 1
Bytes: 1
Appendix 3. Instruction Set Parameters 
This appendix contains details of the instruction set parameters for
the 8085, 6800, 8048 and 68000 microprocessors.
A3.1 Instruction Set Parameters for the 8085 
The 8085 contains the following instruction types: -
Single Byte Instructions    Declared  Undeclared  Total
Non-Jumping                    183         5       188
Conditional Jump                 8         1         9
Jump                            11         0        11
Total                          202         6       208

Double Byte Instructions    Declared  Undeclared  Total
Non-Jumping                     18         2        20
Conditional Jump                 0         0         0
Jump                             0         0         0
Total                           18         2        20

Triple Byte Instructions    Declared  Undeclared  Total
Non-Jumping                      8         0         8
Conditional Jump                16         2        18
Jump                             2         0         2
Total                           26         2        28

All Instructions            Declared  Undeclared  Total
Non-Jumping                    209         7       216
Conditional Jump                24         3        27
Jump                            13         0        13
Total                          246        10       256
The effective number of jump instructions is 26.5. 
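The quoted figure is consistent with counting each unconditional jump as one jump and each conditional jump as half a jump. The sketch below checks that reading against the totals in this appendix; the function name `effective_jumps` and the half-weight assumption are illustrative and are not taken from the thesis text.

```python
def effective_jumps(unconditional: int, conditional: int) -> float:
    """Effective jump count, assuming a conditional jump is taken half the time."""
    return unconditional + 0.5 * conditional


# Totals from the "All Instructions" rows of this appendix.
print(effective_jumps(13, 27))  # 8085
print(effective_jumps(19, 15))  # 6800
print(effective_jumps(28, 23))  # 8048 (Intel)
print(effective_jumps(30, 23))  # 8048 (NEC)
```

Under this assumption the four calls reproduce the effective numbers quoted for the 8085, 6800 and 8048 in sections A3.1 to A3.3.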
Jump Instruction Types for the 8085
Unconditional Jumps               Conditional Jumps
Code  Mnemonic  Length  Type      Code  Mnemonic  Length  Type
C3    JMP       3       JMP       C2    JNZ       3       JMP
CD    CALL      3       JMP       CA    JZ        3       JMP
C9    RET       1       RET       D2    JNC       3       JMP
C7    RST 0     1       RST       DA    JC        3       JMP
CF    RST 1     1       RST       E2    JPO       3       JMP
D7    RST 2     1       RST       EA    JPE       3       JMP
DF    RST 3     1       RST       F2    JP        3       JMP
E7    RST 4     1       RST       FA    JM        3       JMP
EF    RST 5     1       RST       C4    CNZ       3       JMP
F7    RST 6     1       RST       CC    CZ        3       JMP
FF    RST 7     1       RST       D4    CNC       3       JMP
E9    PCHL      1       JMP       DC    CC        3       JMP
76    HLT       1       HLT       E4    CPO       3       JMP
                                  EC    CPE       3       JMP
                                  F4    CP        3       JMP
                                  FC    CM        3       JMP
                                  C0    RNZ       1       RET
                                  C8    RZ        1       RET
                                  D0    RNC       1       RET
                                  D8    RC        1       RET
                                  E0    RPO       1       RET
                                  E8    RPE       1       RET
                                  F0    RP        1       RET
                                  F8    RM        1       RET
                                  DD    xxx       3       JMP
                                  FD    xxx       3       JMP
                                  CB    xxx       1       RST
HLT - Halt Instructions.
JMP - Jump Instructions.
RST - Restart Instructions.
RET - Return Instructions.
xxx - Undefined Instructions.
A3.2 Instruction Set Parameters for the 6800 
The 6800 contains the following instruction types:-
Single Byte Instructions    Declared  Undeclared  Total
Non-Jumping                     47        25        72
Conditional Jump                 0         0         0
Jump                             4         4         8
Total                           51        29        80

Double Byte Instructions    Declared  Undeclared  Total
Non-Jumping                     85        12        97
Conditional Jump                14         1        15
Jump                             4         4         8
Total                          103        17       120

Triple Byte Instructions    Declared  Undeclared  Total
Non-Jumping                     41        10        51
Conditional Jump                 0         0         0
Jump                             2         1         3
Total                           43        11        54

Four Byte Instructions      Declared  Undeclared  Total
Non-Jumping                      0         2         2
Conditional Jump                 0         0         0
Jump                             0         0         0
Total                            0         2         2

All Instructions            Declared  Undeclared  Total
Non-Jumping                    173        49       222
Conditional Jump                14         1        15
Jump                            10         9        19
Total                          197        59       256

The effective number of jump instructions is 26.5.
Jump Instruction Types for the 6800
Unconditional Jumps               Conditional Jumps
Code  Mnemonic  Length  Type      Code  Mnemonic  Length  Type
20    BRA       2       JMP       22    BHI       2       JMP
6E    JMP(I)    2       JMP       23    BLS       2       JMP
7E    JMP(E)    3       JMP       24    BCC       2       JMP
8D    BSR       2       JMP       25    BCS       2       JMP
AD    JSR(I)    2       JMP       26    BNE       2       JMP
BD    JSR(E)    3       JMP       27    BEQ       2       JMP
39    RTS       1       RET       28    BVC       2       JMP
3B    RTI       1       RET       29    BVS       2       JMP
3E    WAI       1       HLT       2A    BPL       2       JMP
3F    SWI       1       RST       2B    BMI       2       JMP
38    xxx       1       RET       2C    BGE       2       JMP
3A    xxx       1       RET       2D    BLT       2       JMP
3C    xxx       1       HLT       2E    BGT       2       JMP
3D    xxx       1       HLT       2F    BLE       2       JMP
9D    xxx       2       HLT       21    xxx       2       JMP
CD    xxx       2       JMP
DD    xxx       2       HLT
ED    xxx       2       JMP
FD    xxx       3       JMP
HLT - Halt Instructions.
JMP - Jump Instructions.
RET - Return Instructions.
RST - Restart Instructions.
xxx - Undefined Instructions.
(I) - Indexed Addressing.
(E) - Extended Addressing.
A3.3 Instruction Set Parameters for the 8048 
The 8048 instruction set is dependent on the manufacturer of the 
device. The main figures given are for processors made by Intel. Figures 
in brackets show the variations for processors made by NEC. 
Single Byte Instructions    Declared  Undeclared  Total
Non-Jumping                    161     20(18)     181(179)
Conditional Jump                 0      0           0
Jump                             3      0           3
Total                          164     20(18)     184(182)

Double Byte Instructions    Declared  Undeclared  Total
Non-Jumping                     22      2          24
Conditional Jump                20      3          23
Jump                            24      1(3)       25(27)
Total                           66      6(8)       72(74)

All Instructions            Declared  Undeclared  Total
Non-Jumping                    183     22(20)     205(203)
Conditional Jump                20      3          23
Jump                            27      1(3)       28(30)
Total                          230     26         256

The effective number of jump instructions is 39.5 (41.5).
Jump Instruction Types for the 8048
Unconditional Jumps               Conditional Jumps
Code  Mnemonic   Length  Type     Code  Mnemonic  Length  Type
04    JMP 0XX    2       JMP      12    JB0       2       JMP
24    JMP 1XX    2       JMP      32    JB1       2       JMP
44    JMP 2XX    2       JMP      52    JB2       2       JMP
64    JMP 3XX    2       JMP      72    JB3       2       JMP
84    JMP 4XX    2       JMP      92    JB4       2       JMP
A4    JMP 5XX    2       JMP      B2    JB5       2       JMP
C4    JMP 6XX    2       JMP      D2    JB6       2       JMP
E4    JMP 7XX    2       JMP      F2    JB7       2       JMP
14    CALL 0XX   2       JMP      06    xxx       2       JMP
34    CALL 1XX   2       JMP      16    JTF       2       JMP
54    CALL 2XX   2       JMP      26    JNT0      2       JMP
74    CALL 3XX   2       JMP      36    JT0       2       JMP
94    CALL 4XX   2       JMP      46    JNT1      2       JMP
B4    CALL 5XX   2       JMP      56    JT1       2       JMP
D4    CALL 6XX   2       JMP      66    xxx       2       JMP
F4    CALL 7XX   2       JMP      76    JF1       2       JMP
B3    JMP @A     1       JMP      86    JNI       2       JMP
D6    xxx        2       JMP      96    JNZ       2       JMP
E8    DJNZ R0    2       JMP      A6    xxx       2       JMP
E9    DJNZ R1    2       JMP      B6    JF0       2       JMP
EA    DJNZ R2    2       JMP      C6    JZ        2       JMP
EB    DJNZ R3    2       JMP      E6    JNC       2       JMP
EC    DJNZ R4    2       JMP      F6    JC        2       JMP
ED    DJNZ R5    2       JMP
EE    DJNZ R6    2       JMP
EF    DJNZ R7    2       JMP
83    RET        1       RET
93    RETR       1       RET
NEC 8048 ONLY
E0    DJNZ @R0   2       JMP
E1    DJNZ @R1   2       JMP
HLT - Halt Instructions.
JMP - Jump Instructions.
RET - Return Instructions.
RST - Restart Instructions.
xxx - Undefined Instructions.
XX - Low-order Byte of Jump Address.
A3.4 Instruction Set Parameters for the 68000 
The 68000 contains the following instruction types:-
All Instructions            Declared  Undeclared  Total
Non-Jumping                   41021        0      41021
Conditional Jump               4210        0       4210
Jump                          20305        0      20305
Total                         65536        0      65536
Jump Instruction Types for the 68000 
For the 68000 it is not acceptable to assume that conditional jump 
instructions will cause transfer of execution on 50% of the occasions that 
they are executed. The list on the following page shows how the different 
instructions have been divided into the effective number of codes which 
fall into particular groups. Further details of the divisions are given in 
section 4.2.1. 
Jump Instruction Types for the 68000 (cont.)
Instruction    No. Codes  Non-Jump  Exception(RST)  Jump (Type)
Bcc               3584     1792        896           896   (JMP)
BRA                256        0        128           128   (JMP)
BSR                256        0        128           128   (JMP)
CHK                424      106        318             0
DBcc               128       64         32            32   (JMP)
JMP                 28        0         14            14   (JMP)
JSR                 28        0         14            14   (JMP)
RTR                  1        0          0.5           0.5 (RET)
RTS                  1        0          0.5           0.5 (RET)
TRAP                16        0         16             0
TRAPV                1        0.5        0.5           0
Privilege instructions
STOP                 1        0          0.5           0.5 (HLT)
RESET                1        0.5        0.5           0
RTE                  1        0          0.5           0.5 (RET)
MOVE to SR          53       26.5       26.5           0
ANDI to SR           1        0.5        0.5           0
EORI to SR           1        0.5        0.5           0
ORI to SR            1        0.5        0.5           0
MOVE USP            16        8          8             0
Totals            4798     1999       1585          1214
In addition to those mentioned above, there are 19,717 illegal or
unassigned op-codes which generate an exception if they are executed.
Overall Instruction Grouping
                                  Effective Number
HLT - Halt Instructions.                  0.5
JMP - Random Jump Instructions.        1212.0
RET - Return Instructions.                1.5
RST - Restart Instructions.           21302.0
Non-Jumping Instructions.             43020.0
Total                                 65536.0
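The overall grouping can be cross-checked from the per-instruction figures. The following sketch assumes that the count of illegal or unassigned op-codes reads 19,717 and that all of them fall in the restart (exception) group; the variable names are illustrative only.

```python
TOTAL_OPCODES = 65536  # all 16-bit op-code words

# Effective jump codes by transfer type, summed from the "Jump (Type)" column.
jump_by_type = {
    "JMP": 896 + 128 + 128 + 32 + 14 + 14,  # Bcc, BRA, BSR, DBcc, JMP, JSR
    "RET": 0.5 + 0.5 + 0.5,                 # RTR, RTS, RTE
    "HLT": 0.5,                             # STOP
}

exception_listed = 1585      # "Exception(RST)" column total
illegal_unassigned = 19717   # assumed to always raise an exception
restart = exception_listed + illegal_unassigned

non_jumping = TOTAL_OPCODES - restart - sum(jump_by_type.values())

for name, value in jump_by_type.items():
    print(name, value)
print("RST", restart)
print("Non-jumping", non_jumping)
```

The three jump-type entries also sum to the 1214 shown in the totals row of the per-instruction table.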
Appendix 4, Equations for Transfers within a Program Area 
This appendix contains the detailed derivation of the probability 
equations governing the transfers between states, during erroneous 
execution in program areas. The following derivations are valid for 
processors having single, double and triple byte instructions only. 
In order to calculate the probabilities of reaching a particular
state, it is necessary to determine the possible ways of transferring from
one state to another. All the possible transfers from the three different
operand fields are shown below. In all cases the first byte read is the 
fourth in the sequence. 
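Transfer from State $D\underline{X}$
Transfer to Jump:
$.\ .\ D\ \underline{J}$
Transfer to Resume:
$.\ .\ D\ \underline{S}\ V \ldots$
$.\ .\ D\ \underline{D}\ S\ V \ldots$
$.\ .\ D\ \underline{T}\ S\ S\ V \ldots$
$.\ .\ D\ \underline{T}\ D\ X\ V \ldots$
Transfer to $D\underline{X}$:
$.\ .\ D\ \underline{D}\ D\ \underline{X} \ldots$
$.\ .\ D\ \underline{T}\ S\ D\ \underline{X} \ldots$
Transfer to $T\underline{X}X$:
$.\ .\ D\ \underline{D}\ T\ \underline{X}\ X \ldots$
$.\ .\ D\ \underline{T}\ S\ T\ \underline{X}\ X \ldots$
Transfer to $TX\underline{X}$:
$.\ .\ D\ \underline{T}\ T\ X\ \underline{X} \ldots$

Transfer from State $TX\underline{X}$
Transfer to Jump:
$.\ T\ X\ \underline{J}$
Transfer to Resume:
$.\ T\ X\ \underline{S}\ V \ldots$
$.\ T\ X\ \underline{D}\ S\ V \ldots$
$.\ T\ X\ \underline{T}\ S\ S\ V \ldots$
$.\ T\ X\ \underline{T}\ D\ X\ V \ldots$
Transfer to $D\underline{X}$:
$.\ T\ X\ \underline{D}\ D\ \underline{X} \ldots$
$.\ T\ X\ \underline{T}\ S\ D\ \underline{X} \ldots$
Transfer to $T\underline{X}X$:
$.\ T\ X\ \underline{D}\ T\ \underline{X}\ X \ldots$
$.\ T\ X\ \underline{T}\ S\ T\ \underline{X}\ X \ldots$
Transfer to $TX\underline{X}$:
$.\ T\ X\ \underline{T}\ T\ X\ \underline{X} \ldots$

Transfer from State $T\underline{X}X$
Transfer to Jump:
$.\ .\ T\ \underline{J}\ X \ldots$
Transfer to Resume:
$.\ .\ T\ \underline{D}\ X\ V \ldots$
$.\ .\ T\ \underline{T}\ X\ S\ V \ldots$
Transfer to $D\underline{X}$:
$.\ .\ T\ \underline{T}\ X\ D\ \underline{X} \ldots$
Transfer to $T\underline{X}X$:
$.\ .\ T\ \underline{T}\ X\ T\ \underline{X}\ X \ldots$
Transfer to $TX\underline{X}$:
$.\ .\ T\ \underline{S}\ \underline{X} \ldots$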
The symbols used are as follows:-
$\underline{\ \ }$ Byte interpreted as an instruction.
$V$ Any valid instruction bytes.
$X$ Operand byte of any value.
$S$ Single byte instruction op-code in the program.
$D$ Double byte instruction op-code in the program.
$T$ Triple byte instruction op-code in the program.
$\underline{S}$ Operand byte interpreted as a single byte non-jumping instruction.
$\underline{D}$ Operand byte interpreted as a double byte non-jumping instruction.
$\underline{T}$ Operand byte interpreted as a triple byte non-jumping instruction.
$\underline{J}$ Operand byte interpreted as a jump instruction type.
$\underline{X}$ Operand byte interpreted as an instruction.
The probability that a particular transfer occurs is evaluated by 
multiplying together the probabilities that each specific byte appears in 
that sequence. For example, the transfer from the operand field of a 
double byte instruction to resuming valid instruction fetches can be 
achieved in four different ways. The probability of each sequence is given 
by:-
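$$P_{D\underline{X}R1} = P_{D\underline{S}} \qquad \text{Eqn. A4.1}$$
$$P_{D\underline{X}R2} = P_{D\underline{D}} \cdot P_{S} \qquad \text{Eqn. A4.2}$$
$$P_{D\underline{X}R3} = P_{D\underline{T}} \cdot P_{S} \cdot P_{S} \qquad \text{Eqn. A4.3}$$
$$P_{D\underline{X}R4} = P_{D\underline{T}} \cdot P_{D} \qquad \text{Eqn. A4.4}$$
Where:- $R$ is the state of resuming valid instruction fetches.
$D\underline{X}R$ represents the transfer from $D\underline{X}$ to $R$.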
For all the other quantities the same nomenclature has been used as above,
so that the probability of interpreting an operand byte of a double byte
instruction as a single byte non-jumping instruction is represented by
$P_{D\underline{S}}$.
As the transfer can occur in any one of these ways, the overall
probability of the transfer, $P_{D\underline{X}R}$, is given by:-
$$P_{D\underline{X}R} = P_{D\underline{X}R1} + P_{D\underline{X}R2} + P_{D\underline{X}R3} + P_{D\underline{X}R4} \qquad \text{Eqn. A4.5}$$
Similar expressions can be obtained for all the other transfers.
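As a worked illustration of Eqns. A4.1 to A4.5, the sketch below sums the four ways of resuming valid instruction fetches from the operand field of a double byte instruction. The function and argument names are hypothetical, and the numerical probabilities are made-up values chosen only to exercise the formula; they are not results from the thesis.

```python
def p_dx_resume(p_ds: float, p_dd: float, p_dt: float,
                p_s: float, p_d: float) -> float:
    """Probability of the transfer from the double-byte operand state to Resume.

    p_ds, p_dd, p_dt: operand byte read as a single, double or triple byte
                      non-jumping instruction.
    p_s, p_d:         a program instruction is single or double byte.
    """
    r1 = p_ds                 # Eqn. A4.1
    r2 = p_dd * p_s           # Eqn. A4.2
    r3 = p_dt * p_s * p_s     # Eqn. A4.3
    r4 = p_dt * p_d           # Eqn. A4.4
    return r1 + r2 + r3 + r4  # Eqn. A4.5


# Illustrative (assumed) values only.
print(p_dx_resume(p_ds=0.5, p_dd=0.2, p_dt=0.1, p_s=0.6, p_d=0.3))
```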
The probability of a specific byte appearing at a given location is
obtained from the ratio of that byte type to the total number of locations.
For example $P_{D\underline{S}}$ is given by:-
$$P_{D\underline{S}} = \frac{N_{D\underline{S}}}{N_{DI}} \qquad \text{Eqn. A4.6}$$
Where:- $N_{D\underline{S}}$ is the number of single byte non-jumping instructions which
appear in the operand field of double byte instructions.
$N_{DI}$ is the number of double byte instructions.
$P_{S}$ is given by:-
$$P_{S} = \frac{N_{SI}}{N_{I}} \qquad \text{Eqn. A4.7}$$
Where:- $N_{SI}$ is the number of single byte instructions in the program
area.
$N_{I}$ is the total number of instructions.
Values of these probabilities can either be obtained by assuming equal 
use of each instruction and random data in the operand fields, or by 
analysing actual programs. 
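From the above expressions it is possible to derive equations for the
probability of being at a particular state, $I$ instruction cycles after the
erroneous jump. They are of the following form:-
$$P_{D\underline{X}}(I) = P_{D\underline{X}}(I-1) \cdot P_{D\underline{X}D\underline{X}} + P_{TX\underline{X}}(I-1) \cdot P_{TX\underline{X}D\underline{X}} + P_{T\underline{X}X}(I-1) \cdot P_{T\underline{X}XD\underline{X}} \qquad \text{Eqn. A4.8}$$
Where:- $D\underline{X}D\underline{X}$ represents the transfer from $D\underline{X}$ to $D\underline{X}$.
$TX\underline{X}D\underline{X}$ represents the transfer from $TX\underline{X}$ to $D\underline{X}$.
$T\underline{X}XD\underline{X}$ represents the transfer from $T\underline{X}X$ to $D\underline{X}$.
Similar expressions can be obtained for $P_{TX\underline{X}}(I)$ and $P_{T\underline{X}X}(I)$.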
For the probabilities of a jump to another part of the memory map or 
of resuming valid instruction fetches, the values are cumulative because it 
is assumed that once in these states, execution cannot transfer elsewhere. 
Therefore the following expressions apply:-
$$P_{R}(I) = P_{R}(I-1) + P_{D\underline{X}}(I-1) \cdot P_{D\underline{X}R} + P_{TX\underline{X}}(I-1) \cdot P_{TX\underline{X}R} + P_{T\underline{X}X}(I-1) \cdot P_{T\underline{X}XR} \qquad \text{Eqn. A4.9}$$
The analysis of section 5.2 treats the jump instruction types as four 
separate groups. For clarity, the derivations so far have only considered 
a single jump type. However the expressions for the individual groups are 
the same as for the overall group, except that the probabilities of a 
particular jump type appearing in the operand field are reduced 
proportionally. 
Section 5.2 also shows that the probabilities, when I equals zero, can
be found. Therefore the probabilities for all other positive integer
values of I can be evaluated from the above equations.
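The recurrences of Eqns. A4.8 and A4.9 can be iterated numerically. The sketch below is a minimal illustration: the state names `DX`, `TXX2` (second operand of a triple byte instruction) and `TXX1` (first operand), the transfer probabilities and the starting distribution are all assumed values, not figures from the thesis. Resume (`R`) and jump (`J`) are treated as absorbing, so their probabilities accumulate.

```python
OPERAND_STATES = ("DX", "TXX2", "TXX1")
ABSORBING_STATES = ("R", "J")


def step(p: dict, t: dict) -> dict:
    """Advance the state probabilities by one instruction cycle."""
    new = {}
    for dest in OPERAND_STATES:
        # Eqn. A4.8 form: sum over the operand states occupied one cycle earlier.
        new[dest] = sum(p[src] * t[src][dest] for src in OPERAND_STATES)
    for dest in ABSORBING_STATES:
        # Eqn. A4.9 form: cumulative, since execution never leaves R or J.
        new[dest] = p[dest] + sum(p[src] * t[src][dest] for src in OPERAND_STATES)
    return new


# Hypothetical transfer probabilities; each row sums to one.
transfer = {
    "DX":   {"DX": 0.05, "TXX2": 0.02, "TXX1": 0.03, "R": 0.80, "J": 0.10},
    "TXX2": {"DX": 0.05, "TXX2": 0.02, "TXX1": 0.03, "R": 0.80, "J": 0.10},
    "TXX1": {"DX": 0.02, "TXX2": 0.10, "TXX1": 0.03, "R": 0.75, "J": 0.10},
}

# Hypothetical probabilities at I = 0.
history = [{"DX": 0.3, "TXX2": 0.2, "TXX1": 0.2, "R": 0.3, "J": 0.0}]
for _ in range(5):
    history.append(step(history[-1], transfer))

for i, state in enumerate(history):
    print(i, {name: round(value, 4) for name, value in state.items()})
```

Because every row of the assumed transfer table sums to one, the five state probabilities continue to sum to one at every cycle, which is a convenient sanity check on any set of measured values.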
Appendix 5. Results of Execution in Unpopulated Memory Areas 
This appendix gives detailed results of the execution following an 
erroneous jump into unpopulated memory areas of the 8048 and 8085. For 
both processors, instruction fetches read back the lower order byte of the
address and therefore a 256 byte sequence appears in these areas. The 
results below show the effective number of starting points within the 
sequence which give a particular transfer, and from this the probability of 
each outcome has been calculated. 
A5.1 Unpopulated Area Execution for the 8048 
Final Instruction    Effective Number of    % Probability
Executed             Start Addresses        of Transfer
JUMP 005/805              1.0                   0.4
CALL 015/815             15.0                   5.9
JUMP 125/925              1.0                   0.4
CALL 135/935             46.0                  18.0
JUMP 245/A45              1.0                   0.4
CALL 255/A55             15.5                   6.1
JUMP 365/B65             31.5                  12.3
CALL 375/B75             16.0                   6.3
JUMP 485/C85              1.0                   0.4
CALL 495/C95              8.0                   3.1
JUMP 5A5/DA5             16.0                   6.3
CALL 5B5/DB5              7.5                   2.9
JUMP 6C5/EC5             16.0                   6.3
CALL 6D5/ED5              8.0                   3.1
JUMP 7E5/FE5             24.0                   9.4
CALL 7F5/FF5             15.0                   5.9
CALL 815                  0.5                   0.2
CALL 935                  0.5                   0.2
CALL 7F5                  1.0                   0.4
JUMP @A                   8.5                   3.3
RET                      15.0                   5.9
RETR                      8.0                   3.1
Where two addresses have been given the transfer is dependent on the
state of the memory bank select flip-flop, and the corresponding address
will be used.
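The percentage column follows directly from the 256 byte sequence described at the start of this appendix: each probability is the effective number of start addresses divided by 256. A minimal sketch, with the function name `transfer_probability` chosen here for illustration:

```python
SEQUENCE_LENGTH = 256  # bytes in the repeating unpopulated-memory sequence


def transfer_probability(effective_starts: float) -> float:
    """Percentage probability of a transfer, to one decimal place."""
    return round(100.0 * effective_starts / SEQUENCE_LENGTH, 1)


# A few rows of the 8048 table: (final instruction, effective start addresses).
for name, starts in [("CALL 135/935", 46.0), ("JUMP 365/B65", 31.5),
                     ("CALL 5B5/DB5", 7.5), ("JUMP @A", 8.5)]:
    print(name, transfer_probability(starts))
```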
A5.2 Unpopulated Area Execution for the 8085 
Address or Instruction    Effective Number     % Probability   Transfer
Reached                   of Start Addresses   of Transfer     Type
HALT                          122.2               47.7           HLT
RETURN                         89.6               35.0           RET
Address in HL Register          3.5                1.4           JMP
DFDE                            3.0                1.2           SPC
FFFE                            2.2                0.9           SPC
RESTART 4                       2.0                0.8           RST
RESTART 5                       2.0                0.8           RST
RESTART 6                       2.0                0.8           RST
RESTART 7                       2.0                0.8           RST
F4F3                            1.9                0.7           SPC
Address in DE Register          1.9                0.7           JMP
RESTART 1                       1.8                0.7           RST
Address in BC Register          1.5                0.6           JMP
RESTART 0                       1.5                0.6           RST
CFCE                            1.5                0.6           SPC
RESTART 2                       1.5                0.6           RST
RESTART 3                       1.5                0.6           RST
C4C3                            1.0                0.4           SPC
C5C4                            1.0                0.4           SPC
D4D3                            1.0                0.4           SPC
DCDB                            1.0                0.4           SPC
E4E3                            1.0                0.4           SPC
E6E5                            1.0                0.4           SPC
EEED                            1.0                0.4           SPC
F6F5                            1.0                0.4           SPC
FCFB                            1.0                0.4           SPC
FEFD                            1.0                0.4           SPC
CECD                            0.8                0.3           SPC
Address in PSW                  0.8                0.3           JMP
C6C5                            0.5                0.2           SPC
CCCB                            0.5                0.2           SPC
RESTART 8                       0.5                0.2           RST
DEDD                            0.5                0.2           SPC
ECEB                            0.5                0.2           SPC
D6D5                            0.5                0.2           SPC
Transfer types:- HLT Halt.
RST Restart.
JMP Random Jump.
RET Return.
SPC Specific Jump.
Appendix 6. Software for the Fault Simulation Test Facility 
This appendix contains full commented listings of the software written
for the fault simulation test facility. It is split into a number of
modules. CONTROL is the control program which organises the sequence of
runs and calls the other modules. PREFAULT is the main core of the
interrupt service routine. It retrieves the return address from the stack
and saves all the registers before calling the fault injection routine.
Both of these modules provide the basis for all testing and do not require
alteration.
The remaining modules are test dependent and have to be rewritten for
different faults or test programs. For these modules, the listings show a
specific example. INIT initialises the state of the test system before
each run. FAULT simulates the desired fault during the interrupt routine.
CHECK gives an indication of the correctness of the result after execution.
TEST is the program to be tested.
A6.1 CONTROL - Main Control Program
; CONTROL PROGRAM FOR TESTING FAULT TOLERANT ROUTINES
; IT INJECTS A FAULT AT SUCCESSIVE LOCATIONS IN THE TEST PROGRAM
; THE TESTING FACILITY USES THE TIMER SECTION OF THE 8155 ON THE SDK BOARD
; TO GENERATE INTERRUPTS AT DIFFERENT POINTS IN THE PROGRAM.
; THE INTERRUPT ROUTINE CAN THEN BE USED TO INJECT 'FAULTS' INTO THE SYSTEM
; BY CORRUPTING REGISTERS OR MEMORY LOCATIONS.
BLKEND  EQU   020B6H      ;STORE FOR END OF PROG ADDRESS
BLKST   EQU   07FFFH      ;START OF PROGRAM ADDRESS
CHECK   EQU   08700H      ;LOCATION OF CHECKING ROUTINE
CIN     EQU   00820H      ;READS IN SERIAL BYTE INTO A AND C REGS
COUT    EQU   00850H      ;OUTPUTS SERIAL CHARACTER IN C REG
ENDTC   EQU   02008H      ;STORES FINAL VALUE OF TIMER COUNT
GETAD   EQU   00626H      ;READS ADDRESS INTO BC REGS
INIT    EQU   08500H      ;LOCATION OF ROUTINE TO INITIALISE REGS
MASK    EQU   02007H      ;TEMPORARY STORE FOR INTERRUPT MASK
MONIT   EQU   00408H      ;RETURN ADDRESS INTO SDK BOARD MONITOR
NUMFLG  EQU   02005H      ;FLAG FOR SINGLE FAULT INJECTION
PROG    EQU   0C000H      ;START OF TEST PROGRAM
RUNNUM  EQU   02003H      ;HOLDS THE VALUE OF THE CURRENT RUN NO.
STACK   EQU   020B0H      ;STACK POINTER
TCOUNT  EQU   020C0H      ;LOCATION FOR TIMER COUNT
TIMFLG  EQU   02006H      ;FLAG FOR SINGLE LOCATION FAULT INJECTION
UPDAD   EQU   00362H      ;DISPLAYS CONTENTS OF HL ON SDK BOARD
        ASEG
        ORG   09800H      ;LOCATE PROGRAM
        LXI   SP,STACK    ;SETS STACK FOR CONTROL PROG USE
        DI                ;DISABLES INTERRUPTS WHILE SETTING TIMER
        LXI   H,06H       ;SET TIMER COUNT FOR FIRST RUN
        SHLD  TCOUNT      ;SAVES VALUE
        MVI   A,000H      ;SET FLAGS
        STA   TIMFLG
        STA   NUMFLG
        LXI   H,00000H    ;SET RUN NUMBER
        SHLD  RUNNUM
        LXI   H,MESS1     ;LOADS START ADDRESS OF MESSAGE
        CALL  STRING      ;AND OUTPUTS STRING
        CALL  GETAD       ;READS IN END OF PROGRAM BLOCK
        MOV   H,B         ;TRANSFER BLOCK END AND SAVE
        MOV   L,C
        SHLD  BLKEND
        LXI   H,MESS2     ;READY MESSAGE AND SEND
        CALL  STRING
        CALL  CIN         ;READ IN SINGLE BYTE
        ANI   07FH        ;REMOVE EXTRA BIT
        CPI   'S'         ;CHECK IF SINGLE LOCATION REQUIRED
        JNZ   SKIP1       ;IF ALL LOCATIONS SKIP RESETTING FLAGS
        LXI   H,MESS3
        MVI   A,0FFH      ;SET SINGLE LOCATION FLAG
        STA   TIMFLG
        CALL  STRING      ;REQUEST COUNT FOR SINGLE LOCATION
        CALL  GETAD       ;READ IN TIMER COUNT
        MOV   H,B         ;TRANSFER TO HL
        MOV   L,C
        SHLD  TCOUNT      ;STORE NEW VALUE OF TIMER COUNT
SKIP1:  LXI   H,MESS4     ;READY MESSAGE AND SEND
        CALL  STRING
        CALL  CIN         ;READ IN SINGLE BYTE
        ANI   07FH        ;REMOVE EXTRA BIT
        CPI   'S'         ;CHECK FOR SINGLE FAULT INJECTION
        JNZ   START       ;IF FULL RUN START EXECUTION
        MVI   A,0FFH      ;SET RUN NUMBER FLAG
        STA   NUMFLG
        LXI   H,MESS5     ;READY MESSAGE AND SEND
        CALL  STRING
        CALL  GETAD       ;READ IN RUN NUMBER
        MOV   H,B         ;MOVE VALUE INTO HL
        MOV   L,C
        SHLD  RUNNUM      ;STORE VALUE IN MEMORY
        JMP   START       ;SKIP INCREMENT OF COUNTER
ISTART: LHLD  TCOUNT      ;RESET COUNTER FOR NEXT RUN
        INX   H
        SHLD  TCOUNT
START:  LXI   SP,STACK    ;SET STACK FOR CONTROL PROGRAM USE
        LHLD  TCOUNT      ;LOAD TIMER COUNT INTO HL REGS
        SHLD  ENDTC       ;SAVES VALUE FOR DISPLAY ON COMPLETION
        CALL  UPDAD       ;DISPLAY VALUE ON SDK BOARD
        LXI   B,BLKST     ;LOAD BLOCK START ADDRESS IN ROM
        LXI   D,PROG-1    ;LOAD BLOCK START ADDRESS IN RAM
        LHLD  BLKEND      ;LOAD BLOCK END ADDRESS IN RAM
MOVE:   INX   B           ;INCREMENT POINTERS
        INX   D
        LDAX  B           ;READ IN BYTE FROM INSTANT ROM
        STAX  D           ;STORE BYTE IN RAM
        MOV   A,L         ;CHECK FOR END OF BLOCK
        CMP   E
        JNZ   MOVE
        MOV   A,H
        CMP   D
        JNZ   MOVE
        LHLD  TCOUNT      ;LOAD COUNT FOR PROGRAMMING TIMER
        MVI   A,040H      ;STOP TIMER IF RUNNING
        OUT   028H
        MOV   A,L         ;LOADS LOW ORDER BYTE OF COUNT
        OUT   02CH
        MOV   A,H         ;LOADS HIGH ORDER BYTE OF COUNT
        OUT   02DH
        CALL  INIT        ;INITIALISES REGISTERS BEFORE TEST
        MVI   A,0C0H      ;START COUNT
        OUT   028H
        MVI   A,01BH      ;SET INTERRUPT TO ENABLE RST 7.5
        SIM
        POP   PSW         ;RESETS PSW BEFORE TEST
        LXI   SP,0C800H   ;SET STACK BEFORE JUMP
        EI
        OUT   0FFH        ;TRIGGERS HARDWARE TO MASK OFF A14,A15
        JMP   PROG        ;JUMPS INTO PROGRAM TO BE TESTED
; FOR THE FIRST RUN AN INTERRUPT WILL OCCUR DURING THE JUMP INSTRUCTION.
; IT IS THEREFORE POSSIBLE TO INJECT A 'FAULT' BEFORE EXECUTION OF THE
; FIRST INSTRUCTION IN THE TEST PROGRAM.
; AFTER A COMPLETE RUN OF THE TEST PROGRAM, THE FOLLOWING CODE WILL BE
; EXECUTED TO DETERMINE SUCCESS OR FAILURE, THEN OUTPUT THE RESULT.
        ORG   09900H      ;FIX RETURN ADDRESS
RETURN: LXI   SP,STACK    ;RESET STACK AFTER TEST
        RIM               ;READ IN STATE OF INTERRUPTS
        STA   MASK        ;STORE IN TEMP LOCATION
        MVI   A,010H      ;RESET INTERRUPT OFF
        SIM
        CALL  CHECK       ;SUBROUTINE TO CHECK IF CORRECT RESULTS
        JNC   FAIL        ;CARRY NOT SET IF UNSUCCESSFUL
        MVI   C,'S'       ;OUTPUTS 'S' TO INDICATE SUCCESS
        CALL  COUT
        JMP   TEST1       ;RETURNS AFTER SENDING 'S'
FAIL:   CALL  OUTTC       ;OUTPUTS TIMER COUNT
TEST1:  LDA   TIMFLG      ;TEST FOR SINGLE LOCATION FAULT
        INR   A
        JZ    TEST2       ;SINGLE RUN JUMP TO SECOND TEST
        LDA   MASK        ;RELOAD STATE OF INTERRUPTS
        ANI   040H        ;TESTS TO SEE IF MORE RUNS REQUIRED
        JZ    ISTART      ;GOES BACK FOR NEXT RUN
        LXI   H,0006H     ;RESET TIMER COUNT FOR NEXT SET OF RUNS
        SHLD  TCOUNT      ;SAVE VALUE FOR LATER USE
TEST2:  LDA   NUMFLG      ;LOAD TEST FLAG INTO ACC
        INR   A
        JZ    PREND       ;JUMP IF SINGLE RUN OR END
        MVI   C,0DH       ;SEND CR,LF TO PLACE RUN NUMBER IN
        CALL  COUT        ;LEFT HAND COLUMN OF LINE
        MVI   C,0AH
        CALL  COUT
        LHLD  RUNNUM      ;LOAD RUN NUMBER
        INX   H           ;INCREMENT READY FOR NEXT RUN
        SHLD  RUNNUM      ;SAVE FOR LATER USE
        CALL  OUTNUM      ;OUTPUTS VALUE TO SCREEN
        MVI   C,'*'       ;OUTPUT '*' TO MARK RUN NUMBER
        CALL  COUT
        JMP   START       ;JUMP BACK FOR NEXT RUN
PREND:  MVI   A,010H      ;RESET RST7.5 FLIP FLOP TO OFF
        SIM
        LXI   H,MESS6     ;OUTPUTS TERMINATING MESSAGE
        CALL  STRING
        LHLD  ENDTC       ;OUTPUT VALUE OF TIMER COUNT
        CALL  OUTNUM
        LXI   H,MESS7
        CALL  STRING
        LHLD  RUNNUM      ;OUTPUT VALUE OF RUN NUMBER
        CALL  OUTNUM
        MVI   C,01AH      ;SEND END OF FILE MARKER
        CALL  COUT
        JMP   MONIT       ;JUMP BACK TO SDK MONITOR
MESS1:  DB    0DH,0AH,'FAULT TOLERANT TESTING FACILITY',0DH,0AH,0AH
        DB    'TEST PROGRAM LOCATED FROM C000 TO $'
MESS2:  DB    0DH,0AH,0AH,'TYPE "S" FOR FAULT INJECTION AT A SINGLE'
        DB    ' LOCATION $'
MESS3:  DB    0DH,0AH,0AH,'ENTER TIMER COUNT FOR REQUIRED LOCATION $'
MESS4:  DB    0DH,0AH,0AH,'TYPE "S" FOR SINGLE RUN ON EACH'
        DB    ' LOCATION $'
MESS5:  DB    0DH,0AH,0AH,'ENTER RUN NUMBER FOR REQUIRED RUN $'
MESS6:  DB    0DH,0AH,07H,'EXECUTION TERMINATED AT TIMER COUNT $'
MESS7:  DB    ', AND RUN NUMBER $'
; SUBROUTINE - STRING          CALLS :- COUT
; ROUTINE TO OUTPUT A STRING OF CHARACTERS DELIMITED BY A '$'
; START ADDRESS OF STRING MUST BE IN THE HL REG PAIR
STRING: MOV   C,M         ;GET BYTE FROM MEMORY
        MOV   A,C         ;CHECK FOR DELIMITER
        CPI   '$'
        RZ                ;RETURN IF END OF MESSAGE
        PUSH  H           ;SAVE MEMORY ADDRESS
        CALL  COUT        ;OUTPUT CHARACTER
        POP   H
        INX   H           ;INCREMENT ADDRESS POINTER
        JMP   STRING      ;GO BACK FOR NEXT CHARACTER
;************************************************************
; SUBROUTINE - OUTTC (OUTNUM)      CALLS - AOUT, CONV, COUT
; CONVERTS 16 BIT ADDRESS STORED IN 'TCOUNT' (OR IN HL REGISTER PAIR)
; INTO 4 ASCII CODES AND OUTPUTS THEM TO THE SERIAL PORT
;************************************************************
OUTTC:  LHLD  TCOUNT      ;LOAD IN NUMBER FOR OUTPUT
OUTNUM: MOV   A,H         ;OUTPUTS HIGH BYTE
        CALL  AOUT
        MOV   A,L         ;OUTPUTS LOW BYTE
        CALL  AOUT
        RET
;************************************************************
; SUBROUTINE - AOUT      CALLS - CONV, COUT
; CONVERTS SINGLE BYTE IN ACCUMULATOR INTO TWO ASCII CODES AND OUTPUTS
; THEM TO THE SERIAL LINE
;************************************************************
AOUT:   MOV   B,A         ;SAVES BYTE IN B REG
        RAR               ;SHIFTS UPPER BITS
        RAR
        RAR
        RAR
        ANI   0FH         ;MASK OFF UPPER 4 BITS
        CALL  CONV        ;CONVERTS TO ASCII AND SENDS
        MOV   A,B         ;RESTORE VALUE
        ANI   0FH         ;MASKS OFF BITS
        CALL  CONV        ;CONVERTS AND SENDS
        RET
; SUBROUTINE - CONV      CALLS - COUT
; CONVERTS HEX DIGIT IN ACCUMULATOR TO ASCII AND OUTPUTS TO SERIAL PORT
;************************************************************
CONV:   ADI   030H        ;CONVERT TO ASCII
        CPI   03AH        ;CHECK IF 0-9
        JM    SKIPC
        ADI   07H         ;READJUSTS FOR A-F
SKIPC:  MOV   C,A         ;MOVE CODE TO C REG
        CALL  COUT        ;OUTPUTS ASCII CODE
        RET
        END
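For readers less familiar with 8085 assembly, the OUTNUM, AOUT and CONV routines can be paraphrased in a few lines of Python. This is a sketch of the same conversion rather than a translation of the listing: the functions return strings instead of calling the monitor's COUT routine, and the lower-case names are chosen here for illustration.

```python
def conv(nibble: int) -> str:
    """Mirror of CONV: turn a 4-bit value into its ASCII hex digit."""
    code = nibble + 0x30      # ADI 030H
    if code >= 0x3A:          # CPI 03AH / JM SKIPC
        code += 0x07          # ADI 07H, readjusts for A-F
    return chr(code)


def aout(byte: int) -> str:
    """Mirror of AOUT: upper nibble first, then lower nibble."""
    return conv((byte >> 4) & 0x0F) + conv(byte & 0x0F)


def outnum(word: int) -> str:
    """Mirror of OUTNUM: high byte, then low byte, as four hex digits."""
    return aout((word >> 8) & 0xFF) + aout(word & 0xFF)


print(outnum(0x0006))  # the initial timer count used by CONTROL
```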
A6.2 PREFAULT - Main Core of Interrupt Service Routine
;************************************************************
; THIS IS THE FAULT INJECTING ROUTINE
; IT IS EXECUTED AFTER A RST 7.5 INTERRUPT. IT SAVES THE CURRENT STACK
; POINTER AND ALL THE REGISTERS. IT RETRIEVES THE RETURN ADDRESS FROM
; MEMORY AND STORES IT AS PART OF THE JUMP INSTRUCTION AT THE END OF
; THIS ROUTINE. A SUBROUTINE IS THEN CALLED TO ACTUALLY INJECT THE FAULT
; BEFORE REINSTATING ALL THE REGISTERS AND THE ORIGINAL STACK POINTER.
; CALLS :- FAULT
;************************************************************
CYTEMP  EQU   020B2H      ;TEMP STORE FOR CARRY FLAG
DUMMY   EQU   00000H      ;DUMMY ADDRESS CHANGED LATER
HLTEMP  EQU   020B0H      ;TEMP STORE FOR HL REGISTERS
FAULT   EQU   08600H      ;LOCATION OF FAULT INJECTING ROUTINE
SPTEMP  EQU   020B4H      ;TEMP STORE FOR STACK POINTER
STACK   EQU   020B0H      ;TEMP STACK POINTER FOR THIS ROUTINE
        ASEG
        ORG   09FC2H      ;SET ADDRESS SO LAST TWO BYTES IN RAM
        SHLD  HLTEMP      ;SAVE HL REGISTERS
        MVI   H,000H      ;PREPARE TO SET TEMPORARY CARRY FLAG
        JNC   SKIP        ;TEST CURRENT CARRY FLAG
        MVI   H,0FFH      ;RESET TO INDICATE CARRY WAS SET
SKIP:   SHLD  CYTEMP      ;SAVE TEMPORARY CARRY FLAG
        LXI   H,00000H    ;CLEAR HL REGS
        DAD   SP          ;GET CURRENT STACK POINTER INTO HL REGS
        LXI   SP,STACK    ;LOAD TEMPORARY STACK POINTER
        PUSH  D           ;SAVE ALL REGISTERS
        PUSH  B
        PUSH  PSW
        MOV   E,L         ;STORE COPY OF ORIGINAL SP IN DE REGS
        MOV   D,H
        INX   H           ;ADJUST OLD SP FOR REMOVAL OF RETURN
        INX   H           ;ADDRESS
        SHLD  SPTEMP      ;SAVE VALUE IN TEMP STORE
        XCHG              ;RETURN OLD SP INTO HL REGS
        MOV   A,H         ;SET UPPER TWO BITS SO THAT STACK POINTS
        ORI   0C0H        ; TO MASKED AREA
        MOV   H,A         ;RETURN HIGH BYTE TO H REG
        MOV   E,M         ;GET LOW BYTE OF RETURN ADDRESS
        INX   H
        MOV   D,M         ;GET HIGH BYTE
        XCHG              ;TRANSFER TO HL REGS
        SHLD  RETURN+1    ;STORE AS PART OF JUMP INSTRUCTION
        CALL  FAULT       ;CALL ROUTINE TO INJECT FAULT
        POP   PSW         ;RESTORE ALL REGISTERS
        POP   B
        POP   D
        LHLD  CYTEMP      ;RETRIEVE TEMPORARY CARRY FLAG
        DAD   H           ;RESET CARRY FLAG
        LHLD  SPTEMP      ;GET OLD STACK POINTER
        SPHL              ;RESET STACK FOR TEST PROGRAM USE
        LHLD  HLTEMP      ;RESTORE HL REGS
        OUT   0FFH        ;SET UP MASKING CIRCUITRY
RETURN: JMP   DUMMY       ;DUMMY CHANGED DURING EXECUTION
        END
A6.3 FAULT - Simulates Desired Fault 
; THIS IS A FAULT INJECTING ROUTINE WHICH CORRUPTS DATA IN MEMORY LOCATIONS
; C100 AND C101.
NUMFLG  EQU   02005H      ;FLAG SET TO 0FFH ON LAST RUN
RUNNUM  EQU   02003H      ;STORAGE LOCATION FOR THE RUN NUMBER
VAL     EQU   0C100H      ;LOCATION OF INPUTS INTO TEST ROUTINE
FAULT:  LHLD  RUNNUM      ;LOADS IN RUN NUMBER
        LXI   D,VAL       ;LOAD ADDRESS OF INPUTS INTO DE REGS
        XCHG              ;SWAP HL FOR DE
        MVI   B,00H       ;CLEAR B REG READY FOR LAST RUN FLAG
        MOV   A,D         ;TRANSFER HIGH BYTE OF RUNNUM INTO ACC
        ANI   01H         ;TEST LOW ORDER BIT
        JZ    SKIP1       ;NO CORRUPTION IF BIT NOT SET
        MOV   M,E         ;CORRUPT MEMORY BYTE
        MVI   B,0FH       ;SET HALF OF LAST RUN FLAG
SKIP1:  MOV   A,D         ;RELOAD HIGH BYTE OF RUN NUMBER
        ANI   02H         ;TEST SECOND BIT
        JZ    SKIP2       ;NO CORRUPTION IF BIT NOT SET
        INX   H           ;SET ADDRESS IN HL REG TO HIGH BYTE
        MOV   M,E         ;CORRUPT HIGH BYTE IN MEMORY
        MVI   A,0F0H      ;SET SECOND HALF OF LAST RUN FLAG
        ADD   B           ;ADD BOTH HALVES OF FLAG TOGETHER
        MOV   B,A         ;MOVE FLAG TO B REG
SKIP2:  MOV   A,E         ;LOAD LOW BYTE OF RUN NUMBER INTO ACC
        INR   A           ;TEST FOR LAST RUN
        RNZ               ;RETURN IF NOT LAST RUN
        MOV   A,B         ;MOVE FLAG INTO ACC
        STA   NUMFLG      ;STORE FLAG IN MEMORY
        RET
        END
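The way the example FAULT routine derives its corruption pattern from the run number may be easier to follow in a high-level paraphrase. The sketch below models memory as a Python dictionary and returns the new value of NUMFLG; the function name and the dictionary model are assumptions made for illustration, while the addresses and bit tests follow the listing.

```python
VAL = 0xC100  # first input location of the test program


def inject_fault(run_number: int, memory: dict, numflg: int) -> int:
    """Corrupt VAL and/or VAL+1 as selected by the run number; return NUMFLG."""
    low = run_number & 0xFF           # E register: value written into memory
    high = (run_number >> 8) & 0xFF   # D register: selects which bytes to corrupt
    flag = 0x00
    if high & 0x01:                   # bit 0 set: corrupt the first location
        memory[VAL] = low
        flag = 0x0F
    if high & 0x02:                   # bit 1 set: corrupt the second location
        memory[VAL + 1] = low
        flag = (0xF0 + flag) & 0xFF
    if low == 0xFF:                   # low byte wraps to zero on INR A
        numflg = flag                 # 0FFH only when both locations were corrupted
    return numflg


memory = {VAL: 0x12, VAL + 1: 0x12}
flag = inject_fault(0x03FF, memory, numflg=0x00)
print(hex(memory[VAL]), hex(memory[VAL + 1]), hex(flag))
```

In this reading, run numbers 0100H to 01FFH corrupt only the first location, 0200H to 02FFH only the second, and 0300H to 03FFH both, with the last-run flag reaching 0FFH on run 03FFH.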
A6.4 INIT - Initialisation Routine 
;************************************************************
; THIS ROUTINE SETS UP INITIAL STATUS OF THE PROCESSOR
;************************************************************
ERRFLG  EQU   0C7FFH      ;LOCATION OF ERROR FLAG
        LXI   H,00000H    ;SET WHAT WILL BECOME THE PSW
        XTHL              ;PUSHES HL ONTO STACK, REMOVES RET ADD
        PUSH  H           ;REPLACES RETURN ADDRESS ONTO STACK
        LXI   B,00000H    ;SET INITIAL VALUE FOR BC REGS
        LXI   D,00000H    ;SET INITIAL VALUE FOR DE REGS
        LXI   H,00000H    ;SET INITIAL VALUE FOR HL REGS
        MVI   A,0FFH      ;SETS A REG TO FF
        STA   ERRFLG      ;SETS ERROR FLAG
        RET
        END
A6.5 CHECK - Checks Result after Execution
;************************************************************
; CHECKING ROUTINE TO TEST FOR SUCCESS OR FAILURE OF THE TEST PROGRAM
;************************************************************
VAL3    EQU   0C110H      ;LOCATION OF ANSWER FROM TEST PROGRAM
CHECK:  LDA   VAL3        ;LOAD IN ANSWER FROM TEST PROGRAM
        CPI   055H        ;CHECK HIGH BYTE
        JNZ   CLEAR       ;CLEAR CARRY AND RETURN
        STC               ;SET CARRY TO INDICATE SUCCESS
        RET
CLEAR:  XRA   A           ;CLEAR CARRY
        RET
        END
A6.6 TEST - Program to be Tested
;************************************************************
; THIS PROGRAM ADDS TWO 8-BIT NUMBERS TOGETHER AND STORES THE RESULT
; INCLUDES ERROR DETECTION AND CORRECTION SOFTWARE
ERRFLG  EQU   0C7FFH      ;STORE FOR ERROR FLAG
RETURN  EQU   09900H      ;RETURN ADDRESS TO CONTROL PROGRAM
START   EQU   0C000H      ;START ADDRESS OF PROGRAM
        ASEG
        ORG   START
        LXI   SP,STACK+1C ;LOAD STACK POINTER
        LDA   VAL11       ;LOAD IN FIRST VARIABLE
        MOV   B,A
        LDA   VAL12       ;LOAD IN SECOND COPY
        SUB   B           ;SUBTRACT VALUES
        JZ    READ2       ;IF BOTH EQUAL, USE VALUE IN B REG
        XRA   A           ;CLEAR ACCUMULATOR
        STA   ERRFLG      ;ZERO FLAG TO INDICATE ERROR
        LDA   VAL13       ;READ IN THIRD COPY
        MOV   C,A         ;TEMP STORE IN C REG
        SUB   B           ;TEST IF EQUAL
        JZ    READ2       ;USE VALUE IN B REG
                          ;IF ONLY ONE ERROR, MUST BE IN VAL11
        MOV   B,C         ;TRANSFER THIRD COPY TO B REG AND USE
READ2:  LDA   VAL21       ;REPEAT FOR OTHER INPUT USING C REG
        MOV   C,A
        LDA   VAL22
        SUB   C
        JZ    CALC
        XRA   A           ;CLEAR ACCUMULATOR
        STA   ERRFLG      ;CLEAR FLAG TO INDICATE ERROR
        LDA   VAL23
        MOV   D,A
        SUB   C
        JZ    CALC
        MOV   C,D
CALC:   MOV   A,B         ;TRANSFER FIRST INPUT TO ACC
        ADD   C           ;ADD TO SECOND INPUT
        STA   VAL3        ;STORE THE RESULT
        DI                ;PREVENT ANY FURTHER INTERRUPTS
        OUT   0FFH        ;CLEARS MASK FOR RETURN TO CONTROL PROG
        JMP   RETURN      ;JUMP BACK TO CONTROL PROGRAM
        ORG   START+100H
VAL11:  DB    012H
VAL13:  DB    012H
VAL12:  DB    012H
VAL21:  DB    043H
VAL22:  DB    043H
VAL23:  DB    043H
        ORG   START+110H
VAL3:   DB    000H
STACK:                    ;BOTTOM OF STACK
        END
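The error detection and correction used by TEST is a comparison of three stored copies of each input. The sketch below restates that logic in Python under the same single-error assumption made in the listing's comments; the function names `vote` and `add_inputs` and the tuple return values are illustrative and do not appear in the thesis.

```python
def vote(first: int, second: int, third: int) -> tuple:
    """Return (value, error_detected) for three stored copies of one input."""
    if second == first:
        return first, False   # first two copies agree: use the first
    if third == first:
        return first, True    # second copy was wrong: first is still good
    return third, True        # assume the single error was in the first copy


def add_inputs(copies_a: tuple, copies_b: tuple) -> tuple:
    """Mirror of TEST: vote on each input, then add modulo 256."""
    a, error_a = vote(*copies_a)
    b, error_b = vote(*copies_b)
    return (a + b) & 0xFF, error_a or error_b


# Values from the listing: 12H + 43H gives the 55H that CHECK looks for.
print(add_inputs((0x12, 0x12, 0x12), (0x43, 0x43, 0x43)))
# A single corrupted copy is detected and masked.
print(add_inputs((0x99, 0x12, 0x12), (0x43, 0x43, 0x43)))
```

As in the listing, this scheme only corrects one faulty copy per input; if two copies are corrupted to the same value the comparison cannot tell which copy is correct.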
