Real Time Fault Detection and Diagnostics Using FPGA-Based Architecture by Naber, Nathan P.
Air Force Institute of Technology
AFIT Scholar
Theses and Dissertations Student Graduate Works
3-10-2010
Part of the Computer and Systems Architecture Commons, and the Data Storage Systems Commons
This Thesis is brought to you for free and open access by the Student Graduate Works at AFIT Scholar. It has been accepted for inclusion in Theses and
Dissertations by an authorized administrator of AFIT Scholar. For more information, please contact richard.mansfield@afit.edu.
Recommended Citation
Naber, Nathan P., "Real Time Fault Detection and Diagnostics Using FPGA-Based Architecture" (2010). Theses and Dissertations. 1976.
https://scholar.afit.edu/etd/1976
  
 
REAL TIME FAULT DETECTION AND DIAGNOSTICS  
USING FPGA-BASED ARCHITECTURES 
THESIS 
Nathan P. Naber, Second Lieutenant, USAF 
AFIT/GCE/ENG/10-04 
 
DEPARTMENT OF THE AIR FORCE 
AIR UNIVERSITY 
AIR FORCE INSTITUTE OF TECHNOLOGY 
 
Wright-Patterson Air Force Base, Ohio 
 
APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED 
  
The views expressed in this thesis are those of the author and do not reflect the official policy or 
position of the United States Air Force, Department of Defense, or the United States 
Government. 
 
 AFIT/GCE/ENG/10-04 
 
REAL TIME FAULT DETECTION AND DIAGNOSTICS  
USING FPGA-BASED ARCHITECTURES 
THESIS 
Presented to the Faculty 
Department of Electrical and Computer Engineering 
Graduate School of Engineering and Management 
Air Force Institute of Technology 
Air University 
Air Education and Training Command 
In Partial Fulfillment of the Requirements for the 
Degree of Master of Science in Computer Engineering 
 
Nathan P. Naber, BS 
Second Lieutenant, USAF 
 
March 2010 
 
APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED 
Abstract 
 Errors caused by radiation in electronic circuits continue to be an important concern for 
developers.  This research investigated a new methodology for real time fault detection and 
diagnostics using FPGA-based architectures while under radiation.  Its contributions focus on three 
areas: a complete test platform for evaluating a circuit under irradiation, an algorithm that detects 
and diagnoses fault locations within a circuit, and a characterization of Triple Design Triple 
Modular Redundancy (TDTMR), a new form of TMR.  Five test setups (injected fault, gamma 
radiation, thermal radiation, optical laser, and optical flash) were used to assess how well these 
three research goals were met. 
 The testing platform consists of two FPGA boards: the Device Under Test (DUT) and a 
controller board that generates specific vector sets, sends them to the DUT, and evaluates the 
responses.  The platform consolidates the functions of a wide range of test and measurement 
equipment, and the associated work hours, onto a single small, reprogrammable, and reusable 
FPGA.  The same device was used in multiple test setups, and its controlling logic can be 
interchanged to test multiple circuit designs under various forms of radiation.   
 The detection and diagnostic algorithm was designed to determine fault locations in real 
time using inverse deductive elimination.  Fault lists developed with test generation tools are 
used to narrow the possible fault locations within the circuit.  Based on these lists, the algorithm 
can detect and diagnose single stuck-at faults.  It can also detect multiple output errors, but it 
cannot diagnose multiple stuck-at faults in real time.   
 TDTMR uses three different implementations of the same logic rather than three identical 
copies of one circuit.  The three adder designs used in this research are a behavioral adder, a 
carry look ahead adder, and a ripple carry adder.   
Across the five tests, the testing platform operated successfully, and the detection and diagnosis 
algorithm was able to detect errors.  However, the injected fault test was the only test in which the 
algorithm correctly diagnosed the location of the fault.  The results also unexpectedly showed that 
the voting unit failed before any of the adders while under radiation, and that dose rate and total 
dose affect the DUT differently.  The goals of this research were met with a fully interchangeable 
and operational testing platform, an algorithm that detects and diagnoses errors in real time, and 
an initial evaluation of TDTMR. 
To My Uncle 
For always pushing me toward 
higher learning  
  
Acknowledgements 
To say that I was able to accomplish this research by myself would be completely remiss 
on my part.  There are so many people and organizations that aided my research efforts in the 
last year.  First off, I would like to thank my thesis advisor for always offering me advice and 
guidance despite constantly inundating his office with my presence.  Without his direction, I 
would have surely gone awry.  I want to thank Dr. Petrosky for his time and effort in tutoring me 
on the radiation effects on electronics and his advice on the direction I needed to take my 
experiments to maximize my results in radiation.  I’m indebted to Dr. Grimaila for his expert 
advice in electrical engineering theories.   
None of my hardware for this thesis could have been constructed without the help of all 
the lab managers at AFIT and contractors at AFRL who allowed me to use their electrical parts, 
tools, and labs to construct my test platform.  Thank you to OSUNRL and Major Koehl for their 
aid in helping me use their facilities for the gamma and thermal radiation tests.  Special thanks 
go to Lt Kyle Stewart for his knowledge and aid in coding the base platform for the hardware.  I 
would also like to thank my fellow VLSI partners, Dan and Adam, for keeping the workplace 
lighthearted and enjoyable during our time at AFIT. 
Finally, I would like to thank all the other professors at AFIT that aided me when I had 
questions throughout the year, my friends here in Dayton and abroad, and my family members.  
Without your support, guidance, and friendship, I would not have made it to where I stand today 
having completed this work.  
 
 Thank you all again, 
 Nathan P. Naber  
Table of Contents 
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
List of Symbols and Acronyms
I. Introduction
1.1 Motivation
1.2 Scope
1.3 Contributions
1.4 Thesis Organization
II. Background
2.1 Chapter Overview
2.2 Field Programmable Gate Arrays
2.3 Triple Modular Redundancy
2.4 Fault Detection and Correction
2.5 Fault Diagnosis
2.6 Radiation Effects on Electronics
2.6.1 Total Dose Effect
2.6.2 Single Event Effects
2.6.3 Single Event Upsets
2.7 Gamma Radiation Source
2.8 Thermal Radiation Source
2.9 Previous Work
2.10 Summary
III. Methodology
3.1 Chapter Overview
3.2 Test Platform
3.2.1 TMR Design
3.2.2 Controller Board
3.2.3 External Devices
3.2.4 Buffer Bridge
3.3 Operation Speed
3.4 Fault Detection and Diagnosis
3.4.1 Fault Detection
3.4.2 Fault Diagnosis
3.4.2.1 Test Vector Generation
3.4.2.2 Diagnosis Algorithm
3.5 Test Setup
3.5.1 Injected Fault Setup
3.5.2 Gamma Radiation Test Setup
3.5.3 Thermal Radiation Test Setup
3.5.4 Optical Laser Test Setup
3.5.5 Optical Flash Test Setup
3.6 Summary
IV. Results and Analysis
4.1 Chapter Overview
4.2 Test Setup Results
4.2.1 Injected Fault Results
4.2.1.1 Analysis
4.2.2 Gamma Radiation Results
4.2.2.1 Analysis
4.2.3 Thermal Radiation Results
4.2.3.1 Analysis
4.2.4 Optical Laser Results
4.2.4.1 Analysis
4.2.5 Optical Flash Results
4.2.5.1 Analysis
4.3 Summary
V. Research Summary
5.1 Chapter Overview
5.2 Conclusion
5.3 Future Work
Appendix A. Virtex 4 FX Series Mini-Module Datasheet
Appendix B. Carry Look Ahead Adder Schematic
Appendix C. Ripple Carry Adder Schematic
Appendix D. Virtex II Pro Datasheet
Appendix E. High Speed CMOS Hex Buffer Datasheet
Appendix F. IC Socket with Capacitor Datasheet
Appendix G. TESTCAD Tool Guide
Appendix H. Full Fault List
Appendix I. Result Logs
Bibliography
List of Figures 
1. Basic TMR
2. TMR Circuit Diagram
3. TMR with Triplicated Voter
4. TMR Design with Feedback
5. Threshold Voltage of ‘n’ and ‘p’ Transistors during Irradiation
6. Irradiation and Annealing Effects
7. Cosmic Ray Through the Strain of an NMOS Transistor
8. Example of SEU
9. Decay Scheme of ⁶⁰Co
10. Co-60 Gamma Irradiator Layout
11. Dose Rate of Co-60 Irradiator
12. Schematic of 1600 W Solar Simulator
13. Spectral Output from 1600 W Thermal Simulator
14. Layout of the 3 Adders on the FX12 Board
15. Block Diagram of Testing Platform
16. DUT Virtex 4 Mini-Module
17. Diagram of TDTMR
18. Diagram of DUT
19. Connector Cables between Chip and Baseboard
20. Virtex-II Pro Controller Board
21. State Diagram of Controller Board
22. Controller Board Block Diagram
23. Hyper-Terminal Monitoring Screen
24. Output of the Flash Card
25. Bridge Board
26. Signal Comparisons
27. Fault Detection and Diagnosis Algorithm
28. Layout of Detected Faults
29. Gamma Radiation Test Setup
30. Thermal Radiation Test Setup
31. Thermal Radiation Spec Setup
32. Optical Laser Test Setup
33. Optical Flash Test Setup
34. Injected Fault Locations for Ripple Carry Adder
35. Injected Fault Locations for Carry Look Ahead Adder
36. Injected Fault Results
37. Gamma Radiation Test 2,3 Results
38. Gamma Radiation Timeline
List of Tables 
1. Evaluation Results of TMR Design
2. History of Algorithm Speedups
3. Stuck at Zero Faults
4. Gamma Radiation Results Summary
5. Thermal Radiation Results Summary
6. Optical Laser Results Summary
7. Optical Flash Results Summary
List of Symbols and Acronyms 
ASIC    Application Specific Integrated Circuit 
ATPG   Automatic Test Pattern Generation 
C   Celsius 
CLB   Configurable Logic Blocks 
CMOS   Complementary Metal Oxide Semiconductor 
Co    Cobalt 
COTS   Commercial Off The Shelf 
DUT    Device-Under-Test 
EMI   Electromagnetic Interference 
FPGA    Field Programmable Gate Array 
FTMR   Functional Triple Modular Redundancy 
MBU    Multiple Bit Upset 
MOS    Metal Oxide Semiconductor 
NRL    Nuclear Research Lab 
OSU    Ohio State University 
OSUNRL  Ohio State University Nuclear Research Lab 
PTMR   Partial Triple Modular Redundancy 
SEB    Single Event Burnout 
SEE    Single Event Effect 
SEL    Single Event Latch-up 
SEU    Single Event Upset 
Si    Silicon 
STMR   Selective Triple Modular Redundancy 
TID    Total Ionizing Dose 
TDTMR  Triple Design TMR 
TMR    Triple Modular Redundancy 
V   Volts 
VHDL   VHSIC Hardware Description Language 
W   Watt 
I. Introduction 
Electronics and technology continue to dominate the market and play an indispensable 
role in space exploration.  Space, while seemingly benign, is both volatile and chaotic.  To study 
this vast and largely unknown environment, satellites, space shuttles, and telescopes have been 
launched to gather information, and these machines contain a vast array of electronics.  
Terrestrial electronics are largely protected from the effects of radiation; electronics in space 
are not.   
Circuits and transistors endure considerable stress in the harsh space environment, so the 
study of electronics in this environment is crucial to their reliable operation.  The effects of 
radiation on electronics are a principal concern.  The current technique for protecting electronics 
in space is to make them “Radiation Hardened”.  While this brute force method is effective, a 
cheaper means of combating radiation effects that is at least as effective is always desirable.   
A Field-Programmable Gate Array (FPGA) is an integrated circuit that can be 
reprogrammed for multiple applications.  A design using FPGAs offers both cost and time savings 
over a design using a conventional Application-Specific Integrated Circuit (ASIC) [23].  As a 
result, FPGAs are increasingly being considered for new device applications throughout the 
commercial world.   
1.1 Motivation 
Newer technologies are increasingly being developed on FPGAs because of their lower cost 
and shorter development time compared with traditional ASIC devices.  Space radiation can 
produce errors at the transistor level of electrical devices [1].  The current methods of combating 
these conditions and minimizing the damage caused by errors are to use radiation hardened ASICs 
or radiation hardened programmable logic devices such as FPGAs.  Radiation hardened devices are 
expensive, and they are not 100% effective [3].  Commercial FPGAs have been investigated as a 
cheaper alternative for electronics in space; however, they are not radiation hardened [2, 3].   
In an effort to address this problem, computer models and experimental radiation tests have 
been used to demonstrate the errors that various forms of radiation cause in specific components 
of commercial FPGAs, as well as in the FPGA as a whole.  It is still difficult to characterize the 
effects of radiation on electronics.  The versatility of an FPGA makes it well suited as a platform 
for characterizing and evaluating the damaging effects of radiation, and such a test platform avoids 
the cost of acquiring all the measurement tools otherwise needed to evaluate a circuit. 
One issue that has not been thoroughly investigated is locating and diagnosing specific 
faults within a circuit in real time.  Testing circuits is a challenging undertaking, and diagnosing 
the location of an error is more challenging still.  The ability to locate errors in real time can aid 
in understanding and classifying radiation effects, and a better understanding of these effects can 
lead to better countermeasures and error prevention techniques for protecting circuits. 
Both design logic techniques and hardening by design have been used to correct errors in 
space environments.  Some methods have proven more effective than others, and Triple Modular 
Redundancy (TMR) has been shown to be an effective redundancy method for error correction 
[3-6].  Further investigation into improving TMR can demonstrate the importance of fault 
redundancy in integrated circuits. 
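Why majority voting helps can be seen from the standard TMR reliability expression, worked out here under simplifying assumptions that are not taken from this thesis: each of the three modules operates correctly with the same independent probability R, and the voter never fails. The voted output is correct whenever at least two of the three modules are correct:

$$R_{\mathrm{TMR}} = R^{3} + 3R^{2}(1-R) = 3R^{2} - 2R^{3}$$

For example, R = 0.9 gives R_TMR = 3(0.81) - 2(0.729) = 0.972. Requiring 3R^2 - 2R^3 > R reduces to (2R - 1)(R - 1) < 0, so TMR improves on a single module only when 1/2 < R < 1. If the voter has reliability R_V, the whole expression is multiplied by R_V, which makes the voter a single point of failure; this is consistent with the voting unit failing before any of the adders in this thesis's radiation tests.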
1.2 Scope 
 Research in this thesis continues the study of radiation effects on electronics.  Several 
forms of radiation are used: a ⁶⁰Co gamma cell (providing ionizing radiation only), thermal 
radiation, and optical radiation.  The two FPGA boards used are the Virtex-II Pro and the Virtex-4 
FX12 Mini-Module, both manufactured by Xilinx.  The testing platform is designed so that 
researchers can also use it with forms of radiation not employed in this thesis. 
1.3 Contributions 
 The overarching goal of this research was to characterize the effects of different forms of 
radiation on integrated circuits.  The research supports the potential replacement of physically 
hardened ASIC and FPGA devices and allows for improvements in the design of non-radiation 
hardened electronics.  
 A test platform was developed to establish a base system that can be used with various 
forms of radiation and multiple circuit designs.  The test platform uses a bridge board with ribbon 
cables to handle communication between the two FPGAs, a significant improvement over the 
previous Ethernet cable design [17].  The testing platform can generate a number of input vectors, 
monitor the values sent to the DUT, and analyze the resulting data.   
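The generate, monitor, and analyze loop of such a platform can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the controller logic used in this thesis: the DUT is modeled as a Python function standing in for a 4-bit adder, every operand pair is applied, and each response is compared against a fault-free reference. The names `WIDTH`, `exhaustive_vectors`, `golden_add`, and `run_test` are hypothetical.

```python
from itertools import product
from typing import Callable, Iterator, List, Tuple

WIDTH = 4  # hypothetical operand width chosen for illustration

Vector = Tuple[int, int]
Mismatch = Tuple[Vector, int, int]  # (applied vector, expected output, observed output)


def exhaustive_vectors(width: int = WIDTH) -> Iterator[Vector]:
    """Every (a, b) operand pair: 2**(2*width) vectors, practical only for small widths."""
    return product(range(1 << width), repeat=2)


def golden_add(a: int, b: int) -> int:
    """Fault-free reference model: (width + 1)-bit sum including the carry-out."""
    return a + b


def run_test(dut: Callable[[int, int], int], width: int = WIDTH) -> List[Mismatch]:
    """Apply each vector to the DUT and log every response that differs from the reference."""
    mismatches: List[Mismatch] = []
    for a, b in exhaustive_vectors(width):
        expected, observed = golden_add(a, b), dut(a, b)
        if observed != expected:
            mismatches.append(((a, b), expected, observed))
    return mismatches


if __name__ == "__main__":
    def faulty_dut(a: int, b: int) -> int:
        return (a + b) & ~1  # hypothetical DUT: output bit 0 stuck at 0

    errors = run_test(faulty_dut)
    print(len(errors))  # → 128
    print(errors[0])    # → ((0, 1), 1, 0)
```

An exhaustive sweep is used here only because a 4-bit adder has 256 input pairs; for wider circuits, a reduced vector set produced by test generation tools would replace `exhaustive_vectors`.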
 Along with this testing platform, an algorithm was developed to diagnose and locate faults 
within the circuit.  Upon uncovering an error, the device determines which circuit design produced 
the faulty output and switches to a real-time diagnostic mode.  The circuit is then driven with a 
series of test and diagnostic vectors designed to pinpoint the location of the stuck-at fault within 
the integrated circuit with the best resolution possible without physically destroying the chip.  
The testing platform helps validate the diagnosis algorithm on the FPGA under various radiation 
environments. 
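The narrowing step can be illustrated with a fault-dictionary sketch. This is a generic elimination procedure under the single stuck-at fault assumption, not the thesis's inverse deductive elimination implementation, and it runs on a hypothetical gate-level full adder whose net names (`x1`, `g1`, `g2`, and so on) are chosen only for illustration. The dictionary records which faults each vector detects; a failing vector keeps only those faults, and a passing vector removes them.

```python
from itertools import product
from typing import Dict, FrozenSet, Optional, Set, Tuple

Fault = Tuple[str, int]        # (net name, stuck-at value)
Vector = Tuple[int, int, int]  # (a, b, cin)
Response = Tuple[int, int]     # (sum, carry_out)

# Hypothetical gate-level full adder used only to illustrate the procedure.
NETS = ("a", "b", "cin", "x1", "s", "g1", "g2", "cout")
ALL_FAULTS: FrozenSet[Fault] = frozenset((net, v) for net in NETS for v in (0, 1))
VECTORS = list(product((0, 1), repeat=3))


def simulate(vec: Vector, fault: Optional[Fault] = None) -> Response:
    """Evaluate the full adder, forcing one net to its stuck value if a fault is given."""
    def net(name: str, value: int) -> int:
        return fault[1] if fault is not None and fault[0] == name else value

    a, b, cin = net("a", vec[0]), net("b", vec[1]), net("cin", vec[2])
    x1 = net("x1", a ^ b)
    s = net("s", x1 ^ cin)
    g1 = net("g1", a & b)
    g2 = net("g2", x1 & cin)
    cout = net("cout", g1 | g2)
    return s, cout


def build_fault_dictionary() -> Dict[Vector, FrozenSet[Fault]]:
    """Map each vector to the single stuck-at faults that change its outputs."""
    return {
        vec: frozenset(f for f in ALL_FAULTS if simulate(vec, f) != simulate(vec))
        for vec in VECTORS
    }


def diagnose(responses: Dict[Vector, Response],
             dictionary: Dict[Vector, FrozenSet[Fault]]) -> Set[Fault]:
    """Narrow the candidate set, assuming exactly one stuck-at fault is present."""
    candidates: Set[Fault] = set(ALL_FAULTS)
    for vec, observed in responses.items():
        if observed != simulate(vec):
            candidates &= dictionary[vec]  # failing vector: keep only faults it detects
        else:
            candidates -= dictionary[vec]  # passing vector: drop faults it would expose
    return candidates


if __name__ == "__main__":
    injected: Fault = ("g2", 0)
    observed = {vec: simulate(vec, injected) for vec in VECTORS}
    print(diagnose(observed, build_fault_dictionary()))  # → {('g2', 0)}
```

Faults that fail on exactly the same vectors cannot be separated by any test; here `g1`, `g2`, and `cout` stuck at 1 form such a group, and this equivalence is what bounds the best achievable diagnostic resolution.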
Finally, this research attempts to demonstrate whether design-hardening techniques can 
reduce system vulnerability to external errors.  A new architecture design and programming model 
was implemented to improve the detection, correction, and tolerance of failures, furthering the 
potential uses of FPGAs in harsh space radiation environments.  Triple Design TMR (TDTMR) 
was implemented in the logic to correct errors without causing downtime in the circuit.  Instead of 
the three identical copies of a circuit used in traditional TMR, TDTMR uses three different design 
implementations of the same circuit.  This approach tests whether different design logic is more 
robust under the effects of radiation than three copies of the same design.  The aim is to improve 
the ability to correct these errors without needing to reprogram the entire microelectronic device. 
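The TDTMR idea can be sketched behaviorally, assuming 4-bit unsigned adders without a carry-in (the actual widths and interfaces of the thesis designs are not given in this section). The three functions compute the same sum in different ways: the behavioral version uses `+`, the ripple carry version propagates a carry bit by bit, and the carry look ahead version forms every carry directly from generate and propagate terms. A bitwise two-of-three voter masks a single faulty module, and comparing each module with the voted result identifies which design disagreed. All function names and the `stuck_at` fault model are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

WIDTH = 4  # hypothetical operand width; inputs are unsigned WIDTH-bit values

Adder = Callable[[int, int], int]


def behavioral_add(a: int, b: int) -> int:
    """Behavioral design: the '+' operator defines the function."""
    return a + b


def ripple_carry_add(a: int, b: int) -> int:
    """Structural design: a chain of full adders, each waiting on the previous carry."""
    total, carry = 0, 0
    for i in range(WIDTH):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return total | (carry << WIDTH)


def carry_lookahead_add(a: int, b: int) -> int:
    """Lookahead design: every carry is formed directly from generate/propagate terms."""
    g = [(a >> i) & (b >> i) & 1 for i in range(WIDTH)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(WIDTH)]
    # c[i+1] = g[i] | p[i]g[i-1] | ... | p[i]...p[1]g[0]  (carry-in assumed 0)
    c = [0] + [
        int(any(g[j] and all(p[k] for k in range(j + 1, i + 1)) for j in range(i + 1)))
        for i in range(WIDTH)
    ]
    total = sum((p[i] ^ c[i]) << i for i in range(WIDTH))
    return total | (c[WIDTH] << WIDTH)


def majority(x: int, y: int, z: int) -> int:
    """Bitwise two-of-three vote."""
    return (x & y) | (y & z) | (x & z)


def stuck_at(adder: Adder, bit: int, value: int) -> Adder:
    """Hypothetical fault model: force one output bit of a module to 0 or 1."""
    def faulty(a: int, b: int) -> int:
        out = adder(a, b)
        return (out | (1 << bit)) if value else (out & ~(1 << bit))
    return faulty


def tdtmr(a: int, b: int, modules: Dict[str, Adder]) -> Tuple[int, List[str]]:
    """Vote over three different designs and report which design(s) disagreed."""
    outputs = {name: fn(a, b) for name, fn in modules.items()}
    voted = majority(*outputs.values())
    return voted, [name for name, out in outputs.items() if out != voted]


MODULES: Dict[str, Adder] = {
    "behavioral": behavioral_add,
    "ripple": ripple_carry_add,
    "cla": carry_lookahead_add,
}

if __name__ == "__main__":
    faulty = dict(MODULES, ripple=stuck_at(ripple_carry_add, bit=2, value=1))
    print(tdtmr(1, 1, faulty))  # → (2, ['ripple'])
```

In hardware the three designs would map to different logic and routing, so a single upset is less likely to corrupt two of them in the same way; the sketch only shows the voting and disagreement-reporting behavior, not that robustness.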
The contributions of this work include an analysis of commercial off the shelf (COTS) 
reconfigurable electronics in radiation environments as an alternative to the radiation hardened 
devices currently in use.  Overall, the main contributions of this work are as follows: 
1. A test platform, built around an FPGA module, that can send, receive, and process data 
while under radiation. 
2. Real-time diagnostics that uncover an error and pinpoint its location with the best possible 
resolution in the least time.  
3. Design hardening of a commercial FPGA using TDTMR. 
1.4 Thesis Organization 
 This thesis is organized into five chapters.  Following this introduction, Chapter 2 presents 
background information on the topic and its related sources, covering radiation effects on 
electronics, current TMR methods used in circuit design, and testing platforms.  
 Chapter 3 discusses the methodology used in this research and the design choices made for 
the particular implementation used to complete the project.  Chapter 4 presents the results 
obtained with this methodology along with a critical analysis of them.  Finally, Chapter 5 
summarizes the work achieved in this thesis and discusses possible future research in this area.  
II. Background 
2.1 Chapter Overview 
 This chapter describes key background information related to the topic of this thesis: 
FPGAs, TMR, fault detection and correction, radiation effects on electronics, and radiation 
sources.  A set of improvements made to TMR is also described.  Radiation affects electronics in 
many ways, but only the Total Dose Effect, Single Event Effects, and Single Event Upsets are 
explained here.  Finally, the gamma and thermal radiation sources are described. 
2.2 Field Programmable Gate Arrays (FPGA) 
 Field Programmable Gate Arrays (FPGAs) are increasingly in demand by circuit designers in all fields due to their flexibility in meeting multiple requirements such as high performance, low cost, and on-the-fly reprogramming.  FPGAs are known to be slower, less energy efficient, and generally less capable than their fixed ASIC counterparts.  However, their decreasing cost and the shorter development time needed to implement FPGA designs, compared to designs using discrete logic devices, have made programmable logic devices favorable in space and avionics applications as well [7].  These integrated circuits contain an array of Configurable Logic Blocks (CLBs) and programmable interconnects that allow different gates and structures to be connected.  CLBs are made of basic elements which include look-up tables, multiplexers, and flip-flops, and the device also contains routing logic, pass transistors, and I/O pads.  Each CLB can implement any Boolean function of its inputs and can be linked with other CLBs via routing blocks to implement more complex logic.  The CLBs are interconnected through a general routing matrix that comprises an array of routing switches located at the intersections of horizontal and vertical routing channels [8]. 
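The statement that a CLB can implement any Boolean function of its inputs follows from how a look-up table works: the LUT's configuration bits store the function's truth table, and the inputs simply select one entry.  The Python sketch below models a generic k-input LUT under that assumption; the `LUT` class and `lut_from_function` helper are hypothetical illustrations, not a Xilinx API, and routing, carry logic, and flip-flops are ignored.

```python
class LUT:
    """Hypothetical k-input look-up table: its configuration is a 2**k-entry truth table."""

    def __init__(self, k, truth_table):
        if len(truth_table) != 2 ** k:
            raise ValueError("truth table must have 2**k entries")
        self.k = k
        self.bits = [b & 1 for b in truth_table]

    def eval(self, *inputs):
        """The inputs form an address into the stored truth table (input 0 is the LSB)."""
        if len(inputs) != self.k:
            raise ValueError("expected %d inputs" % self.k)
        addr = 0
        for i, bit in enumerate(inputs):
            addr |= (bit & 1) << i
        return self.bits[addr]


def lut_from_function(k, fn):
    """'Program' a LUT with any Boolean function fn of k inputs by tabulating it."""
    table = []
    for addr in range(2 ** k):
        inputs = [(addr >> i) & 1 for i in range(k)]
        table.append(fn(*inputs) & 1)
    return LUT(k, table)


# Example: a 3-input majority function, the same function a TMR voter computes per bit.
maj = lut_from_function(3, lambda a, b, c: (a & b) | (a & c) | (b & c))
xor3 = lut_from_function(3, lambda a, b, c: a ^ b ^ c)
```

Any other three-input function is obtained the same way by changing only the stored table, which is why the same hardware can be reconfigured on the fly.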
 
 FPGA devices have been used in space for more than a decade with mixed success; however, until recently few reprogrammable devices had been used on spacecraft because of their sensitivity to involuntary reconfiguration caused by Single Event Upsets (SEUs) [4].  Space electronics designers are now more willing to use FPGAs in high radiation environments in place of radiation hardened devices, since FPGAs perform well in the high throughput signal processing applications often used in space.  With the rising cost of radiation hardened devices, research into hardened-by-design FPGA techniques has grown significantly.   
 As previously mentioned, despite the growth of FPGA development, FPGAs remain susceptible to radiation-induced errors.  Since SRAM-based FPGAs store their programming data, or configuration, in an SRAM-like configuration memory, radiation can actually alter the intended circuit [6].  The static memory elements and combinational logic paths are susceptible to upsets from heavy ion particles in the space environment.  Protection of the combinational logic is therefore required to avoid involuntary changes of functionality, and mitigation techniques must be developed to account for these errors and ensure reliable operation.  
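Because the LUT's function is held in configuration memory, an upset in that memory changes the circuit itself, not just a data value.  The sketch below, assuming a hypothetical two-input LUT programmed as an AND gate (`lut_eval` and `flip_config_bit` are illustrative names), flips one configuration bit and lists the inputs for which the upset circuit now disagrees with the intended one.

```python
def lut_eval(config_bits, a, b):
    """Evaluate a 2-input LUT; config_bits[addr] is the output for addr = (b << 1) | a."""
    return config_bits[(b << 1) | a]


def flip_config_bit(config_bits, index):
    """Model an SEU as a single bit flip in the LUT's configuration memory."""
    upset = list(config_bits)
    upset[index] ^= 1
    return upset


and_config = [0, 0, 0, 1]                       # intended function: a AND b
upset_config = flip_config_bit(and_config, 2)   # strike on the entry for a=0, b=1

# Inputs for which the upset circuit disagrees with the intended circuit.
mismatches = [(a, b) for a in (0, 1) for b in (0, 1)
              if lut_eval(and_config, a, b) != lut_eval(upset_config, a, b)]
```

In this example the upset AND gate simply passes input b through, and the wrong function persists until the configuration memory is rewritten.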
 Even though the use of FPGAs in space and radiation environments is rapidly expanding, radiation-induced errors remain a significant hindrance to their performance.  New strategies based on hardening by design, rather than on radiation hardened parts, have been developed to support the growing use of FPGAs in the space environment.  These strategies for mitigating the effects of radiation errors motivate this research along with other research in this field.   
2.3 Triple Modular Redundancy (TMR) 
 Recently, several TMR methods have been introduced in an effort to reliably combat the persistent problem of errors in integrated circuits, and some have been more promising than others.  The approach to TMR chosen for this study was design hardening, also categorized as fault redundancy and correction.   
TMR has been widely used as a form of design hardening to mitigate upsets as they occur in the device configuration, and it has been shown to greatly improve the reliability of FPGA designs subject to SEUs [2].  This mitigation technique traditionally uses three identical copies of a circuit running in parallel, whose outputs pass through a majority voter circuit.  If one bit in one copy is in error, the voter outvotes it with the two correct copies.  To improve on the basic concept of TMR, a few design enhancements were made to the simple three-circuit and voter layout (Figure 1). 
 
Figure 1 – Basic TMR 
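The majority voter at the heart of basic TMR can be written as a bitwise 2-of-3 function.  The following Python sketch (the `majority_vote` name is illustrative) treats each copy's output as an integer word, so a single flipped bit in one copy is outvoted, while identical errors in two copies are not.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit takes the value held by at least two inputs."""
    return (a & b) | (a & c) | (b & c)


good = 0b1011_0110
faulty = good ^ 0b0000_1000   # one bit flipped in one copy

voted = majority_vote(good, faulty, good)        # a single faulty copy is outvoted
outvoted = majority_vote(faulty, faulty, good)   # two copies wrong in the same bit
```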
 The first enhancement was to expand the one-bit voter layout shown in Figure 2 and triplicate the voter unit so that the voter is no longer a single point of failure (Figure 3).  This enhancement significantly reduces the configuration sensitivity of the design [2].   
 
 
Figure 2 – TMR Circuit Diagram 
 
Figure 3 – TMR with Triplicated Voter 
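The benefit of triplicating the voter can be seen by modeling a fault in the voter itself.  In this hypothetical sketch, a shared voter with a corrupted output corrupts all three lanes feeding the next stage, whereas with three voters the same fault corrupts only one lane, which the next majority stage then masks.  The function names and the fault model (an XOR mask on one voter's output) are assumptions for illustration.

```python
def majority_vote(a, b, c):
    """Bitwise 2-of-3 majority."""
    return (a & b) | (a & c) | (b & c)


def single_voter_stage(copies, voter_fault_mask=0):
    """One shared voter drives all three downstream copies, so its fault reaches every lane."""
    out = majority_vote(*copies) ^ voter_fault_mask
    return [out, out, out]


def triplicated_voter_stage(copies, faulty_voter=None, voter_fault_mask=0):
    """Three voters, one per downstream copy; a fault in one voter corrupts only its own lane."""
    outs = []
    for i in range(3):
        out = majority_vote(*copies)
        if i == faulty_voter:
            out ^= voter_fault_mask
        outs.append(out)
    return outs


value = 0b1010
copies = [value, value, value]
shared = single_voter_stage(copies, voter_fault_mask=0b0001)
split = triplicated_voter_stage(copies, faulty_voter=1, voter_fault_mask=0b0001)

# A downstream majority stage sees three wrong lanes in the first case, one in the second.
after_shared = majority_vote(*shared)
after_split = majority_vote(*split)
```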
It was shown that these two designs suffer from resynchronization problems [3]: if a faulty bit in one copy of the logic is repaired, the state held by that copy is not resynchronized with the other two copies.  The correction introduces a slight delay which, compounded over time, may cause a synchronization issue.  This problem can be prevented by placing the voting circuitry within the feedback path of the circuit [3] (Figure 4).  This simple addition helped prevent synchronization problems and increased the reliability of the circuit.  Each of these designs was tested for its reliability against SEUs, and Table 1 shows the results from a series of tests on four different TMR designs.   
  
 
Table 1 – Evaluation Results of TMR Designs
                        Simple Incrementer                 Up/Down Loadable Counter
Design (single clock)   LUTs       Failures  Speed (MHz)   LUTs       Failures  Speed (MHz)
No Redundancy           8          446       220           10         463       220
1 Voter                 35 (~4x)   410       217 (99%)     41 (~4x)   484       217 (99%)
3 Voters                51 (~6x)   14        199 (91%)     57 (~6x)   14        213 (97%)
Feedback                51 (~6x)   14        160 (73%)     57 (~6x)   15        157 (72%)
Map Feedback            27 (~3x)   15        194 (88%)     N/A        N/A       N/A
Speed percentages are relative to the design with no redundancy; LUT multipliers give the approximate area overhead.
 
 
Figure 4 – TMR Design with Feedback 
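The effect of voting in the feedback path can be illustrated with a triplicated counter.  The sketch below assumes a hypothetical 8-bit counter and models an SEU as an XOR on one copy's state register; without feedback voting each copy increments its own state, whereas with feedback voting every copy reloads from the voted value on the next clock.

```python
def majority_vote(a, b, c):
    """Bitwise 2-of-3 majority."""
    return (a & b) | (a & c) | (b & c)


WIDTH_MASK = 0xFF  # hypothetical 8-bit counter state


def next_state(regs, vote_in_feedback):
    """Advance three redundant counter registers by one clock."""
    if vote_in_feedback:
        voted = majority_vote(*regs)
        return [(voted + 1) & WIDTH_MASK for _ in regs]   # every copy reloads from the voted value
    return [(r + 1) & WIDTH_MASK for r in regs]           # each copy only sees its own state


def run(vote_in_feedback, cycles=5, upset_cycle=2, upset_copy=0, upset_mask=0x10):
    regs = [0, 0, 0]
    for cycle in range(cycles):
        if cycle == upset_cycle:
            regs[upset_copy] ^= upset_mask   # SEU in one copy's state register
        regs = next_state(regs, vote_in_feedback)
    return regs


without_feedback = run(vote_in_feedback=False)
with_feedback = run(vote_in_feedback=True)
```

Without feedback voting the voted output is still correct, but copy 0 stays permanently out of step, so a later upset in either remaining copy would defeat the vote; with feedback voting all three copies agree again after one clock.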
Several recent enhancements to TMR have also been developed.  Three newer techniques were explored: Functional Triple Modular Redundancy (FTMR), Selective Triple Modular Redundancy (STMR), and Partial Triple Modular Redundancy.  FTMR shows that both sequential and combinational blocks can be protected by means of TMR [4].  STMR extends the basic TMR technique by identifying “sensitive” gates in the circuit and then introducing TMR selectively at those gates [5].  Finally, Partial TMR extends STMR by giving priority to the circuit components that are more susceptible to persistent errors and applying TMR to them [6].   
 
 Each of these methods offers an enhanced version of TMR.  The TDTMR used in this research incorporates some of these enhancements, but its main difference is the use of three different logic designs instead of three identical copies. 
2.4 Fault Detection and Correction 
 Fault detection and correction have long been crucial aspects of digital circuit design.  As the number of logic gates per design continues to increase, it has become virtually impossible to exhaustively test every possible input combination of a circuit.  A defect is an error introduced into a device during the manufacturing process.  A fault is said to be detected by a test pattern if, when that pattern is applied to the primary inputs, the faulty circuit produces primary outputs that differ from those of the original, fault-free design.   
High level fault modeling makes simulation-based design verification possible.  Bridging faults, delay faults, and stuck-at faults are the most popular fault models in digital testing at this level [9].  The single stuck-at fault model has been the most versatile fault model for testing circuit logic.  A stuck-at fault is assumed to affect only the interconnections between gates: a signal line is permanently held at logic 0 (stuck-at-0) or logic 1 (stuck-at-1).  Consequently, a circuit with n signal lines has potentially 2n single stuck-at faults.  The goal is to find a set of test patterns that detects all possible faults in each circuit design.  Automatic Test Pattern Generation (ATPG) was developed to find manufacturing defects by identifying a small number of test patterns that detect a large number of possible faults. 
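The single stuck-at model and the detection criterion above can be made concrete with a tiny circuit.  The sketch below uses a hypothetical circuit (g = a AND b, y = g OR c), treats each named line as one fault site (fanout branches, which would add more sites, are ignored), enumerates the 2n stuck-at faults, and finds the input patterns that detect each one by comparing faulty and fault-free outputs.

```python
from itertools import product

# Hypothetical example circuit: g = a AND b, y = g OR c.  Signal lines: a, b, c, g, y.
LINES = ["a", "b", "c", "g", "y"]


def simulate(a, b, c, fault=None):
    """Evaluate the circuit; fault is (line, stuck_value) or None for the fault-free circuit."""
    def force(line, value):
        return fault[1] if fault is not None and fault[0] == line else value

    a, b, c = force("a", a), force("b", b), force("c", c)
    g = force("g", a & b)
    return force("y", g | c)


# Each of the n lines can be stuck-at-0 or stuck-at-1: 2n single stuck-at faults.
FAULTS = [(line, v) for line in LINES for v in (0, 1)]


def detects(pattern, fault):
    """A pattern detects a fault if the faulty and fault-free primary outputs differ."""
    return simulate(*pattern) != simulate(*pattern, fault=fault)


detecting_patterns = {
    f: [p for p in product((0, 1), repeat=3) if detects(p, f)] for f in FAULTS
}
```

For example, only the pattern (a, b, c) = (1, 1, 0) exposes g stuck-at-0, since every other pattern produces the same output with or without the fault.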
Table 2 – History of Algorithm Speedups [9]
Algorithm                   Estimated speedup over D-ALG        Year
                            (normalized to D-ALG CPU time)
D-ALG [551]                 1                                   1966
PODEM [258]                 7                                   1981
FAN [229, 232, 233]         23                                  1983
TOPS [360]                  292                                 1987
SOCRATES [576]              1574 (ATPG system)                  1988
Waicukauski et al. [708]    2189 (ATPG system)                  1990
EST [110, 253]              8765 (ATPG system)                  1991
TRAN [122]                  3005 (ATPG system)                  1993
Recursive learning [376]    485                                 1995
Tafertshofer et al. [648]   25057                               1997
 ATPG is a testing method developed to find test sequences that distinguish correct circuit behavior from faulty circuit behavior.  The goal of ATPG is to find a set of test patterns that achieves the highest fault coverage.  A pattern set with 100% fault coverage contains tests that detect every possible stuck-at fault in a circuit.  However, 100% fault coverage does not necessarily guarantee high quality, since other faults such as bridging or open faults could still occur.  In some cases a fault cannot be exposed by any of the generated input sequences.  Such a fault may be intrinsically undetectable, meaning that no test pattern exists that can detect it; these faults are redundant in the sense that their presence does not influence the observable circuit functionality.  Since the ATPG problem is NP-complete, meaning that no known algorithm solves every instance in a practical amount of time, there are also cases where detecting patterns exist but the ATPG algorithm gives up because finding them would take too long [9]. 
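Fault coverage and redundant faults can be illustrated in the same style.  The hypothetical circuit below computes y = a OR (a AND b), which is logically just y = a, so some stuck-at faults cannot change its output under any input.  The sketch measures the coverage of a pattern set as the fraction of modeled faults it detects and identifies redundant faults by exhaustive simulation, which is only practical for very small circuits.

```python
from itertools import product

# Hypothetical circuit with built-in redundancy: g = a AND b, y = a OR g (logically y = a).
LINES = ["a", "b", "g", "y"]
FAULTS = [(line, v) for line in LINES for v in (0, 1)]
ALL_PATTERNS = list(product((0, 1), repeat=2))


def simulate(a, b, fault=None):
    """Evaluate y; fault is (line, stuck_value) or None."""
    def force(line, value):
        return fault[1] if fault is not None and fault[0] == line else value

    a, b = force("a", a), force("b", b)
    g = force("g", a & b)
    return force("y", a | g)


def detected_by(patterns):
    """Set of faults for which at least one pattern produces a differing output."""
    return {f for f in FAULTS for p in patterns
            if simulate(*p) != simulate(*p, fault=f)}


def fault_coverage(patterns):
    return len(detected_by(patterns)) / len(FAULTS)


# Faults that no input pattern can expose are redundant (undetectable).
redundant = sorted(set(FAULTS) - detected_by(ALL_PATTERNS))
```

Here even exhaustive testing reaches only 5/8 = 62.5% coverage, because three of the eight modeled faults are redundant, and the two patterns (0, 0) and (1, 0) already achieve that maximum.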
 Historically, many algorithms have been developed to generate test patterns that detect faults in circuit designs.  Testing integrated circuits with high fault coverage has proven to be a daunting task because of their complexity.  One of the earliest and most fundamental algorithms used for ATPG today is the D-Algorithm, which provided the building blocks needed to develop faster and more efficient ATPG algorithms.  Table 2 presents a brief history of ATPG algorithms with their estimated speedups relative to the D-Algorithm.   
These algorithms employ heuristics that find all necessary signal assignments for a test as early as possible.  It has been shown that achieving considerable fault coverage is far more complex for sequential circuits than for combinational circuits.  For combinational fault simulation, the complexity is O(n²), where Big-O notation describes how the cost grows with the circuit size n.  For sequential fault simulation, the complexity is estimated to be between O(n²) and O(n³) based on empirical measurements [9].  These algorithms have been employed to test manufactured circuits for defects and continue to be developed to find faster and more efficient ways of detecting faults. 
2.5 Fault Diagnosis  
Faults are understood to be an abnormal change of system function, or a defect at the component, equipment, or subsystem level, that may or may not lead to physical failure or breakdown [10].  If faults occur, the outcome can be catastrophic and may even endanger lives, so uncovering the location of faults is critical.  Traditional approaches to fault diagnosis have included installing multiple sensors and hardware, analytical or functional redundancy, and combinations of hardware and analytical redundancy [11]. 
The purpose of this research was to implement a method of detecting and diagnosing faults within a circuit while it is under radiation.  Diagnosing the location of faults and errors in a circuit remains an active area of investigation.  Much of the recent progress in fault diagnosis can be credited to the extensive use of fault equivalence to reduce the number of fault conditions for analysis, since only one fault from each fault equivalence class needs to be retained as a representative [12].  However, finding diagnostic methods that are suitable for real-time execution remains difficult.   
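Fault equivalence can be demonstrated by grouping faults whose faulty circuits behave identically on every input.  The sketch below does this by exhaustive simulation of a single two-input AND gate, which is only a small illustration: practical tools typically collapse faults with structural rules rather than simulating every input.  For the AND gate, a stuck-at-0, b stuck-at-0, and y stuck-at-0 all force the output to 0, so they fall into one class and only one of them needs to be kept.

```python
from collections import defaultdict
from itertools import product

# Single two-input AND gate: y = a AND b.  Fault sites: a, b, y.
LINES = ["a", "b", "y"]
FAULTS = [(line, v) for line in LINES for v in (0, 1)]
PATTERNS = list(product((0, 1), repeat=2))


def simulate(a, b, fault=None):
    """Evaluate the gate; fault is (line, stuck_value) or None."""
    def force(line, value):
        return fault[1] if fault is not None and fault[0] == line else value

    a, b = force("a", a), force("b", b)
    return force("y", a & b)


def collapse(faults):
    """Group faults whose faulty circuits give the same output on every pattern."""
    classes = defaultdict(list)
    for f in faults:
        response = tuple(simulate(*p, fault=f) for p in PATTERNS)
        classes[response].append(f)
    return list(classes.values())


classes = collapse(FAULTS)
representatives = [cls[0] for cls in classes]   # one fault kept per equivalence class
```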
 
The goal of this research was to use similar diagnostic methods and algorithms to detect errors within the circuit design and diagnose the location of the faults with the best possible resolution, meaning that each fault is narrowed to the minimum number of candidate locations that can be distinguished using stuck-at fault modeling.  This diagnostic information gives the user a better basis for deciding what action to take to correct the problem.  This research investigates whether real-time diagnosis under radiation can be successful.   
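One common way to reach this kind of resolution is a fault dictionary: each modeled fault is simulated on the diagnostic patterns, and faults are indexed by the response they produce.  An observed faulty response then maps directly to the set of candidate faults.  The sketch below is a hypothetical illustration using a small example circuit (g = a AND b, y = g OR c) and exhaustive patterns; it is not the diagnosis algorithm implemented in this thesis.

```python
from collections import defaultdict
from itertools import product

# Hypothetical example circuit: g = a AND b, y = g OR c.
LINES = ["a", "b", "c", "g", "y"]
FAULTS = [(line, v) for line in LINES for v in (0, 1)]
DIAG_PATTERNS = list(product((0, 1), repeat=3))   # exhaustive here; a real set would be smaller


def simulate(a, b, c, fault=None):
    """Evaluate y; fault is (line, stuck_value) or None."""
    def force(line, value):
        return fault[1] if fault is not None and fault[0] == line else value

    a, b, c = force("a", a), force("b", b), force("c", c)
    g = force("g", a & b)
    return force("y", g | c)


def build_dictionary(patterns):
    """Index every modeled fault by the response signature it produces."""
    dictionary = defaultdict(set)
    for f in FAULTS:
        signature = tuple(simulate(*p, fault=f) for p in patterns)
        dictionary[signature].add(f)
    return dictionary


def diagnose(observed, dictionary):
    """Candidate faults consistent with the observed responses (fewer = finer resolution)."""
    return dictionary.get(tuple(observed), set())


dictionary = build_dictionary(DIAG_PATTERNS)
# Suppose the device under test responds as if g were stuck-at-0 (y = c).
observed = [simulate(*p, fault=("g", 0)) for p in DIAG_PATTERNS]
candidates = diagnose(observed, dictionary)
```

The candidate set {a stuck-at-0, b stuck-at-0, g stuck-at-0} cannot be narrowed further because those faults are equivalent, which is the limit on resolution that stuck-at modeling imposes.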
2.6 Radiation Effects on Electronics 
 Operational reliability is one of the principal concerns in microelectronic systems.  This is particularly true of space-bound systems, since they are exposed to ionizing radiation and their operating conditions do not allow quick and easy restoration of failed or malfunctioning components.  The harsh space environment can cause severe damage to and malfunction of unprotected electronics.  Protons and electrons trapped in the Earth’s radiation belts, together with cosmic rays, pose major challenges to the normal operation of space electronics.  Long exposure to the energetic particles of space can degrade even the best device’s performance, leading to component failure.  Everything from major components to the wiring and cabling of electronic devices can be seriously affected by radiation. 
 This section explores the radiation effects on electronics that relate to the research goals of this thesis.  Three key issues are discussed further: total dose effects, single event effects (SEE), and single event upsets (SEU). 
2.6.1 Total Dose Effect 
 Electronic devices in space suffer long-term radiation effects, mostly due to electrons and protons.  Total dose effects refer to the integrated radiation dose accumulated by space electronics over a period of time; they can reduce mission lifetimes through long-term damage to devices, ICs, or solar cells.  Long-term exposure can cause device threshold shifts, increased device leakage and power consumption, timing changes, and decreased functionality [13].  After exposure to sufficient total-dose radiation, most insulating materials, such as capacitor dielectrics, circuit-board materials, and cabling insulators, become less insulating and more electrically leaky, and certain conductive materials, such as metal-film resistors, can also change their characteristics [14].  These changes may not be constant with time after irradiation and may depend on the dose rate at which the radiation is received.   
 The radiation damage in the silicon dioxide layers consists of three components: the buildup of trapped charge in the oxide, an increase in the number of interface traps, and an increase in the number of bulk oxide traps [15].  The ionizing radiation primarily affects the gate and field oxide layers.  In CMOS devices, the gate oxide is ionized by the dose it absorbs, producing electron-hole pairs in the insulating layers.  The electrons have high mobility, but the holes have much lower mobility.  The free electrons and holes drift under the influence of the electric field induced in the oxide by the gate voltage.  The holes that escape “initial” recombination move through the oxide toward the silicon/silicon dioxide interface by hopping through localized states in the oxide [16].  A small number of holes become trapped in the gate oxide.  Trapped charge in the oxide and at the interface changes the threshold voltage and mobility of the gate and field-oxide transistors, thereby modifying their characteristics [23].  The accumulated charge can be high enough to keep a transistor permanently on or off, so that its source-drain current is no longer controlled by the gate, leading to device failure.  Trapped holes are not stable; they gradually anneal with time.  The overall effect depends on bias conditions and device technology.  As devices become smaller, the gate oxides of these shrinking transistors grow thinner, and a thinner gate oxide traps less positive charge overall [14].  Therefore, transistors in smaller technologies are becoming inherently more resistant to total dose effects.  Figure 5 shows how the total dose affects the threshold voltage, causing a shift in both ‘n’ and ‘p’ transistors.  The threshold voltage continues to change during the annealing process after irradiation, as shown in Figure 6.  
 
Figure 5 – Threshold Voltage of ‘n’ and ‘p’ Transistors during Irradiation [15] 
 
Figure 6 – Irradiation and Annealing Effects [15] 
 
Since the number of electron-hole pairs generated is directly proportional to the amount 
of energy absorbed by the device material, the total damage is also roughly proportional to the 
total dose of radiation received by the device [15].   
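Since total dose is the time integral of the dose rate, a test that moves the device between dose rates accumulates dose piecewise, and the oxide charge generated scales with that total.  The Python sketch below assumes constant dose rates within each interval; the dose rates and durations are hypothetical, and 8.1 × 10^12 pairs per cm³ per rad(SiO2) is a commonly quoted generation constant for silicon dioxide.  It ignores the recombination and annealing effects described above.

```python
# Commonly quoted generation constant for SiO2: electron-hole pairs per cm^3 per rad(SiO2).
G0_SIO2 = 8.1e12


def total_dose(exposures):
    """Total dose (rad) from (dose_rate_rad_per_s, duration_s) intervals at constant rate."""
    return sum(rate * seconds for rate, seconds in exposures)


def pairs_generated_per_cm3(dose_rad, g0=G0_SIO2):
    """Electron-hole pairs generated per cm^3 of oxide; directly proportional to dose."""
    return g0 * dose_rad


# Hypothetical exposure history: 1 hour at 10 rad/s, then 30 minutes at 5 rad/s.
exposures = [(10.0, 3600.0), (5.0, 1800.0)]
dose = total_dose(exposures)            # 36000 + 9000 rad
pairs = pairs_generated_per_cm3(dose)
```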
2.6.2 Single Event Effects  
 Another important category of radiation effects to which an integrated circuit is vulnerable is Single Event Effects (SEE).  An SEE occurs when a single high-energy particle strikes a device, leaving behind an ionized track, and can lead to sudden device or system failure (Figure 7).  These failures result from the charge deposited by a single particle crossing a sensitive region in the device and are a function of the amount of charge collected at the sensitive node and the node state [13].  Charge liberated along the path of the impinging particle is collected at a circuit node.  The ionized track contains equal numbers of electrons and holes and is therefore electrically neutral.   
 
Figure 7 – Cosmic Ray Through the Drain of an NMOS Transistor 
The total charge deposited is proportional to the linear energy transfer (LET) of the incoming particle and to its path length through the sensitive region.  Every memory device has a critical charge; if the collected charge exceeds it, an SEE or other undesirable phenomenon can result [14].  The three largest categories of SEEs are Single Event Upsets (SEU), Single Event Latch-up (SEL), and Single Event Burnout (SEB).  A soft error, or upset (SEU), is a change in the information stored in the circuit.  A hard error is characterized by permanent or semi-permanent damage, such as latch-up (SEL) or burnout (SEB).  Another source of SEEs is impurities in the device material.  There may be traces of uranium or thorium, both naturally radioactive elements that decay by alpha emission, and the emitted alpha particle can then deposit its charge and cause an SEE. 
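The proportionality between LET and deposited charge can be made concrete for silicon, where about 3.6 eV is needed to create each electron-hole pair.  The sketch below converts an LET in MeV·cm²/mg into charge per micrometre of path and applies a simple critical-charge criterion; the collection depths and critical charges in any call are hypothetical, since real values depend on the cell design and technology.

```python
SI_DENSITY_MG_PER_CM3 = 2329.0        # density of silicon, 2.329 g/cm^3
EV_PER_PAIR_SI = 3.6                  # mean energy needed to create one electron-hole pair in Si
ELECTRON_CHARGE_C = 1.602176634e-19   # elementary charge


def deposited_charge_pc(let_mev_cm2_per_mg, path_um):
    """Charge (pC) liberated in silicon by a particle with the given LET over path_um micrometres."""
    ev_per_um = let_mev_cm2_per_mg * SI_DENSITY_MG_PER_CM3 * 1e6 * 1e-4   # MeV/cm -> eV/um
    pairs = ev_per_um * path_um / EV_PER_PAIR_SI
    return pairs * ELECTRON_CHARGE_C * 1e12


def causes_upset(let_mev_cm2_per_mg, collection_depth_um, critical_charge_pc):
    """Crude criterion: an upset occurs if the collected charge reaches the critical charge."""
    return deposited_charge_pc(let_mev_cm2_per_mg, collection_depth_um) >= critical_charge_pc
```

With these constants, an LET of roughly 97 MeV·cm²/mg corresponds to about 1 pC of charge per micrometre of path in silicon, a conversion frequently used in SEE testing.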
In the space environment, circuit designers have been concerned with both protons and cosmic rays, which lead to a greatly increased SEE rate.  For cosmic rays, SEEs are typically caused by their heavy ion component, which can easily penetrate the structure of an integrated circuit.  Cosmic rays may be galactic or solar in origin.  Protons, usually trapped in the Earth's radiation belts or emitted by solar flares, may cause direct ionization SEEs in very sensitive devices.  More typically, however, a proton causes a nuclear reaction near a sensitive device area, creating an indirect ionization effect that can cause an SEE. 
2.6.3 Single Event Upsets 
A cosmic ray, or a secondary ion released by a high-energy proton-induced nuclear reaction, can deposit enough energy within a sensitive node to upset an integrated circuit.  These single event upsets (SEUs) are analogous to the soft errors caused in electronics or avionics by energetic alpha particles or atmospheric neutrons.  NASA defines SEUs as "radiation-induced errors in microelectronic circuits caused when charged particles (usually from the radiation belts or from cosmic rays) lose energy by ionizing the medium through which they pass, leaving behind a wake of electron-hole pairs."  An SEU usually manifests itself as a state change, or "bit-flip", of a single data bit or memory cell that causes a momentary glitch in the device output (Figure 8).   
 
 
Figure 8 – Example of SEU 
If enough of these upsets occur, or if a single critical node is affected, a reset or rewrite of the device would be required to restore normal device behavior.  Single event upsets occur in computer memories, microprocessors, controllers, and almost any digital circuit containing memory elements.  They do not cause lasting damage to the device, but they may cause lasting problems in a system that cannot recover from such an error.  In very sensitive devices, a single ion can also hit two or more bits, causing simultaneous errors in adjacent memory cells known as multiple-bit upsets (MBUs).  As minimum device feature sizes are scaled down to smaller and smaller dimensions, the susceptibility to SEUs has been found to increase remarkably [15].   
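Both kinds of upsets can be modeled as XOR masks on a stored word: a single-bit mask for an SEU and a run of adjacent bits for an MBU.  The sketch below (hypothetical helper names) combines them with a bitwise majority voter to show that TMR masks any upset confined to one copy, even a multiple-bit one, but not upsets that hit the same bit of two copies.

```python
def seu(word, bit):
    """Single event upset: flip one stored bit."""
    return word ^ (1 << bit)


def mbu(word, first_bit, width=2):
    """Multiple-bit upset: one strike flips `width` adjacent bits."""
    return word ^ (((1 << width) - 1) << first_bit)


def majority_vote(a, b, c):
    """Bitwise 2-of-3 majority."""
    return (a & b) | (a & c) | (b & c)


value = 0b0110_1001

# An MBU confined to one redundant copy is still outvoted by the two clean copies.
masked = majority_vote(mbu(value, 3), value, value)

# If upsets hit the same bit position in two copies, the voter outputs the wrong value.
defeated = majority_vote(seu(value, 3), seu(value, 3), value)
```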
Research and mitigation techniques within this field have been growing significantly.  New methods have been investigated to reduce the number of SEUs a circuit experiences at a given time.  An alternative approach to reducing SEU and transient upset levels, as well as eliminating the possibility of latch-up, is to use silicon-on-sapphire or silicon-on-insulator technologies to build CMOS circuits [15].  Other methods use the hardened-by-design methodology, e.g., Triple Modular Redundancy.  Through these mitigation techniques and research, these phenomena can be carefully investigated to better understand and predict when upsets will occur in space electronics.   
2.7 Gamma Radiation Source 
The cobalt-60 isotope (Co-60) was used as the source of ionizing radiation for this experiment.  Co-60 undergoes beta decay with a half-life of about 5.27 years, emitting one electron and two gamma rays, as shown in Figure 9. 
Figure 9 – Decay scheme of 60Co [20] 
The Ohio State University (OSU) Nuclear Reactor Lab (NRL) in Columbus, Ohio provided a Co-60 source.  A simplified diagram of the gamma irradiator can be found in Figure 10.  It contains a six-inch-wide aluminum tube with a movable platform that can be raised and lowered out of the irradiator.  The gamma irradiator cell itself sits on the bottom of a pool of water and consists of 14 Co-60 sources evenly spread around the aluminum tube. 
Figure 10 – Co-60 Gamma Irradiator Layout [17] 
 When the device under test (DUT) is lowered into the tube, the radiation dose rate depends on the location of the device relative to the center of the Co-60 source rods.  However, the dose rate curve is based on the distance of the DUT above the bottom of the movable platform when the platform is resting on the bottom of the aluminum tube [17].  The radiation dose curve provided by OSU NRL is depicted in Figure 11. 
Figure 11 – Dose Rate of Co-60 Irradiator [21] 
2.8 Thermal Radiation Source 
A 1600 W Xe lamp thermal simulator was used for the irradiation of the DUT in this thesis.  The schematic shown in Figure 12 is accurate, except that the thermal simulator used in these experiments had its output direction rotated 90 degrees, so that the output was perpendicular to the major axis of the thermal simulator, producing a horizontal beam [18]. 
Figure 12 – Schematic of 1600 W Solar Simulator 
The thermal simulator was assembled by Koehl [19], and the manufacturer recorded the spectral output seen in Figure 13. 
Figure 13 – Spectral Output from 1600W Thermal Simulator 
The spectrum is largely a smooth Planckian distribution, reflecting the source plasma temperature, with superimposed lines from the xenon emission spectrum.  The smooth Planckian portion of the distribution is similar to the measured intensities for a 1 kt nuclear explosion [19]. 
2.9 Previous Work 
 Research on this topic has been performed over the last few years, but the work presented in the last two AFIT theses was inconclusive.  This research fixed many problems in the previous work and adds a significant contribution through real-time detection and diagnosis.   
The original code developed for evaluating circuits under radiation was not operational.  The next version's code was primitive and could not support any firm conclusions.  The previous hardware constructed for communication between the two boards was suboptimal and unusable.  TMR was not implemented properly at the base level, because the code was synthesized and optimized by the Xilinx software rather than built structurally.  The monitoring system for collecting errors did not actually record events in real time; instead, it polled the system at intervals limited by the maximum speed of the RS232 link.   
Not everything in the previous work was inconclusive.  Results from previous work suggested that the baseboard was the cause of most of the observed errors.  The results also demonstrated that the relationship between dose rate and total dose could be significant.  Both previous efforts attempted to characterize the effects of radiation on electronics but failed to fully implement a capable measuring system.  
 The current research corrected these problems.  The testing platform used ribbon cables, buffers, and capacitors to correctly implement a bridge between the two FPGA boards.  TDTMR was implemented structurally and laid out on the FX12 Mini-Module board (Figure 14).  The monitoring system was used only to view the results; the measured data was written to a flash device in real time.  Further detail of the implementation of this research is described in the following chapter. 
 
 
Figure 14 – Layout of the 3 Adders on the FX12 Board 
2.10 Summary 
 This chapter provided a brief background in the key areas investigated in this research.  Many fault redundancy techniques are used in a variety of forms in research; this particular research focuses on TMR.  Radiation effects on electronics also have a very large and comprehensive background, and a full understanding of these effects remains incomplete.  The previous two AFIT researchers in this area did not completely cover the most basic concepts.  Although their attempts produced a few benefits, in the long run there were too many errors to build upon the work they accomplished, so all the work done for this thesis was rebuilt from the ground up.  The goal was to correct these mistakes and expand on the concepts of radiation effects on electronics.  Further research into these areas can be found through the references in the bibliography. 
  
 
III. Methodology 
3.1 Chapter Overview 
 This chapter describes the methodology used to evaluate the testing platform, the detection and diagnosis algorithm, and TDTMR.  The first section breaks down each portion of the testing platform.  The methodology used to create the detection and diagnosis algorithm is also described, along with each test setup for the experiment.  Further descriptions and diagrams can be found in the Appendix. 
3.2 Test Platform 
 The overall testing platform was broken into three parts: the controller unit, which houses all the command logic and processes data for the diagnosis algorithm; the TMR unit or Device Under Test (DUT), which contains the circuitry being evaluated under radiation; and the external devices, which monitor and record all the information being processed.   
 
 
Figure 15 – Block Diagram of Testing Platform 
3.2.1 TMR Design 
 The FPGA device under test (DUT) for these radiation experiments was the Xilinx Virtex 4 FX12 Series Mini-Module mounted on an Avnet Mini-Module Baseboard, pictured in Figure 16.  The FPGA uses 90 nm transistor technology with 10 layers of metal interconnect and triple-oxide technology, with the core running at 1.2 volts (V) [22].  A more complete description of the Virtex 4 Mini-Module can be found in Appendix A. 
 
 
Figure 16 – DUT Virtex 4 Mini-Module 
The DUT component consisted of the new Triple Design TMR (TDTMR), which also uses the triplicated voter enhancement [2].  Instead of three identical copies, the DUT contains three different adder designs.  The Carry Look Ahead Adder and the Ripple Carry Adder were implemented structurally using base level gates; their diagrams can be found in Appendices B and C.  The third adder was implemented behaviorally, meaning the Xilinx software was left to determine how the adder would be laid out.  The three adders' results entered the triplicated voter logic to provide fault redundancy.  The outputs of each adder were also sent to the controller board along with the results of the triplicated voter logic.  Figure 17 shows the design of the entire TDTMR unit.  
 
Figure 17 – Diagram of TDTMR 
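The TDTMR idea of voting over three different adder implementations can be sketched behaviorally in Python.  The model below is a hypothetical illustration, not the VHDL used in this research: it assumes a 4-bit operand width (for instance, a nine-bit input vector split into two 4-bit operands and a carry-in), models the ripple carry and carry look-ahead adders at the bit level, uses Python's `+` for the behavioral adder, and lets the caller corrupt one adder's output with an XOR mask.

```python
WIDTH = 4  # assumed operand width (e.g. 9-bit vector = two 4-bit operands + carry-in)


def to_bits(x):
    """Little-endian list of WIDTH bits."""
    return [(x >> i) & 1 for i in range(WIDTH)]


def ripple_carry_add(a, b, cin):
    """Gate-level ripple carry adder: each full adder waits for the previous carry."""
    total, carry = 0, cin
    for i, (ai, bi) in enumerate(zip(to_bits(a), to_bits(b))):
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return total | (carry << WIDTH)            # sum bits with carry-out as the MSB


def carry_lookahead_add(a, b, cin):
    """Carry look-ahead adder: each carry is computed directly from generate/propagate terms."""
    g = [ai & bi for ai, bi in zip(to_bits(a), to_bits(b))]
    p = [ai ^ bi for ai, bi in zip(to_bits(a), to_bits(b))]
    carries = [cin]
    for i in range(WIDTH):
        # c[i+1] = g[i] | p[i]g[i-1] | ... | p[i]...p[0]cin, expanded rather than rippled
        term, c_next = 1, 0
        for j in range(i, -1, -1):
            c_next |= term & g[j]
            term &= p[j]
        carries.append(c_next | (term & cin))
    total = sum((p[i] ^ carries[i]) << i for i in range(WIDTH))
    return total | (carries[WIDTH] << WIDTH)


def behavioral_add(a, b, cin):
    """Behavioral description: structure left to the synthesis tool."""
    return a + b + cin


def majority_vote(x, y, z):
    return (x & y) | (x & z) | (y & z)


def tdtmr(a, b, cin, corrupt=None):
    """Vote over three different adder designs; corrupt=(adder_index, xor_mask) injects an error."""
    outs = [ripple_carry_add(a, b, cin), carry_lookahead_add(a, b, cin), behavioral_add(a, b, cin)]
    if corrupt is not None:
        index, mask = corrupt
        outs[index] ^= mask
    return majority_vote(*outs), outs
```

Because the three designs compute the same function through different logic, a fault affecting one of them shows up as a disagreement that the voter masks, while the per-adder outputs sent to the controller still reveal which adder failed.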
 
Other components were also added to the DUT for testing purposes: a clock generator and an error generator (Figure 18).  These units were used to simulate fault injection into the design.  A fault injection is an error purposefully implanted into the integrated circuit; it can be controlled by a user or automated through a computer program with specific guidelines. 
 
Figure 18 – Diagram of DUT 
The error unit generated errors at different timing intervals to simulate single errors in the TDTMR.  These generated errors lay dormant unless activated by a switch on the baseboard.  Upon activation, the unit generated four errors at four different timing intervals: one microsecond, one millisecond, two seconds, and eight seconds.   
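Two aspects of the error unit can be sketched: how many injections each generator produces over a test window, and how the controller can tell a masked error from a voter failure by comparing each adder output and the voted output with the golden result.  The text does not state whether each generator fires once or repeatedly, so the sketch assumes periodic generators; all names are hypothetical, and intervals are kept in integer microseconds to avoid floating-point rounding.

```python
# Injection intervals of the error unit, in integer microseconds.
INTERVALS_US = {"1 us": 1, "1 ms": 1_000, "2 s": 2_000_000, "8 s": 8_000_000}


def injections_in_window(duration_us, intervals=INTERVALS_US):
    """Count injections per generator, assuming each one fires periodically at its interval."""
    return {name: duration_us // period for name, period in intervals.items()}


def majority_vote(x, y, z):
    return (x & y) | (x & z) | (y & z)


def compare_with_golden(golden, adder_outputs):
    """Flag each adder lane, and the voted output, that disagree with the golden result."""
    voted = majority_vote(*adder_outputs)
    return {
        "faulty_adders": [i for i, out in enumerate(adder_outputs) if out != golden],
        "voter_error": voted != golden,
    }
```

An error injected into one adder is reported in `faulty_adders` while `voter_error` stays false, which is the signature of an error successfully masked by the TDTMR.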
 The overall goal of this research was to irradiate only the Virtex 4 chip, without the baseboard.  To help eliminate other possible errors and isolate the chip itself, specialized connector cables (Figure 19) were constructed so that only the chip is placed under radiation while the baseboard is protected from possible radiation damage.  No buffers were needed for the chip to be fully functional.  With the current cables, the chip can be up to two feet away from the baseboard without any timing or synchronization problems.   
 
 
Figure 19 – Connector Cables between Chip and Baseboard 
3.2.2 Controller Board 
 The Virtex-II Pro development system was the FPGA board chosen to implement the controller board logic.  It contains a PowerPC processor, which was used to run the diagnostic algorithm.  The Virtex-II Pro is built on 130 nm technology with nine layers of metal.  Output pins were soldered onto the board to allow the use of ribbon cables.  Figure 20 shows the Virtex-II Pro board.  Further description of the Virtex-II Pro can be found in Appendix D. 
 
 
Figure 20 – Virtex-II Pro Controller Board 
 VHSIC Hardware Description Language (VHDL) controlled the basic flow of how the 
hardware operated.  A state machine was developed in VHDL on the controller board to run the 
basic operations.  Figure 21 shows the state diagram used for the controller board.   
 
Figure 21 – State Diagram of Controller Board 
The remaining parts of the controller logic were written in C and run on the PowerPC.  The controller board cycles through nine-bit input vectors that come from either a counter module or a random module, and the user can switch between these two modules at runtime.  A copy of the DUT circuit is also placed in the controller board and acts as the ‘golden circuit’ for comparison purposes; all values from the DUT are compared to the ‘golden circuit’ results.  Figure 22 illustrates the block diagram of the controller logic.  Upon detecting an error, the controller board automatically switches to ‘Diagnostic Mode’, in which the test and diagnostic vectors are applied as inputs.  The PowerPC also runs the algorithm that calculates the location of the error, which is discussed further in this chapter.   
[Block diagram: CONTROL LOGIC containing the Power PC and the Gold Circuit; input sources Counter Module, Random Module, Test Vectors, and Diagnostic Vectors; connections To / From DUT and To External Devices]
 
Figure 22 – Controller Board Block Diagram 
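The two input sources described above can be pictured with the following C sketch. It assumes the counter module simply increments modulo 512. Because the design of the random module is not specified here, the sketch substitutes a hypothetical 9-bit linear-feedback shift register (LFSR) using the commonly tabulated maximal-length taps 9 and 5. The function names are illustrative only.

```c
#include <stdint.h>

#define VECTOR_BITS 9u
#define VECTOR_MASK ((1u << VECTOR_BITS) - 1u) /* 0x1FF: nine input bits */

/* Counter module (sketch): step through all 512 nine-bit vectors in order. */
static uint16_t counter_next(uint16_t v)
{
    return (uint16_t)((v + 1u) & VECTOR_MASK);
}

/* Hypothetical random module: 9-bit Fibonacci LFSR with taps 9 and 5.
 * The new low bit is the XOR of bits 8 and 4 (0-indexed). From any
 * non-zero seed it cycles through all 511 non-zero vectors; the all-zero
 * vector never appears, so a counter pass is still needed to apply it. */
static uint16_t lfsr9_next(uint16_t v)
{
    uint16_t feedback = (uint16_t)(((v >> 8) ^ (v >> 4)) & 1u);
    return (uint16_t)(((v << 1) | feedback) & VECTOR_MASK);
}
```

An LFSR costs only a shift and one XOR per vector, which is why it is a common choice for pseudo-random test pattern generation in hardware. Its one gap, the all-zero vector, is covered by the counter source.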
3.2.3 External Devices 
 The test platform uses two external devices: a laptop attached to the Virtex-II Pro board via an RS232 (serial) cable, and a compact flash card on the board.
 HyperTerminal was used to communicate with the controller board and acted as a monitor of all operations.  Through it, a user can check the number of errors detected, the elapsed run time, the number of vectors checked, and a sample of input vectors with the results from all outputs.  Because the controller board operates significantly faster than HyperTerminal can display output, the input vectors and output results shown are randomly sampled.  The user can also switch among four modes through HyperTerminal: counter, random, test, and debug.  Figure 23 shows an example of the monitoring screen.
 
 
Figure 23 – Hyper-Terminal Monitoring Screen 
 The compact flash card was used to record data in real time for possible post-processing analysis.  Because of the volume of vectors being analyzed, information was logged to the flash card only when an error was detected.  The flash card recorded which input vector caused a failure along with that vector's results.  It also documented the results of each test and diagnostic vector.  Figure 24 shows an example of the flash card output.
 
Figure 24 – Output of the Flash Card 
The card recorded the vector ID, the vector source, all outputs of the adders and voter, the gold circuit output, the input vector, and the time at which the vector was executed.  The FPGA wrote up to two hundred kilobytes of data into a buffer (a temporary file) before writing it out to flash.  This was done as a precaution against errors or problems that could crash the FPGA: if a failure occurred while writing to flash, everything held in the buffer would be lost or corrupted, so keeping the buffered files small limits how much data can be lost in a crash.
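The stage-then-flush logging described above can be sketched in C as follows. This is an illustrative sketch, not the thesis code. It assumes the buffer lives in RAM, reads "two hundred kilobytes" as 200 × 1024 bytes, and uses a standard `FILE *` to stand in for the file on the flash card. All names (`log_buffer`, `log_record`, `log_flush`) are hypothetical. Because records reach flash only in whole-buffer writes, a crash can lose at most the data currently staged.

```c
#include <stdio.h>
#include <string.h>

/* Assumption: "two hundred kilobytes" taken as 200 * 1024 bytes. */
#define LOG_BUF_BYTES (200u * 1024u)

typedef struct {
    char     data[LOG_BUF_BYTES]; /* staging area for log records */
    size_t   used;                /* bytes currently staged */
    FILE    *out;                 /* stands in for the file on the flash card */
    unsigned flushes;             /* number of write-outs issued */
} log_buffer;

/* Write everything staged so far to the output file and empty the buffer.
 * Returns 0 on success, -1 if the write or flush failed. */
static int log_flush(log_buffer *lb)
{
    if (lb->used == 0)
        return 0;
    size_t written = fwrite(lb->data, 1, lb->used, lb->out);
    int status = (written == lb->used && fflush(lb->out) == 0) ? 0 : -1;
    lb->used = 0;      /* start a fresh, empty buffer */
    lb->flushes++;
    return status;
}

/* Stage one text record, flushing first if it would overflow the buffer. */
static int log_record(log_buffer *lb, const char *record)
{
    size_t len = strlen(record);
    if (len > LOG_BUF_BYTES)
        return -1;
    if (lb->used + len > LOG_BUF_BYTES && log_flush(lb) != 0)
        return -1;
    memcpy(lb->data + lb->used, record, len);
    lb->used += len;
    return 0;
}
```

For example, logging 3,000 records of 100 bytes triggers one automatic write-out when the 2,049th record arrives. A final `log_flush` writes the remaining 95,200 bytes.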
3.2.4 Buffer Bridge 
 The Virtex 4 Mini-Module was originally chosen because its small size allows it to fit down a pipe under radiation.  Ribbon cable was used to communicate between the two boards.  Because the pin layouts of the two boards do not match, a bridge board had to be constructed (Figure 25).
 
Figure 25 – Bridge Board 
The bridge board also serves as a buffer that compensates for signal degradation between the two boards.  Since there would be fifteen feet of cable between the controller board and the DUT, high speed CMOS buffers were used to help clean up the signal.  Full details and specifications of the high speed CMOS buffers can be found in Appendix E.  Figure 26 compares the signals with and without the CMOS buffers after fifteen feet of cable.  The bridge board also uses IC sockets with capacitors, which further helped clean up the signal.  Appendix F contains the specifications of these sockets.
 
Figure 26 – Signal Comparisons 
3.3 Operation Speed 
 The PowerPC on the Virtex-II Pro can operate at clock speeds of 300 MHz, while the Virtex 4 Mini-Module can operate at 100 MHz, as can the Virtex-II Pro board itself.  For the two boards to communicate properly, the clock speed of each board had to be reduced.  The biggest hurdle was the distance between the two boards.  Measurements were made with a clock pulse and an oscilloscope over fifteen feet of cable.  With the boards fifteen feet apart, the system, aided by the bridge board, was able to sustain a clean square pulse at 1 MHz.
 Due to the nature of the diagnostic algorithm and the predetermined vector sets (explained later in this thesis), a bottleneck developed between the PowerPC and the Virtex-II Pro.  The instantiated hardware required more time to process the commands issued by the PowerPC, so delays were added to the PowerPC code to let the Virtex-II Pro execute all of its commands and stay synchronized.  An attempt was made to investigate this problem further; however, no concrete solution was found in time.  Because of this bottleneck, the code does not reach the potential rate of 1 MHz, or one million vectors per second.  With the current code, the system processes on average only 2,817 vectors per second.  Possible enhancements are noted in the future work section of this thesis.
3.4 Fault Detection and Diagnosis 
 This section describes the process and methodology used to develop the fault detection and diagnosis algorithm.  The fault detection method is similar to previous work in this field [17], with a few differences.  The diagnosis algorithm was developed using the TESTCAD test generation and fault simulation tool sets.  Figure 27 shows the flow chart of the full algorithm.
 
 
Figure 27 – Fault Detection and Diagnosis Algorithm 
3.4.1 Fault Detection 
 The algorithm runs a simple loop of generating inputs and checking outputs.  The controller board generates one of the 2^N possible input combinations, where N is the number of inputs, and sends it through the DUT.  With nine inputs, there are 2^9 = 512 unique input combinations.  Once the DUT's results for an input vector return to the controller board, the board performs a bitwise comparison of the outputs of the three adders and the voter logic against the ‘gold circuit’.  The bitwise comparison determines which adder caused an error and which bit of that adder was wrong.  A detected error sends an interrupt to the processor to begin diagnosing the error.
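The bitwise comparison can be illustrated with a short C sketch. Each unit's output is XORed with the gold circuit output, so every 1 bit in the result marks a wrong output bit. The unit names follow the columns of Table 3. The struct and function names, and the convention that a single wrong bit leads to diagnosis while several lead to the multiple-failure path, are assumptions for illustration.

```c
#include <stdint.h>

/* Output units compared against the gold circuit (names follow Table 3). */
enum { BEHAVE_ADDER, CLA_ADDER, RC_ADDER, VOTER, N_UNITS };

typedef struct {
    int      unit;       /* first unit whose output differs, or -1 if none */
    uint32_t diff;       /* mismatching bits of that unit (XOR with gold) */
    int      wrong_bits; /* total mismatching bits across all units */
} detect_result;

static int count_bits(uint32_t x)
{
    int n = 0;
    while (x) { x &= x - 1u; n++; }  /* clear the lowest set bit each pass */
    return n;
}

/* Bitwise comparison of every unit's output with the gold circuit output. */
static detect_result detect(const uint32_t outputs[N_UNITS], uint32_t gold)
{
    detect_result r = { -1, 0u, 0 };
    for (int u = 0; u < N_UNITS; u++) {
        uint32_t d = outputs[u] ^ gold;  /* 1 bits mark wrong output bits */
        if (d == 0u)
            continue;
        r.wrong_bits += count_bits(d);
        if (r.unit < 0) { r.unit = u; r.diff = d; }
    }
    return r;
}
```

With the first row of Table 3 (adders D, voter C, gold D), `detect` reports the voter with a difference mask of 0x1 and one wrong bit, so the single-error diagnosis path would apply.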
3.4.2 Fault Diagnosis 
 Diagnosing the exact location of faults within a circuit was a two-step process.  The first step used the TESTCAD test generation and fault simulation tool set to develop the test vectors and the individual fault lists.  In the second step, an algorithm was developed that uses these lists to locate the fault with the best possible resolution under the stuck-at fault model.
3.4.2.1 Test Vector Generation 
 Because the adders were designed structurally, creating the fault list was straightforward.  The TESTCAD tools generated every possible fault in the structural design and collapsed the fault list, first by equivalence and then by dominance.  This fault collapsing reduced the number of possible faults by 16-18%.  The TESTCAD tools evaluated the circuit together with the fault list and achieved 100% fault efficiency, meaning that all the faults within the circuit were detected.  The next goal was to determine how many faults each vector could uncover.  Each of the 2^N possible input combinations was simulated individually to list every fault location it detects.  Appendix G gives a brief description of all the TESTCAD tool commands.
 The data collected from the TESTCAD tools were tabulated in one large table.  Figure 28 depicts a small portion of this table, showing the fault locations detected by each test vector along the ‘x’ axis and the respective vector outputs along the ‘y’ axis.  A full list containing all the vectors, faults, and detectable faults can be found in Appendix H.
 
 
Figure 28 – Layout of Detected Faults 
Each individual test vector detected thirty faults on average.  The final goal was to reduce the vector set to the smallest number of vectors that still covers every possible fault.  Sorting the table by output bit showed that many test vectors were redundant.  After all redundant vectors were eliminated, around twenty-four vectors remained, and the average number of faults detected by each vector for each output bit dropped to eight.  From here, the lists of test vectors, diagnostic vectors, and their respective fault lists were created and stored on the controller board.
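The reduction described above was done by sorting the table by output bit and discarding redundant vectors. A common way to automate this kind of reduction is a greedy set-cover heuristic, sketched below in C. It is a swapped-in illustration, not necessarily the procedure used in this research. It assumes each output bit's fault list fits in a 64-bit mask, where bit f of the hypothetical `detects[v]` means vector v detects fault f. A minimal detection cover is not the same as a diagnostic set: vectors that help distinguish faults, such as the diagnostic vectors, may still need to be kept.

```c
#include <stddef.h>
#include <stdint.h>

static int popcount64(uint64_t x)
{
    int n = 0;
    while (x) { x &= x - 1u; n++; }
    return n;
}

/* Greedy set cover: repeatedly pick the vector that detects the most
 * still-uncovered faults. 'chosen' must have room for n_vectors entries.
 * Returns the number of vectors chosen, or -1 if some fault in
 * 'all_faults' is not detected by any vector. */
static int compact_vectors(const uint64_t *detects, size_t n_vectors,
                           uint64_t all_faults, int *chosen)
{
    uint64_t uncovered = all_faults;
    int count = 0;
    while (uncovered) {
        int best = -1, best_gain = 0;
        for (size_t v = 0; v < n_vectors; v++) {
            int gain = popcount64(detects[v] & uncovered);
            if (gain > best_gain) { best_gain = gain; best = (int)v; }
        }
        if (best < 0)
            return -1;                 /* remaining faults are undetectable */
        chosen[count++] = best;
        uncovered &= ~detects[best];   /* these faults are now covered */
    }
    return count;
}
```

Greedy set cover does not always find the smallest possible set, but it is simple and usually close to minimal for tables of this size.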
3.4.2.2 Diagnosis Algorithm 
 With the test vectors and fault lists stored on the controller board, an algorithm was developed to locate a single fault with the best possible resolution.  Following a methodology similar to Deductive Fault Simulation [9], an inverse deductive fault detection algorithm was developed.
 The goal of this algorithm was to diagnose a single fault in the entire system.  After determining which adder and which bit caused the error, the algorithm selects the appropriate test vector list and fault list to start inverse deductive fault detection.  A possible-fault array is created and populated with ‘1’ values, each representing a possible fault location.  Each test and diagnostic vector is then processed through the system and its results are evaluated; whether the vector passes or fails determines how faults are eliminated from the array.  The algorithm continues in a loop until only one fault remains or the end of the list is reached.  If more than one fault remains at the end of the list, there is no way to distinguish the single fault among them, and the remaining faults represent the best possible resolution for fault diagnosis.
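The pass/fail elimination described above can be sketched with bitmasks in C. This is a minimal illustration under stated assumptions: a single stuck-at fault, one output bit's fault list of at most 64 faults held in a 64-bit mask, and deterministic detection. The hypothetical array `detects[i]` holds the faults that vector i would expose on that output bit. A failing vector keeps only the faults it can expose; a passing vector rules out every fault it would have exposed.

```c
#include <stddef.h>
#include <stdint.h>

/* candidates: bit f set means fault f is still possible (start with all ones).
 * detects[i]: faults that vector i would expose on this output bit.
 * failed[i]:  nonzero if the DUT output for vector i mismatched the gold circuit.
 * Returns the remaining candidate set. */
static uint64_t diagnose(uint64_t candidates, const uint64_t *detects,
                         const int *failed, size_t n_vectors)
{
    for (size_t i = 0; i < n_vectors; i++) {
        if (failed[i])
            candidates &= detects[i];   /* fault must be one this vector exposes */
        else
            candidates &= ~detects[i];  /* faults this vector exposes are ruled out */
        if ((candidates & (candidates - 1u)) == 0u)
            break;                      /* zero or one candidate left: stop early */
    }
    return candidates;
}
```

If several bits remain set when the vector list runs out, those faults cannot be distinguished by the stored vectors. That residual set is the best-resolution answer described above.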
 Multiple bit failures did present an issue, because the current algorithm assumes only one bit failure when diagnosing fault locations.  When multiple bit failures are present, the algorithm runs through a set of test vectors without attempting the diagnosis, and the results are stored on the flash card for post-processing analysis.
3.5 Test Setup 
 The ultimate goal for this research was to test the algorithm under a form of radiation.  To verify the validity of the algorithm, several test setups were created: hardware fault injection, gamma radiation, thermal radiation, optical laser, and optical flash.  Each setup used the test platform in its entirety.
3.5.1 Injected Fault Setup 
 This test was conducted in an open lab using the full test platform described above.  Four faults were injected at random locations in the structural design of the adders, and the error generator rotated through the four faults one at a time for specific durations.  To exercise its durability, the test platform ran for seventy-two hours straight with no errors.  Faults were injected manually at random times throughout the testing process.
 
3.5.2 Gamma Radiation Test Setup 
 The test platform was originally constructed to be tested under gamma radiation at the OSUNRL.  The DUT was separated from the baseboard and placed eight inches from the bottom, based on Figure 10, to achieve the highest dose rate (see Chapter 2).  The baseboard was strapped above the DUT, shielded from the radiation by a thick piece of lead.  The bridge board, controller board, and remaining devices stayed fifteen feet above the reactor.  Figure 29 depicts the entire setup used for the gamma radiation test.
  
Figure 29 – Gamma Radiation Test Setup 
3.5.3 Thermal Radiation Test Setup 
 Thermal radiation was chosen as another way to generate errors on the DUT.  In a previous thesis [19], a Newport Xe Solar Simulator was assembled that can provide 3.3 cal/cm²·s of irradiance.  Figure 30 depicts the setup used for this irradiation test.
 
 
Figure 30 – Thermal Radiation Test Setup 
A two-inch diameter fused silica plano-convex lens was inserted 95 mm after the focusing optic of the setup.  The focal length of this second external optic is 150 mm; however, to increase the homogeneity of the beam, the target was placed closer to the second external optic than its focal length [18].  A pinhole placed between the two-inch silica lens and the FPGA ensured that the thermal radiation was focused onto the FPGA alone.  Figure 31 provides a diagram with detailed specifications of the entire setup.
 
Figure 31 – Thermal Radiation Specification Setup 
 
3.5.4 Optical Laser Test Setup 
 Another test used COTS laser pens to cause errors.  The test setup consisted of a laser pointer, a pinhole plate, and the DUT.  All the lasers were green, with a wavelength of 532 nm.  Three laser pens with output powers of 10 mW, 20 mW, and 50 mW were used in an attempt to create errors.  The cover of the FPGA on the DUT was removed for this test, and the pinhole in the plate was 600 micrometers in diameter.
 The pinhole was placed over the laser pointer to get a more focused target on the uncovered FPGA.  The distance to the FPGA was 73 mm, and each laser was focused on one corner of the FPGA.  Figure 32 depicts the entire setup used for the optical laser test.
 
Figure 32 – Optical Laser Test Setup 
3.5.5 Optical Flash Test Setup 
 The last test setup available for this research was the optical flash test.  The goal of this test was to see whether the electromagnetic interference (EMI) of a professional camera flash device would cause errors on the DUT.  Figure 33 depicts how the test setup was arranged.
Figure 33 – Optical Flash Test Setup
 The flash device was controlled digitally using the photographic concept known as the f-stop.  F-stops are powers of √2: the first f-stop is √1, or f/1; the next is √2, or f/1.4; then √4 for f/2, and so on [24].  For each increment of the f-stop number, the output wattage doubles: f/1 starts with an output of 18.75 W, f/2's output is 37.5 W, f/3's output is 75 W, and so forth.
Three different DUT tests were conducted.  One used a brand new mini module with no modifications.  The other two tests used another brand new module with the outermost protective plate removed.  In one of those setups the DUT was completely exposed to the flash; the other used a 600 micrometer pinhole to focus the flash onto one part of the exposed DUT.
3.6 Summary
 This chapter described the entire process used to develop the test platform and all of its associated components.  It discussed how the test and diagnostic vectors were derived, set up, and implemented, and it described the complete detection and diagnosis algorithm.  Each of the five test setups used to test the goals of this research was described in detail, including how it was constructed.  Further code, diagrams, specification sheets, and other pertinent information can be found in the Appendix.
  
 
IV. Results & Analysis 
4.1 Chapter Overview 
 This chapter covers all the results and analysis of this research, as described in the previous methodology chapter.  Each section covers the results from one of the test setups used to obtain errors.
4.2 Test Setup Results 
 The five subsections describe the results gathered from each of the five tests.  Some tests 
were more successful than others.  A detailed account of each experiment was recorded and 
written out in this chapter. 
4.2.1 Injected Fault Results 
  The hardware fault injection test was completed.  Its objective, detecting a fault within the circuit and diagnosing the location of that fault in real time, was achieved.  The four hardware-injected faults were located with the best possible resolution by the algorithm described in Chapter 3, which was able to pinpoint the location of faults placed randomly in the structural design of either the Ripple Carry Adder (Figure 34) or the Carry Look Ahead Adder (Figure 35).

Figure 34 – Injected Fault Locations for Ripple Carry Adder

Figure 35 – Injected Fault Locations for Carry Look Ahead Adder
Three of the four faults were detected, and the diagnostic algorithm pinpointed each of them exactly on the circuit diagram.  The fourth fault could not be located exactly; however, the algorithm reduced the number of possible locations to three.  These three candidates cannot be uniquely distinguished without fully destroying the circuit, so with this method they are the best possible solution.  Figure 36 shows the results recorded on the flash card, illustrating how the algorithm reduces the number of candidate faults in an effort to pinpoint the exact fault.
 
 
Figure 36 – Injected Fault Results 
The circled results are shown in hexadecimal, with each hex digit representing four fault locations (one per bit).  The first result, 00000019, indicates that the algorithm narrowed the fault to three possible locations.  The second result, 00000020, identifies the exact location of the uncovered fault.
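The hexadecimal results in Figure 36 can be read as bitmasks of candidate fault locations. Assuming bit 0 (the least significant bit) is fault location 0, 00000019 is binary 1 1001, with three bits set (locations 0, 3, and 4). By contrast, 00000020 has only bit 5 set. The C sketch below decodes such a word; the function name and the bit ordering are assumptions.

```c
#include <stdint.h>
#include <stdlib.h>

/* Parse a hex word from the flash log and list the candidate fault
 * locations it encodes. Assumed ordering: bit 0 (LSB) = location 0.
 * Returns the number of candidate locations written to 'locations'. */
static int decode_fault_word(const char *hex, int locations[32])
{
    uint32_t word = (uint32_t)strtoul(hex, NULL, 16);
    int n = 0;
    for (int bit = 0; bit < 32; bit++)
        if (word & (UINT32_C(1) << bit))
            locations[n++] = bit;
    return n;
}
```

A count of one means the fault was pinpointed exactly; a larger count is the set of indistinguishable candidates.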
4.2.1.1 Analysis 
 These results demonstrate that the inverse deductive algorithm for fault detection and diagnosis works.  The algorithm correctly detected each of the injected faults, which were injected one at a time, and properly reduced the number of possible fault locations to the best possible solution.  The algorithm is designed to diagnose single errors within a circuit, but it also accounts for multiple errors: although it cannot diagnose the exact locations of multiple errors, it applies a set of test vectors whose results are logged for post-processing analysis.
 
4.2.2 Gamma Radiation Results 
 Four separate tests were conducted at the OSUNRL.  The first test used a Spartan-3 chip that was separated from the baseboard.  The maximum dose rate of 69 krad(Si)/hr was set based on the dose rate curve in Chapter 2.  No errors were detected for the first fifteen hours, but soon after, the chip experienced a catastrophic failure and no values were obtained.  The chip itself was damaged beyond repair and could not be reprogrammed.
 In an effort to cause errors, the next three tests kept the baseboard and the DUT attached.  The first of these (Test 2) lasted forty-eight minutes before the device was removed from radiation.  The result log showed sporadic, random results with no consistent outcome: all the adders and the voter logic produced multiple random results, and there was no clear, traceable set of results in the logs.  After the DUT was pulled from radiation, the baseboard was found to be damaged and inoperable; the DUT, however, remained operational.  Figure 37 shows a portion of the error log from this test; the full results are similar to the portion displayed.
 
Figure 37 – Gamma Radiation Test 2,3 Results 
 The third test was set up in the same way as the second.  This test lasted an hour and nineteen minutes before numerous errors showed up.  Although it lasted longer, the results in its log were similar to those in Figure 37.  Aside from the duration, the only other difference between the two tests was that Test 3's baseboard and DUT were both no longer operational, whereas Test 2's DUT was still operational.
 The final gamma radiation test produced different results.  The device was raised to seventeen inches from the bottom of the radiation chamber, which reduces the dose rate to approximately 35 krad(Si)/hr based on Figure 11 in Chapter 2.  After two and three-quarter hours, the baseboard and the DUT were both still operational when pulled from the radiation.  A total of thirty-four traceable errors were recorded, arriving in three groups over time.  Figure 38 shows the timeline of all four gamma radiation tests along with the groups of errors detected in Test 4.
[Timeline, Time (hr) axis marked 0, 1, 2, 3, and 15: points at which Tests 1, 2, 3, and 4 were pulled, and the Test 4 error groups Errors 1-7, Errors 7-17, and Errors 17-34]
 
Figure 38 – Gamma Radiation Timeline 
The three groups of errors, based on the logs, were consistent and traceable.  The voting logic appeared to produce errors at bit zero: bit zero of the voter output behaved as a stuck-at-zero fault, meaning that it remained zero for a period of time regardless of the correct value.  Table 3 presents examples of results showing a bit stuck at zero.  Finally, Table 4 summarizes the results of all four gamma radiation tests.
 
Table 3 – Stuck at Zero Faults
Behave Adder | CLA Adder | RC Adder | Voter | Gold Circuit
D – 1101 | D – 1101 | D – 1101 | C – 1100 | D – 1101
3 – 0011 | 3 – 0011 | 3 – 0011 | 2 – 0010 | 3 – 0011
9 – 1001 | 9 – 1001 | 9 – 1001 | 8 – 1000 | 9 – 1001
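A stuck-at-zero fault on bit zero forces that bit low, so it is visible only when the correct result has bit zero set. Every gold value in Table 3 (D, 3, and 9) is odd, and clearing bit zero gives exactly the voter values shown (C, 2, and 8). A small C sketch of this fault model, with hypothetical function names:

```c
#include <stdint.h>

/* Value seen on an output whose bit 'bit' is stuck at 0: that bit is forced low. */
static uint8_t stuck_at_0(uint8_t correct, unsigned bit)
{
    return (uint8_t)(correct & ~(1u << bit));
}

/* A stuck-at-0 fault is observable only when the correct bit is 1. */
static int stuck_at_0_visible(uint8_t correct, unsigned bit)
{
    return stuck_at_0(correct, bit) != correct;
}
```

For an even correct value such as 8, the faulty output equals the gold output, so the fault goes unnoticed for that vector.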
 
Table 4 – Gamma Radiation Results Summary
Test # | Chip Type | Baseboard Attached | Dose Rate (krad(Si)/hr) | Duration (hrs) | Baseboard Operational | DUT Operational | Vectors Checked (×1000)
1 | Spartan 3 | No | 69 | 15 | Yes | No | 150861
2 | FX (1) | Yes | 69 | 0.8667 | No | Yes | 7269
3 | FX (2) | Yes | 69 | 1.333 | No | No | 15482
4 | FX (3) | Yes | 35 | 2.75 | Yes | Yes | 26470
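The throughput and dose figures can be cross-checked directly from Table 4. The C sketch below, whose names are hypothetical, computes the average vectors per second and the accumulated dose (dose rate times duration, assuming a constant rate) for each test. Worked by hand, the rates come to about 2,790, 2,330, 3,230, and 2,670 vectors per second for Tests 1 to 4, bracketing the 2,817 average quoted in Section 3.3. The accumulated doses are about 1,035, 60, 92, and 96 krad(Si). Test 4 therefore received more total dose than Tests 2 or 3 while remaining operational.

```c
/* One row of Table 4 (values transcribed from the table). */
typedef struct {
    int    test;
    double dose_rate; /* krad(Si)/hr */
    double hours;     /* test duration */
    double kvectors;  /* vectors checked, in thousands */
} gamma_row;

static const gamma_row table4[4] = {
    { 1, 69.0, 15.0,   150861.0 },
    { 2, 69.0, 0.8667, 7269.0   },
    { 3, 69.0, 1.333,  15482.0  },
    { 4, 35.0, 2.75,   26470.0  },
};

/* Average test throughput over the whole run, in vectors per second. */
static double vectors_per_second(const gamma_row *r)
{
    return r->kvectors * 1000.0 / (r->hours * 3600.0);
}

/* Accumulated dose in krad(Si), assuming a constant dose rate for the run. */
static double total_dose_krad(const gamma_row *r)
{
    return r->dose_rate * r->hours;
}
```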
4.2.2.1 Analysis 
 Overall, the gamma radiation tests showed that the test platform created for this research worked as intended, and the detection portion of the algorithm, which determines whether errors exist, proved successful.  In Test 1, the catastrophic failure of the chip rendered the results inconclusive.  However, the test did show that the separated DUT, with its smaller technology (90 nm), was more resilient to gamma radiation.  With smaller technology, the gate oxide traps less positive charge overall, and with fewer radiation-trapped charges, the transistors have a better chance of operating normally.  Tests 2 and 3 confirmed that the baseboard is more susceptible to gamma radiation than the DUT itself, as shown in previous work [17].  One could speculate that this is because the baseboard was not designed for harsh environments: its designers intended it simply to provide a connection port for the DUT, not to withstand gamma radiation.
 The final test showed that the algorithm was able to track the single bit error in the voting logic.  However, it was not able to diagnose the error because the error disappeared after 0.00069 seconds on average.  The logs tracked thirty-four errors before the device was pulled out of the radiation.  The results show that the number of errors in each ‘grouping’ increased over time, which would be expected since the DUT was constantly exposed to gamma radiation.  Had the device remained in radiation, more errors would likely have occurred as the threshold voltage dropped, as shown in Figure 9 (Chapter 2).
 The gamma radiation tests do suggest that there may be a difference between the effects of total ionizing dose and dose rate.  The two tests at the higher dose rate failed sooner than Test 4, which ran at half the dose rate for more than twice as long.  This is similar to skin exposure to the sun's harmful UV rays: on a bright sunny day with no protection, skin has a higher chance of being damaged in one hour than on a cloudy day in two hours.  The skin would be exposed to similar total amounts of UV, but direct sunlight can cause more immediate damage.  The dose rate, and not only the total dose, therefore appears to affect the threshold voltage.  As shown in Figure 4 (Chapter 2), the threshold voltage could have shifted, causing the set of errors in Test 4 to appear and then disappear in groups.  More tests would be needed to verify this, but the outcome of this test setup is promising.

Table 5 – Thermal Radiation Results Summary
Test # | Chip Type | Duration (mins:secs) | Start Temp (°C) | Peak Temp (°C) | Final Output Wattage
1 | FX(1) | 11:07 | 34.8 | 440.9 | 1800
2 | FX(2) | 13:06 | 36.1 | 392.7 | 1350
3 | FX(3) | 18:39 | 42.8 | 298.1 | 1300
4 | FX(4) | 21:35 | 35.5 | 301.8 | 1290
4.2.3 Thermal Radiation Results 
 The DUT was also exposed to thermal (heat) radiation in an effort to induce errors.  Four tests were conducted on four separate DUTs.  Table 5 summarizes the results of all four thermal radiation tests; the duration column indicates how long each chip lasted before complete failure.  The first two tests increased the wattage quickly over a shorter period of time, and both chips failed catastrophically at higher temperatures.  The logs did show errors in the voting logic at bit one before the chip failed altogether, with all outputs reading zero.  Across the first two tests, there were more than a dozen single voter errors spanning five hundred milliseconds before everything failed.
 The last two tests increased the output wattage slowly over a longer span of time.  In these tests the chips failed at around 300°C.  Although these chips took longer to fail and reached lower temperatures, no traceable errors were detected.  The FX series chip is rated only for temperatures from zero to eighty-five degrees Celsius, so in the end the chip failed catastrophically, producing all zeros at the output.
4.2.3.1 Analysis 
 The results of this test again showed that the voter logic failed before any of the adders.  Because heat spreads until the chip reaches thermal equilibrium, a pinpoint form of thermal radiation could not be achieved: the entire body of the DUT heated up to roughly the same temperature as the point targeted by the beam.  In semiconductors, electrical conductivity increases with increasing temperature.  Silicon's thermal conductivity is commonly quoted at 300 Kelvin (about twenty-seven degrees Celsius) and decreases as the temperature rises.  One could speculate that, with the quick ramp-up of wattage, thermal equilibrium was not fully reached, allowing the chip to reach higher temperatures before failing.  The heat did not fully diffuse across the DUT, possibly causing partial damage; once the rest of the chip caught up with the focal point of the beam, the entire chip failed.
 
Table 6 – Optical Laser Results Summary
Test # | Chip Type | Duration (min) | Laser (mW) | Errors
1 | FX(1) | 45 | 10 | None
2 | FX(1) | 45 | 20 | None
3 | FX(1) | 45 | 50 | None
 The last two tests ramped up the temperature more slowly and did not produce any constructive results; the chips failed completely when the heat created a short.  Increasing the temperature slowly allowed thermal equilibrium to be reached across the entire chip before stepping to the next level.  By waiting longer, the entire chip reached the same temperature and failed completely, without a portion of the chip being damaged first.  In the end, this test did show that the system was able to detect some single errors before completely crashing.
4.2.4 Optical Laser Results 
 A laser pointer was also used in an attempt to cause errors on the DUT.  Three tests were conducted, one with each of the three lasers, and Table 6 shows the results.  The lasers were focused on the same corner of the DUT, where some of the outputs would have been located.  After forty-five minutes, it was determined that the test would not cause any errors.
4.2.4.1 Analysis 
 After careful examination of the DUT, a second metal plate was found protecting the innermost logic of the FPGA.  Attempts to remove this second plate with chemicals and small blades, without permanently damaging the chip itself, proved fruitless.  The possibility that the laser could generate enough heat to produce errors was considered; however, the battery-powered COTS lasers could not run long enough to do so.  It was concluded that the laser test could not induce any errors on the DUT with the second metal plate in place.

Table 7 – Optical Flash Results Summary
Test # | Chip Type | Flash Output (W) | Covering | Results
1 | FX(1) | 600 (Max) | Yes | None
2 | FX(2) | 18.75 (Min) | No Lid | Crashed
3 | FX(2) | 428.4 | No Lid / 600 µm pinhole | Crashed
4.2.5 Optical Flash Results 
 The optical flash test was the last test performed.  Its goal was to use a flash to cause electromagnetic interference (EMI) in the DUT.  Table 7 summarizes the results collected from each test.
 The first test used a brand new, completely intact DUT.  The DUT was completely resistant to the output of the flash all the way up to its maximum of 600 W; no errors or damage resulted from this test.  The second test used a DUT with the top plate removed, which left it completely susceptible to EMI.  At the lowest setting, 18.75 W, the DUT reset its values, so even a single camera flash was enough to reset the DUT.  However, in this test the DUT could be reprogrammed and used again.  The final test used the 600 micrometer pinhole in an effort to direct the flash onto one portion of the DUT.  At the f-stop setting of 4.2 (428.4 W), the flash device was able to reset the DUT.  Further tests confirmed this threshold: the DUT consistently failed at the f-stop setting of 4.2 but not at 4.1.
4.2.5.1 Analysis 
 The outer plate of the DUT clearly helped protect the device from EMI and other potentially harmful effects; removing it exposed the device and made it more susceptible to EMI.  A simple camera flash caused enough EMI in the circuit to produce a voltage drop throughout the entire FPGA.  With the voltage dropped, all the thresholds were lowered and the outputs of the FPGA turned into 1's.  Adding a pinhole plate over the DUT shielded out some of the EMI at the lower flash outputs.  However, once the setting reached 4.2, the DUT became subject to EMI and failed.  Even with a pinhole, the EMI was not confined to one portion of the DUT: the entire FPGA failed.
4.3 Summary 
 This chapter discussed all the results obtained from the five tests to detect and diagnose errors within the DUT.  Some tests were partially successful; in short, some of the tests validated the detection portion of the algorithm.  The injected fault test was the only test to exercise the diagnosis part of the algorithm.  This was expected, since the entire experiment was run in hardware and completely controlled by the user.  It was disappointing that the remaining tests did not allow single errors on a circuit to be fully diagnosed.  The gamma and thermal radiation tests were able to detect single errors in the voter logic, but not to pinpoint the exact location of the fault.  In the gamma radiation test the errors disappeared, most likely because the SEEs created only temporary damage.  The thermal radiation test drove the DUT far beyond its rated operating range of zero to eighty-five degrees Celsius, causing it to fail completely.  The optical laser test did not produce any real results because of the extra metal plating.  Finally, the optical flash test caused enough EMI to create soft errors when the DUT was completely exposed.  Attempting to focus the flash with a pinhole did not help; in the end, the pinhole helped protect the chip from the EMI of the flash device.  All in all, these tests show that circuits are still susceptible to multiple forms of radiation, which leaves ample room for research to help mitigate errors within a circuit.  All result logs can be found in Appendix I.
  
 
V. Conclusion 
5.1 Chapter Overview 
 This chapter summarizes the research and offers notes and ideas for future work that could advance this area.  It covers the successful implementation of the testing platform and the diagnostic algorithm, and it presents the conclusions reached on TDTMR.
5.2 Conclusion 
 The first step of this thesis was to correct the multiple mistakes made in previous work in this area, as described in Chapter 2.  Beyond these corrections, the research focused on three main goals: construct a test platform to evaluate circuit designs under various forms of radiation, develop an algorithm to detect and diagnose errors within a circuit, and study a new design, TDTMR.   
 The first goal, constructing a test platform to evaluate a circuit while under radiation, was accomplished.  The platform allows a DUT to be placed under hazardous conditions while the control board performs a full test vector evaluation.  It also corrects and enhances the designs from previous work in this area of research.  The platform was used in all five test setups and could be adapted to other test setups.  The complete setup replaces numerous machines and many man-hours otherwise needed to establish a testing environment.    
 The second goal, developing a method to detect errors and diagnose their location in real time, was also accomplished.  The algorithm demonstrated its full functionality during the manually injected fault test: it accurately detected single errors and located them in the DUT.  When multiple errors occur at the same time, the algorithm detects the incorrect outputs but, because its scope is limited to single errors, cannot fully diagnose their locations.  The final result logs show that the algorithm still flagged these multiple-error cases.    
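The following sketch is not the algorithm developed in this thesis; it is a generic, hedged illustration of how single-fault diagnosis from test-vector responses can work, using a classic fault dictionary. The gate-level full adder, its net names (`x1`, `a1`, `a2`), and the helper functions are hypothetical and are not the carry look-ahead or ripple carry designs of Appendices B and C. Every single stuck-at fault is simulated over all input vectors, each fault's output signature is recorded, and an observed response is matched against that table. A response that differs from the fault-free one but matches no entry is detected without being located, which mirrors the multiple-error behavior described above.

```python
from itertools import product

# Hypothetical gate-level 1-bit full adder used only for illustration.
NETS = ["a", "b", "cin", "x1", "a1", "a2", "s", "cout"]
VECTORS = list(product((0, 1), repeat=3))  # exhaustive (a, b, cin) test vectors


def simulate(a, b, cin, faults=None):
    """Return (s, cout); `faults` maps a net name to its stuck-at value."""
    faults = faults or {}

    def net(name, value):
        # A stuck-at fault overrides whatever value drives the net.
        return faults.get(name, value)

    a, b, cin = net("a", a), net("b", b), net("cin", cin)
    x1 = net("x1", a ^ b)
    a1 = net("a1", a & b)
    a2 = net("a2", x1 & cin)
    s = net("s", x1 ^ cin)
    cout = net("cout", a1 | a2)
    return s, cout


def response(faults=None):
    """Output signature of the circuit over every test vector."""
    return tuple(simulate(a, b, cin, faults) for a, b, cin in VECTORS)


GOLDEN = response()  # fault-free signature


def build_dictionary():
    """Map each detectable single stuck-at fault's signature to the fault(s) causing it."""
    table = {}
    for name, stuck in product(NETS, (0, 1)):
        sig = response({name: stuck})
        if sig != GOLDEN:  # skip faults these vectors cannot detect
            table.setdefault(sig, []).append(f"{name}/SA{stuck}")
    return table


FAULT_DICT = build_dictionary()


def diagnose(observed):
    """Classify a response as a pass, a located single fault, or detected-only."""
    if observed == GOLDEN:
        return "pass", []
    if observed in FAULT_DICT:
        return "single fault", FAULT_DICT[observed]
    return "error detected, not a single stuck-at fault", []
```

For example, `diagnose(response({"a1": 0}))` returns a candidate list containing `a1/SA0`, while two simultaneous faults such as `{"x1": 0, "cout": 1}` produce a response that differs from `GOLDEN` but matches no dictionary entry, so the error is detected but not located. Candidates are returned as a list because equivalent faults produce identical signatures and cannot be told apart.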
 Finally, the goal of fully exercising the TDTMR concept could not be completed within the available time.  However, corrections were made to the original design from previous research.  The corrected design properly implements the TMR concept and is fully laid out on the FPGA rather than being reduced by optimization.  The results of a few test setups do, however, indicate that the voting logic is more susceptible to radiation damage than any of the three forms of adding logic.  Overall, the TDTMR design is ready to be fully evaluated when the opportunity arises.
5.3 Future Work 
 This area offers considerable opportunity for further research.  Radiation effects on electronics remain incompletely understood, and several directions for further investigation follow from this research.
 First, the TDTMR design should be investigated more thoroughly; the research in this thesis does not cover all of its possibilities.  TDTMR has the potential to be highly effective.  Implementing the same function in three different forms of logic could show whether one form is more resilient to errors than another.  If fully developed, TDTMR could then be added to the set of hardening by design techniques.  TDTMR could be taken a step further by using Xilinx's floorplanning tool to place the different logic blocks in different areas of the FPGA.   
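To make the TDTMR idea concrete, the behavioral sketch below models three functionally equal adders built in different ways feeding a bitwise majority voter. Only the ripple carry and carry look-ahead forms come from this thesis (Appendices B and C); the third, purely behavioral adder, the 4-bit width, and the `upset=(module, bit)` parameter used to flip one output bit are assumptions for illustration. It models the logic only, not the FPGA layout or the thesis's voter implementation, and, like any TMR scheme, it does not mask a fault inside `vote` itself.

```python
WIDTH = 4  # assumed operand width for the sketch
NAMES = ("ripple_carry", "carry_lookahead", "behavioral")


def ripple_carry(a, b, cin=0):
    """Chain of full adders: each carry waits for the one below it."""
    total, carry = 0, cin
    for i in range(WIDTH):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return total | (carry << WIDTH)


def carry_lookahead(a, b, cin=0):
    """Each carry is computed directly from generate/propagate terms and cin."""
    g = [(a >> i) & (b >> i) & 1 for i in range(WIDTH)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(WIDTH)]
    carries = [cin]
    for i in range(WIDTH):
        # c[i+1] = g[i] | p[i]g[i-1] | ... | p[i]...p[0]cin, expanded explicitly
        term, prod = g[i], p[i]
        for j in range(i - 1, -1, -1):
            term |= prod & g[j]
            prod &= p[j]
        carries.append(term | (prod & cin))
    total = sum((p[i] ^ carries[i]) << i for i in range(WIDTH))
    return total | (carries[WIDTH] << WIDTH)


def behavioral(a, b, cin=0):
    """Hypothetical third design: let the tools infer the adder from '+'."""
    return (a + b + cin) & ((1 << (WIDTH + 1)) - 1)


def vote(x, y, z):
    """Bitwise 2-of-3 majority voter."""
    return (x & y) | (y & z) | (x & z)


def tdtmr_add(a, b, cin=0, upset=None):
    """Vote over three differently built adders; `upset=(module, bit)` flips one output bit."""
    outs = [ripple_carry(a, b, cin), carry_lookahead(a, b, cin), behavioral(a, b, cin)]
    if upset is not None:
        module, bit = upset
        outs[module] ^= 1 << bit  # model an upset on one module's output
    voted = vote(*outs)
    disagreeing = [name for name, out in zip(NAMES, outs) if out != voted]
    return voted, disagreeing
```

Calling `tdtmr_add(9, 7, upset=(1, 2))` flips bit 2 of the carry look-ahead output; the voter still returns 16, and the disagreement list names only `carry_lookahead`, which is the kind of information a diagnosis step could use to identify the faulty copy.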
 Additionally, the algorithm could be extended to handle multiple-bit errors.  It can currently diagnose only single errors; research into locating multiple simultaneous errors could expand its usefulness.  Fault detection and diagnosis in sequential logic also deserves further research.  Sequential logic is in greater demand, so an algorithm for detecting and locating faults in sequential circuits would be a significant contribution.   
 Finally, the test platform developed in this research can be modified for use with other forms of radiation.  Its basic form is stable, and although gamma radiation was the only form immediately available during this research, with more time the platform could be used to characterize other forms of radiation.   
 Research within this overall framework is ongoing.  The work performed for this thesis, however, provides a more solid foundation for advancing the testing and evaluation of radiation effects on electronics, and its results give clearer direction for further studies in this field.   
  
 
Appendices 
 
Appendix A : Virtex 4 FX Series Mini-Module Datasheet 
Found on CD > Appendix/FX12_man.pdf 
 
Appendix B : Carry Look Ahead Adder Schematic 
Found on CD > Appendix/full_cla.jpg 
 
Appendix C : Ripple Carry Adder Schematic 
Found on CD > Appendix/full_rc.jpg 
 
Appendix D : Virtex II Pro Datasheet 
Found on CD > Appendix/V2P_man 
 
Appendix E : High Speed CMOS Hex Buffer Datasheet 
Found on CD > Appendix/buffer_data.pdf 
 
Appendix F : IC Socket with Capacitor Datasheet 
Found on CD > Appendix/IC_socket.pdf 
 
Appendix G : TESTCAD Tool Guide 
Found on CD > Appendix/TESTCAD Tool Guide.pdf 
 
Appendix H : Full Fault List 
Found on CD > Appendix/cla_final_FL.xls 
Found on CD > Appendix/rc_final_FL.xls 
 
Appendix I : Result Logs 
Found on CD > Appendix/RESULTS/  
 
  
 
Bibliography 
 
 
[1] K.A. LaBel, "Radiation Effects & Analysis". NASA. Sep 2009 
<http://radhome.gsfc.nasa.gov/top.htm>. 
 
[2] N. Rollins, M. Wirthlin, M. Caffrey, and P. Graham, “Evaluating TMR Techniques in the 
Presence of Single Event Upsets” in Proc. Conf. Military and Aerospace Programmable Logic 
Devices (MAPLD), Washington, DC, Sep 2003. 
 
[3] C. Carmichael, Triple Module Redundancy Design Techniques for Virtex FPGAs, XAPP197 
(v1.0), Xilinx Corp., 2001. 
 
[4] S. Habinc, “Functional Triple Modular Redundancy (FTMR)” FPGA-003-01 (v0.2), Gaisler 
Research, 2002. 
 
[5] P. K. Samudrala, J. Ramos, and S. Katkoori, “Selective Triple Modular Redundancy (STMR) 
based Single Event Upset Tolerant Synthesis for FPGA’s” IEEE Trans. Nucl. Sci., vol. 51, no. 6, 
pp. 2957-2969, Oct. 2004. 
 
[6] B. Pratt, M. Caffrey, P. Graham, E. Johnson, K. Morgan, and M. Wirthlin, “Improving FPGA 
Design Robustness with Partial TMR” presented at the IRPS Conf., Mar. 2006. 
 
[7] C.C. Yui, G.M. Swift, C. Carmichael, “SEU Mitigation Testing of Xilinx Virtex II FPGAs” 
Nuclear and Space Radiation Effects Conference (NSREC), July 2003. 
 
[8] F. Lima, C. Carmichael, J. Fabula, R. Padovani, R. Reis, “A Fault Injection Analysis of 
Virtex FPGA TMR Design Methodology”, RADECS, 2001. 
 
[9] Bushnell, M. L. and V. D. Agrawal. Essentials of electronic testing for digital, memory, and 
mixed-signal VLSI circuits. Boston: Kluwer Academic, 2000. 
 
[10] R.J. Patton, J. Chen, “Advances in Fault Diagnosis Using Analytical Redundancy” IEE 
Colloquium, Jan 1993. 
 
[11] R.J. Patton, “Fault Detection and Diagnosis in Aerospace Systems Using Analytical 
Redundancy” Condition Monitoring and Fault Tolerance, IEE Colloquium, Nov 1990. 
 
[12] E. Macii, T. Wolf, “Multiple Fault Diagnosis in Combinational Networks” Computers & 
Electrical Engineering, Volume 21, Issue 5, September 1995 
 
[13] C. Claeys, E. Simoen, Radiation Effects in Advanced Semiconductor Materials and 
Devices. Berlin: Springer-Verlag, 2002. 
 
[14] J. Scarpulla, A. Yarbrough. “The Effects of Ionizing Radiation on Space Electronics” 
Crosslink vol. 4, no. 2, pp. 15-19, June 2003. 
[15] T.P. Ma, P.V. Dressendorfer, Ionizing Radiation Effects in MOS Devices and Circuits. New 
York: Wiley Interscience Publications, 1989. 
 
[16] J. Petrosky, Radiation Effects on Electronic Devices: Theory, Modeling and Experiment.  
NENG660 Course Notes, Air Force Institute of Technology, 2007. 
 
[17] Simmons, Thomas E. (2009) Characterization of Hardening by Design Techniques on 
Commercial, Small Feature Sized Field Programmable Gate Arrays MS Thesis. 
AFIT/GE/ENG/09-43. Wright-Patterson AFB OH: Graduate School of Engineering, Air Force 
Institute of Technology. 
 
[18] Bauer, William A. (2010) Determination of Nuclear Yield from Thermal Degradation of 
Automobile Paint MS Thesis. AFIT/GWM/ENP/10-M10. Wright-Patterson AFB OH: Graduate 
School of Engineering, Air Force Institute of Technology. 
 
[19] Koehl, Michael A. (2009) Thermal Flash Simulator MS Thesis. AFIT/GNE/ENP/09-M04. 
Wright-Patterson AFB OH: Graduate School of Engineering, Air Force Institute of Technology. 
 
[20] Nave, Carl. "Beta Decay Examples". HyperPhysics Concepts. Feb 10, 2010 
<http://hyperphysics.phy-astr.gsu.edu/HBASE/nuclear/betaex.html>.  
  
[21] Herminghuysen, Kevin. OSUNRL Co-60 Curve. Excel Spreadsheet. Jan 2002. 
 
[22] Avnet Inc Design Services. Xilinx Virtex-4 FX12 Evaluation Kit Configuration Reference 
Manual ADS-005200, April 2006. 
 
[23] Xilinx Website, FPGA vs ASIC, Inc., Xilinx. February 2010 
<http://www.xilinx.com/company/gettingstarted/fpgavsasic.htm> 
 
[24] W. Young. “f/Calc Manual” Warren Young. May 31, 2009. 
<http://fcalc.net/manual/index.html>  
 
  
 
REPORT DOCUMENTATION PAGE Form Approved OMB No. 074-0188 
The public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining 
the data needed, and completing and reviewing the collection of information.  Send comments regarding this burden estimate or any other aspect of the collection of information, including suggestions for 
reducing this burden to Department of Defense, Washington Headquarters Services, Directorate for Information Operations and Reports (0704-0188), 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA  
22202-4302.  Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to any penalty for failing to comply with a collection of information if it does not display a 
currently valid OMB control number.   
PLEASE DO NOT RETURN YOUR FORM TO THE ABOVE ADDRESS. 
1. REPORT DATE (DD-MM-YYYY) 
25-03-2010 
2. REPORT TYPE  
Master’s Thesis 
3. DATES COVERED (From – To) 
Aug 2008 – Mar 2010 
4.  TITLE AND SUBTITLE 
REAL TIME FAULT DETECTION AND DIAGNOSTICS USING FPGA-BASED ARCHITECTURES 
5a.  CONTRACT NUMBER 
5b.  GRANT NUMBER 
5c.  PROGRAM ELEMENT NUMBER 
5d.  PROJECT NUMBER 
08-183 
5e.  TASK NUMBER 
5f.  WORK UNIT NUMBER 
6.  AUTHOR(S) 
Naber, Nathan P. 2d Lt USAF 
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) 
Air Force Institute of Technology 
Graduate School of Engineering and Management (AFIT/EN) 
2950 Hobson Way 
WPAFB OH 45433-7765 
8. PERFORMING ORGANIZATION REPORT NUMBER 
AFIT/GCE/ENG/10-04 
9.  SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES) 
Air Force Research Laboratory/Space Vehicles 
ATTN: Dr. James Lyke 
3550 Aberdeen Ave SE, Bldg 887 Rm 3 
Kirtland AFB, NM 87117 
(505) 846-5812                      DSN 246-5812 
james.lyke@kirtland.af.mil 
10. SPONSOR/MONITOR’S ACRONYM(S) 
AFRL/RVSE 
11.  SPONSOR/MONITOR’S REPORT NUMBER(S) 
12. DISTRIBUTION/AVAILABILITY STATEMENT 
  
Approval for public release; distribution is unlimited. 
13. SUPPLEMENTARY NOTES  
 
14. ABSTRACT  
 Errors within circuits caused by radiation continue to be an important concern to developers.  This research investigated a new methodology of real time fault detection and diagnostics using FPGA-based architectures while under radiation.  The contributions of this research focus on three areas: a full test platform to evaluate a circuit while under radiation, an algorithm to detect and diagnose fault locations within a circuit, and a characterization of Triple Design Triple Modular Redundancy (TDTMR), a new form of TMR.  Five test setups (injected fault test, gamma radiation test, thermal radiation test, optical laser test, and optical flash test) were used to assess the effectiveness of these three research goals. 
Based on the five tests, the testing platform operated successfully.  The detection and diagnosis algorithm was able to detect errors, but the injected fault test was the only test in which it properly diagnosed the location of the fault.  The results also unexpectedly showed that the voting unit failed before any of the adders while under radiation.  Dose rate and total dose have differing effects on the DUT.  The goals of this research were met by completing a fully interchangeable and operational testing platform, an algorithm that detects and diagnoses errors in real time, and an initial evaluation of TDTMR. 
15. SUBJECT TERMS 
Fault detection and diagnosis, gamma radiation, real time, Triple Modular Redundancy, Total Ionizing Dose, thermal radiation 
16. SECURITY CLASSIFICATION OF: 
a. REPORT: U 
b. ABSTRACT: U 
c. THIS PAGE: U 
17. LIMITATION OF ABSTRACT: UU 
18. NUMBER OF PAGES: 62 
19a.  NAME OF RESPONSIBLE PERSON: Dr. Yong C. Kim 
19b.  TELEPHONE NUMBER (Include area code): 937–255–3636, ext 4620; yong.kim@afit.edu 
Standard Form 298 (Rev: 8-98) 
Prescribed by ANSI Std. Z39-18 
